DataFrames-Python

import pandas as pd
import numpy as np

Let’s look at the women data set from R.

df = pd.read_csv("women.csv")
df = df.drop(columns=["Unnamed: 0"])  # we drop this column because we do not need it
df

##     height  weight
## 0       58     115
## 1       59     117
## 2       60     120
## 3       61     123
## 4       62     126
## 5       63     129
## 6       64     132
## 7       65     135
## 8       66     139
## 9       67     142
## 10      68     146
## 11      69     150
## 12      70     154
## 13      71     159
## 14      72     164
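The extra "Unnamed: 0" column appears because the CSV was saved from R, and (we assume here) R's write.csv() stores its row names in a first column with an empty header, which pandas labels "Unnamed: 0". The sketch below uses a few inline rows in that assumed layout (via io.StringIO) instead of the real women.csv. It shows two ways to handle that column: drop it after reading, as above, or pass index_col=0 so it becomes the row index.

```python
import io

import pandas as pd

# A few rows in the layout assumed for an R-written CSV: the first,
# unlabeled column holds R's row names.
csv_text = '''"","height","weight"
"1",58,115
"2",59,117
"3",60,120
'''

# Option 1: read everything, then drop the column pandas named "Unnamed: 0".
df_drop = pd.read_csv(io.StringIO(csv_text))
print(df_drop.columns.tolist())  # ['Unnamed: 0', 'height', 'weight']
df_drop = df_drop.drop(columns=["Unnamed: 0"])

# Option 2: use that first column as the row index while reading.
df_idx = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_drop)
print(df_idx)
```

Either way, only the height and weight columns are left as data columns.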

This is the same data set we looked at when we learned about data frames in R.

Now, let’s look at the first column of women, which is height.

df.iloc[0:15,0]

## 0     58
## 1     59
## 2     60
## 3     61
## 4     62
## 5     63
## 6     64
## 7     65
## 8     66
## 9     67
## 10    68
## 11    69
## 12    70
## 13    71
## 14    72
## Name: height, dtype: int64

Here we use iloc, which selects rows and columns by their integer position. The 0:15 before the comma gives the rows we want: 0 is the first row, and 14 is the last row. A Python slice stops just before its end number, so to include row 14 we have to write one number past it, 15. The 0 after the comma selects the first column, because Python counts positions starting from 0.
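To see the "stop just before the end number" rule in action, here is a small sketch. It rebuilds the women data from the values printed above (a stand-in for reading women.csv) and compares a few slices. Leaving both ends of a slice empty, as in `:`, selects every row.

```python
import pandas as pd

# Rebuild the women data from the values printed above (stand-in for women.csv).
df = pd.DataFrame({
    "height": list(range(58, 73)),
    "weight": [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164],
})

# A slice stops just before its end number, so 0:15 covers rows 0 through 14.
print(list(range(0, 15)))  # prints the numbers 0 through 14

all_heights = df.iloc[0:15, 0]
# An empty slice (:) selects every row, however many there are.
same_heights = df.iloc[:, 0]
# Stopping at 14 leaves out the last row (row 14, height 72).
missing_last = df.iloc[0:14, 0]

print(all_heights.equals(same_heights))  # True
print(len(missing_last))                 # 14
```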

We can also select the height column by its name, as in the code chunk below.

df["height"]

## 0     58
## 1     59
## 2     60
## 3     61
## 4     62
## 5     63
## 6     64
## 7     65
## 8     66
## 9     67
## 10    68
## 11    69
## 12    70
## 13    71
## 14    72
## Name: height, dtype: int64

Notice the Name: height at the bottom. This tells us which column we are looking at.
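Selecting by name and selecting by position give the same column, and the "Name:" line in the printout is the column's label, stored in the result's `.name` attribute. A minimal check, again rebuilding the women data inline from the values above:

```python
import pandas as pd

# Rebuild the women data from the values printed above (stand-in for women.csv).
df = pd.DataFrame({
    "height": list(range(58, 73)),
    "weight": [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164],
})

by_name = df["height"]       # select the column by its label
by_position = df.iloc[:, 0]  # select the column by its position

print(by_name.name)                  # height (the "Name:" line in the printout)
print(by_name.equals(by_position))   # True
```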

Let’s look at the data on the fifth woman. Remember that the first woman’s data is in row 0, so the fifth woman’s data is in row 4. Since we want the first and second columns (positions 0 and 1), we write 0:2 for the columns. Remember that a slice stops just before its end number, so we always write one number past the last position we want.

df.iloc[4, 0:2]  # the fifth woman is in row 4

## height     62
## weight    126
## Name: 4, dtype: int64

Notice that the bottom says Name: 4. This is the row’s label, and row 4 holds the data for the fifth woman.
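The `Name: 4` is the row's label. pandas also has `.loc`, which selects by label instead of position; with the default 0, 1, 2, ... row labels used here, label 4 and position 4 point to the same woman. A short sketch, rebuilding the women data inline as a stand-in for the CSV:

```python
import pandas as pd

# Rebuild the women data from the values printed above (stand-in for women.csv).
df = pd.DataFrame({
    "height": list(range(58, 73)),
    "weight": [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164],
})

row = df.iloc[4, 0:2]   # by position: row 4, columns 0 and 1
print(row.name)         # 4
print(row.tolist())     # [62, 126]

# .loc selects by row label and column names; here label 4 is also position 4.
row_by_label = df.loc[4, ["height", "weight"]]
print(row.equals(row_by_label))  # True
```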

Let’s look at the weight of the seventh woman. Since we want only one row and one column, we can give a single number for each: row 6 (the seventh woman) and column 1 (weight).

df.iloc[6, 1]

## 132
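When you want just one value, pandas also offers `.iat` (one value by position) and `.loc` with a row label and a column name. The sketch below, using the women data rebuilt inline from the printed values, shows that all three return the seventh woman's weight.

```python
import pandas as pd

# Rebuild the women data from the values printed above (stand-in for women.csv).
df = pd.DataFrame({
    "height": list(range(58, 73)),
    "weight": [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164],
})

value = df.iloc[6, 1]            # row position 6, column position 1
fast_value = df.iat[6, 1]        # .iat: a single value by position
labelled = df.loc[6, "weight"]   # .loc: row label 6, column name "weight"

print(value, fast_value, labelled)  # 132 132 132
```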

If we want to look at two women at once, we can put a list of their row numbers in square brackets before the comma. The code chunk below shows the height and weight of the fourth and sixth women (rows 3 and 5).

df.iloc[[3, 5], 0:2]

##    height  weight
## 3      61     123
## 5      63     129
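Lists work for columns too. A sketch with the women data rebuilt inline from the printed values: a list of column positions such as `[1]` keeps the result as a data frame (a single `1` would return just a column), and `.loc` does the same selection with row labels and column names.

```python
import pandas as pd

# Rebuild the women data from the values printed above (stand-in for women.csv).
df = pd.DataFrame({
    "height": list(range(58, 73)),
    "weight": [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164],
})

two_women = df.iloc[[3, 5], 0:2]                 # rows 3 and 5, columns 0 and 1
weights_only = df.iloc[[3, 5], [1]]              # a list of column positions keeps a data frame
by_label = df.loc[[3, 5], ["height", "weight"]]  # same rows and columns, by label

print(two_women.equals(by_label))  # True
print(weights_only)
```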

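If we want to sort the rows by weight, we can use the sort_values method. Look at the code chunk below to see how to do this. The women data are already in order of increasing weight, so the sorted output looks the same as the original.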

df.sort_values("weight")

##     height  weight
## 0       58     115
## 1       59     117
## 2       60     120
## 3       61     123
## 4       62     126
## 5       63     129
## 6       64     132
## 7       65     135
## 8       66     139
## 9       67     142
## 10      68     146
## 11      69     150
## 12      70     154
## 13      71     159
## 14      72     164
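Because the women data are already in increasing order, sorting the other way makes the effect easier to see. This sketch (women data rebuilt inline from the printed values) uses the `ascending=False` parameter of `sort_values`. It also checks that `sort_values` returns a new data frame and leaves `df` itself unchanged.

```python
import pandas as pd

# Rebuild the women data from the values printed above (stand-in for women.csv).
df = pd.DataFrame({
    "height": list(range(58, 73)),
    "weight": [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164],
})

# Sort from heaviest to lightest; the row labels travel with their rows.
heaviest_first = df.sort_values("weight", ascending=False)
print(heaviest_first.head(3))

# sort_values returns a new data frame; the original order is untouched.
print(df.iloc[0, 1])  # 115
```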

Practice

Use the mtcars dataset to practice some of the things we learned.

df1 = pd.read_csv("mtcars.csv")
df1

##              Unnamed: 0   mpg  cyl   disp   hp  ...   qsec  vs  am  gear  carb
## 0             Mazda RX4  21.0    6  160.0  110  ...  16.46   0   1     4     4
## 1         Mazda RX4 Wag  21.0    6  160.0  110  ...  17.02   0   1     4     4
## 2            Datsun 710  22.8    4  108.0   93  ...  18.61   1   1     4     1
## 3        Hornet 4 Drive  21.4    6  258.0  110  ...  19.44   1   0     3     1
## 4     Hornet Sportabout  18.7    8  360.0  175  ...  17.02   0   0     3     2
## 5               Valiant  18.1    6  225.0  105  ...  20.22   1   0     3     1
## 6            Duster 360  14.3    8  360.0  245  ...  15.84   0   0     3     4
## 7             Merc 240D  24.4    4  146.7   62  ...  20.00   1   0     4     2
## 8              Merc 230  22.8    4  140.8   95  ...  22.90   1   0     4     2
## 9              Merc 280  19.2    6  167.6  123  ...  18.30   1   0     4     4
## 10            Merc 280C  17.8    6  167.6  123  ...  18.90   1   0     4     4
## 11           Merc 450SE  16.4    8  275.8  180  ...  17.40   0   0     3     3
## 12           Merc 450SL  17.3    8  275.8  180  ...  17.60   0   0     3     3
## 13          Merc 450SLC  15.2    8  275.8  180  ...  18.00   0   0     3     3
## 14   Cadillac Fleetwood  10.4    8  472.0  205  ...  17.98   0   0     3     4
## 15  Lincoln Continental  10.4    8  460.0  215  ...  17.82   0   0     3     4
## 16    Chrysler Imperial  14.7    8  440.0  230  ...  17.42   0   0     3     4
## 17             Fiat 128  32.4    4   78.7   66  ...  19.47   1   1     4     1
## 18          Honda Civic  30.4    4   75.7   52  ...  18.52   1   1     4     2
## 19       Toyota Corolla  33.9    4   71.1   65  ...  19.90   1   1     4     1
## 20        Toyota Corona  21.5    4  120.1   97  ...  20.01   1   0     3     1
## 21     Dodge Challenger  15.5    8  318.0  150  ...  16.87   0   0     3     2
## 22          AMC Javelin  15.2    8  304.0  150  ...  17.30   0   0     3     2
## 23           Camaro Z28  13.3    8  350.0  245  ...  15.41   0   0     3     4
## 24     Pontiac Firebird  19.2    8  400.0  175  ...  17.05   0   0     3     2
## 25            Fiat X1-9  27.3    4   79.0   66  ...  18.90   1   1     4     1
## 26        Porsche 914-2  26.0    4  120.3   91  ...  16.70   0   1     5     2
## 27         Lotus Europa  30.4    4   95.1  113  ...  16.90   1   1     5     2
## 28       Ford Pantera L  15.8    8  351.0  264  ...  14.50   0   1     5     4
## 29         Ferrari Dino  19.7    6  145.0  175  ...  15.50   0   1     5     6
## 30        Maserati Bora  15.0    8  301.0  335  ...  14.60   0   1     5     8
## 31           Volvo 142E  21.4    4  121.0  109  ...  18.60   1   1     4     2
## 
## [32 rows x 12 columns]

To check your answers to these problems, compare them with the values in the data frame printed above.
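The hints below refer to rows and columns by position. If you are unsure of a column's position, `df1.columns.get_loc` returns it, and `df1.shape` gives the number of rows and columns. The sketch uses a small stand-in for mtcars.csv: the first three cars and first five columns, copied from the printout above.

```python
import pandas as pd

# Small stand-in for mtcars.csv: first three cars, first five columns,
# copied from the printout above.
df1 = pd.DataFrame({
    "Unnamed: 0": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710"],
    "mpg": [21.0, 21.0, 22.8],
    "cyl": [6, 6, 4],
    "disp": [160.0, 160.0, 108.0],
    "hp": [110, 110, 93],
})

print(df1.shape)                   # (3, 5): rows, columns
print(df1.columns.get_loc("cyl"))  # 2, the position to use with iloc
```

On the full mtcars data, `df1.shape` should match the "[32 rows x 12 columns]" line in the printout.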

1.) Look at the mpg column of mtcars (Hint: There are two ways to do this. Look back at how we selected the height column, first with iloc and then by name).

2.) Look at all of the data on the Merc 230 (Hint: The Merc 230 is in the 9th row, which is row 8 because counting starts at 0. Also note that there are 12 columns in this data set. Remember to use iloc. Look back at how we viewed the data on the fifth woman for help).

3.) Look at the number of cylinders (cyl) for the Datsun 710 (Hint: cyl is the third column and the Datsun 710 is in the 3rd row. Remember to use iloc. Look back at how we found the weight of the seventh woman for help).

4.) Sort this data set by mpg (Hint: use sort_values. Look back at how we sorted the women by weight for help).

Alex Lewis

1/9/2021