import pandas as pd
import numpy as np
Let’s look at the women data set from R.
df = pd.read_csv("women.csv")
df = df.drop(columns=["Unnamed: 0"])  # we drop this column because we do not need it
df
## height weight
## 0 58 115
## 1 59 117
## 2 60 120
## 3 61 123
## 4 62 126
## 5 63 129
## 6 64 132
## 7 65 135
## 8 66 139
## 9 67 142
## 10 68 146
## 11 69 150
## 12 70 154
## 13 71 159
## 14 72 164
This is the same data set we looked at when we learned about data frames in R.
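As an alternative to dropping the extra column after loading, read_csv can use that column as the row index while it reads the file. The sketch below is only an illustration: csv_text is a made-up stand-in for the first three rows of women.csv, and it assumes the file stores R's row names (1, 2, 3, ...) in an unnamed first column.

```python
import io
import pandas as pd

# Hypothetical stand-in for women.csv: the first three rows, with an
# unnamed first column holding R's row names (an assumption about the file).
csv_text = ",height,weight\n1,58,115\n2,59,117\n3,60,120\n"

# Option 1 (used in this tutorial): read everything, then drop the extra column.
dropped = pd.read_csv(io.StringIO(csv_text)).drop(columns=["Unnamed: 0"])

# Option 2: use the first column as the row index while reading.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(dropped.columns), list(dropped.index))
print(list(indexed.columns), list(indexed.index))
```

Both versions end up with only the height and weight columns. The difference is the row labels: the first version is labeled from 0, while the second keeps the labels stored in the file. iloc counts positions from 0 either way.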
Now, let’s see how to look at the first column in women, which is height.
df.iloc[0:15, 0]
## 0 58
## 1 59
## 2 60
## 3 61
## 4 62
## 5 63
## 6 64
## 7 65
## 8 66
## 9 67
## 10 68
## 11 69
## 12 70
## 13 71
## 14 72
## Name: height, dtype: int64
We use iloc to do this. Note that iloc is an indexer rather than a function, which is why it is followed by square brackets instead of parentheses. Also, notice how there is a 0:15 before the comma. This indicates the rows we want to see. 0 is the first row, and 14 is the last row. In Python, a range written with : stops just before its end number, so you need to write one number above the last row you want to make sure all rows are included. We also wanted the first column, which is why there is a 0 after the comma.
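To see how the end number of a row range behaves, here is a small sketch that rebuilds the women data from the table printed earlier, so it runs without the CSV file. The names women, first_col, same_col, and first_three are made up for this example.

```python
import pandas as pd

# Rebuild the women data from the table printed earlier.
women = pd.DataFrame({
    "height": list(range(58, 73)),
    "weight": [115, 117, 120, 123, 126, 129, 132, 135,
               139, 142, 146, 150, 154, 159, 164],
})

first_col = women.iloc[0:15, 0]   # rows 0 through 14, column 0
same_col = women.iloc[:, 0]       # a bare ":" means "every row"
first_three = women.iloc[0:3, 0]  # rows 0, 1, and 2 (row 3 is left out)

print(first_col.equals(same_col))
print(list(first_three))
```

Writing : with no numbers is a handy way to ask for every row without needing to know how many rows there are.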
We also can find the height column by writing the code chunk below.
df["height"]
## 0 58
## 1 59
## 2 60
## 3 61
## 4 62
## 5 63
## 6 64
## 7 65
## 8 66
## 9 67
## 10 68
## 11 69
## 12 70
## 13 71
## 14 72
## Name: height, dtype: int64
Notice the Name: height at the bottom. This tells us which column we are looking at.
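Selecting a column by name and selecting it by position give the same result. The sketch below checks this on the first three rows of women, typed in by hand so the example runs on its own. The variable names are made up for this example.

```python
import pandas as pd

# First three rows of the women data, enough to show the idea.
women = pd.DataFrame({"height": [58, 59, 60], "weight": [115, 117, 120]})

by_name = women["height"]        # select the column by its name
by_position = women.iloc[:, 0]   # select the column by its position

print(by_name.equals(by_position))  # True when both hold the same values
print(by_name.name)                 # the label shown after "Name:"

# Double brackets return a one-column data frame instead of a single column.
as_frame = women[["height"]]
print(type(as_frame).__name__)
```

A single column like this is called a Series in pandas, which is why its printout ends with a Name and a dtype instead of showing a column header.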
Let’s look at the data on the fifth woman. Remember that the first woman’s data is in row 0, so the fifth woman’s data is in row 4. Since we want to look at the first and second columns, which are in positions 0 and 1, we write 0:2 for the columns. Remember that the number after the : is not included, so we always write one number past the last position we want.
df.iloc[4, 0:2]  # the fifth woman is in row 4
## height 62
## weight 126
## Name: 4, dtype: int64
Notice that the bottom says Name: 4. This is the label of the row we selected, so the data shown is for the fifth woman.
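When we select a single row, pandas hands it back with the column names as labels, so we can pull out one value by name. Here is a minimal sketch using the first five rows of women, typed in by hand. The variable name row is made up for this example.

```python
import pandas as pd

# First five rows of the women data, so that row 4 exists.
women = pd.DataFrame({
    "height": [58, 59, 60, 61, 62],
    "weight": [115, 117, 120, 123, 126],
})

row = women.iloc[4, 0:2]  # the fifth woman: row 4, columns 0 and 1

print(row.name)       # the row label shown after "Name:"
print(row["height"])  # look up one value by its column name
print(row["weight"])
```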
Let’s look at the weight of the seventh woman. Since we only want one row and one column, we can just give the row number and the column number.
df.iloc[6, 1]
## 132
If we want to look at two women at once, we can put the row numbers in square brackets, as a list, before the comma. The code chunk below will show us the height and weight for the fourth and sixth women.
df.iloc[[3, 5], 0:2]
## height weight
## 3 61 123
## 5 63 129
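A list of row numbers and a range of row numbers behave differently, which is easy to mix up. The sketch below compares the two on the first seven rows of women, typed in by hand. The names picked and sliced are made up for this example.

```python
import pandas as pd

# First seven rows of the women data.
women = pd.DataFrame({
    "height": [58, 59, 60, 61, 62, 63, 64],
    "weight": [115, 117, 120, 123, 126, 129, 132],
})

picked = women.iloc[[3, 5], 0:2]  # a list: only rows 3 and 5
sliced = women.iloc[3:6, 0:2]     # a range: rows 3, 4, and 5

print(list(picked.index))
print(list(sliced.index))
```

The list picks out exactly the rows named in it, while the range also includes every row in between.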
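If we want to sort the rows by the values in weight, we can use the method sort_values. Look at the code chunk below to see how to do this. The weights in women are already in increasing order, so the output looks the same as the original data frame.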
df.sort_values("weight")
## height weight
## 0 58 115
## 1 59 117
## 2 60 120
## 3 61 123
## 4 62 126
## 5 63 129
## 6 64 132
## 7 65 135
## 8 66 139
## 9 67 142
## 10 68 146
## 11 69 150
## 12 70 154
## 13 71 159
## 14 72 164
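Because women is already ordered by weight, sorting it the usual way does not visibly change anything. The sketch below sorts from heaviest to lightest instead, using the ascending parameter of sort_values, and rebuilds the data from the table printed earlier so it runs on its own. The name heaviest_first is made up for this example.

```python
import pandas as pd

# Rebuild the women data from the table printed earlier.
women = pd.DataFrame({
    "height": list(range(58, 73)),
    "weight": [115, 117, 120, 123, 126, 129, 132, 135,
               139, 142, 146, 150, 154, 159, 164],
})

# ascending=False puts the largest weights first.
heaviest_first = women.sort_values("weight", ascending=False)

print(heaviest_first.head(3))  # the three heaviest women
print(women.head(3))           # the original data frame is unchanged
```

sort_values returns a new, sorted data frame and leaves the original alone, so we store the result in a new name if we want to keep it.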
Use the mtcars dataset to practice some of the things we learned.
df1 = pd.read_csv("mtcars.csv")
df1
## Unnamed: 0 mpg cyl disp hp ... qsec vs am gear carb
## 0 Mazda RX4 21.0 6 160.0 110 ... 16.46 0 1 4 4
## 1 Mazda RX4 Wag 21.0 6 160.0 110 ... 17.02 0 1 4 4
## 2 Datsun 710 22.8 4 108.0 93 ... 18.61 1 1 4 1
## 3 Hornet 4 Drive 21.4 6 258.0 110 ... 19.44 1 0 3 1
## 4 Hornet Sportabout 18.7 8 360.0 175 ... 17.02 0 0 3 2
## 5 Valiant 18.1 6 225.0 105 ... 20.22 1 0 3 1
## 6 Duster 360 14.3 8 360.0 245 ... 15.84 0 0 3 4
## 7 Merc 240D 24.4 4 146.7 62 ... 20.00 1 0 4 2
## 8 Merc 230 22.8 4 140.8 95 ... 22.90 1 0 4 2
## 9 Merc 280 19.2 6 167.6 123 ... 18.30 1 0 4 4
## 10 Merc 280C 17.8 6 167.6 123 ... 18.90 1 0 4 4
## 11 Merc 450SE 16.4 8 275.8 180 ... 17.40 0 0 3 3
## 12 Merc 450SL 17.3 8 275.8 180 ... 17.60 0 0 3 3
## 13 Merc 450SLC 15.2 8 275.8 180 ... 18.00 0 0 3 3
## 14 Cadillac Fleetwood 10.4 8 472.0 205 ... 17.98 0 0 3 4
## 15 Lincoln Continental 10.4 8 460.0 215 ... 17.82 0 0 3 4
## 16 Chrysler Imperial 14.7 8 440.0 230 ... 17.42 0 0 3 4
## 17 Fiat 128 32.4 4 78.7 66 ... 19.47 1 1 4 1
## 18 Honda Civic 30.4 4 75.7 52 ... 18.52 1 1 4 2
## 19 Toyota Corolla 33.9 4 71.1 65 ... 19.90 1 1 4 1
## 20 Toyota Corona 21.5 4 120.1 97 ... 20.01 1 0 3 1
## 21 Dodge Challenger 15.5 8 318.0 150 ... 16.87 0 0 3 2
## 22 AMC Javelin 15.2 8 304.0 150 ... 17.30 0 0 3 2
## 23 Camaro Z28 13.3 8 350.0 245 ... 15.41 0 0 3 4
## 24 Pontiac Firebird 19.2 8 400.0 175 ... 17.05 0 0 3 2
## 25 Fiat X1-9 27.3 4 79.0 66 ... 18.90 1 1 4 1
## 26 Porsche 914-2 26.0 4 120.3 91 ... 16.70 0 1 5 2
## 27 Lotus Europa 30.4 4 95.1 113 ... 16.90 1 1 5 2
## 28 Ford Pantera L 15.8 8 351.0 264 ... 14.50 0 1 5 4
## 29 Ferrari Dino 19.7 6 145.0 175 ... 15.50 0 1 5 6
## 30 Maserati Bora 15.0 8 301.0 335 ... 14.60 0 1 5 8
## 31 Volvo 142E 21.4 4 121.0 109 ... 18.60 1 1 4 2
##
## [32 rows x 12 columns]
To double check your answers to these problems, look at the data frame to see if you get the correct number.
1.) Look at the mpg column of mtcars (Hint: There are two ways to do this. Look back at the two ways we looked at the height column of women).
2.) Look at all of the data on the Merc 230 (Hint: The Merc 230 is in the 9th row, and Python starts counting rows at 0. Also note that there are 12 columns in this data set. Remember to use iloc. Look back at how we found the data on the fifth woman for help).
3.) Look at the number of cylinders (cyl) for the Datsun 710 (Hint: cyl is the third column and the Datsun 710 is in the 3rd row, and Python starts counting both at 0. Remember to use iloc. Look back at how we found the weight of the seventh woman for help).
4.) Sort this data set by mpg (Hint: use sort_values. Look back at how we sorted women by weight for help).