Harold Nelson
11/4/2020
Make numpy available as np. Make pandas available as pd.
Get the Data.
Using pd.read_csv, create the dataframe OAW2 from the csv file. Set index_col to 0 so that the first column of the file becomes the row index.
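As a minimal sketch of this step, the block below reads a two-line stand-in for the csv file from memory. The layout of the real file (an unnamed first column holding the row numbers) is an assumption; the two data rows are copied from the head() output shown further down.

```python
import io
import pandas as pd

# Hypothetical stand-in for the csv file. The assumed layout is an unnamed
# first column with the original row numbers, followed by the eight variables.
csv_text = """,PRCP,TMAX,TMIN,yr,mo,dy,warmth,wetness
176,0.881890,59.00,48.02,1941,11,4,Cold,Really Wet
541,0.051181,50.00,33.98,1942,11,4,Cold,Damp
"""

# index_col=0 uses the first column of the file as the row index
# instead of creating a fresh 0, 1, 2, ... index.
OAW2 = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(OAW2.index.tolist())
print(OAW2.columns.tolist())
```

With the real data, the io.StringIO object would be replaced by the path to the csv file.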
Create a boolean variable, Nov4_bool. It should hold one value per row of the dataframe: True where the value of mo is 11 and the value of dy is 4, and False otherwise.
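A row-by-row test like this can be sketched on a toy frame. The four rows below are hypothetical and only carry the two columns needed for the test. Both np.logical_and and the & operator combine the two comparisons element by element.

```python
import numpy as np
import pandas as pd

# Toy stand-in for OAW2 (hypothetical rows, not the real data)
toy = pd.DataFrame({"mo": [11, 11, 12, 11], "dy": [4, 5, 4, 4]})

# True only where both conditions hold in the same row
mask = np.logical_and(toy["mo"] == 11, toy["dy"] == 4)

# Equivalent form with the & operator; the parentheses are required
# because & binds more tightly than ==
mask_alt = (toy["mo"] == 11) & (toy["dy"] == 4)

print(mask.tolist())
print(mask_alt.tolist())
```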
Use the boolean variable you just created to make a new dataframe, Nov4_df, by selecting the matching rows from OAW2. Use the method info() to verify your success. Is the number of rows correct? Do you still have all of the variables from the original dataframe?
## <class 'pandas.core.frame.DataFrame'>
## Int64Index: 79 entries, 176 to 28651
## Data columns (total 8 columns):
## # Column Non-Null Count Dtype
## --- ------ -------------- -----
## 0 PRCP 79 non-null float64
## 1 TMAX 79 non-null float64
## 2 TMIN 79 non-null float64
## 3 yr 79 non-null int64
## 4 mo 79 non-null int64
## 5 dy 79 non-null int64
## 6 warmth 79 non-null object
## 7 wetness 79 non-null object
## dtypes: float64(3), int64(3), object(2)
## memory usage: 5.6+ KB
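The selection and the two checks can be sketched on a toy frame. The rows and the name toy are hypothetical. The row-count check rests on the assumption that every year in the range contributes exactly one matching day. For the real data that expectation is 2019 - 1941 + 1 = 79, which matches the 79 entries reported above.

```python
import pandas as pd

# Toy stand-in for OAW2 (hypothetical rows): two years, a few days each
toy = pd.DataFrame(
    {
        "TMAX": [59.0, 61.0, 50.0, 48.0],
        "yr": [1941, 1941, 1942, 1942],
        "mo": [11, 11, 11, 12],
        "dy": [4, 5, 4, 4],
    },
    index=[176, 177, 541, 571],
)

# Indexing with a boolean Series keeps the rows where the mask is True
nov4 = toy[(toy["mo"] == 11) & (toy["dy"] == 4)]
nov4.info()

# One matching day per year is expected (assumes no missing years)
expected_rows = toy["yr"].max() - toy["yr"].min() + 1
print(len(nov4) == expected_rows)

# Row selection should not drop any columns
print(list(nov4.columns) == list(toy.columns))
```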
Use the methods head() and tail() to see the beginning and end of the dataframe.
## Beginning of Nov4_df
## PRCP TMAX TMIN yr mo dy warmth wetness
## 176 0.881890 59.00 48.02 1941 11 4 Cold Really Wet
## 541 0.051181 50.00 33.98 1942 11 4 Cold Damp
## 906 0.279528 59.00 42.98 1943 11 4 Cold Really Wet
## 1272 0.031496 60.08 46.04 1944 11 4 Cold Damp
## 1637 0.051181 53.96 42.08 1945 11 4 Cold Damp
## End
## PRCP TMAX TMIN yr mo dy warmth wetness
## 27190 0.031496 50.00 33.08 2015 11 4 Cold Damp
## 27556 0.011811 62.06 35.96 2016 11 4 Warm Damp
## 27921 0.381890 42.08 30.20 2017 11 4 Cold Really Wet
## 28286 0.559055 59.00 48.92 2018 11 4 Cold Really Wet
## 28651 0.000000 55.94 32.00 2019 11 4 Cold Dry
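As a small sketch of the two methods, head() and tail() both return five rows by default and accept a different row count as an argument. The frame below is a hypothetical ten-row stand-in.

```python
import pandas as pd

# Hypothetical stand-in with one row per year; mo and dy are constant
toy = pd.DataFrame({"yr": list(range(1941, 1951)), "mo": 11, "dy": 4})

first = toy.head()    # first 5 rows by default
last = toy.tail(3)    # last 3 rows

print(first)
print(last)
```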
Repeat the steps above to create Dec4_df.
Dec4_bool = np.logical_and(OAW2["mo"] == 12, OAW2["dy"] == 4)
Dec4_df = OAW2[Dec4_bool]
Dec4_df.info()
## <class 'pandas.core.frame.DataFrame'>
## Int64Index: 79 entries, 206 to 28681
## Data columns (total 8 columns):
## # Column Non-Null Count Dtype
## --- ------ -------------- -----
## 0 PRCP 79 non-null float64
## 1 TMAX 79 non-null float64
## 2 TMIN 79 non-null float64
## 3 yr 79 non-null int64
## 4 mo 79 non-null int64
## 5 dy 79 non-null int64
## 6 warmth 79 non-null object
## 7 wetness 79 non-null object
## dtypes: float64(3), int64(3), object(2)
## memory usage: 5.6+ KB
Use the method describe() on both dataframes. How do the values of PRCP, TMAX, and TMIN differ? Use the mean and median values to make your comparisons. Can we say that December 4 tends to be colder and wetter than November 4 in Olympia?
## November 4
## PRCP TMAX TMIN yr mo dy
## count 79.000000 79.000000 79.000000 79.000000 79.0 79.0
## mean 0.207715 54.971646 36.711899 1980.000000 11.0 4.0
## std 0.328971 5.892136 8.172382 22.949219 0.0 0.0
## min 0.000000 39.920000 21.920000 1941.000000 11.0 4.0
## 25% 0.000000 51.080000 30.020000 1960.500000 11.0 4.0
## 50% 0.098425 55.040000 35.960000 1980.000000 11.0 4.0
## 75% 0.295276 57.920000 42.980000 1999.500000 11.0 4.0
## max 1.940945 73.940000 53.960000 2019.000000 11.0 4.0
## December 4
## PRCP TMAX TMIN yr mo dy
## count 79.000000 79.000000 79.000000 79.000000 79.0 79.0
## mean 0.214841 46.076456 32.669873 1980.000000 12.0 4.0
## std 0.393739 5.430252 7.131187 22.949219 0.0 0.0
## min 0.000000 33.080000 6.080000 1941.000000 12.0 4.0
## 25% 0.000000 42.080000 28.940000 1960.500000 12.0 4.0
## 50% 0.059055 46.040000 33.080000 1980.000000 12.0 4.0
## 75% 0.220472 50.000000 35.960000 1999.500000 12.0 4.0
## max 2.271654 60.080000 46.040000 2019.000000 12.0 4.0
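One way to line up the comparison is to subtract the November summary from the December summary. The sketch below types in the mean and median (50%) rows from the describe() output above, so it runs without the data file. The names nov, dec, and diff are illustrative.

```python
import pandas as pd

# Mean and median (50%) rows copied from the describe() output above
nov = pd.DataFrame(
    {"PRCP": [0.207715, 0.098425], "TMAX": [54.971646, 55.04], "TMIN": [36.711899, 35.96]},
    index=["mean", "50%"],
)
dec = pd.DataFrame(
    {"PRCP": [0.214841, 0.059055], "TMAX": [46.076456, 46.04], "TMIN": [32.669873, 33.08]},
    index=["mean", "50%"],
)

# December minus November: negative values mean December is lower
diff = (dec - nov).round(3)
print(diff)
```

With the real dataframes, the same rows could be pulled out with Dec4_df.describe().loc[["mean", "50%"]] and the matching expression for Nov4_df. Note that the temperature differences are negative on both measures, while the PRCP difference changes sign between the mean and the median, so the two measures need to be read separately before answering the question.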
Note that the describe() method gave us information about the numeric variables, but it says nothing about the two categorical variables warmth and wetness. These variables have character string values. There is a method value_counts() we can use to see how the categorical variables differ between these two days.
Use the value_counts() method to see the contents of Nov4_df["warmth"] and Dec4_df["warmth"]. Describe the results.
## November
## Cold 72
## Warm 7
## Name: warmth, dtype: int64
## December
## Cold 79
## Name: warmth, dtype: int64
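A minimal sketch of value_counts() on a hypothetical four-value column follows. The method counts how often each distinct value occurs and sorts the result from most to least frequent. The normalize=True option, which returns proportions instead of counts, is an optional extra that the exercise does not ask for.

```python
import pandas as pd

# Hypothetical categorical column (not the real data)
warmth = pd.Series(["Cold", "Cold", "Warm", "Cold"], name="warmth")

counts = warmth.value_counts()                 # occurrences of each value
shares = warmth.value_counts(normalize=True)   # same, as proportions

print(counts)
print(shares)
```

A value that never occurs does not appear in the result at all, which is why the December output above lists only Cold.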
Repeat the last exercise for the variable wetness.
## November
## Really Wet 35
## Dry 29
## Damp 15
## Name: wetness, dtype: int64
## December
## Dry 28
## Really Wet 28
## Damp 23
## Name: wetness, dtype: int64
The categorical variable warmth has two values defined on the basis of the quantitative variable TMAX. We’d like to see the maximum value of TMAX for days with warmth = “Cold”. To see this, do the following.
Using OAW2, create a new dataframe Cold_days by selecting those days where the value of warmth is “Cold”.
Use the method describe() to see the maximum value of TMAX in the new dataframe.
## PRCP TMAX ... mo dy
## count 15233.000000 15233.000000 ... 15233.000000 15233.000000
## mean 0.221215 49.780864 ... 6.069126 15.699337
## std 0.368447 6.777733 ... 4.340668 8.771451
## min 0.000000 17.960000 ... 1.000000 1.000000
## 25% 0.000000 46.040000 ... 2.000000 8.000000
## 50% 0.059055 50.000000 ... 4.000000 16.000000
## 75% 0.299213 55.040000 ... 11.000000 23.000000
## max 4.818898 60.080000 ... 12.000000 31.000000
##
## [8 rows x 6 columns]
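The same two steps can be sketched on a toy frame whose four rows reuse TMAX and warmth values from the head() and tail() output earlier in this document. Calling max() on the TMAX column directly is an alternative to reading the value out of the describe() table, which may be printed with some columns elided.

```python
import pandas as pd

# Four rows with TMAX and warmth values taken from the output shown earlier
toy = pd.DataFrame(
    {
        "TMAX": [59.0, 62.06, 60.08, 42.08],
        "warmth": ["Cold", "Warm", "Cold", "Cold"],
    }
)

# Keep only the rows labeled Cold
cold = toy[toy["warmth"] == "Cold"]

# Largest TMAX among the Cold rows
cold_max = cold["TMAX"].max()
print(cold_max)
```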