Pandas Notes 2

Harold Nelson

11/4/2020

Task 1

Make numpy available as np. Make pandas available as pd.

Answer

import numpy as np
import pandas as pd

Task 2

Get the Data.

Using pd.read_csv, create the dataframe OAW2 from the csv file OAW2.csv. Use the first column of the file (column 0) as the row index.

Answer

OAW2 = pd.read_csv("OAW2.csv",index_col = 0)
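
The effect of index_col = 0 can be checked on a small example. The sketch below is illustrative only: the CSV text is a hypothetical stand-in for OAW2.csv, and the name demo is not part of the exercise.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for OAW2.csv.
# The first (unnamed) column holds the row labels.
csv_text = """,PRCP,TMAX,mo,dy
175,0.00,57.02,11,3
176,0.88,59.00,11,4
"""

# index_col=0 tells read_csv to use column 0 as the row index
# instead of treating it as an ordinary data column.
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(demo.index.tolist())    # row labels taken from the first column
print(demo.columns.tolist())  # remaining columns are the data
```

Without index_col = 0, pandas would keep the first column as data (named "Unnamed: 0") and generate a fresh 0, 1, 2, ... index.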

Task 3

Create a boolean variable, Nov4_bool. This variable is true if the value of mo is 11 and the value of dy is 4 in the dataframe.

Answer

Nov4_bool = np.logical_and(OAW2["mo"] == 11, OAW2["dy"] == 4)
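
An equivalent way to combine two conditions is the & operator. The sketch below compares the two forms on a small hypothetical dataframe (demo is an assumption, not the real OAW2).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for OAW2 with only the two date columns.
demo = pd.DataFrame({"mo": [11, 11, 12, 11], "dy": [4, 5, 4, 4]})

# The form used in the answer above.
with_numpy = np.logical_and(demo["mo"] == 11, demo["dy"] == 4)

# The operator form. The parentheses are required because &
# binds more tightly than ==.
with_operator = (demo["mo"] == 11) & (demo["dy"] == 4)

print(with_operator.tolist())
print(with_numpy.tolist() == with_operator.tolist())
```

Both forms produce one boolean per row, so either can be used to select rows.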

Task 4

Use the boolean variable you just created to make a new dataframe, Nov4_df, by selecting the matching days from OAW2. Use the method info() to verify your success. Is the number of rows correct? Do you still have all of the variables from the original dataframe?

Answer

Nov4_df = OAW2[Nov4_bool]
Nov4_df.info()
## <class 'pandas.core.frame.DataFrame'>
## Int64Index: 79 entries, 176 to 28651
## Data columns (total 8 columns):
##  #   Column   Non-Null Count  Dtype  
## ---  ------   --------------  -----  
##  0   PRCP     79 non-null     float64
##  1   TMAX     79 non-null     float64
##  2   TMIN     79 non-null     float64
##  3   yr       79 non-null     int64  
##  4   mo       79 non-null     int64  
##  5   dy       79 non-null     int64  
##  6   warmth   79 non-null     object 
##  7   wetness  79 non-null     object 
## dtypes: float64(3), int64(3), object(2)
## memory usage: 5.6+ KB
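
The 79 rows match the span of years in the data: 1941 through 2019 is 2019 - 1941 + 1 = 79 years, with one November 4 in each. The same two checks can also be done in code. The sketch below uses a small hypothetical dataframe (demo, mask, and subset are illustrative names).

```python
import pandas as pd

# Hypothetical stand-in for OAW2.
demo = pd.DataFrame({
    "mo": [11, 11, 12],
    "dy": [4, 5, 4],
    "TMAX": [59.0, 50.0, 46.04],
})

mask = (demo["mo"] == 11) & (demo["dy"] == 4)
subset = demo[mask]

# True counts as 1, so the sum of the mask is the number of selected rows.
rows_ok = len(subset) == mask.sum()

# Row selection with a boolean mask keeps every column.
cols_ok = list(subset.columns) == list(demo.columns)

print(rows_ok, cols_ok)
```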

Task 5

Use the methods head() and tail() to see the beginning and end of the dataframe.

Answer

print("Beginning of Nov4_df")
## Beginning of Nov4_df
print()
Nov4_df.head()
##           PRCP   TMAX   TMIN    yr  mo  dy warmth     wetness
## 176   0.881890  59.00  48.02  1941  11   4   Cold  Really Wet
## 541   0.051181  50.00  33.98  1942  11   4   Cold        Damp
## 906   0.279528  59.00  42.98  1943  11   4   Cold  Really Wet
## 1272  0.031496  60.08  46.04  1944  11   4   Cold        Damp
## 1637  0.051181  53.96  42.08  1945  11   4   Cold        Damp
print()
print("End")
## End
Nov4_df.tail()
##            PRCP   TMAX   TMIN    yr  mo  dy warmth     wetness
## 27190  0.031496  50.00  33.08  2015  11   4   Cold        Damp
## 27556  0.011811  62.06  35.96  2016  11   4   Warm        Damp
## 27921  0.381890  42.08  30.20  2017  11   4   Cold  Really Wet
## 28286  0.559055  59.00  48.92  2018  11   4   Cold  Really Wet
## 28651  0.000000  55.94  32.00  2019  11   4   Cold         Dry

Task 6

Repeat the steps above to create Dec4_df.

Answer

Dec4_bool = np.logical_and(OAW2["mo"] == 12, OAW2["dy"] == 4)
Dec4_df = OAW2[Dec4_bool]
Dec4_df.info()
## <class 'pandas.core.frame.DataFrame'>
## Int64Index: 79 entries, 206 to 28681
## Data columns (total 8 columns):
##  #   Column   Non-Null Count  Dtype  
## ---  ------   --------------  -----  
##  0   PRCP     79 non-null     float64
##  1   TMAX     79 non-null     float64
##  2   TMIN     79 non-null     float64
##  3   yr       79 non-null     int64  
##  4   mo       79 non-null     int64  
##  5   dy       79 non-null     int64  
##  6   warmth   79 non-null     object 
##  7   wetness  79 non-null     object 
## dtypes: float64(3), int64(3), object(2)
## memory usage: 5.6+ KB

Task 7

Use the method describe() on both dataframes. How do the values of PRCP, TMAX, and TMIN differ? Use the mean and median values to make your comparisons. Can we say that December 4 tends to be colder and wetter than November 4 in Olympia?

Answer

print("November 4")
## November 4
print(Nov4_df.describe())
##             PRCP       TMAX       TMIN           yr    mo    dy
## count  79.000000  79.000000  79.000000    79.000000  79.0  79.0
## mean    0.207715  54.971646  36.711899  1980.000000  11.0   4.0
## std     0.328971   5.892136   8.172382    22.949219   0.0   0.0
## min     0.000000  39.920000  21.920000  1941.000000  11.0   4.0
## 25%     0.000000  51.080000  30.020000  1960.500000  11.0   4.0
## 50%     0.098425  55.040000  35.960000  1980.000000  11.0   4.0
## 75%     0.295276  57.920000  42.980000  1999.500000  11.0   4.0
## max     1.940945  73.940000  53.960000  2019.000000  11.0   4.0
print()
print("December 4")
## December 4
print(Dec4_df.describe())
##             PRCP       TMAX       TMIN           yr    mo    dy
## count  79.000000  79.000000  79.000000    79.000000  79.0  79.0
## mean    0.214841  46.076456  32.669873  1980.000000  12.0   4.0
## std     0.393739   5.430252   7.131187    22.949219   0.0   0.0
## min     0.000000  33.080000   6.080000  1941.000000  12.0   4.0
## 25%     0.000000  42.080000  28.940000  1960.500000  12.0   4.0
## 50%     0.059055  46.040000  33.080000  1980.000000  12.0   4.0
## 75%     0.220472  50.000000  35.960000  1999.500000  12.0   4.0
## max     2.271654  60.080000  46.040000  2019.000000  12.0   4.0
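
One way to make the comparison explicit is to subtract the November values from the December values. The sketch below copies the means and medians from the describe() output above into two small dataframes (nov, dec, and diff are illustrative names).

```python
import pandas as pd

# Means and medians copied from the describe() output above.
nov = pd.DataFrame(
    {"PRCP": [0.207715, 0.098425],
     "TMAX": [54.971646, 55.04],
     "TMIN": [36.711899, 35.96]},
    index=["mean", "median"],
)
dec = pd.DataFrame(
    {"PRCP": [0.214841, 0.059055],
     "TMAX": [46.076456, 46.04],
     "TMIN": [32.669873, 33.08]},
    index=["mean", "median"],
)

# Negative values mean December 4 is lower than November 4.
diff = dec - nov
print(diff.round(3))
```

TMAX and TMIN are lower in December by both measures, so December 4 does tend to be colder. PRCP is less clear: the December mean is slightly higher, but the December median is lower, so the data do not clearly show that December 4 is wetter.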

Task 8

Note that the describe() method summarized only the numeric variables. It said nothing about the two categorical variables, warmth and wetness, whose values are character strings. The method value_counts() lets us see how the categorical variables differ between these two days.

Use the value_counts() method to see the contents of Nov4_df["warmth"] and Dec4_df["warmth"]. Describe the results.

Answer

print("November")
## November
print(Nov4_df["warmth"].value_counts())
## Cold    72
## Warm     7
## Name: warmth, dtype: int64
print()
print("December")
## December
print(Dec4_df["warmth"].value_counts())
## Cold    79
## Name: warmth, dtype: int64
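
The two sets of counts can also be placed side by side. The sketch below rebuilds the warmth columns from the counts printed above (72 Cold and 7 Warm for November, 79 Cold for December); the names nov_warmth, dec_warmth, and table are illustrative.

```python
import pandas as pd

# Series rebuilt from the counts shown above, standing in for
# Nov4_df["warmth"] and Dec4_df["warmth"].
nov_warmth = pd.Series(["Cold"] * 72 + ["Warm"] * 7)
dec_warmth = pd.Series(["Cold"] * 79)

# Building a dataframe from the two results aligns them by category.
# A category missing from one day becomes NaN, which we replace with 0.
table = pd.DataFrame({
    "Nov 4": nov_warmth.value_counts(),
    "Dec 4": dec_warmth.value_counts(),
}).fillna(0).astype(int)

print(table)
```

The table makes it plain that December 4 has no Warm days at all, while November 4 has a few.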

Task 9

Repeat the last exercise for the variable wetness.

Answer

print("November")
## November
print(Nov4_df["wetness"].value_counts())
## Really Wet    35
## Dry           29
## Damp          15
## Name: wetness, dtype: int64
print()
print("December")
## December
print(Dec4_df["wetness"].value_counts())
## Dry           28
## Really Wet    28
## Damp          23
## Name: wetness, dtype: int64
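
Proportions are sometimes easier to compare than raw counts. The sketch below uses value_counts(normalize=True) on Series rebuilt from the counts printed above; the names nov_wetness, dec_wetness, nov_share, and dec_share are illustrative.

```python
import pandas as pd

# Series rebuilt from the counts shown above, standing in for
# Nov4_df["wetness"] and Dec4_df["wetness"].
nov_wetness = pd.Series(["Really Wet"] * 35 + ["Dry"] * 29 + ["Damp"] * 15)
dec_wetness = pd.Series(["Dry"] * 28 + ["Really Wet"] * 28 + ["Damp"] * 23)

# normalize=True returns the share of rows in each category
# instead of the count.
nov_share = nov_wetness.value_counts(normalize=True)
dec_share = dec_wetness.value_counts(normalize=True)

print(nov_share.round(3))
print(dec_share.round(3))
```

Because both days have 79 rows, the counts and the proportions tell the same story here. Proportions matter more when the groups differ in size.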

Task 10

The categorical variable warmth has two values, defined on the basis of the quantitative variable TMAX. We'd like to see the maximum value of TMAX for days where warmth is "Cold". To see this, do the following.

  1. Using OAW2, create a new dataframe Cold_days by selecting those days where the value of warmth is "Cold".

  2. Use the method describe() to see the maximum value of TMAX in the new dataframe.

Answer

Cold_bool = OAW2["warmth"] == "Cold"
Cold_days = OAW2[Cold_bool]
Cold_days.describe()
##                PRCP          TMAX  ...            mo            dy
## count  15233.000000  15233.000000  ...  15233.000000  15233.000000
## mean       0.221215     49.780864  ...      6.069126     15.699337
## std        0.368447      6.777733  ...      4.340668      8.771451
## min        0.000000     17.960000  ...      1.000000      1.000000
## 25%        0.000000     46.040000  ...      2.000000      8.000000
## 50%        0.059055     50.000000  ...      4.000000     16.000000
## 75%        0.299213     55.040000  ...     11.000000     23.000000
## max        4.818898     60.080000  ...     12.000000     31.000000
## 
## [8 rows x 6 columns]
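
The describe() output hides some columns behind "...", although TMAX happens to be visible here. To read off a single statistic without depending on the display, the column can be selected directly. The sketch below uses a small hypothetical dataframe (demo and its values are assumptions, not the real OAW2).

```python
import pandas as pd

# Hypothetical stand-in for OAW2 with made-up values.
demo = pd.DataFrame({
    "TMAX": [45.0, 60.08, 71.0, 64.94],
    "warmth": ["Cold", "Cold", "Warm", "Warm"],
})

# Select the Cold rows and the TMAX column in one step, then take the max.
cold_max = demo.loc[demo["warmth"] == "Cold", "TMAX"].max()
print(cold_max)

# Optional: the range of TMAX within each warmth category.
ranges = demo.groupby("warmth")["TMAX"].agg(["min", "max"])
print(ranges)
```

Applied to OAW2, the same selection should agree with the max row of the describe() output above, which reports 60.08 for TMAX on Cold days.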