Pandas Notes 1

Harold Nelson

11/1/2020

Setup

In CoCalc, create a folder called “Pandas Notes”.

Download the file OAW2.csv from Moodle. Then upload it to the folder you just created in CoCalc.

Create a Jupyter notebook in the same folder to do the work below.

The data is from the weather station at the Olympia Airport.

Task 1

Make numpy available as np. Make pandas available as pd.

Answer

import numpy as np
import pandas as pd

Task 2

Get the Data.

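Use pd.read_csv to create the dataframe OAW2 from the CSV file. Pass index_col=0 so that the first column of the file (column 0), which holds the row numbers, becomes the index of the dataframe.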

Answer

OAW2 = pd.read_csv("OAW2.csv", index_col=0)
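
To see what index_col does, here is a minimal sketch that reads the same text twice, once with index_col=0 and once without. The three data rows are copied from the head() output in these notes. The layout of the header line, with an unnamed first column holding the row numbers, is an assumption about OAW2.csv and not something the notes confirm. The names csv_text, with_index and without_index are made up for this illustration.

```python
import io

import pandas as pd

# Assumed layout of OAW2.csv: an unnamed first column of row numbers,
# followed by the eight data columns. Values come from the head() output.
csv_text = """,PRCP,TMAX,TMIN,yr,mo,dy,warmth,wetness
1,0.0,66.02,50.0,1941,5,13,Warm,Dry
2,0.0,62.96,46.94,1941,5,14,Warm,Dry
3,0.299213,57.92,44.06,1941,5,15,Cold,Really Wet
"""

# io.StringIO lets read_csv treat a string as if it were a file.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
without_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the row numbers become the index, leaving 8 columns.
print(with_index.columns.tolist())

# Without it, the row numbers stay behind as an extra ninth column.
print(without_index.columns.tolist())
```

If the real file has this layout, leaving out index_col would add an extra column of row numbers to every later summary.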

Task 3

What is it?

Use the method info() to get some basic information about the dataframe.

Answer

OAW2.info()
## <class 'pandas.core.frame.DataFrame'>
## Int64Index: 28708 entries, 1 to 28708
## Data columns (total 8 columns):
##  #   Column   Non-Null Count  Dtype  
## ---  ------   --------------  -----  
##  0   PRCP     28708 non-null  float64
##  1   TMAX     28708 non-null  float64
##  2   TMIN     28708 non-null  float64
##  3   yr       28708 non-null  int64  
##  4   mo       28708 non-null  int64  
##  5   dy       28708 non-null  int64  
##  6   warmth   28708 non-null  object 
##  7   wetness  28708 non-null  object 
## dtypes: float64(3), int64(3), object(2)
## memory usage: 2.0+ MB
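
The facts reported by info() are also available one at a time. The sketch below uses a small dataframe called sample, a hypothetical name, built from three rows and four columns of the head() output in these notes, so it runs without the CSV file. The same attributes and methods should work on OAW2.

```python
import pandas as pd

# A small stand-in for OAW2, using values from the head() output.
sample = pd.DataFrame(
    {
        "PRCP": [0.0, 0.0, 0.299213],
        "TMAX": [66.02, 62.96, 57.92],
        "yr": [1941, 1941, 1941],
        "warmth": ["Warm", "Warm", "Cold"],
    },
    index=[1, 2, 3],
)

# Number of rows and columns, as a tuple.
print(sample.shape)

# The data type of each column.
print(sample.dtypes)

# The number of missing values in each column.
print(sample.isna().sum())
```

In the info() output above, every column shows 28708 non-null values out of 28708 entries, so isna().sum() on OAW2 should report zero for every column.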

Task 4

Get some statistical information on the contents.

Use the method describe().

Answer

OAW2.describe()
##                PRCP          TMAX  ...            mo           dy
## count  28708.000000  28708.000000  ...  28708.000000  28708.00000
## mean       0.136342     60.537656  ...      6.540999     15.73530
## std        0.300729     13.683659  ...      3.446058      8.80051
## min        0.000000     17.960000  ...      1.000000      1.00000
## 25%        0.000000     50.000000  ...      4.000000      8.00000
## 50%        0.000000     59.000000  ...      7.000000     16.00000
## 75%        0.141732     71.060000  ...     10.000000     23.00000
## max        4.818898    104.000000  ...     12.000000     31.00000
## 
## [8 rows x 6 columns]
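
Two things are hidden in the output above. The "..." stands for the columns TMIN and yr, which were dropped from the display to save space, and the text columns warmth and wetness are missing because describe() summarizes only numeric columns by default. The sketch below shows one way to see both. It uses a five-row dataframe called sample, a hypothetical name, copied from the head() output in these notes, so the statistics it prints describe those five rows only and not the full dataset.

```python
import pandas as pd

# The first five rows of OAW2, copied from the head() output.
sample = pd.DataFrame(
    {
        "PRCP": [0.0, 0.0, 0.299213, 1.07874, 0.059055],
        "TMAX": [66.02, 62.96, 57.92, 55.04, 57.02],
        "TMIN": [50.0, 46.94, 44.06, 44.96, 46.04],
        "yr": [1941] * 5,
        "mo": [5] * 5,
        "dy": [13, 14, 15, 16, 17],
        "warmth": ["Warm", "Warm", "Cold", "Cold", "Cold"],
        "wetness": ["Dry", "Dry", "Really Wet", "Really Wet", "Damp"],
    },
    index=[1, 2, 3, 4, 5],
)

# Default behavior: only the six numeric columns are summarized.
numeric = sample.describe()

# Temporarily lift the display limits so no column is replaced by "...".
with pd.option_context("display.max_columns", None, "display.width", 200):
    print(numeric)

# include="all" adds the text columns, reporting count, unique, top and freq.
full = sample.describe(include="all")
print(full[["warmth", "wetness"]])
```

The option_context block changes the display settings only inside the with statement, so the rest of the notebook keeps its usual output width.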

Task 5

Use the methods head() and tail() to see the beginning and end of the dataframe.

Answer

print("Beginning")
## Beginning
print()
OAW2.head()
##        PRCP   TMAX   TMIN    yr  mo  dy warmth     wetness
## 1  0.000000  66.02  50.00  1941   5  13   Warm         Dry
## 2  0.000000  62.96  46.94  1941   5  14   Warm         Dry
## 3  0.299213  57.92  44.06  1941   5  15   Cold  Really Wet
## 4  1.078740  55.04  44.96  1941   5  16   Cold  Really Wet
## 5  0.059055  57.02  46.04  1941   5  17   Cold        Damp
print()
print("End")
## End
OAW2.tail()
##            PRCP   TMAX   TMIN    yr  mo  dy warmth     wetness
## 28704  0.011811  44.06  35.06  2019  12  27   Cold        Damp
## 28705  0.019685  44.06  33.08  2019  12  28   Cold        Damp
## 28706  0.039370  46.94  35.96  2019  12  29   Cold        Damp
## 28707  0.000000  46.94  42.08  2019  12  30   Cold         Dry
## 28708  1.559055  51.98  44.06  2019  12  31   Cold  Really Wet
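
Both head() and tail() return five rows unless told otherwise, and both accept the number of rows as an argument. A Jupyter cell automatically displays only the value of its last expression, so when head() and tail() share a cell it may be necessary to wrap each call in print(). The sketch below uses a ten-row dataframe called days, a hypothetical example that is not part of the weather data.

```python
import pandas as pd

# A hypothetical ten-row dataframe, indexed 1 through 10 like OAW2.
days = pd.DataFrame({"dy": list(range(1, 11))}, index=range(1, 11))

# Ask for a specific number of rows instead of the default five.
first_three = days.head(3)
last_two = days.tail(2)

# Printing explicitly makes both results appear from a single cell.
print("Beginning")
print(first_three)
print()
print("End")
print(last_two)
```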