Chapter 3 Notes

Harold Nelson

2023-01-31

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
mesosm <- read_csv("mesodata_small.csv", show_col_types = FALSE)
mesobig <- read_csv("mesodata_large.csv", show_col_types = FALSE)

Problem 1

Write code to filter the rows from the Skiatook station (SKIA) where the precipitation is greater than one inch using the mesosm dataset.

Solution

new_df = mesosm %>% 
  filter(STID == "SKIA", RAIN > 1)
summary(new_df)
##      MONTH           YEAR          STID                TMAX      
##  Min.   : 1.0   Min.   :2014   Length:50          Min.   :46.58  
##  1st Qu.: 4.0   1st Qu.:2015   Class :character   1st Qu.:63.28  
##  Median : 7.0   Median :2016   Mode  :character   Median :74.74  
##  Mean   : 6.7   Mean   :2016                      Mean   :74.08  
##  3rd Qu.: 9.0   3rd Qu.:2017                      3rd Qu.:85.50  
##  Max.   :12.0   Max.   :2018                      Max.   :92.41  
##       TMIN            HMAX            HMIN            RAIN       
##  Min.   :27.30   Min.   :79.93   Min.   :29.50   Min.   : 1.010  
##  1st Qu.:41.70   1st Qu.:86.37   1st Qu.:42.90   1st Qu.: 2.007  
##  Median :56.79   Median :89.58   Median :46.84   Median : 3.560  
##  Mean   :54.15   Mean   :88.75   Mean   :47.29   Mean   : 4.058  
##  3rd Qu.:66.99   3rd Qu.:91.89   3rd Qu.:51.30   3rd Qu.: 5.440  
##  Max.   :72.93   Max.   :95.02   Max.   :65.83   Max.   :11.930  
##       DATE           
##  Min.   :2014-03-01  
##  1st Qu.:2015-04-08  
##  Median :2016-06-16  
##  Mean   :2016-06-29  
##  3rd Qu.:2017-08-24  
##  Max.   :2018-12-01
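
A minimal sketch of how the two conditions combine, run on a small invented tibble rather than mesosm (the `toy` data and its values are hypothetical). Comma-separated conditions inside filter() are joined with a logical AND, so the call should behave the same as one written with `&`.

```r
library(dplyr)

# Hypothetical toy data standing in for mesosm; values are invented.
toy <- tibble(
  STID = c("SKIA", "SKIA", "HOOK", "SKIA"),
  RAIN = c(0.50, 1.20, 2.00, 3.10)
)

# Comma-separated conditions are combined with AND.
by_comma <- toy %>% filter(STID == "SKIA", RAIN > 1)

# The same filter written with an explicit `&`.
by_and <- toy %>% filter(STID == "SKIA" & RAIN > 1)

# Both versions keep the same rows.
all.equal(by_comma, by_and)
nrow(by_comma)
```

The HOOK row is dropped even though its rainfall exceeds one inch, because both conditions must hold.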

Problem 2

Write code to select only the month, year, station ID, and rainfall columns using the mesosm dataset.

Solution

new_df = mesosm %>% 
  select(MONTH, YEAR, STID, RAIN)
head(new_df)
## # A tibble: 6 × 4
##   MONTH  YEAR STID   RAIN
##   <dbl> <dbl> <chr> <dbl>
## 1     1  2014 HOOK   0.17
## 2     2  2014 HOOK   0.3 
## 3     3  2014 HOOK   0.31
## 4     4  2014 HOOK   0.4 
## 5     5  2014 HOOK   1.25
## 6     6  2014 HOOK   3.18

Problem 3

Write code to add a new column called RAINMM to the mesosm dataset that contains the rainfall values converted from inches to millimeters (1 mm = 0.03937 in).

Solution

new_df = mesosm %>% 
  mutate(RAINMM = RAIN / 0.03937)

head(new_df)
## # A tibble: 6 × 10
##   MONTH  YEAR STID   TMAX  TMIN  HMAX  HMIN  RAIN DATE       RAINMM
##   <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <date>      <dbl>
## 1     1  2014 HOOK   49.5  17.9  83.0  29.0  0.17 2014-01-01   4.32
## 2     2  2014 HOOK   47.2  17.1  88.3  39.9  0.3  2014-02-01   7.62
## 3     3  2014 HOOK   60.7  26.1  79.0  25.4  0.31 2014-03-01   7.87
## 4     4  2014 HOOK   72.4  39.3  81.8  21.0  0.4  2014-04-01  10.2 
## 5     5  2014 HOOK   84.4  48.3  75.4  18.8  1.25 2014-05-01  31.8 
## 6     6  2014 HOOK   90.7  61.9  90.9  28.8  3.18 2014-06-01  80.8
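
As a sanity check on the unit conversion, here is a small sketch built on the exact definition of the inch, 1 in = 25.4 mm. The helper `in_to_mm`, the constant `MM_PER_INCH`, and the toy rainfall values are hypothetical names and data introduced only for illustration.

```r
library(dplyr)

# One inch is defined as exactly 25.4 mm.
MM_PER_INCH <- 25.4

# Hypothetical helper (not part of any package): inches to millimeters.
in_to_mm <- function(inches) inches * MM_PER_INCH

# Made-up rainfall totals in inches.
toy <- tibble(RAIN = c(0, 0.5, 1, 2))

# Add the converted column alongside the original.
toy_mm <- toy %>% mutate(RAINMM = in_to_mm(RAIN))
toy_mm
```

A rainfall of one inch should come out as 25.4 mm, so a converted column with values smaller than the inch values signals a wrong factor.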

Problem 4

Use the mesobig dataset to determine the percent of valid rainfall observations for each combination of station, month, and year and generate a graph of the results.

Solution

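# Recode values outside the range (-990, 990) as missing
big_valid = mesobig %>% 
  mutate(RAIN = ifelse(RAIN < 990 & RAIN > -990, RAIN, NA)) %>% 
  group_by(MONTH, YEAR, STID) %>% 
  # Count missing and valid observations in each group
  summarize(bad = sum(is.na(RAIN)),
            good = sum(!is.na(RAIN))) %>% 
  ungroup() %>% 
  # Fraction of valid observations (multiply by 100 for a percent)
  mutate(fract_valid = good/(good + bad))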
## `summarise()` has grouped output by 'MONTH', 'YEAR'. You can override using the
## `.groups` argument.
head(big_valid)
## # A tibble: 6 × 6
##   MONTH  YEAR STID    bad  good fract_valid
##   <dbl> <dbl> <chr> <int> <int>       <dbl>
## 1     1  1994 ACME     31     0           0
## 2     1  1994 ADAX      0    31           1
## 3     1  1994 ALTU      0    31           1
## 4     1  1994 ALV2     31     0           0
## 5     1  1994 ALVA      0    31           1
## 6     1  1994 ANT2     31     0           0

Now graph the distribution of the valid fraction across all station, month, and year combinations.

big_valid %>% 
  ggplot(aes(x = fract_valid)) +
  geom_density() +
  geom_rug()
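
The problem asks for a percent, while `fract_valid` is a fraction between 0 and 1. The sketch below shows one way to get the percent directly, using invented data and assuming the same rule as above (values outside -990 to 990, or already missing, count as invalid). The mean of a logical vector is the share of TRUE values, and `.groups = "drop"` returns an ungrouped result without the grouping message. The `toy` tibble, its station IDs, and the `pct_valid` name are hypothetical.

```r
library(dplyr)

# Invented records; -999 marks a missing reading in this toy example.
toy <- tibble(
  STID  = c("AAAA", "AAAA", "AAAA", "AAAA", "BBBB", "BBBB"),
  MONTH = c(1, 1, 1, 1, 1, 1),
  YEAR  = c(1994, 1994, 1994, 1994, 1994, 1994),
  RAIN  = c(0.1, -999, 0.0, 0.3, -999, -999)
)

toy_valid <- toy %>%
  # TRUE only for non-missing values inside the valid range
  mutate(valid = !is.na(RAIN) & RAIN > -990 & RAIN < 990) %>%
  group_by(STID, MONTH, YEAR) %>%
  # mean() of a logical vector is the proportion of TRUE values
  summarize(pct_valid = 100 * mean(valid), .groups = "drop")

toy_valid
```

Station AAAA has three valid readings out of four, and station BBBB has none.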

Problem 5

Use the mesosm dataset to generate a data frame containing minimum and maximum humidity in long format and graph the results.

Solution

humidity_tidy <- mesosm %>%
  pivot_longer(cols = starts_with("H"),
               values_to = "Humidity",
               names_to = "hstat") %>%
  mutate(hstat = factor(hstat,
                        levels = c("HMAX", "HMIN"),
                        labels = c("Maximum", "Minimum")))
glimpse(humidity_tidy)
## Rows: 480
## Columns: 9
## $ MONTH    <dbl> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,…
## $ YEAR     <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2…
## $ STID     <chr> "HOOK", "HOOK", "HOOK", "HOOK", "HOOK", "HOOK", "HOOK", "HOOK…
## $ TMAX     <dbl> 49.48032, 49.48032, 47.18071, 47.18071, 60.70613, 60.70613, 7…
## $ TMIN     <dbl> 17.92903, 17.92903, 17.05357, 17.05357, 26.06806, 26.06806, 3…
## $ RAIN     <dbl> 0.17, 0.17, 0.30, 0.30, 0.31, 0.31, 0.40, 0.40, 1.25, 1.25, 3…
## $ DATE     <date> 2014-01-01, 2014-01-01, 2014-02-01, 2014-02-01, 2014-03-01, …
## $ hstat    <fct> Maximum, Minimum, Maximum, Minimum, Maximum, Minimum, Maximum…
## $ Humidity <dbl> 83.04161, 29.00226, 88.28857, 39.94393, 78.96613, 25.39871, 8…
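
The solution above stops before the graph. One possible plot is sketched here on a small invented tibble with the same column names, so that it runs on its own. The line chart with one color per statistic and one panel per station is an assumption, not the author's figure, and the `toy` values are made up.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Invented monthly values for one station; column names mirror mesosm.
toy <- tibble(
  DATE = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01")),
  STID = "HOOK",
  HMAX = c(83, 88, 79),
  HMIN = c(29, 40, 25)
)

# Same reshaping as above: one row per date and humidity statistic.
toy_long <- toy %>%
  pivot_longer(cols = c(HMAX, HMIN),
               names_to = "hstat",
               values_to = "Humidity") %>%
  mutate(hstat = factor(hstat,
                        levels = c("HMAX", "HMIN"),
                        labels = c("Maximum", "Minimum")))

# One line per statistic, one panel per station.
p <- ggplot(toy_long, aes(x = DATE, y = Humidity, color = hstat)) +
  geom_line() +
  facet_wrap(~ STID) +
  labs(x = "Date", y = "Humidity", color = "Statistic")

p
```

Replacing `toy_long` with `humidity_tidy` should give the corresponding plot for the full mesosm data, since the column names match.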