Harold Nelson
2/24/2022
Get the necessary packages and data.
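The code for this step is not shown in the output. The attach messages below indicate that it loaded at least tidyverse, lubridate and plotly, so here is a minimal sketch of that part. How cal24 itself was read in is not shown, so that step is left out rather than guessed.

```r
# Packages inferred from the attach messages below; the original chunk may differ.
library(tidyverse)  # ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats
library(lubridate)  # date helpers; not attached by tidyverse 1.3.1 itself
library(plotly)     # ggplotly() turns a ggplot object into an interactive graph

# Loading cal24 is not shown in the source. Once it exists,
# summary(cal24) produces the summary table below.
```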
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
##    STATION              NAME                DATE                 PRCP
##  Length:29472       Length:29472       Min.   :2024-01-01   Min.   :0.0000
##  Class :character   Class :character   1st Qu.:2024-04-02   1st Qu.:0.0000
##  Mode  :character   Mode  :character   Median :2024-07-02   Median :0.0000
##                                        Mean   :2024-07-02   Mean   :0.1369
##                                        3rd Qu.:2024-10-01   3rd Qu.:0.1400
##                                        Max.   :2024-12-31   Max.   :4.8200
##
##       TMAX             TMIN             yr             mo
##  Min.   : 18.00   Min.   :-8.00   Min.   :2024   8      : 2511
##  1st Qu.: 50.00   1st Qu.:33.00   1st Qu.:2024   10     : 2511
##  Median : 59.00   Median :40.00   Median :2024   7      : 2510
##  Mean   : 60.55   Mean   :39.83   Mean   :2024   12     : 2506
##  3rd Qu.: 71.00   3rd Qu.:47.00   3rd Qu.:2024   1      : 2504
##  Max.   :110.00   Max.   :69.00   Max.   :2024   5      : 2494
##                                                  (Other):14436
##        dy
##  1      :  969
##  2      :  969
##  4      :  969
##  5      :  969
##  10     :  969
##  11     :  969
##  (Other):23658
Plot the standard deviation of TMAX for each date in cal24.
cal24 %>%
  group_by(DATE) %>%                  # one group per calendar date
  summarize(sd_TMAX = sd(TMAX)) %>%   # spread of TMAX over all records sharing that date
  ungroup() %>%
  ggplot(aes(x = DATE, y = sd_TMAX)) +
  geom_point() +
  ggtitle("Standard Deviation of TMAX")
Let’s look at the annual pattern for the difference between TMAX and TMIN using cal24.
cal24 %>%
  mutate(diff = TMAX - TMIN) %>%      # daily temperature range for each record
  group_by(DATE) %>%
  summarize(diff = mean(diff)) %>%    # average range for each date
  ungroup() %>%
  ggplot(aes(x = DATE, y = diff)) +
  geom_point() +
  ggtitle("Difference between TMAX and TMIN")
The difference is much larger during the warm months of the year.
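One way to put numbers on this observation is to average the daily range by month instead of by date. The helper `monthly_range()` below is a hypothetical name introduced here, not part of the original exercise. It assumes a data frame with the TMAX, TMIN and mo columns shown in the summary of cal24. Here it runs on a few made-up rows; on the real data you would call `monthly_range(cal24)`.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: mean daily temperature range (TMAX - TMIN) for each month.
# Assumes columns TMAX, TMIN and mo, as in cal24.
monthly_range <- function(df) {
  df %>%
    mutate(diff = TMAX - TMIN) %>%
    group_by(mo) %>%
    summarize(mean_diff = mean(diff), .groups = "drop")
}

# Made-up example rows, not real observations
toy <- tibble(
  mo   = factor(c(1, 1, 7, 7)),
  TMAX = c(55, 57, 90, 88),
  TMIN = c(40, 44, 55, 57)
)

monthly_range(toy)
```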
We can use plotly to make a plot interactive in two steps:
1. Create a ggplot object instead of just displaying the plot. You can still display the plot by referencing the named object.
2. Use the named object in a call to ggplotly() to get an interactive graph.
There are two possible ways to look at precipitation. We could use either the mean value of precipitation for a date, or the probability of precipitation on that date.
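The two measures can differ, as a small illustration on a made-up vector of PRCP values shows. These are hypothetical numbers for a single date, not values from cal24. The mean is pulled up by one heavy day. `mean(prcp > 0)` averages TRUE/FALSE values as 1/0, so it gives the fraction of records with any precipitation.

```r
# Hypothetical precipitation amounts (inches) for one calendar date
prcp <- c(0, 0, 0.10, 0, 2.40)

mean_precip <- mean(prcp)      # average amount: 2.5 / 5 = 0.5
prob_precip <- mean(prcp > 0)  # TRUE counts as 1, FALSE as 0: 2 / 5 = 0.4

mean_precip
prob_precip
```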
Do the mean value of precipitation first. We’ll use plotly.
g1 <- cal24 %>%
  group_by(DATE) %>%
  summarize(mean_precip = mean(PRCP)) %>%   # average amount of precipitation per date
  ungroup() %>%
  ggplot(aes(x = DATE, y = mean_precip)) +
  geom_point()
ggplotly(g1) # For Rpubs or other html
Now do the probability of precipitation.
## Solution
cal24 %>%
  group_by(DATE) %>%
  summarize(prob_precip = mean(PRCP > 0)) %>%   # share of records with any precipitation
  ungroup() %>%
  ggplot(aes(x = DATE, y = prob_precip)) +
  geom_point()
Observation: The two graphs are clearly similar, but there is one notable difference. The heavy rainfall of November and December does not carry over to the following January and February.
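This observation can be checked numerically by computing both measures by month. The helper `monthly_precip()` below is a hypothetical name introduced here. It assumes the PRCP and mo columns of cal24 and is demonstrated on made-up rows. On the real data you would call `monthly_precip(cal24)` and compare the rows for months 11, 12, 1 and 2.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: mean precipitation and probability of precipitation per month.
# Assumes columns PRCP and mo, as in cal24.
monthly_precip <- function(df) {
  df %>%
    group_by(mo) %>%
    summarize(
      mean_precip = mean(PRCP),      # average amount
      prob_precip = mean(PRCP > 0),  # share of records with any precipitation
      .groups = "drop"
    )
}

# Made-up example rows, not real observations
toy <- tibble(
  mo   = factor(c(1, 1, 12, 12)),
  PRCP = c(0, 0.2, 1.5, 0)
)

monthly_precip(toy)
```

In this toy case both months have the same probability of precipitation but very different mean amounts. That is the kind of divergence the two graphs can reveal.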
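Create a graph showing smoothed curves for precipitation and TMAX. Because these two variables are measured on very different scales, convert each to a z-score first so the curves are visually comparable. Call the z-score variables n_TMAX and n_PRCP. (With this many rows, geom_smooth() fits a GAM rather than a loess curve by default, as the messages after the code show.)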
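A z-score rescales a variable to mean 0 and standard deviation 1: z = (x - mean(x)) / sd(x). The short check below uses made-up values, not data from cal24. It confirms that this manual formula, the one used for n_TMAX and n_PRCP, agrees with base R's `scale()`.

```r
# Made-up TMAX values (not from cal24)
tmax <- c(50, 55, 60, 65, 70)

# Manual z-score, as in the n_TMAX definition
z_manual <- (tmax - mean(tmax)) / sd(tmax)

# scale() centers and divides by sd(); it returns a one-column matrix,
# so as.vector() drops the matrix structure for comparison
z_scale <- as.vector(scale(tmax))

z_manual
```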
cal24 %>%
  # z-scores put both variables on a common, unitless scale
  mutate(n_TMAX = (TMAX - mean(TMAX))/sd(TMAX),
         n_PRCP = (PRCP - mean(PRCP))/sd(PRCP)) %>%
  ggplot(aes(x = DATE)) +
  geom_smooth(aes(y = n_TMAX), color = "red") +    # TMAX in red
  geom_smooth(aes(y = n_PRCP), color = "blue")     # precipitation in blue
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
The turning points of the two curves in summer essentially coincide. The peak at the right-hand end of the precipitation curve (late in the year) matches what we noted earlier in the graph of mean precipitation.