Harold Nelson
2/23/2021
Get the necessary packages and data.
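The chunk that produced the startup messages below was not preserved. A minimal sketch of it might look like the following; the file name in the commented-out `load()` call is hypothetical, since the source does not say how the data were stored.

```r
# Load the packages used throughout: tidyverse supplies dplyr and ggplot2,
# and lubridate supplies make_date().
library(tidyverse)
library(lubridate)

# Hypothetical: the file name and format are assumptions, not from the source.
# Uncomment and adjust the path if the data are stored in an .Rdata file.
# load("oly_airport.Rdata")
```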
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
This short presentation looks at patterns in the weather for the Olympia area. It is based on daily data from the Olympia Airport beginning in May 1941. About a dozen observations were removed because one or more variables had missing values.
Look at the data and see what we have.
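The output that follows has the shape produced by `str()` and `summary()`, so the missing chunk was probably close to the sketch below. To keep the sketch runnable on its own, it builds a small synthetic stand-in when `oly_airport` is not already loaded. The stand-in values are made up, and it omits STATION and NAME; only the remaining column names and types follow the source.

```r
library(tidyverse)
library(lubridate)

# Synthetic stand-in, built only when the real oly_airport is not loaded.
# The numbers are invented for illustration and are NOT the airport data.
if (!exists("oly_airport")) {
  set.seed(1)
  oly_airport <- tibble(
    DATE = seq(ymd("2019-01-01"), ymd("2021-12-31"), by = "day")
  ) %>%
    mutate(
      PRCP = round(rexp(n(), rate = 7), 2),
      TMAX = round(60 - 18 * cos(2 * pi * (yday(DATE) - 25) / 365) +
                     rnorm(n(), sd = 6)),
      TMIN = TMAX - 20,
      yr = factor(year(DATE)),
      mo = factor(month(DATE)),
      dy = factor(day(DATE))
    )
}

str(oly_airport)      # column types and the first few values
summary(oly_airport)  # ranges for numeric columns, counts for factors
```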
## tibble [29,472 × 9] (S3: tbl_df/tbl/data.frame)
## $ STATION: chr [1:29472] "USW00024227" "USW00024227" "USW00024227" "USW00024227" ...
## $ NAME : chr [1:29472] "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" "OLYMPIA AIRPORT, WA US" ...
## $ DATE : Date[1:29472], format: "1941-05-13" "1941-05-14" ...
## $ PRCP : num [1:29472] 0 0 0.3 1.08 0.06 0 0 0 0 0 ...
## $ TMAX : num [1:29472] 66 63 58 55 57 59 58 65 68 85 ...
## $ TMIN : num [1:29472] 50 47 44 45 46 39 40 50 42 46 ...
## $ yr : Factor w/ 82 levels "1941","1942",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ mo : Factor w/ 12 levels "1","2","3","4",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ dy : Factor w/ 31 levels "1","2","3","4",..: 13 14 15 16 17 18 19 20 21 22 ...
##      STATION              NAME                DATE               PRCP
##  Length:29472       Length:29472       Min.   :1941-05-13   Min.   :0.0000
##  Class :character   Class :character   1st Qu.:1961-07-14   1st Qu.:0.0000
##  Mode  :character   Mode  :character   Median :1981-09-15   Median :0.0000
##                                        Mean   :1981-09-19   Mean   :0.1369
##                                        3rd Qu.:2001-12-01   3rd Qu.:0.1400
##                                        Max.   :2022-02-11   Max.   :4.8200
##
##       TMAX             TMIN             yr              mo
##  Min.   : 18.00   Min.   :-8.00   1944   :  366   8      : 2511
##  1st Qu.: 50.00   1st Qu.:33.00   1948   :  366   10     : 2511
##  Median : 59.00   Median :40.00   1952   :  366   7      : 2510
##  Mean   : 60.55   Mean   :39.83   1956   :  366   12     : 2506
##  3rd Qu.: 71.00   3rd Qu.:47.00   1960   :  366   1      : 2504
##  Max.   :110.00   Max.   :69.00   1964   :  366   5      : 2494
##                                   (Other):27276   (Other):14436
##        dy
##  1      :  969
##  2      :  969
##  4      :  969
##  5      :  969
##  10     :  969
##  11     :  969
##  (Other):23658
To summarize the annual pattern, we need to put all of the data into a single year, so that the date variable no longer distinguishes between years. You could think of this as making a prediction for an arbitrary future year. We’ll use 2024 because it’s a leap year, which keeps the February 29 observations as valid dates. Call the new dataframe cal24 and build its DATE with make_date() from the lubridate package. Run summary() on cal24 to verify that you succeeded.
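The chunk that built cal24 is not shown. The summary output reports yr as a constant 2024, which suggests yr was overwritten along with DATE, so a sketch under that assumption follows. It converts the factor columns through character before calling `make_date()`; that is a defensive choice of this sketch rather than something the source shows. A synthetic stand-in with made-up values is used only when `oly_airport` is not loaded.

```r
library(tidyverse)
library(lubridate)

# Synthetic stand-in (invented values) used only if oly_airport is not loaded.
if (!exists("oly_airport")) {
  set.seed(1)
  oly_airport <- tibble(
    DATE = seq(ymd("2019-01-01"), ymd("2021-12-31"), by = "day")
  ) %>%
    mutate(
      TMAX = 60 - 18 * cos(2 * pi * (yday(DATE) - 25) / 365) + rnorm(n(), sd = 6),
      TMIN = TMAX - 20,
      yr = factor(year(DATE)),
      mo = factor(month(DATE)),
      dy = factor(day(DATE))
    )
}

# mo and dy are factors, so go through character to get the labelled
# month and day numbers rather than the internal level codes.
cal24 <- oly_airport %>%
  mutate(
    yr = 2024,
    DATE = make_date(yr,
                     as.integer(as.character(mo)),
                     as.integer(as.character(dy)))
  )

summary(cal24)  # DATE should now run from 2024-01-01 to 2024-12-31
```

Because 2024 is a leap year, observations recorded on February 29 map to a real date instead of becoming NA.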
##      STATION              NAME                DATE               PRCP
##  Length:29472       Length:29472       Min.   :2024-01-01   Min.   :0.0000
##  Class :character   Class :character   1st Qu.:2024-04-02   1st Qu.:0.0000
##  Mode  :character   Mode  :character   Median :2024-07-02   Median :0.0000
##                                        Mean   :2024-07-02   Mean   :0.1369
##                                        3rd Qu.:2024-10-01   3rd Qu.:0.1400
##                                        Max.   :2024-12-31   Max.   :4.8200
##
##       TMAX             TMIN             yr             mo
##  Min.   : 18.00   Min.   :-8.00   Min.   :2024   8      : 2511
##  1st Qu.: 50.00   1st Qu.:33.00   1st Qu.:2024   10     : 2511
##  Median : 59.00   Median :40.00   Median :2024   7      : 2510
##  Mean   : 60.55   Mean   :39.83   Mean   :2024   12     : 2506
##  3rd Qu.: 71.00   3rd Qu.:47.00   3rd Qu.:2024   1      : 2504
##  Max.   :110.00   Max.   :69.00   Max.   :2024   5      : 2494
##                                                   (Other):14436
##        dy
##  1      :  969
##  2      :  969
##  4      :  969
##  5      :  969
##  10     :  969
##  11     :  969
##  (Other):23658
Create a scatterplot showing all of the values of TMAX for each value of DATE. Set size and alpha to .1 in the call to geom_point() to deal with the overplotting.
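The code for this plot is not shown. A minimal sketch that follows the prompt as written might be the one below. It assumes cal24 already exists and, so that it runs on its own, falls back to a synthetic stand-in with made-up values when it does not.

```r
library(tidyverse)
library(lubridate)

# Synthetic stand-in (invented values) used only if cal24 does not exist yet.
if (!exists("cal24")) {
  set.seed(1)
  cal24 <- tibble(
    obs = seq(ymd("2019-01-01"), ymd("2021-12-31"), by = "day")
  ) %>%
    mutate(
      DATE = make_date(2024, month(obs), day(obs)),
      TMAX = 60 - 18 * cos(2 * pi * (yday(obs) - 25) / 365) + rnorm(n(), sd = 6),
      TMIN = TMAX - 20
    )
}

# Small, mostly transparent points keep dense regions from turning solid black.
p <- cal24 %>%
  ggplot(aes(x = DATE, y = TMAX)) +
  geom_point(size = .1, alpha = .1)
p
```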
## Smoothing
Add geom_smooth() to the previous graph.
cal24 %>%
  ggplot(aes(x = DATE, y = TMAX)) +
  geom_point(size = .2, alpha = .2) +
  geom_smooth(color = "red")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Which takes longer, the rise or the fall?
The rise seems to take longer than the fall.
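One rough way to put numbers on this answer is to smooth the daily means and count the days between the coldest and warmest points of the curve. The sketch below uses `loess()` with a span of 0.3; both are illustrative choices of this sketch and differ from the gam smoother that `geom_smooth()` selected. The smoother does not wrap around the year end, so an extreme near January 1 or December 31 should be read with caution. A synthetic stand-in with made-up values is used only when cal24 does not exist.

```r
library(tidyverse)
library(lubridate)

# Synthetic stand-in (invented values) used only if cal24 does not exist yet.
if (!exists("cal24")) {
  set.seed(1)
  cal24 <- tibble(
    obs = seq(ymd("2019-01-01"), ymd("2021-12-31"), by = "day")
  ) %>%
    mutate(
      DATE = make_date(2024, month(obs), day(obs)),
      TMAX = 60 - 18 * cos(2 * pi * (yday(obs) - 25) / 365) + rnorm(n(), sd = 6),
      TMIN = TMAX - 20
    )
}

# Daily means, indexed by day of year so loess() gets a numeric predictor.
daily <- cal24 %>%
  group_by(DATE) %>%
  summarize(MTMAX = mean(TMAX)) %>%
  mutate(doy = yday(DATE))

# Smoothed curve evaluated at every calendar day.
daily$smooth <- predict(loess(MTMAX ~ doy, data = daily, span = 0.3))

low  <- daily$DATE[which.min(daily$smooth)]  # coldest point of the curve
high <- daily$DATE[which.max(daily$smooth)]  # warmest point of the curve

# Days from coldest to warmest, wrapping around the year end if needed.
rise_days <- as.integer(high - low) %% 366L
fall_days <- 366L - rise_days
c(rise = rise_days, fall = fall_days)
```

If `rise_days` comes out larger than `fall_days` on the real data, that supports the answer above; the stand-in data are symmetric by construction and say nothing about Olympia.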
Add a geom_point() with red points to show calendar 2021. Create a dataframe cal21 by filtering for yr == 2021. Use make_date() to change the year in DATE to 2024. Add a second geom_point() with this dataframe as the data argument. Set the size parameter to .5 in the second geom_point().
cal21 = oly_airport %>%
  filter(yr == 2021) %>%
  mutate(DATE = make_date(2024, mo, dy))
cal24 %>%
  ggplot(aes(x = DATE, y = TMAX)) +
  geom_point(size = .2, alpha = .2) +
  geom_point(data = cal21, aes(x = DATE, y = TMAX),
             color = "red", size = .5)
For each value of DATE compute a mean of TMAX and display the results in a graph.
cal24 %>%
  group_by(DATE) %>%
  summarize(MTMAX = mean(TMAX)) %>%
  ggplot(aes(x = DATE, y = MTMAX)) +
  geom_point(size = .5)
Do a density plot of the mean values of TMAX.
cal24 %>%
  group_by(DATE) %>%
  summarize(MTMAX = mean(TMAX)) %>%
  ggplot(aes(x = MTMAX)) +
  geom_density(adjust = .2)
Create a scatterplot showing all of the values of TMIN for each value of DATE. Set size and alpha to .1 in the call to geom_point() to deal with the overplotting.
## Smoothing
Add geom_smooth() to the previous graph.
cal24 %>%
  ggplot(aes(x = DATE, y = TMIN)) +
  geom_point(size = .2, alpha = .2) +
  geom_smooth(color = "red")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Add a geom_point() with red points to show calendar 2021. Create a dataframe cal21 by filtering for yr == 2021. Use make_date() to change the year in DATE to 2024. Add a second geom_point() with this dataframe as the data argument. Set the size parameter to .5 in the second geom_point().
cal21 = oly_airport %>%
  filter(yr == 2021) %>%
  mutate(DATE = make_date(2024, mo, dy))
cal24 %>%
  ggplot(aes(x = DATE, y = TMIN)) +
  geom_point(size = .2, alpha = .2) +
  geom_point(data = cal21, aes(x = DATE, y = TMIN),
             color = "red", size = .5)
For each value of DATE compute a mean of TMIN and display the results in a graph.
cal24 %>%
  group_by(DATE) %>%
  summarize(MTMIN = mean(TMIN)) %>%
  ggplot(aes(x = DATE, y = MTMIN)) +
  geom_point(size = .5)
Do a density plot of the mean values of TMIN.
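The code for this last step is not shown. By analogy with the TMAX density plot earlier in the document, a sketch would be the following. It assumes cal24 exists and falls back to a synthetic stand-in with made-up values when it does not.

```r
library(tidyverse)
library(lubridate)

# Synthetic stand-in (invented values) used only if cal24 does not exist yet.
if (!exists("cal24")) {
  set.seed(1)
  cal24 <- tibble(
    obs = seq(ymd("2019-01-01"), ymd("2021-12-31"), by = "day")
  ) %>%
    mutate(
      DATE = make_date(2024, month(obs), day(obs)),
      TMAX = 60 - 18 * cos(2 * pi * (yday(obs) - 25) / 365) + rnorm(n(), sd = 6),
      TMIN = TMAX - 20
    )
}

# One mean per calendar day, then the distribution of those means.
# adjust = .2 narrows the default bandwidth, matching the TMAX version.
p <- cal24 %>%
  group_by(DATE) %>%
  summarize(MTMIN = mean(TMIN)) %>%
  ggplot(aes(x = MTMIN)) +
  geom_density(adjust = .2)
p
```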