Harold Nelson
2/10/2021
This little presentation looks at weather for the Olympia area. If you’re new, what should you expect over the next few months?
I thought that July 2017 was unusually hot. Was it really different?
Get the necessary packages and data.
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.4
## ✓ tibble 3.0.5 ✓ dplyr 1.0.3
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Look at the data and see what we have.
## tibble [28,705 × 9] (S3: tbl_df/tbl/data.frame)
## $ STATION_NAME: chr [1:28705] "OLYMPIA PRIEST PT PA WA US" "OLYMPIA PRIEST PT PA WA US" "OLYMPIA PRIEST PT PA WA US" "OLYMPIA PRIEST PT PA WA US" ...
## $ DATE : Date[1:28705], format: "1948-01-01" "1948-01-02" ...
## $ PRCP : num [1:28705] 0.82 0.15 0.62 0.53 0 0.39 0.51 0.68 0 1.32 ...
## $ SNOW : num [1:28705] 0 0 0 0 0 0 0 0 0 0 ...
## $ TMAX : num [1:28705] 50 43 48 45 46 45 52 49 48 43 ...
## $ TMIN : num [1:28705] 40 40 35 35 31 33 40 36 29 31 ...
## $ yr : num [1:28705] 1948 1948 1948 1948 1948 ...
## $ mo : num [1:28705] 1 1 1 1 1 1 1 1 1 1 ...
## $ dy : int [1:28705] 1 2 3 4 5 6 7 8 9 10 ...
## STATION_NAME DATE PRCP SNOW
## Length:28705 Min. :1948-01-01 Min. :0.0000 Min. : 0.00000
## Class :character 1st Qu.:1959-10-04 1st Qu.:0.0000 1st Qu.: 0.00000
## Mode :character Median :1979-05-28 Median :0.0000 Median : 0.00000
## Mean :1980-03-14 Mean :0.1404 Mean : 0.03357
## 3rd Qu.:1999-01-21 3rd Qu.:0.1400 3rd Qu.: 0.00000
## Max. :2018-09-21 Max. :4.8200 Max. :14.20000
##
## TMAX TMIN yr mo
## Min. : 18.00 Min. :-8.00 Min. :1948 Min. : 1.000
## 1st Qu.: 50.00 1st Qu.:33.00 1st Qu.:1959 1st Qu.: 4.000
## Median : 59.00 Median :40.00 Median :1979 Median : 7.000
## Mean : 60.45 Mean :39.85 Mean :1980 Mean : 6.501
## 3rd Qu.: 71.00 3rd Qu.:47.00 3rd Qu.:1999 3rd Qu.: 9.000
## Max. :104.00 Max. :69.00 Max. :2018 Max. :12.000
## NA's :13 NA's :14
## dy
## Min. : 1.00
## 1st Qu.: 8.00
## Median :16.00
## Mean :15.72
## 3rd Qu.:23.00
## Max. :31.00
##
The dataset contains daily weather observations from January 1, 1948 through September 21, 2018. TMAX and TMIN have a handful of missing values (13 and 14 respectively).
# For each calendar day, compute the share of years with measurable precipitation
olyw1018 %>%
  group_by(mo, dy) %>%
  summarize(prain = mean(PRCP > 0, na.rm = TRUE)) %>%
  ungroup() -> rainy
## `summarise()` has grouped output by 'mo'. You can override using the `.groups` argument.
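The `mean(PRCP > 0)` idiom works because R treats `TRUE` as 1 and `FALSE` as 0, so the mean of a logical vector is the share of `TRUE` values. A minimal sketch using the first ten `PRCP` values printed in the structure listing above (the names `prcp` and `wet` are only for illustration):

```r
# First ten daily precipitation values shown in the structure listing
prcp <- c(0.82, 0.15, 0.62, 0.53, 0, 0.39, 0.51, 0.68, 0, 1.32)

# Comparing to zero gives a logical vector: TRUE on days with precipitation
wet <- prcp > 0

# The mean of a logical vector is the proportion of TRUE values
prain <- mean(wet, na.rm = TRUE)
prain
```

Eight of these ten days had precipitation, so `prain` is 0.8. Grouping by `mo` and `dy` applies the same calculation to each calendar day across all years.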
Let’s create a permanent weather prediction for any future year based on average historical values for each day.
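# Average each calendar day across all years; na.rm = TRUE skips the missing TMAX/TMIN readings
olyw1018 %>%
  group_by(mo, dy) %>%
  summarize(prain = mean(PRCP > 0, na.rm = TRUE),
            dmax = mean(TMAX, na.rm = TRUE),
            midmax = median(TMAX, na.rm = TRUE),
            dmin = mean(TMIN, na.rm = TRUE)) %>%
  ungroup() %>%
  # Re-date onto 2020, a leap year, so February 29 remains a valid date
  mutate(date = make_date(2020, mo, dy)) -> cal20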
## `summarise()` has grouped output by 'mo'. You can override using the `.groups` argument.
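Two details of this step are easy to miss. Mapping every observation onto 2020 keeps February 29, because 2020 is a leap year. And a plain `mean()` returns `NA` as soon as one value is missing, which matters here because `TMAX` and `TMIN` contain a few `NA`s. A small sketch of both behaviours, using made-up temperatures rather than the Olympia data (the names `leap_day`, `no_leap_day`, and `tmax` are hypothetical):

```r
library(lubridate)

# 2020 is a leap year, so February 29 is a valid calendar date
leap_day <- make_date(2020, 2, 29)

# In a non-leap year the same month/day is not a real date,
# and make_date() is expected to return NA
no_leap_day <- make_date(2019, 2, 29)

# Illustrative values only (not taken from olyw1018)
tmax <- c(50, NA, 48)

mean(tmax)                 # NA, because one reading is missing
mean(tmax, na.rm = TRUE)   # mean of the two observed readings
```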
Put your code here showing only the months without the year.
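One possible answer, as a sketch: `scale_x_date()` with `date_labels = "%b"` prints abbreviated month names and drops the year. The data frame `demo` below is a placeholder with an invented seasonal curve so the block runs on its own; in this presentation you would presumably plot `cal20` and one of its columns, such as `dmax`, instead.

```r
library(ggplot2)

# Placeholder data: one row per day of 2020 with an invented seasonal curve
demo <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
)
day_of_year <- as.numeric(format(demo$date, "%j"))
demo$dmax <- 60 - 17 * cos(2 * pi * (day_of_year - 25) / 366)

p <- ggplot(demo, aes(x = date, y = dmax)) +
  geom_line() +
  # One break per month, labelled "Jan", "Feb", ... with no year
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  ggtitle("Month-Only Axis Labels (placeholder data)") +
  theme(plot.title = element_text(hjust = 0.5))
p
```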
Here’s a graph showing all of the historical values.
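The plotting code in this part uses a data frame named `cal20_all`, whose definition is not shown. A plausible reconstruction, offered as an assumption rather than the author's code, keeps every daily observation and maps its month and day onto 2020, the same way `cal18` is built. The three-row `olyw_demo` below is a stand-in that copies the first three days from the structure listing so the sketch runs on its own:

```r
library(dplyr)
library(lubridate)

# Stand-in for olyw1018: the first three days shown in the structure listing
olyw_demo <- tibble(
  yr   = c(1948, 1948, 1948),
  mo   = c(1, 1, 1),
  dy   = c(1L, 2L, 3L),
  TMAX = c(50, 43, 48)
)

# Assumed definition: every observation, re-dated onto calendar year 2020
cal20_all <- olyw_demo %>%
  mutate(date = make_date(2020, mo, dy))

cal20_all
```

With the real data, replacing `olyw_demo` with `olyw1018` would put all years on one shared January-to-December axis, which is what lets a single year be overlaid on the historical cloud.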
Here’s the same graph with 2018 values enhanced.
# 2018 observations, re-dated onto the shared 2020 calendar
cal18 <- filter(olyw1018, yr == 2018) %>%
  mutate(date = make_date(2020, mo, dy))
# All historical daily maxima as faint points, with 2018 overlaid in red
cal20_all %>%
  ggplot(aes(x = date, y = TMAX)) +
  geom_point(alpha = .1, size = .5) +
  geom_point(data = cal18, aes(date, TMAX), alpha = .8, color = "red", size = .6) +
  ggtitle("All Daily Maximum Temperatures") +
  theme(plot.title = element_text(hjust = 0.5)) -> v1
ggplotly(v1)
# 2018 observations again, so this chunk runs on its own
cal18 <- filter(olyw1018, yr == 2018) %>%
  mutate(date = make_date(2020, mo, dy))
# Smoothed curve through all historical daily maxima, with 2018 overlaid in red
cal20_all %>%
  ggplot(aes(x = date, y = TMAX)) +
  geom_smooth(size = .5, color = "blue") +
  geom_point(data = cal18, aes(date, TMAX), alpha = .8, color = "red", size = .6) +
  ggtitle("Smoothed Daily Maximum Temperatures\n Calendar 2018 is Red") +
  theme(plot.title = element_text(hjust = 0.5))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 13 rows containing non-finite values (stat_smooth).
# 2018 observations again, so this chunk runs on its own
cal18 <- filter(olyw1018, yr == 2018) %>%
  mutate(date = make_date(2020, mo, dy))
# Historical median maximum for each calendar day in blue, with 2018 overlaid in red
cal20 %>%
  ggplot(aes(x = date, y = midmax)) +
  geom_point(size = 1, color = "blue") +
  geom_point(data = cal18, aes(date, TMAX), alpha = .8, color = "red", size = .6) +
  ggtitle("Daily Maximum Temperatures\n Calendar 2018 is Red\n Median is Blue") +
  theme(plot.title = element_text(hjust = 0.5)) -> v3
ggplotly(v3)