Harold Nelson
2026-06-08
Run the command that makes the tidyverse available to your R session.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
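The attach message above is what `library(tidyverse)` prints. A minimal sketch, assuming the tidyverse is already installed (if it is not, `install.packages("tidyverse")` needs to be run once beforehand):

```r
# Attach the core tidyverse packages (dplyr, ggplot2, tidyr, and others)
library(tidyverse)
```

The two conflicts listed in the message are expected: after attaching, `filter()` and `lag()` refer to the dplyr versions.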
After downloading OAW2309.Rdata from Moodle, load it and use glimpse() to inspect it.
## Rows: 30,075
## Columns: 7
## $ DATE <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1941…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59, 6…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46, 4…
## $ mo   <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
## $ dy   <int> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 2…
## $ yr   <dbl> 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941,…
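One way to do the loading step is sketched here. It assumes the downloaded file sits in the current working directory (the path is an assumption; adjust it to wherever the file was saved) and that it contains an object named OAW2309, the name used in the later tasks.

```r
library(tidyverse)

# Assumed location: the file is in the current working directory
data_file = "OAW2309.Rdata"

if (file.exists(data_file)) {
  # load() restores the saved objects and returns their names
  loaded = load(data_file)
  print(loaded)
  # glimpse() lists every column with its type and first few values
  glimpse(OAW2309)
} else {
  message("Download ", data_file, " from Moodle into ", getwd(), " first.")
}
```

The `file.exists()` check is only there so the sketch fails gently when the file is somewhere else.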
Use filter() to create a dataframe containing only the observations from January 1. Call the new dataframe Jan1. Use head() to verify the result.
## # A tibble: 6 × 7
##   DATE        PRCP  TMAX  TMIN mo       dy    yr
##   <date>     <dbl> <dbl> <dbl> <fct> <int> <dbl>
## 1 1942-01-01  0       35    11 1         1  1942
## 2 1943-01-01  0.05    42    34 1         1  1943
## 3 1944-01-01  0.61    48    35 1         1  1944
## 4 1945-01-01  0       51    40 1         1  1945
## 5 1946-01-01  0.35    52    43 1         1  1946
## 6 1947-01-01  0       41    25 1         1  1947
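A possible way to build Jan1 is sketched below. The small stand-in table is an assumption added so the code runs without the Moodle file; its rows are copied from output printed in this document, and it is only created when the real OAW2309 is not loaded.

```r
library(tidyverse)

# Stand-in with a few printed rows, used only if OAW2309 is not loaded
if (!exists("OAW2309")) {
  OAW2309 = tibble(
    DATE = as.Date(c("1941-05-13", "1942-01-01", "1942-01-15", "1943-01-01")),
    PRCP = c(0, 0, 0, 0.05),
    TMAX = c(66, 35, 41, 42),
    TMIN = c(50, 11, 29, 34),
    mo = factor(c(5, 1, 1, 1)),
    dy = c(13L, 1L, 15L, 1L),
    yr = c(1941, 1942, 1942, 1943)
  )
}

# mo is a factor, so compare it with the level label "1"; dy is an integer
Jan1 = OAW2309 %>%
  filter(mo == "1", dy == 1)

head(Jan1)
```

Separating the two conditions with a comma inside `filter()` keeps only rows where both are true.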
Warming? Use this data to consider the possibility that weather in Olympia has been getting warmer. Create an appropriate graph using the variables yr and TMAX.
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
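A sketch of one suitable graph follows. `geom_smooth()` defaults to loess only for fewer than 1,000 observations, so the message above suggests the plot was drawn from a small dataframe; this sketch assumes it was Jan1. The stand-in values are the six Jan1 rows printed earlier and are only used when Jan1 does not exist.

```r
library(tidyverse)

# Stand-in: the six printed Jan1 rows, used only if Jan1 does not exist yet
if (!exists("Jan1")) {
  Jan1 = tibble(yr = 1942:1947, TMAX = c(35, 42, 48, 51, 52, 41))
}

# One point per year, plus a smoother to show any long-run drift
p = ggplot(Jan1, aes(x = yr, y = TMAX)) +
  geom_point() +
  geom_smooth()
print(p)
```

With only the six stand-in rows the smoother may emit warnings; with the full Jan1 dataframe it has enough points to fit.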
Create a new dataframe JFM15. It should contain data for the 15th of January, February, and March. Use head() to verify.
## # A tibble: 6 × 7
##   DATE        PRCP  TMAX  TMIN mo       dy    yr
##   <date>     <dbl> <dbl> <dbl> <fct> <int> <dbl>
## 1 1942-01-15  0       41    29 1        15  1942
## 2 1942-02-15  0       50    27 2        15  1942
## 3 1942-03-15  0.07    45    27 3        15  1942
## 4 1943-01-15  0.05    47    29 1        15  1943
## 5 1943-02-15  0       52    28 2        15  1943
## 6 1943-03-15  0       51    31 3        15  1943
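Selecting several months at once is a natural fit for `%in%`. The sketch below is one way to do it; the stand-in table is an assumption built from rows printed in this document and is only created when OAW2309 is not loaded.

```r
library(tidyverse)

# Stand-in with a few printed rows, used only if OAW2309 is not loaded
if (!exists("OAW2309")) {
  OAW2309 = tibble(
    DATE = as.Date(c("1941-05-13", "1942-01-01", "1942-01-15",
                     "1942-02-15", "1942-03-15")),
    PRCP = c(0, 0, 0, 0, 0.07),
    TMAX = c(66, 35, 41, 50, 45),
    TMIN = c(50, 11, 29, 27, 27),
    mo = factor(c(5, 1, 1, 2, 3)),
    dy = c(13L, 1L, 15L, 15L, 15L),
    yr = c(1941, 1942, 1942, 1942, 1942)
  )
}

# Keep the 15th of the month for months 1, 2 and 3 (mo is a factor)
JFM15 = OAW2309 %>%
  filter(mo %in% c("1", "2", "3"), dy == 15)

head(JFM15)
```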
Compare the values of TMAX for these three days with a boxplot.
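A minimal boxplot sketch, assuming JFM15 has the columns mo and TMAX shown in the output above. The stand-in uses the six printed JFM15 rows and is only created when JFM15 does not exist.

```r
library(tidyverse)

# Stand-in: the six printed JFM15 rows, used only if JFM15 does not exist yet
if (!exists("JFM15")) {
  JFM15 = tibble(
    mo = factor(c(1, 2, 3, 1, 2, 3)),
    TMAX = c(41, 50, 45, 47, 52, 51)
  )
}

# mo is a factor, so ggplot draws one box per month
p = ggplot(JFM15, aes(x = mo, y = TMAX)) +
  geom_boxplot()
print(p)
```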
## Task 7
A different comparison. Use faceting to show the differences among these three months. Make the basic plot a histogram.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
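The message above appears when `geom_histogram()` runs with its default of 30 bins. The sketch below sets `binwidth = 5` instead, which is an arbitrary choice made here for illustration, and stacks the panels in one column so the temperature axes line up. The stand-in uses the six printed JFM15 rows and is only created when JFM15 does not exist.

```r
library(tidyverse)

# Stand-in: the six printed JFM15 rows, used only if JFM15 does not exist yet
if (!exists("JFM15")) {
  JFM15 = tibble(
    mo = factor(c(1, 2, 3, 1, 2, 3)),
    TMAX = c(41, 50, 45, 47, 52, 51)
  )
}

# One histogram panel per month
p = ggplot(JFM15, aes(x = TMAX)) +
  geom_histogram(binwidth = 5) +  # 5 degrees per bin, an arbitrary choice
  facet_wrap(~mo, ncol = 1)
print(p)
```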
A different geom. Repeat the previous exercise but use geom_density() instead of geom_histogram().
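Swapping the geom is a one-line change. A sketch under the same assumptions as the histogram task (JFM15 with columns mo and TMAX; the stand-in is the six printed rows and is only created when JFM15 does not exist):

```r
library(tidyverse)

# Stand-in: the six printed JFM15 rows, used only if JFM15 does not exist yet
if (!exists("JFM15")) {
  JFM15 = tibble(
    mo = factor(c(1, 2, 3, 1, 2, 3)),
    TMAX = c(41, 50, 45, 47, 52, 51)
  )
}

# Same faceted layout, with a smoothed density in place of binned counts
p = ggplot(JFM15, aes(x = TMAX)) +
  geom_density() +
  facet_wrap(~mo, ncol = 1)
print(p)
```

A density curve has no bin width to choose, so the `stat_bin()` message does not appear.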
Use mutate() to create a new variable TDIFF in the OAW2309 dataframe. It is TMAX - TMIN. Use head() to verify your work.
## # A tibble: 6 × 8
##   DATE        PRCP  TMAX  TMIN mo       dy    yr TDIFF
##   <date>     <dbl> <dbl> <dbl> <fct> <int> <dbl> <dbl>
## 1 1941-05-13  0       66    50 5        13  1941    16
## 2 1941-05-14  0       63    47 5        14  1941    16
## 3 1941-05-15  0.3     58    44 5        15  1941    14
## 4 1941-05-16  1.08    55    45 5        16  1941    10
## 5 1941-05-17  0.06    57    46 5        17  1941    11
## 6 1941-05-18  0       59    39 5        18  1941    20
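A sketch of the mutate step. The key detail is assigning the result back to OAW2309, since `mutate()` returns a new dataframe rather than changing the original. The stand-in holds the first three printed rows and is only created when OAW2309 is not loaded.

```r
library(tidyverse)

# Stand-in: first three printed rows, used only if OAW2309 is not loaded
if (!exists("OAW2309")) {
  OAW2309 = tibble(
    DATE = as.Date("1941-05-13") + 0:2,
    PRCP = c(0, 0, 0.3),
    TMAX = c(66, 63, 58),
    TMIN = c(50, 47, 44)
  )
}

# Daily temperature range; assign back so the new column is kept
OAW2309 = OAW2309 %>%
  mutate(TDIFF = TMAX - TMIN)

head(OAW2309)
```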
Create a new dataframe SUM_DIFF with one row for each of the 12 months. The new variables in this dataframe are mean_diff and sd_diff. Arrange the dataframe by mean_diff.
SUM_DIFF = OAW2309 %>%
  group_by(mo) %>%
  summarize(mean_diff = mean(TDIFF),
            sd_diff = sd(TDIFF)) %>%
  arrange(mean_diff)
head(SUM_DIFF)
## # A tibble: 6 × 3
##   mo    mean_diff sd_diff
##   <fct>     <dbl>   <dbl>
## 1 12         12.2    5.42
## 2 1          13.0    6.25
## 3 11         15.0    6.80
## 4 2          16.5    7.74
## 5 3          19.6    8.58
## 6 10         20.5    8.73
Create a scatterplot of mean_diff and sd_diff.
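A minimal sketch of this scatterplot. Putting mean_diff on the horizontal axis is a choice made here, since the task does not say which variable goes where. The stand-in holds only the six SUM_DIFF rows printed above and is only created when SUM_DIFF does not exist.

```r
library(tidyverse)

# Stand-in: the six printed SUM_DIFF rows, used only if SUM_DIFF does not exist
if (!exists("SUM_DIFF")) {
  SUM_DIFF = tibble(
    mo = factor(c(12, 1, 11, 2, 3, 10), levels = 1:12),
    mean_diff = c(12.2, 13.0, 15.0, 16.5, 19.6, 20.5),
    sd_diff = c(5.42, 6.25, 6.80, 7.74, 8.58, 8.73)
  )
}

# One point per month: average daily range against its variability
p = ggplot(SUM_DIFF, aes(x = mean_diff, y = sd_diff)) +
  geom_point()
print(p)
```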
Create a scatterplot of mean_diff and mo.
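A sketch with mo on the horizontal axis, which is an assumption about the intended layout. Because mo is a factor, ggplot treats the axis as discrete and orders it by factor level, not by the row order produced by `arrange()`. The stand-in holds only the six printed SUM_DIFF rows and is only created when SUM_DIFF does not exist.

```r
library(tidyverse)

# Stand-in: the six printed SUM_DIFF rows, used only if SUM_DIFF does not exist
if (!exists("SUM_DIFF")) {
  SUM_DIFF = tibble(
    mo = factor(c(12, 1, 11, 2, 3, 10), levels = 1:12),
    mean_diff = c(12.2, 13.0, 15.0, 16.5, 19.6, 20.5),
    sd_diff = c(5.42, 6.25, 6.80, 7.74, 8.58, 8.73)
  )
}

# Months appear in level order along the x axis
p = ggplot(SUM_DIFF, aes(x = mo, y = mean_diff)) +
  geom_point()
print(p)
```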
Use the basic dataframe OAW2309 to create a new dataframe SKINNY with just DATE, TMAX, and PRCP. Create a scatterplot with TMAX on the horizontal axis and PRCP on the vertical axis. Use the size parameter of geom_point() to get a reasonable graph. Try sizes below 0.1.
## # A tibble: 6 × 3
##   DATE        TMAX  PRCP
##   <date>     <dbl> <dbl>
## 1 1941-05-13    66  0
## 2 1941-05-14    63  0
## 3 1941-05-15    58  0.3
## 4 1941-05-16    55  1.08
## 5 1941-05-17    57  0.06
## 6 1941-05-18    59  0
Rerun the graphic with a different geom. Use geom_smooth().
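A sketch of the smoothed version, assuming SKINNY has the columns TMAX and PRCP. The stand-in holds the six printed SKINNY rows and is only created when SKINNY does not exist.

```r
library(tidyverse)

# Stand-in: the six printed SKINNY rows, used only if SKINNY does not exist
if (!exists("SKINNY")) {
  SKINNY = tibble(
    DATE = as.Date("1941-05-13") + 0:5,
    TMAX = c(66, 63, 58, 55, 57, 59),
    PRCP = c(0, 0, 0.3, 1.08, 0.06, 0)
  )
}

# Replace the cloud of points with a fitted curve and its confidence band
p = ggplot(SKINNY, aes(x = TMAX, y = PRCP)) +
  geom_smooth()
print(p)
```

With the full dataframe of more than 30,000 rows, `geom_smooth()` picks a GAM instead of loess by default, so the printed message will differ from the loess message seen on smaller data.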