Harold Nelson
2025-02-18
Create a new project, OAW.
Download the file OAW2309.Rdata.
Move the file to your project directory.
file.choose().
getwd().
Get the tidyverse and the OAW2309 dataframe.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Use str()
## tibble [30,075 × 7] (S3: tbl_df/tbl/data.frame)
## $ DATE: Date[1:30075], format: "1941-05-13" "1941-05-14" ...
## $ PRCP: num [1:30075] 0 0 0.3 1.08 0.06 0 0 0 0 0 ...
## $ TMAX: num [1:30075] 66 63 58 55 57 59 58 65 68 85 ...
## $ TMIN: num [1:30075] 50 47 44 45 46 39 40 50 42 46 ...
## $ mo : Factor w/ 12 levels "1","2","3","4",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ dy : int [1:30075] 13 14 15 16 17 18 19 20 21 22 ...
## $ yr : num [1:30075] 1941 1941 1941 1941 1941 ...
It is an alternative to str() and part of the tidyverse.
## Rows: 30,075
## Columns: 7
## $ DATE <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1941…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59, 6…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46, 4…
## $ mo <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
## $ dy <int> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 2…
## $ yr <dbl> 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941,…
This is a useful tool to inspect a dataframe. Install it, then load it.
Do a summary() Use geom_density() and geom_rug()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 50.00 59.00 60.62 71.00 110.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -8.00 33.00 40.00 39.86 47.00 69.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.1362 0.1400 4.8200
Transform to the base 10 logarithm. Add .1 to avoid zero values.
OAW2309 %>%
mutate(LPRCP = log10(PRCP + .1)) %>%
ggplot(aes(x = LPRCP)) +
geom_density(adjust = .5) +
geom_rug()
## Zero Values
What fraction of the days have zero values of PRCP?
What is the smallest non-zero value of PRCP?
## # A tibble: 6 × 7
## DATE PRCP TMAX TMIN mo dy yr
## <date> <dbl> <dbl> <dbl> <fct> <int> <dbl>
## 1 1941-05-26 0.01 59 37 5 26 1941
## 2 1941-06-01 0.01 65 48 6 1 1941
## 3 1941-08-28 0.01 70 51 8 28 1941
## 4 1941-09-11 0.01 66 44 9 11 1941
## 5 1941-11-05 0.01 64 40 11 5 1941
## 6 1941-12-11 0.01 44 34 12 11 1941
What fraction of the non-zero value days have PRCP > .01?
Which years have the greatest number of non-zero days.
years = non_zero %>%
group_by(yr) %>%
summarize(count = n()) %>%
arrange(desc(count))
head(years,10)
## # A tibble: 10 × 2
## yr count
## <dbl> <int>
## 1 1950 193
## 2 2016 190
## 3 1997 188
## 4 1948 186
## 5 1999 185
## 6 1968 184
## 7 1953 183
## 8 1955 183
## 9 1984 183
## 10 1990 183
## # A tibble: 10 × 2
## yr count
## <dbl> <int>
## 1 1952 143
## 2 1970 141
## 3 1993 141
## 4 2015 141
## 5 1985 131
## 6 1944 130
## 7 1987 130
## 8 1943 129
## 9 2023 107
## 10 1941 97
Exclude 1941 and 2023, which are incomplete. Then do a scatterplot with year on the horizontal axis and count on the vertical.
years %>%
filter(yr != 1941 & yr != 2023) %>%
ggplot(aes(x = yr, y = count)) +
geom_point() +
geom_smooth(method = "lm", color = "red")
## `geom_smooth()` using formula = 'y ~ x'
Define a day as bad if PRCP > 0 and TMIN <= 32. Calculate and display the probability that a day will be bad for each month.
OAW2309 %>%
mutate(bad = PRCP > 0 & TMIN <= 32) %>%
group_by(mo) %>%
summarize(fract_bad = mean(bad)) %>%
ungroup()
## # A tibble: 12 × 2
## mo fract_bad
## <fct> <dbl>
## 1 1 0.207
## 2 2 0.184
## 3 3 0.154
## 4 4 0.0709
## 5 5 0.00900
## 6 6 0.000402
## 7 7 0
## 8 8 0
## 9 9 0
## 10 10 0.0232
## 11 11 0.117
## 12 12 0.208