Harold Nelson
2/13/2022
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
This is the order summary that NOAA sent with the data.
Order ID: #2873535
Status: Complete
Date Submitted: 2022-02-10
Product: GHCND (CSV)
Order Details
Processing Completed: 2022-02-10
Stations: GHCND:USW00024227
Begin Date: 1941-05-13 00:00
End Date: 2022-02-07 23:59
Data Types: TMAX TMIN
Units: Standard
Custom Flag(s): Station Name
Eligible for Certification: No
The data is available in Moodle as file_2873535. Bring it into your R Project.
Use readr to read the file into oly_airport, and set the type of the DATE column to “character”.
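A minimal sketch of this step, assuming the file is a comma-separated file with the column names that appear in the glimpse() output. The three rows are inlined with I() so the example runs without the download; for the real data, pass the path to the downloaded file instead (its exact name and extension are not given here).

```r
library(readr)

# A few rows in the assumed layout of the NOAA file, inlined for illustration.
sample_csv <- I('"STATION","NAME","DATE","PRCP","TMAX","TMIN"
"USW00024227","OLYMPIA AIRPORT, WA US","1941-05-13",0.00,66,50
"USW00024227","OLYMPIA AIRPORT, WA US","1941-05-14",0.00,63,47
"USW00024227","OLYMPIA AIRPORT, WA US","1941-05-15",0.30,58,44
')

# Force DATE to character; the remaining column types are guessed by readr.
oly_airport <- read_csv(
  sample_csv,
  col_types = cols(DATE = col_character())
)
```

Without the col_types argument, readr would normally guess a date type for an ISO-8601 column, so the override is what keeps DATE as text for the next step.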
Use glimpse() on oly_airport.
## Rows: 29,497
## Columns: 6
## $ STATION <chr> "USW00024227", "USW00024227", "USW00024227", "USW00024227", "U…
## $ NAME <chr> "OLYMPIA AIRPORT, WA US", "OLYMPIA AIRPORT, WA US", "OLYMPIA A…
## $ DATE <chr> "1941-05-13", "1941-05-14", "1941-05-15", "1941-05-16", "1941-…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46…
The character column DATE contains dates in ISO-8601 format. Use as.Date() to convert it and run glimpse() again.
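One way to do the conversion, sketched on a three-row stand-in for oly_airport so that it runs on its own. Because the strings are already in ISO-8601 form (yyyy-mm-dd), as.Date() needs no format argument.

```r
library(dplyr)
library(tibble)

# Small stand-in for oly_airport with DATE stored as character.
oly_airport <- tibble(
  DATE = c("1941-05-13", "1941-05-14", "1941-05-15"),
  TMAX = c(66, 63, 58)
)

# Replace the character column with a Date column of the same name.
oly_airport <- oly_airport %>%
  mutate(DATE = as.Date(DATE))

glimpse(oly_airport)
```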
## Rows: 29,497
## Columns: 6
## $ STATION <chr> "USW00024227", "USW00024227", "USW00024227", "USW00024227", "U…
## $ NAME <chr> "OLYMPIA AIRPORT, WA US", "OLYMPIA AIRPORT, WA US", "OLYMPIA A…
## $ DATE <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46…
Do a summary() and check for anomalies.
## STATION NAME DATE PRCP
## Length:29497 Length:29497 Min. :1941-05-13 Min. :0.0000
## Class :character Class :character 1st Qu.:1961-07-21 1st Qu.:0.0000
## Mode :character Mode :character Median :1981-09-28 Median :0.0000
## Mean :1981-09-28 Mean :0.1369
## 3rd Qu.:2001-12-06 3rd Qu.:0.1400
## Max. :2022-02-13 Max. :4.8200
## NA's :12
## TMAX TMIN
## Min. : 18.00 Min. :-8.00
## 1st Qu.: 50.00 1st Qu.:33.00
## Median : 59.00 Median :40.00
## Mean : 60.55 Mean :39.83
## 3rd Qu.: 71.00 3rd Qu.:47.00
## Max. :110.00 Max. :69.00
## NA's :22 NA's :20
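The summary reports NA's in PRCP, TMAX, and TMIN. A possible way to list the affected rows is to filter on is.na() for each measurement column. The sketch below uses a small stand-in table: three rows copied from the 1996 records in this handout, plus one made-up complete row (1996-01-23) added only so the filter has something to exclude.

```r
library(dplyr)
library(tibble)

# Stand-in data: the first row is hypothetical, the other three appear
# in the listing of incomplete rows.
oly_airport <- tibble(
  DATE = as.Date(c("1996-01-23", "1996-01-24", "1996-07-03", "1996-12-26")),
  PRCP = c(0.10, NA, 0.12, NA),
  TMAX = c(45, 39, 67, NA),
  TMIN = c(35, 33, NA, NA)
)

# Keep only the rows where at least one measurement is missing.
missing_rows <- oly_airport %>%
  filter(is.na(PRCP) | is.na(TMAX) | is.na(TMIN))

missing_rows
```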
## # A tibble: 25 × 6
## STATION NAME DATE PRCP TMAX TMIN
## <chr> <chr> <date> <dbl> <dbl> <dbl>
## 1 USW00024227 OLYMPIA AIRPORT, WA US 1996-01-24 NA 39 33
## 2 USW00024227 OLYMPIA AIRPORT, WA US 1996-07-03 0.12 67 NA
## 3 USW00024227 OLYMPIA AIRPORT, WA US 1996-12-26 NA NA NA
## 4 USW00024227 OLYMPIA AIRPORT, WA US 1996-12-27 NA NA NA
## 5 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-06 0 NA 28
## 6 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-07 0 61 NA
## 7 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-12 0 NA 28
## 8 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-13 0.39 NA NA
## 9 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-14 0.35 NA NA
## 10 USW00024227 OLYMPIA AIRPORT, WA US 1997-05-07 0 NA NA
## # … with 15 more rows
The missing data came from periods in 1996-97 and 2021-22.
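The next summary has 29,472 rows, which is 25 fewer than the original 29,497 and matches the 25 incomplete rows listed above. That is consistent with the incomplete rows having been dropped. One way to do that, shown as an assumption rather than as the method actually used, is tidyr's drop_na(). The stand-in table includes one made-up complete row (1996-01-23).

```r
library(tidyr)
library(tibble)

# Stand-in data: the first row is hypothetical, the rest have missing values.
oly_airport <- tibble(
  DATE = as.Date(c("1996-01-23", "1996-01-24", "1996-07-03", "1996-12-26")),
  PRCP = c(0.10, NA, 0.12, NA),
  TMAX = c(45, 39, 67, NA),
  TMIN = c(35, 33, NA, NA)
)

# Drop every row that has an NA in any column.
oly_airport <- drop_na(oly_airport)

summary(oly_airport)
```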
## STATION NAME DATE PRCP
## Length:29472 Length:29472 Min. :1941-05-13 Min. :0.0000
## Class :character Class :character 1st Qu.:1961-07-14 1st Qu.:0.0000
## Mode :character Mode :character Median :1981-09-15 Median :0.0000
## Mean :1981-09-19 Mean :0.1369
## 3rd Qu.:2001-12-01 3rd Qu.:0.1400
## Max. :2022-02-11 Max. :4.8200
## TMAX TMIN
## Min. : 18.00 Min. :-8.00
## 1st Qu.: 50.00 1st Qu.:33.00
## Median : 59.00 Median :40.00
## Mean : 60.55 Mean :39.83
## 3rd Qu.: 71.00 3rd Qu.:47.00
## Max. :110.00 Max. :69.00
Create density plots with rugs for TMAX and TMIN.
For TMAX, note the peak at a moderate temperature and a shoulder with a secondary peak to the right of the primary peak.
For TMIN, note the pronounced left skew and the single primary peak with weak shoulders to the left and right.
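A sketch of how such plots might be built with ggplot2, combining geom_density() with geom_rug(). The data frame temps is a hypothetical stand-in holding the first 16 TMAX and TMIN values from the glimpse() output, so the shapes it produces will not match the full-data plots described above.

```r
library(ggplot2)

# Stand-in data: the first 16 daily values shown in the glimpse() output.
temps <- data.frame(
  TMAX = c(66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59),
  TMIN = c(50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46)
)

# Density curve with a rug of individual observations along the x axis.
p_tmax <- ggplot(temps, aes(x = TMAX)) +
  geom_density() +
  geom_rug()

p_tmin <- ggplot(temps, aes(x = TMIN)) +
  geom_density() +
  geom_rug()
```

Printing p_tmax or p_tmin draws the plot. With the full table of roughly 29,000 rows, replace temps with oly_airport; the rug marks will overlap heavily because the temperatures are whole degrees.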
Load the package lubridate. Then use the functions year(), month() and day() to create the variables yr, mo, and dy. Make these variables factors. Use glimpse() and summary() to verify the results.
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
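# Assumes dplyr and lubridate are already loaded.
oly_airport <- oly_airport %>%
  mutate(yr = factor(year(DATE)),    # calendar year
         mo = factor(month(DATE)),   # month number, 1 to 12
         dy = factor(day(DATE)))     # day of the month, 1 to 31
glimpse(oly_airport)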
## Rows: 29,472
## Columns: 9
## $ STATION <chr> "USW00024227", "USW00024227", "USW00024227", "USW00024227", "U…
## $ NAME <chr> "OLYMPIA AIRPORT, WA US", "OLYMPIA AIRPORT, WA US", "OLYMPIA A…
## $ DATE <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1…
## $ PRCP <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.…
## $ TMAX <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59…
## $ TMIN <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46…
## $ yr <fct> 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 19…
## $ mo <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6,…
## $ dy <fct> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28…
## STATION NAME DATE PRCP
## Length:29472 Length:29472 Min. :1941-05-13 Min. :0.0000
## Class :character Class :character 1st Qu.:1961-07-14 1st Qu.:0.0000
## Mode :character Mode :character Median :1981-09-15 Median :0.0000
## Mean :1981-09-19 Mean :0.1369
## 3rd Qu.:2001-12-01 3rd Qu.:0.1400
## Max. :2022-02-11 Max. :4.8200
##
## TMAX TMIN yr mo
## Min. : 18.00 Min. :-8.00 1944 : 366 8 : 2511
## 1st Qu.: 50.00 1st Qu.:33.00 1948 : 366 10 : 2511
## Median : 59.00 Median :40.00 1952 : 366 7 : 2510
## Mean : 60.55 Mean :39.83 1956 : 366 12 : 2506
## 3rd Qu.: 71.00 3rd Qu.:47.00 1960 : 366 1 : 2504
## Max. :110.00 Max. :69.00 1964 : 366 5 : 2494
## (Other):27276 (Other):14436
## dy
## 1 : 969
## 2 : 969
## 4 : 969
## 5 : 969
## 10 : 969
## 11 : 969
## (Other):23658
You will be able to get the data without rerunning this.