Importing Weather Data

Harold Nelson

2/13/2022

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readr)

Data Summary

This is the order summary that NOAA supplied with the data.

Order ID: #2873535
Status: Complete
Date Submitted: 2022-02-10
Product: GHCND (CSV)

Order Details
Processing Completed: 2022-02-10
Stations: GHCND:USW00024227
Begin Date: 1941-05-13 00:00
End Date: 2022-02-07 23:59
Data Types: TMAX TMIN
Units: Standard
Custom Flag(s): Station Name
Eligible for Certification: No

Download from Moodle

The data is available in Moodle as file_2873535. Bring it into your R Project.

Import into R

Use read_csv() from readr and set the type of the DATE column to character.

Solution

oly_airport <- read_csv("file_2877326.csv", col_types = cols(DATE = col_character()))
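To see what the col_types argument changes without needing the Moodle file, the sketch below parses a few made-up lines of CSV text. The inline data and the names csv_text, guessed and demo are illustrative assumptions, and passing literal text through I() assumes readr 2.0 or later.

```r
library(readr)

# Made-up CSV text standing in for the NOAA file (illustrative only).
csv_text <- "STATION,DATE,TMAX,TMIN\nUSW00024227,1941-05-13,66,50\nUSW00024227,1941-05-14,63,47\n"

# Default guessing: readr recognises ISO 8601 dates and parses DATE as <date>.
guessed <- read_csv(I(csv_text), show_col_types = FALSE)

# Explicit override: DATE stays <chr>; the other columns are still guessed.
demo <- read_csv(I(csv_text), col_types = cols(DATE = col_character()))

class(guessed$DATE)
class(demo$DATE)
```

Only the columns named inside cols() are overridden, which is why TMAX and TMIN still come back as numbers.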

Examine

Use glimpse() on oly_airport.

Solution

glimpse(oly_airport)
## Rows: 29,497
## Columns: 6
## $ STATION <chr> "USW00024227", "USW00024227", "USW00024227", "USW00024227", "U…
## $ NAME    <chr> "OLYMPIA AIRPORT, WA US", "OLYMPIA AIRPORT, WA US", "OLYMPIA A…
## $ DATE    <chr> "1941-05-13", "1941-05-14", "1941-05-15", "1941-05-16", "1941-…
## $ PRCP    <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.…
## $ TMAX    <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59…
## $ TMIN    <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46…

The character column DATE contains dates in ISO 8601 format. Use as.Date() to convert it to the Date class, then run glimpse() again.

Solution

oly_airport$DATE = as.Date(oly_airport$DATE)
glimpse(oly_airport)
## Rows: 29,497
## Columns: 6
## $ STATION <chr> "USW00024227", "USW00024227", "USW00024227", "USW00024227", "U…
## $ NAME    <chr> "OLYMPIA AIRPORT, WA US", "OLYMPIA AIRPORT, WA US", "OLYMPIA A…
## $ DATE    <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1…
## $ PRCP    <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.…
## $ TMAX    <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59…
## $ TMIN    <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46…
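A short sketch of what the conversion buys you, using a few illustrative date strings rather than the full data set: once the values are of class Date, date arithmetic and range() work directly.

```r
# A few ISO 8601 date strings (illustrative values).
date_text <- c("1941-05-13", "1941-05-15", "2022-02-07")

# as.Date() parses the YYYY-MM-DD layout without a format argument.
dates <- as.Date(date_text)

class(dates)                       # "Date"
as.numeric(dates[2] - dates[1])    # 2 (days between the first two dates)
range(dates)                       # earliest and latest date
```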

Summary

Run summary() and check for anomalies.

Solution

summary(oly_airport)
##    STATION              NAME                DATE                 PRCP       
##  Length:29497       Length:29497       Min.   :1941-05-13   Min.   :0.0000  
##  Class :character   Class :character   1st Qu.:1961-07-21   1st Qu.:0.0000  
##  Mode  :character   Mode  :character   Median :1981-09-28   Median :0.0000  
##                                        Mean   :1981-09-28   Mean   :0.1369  
##                                        3rd Qu.:2001-12-06   3rd Qu.:0.1400  
##                                        Max.   :2022-02-13   Max.   :4.8200  
##                                                             NA's   :12      
##       TMAX             TMIN      
##  Min.   : 18.00   Min.   :-8.00  
##  1st Qu.: 50.00   1st Qu.:33.00  
##  Median : 59.00   Median :40.00  
##  Mean   : 60.55   Mean   :39.83  
##  3rd Qu.: 71.00   3rd Qu.:47.00  
##  Max.   :110.00   Max.   :69.00  
##  NA's   :22       NA's   :20
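Beyond reading the summary() table by eye, a few logical checks can flag impossible values. The sketch below is one possible approach on made-up readings, not a check taken from the original exercise; the data frame toy and the rules tested (TMAX below TMIN, negative precipitation) are assumptions chosen for illustration.

```r
# Made-up readings; the third row is deliberately inconsistent.
toy <- data.frame(
  TMAX = c(66, 63, 40, NA),
  TMIN = c(50, 47, 45, 38),
  PRCP = c(0.00, 0.30, -0.10, 0.06)
)

# which() drops the NA comparisons, so only definite violations are returned.
bad_temp <- which(toy$TMAX < toy$TMIN)
bad_prcp <- which(toy$PRCP < 0)

bad_temp
bad_prcp

# Missing values per column, matching the NA's lines in summary().
colSums(is.na(toy))
```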

Inspect the rows that contain NA values, then drop these records.

Solution

oly_airport %>% filter(is.na(TMAX) |
                       is.na(TMIN) |
                       is.na(PRCP))
## # A tibble: 25 × 6
##    STATION     NAME                   DATE        PRCP  TMAX  TMIN
##    <chr>       <chr>                  <date>     <dbl> <dbl> <dbl>
##  1 USW00024227 OLYMPIA AIRPORT, WA US 1996-01-24 NA       39    33
##  2 USW00024227 OLYMPIA AIRPORT, WA US 1996-07-03  0.12    67    NA
##  3 USW00024227 OLYMPIA AIRPORT, WA US 1996-12-26 NA       NA    NA
##  4 USW00024227 OLYMPIA AIRPORT, WA US 1996-12-27 NA       NA    NA
##  5 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-06  0       NA    28
##  6 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-07  0       61    NA
##  7 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-12  0       NA    28
##  8 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-13  0.39    NA    NA
##  9 USW00024227 OLYMPIA AIRPORT, WA US 1997-04-14  0.35    NA    NA
## 10 USW00024227 OLYMPIA AIRPORT, WA US 1997-05-07  0       NA    NA
## # … with 15 more rows

The missing data came from periods in 1996-97 and 2021-22.

oly_airport = oly_airport %>% drop_na()
summary(oly_airport)
##    STATION              NAME                DATE                 PRCP       
##  Length:29472       Length:29472       Min.   :1941-05-13   Min.   :0.0000  
##  Class :character   Class :character   1st Qu.:1961-07-14   1st Qu.:0.0000  
##  Mode  :character   Mode  :character   Median :1981-09-15   Median :0.0000  
##                                        Mean   :1981-09-19   Mean   :0.1369  
##                                        3rd Qu.:2001-12-01   3rd Qu.:0.1400  
##                                        Max.   :2022-02-11   Max.   :4.8200  
##       TMAX             TMIN      
##  Min.   : 18.00   Min.   :-8.00  
##  1st Qu.: 50.00   1st Qu.:33.00  
##  Median : 59.00   Median :40.00  
##  Mean   : 60.55   Mean   :39.83  
##  3rd Qu.: 71.00   3rd Qu.:47.00  
##  Max.   :110.00   Max.   :69.00
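The inspect-then-drop pattern can be tried on a small made-up tibble to confirm that drop_na() removes exactly the rows the filter found. The tibble toy and its values are illustrative assumptions, and if_any() assumes dplyr 1.0.4 or later.

```r
library(dplyr)
library(tidyr)

# Made-up rows; the second and third contain missing measurements.
toy <- tibble(
  DATE = as.Date(c("1996-12-25", "1996-12-26", "1996-12-27", "1996-12-28")),
  PRCP = c(0.10, NA, 0.00, 0.25),
  TMAX = c(45, NA, NA, 47),
  TMIN = c(33, NA, 30, 35)
)

# Rows with at least one NA in the three measurement columns.
incomplete <- toy %>% filter(if_any(c(PRCP, TMAX, TMIN), is.na))

# drop_na() with no column arguments removes every row containing any NA.
complete <- toy %>% drop_na()

# Every row lands in exactly one of the two sets.
nrow(incomplete) + nrow(complete) == nrow(toy)
```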

Graphics

Create density plots with rugs for TMAX and TMIN.

Solution

oly_airport %>% 
  ggplot(aes(x = TMAX)) +
  geom_density() +
  geom_rug() +
  ggtitle("TMAX")

Note the peak at a moderate temperature and a shoulder with a secondary peak to the right of the primary peak.

oly_airport %>% 
  ggplot(aes(x = TMIN)) +
  geom_density() +
  geom_rug() +
  ggtitle("TMIN")

Note the pronounced left skew and the single primary peak with weak shoulders to the left and right.
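If you want to compare the two shapes on a shared x axis, one option is to reshape the data to long form and facet by measure. This is a sketch of an alternative layout, not part of the original solution; the data frame toy, the column names measure and temp, and the values are made up for illustration.

```r
library(ggplot2)
library(tidyr)

# Made-up temperatures standing in for oly_airport (illustrative only).
toy <- data.frame(
  TMAX = c(66, 63, 58, 55, 57, 85, 84, 75),
  TMIN = c(50, 47, 44, 45, 46, 46, 46, 50)
)

# Stack the two columns into one value column plus a label column.
long <- pivot_longer(toy, cols = c(TMAX, TMIN),
                     names_to = "measure", values_to = "temp")

# One panel per measure, sharing the x axis so the shapes are comparable.
p <- ggplot(long, aes(x = temp)) +
  geom_density() +
  geom_rug() +
  facet_wrap(~ measure, ncol = 1) +
  ggtitle("TMAX and TMIN")

p
```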

Add some Data

Load the package lubridate. Then use the functions year(), month() and day() to create the variables yr, mo, and dy. Make these variables factors. Use glimpse() and summary() to verify the results.

Solution

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
oly_airport  = oly_airport %>% 
  mutate(yr = factor(year(DATE)),
         mo = factor(month(DATE)),
         dy = factor(day(DATE)))

glimpse(oly_airport)
## Rows: 29,472
## Columns: 9
## $ STATION <chr> "USW00024227", "USW00024227", "USW00024227", "USW00024227", "U…
## $ NAME    <chr> "OLYMPIA AIRPORT, WA US", "OLYMPIA AIRPORT, WA US", "OLYMPIA A…
## $ DATE    <date> 1941-05-13, 1941-05-14, 1941-05-15, 1941-05-16, 1941-05-17, 1…
## $ PRCP    <dbl> 0.00, 0.00, 0.30, 1.08, 0.06, 0.00, 0.00, 0.00, 0.00, 0.00, 0.…
## $ TMAX    <dbl> 66, 63, 58, 55, 57, 59, 58, 65, 68, 85, 84, 75, 72, 59, 61, 59…
## $ TMIN    <dbl> 50, 47, 44, 45, 46, 39, 40, 50, 42, 46, 46, 50, 41, 37, 48, 46…
## $ yr      <fct> 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 1941, 19…
## $ mo      <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6,…
## $ dy      <fct> 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28…
summary(oly_airport)
##    STATION              NAME                DATE                 PRCP       
##  Length:29472       Length:29472       Min.   :1941-05-13   Min.   :0.0000  
##  Class :character   Class :character   1st Qu.:1961-07-14   1st Qu.:0.0000  
##  Mode  :character   Mode  :character   Median :1981-09-15   Median :0.0000  
##                                        Mean   :1981-09-19   Mean   :0.1369  
##                                        3rd Qu.:2001-12-01   3rd Qu.:0.1400  
##                                        Max.   :2022-02-11   Max.   :4.8200  
##                                                                             
##       TMAX             TMIN             yr              mo       
##  Min.   : 18.00   Min.   :-8.00   1944   :  366   8      : 2511  
##  1st Qu.: 50.00   1st Qu.:33.00   1948   :  366   10     : 2511  
##  Median : 59.00   Median :40.00   1952   :  366   7      : 2510  
##  Mean   : 60.55   Mean   :39.83   1956   :  366   12     : 2506  
##  3rd Qu.: 71.00   3rd Qu.:47.00   1960   :  366   1      : 2504  
##  Max.   :110.00   Max.   :69.00   1964   :  366   5      : 2494  
##                                   (Other):27276   (Other):14436  
##        dy       
##  1      :  969  
##  2      :  969  
##  4      :  969  
##  5      :  969  
##  10     :  969  
##  11     :  969  
##  (Other):23658
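One detail worth checking when turning date parts into factors is the order of the levels. The sketch below uses three illustrative dates (not from the data set) to show that factor() applied to numbers keeps numeric order, and how to get the numbers back if you need them later.

```r
library(lubridate)

# Illustrative dates, deliberately out of calendar order.
dates <- as.Date(c("1941-12-25", "1941-02-03", "1942-10-31"))

# year(), month() and day() return numbers.
yr <- year(dates)
mo <- month(dates)
dy <- day(dates)

# factor() sorts the numeric values before labelling them,
# so the levels run 2, 10, 12 rather than alphabetically (10, 12, 2).
mo_f <- factor(mo)
levels(mo_f)

# Converting back needs as.character() first;
# as.numeric(mo_f) alone would return the internal level codes.
as.numeric(as.character(mo_f))
```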

Save the File.

Saving the data frame lets you load it later with load() without rerunning this import.

Solution

save(oly_airport, file = "oly_airport.Rdata")
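A minimal round trip shows how the saved file is read back. The object toy and the temporary path are illustrative assumptions so that nothing in your project is overwritten; in practice you would call load("oly_airport.Rdata") in a later session.

```r
# Small stand-in object (illustrative only).
toy <- data.frame(DATE = as.Date(c("1941-05-13", "1941-05-14")),
                  TMAX = c(66, 63))

# Write to a temporary file so nothing in the project is overwritten.
path <- tempfile(fileext = ".Rdata")
save(toy, file = path)

# load() restores objects under their original names; loading into a
# separate environment makes it easy to see what came back.
restored <- new.env()
loaded_names <- load(path, envir = restored)

loaded_names
identical(restored$toy, toy)
```

Because load() recreates the object under the name it was saved with, no assignment is needed when loading into the global environment.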