Instructions

“dplyr” comes with a data set called storms. This is one of the data sets available in the National Hurricane Center (NHC) Data Archive, which is part of the National Oceanic and Atmospheric Administration (NOAA). In particular, the data set storms refers to the Atlantic hurricane database best track data.

The following code loads the dplyr package and glimpses the “storms” data set:

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
glimpse(storms)
## Rows: 19,066
## Columns: 13
## $ name                         <chr> "Amy", "Amy", "Amy", "Amy", "Amy", "Amy",…
## $ year                         <dbl> 1975, 1975, 1975, 1975, 1975, 1975, 1975,…
## $ month                        <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,…
## $ day                          <int> 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 2…
## $ hour                         <dbl> 0, 6, 12, 18, 0, 6, 12, 18, 0, 6, 12, 18,…
## $ lat                          <dbl> 27.5, 28.5, 29.5, 30.5, 31.5, 32.4, 33.3,…
## $ long                         <dbl> -79.0, -79.0, -79.0, -79.0, -78.8, -78.7,…
## $ status                       <fct> tropical depression, tropical depression,…
## $ category                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wind                         <int> 25, 25, 25, 25, 25, 25, 25, 30, 35, 40, 4…
## $ pressure                     <int> 1013, 1013, 1013, 1013, 1012, 1012, 1011,…
## $ tropicalstorm_force_diameter <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hurricane_force_diameter     <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

The data includes the positions and attributes of tropical storms, measured every six hours during the lifetime of a storm. The version glimpsed above contains 19,066 such records.

You can find some technical description of storms by taking a peek at its manual (or help) documentation: ?storms

Here’s a full description of all the columns:

name: Storm name
year, month, and day: Date of report
hour: Hour of report (in UTC)
lat: Latitude
long: Longitude
status: Storm classification (for example Tropical Depression, Tropical Storm, or Hurricane)
category: Saffir-Simpson storm category (estimated from wind speed). Older versions of the data coded -1 = Tropical Depression and 0 = Tropical Storm; in the version glimpsed above, category is NA for storms that are not hurricanes
wind: Storm’s maximum sustained wind speed (in knots)
pressure: Air pressure at the storm’s center (in millibars)
tropicalstorm_force_diameter (ts_diameter in older versions of dplyr): Diameter of the area experiencing tropical storm strength winds (34 knots or above)
hurricane_force_diameter (hu_diameter in older versions of dplyr): Diameter of the area experiencing hurricane strength winds (64 knots or above)
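The phrase "estimated from wind speed" can be made concrete with a small sketch. It assumes the commonly quoted Saffir-Simpson cut-offs in knots (64, 83, 96, 113, 137) and uses a made-up data frame `toy`; neither the thresholds nor the column name `category_est` come from the storms documentation itself.

```r
library(dplyr)

# Hypothetical wind speeds in knots (not taken from storms)
toy <- data.frame(wind = c(30, 50, 64, 90, 100, 120, 140))

# Assumed Saffir-Simpson cut-offs in knots; below 64 knots there is no category
toy <- toy %>%
  mutate(category_est = case_when(
    wind >= 137 ~ 5,
    wind >= 113 ~ 4,
    wind >= 96  ~ 3,
    wind >= 83  ~ 2,
    wind >= 64  ~ 1,
    TRUE        ~ NA_real_
  ))
```

Because `case_when()` evaluates its conditions in order, the highest threshold has to come first.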

Answer the following questions:

  1. Use “dplyr” functions/commands to create a table “storm_names_1980s” containing the name and year of storms recorded during the 1980s (i.e. from 1980 to 1989).
storm_names_1980s <- filter(storms, year >= 1980 & year <= 1989) %>%
  distinct(name, year)
storm_names_1980s
## # A tibble: 90 × 2
##    name      year
##    <chr>    <dbl>
##  1 Allen     1980
##  2 Bonnie    1980
##  3 Charley   1980
##  4 Georges   1980
##  5 Earl      1980
##  6 Danielle  1980
##  7 Frances   1980
##  8 Hermine   1980
##  9 Ivan      1980
## 10 Jeanne    1980
## # … with 80 more rows
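As an alternative way to write the year filter, `between()` expresses the closed range in one call. The sketch below runs on a made-up data frame `toy` (hypothetical names and years) so that the result is easy to check by eye.

```r
library(dplyr)

# Hypothetical toy records (names and years made up for illustration)
toy <- data.frame(
  name = c("A", "A", "B", "C", "C", "D"),
  year = c(1979, 1979, 1980, 1989, 1989, 1990),
  stringsAsFactors = FALSE
)

# between() is inclusive on both ends, so this keeps 1980 through 1989
toy_1980s <- toy %>%
  filter(between(year, 1980, 1989)) %>%
  distinct(name, year)
```

Only storms "B" (1980) and "C" (1989) remain, each listed once.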
  2. Use “dplyr” functions/commands to create a table “storms_per_year” containing the number of storms recorded in each year (i.e. counts or frequencies of storms in each year). This table should contain two columns: year values in the first column, and number of storms in the second column. (HINT: Investigate the distinct() function from dplyr)
storms_per_year <- group_by(storms, year) %>%
  summarize(number_of_storms = n_distinct(name))
storms_per_year
## # A tibble: 47 × 2
##     year number_of_storms
##    <dbl>            <int>
##  1  1975                8
##  2  1976                7
##  3  1977                6
##  4  1978               11
##  5  1979                8
##  6  1980               11
##  7  1981               11
##  8  1982                5
##  9  1983                4
## 10  1984               12
## # … with 37 more rows
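The hint points at `distinct()`, so here is a sketch of how the same count could be built from `distinct()` followed by `count()`. The data frame `toy` is made up for illustration; on it, the approach should agree with `n_distinct(name)` per year.

```r
library(dplyr)

# Hypothetical toy records: several rows per storm, as in storms
toy <- data.frame(
  name = c("A", "A", "B", "A", "C", "C", "D"),
  year = c(2000, 2000, 2000, 2001, 2001, 2001, 2001),
  stringsAsFactors = FALSE
)

# Reduce to one row per (year, name) pair, then count rows per year
toy_per_year <- toy %>%
  distinct(year, name) %>%
  count(year, name = "number_of_storms")
```

Here 2000 has two storms ("A", "B") and 2001 has three ("A", "C", "D").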
  3. Use “dplyr” functions/commands to create a table “storm_records_per_year” containing three columns: 1) name of storm, 2) year of storm, and 3) count for number of records (of the corresponding storm).
storm_records_per_year <- group_by(storms, name, year) %>%
  summarize(number_of_records = n())
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
storm_records_per_year
## # A tibble: 639 × 3
## # Groups:   name [258]
##    name      year number_of_records
##    <chr>    <dbl>             <int>
##  1 AL011993  1993                11
##  2 AL012000  2000                 4
##  3 AL021992  1992                 5
##  4 AL021994  1994                 6
##  5 AL021999  1999                 4
##  6 AL022000  2000                12
##  7 AL022001  2001                 5
##  8 AL022003  2003                 4
##  9 AL022006  2006                13
## 10 AL031987  1987                32
## # … with 629 more rows
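The `group_by()` plus `summarize(n())` pattern can also be written with `count()`, which returns an ungrouped result and so avoids the grouping message. A minimal sketch on a made-up data frame `toy`:

```r
library(dplyr)

# Hypothetical toy records: storm "A" appears in two different years
toy <- data.frame(
  name = c("A", "A", "A", "B", "A"),
  year = c(2000, 2000, 2000, 2000, 2001),
  stringsAsFactors = FALSE
)

# The unnamed `name` and `year` are the grouping columns;
# the named argument `name =` only sets the label of the count column
toy_records <- toy %>%
  count(name, year, name = "number_of_records")
```

Counting by both `name` and `year` matters because the same storm name can be reused in a later year.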
  4. Use “dplyr” functions/commands to display the different (unique) types of storm status. (HINT: Investigate the distinct() function from dplyr)
distinct(storms, status)
## # A tibble: 9 × 1
##   status                
##   <fct>                 
## 1 tropical depression   
## 2 tropical storm        
## 3 extratropical         
## 4 hurricane             
## 5 subtropical storm     
## 6 subtropical depression
## 7 disturbance           
## 8 other low             
## 9 tropical wave
  5. Use “dplyr” functions/commands to display the different types of storm categories.
distinct(storms, category)
## # A tibble: 6 × 1
##   category
##      <dbl>
## 1       NA
## 2        1
## 3        3
## 4        2
## 5        4
## 6        5
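`distinct()` returns values in order of first appearance, which is why the categories above are not sorted. If a sorted listing is preferred, `arrange()` can be appended; the sketch below uses a made-up data frame `toy`.

```r
library(dplyr)

# Hypothetical toy categories, deliberately out of order and with repeats
toy <- data.frame(category = c(NA, 1, 3, 2, 3, NA))

# arrange() sorts ascending and places NA last
toy_categories <- toy %>%
  distinct(category) %>%
  arrange(category)
```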
  6. Use “dplyr” functions/commands to create a table “storms_categ5” containing the name and year of those storms of category 5. (HINT: Investigate the distinct() function from dplyr)
storms_categ5 <- filter(storms, category == 5) %>%
  distinct(name, year)
storms_categ5
## # A tibble: 21 × 2
##    name     year
##    <chr>   <dbl>
##  1 Anita    1977
##  2 David    1979
##  3 Allen    1980
##  4 Gilbert  1988
##  5 Hugo     1989
##  6 Andrew   1992
##  7 Mitch    1998
##  8 Isabel   2003
##  9 Ivan     2004
## 10 Emily    2005
## # … with 11 more rows
  7. Use “dplyr” functions/commands to display a table “storms_statistics” showing the status, avg_pressure (average pressure), and avg_wind (average wind speed) for each storm category. This table should contain four columns: 1) category, 2) status, 3) avg_pressure, and 4) avg_wind.
storms_statistics <- group_by(storms, category, status) %>%
  summarize(avg_pressure = mean(pressure, na.rm = TRUE), avg_wind = mean(wind, na.rm = TRUE)) %>%
  select(category, status, avg_pressure, avg_wind)
## `summarise()` has grouped output by 'category'. You can override using the
## `.groups` argument.
storms_statistics
## # A tibble: 13 × 4
## # Groups:   category [6]
##    category status                 avg_pressure avg_wind
##       <dbl> <fct>                         <dbl>    <dbl>
##  1        1 hurricane                      981.     71.0
##  2        2 hurricane                      967.     89.5
##  3        3 hurricane                      955.    104. 
##  4        4 hurricane                      940.    122. 
##  5        5 hurricane                      918.    147. 
##  6       NA disturbance                   1010.     29.3
##  7       NA extratropical                  993.     41.4
##  8       NA other low                     1009.     25.4
##  9       NA subtropical depression        1008.     26.7
## 10       NA subtropical storm              998.     44.5
## 11       NA tropical depression           1008.     27.5
## 12       NA tropical storm                 999.     45.7
## 13       NA tropical wave                 1009.     28.6
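The message about grouped output appears because `summarize()` drops only the last grouping variable by default. A sketch of one way to get an ungrouped table, assuming dplyr 1.0.0 or later (where the `.groups` argument is available) and a made-up data frame `toy`:

```r
library(dplyr)

# Hypothetical toy records with the same column names as storms
toy <- data.frame(
  category = c(1, 1, NA, NA),
  status   = c("hurricane", "hurricane", "tropical storm", "tropical storm"),
  pressure = c(980, 990, 1000, 1004),
  wind     = c(70, 80, 40, 50),
  stringsAsFactors = FALSE
)

# .groups = "drop" returns a plain (ungrouped) result and silences the message
toy_stats <- toy %>%
  group_by(category, status) %>%
  summarize(
    avg_pressure = mean(pressure, na.rm = TRUE),
    avg_wind     = mean(wind, na.rm = TRUE),
    .groups = "drop"
  )
```

The grouping columns are kept automatically, so a trailing `select()` is only needed to reorder columns.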
  8. Use “dplyr” functions/commands to create a table “max_wind_per_storm” containing three columns: 1) year of storm, 2) name of storm, and 3) max_wind, the maximum wind speed recorded (for that storm).
max_wind_per_storm <- group_by(storms, year, name) %>%
  summarize(max_wind = max(wind, na.rm = TRUE)) %>%
  select(year, name, max_wind)
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
max_wind_per_storm
## # A tibble: 639 × 3
## # Groups:   year [47]
##     year name     max_wind
##    <dbl> <chr>       <int>
##  1  1975 Amy            60
##  2  1975 Blanche        75
##  3  1975 Caroline      100
##  4  1975 Doris          95
##  5  1975 Eloise        110
##  6  1975 Faye           90
##  7  1975 Gladys        120
##  8  1975 Hallie         45
##  9  1976 Belle         105
## 10  1976 Candice        80
## # … with 629 more rows
  9. Use “dplyr” functions/commands to create a table “max_wind_per_year” containing three columns: 1) year of storm, 2) name of storm, and 3) wind, the maximum wind speed recorded (for that year). Arrange rows by wind speed in decreasing order.
max_wind_per_year <- group_by(storms, year, name) %>%
  summarize(max_wind = max(wind, na.rm = TRUE)) %>%
  select(year, name, max_wind) %>%
  arrange(desc(max_wind))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
max_wind_per_year
## # A tibble: 639 × 3
## # Groups:   year [47]
##     year name    max_wind
##    <dbl> <chr>      <int>
##  1  1980 Allen        165
##  2  1988 Gilbert      160
##  3  2005 Wilma        160
##  4  2019 Dorian       160
##  5  1998 Mitch        155
##  6  2005 Rita         155
##  7  2017 Irma         155
##  8  1977 Anita        150
##  9  1979 David        150
## 10  1992 Andrew       150
## # … with 629 more rows
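The table above has 639 rows, one per storm, whereas the question can be read as asking for one row per year: the storm holding that year's maximum wind speed. A sketch of that reading, using `slice_max()` (dplyr 1.0.0 or later) on a made-up data frame `toy`; `with_ties = FALSE` is an assumption that a single storm per year is wanted when several share the maximum.

```r
library(dplyr)

# Hypothetical toy records: two years, several storms per year
toy <- data.frame(
  year = c(2000, 2000, 2000, 2001, 2001),
  name = c("A", "A", "B", "C", "D"),
  wind = c(60, 90, 120, 100, 80),
  stringsAsFactors = FALSE
)

# Keep the single strongest record within each year, then sort by wind
toy_max_per_year <- toy %>%
  group_by(year) %>%
  slice_max(wind, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(year, name, wind) %>%
  arrange(desc(wind))
```

On the toy data this leaves storm "B" for 2000 and storm "C" for 2001; applied to storms, the same pipeline would give one row for each year in the data.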