The dplyr package comes with a data set called storms. It is one of the data sets available in the National Hurricane Center (NHC) Data Archive, which is part of the National Oceanic and Atmospheric Administration (NOAA). Specifically, storms is drawn from the Atlantic hurricane database best track data.
The following code loads the dplyr package and glimpses the storms data set:
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
glimpse(storms)
## Rows: 19,066
## Columns: 13
## $ name <chr> "Amy", "Amy", "Amy", "Amy", "Amy", "Amy",…
## $ year <dbl> 1975, 1975, 1975, 1975, 1975, 1975, 1975,…
## $ month <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,…
## $ day <int> 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 2…
## $ hour <dbl> 0, 6, 12, 18, 0, 6, 12, 18, 0, 6, 12, 18,…
## $ lat <dbl> 27.5, 28.5, 29.5, 30.5, 31.5, 32.4, 33.3,…
## $ long <dbl> -79.0, -79.0, -79.0, -79.0, -78.8, -78.7,…
## $ status <fct> tropical depression, tropical depression,…
## $ category <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wind <int> 25, 25, 25, 25, 25, 25, 25, 30, 35, 40, 4…
## $ pressure <int> 1013, 1013, 1013, 1013, 1012, 1012, 1011,…
## $ tropicalstorm_force_diameter <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hurricane_force_diameter <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
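The data include the positions and attributes of tropical storms, measured every six hours during the lifetime of a storm. The figure of 198 storms sometimes quoted for this data set appears to describe an earlier release: the version loaded here has 19,066 rows, and the per-storm summaries further down show 639 distinct name and year combinations.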
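As a quick check on the description above, the following sketch counts the storms and tabulates the reporting hours. It assumes a storm is identified by its name together with its year (names are reused across years); the object names n_storms and records_by_hour are made up for this illustration.

```r
library(dplyr)

# Count distinct storms, treating a storm as a unique (name, year) pair
# because the same name can appear in more than one year.
n_storms <- storms %>%
  distinct(name, year) %>%
  nrow()
n_storms

# Tabulate the hour of each report; six-hourly reports should fall
# mostly on 0, 6, 12 and 18 UTC.
records_by_hour <- count(storms, hour)
records_by_hour
```

Any hours other than 0, 6, 12 and 18 in the second table would be reports made outside the regular six-hour schedule.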
You can find a technical description of storms in its help (manual) page: ?storms
Here’s a full description of all the columns:
name: Storm name
year, month, and day: Date of report
hour: Hour of report (in UTC)
lat: Latitude
long: Longitude
status: Storm classification (for example tropical depression, tropical storm, or hurricane)
category: Saffir-Simpson storm category, estimated from wind speed. In the version loaded here it is NA for records that are not hurricanes; earlier releases coded -1 = Tropical Depression and 0 = Tropical Storm
wind: Storm's maximum sustained wind speed (in knots)
pressure: Air pressure at the storm's center (in millibars)
tropicalstorm_force_diameter (ts_diameter in earlier releases): Diameter of the area experiencing tropical storm strength winds (34 knots or above)
hurricane_force_diameter (hu_diameter in earlier releases): Diameter of the area experiencing hurricane strength winds (64 knots or above)
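Since category is described as being estimated from wind speed, one way to see the relationship is to look at the range of wind values inside each category. This is only a sketch: wind_range_by_category is a name chosen for the illustration, and records with a missing category are dropped first.

```r
library(dplyr)

# For each non-missing category, report the lowest and highest wind speed
# (in knots) and how many records fall in that category.
wind_range_by_category <- storms %>%
  filter(!is.na(category)) %>%
  group_by(category) %>%
  summarise(
    min_wind = min(wind, na.rm = TRUE),
    max_wind = max(wind, na.rm = TRUE),
    n_records = n()
  )
wind_range_by_category
```

If the categories are derived from wind alone, the ranges should not overlap from one category to the next.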
Answer the following questions:
# Distinct storms (name and year) recorded during the 1980s
storm_names_1980s <- filter(storms, year >= 1980 & year <= 1989) %>%
  distinct(name, year)
storm_names_1980s
## # A tibble: 90 × 2
## name year
## <chr> <dbl>
## 1 Allen 1980
## 2 Bonnie 1980
## 3 Charley 1980
## 4 Georges 1980
## 5 Earl 1980
## 6 Danielle 1980
## 7 Frances 1980
## 8 Hermine 1980
## 9 Ivan 1980
## 10 Jeanne 1980
## # … with 80 more rows
# Number of distinct storm names recorded in each year
storms_per_year <- group_by(storms, year) %>%
  summarize(number_of_storms = n_distinct(name))
storms_per_year
## # A tibble: 47 × 2
## year number_of_storms
## <dbl> <int>
## 1 1975 8
## 2 1976 7
## 3 1977 6
## 4 1978 11
## 5 1979 8
## 6 1980 11
## 7 1981 11
## 8 1982 5
## 9 1983 4
## 10 1984 12
## # … with 37 more rows
# Number of records (rows) for each storm, identified by name and year
storm_records_per_year <- group_by(storms, name, year) %>%
  summarize(number_of_records = n())
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
storm_records_per_year
## # A tibble: 639 × 3
## # Groups: name [258]
## name year number_of_records
## <chr> <dbl> <int>
## 1 AL011993 1993 11
## 2 AL012000 2000 4
## 3 AL021992 1992 5
## 4 AL021994 1994 6
## 5 AL021999 1999 4
## 6 AL022000 2000 12
## 7 AL022001 2001 5
## 8 AL022003 2003 4
## 9 AL022006 2006 13
## 10 AL031987 1987 32
## # … with 629 more rows
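The message printed by summarise() in the output above says the result is still grouped by name and that this can be overridden with the .groups argument. A minimal sketch of two ways to get an ungrouped result follows; records_dropped and records_counted are names invented for this example.

```r
library(dplyr)

# Same counts as before, but .groups = "drop" returns an ungrouped tibble
# and suppresses the informational message.
records_dropped <- storms %>%
  group_by(name, year) %>%
  summarize(number_of_records = n(), .groups = "drop")

# count() is a shorthand for group_by() + summarize(n()); it also returns
# an ungrouped result. The named argument `name` sets the count column.
records_counted <- count(storms, name, year, name = "number_of_records")

records_dropped
```

Dropping the grouping is usually the safer default when the result feeds into later steps, because leftover groups silently change how verbs such as mutate() and slice() behave.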
# Distinct values of the status column
distinct(storms, status)
## # A tibble: 9 × 1
## status
## <fct>
## 1 tropical depression
## 2 tropical storm
## 3 extratropical
## 4 hurricane
## 5 subtropical storm
## 6 subtropical depression
## 7 disturbance
## 8 other low
## 9 tropical wave
# Distinct values of the category column
distinct(storms, category)
## # A tibble: 6 × 1
## category
## <dbl>
## 1 NA
## 2 1
## 3 3
## 4 2
## 5 4
## 6 5
# Storms that reached category 5 at least once
storms_categ5 <- filter(storms, category == 5) %>%
  distinct(name, year)
storms_categ5
## # A tibble: 21 × 2
## name year
## <chr> <dbl>
## 1 Anita 1977
## 2 David 1979
## 3 Allen 1980
## 4 Gilbert 1988
## 5 Hugo 1989
## 6 Andrew 1992
## 7 Mitch 1998
## 8 Isabel 2003
## 9 Ivan 2004
## 10 Emily 2005
## # … with 11 more rows
# Average pressure and wind speed for each combination of category and status
storms_statistics <- group_by(storms, category, status) %>%
  summarize(avg_pressure = mean(pressure, na.rm = TRUE), avg_wind = mean(wind, na.rm = TRUE)) %>%
  select(category, status, avg_pressure, avg_wind)
## `summarise()` has grouped output by 'category'. You can override using the
## `.groups` argument.
storms_statistics
## # A tibble: 13 × 4
## # Groups: category [6]
## category status avg_pressure avg_wind
## <dbl> <fct> <dbl> <dbl>
## 1 1 hurricane 981. 71.0
## 2 2 hurricane 967. 89.5
## 3 3 hurricane 955. 104.
## 4 4 hurricane 940. 122.
## 5 5 hurricane 918. 147.
## 6 NA disturbance 1010. 29.3
## 7 NA extratropical 993. 41.4
## 8 NA other low 1009. 25.4
## 9 NA subtropical depression 1008. 26.7
## 10 NA subtropical storm 998. 44.5
## 11 NA tropical depression 1008. 27.5
## 12 NA tropical storm 999. 45.7
## 13 NA tropical wave 1009. 28.6
# Maximum wind speed reached by each storm (name and year)
max_wind_per_storm <- group_by(storms, year, name) %>%
  summarize(max_wind = max(wind, na.rm = TRUE)) %>%
  select(year, name, max_wind)
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
max_wind_per_storm
## # A tibble: 639 × 3
## # Groups: year [47]
## year name max_wind
## <dbl> <chr> <int>
## 1 1975 Amy 60
## 2 1975 Blanche 75
## 3 1975 Caroline 100
## 4 1975 Doris 95
## 5 1975 Eloise 110
## 6 1975 Faye 90
## 7 1975 Gladys 120
## 8 1975 Hallie 45
## 9 1976 Belle 105
## 10 1976 Candice 80
## # … with 629 more rows
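# Maximum wind speed of each storm, sorted from strongest to weakest
# (despite the object's name, this yields one row per storm, not per year)
max_wind_per_year <- group_by(storms, year, name) %>%
  summarize(max_wind = max(wind, na.rm = TRUE)) %>%
  select(year, name, max_wind) %>%
  arrange(desc(max_wind))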
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
max_wind_per_year
## # A tibble: 639 × 3
## # Groups: year [47]
## year name max_wind
## <dbl> <chr> <int>
## 1 1980 Allen 165
## 2 1988 Gilbert 160
## 3 2005 Wilma 160
## 4 2019 Dorian 160
## 5 1998 Mitch 155
## 6 2005 Rita 155
## 7 2017 Irma 155
## 8 1977 Anita 150
## 9 1979 David 150
## 10 1992 Andrew 150
## # … with 629 more rows
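The table above has one row per storm, so a year such as 2005 appears more than once. If the goal is instead a single strongest storm for each year (an assumption, since the original question is not shown), a sketch using slice_max() could look like this; strongest_per_year is a name chosen for the illustration.

```r
library(dplyr)

# Keep the single highest-wind record in each year. with_ties = FALSE
# returns exactly one row per year even when several records share the
# maximum wind speed.
strongest_per_year <- storms %>%
  group_by(year) %>%
  slice_max(wind, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(year, name, max_wind = wind) %>%
  arrange(desc(max_wind))
strongest_per_year
```

With with_ties = FALSE the storm picked for a tied year depends on row order, so set with_ties = TRUE if every tied storm should be listed.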