In this lab we work with the data set storms from the
dplyr package. It contains measurements of various storms,
recorded at six-hour intervals. Check its help page for a description
of the variables.
Display the summary of storms.
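A minimal sketch of this step, assuming dplyr is installed. The object name storms_summary is only illustrative.

```r
library(dplyr)  # loading dplyr makes the storms data set available

# help("storms") opens the help page that describes each variable
storms_summary <- summary(storms)
storms_summary
```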
## name year month day
## Length:19537 Min. :1975 Min. : 1.000 Min. : 1.00
## Class :character 1st Qu.:1994 1st Qu.: 8.000 1st Qu.: 8.00
## Mode :character Median :2004 Median : 9.000 Median :16.00
## Mean :2003 Mean : 8.706 Mean :15.73
## 3rd Qu.:2013 3rd Qu.: 9.000 3rd Qu.:24.00
## Max. :2022 Max. :12.000 Max. :31.00
##
## hour lat long status
## Min. : 0.000 Min. : 7.00 Min. :-136.90 tropical storm :6830
## 1st Qu.: 5.000 1st Qu.:18.30 1st Qu.: -78.80 hurricane :4803
## Median :12.000 Median :26.60 Median : -62.30 tropical depression:3569
## Mean : 9.101 Mean :27.01 Mean : -61.56 extratropical :2151
## 3rd Qu.:18.000 3rd Qu.:33.80 3rd Qu.: -45.50 other low :1453
## Max. :23.000 Max. :70.70 Max. : 13.50 subtropical storm : 298
## (Other) : 433
## category wind pressure tropicalstorm_force_diameter
## Min. :1.000 Min. : 10.00 Min. : 882.0 Min. : 0.0
## 1st Qu.:1.000 1st Qu.: 30.00 1st Qu.: 986.0 1st Qu.: 0.0
## Median :1.000 Median : 45.00 Median :1000.0 Median : 110.0
## Mean :1.896 Mean : 50.05 Mean : 993.5 Mean : 147.9
## 3rd Qu.:3.000 3rd Qu.: 65.00 3rd Qu.:1007.0 3rd Qu.: 220.0
## Max. :5.000 Max. :165.00 Max. :1024.0 Max. :1440.0
## NA's :14734 NA's :9512
## hurricane_force_diameter
## Min. : 0.00
## 1st Qu.: 0.00
## Median : 0.00
## Mean : 14.92
## 3rd Qu.: 0.00
## Max. :300.00
## NA's :9512
Display a frequency table of the names of the storms with more than 80 entries, with the frequencies in descending order.
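One possible approach, sketched under the assumption that count() followed by filter() is acceptable here. The object name name_freq is hypothetical.

```r
library(dplyr)

name_freq <- storms %>%
  count(name, sort = TRUE) %>%  # one row per name, largest n first
  filter(n > 80)                # keep names with more than 80 entries
name_freq
```

count() stores the frequencies in a column called n, which is why the filter refers to n.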
The variables category,
tropicalstorm_force_diameter, and
hurricane_force_diameter have missing values. Overwrite
storms with a version that excludes the rows with a missing value on
category.
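A minimal sketch of this step, using filter() with is.na(). The counters n_before and n_after are hypothetical helpers added only to make the effect visible.

```r
library(dplyr)

n_before <- nrow(storms)
storms <- storms %>%
  filter(!is.na(category))  # drop rows where category is missing
n_after <- nrow(storms)

c(before = n_before, after = n_after)
```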
The variables tropicalstorm_force_diameter and
hurricane_force_diameter still have missing values (check this).
Replace the missing values on these variables by their respective medians
using across(). To do so, first convert these variables from
integer to double with as.numeric(), since a median need not be a whole number.
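The check can be sketched as follows. The name storms_cat is a hypothetical stand-in for storms after the rows with a missing category have been removed; in the lab itself the same calls apply to the overwritten storms.

```r
library(dplyr)

# Hypothetical copy of the data without missing values on category
storms_cat <- storms %>%
  filter(!is.na(category))

# Count the remaining missing values in each diameter variable
n_missing_ts <- sum(is.na(storms_cat$tropicalstorm_force_diameter))
n_missing_hu <- sum(is.na(storms_cat$hurricane_force_diameter))
n_missing_ts
n_missing_hu
```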
## [1] 2633
## [1] 2633
# Convert to numeric and replace NAs with median
storms <- storms %>%
  mutate(across(c(tropicalstorm_force_diameter, hurricane_force_diameter),
                as.numeric)) %>%
  mutate(across(c(tropicalstorm_force_diameter, hurricane_force_diameter),
                ~ ifelse(is.na(.), median(., na.rm = TRUE), .)))
Display the summary of the transformed storms data.
## name year month day
## Length:4803 Min. :1975 Min. : 1.000 Min. : 1.00
## Class :character 1st Qu.:1992 1st Qu.: 8.000 1st Qu.: 8.00
## Mode :character Median :2001 Median : 9.000 Median :16.00
## Mean :2001 Mean : 8.952 Mean :15.88
## 3rd Qu.:2012 3rd Qu.: 9.000 3rd Qu.:24.00
## Max. :2022 Max. :12.000 Max. :31.00
##
## hour lat long
## Min. : 0.000 Min. : 9.50 Min. :-119.3
## 1st Qu.: 5.000 1st Qu.:19.70 1st Qu.: -76.2
## Median :12.000 Median :26.40 Median : -63.2
## Mean : 9.156 Mean :26.49 Mean : -63.9
## 3rd Qu.:18.000 3rd Qu.:32.50 3rd Qu.: -51.8
## Max. :23.000 Max. :50.80 Max. : -14.1
##
## status category wind
## hurricane :4803 Min. :1.000 Min. : 65.00
## disturbance : 0 1st Qu.:1.000 1st Qu.: 70.00
## extratropical : 0 Median :1.000 Median : 80.00
## other low : 0 Mean :1.896 Mean : 86.59
## subtropical depression: 0 3rd Qu.:3.000 3rd Qu.:100.00
## subtropical storm : 0 Max. :5.000 Max. :165.00
## (Other) : 0
## pressure tropicalstorm_force_diameter hurricane_force_diameter
## Min. : 882.0 Min. : 50.0 Min. : 0.00
## 1st Qu.: 958.0 1st Qu.:232.5 1st Qu.: 50.00
## Median : 973.0 Median :232.5 Median : 50.00
## Mean : 968.8 Mean :242.3 Mean : 55.82
## 3rd Qu.: 983.5 3rd Qu.:232.5 3rd Qu.: 50.00
## Max. :1005.0 Max. :870.0 Max. :300.00
##
Add the variable months by recoding
month into “Jan”, “Feb”, etc. using
case_match(). Notice that the solution below first changes the class of
month from numeric to character: the values on the left-hand side of each
case_match() formula must have the same type as the variable being
recoded, and here they are written as character strings.
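To illustrate how case_match() treats its input, here is a small sketch on a hypothetical character vector (not the storms data), assuming dplyr 1.1.0 or later. Values without a matching formula become NA unless .default is supplied.

```r
library(dplyr)

# Hypothetical month codes; "13" has no matching formula
month_chr <- as.character(c(6, 9, 13))

recoded <- case_match(month_chr,
                      "6" ~ "Jun",
                      "9" ~ "Sep")

recoded_default <- case_match(month_chr,
                              "6" ~ "Jun",
                              "9" ~ "Sep",
                              .default = "unknown")
recoded
recoded_default
```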
storms <- storms %>%
  mutate(month = as.character(month),
         months = case_match(month,
                             "1" ~ "Jan",
                             "2" ~ "Feb",
                             "3" ~ "Mar",
                             "4" ~ "Apr",
                             "5" ~ "May",
                             "6" ~ "Jun",
                             "7" ~ "Jul",
                             "8" ~ "Aug",
                             "9" ~ "Sep",
                             "10" ~ "Oct",
                             "11" ~ "Nov",
                             "12" ~ "Dec"
         ))
Add the variable max_wind with the values “y”
for the strongest wind per name, and “n” otherwise using
case_when().
storms <- storms %>%
  group_by(name) %>%
  mutate(max_wind = case_when(
    wind == max(wind) ~ "y",
    TRUE ~ "n"
  )) %>%
  ungroup()
Remove the cases in storms for which max_wind is
“n”.
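A minimal sketch of this step on hypothetical toy data (toy and toy_max are illustrative names). In the lab the same filter() call would be applied to storms and the result assigned back to storms.

```r
library(dplyr)

# Hypothetical toy data: storm B reaches its maximum wind twice
toy <- tibble(name = c("A", "A", "B", "B"),
              wind = c(60, 80, 100, 100)) %>%
  group_by(name) %>%
  mutate(max_wind = case_when(
    wind == max(wind) ~ "y",
    TRUE ~ "n"
  )) %>%
  ungroup()

# Keep only the rows flagged as the maximum wind per name
toy_max <- toy %>%
  filter(max_wind == "y")
toy_max
```

Storm B keeps two rows here, which is the situation of tied maxima that the next step addresses.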
Most storms have their maximum wind velocity on multiple
measurements. To retain only one measurement per storm, filter on the
largest value of lat.
storms <- storms %>%
  group_by(name) %>%
  filter(lat == max(lat)) %>%
  slice(1) %>% # in case of remaining ties
  ungroup()
Reproduce the output below.
months n
1 Aug 37
2 Jul 11
3 Jun 1
4 Nov 13
5 Oct 48
6 Sep 76
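One way to obtain a table of this shape is count(), sketched here on hypothetical toy data. The conversion with as.data.frame() is an assumption about how the output above was printed; in the lab the same pipeline would start from storms.

```r
library(dplyr)

# Hypothetical toy data standing in for the filtered storms data
toy <- tibble(name   = c("A", "B", "C", "D"),
              months = c("Sep", "Aug", "Sep", "Oct"))

month_counts <- toy %>%
  count(months) %>%   # one row per month, sorted by month label
  as.data.frame()     # print as a plain data frame with row numbers
month_counts
```

count() orders the rows by the grouping variable, which for character labels is alphabetical, in line with the order Aug, Jul, Jun, Nov, Oct, Sep in the output above.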
Reproduce the output below.
months mean(wind) sd(wind)
1 Jul 92.27273 23.80794
2 Aug 106.08108 27.49078
3 Sep 115.19737 24.95918
4 Oct 100.72917 26.03168
5 Nov 99.61538 26.33609
6 Jun 65.00000 NA
storms %>%
  group_by(months) %>%
  summarise(`mean(wind)` = mean(wind),
            `sd(wind)` = sd(wind)) %>%
  arrange(match(months, c("Jul", "Aug", "Sep", "Oct", "Nov", "Jun")))
Reproduce the output below. Notice that the function
range() returns two values per month, which precludes use
of the function summarize().
months range(wind)
1 Jul 70
2 Jul 140
3 Aug 65
4 Aug 165
5 Sep 65
6 Sep 160
7 Oct 65
8 Oct 160
9 Nov 65
10 Nov 135
11 Jun 65
12 Jun 65
storms %>%
  group_by(months) %>%
  reframe(`range(wind)` = range(wind)) %>%
  arrange(match(months, c("Jul", "Aug", "Sep", "Oct", "Nov", "Jun")))
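The difference between the two verbs can be sketched on hypothetical toy data, assuming dplyr 1.1.0 or later (the version that introduced reframe()). The names toy, wind_range, and wind_wide are illustrative only.

```r
library(dplyr)

# Hypothetical toy data with two months
toy <- tibble(months = c("Jul", "Jul", "Aug", "Aug", "Aug"),
              wind   = c(70, 140, 65, 100, 165))

# reframe() allows any number of rows per group: here two (min and max)
wind_range <- toy %>%
  group_by(months) %>%
  reframe(wind_range = range(wind))

# summarise() returns one row per group, so min and max need separate columns
wind_wide <- toy %>%
  group_by(months) %>%
  summarise(min_wind = min(wind), max_wind = max(wind))

wind_range
wind_wide
```

The long layout from reframe() matches the output requested in the exercise; the wide layout from summarise() carries the same information in one row per month.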