In this lab we work with the storms data set from the dplyr package. It contains measurements of various storms, recorded at six-hour intervals. See its help page (?storms) for a description of the variables.

library(dplyr)
library(stringr)

Display the summary of storms.

summary(storms)
##      name                year          month             day       
##  Length:19537       Min.   :1975   Min.   : 1.000   Min.   : 1.00  
##  Class :character   1st Qu.:1994   1st Qu.: 8.000   1st Qu.: 8.00  
##  Mode  :character   Median :2004   Median : 9.000   Median :16.00  
##                     Mean   :2003   Mean   : 8.706   Mean   :15.73  
##                     3rd Qu.:2013   3rd Qu.: 9.000   3rd Qu.:24.00  
##                     Max.   :2022   Max.   :12.000   Max.   :31.00  
##                                                                    
##       hour             lat             long                         status    
##  Min.   : 0.000   Min.   : 7.00   Min.   :-136.90   tropical storm     :6830  
##  1st Qu.: 5.000   1st Qu.:18.30   1st Qu.: -78.80   hurricane          :4803  
##  Median :12.000   Median :26.60   Median : -62.30   tropical depression:3569  
##  Mean   : 9.101   Mean   :27.01   Mean   : -61.56   extratropical      :2151  
##  3rd Qu.:18.000   3rd Qu.:33.80   3rd Qu.: -45.50   other low          :1453  
##  Max.   :23.000   Max.   :70.70   Max.   :  13.50   subtropical storm  : 298  
##                                                     (Other)            : 433  
##     category          wind           pressure      tropicalstorm_force_diameter
##  Min.   :1.000   Min.   : 10.00   Min.   : 882.0   Min.   :   0.0              
##  1st Qu.:1.000   1st Qu.: 30.00   1st Qu.: 986.0   1st Qu.:   0.0              
##  Median :1.000   Median : 45.00   Median :1000.0   Median : 110.0              
##  Mean   :1.896   Mean   : 50.05   Mean   : 993.5   Mean   : 147.9              
##  3rd Qu.:3.000   3rd Qu.: 65.00   3rd Qu.:1007.0   3rd Qu.: 220.0              
##  Max.   :5.000   Max.   :165.00   Max.   :1024.0   Max.   :1440.0              
##  NA's   :14734                                     NA's   :9512                
##  hurricane_force_diameter
##  Min.   :  0.00          
##  1st Qu.:  0.00          
##  Median :  0.00          
##  Mean   : 14.92          
##  3rd Qu.:  0.00          
##  Max.   :300.00          
##  NA's   :9512

Display a frequency table of the names of the storms with more than 80 entries, with the frequencies in descending order.

storms %>%
  count(name, sort = TRUE) %>%
  filter(n > 80)
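
As a small illustration of what count() with sort = TRUE does, the sketch below uses a hypothetical toy tibble (toy is not part of the lab data) and compares the short form with the equivalent written-out steps.

```r
library(dplyr)

# Hypothetical toy data: one row per measurement
toy <- tibble(name = c("A", "B", "A", "C", "A", "B"))

# count(sort = TRUE) tallies the rows per name and orders by n, largest first
freq <- toy %>%
  count(name, sort = TRUE) %>%
  filter(n > 1)

# The same table written out step by step
freq_long <- toy %>%
  group_by(name) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  filter(n > 1)
```

Both versions keep only the names that occur more than once ("A" and "B" here), with the most frequent name first.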

Data Selection

The variables category, tropicalstorm_force_diameter, and hurricane_force_diameter have missing values. Overwrite storms, keeping only the cases without a missing value on category.

storms <- storms %>%
  filter(!is.na(category))

The variables tropicalstorm_force_diameter and hurricane_force_diameter still have missing values (check this). Replace the missing values on these variables with their respective medians using across(). Since a median is not necessarily an integer, first convert these variables from integer to numeric.

# Check remaining missings
sum(is.na(storms$tropicalstorm_force_diameter))
## [1] 2633
sum(is.na(storms$hurricane_force_diameter))
## [1] 2633
# Convert to numeric and replace NAs with median
storms <- storms %>%
  mutate(across(c(tropicalstorm_force_diameter, hurricane_force_diameter),
                as.numeric)) %>%
  mutate(across(c(tropicalstorm_force_diameter, hurricane_force_diameter),
                ~ ifelse(is.na(.), median(., na.rm = TRUE), .)))
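
The same imputation can be tried on a small scale. The sketch below is an illustration on hypothetical toy data (the columns a and b are made up) and swaps ifelse() for dplyr's coalesce(), which fills each NA with the first non-missing alternative.

```r
library(dplyr)

# Hypothetical toy data with integer columns and missing values
toy <- tibble(
  a = c(10L, 20L, NA, 25L, 30L),
  b = c(NA, 0L, 0L, 50L, 100L)
)

imputed <- toy %>%
  # convert to numeric first, because the medians here are not whole numbers
  mutate(across(c(a, b), as.numeric)) %>%
  # replace each NA with the median of the observed values in that column
  mutate(across(c(a, b), ~ coalesce(.x, median(.x, na.rm = TRUE))))
```

In this toy example the median of a is 22.5 and the median of b is 25, so those values take the place of the missing entries.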

Display the summary of the transformed storms data.

summary(storms)
##      name                year          month             day       
##  Length:4803        Min.   :1975   Min.   : 1.000   Min.   : 1.00  
##  Class :character   1st Qu.:1992   1st Qu.: 8.000   1st Qu.: 8.00  
##  Mode  :character   Median :2001   Median : 9.000   Median :16.00  
##                     Mean   :2001   Mean   : 8.952   Mean   :15.88  
##                     3rd Qu.:2012   3rd Qu.: 9.000   3rd Qu.:24.00  
##                     Max.   :2022   Max.   :12.000   Max.   :31.00  
##                                                                    
##       hour             lat             long       
##  Min.   : 0.000   Min.   : 9.50   Min.   :-119.3  
##  1st Qu.: 5.000   1st Qu.:19.70   1st Qu.: -76.2  
##  Median :12.000   Median :26.40   Median : -63.2  
##  Mean   : 9.156   Mean   :26.49   Mean   : -63.9  
##  3rd Qu.:18.000   3rd Qu.:32.50   3rd Qu.: -51.8  
##  Max.   :23.000   Max.   :50.80   Max.   : -14.1  
##                                                   
##                     status        category          wind       
##  hurricane             :4803   Min.   :1.000   Min.   : 65.00  
##  disturbance           :   0   1st Qu.:1.000   1st Qu.: 70.00  
##  extratropical         :   0   Median :1.000   Median : 80.00  
##  other low             :   0   Mean   :1.896   Mean   : 86.59  
##  subtropical depression:   0   3rd Qu.:3.000   3rd Qu.:100.00  
##  subtropical storm     :   0   Max.   :5.000   Max.   :165.00  
##  (Other)               :   0                                   
##     pressure      tropicalstorm_force_diameter hurricane_force_diameter
##  Min.   : 882.0   Min.   : 50.0                Min.   :  0.00          
##  1st Qu.: 958.0   1st Qu.:232.5                1st Qu.: 50.00          
##  Median : 973.0   Median :232.5                Median : 50.00          
##  Mean   : 968.8   Mean   :242.3                Mean   : 55.82          
##  3rd Qu.: 983.5   3rd Qu.:232.5                3rd Qu.: 50.00          
##  Max.   :1005.0   Max.   :870.0                Max.   :300.00          
## 

Recoding

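Add the variable months by recoding month into “Jan”, “Feb”, etc. using case_match(). Notice that the solution below first changes the class of month from numeric to character, so the values to match are written as strings. In case_match() the values on the left-hand side must be compatible with the type of the input, while the type of the result is determined by the right-hand sides.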

storms <- storms %>%
  mutate(month = as.character(month),
         months = case_match(month,
           "1"  ~ "Jan",
           "2"  ~ "Feb",
           "3"  ~ "Mar",
           "4"  ~ "Apr",
           "5"  ~ "May",
           "6"  ~ "Jun",
           "7"  ~ "Jul",
           "8"  ~ "Aug",
           "9"  ~ "Sep",
           "10" ~ "Oct",
           "11" ~ "Nov",
           "12" ~ "Dec"
         ))
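
As an alternative sketch (not the method this exercise asks for), month numbers can also be turned into abbreviations by indexing base R's built-in constant month.abb. The toy tibble below is hypothetical and keeps month numeric.

```r
library(dplyr)

# Hypothetical toy data with numeric month numbers
toy <- tibble(month = c(9, 1, 12, 6))

# month.abb is a base R constant: c("Jan", "Feb", ..., "Dec")
toy <- toy %>%
  mutate(months = month.abb[month])
```

This avoids listing all twelve pairs, but it only works while month still holds the numbers 1 to 12, so it would have to run before any conversion to character.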

Using case_when(), add the variable max_wind with the value “y” for the measurements with the strongest wind per name, and “n” otherwise.

storms <- storms %>%
  group_by(name) %>%
  mutate(max_wind = case_when(
    wind == max(wind) ~ "y",
    .default = "n"
  )) %>%
  ungroup()
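
To see how the grouped comparison behaves, here is a minimal sketch on hypothetical toy data. For a two-way split like this one, if_else() is an equivalent alternative to case_when(); it is used here only for illustration.

```r
library(dplyr)

# Hypothetical toy data: storm A reaches its maximum wind twice
toy <- tibble(
  name = c("A", "A", "A", "B", "B"),
  wind = c(60, 80, 80, 50, 40)
)

flagged <- toy %>%
  group_by(name) %>%
  # inside a grouped mutate(), max(wind) is computed per name
  mutate(max_wind = if_else(wind == max(wind), "y", "n")) %>%
  ungroup()
```

Storm A gets two rows flagged “y” because its maximum occurs twice, which is why a flag like this does not by itself leave one row per storm.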

Remove the cases in storms for which max_wind is “n”.

storms <- storms %>%
  filter(max_wind == "y")

Most storms reach their maximum wind speed at more than one measurement. To retain a single measurement per storm, keep the one with the largest value of lat.

storms <- storms %>%
  group_by(name) %>%
  filter(lat == max(lat)) %>%
  slice(1) %>%          # in case of remaining ties
  ungroup()
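
A possible shorter variant, shown here on hypothetical toy data, is slice_max() with with_ties = FALSE, which combines the filter on the maximum and the tie-breaking slice(1) in one step.

```r
library(dplyr)

# Hypothetical toy data: storm B has two rows tied on the largest lat
toy <- tibble(
  name = c("A", "A", "B", "B", "B"),
  lat  = c(20.5, 31.0, 15.2, 15.2, 10.0),
  wind = c(80, 80, 50, 50, 50)
)

one_row <- toy %>%
  group_by(name) %>%
  # keep the row with the largest lat per name; with_ties = FALSE keeps one row
  slice_max(lat, n = 1, with_ties = FALSE) %>%
  ungroup()
```

The result has exactly one row per name, even for storm B where the largest lat occurs twice.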

Summarizing Data

Reproduce the output below.

  months  n
1    Aug 37
2    Jul 11
3    Jun  1
4    Nov 13
5    Oct 48
6    Sep 76

storms %>%
  count(months) %>%
  arrange(months)

Reproduce the output below.

  months mean(wind) sd(wind)
1    Jul   92.27273 23.80794
2    Aug  106.08108 27.49078
3    Sep  115.19737 24.95918
4    Oct  100.72917 26.03168
5    Nov   99.61538 26.33609
6    Jun   65.00000       NA

storms %>%
  group_by(months) %>%
  summarise(`mean(wind)` = mean(wind),
            `sd(wind)`   = sd(wind)) %>%
  arrange(match(months, c("Jul", "Aug", "Sep", "Oct", "Nov", "Jun")))
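
Two details of this output can be checked in isolation. The sketch below uses hypothetical toy values to show why a month with a single storm has a missing standard deviation, and how arrange(match(...)) imposes a custom row order.

```r
library(dplyr)

# sd() of a single observation is NA, which explains the NA for Jun
sd_single <- sd(65)

# Hypothetical toy summary table in arbitrary order
toy <- tibble(months = c("Aug", "Jun", "Jul"), n = c(3, 1, 2))

# match() returns each month's position in the desired order,
# and arrange() sorts the rows by that position
custom_order <- toy %>%
  arrange(match(months, c("Jul", "Aug", "Jun")))
```

The rows of custom_order follow the order given in the vector passed to match(), not the alphabetical order.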

Reproduce the output below. Notice that range() returns two values per month, whereas summarise() is meant to return a single row per group; use reframe() instead, which allows any number of rows per group.

   months range(wind)
1     Jul          70
2     Jul         140
3     Aug          65
4     Aug         165
5     Sep          65
6     Sep         160
7     Oct          65
8     Oct         160
9     Nov          65
10    Nov         135
11    Jun          65
12    Jun          65

storms %>%
  group_by(months) %>%
  reframe(`range(wind)` = range(wind)) %>%
  arrange(match(months, c("Jul", "Aug", "Sep", "Oct", "Nov", "Jun")))
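
The behaviour of reframe() can be illustrated on hypothetical toy data. This sketch assumes dplyr 1.1.0 or later and uses the .by argument instead of group_by(); the added bound column is made up to label the two values.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  months = c("Jul", "Jul", "Aug", "Aug", "Aug"),
  wind   = c(70, 140, 65, 100, 165)
)

ranges <- toy %>%
  # range() returns c(min, max), so each month contributes two rows
  reframe(bound = c("min", "max"),
          wind_range = range(wind),
          .by = months)
```

With .by the groups appear in order of first appearance in the data (Jul before Aug here), whereas group_by() would sort them.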

Strings

Display the unique names of the storms that have the letters “x” and/or “z” in their name.

storms %>%
  filter(str_detect(name, regex("[xz]", ignore_case = TRUE))) %>%
  distinct(name) %>%
  pull(name)
## [1] "Alex"    "Felix"   "Gonzalo" "Lorenzo" "Roxanne" "Zeta"
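
The role of ignore_case can be seen in a minimal sketch on a hypothetical vector of names: without it, a name that starts with a capital “Z” would not match the lower-case pattern.

```r
library(stringr)

# Hypothetical names, not taken from the storms data
names_toy <- c("Alex", "Zeta", "Maria", "Roxanne", "Bob")

# Case-insensitive match on "x" or "z"
hits_ignore_case <- str_detect(names_toy, regex("[xz]", ignore_case = TRUE))

# Case-sensitive match misses the capital "Z" in "Zeta"
hits_case_sensitive <- str_detect(names_toy, "[xz]")
```

Here hits_ignore_case flags "Alex", "Zeta", and "Roxanne", while hits_case_sensitive flags only "Alex" and "Roxanne".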

End of practical