Overview

This report explores the NOAA Atlantic hurricane dataset included with the dplyr package. The dataset covers storms recorded from 1975 to 2022. Specifically, we'll examine three aspects of the dataset:

  1. How different types of storms are increasing in frequency.
  2. The relationship between wind speed & air pressure at the storm’s center.
  3. How different storm categories relate to wind speed.

Data Preparation

An advantage of this dataset is that it's already in a tidy format: each row is a single observation of a storm at one point in time (typically every six hours), each column is a variable, and each cell is the corresponding value. However, because each storm is observed repeatedly over its lifetime, there are many rows per storm; where it makes sense, the data will be aggregated prior to visualization.
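One quick way to see the repeated observations is to count the rows each storm contributes. This is a minimal sketch, assuming dplyr 1.1 or later (where storms covers 1975 to 2022); the name obs_per_storm is just an illustrative choice. Atlantic storm names are reused on a rotating list, so the pair (name, year) is used here as an approximate storm identifier.

```r
library(dplyr)

# Count the number of rows (observations) recorded for each storm.
# (name, year) is an approximate storm ID: names are reused across years,
# and a storm that spans New Year would be split into two groups.
obs_per_storm <- storms %>%
  group_by(name, year) %>%
  summarise(n_obs = n(), .groups = "drop") %>%
  arrange(desc(n_obs))

# Distribution of observations per storm
summary(obs_per_storm$n_obs)
```

Any storm with more than one row will need aggregating (for example with n_distinct(name), as in the plot below) before counting storms.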

Analysis

Types of Storms

The NOAA dataset in dplyr includes a status column that specifies the type of storm being recorded. The faceted time series plots below show an upward trend in frequency for the following storm types:

  • Tropical depressions
  • Tropical storms
  • Other lows (systems recorded with the status "other low")

Interestingly, hurricane frequency shows a weaker trend over time, although there does seem to be a small increase towards the end of the period. The number of hurricanes per year also appears increasingly volatile, which suggests that hurricane frequency is becoming harder to predict.

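# Load the packages used throughout this report (the storms dataset ships with dplyr).
library(dplyr)
library(ggplot2)

# Prior to plotting, the dataset is aggregated to represent the number of storms per year, per storm type.
# n_distinct(name) counts each storm once per year and status, despite its repeated observations.
storms %>%
  group_by(year, status) %>%
  summarise(storms = n_distinct(name)) %>%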
  ungroup() %>%
  ggplot(aes(x=year, y=storms)) +
  geom_line() +
  facet_wrap(~status) +
  theme_minimal() +
  labs(title="Tropical storms & other non-hurricane storms are increasing in frequency.",
       x="Year", y="Number of Storms")
Time series plots of different storm types per year.
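To put a rough number on the hurricane volatility described above, one option is to compare the spread of yearly hurricane counts decade by decade. This is a sketch under stated assumptions, not the method used for the plot: the decade-level standard deviation is a swapped-in, deliberately simple volatility measure, and the names hurricanes_per_year and volatility_by_decade are illustrative.

```r
library(dplyr)

# Number of distinct storms recorded with hurricane status in each year
hurricanes_per_year <- storms %>%
  filter(status == "hurricane") %>%
  group_by(year) %>%
  summarise(hurricanes = n_distinct(name), .groups = "drop")

# Mean and standard deviation of the yearly counts within each decade;
# a rising standard deviation would be consistent with growing volatility.
# Note: the most recent decade only contains a few years of data.
volatility_by_decade <- hurricanes_per_year %>%
  mutate(decade = (year %/% 10) * 10) %>%
  group_by(decade) %>%
  summarise(mean_count = mean(hurricanes),
            sd_count = sd(hurricanes),
            .groups = "drop")

volatility_by_decade
```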

Wind Speed & Air Pressure

Two of the recorded columns in this dataset are wind and pressure, which measure the storm's maximum sustained wind speed (in knots) and the air pressure at the storm's center (in millibars), respectively. Can you guess how these two variables are related?

# A scatter plot of wind speed against air pressure,
# with a linear trend line fitted by geom_smooth()
storms %>%
  ggplot(aes(x=wind, y=pressure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  theme_minimal() +
  labs(title="There is a clear correlation between wind speed & air pressure in Atlantic storms.",
       x = "Wind Speed (in knots)", y = "Air Pressure (in millibars)")

A correlation plot between the wind speed & air pressure.

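If you guessed a negative correlation, good job! Lower pressure at a storm's center creates a steeper pressure gradient between the center and its surroundings, which drives faster winds. To read more about this relationship, you can check out this article.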
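The trend line can be backed up with a number. This minimal sketch computes the Pearson correlation between the two columns and the slope of the same kind of linear fit that geom_smooth(method = "lm") draws; the variable names wind_pressure_cor and pressure_slope are illustrative.

```r
library(dplyr)

# Pearson correlation; use = "complete.obs" drops rows with a missing value
wind_pressure_cor <- cor(storms$wind, storms$pressure, use = "complete.obs")

# Slope of pressure ~ wind: millibars of pressure change per extra knot of wind
pressure_slope <- coef(lm(pressure ~ wind, data = storms))[["wind"]]

wind_pressure_cor
pressure_slope
```

Both values should be negative if the relationship in the plot holds.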

Storm Categories

Finally, we can match data to intuition and see how the different hurricane categories are defined. Below is a violin plot of wind speed for each hurricane category.

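# The dataset is filtered to remove non-hurricane data, since those observations have no category.
# category is mapped to x as a factor so each category gets its own violin,
# while fill uses the numeric category to drive a continuous colour scale.
storms %>%
  filter(!is.na(category)) %>%
  ggplot(aes(x=factor(category), y=wind, fill = category)) +
  geom_violin() +
  # coord_flip() draws the categories vertically and wind speed horizontally
  coord_flip() +
  scale_x_discrete(limits=rev) +
  theme_minimal() +
  scale_fill_continuous(trans="reverse") +
  # labs() labels aesthetics, not screen positions: x is still the category after coord_flip()
  labs(title = "Different categories of hurricanes have clear wind speed thresholds.",
       x = "Hurricane category", y = "Wind speed (in knots)") +
  guides(fill="none")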

The abrupt ends and non-overlapping ranges of the violins show a well-defined relationship between hurricane category and wind speed: each category covers its own band of wind speeds. To learn more about how these categories are defined, you can click here.
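The same conclusion can be checked numerically by listing the lowest and highest recorded wind speed in each category. This is a minimal sketch, assuming dplyr 1.1 or later, where category is numeric and NA for non-hurricane observations; wind_ranges is an illustrative name.

```r
library(dplyr)

# Lowest and highest recorded wind speed (knots) for each hurricane category
wind_ranges <- storms %>%
  filter(!is.na(category)) %>%
  group_by(category) %>%
  summarise(min_wind = min(wind), max_wind = max(wind), .groups = "drop") %>%
  arrange(category)

wind_ranges

# TRUE when every category starts above the previous category's maximum,
# i.e. the wind-speed bands do not overlap
all(wind_ranges$min_wind[-1] > wind_ranges$max_wind[-nrow(wind_ranges)])
```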