This report explores the NOAA Atlantic
hurricane dataset included with the dplyr package. The
storms in the dataset are recorded from 1975 to 2022. Specifically,
we’ll examine three characteristics of the dataset: how the frequency of
each storm type changes over time, how wind speed relates to air pressure,
and how wind speed differs across hurricane categories.
An advantage of this dataset is that it’s already in a tidy format: each row is a single observation of a storm, each column is a variable relating to that storm, and each cell is the corresponding variable value. However, the dataset contains repeated observations per storm, so where it makes sense, the data will be aggregated prior to visualization.
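To make the repeated observations concrete, here is a minimal sketch that counts how many rows each storm contributes. It assumes a storm can be identified by its name together with its year (names are reused across years); `obs_per_storm` and `n_obs` are names chosen for this illustration only.

```r
library(dplyr)

# Count the rows (observations) recorded for each storm.
# Assumption: name + year identifies a single storm, since names are reused.
obs_per_storm <- storms %>%
  group_by(year, name) %>%
  summarise(n_obs = n(), .groups = "drop")

# Distribution of the number of observations per storm.
summary(obs_per_storm$n_obs)

# Far fewer storms than rows, which is why aggregation is needed.
nrow(obs_per_storm)
nrow(storms)
```

If most storms contribute many rows, plotting the raw data would weight long-lived storms more heavily than short-lived ones, which is the motivation for aggregating first.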
The NOAA dataset in dplyr includes a column, status, that
specifies the type of storm being recorded. We can see in the faceted
time series plots below that tropical storms and several other
non-hurricane storm types show an upward trend in frequency.
Interestingly, the frequency of hurricanes shows a weaker trend over time, although there does seem to be a small increase towards the end of the record. The number of hurricanes per year also appears to become more volatile, which suggests that hurricane frequency is becoming harder to predict.
# Load the packages used throughout this report.
library(dplyr)
library(ggplot2)

# Prior to plotting, the dataset is aggregated to represent the number of storms per year, per storm type.
# .groups = "drop" returns an ungrouped result, so no separate ungroup() is needed.
storms %>%
  group_by(year, status) %>%
  summarise(storms = n_distinct(name), .groups = "drop") %>%
  ggplot(aes(x = year, y = storms)) +
  geom_line() +
  facet_wrap(~status) +
  theme_minimal() +
  labs(title = "Tropical storms & other non-hurricane storms are increasing in frequency.",
       x = "Year", y = "Number of Storms")
Time series plots of different storm types per year.
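The visual impressions of trend and volatility can also be checked numerically. The sketch below is one rough way to do so, not a formal time-series analysis: it computes a least-squares slope of yearly counts per storm type and compares the spread of counts before and after a cut-off year. The cut-off `split_year` and the names `storms_per_year`, `n_storms`, and `trend_by_status` are assumptions made for this illustration. Note that years in which a storm type never occurs are absent from the aggregated data rather than counted as zero.

```r
library(dplyr)

# Same aggregation as the plot: distinct storm names per year and storm type.
storms_per_year <- storms %>%
  group_by(year, status) %>%
  summarise(n_storms = n_distinct(name), .groups = "drop")

# Arbitrary, illustrative cut-off for comparing early and late years.
split_year <- 2000

trend_by_status <- storms_per_year %>%
  group_by(status) %>%
  summarise(
    n_years = n(),
    # Least-squares slope of count on year: cov(x, y) / var(x).
    slope_per_year = cov(year, n_storms) / var(year),
    # Spread of yearly counts before, and from, the cut-off year.
    sd_early = sd(n_storms[year < split_year]),
    sd_late = sd(n_storms[year >= split_year]),
    .groups = "drop"
  ) %>%
  arrange(desc(slope_per_year))

print(trend_by_status)
```

A positive `slope_per_year` is consistent with an upward trend, and a larger `sd_late` than `sd_early` is consistent with growing volatility, though neither is a significance test.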
Two of the recorded columns in this dataset are wind and
pressure, which measure the maximum wind speed (in knots)
and air pressure at the storm’s center (in millibars) respectively. Can
you guess what relationship these two variables might have?
# A correlation plot between wind & pressure
# Including a trend line
storms %>%
  ggplot(aes(x = wind, y = pressure)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_minimal() +
  labs(title = "There is a clear correlation between wind speed & air pressure in Atlantic storms.",
       x = "Wind Speed (in knots)", y = "Air Pressure (in millibars)")
A correlation plot between the wind speed & air pressure.
If you guessed a negative correlation - good job! To read up more on this relationship, you can check out this article.
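To put a number on the relationship, the following sketch computes the Pearson correlation between the two columns and fits the same kind of linear model that the trend line represents. The object names `wind_pressure_cor` and `fit` are chosen for this example, and `use = "complete.obs"` is included only as a precaution in case either column contains missing values.

```r
library(dplyr)

# Pearson correlation between wind speed and central air pressure.
wind_pressure_cor <- cor(storms$wind, storms$pressure, use = "complete.obs")
print(wind_pressure_cor)

# Simple linear model of pressure on wind, matching geom_smooth(method = "lm").
fit <- lm(pressure ~ wind, data = storms)

# The wind coefficient estimates the change in pressure (millibars) per knot.
print(coef(fit))
```

A correlation close to -1 and a negative `wind` coefficient both say the same thing as the plot: stronger winds go with lower central pressure.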
Finally, we can match data to intuition and see how different categories of hurricanes are defined. Below is a violin plot of wind speeds across the different categories of hurricanes.
# The dataset is filtered to remove non-hurricane data, since those aren't categorized.
# Notice that the category is given to ggplot as a factor.
# labs() names the aesthetics as mapped before coord_flip(): x is the category, y is the wind speed.
storms %>%
  filter(!is.na(category)) %>%
  ggplot(aes(x = factor(category), y = wind, fill = category)) +
  geom_violin() +
  coord_flip() +
  scale_x_discrete(limits = rev) +
  theme_minimal() +
  scale_fill_continuous(trans = "reverse") +
  labs(title = "Different categories of hurricanes have clear wind speed thresholds.",
       x = "Hurricane category", y = "Wind speed (in knots)") +
  guides(fill = "none")
You can see from the abrupt ends and the non-overlapping violins that there is a well-defined relationship between hurricane category and wind speed. To learn more about how these categories are defined, you can click here.
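The thresholds suggested by the violin plot can be read directly from the data. This is a minimal sketch that tabulates the observed minimum and maximum wind speed per category and then checks whether neighbouring ranges overlap. It assumes, as in the plot above, that non-hurricane rows have a missing category; `wind_by_category` and `ranges_do_not_overlap` are names chosen for this illustration.

```r
library(dplyr)

# Observed wind speed range (in knots) for each hurricane category.
wind_by_category <- storms %>%
  filter(!is.na(category)) %>%
  group_by(category) %>%
  summarise(
    min_wind = min(wind),
    max_wind = max(wind),
    n_obs = n(),
    .groups = "drop"
  ) %>%
  arrange(category)

print(wind_by_category)

# TRUE if each category's maximum wind is below the next category's minimum.
ranges_do_not_overlap <- all(
  head(wind_by_category$max_wind, -1) < tail(wind_by_category$min_wind, -1)
)
print(ranges_do_not_overlap)
```

The table shows only the ranges observed in this dataset, so the printed minimums and maximums are not necessarily the official category boundaries.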