The “storms” dataset from the “dplyr” package is a subset of the NOAA (National Oceanic and Atmospheric Administration) Atlantic hurricane database best track data. This dataset includes the positions and attributes of 198 tropical storms, measured every six hours during the lifetime of a storm. The data contains 10,010 observations and 13 variables, including information on the date, time, status and category of the storm.
Summary of variables:
| Variable | Description |
|---|---|
| name | Name of the storm |
| year | Year of the storm |
| month | Month of the storm |
| day | Day of the storm |
| hour | Hour of the storm |
| lat | Latitude of the storm |
| long | Longitude of the storm |
| status | Storm classification (Tropical Depression, Tropical Storm, or Hurricane) |
| category | Saffir-Simpson storm category |
| wind | Storm’s maximum sustained wind speed (in knots) |
| pressure | Air pressure at the storm’s center (in millibars) |
| ts_diameter | Diameter of the area experiencing tropical storm strength winds (34 knots or above) |
| hu_diameter | Diameter of the area experiencing hurricane strength winds (64 knots or above) |
```r
library(dplyr)
library(ggplot2)
```
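The figures quoted in the description can be checked against whichever copy of the data is installed. The following is a minimal sketch: the object names `storm_dims` and `n_storms` are illustrative, and treating a (name, year) pair as one storm is an assumption, since storm names are reused across years. The numbers returned may differ from those quoted above, because the dataset changes between dplyr releases.

```r
library(dplyr)

# Rows and columns; the exact numbers depend on the installed dplyr version
storm_dims <- dim(storms)
storm_dims

# Approximate number of distinct storms, assuming that a (name, year) pair
# identifies one storm (names are reused across years)
n_storms <- storms %>%
  distinct(name, year) %>%
  nrow()
n_storms

# Column names and types at a glance
glimpse(storms)
```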
```r
# Average wind speed (knots) for each Saffir-Simpson category
storms %>%
  group_by(category) %>%
  summarise(avgWind = mean(wind)) %>%
  ggplot(aes(x = category, y = avgWind)) +
  geom_point() +
  labs(x = "Category",
       y = "Average Wind Speed (in knots)",
       title = "Average Wind by Category of Storm") +
  theme_minimal()
```
This first plot illustrates the relationship between storm category and the average wind speed within that category, to show how the “Saffir-Simpson storm category” system works. The average wind speed increases steadily with category. This is expected, since the Saffir-Simpson scale assigns categories on the basis of sustained wind speed: the higher the category of the storm, the stronger its winds.
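Since the category is derived from wind speed, another way to see how the scale works is to look at the range of wind speeds observed within each category rather than only the mean. This is a minimal sketch. The column names `min_wind`, `max_wind` and `n_obs` are illustrative, and the category values that appear depend on how the installed version of the dataset codes non-hurricane observations.

```r
library(dplyr)

# Observed wind range (knots) and number of rows for each value of `category`
wind_ranges <- storms %>%
  group_by(category) %>%
  summarise(
    min_wind = min(wind),
    max_wind = max(wind),
    n_obs    = n()
  )
wind_ranges
```

If the ranges of neighbouring categories do not overlap, the table shows directly that each category corresponds to a band of wind speeds.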
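```r
# Each row is one six-hourly observation, so the bars count observations,
# not distinct storms
storms %>%
  ggplot(aes(x = as.factor(month))) +
  geom_bar(aes(fill = category)) +
  labs(x = "Month",
       y = "Number of Observations",
       title = "Monthly Number of Storm Observations by Category",
       fill = "Category") +
  theme_minimal()
```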
This plot shows how storm occurrences are distributed throughout the year and which months have historically been most likely to see a storm. It is meant to show the clear storm “season”, when both the most storms and the most intense ones occur. Each row of the dataset is a six-hourly observation, so the bars count observations rather than distinct storms. September has by far the most, and the counts form a roughly bell-shaped curve around it: activity rises sharply as September approaches and falls off sharply afterwards. This graph is useful because it could help authorities plan for the months with the highest storm activity.
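Because long-lived storms contribute many six-hourly rows, it can also help to count distinct storms per month. The sketch below assumes that a (name, year) pair identifies one storm, which is an approximation. The names `storms_per_month` and `n_storms` are illustrative. A storm that spans a month boundary is counted once in each month it touches.

```r
library(dplyr)

# One row per storm per month, assuming (name, year) identifies a storm
storms_per_month <- storms %>%
  distinct(name, year, month) %>%
  count(month, name = "n_storms")
storms_per_month
```

Comparing this table with the bar chart shows whether the September peak reflects more storms, longer-lived storms, or both.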