The storms dataset ships with the dplyr package in R. It contains best track data from the NOAA Atlantic hurricane database.

First, some setup: I configure knitr so that my code shows in the rendered Markdown file while messages and warnings are omitted, then load the libraries I will be using:

knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
library(ggplot2)
library(scales)

Next, I want to store my data in a tibble dataframe:

data("storms", package = "dplyr")
storms <- as_tibble(storms)

Let’s get a better idea of the scope of this database. How many rows and columns are there?

n_rows <- nrow(storms)
n_cols <- ncol(storms)

print(paste("There are", n_rows, "rows and", n_cols, "columns in this dataset"))
## [1] "There are 19537 rows and 13 columns in this dataset"

Hurricane severity is categorized by maximum sustained wind speed, which this dataset records in knots. Let’s look at some stats for wind:

mean_wind <- mean(storms$wind, na.rm = TRUE)
min_wind <- min(storms$wind, na.rm = TRUE)
max_wind <- max(storms$wind, na.rm = TRUE)

print(paste("The average wind speed is", round(mean_wind, 0), "knots, and the lowest recorded speed is", round(min_wind, 0), "knots, while the highest recorded speed is", round(max_wind, 0), "knots"))
## [1] "The average wind speed is 50 knots, and the lowest recorded speed is 10 knots, while the highest recorded speed is 165 knots"
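To make the link between wind speed and severity concrete, here is a minimal sketch of the Saffir-Simpson scale, assuming wind is given in knots (as the dplyr documentation for storms states). The helper `saffir_simpson` and the example values are hypothetical and made up for illustration; they are not part of dplyr or taken from the dataset.

```r
# Saffir-Simpson category from maximum sustained wind in knots.
# `saffir_simpson` is a hypothetical helper, not a dplyr function.
saffir_simpson <- function(wind_kt) {
  # Lower bounds (in knots) of categories 1 through 5
  breaks <- c(64, 83, 96, 113, 137)
  category <- findInterval(wind_kt, breaks)
  # Below hurricane strength there is no category, so use NA
  category[which(category == 0)] <- NA_integer_
  category
}

# Illustrative wind speeds in knots (made up, not from the dataset)
example_wind <- c(30, 64, 82, 83, 100, 120, 137, 165)
example_cat <- saffir_simpson(example_wind)

# Knots to miles per hour: 1 knot is about 1.15078 mph
example_mph <- example_wind * 1.15078
```

The dataset's own category column may not match this helper row for row, since it can also depend on the storm's status, so treat this as an approximation rather than a reproduction of that column.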

What’s the correlation between the wind speed and pressure?

cor_wp <- cor(storms$wind, storms$pressure)

print(paste("The correlation between speed and pressure is", round(cor_wp, 2)))
## [1] "The correlation between speed and pressure is -0.93"
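For readers who want to see what `cor()` computes, the sketch below works out Pearson's correlation coefficient by hand on a few made-up readings and compares it with the built-in function. The values are hypothetical, chosen only so that pressure falls as wind rises; the `use = "complete.obs"` argument is shown as one way to handle missing values, which this toy data does not contain.

```r
# Hypothetical wind (knots) and pressure (millibars) readings, made up for illustration
wind <- c(25, 35, 50, 65, 90, 120)
pressure <- c(1010, 1005, 997, 987, 965, 935)

# Pearson's r: sum of cross-deviations scaled by the spread of each variable
dx <- wind - mean(wind)
dy <- pressure - mean(pressure)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent; use = "complete.obs" drops rows with missing values
r_builtin <- cor(wind, pressure, use = "complete.obs")
```

A value close to -1 means the two variables move in opposite directions almost linearly: stronger winds go with lower central pressure.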

How many uniquely named storms were there per year?

# Count the distinct storm names per year; each storm appears in many rows, one per observation:
storms_yearly <- storms %>%
  summarise(storms_per_year = n_distinct(name), .by = year) %>%
  arrange(year)

ggplot(storms_yearly, aes(x = year, y = storms_per_year)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = comma) +
  labs(title = "Unique Named Storms per Year",
       x = "Year", y = "Count of Storms")
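The difference between counting rows and counting distinct names is easiest to see on a tiny example. The sketch below uses a made-up data frame (`toy`, with hypothetical storm names) and assumes dplyr 1.1.0 or later, since the `.by` argument used above is not available in earlier versions.

```r
library(dplyr)

# Made-up observations: each storm is recorded several times per year
toy <- data.frame(
  name = c("Alpha", "Alpha", "Alpha", "Beta", "Beta", "Alpha", "Gamma"),
  year = c(1975, 1975, 1975, 1975, 1975, 1979, 1979)
)

toy_counts <- toy %>%
  summarise(
    rows = n(),                         # every observation in the year
    storms_per_year = n_distinct(name), # each storm name counted once
    .by = year
  ) %>%
  arrange(year)
```

Here 1975 has five rows but only two storms, which is why the plot above uses `n_distinct(name)` rather than `n()`. The name "Alpha" appearing again in 1979 is counted separately because the grouping is by year.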

Finally, let’s look at some basic summary statistics for every column:

summary(storms)
##      name                year          month             day       
##  Length:19537       Min.   :1975   Min.   : 1.000   Min.   : 1.00  
##  Class :character   1st Qu.:1994   1st Qu.: 8.000   1st Qu.: 8.00  
##  Mode  :character   Median :2004   Median : 9.000   Median :16.00  
##                     Mean   :2003   Mean   : 8.706   Mean   :15.73  
##                     3rd Qu.:2013   3rd Qu.: 9.000   3rd Qu.:24.00  
##                     Max.   :2022   Max.   :12.000   Max.   :31.00  
##                                                                    
##       hour             lat             long                         status    
##  Min.   : 0.000   Min.   : 7.00   Min.   :-136.90   tropical storm     :6830  
##  1st Qu.: 5.000   1st Qu.:18.30   1st Qu.: -78.80   hurricane          :4803  
##  Median :12.000   Median :26.60   Median : -62.30   tropical depression:3569  
##  Mean   : 9.101   Mean   :27.01   Mean   : -61.56   extratropical      :2151  
##  3rd Qu.:18.000   3rd Qu.:33.80   3rd Qu.: -45.50   other low          :1453  
##  Max.   :23.000   Max.   :70.70   Max.   :  13.50   subtropical storm  : 298  
##                                                     (Other)            : 433  
##     category          wind           pressure      tropicalstorm_force_diameter
##  Min.   :1.000   Min.   : 10.00   Min.   : 882.0   Min.   :   0.0              
##  1st Qu.:1.000   1st Qu.: 30.00   1st Qu.: 986.0   1st Qu.:   0.0              
##  Median :1.000   Median : 45.00   Median :1000.0   Median : 110.0              
##  Mean   :1.896   Mean   : 50.05   Mean   : 993.5   Mean   : 147.9              
##  3rd Qu.:3.000   3rd Qu.: 65.00   3rd Qu.:1007.0   3rd Qu.: 220.0              
##  Max.   :5.000   Max.   :165.00   Max.   :1024.0   Max.   :1440.0              
##  NA's   :14734                                     NA's   :9512                
##  hurricane_force_diameter
##  Min.   :  0.00          
##  1st Qu.:  0.00          
##  Median :  0.00          
##  Mean   : 14.92          
##  3rd Qu.:  0.00          
##  Max.   :300.00          
##  NA's   :9512