The storms data set comes bundled with the dplyr package in R. It contains data from the NOAA Atlantic hurricane database.
First, I need to do some setup so that my code shows in the Markdown file while certain messages and warnings are omitted, and load the libraries I will be using:
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
library(ggplot2)
library(scales)
Next, I want to store my data in a tibble dataframe:
data("storms", package = "dplyr")
storms <- as_tibble(storms)
Let’s get a better idea of the scope of this dataset. How many rows and columns are there?
n_rows <- nrow(storms)
n_cols <- ncol(storms)
print(paste("There are", n_rows, "rows and", n_cols, "columns in this dataset"))
## [1] "There are 19537 rows and 13 columns in this dataset"
Hurricane severity is categorized by wind speed. Let’s look at some stats for wind:
mean_wind <- mean(storms$wind, na.rm = TRUE)
min_wind <- min(storms$wind, na.rm = TRUE)
max_wind <- max(storms$wind, na.rm = TRUE)
# The dplyr documentation gives wind as maximum sustained wind speed in knots, not mph
print(paste("The average wind speed is", round(mean_wind, 0), "knots, and the lowest recorded speed is", round(min_wind, 0), "knots, while the highest recorded speed is", round(max_wind, 0), "knots"))
## [1] "The average wind speed is 50 knots, and the lowest recorded speed is 10 knots, while the highest recorded speed is 165 knots"
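Since severity follows from wind speed, the stats above can be tied back to hurricane categories. Here is a minimal sketch, assuming the standard Saffir-Simpson thresholds expressed in knots (the unit the dplyr documentation gives for `wind`). The helper `wind_to_category()` is a hypothetical name for illustration, not part of dplyr:

```r
# Hypothetical helper: map sustained wind speed (knots) to a
# Saffir-Simpson category. 0 means below hurricane strength.
wind_to_category <- function(wind_kt) {
  # Lower bounds (in knots) of categories 1 through 5
  lower_bounds <- c(64, 83, 96, 113, 137)
  findInterval(wind_kt, lower_bounds)
}

wind_to_category(c(30, 65, 100, 165))
# returns 0 1 3 5
```

By this scale, the maximum of 165 in the data is well into category 5, while the mean of about 50 sits below hurricane strength.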
What’s the correlation between the wind speed and pressure?
cor_wp <- cor(storms$wind, storms$pressure)
print(paste("The correlation between speed and pressure is", round(cor_wp, 2)))
## [1] "The correlation between speed and pressure is -0.93"
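A single coefficient says nothing about uncertainty. As a sketch, base R's `cor.test()` reports the same Pearson estimate along with a confidence interval and p-value. The rows are repeated observations of the same storms, so they are not independent, and the interval should be read as a rough guide only:

```r
data("storms", package = "dplyr")

# Pearson correlation test between wind speed and pressure
wp_test <- cor.test(storms$wind, storms$pressure, method = "pearson")

wp_test$estimate  # the correlation coefficient
wp_test$conf.int  # 95% confidence interval by default
wp_test$p.value   # p-value for the null hypothesis of zero correlation
```

The negative sign matches the physics: lower central pressure goes with stronger winds.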
How many uniquely named storms were there per year?
# Count the UNIQUE storm names per year, since each name is repeated across many rows:
storms_yearly <- storms %>%
  summarise(storms_per_year = n_distinct(name), .by = year) %>%
  arrange(year)
ggplot(storms_yearly, aes(x = year, y = storms_per_year)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = comma) +
  labs(title = "Unique Named Storms per Year",
       x = "Year", y = "Count of Storms")
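The `.by` argument used above was introduced in dplyr 1.1.0. For older installations, here is a sketch of an equivalent count using `distinct()` and `count()`; `storms_yearly_alt` is just an illustrative name:

```r
library(dplyr)
data("storms", package = "dplyr")

# Keep one row per (year, name) pair, then count those rows per year
storms_yearly_alt <- storms %>%
  distinct(year, name) %>%
  count(year, name = "storms_per_year")

head(storms_yearly_alt)
```

Either way, the count is made within each year, so a name that is reused in a later season is counted once per year it appears in.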
Finally, let’s look at some basic summary statistics:
summary(storms)
## name year month day
## Length:19537 Min. :1975 Min. : 1.000 Min. : 1.00
## Class :character 1st Qu.:1994 1st Qu.: 8.000 1st Qu.: 8.00
## Mode :character Median :2004 Median : 9.000 Median :16.00
## Mean :2003 Mean : 8.706 Mean :15.73
## 3rd Qu.:2013 3rd Qu.: 9.000 3rd Qu.:24.00
## Max. :2022 Max. :12.000 Max. :31.00
##
## hour lat long status
## Min. : 0.000 Min. : 7.00 Min. :-136.90 tropical storm :6830
## 1st Qu.: 5.000 1st Qu.:18.30 1st Qu.: -78.80 hurricane :4803
## Median :12.000 Median :26.60 Median : -62.30 tropical depression:3569
## Mean : 9.101 Mean :27.01 Mean : -61.56 extratropical :2151
## 3rd Qu.:18.000 3rd Qu.:33.80 3rd Qu.: -45.50 other low :1453
## Max. :23.000 Max. :70.70 Max. : 13.50 subtropical storm : 298
## (Other) : 433
## category wind pressure tropicalstorm_force_diameter
## Min. :1.000 Min. : 10.00 Min. : 882.0 Min. : 0.0
## 1st Qu.:1.000 1st Qu.: 30.00 1st Qu.: 986.0 1st Qu.: 0.0
## Median :1.000 Median : 45.00 Median :1000.0 Median : 110.0
## Mean :1.896 Mean : 50.05 Mean : 993.5 Mean : 147.9
## 3rd Qu.:3.000 3rd Qu.: 65.00 3rd Qu.:1007.0 3rd Qu.: 220.0
## Max. :5.000 Max. :165.00 Max. :1024.0 Max. :1440.0
## NA's :14734 NA's :9512
## hurricane_force_diameter
## Min. : 0.00
## 1st Qu.: 0.00
## Median : 0.00
## Mean : 14.92
## 3rd Qu.: 0.00
## Max. :300.00
## NA's :9512
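The `NA's` rows in the summary are easy to miss. A short sketch that tallies missing values per column, using only base R functions:

```r
data("storms", package = "dplyr")

# Count missing values in every column, then show only columns that have any
na_counts <- colSums(is.na(storms))
na_counts[na_counts > 0]
```

In the summary above, `category` is missing for 14734 rows, which equals the 19537 total rows minus the 4803 rows with status hurricane. That is consistent with a category being assigned only to hurricane observations, so means computed on `category` describe hurricanes alone.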