The NOAA Tornado Database is provided by the National Oceanic and
Atmospheric Administration. It contains records of tornadoes in the US
in 2024.
Source: https://www.noaa.gov/
I plan to explore how time, location, and path size affect damage costs. My cleaning will include replacing the missing-value code “-9” with NA, filtering out rows with a missing magnitude, and selecting key variables.
Key variables:
-Date
-State
-Magnitude
-Damage Costs
-Path Length
-Path Width
-Latitude/Longitude
I chose this topic and dataset because I was interested in plotting the separate starting and ending tornado latitude/longitudes on a map. I have also never seen a tornado, so I thought this would be the next best thing.
library(tidyverse)
library(highcharter)
library(leaflet)
library(RColorBrewer)
tornadoes <- read_csv("2024_torn.csv")
head(tornadoes)
## # A tibble: 6 × 29
## om yr mo dy date time tz st stf stn mag inj
## <dbl> <dbl> <chr> <chr> <date> <time> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 623402 2024 01 05 2024-01-05 05:56 3 TX 48 0 0 0
## 2 623403 2024 01 06 2024-01-06 14:32 3 FL 12 0 -9 0
## 3 623404 2024 01 06 2024-01-06 16:47 3 FL 12 0 0 0
## 4 623405 2024 01 08 2024-01-08 15:42 3 LA 22 0 0 0
## 5 623406 2024 01 08 2024-01-08 19:25 3 MS 28 0 0 0
## 6 623407 2024 01 08 2024-01-08 19:31 3 MS 28 0 0 0
## # ℹ 17 more variables: fat <dbl>, loss <dbl>, closs <dbl>, slat <dbl>,
## # slon <dbl>, elat <dbl>, elon <dbl>, len <dbl>, wid <dbl>, ns <dbl>,
## # sn <dbl>, sg <dbl>, f1 <dbl>, f2 <dbl>, f3 <dbl>, f4 <dbl>, fc <dbl>
#-9 indicates missing data, sub with NA
#mag is stored as character so the plots below treat it as a discrete category
tornadoes$mag <- as.character(na_if(tornadoes$mag, -9))
#Drop rows with a missing magnitude and keep only the variables used below
tornadoes2 <- tornadoes |>
  filter(!is.na(mag)) |>
  select(date, mo, dy, time, st, mag, loss, slat, slon, elat, elon, len, wid)
#head(tornadoes2)
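#Count tornadoes and average path size and damage by state
#(rows with a missing magnitude were already dropped from tornadoes2)
tornadoes3 <- tornadoes2 |>
  group_by(st) |>
  summarise(number_tornadoes = n(),
            mean_width = mean(wid),
            mean_length = mean(len),
            mean_damage = mean(loss)) |>
  arrange(desc(number_tornadoes))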
head(tornadoes3, 5)
## # A tibble: 5 × 5
## st number_tornadoes mean_width mean_length mean_damage
## <chr> <int> <dbl> <dbl> <dbl>
## 1 OK 128 399. 5.37 2641711.
## 2 IL 127 162. 5.56 110705.
## 3 TX 108 268. 5.50 375662.
## 4 FL 99 218. 7.13 9081545.
## 5 IA 98 189. 7.00 488063.
#Top five states
tornadoes5 <- tornadoes2 |>
filter(st %in% c("OK", "IL", "TX", "FL", "IA"))
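ggplot(tornadoes5, aes(x = date, y = wid, color = mag)) +
  geom_point(aes(size = loss), alpha = 0.1) + #constant alpha goes outside aes() so it adds no legend
  labs(
    x = "Month",
    y = "Width (Yards)",
    color = "Magnitude",
    size = "Damage ($)",
    title = "Tornado Width over Time in Top 5 States (2024)",
    caption = "Source: NOAA"
  ) +
  scale_color_brewer(palette = "RdYlGn", direction = -1) + #green = weak, red = strong, as in the later plots
  theme_minimal() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")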
#https://stackoverflow.com/questions/10576095/formatting-dates-with-scale-x-date-in-ggplot2
#Keep only tornadoes with recorded damage (loss > 0) so every bubble has a size
tornadoesfinal <- tornadoes5 |>
  filter(loss > 0)
hchart(
  tornadoesfinal,
  "bubble",
  hcaes(x = wid, y = len, group = mag, z = loss)) |>
  hc_title(text = "Effect of Tornado Length and Width on Damage in Top 5 US States (2024)") |>
  hc_xAxis(type = "logarithmic",
           title = list(text = "Width (yards, log scale)")) |>
  hc_yAxis(type = "logarithmic",
           title = list(text = "Length (miles, log scale)")) |>
  hc_colors(rev(brewer.pal(5, "RdYlGn"))) |>
  hc_tooltip(borderColor = "black",
             pointFormat = "State: {point.st}<br>Damage ($): {point.loss}") |>
  hc_caption(text = "<b>Source:</b> NOAA <br> Bubble size indicates damage in dollars <br> Top 5 states by tornado frequency") |>
  hc_legend(title = list(text = "Magnitude")) |>
  hc_add_theme(hc_theme_538())
#Prepare dataset for mapping
tornadoesfinal2 <- tornadoes2 |>
  mutate(latitude = slat, #allow leaflet to read lat/long
         longitude = slon) |>
  filter(loss > 0)
#Add color palette
#https://rstudio.github.io/leaflet/articles/colors.html
pal <- colorFactor(
  palette = rev(brewer.pal(5, "RdYlGn")),
  domain = tornadoesfinal2$mag
)
#Add interactivity
tooltip <- paste0(
  "<b>State: </b>", tornadoesfinal2$st, "<br>",
  "<b>Damage: </b>$", tornadoesfinal2$loss, "<br>",
  "<b>Magnitude: </b>", tornadoesfinal2$mag, "<br>",
  "<b>Path Length: </b>", tornadoesfinal2$len, " miles<br>",
  "<b>Path Width: </b>", tornadoesfinal2$wid, " yards<br>"
)
#Plot
leaflet() |>
  setView(lng = -95, lat = 38, zoom = 3.5) |>
  addProviderTiles("Stadia.AlidadeSmoothDark") |>
  addCircles(
    data = tornadoesfinal2,
    color = ~pal(mag),
    radius = ~sqrt(loss) * 10,
    popup = tooltip
  )
My scatterplot shows how the length and width of a tornado’s path relate to the damage it causes. On the log-scaled axes, length and width are positively correlated, and as they increase, magnitude also goes up. Tornadoes with higher magnitudes cause more costly damage, as seen in the larger point radii. Additionally, in my test plot I found that tornado length/width was highest during tornado season. Isn’t that something.
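The correlation described above is read off the chart by eye, so a quick numeric check could back it up. This is only a minimal sketch: `summarise_paths` is a hypothetical helper I have not run on the real data, and the `toy` rows are made up for illustration (they are not NOAA records). It assumes a data frame with the same `wid`, `len`, `mag`, and `loss` columns as `tornadoesfinal`.

```r
library(dplyr)

# Hypothetical helper: correlation of log10 width vs. log10 length, plus
# median path size and damage per magnitude level.
# Expects columns wid, len, mag, loss (same names as the NOAA file).
summarise_paths <- function(df) {
  positive <- df |> filter(wid > 0, len > 0) # log10() needs positive values
  list(
    log_cor = cor(log10(positive$wid), log10(positive$len)),
    by_mag = positive |>
      group_by(mag) |>
      summarise(n = n(),
                median_wid = median(wid),
                median_len = median(len),
                median_loss = median(loss),
                .groups = "drop")
  )
}

# Made-up rows for illustration only, NOT real tornado records
toy <- data.frame(
  mag  = c("0", "0", "1", "1", "2", "3"),
  wid  = c(25, 50, 100, 150, 400, 800),
  len  = c(0.2, 0.5, 2, 3, 8, 20),
  loss = c(1000, 5000, 20000, 50000, 500000, 4000000)
)

res <- summarise_paths(toy)
res$log_cor # correlation on the log-log scale
res$by_mag  # medians per magnitude level
```

With the real data, the call would presumably be `summarise_paths(tornadoesfinal)`. Medians are used instead of means because a few very costly tornadoes could dominate an average.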
My map shows where tornadoes are most frequent and powerful in the US. As in the scatterplot, color indicates magnitude and size indicates damage costs. An interesting pattern is that the most powerful tornadoes are in the central US and Florida, with many weaker ones in the East, but there seem to be very few tornadoes at all in the West. This is apparently because the West lacks the moisture and contrasting air masses that facilitate tornado development.
I wish I could have included the paths of the tornadoes on my map with their start and end points, but I could not successfully get this to work.
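One possible approach to drawing the paths, sketched here under some assumptions, is to add a two-point line per tornado with `addPolylines()`, running from (`slon`, `slat`) to (`elon`, `elat`). The rows below are made up for illustration and are not NOAA records. Treating an end coordinate of 0 as "end point not recorded" is my assumption about the file and should be checked against the NOAA documentation.

```r
library(dplyr)
library(leaflet)
library(RColorBrewer)

# Made-up rows for illustration only; column names match the NOAA file
paths <- data.frame(
  st   = c("OK", "IA", "FL"),
  mag  = c("3", "4", "1"),
  slat = c(35.20, 41.00, 28.10),
  slon = c(-97.50, -94.60, -81.90),
  elat = c(35.45, 41.35, 0), # 0 = assumed "end point not recorded"
  elon = c(-97.10, -94.10, 0)
)

# Keep only rows with a usable end point
paths_ok <- paths |> filter(elat != 0, elon != 0)

# Same green-to-red palette as the circle map
pal <- colorFactor(rev(brewer.pal(5, "RdYlGn")), domain = as.character(0:4))

m <- leaflet() |>
  setView(lng = -95, lat = 38, zoom = 4) |>
  addTiles()

# Add one start-to-end line per tornado
for (i in seq_len(nrow(paths_ok))) {
  m <- m |>
    addPolylines(
      lng = c(paths_ok$slon[i], paths_ok$elon[i]),
      lat = c(paths_ok$slat[i], paths_ok$elat[i]),
      color = pal(paths_ok$mag[i]),
      weight = 3,
      popup = paste0("<b>State: </b>", paths_ok$st[i],
                     "<br><b>Magnitude: </b>", paths_ok$mag[i])
    )
}

m
```

A loop is slow for thousands of rows but keeps each line's color and popup tied to its own tornado. Swapping `paths` for `tornadoes2` should work in principle, since it has the same columns.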