Image credit: Amanda Hill/NOAA Weather in Focus Photo Contest 2015

Introduction

The NOAA Tornado Database is provided by the National Oceanic and Atmospheric Administration (NOAA). The file used here, 2024_torn.csv, contains records of tornadoes in the US in 2024.
Source: https://www.noaa.gov/

I plan to explore how time, location, and path size relate to damage costs. My cleaning will include replacing the missing-value code -9 with NA, filtering out rows with NA, and selecting key variables.

Key variables:

-Date

-State

-Magnitude

-Damage Costs

-Path Length

-Path Width

-Latitude/Longitude

I chose this topic and dataset because I was interested in plotting the separate starting and ending tornado latitude/longitudes on a map. I have also never seen a tornado, so I thought this would be the next best thing.

Load libraries and dataset

library(tidyverse)
library(highcharter)
library(leaflet)
library(RColorBrewer)

tornadoes <- read_csv("2024_torn.csv")
head(tornadoes)
## # A tibble: 6 × 29
##       om    yr mo    dy    date       time      tz st    stf     stn   mag   inj
##    <dbl> <dbl> <chr> <chr> <date>     <time> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 623402  2024 01    05    2024-01-05 05:56      3 TX    48        0     0     0
## 2 623403  2024 01    06    2024-01-06 14:32      3 FL    12        0    -9     0
## 3 623404  2024 01    06    2024-01-06 16:47      3 FL    12        0     0     0
## 4 623405  2024 01    08    2024-01-08 15:42      3 LA    22        0     0     0
## 5 623406  2024 01    08    2024-01-08 19:25      3 MS    28        0     0     0
## 6 623407  2024 01    08    2024-01-08 19:31      3 MS    28        0     0     0
## # ℹ 17 more variables: fat <dbl>, loss <dbl>, closs <dbl>, slat <dbl>,
## #   slon <dbl>, elat <dbl>, elon <dbl>, len <dbl>, wid <dbl>, ns <dbl>,
## #   sn <dbl>, sg <dbl>, f1 <dbl>, f2 <dbl>, f3 <dbl>, f4 <dbl>, fc <dbl>

Cleaning

#-9 indicates missing data, replace with NA
#Store mag as a factor so the plots below treat it as a discrete category
tornadoes$mag <- factor(na_if(tornadoes$mag, -9))

#Drop rows with missing magnitude, then keep only the key variables
tornadoes2 <- tornadoes |>
  filter(!is.na(mag)) |>
  select(date, mo, dy, time, st, mag, loss, slat, slon, elat, elon, len, wid)

#head(tornadoes2)
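
The cleaning step above only handles -9 in mag. As a minimal sketch, the snippet below counts how often -9 appears in each numeric column. It runs on a small made-up tibble (`toy` is hypothetical, not NOAA data), and whether -9 means "missing" in any column other than mag is an assumption to check against the dataset documentation.

```r
library(dplyr)

# Hypothetical toy rows standing in for the real `tornadoes` data
toy <- tibble(
  st  = c("TX", "FL", "FL", "LA"),
  mag = c(0, -9, 1, 2),
  len = c(1.2, 0.5, 3.4, 0.1)
)

# Count the -9 code in every numeric column (character columns are skipped)
sentinel_counts <- toy |>
  summarise(across(where(is.numeric), ~ sum(.x == -9, na.rm = TRUE)))

sentinel_counts
```

Swapping `toy` for `tornadoes` would give the same counts for the real file.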

Explore data

#Rows with missing mag were already dropped when building tornadoes2
tornadoes3 <- tornadoes2 |>
  group_by(st) |>
  summarise(number_tornadoes = n(),
            mean_width = mean(wid),
            mean_length = mean(len),
            mean_damage = mean(loss)) |>
  arrange(desc(number_tornadoes))
head(tornadoes3, 5)
## # A tibble: 5 × 5
##   st    number_tornadoes mean_width mean_length mean_damage
##   <chr>            <int>      <dbl>       <dbl>       <dbl>
## 1 OK                 128       399.        5.37    2641711.
## 2 IL                 127       162.        5.56     110705.
## 3 TX                 108       268.        5.50     375662.
## 4 FL                  99       218.        7.13    9081545.
## 5 IA                  98       189.        7.00     488063.
#Top five states
tornadoes5 <- tornadoes2 |>
  filter(st %in% c("OK", "IL", "TX", "FL", "IA"))
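
The five state codes above are typed in by hand from the summary table. A possible alternative, sketched here on a small stand-in table, is to pull them from the summary itself so the filter stays correct if the data changes. The first five rows of `state_summary` copy the counts printed above; the sixth row (MS, 90) is made up for illustration only.

```r
library(dplyr)

# Stand-in for `tornadoes3`; the MS row is hypothetical
state_summary <- tibble(
  st = c("OK", "IL", "TX", "FL", "IA", "MS"),
  number_tornadoes = c(128, 127, 108, 99, 98, 90)
)

# Sort by count and keep the codes of the five most frequent states
top_states <- state_summary |>
  arrange(desc(number_tornadoes)) |>
  slice_head(n = 5) |>
  pull(st)

top_states
```

With the real objects this would read `tornadoes2 |> filter(st %in% top_states)`, with `top_states` built from `tornadoes3`.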

Exploration plot (not final)

ggplot(tornadoes5, aes(x = date, y = wid, color = mag)) + 
  geom_point(aes(size = loss), alpha = 0.1) +   #constant transparency goes outside aes()
  labs(
    x = "Month",
    y = "Width (Yards)",
    title = "Tornado Width over Time (2024)",
    caption = "Source: NOAA"
  ) + 
  scale_color_brewer(palette = "RdYlGn", direction = -1) +   #red = highest magnitude, as in the later plots
  theme_minimal() + 
  scale_x_date(date_breaks = "1 month", date_labels = "%b")

#https://stackoverflow.com/questions/10576095/formatting-dates-with-scale-x-date-in-ggplot2

Plot

#Keep only tornadoes with a recorded damage amount (loss > 0)
tornadoesfinal <- tornadoes5 |>
  filter(loss > 0)
hchart(
  tornadoesfinal,
  "bubble",
  hcaes(x = wid, y = len, group = mag, z = loss)) |>
  hc_title(text = "Effect of Tornado Length and Width on Damage in Top 5 US States (2024)") |>
  hc_xAxis(type = "logarithmic", 
           title = list(text="Width (yards log scale)")) |>
  hc_yAxis(type = "logarithmic",
           title = list(text="Length (miles log scale)")) |>
  hc_colors(rev(brewer.pal(5, "RdYlGn"))) |>
  hc_tooltip(borderColor = "black",
             pointFormat = "State: {point.st}
             <br>Damage ($): {point.loss}") |>
  hc_caption(text = "<b>Source:</b> NOAA <br> Bubble size indicates damage in dollars <br> Top 5 states by tornado frequency") |>
  hc_legend(title = list(text = "Magnitude")) |>
  hc_add_theme(hc_theme_538())

Map

#Prepare dataset for mapping
tornadoesfinal2 <- tornadoes2 |>
  mutate(latitude = slat,      #allow leaflet to read lat/long
         longitude = slon) |>
  filter(loss > 0)
#Add color palette
#https://rstudio.github.io/leaflet/articles/colors.html
pal <- colorFactor(
  palette = rev(brewer.pal(5, "RdYlGn")), 
  domain = tornadoesfinal2$mag
)

#Add interactivity
tooltip <- paste0(
      "<b>State: </b>", tornadoesfinal2$st, "<br>",
      "<b>Damage: </b>$", tornadoesfinal2$loss, "<br>",
      "<b>Magnitude: </b>", tornadoesfinal2$mag, "<br>",
      "<b>Path Length: </b>", tornadoesfinal2$len, " miles<br>",
      "<b>Path Width: </b>", tornadoesfinal2$wid, " yards<br>"
    )

#Plot
leaflet() |>
  setView(lng = -95, lat = 38, zoom = 3.5) |>
  addProviderTiles("Stadia.AlidadeSmoothDark") |>
  addCircles(
    data = tornadoesfinal2,
    color = ~pal(mag),
    radius = ~sqrt(loss) * 10,   #circle area scales with damage
    popup = tooltip
  )

Conclusion

My scatterplot shows how the length and width of a tornado’s path relate to the damage it causes. Length and width are positively correlated, and the relationship looks roughly linear on the log-log axes; as they increase, magnitude also goes up. Tornadoes with higher magnitudes cause more costly damage, as seen in the larger bubble sizes. Additionally, in my exploration plot I found that tornado width was highest during tornado season. Isn’t that something.
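
The correlation between width and length described above is read off the plot by eye. A rough way to put a number on it, sketched here with made-up values (`toy` is hypothetical, not NOAA records), is the Pearson correlation of the log10 values, which matches the log scales used on both axes.

```r
# Hypothetical toy values, not real NOAA records
toy <- data.frame(
  wid = c(10, 50, 100, 400, 1000),  # path width in yards
  len = c(0.1, 0.8, 2.5, 6, 15)     # path length in miles
)

# Pearson correlation on the log10 scale; both columns must be > 0
log_cor <- cor(log10(toy$wid), log10(toy$len))

log_cor
```

On the real data the equivalent call would be `with(tornadoesfinal, cor(log10(wid), log10(len)))`, assuming every remaining row has a width and length above zero.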

My map shows where tornadoes with recorded damage occurred in the US and how powerful they were. Similar to the scatterplot, color indicates magnitude and circle size indicates damage costs. An interesting pattern is that the most powerful tornadoes are in the central US and Florida, with many weaker ones in the East, while there seem to be very few tornadoes at all in the West. This is apparently because the West lacks the moisture and contrasting air masses that facilitate tornado development.

I wish I could have included the paths of the tornadoes on my map with their start and end points, but I could not successfully get this to work.
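
One possible way to draw the paths, offered as an untested-on-the-real-data sketch rather than a finished solution, is to turn each tornado into a two-point line from (slon, slat) to (elon, elat) and pass all of them to addPolylines at once, separated by NA values. This relies on leaflet treating NA as a break between lines, which is my understanding of its documented matrix input. The rows in `toy` are hypothetical, `path_coords` is a helper name invented here, and treating an end point of 0 as "not recorded" is an assumption to check against the dataset documentation.

```r
# Hypothetical toy rows using the same column names as tornadoes2
toy <- data.frame(
  slat = c(35.10, 30.40, 41.00),
  slon = c(-97.50, -87.20, -93.00),
  elat = c(35.25, 30.45, 0),
  elon = c(-97.30, -87.10, 0)
)

# ASSUMPTION: an end point of 0 means no end point was recorded
toy <- subset(toy, elat != 0 & elon != 0)

# Interleave start, end, NA for each tornado; NA separates one line from the next
path_coords <- function(df) {
  coords <- data.frame(
    lng = as.vector(rbind(df$slon, df$elon, NA)),
    lat = as.vector(rbind(df$slat, df$elat, NA))
  )
  head(coords, -1)  # drop the trailing NA separator
}

paths <- path_coords(toy)

# Build the map only if leaflet is installed
if (requireNamespace("leaflet", quietly = TRUE)) {
  path_map <- leaflet::leaflet() |>
    leaflet::addTiles() |>
    leaflet::addPolylines(lng = paths$lng, lat = paths$lat, weight = 2)
}
```

Printing `path_map` displays the map. For the real data, `tornadoesfinal2` would take the place of `toy`, and the same addPolylines call could be added to the existing map pipeline.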

Citations

https://rstudio.github.io/leaflet/articles/colors.html

https://stackoverflow.com/questions/10576095/formatting-dates-with-scale-x-date-in-ggplot2