Introduction

This program looks at storm data from the National Weather Service and examines the impacts of different event types on public health and economic outcomes. The program will uplaod the correct dataset, clean and process, and produce a few exploratory graphs that attempt to provide ideas for future analysis.

Data Processing

This section sets up the directories and imports the libaries necessary for my results to be reproducible. It then imports the data and reads it in to R. The data is then processed for the analysis, mostly by cleaning the EVTYPE variable and formatting dates correctly. ## Setting Up This code sets up R for the analysis. The program uses tidyverse packages, as well as RColorBrewer to look pretty.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:dplyr':
## 
##     intersect, setdiff, union
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(RColorBrewer)

We will also create the directories to be used from a given working directory, and download the data to that directory.

if(!file.exists('./plots')){dir.create('./plots')}
if(!file.exists('./data')){dir.create('./data')}
if(!file.exists('./data/weatherdata.csv.bz2')) {
     url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
     download.file(url,'./data/weatherdata.csv.bz2', method = "curl")
}

And finally, we will load the data

weatherdata <- read.csv('./data/weatherdata.csv.bz2')

Processing for Analysis

Columns and Dates

First, we take a high-level look at the data, just to be sure what we’re dealing with. This step isn’t needed for reproducibility, but is a best practice.

dim(weatherdata)
## [1] 902297     37
str(weatherdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Next, the analysis is only interested in a few of these columns, so we want to narrow down our data to just the columns for the analysis. These columns contain data on our dates, geographic information, the type of event, and the health and economic variables of interest to the analysis. I also prefer lower-case names, so I edit the names here.

relevant_cols <- c("BGN_DATE","STATE","EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP","REFNUM")
working_data <- weatherdata[,relevant_cols]
names(working_data) <- tolower(names(working_data))

Because the analysis is interested in how some of these variables have changed over time, we also need to correctly format the dates and then, since this is just a simple analysis, isolate the year for each observation.

working_data <- working_data %>% mutate(dates = mdy_hms(bgn_date)) %>% mutate(years = year(dates))

Now with all of the relevant columns added and formatted, the program checks for the percentage of NA’s in each column

num_of_na <- c()
for(i in seq_along(names(working_data))){
     na <- round(mean(is.na(working_data[,i])),2)*100
     num_of_na <- c(num_of_na,na)
}
num_of_na
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0

No Na’s! that’s good.

Dealing with the Damage exponents.

The property and crop damage exponents need some work in order for the data to be usable. First, let’s check for the different symbols being used.

sort(table(working_data$propdmgexp), decreasing = T)
## 
##             K      M      0      B      5      1      2      ?      m      H 
## 465934 424665  11330    216     40     28     25     13      8      7      6 
##      +      7      3      4      6      -      8      h 
##      5      5      4      4      4      1      1      1

Looks confusing, but luckily someone at this link already double checked what each symbol was standing in for. Based on their analysis, the following code wiill create a new column for each exponent, and then another to multiply the damage number by said exponent.

working_data$propdmgexponent <- working_data$propdmgexp
working_data$propdmgexponent[grep("\\?", working_data$propdmgexp)] <- "0"
working_data$propdmgexponent[grep("\\-", working_data$propdmgexp)] <- "0"
working_data$propdmgexponent[grep("\\+", working_data$propdmgexp)] <- "1"
working_data$propdmgexponent[grep("[0-8]", working_data$propdmgexp)] <- "1"
working_data$propdmgexponent[grep("[Hh]", working_data$propdmgexp)] <- "2"
working_data$propdmgexponent[grep("[Kk]", working_data$propdmgexp)] <- "3"
working_data$propdmgexponent[grep("[Mm]", working_data$propdmgexp)] <- "6"
working_data$propdmgexponent[grep("[Bb]", working_data$propdmgexp)] <- "9"

working_data$cropdmgexponent <- working_data$cropdmgexp
working_data$cropdmgexponent[grep("\\?", working_data$cropdmgexp)] <- "0"
working_data$cropdmgexponent[grep("\\-", working_data$cropdmgexp)] <- "0"
working_data$cropdmgexponent[grep("\\+", working_data$cropdmgexp)] <- "1"
working_data$cropdmgexponent[grep("[0-8]", working_data$cropdmgexp)] <- "1"
working_data$cropdmgexponent[grep("[Hh]", working_data$cropdmgexp)] <- "2"
working_data$cropdmgexponent[grep("[Kk]", working_data$cropdmgexp)] <- "3"
working_data$cropdmgexponent[grep("[Mm]", working_data$cropdmgexp)] <- "6"
working_data$cropdmgexponent[grep("[Bb]", working_data$cropdmgexp)] <- "9"

working_data$total_prop_damage <- (working_data$propdmg * 10^as.numeric(working_data$propdmgexponent))/1000000
working_data$total_crop_damage <- (working_data$cropdmg * 10^as.numeric(working_data$cropdmgexponent))/1000000

Reclassifying Events

First, we get a rough of idea of the number of events and those most commonly seen.

no.events <- length(unique(working_data$evtype))
sort(table(working_data$evtype), decreasing = TRUE)[1:40]
## 
##                     HAIL                TSTM WIND        THUNDERSTORM WIND 
##                   288661                   219940                    82563 
##                  TORNADO              FLASH FLOOD                    FLOOD 
##                    60652                    54277                    25326 
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING 
##                    20843                    20212                    15754 
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM 
##                    15708                    11723                    11433 
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND 
##                     7026                     6839                     6175 
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND 
##                     5812                     3796                     3566 
##     URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD 
##                     3392                     2761                     2719 
##                  DROUGHT                ICE STORM           EXCESSIVE HEAT 
##                     2488                     2006                     1678 
##               HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE 
##                     1533                     1457                     1342 
##                DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL 
##                     1293                     1104                     1028 
##  EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF 
##                     1002                      767                      725 
##           TROPICAL STORM           FLASH FLOODING             EXTREME COLD 
##                      690                      682                      655 
##            COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD 
##                      650                      636                      624 
##                LANDSLIDE 
##                      600

I narrowed these 985 events in to 15 different rough classifications. Those are winter-related (snow, ice), wind, visibility(dust or fog), tornadoes, thunderstorms, rain, ocean-related, landslides, hurricanes, heat-related, hail, floods, fire, drought, and other. These classifications obviously are not perfect, and one figure below in particular will give us pause on our classifications, but in order to start generating first-pass exploratory graphs, I thought it sufficient.

The below code classifies these events in to these 15 categories. Note that the other was done first, then rain and wind categories because these were the more general categories, and so if for example a classification was “Hurricane Wind”, I wanted this to be attributed to hurricanes rather than wind.

working_data$event_class <- "Other"
working_data$event_class[grep("WINDS?|GUSTY|BURST", working_data$evtype, ignore.case = T)] <- "Wind"
working_data$event_class[grep("RAIN|PRECIP", working_data$evtype, ignore.case = T)] <- "Rain"
working_data$event_class[grep("SNOW|BLIZZARD|IC(E|Y)|WINT|COLD|CHILL|FREEZ|RECORD LOW|SLEET|FROST|MIX", 
                              working_data$evtype, ignore.case = T)] <- "Winter"
working_data$event_class[grep("HAIL", working_data$evtype, ignore.case = T)] <- "Hail"
working_data$event_class[grep("TSTM|THUNDERSTORM|LIGHTN?ING", working_data$evtype, ignore.case = T)] <- "Thunderstorm"
working_data$event_class[grep("TORN|NADO|FUNNEL|WATERSPOUT", working_data$evtype, ignore.case = T)] <- "Tornado"
working_data$event_class[grep("FLOO?OD|FLD|SURF", working_data$evtype, ignore.case = T)] <- "Flood"
working_data$event_class[grep("FIRE", working_data$evtype, ignore.case = T)] <- "Fire"
working_data$event_class[grep("DROUGHT", working_data$evtype, ignore.case = T)] <- "Drought"
working_data$event_class[grep("HEAT|HIGH TEMPERATURE|WARMTH|RECORD HIGH", working_data$evtype, ignore.case = T)] <- "Heat"
working_data$event_class[grep("SLIDES?\\>|AVALAN", working_data$evtype, ignore.case = T)] <- "Landslides"
working_data$event_class[grep("RIP CURRENT|TIDE|MARINE|SEA|SURGE|WAVE", working_data$evtype, ignore.case = T)] <- "Ocean"
working_data$event_class[grep("HURRICANE|TROPICAL STORM|TYPHOON", working_data$evtype, ignore.case = T)] <- "Hurricane"
working_data$event_class[grep("FOG|DUST", working_data$evtype, ignore.case = T)] <- "Visibility" 

Now lets take a look at our final classifications, as well as the percentage of the data that got classified as “Other”. If this were too high, we would want to perhaps take another look at some of our missing classifications.

sort(table(working_data$event_class), decreasing = TRUE)[1:length(unique(working_data$event_class))]
## 
## Thunderstorm         Hail        Flood      Tornado       Winter         Wind 
##       340577       288838        87184        71544        46889        26327 
##        Ocean         Rain         Fire         Heat      Drought   Visibility 
##        14465        11902         4240         2757         2495         2472 
##   Landslides    Hurricane        Other 
##         1040          996          571
other_percentage <- round(mean(working_data$event_class == "Other")*100,2)

I am happy with 0.06 percent slippage in the other category!

Finally, we need to subset based on years. As figure 1 below will show, tornadoes far outpace all other events in casualities. This is because they were the first events tracked by the data, and so had a long start time. We will look at the total affects below, but except for the first graph and andother tracking tornadoes since 1950, we will be using the following data subset.

modern_data <- filter(working_data, years >= 1996)

And that concludes our data processing.

Results

Health Outcomes

First, lets take a look at the total outcomes of both fatalities and injuries across all years.

event_fatalsums <- working_data %>% group_by(event_class) %>% summarize(sums = sum(fatalities, na.rm = TRUE)) %>% mutate(tag = 'Fatalities')
event_injurysums <- working_data %>% group_by(event_class) %>% summarize(sums = sum(injuries, na.rm = TRUE)) %>% mutate(tag = 'Injuries')
total_health_sums <- rbind(event_fatalsums,event_injurysums)

casualitysums_bar <- ggplot(total_health_sums, aes(event_class, sums, fill = tag)) + 
        geom_bar(stat = "identity", color = "black") + 
        coord_flip() +
        scale_fill_brewer(palette = "Spectral") +
        labs(x ="Event Type", y = "Casualities", title = "Total Casualities by Event Type")
print(casualitysums_bar)

Clearly, we can’t look at all years for accurate representations of our data. Let’s try the modern data looking since 1996.

all_nados <- round(mean(working_data$event_class == 'Tornado')*100,2)
new_nados <- round(mean(modern_data$event_class == 'Tornado')*100,2)
modern_event_fatalsums <- modern_data %>% group_by(event_class) %>% summarize(sums = sum(fatalities, na.rm = TRUE)) %>% mutate(tag = 'Fatalities')
modern_event_injurysums <- modern_data %>% group_by(event_class) %>% summarize(sums = sum(injuries, na.rm = TRUE)) %>% mutate(tag = 'Injuries')
modern_total_health_sums <- rbind(modern_event_fatalsums,modern_event_injurysums)

modern_casualitysums_bar <- ggplot(modern_total_health_sums, aes(event_class, sums, fill = tag)) + 
     geom_bar(stat = "identity", color = "black") + 
     coord_flip() +
     scale_fill_brewer(palette = "Spectral") +
     labs(x ="Event Type", y = "Casualities", title = "Total Casualities by Event Type", subtitle = "Since 1996")
print(modern_casualitysums_bar)

So that clearly makes a difference, but tornadoes still dominate. Although, it does seem odd that the proportion of the dataset taken up by tornadoes only fell from 7.93 percent to 4.99 percent. That might be something to check in our data later.

It’s also no surprise that the most fatal events also cause the most injuries, though ocean and landslide events seem to be much more lethal in relative terms than the more destructive events like tornadoes and storms.

As the below graph shows, when ladnslide and ocean events do occur, they are significantly more lethal than other more broadly destructive events.

modern_lethality <- modern_event_fatalsums %>% bind_cols(modern_event_injurysums)
lethal_bar <- ggplot(modern_lethality,aes(event_class, sums/(sums1 + sums), fill = event_class)) + 
        geom_bar(stat = "identity", color = "black") + 
        ylim(0, 0.75) +
        coord_flip() +
        theme(legend.position="none") +
        labs(x ="Event Type", y = "Lethality", title = "Lethality by Event Type", subtitle = "Since 1996")
print(lethal_bar)

One other interesting point. Our data may be more accurately reflecting the occurrence rate of different events. As shown here, the event with the highest number of casualties is actually the heat-related category. It is important to note, however, that the means may be low because there are a vast number of zero-casualty events for a given class.

event_fatalmeans <- modern_data %>% group_by(event_class) %>% 
     summarize(means = mean(fatalities, na.rm = TRUE)) %>% mutate(tag = 'Fatalities')
event_injurymeans <- modern_data %>% group_by(event_class) %>% 
     summarize(means = mean(injuries, na.rm = TRUE)) %>% mutate(tag = 'Injuries')
total_health_means <- rbind(event_fatalmeans,event_injurymeans)

casualitymeans_bar <- ggplot(total_health_means, aes(event_class, means, fill = tag)) + 
     geom_bar(stat = "identity", color = "black") + 
     coord_flip() +
     scale_fill_brewer(palette = "Spectral") +
     labs(x ="Event Type", y = "Casualities", title = "Mean Casualities by Event Type")
print(casualitymeans_bar)

It would also be interest to see how some of these events vary over time (or at least since 1996). The below graph plots the five most destructive event classes over the time period.

big_events <- c('Winter','Tornado','Thunderstorm','Heat','Flood')
big_event_data <- modern_data %>% filter(event_class %in% big_events) %>% mutate(casualities = fatalities + injuries)
big_event_sums <- big_event_data %>% group_by(event_class, years) %>% summarize(sums = sum(casualities, na.rm = TRUE))
big_event_line <- ggplot(big_event_sums, aes(x = years, y = sums, group = event_class, color = event_class)) +
        geom_line() +
        geom_point() +
        labs(x = "Year", y = "Sum of Casualities", title = "Casualities by Event from 1996 to 2011") +
        theme(legend.title=element_blank())
print(big_event_line)

and just for fun, here is another graph tracking tornadoes from 1950. It is interesting that every 30 years or so there is a particularly bad tornado season in the US causing a spike in casualties.

tornado_event_data <- working_data %>% filter(event_class == 'Tornado') %>% mutate(casualities = fatalities + injuries)
tornado_event_sums <- tornado_event_data %>% group_by(years) %>% summarize(sums = sum(casualities, na.rm = TRUE))
tornado_event_line <- ggplot(tornado_event_sums, aes(x = years, y = sums)) +
        geom_line() +
        geom_point()+
        labs(x = 'Year', y = 'Sum of Casualities', title =  'Tornados from 1950 to 2011', subtitle = 'Just for fun')
print(tornado_event_line)

We can also try to examine some geographic data. Namely, I want to see how outcomes have changed for the 10 states with the most casualties causes by weather events have changed from 1996-2004 to 2004-2011.

rel_states <- big_event_data %>% group_by(state) %>% summarize(sums = sum(casualities, na.rm = TRUE)) %>%
        filter(sums %in% sort(sums, decreasing = TRUE)[1:10])
state_sums_early <- big_event_data %>% filter(years <= 2004 & state %in% rel_states[[1]]) %>% 
        group_by(state,event_class) %>% summarize(sums = sum(casualities, na.rm = TRUE)) %>% mutate(tag = 'Pre-2004')
state_sums_late <- big_event_data %>% filter(years > 2004 & state %in% rel_states[[1]]) %>% 
        group_by(state,event_class) %>% summarize(sums = sum(casualities, na.rm = TRUE)) %>% mutate(tag = 'Post-2004')
state_sums_total <- rbind(state_sums_early,state_sums_late)
state_event_line <- ggplot(state_sums_total, aes(x = tag, y = sums, group = event_class, color = event_class)) +
        geom_line() +
        geom_point() +
        facet_wrap( ~ state, ncol = 5) +
        coord_cartesian(ylim=c(0, 2500)) +
        labs(x = "Year", 
             y = "Sum of Casualities", 
             title = "Casualities by Event in Most Affected States", 
             subtitle = "Before and After 2004") +
        theme(legend.title=element_blank())
print(state_event_line)

Economic Outcomes

First, lets look at the aggregates of all damage for each of the major event types

event_propsums <- modern_data %>% group_by(event_class) %>% 
     summarize(sums = sum(total_prop_damage, na.rm = TRUE)) %>% mutate(tag = 'Property Damage')
event_cropsums <- modern_data %>% group_by(event_class) %>% summarize(sums = sum(total_crop_damage, na.rm = TRUE)) %>% 
     mutate(tag = 'Crop Damage')
total_damage_sums <- rbind(event_propsums,event_cropsums)
total_damage_sums_bar <- ggplot(total_damage_sums, aes(event_class, sums, fill = tag)) + 
     geom_bar(stat = "identity", color = "black") + 
     coord_flip() +
     scale_fill_brewer(palette = "Spectral") +
     labs(x ="Event Type", y = "Damage in $MM", title = "Total Damage by Event Type", subtitle = "Since 1996")
print(total_damage_sums_bar)

There seems to be a big difference between each type of damage, so lets look at each seperately

propsums_bar <- ggplot(event_propsums,aes(event_class, sums, fill = event_class)) + 
     geom_bar(stat = "identity", color = "black") + 
     coord_flip() +
     theme(legend.position="none") +
     ylim(0,200000) +
     labs(x ="Event Type", y = "Damage in $MM", title = "Total Property Damage by Event Type")
print(propsums_bar)

cropsums_bar <- ggplot(event_cropsums,aes(event_class, sums, fill = event_class)) + 
     geom_bar(stat = "identity", color = "black") + 
     coord_flip() +
     theme(legend.position="none") +
     ylim(0,15000) +
     labs(x ="Event Type", y = "Damage in $MM", title = "Total Crop Damage by Event Type")
print(cropsums_bar)

And look at these over time

## By Time
big_prop_events <- c('Flood','Hurricane','Ocean','Tornado','Hail')
big_prop_data <- modern_data %>% filter(event_class %in% big_prop_events)

big_crop_events <- c('Drought','Flood','Hurricane','Winter','Hail')
big_crop_data <- modern_data %>% filter(event_class %in% big_crop_events)

big_prop_sums <- big_prop_data %>% group_by(event_class, years) %>% summarize(sums = sum(total_prop_damage, na.rm = TRUE))

big_prop_line <- ggplot(big_prop_sums, aes(x = years, y = sums, group = event_class, color = event_class)) +
        geom_line() +
        geom_point() +
        labs(x = "Year", y = "Property Damage", title = "Property Damage by Event from 1996 to 2011") +
        theme(legend.title=element_blank())
print(big_prop_line)

## Guessing on these, "ocean" is probably mislabeled (2004 was hurricane Katrina)
## The big spike in the Flood in 2006 is likely the Mid-Atlantic Flood
big_crop_sums <- big_crop_data %>% group_by(event_class, years) %>% summarize(sums = sum(total_crop_damage, na.rm = TRUE))

big_crop_line <- ggplot(big_crop_sums, aes(x = years, y = sums, group = event_class, color = event_class)) +
        geom_line() +
        geom_point() +
        labs(x = "Year", y = "Crop Damage", title = "Crop Damage by Event from 1996 to 2011") +
        theme(legend.title=element_blank())
print(big_crop_line)

The property damage time graph is particularly interesting because it may point to a mislabeling of the data. it’s odd that hurricane and ocean events spike with one another during Hurricane Katrina, although these could of course just be correlted events

Conclusion

In conclusion, tornadoes seem to be the greatest weather-related health risk to Americans given this data, followed in aggregate terms by flooding, thunderstorms, and heat-related events. This information comes with the important caveat that some events, when they occur, are particularly dangerous, such as landslides and ocean-related events.

On the economic side, property damage vastly outweighs crop damage in these weather events. The events causing the most property damage were floods, hurricanes, ocean-related events, and tornadoes.

Crop damage was a result of mostly droughts, followed by hail and flooding.

There are some causes for concern in my processing of the data. There do seem to be some correlations between similar events, such as hurricanes and ocean swells. If I had more time to completely the assignment, one route I would really like to use k-means clustering based on the latitude and longitude data to attempt to more accurately group the weather events together. For example, in some of the first data in Alabama, there were simultaneous thunderstorms and tornadoes. This was likely a single weather event, and not accounting for these correlations is a big weakness in the current analysis.