The US Government’s National Oceanic and Atmospheric Administration (NOAA) publishes a database of severe weather events across the country. The database contains data from 1950 to 2011 and includes information on the fatalities and injuries caused by each event, as well as the associated economic losses from property and crop damage.
While the quality of the dataset is variable, data for more recent years is expected to be more complete. The key findings from this analysis of the dataset include
The severe weather dataset was downloaded and examined. It consists of over 900,000 observations, each with up to 37 variables, including the date, time, and location of the event and the associated casualties and economic damage. The National Weather Service’s [Storm Data Documentation](https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf) was used to clarify the contents.
## Set up work environment and download the data
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ purrr 1.0.4 ✔ tidyr 1.3.1
## ✔ readr 2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
fileurl <-
"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
## Download the archive only if it is not already in the working directory
if (!file.exists("data.bz2")) download.file(fileurl, destfile = "data.bz2", mode = "wb")
## read.csv() reads the bz2-compressed file directly
extract <- read.csv("data.bz2")
The date on which the event occurred is provided as a character string in the format m/d/yyyy h:mm:ss. The economic damage is provided as two pairs of columns (one pair for property, one for crops): the first column of each pair holds a numeric value and the second holds the scale associated with that value. As the documentation did not provide an inventory of the values for the scale field, only the most commonly used scales (e.g., “k” or “K” for kilo, i.e., multiply by 1000) were used; damage values with any other scale code, including a blank one, are counted as zero. The analysis can easily be rerun if the corresponding information becomes available at a later stage. Relevant transformations were applied to produce a dataset ready for analysis.
extract$Year <- as.numeric(format(mdy_hms(extract$BGN_DATE), "%Y"))
## Define function to calculate damage $ values
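dollarize <- function(input, column1, column2) {
  x <- input[, column1]  ## numeric damage value
  y <- input[, column2]  ## scale code for that value
  symb <- c("k", "K", "m", "M", "b", "B")
  mult <- c(1000, 1000, 1000000, 1000000, 1000000000, 1000000000)
  ## Scale codes not listed in symb produce NA here
  x * mult[match(y, symb)]
}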
extract$PropDamage <- dollarize(extract, 25, 26) %>% replace_na(0)
extract$CropDamage <- dollarize(extract, 27, 28) %>% replace_na(0)
extract$TotalDamage <- extract$PropDamage + extract$CropDamage
## Subset the data needed for further analysis and clean up column names
data <- extract[,c(7, 8, 23, 24, 38, 39, 40, 41)]
colnames(data) <- c("State","Event_Type","fatalities","injuries", "Year",
"propertydamage","cropdamage","totaldamage")
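The scale lookup described above can be illustrated on a few made-up rows. This is a minimal sketch in base R, not the report's own code: the values are hypothetical, and the column names `PROPDMG` and `PROPDMGEXP` are assumed to correspond to columns 25 and 26 of the extract. It shows that a blank or unrecognised scale code ends up contributing zero damage, which is the behaviour of `dollarize()` followed by `replace_na(0)`.

```r
# Scale codes and multipliers, as in the report's dollarize() function
symb <- c("k", "K", "m", "M", "b", "B")
mult <- c(1e3, 1e3, 1e6, 1e6, 1e9, 1e9)

# Hypothetical rows; column names are assumed, values are invented
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10, 5),
  PROPDMGEXP = c("K", "M", "B", "", "?"),
  stringsAsFactors = FALSE
)

# match() returns NA for codes that are not in symb
scaled <- toy$PROPDMG * mult[match(toy$PROPDMGEXP, symb)]

# Base R equivalent of replace_na(0): unrecognised scales count as zero
scaled[is.na(scaled)] <- 0
toy$PropDamage <- scaled
print(toy)

# Inventory of the scale codes present, useful for spotting unhandled ones
code_counts <- table(toy$PROPDMGEXP)
print(code_counts)
```

Running the same `table()` call on the real scale columns would show how many records fall outside the six recognised codes.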
The public health impacts of severe weather were assessed according to three parameters - by event type, by state, and by year.
total_f <- sum(data$fatalities)
total_i <- sum(data$injuries)
The total numbers of fatalities and injuries in the dataset were 15,145 and 140,528, respectively.
The data was grouped by event type and the top 10 causes of fatalities and injuries are presented below. It is evident that these casualties are highly concentrated within a small set of events and that there is some correlation between the number of fatalities and injuries for each type of severe weather event.
by_event <- data %>% group_by(Event_Type) %>%
summarize(Fatalities = sum(fatalities), Injuries = sum(injuries))
head(arrange(by_event[,1:2], desc(Fatalities)),10) ## Top causes of fatalities
## # A tibble: 10 × 2
## Event_Type Fatalities
## <chr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
head(arrange(by_event[,c(1,3)], desc(Injuries)),10) ## Top causes of injuries
## # A tibble: 10 × 2
## Event_Type Injuries
## <chr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
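The degree of concentration can be quantified directly from the figures already reported. The sketch below is illustrative only: it re-enters the values from the fatalities table and the dataset totals by hand rather than recomputing them from `by_event`, and the variable names are introduced here for the example.

```r
# Fatalities by event type, copied from the top-10 table above
top_fatalities <- c(
  "TORNADO" = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
  "HEAT" = 937, "LIGHTNING" = 816, "TSTM WIND" = 504, "FLOOD" = 470,
  "RIP CURRENT" = 368, "HIGH WIND" = 248, "AVALANCHE" = 224
)

# Dataset totals reported earlier in the text
total_f <- 15145
total_i <- 140528

# Share of all fatalities due to tornadoes, and due to the top ten event types
tornado_fatality_share <- top_fatalities[["TORNADO"]] / total_f
top10_fatality_share <- sum(top_fatalities) / total_f

# Share of all injuries due to tornadoes (91346 is from the injuries table)
tornado_injury_share <- 91346 / total_i

shares <- round(100 * c(
  tornado_fatalities = tornado_fatality_share,
  top10_fatalities   = top10_fatality_share,
  tornado_injuries   = tornado_injury_share
), 1)
print(shares)
```

On these figures, tornadoes account for roughly 37% of fatalities and 65% of injuries, and the ten leading event types together account for roughly 80% of fatalities.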
Similarly, the data was grouped by state, which showed that the casualties are concentrated in a small number of states.
by_state <- data %>% group_by(State) %>%
summarize(Fatalities = sum(fatalities), Injuries = sum(injuries))
head(arrange(by_state[,1:2], desc(Fatalities)),10) ## States with highest fatalities
## # A tibble: 10 × 2
## State Fatalities
## <chr> <dbl>
## 1 IL 1421
## 2 TX 1366
## 3 PA 846
## 4 AL 784
## 5 MO 754
## 6 FL 746
## 7 MS 555
## 8 CA 550
## 9 AR 530
## 10 TN 521
head(arrange(by_state[,c(1,3)], desc(Injuries)),10) ## States with highest injuries
## # A tibble: 10 × 2
## State Injuries
## <chr> <dbl>
## 1 TX 17667
## 2 MO 8998
## 3 AL 8742
## 4 OH 7112
## 5 MS 6675
## 6 FL 5918
## 7 OK 5710
## 8 IL 5563
## 9 AR 5550
## 10 TN 5202
Plots of casualties over time are presented below. The number of fatalities caused by severe weather events has increased sharply since the 1990s.
by_year <- data %>% group_by(Year) %>% summarize(Fatalities = sum(fatalities),
Injuries = sum(injuries))
g1 <- ggplot(by_year, aes(x = Year, y = Fatalities, group = 1))
g1 + geom_line() + theme_minimal() +
scale_x_continuous(breaks=c(1950,1960,1970,1980,1990,2000,2010)) +
ggtitle("Fatalities Caused by Severe Weather Events in the USA")
g2 <- ggplot(by_year, aes(x = Year, y = Injuries, group = 1))
g2 + geom_line() + theme_minimal() +
scale_x_continuous(breaks=c(1950,1960,1970,1980,1990,2000,2010)) +
scale_y_continuous(breaks=c(2000, 4000, 6000, 8000, 10000)) +
ggtitle("Injuries Caused by Severe Weather Events in the USA")
As with public health impacts, the economic impacts were examined based on three parameters - by event type, by state, and by year.
total_eco <- round(sum(data$totaldamage)/1000000000,1)
The total economic damage associated with severe weather events was 476.4 billion dollars.
The top causes of property and crop damage are presented below (damage values are in $ millions). It is evident that most of the losses are concentrated in a few event types.
eco_event <- data %>% group_by(Event_Type) %>%
summarize(Property_Damage = sum(propertydamage)/1000000,
Crop_Damage = sum(cropdamage)/1000000)
## Top causes of property damage
head(arrange(eco_event[,1:2], desc(Property_Damage)),10)
## # A tibble: 10 × 2
## Event_Type Property_Damage
## <chr> <dbl>
## 1 FLOOD 144658.
## 2 HURRICANE/TYPHOON 69306.
## 3 TORNADO 56937.
## 4 STORM SURGE 43324.
## 5 FLASH FLOOD 16141.
## 6 HAIL 15732.
## 7 HURRICANE 11868.
## 8 TROPICAL STORM 7704.
## 9 WINTER STORM 6688.
## 10 HIGH WIND 5270.
## Top causes of crop damage
head(arrange(eco_event[,c(1,3)], desc(Crop_Damage)),10)
## # A tibble: 10 × 2
## Event_Type Crop_Damage
## <chr> <dbl>
## 1 DROUGHT 13973.
## 2 FLOOD 5662.
## 3 RIVER FLOOD 5029.
## 4 ICE STORM 5022.
## 5 HAIL 3026.
## 6 HURRICANE 2742.
## 7 HURRICANE/TYPHOON 2608.
## 8 FLASH FLOOD 1421.
## 9 EXTREME COLD 1293.
## 10 FROST/FREEZE 1094.
The states with the highest losses can be seen in the following table, with just a few states accounting for a large share of the total losses (damage values are in $ millions).
eco_state <- data %>% group_by(State) %>%
summarize(Property_Damage = sum(propertydamage)/1000000,
Crop_Damage = sum(cropdamage)/1000000)
## States with highest property damage
head(arrange(eco_state[,1:2], desc(Property_Damage)),10)
## # A tibble: 10 × 2
## State Property_Damage
## <chr> <dbl>
## 1 CA 123588.
## 2 LA 60074.
## 3 FL 41510.
## 4 MS 29810.
## 5 TX 26642.
## 6 AL 17241.
## 7 IL 8537.
## 8 NC 8230.
## 9 MO 7179.
## 10 OH 6856.
## States with highest crop damage
head(arrange(eco_state[,c(1,3)], desc(Crop_Damage)),10)
## # A tibble: 10 × 2
## State Crop_Damage
## <chr> <dbl>
## 1 TX 7301.
## 2 MS 6610.
## 3 IL 5551.
## 4 IA 4697.
## 5 FL 3903.
## 6 CA 3528.
## 7 NE 2171.
## 8 NC 2054.
## 9 LA 1229.
## 10 OK 1207.
As can be seen in the following plot, the economic damage associated with severe weather events has increased dramatically since the 1990s. In particular, 2005 and 2006 had extremely large damages.
eco_year <- data %>% group_by(Year) %>%
summarize(Total_Damage = sum(totaldamage)/1000000)
g3 <- ggplot(eco_year, aes(x= Year, y = Total_Damage))
g3 + geom_line() + theme_minimal() +
scale_x_continuous(breaks=c(1950,1960,1970,1980,1990,2000,2010)) +
ylab("Total Economic Damage (million $)") +
ggtitle("Economic Damages from Severe Weather Events in the USA")
head(arrange(eco_year, desc(Total_Damage)),5)
## # A tibble: 5 × 2
## Year Total_Damage
## <dbl> <dbl>
## 1 2006 125472.
## 2 2005 100825.
## 3 2004 26799.
## 4 1993 21987.
## 5 2011 21556.
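To put the 2005 and 2006 figures in context, their combined share of the overall total can be worked out from the numbers already shown. This is a small illustrative calculation, assuming the rounded total of 476.4 billion dollars reported earlier; the variable names are introduced here for the example.

```r
# Top five years by total damage (in $ millions), copied from the table above
top_years <- data.frame(
  Year         = c(2006, 2005, 2004, 1993, 2011),
  Total_Damage = c(125472, 100825, 26799, 21987, 21556)
)

# Overall total reported in the text, converted from $ billions to $ millions
total_eco_millions <- 476.4 * 1000

# Combined damage for 2005 and 2006 and its share of the overall total
peak_damage <- sum(top_years$Total_Damage[top_years$Year %in% c(2005, 2006)])
peak_share <- peak_damage / total_eco_millions

print(round(100 * peak_share, 1))
```

On these figures, the two years together account for a little under half of all recorded economic damage between 1950 and 2011.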
Note to reader: The above analysis can easily be replicated for narrower geographical scopes (e.g., at state or county level). Please contact the author of the report to request bespoke analysis.
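As a rough indication of what a state-level version might look like, the sketch below filters a data frame to one state and totals the impacts by event type. It is an assumption-laden illustration rather than the author's method: `summarize_state()` is a hypothetical helper, the rows in `toy` are invented stand-ins for the cleaned `data` frame, and base R `aggregate()` is used in place of the report's dplyr pipeline so that the example runs without extra packages.

```r
# Invented rows using the same column names as the cleaned data frame
toy <- data.frame(
  State       = c("TX", "TX", "TX", "FL"),
  Event_Type  = c("TORNADO", "FLOOD", "TORNADO", "HURRICANE"),
  fatalities  = c(2, 1, 3, 4),
  injuries    = c(10, 0, 5, 7),
  totaldamage = c(1e6, 5e5, 2e6, 9e6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: totals by event type for a single state,
# sorted so the deadliest event type comes first
summarize_state <- function(df, state) {
  state_rows <- df[df$State == state, ]
  out <- aggregate(cbind(fatalities, injuries, totaldamage) ~ Event_Type,
                   data = state_rows, FUN = sum)
  out[order(-out$fatalities), ]
}

tx <- summarize_state(toy, "TX")
print(tx)
```

A county-level version would follow the same pattern, provided the county column is retained when the data is subset.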