In this report we examine data on the economic and health impacts of storms and other severe weather events in the United States from 1996 to 2011. For health impacts, we look at two variables: the number of injuries and the number of fatalities. We define casualties as the sum of these two variables. For economic impacts, we focus on property damage and crop damage, both measured in dollars. Total damage is the combined cost of property and crop damage.
The data for this report come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. It was downloaded from the website of the Coursera course Reproducible Research as a .csv file compressed with the bzip2 algorithm to reduce its size. To reduce memory use and loading time, we read the file with the fread() function from the data.table package, which lets us specify which columns to load (“colsToKeep”) before reading the file. The loading step is also cached to speed up processing.
# load packages used in the analysis (data.table also provides year())
library(data.table)
library(dplyr)
library(ggplot2)
# select columns from dataset to read into r
colsToKeep <- c("BGN_DATE", "COUNTYNAME", "STATE", "EVTYPE", "FATALITIES", "INJURIES",
                "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
# read in data set using fread() function
weather_events <- fread("repdata-data-StormData.csv", header = TRUE, select = colsToKeep, stringsAsFactors = FALSE)
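As a minimal sketch of column selection at read time, the following writes a tiny made-up CSV to a temporary file and reads back only two of its columns with fread(). The file contents and the name `toy_path` are illustrative assumptions and are not part of the NOAA data.

```r
library(data.table)

# Toy stand-in for the storm file (values are invented for illustration)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,STATE,FATALITIES,REMARKS",
             "TORNADO,AL,0,none",
             "FLOOD,TX,2,none"), toy_path)

# Only the columns named in `select` are loaded into memory
toy <- fread(toy_path, header = TRUE, select = c("EVTYPE", "FATALITIES"))
print(names(toy))
```

The STATE and REMARKS columns are never materialized, which is where the savings in memory and loading time would come from on a large file.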
After reading in the dataset, we check its dimensions and look at the first several rows.
dim(weather_events)
## [1] 902297 10
head(weather_events)
##              BGN_DATE COUNTYNAME STATE  EVTYPE FATALITIES INJURIES PROPDMG
## 1:  4/18/1950 0:00:00     MOBILE    AL TORNADO          0       15    25.0
## 2:  4/18/1950 0:00:00    BALDWIN    AL TORNADO          0        0     2.5
## 3:  2/20/1951 0:00:00    FAYETTE    AL TORNADO          0        2    25.0
## 4:   6/8/1951 0:00:00    MADISON    AL TORNADO          0        2     2.5
## 5: 11/15/1951 0:00:00    CULLMAN    AL TORNADO          0        2     2.5
## 6: 11/15/1951 0:00:00 LAUDERDALE    AL TORNADO          0        6     2.5
##    PROPDMGEXP CROPDMG CROPDMGEXP
## 1:          K       0
## 2:          K       0
## 3:          K       0
## 4:          K       0
## 5:          K       0
## 6:          K       0
NOAA began recording data for all 48 event types in January 1996, so we subset the dataset to include only events from that date onwards. We then group the data by event type and year and summarize the health and economic variables within each group. As additional processing steps, we convert the event type field (“EVTYPE”) to uppercase to standardize it and eliminate duplicates caused by inconsistent capitalization, and we fold “HURRICANE” into “HURRICANE/TYPHOON”. We also translate the magnitude codes (e.g. “M” for millions) in PROPDMGEXP and CROPDMGEXP into numeric multipliers, which we multiply against the PROPDMG and CROPDMG fields to get standardized dollar estimates of property and crop damage.
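The date subsetting described above can be sketched in base R as follows. The four date strings are invented and only mimic the layout of the BGN_DATE field; the names `bgn`, `cutoff`, and `kept` are illustrative.

```r
# Toy dates in the same "m/d/Y H:M:S" layout as BGN_DATE (values are illustrative)
bgn <- c("4/18/1950 0:00:00", "12/31/1995 0:00:00",
         "1/1/1996 0:00:00", "6/8/2011 0:00:00")

# as.Date() parses the leading date and ignores the trailing time
dates <- as.Date(bgn, format = "%m/%d/%Y")

# Compare against a Date built from an unambiguous ISO string
cutoff <- as.Date("1996-01-01")
kept <- dates[dates >= cutoff]
print(kept)
```

Building the cutoff from an ISO string matters here: without a format argument, as.Date() tries only the year-first layouts "%Y-%m-%d" and "%Y/%m/%d", so a month-first string such as "01/01/1996" would not be read as 1 January 1996.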
# summarize data on health and economic impacts by event type and year
# map damage exponent codes to dollar multipliers; recode() matches literal values
# only (no regex), so digit codes, "-", "?" and blanks fall through to .default = 0
exp_multiplier <- function(x) {
  recode(toupper(x), "K" = 1e3, "M" = 1e6, "B" = 1e9, "H" = 100, "+" = 1, .default = 0)
}
event_impacts <- weather_events %>%
  mutate(date = as.Date(BGN_DATE, "%m/%d/%Y"),
         year = year(date),
         EVTYPE = toupper(EVTYPE), # convert to uppercase for standardization
         EVTYPE = ifelse(EVTYPE == "HURRICANE", "HURRICANE/TYPHOON", EVTYPE),
         PROPDMGEXP = exp_multiplier(PROPDMGEXP),
         PROPDMGCOST = PROPDMG * PROPDMGEXP,
         CROPDMGEXP = exp_multiplier(CROPDMGEXP),
         CROPDMGCOST = CROPDMG * CROPDMGEXP,
         TOTALDMGCOST = PROPDMGCOST + CROPDMGCOST) %>%
  # compare against a Date; a "01/01/1996" string is not parsed as month/day/year
  filter(date >= as.Date("1996-01-01")) %>%
  group_by(EVTYPE, year) %>%
  summarise(count = n(), fatalities = sum(FATALITIES), injuries = sum(INJURIES),
            casualties = sum(FATALITIES) + sum(INJURIES), cropdmg = sum(CROPDMGCOST),
            propdmg = sum(PROPDMGCOST), totaldmg = sum(TOTALDMGCOST))
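To make the exponent conversion concrete, here is a small base R sketch that uses a named lookup vector instead of recode(). The multiplier table, the toy damage figures, and the variable names are assumptions chosen for illustration; unrecognized codes are mapped to 0 here.

```r
# Assumed multiplier table for the exponent codes; any other code maps to 0
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Toy damage figures and exponent codes (invented for illustration)
dmg <- c(25, 2.5, 1, 10, 3)
exp_codes <- c("K", "m", "B", "", "?")

# Look up each code by name; unmatched codes come back as NA and are set to 0
mult <- unname(multipliers[toupper(exp_codes)])
mult[is.na(mult)] <- 0

cost <- dmg * mult
print(cost)
```

In this toy input, 25 with code "K" becomes 25,000 dollars and 2.5 with code "m" becomes 2.5 million dollars, while the rows with a blank or "?" code contribute nothing.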
The health impact variables (injuries, fatalities, casualties) are all highly right-skewed: for each, the median is zero while the mean is far larger.
summary(event_impacts$injuries)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 62.07 3.00 6824.00
summary(event_impacts$fatalities)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 6.689 1.000 687.000
summary(event_impacts$casualties)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 68.76 5.00 7190.00
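A quick way to see this kind of skew is to compare the mean with the median and to check the share of zeros. The sketch below uses an invented vector, not values from the storm data.

```r
# Toy yearly injury totals: mostly zeros with one extreme value (invented data)
injuries <- c(0, 0, 0, 0, 1, 2, 3, 500)

avg <- mean(injuries)              # pulled upward by the single large value
mid <- median(injuries)            # stays near the bulk of the data
zero_share <- mean(injuries == 0)  # fraction of observations equal to zero

print(c(mean = avg, median = mid, zero_share = zero_share))
```

A mean far above the median, together with a large share of zeros, is the same pattern the summaries above show.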
As illustrated in the following chart, tornadoes caused the most injuries, followed by floods, excessive heat, lightning, and thunderstorm wind.
most_injuries <- event_impacts %>%
group_by(EVTYPE) %>%
summarize(injuries = sum(injuries)) %>%
arrange(-injuries) %>%
mutate(rank = rank(-injuries, ties.method = "first")) %>%
filter(rank <= 5)
theme_set(theme_bw())
ggplot(most_injuries, aes(x=reorder(EVTYPE, injuries), y=injuries)) +
geom_bar(stat = "identity", width=.5, fill="tomato3") +
labs(title="Injuries by Weather Event",
subtitle="Top 5 by Number of Injuries, 1996-2011",
x = "Weather Event Types",
y = "Total Injuries") +
theme(axis.text.x = element_text(angle=65, vjust=0.6))
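The same top-5 selection recurs for each impact variable in this report, so it could be wrapped in a small helper. The sketch below is one possible way to do that with dplyr; `top_events()` is a hypothetical function introduced here for illustration, and the toy data frame is invented.

```r
library(dplyr)

# Hypothetical helper: total one impact column per event type and keep the n largest
top_events <- function(data, value_col, n = 5) {
  data %>%
    group_by(EVTYPE) %>%
    summarise(total = sum(.data[[value_col]]), .groups = "drop") %>%
    slice_max(total, n = n, with_ties = FALSE)
}

# Toy event/year table (invented values)
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "HAIL", "FLOOD", "LIGHTNING"),
  year = c(1996, 1997, 1996, 1996, 1997, 1996),
  injuries = c(10, 5, 7, 1, 2, 3)
)

top2 <- top_events(toy, "injuries", n = 2)
print(top2)
```

slice_max() returns rows in descending order of the total, and `with_ties = FALSE` keeps exactly n rows, which plays a similar role to `ties.method = "first"` in the rank-based version.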
Excessive heat caused the most fatalities, followed by tornadoes, flash floods, lightning, and floods.
most_deaths <- event_impacts %>%
group_by(EVTYPE) %>%
summarize(fatalities = sum(fatalities)) %>%
arrange(-fatalities) %>%
mutate(rank = rank(-fatalities, ties.method = "first")) %>%
filter(rank <= 5)
theme_set(theme_bw())
ggplot(most_deaths, aes(x=reorder(EVTYPE, fatalities), y=fatalities)) +
geom_bar(stat = "identity", width=.5, fill="royalblue1") +
labs(title="Fatalities by Weather Event",
subtitle="Top 5 by Number of Fatalities, 1996-2011",
x = "Weather Event Types",
y = "Total Fatalities") +
theme(axis.text.x = element_text(angle=65, vjust=0.6))
The economic impact variables are also skewed in distribution, but not as dramatically as the health impact data.
summary(event_impacts$propdmg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 1.887e+08 6.885e+05 1.165e+11
summary(event_impacts$cropdmg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 2.169e+07 0.000e+00 5.006e+09
summary(event_impacts$totaldmg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 2.000e+03 2.104e+08 1.309e+06 1.166e+11
Floods, hurricanes/typhoons, storm surges, tornadoes, and flash floods are the five event types that caused the most property damage from 1996 to 2011.
most_propdmg <- event_impacts %>%
group_by(EVTYPE) %>%
summarize(propdmg = sum(propdmg)) %>%
arrange(-propdmg) %>%
mutate(rank = rank(-propdmg, ties.method = "first")) %>%
filter(rank <= 5)
theme_set(theme_bw())
ggplot(most_propdmg, aes(x=reorder(EVTYPE, propdmg), y=propdmg)) +
geom_bar(stat = "identity", width=.5, fill="tomato3") +
labs(title="Property Damage Costs by Weather Event",
subtitle="Top 5 by Damage Costs, 1996-2011",
x = "Weather Event Types",
y = "Total Property Damage Cost") +
theme(axis.text.x = element_text(angle=65, vjust=0.6))
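Because damage totals of this size are hard to read as raw dollar amounts on an axis, one option is to label the axis in billions. The sketch below shows this with a hypothetical helper `label_billions()` and invented totals; it is an optional presentation choice, not part of the analysis above.

```r
library(ggplot2)

# Hypothetical helper: format raw dollar amounts as billions for axis labels
label_billions <- function(x) {
  ifelse(is.na(x), NA_character_, sprintf("$%.1fB", x / 1e9))
}

# Toy totals in raw dollars (invented values)
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL", "TORNADO"),
                  propdmg = c(1.5e11, 1.5e10, 2.5e10))

# Same bar chart layout as above, with the y axis labelled in billions
p <- ggplot(toy, aes(x = reorder(EVTYPE, propdmg), y = propdmg)) +
  geom_col(width = 0.5, fill = "tomato3") +
  scale_y_continuous(labels = label_billions) +
  labs(x = "Weather Event Types", y = "Total Property Damage Cost")

print(label_billions(c(0, 2.5e10, 1.5e11)))
```

Passing a function to the `labels` argument leaves the underlying values untouched and changes only how the tick marks are printed.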
Drought, hurricanes/typhoons, floods, hail, and flash floods caused the most crop damage between 1996 and 2011.
most_cropdmg <- event_impacts %>%
group_by(EVTYPE) %>%
summarize(cropdmg = sum(cropdmg)) %>%
arrange(-cropdmg) %>%
mutate(rank = rank(-cropdmg, ties.method = "first")) %>%
filter(rank <= 5)
theme_set(theme_bw())
ggplot(most_cropdmg, aes(x=reorder(EVTYPE, cropdmg), y=cropdmg)) +
geom_bar(stat = "identity", width=.5, fill="tomato3") +
labs(title="Crop Damage Costs by Weather Event",
subtitle="Top 5 by Damage Costs, 1996-2011",
x = "Weather Event Types",
y = "Total Crop Damage Cost") +
theme(axis.text.x = element_text(angle=65, vjust=0.6))
In summary, the bulk of the health and economic impacts of storms and extreme weather events is caused by a small portion of event types. Tornadoes and floods are the only event types that rank in the top 5 for both health and economic impacts. Tornadoes caused the most human casualties (injuries and fatalities) from 1996 to 2011, while floods caused the most economic damage (property and crop damage) over the same period.
most_cas <- event_impacts %>%
group_by(EVTYPE) %>%
summarize(casualties = sum(casualties)) %>%
arrange(-casualties) %>%
mutate(rank = rank(-casualties, ties.method = "first")) %>%
filter(rank <= 5)
theme_set(theme_bw())
ggplot(most_cas, aes(x=reorder(EVTYPE, casualties), y=casualties)) +
geom_bar(stat = "identity", width=.5, fill="tomato3") +
labs(title="Casualties by Weather Event",
subtitle="Top 5 by Number of Casualties, 1996-2011",
x = "Weather Event Types",
y = "Total Casualties") +
theme(axis.text.x = element_text(angle=65, vjust=0.6))
most_dmg <- event_impacts %>%
group_by(EVTYPE) %>%
summarize(totaldmg = sum(totaldmg)) %>%
arrange(-totaldmg) %>%
mutate(rank = rank(-totaldmg, ties.method = "first")) %>%
filter(rank <= 5)
theme_set(theme_bw())
ggplot(most_dmg, aes(x=reorder(EVTYPE, totaldmg), y=totaldmg)) +
geom_bar(stat = "identity", width=.5, fill="tomato3") +
labs(title="Economic Damage by Weather Event",
subtitle="Top 5 by Damage Costs, 1996-2011",
x = "Weather Event Types",
y = "Total Damage Costs") +
theme(axis.text.x = element_text(angle=65, vjust=0.6))
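An overlap between two top-5 lists can be checked with intersect(). As a sketch, the vectors below copy the injury and property damage rankings stated in the prose of this report; the uppercase labels are an assumption made for readability and may not match the exact EVTYPE spellings in the data.

```r
# Top-5 lists as stated in the prose (labels are illustrative spellings)
top_injuries <- c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING", "THUNDERSTORM WIND")
top_propdmg <- c("FLOOD", "HURRICANE/TYPHOON", "STORM SURGE", "TORNADO", "FLASH FLOOD")

# Event types present in both lists, in the order of the first list
in_both <- intersect(top_injuries, top_propdmg)
print(in_both)
```

The same call could be applied to the EVTYPE columns of most_cas and most_dmg to compare the casualty and total damage rankings directly.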