To determine the primary impacts of various weather events on US health and economic wellbeing, we analyze storm data from the NOAA National Climatic Data Center. The dataset covers the years 1950 to 2011. However, only the records from 1996 to 2011 include all 48 recorded event types, so our analysis covers only those years.
The data include, for each storm event type, counts of injuries and fatalities and estimates of property and crop damage. We compare totals of these values over the entire time period to determine the top ten weather event types for each impact. We then group these totals by month to plot the seasonal occurrence of these events. Seeing the most significant weather events and their impacts over the course of the year can inform decisions about allocating resources to manage those impacts.
Across the United States from 1996 to 2011, the types of events most associated with adverse impacts to population health are tornadoes, heat, flooding, lightning, and thunderstorm winds. Floods, hurricanes, typhoons, storm surge, tornadoes, and hail have the greatest economic consequences. Other important events are rip currents, high winds, ice storms, avalanches, and drought.
Download the compressed CSV file and load it into a data.table. Remove extra columns, format the date columns, add a Month column, filter to include only data from the fifty states of the US, and convert property and crop damage values to US dollars.
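The date handling described above (parse the begin date, derive a month, keep only 1996 onward) can be sketched on a few date strings. This is a minimal base R illustration, not the report's own code, which uses lubridate; the three date strings are hypothetical and only mimic the month/day/year layout of the raw file.

```r
# Hypothetical begin dates in the same month/day/year layout as the raw file.
raw_dates <- c("4/18/1950 0:00:00", "1/1/1996 0:00:00", "11/28/2011 0:00:00")

# Parse the character strings into date-times.
parsed <- as.POSIXct(raw_dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Derive the year and month as integers.
event_year  <- as.integer(format(parsed, "%Y"))
event_month <- as.integer(format(parsed, "%m"))

# Keep only events from 1996 onward by comparing on the year.
keep <- event_year >= 1996
keep  # FALSE TRUE TRUE
```

Comparing on the extracted year avoids mixing date and date-time classes in a single comparison.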
# Download the data file (if missing).
url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
zipfile <- "StormData.csv.bz2"
if (!file.exists(zipfile)) {
    # mode = "wb" keeps the compressed file intact on Windows.
    download.file(url, zipfile, method = "auto", mode = "wb")
}
# Load into a data.table (faster and more feature-rich than a data.frame).
library(data.table)
stormdata <- as.data.table(read.csv(bzfile(zipfile), header = TRUE,
stringsAsFactors = FALSE))
# Remove extra columns (to free up memory).
stormdata[, c("STATE__","BGN_TIME","TIME_ZONE","COUNTY","COUNTYNAME",
"BGN_RANGE","BGN_AZI","BGN_LOCATI","END_TIME","COUNTY_END",
"COUNTYENDN","END_RANGE","END_AZI","END_LOCATI","LENGTH",
"WIDTH","F","MAG","WFO","STATEOFFIC","ZONENAMES","LATITUDE",
"LONGITUDE","LATITUDE_E","LONGITUDE_","REMARKS","REFNUM") := NULL]
# Prepare date columns. (Convert from character variable type to date type.)
suppressMessages(library(lubridate))
stormdata[, c("BGN_DATE", "END_DATE")
:= list(mdy_hms(BGN_DATE), mdy_hms(END_DATE))]
stormdata[, "Month" := as.factor(month(BGN_DATE))]
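# Filter storm data to include only the 50 US states and years 1996 to present.
# See: http://www.ncdc.noaa.gov/stormevents/details.jsp
# Filter on the year: BGN_DATE is a date-time (POSIXct), and comparing it
# directly against a Date from mdy() is not a reliable comparison.
library(datasets)
stormdata <- stormdata[STATE %in% state.abb & year(BGN_DATE) >= 1996]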
# Convert damage estimates to US dollars. See section 2.7 "Damage" in this PDF:
# http://www.ncdc.noaa.gov/stormevents/pd01016005curr.pdf
# Convert property damage amounts to USD. (Convert exponent code and multiply.)
stormdata[, PROPDMGEXP := toupper(PROPDMGEXP)]
stormdata[, PROPDMGEXP := sub("^[^KMB]*$", 0, PROPDMGEXP)]
stormdata[, PROPDMGEXP := sub("K", 3, PROPDMGEXP, fixed=TRUE)]
stormdata[, PROPDMGEXP := sub("M", 6, PROPDMGEXP, fixed=TRUE)]
stormdata[, PROPDMGEXP := sub("B", 9, PROPDMGEXP, fixed=TRUE)]
stormdata[, PROPDMG := as.numeric(PROPDMG)*10^as.numeric(PROPDMGEXP)]
# Convert crop damage amounts to USD. (Convert exponent code and multiply.)
stormdata[, CROPDMGEXP := toupper(CROPDMGEXP)]
stormdata[, CROPDMGEXP := sub("^[^KMB]*$", 0, CROPDMGEXP)]
stormdata[, CROPDMGEXP := sub("K", 3, CROPDMGEXP, fixed=TRUE)]
stormdata[, CROPDMGEXP := sub("M", 6, CROPDMGEXP, fixed=TRUE)]
stormdata[, CROPDMGEXP := sub("B", 9, CROPDMGEXP, fixed=TRUE)]
stormdata[, CROPDMG := as.numeric(CROPDMG)*10^as.numeric(CROPDMGEXP)]
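The exponent conversion can also be expressed as a single lookup. The sketch below is an illustration in base R under the same assumption the code above makes: K, M, and B (in either case) mean powers of 3, 6, and 9, and any other code leaves the amount unscaled. The helper name `exp_power` and the sample values are hypothetical.

```r
# Hypothetical helper (not part of the original analysis): map an exponent
# code to a power of ten. Codes other than K, M, or B map to 0 (multiply by 1).
exp_power <- function(code) {
  lookup <- c(K = 3, M = 6, B = 9)
  power <- lookup[toupper(code)]
  power[is.na(power)] <- 0
  unname(power)
}

# Made-up damage amounts and exponent codes.
dmg  <- c(25, 2.5, 1, 10, 7)
code <- c("K", "m", "B", "", "5")

# Damage in US dollars.
usd <- dmg * 10^exp_power(code)
```

Uppercasing inside the helper means a lowercase code such as "m" is scaled the same way as "M".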
Extract the fatality, injury, and month columns, arranged by event type.
health <- stormdata[, .(FATALITIES, INJURIES, Month), by=EVTYPE]
Determine the top ten weather event types for injuries with a sum of all injuries, grouped by event type, for the entire time period.
# Calculate sums by event type, then sort, and find top ten.
injuries <- health[, .(Injuries=sum(INJURIES)), EVTYPE]
setkey(injuries, Injuries)
topteninjuries <- tail(injuries, 10)
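The sort-then-take-the-tail step can be seen on a small table. This is a minimal sketch with made-up event names and counts, assuming data.table is installed; it takes the top two instead of the top ten.

```r
library(data.table)

# Hypothetical toy records; event names and counts are made up.
toy <- data.table(EVTYPE   = c("A", "B", "A", "C", "B", "D"),
                  INJURIES = c(1, 5, 2, 0, 4, 10))

# Total per event type.
totals <- toy[, .(Injuries = sum(INJURIES)), by = EVTYPE]

# setkey() sorts ascending by the key column, so tail() returns the largest
# totals, with the smallest of them first.
setkey(totals, Injuries)
top2 <- tail(totals, 2)

# Largest first, as in the printed tables.
top2_desc <- top2[order(-Injuries)]
```

Because the keyed table is in ascending order, the top entries come out smallest first, which is why the tables are printed with an explicit descending order.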
From the list of top ten weather event types for injuries, calculate injury sums by event type and month.
# Sum by event type and month, convert event type to factor.
injuriesbymonth <- health[EVTYPE %in% topteninjuries$EVTYPE,
.(Injuries=sum(INJURIES)), .(EVTYPE, Month)]
injuriesbymonth[, "EventType" := factor(EVTYPE, levels=rev(topteninjuries$EVTYPE))]
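The reversed levels in the factor conversion matter for plotting: ggplot2 follows the factor level order for a discrete fill legend, so reversing the ascending top-ten list puts the event type with the largest total first. A minimal base R sketch with hypothetical event types:

```r
# Hypothetical event types, ordered by ascending total as setkey() leaves them.
ascending <- c("HAIL", "FLOOD", "TORNADO")

# Event type column as it might appear in the monthly sums.
evtype <- c("FLOOD", "TORNADO", "HAIL", "TORNADO")

# Reversing the ascending order puts the largest total first in the levels.
event_type <- factor(evtype, levels = rev(ascending))
levels(event_type)  # "TORNADO" "FLOOD" "HAIL"
```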
Determine the top ten weather event types for fatalities with a sum of all fatalities, grouped by event type, for the entire time period.
# Calculate sums by event type, then sort, and find top ten.
fatalities <- health[, .(Fatalities=sum(FATALITIES)), EVTYPE]
setkey(fatalities, Fatalities)
toptenfatalities <- tail(fatalities, 10)
From the list of top ten weather event types for fatalities, calculate fatality sums by event type and month.
# Sum by event type and month, convert event type to factor.
fatalitiesbymonth <- health[EVTYPE %in% toptenfatalities$EVTYPE,
.(Fatalities=sum(FATALITIES)), .(EVTYPE, Month)]
fatalitiesbymonth[, "EventType" := factor(EVTYPE, levels=rev(toptenfatalities$EVTYPE))]
Group property and crop damage by event type.
economic <- stormdata[, .(PROPDMG, CROPDMG, Month), by=EVTYPE]
Determine the top ten weather event types for combined property and crop damage, using the sum of all damage in millions of US dollars, grouped by event type, for the entire time period.
# Calculate sums by event type, then sort, and find top ten.
damage <- economic[, .(Damage=(sum(PROPDMG, na.rm=TRUE) +
                               sum(CROPDMG, na.rm=TRUE))/1e6), EVTYPE]
setkey(damage, Damage)
toptendamage <- tail(damage, 10)
From the list of top ten weather event types for crop and property damage, calculate damage sums by event type and month. Convert damage sums to millions of US dollars.
# Sum by event type and month, convert event type to factor.
damagebymonth <- economic[EVTYPE %in% toptendamage$EVTYPE,
                          .(Damage=(sum(PROPDMG, na.rm=TRUE) +
                                    sum(CROPDMG, na.rm=TRUE))/1e6),
.(EVTYPE, Month)]
damagebymonth[, "EventType" := factor(EVTYPE, levels=rev(toptendamage$EVTYPE))]
The following three plots show the ten storm event types with the greatest impact on population health and on the economy.
# Use a color-blind-friendly palette for plotting 10 colors.
# See: http://www.cookbook-r.com/Graphs/Colors_%28ggplot2%29/
cbPalette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
"#0072B2", "#D55E00", "#CC79A7", "#996633",
"#999999", "#000000")
This plot shows the top ten weather event types for injuries.
library(ggplot2)
ggplot(injuriesbymonth,
aes(Month, Injuries, group=EventType, fill=EventType)) +
geom_bar(stat="identity") +
ggtitle("Public Health Impact of\nTop 10 Most Harmful Weather Event Types\n(Total US Injuries per Month, 1996-2011)") +
ylab("Number of Injuries") +
theme(plot.title = element_text(lineheight=.8, face="bold")) +
scale_fill_manual(values=cbPalette) +
theme(legend.title=element_blank())
Here is a table showing the total number of injuries from 1996 to 2011 for the top ten extreme weather types.
topteninjuries[order(-Injuries)]
## EVTYPE Injuries
## 1: TORNADO 91346
## 2: TSTM WIND 6941
## 3: FLOOD 6786
## 4: EXCESSIVE HEAT 6209
## 5: LIGHTNING 5212
## 6: HEAT 2100
## 7: ICE STORM 1975
## 8: FLASH FLOOD 1767
## 9: THUNDERSTORM WIND 1471
## 10: HAIL 1361
This plot shows the top ten weather event types for fatalities.
library(ggplot2)
ggplot(fatalitiesbymonth,
aes(Month, Fatalities, group=EventType, fill=EventType)) +
geom_bar(stat="identity") +
ggtitle("Public Health Impact of\nTop 10 Most Harmful Weather Event Types\n(Total US Fatalities per Month, 1996-2011)") +
ylab("Number of Fatalities") +
theme(plot.title = element_text(lineheight=.8, face="bold")) +
scale_fill_manual(values=cbPalette) +
theme(legend.title=element_blank())
Here is a table showing the total number of fatalities from 1996 to 2011 for the top ten extreme weather types.
toptenfatalities[order(-Fatalities)]
## EVTYPE Fatalities
## 1: TORNADO 5633
## 2: EXCESSIVE HEAT 1883
## 3: FLASH FLOOD 939
## 4: HEAT 935
## 5: LIGHTNING 806
## 6: TSTM WIND 502
## 7: FLOOD 464
## 8: RIP CURRENT 343
## 9: HIGH WIND 248
## 10: AVALANCHE 224
Across the United States, tornadoes are associated with the highest number of injuries among the weather events recorded from 1996 to 2011. Tornadoes occur predominantly in the first half of the year. Heat, particularly in midsummer, appears the most deadly. Floods, especially in October, as well as flash floods, lightning, and thunderstorm winds in the summer months, also have a high impact on population health. Other important events are rip currents, high winds, ice storms, and avalanches.
This plot shows the top ten weather event types for economic impact.
library(ggplot2)
ggplot(damagebymonth,
aes(Month, Damage, group=EventType, fill=EventType)) +
geom_bar(stat="identity") +
ggtitle("Economic Impact of\nTop 10 Most Harmful Weather Event Types\n(Total US Damage per Month, 1996-2011)") +
ylab("Damage in Millions of US$") +
theme(plot.title = element_text(lineheight=.8, face="bold")) +
scale_fill_manual(values=cbPalette) +
theme(legend.title=element_blank())
Here is a table showing the total damage in millions of US dollars from 1996 to 2011 for the top ten extreme weather types.
toptendamage[order(-Damage)]
## EVTYPE Damage
## 1: FLOOD 150145.287
## 2: HURRICANE/TYPHOON 71636.601
## 3: TORNADO 57351.641
## 4: STORM SURGE 43323.466
## 5: HAIL 18758.215
## 6: FLASH FLOOD 17275.076
## 7: DROUGHT 15013.467
## 8: HURRICANE 12103.928
## 9: RIVER FLOOD 10148.401
## 10: ICE STORM 8967.021
Across the United States, flooding, especially in the winter and spring, causes the most economic damage, followed by hurricanes, typhoons, storm surge, and tornadoes. Hail also causes significant damage, mostly in the spring and fall. Other important events are flash floods, river floods, drought, and ice storms.