This report summarizes the effects of extreme weather on population health and the economy in the US, using data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We use total fatalities and total injuries as our measures of population health, and the sum of property damage and crop damage as our measure of the economic consequences of weather events. Because the data come from a national database in which similar weather events are recorded under redundant codes, and those event codes were not cleaned or transformed, this analysis should be considered exploratory; it would benefit from a systematic re-coding of the distinct weather events in the data. Using the original codes, we found that tornadoes are the most dangerous weather event in the US, in terms of both injuries and fatalities. We also found that floods are the most costly weather event, and that water appears to be a consistent factor in the most costly weather events in the US.
First, set global options and load the needed packages.
library(R.utils); library(ggplot2); library(plyr); library(dplyr)
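knitr::opts_chunk$set(echo = TRUE)  # show code in the rendered report
options(scipen = 2)                 # prefer fixed over scientific notation when printing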
Next, download the data and read them into R (warning: this may take a few minutes).
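setwd("~/R/Coursera/ReproResearch/project2")  # adjust to your own project directory
if (!file.exists("stormData.csv")) {
  download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "stormData.csv.bz2", mode = "wb")  # binary mode for the compressed file
  bunzip2("stormData.csv.bz2")  # from R.utils; writes stormData.csv
}
## Only read the (large) CSV if the data frame is not already in the workspace
if (!"data" %in% ls()) {
  data <- read.csv("stormData.csv")
}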
Note: Although the data set contains several duplicate event codes (e.g., “HURRICANE/TYPHOON” and “HURRICANE”), no cleaning was performed to aggregate similar events. I am not familiar enough with all of the event types to reliably determine which ones should be merged, so I have left those codes untouched. Please interpret the results of this analysis accordingly.
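No re-coding is applied in this report, but a first step toward one could be as light as normalizing case and whitespace and counting how many raw codes collapse together. The sketch below uses a small made-up vector in place of `data$EVTYPE`; the hypothetical helper `normalize_evtype` and its rules (upper-casing, trimming, squeezing repeated spaces) are assumptions. Such rules would not merge semantically similar codes like “HURRICANE/TYPHOON” and “HURRICANE”, which would still need a hand-built mapping.

```r
# Made-up event codes standing in for data$EVTYPE (not real NOAA records)
evtype <- c("TSTM WIND", "tstm wind", " TSTM WIND", "THUNDERSTORM WIND",
            "HURRICANE", "HURRICANE/TYPHOON", "Flood", "FLOOD")

# Hypothetical normalization: upper-case, trim the ends, squeeze internal spaces
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  gsub("[[:space:]]+", " ", x)
}

clean <- normalize_evtype(evtype)
length(unique(evtype))  # 8 distinct raw codes
length(unique(clean))   # 5 distinct codes after normalization
```

Note that “TSTM WIND” and “THUNDERSTORM WIND” remain separate after this step; only a deliberate mapping would join them.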
To explore the impact of severe weather on population health, let’s sum the total number of fatalities and injuries for each event type.
total_health <- ddply(data, .(EVTYPE), plyr::summarise, sum_fatal = sum(FATALITIES), sum_injury = sum(INJURIES))  # plyr:: because dplyr masks summarise
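For readers without plyr, the same per-event-type totals can be sketched in base R with `aggregate()`. The data frame below is a made-up stand-in for the NOAA records (the numbers are invented); only the column names `EVTYPE`, `FATALITIES`, and `INJURIES` come from the report.

```r
# Toy storm records (made-up numbers, not NOAA data)
storms <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 4, 1, 0)
)

# Sum fatalities and injuries within each event type
total_health_toy <- aggregate(
  storms[, c("FATALITIES", "INJURIES")],
  by  = list(EVTYPE = storms$EVTYPE),
  FUN = sum
)
names(total_health_toy) <- c("EVTYPE", "sum_fatal", "sum_injury")
total_health_toy
```

`aggregate()` returns one row per event type, sorted by `EVTYPE`, which mirrors the shape of `total_health` above.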
To explore the economic consequences of severe weather, let’s sum the total property damage and crop damage for each event type, and also create a variable combining the two damage types. This requires some pre-processing, because each value in the “PROPDMG”/“CROPDMG” columns must be multiplied by the magnitude encoded in the corresponding “PROPDMGEXP”/“CROPDMGEXP” column (H/h = 100, K/k = 1,000, M/m = 1,000,000, B/b = 1,000,000,000). Damage amounts with any other code in these columns are dropped from our analysis.
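The exponent rule amounts to a lookup table. A minimal sketch, assuming the four letter codes above are the only ones honored and that any other code (blank, “?”, “+”, digits) counts as $0, which has the same effect on the totals as dropping that amount; `to_dollars` is a hypothetical helper for illustration, not part of the report’s code.

```r
# Lookup table for the exponent codes described above (H, K, M, B)
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert value/exponent pairs to dollars.
# Codes are upper-cased first; unrecognized codes contribute $0.
to_dollars <- function(value, exp) {
  exp  <- toupper(as.character(exp))
  mult <- ifelse(exp %in% names(multiplier), multiplier[exp], 0)
  value * mult
}

to_dollars(c(2.5, 1, 3, 7), c("K", "m", "?", ""))  # c(2500, 1e6, 0, 0)
```

Using a named vector as the lookup keeps the mapping in one place, so adding or removing a code is a one-line change.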
## Subset needed data
damage_data <- data[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
damage_data$PROPDMGEXP <- toupper(damage_data$PROPDMGEXP)
damage_data$CROPDMGEXP <- toupper(damage_data$CROPDMGEXP)
## Keep rows where at least one damage column carries a recognized exponent code
valid_exp <- c("H", "K", "M", "B")
damage_data <- filter(damage_data, PROPDMGEXP %in% valid_exp | CROPDMGEXP %in% valid_exp)
## Replace letter exponent with numeric multiplier; any other code (blank, "?", "+", digits) contributes $0
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
damage_data$PROPDMGEXP <- ifelse(damage_data$PROPDMGEXP %in% names(multiplier),
                                 multiplier[damage_data$PROPDMGEXP], 0)
damage_data$CROPDMGEXP <- ifelse(damage_data$CROPDMGEXP %in% names(multiplier),
                                 multiplier[damage_data$CROPDMGEXP], 0)
## Multiply value by multiplier to get dollars for each damage type, then sum the two
damage_data <- mutate(damage_data,
                      PROP_dollars   = as.numeric(PROPDMG) * PROPDMGEXP,
                      CROP_dollars   = as.numeric(CROPDMG) * CROPDMGEXP,
                      DAMAGE_dollars = PROP_dollars + CROP_dollars)
## Aggregate total damage for each event type, then re-order by total damage in dollars
total_damage <- ddply(damage_data, .(EVTYPE), plyr::summarise, sum_prop = sum(PROP_dollars), sum_crop = sum(CROP_dollars), sum_total = sum(DAMAGE_dollars))
total_damage <- total_damage[order(total_damage$sum_total,decreasing=TRUE),]
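Before relying on these totals, it can help to see how many records carry an exponent code outside H/K/M/B, since those amounts do not contribute to the sums. A sketch on a made-up vector standing in for `data$PROPDMGEXP` (the codes are invented for illustration); on the real column you would apply the same two lines.

```r
# Made-up exponent codes standing in for data$PROPDMGEXP
exp_codes <- c("K", "K", "M", "", "B", "?", "+", "5", "k", "h")

# Which codes the lookup honors (case-insensitive)
recognized <- toupper(exp_codes) %in% c("H", "K", "M", "B")
table(recognized)                      # honored vs. not honored
sort(unique(exp_codes[!recognized]))   # codes that fall outside the lookup
```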
As noted above, no transformations were performed to clean up the event type codes, so please interpret the following results with caution.
IMPACT OF SEVERE WEATHER ON POPULATION HEALTH
Below are plots of the 10 event types causing the most injuries and the most fatalities, respectively.
injury <- head(total_health[order(total_health$sum_injury,decreasing=TRUE),],10)
rownames(injury) <- NULL
injury$EVTYPE_order <- reorder(injury$EVTYPE,injury$sum_injury,decreasing=TRUE)
ggplot(injury, aes(EVTYPE_order,sum_injury))+geom_bar(stat="identity")+ coord_flip() +
labs(y="Total Injuries") + labs(x="Event Type Code") +
labs(title="Most Dangerous Weather Events in the US - Injuries")
fatal <- head(total_health[order(total_health$sum_fatal,decreasing=TRUE),],10)
rownames(fatal) <- NULL
fatal$EVTYPE_order <- reorder(fatal$EVTYPE,fatal$sum_fatal,decreasing=TRUE)
ggplot(fatal, aes(x=EVTYPE_order,y=sum_fatal)) + geom_bar(stat="identity") +
coord_flip() + labs(y="Total Fatalities") + labs(x="Event Type Code") +
labs(title="Most Dangerous Weather Events in the US - Fatalities")
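The three plots repeat the same “top 10, then reorder” pattern. A sketch of that pattern as one function, using a hypothetical helper `top_n_events` and made-up totals (the event names and numbers are invented). In a horizontal bar chart built with `coord_flip()`, ggplot2 draws the first factor level nearest the origin (at the bottom), so ordering levels by increasing value, as `reorder()` does by default, puts the largest bar at the top.

```r
# Toy per-event totals (made-up numbers, not NOAA data)
totals <- data.frame(
  EVTYPE    = c("TYPE_A", "TYPE_B", "TYPE_C", "TYPE_D"),
  sum_fatal = c(5, 40, 12, 25)
)

# Hypothetical helper: keep the n largest rows by `col`, largest first
top_n_events <- function(df, col, n = 10) {
  out <- head(df[order(df[[col]], decreasing = TRUE), ], n)
  rownames(out) <- NULL
  # Factor levels ordered by increasing value, so bars are sorted in the plot
  out$EVTYPE_order <- reorder(factor(as.character(out$EVTYPE)), out[[col]])
  out
}

top3 <- top_n_events(totals, "sum_fatal", n = 3)
as.character(top3$EVTYPE)   # "TYPE_B" "TYPE_D" "TYPE_C"
levels(top3$EVTYPE_order)   # "TYPE_C" "TYPE_D" "TYPE_B"
```

Wrapping `EVTYPE` in `factor(as.character(...))` drops the hundreds of unused levels a factor column read by `read.csv()` may carry.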
As seen above, tornadoes are by far the most dangerous weather events in the US, in terms of both injuries and fatalities. There is heavy overlap between the 10 most injurious and the 10 most fatal weather event types in the US.
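The overlap between the two top-10 lists can be quantified directly by intersecting them. A sketch, with made-up vectors standing in for `injury$EVTYPE` and `fatal$EVTYPE` (the names are placeholders, not results); on the real objects, wrap each in `as.character()` first if `EVTYPE` was read as a factor.

```r
# Made-up top-event lists standing in for injury$EVTYPE and fatal$EVTYPE
top_injury <- c("TYPE_A", "TYPE_B", "TYPE_C", "TYPE_D", "TYPE_E")
top_fatal  <- c("TYPE_A", "TYPE_D", "TYPE_F", "TYPE_C", "TYPE_G")

shared <- intersect(top_injury, top_fatal)
shared                               # event types on both lists
length(shared) / length(top_injury)  # fraction of the list that overlaps (0.6)
```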
ECONOMIC CONSEQUENCES OF SEVERE WEATHER
Below is a plot of the total economic damage for the 10 most costly extreme weather event types in the US, combining property damage and crop damage.
econ <- head(total_damage[order(total_damage$sum_total,decreasing=TRUE),],10)
rownames(econ) <- NULL
econ$EVTYPE_order <- reorder(econ$EVTYPE,econ$sum_total,decreasing=TRUE)
ggplot(econ,aes(x=EVTYPE_order,y=sum_total)) + geom_bar(stat="identity") +
coord_flip() + labs(y="Total Damage ($)") + labs(x="Event Type Code") +
labs(title="Most Costly Weather Events in the US")
Floods are reported as the most costly extreme weather event in the US, with hurricane/typhoon second and storm surge third. All three involve water, which suggests the high potential for damage caused by water.
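One way to put a number on the role of water is to compute the share of total damage from event types whose code mentions a water-related keyword. A sketch under loud assumptions: the damage figures are made up (not NOAA totals), the column names follow `total_damage`, and the keyword pattern `water_pattern` is a hypothetical choice that will both miss some codes and catch others a re-coding might classify differently.

```r
# Made-up damage totals (not NOAA figures); column names follow total_damage
damage <- data.frame(
  EVTYPE    = c("FLOOD", "HURRICANE/TYPHOON", "STORM SURGE", "TORNADO", "HAIL"),
  sum_total = c(50, 30, 10, 8, 2)
)

# Hypothetical keyword list for "water-related" codes; extend as needed
water_pattern <- "FLOOD|HURRICANE|TYPHOON|STORM SURGE|TSUNAMI|RAIN"
is_water <- grepl(water_pattern, toupper(damage$EVTYPE))

sum(damage$sum_total[is_water]) / sum(damage$sum_total)  # share of damage (0.9 here)
```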