This report analyzes major weather events and their impact on public health and the economy. Specifically, it examines fatalities, injuries, property damage, and crop damage by event type, using National Weather Service data on weather events recorded between 1950 and 2011. The data show that tornados caused the most injuries and fatalities of any weather event and therefore had the greatest impact on population health. Floods, hurricanes, and typhoons had the worst economic consequences, producing the greatest costs in property damage and crop damage.
First, the Storm Data from the National Weather Service is downloaded and read into R as a csv file. Documentation for the data set is available here. The data has 902,297 observations of 37 variables.
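#The download is a bzip2-compressed csv, so keep the .bz2 extension; read.csv decompresses it when reading
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile="./StormData.csv.bz2",method="curl")
data <- read.csv("StormData.csv.bz2")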
dim(data)
## [1] 902297 37
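The read step relies on R detecting bzip2 compression when it opens a file for reading. A minimal sketch of that behavior, using a small made-up table written to a temporary file (the `tmp` path and the two rows are illustrative assumptions, not part of the Storm Data):

```r
# Write a tiny, hypothetical csv through a bzip2 connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(15, 0)),
          con, row.names = FALSE)
close(con)

# read.csv opens the compressed file directly; no manual decompression step
toy_in <- read.csv(tmp)
dim(toy_in)
```

The same call pattern should apply to the downloaded file, although the full data set takes noticeably longer to read.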
Since we are investigating the impact of event types on population health and the economy, we take a subset of the data that includes only the relevant variables.
data <- data[,c("EVTYPE","INJURIES","FATALITIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
head(data)
## EVTYPE INJURIES FATALITIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 15 0 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 2 0 25.0 K 0
## 4 TORNADO 2 0 2.5 K 0
## 5 TORNADO 2 0 2.5 K 0
## 6 TORNADO 6 0 2.5 K 0
The major questions to investigate are 1) which weather events have the greatest impact on population health, and 2) which weather events have the greatest economic consequences.
To determine which event types have the greatest effect on population health, the data was reshaped to sum injuries and fatalities by event type.
library(reshape2)
health_datamelt <- melt(data,id="EVTYPE",measure.vars=c("INJURIES","FATALITIES"))
health_cast <- dcast(health_datamelt,EVTYPE~variable,sum)
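The melt and dcast steps above amount to a grouped sum. As a rough illustration, the base-R sketch below produces the same shape of result with `aggregate`; the `toy` data frame and its values are made up for the example and are not taken from the Storm Data.

```r
# Toy data: hypothetical values for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(15, 2, 6, 0, 1),
  FATALITIES = c(1, 0, 2, 3, 0)
)

# Sum both measures within each event type (one row per EVTYPE)
toy_sums <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
toy_sums
```

Either approach gives one row per event type with a column of totals for each measure.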
The resulting data frame was then sorted in descending order to get the top 10 weather events for fatalities.
fatalities_sort <- health_cast[with(health_cast,order(-FATALITIES)),]
top_fatalities <- head(fatalities_sort,10)
top_fatalities
## EVTYPE INJURIES FATALITIES
## 834 TORNADO 91346 5633
## 130 EXCESSIVE HEAT 6525 1903
## 153 FLASH FLOOD 1777 978
## 275 HEAT 2100 937
## 464 LIGHTNING 5230 816
## 856 TSTM WIND 6957 504
## 170 FLOOD 6789 470
## 585 RIP CURRENT 232 368
## 359 HIGH WIND 1137 248
## 19 AVALANCHE 170 224
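The numbers in the leftmost column of this output are the row names carried over from the unsorted data frame, not ranks. A small sketch of the same sort-then-head pattern on made-up totals (the `toy_totals` values are hypothetical) makes this visible:

```r
# Hypothetical per-event totals, for illustration only
toy_totals <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO", "LIGHTNING"),
  FATALITIES = c(4, 9, 20, 6)
)

# Order rows by descending fatalities and keep the top two
top2 <- head(toy_totals[order(-toy_totals$FATALITIES), ], 2)
top2
rownames(top2)  # original row positions are kept as row names
```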
The injuries and fatalities by event type are then plotted using the ggplot2 package. The graph shows that tornados caused the most casualties of any weather event. Excessive heat produced the second largest number of fatalities, with an injury count similar to those of thunderstorm wind (TSTM WIND), floods, and lightning.
library(ggplot2)
#Use a subset of the molten data frame in order to more easily plot both fatalities and injuries in the same graph
plot_fatalities <- health_datamelt[health_datamelt$EVTYPE %in% top_fatalities$EVTYPE,]
#Plot the top weather events using ggplot
ggplot(plot_fatalities,aes(EVTYPE,value))+geom_col(aes(fill=variable),position = position_stack(reverse = TRUE)) +coord_flip()+ggtitle("Fatalities and Injuries by Event Type")+labs(y="Fatalities and Injuries",x="Event Type")+theme(legend.position = "top")
The dollar amounts of property and crop damage are expressed as 3-digit numbers (PROPDMG and CROPDMG variables), each with its own multiplier code (PROPDMGEXP and CROPDMGEXP columns). In order to sum the cost for each type of weather event, each amount was multiplied by the value of its multiplier code and stored in new variables, PROPDMGCOST and CROPDMGCOST.
#Create a lookup table to convert data into dollar amounts
PROPDMGEXP <- c("","-","?","+",0:8,"B","h","H","K","m","M","k")
key <- c(0,0,0,0,1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000,100,100,1000,1000000,1000000,1000)
lookup <- data.frame(PROPDMGEXP, key)
#Add the "key" values as a new column in the dataframe using merge
datamerge <- merge(data,lookup)
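#Calculate total cost for property and crop damage and store the results in new PROPDMGCOST and CROPDMGCOST columns
datamerge$PROPDMGCOST <- datamerge$PROPDMG*datamerge$key
#The merged "key" column is matched on PROPDMGEXP, so look up the crop multiplier from CROPDMGEXP separately
datamerge$CROPDMGCOST <- datamerge$CROPDMG*lookup$key[match(datamerge$CROPDMGEXP,lookup$PROPDMGEXP)]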
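Because property and crop damage each carry their own multiplier column, each amount needs the multiplier for its own code. The sketch below shows one way to do this with `match` on a few made-up rows; the `toy_dmg` data frame and the shortened `codes`/`mult` vectors are assumptions for illustration, and the empty code is mapped to 0 to follow the lookup table above.

```r
# Hypothetical rows, for illustration only
toy_dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(10, 0, 5),
  CROPDMGEXP = c("M", "", "K")
)

# Shortened lookup: code -> dollar multiplier
codes <- c("", "K", "M", "B")
mult  <- c(0, 1e3, 1e6, 1e9)

# Look up each column's multiplier from its own code column
toy_dmg$PROPDMGCOST <- toy_dmg$PROPDMG * mult[match(toy_dmg$PROPDMGEXP, codes)]
toy_dmg$CROPDMGCOST <- toy_dmg$CROPDMG * mult[match(toy_dmg$CROPDMGEXP, codes)]
toy_dmg
```

In the first row, 25 with code K becomes 25,000 dollars of property damage, while 10 with code M becomes 10,000,000 dollars of crop damage.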
The resulting dataframe is then reshaped to sum the total property and crop damage by weather event.
econ_datamelt <- melt(datamerge,id="EVTYPE",measure.vars=c("PROPDMGCOST","CROPDMGCOST"))
econ_cast <- dcast(econ_datamelt,EVTYPE~variable,sum)
head(econ_cast)
## EVTYPE PROPDMGCOST CROPDMGCOST
## 1 HIGH SURF ADVISORY 200000 0
## 2 COASTAL FLOOD 0 0
## 3 FLASH FLOOD 50000 0
## 4 LIGHTNING 0 0
## 5 TSTM WIND 8100000 0
## 6 TSTM WIND (G45) 8000 0
Property damage and crop damage were considered separately. First, the dataframe was sorted to determine the top weather events for property damage. The results were then plotted, revealing that floods yielded the greatest costs in property damage, followed by hurricanes/typhoons and tornados.
PROPDMG_sort <- econ_cast[with(econ_cast,order(-PROPDMGCOST)),]
#Select the top 10 event types for property damage
PROPDMG_top <- head(PROPDMG_sort,10)
#Reorder factor levels so bar graph will be in descending order
PROPDMG_top$EVTYPE <- factor(PROPDMG_top$EVTYPE,levels=PROPDMG_top$EVTYPE[order(PROPDMG_top$PROPDMGCOST)])
#Plot the data
ggplot(PROPDMG_top,aes(EVTYPE,PROPDMGCOST))+geom_col(position = position_stack(reverse = TRUE)) +coord_flip()+ggtitle("Property Damage by Event Type")+labs(y="Total Cost of Damage",x="Event Type")
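The factor releveling used before the plot controls bar order: ggplot2 draws discrete values in level order, and with `coord_flip()` the first level sits at the bottom, so ascending levels put the largest bar at the top. A minimal sketch on made-up costs (the `toy_cost` values are hypothetical) shows the resulting level order, along with base R's `reorder` as an alternative:

```r
# Hypothetical costs, for illustration only
toy_cost <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HAIL"),
  COST   = c(30, 10, 20)
)

# Set factor levels in ascending order of cost
toy_cost$EVTYPE <- factor(toy_cost$EVTYPE,
                          levels = toy_cost$EVTYPE[order(toy_cost$COST)])
levels(toy_cost$EVTYPE)

# reorder() gives the same level order from the values directly
alt_levels <- levels(reorder(c("FLOOD", "TORNADO", "HAIL"), c(30, 10, 20)))
alt_levels
```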
The data was then sorted by crop damage costs and plotted, showing that events recorded as HURRICANE have the greatest crop damage costs, followed by the separate HURRICANE/TYPHOON event type and floods.
#Plot crop damage
CROPDMG_sort <- econ_cast[with(econ_cast,order(-CROPDMGCOST)),]
CROPDMG_top <- head(CROPDMG_sort,10)
CROPDMG_top$EVTYPE <-factor(CROPDMG_top$EVTYPE,levels=CROPDMG_top$EVTYPE[order(CROPDMG_top$CROPDMGCOST)])
ggplot(CROPDMG_top,aes(EVTYPE,CROPDMGCOST))+geom_col(position = position_stack(reverse = TRUE)) +coord_flip()+ggtitle("Crop Damage by Event Type")+labs(y="Total Cost of Damage",x="Event Type")