Synopsis

This report analyzes major weather events and their impact on public health and the economy. Specifically, data on fatalities, injuries, property damage, and crop damage by weather event is examined. To investigate these issues, National Weather Service data capturing characteristics of weather events between 1950 and 2011 was used. The data show that tornados resulted in the most injuries and fatalities of any weather event, therefore having the worst impact on population health. Floods, hurricanes, and typhoons had the worst economic consequences, resulting in the greatest costs in property damage and crop damage.

Data Processing

First, the Storm Data from the National Weather Service is downloaded as a bzip2-compressed CSV file and read into R. Documentation for the data set is available here. The data has 902,297 observations of 37 variables.

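#The file is bzip2-compressed, so keep the .bz2 extension; read.csv decompresses it when reading
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile="./StormData.csv.bz2",method="curl")
data <- read.csv("StormData.csv.bz2")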
dim(data)
## [1] 902297     37
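
The downloaded file is bzip2-compressed, and read.csv can read such a file without a separate decompression step. The following is a minimal offline sketch of that behavior; the toy data frame and the temporary file are made up for illustration and are not part of the analysis.

```r
# Toy data frame standing in for the storm data (illustrative values only)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(15, 0))

# Write it to a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv detects the compression from the file contents and decompresses it
toy_read <- read.csv(tmp)
dim(toy_read)
```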

Since we are investigating the impact of event types on population health and the economy, we take a subset of the data that includes only the relevant variables.

data <- data[,c("EVTYPE","INJURIES","FATALITIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
head(data)
##    EVTYPE INJURIES FATALITIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO       15          0    25.0          K       0           
## 2 TORNADO        0          0     2.5          K       0           
## 3 TORNADO        2          0    25.0          K       0           
## 4 TORNADO        2          0     2.5          K       0           
## 5 TORNADO        2          0     2.5          K       0           
## 6 TORNADO        6          0     2.5          K       0

Results

The major questions to investigate are 1) which weather events have the greatest impact on population health, and 2) which weather events have the greatest economic consequences.

1) Weather Events and Population Health

To determine which event types have the greatest effect on population health, the data was reshaped to sum injuries and fatalities by event type.

library(reshape2)
health_datamelt <- melt(data,id="EVTYPE",measure.vars=c("INJURIES","FATALITIES"))
health_cast <- dcast(health_datamelt,EVTYPE~variable,sum)
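
For readers without reshape2, the same kind of per-event sum can be sketched in base R with aggregate. This is an illustration on toy values, not the code used for the results in this report.

```r
# Toy records (illustrative values, not taken from the storm data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  INJURIES   = c(15, 2, 6, 1),
  FATALITIES = c(0, 1, 3, 4)
)

# Sum both measures within each event type
toy_sums <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
toy_sums
```

The result has one row per event type and one column per summed measure, the same shape as health_cast.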

The resulting data frame was then sorted in descending order to get the top 10 weather events for fatalities.

fatalities_sort <- health_cast[with(health_cast,order(-FATALITIES)),]
top_fatalities <- head(fatalities_sort,10)
top_fatalities
##             EVTYPE INJURIES FATALITIES
## 834        TORNADO    91346       5633
## 130 EXCESSIVE HEAT     6525       1903
## 153    FLASH FLOOD     1777        978
## 275           HEAT     2100        937
## 464      LIGHTNING     5230        816
## 856      TSTM WIND     6957        504
## 170          FLOOD     6789        470
## 585    RIP CURRENT      232        368
## 359      HIGH WIND     1137        248
## 19       AVALANCHE      170        224

The injuries and fatalities by event type are then plotted using the ggplot2 package. The graph shows that tornados have caused the most casualties of any weather event. Excessive heat produced the second-largest number of fatalities and a number of injuries similar to those from TSTM wind, floods, and lightning.

library(ggplot2)
#Use a subset of the molten data frame in order to more easily plot both fatalities and injuries in the same graph
plot_fatalities <- health_datamelt[with(health_datamelt,EVTYPE=="TORNADO"|EVTYPE=="EXCESSIVE HEAT"|EVTYPE=="FLASH FLOOD"|EVTYPE=="HEAT"|EVTYPE=="LIGHTNING"|EVTYPE=="TSTM WIND"|EVTYPE=="FLOOD"|EVTYPE=="RIP CURRENT"|EVTYPE=="HIGH WIND"|EVTYPE=="AVALANCHE"),]
#Plot the top weather events using ggplot
ggplot(plot_fatalities,aes(EVTYPE,value))+geom_col(aes(fill=variable),position = position_stack(reverse = TRUE)) +coord_flip()+ggtitle("Fatalities and Injuries by Event Type")+labs(y="Fatalities and Injuries",x="Event Type")+theme(legend.position = "top")
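
The subset above lists each event type by hand. A shorter way to express the same kind of filter is the %in% operator, sketched here on toy data; the toy values and the keep vector are assumptions for illustration. In this report the vector of names could be taken from top_fatalities$EVTYPE.

```r
# Toy long-format data in the same shape as a molten data frame (illustrative values)
toy_melt <- data.frame(
  EVTYPE   = c("TORNADO", "HAIL", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  variable = rep(c("INJURIES", "FATALITIES"), each = 3),
  value    = c(15, 0, 2, 3, 0, 1)
)

# Event types to keep (hypothetical selection)
keep <- c("TORNADO", "FLOOD")

# %in% tests each EVTYPE against the whole vector, replacing a chain of == and |
toy_plot <- toy_melt[toy_melt$EVTYPE %in% keep, ]
nrow(toy_plot)
```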

2) Weather Events and Economic Consequences

The dollar amounts of property and crop damage are stored as numeric values (PROPDMG and CROPDMG variables) paired with magnitude codes (PROPDMGEXP and CROPDMGEXP variables), such as K for thousands, M for millions, and B for billions. In order to sum the cost for each type of weather event, each value was multiplied by its magnitude to create new variables, PROPDMGCOST and CROPDMGCOST.

#Create a lookup table to convert data into dollar amounts
PROPDMGEXP <- c("","-","?","+",0:8,"B","h","H","K","m","M","k")
key <- c(0,0,0,0,1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000,100,100,1000,1000000,1000000,1000)
lookup <- data.frame(PROPDMGEXP, key)
#Add the "key" values as a new column in the dataframe using merge
datamerge <- merge(data,lookup)
#Calculate total cost for property and crop damage and store the results in new PROPDMGCOST and CROPDMGCOST columns
datamerge$PROPDMGCOST <- datamerge$PROPDMG*datamerge$key
datamerge$CROPDMGCOST <- datamerge$CROPDMG*datamerge$key
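
Because merge joins on the shared PROPDMGEXP column, the key column holds the property damage multiplier for each row. The following is a minimal sketch of giving each damage column its own multiplier instead. It assumes that CROPDMGEXP uses the same codes as PROPDMGEXP and that unrecognized codes count as 0; the toy values and the to_mult helper are hypothetical.

```r
# Multipliers by magnitude code (a subset of the lookup table above)
mult <- c("H" = 100, "K" = 1000, "M" = 1e6, "B" = 1e9)

# Toy records (illustrative values only)
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1), PROPDMGEXP = c("K", "M", ""),
  CROPDMG = c(10, 0, 5),   CROPDMGEXP = c("M", "", "K")
)

# Hypothetical helper: look up a multiplier for each code, ignoring case;
# codes not found in `mult` (including the empty string) become 0 (assumption)
to_mult <- function(code) {
  m <- mult[toupper(code)]
  unname(ifelse(is.na(m), 0, m))
}

# Each damage column uses the multiplier from its own code column
toy$PROPDMGCOST <- toy$PROPDMG * to_mult(toy$PROPDMGEXP)
toy$CROPDMGCOST <- toy$CROPDMG * to_mult(toy$CROPDMGEXP)
toy
```

In the first toy row, 25 with code K becomes 25,000 for property damage, while 10 with code M becomes 10,000,000 for crop damage.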

The resulting dataframe is then reshaped to sum the total property and crop damage by weather event.

econ_datamelt <- melt(datamerge,id="EVTYPE",measure.vars=c("PROPDMGCOST","CROPDMGCOST"))
econ_cast <- dcast(econ_datamelt,EVTYPE~variable,sum)
head(econ_cast)
##                  EVTYPE PROPDMGCOST CROPDMGCOST
## 1    HIGH SURF ADVISORY      200000           0
## 2         COASTAL FLOOD           0           0
## 3           FLASH FLOOD       50000           0
## 4             LIGHTNING           0           0
## 5             TSTM WIND     8100000           0
## 6       TSTM WIND (G45)        8000           0
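
The first rows above sort ahead of the rest of the alphabet, which suggests that some EVTYPE values carry leading whitespace and are therefore summed separately from their unpadded counterparts. The following is a sketch of one optional cleanup, trimming whitespace and unifying case before grouping. It was not applied to the results in this report, and the toy values are made up.

```r
# Toy totals with inconsistent labels (illustrative values only)
toy <- data.frame(
  EVTYPE      = c(" TSTM WIND", "TSTM WIND", "tstm wind", "FLOOD"),
  PROPDMGCOST = c(100, 200, 300, 50)
)

# Strip surrounding whitespace and convert to upper case before grouping
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))
toy_sums <- aggregate(PROPDMGCOST ~ EVTYPE, data = toy, FUN = sum)
toy_sums
```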

Property damage and crop damage were considered separately. First, the dataframe was sorted to determine the top weather events for property damage. The results were then plotted, revealing that floods yielded the greatest costs in property damage, followed by hurricanes/typhoons and tornados.

PROPDMG_sort <- econ_cast[with(econ_cast,order(-PROPDMGCOST)),]
#Select the top 10 event types for property damage
PROPDMG_top <- head(PROPDMG_sort,10)
#Reorder factor levels so bar graph will be in descending order
PROPDMG_top$EVTYPE <- factor(PROPDMG_top$EVTYPE,levels=PROPDMG_top$EVTYPE[order(PROPDMG_top$PROPDMGCOST)])
#Plot the data
ggplot(PROPDMG_top,aes(EVTYPE,PROPDMGCOST))+geom_col(position = position_stack(reverse = TRUE)) +coord_flip()+ggtitle("Property Damage by Event Type")+labs(y="Total Cost of Damage",x="Event Type")
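
The factor releveling step controls the order of the bars, since ggplot2 draws a discrete axis in factor-level order. An alternative is the base R reorder function, sketched here on toy values that are not taken from the data.

```r
# Toy top-events table (illustrative values only)
toy_top <- data.frame(
  EVTYPE      = c("FLOOD", "HAIL", "TORNADO"),
  PROPDMGCOST = c(300, 100, 200)
)

# reorder() sorts the factor levels by a second variable, in ascending order
toy_top$EVTYPE <- reorder(toy_top$EVTYPE, toy_top$PROPDMGCOST)
levels(toy_top$EVTYPE)
```

With coord_flip, ascending levels place the largest value at the top of the chart.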

The data was then sorted by crop damage costs and plotted, showing that the HURRICANE event type had the greatest crop damage costs, followed by HURRICANE/TYPHOON and FLOOD.

#Plot crop damage
CROPDMG_sort <- econ_cast[with(econ_cast,order(-CROPDMGCOST)),]
CROPDMG_top <- head(CROPDMG_sort,10)
CROPDMG_top$EVTYPE <-factor(CROPDMG_top$EVTYPE,levels=CROPDMG_top$EVTYPE[order(CROPDMG_top$CROPDMGCOST)])
ggplot(CROPDMG_top,aes(EVTYPE,CROPDMGCOST))+geom_col(position = position_stack(reverse = TRUE)) +coord_flip()+ggtitle("Crop Damage by Event Type")+labs(y="Total Cost of Damage",x="Event Type")
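
The property and crop damage sections repeat the same sort, select, and relevel steps. One way to avoid the duplication is a small helper function. The top_events function below is a hypothetical sketch, not part of the original analysis, and it is shown on toy values.

```r
# Hypothetical helper: return the top n rows of `df` by the column named in `cost`,
# with EVTYPE levels ordered by that column so the bars plot in order
top_events <- function(df, cost, n = 10) {
  top <- head(df[order(-df[[cost]]), ], n)
  top$EVTYPE <- factor(top$EVTYPE, levels = top$EVTYPE[order(top[[cost]])])
  top
}

# Toy totals (illustrative values only)
toy <- data.frame(
  EVTYPE      = c("FLOOD", "HAIL", "TORNADO", "DROUGHT"),
  PROPDMGCOST = c(300, 100, 200, 50),
  CROPDMGCOST = c(20, 30, 5, 400)
)

toy_crop_top <- top_events(toy, "CROPDMGCOST", n = 2)
toy_crop_top
```

The same call with "PROPDMGCOST" would produce the table for the property damage plot.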