Summary: Most Harmful Weather Event Types

This analysis starts with the official dataset from the US National Weather Service that documents the time and impact of severe or unusual weather conditions in the United States. The data is collected by weather professionals, public officials such as police, and volunteers. Data quality at collection time is an important priority of the sponsoring team.

Specifically, we wish to understand which weather events cause the most property damage and which cause the most personal injury or death. We will accomplish this in the simplest manner possible, by summing the damage figures and the personal injury counts by event type.

Data Processing

The raw data is fetched, as needed, from the National Weather Service URL and cached, or staged, on the local filesystem of the development laptop; this staging is designed to reduce the time and bandwidth required to download the file. Subsequently a smaller, aggregated Dataframe is built to support the analytical phases of this research.

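# Download the raw data once and cache it in the working directory
if (!file.exists("StormData.bz2")) {
  download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "StormData.bz2", method = "auto")
}
# Read from the same path the download wrote to; read.csv handles bzip2 files directly
stormDat <- read.csv("StormData.bz2")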

Results: Damage

Damage caused by weather events is divided into two types: damage to property and damage to crops. For the purposes of this analysis we do not need to differentiate between the two, so we simply add those two dollar figures together as we build an aggregated data frame. Finally, the aggregated values for the various event types are sorted in descending order and the top 10 event types are isolated in order to generate a plot.

# Reduce the fields relevant to our investigative scope into a smaller analytical object
aggDamage <- aggregate((PROPDMG + CROPDMG) ~ EVTYPE, data = stormDat, sum)
names(aggDamage) <- c("EVTYPE", "DAMAGE")
# Sort descending by total damage and keep the top 10 event types
sortedDamage <- aggDamage[order(-aggDamage$DAMAGE), ]
top10 <- head(sortedDamage, 10)
pie(top10$DAMAGE, labels = top10$EVTYPE)
title(main = "Weather Events that have caused the most Damage")
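
To make the aggregate, sort, and truncate steps above concrete, here is a minimal sketch on a tiny hand-made data frame. The object names `toyDamage`, `toyAgg`, `toySorted`, and `toyTop` and all of the values are invented for illustration and are not drawn from the National Weather Service data; only the column names mirror the ones used above.

```r
# Hypothetical toy data; the values are invented, not taken from StormData.
toyDamage <- data.frame(
  EVTYPE  = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  PROPDMG = c(10, 5, 20, 1, 2),
  CROPDMG = c(0, 3, 1, 4, 0),
  stringsAsFactors = FALSE
)

# Combine property and crop damage into one column before aggregating.
toyDamage$DAMAGE <- toyDamage$PROPDMG + toyDamage$CROPDMG

# Sum the combined damage per event type.
toyAgg <- aggregate(DAMAGE ~ EVTYPE, data = toyDamage, FUN = sum)

# Sort descending by total damage and keep the top two event types.
toySorted <- toyAgg[order(-toyAgg$DAMAGE), ]
toyTop <- head(toySorted, 2)
print(toyTop)
```

In this toy case the TORNADO rows sum to 31 and the FLOOD rows to 10, so those two event types are the ones retained, in that order.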

Results: Injury

The two types of threat to human well-being are given in the data as Fatalities and Injuries. This differs from the case of property damage, where both losses are expressed in like units (dollars). Fatalities and Injuries obviously differ widely in their impact on the individual. In addition, Injuries is expressed only as a count of affected persons, even though the degree of harm might vary considerably from one person to the next.

Nonetheless, a comparison between event types must be made. Although it may be controversial, we have decided to normalize the impact by treating one fatality as equal to multiple injuries. The multiplier chosen is 20. Under these parameters, the top three weather events causing personal health impacts are Tornadoes, Excessive Heat, and Lightning.

# Build an analytical object with personal injury information
# Use a 20:1 multiplier to put deaths on a similar scale to injuries
aggInjury <- aggregate((INJURIES + 20 * FATALITIES) ~ EVTYPE, data = stormDat, sum)
names(aggInjury) <- c("EVTYPE", "INJURY")
sortedInjury <- aggInjury[order(-aggInjury$INJURY), ]
topInjury <- head(sortedInjury, 10)
pie(topInjury$INJURY, labels = topInjury$EVTYPE)
title(main = "Weather Events that have caused the most Personal Injury or Death")
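
Because the ranking depends on the chosen multiplier, a small worked example may help show how the weighted score behaves. The sketch below uses two made-up event types with invented counts; `toyHarm`, `harmScore`, and `topEvent` are hypothetical names introduced only for this illustration and are not part of the analysis above.

```r
# Hypothetical toy counts; invented for illustration only.
toyHarm <- data.frame(
  EVTYPE     = c("A", "B"),
  INJURIES   = c(100, 10),
  FATALITIES = c(1, 6),
  stringsAsFactors = FALSE
)

# Weighted harm score: injuries plus a multiple of fatalities.
harmScore <- function(df, multiplier) {
  df$INJURIES + multiplier * df$FATALITIES
}

# Event type with the highest weighted score under a given multiplier.
topEvent <- function(df, multiplier) {
  df$EVTYPE[which.max(harmScore(df, multiplier))]
}

score10 <- harmScore(toyHarm, 10)  # A: 100 + 10 * 1 = 110, B: 10 + 10 * 6 = 70
score20 <- harmScore(toyHarm, 20)  # A: 100 + 20 * 1 = 120, B: 10 + 20 * 6 = 130
print(topEvent(toyHarm, 10))
print(topEvent(toyHarm, 20))
```

With these invented numbers, event type A ranks first at a multiplier of 10 while B ranks first at 20, which suggests the ordering of real event types could also be sensitive to this choice.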

Results

As we can see from these two charts, Tornadoes are the most dangerous weather event overall, as they account for both the most property damage and the most personal injury.