Tornadoes most deadly and injurious, but not as expensive as floods

Synopsis

Severe weather events sometimes cause property damage, injuries, and even deaths. This paper explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which covers events from 1950 to 2011, to find out which types of weather events cause the most harm to people and the most economic damage. Exploratory graphs show that tornadoes cause the most injuries and deaths, but flooding causes the most property and crop damage.

Data Processing

The first step in processing the data is to download the storm data. NOAA also provides documentation and an FAQ for the database. The data file is compressed in the bzip2 (.bz2) format, so the file is opened with the bzfile() function before it is read. The code below downloads the file and reads it into R as a CSV. If the file “storm.csv.bz2” already exists in the working directory, R skips the download and begins importing the CSV.

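# Download the compressed storm data only if it is not already present.
# method = "curl" needs the curl command-line tool; omit it to use R's default.
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("storm.csv.bz2")) {
  download.file(URL, "storm.csv.bz2", method = "curl")
}

# Read the CSV through a bzip2 decompression connection; keep text columns
# as character so that event types can be recoded freely
storm <- read.csv(bzfile("storm.csv.bz2"), stringsAsFactors = FALSE)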
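To confirm that a .bz2 file can be read this way, the following minimal sketch uses a tiny made-up table rather than the NOAA file. The temporary file and its two rows are illustrative assumptions. The sketch writes the table through a bzip2-compressing connection and reads it back twice: once explicitly through bzfile(), as the report does, and once by file name alone.

```r
# Write two made-up rows to a temporary file through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(5, 0)),
          con, row.names = FALSE)
close(con)

# Read it back explicitly through bzfile(), as the report does ...
viaBzfile <- read.csv(bzfile(tmp))

# ... and by file name alone: read.csv() opens the file with file(),
# which recognises bzip2 compression when reading
viaName <- read.csv(tmp)

print(identical(viaBzfile, viaName))
```

Both reads give the same data frame, so bzfile() makes the compression explicit rather than being strictly required.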

Preparing the Death and Injury Data

Once the file is read in, some basic data cleansing is needed. The dataset's documentation lists 48 event types, but the EVTYPE variable in the storm data frame contains 961 distinct values. For this project, I focus on the following event types (EVTYPE): “THUNDERSTORM WIND”, “HEAT”, “FLOOD”, and “HURRICANE”. THUNDERSTORM WIND appears under various spellings; one that occurs with great frequency is “TSTM WIND”. “HEAT”, “FLOOD”, and “HURRICANE” have similar variants. The code below recodes all of them.

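# grepl() matches the pattern anywhere in the value, so longer event names
# that contain it (e.g. "MARINE TSTM WIND") are recoded as well
storm$EVTYPE[grepl("TSTM WIND", storm$EVTYPE)] <- "THUNDERSTORM WIND"
storm$EVTYPE[grepl("EXCESSIVE HEAT", storm$EVTYPE)] <- "HEAT"
storm$EVTYPE[grepl("RIVER FLOOD", storm$EVTYPE)] <- "FLOOD"
storm$EVTYPE[grepl("HURRICANE/TYPHOON", storm$EVTYPE)] <- "HURRICANE"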

This sort of correction could be applied to many of the other categories in the EVTYPE variable. To save time while still having the highest impact, I focus only on these four, because they affect the top 10 event types.
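Extending the correction to more categories could be done with a lookup table instead of one line per fix. The following minimal sketch runs on a handful of made-up event names. The names `recodeTable` and `standardizeEvtype` are hypothetical. Keys are matched as literal text with `fixed = TRUE`, and each key's replacement is applied in turn.

```r
# Hypothetical lookup table: literal text to look for -> standard event type
recodeTable <- c("TSTM WIND"         = "THUNDERSTORM WIND",
                 "EXCESSIVE HEAT"    = "HEAT",
                 "RIVER FLOOD"       = "FLOOD",
                 "HURRICANE/TYPHOON" = "HURRICANE")

# Hypothetical helper: replace every value that contains a key of the lookup.
# fixed = TRUE treats each key as plain text rather than a regular expression.
standardizeEvtype <- function(evtype, lookup) {
  evtype <- as.character(evtype)  # works for factor or character input
  for (key in names(lookup)) {
    evtype[grepl(key, evtype, fixed = TRUE)] <- lookup[[key]]
  }
  evtype
}

# Made-up event names, for illustration only
ev <- c("TSTM WIND", "MARINE TSTM WIND", "EXCESSIVE HEAT",
        "RIVER FLOOD", "HURRICANE/TYPHOON", "TORNADO")
standardizeEvtype(ev, recodeTable)
```

Applied to the real data, this would be `storm$EVTYPE <- standardizeEvtype(storm$EVTYPE, recodeTable)`, and adding a category means adding one entry to the table. As with the grepl() calls above, "MARINE TSTM WIND" is folded into THUNDERSTORM WIND because the match is on a substring.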

To total the fatalities and injuries caused by each event type, I use the aggregate() function to create two new data frames that sum the FATALITIES and INJURIES columns by event type.

stormFatal<-aggregate(storm$FATALITIES, by=list(storm$EVTYPE), sum)
stormInjury<-aggregate(storm$INJURIES, by=list(storm$EVTYPE), sum)
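The shape of aggregate()'s output explains why the columns are renamed later. The following sketch uses a few made-up rows in place of the storm data. With the vector interface used above, the result columns are called `Group.1` and `x`. The formula interface keeps the original variable names.

```r
# Made-up rows in the shape of the storm data
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1, 4, 0))

# Vector interface, as used above: columns are named Group.1 and x
byList <- aggregate(toy$FATALITIES, by = list(toy$EVTYPE), sum)

# Formula interface: columns keep the original variable names
byFormula <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

print(byList)     # groups FLOOD, HEAT, TORNADO with totals 0, 1, 7
print(byFormula)
```

Switching to the formula form would mean referring to `FATALITIES` instead of `x` in the later ordering code. The vector form followed by colnames() works just as well.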

As mentioned earlier, there are a large number of event types, and graphing all of them (as I do in the Results section) would be messy. To keep the graphs tidy and visually simple, I restrict the data to the 10 events that cause the most deaths and the 10 events that cause the most injuries. The code below selects the top 10 and renames the columns for ease of use.

sF10<-stormFatal[order(stormFatal$x,decreasing=TRUE)[1:10],] 
sI10<-stormInjury[order(stormInjury$x,decreasing=TRUE)[1:10],] 
colnames(sF10)<-c("WeatherType", "Deaths")
colnames(sI10)<-c("WeatherType","Injuries")
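With hundreds of event types, the `[1:10]` indexing above always finds ten rows. On a smaller table, however, it pads the result with all-NA rows. The following sketch uses a small made-up table and a hypothetical helper, `topN`. The helper uses head(), which simply stops at the last row.

```r
# Hypothetical helper: the n rows with the largest x from an aggregate() result
topN <- function(df, n = 10) {
  ranked <- df[order(df$x, decreasing = TRUE), ]
  head(ranked, n)  # never returns more rows than exist
}

# Made-up totals, for illustration only
toy <- data.frame(Group.1 = c("FLOOD", "HEAT", "TORNADO"), x = c(0, 1, 7))
topN(toy, 2)   # TORNADO then HEAT

# Indexing with [1:n] instead pads with NA rows when n exceeds the number of groups
padded <- toy[order(toy$x, decreasing = TRUE)[1:5], ]
nrow(padded)   # 5, including two all-NA rows
```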

Dollars of Damage

To prepare for the analysis of economic impact, the damage data need to be cleansed. The values in the PROPDMG (property damage) and CROPDMG (crop damage) columns must be multiplied by the power of 1000 indicated in the PROPDMGEXP and CROPDMGEXP columns, respectively (e.g., K = 1,000, M = 1,000,000, B = 1,000,000,000, with lowercase k and m treated the same). Rows with any other exponent code are dropped. I decided not to worry about those values because they are likely small and do not add up to the billions necessary to compete with the highest damage producers. To get a sense of the total economic impact, the two converted columns are then added together. The code below performs these steps.

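# Exponent codes kept in the analysis and their multipliers. The labels are
# matched to the levels by position, so the levels are given explicitly
# instead of relying on factor()'s locale-dependent sort order.
expLevels <- c("", "B", "k", "K", "m", "M")
expLabels <- c("1", "1000000000", "1000", "1000", "1000000", "1000000")

# Property damage: keep recognised codes, then convert them to multipliers
stormb <- storm[storm$PROPDMGEXP %in% expLevels, ]
stormb$PROPDMGEXP <- factor(stormb$PROPDMGEXP, levels = expLevels)
levels(stormb$PROPDMGEXP) <- expLabels
stormb$PROPDMGEXP <- as.numeric(as.character(stormb$PROPDMGEXP))
stormb$PROP <- stormb$PROPDMG * stormb$PROPDMGEXP

# Crop damage: same conversion on the rows that remain
stormc <- stormb[stormb$CROPDMGEXP %in% expLevels, ]
stormc$CROPDMGEXP <- factor(stormc$CROPDMGEXP, levels = expLevels)
levels(stormc$CROPDMGEXP) <- expLabels
stormc$CROPDMGEXP <- as.numeric(as.character(stormc$CROPDMGEXP))
stormc$CROP <- stormc$CROPDMG * stormc$CROPDMGEXP

# Total economic damage per event
stormc$TOTALDMG <- stormc$PROP + stormc$CROP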
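The same code-to-multiplier mapping can be written without factors, which gives a useful cross-check. This version uses match() against a small lookup table, so the order of factor levels never matters. Upper-casing the codes first makes "k"/"K" and "m"/"M" equivalent. The following sketch uses made-up amounts. The helper name `toDollars` is hypothetical, and unlike the subsetting above it would also accept a lowercase "b".

```r
# Exponent codes kept in the analysis (after upper-casing) and their multipliers
expCodes <- c("", "K", "M", "B")
expMult  <- c(1, 1e3, 1e6, 1e9)

# Hypothetical helper: convert an amount and its exponent code to dollars.
# match() returns NA for any code not in expCodes, so those amounts become NA.
toDollars <- function(amount, code) {
  idx <- match(toupper(as.character(code)), expCodes)
  amount * expMult[idx]
}

# Made-up values, for illustration only
toDollars(c(2.5, 1.5, 3, 10, 7), c("K", "m", "B", "", "h"))
```

The NA for the "h" row corresponds to the rows that the subsetting in the code above drops.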

The damage totals are then aggregated by event type in the same way as the injury and death counts, keeping only the top 10 to simplify the graph.

stormTOTAL<-aggregate(stormc$TOTALDMG, by=list(stormc$EVTYPE), sum)
sTOT10<-stormTOTAL[order(stormTOTAL$x,decreasing=TRUE)[1:10],]
colnames(sTOT10)<-c("WeatherType", "DamageinDollars")

Results

The plots in this paper are created with the ggplot2 package, which the code below loads.

library(ggplot2)

Injuries

The first graph displays the number of injuries caused by each of the ten weather event types that cause the most injuries.

ggplot(data=sI10, aes(x=WeatherType, y=Injuries, fill=WeatherType)) + 
      geom_bar(stat="identity") +  ylab("Number of Injuries") + 
      ggtitle("Injuries from Weather Type") + theme(axis.text.x=element_blank())

Figure 1: Injuries caused by the top ten most injurious weather events.

This graph clearly shows that over the period covered by the NOAA data, tornadoes caused considerably more injuries than any other event type, nearly 10 times as many as the nearest competitors.
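In Figure 1 the bars appear in alphabetical order of the event names, which is ggplot2's default for text categories, rather than by size. The following minimal sketch orders the categories by their totals with base R's reorder() before plotting. It uses made-up totals in the shape of `sI10`, not the report's data.

```r
# Made-up totals in the shape of sI10, for illustration only
toy <- data.frame(WeatherType = c("HEAT", "TORNADO", "FLOOD"),
                  Injuries = c(20, 90, 10))

# reorder() builds a factor whose levels are sorted by the second argument
# (negated here, so the largest total comes first)
toy$WeatherType <- reorder(toy$WeatherType, -toy$Injuries)
levels(toy$WeatherType)   # "TORNADO" "HEAT" "FLOOD"
```

Passing a column reordered this way to the same `aes(x = WeatherType, ...)` call would draw the bars from tallest to shortest, because ggplot2 places a factor's categories in level order.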

Deaths

The second graph displays the number of deaths caused by each of the ten weather event types that cause the most deaths.

ggplot(data=sF10, aes(x=WeatherType, y=Deaths, fill=WeatherType)) + 
      geom_bar(stat="identity") + ylab("Number of Deaths") + 
      ggtitle("Deaths from Weather Type") + theme(axis.text.x=element_blank())

Figure 2: Deaths caused by the top ten most deadly weather events.

Again, tornadoes killed more people than any other weather event. This time the margin was smaller: tornadoes killed nearly twice as many people as their nearest competitor, heat.

By both measures, tornadoes are by far the events most harmful to population health.

Damage

The third figure shows the total dollars of damage (property plus crop) for the ten most damaging weather event types.

ggplot(data=sTOT10, aes(x=WeatherType, y=DamageinDollars/1000000000, fill=WeatherType)) + 
      geom_bar(stat="identity") + ylab("Damage in Billions of Dollars") + 
      ggtitle("Damage in Billions of Dollars from Weather Type") + 
      theme(axis.text.x=element_blank())

Figure 3: Damage in Billions of Dollars caused by the top ten most damaging weather events.

This graph shows that flooding and hurricanes caused the most damage in dollars between 1950 and 2011. Tornadoes came in third with a little over $50B in damage.

This suggests that flooding has the greatest economic consequences of any weather-related event.