1 Synopsis

When severe weather strikes, it can cause fatalities, injuries, property damage, and crop damage. Understanding how these outcomes vary by type of severe weather event helps communities prepare. In this analysis, we work with the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. After cleaning the data so that values can be compared directly, we aggregate fatalities, injuries, property damage (in USD), and crop damage (in USD) by severe weather event type. We then sort these totals to see which event types cause the most bodily harm (fatalities and injuries) and the most economic damage (property and crop damage). We will see that tornadoes have by far the largest effect on human life, while floods have the largest economic impact.

2 Data Processing

The data for this assignment come in the form of a comma-separated values (CSV) file compressed with the bzip2 algorithm to reduce its size. We begin by downloading the file and reading it; read.csv can read the compressed file directly, so no separate decompression step is needed.

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","stormData.csv.bz2")
stormData <- read.csv("stormData.csv.bz2")
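Re-running the analysis does not need to fetch the archive again if it is already on disk. The following is a minimal sketch of such a guard; it is not part of the original analysis, and the helper name fetchIfMissing is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): download a file
# only when it is not already present, then return the local path.
fetchIfMissing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the bzip2 archive intact on platforms that
    # distinguish between text and binary transfers
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}
```

With this helper, the download and read steps could be combined as read.csv(fetchIfMissing(url, "stormData.csv.bz2")), where url holds the address used above.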

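For the purposes of this analysis, we are only interested in fatalities, injuries, property damage, and crop damage due to various types of severe weather events. These are recorded in the following columns within the dataset: EVTYPE (the event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (the property damage amount and its magnitude code), and CROPDMG and CROPDMGEXP (the crop damage amount and its magnitude code).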

Throughout the rest of the analysis, we will restrict our attention to these particular columns within the dataset. We will also make sure that all of the columns containing characters have the same case by changing them all to upper case.

neededData <- subset(stormData,select=c(EVTYPE, FATALITIES, INJURIES, PROPDMG,
                                        PROPDMGEXP, CROPDMG, CROPDMGEXP))

neededData$EVTYPE <- toupper(neededData$EVTYPE)
neededData$PROPDMGEXP <- toupper(neededData$PROPDMGEXP)
neededData$CROPDMGEXP <- toupper(neededData$CROPDMGEXP)

Next, we make the property and crop damage information more usable. First, we recode the PROPDMGEXP and CROPDMGEXP entries as powers of ten: H (hundreds) becomes 2 since 10^2 = 100, K (thousands) becomes 3 since 10^3 = 1,000, M (millions) becomes 6 since 10^6 = 1,000,000, and B (billions) becomes 9 since 10^9 = 1,000,000,000. Entries that are already digits (0 through 8) are left unchanged and used directly as exponents. The remaining entries are missing or possibly erroneous: “”, “+”, “-”, and “?”. We have chosen to set all of these to 0; this should have minimal impact on our analysis since most of these entries correspond to $0 in property and/or crop damage.

After converting these two columns of exponents to integers, we combine them with the entries in PROPDMG and CROPDMG to compute the dollar amount of property and crop damage for each storm. We store these values in two new variables, PROPDMGTOTAL and CROPDMGTOTAL. These values are easier to compare with one another than entries formatted as 2.5M or 1.5B, for example.

unique(neededData$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "+" "0" "5" "6" "?" "4" "2" "3" "H" "7" "-" "1" "8"
unique(neededData$CROPDMGEXP)
## [1] ""  "M" "K" "B" "?" "0" "2"
# Map letter codes to powers of ten
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "H"] <- 2
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "K"] <- 3
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "M"] <- 6
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "B"] <- 9

# Treat missing or unrecognized codes as exponent 0 (a multiplier of 1)
neededData$PROPDMGEXP[neededData$PROPDMGEXP == ""] <- 0
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "+"] <- 0
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "-"] <- 0
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "?"] <- 0

# Digit codes pass through unchanged; compute property damage in USD
neededData$PROPDMGEXP <- as.integer(neededData$PROPDMGEXP)
neededData$PROPDMGTOTAL <- neededData$PROPDMG*10^(neededData$PROPDMGEXP)


# Map letter codes to powers of ten
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "H"] <- 2
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "K"] <- 3
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "M"] <- 6
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "B"] <- 9

# Treat missing or unrecognized codes as exponent 0 (a multiplier of 1)
neededData$CROPDMGEXP[neededData$CROPDMGEXP == ""] <- 0
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "+"] <- 0
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "-"] <- 0
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "?"] <- 0

# Digit codes pass through unchanged; compute crop damage in USD
neededData$CROPDMGEXP <- as.integer(neededData$CROPDMGEXP)
neededData$CROPDMGTOTAL <- neededData$CROPDMG*10^(neededData$CROPDMGEXP)
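The same recoding can be expressed once as a lookup table and applied to both columns. The sketch below is an alternative formulation rather than the code used in this analysis: the names expLookup and toExponent are hypothetical, and the input is a small toy vector instead of the real dataset.

```r
# Toy exponent codes and damage amounts, mirroring the kinds of values
# seen in PROPDMGEXP / PROPDMG (not real records).
expCodes <- c("K", "M", "", "B", "+", "5", "H", "?", "-")
dmg      <- c(25, 2.5, 0, 1.5, 10, 1, 3, 0, 7)

# Hypothetical lookup table: letter codes and placeholder symbols
# mapped to a power of ten.
expLookup <- c("H" = 2, "K" = 3, "M" = 6, "B" = 9,
               "+" = 0, "-" = 0, "?" = 0)

toExponent <- function(codes) {
  codes <- toupper(codes)
  # Digit codes convert directly; every other code becomes NA for now
  exps <- suppressWarnings(as.integer(codes))
  # Fill in the codes that appear in the lookup table
  inTable <- codes %in% names(expLookup)
  exps[inTable] <- expLookup[codes[inTable]]
  # Empty strings cannot be used as vector names, so handle them separately
  exps[codes == ""] <- 0
  exps
}

dmgTotal <- dmg * 10^toExponent(expCodes)
```

Any code that is neither a digit nor in the table stays NA, which makes unexpected values visible instead of silently treating them as 0.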

3 Results

3.1 Analyze impact on population health

We are now in a position to analyze the total number of fatalities due to various types of severe weather events. We will do this by aggregating and summing the given data and then sorting the data in descending order by the number of fatalities.

aggFatalities <- aggregate(FATALITIES~EVTYPE,data=neededData,FUN=sum)
aggFatalities <- aggFatalities[order(-aggFatalities$FATALITIES),]
head(aggFatalities,n=10)
##             EVTYPE FATALITIES
## 758        TORNADO       5633
## 116 EXCESSIVE HEAT       1903
## 138    FLASH FLOOD        978
## 243           HEAT        937
## 418      LIGHTNING        816
## 779      TSTM WIND        504
## 154          FLOOD        470
## 524    RIP CURRENT        368
## 320      HIGH WIND        248
## 19       AVALANCHE        224

The top ten causes of fatalities are displayed above. Tornadoes cause almost three times as many fatalities as the next leading cause, excessive heat. Some event types on this list could be consolidated (e.g., excessive heat and heat). However, the lead tornadoes have on this list is large enough that they would remain on top even if event types were consolidated.
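The claim that tornadoes stay on top after consolidation can be checked against the ten totals displayed above. The sketch below re-enters those totals by hand and applies a hypothetical grouping (the GROUP labels are assumptions made for illustration); it only covers the ten listed event types, not rarer variants elsewhere in the dataset.

```r
# Fatality totals copied from the top-ten table above
top <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING",
             "TSTM WIND", "FLOOD", "RIP CURRENT", "HIGH WIND", "AVALANCHE"),
  FATALITIES = c(5633, 1903, 978, 937, 816, 504, 470, 368, 248, 224),
  stringsAsFactors = FALSE
)

# Hypothetical consolidation: fold related event types into one label
top$GROUP <- top$EVTYPE
top$GROUP[top$EVTYPE %in% c("EXCESSIVE HEAT", "HEAT")] <- "HEAT (ALL)"
top$GROUP[top$EVTYPE %in% c("FLASH FLOOD", "FLOOD")]   <- "FLOOD (ALL)"

# Re-aggregate by the consolidated label and sort in descending order
grouped <- aggregate(FATALITIES ~ GROUP, data = top, FUN = sum)
grouped <- grouped[order(-grouped$FATALITIES), ]
head(grouped, n = 3)
```

Under this grouping the combined heat total is 2840, still roughly half the tornado total.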

Now we will turn our attention to injuries, following the same procedure as we did above for fatalities.

aggInjuries <- aggregate(INJURIES~EVTYPE,data=neededData,FUN=sum)
aggInjuries <- aggInjuries[order(-aggInjuries$INJURIES),]
head(aggInjuries,n=10)
##                EVTYPE INJURIES
## 758           TORNADO    91346
## 779         TSTM WIND     6957
## 154             FLOOD     6789
## 116    EXCESSIVE HEAT     6525
## 418         LIGHTNING     5230
## 243              HEAT     2100
## 387         ICE STORM     1975
## 138       FLASH FLOOD     1777
## 685 THUNDERSTORM WIND     1488
## 212              HAIL     1361

The top ten causes of injuries are displayed above. Tornadoes cause approximately 13 times as many injuries as the next leading cause, TSTM WIND (thunderstorm wind). Once again, some event types on this list could be consolidated (e.g., TSTM WIND and THUNDERSTORM WIND). However, the lead tornadoes have on this list is large enough that they would remain on top even if event types were consolidated.
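One way to consolidate such labels is to normalize EVTYPE before aggregating. The following sketch is not part of the original analysis: normalizeEvtype is a hypothetical helper that handles only the TSTM abbreviation, and the data frame holds just three totals copied from the table above.

```r
# Injury totals copied from the table above (three rows only)
injuries <- data.frame(
  EVTYPE   = c("TORNADO", "TSTM WIND", "THUNDERSTORM WIND"),
  INJURIES = c(91346, 6957, 1488),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim whitespace, upper-case, and expand the
# leading "TSTM" abbreviation to "THUNDERSTORM"
normalizeEvtype <- function(x) {
  x <- toupper(trimws(x))
  sub("^TSTM", "THUNDERSTORM", x)
}

injuries$EVTYPE <- normalizeEvtype(injuries$EVTYPE)

# Aggregate on the normalized label and sort in descending order
merged <- aggregate(INJURIES ~ EVTYPE, data = injuries, FUN = sum)
merged <- merged[order(-merged$INJURIES), ]
merged
```

The two thunderstorm wind rows combine to 8445 injuries, which leaves tornadoes far ahead.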

We now merge and sum our data on fatalities and injuries to get a composite measure of how destructive severe weather events can be in terms of bodily harm. We then sort the data in descending order based on the total harm (fatalities + injuries) done.

aggTotalHarm <- merge(aggFatalities,aggInjuries,by="EVTYPE")
aggTotalHarm$HARMTOTAL <- aggTotalHarm$FATALITIES+aggTotalHarm$INJURIES
aggTotalHarm <- aggTotalHarm[order(-aggTotalHarm$HARMTOTAL),]
head(aggTotalHarm,n=10)
##                EVTYPE FATALITIES INJURIES HARMTOTAL
## 758           TORNADO       5633    91346     96979
## 116    EXCESSIVE HEAT       1903     6525      8428
## 779         TSTM WIND        504     6957      7461
## 154             FLOOD        470     6789      7259
## 418         LIGHTNING        816     5230      6046
## 243              HEAT        937     2100      3037
## 138       FLASH FLOOD        978     1777      2755
## 387         ICE STORM         89     1975      2064
## 685 THUNDERSTORM WIND        133     1488      1621
## 888      WINTER STORM        206     1321      1527

The top ten causes of bodily harm due to severe weather events are shown in the list above. Not surprisingly, we see that tornadoes have a commanding lead on this list, with excessive heat in second place.
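The separate aggregate and merge steps could also be written as a single aggregation over both columns. The sketch below illustrates that alternative on made-up toy data (the counts are not from the storm database), using the cbind() form of the aggregate formula interface.

```r
# Toy records, invented for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Sum both outcome columns per event type in one call
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Composite measure, then sort in descending order
harm$HARMTOTAL <- harm$FATALITIES + harm$INJURIES
harm <- harm[order(-harm$HARMTOTAL), ]
harm
```

This avoids the merge, whose default inner join would drop any event type that appears in only one of the two inputs.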

In the barplot below, we summarize these results by showing the number of fatalities, injuries, and total cases of bodily harm for the top five causes of bodily harm due to severe weather events. This barplot makes it clear that injuries are far more likely than fatalities in the event of severe weather.

aggTotalHarm <- aggTotalHarm[1:5,]
topHarm <- rbind(aggTotalHarm$FATALITIES,aggTotalHarm$INJURIES,aggTotalHarm$HARMTOTAL)
aggTotalHarm$EVTYPE
## [1] "TORNADO"        "EXCESSIVE HEAT" "TSTM WIND"      "FLOOD"         
## [5] "LIGHTNING"
colnames(topHarm) <- c("Tornado","Heat","Wind","Flood","Lightning")

# barplot() is part of base R graphics, so no additional package is needed
barplot(topHarm,beside=TRUE,las=1,
        legend.text=c("Fatalities","Injuries","Total"),
        col=c("red","yellow","blue"),
        main="Top 5 Causes of Bodily Harm",
        xlab="Weather Event",ylab="Cases")
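The grouped barplot relies on the layout of its input matrix: each row is one measure and each column is one event type. The sketch below rebuilds that matrix from the top-five totals shown above and takes the column labels from EVTYPE instead of typing them by hand; the names harmTop5 and harmMatrix are hypothetical.

```r
# Top-five totals copied from the table above
harmTop5 <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  FATALITIES = c(5633, 1903, 504, 470, 816),
  INJURIES   = c(91346, 6525, 6957, 6789, 5230),
  stringsAsFactors = FALSE
)
harmTop5$HARMTOTAL <- harmTop5$FATALITIES + harmTop5$INJURIES

# Transpose so that measures become rows and event types become columns
harmMatrix <- t(as.matrix(harmTop5[, c("FATALITIES", "INJURIES", "HARMTOTAL")]))
colnames(harmMatrix) <- harmTop5$EVTYPE

# beside = TRUE draws one group of bars per column (event type)
barplot(harmMatrix, beside = TRUE, las = 1, cex.names = 0.6,
        legend.text = rownames(harmMatrix),
        main = "Top 5 Causes of Bodily Harm",
        xlab = "Weather Event", ylab = "Cases")
```

Deriving the labels from the data keeps them in step with the sorted rows if the ranking ever changes.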

3.2 Analyze economic impact

We will now analyze the total amount of property damage (in USD) caused by each type of severe weather event by aggregating and summing the given data and then sorting the data in descending order by total property damage.

aggPropDmg <- aggregate(PROPDMGTOTAL~EVTYPE,data=neededData,FUN=sum)
aggPropDmg <- aggPropDmg[order(-aggPropDmg$PROPDMGTOTAL),]
head(aggPropDmg,n=10)
##                EVTYPE PROPDMGTOTAL
## 154             FLOOD 144657709807
## 372 HURRICANE/TYPHOON  69305840000
## 758           TORNADO  56947380677
## 599       STORM SURGE  43323536000
## 138       FLASH FLOOD  16822673979
## 212              HAIL  15735267513
## 363         HURRICANE  11868319010
## 772    TROPICAL STORM   7703890550
## 888      WINTER STORM   6688497251
## 320         HIGH WIND   5270046295

The top ten causes of property damage due to severe weather events are shown in the list above. Floods cause approximately twice as much property damage as the next leading cause, hurricane/typhoon. Once again, some event types on this list could be consolidated (e.g., hurricane/typhoon and hurricane). Even combined, however, those two categories total about $81.2 billion, well below the $144.7 billion attributed to floods, so floods would remain on top even if event types were consolidated.

Now we will turn our attention to crop damage, following the same procedure as we did above for property damage.

aggCropDmg <- aggregate(CROPDMGTOTAL~EVTYPE,data=neededData,FUN=sum)
aggCropDmg <- aggCropDmg[order(-aggCropDmg$CROPDMGTOTAL),]
head(aggCropDmg,n=10)
##                EVTYPE CROPDMGTOTAL
## 84            DROUGHT  13972566000
## 154             FLOOD   5661968450
## 529       RIVER FLOOD   5029459000
## 387         ICE STORM   5022113500
## 212              HAIL   3025954473
## 363         HURRICANE   2741910000
## 372 HURRICANE/TYPHOON   2607872800
## 138       FLASH FLOOD   1421317100
## 125      EXTREME COLD   1312973000
## 187      FROST/FREEZE   1094186000

The top ten causes of crop damage due to severe weather events are shown in the list above. Droughts cause approximately 2.5 times as much crop damage as the next leading cause, flood. Once again, some event types on this list could be consolidated (e.g., flood, river flood, and flash flood). If these event types were consolidated, the total amount of crop damage due to floods ($12,112,744,550) would be much closer to the amount of damage due to drought ($13,972,566,000). However, both numbers pale in comparison to the amount of property damage done by floods ($144,657,709,807), so crop damage appears far less significant than property damage.
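The consolidated flood figure quoted above is the sum of three rows in the crop damage table. A short sketch of that arithmetic, with the values copied from the table and hypothetical variable names:

```r
# Crop damage totals (USD) copied from the table above
floodRows <- c("FLOOD"       = 5661968450,
               "RIVER FLOOD" = 5029459000,
               "FLASH FLOOD" = 1421317100)
droughtCrop <- 13972566000

# Consolidated flood crop damage
floodCrop <- sum(floodRows)
format(floodCrop, big.mark = ",")

# Consolidated floods as a share of drought crop damage
round(floodCrop / droughtCrop, 2)
```

The sum is 12,112,744,550, or about 87% of the drought total.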

We now merge and sum our data on property and crop damage to get a composite measure of how destructive severe weather events can be in terms of economic impact. We then sort the data in descending order based on the total damage (property + crop) done.

aggTotalDmg <- merge(aggPropDmg,aggCropDmg,by="EVTYPE")
aggTotalDmg$DMGTOTAL <- aggTotalDmg$PROPDMGTOTAL+aggTotalDmg$CROPDMGTOTAL
aggTotalDmg <- aggTotalDmg[order(-aggTotalDmg$DMGTOTAL),]
head(aggTotalDmg,n=10)
##                EVTYPE PROPDMGTOTAL CROPDMGTOTAL     DMGTOTAL
## 154             FLOOD 144657709807   5661968450 150319678257
## 372 HURRICANE/TYPHOON  69305840000   2607872800  71913712800
## 758           TORNADO  56947380677    414953270  57362333947
## 599       STORM SURGE  43323536000         5000  43323541000
## 212              HAIL  15735267513   3025954473  18761221986
## 138       FLASH FLOOD  16822673979   1421317100  18243991079
## 84            DROUGHT   1046106000  13972566000  15018672000
## 363         HURRICANE  11868319010   2741910000  14610229010
## 529       RIVER FLOOD   5118945500   5029459000  10148404500
## 387         ICE STORM   3944927860   5022113500   8967041360

The top ten causes of economic damage due to severe weather events are shown in the list above. We see that floods lead this list, followed by hurricane/typhoon. Even though droughts caused the most crop damage, they appear 7th on this list since crop damage pales in comparison to property damage.

In the barplot below, we summarize these results by showing the amount of property damage, crop damage, and total damage for the top seven causes of economic damage due to severe weather events.

aggTotalDmg <- aggTotalDmg[1:7,]
topDmg <- rbind(aggTotalDmg$PROPDMGTOTAL,aggTotalDmg$CROPDMGTOTAL,aggTotalDmg$DMGTOTAL)
aggTotalDmg$EVTYPE
## [1] "FLOOD"             "HURRICANE/TYPHOON" "TORNADO"          
## [4] "STORM SURGE"       "HAIL"              "FLASH FLOOD"      
## [7] "DROUGHT"
colnames(topDmg) <- c("Flood","Hurricane","Tornado","Storm Surge","Hail","Flash Flood","Drought")

barplot(topDmg/10^9,beside=TRUE,las=1,
        legend.text=c("Property Damage","Crop Damage","Total Damage"),
        col=c("red","yellow","blue"),
        main="Top 7 Causes of Economic Damage",
        xlab="Weather Event",ylab="Damage Amount (in billions of USD)",
        cex.names=.7)

3.3 Summary

In summary, tornadoes are by far the leading cause of fatalities and injuries due to severe weather, with excessive heat, thunderstorm wind, floods, and lightning all fairly close in second through fifth place. Floods have the biggest economic impact in terms of total property and crop damage, with hurricanes coming in second place. While droughts surpass floods in terms of crop damage, they are in seventh place when property and crop damage are combined since droughts do not cause a significant amount of property damage.