When severe weather strikes, fatalities, injuries, property damage, and crop damage can occur. Understanding how these outcomes vary across types of severe weather events helps communities prepare. In this analysis, we will work with the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. After cleaning the data so that values can easily be compared, we will aggregate fatalities, injuries, property damage (in USD), and crop damage (in USD) by severe weather event type. Then, we will sort these totals to see which types of weather events cause the most bodily harm (fatalities and injuries) and the most economic damage (property and crop damage). We will see that tornadoes have the most devastating effects on human life, while floods have the greatest economic impact.
The data for this assignment come as a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. We will begin by downloading the file and reading it; read.csv can read the compressed file directly.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","stormData.csv.bz2")
stormData <- read.csv("stormData.csv.bz2")
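To avoid downloading the file again every time the analysis is re-run, the download step could be guarded by a check for an existing local copy. The following is only a minimal sketch; the helper name `fetch_if_missing` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis):
# download `url` to `dest` only when `dest` does not already exist.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # Request binary mode explicitly because the file is compressed
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Possible usage with the URL and file name used above:
# fetch_if_missing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "stormData.csv.bz2"
# )
# stormData <- read.csv("stormData.csv.bz2")
```

The usage lines are commented out so that the sketch itself does not trigger a download.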
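For the purposes of this analysis, we are only interested in fatalities, injuries, property damage, and crop damage due to various types of severe weather events. These are recorded in the following columns within the dataset: EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage amount and its magnitude code), and CROPDMG and CROPDMGEXP (crop damage amount and its magnitude code).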
Throughout the rest of the analysis, we will restrict our attention to these particular columns within the dataset. We will also make sure that all of the columns containing characters have the same case by changing them all to upper case.
neededData <- subset(stormData,select=c(EVTYPE, FATALITIES, INJURIES, PROPDMG,
PROPDMGEXP, CROPDMG, CROPDMGEXP))
neededData$EVTYPE <- toupper(neededData$EVTYPE)
neededData$PROPDMGEXP <- toupper(neededData$PROPDMGEXP)
neededData$CROPDMGEXP <- toupper(neededData$CROPDMGEXP)
Next, we will make the information regarding property and crop damage more usable. First, we will modify the PROPDMGEXP and CROPDMGEXP entries. We will convert any H’s (for hundreds) in these columns to 2’s since 10^2 = 100, any K’s (for thousands) to 3’s since 10^3 = 1,000, any M’s (for millions) to 6’s since 10^6 = 1,000,000, and any B’s (for billions) to 9’s since 10^9 = 1,000,000,000. Entries that are already digits are left unchanged and treated as exponents. By looking at the other entries in these columns, we see that there are also many missing and/or possibly erroneous entries like “”, “+”, “-”, and “?”. We have chosen to set all of these entries to 0; this should have minimal impact on our analysis since most of these entries correspond to $0 in property and/or crop damage. After converting these two columns of exponents into integers, we use them, together with the entries in PROPDMG and CROPDMG, to compute the dollar amounts of property and crop damage due to each storm. We store these values in two new variables, PROPDMGTOTAL and CROPDMGTOTAL. These values are easier to compare to one another than entries formatted as 2.5M or 1.5B, for example.
unique(neededData$PROPDMGEXP)
## [1] "K" "M" "" "B" "+" "0" "5" "6" "?" "4" "2" "3" "H" "7" "-" "1" "8"
unique(neededData$CROPDMGEXP)
## [1] "" "M" "K" "B" "?" "0" "2"
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "H"] <- 2
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "K"] <- 3
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "M"] <- 6
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "B"] <- 9
neededData$PROPDMGEXP[neededData$PROPDMGEXP == ""] <- 0
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "+"] <- 0
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "-"] <- 0
neededData$PROPDMGEXP[neededData$PROPDMGEXP == "?"] <- 0
neededData$PROPDMGEXP <- as.integer(neededData$PROPDMGEXP)
neededData$PROPDMGTOTAL <- neededData$PROPDMG*10^(neededData$PROPDMGEXP)
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "H"] <- 2
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "K"] <- 3
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "M"] <- 6
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "B"] <- 9
neededData$CROPDMGEXP[neededData$CROPDMGEXP == ""] <- 0
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "+"] <- 0
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "-"] <- 0
neededData$CROPDMGEXP[neededData$CROPDMGEXP == "?"] <- 0
neededData$CROPDMGEXP <- as.integer(neededData$CROPDMGEXP)
neededData$CROPDMGTOTAL <- neededData$CROPDMG*10^(neededData$CROPDMGEXP)
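The two nearly identical blocks of replacements above could also be expressed once as a lookup table. The sketch below is one possible way to do that; the helper name `exp_to_power` is hypothetical. Unlike the code above, which would produce NA for any unexpected code, it maps every unrecognised code to 0, and that difference is an assumption of this sketch.

```r
# Hypothetical helper: map exponent codes to powers of ten via a lookup table.
exp_to_power <- function(codes) {
  codes <- toupper(as.character(codes))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  # Start from 0, which covers "", "+", "-", "?" and
  # (an assumption of this sketch) any other unrecognised code
  power <- numeric(length(codes))
  # Letter codes are translated through the lookup table
  is_letter <- codes %in% names(lookup)
  power[is_letter] <- lookup[codes[is_letter]]
  # Single digits are kept as exponents, as in the code above
  is_digit <- grepl("^[0-9]$", codes)
  power[is_digit] <- as.numeric(codes[is_digit])
  power
}

# Small made-up example: 2.5M, 1.5B, 10 with no code, and 3K (lower case)
dmg  <- c(2.5, 1.5, 10, 3)
exps <- c("M", "B", "", "k")
dmg * 10^exp_to_power(exps)
```

With a helper like this, the same function could be applied to both PROPDMGEXP and CROPDMGEXP instead of repeating the replacements for each column.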
We are now in a position to analyze the total number of fatalities due to various types of severe weather events. We will do this by aggregating and summing the given data and then sorting the data in descending order by the number of fatalities.
aggFatalities <- aggregate(FATALITIES~EVTYPE,data=neededData,FUN=sum)
aggFatalities <- aggFatalities[order(-aggFatalities$FATALITIES),]
head(aggFatalities,n=10)
## EVTYPE FATALITIES
## 758 TORNADO 5633
## 116 EXCESSIVE HEAT 1903
## 138 FLASH FLOOD 978
## 243 HEAT 937
## 418 LIGHTNING 816
## 779 TSTM WIND 504
## 154 FLOOD 470
## 524 RIP CURRENT 368
## 320 HIGH WIND 248
## 19 AVALANCHE 224
The top ten causes of fatalities are displayed above. We can see that tornadoes cause almost three times as many fatalities as the next leading cause, excessive heat. Looking at this list, we do see that some event types could be consolidated (e.g., excessive heat and heat). However, because of the large lead tornadoes have on this list, they would remain on top even if event types were consolidated.
Now we will turn our attention to injuries, following the same procedure as we did above for fatalities.
aggInjuries <- aggregate(INJURIES~EVTYPE,data=neededData,FUN=sum)
aggInjuries <- aggInjuries[order(-aggInjuries$INJURIES),]
head(aggInjuries,n=10)
## EVTYPE INJURIES
## 758 TORNADO 91346
## 779 TSTM WIND 6957
## 154 FLOOD 6789
## 116 EXCESSIVE HEAT 6525
## 418 LIGHTNING 5230
## 243 HEAT 2100
## 387 ICE STORM 1975
## 138 FLASH FLOOD 1777
## 685 THUNDERSTORM WIND 1488
## 212 HAIL 1361
The top ten causes of injuries are displayed above. We can see that tornadoes cause approximately 13 times as many injuries as the next leading cause, tstm wind. Looking at this list, we once again see that some event types could be consolidated (e.g., tstm wind and thunderstorm wind). However, because of the large lead tornadoes have on this list, they would remain on top even if event types were consolidated.
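The consolidation mentioned above is not carried out in this analysis, but it can be sketched on a few rows copied from the injuries table. The mapping `consolidate` is a hypothetical choice of which labels to merge; a real consolidation would need a more careful review of all event types.

```r
# Injury totals copied from the top-ten table above
injuries <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "EXCESSIVE HEAT", "HEAT", "THUNDERSTORM WIND"),
  INJURIES = c(91346, 6957, 6525, 2100, 1488),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from variant labels to one consolidated label
consolidate <- c("TSTM WIND" = "THUNDERSTORM WIND", "HEAT" = "EXCESSIVE HEAT")

# Relabel only the rows whose event type appears in the mapping
variant <- injuries$EVTYPE %in% names(consolidate)
injuries$EVTYPE[variant] <- consolidate[injuries$EVTYPE[variant]]

# Re-aggregate and sort, as in the analysis above
merged <- aggregate(INJURIES ~ EVTYPE, data = injuries, FUN = sum)
merged <- merged[order(-merged$INJURIES), ]
merged
```

Even after merging these labels, the consolidated heat and thunderstorm wind totals (8,625 and 8,445 injuries) remain far below the tornado total.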
We now merge and sum our data on fatalities and injuries to get a composite measure of how destructive severe weather events can be in terms of bodily harm. We then sort the data in descending order based on the total harm (fatalities + injuries) done.
aggTotalHarm <- merge(aggFatalities,aggInjuries,by="EVTYPE")
aggTotalHarm$HARMTOTAL <- aggTotalHarm$FATALITIES+aggTotalHarm$INJURIES
aggTotalHarm <- aggTotalHarm[order(-aggTotalHarm$HARMTOTAL),]
head(aggTotalHarm,n=10)
## EVTYPE FATALITIES INJURIES HARMTOTAL
## 758 TORNADO 5633 91346 96979
## 116 EXCESSIVE HEAT 1903 6525 8428
## 779 TSTM WIND 504 6957 7461
## 154 FLOOD 470 6789 7259
## 418 LIGHTNING 816 5230 6046
## 243 HEAT 937 2100 3037
## 138 FLASH FLOOD 978 1777 2755
## 387 ICE STORM 89 1975 2064
## 685 THUNDERSTORM WIND 133 1488 1621
## 888 WINTER STORM 206 1321 1527
The top ten causes of bodily harm due to severe weather events are shown in the list above. Not surprisingly, we see that tornadoes have a commanding lead on this list, with excessive heat in second place.
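As an alternative to aggregating fatalities and injuries separately and then merging, both columns can be summed in a single call by placing `cbind()` on the left-hand side of the formula. The sketch below uses a small made-up data frame, `toy`, standing in for neededData. One caveat: the formula interface drops rows with a missing value in either column by default, so results could differ from the two-step approach if the data contained NAs.

```r
# Made-up stand-in for neededData (values are for illustration only)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES = c(10, 5, 20, 1, 2),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side sums both columns per event type at once
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
harm$HARMTOTAL <- harm$FATALITIES + harm$INJURIES
harm <- harm[order(-harm$HARMTOTAL), ]
harm
```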
In the barplot below, we summarize these results by showing the number of fatalities, injuries, and total cases of bodily harm for the top five causes of bodily harm due to severe weather events. This barplot makes it clear that injuries are far more likely than fatalities in the event of severe weather.
aggTotalHarm <- aggTotalHarm[1:5,]
topHarm <- rbind(aggTotalHarm$FATALITIES,aggTotalHarm$INJURIES,aggTotalHarm$HARMTOTAL)
aggTotalHarm$EVTYPE
## [1] "TORNADO" "EXCESSIVE HEAT" "TSTM WIND" "FLOOD"
## [5] "LIGHTNING"
colnames(topHarm) <- c("Tornado","Heat","Wind","Flood","Lightning")
# barplot() comes from base R graphics, so no additional package is required
barplot(topHarm, beside=TRUE, las=1,
        legend.text=c("Fatalities","Injuries","Total"),
        col=c("red","yellow","blue"),
        main="Top 5 Causes of Bodily Harm", xlab="Weather Event", ylab="Cases")
We will now analyze the total amount of property damage (in USD) caused by each type of severe weather event by aggregating and summing the given data and then sorting the data in descending order by total property damage.
aggPropDmg<-aggregate(PROPDMGTOTAL~ EVTYPE,data = neededData,FUN=sum)
aggPropDmg <- aggPropDmg[order(-aggPropDmg$PROPDMGTOTAL),]
head(aggPropDmg,n=10)
## EVTYPE PROPDMGTOTAL
## 154 FLOOD 144657709807
## 372 HURRICANE/TYPHOON 69305840000
## 758 TORNADO 56947380677
## 599 STORM SURGE 43323536000
## 138 FLASH FLOOD 16822673979
## 212 HAIL 15735267513
## 363 HURRICANE 11868319010
## 772 TROPICAL STORM 7703890550
## 888 WINTER STORM 6688497251
## 320 HIGH WIND 5270046295
The top ten causes of property damage due to severe weather events are shown in the list above. We can see that floods cause approximately twice as much property damage as the next leading cause, hurricane/typhoon. Looking at this list, we once again see that some event types could be consolidated (e.g., hurricane/typhoon and hurricane). However, because of the large lead floods have on this list, they would remain on top even if event types were consolidated.
Now we will turn our attention to crop damage, following the same procedure as we did above for property damage.
aggCropDmg<-aggregate(CROPDMGTOTAL~ EVTYPE,data = neededData,FUN=sum)
aggCropDmg <- aggCropDmg[order(-aggCropDmg$CROPDMGTOTAL),]
head(aggCropDmg,n=10)
## EVTYPE CROPDMGTOTAL
## 84 DROUGHT 13972566000
## 154 FLOOD 5661968450
## 529 RIVER FLOOD 5029459000
## 387 ICE STORM 5022113500
## 212 HAIL 3025954473
## 363 HURRICANE 2741910000
## 372 HURRICANE/TYPHOON 2607872800
## 138 FLASH FLOOD 1421317100
## 125 EXTREME COLD 1312973000
## 187 FROST/FREEZE 1094186000
The top ten causes of crop damage due to severe weather events are shown in the list above. We can see that droughts cause approximately 2.5 times as much crop damage as the next leading cause, flood. Looking at this list, we once again see that some event types could be consolidated (e.g., flood, river flood, and flash flood). If these event types were consolidated, the total amount of crop damage due to floods ($12,112,744,550) would be much closer to the amount of damage due to drought ($13,972,566,000). However, both numbers are small compared with the amount of property damage done by floods ($144,657,709,807), so crop damage seems to be far less significant than property damage.
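The consolidated flood figure quoted above can be checked directly from the three flood-related rows of the crop damage table. This is just a quick arithmetic check using values copied from the table; the variable names are chosen here for illustration.

```r
# Crop-damage totals (USD) copied from the table above
flood_crop <- c("FLOOD" = 5661968450,
                "RIVER FLOOD" = 5029459000,
                "FLASH FLOOD" = 1421317100)
drought_crop <- 13972566000

# Consolidated flood total and its size relative to drought
flood_total <- sum(flood_crop)
format(flood_total, big.mark = ",", scientific = FALSE)
flood_total / drought_crop
```

The consolidated flood total comes to about 87% of the drought total, so drought would still rank first for crop damage under this particular consolidation.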
We now merge and sum our data on property and crop damage to get a composite measure of how destructive severe weather events can be in terms of economic impact. We then sort the data in descending order based on the total damage (property + crop) done.
aggTotalDmg <- merge(aggPropDmg,aggCropDmg,by="EVTYPE")
aggTotalDmg$DMGTOTAL <- aggTotalDmg$PROPDMGTOTAL+aggTotalDmg$CROPDMGTOTAL
aggTotalDmg <- aggTotalDmg[order(-aggTotalDmg$DMGTOTAL),]
head(aggTotalDmg,n=10)
## EVTYPE PROPDMGTOTAL CROPDMGTOTAL DMGTOTAL
## 154 FLOOD 144657709807 5661968450 150319678257
## 372 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 758 TORNADO 56947380677 414953270 57362333947
## 599 STORM SURGE 43323536000 5000 43323541000
## 212 HAIL 15735267513 3025954473 18761221986
## 138 FLASH FLOOD 16822673979 1421317100 18243991079
## 84 DROUGHT 1046106000 13972566000 15018672000
## 363 HURRICANE 11868319010 2741910000 14610229010
## 529 RIVER FLOOD 5118945500 5029459000 10148404500
## 387 ICE STORM 3944927860 5022113500 8967041360
The top ten causes of economic damage due to severe weather events are shown in the list above. We see that floods lead this list, followed by hurricane/typhoon. Even though droughts caused the most crop damage, they appear 7th on this list since crop damage pales in comparison to property damage.
In the barplot below, we summarize these results by showing the amount of property damage, crop damage, and total damage for the top seven causes of economic damage due to severe weather events.
aggTotalDmg <- aggTotalDmg[1:7,]
topDmg <- rbind(aggTotalDmg$PROPDMGTOTAL,aggTotalDmg$CROPDMGTOTAL,aggTotalDmg$DMGTOTAL)
aggTotalDmg$EVTYPE
## [1] "FLOOD" "HURRICANE/TYPHOON" "TORNADO"
## [4] "STORM SURGE" "HAIL" "FLASH FLOOD"
## [7] "DROUGHT"
colnames(topDmg) <- c("Flood","Hurricane","Tornado","Storm Surge","Hail","Flash Flood","Drought")
barplot(topDmg/10^9,beside=TRUE,las=1,legend.text=c("Property Damage","Crop Damage","Total Damage"),col=c("red","yellow","blue"),main="Top 7 Causes of Economic Damage",xlab="Weather Event",ylab="Damage Amount (in billions of USD)",cex.names=.7)
In summary, tornadoes are by far the leading cause of fatalities and injuries due to severe weather, with excessive heat, thunderstorm wind, floods, and lightning all fairly close in second through fifth place. Floods have the biggest economic impact in terms of total property and crop damage, with hurricanes coming in second place. While droughts surpass floods in terms of crop damage, they are in seventh place when property and crop damage are combined since droughts do not cause a significant amount of property damage.