Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
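Tornadoes have the greatest impact on population health, causing the most fatalities (5,633) and the most injuries (91,346) of any event type. Floods have the greatest total economic impact, at roughly 150 billion USD in combined property and crop damage.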
The data were loaded into R with the read.csv function, which reads the bz2-compressed file directly.
myData=read.csv("StormData.csv.bz2", header = TRUE, sep = ",", quote = "\"")
Some values of the variables “CROPDMGEXP” and “PROPDMGEXP” are incorrectly formatted. To exclude them automatically, I create new variables “CROPmultiplier” and “PROPmultiplier” and initialise both to 0. As a result, only the valid alphabetical multipliers that I explicitly identify have any impact on the final total.
myData$CROPmultiplier=0
myData$CROPmultiplier[myData$CROPDMGEXP=="m"]=1000000
myData$CROPmultiplier[myData$CROPDMGEXP=="M"]=1000000
myData$CROPmultiplier[myData$CROPDMGEXP=="k"]=1000
myData$CROPmultiplier[myData$CROPDMGEXP=="K"]=1000
myData$CROPmultiplier[myData$CROPDMGEXP=="B"]=1000000000
myData$PROPmultiplier=0
myData$PROPmultiplier[myData$PROPDMGEXP=="m"]=1000000
myData$PROPmultiplier[myData$PROPDMGEXP=="M"]=1000000
myData$PROPmultiplier[myData$PROPDMGEXP=="k"]=1000
myData$PROPmultiplier[myData$PROPDMGEXP=="K"]=1000
myData$PROPmultiplier[myData$PROPDMGEXP=="B"]=1000000000
# "H"/"h" is interpreted here as hundreds, so both cases map to 100
myData$PROPmultiplier[myData$PROPDMGEXP=="H"]=100
myData$PROPmultiplier[myData$PROPDMGEXP=="h"]=100
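The same default-to-zero approach can also be written as a lookup table. The following is a minimal sketch on toy data, not the code used to produce the results in this report. The names `multipliers`, `toMultiplier`, and `toy` are hypothetical, and for brevity the table covers only the K, M, and B codes.

```r
# Hypothetical lookup table: exponent code -> multiplier (K, M, B only).
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: returns the multiplier for each code, or 0 when the
# code is not in the lookup table (mirroring the default of 0 used above).
toMultiplier <- function(codes) {
  m <- multipliers[toupper(as.character(codes))]
  m[is.na(m)] <- 0
  unname(m)
}

# Toy data, not taken from the NOAA file.
toy <- data.frame(PROPDMG = c(25, 1.5, 3, 10),
                  PROPDMGEXP = c("K", "m", "B", "?"))
toy$PROPmultiplier <- toMultiplier(toy$PROPDMGEXP)

# Damage in USD per row; the row with the unrecognised code "?" contributes 0.
toy$PROPDMG * toy$PROPmultiplier
```

Because unrecognised codes fall through to 0, adding or correcting a code means changing a single entry in the table rather than adding another assignment line.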
The total fatalities and injuries were calculated, and the type of event with the highest number for each was identified.
sumFatalities <- aggregate(FATALITIES ~ EVTYPE, data = myData, FUN = sum)
sumInjuries <- aggregate(INJURIES ~ EVTYPE, data = myData, FUN = sum)
head(sort(sumInjuries$INJURIES,decreasing=TRUE))
## [1] 91346 6957 6789 6525 5230 2100
sumInjuries[which.max(sumInjuries$INJURIES),]
## EVTYPE INJURIES
## 834 TORNADO 91346
head(sort(sumFatalities$FATALITIES,decreasing=TRUE))
## [1] 5633 1903 978 937 816 504
sumFatalities[which.max(sumFatalities$FATALITIES),]
## EVTYPE FATALITIES
## 834 TORNADO 5633
In terms of both fatalities and injuries, tornadoes are the event type most harmful to population health.
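The ranking of event types can also be read off in one step. The sketch below uses made-up counts rather than the NOAA data: it sorts an aggregated table with `order()` and keeps the first rows with `head()`. The names `toyEvents`, `toySum`, and `top2` are hypothetical.

```r
# Made-up event records, not taken from the NOAA file.
toyEvents <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  INJURIES = c(10, 2, 5, 7, 1, 0)
)

# Total injuries per event type.
toySum <- aggregate(INJURIES ~ EVTYPE, data = toyEvents, FUN = sum)

# Sort from largest to smallest total and keep the two highest rows.
top2 <- head(toySum[order(toySum$INJURIES, decreasing = TRUE), ], 2)
top2
```

Keeping the event type and its total together in one sorted table avoids having to look up a total by its printed value.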
myData$DAMAGETOT=(myData$PROPmultiplier*myData$PROPDMG)+(myData$CROPDMG*myData$CROPmultiplier)
To examine which events have the greatest economic consequences, I plot the new ‘DAMAGETOT’ variable, which combines property and crop damage. Because there are many different event types, I include only the 10 with the highest damage totals. As the plot shows, floods result in the highest total damage.
sumDamage <- aggregate(DAMAGETOT ~ EVTYPE, data = myData, FUN = sum)
head(sort(sumDamage$DAMAGETOT,decreasing=TRUE))
## [1] 150319678250 71913712800 57352113590 43323541000 18758226170
## [6] 17562128610
sumDamage[which.max(sumDamage$DAMAGETOT),]
## EVTYPE DAMAGETOT
## 170 FLOOD 150319678250
damage2=sumDamage[order(sumDamage$DAMAGETOT, decreasing = TRUE),]
damage2=damage2[1:10,]
library(ggplot2)
# Bars are ordered from highest to lowest total damage (property plus crop)
ggplot(damage2, aes(x = reorder(EVTYPE, -DAMAGETOT), y = DAMAGETOT)) +
geom_col(fill = "#FF9999", colour = "black") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous("Total Damage (USD)") +
xlab("Weather Type") +
ggtitle("Figure 1: Total Damage by Severe Weather Events")