Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Tornadoes were found to have the greatest impact on population health, in terms of both fatalities and injuries. Floods had the greatest total economic impact, measured as combined property and crop damage.

Data Processing

The data were loaded into R using the read.csv command.

myData=read.csv("StormData.csv.bz2", header = TRUE, sep = ",", quote = "\"")

Some values of the variables “CROPDMGEXP” and “PROPDMGEXP” are incorrectly formatted. To exclude these automatically, I create new variables “CROPmultiplier” and “PROPmultiplier”, initialised to 0, and then overwrite them only for the alphabetical multipliers that I specifically identify. Rows with any other code therefore contribute nothing to the final total.

myData$CROPmultiplier=0
myData$CROPmultiplier[myData$CROPDMGEXP=="m"]=1000000
myData$CROPmultiplier[myData$CROPDMGEXP=="M"]=1000000
myData$CROPmultiplier[myData$CROPDMGEXP=="k"]=1000
myData$CROPmultiplier[myData$CROPDMGEXP=="K"]=1000
myData$CROPmultiplier[myData$CROPDMGEXP=="B"]=1000000000

myData$PROPmultiplier=0
myData$PROPmultiplier[myData$PROPDMGEXP=="m"]=1000000
myData$PROPmultiplier[myData$PROPDMGEXP=="M"]=1000000
myData$PROPmultiplier[myData$PROPDMGEXP=="k"]=1000
myData$PROPmultiplier[myData$PROPDMGEXP=="K"]=1000
myData$PROPmultiplier[myData$PROPDMGEXP=="B"]=1000000000
# "H"/"h" denotes hundreds
myData$PROPmultiplier[myData$PROPDMGEXP=="H"]=100
myData$PROPmultiplier[myData$PROPDMGEXP=="h"]=100
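
The same mapping can be written more compactly as a lookup table. The following is only a sketch, not the code used to produce the results in this report: the helper name `exp_to_multiplier` is hypothetical, and it assumes that “H” denotes hundreds, “K” thousands, “M” millions and “B” billions, with any other code mapped to 0.

```r
# Hypothetical helper: convert exponent codes (e.g. "K", "m", "B") to numeric
# multipliers. Codes that are not in the lookup table map to 0, so the
# corresponding rows contribute nothing to a damage total.
exp_to_multiplier <- function(exp_code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)  # assumed meanings
  mult <- lookup[toupper(as.character(exp_code))]  # case-insensitive match
  mult[is.na(mult)] <- 0                           # unrecognised codes
  unname(mult)
}

# Example with a mix of valid and invalid codes
exp_to_multiplier(c("k", "M", "B", "h", "?", "", "5"))
```

With such a helper, each multiplier column could be filled in a single call, for example `exp_to_multiplier(myData$PROPDMGEXP)`.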

Results

Question 1: Which types of events are most harmful to population health?

The total fatalities and injuries were calculated, and the type of event with the highest number for each was identified.

sumFatalities <- aggregate(FATALITIES ~ EVTYPE, data = myData, FUN = sum)
sumInjuries <- aggregate(INJURIES ~ EVTYPE, data = myData, FUN = sum)
head(sort(sumInjuries$INJURIES,decreasing=TRUE))
## [1] 91346  6957  6789  6525  5230  2100
sumInjuries[sumInjuries$INJURIES==91346,]
##      EVTYPE INJURIES
## 834 TORNADO    91346
head(sort(sumFatalities$FATALITIES,decreasing=TRUE))
## [1] 5633 1903  978  937  816  504
sumFatalities[sumFatalities$FATALITIES==5633,]
##      EVTYPE FATALITIES
## 834 TORNADO       5633

In terms of both fatalities and injuries, tornadoes are the most harmful to population health.
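
The lookup above matches each event type against a total copied from the previous output. A less brittle alternative is to sort the aggregated table directly. The sketch below illustrates this on a small invented data frame (`toy` and the helper `top_events` are hypothetical names, and the numbers are made up for illustration, not taken from the storm database).

```r
# Invented toy data with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 0)
)

# Hypothetical helper: total one column per event type and return the n
# event types with the largest totals, in decreasing order.
top_events <- function(data, column, n = 3) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  head(totals[order(totals[[column]], decreasing = TRUE), ], n)
}

top_fatal <- top_events(toy, "FATALITIES", n = 2)
top_fatal
```

Applied to the real data, `top_events(myData, "INJURIES", n = 1)` would be expected to return the same event type as the value-matching approach, without hard-coding the total.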

Question 2: Which types of events have the greatest economic consequences?

myData$DAMAGETOT=(myData$PROPmultiplier*myData$PROPDMG)+(myData$CROPDMG*myData$CROPmultiplier)
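
As a worked illustration of this formula, consider a single invented record (the values are hypothetical, not taken from the database) with a property damage of 25 coded “K” and a crop damage of 1.5 coded “M”:

```r
# Invented example record: 25 "K" of property damage, 1.5 "M" of crop damage
prop_dmg  <- 25
prop_mult <- 1000      # multiplier for "K"
crop_dmg  <- 1.5
crop_mult <- 1000000   # multiplier for "M"

# Same formula as DAMAGETOT: each damage figure is scaled by its multiplier
damage_total <- (prop_mult * prop_dmg) + (crop_dmg * crop_mult)
damage_total  # 25,000 + 1,500,000 = 1,525,000 USD
```

A record whose exponent code is not recognised has a multiplier of 0, so that component adds nothing to the total.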

To examine which events have the greatest economic consequences, I plot the new ‘DAMAGETOT’ variable, which combines property and crop damage. As there are many different event types, I include only the 10 with the highest damage totals. As shown in the plot, floods result in the highest total damage.

sumDamage <- aggregate(DAMAGETOT ~ EVTYPE, data = myData, FUN = sum) 
head(sort(sumDamage$DAMAGETOT,decreasing=TRUE))
## [1] 150319678250  71913712800  57352113590  43323541000  18758226170
## [6]  17562128610
sumDamage[sumDamage$DAMAGETOT==150319678250,]
##     EVTYPE    DAMAGETOT
## 170  FLOOD 150319678250
damage2=sumDamage[order(sumDamage$DAMAGETOT, decreasing = TRUE),]
damage2=damage2[1:10,]

library(ggplot2)
# Bar chart of total (property + crop) damage for the ten costliest event types
ggplot(damage2, aes(x = EVTYPE, y = DAMAGETOT)) +
  geom_col(fill = "#FF9999", colour = "black") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous("Total Damage (USD)") +
  xlab("Weather Type") +
  ggtitle("Figure 1: Total Damage by Severe Weather Events")
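
By default the bars are drawn in alphabetical order of event type. One possible refinement, sketched here on invented values (the data frame `toy` and its figures are hypothetical), is to reorder the factor levels by damage so that the largest bar comes first. The plotting step is guarded so that the sketch also runs where ggplot2 is not installed.

```r
# Invented totals, deliberately not in damage order
toy <- data.frame(
  EVTYPE    = c("HAIL", "FLOOD", "TORNADO"),
  DAMAGETOT = c(18e9, 150e9, 57e9)
)

# reorder() sorts the factor levels by increasing value of its second
# argument, so negating the damage puts the costliest event first.
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$DAMAGETOT)
levels(toy$EVTYPE)

# Optional plot: the x axis follows the factor level order set above
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = EVTYPE, y = DAMAGETOT)) +
    ggplot2::geom_col(fill = "#FF9999", colour = "black")
  print(p)
}
```

The same `reorder()` call could be applied to `damage2$EVTYPE` before plotting Figure 1.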