knitr::opts_chunk$set(echo = TRUE)

Synopsis

The U.S. National Oceanic and Atmospheric Administration (NOAA) published storm data covering 1950 to 2011. The analysis below identifies and plots the most dangerous and destructive weather event types over the 61-year period studied. Three figures (one single-plot figure and two dual-plot figures) present the human impacts (total casualties, total fatalities and total injuries) and the economic impacts (total property damage and total crop damage, both in billions of USD). Tornadoes were found to be the most dangerous events to humans in terms of both deaths and injuries. Flooding caused the most property damage and drought caused the most crop damage. The raw data was loaded from the bz2-compressed file and then processed to allow for calculations and graphing. All graphs were made using the ggplot2, grid, and gridExtra packages.

Data Processing

The main processing steps are decompressing and reading the file, then calculating a value that quantifies the damage and human loss for each recorded weather event. The steps are documented below.

#bz2 file in working directory
stormCSV <- read.csv(bzfile("./repdata%2Fdata%2FStormData.csv.bz2"))


#Checking column titles and data
head(stormCSV)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
#Keeping only the columns needed for the analysis
stormData <- stormCSV[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

#Checking Exponent field for conversion to numeric exponent
unique(stormData$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(stormData$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
#Converting unique values to numeric exponent
stormData$PDXN[stormData$PROPDMGEXP == "-"] <- 0
stormData$PDXN[stormData$PROPDMGEXP == "?"] <- 0
stormData$PDXN[stormData$PROPDMGEXP == "+"] <- 0
stormData$PDXN[stormData$PROPDMGEXP == ""] <- 1
stormData$PDXN[stormData$PROPDMGEXP == "0"] <- 1
stormData$PDXN[stormData$PROPDMGEXP == "1"] <- 10
stormData$PDXN[stormData$PROPDMGEXP == "2"] <- 100
stormData$PDXN[stormData$PROPDMGEXP == "3"] <- 1000
stormData$PDXN[stormData$PROPDMGEXP == "4"] <- 10000
stormData$PDXN[stormData$PROPDMGEXP == "5"] <- 100000
stormData$PDXN[stormData$PROPDMGEXP == "6"] <- 1000000
stormData$PDXN[stormData$PROPDMGEXP == "7"] <- 10000000
stormData$PDXN[stormData$PROPDMGEXP == "8"] <- 100000000
stormData$PDXN[stormData$PROPDMGEXP == "B"] <- 1000000000
stormData$PDXN[stormData$PROPDMGEXP == "H"] <- 100
stormData$PDXN[stormData$PROPDMGEXP == "h"] <- 100
stormData$PDXN[stormData$PROPDMGEXP == "K"] <- 1000
stormData$PDXN[stormData$PROPDMGEXP == "M"] <- 1000000
stormData$PDXN[stormData$PROPDMGEXP == "m"] <- 1000000
stormData$CDXN[stormData$CROPDMGEXP ==  "?"] <- 0
stormData$CDXN[stormData$CROPDMGEXP ==  ""] <- 1
stormData$CDXN[stormData$CROPDMGEXP ==  "0"] <- 1
stormData$CDXN[stormData$CROPDMGEXP ==  "2"] <- 100
stormData$CDXN[stormData$CROPDMGEXP ==  "B"] <- 1000000000 
stormData$CDXN[stormData$CROPDMGEXP ==  "K"] <- 1000
stormData$CDXN[stormData$CROPDMGEXP ==  "k"] <- 1000
stormData$CDXN[stormData$CROPDMGEXP ==  "M"] <- 1000000
stormData$CDXN[stormData$CROPDMGEXP ==  "m"] <- 1000000

Data Processing Description and Justification

Reviewing the PROPDMGEXP and CROPDMGEXP columns showed that the codes -, ? and + could not be interpreted as a magnitude, so they were set to a value of 0. A blank “” cell or a 0 was taken to mean that PROPDMG/CROPDMG already holds the literal dollar value, so it was given a multiplier of 1 (10^0). The remaining codes are powers of ten: the digits 1 through 8 stand for 10^1 through 10^8, h/H for hundred (10^2), k/K for thousand (10^3), m/M for million (10^6) and B for billion (10^9). Each code was translated to its numerical equivalent. The above processing yields two new columns:
PDXN = Property Damage Exponent Numeric
CDXN = Crop Damage Exponent Numeric

Multiplying the reported PROPDMG and CROPDMG amounts by their numeric exponents (PDXN and CDXN) yields the damage in dollars:
PDV = Property Damage Value
CDV = Crop Damage Value

#Multiplying the columns to create new PDV/CDV columns
stormData$PDV <- stormData$PROPDMG * stormData$PDXN
stormData$CDV <- stormData$CROPDMG * stormData$CDXN
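The exponent conversion described above can also be written as a single lookup table. The following is a minimal sketch, not the method used in this analysis: the function name `exp_multiplier` and the toy data frame are hypothetical, and the mapping simply mirrors the assignments made earlier.

```r
# Hypothetical helper (not part of the original analysis): translate an
# exponent code into its numeric multiplier with a named lookup vector.
exp_multiplier <- function(code) {
  lookup <- c(
    "-" = 0, "?" = 0, "+" = 0,
    "0" = 1, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
    "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
    "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
  )
  # Upper-casing lets h/H, k/K and m/M share one entry each.
  code <- toupper(as.character(code))
  out <- unname(lookup[code])
  # A blank code means the damage value is already in dollars.
  out[code == ""] <- 1
  out
}

# Toy rows for illustration; the first mirrors row 1 of the head() output
# above (PROPDMG = 25.0, PROPDMGEXP = "K").
toy <- data.frame(PROPDMG = c(25, 2.5, 10), PROPDMGEXP = c("K", "", "M"))
toy$PDV <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
```

Here 25.0 with code K becomes 25,000 dollars, 2.5 with a blank code stays 2.5, and 10 with code M becomes 10,000,000. A code that is missing from the lookup comes back as NA, which makes unmapped values easy to spot before the totals are computed.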

Data Analysis and Plot Pre-processing

Death and injury totals by weather event are computed below. The values for each occurrence are summed by event type, and the results are then ordered from most deaths/injuries to least. Casualties are the sum of deaths and injuries.

stormDeaths <- aggregate(FATALITIES ~ EVTYPE, stormData, FUN = sum)
stormInjuries <- aggregate(INJURIES ~ EVTYPE, stormData, FUN = sum)
stormDC <- stormDeaths
stormIC <- stormInjuries
colnames(stormDC) <- c("EVTYPE", "CASUALTIES")
colnames(stormIC) <- c("EVTYPE", "CASUALTIES")
stormCasualties <- rbind(stormDC, stormIC)
stormCasualtyTotal <- aggregate(CASUALTIES ~ EVTYPE, stormCasualties, FUN = sum)


stormDeathsOrdered <- stormDeaths[order(stormDeaths$FATALITIES, decreasing = TRUE),]
stormInjuriesOrdered <- stormInjuries[order(stormInjuries$INJURIES, decreasing = TRUE),]
stormCasualtiesOrdered <- stormCasualtyTotal[order(stormCasualtyTotal$CASUALTIES, decreasing = TRUE),]
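The casualty total above is built by stacking the death and injury tables and summing again. An equivalent and shorter route, sketched here on made-up data, is to add a per-row CASUALTIES column first and aggregate all three measures in one call. The data frame `toy` and its values are illustrative assumptions, not rows from the NOAA file.

```r
# Toy data standing in for stormData (values are made up for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# Casualties per record = deaths + injuries.
toy$CASUALTIES <- toy$FATALITIES + toy$INJURIES

# Sum all three measures by event type in a single aggregate() call.
totals <- aggregate(cbind(FATALITIES, INJURIES, CASUALTIES) ~ EVTYPE,
                    data = toy, FUN = sum)

# Order from most casualties to least.
totals <- totals[order(totals$CASUALTIES, decreasing = TRUE), ]
```

With this toy input, TORNADO comes first with 20 casualties (5 deaths, 15 injuries), followed by HEAT with 5 and FLOOD with 1.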

Property damage and crop damage totals by weather event are computed below. The values for each occurrence are summed by event type, and the results are then ordered from most damage to least.

totalPropD <- aggregate(PDV ~ EVTYPE, stormData, FUN = sum )
totalCropD <- aggregate(CDV ~ EVTYPE, stormData, FUN = sum )

propDOrdered <- totalPropD[order(totalPropD$PDV, decreasing = TRUE),]
cropDOrdered <- totalCropD[order(totalCropD$CDV, decreasing = TRUE),]
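The same sum-then-order pattern is repeated for fatalities, injuries, property damage and crop damage. As a sketch only, it could be wrapped in one small function. The name `top_events` and the toy data below are hypothetical and do not appear in the original analysis.

```r
# Hypothetical helper: total one numeric column by event type and return
# the n largest totals, largest first.
top_events <- function(data, value_col, n = 10) {
  totals <- aggregate(data[[value_col]],
                      by = list(EVTYPE = data$EVTYPE), FUN = sum)
  # aggregate() names the summed column "x"; restore the original name.
  names(totals)[2] <- value_col
  totals <- totals[order(totals[[value_col]], decreasing = TRUE), ]
  head(totals, n)
}

# Toy data for illustration (dollar values are made up).
toy <- data.frame(
  EVTYPE = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  PDV    = c(5e9, 1e8, 2e9, 3e8)
)
top2 <- top_events(toy, "PDV", n = 2)
```

For the toy input, `top2` holds FLOOD (7e9) followed by HAIL (3e8). Unlike the formula interface used above, this form of `aggregate()` does not drop rows with missing values, so any NA in the value column would propagate into that event's total.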

Results

Impact to population health

Figure 1. Most Human Casualties per Severe Weather Event

library(ggplot2)

                                        
casualtyPlot <- ggplot(data = head(stormCasualtiesOrdered, 10),
                       aes(x = reorder(EVTYPE, -CASUALTIES), y = CASUALTIES)) +
  geom_bar(stat = "identity") +
  labs(x = "Severe Weather Event", y = "Total Human Casualties",
       title = "Total Casualties Caused by Severe Weather in US") +
  theme(axis.text.x = element_text(face = "bold", size = 7, angle = 90))

print(casualtyPlot)

As shown in Figure 1, tornadoes account for a disproportionate number of human casualties in the United States compared with any other severe weather type. Because casualties are the sum of deaths and injuries, Figure 2 separates the two to show which events cause more deaths and which cause more injuries.
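One way to put a number on "disproportionate" is to express each event type's total as a share of all casualties and to compare the largest total with the runner-up. The sketch below uses made-up totals purely to show the arithmetic. The figures are not taken from the NOAA data.

```r
# Made-up casualty totals for illustration only (not values from the data).
toy_totals <- c(TORNADO = 80, HEAT = 15, FLOOD = 5)

# Share of all casualties attributable to each event type.
casualty_share <- toy_totals / sum(toy_totals)

# Ratio of the largest total to the second largest.
ranked <- sort(toy_totals, decreasing = TRUE)
lead_ratio <- unname(ranked[1] / ranked[2])
```

In this toy case the leading event holds 80% of all casualties and is about 5.3 times the runner-up. Applying the same two calculations to the CASUALTIES column of stormCasualtiesOrdered would give the real figures.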

Figure 2. Most Injuries and Most Deaths per Severe Weather Event

library(ggplot2)
library(grid)
library(gridExtra)
                                        
injuryPlot <- ggplot(data = head(stormInjuriesOrdered, 10),
                     aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity") +
  labs(x = "Severe Weather Event", y = "Total Persons Injured",
       title = "Injuries Caused by Severe Weather in US") +
  theme(plot.title = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(face = "bold", size = 7, angle = 90),
        axis.title.y = element_text(size = 10),
        axis.title.x = element_text(size = 10))

fatalPlot <- ggplot(data = head(stormDeathsOrdered, 10),
                    aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity") +
  labs(x = "Severe Weather Event", y = "Total Fatalities",
       title = "Fatalities Caused by Severe Weather in US") +
  theme(plot.title = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(face = "bold", size = 7, angle = 90),
        axis.title.y = element_text(size = 10),
        axis.title.x = element_text(size = 10))

grid.arrange(fatalPlot, injuryPlot, ncol = 2)

Figure 2 above gives further insight into Figure 1. Tornadoes injure far more people than they kill, but both values are disproportionate to the rest of the weather events in the top 10. An interesting takeaway is that excessive heat is the second most deadly event type. These deaths could potentially be reduced with proper emergency management planning for those most at risk during heat waves.

Economic Impact

Figure 3. Highest Property Damage and Highest Crop Damage per Severe Weather Event

library(ggplot2)
library(grid)
library(gridExtra)
                                         
propPlot <- ggplot(data = head(propDOrdered, 10),
                   aes(x = reorder(EVTYPE, -PDV), y = PDV / (10^9))) +
  geom_bar(stat = "identity") +
  labs(x = "Severe Weather Event", y = "Total Property Damage (Billions of USD)",
       title = "Property Damage Caused by Severe Weather in US") +
  theme(plot.title = element_text(size = 8, face = "bold"),
        axis.text.x = element_text(face = "bold", size = 7, angle = 90),
        axis.title.y = element_text(size = 8),
        axis.title.x = element_text(size = 8))

cropPlot <- ggplot(data = head(cropDOrdered, 10),
                   aes(x = reorder(EVTYPE, -CDV), y = CDV / (10^9))) +
  geom_bar(stat = "identity") +
  labs(x = "Severe Weather Event", y = "Total Crop Damage (Billions of USD)",
       title = "Crop Damage Caused by Severe Weather in US") +
  theme(plot.title = element_text(size = 8, face = "bold"),
        axis.text.x = element_text(face = "bold", size = 7, angle = 90),
        axis.title.y = element_text(size = 8),
        axis.title.x = element_text(size = 8))

grid.arrange(propPlot, cropPlot, ncol = 2)

Figure 3 above shows that, in contrast to the human toll taken by tornadoes, flooding has caused the most property damage and drought has caused the most crop losses. Many of the top 10 events for property damage involve water or precipitation, which could account for the disproportionate damage from flooding. For crops, the dominance of drought is unsurprising, since agriculture depends on an adequate water supply.