knitr::opts_chunk$set(echo = TRUE)
The U.S. National Oceanic and Atmospheric Administration (NOAA) published storm data covering 1950 to 2011. The analysis below graphically represents the most dangerous and destructive weather events over the 61-year period studied. It uses 3 figures (1 single-plot figure and 2 dual-plot figures) that look at the human impacts (total casualties, total fatalities and total injuries) and the economic impacts (total property damage and total crop damage, both in billions of USD). Tornadoes were found to be the most dangerous events to humans in terms of death or bodily injury. Flooding was found to be the most destructive to property, and drought the most destructive to crops. After the raw data was loaded from the bz2-compressed file, some processing was done to allow for calculations and graphing. All graphs were made using the ggplot2, grid, and gridExtra packages.
The main processing steps are decompressing and reading the file, then computing a value that quantifies the damage or human loss for each recorded weather event. The steps needed to do this are documented below.
#bz2 file in working directory
stormCSV <- read.csv(bzfile("./repdata%2Fdata%2FStormData.csv.bz2"))
#Checking column titles and data
head(stormCSV)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
stormData<- stormCSV[,c("EVTYPE", "FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
#Checking Exponent field for conversion to numeric exponent
unique(stormData$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(stormData$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
#Converting unique values to numeric multipliers
stormData$PDXN[stormData$PROPDMGEXP == "-"] <- 0
stormData$PDXN[stormData$PROPDMGEXP == "?"] <- 0
stormData$PDXN[stormData$PROPDMGEXP == "+"] <- 0
stormData$PDXN[stormData$PROPDMGEXP == ""] <- 1
stormData$PDXN[stormData$PROPDMGEXP == "0"] <- 1
stormData$PDXN[stormData$PROPDMGEXP == "1"] <- 10
stormData$PDXN[stormData$PROPDMGEXP == "2"] <- 100
stormData$PDXN[stormData$PROPDMGEXP == "3"] <- 1000
stormData$PDXN[stormData$PROPDMGEXP == "4"] <- 10000
stormData$PDXN[stormData$PROPDMGEXP == "5"] <- 100000
stormData$PDXN[stormData$PROPDMGEXP == "6"] <- 1000000
stormData$PDXN[stormData$PROPDMGEXP == "7"] <- 10000000
stormData$PDXN[stormData$PROPDMGEXP == "8"] <- 100000000
stormData$PDXN[stormData$PROPDMGEXP == "B"] <- 1000000000
stormData$PDXN[stormData$PROPDMGEXP == "H"] <- 100
stormData$PDXN[stormData$PROPDMGEXP == "h"] <- 100
stormData$PDXN[stormData$PROPDMGEXP == "K"] <- 1000
stormData$PDXN[stormData$PROPDMGEXP == "M"] <- 1000000
stormData$PDXN[stormData$PROPDMGEXP == "m"] <- 1000000
stormData$CDXN[stormData$CROPDMGEXP == "?"] <- 0
stormData$CDXN[stormData$CROPDMGEXP == ""] <- 1
stormData$CDXN[stormData$CROPDMGEXP == "0"] <- 1
stormData$CDXN[stormData$CROPDMGEXP == "2"] <- 100
stormData$CDXN[stormData$CROPDMGEXP == "B"] <- 1000000000
stormData$CDXN[stormData$CROPDMGEXP == "K"] <- 1000
stormData$CDXN[stormData$CROPDMGEXP == "k"] <- 1000
stormData$CDXN[stormData$CROPDMGEXP == "M"] <- 1000000
stormData$CDXN[stormData$CROPDMGEXP == "m"] <- 1000000
Reviewing the data in columns PROPDMG and PROPDMGEXP, it was noticed that instances with an exponent code of -, ? or + needed to be given a multiplier of 0. All other instances with a blank "" cell or 0 in PROPDMGEXP/CROPDMGEXP were treated as the literal PROPDMG/CROPDMG value and given a multiplier of 1. The remaining exponent codes covered 10^1 to 10^8 numerically, along with h/H for hundred (10^2), k/K for thousand (10^3), m/M for million (10^6) and B for billion (10^9). Each code was translated to its numerical equivalent so that it could be multiplied by the recorded value to generate the actual value in dollars. The above processing yields two new columns:
PDXN = Property Damage Exponent Numeric
CDXN = Crop Damage Exponent Numeric
Multiplying the numeric multiplier (PDXN/CDXN) by the recorded damage figure (PROPDMG/CROPDMG) yields:
PDV = Property Damage Value
CDV = Crop Damage Value
#Multiplying the columns to create the new PDV/CDV columns
stormData$PDV <- stormData$PROPDMG * stormData$PDXN
stormData$CDV <- stormData$CROPDMG * stormData$CDXN
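The exponent conversion and multiplication above can also be sketched with a single named lookup vector instead of one assignment per code. This is a minimal sketch, not the method used in this analysis: the `toy` data frame, the `multipliers` vector and the `to_multiplier()` helper are hypothetical names introduced for illustration, and the toy values are made up. It follows the same rules described above (-, ?, + map to 0; blank and 0 map to 1).

```r
# Toy data standing in for the PROPDMG / PROPDMGEXP columns (illustrative values only)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3, 7),
  PROPDMGEXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vector: exponent code -> multiplier
multipliers <- c(
  "-" = 0, "?" = 0, "+" = 0,
  "0" = 1,
  "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

# Hypothetical helper: translate exponent codes into numeric multipliers
to_multiplier <- function(code) {
  code <- toupper(as.character(code))  # treat "h", "k", "m" like "H", "K", "M"
  out <- unname(multipliers[code])     # codes missing from the lookup become NA
  out[code %in% ""] <- 1               # a blank code means the literal value
  out
}

toy$PDXN <- to_multiplier(toy$PROPDMGEXP)
toy$PDV  <- toy$PROPDMG * toy$PDXN
print(toy)
```

One consequence of this design is that any code not listed in the lookup shows up as NA rather than being silently skipped, which makes unexpected codes easy to spot with `anyNA()`.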
Death and injury totals by weather event are computed below. The values for each occurrence are summed by event type, and each result is then ordered from most deaths/injuries to least. Casualties are the sum of deaths and injuries.
stormDeaths <- aggregate(FATALITIES ~ EVTYPE, stormData, FUN = sum)
stormInjuries <- aggregate(INJURIES ~ EVTYPE, stormData, FUN = sum)
stormDC <- stormDeaths
stormIC <- stormInjuries
colnames(stormDC) <- c("EVTYPE", "CASUALTIES")
colnames(stormIC) <- c("EVTYPE", "CASUALTIES")
stormCasualties <- rbind(stormDC, stormIC)
stormCasualtyTotal <- aggregate(CASUALTIES ~ EVTYPE, stormCasualties, FUN = sum)
stormDeathsOrdered <- stormDeaths[order(stormDeaths$FATALITIES, decreasing = T),]
stormInjuriesOrdered <- stormInjuries[order(stormInjuries$INJURIES, decreasing = T),]
stormCasualtiesOrdered <- stormCasualtyTotal[order(stormCasualtyTotal$CASUALTIES, decreasing = T),]
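The casualty totals above can also be obtained in one aggregation step by summing both columns at once and then adding them. The following is a minimal sketch on made-up toy data; the `toy` and `totals` names are hypothetical and the numbers are illustrative only.

```r
# Toy data with the same column names as stormData (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(15, 2, 0, 1, 5)
)

# Sum fatalities and injuries per event type in a single call
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Casualties are the sum of deaths and injuries
totals$CASUALTIES <- totals$FATALITIES + totals$INJURIES

# Order from most casualties to least
totals <- totals[order(totals$CASUALTIES, decreasing = TRUE), ]
print(totals)
```

This keeps fatalities, injuries and casualties in one data frame, so the same object could feed all three plots.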
Property damage and crop damage totals by weather event are computed below. The values for each occurrence are summed by event type, and each result is then ordered from most damage to least.
totalPropD <- aggregate(PDV ~ EVTYPE, stormData, FUN = sum )
totalCropD <- aggregate(CDV ~ EVTYPE, stormData, FUN = sum )
propDOrdered <- totalPropD[order(totalPropD$PDV, decreasing = T),]
cropDOrdered <- totalCropD[order(totalCropD$CDV, decreasing = T),]
Figure 1. Most Human Casualties per Severe Weather Event
library(ggplot2)
casualtyPlot <- ggplot(data=head(stormCasualtiesOrdered,10), aes(x=reorder(EVTYPE, -CASUALTIES), y= CASUALTIES)) + geom_bar(stat="identity") + labs(x="Severe Weather Event", y="Total Human Casualties", title= "Total Casualties Caused by Severe Weather in US") + theme(axis.text.x = element_text(face = "bold", size = 7, angle = 90))
print(casualtyPlot)
As shown in Figure 1, tornadoes cause a disproportionate number of human casualties in the United States compared to any other severe weather type. Because casualties are the sum of deaths and injuries due to an event, Figure 2 investigates further to see which events cause more deaths and which cause more injuries.
Figure 2. Most Injuries and Most Deaths per Severe Weather Event
library(ggplot2)
library(grid)
library(gridExtra)
injuryPlot <- ggplot(data=head(stormInjuriesOrdered,10), aes(x=reorder(EVTYPE, -INJURIES), y= INJURIES)) + geom_bar(stat="identity") + labs(x="Severe Weather Event", y="Total Persons Injured", title= "Injuries Caused by Severe Weather in US") + theme(plot.title = element_text(size = 10, face="bold")) + theme(axis.text.x = element_text(face = "bold", size = 7, angle = 90)) + theme(axis.title.y = element_text( size = 10)) + theme(axis.title.x = element_text( size = 10))
fatalPlot <- ggplot(data=head(stormDeathsOrdered,10), aes(x=reorder(EVTYPE, -FATALITIES), y= FATALITIES)) + geom_bar(stat="identity") + labs(x="Severe Weather Event", y="Total Fatalities", title= "Fatalities Caused by Severe Weather in US") + theme(plot.title = element_text(size = 10, face="bold")) + theme(axis.text.x = element_text(face = "bold", size = 7, angle = 90)) + theme(axis.title.y = element_text( size = 10)) + theme(axis.title.x = element_text( size = 10))
grid.arrange(fatalPlot, injuryPlot, ncol = 2)
Figure 2 above gives some useful insight into Figure 1. As shown, tornadoes injure far more people than they kill, but both values are disproportionate to the rest of the weather events in the top 10. An interesting takeaway is that excessive heat is the second most deadly event type, which could be mitigated with proper emergency management planning for those most at risk during heat waves.

## Economic Impact
Figure 3. Highest Property Damage and Highest Crop Damage per Severe Weather Event
library(ggplot2)
library(grid)
library(gridExtra)
propPlot <- ggplot(data=head(propDOrdered,10), aes(x=reorder(EVTYPE, -PDV), y= PDV/(10^9))) + geom_bar(stat="identity") + labs(x="Severe Weather Event", y="Total Property Damage (Billions of USD)", title= "Property Damage Caused by Severe Weather in US") + theme(plot.title = element_text(size = 8, face="bold")) + theme(axis.text.x = element_text(face = "bold", size = 7, angle = 90)) + theme(axis.title.y = element_text( size = 8)) + theme(axis.title.x = element_text( size = 8))
cropPlot <- ggplot(data=head(cropDOrdered,10), aes(x=reorder(EVTYPE, -CDV), y= CDV/(10^9))) + geom_bar(stat="identity") + labs(x="Severe Weather Event", y="Total Crop Damage (Billions of USD)", title= "Crop Damage Caused by Severe Weather in US") + theme(plot.title = element_text(size = 8, face="bold")) + theme(axis.text.x = element_text(face = "bold", size = 7, angle = 90)) + theme(axis.title.y = element_text( size = 8)) + theme(axis.title.x = element_text( size = 8))
grid.arrange(propPlot, cropPlot, ncol = 2)
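The four bar charts in Figures 2 and 3 share almost all of their ggplot2 code, so the repetition could be factored into a small helper. The sketch below is an assumption about how that might look, not part of the original analysis: `top_events_plot()`, its parameters and the `toy_prop` data frame are hypothetical, and the toy damage values are made up.

```r
library(ggplot2)

# Hypothetical helper: bar chart of the top n event types for one measure.
# `ordered` is a data frame with an EVTYPE column, already sorted by `value_col`.
top_events_plot <- function(ordered, value_col, y_label, plot_title,
                            n = 10, scale = 1, text_size = 8) {
  top <- head(ordered, n)
  plot_data <- data.frame(
    event = top$EVTYPE,
    total = top[[value_col]] / scale   # e.g. scale = 1e9 for billions of USD
  )
  ggplot(plot_data, aes(x = reorder(event, -total), y = total)) +
    geom_bar(stat = "identity") +
    labs(x = "Severe Weather Event", y = y_label, title = plot_title) +
    theme(
      plot.title   = element_text(size = text_size, face = "bold"),
      axis.text.x  = element_text(face = "bold", size = 7, angle = 90),
      axis.title.x = element_text(size = text_size),
      axis.title.y = element_text(size = text_size)
    )
}

# Toy data standing in for propDOrdered (illustrative values only)
toy_prop <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE", "TORNADO"),
  PDV    = c(3e9, 2e9, 1e9)
)

p <- top_events_plot(
  toy_prop, "PDV",
  y_label = "Total Property Damage (Billions of USD)",
  plot_title = "Property Damage (toy data)",
  scale = 1e9
)
```

With such a helper, each panel would reduce to one call, and the two panels could still be combined with `grid.arrange()` as above.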
Figure 3 above shows that, in contrast to the human toll that tornadoes take, flooding has caused the most property damage and drought has caused the most crop losses. Many of the top 10 property damage events involve water or precipitation, which could account for the disproportionate damage from flooding. For crops, the lack of rain speaks for itself, as a reliable water supply is the key to agriculture.