Synopsis

This analysis covers severe weather events across the United States from 1950 to 2011, obtained from the NOAA Storm Database. The analysis was limited to event types that occurred at least 10 times during the date range, encompassing a total of 900,632 events. Across the United States, “TORNADO” events are the most harmful to population health, with the highest total fatalities (5,633) and total injuries (91,346). The “FLOOD” event type has the largest economic consequences, as measured by total property damage of $144,657,709,800 over the period (crop damage is not included).

Data Processing

  1. Load required libraries
library(dplyr)
library(ggplot2)
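Attaching dplyr prints notices about masked objects such as filter and lag, and older package builds can also trigger "built under R version" warnings. A minimal sketch for keeping that output out of the report is shown below. Inside an R Markdown chunk, the knitr chunk options message = FALSE and warning = FALSE achieve the same result.

```r
# Attach the packages without printing the start-up masking messages;
# suppressWarnings() also hides "package ... was built under R version ..." notices
suppressWarnings(suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
}))
```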
  2. Assuming the current working directory contains the raw data as a CSV file named StormData.csv, we read it into a data frame:
stormDat <- read.csv("StormData.csv", header=TRUE)
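The report reads an uncompressed StormData.csv. Suppose instead that only a bzip2-compressed copy is available (an assumption). In that case read.csv() can read it directly, because the file connection it opens for reading text detects bzip2 compression. The sketch below checks this on a tiny temporary file whose contents are hypothetical.

```r
# Write a tiny bzip2-compressed CSV to a temporary file as a stand-in
# for a compressed copy of the storm data (hypothetical contents)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,2,10",
             "FLOOD,0,3"), con)
close(con)

# read.csv() opens the path with file(), which detects the bzip2 compression
toyDat <- read.csv(tmp, header = TRUE)
str(toyDat)
```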
  3. We only need the original fields for event type (EVTYPE), fatalities (FATALITIES), injuries (INJURIES), property damage (PROPDMG) and the property damage exponent (PROPDMGEXP). Next, we limit the analysis to event types that occurred 10 or more times over the period. We then adjust the property damage using PROPDMGEXP, which indicates whether the value is in thousands (K), millions (M) or billions (B). Once the adjusted field PROPDMGADJUST exists, we drop the other property damage columns. Finally, we build a data frame with one row per event type and columns holding the sum and the mean of fatalities, injuries and adjusted property damage. The final values are held in the data frame stormValues.
# Keep only the columns needed for the analysis
stDat <- select(stormDat, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP)

# Keep event types that occur at least 10 times
tempTable <- table(stDat$EVTYPE)
eventsToKeep <- names(tempTable[tempTable >= 10])
stDat <- filter(stDat, EVTYPE %in% eventsToKeep)

# Adjusted property damage in dollars; rows whose exponent code is not
# B, M or K keep the default of 0
stDat$PROPDMGADJUST <- rep(0, nrow(stDat))

billions <- grep("B", stDat$PROPDMGEXP, ignore.case = TRUE)
stDat$PROPDMGADJUST[billions] <- stDat$PROPDMG[billions] * 1e9

millions <- grep("M", stDat$PROPDMGEXP, ignore.case = TRUE)
stDat$PROPDMGADJUST[millions] <- stDat$PROPDMG[millions] * 1e6

thousands <- grep("K", stDat$PROPDMGEXP, ignore.case = TRUE)
stDat$PROPDMGADJUST[thousands] <- stDat$PROPDMG[thousands] * 1e3

# Drop the raw damage columns so that ". ~ EVTYPE" only covers the three measures
stDat <- select(stDat, -PROPDMG, -PROPDMGEXP)

# Sum of each measure per event type
stormValues <- aggregate(. ~ EVTYPE, data = stDat, sum)
colnames(stormValues) <- c("EVENT", "FATALITIES_SUM", "INJURIES_SUM", "PROPDMG_SUM")

# Mean of each measure per event type; aggregate() returns groups in the same
# EVTYPE order, so the columns can be bound after dropping the duplicate EVTYPE
stormValues <- cbind(stormValues, aggregate(. ~ EVTYPE, data = stDat, mean)[, -1])
colnames(stormValues)[5:7] <- c("FATALITIES_MEAN", "INJURIES_MEAN", "PROPDMG_MEAN")
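The processing steps above can also be written as a single dplyr pipeline. In this version a named lookup vector replaces the three grep() passes, and group_by()/summarise() replaces the two aggregate() calls. The sketch below runs on a hypothetical five-row stand-in for stormDat. It uses a hypothetical threshold minCount of 2, because the toy data is too small for the report's threshold of 10. As in the report, exponent codes other than B, M and K map to 0.

```r
library(dplyr)

# Toy rows standing in for stormDat (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "TORNADO", "HAIL"),
  FATALITIES = c(0, 1, 3, 2, 0),
  INJURIES   = c(2, 0, 10, 4, 1),
  PROPDMG    = c(1.5, 20, 250, 2, 5),
  PROPDMGEXP = c("B", "m", "K", "", "H"),
  stringsAsFactors = FALSE
)

# Hypothetical frequency threshold (the report uses 10)
minCount <- 2

# Lookup from exponent code to multiplier; indexing by "" or an unknown code
# yields NA, which is set to 0 to match the report's default
multiplier <- c(B = 1e9, M = 1e6, K = 1e3)
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 0
toy$PROPDMGADJUST <- toy$PROPDMG * mult

summaryTbl <- toy %>%
  group_by(EVTYPE) %>%
  filter(n() >= minCount) %>%   # keep event types seen at least minCount times
  summarise(FATALITIES_SUM  = sum(FATALITIES),
            INJURIES_SUM    = sum(INJURIES),
            PROPDMG_SUM     = sum(PROPDMGADJUST),
            FATALITIES_MEAN = mean(FATALITIES),
            INJURIES_MEAN   = mean(INJURIES),
            PROPDMG_MEAN    = mean(PROPDMGADJUST))
print(summaryTbl)
```

The formula interface of aggregate() drops rows containing NA by default (na.action = na.omit). By contrast, sum() and mean() inside summarise() return NA for such a group unless na.rm = TRUE is passed.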

Results

  1. The 10 event types with the highest total fatalities
topFatal <- arrange(stormValues, desc(FATALITIES_SUM))[1:10,]
ggplot(data = topFatal, aes(x = reorder(EVENT, -FATALITIES_SUM), y = FATALITIES_SUM)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Total Fatalities by Event", y = "Total Fatalities", x = "Event")
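The synopsis also reports total injuries, and a companion chart can follow the same pattern. The sketch below uses a hypothetical four-row stand-in for stormValues. It takes the top rows with head(), which returns at most 10 rows. Indexing with [1:10, ] would instead pad a shorter table with NA rows.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for stormValues with only four event types (illustrative values)
toyValues <- data.frame(
  EVENT        = c("FLOOD", "HEAT", "LIGHTNING", "TORNADO"),
  INJURIES_SUM = c(40, 75, 10, 120),
  stringsAsFactors = FALSE
)

# head() keeps at most 10 rows without introducing NA rows
topInjury <- head(arrange(toyValues, desc(INJURIES_SUM)), 10)

p <- ggplot(topInjury, aes(x = reorder(EVENT, -INJURIES_SUM), y = INJURIES_SUM)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Total Injuries by Event", y = "Total Injuries", x = "Event")
print(p)
```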

  2. The 10 event types with the highest total property damage
topDmg <- arrange(stormValues, desc(PROPDMG_SUM))[1:10,]
ggplot(data = topDmg, aes(x = reorder(EVENT, -PROPDMG_SUM), y = PROPDMG_SUM)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Total Property Damage by Event", y = "Total Property Damage ($)", x = "Event")
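Property damage totals run into the hundreds of billions of dollars, so raw axis labels become long. One option is to rescale the values to billions before plotting. The sketch below does this on a hypothetical three-row stand-in for stormValues. The column name PROPDMG_BILLIONS is introduced here for illustration only.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for stormValues (illustrative values only)
toyValues <- data.frame(
  EVENT       = c("FLOOD", "HAIL", "TORNADO"),
  PROPDMG_SUM = c(3e9, 2.5e8, 1.5e9),
  stringsAsFactors = FALSE
)

# Rescale to billions of dollars so the axis labels stay short
topDmgToy <- toyValues %>%
  arrange(desc(PROPDMG_SUM)) %>%
  head(10) %>%
  mutate(PROPDMG_BILLIONS = PROPDMG_SUM / 1e9)

p <- ggplot(topDmgToy, aes(x = reorder(EVENT, -PROPDMG_BILLIONS), y = PROPDMG_BILLIONS)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Total Property Damage by Event",
       y = "Total Property Damage (billions of $)", x = "Event")
print(p)
```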