Analysis of the NOAA Storm Database from 1950 to November of 2011 was used to answer two basic questions:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
The top five event types with population health consequences were determined and ranked by the total number of injuries and, separately, by the total number of fatalities. Similarly, the top five event types with economic consequences were determined and ranked by the total damage to crops and property. While tornados cause the greatest number of injuries and fatalities, floods cause the greatest economic damage as measured by the sum of crop and property damage.
Data from the NOAA Storm Database, covering 1950 to November 2011, were read into R from a bzip2-compressed CSV file. The database tracks the characteristics of major storms and severe weather events, including property and crop damage estimates as well as any associated injuries and fatalities. The EVTYPE (event type) variable was used to group the dataset to answer specific questions regarding population health, as measured by injuries and fatalities, and regarding economic consequences, as measured by the sum of crop and property damage estimates for each incident.
library(dplyr)
library(lattice)
library(ggplot2)
library(knitr)
# read bzip2-compressed CSV file into R (read.csv decompresses it directly)
newdata <- read.csv("repdata-data-StormData.csv.bz2")
newdata$EVTYPE <- toupper(as.character(newdata$EVTYPE))
newdata$EVTYPE <- as.factor(newdata$EVTYPE)
# create subsets of dataset for answers to questions
health <- group_by(newdata, EVTYPE)
y <- colnames(newdata)
# find column numbers of events and damages
damage_logic <- grep("[A-Z]DMG|EVTYPE", y)
# subset dataset for events and damages
DMGdata <- newdata[damage_logic]
factorTOnumber <- function(x) {
  # function to convert order-of-magnitude codes to numeric multipliers
  facts <- c("B", "M", "K", "H")
  num <- c("1000000000", "1000000", "1000", "100")
  # convert lower-case entries to upper case
  x <- toupper(as.character(x))
  # assume entries which are not in facts are typos
  # and set them to 1
  x <- gsub("[^MBKH]", "1", x)
  # replace each code with its multiplier
  for (i in seq_along(facts)) {
    x <- gsub(facts[i], num[i], x)
  }
  x <- as.numeric(x)
  # blank entries become NA; treat them as a multiplier of 1
  x[is.na(x)] <- 1
  x
}
# clean up damage dataset
DMGdata$PROPDMGEXP <-factorTOnumber(DMGdata$PROPDMGEXP)
DMGdata$CROPDMGEXP <-factorTOnumber(DMGdata$CROPDMGEXP)
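As a quick sanity check on the multiplier conversion, the following sketch applies the same mapping rule to a handful of made-up codes. The name `exp_to_multiplier` and the sample vector are illustrative assumptions, not part of the original analysis. It uses a named lookup vector rather than `gsub`, which should give the same result for single-character codes.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Codes outside B, M, K, H (blanks, digits, symbols) are treated as 1,
# mirroring the assumption made in factorTOnumber().
exp_to_multiplier <- function(codes) {
  lookup <- c(B = 1e9, M = 1e6, K = 1e3, H = 1e2)
  codes <- toupper(as.character(codes))
  out <- unname(lookup[codes])
  out[is.na(out)] <- 1
  out
}

# Made-up sample codes, for illustration only
sample_codes <- c("K", "m", "B", "", "?", "h", "5")
multipliers <- exp_to_multiplier(sample_codes)
print(data.frame(code = sample_codes, multiplier = multipliers))
```

A lookup vector keeps the code-to-multiplier pairs in one place, so adding or removing a recognised code is a one-line change.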
Although the NOAA database contains a number of faulty entries, these entries did not appear to affect the rankings of the top five event types for either population health or total economic consequences. Tornados are responsible for by far the greatest number of injuries and deaths. Excessive heat ranks high for fatalities, but it accounts for only about one-third as many deaths as tornados. Floods are associated with the greatest combined crop and property damage.
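One way to gauge how much the faulty entries matter is to tabulate the exponent codes and see what share of records fall outside B, M, K, and H. The sketch below does this on a small made-up vector; `toy_exp` and its values are hypothetical stand-ins for a column such as `newdata$PROPDMGEXP`, and the resulting share says nothing about the real data.

```r
# Made-up exponent codes standing in for a column such as PROPDMGEXP
toy_exp <- c("K", "K", "M", "", "k", "?", "B", "0", "K", "")

codes <- toupper(toy_exp)
recognised <- codes %in% c("B", "M", "K", "H")

# Frequency of each distinct code, including blanks
code_counts <- table(codes, useNA = "ifany")
print(code_counts)

# Share of records whose multiplier would default to 1
share_defaulted <- mean(!recognised)
print(share_defaulted)
```

On the real dataset, a small share of defaulted records with non-zero damage would support the observation that the rankings are not sensitive to these entries.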
library(dplyr)
# Question 1
# total fatalities and injuries for each EVTYPE
health_sum <- health %>% group_by(EVTYPE) %>% summarize( sum_fatalities = sum(FATALITIES), sum_injured = sum(INJURIES))
# order dataset by events causing the greatest number of fatalities
fatality_sum <- arrange(health_sum, desc(sum_fatalities))
# order dataset by events causing the greatest number of injuries
injured_sum <- arrange(health_sum, desc(sum_injured))
# determine top five event types
topfive_dead <- head(fatality_sum, 5)
topfive_injury <- head(injured_sum, 5)
# Question 2
# calculate damages in millions of USD
DMGdata <- mutate(DMGdata, property_damage = (PROPDMG * PROPDMGEXP)/10^6,
crop_damage = (CROPDMG * CROPDMGEXP)/10^6, total_damage = crop_damage + property_damage)
crop_tot_damageDF <- DMGdata %>% group_by(EVTYPE) %>% summarize(tot_crop_damage = sum(crop_damage))
crop_tot_damageDF<- arrange(crop_tot_damageDF, desc(tot_crop_damage))
property_tot_damageDF <- DMGdata %>% group_by(EVTYPE) %>% summarize(tot_property_damage = sum(property_damage))
property_tot_damageDF <- arrange (property_tot_damageDF, desc(tot_property_damage))
sum_tot_damageDF <- DMGdata %>% group_by(EVTYPE) %>% summarize(tot_damage = sum(total_damage))
sum_tot_damageDF <- arrange (sum_tot_damageDF, desc(tot_damage))
# determine top five event types
topfive_crop <- head(crop_tot_damageDF, 5)
topfive_prop <- head(property_tot_damageDF, 5)
topfive_cropprop <- head(sum_tot_damageDF, 5)
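The damage calculation can be illustrated on a toy data frame. The sketch below uses base R's `aggregate` rather than dplyr so that it runs without extra packages. The data frame `toy` and all of its values are invented for illustration, and the multiplier columns are assumed to be numeric already, as they are after `factorTOnumber`.

```r
# Invented records; multipliers are already numeric
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  PROPDMG    = c(2.5, 10, 1, 50, 3),
  PROPDMGEXP = c(1e9, 1e6, 1e6, 1e3, 1e6),
  CROPDMG    = c(5, 0, 20, 100, 0),
  CROPDMGEXP = c(1e6, 1, 1e6, 1e3, 1),
  stringsAsFactors = FALSE
)

# Damage per record in millions of USD
toy$total_damage <- (toy$PROPDMG * toy$PROPDMGEXP +
                     toy$CROPDMG * toy$CROPDMGEXP) / 1e6

# Sum by event type and rank from largest to smallest
totals <- aggregate(total_damage ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$total_damage), ]
print(head(totals, 2))
```

Dividing by 1e6 after applying the multipliers keeps every total in the same unit, which is what makes the per-event sums comparable.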
Tornados are most harmful to population health as measured by the number of either injuries or deaths (Figures 1 and 2).
Floods are associated with the greatest damage in economic terms, followed by hurricanes and tornados (Figure 3).
library(ggplot2)
# geom_col() draws bars from precomputed totals; current ggplot2 no longer accepts stat = "identity" in qplot()
ggplot(topfive_injury, aes(x = EVTYPE, y = sum_injured)) + geom_col() + labs(x = "Event type", y = "Total injuries", title = "Figure 1: Top five events causing greatest number of injuries")
ggplot(topfive_dead, aes(x = EVTYPE, y = sum_fatalities)) + geom_col() + labs(x = "Event type", y = "Total deaths", title = "Figure 2: Top five events causing greatest number of fatalities")
ggplot(topfive_cropprop, aes(x = EVTYPE, y = tot_damage)) + geom_col() + labs(x = "Event type", y = "Total damage (in millions USD)", title = "Figure 3: Top five events causing greatest economic damage")