Synopsis

This report uses storm data documented by the U.S. weather service from 1950 to 2011 to assess the impact of storms on human health and the economy. The first half of the report covers processing the data for the summary analyses that follow. Injuries, fatalities, and costs to property and crops are totalled for each type of storm in the data set (834 types), and the storm types with the greatest negative impacts are reported.

Data Processing

Packages

library(dplyr)
library(knitr)

Set Directory; Download and store the data file

if(!file.exists("/Users/samuelshaw/CourseraProjects/RepResearch2")){
        dir.create("/Users/samuelshaw/CourseraProjects/RepResearch2")}
setwd("/Users/samuelshaw/CourseraProjects/RepResearch2")

fileUrl<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("StormData.csv.bz2")){
        # mode="wb" keeps the compressed (binary) file intact on Windows
        download.file(fileUrl,destfile="StormData.csv.bz2",mode="wb")
}

Read the data

StormData<-read.csv("StormData.csv.bz2")
# str(StormData)

Clean the data

Data cleaning consists of the following 6 steps:
1. The analysis concerns only a portion of the Storm dataset: the column that contains the type of event - “EVTYPE”; the columns pertaining to harm to the population - “FATALITIES” & “INJURIES”; and the columns pertaining to economic consequences - “PROPDMG”, “PROPDMGEXP”, “CROPDMG” & “CROPDMGEXP”. Reducing the data set to these columns makes it smaller and easier to work with.
SD<-select(StormData, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
2. To be as consistent as possible about the kind of entry held in the event type column, it makes sense to remove the “Summary” entries from EVTYPE.
SD<-filter(SD, !grepl("Summary", SD$EVTYPE))
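The filter in step 2 relies on grepl(), which is case sensitive by default. The following is a minimal sketch on made-up event names (the values in `ev` are hypothetical, not taken from the dataset) showing which rows the pattern "Summary" matches, and how `ignore.case = TRUE` would widen the match if upper-case summary entries also needed removing.

```r
# Toy event types (hypothetical values, not taken from the Storm dataset)
ev <- c("TORNADO", "Summary of May 22", "SUMMARY OF JUNE 3", "HAIL")

# Case-sensitive match, as in step 2: only the "Summary ..." row is flagged
flag_cs <- grepl("Summary", ev)

# Case-insensitive match: both summary rows are flagged
flag_ci <- grepl("summary", ev, ignore.case = TRUE)

kept_cs <- ev[!flag_cs]
kept_ci <- ev[!flag_ci]
print(kept_cs)
print(kept_ci)
```

Whether the case-insensitive variant is appropriate depends on what the remaining entries actually contain, so it is worth inspecting unique(SD$EVTYPE) before choosing.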
3. Because the codebook for the Storm dataset does not explain all of the values contained in the “PROPDMGEXP” and “CROPDMGEXP” columns, uncoded values are set to NA. Blank exponents are set to 0, so a damage figure recorded without an exponent is counted as zero. Also, to keep values consistent for later processing, the event types and the remaining exponent values are converted to upper case.
SD$EVTYPE<-as.factor(toupper(SD$EVTYPE))
SD$PROPDMGEXP<-as.factor(toupper(SD$PROPDMGEXP))
SD$CROPDMGEXP<-as.factor(toupper(SD$CROPDMGEXP))

PrDExp<-as.character(SD$PROPDMGEXP)
PrDExp[PrDExp %in% ""]<-0
PrDExp[PrDExp %in% c("?","-","+","1","2","3","4","5","6","7","8")]<-NA
# unique(PrDExp)

CrDExp<-as.character(SD$CROPDMGEXP)
CrDExp[CrDExp %in% ""]<-0
CrDExp[CrDExp %in% c("?","-","+","1","2","3","4","5","6","7","8")]<-NA
# unique(CrDExp)
4. We need to scale the values in the “PROPDMG” and “CROPDMG” columns by the magnitude given in their exponent columns (i.e., “PROPDMGEXP” and “CROPDMGEXP”, respectively). We can do this by replacing each exponent code with the number it represents, as indicated by the codebook: H = hundred, K = thousand, M = million, and B = billion.
PrDExp<-gsub("H", 100, PrDExp)
PrDExp<-gsub("K", 1000, PrDExp)
PrDExp<-gsub("M", 1000000, PrDExp)
PrDExp<-gsub("B", 1000000000, PrDExp)
CrDExp<-gsub("K", 1000, CrDExp)
CrDExp<-gsub("M", 1000000, CrDExp)
CrDExp<-gsub("B", 1000000000, CrDExp)
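As an alternative way to see the recoding in steps 3 and 4, the mapping from exponent code to multiplier can be written as a named lookup vector. This is only a sketch under stated assumptions: the vector `exp_code` holds hypothetical codes, and the names `multiplier` and `mult` are introduced here for illustration. Blank and "0" codes map to 0, mirroring the choice made in step 3, and any code missing from the lookup becomes NA.

```r
# Hypothetical exponent codes, already converted to upper case
exp_code <- c("K", "M", "", "B", "H", "?", "0")

# Named lookup vector: code -> multiplier
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "0" = 0)

# Treat a blank exponent like "0", as in step 3
exp_code[exp_code == ""] <- "0"

# Indexing by name returns NA for codes that are not in the lookup (e.g. "?")
mult <- unname(multiplier[exp_code])
print(mult)
```

A lookup keeps the whole mapping in one place, which makes it easier to check against the codebook than a sequence of gsub() calls.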
5. We can compute the total property damage and total crop damage for each event by multiplying the original values by their multipliers (processed above), and then simplify the dataset again by selecting only the necessary columns.
SD$total.prop<-as.numeric(PrDExp)*SD$PROPDMG
SD$total.crop<-as.numeric(CrDExp)*SD$CROPDMG
SD2<-select(SD, c(1:3, 8:9))
str(SD2)
## 'data.frame':    902224 obs. of  5 variables:
##  $ EVTYPE    : Factor w/ 834 levels "   HIGH SURF ADVISORY",..: 694 694 694 694 694 694 694 694 694 694 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ total.prop: num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
##  $ total.crop: num  0 0 0 0 0 0 0 0 0 0 ...
(After cleaning and tidying, the dataset consists of 902,224 observations of 5 variables.)
6. Finally, now that the data is clean and tidy, we can summarize it according to the questions we want to answer: total damages, fatalities, and injuries for each type of storm. The following uses the base aggregate() function to summarize each variable individually, and then combines the results in a single summary dataset that contains totals for each event type. Note that sum() is called without na.rm = TRUE, so any event type with at least one NA damage value receives an NA total and therefore does not appear in the damage rankings.
totPD<-with(SD2, aggregate(total.prop, list(EVTYPE), sum))
totCD<-with(SD2, aggregate(total.crop, list(EVTYPE), sum))
totFATAL<-with(SD2, aggregate(FATALITIES, list(EVTYPE), sum))
totINJ<-with(SD2, aggregate(INJURIES, list(EVTYPE), sum))

names(totPD)<-c("EVTYPE", "total.property")
names(totCD)<-c("EVTYPE", "total.crop")
names(totFATAL)<-c("EVTYPE", "total.fatalities")
names(totINJ)<-c("EVTYPE", "total.injuries")

sd1<-cbind(totPD, totCD$total.crop, totFATAL$total.fatalities, totINJ$total.injuries)
names(sd1)<-c("EType", "prop", "crop", "fatal", "inj")
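Because step 3 sets some exponents to NA, it helps to see how aggregate() treats missing values. The sketch below uses a small made-up table (`toy` and all of its values are hypothetical) to compare a plain sum() with sum(..., na.rm = TRUE), and to show that several columns can be totalled in a single call. It is an illustration of the base R behaviour, not a replacement for the analysis above.

```r
# Toy data (hypothetical values): one FLOOD row has an unknown (NA) damage total
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  total.prop = c(1000, NA, 200, 300),
  FATALITIES = c(1, 0, 0, 2)
)

# Plain sum(): the NA propagates, so the FLOOD property total becomes NA
strict <- with(toy, aggregate(total.prop, list(EVTYPE), sum))

# sum(..., na.rm = TRUE): rows with unknown damage are skipped instead
lenient <- with(toy, aggregate(total.prop, list(EVTYPE), sum, na.rm = TRUE))

# Several columns can be summed in one call; na.action = na.pass keeps
# rows containing NA so that na.rm = TRUE can handle them inside sum()
both <- aggregate(cbind(total.prop, FATALITIES) ~ EVTYPE, data = toy,
                  FUN = sum, na.rm = TRUE, na.action = na.pass)

print(strict)
print(lenient)
print(both)
```

Which behaviour is preferable is a judgment call: dropping NA rows keeps an event type in the ranking but understates its total, while the strict sum flags the type as unknown.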

Results

total deaths, injuries, property damages and crop damages

sum(sd1$prop, na.rm=TRUE)/(2011-1950); # total yearly average property damage
## [1] 5327207437
sum(sd1$crop, na.rm=TRUE)/(2011-1950); # total yearly average crop damage
## [1] 800303331
sum(sd1$fatal)/(2011-1950); # total yearly average fatalities
## [1] 248.2787
sum(sd1$inj)/(2011-1950); # total yearly average injuries
## [1] 2303.738
Across the U.S., on average, hundreds of people die, thousands are injured, and damages to property and crops cost billions of dollars each year.
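The yearly averages divide by 2011-1950, which is 61. If the records are taken to cover every calendar year from 1950 through 2011, the count of years is 62 instead. The sketch below works through both conventions using the fatalities figure reported above; the variable names are introduced here for illustration, and which divisor is appropriate depends on how complete the first and last years of the record are.

```r
first_year <- 1950
last_year  <- 2011

# Difference between the end points: the divisor used in the averages above
span_exclusive <- last_year - first_year

# Number of calendar years when both end points are counted
span_inclusive <- length(first_year:last_year)

# Recover the total from the reported yearly average (248.2787 over 61 years),
# then average it over the inclusive span instead
total_fatal   <- 248.2787 * span_exclusive
avg_inclusive <- total_fatal / span_inclusive

print(c(span_exclusive, span_inclusive))
print(avg_inclusive)
```

The difference between the two conventions is under two percent, so it does not change the conclusions drawn from these averages.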

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

fatalities<-sd1[order(-sd1[, 4]), ][1:10, ]
# fatalities
injuries<-sd1[order(-sd1[,5]), ][1:10, ]
# injuries
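The top-ten tables are built by sorting on a negated column and keeping the first ten rows. A minimal sketch of the same idea on a hypothetical four-row table (`toy` and its values are made up) is shown below, including how order() places NA totals last by default and how `na.last = NA` drops them explicitly.

```r
# Toy summary table (hypothetical values); one total is unknown
toy <- data.frame(
  EType = c("A", "B", "C", "D"),
  fatal = c(5, NA, 40, 12),
  stringsAsFactors = FALSE
)

# Same pattern as above: negate the column to sort largest first.
# NA totals are placed last by default, so they fall outside the top rows.
top2_neg <- head(toy[order(-toy$fatal), ], 2)

# Equivalent, with the sort direction and NA handling spelled out;
# na.last = NA removes rows whose total is NA
top2 <- head(toy[order(toy$fatal, decreasing = TRUE, na.last = NA), ], 2)

print(top2_neg)
print(top2)
```

Using head() instead of `[1:10, ]` also avoids rows of NA being appended when a table has fewer rows than requested.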
The following bar charts show the total fatalities and injuries for the ten most harmful storm types.
par(mfrow=c(1,2), mar=c(7, 4, 3, 1))
heat<-heat.colors(10, alpha=.8); # use a color ramp!
barplot(height=fatalities$fatal, names.arg=fatalities$EType,
        las=2, cex.lab=.7, cex.axis=.7, cex.names=.6, cex.main=.8,
        ylim=c(0,6000), main="Deaths Caused by Storms",
        ylab="Total Fatalities", col=heat)

topo<-topo.colors(10, alpha=.8); # use a color ramp!
barplot(height=injuries$inj/1000, names.arg=injuries$EType,
        las=2, cex.lab=.7, cex.axis=.7, cex.names=.6, cex.main=.8,
        ylim=c(0,100), main="Injuries Caused by Storms",
        ylab="Total Injuries (in thousands)", col=topo)

Tornados have caused far more deaths and injuries than any other storm type. Since data collection began in 1950, tornados have caused about 100 deaths and more than a thousand injuries per year on average. Heat waves are also highly fatal, resulting in dozens of deaths per year on average.

Question 2: Across the United States, which types of events have the greatest economic consequences?

propertydamage<-sd1[order(-sd1[, 2]), ][1:10, ]
# propertydamage
cropdamage<-sd1[order(-sd1[, 3]), ][1:10, ]
# cropdamage
The following bar charts show total property damage and total crop damage for the ten most costly storm types in each category.
par(mfrow=c(1,2), mar=c(7, 4, 3, 1))

heat<-heat.colors(10, alpha=.8); # use a color ramp!
barplot(height=propertydamage$prop/10^9, names.arg=propertydamage$EType,
        las=2, cex.lab=.7, cex.axis=.7, cex.names=.6, cex.main=.8,
        ylim=c(0,160), main="Most Harmful Type of Storms to Property",
        ylab="Property Damage in $Billions", col=heat)

topo<-topo.colors(10, alpha=.8); # use a color ramp!
barplot(height=cropdamage$crop/10^9, names.arg=cropdamage$EType,
        las=2, cex.lab=.7, cex.axis=.7, cex.names=.6, cex.main=.8,
        ylim=c(0,15), main="Most Harmful Type of Storms to Crops",
        ylab="Crop Damage in $Billions", col=topo)

Floods and hurricanes have caused the most property damage, while droughts have done the most damage to crops. Although both charts are scaled in billions of dollars, their axes differ: on average, storm damage to property (about $5.3 billion per year) is several times higher than storm damage to crops (about $0.8 billion per year).
Finally, because damages to property and crops can be measured in dollars, we can calculate their combined economic impact.
propertydamage$Tecon<-propertydamage$prop+propertydamage$crop
Tot.Econ<-propertydamage[order(-propertydamage[,6]),]
par(mfrow=c(1,1), mar=c(6,4,4,2))
heat<-heat.colors(10, alpha=1); # use a color ramp!
bar5<-barplot(height=Tot.Econ$Tecon/10^9, names.arg=Tot.Econ$EType,
        las=2, cex.lab=.8, cex.axis=.9, cex.names=.6, cex.main=1,
        ylim=c(0,165), main="Total Economic Costs by Type of Storm",
        ylab="Total Damages in $Billions", col=heat)
text(x = bar5, y = round(Tot.Econ$Tecon/10^9,2), 
     label = round(Tot.Econ$Tecon/10^9,2), pos = 3, cex = 0.7)
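The combined total above is computed on the ten event types with the highest property damage, so an event type whose cost comes mostly from crop damage cannot enter the ranking. The sketch below, on a hypothetical table (`toy` and its dollar values are made up), contrasts that approach with adding the two costs for every event type first and ranking afterwards.

```r
# Toy summary table (hypothetical values, in dollars)
toy <- data.frame(
  EType = c("FLOOD", "DROUGHT", "HAIL", "WIND"),
  prop  = c(100, 1, 20, 30),
  crop  = c(5, 60, 3, 1),
  stringsAsFactors = FALSE
)

# Approach used above: take the top rows by property damage, then add crops
top_prop <- head(toy[order(-toy$prop), ], 2)
top_prop$Tecon <- top_prop$prop + top_prop$crop

# Alternative: combine the costs for every event type, then rank
toy$Tecon <- toy$prop + toy$crop
top_all <- head(toy[order(-toy$Tecon), ], 2)

print(top_prop)
print(top_all)
```

In the toy data the crop-heavy event type appears only in the second ranking. Whether the two rankings differ for the real data depends on the totals in sd1.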

Together, floods and hurricanes have had the greatest economic consequences. Since data collection began in 1950, floods have cost over $100 billion in property damage alone, or more than two billion dollars per year. Hurricanes have added, on average, about another billion dollars per year in damages.