Reproducible Research. Assignment II. Storm data indicates tornadoes kill you.

Synopsis.

The dataset consists of storm data across the United States of America from 1950 to November 2011. Of its many variables, this analysis uses only those relating to harm to population health and to economic damage (property and crops). The analysis indicates that, between 1950 and 2011, tornadoes caused the most total fatalities and injuries, so in aggregate they are the storm events most likely to kill you. The event type recorded as 'Heat Wave' has the highest average number of injuries per event, but such events are less common. Floods caused the greatest total property damage and droughts the greatest total crop damage.

Data Processing.

Download the data from the URL and read it into R

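# Create the data directory if it does not exist yet
if (!file.exists("./data")) dir.create("./data")
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# The file is bzip2-compressed, so keep the .bz2 extension; download it only once
destFile <- "./data/repdata-data-StormData.csv.bz2"
if (!file.exists(destFile)) download.file(fileURL, destfile = destFile, method = "curl")
# read.csv decompresses .bz2 files transparently
Storm <- read.csv(destFile)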
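read.csv opens files through R's file connections, which detect gzip, bzip2 and xz compression from the file's contents, so the compressed download does not need to be unpacked first. A minimal self-contained sketch, using a small made-up table written to a temporary file rather than the real storm data:

```r
# Write a tiny made-up CSV through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(5, 1)),
          file = con, row.names = FALSE)
close(con)

# The file on disk starts with the bzip2 signature "BZh"
magic <- rawToChar(readBin(tmp, "raw", 3))

# read.csv detects the compression and decompresses on the fly
small <- read.csv(tmp)
print(small)
```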

Create a new data frame containing only the population health variables

pop <- data.frame(Storm$EVTYPE, Storm$FATALITIES, Storm$INJURIES)

Create two new data frames for the economic variables: one for property damage and one for crop damage

eco_prop <- data.frame(Storm$EVTYPE, Storm$PROPDMG, Storm$PROPDMGEXP)
eco_crop <- data.frame(Storm$EVTYPE, Storm$CROPDMG, Storm$CROPDMGEXP)
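Because the arguments to data.frame() above are unnamed, R names each column after the expression it came from and passes that through make.names(), which turns the `$` into a `.`. That is why the later code refers to columns such as `pop$Storm.FATALITIES`. A toy check on a hypothetical three-row stand-in called `Storm_toy` (made-up values, so the real `Storm` object is left untouched):

```r
# Made-up stand-in for the real Storm data frame
Storm_toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO"),
                        FATALITIES = c(3, 0, 1),
                        INJURIES = c(10, 2, 4))

# Unnamed arguments are named from their expressions:
# Storm_toy$EVTYPE becomes the column name Storm_toy.EVTYPE
pop_toy <- data.frame(Storm_toy$EVTYPE, Storm_toy$FATALITIES, Storm_toy$INJURIES)
print(names(pop_toy))
```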

Results.

Note: the following analysis is based on total fatalities and injuries rather than an average across events. While this does not show which individual events are the most dangerous, it does tell us which event types have caused the most harm across the US from 1950 to 2011, and so which ones we most need to do something about. You are better off protecting yourself against a very common event type 'x' that causes a few fatalities each time than against a rare event type 'y' that kills more people on average.
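The reasoning in the note rests on the fact that a total is the number of events multiplied by the average per event. A small worked example with purely hypothetical numbers for 'x' and 'y' (not taken from the storm data):

```r
# Hypothetical figures, not taken from the storm data
events <- c(x = 1000, y = 10)        # number of recorded events
avg_fatalities <- c(x = 2, y = 50)   # average fatalities per event

# Total fatalities = number of events * average fatalities per event
total_fatalities <- events * avg_fatalities
print(total_fatalities)
# x totals 2000 and y totals 500, so 'x' is deadlier overall despite its lower average
```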

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Calculate total fatalities by event and order by decreasing value

options(scipen = 1, digits = 2)
FbE <- aggregate(list(fatalities=pop$Storm.FATALITIES), list(event=pop$Storm.EVTYPE), sum)
FbE <- FbE[order(FbE$fatalities, decreasing=TRUE),]
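aggregate() groups the fatality counts by event type and sums each group; order(..., decreasing = TRUE) then sorts the rows so the deadliest event types come first. A toy run on a made-up data frame that reuses the column names of `pop`:

```r
# Made-up rows with the same column names as pop
pop_toy <- data.frame(Storm.EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                      Storm.FATALITIES = c(3, 1, 2, 4))

# Sum fatalities within each event type
fbe_toy <- aggregate(list(fatalities = pop_toy$Storm.FATALITIES),
                     list(event = pop_toy$Storm.EVTYPE), sum)

# Sort from most to fewest fatalities
fbe_toy <- fbe_toy[order(fbe_toy$fatalities, decreasing = TRUE), ]
print(fbe_toy)
```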

Calculate total injuries by event and order by decreasing value

options(scipen = 1, digits = 2)
IbE <- aggregate(list(injuries=pop$Storm.INJURIES), list(event=pop$Storm.EVTYPE), sum)
IbE <- IbE[order(IbE$injuries, decreasing=TRUE),]

Draw graphs for the 10 most harmful event types

fbe <- FbE[1:10,]
ibe <- IbE[1:10,]
par(mfrow=c(2,1))
barplot(ibe$injuries, col="blue", main="total injuries (blue) and total fatalities (red) by event", ylab="injuries", names.arg = ibe$event, las=3)
# Label the fatalities bars with the fatality ranking, not the injury ranking
barplot(fbe$fatalities, col="red", ylab="fatalities", names.arg = fbe$event, las=3)

Figure: total injuries (blue) and total fatalities (red) for the top 10 event types.

It is clear from the data that tornadoes have caused the most total fatalities and injuries across the US. This is probably because there are many more of them, but this analysis looks only at total fatalities and injuries and does not count events.
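The suggestion that tornadoes dominate because there are more of them could be checked by counting records per event type with table(). A sketch on made-up data; on the real data the equivalent call would be `table(pop$Storm.EVTYPE)`:

```r
# Made-up event records standing in for pop$Storm.EVTYPE
evtype <- c("TORNADO", "TORNADO", "FLOOD", "TORNADO", "HEAT")

# Count records per event type, most frequent first
counts <- sort(table(evtype), decreasing = TRUE)
print(counts)
```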

Having said that, a quick look at the average number of injuries per event shows something interesting.

options(scipen = 1, digits = 2)
IbEAvg <- aggregate(list(injuries=pop$Storm.INJURIES), list(event=pop$Storm.EVTYPE), mean)
IbEAvg <- IbEAvg[order(IbEAvg$injuries, decreasing=TRUE),]
head(IbEAvg)
##                     event injuries
## 277             Heat Wave       70
## 851 TROPICAL STORM GORDON       43
## 954            WILD FIRES       38
## 821         THUNDERSTORMW       27
## 366    HIGH WIND AND SEAS       20
## 656       SNOW/HIGH WINDS       18

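The event type recorded as 'Heat Wave' has the highest average number of injuries per event (about 70). EVTYPE values are not standardised, though (the table includes entries such as 'THUNDERSTORMW' and 'WILD FIRES'), so the average for a single spelling may rest on only a few records.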
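Because a mean over a handful of records can be large by chance, it can help to report how many records sit behind each average and to require a minimum count before ranking. A sketch on made-up data; the threshold `min_n = 3` is an arbitrary illustrative choice and is not used anywhere in the analysis above:

```r
# Made-up records: one rare event type with a single large injury count
inj_toy <- data.frame(Storm.EVTYPE = c("RARE", "TORNADO", "TORNADO", "TORNADO"),
                      Storm.INJURIES = c(70, 5, 10, 15))

# Mean injuries and number of records per event type
avg <- aggregate(list(injuries = inj_toy$Storm.INJURIES),
                 list(event = inj_toy$Storm.EVTYPE), mean)
n <- aggregate(list(n = inj_toy$Storm.INJURIES),
               list(event = inj_toy$Storm.EVTYPE), length)
avg <- merge(avg, n, by = "event")

# Keep only event types with at least min_n records, then rank by mean
min_n <- 3  # hypothetical threshold
ranked <- avg[avg$n >= min_n, ]
ranked <- ranked[order(ranked$injuries, decreasing = TRUE), ]
print(avg)
print(ranked)
```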



Across the United States, which types of events have the greatest economic consequences?

Standardise the crop and property damage values by converting them to dollars using the exponent columns (K = thousands, M = millions, B = billions).

# Multiplier implied by the exponent code: K/k = 1e3, M/m = 1e6, B = 1e9.
# Any other code (for example a blank value) keeps a multiplier of 0,
# so those records add nothing to the totals.
eco_crop$times <- 0
eco_crop$times[grep("B", eco_crop$Storm.CROPDMGEXP)] <- 1e9
eco_crop$times[grep("[Mm]", eco_crop$Storm.CROPDMGEXP)] <- 1e6
eco_crop$times[grep("[Kk]", eco_crop$Storm.CROPDMGEXP)] <- 1e3
eco_crop$Storm.CROPDMG <- eco_crop$Storm.CROPDMG * eco_crop$times

# Same conversion for property damage
eco_prop$times <- 0
eco_prop$times[grep("B", eco_prop$Storm.PROPDMGEXP)] <- 1e9
eco_prop$times[grep("[Mm]", eco_prop$Storm.PROPDMGEXP)] <- 1e6
eco_prop$times[grep("[Kk]", eco_prop$Storm.PROPDMGEXP)] <- 1e3
eco_prop$Storm.PROPDMG <- eco_prop$Storm.PROPDMG * eco_prop$times
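The repeated grep() calls could also be written as a lookup table indexed by the upper-cased exponent code. A sketch with a hypothetical helper `to_dollars()`, assuming the same rule as above (K = 1e3, M = 1e6, B = 1e9, any other code contributes 0); unlike the grep version, it also accepts a lowercase b:

```r
# Hypothetical helper: convert damage values to dollars using exponent codes
to_dollars <- function(dmg, exp) {
  mult <- c(K = 1e3, M = 1e6, B = 1e9)
  # Look up each code case-insensitively; unknown or blank codes give NA
  times <- unname(mult[toupper(as.character(exp))])
  times[is.na(times)] <- 0   # unrecognised codes contribute nothing
  dmg * times
}

# Made-up example values
res <- to_dollars(c(25, 2.5, 1.3, 10), c("K", "m", "B", ""))
print(res)
# res holds 25 thousand, 2.5 million, 1.3 billion and 0 dollars
```

On the real data this would be used as `eco_crop$Storm.CROPDMG <- to_dollars(eco_crop$Storm.CROPDMG, eco_crop$Storm.CROPDMGEXP)`.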

Calculate total costs by event type and sort in decreasing order

options(scipen = 1, digits = 2)
eco_crop <- aggregate(list(cost=eco_crop$Storm.CROPDMG), list(event=eco_crop$Storm.EVTYPE), sum)
eco_crop <- eco_crop[order(eco_crop$cost, decreasing=TRUE),]
eco_prop <- aggregate(list(cost=eco_prop$Storm.PROPDMG), list(event=eco_prop$Storm.EVTYPE), sum)
eco_prop <- eco_prop[order(eco_prop$cost, decreasing=TRUE),]

Draw a two-panel graph of crop and property damage for the top 10 events

crops <- eco_crop[1:10,]
props <- eco_prop[1:10,]
par(mfrow=c(2,1))
barplot(crops$cost, col="yellow", main="damage to crops (yellow) and property (green) by event", ylab="cost (US dollars)", names.arg = crops$event, las=3)
barplot(props$cost, col="green", ylab="cost (US dollars)", names.arg = props$event, las=3)

Figure: damage to crops (yellow) and property (green) for the top 10 event types.

It seems clear that droughts have caused the most total crop damage and floods the most total property damage.