Synopsis

Below we examine the effects of storms and severe weather on human health and on property. We will see that, over 1950-2011, tornados cause the most fatalities and injuries, as well as the most combined property and crop damage. Measures that mitigate the effects of tornados could therefore have substantial benefits for human beings.

Data Processing

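The following code downloads the data (a bzip2-compressed CSV file) and reads it into R.

# Download the NOAA storm data once; mode = "wb" keeps the binary file intact on Windows.
# (A machine-specific setwd() call was removed so the code runs from any working directory.)
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "StormData.csv.bz2"
if (!file.exists(destfile)) download.file(URL, destfile, mode = "wb")
# read.csv decompresses the .bz2 file transparently; no manual unzipping is needed.
data <- read.csv(destfile)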

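We add a variable to the ‘data’ dataframe that records the year in which each storm began.

# BGN_DATE holds strings such as "4/18/1950 0:00:00"; parse the date and keep only the year.
# The trailing time is ignored because as.Date stops once the format is matched.
data$year <- as.numeric(format(as.Date(data$BGN_DATE, format = "%m/%d/%Y"), "%Y"))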
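As a quick check of the date conversion, the sketch below applies the same `as.Date`/`format` steps to a couple of literal strings. It assumes BGN_DATE values follow the "month/day/year hh:mm:ss" pattern; `startYear` is a hypothetical helper name used only for illustration.

```r
# Hypothetical helper: extract the numeric start year from BGN_DATE-style strings,
# assuming the format "month/day/year hh:mm:ss" (the time part is ignored).
startYear <- function(bgnDate) {
  as.numeric(format(as.Date(as.character(bgnDate), format = "%m/%d/%Y"), "%Y"))
}

startYear(c("4/18/1950 0:00:00", "11/15/2011 0:00:00"))  # 1950 2011
startYear("not a date")                                   # NA (unparseable input)
```

Because an explicit `format` is supplied, strings that do not match yield `NA` instead of an error, so malformed dates would show up as missing years rather than stopping the analysis.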

Note that many event types in the data are very similar. For example: ‘TSTM’, ‘TSTM WIND (G35)’, ‘TSTM WIND (G40)’, ‘ TSTM WIND (G45)’ (with a leading space), ‘TSTM WIND  (G45)’ (with two spaces before ‘(G45)’), ‘THUNDERSTORM WIND’, and ‘TSTM WIND and LIGHTNING’. The only apparent difference between the two ‘TSTM WIND (G45)’ entries is the placement of the spaces, so it is reasonable to infer that they refer to the same thing. It is less clear whether all TSTM types should be combined or whether wind and lightning should be kept separate: wind seems more likely to damage property and lightning seems more likely to kill people, so keeping them separate may be more informative. Without further guidance on this issue, I do not combine similar event types, fully aware that reasonable people may disagree with this decision.
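For readers who would merge at least the labels that differ only in spacing or capitalization, here is a minimal sketch of that narrower normalization. It is not applied anywhere in this analysis, and `normalizeEvtype` is a hypothetical helper name.

```r
# Hypothetical helper: collapse case and whitespace differences in EVTYPE labels.
# It only merges labels that differ in spacing or capitalization; it does NOT
# merge genuinely different labels such as 'TSTM WIND' and 'THUNDERSTORM WIND'.
normalizeEvtype <- function(x) {
  x <- toupper(as.character(x))
  x <- gsub("[[:space:]]+", " ", x)  # runs of whitespace -> a single space
  trimws(x)                           # drop leading and trailing spaces
}

labels <- c(" TSTM WIND (G45)", "TSTM WIND  (G45)", "Tstm Wind (G45)", "THUNDERSTORM WIND")
unique(normalizeEvtype(labels))  # "TSTM WIND (G45)" "THUNDERSTORM WIND"
```

Applied to the storm data, this would be something like `data$EVTYPE2 <- normalizeEvtype(data$EVTYPE)` (a hypothetical new column), leaving the harder question of merging wind and lightning categories untouched.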

Results

We first examine the effects on human health and then the economic effects.

What types of storms have the greatest impact on human health?

The data record both human fatalities and injuries. This document focuses on fatalities; the results for injuries are similar but are not shown here because the report is limited to three figures.

The following bar chart shows the five event types with the most recorded fatalities (1950-2011). Tornados cause by far the most deaths, followed by excessive heat, flash floods, heat, and lightning.

# Total fatalities for each event type, largest first
deaths <- tapply(data$FATALITIES, data$EVTYPE, FUN = sum, na.rm = TRUE)
deaths <- sort(deaths, decreasing = TRUE)
top5deaths <- head(deaths, 5)
# barplot() returns the bar midpoints, which are not needed, so the result is not printed
barplot(top5deaths, main = "Total Storm Fatalities (1950-2011)",
        names.arg = names(top5deaths), las = 2, cex.names = 0.55, col = "red",
        ylab = "Number of Fatalities")
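The same total-then-sort ranking can report injuries as a printed table, which would not count against the three-figure limit. The sketch below wraps the ranking in a hypothetical helper, `topEvents`, and demonstrates it on made-up toy data.

```r
# Hypothetical helper: total `values` within each level of `groups` and return
# the n largest totals in decreasing order (NAs are ignored).
topEvents <- function(values, groups, n = 5) {
  totals <- tapply(values, groups, FUN = sum, na.rm = TRUE)
  head(sort(totals, decreasing = TRUE), n)
}

# Toy data, for illustration only (not from the storm data set).
toy <- data.frame(EVTYPE   = c("TORNADO", "HEAT", "TORNADO", "LIGHTNING", "HEAT"),
                  INJURIES = c(10, 3, 5, 1, NA))
topEvents(toy$INJURIES, toy$EVTYPE, n = 2)  # TORNADO 15, HEAT 3
```

On the storm data, `topEvents(data$INJURIES, data$EVTYPE)` would give the injury counterpart of the fatality ranking.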

Now let’s take a closer look at how the top five causes of death vary over time. In the following figure, tornados are shown in black, excessive heat in red, flash floods in orange, heat in green, and lightning in blue. Note that the tornado records date back to 1950, whereas records for the other four causes begin only in 1993 or 1994. The dominance of tornado-related deaths in the bar chart above therefore reflects, at least in part, the apparent absence of records for the other event types in earlier years.

# Sum fatalities by start year for a single event type
yearlyFatalities <- function(ev) {
  rows <- data[data$EVTYPE == ev, ]
  tapply(rows$FATALITIES, rows$year, FUN = sum, na.rm = TRUE)
}
temp1 <- yearlyFatalities("TORNADO")
temp2 <- yearlyFatalities("EXCESSIVE HEAT")
temp3 <- yearlyFatalities("FLASH FLOOD")
temp4 <- yearlyFatalities("HEAT")
temp5 <- yearlyFatalities("LIGHTNING")
# ylim = c(0, 700) keeps the 1995 HEAT peak (687 deaths) from being clipped
plot(as.numeric(names(temp1)), temp1, type = "l", xlab = "Year",
     ylab = "Number of Fatalities", ylim = c(0, 700))
lines(as.numeric(names(temp2)), temp2, col = "red")
lines(as.numeric(names(temp3)), temp3, col = "orange")
lines(as.numeric(names(temp4)), temp4, col = "green")
lines(as.numeric(names(temp5)), temp5, col = "blue")
# lty = 1 draws line samples in the legend, matching the plotted lines
legend("top", legend = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
       col = c("black", "red", "orange", "green", "blue"), lty = 1, cex = 0.75)
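The claim that records for the non-tornado causes start only in 1993 or 1994 can be checked directly by finding the earliest year in which each event type appears. A minimal sketch, using a hypothetical helper `firstYear` and toy data:

```r
# Hypothetical helper: earliest year with at least one record for each event type.
# `keep` optionally restricts the result to the named event types, in that order.
firstYear <- function(years, evtypes, keep = NULL) {
  first <- tapply(years, evtypes, FUN = min, na.rm = TRUE)
  if (!is.null(keep)) first <- first[keep]
  first
}

# Toy data, for illustration only.
toyYears <- c(1950, 1951, 1993, 1994, 1995)
toyTypes <- c("TORNADO", "TORNADO", "FLASH FLOOD", "HEAT", "FLASH FLOOD")
firstYear(toyYears, toyTypes, keep = c("TORNADO", "FLASH FLOOD", "HEAT"))
```

On the storm data, `firstYear(data$year, data$EVTYPE, keep = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"))` would report the first year of records for each of the five plotted causes.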

We also note that deaths attributed to heat (as distinct from excessive heat) are usually low, apart from a peak of 687 deaths in 1995.

print(temp4)
## 1993 1994 1995 1997 1998 2000 2001 2006 2007 2008 2009 2010 2011 
##    7    6  687    1    5    1    1   47   22   27   26   44   63
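Rather than reading the peak off the plot, it can be located with `which.max`. To keep the sketch self-contained, the yearly HEAT totals are copied from the printed `temp4` output above; with the data loaded, `which.max(temp4)` gives the same answer.

```r
# Yearly HEAT fatalities, copied from the printed output of temp4 above.
heat <- c(`1993` = 7, `1994` = 6, `1995` = 687, `1997` = 1, `1998` = 5,
          `2000` = 1, `2001` = 1, `2006` = 47, `2007` = 22, `2008` = 27,
          `2009` = 26, `2010` = 44, `2011` = 63)

peak <- which.max(heat)              # position of the largest yearly total
names(heat)[peak]                    # "1995"
heat[[peak]]                         # 687
round(heat[[peak]] / sum(heat), 2)   # 0.73: share of all 937 HEAT deaths in the series
```

The last line makes the point numerically: a single year accounts for roughly three quarters of all deaths recorded under HEAT.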

What types of storms have the greatest economic impact?

Total storm damage is defined here as the sum of property damage and crop damage. The following bar chart shows the five event types with the largest total damage. As with fatalities, tornados cause the most damage, but they do not dominate the other event types by as wide a margin.

# Sum property and crop damage per event type, then rank. The two totals must be
# added before sorting: adding two separately sorted vectors would pair values
# belonging to different event types.
propertyDamage <- tapply(data$PROPDMG, data$EVTYPE, FUN = sum, na.rm = TRUE)
cropDamage <- tapply(data$CROPDMG, data$EVTYPE, FUN = sum, na.rm = TRUE)
totalDamage <- sort(propertyDamage + cropDamage, decreasing = TRUE)
top5damage <- head(totalDamage, 5)
barplot(top5damage, main = "Total Property and Crop Damage (1950-2011)",
        names.arg = names(top5damage), las = 2, cex.names = 0.55, col = "orange")
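The totals above add the raw PROPDMG and CROPDMG values. In the NOAA Storm Data documentation these figures are paired with magnitude codes stored in PROPDMGEXP and CROPDMGEXP ("K" for thousands, "M" for millions, "B" for billions), so the raw sums mix units. Below is a minimal sketch of converting each record to dollars before aggregating. It assumes only K, M and B (in either case) are meaningful and gives every other code, including blanks and missing values, a multiplier of 1, which is a simplification. `damageDollars` is a hypothetical helper name. Applying it could change the ranking in the bar chart, so it is offered as a check rather than a replacement.

```r
# Hypothetical helper: convert a damage amount plus magnitude code into dollars.
# Assumption: only K/M/B (case-insensitive) are magnitudes; any other code,
# including "" and NA, gets a multiplier of 1.
damageDollars <- function(amount, expCode) {
  code <- toupper(trimws(as.character(expCode)))
  code[is.na(code)] <- ""
  multiplier <- ifelse(code == "K", 1e3,
                ifelse(code == "M", 1e6,
                ifelse(code == "B", 1e9, 1)))
  amount * multiplier
}

damageDollars(c(25, 2.5, 1.55, 10), c("K", "m", "B", ""))  # 2.5e4 2.5e6 1.55e9 10
```

With the data loaded, the per-record dollars could be summed per event type, for example `tapply(damageDollars(data$PROPDMG, data$PROPDMGEXP) + damageDollars(data$CROPDMG, data$CROPDMGEXP), data$EVTYPE, sum, na.rm = TRUE)`. Adding row by row before grouping also avoids any alignment issue between the two damage columns.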

Note that, as with fatalities, the property and crop damage records are much more complete from 1993 onward, so the first-place ranking of tornados in the preceding plot should be interpreted with care. Further analysis is omitted because of the limit on the number of figures.
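One way to act on this caveat without adding a figure is to repeat the ranking using only the years in which the broader set of event types is recorded. A minimal sketch, assuming 1993 as the cutoff (suggested by when the non-tornado categories first appear in the fatality plot) and using a hypothetical helper `rankSince` on toy data:

```r
# Hypothetical helper: rank event types by the total of `values`, using only
# records whose start year is at or after `fromYear`.
rankSince <- function(values, evtypes, years, fromYear = 1993, n = 5) {
  keep <- !is.na(years) & years >= fromYear
  totals <- tapply(values[keep], evtypes[keep], FUN = sum, na.rm = TRUE)
  head(sort(totals, decreasing = TRUE), n)
}

# Toy data, for illustration only: TORNADO leads overall only because of a 1950s record.
toy <- data.frame(EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "FLOOD"),
                  DMG    = c(100, 5, 20, 30),
                  year   = c(1955, 2000, 1995, 2005))
rankSince(toy$DMG, toy$EVTYPE, toy$year)  # FLOOD 50, TORNADO 5
```

On the storm data this would be called as, for example, `rankSince(data$PROPDMG + data$CROPDMG, data$EVTYPE, data$year)`, and the result printed as a table.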