Reproducible Research (Peer Assignment #2)

Impacts of weather-related events on population and property in the United States.

This report is based on an analysis of NOAA Storm Data. The goal of the analysis is to identify the event types that have caused the greatest harm to population health and the largest economic losses since 1950.

Information about the storm data database can be found at http://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/Storm-Data-Export-Format.docx. This Word document describes the variables needed for this report, which pertain to property and crop damage, deaths, and injuries, organized by event type.

We begin by setting the working directory and loading the required libraries.

setwd("C:/R_files_Coursera")
library(car)
library(plyr)

Data Processing

Read data from the NOAA .csv file into a data frame. Only read variables that are required for the analysis.

filename <- "repdata_data_StormData.csv"
myData <- read.csv(filename, header = TRUE, sep = ",", colClasses = c("NULL", 
    "character", rep("NULL", 4), rep("factor", 2), rep("NULL", 14), rep("numeric", 
        3), "factor", "numeric", "factor", rep("NULL", 9)))
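The selective read above relies on the colClasses argument of read.csv. As a minimal sketch, assuming a tiny in-memory CSV with made-up columns and values (csv_text and toy are illustrative names, not part of the report), the string "NULL" skips a column while any other entry sets that column's type:

```r
# Toy CSV standing in for the storm data file (hypothetical columns and values).
csv_text <- "ID,EVTYPE,REMARKS,FATALITIES
1,TORNADO,note a,3
2,FLOOD,note b,0"

# The string "NULL" drops a column; the other entries set the column type.
toy <- read.csv(text = csv_text, header = TRUE,
                colClasses = c("NULL", "factor", "NULL", "numeric"))

str(toy)  # only EVTYPE and FATALITIES remain
```

Skipping unused columns at read time keeps the data frame small, which matters for a file the size of the storm database.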

Impact on Population Health

We begin with an analysis of fatalities and injuries by event type.

Sum the number of fatalities and injuries by event type. Create a third variable that contains the combined total of deaths and injuries. The combined total determines the scale of the stacked bar chart produced below.

Fatalities.Totals <- tapply(myData$FATALITIES, myData$EVTYPE, sum)
Injuries.Totals <- tapply(myData$INJURIES, myData$EVTYPE, sum)
# Combined deaths and injuries per event type (element-wise sum, no loop needed)
Bodies.Totals <- Fatalities.Totals + Injuries.Totals
BodyBarplotData <- cbind(Fatalities.Totals, Injuries.Totals, Bodies.Totals)
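To illustrate the grouping step, here is a small sketch on made-up records (the toy data frame and the names deaths, injuries, and combined are hypothetical). tapply splits a numeric vector by a factor and applies sum to each group, and element-wise addition of the two results gives the combined total directly:

```r
# Hypothetical records: one row per event report.
toy <- data.frame(
  EVTYPE     = factor(c("TORNADO", "FLOOD", "TORNADO", "HEAT")),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# tapply() splits the first argument by the factor and applies sum() per group.
deaths   <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
injuries <- tapply(toy$INJURIES, toy$EVTYPE, sum)

# Both results are named by event type, so they add element by element.
combined <- deaths + injuries
cbind(deaths, injuries, combined)
```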

Results

As we prepare to graph this data, we sort it in ascending order of the combined totals. To avoid displaying an extreme number of different event types, we apply a threshold of 250 people: only event types with at least 250 combined deaths and injuries are kept.

BodyBarplotData <- data.frame(BodyBarplotData)
BodyBarplotData <- BodyBarplotData[order(BodyBarplotData$Bodies.Totals), ]
BodyBarplotData <- subset(BodyBarplotData, BodyBarplotData$Bodies.Totals >= 
    250)
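The sort-then-filter pattern can be sketched on a few made-up totals (all names and numbers below are hypothetical). Sorting in ascending order places the largest total last, which is the bar drawn at the top of a horizontal bar plot:

```r
# Hypothetical totals per event type (not taken from the NOAA data).
totals <- data.frame(
  Fatalities = c(5, 300, 40),
  Injuries   = c(20, 900, 260),
  row.names  = c("HAIL", "TORNADO", "FLOOD")
)
totals$Combined <- totals$Fatalities + totals$Injuries

# Ascending order, so the largest combined total ends up in the last row.
totals <- totals[order(totals$Combined), ]

# Keep rows at or above the threshold (>= includes the boundary value).
kept <- subset(totals, Combined >= 250)
kept
```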

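Transpose the data to comply with the stacked bar plot format: barplot() draws one bar per matrix column and stacks the rows within each bar, so event types must be in columns. Add formatting as needed.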

Body.bp <- t(BodyBarplotData[, 1:2])
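# Base the tick marks on the largest combined total so they span the full stacked bars
ticks <- pretty(c(0, BodyBarplotData[nrow(BodyBarplotData), 3]))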
par(mar = c(5.1, 8, 4.1, 2.1))
labels <- format(ticks, big.mark = ",", scientific = FALSE)
barplot(Body.bp, ylab = "", xlab = "", las = 1, axes = FALSE, cex.names = 0.7, 
    main = "Deaths and Injuries (by Event Type)", col = c("red", "blue"), xlim = c(0, 
        BodyBarplotData[nrow(BodyBarplotData), 3]) * 1.1, horiz = TRUE)
legend("right", legend = c("Deaths", "Injuries"), cex = 0.8, y.intersp = 0.8, 
    fill = c("red", "blue"))
axis(side = 1, tck = -0.015, labels = NA)
axis(side = 1, lwd = 0, line = -0.6, las = 1, at = ticks, labels = labels)
mtext(side = 1, "Number of People Impacted (Combined)", line = 1.6)

[Figure: horizontal stacked bar chart, "Deaths and Injuries (by Event Type)"; x-axis: Number of People Impacted (Combined)]

The above stacked bar chart indicates that tornadoes constitute the most severe threat in terms of loss of life and injury.
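The layout of the chart depends on how barplot() reads a matrix. The following is a minimal sketch on made-up totals (totals, m, x_max, and mids are hypothetical names): the data frame is transposed so that each event type becomes a column, and the axis limit is taken from the column sums, since each bar's length is the sum of its segments.

```r
# Hypothetical totals, already sorted ascending by the combined count.
totals <- data.frame(
  Deaths    = c(10, 50, 120),
  Injuries  = c(90, 400, 1500),
  row.names = c("HAIL", "FLOOD", "TORNADO")
)

# barplot() stacks the rows of a matrix within each bar, one bar per column,
# so the data frame is transposed: rows = series, columns = event types.
m <- t(as.matrix(totals))

# The axis must span the longest stacked bar (column sums), with 10% headroom.
x_max <- max(colSums(m)) * 1.1

# barplot() returns the bar midpoints, one per event type.
mids <- barplot(m, horiz = TRUE, las = 1, col = c("red", "blue"),
                xlim = c(0, x_max), main = "Toy stacked bar chart")
legend("right", legend = rownames(m), fill = c("red", "blue"))
```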

Impact on the Economy

Weather-related events often have a significant impact on property and crops. Damages can quickly escalate depending on the event. In this part of our report, we investigate the events that have proven to be the most expensive in terms of property and crop losses since 1950.

Since we have already read in the data, we can proceed to computing totals for property and crop losses by event type. We also derive a combined total to help with scaling the stacked bar chart. One significant complication, however, concerns the financial data used to report property and crop losses. Each damage amount is stored as a number together with a separate unit code, such as K for thousands, M for millions, or B for billions. A recoding step is necessary to convert these values into their raw dollar equivalents. To aid with scaling, we then divide by 1 billion, so all reported values are in billions of dollars.
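As a worked example of the conversion, a reported amount of 25 with unit code K is 25 × 1,000 = 25,000 dollars, or 0.000025 billion dollars. The sketch below reproduces this arithmetic on made-up rows. It is an assumption-laden illustration: it uses a base-R named lookup vector rather than the recode() call from the car package used in this report, it covers only four of the codes, and the names unit_multiplier, toy, DMG, and DMGEXP are hypothetical.

```r
# Multipliers for a few unit codes, following the mapping used in this report.
# All names here (unit_multiplier, toy, DMG, DMGEXP) are illustrative only.
unit_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

toy <- data.frame(
  DMG    = c(25, 2.5, 1.2, 10),
  DMGEXP = c("K", "M", "B", "?"),
  stringsAsFactors = FALSE
)

# Look up each code (upper-cased so "k" and "K" match); unknown codes become 0.
mult <- unname(unit_multiplier[toupper(toy$DMGEXP)])
mult[is.na(mult)] <- 0

# Raw dollars first, then billions of dollars.
toy$dollars  <- toy$DMG * mult
toy$billions <- toy$dollars / 1e9
toy
```

Mapping unrecognized codes to 0 drops those amounts from the totals, which matches the treatment of symbols such as ? and + in the report's recoding.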

Data Processing

# Mapping from unit code to dollar multiplier, shared by property and crop damage
unit.codes <- " ''=0;'-'=0;'?'=0;'+'=0; '0'=0;'1'=10;'2'=100;'3'=1000;'4'=10000;'5'=100000;'6'=1000000;'7'=10000000;'8'=100000000;'B'=1000000000;'h'=100;'H'=100; 'k'=1000;'K'=1000;'m'=1000000;'M'=1000000"
myData$Property.Currency.Units <- recode(myData$PROPDMGEXP, unit.codes, as.factor.result = FALSE)
# Losses in billions of dollars
myData$Property.Losses <- myData$PROPDMG * myData$Property.Currency.Units/1e+09
myData$Crop.Currency.Units <- recode(myData$CROPDMGEXP, unit.codes, as.factor.result = FALSE)
myData$Crop.Losses <- myData$CROPDMG * myData$Crop.Currency.Units/1e+09

Property.Totals <- tapply(myData$Property.Losses, myData$EVTYPE, sum)
Crop.Totals <- tapply(myData$Crop.Losses, myData$EVTYPE, sum)
Combined.Totals <- Property.Totals

Combined.Totals starts out as a copy of the property totals and serves only as a placeholder. We now overwrite that column so that it reflects crop and property losses combined. In this stacked bar chart, we also limit the reported event types to those with combined losses of at least $1 billion.

barplotData <- cbind(Property.Totals, Crop.Totals, Combined.Totals)
# Overwrite the placeholder third column with property + crop losses (element-wise)
barplotData[, 3] <- barplotData[, 1] + barplotData[, 2]
barplotData <- data.frame(barplotData)
barplotData <- barplotData[order(barplotData$Combined.Totals), ]
barplotData <- subset(barplotData, barplotData$Combined.Totals >= 1)

Results

We now graph the resulting data and examine it to reveal the most costly event type for both property and crop losses.

bp <- t(barplotData[, 1:2])
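# Base the tick marks on the largest combined total so they span the full stacked bars
ticks <- pretty(c(0, barplotData[nrow(barplotData), 3]))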
par(mar = c(5.1, 10, 4.1, 2.1))
labels <- format(ticks, big.mark = ",", scientific = FALSE)
barplot(bp, ylab = "", xlab = "", las = 1, axes = FALSE, cex.names = 0.7, main = "Property and Crop Losses (by Event Type)", 
    col = c("red", "blue"), xlim = c(0, barplotData[nrow(barplotData), 3] * 
        1.1), horiz = TRUE)
legend("right", legend = c("Property", "Crop"), cex = 0.8, y.intersp = 0.8, 
    fill = c("red", "blue"))
axis(side = 1, tck = -0.015, labels = NA)
axis(side = 1, lwd = 0, line = -0.7, las = 1, at = ticks, labels = labels)
mtext(side = 1, "Cost ($ billion)", line = 1.5)

[Figure: horizontal stacked bar chart, "Property and Crop Losses (by Event Type)"; x-axis: Cost ($ billion)]

The above stacked bar chart indicates that tornadoes are once again the most significant weather-related event, this time in terms of property losses. Drought, however, is the most significant driver of crop losses. This concludes our report.