Coursera Reproducible Research Peer Assessment Assignment 2

Synopsis

Weather events such as storms can have a great impact on public health and property.

This project examines data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The events in the database start in the year 1950 and end in November 2011. The questions I would like to address are: 1) Which types of events are most harmful to population health? 2) Which types of events have the greatest economic consequences?

The original data can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

First load packages:

library(knitr)
library(ggplot2)
opts_chunk$set(echo = TRUE)

Data Processing

Download the file, extract it to the working directory, then load it:

data <- read.csv("repdata-data-StormData.csv", 
                   header = TRUE, sep = ",", dec = ".")

The dataset contains 902,297 observations of 37 variables.
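As an alternative to extracting the archive by hand, read.csv() can read a bzip2-compressed file directly, because R's file connections detect the compression when reading. The following is a minimal sketch, not part of the original analysis: the helper name load_storm_data and the default file name StormData.csv.bz2 are assumptions chosen for illustration.

```r
# Hypothetical helper: download the archive if needed, then read it
# without extracting it first.
load_storm_data <- function(path = "StormData.csv.bz2", url = NULL) {
  # Only download when the file is missing and a URL was supplied
  if (!file.exists(path) && !is.null(url)) {
    download.file(url, destfile = path, mode = "wb")
  }
  # read.csv() opens the path via file(), which detects bzip2 compression
  read.csv(path, header = TRUE)
}

# Possible usage (requires network access on the first call):
# data <- load_storm_data(
#   url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2")
# dim(data)
```

Calling dim(data) afterwards is a quick way to confirm that the number of observations and variables matches the figures reported here.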

Results

1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Let’s look at injuries and fatalities in relation to event type:

event.type <- data$EVTYPE
injure <- data$INJURIES
fatal <- data$FATALITIES

combined.harm <- injure + fatal

#subset a portion from the original data
sub.data <- data.frame(event.type, combined.harm)

sum.harm <- aggregate(combined.harm ~ event.type, sub.data, sum)

#indexing for the order
order <- order(sum.harm$combined.harm, decreasing = TRUE)
order.sum.harm <- sum.harm[order,] 
colnames(order.sum.harm) <- c("Events", 
                              "Total_Injuries_and_Fatalities")
high.20.harm <- head(order.sum.harm, 20)

#order the subset
order.harm.factor <- reorder(high.20.harm$Events, 
                             -high.20.harm$Total_Injuries_and_Fatalities)
head(high.20.harm)
##             Events Total_Injuries_and_Fatalities
## 834        TORNADO                         96979
## 130 EXCESSIVE HEAT                          8428
## 856      TSTM WIND                          7461
## 170          FLOOD                          7259
## 464      LIGHTNING                          6046
## 275           HEAT                          3037

The events that cause the most combined injuries and fatalities are TORNADO (96,979), EXCESSIVE HEAT (8,428), TSTM WIND (7,461), and so on.
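The combined total counts an injury and a fatality equally. To look at the two outcomes separately, aggregate() can sum several columns at once through cbind() in the formula. The sketch below runs on a small made-up data frame (the numbers are invented for illustration and are not taken from the NOAA data); only the column names EVTYPE, FATALITIES and INJURIES come from the dataset.

```r
# Toy data standing in for the EVTYPE, FATALITIES and INJURIES columns
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 5, 1, 0, 3),
  INJURIES   = c(40, 1, 10, 4, 0)
)

# Sum both outcomes per event type in a single call
by.type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities first, breaking ties by injuries
by.type <- by.type[order(-by.type$FATALITIES, -by.type$INJURIES), ]
print(by.type)
```

On the toy data, HEAT ranks first by fatalities while TORNADO has by far the most injuries, which shows how a combined total can hide differences between the two measures.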

Let’s visualize this in a plot:

# build the bar chart explicitly; stat = "identity" plots the pre-computed totals
plot1 <- ggplot(high.20.harm, 
                aes(x = order.harm.factor, y = Total_Injuries_and_Fatalities)) + 
         geom_bar(stat = "identity")
plot1 + labs(title = "Total Injuries and Fatalities versus Events",
             x = "Event Types", y = "Number of Injuries/Fatalities") + 
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

2: Across the United States, which types of events have the greatest economic consequences?

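First, let’s subset the property and crop damage columns by event type. The sums below use the raw PROPDMG and CROPDMG values, without the unit multipliers stored in the PROPDMGEXP and CROPDMGEXP columns: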

prop.damage <- data$PROPDMG
crop.damage <- data$CROPDMG
combined.damage <- prop.damage + crop.damage

#subset a portion from the original data
sub.data2 <- data.frame(event.type, combined.damage)

sum.damage <- aggregate(combined.damage ~ event.type, sub.data2,sum)

#indexing for the order
order2 <- order(sum.damage$combined.damage, decreasing = TRUE)
order.sum.damage <- sum.damage[order2,] 
colnames(order.sum.damage) <- c("Events", "Total_Economy_Damage")
high.20.damage <- head(order.sum.damage, 20)

#order the subset
order.damage.factor <- reorder(high.20.damage$Events, 
                               -high.20.damage$Total_Economy_Damage)
head(high.20.damage)
##                Events Total_Economy_Damage
## 834           TORNADO            3312276.7
## 153       FLASH FLOOD            1599325.1
## 856         TSTM WIND            1445168.2
## 244              HAIL            1268289.7
## 170             FLOOD            1067976.4
## 760 THUNDERSTORM WIND             943635.6
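The totals above add the raw PROPDMG and CROPDMG magnitudes. In the NOAA data each magnitude has a companion exponent column (PROPDMGEXP, CROPDMGEXP), so a dollar figure requires scaling each value first. The sketch below shows one way to do that on made-up rows. The helper exp.multiplier is hypothetical, and the mapping is an assumption: K, M and B are taken as thousands, millions and billions, H as hundreds, and every other code is treated as a multiplier of 1.

```r
# Hypothetical helper: translate an exponent code into a numeric multiplier.
# Assumed mapping: H = 1e2, K = 1e3, M = 1e6, B = 1e9; anything else
# (blank, "?", digits, ...) falls back to 1 in this sketch.
exp.multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows (invented values) using the dataset's column names
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 500),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 10, 0),
  CROPDMGEXP = c("", "M", "K")
)

# Scale each magnitude by its multiplier before adding
toy$damage.usd <- toy$PROPDMG * exp.multiplier(toy$PROPDMGEXP) +
                  toy$CROPDMG * exp.multiplier(toy$CROPDMGEXP)
print(toy[, c("EVTYPE", "damage.usd")])
```

In the toy rows, HAIL has the largest raw PROPDMG value but the smallest scaled damage, so applying the multipliers to the real data could change the ranking shown above.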

Now I make a plot to show the damage versus the event types:

# build the bar chart explicitly; stat = "identity" plots the pre-computed totals
plot2 <- ggplot(high.20.damage, 
                aes(x = order.damage.factor, y = Total_Economy_Damage)) + 
         geom_bar(stat = "identity")
plot2 + labs(title = "Total Economic Damage versus Events",
             x = "Event Types", y = "Summed PROPDMG + CROPDMG (unscaled)") + 
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
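The damage table lists TSTM WIND and THUNDERSTORM WIND as separate event types, although the two labels appear to describe the same kind of event, so their totals are split across two rows. A possible cleanup step is to normalize the labels before aggregating. The helper below is a hypothetical sketch that handles only case, stray whitespace and the TSTM abbreviation; it is not a complete cleaning scheme for EVTYPE.

```r
# Hypothetical helper: collapse obvious spelling variants of event labels
normalize.evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))      # uniform case, no outer spaces
  x <- gsub("[[:space:]]+", " ", x)          # squeeze repeated whitespace
  x <- sub("^TSTM ", "THUNDERSTORM ", x)     # expand the TSTM abbreviation
  x
}

labels <- normalize.evtype(c("TSTM WIND", " thunderstorm  wind", "HAIL"))
print(labels)
```

Applied to the real data, this would be something like data$EVTYPE <- normalize.evtype(data$EVTYPE) before the aggregate() calls, so that variant spellings are summed together.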

Conclusion

Between 1950 and 2011, tornadoes rank first on both measures: they account for the most combined injuries and fatalities (96,979) and for the largest summed PROPDMG and CROPDMG values.