Reproducible Research: Peer Graded Assignment: Course Project 2

Synopsis

We analyze the impact of storms and other severe weather events on public health and the economy, based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The events in the database start in 1950 and end in November 2011. We use the recorded fatalities, injuries, property damage, and crop damage to decide which types of events are most harmful to population health (fatalities and injuries) and to the economy (property and crop damage). We found that thunderstorm wind (TSTM WIND) was the most harmful event type with respect to population health, in terms of both fatalities and injuries, while hurricanes/typhoons (HURRICANE/TYPHOON) had the greatest economic consequences in terms of damage amounts.

Data Processing

Download & Read Data

cache = TRUE
echo = TRUE
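if(!file.exists("stormData.csv.bz2")) {
    # mode = "wb" requests a binary transfer so the compressed file is not altered on Windows
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = "stormData.csv.bz2", mode = "wb")
}
# read.csv() reads from the bzfile() connection without unpacking the archive first
NOAA <- read.csv(bzfile("stormData.csv.bz2"), sep = ",", header = TRUE)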
## Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
## na.strings, : EOF within quoted string
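
Because the import above raised a warning, it may be worth confirming that the columns arrived with the expected types before going further. The following is only a minimal sketch on a made-up three-column file (the temporary file and the `toy` data frame are illustrative assumptions, not part of the original analysis); it shows how `stringsAsFactors = FALSE` and `str()` can be used for such a check.

```r
# Write a tiny stand-in CSV so the example runs without the NOAA download
tmp <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,FATALITIES,PROPDMGEXP",
             "TORNADO,0,K",
             "FLOOD,2,M"), tmp)

# Keep text columns as character vectors instead of factors
toy <- read.csv(tmp, stringsAsFactors = FALSE)

# Inspect the row count and the type of every column
str(toy)
```

Running the same `str()` call on `NOAA` would show whether the count columns were read as numbers or as factors.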

Select Needed Columns – Subset (NOAA) storm database

We only need the following columns for this analysis: ‘EVTYPE’, ‘FATALITIES’, ‘INJURIES’, ‘PROPDMG’, ‘PROPDMGEXP’, ‘CROPDMG’, ‘CROPDMGEXP’.

cache = TRUE
echo = TRUE
tidyNOAA <- NOAA[,c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
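# Go through as.character() first: if read.csv() imported a column as a factor,
# as.numeric() alone would return the factor's level codes, not the recorded values
tidyNOAA$FATALITIES <- as.numeric(as.character(tidyNOAA$FATALITIES))
tidyNOAA$INJURIES <- as.numeric(as.character(tidyNOAA$INJURIES))
tidyNOAA$PROPDMG <- as.numeric(as.character(tidyNOAA$PROPDMG))
tidyNOAA$CROPDMG <- as.numeric(as.character(tidyNOAA$CROPDMG))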
head(tidyNOAA)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO      19282    19199   19216          K   17967           
## 2 TORNADO      19282    18909   19130          K   17967           
## 3 TORNADO      19282    19229   19216          K   17967           
## 4 TORNADO      19282    19229   19130          K   17967           
## 5 TORNADO      19282    19229   19130          K   17967           
## 6 TORNADO      19282    19368   19130          K   17967
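
A note on the numeric conversion: when a column has been imported as a factor, `as.numeric()` returns the internal level codes rather than the recorded values. Large numbers that barely vary from row to row, as in the FATALITIES column printed above, are a typical symptom. The sketch below uses a made-up toy vector (not the NOAA data) to show the difference; checking `str(tidyNOAA)` before converting would confirm whether this applies here.

```r
# Toy factor standing in for a numeric column that was imported as a factor
x <- factor(c("0", "15", "2", "15"))

# Direct conversion yields the positions of the levels ("0", "15", "2")
codes <- as.numeric(x)

# Converting to character first recovers the numbers that were recorded
values <- as.numeric(as.character(x))

print(codes)   # 1 2 3 2
print(values)  # 0 15 2 15
```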

Data Transformations

The PROPDMG and CROPDMG columns contain damage values, and the PROPDMGEXP and CROPDMGEXP columns contain the units of PROPDMG and CROPDMG respectively (i.e. “H” = hundreds, “K” = thousands, “M” = millions and “B” = billions). To facilitate calculations, we must multiply PROPDMG and CROPDMG by their units to obtain the corresponding dollar amounts.
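
The unit rule described above can also be written as a single lookup. This is only a sketch under stated assumptions: `unit_multiplier`, `to_amount`, and the `toy` data frame are hypothetical names introduced for illustration, and, like the conversion in this report, any unit other than H, K, M or B is treated as contributing an amount of 0.

```r
# Multiplier for each recognised unit code
unit_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: value * multiplier, with 0 for blank or unrecognised units
to_amount <- function(value, unit) {
    mult <- unit_multiplier[as.character(unit)]
    mult[is.na(mult)] <- 0
    unname(value * mult)
}

# Made-up rows, not taken from the NOAA data
toy <- data.frame(DMG = c(25, 2.5, 1, 3),
                  EXP = c("K", "M", "", "B"),
                  stringsAsFactors = FALSE)
toy$AMT <- to_amount(toy$DMG, toy$EXP)
print(toy)
```

Here 25 with unit K becomes 25000, 2.5 with unit M becomes 2500000, and the row with a blank unit becomes 0.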

Conversion of Property Damage (PROPDMG) with its corresponding units (PROPDMGEXP) - i.e. apply the H, K, M, B units to fill a newly created Property Damage Amount column (PROPDMGAMT)

cache = TRUE
echo = TRUE
# Rows whose unit is not H, K, M or B keep an amount of 0
tidyNOAA$PROPDMGAMT <- 0
propUnits <- c(H = 10^2, K = 10^3, M = 10^6, B = 10^9)
for (u in names(propUnits)) {
    rows <- which(tidyNOAA$PROPDMGEXP == u)
    tidyNOAA$PROPDMGAMT[rows] <- as.numeric(tidyNOAA$PROPDMG[rows]) * propUnits[[u]]
}

Conversion of Crop Damage (CROPDMG) with its corresponding units (CROPDMGEXP) - i.e. apply the H, K, M, B units to fill a newly created Crop Damage Amount column (CROPDMGAMT)

cache = TRUE
echo = TRUE
# Rows whose unit is not H, K, M or B keep an amount of 0
tidyNOAA$CROPDMGAMT <- 0
cropUnits <- c(H = 10^2, K = 10^3, M = 10^6, B = 10^9)
for (u in names(cropUnits)) {
    rows <- which(tidyNOAA$CROPDMGEXP == u)
    tidyNOAA$CROPDMGAMT[rows] <- as.numeric(tidyNOAA$CROPDMG[rows]) * cropUnits[[u]]
}

head(tidyNOAA)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO      19282    19199   19216          K   17967           
## 2 TORNADO      19282    18909   19130          K   17967           
## 3 TORNADO      19282    19229   19216          K   17967           
## 4 TORNADO      19282    19229   19130          K   17967           
## 5 TORNADO      19282    19229   19130          K   17967           
## 6 TORNADO      19282    19368   19130          K   17967           
##   PROPDMGAMT CROPDMGAMT
## 1   19216000          0
## 2   19130000          0
## 3   19216000          0
## 4   19130000          0
## 5   19130000          0
## 6   19130000          0

Results

Q1 Across the United States, which types of events (EVTYPE) are most harmful with respect to population health?

A1.1. Plot number of fatalities (FATALITIES) by the most harmful event type (EVTYPE)

cache = TRUE
echo = TRUE
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.5
fatalities <- aggregate(FATALITIES ~ EVTYPE, data=tidyNOAA, sum)
fatalities <- fatalities[order(-fatalities$FATALITIES), ][1:10, ]
fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)

ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) + 
    geom_bar(stat = "identity", fill = "red") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Type of Events") + ylab("Fatalities") + ggtitle("Number of Fatalities Ranked by Top 10 Weather Events")

A1.2. Plot number of injuries (INJURIES) by the most harmful event type (EVTYPE)

cache = TRUE
echo = TRUE
library(ggplot2)

injuries <- aggregate(INJURIES ~ EVTYPE, data=tidyNOAA, sum)
injuries <- injuries[order(-injuries$INJURIES), ][1:10, ]
injuries$EVTYPE <- factor(injuries$EVTYPE, levels = injuries$EVTYPE)

ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) + 
    geom_bar(stat = "identity", fill = "red") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Type of Events") + ylab("Injuries") + ggtitle("Number of Injuries Ranked by Top 10 Weather Events")
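
The fatalities and injuries rankings follow the same three steps: sum by event type, sort in descending order, and keep the top rows with EVTYPE turned into an ordered factor for plotting. As a sketch only, those steps could be wrapped in one function. The name `top_events` and the `toy` data frame are hypothetical and introduced purely for illustration.

```r
# Hypothetical helper: total one column per event type and keep the n largest
top_events <- function(data, column, n = 10) {
    totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
    names(totals)[2] <- column
    totals <- head(totals[order(-totals[[column]]), ], n)
    # Fix the level order so a bar chart keeps the ranking instead of sorting alphabetically
    totals$EVTYPE <- factor(totals$EVTYPE, levels = as.character(totals$EVTYPE))
    totals
}

# Made-up rows, not taken from the NOAA data
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
                  FATALITIES = c(3, 1, 2, 4, 0))
top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

With the real data, `top_events(tidyNOAA, "FATALITIES")` and `top_events(tidyNOAA, "INJURIES")` would be the corresponding calls.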

Q2 Across the United States, which types of events have the greatest economic consequences?

A2 Plot total damage amount (PROPDMGAMT + CROPDMGAMT) by the types of events (EVTYPE) that caused the most economic damage

cache = TRUE
echo = TRUE
library(ggplot2)

damages <- aggregate(PROPDMGAMT + CROPDMGAMT ~ EVTYPE, data=tidyNOAA, sum)
names(damages) = c("EVTYPE", "TOTALDAMAGE")
damages <- damages[order(-damages$TOTALDAMAGE), ][1:10, ]
damages$EVTYPE <- factor(damages$EVTYPE, levels = damages$EVTYPE)

ggplot(damages, aes(x = EVTYPE, y = TOTALDAMAGE)) + 
    geom_bar(stat = "identity", fill = "red") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Type of Events") + ylab("Damages (US$)") +
    ggtitle("Property & Crop Damages Ranked by Top 10 Weather Events")
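
The damage total above relies on `aggregate()` evaluating the sum on the left-hand side of the formula row by row before summing within each event type. The resulting column is named after the expression, which is why the code renames it afterwards. A small sketch on made-up data (the `toy` data frame and its values are assumptions for illustration only):

```r
# Made-up rows, not taken from the NOAA data
toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "FLOOD"),
                  PROPDMGAMT = c(100, 50, 10),
                  CROPDMGAMT = c(5, 0, 20))

# Property and crop amounts are added per row, then summed per event type
res <- aggregate(PROPDMGAMT + CROPDMGAMT ~ EVTYPE, data = toy, sum)

# Give the computed column a short, usable name
names(res) <- c("EVTYPE", "TOTALDAMAGE")
print(res)
```

In this toy case FLOOD totals 135 (100 + 5 + 10 + 20) and TORNADO totals 50.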

Conclusion

In conclusion, we found that thunderstorm wind (TSTM WIND) was the most harmful event type with respect to population health, in terms of both fatalities and injuries, while hurricanes/typhoons (HURRICANE/TYPHOON) had the greatest economic consequences in terms of damage amounts.