Synopsis: The following analysis identifies which types of storm events (tornadoes, floods, hail, etc.) have been most harmful to population health and have caused the most property and crop damage in the US from 1950 through November 2011. The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and can be downloaded here: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. The analysis answers the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

The results show that tornadoes are the most harmful event type for both population health and the economy.

Check the R Environment

Before starting the analysis, record the R session information so the results can be reproduced.

print(sessionInfo())

## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] magrittr_1.5    formatR_1.2.1   tools_3.2.2     htmltools_0.2.6
##  [5] yaml_2.1.13     stringi_0.5-5   rmarkdown_0.8   knitr_1.11     
##  [9] stringr_1.0.0   digest_0.6.8    evaluate_0.7.2

Load a Required Package

I use the dplyr package for grouping and summarizing.

library(dplyr)

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
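For readers without dplyr, the same group, sum, and sort steps used in the Results section can be sketched with base R's aggregate() and order(). The rows below are hypothetical and only reuse the storm data's column names; this is an illustrative equivalent, not the method used in this report.

```r
# Hypothetical rows with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 0, 0),
  INJURIES   = c(10, 3, 5, 0)
)

# Sum fatalities and injuries within each event type (like group_by + summarise)
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined health impact (like mutate(harmfulness = ...))
totals$harmfulness <- totals$FATALITIES + totals$INJURIES

# Sort from most to least harmful (like arrange(desc(harmfulness)))
totals <- totals[order(totals$harmfulness, decreasing = TRUE), ]
print(totals)
```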

Data Processing

Download and Read the Dataset

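Download the bzip2-compressed CSV file containing the storm data and read it into R. bzfile() decompresses the file on the fly, so no separate unzip step is needed. Leading and trailing white space is not stripped during the read, so EVTYPE values are used exactly as recorded.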

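# Download the bzip2-compressed CSV once; mode = "wb" keeps the binary file intact on Windows
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("StormData.csv.bz2")) download.file(url, "StormData.csv.bz2", mode = "wb")
dataset <- read.csv(bzfile("StormData.csv.bz2"), header = TRUE)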
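EVTYPE labels are free text, so the same event type can appear with different letter case or with stray leading or trailing spaces, which splits its totals across several groups. A minimal base R sketch of collapsing those two kinds of variation is shown below; the label vector is hypothetical, and the helper name normalize_evtype is illustrative, not part of the original analysis. Spelling variants such as "TSTM WIND" versus "THUNDERSTORM WIND" would still need a manual mapping.

```r
# Hypothetical raw labels illustrating case and white-space variants
raw_evtype <- c("TORNADO", " TORNADO", "Tornado ", "TSTM WIND", "tstm wind")

# Hypothetical helper: trim surrounding white space, then upper-case each label
normalize_evtype <- function(x) toupper(trimws(as.character(x)))

clean_evtype <- normalize_evtype(raw_evtype)
print(table(clean_evtype))

# On the real data it could be applied before grouping, e.g.:
# dataset$EVTYPE <- normalize_evtype(dataset$EVTYPE)
```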

Check the Data Structure

dim(dataset)

## [1] 902297     37

names(dataset)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

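From the column names, the variables of interest are EVTYPE, FATALITIES, INJURIES, PROPDMG, and CROPDMG. FATALITIES and INJURIES measure the effect on population health, while PROPDMG and CROPDMG measure the economic effect. Note that PROPDMG and CROPDMG are not plain dollar amounts: the magnitude of each value is given by PROPDMGEXP and CROPDMGEXP ("K" for thousands, "M" for millions, "B" for billions). The analysis below sums the unscaled values, so its economic totals are not in dollars.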
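Converting PROPDMG and CROPDMG to dollars means multiplying each value by the magnitude encoded in PROPDMGEXP or CROPDMGEXP. A minimal base R sketch follows, assuming that only "K", "M", and "B" (in either case) are meaningful and treating every other code as a multiplier of 1; that is a simplifying assumption, since other codes also appear in the data. The helper exp_to_multiplier and the example values are hypothetical, and the results in this report do not apply this conversion.

```r
# Hypothetical helper: map magnitude codes to numeric multipliers.
# Assumption: anything other than K/M/B (blank, digits, symbols, NA) counts as 1.
exp_to_multiplier <- function(exp_code) {
  code <- toupper(trimws(as.character(exp_code)))
  multiplier <- rep(1, length(code))
  multiplier[code %in% "K"] <- 1e3
  multiplier[code %in% "M"] <- 1e6
  multiplier[code %in% "B"] <- 1e9
  multiplier
}

# Hypothetical example values, not taken from the dataset
propdmg    <- c(25, 2.5, 1.55, 10)
propdmgexp <- c("K", "m", "B", "")
prop_dollars <- propdmg * exp_to_multiplier(propdmgexp)
print(prop_dollars)

# On the real data, e.g.:
# dataset$PROP_USD <- dataset$PROPDMG * exp_to_multiplier(dataset$PROPDMGEXP)
```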

Results

The Most Harmful Storm Event (Health)

q1.df <- dataset %>% group_by(EVTYPE) %>% 
         summarise(fatalities = sum(FATALITIES), injuries = sum(INJURIES)) %>% 
         mutate(harmfulness = fatalities + injuries) %>% arrange(desc(harmfulness))

head(q1.df, 5)

## Source: local data frame [5 x 4]
## 
##           EVTYPE fatalities injuries harmfulness
## 1        TORNADO       5633    91346       96979
## 2 EXCESSIVE HEAT       1903     6525        8428
## 3      TSTM WIND        504     6957        7461
## 4          FLOOD        470     6789        7259
## 5      LIGHTNING        816     5230        6046

barplot(q1.df$harmfulness[1:5], names.arg = q1.df$EVTYPE[1:5],
        xlab = "Event Type", ylab = "Fatalities + Injuries(person)", 
        main = "Top 5 Most Harmful Storm Events (Health)")

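The figure above shows that tornadoes are by far the most harmful event type for population health: 96,979 fatalities and injuries combined, more than ten times the total for excessive heat (8,428), the next event type.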
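The harmfulness score adds fatalities and injuries with equal weight, so the far more numerous injuries dominate the ranking. A quick sensitivity check, sketched in base R with the five totals copied from the head(q1.df, 5) output, ranks the same events by each measure separately. It is restricted to these five rows, so an event type outside them could still rank higher on fatalities alone.

```r
# Totals copied from the head(q1.df, 5) output above
top5 <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  fatalities = c(5633, 1903, 504, 470, 816),
  injuries   = c(91346, 6525, 6957, 6789, 5230),
  stringsAsFactors = FALSE
)

# Rank by each measure on its own instead of by their unweighted sum
by_fatalities <- top5$EVTYPE[order(top5$fatalities, decreasing = TRUE)]
by_injuries   <- top5$EVTYPE[order(top5$injuries,   decreasing = TRUE)]
print(by_fatalities)
print(by_injuries)
```

Within these five rows, tornadoes lead on both measures, while the order of the remaining events changes depending on which measure is used.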

The Most Harmful Storm Event (Economic)

q2.df <- dataset %>% group_by(EVTYPE) %>% 
         summarise(crop = sum(CROPDMG), property = sum(PROPDMG)) %>% 
         mutate(econ_dmg = crop + property) %>% arrange(desc(econ_dmg))

head(q2.df, 5)

## Source: local data frame [5 x 4]
## 
##        EVTYPE     crop  property econ_dmg
## 1     TORNADO 100018.5 3212258.2  3312277
## 2 FLASH FLOOD 179200.5 1420124.6  1599325
## 3   TSTM WIND 109202.6 1335965.6  1445168
## 4        HAIL 579596.3  688693.4  1268290
## 5       FLOOD 168037.9  899938.5  1067976

barplot(q2.df$econ_dmg[1:5], names.arg = q2.df$EVTYPE[1:5],
        xlab = "Event Type", ylab = "Crop + Property Damage (unscaled)", 
        main = "Top 5 Most Harmful Storm Events (Economic)")

Tornadoes also have the largest combined crop and property damage (3,312,277 in the units of PROPDMG + CROPDMG), about twice that of flash floods (1,599,325).

Reproducible Research: Peer Assessment 2

SONG Jaehyun

September 26, 2015