1. Synopsis

This report analyzes storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA) to answer two questions:

  1. Which types of events are most harmful to population health in the US?
  2. Which types of events have the greatest economic consequences in the US?

2. Data Introduction

There are 902,297 rows of storm data and 39 variables. The data can be downloaded from Storm Data. For more detail, refer to the Storm Data Document.

3. Data Processing

3.1 Year Selected

This research analyzes the storm data from 1996 onward, because only 4 types of events were recorded in the database before 1996. Restricting the analysis to this period makes it possible to compare all event types.
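
The coverage of event types per year can be checked by counting the distinct EVTYPE values in each year. The following is a minimal sketch on made-up rows: the data frame `toy` and its values are hypothetical, and only the column names BGN_DATE and EVTYPE come from the storm data.

```r
# Hypothetical toy rows; only the column names match the real storm data.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1950 0:00:00",
               "1/6/1996 0:00:00", "2/11/1996 0:00:00", "7/4/1996 0:00:00"),
  EVTYPE   = c("TORNADO", "TORNADO", "FLOOD", "HAIL", "FLOOD"),
  stringsAsFactors = FALSE
)

# Parse the date part of BGN_DATE (the trailing time is ignored) and extract the year.
toy$year <- as.integer(format(as.Date(toy$BGN_DATE, "%m/%d/%Y"), "%Y"))

# Number of distinct event types recorded in each year.
types_per_year <- tapply(toy$EVTYPE, toy$year, function(x) length(unique(x)))
print(types_per_year)
```

Applied to the full data set, a table like this would show in which year the number of recorded event types starts to grow.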

3.2 Which events are most harmful to population health?

The sum of injuries and fatalities (INJURIES + FATALITIES) is the measure used to judge how harmful an event type is.
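
As an illustration of this measure, the sketch below computes it on a few made-up rows using base R only. The data frame `toy` and its numbers are hypothetical; the column names follow the storm data.

```r
# Hypothetical toy rows; EVTYPE, INJURIES and FATALITIES mirror the real column names.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  INJURIES   = c(10, 2, 5, 0),
  FATALITIES = c(1, 3, 0, 0),
  stringsAsFactors = FALSE
)

# Sum both columns per event type, then add them into one harm score.
harm <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
harm$total <- harm$INJURIES + harm$FATALITIES

# Drop event types with no recorded harm and sort in descending order.
harm <- harm[harm$total > 0, ]
harm <- harm[order(-harm$total), ]
print(harm)
```

Filtering on the combined total keeps event types that caused fatalities but no injuries, or the reverse.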

3.3 Which types of events have the greatest economic consequences in the US?

The sum of crop damage and property damage, each multiplied by its unit (CROPDMG * CROPDMGEXP + PROPDMG * PROPDMGEXP), is the measure used to judge the economic consequences of an event type.
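
The unit columns hold letter codes rather than numbers, so they have to be converted before multiplying. One possible way to do this is a named-vector lookup, sketched below on made-up rows. The helper `exp_to_multiplier` and the data frame `toy` are hypothetical, and treating every code other than K, M and B as 0 is an assumption of this sketch.

```r
# Hypothetical helper: map the unit codes K, M, B (any case) to multipliers.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  # Assumption: blank or unrecognised codes count as 0.
  mult[is.na(mult)] <- 0
  mult
}

# Hypothetical toy rows; the column names follow the storm data.
toy <- data.frame(
  PROPDMG = c(25, 2.5, 0), PROPDMGEXP = c("K", "M", ""),
  CROPDMG = c(0, 1, 5),    CROPDMGEXP = c("", "k", "B"),
  stringsAsFactors = FALSE
)

# Total damage in dollars = property damage * unit + crop damage * unit.
toy$total <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
             toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
print(toy$total)
```

A lookup keeps the mapping in one place, which makes it easier to review than a chain of string replacements.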

3.4 Code

# 1. Read the data
stormData <- read.csv("repdata-data-StormData.csv.bz2", header=TRUE, sep=",")
library(dplyr)
# 2. Filter the data, 1996 and after
stormData$date <- as.Date( as.character(stormData$BGN_DATE), "%m/%d/%Y")
after_1995_stromData <- subset(stormData, date > "1995-12-31")

# 3. Get injuries + fatalities
# 3.1 Sum by EVTYPE
after_1965_by_type1 <- group_by(after_1995_stromData, EVTYPE)
after_1965_sum <- summarize(after_1965_by_type1, INJURIES=sum(INJURIES), FATALITIES=sum(FATALITIES))

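# 3.2 Remove event types with no recorded injuries or fatalities
# (the conditions must be combined in one expression, because the
#  third argument of subset() is `select`, not a second filter)
after_1965_sum <- subset(after_1965_sum, INJURIES > 0 | FATALITIES > 0)

# 3.3 Calculate the total and sort in descending order
after_1965_sum$total <- (after_1965_sum$INJURIES + after_1965_sum$FATALITIES)
after_1965_sum <- arrange(after_1965_sum, desc(total))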

# 4. Get property damage + crop damage
# 4.1 Replace the units k, m, b and blank with 1000, 1000000, 1000000000 and 0
after_1995_stromData$PROPDMGEXP <- gsub("k", 1000, after_1995_stromData$PROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$PROPDMGEXP <- gsub("m", 1000000, after_1995_stromData$PROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$PROPDMGEXP <- gsub("b", 1000000000, after_1995_stromData$PROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$PROPDMGEXP <- as.numeric(gsub("^$", 0, after_1995_stromData$PROPDMGEXP))

after_1995_stromData$CROPDMGEXP <- gsub("k", 1000, after_1995_stromData$CROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$CROPDMGEXP <- gsub("m", 1000000, after_1995_stromData$CROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$CROPDMGEXP <- gsub("b", 1000000000, after_1995_stromData$CROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$CROPDMGEXP <- as.numeric(gsub("^$", 0, after_1995_stromData$CROPDMGEXP))

# 4.2 total = property damage * unit + crop damage * unit
# (as.numeric() is kept so the line also works if the unit columns are still character)
after_1995_stromData$total <- after_1995_stromData$PROPDMG * as.numeric(after_1995_stromData$PROPDMGEXP) +
  after_1995_stromData$CROPDMG * as.numeric(after_1995_stromData$CROPDMGEXP)

# 4.3 group by and summarize
after_1965_by_type2 <- group_by(after_1995_stromData, EVTYPE)
after_1995_stromData_dmage<- summarize(after_1965_by_type2, total=sum(total))

# 4.4 remove 0 and sort
after_1995_stromData_dmage<- subset(after_1995_stromData_dmage, total > 0)
after_1995_stromData_dmage<- arrange(after_1995_stromData_dmage, desc(total))

4. Results

4.1 Which events are most harmful to population health?

library(ggplot2)
sp <- ggplot(data=head(after_1965_sum), aes(x=EVTYPE, y=total))
sp + geom_bar(stat="identity") + ggtitle("Top 6 most harmful events to population health") + ylab("Total injuries and fatalities")

The figure above shows that TORNADO is the event type most harmful to population health.

4.2 Which types of events have the greatest economic consequences in the US?

sp <- ggplot(data=head(after_1995_stromData_dmage), aes(x=EVTYPE, y=total))
sp + geom_bar(stat="identity") + ggtitle("Top 6 events with the greatest economic consequences in the US") + ylab("Total damage in dollars")

The figure above shows that FLOOD has the greatest economic consequences in the US.
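
Both bar charts place the event types on the x axis in alphabetical order, so the ranking has to be read from the bar heights. One possible refinement is to turn EVTYPE into a factor whose levels are sorted by the total, for example with reorder(). The sketch below shows this on made-up numbers: the data frame `top` and its values are hypothetical.

```r
# Hypothetical totals; only the column names EVTYPE and total follow the report.
top <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
  total  = c(5, 16, 9),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the given values;
# negating total puts the event type with the largest total first.
top$EVTYPE <- reorder(top$EVTYPE, -top$total)
print(levels(top$EVTYPE))
```

In the plots above, the same idea would be written as aes(x=reorder(EVTYPE, -total), y=total), so that the bars appear from the largest total to the smallest.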