The NOAA storm database currently contains data from January 1950 to October 2014, as entered by NOAA’s National Weather Service (NWS). Due to changes in the data collection and processing procedures over time, there are unique periods of record available depending on the event type. The following analysis shows how different types of weather events affect the United States, either by costing lives or by damaging property and crops.

Data Processing

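library(ggplot2)
suppressPackageStartupMessages(library(dplyr, quietly = TRUE))
# Load the raw data and keep only the columns used in this analysis
mydata <- read.csv("repdata-data-StormData.csv", sep = ",", header = TRUE)
mydata <- tbl_df(mydata)  # tbl_df() is deprecated in newer dplyr; as_tibble() is the replacement
mydata <- mydata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Lower-case event types and exponent codes so that, e.g., "K" and "k" match
mydata$EVTYPE <- tolower(mydata$EVTYPE)
mydata$PROPDMGEXP <- tolower(mydata$PROPDMGEXP)
mydata$CROPDMGEXP <- tolower(mydata$CROPDMGEXP)
# Lookup tables mapping each exponent code to a numeric multiplier
PROPDMGEXP <- c("", "?", "+", "-", "h", "k", "m", "b", "0", "1", "2", "3", "4", "5", "6", "7", "8")
CROPDMGEXP <- PROPDMGEXP
prop_trans <- c(0, 1, 1, 1, 100, 1000, 1000000, 1000000000, 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000)
crop_trans <- prop_trans
prop_exp <- data.frame(PROPDMGEXP, prop_trans, stringsAsFactors = FALSE)  # keep codes as character for the join
crop_exp <- data.frame(CROPDMGEXP, crop_trans, stringsAsFactors = FALSE)
# Attach the multipliers, then drop the exponent code columns (5 and 7)
mydata <- left_join(mydata, prop_exp, by = "PROPDMGEXP")
mydata <- left_join(mydata, crop_exp, by = "CROPDMGEXP")
mydata <- mydata[, c(-5, -7)]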

I load the dataset into the workspace and convert the event types and exponent codes to lower case, so that entries differing only in case are treated as the same value. I then build a lookup table that translates each exponent code into a numeric multiplier and join it to the data, so that each damage figure can later be multiplied by its unit.
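As a minimal sketch of the unit conversion described above, the snippet below applies the same kind of lookup to three made-up rows. It uses a named vector in base R rather than the `left_join` used in the analysis; the data frame `toy` and the names `multiplier` and `dollars` are illustrative assumptions, not part of the original code.

```r
# Multipliers for a few exponent codes (a subset of the lookup table above)
multiplier <- c(h = 100, k = 1000, m = 1e6, b = 1e9)

# Made-up rows, for illustration only
toy <- data.frame(PROPDMG = c(25, 2.5, 1),
                  PROPDMGEXP = c("K", "m", "B"),
                  stringsAsFactors = FALSE)

# Lower-case the code, look up its multiplier, and scale the damage figure
toy$dollars <- toy$PROPDMG * unname(multiplier[tolower(toy$PROPDMGEXP)])
toy$dollars
```

A damage figure of 25 with code "K" becomes 25,000 dollars, and 2.5 with code "m" becomes 2.5 million dollars.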

Results (damage to population health)

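# Total casualties per event: fatalities plus injuries
health <- mutate(mydata, damage = FATALITIES + INJURIES)
# Drop events with no casualties; keep EVTYPE (column 1) and damage (column 8)
health <- health[health$damage != 0, c(1, 8)]
by_EVTYPE <- group_by(health, EVTYPE)
health2 <- summarize(by_EVTYPE, damage = sum(damage))
# Ten event types with the most casualties
health3 <- arrange(health2, desc(damage))[1:10, ]
health3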
## Source: local data frame [10 x 2]
## 
##               EVTYPE damage
## 1            tornado  96979
## 2     excessive heat   8428
## 3          tstm wind   7461
## 4              flood   7259
## 5          lightning   6046
## 6               heat   3037
## 7        flash flood   2755
## 8          ice storm   2064
## 9  thunderstorm wind   1621
## 10      winter storm   1527
ggplot(health3, aes(EVTYPE, log(damage))) + geom_bar(stat = "identity") + coord_polar()

I used the dplyr package to transform the data: fatalities and injuries are added together and summed by the EVTYPE variable. A polar bar chart then shows the 10 event types that cause the most casualties, with the totals log-transformed (natural logarithm).
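The grouping step can be illustrated on a few made-up rows. This sketch uses base R's `aggregate` instead of `group_by` and `summarize`, so it runs without dplyr; the data frame `toy` and its values are invented for illustration.

```r
# Made-up events, for illustration only
toy <- data.frame(EVTYPE = c("tornado", "flood", "tornado", "heat"),
                  FATALITIES = c(2, 1, 0, 3),
                  INJURIES = c(10, 0, 5, 1),
                  stringsAsFactors = FALSE)

# Casualties per event, then summed within each event type
toy$damage <- toy$FATALITIES + toy$INJURIES
totals <- aggregate(damage ~ EVTYPE, data = toy, FUN = sum)

# Sort so that the most harmful event type comes first
totals <- totals[order(-totals$damage), ]
totals
```

The two tornado rows collapse into a single total of 17 casualties, which sorts ahead of heat (4) and flood (1).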

The figure shows that tornadoes are the most damaging weather events with respect to population health, with 96,979 combined fatalities and injuries in the table above.
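Because the figure plots `log(damage)`, the bars understate how far tornadoes are ahead of the other event types. A quick check using two totals copied from the top-10 table (in R, `log` is the natural logarithm; the variable names are only illustrative):

```r
# Casualty totals taken from the top-10 table above
tornado <- 96979
winter_storm <- 1527

# Ratio of the raw totals versus ratio of the log-transformed bar heights
raw_ratio <- tornado / winter_storm
log_ratio <- log(tornado) / log(winter_storm)
round(c(raw = raw_ratio, log = log_ratio), 2)
```

On the raw scale the tornado total is roughly 64 times the winter storm total, while the plotted tornado bar is only about 1.6 times as tall.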

Results (economic damage)

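# Scale each damage figure by its multiplier, then add property and crop damage
eco <- mutate(mydata, prop_dmg = PROPDMG * prop_trans, crop_dmg = CROPDMG * crop_trans, eco_dmg = prop_dmg + crop_dmg)[, c(1, 10)]
by_EVTYPE <- group_by(eco, EVTYPE)
eco_dmg <- summarize(by_EVTYPE, damage = sum(eco_dmg))
# Ten event types with the highest total damage
eco_dmg <- arrange(eco_dmg, desc(damage))[1:10, ]
eco_dmg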
## Source: local data frame [10 x 2]
## 
##               EVTYPE       damage
## 1              flood 150319678250
## 2  hurricane/typhoon  71913712800
## 3            tornado  57362333944
## 4        storm surge  43323541000
## 5               hail  18761221926
## 6        flash flood  18243990872
## 7            drought  15018672000
## 8          hurricane  14610229010
## 9        river flood  10148404500
## 10         ice storm   8967041360
ggplot(eco_dmg, aes(EVTYPE, damage)) + geom_bar(stat = "identity") + coord_polar()

I used the dplyr package to transform the data: each damage figure is multiplied by its unit, and property damage and crop damage are added together and summed by the EVTYPE variable. A polar bar chart then shows the 10 event types that are most harmful to the economy.

The most damaging weather event in terms of economic impact in the United States is flood, with about 150 billion in combined property and crop damage.
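The totals in the table are long raw numbers, which makes them hard to compare at a glance. As a small sketch, the three largest values copied from the table can be expressed in billions; the names `damage` and `billions` are illustrative.

```r
# Top three totals copied from the table above
damage <- c(flood = 150319678250,
            "hurricane/typhoon" = 71913712800,
            tornado = 57362333944)

# Express each total in billions, rounded to one decimal place
billions <- round(damage / 1e9, 1)
billions
```

Flood damage comes to about 150.3 billion, roughly twice the hurricane/typhoon total of 71.9 billion.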