The NOAA storm database currently contains data from January 1950 to October 2014, as entered by NOAA’s National Weather Service (NWS). Because data collection and processing procedures have changed over time, the period of record available differs by event type. The following analysis shows how different types of weather events affect the United States, either by killing and injuring people or by damaging property and crops.
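library(ggplot2)
suppressPackageStartupMessages(library(dplyr, quietly = TRUE))
# Read the storm data and keep only the columns used in this analysis
mydata <- read.csv("repdata-data-StormData.csv", header = TRUE, stringsAsFactors = FALSE)
mydata <- as_tibble(mydata)
mydata <- mydata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Lower-case event types and exponent codes so that case variants are counted together
mydata$EVTYPE <- tolower(mydata$EVTYPE)
mydata$PROPDMGEXP <- tolower(mydata$PROPDMGEXP)
mydata$CROPDMGEXP <- tolower(mydata$CROPDMGEXP)
# Lookup table: exponent code -> multiplier ("" counts as 0, a digit d as 10^d)
PROPDMGEXP <- c("", "?", "+", "-", "h", "k", "m", "b", as.character(0:8))
CROPDMGEXP <- PROPDMGEXP
prop_trans <- c(0, 1, 1, 1, 1e2, 1e3, 1e6, 1e9, 10^(0:8))
crop_trans <- prop_trans
prop_exp <- data.frame(PROPDMGEXP, prop_trans, stringsAsFactors = FALSE)
crop_exp <- data.frame(CROPDMGEXP, crop_trans, stringsAsFactors = FALSE)
# Attach the multipliers, then drop the now-redundant exponent code columns
mydata <- left_join(mydata, prop_exp, by = "PROPDMGEXP")
mydata <- left_join(mydata, crop_exp, by = "CROPDMGEXP")
mydata <- select(mydata, -PROPDMGEXP, -CROPDMGEXP)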
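I load the dataset into the workspace and keep only the event type, casualty and damage columns. The event types and the two exponent codes are converted to lower case so that values differing only in letter case are counted together. I then build a small lookup table that maps each exponent code to a numeric multiplier (for example "k" to 1,000 and "m" to 1,000,000), so that each damage value can be multiplied by its unit to give a dollar amount.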
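As a minimal illustration of the unit conversion described above, the sketch below applies the same code-to-multiplier table to a few made-up records. It is a stand-in, not the analysis itself: the toy data frame is hypothetical, and it swaps dplyr's left_join() for base R's match(), which performs the same lookup on a single column.

```r
# Hypothetical toy records; the real values come from the storm data CSV
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 5),
  PROPDMGEXP = c("K", "m", "", "3"),
  stringsAsFactors = FALSE
)

# Same exponent codes and multipliers as the lookup table in the analysis
codes <- c("", "?", "+", "-", "h", "k", "m", "b", as.character(0:8))
mult  <- c(0, 1, 1, 1, 1e2, 1e3, 1e6, 1e9, 10^(0:8))

# Lower-case each code, find its row in the table, and scale the damage value
idx <- match(tolower(toy$PROPDMGEXP), codes)
toy$prop_dollars <- toy$PROPDMG * mult[idx]
print(toy)
```

Here "K" turns 25 into 25,000 dollars, "m" turns 2.5 into 2,500,000, the digit code "3" turns 5 into 5 x 10^3 = 5,000, and the blank code yields 0, as in the original table. A code missing from the table would come back from match() as NA, so anyNA(idx) is a quick check that the table covers every code present in the data.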
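# People harmed per record = fatalities + injuries; keep only records with casualties
health <- mutate(mydata, damage = FATALITIES + INJURIES)
health <- health[health$damage > 0, c("EVTYPE", "damage")]
# Total by event type and keep the ten largest totals
by_EVTYPE <- group_by(health, EVTYPE)
health2 <- summarize(by_EVTYPE, damage = sum(damage))
health3 <- arrange(health2, desc(damage))[1:10, ]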
health3
## Source: local data frame [10 x 2]
##
## EVTYPE damage
## 1 tornado 96979
## 2 excessive heat 8428
## 3 tstm wind 7461
## 4 flood 7259
## 5 lightning 6046
## 6 heat 3037
## 7 flash flood 2755
## 8 ice storm 2064
## 9 thunderstorm wind 1621
## 10 winter storm 1527
ggplot(aes(EVTYPE,log(damage)),data=health3)+geom_bar(stat="identity")+coord_polar()
I used the dplyr package to transform the data: for each record the fatalities and injuries are added together, the results are summed by event type (EVTYPE), and the ten event types that harm the most people are shown in a polar bar chart, with the number of people harmed on a log scale.
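The figure shows that tornadoes are by far the most harmful weather events with respect to population health: they account for 96,979 fatalities and injuries combined, more than eleven times the total for excessive heat (8,428), which ranks second. Because the chart uses a log scale, the gap looks much smaller in the figure than it is in the table.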
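The aggregation behind this casualty ranking can also be sketched in base R. This is a hedged sketch on hypothetical toy data (the storms data frame and its values are invented for illustration), and it uses aggregate() in place of dplyr's group_by() and summarize(). The steps mirror the analysis: add fatalities and injuries, drop records with no casualties, sum by event type, and keep the largest totals.

```r
# Hypothetical toy records standing in for the storm data
storms <- data.frame(
  EVTYPE     = c("TORNADO", "Tornado", "FLOOD", "HAIL", "Flood", "HEAT"),
  FATALITIES = c(5, 1, 2, 0, 0, 3),
  INJURIES   = c(20, 4, 1, 0, 2, 0),
  stringsAsFactors = FALSE
)

storms$EVTYPE <- tolower(storms$EVTYPE)              # merge case variants
storms$damage <- storms$FATALITIES + storms$INJURIES # people harmed per record
storms <- storms[storms$damage > 0, ]                # drop records with no casualties

# Sum by event type, sort in decreasing order, keep the top two
totals <- aggregate(damage ~ EVTYPE, data = storms, FUN = sum)
top <- head(totals[order(totals$damage, decreasing = TRUE), ], 2)
print(top)
```

With these made-up numbers, tolower() merges the two tornado rows into one total of 30, which ranks ahead of flood with 5.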
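# Dollar damage per record: value x multiplier, for property and for crops
eco <- mutate(mydata, prop_dmg = PROPDMG * prop_trans, crop_dmg = CROPDMG * crop_trans,
              eco_dmg = prop_dmg + crop_dmg)[, c("EVTYPE", "eco_dmg")]
# Total by event type and keep the ten largest totals
by_EVTYPE <- group_by(eco, EVTYPE)
eco_dmg <- summarize(by_EVTYPE, damage = sum(eco_dmg))
eco_dmg <- arrange(eco_dmg, desc(damage))[1:10, ]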
eco_dmg
## Source: local data frame [10 x 2]
##
## EVTYPE damage
## 1 flood 150319678250
## 2 hurricane/typhoon 71913712800
## 3 tornado 57362333944
## 4 storm surge 43323541000
## 5 hail 18761221926
## 6 flash flood 18243990872
## 7 drought 15018672000
## 8 hurricane 14610229010
## 9 river flood 10148404500
## 10 ice storm 8967041360
ggplot(aes(EVTYPE,damage),data=eco_dmg)+geom_bar(stat="identity")+coord_polar()
I used the dplyr package to transform the data: each property and crop damage value is multiplied by its unit, the two amounts are added together, the results are summed by event type (EVTYPE), and the ten event types that are most harmful to the economy are shown in a polar bar chart on a linear scale.
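Floods are the weather events with the greatest economic impact in the United States: about $150.3 billion in combined property and crop damage, roughly twice the total for hurricanes/typhoons (about $71.9 billion).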
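The economic calculation combines two lookups per record before grouping. A minimal base-R sketch follows, assuming hypothetical toy records and a hypothetical helper to_dollars() (neither is part of the original analysis), and using split() and vapply() instead of dplyr for the per-event totals.

```r
# Same exponent codes and multipliers as the lookup table in the analysis
codes <- c("", "?", "+", "-", "h", "k", "m", "b", as.character(0:8))
mult  <- c(0, 1, 1, 1, 1e2, 1e3, 1e6, 1e9, 10^(0:8))

# Hypothetical helper: dollar amount for a damage value and its exponent code
to_dollars <- function(value, code) value * mult[match(tolower(code), codes)]

# Hypothetical toy records standing in for the storm data
storms <- data.frame(
  EVTYPE     = c("flood", "flood", "hail"),
  PROPDMG    = c(2.5, 10, 50),
  PROPDMGEXP = c("B", "M", "K"),
  CROPDMG    = c(500, 0, 3),
  CROPDMGEXP = c("M", "", "M"),
  stringsAsFactors = FALSE
)

# Property plus crop damage per record, in dollars
storms$eco_dmg <- to_dollars(storms$PROPDMG, storms$PROPDMGEXP) +
  to_dollars(storms$CROPDMG, storms$CROPDMGEXP)

# Total per event type, largest first
totals <- vapply(split(storms$eco_dmg, storms$EVTYPE), sum, numeric(1))
totals <- totals[order(totals, decreasing = TRUE)]
print(totals)
```

The first toy flood record contributes 2.5 x 10^9 + 500 x 10^6 = 3 x 10^9 dollars, so flood totals 3.01 x 10^9 against 3.05 x 10^6 for hail.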