This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
In this section, the storm data are downloaded from the NOAA storm database and read into R.
# Working directory and data source
mywd <- "~/R/JHK-DataScience-Reproducible Research/Project 2"
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file <- "StormData.csv.bz2"
if (!file.exists(mywd)) dir.create(mywd, recursive = TRUE)
setwd(mywd)
# Download the compressed file only once
if (!file.exists(file)) {
  download.file(url, file, method = "curl")
}
# read.csv() reads .bz2 files directly; unzip() only handles .zip archives
data <- read.csv(file)
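Because read.csv() opens its input with file(), which detects bzip2 compression on its own, a .csv.bz2 file can be read without a separate decompression step. The minimal sketch below checks this on a tiny, made-up CSV written to a temporary file through bzfile(); the rows and the file name are hypothetical and not taken from the NOAA data.

```r
# Write a tiny, made-up CSV through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0)),
          file = con, row.names = FALSE)
close(con)

# read.csv() decompresses the file transparently when reading it back
small <- read.csv(tmp)
print(small)
```

The same call therefore works on StormData.csv.bz2 as downloaded.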
The property and crop damage exponent codes (PROPDMGEXP and CROPDMGEXP) are converted to numeric powers of ten.
Fatalities and injuries are summed into a single population-health measure (POPHEALTH).
Property and crop damage are summed into a total economic damage in dollars (ECODMG).
# Lookup table: exponent code -> power of ten ("+", "-" and "?" are treated as unknown)
explist <- data.frame(EXP = c("", "+", "-", "0", "1", "2", "3", "4", "5", "6", "7", "8", "?", "B", "H", "K", "M", "h", "m", "k"),
                      VAL = c(0, NA, NA, 0, 1, 2, 3, 4, 5, 6, 7, 8, NA, 9, 2, 3, 6, 2, 6, 3))
propexp <- as.character(data$PROPDMGEXP)
cropexp <- as.character(data$CROPDMGEXP)
cropexp[is.na(cropexp)] <- ""   # a missing crop exponent means no multiplier
# match() returns the exponents in the same row order as data; merge() would reorder
# the rows and misalign the exponents with PROPDMG and CROPDMG
PROPDMGVAL <- explist$VAL[match(propexp, explist$EXP)]
CROPDMGVAL <- explist$VAL[match(cropexp, explist$EXP)]
# Derived measures: population health (people) and economic damage (dollars)
data <- transform(data,
                  POPHEALTH = FATALITIES + INJURIES,
                  BGN_DATE = as.Date(BGN_DATE, format = "%m/%d/%Y"),
                  ECODMG = PROPDMG * 10^PROPDMGVAL + CROPDMG * 10^CROPDMGVAL)
data <- transform(data, Year = format(BGN_DATE, "%Y"))
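The exponent conversion can be checked by hand on a few invented rows. This is a minimal sketch with hypothetical values and only the letter codes in its lookup table; it uses match() to look up one exponent per row, which keeps the result aligned with the input rows.

```r
# Hypothetical toy records; column names mirror the NOAA file
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1.5, 3),
  CROPDMGEXP = c("", "B", "k"),
  FATALITIES = c(1, 0, 2),
  INJURIES   = c(4, 0, 1),
  stringsAsFactors = FALSE
)
# Reduced lookup table: letter codes mapped to powers of ten
explist <- data.frame(EXP = c("", "H", "h", "K", "k", "M", "m", "B"),
                      VAL = c(0, 2, 2, 3, 3, 6, 6, 9),
                      stringsAsFactors = FALSE)
# One exponent per row, in the original row order
propval <- explist$VAL[match(toy$PROPDMGEXP, explist$EXP)]
cropval <- explist$VAL[match(toy$CROPDMGEXP, explist$EXP)]

toy$POPHEALTH <- toy$FATALITIES + toy$INJURIES
toy$ECODMG <- toy$PROPDMG * 10^propval + toy$CROPDMG * 10^cropval
print(toy[, c("POPHEALTH", "ECODMG")])
```

For these rows, POPHEALTH is 5, 0 and 3, and ECODMG is 25,000, 1,502,500,000 (2.5 million in property plus 1.5 billion in crops) and 3,010 dollars.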
Population-health impact and economic damage are first summed for each year and event type; these annual totals are then averaged for each event type over the years in which it was recorded.
# Annual totals for each (year, event type) pair
plot_data <- with(data, aggregate(data.frame(POPHEALTH, ECODMG), by = list(Year, EVTYPE), FUN = sum))
names(plot_data) <- c("Year", "EVTYPE", "POPHEALTH", "ECODMG")
# Average annual population-health impact per event type; keep the top 20
plot_data1 <- with(plot_data, aggregate(data.frame(POPHEALTH), by = list(EVTYPE), FUN = mean))
names(plot_data1)[1] <- "EVTYPE"
plot_data1 <- plot_data1[order(plot_data1[, 2], decreasing = TRUE), ]
plot_data1 <- plot_data1[!is.na(plot_data1[, 2]), ]
plot_data1 <- head(plot_data1, 20)
# Average annual economic damage per event type; keep the top 20
plot_data2 <- with(plot_data, aggregate(data.frame(ECODMG), by = list(EVTYPE), FUN = mean))
names(plot_data2)[1] <- "EVTYPE"
plot_data2 <- plot_data2[order(plot_data2[, 2], decreasing = TRUE), ]
plot_data2 <- plot_data2[!is.na(plot_data2[, 2]), ]
plot_data2 <- head(plot_data2, 20)
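The two-stage aggregation (sum per year and event type, then mean per event type) can be illustrated on a handful of invented records. This sketch uses the formula interface of aggregate() rather than the by= interface used above; one difference to keep in mind is that the formula interface drops rows containing NA by default (na.action = na.omit), whereas the by= version propagates NA into the group result.

```r
# Hypothetical records: two years, three event types
toy <- data.frame(
  Year      = c("1990", "1990", "1991", "1991", "1991"),
  EVTYPE    = c("TORNADO", "HAIL", "TORNADO", "TORNADO", "FLOOD"),
  POPHEALTH = c(10, 1, 4, 6, 3),
  stringsAsFactors = FALSE
)
# Stage 1: total per (year, event type)
annual <- aggregate(POPHEALTH ~ Year + EVTYPE, data = toy, FUN = sum)
# Stage 2: mean of the annual totals per event type, largest first
avg <- aggregate(POPHEALTH ~ EVTYPE, data = annual, FUN = mean)
avg <- avg[order(avg$POPHEALTH, decreasing = TRUE), ]
print(avg)
```

TORNADO totals 10 in both years, so its average is 10; HAIL appears only in 1990, so its average is 1 rather than 0.5. The mean is taken over the years in which an event type occurs, not over every year in the data.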
The 20 event types with the highest average annual impact are plotted as bar charts:
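library(ggplot2)
# reorder() sorts bars by value; assigned plots must be printed to appear in the report
plot1 <- ggplot(plot_data1, aes(reorder(EVTYPE, -POPHEALTH), POPHEALTH)) + geom_col() + labs(title = "Average annual population health impact by storm type", x = "Event type", y = "Fatalities + injuries per year") + theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot2 <- ggplot(plot_data2, aes(reorder(EVTYPE, -ECODMG), ECODMG)) + geom_col() + labs(title = "Average annual economic damage by storm type", x = "Event type", y = "Damage per year (USD)") + theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(plot1)
print(plot2)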
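By default ggplot2 orders a character x axis alphabetically, so the highest bar is not necessarily first. A minimal sketch on hypothetical averages shows how reorder() turns the event type into a factor whose levels follow the plotted value.

```r
library(ggplot2)

# Hypothetical averages, shaped like plot_data1
df <- data.frame(EVTYPE = c("HAIL", "TORNADO", "FLOOD"),
                 POPHEALTH = c(1, 10, 3), stringsAsFactors = FALSE)
# Levels sorted by -POPHEALTH, so the tallest bar is drawn first
df$EVTYPE <- reorder(df$EVTYPE, -df$POPHEALTH)
p <- ggplot(df, aes(EVTYPE, POPHEALTH)) +
  geom_col() +
  labs(x = "Event type", y = "Fatalities + injuries per year") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)  # an assigned ggplot object is only drawn when printed
```

geom_col() is equivalent to geom_bar(stat = "identity"): both use the y values as bar heights.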
Among the storm and weather event types in the U.S., tornadoes have the highest average annual impact on population health, and thunderstorm winds cause the most economic damage.