Reproducible Research-PA2

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Questions

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

First check version information about R and attached or loaded packages.

sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: i386-w64-mingw32/i386 (32-bit)
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
## [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
## [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
## [4] LC_NUMERIC=C                                                   
## [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.8     evaluate_0.6     formatR_1.1      htmltools_0.2.6 
## [5] knitr_1.9        rmarkdown_0.3.11 stringr_0.6.2    tools_3.1.2

Get data

Download the NOAA Storm Data and load it into R:

storm<-read.csv(bzfile("stormdata.csv.bz2"),header = T,stringsAsFactors=F)
## Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
## na.strings, : EOF within quoted string
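The "EOF within quoted string" warning means the parser reached the end of the file while still inside a quoted field, so some records may have been merged or dropped. A minimal sketch of a sanity check follows, using a tiny made-up file rather than the real data. The rows and the expected row count are assumptions for illustration only.

```r
# Write a tiny CSV (made-up rows) to a temporary bzip2-compressed file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c('EVTYPE,FATALITIES,REMARKS',
             'TORNADO,3,"Damage to barns, sheds"',
             'FLOOD,0,"Roads closed"'), con)
close(con)

# Read it back the same way the report reads the storm data.
toy <- read.csv(bzfile(tmp), header = TRUE, stringsAsFactors = FALSE)

# Compare the number of rows read against the number of records expected.
expected_rows <- 2  # assumption: known in advance for this toy file
stopifnot(nrow(toy) == expected_rows)

# Confirm that the count column was parsed as a number, not as text.
stopifnot(is.numeric(toy$FATALITIES))
unlink(tmp)
```

If the row count or the column types are off for the real file, the columns used later may contain text, which would explain coercion warnings further down.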

The storm data frame consists of these columns:

names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Only part of the data is needed. In this case we keep the columns EVTYPE, FATALITIES, INJURIES, PROPDMG, and CROPDMG, and only the rows where at least one of the four measures is greater than zero.
The extracted data is called usestorm:

usestorm<-subset(storm,FATALITIES>0|INJURIES>0|PROPDMG>0|CROPDMG>0,select = c("EVTYPE","FATALITIES","INJURIES","PROPDMG","CROPDMG"))
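The selection above keeps PROPDMG and CROPDMG but not PROPDMGEXP and CROPDMGEXP. In the NOAA storm data documentation the letters K, M, and B in those columns stand for thousands, millions, and billions, so the raw damage figures are not all on the same scale. The sketch below shows one way the multiplier could be applied. The helper `exp_multiplier`, the toy rows, and the choice to treat every other code as a multiplier of 1 are assumptions for illustration, not part of the original analysis.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  # Assumption: any code other than K, M, or B leaves the value unscaled.
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows to show the effect.
toy <- data.frame(PROPDMG    = c(25, 2.5, 10),
                  PROPDMGEXP = c("K", "M", ""),
                  stringsAsFactors = FALSE)

# Damage expressed in dollars.
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Without this scaling, a sum of PROPDMG by event type adds thousands and millions together as if they were the same unit.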

Clean data

Some EVTYPE values are long free-text sentences rather than event types. Genuine event types are written in capital letters, so any value that contains a lowercase letter (or the string “000”) is relabelled as “OTHER”:

atoz<-"a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|000"
usestorm[grep(atoz,usestorm$EVTYPE),]$EVTYPE<-"OTHER"
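To see what this relabelling does, here is a small sketch on a made-up vector of event names. It rebuilds the same pattern from the built-in `letters` constant and uses a logical index, which is an alternative to the row-subsetting form above and also works when nothing matches. The example values are assumptions.

```r
# Same pattern as atoz: any lowercase letter, or the string "000".
pattern <- paste(c(letters, "000"), collapse = "|")

# Made-up event names: two genuine types and two that should be relabelled.
evtype <- c("TORNADO", "Summary of May 22", "FLASH FLOOD", "HIGH WIND 000")

# TRUE where the value contains a lowercase letter or "000".
is_other <- grepl(pattern, evtype)

# Relabel the matching entries in place.
evtype[is_other] <- "OTHER"
print(evtype)
```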

Process data

The values need to be converted to numeric before they can be analysed:

usestorm$FATALITIES<-as.numeric(usestorm$FATALITIES)
## Warning: NAs introduced by coercion
usestorm$INJURIES<-as.numeric(usestorm$INJURIES)
## Warning: NAs introduced by coercion
usestorm$PROPDMG<-as.numeric(usestorm$PROPDMG)
## Warning: NAs introduced by coercion
usestorm$CROPDMG<-as.numeric(usestorm$CROPDMG)
## Warning: NAs introduced by coercion
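The coercion warnings mean that some entries were not valid numbers and became NA. A short sketch of how such entries could be located before they are silently dropped from the sums, using a made-up character vector:

```r
# Made-up column as it might look after a faulty read: mostly numbers, one stray text value.
raw <- c("3", "0", "abc", NA)

# Convert, suppressing the warning because the failures are inspected explicitly below.
num <- suppressWarnings(as.numeric(raw))

# Entries that were present but could not be converted.
failed <- is.na(num) & !is.na(raw)
print(raw[failed])
print(sum(failed))
```

Inspecting the failed entries shows whether they are harmless or a sign that rows were misaligned when the file was read.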

To analyse fatalities, injuries, property damage, and crop damage, generate four datasets aggregated by event type.
Only the top ten event types are kept from each:

fatalities<-aggregate(FATALITIES~EVTYPE,usestorm,sum)
fatalities<-fatalities[order(fatalities$FATALITIES,decreasing = T),]
f<-head(fatalities,n=10)

injuries<-aggregate(INJURIES~EVTYPE,usestorm,sum)
injuries<-injuries[order(injuries$INJURIES,decreasing = T),]
i<-head(injuries,n=10)

propdmg<-aggregate(PROPDMG~EVTYPE,usestorm,sum)
propdmg<-propdmg[order(propdmg$PROPDMG,decreasing = T),]
p<-head(propdmg,n=10)

cropdmg<-aggregate(CROPDMG~EVTYPE,usestorm,sum)
cropdmg<-cropdmg[order(cropdmg$CROPDMG,decreasing = T),]
c<-head(cropdmg,n=10)
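The four blocks above repeat the same three steps: sum by event type, sort in decreasing order, keep the first ten. As a sketch, the steps could be wrapped in one function. The name `top_events` and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: total one column by EVTYPE and return the n largest totals.
top_events <- function(data, column, n = 10) {
  # Build the formula column ~ EVTYPE, matching the aggregate() calls above.
  totals <- aggregate(reformulate("EVTYPE", response = column), data, sum)
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  head(totals, n)
}

# Made-up data to show the behaviour.
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
                  FATALITIES = c(5, 1, 3, 4, 0),
                  stringsAsFactors = FALSE)

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

With the real data, calls such as `top_events(usestorm, "INJURIES")` would then replace each repeated block.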

Fatalities and injuries can be merged into one dataset, health. A column called TOTAL is added to the top ten results:

health<-merge(fatalities,injuries,by = "EVTYPE")
health<-health[order(health$FATALITIES,health$INJURIES,decreasing = T),]
h<-head(health,n=10)
h$TOTAL<-h$FATALITIES+h$INJURIES

Results

Population health

The ten event types with the most fatalities, together with their injuries and combined totals:

h
##             EVTYPE FATALITIES INJURIES TOTAL
## 650        TORNADO       4658    80084 84742
## 442          OTHER       2573     3206  5779
## 108 EXCESSIVE HEAT       1416     4354  5770
## 230           HEAT        708      878  1586
## 390      LIGHTNING        562     3628  4190
## 129    FLASH FLOOD        559     1407  1966
## 670      TSTM WIND        471     6452  6923
## 145          FLOOD        258     6499  6757
## 485   RIP CURRENTS        204      297   501
## 302      HIGH WIND        194      919  1113
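The table is ordered by fatalities, so the TOTAL column is not in decreasing order. As a sketch, the same ten rows can be re-ranked by TOTAL. The values below are copied from the table above. Note that this only reorders these ten rows; a top ten by TOTAL over the full dataset might include other event types.

```r
# The ten rows shown in the table above.
h <- data.frame(
  EVTYPE = c("TORNADO", "OTHER", "EXCESSIVE HEAT", "HEAT", "LIGHTNING",
             "FLASH FLOOD", "TSTM WIND", "FLOOD", "RIP CURRENTS", "HIGH WIND"),
  FATALITIES = c(4658, 2573, 1416, 708, 562, 559, 471, 258, 204, 194),
  INJURIES   = c(80084, 3206, 4354, 878, 3628, 1407, 6452, 6499, 297, 919),
  stringsAsFactors = FALSE
)
h$TOTAL <- h$FATALITIES + h$INJURIES

# Re-rank by combined fatalities and injuries.
h_by_total <- h[order(h$TOTAL, decreasing = TRUE), ]
print(h_by_total)
```

Ranked this way, TSTM WIND and FLOOD move up to second and third place because of their large injury counts.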

Here is the plot of fatalities and injuries:

par(mfrow = c(1,2), oma = c(0,0,2,0))
barplot(f$FATALITIES,names.arg = f$EVTYPE,main="Fatalities",col = "green",las=2)
barplot(i$INJURIES,names.arg = i$EVTYPE,main="Injuries",col = "blue",las=2)
title("Top ten fatalities and injuries caused by events",outer = T)

The table and plot show that:
1. “Tornado” causes by far the most fatalities and injuries.
2. “TSTM Wind” and “flood” cause many injuries.
3. “Excessive heat” causes the most deaths among the named types after tornado.
4. The “OTHER” type combines many different small types, so it is not representative here.

Economic loss

The top ten event types by total PROPDMG:

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
qplot(PROPDMG,EVTYPE,data=p,ylab ="Event",xlab = "Propdmg",main="Top ten propdmg by event")

And the top ten event types by total CROPDMG:

qplot(CROPDMG,EVTYPE,data=c,ylab ="Event",xlab = "Cropdmg",main="Top ten cropdmg by event")