Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
First, record the R version and the attached or loaded packages:
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: i386-w64-mingw32/i386 (32-bit)
##
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
## [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
## [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.8 evaluate_0.6 formatR_1.1 htmltools_0.2.6
## [5] knitr_1.9 rmarkdown_0.3.11 stringr_0.6.2 tools_3.1.2
With the NOAA storm data file (stormdata.csv.bz2) downloaded to the working directory, load it into R:
storm<-read.csv(bzfile("stormdata.csv.bz2"),header = T,stringsAsFactors=F)
## Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
## na.strings, : EOF within quoted string
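The warning above suggests the parser met an unbalanced quote, which can drop or merge rows without any further notice. One way to check for this is to compare the number of rows read against the number expected. The sketch below does so on a small made-up file rather than on the real data; the names toy, tmp, and back are illustrative only.

```r
# Toy data with a quoted field that contains a comma (illustrative values only)
toy <- data.frame(EVTYPE  = c("TORNADO", "HAIL", "FLOOD"),
                  REMARKS = c("roof damage, trees down", "none", "road closed"),
                  stringsAsFactors = FALSE)

# Write the toy data to a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the storm data is read
back <- read.csv(bzfile(tmp), header = TRUE, stringsAsFactors = FALSE)

# A mismatch here would indicate rows lost or merged during parsing
nrow(back) == nrow(toy)
```

For the real file, the expected row count would have to come from an independent source, such as the dataset's documentation.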
The storm data frame contains these columns:
names(storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Only part of the data is needed. We keep the columns EVTYPE, FATALITIES, INJURIES, PROPDMG, and CROPDMG, and only the rows in which at least one of the four measures is greater than zero.
The extracted data frame is called usestorm:
usestorm<-subset(storm,FATALITIES>0|INJURIES>0|PROPDMG>0|CROPDMG>0,select = c("EVTYPE","FATALITIES","INJURIES","PROPDMG","CROPDMG"))
Some EVTYPE values are long free-text entries rather than event names. Genuine event types are written in capital letters, so any value that contains a lowercase letter or the string “000” is treated as invalid and relabeled “OTHER”:
atoz<-"a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|000"
usestorm[grep(atoz,usestorm$EVTYPE),]$EVTYPE<-"OTHER"
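The relabeling rule can be sketched on a small vector. The sample values below are hypothetical, and the character class [a-z] (with perl = TRUE) is used as a shorter stand-in for the 26-way alternation; it should flag the same values for plain ASCII text.

```r
# Hypothetical sample of EVTYPE values (not taken from the real data)
evtype <- c("TORNADO", "Summary of May 22", "FLASH FLOOD", "HAIL 1000", "Heavy rain")

# TRUE where the value contains a lowercase letter or the string "000"
invalid <- grepl("[a-z]|000", evtype, perl = TRUE)

# Logical indexing leaves the vector unchanged when nothing matches
evtype[invalid] <- "OTHER"
evtype
```

Using grepl() with logical indexing, rather than grep() with row indexing, avoids an error in the case where no value matches the pattern.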
The values must be converted to numeric before they can be analysed:
usestorm$FATALITIES<-as.numeric(usestorm$FATALITIES)
## Warning: NAs introduced by coercion
usestorm$INJURIES<-as.numeric(usestorm$INJURIES)
## Warning: NAs introduced by coercion
usestorm$PROPDMG<-as.numeric(usestorm$PROPDMG)
## Warning: NAs introduced by coercion
usestorm$CROPDMG<-as.numeric(usestorm$CROPDMG)
## Warning: NAs introduced by coercion
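Each warning above means that at least one value in the column could not be parsed as a number and became NA. A minimal sketch of how the affected values might be counted and inspected, using a made-up character vector (raw and num are illustrative names):

```r
# Hypothetical column read as character, with one malformed entry
raw <- c("0", "2", "15", "not a number", "3.5")

# suppressWarnings() hides the coercion warning; the NA count reports it instead
num <- suppressWarnings(as.numeric(raw))

n_failed <- sum(is.na(num))           # how many values failed to convert
bad      <- raw[is.na(num)]           # the offending strings
total    <- sum(num, na.rm = TRUE)    # total over the values that did convert

n_failed
bad
total
```

Note that aggregate() with a formula drops rows containing NA by default, so such rows do not contribute to the per-event sums.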
To analyse fatalities, injuries, property damage, and crop damage, we build four datasets, each summing one measure by event type.
For each dataset we keep only the ten event types with the largest totals:
fatalities<-aggregate(FATALITIES~EVTYPE,usestorm,sum)
fatalities<-fatalities[order(fatalities$FATALITIES,decreasing = T),]
f<-head(fatalities,n=10)
injuries<-aggregate(INJURIES~EVTYPE,usestorm,sum)
injuries<-injuries[order(injuries$INJURIES,decreasing = T),]
i<-head(injuries,n=10)
propdmg<-aggregate(PROPDMG~EVTYPE,usestorm,sum)
propdmg<-propdmg[order(propdmg$PROPDMG,decreasing = T),]
p<-head(propdmg,n=10)
cropdmg<-aggregate(CROPDMG~EVTYPE,usestorm,sum)
cropdmg<-cropdmg[order(cropdmg$CROPDMG,decreasing = T),]
c<-head(cropdmg,n=10)
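The four blocks above repeat the same three steps: sum by event type, sort in decreasing order, and keep the first rows. As a sketch, those steps could be wrapped in one function. The helper top_events and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: sum one measure per event type and keep the n largest
top_events <- function(data, measure, n = 10) {
  totals <- aggregate(data[[measure]], by = list(EVTYPE = data$EVTYPE),
                      FUN = sum, na.rm = TRUE)   # NAs are skipped within each group
  names(totals)[2] <- measure
  totals <- totals[order(totals[[measure]], decreasing = TRUE), ]
  head(totals, n)
}

# Toy data (illustrative values only)
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
                  FATALITIES = c(3, 1, 2, 4, 0),
                  stringsAsFactors = FALSE)

res <- top_events(toy, "FATALITIES", n = 2)
res
```

With the real data, a call such as top_events(usestorm, "INJURIES") would play the role of the corresponding block above.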
Fatalities and injuries can be merged into one dataset, health. The merged data are sorted by fatalities, then by injuries, and a column called TOTAL is added to the top ten rows:
health<-merge(fatalities,injuries,by = "EVTYPE")
health<-health[order(health$FATALITIES,health$INJURIES,decreasing = T),]
h<-head(health,n=10)
h$TOTAL<-h$FATALITIES+h$INJURIES
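Because TOTAL is added after the top ten rows are selected, the ordering reflects fatalities rather than combined harm. As an alternative, and not the method used in this report, the sketch below ranks by the combined count instead. The data frames fat and inj hold made-up values.

```r
# Toy per-event totals (illustrative values only)
fat <- data.frame(EVTYPE = c("A", "B", "C"), FATALITIES = c(10, 5, 1),
                  stringsAsFactors = FALSE)
inj <- data.frame(EVTYPE = c("A", "B", "C"), INJURIES = c(2, 50, 3),
                  stringsAsFactors = FALSE)

# Merge first, compute the total for every event type, then rank
both <- merge(fat, inj, by = "EVTYPE")
both$TOTAL <- both$FATALITIES + both$INJURIES

ranked <- both[order(both$TOTAL, decreasing = TRUE), ]
ranked
```

Which ordering is preferable depends on whether fatalities should outweigh injuries in the definition of harm.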
The ten most harmful event types with respect to population health, ranked by fatalities:
h
## EVTYPE FATALITIES INJURIES TOTAL
## 650 TORNADO 4658 80084 84742
## 442 OTHER 2573 3206 5779
## 108 EXCESSIVE HEAT 1416 4354 5770
## 230 HEAT 708 878 1586
## 390 LIGHTNING 562 3628 4190
## 129 FLASH FLOOD 559 1407 1966
## 670 TSTM WIND 471 6452 6923
## 145 FLOOD 258 6499 6757
## 485 RIP CURRENTS 204 297 501
## 302 HIGH WIND 194 919 1113
And here is the fatalities and injuries plot:
par(mfrow = c(1,2), oma = c(0,0,2,0))
barplot(f$FATALITIES,names.arg = f$EVTYPE,main="Fatalities",col = "green",las=2)
barplot(i$INJURIES,names.arg = i$EVTYPE,main="Injuries",col = "blue",las=2)
title("Top ten fatalities and injuries caused by events",outer = T)
The table and plots show that:
1. Tornadoes cause by far the most fatalities (4658) and injuries (80084).
2. TSTM WIND and FLOOD also cause many injuries (6452 and 6499, respectively).
3. EXCESSIVE HEAT causes the most deaths of any named event type after tornadoes (1416).
4. The “OTHER” category combines many unrelated small types, so it is not representative of any single kind of event.
The top ten event types by property damage (PROPDMG):
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
qplot(PROPDMG,EVTYPE,data=p,ylab ="Event",xlab = "Propdmg",main="Top ten propdmg by event")
And the top ten event types by crop damage (CROPDMG):
qplot(CROPDMG,EVTYPE,data=c,ylab ="Event",xlab = "Cropdmg",main="Top ten cropdmg by event")
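The PROPDMG and CROPDMG values plotted above are the raw figures from the file. The dataset also has PROPDMGEXP and CROPDMGEXP columns (see the column list), which are commonly understood to hold a magnitude code such as K, M, or B. A minimal sketch of applying such codes, assuming that interpretation and using made-up values, might look like this; the lookup table mult and the helper scale_damage are hypothetical.

```r
# Assumed meaning of the magnitude codes: thousands, millions, billions
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: multiply each value by the factor its code stands for
scale_damage <- function(value, exp_code) {
  m <- mult[toupper(exp_code)]
  m[is.na(m)] <- 1            # unknown or empty codes leave the value unscaled
  unname(value * m)
}

scaled <- scale_damage(c(25, 2.5, 1, 10), c("K", "m", "B", ""))
scaled
```

Summing the scaled values by event type, rather than the raw ones, would put all damage estimates on a common dollar scale before ranking.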