This is an analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Two specific questions are addressed:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
This report has three main parts: Synopsis, Data Analysis, and Results. Data Analysis is divided into Data Processing and Clean and Tidy Data. In Data Analysis, the data are loaded into R and the different spellings of each event type are merged; after that, population damage and economic damage are defined and calculated for each event; finally, the event with the greatest population damage and the event with the greatest economic damage are shown.
Load the Data
After downloading the data (a bz2 file) from the Coursera Reproducible Research assignment website, load the data in R.
library("bitops")
library("RCurl")
setwd("E:/reproducible research")
data=read.csv(bzfile("repdata-data-StormData.csv.bz2"))
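The load step above depends on a fixed working directory. A minimal sketch of a more portable variant follows; the helper name load_storm_data is hypothetical, and the demo file is made-up data written to a temporary location so that the block runs on its own.

```r
# Hypothetical helper: read a bzip2-compressed CSV, failing early if it is missing.
load_storm_data <- function(path) {
  if (!file.exists(path)) stop("File not found: ", path)
  read.csv(bzfile(path), stringsAsFactors = FALSE)
}

# Self-contained demo: write a tiny made-up table to a temporary .bz2 file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(3, 1)),
          con, row.names = FALSE)
close(con)

# Read it back through the helper
demo <- load_storm_data(tmp)
print(demo)
```

With the real archive, the call would presumably be load_storm_data("repdata-data-StormData.csv.bz2"), run from the folder that holds the download.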
According to the variable descriptions in the National Weather Service Storm Data Documentation, harm to population health is defined here as the sum of the “INJURIES” and “FATALITIES” columns.
It does not make sense to reduce all the EVTYPE values down to the 48 official types up front. For the purposes of this analysis, it is assumed that if an event type causes only a few injuries and deaths, combining all the similar event types will not raise its total into the thousands: an event type that causes a small number of injuries generally will not cause an outstanding number in other states either. So, instead of cleaning EVTYPE first, let’s sum the injuries and fatalities for each event type, see which ones cause the largest damage, and then clean the event types only for those that cause a rather large amount of damage.
data$sum <- data$INJURIES + data$FATALITIES
total=aggregate( sum~ EVTYPE, data = data, FUN= sum)
total=total[order(-total$sum),]
head(total,20)
## EVTYPE sum
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 275 HEAT 3037
## 153 FLASH FLOOD 2755
## 427 ICE STORM 2064
## 760 THUNDERSTORM WIND 1621
## 972 WINTER STORM 1527
## 359 HIGH WIND 1385
## 244 HAIL 1376
## 411 HURRICANE/TYPHOON 1339
## 310 HEAVY SNOW 1148
## 957 WILDFIRE 986
## 786 THUNDERSTORM WINDS 972
## 30 BLIZZARD 906
## 188 FOG 796
## 585 RIP CURRENT 600
## 955 WILD/FOREST FIRE 557
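The sum-then-rank step can be illustrated on a small made-up table. This is only a sketch: the toy values are invented, and the column is called harm here (an assumed name) so that it does not share a name with the sum function.

```r
# Made-up events, for illustration only
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  INJURIES = c(10, 2, 5, 1),
  FATALITIES = c(1, 0, 2, 3),
  stringsAsFactors = FALSE
)

# Per-row harm = injuries + fatalities
toy$harm <- toy$INJURIES + toy$FATALITIES

# Total harm per event type, largest first
toy_total <- aggregate(harm ~ EVTYPE, data = toy, FUN = sum)
toy_total <- toy_total[order(-toy_total$harm), ]
print(toy_total)
```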
However, it should be noted that the EVTYPE column contains many distinct values: 985 of them, although there are only 48 different types of events (mostly due to typos, inconsistent capitalisation, etc.). From what we have so far, the most damaging type would be Tornado. However, to be sure, the different spellings of each event should be included, which means that entries such as “Tornado”, “tornado” and “TORNAO” should be merged into one. By doing so, we can check whether the damage ranking changes and confirm which type is the most damaging. The event names of the top 15 EVTYPE values in the table above will be cleaned. (The most damaging events show up near the top even without cleaning, so only the first 15 are selected.)
The following code merges events recorded under different spellings. Each grep call gives the row positions of one event, then the sum of fatalities and injuries over those rows is computed (15 sums), and finally all the sums are combined so that a new, more precise ranking is obtained. (With ignore.case=T and a pattern, all rows whose EVTYPE contains the pattern are found, whatever their capitalisation, and their positions are returned.)
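Part of the variation in EVTYPE is only capitalisation and stray spaces, and that part can be removed before any pattern matching. A minimal sketch on made-up labels; it is not part of the original analysis.

```r
# Made-up labels showing case, whitespace and spelling variants
raw <- c("Tornado", "tornado", " TORNADO", "TORNAO", "Flood", "FLOOD ")

# Upper-case and trim surrounding whitespace
clean <- toupper(trimws(raw))

n_raw <- length(unique(raw))      # distinct labels before normalising
n_clean <- length(unique(clean))  # distinct labels after normalising
print(unique(clean))
```

Misspellings such as "TORNAO" survive this step, so some pattern-based merging is still needed afterwards.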
injurydead=data[,c("INJURIES","FATALITIES")]
#get the data position of the event "TORNADO"
tornado=grep(pattern="ornado",data$EVTYPE, ignore.case=T)
tornadosum=sum(injurydead[tornado,1])+sum(injurydead[tornado,2])
#get the data position of event "EXCESSIVE HEAT"
heat=grep(pattern="excessive",data$EVTYPE, ignore.case=T)
heatsum=sum(injurydead[heat,1])+sum(injurydead[heat,2])
#get the data position of event "TSTM WIND" (any EVTYPE containing "tstm"; the "THUNDERSTORM WIND" spellings are handled separately below)
tstm=grep(pattern="tstm",data$EVTYPE, ignore.case=T)
tstmsum=sum(injurydead[tstm,1])+sum(injurydead[tstm,2])
#get the data position of event "FLOOD" (which includes "FLOOD" and "FLASH FLOOD")
flood=grep(pattern="flood",data$EVTYPE, ignore.case=T)
floodsum=sum(injurydead[flood,1])+sum(injurydead[flood,2])
#get the data position of event "LIGHTNING"
light=grep(pattern="light",data$EVTYPE, ignore.case=T)
lightsum=sum(injurydead[light,1])+sum(injurydead[light,2])
#get the data position of event "HEAT" (the pattern "heat$" matches every EVTYPE ending in "heat", including "EXCESSIVE HEAT")
heat2=grep(pattern="heat$",data$EVTYPE, ignore.case=T)
heat2sum=sum(injurydead[heat2,1])+sum(injurydead[heat2,2])
#get the data position of event "ICE STORM"
ice=grep(pattern="ice storm",data$EVTYPE, ignore.case=T)
icesum=sum(injurydead[ice,1])+sum(injurydead[ice,2])
#get the data position of event "THUNDERSTORM WIND" and "THUNDERSTORM WINDS"
thunder=grep(pattern="thunderstorm wind",data$EVTYPE, ignore.case=T)
thundersum=sum(injurydead[thunder,1])+sum(injurydead[thunder,2])
#get the data position of event "WINTER STORM"
winter=grep(pattern="winter storm",data$EVTYPE, ignore.case=T)
wintersum=sum(injurydead[winter,1])+sum(injurydead[winter,2])
#get the data position of event "HAIL"
hail=grep(pattern="hail",data$EVTYPE, ignore.case=T)
hailsum=sum(injurydead[hail,1])+sum(injurydead[hail,2])
#get the data position of event "HURRICANE"
hurri=grep(pattern="hurricane",data$EVTYPE, ignore.case=T)
hurrisum=sum(injurydead[hurri,1])+sum(injurydead[hurri,2])
#get the data position of event "HEAVY SNOW"
heavy=grep(pattern="heavy snow",data$EVTYPE, ignore.case=T)
heavysum=sum(injurydead[heavy,1])+sum(injurydead[heavy,2])
#get the data position of event "WILDFIRE"
wild=grep(pattern="wildfire",data$EVTYPE, ignore.case=T)
wildsum=sum(injurydead[wild,1])+sum(injurydead[wild,2])
#get the data position of event "BLIZZARD"
blizz=grep(pattern="blizzard",data$EVTYPE, ignore.case=T)
blizzsum=sum(injurydead[blizz,1])+sum(injurydead[blizz,2])
#get the data position of event "FOG"
fog=grep(pattern="fog",data$EVTYPE, ignore.case=T)
fogsum=sum(injurydead[fog,1])+sum(injurydead[fog,2])
#create a new data frame for the sums
totalsum=c(tornadosum,heatsum,tstmsum,floodsum,lightsum,heat2sum,icesum,thundersum,wintersum,hailsum,hurrisum,heavysum,wildsum,blizzsum,fogsum)
event=c("TORNADO","EXCESSIVE HEAT","TSTM WIND","FLOOD","LIGHTNING","HEAT","ICE STORM","THUNDERSTORM","WINTER STORM","HAIL","HURRICANE","HEAVY SNOW","WILDFIRE","BLIZZARD","FOG")
#data.frame() keeps totalsum numeric; cbind() would turn it into text
populationhealth2=data.frame(event,totalsum)
populationhealth2
## event totalsum
## 1 TORNADO 97068
## 2 EXCESSIVE HEAT 8472
## 3 TSTM WIND 7609
## 4 FLOOD 10129
## 5 LIGHTNING 6052
## 6 HEAT 11787
## 7 ICE STORM 2081
## 8 THUNDERSTORM 2637
## 9 WINTER STORM 1570
## 10 HAIL 1512
## 11 HURRICANE 1463
## 12 HEAVY SNOW 1164
## 13 WILDFIRE 986
## 14 BLIZZARD 907
## 15 FOG 1158
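The fifteen near-identical grep blocks above could be written once as a function and applied over a named vector of patterns. The following is a sketch under assumptions: harm_for and patterns are hypothetical names, and the toy table is made up.

```r
# Hypothetical helper: injuries + fatalities over rows whose EVTYPE matches a pattern
harm_for <- function(pattern, df) {
  idx <- grepl(pattern, df$EVTYPE, ignore.case = TRUE)
  sum(df$INJURIES[idx]) + sum(df$FATALITIES[idx])
}

# Made-up rows; note that "TORNAO" is NOT matched by the pattern "ornado"
toy <- data.frame(
  EVTYPE = c("TORNADO", "Tornado F1", "TORNAO", "FLASH FLOOD", "Flood"),
  INJURIES = c(10, 2, 1, 4, 3),
  FATALITIES = c(1, 0, 0, 2, 1),
  stringsAsFactors = FALSE
)

# One pattern per category, named by the label to report
patterns <- c(TORNADO = "ornado", FLOOD = "flood")
totals <- vapply(patterns, harm_for, numeric(1), df = toy)

# data.frame() keeps the totals numeric, so the ranking can be sorted
ranking <- data.frame(event = names(totals), totalsum = unname(totals))
ranking <- ranking[order(-ranking$totalsum), ]
print(ranking)
```

Keeping the patterns in one vector makes it easier to see which spellings each category does and does not capture.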
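In this new ranking the population damage for each event is more precise, and most events’ totals have increased. The most damaging event is TORNADO (97068). By the totals in the table, HEAT (11787) comes second and FLOOD (10129) third, followed by EXCESSIVE HEAT (8472) and TSTM WIND (7609). Note that the HEAT pattern “heat$” also matches EXCESSIVE HEAT, so those two totals overlap.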
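The patterns used for the population ranking are not mutually exclusive: "heat$" matches any event name that ends in "heat", and "excessive" matches any name containing that word. A small sketch on made-up labels shows the overlap, along with an anchored pattern ("^heat$", an assumption about what a stricter HEAT category might use) that matches plain HEAT only.

```r
# Made-up event labels
ev <- c("HEAT", "EXCESSIVE HEAT", "RECORD HEAT", "HEAT WAVE", "EXCESSIVE RAINFALL")

heat_any  <- grepl("heat$", ev, ignore.case = TRUE)      # names ending in "heat"
excessive <- grepl("excessive", ev, ignore.case = TRUE)  # names containing "excessive"
heat_only <- grepl("^heat$", ev, ignore.case = TRUE)     # exactly "heat"

# Labels that would be counted in both categories
both <- ev[heat_any & excessive]
print(both)
print(ev[heat_only])
```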
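library(ggplot2)
#totalsum may be stored as text if the table was built with cbind(), so convert it to numeric before plotting
#geom_col() draws bars from the values themselves, replacing the older geom="bar" with stat="identity"
ggplot(populationhealth2,aes(x=event,y=as.numeric(as.character(totalsum))))+geom_col()+coord_flip()+labs(x="event",y="injuries + fatalities")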
Economic consequences are defined as the sum of Property Damage (PROPDMG) and Crop Damage (CROPDMG). PROPDMGEXP and CROPDMGEXP signify the magnitude (thousands, millions and billions).
The following code first converts Property Damage and Crop Damage to plain dollar amounts, using PROPDMGEXP and CROPDMGEXP as multipliers. Three categories are handled: “K”, “M” and “B”, which stand for thousands, millions and billions. For example, 25K is converted to 25000. Because grep is case-sensitive here, only the upper-case codes are matched.
#convert Crop Damage to the right numeric format.
thouposition=grep("K",data$CROPDMGEXP)
data$CROPDMG[thouposition]=data$CROPDMG[thouposition]*1000
millposition=grep("M",data$CROPDMGEXP)
data$CROPDMG[millposition]=data$CROPDMG[millposition]*1000000
billposition=grep("B",data$CROPDMGEXP)
data$CROPDMG[billposition]=data$CROPDMG[billposition]*1000000000
#convert Property Damage to the right numeric format.
thouposition=grep("K",data$PROPDMGEXP)
data$PROPDMG[thouposition]=data$PROPDMG[thouposition]*1000
millposition=grep("M",data$PROPDMGEXP)
data$PROPDMG[millposition]=data$PROPDMG[millposition]*1000000
billposition=grep("B",data$PROPDMGEXP)
data$PROPDMG[billposition]=data$PROPDMG[billposition]*1000000000
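The same conversion can be sketched with a lookup table. Two choices here are assumptions rather than the method used above: lower-case codes are treated like their upper-case forms, and any other code is given a multiplier of 1. The name exp_multiplier and the toy values are hypothetical.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: only K, M and B scale the amount; anything else counts as 1.
exp_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows covering upper-case, lower-case and empty codes
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 7, 3),
  PROPDMGEXP = c("K", "m", "B", "", "k"),
  stringsAsFactors = FALSE
)

# Write the result to a new column instead of overwriting PROPDMG
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Writing to a new column also means the step can be rerun safely; multiplying PROPDMG in place a second time would scale the amounts twice.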
Next, the following code creates a new column named “ecodamage”, which is the sum of each row’s property damage and crop damage.
data$ecodamage=data$CROPDMG+data$PROPDMG
Last, sum the total economic damage for each event type and sort in descending order.
ecosum=aggregate( ecodamage~ EVTYPE, data = data, FUN= sum)
ecosum=ecosum[order(-ecosum$ecodamage),]
head(ecosum,15)
## EVTYPE ecodamage
## 170 FLOOD 150319678257
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57340614060
## 670 STORM SURGE 43323541000
## 244 HAIL 18752904943
## 153 FLASH FLOOD 17562129167
## 95 DROUGHT 15018672000
## 402 HURRICANE 14610229010
## 590 RIVER FLOOD 10148404500
## 427 ICE STORM 8967041360
## 848 TROPICAL STORM 8382236550
## 972 WINTER STORM 6715441251
## 359 HIGH WIND 5908617595
## 957 WILDFIRE 5060586800
## 856 TSTM WIND 5038935845
Next, the same approach as before is used. The top 13 events are selected, the different spellings of each event are merged (which leaves 10 categories), and a data frame is formed for the new ranking of economic damage. (With ignore.case=T and a pattern, all rows whose EVTYPE contains the pattern are found, whatever their capitalisation, and their positions are returned.)
damage=data[,c("EVTYPE","CROPDMG","PROPDMG","ecodamage")]
#get the data position of the event "FLOOD" (which includes FLASH FLOOD, RIVER FLOOD and others)
flood=grep(pattern="flood",data$EVTYPE, ignore.case=T)
floodeco=sum(damage[flood,2])+sum(damage[flood,3])
#get the data position of the event "HURRICANE"
hurricane=grep(pattern="hurricane",data$EVTYPE, ignore.case=T)
hurricaneeco=sum(damage[hurricane,2])+sum(damage[hurricane,3])
#get the data position of the event "TORNADO"
tornado2=grep(pattern="tornado",data$EVTYPE, ignore.case=T)
tornado2eco=sum(damage[tornado2,2])+sum(damage[tornado2,3])
#get the data position of the event "STORM SURGE"
surge=grep(pattern="surge",data$EVTYPE, ignore.case=T)
surgeeco=sum(damage[surge,2])+sum(damage[surge,3])
#get the data position of the event "HAIL"
hail=grep(pattern="hail",data$EVTYPE, ignore.case=T)
haileco=sum(damage[hail,2])+sum(damage[hail,3])
#get the data position of the event "DROUGHT"
drought=grep(pattern="drought",data$EVTYPE, ignore.case=T)
droughteco=sum(damage[drought,2])+sum(damage[drought,3])
#get the data position of the event "ICE STORM"
icestorm=grep(pattern="ice storm",data$EVTYPE, ignore.case=T)
icestormeco=sum(damage[icestorm,2])+sum(damage[icestorm,3])
#get the data position of the event "TROPICAL STORM"
tropical=grep(pattern="tropical",data$EVTYPE, ignore.case=T)
tropicaleco=sum(damage[tropical,2])+sum(damage[tropical,3])
#get the data position of the event "WINTER STORM"
winterstorm=grep(pattern="winter storm",data$EVTYPE, ignore.case=T)
winterstormeco=sum(damage[winterstorm,2])+sum(damage[winterstorm,3])
#get the data position of the event "HIGH WIND"
high=grep(pattern="high wind",data$EVTYPE, ignore.case=T)
higheco=sum(damage[high,2])+sum(damage[high,3])
#create a new ranking (note: cbind() stores ECODAMAGE as text)
EVENTS=c("FLOOD","HURRICANE","TORNADO","STORM SURGE","HAIL","DROUGHT","ICE STORM","TROPICAL STORM","WINTER STORM","HIGH WIND")
ECODAMAGE=c(floodeco,hurricaneeco,tornado2eco,surgeeco,haileco,droughteco,icestormeco,tropicaleco,winterstormeco,higheco)
economic=cbind(EVENTS,ECODAMAGE)
economic2=as.data.frame(economic)
economic2
## EVENTS ECODAMAGE
## 1 FLOOD 179909840041.71
## 2 HURRICANE 90241472840
## 3 TORNADO 58999059560.2
## 4 STORM SURGE 47966079000
## 5 HAIL 20728887366.6
## 6 DROUGHT 15018927780
## 7 ICE STORM 8968141360
## 8 TROPICAL STORM 8411023550
## 9 WINTER STORM 6782441251
## 10 HIGH WIND 6868121953
From the table above, FLOOD has the greatest economic damage.
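The merged table is printed in the order the categories were listed, and its ECODAMAGE column is text because cbind() combines the labels and the numbers into one character matrix. A minimal sketch of converting and sorting, using three rows copied from the table above (economic_toy is a hypothetical name):

```r
# Three rows copied from the merged table, with ECODAMAGE as text (as cbind() leaves it)
economic_toy <- data.frame(
  EVENTS = c("WINTER STORM", "HIGH WIND", "FLOOD"),
  ECODAMAGE = c("6782441251", "6868121953", "179909840041.71"),
  stringsAsFactors = FALSE
)

# Convert to numeric, then sort from largest to smallest
economic_toy$ECODAMAGE <- as.numeric(economic_toy$ECODAMAGE)
economic_sorted <- economic_toy[order(economic_toy$ECODAMAGE, decreasing = TRUE), ]
print(economic_sorted)
```

After sorting, HIGH WIND moves ahead of WINTER STORM, which the listing order above does not show.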
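#ECODAMAGE is stored as text after cbind(), so convert it to numeric before plotting
ggplot(economic2,aes(x=EVENTS,y=as.numeric(as.character(ECODAMAGE))))+geom_col()+coord_flip()+labs(x="event",y="economic damage (dollars)")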
Two questions are addressed in this analysis. First, which event does the greatest damage to the population? Second, which event does the greatest economic damage? For the first question, population damage is counted as the number of injuries and deaths. For the second question, economic damage is counted as the amount of property damage and crop damage. After merging each event with its various spellings, TORNADO turns out to do the greatest damage to the population, causing 97068 injuries and deaths. For the second question, methods similar to those for the first are used. FLOOD does the greatest economic damage, causing almost 1.8 × 10^11 dollars of damage.
However, it should be noted that, because each event has many different spellings, the merged groups are not strict; rather, they are categories defined by the researcher. (The National Weather Service Storm Data Documentation is used as a reference.)