Analysis of the Consequences of Storm Events in the United States

Synopsis

This is an analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Two specific questions are addressed:

1. Across the United States, which types of events are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

This report has three main parts: Synopsis, Data Analysis, and Results. Data Analysis covers loading the data, cleaning and tidying it, and processing it. The data are loaded into R and the different spellings of each event type are merged; population damage and economic damage are then defined and calculated for each event; finally, the event with the greatest population damage and the event with the greatest economic damage are shown.

Data Analysis

Load the Data

After downloading the data (a bz2 file) from the Coursera Reproducible Research assignment website, load it into R.

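#bitops and RCurl are loaded as in the original session; read.csv() itself does not need them
library("bitops")
library("RCurl")
#working directory on the author's machine; change it to wherever the bz2 file was saved
setwd("E:/reproducible research")
#read.csv() decompresses the file through the bzfile() connection
data=read.csv(bzfile("repdata-data-StormData.csv.bz2"))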

Which types of events are most harmful with respect to population health?

According to the variable descriptions in the National Weather Service Storm Data Documentation, harm to population health is defined here as the sum of the columns “INJURIES” and “FATALITIES”.

It does not make sense to reduce all the EVTYPE values to the 48 official types before looking at the data. For the purposes of this analysis, it is assumed that if an event type causes only a few injuries and deaths, merging all its similar spellings will not raise its total into the thousands: an event type that causes few injuries in one place is unlikely to cause an exceptional number elsewhere. So, instead of cleaning EVTYPE first, we sum the injuries and fatalities for each EVTYPE value, see which values cause the most harm, and then clean the event types only for those that cause a relatively large amount of harm.

data$sum <- data$INJURIES + data$FATALITIES
total=aggregate( sum~ EVTYPE, data = data, FUN= sum)
total=total[order(-total$sum),]
head(total,20)
##                 EVTYPE   sum
## 834            TORNADO 96979
## 130     EXCESSIVE HEAT  8428
## 856          TSTM WIND  7461
## 170              FLOOD  7259
## 464          LIGHTNING  6046
## 275               HEAT  3037
## 153        FLASH FLOOD  2755
## 427          ICE STORM  2064
## 760  THUNDERSTORM WIND  1621
## 972       WINTER STORM  1527
## 359          HIGH WIND  1385
## 244               HAIL  1376
## 411  HURRICANE/TYPHOON  1339
## 310         HEAVY SNOW  1148
## 957           WILDFIRE   986
## 786 THUNDERSTORM WINDS   972
## 30            BLIZZARD   906
## 188                FOG   796
## 585        RIP CURRENT   600
## 955   WILD/FOREST FIRE   557
Clean and Tidy Data

However, the EVTYPE column contains many more values than there are event types. Although EVTYPE has 985 distinct values, there are only 48 different types of events; the difference is mostly due to typos, inconsistent capitalisation and similar variations. From the totals so far, the most harmful type appears to be Tornado. To be sure, the different spellings of each event should be counted together, which means that rows labelled “Tornado”, “tornado” and “TORNAO” should be merged into one group. Doing so shows whether the ranking changes and confirms which type is the most harmful. The 15 most harmful event names from the table above are cleaned. (The most harmful events appear at the top even before cleaning, so only these 15 are selected.)
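
To see how spelling variants split a single event type into several EVTYPE values, it helps to list the distinct values that match a loose pattern. The sketch below uses a small made-up vector (`evtype_sample` is hypothetical and not taken from the database) rather than the real EVTYPE column.

```r
# Hypothetical sample of EVTYPE values; the real column has far more variants.
evtype_sample <- c("TORNADO", "Tornado", "TORNAO", "TORNADO F1", "HAIL", "tornado")

# ignore.case = TRUE matches any capitalisation; value = TRUE returns the
# matching strings instead of their positions.
variants <- unique(grep("torn", evtype_sample, ignore.case = TRUE, value = TRUE))
print(variants)

# Without value = TRUE, grep() returns row positions, which is what the
# analysis needs in order to pick out the rows to sum.
positions <- grep("torn", evtype_sample, ignore.case = TRUE)
print(positions)
```

In this sample, a pattern such as "ornado" would miss the misspelling "TORNAO", while the shorter stem "torn" catches it. A shorter stem can also match unrelated values, so the list of variants is worth checking by eye before summing.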

The following code merges the different spellings of each event. For each of the 15 events it finds the positions of the matching rows and sums their injuries and fatalities; the 15 sums are then combined into a new, more precise ranking. (With ignore.case=T and a pattern, grep finds every spelling of an event and returns the positions of the matching rows.)

injurydead=data[,c("INJURIES","FATALITIES")]
#get the data position of the event "TORNADO"
tornado=grep(pattern="ornado",data$EVTYPE, ignore.case=T)
tornadosum=sum(injurydead[tornado,1])+sum(injurydead[tornado,2])
#get the data position of event "EXCESSIVE HEAT"
heat=grep(pattern="excessive",data$EVTYPE, ignore.case=T)
heatsum=sum(injurydead[heat,1])+sum(injurydead[heat,2])
#get the data position of event "TSTM WIND" (the pattern "tstm" matches only the abbreviated spellings; "THUNDERSTORM WIND" is handled separately below)
tstm=grep(pattern="tstm",data$EVTYPE, ignore.case=T)
tstmsum=sum(injurydead[tstm,1])+sum(injurydead[tstm,2])
#get the data position of event "FLOOD" (which includes "FLOOD" and "FLASH FLOOD")
flood=grep(pattern="flood",data$EVTYPE, ignore.case=T)
floodsum=sum(injurydead[flood,1])+sum(injurydead[flood,2])
#get the data position of event "LIGHTNING"
light=grep(pattern="light",data$EVTYPE, ignore.case=T)
lightsum=sum(injurydead[light,1])+sum(injurydead[light,2])
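#get the data position of event "HEAT" (note: "heat$" also matches "EXCESSIVE HEAT", so this total overlaps with the EXCESSIVE HEAT total)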
heat2=grep(pattern="heat$",data$EVTYPE, ignore.case=T)
heat2sum=sum(injurydead[heat2,1])+sum(injurydead[heat2,2])
#get the data position of event "ICE STORM"
ice=grep(pattern="ice storm",data$EVTYPE, ignore.case=T)
icesum=sum(injurydead[ice,1])+sum(injurydead[ice,2])
#get the data position of event "THUNDERSTORM WIND" and "THUNDERSTORM WINDS"
thunder=grep(pattern="thunderstorm wind",data$EVTYPE, ignore.case=T)
thundersum=sum(injurydead[thunder,1])+sum(injurydead[thunder,2])
#get the data position of event "WINTER STORM"
winter=grep(pattern="winter storm",data$EVTYPE, ignore.case=T)
wintersum=sum(injurydead[winter,1])+sum(injurydead[winter,2])
#get the data position of event "HAIL"
hail=grep(pattern="hail",data$EVTYPE, ignore.case=T)
hailsum=sum(injurydead[hail,1])+sum(injurydead[hail,2])
#get the data position of event "HURRICANE"
hurri=grep(pattern="hurricane",data$EVTYPE, ignore.case=T)
hurrisum=sum(injurydead[hurri,1])+sum(injurydead[hurri,2])
#get the data position of event "HEAVY SNOW"
heavy=grep(pattern="heavy snow",data$EVTYPE, ignore.case=T)
heavysum=sum(injurydead[heavy,1])+sum(injurydead[heavy,2])
#get the data position of event "WILDFIRE"
wild=grep(pattern="wildfire",data$EVTYPE, ignore.case=T)
wildsum=sum(injurydead[wild,1])+sum(injurydead[wild,2])
#get the data position of event "BLIZZARD"
blizz=grep(pattern="blizzard",data$EVTYPE, ignore.case=T)
blizzsum=sum(injurydead[blizz,1])+sum(injurydead[blizz,2])
#get the data position of event "FOG"
fog=grep(pattern="fog",data$EVTYPE, ignore.case=T)
fogsum=sum(injurydead[fog,1])+sum(injurydead[fog,2])

#create a new data frame for sum
totalsum=c(tornadosum,heatsum,tstmsum,floodsum,lightsum,heat2sum,icesum,thundersum,wintersum,hailsum,hurrisum,heavysum,wildsum,blizzsum,fogsum)
event=c("TORNADO","EXCESSIVE HEAT","TSTM WIND","FLOOD","LIGHTNING","HEAT","ICE STORM","THUNDERSTORM","WINTER STORM","HAIL","HURRICANE","HEAVY SNOW","WILDFIRE","BLIZZARD","FOG")
#data.frame() keeps totalsum numeric; cbind() on mixed types would coerce it to text
populationhealth2=data.frame(event,totalsum)

Figure 1 (Ranking of population damage)

populationhealth2
##             event totalsum
## 1         TORNADO    97068
## 2  EXCESSIVE HEAT     8472
## 3       TSTM WIND     7609
## 4           FLOOD    10129
## 5       LIGHTNING     6052
## 6            HEAT    11787
## 7       ICE STORM     2081
## 8    THUNDERSTORM     2637
## 9    WINTER STORM     1570
## 10           HAIL     1512
## 11      HURRICANE     1463
## 12     HEAVY SNOW     1164
## 13       WILDFIRE      986
## 14       BLIZZARD      907
## 15            FOG     1158

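In this new ranking the population damage for each event is more precise, and most totals have increased because each group now includes its spelling variants. TORNADO is by far the most harmful event, with 97068 injuries and fatalities. In the table as printed it is followed by HEAT (11787), FLOOD (10129), EXCESSIVE HEAT (8472) and TSTM WIND (7609). The HEAT total should be read with care: the pattern “heat$” also matches “EXCESSIVE HEAT”, so those two rows overlap.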
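
The pattern "heat$" used for the HEAT group also matches "EXCESSIVE HEAT", so the two groups share rows. Below is a minimal sketch, on a made-up data frame (`toy` and its values are hypothetical), of how anchoring the pattern at both ends keeps the groups disjoint and how the shared rows can be checked.

```r
# Hypothetical data; the values are illustrative only.
toy <- data.frame(
  EVTYPE = c("HEAT", "EXCESSIVE HEAT", "Heat", "HEAT WAVE", "EXCESSIVE HEAT"),
  harm   = c(10, 100, 5, 7, 50)
)

# "heat$" matches every value that ends in "heat", including EXCESSIVE HEAT.
loose <- grep("heat$", toy$EVTYPE, ignore.case = TRUE)
loose_sum <- sum(toy$harm[loose])

# "^heat$" matches only values that are exactly "heat" (any capitalisation).
strict <- grep("^heat$", toy$EVTYPE, ignore.case = TRUE)
strict_sum <- sum(toy$harm[strict])

# Rows counted by both the loose HEAT group and the EXCESSIVE HEAT group.
excessive <- grep("excessive", toy$EVTYPE, ignore.case = TRUE)
overlap <- intersect(loose, excessive)

print(c(loose = loose_sum, strict = strict_sum))
print(overlap)
```

Whether HEAT and EXCESSIVE HEAT should be reported separately or as one group is a judgment call; the point of the check is only that no row is counted in two groups.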

Plot 1

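library(ggplot2)
#make sure totalsum is numeric before plotting (cbind() on mixed types stores it as text)
populationhealth2$totalsum=as.numeric(as.character(populationhealth2$totalsum))
ggplot(populationhealth2,aes(x=event,y=totalsum))+geom_bar(stat="identity")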

Which types of events have the greatest economic consequences?

Economic consequences are defined as the sum of property damage (PROPDMG) and crop damage (CROPDMG). PROPDMGEXP and CROPDMGEXP give the magnitude of each amount (thousands, millions or billions).

The following code first converts property damage and crop damage to plain dollar amounts using PROPDMGEXP and CROPDMGEXP. Three codes are handled: “K”, “M” and “B”, which stand for thousands, millions and billions. For example, 25 with the code “K” is converted to 25000. Amounts with any other code are left unchanged.

#convert Crop Damage to the right numeric format.
thouposition=grep("K",data$CROPDMGEXP)
data$CROPDMG[thouposition]=data$CROPDMG[thouposition]*1000
millposition=grep("M",data$CROPDMGEXP)
data$CROPDMG[millposition]=data$CROPDMG[millposition]*1000000
billposition=grep("B",data$CROPDMGEXP)
data$CROPDMG[billposition]=data$CROPDMG[billposition]*1000000000

#convert Property Damage to the right numeric format.
thouposition=grep("K",data$PROPDMGEXP)
data$PROPDMG[thouposition]=data$PROPDMG[thouposition]*1000
millposition=grep("M",data$PROPDMGEXP)
data$PROPDMG[millposition]=data$PROPDMG[millposition]*1000000
billposition=grep("B",data$PROPDMGEXP)
data$PROPDMG[billposition]=data$PROPDMG[billposition]*1000000000
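
The same conversion can also be written as a single lookup, which makes it explicit what happens to codes other than K, M and B. This is a sketch on made-up values: the helper name `to_dollars`, the decision to treat lowercase codes like uppercase ones, and the choice to leave unrecognised codes unscaled are all assumptions, not part of the original analysis.

```r
# Multipliers for the three magnitude codes handled in the analysis.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale a damage amount by its magnitude code.
to_dollars <- function(amount, exp_code) {
  code <- toupper(as.character(exp_code))  # assumption: "k" means the same as "K"
  mult <- unname(multiplier[code])         # NA for codes not in the lookup
  mult[is.na(mult)] <- 1                   # assumption: unknown codes leave the amount unscaled
  amount * mult
}

# Made-up example values.
dmg   <- c(25, 1.5, 2, 10, 3)
codes <- c("K", "M", "B", "", "k")
converted <- to_dollars(dmg, codes)
print(converted)
```

Because the helper returns a new vector instead of multiplying the column in place, running it twice does not scale the values twice.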

Next, the following code creates a new column named “ecodamage”, which is the sum of each row’s property damage and crop damage.

data$ecodamage=data$CROPDMG+data$PROPDMG

Last, sum the total economic damage for each event type and sort the result in descending order.

ecosum=aggregate( ecodamage~ EVTYPE, data = data, FUN= sum)
ecosum=ecosum[order(-ecosum$ecodamage),]
head(ecosum,15)
##                EVTYPE    ecodamage
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57340614060
## 670       STORM SURGE  43323541000
## 244              HAIL  18752904943
## 153       FLASH FLOOD  17562129167
## 95            DROUGHT  15018672000
## 402         HURRICANE  14610229010
## 590       RIVER FLOOD  10148404500
## 427         ICE STORM   8967041360
## 848    TROPICAL STORM   8382236550
## 972      WINTER STORM   6715441251
## 359         HIGH WIND   5908617595
## 957          WILDFIRE   5060586800
## 856         TSTM WIND   5038935845

Next, using the same approach as before, the top 13 rows of the table are selected and their different spellings merged, which gives 10 event groups (for example, FLASH FLOOD and RIVER FLOOD are counted under FLOOD). The totals form a data frame with the new ranking of economic damage. (With ignore.case=T and a pattern, grep finds every spelling of an event and returns the positions of the matching rows.)

damage=data[,c("EVTYPE","CROPDMG","PROPDMG","ecodamage")]
#get the data position of the event "FLOOD" (which includes FLASH FLOOD, RIVER FLOOD and others)
flood=grep(pattern="flood",data$EVTYPE, ignore.case=T)
floodeco=sum(damage[flood,2])+sum(damage[flood,3])
#get the data position of the event "HURRICANE"
hurricane=grep(pattern="hurricane",data$EVTYPE, ignore.case=T)
hurricaneeco=sum(damage[hurricane,2])+sum(damage[hurricane,3])
#get the data position of the event "TORNADO"
tornado2=grep(pattern="tornado",data$EVTYPE, ignore.case=T)
tornado2eco=sum(damage[tornado2,2])+sum(damage[tornado2,3])
#get the data position of the event "STORM SURGE"
surge=grep(pattern="surge",data$EVTYPE, ignore.case=T)
surgeeco=sum(damage[surge,2])+sum(damage[surge,3])
#get the data position of the event "HAIL"
hail=grep(pattern="hail",data$EVTYPE, ignore.case=T)
haileco=sum(damage[hail,2])+sum(damage[hail,3])
#get the data position of the event "DROUGHT"
drought=grep(pattern="drought",data$EVTYPE, ignore.case=T)
droughteco=sum(damage[drought,2])+sum(damage[drought,3])
#get the data position of the event "ICE STORM"
icestorm=grep(pattern="ice storm",data$EVTYPE, ignore.case=T)
icestormeco=sum(damage[icestorm,2])+sum(damage[icestorm,3])
#get the data position of the event "TROPICAL STORM"
tropical=grep(pattern="tropical",data$EVTYPE, ignore.case=T)
tropicaleco=sum(damage[tropical,2])+sum(damage[tropical,3])
#get the data position of the event "WINTER STORM"
winterstorm=grep(pattern="winter storm",data$EVTYPE, ignore.case=T)
winterstormeco=sum(damage[winterstorm,2])+sum(damage[winterstorm,3])
#get the data position of the event "HIGH WIND"
high=grep(pattern="high wind",data$EVTYPE, ignore.case=T)
higheco=sum(damage[high,2])+sum(damage[high,3])

#create a new ranking
EVENTS=c("FLOOD","HURRICANE","TORNADO","STORM SURGE","HAIL","DROUGHT","ICE STORM","TROPICAL STORM","WINTER STORM","HIGH WIND")
ECODAMAGE=c(floodeco,hurricaneeco,tornado2eco,surgeeco,haileco,droughteco,icestormeco,tropicaleco,winterstormeco,higheco)
economic=cbind(EVENTS,ECODAMAGE)
economic2=as.data.frame(economic)
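
Each block above repeats the same two steps: find the rows whose EVTYPE matches a pattern, then sum the damage columns. Here is a sketch of how that repetition could be folded into one function; `sum_matching`, `patterns` and the toy data are illustrative names and values, not part of the original analysis.

```r
# Hypothetical toy data standing in for the storm database.
toy <- data.frame(
  EVTYPE  = c("FLOOD", "Flash Flood", "HAIL", "TORNADO", "hail"),
  CROPDMG = c(100, 50, 10, 0, 5),
  PROPDMG = c(1000, 500, 20, 300, 5)
)

# Sum the given columns over all rows whose EVTYPE matches the pattern.
sum_matching <- function(df, pattern, cols) {
  rows <- grep(pattern, df$EVTYPE, ignore.case = TRUE)
  sum(as.matrix(df[rows, cols]))
}

# One named pattern per group; the names become the event labels.
patterns <- c(FLOOD = "flood", HAIL = "hail", TORNADO = "tornado")
totals <- sapply(patterns, sum_matching, df = toy, cols = c("CROPDMG", "PROPDMG"))

# data.frame() keeps the totals numeric, which cbind() on mixed types would not.
ranking <- data.frame(event = names(totals), damage = unname(totals))
ranking <- ranking[order(-ranking$damage), ]
print(ranking)
```

Keeping the patterns in one named vector also documents, in a single place, which spellings were merged into which group.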

Figure 2 (economic damage events)

economic2
##            EVENTS       ECODAMAGE
## 1           FLOOD 179909840041.71
## 2       HURRICANE     90241472840
## 3         TORNADO   58999059560.2
## 4     STORM SURGE     47966079000
## 5            HAIL   20728887366.6
## 6         DROUGHT     15018927780
## 7       ICE STORM      8968141360
## 8  TROPICAL STORM      8411023550
## 9    WINTER STORM      6782441251
## 10      HIGH WIND      6868121953

From the table above, FLOOD has the greatest economic damage.

Plot 2

economic2$ECODAMAGE=as.numeric(as.character(economic2$ECODAMAGE)) #cbind() stored the totals as text
ggplot(economic2,aes(x=EVENTS,y=ECODAMAGE))+geom_bar(stat="identity")

Results

Two questions are addressed in this analysis. First, which event causes the greatest harm to the population? Second, which event causes the greatest economic damage? For the first question, population damage is counted as the number of injuries and deaths. For the second question, economic damage is counted as the sum of property damage and crop damage. After merging the various spellings of each event, TORNADO turns out to cause the greatest harm to the population, with 97068 injuries and deaths. The second question is answered with similar methods. FLOOD causes the greatest economic damage, almost 1.8 * 10^11 dollars.

However, it should be noted that because each event has many different spellings, the groupings used here are not strict; they are categories defined by the researcher. (The National Weather Service Storm Data Documentation was used as a reference.)