This document analyses the historical storm data provided by NOAA to find the top severe weather events in terms of impacts to population and economic damage.

Synopsis

Several weather-related events occur in the United States every year, and some are known to cause widespread population impact and economic damage. This study analyses weather-related events dating back to 1950 to determine which event types have had the greatest economic and social impact. The NOAA database records weather-related events and their corresponding impact from 1950 to November 2011. It contains various fields that capture information for each event. The fields judged most relevant for this study were the type of event (EVTYPE), the economic damage, given as monetary assessments of damage to property and crops (PROPDMG, CROPDMG), and the impact on the population (FATALITIES, INJURIES).

Data Processing

Population Impact

The NOAA data is provided as a CSV file and was loaded into R using the following code:

data <- read.csv("C:/RLearning/RepResearch/data/repdata_data_StormData.csv")
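
The absolute path above only resolves on the machine where the analysis was run. As a minimal sketch, assuming the CSV sits in a hypothetical "data" folder under the working directory, the path could be built with file.path() and checked before loading. The folder name and the load_storm_data helper are illustrative assumptions, not part of the original analysis; the demonstration uses a tiny stand-in file rather than the real dataset.

```r
# Hypothetical helper: load the storm data from a relative path.
# The "data" folder is an assumption; the file name is taken from the report.
load_storm_data <- function(dir = "data", file = "repdata_data_StormData.csv") {
  path <- file.path(dir, file)
  if (!file.exists(path)) {
    stop("Storm data file not found: ", path)
  }
  read.csv(path)
}

# Self-contained demonstration with a made-up two-row file in a temporary folder
demo_dir <- tempdir()
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 1)),
          file.path(demo_dir, "repdata_data_StormData.csv"), row.names = FALSE)
demo <- load_storm_data(dir = demo_dir)
```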

Next, only the fields relevant to this study were extracted into a separate dataset:

x <- data[,c("BGN_DATE","STATE","COUNTY","END_DATE","EVTYPE","PROPDMG","CROPDMG","FATALITIES","INJURIES","REMARKS")]

The event types (EVTYPE) are grouped together by factoring this column on the dataset and a summation is applied for each specific event for the population impact columns (FATALITIES,INJURIES). This results in a dataset that contains the summation of a specific EVTYPE for all recorded years over the entire United States.

 totalsum <- data.frame(cbind(tapply(x$FATALITIES,x$EVTYPE,sum),tapply(x$INJURIES,x$EVTYPE,sum)))
 totalsum <- cbind(totalsum,rownames(totalsum))
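
To make the grouping step concrete, here is a small sketch of the same tapply() pattern on a toy data frame. The values are made up for illustration and the names toy and toysum are not part of the original analysis.

```r
# Toy data standing in for the storm dataset (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3)
)

# tapply() splits the first argument by the second and applies sum() to each group
fat <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
inj <- tapply(toy$INJURIES, toy$EVTYPE, sum)

# One row per event type; the event type names become the row names
toysum <- data.frame(cbind(FATALITIES = fat, INJURIES = inj))
toysum$EVTYPE <- rownames(toysum)
```

Each row of toysum then holds the totals for one event type, which is the shape the sorting step relies on.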

This summed data can now be sorted in decreasing order (FATALITIES first, INJURIES second). The assumption is that fatalities are a more severe outcome than injuries, so FATALITIES is used as the primary sort key. Two additional columns, giving FATALITIES and INJURIES as a percentage of their respective totals, are added to indicate severity.

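# Reset row names, then sort by fatalities (ties broken by injuries), largest first
rownames(totalsum) <- c()
xx <- totalsum[order(totalsum[,1],totalsum[,2],decreasing=TRUE),]

# Express fatalities and injuries per event type as a percentage of the overall totals
xx$fpercent <- xx[,1]*100/sum(xx[,1])
xx$ipercent <- xx[,2]*100/sum(xx[,2])
# Keep the ten highest-ranked event types, with the event name as the first column
popdamages <- xx[1:10,c(3,1,2,4,5)]
colnames(popdamages) <- c("EVTYPE","FATALITIES","INJURIES","FATALITIES (% total)","INJURIES(% total)")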
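
The two-key sort and the percentage columns can be sketched on a toy table as follows. The values are made up, and the names toysum and ranked are illustrative only.

```r
# Toy per-event totals (values are made up)
toysum <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO", "WIND"),
  FATALITIES = c(4, 4, 12, 0),
  INJURIES   = c(10, 30, 50, 10)
)

# Sort by FATALITIES first; INJURIES breaks ties, so HEAT ranks above FLOOD
ranked <- toysum[order(toysum$FATALITIES, toysum$INJURIES, decreasing = TRUE), ]

# Share of the overall total for each column, in percent
ranked$fpercent <- ranked$FATALITIES * 100 / sum(ranked$FATALITIES)
ranked$ipercent <- ranked$INJURIES * 100 / sum(ranked$INJURIES)
```

Because order() receives both columns, two event types with equal fatalities are ranked by injuries rather than by their original position.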

Economic Impact

To assess the economic consequences of weather events, the damage amounts recorded in PROPDMG and CROPDMG are summed per event type and added together into a column called TOTALPROPDMG. The result is then sorted in descending order, from most damage to least.

# Sum property and crop damage per event type
propsum <- data.frame(cbind(tapply(x$PROPDMG,x$EVTYPE,sum),tapply(x$CROPDMG,x$EVTYPE,sum)))
# Keep the event type as a column; build on propsum here, not on totalsum
propsum <- cbind(propsum,rownames(propsum))
propsum$TOTALPROPDMG <- propsum[,1] + propsum[,2]
propsumdesc <- propsum[order(propsum[,"TOTALPROPDMG"],decreasing=TRUE),]
propsumdesc$dmgpercent <- propsumdesc[,"TOTALPROPDMG"]*100/sum(propsumdesc[,"TOTALPROPDMG"])
# Columns are now: property sum, crop sum, event type, total, percent
propsumdesc <- propsumdesc[,c(3,4,5)]
rownames(propsumdesc) <- c()
colnames(propsumdesc) <- c("EVTYPE","TOTALDMG","% of Total")
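
As a cross-check on the intended computation, the per-event sum of PROPDMG and CROPDMG can be sketched on toy data. This sketch swaps in aggregate() with a formula instead of the tapply() calls used in the report; the values are made up and the names toy and dmg are illustrative.

```r
# Toy damage records (values are made up)
toy <- data.frame(
  EVTYPE  = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG = c(25, 10, 5, 0),
  CROPDMG = c(0, 5, 10, 5)
)

# Sum both damage columns per event type in one call
dmg <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = toy, FUN = sum)

# Combined damage, its share of the overall total, and a descending sort
dmg$TOTALDMG <- dmg$PROPDMG + dmg$CROPDMG
dmg$dmgpercent <- dmg$TOTALDMG * 100 / sum(dmg$TOTALDMG)
dmg <- dmg[order(dmg$TOTALDMG, decreasing = TRUE), ]
```

Referring to columns by name rather than by position makes it harder to pick up values from a different table by accident.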

Results

When we review the population damage data in descending order, we can see that TORNADO and EXCESSIVE HEAT events combined account for about half of all fatalities (49.8%) and about 70% of all injuries (69.6%).

head(popdamages,10)
##             EVTYPE FATALITIES INJURIES FATALITIES (% total)
## 834        TORNADO       5633    91346            37.193793
## 130 EXCESSIVE HEAT       1903     6525            12.565203
## 153    FLASH FLOOD        978     1777             6.457577
## 275           HEAT        937     2100             6.186860
## 464      LIGHTNING        816     5230             5.387917
## 856      TSTM WIND        504     6957             3.327831
## 170          FLOOD        470     6789             3.103334
## 585    RIP CURRENT        368      232             2.429845
## 359      HIGH WIND        248     1137             1.637504
## 19       AVALANCHE        224      170             1.479036
##     INJURIES(% total)
## 834        65.0019925
## 130         4.6432028
## 153         1.2645167
## 275         1.4943641
## 464         3.7216782
## 856         4.9506148
## 170         4.8310657
## 585         0.1650917
## 359         0.8090914
## 19          0.1209723
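
The combined share of the two leading event types can be checked directly from the percentages printed above. A small sketch, using only numbers copied from the listing (the vector names are illustrative):

```r
# Percentages copied from the listing above (TORNADO, EXCESSIVE HEAT)
fatal_pct  <- c(TORNADO = 37.193793, EXCESSIVE_HEAT = 12.565203)
injury_pct <- c(TORNADO = 65.0019925, EXCESSIVE_HEAT = 4.6432028)

# Combined share of fatalities and injuries for the two event types
combined <- c(fatalities = sum(fatal_pct), injuries = sum(injury_pct))
round(combined, 1)  # fatalities 49.8, injuries 69.6
```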

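Doing the same for economic damage, the listing below ranks the TORNADO event type first (62.3% of the total), followed by EXCESSIVE HEAT (5.4%). Note that the TOTALDMG values shown equal FATALITIES plus INJURIES for each event type (for TORNADO, 5633 + 91346 = 96979), so they should be treated with caution as a measure of monetary damage.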

head(propsumdesc,10)
##               EVTYPE TOTALDMG % of Total
## 1            TORNADO    96979 62.2966089
## 2     EXCESSIVE HEAT     8428  5.4139125
## 3          TSTM WIND     7461  4.7927386
## 4              FLOOD     7259  4.6629795
## 5          LIGHTNING     6046  3.8837820
## 6               HEAT     3037  1.9508842
## 7        FLASH FLOOD     2755  1.7697353
## 8          ICE STORM     2064  1.3258561
## 9  THUNDERSTORM WIND     1621  1.0412853
## 10      WINTER STORM     1527  0.9809023

Pie charts showing the event types that contribute the bulk of the damage assessment are illustrated below.

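par(mfrow=c(1,3))
# Top five event types: share of fatalities, share of injuries, share of economic damage
pie(popdamages[1:5,4],as.character(popdamages$EVTYPE[1:5]),main="Fatalities")
pie(popdamages[1:5,5],as.character(popdamages$EVTYPE[1:5]),main="Injuries")
pie(propsumdesc[1:5,3],as.character(propsumdesc$EVTYPE[1:5]),main="Economic")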

From this analysis, TORNADO and EXCESSIVE HEAT emerge as the two weather events with the greatest impact on population and economy, and preparing for these events should help mitigate these losses to a considerable extent.