Synopsis

This study aims to identify the types of severe weather events that are the most dangerous to human lives (causing death and injory), as well as the most damaging to property in the US. Data from National Oceanic and Atmospheric Authority (NOAA) gathered over close to 50 years (from 1951 to 1999), was used to perform the analyses. The results of this analysis indicates that the most damaging weather event, both in terms of loss of human life and property damage is the tornado. Although lightning and thunderstorm events are events that happen most frequently, their damage is not comparable to that of tornados.

Introduction

Severe weather events are so prevalent in the US that we have at least one type of severe event per each season. Some regions of the US are more prone to multiple events, while others are prone to only specific types. In this study I will explore the adverse effects from severe weather on the entire country - without specifically partitioning the impacts by region. The main research question that this study addresses is “what are the types of severe weather events that caused the most distructiin the US over the second half of the 20th century?” For the purpose of this study, “distruction” is categorized into: 1. Human death (fatalities) 2. Human injuries 2. Property damage measured in terms of dollar cost.

Study Methods

This study uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to answer the research question. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The data consists of 902,207 weather events recorded during the second half of the 20th century (1951 - 1999). There are 37 variables detailing the different characteristics of the weather events - including date/times of their occurrence, the locations etc.

Data Processing

As could be seen in the following display, in the NOAA database used for this research, the weather events are catagorized into 985 “event types”.

setwd("C:\\Users\\Simon\\Desktop\\Data Science\\course5\\data")
stDat<-read.csv("repdata_data_StormData.csv")
str(stDat)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The first order of data processing business is therefore to reduce the 985 types of events into identical categories that are more manageable for the purpose of my analysis. After examining the events in detail, I “collapsed” the event categories into 9 broader groups. These broader groups are: 1. Winter related severe weather events 2. Wind related severe events 3. Drought related events 4. Marine thunderstorm 5. Tornados 6. Floods 7. Lightning/thunderstorm events (land) 8. Hurricane and 9. Hail

The categorization of the events into these broader categories of weather events is performed using a “regular expression” search for common words used in describing each weather event in the data. The following code actually performs the categorization:

attach(stDat)
## The following object is masked from package:base:
## 
##     F
stDat$eventCat<-"Other event"
stDat$eventCat[grep("WINT", stDat$EVTYPE)]<-"Winter event"
stDat$eventCat[grep("SNOW", stDat$EVTYPE)]<-"Winter event"
stDat$eventCat[grep("BLIZZARD", stDat$EVTYPE)]<-"Winter event"
stDat$eventCat[grep("COLD", stDat$EVTYPE)]<-"Winter event"
stDat$eventCat[grep("FREEZ", stDat$EVTYPE)]<-"Winter event"
stDat$eventCat[grep("WIND", stDat$EVTYPE)]<-"Wind Related"
stDat$eventCat[grep("DRY", stDat$EVTYPE)]<-"Drought Related"
stDat$eventCat[grep("FIRE", stDat$EVTYPE)]<-"Drought Related"
stDat$eventCat[grep("TSTM", stDat$EVTYPE)]<-"Marine Thunderstorm"
stDat$eventCat[grep("TORNAD", stDat$EVTYPE)]<-"Tornados"
stDat$eventCat[grep("FUNNEL", stDat$EVTYPE)]<-"Tornados"
stDat$eventCat[grep("FLOOD", stDat$EVTYPE)]<-"Flood"
stDat$eventCat[grep("WATER", stDat$EVTYPE)]<-"Flood"
stDat$eventCat[grep("THUNDER", stDat$EVTYPE)]<-"Lightning/Thunderstorms"
stDat$eventCat[grep("LIGHT", stDat$EVTYPE)]<-"Lightning/Thunderstorms"
stDat$eventCat[grep("HURR", stDat$EVTYPE)]<-"Hurricane"
stDat$eventCat[grep("HAIL", stDat$EVTYPE)]<-"Hail"
detach(stDat)

The next task in the data processing step was to standardize the cost estimate variables that are given in the data. According to the document that describes the storm data, the damages from each type of event are given in three-significant-digit numbers with a column describing whether the numbers are in thousands, millions or billions. In order to summarize the columns correctly, we need to standardize the cost estimates in terms of unifrom dollar amounts. The following code accomplishes this. At this point, in order to allow easy manipulation, I will make the data frame into a data table. This will make the subsequent data processing more efficient.

library(data.table)
stdt<-data.table(stDat)
attach(stdt)
## The following object is masked from package:base:
## 
##     F
stdt$propDmgAmt[PROPDMGEXP=="K"]<-PROPDMG*1000
## Warning in stdt$propDmgAmt[PROPDMGEXP == "K"] <- PROPDMG * 1000: number of
## items to replace is not a multiple of replacement length
stdt$propDmgAmt[PROPDMGEXP=="M"]<-PROPDMG*1000000
## Warning in stdt$propDmgAmt[PROPDMGEXP == "M"] <- PROPDMG * 1e+06: number
## of items to replace is not a multiple of replacement length
stdt$propDmgAmt[PROPDMGEXP=="B"]<-PROPDMG*1000000000
## Warning in stdt$propDmgAmt[PROPDMGEXP == "B"] <- PROPDMG * 1e+09: number
## of items to replace is not a multiple of replacement length
stdt$cropDmgAmt[PROPDMGEXP=="K"]<-CROPDMG*1000
## Warning in stdt$cropDmgAmt[PROPDMGEXP == "K"] <- CROPDMG * 1000: number of
## items to replace is not a multiple of replacement length
stdt$cropDmgAmt[PROPDMGEXP=="M"]<-CROPDMG*1000000
## Warning in stdt$cropDmgAmt[PROPDMGEXP == "M"] <- CROPDMG * 1e+06: number
## of items to replace is not a multiple of replacement length
stdt$cropDmgAmt[PROPDMGEXP=="B"]<-CROPDMG*1000000000
## Warning in stdt$cropDmgAmt[PROPDMGEXP == "B"] <- CROPDMG * 1e+09: number
## of items to replace is not a multiple of replacement length

After these major categories of severe weather events are created, and the costs are standardized, the raw data are then summarized with each major category representing a single record. The code that performed this transformation can be viewed below. Note that fatalities and injuries count does not need the standardizataion that is applied to the cost estimates.

dmgDat<-stdt[, list(totalpropDmg=sum(propDmgAmt, na.rm=TRUE),
                    totalCropDmg=sum(cropDmgAmt, na.rm=TRUE),
                    totalFatalities=sum(FATALITIES, na.rm=TRUE),
                    totalInjuries=sum(INJURIES, na.rm=TRUE),
                    eventCount = .N), by=eventCat]

Results

The following bar-chart shows the number of overall events categorized into the broad categories described above:

evCats<-table(stDat$eventCat)
barplot(evCats, horiz=TRUE, beside = TRUE,
        col = c("lightblue", "mistyrose", "red", "lightcyan",
                "lavender", "cornsilk", "blue", "green", "orange", "grey"), axisnames = FALSE, args.legend=list(x="topright", pch=25, pt.cex=.5, cex=0.6, text.font=0, ncol=3), legend.text=TRUE, xlab="Number of events", main="Broad Categories of Weather Events")

We see from the above graph that hail is by far the most frequent damaging weather event. A total of 290,394 such events are recorded over the period of the study data. This is followed by marine thunderstorm events, which in turn, are follwed by lighting/thunderstorms.

The Damages Incurred by these evemts

The following graph portrays the total damages (in billions of dollars), incurred severe weather events categorized by the major categories of events. By far the most damaging category of weather event that in terms of property is the tornado.

propD<-dmgDat$totalpropDmg/1000000000
names(propD)<-dmgDat$eventCat
barplot(propD, horiz=TRUE, beside = TRUE,
        col = c("lightblue", "mistyrose", "red", "lightcyan",
                "lavender", "cornsilk", "blue", "green", "orange", "grey"), axisnames = FALSE, args.legend=list(x="topright", pch=25, pt.cex=.5, cex=0.6, text.font=0, ncol=3), legend.text=TRUE, xlab="Billions of Dollars", main="Property Damage by Broad Categories of Weather Events")

In terms of fatalities, again, tornados claimed by far the most life. A total of 5,633 lives have been lost to tornado events. The below graph shows total deaths due to the different weather events over the period of study.

fetD<-dmgDat$totalFatalities
names(fetD)<-dmgDat$eventCat
barplot(fetD, horiz=TRUE, beside = TRUE,
        col = c("lightblue", "mistyrose", "red", "lightcyan",
                "lavender", "cornsilk", "blue", "green", "orange", "grey"), axisnames = FALSE, args.legend=list(x="topright", pch=25, pt.cex=.5, cex=0.6, text.font=0, ncol=3), legend.text=TRUE, xlab="Number of fatalities", main="Fatalities by Broad Categories of Weather Events")

The total injury count attributed to tornadic events is even more staggering - 91,367. The details of damages to property as well as fatalities and injuries attributed to each of the major categories of weather events is provided on the table below.

head(dmgDat, 10)
##                    eventCat totalpropDmg totalCropDmg totalFatalities
##  1:                Tornados 357218292520     23822050            5633
##  2:     Marine Thunderstorm   8167079970            0             514
##  3:                    Hail  39039012880    175540510              45
##  4:            Winter event  28214394860     40653480             756
##  5:                   Flood  92722319590     69801140            1533
##  6:               Hurricane 293232902530         2000             135
##  7: Lightning/Thunderstorms   7288738340    202959660            1030
##  8:             Other event 333974082200     25589230            4687
##  9:            Wind Related  30495173030     27808370             690
## 10:         Drought Related  31158470700      3642760             122
##     totalInjuries eventCount
##  1:         91367      67666
##  2:          6970     226204
##  3:          1467     290394
##  4:          4191      42456
##  5:          8675      86527
##  6:          1326        286
##  7:          7712     125441
##  8:         15193      30673
##  9:          1990      28108
## 10:          1637       4542

Conclusion

In conclusion we see that tornados have been a major cause of human casualty and property damage over the last half of the 20th century. As such they need to be the target for scientific investigation in order to avoid unnecessary loss to human life and property damage.