Synopsis

This analysis looks at the NOAA’s database of storms and other severe weather events and their associated fatalities, injuries, and property damage. The objective of this analysis is to identify which types of storm/weather events have a greater impact on population health (as measured by fatalities and injuries) and which have greater economic consequences (as measured by total damages to property and crops). The first step in the analysis entailed identifying the most relevant time period for analysis – which turns out to be 1993 onwards. Then, for this time period, storm/weather event types were analyzed and reclassified into the major Event Types found in the sourcebook. Finally, bar charts of fatalities, injuries and total monetary damages by major Event Types were created, which allows for comparison.

Data Processing

The NOAA database was download from the URL specified. We analyzed the total fatalities per year and noticed a significant increase starting with 1993 onward, likely a result of increased and improved strom tracking across the US.

## Set the working directory
setwd("C:/Users/jlee/Documents/Data Science/Reproducible Research/")

## If the file isnt there, download it and unzip it
if (!file.exists("stormdata.csv.bz2")) {
    url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(url, dest="stormdata.csv.bz2", method="curl")
}

## Read into data frame, create a column for Year
storm <- read.csv(bzfile("stormdata.csv.bz2"),stringsAsFactor=FALSE)
library(lubridate)
storm$BGN_DATE <- strptime(storm$BGN_DATE, "%m/%d/%Y %H:%M:%S")
BGN_YEAR <- year(storm$BGN_DATE)
storm <- cbind(storm,BGN_YEAR)

##  Aggegate fatalities by Year
fatalities <- setNames(aggregate(storm$FATALITIES,list(year=storm$BGN_YEAR),sum),c("year","fatalities"))

In the table below, we see a step change increase in fatalities starting in 1993.

fatalities
##    year fatalities
## 1  1950         70
## 2  1951         34
## 3  1952        230
## 4  1953        519
## 5  1954         36
## 6  1955        129
## 7  1956         83
## 8  1957        193
## 9  1958         67
## 10 1959         58
## 11 1960         46
## 12 1961         52
## 13 1962         30
## 14 1963         31
## 15 1964         73
## 16 1965        301
## 17 1966         98
## 18 1967        114
## 19 1968        131
## 20 1969         66
## 21 1970         73
## 22 1971        159
## 23 1972         27
## 24 1973         89
## 25 1974        366
## 26 1975         60
## 27 1976         44
## 28 1977         43
## 29 1978         53
## 30 1979         84
## 31 1980         28
## 32 1981         24
## 33 1982         64
## 34 1983         37
## 35 1984        160
## 36 1985        112
## 37 1986         51
## 38 1987         89
## 39 1988         55
## 40 1989         79
## 41 1990         95
## 42 1991         73
## 43 1992         54
## 44 1993        298
## 45 1994        344
## 46 1995       1491
## 47 1996        542
## 48 1997        601
## 49 1998        687
## 50 1999        908
## 51 2000        477
## 52 2001        469
## 53 2002        498
## 54 2003        443
## 55 2004        370
## 56 2005        469
## 57 2006        599
## 58 2007        421
## 59 2008        488
## 60 2009        333
## 61 2010        425
## 62 2011       1002

As a result, we decided do further data processing only on data from 1993 onward in the NOAA dataset. Our next step was to identify the key Event Types. To do this, we calculate the total fatalities by event type during the time period. And, to further narrow the list of Event Types, we excluded all Event Types with 10 or less fatalities during this time period.

##  Use data from 1993 onward for further analysis
storm1993 <- storm[storm$BGN_YEAR >= 1993,]

## Get fatalities by events 
eventagg <- setNames(aggregate(storm1993$FATALITIES,list(event=storm1993$EVTYPE),sum),c("evtype","fatalities"))
keyevents<- eventagg[eventagg$fatalities>10,]

This left us with a manageable list of Event Types, which we will further aggregate and reclassify to the one of the 48 major Event Types found in the sourcebook.

keyevents
##                         evtype fatalities
## 19                   AVALANCHE        224
## 30                    BLIZZARD        101
## 66                        COLD         35
## 72               COLD AND SNOW         14
## 79             COLD/WIND CHILL         95
## 89                   DENSE FOG         18
## 117                 DUST STORM         22
## 130             EXCESSIVE HEAT       1903
## 140               EXTREME COLD        160
## 141    EXTREME COLD/WIND CHILL        125
## 142               EXTREME HEAT         96
## 146          EXTREME WINDCHILL         17
## 153                FLASH FLOOD        978
## 161          FLASH FLOOD/FLOOD         14
## 164             FLASH FLOODING         19
## 170                      FLOOD        470
## 177          FLOOD/FLASH FLOOD         17
## 188                        FOG         62
## 275                       HEAT        937
## 278                  HEAT WAVE        172
## 290                 HEAVY RAIN         98
## 310                 HEAVY SNOW        127
## 342       HEAVY SURF/HIGH SURF         42
## 350                  HIGH SURF        101
## 359                  HIGH WIND        248
## 376                 HIGH WINDS         35
## 402                  HURRICANE         61
## 411          HURRICANE/TYPHOON         64
## 427                  ICE STORM         89
## 442                  LANDSLIDE         38
## 464                  LIGHTNING        816
## 488         MARINE STRONG WIND         14
## 580      RECORD/EXCESSIVE HEAT         17
## 585                RIP CURRENT        368
## 586               RIP CURRENTS        204
## 670                STORM SURGE         13
## 671           STORM SURGE/TIDE         11
## 676                STRONG WIND        103
## 760          THUNDERSTORM WIND        133
## 786         THUNDERSTORM WINDS         64
## 834                    TORNADO       1621
## 842 TORNADOES, TSTM WIND, HAIL         25
## 848             TROPICAL STORM         58
## 856                  TSTM WIND        241
## 877                    TSUNAMI         33
## 886          UNSEASONABLY WARM         11
## 888  UNSEASONABLY WARM AND DRY         29
## 919       URBAN/SML STREAM FLD         28
## 955           WILD/FOREST FIRE         12
## 957                   WILDFIRE         75
## 960                       WIND         23
## 972               WINTER STORM        206
## 978             WINTER WEATHER         33
## 980         WINTER WEATHER/MIX         28

Here is how we mapped the Event Types in the NOAA database to one of the 48 major Event Types in the sourcebook.

##  Create a new column to store the mapped Event Type and bind to the dataset
evcat <-storm1993$EVTYPE
storm1993 <- cbind(storm1993,evcat)
storm1993$evcat <- as.character(storm1993$evcat)

##  Reclassigy events to the 31 Event Types in the Source Book
storm1993$evcat[grep("^AVALANCHE$",storm1993$evcat)] <- "Avalanche"
storm1993$evcat[grep("^BLIZZARD$",storm1993$evcat)] <- "Blizzard"
storm1993$evcat[grep("^COLD AND SNOW$",storm1993$evcat)] <- "Cold/Wind Chill"
storm1993$evcat[grep("^COLD/WIND CHILL$",storm1993$evcat)] <- "Cold/Wind Chill"
storm1993$evcat[grep("^COLD$",storm1993$evcat)] <- "Cold/Wind Chill"
storm1993$evcat[grep("^DENSE FOG$",storm1993$evcat)] <- "Dense Fog"
storm1993$evcat[grep("^DUST STORM$",storm1993$evcat)] <- "Dust Storm"
storm1993$evcat[grep("^EXCESSIVE HEAT$",storm1993$evcat)] <- "Excessive Heat"
storm1993$evcat[grep("^EXTREME COLD$",storm1993$evcat)] <- "Extreme Cold/Wind Chill"
storm1993$evcat[grep("^EXTREME COLD/WIND CHILL$",storm1993$evcat)] <- "Extreme Cold/Wind Chill"
storm1993$evcat[grep("^EXTREME HEAT$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^EXTREME WINDCHILL$",storm1993$evcat)] <- "Extreme Cold/Wind Chill"
storm1993$evcat[grep("^FLASH FLOOD/FLOOD$",storm1993$evcat)] <- "Flash Flood"
storm1993$evcat[grep("^FLASH FLOODING$",storm1993$evcat)] <- "Flash Flood"
storm1993$evcat[grep("^FLOOD/FLASH FLOOD$",storm1993$evcat)] <- "Flash Flood"
storm1993$evcat[grep("^FLASH FLOOD$",storm1993$evcat)] <- "Flash Flood"
storm1993$evcat[grep("^FLOOD$",storm1993$evcat)] <- "Flood"
storm1993$evcat[grep("^FOG$",storm1993$evcat)] <- "Dense Fog"
storm1993$evcat[grep("^HEAT WAVE$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^HEAT$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^HEAVY RAIN$",storm1993$evcat)] <- "Heavy Rain"
storm1993$evcat[grep("^HEAVY SNOW$",storm1993$evcat)] <- "Heavy Snow"
storm1993$evcat[grep("^HEAVY SURF/HIGH SURF$",storm1993$evcat)] <- "High Surf"
storm1993$evcat[grep("^HIGH SURF$",storm1993$evcat)] <- "High Surf"
storm1993$evcat[grep("^HIGH WINDS$",storm1993$evcat)] <- "High Wind"
storm1993$evcat[grep("^HIGH WIND$",storm1993$evcat)] <- "High Wind"
storm1993$evcat[grep("^HURRICANE/TYPHOON$",storm1993$evcat)] <- "Hurricane (Typhoon)"
storm1993$evcat[grep("^HURRICANE$",storm1993$evcat)] <- "Hurricane (Typhoon)"
storm1993$evcat[grep("^ICE STORM$",storm1993$evcat)] <- "Ice Storm"
storm1993$evcat[grep("^LANDSLIDE$",storm1993$evcat)] <- "Other"
storm1993$evcat[grep("^LIGHTNING$",storm1993$evcat)] <- "Lightning"
storm1993$evcat[grep("^MARINE STRONG WIND$",storm1993$evcat)] <- "Marine Strong Wind"
storm1993$evcat[grep("^RECORD/EXCESSIVE HEAT$",storm1993$evcat)] <- "Excessive Heat"
storm1993$evcat[grep("^RIP CURRENTS$",storm1993$evcat)] <- "Rip Current"
storm1993$evcat[grep("^RIP CURRENT$",storm1993$evcat)] <- "Rip Current"
storm1993$evcat[grep("^STORM SURGE/TIDE$",storm1993$evcat)] <- "Storm Surge/Tide"
storm1993$evcat[grep("^STORM SURGE$",storm1993$evcat)] <- "Storm Surge/Tide"
storm1993$evcat[grep("^STRONG WIND$",storm1993$evcat)] <- "Strong Wind"
storm1993$evcat[grep("^THUNDERSTORM WINDS$",storm1993$evcat)] <- "Thunderstorm Wind"
storm1993$evcat[grep("^THUNDERSTORM WIND$",storm1993$evcat)] <- "Thunderstorm Wind"
storm1993$evcat[grep("^TORNADOES, TSTM WIND, HAIL$",storm1993$evcat)] <- "Tornado"
storm1993$evcat[grep("^TORNADO$",storm1993$evcat)] <- "Tornado"
storm1993$evcat[grep("^TROPICAL STORM$",storm1993$evcat)] <- "Tropical Storm"
storm1993$evcat[grep("^TSTM WIND$",storm1993$evcat)] <- "Thunderstorm Wind"
storm1993$evcat[grep("^TSUNAMI$",storm1993$evcat)] <- "Tsunami"
storm1993$evcat[grep("^UNSEASONABLY WARM AND DRY$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^UNSEASONABLY WARM$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^URBAN/SML STREAM FLD$",storm1993$evcat)] <- "Lakeshore Flood"
storm1993$evcat[grep("^WILD/FOREST FIRE$",storm1993$evcat)] <- "Wildfire"
storm1993$evcat[grep("^WILDFIRE$",storm1993$evcat)] <- "Wildfire"
storm1993$evcat[grep("^WINTER STORM$",storm1993$evcat)] <- "Winter Storm"
storm1993$evcat[grep("^WINTER WEATHER/MIX$",storm1993$evcat)] <- "Winter Weather"
storm1993$evcat[grep("^WINTER WEATHER$",storm1993$evcat)] <- "Winter Weather"
storm1993$evcat[grep("^WIND$",storm1993$evcat)] <- "Strong Wind"

#  Now, for all other Event Types which are not mapped, we will reclassify to "Other"
match <- (storm1993$evcat == storm1993$EVTYPE)
storm1993$evcat[match] <- "Other"

Also, we calculated total damage, by summing up property and crop damages. We interpreted the “exp” columns as expontents to the 10 power.

##  Convert symbols to 0, B = billion, K = thousand, M = million, H = hundred.  Do this for Property.
storm1993$PROPDMGEXP <- gsub("+",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$PROPDMGEXP <- gsub("?",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$PROPDMGEXP <- gsub("-",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$PROPDMGEXP <- gsub("B",9,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$PROPDMGEXP <- gsub("K",3,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$PROPDMGEXP <- gsub("M",6,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$PROPDMGEXP <- gsub("H",2,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$PROPDMGEXP <- as.integer(storm1993$PROPDMGEXP)
storm1993$PROPDMGEXP <- as.integer(storm1993$PROPDMGEXP)
storm1993$PROPDMGEXP[is.na(storm1993$PROPDMGEXP)] <- 0

##  And do same for Crop Exponent
storm1993$CROPDMGEXP <- gsub("+",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$CROPDMGEXP <- gsub("?",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$CROPDMGEXP <- gsub("-",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$CROPDMGEXP <- gsub("B",9,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$CROPDMGEXP <- gsub("K",3,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$CROPDMGEXP <- gsub("M",6,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$CROPDMGEXP <- gsub("H",2,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$CROPDMGEXP <- as.integer(storm1993$PROPDMGEXP)
storm1993$CROPDMGEXP <- as.integer(storm1993$PROPDMGEXP)
storm1993$CROPDMGEXP[is.na(storm1993$PROPDMGEXP)] <- 0

## Caculate total damages as sum of property and crops.  Add as column to the dataset.
totdmg <- storm1993$PROPDMG*(10^storm1993$PROPDMGEXP) + storm1993$CROPDMG*(10^storm1993$CROPDMGEXP)
storm1993 <- cbind(storm1993,totdmg)

Results

Now, lets take a look at Fatalities, Injuries and Total Damages by major Event Type.

Here is a bar chart of Fatalities by major Event Type. The Events on the left caused the most fatalities.

##  Aggregate fatalities by major Event Type
fatalities <- setNames(aggregate(storm1993$FATALITIES,list(event=storm1993$evcat),sum),c("evcat","fatalities"))

##  Order by fatalities descending
fatalities <- fatalities[order(fatalities$fatalities,decreasing=TRUE),]

##  Plot fatalities by major Event Type
library(ggplot2)
g <- ggplot(fatalities,aes(x=evcat,y=fatalities)) + geom_bar(stat="identity") + scale_x_discrete(limits=fatalities$evcat) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(y="Fatalities",title="Fatalities by Event Type\n(Since 1993)",x="Major Event Types")
g

Here is a bar chart of Injuries by major Event Type. Note that we keep the Event Category in the same order as for Fatalities to ease comparison.

##  Aggregate Injuries by major Event Type
injuries <- setNames(aggregate(storm1993$INJURIES,list(event=storm1993$evcat),sum),c("evcat","injuries"))

##  Plot injuries by major Event Type.   
q <- ggplot(injuries,aes(x=evcat,y=injuries)) + geom_bar(stat="identity") + scale_x_discrete(limits=fatalities$evcat) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(y="Injuries",title="Injuries by Event Type\n(Since 1993)",x="Major Event Types")
q

Here is a bar chart of Total Damages by major Event Type. Note that we keep the Event Category in the same order as for Fatalities and Injuriies to ease comparison. We see that Hurricane (Typhoon) events cause the most property and crop damage.

dmg  <- setNames(aggregate(storm1993$totdmg,list(event=storm1993$evcat),sum),c("evcat","damages"))

dmg <- dmg[order(dmg$damages,decreasing=TRUE),]

k <- ggplot(dmg,aes(x=evcat,y=damages/10^9)) + geom_bar(stat="identity") + scale_x_discrete(limits=fatalities$evcat) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(y="Total Damages in $Billions",title="Property & Crop Damages by Event Type\n(Since 1993)",x="Major Event Types")
k