This analysis looks at the NOAA’s database of storms and other severe weather events and their associated fatalities, injuries, and property damage. The objective of this analysis is to identify which types of storm/weather events have a greater impact on population health (as measured by fatalities and injuries) and which have greater economic consequences (as measured by total damages to property and crops). The first step in the analysis entailed identifying the most relevant time period for analysis – which turns out to be 1993 onwards. Then, for this time period, storm/weather event types were analyzed and reclassified into the major Event Types found in the sourcebook. Finally, bar charts of fatalities, injuries and total monetary damages by major Event Types were created, which allows for comparison.
The NOAA database was download from the URL specified. We analyzed the total fatalities per year and noticed a significant increase starting with 1993 onward, likely a result of increased and improved strom tracking across the US.
## Set the working directory
setwd("C:/Users/jlee/Documents/Data Science/Reproducible Research/")
## If the file isnt there, download it and unzip it
if (!file.exists("stormdata.csv.bz2")) {
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, dest="stormdata.csv.bz2", method="curl")
}
## Read into data frame, create a column for Year
storm <- read.csv(bzfile("stormdata.csv.bz2"),stringsAsFactor=FALSE)
library(lubridate)
storm$BGN_DATE <- strptime(storm$BGN_DATE, "%m/%d/%Y %H:%M:%S")
BGN_YEAR <- year(storm$BGN_DATE)
storm <- cbind(storm,BGN_YEAR)
## Aggegate fatalities by Year
fatalities <- setNames(aggregate(storm$FATALITIES,list(year=storm$BGN_YEAR),sum),c("year","fatalities"))
In the table below, we see a step change increase in fatalities starting in 1993.
fatalities
## year fatalities
## 1 1950 70
## 2 1951 34
## 3 1952 230
## 4 1953 519
## 5 1954 36
## 6 1955 129
## 7 1956 83
## 8 1957 193
## 9 1958 67
## 10 1959 58
## 11 1960 46
## 12 1961 52
## 13 1962 30
## 14 1963 31
## 15 1964 73
## 16 1965 301
## 17 1966 98
## 18 1967 114
## 19 1968 131
## 20 1969 66
## 21 1970 73
## 22 1971 159
## 23 1972 27
## 24 1973 89
## 25 1974 366
## 26 1975 60
## 27 1976 44
## 28 1977 43
## 29 1978 53
## 30 1979 84
## 31 1980 28
## 32 1981 24
## 33 1982 64
## 34 1983 37
## 35 1984 160
## 36 1985 112
## 37 1986 51
## 38 1987 89
## 39 1988 55
## 40 1989 79
## 41 1990 95
## 42 1991 73
## 43 1992 54
## 44 1993 298
## 45 1994 344
## 46 1995 1491
## 47 1996 542
## 48 1997 601
## 49 1998 687
## 50 1999 908
## 51 2000 477
## 52 2001 469
## 53 2002 498
## 54 2003 443
## 55 2004 370
## 56 2005 469
## 57 2006 599
## 58 2007 421
## 59 2008 488
## 60 2009 333
## 61 2010 425
## 62 2011 1002
As a result, we decided do further data processing only on data from 1993 onward in the NOAA dataset. Our next step was to identify the key Event Types. To do this, we calculate the total fatalities by event type during the time period. And, to further narrow the list of Event Types, we excluded all Event Types with 10 or less fatalities during this time period.
## Use data from 1993 onward for further analysis
storm1993 <- storm[storm$BGN_YEAR >= 1993,]
## Get fatalities by events
eventagg <- setNames(aggregate(storm1993$FATALITIES,list(event=storm1993$EVTYPE),sum),c("evtype","fatalities"))
keyevents<- eventagg[eventagg$fatalities>10,]
This left us with a manageable list of Event Types, which we will further aggregate and reclassify to the one of the 48 major Event Types found in the sourcebook.
keyevents
## evtype fatalities
## 19 AVALANCHE 224
## 30 BLIZZARD 101
## 66 COLD 35
## 72 COLD AND SNOW 14
## 79 COLD/WIND CHILL 95
## 89 DENSE FOG 18
## 117 DUST STORM 22
## 130 EXCESSIVE HEAT 1903
## 140 EXTREME COLD 160
## 141 EXTREME COLD/WIND CHILL 125
## 142 EXTREME HEAT 96
## 146 EXTREME WINDCHILL 17
## 153 FLASH FLOOD 978
## 161 FLASH FLOOD/FLOOD 14
## 164 FLASH FLOODING 19
## 170 FLOOD 470
## 177 FLOOD/FLASH FLOOD 17
## 188 FOG 62
## 275 HEAT 937
## 278 HEAT WAVE 172
## 290 HEAVY RAIN 98
## 310 HEAVY SNOW 127
## 342 HEAVY SURF/HIGH SURF 42
## 350 HIGH SURF 101
## 359 HIGH WIND 248
## 376 HIGH WINDS 35
## 402 HURRICANE 61
## 411 HURRICANE/TYPHOON 64
## 427 ICE STORM 89
## 442 LANDSLIDE 38
## 464 LIGHTNING 816
## 488 MARINE STRONG WIND 14
## 580 RECORD/EXCESSIVE HEAT 17
## 585 RIP CURRENT 368
## 586 RIP CURRENTS 204
## 670 STORM SURGE 13
## 671 STORM SURGE/TIDE 11
## 676 STRONG WIND 103
## 760 THUNDERSTORM WIND 133
## 786 THUNDERSTORM WINDS 64
## 834 TORNADO 1621
## 842 TORNADOES, TSTM WIND, HAIL 25
## 848 TROPICAL STORM 58
## 856 TSTM WIND 241
## 877 TSUNAMI 33
## 886 UNSEASONABLY WARM 11
## 888 UNSEASONABLY WARM AND DRY 29
## 919 URBAN/SML STREAM FLD 28
## 955 WILD/FOREST FIRE 12
## 957 WILDFIRE 75
## 960 WIND 23
## 972 WINTER STORM 206
## 978 WINTER WEATHER 33
## 980 WINTER WEATHER/MIX 28
Here is how we mapped the Event Types in the NOAA database to one of the 48 major Event Types in the sourcebook.
## Create a new column to store the mapped Event Type and bind to the dataset
evcat <-storm1993$EVTYPE
storm1993 <- cbind(storm1993,evcat)
storm1993$evcat <- as.character(storm1993$evcat)
## Reclassigy events to the 31 Event Types in the Source Book
storm1993$evcat[grep("^AVALANCHE$",storm1993$evcat)] <- "Avalanche"
storm1993$evcat[grep("^BLIZZARD$",storm1993$evcat)] <- "Blizzard"
storm1993$evcat[grep("^COLD AND SNOW$",storm1993$evcat)] <- "Cold/Wind Chill"
storm1993$evcat[grep("^COLD/WIND CHILL$",storm1993$evcat)] <- "Cold/Wind Chill"
storm1993$evcat[grep("^COLD$",storm1993$evcat)] <- "Cold/Wind Chill"
storm1993$evcat[grep("^DENSE FOG$",storm1993$evcat)] <- "Dense Fog"
storm1993$evcat[grep("^DUST STORM$",storm1993$evcat)] <- "Dust Storm"
storm1993$evcat[grep("^EXCESSIVE HEAT$",storm1993$evcat)] <- "Excessive Heat"
storm1993$evcat[grep("^EXTREME COLD$",storm1993$evcat)] <- "Extreme Cold/Wind Chill"
storm1993$evcat[grep("^EXTREME COLD/WIND CHILL$",storm1993$evcat)] <- "Extreme Cold/Wind Chill"
storm1993$evcat[grep("^EXTREME HEAT$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^EXTREME WINDCHILL$",storm1993$evcat)] <- "Extreme Cold/Wind Chill"
storm1993$evcat[grep("^FLASH FLOOD/FLOOD$",storm1993$evcat)] <- "Flash Flood"
storm1993$evcat[grep("^FLASH FLOODING$",storm1993$evcat)] <- "Flash Flood"
storm1993$evcat[grep("^FLOOD/FLASH FLOOD$",storm1993$evcat)] <- "Flash Flood"
storm1993$evcat[grep("^FLASH FLOOD$",storm1993$evcat)] <- "Flash Flood"
storm1993$evcat[grep("^FLOOD$",storm1993$evcat)] <- "Flood"
storm1993$evcat[grep("^FOG$",storm1993$evcat)] <- "Dense Fog"
storm1993$evcat[grep("^HEAT WAVE$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^HEAT$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^HEAVY RAIN$",storm1993$evcat)] <- "Heavy Rain"
storm1993$evcat[grep("^HEAVY SNOW$",storm1993$evcat)] <- "Heavy Snow"
storm1993$evcat[grep("^HEAVY SURF/HIGH SURF$",storm1993$evcat)] <- "High Surf"
storm1993$evcat[grep("^HIGH SURF$",storm1993$evcat)] <- "High Surf"
storm1993$evcat[grep("^HIGH WINDS$",storm1993$evcat)] <- "High Wind"
storm1993$evcat[grep("^HIGH WIND$",storm1993$evcat)] <- "High Wind"
storm1993$evcat[grep("^HURRICANE/TYPHOON$",storm1993$evcat)] <- "Hurricane (Typhoon)"
storm1993$evcat[grep("^HURRICANE$",storm1993$evcat)] <- "Hurricane (Typhoon)"
storm1993$evcat[grep("^ICE STORM$",storm1993$evcat)] <- "Ice Storm"
storm1993$evcat[grep("^LANDSLIDE$",storm1993$evcat)] <- "Other"
storm1993$evcat[grep("^LIGHTNING$",storm1993$evcat)] <- "Lightning"
storm1993$evcat[grep("^MARINE STRONG WIND$",storm1993$evcat)] <- "Marine Strong Wind"
storm1993$evcat[grep("^RECORD/EXCESSIVE HEAT$",storm1993$evcat)] <- "Excessive Heat"
storm1993$evcat[grep("^RIP CURRENTS$",storm1993$evcat)] <- "Rip Current"
storm1993$evcat[grep("^RIP CURRENT$",storm1993$evcat)] <- "Rip Current"
storm1993$evcat[grep("^STORM SURGE/TIDE$",storm1993$evcat)] <- "Storm Surge/Tide"
storm1993$evcat[grep("^STORM SURGE$",storm1993$evcat)] <- "Storm Surge/Tide"
storm1993$evcat[grep("^STRONG WIND$",storm1993$evcat)] <- "Strong Wind"
storm1993$evcat[grep("^THUNDERSTORM WINDS$",storm1993$evcat)] <- "Thunderstorm Wind"
storm1993$evcat[grep("^THUNDERSTORM WIND$",storm1993$evcat)] <- "Thunderstorm Wind"
storm1993$evcat[grep("^TORNADOES, TSTM WIND, HAIL$",storm1993$evcat)] <- "Tornado"
storm1993$evcat[grep("^TORNADO$",storm1993$evcat)] <- "Tornado"
storm1993$evcat[grep("^TROPICAL STORM$",storm1993$evcat)] <- "Tropical Storm"
storm1993$evcat[grep("^TSTM WIND$",storm1993$evcat)] <- "Thunderstorm Wind"
storm1993$evcat[grep("^TSUNAMI$",storm1993$evcat)] <- "Tsunami"
storm1993$evcat[grep("^UNSEASONABLY WARM AND DRY$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^UNSEASONABLY WARM$",storm1993$evcat)] <- "Heat"
storm1993$evcat[grep("^URBAN/SML STREAM FLD$",storm1993$evcat)] <- "Lakeshore Flood"
storm1993$evcat[grep("^WILD/FOREST FIRE$",storm1993$evcat)] <- "Wildfire"
storm1993$evcat[grep("^WILDFIRE$",storm1993$evcat)] <- "Wildfire"
storm1993$evcat[grep("^WINTER STORM$",storm1993$evcat)] <- "Winter Storm"
storm1993$evcat[grep("^WINTER WEATHER/MIX$",storm1993$evcat)] <- "Winter Weather"
storm1993$evcat[grep("^WINTER WEATHER$",storm1993$evcat)] <- "Winter Weather"
storm1993$evcat[grep("^WIND$",storm1993$evcat)] <- "Strong Wind"
# Now, for all other Event Types which are not mapped, we will reclassify to "Other"
match <- (storm1993$evcat == storm1993$EVTYPE)
storm1993$evcat[match] <- "Other"
Also, we calculated total damage, by summing up property and crop damages. We interpreted the “exp” columns as expontents to the 10 power.
## Convert symbols to 0, B = billion, K = thousand, M = million, H = hundred. Do this for Property.
storm1993$PROPDMGEXP <- gsub("+",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$PROPDMGEXP <- gsub("?",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$PROPDMGEXP <- gsub("-",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$PROPDMGEXP <- gsub("B",9,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$PROPDMGEXP <- gsub("K",3,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$PROPDMGEXP <- gsub("M",6,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$PROPDMGEXP <- gsub("H",2,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$PROPDMGEXP <- as.integer(storm1993$PROPDMGEXP)
storm1993$PROPDMGEXP <- as.integer(storm1993$PROPDMGEXP)
storm1993$PROPDMGEXP[is.na(storm1993$PROPDMGEXP)] <- 0
## And do same for Crop Exponent
storm1993$CROPDMGEXP <- gsub("+",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$CROPDMGEXP <- gsub("?",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$CROPDMGEXP <- gsub("-",0,storm1993$PROPDMGEXP,fixed=TRUE)
storm1993$CROPDMGEXP <- gsub("B",9,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$CROPDMGEXP <- gsub("K",3,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$CROPDMGEXP <- gsub("M",6,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$CROPDMGEXP <- gsub("H",2,storm1993$PROPDMGEXP,ignore.case=TRUE)
storm1993$CROPDMGEXP <- as.integer(storm1993$PROPDMGEXP)
storm1993$CROPDMGEXP <- as.integer(storm1993$PROPDMGEXP)
storm1993$CROPDMGEXP[is.na(storm1993$PROPDMGEXP)] <- 0
## Caculate total damages as sum of property and crops. Add as column to the dataset.
totdmg <- storm1993$PROPDMG*(10^storm1993$PROPDMGEXP) + storm1993$CROPDMG*(10^storm1993$CROPDMGEXP)
storm1993 <- cbind(storm1993,totdmg)
Now, lets take a look at Fatalities, Injuries and Total Damages by major Event Type.
Here is a bar chart of Fatalities by major Event Type. The Events on the left caused the most fatalities.
## Aggregate fatalities by major Event Type
fatalities <- setNames(aggregate(storm1993$FATALITIES,list(event=storm1993$evcat),sum),c("evcat","fatalities"))
## Order by fatalities descending
fatalities <- fatalities[order(fatalities$fatalities,decreasing=TRUE),]
## Plot fatalities by major Event Type
library(ggplot2)
g <- ggplot(fatalities,aes(x=evcat,y=fatalities)) + geom_bar(stat="identity") + scale_x_discrete(limits=fatalities$evcat) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(y="Fatalities",title="Fatalities by Event Type\n(Since 1993)",x="Major Event Types")
g
Here is a bar chart of Injuries by major Event Type. Note that we keep the Event Category in the same order as for Fatalities to ease comparison.
## Aggregate Injuries by major Event Type
injuries <- setNames(aggregate(storm1993$INJURIES,list(event=storm1993$evcat),sum),c("evcat","injuries"))
## Plot injuries by major Event Type.
q <- ggplot(injuries,aes(x=evcat,y=injuries)) + geom_bar(stat="identity") + scale_x_discrete(limits=fatalities$evcat) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(y="Injuries",title="Injuries by Event Type\n(Since 1993)",x="Major Event Types")
q
Here is a bar chart of Total Damages by major Event Type. Note that we keep the Event Category in the same order as for Fatalities and Injuriies to ease comparison. We see that Hurricane (Typhoon) events cause the most property and crop damage.
dmg <- setNames(aggregate(storm1993$totdmg,list(event=storm1993$evcat),sum),c("evcat","damages"))
dmg <- dmg[order(dmg$damages,decreasing=TRUE),]
k <- ggplot(dmg,aes(x=evcat,y=damages/10^9)) + geom_bar(stat="identity") + scale_x_discrete(limits=fatalities$evcat) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(y="Total Damages in $Billions",title="Property & Crop Damages by Event Type\n(Since 1993)",x="Major Event Types")
k