The following analysis uses NOAA data on the aftermath of severe weather events. The measurements are binned into common event types, and the total impact of each event type on human health (fatalities and injuries) and on the economy (property and crop damage) is calculated and ranked. Tornadoes are the event type most harmful to human health, and floods cause the most combined damage to property and crops.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Data was acquired from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2, and is described by the NOAA at the following locations:
The NOAA storm data is downloaded from this URL, unless a local copy already exists, and read into R.
# Create a local data directory and reuse a cached copy of the data when one exists
if (!file.exists("./data")) { dir.create("./data") }
if (file.exists("./data/repdata-data-StormData.csv")) {
  print("Data file is present")
  dataRaw <- read.csv("./data/repdata-data-StormData.csv", na.strings = c("", " ", "NA"))
} else if (file.exists("./data/repdata-data-StormData.csv.bz2")) {
  print("Data file is present")
  # read.csv() decompresses the .bz2 archive transparently
  dataRaw <- read.csv("./data/repdata-data-StormData.csv.bz2", na.strings = c("", " ", "NA"))
} else {
  print("Data file is downloading from internet")
  download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "./data/repdata-data-StormData.csv.bz2", method = "curl")
  dataRaw <- read.csv("./data/repdata-data-StormData.csv.bz2", na.strings = c("", " ", "NA"))
}
## [1] "Data file is present"
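The download-or-read logic above can also be wrapped in a reusable function. The sketch below is hypothetical: the name `loadStormData` and its arguments are not part of the original analysis. It relies on the same behaviour as the code above, namely that `read.csv()` reads a bzip2-compressed file directly because R's `file()` connection detects compressed input.

```r
# Hypothetical helper: reuse a local copy if present, otherwise download it first.
loadStormData <- function(url, destfile) {
  dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(destfile)) {
    message("Downloading ", url)
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url = url, destfile = destfile, mode = "wb")
  }
  # read.csv() decompresses .bz2 files transparently
  read.csv(destfile, na.strings = c("", " ", "NA"))
}
```

For example, `dataRaw <- loadStormData("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "./data/repdata-data-StormData.csv.bz2")` would download only on the first run. Dropping `method = "curl"` lets R pick its default download method, which avoids depending on an external `curl` binary.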
A quick look at the raw data shows a few things:
head(dataRaw[,c(1:8, 23:28)])
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 0 15 25.0 K 0 <NA>
## 2 0 0 2.5 K 0 <NA>
## 3 0 2 25.0 K 0 <NA>
## 4 0 2 2.5 K 0 <NA>
## 5 0 2 2.5 K 0 <NA>
## 6 0 6 2.5 K 0 <NA>
str(dataRaw$EVTYPE)
## Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
To count the impact of each event type accurately, the values in EVTYPE need to be mapped onto the uniform event types listed in the NOAA documentation, and each damage amount needs to combine its value (PROPDMG, CROPDMG) with its exponent code (PROPDMGEXP, CROPDMGEXP).
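Written as a formula, with $e(\cdot)$ denoting the mapping from an exponent code to a power of ten (in this analysis H → 2, K → 3, M → 6, B → 9, digit codes unchanged), the first record shown above (PROPDMG = 25.0, PROPDMGEXP = K) works out as:

$$
\text{PROPDMGFULL} = \text{PROPDMG} \times 10^{e(\text{PROPDMGEXP})} = 25.0 \times 10^{3} = 25{,}000
$$

The same formula applies to CROPDMG with CROPDMGEXP.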
Because this study focuses on health and economic impact, only the relevant columns of the raw data are kept.
impact <- dataRaw[, c("EVTYPE","FATALITIES","INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
The following conversion makes every DMGEXP code uppercase, replaces the letter codes with the corresponding power of ten (for example, B for billions becomes 9), treats missing and symbolic codes (-, ?, +) as 0, converts the codes to numbers, and uses them to compute the full damage value to property and crops.
# Property damage: normalise the exponent codes, then convert them to powers of ten
impact$PROPDMGEXP <- toupper(as.character(impact$PROPDMGEXP))
impact$PROPDMGEXP[is.na(impact$PROPDMGEXP)] <- "0"                # missing code: 10^0
impact$PROPDMGEXP[impact$PROPDMGEXP %in% c("-", "?", "+")] <- "0"  # symbolic codes: 10^0
impact$PROPDMGEXP[impact$PROPDMGEXP == "B"] <- "9"                 # billions
impact$PROPDMGEXP[impact$PROPDMGEXP == "M"] <- "6"                 # millions
impact$PROPDMGEXP[impact$PROPDMGEXP == "K"] <- "3"                 # thousands
impact$PROPDMGEXP[impact$PROPDMGEXP == "H"] <- "2"                 # hundreds
impact$PROPDMGEXP <- as.numeric(impact$PROPDMGEXP)                 # digit codes are kept as-is
impact$PROPDMGFULL <- impact$PROPDMG * 10^impact$PROPDMGEXP
# Crop damage: the same conversion applied to CROPDMGEXP
impact$CROPDMGEXP <- toupper(as.character(impact$CROPDMGEXP))
impact$CROPDMGEXP[is.na(impact$CROPDMGEXP)] <- "0"
impact$CROPDMGEXP[impact$CROPDMGEXP == "?"] <- "0"
impact$CROPDMGEXP[impact$CROPDMGEXP == "B"] <- "9"
impact$CROPDMGEXP[impact$CROPDMGEXP == "M"] <- "6"
impact$CROPDMGEXP[impact$CROPDMGEXP == "K"] <- "3"
impact$CROPDMGEXP <- as.numeric(impact$CROPDMGEXP)
impact$CROPDMGFULL <- impact$CROPDMG * 10^impact$CROPDMGEXP
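The same mapping can be kept in one place with a named lookup vector instead of a chain of replacements. This is a swapped-in alternative, not the method used above, and it differs in one respect: any unrecognised code falls back to $10^0$ rather than producing `NA`. The function name `lookupPower` and the toy values are hypothetical and do not come from the NOAA file.

```r
# Powers of ten for the letter codes
letterPower <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: letter codes use the lookup table, single digits are
# used as-is, and anything else (NA, "-", "?", "+") falls back to 10^0.
lookupPower <- function(code) {
  code <- toupper(as.character(code))
  power <- unname(letterPower[code])            # NA where code is not a letter code
  isDigit <- !is.na(code) & grepl("^[0-9]$", code)
  power[isDigit] <- as.numeric(code[isDigit])
  power[is.na(power)] <- 0
  power
}

# Hypothetical toy values
toyDmg <- c(25, 2.5, 1.5, 3, 10)
toyExp <- c("K", "k", "M", "5", NA)
toyFull <- toyDmg * 10^lookupPower(toyExp)
toyFull
```

Adding a new code then means adding one entry to `letterPower` rather than another assignment line for each damage column.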
The event types are mainly entered in a uniform manner, but there are many that contain variations such as:
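To describe the data more accurately, the EVTYPE values were grouped into uniform categories based on common substrings. The rules are applied in order: because the new labels are in mixed case and every pattern is upper case, an entry that has already been relabelled is not matched again, so more specific patterns (such as MARINETHUN) must come before more general ones (such as THUN).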
#Convert EVTYPEs to uniform codes
impact$EVTYPE <- toupper(impact$EVTYPE)
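# Remove all spaces so that multi-word patterns such as "MARINETHUN" can match
impact$EVTYPE <- gsub(" ", "", impact$EVTYPE, fixed = TRUE)
# Expand the common abbreviation TSTM before matching thunderstorm patterns
impact$EVTYPE <- gsub("TSTM", "THUNDERSTORM", impact$EVTYPE)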
impact$EVTYPE[grepl("LIGHTNING", x=impact$EVTYPE)] <- "Lightning"
impact$EVTYPE[grepl("LIGHTING", x=impact$EVTYPE)] <- "Lightning"
impact$EVTYPE[grepl("MARINETHUN", x=impact$EVTYPE)] <- "Marine Thunderstorm Winds"
impact$EVTYPE[grepl("THUN", x=impact$EVTYPE)] <- "Thunderstorm Winds"
impact$EVTYPE[grepl("TUNDER", x=impact$EVTYPE)] <- "Thunderstorm Winds"
impact$EVTYPE[grepl("THUD", x=impact$EVTYPE)] <- "Thunderstorm Winds"
impact$EVTYPE[grepl("FLASH", x=impact$EVTYPE)] <- "Flash Flood"
impact$EVTYPE[grepl("FIRE", x=impact$EVTYPE)] <- "Wildfire"
impact$EVTYPE[grepl("VOLC", x=impact$EVTYPE)] <- "Volcanic Ash"
impact$EVTYPE[grepl("MARINEHAIL", x=impact$EVTYPE)] <- "Marine Hail"
impact$EVTYPE[grepl("MARINESTRONG", x=impact$EVTYPE)] <- "Marine Strong Wind"
impact$EVTYPE[grepl("MARINEHIGH", x=impact$EVTYPE)] <- "Marine High Wind"
impact$EVTYPE[grepl("HAIL", x=impact$EVTYPE)] <- "Hail"
impact$EVTYPE[grepl("WATERSP", x=impact$EVTYPE)] <- "Waterspout"
impact$EVTYPE[grepl("SPOUT", x=impact$EVTYPE)] <- "Waterspout"
impact$EVTYPE[grepl("SLIDE", x=impact$EVTYPE)] <- "Debris Flow"
impact$EVTYPE[grepl("STREAM", x=impact$EVTYPE)] <- "Flood"
impact$EVTYPE[grepl("URBAN", x=impact$EVTYPE)] <- "Flood"
impact$EVTYPE[grepl("DRY", x=impact$EVTYPE)] <- "Drought"
impact$EVTYPE[grepl("DROUGHT", x=impact$EVTYPE)] <- "Drought"
impact$EVTYPE[grepl("DUST", x=impact$EVTYPE)] <- "Dust Storm"
impact$EVTYPE[grepl("RIP", x=impact$EVTYPE)] <- "Rip Current"
impact$EVTYPE[grepl("AVA", x=impact$EVTYPE)] <- "Avalanche"
impact$EVTYPE[grepl("EXCESSIVEHEAT", x=impact$EVTYPE)] <- "Excessive Heat"
impact$EVTYPE[grepl("HEAT", x=impact$EVTYPE)] <- "Heat"
impact$EVTYPE[grepl("LOWTI", x=impact$EVTYPE)] <- "Astronomical Low Tide"
impact$EVTYPE[grepl("EXT", x=impact$EVTYPE)] <- "Extreme Cold/Wind Chill"
impact$EVTYPE[grepl("EXCESSIVECOLD", x=impact$EVTYPE)] <- "Extreme Cold/Wind Chill"
impact$EVTYPE[grepl("COLD", x=impact$EVTYPE)] <- "Cold/Wind Chill"
impact$EVTYPE[grepl("CHILL", x=impact$EVTYPE)] <- "Cold/Wind Chill"
impact$EVTYPE[grepl("TORN", x=impact$EVTYPE)] <- "Tornado"
impact$EVTYPE[grepl("TROPICALSTORM", x=impact$EVTYPE)] <- "Tropical Storm"
impact$EVTYPE[grepl("TROPICALDEP", x=impact$EVTYPE)] <- "Tropical Depression"
impact$EVTYPE[grepl("SLEET", x=impact$EVTYPE)] <- "Sleet"
impact$EVTYPE[grepl("HURRICANE", x=impact$EVTYPE)] <- "Hurricane (Typhoon)"
impact$EVTYPE[grepl("TYPH", x=impact$EVTYPE)] <- "Hurricane (Typhoon)"
impact$EVTYPE[grepl("BLIZ", x=impact$EVTYPE)] <- "Blizzard"
impact$EVTYPE[grepl("COASTALFL", x=impact$EVTYPE)] <- "Coastal Flood"
impact$EVTYPE[grepl("CSTL", x=impact$EVTYPE)] <- "Coastal Flood"
impact$EVTYPE[grepl("SURGE", x=impact$EVTYPE)] <- "Storm Surge/Tide"
impact$EVTYPE[grepl("TIDAL", x=impact$EVTYPE)] <- "Storm Surge/Tide"
impact$EVTYPE[grepl("HIGHTIDE", x=impact$EVTYPE)] <- "Storm Surge/Tide"
impact$EVTYPE[grepl("LAKESHOREFLOOD", x=impact$EVTYPE)] <- "Lakeshore Flood"
impact$EVTYPE[grepl("LAKEFLOOD", x=impact$EVTYPE)] <- "Lakeshore Flood"
impact$EVTYPE[grepl("HEAVYRAIN", x=impact$EVTYPE)] <- "Heavy Rain"
impact$EVTYPE[grepl("ICESTO", x=impact$EVTYPE)] <- "Ice Storm"
impact$EVTYPE[grepl("FLOOD", x=impact$EVTYPE)] <- "Flood"
impact$EVTYPE[grepl("ICE", x=impact$EVTYPE)] <- "Ice Storm"
impact$EVTYPE[grepl("WINTERSTO", x=impact$EVTYPE)] <- "Winter Storm"
impact$EVTYPE[grepl("WINTERWEAT", x=impact$EVTYPE)] <- "Winter Weather"
impact$EVTYPE[grepl("TSUNA", x=impact$EVTYPE)] <- "Tsunami"
impact$EVTYPE[grepl("LAKE", x=impact$EVTYPE)] <- "Lake-Effect Snow"
impact$EVTYPE[grepl("WIND", x=impact$EVTYPE)] <- "Wind"
impact$EVTYPE[grepl("HEAVYSNOW", x=impact$EVTYPE)] <- "Heavy Snow"
impact$EVTYPE[grepl("FREEZING", x=impact$EVTYPE)] <- "Sleet"
impact$EVTYPE[grepl("FREEZ", x=impact$EVTYPE)] <- "Frost/Freeze"
impact$EVTYPE[grepl("FROST", x=impact$EVTYPE)] <- "Frost/Freeze"
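The ordered grouping above can be sketched as a rule table where the first matching pattern wins. This makes the priority explicit instead of relying on mixed-case labels not matching upper-case patterns. The rule table below is a hypothetical, abbreviated subset of the patterns above, and `relabelEvents` is an illustrative name, not part of the original analysis.

```r
# Hypothetical, abbreviated rule table in priority order (specific patterns first)
rules <- data.frame(
  pattern = c("MARINETHUN", "THUN", "FLASH", "FLOOD", "WIND"),
  label   = c("Marine Thunderstorm Winds", "Thunderstorm Winds",
              "Flash Flood", "Flood", "Wind"),
  stringsAsFactors = FALSE
)

# Normalise the raw strings as above, then give each entry the label of the
# first rule whose pattern it contains.
relabelEvents <- function(evtype, rules) {
  ev <- gsub(" ", "", toupper(evtype), fixed = TRUE)
  ev <- gsub("TSTM", "THUNDERSTORM", ev, fixed = TRUE)
  out <- ev
  matched <- rep(FALSE, length(ev))
  for (i in seq_len(nrow(rules))) {
    hit <- !matched & grepl(rules$pattern[i], ev, fixed = TRUE)
    out[hit] <- rules$label[i]
    matched <- matched | hit
  }
  out  # entries matching no rule keep their normalised upper-case text
}

rawTypes <- c("TSTM WIND", "Marine tstm wind", "FLASH FLOODING",
              "HIGH WINDS", "urban flood", "HAIL")
relabelEvents(rawTypes, rules)
```

Here "Marine tstm wind" becomes "Marine Thunderstorm Winds" because MARINETHUN is listed before THUN, and "HAIL" is left unchanged because this abbreviated table has no hail rule.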
The first analysis looks at harm to people in the form of fatalities and injuries. Fatalities and injuries are summed for each event type and stored in a new data frame along with their total, and the 10 event types with the most fatalities are kept. The three harm columns are then melted into long format, with a single harmType factor identifying the measure (fatalities, injuries, or total).
eventHarm <- aggregate(FATALITIES~EVTYPE,impact, FUN=sum)
eventHarm <- merge(eventHarm, aggregate(INJURIES~EVTYPE,impact, FUN=sum))
eventHarm$TotalHarmed <- eventHarm$FATALITIES + eventHarm$INJURIES
eventHarm <- eventHarm[order(-eventHarm$FATALITIES)[1:10],]
library(reshape2)
Harm <- melt(eventHarm, id.vars="EVTYPE", variable.name = "harmType")
eventHarm
## EVTYPE FATALITIES INJURIES TotalHarmed
## 249 Tornado 5633 91364 96997
## 36 Excessive Heat 1920 6525 8445
## 58 Heat 1212 2684 3896
## 44 Flash Flood 1035 1802 2837
## 99 Lightning 817 5232 6049
## 248 Thunderstorm Winds 737 9510 10247
## 159 Rip Current 577 529 1106
## 45 Flood 511 6873 7384
## 283 Wind 441 1910 2351
## 42 Extreme Cold/Wind Chill 305 260 565
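The two `aggregate()` calls and the `merge()` can be combined into a single call by putting both columns on the left-hand side of the formula with `cbind()`. The sketch below uses hypothetical toy data standing in for `impact`, and selects the top rows with `head()`.

```r
# Hypothetical toy data standing in for the `impact` data frame
toy <- data.frame(
  EVTYPE     = c("Tornado", "Tornado", "Heat", "Flood", "Heat"),
  FATALITIES = c(2, 1, 4, 0, 1),
  INJURIES   = c(10, 5, 3, 7, 0),
  stringsAsFactors = FALSE
)

# Sum both harm columns per event type in one aggregate() call
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
harm$TotalHarmed <- harm$FATALITIES + harm$INJURIES

# head() keeps at most n rows, whereas order(...)[1:n] pads the result with
# NA rows when there are fewer than n event types
topHarm <- head(harm[order(-harm$FATALITIES), ], 2)
topHarm
```

With the full data both approaches give the same totals; `head()` only matters when fewer than 10 event types remain.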
The second analysis looks at the monetary damage to property and crops. Property and crop damages are summed for each event type and stored in a new data frame along with their total, and the 10 event types with the largest property damage are kept. The three damage columns are then melted into long format, with a single damageType factor identifying the measure (property, crop, or total).
eventDamage <- aggregate(PROPDMGFULL~EVTYPE,impact, FUN=sum)
eventDamage <- merge(eventDamage, aggregate(CROPDMGFULL~EVTYPE,impact, FUN=sum))
eventDamage$TotalDamage <- eventDamage$PROPDMGFULL + eventDamage$CROPDMGFULL
eventDamage <- eventDamage[order(-eventDamage$PROPDMGFULL)[1:10],]
Damage <- melt(eventDamage, id.vars="EVTYPE", variable.name = "damageType")
eventDamage
## EVTYPE PROPDMGFULL CROPDMGFULL TotalDamage
## 45 Flood 150208839377 10855961050 161064800427
## 83 Hurricane (Typhoon) 85356410010 5516117800 90872527810
## 249 Tornado 56952152376 414961470 57367113846
## 180 Storm Surge/Tide 47974662150 855000 47975517150
## 44 Flash Flood 17589312096 1532197150 19121509246
## 56 Hail 15977560513 3046887623 19024448136
## 248 Thunderstorm Winds 12779403800 1274158988 14053562788
## 282 Wildfire 8496628500 403281630 8899910130
## 253 Tropical Storm 7714390550 694896000 8409286550
## 284 Winter Storm 6749497251 32444000 6781941251
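The reshaping done by `melt()` can also be expressed in base R with `stack()`, which avoids the reshape2 dependency. This is a swapped-in technique, not the one used above. The two rows below are copied from the table above, and the names `wide`, `measureCols` and `long` are illustrative.

```r
# Two rows copied from the eventDamage table above (US dollars)
wide <- data.frame(
  EVTYPE      = c("Flood", "Tornado"),
  PROPDMGFULL = c(150208839377, 56952152376),
  CROPDMGFULL = c(10855961050, 414961470),
  stringsAsFactors = FALSE
)
wide$TotalDamage <- wide$PROPDMGFULL + wide$CROPDMGFULL

# stack() turns the measure columns into a `values` column plus an `ind`
# factor naming the source column; it stacks column by column, so EVTYPE is
# repeated once per measure column to line up with the stacked rows.
measureCols <- c("PROPDMGFULL", "CROPDMGFULL", "TotalDamage")
long <- data.frame(
  EVTYPE = rep(wide$EVTYPE, times = length(measureCols)),
  stack(wide[, measureCols])
)
long
```

The result has the same shape as the melted `Damage` data frame, with `values` and `ind` in place of `value` and `damageType`.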
The measure names are relabelled so that the figure legends are easier to read.
Harm$harmType <- gsub("FATALITIES", "Fatalities", Harm$harmType)
Harm$harmType <- gsub("INJURIES", "Injuries", Harm$harmType)
Harm$harmType <- gsub("TotalHarm", "Total Harm", Harm$harmType)
Damage$damageType <- gsub("PROPDMGFULL", "Property Damage", Damage$damageType)
Damage$damageType <- gsub("CROPDMGFULL", "Crop Damage", Damage$damageType)
Damage$damageType <- gsub("TotalDamage", "Total Damage", Damage$damageType)
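The same relabelling can be done in one step with `factor()` and its `labels` argument. Unlike `gsub()`, which returns a character vector that ggplot2 then orders alphabetically (placing "Crop Damage" before "Property Damage"), the `levels` argument also fixes the legend order. The vector below is a hypothetical stand-in for the melted `harmType` column.

```r
# Hypothetical stand-in for the melted harmType column
harmType <- c("FATALITIES", "INJURIES", "TotalHarmed", "FATALITIES")

# Map each raw column name to a readable label; the level order becomes the legend order
harmLabel <- factor(harmType,
                    levels = c("FATALITIES", "INJURIES", "TotalHarmed"),
                    labels = c("Fatalities", "Injuries", "Total Harmed"))
harmLabel
```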
The plot below shows the impact on human health for the 10 event types with the most fatalities, broken down into fatalities, injuries, and their sum. Tornadoes cause by far the most harm to people.
library(ggplot2)
ggplot(Harm, aes(x=reorder(EVTYPE, -value), y=value)) +
geom_bar(stat="identity", aes(fill=harmType), position="dodge") +
xlab("Disaster Type") +
ylab("Number of People") +
ggtitle("Top 10 Disaster Types with the Most Fatalities") +
theme(axis.text.x = element_text(angle=45, hjust=1))
The plot below shows the economic impact for the 10 event types with the most property damage, broken down into property damage, crop damage, and their sum. Floods cause the most property damage.
ggplot(Damage, aes(x=reorder(EVTYPE, -value), y=value)) +
geom_bar(stat="identity", aes(fill=damageType), position="dodge") +
xlab("Disaster Type") +
ylab("Damage Caused ($)") +
ggtitle("Top 10 Disaster Types with the Most Property Damage") +
theme(axis.text.x = element_text(angle=45, hjust=1))
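Because the damage totals reach hundreds of billions of dollars, the y-axis labels are easier to read when expressed in billions. The formatter below is a hypothetical sketch (the name `billionLabel` is illustrative). ggplot2's `scale_y_continuous()` accepts a function for its `labels` argument, so it could be added to the plot above as `+ scale_y_continuous(labels = billionLabel)`, with the axis title changed to match.

```r
# Hypothetical label formatter: show dollar amounts in whole billions
billionLabel <- function(x) {
  paste0("$", formatC(x / 1e9, format = "f", digits = 0), "B")
}

billionLabel(c(0, 5e10, 1.5e11))
```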