Severe weather events such as storms, hurricanes, lightning, tornadoes, and flooding can have a significant economic impact through damage to property and crops. They can also cause significant loss of life and injury.
This project explores the United States National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks a number of characteristics of major weather events in the United States.
The NOAA storm database contains records from 1950 through 2011. Records indicate the event type, location data, date and time, property damage amounts, and crop damage amounts.
Understanding the economic and public health impacts of weather-related events makes it possible to allocate resources more effectively to reduce their negative impacts.
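Research Questions
This analysis addresses two questions:
1. Across the United States, which types of weather events are most harmful to population health (fatalities and injuries)?
2. Across the United States, which types of weather events have the greatest economic consequences (property and crop damage)?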
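Download the data file if it does not already exist. The ./data directory is created first so that download.file() has somewhere to write.
if (!dir.exists("./data")) {
  dir.create("./data")
}
if (!file.exists("./data/StormData.csv.bz2")) {
  URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(URL, destfile = "./data/StormData.csv.bz2", method = "curl")
}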
Read in the input dataset if it is not already in memory.
if (!"storm_data" %in% ls()) {
storm_data <- read.csv(bzfile("./data/StormData.csv.bz2"))
}
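As a sketch of an alternative to reading every column and subsetting afterward (an assumption, not the approach used in this report), read.csv() can skip unneeded columns at read time by passing colClasses = "NULL" for them. read.csv() can open the .bz2 file directly because R's file() connection detects bzip2 compression when reading. The toy file, its columns, and the names keep, header, and col_classes are hypothetical; the sketch writes a tiny compressed CSV with bzfile() so it runs on its own.

```r
# Write a tiny hypothetical bzip2-compressed CSV so the sketch is self-contained
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
writeLines(c("STATE,EVTYPE,FATALITIES,INJURIES,REMARKS",
             "AL,TORNADO,0,15,none",
             "TX,FLOOD,1,0,none"), con)
close(con)

# Columns to keep (hypothetical subset of the real column list)
keep <- c("EVTYPE", "FATALITIES", "INJURIES")

# Read only the header row to learn the column order
header <- names(read.csv(path, nrows = 1))

# NA lets read.csv() guess the type; "NULL" skips the column entirely
col_classes <- ifelse(header %in% keep, NA, "NULL")

slim <- read.csv(path, colClasses = col_classes)
str(slim)
```

On the full StormData file this avoids holding the unused columns in memory at all, at the cost of a second (cheap) read of the header.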
Preprocessing the data:
To reduce memory use and improve speed, select only the columns relevant to this investigation:
1. EVTYPE - the event type
2. FATALITIES - the number of fatalities resulting from the event
3. INJURIES - the number of injuries resulting from the event
4. PROPDMG - the property damage amount resulting from the event
5. PROPDMGEXP - a character code that identifies a numeric multiplier to apply to the PROPDMG value
6. CROPDMG - the crop damage amount resulting from the event
7. CROPDMGEXP - a character code that identifies a numeric multiplier to apply to the CROPDMG value
These columns are later used to create two data frames:
1. One containing any events with fatalities or injuries
2. One containing any events with property damage or crop damage
storm_data_cols <- storm_data[,c("EVTYPE", "FATALITIES","INJURIES","PROPDMG",
"PROPDMGEXP","CROPDMG","CROPDMGEXP")]
## Get rid of any punctuation in the EVTYPE field
storm_data_cols$EVTYPE <- gsub("[[:punct:]]"," ", storm_data_cols$EVTYPE)
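To see what the punctuation-stripping step does, the sketch below applies the same gsub() call to a few hypothetical EVTYPE strings. The extra whitespace normalization (collapsing repeated spaces with gsub("\\s+", " ", ...) and trimming with trimws()) is an assumption added for illustration and is not part of the original analysis.

```r
# Hypothetical EVTYPE values, not drawn from the dataset
evtype <- c("TSTM WIND/HAIL", "FLASH FLOOD/FLOOD",
            "THUNDERSTORM WINDS.", "HIGH WIND (G40)")

# Same call as in the report: every punctuation character becomes a space
cleaned <- gsub("[[:punct:]]", " ", evtype)

# Assumed extra step: collapse runs of spaces and trim the ends so that
# variants such as "THUNDERSTORM WINDS." and "THUNDERSTORM WINDS" compare equal
normalized <- trimws(gsub("\\s+", " ", cleaned))
print(normalized)
```

The grep() patterns used for consolidation are unanchored, so stray spaces do not affect them; the normalization mainly matters for the EVTYPEs that are left as-is.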
The data frame contains 985 unique EVTYPEs. Consolidating them reduces the count to 37. The consolidation choices were based on the descriptions of the various events and also account for a number of spelling errors.
Of these 37 values, 27 are consolidated categories, which account for the vast majority of the damages and deaths. The remaining 10 are original EVTYPEs that were left as-is because their impact on the valuation was minimal.
storm_data_cols$EVTYPE[grep("tstm|thunder", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "THUNDERSTORM"
storm_data_cols$EVTYPE[grep("hail", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "HAIL"
storm_data_cols$EVTYPE[grep("blizzard|snow", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "SNOW"
storm_data_cols$EVTYPE[grep("flood|fld|high water|floood|rapidly rising water",
storm_data_cols$EVTYPE, ignore.case = TRUE)] <- "FLOOD"
storm_data_cols$EVTYPE[grep("wind|downburst", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "WIND"
storm_data_cols$EVTYPE[grep("rain|precip|shower|wetness", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "RAIN"
storm_data_cols$EVTYPE[grep("hurricane", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "HURRICANE"
storm_data_cols$EVTYPE[grep("light|LIGNTNING", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "LIGHTNING"
storm_data_cols$EVTYPE[grep("tornado|funnel|gustnado|torndao|landspout", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "TORNADO"
storm_data_cols$EVTYPE[grep("tropical", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "TROPICAL STORM"
storm_data_cols$EVTYPE[grep("extreme|unseasonab|excessive|heat wave", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "EXTREME TEMP"
storm_data_cols$EVTYPE[grep("fire", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "FIRE"
storm_data_cols$EVTYPE[grep("slide|slump", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "LANDSLIDE"
storm_data_cols$EVTYPE[grep("freeze", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "FREEZE"
storm_data_cols$EVTYPE[grep("winter|wintry|mix|freezing|frost", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "WINTER MIX"
storm_data_cols$EVTYPE[grep("ice|icy|glaze", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "ICE CONDITIONS"
storm_data_cols$EVTYPE[grep("surf|swell|tide|surge|beach|current|seiche|high seas|high waves|rogue wave",
storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "SURF/SURGE"
storm_data_cols$EVTYPE[grep("microb", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "MICROBURST"
storm_data_cols$EVTYPE[grep("drought", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "DROUGHT"
storm_data_cols$EVTYPE[grep("cold", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "COLD"
storm_data_cols$EVTYPE[grep("coastal", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "COASTAL EVENT"
storm_data_cols$EVTYPE[grep("waterspout|water spout|wayterspout", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "WATERSPOUT"
storm_data_cols$EVTYPE[grep("dust", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "DUST STORM"
storm_data_cols$EVTYPE[grep("fog", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "FOG"
storm_data_cols$EVTYPE[grep("heat|hot weather|hot spell|high temp|hot and dry|hot dry|hot pattern|record high|record temp",
storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "HEAT WAVE"
storm_data_cols$EVTYPE[grep("other", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "OTHER"
storm_data_cols$EVTYPE[grep("urban", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "URBAN STREAM"
storm_data_cols$EVTYPE[grep("volcanic", storm_data_cols$EVTYPE,
ignore.case = TRUE)] <- "VOLCANIC"
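The sequence of grep() assignments above can be sketched as a table-driven loop. The rules data frame and the consolidate_evtype() function are hypothetical names, and only the first five rules are reproduced here. Rules are applied in the same order as the assignments, so a later rule can only relabel values that still match its pattern.

```r
# Hypothetical subset of the report's pattern -> label rules, in the same order
rules <- data.frame(
  pattern = c("tstm|thunder", "hail", "blizzard|snow",
              "flood|fld|high water", "wind|downburst"),
  label   = c("THUNDERSTORM", "HAIL", "SNOW", "FLOOD", "WIND"),
  stringsAsFactors = FALSE
)

# Apply each rule in turn, overwriting every value whose text matches it
consolidate_evtype <- function(evtype, rules) {
  for (i in seq_len(nrow(rules))) {
    hits <- grepl(rules$pattern[i], evtype, ignore.case = TRUE)
    evtype[hits] <- rules$label[i]
  }
  evtype
}

evtype <- c("TSTM WIND", "HAIL 1 75", "HEAVY SNOW",
            "FLASH FLOOD", "HIGH WIND", "DENSE FOG")
consolidate_evtype(evtype, rules)
```

For example, "TSTM WIND" becomes THUNDERSTORM rather than WIND, because the thunderstorm rule runs first and the new label no longer contains "wind"; "DENSE FOG" matches no rule and is left unchanged. With the full rule list, storm_data_cols$EVTYPE <- consolidate_evtype(storm_data_cols$EVTYPE, rules) followed by length(unique(storm_data_cols$EVTYPE)) could be used to check the consolidated count.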
Two data frames are created. The first contains all records with fatalities or injuries (storm_data_cols_fatalities_injuries). The second contains all records with property or crop damage (storm_data_cols_damages).
# Create two data frames: one that includes fatalities & injuries > 0 and one where
# damages are > 0
storm_data_cols_fatalities_injuries <- subset(storm_data_cols,
FATALITIES > 0 | INJURIES > 0)
storm_data_cols_damages <- subset(storm_data_cols, PROPDMG > 0 | CROPDMG > 0)
The PROPDMGEXP field has 19 distinct values and CROPDMGEXP has 9. Each PROPDMGEXP/CROPDMGEXP value encodes a numeric multiplier to apply to the corresponding PROPDMG/CROPDMG value. Some of the less obvious codes were interpreted using the publication "How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP".
URL: https://rstudio-pubs-static.s3.amazonaws.com/58957_37b6723ee52b455990e149edde45e5b6.html
For PROPDMGEXP/CROPDMGEXP, the multipliers applied in the code below are:
H, h: 100
K, k: 1,000
M, m: 1,000,000
B, b: 1,000,000,000
+: 1
0 through 9: 10
-, ?, and blank: 0
Create three new numeric columns for the multipliers and the total damages:
storm_data_cols_damages["PROPDMG_MULTIPLIER"] <- 0
storm_data_cols_damages["CROPDMG_MULTIPLIER"] <- 0
storm_data_cols_damages["TOTAL_DAMAGES"] <- 0
# Coerce PROPDMGEXP and CROPDMGEXP to character
storm_data_cols_damages$PROPDMGEXP <- as.character(storm_data_cols_damages$PROPDMGEXP)
storm_data_cols_damages$CROPDMGEXP <- as.character(storm_data_cols_damages$CROPDMGEXP)
# Assign the property damage multiplier for each PROPDMGEXP code.
# "-", "?" and blank codes keep a multiplier of 0.
storm_data_cols_damages$PROPDMG_MULTIPLIER[grep("\\?|\\-",
    storm_data_cols_damages$PROPDMGEXP)] <- 0
storm_data_cols_damages$PROPDMG_MULTIPLIER[which(
    storm_data_cols_damages$PROPDMGEXP == "")] <- 0
storm_data_cols_damages$PROPDMG_MULTIPLIER[grep("\\+",
    storm_data_cols_damages$PROPDMGEXP)] <- 1
storm_data_cols_damages$PROPDMG_MULTIPLIER[grep("[0-9]",
    storm_data_cols_damages$PROPDMGEXP)] <- 10
storm_data_cols_damages$PROPDMG_MULTIPLIER[grep("H|h",
    storm_data_cols_damages$PROPDMGEXP)] <- 100
storm_data_cols_damages$PROPDMG_MULTIPLIER[grep("K|k",
    storm_data_cols_damages$PROPDMGEXP)] <- 1000
storm_data_cols_damages$PROPDMG_MULTIPLIER[grep("M|m",
    storm_data_cols_damages$PROPDMGEXP)] <- 1000000
storm_data_cols_damages$PROPDMG_MULTIPLIER[grep("B|b",
    storm_data_cols_damages$PROPDMGEXP)] <- 1000000000
# Assign the crop damage multiplier for each CROPDMGEXP code.
# "-", "?" and blank codes keep a multiplier of 0.
storm_data_cols_damages$CROPDMG_MULTIPLIER[grep("\\?|\\-",
    storm_data_cols_damages$CROPDMGEXP)] <- 0
storm_data_cols_damages$CROPDMG_MULTIPLIER[which(
    storm_data_cols_damages$CROPDMGEXP == "")] <- 0
storm_data_cols_damages$CROPDMG_MULTIPLIER[grep("\\+",
    storm_data_cols_damages$CROPDMGEXP)] <- 1
storm_data_cols_damages$CROPDMG_MULTIPLIER[grep("[0-9]",
    storm_data_cols_damages$CROPDMGEXP)] <- 10
storm_data_cols_damages$CROPDMG_MULTIPLIER[grep("H|h",
    storm_data_cols_damages$CROPDMGEXP)] <- 100
storm_data_cols_damages$CROPDMG_MULTIPLIER[grep("K|k",
    storm_data_cols_damages$CROPDMGEXP)] <- 1000
storm_data_cols_damages$CROPDMG_MULTIPLIER[grep("M|m",
    storm_data_cols_damages$CROPDMGEXP)] <- 1000000
storm_data_cols_damages$CROPDMG_MULTIPLIER[grep("B|b",
    storm_data_cols_damages$CROPDMGEXP)] <- 1000000000
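The grep()-based multiplier assignments above can also be written as a lookup on a named vector. This is a sketch assuming each exponent code is a single character; exp_to_multiplier() is a hypothetical helper, and it uses exact matching rather than the substring matching of grep(), which gives the same result for single-character codes and for blanks.

```r
# Hypothetical helper: map PROPDMGEXP/CROPDMGEXP codes to the multipliers used
# in this report (H = 100, K = 1,000, M = 1,000,000, B = 1,000,000,000,
# "+" = 1, digits 0-9 = 10, anything else = 0)
exp_to_multiplier <- function(exp_code) {
  exp_code <- toupper(trimws(as.character(exp_code)))
  lookup <- c(H = 100, K = 1000, M = 1e6, B = 1e9, "+" = 1)
  # Names not found in lookup (including "") give NA
  mult <- unname(lookup[exp_code])
  # Single digits map to 10
  mult[grepl("^[0-9]$", exp_code)] <- 10
  # "-", "?", blanks, and anything unrecognized map to 0
  mult[is.na(mult)] <- 0
  mult
}

exp_to_multiplier(c("K", "m", "B", "h", "+", "5", "-", "?", ""))
```

In the report this could replace both blocks with storm_data_cols_damages$PROPDMG_MULTIPLIER <- exp_to_multiplier(storm_data_cols_damages$PROPDMGEXP) and the corresponding CROPDMGEXP call, keeping the whole mapping in one place.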
Total damages are the property damage value multiplied by its multiplier plus the crop damage value multiplied by its multiplier:
TOTAL_DAMAGES = PROPDMG * PROPDMG_MULTIPLIER + CROPDMG * CROPDMG_MULTIPLIER
# Calculate the total damages for crop and property damages
storm_data_cols_damages$TOTAL_DAMAGES <- (storm_data_cols_damages$PROPDMG * storm_data_cols_damages$PROPDMG_MULTIPLIER) +
(storm_data_cols_damages$CROPDMG * storm_data_cols_damages$CROPDMG_MULTIPLIER)
Sum TOTAL_DAMAGES for each EVTYPE using aggregate().
total_damages_summary <- aggregate(TOTAL_DAMAGES ~ EVTYPE, storm_data_cols_damages, sum)
Sort the damage totals in descending order to find the most costly events.
damages_sorted <- total_damages_summary[order(-total_damages_summary$TOTAL_DAMAGES),]
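A worked example of the total-damages formula, the aggregate() step, and the descending sort, using hypothetical toy rows whose multipliers are already resolved (the data frame toy_damages and its values are made up for illustration):

```r
# Hypothetical toy rows with multipliers already assigned
toy_damages <- data.frame(
  EVTYPE             = c("FLOOD", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG            = c(25, 1.5, 250, 10),
  PROPDMG_MULTIPLIER = c(1000, 1e6, 1000, 1000),
  CROPDMG            = c(2, 0, 0, 5),
  CROPDMG_MULTIPLIER = c(1e6, 0, 0, 1000),
  stringsAsFactors = FALSE
)

# Same formula as the report: property damage plus crop damage, each scaled
toy_damages$TOTAL_DAMAGES <-
  toy_damages$PROPDMG * toy_damages$PROPDMG_MULTIPLIER +
  toy_damages$CROPDMG * toy_damages$CROPDMG_MULTIPLIER

# Sum per event type, then sort in descending order of total damages
toy_summary <- aggregate(TOTAL_DAMAGES ~ EVTYPE, toy_damages, sum)
toy_sorted <- toy_summary[order(-toy_summary$TOTAL_DAMAGES), ]
print(toy_sorted)
```

The first FLOOD row contributes 25 * 1,000 + 2 * 1,000,000 = 2,025,000 and the second 1.5 * 1,000,000 = 1,500,000, so FLOOD totals 3,525,000 and sorts first, ahead of TORNADO (250,000) and HAIL (15,000).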
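Next, look at fatalities and injuries for the events in the first data frame (storm_data_cols_fatalities_injuries). TOTAL_INJURIES is the sum of fatalities and injuries for each event.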
storm_data_cols_fatalities_injuries["TOTAL_INJURIES"] <- 0
storm_data_cols_fatalities_injuries$TOTAL_INJURIES <- storm_data_cols_fatalities_injuries$FATALITIES +
storm_data_cols_fatalities_injuries$INJURIES
Use aggregate() to total the injuries and deaths for each EVTYPE.
total_injuries_summary <- aggregate(TOTAL_INJURIES ~ EVTYPE,
storm_data_cols_fatalities_injuries, sum)
Sort the injury/death totals in descending order to find the events with the most injuries and deaths.
injuries_sorted <- total_injuries_summary[order(-total_injuries_summary$TOTAL_INJURIES),]
cat("The event types that resulted in the most injuries and deaths are:")
## The event types that resulted in the most injuries and deaths are:
head(injuries_sorted, 3)
## EVTYPE TOTAL_INJURIES
## 33 TORNADO 97046
## 32 THUNDERSTORM 10301
## 10 FLOOD 10240
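Because TOTAL_INJURIES adds fatalities and injuries together, the two components are no longer visible after aggregation. A sketch, using hypothetical toy data (toy_health), of keeping them separate: aggregate() accepts cbind() on the left-hand side of the formula to sum several columns per EVTYPE in one call.

```r
# Hypothetical toy records
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT WAVE"),
  FATALITIES = c(2, 0, 1, 3),
  INJURIES   = c(10, 4, 0, 20),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries separately for each event type
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                     data = toy_health, FUN = sum)

# Combined total, as in the report, then sort in descending order
by_type$TOTAL_INJURIES <- by_type$FATALITIES + by_type$INJURIES
by_type <- by_type[order(-by_type$TOTAL_INJURIES), ]
print(by_type)
```

Here HEAT WAVE (3 deaths, 20 injuries) ranks above TORNADO (2 deaths, 14 injuries), and the separate columns show how much of each total is fatalities.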
Generate a bar plot of the injuries and deaths for each event type. The plot settings (margins, text scaling, and perpendicular axis labels) are the same as those used for the damages bar plot.
options(scipen = 999)
par(mar=c(12,8,4,2), cex=0.5, las=2)
barplot(total_injuries_summary$TOTAL_INJURIES,names.arg=total_injuries_summary$EVTYPE, main="Injuries and Deaths From US Weather Events", col="red")
title(xlab="Weather Event Category", line=10, cex.lab=1.2)
title(ylab="Number of Injuries and Deaths", line=4, cex.lab=1.2)
cat("The 3 most costly events are:")
## The 3 most costly events are:
head(damages_sorted,3)
## EVTYPE TOTAL_DAMAGES
## 13 FLOOD 179970030845
## 18 HURRICANE 90161472810
## 30 TORNADO 57408368807
Generate a bar plot of total damages (y-axis) by EVTYPE (x-axis). Because there are many EVTYPEs with long names, the margins are enlarged, the text is scaled down (cex = 0.5), and the tick labels are drawn perpendicular to the axis (las = 2). The x- and y-axis titles are positioned with the line argument so that they clear the long tick labels.
options(scipen = 999)
par(mar=c(12,8,4,2), cex=0.5, las=2)
barplot(total_damages_summary$TOTAL_DAMAGES,names.arg=total_damages_summary$EVTYPE, main="Cost of US Weather Events", col="blue")
title(xlab="Weather Event Category", line=8, cex.lab=1.2)
title(ylab="Cost of event in US Dollars", line=7, cex.lab=1.2)
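With many EVTYPEs, the unsorted bar plot can be hard to read. A sketch of plotting only the top few categories, sorted in descending order and rescaled to billions of dollars; the totals data frame holds hypothetical placeholder values, and top_n is an illustrative parameter, not part of the original analysis.

```r
# Hypothetical placeholder totals standing in for total_damages_summary
totals <- data.frame(
  EVTYPE        = c("FLOOD", "HAIL", "HURRICANE", "TORNADO", "WIND"),
  TOTAL_DAMAGES = c(5e9, 2e9, 4e9, 3e9, 1e9),
  stringsAsFactors = FALSE
)

# Keep only the top_n most costly event types, largest first
top_n <- 3
top <- head(totals[order(-totals$TOTAL_DAMAGES), ], top_n)

# Same margin/text settings as the report; save old settings to restore later
op <- par(mar = c(12, 8, 4, 2), cex = 0.5, las = 2)
mids <- barplot(top$TOTAL_DAMAGES / 1e9, names.arg = top$EVTYPE,
                main = paste0("Cost of US Weather Events (top ", top_n, ")"),
                ylab = "Billions of US dollars", col = "blue")
par(op)
```

Sorting before plotting puts the bars in rank order, and dividing by 1e9 keeps the y-axis labels short without needing options(scipen = 999).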
The analysis of the NOAA storm database determined that TORNADO was the weather event type responsible for the largest combined number of injuries and deaths, and that FLOOD was the event type responsible for the highest total property and crop damage. With this information, federal and local governments can allocate resources to mitigate damages and deaths from these events and develop response plans to minimize their impact.