Synopsis
By exploring 1950-2011 storm data compiled by the U.S. National and Oceanic Atmospheric Administration (NOAA), this study identifies the type of storm event or events that severely impacted human health and economic well-being throughout the United States.
Data Processing
# Setting working directory...
setwd("~/Documents/r-things")
# Downloading and reading dataset...
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileurl, "storm.csv.bz2")
storms <- read.csv("storm.csv.bz2", header=T)
library(tibble)
## Warning: package 'tibble' was built under R version 4.0.5
as_tibble(storms, 5) # To preview and verify layout of dataset
## # A tibble: 902,297 x 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/195~ 0130 CST 97 MOBILE AL TORNA~ 0
## 2 1 4/18/195~ 0145 CST 3 BALDWIN AL TORNA~ 0
## 3 1 2/20/195~ 1600 CST 57 FAYETTE AL TORNA~ 0
## 4 1 6/8/1951~ 0900 CST 89 MADISON AL TORNA~ 0
## 5 1 11/15/19~ 1500 CST 43 CULLMAN AL TORNA~ 0
## 6 1 11/15/19~ 2000 CST 77 LAUDERDALE AL TORNA~ 0
## 7 1 11/16/19~ 0100 CST 9 BLOUNT AL TORNA~ 0
## 8 1 1/22/195~ 0900 CST 123 TALLAPOOSA AL TORNA~ 0
## 9 1 2/13/195~ 2000 CST 125 TUSCALOOSA AL TORNA~ 0
## 10 1 2/13/195~ 2000 CST 57 FAYETTE AL TORNA~ 0
## # ... with 902,287 more rows, and 28 more variables: BGN_AZI <chr>,
## # BGN_LOCATI <chr>, END_DATE <chr>, END_TIME <chr>, COUNTY_END <dbl>,
## # COUNTYENDN <lgl>, END_RANGE <dbl>, END_AZI <chr>, END_LOCATI <chr>,
## # LENGTH <dbl>, WIDTH <dbl>, F <int>, MAG <dbl>, FATALITIES <dbl>,
## # INJURIES <dbl>, PROPDMG <dbl>, PROPDMGEXP <chr>, CROPDMG <dbl>,
## # CROPDMGEXP <chr>, WFO <chr>, STATEOFFIC <chr>, ZONENAMES <chr>,
## # LATITUDE <dbl>, LONGITUDE <dbl>, LATITUDE_E <dbl>, LONGITUDE_ <dbl>, ...
# The other variable of interest is EVTYPE, or the storm event type. Here, we would like to see the names of individual event types. We find a sample of those here...
event <- storms[, "EVTYPE"]
as_tibble(unique(event))
## # A tibble: 985 x 1
## value
## <chr>
## 1 TORNADO
## 2 TSTM WIND
## 3 HAIL
## 4 FREEZING RAIN
## 5 SNOW
## 6 ICE STORM/FLASH FLOOD
## 7 SNOW/ICE
## 8 WINTER STORM
## 9 HURRICANE OPAL/HIGH WINDS
## 10 THUNDERSTORM WINDS
## # ... with 975 more rows
# Note: These are just a few samples. Also note at least a couple of inconsistencies here, i.e. the phrases 'TSTM WIND' and 'THUNDERSTORM WINDS.' A cursory check of remaining unique entries in EVTYPE reveals numerous similar inconsitencies, misspellings, variants of the same event type, entries written either in upper-case or lower-case, or entries, such as the following, that does not explicitly state any event type...
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following object is masked _by_ '.GlobalEnv':
##
## storms
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(plyr)
## Warning: package 'plyr' was built under R version 4.0.5
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
library(stringr)
## Warning: package 'stringr' was built under R version 4.0.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5
library(stringr)
as_tibble(storms[str_detect(storms$EVTYPE,"Summary"), 8]) # Will come back to this later.
## # A tibble: 73 x 1
## value
## <chr>
## 1 Summary Jan 17
## 2 Summary of March 14
## 3 Summary of March 23
## 4 Summary of March 24
## 5 Summary of April 3rd
## 6 Summary of April 12
## 7 Summary of April 13
## 8 Summary of April 21
## 9 Summary August 11
## 10 Summary of April 27
## # ... with 63 more rows
# Next comes the laborious task of tidying the dataset, by renaming certain event types, so that we can have a consistent set of event types to work with. As a basis, we will refer to section 2.1.1, Storm Data Event Table, of the National Weather Service Instruction 10-1605 and continue from there.
# Where applicable, we will try to marry up and rephrase some terms and phrases with whatever appropriate event is listed in section 2.1.1. Records with event types that cannot be closely matched with any of the the events in 2.1.1 are eliminated.
storms$EVTYPE <- toupper(as.character(storms$EVTYPE)) # For uniformity, all event type entries are converted to uppercase where needed.
# Recall the "summary" entries in EVTYPE. In all these cases, save one, there is more than one event written in the remarks. None stood out. Couple that with no reported fatalities, injuries, and property and crop damage, these records will be eliminated.
storms$EVTYPE[storms$EVTYPE == "BLIZZARD SUMMARY"] <- "BLIZZARD" # There is one exception. This keeper "summary" is actually a reported blizzard (per notes in the corresponding remarks).
storms2 <- storms[!grepl("SUMMARY", storms$EVTYPE),] # New dataframe with the "summary" EVTYPES records removed.
storms3 <- storms2[!grepl(246124, storms$REFNUM),] # Newer dataframe. The specific record (REFNUM = 246124) with an event type classified as a literal '?' has nothing in the remarks to declare or imply a type of storm, so it is also excluded from the dataframe.
# Where applicable, will now perform some rephrasing of EVTYPE entries to closely match listed event types in 2.1.1...
# ASTRONOMICAL LOW TIDE -- Will keep as is
# AVALANCHE (there was one misspelling)
storms3$EVTYPE <- gsub("AVALANCE", "AVALANCHE", storms3$EVTYPE)
# BLIZZARD
storms3$EVTYPE <- gsub(".*BLIZZARD.*", "BLIZZARD", storms3$EVTYPE)
# COASTAL FLOOD
storms3$EVTYPE <- gsub(".*COASTAL.*", "COASTAL FLOOD", storms3$EVTYPE)
# COLD/WIND CHILL & EXTREME COLD/WINDCHILL -- Going to make an exception to the "2.1.1 rule" here and lump these two into one new event type called COLD CONDITIONS.
storms3$EVTYPE <- gsub(".*COLD.*", "COLD CONDITIONS", storms3$EVTYPE)
# DENSE FOG
storms3$EVTYPE <- gsub(".*PATCHY DENSE FOG.*", "DENSE FOG", storms3$EVTYPE)
# DENSE SMOKE -- Will keep as is
# DROUGHT
storms3$EVTYPE <- gsub(".*DROUGHT.*", "DROUGHT", storms3$EVTYPE)
# DUST DEVIL
storms3$EVTYPE <- gsub(".*DUST DEVIL.*", "DUST DEVIL", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*DUST DEVIL.*", "DUST DEVEL", storms3$EVTYPE)
# DUST STORM
storms3$EVTYPE <- gsub(".*DUST STORM.*", "DUST STORM", storms3$EVTYPE)
# EXCESSIVE HEAT & HEAT -- Will be combining these event types to collectively call them HEAT CONDITIONS
storms3$EVTYPE <- gsub(".*HEAT.*", "HEAT CONDITIONS", storms3$EVTYPE)
# FLASH FLOOD & FLOOD -- Combining two of these event types
storms3$EVTYPE <- gsub(".*FLOOD.*", "FLASH FLOOD", storms3$EVTYPE)
# FREEZING FOG -- Leaving as is
# FROST/FREEZE
storms3$EVTYPE <- gsub(".*FROST.*", "FROST/FREEZE", storms3$EVTYPE)
# FUNNEL CLOUD
storms3$EVTYPE <- gsub(".*FUNNEL.*", "FUNNEL CLOUD", storms3$EVTYPE)
# HAIL
storms3$EVTYPE <- gsub(".*HAIL.*", "HAIL", storms3$EVTYPE)
# HEAVY RAIN
storms3$EVTYPE <- gsub(".*HEAVY RAIN.*", "HEAVY RAIN", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*RECORD RAINFALL.*", "HEAVY RAIN", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*RAINSTORM.*", "HEAVY RAIN", storms3$EVTYPE)
# HEAVY SNOW
storms3$EVTYPE <- gsub(".*HEAVY SNOW.*", "HEAVY SNOW", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*RECORD SNOWFALL.*", "HEAVY SNOW", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*SNOW AND WIND.*", "HEAVY SNOW", storms3$EVTYPE)
# HEAVY SURF
storms3$EVTYPE <- gsub(".*HIGH SURF ADV.*", "HEAVY SURF", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*HAZARDOUS SURF.*", "HEAVY SURF", storms3$EVTYPE)
# HIGH WIND
storms3$EVTYPE <- gsub(".*HIGH WIND.*", "HIGH WIND", storms3$EVTYPE)
# HURRICANE (TYPHOON)
storms3$EVTYPE <- gsub(".*HURRICANE.*", "HURRICANE (TYPHOON)", storms3$EVTYPE)
# ICE STORM
storms3$EVTYPE <- gsub(".*ICE STORM.*", "ICE STORM", storms3$EVTYPE)
# LIGHTNING
storms3$EVTYPE <- gsub(".*LIGHTNING.*", "LIGHTNING", storms3$EVTYPE)
# All marine events listed in 2.1.1 were incorporated into their appropriate land-based equivalents
# RIP CURRENT
storms3$EVTYPE <- gsub(".*RIP CURRENT.*", "RIP CURRENT", storms3$EVTYPE)
# SEICH - Keeping this as is
# SLEET
storms3$EVTYPE <- gsub(".*SLEET.*", "SLEET", storms3$EVTYPE)
# STORM SURGE/TIDE
storms3$EVTYPE <- gsub(".*STORM SURGE.*", "STORM SURGE/TIDE", storms3$EVTYPE)
# STRONG WIND
storms3$EVTYPE <- gsub(".*STRONG WIND.*", "STRONG WIND", storms3$EVTYPE)
# THUNDERSTORM WIND -- This one has one of the most variations and misspellings
storms3$EVTYPE <- gsub(".*THUNDERSTORM WIND.*", "THUNDERSTORM WIND", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*THUNDERSTORM WINS.*", "THUNDERSTORM WIND", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*THUNDERSTORMS WIN.*", "THUNDERSTORM WIND", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*SEVERE THUNDERSTORM.*", "THUNDERSTORM WIND", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*SEVERE THUNDERSTORMS.*", "THUNDERSTORM WIND", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*THUNDERSTORM.*", "THUNDERSTORM WIND", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*THUNDERSTROM WINDS.*", "THUNDERSTORM WIND", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*THUNDERESTORM WINDS.*", "THUNDERSTORM WIND", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*TSTM WIND.*", "THUNDERSTORM WIND", storms3$EVTYPE)
# TORNADO
storms3$EVTYPE <- gsub(".*TORNADO.*", "TORNADO", storms3$EVTYPE)
# TROPICAL DEPRESSION - Will leave as is
# TROPICAL STORM
storms3$EVTYPE <- gsub(".*TROPICAL STORM.*", "TROPICAL STORM", storms3$EVTYPE)
# VOLCANIC ASH
storms3$EVTYPE <- gsub(".*VOLCANIC ASH.*", "VOLCANIC ASH", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*VOLCANIC ERUPTION.*", "VOLCANIC ASH", storms3$EVTYPE)
# WATERSPOUT
storms3$EVTYPE <- gsub(".*WATERSPOUT.*", "WATERSPOUT", storms3$EVTYPE)
storms3$EVTYPE <- gsub(".*WATER SPOUT.*", "WATERSPOUT", storms3$EVTYPE)
# WILDFIRE
storms3$EVTYPE <- gsub(".*WILD.*", "WILDFIRE", storms3$EVTYPE)
# WINTER STORM
storms3$EVTYPE <- gsub(".*WINTER STORM.*", "WINTER STORM", storms3$EVTYPE)
# WINTER WEATHER
storms3$EVTYPE <- gsub(".*WINTER WEATHER.*", "WINTER WEATHER", storms3$EVTYPE)
# Building a new dataframe that meets this condition...
storms4 <- storms3[storms3$EVTYPE=="ASTRONOMICAL LOW TIDE"|
storms3$EVTYPE=="AVALANCHE"|
storms3$EVTYPE=="BLIZZARD"|
storms3$EVTYPE=="COASTAL FLOOD"|
storms3$EVTYPE=="COLD CONDITIONS"|
storms3$EVTYPE=="DENSE FOG"|
storms3$EVTYPE=="DENSE SMOKE"|
storms3$EVTYPE=="DROUGHT"|
storms3$EVTYPE=="DUST DEVIL"|
storms3$EVTYPE=="DUST STORM"|
storms3$EVTYPE=="HEAT CONDITIONS"|
storms3$EVTYPE=="FLASH FLOOD & FLOOD"|
storms3$EVTYPE=="FREEZING FOG"|
storms3$EVTYPE=="FROST/FREEZE"|
storms3$EVTYPE=="FUNNEL CLOUD"|
storms3$EVTYPE=="HAIL"|
storms3$EVTYPE=="HEAVY RAIN"|
storms3$EVTYPE=="HEAVY SNOW"|
storms3$EVTYPE=="HEAVY SURF"|
storms3$EVTYPE=="HIGH WIND"|
storms3$EVTYPE=="HURRICANE (TYPHOON)"|
storms3$EVTYPE=="ICE STORM"|
storms3$EVTYPE=="LIGHTNING"|
storms3$EVTYPE=="RIP CURRENT"|
storms3$EVTYPE=="SEICH"|
storms3$EVTYPE=="SLEET"|
storms3$EVTYPE=="STORM SURGE/TIDE"|
storms3$EVTYPE=="STRONG WIND"|
storms3$EVTYPE=="THUNDERSTORM WIND"|
storms3$EVTYPE=="TORNADO"|
storms3$EVTYPE=="TROPICAL DEPRESSION"|
storms3$EVTYPE=="TROPICAL STORM"|
storms3$EVTYPE=="VOLCANIC ASH"|
storms3$EVTYPE=="WATERSPOUT"|
storms3$EVTYPE=="WILDFIRE"|
storms3$EVTYPE=="WINTER STORM"|
storms3$EVTYPE=="WINTER WEATHER",]
unique(storms4$EVTYPE) # To ensure that we have the EVTYPES that we need
## [1] "TORNADO" "THUNDERSTORM WIND" "HAIL"
## [4] "WINTER STORM" "HIGH WIND" "COLD CONDITIONS"
## [7] "HURRICANE (TYPHOON)" "HEAVY RAIN" "LIGHTNING"
## [10] "DENSE FOG" "RIP CURRENT" "FUNNEL CLOUD"
## [13] "HEAT CONDITIONS" "WATERSPOUT" "BLIZZARD"
## [16] "HEAVY SNOW" "ICE STORM" "AVALANCHE"
## [19] "DUST STORM" "SLEET" "STRONG WIND"
## [22] "HEAVY SURF" "WILDFIRE" "WINTER WEATHER"
## [25] "DROUGHT" "STORM SURGE/TIDE" "TROPICAL STORM"
## [28] "FROST/FREEZE" "FREEZING FOG" "VOLCANIC ASH"
## [31] "TROPICAL DEPRESSION" "DENSE SMOKE" "ASTRONOMICAL LOW TIDE"
## [34] NA
# The other chosen measures (variables) of interest are FATALITIES and INJURIES (which are self-explanatory and should help one get a sense of storm severity on human health), and PROPDMG and CROPDMG (property damage and crop damage, respectively; to measure storms' economic consequences. Will now create another, smaller dataframe with just these variables (columns), plus EVTYPE. First, would like to retain just the records that have at least one fatality, injury, or crop damage or property damage reported.
storms5 <- storms4[storms4$FATALITIES>0|
storms4$INJURIES>0|
storms4$PROPDMG>0|
storms4$CROPDMG>0,]
storm_data <- storms5[, c("EVTYPE","FATALITIES","INJURIES","PROPDMG","CROPDMG")] # The final dataframe to work off of and for determining storm events' impacts.
head(storm_data, 15) # Check of the dataframe, first fifteen rows
## EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## 1 TORNADO 0 15 25.0 0
## 2 TORNADO 0 0 2.5 0
## 3 TORNADO 0 2 25.0 0
## 4 TORNADO 0 2 2.5 0
## 5 TORNADO 0 2 2.5 0
## 6 TORNADO 0 6 2.5 0
## 7 TORNADO 0 1 2.5 0
## 8 TORNADO 0 0 2.5 0
## 9 TORNADO 1 14 25.0 0
## 10 TORNADO 0 0 25.0 0
## 11 TORNADO 0 3 2.5 0
## 12 TORNADO 0 3 2.5 0
## 13 TORNADO 1 26 250.0 0
## 14 TORNADO 0 12 0.0 0
## 15 TORNADO 0 6 25.0 0
Results
# FATALITIES & INJURIES
human_health <- ddply(storm_data, .(EVTYPE), summarize, Total=sum(FATALITIES+INJURIES))
human_health<- human_health[order(-human_health$Total),]
as_tibble(head(human_health, 5)) # Picked the top 5 event types that impacted human health the most
## # A tibble: 5 x 2
## EVTYPE Total
## <chr> <dbl>
## 1 TORNADO 97043
## 2 HEAT CONDITIONS 12341
## 3 THUNDERSTORM WIND 10171
## 4 LIGHTNING 6049
## 5 ICE STORM 2079
# FATALITIES & INJURIES PLOT
health_plot <- ggplot(head(human_health,5), aes(x=EVTYPE, y=Total, fill=EVTYPE))+geom_bar(stat="identity")+xlab("Event Type")+ylab("Total Fatalities & Injuries")+ggtitle("Figure 1. Top 5 Storm Event Types with Severe Health Consequences")
health_plot + theme(axis.text.x = element_text(angle = 45))
# PROPERTY DAMAGES
props <- ddply(storm_data, .(EVTYPE), summarize, Total_dollars=sum(PROPDMG))
props <- props[order(-props$Total),]
as_tibble(head(props, 5)) # Top 5 event types that impacted property
## # A tibble: 5 x 2
## EVTYPE Total_dollars
## <chr> <dbl>
## 1 TORNADO 3215748.
## 2 THUNDERSTORM WIND 2667394.
## 3 HAIL 699320.
## 4 LIGHTNING 603672.
## 5 HIGH WIND 381594.
# PROPERTY DAMAGES PLOT
prop_plot <- ggplot(head(props,5), aes(x=EVTYPE, y=Total_dollars, fill=EVTYPE))+geom_bar(stat="identity")+xlab("Event Type")+ylab("Total Dollars")+ggtitle("Figure 2. Top 5 Storm Event Types with Severe Property Damage Impact")
prop_plot + theme(axis.text.x = element_text(angle = 45))
# CROP DAMAGES
crops <- ddply(storm_data, .(EVTYPE), summarize, Total_dollars=sum(CROPDMG))
crops <- crops[order(-crops$Total),]
as_tibble(head(crops, 5)) # Top 5 event types that impacted property
## # A tibble: 5 x 2
## EVTYPE Total_dollars
## <chr> <dbl>
## 1 HAIL 585957.
## 2 THUNDERSTORM WIND 194870.
## 3 TORNADO 100027.
## 4 DROUGHT 33954.
## 5 HIGH WIND 19058.
# CROP DAMAGES PLOT
crop_plot <- ggplot(head(crops,5), aes(x=EVTYPE, y=Total_dollars, fill=EVTYPE))+geom_bar(stat="identity")+xlab("Event Type")+ylab("Total Dollars")+ggtitle("Figure 2. Top 5 Storm Event Types with Severe Crop Damage Impact")
crop_plot + theme(axis.text.x = element_text(angle = 45))
Across the United States, tornadoes were the most harmful of storm events to have severely affected human population health.
Regarding negative economic impacts, tornadoes caused the highest total dollar amount in property damages, while hail was tops in crop damages.