Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The source code for this document can be found at my github.
The analysis on the storm event database revealed that tornadoes had the most severe impact to population health across the United States between 1950 and 2011. When controlling for all other variables, excessive heat ranks second in the number of deaths caused. Floods and flash floods also rank high among the most dangerours severe weather events in terms of the number of deaths and injuries inflicted to the population. In terms of the economy, floods inflicted the most damage when damages to both crops and properties are combined. In terms of damages to properties, floods did the most damage to followed by hurricanes. Drought contributed the most damage to crops, followed by hurricanes.
Please click on the tabs below to reveal the sections.
The Storm Events Database, provided by National Climatic Data Center was used for the analysis of the economic and health impacts of the severe weather events to the United States between the years 1950 and 2011. The data is stored as a comma separated file and can be downloaded here. There is also some documentation of the data available here.
To determine the impacts of the severe conditions to health and to the economy of the US during 1950 to 2011, we need data regarding the types of severe weather events, the types of impact to health, and the amount of damage to crops and property. The variable EVTYPE contains information regarding the severe weather events during this period. From the documentation (p. 6), NOAA identifies 48 distinct types of severe weather events. FATALITIES contains the number of deaths for each event, while INJURIES contains the count of persons who were injured during an event. Finally, the information regarding the costs to the economy of each severe weather event are stored in the variables PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
PROPDMG and CROPDMB contain the cost estimates in US dollars of each event in terms of damage to properties and crops, respectively. From the documentation, estimates entered in actual dollars. However, if these estimates could not be retrieved from proper agencies, rough estimates were used. The rough estimates in PROPDMG and CROPDMG are encoded as three significant digits with the corresponding magnitude stored in the variables PROPDMGEXP and CROPDMGEXP, respectively. The magnitudes of the amount of PROPDMG and CROPDMG are encoded as alphabetical characters. The following alphabetical characters are used to signify magnitude: K for thousands, M for millions, and B for billions. In cases where no rough estimates were given, PROPDMGEXP and CROPDMGEXP are not available.
The following steps were taken to prepare the data for analysis.
fread function of the data.table package.PROPDMG, PROPDMGEXP, CROPDMG, and DROPDMGEXP.Click on the tabs below to show each step of the data processing.
The following code chunck downloads the data, saves it in a filename stormdata.csv.bz2, stores the data to the R object storm, and converts the column names to small letters.
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
library(data.table)
library(tidyverse)
if(!file.exists("stormdata.csv.bz2")){
download.file(url, destfile = "stormdata.csv.bz2", method = "curl")}
# storm <- readr::read_csv("stormdata.csv.bz2")
storm <- fread(sprintf("bzcat %s | tr -d '\\000'", "stormdata.csv.bz2"))#:>
Read 23.8% of 967216 rows
Read 48.6% of 967216 rows
Read 68.2% of 967216 rows
Read 81.7% of 967216 rows
Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:07
colnames(storm) <- tolower(names(storm))At this point, the number of unique events in the variable evtype is
length(unique(storm$evtype))#:> [1] 985
This means that we need a lot of cleaning to do in order to bring this number down to 48.
The following code chunk stores the unique severe weather conditions as identified by NOAA.
storm_events <- c("winter weather", "winter storm", "wildfire", "waterspout", "volcanic ash", "tsunami",
"tropical storm", "tropical depression", "tornado", "thunderstorm wind", "strong wind", "storm surge/tide",
"sleet", "seiche", "rip current", "marine thunderstorm wind", "marine strong wind", "marine high wind",
"marine hail", "lightning", "lake-effect snow", "lakeshore flood", "ice storm", "hurricane/typhoon",
"high wind", "high surf", "heavy snow", "heavy rain", "heat", "hail",
"funnel cloud", "frost/freeze", "freezing fog", "flood", "flash flood", "extreme cold/wind chill",
"excessive heat", "dust storm", "dust devil", "drought", "dense smoke", "dense fog",
"debris flow", "cold/wind chill", "coastal flood", "blizzard", "avalanche", "astronomical low tide")We can now remove the non-alphabet characters from evtype and create a new variable evtype2 which shows the names of the weather conditions if they match those in storm_events, and shows others otherwise. Note that I have removed the s whenever it appeared at the end of the entry so that entries like flood and floods will be counted as flood.
storm <- storm %>%
mutate(evtype = gsub("s$", "", trimws(tolower(evtype))),
evtype = gsub("[[:blank:][:punct:]+]", "", evtype),
evtype = gsub("[:punct:]+]", "", evtype),
evtype = gsub("[0-9]+", " ", evtype),
evtype2 = case_when(evtype %in% storm_events ~ evtype,
TRUE ~ "others")
)We should see a reduction of unique severe weather condition names in evtype2.
table(storm$evtype2)#:>
#:> avalanche blizzard drought flood hail heat
#:> 386 2719 2488 25330 288661 767
#:> lightning others seiche sleet tornado tsunami
#:> 15756 498818 21 59 60653 20
#:> waterspout wildfire
#:> 3846 2773
While we have brought down the number of severe weather conditions below 48, we have brought it down too much. As a result, there are a lot of information lost in the severe weather events lumped to others.
The following cdoe chunk identifies some patterns corresponding to the type of severe weather conditions identified by NOAA. However, we note that some entries identify more than one events. In some of such cases, the first event is the one retained. Some weather events can also easily be confused with each other such as flood, flash flood, and coastal flood. In such cases, unfortunately, no special care was taken to create patterns that will distinguish one from another. Therefore, discretion should be applied when looking at the results of the analysis where similar event names appear close to each other.
coastal_flood <- "erosion/cstl flood|coastal flooding/erosion|coastal erosion|coastal flooding|cbeach flood|beach erosion/coastal flood|erosin|coastal/tidal flood|coastalflood|cstl flooding/erosion|river flood"
blizzard <- "blizzard"
freeze_frost <- "freeze|freezing rain/sleet|freezing rain/snow|freezing spray|frost|frost\\freeze|freezing rain sleet and light|blizzard/freezing rain|freezing rain|freezing rain and sleet|freezing rain sleet|damaging freeze|early freeze|early frost|freeze|hard freeze"
storm_surge_tide <- "blow-out tide|surf|coastal surge|coastal storm|coastalstorm"
dust_storm <- "blowing dust|duststorm"
heavy_snow <- "blowing snow|wetsnow"
flash_flood <- "breakup flooding|flash flood from ice jam|flashflood|flash flood|flood flash|flood/flash|flood watch/|flood/flash"
wildfire <- "forest fire|brush fire|grass fire"
cold_wind_chill <- "cold, cold temperature|cold wave|cold weather|cold wind chill temperature|cold/wind|cool spell"
dust_devil <- "dust devel|dust devil waterspout"
extreme_cold_wind_chill <- "excessive cold|extreme wind chill|extreme wind chill/blowing sno|extreme windchill|extreme cold|extreme/record cold"
freezing_fog <- "fog and cold temperature"
winter_storm <- "ice storm"
hail <- "hail aloft|hail damage|hail flooding|hail storm|hail(0.85)|hail/icy road|hail/wind|hailstorm"
drought <- "heat drought|heat wave drought|heat/drought"
excessive_heat <- "hyperthermia|heat wave|heatburst|hightemperature|hot|excessiveheat"
lake_effect_snow <- "heavy lake snow"
high_wind <- "highwind"
heavy_snow <- "heavysnow"
lightning <- "lightning"
ice_storm <- "ice|icy"
hurricane <- "hurricane|typhoon"
marine_hail <- "marinehail"
thunderstorm <- "marinetst|thunderstorm"
sleet <- "sleet"
# Modify evtype2 further
storm <- storm %>%
mutate(
evtype2 = case_when(
str_detect(evtype, coastal_flood) ~ "coastal flood",
str_detect(evtype,blizzard) ~ "blizzard",
str_detect(evtype,freeze_frost) ~ "freeze/frost",
str_detect(evtype,storm_surge_tide) ~ "storm surge/tide",
str_detect(evtype,dust_storm) ~ "dust storm",
str_detect(evtype,heavy_snow) ~ "heavy snow",
str_detect(evtype,flash_flood) ~ "flash flood",
str_detect(evtype,wildfire) ~ "wild fire",
str_detect(evtype,cold_wind_chill) ~ "cold wind/chill",
str_detect(evtype,dust_devil) ~ "dust devil",
str_detect(evtype,extreme_cold_wind_chill) ~ "extreme cold wind/chill",
str_detect(evtype,freezing_fog) ~ "freezing fog",
str_detect(evtype,winter_storm) ~ "winter storm",
str_detect(evtype,hail) ~ "hail",
str_detect(evtype,drought) ~ "drought",
str_detect(evtype,excessive_heat) ~ "excessive heat",
str_detect(evtype,lake_effect_snow) ~ "lake effect snow",
str_detect(evtype,thunderstorm) ~ "thunderstorm",
str_detect(evtype,high_wind) ~ "high wind",
str_detect(evtype,heavy_snow) ~ "heavy snow",
str_detect(evtype,lightning) ~ "lightning",
str_detect(evtype,ice_storm) ~ "ice storm",
str_detect(evtype,hurricane) ~ "hurricane",
str_detect(evtype,marine_hail) ~ "marine hail",
TRUE ~ evtype2
)
)The following are now the unique events in evtype2
table(storm$evtype2)#:>
#:> avalanche blizzard coastal flood drought
#:> 386 2745 851 2488
#:> dust storm excessive heat flash flood flood
#:> 430 1722 55670 25330
#:> freeze/frost hail heat heavy snow
#:> 1512 288666 767 15795
#:> high wind hurricane ice storm lightning
#:> 21931 298 2211 15765
#:> marine hail others seiche sleet
#:> 442 281095 21 59
#:> storm surge/tide thunderstorm tornado tsunami
#:> 1076 115745 60653 20
#:> waterspout wildfire
#:> 3846 2773
There number of names of unique severe weather conditions left in evtype2 is
length(unique(storm$evtype2))#:> [1] 26
For this analysis, we shall focus on these 26 variables. We also notice that we have greatly reduced the number of variables lumped in the entry others.
In the code chunck below, I have created the columns prop_dmg_exp and crop_dmg_exp, which contains the powers of ten multipliers to propdmb and cropdmg. NA’s will be introduced since propdmgexp and cropdmgexp are only available for rough estimates.
storm <- storm %>%
mutate(
prop_dmg_exp = case_when(
propdmgexp %in% c('h', "H") ~ 2,
propdmgexp %in% c('k', "K") ~ 3,
propdmgexp %in% c('m', "M") ~ 6,
propdmgexp %in% c('b', "B") ~ 9,
!is.na(as.numeric(propdmgexp)) ~ as.numeric(propdmgexp),
propdmgexp %in% c('', '-', '?', '+') ~ 0
),
crop_dmg_exp = case_when(
cropdmgexp %in% c('h', "H") ~ 2,
cropdmgexp %in% c('k', "K") ~ 3,
cropdmgexp %in% c('m', "M") ~ 6,
cropdmgexp %in% c('b', "B") ~ 9,
!is.na(as.numeric(cropdmgexp)) ~ as.numeric(cropdmgexp),
cropdmgexp %in% c('', '-', '?', '+') ~ 0
)
)#:> Warning in eval_bare(f[[2]], env): NAs introduced by coercion
#:> Warning in eval_bare(f[[3]], env): NAs introduced by coercion
#:> Warning in eval_bare(f[[2]], env): NAs introduced by coercion
#:> Warning in eval_bare(f[[3]], env): NAs introduced by coercion
The following code chunks now computes the rough estimates, and stores the results to prop_dmg and crop_dmg for property damages and crop damages, respectively.
storm <- storm %>%
mutate(
prop_dmg = case_when(
!is.na(prop_dmg_exp) ~ propdmg * 10^prop_dmg_exp,
TRUE ~ propdmg
),
crop_dmg = case_when(
!is.na(crop_dmg_exp) ~ cropdmg * 10^crop_dmg_exp,
TRUE ~ propdmg
)
)The amount in US dollars of the damages in crops and properties, and the total damages aggregated for every type of event for the between the years 1950 and 2011 is computed in the following code chunk.
damage_by_event <- storm %>% group_by(evtype2) %>%
summarise(
crop_damage = sum(crop_dmg, na.rm = TRUE),
prop_damage = sum(prop_dmg, na.rm = TRUE),
total_event_damage = crop_damage + prop_damage
) %>%
arrange(total_event_damage) %>%
mutate(evtype2 = factor(evtype2, levels = .$evtype2))Click on the tabs below to show the analysis.
The following code chunk calculates the total deaths and injuries caused by the severe weather conditions in the US between 1950 and 2011.
storm %>% summarise(
event_fatalities = sum(fatalities, na.rm = TRUE),
event_injuries = sum(injuries, na.rm = TRUE)
)#:> event_fatalities event_injuries
#:> 1 15145 140528
There were more than 15,000 deaths and 140,000 injuries in the population of the United States between 1950 and 2011 due to severe weather events.
The following code chunk computes the number of fatalities and injuries in the US between 1950 and 2011 due to the severe weather events, and plots the frequencies using Cleveland dot plots.
health_impact <- storm %>% group_by(evtype2) %>%
summarise(
fatalities = sum(fatalities, na.rm = TRUE),
injuries = sum(injuries, na.rm = TRUE)
)
health_impact2 <- gather(health_impact, impact, frequency, -evtype2)
health_impact3 <- health_impact2 %>% mutate(event.impact = paste(evtype2, impact, sep=".")) %>%
mutate(event.impact = factor(event.impact, levels = event.impact[order(frequency)]))
health_impact3 %>% group_by(impact) %>% top_n(12) %>%
ggplot(aes(event.impact, frequency, color = impact, label = frequency)) +
geom_segment(aes(x = event.impact, y = 0, xend = event.impact, yend = frequency)) +
geom_point() +
theme_bw() +
theme(panel.grid = element_blank()) +
#geom_text(hjust = -.2, size = 3) +
scale_x_discrete(name = "Type of Event", breaks = health_impact3$event.impact, labels = health_impact3$evtype2) +
facet_wrap(~impact, scales = "free", nrow = 2) +
ylab("Frequency") +
coord_flip() +
ggtitle("Health impact of severe weather events")#:> Selecting by event.impact
From the plot above, tornadoes caused the greatest impact to the health of population. Between 1950 and 2011, there were 5633 deaths and 91346 injuries due to tornado alone.
During this period, the most number of deaths due to a tornado happened on July 12, 1995.
The code below plots the cost in US dollars of damages to crops and properties.
damage_by_event2 <- damage_by_event %>%
gather(damage, value, -evtype2) %>%
mutate(event.damage = paste(evtype2, damage, sep=".")) %>%
mutate(event.damage = factor(event.damage, levels = event.damage[order(value)])) %>%
mutate(damage = factor(damage, labels = c("Crop Damage", "Property Damage","Total Event Damage")))
damage_by_event2 %>% group_by(damage) %>% top_n(10) %>%
ggplot(aes(value, event.damage, label = value, color = damage)) +
geom_segment(aes(x=0, y=event.damage, xend = value, yend=event.damage)) +
geom_point() +
facet_wrap(~damage, scale = "free", nrow = 3) +
scale_y_discrete(name = "Type of Event", breaks = damage_by_event2$event.damage, labels = damage_by_event2$evtype2) +
theme_bw() +
theme(panel.grid = element_blank()) +
ggtitle("Amount of Damages to Crop and Property by Event Type",
subtitle = "in US Dollars, plotted at logarithmic scale") +
xlab("Value in US Dollars")#:> Selecting by event.damage
From the plot above, we see that floods caused the largest economic loss to the United States between 1995 and 2011. During this period, more than 150 billion US dollars worth of properties were lost due to floods during this period.
On the other hand, drought caused the most damage to crops, costing the United States more or less 15 billion US dollars worth of crops. During the same period, next to floods, hurricanes also caused large damages to both crops and properties.