Storms and other severe weather events can cause public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes as far as possible is a key concern. In this report, we describe which types of events are most harmful to population health and which have the greatest economic consequences in the United States. We explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States from 1950 to November 2011, including when and where they occurred, as well as estimates of any fatalities, injuries, and property damage. From these data, we found that tornadoes are the most harmful events with respect to public health, accounting for the highest combined number of fatalities and injuries over the period. We also found that events categorised as ‘Other’ have the greatest economic consequences, accounting for the largest total monetary losses over the period.
The data come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The events in the database start in the year 1950 and end in November 2011. We read the file and display the structure of the dataset below.
StormData <- read.csv("repdata-data-StormData.csv.bz2")
str(StormData)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
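The file can be passed to `read.csv()` without decompressing it first, because R's file connections recognise bzip2 compression when reading. A minimal sketch of that behaviour, using a hypothetical two-row toy file rather than the storm database (the file and its values are invented for illustration):

```r
# Toy data standing in for the storm database (values are illustrative only)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))

# Write the toy data to a temporary bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the compressed file and decompresses it on the fly
toy_back <- read.csv(tmp)
str(toy_back)
```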
We are interested in which event types are most harmful to population health and which have the greatest economic consequences across the United States, so we keep only the columns relevant to those questions.
# Selecting the relevant data
StormData <- StormData[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# EVTYPE labels are inconsistent: they mix upper and lower case and contain extra spaces.
# Create a lower-case copy with repeated spaces collapsed and leading/trailing spaces removed.
StormData$EVTYPE_lower <- tolower(StormData$EVTYPE)
StormData$EVTYPE_lower <- gsub("\\s+", " ", StormData$EVTYPE_lower)
StormData$EVTYPE_lower <- trimws(StormData$EVTYPE_lower)
# Number of unique events
length(unique(StormData$EVTYPE_lower))
## [1] 883
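The effect of the three cleaning steps can be seen on a small example. The labels below are hypothetical, chosen only to show the kinds of inconsistency described above:

```r
# Hypothetical raw labels with mixed case and stray spaces
raw <- c("  TSTM WIND", "Tstm  Wind", "tstm wind ", "HEAVY RAIN")

clean <- tolower(raw)              # unify case
clean <- gsub("\\s+", " ", clean)  # collapse runs of whitespace to one space
clean <- trimws(clean)             # drop leading and trailing spaces

unique(clean)
```

The three spellings of the thunderstorm wind label collapse to a single value, which is why the number of unique labels drops after cleaning.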
Because of inconsistencies and typographical errors in the event names, events are grouped by keyword, and event types that match no keyword are left in an ‘Other’ category.
# Group events by keyword because of inconsistencies and typos in the labels
StormData$Event <- "Other"
StormData$Event[grepl("avalanche", StormData$EVTYPE_lower)] <- "Avalanche"
StormData$Event[grepl("blizzard", StormData$EVTYPE_lower)] <- "Blizzard"
StormData$Event[grepl("cold | wind chill | hypothermia", StormData$EVTYPE_lower)] <- "Cold"
StormData$Event[grepl("drought", StormData$EVTYPE_lower)] <- "Drought"
StormData$Event[grepl("flood | flash flood | fld", StormData$EVTYPE_lower)] <- "Flood"
StormData$Event[grepl("fog", StormData$EVTYPE_lower)] <- "Fog"
StormData$Event[grepl("hail", StormData$EVTYPE_lower)] <- "Hail"
StormData$Event[grepl("heat | excessive heat", StormData$EVTYPE_lower)] <- "Heat"
StormData$Event[grepl("rain | heavy rain", StormData$EVTYPE_lower)] <- "Heavy Rain"
StormData$Event[grepl("hurricane | typhoon", StormData$EVTYPE_lower)] <- "Hurricane"
StormData$Event[grepl("lightning", StormData$EVTYPE_lower)] <- "Lightning"
StormData$Event[grepl("rip current", StormData$EVTYPE_lower)] <- "Rip Current"
StormData$Event[grepl("storm surge", StormData$EVTYPE_lower)] <- "Storm Surge"
StormData$Event[grepl("thunderstorm | tstm", StormData$EVTYPE_lower)] <- "Thunderstorm Wind"
StormData$Event[grepl("tornado", StormData$EVTYPE_lower)] <- "Tornado"
StormData$Event[grepl("fire | wildfire", StormData$EVTYPE_lower)] <- "Wildfire"
StormData$Event[grepl("snow | winter | ice | sleet | freezing", StormData$EVTYPE_lower)] <- "Winter Weather"
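In a regular expression, spaces around `|` are literal characters, so each alternative in a pattern such as `"flood | flash flood | fld"` includes its neighbouring space. The sketch below compares that pattern with a whitespace-free variant on a few hypothetical labels; it only illustrates how the matching behaves and is not the grouping used for the results in this report.

```r
# Hypothetical cleaned labels
labels <- c("flood", "flash flood", "flood watch", "urban/sml stream fld")

# With spaces, the alternatives are "flood ", " flash flood " and " fld"
with_spaces <- grepl("flood | flash flood | fld", labels)

# Without spaces, the alternatives are "flood" and "fld"
without_spaces <- grepl("flood|fld", labels)

data.frame(labels, with_spaces, without_spaces)
```

Labels that match none of the patterns keep the initial ‘Other’ value, and because the assignments run in sequence, a label matching several patterns ends up in the group assigned last.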
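We now process the data on economic consequences, which consist of property and crop damage. The damage amounts are stored as a number (PROPDMG, CROPDMG) and a separate exponent code (PROPDMGEXP, CROPDMGEXP).
# Exponent codes recorded for property and crop damage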
table(StormData$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5      6
## 465934      1      8      5    216     25     13      4      4     28      4
##      7      8      B      h      H      K      m      M
##      5      1     40      1      6 424665      7  11330
table(StormData$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
# Convert an exponent code into a numeric multiplier
convert_exp <- function(exp) {
  exp <- toupper(exp)                # treat "k" and "K" (etc.) alike
  if (exp %in% c("", NA)) return(1)  # blank or missing: no scaling
  if (exp == "H") return(10^2)       # hundreds
  if (exp == "K") return(10^3)       # thousands
  if (exp == "M") return(10^6)       # millions
  if (exp == "B") return(10^9)       # billions
  if (grepl("^[0-8]$", exp)) return(10^as.numeric(exp))  # digit: power of ten
  return(1)                          # any other symbol: no scaling
}
# Applying the function to the dataset
StormData$prop_mult <- sapply(StormData$PROPDMGEXP, convert_exp)
StormData$crop_mult <- sapply(StormData$CROPDMGEXP, convert_exp)
# Value of property and crop damages
StormData$prop_damage <- StormData$PROPDMG * StormData$prop_mult
StormData$crop_damage <- StormData$CROPDMG * StormData$crop_mult
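Calling a function once per row with `sapply()` can be slow on a dataset of this size. As a sketch of one possible alternative, the same mapping can be expressed as a vectorised lookup in a named vector. The names `exp_codes`, `amounts`, `lookup` and `mult` are hypothetical, and the toy values are invented for illustration:

```r
# Hypothetical exponent codes and damage figures
exp_codes <- c("K", "m", "", "B", "5", "?")
amounts   <- c(25, 2.5, 10, 1, 3, 7)

# Letter codes map to powers of ten, mirroring convert_exp()
lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

codes <- toupper(exp_codes)
mult  <- unname(lookup[codes])       # NA wherever the code is not a letter key

is_digit <- grepl("^[0-8]$", codes)  # single digits are read as powers of ten
mult[is_digit] <- 10^as.numeric(codes[is_digit])

mult[is.na(mult)] <- 1               # blanks and other symbols: no scaling

amounts * mult
```

The whole column is converted in a handful of vector operations, and the letter-to-multiplier mapping sits in one place where it is easy to inspect.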
We now summarise the public health and economic impacts by event type.
library(dplyr)
# Total number of public health problems, consisting of fatalities and injuries
StormData$ph_impact <- StormData$FATALITIES + StormData$INJURIES
# Summary of public health problems by events
health_summary <- StormData %>%
  group_by(Event) %>%
  summarise(Total = sum(ph_impact, na.rm = TRUE)) %>%
  arrange(desc(Total))
# Total economic damage, consisting of property and crop damages
StormData$economic_damage <- StormData$prop_damage + StormData$crop_damage
# Summary of economic damages by events
economic_summary <- StormData %>%
  group_by(Event) %>%
  summarise(Total = sum(economic_damage, na.rm = TRUE)) %>%
  arrange(desc(Total))
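Both summaries follow the same pattern: sum a column within each event group, then sort the totals in decreasing order. A minimal base R sketch of that pattern on hypothetical toy data (the data frame `toy` and its values are invented for illustration):

```r
# Hypothetical toy data: one row per recorded event
toy <- data.frame(
  Event     = c("Tornado", "Hail", "Tornado", "Flood", "Hail"),
  ph_impact = c(16, 0, 3, 2, NA)
)

# Sum within each event group, ignoring missing values
totals <- tapply(toy$ph_impact, toy$Event, sum, na.rm = TRUE)

# Largest total first
totals <- sort(totals, decreasing = TRUE)
totals
```

Here `tapply()` plays the role of `group_by()` plus `summarise()`, and `sort()` plays the role of `arrange(desc(Total))`.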
library(ggplot2)
# Bar Plot
ggplot(data = health_summary, aes(x = reorder(Event, -Total), y = Total)) +
  geom_bar(stat = "identity") +
  xlab("Events") +
  ylab("Fatalities and injuries") +
  ggtitle("Type of events causing public health problems (1950-2011)") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
We aggregated public health problems (fatalities plus injuries) by event type. The bar plot suggests that tornadoes are the most harmful events across the United States with respect to public health, as they account for the highest combined number of fatalities and injuries over the period. Events categorised as ‘Other’ account for a substantial share of public health problems (about half the tornado total), suggesting a wide range of contributing weather conditions. Lightning and thunderstorm winds also contribute noticeably, though with lower totals. Overall, sudden and violent weather events tend to pose the greatest risk to population health.
# Bar plot
ggplot(data = economic_summary, aes(x = reorder(Event, -Total), y = Total)) +
  geom_bar(stat = "identity") +
  xlab("Events") +
  ylab("Total damage (USD)") +
  ggtitle("Type of events causing economic problems (1950-2011)") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
Economic impacts, defined as the combined property and crop damage, were aggregated by event type. The bar plot indicates that events categorised as ‘Other’ have the greatest economic consequences in the United States, accounting for the largest total monetary losses over time. Among specific event types, tornadoes and storm surges cause the most significant economic damage, followed by hail and drought. Thunderstorm winds and hurricanes also contribute notably to overall economic losses, though to a lesser extent. Overall, weather events associated with widespread physical damage or prolonged impacts tend to result in the greatest economic consequences.