This analysis explores the National Oceanic and Atmospheric Administration (NOAA) Storm Database to assess the impact of severe weather events on public health and the economy in the United States from 1950 to 2011. We investigate which types of events cause the most harm to public health, in terms of fatalities and injuries, and which types result in the greatest economic losses, considering property and crop damage. Tornadoes and heat waves cause the most harm to public health, while hurricanes and floods are among the costliest events. These findings can help prioritize disaster preparedness and resource allocation in communities vulnerable to severe weather.
In this section, we describe the steps taken to load and preprocess the data, followed by the methods used to analyze it.
We begin by downloading and loading the NOAA Storm Database into R.
The data is in CSV format and compressed using the bzip2 algorithm. We use the read.csv() function, which reads bzip2-compressed files directly, to load the data into R.
# Load the required package
library(dplyr)
# Download the file (skipped if it is already present locally)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("StormData.csv.bz2")) {
  download.file(url, destfile = "StormData.csv.bz2")
}
# Load the data
storm_data <- read.csv("StormData.csv.bz2")
# Data check
dim(storm_data)
## [1] 902297 37
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
head(storm_data, 2)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14 100 3 0 0 15 25.0
## 2 0 2 150 2 0 0 0 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
library(dplyr)
# Filter the relevant columns and rows
storm <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG, PROPDMGEXP, CROPDMGEXP) %>%
  filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
# Transform all column names to lowercase
names(storm) <- tolower(names(storm))
For the analysis, we compute the total fatalities and injuries by event type and, once the damage values have been standardized, the total property and crop damage by event type. We aggregate the data by evtype using the summarise() function.
library(dplyr)
# Aggregate the data by event type and create a new variable for total health impact.
# Damage totals are computed later, after the exponent codes have been applied.
event_health_impact <- storm %>%
  group_by(evtype) %>%
  summarise(
    total_fatalities = sum(fatalities, na.rm = TRUE),
    total_injuries = sum(injuries, na.rm = TRUE),
    total_health_impact = total_fatalities + total_injuries) %>%
  arrange(desc(total_health_impact))
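To make the aggregation concrete, here is a small base-R sketch on invented numbers. The `toy` data frame and its values are hypothetical and are not drawn from the NOAA file. It sums fatalities and injuries per event type and sorts by the combined total, mirroring the grouped summary above without requiring any packages.

```r
# Hypothetical toy records (not real NOAA data)
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  fatalities = c(1, 0, 2, 3, 1),
  injuries   = c(15, 2, 6, 0, 0)
)

# Sum fatalities and injuries within each event type
toy_health <- aggregate(cbind(fatalities, injuries) ~ evtype, data = toy, FUN = sum)

# Combined health impact, sorted from most to least harmful
toy_health$total_health_impact <- toy_health$fatalities + toy_health$injuries
toy_health <- toy_health[order(-toy_health$total_health_impact), ]
print(toy_health)
```

Note that this measure weights a fatality and an injury equally, as the analysis does.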
Because the damage values (propdmg and cropdmg) are recorded on different scales (e.g., thousands or millions of dollars), as indicated by the exponent codes in propdmgexp and cropdmgexp, we convert them to a common unit for a more meaningful comparison.
# Function to convert an exponent code to a numeric multiplier
convert_exp_to_numeric <- function(exp) {
  exp <- toupper(exp) # Treat lowercase codes (e.g., "k", "m") like their uppercase forms
  if (exp == "K") {
    return(1000)
  } else if (exp == "M") {
    return(1e6)
  } else if (exp == "B") {
    return(1e9)
  } else if (exp == "H") {
    return(100)
  } else if (exp == "") {
    return(1) # No exponent (i.e., the value is assumed to be in actual units)
  } else if (exp %in% as.character(0:8)) {
    return(10^as.numeric(exp)) # Digit codes are treated as powers of ten
  } else {
    return(NA_real_) # Return NA for unexpected values
  }
}
# Apply the conversion function to both propdmgexp and cropdmgexp
storm$propdmgexp_numeric <- vapply(storm$propdmgexp, convert_exp_to_numeric, numeric(1), USE.NAMES = FALSE)
storm$cropdmgexp_numeric <- vapply(storm$cropdmgexp, convert_exp_to_numeric, numeric(1), USE.NAMES = FALSE)
# Adjust the damage columns by multiplying with the corresponding exponent values
storm$propdmg <- storm$propdmg * storm$propdmgexp_numeric
storm$cropdmg <- storm$cropdmg * storm$cropdmgexp_numeric
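The per-row conversion above can also be written as a vectorized lookup, which avoids calling a function once per row. The following is a minimal sketch on made-up values, not the code used to produce the results. The names `exp_multiplier`, `exp_to_numeric`, and `toy` are hypothetical, and treating digit codes (here assumed to run from 0 to 8) as powers of ten simply follows the convention of the function above.

```r
# Named lookup vector: exponent code -> multiplier (assumed convention)
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                    setNames(10^(0:8), as.character(0:8)))

# Vectorized conversion: blank codes map to 1, unknown codes to NA
exp_to_numeric <- function(exp) {
  code <- toupper(exp)                 # accept lowercase codes such as "m"
  out <- unname(exp_multiplier[code])  # unmatched names yield NA
  out[code %in% ""] <- 1               # no code: value is already in dollars
  out
}

# Hypothetical toy records (not real NOAA data)
toy <- data.frame(propdmg    = c(25, 2.5, 1, 3, 7),
                  propdmgexp = c("K", "m", "B", "", "?"),
                  stringsAsFactors = FALSE)
toy$propdmg_usd <- toy$propdmg * exp_to_numeric(toy$propdmgexp)
print(toy)
```

Rows whose code is not recognized end up as NA, so they are dropped by any later sum that uses na.rm = TRUE. Counting them with sum(is.na(...)) before aggregating shows how much data that affects.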
We evaluate the types of events causing the most harm to population health by summing the fatalities and injuries. The event types that result in the highest total fatalities and injuries are tornadoes and heat waves.
library(dplyr)
library(ggplot2)
# Summarize the total health impact (total fatalities and total injuries for the 10 most harmful events)
health_impact <- event_health_impact %>%
  top_n(10, total_health_impact) %>%
  arrange(desc(total_health_impact))
# Plot the most harmful events in terms of health impact
ggplot(health_impact, aes(x = reorder(evtype, total_health_impact),
                          y = total_health_impact)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Most Harmful Weather Events in Terms of Health Impact",
       x = "Event Type",
       y = "Health Impact (Total Fatalities & Injuries)") +
  theme_minimal()
Next, we focus on the economic impact of severe weather events. This includes the total property and crop damage caused by each event type. Hurricanes and floods are among the most damaging events in terms of economic costs.
library(dplyr)
library(ggplot2)
library(scales)
# Summarize the total economic damage for each event type
economic_impact <- storm %>%
  group_by(evtype) %>%
  summarise(
    total_propdmg = sum(propdmg, na.rm = TRUE),
    total_cropdmg = sum(cropdmg, na.rm = TRUE)
  ) %>%
  mutate(total_damage = total_propdmg + total_cropdmg) %>% # Create total damage variable
  arrange(desc(total_damage)) %>%
  top_n(10, total_damage) # Get top 10 events by economic damage
# Function to format numbers in billions
format_billions <- function(x) {
  paste0(label_comma()(x / 1e9), "B")
}
# Plot the most damaging events in terms of economic impact
ggplot(economic_impact, aes(x = reorder(evtype, total_damage), y = total_damage)) +
  geom_bar(stat = "identity", aes(fill = total_damage), color = "black") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Economic Cost",
       x = "Event Type",
       y = "Economic Cost (Property + Crop Damage, Billions of US Dollars)") +
  theme_minimal() +
  scale_fill_gradient(low = "blue", high = "red", name = "Total Damage",
                      labels = format_billions) +
  scale_y_continuous(labels = format_billions)
This analysis identifies the most harmful weather events in the U.S. based on both public health and economic consequences. Tornadoes and heat waves account for the most fatalities and injuries, while hurricanes and floods are among the costliest events in terms of property and crop damage. By understanding these patterns, municipalities can better allocate resources for disaster preparedness and mitigation.