This analysis explores the NOAA Storm Database to identify the types of weather events most harmful to population health and those with the greatest economic consequences across the United States. The study covers events from 1950 to 2011, with a focus on fatalities, injuries, and economic damages. Data processing involved cleaning the raw data, converting damage values, and aggregating impacts by event type. Results show that tornadoes have the most significant impact on population health, while floods cause the greatest economic damage. This information is crucial for government and municipal managers in prioritizing resources for severe weather preparedness and response.
This section describes how the NOAA Storm Database was loaded and prepared for analysis. The raw data was downloaded from the provided URL and read into R. We then cleaned the data by selecting the relevant columns, converting event type names and damage exponents to uppercase, and converting damage values to consistent numeric amounts. Finally, the data was aggregated to summarize health impacts (fatalities and injuries) and economic impacts (property and crop damage) by event type.

### Loading the Data

This subsection loads the required libraries and reads the data file into R with read.csv().
knitr::opts_chunk$set(
echo = TRUE,
fig.height = 6,
fig.width = 10,
message = FALSE,
warning = FALSE
)
library(dplyr)
library(ggplot2)
library(lubridate)
# Download the compressed file once; read.csv() decompresses .bz2 files transparently
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
data_file <- "repdata_data_StormData.csv.bz2"
if (!file.exists(data_file)) download.file(url, destfile = data_file, mode = "wb")
storm_data <- read.csv(data_file)
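### Clean and preprocess the data

This subsection describes the preprocessing steps: keeping only the columns needed for the analysis (event type, fatalities, injuries, and the property and crop damage values with their exponents) and converting the event type and exponent columns to uppercase.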
storm_data_clean <- storm_data %>%
select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
mutate(
EVTYPE = toupper(EVTYPE),
PROPDMGEXP = toupper(PROPDMGEXP),
CROPDMGEXP = toupper(CROPDMGEXP)
)
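Uppercasing removes case differences, but some labels still differ only by spacing or abbreviation; the dataset records both TSTM WIND and THUNDERSTORM WIND, for example. The following is a minimal sketch of one further normalization step. The helper name `normalize_evtype` and its single abbreviation rule are illustrative assumptions and are not part of the analysis reported here.

```r
# Hypothetical helper: tidy event-type labels beyond simple uppercasing.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                                   # uppercase, trim outer spaces
  x <- gsub("\\s+", " ", x, perl = TRUE)                    # collapse repeated whitespace
  x <- gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)   # assumed abbreviation rule
  x
}

# Toy labels (invented for illustration)
normalize_evtype(c(" tstm wind", "Thunderstorm  Wind", "TORNADO"))
```

Merging labels this way would change the per-event totals, so any such rule should be reviewed against the actual EVTYPE values before it is applied.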
The function convert_exp converts the alphabetic exponents used in the NOAA Storm Database into their corresponding numeric multipliers. It takes a single argument, exp, which is the exponent character from the dataset, and uses a chain of if-else statements to return the matching multiplier: 100 for “H”, 1,000 for “K”, 1,000,000 for “M”, and 1,000,000,000 for “B”. Any other value, including a blank, returns 1, so the damage figure is left unscaled.

The function treats uppercase and lowercase letters identically. It matters because the damage values cannot be interpreted without it. In the original data, a damage value of “1.5M” (1.5 million dollars) is stored as two separate values: 1.5 in the damage column and “M” in the exponent column. The function converts “M” to 1,000,000, which is then multiplied by 1.5 to obtain the actual damage value of 1,500,000 dollars. Combined with sapply() in the damage calculation step, the conversion is applied to every row in the dataset, so that all damage values are expressed in dollars and economic impacts can be aggregated and compared across event types.
convert_exp <- function(exp) {
if (exp %in% c("H", "h")) return(100)
else if (exp %in% c("K", "k")) return(1000)
else if (exp %in% c("M", "m")) return(1e6)
else if (exp %in% c("B", "b")) return(1e9)
else return(1)
}
The next code chunk calculates the actual damage amounts for both property and crop damage. The original dataset stores each damage figure in two separate columns: a numeric value (PROPDMG for property damage, CROPDMG for crop damage) and an exponent (PROPDMGEXP for property damage, CROPDMGEXP for crop damage).
Letters are used to represent multipliers in the exponent column:

- “H” or “h” is used for hundred (10^2)
- “K” or “k” is used for thousand (10^3)
- “M” or “m” is used for million (10^6)
- “B” or “b” is used for billion (10^9)
The letter exponents are converted into actual numeric multipliers by the convert_exp function (defined earlier). The damage value is then multiplied by its corresponding multiplier to obtain the actual damage amount in dollars.
For example, if PROPDMG is 1.5 and PROPDMGEXP is “M”, the actual property damage would be calculated as 1.5 * 1,000,000 = 1,500,000 dollars.
The calculation is performed for both property damage (PROPDMG_ACTUAL) and crop damage (CROPDMG_ACTUAL), with new columns being created to display the actual dollar amounts.
storm_data_clean <- storm_data_clean %>%
mutate(
PROPDMG_ACTUAL = PROPDMG * sapply(PROPDMGEXP, convert_exp),
CROPDMG_ACTUAL = CROPDMG * sapply(CROPDMGEXP, convert_exp)
)
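Calling sapply() once per row works, but it can be slow on a dataset of this size. As an alternative, here is a minimal sketch of a vectorized lookup that reproduces the same mapping. The names `exp_multiplier` and `convert_exp_vec` are hypothetical, and the sample values are invented for illustration rather than taken from the dataset.

```r
# Hypothetical lookup table: exponent letter -> numeric multiplier.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Vectorized counterpart of convert_exp(): unknown or blank codes map to 1.
convert_exp_vec <- function(exp) {
  mult <- unname(exp_multiplier[toupper(as.character(exp))])
  mult[is.na(mult)] <- 1
  mult
}

# Toy values (not from the dataset) to illustrate the arithmetic.
dmg     <- c(1.5, 25, 2, 10)
dmg_exp <- c("M", "k", "B", "")
dmg * convert_exp_vec(dmg_exp)   # 1.5 * 1e6, 25 * 1e3, 2 * 1e9, 10 * 1
```

Under these assumptions, `PROPDMG * convert_exp_vec(PROPDMGEXP)` could stand in for the sapply() call inside mutate().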
The next code chunk aggregates the data by event type to summarize both health and economic impacts. It creates two new data frames: health_impact and economic_impact.
For health_impact:

- The data is grouped by event type (EVTYPE).
- All fatalities and injuries for each event type are summed.
- A total health impact is calculated by adding fatalities and injuries.
- The results are arranged in descending order of total health impact.

For economic_impact:

- The data is grouped by event type (EVTYPE).
- All property damage and crop damage for each event type are summed.
- A total economic impact is calculated by adding property and crop damage.
- The results are arranged in descending order of total economic impact.
These aggregations allow for the identification of event types with the largest overall impacts on health and the economy, respectively. By arranging the results in descending order, the most impactful event types can be easily identified. The summarized data frames will be used later for analysis and visualization to answer the main questions of the assignment regarding which event types are most harmful to population health and have the greatest economic consequences.
health_impact <- storm_data_clean %>%
group_by(EVTYPE) %>%
summarise(
FATALITIES = sum(FATALITIES),
INJURIES = sum(INJURIES),
TOTAL_HEALTH_IMPACT = FATALITIES + INJURIES
) %>%
arrange(desc(TOTAL_HEALTH_IMPACT))
economic_impact <- storm_data_clean %>%
group_by(EVTYPE) %>%
summarise(
PROPERTY_DAMAGE = sum(PROPDMG_ACTUAL),
CROP_DAMAGE = sum(CROPDMG_ACTUAL),
TOTAL_ECONOMIC_IMPACT = PROPERTY_DAMAGE + CROP_DAMAGE
) %>%
arrange(desc(TOTAL_ECONOMIC_IMPACT))
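To make the group, sum, and sort pattern concrete, the sketch below applies it to a small toy data frame. It uses base R (aggregate() and order()) as a stand-in for the dplyr pipeline so that it runs without extra packages. The records are invented for illustration and the object names `toy` and `toy_health` are hypothetical.

```r
# Toy records (invented for illustration, not from the NOAA data).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3, 1),
  INJURIES   = c(10, 1, 5, 0, 2)
)

# Sum fatalities and injuries per event type
# (base-R analogue of group_by() followed by summarise()).
toy_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy_health$TOTAL_HEALTH_IMPACT <- toy_health$FATALITIES + toy_health$INJURIES

# Sort so the event type with the largest combined total comes first.
toy_health <- toy_health[order(-toy_health$TOTAL_HEALTH_IMPACT), ]
toy_health
```

In this toy data TORNADO ranks first with a combined total of 18, ahead of FLOOD (4) and HEAT (3), mirroring how the real tables are ranked.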
This section presents the results of the analysis, first for population health and then for economic consequences.
The plot below displays the top 10 weather events by total health impact, combining fatalities and injuries. Tornadoes clearly stand out as the most harmful event type, followed by excessive heat and TSTM wind. Tornadoes account for 96,979 total fatalities and injuries over the period studied, significantly more than any other event type.
This finding underscores the critical importance of tornado warning systems and public education about tornado safety procedures. It also suggests that resources for emergency medical services and hospitals might be particularly important in tornado-prone areas.
The top three weather events in terms of health impact are tornadoes, excessive heat, and TSTM (thunderstorm) wind.
# Order bars so the most harmful event type appears at the top after coord_flip()
ggplot(head(health_impact, 10), aes(x = reorder(EVTYPE, TOTAL_HEALTH_IMPACT), y = TOTAL_HEALTH_IMPACT)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Health Impact",
       x = "Event Type",
       y = "Total Fatalities and Injuries") +
  theme_minimal() +
  theme(axis.text.y = element_text(angle = 0, hjust = 1))
### Economic Consequences of Severe Weather Events

When examining the economic impact of severe weather events, a somewhat different pattern emerges. The plot below illustrates the top 10 weather events by total economic impact, combining property and crop damage. Floods emerge as the costliest type of weather event, followed by hurricanes/typhoons and tornadoes. The total economic damage caused by floods over the study period amounts to approximately $150 billion.
This result highlights the need for robust flood prevention and mitigation strategies, including infrastructure improvements and zoning regulations. It also suggests that flood insurance programs play a crucial role in economic resilience to severe weather events.
# Order bars so the costliest event type appears at the top after coord_flip()
ggplot(head(economic_impact, 10), aes(x = reorder(EVTYPE, TOTAL_ECONOMIC_IMPACT), y = TOTAL_ECONOMIC_IMPACT / 1e9)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Economic Impact",
       x = "Event Type",
       y = "Total Damage (Billions of Dollars)") +
  theme_minimal() +
  theme(axis.text.y = element_text(angle = 0, hjust = 1))
## Conclusion
The analysis of the NOAA Storm Database has demonstrated that various types of severe weather events pose distinct risks to both population health and economic stability. Tornadoes have been identified as the most significant threat to human health, with the highest combined numbers of fatalities and injuries. In contrast, floods have emerged as the most economically destructive events, causing the greatest combined property and crop damage.
These findings underscore the importance of targeted actions for disaster preparedness and response. It is recommended that:
- Prioritization be given to the development and implementation of tornado warning systems and public education campaigns about tornado safety, particularly in high-risk regions.
- Significant resources be allocated towards flood prevention, mitigation, and recovery efforts.
- Emergency medical services be adequately equipped to manage mass casualty events, especially those related to tornadoes.
- The role of insurance programs, particularly flood insurance, be considered in enhancing community resilience.
- Comprehensive disaster response plans be formulated to address both the immediate health impacts and the long-term economic consequences of severe weather events.

By focusing on these critical areas, resources can be more effectively allocated to safeguard both the health of populations and the economic stability of communities in the face of severe weather events.