This analysis examines the U.S. National Oceanic and Atmospheric
Administration’s (NOAA) storm database to evaluate the impact of severe
weather events on population health and the economy across the
United States. The dataset includes records of major storms and weather
events, capturing details such as event types, fatalities, injuries, and
property and crop damage. The primary goal is to identify which types of
events are most harmful to population health and which have the greatest
economic consequences. The analysis focuses on the EVTYPE
variable to categorize weather events. For population health, we
aggregate fatalities and injuries by event type to determine the most
harmful events. For economic consequences, we calculate the total
property and crop damage, converting damage values into a consistent
numerical format. The results reveal that tornadoes are the most harmful
to population health, causing the highest number of fatalities and
injuries. Floods, however, lead to the greatest economic losses due to
extensive property damage. Visualizations, including bar plots, are used
to present the top 10 event types for both health and economic impacts.
This report is intended for government or municipal managers to help
prioritize resources for severe weather preparedness.
The data come as a comma-separated values (CSV) file that was originally compressed with bzip2. The file has been downloaded, decompressed, and placed on the desktop. We’ll load the raw CSV file into R for analysis.
# Load the packages used throughout the analysis
library(readr)
library(dplyr)
library(ggplot2)
# Set working directory to the desktop (Please adjust path based on your system)
setwd("~/Desktop/")
# Read the uncompressed CSV file
storm_data <- read_csv("repdata_data_StormData.csv")
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl (1): COUNTYENDN
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(storm_data)
## # A tibble: 6 × 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/1950… 0130 CST 97 MOBILE AL TORNA… 0
## 2 1 4/18/1950… 0145 CST 3 BALDWIN AL TORNA… 0
## 3 1 2/20/1951… 1600 CST 57 FAYETTE AL TORNA… 0
## 4 1 6/8/1951 … 0900 CST 89 MADISON AL TORNA… 0
## 5 1 11/15/195… 1500 CST 43 CULLMAN AL TORNA… 0
## 6 1 11/15/195… 2000 CST 77 LAUDERDALE AL TORNA… 0
## # ℹ 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## # END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## # END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <dbl>,
## # MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## # PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## # STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## # LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>
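Because the original download is bzip2-compressed, decompressing it by hand is optional: base R can read such a file through a `bzfile()` connection. The sketch below shows the idea on a small temporary file rather than on the real data; the toy rows and the temporary file name are made up for illustration.

```r
# Toy data standing in for the storm file (hypothetical rows)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  stringsAsFactors = FALSE
)

# Write a bzip2-compressed CSV to a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through a bzfile() connection, with no manual decompression
back <- read.csv(bzfile(path), stringsAsFactors = FALSE)
print(back)
```

The same pattern should apply to the compressed storm file, assuming it is kept in its original `.bz2` form.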
The NOAA storm database contains inconsistencies in event types and damage values. We’ll clean the data to make it suitable for analysis.
# Standardize Event Types
The EVTYPE column contains the type of weather event, but there are many variations of the same event (e.g., “HIGH WIND” and “ HIGH WINDS”). We’ll simplify by converting all event types to uppercase and removing leading and trailing whitespace. This step resolves only differences in case and surrounding whitespace; distinct spellings such as “TSTM WIND” and “THUNDERSTORM WIND” remain separate categories in the results.
# Convert event types to uppercase and trim whitespace
storm_data <- storm_data %>%
  mutate(EVTYPE = toupper(trimws(EVTYPE)))
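To make the effect of this normalization concrete, here is a minimal sketch on a toy vector. The raw labels are hypothetical examples, not values taken from the database.

```r
# Hypothetical raw labels illustrating common inconsistencies
raw <- c("High Wind", " HIGH WIND", "high wind ", "HIGH WINDS", "TSTM WIND")

# Same normalization as in the analysis: trim surrounding whitespace, then uppercase
clean <- toupper(trimws(raw))

# Case and whitespace variants collapse; spelling variants stay distinct
print(unique(clean))
```

The first three labels collapse into a single category, while the plural form and the abbreviation stay separate. Merging those would need an explicit mapping, which this analysis does not attempt.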
To assess the impact on population health, we aggregate the total fatalities and injuries by event type (EVTYPE). We then identify the top 10 event types with the highest combined impact (fatalities + injuries).
# Aggregate fatalities and injuries by event type
health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    total_health_impact = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_health_impact))
# Display the top 10 event types
head(health_impact, 10)
## # A tibble: 10 × 4
## EVTYPE total_fatalities total_injuries total_health_impact
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
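The combined metric counts a fatality and an injury equally, so the ordering can change when fatalities are considered alone. The sketch below re-ranks only the ten rows printed above, with values copied from that table. It is not a ranking over the full dataset, where event types outside this list could also appear.

```r
# Values copied from the top-10 table above
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  total_fatalities = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  total_injuries = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  stringsAsFactors = FALSE
)

# Recompute the combined metric as a consistency check against the table
top10$total_health_impact <- top10$total_fatalities + top10$total_injuries

# Re-rank the same ten rows by fatalities only
by_fatalities <- top10[order(-top10$total_fatalities), ]
print(head(by_fatalities[, c("EVTYPE", "total_fatalities")], 5))
```

Among these ten rows, tornadoes remain first either way, but flash floods and heat move ahead of thunderstorm wind and floods once injuries are set aside.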
We’ll create a bar plot to visualize the top 10 event types by total health impact.
ggplot(head(health_impact, 10),
       aes(x = reorder(EVTYPE, -total_health_impact), y = total_health_impact)) +
  geom_col(fill = "blue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Weather Events by Population Health Impact",
       x = "Event Type",
       y = "Total Fatalities + Injuries")
The plot shows that tornadoes are the most harmful to population health, causing the highest number of fatalities and injuries combined.
To evaluate economic consequences, we aggregate the total property and crop damage by event type. First, however, the damage values need processing: property and crop damage are each stored in two columns, PROPDMG and PROPDMGEXP for property damage, and CROPDMG and CROPDMGEXP for crop damage. The EXP columns contain multipliers (e.g., “K” for thousands, “M” for millions, “B” for billions). We’ll convert these into numeric dollar values.
# Function to convert damage values with multipliers
convert_damage <- function(dmg, exp) {
  # Normalize case and treat missing or empty multipliers as "0" (a multiplier of 1)
  exp <- toupper(exp)
  exp <- ifelse(is.na(exp) | exp == "", "0", exp)
  # Define multipliers
  multipliers <- c("0" = 1, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  # Convert damage to dollars; multiplier codes outside the lookup yield NA
  dmg_numeric <- dmg * unname(multipliers[exp])
  # Replace NAs (unrecognized codes or missing damage values) with 0
  dmg_numeric[is.na(dmg_numeric)] <- 0
  return(dmg_numeric)
}
# Apply the function to property and crop damage
storm_data <- storm_data %>%
  mutate(
    prop_dmg_numeric = convert_damage(PROPDMG, PROPDMGEXP),
    crop_dmg_numeric = convert_damage(CROPDMG, CROPDMGEXP),
    total_dmg = prop_dmg_numeric + crop_dmg_numeric)
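To check how the multiplier lookup behaves, the sketch below applies the same logic to a handful of toy values. The function name `convert_damage_demo` and the inputs are hypothetical, and the `"?"` code stands in for any multiplier outside the lookup table.

```r
# Compact copy of the conversion logic, renamed to mark it as a demo
convert_damage_demo <- function(dmg, exp) {
  exp <- toupper(exp)
  exp <- ifelse(is.na(exp) | exp == "", "0", exp)
  multipliers <- c("0" = 1, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  out <- dmg * unname(multipliers[exp])  # codes outside the lookup give NA
  out[is.na(out)] <- 0                   # and are then counted as zero damage
  out
}

# Toy inputs: thousands, lowercase millions, billions, empty code, unknown code
dmg <- c(25, 2.5, 1, 10, 5)
exp <- c("K", "m", "B", "", "?")

result <- convert_damage_demo(dmg, exp)
print(data.frame(dmg = dmg, exp = exp, dollars = result))
```

Note the design choice this exposes: a record with an unrecognized multiplier contributes zero dollars rather than its raw damage figure, so such records are effectively dropped from the totals.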
We use the total_dmg column created earlier, which combines property and crop damage in dollars.
# Aggregate total damage by event type
economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(total_damage = sum(total_dmg, na.rm = TRUE)) %>%
  arrange(desc(total_damage))
head(economic_impact, 10)
## # A tibble: 10 × 2
## EVTYPE total_damage
## <chr> <dbl>
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57352113886.
## 4 STORM SURGE 43323541000
## 5 HAIL 18758221486.
## 6 FLASH FLOOD 17562179078.
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
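Because spelling variants of an event are not merged, related categories such as HURRICANE/TYPHOON and HURRICANE appear as separate rows. As a rough sensitivity check, the sketch below combines those two rows using the figures printed above (copied as displayed, so fractional dollars are dropped). The grouping label `HURRICANE (ALL)` is an illustrative assumption, not part of the original analysis.

```r
# Values copied from the top-10 table above (USD, as printed)
econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  total_damage = c(150319678257, 71913712800, 57352113886, 43323541000,
                   18758221486, 17562179078, 15018672000, 14610229010,
                   10148404500, 8967041360),
  stringsAsFactors = FALSE
)

# Illustrative grouping: fold the two hurricane rows into one label
econ$group <- econ$EVTYPE
econ$group[econ$EVTYPE %in% c("HURRICANE/TYPHOON", "HURRICANE")] <- "HURRICANE (ALL)"

# Sum damage per group, sort descending, and express in billions of USD
grouped <- aggregate(total_damage ~ group, data = econ, FUN = sum)
grouped <- grouped[order(-grouped$total_damage), ]
grouped$billions <- round(grouped$total_damage / 1e9, 1)
print(head(grouped, 3))
```

On these printed values the two hurricane rows together come to roughly 86.5 billion USD, which is still well below FLOOD, so the headline ranking is unchanged by this particular merge.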
We’ll create a bar plot to visualize the top 10 event types by total economic damage.
ggplot(head(economic_impact, 10),
       aes(x = reorder(EVTYPE, -total_damage), y = total_damage / 1e9)) +
  geom_col(fill = "orange") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type",
       y = "Total Damage (Billions of USD)")
The plot indicates that floods cause the greatest economic damage, primarily due to extensive property damage.