This analysis explores the NOAA Storm Database to assess the impact of severe weather events on population health and the economy. Two primary questions are addressed:
1. Which types of events are most harmful to population health (based on fatalities and injuries)?
2. Which types of events have the greatest economic consequences (considering property and crop damages)?
The data were checked for missing values, summary variables were created (e.g., total health impact and total economic damage), and the events were sorted to identify the most harmful. The results are visualized with bar plots of the top events for both health and economic impacts.
The data was loaded from the CSV file
"data/repdata-data-StormData.csv" and processed using R.
The dataset contains information on various weather events, including
their effects on health and the economy.
# Load the data
weather_data <- read.csv("data/repdata-data-StormData.csv")
# Check the structure and missing values
str(weather_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
colSums(is.na(weather_data))
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 0 0 0 0 0 0 0
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 0 0 0 0 0 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F
## 902297 0 0 0 0 0 843563
## MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 0 0 0 0 0 0 0
## WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## 0 0 0 47 0 40 0
## REMARKS REFNUM
## 0 0
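Note that is.na() does not flag empty strings, so character columns such as END_DATE or CROPDMGEXP report zero missing values above even though the str() output shows blank entries. A minimal sketch of counting blanks as well, using a small made-up data frame (toy and count_missing are hypothetical names, not part of the original analysis):

```r
# Toy data standing in for a few columns of weather_data (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  CROPDMGEXP = c("", "K", ""),
  F          = c(3, NA, NA),
  stringsAsFactors = FALSE
)

# Count NA values plus empty or whitespace-only strings in each column
count_missing <- function(df) {
  sapply(df, function(col) {
    blank <- if (is.character(col)) trimws(col) == "" else FALSE
    sum(is.na(col) | blank, na.rm = TRUE)
  })
}

missing_counts <- count_missing(toy)
print(missing_counts)
```

Applied to weather_data, the same helper would give a fuller picture of which columns are sparsely populated.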
FATALITIES and INJURIES were selected to analyze the health impacts of each event. A new column, FI, was created by adding fatalities and injuries to provide a total measure of health impact.
The EVTYPE column was converted into a factor to aid in visualization and analysis.
PROPDMG (property damage) and CROPDMG (crop damage) were selected, and a new column, total, was created by adding these two values to assess the total economic impact of each event.
# Health Impact Analysis
health_related <- weather_data[, c('FATALITIES', 'INJURIES', 'EVTYPE')]
health_related$FI <- health_related$FATALITIES + health_related$INJURIES
health_related$EVTYPE <- as.factor(health_related$EVTYPE)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Sorting by total health impact (FI)
filtered_FI <- health_related %>%
  arrange(desc(FI)) %>%
  slice(1:10)
# Sorting by fatalities and injuries
health_related_sorted_F <- health_related %>%
  arrange(desc(FATALITIES)) %>%
  slice(1:10)
health_related_sorted_INJ <- health_related %>%
  arrange(desc(INJURIES)) %>%
  slice(1:10)
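The code above ranks individual records, so a single event type can occupy several of the ten rows. Since the questions ask about types of events, one alternative is to sum the impacts per EVTYPE before ranking. A minimal sketch with dplyr on made-up data (toy_health and by_type are hypothetical names; this is not the method used for the figures in this report):

```r
library(dplyr)

# Made-up records; several rows can share one event type
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 4, 10, 2),
  INJURIES   = c(15, 3, 20, 0, 1)
)

# Sum per event type first, then rank the types by combined impact
by_type <- toy_health %>%
  group_by(EVTYPE) %>%
  summarise(
    FATALITIES = sum(FATALITIES),
    INJURIES   = sum(INJURIES),
    .groups    = "drop"
  ) %>%
  mutate(FI = FATALITIES + INJURIES) %>%
  arrange(desc(FI)) %>%
  slice_head(n = 10)

print(by_type)
```

With this approach each event type appears at most once, so the ranking reflects cumulative harm rather than the single worst incidents.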
The 10 individual event records with the highest total health impact (fatalities + injuries), the highest fatalities, and the highest injuries are presented below.
# Create plots to visualize the health impact
library(ggplot2)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
# Plot for total health impact (FI)
plot1 <- ggplot(data = filtered_FI, aes(x = FI, y = EVTYPE, fill = EVTYPE)) +
  geom_bar(stat = 'identity') +
  theme_classic() +
  scale_fill_brewer(palette = 'Set3') +
  labs(title = "Top 10 Events by Total Health Impact (Fatalities + Injuries)", x = "Total Impact", y = "Event Type")
# Plot for fatalities
plot2 <- ggplot(data = health_related_sorted_F, aes(x = EVTYPE, y = FATALITIES, fill = EVTYPE)) +
  geom_bar(stat = 'identity') +
  scale_fill_brewer(palette = 'Set3') +
  theme_classic() +
  labs(title = "Top 10 Events by Fatalities", x = "Event Type", y = "Fatalities")
# Plot for injuries
plot3 <- ggplot(data = health_related_sorted_INJ, aes(x = EVTYPE, y = INJURIES, fill = EVTYPE)) +
  geom_bar(stat = 'identity') +
  scale_fill_brewer(palette = 'Set3') +
  theme_classic() +
  labs(title = "Top 10 Events by Injuries", x = "Event Type", y = "Injuries")
# Arrange plots in one grid
grid.arrange(plot1, plot2, plot3, ncol = 1, nrow = 3)
Figure 1: The top 10 weather events by total health impact (fatalities + injuries), fatalities, and injuries are shown. Flash floods and tornadoes are among the most harmful.
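ggplot2 draws a factor axis in level order, which is alphabetical by default, so the bars are not sorted by size. One possible way to sort them, sketched on made-up per-type totals (toy_totals is a hypothetical data frame), is to reorder the factor levels by the plotted value with reorder() before plotting:

```r
library(ggplot2)

# Made-up totals per event type
toy_totals <- data.frame(
  EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
  FI     = c(40, 6, 10)
)

# reorder() sorts the factor levels by ascending FI,
# so the largest bar ends up at the top of a horizontal chart
toy_totals$EVTYPE <- reorder(toy_totals$EVTYPE, toy_totals$FI)

p <- ggplot(toy_totals, aes(x = FI, y = EVTYPE)) +
  geom_col() +
  theme_classic() +
  labs(x = "Total Impact", y = "Event Type")

print(levels(toy_totals$EVTYPE))
```

Here geom_col() is used as shorthand for geom_bar(stat = 'identity'); calling print(p) would render the sorted chart.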
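The 20 records with the greatest total economic damage are presented below, together with separate plots of their property and crop damages. Totals are sums of the raw PROPDMG and CROPDMG values, without the PROPDMGEXP and CROPDMGEXP multipliers applied.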
# Economic Impact Analysis
eco_dam <- weather_data[, c('MAG', 'F', 'PROPDMG', 'CROPDMG', 'EVTYPE')]
eco_dam$total <- eco_dam$PROPDMG + eco_dam$CROPDMG
# Sorting by total damage
eco_dam_sorted <- eco_dam %>%
  arrange(desc(total)) %>%
  slice(1:20)
# Renaming columns for clarity
colnames(eco_dam_sorted) <- c('Mag', 'F_value', 'PROPDMG', 'CROPDMG', 'EVTYPE', 'Total')
# Plot for total economic damage
# (theme_bw() is a complete theme, so the theme_classic() call it overrode has been dropped)
pl1 <- ggplot(data = eco_dam_sorted, aes(x = Total, y = EVTYPE, fill = EVTYPE)) +
  geom_bar(stat = 'identity') +
  theme_bw(base_family = 'Times') +
  scale_fill_brewer(palette = 'YlOrRd') +
  labs(title = "Top 20 Events by Total Economic Damage", x = "Total Damage", y = "Event Type")
# Plot for crop damage
pl2 <- ggplot(data = eco_dam_sorted, aes(x = CROPDMG, y = EVTYPE, fill = EVTYPE)) +
  geom_bar(stat = 'identity') +
  theme_bw(base_family = 'Times') +
  scale_fill_brewer(palette = 'BuGn') +
  labs(title = "Crop Damages", x = "Crop Damage", y = "Event Type")
# Plot for property damage
pl3 <- ggplot(data = eco_dam_sorted, aes(x = PROPDMG, y = EVTYPE, fill = EVTYPE)) +
  geom_bar(stat = 'identity') +
  theme_bw(base_family = 'Times') +
  scale_fill_brewer(palette = 'YlGnBu') +
  labs(title = "Property Damages", x = "Property Damage", y = "Event Type")
# Arrange plots in one grid
grid.arrange(pl1, pl2, pl3, ncol = 1, nrow = 3)
Figure 2: The top 20 weather events by total economic damage (property + crop), as well as separate plots for property and crop damages. Flash floods cause the most economic damage in general.
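PROPDMG and CROPDMG hold only the leading figures of each damage estimate; the companion columns PROPDMGEXP and CROPDMGEXP (visible in the str() output, e.g. "K") carry the magnitude. A hedged sketch of converting to dollar amounts before summing, assuming the usual reading K = thousands, M = millions, B = billions, and treating any other code as a multiplier of 1 (that fallback is an assumption, and exp_to_multiplier and toy_damage are hypothetical names):

```r
# Map an exponent code to a numeric multiplier (case-insensitive).
# Codes other than K, M, B fall back to 1; this fallback is an assumption.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

# Made-up damage records
toy_damage <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 10, 3),
  CROPDMGEXP = c("", "k", "B"),
  stringsAsFactors = FALSE
)

# Scale each component by its own multiplier, then add
toy_damage$total_usd <-
  toy_damage$PROPDMG * exp_to_multiplier(toy_damage$PROPDMGEXP) +
  toy_damage$CROPDMG * exp_to_multiplier(toy_damage$CROPDMGEXP)

print(toy_damage$total_usd)
```

Scaling first matters because a record with PROPDMG = 2.5 and exponent "M" represents far more damage than one with PROPDMG = 25 and exponent "K", although the raw values suggest the opposite.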
We also performed a correlation analysis to explore the relationship
between event severity (MAG and F) and
economic damages (PROPDMG, CROPDMG, and their total).
# Correlation analysis
clean_data <- na.omit(eco_dam)
numeric_clean_data <- clean_data %>%
  select(where(is.numeric))
correlation_matrix <- as.data.frame(cor(numeric_clean_data))
correlation_matrix
## MAG F PROPDMG CROPDMG total
## MAG 1.0000000000 -0.004578299 0.003010199 -0.0004843312 0.002795929
## F -0.0045782988 1.000000000 0.280954733 0.0537385761 0.281000503
## PROPDMG 0.0030101985 0.280954733 1.000000000 0.0889312107 0.979430285
## CROPDMG -0.0004843312 0.053738576 0.088931211 1.0000000000 0.288085256
## total 0.0027959289 0.281000503 0.979430285 0.2880852561 1.000000000
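The correlation matrix shows a weak positive correlation (about 0.28) between the
total economic damage and the F value, suggesting that events with a higher F rating
tend to cause higher economic damage. MAG shows essentially no correlation with
total damage (about 0.003). Because na.omit() drops rows with a missing F value,
these figures reflect only the records that have an F rating.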
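na.omit() applies listwise deletion: a missing value in any one column removes the row from every correlation. F is missing in 843,563 of the 902,297 rows according to the NA counts above, so the MAG correlations are computed on a small subset too. One possible alternative, sketched on made-up numbers (toy_sev is a hypothetical data frame), is to let cor() use all available pairs for each pair of columns via use = "pairwise.complete.obs":

```r
# Made-up severity and damage values; F is missing in half of the rows
toy_sev <- data.frame(
  MAG   = c(0, 50, 75, 100, 175, 200),
  F     = c(1, NA, 3, NA, NA, 2),
  total = c(10, 20, 30, 40, 70, 80)
)

# Listwise deletion: only rows 1, 3 and 6 (where F is present) feed every pair
cor_listwise <- cor(na.omit(toy_sev))

# Pairwise deletion: MAG vs total uses all six rows; pairs involving F use three
cor_pairwise <- cor(toy_sev, use = "pairwise.complete.obs")

print(round(cor_pairwise, 3))
```

The trade-off is that each cell of a pairwise matrix may rest on a different set of rows, so the cells are not directly comparable with one another.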
This analysis identified the types of severe weather events that have the greatest impact on population health and economic damage. Flash floods and tornadoes cause the most harm in terms of both health and economic consequences.