Synopsis

This analysis explores the NOAA Storm Database to assess the impact of severe weather events on population health and the economy. Two primary questions are addressed:

  1. Which types of events are most harmful to population health (based on fatalities and injuries)?
  2. Which types of events have the greatest economic consequences (considering property and crop damage)?

The data was processed by checking for missing values, creating summary variables (total health impact and total economic damage), and sorting the records to identify the most harmful events. The results are visualized as bar plots of the top records for both health and economic impact.

Data Processing

The data was loaded from the CSV file "data/repdata-data-StormData.csv" and processed using R. The dataset contains information on various weather events, including their effects on health and the economy.

# Load the data (base R only; no extra library is needed for this step)
weather_data <- read.csv("data/repdata-data-StormData.csv")

# Check the structure and missing values
str(weather_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
colSums(is.na(weather_data))
##    STATE__   BGN_DATE   BGN_TIME  TIME_ZONE     COUNTY COUNTYNAME      STATE 
##          0          0          0          0          0          0          0 
##     EVTYPE  BGN_RANGE    BGN_AZI BGN_LOCATI   END_DATE   END_TIME COUNTY_END 
##          0          0          0          0          0          0          0 
## COUNTYENDN  END_RANGE    END_AZI END_LOCATI     LENGTH      WIDTH          F 
##     902297          0          0          0          0          0     843563 
##        MAG FATALITIES   INJURIES    PROPDMG PROPDMGEXP    CROPDMG CROPDMGEXP 
##          0          0          0          0          0          0          0 
##        WFO STATEOFFIC  ZONENAMES   LATITUDE  LONGITUDE LATITUDE_E LONGITUDE_ 
##          0          0          0         47          0         40          0 
##    REMARKS     REFNUM 
##          0          0
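
Note that is.na() does not flag empty strings, and several character columns in the str() output above (for example END_DATE and CROPDMGEXP) contain "" values. A minimal sketch of counting both kinds of missingness follows. It uses a small made-up data frame (toy) in place of the real data, and the helper name count_missing is hypothetical.

```r
# Toy stand-in for weather_data (values are made up for illustration)
toy <- data.frame(
  EVTYPE   = c("TORNADO", "HAIL", "FLOOD"),
  END_DATE = c("", "4/18/1950 0:00:00", ""),
  F        = c(3L, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count NA values plus empty strings in each column
count_missing <- function(df) {
  vapply(df, function(col) {
    # Empty strings only make sense for character columns
    empty <- is.character(col) & !is.na(col) & col == ""
    sum(is.na(col) | empty)
  }, integer(1))
}

missing_counts <- count_missing(toy)
missing_counts
```

On the real data, count_missing(weather_data) would be the analogous call; columns that report zero NA values may still be mostly empty.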

Data Transformation

  1. Health Impact Analysis:
    • The columns FATALITIES and INJURIES were selected to analyze the health impacts of each event. A new column, FI, was created by adding fatalities and injuries to provide a total measure of health impact.
    • The EVTYPE column was converted into a factor to aid in visualization and analysis.
  2. Economic Impact Analysis:
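    • The columns PROPDMG (property damage) and CROPDMG (crop damage) were selected, and a new column, total, was created by adding these two values to assess the total economic impact of each event. The values are added as recorded; the magnitude codes in PROPDMGEXP and CROPDMGEXP are not applied, so total is not a dollar amount.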
# Health Impact Analysis
health_related <- weather_data[, c('FATALITIES', 'INJURIES', 'EVTYPE')]
health_related$FI <- health_related$FATALITIES + health_related$INJURIES
health_related$EVTYPE <- as.factor(health_related$EVTYPE)

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Sorting by total health impact (FI)
filtered_FI <- health_related %>%
    arrange(desc(FI)) %>%
    slice(1:10)

# Sorting by fatalities and by injuries separately
health_related_sorted_F <- health_related %>%
    arrange(desc(FATALITIES)) %>%
    slice(1:10)
health_related_sorted_INJ <- health_related %>%
    arrange(desc(INJURIES)) %>%
    slice(1:10)
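
The code above ranks individual storm records, so one event type can appear several times in a top-10 list. To rank event types instead, the counts can be summed per EVTYPE before sorting. The following is a minimal sketch of that alternative, not the method used for the figures in this report. The toy data frame and the names toy_health and health_by_type are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for health_related (values are made up for illustration)
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLASH FLOOD", "HEAT", "HEAT"),
  FATALITIES = c(1, 0, 2, 5, 3),
  INJURIES   = c(15, 2, 0, 12, 0)
)

# Sum fatalities and injuries per event type, then rank by the combined total
health_by_type <- toy_health %>%
  group_by(EVTYPE) %>%
  summarise(
    FATALITIES = sum(FATALITIES),
    INJURIES   = sum(INJURIES),
    .groups    = "drop"
  ) %>%
  mutate(FI = FATALITIES + INJURIES) %>%
  arrange(desc(FI)) %>%
  slice_head(n = 10)

health_by_type
```

Replacing toy_health with health_related would give one row per event type, which answers the "which types of events" question more directly than a ranking of single records.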

Results

Health Impact

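The 10 individual records with the highest total health impact (fatalities + injuries), the highest fatalities, and the highest injuries are presented below. Each row is a single recorded event, so an event type can appear more than once.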

# Create plots to visualize the health impact
library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
# Plot for total health impact (FI)
plot1 <- ggplot(data = filtered_FI, aes(x = FI, y = EVTYPE, fill = EVTYPE)) +
    geom_bar(stat = 'identity') +
    theme_classic() +
    scale_fill_brewer(palette = 'Set3') +
    labs(title = "Top 10 Events by Total Health Impact (Fatalities + Injuries)", x = "Total Impact", y = "Event Type")

# Plot for fatalities
plot2 <- ggplot(data = health_related_sorted_F, aes(x = EVTYPE, y = FATALITIES, fill = EVTYPE)) +
    geom_bar(stat = 'identity') +
    scale_fill_brewer(palette = 'Set3') +
    theme_classic() +
    labs(title = "Top 10 Events by Fatalities", x = "Event Type", y = "Fatalities")

# Plot for injuries
plot3 <- ggplot(data = health_related_sorted_INJ, aes(x = EVTYPE, y = INJURIES, fill = EVTYPE)) +
    geom_bar(stat = 'identity') +
    scale_fill_brewer(palette = 'Set3') +
    theme_classic() +
    labs(title = "Top 10 Events by Injuries", x = "Event Type", y = "Injuries")

# Arrange plots in one grid
grid.arrange(plot1, plot2, plot3, ncol = 1, nrow = 3)

Figure 1: Top 10 weather event records by total health impact (fatalities + injuries), by fatalities, and by injuries. Flash floods and tornadoes are among the most harmful.

Economic Impact

The 20 records with the greatest total economic damage are presented below, together with separate plots of their crop and property damage.

# Economic Impact Analysis
eco_dam <- weather_data[, c('MAG', 'F', 'PROPDMG', 'CROPDMG', 'EVTYPE')]
eco_dam$total <- eco_dam$PROPDMG + eco_dam$CROPDMG

# Sorting by total damage
eco_dam_sorted <- eco_dam %>%
    arrange(desc(total)) %>%
    slice(1:20)

# Renaming columns for clarity
colnames(eco_dam_sorted) <- c('Mag', 'F_value', 'PROPDMG', 'CROPDMG', 'EVTYPE', 'Total')

# Plot for total economic damage
pl1 <- ggplot(data = eco_dam_sorted, aes(x = Total, y = EVTYPE, fill = EVTYPE)) +
    geom_bar(stat = 'identity') +
    theme_bw(base_family = 'Times') +  # complete theme; a preceding theme_classic() would be overridden
    scale_fill_brewer(palette = 'YlOrRd') +
    labs(title = "Top 20 Events by Total Economic Damage", x = "Total Damage", y = "Event Type")

# Plot for crop damage
pl2 <- ggplot(data = eco_dam_sorted, aes(x = CROPDMG, y = EVTYPE, fill = EVTYPE)) +
    geom_bar(stat = 'identity') +
    theme_bw(base_family = 'Times') +  # complete theme; a preceding theme_classic() would be overridden
    scale_fill_brewer(palette = 'BuGn') +
    labs(title = "Crop Damages", x = "Crop Damage", y = "Event Type")

# Plot for property damage
pl3 <- ggplot(data = eco_dam_sorted, aes(x = PROPDMG, y = EVTYPE, fill = EVTYPE)) +
    geom_bar(stat = 'identity') +
    theme_bw(base_family = 'Times') +  # complete theme; a preceding theme_classic() would be overridden
    scale_fill_brewer(palette = 'YlGnBu') +
    labs(title = "Property Damages", x = "Property Damage", y = "Event Type")

# Arrange plots in one grid
grid.arrange(pl1, pl2, pl3, ncol = 1, nrow = 3)

Figure 2: Top 20 weather event records by total economic damage (property + crop), with separate panels for crop damage and property damage. Flash floods cause the most economic damage overall.
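
The totals above add PROPDMG and CROPDMG as recorded. In this dataset the PROPDMGEXP and CROPDMGEXP columns carry a magnitude code for each damage figure (the str() output shows "K" for the first records). A minimal sketch of converting to dollars follows, assuming the conventional reading K = thousands, M = millions, B = billions and treating blank or unrecognized codes as a multiplier of 1. That mapping, the helper name exp_multiplier, and the toy data frame toy_dam are assumptions, not part of the original analysis.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumed mapping: K = 1e3, M = 1e6, B = 1e9; anything else counts as 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])  # unknown or blank codes give NA
  mult[is.na(mult)] <- 1
  mult
}

# Toy stand-in for the damage columns (values are made up for illustration)
toy_dam <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("", "k", "K"),
  stringsAsFactors = FALSE
)

# Scale each damage figure by its multiplier, then add the two components
toy_dam$prop_usd  <- toy_dam$PROPDMG * exp_multiplier(toy_dam$PROPDMGEXP)
toy_dam$crop_usd  <- toy_dam$CROPDMG * exp_multiplier(toy_dam$CROPDMGEXP)
toy_dam$total_usd <- toy_dam$prop_usd + toy_dam$crop_usd

toy_dam[, c("EVTYPE", "prop_usd", "crop_usd", "total_usd")]
```

In the toy data the FLOOD row has a smaller PROPDMG value than the TORNADO row but a far larger dollar total once the "M" code is applied, which is why rankings on unscaled values can differ from rankings on dollar amounts.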

Correlation Analysis

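We also performed a correlation analysis to explore the relationship between event severity (MAG and F) and economic damage. Rows with missing values are dropped first; because F is missing for 843,563 of the 902,297 records, the correlations are based on the remaining 58,734 rows.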

# Correlation analysis
clean_data <- na.omit(eco_dam)
numeric_clean_data <- clean_data %>%
    select(where(is.numeric))

correlation_matrix <- as.data.frame(cor(numeric_clean_data))
correlation_matrix
##                   MAG            F     PROPDMG       CROPDMG       total
## MAG      1.0000000000 -0.004578299 0.003010199 -0.0004843312 0.002795929
## F       -0.0045782988  1.000000000 0.280954733  0.0537385761 0.281000503
## PROPDMG  0.0030101985  0.280954733 1.000000000  0.0889312107 0.979430285
## CROPDMG -0.0004843312  0.053738576 0.088931211  1.0000000000 0.288085256
## total    0.0027959289  0.281000503 0.979430285  0.2880852561 1.000000000

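The correlation matrix shows a weak positive correlation between total economic damage and the F value (r ≈ 0.28), suggesting that more severe events tend to cause somewhat higher economic damage. MAG shows essentially no linear relationship with any damage measure (|r| < 0.01). The very high correlation between total and PROPDMG (r ≈ 0.98) is expected, since PROPDMG is one of the two components of total.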
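
Since na.omit() removes any row with a missing value in any column, it can help to check how many rows survive before interpreting the coefficients. The sketch below illustrates this on a small made-up data frame and adds a Spearman rank correlation as one possible check for an ordinal variable such as F. The toy values, the names toy_cor and n_kept, and the choice of Spearman are assumptions, not part of the original analysis.

```r
# Toy stand-in for eco_dam (values are made up for illustration)
toy_cor <- data.frame(
  F     = c(0, 1, NA, 2, 3, NA, 4),
  total = c(1, 5, 100, 4, 20, 50, 60)
)

# na.omit() keeps only rows with no missing value in any column
complete_rows <- na.omit(toy_cor)
n_kept <- nrow(complete_rows)
n_kept

# Pearson measures linear association; Spearman uses ranks only
pearson  <- cor(complete_rows$F, complete_rows$total)
spearman <- cor(complete_rows$F, complete_rows$total, method = "spearman")
c(pearson = pearson, spearman = spearman)
```

Calling cor(toy_cor, use = "complete.obs") gives the same Pearson value without the explicit na.omit() step.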

Conclusion

This analysis identified the severe weather events with the greatest impact on population health and the economy. Flash floods and tornadoes cause the most harm on both counts.

```