Synopsis

This analysis uses the NOAA Storm Database to assess the impact of severe weather events across the United States. It asks two questions: which event types are most harmful to population health, and which cause the greatest economic damage? Reported fatalities, injuries, property damage, and crop damage are summed by event type. Tornadoes cause by far the most combined fatalities and injuries (96,979), followed by excessive heat (8,428). Floods cause the largest economic losses (about 150 billion USD), followed by hurricanes/typhoons (about 72 billion USD). These findings can help public officials and emergency planners prioritize resources and preparedness efforts. All results are reproducible from the raw data file using the code shown.


Data Processing

# Load required packages
library(dplyr)
library(ggplot2)

# Load the raw data from the CSV file (expected in the working directory)
storm_data <- read.csv("repdata_data_StormData.csv")

# Inspect structure
str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
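
If the raw data are at hand only as a bzip2-compressed CSV (an assumption; the call above reads a plain .csv file), no separate decompression step should be needed, because read.csv() opens compressed files transparently. A minimal sketch, using a hypothetical two-row table written to a temporary file rather than the real database:

```r
# Hypothetical toy table standing in for the Storm Database
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  stringsAsFactors = FALSE
)

# Write the table as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and reads the file directly
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

The same call pattern, read.csv() with the path to the compressed file, would apply to the full data set; only the file name changes.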

Selecting relevant variables

storm_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES,
         PROPDMG, PROPDMGEXP,
         CROPDMG, CROPDMGEXP)

Converting damage exponent values

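# Map a damage exponent code to a numeric multiplier:
# K/k = thousand, M/m = million, B/b = billion.
# Any other value (including a blank code) falls back to a multiplier of 1.
convert_exp <- function(exp) {
  ifelse(exp %in% c("K", "k"), 1e3,
  ifelse(exp %in% c("M", "m"), 1e6,
  ifelse(exp %in% c("B", "b"), 1e9, 1)))
}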

storm_data$PROPDMG_MULT <- convert_exp(storm_data$PROPDMGEXP)
storm_data$CROPDMG_MULT <- convert_exp(storm_data$CROPDMGEXP)

storm_data$PROPDMG_TOTAL <- storm_data$PROPDMG * storm_data$PROPDMG_MULT
storm_data$CROPDMG_TOTAL <- storm_data$CROPDMG * storm_data$CROPDMG_MULT
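
To make the conversion concrete, the sketch below applies the same mapping to a handful of hypothetical values (not rows from the Storm Database). The function is restated so the block runs on its own. The last two rows show the fallback: a blank or unrecognized code leaves the damage figure unscaled.

```r
# Same mapping as convert_exp() above, restated so this block is self-contained
convert_exp <- function(exp) {
  ifelse(exp %in% c("K", "k"), 1e3,
  ifelse(exp %in% c("M", "m"), 1e6,
  ifelse(exp %in% c("B", "b"), 1e9, 1)))
}

# Hypothetical toy values chosen to exercise every branch
toy <- data.frame(
  DMG = c(25, 2.5, 1.2, 10, 3),
  EXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)

toy$MULT  <- convert_exp(toy$EXP)   # 1e3, 1e6, 1e9, 1, 1
toy$TOTAL <- toy$DMG * toy$MULT     # damage in dollars
print(toy)

# Counting the codes shows how many rows end up in the fallback branch
print(table(toy$EXP))
```

Running table() on the real PROPDMGEXP and CROPDMGEXP columns in the same way would show how many records rely on the fallback multiplier of 1.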

Aggregating health and economic impacts

health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(Total_Health = Fatalities + Injuries) %>%
  arrange(desc(Total_Health))

economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    Economic_Damage = sum(PROPDMG_TOTAL + CROPDMG_TOTAL, na.rm = TRUE)
  ) %>%
  arrange(desc(Economic_Damage))
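
The group-and-sum pattern used above can be cross-checked on a small scale without dplyr. The following sketch uses base R's aggregate() on a hypothetical five-row table; the column names mirror the real ones, but the values are made up for illustration.

```r
# Hypothetical toy records, not taken from the Storm Database
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 1, 0),
  INJURIES   = c(15, 3, 4, 0, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combine both measures and sort from most to least harmful
health$Total_Health <- health$FATALITIES + health$INJURIES
health <- health[order(-health$Total_Health), ]
print(health)
```

Grouping is by the exact EVTYPE string, so two spellings of the same kind of event would be totalled separately, both here and in the dplyr pipeline.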

Results

Events most harmful to population health

top_health <- head(health_impact, 10)

ggplot(top_health, aes(x = reorder(EVTYPE, Total_Health),
                       y = Total_Health)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Weather Events Harmful to Population Health",
       x = "Event Type",
       y = "Total Fatalities and Injuries")

Figure 1: Tornadoes account for by far the highest combined number of fatalities and injuries, followed at a distance by excessive heat, making them the event types most harmful to population health.


Events with greatest economic consequences

top_economic <- head(economic_impact, 10)

ggplot(top_economic, aes(x = reorder(EVTYPE, Economic_Damage),
                         y = Economic_Damage / 1e9)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type",
       y = "Economic Damage (Billion USD)")

Figure 2: Floods, hurricanes/typhoons, tornadoes, and storm surges account for the largest economic losses, with floods alone causing about 150 billion USD in combined property and crop damage.


Summary tables

head(health_impact, 5)
## # A tibble: 5 × 4
##   EVTYPE         Fatalities Injuries Total_Health
##   <chr>               <dbl>    <dbl>        <dbl>
## 1 TORNADO              5633    91346        96979
## 2 EXCESSIVE HEAT       1903     6525         8428
## 3 TSTM WIND             504     6957         7461
## 4 FLOOD                 470     6789         7259
## 5 LIGHTNING             816     5230         6046
head(economic_impact, 5)
## # A tibble: 5 × 2
##   EVTYPE            Economic_Damage
##   <chr>                       <dbl>
## 1 FLOOD               150319678257 
## 2 HURRICANE/TYPHOON    71913712800 
## 3 TORNADO              57352114049.
## 4 STORM SURGE          43323541000 
## 5 HAIL                 18758221521.
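
The figures in the two tables are easier to compare after a little arithmetic. The sketch below re-enters the printed values by hand (so the tornado and hail damage totals are rounded to whole dollars, as displayed), checks that each health total equals fatalities plus injuries, and expresses the results as shares and in billions of USD. The shares are relative to these five event types only, not to all events in the database.

```r
# Values transcribed from the health summary table above
health_top5 <- data.frame(
  EVTYPE       = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  Fatalities   = c(5633, 1903, 504, 470, 816),
  Injuries     = c(91346, 6525, 6957, 6789, 5230),
  Total_Health = c(96979, 8428, 7461, 7259, 6046),
  stringsAsFactors = FALSE
)

# Consistency check: the total is the sum of the two components in every row
stopifnot(all(health_top5$Fatalities + health_top5$Injuries ==
              health_top5$Total_Health))

# Share of each event type within the top five (percent)
top5_sum <- sum(health_top5$Total_Health)
health_top5$Share_Pct <- round(100 * health_top5$Total_Health / top5_sum, 1)
print(health_top5)

# Values transcribed from the economic summary table above (as displayed)
economic_top5 <- data.frame(
  EVTYPE          = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL"),
  Economic_Damage = c(150319678257, 71913712800, 57352114049,
                      43323541000, 18758221521),
  stringsAsFactors = FALSE
)

# Convert dollars to billions of dollars for readability
economic_top5$Billion_USD <- round(economic_top5$Economic_Damage / 1e9, 2)
print(economic_top5)
```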

Conclusion

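The analysis shows that tornadoes are by far the most harmful events for population health, with 5,633 fatalities and 91,346 injuries recorded, more than ten times the combined total for excessive heat, the next event type. Floods have the greatest economic impact, at roughly 150 billion USD in combined property and crop damage, followed by hurricanes/typhoons (about 72 billion USD) and tornadoes (about 57 billion USD). Because the event types that harm the most people differ from those that cause the most damage, disaster preparedness strategies need to be targeted to each kind of risk. Understanding both the human and the economic consequences allows decision-makers to allocate resources more effectively and mitigate future risks.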