Analysis of NOAA storm database

2025-03-04

Synopsis

Natural disasters disrupt communities and change lives, affecting both public health and the economy. To understand their impact on the civilian population, the information must be made easier to digest, which graphs can do.

This project uses the National Oceanic and Atmospheric Administration’s (NOAA) storm database, which records the date, time, place, type, state, county, fatalities, injuries, and property damage for each event, along with other data. From it we can compute and graph the total fatalities, total injuries, and economic damage these disasters cause.

This analysis addresses two questions. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? Across the United States, which types of events have the greatest economic consequences?

Our findings show that tornadoes have the greatest total health impact and are also the most economically impactful type of event.

Data Processing

Load the necessary packages

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Load the dataset (assuming it’s a CSV file)

data <- read.csv("/home/rstudio/Reproducible Research/week2/repdata_data_StormData1.csv", stringsAsFactors = FALSE)

Summarize health impact (fatalities + injuries)

I created three summary variables per event type: total fatalities, total injuries, and their sum (Total Health Impact). The result is sorted from greatest to least Total Health Impact.

health_impact <- data %>%
  group_by(EVTYPE) %>%
  summarise(Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
            Total_Injuries = sum(INJURIES, na.rm = TRUE),
            Total_Health_Impact = Total_Fatalities + Total_Injuries,
            .groups = "drop") %>%
  arrange(desc(Total_Health_Impact)) 
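To make the grouping step easier to check by hand, here is a minimal sketch that runs the same pipeline on a few made-up rows. The data frame `toy` and its values are illustrative assumptions, not records from the storm database.

```r
library(dplyr)

# Made-up rows for illustration only (not from the NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "HEAT"),
  FATALITIES = c(2, 5, 1, NA),
  INJURIES   = c(10, 0, 4, 3),
  stringsAsFactors = FALSE
)

toy_health <- toy %>%
  group_by(EVTYPE) %>%
  # na.rm = TRUE skips missing counts instead of turning the sum into NA
  summarise(Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
            Total_Injuries = sum(INJURIES, na.rm = TRUE),
            # summarise() can reuse the columns created just above
            Total_Health_Impact = Total_Fatalities + Total_Injuries,
            .groups = "drop") %>%
  arrange(desc(Total_Health_Impact))

print(toy_health)
```

In this toy case TORNADO totals 3 fatalities and 14 injuries (17 combined), and HEAT totals 5 fatalities and 3 injuries (8 combined), so TORNADO sorts first.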

Summarize economic impact (property + crop damage)

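I created three summary variables per event type: one for property damage, one for crop damage, and one that combines both (Total Economic Impact), sorted from greatest to least Total Economic Impact. Note that PROPDMG and CROPDMG are summed as recorded, without applying the magnitude codes in the PROPDMGEXP and CROPDMGEXP columns, so the totals are not dollar amounts.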

economic_impact <- data %>%
  group_by(EVTYPE) %>%
  summarise(Total_Property_Damage = sum(PROPDMG, na.rm = TRUE),
            Total_Crop_Damage = sum(CROPDMG, na.rm = TRUE),
            Total_Economic_Impact = Total_Property_Damage + Total_Crop_Damage,
            .groups = "drop") %>%
  arrange(desc(Total_Economic_Impact))
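The storm database stores a magnitude code next to each damage figure (PROPDMGEXP for PROPDMG, CROPDMGEXP for CROPDMG). The sketch below shows one possible way to convert the figures to dollars before summing. The helper `exp_multiplier` and the `toy` rows are hypothetical. The mapping covers only K, M, and B, and treating every other code as a multiplier of 1 is a simplifying assumption. It is not the method used for the results in this report.

```r
library(dplyr)

# Hypothetical helper: maps a magnitude code to a numeric multiplier.
# Assumption: K = thousands, M = millions, B = billions; any other code
# (blank, digits, symbols) is treated as 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows for illustration only (not from the NOAA data)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 500),
  PROPDMGEXP = c("B", "M", "K"),
  CROPDMG    = c(0, 5, 1),
  CROPDMGEXP = c("", "K", "M"),
  stringsAsFactors = FALSE
)

toy_economic <- toy %>%
  mutate(Property_Dollars = PROPDMG * exp_multiplier(PROPDMGEXP),
         Crop_Dollars = CROPDMG * exp_multiplier(CROPDMGEXP)) %>%
  group_by(EVTYPE) %>%
  summarise(Total_Economic_Impact = sum(Property_Dollars + Crop_Dollars),
            .groups = "drop") %>%
  arrange(desc(Total_Economic_Impact))

print(toy_economic)
```

Applying the same `mutate()` step to the full dataset before `group_by(EVTYPE)` could change the ranking, because a single event recorded in billions can outweigh many events recorded in thousands.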

Results

Top 10 events causing the most harm to population health

top_health <- health_impact[1:10, ]
top_health
## # A tibble: 10 x 4
##    EVTYPE            Total_Fatalities Total_Injuries Total_Health_Impact
##    <chr>                        <dbl>          <dbl>               <dbl>
##  1 TORNADO                       5633          91346               96979
##  2 EXCESSIVE HEAT                1903           6525                8428
##  3 TSTM WIND                      504           6957                7461
##  4 FLOOD                          470           6789                7259
##  5 LIGHTNING                      816           5230                6046
##  6 HEAT                           937           2100                3037
##  7 FLASH FLOOD                    978           1777                2755
##  8 ICE STORM                       89           1975                2064
##  9 THUNDERSTORM WIND              133           1488                1621
## 10 WINTER STORM                   206           1321                1527
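The table lists TSTM WIND and THUNDERSTORM WIND as separate event types, so their counts are split across two rows. A minimal sketch of one way to merge such labels before grouping is shown below. The function `normalize_evtype` is hypothetical, it handles only the variants named in its comments, and it was not applied to the results in this report.

```r
# Hypothetical helper: standardizes a few EVTYPE spellings.
# Assumption: only the variants handled below are merged.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                               # unify case, trim ends
  x <- gsub("\\s+", " ", x)                             # collapse repeated spaces
  x <- sub("^TSTM WIND$", "THUNDERSTORM WIND", x)       # expand abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # drop plural
  x
}

labels <- c(" tstm wind", "THUNDERSTORM WINDS", "Thunderstorm  Wind", "HAIL")
print(normalize_evtype(labels))
```

With the dataset loaded, the helper could be applied as `data$EVTYPE <- normalize_evtype(data$EVTYPE)` before the `group_by(EVTYPE)` steps.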

Plot for population fatality impact

ggplot(top_health, aes(x = reorder(EVTYPE, Total_Fatalities), y = Total_Fatalities, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Top 10 Events Most Fatal to Population Health",
       x = "Event Type",
       y = "Total Fatalities") +
  guides(fill = "none")

Plot for population injury impact

ggplot(top_health, aes(x = reorder(EVTYPE, Total_Injuries), y = Total_Injuries, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Top 10 Events Most Injurious to Population Health",
       x = "Event Type",
       y = "Total Injuries") +
  guides(fill = "none")

Top 10 events causing the greatest economic impact

top_economic <- economic_impact[1:10, ]
top_economic
## # A tibble: 10 x 4
##    EVTYPE            Total_Property_Damage Total_Crop_Dama… Total_Economic_Impa…
##    <chr>                             <dbl>            <dbl>                <dbl>
##  1 TORNADO                        3212258.          100019.             3312277.
##  2 FLASH FLOOD                    1420125.          179200.             1599325.
##  3 TSTM WIND                      1335966.          109203.             1445168.
##  4 HAIL                            688693.          579596.             1268290.
##  5 FLOOD                           899938.          168038.             1067976.
##  6 THUNDERSTORM WIND               876844.           66791.              943636.
##  7 LIGHTNING                       603352.            3581.              606932.
##  8 THUNDERSTORM WIN…               446293.           18685.              464978.
##  9 HIGH WIND                       324732.           17283.              342015.
## 10 WINTER STORM                    132721.            1979.              134700.

Plot for economic impact

ggplot(top_economic, aes(x = reorder(EVTYPE, Total_Economic_Impact), y = Total_Economic_Impact, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Top 10 Events with Greatest Economic Consequences",
       x = "Event Type",
       y = "Total Property + Crop Damage (unscaled PROPDMG + CROPDMG)") +
  guides(fill = "none")