1. Introduction

This report is the submission for the peer-graded assignment Course Project 2.

2. Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, focusing on the impact of severe weather events on population health and the economy. We identify the most harmful event types in terms of fatalities and injuries, and those with the greatest economic consequences.

3. Data Pre-Processing

a. Set working directory

setwd("C:/Users/baret/Desktop/R Programming Storm Data")  # machine-specific path; point it at the folder that holds the CSV

b. Load the necessary libraries

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'tibble' was built under R version 4.3.3
## Warning: package 'tidyr' was built under R version 4.3.3
## Warning: package 'readr' was built under R version 4.3.3
## Warning: package 'purrr' was built under R version 4.3.3
## Warning: package 'dplyr' was built under R version 4.3.3
## Warning: package 'stringr' was built under R version 4.3.3
## Warning: package 'forcats' was built under R version 4.3.3
## Warning: package 'lubridate' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)

4. Data Processing

a. Load the data

storm_data <- read_csv("repdata-data-StormData.csv")
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
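By default `read_csv()` treats empty fields as missing (its `na` argument defaults to `c("", "NA")`), so a blank `PROPDMGEXP` or `CROPDMGEXP` arrives as `NA` rather than as an empty string. Code that maps blank exponent codes therefore has to test for `NA`, not for `''`. A small check on literal CSV text (toy values, assuming readr >= 2.0 for the `I()` wrapper) shows this:

```r
library(readr)

# Literal CSV text wrapped in I(); the second record leaves PROPDMGEXP empty.
toy <- read_csv(I("PROPDMG,PROPDMGEXP\n25,K\n0,\n"), show_col_types = FALSE)
toy$PROPDMGEXP                            # "K" NA
# A membership test against the empty string never matches the blank code:
toy$PROPDMGEXP %in% c("", "-", "?", "+")  # FALSE FALSE
```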

b. Convert date columns to Date type

storm_data <- storm_data %>%
  mutate(
    BGN_DATE = as.Date(BGN_DATE, format="%m/%d/%Y %H:%M:%S"),
    END_DATE = as.Date(END_DATE, format="%m/%d/%Y %H:%M:%S")
  )
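The format string mirrors the "month/day/year hour:min:sec" text layout of the raw date columns. A quick check on made-up strings in the same layout shows that single-digit months and hours parse correctly and that a missing `END_DATE` simply becomes `NA`:

```r
# Made-up values in the same layout as BGN_DATE, plus a missing END_DATE.
raw_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", NA)
parsed <- as.Date(raw_dates, format = "%m/%d/%Y %H:%M:%S")
parsed
# [1] "1950-04-18" "1951-11-15" NA
```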

c. Convert necessary columns to factors

storm_data <- storm_data %>%
  mutate(across(c(STATE, COUNTYNAME, EVTYPE, BGN_AZI, BGN_LOCATI, END_AZI, END_LOCATI, PROPDMGEXP, CROPDMGEXP), as.factor))
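Turning `PROPDMGEXP` and `CROPDMGEXP` into factors has a side effect for the exponent step: `as.numeric()` applied to a factor returns the positions of its levels, not the digits written in the data, so numeric codes must pass through `as.character()` first. A toy check:

```r
# Toy exponent codes stored as a factor, as in the step above.
codes <- factor(c("5", "K", "0"))
levels(codes)                                      # "0" "5" "K"
as.numeric(codes)                                  # 2 3 1 (level positions, not values)
suppressWarnings(as.numeric(as.character(codes)))  # 5 NA 0
```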

d. Ensure numeric columns are numeric

storm_data <- storm_data %>%
  mutate(across(c(FATALITIES, INJURIES, PROPDMG, CROPDMG), as.numeric))

e. Handle missing values in the health and damage columns

storm_data <- storm_data %>%
  drop_na(FATALITIES, INJURIES, PROPDMG, CROPDMG)
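To see how many values `drop_na()` is about to discard, the missing entries can be counted per column first. A minimal sketch with a hypothetical helper `count_missing()` and toy data:

```r
# Hypothetical helper: number of NA values in each of the named columns.
count_missing <- function(df, cols) {
  colSums(is.na(df[cols]))
}

toy <- data.frame(FATALITIES = c(0, NA, 2), INJURIES = c(1, 3, NA), PROPDMG = c(0, 0, 5))
res <- count_missing(toy, c("FATALITIES", "INJURIES", "PROPDMG"))
res  # FATALITIES = 1, INJURIES = 1, PROPDMG = 0
```

On the real data this would be called as `count_missing(storm_data, c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG"))` before the `drop_na()` step.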

f. Interpreting PROPDMGEXP and CROPDMGEXP

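# Map the exponent codes to multipliers. K, M and B (thousands, millions,
# billions) are the documented codes; treating H as 100 and a digit d as 10^d
# is an assumption. Blank codes (read as NA by read_csv), "-", "?" and "+" map to 1.
convert_exp <- function(exp) {
  exp <- toupper(as.character(exp))  # factor -> character, so digits are not read as level codes
  case_when(
    exp == "H" ~ 1e2,
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    exp %in% as.character(0:9) ~ 10^suppressWarnings(as.numeric(exp)),
    TRUE ~ 1  # NA (blank), "-", "?", "+"
  )
}

storm_data <- storm_data %>%
  mutate(
    PROPDMG = PROPDMG * convert_exp(PROPDMGEXP),
    CROPDMG = CROPDMG * convert_exp(CROPDMGEXP)
  ) %>%
  select(-PROPDMGEXP, -CROPDMGEXP)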

5. Results

a. Aggregate data by event type for population health impact

health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(
    total_fatalities = sum(FATALITIES),
    total_injuries = sum(INJURIES)
  ) %>%
  arrange(desc(total_fatalities), desc(total_injuries))

b. Top 10 events by fatalities

top_10_fatalities <- health_impact %>% slice_max(total_fatalities, n = 10)
# By fatalities, the most harmful event types are tornadoes, excessive heat and flash floods.
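Both `top_n()` and its successor `slice_max()` keep tied rows by default, so a "top 10" table can hold more than ten event types. A toy example shows the difference that `with_ties = FALSE` makes:

```r
library(dplyr)

# Toy totals: three event types tie for the largest value.
toy <- tibble(EVTYPE = c("A", "B", "C", "D"), total_fatalities = c(5, 5, 5, 1))

nrow(top_n(toy, 2, total_fatalities))                             # 3: ties are kept
nrow(slice_max(toy, total_fatalities, n = 2))                     # 3: ties kept by default
nrow(slice_max(toy, total_fatalities, n = 2, with_ties = FALSE))  # 2: exactly n rows
```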

c. Plotting the top 10 events by fatalities
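One way to draw this chart is a horizontal bar chart with ggplot2. The sketch below wraps it in a hypothetical helper, `plot_top_events()`, that expects an `EVTYPE` column plus the numeric column named in `value_col`; the styling choices are illustrative:

```r
library(ggplot2)

# Hypothetical helper: one bar per event type, sorted by the chosen value.
plot_top_events <- function(df, value_col, title, y_label) {
  ggplot(df, aes(x = reorder(EVTYPE, .data[[value_col]]), y = .data[[value_col]])) +
    geom_col(fill = "steelblue") +  # bar height is the summary value itself
    coord_flip() +                  # long event-type names read better on the vertical axis
    labs(title = title, x = "Event type", y = y_label)
}
```

For this step it would be called as `plot_top_events(top_10_fatalities, "total_fatalities", "Top 10 event types by fatalities", "Fatalities")`.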

d. Aggregate data by event type for economic impact

economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(
    total_property_damage = sum(PROPDMG),
    total_crop_damage = sum(CROPDMG)
  ) %>%
  arrange(desc(total_property_damage + total_crop_damage))
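If any `PROPDMG` or `CROPDMG` value is still `NA` after the exponent step, `sum()` returns `NA` for that event type, and `arrange()` always sorts missing values to the end, so the affected event types silently drop out of the ranking. The illustrative numbers below show both effects in base R:

```r
# A single NA makes the whole group's sum NA ...
sum(c(2.5e6, NA))                # NA
sum(c(2.5e6, NA), na.rm = TRUE)  # 2500000
# ... and NA totals are placed last when sorting in decreasing order.
totals <- c(FLOOD = NA, HAIL = 3e6, TORNADO = 5e6)
names(totals)[order(totals, decreasing = TRUE)]  # "TORNADO" "HAIL" "FLOOD"
```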

e. Top 10 events by total economic damage (property + crop damage)

top_10_economic_damage <- economic_impact %>% slice_max(total_property_damage + total_crop_damage, n = 10)
# The event types with the greatest combined property and crop damage are "Tornadoes, TSTM Wind, Hail"; "High Winds/Cold"; "Hurricane Opal/High Winds"; and "Winter Storm High Winds".

f. Plotting the top 10 events by economic damage
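Because the economic totals have two components, a stacked bar chart can show how much of each event type's damage is property versus crop damage. A minimal sketch, assuming a hypothetical helper `plot_damage_breakdown()` and amounts in dollars after the exponent step:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: stacked property/crop damage bars per event type.
plot_damage_breakdown <- function(df) {
  # Smallest combined damage first, so the largest bar ends up on top after coord_flip().
  ordered_types <- as.character(df$EVTYPE)[order(df$total_property_damage + df$total_crop_damage)]
  long <- df %>%
    mutate(EVTYPE = factor(as.character(EVTYPE), levels = unique(ordered_types))) %>%
    pivot_longer(
      cols = c(total_property_damage, total_crop_damage),
      names_to = "damage_type",
      values_to = "damage_usd"
    )
  ggplot(long, aes(x = EVTYPE, y = damage_usd / 1e9, fill = damage_type)) +
    geom_col() +
    coord_flip() +
    labs(title = "Top event types by economic damage",
         x = "Event type", y = "Damage (billions of USD)", fill = "Damage type")
}
```

For this step it would be called as `plot_damage_breakdown(top_10_economic_damage)`.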

6. Conclusion

This analysis has highlighted the weather events with the most significant impact on population health and on the economy in the United States. Tornadoes, excessive heat, and flash floods are among the events most harmful to human health, while floods, hurricanes, and tornadoes are the costliest in terms of economic damage. These findings can inform policy and resource allocation to mitigate the effects of severe weather events.