Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database - Health and Economic Impacts

Synopsis

This Reproducible Research project focuses on the impact of storms and other severe weather events, which can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This dataset tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The analysis is meant to give readers the insight they need to make informed decisions about preventive measures that can protect facilities and resources.

Loading libraries

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Data Processing

The raw dataset is a compressed .csv.bz2 file and contains data on weather event types (EVTYPE), human impacts (FATALITIES, INJURIES), and economic impacts (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP).

storm_data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
head(storm_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
names(storm_data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
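Only seven of the 37 columns listed above are used in the analysis, so the data frame could be narrowed before any further processing. The following is a minimal sketch of that idea, not a step from the original analysis: `keep_cols`, `toy`, and `toy_small` are hypothetical names, and the toy rows simply mirror the first two rows of the `head(storm_data)` output.

```r
library(dplyr)

# Columns named in the Data Processing description
keep_cols <- c("EVTYPE", "FATALITIES", "INJURIES",
               "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Toy stand-in for storm_data with one extra column (STATE);
# values mirror the first two rows shown by head(storm_data)
toy <- data.frame(
  STATE      = c("AL", "AL"),
  EVTYPE     = c("TORNADO", "TORNADO"),
  FATALITIES = c(0, 0),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25, 2.5),
  PROPDMGEXP = c("K", "K"),
  CROPDMG    = c(0, 0),
  CROPDMGEXP = c("", "")
)

# Keep only the columns of interest; all_of() errors if any is missing
toy_small <- toy %>% select(all_of(keep_cols))
print(names(toy_small))
```

On the real data the same call would be `storm_data %>% select(all_of(keep_cols))`.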

Data Transformation and Analysis

To evaluate which types of weather events are most harmful to population health and which have the greatest economic consequences, the following data transformation and analysis steps were performed:

Severe Weather Events Most Harmful to Population Health

To identify which types of severe weather events were most harmful to population health, the data was processed as follows:

Grouping by Event Type: The dataset was grouped by the EVTYPE variable, which categorizes different storm and weather events (e.g., tornado, flood, lightning).

Summarizing Fatalities and Injuries: For each event type, the total number of fatalities (FATALITIES) and injuries (INJURIES) was computed using sum(…, na.rm = TRUE) to ensure missing values did not interfere with the calculations.

Calculating Total Health Impact: A new variable, Total, was created by adding the total fatalities and injuries for each event type. This provided a unified measure of the overall impact of each weather event on population health.

Sorting and Selecting Top Events: The data was arranged in descending order based on the Total health impact. The top_n(10, Total) function was used to retain only the top 10 most harmful event types.

Visualization Preparation: The resulting dataset (health_impact) was used to create a horizontal bar chart to visually display and compare the top 10 most harmful events.
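The steps above can be sketched on a few toy rows. Note that `top_n()` has been superseded in dplyr 1.0.0 and later by `slice_max()`, which this sketch uses instead; that swap is a suggestion, not what the original analysis ran. The toy values are invented purely for illustration and are not NOAA figures.

```r
library(dplyr)

# Toy rows, invented purely for illustration (not NOAA figures)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 3, 4, NA, 0),
  INJURIES   = c(10, 0, 5, 1, 2, 0)
)

toy_health <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),  # NA is skipped, not propagated
    Injuries   = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(Total = Fatalities + Injuries) %>%
  slice_max(Total, n = 2)  # keeps the 2 largest totals, sorted descending

print(toy_health)
```

Unlike `top_n()`, `slice_max()` returns the rows already sorted in descending order, so a separate `arrange(desc(Total))` is not needed.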

Economic Impact

To evaluate which types of weather events have the greatest economic consequences, the following data transformation steps were performed:

Understanding Damage Exponents: In the original dataset, the PROPDMGEXP and CROPDMGEXP columns represent the units of property and crop damage costs, respectively. These values are encoded as characters such as “K” (thousands), “M” (millions), and “B” (billions), or occasionally as numeric characters or other symbols. To accurately compute monetary values, these exponent codes had to be converted to numeric multipliers.
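Before writing a conversion, it helps to see which exponent codes actually occur and how often. The sketch below tabulates a toy vector of codes; the vector is invented for illustration, and on the real data the same call would be applied to `storm_data$PROPDMGEXP` and `storm_data$CROPDMGEXP`.

```r
# Toy vector of exponent codes, invented for illustration
codes <- c("K", "K", "M", "", "k", "B", "5", "?", "K")

# Count each distinct code, most frequent first;
# the empty string is counted as its own category
code_counts <- sort(table(codes, useNA = "ifany"), decreasing = TRUE)
print(code_counts)
```

A tabulation like this shows at a glance whether lowercase variants, digits, or stray symbols are common enough to matter for the totals.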

Creating a Conversion Function: A custom function named convert_exp() was defined to map each exponent to its corresponding numeric multiplier:

‘K’ or ‘k’ → 1,000
‘M’ or ‘m’ → 1,000,000
‘B’ or ‘b’ → 1,000,000,000
Numeric characters (e.g., ‘2’) → 10 raised to that number (e.g., 10^2 = 100)
All other or unknown values → 1 (the default, chosen to minimize data loss)

Applying the Conversion: The PROPDMGEXP and CROPDMGEXP columns were first converted to character type and then passed through the convert_exp() function using sapply() to generate new numeric multiplier columns.

Calculating Actual Damage Costs

New columns were then created using mutate():

PROP_COST = PROPDMG × PROPDMGEXP (after conversion to a numeric multiplier)
CROP_COST = CROPDMG × CROPDMGEXP (after conversion to a numeric multiplier)
TOTAL_COST = PROP_COST + CROP_COST (the sum of property and crop damage for each event)

This processing expresses the economic impact of each weather event in dollars, so that it can be analyzed and compared across event types.

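# Work on a copy named storm, which the rest of the analysis uses
storm <- storm_data

# Convert a single damage exponent code to its numeric multiplier
convert_exp <- function(e) {
  if (e %in% c('K', 'k')) return(1e3)                 # thousands
  if (e %in% c('M', 'm')) return(1e6)                 # millions
  if (e %in% c('B', 'b')) return(1e9)                 # billions
  if (grepl("^[0-9]$", e)) return(10^as.numeric(e))   # single digit: power of ten
  return(1)                                           # blank or unknown code
}

# Replace the exponent codes with numeric multipliers
storm$PROPDMGEXP <- sapply(as.character(storm$PROPDMGEXP), convert_exp, USE.NAMES = FALSE)
storm$CROPDMGEXP <- sapply(as.character(storm$CROPDMGEXP), convert_exp, USE.NAMES = FALSE)

# Compute property, crop, and total damage in dollars for each event
storm <- storm %>%
  mutate(
    PROP_COST = PROPDMG * PROPDMGEXP,
    CROP_COST = CROPDMG * CROPDMGEXP,
    TOTAL_COST = PROP_COST + CROP_COST
  )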
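Calling `convert_exp()` once per row through `sapply()` can be slow on a dataset of this size. As a hedged alternative, the same mapping can be written in vectorised form with `dplyr::case_when()`. The helper name `exp_multiplier()` is hypothetical and not part of the original analysis; it is intended to follow the same rules as `convert_exp()`.

```r
library(dplyr)

# Hypothetical helper (not in the original analysis):
# vectorised lookup of the multiplier for each exponent code
exp_multiplier <- function(e) {
  e <- toupper(as.character(e))
  case_when(
    e == "K" ~ 1e3,
    e == "M" ~ 1e6,
    e == "B" ~ 1e9,
    # case_when() evaluates every right-hand side for all elements, so
    # as.numeric() would warn on non-digit codes; those results are discarded
    grepl("^[0-9]$", e) ~ 10^suppressWarnings(as.numeric(e)),
    TRUE ~ 1
  )
}

# Check against the mapping described in the text
mult <- exp_multiplier(c("K", "m", "B", "2", "", "+"))
print(mult)

# First row of head(storm_data): PROPDMG = 25.0 with exponent "K"
prop_cost <- 25 * exp_multiplier("K")
print(prop_cost)  # 25000
```

With this helper, the two `sapply()` calls could be replaced by `exp_multiplier(storm$PROPDMGEXP)` and `exp_multiplier(storm$CROPDMGEXP)`.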

Results

This section summarizes the most significant findings on storm-related impacts across the United States from 1950 to 2011.

Most Harmful Events to Population Health

Tornadoes appear to be by far the most harmful weather events in terms of human health, causing the highest combined number of fatalities and injuries. Other impactful events include excessive heat, floods, and lightning.

health_impact <- storm %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(Total = Fatalities + Injuries) %>%
  arrange(desc(Total)) %>%
  top_n(10, Total)

# Plot
ggplot(health_impact, aes(reorder(EVTYPE, -Total), Total)) +
  geom_bar(stat = "identity", fill = "brown") +
  coord_flip() +
  labs(title = "Top 10 Events Most Harmful to Health", x = "Event Type", y = "Fatalities + Injuries")

Figure 1. Top 10 Weather Event Types Causing the Greatest Total Health Impact (Fatalities + Injuries)

This bar chart shows the top 10 event types based on the combined number of fatalities and injuries. Tornadoes are clearly the most harmful event type.

Events with Greatest Economic Consequences

In terms of economic impact, floods cause the most damage overall, followed by hurricanes/typhoons, tornadoes, and storm surges. These events result in billions of dollars in property and crop damage.

economic_impact <- storm %>%
  group_by(EVTYPE) %>%
  summarise(TotalCost = sum(TOTAL_COST, na.rm = TRUE)) %>%
  arrange(desc(TotalCost)) %>%
  top_n(10, TotalCost)

ggplot(economic_impact, aes(reorder(EVTYPE, -TotalCost), TotalCost / 1e9)) +
  geom_bar(stat = "identity", fill = "purple") +
  coord_flip() +
  labs(title = "Top 10 Events with Greatest Economic Damage", x = "Event Type", y = "Total Damage (Billion USD)")

Conclusion

From the analysis, you can clearly see which weather events have most seriously damaged crops and property in the US, and which have been most harmful to population health.