Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which types of severe weather events are most harmful to population health and which have the greatest economic consequences. The database records major storms and weather events in the United States from 1950 to November 2011, including estimates of fatalities, injuries, property damage, and crop damage. For each event type we summed fatalities and injuries to measure health impact, and property and crop damage to measure economic impact. Tornadoes are the most harmful to population health, with about 97,000 combined fatalities and injuries. Floods cause the greatest economic damage, about $150 billion in combined property and crop losses. These findings can help government and municipal managers prioritize resources for the most impactful severe weather events. The analysis was conducted in R and follows reproducible research principles, with all code and data processing steps documented.

Data Processing

Loading Required Libraries

First, we load the necessary R packages for data manipulation and visualization.

# Install packages if you don't have them
# Uncomment these lines if needed:
# install.packages("dplyr")
# install.packages("ggplot2")

library(dplyr)
library(ggplot2)

Downloading and Reading the Data

We download the storm data file from the course website and read it directly from the compressed format.

# Download the data file if it doesn't exist
# (mode = "wb" forces a binary transfer, which matters on Windows)
if (!file.exists("stormData.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = "stormData.csv.bz2", mode = "wb")
}

# Read the data directly from the compressed file
storm_data <- read.csv("stormData.csv.bz2")

# Check the dimensions and the first few rows of the data
dim(storm_data)
## [1] 902297     37
head(storm_data, 3)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1 2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
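
Reading straight from the .bz2 file works because read.csv opens the path through a file connection that detects bzip2 compression on its own. The following is a minimal sketch on a throwaway file, not on the storm data itself; the two-row table and the temporary path are made up for illustration.

```r
# Hypothetical two-row table standing in for the storm data
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(1, 2))

# Write it as bzip2-compressed CSV to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv needs no explicit decompression step
toy_back <- read.csv(tmp)
str(toy_back)
```

The same call pattern is what the analysis relies on for stormData.csv.bz2, so no separate unzip step is needed.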

Processing Data for Health Impact Analysis

We calculate total health impact by summing fatalities and injuries for each event type, then keep the 10 event types with the highest combined totals.

# Sum fatalities and injuries by event type and keep the top 10
health_data <- storm_data %>%
    select(EVTYPE, FATALITIES, INJURIES) %>%
    group_by(EVTYPE) %>%
    summarize(
        Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
        Total_Injuries = sum(INJURIES, na.rm = TRUE),
        Total_Health_Impact = Total_Fatalities + Total_Injuries
    ) %>%
    arrange(desc(Total_Health_Impact)) %>%
    head(10)

# Display the top 10 events
print(health_data)
## # A tibble: 10 × 4
##    EVTYPE            Total_Fatalities Total_Injuries Total_Health_Impact
##    <chr>                        <dbl>          <dbl>               <dbl>
##  1 TORNADO                       5633          91346               96979
##  2 EXCESSIVE HEAT                1903           6525                8428
##  3 TSTM WIND                      504           6957                7461
##  4 FLOOD                          470           6789                7259
##  5 LIGHTNING                      816           5230                6046
##  6 HEAT                           937           2100                3037
##  7 FLASH FLOOD                    978           1777                2755
##  8 ICE STORM                       89           1975                2064
##  9 THUNDERSTORM WIND              133           1488                1621
## 10 WINTER STORM                   206           1321                1527
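
The table lists TSTM WIND and THUNDERSTORM WIND as separate rows because EVTYPE is free text and the analysis groups on it as recorded. The sketch below shows, on made-up rows, how labels could be harmonized before aggregation. The function normalize_evtype and its single recoding rule are hypothetical and are not part of the analysis above; applying something like it to the real data would change the totals shown.

```r
# Hypothetical helper: trim, upper-case, collapse spaces, recode one synonym
normalize_evtype <- function(evtype) {
    x <- toupper(trimws(as.character(evtype)))
    x <- gsub("\\s+", " ", x)
    sub("^TSTM WIND$", "THUNDERSTORM WIND", x)
}

# Made-up rows for illustration only
toy <- data.frame(
    EVTYPE     = c("TSTM WIND", "Thunderstorm Wind", " TORNADO", "TORNADO"),
    FATALITIES = c(1, 2, 3, 4),
    INJURIES   = c(10, 20, 30, 40)
)
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

# Same aggregation as the health analysis, written in base R
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
totals$Total_Health_Impact <- totals$FATALITIES + totals$INJURIES
totals <- totals[order(-totals$Total_Health_Impact), ]
print(totals)
```

After normalization the four input rows collapse into two event types, which is the effect a fuller recoding table would have on the real labels.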

Processing Data for Economic Impact Analysis

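We process the property and crop damage data. The PROPDMG and CROPDMG columns hold the damage figures, and PROPDMGEXP and CROPDMGEXP hold magnitude codes (H, K, M, and B for hundreds, thousands, millions, and billions). We convert each code to a numeric multiplier and multiply it by the damage figure to obtain a dollar amount.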

# Function to convert damage exponent codes to numeric multipliers.
# It is vectorized, so it accepts a whole column at once.
# Digit codes 0-8 are treated as a multiplier of 10; any other code
# (including blank) falls through to a multiplier of 1.
convert_exponent <- function(exp_value) {
    exp_value <- toupper(as.character(exp_value))
    ifelse(exp_value == "K", 1000,
    ifelse(exp_value == "M", 1000000,
    ifelse(exp_value == "B", 1000000000,
    ifelse(exp_value == "H", 100,
    ifelse(exp_value %in% c("0","1","2","3","4","5","6","7","8"), 10,
    1)))))
}

# Calculate total economic damage
economic_data <- storm_data %>%
    select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
    mutate(
        # convert_exponent is vectorized, so no sapply loop is needed
        Property_Damage = PROPDMG * convert_exponent(PROPDMGEXP),
        Crop_Damage = CROPDMG * convert_exponent(CROPDMGEXP),
        Total_Damage = Property_Damage + Crop_Damage
    ) %>%
    group_by(EVTYPE) %>%
    summarize(
        Total_Property_Damage = sum(Property_Damage, na.rm = TRUE),
        Total_Crop_Damage = sum(Crop_Damage, na.rm = TRUE),
        Total_Economic_Impact = sum(Total_Damage, na.rm = TRUE)
    ) %>%
    arrange(desc(Total_Economic_Impact)) %>%
    head(10)

# Display the top 10 events
print(economic_data)
## # A tibble: 10 × 4
##    EVTYPE          Total_Property_Damage Total_Crop_Damage Total_Economic_Impact
##    <chr>                           <dbl>             <dbl>                 <dbl>
##  1 FLOOD                    144657709807        5661968450          150319678257
##  2 HURRICANE/TYPH…           69305840000        2607872800           71913712800
##  3 TORNADO                   56937162900         414954710           57352117610
##  4 STORM SURGE               43323536000              5000           43323541000
##  5 HAIL                      15732269934        3025954653           18758224587
##  6 FLASH FLOOD               16140815218        1421317100           17562132318
##  7 DROUGHT                    1046106000       13972566000           15018672000
##  8 HURRICANE                 11868319010        2741910000           14610229010
##  9 RIVER FLOOD                5118945500        5029459000           10148404500
## 10 ICE STORM                  3944928310        5022113500            8967041810
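
The exponent conversion can be spot-checked on a few hand-made values. The sketch below restates the same mapping as a lookup table. The names exp_lookup and lookup_multiplier are illustrative and not part of the analysis code, and the handling of digit codes and unrecognized codes simply mirrors the choices made in convert_exponent.

```r
# Letter codes and their multipliers (same values as convert_exponent)
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical lookup-based equivalent of convert_exponent
lookup_multiplier <- function(exp_value) {
    key <- toupper(as.character(exp_value))
    # match() returns NA for codes that are not letter codes
    mult <- unname(exp_lookup[match(key, names(exp_lookup))])
    # Digit codes 0-8 get a multiplier of 10, as in the analysis
    mult[is.na(mult) & key %in% as.character(0:8)] <- 10
    # Everything else (blank, unrecognized) gets a multiplier of 1
    mult[is.na(mult)] <- 1
    mult
}

# Made-up damage figures paired with a range of codes
codes   <- c("K", "m", "B", "h", "5", "", "?")
figures <- c(25, 2.5, 1, 3, 4, 7, 9)
dollars <- figures * lookup_multiplier(codes)
print(data.frame(figures, codes, dollars))
```

Under this mapping the first row of the dataset (PROPDMG 25.0 with PROPDMGEXP K) corresponds to $25,000 of property damage.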

Results

Question 1: Events Most Harmful to Population Health

The following figure shows the top 10 weather event types that caused the most combined fatalities and injuries across the United States.

# Create bar plot for health impact
ggplot(health_data, aes(x = reorder(EVTYPE, Total_Health_Impact), 
                        y = Total_Health_Impact)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    coord_flip() +
    labs(title = "Top 10 Weather Events Most Harmful to Population Health",
         x = "Event Type",
         y = "Total Health Impact (Fatalities + Injuries)") +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5, face = "bold"))

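Finding: Tornadoes are by far the most harmful weather event to population health, with 96,979 combined fatalities and injuries (5,633 fatalities and 91,346 injuries), more than eleven times the total for the next event type. Excessive heat (8,428) and thunderstorm winds, recorded as TSTM WIND (7,461), follow.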

Question 2: Events with Greatest Economic Consequences

The following figure shows the top 10 weather event types that caused the most economic damage (property and crop damage combined).

# Create bar plot for economic impact
ggplot(economic_data, aes(x = reorder(EVTYPE, Total_Economic_Impact), 
                          y = Total_Economic_Impact/1000000000)) +
    geom_bar(stat = "identity", fill = "darkred") +
    coord_flip() +
    labs(title = "Top 10 Weather Events with Greatest Economic Consequences",
         x = "Event Type",
         y = "Total Economic Damage (Billions of USD)") +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5, face = "bold"))

Finding: Floods cause the greatest economic damage, about $150.3 billion in combined property and crop losses, roughly twice the total for the next event type. Hurricanes/typhoons (about $71.9 billion) and tornadoes (about $57.4 billion) follow.

Conclusion

This analysis identified tornadoes as the most harmful weather event for population health and floods as the event with the greatest economic consequences. Municipal and government managers should prioritize preparedness and response resources for these event types.