Synopsis

This report analyzes NOAA’s Storm Data to evaluate the effects of weather events on population health and economic loss in the United States. The data are processed and aggregated by event type to calculate total fatalities, injuries, and monetary damage.

Bar plots show that tornadoes are the deadliest events (fatalities) and also have the largest effect on population health (fatalities plus injuries), while floods cause the highest economic damage. Percentage analyses emphasize the significant contributions of these events to overall impacts. The findings offer insights for improving disaster management and policy decisions.

Data Processing

Data was loaded from a web-based database. The downloaded file was then read into a data frame and inspected to guide the later processing steps.

        #load the packages used below for data manipulation and plotting
        library(dplyr)
        library(ggplot2)
        #download and read in data
        url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
        destfile <- "StormData.csv.bz2"  # fixed file name
        #download the file if it doesn't already exist
        if(!file.exists(destfile)) {
                download.file(url, destfile, mode = "wb")
        }
        #read the compressed CSV file
        stormdata <- read.csv(bzfile(destfile))
        #inspect data to aid later working
        str(stormdata)
        names(stormdata)

The original data was summarized into three groups.

  1. Total fatalities by weather event.
  2. Total fatalities and injuries by weather event.
  3. Total cost of property and crop damage by weather event.



First, the data is compiled by total fatalities per weather event (EVTYPE), and the 5 deadliest weather events are selected.

        #group fatalities by event type
        fatalities_by_event <- stormdata %>%
            group_by(EVTYPE) %>%
            summarise(total_fatalities = sum(FATALITIES, na.rm = TRUE)) %>%
            filter(total_fatalities > 0) %>%
            arrange(desc(total_fatalities))
        #select the top 5
        top_fatalities <- fatalities_by_event %>% top_n(5, total_fatalities)

Second, the data is compiled by combined fatalities and injuries per event, and the 5 most harmful events are selected.

        #find most dangerous events by combining fatalities and injuries
        health_impact <- stormdata %>%
            group_by(EVTYPE) %>%
            summarise(total_health = sum(FATALITIES + INJURIES, na.rm = TRUE)) %>%
            filter(total_health > 0) %>%
            arrange(desc(total_health))
        #select the top 5
        top_health <- health_impact %>% top_n(5, total_health)
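
Current dplyr documentation marks `top_n()` as superseded in favor of `slice_max()`. The sketch below shows what an equivalent selection might look like. It runs on a small made-up data frame; `toy` and its values are hypothetical and are not taken from the storm data.

```r
library(dplyr)

# hypothetical toy data standing in for health_impact
toy <- data.frame(
    EVTYPE = c("A", "B", "C", "D"),
    total_health = c(10, 40, 30, 20),
    stringsAsFactors = FALSE
)

# keep the 2 rows with the largest total_health, sorted from largest to smallest
top2 <- toy %>% slice_max(total_health, n = 2)
top2$EVTYPE
```

Like `top_n()`, `slice_max()` keeps tied rows by default, so either call can return more than `n` rows when values are tied.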

This data will be used to answer which types of events are most harmful to population health.



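Lastly, total economic damage is compiled from property damage and crop damage.

Property and crop damage are first converted to dollar amounts, because the original data stores each as a numeric value (PROPDMG, CROPDMG) plus a separate magnitude code (PROPDMGEXP, CROPDMGEXP). The codes K, M, and B are treated as thousands, millions, and billions; any other code is given a multiplier of 1.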

            #convert symbols to multipliers
            exp_to_multiplier <- function(exp) {
                if(exp %in% c("", "0")) {
                    1
                } else if(exp %in% c("K", "k")) {
                    1e3
                } else if(exp %in% c("M", "m")) {
                    1e6
                } else if(exp %in% c("B", "b")) {
                    1e9
                } else {
                    1
                }
            }
            #create multiplier columns for property and crop damage
            stormdata$prop_multiplier <- sapply(stormdata$PROPDMGEXP, exp_to_multiplier)
            stormdata$crop_multiplier <- sapply(stormdata$CROPDMGEXP, exp_to_multiplier)
            #calculate property and crop damages in dollars
            stormdata$prop_dollars <- stormdata$PROPDMG * stormdata$prop_multiplier
            stormdata$crop_dollars <- stormdata$CROPDMG * stormdata$crop_multiplier
            #find total economic damage and create column for this
            stormdata$total_damage <- stormdata$prop_dollars + stormdata$crop_dollars
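
Calling `sapply()` once per row can be slow on a large table. A vectorized lookup, sketched below, should give the same multipliers, assuming the same mapping as `exp_to_multiplier()` (K, M, B in either case; everything else mapped to 1). The names `exp_lookup` and `exp_to_multiplier_vec` are hypothetical and not part of the original analysis.

```r
# hypothetical lookup table: magnitude code -> dollar multiplier
exp_lookup <- c(K = 1e3, M = 1e6, B = 1e9)

# vectorized version of exp_to_multiplier()
exp_to_multiplier_vec <- function(exp) {
    # upper-case the codes so "k", "m", "b" match as well
    mult <- unname(exp_lookup[toupper(as.character(exp))])
    # codes not in the lookup (blank, "0", digits, symbols) fall back to 1
    mult[is.na(mult)] <- 1
    mult
}

# demonstration on made-up codes
demo_mult <- exp_to_multiplier_vec(c("K", "m", "", "B", "?", "0"))
# a made-up record with a value of 25 and code "K" becomes 25000 dollars
demo_dollars <- 25 * exp_to_multiplier_vec("K")
```

With this helper, the multiplier columns could be built as `exp_to_multiplier_vec(stormdata$PROPDMGEXP)` and `exp_to_multiplier_vec(stormdata$CROPDMGEXP)` without `sapply()`.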

Then, the total cost of property and crop damage per weather event is calculated. The top 10 most costly events are selected.

            #find total damage by event type
            damage_by_event <- stormdata %>%
                group_by(EVTYPE) %>%
                summarise(total_damage = sum(total_damage, na.rm = TRUE)) %>%
                filter(total_damage > 0) %>%
                arrange(desc(total_damage))
            #select the top 10 events
            top_damage <- damage_by_event %>% top_n(10, total_damage)

This data will be used to answer which types of events have the greatest economic consequence.

Results

We will address population health in two parts.

First, we look solely at fatalities from each weather event to determine which events are the deadliest overall.

        ggplot(top_fatalities, aes(x = reorder(EVTYPE, total_fatalities), y = total_fatalities)) +
            geom_bar(stat = "identity", fill = "red") +
            coord_flip() +
            labs(title = "Top 5 Events by Fatalities",
                 x = "Weather Event",
                 y = "Total Fatalities")

Tornadoes cause the most deaths of any weather event, and excessive heat is second.



Knowing each event's share of all storm-related fatalities helps put its impact in perspective.

        #find overall fatalities from the dataset
        overall_fatalities <- sum(stormdata$FATALITIES, na.rm = TRUE)
        #find the percentage contribution for each top event
        top_fatalities <- top_fatalities %>%
            mutate(percentage = total_fatalities / overall_fatalities * 100)
        #make a pretty table for the report
        fatalities_table <- top_fatalities %>%
                mutate(Percentage = round(percentage, 2)) %>%
                select(Weather_Event = EVTYPE, Percentage) %>%
                head(3)
        #print table
        knitr::kable(fatalities_table)

| Weather_Event  | Percentage |
|:---------------|-----------:|
| TORNADO        |      37.19 |
| EXCESSIVE HEAT |      12.57 |
| FLASH FLOOD    |       6.46 |

This confirms that tornadoes account for about 37% of all fatalities from weather-related events.



Next, we will look at fatalities and injuries to see which weather events have the largest impact on population health.

        #x is the event type and y is the count; coord_flip() only swaps how they are drawn
        ggplot(top_health, aes(x = reorder(EVTYPE, total_health), y = total_health)) +
            geom_bar(stat = "identity", fill = "blue") +
            coord_flip() +
            labs(title = "Top 5 Events by Total Health Impact",
                 x = "Weather Event",
                 y = "Total Health Impact (Fatalities + Injuries)")

This shows that tornadoes are the most harmful to overall health. Excessive heat is next, but it accounts for a much smaller portion of the impact on population health.

To further demonstrate the effect these events have, the percentage of overall health impact is calculated.

            #find overall fatalities and injuries from the dataset
            overall_health <- sum(stormdata$FATALITIES + stormdata$INJURIES, na.rm = TRUE)
            #find the percentage contribution for each top event
            top_health <- top_health %>%
                mutate(percentage = total_health / overall_health * 100)
            #make a pretty table for the report
            health_table <- top_health %>%
                mutate(Percentage = round(percentage, 2)) %>%
                select(Weather_Event = EVTYPE, Percentage) %>%
                head(5)
            #print table
            knitr::kable(health_table)

| Weather_Event  | Percentage |
|:---------------|-----------:|
| TORNADO        |      62.30 |
| EXCESSIVE HEAT |       5.41 |
| TSTM WIND      |       4.79 |
| FLOOD          |       4.66 |
| LIGHTNING      |       3.88 |

Tornadoes account for about 62% of all storm-related injuries and deaths, and every other event accounts for less than 6%.

This confirms that tornadoes have the largest impact on population health of all storm-related events.



Now we address economic consequences by looking at which weather events cause the most economic damage to property and crops.

            # Create a horizontal bar plot of the top events by economic damage
            ggplot(top_damage, aes(x = reorder(EVTYPE, total_damage), y = total_damage)) +
                geom_bar(stat = "identity", fill = "orange") +
                coord_flip() +
                labs(title = "Top 10 Events by Economic Consequences",
                     x = "Weather Event Type",
                     y = "Total Damage (USD)") +
                scale_y_continuous(labels = scales::dollar_format())

Floods cause the largest economic damage, followed by hurricanes/typhoons and then tornadoes.

It is worth pointing out that flood damage appears to be more than double the damage caused by any other single weather event.

We will confirm this with a percentage analysis of overall damage.

            #find overall economic damage across all events
            overall_damage <- sum(stormdata$total_damage, na.rm = TRUE)
            #find percentage contribution for each top event
            top_damage <- top_damage %>%
                mutate(percentage = total_damage / overall_damage * 100)
            #make it look pretty in a table
            damage_table <- top_damage %>%
                mutate(Percentage = round(percentage, 2)) %>%
                select(Weather_Event = EVTYPE, Percentage) %>%
                head(5)
            #print table
            knitr::kable(damage_table)

| Weather_Event     | Percentage |
|:------------------|-----------:|
| FLOOD             |      31.55 |
| HURRICANE/TYPHOON |      15.09 |
| TORNADO           |      12.04 |
| STORM SURGE       |       9.09 |
| HAIL              |       3.94 |

This confirms that floods have the largest economic impact, about 32% of total damage, and cause more than double the economic damage of any other weather event.

Also, the combined economic damage of hurricanes/typhoons and tornadoes, about 27%, is almost as large as that of floods.
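
The two comparisons above can be checked directly from the percentages in the damage table. The short sketch below copies those five values by hand; the variable names (`share`, `flood_ratio`, `combined`) are only illustrative.

```r
# shares of total economic damage (percent), copied from the table above
share <- c(FLOOD = 31.55, "HURRICANE/TYPHOON" = 15.09, TORNADO = 12.04,
           "STORM SURGE" = 9.09, HAIL = 3.94)

# ratio of flood damage to the next-largest event
flood_ratio <- unname(share["FLOOD"] / share["HURRICANE/TYPHOON"])
# combined share of hurricanes/typhoons and tornadoes
combined <- unname(share["HURRICANE/TYPHOON"] + share["TORNADO"])

round(flood_ratio, 2)   # 2.09
round(combined, 2)      # 27.13
```

A ratio just above 2 supports the "more than double" statement, and a combined share of about 27% sits a few points below the flood share of about 32%.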