Summary

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

Using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (which tracks characteristics of major storms and weather events, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage), I evaluate which storm event types are the most harmful to population health and which have the greatest economic consequences. The former is operationalized as the sum of fatalities and injuries, while the latter is operationalized as the sum of property and crop damage estimates. In short, these analyses find that tornadoes have caused the highest levels of population health harm and floods have caused the highest economic costs.

Data Processing

To read in the raw data, I used the read.csv function.

storm_data <- read.csv("repdata_data_StormData.csv.bz2")
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.1
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Next, I created a new dataframe “harm” and calculated a new column of the sum of event injuries and fatalities.

  #create new harm data set with just injury, fatal, and event data 
  harm <- storm_data %>%
    group_by(EVTYPE) %>%
    summarize(
      sum_fatal=sum(FATALITIES), 
      sum_injury=sum(INJURIES)
    )
  
  #create updated dataset with sum col that adds sum_fatal and sum_injury cols
  harmsum <- harm %>%
    mutate(sum=sum_fatal + sum_injury) 

Next, I worked with the property and crop damage estimates. Each estimate is stored in two columns: a numeric value (PROPDMG or CROPDMG) and a character code (PROPDMGEXP or CROPDMGEXP) of K (thousands), M (millions), or B (billions). To make these easier to work with, I created a multiplier vector that scales each PROPDMG and CROPDMG value by 1,000, 1,000,000, or 1,000,000,000 according to its code. Records with any other code receive no multiplier (NA) and are therefore excluded from the totals. I then combined the scaled estimates of property and crop damage to create an estimate of total economic costs.

 class(storm_data$PROPDMG) #numeric 
## [1] "numeric"
    class(storm_data$PROPDMGEXP) #character string
## [1] "character"
    library(stringr) 
    
  #Property Damage 
    multiplier <- case_when(
      storm_data$PROPDMGEXP == "K" ~ 1000, 
      storm_data$PROPDMGEXP == "M" ~ 1000000, 
      storm_data$PROPDMGEXP == "B" ~ 1000000000
    )
    
    storm_data$prop_cost <- storm_data$PROPDMG * multiplier
    
  #Crop damage 
    multiplier2 <- case_when(
      storm_data$CROPDMGEXP == "K" ~ 1000, 
      storm_data$CROPDMGEXP == "M" ~ 1000000, 
      storm_data$CROPDMGEXP == "B" ~ 1000000000
    )
    
    storm_data$crop_cost <- storm_data$CROPDMG * multiplier2
    
    #Combine Crop and Property damage
    cost <- storm_data %>%
      group_by(EVTYPE) %>%
      summarize(
        sum_prop_cost=sum(prop_cost, na.rm=TRUE), 
        sum_crop_cost=sum(crop_cost, na.rm=TRUE)
      )
    cost$total_cost <- cost$sum_prop_cost + cost$sum_crop_cost
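
The effect of unmatched codes on the totals can be checked with a small sketch. The toy rows below are illustrative assumptions (in particular the "m" and blank codes), and the named-vector lookup is a base R stand-in for the case_when() call above, not the code used to produce the results in this report.

```r
# Toy rows: the "m" and "" codes are assumed for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "M", "B", "m", ""),
  stringsAsFactors = FALSE
)

# Same mapping as the case_when() call: only K, M and B get a multiplier
exp_lookup <- c(K = 1e3, M = 1e6, B = 1e9)
toy$multiplier <- unname(exp_lookup[toy$PROPDMGEXP])
toy$prop_cost  <- toy$PROPDMG * toy$multiplier

# Rows with any other code become NA and drop out under na.rm = TRUE
n_unmatched <- sum(is.na(toy$multiplier))
total_cost  <- sum(toy$prop_cost, na.rm = TRUE)

# Tabulate which codes were left without a multiplier
unmatched_codes <- table(toy$PROPDMGEXP[is.na(toy$multiplier)])
print(n_unmatched)
print(total_cost)
print(unmatched_codes)
```

Running the same table() call on storm_data$PROPDMGEXP and storm_data$CROPDMGEXP would show how many records, if any, are left out of the cost totals for this reason.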

Results

Across the United States, I found that tornadoes had the highest total population health harms (96,979 total injuries and fatalities). The table below lists the 10 most harmful event types with their total fatalities, total injuries, and the sum of the two.

 result <- harmsum %>%
    group_by(EVTYPE) %>%
    summarize(max_value=max(sum)) %>%
    filter(max_value==max(max_value))
       
   result    #tornado has the highest sum: 96979
## # A tibble: 1 × 2
##   EVTYPE  max_value
##   <chr>       <dbl>
## 1 TORNADO     96979
   #sort event types by sum, highest first
   harmsum_sort <- harmsum %>%
     arrange(desc(sum)) 
       
    head(harmsum_sort, 10)  #show top ten events with highest values
## # A tibble: 10 × 4
##    EVTYPE            sum_fatal sum_injury   sum
##    <chr>                 <dbl>      <dbl> <dbl>
##  1 TORNADO                5633      91346 96979
##  2 EXCESSIVE HEAT         1903       6525  8428
##  3 TSTM WIND               504       6957  7461
##  4 FLOOD                   470       6789  7259
##  5 LIGHTNING               816       5230  6046
##  6 HEAT                    937       2100  3037
##  7 FLASH FLOOD             978       1777  2755
##  8 ICE STORM                89       1975  2064
##  9 THUNDERSTORM WIND       133       1488  1621
## 10 WINTER STORM            206       1321  1527
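
The table above lists TSTM WIND and THUNDERSTORM WIND as separate event types, which suggests that some events are split across differently spelled EVTYPE labels. The following is a minimal sketch of how such labels could be merged before summing. The helper normalize_evtype is hypothetical, and treating the two labels as the same event type is an assumption; the totals are copied from the table above.

```r
# Totals copied from the top-10 table above
toy <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "THUNDERSTORM WIND"),
  harm   = c(96979, 7461, 1621),
  stringsAsFactors = FALSE
)

# Hypothetical helper: upper-case, trim whitespace, expand the TSTM prefix
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  sub("^TSTM ", "THUNDERSTORM ", x)
}

toy$EVTYPE_clean <- normalize_evtype(toy$EVTYPE)

# Re-sum harm by the cleaned label and sort from highest to lowest
combined <- aggregate(harm ~ EVTYPE_clean, data = toy, FUN = sum)
combined <- combined[order(-combined$harm), ]
print(combined)
```

Under this assumption the two thunderstorm wind rows combine to 9,082 injuries and fatalities. Tornadoes remain far ahead, so the main finding does not depend on this choice.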

Across the United States, floods yielded the highest economic costs ($150,319,678,250), followed by hurricanes/typhoons ($71,913,712,800). See the plot below of the top 10 storm event types by total economic cost (property + crop damage), and the table of the top 10 event types with property, crop, and total economic cost estimates.

 result2 <- cost %>%
      group_by(EVTYPE) %>%
      summarize(max_value=max(total_cost)) %>%
      filter(max_value==max(max_value))
    result2  #flood
## # A tibble: 1 × 2
##   EVTYPE    max_value
##   <chr>         <dbl>
## 1 FLOOD  150319678250
    cost_sum_sort <- cost %>%    #filter data for top 10 biggest cost values
      arrange(desc(total_cost)) %>%
      slice(1:10) 
      
    library(ggplot2) 
    ggplot(cost_sum_sort, aes(x=EVTYPE, y=total_cost, fill=EVTYPE)) +
      geom_col() +
      theme(axis.text.y = element_text(size = 5)) +
      labs(title="Top 10 Event Types by Total Economic Cost", 
           x="Event Type", 
           y="Estimated Cost", 
           fill="Event Type")  #legend title for the fill aesthetic

    head(cost_sum_sort, 10)  #show top ten events with highest values
## # A tibble: 10 × 4
##    EVTYPE            sum_prop_cost sum_crop_cost   total_cost
##    <chr>                     <dbl>         <dbl>        <dbl>
##  1 FLOOD              144657709800    5661968450 150319678250
##  2 HURRICANE/TYPHOON   69305840000    2607872800  71913712800
##  3 TORNADO             56925660480     414953110  57340613590
##  4 STORM SURGE         43323536000          5000  43323541000
##  5 HAIL                15727366720    3025537450  18752904170
##  6 FLASH FLOOD         16140811510    1421317100  17562128610
##  7 DROUGHT              1046106000   13972566000  15018672000
##  8 HURRICANE           11868319010    2741910000  14610229010
##  9 RIVER FLOOD          5118945500    5029459000  10148404500
## 10 ICE STORM            3944927810    5022113500   8967041310
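
Because EVTYPE is a character column, the bar chart above places event types in alphabetical order rather than in order of cost. The sketch below shows one way to order the bars by total cost using reorder(). It uses only the top three totals from the table above, and the horizontal layout and axis scaling are illustrative choices, not part of the original analysis. The plotting step runs only if ggplot2 is installed.

```r
# Top three totals copied from the cost table above (US dollars)
top_cost <- data.frame(
  EVTYPE     = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  total_cost = c(150319678250, 71913712800, 57340613590),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by total_cost,
# lowest first, which is the order ggplot2 uses for the axis
top_cost$EVTYPE <- reorder(top_cost$EVTYPE, top_cost$total_cost)
print(levels(top_cost$EVTYPE))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Horizontal bars keep long event labels readable; call print(p) to draw
  p <- ggplot(top_cost, aes(x = EVTYPE, y = total_cost / 1e9)) +
    geom_col() +
    coord_flip() +
    labs(x = "Event Type", y = "Estimated Cost (billions of US dollars)")
}
```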

Discussion

I theorize that the event types causing the greatest population health harm and the greatest economic harm differ because tornadoes (the event type with the highest population health harms) are often less predictable than floods and hurricanes (the event types with the highest economic costs). Floods and hurricanes can often be anticipated by tracking rainfall and storm development, which gives people time to evacuate the area, whereas buildings and crops cannot be moved and still sustain damage.