Title

Analysis of storm harmfulness by type of events on population health and economy based on Data from the National Weather Service.

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities.
Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, the objective of the following study is to identify which types of events are most harmful to population health and have the greatest economic consequences.

The database for this study tracks characteristics of major storms and weather events in the United States from 1950 to 2011, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

Storm Data are available downloading file here.
Downloading is done in project working folder saving date of download with following code:

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = paste(getwd(),"/Storm_file",sep=""))
print(Sys.Date())
## [1] "2021-03-05"

Reading Storm data and summary.

data <- read.csv("Storm_file")
dim(data)
## [1] 902297     37
str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Harmfulness to population health is analyzed using event type (“EVTYPE”), fatalities (“FATALITIES”) and injuries (“INJURIES”).
Economic consequences is analyzed using event type (“EVTYPE”), property damages (“PROPDMG” and cost magnitude “PROPDMGEXP”) and crop damages (“CROPDMG” and cost magnitude “CROPDMGEXP”).
Direct and indirect fatalities, injuries or damages (properties and crop) are counted.
To take in account over time changes in data collection, the analysis is focused on data collected after year 2000 (with beginning of events date:“BGN_DATE”).

data2 <- data[as.Date(data$BGN_DATE, "%m/%d/%Y %T") > as.Date("12/31/1999", "%m/%d/%Y"),]

To keep maximum of usable data for each analysis, health and economic data are cleaned in two separate sets.

health <- data2[,c(2,8,23,24)]
str(health)
## 'data.frame':    523163 obs. of  4 variables:
##  $ BGN_DATE  : chr  "2/13/2000 0:00:00" "2/13/2000 0:00:00" "2/13/2000 0:00:00" "2/13/2000 0:00:00" ...
##  $ EVTYPE    : chr  "TSTM WIND" "TSTM WIND" "TSTM WIND" "TSTM WIND" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES  : num  0 0 0 0 0 0 0 0 0 0 ...
economic <- data2[,c(2,8,25,26,27,28)]
str(economic)
## 'data.frame':    523163 obs. of  6 variables:
##  $ BGN_DATE  : chr  "2/13/2000 0:00:00" "2/13/2000 0:00:00" "2/13/2000 0:00:00" "2/13/2000 0:00:00" ...
##  $ EVTYPE    : chr  "TSTM WIND" "TSTM WIND" "TSTM WIND" "TSTM WIND" ...
##  $ PROPDMG   : num  20 20 2 5 2 5 4 30 20 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "K" "K" "K" "K" ...

According to preparation guide, magnitude for damages cost is defined with an exponential abbreviation (empty if no damage, “K” for thousands, “M” for Millions and “B” for Billions).
For accuracy purpose, data are discarded from economic analysis if incorrect abbreviation or number is used either in property and crop damages.

nb: there is much less inaccurate values with data above year 2000

economic <- economic[economic$PROPDMGEXP=="K"|economic$PROPDMGEXP=="M"|economic$PROPDMGEXP=="B"|economic$PROPDMGEXP=="",]
economic <- economic[economic$CROPDMGEXP=="K"|economic$CROPDMGEXP=="M"|economic$CROPDMGEXP=="B"|economic$CROPDMGEXP=="",]

For the analysis, all damages cost are converted in Million $

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
economic <-
  economic %>%
  mutate(propcosts = ifelse(PROPDMGEXP == "K", PROPDMG/1000,
                          ifelse(PROPDMGEXP == "M",PROPDMG,
                                 ifelse(PROPDMGEXP == "B",PROPDMG*1000,0))),
         cropcosts = ifelse(CROPDMGEXP == "K", CROPDMG/1000,
                          ifelse(CROPDMGEXP == "M",CROPDMG,
                                 ifelse(CROPDMGEXP == "B",CROPDMG*1000,0))))

Results

1) Which types of events are most harmful to population health?

Health data are grouped by event type to calculate sum of fatalities and injuries for each type of event.
Results are merged in one set by event type, ten most harmful events are sorted based on fatalities count.
Count of injuries for most harmful events are showed in Table 1. Injuries are not represented alone as it doesn’t reflect event harmfulness separated from fatalities.

Table 1

#data with sum of fatalities by type of event
mostharmful_fatalities <- health %>%
                          group_by(EVTYPE) %>%
                          summarise(Fatalities = sum(FATALITIES))

#data with sum of injuries by type of event
mostharmful_injuries <- health %>%
                        group_by(EVTYPE) %>%
                        summarise(Injuries = sum(INJURIES))

#merged data
mostharmful_health <- merge(mostharmful_fatalities, mostharmful_injuries, by = "EVTYPE")

mostharmful_health <- mostharmful_health %>%
                      arrange(desc(Fatalities), desc(Injuries))

head(mostharmful_health, 10)  
##               EVTYPE Fatalities Injuries
## 1            TORNADO       1193    15213
## 2     EXCESSIVE HEAT       1013     3708
## 3        FLASH FLOOD        600      812
## 4          LIGHTNING        466     2993
## 5        RIP CURRENT        340      208
## 6              FLOOD        266      315
## 7               HEAT        231     1222
## 8          AVALANCHE        179      126
## 9          HIGH WIND        131      677
## 10 THUNDERSTORM WIND        130     1400

Figure 1

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.3
g <- ggplot(data = mostharmful_health[1:10,], 
            aes(x = Fatalities, y = reorder(EVTYPE, Fatalities), 
                fill = reorder(EVTYPE, -Fatalities)))
g + 
  geom_col(color = "black") + 
  scale_fill_brewer(palette = "Spectral") + 
  theme_minimal() +
  theme(legend.position = "none", 
        axis.title.y = element_blank(), 
        plot.margin = margin(0.5, 1, 0.5, 0.5, "cm")) +
  ggtitle("Most harmful events to population health\nUnited States: 2000-2011") +
  xlab("Number of Fatalities")

2) Which types of events have the greatest economic consequences?

Economic data are grouped by event type to calculate sum of property and crop damages costs for each type of event.
Results are analyzed separately as cost magnitude isn’t comparable. Ten most harmful events are sorted based on properties damage costs (Table and Figure 2) and crop damage costs (Table and Figure 3).

Table 2

#data with sum of properties costs by type of event
mostharmful_prop <-
  economic %>%
      group_by(EVTYPE) %>%
      summarise(Costs = sum(propcosts)) %>%
      ungroup() %>%
      arrange(desc(Costs))

head(mostharmful_prop, 10)
## # A tibble: 10 x 2
##    EVTYPE              Costs
##    <chr>               <dbl>
##  1 FLOOD             134691.
##  2 HURRICANE/TYPHOON  69306.
##  3 STORM SURGE        43171.
##  4 TORNADO            19461.
##  5 HAIL               11989.
##  6 FLASH FLOOD        11877.
##  7 TROPICAL STORM      7195.
##  8 HIGH WIND           4946.
##  9 WILDFIRE            4759.
## 10 STORM SURGE/TIDE    4641.

Figure 2

g <- ggplot(data = mostharmful_prop[1:10,], 
            aes(x = Costs, y = reorder(EVTYPE, Costs),
                fill = reorder(EVTYPE, Costs)))

g + 
  geom_col(color = "black") + 
  scale_fill_brewer(palette = "PuOr") + 
  theme_minimal() +
  theme(legend.position = "none", 
        axis.title.y = element_blank(), 
        plot.margin = margin(0.5, 1, 0.5, 0.5, "cm")) +
  ggtitle("Events with highest properties damage costs\nUnited States: 2000-2011") +
  xlab("Cost in M$")

Table 3

#data with sum of crop costs by type of event      
mostharmful_crop <- 
  economic %>%
      group_by(EVTYPE) %>%
      summarise(Costs = sum(cropcosts)) %>%
      ungroup() %>%
      arrange(desc(Costs))

head(mostharmful_crop, 10)
## # A tibble: 10 x 2
##    EVTYPE            Costs
##    <chr>             <dbl>
##  1 DROUGHT           9136.
##  2 FLOOD             4222.
##  3 HURRICANE/TYPHOON 2608.
##  4 HAIL              1785.
##  5 FROST/FREEZE      1094.
##  6 FLASH FLOOD        905.
##  7 HIGH WIND          498.
##  8 EXCESSIVE HEAT     492.
##  9 HURRICANE          449.
## 10 TROPICAL STORM     412.

Figure 3

g <- ggplot(data = mostharmful_crop[1:10,], 
            aes(x = Costs, y = reorder(EVTYPE, Costs), 
                fill = reorder(EVTYPE, -Costs)))

g + 
  geom_col(color = "black") + 
  scale_fill_brewer(palette = "BrBG") + 
  theme_minimal() +
  theme(legend.position = "none", 
        axis.title.y = element_blank(), 
        plot.margin = margin(0.5, 1, 0.5, 0.5, "cm")) +
  ggtitle("Events with highest crop damage costs\nUnited States: 2000-2011") +
  xlab("Cost in M$")

References

National Weather Service [Storm Data Documentation] https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf National Weather Service [FAQ] https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf