Synopsis:

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This report presents the results of an exploration of the NOAA storm database and answers two basic questions about severe weather events:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Briefly, across the United States, tornado events are by far the most harmful to population health, incurring the highest cumulative number of fatalities and injuries. Hail events, however, have the greatest cumulative economic consequences in terms of property and crop damage, followed by tornado events. The annual health and economic impacts of these two event types have varied over time.

Data Processing

The data used for this report comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file is downloadable from a course web site.

Some documentation of the database is available, describing how some of the variables are constructed and defined: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

    * National Weather Service Storm Data Documentation 
    * National Climatic Data Center Storm Events FAQ

The event records in the database start in 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

The code below downloaded and loaded the data into R:
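The download chunk itself is not shown in this report. A minimal sketch of the step in base R might look like the following. The helper name `load_storm_data`, the local file name, and the URL in the commented-out call are assumptions (the URL is inferred from the host of the documentation link above), so substitute the link provided by the course.

```r
# Hypothetical helper: download the compressed CSV once, then read it.
load_storm_data <- function(url, destfile = "StormData.csv.bz2") {
  if (!file.exists(destfile)) {
    # Binary mode keeps the bzip2 archive intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  # read.csv() detects and decompresses bzip2 files transparently
  read.csv(destfile, stringsAsFactors = FALSE)
}

# Assumed location of the data file (not stated in this report):
# dat <- load_storm_data("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2")
```

Skipping the download when the file already exists avoids fetching the large archive again each time the report is re-knit.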

The next chunk of code created new variables in preparation for the analysis. First, the year variable was created:

dat$DATE <- parse_date_time(dat$BGN_DATE, "%m-%d-%Y %H:%M:%S")
dat$YEAR <- year(dat$DATE)
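For illustration, the same year extraction can be sketched in base R on a single value. The sample string is an assumption about how BGN_DATE is laid out in the raw file (month/day/year followed by a time); it is not copied from the data.

```r
# Assumed BGN_DATE layout: "month/day/year hour:minute:second"
sample_date <- "4/18/1950 0:00:00"

# Parse the string, then pull out the four-digit year as an integer
parsed <- as.POSIXct(sample_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
sample_year <- as.integer(format(parsed, "%Y"))
sample_year
```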

The key variables describing health impact are FATALITIES and INJURIES. The total number of adverse cases (fatalities + injuries) was calculated for each severe weather event type.

Two variables contain dollar estimates that are rounded to three significant digits in the raw data: PROPDMG (property damage) and CROPDMG (crop damage). Each is followed by a companion variable (PROPDMGEXP and CROPDMGEXP, respectively) holding an alphabetical character that signifies the magnitude of the number, e.g., 1.55 with “B” stands for $1,550,000,000. The characters used include “K” for thousands, “M” for millions, and “B” for billions. A new analyzable variable for economic damage was computed from each of the two variable pairs: PROPDMG/PROPDMGEXP and CROPDMG/CROPDMGEXP.
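The magnitude codes can also be expressed as a lookup table. The following is a sketch rather than the code used for this report: it converts a value/code pair to dollars and, as an assumption, treats any code other than K, M or B as missing instead of guessing a multiplier. `exp_multiplier` and `damage_dollars` are hypothetical names.

```r
# Multipliers for the magnitude codes described above
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its magnitude code to dollars.
# Assumptions: lower-case codes are upper-cased first, and codes outside
# K/M/B (including blanks) yield NA.
damage_dollars <- function(value, code) {
  mult <- unname(exp_multiplier[toupper(as.character(code))])
  value * mult
}

# 1.55 with "B" is $1,550,000,000; the blank code gives NA
damage_dollars(c(1.55, 25, 2.5, 10), c("B", "K", "M", ""))
```

Returning NA for unrecognized codes makes such records visible, so they can be inspected before deciding how to count them.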

## Create two new variables: (1) population harm, (2) economic damage

dat.tidy <- dat %>%
        mutate(
                POPHARM = FATALITIES + INJURIES, # population health harm
                # Property damage, expressed in thousands of dollars.
                # Note: any code other than "K" or "M" (including blanks) falls
                # through to the last branch and is scaled as billions.
                PROPDMGVALUE =
                        if_else(PROPDMGEXP == "K", PROPDMG*1,
                        if_else(PROPDMGEXP == "M", PROPDMG*1000, PROPDMG*1000000)),
                # Crop damage, same units and same fallback
                CROPDMGVALUE =
                        if_else(CROPDMGEXP == "K", CROPDMG*1,
                        if_else(CROPDMGEXP == "M", CROPDMG*1000, CROPDMG*1000000))) %>%
        mutate(ECONDMG = PROPDMGVALUE + CROPDMGVALUE) %>% # economic damage, thousands of $
        select(EVTYPE,YEAR,POPHARM,ECONDMG) %>%
        group_by(EVTYPE,YEAR) %>%
        summarise(ECON_DAMAGE = round(sum(ECONDMG, na.rm = TRUE)/1000000, 2), # thousands of $ -> billions of $
                  POPL_HARM = sum(POPHARM, na.rm = TRUE)) 

## Compute for each event type: (1) total population harm, (2) total economic damage

dat1 <- dat.tidy %>%
        select(-YEAR) %>%
        group_by(EVTYPE) %>%
        summarise(T.POPL_HARM = sum(POPL_HARM, na.rm=TRUE), 
              T.ECON_DAMAGE = sum(ECON_DAMAGE, na.rm=TRUE)) 
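The two-step aggregation (first per event type and year, then per event type) can be illustrated on a handful of made-up records. This sketch uses base R's `aggregate()` in place of the dplyr pipeline; the numbers are invented for the example and are not taken from the NOAA data.

```r
# Invented records, for illustration only
toy_events <- data.frame(
  EVTYPE  = c("HAIL", "HAIL", "TORNADO", "TORNADO", "TORNADO"),
  YEAR    = c(2010, 2011, 2010, 2011, 2011),
  POPHARM = c(1, 0, 5, 20, 30)
)

# Step 1: yearly totals per event type
by_year <- aggregate(POPHARM ~ EVTYPE + YEAR, data = toy_events, FUN = sum)

# Step 2: overall totals per event type, summed over the yearly totals
overall <- aggregate(POPHARM ~ EVTYPE, data = by_year, FUN = sum)
overall
```

Keeping the yearly table as an intermediate result is what allows the same data to feed both the cumulative ranking and the later trend plots.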

Analysis

To determine the event type with the greatest impact on population health, the tidy dataset was ranked by the total population harm variable and the top 5 event types were selected.

Hdata <- dat1[, c("EVTYPE", "T.POPL_HARM")] %>% arrange(desc(T.POPL_HARM)) %>% 
    mutate(EVTYPE = parse_factor(EVTYPE, levels = NULL, include_na = FALSE)) %>% 
    slice(1:5)

To determine the event type with the greatest economic impact, the tidy dataset was ranked by the total economic damage variable and the top 5 event types were selected.

Edata <- dat1[, c("EVTYPE", "T.ECON_DAMAGE")] %>% arrange(desc(T.ECON_DAMAGE)) %>% 
    mutate(EVTYPE = parse_factor(EVTYPE, levels = NULL, include_na = FALSE)) %>% 
    slice(1:5)
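The rank-and-slice step can be illustrated on made-up numbers. The sketch below uses base R's `order()` and `head()` in place of the dplyr pipeline; the totals are invented for the example and are not taken from the NOAA data.

```r
# Invented totals, for illustration only
toy_totals <- data.frame(
  EVTYPE = c("A", "B", "C", "D", "E", "F"),
  T.POPL_HARM = c(10, 250, 40, 5, 90, 120),
  stringsAsFactors = FALSE
)

# Sort by descending harm and keep the first five rows
top5 <- head(toy_totals[order(-toy_totals$T.POPL_HARM), ], 5)

# Fix the factor levels in ranked order so a bar chart keeps this ordering
top5$EVTYPE <- factor(top5$EVTYPE, levels = top5$EVTYPE)
top5
```

Converting the event type to a factor whose levels follow the ranking matters for plotting: ggplot2 orders bars by factor level, so without this step the bars would appear alphabetically rather than from largest to smallest.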

The code above generates only the total cumulative impacts of the weather event types over the years of record keeping. Once the most impactful event types are identified, knowing how their impacts change over time may have additional policy implications.

Therefore, a new dataset with yearly estimates for each event type over the 10 years from 2002 to 2011 (recent records are more complete) was created to conduct a trend analysis of the single most impactful event type for population health and for the economy.

The new dataset was created with the following code:

dat2 <- dat.tidy %>% filter((EVTYPE == "HAIL" | EVTYPE == "TORNADO") & (YEAR > 
    2001)) %>% select(EVTYPE, POPL_HARM, ECON_DAMAGE, YEAR)


names(dat2)[1] <- "Event_type"  # substitute more interpretable name on subsequent graphs

Results

Figure 1 shows that tornadoes caused by far the greatest total harm to population health in the United States between 1950 and 2011.

theme_set(theme_bw())
ggplot(Hdata, aes(x = EVTYPE, y = T.POPL_HARM)) + geom_bar(stat = "identity", 
    width = 0.8, fill = "tomato3") + labs(title = "Figure 1. Tornado caused the greatest total harm to US population health", 
    subtitle = "Top 5 severe weather events ranked according to total health harm (1950 to 2011)", 
    caption = "Source: Storm Data from NOAA", y = "Total fatalities & injuries", 
    x = "Severe weather event type") + theme(axis.text.x = element_text(angle = 65, 
    vjust = 0.6))

Figure 2 shows that hail caused the greatest total loss to the US economy over the same period (1950 to 2011), followed by tornadoes. However, hail was not among the top 5 severe weather events causing the most harm to population health in Figure 1.

theme_set(theme_bw())
ggplot(Edata, aes(x = EVTYPE, y = T.ECON_DAMAGE)) + geom_bar(stat = "identity", 
    width = 0.8, fill = "tomato3") + labs(title = "Figure 2. Hail caused the greatest loss to the US economy", 
    subtitle = "Top 5 severe weather events ranked according to total economic damage (1950 to 2011)", 
    caption = "Source: Storm Data from NOAA", y = "Property & crop damage (billion US $)", 
    x = "Severe weather event type") + theme(axis.text.x = element_text(angle = 55, 
    vjust = 0.6))

Having identified tornado and hail as the event types with the greatest impact on population health and on the economy, respectively, Figure 3 was constructed to show the trend of their impacts from 2002 to 2011.

g1 <- ggplot(dat2, aes(x = YEAR, y = POPL_HARM, color = Event_type, shape = Event_type)) + 
    geom_line(size = 3, alpha = 0.6) + geom_point(size = 3) + labs(y = "Fatalities/injuries", 
    x = "Year") + theme(axis.text.x = element_text(face = "bold", color = "#993333", 
    size = 14, angle = 45), axis.text.y = element_text(face = "bold", color = "#993333", 
    size = 14, angle = 45)) + scale_x_continuous(breaks = c(2002, 2003, 2004, 
    2005, 2006, 2007, 2008, 2009, 2010, 2011))


g2 <- ggplot(dat2, aes(x = YEAR, y = ECON_DAMAGE, color = Event_type, shape = Event_type)) + 
    geom_line(size = 3, alpha = 0.6) + geom_point(size = 3) + labs(y = "Damage (Billions $)", 
    x = "Year") + theme(axis.text.x = element_text(face = "bold", color = "#993333", 
    size = 14, angle = 45), axis.text.y = element_text(face = "bold", color = "#993333", 
    size = 14, angle = 45)) + scale_x_continuous(breaks = c(2002, 2003, 2004, 
    2005, 2006, 2007, 2008, 2009, 2010, 2011))



grid.arrange(g1, g2, nrow = 2, top = "Figure 3. Economic and population health impacts of Hail and Tornado over time")

The population health impact of tornado events spiked in 2011, as did their economic impact, which was greater than that of hail events in the same year. The economic impact of hail events peaked in 2011.

Appendix

sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 15063)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2    gridExtra_2.3   ggplot2_2.2.1   lubridate_1.6.0
## [5] dplyr_0.7.2     tidyr_0.7.0     readr_1.1.1     httr_1.3.1     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.11     knitr_1.16       bindr_0.1        magrittr_1.5    
##  [5] hms_0.3          munsell_0.4.3    colorspace_1.3-2 R6_2.2.2        
##  [9] rlang_0.1.1      plyr_1.8.4       stringr_1.2.0    tools_3.4.1     
## [13] grid_3.4.1       gtable_0.2.0     htmltools_0.3.6  lazyeval_0.2.0  
## [17] yaml_2.1.14      rprojroot_1.2    digest_0.6.12    assertthat_0.2.0
## [21] tibble_1.3.3     formatR_1.5      purrr_0.2.3      glue_1.1.1      
## [25] evaluate_0.10.1  rmarkdown_1.6    labeling_0.3     stringi_1.1.5   
## [29] compiler_3.4.1   scales_0.5.0     backports_1.1.0  pkgconfig_2.0.1