Summary

Introduction: Severe weather events are an important cause of harm to public health and of economic damage. Global warming is leading to an increase in the frequency and intensity of severe weather events.

Materials and methods: The U.S. National Oceanic and Atmospheric Administration (NOAA) storm database was processed using R version 4.0.2 in RStudio.

Analysis: The objective of this project is to identify the types of severe weather events that have had the greatest impact on the US economy and public health. The NOAA storm database contains data from 1950 to November 2011. The variables needed for this analysis were selected, converted into the appropriate types, grouped by event type, and summarized in separate data frames. The total public health impact was estimated as the sum of fatalities and injuries by event type.
The total economic impact was estimated as the sum of property and crop damage by event type. Plots were created to show the ten types of severe weather events with the greatest public health and economic impact.

Results: During the period analyzed, tornadoes had the greatest impact on public health. Wildfires had the greatest economic impact among the event types for which damage totals could be computed.

Conclusion: Global warming is expected to continue increasing the frequency and intensity of severe weather events. Action is needed to reduce their impact on the United States economy and public health.



Introduction

Climate change and its impact have become a leading threat to the nation’s health. The increased temperature of the earth’s surface, air, and water has led to a higher intensity and frequency of precipitation, storms, hurricanes, floods, droughts, and associated wildfires [1].
There is a well-established association between high ambient temperature and higher rates of mortality in the US and around the world [2].

Health outcomes from severe weather events can arise from multiple situations [3]:

  1. Hazards from exposure to storm impact.
  2. Evacuation.
  3. Post-storm hazards from utility outages and sheltering in place in inadequate housing.
  4. Exposure to secondary hazards including contaminated drinking water, contact with contaminated flood waters and mold and moisture in housing.
  5. Population displacement and disruption of services.
  6. Mental health effects from traumatic or stressful experiences during and after storms.
  7. Health and safety risks from clean-up and recovery activities.

The economic impact of severe weather events can be measured by the damage to property (buildings) and crops. The National Weather Service makes a best guess using all data available at the time of publication, and the damage amounts are received from a variety of sources. Property and crop damage reported in the NOAA database should therefore be considered broad estimates.

In this project, the public health impact and economic damage caused by severe weather events are evaluated.



Materials and Methods

For this project, the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database was used. This database tracks characteristics of major storm and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Descriptive statistics were used for the analysis: totals of fatalities, injuries, and damage were tabulated by event type to address the research questions.

R version 4.0.2 in RStudio was used for data processing and analysis. The session information for the R session used to produce this report is shown below.

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Spanish_Mexico.1252  LC_CTYPE=Spanish_Mexico.1252   
## [3] LC_MONETARY=Spanish_Mexico.1252 LC_NUMERIC=C                   
## [5] LC_TIME=Spanish_Mexico.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.28   R6_2.5.1        jsonlite_1.7.2  magrittr_2.0.1 
##  [5] evaluate_0.14   rlang_0.4.12    stringi_1.7.5   jquerylib_0.1.4
##  [9] bslib_0.3.1     rmarkdown_2.11  tools_4.0.2     stringr_1.4.0  
## [13] xfun_0.26       yaml_2.2.1      fastmap_1.1.0   compiler_4.0.2 
## [17] htmltools_0.5.2 knitr_1.36      sass_0.4.0



Analysis

The objective of this project is to determine the impact of storms and severe weather events that took place in the United States between 1950 and November 2011 by answering the following questions:

  1. Across the US, which types of events are most harmful with respect to population health?
  2. Across the US, which types of events have the greatest economic consequences?

To answer these questions, the NOAA database was processed to obtain the relevant data.



Data Processing

For data processing, the tidyverse and magrittr packages were used.

library(tidyverse)
library(magrittr)

The download URL was saved in an object called url, and an if/else statement downloads the file only if it does not already exist in the ./Data folder of the working directory.

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
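destfile <- "./Data/repData_data_StormData.csv.bz2"
if (file.exists(destfile)) {
        message("The file was already downloaded.")
} else {
        # download.file() does not create missing folders
        dir.create("./Data", showWarnings = FALSE)
        # mode = "wb" keeps the compressed (binary) file intact on Windows
        download.file(url, destfile = destfile, mode = "wb")
        # ifelse() is vectorized and meant for values; use if/else for messages
        if (file.exists(destfile)) {
                message("File downloaded successfully.")
        } else {
                message("Error downloading file.")
        }
}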
## The file was already downloaded.

The data were read into a data frame called data, which was then converted into a tibble for more compact printing.

data <- read.csv("./Data/repData_data_StormData.csv.bz2", header = TRUE)
data <- as_tibble(data)
head(data)
## # A tibble: 6 x 37
##   STATE__ BGN_DATE   BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
##     <dbl> <chr>      <chr>    <chr>      <dbl> <chr>      <chr> <chr>      <dbl>
## 1       1 4/18/1950~ 0130     CST           97 MOBILE     AL    TORNA~         0
## 2       1 4/18/1950~ 0145     CST            3 BALDWIN    AL    TORNA~         0
## 3       1 2/20/1951~ 1600     CST           57 FAYETTE    AL    TORNA~         0
## 4       1 6/8/1951 ~ 0900     CST           89 MADISON    AL    TORNA~         0
## 5       1 11/15/195~ 1500     CST           43 CULLMAN    AL    TORNA~         0
## 6       1 11/15/195~ 2000     CST           77 LAUDERDALE AL    TORNA~         0
## # ... with 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## #   END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## #   END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <int>,
## #   MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## #   PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## #   STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## #   LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>

The data frame contains 902,297 observations of 37 variables.

The structure of the variables is as follows:

str(data)
## tibble [902,297 x 37] (S3: tbl_df/tbl/data.frame)
##  $ STATE__   : num [1:902297] 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr [1:902297] "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num [1:902297] 97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr [1:902297] "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr [1:902297] "" "" "" "" ...
##  $ BGN_LOCATI: chr [1:902297] "" "" "" "" ...
##  $ END_DATE  : chr [1:902297] "" "" "" "" ...
##  $ END_TIME  : chr [1:902297] "" "" "" "" ...
##  $ COUNTY_END: num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi [1:902297] NA NA NA NA NA NA ...
##  $ END_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr [1:902297] "" "" "" "" ...
##  $ END_LOCATI: chr [1:902297] "" "" "" "" ...
##  $ LENGTH    : num [1:902297] 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num [1:902297] 100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int [1:902297] 3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num [1:902297] 0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num [1:902297] 15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num [1:902297] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
##  $ CROPDMG   : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr [1:902297] "" "" "" "" ...
##  $ WFO       : chr [1:902297] "" "" "" "" ...
##  $ STATEOFFIC: chr [1:902297] "" "" "" "" ...
##  $ ZONENAMES : chr [1:902297] "" "" "" "" ...
##  $ LATITUDE  : num [1:902297] 3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num [1:902297] 8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num [1:902297] 3051 0 0 0 0 ...
##  $ LONGITUDE_: num [1:902297] 8806 0 0 0 0 ...
##  $ REMARKS   : chr [1:902297] "" "" "" "" ...
##  $ REFNUM    : num [1:902297] 1 2 3 4 5 6 7 8 9 10 ...

Some variables were converted to factors to make grouping easier. The converted variables were:

  • STATE__
  • COUNTY
  • COUNTYNAME
  • STATE
  • EVTYPE
charvars <- c("STATE__", "COUNTY", "COUNTYNAME", "STATE", "EVTYPE")
data %<>% mutate_at(charvars, factor)
str(data[, 1:8])
## tibble [902,297 x 8] (S3: tbl_df/tbl/data.frame)
##  $ STATE__   : Factor w/ 70 levels "1","2","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr [1:902297] "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : Factor w/ 557 levels "0","1","2","3",..: 98 4 58 90 44 78 10 124 126 58 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
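Because factor() stores each value as an integer index into its sorted levels, the numbers that str() prints for COUNTY (98, 4, 58, ...) are level indices, not the county codes 97, 3, 57 seen earlier. The short base R sketch below uses a few made-up values to show the difference and how the original numbers could be recovered if they are needed later.

```r
# Toy county codes (illustrative values, not read from the NOAA file)
county <- factor(c(97, 3, 57))

# Levels are the sorted unique values, stored as strings: "3", "57", "97"
county_levels <- levels(county)

# as.integer() returns each value's position among the levels: 3 1 2
county_index <- as.integer(county)

# Converting through as.character() recovers the original codes: 97 3 57
county_code <- as.numeric(as.character(county))

print(county_index)
print(county_code)
```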

BGN_DATE and END_DATE were converted to the Date class (YYYY-MM-DD) for easier manipulation and plotting. The time component of these strings (0:00:00 in the rows shown) was dropped, because the time of day is stored separately in BGN_TIME and END_TIME.

data$BGN_DATE <- as.Date(data$BGN_DATE, "%m/%d/%Y")
data$END_DATE <- as.Date(data$END_DATE, "%m/%d/%Y")
str(data[, c(2, 12)])
## tibble [902,297 x 2] (S3: tbl_df/tbl/data.frame)
##  $ BGN_DATE: Date[1:902297], format: "1950-04-18" "1950-04-18" ...
##  $ END_DATE: Date[1:902297], format: NA NA ...
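The two behaviors this step relies on can be checked on single values in base R: the "%m/%d/%Y" format reads only the date part of each string, and blank strings (as seen for END_DATE) come back as NA rather than raising an error.

```r
# The format describes only the date; as.Date() ignores the trailing
# " 0:00:00" time string found in the original BGN_DATE values
bgn <- as.Date("4/18/1950 0:00:00", "%m/%d/%Y")

# Blank strings, as stored in many END_DATE values, become NA
end <- as.Date(c("", "4/18/1950 0:00:00"), "%m/%d/%Y")

print(bgn)  # "1950-04-18"
print(end)  # NA, then "1950-04-18"
```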

The variables needed for the analysis were selected from the original data frame and stored in a new data frame called data2. The selected variables were:

  • STATE
  • COUNTYNAME
  • BGN_DATE
  • EVTYPE
  • END_DATE
  • FATALITIES
  • INJURIES
  • PROPDMG
  • PROPDMGEXP
  • CROPDMG
  • CROPDMGEXP
data2 <- data %>% 
  select(STATE, COUNTYNAME, BGN_DATE, EVTYPE, END_DATE, FATALITIES, 
         INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(data2)
## # A tibble: 6 x 11
##   STATE COUNTYNAME BGN_DATE   EVTYPE  END_DATE FATALITIES INJURIES PROPDMG
##   <fct> <fct>      <date>     <fct>   <date>        <dbl>    <dbl>   <dbl>
## 1 AL    MOBILE     1950-04-18 TORNADO NA                0       15    25  
## 2 AL    BALDWIN    1950-04-18 TORNADO NA                0        0     2.5
## 3 AL    FAYETTE    1951-02-20 TORNADO NA                0        2    25  
## 4 AL    MADISON    1951-06-08 TORNADO NA                0        2     2.5
## 5 AL    CULLMAN    1951-11-15 TORNADO NA                0        2     2.5
## 6 AL    LAUDERDALE 1951-11-15 TORNADO NA                0        6     2.5
## # ... with 3 more variables: PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>



Fatalities and Injuries

The rows of data2 were grouped by event type, and the fatalities and injuries were summed for each group.

health <- data2 %>% 
  group_by(EVTYPE) %>%
  summarize(fatalities_n = sum(FATALITIES), injuries_n = sum(INJURIES))
tail(health)
## # A tibble: 6 x 3
##   EVTYPE             fatalities_n injuries_n
##   <fct>                     <dbl>      <dbl>
## 1 WINTER WEATHER/MIX           28         72
## 2 WINTERY MIX                   0          0
## 3 Wintry mix                    0          0
## 4 Wintry Mix                    0          0
## 5 WINTRY MIX                    1         77
## 6 WND                           0          0
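The tail of this table shows that one event type can appear under several labels that differ only in letter case or surrounding spaces (for example "Wintry mix", "Wintry Mix" and "WINTRY MIX"), so its counts are split across rows. This cleaning step was not part of the analysis in this report; the base R sketch below, on made-up numbers, shows how trimming spaces and converting labels to upper case before grouping would merge such rows. It would not merge spelling variants such as "WINTERY MIX" or abbreviations such as "TSTM WIND".

```r
# Made-up event labels and injury counts, for illustration only
evtype   <- c("Wintry mix", "Wintry Mix", "WINTRY MIX", " TSTM WIND", "TSTM WIND")
injuries <- c(0, 2, 77, 1, 4)

# Normalize labels: remove leading/trailing spaces and ignore case
evtype_clean <- toupper(trimws(evtype))

# Sum injuries per normalized label (tapply returns a named array)
injuries_by_type <- tapply(injuries, evtype_clean, sum)
print(injuries_by_type)
```

In the dplyr pipeline used above, the equivalent would be a mutate(EVTYPE = toupper(trimws(EVTYPE))) step before group_by(EVTYPE).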

The result is a data frame with the total number of fatalities and injuries for every event type. Only the event types with at least one fatality or injury were kept.

health <- health %>% filter(fatalities_n != 0 | injuries_n != 0)
head(health)
## # A tibble: 6 x 3
##   EVTYPE       fatalities_n injuries_n
##   <fct>               <dbl>      <dbl>
## 1 AVALANCE                1          0
## 2 AVALANCHE             224        170
## 3 BLACK ICE               1         24
## 4 BLIZZARD              101        805
## 5 blowing snow            1          1
## 6 BLOWING SNOW            1         13

A new variable, total_n, was added with the combined number of fatalities and injuries per event type.

health <- health %>% mutate(total_n = fatalities_n + injuries_n)
head(health)
## # A tibble: 6 x 4
##   EVTYPE       fatalities_n injuries_n total_n
##   <fct>               <dbl>      <dbl>   <dbl>
## 1 AVALANCE                1          0       1
## 2 AVALANCHE             224        170     394
## 3 BLACK ICE               1         24      25
## 4 BLIZZARD              101        805     906
## 5 blowing snow            1          1       2
## 6 BLOWING SNOW            1         13      14

The data frame was then sorted by total_n in decreasing order.

order <- order(health$total_n, decreasing = TRUE)
health <- health[order, ]
health[1:10,]
## # A tibble: 10 x 4
##    EVTYPE            fatalities_n injuries_n total_n
##    <fct>                    <dbl>      <dbl>   <dbl>
##  1 TORNADO                   5633      91346   96979
##  2 EXCESSIVE HEAT            1903       6525    8428
##  3 TSTM WIND                  504       6957    7461
##  4 FLOOD                      470       6789    7259
##  5 LIGHTNING                  816       5230    6046
##  6 HEAT                       937       2100    3037
##  7 FLASH FLOOD                978       1777    2755
##  8 ICE STORM                   89       1975    2064
##  9 THUNDERSTORM WIND          133       1488    1621
## 10 WINTER STORM               206       1321    1527

The table above shows the 10 event types with the greatest number of fatalities and injuries combined.



Crops and property

To estimate the economic cost by event type, the following variables were used:

  • PROPDMG
  • PROPDMGEXP
  • CROPDMG
  • CROPDMGEXP

To get the total damage cost, PROPDMG and CROPDMG were multiplied by the multipliers encoded in PROPDMGEXP and CROPDMGEXP, after converting those codes from characters to numbers (“K” or “k” to one thousand, “M” or “m” to one million). Only these two codes were converted: any other value, such as “B” (billions) or a blank entry, becomes NA when as.numeric() is applied.

data2$PROPDMGEXP <- gsub("K", 1000, data2$PROPDMGEXP, ignore.case = TRUE)
data2$PROPDMGEXP <- gsub("M", 1000000, data2$PROPDMGEXP, ignore.case = TRUE)
data2$CROPDMGEXP <- gsub("K", 1000, data2$CROPDMGEXP, ignore.case = TRUE)
data2$CROPDMGEXP <- gsub("M", 1000000, data2$CROPDMGEXP, ignore.case = TRUE)
data2$PROPDMGEXP <- as.numeric(data2$PROPDMGEXP)
data2$CROPDMGEXP <- as.numeric(data2$CROPDMGEXP)
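A more complete conversion is sketched below as an alternative; it is not the method used to produce the results in this report. It follows the National Weather Service convention that “K”, “M” and “B” stand for thousands, millions and billions, and it additionally assumes (not confirmed by the source) that a blank code means the value needs no multiplier. Every other code is left as NA so it can be counted and reviewed. exp_to_multiplier() is a hypothetical helper written for this example, and the input values are made up.

```r
# Hypothetical helper (not part of the original analysis): map exponent
# codes to numeric multipliers.
#   "K" = thousands, "M" = millions, "B" = billions (NWS convention)
#   ""  = 1 (ASSUMPTION: a blank code means the value is already in dollars)
#   any other code (for example "H", "+", "?", digits) = NA, to be reviewed
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(code))
  out <- rep(NA_real_, length(code))
  out[which(code == "")]  <- 1
  out[which(code == "K")] <- 1e3
  out[which(code == "M")] <- 1e6
  out[which(code == "B")] <- 1e9
  out
}

# Made-up damage values and exponent codes, for illustration only
prop_dmg <- c(25, 2.5, 1.5, 10, 3)
prop_exp <- c("K", "k", "B", "", "?")

multiplier <- exp_to_multiplier(prop_exp)
subtotal   <- prop_dmg * multiplier

# Codes that could not be mapped, kept for manual review
unmapped <- prop_exp[is.na(multiplier)]

# na.rm = TRUE keeps the converted records instead of returning NA
total <- sum(subtotal, na.rm = TRUE)
```

Applied to the original character columns, for example data2$PROPDMGEXP <- exp_to_multiplier(data$PROPDMGEXP), this would replace the gsub() and as.numeric() lines above; the resulting totals would likely differ from those reported here.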

Two new variables, prop_subtotal and crop_subtotal, were created by multiplying each damage value by its multiplier.

data2$prop_subtotal <- data2$PROPDMG * data2$PROPDMGEXP
data2$crop_subtotal <- data2$CROPDMG * data2$CROPDMGEXP

The modified data frame was then grouped by event type to sum the property and crop damage.

econ <- data2 %>%
  group_by(EVTYPE) %>%
  summarize(prop_total = sum(prop_subtotal), crop_total = sum(crop_subtotal))
head(econ)
## # A tibble: 6 x 3
##   EVTYPE                  prop_total crop_total
##   <fct>                        <dbl>      <dbl>
## 1 "   HIGH SURF ADVISORY"     200000         NA
## 2 " COASTAL FLOOD"                NA         NA
## 3 " FLASH FLOOD"               50000         NA
## 4 " LIGHTNING"                    NA         NA
## 5 " TSTM WIND"                    NA         NA
## 6 " TSTM WIND (G45)"            8000         NA

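Missing (NA) totals were replaced with 0, event types with no property or crop damage were removed, and a new column with the combined total of property and crop damage was added. Because sum() was called without na.rm = TRUE in the previous step, a single NA subtotal makes the total for the whole event type NA, and that total is counted as 0 here.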

prop_na <- is.na(econ$prop_total)
crop_na <- is.na(econ$crop_total)
econ[prop_na, 2] <- 0
econ[crop_na, 3] <- 0
econ <- econ %>% filter(prop_total != 0 | crop_total != 0)
econ <- econ %>% mutate(comb_total = prop_total + crop_total)
head(econ)
## # A tibble: 6 x 4
##   EVTYPE                  prop_total crop_total comb_total
##   <fct>                        <dbl>      <dbl>      <dbl>
## 1 "   HIGH SURF ADVISORY"     200000          0     200000
## 2 " FLASH FLOOD"               50000          0      50000
## 3 " TSTM WIND (G45)"            8000          0       8000
## 4 "?"                           5000          0       5000
## 5 "APACHE COUNTY"               5000          0       5000
## 6 "ASTRONOMICAL LOW TIDE"     320000          0     320000
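The effect of summing without na.rm = TRUE can be seen on a toy example in base R: one NA subtotal turns the whole event-type total into NA, which the step above then replaces with 0. The event labels and values below are made up for illustration.

```r
# Toy subtotals: the second FLOOD record has an unconverted exponent (NA)
event    <- c("FLOOD", "FLOOD", "HAIL", "HAIL")
subtotal <- c(5e6, NA, 2e4, 3e4)

# As in the report: any NA makes the group total NA
total_strict <- tapply(subtotal, event, sum)

# Skipping NA records keeps the damage that could be converted
total_narm <- tapply(subtotal, event, sum, na.rm = TRUE)

# total_strict: FLOOD is NA, HAIL is 50000
# total_narm:   FLOOD is 5e6, HAIL is 50000
print(total_strict)
print(total_narm)
```

In the dplyr code above, the same option is summarize(prop_total = sum(prop_subtotal, na.rm = TRUE), ...). Dropping NA records still understates totals when many records could not be converted, so it works best together with a complete multiplier mapping.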

The resulting data frame was then sorted by combined total cost in decreasing order.

order <- order(econ$comb_total, decreasing = TRUE)
econ <- econ[order, ]
econ[1:10, ]
## # A tibble: 10 x 4
##    EVTYPE                  prop_total crop_total comb_total
##    <fct>                        <dbl>      <dbl>      <dbl>
##  1 WILD FIRES               624100000          0  624100000
##  2 HAILSTORM                241000000          0  241000000
##  3 EXCESSIVE WETNESS                0  142000000  142000000
##  4 HIGH WINDS/COLD          110500000    7000000  117500000
##  5 River Flooding           106155000          0  106155000
##  6 MAJOR FLOOD              105000000          0  105000000
##  7 COLD AND WET CONDITIONS          0   66000000   66000000
##  8 WINTER STORM HIGH WINDS   60000000    5000000   65000000
##  9 HURRICANE EMILY           50000000          0   50000000
## 10 Early Frost                      0   42000000   42000000



Plot creation

A bar plot was created with ggplot2 to show the ten most harmful types of events across the United States. The code is shown below; the plot itself appears in the Results section.

g <- ggplot(health[1:10, ], aes(x = reorder(EVTYPE, -total_n), y = total_n, fill = EVTYPE))
g + geom_bar(stat = "identity") +
  scale_fill_manual(values = c("#ef946c", "#4f6d7a", "#758e4f", "#f2f3ae", "#ffcb69", 
                               "#d08c60", "#997B66", "#845a6d", "#BDAA9D", "#ffc176")) + 
  labs(title = "Most harmful types of events across the United States", 
       x = "Event type", y = "Damage to population health (Fatalities + Injuries)") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

A second plot was created to show the economic impact by event type across the United States. The code is shown below, and the plot appears in the Results section.

h <- ggplot(econ[1:10, ], aes(x = reorder(EVTYPE, -comb_total), y = comb_total/1000000, fill = EVTYPE))
h + geom_bar(stat = "identity") +
  scale_fill_manual(values = c("#ef946c", "#4f6d7a", "#758e4f", "#F2F3AE", "#FFCB69", 
                               "#D08C60", "#997B66", "#845a6d", "#BDAA9D", "#ffc176")) + 
  labs(title = "Greatest economic impact by type of event across the United States", 
       x = "Event type", y = "Economical Impact (Millions)") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))



Results

Now that the data has been processed, the project questions will be answered.

Across the US, which types of events are most harmful with respect to population health?

After processing and plotting the data, tornadoes were clearly the most harmful event type for public health during the period analyzed, accounting for 5,633 fatalities and 91,346 injuries (96,979 in total). The plot is shown below.

Public health impact of severe weather events. The total damage to public health was estimated as the sum of fatalities and injuries per event type, shown on the y-axis of the bar plot.



Across the US, which types of events have the greatest economic consequences?

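The type of severe weather event with the greatest economic impact during the period analyzed was wildfires (recorded as WILD FIRES), with total estimated damage of 624.1 million USD. Because only the “K” and “M” exponent codes were converted during data processing, any event type with a record carrying a blank or other code had that damage component counted as 0, so this ranking should be interpreted with caution. The plot is shown below.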

Economic impact of severe weather events. The total economic impact was estimated as the sum of property and crop damage, shown on the y-axis of the bar plot in millions of USD.



Conclusions

Tornadoes caused the most damage to public health, followed by excessive heat, thunderstorm winds, floods, and lightning. Among the event types for which damage totals could be computed, wildfires were the leading cause of economic damage, followed by hailstorms, excessive wetness, high winds combined with cold, river flooding, and major flooding.

Many of these events are expected to become more frequent or intense as the climate changes [1]. Action is needed to address their impact: to reduce or prevent shortages of medical services, to make post-event clean-up and recovery activities safer, and to strengthen emergency services infrastructure and housing so that buildings better withstand storms and floods.



References

  1. Runkle J, Svendsen ER, Hamann M, Kwok RK, Pearce J. Population Health Adaptation Approaches to the Increasing Severity and Frequency of Weather-Related Disasters Resulting From our Changing Climate: A Literature Review and Application to Charleston, South Carolina. Curr Environ Health Rep. 2018;5(4):439–52.

  2. Danielle X. Morales, Sara E. Grineski and TWC. Effectiveness of National Weather Service Heat Alerts in Preventing Mortality in 20 US Cities. Physiol Behav. 2016;176(1):139–48.

  3. Lane K, Charles-Guzman K, Wheeler K, Abid Z, Graber N, Matte T. Health effects of coastal storms and flooding in urban areas: A review and vulnerability assessment. J Environ Public Health. 2013;2013.