Introduction: Severe weather events are an important cause of public health and economic damage. Global warming is leading to an increase in the frequency and intensity of severe weather events.
Materials and methods: The U.S. National Oceanic and Atmospheric Administration (NOAA) storm database was processed with R version 4.0.2 in RStudio.
Analysis: The objective of this project is to identify the types of severe weather events that have had the greatest impact on the US economy and public health. The NOAA storm database contains data from 1950 to November 2011. The variables needed for this analysis were selected, converted into suitable types, grouped by event type and separated into different data frames. The total public health impact was estimated as the sum of fatalities and injuries by event type.
The total economic impact was estimated as the sum of property and crop damage by event type. Plots were created to visualize the ten types of severe weather events with the greatest public health and economic impact.
Results: During the period analyzed, tornadoes had the greatest impact on public health, and wildfires had the greatest economic impact.
Conclusion: Global warming will continue to increase the frequency and intensity of severe weather events. Action must be taken to reduce their impact on the United States economy and public health.
Climate change and its impact have become a leading threat to the nation’s health. The increased temperature of the earth’s surface, air and water has led to a higher intensity and frequency of precipitation, storms, hurricanes, floods, droughts and associated wildfires [1].
There is a well-established association between high ambient temperature and higher rates of mortality in the US and around the world [2].
Health outcomes from severe weather events can arise from multiple situations [3]:
The economic impact of severe weather events can be measured by the damage to property and crops. The National Weather Service makes a best guess using all available data at the time of publication, and the damage amounts are received from a variety of sources. Property and crop damage reported in the NOAA database should therefore be considered broad estimates.
In this project, the public health impact and the economic damage caused by severe weather events are evaluated.
For this project the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database has been used. This database tracks characteristics of major storm and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Descriptive statistics were used for the analysis. Frequency tables were created to address the research questions.
R version 4.0.2, run through RStudio, was used for the data processing and analysis. The session information at the time this report was produced is shown below.
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Spanish_Mexico.1252 LC_CTYPE=Spanish_Mexico.1252
## [3] LC_MONETARY=Spanish_Mexico.1252 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Mexico.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.28 R6_2.5.1 jsonlite_1.7.2 magrittr_2.0.1
## [5] evaluate_0.14 rlang_0.4.12 stringi_1.7.5 jquerylib_0.1.4
## [9] bslib_0.3.1 rmarkdown_2.11 tools_4.0.2 stringr_1.4.0
## [13] xfun_0.26 yaml_2.2.1 fastmap_1.1.0 compiler_4.0.2
## [17] htmltools_0.5.2 knitr_1.36 sass_0.4.0
The objective of this project is to determine the impact of storms and severe weather events that took place in the United States between 1950 and November 2011 by answering the following questions:
To answer these questions, the NOAA database was processed to obtain the relevant data.
For data processing, the packages tidyverse and magrittr were used.
library(tidyverse)
library(magrittr)
The URL of the file was saved in an object called url, and an if/else statement was used to download the file only when it does not already exist in the working directory.
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "./Data/repData_data_StormData.csv.bz2"
if (file.exists(destfile)) {
  message("The file was already downloaded.")
} else {
  # Create the target directory if needed, then download in binary mode
  dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
  download.file(url, destfile = destfile, mode = "wb")
  # Report whether the file is now present
  if (file.exists(destfile)) {
    message("File downloaded successfully")
  } else {
    message("Error downloading file")
  }
}
## The file was already downloaded.
The data were read into a data frame called data, which was then converted into a tibble for easier viewing.
data <- read.csv("./Data/repData_data_StormData.csv.bz2", header = TRUE)
data <- as_tibble(data)
head(data)
## # A tibble: 6 x 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/1950~ 0130 CST 97 MOBILE AL TORNA~ 0
## 2 1 4/18/1950~ 0145 CST 3 BALDWIN AL TORNA~ 0
## 3 1 2/20/1951~ 1600 CST 57 FAYETTE AL TORNA~ 0
## 4 1 6/8/1951 ~ 0900 CST 89 MADISON AL TORNA~ 0
## 5 1 11/15/195~ 1500 CST 43 CULLMAN AL TORNA~ 0
## 6 1 11/15/195~ 2000 CST 77 LAUDERDALE AL TORNA~ 0
## # ... with 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## # END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## # END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <int>,
## # MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## # PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## # STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## # LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>
The data frame contains 902,297 observations of 37 variables.
The structure of the variables is as follows:
str(data)
## tibble [902,297 x 37] (S3: tbl_df/tbl/data.frame)
## $ STATE__ : num [1:902297] 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr [1:902297] "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
## $ COUNTY : num [1:902297] 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr [1:902297] "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr [1:902297] "" "" "" "" ...
## $ BGN_LOCATI: chr [1:902297] "" "" "" "" ...
## $ END_DATE : chr [1:902297] "" "" "" "" ...
## $ END_TIME : chr [1:902297] "" "" "" "" ...
## $ COUNTY_END: num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi [1:902297] NA NA NA NA NA NA ...
## $ END_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr [1:902297] "" "" "" "" ...
## $ END_LOCATI: chr [1:902297] "" "" "" "" ...
## $ LENGTH : num [1:902297] 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num [1:902297] 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int [1:902297] 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num [1:902297] 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num [1:902297] 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num [1:902297] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
## $ CROPDMG : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr [1:902297] "" "" "" "" ...
## $ WFO : chr [1:902297] "" "" "" "" ...
## $ STATEOFFIC: chr [1:902297] "" "" "" "" ...
## $ ZONENAMES : chr [1:902297] "" "" "" "" ...
## $ LATITUDE : num [1:902297] 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num [1:902297] 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num [1:902297] 3051 0 0 0 0 ...
## $ LONGITUDE_: num [1:902297] 8806 0 0 0 0 ...
## $ REMARKS : chr [1:902297] "" "" "" "" ...
## $ REFNUM : num [1:902297] 1 2 3 4 5 6 7 8 9 10 ...
Some variables were redefined as factors for easier data manipulation. The modified variables were STATE__, COUNTY, COUNTYNAME, STATE and EVTYPE.
charvars <- c("STATE__", "COUNTY", "COUNTYNAME", "STATE", "EVTYPE")
data %<>% mutate_at(charvars, factor)
str(data[, 1:8])
## tibble [902,297 x 8] (S3: tbl_df/tbl/data.frame)
## $ STATE__ : Factor w/ 70 levels "1","2","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr [1:902297] "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
## $ COUNTY : Factor w/ 557 levels "0","1","2","3",..: 98 4 58 90 44 78 10 124 126 58 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
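As a minimal sketch of what this conversion does, the base R lines below turn selected columns of a small data frame into factors and leave the rest untouched. The toy data frame and its values are made up for illustration and are not taken from the storm data.

```r
# Hypothetical toy data, not taken from the NOAA file
toy <- data.frame(
  STATE = c("AL", "AL", "TX"),
  EVTYPE = c("TORNADO", "HAIL", "TORNADO"),
  INJURIES = c(15, 0, 2),
  stringsAsFactors = FALSE
)

charvars <- c("STATE", "EVTYPE")

# Convert only the listed columns; the other columns keep their type
toy[charvars] <- lapply(toy[charvars], factor)

str(toy)
nlevels(toy$EVTYPE)
```

Each distinct spelling becomes its own factor level, which is why EVTYPE ends up with 985 levels in the output above.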
BGN_DATE and END_DATE were converted into Date format (YYYY-MM-DD) for easier manipulation and plotting. The time component of those values was dropped, as the time is stored in separate variables.
data$BGN_DATE <- as.Date(data$BGN_DATE, "%m/%d/%Y")
data$END_DATE <- as.Date(data$END_DATE, "%m/%d/%Y")
str(data[, c(2, 12)])
## tibble [902,297 x 2] (S3: tbl_df/tbl/data.frame)
## $ BGN_DATE: Date[1:902297], format: "1950-04-18" "1950-04-18" ...
## $ END_DATE: Date[1:902297], format: NA NA ...
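The behaviour of this conversion can be checked on a few strings. The sketch below uses made-up input values in the same layout as BGN_DATE. It shows that the trailing time text is ignored by the "%m/%d/%Y" format and that an empty string becomes NA, which would explain the NA values in END_DATE.

```r
# Made-up strings in the same layout as the BGN_DATE and END_DATE columns
raw_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "")

# Parse month/day/year; anything after the year is ignored
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

print(parsed)
is.na(parsed)
```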
The variables needed for the analysis were selected from the original data frame and stored in a new data frame called data2. The selected variables were STATE, COUNTYNAME, BGN_DATE, EVTYPE, END_DATE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
data2 <- data %>%
select(STATE, COUNTYNAME, BGN_DATE, EVTYPE, END_DATE, FATALITIES,
INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(data2)
## # A tibble: 6 x 11
## STATE COUNTYNAME BGN_DATE EVTYPE END_DATE FATALITIES INJURIES PROPDMG
## <fct> <fct> <date> <fct> <date> <dbl> <dbl> <dbl>
## 1 AL MOBILE 1950-04-18 TORNADO NA 0 15 25
## 2 AL BALDWIN 1950-04-18 TORNADO NA 0 0 2.5
## 3 AL FAYETTE 1951-02-20 TORNADO NA 0 2 25
## 4 AL MADISON 1951-06-08 TORNADO NA 0 2 2.5
## 5 AL CULLMAN 1951-11-15 TORNADO NA 0 2 2.5
## 6 AL LAUDERDALE 1951-11-15 TORNADO NA 0 6 2.5
## # ... with 3 more variables: PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>
The observations in data2 were grouped by event type, and the fatalities and injuries were summed within each group.
health <- data2 %>%
group_by(EVTYPE) %>%
summarize(fatalities_n = sum(FATALITIES), injuries_n = sum(INJURIES))
tail(health)
## # A tibble: 6 x 3
## EVTYPE fatalities_n injuries_n
## <fct> <dbl> <dbl>
## 1 WINTER WEATHER/MIX 28 72
## 2 WINTERY MIX 0 0
## 3 Wintry mix 0 0
## 4 Wintry Mix 0 0
## 5 WINTRY MIX 1 77
## 6 WND 0 0
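The output above lists "Wintry mix", "Wintry Mix" and "WINTRY MIX" as separate event types, because grouping treats each spelling as a distinct label. The sketch below shows one possible way to merge such variants before summing, by trimming whitespace and converting to upper case. It is an optional step on made-up data, it was not applied in this analysis, and applying it would change the totals reported here. The column name EVTYPE_clean is hypothetical.

```r
# Made-up rows that imitate the spelling variants seen in EVTYPE
toy <- data.frame(
  EVTYPE = c("WINTRY MIX", "Wintry Mix", "Wintry mix", " TSTM WIND", "TSTM WIND"),
  FATALITIES = c(1, 0, 0, 0, 2),
  INJURIES = c(77, 0, 0, 1, 5),
  stringsAsFactors = FALSE
)

# Normalize labels: drop surrounding whitespace and use upper case
toy$EVTYPE_clean <- toupper(trimws(toy$EVTYPE))

# Sum fatalities and injuries within each normalized event type
health_toy <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE_clean,
                        data = toy, FUN = sum)
print(health_toy)
```

Five raw labels collapse into two groups here. This simple normalization would not merge abbreviations such as TSTM WIND and THUNDERSTORM WIND, which would need an explicit mapping.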
With the total number of fatalities and injuries for every event type in one data frame, only the event types with one or more fatalities or injuries were kept.
health <- health %>% filter(fatalities_n != 0 | injuries_n != 0)
head(health)
## # A tibble: 6 x 3
## EVTYPE fatalities_n injuries_n
## <fct> <dbl> <dbl>
## 1 AVALANCE 1 0
## 2 AVALANCHE 224 170
## 3 BLACK ICE 1 24
## 4 BLIZZARD 101 805
## 5 blowing snow 1 1
## 6 BLOWING SNOW 1 13
A new variable, total_n, was added with the sum of fatalities and injuries per event type.
health <- health %>% mutate(total_n = fatalities_n + injuries_n)
head(health)
## # A tibble: 6 x 4
## EVTYPE fatalities_n injuries_n total_n
## <fct> <dbl> <dbl> <dbl>
## 1 AVALANCE 1 0 1
## 2 AVALANCHE 224 170 394
## 3 BLACK ICE 1 24 25
## 4 BLIZZARD 101 805 906
## 5 blowing snow 1 1 2
## 6 BLOWING SNOW 1 13 14
Then the data frame was sorted in decreasing order of the new variable.
# Row positions sorted by total_n; named ord to avoid masking base::order
ord <- order(health$total_n, decreasing = TRUE)
health <- health[ord, ]
health[1:10,]
## # A tibble: 10 x 4
## EVTYPE fatalities_n injuries_n total_n
## <fct> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
The table above shows the 10 event types that accounted for the greatest combined number of fatalities and injuries.
To estimate the economic cost by event type, the following variables were used: PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
To get the total damage cost, the variables PROPDMG and CROPDMG were multiplied by the exponent variables PROPDMGEXP and CROPDMGEXP, which were first converted from characters to their corresponding numeric values (“K” to one thousand, “M” to one million).
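# Swap the magnitude letters for their multipliers (case-insensitive)
data2$PROPDMGEXP <- gsub("K", 1000, data2$PROPDMGEXP, ignore.case = TRUE)
data2$PROPDMGEXP <- gsub("M", 1000000, data2$PROPDMGEXP, ignore.case = TRUE)
data2$CROPDMGEXP <- gsub("K", 1000, data2$CROPDMGEXP, ignore.case = TRUE)
data2$CROPDMGEXP <- gsub("M", 1000000, data2$CROPDMGEXP, ignore.case = TRUE)
# gsub() returns text, so convert back to numbers; any value that is not
# numeric text at this point, such as a blank entry, becomes NA
data2$PROPDMGEXP <- as.numeric(data2$PROPDMGEXP)
data2$CROPDMGEXP <- as.numeric(data2$CROPDMGEXP)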
Two new variables were created, each holding the damage amount multiplied by its exponent value.
data2$prop_subtotal <- data2$PROPDMG * data2$PROPDMGEXP
data2$crop_subtotal <- data2$CROPDMG * data2$CROPDMGEXP
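An alternative way to convert the exponent codes is a named lookup vector, sketched below on made-up values. The helper name exp_to_multiplier is hypothetical, and the sketch rests on two assumptions that are not part of this analysis: that "B" stands for billions, and that a blank or unrecognized code means the amount is already in dollars (a multiplier of 1). Using it in place of the conversion above would change the totals reported in this document.

```r
# Hypothetical helper: map magnitude codes to numeric multipliers.
# Assumptions: "B" means billions; blank or unknown codes count as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Upper-case the codes so "k" and "K" match the same entry
  mult <- unname(lookup[toupper(code)])
  # Codes that are missing from the lookup come back as NA
  mult[is.na(mult)] <- 1
  mult
}

# Made-up amounts and codes for illustration
amount <- c(25, 2.5, 10, 1.2, 300)
code <- c("K", "m", "", "B", "?")

subtotal <- amount * exp_to_multiplier(code)
print(subtotal)
```

With this approach no subtotal is NA, so later sums are not affected by missing values.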
The modified data frame was then grouped by event type to summarize the total cost of property and crop damage.
econ <- data2 %>%
group_by(EVTYPE) %>%
summarize(prop_total = sum(prop_subtotal), crop_total = sum(crop_subtotal))
head(econ)
## # A tibble: 6 x 3
## EVTYPE prop_total crop_total
## <fct> <dbl> <dbl>
## 1 " HIGH SURF ADVISORY" 200000 NA
## 2 " COASTAL FLOOD" NA NA
## 3 " FLASH FLOOD" 50000 NA
## 4 " LIGHTNING" NA NA
## 5 " TSTM WIND" NA NA
## 6 " TSTM WIND (G45)" 8000 NA
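The NA totals in this output follow from how sum() treats missing values: a single NA subtotal within an event type makes the whole total NA unless na.rm = TRUE is passed. A minimal sketch with made-up numbers:

```r
# Made-up subtotals for one event type, with one missing value
prop_subtotal <- c(25000, NA, 2500)

# Default behaviour: the missing value propagates to the total
total_default <- sum(prop_subtotal)

# With na.rm = TRUE the missing value is skipped
total_na_rm <- sum(prop_subtotal, na.rm = TRUE)

print(total_default)
print(total_na_rm)
```

The summary above uses the default behaviour, so an event type keeps a numeric total only when every one of its subtotals is non-missing.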
Totals that were missing (NA) were set to zero, event types with zero property damage and zero crop damage were removed, and a new column with the combined total of crop and property damage was added.
prop_na <- is.na(econ$prop_total)
crop_na <- is.na(econ$crop_total)
econ[prop_na, 2] <- 0
econ[crop_na, 3] <- 0
econ <- econ %>% filter(prop_total != 0 | crop_total != 0)
econ <- econ %>% mutate(comb_total = prop_total + crop_total)
head(econ)
## # A tibble: 6 x 4
## EVTYPE prop_total crop_total comb_total
## <fct> <dbl> <dbl> <dbl>
## 1 " HIGH SURF ADVISORY" 200000 0 200000
## 2 " FLASH FLOOD" 50000 0 50000
## 3 " TSTM WIND (G45)" 8000 0 8000
## 4 "?" 5000 0 5000
## 5 "APACHE COUNTY" 5000 0 5000
## 6 "ASTRONOMICAL LOW TIDE" 320000 0 320000
The resulting data frame was then sorted in decreasing order of combined total cost.
# Row positions sorted by comb_total; named ord to avoid masking base::order
ord <- order(econ$comb_total, decreasing = TRUE)
econ <- econ[ord, ]
econ[1:10, ]
## # A tibble: 10 x 4
## EVTYPE prop_total crop_total comb_total
## <fct> <dbl> <dbl> <dbl>
## 1 WILD FIRES 624100000 0 624100000
## 2 HAILSTORM 241000000 0 241000000
## 3 EXCESSIVE WETNESS 0 142000000 142000000
## 4 HIGH WINDS/COLD 110500000 7000000 117500000
## 5 River Flooding 106155000 0 106155000
## 6 MAJOR FLOOD 105000000 0 105000000
## 7 COLD AND WET CONDITIONS 0 66000000 66000000
## 8 WINTER STORM HIGH WINDS 60000000 5000000 65000000
## 9 HURRICANE EMILY 50000000 0 50000000
## 10 Early Frost 0 42000000 42000000
A bar plot was created with ggplot2 to visualize the ten most harmful types of events across the United States. The code that generates the plot is shown below; the plot itself appears in the Results section.
g <- ggplot(health[1:10, ], aes(x = reorder(EVTYPE, -total_n), y = total_n, fill = EVTYPE))
g + geom_bar(stat = "identity") +
scale_fill_manual(values = c("#ef946c", "#4f6d7a", "#758e4f", "#f2f3ae", "#ffcb69",
"#d08c60", "#997B66", "#845a6d", "#BDAA9D", "#ffc176")) +
labs(title = "Most harmful types of events across the United States",
x = "Event type", y = "Damage to population health (Fatalities + Injuries)") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
Another plot was created to visualize the economic impact by type of event across the United States. The code is shown below; the plot itself appears in the Results section.
h <- ggplot(econ[1:10, ], aes(x = reorder(EVTYPE, -comb_total), y = comb_total/1000000, fill = EVTYPE))
h + geom_bar(stat = "identity") +
scale_fill_manual(values = c("#ef946c", "#4f6d7a", "#758e4f", "#F2F3AE", "#FFCB69",
"#D08C60", "#997B66", "#845a6d", "#BDAA9D", "#ffc176")) +
labs(title = "Greatest economical impact by type of events across the United States",
x = "Event type", y = "Economical Impact (Millions)") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
Now that the data has been processed, the project questions will be answered.
After processing and plotting the data, it is evident that the most harmful events for public health during the period analyzed were tornadoes, which accounted for a total of 5,633 fatalities and 91,346 injuries. The plot is shown below.
Public health impact of severe weather events. The total damage to public health was estimated as the sum of fatalities and injuries per event type, shown on the Y-axis of the bar plot.
The type of severe weather event with the greatest economic impact during the period analyzed was wildfires, with a total estimated damage of 624.1 million dollars. The plot is shown below.
Economic impact of severe weather events. The total economic impact was estimated as the sum of property and crop damage, shown on the Y-axis of the bar plot in millions of USD.
Tornadoes caused the most damage to public health, followed by excessive heat, thunderstorm winds, floods and lightning. Wildfires were the leading cause of economic damage, followed by hailstorms, excessive wetness, high winds with cold, river flooding and major flooding.
These events are associated with global warming and climate change. Action is needed to address the impact of severe weather events: to reduce or prevent shortages of medical services, to provide more safety during the response that follows a severe weather event, and to strengthen the infrastructure of emergency services and housing so that damage from storms and floods is limited.
1. Runkle J, Svendsen ER, Hamann M, Kwok RK, Pearce J. Population Health Adaptation Approaches to the Increasing Severity and Frequency of Weather-Related Disasters Resulting From our Changing Climate: A Literature Review and Application to Charleston, South Carolina. Curr Environ Heal reports. 2018;5(4):439–52.
2. Danielle X. Morales, Sara E. Grineski and TWC. Effectiveness of National Weather Service Heat Alerts in Preventing Mortality in 20 US Cities. Physiol Behav. 2016;176(1):139–48.
3. Lane K, Charles-Guzman K, Wheeler K, Abid Z, Graber N, Matte T. Health effects of coastal storms and flooding in urban areas: A review and vulnerability assessment. J Environ Public Health. 2013;2013.