Severe weather events can have population health and economic consequences. In this analysis we have explored the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine across the United States which types of events are most harmful with respect to population health and which ones have the greatest economic consequences. The results indicate that from 2001 to 2011 tornadoes had the greatest population health consequences with an average of 1,408 injuries and 105 fatalities per year. On the other hand, the results indicate that in the same time range floods had the greatest economic consequences with an average of $12.5 billion in damage per year.
The following libraries need to be loaded in order to reproduce the analysis.
library(ggplot2)
library(gridExtra)
library(grid)
Version information about R, the OS and attached or loaded packages used for the analysis.
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] gridExtra_2.3 ggplot2_3.3.2
##
## loaded via a namespace (and not attached):
## [1] knitr_1.30 magrittr_2.0.1 tidyselect_1.1.0 munsell_0.5.0
## [5] colorspace_2.0-0 R6_2.5.0 rlang_0.4.9 stringr_1.4.0
## [9] dplyr_1.0.2 tools_4.0.3 gtable_0.3.0 xfun_0.19
## [13] withr_2.3.0 htmltools_0.5.0 ellipsis_0.3.1 yaml_2.2.1
## [17] digest_0.6.27 tibble_3.0.4 lifecycle_0.2.0 crayon_1.3.4
## [21] purrr_0.3.4 vctrs_0.3.5 glue_1.4.2 evaluate_0.14
## [25] rmarkdown_2.5 stringi_1.5.3 compiler_4.0.3 pillar_1.4.7
## [29] generics_0.1.0 scales_1.1.1 pkgconfig_2.0.3
This analysis was based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database: Storm Data. The database comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The function read.csv was used to decompress and read the data.
storm_data <- read.csv("repdata_data_StormData.csv.bz2",stringsAsFactors = FALSE)
In order to work with a lighter database, only the variables needed were kept.
storm_data <- storm_data[ , c(
"BGN_DATE", # begin date of the event
"EVTYPE", # type of event
"INJURIES", # number of injuries caused by the event
"FATALITIES", # number of fatalities caused by the event
"PROPDMG", # property damage caused by the event
"PROPDMGEXP", # magnitude of the property damage value
"CROPDMG", # crop damage caused by the event
"CROPDMGEXP", # magnitude of the crops damage value
"REFNUM" # unique identifier of the event
)]
The database has severe weather events information from 1950 to 2011, but for the earlier years there are fewer events recorded. Only the years from 2001 to 2011 were considered for the analysis to make sure a more complete data were used.
storm_data$Dates <- as.POSIXct(storm_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
storm_data$Year <- format(storm_data$Dates, format="%Y")
storm_data <- subset(storm_data, Year >= 2001 & Year <= 2011)
Base on the Storm Data Documentation there are 48 types of events. Nevertheless, there are 169 different values in the event type variable (EVTYPE).
length(unique(storm_data$EVTYPE))
## [1] 169
Because of this discrepancy, this variable must be cleaned. The first thing that was done was to turn this variable into upper case, to combine the lower case and upper case versions of event types into one. Then adjustments were made to correct misspellings and to assign the most similar event type based on the data documentation.
storm_data$EVTYPE = toupper(storm_data$EVTYPE)
storm_data$EVTYPE[(grepl('TSTM', storm_data$EVTYPE) | grepl('THUNDERSTORM', storm_data$EVTYPE)) & !grepl('MARINE', storm_data$EVTYPE) & !grepl('NON', storm_data$EVTYPE)] <- 'THUNDERSTORM WIND'
storm_data$EVTYPE[(grepl('TSTM', storm_data$EVTYPE) | grepl('THUNDERSTORM', storm_data$EVTYPE)) & grepl('MARINE', storm_data$EVTYPE)] <- 'MARINE THUNDERSTORM WIND'
storm_data$EVTYPE[grepl('WINT', storm_data$EVTYPE) & !grepl('STORM', storm_data$EVTYPE)] <- 'WINTER WEATHER'
storm_data$EVTYPE[grepl('FLD', storm_data$EVTYPE)] <- 'FLOOD'
storm_data$EVTYPE[grepl('LAND', storm_data$EVTYPE) | grepl('SLIDE', storm_data$EVTYPE)] <- 'DEBRIS FLOW'
storm_data$EVTYPE[grepl('FIRE', storm_data$EVTYPE)] <- 'WILDFIRE'
storm_data$EVTYPE[grepl('SURF', storm_data$EVTYPE)] <- 'HIGH SURF'
storm_data$EVTYPE[grepl('DRY', storm_data$EVTYPE)] <- 'DROUGHT'
storm_data$EVTYPE[grepl('FR', storm_data$EVTYPE)] <- 'FROST/FREEZE'
storm_data$EVTYPE[grepl('HURRICANE', storm_data$EVTYPE)] <- 'HURRICANE/TYPHOON'
storm_data$EVTYPE[grepl('ICE', storm_data$EVTYPE) | grepl('ICT', storm_data$EVTYPE)] <- 'ICE STORM'
storm_data$EVTYPE[grepl('RIP', storm_data$EVTYPE) & grepl('CURRENT', storm_data$EVTYPE)] <- 'RIP CURRENT'
storm_data$EVTYPE[grepl('SNOW', storm_data$EVTYPE) & !grepl('HEAVY', storm_data$EVTYPE) & !grepl('LAKE', storm_data$EVTYPE)] <- 'WINTER WEATHER'
These adjustments have reduced to only 110 the different values in the event type variable (EVTYPE), narrowing the impact of the discrepancies.
length(unique(storm_data$EVTYPE))
## [1] 110
There are two variables for the health consequences of severe weather events: injuries (INJURIES) and fatalities (FATALITIES). A new variable was created with the total cases (TOTAFF). A new set of tables were created with the total cases per type of event and year and then another set of new tables were created with the average annual cases per type of event.
storm_data$TOTAFF <- storm_data$INJURIES + storm_data$FATALITIES
injuries_evtype_year_sum <- setNames(aggregate(INJURIES ~ EVTYPE + Year, data = storm_data, sum, na.rm = TRUE),c("EVTYPE", "YEAR","SUM_INJURIES"))
fatalities_evtype_year_sum <- setNames(aggregate(FATALITIES ~ EVTYPE + Year, data = storm_data, sum, na.rm = TRUE),c("EVTYPE", "YEAR","SUM_FATALITIES"))
totaff_evtype_year_sum <- setNames(aggregate(TOTAFF ~ EVTYPE + Year, data = storm_data, sum, na.rm = TRUE),c("EVTYPE", "YEAR","SUM_TOTAFF"))
injuries_evtype_avg <- setNames(aggregate(SUM_INJURIES ~ EVTYPE, data = injuries_evtype_year_sum, mean, na.rm = TRUE),c("EVTYPE", "AVG_INJURIES"))
fatalities_evtype_avg <- setNames(aggregate(SUM_FATALITIES ~ EVTYPE, data = fatalities_evtype_year_sum, mean, na.rm = TRUE),c("EVTYPE", "AVG_FATALITIES"))
totaff_evtype_avg <- setNames(aggregate(SUM_TOTAFF ~ EVTYPE, data = totaff_evtype_year_sum, mean, na.rm = TRUE),c("EVTYPE", "AVG_TOTAFF"))
There are two variables for the economic consequences of severe weather events: property damage (PROPDMG) and crop damage (CROPDMG). For each one of them there is another variable with the magnitude of the value (PROPDMGEXP and CROPDMGEXP). For example, if an event has a “10” in the PROPDMG variable and a “M” in the PROPDMGEXP variable, the real property damage would be 10 millions. Because of this the economic consequences variables must be adjusted to reflect their real value. Two new variables were created reflecting this (PROPDMG2 and CROPDMG2) and a variable with the total economic consequences (TOTDMG). The variables were divided by 1,000,000,000 in order to read them as billions.
storm_data$PROPDMGEXP2 <-
ifelse(
toupper(storm_data$PROPDMGEXP) == "H",
100,
ifelse(
toupper(storm_data$PROPDMGEXP) == "K",
1000,
ifelse(
toupper(storm_data$PROPDMGEXP) == "M",
1000000,
ifelse(
toupper(storm_data$PROPDMGEXP) == "B",
1000000000,
1))))
storm_data$CROPDMGEXP2 <-
ifelse(
toupper(storm_data$CROPDMGEXP) == "H",
100,
ifelse(
toupper(storm_data$CROPDMGEXP) == "K",
1000,
ifelse(
toupper(storm_data$CROPDMGEXP) == "M",
1000000,
ifelse(
toupper(storm_data$CROPDMGEXP) == "B",
1000000000,
1))))
storm_data$PROPDMG2 <- storm_data$PROPDMG * storm_data$PROPDMGEXP2 / 1000000000
storm_data$CROPDMG2 <- storm_data$CROPDMG * storm_data$CROPDMGEXP2 / 1000000000
storm_data$TOTDMG <- storm_data$PROPDMG2 + storm_data$CROPDMG2
A new set of tables were created with the total damage per type of event and year and then another set of new tables were created with the average annual damage per type of event.
propdmg_evtype_year_sum <- setNames(aggregate(PROPDMG2 ~ EVTYPE + Year, data = storm_data, sum, na.rm = TRUE),c("EVTYPE", "YEAR","SUM_PROPDMG"))
cropdmg_evtype_year_sum <- setNames(aggregate(CROPDMG2 ~ EVTYPE + Year, data = storm_data, sum, na.rm = TRUE),c("EVTYPE", "YEAR","SUM_CROPDMG"))
totdmg_evtype_year_sum <- setNames(aggregate(TOTDMG ~ EVTYPE + Year, data = storm_data, sum, na.rm = TRUE),c("EVTYPE", "YEAR","SUM_TOTDMG"))
propdmg_evtype_avgyear <- setNames(aggregate(SUM_PROPDMG ~ EVTYPE, data = propdmg_evtype_year_sum, mean, na.rm = TRUE),c("EVTYPE", "AVG_PROPDMG"))
cropdmg_evtype_avgyear <- setNames(aggregate(SUM_CROPDMG ~ EVTYPE, data = cropdmg_evtype_year_sum, mean, na.rm = TRUE),c("EVTYPE", "AVG_CROPDMG"))
totdmg_evtype_avgyear <- setNames(aggregate(SUM_TOTDMG ~ EVTYPE, data = totdmg_evtype_year_sum, mean, na.rm = TRUE),c("EVTYPE", "AVG_TOTDMG"))
The results indicate that from 2001 to 2011 tornadoes were the most harmful with respect to population health, with an average of 1,303 injuries and 105 fatalities per year. The top 5 type of events in terms of injuries and fatalities are shown in the following figure.
plot1 <- ggplot(totaff_evtype_avg[order(-totaff_evtype_avg$AVG_TOTAFF),][1:5,], aes(x=reorder(EVTYPE,AVG_TOTAFF), y=AVG_TOTAFF)) +
geom_bar(stat = "identity", fill="coral4")+
geom_text(aes(label = round(AVG_TOTAFF,digits=0)), hjust = 1.2, colour = "white")+
coord_flip()+
theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5), plot.subtitle = element_text(hjust = 0.5), axis.title.x = element_blank(), axis.title.y = element_blank())+
labs(title = "Injuries + Fatalities",
subtitle = "Top 5 events by average annual injuries + fatalities")
plot2 <- ggplot(injuries_evtype_avg[order(-injuries_evtype_avg$AVG_INJURIES),][1:5,], aes(x=reorder(EVTYPE,AVG_INJURIES), y=AVG_INJURIES)) +
geom_bar(stat = "identity", fill="coral")+
geom_text(aes(label = round(AVG_INJURIES,digits=0)), hjust = 1.2, colour = "white")+
coord_flip()+
theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5), plot.subtitle = element_text(size = 8, hjust = 0.5), axis.title.x = element_blank(), axis.title.y = element_blank())+
labs(title = "Injuries",
subtitle = "Top 5 events by average annual injuries")
plot3 <- ggplot(fatalities_evtype_avg[order(-fatalities_evtype_avg$AVG_FATALITIES),][1:5,], aes(x=reorder(EVTYPE,AVG_FATALITIES), y=AVG_FATALITIES)) +
geom_bar(stat = "identity", fill="coral")+
geom_text(aes(label = round(AVG_FATALITIES,digits=0)), hjust = 1.2, colour = "white")+
coord_flip()+
theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5), plot.subtitle = element_text(size = 8, hjust = 0.5), axis.title.x = element_blank(), axis.title.y = element_blank())+
labs(title = "Fatalities",
subtitle = "Top 5 events by average annual fatalities")
lay <- rbind(c(1,1),
c(2,3))
layout_matrix = lay
figure1 <- grid.arrange(plot1,plot2,plot3,layout_matrix = lay,bottom = textGrob("Source: U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database \nNote: Events from 2001 to 2011",gp = gpar(fontface = 1, fontsize = 9),hjust = 0,x = 0.01))
grid.rect(gp=gpar(fill=NA))
The results indicate that from 2001 to 2011 floods had the greatest economic consequences with an average of $12.5 billion in damage per year. The top 5 type of events in terms of property and crop damages are shown in the following figure.
plot4 <- ggplot(totdmg_evtype_avgyear[order(-totdmg_evtype_avgyear$AVG_TOTDMG),][1:5,], aes(x=reorder(EVTYPE,AVG_TOTDMG), y=AVG_TOTDMG)) +
geom_bar(stat = "identity", fill="green4")+
geom_text(aes(label = sprintf("%0.1f", round(AVG_TOTDMG, digits = 1))), hjust = 1.2, colour = "white")+
coord_flip()+
theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5), plot.subtitle = element_text(hjust = 0.5), axis.title.x = element_blank(), axis.title.y = element_blank())+
labs(title = "Total Damage",
subtitle = "Top 5 events by average annual damage (in billion U.S. Dollars)")
plot5 <- ggplot(propdmg_evtype_avgyear[order(-propdmg_evtype_avgyear$AVG_PROPDMG),][1:5,], aes(x=reorder(EVTYPE,AVG_PROPDMG), y=AVG_PROPDMG)) +
geom_bar(stat = "identity", fill="green3")+
geom_text(aes(label = sprintf("%0.1f", round(AVG_PROPDMG, digits = 1))), hjust = 1.2, colour = "white")+
coord_flip()+
theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5), plot.subtitle = element_text(size = 8, hjust = 0.5), plot.caption = element_text(hjust = 1),axis.title.x = element_blank(), axis.title.y = element_blank())+
labs(title = "Property Damage",
subtitle = "Top 5 events by average annual damage \n (in billion U.S. Dollars)")
plot6 <- ggplot(cropdmg_evtype_avgyear[order(-cropdmg_evtype_avgyear$AVG_CROPDMG),][1:5,], aes(x=reorder(EVTYPE,AVG_CROPDMG), y=AVG_CROPDMG)) +
geom_bar(stat = "identity", fill="green3")+
geom_text(aes(label = sprintf("%0.1f", round(AVG_CROPDMG, digits = 1))), hjust = 1.2, colour = "white")+
coord_flip()+
theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5), plot.subtitle = element_text(size = 8, hjust = 0.5), axis.title.x = element_blank(), axis.title.y = element_blank())+
labs(title = "Crop Damage",
subtitle = "Top 5 events by average annual damage \n (in billion U.S. Dollars)")
lay <- rbind(c(1,1),
c(2,3))
layout_matrix = lay
figure2 <- grid.arrange(plot4,plot5,plot6,layout_matrix = lay,bottom = textGrob("Source: U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database \nNote: Events from 2001 to 2011",gp = gpar(fontface = 1, fontsize = 9),hjust = 0,x = 0.01))
grid.rect(gp=gpar(fill=NA))