The purpose of this report is to present the economic and public health consequences of severe weather events in the United States, using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers 1950 to 2012 (the analysis itself is restricted to 1996-2012, as explained below). To do this, we present a series of graphs and descriptive data. To interpret the variables, we rely on the storm data documentation: National Weather Service Storm Data Documentation
First, we download the Storm Data file and load it into the R environment under the name “tormenta” (Spanish for “storm”). We also load the libraries used throughout the analysis.
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)
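# Download the compressed file once; read.csv() decompresses .bz2 files automatically
urlfile <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("storm.csv.bz2")) download.file(urlfile, destfile = "storm.csv.bz2", mode = "wb")
tormenta <- read.csv("storm.csv.bz2")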
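The raw file is a bzip2-compressed CSV, and there is no need to decompress it before calling read.csv(): base R's file() connection detects bzip2 compression when reading. The sketch below checks this on a small, made-up data frame written to a temporary file; the column values are illustrative only.

```r
# Write a tiny toy data frame (assumed values) to a bzip2-compressed CSV
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0),
                  stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which decompresses transparently
roundtrip <- read.csv(path)
print(roundtrip)
```

The same mechanism is what lets the report read the downloaded file directly, whatever its extension.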
As described in the NOAA documentation, recording began in January 1950 and initially covered only one type of event: tornadoes. Starting in January 1996, many more event types were recorded, so to compare event types on an equal footing we keep only records from January 1996 onwards. To do this, we first convert the BGN_DATE variable (the date on which the event began) to the Date class.
tormenta$BGN_DATE <- as.Date(strptime(tormenta$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"))
# Keep events from 1996 onwards, when all event types began to be recorded
tormenta <- subset(tormenta, BGN_DATE >= as.Date("1996-01-01"))
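To make the date step concrete, here is a small sketch on toy BGN_DATE strings written in the month/day/year hour:min:sec layout used by the storm file (the specific dates and event types are assumptions). It converts the strings with strptime(), turns them into Date objects, and keeps only rows from 1996 onwards, comparing against a Date rather than a character string.

```r
# Toy records in the storm file's date layout (assumed values)
toy <- data.frame(BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "12/31/2011 0:00:00"),
                  EVTYPE = c("TORNADO", "WINTER STORM", "HIGH WIND"),
                  stringsAsFactors = FALSE)

# Parse the strings, then drop the time part by converting to Date
toy$BGN_DATE <- as.Date(strptime(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"))

# Keep only events that began on or after 1 January 1996
recent <- subset(toy, BGN_DATE >= as.Date("1996-01-01"))
print(recent)
```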
In this work we are only interested in aggregated data, so there is no point in keeping the geographical location variables (STATE, COUNTY, COUNTYNAME). We keep only the variables related to fatalities, injuries, and economic damage to property and crops. For the public health analysis, this means the year of the event, the event type (EVTYPE, renamed events), FATALITIES, and INJURIES, and we retain only records with at least one fatality or injury.
# Keep only records with at least one fatality or injury, and derive the year
tormenta_salud <- tormenta %>%
  filter(FATALITIES + INJURIES > 0) %>%
  mutate(año = year(BGN_DATE), events = as.character(EVTYPE)) %>%
  select(año, events, FATALITIES, INJURIES)
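The health filter amounts to "keep a row only if FATALITIES + INJURIES is positive, then extract the year". A base-R sketch on three made-up rows illustrates it; note that the sketch names the year column year instead of año, purely to keep the example ASCII-only.

```r
# Toy records (assumed values); the HAIL row caused no harm
toy <- data.frame(BGN_DATE = as.Date(c("1996-03-01", "1997-07-15", "1997-08-02")),
                  EVTYPE = c("TORNADO", "HAIL", "EXCESSIVE HEAT"),
                  FATALITIES = c(1, 0, 3),
                  INJURIES = c(4, 0, 10),
                  stringsAsFactors = FALSE)

# Keep rows with at least one fatality or injury
harmful <- toy[toy$FATALITIES + toy$INJURIES > 0, ]

# Derive the calendar year and keep only the columns used later
harmful$year <- as.integer(format(harmful$BGN_DATE, "%Y"))
harmful <- harmful[, c("year", "EVTYPE", "FATALITIES", "INJURIES")]
print(harmful)
```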
Now, we group by year and event and sum the number of deaths and injuries.
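# .groups = "drop_last" keeps the result grouped by año, so the percentages
# computed below are relative to each year's total
tormenta_salud2 <- tormenta_salud %>%
  group_by(año, events) %>%
  summarise(deaths = sum(FATALITIES), inj = sum(INJURIES), .groups = "drop_last")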
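The grouped sum can also be expressed in base R with aggregate(), which may help readers less familiar with dplyr. The sketch below uses invented rows for two years and sums fatalities and injuries per (year, event) pair, then renames the totals to deaths and inj to match the report.

```r
# Toy health records (assumed values)
toy <- data.frame(year = c(1996, 1996, 1996, 1997),
                  events = c("TORNADO", "TORNADO", "FLOOD", "TORNADO"),
                  FATALITIES = c(1, 2, 0, 5),
                  INJURIES = c(10, 0, 3, 7),
                  stringsAsFactors = FALSE)

# Sum both outcomes within each year/event combination
totals <- aggregate(toy[, c("FATALITIES", "INJURIES")],
                    by = list(year = toy$year, events = toy$events),
                    FUN = sum)
names(totals)[3:4] <- c("deaths", "inj")
print(totals)
```

Each (year, event) pair appears once in the result, which is the shape the percentage step expects.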
To simplify plotting, we split the data into two data frames: injuries and fatal. In both, we compute each event's percentage of all injuries (or deaths) recorded in the same year. Events that account for less than 5% of a year's total are relabeled “Other”, and events below 1% are dropped.
# tormenta_salud2 is still grouped by año, so sum() gives each year's total
injuries <- tormenta_salud2 %>%
  mutate(per_inj = round(inj * 100 / sum(inj)),
         events = ifelse(per_inj < 5, "Other", events)) %>%
  filter(per_inj >= 1)
fatal <- tormenta_salud2 %>%
  mutate(per_death = round(deaths * 100 / sum(deaths)),
         events = ifelse(per_death < 5, "Other", events)) %>%
  filter(per_death >= 1)
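The within-year percentage and the “Other” relabeling can be checked by hand on a toy example. In this base-R sketch (invented numbers), ave() supplies each row's yearly total, playing the role of sum() inside the grouped mutate(). Events between 1% and 5% become “Other”, and events that round to 0% are dropped.

```r
# Toy yearly injury totals (assumed values); 1996 sums to 1000
totals <- data.frame(year = c(1996, 1996, 1996, 1996, 1996, 1997, 1997),
                     events = c("TORNADO", "FLOOD", "LIGHTNING", "WIND", "FOG", "TORNADO", "HEAT"),
                     inj = c(700, 200, 60, 38, 2, 300, 700),
                     stringsAsFactors = FALSE)

# Percentage of each event within its own year
totals$per_inj <- round(100 * totals$inj / ave(totals$inj, totals$year, FUN = sum))

# Relabel small events, then drop those that round to 0%
totals$events <- ifelse(totals$per_inj < 5, "Other", totals$events)
shares <- totals[totals$per_inj >= 1, ]
print(shares)
```

Here WIND (3.8%, rounded to 4) becomes “Other”, while FOG (0.2%, rounded to 0) disappears from the plot data.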
For the economic analysis, we keep only observations with some economic impact, that is, where property damage and crop damage sum to more than zero.
# Keep only records with some property or crop damage, and derive the year
tormenta_economia <- tormenta %>%
  filter(PROPDMG + CROPDMG > 0) %>%
  mutate(año = year(BGN_DATE), events = as.character(EVTYPE)) %>%
  select(año, events, PROPDMG, CROPDMG, PROPDMGEXP, CROPDMGEXP)
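We use the exponent variables (PROPDMGEXP and CROPDMGEXP), whose codes “K”, “M”, and “B” denote thousands, millions, and billions, to scale the property and crop damage figures, creating new variables with the nominal dollar value of each. Records with any other exponent code are assigned a value of zero. We then group by event type and sum the total damage to crops and property over the whole period.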
tormenta_economia2 <- tormenta_economia %>%
  # Scale by the exponent code (K = thousands, M = millions, B = billions),
  # ignoring case; any other code is treated as 0
  mutate(property = PROPDMG * case_when(toupper(PROPDMGEXP) == "K" ~ 1e3,
                                        toupper(PROPDMGEXP) == "M" ~ 1e6,
                                        toupper(PROPDMGEXP) == "B" ~ 1e9,
                                        TRUE ~ 0),
         crops = CROPDMG * case_when(toupper(CROPDMGEXP) == "K" ~ 1e3,
                                     toupper(CROPDMGEXP) == "M" ~ 1e6,
                                     toupper(CROPDMGEXP) == "B" ~ 1e9,
                                     TRUE ~ 0)) %>%
  group_by(events) %>%
  summarise(crop = sum(crops), propers = sum(property))
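An alternative way to express the exponent scaling is a lookup table of multipliers. The helper to_dollars() below is hypothetical (it is not part of the report or of any package): it matches K, M and B case-insensitively, so a lowercase “m” still counts as millions, and maps every other code, including the empty string, to a multiplier of 0.

```r
# Hypothetical helper: damage value + exponent code -> nominal dollars
to_dollars <- function(value, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  # Unknown codes (including "") find no matching name and give NA
  mult <- unname(multipliers[toupper(exp_code)])
  mult[is.na(mult)] <- 0
  value * mult
}

to_dollars(c(25, 1.5, 2, 10), c("K", "M", "b", ""))
```

A lookup vector keeps the list of recognized codes in one place, so the property and crop columns cannot drift apart if the rule is changed later.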
Next, we compute each event type's share of total crop damage and of total property damage over the period, keeping only event types that account for at least 1% of either.
tormenta_economia3 <- tormenta_economia2 %>%
mutate(Crops = round(crop * 100 / (sum(crop))),
Properties = round(propers * 100 / (sum(propers)))) %>%
filter(Crops >= 1 | Properties >= 1)
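Unlike the health percentages, these shares are taken over all events and all years at once, and an event type survives if it reaches 1% in either category. A base-R sketch on invented totals (each column sums to 1000) shows the rule.

```r
# Toy damage totals per event type (assumed values)
damage <- data.frame(events = c("DROUGHT", "FLOOD", "HAIL", "FOG"),
                     crop = c(600, 300, 99, 1),
                     propers = c(10, 800, 189, 1),
                     stringsAsFactors = FALSE)

# Share of the overall total for each damage category, rounded to whole percent
damage$Crops <- round(100 * damage$crop / sum(damage$crop))
damage$Properties <- round(100 * damage$propers / sum(damage$propers))

# Keep event types with at least 1% in either category
kept <- damage[damage$Crops >= 1 | damage$Properties >= 1, ]
print(kept)
```

DROUGHT is kept for its crop share even though it accounts for only 1% of property damage, while FOG falls below 1% in both and is removed.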
Finally, we format the dataframe for visualization with ggplot2, using the pivot_longer function.
tormenta_economia3_long <- tormenta_economia3 %>%
pivot_longer(cols = c(Crops, Properties), names_to = "variable", values_to = "valor")
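For readers unfamiliar with pivot_longer(), the base-R sketch below builds the same long layout by hand from a toy wide table (invented numbers): one row per event and damage category, with the category name in variable and the percentage in valor. The rows come out in a different order than pivot_longer() would produce, which does not matter for plotting.

```r
# Toy wide table (assumed values), one column per damage category
wide <- data.frame(events = c("FLOOD", "DROUGHT"),
                   Crops = c(30, 60),
                   Properties = c(80, 1),
                   stringsAsFactors = FALSE)

# Stack the two percentage columns into a single valor column
long <- data.frame(events = rep(wide$events, times = 2),
                   variable = rep(c("Crops", "Properties"), each = nrow(wide)),
                   valor = c(wide$Crops, wide$Properties),
                   stringsAsFactors = FALSE)
print(long)
```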
Injuries (the less severe public health impact) over the years. The main weather events causing injuries have been tornadoes, tsunamis, excessive heat, and floods.
# position = "fill" rescales each year's bar to 100%
ggplot(injuries, aes(x = factor(año), y = per_inj, fill = events)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Year", y = "Share of injuries", fill = "Event") +
  ggtitle("Injuries by event type and year, United States, 1996-2012") +
  theme_minimal()
The main meteorological phenomena causing fatalities have been excessive heat, extreme cold, strong winds, and tornadoes.
# position = "fill" rescales each year's bar to 100%
ggplot(fatal, aes(x = factor(año), y = per_death, fill = events)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Year", y = "Share of fatalities", fill = "Event") +
  ggtitle("Fatalities by event type and year, United States, 1996-2012") +
  theme_minimal()
The main phenomena that have affected crops in the United States are droughts, floods, hurricanes/typhoons, and hail. Those that have impacted properties are floods, hurricanes/typhoons, storm surges, and tornadoes.
ggplot(tormenta_economia3_long, aes(x = events, y = valor, fill = variable)) +
  geom_col(position = "dodge") +
  labs(x = "Event", y = "Share of total damage (%)",
       title = "Share of economic damage by weather event, United States, 1996-2012") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.title = element_blank())