The aim of this report is to analyze the negative consequences associated with a variety of weather events in the USA. It is based on storm data provided by the U.S. National Oceanic and Atmospheric Administration. Our analysis focuses on the number of fatalities and the value of property and crop damage associated with these events. The assessment is made across the different states of the USA. The results show that, in terms of fatalities, tornados are the main issue for policymakers and regulators. Conversely, when the focus is on economic damage, droughts and floods are the most noticeable problems.
Before downloading the data, note that the following R packages are required to run this code.
library(dplyr)
library(ggplot2)
library(ggpubr)
library(RColorBrewer)
library(metafolio)
First of all, we must download the data.
# Create the working folder on the first run only
if (!dir.exists("SecondAssignment")) {
  dir.create("SecondAssignment")
}
setwd("./SecondAssignment")
# Download the file only if it is not already present in the folder
if (!file.exists("data.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "data.csv.bz2", method = "curl")
}
data <- read.csv("data.csv.bz2")
In this analysis, we will mainly use two variables: EVTYPE and STATE. Since we will base our assessment on them, it is useful to convert them into factor variables (they are currently character variables). If we were interested in the dates of the events, it would also be useful to convert the begin and end dates to a date format. These variables are not used in this report, but the code below shows how the conversion can be done.
data$EVTYPE <- factor(data$EVTYPE)
data$STATE <- factor(data$STATE)
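# as.POSIXct is used instead of strptime because strptime returns POSIXlt,
# a list-based class that can cause problems in data frame columns with dplyr
data$BGN_DATE<-as.POSIXct(data$BGN_DATE,format = "%m/%d/%Y %H:%M:%S")
data$END_DATE<-as.POSIXct(data$END_DATE,format = "%m/%d/%Y %H:%M:%S")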
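As a minimal sketch of the date conversion above, the snippet below parses two made-up timestamps written in the same layout and takes their range. The vector `raw_dates` is hypothetical, and `tz = "UTC"` is an assumption added here so that the result does not depend on the local time zone.

```r
# Hypothetical timestamps in the same layout as BGN_DATE
raw_dates <- c("4/18/1950 0:00:00", "11/30/2011 0:00:00")

# Parse month/day/year followed by hours:minutes:seconds
parsed <- as.POSIXct(raw_dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Earliest and latest dates in the sample
date_range <- range(as.Date(parsed))
print(date_range)
```

The same call to `range` on the converted column would give the first and last event dates of the full data set.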
The data set contains 902297 observations and 37 variables. The earliest recorded event took place on 1950-01-03, whilst the latest started on 2011-11-30. Before starting the analysis, the data on economic damage must be converted into the same unit of measurement.
data$PROPDMGEXP <- factor(data$PROPDMGEXP)
levels(data$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K" "m" "M"
levels(data$PROPDMGEXP)<-c(rep(1,13),10^9,10^2,10^2,10^3,10^6,10^6)
data$PROPDMGEXP<-as.numeric(as.character(data$PROPDMGEXP))
data$PROPDMG<-data$PROPDMG*data$PROPDMGEXP
data$CROPDMGEXP <- factor(data$CROPDMGEXP)
levels(data$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
levels(data$CROPDMGEXP) <- c(rep(1,4),10^9,10^3,10^3,10^6,10^6)
data$CROPDMGEXP<-as.numeric(as.character(data$CROPDMGEXP))
data$CROPDMG<-data$CROPDMG*data$CROPDMGEXP
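The conversion above relies on the order in which `levels()` lists the magnitude codes. As a hedged alternative sketch, the same mapping can be written as a lookup by name, which does not depend on that order. The vectors `exp_codes` and `damage` are made-up sample values, and `multiplier` is a name introduced here for illustration. Codes that are not letters are given a multiplier of 1, as in the code above.

```r
# Hypothetical sample of magnitude codes and damage figures
exp_codes <- c("K", "m", "", "B", "h", "5")
damage    <- c(25, 1.5, 10, 2, 3, 7)

# Multiplier for each recognised letter code (upper case)
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look each code up by name; codes that are not found give NA
mult <- unname(multiplier[toupper(exp_codes)])

# Unrecognised codes are treated as a multiplier of 1
mult[is.na(mult)] <- 1

dollars <- damage * mult
print(dollars)
```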
We now analyse the number of fatalities per state. First, we display, for each state, the event that causes the largest number of fatalities. The most dangerous events seem to be heat and tornados.
data <- group_by(data,STATE,EVTYPE)
tabmortality <- summarise(data, deaths=sum(FATALITIES)) %>%
group_by(STATE) %>%
summarise(fatalities=max(deaths), event=EVTYPE[which.max(deaths)])
ggplot(data=tabmortality[tabmortality$fatalities>0,],aes(STATE,fatalities,fill=event))+
geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90)) +
ylab("Fatalities") + xlab("State")+ guides(fill=guide_legend(title="Event"))
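To make the logic of this step easier to follow, here is a small sketch on a made-up data frame. It is written in base R rather than dplyr so that it runs without extra packages. The data frame `toy` and its values are hypothetical and only reuse the column names of the storm data.

```r
# Hypothetical records: one row per event
toy <- data.frame(
  STATE      = c("TX", "TX", "TX", "FL", "FL"),
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "RIP CURRENT", "LIGHTNING"),
  FATALITIES = c(3, 5, 4, 2, 6)
)

# Total deaths for each combination of state and event type
totals <- aggregate(FATALITIES ~ STATE + EVTYPE, data = toy, FUN = sum)

# Within each state, keep the row with the largest total
top <- do.call(rbind, lapply(split(totals, totals$STATE),
                             function(d) d[which.max(d$FATALITIES), ]))
print(top)
```

Note that `which.max` returns only the first maximum, so if two event types tie within a state, only one of them is reported.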
min_deaths_state<-500
min_deaths_event<-50
Next, we restrict the analysis to the states with the highest number of fatalities from weather events. Only states with more than 500 total deaths from any of these events are kept (the threshold is, nonetheless, straightforward to modify). We then plot the composition of weather-related deaths in each of those states. To simplify the plot, event types causing 50 deaths or fewer in a state are dropped.
tabmortality <- summarise(data, deaths=sum(FATALITIES)) %>%
group_by(STATE)
state_deaths <- summarise(tabmortality,state_deaths=sum(deaths))
tabmortality <- merge(tabmortality,state_deaths)
tabmortality <- tabmortality[tabmortality$state_deaths>min_deaths_state &
tabmortality$deaths>min_deaths_event,]
colnames(tabmortality)<-c("State","Event","Fatalities","Statefatalities")
# Use the per-event deaths as bar height so that the stacked segments show the composition
ggplot(data=tabmortality, aes(State,Fatalities,fill=Event))+
geom_col() + ylab("Fatalities") +coord_flip() +
scale_fill_brewer(palette = "Paired")
Tornados, as we saw at the beginning of the report, are the most problematic event. Nonetheless, it is noticeable that states that suffer a high number of fatalities from one type of weather event also tend to face a large number of deaths associated with other types.
minpropdam <- 50000
mincropdam <- 50000
To address the effects of weather events on the economy, we distinguish between property and crop damage. For each state, we plot the event that causes the largest loss in each category. Given the number of states, and in order to simplify the graphs, two minimum damage values are set: 50000 for property damage and 50000 for crop damage.
data <- group_by(data,STATE,EVTYPE)
damtable <- summarise(data, propdamage=sum(PROPDMG), cropdamage=sum(CROPDMG)) %>%
group_by(STATE) %>%
summarise(propdmg=max(propdamage), propevent=EVTYPE[which.max(propdamage)],
cropdmg=max(cropdamage), cropevent=EVTYPE[which.max(cropdamage)])
# One colour per event, shared by both plots
evs <- union(damtable$propevent,damtable$cropevent)
evcol <- gg_color_hue(length(evs))
names(evcol) <- evs
pd<-ggplot(data=damtable[damtable$propdmg>minpropdam,],
aes(STATE,propdmg,fill=propevent))+
geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90))+
ylab("Property Damage") + theme(legend.title = element_blank()) +
theme(legend.position = "top") + scale_fill_manual("Legend",values=evcol)
cd<-ggplot(data=damtable[damtable$cropdmg>mincropdam,],
aes(STATE,cropdmg,fill=cropevent))+
geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90)) +
ylab("Crop Damage") + theme(legend.title = element_blank())+
theme(legend.position = "top") + scale_fill_manual("Legend",values=evcol)
ggarrange(pd, cd, common.legend = TRUE)
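The two plots share one set of colours because `evcol` is a named vector: each event name always points to the same colour, whichever plot uses it. The sketch below illustrates the idea with made-up event names. It uses base R `hcl()` with evenly spaced hues as a stand-in for `gg_color_hue`, which is an assumption made here so that the snippet runs without extra packages. The names `prop_events`, `crop_events` and `event_cols` are hypothetical.

```r
# Hypothetical event names appearing in either plot
prop_events <- c("TORNADO", "FLOOD", "HURRICANE")
crop_events <- c("DROUGHT", "FLOOD")
events <- union(prop_events, crop_events)

# Evenly spaced hues, one per event
hues <- seq(15, 375, length.out = length(events) + 1)[seq_along(events)]
event_cols <- setNames(hcl(h = hues, c = 100, l = 65), events)

# An event present in both plots is looked up by name and gets one colour
print(event_cols["FLOOD"])
```

A vector built this way can be passed to `scale_fill_manual(values = ...)` in both plots, as is done with `evcol` above.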
According to the information that can be inferred from the graphs, the events that should mainly draw our attention are tornados, droughts and floods. It should be highlighted that states that face a large loss of life from one type of event tend to suffer similar losses from other types. A similar conclusion can be drawn from the plots of economic losses, where some US states are heavily damaged by these events, whereas others face barely any damage. This fact might be important for social policy.