SYNOPSIS
This report uses R to explore the NOAA Storm Database and answer some basic questions about severe weather events. It walks through each step, from extracting and cleaning the data to analyzing it, in order to identify the five types of weather event that caused the greatest harm to people (fatalities and injuries) and the greatest economic damage (property and crop damage) between 1950 and 2011.
Loading and Processing DATA
Loading the required packages:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.6.2
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:dplyr':
##
## intersect, setdiff, union
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(knitr)
library(markdown)
library(stringr)
Download and read the DATA
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "Data_storm.csv.bz2")
storm <- read.csv('Data_storm.csv.bz2',header = TRUE)
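Re-knitting the report repeats the download every time. The following is a minimal sketch of one way to avoid that, assuming a hypothetical helper `fetch_once` (not part of the original analysis) that skips the download when the local file already exists. The `fetch` argument and the `example.org` URL are placeholders that let the behavior be tried without network access.

```r
# Hypothetical helper: call `fetch(url, dest)` only if `dest` is missing.
fetch_once <- function(url, dest, fetch = download.file) {
  if (!file.exists(dest)) {
    fetch(url, dest)
  }
  invisible(dest)
}

# Demonstration with a stand-in downloader that writes a placeholder file.
calls <- 0
fake_fetch <- function(url, dest) {
  calls <<- calls + 1
  writeLines("placeholder", dest)
}
dest <- tempfile(fileext = ".csv")
fetch_once("https://example.org/data.csv", dest, fake_fetch)
fetch_once("https://example.org/data.csv", dest, fake_fetch)
calls  # the second call finds the file and does not fetch again
```

In the report, the same helper could be called with the storm data URL and "Data_storm.csv.bz2" before read.csv, leaving the default `fetch = download.file` in place.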
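The variables CROPDMGEXP and PROPDMGEXP hold magnitude codes (K for thousands, M for millions, B for billions) that scale the amounts in CROPDMG and PROPDMG. To calculate crop and property damage correctly, we multiplied each amount by the factor its code stands for: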
clean_storm <- storm %>%
  # Scale crop damage by its magnitude code (K = 1e3, M = 1e6, B = 1e9)
  mutate(CROPDMG = ifelse(CROPDMGEXP == "K", CROPDMG * 1000, CROPDMG),
         CROPDMG = ifelse(CROPDMGEXP == "M", CROPDMG * 1000000, CROPDMG),
         CROPDMG = ifelse(CROPDMGEXP == "B", CROPDMG * 1000000000, CROPDMG)) %>%
  # Scale property damage the same way
  mutate(PROPDMG = ifelse(PROPDMGEXP == "K", PROPDMG * 1000, PROPDMG),
         PROPDMG = ifelse(PROPDMGEXP == "M", PROPDMG * 1000000, PROPDMG),
         PROPDMG = ifelse(PROPDMGEXP == "B", PROPDMG * 1000000000, PROPDMG))
# Parse the event start date; the trailing time text in the string is ignored
clean_storm$BGN_DATE <- as.Date(clean_storm$BGN_DATE, format="%m/%d/%Y")
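The code-to-multiplier conversion can also be written as a lookup table. The following is a minimal sketch in base R, run on a made-up data frame rather than the real `storm` data. The helper name `exp_multiplier` is hypothetical, and treating lowercase codes like their uppercase forms while leaving every other code at a multiplier of 1 is an assumption of this sketch, not something the analysis above does.

```r
# Hypothetical helper: map a damage-exponent code to a numeric multiplier.
# Assumption: lowercase codes count like uppercase ones; any other code
# (blank, digits, symbols) leaves the amount unscaled.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1
  mult
}

# Toy rows standing in for PROPDMG / PROPDMGEXP (invented values).
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "m", "B", ""),
                  stringsAsFactors = FALSE)
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

A lookup keeps all the codes in one place, so adding or changing a code means editing one vector rather than another ifelse() call.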
We also had to standardize the spelling of the event type labels (EVTYPE) so that records of the same kind of event are grouped together:
clean_storm2 <- clean_storm %>% # the str_* functions come from library(stringr)
  # Trim and lower-case first: the patterns below are lowercase and matching is case-sensitive
  mutate(EVTYPE = str_to_lower(str_trim(EVTYPE)),
         EVTYPE = str_replace_all(EVTYPE, pattern = "tstm", replacement = "thunderstorm"),
         EVTYPE = str_replace_all(EVTYPE, pattern = "\\s\\(g\\d*\\)", replacement = ""),
         EVTYPE = str_replace_all(EVTYPE, pattern = "^heat", replacement = "excessive heat"),
         EVTYPE = str_replace_all(EVTYPE, pattern = "wild/forest fire", replacement = "wildfire"),
         EVTYPE = str_replace_all(EVTYPE, pattern = "strong wind", replacement = "high wind"),
         EVTYPE = str_replace_all(EVTYPE, pattern = "winter weather", replacement = "winter storm"),
         EVTYPE = str_replace_all(EVTYPE, pattern = "hurricane/typhoon", replacement = "hurricane"),
         # Restore sentence case for display
         EVTYPE = str_to_sentence(EVTYPE))
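Regular-expression matching is case-sensitive by default, so the order of case normalization and replacement matters: a lowercase pattern such as "tstm" does not match a label that starts with "Tstm" or "TSTM". The sketch below uses base R (`trimws`, `tolower`, `gsub`) on a handful of made-up labels to illustrate the idea. The helper `normalize_evtype` and its short rule list are hypothetical and cover only a subset of the replacements above.

```r
# Hypothetical helper: trim and lower-case first, then apply the rules,
# so that lowercase patterns also match labels such as "TSTM WIND".
normalize_evtype <- function(x) {
  x <- tolower(trimws(x))
  rules <- c("tstm" = "thunderstorm",
             "\\s\\(g\\d*\\)" = "",
             "wild/forest fire" = "wildfire",
             "hurricane/typhoon" = "hurricane")
  for (pattern in names(rules)) {
    x <- gsub(pattern, rules[[pattern]], x, perl = TRUE)
  }
  # Capitalize the first letter for display.
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

# Invented labels showing typical spelling variants.
labels <- c("TSTM WIND", " Tstm Wind (G45)", "WILD/FOREST FIRE", "HURRICANE/TYPHOON")
normalize_evtype(labels)
```

After normalization, the first two labels collapse into a single group, which is what the grouping steps later in the report rely on.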
Impact of weather events on public health
We grouped the data by event type (EVTYPE) and summed the FATALITIES and INJURIES variables. We also created a new column, TOTALHUM, holding the total number of casualties (FATALITIES + INJURIES).
storm_hum <- clean_storm2 %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarize_all(list(sum)) %>%
  mutate(TOTALHUM = FATALITIES + INJURIES) %>%
  arrange(desc(TOTALHUM))
We then sorted the data and kept the top 5 weather events. The fatality and injury counts were also stacked into a single long-format table (one row per event type and casualty type) so that both can be shown in one clear exploratory graphic:
TOTALHUM <-head(storm_hum[order(storm_hum$TOTALHUM, decreasing=TRUE),],5)
FATALITIES <- TOTALHUM %>% mutate(hum_type="Fatalities", hum_amount=FATALITIES)
INJURIES <- TOTALHUM %>% mutate(hum_type="Injuries", hum_amount=INJURIES)
hum_major <- rbind(FATALITIES,INJURIES)
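The group, rank, and stack steps above can be traced on a small example. This is a sketch in base R with invented numbers, not values from the storm data; the toy data frame and the names `totals`, `top2`, and `long` are illustrative only.

```r
# Invented records: several rows per event type.
toy <- data.frame(
  EVTYPE     = c("Tornado", "Flood", "Tornado", "Lightning", "Flood"),
  FATALITIES = c(3, 1, 2, 1, 0),
  INJURIES   = c(40, 5, 10, 2, 3),
  stringsAsFactors = FALSE
)

# Sum both measures per event type, then add the combined total.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
totals$TOTALHUM <- totals$FATALITIES + totals$INJURIES

# Keep the two event types with the largest totals (the report keeps five).
top2 <- head(totals[order(totals$TOTALHUM, decreasing = TRUE), ], 2)

# Stack the measures into long format: one row per event type and measure,
# which is the shape ggplot2 needs to map the measure to a fill color.
long <- rbind(
  data.frame(EVTYPE = top2$EVTYPE, hum_type = "Fatalities", hum_amount = top2$FATALITIES),
  data.frame(EVTYPE = top2$EVTYPE, hum_type = "Injuries",   hum_amount = top2$INJURIES)
)
print(long)
```

Each event type appears twice in `long`, once per casualty type, so a stacked bar chart can draw one bar per event with two colored segments.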
The health impacts of the 5 most harmful weather events are detailed in the following graph:
ggplot(hum_major, aes(x=EVTYPE, y=hum_amount, fill=factor(hum_type))) +
  geom_col(color="#69b3a2")+
  labs(title = "Fatalities and Injuries by Event Type",
       subtitle = "Top 5 Weather Events",
       x = "Weather Event",
       y = "Number of Casualties")
Impact of weather events on property and crop damage
We grouped the data by event type (EVTYPE) and summed the CROPDMG and PROPDMG variables. We also created a new column, TOTALEXP, holding the total damage in US dollars (CROPDMG + PROPDMG).
storm_exp <- clean_storm2 %>%
  select(EVTYPE, CROPDMG, PROPDMG) %>%
  group_by(EVTYPE) %>%
  summarize_all(list(sum)) %>%
  mutate(TOTALEXP = CROPDMG + PROPDMG) %>%
  arrange(desc(TOTALEXP))
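We then sorted the data and kept the top 5 weather events. The property and crop damage amounts were also stacked into a single long-format table (one row per event type and damage type) so that both can be shown in one clear exploratory graphic: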
TOTALEXP <-head(storm_exp[order(storm_exp$TOTALEXP, decreasing=TRUE),],5)
PROPDMG <- TOTALEXP %>% mutate(damage_type="Property", damage_amount=PROPDMG)
CROPDMG <- TOTALEXP %>% mutate(damage_type="Crops", damage_amount=CROPDMG)
damage_major <- rbind(PROPDMG,CROPDMG)
The economic impacts of the 5 most costly weather events are detailed in the following graph:
ggplot(damage_major, aes(x=EVTYPE, y=damage_amount, fill=factor(damage_type))) +
  geom_col(color="#69b3a2")+
  # Format the y axis with thousands separators; the axis title is set in labs()
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Property and Crop Damage by Event Type",
       subtitle = "Top 5 Weather Events",
       x = "Weather Event",
       y = "Damage (USD)")