This report analyzes the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and was produced as an assignment for the Johns Hopkins Reproducible Research course on Coursera. Its aim is to identify the most harmful weather events from a health perspective and from an economic perspective. Fatalities and injuries were summed to measure harm to population health, and property and crop damage were summed to measure economic damage. From a human health perspective, the most harmful event type was tornadoes; from an economic perspective, it was floods.
Load the tidyverse packages, which include dplyr, tibble, and ggplot2.
# Load libraries
library(tidyverse)
The base R function read.csv() reads the bzip2-compressed comma-separated file directly,
decompressing it automatically, and loads the NOAA data into the R environment.
# Load data
df <- read.csv("repdata_data_StormData.csv.bz2")
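The automatic decompression can be checked without the NOAA file. The minimal sketch below writes a small, made-up data frame (not NOAA data) through a `bzfile()` connection and reads it back with plain `read.csv()`. The temporary file and the `toy` values are purely illustrative assumptions.

```r
# Made-up toy data, NOT taken from the NOAA storm database
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(5, 1))
tmp <- tempfile(fileext = ".csv.bz2")

# Write the CSV through a bzip2-compressing connection
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the file with file(), which detects bzip2 compression on read
back <- read.csv(tmp)
print(back)
```

The same mechanism is what lets `read.csv("repdata_data_StormData.csv.bz2")` work without unzipping the archive first.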
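Preprocessing consisted of two steps: selecting the columns relevant to the analysis, and converting the damage amounts to dollars by multiplying each value by the magnitude encoded in its exponent column (e.g. K = thousands, M = millions, B = billions).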
Columns Selected:
Event Type: EVTYPE
Fatalities: FATALITIES
Injuries: INJURIES
Property Damage: PROPDMG
Property Damage Exponentiation: PROPDMGEXP
Crop Damage: CROPDMG
Crop Damage Exponentiation: CROPDMGEXP
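Because every later step relies on these seven columns, a quick guard can fail early if any of them is missing. The sketch below uses a hypothetical helper, `check_columns()`, which is not part of the original analysis, and a one-row toy data frame with made-up values standing in for the NOAA data.

```r
# Hypothetical helper (not in the original analysis): stop if any needed column is absent
check_columns <- function(data, cols) {
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

needed <- c("EVTYPE", "FATALITIES", "INJURIES",
            "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Toy stand-in for the NOAA data frame; the values are made up
toy <- data.frame(EVTYPE = "TORNADO", FATALITIES = 0, INJURIES = 1,
                  PROPDMG = 2.5, PROPDMGEXP = "K", CROPDMG = 0, CROPDMGEXP = "")

check_columns(toy, needed)  # all columns present, so this returns invisibly
```

On the real data, the equivalent call would be `check_columns(df, needed)`. Asking for a name that is not present, e.g. `check_columns(toy, c(needed, "HYPOTHETICAL_COL"))`, stops with an error listing the missing column.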
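# Define exponentiation dictionary: maps each exponent code to its dollar multiplier
# (lowercase "k" occurs in CROPDMGEXP; without it those rows would become NA)
dict_exp <- c("K" = 10^3,
              "k" = 10^3,
              "M" = 10^6,
              "B" = 10^9,
              "m" = 10^6,
              "+" = 10^0,
              "0" = 10^0,
              "5" = 10^5,
              "6" = 10^6,
              "?" = 10^0,
              "4" = 10^4,
              "2" = 10^2,
              "3" = 10^3,
              "h" = 10^2,
              "7" = 10^7,
              "H" = 10^2,
              "-" = 10^0,
              "1" = 10^1,
              "8" = 10^8)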
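# Select columns and exponentiate damages
# as.character() guards against factor columns (the read.csv default before R 4.0),
# because indexing dict_exp by a factor would use its integer codes, not its labels
df2 <- df %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP != "", dict_exp[as.character(PROPDMGEXP)] * PROPDMG, PROPDMG),
         CROPDMG = ifelse(CROPDMGEXP != "", dict_exp[as.character(CROPDMGEXP)] * CROPDMG, CROPDMG))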
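The conversion relies on looking up each exponent code by name in a named numeric vector. The sketch below shows that lookup on made-up values. `dict_toy` is a hypothetical, trimmed subset of `dict_exp`, and the code `"x"` is an invented unmapped code. The point is that an empty code keeps the raw value, while any code absent from the dictionary yields `NA`, which would then propagate through `sum()` in the aggregation steps.

```r
# Hypothetical trimmed dictionary, a subset of the real dict_exp
dict_toy <- c("K" = 10^3, "k" = 10^3, "M" = 10^6, "B" = 10^9)

# Made-up damage values and exponent codes
dmg      <- c(2.5, 1, 3, 7)
exp_code <- c("K", "M", "", "x")  # "" = no exponent, "x" = code missing from the dictionary

# Name-based lookup: "" and "x" both return NA
multiplier <- dict_toy[exp_code]

# Same pattern as the analysis: keep the raw value when there is no exponent code
scaled <- ifelse(exp_code != "", multiplier * dmg, dmg)
print(scaled)  # 2500, 1e+06, 3, NA
```

The fourth element shows why every code that appears in the data needs an entry in the dictionary.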
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
# Group by event type and calculate population health harm
pop_health <- df2 %>%
  group_by(EVTYPE) %>%
  summarise(HEALTH = sum(FATALITIES) + sum(INJURIES)) %>%
  ungroup() %>%
  arrange(desc(HEALTH)) %>%
  slice(1:5) %>%
  # Fix the factor levels in ranked order so ggplot does not sort them alphabetically
  mutate(EVTYPE = factor(EVTYPE, levels = EVTYPE))
# Draw plot of health harm events
pop_health %>%
  ggplot(mapping = aes(y = EVTYPE, x = HEALTH)) +
  geom_col(fill = "darkred") +
  labs(title = "Top 5 Event Types Most Harmful to Population Health",
       subtitle = "Date range: 1950 to 2011",
       x = "Health Harm (People)",
       y = "Event Type",
       caption = "Source: NOAA Storm Database\nAggregation: Sum of Fatalities and Injuries by Event Type")
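The health ranking and the factor-level step can be checked on a few toy records. The sketch below uses made-up rows, not NOAA data, and the same dplyr pipeline. It shows that fatalities and injuries are summed per event type and that `factor(EVTYPE, levels = EVTYPE)` keeps the ranked order instead of the alphabetical order ggplot2 would otherwise use.

```r
library(dplyr)

# Made-up toy records, NOT taken from the NOAA storm database
toy <- tibble(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 0, 1, 3),
  INJURIES   = c(10, 0, 5, 2, 1)
)

ranked <- toy %>%
  group_by(EVTYPE) %>%
  summarise(HEALTH = sum(FATALITIES) + sum(INJURIES)) %>%
  arrange(desc(HEALTH)) %>%
  # Levels follow the ranked row order, not alphabetical order
  mutate(EVTYPE = factor(EVTYPE, levels = EVTYPE))

print(ranked)
levels(ranked$EVTYPE)  # "TORNADO" "HEAT" "FLOOD"
```

Without the final `mutate()`, the levels would be alphabetical ("FLOOD", "HEAT", "TORNADO") and the bars would no longer follow the ranking.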
Across the United States, which types of events have the greatest economic consequences?
# Group by event type and calculate economic damage
economy <- df2 %>%
  group_by(EVTYPE) %>%
  summarise(ECONOMY = sum(PROPDMG) + sum(CROPDMG)) %>%
  ungroup() %>%
  arrange(desc(ECONOMY)) %>%
  slice(1:5) %>%
  # Fix the factor levels in ranked order so ggplot does not sort them alphabetically
  mutate(EVTYPE = factor(EVTYPE, levels = EVTYPE))
# Draw plot of economic damage events
economy %>%
  ggplot(mapping = aes(y = EVTYPE, x = ECONOMY)) +
  geom_col(fill = "darkgreen") +
  labs(title = "Top 5 Event Types by Economic Damage",
       subtitle = "Date range: 1950 to 2011",
       x = "Damage (US Dollars)",
       y = "Event Type",
       caption = "Source: NOAA Storm Database\nAggregation: Sum of Property and Crop Damage by Event Type")
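The economic aggregation follows the same pattern as the health one. The sketch below applies it to made-up damage figures, already converted to dollars and not taken from NOAA. It swaps in dplyr's `slice_max()` (available from dplyr 1.0.0, an assumption about the installed version) as a compact alternative to `arrange(desc(...))` followed by `slice(1:5)`. Here `n = 2` only because the toy data has three event types.

```r
library(dplyr)

# Made-up damages in dollars, NOT taken from the NOAA storm database
toy <- tibble(
  EVTYPE  = c("FLOOD", "HAIL", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMG = c(5e6, 1e5, 2e6, 0, 3e5),
  CROPDMG = c(1e6, 2e5, 0, 4e6, 0)
)

top2 <- toy %>%
  group_by(EVTYPE) %>%
  summarise(ECONOMY = sum(PROPDMG) + sum(CROPDMG)) %>%
  # slice_max() sorts in descending order and keeps the top n rows
  slice_max(ECONOMY, n = 2)

print(top2)  # FLOOD 8e6, DROUGHT 4e6
```

Note that `slice_max()` keeps tied rows by default (`with_ties = TRUE`), so it can return more than `n` rows when totals tie, whereas `slice(1:5)` always returns exactly five.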