This Markdown document downloads the raw storm data, cleans it, and analyses weather events in the US. It shows that tornadoes are the deadliest events and that floods cause the highest property damage.
As a first step, we read the raw data from the bzip2-compressed CSV file given in the instructions.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(knitr)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url = url, destfile = "data.csv.bz2")
raw.data <- as_tibble(read.csv("data.csv.bz2"))
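No explicit decompression step is needed above because `read.csv()` opens the path with `file()`, which detects bzip2 compression when reading. The following is a minimal sketch of that behaviour on a small temporary file; the table `toy` and the variable names are hypothetical and are not part of the storm data.

```r
# Hypothetical toy table standing in for the storm data
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it as a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads the compressed file directly from its path
round.trip <- read.csv(path)
print(round.trip)
```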
For this analysis we only need four variables: event type (EVTYPE), fatalities (FATALITIES), property damage (PROPDMG) and the property damage exponent (PROPDMGEXP), so a more focused data frame is built from them. PROPDMGEXP encodes a multiplier for PROPDMG (for example K for thousands, M for millions and B for billions), so each PROPDMG value is multiplied by it to obtain the actual numerical value of the damage.
study.data <- select(raw.data, EVTYPE, FATALITIES, PROPDMG, PROPDMGEXP)
from <- c("K", "M", "", "B", "m",
"+", "0", "5", "6", "?",
"4", "2", "3", "h", "7",
"H", "-", "1", "8")
# "h" is read as hundred, the same as "H"
to <- c(1000,1000000,0,1000000000,1000000,
1,1,1,1,0,
1,1,1,100,1,
100,0,1,1)
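# Named lookup vector: names are the exponent codes, values are the multipliers
map <- setNames(to, from)
# as.character() keeps the lookup by name even if the column was read as a factor
new <- unname(map[as.character(study.data$PROPDMGEXP)])
# Codes not found by name (including the empty string "") come back as NA; treat them as 0
new[is.na(new)] <- 0
study.data$property.damage <- study.data$PROPDMG * new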
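The lookup above can be hard to follow on the full data set, so here is a minimal sketch of the same idea on four made-up records. The values in `damage` and `exponent` and the name `multiplier.map` are hypothetical, and only three of the codes are included.

```r
# Hypothetical toy values, chosen only to illustrate the lookup
damage <- c(25, 2.5, 10, 7)
exponent <- c("K", "M", "", "B")

# Named vector acting as a lookup table: names are codes, values are multipliers
multiplier.map <- c(K = 1e3, M = 1e6, B = 1e9)

# Index by name; codes missing from the table (including "") give NA
multiplier <- unname(multiplier.map[exponent])
multiplier[is.na(multiplier)] <- 0

# Damage expressed as a plain number
dollars <- damage * multiplier
print(dollars)
```

The third record has an empty exponent, so its multiplier is 0 and its damage drops out of the totals, which matches how the empty code is handled in the analysis.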
The first question asked in this study is: Across the United States, which types of events are most harmful with respect to population health? Harm is measured here by the FATALITIES variable.
health.impact <- study.data %>%
group_by(EVTYPE) %>%
summarise(fatalities=sum(FATALITIES)) %>%
mutate(perc.fatalities=fatalities/sum(fatalities)*100) %>%
arrange(desc(fatalities))
## `summarise()` ungrouping output (override with `.groups` argument)
head(health.impact)
## # A tibble: 6 x 3
## EVTYPE fatalities perc.fatalities
## <chr> <dbl> <dbl>
## 1 TORNADO 5633 37.2
## 2 EXCESSIVE HEAT 1903 12.6
## 3 FLASH FLOOD 978 6.46
## 4 HEAT 937 6.19
## 5 LIGHTNING 816 5.39
## 6 TSTM WIND 504 3.33
Tornadoes (37.2%) and excessive heat (12.6%) together account for roughly half of all recorded fatalities, which makes them the most harmful event types for population health.
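To make the percentage column concrete, here is a minimal base R sketch of the same steps (sum per event type, divide by the grand total, sort) on a handful of made-up records. The data frame `toy` is hypothetical, and `split()` with `sapply()` is used here only as a stand-in for the `group_by()` and `summarise()` pipeline.

```r
# Hypothetical toy records, not taken from the storm data set
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 1, 3)
)

# Sum fatalities per event type
totals <- sapply(split(toy$FATALITIES, toy$EVTYPE), sum)

# Share of each event type in the overall total, in percent
share <- totals / sum(totals) * 100

# Sort from the deadliest event type to the least deadly
ranked <- sort(share, decreasing = TRUE)
print(ranked)
```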
The second question asked in this study is: Across the United States, which types of events are most harmful with respect to property damage?
prop.impact <- study.data %>%
group_by(EVTYPE) %>%
summarise(damage=sum(property.damage)) %>%
mutate(perc.damage=damage/sum(damage)*100) %>%
arrange(desc(damage))
## `summarise()` ungrouping output (override with `.groups` argument)
head(prop.impact)
## # A tibble: 6 x 3
## EVTYPE damage perc.damage
## <chr> <dbl> <dbl>
## 1 FLOOD 144657709800 33.9
## 2 HURRICANE/TYPHOON 69305840000 16.2
## 3 TORNADO 56937160776. 13.3
## 4 STORM SURGE 43323536000 10.1
## 5 FLASH FLOOD 16140811860. 3.78
## 6 HAIL 15732267486. 3.68
As for property damage, floods are the worst weather event (33.9% of the total), followed by hurricanes/typhoons (16.2%) and tornadoes (13.3%).
# Two panels side by side: fatalities on the left, property damage on the right
par(mfrow = c(1, 2))
barplot(height = health.impact$fatalities[1:5], names.arg = health.impact$EVTYPE[1:5], las = 2, main = "Fatalities")
barplot(height = prop.impact$damage[1:5], names.arg = prop.impact$EVTYPE[1:5], las = 2, main = "Property damage")
As the figure shows, tornadoes and flash floods appear in both top-five rankings and are therefore among the most devastating weather events for both people and property.
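The overlap between the two rankings can also be checked directly rather than read off the plots. The sketch below copies the top five event types from the two tables printed earlier; the variable names are hypothetical.

```r
# Top five event types, copied from the two rankings printed earlier
fatal.top5 <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING")
damage.top5 <- c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "FLASH FLOOD")

# Event types present in both lists
both <- intersect(fatal.top5, damage.top5)
print(both)
```

With the full analysis loaded, the same check could be run on `health.impact$EVTYPE[1:5]` and `prop.impact$EVTYPE[1:5]`.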