Synopsis

This R Markdown report downloads the raw storm data, cleans it, and analyses weather events in the US. It shows that tornadoes are the deadliest events and that floods cause the highest property damage.

Data Processing

As a first step, we download and read the raw data, which is provided as a bzip2-compressed CSV file at the URL given in the instructions.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(knitr)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# mode = "wb" keeps the compressed file intact on Windows
download.file(url = url, destfile = "data.csv.bz2", mode = "wb")
raw.data <- as_tibble(read.csv("data.csv.bz2"))
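Because the report downloads the file again every time it is knitted, a small guard can skip the download when a local copy already exists. The following is only a sketch: `fetch_once` and its `downloader` argument are hypothetical names introduced for illustration and are not part of the original analysis.

```r
# Hypothetical helper (an assumption, not part of the original analysis):
# download a file only when no local copy exists yet.
fetch_once <- function(url, destfile, downloader = download.file) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the file in binary mode
    downloader(url, destfile, mode = "wb")
  }
  # Return the path invisibly so the call can be chained or ignored
  invisible(destfile)
}
```

In this report it could be called as `fetch_once(url, "data.csv.bz2")` before `read.csv()`. The `downloader` argument exists only so that the helper can be exercised without network access.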

For this analysis we care about only four variables: event type (EVTYPE), fatalities (FATALITIES), property damage (PROPDMG) and the property damage exponent (PROPDMGEXP), so a smaller data frame containing just those variables is created. PROPDMGEXP encodes a multiplier for PROPDMG (for example, "K" for thousands and "M" for millions), so a new column is computed that holds the actual numerical value of the damage.

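study.data <- select(raw.data, EVTYPE, FATALITIES, PROPDMG, PROPDMGEXP)
# Exponent codes found in PROPDMGEXP and the multiplier assigned to each
from <- c("K", "M", "",  "B", "m",
          "+", "0", "5", "6", "?",
          "4", "2", "3", "h", "7",
          "H", "-", "1", "8")
# "h" and "H" both stand for hundreds
to <- c(1000, 1000000, 0, 1000000000, 1000000,
        1, 1, 1, 1, 0,
        1, 1, 1, 100, 1,
        100, 0, 1, 1)
map <- setNames(to, from)
# Look up by character value; codes without a match (including "") give NA
new <- unname(map[as.character(study.data$PROPDMGEXP)])
new[is.na(new)] <- 0
study.data$property.damage <- study.data$PROPDMG * new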
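To make the multiplier lookup concrete, here is a minimal sketch on a toy data frame. The four rows are invented for illustration and are not taken from the storm data, and the lookup table is reduced to three codes.

```r
# Toy values, invented for illustration only (not taken from the storm data)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Reduced lookup table: exponent code -> numeric multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Character indexing returns NA for codes missing from the table
# (including the empty string), so those are set to 0 afterwards
mult <- unname(multiplier[toy$PROPDMGEXP])
mult[is.na(mult)] <- 0

toy$property.damage <- toy$PROPDMG * mult
toy$property.damage
```

The rows evaluate to 25 thousand, 2.5 million and 1.2 billion dollars, and the row with an empty exponent code gets a damage of 0.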

Results

The first question asked in this study is: Across the United States, which types of events are most harmful with respect to population health? Harm is measured here by the variable FATALITIES.

health.impact <- study.data %>%
        group_by(EVTYPE) %>%
        summarise(fatalities=sum(FATALITIES)) %>%
        mutate(perc.fatalities=fatalities/sum(fatalities)*100) %>%
        arrange(desc(fatalities))
## `summarise()` ungrouping output (override with `.groups` argument)
head(health.impact)
## # A tibble: 6 x 3
##   EVTYPE         fatalities perc.fatalities
##   <chr>               <dbl>           <dbl>
## 1 TORNADO              5633           37.2 
## 2 EXCESSIVE HEAT       1903           12.6 
## 3 FLASH FLOOD           978            6.46
## 4 HEAT                  937            6.19
## 5 LIGHTNING             816            5.39
## 6 TSTM WIND             504            3.33

We can see that tornadoes (37.2%) and excessive heat (12.6%) together account for about half of all recorded deaths and are therefore the most harmful events with respect to population health.
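The percentages in the table can be accumulated to see how quickly the top events add up. This sketch copies the six `perc.fatalities` values from the output above into a named vector. The name `cum.share` is introduced here only for illustration.

```r
# Percentages copied from the head(health.impact) output above
perc.fatalities <- c("TORNADO" = 37.2, "EXCESSIVE HEAT" = 12.6,
                     "FLASH FLOOD" = 6.46, "HEAT" = 6.19,
                     "LIGHTNING" = 5.39, "TSTM WIND" = 3.33)

# Running total of the share of fatalities, from the deadliest event down
cum.share <- cumsum(perc.fatalities)
round(cum.share, 2)
```

The first two entries already sum to 49.8%, and the six event types shown cover roughly 71% of all fatalities.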

The second question asked in this study is: Across the United States, which types of events are most harmful with respect to property damage?

prop.impact <- study.data %>%
        group_by(EVTYPE) %>%
        summarise(damage=sum(property.damage)) %>%
        mutate(perc.damage=damage/sum(damage)*100) %>%
        arrange(desc(damage))
## `summarise()` ungrouping output (override with `.groups` argument)
head(prop.impact)
## # A tibble: 6 x 3
##   EVTYPE                   damage perc.damage
##   <chr>                     <dbl>       <dbl>
## 1 FLOOD             144657709800        33.9 
## 2 HURRICANE/TYPHOON  69305840000        16.2 
## 3 TORNADO            56937160776.       13.3 
## 4 STORM SURGE        43323536000        10.1 
## 5 FLASH FLOOD        16140811860.        3.78
## 6 HAIL               15732267486.        3.68

As for property damage, floods are the most damaging event type (33.9% of the total), followed by hurricanes/typhoons (16.2%) and tornadoes (13.3%).

# Two panels side by side: top five events by fatalities and by property damage
par(mfrow = c(1, 2))
barplot(height = health.impact$fatalities[1:5], names.arg = health.impact$EVTYPE[1:5], las = 2, main = "Fatalities")
barplot(height = prop.impact$damage[1:5], names.arg = prop.impact$EVTYPE[1:5], las = 2, main = "Property damage")

As the figure shows, tornadoes and flash floods appear in the top five of both rankings and are therefore devastating weather events for both people and property.
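The overlap between the two rankings can also be checked directly. This sketch hard-codes the top five event types from each table above. In the report itself the same vectors would be `health.impact$EVTYPE[1:5]` and `prop.impact$EVTYPE[1:5]`. The names `top.fatalities`, `top.damage` and `both` are introduced here for illustration.

```r
# Top five event types, copied from the two tables above
top.fatalities <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING")
top.damage     <- c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "FLASH FLOOD")

# Event types that appear in both rankings
both <- intersect(top.fatalities, top.damage)
both
```

Only TORNADO and FLASH FLOOD are returned, which matches the two bars shared by both panels.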