In this analysis, I use the NOAA Storm Database to identify which event type has the most devastating effect on human lives and which causes the most damage in monetary terms. I rely mostly on the data.table package for faster processing.
First, load the needed packages and set the working directory.
library(data.table)
library(ggplot2)
library(R.utils)
library(scales)
setwd("C:/Users/Rafael/Documents/GitHub/RepData_PeerAssessment2")
Environment Information
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
##
## locale:
## [1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252
## [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
## [5] LC_TIME=Portuguese_Brazil.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] backports_1.0.5 magrittr_1.5 rprojroot_1.2 tools_3.3.2
## [5] htmltools_0.3.5 yaml_2.1.14 Rcpp_0.12.9 stringi_1.1.2
## [9] rmarkdown_1.3 knitr_1.15.1 stringr_1.1.0 digest_0.6.11
## [13] evaluate_0.10
Load the data by downloading the raw file, unzipping it, and importing only the needed columns.
file <- "repdata-data-StormData.csv.bz2"
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", file)
pa2 <- fread(bunzip2(file), select = c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP"))
## Read 902297 rows and 7 (of 37) columns from 0.523 GB file in 00:00:03
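Re-running the report repeats the download each time. A minimal sketch of a guard that skips it when the file is already on disk; the helper name `download_if_missing` is hypothetical and not part of the original analysis:

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is missing.
# Returns TRUE if a download was attempted, FALSE if the file was already there.
download_if_missing <- function(url, dest) {
  if (file.exists(dest)) {
    return(FALSE)
  }
  # mode = "wb" requests a binary transfer so the compressed file is not altered
  download.file(url, dest, mode = "wb")
  TRUE
}
```

With this in place, the download step could be written as `download_if_missing(<the URL above>, file)`, leaving the rest of the loading code unchanged.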
Begin data transformation to evaluate with respect to population health.
Create a subset with the needed columns, grouped by event type and ordered by fatalities and then by injuries.
ph <- pa2[,.(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)), by=(EVTYPE)][order(-FATALITIES,-INJURIES)]
phMax <- tolower(ph[1,EVTYPE])
phMaxF <- format(ph[1,FATALITIES], big.mark = ",")
phMaxI <- format(ph[1,INJURIES], big.mark = ",")
head(ph)
## EVTYPE FATALITIES INJURIES
## 1: TORNADO 5633 91346
## 2: EXCESSIVE HEAT 1903 6525
## 3: FLASH FLOOD 978 1777
## 4: HEAT 937 2100
## 5: LIGHTNING 816 5230
## 6: TSTM WIND 504 6957
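To make the grouping and ordering step easier to follow, here is a small sketch of the same data.table pattern on invented toy data (the values are made up for illustration and do not come from the NOAA file). It shows how ties in fatalities are broken by injuries:

```r
library(data.table)

# Toy data, invented for illustration only.
toy <- data.table(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# Sum both measures per event type, then sort by fatalities and,
# for equal fatalities, by injuries (both descending).
toy_ph <- toy[, .(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)),
              by = EVTYPE][order(-FATALITIES, -INJURIES)]
print(toy_ph)
```

Here TORNADO and HEAT both total 5 fatalities, so TORNADO ranks first because of its higher injury count.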
Begin data transformation to evaluate economic consequences.
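Create a subset with the needed columns, grouped by event type. The damage exponent codes K, M and B are converted to multipliers of 10^3, 10^6 and 10^9; any other code contributes zero.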
ec <- pa2[, .(propDmgValue = sum(PROPDMG * ifelse(PROPDMGEXP == "K", 10^3,
                                           ifelse(PROPDMGEXP == "M", 10^6,
                                           ifelse(PROPDMGEXP == "B", 10^9, 0)))),
              cropDmgValue = sum(CROPDMG * ifelse(CROPDMGEXP == "K", 10^3,
                                           ifelse(CROPDMGEXP == "M", 10^6,
                                           ifelse(CROPDMGEXP == "B", 10^9, 0))))),
          by = (EVTYPE)]
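The nested ifelse calls map each exponent code to a multiplier. As an alternative sketch, the same mapping can be written as a named-vector lookup. The helper name `exp_multiplier` is hypothetical; it keeps the behavior of the code above, where anything other than uppercase K, M or B yields 0:

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
# Codes other than "K", "M" and "B" (including "" and lowercase letters)
# map to 0, matching the nested ifelse logic.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[as.character(code)])
  mult[is.na(mult)] <- 0
  mult
}

# Example: 25 with code "K" is 25,000; an unrecognized code gives 0.
values <- c(25, 2.5, 1) * exp_multiplier(c("K", "M", "?"))
print(values)
```

Inside the data.table call, this would read `sum(PROPDMG * exp_multiplier(PROPDMGEXP))`, which keeps each damage column on a single line.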
Create a column with the sum of property and crop damage, and order the data set by it. Plot a column chart of the top 10 values.
ec <- ec[,totalValue := propDmgValue + cropDmgValue][order(-totalValue)]
ecMax <- tolower(ec[1,EVTYPE])
ecMaxM <- format(ec[1,totalValue], small.mark = ".", big.mark = ",")
# head() returns only 6 rows by default, so ask for 10 explicitly
ggplot(head(ec, 10), aes(reorder(EVTYPE, -totalValue), totalValue)) +
  geom_col() +
  labs(y = "Total Value (USD)",
       x = "Event Type",
       title = "Top 10 Events - Damages in USD") +
  scale_y_continuous(labels = comma)
1 - Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Answer: The tornado is the most harmful event with respect to population health, with 5,633 fatalities and 91,346 injuries.
2 - Across the United States, which types of events have the greatest economic consequences?
Answer: The flood is the event with the greatest economic consequences, with $150,319,678,250 in damages.