This analysis endeavours to identify the most dangerous and the most costly meteorological events in the USA. For this purpose, I downloaded the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. I define the most dangerous event types as those with the highest average number of fatalities per event, and the most costly event types as those with the highest average economic damage (property plus crop damage) per event.
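# Author-specific working directory: change this path (or drop the call) to run the analysis elsewhere
setwd("/home/cristian/Documents/PhD/lectures/coursera/reproducibleResearch/week4/")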
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
storms <- read.csv("repdata_data_StormData.csv")
colnames(storms) <- tolower(colnames(storms))
colnames(storms)
## [1] "state__" "bgn_date" "bgn_time" "time_zone" "county"
## [6] "countyname" "state" "evtype" "bgn_range" "bgn_azi"
## [11] "bgn_locati" "end_date" "end_time" "county_end" "countyendn"
## [16] "end_range" "end_azi" "end_locati" "length" "width"
## [21] "f" "mag" "fatalities" "injuries" "propdmg"
## [26] "propdmgexp" "cropdmg" "cropdmgexp" "wfo" "stateoffic"
## [31] "zonenames" "latitude" "longitude" "latitude_e" "longitude_"
## [36] "remarks" "refnum"
dim(storms)
## [1] 902297 37
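The evtype column is free text, and group_by() treats every distinct spelling as its own event type. Labels that differ only in letter case or surrounding whitespace are therefore counted separately. Below is a minimal sketch, assuming that upper-casing and trimming is an acceptable first cleaning step. The helper normaliseEvtype and the toy labels are hypothetical and do not come from the NOAA file.

```r
# Hypothetical helper: canonical form for event-type labels.
# It upper-cases and trims surrounding whitespace only; it does NOT merge
# synonyms such as "TSTM WIND" and "THUNDERSTORM WIND".
normaliseEvtype <- function(evtype) {
  # as.character() handles factor columns, which read.csv() creates by default in R 3.2.3
  toupper(trimws(as.character(evtype)))
}

# Made-up toy labels (not taken from the NOAA file)
toy <- data.frame(evtype = c("Tornado", " TORNADO", "tornado ", "HAIL"),
                  stringsAsFactors = FALSE)
toy$evtype <- normaliseEvtype(toy$evtype)
table(toy$evtype)
```

On the real data, this step would run before grouping, for example `storms$evtype <- normaliseEvtype(storms$evtype)`. Merging abbreviations or synonyms would need an explicit lookup table.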
To identify the most dangerous event types, I group the events by event type and calculate the average number of fatalities per event.
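# Wrap the data frame as a dplyr tbl (tbl_df() is the dplyr 0.4.3 idiom;
# later releases replace it with tibble::as_tibble())
stormsDt <- tbl_df(storms)
# Average number of fatalities per event for each event type
stormsEvtype <- group_by(stormsDt, evtype)
fatalitiesByEvtype <- summarize(stormsEvtype, avgFatalities = mean(fatalities))
# Keep only event types that average more than 5 deaths per event
filter(fatalitiesByEvtype, avgFatalities > 5)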
## Source: local data frame [4 x 2]
##
## evtype avgFatalities
## (fctr) (dbl)
## 1 COLD AND SNOW 14.000000
## 2 RECORD/EXCESSIVE HEAT 5.666667
## 3 TORNADOES, TSTM WIND, HAIL 25.000000
## 4 TROPICAL STORM GORDON 8.000000
The event type TORNADOES, TSTM WIND, HAIL (tornadoes combined with thunderstorm wind and hail) has the highest average number of fatalities, at 25 deaths per event. It is followed by COLD AND SNOW (14 deaths per event), TROPICAL STORM GORDON (8) and RECORD/EXCESSIVE HEAT (about 5.7).
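An average per event type does not show how many events it is based on. A label recorded once with many deaths can therefore outrank a type recorded many times with a lower mean. Below is a minimal sketch on made-up toy data that reports the event count and the total alongside the mean, so the two rankings can be compared. The labels and numbers are hypothetical.

```r
library(dplyr)

# Made-up toy data: one rare label with a single deadly event and a
# frequent label with several events of lower average severity
toy <- data.frame(
  evtype = c("RARE COMBO", rep("TORNADO", 5)),
  fatalities = c(25, 10, 8, 0, 12, 5)
)

# Count, mean and total fatalities for each event type
fatalitySummary <- toy %>%
  group_by(evtype) %>%
  summarize(nEvents = n(),
            avgFatalities = mean(fatalities),
            totalFatalities = sum(fatalities))

# Ranking by average per event versus ranking by total over all events
arrange(fatalitySummary, desc(avgFatalities))
arrange(fatalitySummary, desc(totalFatalities))
```

Here RARE COMBO leads on the average (25 from a single event), while TORNADO leads on the total (35 over five events). The same summarize() call could be applied to stormsEvtype. Whether the average or the total better captures "most dangerous" depends on the question being asked.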
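I create a new variable, economicDamage, as the sum of the property damage (propdmg) and crop damage (cropdmg) columns. In the NOAA data, these columns hold each rounded estimate without its magnitude. The magnitude is stored separately in propdmgexp and cropdmgexp: K for thousands, M for millions and B for billions, according to the National Weather Service Storm Data Documentation. Because the sum below does not apply these multipliers, economicDamage is an unscaled value rather than an amount in dollars.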
# Unscaled damage: the propdmgexp/cropdmgexp multipliers are not applied here
stormsDt <- mutate(stormsDt, economicDamage = propdmg + cropdmg)
hist(stormsDt$economicDamage, xlab = "Economic damage (unscaled)",
     main = "Histogram of economic damage")
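The National Weather Service Storm Data Documentation records damage estimates as a rounded number plus a magnitude letter: K for thousands, M for millions and B for billions. Converting propdmg and cropdmg to dollars therefore requires multiplying each value by the multiplier encoded in propdmgexp and cropdmgexp. Below is a minimal sketch, assuming that only K, M and B (in either case) carry a magnitude and that any other code means a multiplier of 1. That second assumption is a simplification. The helper expToMultiplier and the toy rows are hypothetical.

```r
# Hypothetical helper: convert a damage exponent code to a numeric multiplier.
# K/k = 1e3, M/m = 1e6, B/b = 1e9; every other code (blank, digits, symbols, NA)
# is ASSUMED to mean a multiplier of 1.
expToMultiplier <- function(exp) {
  code <- toupper(trimws(as.character(exp)))
  multiplier <- rep(1, length(code))
  multiplier[code %in% "K"] <- 1e3
  multiplier[code %in% "M"] <- 1e6
  multiplier[code %in% "B"] <- 1e9
  multiplier
}

# Made-up toy rows (not from the NOAA file)
toy <- data.frame(propdmg = c(25, 1.55, 0, 3),
                  propdmgexp = c("K", "B", "", "m"),
                  cropdmg = c(0, 10, 0, 0),
                  cropdmgexp = c("", "M", "", ""),
                  stringsAsFactors = FALSE)

# Damage in dollars: each value times its multiplier, summed over property and crops
toy$economicDamage <- toy$propdmg * expToMultiplier(toy$propdmgexp) +
  toy$cropdmg * expToMultiplier(toy$cropdmgexp)
toy$economicDamage
```

In the analysis above, the same expression could replace `propdmg + cropdmg` inside mutate(). The averages, and possibly the ranking, would then change.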
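# Regroup the updated table and average the damage per event type
stormsEvtype <- group_by(stormsDt, evtype)
economicDamageByEvtype <- summarize(stormsEvtype, avgEconomicDamage = mean(economicDamage))
# Event types averaging more than 600, sorted from most to least costly
arrange(filter(economicDamageByEvtype, avgEconomicDamage > 600), desc(avgEconomicDamage))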
## Source: local data frame [2 x 2]
##
## evtype avgEconomicDamage
## (fctr) (dbl)
## 1 TROPICAL STORM GORDON 1000
## 2 COASTAL EROSION 766
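The two event types with the highest average economic damage are TROPICAL STORM GORDON and COASTAL EROSION, with averages of 1000 and 766 per event, respectively. Because the propdmgexp and cropdmgexp multipliers were not applied, these figures are unscaled values, not dollar amounts.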
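TROPICAL STORM GORDON names a single storm, so its average may rest on very few records. A common safeguard is to count the events per type and drop rare types before ranking. Below is a minimal sketch on made-up toy data with already-scaled damage. The threshold minEvents is a hypothetical parameter, not part of the original analysis.

```r
library(dplyr)

# Made-up toy data with damage already expressed in dollars
toy <- data.frame(
  evtype = c("NAMED STORM X", "FLOOD", "FLOOD", "FLOOD", "HAIL", "HAIL"),
  economicDamage = c(1000, 500, 700, 900, 50, 150)
)

minEvents <- 2  # hypothetical threshold; choose it to suit the analysis

# Average damage per type, keeping only types with at least minEvents records
damageRanking <- toy %>%
  group_by(evtype) %>%
  summarize(nEvents = n(), avgEconomicDamage = mean(economicDamage)) %>%
  filter(nEvents >= minEvents) %>%
  arrange(desc(avgEconomicDamage))
damageRanking
```

With minEvents = 2, the single-record NAMED STORM X is dropped despite having the highest average, and FLOOD (average 700 over three events) tops the ranking.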
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
##
## locale:
## [1] LC_CTYPE=fr_CH.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_CH.UTF-8 LC_COLLATE=fr_CH.UTF-8
## [5] LC_MONETARY=fr_CH.UTF-8 LC_MESSAGES=fr_CH.UTF-8
## [7] LC_PAPER=fr_CH.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_CH.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_0.4.3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.4 digest_0.6.9 assertthat_0.1 R6_2.1.2
## [5] DBI_0.3.1 formatR_1.3 magrittr_1.5 evaluate_0.8.3
## [9] stringi_1.0-1 lazyeval_0.1.10 rmarkdown_0.9.5 tools_3.2.3
## [13] stringr_1.0.0 yaml_2.1.13 parallel_3.2.3 htmltools_0.3.5
## [17] knitr_1.12.3