Synopsis

This analysis endeavours to identify the most dangerous and costly meteorological events in the USA. For this purpose, I downloaded the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. First, I calculate the average number of fatalities for each event type and define the most dangerous event types as the ones with the highest average number of fatalities per event. Second, I calculate the average economic damage, taken as the sum of property damage and crop damage, for each event type.

Data processing

Set working directory and load packages

setwd("/home/cristian/Documents/PhD/lectures/coursera/reproducibleResearch/week4/")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Load the data

storms <- read.csv("repdata_data_StormData.csv")
colnames(storms) <- tolower(colnames(storms))
colnames(storms)
##  [1] "state__"    "bgn_date"   "bgn_time"   "time_zone"  "county"    
##  [6] "countyname" "state"      "evtype"     "bgn_range"  "bgn_azi"   
## [11] "bgn_locati" "end_date"   "end_time"   "county_end" "countyendn"
## [16] "end_range"  "end_azi"    "end_locati" "length"     "width"     
## [21] "f"          "mag"        "fatalities" "injuries"   "propdmg"   
## [26] "propdmgexp" "cropdmg"    "cropdmgexp" "wfo"        "stateoffic"
## [31] "zonenames"  "latitude"   "longitude"  "latitude_e" "longitude_"
## [36] "remarks"    "refnum"
dim(storms)
## [1] 902297     37

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To address this question, I calculate the average number of fatalities for each event type.

stormsDt <- tbl_df(storms)
stormsEvtype <- group_by(stormsDt, evtype)
fatalitiesByEvtype <- summarize(stormsEvtype, avgFatalities = mean(fatalities))
filter(fatalitiesByEvtype, avgFatalities > 5)
## Source: local data frame [4 x 2]
## 
##                       evtype avgFatalities
##                       (fctr)         (dbl)
## 1              COLD AND SNOW     14.000000
## 2      RECORD/EXCESSIVE HEAT      5.666667
## 3 TORNADOES, TSTM WIND, HAIL     25.000000
## 4      TROPICAL STORM GORDON      8.000000

Events recorded as tornadoes accompanied by thunderstorm wind and hail cause the most fatalities, with an average of 25 deaths per event. Next come cold and snow (14), Tropical Storm Gordon (8) and record/excessive heat (about 5.7).
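A per-type average can be dominated by event labels that occur only a handful of times, so it can help to read the mean alongside the record count and the total. The following is a minimal sketch in base R on made-up toy data; the toy data frame, its values and the column names nEvents and totalFatalities are illustrative assumptions, not taken from the storm database.

```r
# Toy data standing in for the storms table (values are made up).
toy <- data.frame(
  evtype = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "RARE EVENT"),
  fatalities = c(0, 2, 0, 10, 8),
  stringsAsFactors = FALSE
)

# Split the fatalities column into one numeric vector per event type.
groups <- split(toy$fatalities, toy$evtype)

# Record count, total and mean of fatalities for each event type.
summaryByType <- data.frame(
  evtype = names(groups),
  nEvents = vapply(groups, length, integer(1)),
  totalFatalities = vapply(groups, sum, numeric(1)),
  avgFatalities = vapply(groups, mean, numeric(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Sort by total fatalities, largest first.
summaryByType <- summaryByType[order(-summaryByType$totalFatalities), ]
print(summaryByType)
```

In this toy example the rare label has the higher average (a single record with 8 fatalities), while the frequent label has the higher total, which is why the two rankings can disagree.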

Across the United States, which types of events have the greatest economic consequences?

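I create a new variable called economicDamage that is the sum of the propdmg (property damage) and cropdmg (crop damage) columns. This sum uses the raw values and does not apply the magnitude codes stored in propdmgexp and cropdmgexp, so it is not expressed in dollars.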

stormsDt <- mutate(stormsDt, economicDamage = propdmg + cropdmg)
hist(stormsDt$economicDamage, xlab = "Economic damage (propdmg + cropdmg, unscaled)",
     main = "Histogram of economic damage")

stormsEvtype <- group_by(stormsDt, evtype)
economicDamageByEvtype <- summarize(stormsEvtype, avgEconomicDamage = mean(economicDamage))
arrange(filter(economicDamageByEvtype, avgEconomicDamage > 600), desc(avgEconomicDamage))
## Source: local data frame [2 x 2]
## 
##                  evtype avgEconomicDamage
##                  (fctr)             (dbl)
## 1 TROPICAL STORM GORDON              1000
## 2       COASTAL EROSION               766

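The two event types with the highest average damage values are Tropical Storm Gordon and coastal erosion, with averages of 1000 and 766 per event, respectively. These figures are in the units of propdmg and cropdmg before the magnitude codes in propdmgexp and cropdmgexp are applied, so they are not dollar amounts.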
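The storm database keeps damage magnitudes in separate columns: propdmgexp and cropdmgexp hold codes such as K, M and B (thousands, millions and billions) that scale propdmg and cropdmg. A minimal sketch of converting to dollars before summing might look like the following. The helper expToMultiplier, the toy data and the handling of other codes are assumptions made for illustration: an empty code is treated as no scaling and any unrecognized code yields NA.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumptions: "" means no scaling; unrecognized codes give NA.
expToMultiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(trimws(as.character(code)))
  mult <- unname(lookup[code])            # NA for anything not K, M or B
  mult[!is.na(code) & code == ""] <- 1    # empty code: keep the raw value
  mult
}

# Toy rows standing in for the storms table (values are made up).
toy <- data.frame(
  propdmg = c(25, 2.5, 1, 10),
  propdmgexp = c("K", "M", "B", ""),
  cropdmg = c(0, 5, 0, 0),
  cropdmgexp = c("", "k", "", "?"),
  stringsAsFactors = FALSE
)

# Scale each damage column by its own code, then add the two.
toy$propDollars <- toy$propdmg * expToMultiplier(toy$propdmgexp)
toy$cropDollars <- toy$cropdmg * expToMultiplier(toy$cropdmgexp)
toy$economicDamageDollars <- toy$propDollars + toy$cropDollars
print(toy)
```

Returning NA for unrecognized codes keeps questionable records visible instead of silently counting them; whether to drop, keep or recode such rows is a separate decision that depends on the analysis.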

Source of the data

National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

Reproducible research: session info

sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
## 
## locale:
##  [1] LC_CTYPE=fr_CH.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_CH.UTF-8        LC_COLLATE=fr_CH.UTF-8    
##  [5] LC_MONETARY=fr_CH.UTF-8    LC_MESSAGES=fr_CH.UTF-8   
##  [7] LC_PAPER=fr_CH.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_CH.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dplyr_0.4.3
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.4     digest_0.6.9    assertthat_0.1  R6_2.1.2       
##  [5] DBI_0.3.1       formatR_1.3     magrittr_1.5    evaluate_0.8.3 
##  [9] stringi_1.0-1   lazyeval_0.1.10 rmarkdown_0.9.5 tools_3.2.3    
## [13] stringr_1.0.0   yaml_2.1.13     parallel_3.2.3  htmltools_0.3.5
## [17] knitr_1.12.3