The effects of major storms and weather events in the United States

author: Derek Corcoran

Synopsis

The National Oceanic and Atmospheric Administration (NOAA) studies weather and oceanographic events, including severe storms and other major weather events in the US. Among the data it reports are the numbers of deaths and injuries, as well as crop and property damage.

This document summarises the top five event types in terms of deaths, injuries and combined economic losses. Our analysis indicates that tornadoes cause by far the most harm to population health and the greatest economic damage across the United States.

Data processing

Since the dataset is large and slow to process, the whole R Markdown document is set to cache its results.
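One common way to enable caching for every chunk is a knitr option set in the first chunk of the document. The line below is a sketch of that approach, not necessarily the setup used here, and it assumes the knitr package is installed:

```r
# Assumed setup chunk: cache the results of all later chunks so that
# unchanged chunks are not re-run each time the document is knitted
knitr::opts_chunk$set(cache = TRUE)
```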

A directory is created, and the data are downloaded to the local machine and loaded into R.

if (!file.exists("./PeerAssessment2")) {dir.create("./PeerAssessment2")}
fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
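# The archive is bzip2-compressed despite the ".zip" file name; "wb" keeps the binary file intact
download.file(fileURL, destfile = "./PeerAssessment2/Data.zip", mode = "wb")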
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.0 (2015-02-19) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.19.0 (2015-02-27) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## 
## R.utils v2.0.1 (2015-04-24) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
## 
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
if (!file.exists("./PeerAssessment2/Data")) {
  bunzip2 ("./PeerAssessment2/Data.zip", destname = "./PeerAssessment2/Data")
}
list.files("./PeerAssessment2")
## [1] "Data"     "Data.zip"
Storm.data <- read.csv ('./PeerAssessment2/Data')
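As a side note, base R can usually read a bzip2-compressed CSV directly, because read.csv() opens its file through a connection that detects compression. The following sketch demonstrates this on a small temporary file with made-up values (the file and its contents are illustrative and not part of the storm data):

```r
# Write a tiny, made-up CSV through a bzip2 connection (illustrative data only)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(15, 2))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression, so no separate bunzip2() step is needed
toy_back <- read.csv(tmp)
```

If that holds for the downloaded archive as well, calling read.csv() on "./PeerAssessment2/Data.zip" would make the explicit decompression step optional.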

We inspect the structure of the data:

str(Storm.data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Results

Which types of events are most harmful with respect to population health?

Two variables describe population health: the number of injuries (INJURIES) and the number of deaths (FATALITIES). To summarise the data and answer this question, we load the dplyr package.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

We now total the number of injuries by event type and plot the top five causes of injuries from natural catastrophes in the United States.

# Total injuries per event type, sorted from highest to lowest
bad_health_injuries <- summarise(group_by(Storm.data, EVTYPE), Total_injuries = sum(INJURIES))
bad_health_injuries <- arrange(bad_health_injuries, desc(Total_injuries))
# Keep the five event types with the most injuries and plot them
top_five_injuries <- bad_health_injuries[1:5, ]
barplot(top_five_injuries$Total_injuries, names.arg = top_five_injuries$EVTYPE, cex.names = 0.8, main = "Number of injuries caused")
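To make the group-then-rank step easier to follow, here is the same pattern on a toy data frame, written with dplyr's pipe. The data frame toy and its values are made up for illustration and are not taken from the storm dataset:

```r
library(dplyr)

# Made-up records: two TORNADO rows, two FLOOD rows, one HEAT row
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES = c(15, 2, 6, 1, 0)
)

# Sum injuries within each event type, sort descending, keep the top two
toy_top <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Total_injuries = sum(INJURIES)) %>%
  arrange(desc(Total_injuries)) %>%
  head(2)
```

Each event type collapses to a single row holding its total, which is what the bar plot then displays.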

We now total the number of deaths by event type and plot the top five causes of deaths from natural catastrophes in the United States.

# Total deaths per event type, sorted from highest to lowest
bad_health_deaths <- summarise(group_by(Storm.data, EVTYPE), Total_deaths = sum(FATALITIES))
bad_health_deaths <- arrange(bad_health_deaths, desc(Total_deaths))
# Keep the five event types with the most deaths and plot them
top_five_deaths <- bad_health_deaths[1:5, ]
barplot(top_five_deaths$Total_deaths, names.arg = top_five_deaths$EVTYPE, cex.names = 0.6, main = "Number of Deaths caused")
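The injuries and deaths summaries follow the same steps, so they could be folded into one helper. The sketch below uses base R only; the function name top_events and the toy data are hypothetical, and it assumes the data frame has an EVTYPE column as in Storm.data:

```r
# Hypothetical helper: total one numeric column per event type and
# return the n largest totals as a named vector
top_events <- function(data, value_col, n = 5) {
  totals <- tapply(data[[value_col]], data$EVTYPE, sum, na.rm = TRUE)
  totals <- sort(totals, decreasing = TRUE)  # sort() also drops empty groups (NA)
  head(totals, n)
}

# Made-up records for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(15, 2, 6, 1, 0),
  FATALITIES = c(1, 0, 0, 3, 1)
)

toy_injuries <- top_events(toy, "INJURIES", n = 2)
toy_deaths   <- top_events(toy, "FATALITIES", n = 2)
```

Because the result is a named vector, it can be passed straight to barplot(), for example barplot(top_events(Storm.data, "FATALITIES"), cex.names = 0.6).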

Which types of events have the greatest economic consequences?

In terms of economic losses, the variables CROPDMG and PROPDMG record the estimated crop damage and property damage, respectively. In our analysis, economic loss is the sum of the two, taken from the raw values without applying the magnitude codes in CROPDMGEXP and PROPDMGEXP. We sum both losses and plot the top five natural disasters by total economic loss.

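# Combined crop and property damage per event type, sorted from highest to lowest
Economic_loss <- summarise(group_by(Storm.data, EVTYPE), Total_loss = (sum(CROPDMG) + sum(PROPDMG)))
Economic_loss <- arrange(Economic_loss, desc(Total_loss))
# Keep the five event types with the largest losses and plot them
top_five_loss <- Economic_loss[1:5, ]
barplot(top_five_loss$Total_loss, names.arg = top_five_loss$EVTYPE, cex.names = 0.8, main = "Total economic loss")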
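The totals above add the raw CROPDMG and PROPDMG values. The dataset also has CROPDMGEXP and PROPDMGEXP columns, which hold magnitude codes for those values. Below is a minimal sketch of scaling a damage figure by its code. It assumes that K, M and B stand for thousands, millions and billions and, as a simplifying assumption, treats every other code as a multiplier of 1. The helper name exp_multiplier and the toy values are hypothetical:

```r
# Hypothetical helper: translate a magnitude code into a numeric multiplier
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1  # assumption: blank or unrecognised codes mean "no scaling"
  unname(mult)
}

# Made-up damage records for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "m", "")
)

# Property damage in dollars under the assumptions above
toy$PROP_DOLLARS <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
```

Applying the same multiplier to CROPDMG with CROPDMGEXP before summing would put both loss components on a common dollar scale.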

Final thoughts

Tornadoes are the only natural catastrophe to appear in the top five in all three categories. Furthermore, they are the leading cause of injuries, deaths and economic loss in the US, and in each category their total is at least double that of the second-ranked cause.