Synopsis:

The analysis below uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to find which event types have the most damaging consequences for population health and for the economy. Tornadoes appear to be the most harmful to population health: a tornado accounts for the single event with the most injuries and, after excluding one outlier, the single event with the most fatalities. Floods appear to be the most damaging from an economic standpoint, with losses in the billions of dollars.

Data Processing

The analysis requires two packages: R.utils, which provides bunzip2() for decompressing the raw data file, and dplyr, which provides filter().

Load packages

library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.22.0 (2018-04-21) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## R.utils v2.7.0 successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
## 
##     timestamp
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Loading the data
The code below decompresses the data file (if it has not been decompressed already) and then loads the data with the read.csv() function.

##setwd("Set to your working directory")

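if (!file.exists('repdata%2Fdata%2FStormData.csv')) {
        ## remove = FALSE keeps the original .bz2 archive (bunzip2() deletes it by default)
        bunzip2('repdata%2Fdata%2FStormData.csv.bz2', remove = FALSE)
}

## stringsAsFactors = TRUE reproduces the factor columns shown by str() below;
## read.csv() stopped converting strings to factors by default in R 4.0.0
stormdata <- read.csv('repdata%2Fdata%2FStormData.csv', stringsAsFactors = TRUE)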
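As an aside, read.csv() opens its input through a file() connection, and file() detects bzip2 compression when reading, so the decompression step is optional. The sketch below demonstrates this on a tiny CSV written to a temporary .bz2 file; the column names mimic the storm data, but the values are made up for illustration.

```r
# Write a tiny, made-up CSV into a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,5,20",
             "FLOOD,1,0",
             "HAIL,0,2"), con)
close(con)

# read.csv() decompresses the .bz2 file transparently
demo <- read.csv(tmp, stringsAsFactors = TRUE)
str(demo)
```

On the real data the same call would be read.csv('repdata%2Fdata%2FStormData.csv.bz2', stringsAsFactors = TRUE), at the cost of decompressing on every load.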

Question 1 - Across the US, which types of events are most harmful to population health?

Description of analysis process
To answer this question, I used the str() function to inspect the names and types of the variables. After studying the data, the two most relevant variables appear to be FATALITIES and INJURIES, so I plotted each of them against event type (EVTYPE) to look for a relationship. Because EVTYPE is a factor, each event type is plotted at its factor-level index (1 to 985). The code below describes this process.

str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
##Set up to show the plots side-by-side
par(mfrow = c(1,2))

plot(stormdata$FATALITIES, as.integer(stormdata$EVTYPE), pch = 16, xlab = "Fatalities", ylab = "Event type index", main = "Fatalities vs Event Type")

plot(stormdata$INJURIES, as.integer(stormdata$EVTYPE), pch = 17, xlab = "Injuries", ylab = "Event type index", main = "Injuries vs Event Type")

The fatalities plot shows that the largest number of fatalities recorded for a single event is close to 600. If this point is treated as an outlier (a single observation from one year), the event type with the next-highest number of fatalities has an index somewhere between 800 and 850.

The injuries plot shows that the highest injury counts also occur consistently for an event type with an index somewhere between 800 and 850.
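Reading an index range such as 800 to 850 off a scatter plot is only approximate. A sketch of a numeric alternative, using base R's tapply() to compute the largest single-event count for each event type and then sorting the result; the data frame toy is made up and merely stands in for stormdata.

```r
# Made-up records standing in for stormdata
toy <- data.frame(
  EVTYPE     = factor(c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HAIL")),
  FATALITIES = c(158, 20, 3, 40, 0),
  INJURIES   = c(1700, 300, 10, 50, 5)
)

# Largest single-event count per event type, highest first
max_fatal_by_type  <- sort(tapply(toy$FATALITIES, toy$EVTYPE, max), decreasing = TRUE)
max_injury_by_type <- sort(tapply(toy$INJURIES,   toy$EVTYPE, max), decreasing = TRUE)

head(max_fatal_by_type, 3)
head(max_injury_by_type, 3)
```

Applied to stormdata, the names of the first entries would give the event types directly instead of a factor index.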

Analyzing maximum injuries caused by events
The code below finds the row with the maximum number of injuries and then reads the event type and the injury count from that row.

## Find out the row which has the maximum value for injuries
injuries_max <- which.max(stormdata$INJURIES)

## Find out the event type for that row (index by position, not by row name)
max_injuries_event <- as.character(stormdata[injuries_max, "EVTYPE"])

## Find out the maximum number of injuries
max_injuries <- stormdata[injuries_max, "INJURIES"]
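The lookup above can be illustrated on a small, made-up data frame. Note that which.max() returns the position of the first maximum when there are ties, and that indexing by that position directly, rather than by a row name built with paste(), does not depend on the data frame's row names. The row names here are deliberately not 1 to n, as can happen after subsetting.

```r
# Made-up records with row names that do not match row positions
toy <- data.frame(
  EVTYPE   = c("HAIL", "TORNADO", "FLOOD", "TORNADO"),
  INJURIES = c(2, 1700, 15, 1700),
  row.names = c("11", "12", "13", "14"),
  stringsAsFactors = FALSE
)

i <- which.max(toy$INJURIES)   # first position holding the maximum (a tie here)
by_position <- toy[i, "EVTYPE"]           # positional lookup: "TORNADO"
by_row_name <- toy[paste(i), "EVTYPE"]    # no row is named "2", so this is NA
```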

Results
The event with the maximum number of injuries is TORNADO, with 1,700 injuries.

Analyzing maximum fatalities caused by events
The fatalities plot shows only two events with more than 150 fatalities. I disregarded the one with close to 600 fatalities because it appears to represent a rare event, so the code below keeps only the single row with more than 150 and fewer than 500 fatalities.

## Keep the one row with between 150 and 500 fatalities (excludes the ~600 outlier);
## inside filter(), columns are referred to by their bare names
max_fatalities_row <- stormdata %>% filter(FATALITIES > 150, FATALITIES < 500)

##Find out the event type for that row
max_fatalities_event <- as.character(max_fatalities_row$EVTYPE)

## Find out the number of fatalities for that row
max_fatalities <- max_fatalities_row$FATALITIES
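For readers without dplyr, the same range filter can be written with base R's subset(), which, like filter(), evaluates bare column names inside the data frame. The event names and counts below are made up for illustration.

```r
# Made-up records: one extreme outlier, one row in range, two below the cutoff
toy <- data.frame(
  EVTYPE     = c("OUTLIER_EVENT", "TORNADO", "FLOOD", "LIGHTNING"),
  FATALITIES = c(600, 158, 3, 116),
  stringsAsFactors = FALSE
)

# Keep rows strictly between the two cutoffs used in the text
in_range <- subset(toy, FATALITIES > 150 & FATALITIES < 500)

# Equivalent logical indexing without subset()
in_range_idx <- toy[toy$FATALITIES > 150 & toy$FATALITIES < 500, ]
```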

Results
Excluding the outlier, the event with the maximum number of fatalities is TORNADO, with 158 fatalities.

In conclusion, tornado events appear to be the most harmful to population health.

Question 2 - Across the US, which types of events have the greatest economic consequences?

Description of analysis process
To answer this question, I again used the str() function to inspect the variables. After studying the data, I learnt that the variables most likely related to economic damage are PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. To make sense of these, I consulted the Storm Data Documentation linked from the assignment page and learnt that PROPDMGEXP and CROPDMGEXP give the magnitude of the damage amounts in PROPDMG and CROPDMG, where K stands for thousands, M for millions and B for billions of dollars.
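Put as a formula, damage in dollars = PROPDMG (or CROPDMG) × multiplier, where the multiplier comes from the exponent code. A minimal sketch of that conversion, assuming only the K, M and B codes named in the documentation are meaningful; exp_to_multiplier() is a hypothetical helper, and every other code (including a blank) is mapped to NA rather than guessed. The records are made up.

```r
# Hypothetical helper: map an exponent code to a dollar multiplier.
# Only "K", "M" and "B" are handled; every other code becomes NA.
exp_to_multiplier <- function(code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  unname(multipliers[as.character(code)])
}

# Made-up damage records
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 7),
  PROPDMGEXP = c("K", "M", "B", "?"),
  stringsAsFactors = FALSE
)

# Damage in dollars = amount x multiplier
toy$PROPDMG_USD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy
```

as.character() lets the helper accept the factor columns produced by read.csv(..., stringsAsFactors = TRUE) as well as plain strings.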

Because B denotes the largest magnitude, the biggest losses should be among the rows where PROPDMGEXP or CROPDMGEXP equals “B” (damage in billions of dollars), with the amount itself given by PROPDMG or CROPDMG. The next step is to find the event type corresponding to the maximum damage. The code and plots below describe this process.

##Set up to show the plots side-by-side
par(mfrow = c(1,2))

##Keep only the rows that have property damage in billions of dollars
max_prop <- stormdata %>% filter(PROPDMGEXP == "B")

## Find out the row which has maximum property damage
max_prop_1 <- which.max(max_prop$PROPDMG)

## Find out the event type associated with the maximum property damage
max_prop_event <- as.character(max_prop[max_prop_1, "EVTYPE"])

##Keep only the rows that have crop damage in billions of dollars
max_crop <- stormdata %>% filter(CROPDMGEXP == "B")

## Find out the row which has maximum crop damage
max_crop_1 <- which.max(max_crop$CROPDMG)

## Find out the event type associated with the maximum crop damage
max_crop_event <- as.character(max_crop[max_crop_1, "EVTYPE"])

plot(max_prop$PROPDMG, as.integer(max_prop$EVTYPE), pch = 16, xlab = "Property Damage (Billions USD)", ylab = "Event Type Index", main = "Event Index vs Property Damage")

plot(max_crop$CROPDMG, as.integer(max_crop$EVTYPE), pch = 18, xlab = "Crop Damage (Billions USD)", ylab = "Event Type Index", main = "Event Index vs Crop Damage")
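The property and crop steps repeat the same filter-then-which.max() pattern, so they could be folded into one function. A sketch of that refactoring in base R; top_event_in_billions() is a hypothetical helper and the records are made up. Using %in% rather than == keeps rows with a missing exponent code out of the subset.

```r
# Hypothetical helper: among rows whose exponent column equals "B",
# return the event type and amount of the largest damage entry.
top_event_in_billions <- function(df, value_col, exp_col) {
  billions <- df[df[[exp_col]] %in% "B", , drop = FALSE]
  if (nrow(billions) == 0) return(NULL)
  i <- which.max(billions[[value_col]])
  list(event = as.character(billions$EVTYPE[i]),
       value = billions[[value_col]][i])
}

# Made-up damage records
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HURRICANE", "RIVER FLOOD", "DROUGHT"),
  PROPDMG    = c(100, 5, 5, 0),
  PROPDMGEXP = c("B", "B", "B", ""),
  CROPDMG    = c(0, 0.5, 5, 1),
  CROPDMGEXP = c("", "B", "B", "M"),
  stringsAsFactors = FALSE
)

prop_top <- top_event_in_billions(toy, "PROPDMG", "PROPDMGEXP")
crop_top <- top_event_in_billions(toy, "CROPDMG", "CROPDMGEXP")
```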

Results
Among the records with damage in billions of dollars, the event with the maximum property damage is FLOOD and the event with the maximum crop damage is RIVER FLOOD.

It can therefore be concluded that flood events do the most economic damage.