Craig Cunningham
10/20/2017

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The events in the database start in the year 1950 and end in November 2011. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This report analyzes the storm event records in the severe weather tracking database to determine which of these event types has the most impact in terms of harm to human health and property damage.

The analysis results show that Tornado events are associated with the vast majority of weather-related human and economic impact.

Data Processing

The data for this report come in the form of a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. The file, Storm Data (47 MB), was downloaded from the URL given in the code below.

Database documentation, which explains how some of the variables are constructed and defined, is available from the following sources:
- National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ

Code for downloading and extracting database…

## Source file
## https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
if(!file.exists("repdata-data-StormData.csv")) {
        if(!file.exists("repdata-data-StormData.csv.bz2")) {
                download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                              "repdata-data-StormData.csv.bz2",
                              method = "auto")
        }

        library(R.utils)
        bunzip2("repdata-data-StormData.csv.bz2")
}
## The CSV is read into a data frame in the next step; only its size is recorded here
totBytes <- file.size("repdata-data-StormData.csv")

The total size of the data file is 5.6163745 × 10^8 bytes (roughly 562 MB).

Processing Data for Analysis

Once extracted, the data were loaded into a table for processing.

## load libraries
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(RColorBrewer)

## read.csv() defaults to header = TRUE and sep = ","
stormData <- read.csv("repdata-data-StormData.csv")
nmRecs <- nrow(stormData) ## capture the number of records
names(stormData) ## List the columns
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

The dataset has a total of 902,297 records.

This report focuses on the following fields:
- Type of event, recorded in column 8 (EVTYPE)
- Number of fatalities, recorded in column 23 (FATALITIES)
- Number of injuries, recorded in column 24 (INJURIES)
- Amount of property damage, recorded in columns 25 and 26 (PROPDMG and PROPDMGEXP)
- Amount of crop damage, recorded in columns 27 and 28 (CROPDMG and CROPDMGEXP)

Code to subset the dataset for fields of interest

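## Select the fields of interest by name (columns 8 and 23 through 28)
df <- stormData[, c("EVTYPE", "FATALITIES", "INJURIES",
                    "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]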

CROPDMGEXP and PROPDMGEXP hold multiplier codes: H for hundreds, K for thousands, M for millions, and B for billions, in either upper or lower case.

Create a table to convert damage expense codes to multipliers

## ---- 
## create vectors to translate EXP columns to multipliers
code <- c("h","H","k","K","m","M","b","B")
multp <- c(100, 100,1000,1000,1000000,1000000,1000000000,1000000000)
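
To make the intended translation concrete, the following is a minimal sketch of a lookup with match(). It assumes that any code missing from the table should map to a multiplier of 0, which is how the damage computation in this report treats such codes. The sample vector exp_codes is invented for illustration and is not taken from the dataset.

```r
## Lookup vectors as defined in the report
code  <- c("h", "H", "k", "K", "m", "M", "b", "B")
multp <- c(100, 100, 1000, 1000, 1e6, 1e6, 1e9, 1e9)

## Hypothetical sample of exponent codes (illustration only)
exp_codes <- c("K", "m", "", "B", "5", "h")

## match() gives the position of each code in `code` (NA if absent);
## that position then indexes into `multp`
mult <- multp[match(exp_codes, code)]
mult[is.na(mult)] <- 0  # assumption: unrecognized codes count as 0

print(data.frame(exp_codes, mult))
```

Looking up by position keeps the code and multiplier vectors paired, so each record receives the multiplier that belongs to its own code.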

Code to compute damage amounts using multipliers from PROPDMGEXP & CROPDMGEXP

## Look up each row's multiplier by matching its code against `code`;
## codes not in the table get a multiplier of 0.
## Note: ifelse(x %in% code, multp, 0) alone would recycle multp by row
## position instead of looking up the row's own code.
df <- mutate(df, PROPDMGx = ifelse(PROPDMGEXP %in% code,
                                   multp[match(PROPDMGEXP, code)], 0))
df <- mutate(df, CROPDMGx = ifelse(CROPDMGEXP %in% code,
                                   multp[match(CROPDMGEXP, code)], 0))
## Compute dollar amounts of damage using the multipliers
df$PROPDMGEXP.0 <- df$PROPDMG * df$PROPDMGx
df$CROPDMGEXP.0 <- df$CROPDMG * df$CROPDMGx

To quantify the impact of storms for the broad groups of ‘Impact on Human Health’ and ‘Economic Impact’, a summary field was created for each type of impact:
- HARM = FATALITIES + INJURIES
- EXPENSE = PROPDMGEXP.0 + CROPDMGEXP.0

Code to create summary fields

## Create HARM field that is the sum of FATALITIES & INJURIES
df <- mutate(df, HARM = FATALITIES + INJURIES)
## Create EXPENSE field that is the sum of PROPDMGEXP.0 & CROPDMGEXP.0
df <- mutate(df, EXPENSE = PROPDMGEXP.0 + CROPDMGEXP.0)
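
As a worked illustration of the two summary fields, the sketch below applies the same arithmetic to two toy rows. All values are invented for illustration, and the multiplier columns are filled in by hand rather than derived from exponent codes.

```r
## Toy rows (values invented for illustration, not from the NOAA data)
toy <- data.frame(
  FATALITIES = c(2, 0),
  INJURIES   = c(10, 1),
  PROPDMG    = c(25, 1.5),
  PROPDMGx   = c(1000, 1e6),  # multipliers corresponding to "K" and "M"
  CROPDMG    = c(0, 200),
  CROPDMGx   = c(0, 1000)     # no recognized code -> 0, "K" -> 1000
)

## Dollar amount = reported amount * multiplier
toy$PROPDMGEXP.0 <- toy$PROPDMG * toy$PROPDMGx
toy$CROPDMGEXP.0 <- toy$CROPDMG * toy$CROPDMGx

## Summary fields, as defined in the report
toy$HARM    <- toy$FATALITIES + toy$INJURIES
toy$EXPENSE <- toy$PROPDMGEXP.0 + toy$CROPDMGEXP.0

print(toy[, c("HARM", "EXPENSE")])
```

The first row yields HARM = 12 and EXPENSE = 25,000; the second yields HARM = 1 and EXPENSE = 1,700,000.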

Results

With the data processing completed, the impact of the weather events was analyzed using the new summary fields HARM and EXPENSE.

First, the totals were run by EVTYPE and summarized:

Code for summarizing damage totals by type of weather event

## --------- Analyze HARM -----------
totHarmByEVT<-aggregate(df$HARM, list(df$EVTYPE), sum) # summarize harm by type
names(totHarmByEVT) <- c("EventType", "TotHarm") # name columns
totHarmByEVT<-totHarmByEVT[order(-totHarmByEVT$TotHarm),] # sort descending
## subset results
topTotHarm <- subset(totHarmByEVT[1:5,]) # take the top five event types by harm

## --------- Analyze EXPENSE -----------
totExpByEVT<-aggregate(df$EXPENSE, list(df$EVTYPE), sum)
names(totExpByEVT) <- c("EventType", "TotExpense")
totExpByEVT<-totExpByEVT[order(-totExpByEVT$TotExpense),]
## subset to top five events by expense
topTotExp <- subset(totExpByEVT[1:5,])
## convert expense amounts to Billions
topTotExp$TotExpense<-topTotExp[,2]/1000000000
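
The summarize, sort, and subset pattern used above can be sketched on a small toy table. The records and the cutoff n = 3 are assumptions made for illustration; the report itself keeps the top five event types.

```r
## Toy event records (invented for illustration)
toy <- data.frame(
  EVTYPE  = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  EXPENSE = c(2e9, 5e8, 1e9, 3e8, 1.5e9, 0)
)

## Sum EXPENSE within each event type
tot <- aggregate(EXPENSE ~ EVTYPE, data = toy, FUN = sum)

## Sort descending and keep the top n event types
n <- 3
top <- head(tot[order(-tot$EXPENSE), ], n)

## Express the totals in billions of dollars
top$EXPENSE <- top$EXPENSE / 1e9
print(top)
```

Here head() stands in for the row-index subset used in the report. Both keep the first n rows of the sorted table.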

Next, the damage impacts were plotted for each domain of ‘Impact to Human Health’ (HARM) and ‘Economic Impact’ (EXPENSE)

Code for plotting weather event impact

library(ggplot2)
library(scales)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

## Harm
h <- ggplot(topTotHarm, aes(EventType))
h <- h + geom_bar(aes(weight = TotHarm, fill = EventType)) +
        scale_fill_brewer(palette = "Set1") + ## set colors
        scale_y_continuous(labels = comma) + ## format the y axis
        ggtitle("Population Health Impact by Event Type") + ## add a title
        theme(legend.position = "none") + ## turn off legend
        labs(x = "Event Type", y = "Fatalities + Injuries") ## set custom labels
## Expense
e <- ggplot(topTotExp, aes(EventType))
e <- e + geom_bar(aes(weight = TotExpense, fill = EventType)) +
        scale_fill_brewer(palette = "Set1") + ## set colors
        scale_y_continuous(labels = comma) + ## format the y axis
        ggtitle("Damage Expense Impact by Event Type") + ## add a title
        theme(legend.position = "none") + ## turn off legend
        ## set custom labels
        labs(x = "Event Type", y = "Property & Crop Expense ($Billions)")

Figure 1. Weather Event Impacts Top 5

require(gridExtra)
grid.arrange(h, e, ncol=2)

Summary

The analysis indicates that in the United States from 1950 to 2011, tornadoes were the most destructive type of weather event in terms of both human health and economic impact.

—–End of Report—–