Public Health and Economic Impact of Storms and Other Severe Weather Events

Synopsis

The purpose of this project is to evaluate the public health and economic impact of storms and other severe weather events. The hypothesis is that certain storm types have more severe public health and economic impacts than others. To investigate this hypothesis, data were obtained from the National Weather Service. The events in the database start in 1950 and end in November 2011. Data for the years 2001 to 2010 were used to focus the study on the most recent decade. For this analysis, a fatality and an injury are weighted equally when determining the total public health impact of an event. Property damage and crop damage are combined to determine the total economic impact of the event. Over the study period, tornadoes had the highest public health impact, measured as fatalities plus injuries, and floods had the highest economic impact of all event types.

Loading and processing the data

Download the Storm Data file for the project from the course website. The data are comma-delimited and compressed in bz2 format.

Prepare environment

=========================================================================================================

This chunk of code checks whether all the packages needed to run this code are installed,
installs any that are missing, and loads them so they are available

=========================================================================================================

setwd("./")  #keep the current working directory (a no-op; change the path to work elsewhere)

packages<-function(x){    #function to determine if needed packages are installed
  x<-as.character(match.call()[[2]])     #convert the unquoted package name into a string
  if (!require(x,character.only=TRUE)){  #try to load the package
    install.packages(pkgs=x,repos="https://cran.r-project.org")   #install it if loading failed
    require(x,character.only=TRUE)        #then load it
  }
}

packages(knitr)         #install and load knitr

## Loading required package: knitr

packages(dplyr)         #install and load dplyr

## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

packages(ggplot2)       #install and load ggplot2

## Loading required package: ggplot2

packages(gridExtra)     #install and load gridExtra

## Loading required package: gridExtra

packages(grid)          #install and load grid

## Loading required package: grid

packages(mosaic)        #install and load mosaic

## Loading required package: mosaic
## Loading required package: lattice
## Loading required package: car
## Loading required package: mosaicData
## 
## Attaching package: 'mosaic'
## 
## The following object is masked from 'package:car':
## 
##     logit
## 
## The following objects are masked from 'package:dplyr':
## 
##     count, do, tally
## 
## The following objects are masked from 'package:stats':
## 
##     binom.test, cor, cov, D, fivenum, IQR, median, prop.test,
##     quantile, sd, t.test, var
## 
## The following objects are masked from 'package:base':
## 
##     max, mean, min, prod, range, sample, sum

packages(RGraphics)     #install and load RGraphics

## Loading required package: RGraphics

packages(downloader)    #install and load downloader

## Loading required package: downloader

packages(lubridate)     #install and load lubridate

## Loading required package: lubridate
## 
## Attaching package: 'lubridate'
## 
## The following object is masked from 'package:mosaic':
## 
##     interval

remove(packages)        #function no longer needed
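The `packages()` helper above uses `match.call()` to turn a bare name into a string, so it handles only one package per call. Below is a minimal sketch of an alternative that accepts a character vector. It assumes only the standard `requireNamespace()`, `install.packages()` and `library()` functions. `ensure_packages` is a hypothetical name, not part of any package.

```r
# Hypothetical helper: install any missing packages, then attach all of them.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (without attaching) when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  # attach each package by its name given as a string
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage with the packages loaded above:
# ensure_packages(c("knitr", "dplyr", "ggplot2", "gridExtra", "lubridate"))
```

Passing all the names at once means missing packages are installed in a single `install.packages()` call.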

=========================================================================================================

Download the data if needed

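fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("stormdata.csv.bz2")) {                          #download only if the file is not already present
  download(fileurl, destfile = "stormdata.csv.bz2", mode = "wb")   #downloader::download() passes these arguments to download.file()
}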

Read the data and review the first few rows.

filenam <- "./stormdata.csv.bz2"        
stormdata <- read.csv(filenam, header = TRUE, sep = ",", na.strings = "NA") 

dim(stormdata)

## [1] 902297     37

head(stormdata[, 1:10])

##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI
## 1 TORNADO         0        
## 2 TORNADO         0        
## 3 TORNADO         0        
## 4 TORNADO         0        
## 5 TORNADO         0        
## 6 TORNADO         0

The data set contains 902,297 rows and 37 columns. The output above shows the first 10 columns of the first six rows.
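`read.csv()` can read the `.bz2` file without decompressing it first. When R opens a file for reading, its `file()` connection detects bzip2 compression automatically. The self-contained check below demonstrates this with a temporary file holding a few made-up rows, not the real Storm Data.

```r
# Write a tiny CSV through a bzip2-compressed connection (hypothetical rows)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "HAIL,0,0"), con)
close(con)

# read.csv() opens the path with file(), which recognises the bzip2 format
demo <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
print(dim(demo))  # [1] 2 3
```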

========================================================================================================

A review of the data set indicates that the fields of interest include:

BGN_DATE = The start date of the weather event
COUNTYNAME = The county the event occurred in
STATE = The state of the event
EVTYPE = The type of weather event
FATALITIES = Number of fatalities caused by the event
INJURIES = Number of injuries caused by the event
PROPDMG = The estimated amount of property damage caused by the event
PROPDMGEXP = The multiplier code for PROPDMG: K = $1,000, M = $1,000,000
CROPDMG = The estimated amount of crop damage caused by the event
CROPDMGEXP = The multiplier code for CROPDMG: K = $1,000, M = $1,000,000
REMARKS = A description of the event

======================================================================================================

This section of code stages the data set for analysis by:

Adding a study year integer to the data set
Filtering the data to the years 2001 to 2010
Calculating the total public health impact (fatalities + injuries)
Calculating the total economic impact (property + crop damage, in dollars)

#extract the year from the begin date (e.g. "4/18/1950 0:00:00")
stormdata <- mutate(stormdata, STUDY_YEAR  = year(as.Date(BGN_DATE, format = "%m/%d/%Y %H:%M:%S")))
#keep events from 2001 through 2010
studydata <- filter(stormdata, STUDY_YEAR > 2000 & STUDY_YEAR < 2011)
remove(stormdata)   #free memory; the full data set is no longer needed
#calculate total public health impact = fatalities + injuries
studydata <- mutate(studydata, public_health_impact = FATALITIES + INJURIES) 
#convert property damage into whole dollars; exponent codes other than K and M (e.g. B for billions) are counted as 0
studydata <- mutate(studydata, property_damage = ifelse(PROPDMGEXP == "K" , PROPDMG*1000,
                            ifelse(PROPDMGEXP == "M", PROPDMG*1000000, 0)))
#convert crop damage into whole dollars; exponent codes other than K and M are counted as 0
studydata <- mutate(studydata, crop_damage = ifelse(CROPDMGEXP == "K" , CROPDMG*1000,
                            ifelse(CROPDMGEXP == "M", CROPDMG*1000000, 0)))
#combine property and crop damage for total economic impact
studydata <- mutate(studydata, totaldamage = property_damage + crop_damage)
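The staging steps can also be expressed in base R. The sketch below applies them to a few made-up rows that mimic the Storm Data columns. It parses the year from `BGN_DATE`, keeps 2001 to 2010, and converts the damage exponents with a lookup table. Unlike the code above, the hypothetical `exp_to_multiplier` table also maps `B` to billions, so totals computed this way could differ from the results reported here.

```r
# Hypothetical rows in the same layout as the Storm Data columns used above
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "5/3/2005 0:00:00", "8/29/2005 0:00:00"),
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(1, 0, 2),
  INJURIES   = c(14, 3, 0),
  PROPDMG    = c(25, 1.5, 2),
  PROPDMGEXP = c("K", "M", "B"),
  CROPDMG    = c(0, 10, 0),
  CROPDMGEXP = c("", "K", ""),
  stringsAsFactors = FALSE
)

# Year of each event; the time part must match the format but is not used
toy$STUDY_YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
toy <- subset(toy, STUDY_YEAR > 2000 & STUDY_YEAR < 2011)

# Hypothetical lookup table; unlisted codes (e.g. "") become 0 dollars,
# mirroring the ifelse() fallback in the analysis code
exp_to_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
to_dollars <- function(amount, code) {
  mult <- exp_to_multiplier[toupper(code)]
  ifelse(is.na(mult), 0, amount * mult)
}

toy$public_health_impact <- toy$FATALITIES + toy$INJURIES
toy$totaldamage <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP) +
                   to_dollars(toy$CROPDMG, toy$CROPDMGEXP)
print(toy[, c("EVTYPE", "STUDY_YEAR", "public_health_impact", "totaldamage")])
```

The 1950 tornado row is dropped by the year filter. A named lookup vector is easy to extend if more exponent codes need to be handled.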

========================================================================================================

The next step is to summarize the data in two ways:

Total economic impact by event type
Total public health impact by event type

total_impact  <- studydata %>%
    group_by(EVTYPE) %>% 
    summarize(public_health_impact = sum(public_health_impact), total_damage = sum(totaldamage))

dim(total_impact)

## [1] 169   3

There are 169 event types in the study data. Many of them have no recorded economic damage or no recorded public health impact. The next step keeps only the event types whose cumulative public health impact and cumulative economic damage are both nonzero.

total_impact <- filter(total_impact, public_health_impact != 0 & total_damage !=0)
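The summarize-and-filter step can be sketched in base R with `aggregate()` and `subset()`; the rows below are hypothetical. The sketch also shows the effect of the `&` condition used above. It keeps only event types with both totals nonzero, so an event type with injuries but no recorded damage (or the reverse) is dropped. With `|`, only event types where both totals are zero would be dropped.

```r
# Hypothetical per-event rows after the staging step
events <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HAIL", "FOG", "HEAT"),
  public_health_impact = c(10, 5, 0, 0, 4),
  totaldamage = c(2e6, 5e5, 1e4, 0, 0),
  stringsAsFactors = FALSE
)

# One row per event type with both measures summed (like group_by() + summarize())
totals <- aggregate(cbind(public_health_impact, totaldamage) ~ EVTYPE,
                    data = events, FUN = sum)

# Rule used in the analysis: keep event types where BOTH totals are nonzero
both_nonzero <- subset(totals, public_health_impact != 0 & totaldamage != 0)
# Alternative: drop only event types where BOTH totals are zero
any_nonzero <- subset(totals, public_health_impact != 0 | totaldamage != 0)

print(both_nonzero$EVTYPE)  # "TORNADO"
print(any_nonzero$EVTYPE)   # "HAIL" "HEAT" "TORNADO"
```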

===============================================================================================================================

Results

Economic Impact Analysis

This section of code arranges the data in order of economic impact, plots the data, and reports summary statistics for the economic impact of each event type.

total_economic_impact <- arrange(total_impact, desc(total_damage))   #sort event types by total damage, largest first
#order the event-type factor by decreasing damage so the bars appear in rank order
total_economic_impact$evtypeorder <- reorder(total_economic_impact$EVTYPE, desc(total_economic_impact$total_damage))  
g1<- ggplot(total_economic_impact, aes(y=total_damage, x=evtypeorder)) 
#bar heights are the summed values, so use stat="identity" (no binning)
graph1<- g1 + geom_bar(stat="identity", fill=NA, color="blue") + theme_bw() + ylab("Damage (USD)") + xlab("Event Type")
graph2<- graph1 + theme(axis.text.x = element_text(angle = 90, hjust = 1))   #rotate event-type labels
grid.arrange(graph2, nrow=1, widths=c(960), heights=c(720))

head (total_economic_impact[1:10,])

## Source: local data frame [6 x 4]
## 
##              EVTYPE public_health_impact total_damage       evtypeorder
##              (fctr)                (dbl)        (dbl)            (fctr)
## 1             FLOOD                  501  13211177480             FLOOD
## 2              HAIL                  459  10869379610              HAIL
## 3       FLASH FLOOD                 1255   9706818510       FLASH FLOOD
## 4           TORNADO                 8733   9414048270           TORNADO
## 5           DROUGHT                    4   6512055000           DROUGHT
## 6 HURRICANE/TYPHOON                 1339   4903712800 HURRICANE/TYPHOON

summary(total_economic_impact$total_damage)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 1.000e+03 1.919e+06 4.686e+07 1.409e+09 8.292e+08 1.321e+10

The plot above and the summary statistics show that floods had the highest economic impact of all event types, with over $13B in damage during the study period. The top three event types (flood, hail and flash flood) are all related to precipitation or water. 24 event types had less than $1M in damage over the 10-year period.
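The count of low-impact event types is a one-line computation. For the study data, `sum(total_economic_impact$total_damage < 1e6)` would return it. The sketch below shows the idea on hypothetical totals.

```r
# Hypothetical total damage per event type, in dollars
total_damage <- c(FLOOD = 1.3e10, HAIL = 1.1e10, FOG = 5e5, DUST = 2e4, SURF = 9.9e5)

# TRUE/FALSE per event type; sum() counts the TRUE values
below_1m <- total_damage < 1e6
print(sum(below_1m))                   # 3
print(names(total_damage)[below_1m])   # "FOG" "DUST" "SURF"
```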

==============================================================================================================================

Public Health Impact Analysis

This section of code arranges the data in order of public health impact, plots the data, and reports summary statistics for the public health impact of each event type.

total_public_health_impact <- arrange(total_impact, desc(public_health_impact))   #sort by fatalities + injuries, largest first
#order the event-type factor by decreasing public health impact so the bars appear in rank order
total_public_health_impact$phorder <- reorder(total_public_health_impact$EVTYPE, desc(total_public_health_impact$public_health_impact))  
g2<- ggplot(total_public_health_impact, aes(y=public_health_impact, x=phorder)) 
#bar heights are the summed values, so use stat="identity" (no binning)
graph3<- g2 + geom_bar(stat="identity", fill=NA, color="blue") + theme_bw() + ylab("Public Health Impact (fatalities + injuries)") + xlab("Event Type")
graph4<- graph3 + theme(axis.text.x = element_text(angle = 90, hjust = 1))   #rotate event-type labels
grid.arrange(graph4, nrow=1, widths=c(960), heights=c(720))

head (total_public_health_impact[1:10,])

## Source: local data frame [6 x 4]
## 
##              EVTYPE public_health_impact total_damage           phorder
##              (fctr)                (dbl)        (dbl)            (fctr)
## 1           TORNADO                 8733   9414048270           TORNADO
## 2    EXCESSIVE HEAT                 3924    495662000    EXCESSIVE HEAT
## 3         LIGHTNING                 2816    514112640         LIGHTNING
## 4         TSTM WIND                 1570   2122507560         TSTM WIND
## 5 HURRICANE/TYPHOON                 1339   4903712800 HURRICANE/TYPHOON
## 6       FLASH FLOOD                 1255   9706818510       FLASH FLOOD

summary(total_public_health_impact$public_health_impact)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    19.0   137.0   509.2   374.0  8733.0

The plot above and the summary statistics show that tornadoes had the highest public health impact of all event types, with 8,733 fatalities and injuries reported during the study period. The public health impact of tornadoes is more than double that of the next event type, excessive heat, with 3,924 fatalities and injuries.
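The "more than double" comparison can be checked directly from the two totals printed in the table above.

```r
# Totals from the public health table (fatalities + injuries, 2001 to 2010)
tornado <- 8733
excessive_heat <- 3924

ratio <- tornado / excessive_heat
print(round(ratio, 2))  # 2.23
```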

================================================================================================================================================

Final Analysis

A comparison of the top economic impact events with their corresponding public health impact indicates that there is no tight correlation between high economic impact and high public health impact. The charts below show that the event type with the highest public health impact (tornado) ranks 4th on the economic impact list. The event type ranked most similarly on both measures is Hurricane/Typhoon, which ranks 6th in economic impact and 5th in public health impact.

correlationimpact<- total_economic_impact[1:20,] 
head(correlationimpact[1:10,1:3])

## Source: local data frame [6 x 3]
## 
##              EVTYPE public_health_impact total_damage
##              (fctr)                (dbl)        (dbl)
## 1             FLOOD                  501  13211177480
## 2              HAIL                  459  10869379610
## 3       FLASH FLOOD                 1255   9706818510
## 4           TORNADO                 8733   9414048270
## 5           DROUGHT                    4   6512055000
## 6 HURRICANE/TYPHOON                 1339   4903712800

#create total economic impact graph for the top 20 event types by damage
correlationimpact$evtypeorder <- reorder(correlationimpact$EVTYPE, desc(correlationimpact$total_damage))  
g3<- ggplot(correlationimpact, aes(y=total_damage, x=evtypeorder)) 
graph5<- g3 + geom_bar(stat="identity", fill=NA, color="blue") + theme_bw() + ylab("Damage (USD)") + xlab("Event Type")
graph6<- graph5 + theme(axis.text.x = element_text(angle = 90, hjust = 1)) 
#create public health impact graph for the same 20 event types, ordered by public health impact
correlationimpact$phorder <- reorder(correlationimpact$EVTYPE, desc(correlationimpact$public_health_impact))  
g4<- ggplot(correlationimpact, aes(y=public_health_impact, x=phorder)) 
graph7<- g4 + geom_bar(stat="identity", fill=NA, color="blue") + theme_bw() + ylab("Public Health Impact") + xlab("Event Type")
graph8<- graph7 + theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

grid.arrange(graph6, graph8, ncol=1, nrow=2, widths=c(960), heights=c(720,720) )   #combine plot 6,8
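The rankings and the "not a tight correlation" observation can be made numeric with `rank()` and a Spearman rank correlation from base R's `cor()`. The sketch below uses only the nine event types printed in the two tables above. Because the top six of each list are all among these rows, their ranks 1 to 6 match the full tables. These rows were selected for being at the top of one list or the other, however, so the correlation computed on them is biased. It should not be read as the correlation across all event types. On the full filtered table, the equivalent call would be `cor(total_impact$public_health_impact, total_impact$total_damage, method = "spearman")`.

```r
# Rows printed in the two tables above (top 6 by damage, top 6 by public health)
impact <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "FLASH FLOOD", "TORNADO", "DROUGHT",
             "HURRICANE/TYPHOON", "EXCESSIVE HEAT", "LIGHTNING", "TSTM WIND"),
  public_health_impact = c(501, 459, 1255, 8733, 4, 1339, 3924, 2816, 1570),
  total_damage = c(13211177480, 10869379610, 9706818510, 9414048270, 6512055000,
                   4903712800, 495662000, 514112640, 2122507560),
  stringsAsFactors = FALSE
)

# Rank 1 = largest value
impact$economic_rank <- rank(-impact$total_damage)
impact$health_rank   <- rank(-impact$public_health_impact)
print(impact[, c("EVTYPE", "economic_rank", "health_rank")])

# Spearman rank correlation between the two measures, on these nine rows only
rho <- cor(impact$public_health_impact, impact$total_damage, method = "spearman")
print(round(rho, 3))  # -0.583
```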