The purpose of this project is to evaluate the public health and economic impact of storms and other severe weather events. The hypothesis is that certain storm types have more severe public health and economic impacts than others. To investigate this hypothesis, data was obtained from the National Weather Service. The events in the database start in the year 1950 and end in November 2011. Data for the years 2001 through 2010 were used to focus the study on the last decade. For purposes of this analysis, a fatality and an injury are weighted equally when determining the total public health impact of an event. In addition, property damage and crop damage were combined to determine the total economic impact of the event. From this data, it was determined that tornadoes had the highest public health impact, as measured by the combination of fatalities and injuries, and that floods had the highest economic impact across all events over the study period.
Download the Storm Data file for the project from the course website. The file is comma-delimited and compressed in bz2 format.
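There is no need to decompress the file by hand. `read.csv()` opens files through `file()`, which detects gzip, bzip2 and xz compression when reading. The minimal sketch below checks this on a small file written to a temporary location; the column names mimic the storm data, but the rows are hypothetical toy values.

```r
# Write a tiny comma-delimited file through a bzip2 connection (hypothetical toy rows)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,5",
             "HAIL,0,2"), con)
close(con)

# read.csv() decompresses transparently; no manual unzip step is needed
toy <- read.csv(tmp, header = TRUE)
print(dim(toy))  # rows and columns of the toy file
unlink(tmp)      # remove the temporary file; 'toy' stays in memory
```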
=========================================================================================================
This chunk of code checks whether all the packages needed to run the analysis are installed, installs any that are missing,
and loads the libraries so that they are available.
=========================================================================================================
setwd("./") #keep the current working directory; the data file is expected to be here
packages <- function(x){ #function to determine if needed packages are installed
  x <- as.character(match.call()[[2]]) #accept an unquoted package name
  if (!require(x, character.only = TRUE)){
    install.packages(pkgs = x, repos = "https://cran.r-project.org")
    require(x, character.only = TRUE)
  }
}
packages(knitr) #install and load knitr
## Loading required package: knitr
packages(dplyr) #install and load dplyr
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
packages(ggplot2) #install and load ggplot2
## Loading required package: ggplot2
packages(gridExtra) #install and load gridExtra
## Loading required package: gridExtra
packages(grid) #install and load grid
## Loading required package: grid
packages(mosaic) #install and load mosaic
## Loading required package: mosaic
## Loading required package: lattice
## Loading required package: car
## Loading required package: mosaicData
##
## Attaching package: 'mosaic'
##
## The following object is masked from 'package:car':
##
## logit
##
## The following objects are masked from 'package:dplyr':
##
## count, do, tally
##
## The following objects are masked from 'package:stats':
##
## binom.test, cor, cov, D, fivenum, IQR, median, prop.test,
## quantile, sd, t.test, var
##
## The following objects are masked from 'package:base':
##
## max, mean, min, prod, range, sample, sum
packages(RGraphics) #install and load RGraphics
## Loading required package: RGraphics
packages(downloader) #install and load downloader
## Loading required package: downloader
packages(lubridate) #install and load lubridate
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
##
## The following object is masked from 'package:mosaic':
##
## interval
remove(packages) #function no longer needed
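The `packages()` helper handles one package per call and relies on `match.call()` to accept an unquoted name. A vectorized alternative is sketched below, assuming a CRAN mirror is reachable whenever something is missing: it takes a character vector, installs only the packages that `requireNamespace()` cannot find, and then attaches everything with `require()`. The name `ensure_packages` and the mirror URL are hypothetical choices, not part of the original analysis.

```r
# Hypothetical helper: install missing packages, then attach all of them
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  # TRUE for each package that is already installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (any(!installed)) {
    install.packages(pkgs[!installed], repos = repos)
  }
  # Attach every package; returns a named logical vector invisibly
  invisible(vapply(pkgs, require, logical(1), character.only = TRUE))
}

# Base packages are always installed, so this call never touches the network
loaded <- ensure_packages(c("stats", "utils", "grid"))
print(loaded)
```

For this project the call would be `ensure_packages(c("knitr", "dplyr", "ggplot2", "gridExtra", "grid", "mosaic", "RGraphics", "downloader", "lubridate"))`.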
=========================================================================================================
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("stormdata.csv.bz2")) { #download only when no local copy exists
  download(fileurl, dest="stormdata.csv.bz2", mode="wb")
}
Read the data and review the first few rows of the file.
filenam <- "./stormdata.csv.bz2"
stormdata <- read.csv(filenam, header = TRUE, sep = ",", na.strings = "NA")
dim(stormdata)
## [1] 902297 37
head(stormdata[, 1:10])
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI
## 1 TORNADO 0
## 2 TORNADO 0
## 3 TORNADO 0
## 4 TORNADO 0
## 5 TORNADO 0
## 6 TORNADO 0
After reading the data, we check the first few rows. The data set contains 902,297 rows and 37 columns.
========================================================================================================
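A review of the data set indicates that the fields of interest are BGN_DATE (event begin date), EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage amount and magnitude code), and CROPDMG and CROPDMGEXP (crop damage amount and magnitude code).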
======================================================================================================
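This section of code stages the data set for analysis by extracting the year from BGN_DATE, keeping the events that began in 2001 through 2010, computing the public health impact (fatalities plus injuries), converting property and crop damage into whole dollars, and combining the two into a total damage figure.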
stormdata <- mutate(stormdata, STUDY_YEAR = year(as.Date(BGN_DATE, format = "%m/%d/%Y %H:%M:%S"))) #year the event began
studydata <- filter(stormdata, STUDY_YEAR > 2000 & STUDY_YEAR < 2011) #keep events from 2001 through 2010
remove(stormdata) #free the memory used by the full data set
#calculate total public health impact = fatalities + injuries
studydata <- mutate(studydata, public_health_impact = FATALITIES + INJURIES)
#convert property damage into whole dollars
#only the "K" (thousands) and "M" (millions) codes are converted; any other code, including "B" (billions), yields 0
studydata <- mutate(studydata, property_damage = ifelse(PROPDMGEXP == "K", PROPDMG*1000,
                                                 ifelse(PROPDMGEXP == "M", PROPDMG*1000000, 0)))
#convert crop damage into whole dollars (same "K"/"M" rule as above)
studydata <- mutate(studydata, crop_damage = ifelse(CROPDMGEXP == "K", CROPDMG*1000,
                                             ifelse(CROPDMGEXP == "M", CROPDMG*1000000, 0)))
#combine property and crop damage for total impact
studydata <- mutate(studydata, totaldamage = property_damage + crop_damage)
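The conversion above recognizes only the upper-case `K` and `M` codes, so amounts recorded with any other code, such as `B` (billions) or a lower-case variant, contribute $0. A minimal sketch of a lookup-table alternative follows. The multiplier table, including `H` for hundreds, is an assumption about how the codes should be read, and the helper `to_dollars()` is hypothetical; applying it would change the totals reported in the rest of this analysis.

```r
# Assumed mapping from magnitude code to multiplier (hypothetical; H = hundreds)
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert an amount plus magnitude code into whole dollars.
# Codes are upper-cased first; codes missing from the table (blank, "?", digits) give 0.
to_dollars <- function(amount, exp_code) {
  code <- toupper(trimws(as.character(exp_code)))
  mult <- unname(multiplier[code])  # NA where the code is not in the table
  ifelse(is.na(mult), 0, amount * mult)
}

print(to_dollars(c(2.5, 1, 3, 4), c("K", "b", "", "m")))  # 2500, 1e9, 0 and 4e6 dollars
```

Inside the staging step it could replace the nested `ifelse()` calls, for example `mutate(studydata, property_damage = to_dollars(PROPDMG, PROPDMGEXP))`.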
========================================================================================================
The next step is to summarize the data by event type (EVTYPE) in two ways: total public health impact and total economic damage.
total_impact <- studydata %>%
group_by(EVTYPE) %>%
summarize(public_health_impact = sum(public_health_impact), total_damage = sum(totaldamage))
dim(total_impact)
## [1] 169 3
There are 169 types of events reported in the study data. A number of these event types have no economic or public health impact and can be removed from the analysis. The next step keeps only the event types with a nonzero cumulative public health impact and a nonzero cumulative economic impact.
total_impact <- filter(total_impact, public_health_impact != 0 & total_damage !=0)
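Because the filter joins its two conditions with `&`, an event type with injuries but no recorded damage (or the reverse) is also dropped. The base-R sketch below, on hypothetical toy rows, contrasts that condition with an `|` condition, which would drop only the types that are zero on both measures. It uses `aggregate()` in place of the dplyr `group_by()`/`summarize()` pipeline.

```r
# Hypothetical event-level rows (not from the storm data)
events <- data.frame(
  EVTYPE = c("A", "A", "B", "C", "D"),
  public_health_impact = c(4, 6, 0, 3, 0),
  totaldamage = c(2e6, 3e6, 2e4, 0, 0),
  stringsAsFactors = FALSE
)

# Sum both measures by event type (same grouping as group_by() + summarize())
totals <- aggregate(cbind(public_health_impact, totaldamage) ~ EVTYPE,
                    data = events, FUN = sum)

# Nonzero on BOTH measures: the condition used in the filter above
both <- subset(totals, public_health_impact != 0 & totaldamage != 0)
# Nonzero on EITHER measure: drops only the all-zero types
either <- subset(totals, public_health_impact != 0 | totaldamage != 0)

print(both$EVTYPE)    # keeps only A
print(either$EVTYPE)  # keeps A, B and C
```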
===============================================================================================================================
This section of code arranges the data in order of economic impact, plots the chart of the data, and reports the summary statistics of the economic impacts of events.
total_economic_impact <- arrange(total_impact, desc(total_damage))
total_economic_impact$evtypeorder <- reorder(total_economic_impact$EVTYPE, desc(total_economic_impact$total_damage)) #order bars from highest to lowest damage
g1 <- ggplot(total_economic_impact, aes(y = total_damage, x = evtypeorder))
graph1 <- g1 + geom_col(fill = NA, color = "blue") + theme_bw() + ylab("Total damage (USD)") + xlab("Event Type") #bar height = summed damage
graph2 <- graph1 + theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(graph2, nrow = 1)
head(total_economic_impact)
## Source: local data frame [6 x 4]
##
## EVTYPE public_health_impact total_damage evtypeorder
## (fctr) (dbl) (dbl) (fctr)
## 1 FLOOD 501 13211177480 FLOOD
## 2 HAIL 459 10869379610 HAIL
## 3 FLASH FLOOD 1255 9706818510 FLASH FLOOD
## 4 TORNADO 8733 9414048270 TORNADO
## 5 DROUGHT 4 6512055000 DROUGHT
## 6 HURRICANE/TYPHOON 1339 4903712800 HURRICANE/TYPHOON
summary(total_economic_impact$total_damage)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000e+03 1.919e+06 4.686e+07 1.409e+09 8.292e+08 1.321e+10
From the plot above and the summary statistics, it is clear that floods had the highest economic impact of all events, with over $13B in damage during the study period. The top three event types (flood, hail, and flash flood) are related to rain or water. Twenty-four event types have less than $1M in total damage over the 10-year period.
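The dollar figures quoted here can be checked directly from the summarized table. The minimal sketch below uses the six totals printed above, copied by hand, so it covers only those rows: it expresses the totals in billions of dollars and defines a hypothetical helper, `count_below()`, that counts event types under a dollar threshold.

```r
# Total damage (USD) for the six event types printed above
damage <- c(FLOOD = 13211177480, HAIL = 10869379610, `FLASH FLOOD` = 9706818510,
            TORNADO = 9414048270, DROUGHT = 6512055000, `HURRICANE/TYPHOON` = 4903712800)

# Express totals in billions of dollars, rounded to one decimal place
billions <- round(damage / 1e9, 1)
print(billions)

# Hypothetical helper: count event types whose total damage is below a threshold
count_below <- function(x, threshold = 1e6) sum(x < threshold)
print(count_below(damage))  # none of these six rows is under $1M
```

Applied to the full table, `count_below(total_economic_impact$total_damage)` would return the number of event types with less than $1M in total damage.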
==============================================================================================================================
This section of code arranges the data in public health impact order, plots the chart of the data and reports on the statistics of the public health impacts of events
total_public_health_impact <- arrange(total_impact, desc(public_health_impact))
total_public_health_impact$phorder <- reorder(total_public_health_impact$EVTYPE, desc(total_public_health_impact$public_health_impact)) #order bars from highest to lowest impact
g2 <- ggplot(total_public_health_impact, aes(y = public_health_impact, x = phorder))
graph3 <- g2 + geom_col(fill = NA, color = "blue") + theme_bw() + ylab("Public Health Impact") + xlab("Event Type") #bar height = fatalities + injuries
graph4 <- graph3 + theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(graph4, nrow = 1)
head(total_public_health_impact)
## Source: local data frame [6 x 4]
##
## EVTYPE public_health_impact total_damage phorder
## (fctr) (dbl) (dbl) (fctr)
## 1 TORNADO 8733 9414048270 TORNADO
## 2 EXCESSIVE HEAT 3924 495662000 EXCESSIVE HEAT
## 3 LIGHTNING 2816 514112640 LIGHTNING
## 4 TSTM WIND 1570 2122507560 TSTM WIND
## 5 HURRICANE/TYPHOON 1339 4903712800 HURRICANE/TYPHOON
## 6 FLASH FLOOD 1255 9706818510 FLASH FLOOD
summary(total_public_health_impact$public_health_impact)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 19.0 137.0 509.2 374.0 8733.0
From the plot above and the summary statistics, it is clear that tornadoes had the highest public health impact of all events, with 8,733 fatalities/injuries reported during the study period. The public health impact of tornadoes is more than double that of the next event type,
excessive heat, with 3,924 fatalities/injuries.
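The "more than double" comparison is the ratio of the two largest totals. The sketch below uses the six public health totals printed above and sorts them first, so the result does not depend on the table already being in order.

```r
# Public health totals (fatalities + injuries) copied from the table above
ph <- c(TORNADO = 8733, `EXCESSIVE HEAT` = 3924, LIGHTNING = 2816,
        `TSTM WIND` = 1570, `HURRICANE/TYPHOON` = 1339, `FLASH FLOOD` = 1255)

# Take the two largest totals and compare them
top2 <- sort(ph, decreasing = TRUE)[1:2]
ratio <- unname(top2[1] / top2[2])
cat(names(top2)[1], "vs", names(top2)[2], ":", round(ratio, 2), "\n")
```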
================================================================================================================================================
A review of the top economic impact events compared to their corresponding public health impact indicates that there is not a tight correlation between high economic impact events and high public health impact events. The charts below show that the highest public health impact event (tornado) ranks #4 on the economic impact list. The event type with the closest ranks on both measures is Hurricane/Typhoon, ranking 6th in economic impact and 5th in public health impact.
correlationimpact<- total_economic_impact[1:20,]
head(correlationimpact[, 1:3])
## Source: local data frame [6 x 3]
##
## EVTYPE public_health_impact total_damage
## (fctr) (dbl) (dbl)
## 1 FLOOD 501 13211177480
## 2 HAIL 459 10869379610
## 3 FLASH FLOOD 1255 9706818510
## 4 TORNADO 8733 9414048270
## 5 DROUGHT 4 6512055000
## 6 HURRICANE/TYPHOON 1339 4903712800
#create total economic impact graph for the top 20 event types
correlationimpact$evtypeorder <- reorder(correlationimpact$EVTYPE, desc(correlationimpact$total_damage))
g3 <- ggplot(correlationimpact, aes(y = total_damage, x = evtypeorder))
graph5 <- g3 + geom_col(fill = NA, color = "blue") + theme_bw() + ylab("Total damage (USD)") + xlab("Event Type")
graph6 <- graph5 + theme(axis.text.x = element_text(angle = 90, hjust = 1))
#create total public health impact graph for the same 20 event types
correlationimpact$phorder <- reorder(correlationimpact$EVTYPE, desc(correlationimpact$public_health_impact))
g4 <- ggplot(correlationimpact, aes(y = public_health_impact, x = phorder))
graph7 <- g4 + geom_col(fill = NA, color = "blue") + theme_bw() + ylab("Public Health Impact") + xlab("Event Type")
graph8 <- graph7 + theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(graph6, graph8, ncol = 1) #stack the economic and public health plots vertically
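The observation that the two measures are not tightly correlated can be put in numbers with a rank correlation. The sketch below ranks each event type on both measures and computes Spearman's rho, using only the six rows printed above; the value for the top-20 table, or for all event types, would differ. It calls `stats::cor()` explicitly because loading mosaic masks `cor`.

```r
# Six rows copied from the economic-impact table above
impact <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "FLASH FLOOD", "TORNADO", "DROUGHT", "HURRICANE/TYPHOON"),
  public_health_impact = c(501, 459, 1255, 8733, 4, 1339),
  total_damage = c(13211177480, 10869379610, 9706818510, 9414048270, 6512055000, 4903712800),
  stringsAsFactors = FALSE
)

# Rank 1 = largest impact on each measure (ranks are within these six rows only)
impact$econ_rank <- rank(-impact$total_damage)
impact$ph_rank <- rank(-impact$public_health_impact)

# Spearman rank correlation between economic and public health impact
rho <- stats::cor(impact$total_damage, impact$public_health_impact, method = "spearman")
print(impact[, c("EVTYPE", "econ_rank", "ph_rank")])
print(round(rho, 2))
```

A rho near zero or negative, as here, supports the statement that high-damage events are not necessarily the ones that harm the most people. Running the same `rank()` calls on `total_impact` would give the overall rankings.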