In this study we assessed the health and economic impacts of weather events recorded from the 1950s until 2012. Health impact was measured by the reported injuries and fatalities for each event. Economic impact was measured by the estimated property damage (homes, vehicles, infrastructure, and buildings) and crop damage costs. To make the data more digestible, tables and plots include only the top 1% of weather event types by average or total impact (deaths/injuries, damages). Tornadoes had the largest health impact, causing the greatest total number of deaths and injuries. In terms of monetary damage, hurricanes and hurricane-like events cause the most damage on average, but flooding has caused the most damage in total.
Load the required packages.
library(ggplot2)
library(reshape2)
library(R.utils)
## Warning: package 'R.utils' was built under R version 3.1.1
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.18.0 (2014-02-22) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
##
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
##
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
##
## R.utils v1.33.0 (2014-08-24) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
##
## The following object is masked from 'package:utils':
##
## timestamp
##
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(plyr)
Download, decompress and read in data. Remove unneeded columns to free up memory.
download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "NOAA_storm.csv.bz2", method="curl")
bunzip2(filename = "NOAA_storm.csv.bz2", overwrite = TRUE)
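# stringsAsFactors = TRUE keeps EVTYPE a factor on R >= 4.0, which droplevels() below relies on
NOAA<-read.csv("NOAA_storm.csv", stringsAsFactors = TRUE)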
head(NOAA)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
StormNOAA<-NOAA[,c(2,7,8,23,24,25,26,27,28)]
rm(NOAA)
head(StormNOAA)
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1 4/18/1950 0:00:00 AL TORNADO 0 15 25.0 K
## 2 4/18/1950 0:00:00 AL TORNADO 0 0 2.5 K
## 3 2/20/1951 0:00:00 AL TORNADO 0 2 25.0 K
## 4 6/8/1951 0:00:00 AL TORNADO 0 2 2.5 K
## 5 11/15/1951 0:00:00 AL TORNADO 0 2 2.5 K
## 6 11/15/1951 0:00:00 AL TORNADO 0 6 2.5 K
## CROPDMG CROPDMGEXP
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
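As an aside, the download and read steps above can be made safe to re-run. The following is a minimal sketch under stated assumptions, not the method used for this report: it skips the download when the file is already present and relies on `read.csv` reading bzip2-compressed files directly, so a separate decompression step is not required. The helper name `read_storm` and the toy file are hypothetical and exist only for illustration.

```r
# Hypothetical helper: download only if the file is missing, then read the
# compressed CSV directly (R file connections detect bzip2 when reading).
read_storm <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile, mode = "wb")
  }
  read.csv(destfile, stringsAsFactors = TRUE)
}

# Self-contained demonstration on a tiny made-up file (not the NOAA data).
toy_path <- file.path(tempdir(), "toy_storm.csv.bz2")
con <- bzfile(toy_path, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "FLOOD,2"), con)
close(con)

# The url is never used here because the toy file already exists.
toy_storm <- read_storm(url = "not-used", destfile = toy_path)
str(toy_storm)
```

With the real data, `destfile` would be the `.csv.bz2` file itself, which also avoids keeping a large uncompressed copy on disk.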
Optionally, create a smaller dataset to save computation time (this is only to speed up testing). Set records to the number of entries to sample from the original dataset (it must be a positive number no larger than the number of rows).
records<-100000
StormNOAA<-StormNOAA[sample(nrow(StormNOAA), floor(abs(records))),]
list(StormNOAA$EVTYPE)
StormNOAA$EVTYPE<-droplevels(StormNOAA$EVTYPE)
str(StormNOAA)
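If the sampled subset needs to be the same on every run, the sampling step above can be seeded. The sketch below uses a made-up six-row table in place of StormNOAA. The seed value and the cap on the sample size are assumptions added for illustration and are not part of the original analysis.

```r
set.seed(2014)  # arbitrary seed, chosen only for illustration

# Toy stand-in for StormNOAA.
toy <- data.frame(
  EVTYPE = factor(c("TORNADO", "FLOOD", "HAIL", "TORNADO", "HEAT", "FLOOD")),
  FATALITIES = c(0, 1, 0, 3, 2, 0)
)

records <- 4
# Cap the request at the number of rows so sample() cannot fail.
n <- min(nrow(toy), floor(abs(records)))
toy_sample <- toy[sample(nrow(toy), n), ]

# Drop factor levels that no longer occur in the sample.
toy_sample$EVTYPE <- droplevels(toy_sample$EVTYPE)
nrow(toy_sample)
```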
Process and aggregate the data for question 1. Here I tabulate the mean, total, and maximum number of injuries and deaths for each weather event type. In an ideal world I would collapse similar events (e.g. “WINTER WEATHER/MIX”, “WINTERY MIX”, “Wintry mix”, “Wintry Mix”, “WINTRY MIX”).
health<-cbind(aggregate(x = StormNOAA[,4]+StormNOAA[,5], by = list(StormNOAA$EVTYPE), FUN = mean, data = StormNOAA), aggregate(StormNOAA[,4]+StormNOAA[,5] ~ StormNOAA$EVTYPE, FUN = sum,data = StormNOAA, na.rm=TRUE), aggregate(StormNOAA[,4]+StormNOAA[,5] ~ StormNOAA$EVTYPE, FUN = max ,data = StormNOAA), table(StormNOAA$EVTYPE))[,c(1,2,4,6,8)]
colnames(health)=c("EVTYPE","Average", "Total", "Maximum", "No. of Events")
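The collapsing of similar event names mentioned above was not done in this analysis. As a rough sketch of how it might start, the hypothetical helper `clean_evtype` below normalises case and whitespace and then unifies the wintry-mix spellings. The pattern list is illustrative only and far from a complete cleaning of EVTYPE.

```r
# Hypothetical helper: normalise case and whitespace, then unify a few
# spellings. Only the wintry-mix variants are handled here.
clean_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("\\s+", " ", x)                    # collapse repeated spaces
  x <- gsub("WINTERY|WINTRY", "WINTRY", x)     # unify the two spellings
  x <- sub("^WINTER WEATHER[ /]MIX$", "WINTRY MIX", x)
  factor(x)
}

variants <- c("WINTER WEATHER/MIX", "WINTERY MIX", "Wintry mix",
              "Wintry Mix", "WINTRY MIX")
levels(clean_evtype(variants))
```

Applied before aggregation, a step like this would merge the counts and totals of the variants into one row. Whether "WINTER WEATHER/MIX" really belongs with "WINTRY MIX" is a judgment call.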
First, the damage estimates need to be expanded. PROPDMGEXP is a magnitude code (H = hundreds, K = thousands, M = millions, B = billions) and PROPDMG is the amount to three significant digits. Consider an event like the one below:
StormNOAA[187584,]
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG
## 187584 10/4/1995 0:00:00 AL HURRICANE OPAL 0 0 20
## PROPDMGEXP CROPDMG CROPDMGEXP
## 187584 m 10 m
In this instance, the property damage is $20,000,000. The same convention holds for crop damage (CROPDMG and CROPDMGEXP). Some of the DMGEXP fields contain characters other than H, K, M, or B. These entries are ignored (their damage is set to zero) because I cannot reliably determine the meaning of these codes. One thought was that they give the number of trailing zeros for the *DMG field; however, these codes also show up in entries with no reported damage.
StormNOAA[StormNOAA$PROPDMGEXP=="m"|StormNOAA$PROPDMGEXP=="M",6]<-StormNOAA[StormNOAA$PROPDMGEXP=="m"|StormNOAA$PROPDMGEXP=="M",6]*1000000
StormNOAA[StormNOAA$PROPDMGEXP=="k"|StormNOAA$PROPDMGEXP=="K",6]<-StormNOAA[StormNOAA$PROPDMGEXP=="k"|StormNOAA$PROPDMGEXP=="K",6]*1000
StormNOAA[StormNOAA$PROPDMGEXP=="B",6]<-StormNOAA[StormNOAA$PROPDMGEXP=="B",6]*1000000000
StormNOAA[StormNOAA$PROPDMGEXP=="h"|StormNOAA$PROPDMGEXP=="H",6]<-StormNOAA[StormNOAA$PROPDMGEXP=="h"|StormNOAA$PROPDMGEXP=="H",6]*100
StormNOAA[!(StormNOAA$PROPDMGEXP %in% c("k","K","h","H","B","m","M")),6 ]<-0
StormNOAA[StormNOAA$CROPDMGEXP=="m"|StormNOAA$CROPDMGEXP=="M",8]<-StormNOAA[StormNOAA$CROPDMGEXP=="m"|StormNOAA$CROPDMGEXP=="M",8]*1000000
StormNOAA[StormNOAA$CROPDMGEXP=="k"|StormNOAA$CROPDMGEXP=="K",8]<-StormNOAA[StormNOAA$CROPDMGEXP=="k"|StormNOAA$CROPDMGEXP=="K",8]*1000
StormNOAA[StormNOAA$CROPDMGEXP=="B",8]<-StormNOAA[StormNOAA$CROPDMGEXP=="B",8]*1000000000
StormNOAA[!(StormNOAA$CROPDMGEXP %in% c("k","K","h","H","B","m","M")),8 ]<-0
econ<-cbind(aggregate(x = StormNOAA[,6]+StormNOAA[,8], by = list(StormNOAA$EVTYPE), FUN = mean, data = StormNOAA), aggregate(StormNOAA[,6]+StormNOAA[,8] ~ StormNOAA$EVTYPE, FUN = sum,data = StormNOAA, na.rm=TRUE), aggregate(StormNOAA[,6]+StormNOAA[,8] ~ StormNOAA$EVTYPE, FUN = max, data = StormNOAA, na.rm=TRUE), aggregate(StormNOAA[,6]+StormNOAA[,8] ~ StormNOAA$EVTYPE, FUN = sd,data = StormNOAA, na.rm=TRUE), table(StormNOAA$EVTYPE))[,c(1,2,4,6,8,10)]
colnames(econ)=c("EVTYPE","Average", "Total", "Max","SD", "No. of Events")
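The exponent expansion above can also be expressed as a single lookup, which avoids repeating one line per code and per column. This is a minimal sketch on made-up rows. The helper name `dmg_multiplier` is hypothetical, and it assumes H, K, M, and B (in either case) mean hundreds, thousands, millions, and billions, with every other code treated as zero damage.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Codes outside H/K/M/B (any case) give 0, mirroring the decision in the
# text to ignore entries whose code cannot be interpreted.
dmg_multiplier <- function(exp_code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Made-up rows, not taken from the NOAA data.
toy_dmg <- data.frame(PROPDMG = c(20, 2.5, 3, 7, 5),
                      PROPDMGEXP = c("m", "K", "B", "h", "?"))
toy_dmg$PROPDMG_USD <- toy_dmg$PROPDMG * dmg_multiplier(toy_dmg$PROPDMGEXP)
toy_dmg
```

The same function would apply unchanged to CROPDMG and CROPDMGEXP.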
The effect of weather events on population health can be assessed by the injuries and fatalities for each event. I first tabulated the mean and total number of health incidents (injuries and fatalities) per event type. I then created a boxplot of the top 1% of event types (by either mean or total) to identify the weather events most harmful to human health.
subset(health, Average>quantile(health$Average, c(.99)) | Total>quantile(health$Total, c(.99)))
## EVTYPE Average Total Maximum No. of Events
## 130 EXCESSIVE HEAT 5.02265 8428 521 1678
## 153 FLASH FLOOD 0.05076 2755 159 54277
## 170 FLOOD 0.28662 7259 802 25326
## 275 HEAT 3.95958 3037 583 767
## 277 Heat Wave 70.00000 70 70 1
## 279 HEAT WAVE DROUGHT 19.00000 19 19 1
## 366 HIGH WIND AND SEAS 23.00000 23 23 1
## 411 HURRICANE/TYPHOON 15.21591 1339 787 88
## 427 ICE STORM 1.02891 2064 1569 2006
## 464 LIGHTNING 0.38378 6046 51 15754
## 656 SNOW/HIGH WINDS 18.00000 36 34 2
## 760 THUNDERSTORM WIND 0.01963 1621 70 82563
## 821 THUNDERSTORMW 27.00000 27 27 1
## 834 TORNADO 1.59894 96979 1742 60652
## 842 TORNADOES, TSTM WIND, HAIL 25.00000 25 25 1
## 851 TROPICAL STORM GORDON 51.00000 51 51 1
## 856 TSTM WIND 0.03392 7461 60 219940
## 954 WILD FIRES 38.25000 153 153 4
## 972 WINTER STORM 0.13356 1527 170 11433
## 973 WINTER STORM HIGH WINDS 16.00000 16 16 1
This table shows the mean, maximum, and total number of injuries/deaths for the top 1% of weather events. It also shows how many times each weather event occurred during the recorded period, which indicates its frequency.
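To illustrate how the top 1% selection behaves, the sketch below applies the same "average or total above the 99th percentile" rule to a made-up summary table. The helper name `top_events` and all values are hypothetical. It shows why event types recorded only once can appear next to very frequent ones: a single severe event has a high average, while a frequent mild event has a high total.

```r
# Hypothetical helper: keep rows whose Average or Total exceeds the
# p-th quantile of that column.
top_events <- function(tbl, p = 0.99) {
  keep <- tbl$Average > quantile(tbl$Average, p) |
          tbl$Total   > quantile(tbl$Total, p)
  tbl[keep, ]
}

# Made-up summary table with 200 event types.
toy_health <- data.frame(
  EVTYPE  = paste0("EVENT_", 1:200),
  Average = c(rep(0.1, 198), 50, 70),   # two rare but severe types
  Total   = c(5000, 9000, rep(10, 198)) # two frequent, mild types
)
as.character(top_events(toy_health)$EVTYPE)
```

Because the two criteria are combined with "or", the result can contain up to roughly 2% of the event types rather than exactly 1%.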
topTenNOAA<-StormNOAA[StormNOAA$EVTYPE %in% health[health$Average>quantile(health$Average, c(.99))|health$Total>quantile(health$Total, c(.99)),1],]
topTenNOAA$EVTYPE<-droplevels(topTenNOAA$EVTYPE)
par(mar=c(8,4,2.5,1.5))
boxplot(topTenNOAA$INJURIES+topTenNOAA$FATALITIES~topTenNOAA$EVTYPE, las=2, yaxt="n", cex.axis=0.5, ylab="Number of Deaths and Injuries", pch=19, range=0)
axis(2, cex.lab=1)
The boxplot is a graphical representation of the range of health effects for the top 1% of weather events. It shows that some events have widely varying effects on health outcomes while others do not. The data show that tornadoes have caused the most injuries and deaths in total, while hurricanes/typhoons and wildfires have a higher average number of injuries and fatalities per event, though they are rarer.
subset(econ, Average>quantile(econ$Average, c(.99)) | Total>quantile(econ$Total, c(.99)))
## EVTYPE Average Total Max SD
## 95 DROUGHT 6.036e+06 1.502e+10 1.000e+09 4.397e+07
## 136 EXCESSIVE WETNESS 1.420e+08 1.420e+08 1.420e+08 NA
## 153 FLASH FLOOD 3.236e+05 1.756e+10 1.000e+09 6.276e+06
## 170 FLOOD 5.935e+06 1.503e+11 1.150e+11 7.234e+08
## 244 HAIL 6.500e+04 1.876e+10 1.800e+09 4.527e+06
## 299 HEAVY RAIN/SEVERE WEATHER 1.250e+09 2.500e+09 2.500e+09 1.768e+09
## 402 HURRICANE 8.397e+07 1.461e+10 3.500e+09 3.328e+08
## 408 HURRICANE OPAL 3.546e+08 3.192e+09 2.105e+09 7.334e+08
## 409 HURRICANE OPAL/HIGH WINDS 1.100e+08 1.100e+08 1.100e+08 NA
## 411 HURRICANE/TYPHOON 8.172e+08 7.191e+10 1.693e+10 2.494e+09
## 427 ICE STORM 4.470e+06 8.967e+09 5.000e+09 1.127e+08
## 590 RIVER FLOOD 5.866e+07 1.015e+10 1.000e+10 7.602e+08
## 604 SEVERE THUNDERSTORM 9.274e+07 1.206e+09 1.200e+09 3.327e+08
## 670 STORM SURGE 1.660e+08 4.332e+10 3.130e+10 2.056e+09
## 834 TORNADO 9.456e+05 5.735e+10 2.800e+09 1.644e+07
## 842 TORNADOES, TSTM WIND, HAIL 1.602e+09 1.602e+09 1.602e+09 NA
## 954 WILD FIRES 1.560e+08 6.241e+08 6.190e+08 3.087e+08
## No. of Events
## 95 2488
## 136 1
## 153 54277
## 170 25326
## 244 288661
## 299 2
## 402 174
## 408 9
## 409 1
## 411 88
## 427 2006
## 590 173
## 604 13
## 670 261
## 834 60652
## 842 1
## 954 4
The above table reports the average, maximum, and total damage costs incurred by each type of weather event, along with the standard deviation. It also shows how many times each weather event occurred during the recorded period.
topTenEcon<-StormNOAA[StormNOAA$EVTYPE %in% econ[econ$Average>quantile(econ$Average, c(.99))|econ$Total>quantile(econ$Total, c(.99)),1],]
topEcon<-subset(econ, Average>quantile(econ$Average, c(.99)) | Total>quantile(econ$Total, c(.99)))
ggplot(topEcon, aes(x=EVTYPE, y=Average))+geom_bar(position=position_dodge(),stat="identity")+geom_errorbar(aes(ymin=0, ymax=Average+SD), width=0.2, position=position_dodge(0.9))+ylab("Cost (USD)") + xlab("Weather Events")+theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 4 rows containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_path).
The above chart depicts the average cost per weather event (top 1% in average cost or total cost) plus the standard deviation (there is no minus S.D. because there is no such thing as negative cost). From the chart and the table you can see that hurricanes and severe storms cause the most monetary damage on average; however, floods have done the most damage in total (see table).