Synopsis

In this study we aimed to assess the health and economic impacts of weather events recorded from the 1950s until 2012. Health impact was measured by the reported injuries and fatalities for each event. Economic impact was measured by estimated property damage (homes, vehicles, infrastructure, and buildings) and crop damage costs. To make the data more digestible, tables and plots include only the top 1% of weather event types based on average or total impact (deaths/injuries, damages). The weather events with the largest health impact were tornadoes, which have caused the greatest combined number of deaths and injuries. In terms of monetary damage, hurricanes and hurricane-like events cause the most damage on average, but flooding has caused the most damage in total.

Data Processing

Load library/packages.

library(ggplot2)
library(reshape2)
library(R.utils)
## Warning: package 'R.utils' was built under R version 3.1.1
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.18.0 (2014-02-22) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## 
## R.utils v1.33.0 (2014-08-24) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
## 
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(plyr)

Download, decompress and read in data. Remove unneeded columns to free up memory.

download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "NOAA_storm.csv.bz2", method="curl")

bunzip2(filename = "NOAA_storm.csv.bz2", overwrite = TRUE)

NOAA<-read.csv("NOAA_storm.csv", stringsAsFactors = TRUE)

head(NOAA)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
StormNOAA<-NOAA[,c(2,7,8,23,24,25,26,27,28)]
rm(NOAA)
head(StormNOAA)
##             BGN_DATE STATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1  4/18/1950 0:00:00    AL TORNADO          0       15    25.0          K
## 2  4/18/1950 0:00:00    AL TORNADO          0        0     2.5          K
## 3  2/20/1951 0:00:00    AL TORNADO          0        2    25.0          K
## 4   6/8/1951 0:00:00    AL TORNADO          0        2     2.5          K
## 5 11/15/1951 0:00:00    AL TORNADO          0        2     2.5          K
## 6 11/15/1951 0:00:00    AL TORNADO          0        6     2.5          K
##   CROPDMG CROPDMGEXP
## 1       0           
## 2       0           
## 3       0           
## 4       0           
## 5       0           
## 6       0
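
As a side note on the read step above, base R can read a bzip2-compressed CSV without decompressing it first. The following is a minimal sketch on a toy file, not the storm data: `tmp_bz2`, `toy`, and `toy_back` are illustrative names introduced here.

```r
# Sketch: write a tiny bzip2-compressed CSV, then read it back directly.
# `tmp_bz2` and the toy columns are illustrative, not part of the storm data.
tmp_bz2 <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(0, 2))

con <- bzfile(tmp_bz2, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects bzip2 compression
# when reading, so no explicit decompression step is needed.
toy_back <- read.csv(tmp_bz2, stringsAsFactors = TRUE)
print(toy_back)
```

Applied to the storm file, this would presumably allow the separate decompression step to be skipped, at the cost of decompressing on every read.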

Create a smaller dataset to save computation time while testing. Set records to the number of rows to sample from the original dataset (a positive number; it is rounded down to a whole number).

records<-100000
StormNOAA<-StormNOAA[sample(nrow(StormNOAA), floor(abs(records))),]
list(StormNOAA$EVTYPE)
StormNOAA$EVTYPE<-droplevels(StormNOAA$EVTYPE)
str(StormNOAA)
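
The sampling step can be made repeatable by fixing the random seed, and guarded against requesting more rows than exist. This is a minimal sketch on a made-up data frame; the seed value and the names `toy`, `n`, and `toy_sample` are assumptions for illustration only.

```r
# Toy stand-in for StormNOAA; the values are made up for illustration.
toy <- data.frame(EVTYPE = factor(rep(c("TORNADO", "FLOOD", "HAIL"), times = 4)),
                  INJURIES = 1:12)

records <- 5                               # requested sample size
set.seed(42)                               # assumption: any fixed seed works
n <- min(floor(abs(records)), nrow(toy))   # never ask for more rows than exist

# Sample row indices without replacement, then drop unused factor levels.
toy_sample <- toy[sample(nrow(toy), n), ]
toy_sample$EVTYPE <- droplevels(toy_sample$EVTYPE)
print(toy_sample)
```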

Process and aggregate the data for question 1. Here I tabulate the mean, total, and maximum number of injuries and deaths for each weather event type. Ideally, I would also collapse similar event labels (e.g. “WINTER WEATHER/MIX”, “WINTERY MIX”, “Wintry mix”, “Wintry Mix”, “WINTRY MIX”).

health<-cbind(aggregate(x = StormNOAA[,4]+StormNOAA[,5], by = list(StormNOAA$EVTYPE), FUN = mean, data = StormNOAA), aggregate(StormNOAA[,4]+StormNOAA[,5] ~ StormNOAA$EVTYPE, FUN = sum,data = StormNOAA, na.rm=TRUE), aggregate(StormNOAA[,4]+StormNOAA[,5] ~ StormNOAA$EVTYPE, FUN = max ,data = StormNOAA), table(StormNOAA$EVTYPE))[,c(1,2,4,6,8)]

colnames(health)=c("EVTYPE","Average", "Total", "Maximum", "No. of Events")
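
Collapsing similar event labels, which was not done in this analysis, could start from something like the sketch below. `normalise_evtype` is a hypothetical helper, and its variant list covers only the wintry-mix examples quoted above; a real mapping would need to be built by inspecting all EVTYPE levels.

```r
# Hypothetical helper: normalise case and whitespace, then map known variants
# onto one label. The variant list covers only the examples quoted in the text.
normalise_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("\\s+", " ", x)   # collapse repeated whitespace
  x[x %in% c("WINTER WEATHER/MIX", "WINTERY MIX", "WINTRY MIX")] <- "WINTRY MIX"
  factor(x)
}

labels <- c("WINTER WEATHER/MIX", "WINTERY MIX", "Wintry mix",
            "Wintry Mix", "WINTRY MIX", " Tornado ")
cleaned <- normalise_evtype(labels)
print(table(cleaned))
```

Upper-casing alone already merges the three spellings of "Wintry mix"; the explicit mapping handles labels that differ in wording rather than case.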

First, the damage estimates needed to be expanded. PROPDMGEXP is a magnitude code (H for hundreds, K for thousands, M for millions, B for billions) and PROPDMG is the amount to three significant digits. Consider an event like the one below:

StormNOAA[187584,]
##                 BGN_DATE STATE         EVTYPE FATALITIES INJURIES PROPDMG
## 187584 10/4/1995 0:00:00    AL HURRICANE OPAL          0        0      20
##        PROPDMGEXP CROPDMG CROPDMGEXP
## 187584          m      10          m

In this instance, the property damage is $20,000,000 (20 × 1,000,000). The same convention holds for crop damage (CROPDMG and CROPDMGEXP). Some of the DMGEXP fields contain characters other than H, K, M, and B. These entries are ignored because I cannot reliably determine the meaning of these codes. One thought was that they give the number of trailing zeros for the *DMG field; however, these codes also show up in entries with no reported damage.

# Scale PROPDMG (column 6) by its magnitude code: H = 100, K = 1e3, M = 1e6, B = 1e9
StormNOAA[StormNOAA$PROPDMGEXP=="m"|StormNOAA$PROPDMGEXP=="M",6]<-StormNOAA[StormNOAA$PROPDMGEXP=="m"|StormNOAA$PROPDMGEXP=="M",6]*1000000
StormNOAA[StormNOAA$PROPDMGEXP=="k"|StormNOAA$PROPDMGEXP=="K",6]<-StormNOAA[StormNOAA$PROPDMGEXP=="k"|StormNOAA$PROPDMGEXP=="K",6]*1000
StormNOAA[StormNOAA$PROPDMGEXP=="B",6]<-StormNOAA[StormNOAA$PROPDMGEXP=="B",6]*1000000000
StormNOAA[StormNOAA$PROPDMGEXP=="h"|StormNOAA$PROPDMGEXP=="H",6]<-StormNOAA[StormNOAA$PROPDMGEXP=="h"|StormNOAA$PROPDMGEXP=="H",6]*100
# Zero out damage for rows whose code is not one of the recognised letters
StormNOAA[!(StormNOAA$PROPDMGEXP %in% c("k","K","h","H","B","m","M")),6 ]<-0

# Same scaling for CROPDMG (column 8); lowercase "k" is scaled as well, since it is kept by the filter below
StormNOAA[StormNOAA$CROPDMGEXP=="m"|StormNOAA$CROPDMGEXP=="M",8]<-StormNOAA[StormNOAA$CROPDMGEXP=="m"|StormNOAA$CROPDMGEXP=="M",8]*1000000
StormNOAA[StormNOAA$CROPDMGEXP=="k"|StormNOAA$CROPDMGEXP=="K",8]<-StormNOAA[StormNOAA$CROPDMGEXP=="k"|StormNOAA$CROPDMGEXP=="K",8]*1000
StormNOAA[StormNOAA$CROPDMGEXP=="B",8]<-StormNOAA[StormNOAA$CROPDMGEXP=="B",8]*1000000000
StormNOAA[!(StormNOAA$CROPDMGEXP %in% c("k","K","h","H","B","m","M")),8 ]<-0
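
The per-code assignments above can also be expressed as a single lookup. The following is a minimal sketch, not the code used for the reported results: `exp_multiplier` and `expand_damage` are hypothetical names, the toy vectors are made up, and H is taken to mean hundreds, following the H, K, M, B convention described earlier.

```r
# Multipliers for the recognised codes; any other code maps to NA and is zeroed.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: expand an amount by its magnitude code, ignoring case.
expand_damage <- function(amount, code) {
  mult <- unname(exp_multiplier[toupper(as.character(code))])
  mult[is.na(mult)] <- 0   # unknown or blank codes contribute no damage
  amount * mult
}

toy_amount <- c(20, 2.5, 5, 1, 3, 7)
toy_code   <- c("m", "K", "B", "h", "", "?")
print(expand_damage(toy_amount, toy_code))
```

Keeping the multipliers in one named vector means each code is handled in exactly one place, which makes it harder for upper- and lower-case variants to be treated differently.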

econ<-cbind(aggregate(x = StormNOAA[,6]+StormNOAA[,8], by = list(StormNOAA$EVTYPE), FUN = mean, data = StormNOAA), aggregate(StormNOAA[,6]+StormNOAA[,8] ~ StormNOAA$EVTYPE, FUN = sum,data = StormNOAA, na.rm=TRUE), aggregate(StormNOAA[,6]+StormNOAA[,8] ~ StormNOAA$EVTYPE, FUN = max, data = StormNOAA, na.rm=TRUE), aggregate(StormNOAA[,6]+StormNOAA[,8] ~ StormNOAA$EVTYPE, FUN = sd,data = StormNOAA, na.rm=TRUE), table(StormNOAA$EVTYPE))[,c(1,2,4,6,8,10)]

colnames(econ)=c("EVTYPE","Average", "Total", "Max","SD", "No. of Events")
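
As an alternative to binding several aggregate() results together, the same statistics can be computed in one call by returning a named vector from the summary function. This is a sketch on made-up numbers; `summarise_damage`, `toy`, and `toy_econ` are illustrative names, not part of the analysis above.

```r
# Toy damage data; the values are made up for illustration.
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "HAIL", "HAIL", "HAIL"),
                  DAMAGE = c(100, 300, 10, 20, 30))

# Return all summary statistics for one event type as a named vector.
summarise_damage <- function(v) {
  c(Average = mean(v), Total = sum(v), Max = max(v), SD = sd(v), N = length(v))
}

agg <- aggregate(DAMAGE ~ EVTYPE, data = toy, FUN = summarise_damage)

# aggregate() stores the statistics as a matrix column; flatten it into
# ordinary columns named after the elements of the vector.
toy_econ <- data.frame(EVTYPE = agg$EVTYPE, agg$DAMAGE)
print(toy_econ)
```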

Results

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The effect of weather events on population health can be assessed by the injuries and fatalities for each event. I first tabulated the mean and total number of health incidents (injuries and fatalities) per event type. I then created a boxplot of the top 1% of event types (by either mean or total) to identify the weather events most severe for human health.

subset(health, Average>quantile(health$Average, c(.99)) | Total>quantile(health$Total, c(.99)))
##                         EVTYPE  Average Total Maximum No. of Events
## 130             EXCESSIVE HEAT  5.02265  8428     521          1678
## 153                FLASH FLOOD  0.05076  2755     159         54277
## 170                      FLOOD  0.28662  7259     802         25326
## 275                       HEAT  3.95958  3037     583           767
## 277                  Heat Wave 70.00000    70      70             1
## 279          HEAT WAVE DROUGHT 19.00000    19      19             1
## 366         HIGH WIND AND SEAS 23.00000    23      23             1
## 411          HURRICANE/TYPHOON 15.21591  1339     787            88
## 427                  ICE STORM  1.02891  2064    1569          2006
## 464                  LIGHTNING  0.38378  6046      51         15754
## 656            SNOW/HIGH WINDS 18.00000    36      34             2
## 760          THUNDERSTORM WIND  0.01963  1621      70         82563
## 821              THUNDERSTORMW 27.00000    27      27             1
## 834                    TORNADO  1.59894 96979    1742         60652
## 842 TORNADOES, TSTM WIND, HAIL 25.00000    25      25             1
## 851      TROPICAL STORM GORDON 51.00000    51      51             1
## 856                  TSTM WIND  0.03392  7461      60        219940
## 954                 WILD FIRES 38.25000   153     153             4
## 972               WINTER STORM  0.13356  1527     170         11433
## 973    WINTER STORM HIGH WINDS 16.00000    16      16             1

This table shows the mean, maximum, and total number of injuries/deaths for the top 1% of weather event types. The number of events of each type recorded during the period is also shown, to indicate how frequent each type is.
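
The "top 1%" filter keeps an event type if it exceeds the 99th percentile on either the average or the total. A minimal sketch of that selection on made-up data (the data frame `toy_health` and its values are illustrative only):

```r
# 200 hypothetical event types: one is rare but severe, one is frequent.
toy_health <- data.frame(EVTYPE  = paste0("EVENT_", 1:200),
                         Average = c(rep(0.1, 199), 50),
                         Total   = c(5000, rep(1, 199)))

# 99th-percentile cut-offs, computed separately for each measure.
avg_cut   <- quantile(toy_health$Average, 0.99)
total_cut <- quantile(toy_health$Total, 0.99)

# Keep a row if it is extreme on either measure.
top <- subset(toy_health, Average > avg_cut | Total > total_cut)
print(top)
```

Using "or" rather than "and" is what lets both kinds of event type appear in the table: those recorded once with a high count, and very frequent ones with a low average but a large total.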

topTenNOAA<-StormNOAA[StormNOAA$EVTYPE %in% health[health$Average>quantile(health$Average, c(.99))|health$Total>quantile(health$Total, c(.99)),1],]

topTenNOAA$EVTYPE<-droplevels(topTenNOAA$EVTYPE)

par(mar=c(8,4,2.5,1.5))
boxplot(topTenNOAA$INJURIES+topTenNOAA$FATALITIES~topTenNOAA$EVTYPE, las=2, yaxt="n", cex.axis=0.5, ylab="Number of Deaths and Injuries", pch=19, range=0)
axis(2, cex.lab=1)

Figure: boxplot of the number of deaths and injuries per event, by event type, for the top 1% of weather event types.

The boxplot is a graphical representation of the range of health effects for the top 1% of weather event types. It shows that some event types have widely varying effects on health outcomes while others do not. The data show that tornadoes have caused the most injuries and deaths in total, while hurricanes/typhoons and wild fires have a higher average number of injuries and deaths per event, though they are rarer.

Question 2: Across the United States, which types of events have the greatest economic consequences?

subset(econ, Average>quantile(econ$Average, c(.99)) | Total>quantile(econ$Total, c(.99)))
##                         EVTYPE   Average     Total       Max        SD
## 95                     DROUGHT 6.036e+06 1.502e+10 1.000e+09 4.397e+07
## 136          EXCESSIVE WETNESS 1.420e+08 1.420e+08 1.420e+08        NA
## 153                FLASH FLOOD 3.236e+05 1.756e+10 1.000e+09 6.276e+06
## 170                      FLOOD 5.935e+06 1.503e+11 1.150e+11 7.234e+08
## 244                       HAIL 6.500e+04 1.876e+10 1.800e+09 4.527e+06
## 299  HEAVY RAIN/SEVERE WEATHER 1.250e+09 2.500e+09 2.500e+09 1.768e+09
## 402                  HURRICANE 8.397e+07 1.461e+10 3.500e+09 3.328e+08
## 408             HURRICANE OPAL 3.546e+08 3.192e+09 2.105e+09 7.334e+08
## 409  HURRICANE OPAL/HIGH WINDS 1.100e+08 1.100e+08 1.100e+08        NA
## 411          HURRICANE/TYPHOON 8.172e+08 7.191e+10 1.693e+10 2.494e+09
## 427                  ICE STORM 4.470e+06 8.967e+09 5.000e+09 1.127e+08
## 590                RIVER FLOOD 5.866e+07 1.015e+10 1.000e+10 7.602e+08
## 604        SEVERE THUNDERSTORM 9.274e+07 1.206e+09 1.200e+09 3.327e+08
## 670                STORM SURGE 1.660e+08 4.332e+10 3.130e+10 2.056e+09
## 834                    TORNADO 9.456e+05 5.735e+10 2.800e+09 1.644e+07
## 842 TORNADOES, TSTM WIND, HAIL 1.602e+09 1.602e+09 1.602e+09        NA
## 954                 WILD FIRES 1.560e+08 6.241e+08 6.190e+08 3.087e+08
##     No. of Events
## 95           2488
## 136             1
## 153         54277
## 170         25326
## 244        288661
## 299             2
## 402           174
## 408             9
## 409             1
## 411            88
## 427          2006
## 590           173
## 604            13
## 670           261
## 834         60652
## 842             1
## 954             4

The above table reports the average, maximum, and total damage costs incurred by each type of weather event, along with the standard deviation. It also shows how many events of each type occurred during the recorded time period.
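
The NA values in the SD column belong to event types that were recorded only once: a standard deviation needs at least two observations. A short sketch with illustrative numbers (the vector names are made up):

```r
single_event <- c(1.42e8)     # an event type recorded once
two_events   <- c(2.5e9, 0)   # an event type recorded twice

# sd() divides by n - 1, so a single value has no defined standard deviation.
print(sd(single_event))
print(sd(two_events))
```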

topTenEcon<-StormNOAA[StormNOAA$EVTYPE %in% econ[econ$Average>quantile(econ$Average, c(.99))|econ$Total>quantile(econ$Total, c(.99)),1],]

topEcon<-subset(econ, Average>quantile(econ$Average, c(.99)) | Total>quantile(econ$Total, c(.99)))

ggplot(topEcon, aes(x=EVTYPE, y=Average))+geom_bar(position=position_dodge(),stat="identity")+geom_errorbar(aes(ymin=0, ymax=Average+SD), width=0.2, position=position_dodge(0.9))+ylab("Cost (USD)") + xlab("Weather Events")+theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 4 rows containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_path).

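Figure: bar chart of average cost (USD) per event by weather event type, with error bars extending to the average plus one standard deviation.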

The above chart depicts the average cost per weather event (for event types in the top 1% by average or total cost) plus one standard deviation (there is no minus S.D. because there is no such thing as negative cost). From the chart and the table you can see that hurricanes and severe storms cause the most monetary damage on average; however, floods have done the most damage in total (see table).