Health and Economic Effects of Severe Weather in the US

Synopsis

This report investigates the health and economic effects of severe weather events on communities in the US. It identifies which event types are most harmful to population health, measured by fatalities and injuries, and which have the greatest economic impact, measured by property and crop damage. The data come from the National Oceanic and Atmospheric Administration (NOAA) storm database and cover the period from 1950 through November 2011.

The results show that tornadoes have the greatest health impact, causing the most fatalities and the most injuries; excessive heat is the second leading cause of fatalities. Tornadoes are also the main cause of total damage, and of property damage in particular. For crops, hail is the severe weather event causing the greatest damage.

Data Processing

The data analyzed are the NOAA storm data from 1950 through November 2011. Metadata about the weather event data can be found at:

Obtaining and Reading the Data

The data was obtained from a class CDN site at: raw data

# Locations for the data.
dataDir <- "./data/"
dataUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
zipFile <- "StormData.csv.bz2"
zipPath <- paste(dataDir, zipFile, sep = "")

# Create the data directory if needed (no warning if it already exists).
dir.create(dataDir, showWarnings = FALSE)

# Download the compressed file only once; mode = "wb" keeps the binary file intact on Windows.
if (!file.exists(zipPath)) {
    download.file(dataUrl, destfile = zipPath, mode = "wb")
    download_date <- date()
}

The bzip2-compressed CSV file was read directly into an R data frame.

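# Empty fields become NA; stringsAsFactors = TRUE (the default before R 4.0.0)
# reproduces the factor columns shown by str() below.
stormdata <- read.csv(bzfile(paste(dataDir, zipFile, sep = "")), na.strings = "",
    stringsAsFactors = TRUE)
str(stormdata)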
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "10/10/1954 0:00:00",..: 6523 6523 4213 11116 1426 1426 1462 2873 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "000","0000","00:00:00 AM",..: 212 257 2645 1563 2524 3126 122 1563 3126 3126 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29600 levels "5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13512 1872 4597 10591 4371 10093 1972 23872 24417 4597 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "?","ABNORMALLY DRY",..: 830 830 830 830 830 830 830 830 830 830 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 34 levels "E","Eas","EE",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ BGN_LOCATI: Factor w/ 54428 levels "?","(01R)AFB GNRY RNG AL",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ END_DATE  : Factor w/ 6662 levels "10/10/1993 0:00:00",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ END_TIME  : Factor w/ 3646 levels "?","0000","00:00:00 AM",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 23 levels "E","ENE","ESE",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ END_LOCATI: Factor w/ 34505 levels "(0E4)PAYSON ARPT",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 18 levels "-","?","+","0",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 8 levels "?","0","2","B",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ WFO       : Factor w/ 541 levels "2","43","9V9",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ STATEOFFIC: Factor w/ 249 levels "ALABAMA, Central",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ ZONENAMES : Factor w/ 25111 levels "                                                                                                                               "| __truncated__,..: NA NA NA NA NA NA NA NA NA NA ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436780 levels " ","  ","   ",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
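Reading all 37 columns of 902,297 rows takes a lot of memory, while the analysis below uses only a handful of them (EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG, plus PROPDMGEXP and CROPDMGEXP if damage multipliers are to be applied). A minimal sketch of one way to skip the rest, assuming the header names match those shown by str() above: read the header first, then pass `colClasses = "NULL"` for every column that is not needed. The helper name `readSelectedColumns` is hypothetical.

```r
# Hypothetical helper: read only the named columns from a CSV file.
# read.csv() opens compressed files (e.g. .bz2) transparently when given a path,
# and skips any column whose colClasses entry is "NULL".
readSelectedColumns <- function(path, keep) {
    # Read just enough of the file to learn the column names.
    header <- names(read.csv(path, nrows = 1, check.names = FALSE))
    # NA lets read.csv() guess the type; "NULL" drops the column.
    classes <- ifelse(header %in% keep, NA_character_, "NULL")
    read.csv(path, colClasses = classes, na.strings = "")
}

# Example usage (assumes the download step above has completed):
# neededCols <- c("EVTYPE", "FATALITIES", "INJURIES",
#                 "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
# stormdata <- readSelectedColumns("./data/StormData.csv.bz2", neededCols)
```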

This report examines the economic and health effects of weather events across the US. Because the results are summarized over the entire time period, the data can be aggregated by event type across the whole data set.
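Since the aggregation spans the whole period, it can be worth confirming what that period is. A minimal sketch, assuming BGN_DATE keeps the month/day/year format shown by str() above (for example "10/10/1954 0:00:00"); `eventDateRange` is a hypothetical helper. Only the date part is parsed, because as.Date() ignores characters left over after the format.

```r
# Hypothetical helper: earliest and latest event begin dates.
eventDateRange <- function(bgnDate) {
    # as.character() handles both factor and character input.
    dates <- as.Date(as.character(bgnDate), format = "%m/%d/%Y")
    range(dates, na.rm = TRUE)
}

# Example usage on the full data set:
# eventDateRange(stormdata$BGN_DATE)
```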

The columns of interest are FATALITIES, INJURIES, PROPDMG and CROPDMG; each is summed by event type (EVTYPE):

# Sum fatalities, injuries and (unscaled) damage amounts for each event type.
subdata <- aggregate(cbind(FATALITIES, INJURIES, PROPDMG, CROPDMG) ~ EVTYPE, 
    data = stormdata, sum)
str(subdata)
## 'data.frame':    985 obs. of  5 variables:
##  $ EVTYPE    : Factor w/ 985 levels "?","ABNORMALLY DRY",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 0 1 ...
##  $ INJURIES  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PROPDMG   : num  5 0 0 0 0 ...
##  $ CROPDMG   : num  0 0 0 0 0 ...
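aggregate() with a formula sums each column on the left of `~` within every distinct EVTYPE label, so labels that differ only in case or surrounding whitespace form separate groups (EVTYPE has 985 levels). The sketch below uses a made-up four-row data frame to show the grouping and one possible normalization with trimws() and toupper(); merging labels this way is an assumption and is not part of the analysis above.

```r
# Made-up example rows; only the structure mirrors the storm data.
toy <- data.frame(
    EVTYPE     = c("TORNADO", "Tornado ", "HAIL", "TORNADO"),
    FATALITIES = c(2, 1, 0, 3),
    INJURIES   = c(10, 0, 1, 5),
    stringsAsFactors = FALSE
)

# Grouping on the raw labels: "Tornado " forms its own group.
raw <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Normalizing labels first merges the two spellings of TORNADO.
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))
merged <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(merged)
```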

Results

The data include two kinds of outcome: health effects (fatalities and injuries) and economic impacts (property and crop damage).

library(ggplot2)

# Top 7 event types by total fatalities, ordered from largest to smallest.
fatalitydata <- head(subdata[order(-subdata$FATALITIES), ], n = 7)
fatalitydata$EVTYPE <- reorder(fatalitydata$EVTYPE, -fatalitydata$FATALITIES)
fatalityplot <- ggplot(fatalitydata, aes(x = EVTYPE, y = FATALITIES)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
    labs(title = "Leading causes of fatalities", y = "Number of Fatalities", x = "")

# Top 7 event types by total injuries, ordered from largest to smallest.
injurydata <- head(subdata[order(-subdata$INJURIES), ], n = 7)
injurydata$EVTYPE <- reorder(injurydata$EVTYPE, -injurydata$INJURIES)
injuryplot <- ggplot(injurydata, aes(x = EVTYPE, y = INJURIES)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
    labs(title = "Leading causes of injuries", y = "Number of Injuries", x = "")

In the following figure, the left panel shows the severe weather events that caused the most fatalities and the right panel shows those that caused the most injuries.

Across the US as a whole, tornadoes have the greatest health impact, in terms of both fatalities and injuries. For fatalities, tornadoes are followed by excessive heat.

library(gridExtra)
## Loading required package: grid
# The overall title is passed as `top` (gridExtra 2.0 and later).
grid.arrange(fatalityplot, injuryplot, ncol = 2, top = "Health Effects of Severe Weather Events")

Figure: Health Effects of Severe Weather Events (left: fatalities by event type; right: injuries by event type)

The proportions of all fatalities and injuries due to these causes are computed as follows:

# Share of all fatalities caused by tornado or excessive heat events.
totalfatalities <- sum(subdata$FATALITIES)
mostfatalities <- sum(subdata$FATALITIES[subdata$EVTYPE %in% c("TORNADO", 
    "EXCESSIVE HEAT")]) / totalfatalities

# Share of all injuries sustained during tornado events.
totalinjuries <- sum(subdata$INJURIES)
mostinjuries <- sum(subdata$INJURIES[subdata$EVTYPE == "TORNADO"]) / totalinjuries

Tornado and excessive heat events together caused 49.8% of the 15,145 total fatalities.

Tornado events accounted for 65% of the 140,528 total injuries.
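The same share calculation is repeated for fatalities, injuries, and later for damage. A small hypothetical helper, `eventShare`, sketches how it could be written once; it assumes a data frame with an EVTYPE column like `subdata`.

```r
# Hypothetical helper: fraction of a column's total attributable to the given event types.
eventShare <- function(data, column, events) {
    sum(data[[column]][data$EVTYPE %in% events]) / sum(data[[column]])
}

# Format a share as a percentage with one decimal place.
asPercent <- function(share) sprintf("%.1f%%", 100 * share)

# Example usage, mirroring the calculations above:
# asPercent(eventShare(subdata, "FATALITIES", c("TORNADO", "EXCESSIVE HEAT")))
# asPercent(eventShare(subdata, "INJURIES", "TORNADO"))
```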

# Top 7 event types by total property damage (raw PROPDMG, PROPDMGEXP not applied).
propertydata <- head(subdata[order(-subdata$PROPDMG), ], n = 7)
propertydata$EVTYPE <- reorder(propertydata$EVTYPE, -propertydata$PROPDMG)
propertyplot <- ggplot(propertydata, aes(x = EVTYPE, y = PROPDMG)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
    labs(title = "Property damage amounts", y = "Amount of Damage (unscaled)", x = "")

# Top 7 event types by total crop damage (raw CROPDMG, CROPDMGEXP not applied).
cropdata <- head(subdata[order(-subdata$CROPDMG), ], n = 7)
cropdata$EVTYPE <- reorder(cropdata$EVTYPE, -cropdata$CROPDMG)
cropplot <- ggplot(cropdata, aes(x = EVTYPE, y = CROPDMG)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
    labs(title = "Crop damage amounts", y = "Amount of Damage (unscaled)", x = "")
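The damage amounts here are summed as recorded in PROPDMG and CROPDMG, without the magnitude codes stored in PROPDMGEXP and CROPDMGEXP. NOAA's Storm Data preparation directive describes the alphabetic suffixes "K", "M" and "B" for thousands, millions and billions. A minimal sketch of converting damage to dollars, assuming those three codes plus "H" for hundreds (an assumption) and treating every other code, including blanks and NA, as a multiplier of 1; `expMultiplier` is a hypothetical helper.

```r
# Hypothetical helper: map magnitude codes to numeric multipliers.
# Assumed mapping: H = 1e2, K = 1e3, M = 1e6, B = 1e9; anything else = 1.
expMultiplier <- function(code) {
    code <- toupper(as.character(code))
    mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code]
    mult[is.na(mult)] <- 1  # unknown, blank or missing codes
    unname(mult)
}

# Example usage on the raw (unaggregated) data:
# stormdata$PROPDMGUSD <- stormdata$PROPDMG * expMultiplier(stormdata$PROPDMGEXP)
# stormdata$CROPDMGUSD <- stormdata$CROPDMG * expMultiplier(stormdata$CROPDMGEXP)
```

Because the multipliers span seven orders of magnitude, the ranking of event types by damage may change once they are applied; rerunning the aggregation on the converted columns would show whether it does.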

In the following figure, the left panel shows the severe weather events that caused the most property damage and the right panel shows those that caused the most crop damage.

Across the US as a whole, the leading causes of property damage overlap with, but differ from, the leading causes of crop damage.

Hail is the leading cause of crop damage; every other event type caused less than half as much crop damage as hail. Tornadoes are the leading cause of property damage.

library(gridExtra)
# The overall title is passed as `top` (gridExtra 2.0 and later).
grid.arrange(propertyplot, cropplot, ncol = 2, top = "Cost of Damage due to Severe Weather Events")

Figure: Cost of Damage due to Severe Weather Events (left: property damage by event type; right: crop damage by event type)

The proportions of all property and crop damage due to these causes are computed as follows:

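# Share of all (unscaled) property damage caused by tornado events.
totalpropdmg <- sum(subdata$PROPDMG)
mostpropdmg <- sum(subdata$PROPDMG[subdata$EVTYPE == "TORNADO"]) / totalpropdmg

# Share of all (unscaled) crop damage caused by hail events.
totalcropdmg <- sum(subdata$CROPDMG)
mostcropdmg <- sum(subdata$CROPDMG[subdata$EVTYPE == "HAIL"]) / totalcropdmg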

Tornado events caused 29.5% of the 10,884,500 total property damage (in unscaled PROPDMG units).

Hail events caused 42.1% of the 1,377,827 total crop damage (in unscaled CROPDMG units).

The figure below shows the leading causes of total damage (property plus crop damage); the predominant cause is tornadoes.

# Total damage per event type: property plus crop damage (both unscaled).
damagetotals <- data.frame(EVTYPE = subdata$EVTYPE,
    totaldamage = subdata$PROPDMG + subdata$CROPDMG)
totaldata <- head(damagetotals[order(-damagetotals$totaldamage), ], n = 7)
totaldata$EVTYPE <- reorder(totaldata$EVTYPE, -totaldata$totaldamage)
ggplot(totaldata, aes(x = EVTYPE, y = totaldamage)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
    labs(title = "Total damage amounts", y = "Amount of Damage (unscaled)", x = "")
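The "take the top 7 rows and order the bars from largest to smallest" pattern appears in every plot above. A hypothetical helper, `topEvents`, sketches one way to factor it out; it assumes a data frame with an EVTYPE column and relies on the rows already being sorted in descending order when the factor levels are set.

```r
# Hypothetical helper: the n rows with the largest values of `column`,
# with EVTYPE re-levelled so that bars plot in descending order.
topEvents <- function(data, column, n = 7) {
    top <- head(data[order(-data[[column]]), ], n = n)
    # Rows are sorted, so their order defines the level order; unused levels are dropped.
    top$EVTYPE <- factor(top$EVTYPE, levels = unique(as.character(top$EVTYPE)))
    top
}

# Example usage with the totals computed above:
# totaldata <- topEvents(damagetotals, "totaldamage")
# ggplot(totaldata, aes(x = EVTYPE, y = totaldamage)) + geom_col()
```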

Figure: Total damage amounts (property plus crop damage) by event type