Impact of Severe Weather Events on Public Health and Economy in the United States

Synopsis

In this report, we analyze the impact of different weather events on public health and the economy, using the storm database collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from 1950 to 2011. We use the estimates of fatalities, injuries, property damage, and crop damage to decide which types of event are most harmful to population health and the economy. From these data, we found that tornadoes and excessive heat are most harmful with respect to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences.

Global settings

knitr::opts_chunk$set(echo = TRUE)  # Always make code visible
options(scipen=1)  # Turn off scientific notations for numbers
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.0 (2015-02-19) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.19.0 (2015-02-27) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## 
## R.utils v2.0.2 (2015-04-27) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
## 
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(plyr)
library(ggplot2)
library(gridExtra)
## Loading required package: grid

Data Processing

We first decompress and read the storm data CSV file. If the data already exist in the working environment, we skip this step rather than loading them again.

if (!("stormData" %in% ls())) {
  # Decompress the archive (keeping the .bz2 file), then read the CSV
  bunzip2("./repdata_data_StormData.csv.bz2", overwrite = TRUE, remove = FALSE)
  stormData <- read.csv("./repdata_data_StormData.csv", sep = ",")
}
dim(stormData)
## [1] 902297     37
head(stormData, n = 2)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                        14   100 3   0          0
## 2         NA         0                         2   150 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2

There are 902,297 observations of 37 variables, which record the characteristics of major storms and weather events in the United States from 1950 through the end of November 2011.
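As an aside, `read.csv` can usually read a bzip2-compressed file directly, because R's file connections detect the compression, so the explicit `bunzip2` step is optional. A minimal sketch on a small temporary file (the toy rows and the object names `toy` and `toyBack` are made up for illustration and are not part of the NOAA data):

```r
# Write a tiny CSV through a bzip2 connection (toy data, not the NOAA file).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the compressed file transparently; no bunzip2() call needed.
toyBack <- read.csv(tmp, stringsAsFactors = FALSE)
dim(toyBack)
```

Reading the archive directly avoids leaving a large uncompressed copy on disk, at the cost of decompressing on every read.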

Impact on Public Health

We compute the total number of fatalities and injuries for each event type and keep the 10 event types with the highest totals.

# Sum `fieldName` by event type and return the `top` event types, largest first
sortFunc <- function(fieldName, top = 10, dataset = stormData) {
    index <- which(colnames(dataset) == fieldName)
    field <- aggregate(dataset[, index], by = list(dataset$EVTYPE), FUN = "sum")
    names(field) <- c("EVTYPE", fieldName)
    field <- arrange(field, field[, 2], decreasing = TRUE)
    field <- head(field, n = top)
    # Order the factor levels by total so the bar charts keep this ranking
    field <- within(field, EVTYPE <- factor(x = EVTYPE, levels = field$EVTYPE))
    return(field)
}

fatalities <- sortFunc("FATALITIES")
injuries <- sortFunc("INJURIES")
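
To make the aggregation step concrete, here is a minimal base-R sketch of the same sum-then-rank idea on a toy data frame. The rows and the helper name `topEvents` are invented for illustration and are not part of the original analysis.

```r
# Toy stand-in for stormData (values are made up).
toyStorm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 7, 4, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total a column per event type and keep the largest n.
topEvents <- function(data, fieldName, n = 10) {
  totals <- aggregate(data[[fieldName]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- fieldName
  totals <- totals[order(totals[[fieldName]], decreasing = TRUE), ]
  totals <- head(totals, n)
  # Fix the factor levels in ranked order so bar charts keep this ordering.
  totals$EVTYPE <- factor(totals$EVTYPE, levels = totals$EVTYPE)
  rownames(totals) <- NULL
  totals
}

topEvents(toyStorm, "FATALITIES", n = 3)
```

Setting the factor levels explicitly matters for plotting: without it, ggplot2 would order the bars alphabetically rather than by total.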

Impact on Economy

We convert the property damage and crop damage data into numerical form according to the units described in the code book (Storm Events). The PROPDMGEXP and CROPDMGEXP columns contain alphabetical characters that signify magnitude: “H” for hundreds, “K” for thousands, “M” for millions, and “B” for billions. Numeric entries are used directly as powers of ten, and empty or unrecognized entries are treated as a multiplier of 1.

# Append a column `newFieldName` holding damage * 10^exponent.
# Assumes the damage column sits immediately before the exponent column
# `fieldName` (PROPDMG before PROPDMGEXP, CROPDMG before CROPDMGEXP).
convertFunc <- function(fieldName, newFieldName, dataset = stormData) {
    totalLen <- dim(dataset)[2]
    index <- which(colnames(dataset) == fieldName)
    dataset[, index] <- as.character(dataset[, index])
    logic <- !is.na(toupper(dataset[, index]))
    # Map magnitude letters to powers of ten
    dataset[logic & toupper(dataset[, index]) == "B", index] <- "9"
    dataset[logic & toupper(dataset[, index]) == "M", index] <- "6"
    dataset[logic & toupper(dataset[, index]) == "K", index] <- "3"
    dataset[logic & toupper(dataset[, index]) == "H", index] <- "2"
    dataset[logic & toupper(dataset[, index]) == "", index] <- "0"
    # Unrecognized symbols become NA here (hence the warning), then 0
    dataset[, index] <- as.numeric(dataset[, index])
    dataset[is.na(dataset[, index]), index] <- 0
    dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
    names(dataset)[totalLen + 1] <- newFieldName
    return(dataset)
}
storm <- convertFunc("PROPDMGEXP", "propertyDamage", dataset = stormData)
## Warning in convertFunc("PROPDMGEXP", "propertyDamage", dataset =
## stormData): NAs introduced by coercion
storm <- convertFunc("CROPDMGEXP", "cropDamage", dataset = storm)
## Warning in convertFunc("CROPDMGEXP", "cropDamage", dataset = storm): NAs
## introduced by coercion
property <- sortFunc("propertyDamage", dataset = storm)
crop <- sortFunc("cropDamage", dataset = storm)
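
The exponent handling can also be expressed as a vectorized lookup, which avoids the coercion warnings shown above. This is only a sketch that follows the same rules `convertFunc` applies (letters map to powers of ten, single digits are used as-is, anything else counts as 10^0); the helper name `damageValue` is hypothetical.

```r
# Hypothetical helper: turn a damage figure and its exponent code into dollars.
damageValue <- function(value, exponent) {
  letterPower <- c(H = 2, K = 3, M = 6, B = 9)
  code <- toupper(as.character(exponent))
  power <- rep(0, length(code))               # default multiplier is 10^0
  isLetter <- code %in% names(letterPower)
  power[isLetter] <- letterPower[code[isLetter]]
  isDigit <- grepl("^[0-9]$", code)           # numeric codes are powers of ten
  power[isDigit] <- as.numeric(code[isDigit])
  value * 10^power
}

damageValue(c(25, 2.5, 1, 3, 4), c("K", "m", "B", "", "?"))
```

Because no invalid string is ever passed to `as.numeric`, the function runs without warnings, and the mapping from code to power is visible in one place.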

Results

As for the impact on public health, we have a pair of graphs of total fatalities and total injuries caused by these severe weather events.

fatalitiesPlot <- ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) +
    geom_col() +  # bar heights are the precomputed totals
    labs(x = "Severe Weather Type", y = "Number of Fatalities",
         title = "Total Fatalities by\n Major Weather Events\n in the U.S. from 1950 - 2011") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

injuriesPlot <- ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) +
    geom_col() +
    labs(x = "Severe Weather Type", y = "Number of Injuries",
         title = "Total Injuries by\n Major Weather Events\n in the U.S. from 1950 - 2011") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)

Based on the above bar charts, we find that tornadoes and excessive heat caused the most fatalities, and tornadoes caused the most injuries, in the United States from 1950 to 2011.

As for the impact on the economy, we have a pair of graphs of total property damage and total crop damage caused by these severe weather events.

propertyPlot <- ggplot(property, aes(x = EVTYPE, y = propertyDamage)) +
    geom_col() +  # bar heights are the precomputed totals
    labs(x = "Severe Weather Type", y = "Property Damage in US dollars",
         title = "Total Property Damage by\n Major Weather Events\n in the U.S. from 1950 - 2011") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

cropPlot <- ggplot(crop, aes(x = EVTYPE, y = cropDamage)) +
    geom_col() +
    labs(x = "Severe Weather Type", y = "Crop Damage in US dollars",
         title = "Total Crop Damage by\n Major Weather Events\n in the U.S. from 1950 - 2011") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
grid.arrange(propertyPlot, cropPlot, ncol = 2)

Based on the above bar charts, we find that floods and hurricanes/typhoons caused the most property damage, while droughts and floods caused the most crop damage, in the United States from 1950 to 2011.
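
Since the two charts rank property and crop damage separately, one way to compare overall economic impact is to rank event types by the two combined. A minimal sketch on toy figures (the numbers and the names `toyDamage`, `totalDamage`, and `combined` are invented for illustration; with the real data the same steps would be applied to the `storm` data frame):

```r
# Toy per-event damage figures in dollars (made up, not NOAA values).
toyDamage <- data.frame(
  EVTYPE         = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  propertyDamage = c(100, 5, 50, 20),
  cropDamage     = c(10, 80, 5, 15),
  stringsAsFactors = FALSE
)

# Add the two damage columns, total them per event type, and rank.
toyDamage$totalDamage <- toyDamage$propertyDamage + toyDamage$cropDamage
combined <- aggregate(totalDamage ~ EVTYPE, data = toyDamage, FUN = sum)
combined <- combined[order(combined$totalDamage, decreasing = TRUE), ]
combined
```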

Conclusion

From these data, we found that tornadoes and excessive heat were the most harmful with respect to population health, while floods, droughts, and hurricanes/typhoons had the greatest economic consequences.