Impact of Severe Weather Events on Public Health and Economy in the USA

Synopsis

We analyze the impact of different types of severe weather events on public health and the economy of the United States, using data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. We look at estimates of fatalities, injuries, and property and crop damage.

We address the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

We conclude that the deadliest events were tornados and excessive heat. The events which resulted in the greatest number of injuries were tornados and thunderstorm wind. The events leading to the greatest economic damage were tornados and flash floods.

Data Processing

Load the necessary libraries.

library(R.utils)
library(ggplot2)
library(knitr)

Download, unzip, and load the data file.

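# Download the compressed data set, then decompress it with bunzip2() from R.utils,
# keeping the original .bz2 archive (remove = FALSE).
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              destfile = "storm_data.csv.bz2", mode = "wb")
bunzip2("storm_data.csv.bz2", overwrite = TRUE, remove = FALSE)

storm_data <- read.csv("storm_data.csv", header = TRUE, stringsAsFactors = FALSE)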

# We look at the structure of this data set.
str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
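read.csv() can also read the compressed archive directly, because R's file() connection detects bzip2 compression when it opens a file for reading. A minimal sketch, using a made-up two-row CSV written to a temporary file (the file contents and the name toy_csv are illustrative only):

```r
# Write a tiny made-up CSV through a bzip2-compressed connection.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "HAIL,0"), con)
close(con)

# read.csv() opens the path with file(), which recognises the bzip2 format,
# so no separate decompression step is needed.
toy_csv <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
str(toy_csv)
```

Applied to this report, read.csv("storm_data.csv.bz2") would make the bunzip2() step optional, at the cost of decompressing the file on every read.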

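We keep only the columns needed to answer the two questions: the event date and type, the population health measures (FATALITIES, INJURIES), and the economic damage measures (PROPDMG, CROPDMG and their exponents). We also extract the year of each event from BGN_DATE.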

data_col <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm_data_sub <- storm_data[data_col]
storm_data_sub$YEAR <- as.integer(format(as.Date(storm_data_sub$BGN_DATE, "%m/%d/%Y 0:00:00"), "%Y"))

head(storm_data_sub)
##             BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1  4/18/1950 0:00:00 TORNADO          0       15    25.0          K
## 2  4/18/1950 0:00:00 TORNADO          0        0     2.5          K
## 3  2/20/1951 0:00:00 TORNADO          0        2    25.0          K
## 4   6/8/1951 0:00:00 TORNADO          0        2     2.5          K
## 5 11/15/1951 0:00:00 TORNADO          0        2     2.5          K
## 6 11/15/1951 0:00:00 TORNADO          0        6     2.5          K
##   CROPDMG CROPDMGEXP YEAR
## 1       0            1950
## 2       0            1950
## 3       0            1951
## 4       0            1951
## 5       0            1951
## 6       0            1951
str(storm_data_sub)
## 'data.frame':    902297 obs. of  9 variables:
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ YEAR      : int  1950 1950 1951 1951 1951 1951 1951 1952 1952 1952 ...
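The YEAR column comes from parsing BGN_DATE with as.Date(). Because strptime-based parsing ignores trailing characters in the input, the format "%m/%d/%Y" alone is enough to read the date part. A small sketch on BGN_DATE strings copied from the output above (the names bgn_date and years are illustrative):

```r
bgn_date <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "11/15/1951 0:00:00")

# The trailing " 0:00:00" is ignored once the date fields have been matched.
years <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))
print(years)
```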

According to the Storm Data Documentation, PROPDMGEXP and CROPDMGEXP give the magnitude of PROPDMG and CROPDMG: "K" for thousands, "M" for millions, and "B" for billions ("H", for hundreds, also appears in the data). We convert each code to a power of ten, so that the damage in dollars is PROPDMG * 10^PROPDMGEXP, and likewise for crops.

convert_exp <- function(x){
  # Map each magnitude code to a power of ten. `%in%` compares element-wise,
  # whereas `||` would only test the first element of the vector.
  x[x %in% c(" ", "")]  <- "0"   # no magnitude: 10^0 = 1
  x[x %in% c("H", "h")] <- "2"   # hundreds
  x[x %in% c("K", "k")] <- "3"   # thousands
  x[x %in% c("M", "m")] <- "6"   # millions
  x[x %in% c("B", "b")] <- "9"   # billions
  # Digit codes are kept as exponents; symbols such as "+", "-", "?" become NA.
  return(suppressWarnings(as.numeric(x)))
}

storm_data_sub$PROPDMGEXP <- convert_exp(storm_data_sub$PROPDMGEXP)
storm_data_sub$PROPDMGEXP[is.na(storm_data_sub$PROPDMGEXP)] <- 0
storm_data_sub$PropertyDamage <- storm_data_sub$PROPDMG * 10^storm_data_sub$PROPDMGEXP
summary(storm_data_sub$PropertyDamage)

storm_data_sub$CROPDMGEXP <- convert_exp(storm_data_sub$CROPDMGEXP)
storm_data_sub$CROPDMGEXP[is.na(storm_data_sub$CROPDMGEXP)] <- 0
storm_data_sub$CropDamage <- storm_data_sub$CROPDMG * 10^storm_data_sub$CROPDMGEXP
summary(storm_data_sub$CropDamage)

Compute the combined property and crop damage for each event. TOTAL_DMG adds the raw PROPDMG and CROPDMG values without applying the magnitude codes, so the damage totals below are in unscaled units.

TOTAL_DMG <- storm_data_sub$PROPDMG + storm_data_sub$CROPDMG
clean_storm_data <- cbind(storm_data_sub, TOTAL_DMG)
str(clean_storm_data)
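As a sketch of an alternative to the chained assignments in convert_exp, the hypothetical function code_to_exponent() below uses a named lookup vector and normalizes case and surrounding whitespace first. The codes and amounts are made up, and treating single-digit codes as exponents is an assumption carried over from convert_exp, not something the Storm Data Documentation specifies.

```r
# Hypothetical helper: magnitude code -> power of ten.
code_to_exponent <- function(code) {
  code <- toupper(trimws(code))            # "k" -> "K", " " -> ""
  lookup <- c(H = 2, K = 3, M = 6, B = 9)  # documented letters plus "H"
  out <- unname(lookup[code])              # NA for codes not in the table
  out[code == ""] <- 0                     # blank: no scaling (10^0 = 1)
  digit <- grepl("^[0-9]$", code)
  out[digit] <- as.numeric(code[digit])    # assumption: digits are exponents
  out[is.na(out)] <- 0                     # "+", "-", "?" etc. -> 10^0
  out
}

codes  <- c("K", "k", " ", "M", "B", "5", "?")   # made-up codes
amount <- c(25, 2.5, 10, 1.5, 2, 3, 7)           # made-up PROPDMG values
exponent <- code_to_exponent(codes)
dollars  <- amount * 10^exponent
print(dollars)
```

Keeping the mapping in one named vector makes it easy to audit, and unknown codes fall through to a single, explicit default.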

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

First, we find the 10 event types with the most fatalities and plot them as a bar chart.

fatal_data <- aggregate(clean_storm_data$FATALITIES, by = list(clean_storm_data$EVTYPE), "sum")
fatal_data <- fatal_data[order(-fatal_data$x), ][1:10, ]

str(fatal_data)
## 'data.frame':    10 obs. of  2 variables:
##  $ Group.1: chr  "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD" "HEAT" ...
##  $ x      : num  5633 1903 978 937 816 ...
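# `las` is a base-graphics option that ggplot2 ignores, so it is dropped here;
# reorder() sorts the bars from most to fewest fatalities.
fatal_plot <- ggplot(fatal_data, aes(x = reorder(Group.1, -x), y = x)) +
    geom_bar(stat = "identity", fill = "#FF0011") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    xlab("Event Type") + ylab("Fatalities") + ggtitle("Top Ten Fatal Events")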

fatal_plot
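The same aggregate-then-order pattern is repeated for fatalities, injuries, and damage. A sketch of a hypothetical helper, top_events(), that wraps it, run on made-up rows:

```r
# Hypothetical helper: total `value_col` per event type, keep the n largest.
top_events <- function(df, value_col, n = 10) {
  totals <- aggregate(df[[value_col]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  totals <- totals[order(-totals$x), ]
  head(totals, n)   # head() never pads with NA rows
}

toy_events <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 2, 1, 0),
  stringsAsFactors = FALSE
)
top2 <- top_events(toy_events, "FATALITIES", n = 2)
print(top2)
```

Using head() rather than `[1:10, ]` avoids rows of NA when there are fewer than n event types.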

Next, we consider the 10 events leading to the greatest number of injuries.

injury_data <- aggregate(clean_storm_data$INJURIES, by = list(clean_storm_data$EVTYPE), "sum")
injury_data <- injury_data[order(-injury_data$x), ][1:10, ]

str(injury_data)
## 'data.frame':    10 obs. of  2 variables:
##  $ Group.1: chr  "TORNADO" "TSTM WIND" "FLOOD" "EXCESSIVE HEAT" ...
##  $ x      : num  91346 6957 6789 6525 5230 ...
# As above: no `las`, and bars sorted by injury count.
injury_plot <- ggplot(injury_data, aes(x = reorder(Group.1, -x), y = x)) +
    geom_bar(stat = "identity", fill = "#0022FF") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    xlab("Event Type") + ylab("Injuries") + ggtitle("Top Ten Causes of Injuries")

injury_plot
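Fatalities and injuries can also be summed in a single pass with the formula interface of aggregate(), which accepts several response columns through cbind(). A sketch on made-up rows (toy_health and health are illustrative names):

```r
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "TSTM WIND", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 0, 1, 1),
  INJURIES   = c(10, 4, 5, 0),
  stringsAsFactors = FALSE
)

# One call sums both columns per event type.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy_health, FUN = sum)
# Rank by injuries, breaking ties by fatalities.
health <- health[order(-health$INJURIES, -health$FATALITIES), ]
print(health)
```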

Across the United States, which types of events have the greatest economic consequences?

We consider the 10 event types with the largest TOTAL_DMG (the sum of the unscaled PROPDMG and CROPDMG values) and plot a bar chart.

damage_data <- aggregate(clean_storm_data$TOTAL_DMG, by = list(clean_storm_data$EVTYPE), "sum")
damage_data <- damage_data[order(-damage_data$x), ][1:10, ]

str(damage_data)
## 'data.frame':    10 obs. of  2 variables:
##  $ Group.1: chr  "TORNADO" "FLASH FLOOD" "TSTM WIND" "HAIL" ...
##  $ x      : num  3312277 1599325 1445168 1268290 1067976 ...
# As above: no `las`, and bars sorted by total damage.
damage_plot <- ggplot(damage_data, aes(x = reorder(Group.1, -x), y = x)) +
    geom_bar(stat = "identity", fill = "#05FF22") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    xlab("Event Type") + ylab("Total damage (unscaled PROPDMG + CROPDMG)") +
    ggtitle("Top Ten Events Resulting in Economic Damage")

damage_plot
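By default, ggplot2 places a character x variable in alphabetical order, so bars are not sorted by size even when the data frame is. A sketch of the usual remedy with base R's reorder(), on a made-up three-row summary (damage_toy is hypothetical):

```r
damage_toy <- data.frame(
  Group.1 = c("HAIL", "TORNADO", "FLASH FLOOD"),
  x       = c(120, 900, 450),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels follow the negated totals,
# i.e. largest total first.
ordered_events <- reorder(damage_toy$Group.1, -damage_toy$x)
print(levels(ordered_events))
```

Mapping aes(x = reorder(Group.1, -x), y = x) in ggplot() then draws the bars from largest to smallest.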

Conclusions

Between 1950 and 2011, tornados were the leading cause of both fatalities and injuries, as well as of property and crop damage.