Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database: Health and Economic Impacts

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The objective of this report is to examine the health and economic impacts of weather events in the US. The intended reader is a government or municipal manager who is responsible for preparing for severe weather events and needs to prioritize resources across different types of events. Because only a few types of events were recorded before 1993, we only look at the data from 1993 onward.

The analysis of the data shows that the top four weather events that cause injuries and fatalities are tornadoes, heat, floods, and wind. Floods and tropical storms (including hurricanes) cause the most economic damage to property, and heat and floods cause the most economic damage to crops.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm data

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. Answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

I downloaded the Storm Data file (using the link above) to my local hard drive.

Reading the dataset into R.
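read.csv() can open a bzip2-compressed file directly, so no separate decompression step is needed. A minimal sketch on a throwaway temporary file; the two toy rows and the object name toy are made up for illustration and are not part of the storm data:

```r
# Write a tiny CSV through a bzip2 connection (hypothetical toy content).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HAIL,0"), con)
close(con)

# read.csv() detects the compression and reads the file as a normal CSV.
toy <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
str(toy)
```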

# Load the data

Storm_data <- read.csv("C:/Users/lisa.mccormick/Documents/Coursera materials/Reproducible Research/repdata_data_StormData.csv.bz2", header = TRUE, sep = ",")

# Convert the begin date to a Date class. as.Date() ignores the trailing time of day,
# which avoids any time zone shift. Then extract and create a year variable.

Storm_data$DATE <- as.Date(as.character(Storm_data$BGN_DATE), format = "%m/%d/%Y")

Storm_data$YEAR <- as.numeric(format(Storm_data$DATE, "%Y"))
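A minimal sketch of the date handling on two made-up strings written in the same layout as BGN_DATE. It parses the date part directly with as.Date(), which is one possible way to do this step; the vector names bgn, dates, and years are assumptions for illustration:

```r
# Toy strings in the "month/day/year hour:minute:second" layout.
bgn <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# as.Date() stops once the format is consumed, so the trailing
# time of day is ignored and no time zone conversion takes place.
dates <- as.Date(bgn, format = "%m/%d/%Y")

# Pull the four-digit year out as a number.
years <- as.numeric(format(dates, "%Y"))
years
```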

Preprocessing the dataset to include variables of interest.

We are interested in the health and economic impacts for different weather events. We need to include the following variables in our dataset:

Health impacts: FATALITIES (number of deaths) and INJURIES (number of injuries).

Economic impacts: PROPDMG (property damage amount), PROPDMGEXP (magnitude code that scales the property damage amount, such as K, M, or B), CROPDMG (crop damage amount), and CROPDMGEXP (magnitude code that scales the crop damage amount).

Weather events: EVTYPE (weather event type).

We also want to include the year in the data table.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Impact <- Storm_data %>% 
  select(YEAR, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# Check for missing values
any(is.na(Impact))
## [1] FALSE
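any(is.na()) only reports whether at least one NA exists anywhere, and it does not flag empty strings. A minimal sketch on a made-up data frame (all values are invented) that counts NA values per column and blank magnitude codes separately:

```r
# Hypothetical toy data with two NA values and one blank code.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", NA),
  FATALITIES = c(3, NA, 0),
  PROPDMGEXP = c("K", "", "M"),
  stringsAsFactors = FALSE
)

# Number of NA values in each column.
na_per_column <- colSums(is.na(toy))
na_per_column

# Empty strings are not NA, so they are counted on their own.
blank_codes <- sum(toy$PROPDMGEXP == "")
blank_codes
```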

Let’s look at the distribution of events over the years.

year_type <- Impact %>% 
  group_by(YEAR) %>% 
  summarize(Total = length(unique(EVTYPE)))

View(year_type)

From 1950 to 1992 only a few types of events were recorded. Between 1993 and 2002 roughly one to two hundred types of events were recorded each year. From 2003 to 2011 about 50 types of events were recorded each year.

We will only use the data from 1993 - 2011 to make recommendations to a government official. The data from before 1993 is not very useful for this purpose and may skew the results since only a few types of events were recorded in those years.
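The per-year count of distinct event types can be illustrated on a toy table. This sketch uses base R's tapply() rather than dplyr, and the rows are invented:

```r
# Hypothetical toy records: one event type in 1990, two in 1995.
toy <- data.frame(
  YEAR   = c(1990, 1990, 1995, 1995, 1995),
  EVTYPE = c("TORNADO", "TORNADO", "HAIL", "FLOOD", "HAIL"),
  stringsAsFactors = FALSE
)

# For each year, count how many different EVTYPE values occur.
types_per_year <- tapply(toy$EVTYPE, toy$YEAR, function(x) length(unique(x)))
types_per_year
```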

Let’s look at how many event categories there are.

Impact_event <- Impact %>% 
  filter(between(YEAR, 1993, 2011))

length(unique(Impact_event$EVTYPE))
## [1] 985

There are too many different event categories (985) so we need to combine some of them to make our downstream analyses more meaningful.

#Let's look at the most common types of events so we can make some categories.

Common <- sort(table(Impact_event$EVTYPE), decreasing = TRUE)
head(Common)
## 
##              HAIL         TSTM WIND THUNDERSTORM WIND       FLASH FLOOD 
##            226829            128977             82563             54277 
##           TORNADO             FLOOD 
##             25888             25326
# Make categories in a new column variable called EVENT.
# The patterns are applied in order, so a later match overwrites an earlier one
# (for example, "THUNDERSTORM WIND" is first labeled RAIN and ends up as WIND).

Impact_event$EVENT <- "OTHER"
Impact_event$EVENT[grepl("TORNADO|FUNNEL", Impact_event$EVTYPE, ignore.case = TRUE)] <- "TORNADO"
Impact_event$EVENT[grepl("THUNDERSTORM|RAIN|PRECIPITATION", Impact_event$EVTYPE, ignore.case = TRUE)] <- "RAIN"
Impact_event$EVENT[grepl("HAIL", Impact_event$EVTYPE, ignore.case = TRUE)] <- "HAIL"
Impact_event$EVENT[grepl("FLOOD|STREAM|SURGE", Impact_event$EVTYPE, ignore.case = TRUE)] <- "FLOOD"
Impact_event$EVENT[grepl("FIRE", Impact_event$EVTYPE, ignore.case = TRUE)] <- "FIRE"
Impact_event$EVENT[grepl("DROUGHT|HEAT|WARM|HOT|DRY", Impact_event$EVTYPE, ignore.case = TRUE)] <- "HEAT"
Impact_event$EVENT[grepl("FOG", Impact_event$EVTYPE, ignore.case = TRUE)] <- "FOG"
Impact_event$EVENT[grepl("LANDSLIDE", Impact_event$EVTYPE, ignore.case = TRUE)] <- "LANDSLIDE"
Impact_event$EVENT[grepl("LIGHTNING", Impact_event$EVTYPE, ignore.case = TRUE)] <- "LIGHTNING"
Impact_event$EVENT[grepl("SNOW|WINTER|WINTRY|ICY|ICE|BLIZZARD|FREEZE|FREEZING|COLD|SLEET|FROST", Impact_event$EVTYPE, ignore.case = TRUE)] <- "SNOW"
Impact_event$EVENT[grepl("WIND", Impact_event$EVTYPE, ignore.case = TRUE)] <- "WIND"
Impact_event$EVENT[grepl("HURRICANE|TROPICAL|RIP|SURF|WATERSPOUT", Impact_event$EVTYPE, ignore.case = TRUE)] <- "TROPICAL STORM"
Impact_event$EVENT[grepl("DUST", Impact_event$EVTYPE, ignore.case = TRUE)] <- "DUST"


#Let's look at the new categories.

length(unique(Impact_event$EVENT))
## [1] 14
sort(table(Impact_event$EVENT), decreasing = TRUE)
## 
##           WIND           HAIL          FLOOD           SNOW        TORNADO 
##         273934         227443          86511          45092          32900 
##      LIGHTNING           RAIN TROPICAL STORM           HEAT           FIRE 
##          15764          12022           6745           5764           4239 
##            FOG          OTHER      LANDSLIDE           DUST 
##           1834           1288            613            589

Now we have 14 event categories.
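Because the patterns are applied one after another, an event type that matches several patterns ends up in the last matching category. A minimal sketch of that behavior with a shortened rule list; the helper categorize() and the object rules are hypothetical names introduced only for this illustration:

```r
# Hypothetical helper: apply the patterns in order, so that a later
# match overwrites an earlier one. Unmatched values stay "OTHER".
categorize <- function(evtype, rules) {
  event <- rep("OTHER", length(evtype))
  for (label in names(rules)) {
    event[grepl(rules[[label]], evtype, ignore.case = TRUE)] <- label
  }
  event
}

# A shortened rule list for illustration; the order matters.
rules <- c(
  RAIN  = "THUNDERSTORM|RAIN|PRECIPITATION",
  HAIL  = "HAIL",
  FLOOD = "FLOOD|STREAM|SURGE",
  WIND  = "WIND"
)

evtype <- c("THUNDERSTORM WIND", "Heavy Rain", "FLASH FLOOD", "VOLCANIC ASH")
categorize(evtype, rules)
```

Here "THUNDERSTORM WIND" first matches the RAIN pattern and is then relabeled by the WIND pattern, which is why thunderstorm wind records are counted under WIND rather than RAIN.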

Results

Public Health Impacts

The first question asks which types of events are most harmful with respect to population health.

Let’s look at the consequences to public health for each event.

# gather() comes from tidyr; the reshape and reshape2 packages are not used here.
library(tidyr)
library(ggplot2)

Public_health <- Impact_event %>% 
  group_by(EVENT) %>% 
  summarize(Total_Injuries = sum(INJURIES), Total_Fatalities = sum(FATALITIES)) %>% 
  gather(Type, Total, Total_Injuries:Total_Fatalities, factor_key=TRUE)


ggplot(data=Public_health, aes(x=EVENT, y=Total, fill=Type)) +
  geom_bar(stat="identity") +
  xlab("Weather Event Type") + ylab("Number of People") +
  ggtitle("Number of Injuries and Fatalities for each Weather Event") +
  theme(axis.text.x = element_text(angle=45, vjust=1, hjust=1)) +
  scale_fill_discrete(labels = c("Injuries", "Fatalities")) +
  labs(fill = "")

The top 4 weather events that cause injuries and fatalities are tornadoes, heat, flood, and wind.
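One way to back up a ranking read off a stacked bar chart is to sum injuries and fatalities per category and sort the result. A sketch in base R with invented numbers, not the report's totals; the column name CASUALTIES is an assumption introduced here:

```r
# Hypothetical toy records, several per event category.
toy <- data.frame(
  EVENT      = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  INJURIES   = c(100, 20, 50, 10, 5),
  FATALITIES = c(5, 30, 2, 8, 1),
  stringsAsFactors = FALSE
)

# Sum both measures within each category.
totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVENT, data = toy, FUN = sum)

# Combine them and sort from most to least harmful.
totals$CASUALTIES <- totals$INJURIES + totals$FATALITIES
ranked <- totals[order(totals$CASUALTIES, decreasing = TRUE), ]
ranked
```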

Economic Impacts

The second question asks which types of events have the greatest economic consequences.

First we need to compute the economic cost for each event. PROPDMG and CROPDMG hold dollar amounts, and PROPDMGEXP and CROPDMGEXP hold a magnitude code that scales those amounts. Let’s look at the values of those variables.

unique(Impact_event$PROPDMGEXP)
##  [1] ""  "B" "K" "M" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(Impact_event$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

The “B”, “M/m”, and “K/k” values multiply the amount by one billion, one million, and one thousand respectively. Any other value is given a multiplier of 0 in this analysis, so those records contribute no cost.

Let’s use those modifiers to calculate a new cost in dollars.

options(scipen= 999)

#Create new cost column for property damage.

Impact_event$PROPMULT <- as.character(Impact_event$PROPDMGEXP)
Impact_event$PROPMULT[is.na(Impact_event$PROPMULT)] <- 0
Impact_event$PROPMULT[!grepl("B|M|K", Impact_event$PROPMULT, ignore.case = TRUE)] <- 0
Impact_event$PROPMULT[grep("B", Impact_event$PROPMULT, ignore.case = TRUE)] <- "1000000000"
Impact_event$PROPMULT[grep("M", Impact_event$PROPMULT, ignore.case = TRUE)] <- "1000000"
Impact_event$PROPMULT[grep("K", Impact_event$PROPMULT, ignore.case = TRUE)] <- "1000"
Impact_event$PROPMULT <- as.numeric(as.character(Impact_event$PROPMULT))

Impact_event$PROPCOST <- Impact_event$PROPDMG * Impact_event$PROPMULT

#Create new cost column for crop damage.

Impact_event$CROPMULT <- as.character(Impact_event$CROPDMGEXP)
Impact_event$CROPMULT[is.na(Impact_event$CROPMULT)] <- 0
Impact_event$CROPMULT[!grepl("B|M|K", Impact_event$CROPMULT, ignore.case = TRUE)] <- 0
Impact_event$CROPMULT[grep("B", Impact_event$CROPMULT, ignore.case = TRUE)] <- "1000000000"
Impact_event$CROPMULT[grep("M", Impact_event$CROPMULT, ignore.case = TRUE)] <- "1000000"
Impact_event$CROPMULT[grep("K", Impact_event$CROPMULT, ignore.case = TRUE)] <- "1000"
Impact_event$CROPMULT <- as.numeric(as.character(Impact_event$CROPMULT))

Impact_event$CROPCOST <- Impact_event$CROPDMG * Impact_event$CROPMULT
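The same multiplier logic can also be written as a named lookup vector. This is a sketch under the same assumption as the analysis (codes other than B, M, and K count as 0); the names multiplier, exp_code, damage, and cost are hypothetical and the values are invented:

```r
# Hypothetical lookup table for the magnitude codes.
multiplier <- c(B = 1e9, M = 1e6, K = 1e3)

# Toy magnitude codes and damage amounts.
exp_code <- c("K", "m", "B", "", "5", "h")
damage   <- c(25, 2.5, 1, 10, 3, 4)

# Look up each code case-insensitively; unknown codes give NA,
# which is then replaced by a multiplier of 0.
mult <- unname(multiplier[toupper(exp_code)])
mult[is.na(mult)] <- 0

cost <- damage * mult
cost
```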

Let’s look at the economic consequences to property and crops for each event.


Economic <- Impact_event %>% 
  group_by(EVENT) %>% 
  summarize(Total_PropertyDamage = sum(PROPCOST), Total_CropDamage = sum(CROPCOST)) %>% 
  gather(Type, TotalCost, Total_PropertyDamage:Total_CropDamage, factor_key=TRUE)


ggplot(data=Economic, aes(x=EVENT, y=(TotalCost/10^9), fill=Type)) +
  geom_bar(stat="identity") +
  xlab("Weather Event Type") + ylab("Total Cost (billions of dollars)") +
  ggtitle("Total Economic Cost for each Weather Event") +
  theme(axis.text.x = element_text(angle=45, vjust=1, hjust=1)) +
  scale_fill_discrete(labels = c("Property", "Crops")) +
  labs(fill = "")

Floods and tropical storms (including hurricanes) cause the most economic damage to property. Heat and floods cause the most economic damage to crops.