U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database analysis

Effects of Climate Events on Population Health and Economy

Marco Marchetti
Date: 16-June-2017
Reproducible Research: Peer Assessment 2

======================================================

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

1 SYNOPSIS

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The analysis address 2 main questions:
1. Which types of events are most harmful to population health?
2. Which types of events have the greatest economic consequences?

The analysis consider the absolute and the mean effects of each type of event over the years (1950-2011). Tornado is the type of event that has most affected human health, while Flood cause the main economic consequences. Both Tornado and Flood have high effects considering absolute values but not so high considering mean values. The 2nd most frequent event is Hail (after thunderstorm wind that is the first most frequent) but it has impacts relatively low proportionally to others events).

2. DATA PROCESSING

The analysis was performed on Storm Events Database. The source data set come in the form of a comma-separated-value file compressed via the bzip2 algorithm. The documentation of the database is also available on NOAA website.

2.1 Loading Data

Loading required packages

library(R.utils) # for bunzip2
library(ggplot2)
library(gridExtra)
library(plyr)

Download, extract and read data.

if (!file.exists("./StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
                "./StormData.csv.bz2")
}

if (!file.exists("./StormData.csv")) {
  bunzip2("./StormData.csv.bz2", "./StormData.csv", remove = FALSE)
}

# Read csv
stormData <- read.csv("./StormData.csv")

2.2 Data set reduction.

Considering a subset of the source data set because not all data set columns are useful for the analysis.
The reduced data set obtained has 3 main information categories:
- General Info: Event reference number(REFNUM), Begin date (BGN_DATE), Event Type (EVTYPE)
- Most harmful to population health Info: Num. of FATALITIES, Num. of INJURIES
- Greatest economic consequences Info: Property Damage (PROPDMG/PROPDMGEXP), Crop Damage (CROPDMG/CROPDMGEXP)
(Values in $ - K=thousands, M=millions, B=billions)

#Create the reducedStormData
subCol <- c("REFNUM","BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
reducedStormData <- stormData[, subCol]

2.3 Data set Exploration and Processing

2.3.1 Main data exploration and Date Conversion

There are 902297 Events in reduced Storm Data and 0 NA.
The events in the database start in the year 1950 and ends in 2011.
In the earlier years of the database there are generally fewer events recorded but in this analysis we consider all Events availables.

dim(reducedStormData)
## [1] 902297      9
sum (is.na (reducedStormData))
## [1] 0
# Add a new column YEAR and converting BGN_DATE in numeric
reducedStormData$YEAR <- as.numeric(format(as.Date(reducedStormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

hist(reducedStormData$YEAR , main = "Fig. 1. Num. of Events per Year", 
     breaks = 16, 
     col="steelblue", 
     xlab="Year",
     ylab="Tot. Count")

Fig.1. In the earlier years there are fewer events and most events are available after the 1990.

2.3.2 Property and Crop Damage Conversion

Converting property damage (PROPDMG) and crop damage (CROPDMG) into comparable numerical forms (PROPDAMAGE,CROPDAMAGE).
calculateExp is the function that converts symbol to a power of 10 (using PROPDMGEXP and CROPDMGEXP)

# some EXP level are wrong (- ? +) and some are number
levels(reducedStormData$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(reducedStormData$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
# Power of 10 function (value, exponent) 
calculateExp <- function(x, exp = "") {
  switch(as.character(exp), `-` = x, `?` = x, `+` = x, `1` = x, `2` = x*(10^2), `3` = x*(10^3), 
         `4` = x*(10^4), `5` = x*(10^5), `6` = x*(10^6), `7` = x*(10^7), `8` = x*(10^8), 
         H = x*100, K = x*1000, M = x*1e+06, B = x*1e+09,
         h = x*100, k = x*1000, m = x*1e+06, b = x*1e+09, x)
}

# Create PROPDAMAGE CROPDAMAGE
reducedStormData$PROPDAMAGE <- mapply(calculateExp, reducedStormData$PROPDMG, reducedStormData$PROPDMGEXP)
reducedStormData$CROPDAMAGE <- mapply(calculateExp, reducedStormData$CROPDMG, reducedStormData$CROPDMGEXP)
2.3.3 Event Types Agregation

There are 985 Event Types in the Storm database.
Many Event Types have “overlapping labels” and can be reduced to similar instances.
The aggregation consider only the most frequent Event types and classify them into the NOAA Data table categories.
New Event Types categories (based on NOAA Data table) are stored in NEWEVTYPE column.

# There are 985 Event types
length(levels(factor(reducedStormData$EVTYPE)))
## [1] 985
# Some Event Type text modification
reducedStormData$NEWEVTYPE <- toupper(reducedStormData$EVTYPE)
reducedStormData$NEWEVTYPE <- gsub('TSTM', 'THUNDERSTORM',reducedStormData$NEWEVTYPE)
reducedStormData$NEWEVTYPE <- gsub('WINDS', 'WIND',reducedStormData$NEWEVTYPE)
reducedStormData$NEWEVTYPE <- gsub('FLD|FLLOODIN|FLDG|FLOODING', 'FLOOD',reducedStormData$NEWEVTYPE)
reducedStormData$NEWEVTYPE <- gsub('CURRENTS', 'CURRENT',reducedStormData$NEWEVTYPE)
reducedStormData$NEWEVTYPE <- gsub('WEATHER/MIX', 'WEATHER',reducedStormData$NEWEVTYPE)

# Aggregation based on Event Types frequency and NOAA Data table
reducedStormData[grepl("URBAN FLOOD|RIVER FLOOD|URBAN/SML STREAM|URBAN/SMALL STREAM",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "FLOOD"
reducedStormData[grepl("FLOOD/FLASH FLOOD",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "FLASH FLOOD"
reducedStormData[grepl("RECORD HEAT|EXTREME HEAT|HEAT WAVE|RECORD WARMTH|UNSEASONABLY WARM",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "EXCESSIVE HEAT"
reducedStormData[grepl("EXTREME COLD|COLD/WIND CHILL|WIND CHILL|EXTREME WINDCHILL",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "EXTREME COLD/WIND CHILL"
reducedStormData[grepl("HURRICANE/TYPHOON|HURRICANE",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "HURRICANE (TYPHOON)"
reducedStormData[grepl("ASTRONOMICAL HIGH TIDE|STORM SURGE",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "STORM SURGE/TIDE"
reducedStormData[grepl("WILD/FOREST FIRE",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "WILDFIRE"
reducedStormData[grepl("HEAVY SURF/HIGH SURF",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "HIGH SURF"
reducedStormData[grepl("TORNADOES",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "TORNADO"
reducedStormData[grepl("TROPICAL STORM",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "TROPICAL STORM"
reducedStormData[grepl("RECORD/EXCESSIVE HEAT",  reducedStormData$EVTYPE, ignore.case = TRUE), "NEWEVTYPE"] <- "EXCESSIVE HEAT"

reducedStormData$NEWEVTYPE <- factor(reducedStormData$NEWEVTYPE)

After the Events aggregation there are 11 Top Events Types that are over 10.000 Events and candidates as most harmful for the population and with greatest economic consequences.

summary(reducedStormData$NEWEVTYPE, maxsum = 15)
##        THUNDERSTORM WIND                     HAIL                  TORNADO 
##                   323355                   288661                    60655 
##              FLASH FLOOD                    FLOOD                HIGH WIND 
##                    55586                    29455                    21747 
##                LIGHTNING               HEAVY SNOW MARINE THUNDERSTORM WIND 
##                    15754                    15708                    11987 
##               HEAVY RAIN             WINTER STORM           WINTER WEATHER 
##                    11742                    11433                     8149 
##             FUNNEL CLOUD                 WILDFIRE                  (Other) 
##                     6844                     4219                    37002
topEvents <- count(reducedStormData$NEWEVTYPE)
colnames(topEvents)[1] <- "NEWEVTYPE"
colnames(topEvents)[2] <- "FREQ"
topEvents <- topEvents[order(topEvents$FREQ, decreasing = TRUE), ]
topEvents$REORDER <- reorder(topEvents$NEWEVTYPE, topEvents$FREQ) 

ggplot(topEvents, aes(NEWEVTYPE, FREQ/1000)) + 
  geom_bar(aes(x=REORDER), data=topEvents[1:11,], stat = "identity", fill="steelblue") +
  labs(title="Fig. 2. Top 11 Event Types frequency") +
  xlab("Event Type") + ylab("Tot. Count (K)") +
  (theme(axis.text.x = element_text(angle=50,hjust=1),
         plot.title = element_text(hjust = 0.5)))

Fig.2: The two most frequent Events are Thunderstorm wind and Hail and together are more than the double of the others events sum

3. Results

3.1 Across the United States, which types of events are most harmful with respect to population health?

  • Tornado is the most harmful event with respect to population health but only in absolute values and it is at 3rd position in the Top 11 frequents events.
  • Thunderstom Wind is the most frequent event but is not so hamful with respect to population health compared to Tornado.
  • Heat and Excessive Heat are very dangerous both in absolute and mean values. Considering the sum of their mean values.
  • Rip Current and Avalanche are dangerous considering fatalities (especially in mean values) but are not included in the Top 11 most frequent events.
3.1.1 Top 10 fatalities per Event Type
# fatalities per Event Type
subFatalities <- subset(aggregate(FATALITIES ~ NEWEVTYPE, reducedStormData, sum), FATALITIES > 0)
subFatalities <- subFatalities[order(subFatalities$FATALITIES, decreasing = TRUE), ]
subFatalities$REORDER <- reorder(subFatalities$NEWEVTYPE, subFatalities$FATALITIES) 

plotAbs <- ggplot(subFatalities, aes(NEWEVTYPE, FATALITIES)) +
  geom_bar(aes(x=REORDER), data=subFatalities[1:11,], stat="identity", fill="#F8766D") +
  labs(title="Fig.3. Top Fatalities") +
  xlab("Event Type") + ylab("Tot. Count") +
  (theme(axis.text.x = element_text(angle=50,hjust=1),
         plot.title = element_text(hjust = 0.5)))
subFatalities$REORDER <- NULL

# Mean Values
subFatalities <- merge(topEvents, subFatalities, by="NEWEVTYPE", all=TRUE)
subFatalities$MEANFATALITIES <- subFatalities$FATALITIES/subFatalities$FREQ
subFatalities$REORDER <- reorder(subFatalities$NEWEVTYPE, subFatalities$MEANFATALITIES) 
subFatalities <- subFatalities[order(subFatalities$FATALITIES, decreasing = TRUE),]

plotMean <- ggplot(subFatalities, aes(NEWEVTYPE, MEANFATALITIES)) +
  geom_bar(aes(x=REORDER), data=subFatalities[1:11,],stat = "identity", fill="#F8766D") +
  labs(title="Fig.4. Top Fatalities (Mean)") +
  xlab("Event Type") + ylab("Mean") +
  (theme(axis.text.x = element_text(angle=50,hjust=1),
         plot.title = element_text(hjust = 0.5)))
subFatalities$REORDER <- NULL

grid.arrange(plotAbs, plotMean, nrow=1, ncol=2)

Fig.3: Tornado is the most harmful event with respect to fatalities
Fig.4: Heat and Excessive Heat together are very high respect to others mean values.
Rip Current and Avalanche are not included in the Top 11 most frequent events but are dangerous considering fatalities

3.1.2 Top 10 Injuries per Event Type
# injuries per Event Type
subInjuries <- subset(aggregate(INJURIES ~ NEWEVTYPE, reducedStormData, sum), INJURIES > 0)
subInjuries <- subInjuries[order(subInjuries$INJURIES, decreasing = TRUE), ]
subInjuries$REORDER <- reorder(subInjuries$NEWEVTYPE, subInjuries$INJURIES) 

plotAbs <- ggplot(subInjuries, aes(NEWEVTYPE, INJURIES/1000)) +
  geom_bar(aes(x=REORDER), data=subInjuries[1:10,], stat = "identity", fill="#00BFC4") +
  labs(title="Fig. 5. Top Injuries") +
  xlab("Event Type") + ylab("Tot. Count (K)") +
  (theme(axis.text.x = element_text(angle=50,hjust=1), 
         plot.title = element_text(hjust = 0.5)))
subInjuries$REORDER <- NULL

# Mean Values
subInjuries <- merge(topEvents, subInjuries, by="NEWEVTYPE", all=TRUE)
subInjuries$MEANINJURIES <- subInjuries$INJURIES/subInjuries$FREQ
subInjuries$REORDER <- reorder(subInjuries$NEWEVTYPE, subInjuries$MEANINJURIES) 
subInjuries <- subInjuries[order(subInjuries$INJURIES, decreasing = TRUE),]

plotMean <- ggplot(subInjuries, aes(NEWEVTYPE, MEANINJURIES)) +
  geom_bar(aes(x=REORDER), data=subInjuries[1:10,], stat = "identity", fill="#00BFC4") +
  labs(title="Fig. 6. Top Injuries (Mean)") +
  xlab("Event Type") + ylab("Mean") +
  (theme(axis.text.x = element_text(angle=50,hjust=1),
         plot.title = element_text(hjust = 0.5)))
subInjuries$REORDER <- NULL

grid.arrange(plotAbs, plotMean, nrow=1, ncol=2)

Fig.5: Tornado is a sort of outlier on injuries respect to others events
Fig.6: Also in mean values Heat and Excessive Heat together are very high respect to others events

3.1.3 Top events harmful for population

Union of absolute values between the fatalities and injuries per Event type.

# Top events by harmful for population 
subTotalFatalities <- subFatalities[,c("NEWEVTYPE","FATALITIES")]
subTotalInjuries <- subInjuries[,c("NEWEVTYPE","INJURIES")]
colnames(subTotalFatalities)[2] <- "VALUE"
colnames(subTotalInjuries)[2] <- "VALUE"
subTotalFatalities$NAME <- "FATALITIES"
subTotalInjuries$NAME <- "INJURIES"

subTotalHarmful <- rbind(subTotalFatalities, subTotalInjuries)
subTotalHarmful <- subTotalHarmful[order(subTotalHarmful$VALUE, decreasing = TRUE), ]
selectUnion <- union(subTotalFatalities[1:10, 1], subTotalInjuries[1:10, 1])
subTopHarmful<- subTotalHarmful[subTotalHarmful$NEWEVTYPE %in% selectUnion,]
subTopHarmful$REORDER <- reorder(subTopHarmful$NEWEVTYPE, subTopHarmful$VALUE) 

ggplot(subTopHarmful, aes(NEWEVTYPE, VALUE/1000, fill = NAME)) + 
  geom_bar(aes(x=REORDER), data=subTopHarmful, stat = "identity") +
  labs(title="Fig. 7. Top events harmful by population health") +
  xlab("Event Type") + ylab("Tot. Count (K)") +
  (theme(axis.text.x = element_text(angle=60,hjust=1),
         plot.title = element_text(hjust = 0.5)))

Fig.7: In absolute values Tornado is the most harmful event and it’s value is greater than all other events together. Some of the Top 11 frequent events do not appear as most harmful with respect to population health: Heavy Rain, Marine Thunderstorm wind, Heavy Snow and Hail.

3.2 Across the United States, which types of events have the greatest economic consequences?

  • Flood is the event with the greatest economic consequences (in absolute values but not in mean values) and it is at 5th position in the Top 11 frequent events.
  • Hurricane is also an event with high economic consequences (especially considering the mean values) but is out of the Top 11 most frequent events.
  • Drought has great economic consequences on Crop both in absolute and mean values.
3.2.1 Top 10 Property Damage per Event Type
# Property Damage per Event Type
subPropertyDamage <- subset(aggregate(PROPDAMAGE ~ NEWEVTYPE, reducedStormData, sum), PROPDAMAGE > 0)
subPropertyDamage <- subPropertyDamage[order(subPropertyDamage$PROPDAMAGE, decreasing = TRUE), ]
subPropertyDamage$REORDER <- reorder(subPropertyDamage$NEWEVTYPE, subPropertyDamage$PROPDAMAGE) 

plotAbs <- ggplot(subPropertyDamage, aes(NEWEVTYPE, PROPDAMAGE/10^9)) +
  geom_bar(aes(x=REORDER), data=subPropertyDamage[1:10,], stat = "identity", fill="#00BFC4") +
  labs(title="Fig. 8. Top Property Damages") +
  xlab("Event Type") + ylab("Damage ($B)") +
  (theme(axis.text.x = element_text(angle=50,hjust=1),
         plot.title = element_text(hjust = 0.5)))
subPropertyDamage$REORDER <- NULL

# Mean Values
subPropertyDamage <- merge(topEvents, subPropertyDamage, by="NEWEVTYPE", all=TRUE)
subPropertyDamage$MEANPROPDAMAGE <- subPropertyDamage$PROPDAMAGE/subPropertyDamage$FREQ
subPropertyDamage$REORDER <- reorder(subPropertyDamage$NEWEVTYPE, subPropertyDamage$MEANPROPDAMAGE) 
subPropertyDamage <- subPropertyDamage[order(subPropertyDamage$PROPDAMAGE, decreasing = TRUE),]

plotMean <- ggplot(subPropertyDamage, aes(NEWEVTYPE, MEANPROPDAMAGE/10^6)) +
  geom_bar(aes(x=REORDER), data=subPropertyDamage[1:10,], stat = "identity", fill="#00BFC4") +
  labs(title="Fig. 9. Top Property Damages (Mean)") +
  xlab("Event Type") + ylab("Mean ($M)") +
  (theme(axis.text.x = element_text(angle=50,hjust=1),
         plot.title = element_text(hjust = 0.5)))
subPropertyDamage$REORDER <- NULL

grid.arrange(plotAbs, plotMean, nrow=1, ncol=2)

Fig.8: Flood is the event with the greatest economic consequences on properties in absolute values
Fig.9: Hurricane is the event with high economic consequences on properties in mean values and Flood is low considering its impact on absolute values

3.2.2 Top 10 Crop Damage per Event Type
# Crop Damage per Event Type
subCropDamage <- subset(aggregate(CROPDAMAGE ~ NEWEVTYPE,reducedStormData, sum), CROPDAMAGE > 0)
subCropDamage <- subCropDamage[order(subCropDamage$CROPDAMAGE, decreasing = TRUE), ]
subCropDamage$REORDER <- reorder(subCropDamage$NEWEVTYPE, subCropDamage$CROPDAMAGE) 

plotAbs <- ggplot(subCropDamage, aes(NEWEVTYPE, CROPDAMAGE/10^9)) +
  geom_bar(aes(x=REORDER), data=subCropDamage[1:10,], stat = "identity", fill="#F8766D") +
  labs(title="Fig. 10. Top Crop Damages") +
  xlab("Event Type") + ylab("Damage ($B)") +
  (theme(axis.text.x = element_text(angle=50,hjust=1),
         plot.title = element_text(hjust = 0.5)))
subCropDamage$REORDER <- NULL

# Mean Values
subCropDamage <- merge(topEvents, subCropDamage, by="NEWEVTYPE", all=TRUE)
subCropDamage$MEANCROPDAMAGE <- subCropDamage$CROPDAMAGE/subCropDamage$FREQ
subCropDamage$REORDER <- reorder(subCropDamage$NEWEVTYPE, subCropDamage$MEANCROPDAMAGE) 
subCropDamage <- subCropDamage[order(subCropDamage$CROPDAMAGE, decreasing = TRUE),]

plotMean <- ggplot(subCropDamage, aes(NEWEVTYPE, MEANCROPDAMAGE/10^6)) +
  geom_bar(aes(x=REORDER), data=subCropDamage[1:10,], stat = "identity", fill="#F8766D") +
  labs(title="Fig. 11. Top Crop Damages (Mean)") +
  xlab("Event Type") + ylab("Mean Count ($M)") +
  (theme(axis.text.x = element_text(angle=50,hjust=1),
         plot.title = element_text(hjust = 0.5)))
subCropDamage$REORDER <- NULL

grid.arrange(plotAbs, plotMean, nrow=1, ncol=2)

Fig.10: Drought and Flood are the highest values and with the major consequences on Crops.
Fig.11: Hurricane is the highest value considering mean values but only the 3rd event in absolute values

3.2.3 Top events with greatest economic consequences

Union of absolute values between the Property and Crop Damages per Event type.

# Top events by type of Damage
subTotalCropDamage <- subCropDamage[,c("NEWEVTYPE","CROPDAMAGE")]
subTotalPropertyDamage <- subPropertyDamage[,c("NEWEVTYPE","PROPDAMAGE")]
colnames(subTotalCropDamage)[2] <- "VALUE"
colnames(subTotalPropertyDamage)[2] <- "VALUE"
subTotalCropDamage$NAME <- "CROP"
subTotalPropertyDamage$NAME <- "PROPERTY"

subTotalDamage <- rbind(subTotalCropDamage, subTotalPropertyDamage)
subTotalDamage <- subTotalDamage[order(subTotalDamage$VALUE, decreasing = TRUE), ]
selectUnion <- union(subTotalCropDamage[1:10, 1], subTotalPropertyDamage[1:10, 1])
subTopDamage<- subTotalDamage[subTotalDamage$NEWEVTYPE %in% selectUnion,]
subTopDamage$REORDER <- reorder(subTopDamage$NEWEVTYPE, subTopDamage$VALUE) 

ggplot(subTopDamage, aes(NEWEVTYPE, VALUE/10^9, fill = NAME)) + 
  geom_bar(aes(x=REORDER), data=subTopDamage, stat = "identity") +
  labs(title="Fig. 12. Top events with greatest economic consequences") +
  xlab("Event Type") + ylab("Damage ($B)") +
  (theme(axis.text.x = element_text(angle=50,hjust=1),
         plot.title = element_text(hjust = 0.5)))

Fig.12: Flood is the event with the greatest economic consequences on properties and Drought on Crops. Some of the Top 11 frequent events do not appear as greatest economic consequences events: Marine Thunderstorm wind, Heavy Snow, Lightning and High Wind.