Examining Weather Damage - Economic and Human Costs

Extreme weather events such as tornado, thunderstorms, blizzards, etc. have the potential to cause catastrophic destruction. Costs of such destruction is not only counted in economic terms, but in human lives.

The following analysis attempts to identify which types of weather events have caused the most destruction. Data will be analyzed from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database contains records of weather events and related economic and human impact from 1950 to 2011. The database can be found at:

https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Additional information about the database can be found here:

National Weather Service Storm Data Documentation https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

National Climatic Data Center Storm Events FAQ https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

Research Findings

The analysis examined four areas of destruction related to specific weather events:

  • Fatalities
  • Injuries
  • Property damage
  • Crop damage

The findings suggest that Tornados have caused the most destruction among all event categories–both in property/crop damage as well as human lives. Other highly destructive weather events include: Storm Winds, Hail, Heat and Winter Storms.

Data Download

To begin the analysis, import the data from the NOAA site:

filename <- "repdata_data_StormData.csv.bz2"
if (!file.exists(filename)){
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, filename)
} 
if (!file.exists("repdata_data_StormData")) { 
unzip(filename) 
}
data<-read.csv("repdata_data_StormData", stringsAsFactors = FALSE)

Next, load R packages required for processing, analysis and graphics:

library(dplyr)
library(lubridate)
library(tidyr)
library(chron)
library(summarytools)
library(ggplot2)
library(questionr)
library(stats)
library(reshape2)
library(scales)
library(stringr)

Data Processing for Analysis

The analysis will use the following 5 variables in the database:

Examining these variable reveals three that require cleaning/re-coding procedures in preparation for the analysis. The procedures are as follows:

Variable 1 - “PROPDMG” - Property damage expense (US Dollars)

Property damage expense is the dollar amount of damage from a particular event. This dollar amount is listed in two separate fields, where PROPDMG includes an abbreviated dollar amount (either in 000’s or 000,000’s) next to a variable PROPDMGEXP which indicates the related “K” (thousand) or an “M” (million) extension. To use these dollar amounts for the analysis, we create a new field “propexp” that contains the correct amount abbreviated in billions of dollars (due to the very large amounts when aggregated).

## Recode property damage to new field shown in billions of dollars
data$propexp<-as.numeric(0)
data <- within(data, propexp[PROPDMGEXP=="K"] <- (PROPDMG[PROPDMGEXP=="K"]/1000000))
data <- within(data, propexp[PROPDMGEXP=="M"] <- (PROPDMG[PROPDMGEXP=="M"]/1000))
Variable 2 - “CROPDMG” - Crop damage expense (US Dollars)

As with the above property damage recoding, we employ the same procedure to crop damage expense, which is also in two separate fields; the new field “cropexp” is created with the same dollar formating.

## Recode crop damage to new field shown in billions of dollars
data$cropexp<-as.numeric(0)
data <- within(data, cropexp[CROPDMGEXP=="K"] <- (CROPDMG[CROPDMGEXP=="K"]/1000000))
data <- within(data, cropexp[CROPDMGEXP=="M"] <- (PROPDMG[CROPDMGEXP=="M"]/1000))

Variable 3 - “EVTYPE” - Event Type Category Name

The field containing weather event categories is EVTYPE. To clean this field, we first make a duplicate called “event2” in order to keep the original variable available throughout the analysis. We then run a frequency count on event2 examining which events have the most destructive effects. We also identify how many unique category names are listed in the field.

## Create new event field "event2"
data$event2<-data$EVTYPE

## Create preliminary frequency counts representing 90%+ of event counts in 1950-2011
head(questionr::freq(data$event2, cum = TRUE, sort = "dec", total = TRUE), 15)
##                         n    % val% %cum val%cum
## HAIL               288661 32.0 32.0 32.0    32.0
## TSTM WIND          219940 24.4 24.4 56.4    56.4
## THUNDERSTORM WIND   82563  9.2  9.2 65.5    65.5
## TORNADO             60652  6.7  6.7 72.2    72.2
## FLASH FLOOD         54277  6.0  6.0 78.3    78.3
## FLOOD               25326  2.8  2.8 81.1    81.1
## THUNDERSTORM WINDS  20843  2.3  2.3 83.4    83.4
## HIGH WIND           20212  2.2  2.2 85.6    85.6
## LIGHTNING           15754  1.7  1.7 87.4    87.4
## HEAVY SNOW          15708  1.7  1.7 89.1    89.1
## HEAVY RAIN          11723  1.3  1.3 90.4    90.4
## WINTER STORM        11433  1.3  1.3 91.7    91.7
## WINTER WEATHER       7026  0.8  0.8 92.4    92.4
## FUNNEL CLOUD         6839  0.8  0.8 93.2    93.2
## MARINE TSTM WIND     6175  0.7  0.7 93.9    93.9

Number of unique event types:

length(summary(as.factor(data$event2),maxsum=1000))
## [1] 985

From above results, we see that:

  • The top 15 categories represent nearly 94% of all event types (by frequency); we also notice overlapping categories with similar weather types
  • There are 985 unique category names

To re-classify and catogorize all 985 names would be very time consuming as well as unproductive given the goals of the research (identify the most destructive events). Therefore, the following cleaning procedures will seek to identify a select “top-tier”" list that provides an adequate representation.

Through a few iterations of re-classifying, the following cleaing procedure combines similar weather events into a list of eleven final categoreis which represent approximately 98% of total events.

## Create cateogories of events that represent 98% of all events
data$event2<-str_replace_all(data$event2, "DENSE FOG|FOG", "FOG")
data$event2<-str_replace_all(data$event2, "EXCESSIVE HEAT|HEAT", "HEAT")
data$event2<-str_replace_all(data$event2, "STORM WIND/HAIL|HAIL", "HAIL")
data$event2<-str_replace_all(data$event2, "WILD/FOREST FIRE|WILDFIRE", "WILDFIRE")
data$event2<-str_replace_all(data$event2, "FUNNEL CLOUD|TORNADO", "TORNADO")
data$event2<-str_replace_all(data$event2, "FLOODING|FLASH FLOOD|URBAN/SML STREAM FLD|COASTAL FLOOD|HIGH SURF|FLOOD/FLOOD|FLOOD|HEAVY RAIN", "FLOOD")
data$event2<-str_replace_all(data$event2, "SNOW|WINTER WEATHER/WIND CHILL|COLD/WIND CHILL|WINTER WEATHER/MIX|EXTREME COLD|LAKE-EFFECT SNOW|WINTER STORM|EXTREME COLD/WIND CHILL|ICE STORM|HEAVY SNOW|FROST/FREEZE|WINTER WEATHER|BLIZZARD", "WINTER WEATHER")
data$event2<-str_replace_all(data$event2, "STORM WIND/HAIL|MARINE STORM WIND|THUNDERSTORMS|MARINETHUNDERSTORM|TSTM WIND|THUNDERSTORM WIND|THUNDERSTORM WINDS|TSTM WIND/HAIL|STORM WINDS|MARINE TSTM WIND|MARINE THUNDERSTORM WIND|STRONG WIND|HIGH WIND|HIGH WINDS|MARINE THUNDERSTORM", "STORM WIND")
## Create frequency counts to verify 98% of events included in analysis
head(questionr::freq(data$event2, cum = TRUE, sort = "dec", total = TRUE), 11)
##                     n    % val% %cum val%cum
## STORM WIND     338272 37.5 37.5 37.5    37.5
## HAIL           288661 32.0 32.0 69.5    69.5
## FLOOD           96213 10.7 10.7 80.1    80.1
## TORNADO         67491  7.5  7.5 87.6    87.6
## WINTER WEATHER  43755  4.8  4.8 92.5    92.5
## STORM WINDS     22578  2.5  2.5 95.0    95.0
## LIGHTNING       15754  1.7  1.7 96.7    96.7
## WILDFIRE         4218  0.5  0.5 97.2    97.2
## WATERSPOUT       3796  0.4  0.4 97.6    97.6
## DROUGHT          2488  0.3  0.3 97.9    97.9
## HEAT             2445  0.3  0.3 98.2    98.2

Next, we create a new field “eventnew” that only includes these 11 events for the analysis:

## Create new event field with top 11 event categories (98.6% OF EVENTS)
data$eventnew<-NA
data <- within(data, eventnew[event2=="STORM WIND"] <- "Storm Wind")
data <- within(data, eventnew[event2=="HAIL"] <- "Hail")
data <- within(data, eventnew[event2=="FLOOD"] <- "Flood")
data <- within(data, eventnew[event2=="TORNADO"] <- "Tornado")
data <- within(data, eventnew[event2=="WINTER WEATHER"] <- "Winter Storms")
data <- within(data, eventnew[event2=="LIGHTNING"] <- "Ligthning")
data <- within(data, eventnew[event2=="WILDFIRE"] <- "Wildfire")
data <- within(data, eventnew[event2=="WATERSPOUT"] <- "Waterspout")
data <- within(data, eventnew[event2=="DROUGHT"] <- "Drought")
data <- within(data, eventnew[event2=="HEAT"] <- "Heat")
data <- within(data, eventnew[event2=="FOG"] <- "Fog")

After the cleaning procedures are completed, we can then run summary counts on the four analysis variables (Fatalities, Injuries, Property Damage and Crop Damage) to validate the high degree of causal effects from these events.

datafat <- data %>% group_by(eventnew) %>%
  summarise(
    EventCount = n(),
    Fatalities = sum(FATALITIES),
    Injuries = sum(INJURIES),
    Property = sum(propexp),
    Crops = sum(cropexp)
  )
na.omit(arrange(datafat, desc(EventCount)))
## # A tibble: 11 x 6
##    eventnew      EventCount Fatalities Injuries Property  Crops
##    <chr>              <int>      <dbl>    <dbl>    <dbl>  <dbl>
##  1 Storm Wind        338272       1007     9896 12.1     26.7  
##  2 Hail              288661         15     1361 13.9     43.7  
##  3 Flood              96213       1684     9052 38.5     47.9  
##  4 Tornado            67491       5633    91349 51.6      9.63 
##  5 Winter Storms      43755        844     5864  7.39     1.74 
##  6 Ligthning          15754        816     5230  0.929    0.104
##  7 Wildfire            4218         87     1456  5.23     2.30 
##  8 Waterspout          3796          3       29  0.00935  0    
##  9 Drought             2488          0        4  1.05     1.37 
## 10 Heat                2445       2840     8625  0.00955  0.296
## 11 Fog                 1831         80     1076  0.0228   0

Results of Analysis

Once the data is processed, three analyses are conducted to exam which weather events have had the most detrimental effects.

Analysis # 1 - Number of Events

From the bar chart below, we see that Storm Winds and Hail emerge as the most frequent types of weather events. Floods, Tornadoes and Winter Storms also have high frequency relative to others.

# Bar Chart 1 - Most Frequent Events
 # Make data table for plot
tbleve<-table(data$eventnew)
tbleve<-data.frame(tbleve)
tbleve<-arrange(tbleve, desc(tbleve[,2]))

# Make plot 
ggplot(tbleve, aes(x=reorder(Var1, -Freq), y=Freq)) +
  geom_bar(stat="identity", width=.5, fill="tomato3") + 
  labs(title="Number of Events", 
       subtitle="Years 1950 - 2011", 
       caption="source: NOAA",
       y = "Frequency",
       x = "Event")+
  theme(axis.text.x = element_text(angle=65, vjust=0.6))+
  theme(plot.title=element_text(size=15), plot.subtitle = element_text(size = 10), axis.title.x=element_text(size=8), 
        axis.title.y=element_text(size=7), axis.text.x=element_text(size=8), 
        axis.text.y=element_text(size=7))+
  scale_y_continuous(labels = comma, limits = c(0, 390000), breaks = c(100000, 200000, 300000))

Analysis # 2 - Number of Fatalities & Injuries (by Event)

The following chart shows that Tornadoes are the most damaging type of weather event, resulting in the most injuries and fatalities among all categories. Heat, Flooding and Winter Storms are also highly impact.

## Bar chart 2 - Fatilites and Injuries by Event
  # Make data table for plot
tblfat<-data %>% filter(!is.na(eventnew)) %>% group_by(eventnew) %>%
    summarise(Injuries = sum(INJURIES), Fatalities = sum(FATALITIES))
tblfat <- tblfat[order(desc(tblfat$Fatalities)), ]
tblfat$eventnew <- factor(tblfat$eventnew, levels = tblfat$eventnew)

  # Make plot 
tblfat2 <- melt(tblfat,id="eventnew")   
ggplot(tblfat2, aes(x=eventnew, y=value, fill=variable)) + 
  stat_summary(fun.y=sum, geom="bar", position=position_dodge(1)) +
  labs(title="Number of Fatilities & Injuries", 
       subtitle="Years 1950 - 2011", 
       caption="source: NOAA",
       y = "No. of Fatalities/Injuries",
       x = "Event")+
  scale_fill_manual("legend", values = c("Injuries" = "burlywood2", "Fatalities" = "black"))+
  theme(legend.title = element_blank()) +  
  theme(axis.text.x = element_text(angle=65, vjust=0.6))+
  theme(plot.title=element_text(size=15), plot.subtitle = element_text(size = 10), axis.title.x=element_text(size=8), 
        axis.title.y=element_text(size=7), axis.text.x=element_text(size=8), 
        axis.text.y=element_text(size=7))+
  scale_y_continuous(labels = comma, limits = c(0, 100000), breaks = c(20000, 40000, 60000, 80000))

Analysis # 3 - Property and Crop Damage Expense (by Event)

Regarding property and crop damage, Tornadoes, Floods, Hail and Storm Winds have caused the most most economic damage, resulting in the loss of billions of dollars of time.

## Bar chart 3 - Crop and Property Damage by Event
# Make data table for plot
tbldam<-data %>% filter(!is.na(eventnew)) %>% group_by(eventnew) %>%
  summarise(property = sum(propexp), crops = sum(cropexp))
tbldam <- tbldam[order(desc(tbldam$property)), ]
tbldam$eventnew <- factor(tbldam$eventnew, levels = tbldam$eventnew)

# Make plot 
tbldam2 <- melt(tbldam,id="eventnew")   
ggplot(tbldam2, aes(x=eventnew, y=value, fill=variable)) + 
  stat_summary(fun.y=sum, geom="bar", position=position_dodge(1)) +
  labs(title="Property and Crop Damage Expense", 
       subtitle="Years 1950 - 2011 (US $Billions)", 
       caption="source: NOAA",
       y = "US Dollars (000,000,000's)",
       x = "Event")+
  scale_fill_manual("legend", values = c("property" = "lightsalmon4", "crops" = "springgreen4"))+
  theme(legend.title = element_blank()) +  
  theme(axis.text.x = element_text(angle=65, vjust=0.6))+
  theme(plot.title=element_text(size=15), plot.subtitle = element_text(size = 10), axis.title.x=element_text(size=8), 
        axis.title.y=element_text(size=7), axis.text.x=element_text(size=8), 
        axis.text.y=element_text(size=7))+
  scale_y_continuous(labels = dollar, limits = c(0, 55), breaks = c(10, 20, 30, 40, 50))

Summary

Based on the above findings, Tornadoes, Floods, Hail, Heat and Winter Storms have caused the greatest destructive impact in the US from 1950 to 2011. This is especially true for Tornadoes, which can have the greatest impact on human lives.