Assessment of the economic impact of 60 years of severe weather events in the United States

Introduction

Storms and other severe weather events can cause both public health and economic problems for the communities and municipalities affected by them. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

National Weather Service Storm Database

The National Climatic Data Center Storm Events database records information on severe weather events in the United States from 1950 onward. This copy of the database ends at the end of November 2011. Fewer events are recorded in the earlier years of the database, most likely because of a lack of good records. More recent years should be considered more complete, as entered by NOAA’s National Weather Service (NWS). According to the NWS’s website, changes in data collection and processing procedures over time mean that the period of record available depends on the event type. The following timeline shows the span of each period of distinct data collection and processing procedures:

From 1950 through 1954, only tornado events were recorded. From 1955 through 1995, only tornado, thunderstorm wind and hail events were keyed from the paper publications into digital data. Since 1996, the number of event types has increased to 48 (as per NWS directive 10-1605).

These changes bias the data and may give the mistaken impression that such events suddenly increased at these cutoff points in time. It is also unclear from the documentation whether the economic costs account for the “time value of money” (that is, whether they are expressed in constant dollars). This is important because there is a significant difference between 1950 US$ and 2011 US$.
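One way to limit the coverage bias described above is to restrict comparisons across event types to the period from 1996 onward, when all 48 event types were recorded. The following is a minimal sketch of that idea and is not part of the analysis reported here; the toy data frame and the names toy_events and recent_events are illustrative assumptions.

```r
# Toy records standing in for the storm data (illustrative values only).
toy_events <- data.frame(
  year   = c(1950, 1975, 1996, 2005),
  EVTYPE = c("TORNADO", "HAIL", "WINTER STORM", "HURRICANE/TYPHOON"),
  stringsAsFactors = FALSE
)

# Keep only events from the period in which all event types were recorded.
recent_events <- subset(toy_events, year >= 1996)
print(recent_events)
```

Filtering this way trades a shorter time span for event types that are recorded on a comparable basis.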

The data for this assignment come in the form of a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. The file can be downloaded from the following source: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

A bzip2-compressed file can be read directly with the ‘read.csv()’ function, without the need for a separate decompression step.
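The point about reading bzip2 files directly can be checked on a small scale. The sketch below writes a tiny CSV through a bzip2 connection to a temporary file and reads it back with read.csv(); the toy data frame and the names bz_path and toy_back are illustrative assumptions, not part of the original analysis.

```r
# Write a tiny CSV through a bzip2 connection to a temporary file.
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(0, 1))
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and reads the file directly.
toy_back <- read.csv(bz_path, stringsAsFactors = FALSE)
print(toy_back)

# Remove the temporary file.
unlink(bz_path)
```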

Further documentation is available at: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

Aims

The aims of this study are to address the following questions:
1- Which types of events are most harmful with respect to population health?
2- Which types of events have the greatest economic consequences?

Methods

Loading the packages needed for data analysis

The following R packages are used for data analysis:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:dplyr':
## 
##     intersect, setdiff, union
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
library(stringr)

Downloading the dataset

This step downloads the National Weather Service Storm Database from the link provided. Although the file is compressed with the bzip2 algorithm to reduce its size, read.csv() can read it directly, so no separate decompression step is needed. The compressed CSV file, repdata_data_StormData.csv.bz2, is read into a data frame called stormData. The following R code performs this task:

if(!file.exists("~/data1")){dir.create("~/data1")}
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,destfile="~/data1/repdata_data_StormData.csv.bz2")
setwd("~/data1")

stormData <- read.csv('repdata_data_StormData.csv.bz2', header = TRUE, sep = ",")
head(stormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

Preprocessing the data elements and data visualization

stormData$annual <- as.numeric (format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"),"%Y"))
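The line above extracts the year from BGN_DATE by parsing the date string and formatting it as a four-digit year. The sketch below shows the same steps on a few toy strings and how an unparseable entry surfaces as NA; the vector names dates, parsed and years are illustrative assumptions.

```r
# Date strings in the same month/day/year hour:minute:second layout as BGN_DATE,
# plus one deliberately malformed entry.
dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "not a date")

# as.Date() returns NA when a string does not match the given format.
parsed <- as.Date(dates, format = "%m/%d/%Y %H:%M:%S")

# Format each date as a four-digit year and convert it to a number.
years <- as.numeric(format(parsed, "%Y"))
print(years)

# Count the entries that failed to parse.
print(sum(is.na(years)))
```

Checking the NA count after parsing is a cheap way to confirm that every record received a year.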

Preprocessing

The data in the database require significant processing. This includes standardising the EVTYPE values by trimming whitespace and converting them to upper case:

stormData$EVTYPE <- str_trim(stormData$EVTYPE)
stormData$EVTYPE <- toupper(stormData$EVTYPE)
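To illustrate why this standardisation matters, the sketch below cleans a few toy labels that differ only in case and surrounding whitespace. It uses base R's trimws() in place of stringr's str_trim() so that it runs without extra packages; the vectors raw_types and clean_types are illustrative assumptions.

```r
# Raw labels that differ only in case and surrounding whitespace.
raw_types <- c("  Tstm Wind", "TSTM WIND ", "tstm wind", "Hail")

# Trim surrounding whitespace, then convert to upper case.
clean_types <- toupper(trimws(raw_types))

# Number of distinct labels before and after cleaning.
print(length(unique(raw_types)))
print(length(unique(clean_types)))
```

Without this step, each spelling variant would be summed as a separate event type.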

The whole dataset is very large, so it is more efficient to reduce it to the columns needed for the analysis. The code below also sums the fatalities and injuries caused by each type of weather event and lists them, in descending order, under Events_harm2:

Analysis_data<-select(stormData, EVTYPE, FATALITIES,INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(Analysis_data)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0
# Coerce the event type to character and the fatalities and injuries to integers.
# Note: aggregate() below reads these columns from Analysis_data, not from these standalone vectors.
Events<- as.character(Analysis_data$EVTYPE)
FATALITIES<-as.integer(Analysis_data$FATALITIES)
INJURIES<- as.integer(Analysis_data$INJURIES)
Events_harm<- aggregate(FATALITIES + INJURIES ~ EVTYPE, data=Analysis_data, sum, na.rm=TRUE)
names(Events_harm)[2]<-"total"
Events_harm2<- Events_harm[order(-Events_harm$total),]
fatal_EVT <- summarise(group_by(Analysis_data, EVTYPE), fatalities = sum(FATALITIES))
top10fatal <- head(arrange(fatal_EVT, desc(fatalities)), n = 10)
injuries_EVT <- summarise(group_by(Analysis_data, EVTYPE), injuries = sum(INJURIES))
Injuries<- injuries_EVT[order(-injuries_EVT$injuries), ]
Injuriestop10<- head(arrange(injuries_EVT, desc(injuries)), n=10)
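The aggregation above can be followed on a small scale. The sketch below sums fatalities and injuries per event type with base R's aggregate() and then ranks the event types by combined harm; the toy data frame and the name harm are illustrative assumptions, and the values are made up for the example.

```r
# Toy records (made-up values) with the same column names as Analysis_data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(1, 0, 2, 3),
  INJURIES   = c(10, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries separately within each event type.
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined harm per event type.
harm$total <- harm$FATALITIES + harm$INJURIES

# Rank the event types from most to least harmful.
harm <- harm[order(-harm$total), ]
print(harm)
```

Keeping the two sums in separate columns makes it possible to rank by fatalities, injuries, or their total from a single table.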

The last preprocessing step is to calculate the economic impact of each weather event by combining the property damage and crop damage entries. Each damage value is stored together with a multiplier code in PROPDMGEXP or CROPDMGEXP: Hundred (H), Thousand (K), Million (M) and Billion (B). The following code converts these codes to numeric exponents so that the damage values can be compared:

Analysis_data$PROPDMGEXP<- as.character(Analysis_data$PROPDMGEXP)
Analysis_data<- mutate(Analysis_data, PROPDMGEXP = ifelse (PROPDMGEXP == "B",9, 
                  ifelse (PROPDMGEXP %in% c("M","m"),6,
                           ifelse (PROPDMGEXP %in% c("K","k"), 3, 
                                   ifelse (PROPDMGEXP %in% c("H", "h"), 2,
                                        ifelse(PROPDMGEXP %in% c("+","?","-"),0,
                                          ifelse(PROPDMGEXP == "",1,
                                                 PROPDMGEXP)))))))

Analysis_data$PROPDMGEXP<- as.numeric(Analysis_data$PROPDMGEXP)
Analysis_data$PROPDMG1<- Analysis_data$PROPDMG*10^Analysis_data$PROPDMGEXP

Analysis_data$CROPDMGEXP<- as.character(Analysis_data$CROPDMGEXP)
Analysis_data<- mutate(Analysis_data, CROPDMGEXP = ifelse (CROPDMGEXP == "B",9, 
                  ifelse (CROPDMGEXP %in% c("M","m"),6,
                           ifelse (CROPDMGEXP %in% c("K","k"), 3, 
                                   ifelse (CROPDMGEXP %in% c("H", "h"), 2,
                                        ifelse(CROPDMGEXP %in% c("+","?","-", "Inf"),0,
                                          ifelse(CROPDMGEXP == "",1,
                                                 CROPDMGEXP)))))))
Analysis_data$CROPDMGEXP<- as.numeric(Analysis_data$CROPDMGEXP)
Analysis_data$CROPDMG1<- Analysis_data$CROPDMG*10^Analysis_data$CROPDMGEXP
Analysis_data$DMG<- Analysis_data$PROPDMG1 + Analysis_data$CROPDMG1

propDamg <- summarise(group_by(Analysis_data, EVTYPE), PROPDAM = sum(PROPDMG1))
top10PropDamg <- head(arrange(propDamg, desc(PROPDAM)), n = 10)
cropDamg <- summarise(group_by(Analysis_data, EVTYPE), cropDAMG = sum(CROPDMG1))
top10CropDamg<- head(arrange(cropDamg, desc(cropDAMG)), n=10)
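The nested ifelse() calls above amount to a lookup from multiplier code to power of ten. The sketch below expresses the same mapping as a named vector, under two stated assumptions: letter codes are matched case-insensitively, and an empty code maps to exponent 1 and the symbols "+", "?" and "-" to exponent 0, mirroring the code above. The names exp_map, to_exponent and the toy vectors are illustrative and not part of the original analysis.

```r
# Lookup table from letter codes to powers of ten.
exp_map <- c(H = 2, K = 3, M = 6, B = 9)

# Convert a vector of multiplier codes to numeric exponents.
to_exponent <- function(code) {
  code <- toupper(as.character(code))
  # Letter codes: look up the exponent (NA when the code is not in the table).
  out <- unname(exp_map[match(code, names(exp_map))])
  # Single digits are used directly as the exponent.
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- as.numeric(code[is_digit])
  # Assumption mirroring the code above: symbols map to 0, an empty code to 1.
  out[code %in% c("+", "?", "-")] <- 0
  out[code == ""] <- 1
  out
}

# Toy damage values and codes (made-up values).
toy_dmg <- c(25, 2.5, 1, 3, 2, 7)
toy_exp <- c("K", "m", "B", "", "5", "?")

# Damage in US$: value times ten to the power of the exponent.
toy_usd <- toy_dmg * 10^to_exponent(toy_exp)
print(toy_usd)
```

A lookup table keeps the mapping in one place, so the same function can be applied to both PROPDMGEXP and CROPDMGEXP. Any code not covered by the rules comes back as NA, which makes unexpected codes easy to spot.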

Results

The following histogram illustrates the yearly frequency of recorded severe weather events between 1950 and 2011:

stormData$annual <- as.numeric (format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"),"%Y"))
hist(stormData$annual, breaks = 60, main = "Frequency of Extreme Weather Events per Year", xlab = "Year", ylab = "Number of events")

As the histogram shows, following the introduction of NWS directive 10-1605 in 1995 there has been a significant increase in the yearly frequency of recorded extreme weather events. This makes sense, as many more types of weather events (up to 48) are now recorded. There is, however, also an underlying trend of gradually increasing frequency of recorded events.
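One way to check how much of the increase reflects broader recording is to count, for each year, both the number of events and the number of distinct event types. The following is a minimal sketch on toy data, not a result from the storm database; the data frame and the names events_per_year and types_per_year are illustrative assumptions.

```r
# Toy records (made-up values): one row per recorded event.
toy <- data.frame(
  year   = c(1950, 1950, 1990, 1990, 1990, 2000, 2000, 2000, 2000),
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "HAIL", "TSTM WIND",
             "TORNADO", "FLOOD", "DROUGHT", "HAIL"),
  stringsAsFactors = FALSE
)

# Number of recorded events in each year.
events_per_year <- table(toy$year)

# Number of distinct event types recorded in each year.
types_per_year <- tapply(toy$EVTYPE, toy$year, function(x) length(unique(x)))

print(events_per_year)
print(types_per_year)
```

If the two counts rise together at a cutoff year, the jump in frequency is at least partly a recording effect.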

Impact of severe weather events on human health in the United States

The following list shows the top 10 event types with the largest impact on human health (i.e. causing death or injury):

head(Events_harm2, 10)
##                EVTYPE total
## 750           TORNADO 96979
## 108    EXCESSIVE HEAT  8428
## 771         TSTM WIND  7461
## 146             FLOOD  7259
## 410         LIGHTNING  6046
## 235              HEAT  3037
## 130       FLASH FLOOD  2755
## 379         ICE STORM  2064
## 677 THUNDERSTORM WIND  1621
## 880      WINTER STORM  1527

The following two tables list the top 10 event types responsible for fatalities and injuries:

Fatalities

print(top10fatal)
## # A tibble: 10 x 2
##    EVTYPE         fatalities
##    <chr>               <dbl>
##  1 TORNADO              5633
##  2 EXCESSIVE HEAT       1903
##  3 FLASH FLOOD           978
##  4 HEAT                  937
##  5 LIGHTNING             816
##  6 TSTM WIND             504
##  7 FLOOD                 470
##  8 RIP CURRENT           368
##  9 HIGH WIND             248
## 10 AVALANCHE             224

Injuries

head(Injuries, 10)
## # A tibble: 10 x 2
##    EVTYPE            injuries
##    <chr>                <dbl>
##  1 TORNADO              91346
##  2 TSTM WIND             6957
##  3 FLOOD                 6789
##  4 EXCESSIVE HEAT        6525
##  5 LIGHTNING             5230
##  6 HEAT                  2100
##  7 ICE STORM             1975
##  8 FLASH FLOOD           1777
##  9 THUNDERSTORM WIND     1488
## 10 HAIL                  1361

This is the graphical representation of the above findings:

library(cowplot)
## 
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
##   default ggplot2 theme anymore. To recover the previous
##   behavior, execute:
##   theme_set(theme_cowplot())
## ********************************************************
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
## 
##     stamp
fatalitiesPlot <- ggplot(top10fatal, aes(x = reorder(EVTYPE,-fatalities), y = fatalities)) + geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 35,hjust = 1, size = 8)) +
    xlab("Event Type") + ylab("Number of Fatalities") + 
    ggtitle("Top 10 Severe Weather Events\n causing Fatalities in US\n from 1950 to 2011")
InjuriesPlot <- ggplot(Injuriestop10, aes(x = reorder(EVTYPE,-injuries), y = injuries)) + geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 35,hjust = 1, size = 8)) +
    xlab("Event Type") + ylab("Number of Injuries") + 
    ggtitle("Top 10 Severe Weather Events\n causing Injuries from 1950 to 2011")
cowplot::plot_grid(fatalitiesPlot, InjuriesPlot, align = "v")

Figure-2: illustrates the human impact of adverse weather events.

Economic impact of adverse weather events

The economic impact of adverse weather events between 1950 and 2011 has been considerable. The total economic impact of adverse weather events in the United States (as recorded in the National Weather Service Storm Database) was US$ 477,329,065,794, of which US$ 428,224,873,514 was property damage and US$ 49,104,192,280 was damage to crops.

#total economic impact
sum(Analysis_data$DMG)
## [1] 477329065794
#Property damage
sum(Analysis_data$PROPDMG1)
## [1] 428224873514
#Crop damage
sum(Analysis_data$CROPDMG1)
## [1] 49104192280
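The arithmetic behind these totals can be checked directly. The sketch below adds the property and crop totals reported above, formats the result with thousands separators, and computes the share attributable to property damage; the variable names are illustrative assumptions.

```r
# Totals as reported in the output above (US$).
prop_total <- 428224873514
crop_total <- 49104192280

# Total economic impact is the sum of the two components.
grand_total <- prop_total + crop_total

# Print in fixed notation with thousands separators.
print(formatC(grand_total, format = "f", digits = 0, big.mark = ","))

# Share of the total attributable to property damage, in percent.
prop_share <- 100 * prop_total / grand_total
print(round(prop_share, 1))
```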

The following lists show the top 10 causes of property damage and of crop damage, respectively, as recorded by the National Weather Service Storm Database:

print(top10PropDamg)
## # A tibble: 10 x 2
##    EVTYPE                  PROPDAM
##    <chr>                     <dbl>
##  1 FLOOD             144657709870 
##  2 HURRICANE/TYPHOON  69305840000 
##  3 TORNADO            56947380704.
##  4 STORM SURGE        43323536000 
##  5 FLASH FLOOD        16822725842.
##  6 HAIL               15735268026.
##  7 HURRICANE          11868319010 
##  8 TROPICAL STORM      7703890550 
##  9 WINTER STORM        6688497251 
## 10 HIGH WIND           5270046295
print(top10CropDamg)
## # A tibble: 10 x 2
##    EVTYPE               cropDAMG
##    <chr>                   <dbl>
##  1 DROUGHT           13972566000
##  2 FLOOD              5661968450
##  3 RIVER FLOOD        5029459000
##  4 ICE STORM          5022113500
##  5 HAIL               3025954500
##  6 HURRICANE          2741910000
##  7 HURRICANE/TYPHOON  2607872800
##  8 FLASH FLOOD        1421317100
##  9 EXTREME COLD       1312973000
## 10 FROST/FREEZE       1094186000

This is the graphical representation of the above findings:

library(cowplot)

PROPDMGPlot <- ggplot(top10PropDamg, aes(x = reorder(EVTYPE,-PROPDAM), y = PROPDAM)) + geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 35,hjust = 1, size = 8)) +
    xlab("Event Type") + ylab("Property damage (US$)") + 
    ggtitle("Top 10 Severe Weather Events causing property damage in the US")
CROPDMGPlot <- ggplot(top10CropDamg, aes(x = reorder(EVTYPE,-cropDAMG), y = cropDAMG)) + geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 35,hjust = 1, size = 8)) +
    xlab("Event Type") + ylab("Crop damage (US$)") + 
    ggtitle("Top 10 Severe Weather Events causing crop damage in the US")
cowplot::plot_grid(PROPDMGPlot, CROPDMGPlot, align = "v")