Extreme Weather Events in the United States

Synopsis

For this assignment we will look at extreme weather events (EVTYPE variable) in the NOAA Storm Database. The data are available for download here, while a description of the data can be found here.

We will focus on events that caused the most health damage (injuries and fatalities) as well as the most economic consequences (crop damage).

Data Processing

knitr::opts_chunk$set(echo=TRUE, cache=TRUE)
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.26.0 (2024-01-24 05:12:50 UTC) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## The following object is masked from 'package:R.methodsS3':
## 
##     throw
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## The following objects are masked from 'package:base':
## 
##     attach, detach, load, save
## R.utils v2.12.3 (2023-11-18 01:00:02 UTC) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
## 
##     timestamp
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, isOpen, nullfile, parse, use, warnings
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
library(ggplot2)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(! file.exists("download.bz2")){
    download.file(url, "download.bz2")
}
if(! file.exists("download")){
    bunzip2("download.bz2")
} 
data <- fread("download")

Events harmful to population health

names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

“FATALITIES” and “INJURIES” are the column names related to population health

# calculate fatalities and injuries by event type
fatalities_inj_evt <- data[, .(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), total_adverse_event = sum(FATALITIES + INJURIES)), by = .(EVTYPE)]
# order by fatalities in decreasing order
fatalities_inj_evt <- fatalities_inj_evt[order(-FATALITIES),]
#take top ten to display
fatalities_inj_evt <- fatalities_inj_evt[1:10,]
#melt data.table in order to display counts by event type
fatalities_inj_evt_melt <- melt(fatalities_inj_evt, id.vars="EVTYPE", variable.name="adverseEvent")
healthPlot <- ggplot(fatalities_inj_evt_melt, aes(x=reorder(EVTYPE, -value), y=value)) + geom_col(aes(fill=adverseEvent), position="dodge") + labs(title = "Top 10 Adverse Weather Events in USA", x="Event Type", y="Frequency") + theme(axis.text.x = element_text(angle=90, hjust=1))

Events that cause most economic damage

In paragraph 2.7 “Damage”, of the “National Weather Service Instruction” document, linked to above, we see that amounts are “rounded to 3 significant digits, followed by an alphabetical character signifying the magnitude of the number, e.g. 1.55B for $1,550,000,000”. Where B is billion, M million, and K thousand.

The amounts appear to be in the “CROPDMG” and “PROPDMG” columns, while the letters appear to be in the “CROPDMGEXP” and “PROPDMGEXP” columns. However, data in the “EXP” columns is not uniform, as shown by:

unique(data$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
unique(data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"

There are lower case as well as upper case letters as well as numbers (the latter are not mentioned in the documentation). We will use the following translation of numbers and letters to powers of ten (as the EXP part of the column name suggests):

propDmgTranslation <-  c("\"\"" = 10^0,
                 "-" = 10^0, 
                 "+" = 10^0,
                 "?" = 0^0,
                 "0" = 10^0,
                 "1" = 10^1,
                 "2" = 10^2,
                 "3" = 10^3,
                 "4" = 10^4,
                 "5" = 10^5,
                 "6" = 10^6,
                 "7" = 10^7,
                 "8" = 10^8,
                 "9" = 10^9,
                 "H" = 10^2,
                 "K" = 10^3,
                 "M" = 10^6,
                 "B" = 10^9)
cropDmgTranslation <-  c("\"\"" = 10^0,
                "?" = 10^0, 
                "0" = 10^0,
                "2" = 10^2,
                "K" = 10^3,
                "M" = 10^6,
                "B" = 10^9)
#make sure we only have upper case 
toUppercols <- c("PROPDMGEXP", "CROPDMGEXP")
data[,  (toUppercols) := c(lapply(.SD, toupper)), .SDcols = toUppercols]

#do actual translation
data[, PROPDMGEXP := propDmgTranslation[as.character(data[,PROPDMGEXP])]]
data[is.na(PROPDMGEXP), PROPDMGEXP := 10^0 ]

data[, CROPDMGEXP := cropDmgTranslation[as.character(data[,CROPDMGEXP])] ]
data[is.na(CROPDMGEXP), CROPDMGEXP := 10^0 ]

Then we can calculate the individual costs for property and crops:

data <- data[, .(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, propCost = PROPDMG * PROPDMGEXP, CROPDMG, CROPDMGEXP, cropCost = CROPDMG * CROPDMGEXP)]

Calculate combined properties and crop damage costs per event type (“EVTYPE”):

totalCostCombined <- data[, .(propCost = sum(propCost), cropCost = sum(cropCost), totalCostCombined = sum(propCost) + sum(cropCost)), by = .(EVTYPE)]
# order by totalCostCombined in descending order
totalCostCombined <- totalCostCombined[order(-totalCostCombined),]
# take first ten 
totalCostCombined <- totalCostCombined[1:10,]
firstTen <- head(totalCostCombined)

We see that flooding events have to highest combined cost for both property and crop damage. In graph form:

totalCostCombinedMelt <- melt(firstTen, id.vars="EVTYPE", variable.name="costType")
economicalDamagePlot <- ggplot(totalCostCombinedMelt, aes(x=reorder(EVTYPE, -value), y=value)) + geom_col(aes(fill=costType), position="dodge") + labs(title = "Top 10 Weather Events in USA in Terms of Economical Damage", x="Event Type", y="Cost (USD)") + theme(axis.text.x = element_text(angle=90, hjust=1))

Results

Events harmful to population health

For events that were most harmful to population health we mostly looked at the “FATALITIES” and “INJURIES” columns in the data set, as detailed above. After summing up the fatalities and injuries by event type we come to the following top ten most harmful events:

##             EVTYPE FATALITIES INJURIES total_adverse_event
##             <char>      <num>    <num>               <num>
##  1:        TORNADO       5633    91346               96979
##  2: EXCESSIVE HEAT       1903     6525                8428
##  3:    FLASH FLOOD        978     1777                2755
##  4:           HEAT        937     2100                3037
##  5:      LIGHTNING        816     5230                6046
##  6:      TSTM WIND        504     6957                7461
##  7:          FLOOD        470     6789                7259
##  8:    RIP CURRENT        368      232                 600
##  9:      HIGH WIND        248     1137                1385
## 10:      AVALANCHE        224      170                 394

Which we can visualise in graph form:

In this plot “total_adverse_event” means the sum of FATALITIES and INJURIES.We conclude that TORNADOs are the most harmful events in terms of fatalities and injuries.

Events that cause most economic damage

For events that cause the most economic damage we considered crop damage and property damage (CROPDMG and PROPDMG) as detailed above. After calculating total costs by event type we come to the following top ten of events that cause most economic damage:

##               EVTYPE     propCost   cropCost totalCostCombined
##               <char>        <num>      <num>             <num>
## 1:             FLOOD 144657709807 5661968450      150319678257
## 2: HURRICANE/TYPHOON  69305840000 2607872800       71913712800
## 3:           TORNADO  56947380676  414953270       57362333946
## 4:       STORM SURGE  43323536000       5000       43323541000
## 5:              HAIL  15735267513 3025954473       18761221986
## 6:       FLASH FLOOD  16822673978 1421317100       18243991078

This can be visulalised in a bar chart as follows:

In this graph, totalCostCombined means the cost of property damage and crop damage combined. We conclude that in terms of economic damage FLOODs are the most harmful event.