Analysis of health and economic impact of weather events in the US using NOAA data

Synopsis

The goal of this analysis is to study weather event data from 1950 to November 2011 and find the event types with the greatest impact on population health and those with the greatest economic impact. The impact on health will be measured by the total number of injuries and fatalities by event type, while the economic impact will be measured by the total value of damage to property and crops.

The data is from the US National Oceanic and Atmospheric Administration’s (NOAA) Storm Events Database.

The analysis shows that tornadoes have the greatest health impact, being responsible for far more injuries and deaths than any other event type. Floods are responsible for the greatest property damage and droughts for the greatest crop damage.

1. Data processing

System information

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.2

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

1.1 Loading required libraries

library(dplyr)

1.2 Downloading storm data
We check whether the raw data file exists in the working directory. If it does not, we download it from the URL provided.

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
filename <- "StormData.csv.bz2"

if (!file.exists(filename)){
        download.file(url, destfile = filename, method = "curl")
}
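
The same check can be wrapped in a small function so it can be reused for other files. This is a minimal sketch rather than the code used for this analysis: `fetch_if_missing` is a hypothetical helper name, and `mode = "wb"` is an addition so that the compressed file is written as binary on platforms where that matters.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is absent.
# Returns TRUE if a download was attempted, FALSE if the file was already there.
fetch_if_missing <- function(url, dest) {
  if (file.exists(dest)) {
    return(FALSE)
  }
  # mode = "wb" writes the file as binary, which matters for .bz2 on Windows
  download.file(url, destfile = dest, mode = "wb")
  TRUE
}
```

With the variables defined above, the call would be `fetch_if_missing(url, filename)`.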

1.3 Reading in storm data
Now that the raw data file is in the working directory, we read it into a data frame and then subset the columns that we require for the analysis.

data <- read.csv("StormData.csv.bz2")

data <- data[,c("EVTYPE", 
                "FATALITIES", 
                "INJURIES", 
                "PROPDMG", 
                "PROPDMGEXP", 
                "CROPDMG", 
                "CROPDMGEXP")
             ]
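
Reading the whole file and then subsetting works, but the unused columns still have to be parsed. As an alternative sketch, the unwanted columns can be skipped at read time. `read_selected` is a hypothetical helper, not part of the original analysis, and it assumes the names in `keep` match the column names as `read.csv` reports them.

```r
# Hypothetical helper: read only the columns named in `keep`.
read_selected <- function(path, keep) {
  # Read a single row just to learn the column names
  header <- names(read.csv(path, nrows = 1))
  # "NULL" tells read.csv to skip a column; NA lets it guess the type
  classes <- ifelse(header %in% keep, NA_character_, "NULL")
  read.csv(path, colClasses = classes)
}
```

For this analysis, `keep` would be the seven column names listed above. The columns come back in file order, not in the order given in `keep`.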

1.4 Calculating total value of damage

1.4.1 Property damage
To calculate the value of the property damage, we must first replace the exponent labels with numeric multipliers. A blank label or 0 is treated as a multiplier of 1. Unrecognised labels, such as +, ? and -, are given a multiplier of 0, so those records contribute nothing to the totals. The value is then calculated by multiplying the damage figure by its multiplier.

##  Find all exponential indicators
unique(data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
##  Assign exponential values
data$propExp[data$PROPDMGEXP %in% c("", "0")] <- 1
data$propExp[data$PROPDMGEXP == "1"] <- 1e+01
data$propExp[data$PROPDMGEXP %in% c("2", "h", "H")] <- 1e+02
data$propExp[data$PROPDMGEXP %in% c("K", "3")] <- 1e+03
data$propExp[data$PROPDMGEXP == "4"] <- 1e+04
data$propExp[data$PROPDMGEXP == "5"] <- 1e+05
data$propExp[data$PROPDMGEXP %in% c("M", "m", "6")] <- 1e+06
data$propExp[data$PROPDMGEXP == "7"] <- 1e+07
data$propExp[data$PROPDMGEXP == "8"] <- 1e+08
data$propExp[data$PROPDMGEXP == "B"] <- 1e+09

##  Set unrecognised indicators to 0
data$propExp[data$PROPDMGEXP %in% c("+", "?", "-")] <- 0

##  Calculate value of damage
data$propDmgVal <- data$PROPDMG * data$propExp
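
The label-to-multiplier mapping above can also be written as a single lookup table, which keeps the rules in one place. This is a sketch under stated assumptions: `exp_lookup` and `exp_to_multiplier` are hypothetical names, labels are upper-cased before lookup so that "m" and "M" share one entry, and any label missing from the table (not only +, ? and -) gets a multiplier of 0.

```r
# Hypothetical lookup table: exponent label -> numeric multiplier
exp_lookup <- c("0" = 1,   "1" = 1e1, "2" = 1e2, "H" = 1e2,
                "3" = 1e3, "K" = 1e3, "4" = 1e4, "5" = 1e5,
                "6" = 1e6, "M" = 1e6, "7" = 1e7, "8" = 1e8,
                "B" = 1e9)

exp_to_multiplier <- function(labels) {
  labels <- toupper(as.character(labels))
  mult <- unname(exp_lookup[labels])  # NA where the label is not in the table
  mult[labels %in% ""] <- 1           # blank label: take the value as given
  mult[is.na(mult)] <- 0              # assumption: anything else counts as 0
  mult
}

# Toy values, not taken from the storm data
labels  <- c("K", "m", "", "+", "B")
amounts <- c(25, 1.5, 10, 3, 2)
amounts * exp_to_multiplier(labels)
```

In the toy example, a damage figure of 25 with label "K" becomes 25,000, while the figure labelled "+" becomes 0.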

1.4.2 Crop damage
The same process is applied as in section 1.4.1.

##  Find all exponential indicators
unique(data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
##  Assign exponential values
data$cropExp[data$CROPDMGEXP %in% c("0","")] <- 1
data$cropExp[data$CROPDMGEXP == "2"] <- 1e+02
data$cropExp[data$CROPDMGEXP %in% c("K","k")] <- 1e+03
data$cropExp[data$CROPDMGEXP %in% c("M", "m")] <- 1e+06
data$cropExp[data$CROPDMGEXP == "B"] <- 1e+09

##  Set unrecognised indicators to 0
data$cropExp[data$CROPDMGEXP == "?"] <- 0

##  Calculate value of damage
data$cropDmgVal <- data$CROPDMG * data$cropExp
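
Because unrecognised labels zero out the damage value, it can be worth checking how many non-zero records are affected. The sketch below shows one way to do that on a toy data frame. The values are illustrative only and are not drawn from the storm data.

```r
# Toy stand-in for the storm data; values are illustrative only
toy <- data.frame(
  CROPDMG    = c(5, 0, 2.5, 10),
  CROPDMGEXP = c("K", "?", "?", "M"),
  stringsAsFactors = FALSE
)

unrecognised <- c("+", "?", "-")

# Records that report some damage but will be multiplied by 0
zeroed <- toy$CROPDMG > 0 & toy$CROPDMGEXP %in% unrecognised
n_zeroed <- sum(zeroed)

# Breakdown of the affected records by label
table(toy$CROPDMGEXP[zeroed])
```

A small count relative to the size of the dataset would suggest that the zeroing rule has little effect on the totals.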

2. Results

2.1 Health outcomes

2.1.1 Total injuries and deaths by event type
We create a new data frame with the sum of injuries and fatalities by event type.

health <- data %>%
        group_by(EVTYPE) %>%
        summarise(sumFatal = sum(FATALITIES, na.rm = TRUE),
                  sumInj = sum(INJURIES, na.rm = TRUE)
                  )

2.1.2 Top 10 weather events by number of injuries and deaths
We now create two data frames: the first holds the top 10 weather event types measured by fatalities, and the second holds the top 10 measured by injuries.

fatal_top_10 <- health[order(-health$sumFatal),][1:10,]
inj_top_10 <- health[order(-health$sumInj),][1:10,]
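
The sum-then-rank pattern used above can be illustrated on a small example in base R. This is a sketch on toy data, not the storm data, and it uses `head()` in place of `[1:10,]`, which avoids rows of `NA` when there are fewer groups than requested.

```r
# Toy data; values are illustrative only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0)
)

# Sum fatalities within each event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in descending order and keep the top 2 event types
top_n <- head(totals[order(-totals$FATALITIES), ], 2)
top_n
```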

2.1.3 Plotting the health data

x1 <- c("Tornado", "Excessive heat", "Flash flood", "Heat", "Lightning", 
        "TSTM wind", "Flood", "Rip current", "High wind", "Avalanche")
x2 <- c("Tornado", "TSTM wind", "Flood", "Excessive heat", "Lightning",  
        "Heat", "Ice storm", "Flash flood", "Thunderstorm wind", "Hail")

par(mfrow = c(1,2), mar = c(9,4.2,3,1), las = 2, cex.main = 0.9)
barplot(fatal_top_10$sumFatal/1000,
        names.arg = x1,
        ylim = c(0,6),
        main = "Events with most fatalities",
        ylab = "Total fatalities (thousands)",
        col = "cadetblue3")
barplot(inj_top_10$sumInj/1000,
        names.arg = x2,
        ylim = c(0,100),
        main = "Events with most injuries",
        ylab = "Total injuries (thousands)",
        col = "cadetblue3")

Barplots showing the top 10 weather event types measured by total number of fatalities and total number of injuries

The above plot shows that tornadoes were, by a considerable margin, the most dangerous weather event type in the period between 1950 and November 2011. They killed almost three times as many people as excessive heat, the second worst event type by fatalities. Similarly, tornadoes were responsible for at least thirteen times as many injuries as any other event type.
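
The axis labels in section 2.1.3 are typed by hand, so they would go out of step with the bars if the ranking changed. As a hedged alternative, the labels can be derived from the EVTYPE column itself. `to_label` is a hypothetical helper, and it does not preserve abbreviations: "TSTM WIND" becomes "Tstm wind" and not "TSTM wind".

```r
# Hypothetical helper: "EXCESSIVE HEAT" -> "Excessive heat"
to_label <- function(x) {
  x <- tolower(as.character(x))
  # Upper-case the first character and append the rest unchanged
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

to_label(c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"))
```

In the plotting code this would be passed as `names.arg = to_label(fatal_top_10$EVTYPE)`.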

2.2 Economic outcomes

2.2.1 Total property and crop damage by event type
We create a new data frame with the sum of property damage and crop damage by event type.

econ <- data %>%
          group_by(EVTYPE) %>%
          summarise(sumPropDmg = sum(propDmgVal, na.rm = TRUE),
                    sumCropDmg = sum(cropDmgVal, na.rm = TRUE)
                    )

2.2.2 Top 10 weather events by total property and crop damage
We now create two data frames: the first holds the top 10 weather event types measured by property damage, and the second holds the top 10 measured by crop damage.

prop_top_10 <- econ[order(-econ$sumPropDmg),][1:10,]
crop_top_10 <- econ[order(-econ$sumCropDmg),][1:10,]

2.2.3 Plotting economic outcomes

x3 <- c("Flood", "Hurricane/typhoon", "Tornado", "Storm surge", "Flash flood", 
        "Hail", "Hurricane", "Tropical storm", "Winter storm", "High wind")
x4 <- c("Drought", "Flood", "River flood", "Ice storm", "Hail",
        "Hurricane", "Hurricane/typhoon", "Flash flood", "Extreme cold", 
        "Frost/freeze")

par(mfrow = c(1,2), mar = c(9,4.2,3,1), las = 2, cex.main = 1)
barplot(prop_top_10$sumPropDmg/1000000000,
        names.arg = x3,
        main = "Events with most property damage",
        ylab = "Property damage (billion $)",
        col = "cadetblue3")
barplot(crop_top_10$sumCropDmg/1000000000,
        names.arg = x4,
        ylim = c(0,14),
        main = "Events with most crop damage",
        ylab = "Crop damage (billion $)",
        col = "cadetblue3")

Barplots showing the top 10 weather event types measured by total value of property damage and total value of crop damage

In the period between 1950 and November 2011, floods caused the most damage to property when considering the monetary value of repair. With damage worth over $140 billion, they were responsible for more than twice the repair bill of the second worst event type, hurricanes and typhoons. In the same period, droughts were responsible for the most damage to crops, with a value of $14 billion. Flooding was the next worst event type for crops.