Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. Our analysis of the period 1996 to 2011 shows that Excessive Heat and Tornadoes had the biggest impact on population health: together they account for 3,308 fatalities and 27,058 injuries, with Excessive Heat causing the most fatalities and Tornadoes the most injuries. Over the same period, floods caused 39% of all property damage, for a total of 144 billion USD.

Obtaining and processing data

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

# Download the compressed file once (it is bzip2, not zip) and read it directly.
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destFile <- "./repdata_data_StormData.csv.bz2"
if (!file.exists(destFile)) download.file(fileUrl, destfile = destFile, mode = "wb")
noaadata <- read.csv(bzfile(destFile))
library(dplyr)
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date

NOAA recorded only tornado events in the early years of the database and added the full range of event types in 1996. To compare all event types without a bias from the different lengths of their records, we keep only the records from 1996 through November 2011 in our analysis.

# Extract the year from BGN_DATE and keep only the records from 1996 onwards.
noaadata$Year <- year(mdy_hms(as.character(noaadata$BGN_DATE)))
noaadata <- subset(noaadata, Year > 1995)

At first glance, we observe that the EVTYPE variable contains far more unique event types than the 48 Event Names described in the documentation (Table 1 of section 2.1.1). There are two main reasons for this: 1. Some additional categories combine several EVTYPEs, as it can be difficult to tell them apart. 2. Some additional categories result from small differences in labelling.

Let’s correct some of those labelling differences (the main ones only) to make the EVTYPE labels more uniform.

#Code to show similar EVTYPEs: unique(grep("(.*)FLASH(.*)", noaadata$EVTYPE, value = TRUE))
noaadata$EVTYPE <- gsub("(.*)FLASH(.*)", "FLASH FLOOD", noaadata$EVTYPE)
noaadata$EVTYPE <- gsub("^TSTM(.*)", "THUNDERSTORM WIND", noaadata$EVTYPE)
noaadata$EVTYPE <- gsub("(.*)EXCESSIVE HEAT(.*)", "EXCESSIVE HEAT", noaadata$EVTYPE)
noaadata$EVTYPE <- gsub("(.*)RIP CURRENTS(.*)", "RIP CURRENT", noaadata$EVTYPE)
noaadata$EVTYPE <- gsub("(.*)HEAT WAVE(.*)", "HEAT", noaadata$EVTYPE)
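
To see what these substitutions do, here is a minimal sketch on a small hand-made vector. The labels in `labels_raw` are illustrative only and are not taken from the database, and `relabel_evtype` is a hypothetical helper that simply wraps the five patterns used above. It compares the number of distinct labels before and after relabelling.

```r
# Illustrative labels only; not actual rows of the NOAA data.
labels_raw <- c("FLASH FLOOD", "FLASH FLOODING", "FLOOD/FLASH FLOOD",
                "TSTM WIND", "TSTM WIND/HAIL", "RIP CURRENTS",
                "RECORD EXCESSIVE HEAT", "HEAT WAVE", "TORNADO")

# Hypothetical helper wrapping the five substitutions used above.
relabel_evtype <- function(x) {
  x <- gsub("(.*)FLASH(.*)", "FLASH FLOOD", x)
  x <- gsub("^TSTM(.*)", "THUNDERSTORM WIND", x)
  x <- gsub("(.*)EXCESSIVE HEAT(.*)", "EXCESSIVE HEAT", x)
  x <- gsub("(.*)RIP CURRENTS(.*)", "RIP CURRENT", x)
  x <- gsub("(.*)HEAT WAVE(.*)", "HEAT", x)
  x
}

labels_clean <- relabel_evtype(labels_raw)
length(unique(labels_raw))    # distinct labels before relabelling
length(unique(labels_clean))  # distinct labels after relabelling
table(labels_clean)           # how many raw labels fall into each clean label
```

Note that the patterns are broad: anything containing FLASH becomes FLASH FLOOD, and anything starting with TSTM becomes THUNDERSTORM WIND, so combined labels lose their second component.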

The NOAA data contains a variable PROPDMGEXP, which encodes the multiplier to apply to the variable PROPDMG to obtain the full value of the damage. The values of PROPDMGEXP are {- ? + 0 1 2 3 4 5 6 7 8 B h H K m M}, where the digits 0 to 8 are treated as a multiplier of 10, “h” and “H” stand for hundreds, “K” stands for thousands, “m” and “M” for millions, and “B” for billions. To prepare for later calculations, we now compute a new variable holding the total value of property damage.

# Convert each PROPDMGEXP symbol to the corresponding multiplier (case-insensitive).
findcoeff <- function (X) {
  X <- toupper(X)
  if (X == "H") {return(100)}
  else if (X == "K") {return(1000)}
  else if (X == "M") {return(1000000)}
  else if (X == "B") {return(1000000000)}
  else if (X %in% c(0:8)) {return(10)}
  else {return(0)}  # "-", "?", "+" and empty values
}

# Create the new variable PROPDMGVALUE with an initial value of 0, and store the positions of the records where property damage is greater than 0.
noaadata$PROPDMGVALUE <- rep(0,length(noaadata$PROPDMG))
isdmg <- which(noaadata$PROPDMG>0)

# Fill in the values row by row. This step may take 10 to 15 minutes.
for(i in isdmg) {
      noaadata$PROPDMGVALUE[i] <- noaadata$PROPDMG[i] * findcoeff(noaadata$PROPDMGEXP[i])
}
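
The loop is slow because it updates the data frame one row at a time. As a possible alternative, here is a sketch of a vectorized lookup. It follows the symbol meanings described in the text (h/H for hundreds, K, m/M, B, and the digits 0 to 8 mapped to 10; everything else gives 0). The names `coeff_table` and `exp_to_coeff` are hypothetical, and the example runs on a small made-up data frame rather than on `noaadata`.

```r
# Hypothetical lookup table: symbol -> multiplier, following the text above.
coeff_table <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
coeff_table[as.character(0:8)] <- 10

exp_to_coeff <- function(exp_symbol) {
  # Index the named vector by upper-cased symbol; unknown symbols give NA.
  coeff <- unname(coeff_table[toupper(as.character(exp_symbol))])
  coeff[is.na(coeff)] <- 0  # "-", "?", "+", "" and anything else
  coeff
}

# Made-up rows for illustration; not taken from the NOAA data.
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 3, 10),
                  PROPDMGEXP = c("K", "M", "B", "", "5"),
                  stringsAsFactors = FALSE)

# One vectorized multiplication replaces the row-by-row loop.
toy$PROPDMGVALUE <- toy$PROPDMG * exp_to_coeff(toy$PROPDMGEXP)
toy
```

Under these assumptions, the same call on the full data would be `noaadata$PROPDMG * exp_to_coeff(noaadata$PROPDMGEXP)`; rows with zero damage simply stay at 0, so no index of non-zero rows is needed.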

Results

In the following sections, we study the impact of storms on population health and property damage at the national level.

Impact of storms on population health

fatalities <- sum(noaadata$FATALITIES)
injuries <- sum(noaadata$INJURIES)

Between 1996 and November 2011, storms and other weather events were the direct or indirect cause of 8732 fatalities and 57975 injuries.

Let’s take a look at which event types caused the most fatalities and injuries during this period. To do so, we first compute the total fatalities and injuries for each event type, then keep the 15 event types with the most fatalities (ties are broken by the number of injuries).

library(dplyr)
grp_evtype <- group_by(noaadata,EVTYPE)
sum_pophealth <- summarize(grp_evtype, sum(FATALITIES), sum(INJURIES))
colnames(sum_pophealth) <- c("EVTYPE", "tot.FATALITIES", "tot.INJURIES")
sum_pophealth <- arrange(sum_pophealth, desc(tot.FATALITIES), desc(tot.INJURIES))
fat_events <- sum(sum_pophealth$tot.FATALITIES>0)
inj_events <- sum(sum_pophealth$tot.INJURIES>0)
head(sum_pophealth,15)
## # A tibble: 15 x 3
##    EVTYPE                  tot.FATALITIES tot.INJURIES
##    <chr>                            <dbl>        <dbl>
##  1 EXCESSIVE HEAT                    1797         6391
##  2 TORNADO                           1511        20667
##  3 FLASH FLOOD                        887         1674
##  4 LIGHTNING                          651         4141
##  5 RIP CURRENT                        542          503
##  6 FLOOD                              414         6758
##  7 THUNDERSTORM WIND                  377         5128
##  8 HEAT                               237         1222
##  9 HIGH WIND                          235         1083
## 10 AVALANCHE                          223          156
## 11 WINTER STORM                       191         1292
## 12 EXTREME COLD/WIND CHILL            125           24
## 13 EXTREME COLD                       113           79
## 14 HEAVY SNOW                         107          698
## 15 STRONG WIND                        103          278
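
As a quick check of the figures quoted in the synopsis, the two leading rows of the table can be added up. The sketch below re-enters those two rows by hand (values copied from the output above; the name `top2` is only for illustration) and relates them to the period totals of 8732 fatalities and 57975 injuries reported earlier in this section.

```r
# Top two rows copied by hand from the table above.
top2 <- data.frame(EVTYPE         = c("EXCESSIVE HEAT", "TORNADO"),
                   tot.FATALITIES = c(1797, 1511),
                   tot.INJURIES   = c(6391, 20667))

top2_fatalities <- sum(top2$tot.FATALITIES)
top2_injuries   <- sum(top2$tot.INJURIES)

# Share of the period totals reported in this section, in percent.
share_fatalities <- 100 * top2_fatalities / 8732
share_injuries   <- 100 * top2_injuries / 57975

c(fatalities = top2_fatalities, injuries = top2_injuries)
round(c(fatalities = share_fatalities, injuries = share_injuries), 1)
```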

Let’s reshape the data.frame to prepare a stacked bar chart. To do so, we create a new data.frame in which “Fatalities” and “Injuries” are the two levels of the variable “Category”, and the corresponding counts are stored in the variable “Casualties”.

fatalities_data <- data.frame(sum_pophealth$EVTYPE[1:15], sum_pophealth$tot.FATALITIES[1:15], rep("Fatalities", 15))
colnames(fatalities_data) <- c("EVTYPE", "Casualties", "Category")
injuries_data <- data.frame(sum_pophealth$EVTYPE[1:15], sum_pophealth$tot.INJURIES[1:15], rep("Injuries", 15))
colnames(injuries_data) <- c("EVTYPE", "Casualties", "Category")
healthdata <- rbind(fatalities_data, injuries_data)

library(ggplot2)
g <- ggplot(healthdata, aes(x = reorder(EVTYPE, -Casualties), y = Casualties, fill = Category )) 
g <- g + theme(axis.text.x = element_text(angle = 65, hjust = 1))
g <- g + geom_bar(stat="identity")
g <- g + labs(title ="Impact of Storms on Population Health", x = "Event Type")
g

According to this chart, Tornadoes and Excessive Heat are the biggest dangers to population health: Excessive Heat accounts for the largest number of fatalities and Tornadoes for the largest number of injuries. Flash Floods and Lightning have also had a strong impact on population health during the period.

The economic consequences of storms

In this second part, we focus on the economic consequences of storms, and specifically on property damage.

## It's time to use the PROPDMGVALUE variable that we created while processing data for economic damages.
sum_damages <- summarize(grp_evtype, sum(PROPDMGVALUE)/1000000000)
colnames(sum_damages) <- c("EVTYPE", "PROPDMGVALUE")
sum_damages <- arrange(sum_damages, desc(PROPDMGVALUE))
damages_events <- sum(sum_damages$PROPDMGVALUE>0)
sum_damages$PROPDMGPC <- (sum_damages$PROPDMGVALUE*100)/sum(sum_damages$PROPDMGVALUE)
damages_rank <- head(sum_damages,15)
damages_rank$PROPDMGVALUE <- round(damages_rank$PROPDMGVALUE)
print(damages_rank)
## # A tibble: 15 x 3
##    EVTYPE            PROPDMGVALUE PROPDMGPC
##    <chr>                    <dbl>     <dbl>
##  1 FLOOD                      144    39.2  
##  2 HURRICANE/TYPHOON           69    18.9  
##  3 STORM SURGE                 43    11.8  
##  4 TORNADO                     25     6.71 
##  5 FLASH FLOOD                 15     4.15 
##  6 HAIL                        15     3.98 
##  7 HURRICANE                   12     3.22 
##  8 THUNDERSTORM WIND            8     2.16 
##  9 TROPICAL STORM               8     2.08 
## 10 HIGH WIND                    5     1.43 
## 11 WILDFIRE                     5     1.30 
## 12 STORM SURGE/TIDE             5     1.27 
## 13 ICE STORM                    4     0.993
## 14 WILD/FOREST FIRE             3     0.818
## 15 WINTER STORM                 2     0.418

Together, storm events caused 367 billion USD of property damage between 1996 and November 2011. The following plot shows the 15 event types associated with the largest property damage:

g <- ggplot(damages_rank, aes(x = reorder(EVTYPE, -PROPDMGVALUE), y = PROPDMGVALUE))
g <- g + theme(axis.text.x = element_text(angle = 65, hjust = 1))
g <- g + geom_bar(stat="identity", fill = "orange")
g <- g + geom_text(aes(label = PROPDMGVALUE), position = position_dodge(0))
g <- g + labs(title ="Economic Consequences of Storms", x = "Event Type", y = "Property Damages (B USD)")
g

Floods, Hurricanes/Typhoons and Storm Surges are the three event types that caused the most property damage. Together they account for 69.9% of the property damage recorded in the period. Floods alone caused the most damage, with losses of 144 billion USD.
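
The combined share can be reproduced from the PROPDMGPC column with a cumulative sum. This is a small sketch that re-enters the first five rows of the table by hand; the names `top_damage` and `CUMPC` are only for illustration.

```r
# First five rows of the damage table above (shares in percent, copied by hand).
top_damage <- data.frame(
  EVTYPE    = c("FLOOD", "HURRICANE/TYPHOON", "STORM SURGE", "TORNADO", "FLASH FLOOD"),
  PROPDMGPC = c(39.2, 18.9, 11.8, 6.71, 4.15)
)

# Running total of the shares, from the largest contributor downwards.
top_damage$CUMPC <- cumsum(top_damage$PROPDMGPC)
top_damage
```

The third value of `CUMPC` is the combined share of the three leading event types quoted above.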