Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. Our analysis of the period 1996-2011 shows that Excessive Heat and Tornadoes had the biggest impact on population health: together they account for 3,308 fatalities and 27,058 injuries, the largest shares in both categories. Over the same period, floods caused 39% of all property damage, for a total of 144B USD.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# The file is bzip2-compressed (not a zip archive), so the local copy keeps the .csv.bz2 extension.
destfile <- "./repdata_data_StormData.csv.bz2"
download.file(fileUrl, destfile = destfile, method = "curl")
noaadata <- read.csv(bzfile(destfile))
library(dplyr)
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
We know from NOAA that only tornadoes were recorded in the early years of the database, and that the full range of event types was added in 1996. To compare all event types without bias from differing recording periods, we keep only the records from January 1996 to November 2011 for our analysis.
# Extract the year from BGN_DATE, then keep only the records from 1996 onward.
noaadata$Year <- year(mdy_hms(as.character(noaadata$BGN_DATE)))
noaadata <- subset(noaadata, noaadata$Year>1995)
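The year filter can be checked on a few rows before running it on the full data set. The following is a minimal sketch in base R, assuming the dates use the "month/day/year hour:minute:second" text layout; the data frame `storms` and its three rows are hypothetical stand-ins, not records from the NOAA file.

```r
# Toy stand-in for the BGN_DATE column (hypothetical rows, same text layout).
storms <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00"),
  EVTYPE   = c("TORNADO", "WINTER STORM", "HAIL"),
  stringsAsFactors = FALSE
)

# Parse the date part only; as.Date() ignores the trailing time text.
parsed <- as.Date(storms$BGN_DATE, format = "%m/%d/%Y")
storms$Year <- as.integer(format(parsed, "%Y"))

# Keep only events from 1996 onward.
recent <- storms[storms$Year >= 1996, ]
print(recent)
```

Base R is used here only to keep the sketch free of package dependencies; the analysis itself relies on lubridate for the same step.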
At first glance, we observe that the EVTYPE variable contains far more unique event types than the 48 Event Names described in the documentation (Table 1 of section 2.1.1). There are two main reasons for this:
1. Some additional categories combine several EVTYPEs, as the events can be difficult to tell apart.
2. Some additional categories result from small differences in labelling.
Let’s correct some of those labelling differences (the main ones only) to make the EVTYPE labels more uniform.
#Code to show similar EVTYPEs: unique(grep("(.*)FLASH(.*)", noaadata$EVTYPE, value = TRUE))
noaadata$EVTYPE <- gsub("(.*)FLASH(.*)", "FLASH FLOOD", noaadata$EVTYPE)
noaadata$EVTYPE <- gsub("^TSTM(.*)", "THUNDERSTORM WIND", noaadata$EVTYPE)
noaadata$EVTYPE <- gsub("(.*)EXCESSIVE HEAT(.*)", "EXCESSIVE HEAT", noaadata$EVTYPE)
noaadata$EVTYPE <- gsub("(.*)RIP CURRENTS(.*)", "RIP CURRENT", noaadata$EVTYPE)
noaadata$EVTYPE <- gsub("(.*)HEAT WAVE(.*)", "HEAT", noaadata$EVTYPE)
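To see what these substitutions do, it helps to apply them to a handful of labels. The sketch below wraps the same five patterns in a helper, `normalize_evtype`, which is a hypothetical name introduced here for illustration; the input labels are an illustrative sample chosen to exercise each pattern.

```r
# Illustrative labels, one or two per pattern, plus one that should stay untouched.
labels <- c("FLASH FLOOD/FLOOD", "TSTM WIND", "TSTM WIND/HAIL",
            "RECORD/EXCESSIVE HEAT", "RIP CURRENTS", "HEAT WAVE", "HAIL")

# Hypothetical helper applying the five substitutions in the same order.
normalize_evtype <- function(x) {
  x <- gsub("(.*)FLASH(.*)", "FLASH FLOOD", x)
  x <- gsub("^TSTM(.*)", "THUNDERSTORM WIND", x)
  x <- gsub("(.*)EXCESSIVE HEAT(.*)", "EXCESSIVE HEAT", x)
  x <- gsub("(.*)RIP CURRENTS(.*)", "RIP CURRENT", x)
  x <- gsub("(.*)HEAT WAVE(.*)", "HEAT", x)
  x
}

cleaned <- normalize_evtype(labels)
print(data.frame(before = labels, after = cleaned))
```

Because each pattern is wrapped in `(.*)`, any label containing the keyword is replaced as a whole, so combined labels such as "TSTM WIND/HAIL" are folded into a single category. The patterns are case-sensitive, so lower-case or mixed-case variants would not be matched.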
The NOAA data contain a variable PROPDMGEXP, which encodes the multiplier to apply to the variable PROPDMG to obtain the full value of damages. PROPDMGEXP takes the values {- ? + 0 1 2 3 4 5 6 7 8 B h H K m M}, where the digits 0 to 8 are treated as a multiplier of 10, “h” and “H” stand for hundreds, “K” stands for thousands, “M” for millions, and “B” for billions. To prepare for later calculations, we now compute a new variable holding the total value of property damage.
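# Let's define a function converting each symbol to the corresponding coefficient.
# Letters are matched case-insensitively; digits 0 to 8 map to 10; any other symbol ("", -, ?, +) maps to 0.
findcoeff <- function (X) {
  X <- toupper(as.character(X))
  if (X == "H") {return(100)}
  else if (X == "K") {return(1000)}
  else if (X == "M") {return(1000000)}
  else if (X == "B") {return(1000000000)}
  else if (X %in% as.character(0:8)) {return(10)}
  else {return(0)}
}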
# Let's create the new variable PROPDMGVALUE with an initial value of 0. We also store the positions of the records where property damage is greater than 0.
noaadata$PROPDMGVALUE <- rep(0,length(noaadata$PROPDMG))
isdmg <- which(noaadata$PROPDMG>0)
#We run a loop to fill in the values. This step may take 10 to 15 minutes.
for(i in isdmg) {
noaadata$PROPDMGVALUE[i] <- noaadata$PROPDMG[i] * findcoeff (noaadata$PROPDMGEXP[i])
}
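Since the loop calls the conversion function once per record, it is slow on a data set of this size. A vectorized lookup is one possible alternative; the following is a minimal sketch, not the method used in this analysis. The names `multipliers` and `damage_value` and the toy data frame are hypothetical, and the entry for "H" and the case-insensitive matching are assumptions based on the description of PROPDMGEXP above.

```r
# Multipliers for the letter symbols (assumption: "H" means hundreds, as described in the text).
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: converts whole columns at once instead of looping over rows.
damage_value <- function(dmg, exp) {
  exp  <- toupper(as.character(exp))
  coef <- unname(multipliers[exp])          # NA for symbols that are not in the table
  coef[exp %in% as.character(0:8)] <- 10    # digits 0 to 8 are treated as "times ten"
  coef[is.na(coef)] <- 0                    # "", "-", "?", "+" contribute nothing
  dmg * coef
}

# Toy records (not taken from the NOAA file).
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 5, 3),
                  PROPDMGEXP = c("K", "M", "B", "0", ""),
                  stringsAsFactors = FALSE)
toy$PROPDMGVALUE <- damage_value(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

The lookup relies on R returning NA when a named vector is indexed with a name it does not contain, which is why the unknown symbols are set to 0 in a separate step.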
In the following sections, we study the impact of storms on population health and property damage at the national level.
fatalities <- sum(noaadata$FATALITIES)
injuries <- sum(noaadata$INJURIES)
Records of storms and other weather events between 1996 and November 2011 attribute 8732 fatalities and 57975 injuries, directly or indirectly, to these events.
Let’s take a look at which event types have caused most fatalities and most injuries during this period. To do so, we first evaluate the sum of fatalities and injuries for each event type and isolate a summary of the 15 types of events that have had the most impact.
library(dplyr)
grp_evtype <- group_by(noaadata,EVTYPE)
sum_pophealth <- summarize(grp_evtype, sum(FATALITIES), sum(INJURIES))
colnames(sum_pophealth) <- c("EVTYPE", "tot.FATALITIES", "tot.INJURIES")
sum_pophealth <- arrange(sum_pophealth, desc(tot.FATALITIES), desc(tot.INJURIES))
fat_events <- sum(sum_pophealth$tot.FATALITIES>0)
inj_events <- sum(sum_pophealth$tot.INJURIES>0)
head(sum_pophealth,15)
## # A tibble: 15 x 3
##    EVTYPE                  tot.FATALITIES tot.INJURIES
##    <chr>                            <dbl>        <dbl>
##  1 EXCESSIVE HEAT                    1797         6391
##  2 TORNADO                           1511        20667
##  3 FLASH FLOOD                        887         1674
##  4 LIGHTNING                          651         4141
##  5 RIP CURRENT                        542          503
##  6 FLOOD                              414         6758
##  7 THUNDERSTORM WIND                  377         5128
##  8 HEAT                               237         1222
##  9 HIGH WIND                          235         1083
## 10 AVALANCHE                          223          156
## 11 WINTER STORM                       191         1292
## 12 EXTREME COLD/WIND CHILL            125           24
## 13 EXTREME COLD                       113           79
## 14 HEAVY SNOW                         107          698
## 15 STRONG WIND                        103          278
Let’s modify the data.frame to prepare plotting with a stacked bar chart. To do so, we create a new data.frame where “Fatalities” and “Injuries” are levels of the factor “Casualties”.
fatalities_data <- data.frame(sum_pophealth$EVTYPE[1:15], sum_pophealth$tot.FATALITIES[1:15], rep("Fatalities", 15))
colnames(fatalities_data) <- c("EVTYPE", "Casualties", "Category")
injuries_data <- data.frame(sum_pophealth$EVTYPE[1:15], sum_pophealth$tot.INJURIES[1:15], rep("Injuries", 15))
colnames(injuries_data) <- c("EVTYPE", "Casualties", "Category")
healthdata <- rbind(fatalities_data, injuries_data)
library(ggplot2)
g <- ggplot(healthdata, aes(x = reorder(EVTYPE, -Casualties), y = Casualties, fill = Category ))
g <- g + theme(axis.text.x = element_text(angle = 65, hjust = 1))
g <- g + geom_bar(stat="identity")
g <- g + labs(title ="Impact of Storms on Population Health", x = "Event Type")
g
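The reshaping step that precedes the chart stacks the fatalities and injuries columns into a single "Casualties" column, with a "Category" column telling them apart. As a rough sketch, the same long layout can be built in one `data.frame()` call; the object names `top` and `healthdata_long` are hypothetical, and the three rows are copied from the top of the summary table printed earlier.

```r
# First three rows of the summary, with the same column names as sum_pophealth.
top <- data.frame(
  EVTYPE         = c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD"),
  tot.FATALITIES = c(1797, 1511, 887),
  tot.INJURIES   = c(6391, 20667, 1674),
  stringsAsFactors = FALSE
)

n <- nrow(top)

# One row per (event type, category) pair: fatalities first, then injuries.
healthdata_long <- data.frame(
  EVTYPE     = rep(top$EVTYPE, times = 2),
  Casualties = c(top$tot.FATALITIES, top$tot.INJURIES),
  Category   = rep(c("Fatalities", "Injuries"), each = n),
  stringsAsFactors = FALSE
)
print(healthdata_long)
```

This long layout is what lets ggplot2 map `fill = Category` and stack the two counts in a single bar per event type.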
According to this chart, tornadoes and excessive heat pose the greatest danger to population health: they account for the largest numbers of fatalities and a large proportion of the injuries. Flash floods and lightning also had a strong impact on population health during the period.
In this second part, we focus on the economic consequences of storms, and specifically on property damage.
## It's time to use the PROPDMGVALUE variable that we created while processing data for economic damages.
sum_damages <- summarize(grp_evtype, sum(PROPDMGVALUE)/1000000000)
colnames(sum_damages) <- c("EVTYPE", "PROPDMGVALUE")
sum_damages <- arrange(sum_damages, desc(PROPDMGVALUE))
damages_events <- sum(sum_damages$PROPDMGVALUE>0)
sum_damages$PROPDMGPC <- (sum_damages$PROPDMGVALUE*100)/sum(sum_damages$PROPDMGVALUE)
damages_rank <- head(sum_damages,15)
damages_rank$PROPDMGVALUE <- round(damages_rank$PROPDMGVALUE)
print(damages_rank)
## # A tibble: 15 x 3
##    EVTYPE            PROPDMGVALUE PROPDMGPC
##    <chr>                    <dbl>     <dbl>
##  1 FLOOD                      144    39.2
##  2 HURRICANE/TYPHOON           69    18.9
##  3 STORM SURGE                 43    11.8
##  4 TORNADO                     25     6.71
##  5 FLASH FLOOD                 15     4.15
##  6 HAIL                        15     3.98
##  7 HURRICANE                   12     3.22
##  8 THUNDERSTORM WIND            8     2.16
##  9 TROPICAL STORM               8     2.08
## 10 HIGH WIND                    5     1.43
## 11 WILDFIRE                     5     1.30
## 12 STORM SURGE/TIDE             5     1.27
## 13 ICE STORM                    4     0.993
## 14 WILD/FOREST FIRE             3     0.818
## 15 WINTER STORM                 2     0.418
Together, storm events caused 367B USD of property damage between 1996 and November 2011. The following plot shows the 15 event types associated with the largest property damage:
g <- ggplot(damages_rank, aes(x = reorder(EVTYPE, -PROPDMGVALUE), y = PROPDMGVALUE))
g <- g + theme(axis.text.x = element_text(angle = 65, hjust = 1))
g <- g + geom_bar(stat="identity", fill = "orange")
g <- g + geom_text(aes(label = PROPDMGVALUE), position = position_dodge(0))
g <- g + labs(title ="Economic Consequences of Storms", x = "Event Type", y = "Property Damages (B USD)")
g
Floods, hurricanes/typhoons and storm surges are the three event types that caused the most property damage. Together they account for 69.9% of the property damage calculated for the period. In particular, floods caused the most damage, with losses of 144 billion USD over the period.
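The combined share quoted for the three leading event types can be reproduced from the rounded figures in the table of property damage. This is only a quick arithmetic check; the vector names `damage` and `share` are hypothetical, and the values are the rounded ones printed above, so the result may differ slightly from a calculation on the unrounded data.

```r
# Rounded values copied from the table of property damage (billions of USD and percent).
damage <- c(FLOOD = 144, `HURRICANE/TYPHOON` = 69, `STORM SURGE` = 43)
share  <- c(FLOOD = 39.2, `HURRICANE/TYPHOON` = 18.9, `STORM SURGE` = 11.8)

top3_damage <- sum(damage)   # combined damage of the three leading event types
top3_share  <- sum(share)    # combined percentage of total property damage

cat("Top three event types:", top3_damage, "B USD,", top3_share, "% of the total\n")
```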