This report presents the effect of different extreme weather events on public health and the economy. The source data is the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), covering 1950 to 2011. The report focuses on the impact of weather events in terms of fatalities and injuries, as well as economic losses. The latter are assessed through property and crop damage. The aim is to address the following two questions:
Q1 - Across the United States, which types of events are most harmful with respect to population health?
Q2 - Across the United States, which types of events have the greatest economic consequences?
1.1 Load the libraries required for this analysis (dplyr, tm, RColorBrewer)
library(dplyr)
library(tm)
library(RColorBrewer)
1.2 Load the database
The data is stored in a bzip2-compressed CSV file and is loaded with read.csv, which reads .bz2 files directly.
# Create the folder to save the original data
if (!dir.exists("data")) {
  dir.create("data")
}
# Download the original data in the data folder and read the file
# (mode = "wb" keeps the compressed file intact on Windows)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              destfile = "./data/original_data.bz2", mode = "wb")
data <- read.csv("./data/original_data.bz2", stringsAsFactors = FALSE)
colnames(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
1.3 Processing raw data
The raw data contains 37 variables, some of which are not needed for this study. Based on the documentation, a limited number of variables are selected from the raw data:
- the beginning date of event BGN_DATE,
- the type of event EVTYPE,
- the number of fatalities FATALITIES and injured people INJURIES,
- the amount of property damage PROPDMG in US dollars, together with the corresponding expansion factor PROPDMGEXP (H = hundreds, K = thousands, M = millions, B = billions),
- the amount of crop damage CROPDMG in US dollars, together with the corresponding expansion factor CROPDMGEXP.
subdata <- select(data, BGN_DATE, EVTYPE, FATALITIES, INJURIES,
                  PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
Damages are expressed as a value paired with an expansion factor, which is an upper-case or lower-case letter. The function ExpansionFactor multiplies a damage value by the multiplier that its letter stands for.
ExpansionFactor <- function(value, letter) {
  # No letter (or a missing value): the value is returned unchanged
  if (is.na(value) || is.na(letter) || letter == "") {
    value
  } else {
    # Map the letter to its multiplier, ignoring case
    factor <- switch(letter,
                     H = 100       , h = 100,
                     K = 1000      , k = 1000,
                     M = 1000000   , m = 1000000,
                     B = 1000000000, b = 1000000000)
    value * factor
  }
}
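As an illustration of the same mapping, the letter-to-multiplier conversion can also be written as a vectorized lookup that works on whole columns at once. This is only a sketch: the names expansion_multiplier and expand_damage are hypothetical and are not used elsewhere in this report, and the input values are made up.

```r
# Hypothetical vectorized variant of ExpansionFactor (not part of the original analysis).
expansion_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

expand_damage <- function(value, letter) {
  # Look up the multiplier by upper-cased letter; unknown letters give NA.
  mult <- unname(expansion_multiplier[toupper(letter)])
  # An empty or missing letter means the value is already in dollars.
  mult[is.na(letter) | letter == ""] <- 1
  value * mult
}

# Made-up inputs: 25 K, 2.5 m, 10 with no letter, 3 B
expand_damage(c(25, 2.5, 10, 3), c("K", "m", "", "B"))
```

With this variant an unrecognized letter yields NA rather than being dropped, so such rows would still need to be filtered out or handled explicitly.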
Then the observations with either no expansion factor or a known one are selected from the dataset before applying the ExpansionFactor function. Finally, the BGN_DATE variable is converted to a year.
Factors <- c("", "H", "h", "K", "k", "M", "m", "B", "b")
subdata <- subdata[(subdata$PROPDMGEXP %in% Factors) &
(subdata$CROPDMGEXP %in% Factors),]
subdata$PropDamage <- mapply(ExpansionFactor,subdata$PROPDMG,subdata$PROPDMGEXP)
subdata$CropDamage <- mapply(ExpansionFactor,subdata$CROPDMG,subdata$CROPDMGEXP)
subdata$Year <- as.numeric(format(as.Date(subdata$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
The nomenclature used to describe the events is not homogeneous. Functions from the tm library are used to partially standardize the terminology (the clean-up is not exhaustive).
subdata$EventType <- tolower(subdata$EVTYPE)
subdata$EventType <- removePunctuation(subdata$EventType)
subdata$EventType <- removeNumbers(subdata$EventType)
subdata$EventType <- gsub("tstm","thunderstorm",subdata$EventType)
subdata$EventType[grep("tornado",subdata$EventType)] <- "tornado"
subdata$EventType[grep("thunderstorm",subdata$EventType)] <- "thunderstorm"
subdata$EventType[grep("heat",subdata$EventType)] <- "heat"
subdata$EventType[grep("fld|flood",subdata$EventType)] <- "flood"
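To make the effect of these rules concrete, the sketch below applies the same sequence of steps to a few made-up labels. It assumes that plain regular expressions on the POSIX classes [[:punct:]] and [[:digit:]] approximate removePunctuation and removeNumbers closely enough for illustration; the helper name normalize_event_type is hypothetical.

```r
# Hypothetical helper, base R only; the analysis itself relies on
# tm::removePunctuation and tm::removeNumbers for the first clean-up steps.
normalize_event_type <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", "", x)   # drop punctuation
  x <- gsub("[[:digit:]]+", "", x)   # drop digits
  x <- gsub("tstm", "thunderstorm", x, fixed = TRUE)
  # Collapse every label containing a keyword into one category,
  # in the same order as in the analysis.
  x[grepl("tornado", x)] <- "tornado"
  x[grepl("thunderstorm", x)] <- "thunderstorm"
  x[grepl("heat", x)] <- "heat"
  x[grepl("fld|flood", x)] <- "flood"
  x
}

# Made-up labels in the style of EVTYPE
normalize_event_type(c("TSTM WIND", "Tornado F2", "EXCESSIVE HEAT",
                       "FLASH FLOOD", "Urban/Sml Stream Fld", "HAIL"))
```

Labels that match none of the keywords, such as "HAIL", are only lower-cased and stripped of punctuation and digits, which is why the standardization remains partial.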
Then, the number of climate-related events registered per year is shown in the following histogram.
hist(subdata$Year,
     main = "Severe weather events registered by year\nfrom 1950 to 2011 in the US",
     xlab = "Years",
     ylab = "Number of events registered")
Fig. 1: Histogram of severe weather events registered between 1950 and 2011 based on NOAA database.
Based on Fig. 1, the study period was limited to 1980-2011, as climate-related events prior to 1980 might not have been systematically registered in the NOAA database.
subdata <- subdata[subdata$Year >=1980,]
The sum of injuries, fatalities, property and crop damages for each event type was computed. In addition, the total damage, estimated through the sum of crop and property losses, was calculated.
subdata2 <- mutate(subdata, TotDamage = PropDamage+CropDamage) %>%
group_by(EventType)
Fatalities <- summarize(subdata2, Sum = sum(FATALITIES))
Injuries <- summarize(subdata2, Sum = sum(INJURIES))
Property <- summarize(subdata2, Sum = sum(PropDamage))
Crop <- summarize(subdata2, Sum = sum(CropDamage))
Total <- summarize(subdata2, Sum = sum(TotDamage))
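As a cross-check of this grouping step, the same per-event totals can be sketched in base R with aggregate. The toy records below are made up purely for illustration and do not come from the NOAA data; only the column names follow the analysis.

```r
# Made-up toy records, only to illustrate the grouping step.
toy <- data.frame(
  EventType  = c("flood", "tornado", "flood", "heat", "tornado"),
  FATALITIES = c(1, 3, 0, 5, 2),
  INJURIES   = c(4, 20, 1, 0, 10),
  PropDamage = c(2e6, 5e5, 1e6, 0, 3e5),
  CropDamage = c(1e5, 0, 4e5, 0, 0)
)
toy$TotDamage <- toy$PropDamage + toy$CropDamage

# One row per event type, one column per summed quantity.
totals <- aggregate(
  cbind(FATALITIES, INJURIES, PropDamage, CropDamage, TotDamage) ~ EventType,
  data = toy, FUN = sum
)
totals
```

Each column of totals corresponds to one of the summary tables computed above (Fatalities, Injuries, Property, Crop, Total), gathered in a single data frame.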
Q1 - Across the United States, which types of events are most harmful with respect to population health?
In the following barplots, the ten event types that caused the most injuries and deaths are displayed.
par(mfcol = c(1, 2))
# Plot injuries: sort by decreasing total and keep the top ten event types
TopSortedInjuries <- Injuries[order(-Injuries$Sum), ][1:10, ]
barplot(TopSortedInjuries$Sum / 1e3,
        legend.text = TopSortedInjuries$EventType,
        col = brewer.pal(10, "Spectral"),
        main = "A - Injuries by climate events\n from 1980-2011 in the US",
        args.legend = list(cex = 0.6),
        xlab = "Climate events",
        ylab = "Total number of injuries [thousands]")
# Plot fatalities: same ranking, applied to the fatality totals
TopSortedFatalities <- Fatalities[order(-Fatalities$Sum), ][1:10, ]
barplot(TopSortedFatalities$Sum / 1e3,
        legend.text = TopSortedFatalities$EventType,
        col = brewer.pal(10, "Spectral"),
        main = "B - Fatalities by climate events\n from 1980-2011 in the US",
        args.legend = list(cex = 0.6),
        xlab = "Climate events",
        ylab = "Total number of fatalities [thousands]")
Fig. 2: Injuries (A) and fatalities (B) due to extreme climate events in the US between 1980 and 2011 based on NOAA database.
While tornadoes were by far the climate event causing the most injuries between 1980 and 2011, extreme heat caused the highest number of deaths in the US over the same period.
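The ranking behind each panel (sort the per-event totals in decreasing order, then keep the first ten) can be sketched as a small reusable helper. The function name top_events and the toy totals are hypothetical; only the column names EventType and Sum follow the summary tables of this report.

```r
# Hypothetical helper: keep the n rows with the largest Sum, in decreasing order.
top_events <- function(df, n = 10) {
  ranked <- df[order(df$Sum, decreasing = TRUE), , drop = FALSE]
  head(ranked, n)
}

# Made-up totals, for illustration only
toy_sums <- data.frame(EventType = c("hail", "tornado", "heat", "flood"),
                       Sum = c(15, 900, 120, 300))
top_events(toy_sums, n = 3)
```

Using head instead of indexing with 1:n avoids rows of NA when a table holds fewer than n event types.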
Q2 - Across the United States, which types of events have the greatest economic consequences?
In the following barplots, the ten event types that led to the largest property, crop and total losses are displayed.
par(mfcol = c(1, 3))
# Plot property losses (dollars divided by 1e9 to express them in billions)
TopSortedProperty <- Property[order(-Property$Sum), ][1:10, ]
barplot(TopSortedProperty$Sum / 1e9,
        legend.text = TopSortedProperty$EventType,
        col = brewer.pal(10, "RdYlGn"),
        main = "A - Property damages",
        args.legend = list(cex = 0.6),
        xlab = "Climate events",
        ylab = "Losses [billions $]")
# Plot crop losses
TopSortedCrop <- Crop[order(-Crop$Sum), ][1:10, ]
barplot(TopSortedCrop$Sum / 1e9,
        legend.text = TopSortedCrop$EventType,
        col = brewer.pal(10, "PRGn"),
        main = "B - Crop damages",
        args.legend = list(cex = 0.6),
        xlab = "Climate events",
        ylab = "Losses [billions $]")
# Plot total losses
TopSortedTotal <- Total[order(-Total$Sum), ][1:10, ]
barplot(TopSortedTotal$Sum / 1e9,
        legend.text = TopSortedTotal$EventType,
        col = brewer.pal(10, "BrBG"),
        main = "C - Total damages",
        args.legend = list(cex = 0.6),
        xlab = "Climate events",
        ylab = "Losses [billions $]")
Fig. 3: Property (A), crop (B) and total (C) losses due to extreme climate events in the US between 1980 and 2011 based on NOAA database.
Flooding was the climate event causing the largest property losses in the US between 1980 and 2011. For crops, severe drought was unsurprisingly the leading cause of losses, followed by floods. However, crop losses were one order of magnitude below property losses. Floods therefore had the greatest total economic consequences in the US between 1980 and 2011. Flood control and protection measures may help reduce losses in the future.