The objective of this report is to identify the weather events in the US that are most harmful to population health and that cause the greatest material damage. The main data source is the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which has tracked the characteristics of major storms and weather events in the US since 1950.
The weather events most harmful to human health are tornadoes, followed by heat-related events and, in third place, flood and wind events. Together, these four groups account for about 80% of the fatalities across the ten event groups analysed. Curiously, lightning events, as unusual as they might seem, appear in fifth position.
Unsurprisingly, the events with the greatest economic impact in property and crop damage are floods and violent wind-related events, such as tornadoes, hurricanes and other storms.
The following code was used to download the NOAA database in compressed form.
setwd("C:/Users/AAB330/Google Drive 2/Training/DataScience/ReproducibleResearch/PeerAssessment_2")
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
zipfn <- "stormData.csv.bz2"
download.file(url, zipfn)
## Error: unsupported URL scheme
After downloading the database, it was uncompressed with the 7-Zip program, outside R. The resulting file was named 'stormData.csv'. The data were loaded into R with the following commands.
fn <- "stormData.csv"
# Read the comma-separated file; the first row holds the column names
storm <- read.table(fn, sep = ",", header = TRUE)
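R can read bzip2-compressed text files directly, so the manual 7-Zip step is optional. The following is a minimal sketch, not the procedure used for this report: the helper name `load_storm` is hypothetical, and `method = "libcurl"` is an assumption that may help on setups where the default download method rejects https URLs. The demonstration uses a tiny temporary file instead of the real database.

```r
# Hypothetical helper: download the compressed file if it is missing,
# then read it without uncompressing it by hand.
load_storm <- function(url, zipfn) {
  if (!file.exists(zipfn)) {
    # mode = "wb" keeps the binary file intact on Windows;
    # method = "libcurl" is an assumption for https support
    download.file(url, zipfn, mode = "wb", method = "libcurl")
  }
  # read.csv opens bzip2-compressed files transparently
  read.csv(zipfn, stringsAsFactors = FALSE)
}

# Offline demonstration with a tiny bzip2-compressed file
demo_fn <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_fn, "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HEAT,1"), con)
close(con)
# The file already exists, so the URL is never used here
demo <- load_storm("unused", demo_fn)
```

With the real data, the call would be `storm <- load_storm(url, zipfn)`, at the cost of a slower read than from the uncompressed file.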
We want to study how weather events are distributed over time, so we add a new column holding the year of each event to the data set, using the following code.
library(stringr)
storm[,38] <- str_sub(str_extract(storm$BGN_DATE,"/[0-9]+ "),2,5)
colnames(storm)[38] <- "year"
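The same year can also be obtained by parsing the date rather than by pattern matching. This is a sketch under the assumption that `BGN_DATE` holds strings of the form month/day/year followed by a time; the two sample strings are illustrative, not taken from the database.

```r
# Illustrative date strings in the format assumed for BGN_DATE
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# as.Date() parses as far as the format requires and ignores the trailing time
parsed <- as.Date(as.character(bgn_date), format = "%m/%d/%Y")

# Keep the year as an integer so it sorts and plots numerically
year <- as.integer(format(parsed, "%Y"))
```

Unlike the regular-expression version, this yields a numeric year and returns `NA` for strings that are not valid dates.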
The following code ranks event types, year by year, according to the number of fatalities and injuries they cause.
harm <- with(storm, aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE + year,
data=storm, sum))
harm <- harm[with(harm, order(-FATALITIES, -INJURIES)),]
harm[1:10,]
## EVTYPE year FATALITIES INJURIES
## 665 HEAT 1995 687 808
## 2320 TORNADO 2011 587 6163
## 4 TORNADO 1953 519 5131
## 1476 EXCESSIVE HEAT 1999 500 1461
## 64 TORNADO 1974 366 6824
## 37 TORNADO 1965 301 5197
## 3 TORNADO 1952 230 1915
## 2056 EXCESSIVE HEAT 2006 205 993
## 13 TORNADO 1957 193 1976
## 1356 EXCESSIVE HEAT 1998 168 633
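We can also obtain, for each event type, the totals over all years and the averages per year. The averages are taken over the years in which the event type appears in the data.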
tharm <- with(harm, aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, data=harm, sum))
tharm <- tharm[with(tharm, order(-FATALITIES, -INJURIES)),]
mharm <- with(harm, aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, data=harm, mean))
mharm <- mharm[with(mharm, order(-FATALITIES, -INJURIES)),]
head(tharm)
## EVTYPE FATALITIES INJURIES
## 834 TORNADO 5633 91346
## 130 EXCESSIVE HEAT 1903 6525
## 153 FLASH FLOOD 978 1777
## 275 HEAT 937 2100
## 464 LIGHTNING 816 5230
## 856 TSTM WIND 504 6957
head(mharm)
## EVTYPE FATALITIES INJURIES
## 130 EXCESSIVE HEAT 105.72 362.50
## 834 TORNADO 90.85 1473.32
## 275 HEAT 72.08 161.54
## 278 HEAT WAVE 57.33 103.00
## 153 FLASH FLOOD 51.47 93.53
## 142 EXTREME HEAT 48.00 77.50
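The difference between the two rankings comes from the denominator of the yearly mean, which only counts years in which an event type was recorded. A small sketch on made-up data (the event names `A` and `B` and all figures are hypothetical) shows the effect:

```r
# Made-up records: event A appears in two years, event B in a single year
toy <- data.frame(EVTYPE = c("A", "A", "A", "B", "B"),
                  year = c(2000, 2000, 2001, 2001, 2001),
                  FATALITIES = c(1, 2, 3, 10, 20))

# Step 1: yearly sums per event type, as in the report
yearly <- aggregate(FATALITIES ~ EVTYPE + year, data = toy, sum)

# Step 2: totals and means of those yearly sums
total <- aggregate(FATALITIES ~ EVTYPE, data = yearly, sum)
avg <- aggregate(FATALITIES ~ EVTYPE, data = yearly, mean)
```

Here `B` has a mean of 30 although it occurs in one year only, which is why event types with few recorded years, such as 'HEAT WAVE', can rank high in the table of averages.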
It is clear that several closely related phenomena are coded as different event types, such as 'HEAT', 'HEAT WAVE', 'EXCESSIVE HEAT', etc. We explore the impact of combining all of these into a single group with the following code.
heat <- tharm[grep("HEAT", tharm$EVTYPE, value=F),]
apply(heat[,2:3],2, sum)
## FATALITIES INJURIES
## 3138 9154
We can see that the impact is substantial. Exploring other event types in the same way shows that they need to be grouped to obtain meaningful results. We do this with the following code.
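# Sum fatalities and injuries over all event types whose name matches each pattern in li
summDF <- function(df, li) {
  sumdf <- data.frame("Event" = "", "FATALITIES" = 0, "INJURIES" = 0)[-1,]
  for (n in seq_along(li)) {
    # Rows of df whose EVTYPE contains the n-th pattern
    rows <- df[grep(li[[n]], df$EVTYPE, value=FALSE),]
    sumdf <- rbind(sumdf, data.frame("Event" = li[[n]],
                                     "FATALITIES" = sum(rows$FATALITIES),
                                     "INJURIES" = sum(rows$INJURIES)))
  }
  sumdf
}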
events <- list("TORNADO", "HEAT", "FLOOD", "LIGHTNING", "COLD",
"WIND", "CURRENT", "AVALANCHE", "STORM","SNOW")
sumharm <- summDF(harm, events)
totals <- apply(sumharm[2:3], 2, sum)
sumharm[,4] <- round(sumharm[,2] / totals[1],2)
sumharm[,5] <- round(sumharm[,3] / totals[2],2)
colnames(sumharm)[4:5] <- c("Prop_Fat", "Prop_Inj")
sumharm <- sumharm[order(-sumharm[,2]),]
sumharm
## Event FATALITIES INJURIES Prop_Fat Prop_Inj
## 1 TORNADO 5661 91407 0.39 0.68
## 2 HEAT 3138 9154 0.21 0.07
## 3 FLOOD 1523 8603 0.10 0.06
## 6 WIND 1446 11495 0.10 0.09
## 4 LIGHTNING 817 5232 0.06 0.04
## 9 STORM 633 6691 0.04 0.05
## 7 CURRENT 577 529 0.04 0.00
## 5 COLD 443 320 0.03 0.00
## 8 AVALANCHE 224 171 0.02 0.00
## 10 SNOW 167 1161 0.01 0.01
We have now arrived at a small set of event groups that show more reliably the impact of the different weather events on the US population. Note that the proportions are relative to the totals of these ten groups, and that an event type whose name matches more than one pattern (for example, 'THUNDERSTORM WIND' matches both 'WIND' and 'STORM') is counted in each of them.
Our source database offers two measures of the impact of weather events on material assets: damage to property and damage to crops. As the amounts are recorded with separate magnitude codes, we need to process the data set to obtain values that can be added together. The following code was used to do that.
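Because the grouping relies on substring matching, one event type can fall into several groups. The sketch below shows one way to detect such overlaps and, as one possible remedy, to assign each event type to the first matching pattern only. The short list of event names is illustrative, and the first-match rule is an assumption rather than the method used in this report.

```r
# Illustrative event type labels and a subset of the grouping patterns
evtypes <- c("TORNADO", "THUNDERSTORM WIND", "COLD/WIND CHILL",
             "EXCESSIVE HEAT", "FLASH FLOOD")
patterns <- c("TORNADO", "HEAT", "FLOOD", "COLD", "WIND", "STORM")

# Logical matrix: one row per event type, one column per pattern
hits <- sapply(patterns, function(p) grepl(p, evtypes, fixed = TRUE))
rownames(hits) <- evtypes

# Event types that would be counted in more than one group
n_groups <- rowSums(hits)
overlapping <- names(n_groups[n_groups > 1])

# Possible remedy (assumption): keep only the first matching pattern
first_group <- patterns[apply(hits, 1, function(r) which(r)[1])]
```

With a first-match rule the order of the patterns matters, so more specific patterns would need to come before broader ones such as 'WIND' or 'STORM'.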
unique(storm$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
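magnitudes <- as.character(unique(storm$PROPDMGEXP))
# Power of ten assigned to each code, in the order returned by unique() above
# (K = 3, M = 6, blank = 0, B = 9, ...); note that the codes "1" and "8" are mapped to 0
magProp <- data.frame(cbind(magnitudes,c(3,6,0,9,6,0,0,5,6,0,4,2,3,2,7,2,0,0,0)))
colnames(magProp) <- c("Code", "Exp")
magProp$Exp <- as.numeric(as.character(magProp$Exp))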
head(magProp, 5)
## Code Exp
## 1 K 3
## 2 M 6
## 3 0
## 4 B 9
## 5 m 6
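# Property damage in dollars for the first n rows of storm2: PROPDMG times 10^exponent
costDamage <- function (n) {
  # Power of ten for a magnitude code, looked up in magProp
  val <- function(x) {10 ^ magProp[magProp[,1] == x, 2]}
  cost <- rep(0, n)
  for (i in seq_len(n)) {
    cost[i] <- storm2[i,"PROPDMG"] * val(storm2[i,"PROPDMGEXP"])
  }
  cost
}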
storm2 <- storm[!(storm$PROPDMG == 0 & storm$CROPDMG == 0),]
cost <- costDamage(nrow(storm2))
storm2[,39] <- cost
colnames(storm2)[39] <- "Prop_Cost"
storm2[1:3,c(8,38,39)]
## EVTYPE year Prop_Cost
## 1 TORNADO 1950 25000
## 2 TORNADO 1950 2500
## 3 TORNADO 1951 25000
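The row-by-row loop can be slow on a data set of this size. A vectorized lookup with `match()` computes the same product in one step. This is a sketch only: the helper name `damage_cost` is hypothetical, the toy amounts are made up, and the code table repeats just the first five rows of `magProp`.

```r
# Hypothetical vectorized helper: amount * 10^exponent, looked up by code
damage_cost <- function(amount, code, codes, exps) {
  # match() returns NA for codes missing from the table, giving an NA cost
  amount * 10^exps[match(as.character(code), codes)]
}

# Code table: first five rows of magProp (blank code means exponent 0)
codes <- c("K", "M", "", "B", "m")
exps <- c(3, 6, 0, 9, 6)

# Made-up rows for illustration
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 1.2),
                  PROPDMGEXP = c("K", "K", "", "B"),
                  stringsAsFactors = FALSE)
toy$Prop_Cost <- damage_cost(toy$PROPDMG, toy$PROPDMGEXP, codes, exps)
```

On the real data, the equivalent call would pass `storm2$PROPDMG`, `storm2$PROPDMGEXP` and the two columns of `magProp`, and the same helper could serve for crop damage with `magCrop`.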
In the same way, we process the information corresponding to damage to crops with the following code.
unique(storm$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
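magnitudes <- as.character(unique(storm$CROPDMGEXP))
# Power of ten assigned to each code, in the order returned by unique() above
# (blank = 0, M = 6, K = 3, ...); note that the code "0" is mapped to 1 here, but to 0 for property damage
magCrop <- data.frame(cbind(magnitudes,c(0,6,3,6,9,0,1,3,2)))
colnames(magCrop) <- c("Code", "Exp")
magCrop$Exp <- as.numeric(as.character(magCrop$Exp))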
head(magCrop, 5)
## Code Exp
## 1 0
## 2 M 6
## 3 K 3
## 4 m 6
## 5 B 9
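# Crop damage in dollars for the first n rows of storm2: CROPDMG times 10^exponent
cropDamage <- function (n) {
  # Power of ten for a magnitude code, looked up in magCrop
  val <- function(x) {10 ^ magCrop[magCrop[,1] == x,2]}
  cost <- rep(0, n)
  for (i in seq_len(n)) {
    cost[i] <- storm2[i,"CROPDMG"] * val(storm2[i,"CROPDMGEXP"])
  }
  cost
}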
cost <- cropDamage(nrow(storm2))
storm2[,40] <- cost
colnames(storm2)[40] <- "Crop_Cost"
We can now add up both property and crop damages to obtain the total impact on material assets.
totDamage <- with(storm2, aggregate(cbind(Prop_Cost, Crop_Cost) ~ EVTYPE,
data=storm2, sum))
totDamage$Tot_Cost <- totDamage$Prop_Cost + totDamage$Crop_Cost
totDamage <- totDamage[with(totDamage, order(-Tot_Cost)),]
totDamage[1:10,]
## EVTYPE Prop_Cost Crop_Cost Tot_Cost
## 72 FLOOD 1.447e+11 5.662e+09 1.503e+11
## 197 HURRICANE/TYPHOON 6.931e+10 2.608e+09 7.191e+10
## 354 TORNADO 5.695e+10 4.150e+08 5.736e+10
## 299 STORM SURGE 4.332e+10 5.000e+03 4.332e+10
## 116 HAIL 1.574e+10 3.026e+09 1.876e+10
## 59 FLASH FLOOD 1.682e+10 1.421e+09 1.824e+10
## 39 DROUGHT 1.046e+09 1.397e+10 1.502e+10
## 189 HURRICANE 1.187e+10 2.742e+09 1.461e+10
## 262 RIVER FLOOD 5.119e+09 5.029e+09 1.015e+10
## 206 ICE STORM 3.945e+09 5.022e+09 8.967e+09
The weather events most harmful to human health are tornadoes, followed by heat-related events and, in third place, flood and wind events. Together, these four groups account for about 80% of the fatalities across the ten event groups analysed. Curiously, lightning events, as unusual as they might seem, appear in fifth position.
The following plot shows the events most harmful to population health. The code to produce it is shown below.
library(ggplot2)
df <- data.frame(sumharm, row.names=NULL)
# Refer to columns by bare name inside aes() rather than with df$
ggplot(data = df, aes(x = Event, y = FATALITIES)) +
geom_bar(color="blue", fill="blue", stat="identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event Type") + ylab("Fatalities") +
ggtitle("NOAA Top 10 - Fatality Count, 1950-2011")
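Depending on how `Event` is stored, ggplot2 arranges the bars by factor level or alphabetically, not necessarily by size. A sketch of one way to sort them by decreasing fatalities follows; it uses four rows copied from the `sumharm` table, and the plotting part only runs if ggplot2 is installed.

```r
# Four rows copied from the sumharm table, deliberately unsorted
df <- data.frame(Event = c("FLOOD", "HEAT", "TORNADO", "WIND"),
                 FATALITIES = c(1523, 3138, 5661, 1446))

# Order the factor levels by decreasing number of fatalities
df$Event <- reorder(df$Event, -df$FATALITIES)

# Build the plot only if ggplot2 is available; print(p) would draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = Event, y = FATALITIES)) +
    ggplot2::geom_col(fill = "blue") +
    ggplot2::labs(x = "Event Type", y = "Fatalities")
}
```

The same `reorder()` call, applied to `EVTYPE` and `Tot_Cost`, would sort the bars of a damages plot.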
The events with the greatest economic impact in property and crop damage are floods and violent wind-related events, such as tornadoes, hurricanes and other storms.
The following graph shows the weather events with the greatest economic impact. The code to produce it is shown below.
library(ggplot2)
df <- totDamage[1:10,]
df$Tot_Cost <- df$Tot_Cost / 10**9
# Refer to columns by bare name inside aes() rather than with df$
ggplot(data = df, aes(x = EVTYPE, y = Tot_Cost)) +
geom_bar(color="blue", fill="blue", stat="identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event Type") + ylab("Damages") +
ggtitle("NOAA Top 10 - Damages, 1950-2011 [Billions USD]")