Most harmful weather events in the US

Synopsis

The objective of this report is to identify the most harmful weather events in the US in terms of both population health and material damage. The main data source is the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which has tracked the characteristics of major storms and weather events in the US since 1950.

The weather events most harmful to human health are tornadoes, followed by heat-related events and, in third place, floods and wind events. Together, these four groups account for about 80% of the fatalities among the ten event groups analysed. Perhaps surprisingly, lightning ranks fifth.

Unsurprisingly, the events with the greatest economic impact, measured as property and crop damage, are floods and violent wind-related events such as tornadoes, hurricanes and other storms.

Data processing

The following code was used to download the NOAA database in compressed form.

setwd("C:/Users/AAB330/Google Drive 2/Training/DataScience/ReproducibleResearch/PeerAssessment_2")
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
zipfn <- "stormData.csv.bz2"
download.file(url, zipfn)
## Error: unsupported URL scheme
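
The error reported above (an unsupported URL scheme) is typical of older R builds whose default download method could not handle https addresses; whether that was the cause in this particular run is an assumption. A minimal sketch of a more defensive download step follows. `fetch_once` is a hypothetical helper, not part of the original analysis, and the `method = "libcurl"` variant assumes an R build with libcurl support.

```r
# Hypothetical helper: download a file only if it is not already present.
fetch_once <- function(url, dest, ...) {
    if (!file.exists(dest)) {
        # mode = "wb" writes the file as binary (matters on Windows)
        download.file(url, dest, mode = "wb", ...)
    }
    invisible(dest)
}

# Usage with the names defined above (commented out: it needs network access)
# fetch_once(url, zipfn)
# fetch_once(url, zipfn, method = "libcurl")  # assumption: libcurl is available
```

Skipping the download when the file already exists also keeps repeated runs of the report from fetching the same file again.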

Once downloaded, the database was uncompressed outside R with the 7-Zip program, and the resulting file was named 'stormData.csv'. The data were then loaded into R with the following commands.

fn <- "stormData.csv"
storm <- read.table(fn, sep=",", header=T)
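
The manual decompression step is not strictly required, because R can read bzip2-compressed text files directly. The sketch below demonstrates this on a tiny compressed file created on the spot; `demo_file` and its contents are illustrative stand-ins, not the storm database.

```r
# Self-contained demonstration with a tiny bzip2-compressed CSV
# (a stand-in for stormData.csv.bz2, which is not bundled here).
demo_file <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_file, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "HEAT,2,0"), con)
close(con)

# read.csv opens the compressed file through a connection and
# decompresses it on the fly, so no external tool is needed.
demo <- read.csv(demo_file, stringsAsFactors = FALSE)
demo
```

Under the same assumption, the downloaded file could be read with `read.csv(zipfn)`, which keeps the whole pipeline inside R and therefore easier to reproduce.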

Because we want to study how weather events are distributed over time, we add a new column holding the year of each event to the data set. We do this with the following code.

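library(stringr)
# BGN_DATE looks like "4/18/1950 0:00:00": take the digits between the
# last "/" and the following space, i.e. the four-digit year.
# The new column is appended by name instead of by a hard-coded index.
storm$year <- str_sub(str_extract(storm$BGN_DATE,"/[0-9]+ "),2,5)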
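
The same year can also be obtained with base R date parsing, which avoids the regular expression. The sketch below runs on a few sample strings in the BGN_DATE format; `bgn` and `year_base` are illustrative names, not part of the original analysis.

```r
# Sample values in the BGN_DATE format "month/day/year hour:min:sec"
bgn <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "6/1/2011 0:00:00")

# as.Date parses the leading date part; the trailing time is ignored
year_base <- format(as.Date(bgn, format = "%m/%d/%Y"), "%Y")
year_base
```

Parsing the dates has the side benefit that malformed entries become `NA` instead of silently yielding a wrong substring.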

Exploring data related to population health

The following code provides a ranking of the most harmful events considering the number of fatalities and injuries they cause.

harm <- with(storm, aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE + year,
                              data=storm, sum))
harm <- harm[with(harm, order(-FATALITIES, -INJURIES)),]
harm[1:10,]
##              EVTYPE year FATALITIES INJURIES
## 665            HEAT 1995        687      808
## 2320        TORNADO 2011        587     6163
## 4           TORNADO 1953        519     5131
## 1476 EXCESSIVE HEAT 1999        500     1461
## 64          TORNADO 1974        366     6824
## 37          TORNADO 1965        301     5197
## 3           TORNADO 1952        230     1915
## 2056 EXCESSIVE HEAT 2006        205      993
## 13          TORNADO 1957        193     1976
## 1356 EXCESSIVE HEAT 1998        168      633

We can then obtain, for each event type, the total over all years and the mean over the years in which it was recorded.

tharm <- with(harm, aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, data=harm, sum))
tharm <- tharm[with(tharm, order(-FATALITIES, -INJURIES)),]
mharm <- with(harm, aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, data=harm, mean))
mharm <- mharm[with(mharm, order(-FATALITIES, -INJURIES)),]
head(tharm)
##             EVTYPE FATALITIES INJURIES
## 834        TORNADO       5633    91346
## 130 EXCESSIVE HEAT       1903     6525
## 153    FLASH FLOOD        978     1777
## 275           HEAT        937     2100
## 464      LIGHTNING        816     5230
## 856      TSTM WIND        504     6957
head(mharm)
##             EVTYPE FATALITIES INJURIES
## 130 EXCESSIVE HEAT     105.72   362.50
## 834        TORNADO      90.85  1473.32
## 275           HEAT      72.08   161.54
## 278      HEAT WAVE      57.33   103.00
## 153    FLASH FLOOD      51.47    93.53
## 142   EXTREME HEAT      48.00    77.50

It is clear that several closely related phenomena are coded as different event types, such as 'heat', 'heat wave', 'excessive heat', etc. We explore the impact of merging all of these into a single group with the following code.

heat <- tharm[grep("HEAT", tharm$EVTYPE, value=F),]
apply(heat[,2:3],2, sum)
## FATALITIES   INJURIES 
##       3138       9154

We can see that the impact is substantial. Exploring other event types in the same way shows that they also need to be grouped to obtain meaningful results. We do this with the following code.

# Sum fatalities and injuries over every EVTYPE that matches each keyword.
# Note: an EVTYPE matching several keywords (e.g. "THUNDERSTORM WIND"
# matches both "WIND" and "STORM") is counted in each matching group.
summDF <- function(df, li) {
    rows <- lapply(li, function(key) {
        matched <- df[grep(key, df$EVTYPE), ]
        data.frame("Event" = key,
                   "FATALITIES" = sum(matched$FATALITIES),
                   "INJURIES" = sum(matched$INJURIES))
    })
    do.call(rbind, rows)
}
events <- list("TORNADO", "HEAT", "FLOOD", "LIGHTNING", "COLD",
               "WIND", "CURRENT", "AVALANCHE", "STORM","SNOW")
sumharm <- summDF(harm, events)
totals <- apply(sumharm[2:3], 2, sum)
sumharm[,4] <- round(sumharm[,2] / totals[1],2)
sumharm[,5] <- round(sumharm[,3] / totals[2],2)
colnames(sumharm)[4:5] <- c("Prop_Fat", "Prop_Inj")
sumharm <- sumharm[order(-sumharm[,2]),]
sumharm
##        Event FATALITIES INJURIES Prop_Fat Prop_Inj
## 1    TORNADO       5661    91407     0.39     0.68
## 2       HEAT       3138     9154     0.21     0.07
## 3      FLOOD       1523     8603     0.10     0.06
## 6       WIND       1446    11495     0.10     0.09
## 4  LIGHTNING        817     5232     0.06     0.04
## 9      STORM        633     6691     0.04     0.05
## 7    CURRENT        577      529     0.04     0.00
## 5       COLD        443      320     0.03     0.00
## 8  AVALANCHE        224      171     0.02     0.00
## 10      SNOW        167     1161     0.01     0.01

This grouping leaves a small set of event groups that reflects the impact of the different weather events on the US population more reliably than the raw event types do.
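
Because the keywords are matched independently, an event type such as 'THUNDERSTORM WIND' contributes to both the WIND and the STORM groups. One possible refinement, sketched below on toy data, assigns each event type to the first keyword it matches so that the groups do not overlap. `assign_group`, the `toy` data frame and its values are hypothetical and are not part of the original analysis.

```r
# Toy stand-in for the per-event totals (values are illustrative only)
toy <- data.frame(
    EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "HEAT WAVE",
               "THUNDERSTORM WIND", "WINTER STORM", "FOG"),
    FATALITIES = c(10, 5, 3, 4, 2, 1),
    stringsAsFactors = FALSE
)
keys <- c("TORNADO", "HEAT", "WIND", "STORM")

# Assign each EVTYPE to the first keyword it matches, or "OTHER" if none
assign_group <- function(evtype, keys) {
    group <- rep("OTHER", length(evtype))
    for (key in rev(keys)) {           # reverse order so earlier keys win
        group[grepl(key, evtype, fixed = TRUE)] <- key
    }
    group
}

toy$group <- assign_group(toy$EVTYPE, keys)
by_group <- aggregate(FATALITIES ~ group, data = toy, FUN = sum)
by_group
```

With exclusive groups the group totals add up to the overall total, so the proportions can be read as shares of all recorded fatalities. The price is that the result depends on the order of the keywords.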

Exploring data related to material damages

The source database records two quantities relevant to the impact of weather events on material assets: damage to property and damage to crops. Because each amount is stored together with a separate magnitude code (an exponent), the data set must be processed to obtain values that can be added together. The following code was used to do that.

unique(storm$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
magnitudes <- as.character(unique(storm$PROPDMGEXP))
magProp <- data.frame(cbind(magnitudes,c(3,6,0,9,6,0,0,5,6,0,4,2,3,2,7,2,0,0,0)))
colnames(magProp) <- c("Code", "Exp")
magProp$Exp <- as.numeric(as.character(magProp$Exp))
head(magProp, 5)
##   Code Exp
## 1    K   3
## 2    M   6
## 3        0
## 4    B   9
## 5    m   6
# Keep only the records that report some property or crop damage
storm2 <- storm[!(storm$PROPDMG == 0 & storm$CROPDMG == 0),]
# Property cost in USD for the first n rows of storm2: PROPDMG * 10^Exp
costDamage <- function (n) {
    val <- function(x) {10 ^ magProp[magProp[,1] == x, 2]}
    cost <- rep(0, n)
    for (i in seq_len(n)) {
        cost[i] <- storm2[i,"PROPDMG"] * val(storm2[i,"PROPDMGEXP"])
    }
    cost
}
storm2$Prop_Cost <- costDamage(nrow(storm2))
storm2[1:3,c(8,38,39)]
##    EVTYPE year Prop_Cost
## 1 TORNADO 1950     25000
## 2 TORNADO 1950      2500
## 3 TORNADO 1951     25000
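
The row-by-row loop can be slow on a data set of this size. A vectorised alternative, sketched below on toy data, looks up all exponents at once with `match()`. The `toy` and `lookup` data frames are illustrative; `lookup` only mimics the Code/Exp layout of `magProp`.

```r
# Toy records; the exponent codes mirror a few of those listed above
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3),
                  PROPDMGEXP = c("K", "K", "M", ""),
                  stringsAsFactors = FALSE)

# Lookup table with the same layout as magProp (Code, Exp)
lookup <- data.frame(Code = c("K", "M", "", "B"),
                     Exp = c(3, 6, 0, 9),
                     stringsAsFactors = FALSE)

# match() finds, for every record, the row of the lookup table holding its
# code; the whole column is then converted in one vectorised step
idx <- match(toy$PROPDMGEXP, lookup$Code)
toy$Prop_Cost <- toy$PROPDMG * 10 ^ lookup$Exp[idx]
toy
```

Applied to the real data, the same idea would read `storm2$PROPDMG * 10 ^ magProp$Exp[match(storm2$PROPDMGEXP, magProp$Code)]`, assuming `magProp` is built as shown above.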

In the same way, we process the information corresponding to damage to crops with the following code.

unique(storm$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
magnitudes <- as.character(unique(storm$CROPDMGEXP))
magCrop <- data.frame(cbind(magnitudes,c(0,6,3,6,9,0,1,3,2)))
colnames(magCrop) <- c("Code", "Exp")
magCrop$Exp <- as.numeric(as.character(magCrop$Exp))
head(magCrop, 5)
##   Code Exp
## 1        0
## 2    M   6
## 3    K   3
## 4    m   6
## 5    B   9
# Crop cost in USD for the first n rows of storm2: CROPDMG * 10^Exp
cropDamage <- function (n) {
    val <- function(x) {10 ^ magCrop[magCrop[,1] == x,2]}
    cost <- rep(0, n)
    for (i in seq_len(n)) {
        cost[i] <- storm2[i,"CROPDMG"] * val(storm2[i,"CROPDMGEXP"])
    }
    cost
}
storm2$Crop_Cost <- cropDamage(nrow(storm2))

We can now add up both property and crop damages to obtain the total impact on material assets.

totDamage <- with(storm2, aggregate(cbind(Prop_Cost, Crop_Cost)  ~ EVTYPE,
                                    data=storm2, sum))
totDamage$Tot_Cost <- totDamage$Prop_Cost + totDamage$Crop_Cost
totDamage <- totDamage[with(totDamage, order(-Tot_Cost)),]
totDamage[1:10,]
##                EVTYPE Prop_Cost Crop_Cost  Tot_Cost
## 72              FLOOD 1.447e+11 5.662e+09 1.503e+11
## 197 HURRICANE/TYPHOON 6.931e+10 2.608e+09 7.191e+10
## 354           TORNADO 5.695e+10 4.150e+08 5.736e+10
## 299       STORM SURGE 4.332e+10 5.000e+03 4.332e+10
## 116              HAIL 1.574e+10 3.026e+09 1.876e+10
## 59        FLASH FLOOD 1.682e+10 1.421e+09 1.824e+10
## 39            DROUGHT 1.046e+09 1.397e+10 1.502e+10
## 189         HURRICANE 1.187e+10 2.742e+09 1.461e+10
## 262       RIVER FLOOD 5.119e+09 5.029e+09 1.015e+10
## 206         ICE STORM 3.945e+09 5.022e+09 8.967e+09

Results

Most harmful weather events to human health

The weather events most harmful to human health are tornadoes, followed by heat-related events and, in third place, floods and wind events. Together, these four groups account for about 80% of the fatalities among the ten event groups analysed. Perhaps surprisingly, lightning ranks fifth.

The following figure shows the events most harmful to population health, preceded by the code that produces it.

library(ggplot2)
df <- data.frame(sumharm, row.names=NULL) 
ggplot(data = df, aes(x = Event, y = FATALITIES)) +
    geom_bar(color="blue", fill="blue", stat="identity") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Event Type") + ylab("Fatalities") +
    ggtitle("NOAA Top 10 - Fatality Count, 1950-2011")

[Figure: bar chart "NOAA Top 10 - Fatality Count, 1950-2011", fatalities by event type]
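
By default ggplot2 arranges the bars in alphabetical order of the event names. To show them in decreasing order of fatalities, the event column can be turned into a factor whose levels follow the counts. The sketch below uses toy values, not the report's results, and only builds the plot if ggplot2 is installed.

```r
# Toy totals (illustrative values, not the report's results)
toy <- data.frame(Event = c("FLOOD", "HEAT", "TORNADO"),
                  FATALITIES = c(3, 5, 9),
                  stringsAsFactors = FALSE)

# reorder() turns Event into a factor whose levels follow decreasing FATALITIES
toy$Event <- reorder(toy$Event, -toy$FATALITIES)
levels(toy$Event)

# Build the plot only if ggplot2 is installed; bars then follow the level order
if (requireNamespace("ggplot2", quietly = TRUE)) {
    p <- ggplot2::ggplot(toy, ggplot2::aes(x = Event, y = FATALITIES)) +
        ggplot2::geom_bar(stat = "identity")
}
```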

Weather events with greatest economic impact

The events with the greatest economic impact, measured as property and crop damage, are floods and violent wind-related events such as tornadoes, hurricanes and other storms.

The following figure shows the weather events with the greatest economic impact, preceded by the code that produces it.

library(ggplot2)
df <- totDamage[1:10,]
df$Tot_Cost <- df$Tot_Cost / 10**9
ggplot(data = df, aes(x = EVTYPE, y = Tot_Cost)) +
    geom_bar(color="blue", fill="blue", stat="identity") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Event Type") + ylab("Damages") +
    ggtitle("NOAA Top 10 - Damages, 1950-2011 [Billions USD]")

[Figure: bar chart "NOAA Top 10 - Damages, 1950-2011 [Billions USD]", total damages by event type]