Synopsis

The analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database reveals that different types of severe events are responsible for harm to the population and for damage to the economy. Tornadoes are by far the most harmful events across the country in terms of both fatalities and injuries. To identify which types of severe events have the greatest economic consequences, we consider damage to property and to crops. Although many events can heavily affect crops, damage to property is more than ten times larger than damage to crops. For this reason, the events that cause the most property damage, namely Floods, Hurricanes and Tornadoes, have the greatest impact on the economy of the country.

Data Processing

We start the data processing by reading the raw data file from the NOAA storm database. The National Weather Service Storm Data Documentation classifies severe events into 48 categories (see Table 2.1.1). However, the event names in the raw data file are often misspelled, sometimes very badly. We therefore apply a sequence of regular-expression filters to map the event names in the raw data file onto the 48 categories of Table 2.1.1. Each pattern is applied in turn to the names that have not yet been assigned, so a name goes to the first category whose pattern it matches; a second pass with looser patterns then assigns most of the remaining names.
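The first-match filtering can be illustrated on a handful of made-up names. This is a minimal sketch, not the report's code: the helper `assign_first_match` and the toy names and patterns are hypothetical. It also guards against a pattern that matches nothing, because in R `x[-integer(0)]` returns an empty vector rather than `x`.

```r
# Hypothetical helper (not part of the report's code): assign each name to the
# first pattern, in order, that matches it; return the groups and the leftovers.
assign_first_match <- function(x, patterns) {
    remaining <- x
    groups <- vector("list", length(patterns))
    for (i in seq_along(patterns)) {
        idx <- grep(patterns[i], remaining, ignore.case = TRUE)
        groups[[i]] <- remaining[idx]
        # Drop matched names only if there are any: remaining[-integer(0)] is empty
        if (length(idx) > 0) remaining <- remaining[-idx]
    }
    list(groups = groups, unmatched = remaining)
}

# Made-up raw names with the kind of variants found in EVTYPE
toy <- c("TSTM WIND", "THUNDERSTORM WINDS", "FLASH FLOOD", "FLOOD/FLASH FLOOD",
         "FLOOD", "TORNDAO", "VOLCANIC ASH")
patterns <- c("flash flood|flood.flash", "hail", "flood", "^tornado|torndao",
              "tstm|^thunderstorm")
res <- assign_first_match(toy, patterns)
res$groups[[1]]  # "FLASH FLOOD" "FLOOD/FLASH FLOOD"
res$groups[[3]]  # "FLOOD"
res$unmatched    # "VOLCANIC ASH"
```

Because the flash-flood pattern comes before the plain flood pattern, "FLOOD/FLASH FLOOD" is counted as a flash flood, so the order of the patterns matters. The pattern "hail", which matches nothing here, leaves the remaining names untouched.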

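# Run from the directory that contains the data file, e.g.
# setwd("~/Dropbox/Coursera/Reproducible-Research/PeerAssessment2")
fdata <- read.csv(bzfile(description = './repdata-data-StormData.csv.bz2'))

# Distinct raw event names, sorted (the same as levels(fdata$EVTYPE) when EVTYPE is a
# factor; since R 4.0.0 read.csv() returns character columns by default)
evts.name <- sort(unique(as.character(fdata$EVTYPE)))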
s <- c("astronomical","avalanc(h|)e","^blizzard","^(( |)coastal(| +)flood|beach)","^cold","debris")
s <- c(s,"freezing fog","smoke","drought","dust de","dust","(excessive he|extreme heat|record heat)")
s <- c(s,"cold","flash flood|flood.flash","(^flood|flooding$)","(frost|freeze)","^funnel","fog|vog","^hail","heat")
s <- c(s,"^(heavy rain|hvy rain)","lake(.*) snow","rip cu","^high wind","hurricane|typhoon","heavy snow")
s <- c(s,"ice(| )storm","(lake flood|lakeshore flood)","^( |)lightning","marine hail","marine high wind")
s <- c(s,"marine strong wind","marine t(.*) wind","surf","seiche","sleet","tide","strong wind")
s <- c(s,"(t(h|)u.der..... wind|^(| )TSTM|^thunderstorm)","^tornado|torndao","depre","tropical st","tsunami","volcanic")
s <- c(s,"wa(|y)ter(| )spout","wild(|/forest )fire","winter storm","winter|^wintry")
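# First pass: each pattern claims the matching names that are still unassigned
ev <- vector("list",length(s))
for (i in seq_along(s)) {
    idx <- grep(s[i],evts.name, ignore.case = TRUE)
    ev[[i]] <- evts.name[idx]
    # Guard: evts.name[-integer(0)] would return an empty vector
    if (length(idx) > 0) evts.name <- evts.name[-idx]
}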
# Second pass: looser patterns for the names left over, applied in this order.
# k is the index in s (and ev) of the category that receives the matches.
extra <- data.frame(
    k = c(24, 39, 4, 2, 15, 12, 9, 21, 26, 19, 39, 24, 46, 13, 17, 16, 29, 12, 3),
    pattern = c("^(| )wind", "^t(.*)winds", "^c(.*)flood|^Ero(.*)flood", "landslide|slide",
                "flood|stream|fldg", "warm", "dry", "rain|wet", "snow", "hail", "thund",
                "wind|wnd", "fire", "low te|hyp", "funnel", "ice|icy", "lig", "temp|hot", "bli"),
    stringsAsFactors = FALSE)
for (r in seq_len(nrow(extra))) {
    idx <- grep(extra$pattern[r], evts.name, ignore.case = TRUE)
    # Only update when something matched (see the guard in the first pass)
    if (length(idx) > 0) {
        ev[[extra$k[r]]] <- c(ev[[extra$k[r]]], evts.name[idx])
        evts.name <- evts.name[-idx]
    }
}

events <- c("Astronomical Low Tide","Avalanche","Blizzard","Coastal Flood","Cold/Wind Chill","Debris Flow")
events <- c(events,"Dense Fog","Dense Smoke","Drought","Dust Devil","Dust Storm","Excessive Heat")
events <- c(events,"Extreme Cold/Wind Chill","Flash Flood","Flood","Frost/Freeze","Funnel Cloud")
events <- c(events,"Freezing Fog","Hail","Heat","Heavy Rain","Heavy Snow","High Surf","High Wind")
events <- c(events,"Hurricane (Typhoon)","Ice Storm","Lake-Effect Snow","Lakeshore Flood","Lightning")
events <- c(events,"Marine Hail","Marine High Wind","Marine Strong Wind","Marine Thunderstorm Wind")
events <- c(events,"Rip Current","Seiche","Sleet","Storm Surge/Tide","Strong Wind","Thunderstorm Wind")
events <- c(events,"Tornado","Tropical Depression","Tropical Storm","Tsunami","Volcanic Ash","Waterspout")
events <- c(events,"Wildfire","Winter Storm","Winter Weather")

events.ref <- cbind(events,ev[c(1:6,18,8:17,7,19:21,26,34,24,25,27,22,28:33,23,35:48)])

pdata <- data.frame(matrix(vector(),0,10))
names(pdata) <- c("EventClass","EvType","Fatalities","Injuries","DamageProp",
                    "DamagePropExp","DamagePropCash","DamageCrop","DamageCropExp","DamageCropCash")

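Then we create a function, valuedmg, that converts the damage recorded in the raw data file into a numerical value. The raw file stores each damage as a number (PROPDMG, CROPDMG) plus a multiplier code (PROPDMGEXP, CROPDMGEXP): H for hundreds, K for thousands, M for millions and B for billions.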

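valuedmg <- function(v, vexp) {
    # Multiplier codes, each followed by its value (stored as character, converted on use)
    expv <- c("H",100,"K",1000,"M",1e6,"B",1e9)
    n <- length(v)
    valuedmg <- vector(mode = "numeric", length = n)
    # seq_len() avoids iterating over 1:0 when v is empty
    for (i in seq_len(n)) {
        if (v[i] > 0) {
            # Unrecognised codes (digits, "+", "-", "?", "", lower case) leave the value at 0
            j <- match(as.character(vexp[i]),expv)
            if (!is.na(j)) {
                valuedmg[i] <- v[i] * as.numeric(expv[j+1])
            }
        }
    }
    valuedmg
}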
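valuedmg() converts the records one at a time. The same conversion can be written with a named lookup vector that R evaluates for all records at once. This is a sketch of an alternative, not the report's code: `damage_dollars` is a hypothetical name, and it assumes the same rule as valuedmg(), namely that only the upper-case codes H, K, M and B count.

```r
# Sketch: vectorised alternative to the loop in valuedmg(). As in valuedmg(),
# only the upper-case codes H, K, M and B are recognised; any other code, and
# any missing or non-positive value, gives a damage of 0.
damage_dollars <- function(v, vexp) {
    mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    # Look up every code at once; unrecognised codes (including "") give NA
    m <- unname(mult[as.character(vexp)])
    out <- numeric(length(v))
    ok <- !is.na(m) & !is.na(v) & v > 0
    out[ok] <- v[ok] * m[ok]
    out
}

damage_dollars(c(2.5, 10, 3, 0), c("K", "M", "?", "B"))  # c(2500, 1e7, 0, 0)
```

Because `as.character()` is applied to the codes, the function accepts both factor and character columns such as PROPDMGEXP.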

We organize our data in an empty data frame with the fields EventClass, EvType, Fatalities, Injuries, DamageProp, DamagePropExp, DamagePropCash, DamageCrop, DamageCropExp and DamageCropCash. For each of the 48 categories, we copy the relevant fields of the matching records from the raw data frame into the new data frame and compute the numerical value of the damage to property and crops.
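The loop that follows grows pdata with rbind() at every iteration, which rebuilds the accumulated data frame each time. A common R alternative is to build one data frame per category with lapply() and bind them once with do.call(rbind, ...). The sketch below assumes a small made-up data set, and `lookup` is a hypothetical list standing in for events.ref.

```r
# Made-up raw records and a hypothetical category lookup (category -> raw names)
raw <- data.frame(EVTYPE = c("TSTM WIND", "TORNADO", "FLOOD", "TORNDAO"),
                  FATALITIES = c(0, 4, 1, 2),
                  stringsAsFactors = FALSE)
lookup <- list(Tornado = c("TORNADO", "TORNDAO"), Flood = "FLOOD", Hail = "HAIL")

# Build one data frame per category, then bind them once at the end
pieces <- lapply(names(lookup), function(cls) {
    idx <- raw$EVTYPE %in% lookup[[cls]]
    if (!any(idx)) return(NULL)   # no records for this category
    data.frame(EventClass = cls, EvType = raw$EVTYPE[idx],
               Fatalities = raw$FATALITIES[idx], stringsAsFactors = FALSE)
})
pdata_toy <- do.call(rbind, pieces)   # NULL pieces are dropped by rbind
pdata_toy
```

"TSTM WIND" is not listed in the toy lookup, so it is left out, just as unclassified names are left out of pdata.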

for (i in seq_along(events)) {
    # Raw records whose EVTYPE was mapped to category i
    idx <- fdata$EVTYPE %in% events.ref[[i,2]]
    # Skip categories with no records: a 1-row EventClass cannot be combined with 0-row columns
    if (!any(idx)) next
    
    # [[i,1]] extracts the category name as a string ([i,1] would return a one-element list)
    pdata.add <- data.frame(EventClass = events.ref[[i,1]],EvType = fdata$EVTYPE[idx])
    pdata.add <- cbind(pdata.add,Fatalities = fdata$FATALITIES[idx],Injuries = fdata$INJURIES[idx]
                        ,DamageProp = fdata$PROPDMG[idx],DamagePropExp = fdata$PROPDMGEXP[idx]
                        ,DamageCrop = fdata$CROPDMG[idx],DamageCropExp = fdata$CROPDMGEXP[idx])
    
    # Numerical value of the damage to property and crops
    dmg.prop <- valuedmg(pdata.add$DamageProp,pdata.add$DamagePropExp)
    dmg.crop <- valuedmg(pdata.add$DamageCrop,pdata.add$DamageCropExp)
    
    pdata.add <- cbind(pdata.add[,1:6],DamagePropCash = dmg.prop,pdata.add[,7:8],DamageCropCash = dmg.crop)
    
    pdata <- rbind(pdata,pdata.add)
}

# Totals per category (tapply orders the categories alphabetically)
n.fatalities <- with(pdata,tapply(Fatalities,EventClass,FUN = sum))
n.injuries <- with(pdata,tapply(Injuries,EventClass,FUN = sum))
dmg.prop.sum <- with(pdata,tapply(DamagePropCash,EventClass,FUN = sum))
dmg.crop.sum <- with(pdata,tapply(DamageCropCash,EventClass,FUN = sum))

sum.data <- data.frame(Event = names(n.fatalities), Fatalities = as.vector(n.fatalities),
                       Injuries = as.vector(n.injuries), DamageProperties = as.vector(dmg.prop.sum),
                       DamageCrop = as.vector(dmg.crop.sum), 
                       DamageTotal = as.vector(dmg.prop.sum) + as.vector(dmg.crop.sum))

idx1 <- order(sum.data$Fatalities, decreasing = TRUE)
idx2 <- order(sum.data$Injuries, decreasing = TRUE)
idx3 <- order(sum.data$DamageProperties, decreasing = TRUE)
idx4 <- order(sum.data$DamageCrop, decreasing = TRUE)
idx5 <- order(sum.data$DamageTotal, decreasing = TRUE)

rfat <- sum(sum.data$Fatalities[idx1[1:10]]) / sum(sum.data$Fatalities[idx1])
rinj <- sum(sum.data$Injuries[idx2[1:10]]) / sum(sum.data$Injuries[idx2])
rprop <- sum(sum.data$DamageProperties[idx3[1:10]]) / sum(sum.data$DamageProperties[idx3])
rcrop <- sum(sum.data$DamageCrop[idx4[1:10]]) / sum(sum.data$DamageCrop[idx4])
rtot <- sum(sum.data$DamageTotal[idx5[1:10]]) / sum(sum.data$DamageTotal[idx5])
percent_events <- nrow(pdata) / nrow(fdata)  # fraction (not a percentage) of raw records classified

This procedure classifies 901,597 of the 902,297 events in the raw data frame, so only 700 records (0.0776% of the original records) are left out of our analysis. We then sum the fatalities, injuries, and damage to property and crops for each of the 48 categories.
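The per-category sums rely on tapply(), which splits one column by the category labels and applies sum() to each group. A minimal sketch on made-up records (not NOAA data), assuming a category column named EventClass, followed by the arithmetic behind the 0.0776% figure using the counts quoted above:

```r
# Made-up records: one row per event, already labelled with a category
toy <- data.frame(EventClass = c("Tornado", "Flood", "Tornado", "Hail"),
                  Fatalities = c(2, 0, 1, 0),
                  stringsAsFactors = FALSE)
# tapply() sums Fatalities within each category; the result is named by
# category, in alphabetical order
n.fat <- with(toy, tapply(Fatalities, EventClass, FUN = sum))
n.fat["Tornado"]  # 3

# Share of raw records left unclassified, from the counts quoted above
unclassified_pct <- 100 * (902297 - 901597) / 902297
round(unclassified_pct, 4)  # 0.0776
```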

Results

Our analysis reveals that no single type of event is responsible both for the harm to the population and for the damage to the economy. In the next three figures we focus on which events are most harmful to the population and which ones cause the most economic damage. Note that this report ignores the economic cost of fatalities and injuries, because we are not able to quantify it. Each figure reports only the ten most significant event categories for the analysis.
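The percentages quoted in the figure captions are top-ten shares, the quantity computed as rfat, rinj, rprop, rcrop and rtot in the processing code. As a sketch, the same quantity can be written as a small function; `top_share` is a hypothetical name and the totals below are made up.

```r
# Hypothetical helper: fraction of sum(x) contributed by its k largest values
top_share <- function(x, k = 10) {
    top <- sort(x, decreasing = TRUE)[seq_len(min(k, length(x)))]
    sum(top) / sum(x)
}

# Made-up category totals
top_share(c(5, 50, 10, 30, 5), k = 2)  # (50 + 30) / 100 = 0.8
```

The `min(k, length(x))` guard keeps the function valid when there are fewer than k categories, in which case the share is 1.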

idx1 <- idx1[1:10]
idx2 <- idx2[1:10]
cexf = 1.25
par(mfrow = c(1,2), mar = c(10,4,1,0.25)+0.1, oma = c(0,0,0,0))
barplot(height = sum.data$Fatalities[idx1]/1e3, 
        names.arg = sum.data$Event[idx1],
        horiz = FALSE, las = 2,ylab = "Fatalities (Thousand)",
        cex.lab = cexf,col = "red")
par(mar = c(10,4,1,0)+0.1)
barplot(height = sum.data$Injuries[idx2]/1e3, 
        names.arg = sum.data$Event[idx2],
        horiz = FALSE, las = 2,ylab = "Injuries (Thousand)",
        col = "blue",cex.lab = cexf)

Figure 1. (Left Plot) Number of fatalities (in thousands) for the ten deadliest event categories, which together account for 87.1% of all fatalities in the database. (Right Plot) Number of injuries (in thousands) for the ten categories causing the most injuries, which together account for 92.1% of all injuries in the database.

From Fig. 1 we see that Tornadoes are the most harmful events with respect to population health. The combined number of fatalities due to heat (i.e. Excessive Heat and Heat) is only roughly half the number of deaths caused by Tornadoes. As expected, Tornadoes are also by far the events responsible for most of the injuries in the country.

idx3 <- idx3[1:10]
idx4 <- idx4[1:10]
cexf = 1.25
par(mfrow = c(1,2), mar = c(10,4,1,0.25)+0.1, oma = c(0,0,0,0))
barplot(height = sum.data$DamageProperties[idx3]/1e9, 
        names.arg = sum.data$Event[idx3],
        horiz = FALSE, las = 2,ylab = "Damage Properties (Billion)",
        cex.lab = cexf,col = "cyan")
par(mar = c(10,4,1,0)+0.1)
barplot(height = sum.data$DamageCrop[idx4]/1e9, 
        names.arg = sum.data$Event[idx4],
        horiz = FALSE, las = 2,ylab = "Damage Crop (Billion)",
        col = "green",cex.lab = cexf)

Figure 2. (Left Plot) Damage to property (in billions of US dollars) for the ten most damaging event categories, which together account for 95.5% of all property damage in the database. (Right Plot) Damage to crops (in billions of US dollars) for the ten most damaging event categories, which together account for 92.7% of all crop damage in the database.

The second figure of this report shows that property and crops are vulnerable to different severe events. Floods, Hurricanes and Tornadoes are responsible for most of the damage to property. There is probably a strong correlation between extreme events such as Hurricanes and Tornadoes and the floods they cause; however, the current analysis cannot quantify this correlation.

On the other hand, many severe events have a strong economic impact on crops. Both too little water (Drought) and too much water (Flood) can easily damage plants, and the severe conditions brought by Hurricanes and Ice Storms, as well as Hail and low temperatures (Frost/Freeze), also cause heavy crop damage. For this reason, no single event clearly stands out as having the greatest economic impact on crops.

idx5 <- idx5[1:10]
cexf = 1.25
par(mfrow = c(1,1), mar = c(10,4,1,0.25)+0.1, oma = c(0,0,0,0))
barplot(height = sum.data$DamageTotal[idx5]/1e9, 
        names.arg = sum.data$Event[idx5],
        horiz = FALSE, las = 2,ylab = "Damage Total (Billion)",
        cex.lab = cexf,col = "grey")

Figure 3. Total damage (in billions of US dollars) for the ten most damaging event categories, which together account for 92.8% of the total damage in the database.

Finally, the plot of the total damage reveals that the severe events that cause the most damage to property also have the greatest economic consequences for the country. This is not unexpected, given the difference of about an order of magnitude between the damage scales of the two plots in Fig. 2. Floods, Hurricanes and Tornadoes are the most serious events, with the greatest economic impact. Further analysis is needed to quantify the correlation between floods and these extreme meteorological events (Hurricanes, etc.).