Report from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

Here we explore the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The records cover the years 1950 to 2011.

Data Processing

The data are somewhat messy: 902,297 rows and 37 columns, stored as a bzip2-compressed CSV file.

data = read.csv(bzfile("./repdata_data_StormData.csv.bz2"))

The data were processed to understand which types of events are most harmful with respect to population health. First, I selected only the rows corresponding to events that led to fatalities or injuries. I then summed the fatalities and injuries over all events of the same type.

datanozero = data[data$FATALITIES > 0 | data$INJURIES > 0, ]
datanozero$EVTYPE = tolower(datanozero$EVTYPE)
agg.f = aggregate(FATALITIES ~ EVTYPE, data = datanozero, FUN = sum)
agg.i = aggregate(INJURIES ~ EVTYPE, data = datanozero, FUN = sum)
merged.fi = merge(agg.f, agg.i, by = "EVTYPE")
ordered.fatal = merged.fi[order(merged.fi$FATALITIES, decreasing = T), ]
ordered.injur = merged.fi[order(merged.fi$INJURIES, decreasing = T), ]
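
The aggregation step above can also be written as a single call. The following is a minimal sketch on made-up toy data, not the report's own code: `toy`, `toy.agg` and `toy.ranked` are hypothetical names, and the numbers are invented for illustration only.

```r
# Toy data standing in for the storm table (values are made up)
toy = data.frame(
    EVTYPE = c("TORNADO", "tornado", "Heat", "HEAT", "flood"),
    FATALITIES = c(3, 1, 5, 0, 2),
    INJURIES = c(10, 4, 0, 7, 1)
)

# Normalize the case so that "TORNADO" and "tornado" fall into the same group
toy$EVTYPE = tolower(toy$EVTYPE)

# cbind() on the left-hand side sums both columns per event type in one call,
# which replaces the two aggregate() calls followed by merge()
toy.agg = aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank the event types by fatalities, highest first
toy.ranked = toy.agg[order(toy.agg$FATALITIES, decreasing = TRUE), ]
toy.ranked
```

Because both sums come from the same call, the rows are already aligned by event type and no merge is needed.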

I plotted a figure with two panels showing the top-six phenomena resulting in the highest number of fatalities (left) and injuries (right).

require(RColorBrewer)
## Loading required package: RColorBrewer
par(mfrow = c(1, 2), mar = c(8, 4, 3, 1))
barplot(ordered.fatal$FATALITIES[1:6]/1000, col = brewer.pal(6, "Blues"), names.arg = ordered.fatal$EVTYPE[1:6], 
    las = 2, main = "Main Causes for Fatalities", ylim = c(0, 6), ylab = "number of fatalities (thousands)")
abline(h = 0)
barplot(ordered.injur$INJURIES[1:6]/1000, col = brewer.pal(6, "Blues"), names.arg = ordered.injur$EVTYPE[1:6], 
    las = 2, main = "Main Causes for Injuries", ylim = c(0, 100), yaxt = "n", 
    ylab = "number of injuries (thousands)")
axis(side = 2, at = c(0, 20, 40, 60, 80, 100), labels = format(c(0, 20, 40, 
    60, 80, 100), scientific = FALSE), las = 2)
abline(h = 0)

Finally, I identified the single event responsible for the highest number of fatalities between 1950 and 2011.

# Column 2 stores the date as month/day/year, so it is parsed with "%m/%d/%Y"
paste(tolower(as.character(data[data$FATALITIES == max(data$FATALITIES), 8])), 
    "was responsible for", max(data$FATALITIES), "deaths in", as.character(data[data$FATALITIES == 
        max(data$FATALITIES), 11]), "on", as.Date(data[data$FATALITIES == max(data$FATALITIES), 
        2], "%m/%d/%Y"))
## [1] "heat was responsible for 583 deaths in Northeast Illinois on 1995-07-12"

I did the same for the event responsible for the highest number of injuries.

# Column 2 stores the date as month/day/year, so it is parsed with "%m/%d/%Y"
paste(tolower(as.character(data[data$INJURIES == max(data$INJURIES), 8])), "was responsible for", 
    max(data$INJURIES), "injuries in", as.character(data[data$INJURIES == max(data$INJURIES), 
        6]), "on", as.Date(data[data$INJURIES == max(data$INJURIES), 2], "%m/%d/%Y"))
## [1] "tornado was responsible for 1700 injuries in WICHITA on 1979-04-10"
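
Both lookups above follow the same pattern: find the row holding the maximum of one column, then report other fields from that row. A minimal sketch of that pattern follows, using made-up toy data. The helper `worst_rows` and the data frame `toy` are hypothetical names introduced here for illustration; only the column names mirror the storm table.

```r
# Toy data (made-up values); column names mirror the storm table
toy = data.frame(
    EVTYPE = c("TORNADO", "HEAT", "FLOOD"),
    COUNTYNAME = c("ALPHA", "BETA", "GAMMA"),
    FATALITIES = c(12, 40, 7),
    stringsAsFactors = FALSE
)

# Hypothetical helper: return the row(s) holding the maximum of one column.
# which() drops NA comparisons, and every tied row is returned.
worst_rows = function(df, column) {
    df[which(df[[column]] == max(df[[column]], na.rm = TRUE)), , drop = FALSE]
}

# Extract the row once, then build the sentence from its named columns
worst = worst_rows(toy, "FATALITIES")
msg = paste(tolower(worst$EVTYPE), "was responsible for", worst$FATALITIES, 
    "deaths in", worst$COUNTYNAME)
msg
```

Extracting the row once and referring to columns by name avoids repeating the filter expression, and it guarantees that every reported field comes from the same row of the same table.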

The second part of the analysis aimed to understand which types of disasters have the greatest economic consequences. First, I selected only the rows corresponding to events that caused damage to property or crops.

datanozero = data[data$PROPDMG > 0 | data$CROPDMG > 0, ]
datanozero$PROPDMGEXP = as.character(datanozero$PROPDMGEXP)
datanozero$CROPDMGEXP = as.character(datanozero$CROPDMGEXP)
datanozero$EVTYPE = tolower(datanozero$EVTYPE)

To get the total cost in dollars, I multiplied the PROPDMG and CROPDMG columns by the multipliers encoded in the corresponding PROPDMGEXP and CROPDMGEXP columns. The EXP columns contain: digits (0-9), giving the power of 10 to multiply by; B, for billions; M, for millions; K, for thousands; and H, for hundreds (sometimes in lower case). The remaining values (+, -, ? and the empty string) are ambiguous, so they were treated like the digit 0, that is, as a multiplier of 1.

substitute_exponent = function(x) {
    for (i in seq_along(x)) {
        if (x[i] == "B") {
            x[i] = 1e+09
        }
        if (x[i] == "K" || x[i] == "3" || x[i] == "k") {
            x[i] = 1000
        }
        if (x[i] == "M" || x[i] == "m" || x[i] == "6") {
            x[i] = 1e+06
        }
        if (x[i] == "H" || x[i] == "h" || x[i] == "2") {
            x[i] = 100
        }
        if (x[i] == "" || x[i] == "-" || x[i] == "0" || x[i] == "?" || x[i] == 
            "+") {
            x[i] = "zero"
        }
        if (x[i] == "1") {
            x[i] = 10
        }
        if (x[i] == "4") {
            x[i] = 10000
        }
        if (x[i] == "5") {
            x[i] = 1e+05
        }
        if (x[i] == "7") {
            x[i] = 1e+07
        }
        if (x[i] == "8") {
            x[i] = 1e+08
        }
    }
    return(x)
}

b = datanozero$CROPDMGEXP
b = substitute_exponent(b)
b[b == "zero"] = "1"

datanozero$CROPDMGEXP = as.numeric(b)
datanozero$CROPCOST = datanozero$CROPDMG * datanozero$CROPDMGEXP

a = datanozero$PROPDMGEXP
a = substitute_exponent(a)
a[a == "zero"] = "1"
datanozero$PROPDMGEXP = as.numeric(a)
datanozero$PROPCOST = datanozero$PROPDMG * datanozero$PROPDMGEXP
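
The code-to-multiplier mapping described above can also be expressed as a named lookup vector instead of a loop. This is a hedged sketch rather than a drop-in replacement: `exp_multiplier` is a hypothetical name, and it makes two assumptions that differ from the loop, namely that letter codes are matched case-insensitively (so a lowercase b also counts as billions) and that any code missing from the table gets a multiplier of 1.

```r
# Hypothetical vectorized alternative to the loop above
exp_multiplier = function(x) {
    # Assumption: letter codes are matched case-insensitively
    x = toupper(as.character(x))
    # Named lookup: letter codes first, then the digits 0-8 as powers of ten
    lookup = c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, 
        setNames(10^(0:8), as.character(0:8)))
    out = unname(lookup[x])
    # Assumption: "", "+", "-", "?" and any other unlisted code multiply by 1
    out[is.na(out)] = 1
    out
}

mult = exp_multiplier(c("K", "m", "B", "", "?", "5", "0"))
mult

# Usage: a recorded damage of 2.5 with exponent code "M"
cost = 2.5 * exp_multiplier("M")
```

The function returns a numeric vector directly, so no intermediate "zero" marker or as.numeric() conversion is needed.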

I then summed the costs over all events of the same type.

agg = aggregate(PROPCOST ~ EVTYPE, data = datanozero, FUN = sum)
agg_2 = aggregate(CROPCOST ~ EVTYPE, data = datanozero, FUN = sum)
merged = merge(agg_2, agg, by = "EVTYPE")
merged$Total_cost = merged$PROPCOST + merged$CROPCOST
ordered = merged[order(merged$Total_cost, merged$PROPCOST, merged$CROPCOST, 
    decreasing = T), ]
ordered$prop.percentage = ordered$PROPCOST/sum(ordered$PROPCOST) * 100
ordered$crop.percentage = ordered$CROPCOST/sum(ordered$CROPCOST) * 100

I plotted a figure with three panels showing the five event types with the highest costs, in millions of dollars (upper panel), and the share of the total cost attributable to each event type, both for property damage (bottom left) and for crop damage (bottom right).

nf <- layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE))
par(mar = c(7, 8, 3, 1))
barplot(cbind(t(as.matrix(ordered[1:5, 2:4]/1e+06)), c(sum(ordered[6:length(ordered$EVTYPE), 
    2])/1e+06, sum(ordered[6:length(ordered$EVTYPE), 3])/1e+06, sum(ordered[6:length(ordered$EVTYPE), 
    4])/1e+06)), col = rep(c("cyan", "light blue", "dark blue"), 6), names = c(as.character(ordered$EVTYPE[1:5]), 
    "all others"), horiz = T, las = 1, cex.axis = 1, beside = T, xlab = "millions ($)", 
    xlim = c(0, 160000), legend.text = c("Crop Cost", "Property Cost", "Total Cost"), 
    args.legend = list(x = 150000, y = 15), main = "Weather events with the greatest economic consequences", 
    cex.main = 1.1)
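
The panels of this figure group everything outside the top five into an "all others" category. That grouping can be sketched as a small helper. The function `top_n_with_others` and the costs below are hypothetical and made up for illustration; they are not taken from the storm data.

```r
# Hypothetical helper: keep the n largest values and lump the rest together
top_n_with_others = function(values, labels, n = 5) {
    ord = order(values, decreasing = TRUE)
    values = values[ord]
    labels = as.character(labels[ord])
    # Nothing to lump when there are at most n categories
    if (length(values) <= n) {
        return(setNames(values, labels))
    }
    c(setNames(values[1:n], labels[1:n]), `all others` = sum(values[-(1:n)]))
}

# Made-up costs for illustration only
cost = c(flood = 50, hail = 5, drought = 20, tornado = 30, wind = 2, frost = 1, fire = 3)
shares = top_n_with_others(cost, names(cost), n = 5)

# Percentage contributed by each slice
pct = round(shares/sum(shares) * 100, 1)
pct
```

The result is a named vector, so it can be passed to pie() or barplot() with names(shares) as the labels. Sorting and summing the same vector inside one function keeps the "all others" total consistent with the ordering used for the top slices.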

ordered.prop = merged[order(merged$PROPCOST, decreasing = T), ]
par(mar = c(0, 6, 1, 4))
pie(c(ordered.prop$PROPCOST[1:5], sum(ordered.prop$PROPCOST[6:nrow(ordered.prop)]))/1e+06, 
    col = brewer.pal(6, "Blues"), labels = c(as.character(ordered.prop$EVTYPE[1:5]), 
        "all others"), main = "Main Causes of Property Damage", cex.main = 1.1)

par(mar = c(0, 6, 1, 4))
ordered.crop = merged[order(merged$CROPCOST, decreasing = T), ]
pie(c(ordered.crop$CROPCOST[1:5], sum(ordered.crop$CROPCOST[6:nrow(ordered.crop)]))/1e+06, 
    col = brewer.pal(6, "Blues"), labels = c(as.character(ordered.crop$EVTYPE[1:5]), 
        "all others"), main = "Main Causes of Crop Damage", cex.main = 1.1)

Similarly, I identified the single event responsible for the highest property damage between 1950 and 2011.

# Take the matching rows from datanozero itself: the filter is computed on datanozero, 
# so applying it to the full data table would select unrelated rows
worst.prop = datanozero[which(datanozero$PROPCOST == max(datanozero$PROPCOST)), ]
paste(tolower(as.character(worst.prop[, 8])), "was responsible for a damage of", 
    max(datanozero$PROPCOST)/10^9, "billion dollars in", nrow(worst.prop), "occasion(s); one of which happened on", 
    as.Date(worst.prop[, 2], "%m/%d/%Y")[1], "in", as.character(worst.prop[, 7])[1])

I did the same for crop damage.

# As above, the matching rows are taken from datanozero, not from the full data table
worst.crop = datanozero[which(datanozero$CROPCOST == max(datanozero$CROPCOST)), ]
paste(tolower(as.character(worst.crop[, 8]))[1], "and", tolower(as.character(worst.crop[, 8]))[2], 
    "were responsible for a damage of", max(datanozero$CROPCOST)/10^9, "billion dollars in", 
    nrow(worst.crop), "occasions; one of which was the", tolower(as.character(worst.crop[, 8]))[2], 
    "that happened on", as.Date(worst.crop[, 2], "%m/%d/%Y")[2], "in", as.character(worst.crop[, 6])[2])

Results

Across the United States, which types of events are most harmful with respect to population health?

The weather event responsible for the highest number of both fatalities (5,633) and injuries (91,346) is the tornado. Other major causes are excessive heat, flood, lightning and thunderstorm wind (tstm wind), but these played a smaller role over the decades studied.

Figure (chunk plotting.q1): Main Causes for Fatalities (left) and Main Causes for Injuries (right).

Interestingly, the single event responsible for the highest number of fatalities is an intense heat wave that affected northern Illinois from Wednesday, July 12 through Sunday, July 16, 1995. The combined and cumulative effects of several days of high temperatures (with peaks of 40 degrees Celsius), high humidity and intense July sunshine resulted in the deaths of 583 people in Chicago and surrounding areas. The single event responsible for the highest number of injuries, on the other hand, is the tornado that hit Wichita County, Texas, injuring 1,700 people.

Across the United States, which types of events have the greatest economic consequences?

Adverse weather events can damage both property and agricultural crops. Overall, the greatest cost comes from property damage. The main cause is flood, followed by hurricane/typhoon, tornado, storm surge and hail (upper panel).

Different kinds of adverse events contribute different shares of the damage to property (bottom left) and to crops (bottom right). While floods are the main cause of property damage, with almost 145 billion dollars over the period studied, crop damage is spread more evenly across many event types, the most harmful being drought (approximately 14 billion dollars).

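Figure (chunk plotting.q2): Weather events with the greatest economic consequences (upper panel), Main Causes of Property Damage (bottom left) and Main Causes of Crop Damage (bottom right).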

Notably, the single event with the highest property damage was a flood, with a recorded cost of 115 billion dollars. As regards crop damage, the worst single events were a river flood and an ice storm, each with a recorded cost of 5 billion dollars.