Comparing health and economic consequences of weather events across the US

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

We will explore the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and answer some basic questions about severe weather events and their relationship with population health and economic consequences. The events in the database start in the year 1950 and end in November 2011.

Data processing

packages <- c("maps", "knitr", "scatterplot3d", "fields")
diff <- setdiff(packages, rownames(installed.packages()))
if (length(diff > 0)) {
    install.packages(diff)
}

library(scatterplot3d)
library(fields)
library(maps)

Loading the data and showing names of columns

if (!exists("sdata")) {
    sdata <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
}
names(sdata)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Damage data is recorded with two variables: PROPDMG (number) and PROPDMGEXP (exponent). We will define a function that transforms this into a single value:

union(levels(sdata$PROPDMGEXP), levels(sdata$CROPDMGEXP))
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M" "k"
realmoney <- function(x, exp) {
    if (is.integer(exp)) 
        return(x * 10^exp)
    if (exp %in% c("", "-")) 
        return(x)
    if (exp %in% c("b", "B")) 
        return(x * 1e+09)
    if (exp %in% c("m", "M")) 
        return(x * 1e+06)
    if (exp %in% c("h", "H")) 
        return(x * 1e+06)
    if (exp %in% c("k", "K")) 
        return(x * 1000)
    return(ceiling(x))
}

sdata$PROP <- mapply(realmoney, x = sdata$PROPDMG, exp = sdata$PROPDMGEXP)
sdata$CROP <- mapply(realmoney, x = sdata$CROPDMG, exp = sdata$CROPDMGEXP)
sdata$COST <- sdata$PROP + sdata$CROP

Results

Injuries by Event Type

How many injuries has each event type provoked? (30 most harmful event types showed)

totinj <- aggregate(sdata$INJURIES, by = list(EVTYPE = sdata$EVTYPE), sum)
totinjnz <- totinj[which(totinj$x > 0), ]
head(totinjnz[with(totinjnz, order(-x)), ], n = 30)
##                 EVTYPE     x
## 834            TORNADO 91346
## 856          TSTM WIND  6957
## 170              FLOOD  6789
## 130     EXCESSIVE HEAT  6525
## 464          LIGHTNING  5230
## 275               HEAT  2100
## 427          ICE STORM  1975
## 153        FLASH FLOOD  1777
## 760  THUNDERSTORM WIND  1488
## 244               HAIL  1361
## 972       WINTER STORM  1321
## 411  HURRICANE/TYPHOON  1275
## 359          HIGH WIND  1137
## 310         HEAVY SNOW  1021
## 957           WILDFIRE   911
## 786 THUNDERSTORM WINDS   908
## 30            BLIZZARD   805
## 188                FOG   734
## 955   WILD/FOREST FIRE   545
## 117         DUST STORM   440
## 978     WINTER WEATHER   398
## 89           DENSE FOG   342
## 848     TROPICAL STORM   340
## 278          HEAT WAVE   309
## 376         HIGH WINDS   302
## 586       RIP CURRENTS   297
## 676        STRONG WIND   280
## 290         HEAVY RAIN   251
## 585        RIP CURRENT   232
## 140       EXTREME COLD   231

We conclude that Tornados are the most dangerous events for the population

Damage by Event Type

How much damage (both property and cost) has each event type provoked? (30 most costly event types showed)

totcost <- aggregate(sdata$COST, by = list(EVTYPE = sdata$EVTYPE), sum)
head(totcost[with(totcost, order(-x)), ], n = 30)
##                         EVTYPE         x
## 170                      FLOOD 1.503e+11
## 411          HURRICANE/TYPHOON 7.191e+10
## 834                    TORNADO 5.735e+10
## 670                STORM SURGE 4.332e+10
## 244                       HAIL 1.876e+10
## 153                FLASH FLOOD 1.756e+10
## 95                     DROUGHT 1.502e+10
## 402                  HURRICANE 1.461e+10
## 590                RIVER FLOOD 1.015e+10
## 427                  ICE STORM 8.967e+09
## 848             TROPICAL STORM 8.382e+09
## 972               WINTER STORM 6.715e+09
## 359                  HIGH WIND 5.909e+09
## 957                   WILDFIRE 5.061e+09
## 856                  TSTM WIND 5.039e+09
## 671           STORM SURGE/TIDE 4.642e+09
## 760          THUNDERSTORM WIND 3.898e+09
## 409             HURRICANE OPAL 3.192e+09
## 955           WILD/FOREST FIRE 3.109e+09
## 298  HEAVY RAIN/SEVERE WEATHER 2.500e+09
## 786         THUNDERSTORM WINDS 1.947e+09
## 842 TORNADOES, TSTM WIND, HAIL 1.602e+09
## 290                 HEAVY RAIN 1.428e+09
## 140               EXTREME COLD 1.361e+09
## 604        SEVERE THUNDERSTORM 1.206e+09
## 212               FROST/FREEZE 1.104e+09
## 310                 HEAVY SNOW 1.067e+09
## 464                  LIGHTNING 9.408e+08
## 30                    BLIZZARD 7.713e+08
## 376                 HIGH WINDS 6.490e+08

We conclude that Floods are the most expensive events in terms of total damage (property and crop)

Injuries by State

This is a comparative map of the spatial distribution of injuries by state.

First we match the states:

stateinj <- aggregate(sdata$INJURIES, by = list(state = sdata$STATE), sum)
data(state.fips)
mat2 <- match(state.fips$abb, stateinj$state)
stateinj <- stateinj[mat2, ]

Now we create the map:


col1 <- heat.colors(100)
logsc <- log(stateinj$x)
mi <- min(logsc)
ma <- max(logsc)

colores <- col1[ceiling(1 + 99 * (logsc - mi)/(ma - mi))]
map("state", fill = TRUE, col = colores, boundary = FALSE)
title("Total injuries")
ticks <- c(500, 1000, 5000, 10000, 15000)
image.plot(legend.only = TRUE, zlim = c(mi, ma), nlevel = 100, col = col1, axis.args = list(at = log(ticks), 
    labels = ticks))

plot of chunk map2

This is the number of injuries by state:

unique(stateinj)
##    state     x
## 2     AL  8742
## 7     AZ   968
## 5     AR  5550
## 8     CA  3278
## 9     CO  1004
## 10    CT   897
## 12    DE   338
## 11    DC   383
## 13    FL  5918
## 14    GA  5061
## 19    ID   273
## 20    IL  5563
## 21    IN  4720
## 18    IA  2892
## 22    KS  3449
## 23    KY  3480
## 24    LA  3215
## 33    ME   177
## 32    MD  1537
## 31    MA  2121
## 35    MI  4586
## 36    MN  2282
## 38    MS  6675
## 37    MO  8998
## 39    MT   181
## 42    NE  1471
## 46    NV   232
## 43    NH   195
## 44    NJ  1152
## 45    NM   385
## 47    NY  1340
## 40    NC  3415
## 41    ND   608
## 48    OH  7112
## 49    OK  5710
## 50    OR   225
## 51    PA  3223
## 57    RI    48
## 58    SC  1786
## 59    SD   868
## 62    TN  5202
## 63    TX 17667
## 64    UT  1070
## 67    VT    71
## 65    VA  1703
## 68    WA   753
## 70    WV   363
## 69    WI  2309
## 71    WY   432

Cost by State

This is a comparative map of the spatial distribution of damage cost by state (US dollars).

First we match the states:

statecost <- aggregate(sdata$COST, by = list(state = sdata$STATE), sum)
data(state.fips)
mat2 <- match(state.fips$abb, statecost$state)
statecost <- statecost[mat2, ]

Now we create the map:


col1 <- heat.colors(100)
logsc <- log(statecost$x)
mi <- min(logsc)
ma <- max(logsc)

colores <- col1[ceiling(1 + 99 * (logsc - mi)/(ma - mi))]
map("state", fill = TRUE, col = colores, boundary = FALSE)
title("Total Cost $")
ticks <- c(1e+08, 1e+09, 1e+10, 1e+11)
image.plot(legend.only = TRUE, zlim = c(mi, ma), nlevel = 100, col = col1, axis.args = list(at = log(ticks), 
    labels = ticks))

plot of chunk map1

This is the cost by state in dollars:

unique(statecost)
##    state         x
## 2     AL 1.785e+10
## 7     AZ 3.961e+09
## 5     AR 4.557e+09
## 8     CA 1.271e+11
## 9     CO 2.890e+09
## 10    CT 7.616e+08
## 12    DE 1.790e+08
## 11    DC 1.578e+08
## 13    FL 4.541e+10
## 14    GA 6.028e+09
## 19    ID 2.675e+08
## 20    IL 1.409e+10
## 21    IN 4.891e+09
## 18    IA 1.019e+10
## 22    KS 5.062e+09
## 23    KY 3.038e+09
## 24    LA 6.130e+10
## 33    ME 5.623e+08
## 32    MD 1.300e+09
## 31    MA 1.280e+09
## 35    MI 2.687e+09
## 36    MN 5.702e+09
## 38    MS 3.642e+10
## 37    MO 7.932e+09
## 39    MT 4.053e+08
## 42    NE 5.300e+09
## 46    NV 8.408e+08
## 43    NH 2.204e+08
## 44    NJ 3.295e+09
## 45    NM 1.981e+09
## 47    NY 4.972e+09
## 40    NC 1.028e+10
## 41    ND 5.866e+09
## 48    OH 7.250e+09
## 49    OK 6.720e+09
## 50    OR 1.065e+09
## 51    PA 5.423e+09
## 57    RI 1.206e+08
## 58    SC 1.268e+09
## 59    SD 8.521e+08
## 62    TN 6.583e+09
## 63    TX 3.394e+10
## 64    UT 7.982e+08
## 67    VT 1.538e+09
## 65    VA 2.532e+09
## 68    WA 1.413e+09
## 70    WV 1.027e+09
## 69    WI 4.203e+09
## 71    WY 2.221e+08