Reproducible Research: Peer Assessment 2 - Storm Data Analysis

Hi! Thanks for taking the time to review this assessment! Knitting this R Markdown document requires the ggplot2, data.table and R.utils libraries.

library(ggplot2, quietly=TRUE, verbose=FALSE, warn.conflicts=FALSE)
library(data.table, quietly=TRUE, verbose=FALSE, warn.conflicts=FALSE)
library(R.utils, quietly=TRUE, verbose=FALSE, warn.conflicts=FALSE)
## Loading required package: R.methodsS3
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save

Title: An analysis of severe weather impacts on population health and economic costs

Synopsis

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. In particular, this analysis addresses the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

As a result of this analysis, we conclude that increased assistance targeting excessive heat and tornado events is the most promising way to reduce the impact on population health. On the other hand, most of the economic impact derives from flooding and severe storm conditions.

Data Processing

For this analysis, we use the bzip2-compressed data file made available by the Coursera course. We download and decompress it.

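filename = "repdata_peerassessment2.csv"
filename.bzipped = "repdata_peerassessment2.bz2"
# Download and decompress only when the CSV is not already present
if (!file.exists(filename)) {
  if (!file.exists(filename.bzipped)) {
    # method="curl" requires the curl command-line tool to be installed
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = filename.bzipped,
                  method = "curl")
  }
  # R.utils::bunzip2() writes the decompressed CSV to destname
  bunzip2(filename.bzipped, destname=filename)
}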
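As an aside, the separate decompression step is optional: base R's `file()` recognises bzip2 compression when a file is opened for reading, so `read.csv()` should be able to read the downloaded archive directly. The sketch below demonstrates this on a small temporary file rather than the real archive; the object names (`sample.data`, `archive`, `round.trip`) are illustrative only.

```r
# Sketch: read a bzip2-compressed CSV without calling R.utils::bunzip2().
# A tiny throwaway file stands in for the real StormData archive.
sample.data = data.frame(EVTYPE = c("TORNADO", "HAIL"),
                         FATALITIES = c(1, 0),
                         stringsAsFactors = FALSE)
archive = tempfile(fileext = ".csv.bz2")

# bzfile() gives a bzip2-compressing connection; write.csv() opens and closes it
write.csv(sample.data, bzfile(archive), row.names = FALSE)

# Given a path, read.csv() opens it via file(), which detects bzip2
# compression automatically when reading
round.trip = read.csv(archive, stringsAsFactors = FALSE)
print(round.trip)
```

Under this assumption, `data.table(read.csv(filename.bzipped, stringsAsFactors=FALSE))` would load the data without the R.utils dependency, at the cost of decompressing the file on every read.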

We load the data into a data.table object for processing.

storm = data.table(read.csv(filename, header=T, sep=",", stringsAsFactors=F))
str(storm)
## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>

We apply the following processing steps to the raw data to facilitate computation:

  • The BGN_DATE field is converted to Date objects.
invisible(storm[, BGN_DATE := as.Date(BGN_DATE, format="%m/%d/%Y %H:%M:%S")])
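To make the format string concrete, here is a minimal sketch that parses two of the BGN_DATE values shown by `str(storm)` above. `as.Date()` with this format matches the date and time parts and keeps only the calendar date; the year extraction at the end is just an illustration of what Date objects make easy, not a step used in this analysis.

```r
# Two raw BGN_DATE strings, copied from the str(storm) output above
raw.dates = c("4/18/1950 0:00:00", "2/20/1951 0:00:00")

# Parse month/day/year; the trailing time component is matched and discarded
parsed = as.Date(raw.dates, format = "%m/%d/%Y %H:%M:%S")
print(parsed)

# Illustration only: pull out the year, e.g. for grouping events by year
print(format(parsed, "%Y"))
```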
  • We keep only the 50 most frequent event types (encoded in the EVTYPE variable), so the analysis covers the most common events.
event.order = sort(table(storm$EVTYPE), decreasing=TRUE)
event.order = event.order[1:50]
event.order
## 
##                     HAIL                TSTM WIND        THUNDERSTORM WIND 
##                   288661                   219940                    82563 
##                  TORNADO              FLASH FLOOD                    FLOOD 
##                    60652                    54277                    25326 
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING 
##                    20843                    20212                    15754 
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM 
##                    15708                    11723                    11433 
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND 
##                     7026                     6839                     6175 
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND 
##                     5812                     3796                     3566 
##     URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD 
##                     3392                     2761                     2719 
##                  DROUGHT                ICE STORM           EXCESSIVE HEAT 
##                     2488                     2006                     1678 
##               HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE 
##                     1533                     1457                     1342 
##                DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL 
##                     1293                     1104                     1028 
##  EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF 
##                     1002                      767                      725 
##           TROPICAL STORM           FLASH FLOODING             EXTREME COLD 
##                      690                      682                      655 
##            COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD 
##                      650                      636                      624 
##                LANDSLIDE                     SNOW          COLD/WIND CHILL 
##                      600                      587                      539 
##                      FOG              RIP CURRENT              MARINE HAIL 
##                      538                      470                      442 
##               DUST STORM                AVALANCHE                     WIND 
##                      427                      386                      340 
##             RIP CURRENTS              STORM SURGE 
##                      304                      261
storm = storm[EVTYPE %chin% names(event.order)]
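The top-N filter can be sketched on a toy EVTYPE vector, which also shows how to check what share of events survives the filter. The data and the choice of N = 2 are made up for illustration, and base R's `%in%` stands in for data.table's faster `%chin%`.

```r
# Toy event-type vector standing in for storm$EVTYPE
evtype = c("HAIL", "HAIL", "HAIL", "TORNADO", "TORNADO", "FLOOD")

counts = sort(table(evtype), decreasing = TRUE)  # most frequent types first
top = head(counts, 2)                            # keep the N = 2 most frequent types
keep = evtype %in% names(top)                    # base-R analogue of %chin%

print(top)
print(mean(keep))  # share of events retained after filtering
```

Applied to the real data, `mean(keep)` would report how much of the database the 50 most frequent event types cover.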
  • We normalize property and crop damage by multiplying PROPDMG and CROPDMG by the multiplier encoded in PROPDMGEXP and CROPDMGEXP: “K” (thousands), “M” (millions) and “B” (billions). Any other value in these exponent fields is treated as invalid, and the corresponding amount is left unscaled.
invisible(storm[PROPDMGEXP == "K", PROPDMG := PROPDMG * 1000])
invisible(storm[PROPDMGEXP == "M", PROPDMG := PROPDMG * 1000 * 1000])
invisible(storm[PROPDMGEXP == "B", PROPDMG := PROPDMG * 1000 * 1000 * 1000])
invisible(storm[CROPDMGEXP == "K", CROPDMG := CROPDMG * 1000])
invisible(storm[CROPDMGEXP == "M", CROPDMG := CROPDMG * 1000 * 1000])
invisible(storm[CROPDMGEXP == "B", CROPDMG := CROPDMG * 1000 * 1000 * 1000])
summary(storm$PROPDMG + storm$CROPDMG)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 4.04e+05 1.00e+03 1.15e+11
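The six `:=` statements above can also be expressed with a named lookup vector. The sketch below uses a hypothetical helper, `to.dollars()`, which is not part of the original analysis; it applies the same rule (scale by “K”, “M” or “B”, leave anything else unscaled).

```r
# Hypothetical helper: scale damage amounts by their exponent code.
# Only "K", "M" and "B" are recognised; any other code (including "" or NA)
# leaves the amount unscaled, matching the six := statements above.
to.dollars = function(amount, exp.code) {
  multiplier = c(K = 1e3, M = 1e6, B = 1e9)
  mult = unname(multiplier[exp.code])  # NA when the code is not K, M or B
  mult[is.na(mult)] = 1
  amount * mult
}

to.dollars(c(25, 2.5, 1.5, 10), c("K", "M", "B", ""))
```

On the raw data, `storm[, PROPDMG := to.dollars(PROPDMG, PROPDMGEXP)]` (and the same for CROPDMG) would replace the six statements; it must be applied only once, since rerunning it would scale the amounts again.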

Results

This section describes the results from both questions.

Impact on Population Health

To evaluate the impact on population health, we analyse fatalities and injuries (encoded in the FATALITIES and INJURIES variables).

For each event type, we sum both quantities over all recorded events and also divide these sums by the number of events of that type to obtain per-event averages.

storm.health = storm[, list(event.count = .N,
                            fatalities.tot = sum(FATALITIES, na.rm=T),
                            fatalities.mean = sum(FATALITIES, na.rm=T)/.N,
                            injuries.tot = sum(INJURIES, na.rm=T),
                            injuries.mean = sum(INJURIES, na.rm=T)/.N), by="EVTYPE"]
invisible(storm.health[, health.tot := fatalities.tot + injuries.tot])
invisible(storm.health[, health.mean := fatalities.mean + injuries.mean])
storm.health
##                       EVTYPE event.count fatalities.tot fatalities.mean
##  1:                  TORNADO       60652           5633       9.287e-02
##  2:                TSTM WIND      219940            504       2.292e-03
##  3:                     HAIL      288661             15       5.196e-05
##  4:                     SNOW         587              5       8.518e-03
##  5:             WINTER STORM       11433            206       1.802e-02
##  6:       THUNDERSTORM WINDS       20843             64       3.071e-03
##  7:               HEAVY RAIN       11723             98       8.360e-03
##  8:                LIGHTNING       15754            816       5.180e-02
##  9:        THUNDERSTORM WIND       82563            133       1.611e-03
## 10:                DENSE FOG        1293             18       1.392e-02
## 11:              RIP CURRENT         470            368       7.830e-01
## 12:              FLASH FLOOD       54277            978       1.802e-02
## 13:           FLASH FLOODING         682             19       2.786e-02
## 14:               HIGH WINDS        1533             35       2.283e-02
## 15:             FUNNEL CLOUD        6839              0       0.000e+00
## 16:                     HEAT         767            937       1.222e+00
## 17:                     WIND         340             23       6.765e-02
## 18:                    FLOOD       25326            470       1.856e-02
## 19:               WATERSPOUT        3796              3       7.903e-04
## 20:             EXTREME COLD         655            160       2.443e-01
## 21:                HIGH WIND       20212            248       1.227e-02
## 22:                 BLIZZARD        2719            101       3.715e-02
## 23:               HEAVY SNOW       15708            127       8.085e-03
## 24:            COASTAL FLOOD         650              3       4.615e-03
## 25:                ICE STORM        2006             89       4.437e-02
## 26:                AVALANCHE         386            224       5.803e-01
## 27:               DUST STORM         427             22       5.152e-02
## 28:           EXCESSIVE HEAT        1678           1903       1.134e+00
## 29:                HIGH SURF         725            101       1.393e-01
## 30:        FLOOD/FLASH FLOOD         624             17       2.724e-02
## 31:              STRONG WIND        3566            103       2.888e-02
## 32:           WINTER WEATHER        7026             33       4.697e-03
## 33:                  DROUGHT        2488              0       0.000e+00
## 34:              STORM SURGE         261             13       4.981e-02
## 35:           TROPICAL STORM         690             58       8.406e-02
## 36:                 WILDFIRE        2761             75       2.716e-02
## 37:         WILD/FOREST FIRE        1457             12       8.236e-03
## 38:                      FOG         538             62       1.152e-01
## 39:             RIP CURRENTS         304            204       6.711e-01
## 40:                LANDSLIDE         600             38       6.333e-02
## 41:         LAKE-EFFECT SNOW         636              0       0.000e+00
## 42:     URBAN/SML STREAM FLD        3392             28       8.255e-03
## 43:           TSTM WIND/HAIL        1028              5       4.864e-03
## 44:         MARINE TSTM WIND        6175              9       1.457e-03
## 45:             FROST/FREEZE        1342              0       0.000e+00
## 46:       WINTER WEATHER/MIX        1104             28       2.536e-02
## 47:  EXTREME COLD/WIND CHILL        1002            125       1.248e-01
## 48:              MARINE HAIL         442              0       0.000e+00
## 49:          COLD/WIND CHILL         539             95       1.763e-01
## 50: MARINE THUNDERSTORM WIND        5812             10       1.721e-03
##                       EVTYPE event.count fatalities.tot fatalities.mean
##     injuries.tot injuries.mean health.tot health.mean
##  1:        91346     1.5060674      96979   1.5989415
##  2:         6957     0.0316314       7461   0.0339229
##  3:         1361     0.0047149       1376   0.0047668
##  4:           29     0.0494037         34   0.0579216
##  5:         1321     0.1155427       1527   0.1335607
##  6:          908     0.0435638        972   0.0466344
##  7:          251     0.0214109        349   0.0297705
##  8:         5230     0.3319792       6046   0.3837755
##  9:         1488     0.0180226       1621   0.0196335
## 10:          342     0.2645012        360   0.2784223
## 11:          232     0.4936170        600   1.2765957
## 12:         1777     0.0327395       2755   0.0507581
## 13:            8     0.0117302         27   0.0395894
## 14:          302     0.1969993        337   0.2198304
## 15:            3     0.0004387          3   0.0004387
## 16:         2100     2.7379400       3037   3.9595828
## 17:           86     0.2529412        109   0.3205882
## 18:         6789     0.2680644       7259   0.2866224
## 19:           29     0.0076396         32   0.0084299
## 20:          231     0.3526718        391   0.5969466
## 21:         1137     0.0562537       1385   0.0685236
## 22:          805     0.2960647        906   0.3332107
## 23:         1021     0.0649987       1148   0.0730838
## 24:            2     0.0030769          5   0.0076923
## 25:         1975     0.9845464       2064   1.0289133
## 26:          170     0.4404145        394   1.0207254
## 27:          440     1.0304450        462   1.0819672
## 28:         6525     3.8885578       8428   5.0226460
## 29:          152     0.2096552        253   0.3489655
## 30:           15     0.0240385         32   0.0512821
## 31:          280     0.0785193        383   0.1074033
## 32:          398     0.0566467        431   0.0613436
## 33:            4     0.0016077          4   0.0016077
## 34:           38     0.1455939         51   0.1954023
## 35:          340     0.4927536        398   0.5768116
## 36:          911     0.3299529        986   0.3571170
## 37:          545     0.3740563        557   0.3822924
## 38:          734     1.3643123        796   1.4795539
## 39:          297     0.9769737        501   1.6480263
## 40:           52     0.0866667         90   0.1500000
## 41:            0     0.0000000          0   0.0000000
## 42:           79     0.0232901        107   0.0315448
## 43:           95     0.0924125        100   0.0972763
## 44:            8     0.0012955         17   0.0027530
## 45:            0     0.0000000          0   0.0000000
## 46:           72     0.0652174        100   0.0905797
## 47:           24     0.0239521        149   0.1487026
## 48:            0     0.0000000          0   0.0000000
## 49:           12     0.0222635        107   0.1985158
## 50:           26     0.0044735         36   0.0061941
##     injuries.tot injuries.mean health.tot health.mean
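Totals and per-event averages can rank event types differently, which is why both are plotted below. The following sketch reproduces the two summaries on made-up data for two event types: one type has many low-impact events, the other a few high-impact ones. It uses base R's `split()` and `sapply()` instead of data.table's `by=`.

```r
# Toy data: per-event fatality and injury counts for two event types
evtype     = c("HEAT", "HEAT", "TORNADO", "TORNADO", "TORNADO", "TORNADO")
fatalities = c(2, 1, 1, 0, 0, 0)
injuries   = c(3, 1, 7, 0, 0, 0)
health     = fatalities + injuries

# Per-type totals and per-event averages
health.tot  = sapply(split(health, evtype), sum)
health.mean = sapply(split(health, evtype), mean)

health.tot   # TORNADO has the larger total ...
health.mean  # ... but HEAT has the larger per-event average
```

One design detail: the analysis computes averages as `sum(x, na.rm=T)/.N`, which effectively counts missing values as zero, whereas `mean(x, na.rm=TRUE)` would drop them from the denominator. The two agree whenever there are no missing values.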

First, we consider the aggregate number of health incidents (fatalities plus injuries) over all recorded events of each type, which indicates the overall burden of each event type. This plot shows the top 5 event types by this measure.

storm.health = storm.health[order(health.tot, decreasing=T)]
storm.health.tot = storm.health[1:5]
# qplot(..., stat="identity") is not supported by current ggplot2 releases;
# geom_col() draws bars whose heights are the data values
ggplot(storm.health.tot, aes(x=reorder(EVTYPE, -health.tot), y=health.tot, fill=health.tot)) +
  geom_col() +
  labs(title="Aggregate of health incidents per event type",
       x="Severe Weather Event Type",
       y="Count of Fatalities + Injuries")

[Figure: Aggregate of health incidents per event type]

Alternatively, we can look at the average number of health incidents per event, which estimates the expected health impact of a single event of each type.
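The per-event average is a sample mean, so by linearity the combined average equals the sum of the separate fatality and injury averages; this is why the code above can simply add fatalities.mean and injuries.mean. In the notation assumed here, event type $t$ has $n_t$ recorded events (event.count), and event $i$ has $F_i$ fatalities and $I_i$ injuries. The numeric check uses the TORNADO totals from the storm.health table.

```latex
\bar{h}_t = \frac{1}{n_t}\sum_{i=1}^{n_t}\left(F_i + I_i\right)
          = \frac{1}{n_t}\sum_{i=1}^{n_t} F_i + \frac{1}{n_t}\sum_{i=1}^{n_t} I_i
          = \bar{F}_t + \bar{I}_t

\bar{h}_{\mathrm{TORNADO}} = \frac{5633 + 91346}{60652} = \frac{96979}{60652} \approx 1.599
```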

storm.health = storm.health[order(health.mean, decreasing=T)]
storm.health.mean = storm.health[1:5]
# geom_col() replaces qplot(geom="bar", stat="identity")
ggplot(storm.health.mean, aes(x=reorder(EVTYPE, -health.mean), y=health.mean, fill=health.mean)) +
  geom_col() +
  labs(title="Average of health incidents per event type",
       x="Severe Weather Event Type",
       y="Average of Fatalities + Injuries per instance")

[Figure: Average of health incidents per event type]

Given these results, TORNADO events account for by far the largest total number of fatalities and injuries, while heat-related events (EXCESSIVE HEAT and HEAT) have the highest averages per event. These event types therefore appear to be the most worthwhile candidates for additional assistance funding if the goal is to reduce injuries and loss of life after severe weather events.

Economic Consequences

To evaluate the economic impact, we analyse property and crop damage (encoded in the PROPDMG and CROPDMG variables).

These variables have already been normalized in the Data Processing section.

This time, we consider only the average economic damage per event for each event type.

storm.econ = storm[, list(event.count = .N,
                          prop.mean = sum(PROPDMG, na.rm=T)/.N,
                          crop.mean = sum(CROPDMG, na.rm=T)/.N), by="EVTYPE"]
invisible(storm.econ[, econ.mean := prop.mean + crop.mean])
storm.econ
##                       EVTYPE event.count prop.mean crop.mean econ.mean
##  1:                  TORNADO       60652 9.386e+05 6.842e+03 9.454e+05
##  2:                TSTM WIND      219940 2.039e+04 2.519e+03 2.291e+04
##  3:                     HAIL      288661 5.448e+04 1.048e+04 6.497e+04
##  4:                     SNOW         587 2.515e+04 1.704e+01 2.517e+04
##  5:             WINTER STORM       11433 5.850e+05 2.357e+03 5.874e+05
##  6:       THUNDERSTORM WINDS       20843 8.317e+04 9.147e+03 9.231e+04
##  7:               HEAVY RAIN       11723 5.922e+04 6.256e+04 1.218e+05
##  8:                LIGHTNING       15754 5.895e+04 7.676e+02 5.972e+04
##  9:        THUNDERSTORM WIND       82563 4.219e+04 5.025e+03 4.721e+04
## 10:                DENSE FOG        1293 7.482e+03 0.000e+00 7.482e+03
## 11:              RIP CURRENT         470 2.128e+00 0.000e+00 2.128e+00
## 12:              FLASH FLOOD       54277 2.974e+05 2.619e+04 3.236e+05
## 13:           FLASH FLOODING         682 4.513e+05 2.215e+04 4.734e+05
## 14:               HIGH WINDS        1533 3.968e+05 2.656e+04 4.234e+05
## 15:             FUNNEL CLOUD        6839 2.845e+01 0.000e+00 2.845e+01
## 16:                     HEAT         767 2.343e+03 5.234e+05 5.258e+05
## 17:                     WIND         340 2.554e+04 8.824e+02 2.642e+04
## 18:                    FLOOD       25326 5.712e+06 2.236e+05 5.935e+06
## 19:               WATERSPOUT        3796 2.464e+03 0.000e+00 2.464e+03
## 20:             EXTREME COLD         655 1.034e+05 1.974e+06 2.077e+06
## 21:                HIGH WIND       20212 2.607e+05 3.159e+04 2.923e+05
## 22:                 BLIZZARD        2719 2.424e+05 4.121e+04 2.837e+05
## 23:               HEAVY SNOW       15708 5.937e+04 8.572e+03 6.794e+04
## 24:            COASTAL FLOOD         650 3.656e+05 0.000e+00 3.656e+05
## 25:                ICE STORM        2006 1.967e+06 2.504e+06 4.470e+06
## 26:                AVALANCHE         386 9.642e+03 0.000e+00 9.642e+03
## 27:               DUST STORM         427 1.300e+04 7.260e+03 2.026e+04
## 28:           EXCESSIVE HEAT        1678 4.621e+03 2.934e+05 2.981e+05
## 29:                HIGH SURF         725 1.236e+05 0.000e+00 1.236e+05
## 30:        FLOOD/FLASH FLOOD         624 2.789e+05 1.523e+05 4.312e+05
## 31:              STRONG WIND        3566 4.914e+04 1.821e+04 6.736e+04
## 32:           WINTER WEATHER        7026 2.970e+03 2.135e+03 5.105e+03
## 33:                  DROUGHT        2488 4.205e+05 5.616e+06 6.036e+06
## 34:              STORM SURGE         261 1.660e+08 1.916e+01 1.660e+08
## 35:           TROPICAL STORM         690 1.117e+07 9.831e+05 1.215e+07
## 36:                 WILDFIRE        2761 1.726e+06 1.070e+05 1.833e+06
## 37:         WILD/FOREST FIRE        1457 2.060e+06 7.330e+04 2.134e+06
## 38:                      FOG         538 2.445e+04 0.000e+00 2.445e+04
## 39:             RIP CURRENTS         304 5.329e+02 0.000e+00 5.329e+02
## 40:                LANDSLIDE         600 5.410e+05 3.336e+04 5.744e+05
## 41:         LAKE-EFFECT SNOW         636 6.307e+04 0.000e+00 6.307e+04
## 42:     URBAN/SML STREAM FLD        3392 1.719e+04 2.502e+03 1.969e+04
## 43:           TSTM WIND/HAIL        1028 4.313e+04 6.293e+04 1.061e+05
## 44:         MARINE TSTM WIND        6175 8.779e+02 0.000e+00 8.779e+02
## 45:             FROST/FREEZE        1342 7.064e+03 8.153e+05 8.223e+05
## 46:       WINTER WEATHER/MIX        1104 5.772e+03 0.000e+00 5.772e+03
## 47:  EXTREME COLD/WIND CHILL        1002 8.631e+03 4.990e+01 8.681e+03
## 48:              MARINE HAIL         442 9.050e+00 0.000e+00 9.050e+00
## 49:          COLD/WIND CHILL         539 3.692e+03 1.113e+03 4.805e+03
## 50: MARINE THUNDERSTORM WIND        5812 7.509e+01 8.603e+00 8.369e+01
##                       EVTYPE event.count prop.mean crop.mean econ.mean

Again, we plot the top 5 event types, this time ranked by average economic impact. The y-axis uses a logarithmic scale because these averages differ by more than an order of magnitude.

storm.econ = storm.econ[order(econ.mean, decreasing=T)]
storm.econ.tot = storm.econ[1:5]
# geom_col() replaces qplot(geom="bar", stat="identity"); scale_y_log10() replaces log="y"
ggplot(storm.econ.tot, aes(x=reorder(EVTYPE, -econ.mean), y=econ.mean, fill=econ.mean)) +
  geom_col() +
  scale_y_log10() +
  labs(title="Average of economic impact per event type",
       x="Severe Weather Event Type",
       y="Average of Property and Crop Damage per Incident")

[Figure: Average of economic impact per event type (logarithmic y-axis)]

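The largest average economic impacts come from flooding and storm-related events (STORM SURGE, TROPICAL STORM, FLOOD and ICE STORM), with DROUGHT also among the top five. For STORM SURGE, TROPICAL STORM and FLOOD the damage is mostly to property, which is consistent with flooding of residential and commercial areas, whereas DROUGHT damage is mostly to crops.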
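The split between property and crop damage can be checked directly. The sketch below copies the rounded prop.mean and crop.mean values for the five plotted event types from the storm.econ table and computes the share of each type's average damage that is property damage. On the full data.table, the same quantity would be `storm.econ[, prop.share := prop.mean / econ.mean]`.

```r
# Average damage per event (US dollars), copied (rounded) from the storm.econ table
econ = data.frame(
  EVTYPE    = c("STORM SURGE", "TROPICAL STORM", "DROUGHT", "FLOOD", "ICE STORM"),
  prop.mean = c(1.660e8, 1.117e7, 4.205e5, 5.712e6, 1.967e6),
  crop.mean = c(1.916e1, 9.831e5, 5.616e6, 2.236e5, 2.504e6),
  stringsAsFactors = FALSE
)

# Fraction of each event type's average damage that is property damage
econ$prop.share = econ$prop.mean / (econ$prop.mean + econ$crop.mean)
econ[order(econ$prop.share, decreasing = TRUE), c("EVTYPE", "prop.share")]
```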