Synopsis

This document is aimed to identify which types of storm and severe weather events are most harmful to public health and which types have the greatest economic consequences.

The analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including place, date and estimates of any fatalities, injuries and property damage.

Documentation of the data base is available:
1. NATIONAL WEATHER SERVICE INSTRUCTION 1-1605
2. Storm Data FAQ Page

Instruction 1-165, issued on August 17, 2007, contains the updated events type table among some other useful information.

The events in the database start in the year 1950 and end in November 2011. In recent years there are more events recorded and more complete.

The analysis conclusion is that Tornado is the most harmful and damaging event type. Resources should be allocated prioritary to prepare for this event type.

Data processing

Download the data file from the url provided.

if (!file.exists ("../files/StormData.bz2")){
    fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(fileUrl, destfile = "../files/StormData.bz2", method = "wget")
}

The data come as a comma-separated-value file compressed via the bzip2 algorithm. It can be read as a csv file.

if (!exists("ds")){
    ds <- read.csv("../files/StormData.bz2", stringsAsFactors = FALSE, na.strings = "")
}

Check the dataset dim and vars.

dim(ds)
## [1] 902297     37
str(ds)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  NA NA NA NA ...
##  $ BGN_LOCATI: chr  NA NA NA NA ...
##  $ END_DATE  : chr  NA NA NA NA ...
##  $ END_TIME  : chr  NA NA NA NA ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  NA NA NA NA ...
##  $ END_LOCATI: chr  NA NA NA NA ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...
##  $ WFO       : chr  NA NA NA NA ...
##  $ STATEOFFIC: chr  NA NA NA NA ...
##  $ ZONENAMES : chr  NA NA NA NA ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  NA NA NA NA ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Convert BGN_DATE to date.

ds$begin.date <- as.Date(ds$BGN_DATE, "%m/%d/%Y")
summary(ds$begin.date)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "1950-01-03" "1995-04-20" "2002-03-18" "1998-12-27" "2007-07-28" 
##         Max. 
## "2011-11-30"

Summary of event types.

str(as.factor(ds$EVTYPE))
##  Factor w/ 985 levels "?","ABNORMALLY DRY",..: 830 830 830 830 830 830 830 830 830 830 ...
summary(as.factor(ds$EVTYPE))
##                     HAIL                TSTM WIND        THUNDERSTORM WIND 
##                   288661                   219940                    82563 
##                  TORNADO              FLASH FLOOD                    FLOOD 
##                    60652                    54277                    25326 
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING 
##                    20843                    20212                    15754 
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM 
##                    15708                    11723                    11433 
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND 
##                     7026                     6839                     6175 
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND 
##                     5812                     3796                     3566 
##     URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD 
##                     3392                     2761                     2719 
##                  DROUGHT                ICE STORM           EXCESSIVE HEAT 
##                     2488                     2006                     1678 
##               HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE 
##                     1533                     1457                     1342 
##                DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL 
##                     1293                     1104                     1028 
##  EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF 
##                     1002                      767                      725 
##           TROPICAL STORM           FLASH FLOODING             EXTREME COLD 
##                      690                      682                      655 
##            COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD 
##                      650                      636                      624 
##                LANDSLIDE                     SNOW          COLD/WIND CHILL 
##                      600                      587                      539 
##                      FOG              RIP CURRENT              MARINE HAIL 
##                      538                      470                      442 
##               DUST STORM                AVALANCHE                     WIND 
##                      427                      386                      340 
##             RIP CURRENTS              STORM SURGE            FREEZING RAIN 
##                      304                      261                      250 
##              URBAN FLOOD     HEAVY SURF/HIGH SURF        EXTREME WINDCHILL 
##                      249                      228                      204 
##             STRONG WINDS           DRY MICROBURST    ASTRONOMICAL LOW TIDE 
##                      196                      186                      174 
##                HURRICANE              RIVER FLOOD               LIGHT SNOW 
##                      174                      173                      154 
##         STORM SURGE/TIDE            RECORD WARMTH         COASTAL FLOODING 
##                      148                      146                      143 
##               DUST DEVIL         MARINE HIGH WIND        UNSEASONABLY WARM 
##                      141                      135                      126 
##                 FLOODING   ASTRONOMICAL HIGH TIDE        MODERATE SNOWFALL 
##                      120                      103                      101 
##           URBAN FLOODING               WINTRY MIX        HURRICANE/TYPHOON 
##                       98                       90                       88 
##            FUNNEL CLOUDS               HEAVY SURF              RECORD HEAT 
##                       87                       84                       81 
##                   FREEZE                HEAT WAVE                     COLD 
##                       74                       74                       72 
##              RECORD COLD                      ICE  THUNDERSTORM WINDS HAIL 
##                       64                       61                       61 
##      TROPICAL DEPRESSION                    SLEET         UNSEASONABLY DRY 
##                       60                       59                       56 
##                    FROST              GUSTY WINDS      THUNDERSTORM WINDSS 
##                       53                       53                       51 
##       MARINE STRONG WIND                    OTHER               SMALL HAIL 
##                       48                       48                       47 
##                   FUNNEL             FREEZING FOG             THUNDERSTORM 
##                       46                       45                       45 
##       Temperature record          TSTM WIND (G45)         Coastal Flooding 
##                       43                       39                       38 
##              WATERSPOUTS    MONTHLY PRECIPITATION                    WINDS 
##                       37                       36                       36 
##                  (Other) 
##                     2940

There are many more events types in dataset than in published table. Some cleaning applied.

ds$evt <- toupper (ds$EVTYPE)
ds$evt <- gsub("^ *| *$", "", ds$evt)
ds$evt <- gsub(" {2,}", " ", ds$evt)
ds$evt <- gsub(" /|/ ", "", ds$evt)
ds$evt <- gsub("COLD$", "COLD/WIND CHILL", ds$evt)
ds$evt <- gsub("^URBAN", "FLOOD", ds$evt)
ds$evt <- gsub("^FLOOD(.)*", "FLOOD", ds$evt)
ds$evt <- gsub("^HEAT(.)+", "HEAT", ds$evt)
ds$evt <- gsub("^HIGH WIND(.)+", "HIGH WIND", ds$evt)
ds$evt <- gsub("^HURRICANE(.)*", "HURRICANE (TYPHOON)", ds$evt)
ds$evt <- gsub("^MARINE TSTM(.)*", "MARINE THUNDERSTORM WIND", ds$evt)
ds$evt <- gsub("^THUNDERSTORM(.)+", "THUNDERSTORM WIND", ds$evt)
ds$evt <- gsub("^TSTM WIND(.)*", "THUNDERSTORM WIND", ds$evt)
ds$evt <- gsub("^WILD/(.)+", "WILDFIRE", ds$evt)
ds$evt <- gsub("^WINTER WEATHER(.)+", "WINTER WEATHER", ds$evt)

Summary of event types after cleaning.

summary(as.factor(ds$evt))
##            THUNDERSTORM WIND                         HAIL 
##                       324713                       288661 
##                      TORNADO                  FLASH FLOOD 
##                        60652                        54278 
##                        FLOOD                    HIGH WIND 
##                        29913                        21802 
##                    LIGHTNING                   HEAVY SNOW 
##                        15755                        15708 
##     MARINE THUNDERSTORM WIND                   HEAVY RAIN 
##                        11987                        11742 
##                 WINTER STORM               WINTER WEATHER 
##                        11433                         8155 
##                 FUNNEL CLOUD                     WILDFIRE 
##                         6844                         4219 
##                   WATERSPOUT                  STRONG WIND 
##                         3797                         3569 
##                     BLIZZARD                      DROUGHT 
##                         2719                         2488 
##                    ICE STORM               EXCESSIVE HEAT 
##                         2006                         1678 
##      EXTREME COLD/WIND CHILL                 FROST/FREEZE 
##                         1659                         1343 
##                    DENSE FOG                         HEAT 
##                         1293                          848 
##                    HIGH SURF               TROPICAL STORM 
##                          734                          690 
##               FLASH FLOODING                COASTAL FLOOD 
##                          682                          657 
##             LAKE-EFFECT SNOW              COLD/WIND CHILL 
##                          636                          621 
##                         SNOW                    LANDSLIDE 
##                          617                          600 
##                          FOG                  RIP CURRENT 
##                          538                          470 
##                  MARINE HAIL                   DUST STORM 
##                          442                          427 
##                    AVALANCHE                         WIND 
##                          386                          347 
##                 RIP CURRENTS          HURRICANE (TYPHOON) 
##                          304                          288 
##                  STORM SURGE                FREEZING RAIN 
##                          261                          260 
##         HEAVY SURF/HIGH SURF            EXTREME WINDCHILL 
##                          228                          204 
##                 STRONG WINDS               DRY MICROBURST 
##                          204                          186 
##             COASTAL FLOODING                   LIGHT SNOW 
##                          183                          176 
##        ASTRONOMICAL LOW TIDE                  RIVER FLOOD 
##                          174                          173 
##                RECORD WARMTH                   DUST DEVIL 
##                          154                          149 
##             STORM SURGE/TIDE             MARINE HIGH WIND 
##                          148                          135 
##            UNSEASONABLY WARM       ASTRONOMICAL HIGH TIDE 
##                          126                          103 
##            MODERATE SNOWFALL                   WINTRY MIX 
##                          101                           94 
##                FUNNEL CLOUDS                   HEAVY SURF 
##                           87                           87 
##                  RECORD HEAT                       FREEZE 
##                           82                           76 
##       RECORD COLD/WIND CHILL                  GUSTY WINDS 
##                           68                           65 
##                          ICE          TROPICAL DEPRESSION 
##                           61                           60 
##                        SLEET                        FROST 
##                           59                           57 
##             UNSEASONABLY DRY                   SMALL HAIL 
##                           56                           53 
##                        OTHER           MARINE STRONG WIND 
##                           52                           48 
##                 FREEZING FOG                       FUNNEL 
##                           46                           46 
##                 THUNDERSTORM                        GLAZE 
##                           45                           43 
##           TEMPERATURE RECORD          MIXED PRECIPITATION 
##                           43                           37 
##                  WATERSPOUTS        MONTHLY PRECIPITATION 
##                           37                           36 
##                        WINDS                 SNOW AND ICE 
##                           36                           34 
##                 FLASH FLOODS           HEAVY SNOW SQUALLS 
##                           32                           32 
##                    ICY ROADS                  WIND DAMAGE 
##                           32                           31 
##                      HAIL 75               RIVER FLOODING 
##                           29                           29 
##                  HEAVY RAINS               EXCESSIVE SNOW 
##                           26                           25 
##              HEAVY LAKE SNOW               TIDAL FLOODING 
##                           25                           25 
##             FREEZING DRIZZLE                   GUSTY WIND 
##                           24                           24 
##             LAKE EFFECT SNOW              LAKESHORE FLOOD 
##                           23                           23 
##          LIGHT FREEZING RAIN         SEVERE THUNDERSTORMS 
##                           23                           23 
## UNSEASONABLY COLD/WIND CHILL                      (Other) 
##                           23                         1774

Still too many event types. It seems to be sensible to select events with frequency > 0.1%

min_freq <- length(ds$evt)[1] * 0.001
evt_freq <- tapply(ds$evt, ds$evt, length)
events <- names(evt_freq[evt_freq > min_freq])
table(ds$evt[ds$evt %in% events])
## 
##                 BLIZZARD                DENSE FOG                  DROUGHT 
##                     2719                     1293                     2488 
##           EXCESSIVE HEAT  EXTREME COLD/WIND CHILL              FLASH FLOOD 
##                     1678                     1659                    54278 
##                    FLOOD             FROST/FREEZE             FUNNEL CLOUD 
##                    29913                     1343                     6844 
##                     HAIL               HEAVY RAIN               HEAVY SNOW 
##                   288661                    11742                    15708 
##                HIGH WIND                ICE STORM                LIGHTNING 
##                    21802                     2006                    15755 
## MARINE THUNDERSTORM WIND              STRONG WIND        THUNDERSTORM WIND 
##                    11987                     3569                   324713 
##                  TORNADO               WATERSPOUT                 WILDFIRE 
##                    60652                     3797                     4219 
##             WINTER STORM           WINTER WEATHER 
##                    11433                     8155

Filter the dataset with events and select cols needed for analysis.

ds1 <- ds[ds$evt %in% events, c("begin.date", "evt", "FATALITIES", "INJURIES", "PROPDMG", 
                                    "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
summary(ds1)
##    begin.date             evt              FATALITIES       
##  Min.   :1950-01-03   Length:886414      Min.   :  0.00000  
##  1st Qu.:1995-03-19   Class :character   1st Qu.:  0.00000  
##  Median :2002-03-29   Mode  :character   Median :  0.00000  
##  Mean   :1998-12-05                      Mean   :  0.01362  
##  3rd Qu.:2007-07-27                      3rd Qu.:  0.00000  
##  Max.   :2011-11-30                      Max.   :158.00000  
##     INJURIES            PROPDMG         PROPDMGEXP       
##  Min.   :   0.0000   Min.   :   0.00   Length:886414     
##  1st Qu.:   0.0000   1st Qu.:   0.00   Class :character  
##  Median :   0.0000   Median :   0.00   Mode  :character  
##  Mean   :   0.1494   Mean   :  11.95                     
##  3rd Qu.:   0.0000   3rd Qu.:   0.50                     
##  Max.   :1700.0000   Max.   :5000.00                     
##     CROPDMG         CROPDMGEXP       
##  Min.   :  0.000   Length:886414     
##  1st Qu.:  0.000   Class :character  
##  Median :  0.000   Mode  :character  
##  Mean   :  1.506                     
##  3rd Qu.:  0.000                     
##  Max.   :990.000

Check magnitudes of damages.

table(ds1$PROPDMGEXP, useNA = "ifany")
## 
##      -      ?      +      0      1      2      3      4      5      6 
##      1      8      4    215     23     13      4      4     28      4 
##      7      8      B      h      H      K      m      M   <NA> 
##      5      1     14      1      6 417736      6  10793 457548
table(ds1$CROPDMGEXP, useNA = "ifany")
## 
##      ?      0      2      B      k      K      M   <NA> 
##      6     19      1      5     19 276613   1826 607925

Proportion of missing values is high. Check amounts with missing magnitude.

table(ds1$PROPDMG[is.na(ds1$PROPDMGEXP)])
## 
##      0      1      2      3      4      5      6      7      8      9 
## 457476      4      7     16      8     11      5      3      2      3 
##     10     20     35     75 
##      7      4      1      1
table(ds1$CROPDMG[is.na(ds1$CROPDMGEXP)])
## 
##      0      3      4 
## 607922      1      2

Most of them are 0.

Transform magnitues to figures. Million as unit.

ds1$pdmg[ds1$PROPDMGEXP %in% c("k", "K")] <- 0.001
ds1$pdmg[ds1$PROPDMGEXP %in% c("m", "M")] <- 1
ds1$pdmg[ds1$PROPDMGEXP %in% c("b", "B")] <- 1000
ds1$pdmg[is.na(ds1$pdmg)] <- 0

ds1$cdmg[ds1$CROPDMGEXP %in% c("k", "K")] <- 0.001
ds1$cdmg[ds1$CROPDMGEXP %in% c("m", "M")] <- 1
ds1$cdmg[ds1$CROPDMGEXP %in% c("b", "B")] <- 1000
ds1$cdmg[is.na(ds1$cdmg)] <- 0

Obtain same magnitude and total amount of damage per event.

ds1$pd <- ds1$PROPDMG * ds1$pdmg
summary(ds1$pd)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 3.10e-01 0.00e+00 1.15e+05
ds1$cd <- ds1$CROPDMG * ds1$cdmg
summary(ds1$cd)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.0e+00 0.0e+00 0.0e+00 4.1e-02 0.0e+00 5.0e+03
ds1$td <- ds1$pd + ds1$cd
summary(ds1$td)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 3.50e-01 0.00e+00 1.15e+05
ds1$tif <- ds1$FATALITIES + ds1$INJURIES

head(ds1)
##   begin.date     evt FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 1950-04-18 TORNADO          0       15    25.0          K       0
## 2 1950-04-18 TORNADO          0        0     2.5          K       0
## 3 1951-02-20 TORNADO          0        2    25.0          K       0
## 4 1951-06-08 TORNADO          0        2     2.5          K       0
## 5 1951-11-15 TORNADO          0        2     2.5          K       0
## 6 1951-11-15 TORNADO          0        6     2.5          K       0
##   CROPDMGEXP  pdmg cdmg     pd cd     td tif
## 1       <NA> 0.001    0 0.0250  0 0.0250  15
## 2       <NA> 0.001    0 0.0025  0 0.0025   0
## 3       <NA> 0.001    0 0.0250  0 0.0250   2
## 4       <NA> 0.001    0 0.0025  0 0.0025   2
## 5       <NA> 0.001    0 0.0025  0 0.0025   2
## 6       <NA> 0.001    0 0.0025  0 0.0025   6

Results

Make some plots to see results.

Types of events most harmful with respect to population health across the US.

Filter events with fatalities and injuries.

ds2 <- ds1[ds1$tif > 0, c("begin.date", "evt", "FATALITIES", "INJURIES", "tif")]

Scatter plot.

qplot(evt, tif, data = ds2, log = "y", xlab = "Event", ylab = "log(People)",
      main = "Fatalities and injuries by event type", geom = "jitter") +
     theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

The result is that Tornado is by far the most harmful event type in terms of fatalities and injuries.

Types of events that have the greatest economic consequences across the US.

Filter events with economic consequences.

ds3 <- ds1[ds1$td > 0, c("begin.date", "evt", "pd", "cd", "td")]

Scatter plot.

qplot(evt, td, data = ds3, log = "y", xlab = "Event", ylab = "log(US$ in millions)",
      main = "Economic damage by event type", geom = "jitter", alpha = I(1/10)) + 
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))

There is one outlier of type flood that distort the result by totals.

summary(ds3$td[ds3$evt == "FLOOD"])
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##      0.00      0.01      0.05     13.35      0.30 115000.00
ds3[ds3$td == max(ds3$td), ]
##        begin.date   evt     pd   cd       td
## 605953 2006-01-01 FLOOD 115000 32.5 115032.5

The result is that Tornado is the event type that causes greatest economic damage. Floods, hail and thunderstorm wind are also very damaging.