Synopsis

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property or crop losses (in US dollars).

This analysis uses the dataset to identify the most severe weather events in terms of human and material losses, and to quantify those losses by type of event, so that funds can be allocated to these events accordingly.

Tornadoes are the principal cause of death and injury from severe weather events.

Framing the problem

The objective of the following analysis is to help prioritize resources for different types of weather events, based on their relative impact on wealth destruction and human loss.

We propose to answer the two following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

The data for this analysis comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Data Processing

Getting data from files

The following code chunk loads the storm database into an R data frame and gives us a glimpse of the data structure (I previously downloaded the data from here and placed it in the data directory).

sd <- read.csv(bzfile("./data/data.csv.bz2"))
str(sd)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
a <- dim(sd)

This database holds 902297 observations of weather events in the United States, across 37 variables.

Processing the data

To answer our questions we only need eight of those variables, and only the observations that caused fatalities, injuries, or property or crop damage.

library(dplyr)
SD <- as_tibble(sd) ## tbl_df() is deprecated in recent dplyr versions
rm(sd)
SD <- SD %>%
        select(BGN_DATE, EVTYPE, FATALITIES, INJURIES,
               PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
        filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
dim2 <- dim(SD)

We are left with only 254633 observations of the initial 902297. The following sections give an overview of these eight variables and the reasoning for their inclusion in later analyses.

EVTYPE

The first variable we will introduce is EVTYPE: it is the event name designator and the cornerstone variable for all subsequent analyses. According to the Storm Data documentation, there are 48 different weather events considered in this database.

However, if we look at the values this variable takes, we find many more possibilities.

SD$EVTYPE <- factor(SD$EVTYPE)
sort(unique(SD$EVTYPE))
##   [1]    HIGH SURF ADVISORY           FLASH FLOOD                  
##   [3]  TSTM WIND                      TSTM WIND (G45)              
##   [5] ?                              AGRICULTURAL FREEZE           
##   [7] APACHE COUNTY                  ASTRONOMICAL HIGH TIDE        
##   [9] ASTRONOMICAL LOW TIDE          AVALANCE                      
##  [11] AVALANCHE                      Beach Erosion                 
##  [13] BLACK ICE                      BLIZZARD                      
##  [15] BLIZZARD/WINTER STORM          BLOWING DUST                  
##  [17] blowing snow                   BLOWING SNOW                  
##  [19] BREAKUP FLOODING               BRUSH FIRE                    
##  [21] COASTAL  FLOODING/EROSION      COASTAL EROSION               
##  [23] Coastal Flood                  COASTAL FLOOD                 
##  [25] Coastal Flooding               COASTAL FLOODING              
##  [27] COASTAL FLOODING/EROSION       Coastal Storm                 
##  [29] COASTAL STORM                  COASTAL SURGE                 
##  [31] COASTALSTORM                   Cold                          
##  [33] COLD                           COLD AIR TORNADO              
##  [35] COLD AND SNOW                  COLD AND WET CONDITIONS       
##  [37] Cold Temperature               COLD WAVE                     
##  [39] COLD WEATHER                   COLD/WIND CHILL               
##  [41] COLD/WINDS                     COOL AND WET                  
##  [43] DAM BREAK                      Damaging Freeze               
##  [45] DAMAGING FREEZE                DENSE FOG                     
##  [47] DENSE SMOKE                    DOWNBURST                     
##  [49] DROUGHT                        DROUGHT/EXCESSIVE HEAT        
##  [51] DROWNING                       DRY MICROBURST                
##  [53] DRY MIRCOBURST WINDS           Dust Devil                    
##  [55] DUST DEVIL                     DUST DEVIL WATERSPOUT         
##  [57] DUST STORM                     DUST STORM/HIGH WINDS         
##  [59] Early Frost                    Erosion/Cstl Flood            
##  [61] EXCESSIVE HEAT                 EXCESSIVE RAINFALL            
##  [63] EXCESSIVE SNOW                 EXCESSIVE WETNESS             
##  [65] Extended Cold                  Extreme Cold                  
##  [67] EXTREME COLD                   EXTREME COLD/WIND CHILL       
##  [69] EXTREME HEAT                   EXTREME WIND CHILL            
##  [71] EXTREME WINDCHILL              FALLING SNOW/ICE              
##  [73] FLASH FLOOD                    FLASH FLOOD - HEAVY RAIN      
##  [75] FLASH FLOOD FROM ICE JAMS      FLASH FLOOD LANDSLIDES        
##  [77] FLASH FLOOD WINDS              FLASH FLOOD/                  
##  [79] FLASH FLOOD/ STREET            FLASH FLOOD/FLOOD             
##  [81] FLASH FLOOD/LANDSLIDE          FLASH FLOODING                
##  [83] FLASH FLOODING/FLOOD           FLASH FLOODING/THUNDERSTORM WI
##  [85] FLASH FLOODS                   FLOOD                         
##  [87] FLOOD & HEAVY RAIN             FLOOD FLASH                   
##  [89] FLOOD/FLASH                    FLOOD/FLASH FLOOD             
##  [91] FLOOD/FLASH/FLOOD              FLOOD/FLASHFLOOD              
##  [93] FLOOD/RAIN/WINDS               FLOOD/RIVER FLOOD             
##  [95] FLOODING                       FLOODING/HEAVY RAIN           
##  [97] FLOODS                         FOG                           
##  [99] FOG AND COLD TEMPERATURES      FOREST FIRES                  
## [101] Freeze                         FREEZE                        
## [103] Freezing drizzle               Freezing Drizzle              
## [105] FREEZING DRIZZLE               FREEZING FOG                  
## [107] Freezing Rain                  FREEZING RAIN                 
## [109] FREEZING RAIN/SLEET            FREEZING RAIN/SNOW            
## [111] Freezing Spray                 FROST                         
## [113] Frost/Freeze                   FROST/FREEZE                  
## [115] FROST\\FREEZE                  FUNNEL CLOUD                  
## [117] Glaze                          GLAZE                         
## [119] GLAZE ICE                      GLAZE/ICE STORM               
## [121] gradient wind                  Gradient wind                 
## [123] GRADIENT WIND                  GRASS FIRES                   
## [125] GROUND BLIZZARD                GUSTNADO                      
## [127] GUSTY WIND                     GUSTY WIND/HAIL               
## [129] GUSTY WIND/HVY RAIN            Gusty wind/rain               
## [131] Gusty winds                    Gusty Winds                   
## [133] GUSTY WINDS                    HAIL                          
## [135] HAIL 0.75                      HAIL 075                      
## [137] HAIL 100                       HAIL 125                      
## [139] HAIL 150                       HAIL 175                      
## [141] HAIL 200                       HAIL 275                      
## [143] HAIL 450                       HAIL 75                       
## [145] HAIL DAMAGE                    HAIL/WIND                     
## [147] HAIL/WINDS                     HAILSTORM                     
## [149] HARD FREEZE                    HAZARDOUS SURF                
## [151] HEAT                           Heat Wave                     
## [153] HEAT WAVE                      HEAT WAVE DROUGHT             
## [155] HEAT WAVES                     HEAVY LAKE SNOW               
## [157] HEAVY MIX                      HEAVY PRECIPITATION           
## [159] HEAVY RAIN                     HEAVY RAIN AND FLOOD          
## [161] Heavy Rain/High Surf           HEAVY RAIN/LIGHTNING          
## [163] HEAVY RAIN/SEVERE WEATHER      HEAVY RAIN/SMALL STREAM URBAN 
## [165] HEAVY RAIN/SNOW                HEAVY RAINS                   
## [167] HEAVY RAINS/FLOODING           HEAVY SEAS                    
## [169] HEAVY SHOWER                   HEAVY SNOW                    
## [171] HEAVY SNOW-SQUALLS             HEAVY SNOW AND HIGH WINDS     
## [173] HEAVY SNOW AND STRONG WINDS    Heavy snow shower             
## [175] HEAVY SNOW SQUALLS             HEAVY SNOW/BLIZZARD           
## [177] HEAVY SNOW/BLIZZARD/AVALANCHE  HEAVY SNOW/FREEZING RAIN      
## [179] HEAVY SNOW/HIGH WINDS & FLOOD  HEAVY SNOW/ICE                
## [181] HEAVY SNOW/SQUALLS             HEAVY SNOW/WIND               
## [183] HEAVY SNOW/WINTER STORM        HEAVY SNOWPACK                
## [185] Heavy Surf                     HEAVY SURF                    
## [187] Heavy surf and wind            HEAVY SURF COASTAL FLOODING   
## [189] HEAVY SURF/HIGH SURF           HEAVY SWELLS                  
## [191] HIGH                           HIGH  WINDS                   
## [193] HIGH SEAS                      High Surf                     
## [195] HIGH SURF                      HIGH SWELLS                   
## [197] HIGH TIDES                     HIGH WATER                    
## [199] HIGH WAVES                     HIGH WIND                     
## [201] HIGH WIND (G40)                HIGH WIND 48                  
## [203] HIGH WIND AND SEAS             HIGH WIND DAMAGE              
## [205] HIGH WIND/BLIZZARD             HIGH WIND/HEAVY SNOW          
## [207] HIGH WIND/SEAS                 HIGH WINDS                    
## [209] HIGH WINDS HEAVY RAINS         HIGH WINDS/                   
## [211] HIGH WINDS/COASTAL FLOOD       HIGH WINDS/COLD               
## [213] HIGH WINDS/HEAVY RAIN          HIGH WINDS/SNOW               
## [215] HURRICANE                      HURRICANE-GENERATED SWELLS    
## [217] Hurricane Edouard              HURRICANE EMILY               
## [219] HURRICANE ERIN                 HURRICANE FELIX               
## [221] HURRICANE GORDON               HURRICANE OPAL                
## [223] HURRICANE OPAL/HIGH WINDS      HURRICANE/TYPHOON             
## [225] HVY RAIN                       HYPERTHERMIA/EXPOSURE         
## [227] HYPOTHERMIA                    Hypothermia/Exposure          
## [229] HYPOTHERMIA/EXPOSURE           ICE                           
## [231] ICE AND SNOW                   ICE FLOES                     
## [233] ICE JAM                        Ice jam flood (minor          
## [235] ICE JAM FLOODING               ICE ON ROAD                   
## [237] ICE ROADS                      ICE STORM                     
## [239] ICE STORM/FLASH FLOOD          ICE/STRONG WINDS              
## [241] ICY ROADS                      LAKE-EFFECT SNOW              
## [243] Lake Effect Snow               LAKE EFFECT SNOW              
## [245] LAKE FLOOD                     LAKESHORE FLOOD               
## [247] LANDSLIDE                      LANDSLIDES                    
## [249] Landslump                      LANDSPOUT                     
## [251] LATE SEASON SNOW               LIGHT FREEZING RAIN           
## [253] Light snow                     Light Snow                    
## [255] LIGHT SNOW                     Light Snowfall                
## [257] LIGHTING                       LIGHTNING                     
## [259] LIGHTNING  WAUSEON             LIGHTNING AND HEAVY RAIN      
## [261] LIGHTNING AND THUNDERSTORM WIN LIGHTNING FIRE                
## [263] LIGHTNING INJURY               LIGHTNING THUNDERSTORM WINDS  
## [265] LIGHTNING.                     LIGHTNING/HEAVY RAIN          
## [267] LIGNTNING                      LOW TEMPERATURE               
## [269] MAJOR FLOOD                    Marine Accident               
## [271] MARINE HAIL                    MARINE HIGH WIND              
## [273] MARINE MISHAP                  MARINE STRONG WIND            
## [275] MARINE THUNDERSTORM WIND       MARINE TSTM WIND              
## [277] Microburst                     MICROBURST                    
## [279] MICROBURST WINDS               MINOR FLOODING                
## [281] MIXED PRECIP                   Mixed Precipitation           
## [283] MIXED PRECIPITATION            MUD SLIDE                     
## [285] MUD SLIDES                     MUD SLIDES URBAN FLOODING     
## [287] Mudslide                       MUDSLIDE                      
## [289] Mudslides                      MUDSLIDES                     
## [291] NON-SEVERE WIND DAMAGE         NON-TSTM WIND                 
## [293] NON TSTM WIND                  Other                         
## [295] OTHER                          RAIN                          
## [297] RAIN/SNOW                      RAIN/WIND                     
## [299] RAINSTORM                      RAPIDLY RISING WATER          
## [301] RECORD COLD                    RECORD HEAT                   
## [303] RECORD RAINFALL                RECORD SNOW                   
## [305] RECORD/EXCESSIVE HEAT          RIP CURRENT                   
## [307] RIP CURRENTS                   RIP CURRENTS/HEAVY SURF       
## [309] RIVER AND STREAM FLOOD         RIVER FLOOD                   
## [311] River Flooding                 RIVER FLOODING                
## [313] ROCK SLIDE                     ROGUE WAVE                    
## [315] ROUGH SEAS                     ROUGH SURF                    
## [317] RURAL FLOOD                    SEICHE                        
## [319] SEVERE THUNDERSTORM            SEVERE THUNDERSTORM WINDS     
## [321] SEVERE THUNDERSTORMS           SEVERE TURBULENCE             
## [323] SLEET                          SLEET/ICE STORM               
## [325] SMALL HAIL                     SMALL STREAM FLOOD            
## [327] Snow                           SNOW                          
## [329] SNOW ACCUMULATION              SNOW AND HEAVY SNOW           
## [331] SNOW AND ICE                   SNOW AND ICE STORM            
## [333] SNOW FREEZING RAIN             SNOW SQUALL                   
## [335] Snow Squalls                   SNOW SQUALLS                  
## [337] SNOW/ BITTER COLD              SNOW/ ICE                     
## [339] SNOW/BLOWING SNOW              SNOW/COLD                     
## [341] SNOW/FREEZING RAIN             SNOW/HEAVY SNOW               
## [343] SNOW/HIGH WINDS                SNOW/ICE                      
## [345] SNOW/ICE STORM                 SNOW/SLEET                    
## [347] SNOW/SLEET/FREEZING RAIN       SNOWMELT FLOODING             
## [349] STORM FORCE WINDS              STORM SURGE                   
## [351] STORM SURGE/TIDE               Strong Wind                   
## [353] STRONG WIND                    Strong Winds                  
## [355] STRONG WINDS                   THUDERSTORM WINDS             
## [357] THUNDEERSTORM WINDS            THUNDERESTORM WINDS           
## [359] THUNDERSNOW                    THUNDERSTORM                  
## [361] THUNDERSTORM  WINDS            THUNDERSTORM DAMAGE TO        
## [363] THUNDERSTORM HAIL              THUNDERSTORM WIND             
## [365] THUNDERSTORM WIND (G40)        THUNDERSTORM WIND 60 MPH      
## [367] THUNDERSTORM WIND 65 MPH       THUNDERSTORM WIND 65MPH       
## [369] THUNDERSTORM WIND 98 MPH       THUNDERSTORM WIND G50         
## [371] THUNDERSTORM WIND G52          THUNDERSTORM WIND G55         
## [373] THUNDERSTORM WIND G60          THUNDERSTORM WIND TREES       
## [375] THUNDERSTORM WIND.             THUNDERSTORM WIND/ TREE       
## [377] THUNDERSTORM WIND/ TREES       THUNDERSTORM WIND/AWNING      
## [379] THUNDERSTORM WIND/HAIL         THUNDERSTORM WIND/LIGHTNING   
## [381] THUNDERSTORM WINDS             THUNDERSTORM WINDS 13         
## [383] THUNDERSTORM WINDS 63 MPH      THUNDERSTORM WINDS AND        
## [385] THUNDERSTORM WINDS G60         THUNDERSTORM WINDS HAIL       
## [387] THUNDERSTORM WINDS LIGHTNING   THUNDERSTORM WINDS.           
## [389] THUNDERSTORM WINDS/ FLOOD      THUNDERSTORM WINDS/FLOODING   
## [391] THUNDERSTORM WINDS/FUNNEL CLOU THUNDERSTORM WINDS/HAIL       
## [393] THUNDERSTORM WINDS53           THUNDERSTORM WINDSHAIL        
## [395] THUNDERSTORM WINDSS            THUNDERSTORM WINS             
## [397] THUNDERSTORMS                  THUNDERSTORMS WIND            
## [399] THUNDERSTORMS WINDS            THUNDERSTORMW                 
## [401] THUNDERSTORMWINDS              THUNDERSTROM WIND             
## [403] THUNDERTORM WINDS              THUNERSTORM WINDS             
## [405] Tidal Flooding                 TIDAL FLOODING                
## [407] TORNADO                        TORNADO F0                    
## [409] TORNADO F1                     TORNADO F2                    
## [411] TORNADO F3                     TORNADOES                     
## [413] TORNADOES, TSTM WIND, HAIL     TORNDAO                       
## [415] Torrential Rainfall            TROPICAL DEPRESSION           
## [417] TROPICAL STORM                 TROPICAL STORM ALBERTO        
## [419] TROPICAL STORM DEAN            TROPICAL STORM GORDON         
## [421] TROPICAL STORM JERRY           Tstm Wind                     
## [423] TSTM WIND                      TSTM WIND  (G45)              
## [425] TSTM WIND (41)                 TSTM WIND (G35)               
## [427] TSTM WIND (G40)                TSTM WIND (G45)               
## [429] TSTM WIND 40                   TSTM WIND 45                  
## [431] TSTM WIND 55                   TSTM WIND 65)                 
## [433] TSTM WIND AND LIGHTNING        TSTM WIND DAMAGE              
## [435] TSTM WIND G45                  TSTM WIND G58                 
## [437] TSTM WIND/HAIL                 TSTM WINDS                    
## [439] TSTMW                          TSUNAMI                       
## [441] TUNDERSTORM WIND               TYPHOON                       
## [443] Unseasonable Cold              UNSEASONABLY COLD             
## [445] UNSEASONABLY WARM              UNSEASONABLY WARM AND DRY     
## [447] UNSEASONAL RAIN                URBAN AND SMALL               
## [449] URBAN AND SMALL STREAM FLOODIN URBAN FLOOD                   
## [451] URBAN FLOODING                 URBAN FLOODS                  
## [453] URBAN SMALL                    URBAN/SMALL STREAM            
## [455] URBAN/SMALL STREAM FLOOD       URBAN/SML STREAM FLD          
## [457] VOLCANIC ASH                   WARM WEATHER                  
## [459] WATERSPOUT                     WATERSPOUT-                   
## [461] WATERSPOUT-TORNADO             WATERSPOUT TORNADO            
## [463] WATERSPOUT/ TORNADO            WATERSPOUT/TORNADO            
## [465] WET MICROBURST                 Whirlwind                     
## [467] WHIRLWIND                      WILD FIRES                    
## [469] WILD/FOREST FIRE               WILD/FOREST FIRES             
## [471] WILDFIRE                       WILDFIRES                     
## [473] Wind                           WIND                          
## [475] WIND AND WAVE                  Wind Damage                   
## [477] WIND DAMAGE                    WIND STORM                    
## [479] WIND/HAIL                      WINDS                         
## [481] WINTER STORM                   WINTER STORM HIGH WINDS       
## [483] WINTER STORMS                  WINTER WEATHER                
## [485] WINTER WEATHER MIX             WINTER WEATHER/MIX            
## [487] Wintry Mix                     WINTRY MIX                    
## 488 Levels:    HIGH SURF ADVISORY  FLASH FLOOD ... WINTRY MIX
lev1 <- nlevels(SD$EVTYPE)

In fact, the filtered database contains 488 different values for this variable, as listed above.
Information appearing in Storm Data is provided by sources outside the National Weather Service (NWS), such as the media, law enforcement and/or other government agencies, private companies, individuals, etc. As a result, the same value is spelled in many different ways (high wind, HIGH WIND or high winds), and there are typos as well.

The following code chunks try to fix the majority of these issues in a new variable, EVTYPE1.

SD$EVTYPE <- tolower(SD$EVTYPE) ## all lower case
SD$EVTYPE1 <- trimws(SD$EVTYPE) ## work on a copy; each step below builds on the previous one
SD$EVTYPE1 <- gsub("winds", "wind", SD$EVTYPE1)
SD$EVTYPE1 <- gsub("storms", "storm", SD$EVTYPE1)
SD$EVTYPE1 <- gsub("rains", "rain", SD$EVTYPE1)
SD$EVTYPE1 <- gsub("\\bseas?\\b", "surf", SD$EVTYPE1) ## whole word only, so "season" is untouched
SD$EVTYPE1 <- sub("torndao", "tornado", SD$EVTYPE1)
SD$EVTYPE1 <- gsub("floods", "flood", SD$EVTYPE1)
SD$EVTYPE1 <- gsub("flooding", "flood", SD$EVTYPE1)
SD$EVTYPE1 <- gsub(" +", " ", SD$EVTYPE1) ## collapse repeated spaces
SD$EVTYPE1 <- gsub(".", "", SD$EVTYPE1, fixed = TRUE) ## literal dot, not the regex wildcard
SD$EVTYPE1 <- sub("lake flood", "lakeshore flood", SD$EVTYPE1)
snow <- unique(grep("snow", SD$EVTYPE, value = TRUE))
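
To see what the substitutions above are meant to achieve, here is a minimal sketch on a hand-made vector. The labels are spelling variants like those in the listing above; the helper name normalise_label is hypothetical and not part of the original analysis. The point it illustrates is that each substitution has to be applied to the result of the previous one.

```r
# Hypothetical helper: every step works on the output of the step before it.
normalise_label <- function(x) {
  x <- tolower(trimws(x))                  # lower case, no leading/trailing blanks
  x <- gsub(" +", " ", x)                  # collapse repeated spaces
  x <- gsub(".", "", x, fixed = TRUE)      # drop literal dots
  x <- gsub("winds", "wind", x)            # plural to singular
  x <- gsub("floods|flooding", "flood", x)
  x
}

labels <- c("HIGH  WINDS", "High Wind", "THUNDERSTORM WINDS.",
            "Flash Floods", " FLASH FLOOD")
cleaned <- normalise_label(labels)
cleaned
length(unique(cleaned))  # five raw spellings collapse to three labels
```

Only a handful of substitutions are shown; the idea carries over to the full list used in the analysis.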

The documentation shows two possible snow-related values, heavy snow and lake-effect snow, but in reality there are all these different entries for snow-related events:

heavy snow, heavy snow/wind, heavy snowpack, snow, freezing rain/snow, thundersnow, heavy rain/snow, snow and heavy snow, snow/heavy snow, snow and ice, snow and ice storm, heavy lake snow, heavy snow/freezing rain, heavy snow/winter storm, heavy snow and high winds, heavy snow/high winds & flood, snow/cold, heavy snow squalls, snow squall, snow/ice storm, heavy snow/squalls, heavy snow-squalls, snow freezing rain, snow/sleet, snow/freezing rain, snow squalls, snow/sleet/freezing rain, record snow, blowing snow, heavy snow/blizzard, ice and snow, heavy snow/ice, high wind/heavy snow, snow/ice, heavy snow/blizzard/avalanche, snow/ bitter cold, snow/high winds, high winds/snow, snowmelt flooding, heavy snow and strong winds, snow accumulation, snow/ ice, snow/blowing snow, heavy snow shower, light snow, light snowfall, cold and snow, rain/snow, lake effect snow, excessive snow, late season snow, falling snow/ice, lake-effect snow

Let’s try to fix this (to the best of our knowledge)…

lake <- grep("lake", sort(unique(grep("snow", SD$EVTYPE, value = TRUE))), value = TRUE)
SD$EVTYPE1[grepl("snow", SD$EVTYPE)] <- "heavy snow" ## generic label first
SD$EVTYPE1[SD$EVTYPE %in% lake] <- "lake-effect snow" ## specific label last, so it is not overwritten

The same applies to floods. The documentation lists four types of floods: coastal flood, flash flood, flood, and lakeshore flood.

sort(unique(grep("flood", SD$EVTYPE, value = TRUE)))
##  [1] " flash flood"                   "breakup flooding"              
##  [3] "coastal  flooding/erosion"      "coastal flood"                 
##  [5] "coastal flooding"               "coastal flooding/erosion"      
##  [7] "erosion/cstl flood"             "flash flood"                   
##  [9] "flash flood - heavy rain"       "flash flood from ice jams"     
## [11] "flash flood landslides"         "flash flood winds"             
## [13] "flash flood/"                   "flash flood/ street"           
## [15] "flash flood/flood"              "flash flood/landslide"         
## [17] "flash flooding"                 "flash flooding/flood"          
## [19] "flash flooding/thunderstorm wi" "flash floods"                  
## [21] "flood"                          "flood & heavy rain"            
## [23] "flood flash"                    "flood/flash"                   
## [25] "flood/flash flood"              "flood/flash/flood"             
## [27] "flood/flashflood"               "flood/rain/winds"              
## [29] "flood/river flood"              "flooding"                      
## [31] "flooding/heavy rain"            "floods"                        
## [33] "heavy rain and flood"           "heavy rains/flooding"          
## [35] "heavy snow/high winds & flood"  "heavy surf coastal flooding"   
## [37] "high winds/coastal flood"       "ice jam flood (minor"          
## [39] "ice jam flooding"               "ice storm/flash flood"         
## [41] "lake flood"                     "lakeshore flood"               
## [43] "major flood"                    "minor flooding"                
## [45] "mud slides urban flooding"      "river and stream flood"        
## [47] "river flood"                    "river flooding"                
## [49] "rural flood"                    "small stream flood"            
## [51] "snowmelt flooding"              "thunderstorm winds/ flood"     
## [53] "thunderstorm winds/flooding"    "tidal flooding"                
## [55] "urban and small stream floodin" "urban flood"                   
## [57] "urban flooding"                 "urban floods"                  
## [59] "urban/small stream flood"

As far as I can see, floods can be a harmful consequence of heavy rain, so all entries such as flood/rain will be set to flood. Coas

SD$EVTYPE1[grepl("flood", SD$EVTYPE)] <- "flood" ## generic label first
SD$EVTYPE1[grepl("coastal|cstl|tidal", SD$EVTYPE)] <- "coastal flood"
SD$EVTYPE1[grepl("flash", SD$EVTYPE)] <- "flash flood"
SD$EVTYPE1[grepl("lakeshore|lake flood", SD$EVTYPE)] <- "lakeshore flood" ## specific labels last
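
Because several patterns can match the same label, the order of the assignments decides which category an entry ends up in. A minimal sketch of that idea, assuming a hypothetical rule table (rules and recode_by_rules are illustrative names, not part of the original analysis) in which the generic rule comes first and the more specific rules afterwards, so that the specific ones win:

```r
# Hypothetical rule table; later rows overwrite earlier ones.
rules <- data.frame(
  pattern = c("flood", "coastal|cstl|tidal", "flash", "lakeshore"),
  label   = c("flood", "coastal flood", "flash flood", "lakeshore flood"),
  stringsAsFactors = FALSE
)

# Apply the rules in order; labels matching no rule are left unchanged.
recode_by_rules <- function(x, rules) {
  out <- x
  for (i in seq_len(nrow(rules))) {
    out[grepl(rules$pattern[i], x)] <- rules$label[i]
  }
  out
}

x <- c("urban flooding", "erosion/cstl flood", "flash flood/flood",
       "lakeshore flood", "hail")
recoded <- recode_by_rules(x, rules)
recoded
```

Putting the "flood" rule last instead would relabel every coastal, flash and lakeshore entry as a plain flood.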

Hurricane entries also carry a lot of unnecessary detail, although the documentation has a single entry for them: Hurricane (Typhoon). The same goes for thunderstorms and many others…

sort(unique(grep("hurricane", SD$EVTYPE, value = TRUE)))
##  [1] "hurricane"                  "hurricane-generated swells"
##  [3] "hurricane edouard"          "hurricane emily"           
##  [5] "hurricane erin"             "hurricane felix"           
##  [7] "hurricane gordon"           "hurricane opal"            
##  [9] "hurricane opal/high winds"  "hurricane/typhoon"
SD$EVTYPE1[grepl("hurricane", SD$EVTYPE)] <- "hurricane (typhoon)"
SD$EVTYPE1[grepl("high wind", SD$EVTYPE)] <- "high wind"
## thunderstorm, its misspellings in the data, and the tstm abbreviation
SD$EVTYPE1[grepl("thunderstorm|thuderstorm|thunderestorm|thunerstorm|thundeerstorm|thundertorm|tstm",
                 SD$EVTYPE)] <- "thunderstorm"
SD$EVTYPE1[grepl("current", SD$EVTYPE)] <- "rip currents"
SD$EVTYPE1[grepl("tropical storm", SD$EVTYPE)] <- "tropical storm"
SD$EVTYPE1[grepl("lighting|lightning", SD$EVTYPE)] <- "lightning"
SD$EVTYPE1[grepl("tornado", SD$EVTYPE)] <- "tornado"
SD$EVTYPE1[grepl("hail", SD$EVTYPE)] <- "hail"

We can also probably skip some of them, such as those containing the word “summary” (I am not sure what they stand for, but they hold nothing relevant for our analysis).

library(dplyr)
## EVTYPE was lower-cased above, so the pattern must be lower case too
SDsumm <- filter(SD, grepl("summary", EVTYPE))
SDsummD <- dim(SDsumm)
summarise(SDsumm, fatalities = sum(FATALITIES), injuries = sum(INJURIES), 
          property = sum(PROPDMG), crops = sum(CROPDMG))
## Source: local data frame [1 x 4]
## 
##   fatalities injuries property crops
## 1          0        0        0     0
SD <- filter(SD, !grepl("summary", EVTYPE)) ## removes "summary" entries from EVTYPE

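This removes 0 observations (SDsummD is 0 rows by 9 columns): none of the 488 values listed above contains the word summary, so the earlier filter on fatalities, injuries and damage had already dropped every such entry. This confirms that they hold no data regarding the variables of interest to us.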
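
One detail worth checking here: grepl() is case-sensitive by default, and EVTYPE was converted to lower case earlier, so a capitalised pattern can never match. A minimal sketch on two made-up labels (the strings are illustrative, not taken from the filtered data):

```r
# Illustrative labels, lower-cased the same way EVTYPE was.
x <- tolower(c("Summary of an example day", "TORNADO"))

hit_capital  <- grepl("Summary", x)                      # capitalised pattern
hit_lower    <- grepl("summary", x)                      # lower-case pattern
hit_nocase   <- grepl("Summary", x, ignore.case = TRUE)  # case-insensitive match

hit_capital
hit_lower
hit_nocase
```

Either the lower-case pattern or ignore.case = TRUE finds the first label; the capitalised pattern alone finds nothing.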

Unfortunately, I don’t have any more time to further improve this data, so let’s go on.

BGN_DATE

BGN_DATE gives us the event date. We need the year in order to express wealth loss in constant dollars, i.e., to compute all monetary values with the same time reference. This will be accomplished by using the USA’s CPI (consumer price index) according to this formulation.

class(SD$BGN_DATE)
## [1] "factor"
SD$BGN_DATE[1]
## [1] 4/18/1950 0:00:00
## 16335 Levels: 1/1/1966 0:00:00 1/1/1972 0:00:00 ... 9/9/2011 0:00:00

First we need to process this variable. We are interested in getting the year only. The following code chunk achieves just that.

library(lubridate)
SD$BGN_DATE <- mdy_hms(SD$BGN_DATE)
SD$BGN_DATE <-year(SD$BGN_DATE)
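
With the year available, the CPI adjustment mentioned above can be sketched as follows. This only illustrates the usual rescaling, amount × CPI(reference year) / CPI(event year), and may differ from the formulation referred to above. The function name to_reference_dollars and the index values are placeholders, not official CPI figures.

```r
# Placeholder index values for illustration only; real CPI figures must be
# taken from an official series.
cpi <- c("1950" = 25, "1980" = 80, "2011" = 225)

# Hypothetical helper: rescale nominal amounts to dollars of the reference year.
to_reference_dollars <- function(amount, year, cpi, ref_year = "2011") {
  amount * cpi[[ref_year]] / unname(cpi[as.character(year)])
}

adjusted <- to_reference_dollars(c(1000, 1000), c(1950, 2011), cpi)
adjusted  # the 1950 amount is scaled up, the 2011 amount is unchanged
```

With these placeholder values, 1000 dollars of 1950 become 9000 dollars of 2011, while an amount already expressed in 2011 dollars stays the same.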

FATALITIES AND INJURIES

These two variables give us the human losses inflicted by extreme weather events in the USA from 1950 to 2011.

ttl_fatal <- format(sum(SD$FATALITIES), digits = 6, big.mark=" ")
ttl_inj <- format(sum(SD$INJURIES), digits = 6, big.mark=" ")
yearFat <- sort(tapply(SD$FATALITIES, SD$BGN_DATE, sum), decreasing = T)
yearFat[1:10]
## 1995 2011 1999 1998 1997 2006 1996 1953 2002 2008 
## 1491 1002  908  687  601  599  542  519  498  488
maxfat <- max(yearFat)
yearInj <- sort(tapply(SD$INJURIES, SD$BGN_DATE, sum), decreasing = T)
yearInj[1:10]
##  1998  2011  1974  1965  1999  1953  1995  1994  1997  2006 
## 11177  7792  6824  5197  5148  5131  4480  4161  3800  3368

In the last 60 years, 15 145 people died in extreme weather events across the USA, while 140 528 people sustained some kind of injury. The tables above give the years with the highest figures in terms of personal loss. 1995 was the year in which most people died in the States due to extreme weather events, with 1491 deaths.

typeFat <- sort(tapply(SD$FATALITIES, SD$EVTYPE, sum), decreasing = T)
typeFat[1:10]
##        tornado excessive heat    flash flood           heat      lightning 
##           5633           1903            978            937            816 
##      tstm wind          flood    rip current      high wind      avalanche 
##            504            470            368            248            224
maxTypeFat <- format(max(typeFat), digits = 4, big.mark=" ") 
rel10Fat <- format(sum(typeFat[1:10])/sum(SD$FATALITIES)*100, digits = 2)
trnrelFat <- format(max(typeFat)/sum(SD$FATALITIES)*100, digits = 2)
typeInj <- sort(tapply(SD$INJURIES, SD$EVTYPE, sum), decreasing = T)
typeInj[1:10]
##           tornado         tstm wind             flood    excessive heat 
##             91346              6957              6789              6525 
##         lightning              heat         ice storm       flash flood 
##              5230              2100              1975              1777 
## thunderstorm wind              hail 
##              1488              1361
maxTypeInj <- format(max(typeInj),digits = 5, big.mark=" ") 
rel10Inj <- format(sum(typeInj[1:10])/sum(SD$INJURIES)*100,digits = 2)
trnrelInj <- format(max(typeInj)/sum(SD$INJURIES)*100, digits = 2)

The top ten events are responsible for 80% of all registered deaths and 89% of all injuries. By type of event, tornadoes are the greatest cause of personal loss, with 5 633 deaths and 91 346 injuries. These figures correspond to 37% of all deaths and 65% of all injuries during the last 60 years.
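
The aggregation pattern behind these figures (sum per event type, sort, then take a share of the total) can be sketched on a toy data frame. The numbers below are made up for illustration; only the column names follow the dataset.

```r
# Toy records (hypothetical numbers)
toy <- data.frame(
    EVTYPE     = c("tornado", "flood", "tornado", "heat", "flood"),
    FATALITIES = c(5, 2, 3, 4, 1)
)

# Sum fatalities per event type and sort from largest to smallest
byType <- sort(tapply(toy$FATALITIES, toy$EVTYPE, sum), decreasing = TRUE)
byType

# Share of the total, in percent, for the top two event types
topShare <- sum(byType[1:2]) / sum(toy$FATALITIES) * 100
topShare
```

Here the tornado comes first with 8 deaths, and the top two types (tornado and heat) account for 12 of 15 deaths, i.e. 80%.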

PROPDMG-PROPDMGEXP and CROPDMG-CROPDMGEXP

Together, these two pairs of variables provide the dollar value of property damage and crop damage caused by extreme weather events. According to the documentation:

  • PROPDMG is entered as actual dollar amounts […] rounded to three significant digits
  • PROPDMGEXP provides an alphabetical character signifying the magnitude of the number: K for thousands, M for millions, and B for billions.

The same applies to the crop data. In order to get the total value of property and crops lost to severe weather events in the States during the last 60 years, we must combine each pair of variables into a new variable: PR for property and CR for crops.

SD <- mutate(SD, PR = 0, CR = 0)  ## assign the result so the new columns are kept
SD
## Source: local data frame [254,633 x 11]
## 
##    BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1      1950 tornado          0       15    25.0          K       0
## 2      1950 tornado          0        0     2.5          K       0
## 3      1951 tornado          0        2    25.0          K       0
## 4      1951 tornado          0        2     2.5          K       0
## 5      1951 tornado          0        2     2.5          K       0
## 6      1951 tornado          0        6     2.5          K       0
## 7      1951 tornado          0        1     2.5          K       0
## 8      1952 tornado          0        0     2.5          K       0
## 9      1952 tornado          1       14    25.0          K       0
## 10     1952 tornado          0        0    25.0          K       0
## ..      ...     ...        ...      ...     ...        ...     ...
## Variables not shown: CROPDMGEXP (fctr), EVTYPE1 (chr), PR (dbl), CR (dbl)
## Multiplier for each exponent code; any other code counts as 0, as in a row-by-row if/else
expMult <- c(K = 1000, M = 1000000, B = 1000000000)
toDollars <- function(dmg, exp) {
        mult <- unname(expMult[as.character(exp)])
        mult[is.na(mult)] <- 0
        dmg * mult
}
SD$PR <- toDollars(SD$PROPDMG, SD$PROPDMGEXP)
SD$CR <- toDollars(SD$CROPDMG, SD$CROPDMGEXP)
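
Since every exponent code other than K, M or B is turned into a value of zero, it may be worth checking how many rows that affects. A minimal sketch on toy vectors follows; the codes "k", "5" and "" are hypothetical examples of such values, not counts taken from the database.

```r
# Toy damage records with a mix of exponent codes (hypothetical values)
dmg <- c(25, 2.5, 10, 1, 3, 7)
exp <- c("K", "M", "", "B", "k", "5")

# How often does each exponent code occur?
table(exp)

# Rows whose code is not exactly K, M or B end up with a value of 0
dropped <- !(exp %in% c("K", "M", "B"))
sum(dropped)
```

On the real data, the same check would be run on SD$PROPDMGEXP and SD$CROPDMGEXP to see how much damage falls outside the three documented codes.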

Finally, we must understand that a dollar lost in 1950 is not equal to a dollar spent today, so we must restate all values in the dollars of the same year. For this we will use a dataset that gives the value of a 1950 dollar in every other year. For instance, one dollar of 1950 corresponds to about 5.3 dollars in 1990 and about 9.3 dollars in 2011.
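
The adjustment itself is a simple ratio: divide by the index of the year of the event, then multiply by the index of the reference year. A small sketch, using three index values from the author's table and a hypothetical helper named to2011:

```r
# Index values taken from the author's table: value of one 1950 dollar in each year
index <- c("1950" = 1.000, "1990" = 5.343, "2011" = 9.287)

# Convert an amount from the dollars of `year` to 2011 dollars
# (to2011 is a hypothetical helper, not part of the original analysis)
to2011 <- function(amount, year) {
    amount / index[[as.character(year)]] * index[["2011"]]
}

to2011(25000, 1950)  # a 1950 loss is scaled by the full factor of 9.287
to2011(25000, 2011)  # a 2011 loss is unchanged
```

A 25 000 dollar loss in 1950 becomes 232 175 dollars in 2011 terms, while a 2011 loss keeps its value.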

library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
library(data.table)
## 
## Attaching package: 'data.table'
## 
## The following objects are masked from 'package:lubridate':
## 
##     hour, mday, month, quarter, wday, week, yday, year
## 
## The following objects are masked from 'package:dplyr':
## 
##     between, last
usDollar <- read.xlsx("./data/USDollar1950_2011.xlsx", sheetIndex =2, header =F)
dollar <-as.data.table(usDollar)
setnames(dollar, c("BGN_DATE","dollarUpd"))  ## renames in place, without copying the table
dollar
##     BGN_DATE dollarUpd
##  1:     1950     1.000
##  2:     1951     1.059
##  3:     1952     1.122
##  4:     1953     1.131
##  5:     1954     1.139
##  6:     1955     1.131
##  7:     1956     1.135
##  8:     1957     1.169
##  9:     1958     1.203
## 10:     1959     1.224
## 11:     1960     1.245
## 12:     1961     1.262
## 13:     1962     1.271
## 14:     1963     1.288
## 15:     1964     1.309
## 16:     1965     1.322
## 17:     1966     1.347
## 18:     1967     1.394
## 19:     1968     1.436
## 20:     1969     1.504
## 21:     1970     1.597
## 22:     1971     1.686
## 23:     1972     1.741
## 24:     1973     1.800
## 25:     1974     1.957
## 26:     1975     2.199
## 27:     1976     2.351
## 28:     1977     2.466
## 29:     1978     2.631
## 30:     1979     2.868
## 31:     1980     3.249
## 32:     1981     3.656
## 33:     1982     3.983
## 34:     1983     4.135
## 35:     1984     4.292
## 36:     1985     4.461
## 37:     1986     4.631
## 38:     1987     4.682
## 39:     1988     4.889
## 40:     1989     5.105
## 41:     1990     5.343
## 42:     1991     5.669
## 43:     1992     5.843
## 44:     1993     6.012
## 45:     1994     6.177
## 46:     1995     6.343
## 47:     1996     6.504
## 48:     1997     6.720
## 49:     1998     6.834
## 50:     1999     6.944
## 51:     2000     7.131
## 52:     2001     7.372
## 53:     2002     7.487
## 54:     2003     7.665
## 55:     2004     7.809
## 56:     2005     8.063
## 57:     2006     8.338
## 58:     2007     8.550
## 59:     2008     8.899
## 60:     2009     8.907
## 61:     2010     9.150
## 62:     2011     9.287
## 63:     2012     9.562
## 64:     2013     9.728
## 65:     2014     9.874
##     BGN_DATE dollarUpd
DT <-as.data.table(SD)
setkey(DT,BGN_DATE); setkey(dollar, BGN_DATE)
DTD <-merge(DT,dollar)
DTD[,PR1:={tmp <- (PR/dollarUpd); tmp*9.287}]  ##9.287 is the value of an 1950 US$ in 2011 terms
##         BGN_DATE       EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
##      1:     1950      tornado          0       15    25.0          K
##      2:     1950      tornado          0        0     2.5          K
##      3:     1950      tornado          1        1     2.5          K
##      4:     1950      tornado          0        0     2.5          K
##      5:     1950      tornado          0        0    25.0          K
##     ---                                                             
## 254629:     2011 winter storm          0        0     5.0          K
## 254630:     2011  strong wind          0        0     0.6          K
## 254631:     2011  strong wind          0        0     1.0          K
## 254632:     2011      drought          0        0     2.0          K
## 254633:     2011    high wind          0        0     7.5          K
##         CROPDMG CROPDMGEXP      EVTYPE1    PR CR dollarUpd      PR1
##      1:       0                 tornado 25000  0     1.000 232175.0
##      2:       0                 tornado  2500  0     1.000  23217.5
##      3:       0                 tornado  2500  0     1.000  23217.5
##      4:       0                 tornado  2500  0     1.000  23217.5
##      5:       0                 tornado 25000  0     1.000 232175.0
##     ---                                                            
## 254629:       0          K winter storm  5000  0     9.287   5000.0
## 254630:       0          K  strong wind   600  0     9.287    600.0
## 254631:       0          K  strong wind  1000  0     9.287   1000.0
## 254632:       0          K      drought  2000  0     9.287   2000.0
## 254633:       0          K    high wind  7500  0     9.287   7500.0
DTD[,CR1:={tmp <- (CR/dollarUpd); tmp*9.287}]
##         BGN_DATE       EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
##      1:     1950      tornado          0       15    25.0          K
##      2:     1950      tornado          0        0     2.5          K
##      3:     1950      tornado          1        1     2.5          K
##      4:     1950      tornado          0        0     2.5          K
##      5:     1950      tornado          0        0    25.0          K
##     ---                                                             
## 254629:     2011 winter storm          0        0     5.0          K
## 254630:     2011  strong wind          0        0     0.6          K
## 254631:     2011  strong wind          0        0     1.0          K
## 254632:     2011      drought          0        0     2.0          K
## 254633:     2011    high wind          0        0     7.5          K
##         CROPDMG CROPDMGEXP      EVTYPE1    PR CR dollarUpd      PR1 CR1
##      1:       0                 tornado 25000  0     1.000 232175.0   0
##      2:       0                 tornado  2500  0     1.000  23217.5   0
##      3:       0                 tornado  2500  0     1.000  23217.5   0
##      4:       0                 tornado  2500  0     1.000  23217.5   0
##      5:       0                 tornado 25000  0     1.000 232175.0   0
##     ---                                                                
## 254629:       0          K winter storm  5000  0     9.287   5000.0   0
## 254630:       0          K  strong wind   600  0     9.287    600.0   0
## 254631:       0          K  strong wind  1000  0     9.287   1000.0   0
## 254632:       0          K      drought  2000  0     9.287   2000.0   0
## 254633:       0          K    high wind  7500  0     9.287   7500.0   0

Now we can compute the total value lost to severe weather events over the last 60 years, in terms of property and crops, in 2011 US dollars.

ttlBll <- DTD[,list(sum(PR1)/1000000000, sum(CR1)/1000000000)]
propdmg <- sort(tapply(DTD$PR1,DTD$EVTYPE, sum), decreasing =T)/1000000000
cropdmg <- sort(tapply(DTD$CR1,DTD$EVTYPE, sum), decreasing =T)/1000000000

Finally, the total amount of property lost to severe weather events between 1950 and 2011 sums up to 579.01 billion dollars. The crops lost amount to 64.16 billion dollars. Both values are stated in 2011 dollars.

propdmg[1:10]
##             flood           tornado hurricane/typhoon       storm surge 
##        162.923216        141.564080         80.589074         49.962701 
##       flash flood              hail         hurricane      winter storm 
##         19.357078         18.676635         15.298925          9.677231 
##    tropical storm       river flood 
##          9.600441          7.900586
maxProp <- format(max(propdmg), digits = 4, big.mark=" ")
## propdmg is already in billions, so the total must be scaled to billions too
rel10Prop <- format(sum(propdmg[1:10])/(sum(DTD$PR1)/1000000000)*100, digits = 2)
trnrelProp <- format(max(propdmg)/(sum(DTD$PR1)/1000000000)*100, digits = 2)

The table above gives the top ten causes of lost property. The top ten events are responsible for 89% of all property loss. By type of event, flood is the greatest cause of property loss, adding up to 162.9 billion dollars. Flood alone corresponds to 28% of all property value lost during the last 60 years.

cropdmg[1:10]
##           drought       river flood         ice storm             flood 
##         17.657482          7.767300          7.547484          6.719910 
##              hail         hurricane hurricane/typhoon      extreme cold 
##          3.802699          3.631492          3.026714          1.781367 
##       flash flood      frost/freeze 
##          1.686380          1.165059
maxCrop <- format(max(cropdmg), digits = 4, big.mark=" ")
## cropdmg is already in billions, so the total must be scaled to billions too
rel10Crop <- format(sum(cropdmg[1:10])/(sum(DTD$CR1)/1000000000)*100, digits = 2)
trnrelCrop <- format(max(cropdmg)/(sum(DTD$CR1)/1000000000)*100, digits = 2)

The table above gives the top ten causes of lost crops. The top ten events are responsible for 85% of all crop loss. By type of event, drought is the greatest cause of crop loss, adding up to 17.66 billion dollars. Drought alone corresponds to 28% of all crop value lost during the last 60 years.
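
The shares can be cross-checked directly from the figures printed in this report, since the totals and the top-ten tables are all expressed in billions of 2011 dollars. The variable names below are illustrative only.

```r
# Totals and top-ten figures (billions of 2011 dollars) as printed in the report
totalProp <- 579.0098583
top10Prop <- c(162.923216, 141.564080, 80.589074, 49.962701, 19.357078,
               18.676635, 15.298925, 9.677231, 9.600441, 7.900586)
totalCrop <- 64.1573639
top10Crop <- c(17.657482, 7.767300, 7.547484, 6.719910, 3.802699,
               3.631492, 3.026714, 1.781367, 1.686380, 1.165059)

# Numerator and denominator share the same unit, so no further scaling is needed
propTop10 <- round(sum(top10Prop) / totalProp * 100)  # top-ten share of property loss
propMax   <- round(top10Prop[1] / totalProp * 100)    # share of the largest cause (flood)
cropTop10 <- round(sum(top10Crop) / totalCrop * 100)  # top-ten share of crop loss
cropMax   <- round(top10Crop[1] / totalCrop * 100)    # share of the largest cause (drought)
c(propTop10, propMax, cropTop10, cropMax)
```

This gives 89% and 28% for property, and 85% and 28% for crops.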

RESULTS

This assignment took me a lot of time, and I’m afraid it didn’t turn out as neat as I would have liked, but I have no more time…

So, strictly speaking, I do have a results section, but I have been giving the main results of this analysis as I went through it.

In this last part, I would like to present a plot showing something interesting.