The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property or crop loss (in US dollars).
This analysis builds on this dataset to identify the most severe weather events in terms of human and material losses, and to quantify those losses by type of event, so that funds can be properly assigned to tackle them.
In the last 60 years, 15 145 people died in extreme weather events across the USA, while 140 528 people sustained some kind of injury. Tornadoes are the principal cause of death (and injuries) from severe weather events.
The total value of property lost to severe weather events between 1950 and 2011 adds up to 579 billion dollars (2011 values). Crop losses exceed 64 billion dollars for the same period. Drought is the main cause of crop losses, while floods, tornadoes and hurricanes are the top three causes of property loss.
The objective of the following analysis is to help prioritize resources to tackle different types of weather events, based on their relative impact on wealth destruction and loss of life.
We propose to answer the two following questions:
The data for this analysis comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
The following code chunk loads the storm database into an R data frame and gives us a glimpse of the data structure (I previously downloaded the data from here and placed it in the data directory).
sd <- read.csv(bzfile("./data/data.csv.bz2"))
str(sd)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
a <- dim(sd)
This database holds 902297 observations of weather events in the States, across 37 variables.
To answer our questions properly, we only need eight of those variables, and only those observations that involved loss of life, injuries, or property/crop damage.
library(dplyr)
SD <- tbl_df(sd)
rm(sd)
SD <- SD %>% select(BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,CROPDMG, CROPDMGEXP) %>% filter(FATALITIES>0 | INJURIES>0 | PROPDMG >0 | CROPDMG >0)
dim2 <- dim(SD)
We are left with only 254633 observations of the initial 902297. The following provides an overview of these eight variables and the reasoning for including them in later analyses.
The first variable we will introduce is EVTYPE: it is an event name designator and the cornerstone variable for all subsequent analyses. According to the Storm Data documentation, there are 48 different weather events considered in this database.
However, if we look at the values this variable takes, we find many more possibilities.
SD$EVTYPE <- factor(SD$EVTYPE)
sort(unique(SD$EVTYPE))
## [1] HIGH SURF ADVISORY FLASH FLOOD
## [3] TSTM WIND TSTM WIND (G45)
## [5] ? AGRICULTURAL FREEZE
## [7] APACHE COUNTY ASTRONOMICAL HIGH TIDE
## [9] ASTRONOMICAL LOW TIDE AVALANCE
## [11] AVALANCHE Beach Erosion
## [13] BLACK ICE BLIZZARD
## [15] BLIZZARD/WINTER STORM BLOWING DUST
## [17] blowing snow BLOWING SNOW
## [19] BREAKUP FLOODING BRUSH FIRE
## [21] COASTAL FLOODING/EROSION COASTAL EROSION
## [23] Coastal Flood COASTAL FLOOD
## [25] Coastal Flooding COASTAL FLOODING
## [27] COASTAL FLOODING/EROSION Coastal Storm
## [29] COASTAL STORM COASTAL SURGE
## [31] COASTALSTORM Cold
## [33] COLD COLD AIR TORNADO
## [35] COLD AND SNOW COLD AND WET CONDITIONS
## [37] Cold Temperature COLD WAVE
## [39] COLD WEATHER COLD/WIND CHILL
## [41] COLD/WINDS COOL AND WET
## [43] DAM BREAK Damaging Freeze
## [45] DAMAGING FREEZE DENSE FOG
## [47] DENSE SMOKE DOWNBURST
## [49] DROUGHT DROUGHT/EXCESSIVE HEAT
## [51] DROWNING DRY MICROBURST
## [53] DRY MIRCOBURST WINDS Dust Devil
## [55] DUST DEVIL DUST DEVIL WATERSPOUT
## [57] DUST STORM DUST STORM/HIGH WINDS
## [59] Early Frost Erosion/Cstl Flood
## [61] EXCESSIVE HEAT EXCESSIVE RAINFALL
## [63] EXCESSIVE SNOW EXCESSIVE WETNESS
## [65] Extended Cold Extreme Cold
## [67] EXTREME COLD EXTREME COLD/WIND CHILL
## [69] EXTREME HEAT EXTREME WIND CHILL
## [71] EXTREME WINDCHILL FALLING SNOW/ICE
## [73] FLASH FLOOD FLASH FLOOD - HEAVY RAIN
## [75] FLASH FLOOD FROM ICE JAMS FLASH FLOOD LANDSLIDES
## [77] FLASH FLOOD WINDS FLASH FLOOD/
## [79] FLASH FLOOD/ STREET FLASH FLOOD/FLOOD
## [81] FLASH FLOOD/LANDSLIDE FLASH FLOODING
## [83] FLASH FLOODING/FLOOD FLASH FLOODING/THUNDERSTORM WI
## [85] FLASH FLOODS FLOOD
## [87] FLOOD & HEAVY RAIN FLOOD FLASH
## [89] FLOOD/FLASH FLOOD/FLASH FLOOD
## [91] FLOOD/FLASH/FLOOD FLOOD/FLASHFLOOD
## [93] FLOOD/RAIN/WINDS FLOOD/RIVER FLOOD
## [95] FLOODING FLOODING/HEAVY RAIN
## [97] FLOODS FOG
## [99] FOG AND COLD TEMPERATURES FOREST FIRES
## [101] Freeze FREEZE
## [103] Freezing drizzle Freezing Drizzle
## [105] FREEZING DRIZZLE FREEZING FOG
## [107] Freezing Rain FREEZING RAIN
## [109] FREEZING RAIN/SLEET FREEZING RAIN/SNOW
## [111] Freezing Spray FROST
## [113] Frost/Freeze FROST/FREEZE
## [115] FROST\\FREEZE FUNNEL CLOUD
## [117] Glaze GLAZE
## [119] GLAZE ICE GLAZE/ICE STORM
## [121] gradient wind Gradient wind
## [123] GRADIENT WIND GRASS FIRES
## [125] GROUND BLIZZARD GUSTNADO
## [127] GUSTY WIND GUSTY WIND/HAIL
## [129] GUSTY WIND/HVY RAIN Gusty wind/rain
## [131] Gusty winds Gusty Winds
## [133] GUSTY WINDS HAIL
## [135] HAIL 0.75 HAIL 075
## [137] HAIL 100 HAIL 125
## [139] HAIL 150 HAIL 175
## [141] HAIL 200 HAIL 275
## [143] HAIL 450 HAIL 75
## [145] HAIL DAMAGE HAIL/WIND
## [147] HAIL/WINDS HAILSTORM
## [149] HARD FREEZE HAZARDOUS SURF
## [151] HEAT Heat Wave
## [153] HEAT WAVE HEAT WAVE DROUGHT
## [155] HEAT WAVES HEAVY LAKE SNOW
## [157] HEAVY MIX HEAVY PRECIPITATION
## [159] HEAVY RAIN HEAVY RAIN AND FLOOD
## [161] Heavy Rain/High Surf HEAVY RAIN/LIGHTNING
## [163] HEAVY RAIN/SEVERE WEATHER HEAVY RAIN/SMALL STREAM URBAN
## [165] HEAVY RAIN/SNOW HEAVY RAINS
## [167] HEAVY RAINS/FLOODING HEAVY SEAS
## [169] HEAVY SHOWER HEAVY SNOW
## [171] HEAVY SNOW-SQUALLS HEAVY SNOW AND HIGH WINDS
## [173] HEAVY SNOW AND STRONG WINDS Heavy snow shower
## [175] HEAVY SNOW SQUALLS HEAVY SNOW/BLIZZARD
## [177] HEAVY SNOW/BLIZZARD/AVALANCHE HEAVY SNOW/FREEZING RAIN
## [179] HEAVY SNOW/HIGH WINDS & FLOOD HEAVY SNOW/ICE
## [181] HEAVY SNOW/SQUALLS HEAVY SNOW/WIND
## [183] HEAVY SNOW/WINTER STORM HEAVY SNOWPACK
## [185] Heavy Surf HEAVY SURF
## [187] Heavy surf and wind HEAVY SURF COASTAL FLOODING
## [189] HEAVY SURF/HIGH SURF HEAVY SWELLS
## [191] HIGH HIGH WINDS
## [193] HIGH SEAS High Surf
## [195] HIGH SURF HIGH SWELLS
## [197] HIGH TIDES HIGH WATER
## [199] HIGH WAVES HIGH WIND
## [201] HIGH WIND (G40) HIGH WIND 48
## [203] HIGH WIND AND SEAS HIGH WIND DAMAGE
## [205] HIGH WIND/BLIZZARD HIGH WIND/HEAVY SNOW
## [207] HIGH WIND/SEAS HIGH WINDS
## [209] HIGH WINDS HEAVY RAINS HIGH WINDS/
## [211] HIGH WINDS/COASTAL FLOOD HIGH WINDS/COLD
## [213] HIGH WINDS/HEAVY RAIN HIGH WINDS/SNOW
## [215] HURRICANE HURRICANE-GENERATED SWELLS
## [217] Hurricane Edouard HURRICANE EMILY
## [219] HURRICANE ERIN HURRICANE FELIX
## [221] HURRICANE GORDON HURRICANE OPAL
## [223] HURRICANE OPAL/HIGH WINDS HURRICANE/TYPHOON
## [225] HVY RAIN HYPERTHERMIA/EXPOSURE
## [227] HYPOTHERMIA Hypothermia/Exposure
## [229] HYPOTHERMIA/EXPOSURE ICE
## [231] ICE AND SNOW ICE FLOES
## [233] ICE JAM Ice jam flood (minor
## [235] ICE JAM FLOODING ICE ON ROAD
## [237] ICE ROADS ICE STORM
## [239] ICE STORM/FLASH FLOOD ICE/STRONG WINDS
## [241] ICY ROADS LAKE-EFFECT SNOW
## [243] Lake Effect Snow LAKE EFFECT SNOW
## [245] LAKE FLOOD LAKESHORE FLOOD
## [247] LANDSLIDE LANDSLIDES
## [249] Landslump LANDSPOUT
## [251] LATE SEASON SNOW LIGHT FREEZING RAIN
## [253] Light snow Light Snow
## [255] LIGHT SNOW Light Snowfall
## [257] LIGHTING LIGHTNING
## [259] LIGHTNING WAUSEON LIGHTNING AND HEAVY RAIN
## [261] LIGHTNING AND THUNDERSTORM WIN LIGHTNING FIRE
## [263] LIGHTNING INJURY LIGHTNING THUNDERSTORM WINDS
## [265] LIGHTNING. LIGHTNING/HEAVY RAIN
## [267] LIGNTNING LOW TEMPERATURE
## [269] MAJOR FLOOD Marine Accident
## [271] MARINE HAIL MARINE HIGH WIND
## [273] MARINE MISHAP MARINE STRONG WIND
## [275] MARINE THUNDERSTORM WIND MARINE TSTM WIND
## [277] Microburst MICROBURST
## [279] MICROBURST WINDS MINOR FLOODING
## [281] MIXED PRECIP Mixed Precipitation
## [283] MIXED PRECIPITATION MUD SLIDE
## [285] MUD SLIDES MUD SLIDES URBAN FLOODING
## [287] Mudslide MUDSLIDE
## [289] Mudslides MUDSLIDES
## [291] NON-SEVERE WIND DAMAGE NON-TSTM WIND
## [293] NON TSTM WIND Other
## [295] OTHER RAIN
## [297] RAIN/SNOW RAIN/WIND
## [299] RAINSTORM RAPIDLY RISING WATER
## [301] RECORD COLD RECORD HEAT
## [303] RECORD RAINFALL RECORD SNOW
## [305] RECORD/EXCESSIVE HEAT RIP CURRENT
## [307] RIP CURRENTS RIP CURRENTS/HEAVY SURF
## [309] RIVER AND STREAM FLOOD RIVER FLOOD
## [311] River Flooding RIVER FLOODING
## [313] ROCK SLIDE ROGUE WAVE
## [315] ROUGH SEAS ROUGH SURF
## [317] RURAL FLOOD SEICHE
## [319] SEVERE THUNDERSTORM SEVERE THUNDERSTORM WINDS
## [321] SEVERE THUNDERSTORMS SEVERE TURBULENCE
## [323] SLEET SLEET/ICE STORM
## [325] SMALL HAIL SMALL STREAM FLOOD
## [327] Snow SNOW
## [329] SNOW ACCUMULATION SNOW AND HEAVY SNOW
## [331] SNOW AND ICE SNOW AND ICE STORM
## [333] SNOW FREEZING RAIN SNOW SQUALL
## [335] Snow Squalls SNOW SQUALLS
## [337] SNOW/ BITTER COLD SNOW/ ICE
## [339] SNOW/BLOWING SNOW SNOW/COLD
## [341] SNOW/FREEZING RAIN SNOW/HEAVY SNOW
## [343] SNOW/HIGH WINDS SNOW/ICE
## [345] SNOW/ICE STORM SNOW/SLEET
## [347] SNOW/SLEET/FREEZING RAIN SNOWMELT FLOODING
## [349] STORM FORCE WINDS STORM SURGE
## [351] STORM SURGE/TIDE Strong Wind
## [353] STRONG WIND Strong Winds
## [355] STRONG WINDS THUDERSTORM WINDS
## [357] THUNDEERSTORM WINDS THUNDERESTORM WINDS
## [359] THUNDERSNOW THUNDERSTORM
## [361] THUNDERSTORM WINDS THUNDERSTORM DAMAGE TO
## [363] THUNDERSTORM HAIL THUNDERSTORM WIND
## [365] THUNDERSTORM WIND (G40) THUNDERSTORM WIND 60 MPH
## [367] THUNDERSTORM WIND 65 MPH THUNDERSTORM WIND 65MPH
## [369] THUNDERSTORM WIND 98 MPH THUNDERSTORM WIND G50
## [371] THUNDERSTORM WIND G52 THUNDERSTORM WIND G55
## [373] THUNDERSTORM WIND G60 THUNDERSTORM WIND TREES
## [375] THUNDERSTORM WIND. THUNDERSTORM WIND/ TREE
## [377] THUNDERSTORM WIND/ TREES THUNDERSTORM WIND/AWNING
## [379] THUNDERSTORM WIND/HAIL THUNDERSTORM WIND/LIGHTNING
## [381] THUNDERSTORM WINDS THUNDERSTORM WINDS 13
## [383] THUNDERSTORM WINDS 63 MPH THUNDERSTORM WINDS AND
## [385] THUNDERSTORM WINDS G60 THUNDERSTORM WINDS HAIL
## [387] THUNDERSTORM WINDS LIGHTNING THUNDERSTORM WINDS.
## [389] THUNDERSTORM WINDS/ FLOOD THUNDERSTORM WINDS/FLOODING
## [391] THUNDERSTORM WINDS/FUNNEL CLOU THUNDERSTORM WINDS/HAIL
## [393] THUNDERSTORM WINDS53 THUNDERSTORM WINDSHAIL
## [395] THUNDERSTORM WINDSS THUNDERSTORM WINS
## [397] THUNDERSTORMS THUNDERSTORMS WIND
## [399] THUNDERSTORMS WINDS THUNDERSTORMW
## [401] THUNDERSTORMWINDS THUNDERSTROM WIND
## [403] THUNDERTORM WINDS THUNERSTORM WINDS
## [405] Tidal Flooding TIDAL FLOODING
## [407] TORNADO TORNADO F0
## [409] TORNADO F1 TORNADO F2
## [411] TORNADO F3 TORNADOES
## [413] TORNADOES, TSTM WIND, HAIL TORNDAO
## [415] Torrential Rainfall TROPICAL DEPRESSION
## [417] TROPICAL STORM TROPICAL STORM ALBERTO
## [419] TROPICAL STORM DEAN TROPICAL STORM GORDON
## [421] TROPICAL STORM JERRY Tstm Wind
## [423] TSTM WIND TSTM WIND (G45)
## [425] TSTM WIND (41) TSTM WIND (G35)
## [427] TSTM WIND (G40) TSTM WIND (G45)
## [429] TSTM WIND 40 TSTM WIND 45
## [431] TSTM WIND 55 TSTM WIND 65)
## [433] TSTM WIND AND LIGHTNING TSTM WIND DAMAGE
## [435] TSTM WIND G45 TSTM WIND G58
## [437] TSTM WIND/HAIL TSTM WINDS
## [439] TSTMW TSUNAMI
## [441] TUNDERSTORM WIND TYPHOON
## [443] Unseasonable Cold UNSEASONABLY COLD
## [445] UNSEASONABLY WARM UNSEASONABLY WARM AND DRY
## [447] UNSEASONAL RAIN URBAN AND SMALL
## [449] URBAN AND SMALL STREAM FLOODIN URBAN FLOOD
## [451] URBAN FLOODING URBAN FLOODS
## [453] URBAN SMALL URBAN/SMALL STREAM
## [455] URBAN/SMALL STREAM FLOOD URBAN/SML STREAM FLD
## [457] VOLCANIC ASH WARM WEATHER
## [459] WATERSPOUT WATERSPOUT-
## [461] WATERSPOUT-TORNADO WATERSPOUT TORNADO
## [463] WATERSPOUT/ TORNADO WATERSPOUT/TORNADO
## [465] WET MICROBURST Whirlwind
## [467] WHIRLWIND WILD FIRES
## [469] WILD/FOREST FIRE WILD/FOREST FIRES
## [471] WILDFIRE WILDFIRES
## [473] Wind WIND
## [475] WIND AND WAVE Wind Damage
## [477] WIND DAMAGE WIND STORM
## [479] WIND/HAIL WINDS
## [481] WINTER STORM WINTER STORM HIGH WINDS
## [483] WINTER STORMS WINTER WEATHER
## [485] WINTER WEATHER MIX WINTER WEATHER/MIX
## [487] Wintry Mix WINTRY MIX
## 488 Levels: HIGH SURF ADVISORY FLASH FLOOD ... WINTRY MIX
lev1 <- nlevels(SD$EVTYPE)
In fact, the new database contains 488 different values for this variable, as listed above.
Information appearing in Storm Data is provided by sources outside the National Weather Service (NWS), such as the media, law enforcement and/or other government agencies, private companies and individuals, so there are many different ways of writing the same value (high wind, HIGH WIND or high winds), as well as typos.
The following code chunks try to fix the majority of these issues in a new variable, EVTYPE1.
SD$EVTYPE <- tolower(SD$EVTYPE) ## all lower case
SD$EVTYPE1 <- SD$EVTYPE
## each step reads EVTYPE1 (not EVTYPE), so earlier fixes are not discarded
SD$EVTYPE1 <- sub("winds", "wind", SD$EVTYPE1)
SD$EVTYPE1 <- sub("storms", "storm", SD$EVTYPE1)
SD$EVTYPE1 <- sub("rains", "rain", SD$EVTYPE1)
SD$EVTYPE1 <- sub("\\bseas\\b", "surf", SD$EVTYPE1) ## whole word only (avoids "season")
SD$EVTYPE1 <- sub("torndao", "tornado", SD$EVTYPE1)
SD$EVTYPE1 <- sub("\\bsea\\b", "surf", SD$EVTYPE1) ## whole word only
SD$EVTYPE1 <- sub("floods", "flood", SD$EVTYPE1)
SD$EVTYPE1 <- sub("flooding", "flood", SD$EVTYPE1)
SD$EVTYPE1 <- gsub(" +", " ", SD$EVTYPE1) ## collapse repeated spaces
SD$EVTYPE1 <- sub(".", "", SD$EVTYPE1, fixed = TRUE) ## literal dot, not the regex wildcard
SD$EVTYPE1 <- sub("lake flood", "lakeshore flood", SD$EVTYPE1)
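The clean-up steps above can also be sketched as one small helper that applies a list of literal replacements in sequence to the same vector. This is only an illustration: `normalize_labels`, `fixes` and the sample labels in `raw` are hypothetical names and values, not part of the original analysis.

```r
## Hypothetical helper: applies literal replacements one after another
## to the same vector, so every fix builds on the previous one.
normalize_labels <- function(x, replacements) {
  x <- tolower(trimws(x))
  x <- gsub(" +", " ", x)  ## collapse runs of spaces
  for (from in names(replacements)) {
    ## fixed = TRUE treats `from` as plain text, so "." is a literal dot
    x <- gsub(from, replacements[[from]], x, fixed = TRUE)
  }
  x
}

## Illustrative replacement table and sample labels (assumptions)
fixes <- c("winds" = "wind", "floods" = "flood", "flooding" = "flood",
           "torndao" = "tornado", "." = "")
raw <- c("HIGH WINDS", " Flash  Floods", "TORNDAO", "Lightning.", "River Flooding")
cleaned <- normalize_labels(raw, fixes)
print(cleaned)
```

Keeping the replacements in a named vector makes the order of the fixes explicit and easy to extend.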
snow <- unique(grep("*snow*", SD$EVTYPE, value=T))
The documentation shows two possible snow-related values, heavy snow and lake-effect snow, but in reality there are all these different entries for snow-related events:
heavy snow, heavy snow/wind, heavy snowpack, snow, freezing rain/snow, thundersnow, heavy rain/snow, snow and heavy snow, snow/heavy snow, snow and ice, snow and ice storm, heavy lake snow, heavy snow/freezing rain, heavy snow/winter storm, heavy snow and high winds, heavy snow/high winds & flood, snow/cold, heavy snow squalls, snow squall, snow/ice storm, heavy snow/squalls, heavy snow-squalls, snow freezing rain, snow/sleet, snow/freezing rain, snow squalls, snow/sleet/freezing rain, record snow, blowing snow, heavy snow/blizzard, ice and snow, heavy snow/ice, high wind/heavy snow, snow/ice, heavy snow/blizzard/avalanche, snow/ bitter cold, snow/high winds, high winds/snow, snowmelt flooding, heavy snow and strong winds, snow accumulation, snow/ ice, snow/blowing snow, heavy snow shower, light snow, light snowfall, cold and snow, rain/snow, lake effect snow, excessive snow, late season snow, falling snow/ice, lake-effect snow
Let’s try to fix this (to the best of our knowledge)…
lake <- grep("lake", sort(unique(grep("snow", SD$EVTYPE, value = TRUE))), value = TRUE)
## general category first, so the lake-effect entries are not overwritten afterwards
SD$EVTYPE1[grepl("snow", SD$EVTYPE)] <- "heavy snow"
SD$EVTYPE1[SD$EVTYPE %in% lake] <- "lake-effect snow"
The same applies to floods. The documentation refers to 4 types of floods: coastal flood, flash flood, flood, and lakeshore flood.
sort(unique(grep("*flood*", SD$EVTYPE, value=T)))
## [1] " flash flood" "breakup flooding"
## [3] "coastal flooding/erosion" "coastal flood"
## [5] "coastal flooding" "coastal flooding/erosion"
## [7] "erosion/cstl flood" "flash flood"
## [9] "flash flood - heavy rain" "flash flood from ice jams"
## [11] "flash flood landslides" "flash flood winds"
## [13] "flash flood/" "flash flood/ street"
## [15] "flash flood/flood" "flash flood/landslide"
## [17] "flash flooding" "flash flooding/flood"
## [19] "flash flooding/thunderstorm wi" "flash floods"
## [21] "flood" "flood & heavy rain"
## [23] "flood flash" "flood/flash"
## [25] "flood/flash flood" "flood/flash/flood"
## [27] "flood/flashflood" "flood/rain/winds"
## [29] "flood/river flood" "flooding"
## [31] "flooding/heavy rain" "floods"
## [33] "heavy rain and flood" "heavy rains/flooding"
## [35] "heavy snow/high winds & flood" "heavy surf coastal flooding"
## [37] "high winds/coastal flood" "ice jam flood (minor"
## [39] "ice jam flooding" "ice storm/flash flood"
## [41] "lake flood" "lakeshore flood"
## [43] "major flood" "minor flooding"
## [45] "mud slides urban flooding" "river and stream flood"
## [47] "river flood" "river flooding"
## [49] "rural flood" "small stream flood"
## [51] "snowmelt flooding" "thunderstorm winds/ flood"
## [53] "thunderstorm winds/flooding" "tidal flooding"
## [55] "urban and small stream floodin" "urban flood"
## [57] "urban flooding" "urban floods"
## [59] "urban/small stream flood"
As far as I see it, floods can be a harmful consequence of heavy rain, so all entries such as flood/rain will be set to flood. Coas
## general category first: the more specific flood types below overwrite it
SD$EVTYPE1[grepl("flood", SD$EVTYPE)] <- "flood"
SD$EVTYPE1[grepl("coastal", SD$EVTYPE)] <- "Coastal Flood"
SD$EVTYPE1[grepl("cstl", SD$EVTYPE)] <- "Coastal Flood"
SD$EVTYPE1[grepl("tidal", SD$EVTYPE)] <- "Coastal Flood"
SD$EVTYPE1[grepl("flash", SD$EVTYPE)] <- "Flash Flood"
SD$EVTYPE1[grepl("lakeshore", SD$EVTYPE)] <- "Lakeshore Flood"
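With this kind of pattern-based recoding the order of the assignments matters, because a label such as "flash flood" also contains "flood". A minimal sketch of an ordered recode, in which later (more specific) rules overwrite earlier (more general) ones, could look like this. The helper `recode_by_pattern`, the rule table and the sample events are hypothetical and only serve as an illustration.

```r
## Hypothetical helper: rules are applied in order, so a later rule
## overwrites the result of an earlier one for the rows it matches.
recode_by_pattern <- function(x, rules) {
  out <- x
  for (pattern in names(rules)) {
    out[grepl(pattern, x, fixed = TRUE)] <- rules[[pattern]]
  }
  out
}

## General rule first, specific rules afterwards (illustrative values)
flood_rules <- c("flood"     = "flood",
                 "coastal"   = "coastal flood",
                 "flash"     = "flash flood",
                 "lakeshore" = "lakeshore flood")
events <- c("urban flooding", "coastal flooding/erosion",
            "flash flood/flood", "lakeshore flood", "tornado")
recoded <- recode_by_pattern(events, flood_rules)
print(recoded)
```

Entries that match no rule, such as "tornado" here, are left unchanged.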
There is also a lot of unnecessary detail for hurricanes, which have just one entry in the documentation, Hurricane (Typhoon), and the same is true of thunderstorms and many others…
sort(unique(grep("*hurricane*", SD$EVTYPE, value=T)))
## [1] "hurricane" "hurricane-generated swells"
## [3] "hurricane edouard" "hurricane emily"
## [5] "hurricane erin" "hurricane felix"
## [7] "hurricane gordon" "hurricane opal"
## [9] "hurricane opal/high winds" "hurricane/typhoon"
SD[grepl("*hurricane*",SD$EVTYPE),]$EVTYPE1 <- "Hurricane (Typhoon)"
SD[grepl("*high wind*",SD$EVTYPE),]$EVTYPE1 <- "high wind"
SD[grepl("*thunderstorm*",SD$EVTYPE),]$EVTYPE1 <- "thunderstorm"
SD[grepl("*thuderstorm*",SD$EVTYPE),]$EVTYPE1 <- "thunderstorm"
SD[grepl("*thunderestorm*",SD$EVTYPE),]$EVTYPE1 <- "thunderstorm"
SD[grepl("*thunerstorm*",SD$EVTYPE),]$EVTYPE1 <- "thunderstorm"
SD[grepl("*thundeerstorm*",SD$EVTYPE),]$EVTYPE1 <- "thunderstorm"
SD[grepl("*thundertorm*",SD$EVTYPE),]$EVTYPE1 <- "thunderstorm"
SD[grepl("*current*",SD$EVTYPE),]$EVTYPE1 <- "rip currents"
SD[grepl("*tstm*",SD$EVTYPE),]$EVTYPE1 <- "thunderstorm"
SD[grepl("*tropical storm*",SD$EVTYPE),]$EVTYPE1 <- "tropical storm"
SD[grepl("*lighting*",SD$EVTYPE),]$EVTYPE1 <- "lightning"
SD[grepl("*lightning*",SD$EVTYPE),]$EVTYPE1 <- "lightning"
SD[grepl("*tornado*",SD$EVTYPE),]$EVTYPE1 <- "tornado"
SD[grepl("*hail*",SD$EVTYPE),]$EVTYPE1 <- "Hail"
We can also probably skip some of them, such as those containing the word “summary” (not sure of what they stand for, but they hold nothing relevant for our analysis).
library(dplyr)
SDsumm <- filter(SD, grepl("summary", EVTYPE)) ## EVTYPE is lower case at this point
SDsummD <- dim(SDsumm)
summarise(SDsumm, fatalities = sum(FATALITIES), injuries = sum(INJURIES),
property = sum(PROPDMG), crops = sum(CROPDMG))
## Source: local data frame [1 x 4]
##
## fatalities injuries property crops
## 1 0 0 0 0
SD <- filter(SD, !grepl("summary", EVTYPE)) ## removes "summary" entries from EVTYPE
This removes 0 observations (the data set still has 9 columns): no “summary” entries remain after the earlier filter, which confirms that they hold no data regarding the variables of interest to us.
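Because EVTYPE was converted to lower case earlier, the case of the search pattern matters when looking for these entries. The short sketch below illustrates this with made-up labels (they are assumptions, not values taken from the database).

```r
## Made-up labels, lower-cased as in the analysis
labels <- tolower(c("Summary of March 14", "TORNADO", "Summary August 10"))

n_capital <- sum(grepl("Summary", labels))                      ## capital S: no match
n_lower   <- sum(grepl("summary", labels))                      ## lower case: matches
n_nocase  <- sum(grepl("Summary", labels, ignore.case = TRUE))  ## case-insensitive

print(c(n_capital, n_lower, n_nocase))
```

Using `ignore.case = TRUE` avoids depending on whether the labels have already been lower-cased.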
Unfortunately, I don’t have any more time to further improve this data, so let’s go on.
BGN_DATE gives us the event date. It is important to know the year in order to express all monetary values with the same time reference, i.e., in dollars of the same year. This could be accomplished by using the USA’s CPI (consumer price index) according to this formulation, but I’m using a quicker method (see below).
class(SD$BGN_DATE)
## [1] "factor"
SD$BGN_DATE[1]
## [1] 4/18/1950 0:00:00
## 16335 Levels: 1/1/1966 0:00:00 1/1/1972 0:00:00 ... 9/9/2011 0:00:00
First we need to process this variable. We are interested in getting the year only. The following code chunk achieves just that.
library(lubridate)
SD$BGN_DATE <- mdy_hms(SD$BGN_DATE)
SD$BGN_DATE <-year(SD$BGN_DATE)
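As an alternative sketch, the year can also be extracted with base R only. The two date strings below are taken from the output shown earlier; `as.Date` ignores the trailing time part once the format has been matched.

```r
## Two date strings in the format used by BGN_DATE
raw_dates <- c("4/18/1950 0:00:00", "9/9/2011 0:00:00")

## Parse month/day/year; the trailing " 0:00:00" is ignored by as.Date
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")
years <- as.integer(format(parsed, "%Y"))
print(years)
```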
The FATALITIES and INJURIES variables give us the human losses inflicted by extreme weather events in the USA from 1950 to 2011, in terms of deaths and injuries.
The two variable pairs PROPDMG/PROPDMGEXP and CROPDMG/CROPDMGEXP together provide the dollar value of property damage and crop damage caused by extreme weather events. According to the documentation:
The same applies to the crop data. In order to get the total value of property and crops lost to severe weather events in the States during the last 60 years, we must combine each pair of variables into a new variable: PR for property and CR for crops.
SD <- mutate(SD, PR = 0, CR = 0) ## assign the result so the new columns are kept
SD
## Source: local data frame [254,633 x 11]
##
## BGN_DATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 1950 tornado 0 15 25.0 K 0
## 2 1950 tornado 0 0 2.5 K 0
## 3 1951 tornado 0 2 25.0 K 0
## 4 1951 tornado 0 2 2.5 K 0
## 5 1951 tornado 0 2 2.5 K 0
## 6 1951 tornado 0 6 2.5 K 0
## 7 1951 tornado 0 1 2.5 K 0
## 8 1952 tornado 0 0 2.5 K 0
## 9 1952 tornado 1 14 25.0 K 0
## 10 1952 tornado 0 0 25.0 K 0
## .. ... ... ... ... ... ... ...
## Variables not shown: CROPDMGEXP (fctr), EVTYPE1 (chr), PR (dbl), CR (dbl)
for(i in 1:length(SD$PROPDMG)){
if(SD$PROPDMGEXP[i] == "K") {SD$PR[i] <- SD$PROPDMG[i]*1000}
else if(SD$PROPDMGEXP[i] == "M") {SD$PR[i] <- SD$PROPDMG[i]*1000000}
else if(SD$PROPDMGEXP[i] == "B") {SD$PR[i] <- SD$PROPDMG[i]*1000000000}
else {SD$PR[i] <- 0}
}
for(i in 1:length(SD$CROPDMG)){
if(SD$CROPDMGEXP[i] == "K") {SD$CR[i] <- SD$CROPDMG[i]*1000}
else if(SD$CROPDMGEXP[i] == "M") {SD$CR[i] <- SD$CROPDMG[i]*1000000}
else if(SD$CROPDMGEXP[i] == "B") {SD$CR[i] <- SD$CROPDMG[i]*1000000000}
else {SD$CR[i] <- 0}
}
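The same conversion can be sketched without an explicit loop by looking up the multiplier for each exponent code. The names `multiplier` and `to_dollars` are hypothetical. As in the loops above, only the codes K, M and B are converted and every other code yields 0.

```r
## Multipliers for the three exponent codes handled in the loops above
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

## Hypothetical helper: vectorised version of the if/else chain
to_dollars <- function(amount, exponent) {
  m <- multiplier[as.character(exponent)]  ## NA for unknown or empty codes
  m[is.na(m)] <- 0                         ## same as the final else branch
  unname(amount * m)
}

values <- to_dollars(c(25, 2.5, 1.2, 3), c("K", "M", "B", ""))
print(values)
```

On the full data set this could be used as `SD$PR <- to_dollars(SD$PROPDMG, SD$PROPDMGEXP)`, which avoids iterating over every row.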
Finally, we must understand that a dollar lost in 1950 is not worth the same as a dollar lost today, so we must express all values in dollars of the same year. For this we will use a dataset that gives the value of a 1950 dollar in every other year. For instance, what cost one dollar in 1950 would cost about 5.3 dollars in 1990 and about 9.3 dollars in 2011.
library(xlsx)
library(data.table)
usDollar <- read.xlsx("./data/USDollar1950_2011.xlsx", sheetIndex =2, header =F)
dollar <-as.data.table(usDollar)
names(dollar) <- c("BGN_DATE","dollarUpd")
dollar
## BGN_DATE dollarUpd
## 1: 1950 1.000
## 2: 1951 1.059
## 3: 1952 1.122
## 4: 1953 1.131
## 5: 1954 1.139
## 6: 1955 1.131
## 7: 1956 1.135
## 8: 1957 1.169
## 9: 1958 1.203
## 10: 1959 1.224
## 11: 1960 1.245
## 12: 1961 1.262
## 13: 1962 1.271
## 14: 1963 1.288
## 15: 1964 1.309
## 16: 1965 1.322
## 17: 1966 1.347
## 18: 1967 1.394
## 19: 1968 1.436
## 20: 1969 1.504
## 21: 1970 1.597
## 22: 1971 1.686
## 23: 1972 1.741
## 24: 1973 1.800
## 25: 1974 1.957
## 26: 1975 2.199
## 27: 1976 2.351
## 28: 1977 2.466
## 29: 1978 2.631
## 30: 1979 2.868
## 31: 1980 3.249
## 32: 1981 3.656
## 33: 1982 3.983
## 34: 1983 4.135
## 35: 1984 4.292
## 36: 1985 4.461
## 37: 1986 4.631
## 38: 1987 4.682
## 39: 1988 4.889
## 40: 1989 5.105
## 41: 1990 5.343
## 42: 1991 5.669
## 43: 1992 5.843
## 44: 1993 6.012
## 45: 1994 6.177
## 46: 1995 6.343
## 47: 1996 6.504
## 48: 1997 6.720
## 49: 1998 6.834
## 50: 1999 6.944
## 51: 2000 7.131
## 52: 2001 7.372
## 53: 2002 7.487
## 54: 2003 7.665
## 55: 2004 7.809
## 56: 2005 8.063
## 57: 2006 8.338
## 58: 2007 8.550
## 59: 2008 8.899
## 60: 2009 8.907
## 61: 2010 9.150
## 62: 2011 9.287
## 63: 2012 9.562
## 64: 2013 9.728
## 65: 2014 9.874
## BGN_DATE dollarUpd
DT <-as.data.table(SD)
setkey(DT,BGN_DATE); setkey(dollar, BGN_DATE)
DTD <-merge(DT,dollar)
DTD[,PR1:={tmp <- (PR/dollarUpd); tmp*9.287}] ##9.287 is the value of an 1950 US$ in 2011 terms
## BGN_DATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1: 1950 tornado 0 15 25.0 K
## 2: 1950 tornado 0 0 2.5 K
## 3: 1950 tornado 1 1 2.5 K
## 4: 1950 tornado 0 0 2.5 K
## 5: 1950 tornado 0 0 25.0 K
## ---
## 254629: 2011 winter storm 0 0 5.0 K
## 254630: 2011 strong wind 0 0 0.6 K
## 254631: 2011 strong wind 0 0 1.0 K
## 254632: 2011 drought 0 0 2.0 K
## 254633: 2011 high wind 0 0 7.5 K
## CROPDMG CROPDMGEXP EVTYPE1 PR CR dollarUpd PR1
## 1: 0 tornado 25000 0 1.000 232175.0
## 2: 0 tornado 2500 0 1.000 23217.5
## 3: 0 tornado 2500 0 1.000 23217.5
## 4: 0 tornado 2500 0 1.000 23217.5
## 5: 0 tornado 25000 0 1.000 232175.0
## ---
## 254629: 0 K winter storm 5000 0 9.287 5000.0
## 254630: 0 K strong wind 600 0 9.287 600.0
## 254631: 0 K strong wind 1000 0 9.287 1000.0
## 254632: 0 K drought 2000 0 9.287 2000.0
## 254633: 0 K high wind 7500 0 9.287 7500.0
DTD[,CR1:={tmp <- (CR/dollarUpd); tmp*9.287}]
## BGN_DATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1: 1950 tornado 0 15 25.0 K
## 2: 1950 tornado 0 0 2.5 K
## 3: 1950 tornado 1 1 2.5 K
## 4: 1950 tornado 0 0 2.5 K
## 5: 1950 tornado 0 0 25.0 K
## ---
## 254629: 2011 winter storm 0 0 5.0 K
## 254630: 2011 strong wind 0 0 0.6 K
## 254631: 2011 strong wind 0 0 1.0 K
## 254632: 2011 drought 0 0 2.0 K
## 254633: 2011 high wind 0 0 7.5 K
## CROPDMG CROPDMGEXP EVTYPE1 PR CR dollarUpd PR1 CR1
## 1: 0 tornado 25000 0 1.000 232175.0 0
## 2: 0 tornado 2500 0 1.000 23217.5 0
## 3: 0 tornado 2500 0 1.000 23217.5 0
## 4: 0 tornado 2500 0 1.000 23217.5 0
## 5: 0 tornado 25000 0 1.000 232175.0 0
## ---
## 254629: 0 K winter storm 5000 0 9.287 5000.0 0
## 254630: 0 K strong wind 600 0 9.287 600.0 0
## 254631: 0 K strong wind 1000 0 9.287 1000.0 0
## 254632: 0 K drought 2000 0 9.287 2000.0 0
## 254633: 0 K high wind 7500 0 9.287 7500.0 0
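The adjustment used for PR1 and CR1 divides each value by the index of its own year and multiplies it by the 2011 index. A minimal sketch of that arithmetic, with a hypothetical helper name `to_2011_dollars`, is:

```r
## index_year: value of one 1950 dollar in dollars of the event year
## index_2011: the same index for 2011 (9.287 in the table above)
to_2011_dollars <- function(value, index_year, index_2011 = 9.287) {
  value / index_year * index_2011
}

loss_1950 <- to_2011_dollars(25000, 1.000)  ## a 25 000 dollar loss in 1950
loss_2011 <- to_2011_dollars(7500, 9.287)   ## a 7 500 dollar loss in 2011
print(c(loss_1950, loss_2011))
```

These two cases reproduce the first and last PR1 values printed above (232175 and 7500).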
ttl_fatal <- format(sum(SD$FATALITIES), digits = 6, big.mark=" ")
ttl_inj <- format(sum(SD$INJURIES), digits = 6, big.mark=" ")
yearFat <- sort(tapply(SD$FATALITIES, SD$BGN_DATE, sum), decreasing = T)
yearFat[1:10]
## 1995 2011 1999 1998 1997 2006 1996 1953 2002 2008
## 1491 1002 908 687 601 599 542 519 498 488
maxfat <- max(yearFat)
yearInj <- sort(tapply(SD$INJURIES, SD$BGN_DATE, sum), decreasing = T)
yearInj[1:10]
## 1998 2011 1974 1965 1999 1953 1995 1994 1997 2006
## 11177 7792 6824 5197 5148 5131 4480 4161 3800 3368
In the last 60 years, 15 145 people died in extreme weather events across the USA, while 140 528 people sustained some kind of injury. The tables above give the years with the highest figures in terms of human losses. 1995 was the year in which most people died in the States due to extreme weather events, with 1491 deaths.
The following plots the deadliest events by year.
plot(DTD$BGN_DATE, DTD$FATALITIES, pch =19, col="red", main = "Deadliest events by year", xlab = "year", ylab = "number of deaths")
typeFat <- sort(tapply(SD$FATALITIES, SD$EVTYPE, sum), decreasing = T)
typeFat[1:10]
## tornado excessive heat flash flood heat lightning
## 5633 1903 978 937 816
## tstm wind flood rip current high wind avalanche
## 504 470 368 248 224
maxTypeFat <- format(max(typeFat), digits = 4, big.mark=" ")
rel10Fat <- format(sum(typeFat[1:10])/sum(SD$FATALITIES)*100, digits = 2)
trnrelFat <- format(max(typeFat)/sum(SD$FATALITIES)*100, digits = 2)
typeInj <- sort(tapply(SD$INJURIES, SD$EVTYPE, sum), decreasing = T)
typeInj[1:10]
## tornado tstm wind flood excessive heat
## 91346 6957 6789 6525
## lightning heat ice storm flash flood
## 5230 2100 1975 1777
## thunderstorm wind hail
## 1488 1361
maxTypeInj <- format(max(typeInj),digits = 5, big.mark=" ")
rel10Inj <- format(sum(typeInj[1:10])/sum(SD$INJURIES)*100,digits = 2)
trnrelInj <- format(max(typeInj)/sum(SD$INJURIES)*100, digits = 2)
The top ten events are responsible for 80% of all registered deaths and 89% of all injuries. By type of event, tornadoes are the greatest cause of human losses, with 5 633 deaths and 91 346 injuries. These figures correspond to 37% of all deaths and 65% of all injuries during the last 60 years.
We can also compute the total value lost to severe weather events over the last 60 years, in terms of property and crops, in 2011 US dollars.
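The percentages for deaths can be checked by hand from the figures already shown. The sketch below uses the ten fatality counts from the table above and the overall total of 15 145 deaths; the helper name `share` is hypothetical.

```r
## Top ten fatality counts by event type, copied from the table above
top_fatalities <- c(5633, 1903, 978, 937, 816, 504, 470, 368, 248, 224)
total_fatalities <- 15145

## Hypothetical helper: percentage of the total, rounded to whole percent
share <- function(part, total) round(100 * sum(part) / total)

top_ten_share <- share(top_fatalities, total_fatalities)
tornado_share <- share(top_fatalities[1], total_fatalities)
print(c(top_ten_share, tornado_share))
```

Both the numerator and the denominator are plain counts here, so no unit conversion is involved.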
ttlBll <- DTD[,list(sum(PR1)/1000000000, sum(CR1)/1000000000)]
propdmg <- sort(tapply(DTD$PR1,DTD$EVTYPE, sum), decreasing =T)/1000000000
cropdmg <- sort(tapply(DTD$CR1,DTD$EVTYPE, sum), decreasing =T)/1000000000
Finally, the total value of property lost to severe weather events between 1950 and 2011 adds up to 579.0098583 billion dollars. Crop losses amount to 64.1573639 billion dollars. Both figures are stated in 2011 values.
propdmg[1:10]
## flood tornado hurricane/typhoon storm surge
## 162.923216 141.564080 80.589074 49.962701
## flash flood hail hurricane winter storm
## 19.357078 18.676635 15.298925 9.677231
## tropical storm river flood
## 9.600441 7.900586
maxProp <- format(max(propdmg), digits = 4, big.mark=" ")
rel10Prop <- format((sum(propdmg[1:10])*1000000000)/sum(DTD$PR1)*100, digits = 2)
trnrelProp <- format((max(propdmg)*1000000000)/sum(DTD$PR1)*100, digits = 2)
The above table gives us the top ten causes of property loss. The top ten events are responsible for 89% of all property loss. By type of event, flood is the greatest cause of property loss, adding up to 162.9 billion dollars, which corresponds to 28% of all property value lost during the last 60 years.
cropdmg[1:10]
## drought river flood ice storm flood
## 17.657482 7.767300 7.547484 6.719910
## hail hurricane hurricane/typhoon extreme cold
## 3.802699 3.631492 3.026714 1.781367
## flash flood frost/freeze
## 1.686380 1.165059
maxCrop <- format(max(cropdmg), digits = 4, big.mark=" ")
rel10Crop <- format((sum(cropdmg[1:10])*1000000000)/sum(DTD$CR1)*100, digits = 2) ## cropdmg is in billions
trnrelCrop <- format((max(cropdmg)*1000000000)/sum(DTD$CR1)*100, digits = 2)
The above table gives us the top ten causes of crop losses. The top ten events are responsible for 85% of all crop losses. By type of event, drought is the greatest cause of crop losses, adding up to 17.66 billion dollars, which is 28% of all crop value lost during the last 60 years.