This analysis looks at the consequences of storms and other severe weather events in America. Using data from the National Climatic Data Center, the consequences of severe weather events for both public health (fatalities and injuries) and the economy (crop damage and property damage) in the USA are analysed. The most damaging weather event for public health was identified as tornadoes while the most damaging event in economic terms was also identified as tornadoes. The results of this analysis should be interpreted with caution and some limitations of the analysis are briefly discusseed.
The first steps are to download the dataset and read in the dataset.
download.file (“https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2”,“StormData.csv.bz2”)
Stormdata <- read.csv("StormData.csv.bz2", stringsAsFactors= FALSE)
There is no codebook accompanying the dataset but there is accompanying documentation that we need to download and read.
Lets take a look at the data.
str (Stormdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
dim (Stormdata)
## [1] 902297 37
It is apparent that we are dealing with a reasonably large dataset here. Over 900,000 observations across 37 variables. We will load the dplyr library and convert to a tbl_df for ease of viewing. We will also load the other packages we need for the analysis.
library (dplyr)
## Warning: package 'dplyr' was built under R version 3.1.3
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library (reshape2)
## Warning: package 'reshape2' was built under R version 3.1.3
library (ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
Stormdata <- tbl_df (Stormdata)
We are only interested in observations which recorded either human or economic damage so we filter on these columns. This reduces the size of the dataset we have to work with.
Stormdata <- filter (Stormdata, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
number_of_events <- length (unique(Stormdata$EVTYPE))
unique (Stormdata$EVTYPE)
## [1] "TORNADO" "TSTM WIND"
## [3] "HAIL" "ICE STORM/FLASH FLOOD"
## [5] "WINTER STORM" "HURRICANE OPAL/HIGH WINDS"
## [7] "THUNDERSTORM WINDS" "HURRICANE ERIN"
## [9] "HURRICANE OPAL" "HEAVY RAIN"
## [11] "LIGHTNING" "THUNDERSTORM WIND"
## [13] "DENSE FOG" "RIP CURRENT"
## [15] "THUNDERSTORM WINS" "FLASH FLOODING"
## [17] "FLASH FLOOD" "TORNADO F0"
## [19] "THUNDERSTORM WINDS LIGHTNING" "THUNDERSTORM WINDS/HAIL"
## [21] "HEAT" "HIGH WINDS"
## [23] "WIND" "HEAVY RAINS"
## [25] "LIGHTNING AND HEAVY RAIN" "THUNDERSTORM WINDS HAIL"
## [27] "COLD" "HEAVY RAIN/LIGHTNING"
## [29] "FLASH FLOODING/THUNDERSTORM WI" "FLOODING"
## [31] "WATERSPOUT" "EXTREME COLD"
## [33] "LIGHTNING/HEAVY RAIN" "BREAKUP FLOODING"
## [35] "HIGH WIND" "FREEZE"
## [37] "RIVER FLOOD" "HIGH WINDS HEAVY RAINS"
## [39] "AVALANCHE" "MARINE MISHAP"
## [41] "HIGH TIDES" "HIGH WIND/SEAS"
## [43] "HIGH WINDS/HEAVY RAIN" "HIGH SEAS"
## [45] "COASTAL FLOOD" "SEVERE TURBULENCE"
## [47] "RECORD RAINFALL" "HEAVY SNOW"
## [49] "HEAVY SNOW/WIND" "DUST STORM"
## [51] "FLOOD" "APACHE COUNTY"
## [53] "SLEET" "DUST DEVIL"
## [55] "ICE STORM" "EXCESSIVE HEAT"
## [57] "THUNDERSTORM WINDS/FUNNEL CLOU" "GUSTY WINDS"
## [59] "FLOODING/HEAVY RAIN" "HEAVY SURF COASTAL FLOODING"
## [61] "HIGH SURF" "WILD FIRES"
## [63] "HIGH" "WINTER STORM HIGH WINDS"
## [65] "WINTER STORMS" "MUDSLIDES"
## [67] "RAINSTORM" "SEVERE THUNDERSTORM"
## [69] "SEVERE THUNDERSTORMS" "SEVERE THUNDERSTORM WINDS"
## [71] "THUNDERSTORMS WINDS" "FLOOD/FLASH FLOOD"
## [73] "FLOOD/RAIN/WINDS" "THUNDERSTORMS"
## [75] "FLASH FLOOD WINDS" "WINDS"
## [77] "FUNNEL CLOUD" "HIGH WIND DAMAGE"
## [79] "STRONG WIND" "HEAVY SNOWPACK"
## [81] "FLASH FLOOD/" "HEAVY SURF"
## [83] "DRY MIRCOBURST WINDS" "DRY MICROBURST"
## [85] "URBAN FLOOD" "THUNDERSTORM WINDSS"
## [87] "MICROBURST WINDS" "HEAT WAVE"
## [89] "UNSEASONABLY WARM" "COASTAL FLOODING"
## [91] "STRONG WINDS" "BLIZZARD"
## [93] "WATERSPOUT/TORNADO" "WATERSPOUT TORNADO"
## [95] "STORM SURGE" "URBAN/SMALL STREAM FLOOD"
## [97] "WATERSPOUT-" "TORNADOES, TSTM WIND, HAIL"
## [99] "TROPICAL STORM ALBERTO" "TROPICAL STORM"
## [101] "TROPICAL STORM GORDON" "TROPICAL STORM JERRY"
## [103] "LIGHTNING THUNDERSTORM WINDS" "URBAN FLOODING"
## [105] "MINOR FLOODING" "WATERSPOUT-TORNADO"
## [107] "LIGHTNING INJURY" "LIGHTNING AND THUNDERSTORM WIN"
## [109] "FLASH FLOODS" "THUNDERSTORM WINDS53"
## [111] "WILDFIRE" "DAMAGING FREEZE"
## [113] "THUNDERSTORM WINDS 13" "HURRICANE"
## [115] "SNOW" "LIGNTNING"
## [117] "FROST" "FREEZING RAIN/SNOW"
## [119] "HIGH WINDS/" "THUNDERSNOW"
## [121] "FLOODS" "COOL AND WET"
## [123] "HEAVY RAIN/SNOW" "GLAZE ICE"
## [125] "MUD SLIDE" "HIGH WINDS"
## [127] "RURAL FLOOD" "MUD SLIDES"
## [129] "EXTREME HEAT" "DROUGHT"
## [131] "COLD AND WET CONDITIONS" "EXCESSIVE WETNESS"
## [133] "SLEET/ICE STORM" "GUSTNADO"
## [135] "FREEZING RAIN" "SNOW AND HEAVY SNOW"
## [137] "GROUND BLIZZARD" "EXTREME WIND CHILL"
## [139] "MAJOR FLOOD" "SNOW/HEAVY SNOW"
## [141] "FREEZING RAIN/SLEET" "ICE JAM FLOODING"
## [143] "COLD AIR TORNADO" "WIND DAMAGE"
## [145] "FOG" "TSTM WIND 55"
## [147] "SMALL STREAM FLOOD" "THUNDERTORM WINDS"
## [149] "HAIL/WINDS" "SNOW AND ICE"
## [151] "WIND STORM" "GRASS FIRES"
## [153] "LAKE FLOOD" "HAIL/WIND"
## [155] "WIND/HAIL" "ICE"
## [157] "SNOW AND ICE STORM" "THUNDERSTORM WINDS"
## [159] "WINTER WEATHER" "DROUGHT/EXCESSIVE HEAT"
## [161] "THUNDERSTORMS WIND" "TUNDERSTORM WIND"
## [163] "URBAN AND SMALL STREAM FLOODIN" "THUNDERSTORM WIND/LIGHTNING"
## [165] "HEAVY RAIN/SEVERE WEATHER" "THUNDERSTORM"
## [167] "WATERSPOUT/ TORNADO" "LIGHTNING."
## [169] "HURRICANE-GENERATED SWELLS" "RIVER AND STREAM FLOOD"
## [171] "HIGH WINDS/COASTAL FLOOD" "RAIN"
## [173] "RIVER FLOODING" "ICE FLOES"
## [175] "THUNDERSTORM WIND G50" "LIGHTNING FIRE"
## [177] "HEAVY LAKE SNOW" "RECORD COLD"
## [179] "HEAVY SNOW/FREEZING RAIN" "COLD WAVE"
## [181] "DUST DEVIL WATERSPOUT" "TORNADO F3"
## [183] "TORNDAO" "FLOOD/RIVER FLOOD"
## [185] "MUD SLIDES URBAN FLOODING" "TORNADO F1"
## [187] "GLAZE/ICE STORM" "GLAZE"
## [189] "HEAVY SNOW/WINTER STORM" "MICROBURST"
## [191] "AVALANCE" "BLIZZARD/WINTER STORM"
## [193] "DUST STORM/HIGH WINDS" "ICE JAM"
## [195] "FOREST FIRES" "FROST\\FREEZE"
## [197] "THUNDERSTORM WINDS." "HVY RAIN"
## [199] "HAIL 150" "HAIL 075"
## [201] "HAIL 100" "THUNDERSTORM WIND G55"
## [203] "HAIL 125" "THUNDERSTORM WIND G60"
## [205] "THUNDERSTORM WINDS G60" "HARD FREEZE"
## [207] "HAIL 200" "HEAVY SNOW AND HIGH WINDS"
## [209] "HEAVY SNOW/HIGH WINDS & FLOOD" "HEAVY RAIN AND FLOOD"
## [211] "RIP CURRENTS/HEAVY SURF" "URBAN AND SMALL"
## [213] "WILDFIRES" "FOG AND COLD TEMPERATURES"
## [215] "SNOW/COLD" "FLASH FLOOD FROM ICE JAMS"
## [217] "TSTM WIND G58" "MUDSLIDE"
## [219] "HEAVY SNOW SQUALLS" "SNOW SQUALL"
## [221] "SNOW/ICE STORM" "HEAVY SNOW/SQUALLS"
## [223] "HEAVY SNOW-SQUALLS" "ICY ROADS"
## [225] "HEAVY MIX" "SNOW FREEZING RAIN"
## [227] "SNOW/SLEET" "SNOW/FREEZING RAIN"
## [229] "SNOW SQUALLS" "SNOW/SLEET/FREEZING RAIN"
## [231] "RECORD SNOW" "HAIL 0.75"
## [233] "RECORD HEAT" "THUNDERSTORM WIND 65MPH"
## [235] "THUNDERSTORM WIND/ TREES" "THUNDERSTORM WIND/AWNING"
## [237] "THUNDERSTORM WIND 98 MPH" "THUNDERSTORM WIND TREES"
## [239] "TORNADO F2" "RIP CURRENTS"
## [241] "HURRICANE EMILY" "COASTAL SURGE"
## [243] "HURRICANE GORDON" "HURRICANE FELIX"
## [245] "THUNDERSTORM WIND 60 MPH" "THUNDERSTORM WINDS 63 MPH"
## [247] "THUNDERSTORM WIND/ TREE" "THUNDERSTORM DAMAGE TO"
## [249] "THUNDERSTORM WIND 65 MPH" "FLASH FLOOD - HEAVY RAIN"
## [251] "THUNDERSTORM WIND." "FLASH FLOOD/ STREET"
## [253] "BLOWING SNOW" "HEAVY SNOW/BLIZZARD"
## [255] "THUNDERSTORM HAIL" "THUNDERSTORM WINDSHAIL"
## [257] "LIGHTNING WAUSEON" "THUDERSTORM WINDS"
## [259] "ICE AND SNOW" "STORM FORCE WINDS"
## [261] "HEAVY SNOW/ICE" "LIGHTING"
## [263] "HIGH WIND/HEAVY SNOW" "THUNDERSTORM WINDS AND"
## [265] "HEAVY PRECIPITATION" "HIGH WIND/BLIZZARD"
## [267] "TSTM WIND DAMAGE" "FLOOD FLASH"
## [269] "RAIN/WIND" "SNOW/ICE"
## [271] "HAIL 75" "HEAT WAVE DROUGHT"
## [273] "HEAVY SNOW/BLIZZARD/AVALANCHE" "HEAT WAVES"
## [275] "UNSEASONABLY WARM AND DRY" "UNSEASONABLY COLD"
## [277] "RECORD/EXCESSIVE HEAT" "THUNDERSTORM WIND G52"
## [279] "HIGH WAVES" "FLASH FLOOD/FLOOD"
## [281] "FLOOD/FLASH" "LOW TEMPERATURE"
## [283] "HEAVY RAINS/FLOODING" "THUNDERESTORM WINDS"
## [285] "THUNDERSTORM WINDS/FLOODING" "HYPOTHERMIA"
## [287] "THUNDEERSTORM WINDS" "THUNERSTORM WINDS"
## [289] "HIGH WINDS/COLD" "COLD/WINDS"
## [291] "SNOW/ BITTER COLD" "COLD WEATHER"
## [293] "RAPIDLY RISING WATER" "WILD/FOREST FIRE"
## [295] "ICE/STRONG WINDS" "SNOW/HIGH WINDS"
## [297] "HIGH WINDS/SNOW" "SNOWMELT FLOODING"
## [299] "HEAVY SNOW AND STRONG WINDS" "SNOW ACCUMULATION"
## [301] "SNOW/ ICE" "SNOW/BLOWING SNOW"
## [303] "TORNADOES" "THUNDERSTORM WIND/HAIL"
## [305] "FREEZING DRIZZLE" "HAIL 175"
## [307] "FLASH FLOODING/FLOOD" "HAIL 275"
## [309] "HAIL 450" "EXCESSIVE RAINFALL"
## [311] "THUNDERSTORMW" "HAILSTORM"
## [313] "TSTM WINDS" "TSTMW"
## [315] "TSTM WIND 65)" "TROPICAL STORM DEAN"
## [317] "THUNDERSTORM WINDS/ FLOOD" "LANDSLIDE"
## [319] "HIGH WIND AND SEAS" "THUNDERSTORMWINDS"
## [321] "WILD/FOREST FIRES" "HEAVY SEAS"
## [323] "HAIL DAMAGE" "FLOOD & HEAVY RAIN"
## [325] "?" "THUNDERSTROM WIND"
## [327] "FLOOD/FLASHFLOOD" "HIGH WATER"
## [329] "HIGH WIND 48" "LANDSLIDES"
## [331] "URBAN/SMALL STREAM" "BRUSH FIRE"
## [333] "HEAVY SHOWER" "HEAVY SWELLS"
## [335] "URBAN SMALL" "URBAN FLOODS"
## [337] "FLASH FLOOD/LANDSLIDE" "HEAVY RAIN/SMALL STREAM URBAN"
## [339] "FLASH FLOOD LANDSLIDES" "TSTM WIND/HAIL"
## [341] "Other" "Ice jam flood (minor"
## [343] "Tstm Wind" "URBAN/SML STREAM FLD"
## [345] "ROUGH SURF" "Heavy Surf"
## [347] "Dust Devil" "Marine Accident"
## [349] "Freeze" "Strong Wind"
## [351] "COASTAL STORM" "Erosion/Cstl Flood"
## [353] "River Flooding" "Damaging Freeze"
## [355] "Beach Erosion" "High Surf"
## [357] "Heavy Rain/High Surf" "Unseasonable Cold"
## [359] "Early Frost" "Wintry Mix"
## [361] "Extreme Cold" "Coastal Flooding"
## [363] "Torrential Rainfall" "Landslump"
## [365] "Hurricane Edouard" "Coastal Storm"
## [367] "TIDAL FLOODING" "Tidal Flooding"
## [369] "Strong Winds" "EXTREME WINDCHILL"
## [371] "Glaze" "Extended Cold"
## [373] "Whirlwind" "Heavy snow shower"
## [375] "Light snow" "Light Snow"
## [377] "MIXED PRECIP" "Freezing Spray"
## [379] "DOWNBURST" "Mudslides"
## [381] "Microburst" "Mudslide"
## [383] "Cold" "Coastal Flood"
## [385] "Snow Squalls" "Wind Damage"
## [387] "Light Snowfall" "Freezing Drizzle"
## [389] "Gusty wind/rain" "GUSTY WIND/HVY RAIN"
## [391] "Wind" "Cold Temperature"
## [393] "Heat Wave" "Snow"
## [395] "COLD AND SNOW" "RAIN/SNOW"
## [397] "TSTM WIND (G45)" "Gusty Winds"
## [399] "GUSTY WIND" "TSTM WIND 40"
## [401] "TSTM WIND 45" "TSTM WIND (41)"
## [403] "TSTM WIND (G40)" "Frost/Freeze"
## [405] "AGRICULTURAL FREEZE" "OTHER"
## [407] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [409] "Lake Effect Snow" "Freezing Rain"
## [411] "Mixed Precipitation" "BLACK ICE"
## [413] "COASTALSTORM" "LIGHT SNOW"
## [415] "DAM BREAK" "Gusty winds"
## [417] "blowing snow" "GRADIENT WIND"
## [419] "TSTM WIND AND LIGHTNING" "gradient wind"
## [421] "Gradient wind" "Freezing drizzle"
## [423] "WET MICROBURST" "Heavy surf and wind"
## [425] "TYPHOON" "HIGH SWELLS"
## [427] "SMALL HAIL" "UNSEASONAL RAIN"
## [429] "COASTAL FLOODING/EROSION" " TSTM WIND (G45)"
## [431] "TSTM WIND (G45)" "HIGH WIND (G40)"
## [433] "TSTM WIND (G35)" "COASTAL EROSION"
## [435] "SEICHE" "COASTAL FLOODING/EROSION"
## [437] "HYPERTHERMIA/EXPOSURE" "WINTRY MIX"
## [439] "ROCK SLIDE" "GUSTY WIND/HAIL"
## [441] " TSTM WIND" "LANDSPOUT"
## [443] "EXCESSIVE SNOW" "LAKE EFFECT SNOW"
## [445] "FLOOD/FLASH/FLOOD" "MIXED PRECIPITATION"
## [447] "WIND AND WAVE" "LIGHT FREEZING RAIN"
## [449] "ICE ROADS" "ROUGH SEAS"
## [451] "TSTM WIND G45" "NON-SEVERE WIND DAMAGE"
## [453] "WARM WEATHER" "THUNDERSTORM WIND (G40)"
## [455] " FLASH FLOOD" "LATE SEASON SNOW"
## [457] "WINTER WEATHER MIX" "ROGUE WAVE"
## [459] "FALLING SNOW/ICE" "NON-TSTM WIND"
## [461] "NON TSTM WIND" "BLOWING DUST"
## [463] "VOLCANIC ASH" " HIGH SURF ADVISORY"
## [465] "HAZARDOUS SURF" "WHIRLWIND"
## [467] "ICE ON ROAD" "DROWNING"
## [469] "EXTREME COLD/WIND CHILL" "MARINE TSTM WIND"
## [471] "HURRICANE/TYPHOON" "WINTER WEATHER/MIX"
## [473] "FROST/FREEZE" "ASTRONOMICAL HIGH TIDE"
## [475] "HEAVY SURF/HIGH SURF" "TROPICAL DEPRESSION"
## [477] "LAKE-EFFECT SNOW" "MARINE HIGH WIND"
## [479] "TSUNAMI" "STORM SURGE/TIDE"
## [481] "COLD/WIND CHILL" "LAKESHORE FLOOD"
## [483] "MARINE THUNDERSTORM WIND" "MARINE STRONG WIND"
## [485] "ASTRONOMICAL LOW TIDE" "DENSE SMOKE"
## [487] "MARINE HAIL" "FREEZING FOG"
It is apparent that the column in the data that records the type of weather event is the EVTYPE column and the accompanying documentation indicates that there are 48 distinct event types currently defined and used to classify severe weather events in America. The EVTYPE column in our data shows 488 and it is apparent that the data in this column is quite messy. Before proceeding further with the analysis I attempted to reduce the numer of weather events in this column to more closely resemble the 48 categories in the supporting documentation.
I did this by replacing character strings with one of the 48 defined events using the gsub function as per the code below. This is a rather crude and inefficient method of mapping the events. It is quite laborious and means in some cases, where variables share a common string, converting strings into intermediate character string before converting to one of the recognised 48 event types for example when dealing with FLOOD, COASTAL FLOOD, FLASH FLOOD categories.
Stormdata$EVTYPE <- toupper(Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*HURRICANE.*|.*TYPHOON.*","HURRICANE (TYPHOON)", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("NON-TSTM WIND|NON TSTM WIND|.*HIGH WIND.*|HIGH WIND", "HIGH WIND", Stormdata$EVTYPE)
Stormdata$EVTPE <- gsub (".*MARINE TSTM.*", "MARINE TW", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*TSTM.*|.*THUDER.*|.*THUNDERT.*|.*THUNDERSTR.*|.*BURST.*|.*GUSTN.*|.*THUNDERSTORM.*|.*THUNDERESTORM.*|.*THUNERST.*|.*THUNDEER.*", "THUNDERSTORM WIND", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("MARINE TW", "MARINE THUNDERSTORM WIND", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("WINTRY MIX|.*WINTER WEATHER.*|LIGHT SNOW", "WINTER WEATHER", Stormdata$EVTYPE)
## in order to avoid Flash Flood, Coastal Flood and Lakeshore Flood all being subsumed
## into Flood, I had to first convert these strings into something else and then convert
## them back.
Stormdata$EVTYPE <- gsub (".*FLASH.*|RAPIDLY RISING WATER", "FF", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("*COASTAL.*", "CF", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("LAKE FLOOD|.*LAKESH.*", "LF", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*FLOOD.*", "FLOOD", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("FF", "FLASH FLOOD", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("CF", "COASTAL FLOOD", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("LF", "LAKESHORE FLOOD", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*TROPICAL STORM.*", "TROPICAL STORM", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*WATERSPOUT.*", "WATERSPOUT", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*TORNA.*|.*TORND.*|.*WHIRL.*|LANDSPOUT", "TORNADO", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*HEAVY SNOW.*|.*HEAVYSN.*", "HEAVY SNOW", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*STRONG WIND.*", "STRONG WIND", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*MARINE HAIL.*", "MH", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*HAIL.*", "HAIL", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("MH", "MARINE HAIL", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*FIRE.*", "WILDFIRE", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*AVALANC.*", "AVALANCHE", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*HEAVY SWELL.*|HIGH WAVES|HIGH SWELLS|HEAVY SEAS|.*SURF.*", "HIGH SURF", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*FREEZ.*|.*FROST.*", "FROST/FREEZE", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*LIGHTNIN.*|.*LIGNT.*|LIGHTING", "LIGHTNING", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*WARM.*", "HEAT", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*EXTREME COLD.*|.*BITTER.*|.*RECORD CO.*|.*HYPOTHERMIA.*|.*HYPERTHERMIA*|.*CHILL.*", "EXTREMCOLD", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*COLD.*|.*CHILL.*|.*LOW TE.*", "COLD/WIND CHILL", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("EXTREMCOLD|EXTREME COLD/WIND CHILL/EXPOSURE", "EXTREME COLD/WIND CHILL", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*DROUGHT.*", "DROUGHT", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*HEAVY RAIN.*|UNSEASONAL RAIN|HEAVY PRECIPITATION|EXCESSIVE RAINFALL|.*HEAVY SHOW.*|.*HVY RAIN.*|.*TORRENTIAL.*|RECORD RAINFALL", "HEAVY RAIN", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*GUSTY.*", "STRONG WIND", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*WINTER STORM.*", "WINTER STORM", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("LAKE EFLASH FLOODECT SNOW|LAKE-EFLASH FLOODECT SNOW", "LAKEEFFECT", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*SNOW.*", "HEAVY SNOW", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("LAKEEFFECT", "LAKE-EFFECT SNOW", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*SLIDE.*|.*SLUMP.*", "DEBRIS FLOW", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("HEAT WAVES|RECORD/EXCESSIVE HEAT|RECORD HEAT", "EXCESSIVE HEAT", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*ICE STORM.*", "ICST", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*WINTER WEATHER.*|RAIN/WIND|WINTER WEATHER.*|COOL AND WET|GLAZE|ICE ON ROAD|ICE|ICE ON ROAD|BLACK ICE|ICY ROADS|FOG|ICE JAM", "WINTER WEATHER", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("ICST", "ICE STORM", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*BLIZZARD.*", "BLIZZARD", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub (".*WIND DAMAGE.*|HIGH WINDS", "HIGH WIND", Stormdata$EVTYPE)
Stormdata$EVTYPE <- gsub ("BLOWING DUST", "DUST STORM", Stormdata$EVTYPE)
1. Across the United States which types of events are most harmful with respect to population health?
In order to look at the effects of severe weather events on health, I summed fatalities and injuries for each event type and then combined thse into a total column which I used to rank the events in decreasing order of impact on population health.
Stormdata %>% select (EVTYPE, FATALITIES, INJURIES)%>% group_by (EVTYPE)%>%summarise (total_injury=sum(INJURIES),total_fatal=sum(FATALITIES))%>% mutate (total=total_fatal + total_injury)%>% arrange (desc(total)) -> health
head (health, 20)
## Source: local data frame [20 x 4]
##
## EVTYPE total_injury total_fatal total
## 1 TORNADO 91364 5634 96998
## 2 THUNDERSTORM WIND 9572 758 10330
## 3 EXCESSIVE HEAT 6575 1927 8502
## 4 FLOOD 6795 484 7279
## 5 LIGHTNING 5231 817 6048
## 6 HEAT 2119 977 3096
## 7 FLASH FLOOD 1802 1036 2838
## 8 ICE STORM 1990 89 2079
## 9 WINTER WEATHER 1759 146 1905
## 10 HIGH WIND 1531 297 1828
## 11 WILDFIRE 1608 90 1698
## 12 WINTER STORM 1338 216 1554
## 13 HURRICANE (TYPHOON) 1333 135 1468
## 14 HAIL 1371 15 1386
## 15 HEAVY SNOW 1121 147 1268
## 16 BLIZZARD 805 101 906
## 17 COLD/WIND CHILL 321 484 805
## 18 RIP CURRENT 232 368 600
## 19 HEAT WAVE 379 172 551
## 20 RIP CURRENTS 297 204 501
It is apparent from the table above that the weather event most harmful to human health in America is Tornadoes followed by Thunderstorm Wind and Excessive Heat.
top <- health [1:10, 1:3]
topmelt <- melt (top, id.var="EVTYPE")
g <- ggplot(topmelt, aes(x=EVTYPE, y=value, fill=variable)) + geom_bar(stat="identity") + scale_x_discrete(limits=c("HIGH WIND", "WINTER WEATHER", "ICE STORM", "FLASH FLOOD","HEAT", "LIGHTNING", "FLOOD", "EXCESSIVE HEAT", "THUNDERSTORM WIND", "TORNADO"))
g <- g + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + ggtitle("Health Damage - 10 most severe events") + xlab("Weather Event") + ylab("Deaths and Injuries")
print (g)
It is important to note that effects of Tornados have been recorded since 1950 and effects of Thunderstorm Wind and Hail since 1955 while all other events have only been recorded since 1996. For comparison purposes therefore it might be better to use an average fatalities per year metric rather than simply summing these values. If we were to do that then EXCESSIVE HEAT would be at the top of the fatalities table while TORNADO would remain the top cause of injuries. TORNADO would still be the most damaging event in terms of combined injuries and fatalities.
2. Across the United States which types of events have the most economic consequences?
Stormdata %>% select (EVTYPE, PROPDMG, CROPDMG)%>% group_by (EVTYPE)%>%summarise (Property_Damage=sum(PROPDMG), Crop_Damage=sum(CROPDMG))%>% mutate (Total_Damage=Property_Damage + Crop_Damage)%>% arrange (desc(Total_Damage)) -> economic
head (economic, 20)
## Source: local data frame [20 x 4]
##
## EVTYPE Property_Damage Crop_Damage Total_Damage
## 1 TORNADO 3214551.41 100026.77 3314578.18
## 2 THUNDERSTORM WIND 2680781.61 199414.43 2880196.04
## 3 FLASH FLOOD 1473873.90 186484.21 1660358.11
## 4 HAIL 689827.78 581468.51 1271296.29
## 5 FLOOD 943552.62 177916.52 1121469.14
## 6 LIGHTNING 603429.78 3580.61 607010.39
## 7 HIGH WIND 382507.67 21637.81 404145.48
## 8 WINTER STORM 133720.59 2478.99 136199.58
## 9 HEAVY SNOW 133427.86 2175.72 135603.58
## 10 WILDFIRE 125223.29 9565.74 134789.03
## 11 ICE STORM 66500.67 1688.95 68189.62
## 12 STRONG WIND 66292.63 1831.90 68124.53
## 13 HEAVY RAIN 54147.69 11695.80 65843.49
## 14 TROPICAL STORM 49932.68 6465.12 56397.80
## 15 DROUGHT 4299.05 33954.40 38253.45
## 16 HURRICANE (TYPHOON) 25186.65 11637.79 36824.44
## 17 WINTER WEATHER 36775.51 20.00 36795.51
## 18 URBAN/SML STREAM FLD 26051.94 2793.80 28845.74
## 19 BLIZZARD 25418.48 172.00 25590.48
## 20 COLD/WIND CHILL 15366.04 6946.74 22312.78
The table above indicates that tornadoes were the most damaging event in economic terms followed by thunderstorm wind and flash flood. Again however a more comprehensive analysis would seek to quantify exactly how much damage occurred on a yearly basis as a result of each event. The analysis would also need to take into account the effects of inflation.
topec <- economic [1:10, 1:3]
topecmelt <- melt (topec, id.var="EVTYPE")
g1 <- ggplot(topecmelt, aes(x=EVTYPE, y=value, fill=variable)) + geom_bar(stat="identity") + scale_x_discrete(limits=c("WILDFIRE", "HEAVY SNOW", "WINTER STORM", "HIGH WIND", "LIGHTNING","FLOOD", "HAIL", "FLASH FLOOD", "THUNDERSTORM WIND", "TORNADO"))
g1 <- g1 + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + ggtitle("Economic Damage - 10 most severe events") + xlab("Weather Event") + ylab("Property and Crop Damage")
print (g1)
If this data were being used to inform public policy decisions then a more rigorous analysis should be carried out. Some limitations of this analysis have already been identified. I summarise these and others below.