SYNOPSIS For this analysis, I examined storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA). The data describe major storms and weather events in the United States, including when and where they occurred, along with estimates of fatalities, injuries, and property damage.
I cleaned the data by consolidating duplicate and misspelled event types and removing missing or uninterpretable values. I then took the highest percentiles to isolate the most dangerous and the most costly weather events, and made a simple plot for each showing which events are most harmful to public health and to the country’s economy.
This first section will load the required libraries, read the raw data, and take a look at the data.
##Load in some libraries.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
##Set right working directory and load in the raw data.
setwd("~/coursera/Finalproject")
data <- read.csv("repdata_data_StormData.csv.bz2")
##Check out the data and find the columns we'll need.
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
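Only a few of the columns shown above are used in the rest of the analysis, so one option is to skip the others at read time. The sketch below is an illustration under assumptions, not what this report actually ran: it writes a tiny made-up CSV to a temporary file (the rows and the `tmp`, `keep`, and `small` names are hypothetical) and uses the `colClasses` argument of `read.csv()`, where `"NULL"` drops a column and `NA` lets R guess its type.

```r
# Hypothetical miniature of the storm file, written to a temp path for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "STATE,EVTYPE,FATALITIES,INJURIES,REMARKS",
  "AL,TORNADO,0,15,note a",
  "AL,TSTM WIND,0,0,note b"
), tmp)

# Read a single row just to learn the column names.
header <- names(read.csv(tmp, nrows = 1))

# Start by dropping every column, then mark the ones to keep with NA
# so that read.csv() infers their types as usual.
keep <- c("EVTYPE", "FATALITIES", "INJURIES")
classes <- rep("NULL", length(header))
classes[header %in% keep] <- NA

small <- read.csv(tmp, colClasses = classes, stringsAsFactors = FALSE)
str(small)
```

The same `classes` vector could in principle be passed when reading the compressed storm file, which would avoid holding the long REMARKS text in memory.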
This section covers the data cleaning. The EVTYPE column contains many duplicated and misspelled event names, so these had to be merged before the analysis could continue.
##EVTYPE, FATALITIES, and INJURIES will give us the information we need for the first question.
eventhealth <- data.frame(data$EVTYPE, data$FATALITIES, data$INJURIES)
##Let's look at it. Listing the unique values shows EVTYPE has mixed capitalization and repeated values.
head(eventhealth)
## data.EVTYPE data.FATALITIES data.INJURIES
## 1 TORNADO 0 15
## 2 TORNADO 0 0
## 3 TORNADO 0 2
## 4 TORNADO 0 2
## 5 TORNADO 0 2
## 6 TORNADO 0 6
unique(eventhealth$data.EVTYPE)
## [1] "TORNADO" "TSTM WIND"
## [3] "HAIL" "FREEZING RAIN"
## [5] "SNOW" "ICE STORM/FLASH FLOOD"
## [7] "SNOW/ICE" "WINTER STORM"
## [9] "HURRICANE OPAL/HIGH WINDS" "THUNDERSTORM WINDS"
## [11] "RECORD COLD" "HURRICANE ERIN"
## [13] "HURRICANE OPAL" "HEAVY RAIN"
## [15] "LIGHTNING" "THUNDERSTORM WIND"
## [17] "DENSE FOG" "RIP CURRENT"
## [19] "THUNDERSTORM WINS" "FLASH FLOOD"
## [21] "FLASH FLOODING" "HIGH WINDS"
## [23] "FUNNEL CLOUD" "TORNADO F0"
## [25] "THUNDERSTORM WINDS LIGHTNING" "THUNDERSTORM WINDS/HAIL"
## [27] "HEAT" "WIND"
## [29] "LIGHTING" "HEAVY RAINS"
## [31] "LIGHTNING AND HEAVY RAIN" "FUNNEL"
## [33] "WALL CLOUD" "FLOODING"
## [35] "THUNDERSTORM WINDS HAIL" "FLOOD"
## [37] "COLD" "HEAVY RAIN/LIGHTNING"
## [39] "FLASH FLOODING/THUNDERSTORM WI" "WALL CLOUD/FUNNEL CLOUD"
## [41] "THUNDERSTORM" "WATERSPOUT"
## [43] "EXTREME COLD" "HAIL 1.75)"
## [45] "LIGHTNING/HEAVY RAIN" "HIGH WIND"
## [47] "BLIZZARD" "BLIZZARD WEATHER"
## [49] "WIND CHILL" "BREAKUP FLOODING"
## [51] "HIGH WIND/BLIZZARD" "RIVER FLOOD"
## [53] "HEAVY SNOW" "FREEZE"
## [55] "COASTAL FLOOD" "HIGH WIND AND HIGH TIDES"
## [57] "HIGH WIND/BLIZZARD/FREEZING RA" "HIGH TIDES"
## [59] "HIGH WIND AND HEAVY SNOW" "RECORD COLD AND HIGH WIND"
## [61] "RECORD HIGH TEMPERATURE" "RECORD HIGH"
## [63] "HIGH WINDS HEAVY RAINS" "HIGH WIND/ BLIZZARD"
## [65] "ICE STORM" "BLIZZARD/HIGH WIND"
## [67] "HIGH WIND/LOW WIND CHILL" "HEAVY SNOW/HIGH"
## [69] "RECORD LOW" "HIGH WINDS AND WIND CHILL"
## [71] "HEAVY SNOW/HIGH WINDS/FREEZING" "LOW TEMPERATURE RECORD"
## [73] "AVALANCHE" "MARINE MISHAP"
## [75] "WIND CHILL/HIGH WIND" "HIGH WIND/WIND CHILL/BLIZZARD"
## [77] "HIGH WIND/WIND CHILL" "HIGH WIND/HEAVY SNOW"
## [79] "HIGH TEMPERATURE RECORD" "FLOOD WATCH/"
## [81] "RECORD HIGH TEMPERATURES" "HIGH WIND/SEAS"
## [83] "HIGH WINDS/HEAVY RAIN" "HIGH SEAS"
## [85] "SEVERE TURBULENCE" "RECORD RAINFALL"
## [87] "RECORD SNOWFALL" "RECORD WARMTH"
## [89] "HEAVY SNOW/WIND" "EXTREME HEAT"
## [91] "WIND DAMAGE" "DUST STORM"
## [93] "APACHE COUNTY" "SLEET"
## [95] "HAIL STORM" "FUNNEL CLOUDS"
## [97] "FLASH FLOODS" "DUST DEVIL"
## [99] "EXCESSIVE HEAT" "THUNDERSTORM WINDS/FUNNEL CLOU"
## [101] "WINTER STORM/HIGH WIND" "WINTER STORM/HIGH WINDS"
## [103] "GUSTY WINDS" "STRONG WINDS"
## [105] "FLOODING/HEAVY RAIN" "SNOW AND WIND"
## [107] "HEAVY SURF COASTAL FLOODING" "HEAVY SURF"
## [109] "HEAVY PRECIPATATION" "URBAN FLOODING"
## [111] "HIGH SURF" "BLOWING DUST"
## [113] "URBAN/SMALL" "WILD FIRES"
## [115] "HIGH" "URBAN/SMALL FLOODING"
## [117] "WATER SPOUT" "HIGH WINDS DUST STORM"
## [119] "WINTER STORM HIGH WINDS" "LOCAL FLOOD"
## [121] "WINTER STORMS" "MUDSLIDES"
## [123] "RAINSTORM" "SEVERE THUNDERSTORM"
## [125] "SEVERE THUNDERSTORMS" "SEVERE THUNDERSTORM WINDS"
## [127] "THUNDERSTORMS WINDS" "DRY MICROBURST"
## [129] "FLOOD/FLASH FLOOD" "FLOOD/RAIN/WINDS"
## [131] "WINDS" "DRY MICROBURST 61"
## [133] "THUNDERSTORMS" "FLASH FLOOD WINDS"
## [135] "URBAN/SMALL STREAM FLOODING" "MICROBURST"
## [137] "STRONG WIND" "HIGH WIND DAMAGE"
## [139] "STREAM FLOODING" "URBAN AND SMALL"
## [141] "HEAVY SNOWPACK" "ICE"
## [143] "FLASH FLOOD/" "DOWNBURST"
## [145] "GUSTNADO AND" "FLOOD/RAIN/WIND"
## [147] "WET MICROBURST" "DOWNBURST WINDS"
## [149] "DRY MICROBURST WINDS" "DRY MIRCOBURST WINDS"
## [151] "DRY MICROBURST 53" "SMALL STREAM URBAN FLOOD"
## [153] "MICROBURST WINDS" "HIGH WINDS 57"
## [155] "DRY MICROBURST 50" "HIGH WINDS 66"
## [157] "HIGH WINDS 76" "HIGH WINDS 63"
## [159] "HIGH WINDS 67" "BLIZZARD/HEAVY SNOW"
## [161] "HEAVY SNOW/HIGH WINDS" "BLOWING SNOW"
## [163] "HIGH WINDS 82" "HIGH WINDS 80"
## [165] "HIGH WINDS 58" "FREEZING DRIZZLE"
## [167] "LIGHTNING THUNDERSTORM WINDSS" "DRY MICROBURST 58"
## [169] "HAIL 75" "HIGH WINDS 73"
## [171] "HIGH WINDS 55" "LIGHT SNOW AND SLEET"
## [173] "URBAN FLOOD" "DRY MICROBURST 84"
## [175] "THUNDERSTORM WINDS 60" "HEAVY RAIN/FLOODING"
## [177] "THUNDERSTORM WINDSS" "TORNADOS"
## [179] "GLAZE" "RECORD HEAT"
## [181] "COASTAL FLOODING" "HEAT WAVE"
## [183] "FIRST SNOW" "FREEZING RAIN AND SLEET"
## [185] "UNSEASONABLY DRY" "UNSEASONABLY WET"
## [187] "WINTRY MIX" "WINTER WEATHER"
## [189] "UNSEASONABLY COLD" "EXTREME/RECORD COLD"
## [191] "RIP CURRENTS HEAVY SURF" "SLEET/RAIN/SNOW"
## [193] "UNSEASONABLY WARM" "DROUGHT"
## [195] "NORMAL PRECIPITATION" "HIGH WINDS/FLOODING"
## [197] "DRY" "RAIN/SNOW"
## [199] "SNOW/RAIN/SLEET" "WATERSPOUT/TORNADO"
## [201] "WATERSPOUTS" "WATERSPOUT TORNADO"
## [203] "URBAN/SMALL STREAM FLOOD" "STORM SURGE"
## [205] "WATERSPOUT-TORNADO" "WATERSPOUT-"
## [207] "TORNADOES, TSTM WIND, HAIL" "TROPICAL STORM ALBERTO"
## [209] "TROPICAL STORM" "TROPICAL STORM GORDON"
## [211] "TROPICAL STORM JERRY" "LIGHTNING THUNDERSTORM WINDS"
## [213] "WAYTERSPOUT" "MINOR FLOODING"
## [215] "LIGHTNING INJURY" "URBAN/SMALL STREAM FLOOD"
## [217] "LIGHTNING AND THUNDERSTORM WIN" "THUNDERSTORM WINDS53"
## [219] "URBAN AND SMALL STREAM FLOOD" "URBAN AND SMALL STREAM"
## [221] "WILDFIRE" "DAMAGING FREEZE"
## [223] "THUNDERSTORM WINDS 13" "SMALL HAIL"
## [225] "HEAVY SNOW/HIGH WIND" "HURRICANE"
## [227] "WILD/FOREST FIRE" "SMALL STREAM FLOODING"
## [229] "MUD SLIDE" "LIGNTNING"
## [231] "FROST" "FREEZING RAIN/SNOW"
## [233] "HIGH WINDS/" "THUNDERSNOW"
## [235] "FLOODS" "EXTREME WIND CHILLS"
## [237] "COOL AND WET" "HEAVY RAIN/SNOW"
## [239] "SMALL STREAM AND URBAN FLOODIN" "SMALL STREAM/URBAN FLOOD"
## [241] "SNOW/SLEET/FREEZING RAIN" "SEVERE COLD"
## [243] "GLAZE ICE" "COLD WAVE"
## [245] "EARLY SNOW" "SMALL STREAM AND URBAN FLOOD"
## [247] "HIGH WINDS" "RURAL FLOOD"
## [249] "SMALL STREAM AND" "MUD SLIDES"
## [251] "HAIL 80" "EXTREME WIND CHILL"
## [253] "COLD AND WET CONDITIONS" "EXCESSIVE WETNESS"
## [255] "GRADIENT WINDS" "HEAVY SNOW/BLOWING SNOW"
## [257] "SLEET/ICE STORM" "THUNDERSTORM WINDS URBAN FLOOD"
## [259] "THUNDERSTORM WINDS SMALL STREA" "ROTATING WALL CLOUD"
## [261] "LARGE WALL CLOUD" "COLD AIR FUNNEL"
## [263] "GUSTNADO" "COLD AIR FUNNELS"
## [265] "BLOWING SNOW- EXTREME WIND CHI" "SNOW AND HEAVY SNOW"
## [267] "GROUND BLIZZARD" "MAJOR FLOOD"
## [269] "SNOW/HEAVY SNOW" "FREEZING RAIN/SLEET"
## [271] "ICE JAM FLOODING" "SNOW- HIGH WIND- WIND CHILL"
## [273] "STREET FLOOD" "COLD AIR TORNADO"
## [275] "SMALL STREAM FLOOD" "FOG"
## [277] "THUNDERSTORM WINDS 2" "FUNNEL CLOUD/HAIL"
## [279] "ICE/SNOW" "TSTM WIND 51"
## [281] "TSTM WIND 50" "TSTM WIND 52"
## [283] "TSTM WIND 55" "HEAVY SNOW/BLIZZARD"
## [285] "THUNDERSTORM WINDS 61" "HAIL 0.75"
## [287] "THUNDERSTORM DAMAGE" "THUNDERTORM WINDS"
## [289] "HAIL 1.00" "HAIL/WINDS"
## [291] "SNOW AND ICE" "WIND STORM"
## [293] "SNOWSTORM" "GRASS FIRES"
## [295] "LAKE FLOOD" "PROLONG COLD"
## [297] "HAIL/WIND" "HAIL 1.75"
## [299] "THUNDERSTORMW 50" "WIND/HAIL"
## [301] "SNOW AND ICE STORM" "URBAN AND SMALL STREAM FLOODIN"
## [303] "THUNDERSTORMS WIND" "THUNDERSTORM WINDS"
## [305] "HEAVY SNOW/SLEET" "AGRICULTURAL FREEZE"
## [307] "DROUGHT/EXCESSIVE HEAT" "TUNDERSTORM WIND"
## [309] "TROPICAL STORM DEAN" "THUNDERTSORM WIND"
## [311] "THUNDERSTORM WINDS/ HAIL" "THUNDERSTORM WIND/LIGHTNING"
## [313] "HEAVY RAIN/SEVERE WEATHER" "THUNDESTORM WINDS"
## [315] "WATERSPOUT/ TORNADO" "LIGHTNING."
## [317] "WARM DRY CONDITIONS" "HURRICANE-GENERATED SWELLS"
## [319] "HEAVY SNOW/ICE STORM" "RIVER AND STREAM FLOOD"
## [321] "HIGH WIND 63" "COASTAL SURGE"
## [323] "HEAVY SNOW AND ICE STORM" "MINOR FLOOD"
## [325] "HIGH WINDS/COASTAL FLOOD" "RAIN"
## [327] "RIVER FLOODING" "SNOW/RAIN"
## [329] "ICE FLOES" "HIGH WAVES"
## [331] "SNOW SQUALLS" "SNOW SQUALL"
## [333] "THUNDERSTORM WIND G50" "LIGHTNING FIRE"
## [335] "BLIZZARD/FREEZING RAIN" "HEAVY LAKE SNOW"
## [337] "HEAVY SNOW/FREEZING RAIN" "LAKE EFFECT SNOW"
## [339] "HEAVY WET SNOW" "DUST DEVIL WATERSPOUT"
## [341] "THUNDERSTORM WINDS/HEAVY RAIN" "THUNDERSTROM WINDS"
## [343] "THUNDERSTORM WINDS LE CEN" "HAIL 225"
## [345] "BLIZZARD AND HEAVY SNOW" "HEAVY SNOW AND ICE"
## [347] "ICE STORM AND SNOW" "HEAVY SNOW ANDBLOWING SNOW"
## [349] "HEAVY SNOW/ICE" "BLIZZARD AND EXTREME WIND CHIL"
## [351] "LOW WIND CHILL" "BLOWING SNOW & EXTREME WIND CH"
## [353] "WATERSPOUT/" "URBAN/SMALL STREAM"
## [355] "TORNADO F3" "FUNNEL CLOUD."
## [357] "TORNDAO" "HAIL 0.88"
## [359] "FLOOD/RIVER FLOOD" "MUD SLIDES URBAN FLOODING"
## [361] "TORNADO F1" "THUNDERSTORM WINDS G"
## [363] "DEEP HAIL" "GLAZE/ICE STORM"
## [365] "HEAVY SNOW/WINTER STORM" "AVALANCE"
## [367] "BLIZZARD/WINTER STORM" "DUST STORM/HIGH WINDS"
## [369] "ICE JAM" "FOREST FIRES"
## [371] "THUNDERSTORM WIND G60" "FROST\\FREEZE"
## [373] "THUNDERSTORM WINDS." "HAIL 88"
## [375] "HAIL 175" "HVY RAIN"
## [377] "HAIL 100" "HAIL 150"
## [379] "HAIL 075" "THUNDERSTORM WIND G55"
## [381] "HAIL 125" "THUNDERSTORM WINDS G60"
## [383] "HARD FREEZE" "HAIL 200"
## [385] "THUNDERSTORM WINDS FUNNEL CLOU" "THUNDERSTORM WINDS 62"
## [387] "WILDFIRES" "RECORD HEAT WAVE"
## [389] "HEAVY SNOW AND HIGH WINDS" "HEAVY SNOW/HIGH WINDS & FLOOD"
## [391] "HAIL FLOODING" "THUNDERSTORM WINDS/FLASH FLOOD"
## [393] "HIGH WIND 70" "WET SNOW"
## [395] "HEAVY RAIN AND FLOOD" "LOCAL FLASH FLOOD"
## [397] "THUNDERSTORM WINDS 53" "FLOOD/FLASH FLOODING"
## [399] "TORNADO/WATERSPOUT" "RAIN AND WIND"
## [401] "THUNDERSTORM WIND 59" "THUNDERSTORM WIND 52"
## [403] "COASTAL/TIDAL FLOOD" "SNOW/ICE STORM"
## [405] "BELOW NORMAL PRECIPITATION" "RIP CURRENTS/HEAVY SURF"
## [407] "FLASH FLOOD/FLOOD" "EXCESSIVE RAIN"
## [409] "RECORD/EXCESSIVE HEAT" "HEAT WAVES"
## [411] "LIGHT SNOW" "THUNDERSTORM WIND 69"
## [413] "HAIL DAMAGE" "LIGHTNING DAMAGE"
## [415] "RECORD TEMPERATURES" "LIGHTNING AND WINDS"
## [417] "FOG AND COLD TEMPERATURES" "OTHER"
## [419] "RECORD SNOW" "SNOW/COLD"
## [421] "FLASH FLOOD FROM ICE JAMS" "TSTM WIND G58"
## [423] "MUDSLIDE" "HEAVY SNOW SQUALLS"
## [425] "HEAVY SNOW/SQUALLS" "HEAVY SNOW-SQUALLS"
## [427] "ICY ROADS" "HEAVY MIX"
## [429] "SNOW FREEZING RAIN" "LACK OF SNOW"
## [431] "SNOW/SLEET" "SNOW/FREEZING RAIN"
## [433] "SNOW DROUGHT" "THUNDERSTORMW WINDS"
## [435] "THUNDERSTORM WIND 60 MPH" "THUNDERSTORM WIND 65MPH"
## [437] "THUNDERSTORM WIND/ TREES" "THUNDERSTORM WIND/AWNING"
## [439] "THUNDERSTORM WIND 98 MPH" "THUNDERSTORM WIND TREES"
## [441] "TORRENTIAL RAIN" "TORNADO F2"
## [443] "RIP CURRENTS" "HURRICANE EMILY"
## [445] "HURRICANE GORDON" "HURRICANE FELIX"
## [447] "THUNDERSTORM WIND 59 MPH" "THUNDERSTORM WINDS 63 MPH"
## [449] "THUNDERSTORM WIND/ TREE" "THUNDERSTORM DAMAGE TO"
## [451] "THUNDERSTORM WIND 65 MPH" "FLASH FLOOD - HEAVY RAIN"
## [453] "THUNDERSTORM WIND." "FLASH FLOOD/ STREET"
## [455] "THUNDERSTORM WIND 59 MPH." "HEAVY SNOW FREEZING RAIN"
## [457] "DAM FAILURE" "THUNDERSTORM HAIL"
## [459] "HAIL 088" "THUNDERSTORM WINDSHAIL"
## [461] "LIGHTNING WAUSEON" "THUDERSTORM WINDS"
## [463] "ICE AND SNOW" "RECORD COLD/FROST"
## [465] "STORM FORCE WINDS" "FREEZING RAIN AND SNOW"
## [467] "FREEZING RAIN SLEET AND" "SOUTHEAST"
## [469] "HEAVY SNOW & ICE" "FREEZING DRIZZLE AND FREEZING"
## [471] "THUNDERSTORM WINDS AND" "HAIL/ICY ROADS"
## [473] "FLASH FLOOD/HEAVY RAIN" "HEAVY RAIN; URBAN FLOOD WINDS;"
## [475] "HEAVY PRECIPITATION" "TSTM WIND DAMAGE"
## [477] "HIGH WATER" "FLOOD FLASH"
## [479] "RAIN/WIND" "THUNDERSTORM WINDS 50"
## [481] "THUNDERSTORM WIND G52" "FLOOD FLOOD/FLASH"
## [483] "THUNDERSTORM WINDS 52" "SNOW SHOWERS"
## [485] "THUNDERSTORM WIND G51" "HEAT WAVE DROUGHT"
## [487] "HEAVY SNOW/BLIZZARD/AVALANCHE" "RECORD SNOW/COLD"
## [489] "WET WEATHER" "UNSEASONABLY WARM AND DRY"
## [491] "FREEZING RAIN SLEET AND LIGHT" "RECORD/EXCESSIVE RAINFALL"
## [493] "TIDAL FLOOD" "BEACH EROSIN"
## [495] "THUNDERSTORM WIND G61" "FLOOD/FLASH"
## [497] "LOW TEMPERATURE" "SLEET & FREEZING RAIN"
## [499] "HEAVY RAINS/FLOODING" "THUNDERESTORM WINDS"
## [501] "THUNDERSTORM WINDS/FLOODING" "THUNDEERSTORM WINDS"
## [503] "HIGHWAY FLOODING" "THUNDERSTORM W INDS"
## [505] "HYPOTHERMIA" "FLASH FLOOD/ FLOOD"
## [507] "THUNDERSTORM WIND 50" "THUNERSTORM WINDS"
## [509] "HEAVY RAIN/MUDSLIDES/FLOOD" "MUD/ROCK SLIDE"
## [511] "HIGH WINDS/COLD" "BEACH EROSION/COASTAL FLOOD"
## [513] "COLD/WINDS" "SNOW/ BITTER COLD"
## [515] "THUNDERSTORM WIND 56" "SNOW SLEET"
## [517] "DRY HOT WEATHER" "COLD WEATHER"
## [519] "RAPIDLY RISING WATER" "HAIL ALOFT"
## [521] "EARLY FREEZE" "ICE/STRONG WINDS"
## [523] "EXTREME WIND CHILL/BLOWING SNO" "SNOW/HIGH WINDS"
## [525] "HIGH WINDS/SNOW" "EARLY FROST"
## [527] "SNOWMELT FLOODING" "HEAVY SNOW AND STRONG WINDS"
## [529] "SNOW ACCUMULATION" "BLOWING SNOW/EXTREME WIND CHIL"
## [531] "SNOW/ ICE" "SNOW/BLOWING SNOW"
## [533] "TORNADOES" "THUNDERSTORM WIND/HAIL"
## [535] "FLASH FLOODING/FLOOD" "HAIL 275"
## [537] "HAIL 450" "FLASH FLOOODING"
## [539] "EXCESSIVE RAINFALL" "THUNDERSTORMW"
## [541] "HAILSTORM" "TSTM WINDS"
## [543] "BEACH FLOOD" "HAILSTORMS"
## [545] "TSTMW" "FUNNELS"
## [547] "TSTM WIND 65)" "THUNDERSTORM WINDS/ FLOOD"
## [549] "HEAVY RAINFALL" "HEAT/DROUGHT"
## [551] "HEAT DROUGHT" "NEAR RECORD SNOW"
## [553] "LANDSLIDE" "HIGH WIND AND SEAS"
## [555] "THUNDERSTORMWINDS" "THUNDERSTORM WINDS HEAVY RAIN"
## [557] "SLEET/SNOW" "EXCESSIVE"
## [559] "SNOW/SLEET/RAIN" "WILD/FOREST FIRES"
## [561] "HEAVY SEAS" "DUSTSTORM"
## [563] "FLOOD & HEAVY RAIN" "?"
## [565] "THUNDERSTROM WIND" "FLOOD/FLASHFLOOD"
## [567] "SNOW AND COLD" "HOT PATTERN"
## [569] "PROLONG COLD/SNOW" "BRUSH FIRES"
## [571] "SNOW\\COLD" "WINTER MIX"
## [573] "EXCESSIVE PRECIPITATION" "SNOWFALL RECORD"
## [575] "HOT/DRY PATTERN" "DRY PATTERN"
## [577] "MILD/DRY PATTERN" "MILD PATTERN"
## [579] "LANDSLIDES" "HEAVY SHOWERS"
## [581] "HEAVY SNOW AND" "HIGH WIND 48"
## [583] "LAKE-EFFECT SNOW" "BRUSH FIRE"
## [585] "WATERSPOUT FUNNEL CLOUD" "URBAN SMALL STREAM FLOOD"
## [587] "SAHARAN DUST" "HEAVY SHOWER"
## [589] "URBAN FLOOD LANDSLIDE" "HEAVY SWELLS"
## [591] "URBAN SMALL" "URBAN FLOODS"
## [593] "SMALL STREAM" "HEAVY RAIN/URBAN FLOOD"
## [595] "FLASH FLOOD/LANDSLIDE" "LANDSLIDE/URBAN FLOOD"
## [597] "HEAVY RAIN/SMALL STREAM URBAN" "FLASH FLOOD LANDSLIDES"
## [599] "EXTREME WINDCHILL" "URBAN/SML STREAM FLD"
## [601] "TSTM WIND/HAIL" "Other"
## [603] "Record dry month" "Temperature record"
## [605] "Minor Flooding" "Ice jam flood (minor"
## [607] "High Wind" "Tstm Wind"
## [609] "ROUGH SURF" "Wind"
## [611] "Heavy Surf" "Dust Devil"
## [613] "Wind Damage" "Marine Accident"
## [615] "Snow" "Freeze"
## [617] "Snow Squalls" "Coastal Flooding"
## [619] "Heavy Rain" "Strong Wind"
## [621] "COASTAL STORM" "COASTALFLOOD"
## [623] "Erosion/Cstl Flood" "Heavy Rain and Wind"
## [625] "Light Snow/Flurries" "Wet Month"
## [627] "Wet Year" "Tidal Flooding"
## [629] "River Flooding" "Damaging Freeze"
## [631] "Beach Erosion" "Hot and Dry"
## [633] "Flood/Flash Flood" "Icy Roads"
## [635] "High Surf" "Heavy Rain/High Surf"
## [637] "Thunderstorm Wind" "Rain Damage"
## [639] "Unseasonable Cold" "Early Frost"
## [641] "Wintry Mix" "blowing snow"
## [643] "STREET FLOODING" "Record Cold"
## [645] "Extreme Cold" "Ice Fog"
## [647] "Excessive Cold" "Torrential Rainfall"
## [649] "Freezing Rain" "Landslump"
## [651] "Late-season Snowfall" "Hurricane Edouard"
## [653] "Coastal Storm" "Flood"
## [655] "HEAVY RAIN/WIND" "TIDAL FLOODING"
## [657] "Winter Weather" "Snow squalls"
## [659] "Strong Winds" "Strong winds"
## [661] "RECORD WARM TEMPS." "Ice/Snow"
## [663] "Mudslide" "Glaze"
## [665] "Extended Cold" "Snow Accumulation"
## [667] "Freezing Fog" "Drifting Snow"
## [669] "Whirlwind" "Heavy snow shower"
## [671] "Heavy rain" "LATE SNOW"
## [673] "Record May Snow" "Record Winter Snow"
## [675] "Heavy Precipitation" " COASTAL FLOOD"
## [677] "Record temperature" "Light snow"
## [679] "Late Season Snowfall" "Gusty Wind"
## [681] "small hail" "Light Snow"
## [683] "MIXED PRECIP" "Black Ice"
## [685] "Mudslides" "Gradient wind"
## [687] "Snow and Ice" "Freezing Spray"
## [689] "Summary Jan 17" "Summary of March 14"
## [691] "Summary of March 23" "Summary of March 24"
## [693] "Summary of April 3rd" "Summary of April 12"
## [695] "Summary of April 13" "Summary of April 21"
## [697] "Summary August 11" "Summary of April 27"
## [699] "Summary of May 9-10" "Summary of May 10"
## [701] "Summary of May 13" "Summary of May 14"
## [703] "Summary of May 22 am" "Summary of May 22 pm"
## [705] "Heatburst" "Summary of May 26 am"
## [707] "Summary of May 26 pm" "Metro Storm, May 26"
## [709] "Summary of May 31 am" "Summary of May 31 pm"
## [711] "Summary of June 3" "Summary of June 4"
## [713] "Summary June 5-6" "Summary June 6"
## [715] "Summary of June 11" "Summary of June 12"
## [717] "Summary of June 13" "Summary of June 15"
## [719] "Summary of June 16" "Summary June 18-19"
## [721] "Summary of June 23" "Summary of June 24"
## [723] "Summary of June 30" "Summary of July 2"
## [725] "Summary of July 3" "Summary of July 11"
## [727] "Summary of July 22" "Summary July 23-24"
## [729] "Summary of July 26" "Summary of July 29"
## [731] "Summary of August 1" "Summary August 2-3"
## [733] "Summary August 7" "Summary August 9"
## [735] "Summary August 10" "Summary August 17"
## [737] "Summary August 21" "Summary August 28"
## [739] "Summary September 4" "Summary September 20"
## [741] "Summary September 23" "Summary Sept. 25-26"
## [743] "Summary: Oct. 20-21" "Summary: October 31"
## [745] "Summary: Nov. 6-7" "Summary: Nov. 16"
## [747] "Microburst" "wet micoburst"
## [749] "Hail(0.75)" "Funnel Cloud"
## [751] "Urban Flooding" "No Severe Weather"
## [753] "Urban flood" "Urban Flood"
## [755] "Cold" "Summary of May 22"
## [757] "Summary of June 6" "Summary August 4"
## [759] "Summary of June 10" "Summary of June 18"
## [761] "Summary September 3" "Summary: Sept. 18"
## [763] "Coastal Flood" "coastal flooding"
## [765] "Small Hail" "Record Temperatures"
## [767] "Light Snowfall" "Freezing Drizzle"
## [769] "Gusty wind/rain" "GUSTY WIND/HVY RAIN"
## [771] "Blowing Snow" "Early snowfall"
## [773] "Monthly Snowfall" "Record Heat"
## [775] "Seasonal Snowfall" "Monthly Rainfall"
## [777] "Cold Temperature" "Sml Stream Fld"
## [779] "Heat Wave" "MUDSLIDE/LANDSLIDE"
## [781] "Saharan Dust" "Volcanic Ash"
## [783] "Volcanic Ash Plume" "Thundersnow shower"
## [785] "NONE" "COLD AND SNOW"
## [787] "DAM BREAK" "TSTM WIND (G45)"
## [789] "SLEET/FREEZING RAIN" "BLACK ICE"
## [791] "BLOW-OUT TIDES" "UNSEASONABLY COOL"
## [793] "TSTM HEAVY RAIN" "Gusty Winds"
## [795] "GUSTY WIND" "TSTM WIND 40"
## [797] "TSTM WIND 45" "TSTM WIND (41)"
## [799] "TSTM WIND (G40)" "TSTM WND"
## [801] "Wintry mix" " TSTM WIND"
## [803] "Frost" "Frost/Freeze"
## [805] "RAIN (HEAVY)" "Record Warmth"
## [807] "Prolong Cold" "Cold and Frost"
## [809] "URBAN/SML STREAM FLDG" "STRONG WIND GUST"
## [811] "LATE FREEZE" "BLOW-OUT TIDE"
## [813] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [815] "Lake Effect Snow" "Mixed Precipitation"
## [817] "Record High" "COASTALSTORM"
## [819] "Snow and sleet" "Freezing rain"
## [821] "Gusty winds" "Blizzard Summary"
## [823] "SUMMARY OF MARCH 24-25" "SUMMARY OF MARCH 27"
## [825] "SUMMARY OF MARCH 29" "GRADIENT WIND"
## [827] "Icestorm/Blizzard" "Flood/Strong Wind"
## [829] "TSTM WIND AND LIGHTNING" "gradient wind"
## [831] "Freezing drizzle" "Mountain Snows"
## [833] "URBAN/SMALL STRM FLDG" "Heavy surf and wind"
## [835] "Mild and Dry Pattern" "COLD AND FROST"
## [837] "TYPHOON" "HIGH SWELLS"
## [839] "HIGH SWELLS" "VOLCANIC ASH"
## [841] "DRY SPELL" " LIGHTNING"
## [843] "BEACH EROSION" "UNSEASONAL RAIN"
## [845] "EARLY RAIN" "PROLONGED RAIN"
## [847] "WINTERY MIX" "COASTAL FLOODING/EROSION"
## [849] "HOT SPELL" "UNSEASONABLY HOT"
## [851] " TSTM WIND (G45)" "TSTM WIND (G45)"
## [853] "HIGH WIND (G40)" "TSTM WIND (G35)"
## [855] "DRY WEATHER" "ABNORMAL WARMTH"
## [857] "UNUSUAL WARMTH" "WAKE LOW WIND"
## [859] "MONTHLY RAINFALL" "COLD TEMPERATURES"
## [861] "COLD WIND CHILL TEMPERATURES" "MODERATE SNOW"
## [863] "MODERATE SNOWFALL" "URBAN/STREET FLOODING"
## [865] "COASTAL EROSION" "UNUSUAL/RECORD WARMTH"
## [867] "BITTER WIND CHILL" "BITTER WIND CHILL TEMPERATURES"
## [869] "SEICHE" "TSTM"
## [871] "COASTAL FLOODING/EROSION" "UNSEASONABLY WARM YEAR"
## [873] "HYPERTHERMIA/EXPOSURE" "ROCK SLIDE"
## [875] "ICE PELLETS" "PATCHY DENSE FOG"
## [877] "RECORD COOL" "RECORD WARM"
## [879] "HOT WEATHER" "RECORD TEMPERATURE"
## [881] "TROPICAL DEPRESSION" "VOLCANIC ERUPTION"
## [883] "COOL SPELL" "WIND ADVISORY"
## [885] "GUSTY WIND/HAIL" "RED FLAG FIRE WX"
## [887] "FIRST FROST" "EXCESSIVELY DRY"
## [889] "SNOW AND SLEET" "LIGHT SNOW/FREEZING PRECIP"
## [891] "VOG" "MONTHLY PRECIPITATION"
## [893] "MONTHLY TEMPERATURE" "RECORD DRYNESS"
## [895] "EXTREME WINDCHILL TEMPERATURES" "MIXED PRECIPITATION"
## [897] "DRY CONDITIONS" "REMNANTS OF FLOYD"
## [899] "EARLY SNOWFALL" "FREEZING FOG"
## [901] "LANDSPOUT" "DRIEST MONTH"
## [903] "RECORD COLD" "LATE SEASON HAIL"
## [905] "EXCESSIVE SNOW" "DRYNESS"
## [907] "FLOOD/FLASH/FLOOD" "WIND AND WAVE"
## [909] "LIGHT FREEZING RAIN" " WIND"
## [911] "MONTHLY SNOWFALL" "RECORD PRECIPITATION"
## [913] "ICE ROADS" "ROUGH SEAS"
## [915] "UNSEASONABLY WARM/WET" "UNSEASONABLY COOL & WET"
## [917] "UNUSUALLY WARM" "TSTM WIND G45"
## [919] "NON SEVERE HAIL" "NON-SEVERE WIND DAMAGE"
## [921] "UNUSUALLY COLD" "WARM WEATHER"
## [923] "LANDSLUMP" "THUNDERSTORM WIND (G40)"
## [925] "UNSEASONABLY WARM & WET" " FLASH FLOOD"
## [927] "LOCALLY HEAVY RAIN" "WIND GUSTS"
## [929] "UNSEASONAL LOW TEMP" "HIGH SURF ADVISORY"
## [931] "LATE SEASON SNOW" "GUSTY LAKE WIND"
## [933] "ABNORMALLY DRY" "WINTER WEATHER MIX"
## [935] "RED FLAG CRITERIA" "WND"
## [937] "CSTL FLOODING/EROSION" "SMOKE"
## [939] " WATERSPOUT" "SNOW ADVISORY"
## [941] "EXTREMELY WET" "UNUSUALLY LATE SNOW"
## [943] "VERY DRY" "RECORD LOW RAINFALL"
## [945] "ROGUE WAVE" "PROLONG WARMTH"
## [947] "ACCUMULATED SNOWFALL" "FALLING SNOW/ICE"
## [949] "DUST DEVEL" "NON-TSTM WIND"
## [951] "NON TSTM WIND" "GUSTY THUNDERSTORM WINDS"
## [953] "PATCHY ICE" "HEAVY RAIN EFFECTS"
## [955] "EXCESSIVE HEAT/DROUGHT" "NORTHERN LIGHTS"
## [957] "MARINE TSTM WIND" " HIGH SURF ADVISORY"
## [959] "HAZARDOUS SURF" "FROST/FREEZE"
## [961] "WINTER WEATHER/MIX" "ASTRONOMICAL HIGH TIDE"
## [963] "WHIRLWIND" "VERY WARM"
## [965] "ABNORMALLY WET" "TORNADO DEBRIS"
## [967] "EXTREME COLD/WIND CHILL" "ICE ON ROAD"
## [969] "DROWNING" "GUSTY THUNDERSTORM WIND"
## [971] "MARINE HAIL" "HIGH SURF ADVISORIES"
## [973] "HURRICANE/TYPHOON" "HEAVY SURF/HIGH SURF"
## [975] "SLEET STORM" "STORM SURGE/TIDE"
## [977] "COLD/WIND CHILL" "MARINE HIGH WIND"
## [979] "TSUNAMI" "DENSE SMOKE"
## [981] "LAKESHORE FLOOD" "MARINE THUNDERSTORM WIND"
## [983] "MARINE STRONG WIND" "ASTRONOMICAL LOW TIDE"
## [985] "VOLCANIC ASHFALL"
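Many of the entries in the listing above differ only in capitalization or leading whitespace (for example "TSTM WIND", " TSTM WIND", and "Tstm Wind"). A minimal sketch of normalizing those before any grouping is shown here; the `evtype` vector is a toy sample built from spellings in the listing, and `normalise()` is a hypothetical helper, not part of the original analysis.

```r
# Toy vector using a few spellings that appear in the listing above.
evtype <- c("TSTM WIND", " TSTM WIND", "Tstm Wind",
            "HIGH SURF", "High Surf", "Heavy Surf")

# Trim surrounding whitespace, collapse runs of internal whitespace
# to a single space, and fold everything to upper case.
normalise <- function(x) {
  toupper(gsub("[[:space:]]+", " ", trimws(x)))
}

length(unique(evtype))             # 6 distinct raw spellings
length(unique(normalise(evtype)))  # 3 distinct values after normalizing
```

This only merges values that are identical up to case and spacing. Typos such as "AVALANCE" still need pattern matching or a manual mapping.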
##Let's group all of the event types that are exact copies.
eventhealth <- eventhealth %>% group_by(data.EVTYPE) %>% summarise_all(list(sum))
head(eventhealth)
## # A tibble: 6 × 3
## data.EVTYPE data.FATALITIES data.INJURIES
## <chr> <dbl> <dbl>
## 1 " HIGH SURF ADVISORY" 0 0
## 2 " COASTAL FLOOD" 0 0
## 3 " FLASH FLOOD" 0 0
## 4 " LIGHTNING" 0 0
## 5 " TSTM WIND" 0 0
## 6 " TSTM WIND (G45)" 0 0
##Now we should get rid of any event types that have zeroes in both FATALITIES and INJURIES.
nozeroes <- subset(eventhealth, eventhealth$data.FATALITIES > 0 | eventhealth$data.INJURIES > 0)
head(nozeroes)
## # A tibble: 6 × 3
## data.EVTYPE data.FATALITIES data.INJURIES
## <chr> <dbl> <dbl>
## 1 AVALANCE 1 0
## 2 AVALANCHE 224 170
## 3 BLACK ICE 1 24
## 4 BLIZZARD 101 805
## 5 blowing snow 1 1
## 6 BLOWING SNOW 1 13
##Here we make a column that totals the fatalities and injuries
nozeroes$total <- nozeroes$data.FATALITIES + nozeroes$data.INJURIES
head(nozeroes)
## # A tibble: 6 × 4
## data.EVTYPE data.FATALITIES data.INJURIES total
## <chr> <dbl> <dbl> <dbl>
## 1 AVALANCE 1 0 1
## 2 AVALANCHE 224 170 394
## 3 BLACK ICE 1 24 25
## 4 BLIZZARD 101 805 906
## 5 blowing snow 1 1 2
## 6 BLOWING SNOW 1 13 14
##Now we need to match similar/identical-but-typo'd event types. Let's see what we've got.
##For some reason, it would only work if I aggregated after each grep. It works, but it's ugly and clunky.
##If anyone has advice for the peer review comments, it would be greatly appreciated!
unique(nozeroes$data.EVTYPE)
## [1] "AVALANCE" "AVALANCHE"
## [3] "BLACK ICE" "BLIZZARD"
## [5] "blowing snow" "BLOWING SNOW"
## [7] "BRUSH FIRE" "COASTAL FLOOD"
## [9] "Coastal Flooding" "COASTAL FLOODING"
## [11] "COASTAL FLOODING/EROSION" "Coastal Storm"
## [13] "COASTAL STORM" "COASTALSTORM"
## [15] "Cold" "COLD"
## [17] "COLD AND SNOW" "Cold Temperature"
## [19] "COLD WAVE" "COLD WEATHER"
## [21] "COLD/WIND CHILL" "COLD/WINDS"
## [23] "DENSE FOG" "DROUGHT"
## [25] "DROUGHT/EXCESSIVE HEAT" "DROWNING"
## [27] "DRY MICROBURST" "DRY MIRCOBURST WINDS"
## [29] "Dust Devil" "DUST DEVIL"
## [31] "DUST STORM" "EXCESSIVE HEAT"
## [33] "EXCESSIVE RAINFALL" "EXCESSIVE SNOW"
## [35] "Extended Cold" "Extreme Cold"
## [37] "EXTREME COLD" "EXTREME COLD/WIND CHILL"
## [39] "EXTREME HEAT" "EXTREME WINDCHILL"
## [41] "FALLING SNOW/ICE" "FLASH FLOOD"
## [43] "FLASH FLOOD/FLOOD" "FLASH FLOODING"
## [45] "FLASH FLOODING/FLOOD" "FLASH FLOODS"
## [47] "FLOOD" "FLOOD & HEAVY RAIN"
## [49] "FLOOD/FLASH FLOOD" "FLOOD/RIVER FLOOD"
## [51] "FLOODING" "FOG"
## [53] "FOG AND COLD TEMPERATURES" "FREEZE"
## [55] "FREEZING DRIZZLE" "FREEZING RAIN"
## [57] "FREEZING RAIN/SNOW" "Freezing Spray"
## [59] "FROST" "FUNNEL CLOUD"
## [61] "GLAZE" "GLAZE/ICE STORM"
## [63] "GUSTY WIND" "Gusty winds"
## [65] "Gusty Winds" "GUSTY WINDS"
## [67] "HAIL" "HAZARDOUS SURF"
## [69] "HEAT" "Heat Wave"
## [71] "HEAT WAVE" "HEAT WAVE DROUGHT"
## [73] "HEAT WAVES" "HEAVY RAIN"
## [75] "HEAVY RAINS" "HEAVY SEAS"
## [77] "HEAVY SNOW" "HEAVY SNOW AND HIGH WINDS"
## [79] "Heavy snow shower" "HEAVY SNOW/BLIZZARD/AVALANCHE"
## [81] "HEAVY SNOW/ICE" "Heavy Surf"
## [83] "HEAVY SURF" "Heavy surf and wind"
## [85] "HEAVY SURF/HIGH SURF" "HIGH"
## [87] "HIGH SEAS" "High Surf"
## [89] "HIGH SURF" "HIGH SWELLS"
## [91] "HIGH WATER" "HIGH WAVES"
## [93] "HIGH WIND" "HIGH WIND 48"
## [95] "HIGH WIND AND SEAS" "HIGH WIND/HEAVY SNOW"
## [97] "HIGH WIND/SEAS" "HIGH WINDS"
## [99] "HIGH WINDS/COLD" "HIGH WINDS/SNOW"
## [101] "HURRICANE" "Hurricane Edouard"
## [103] "HURRICANE EMILY" "HURRICANE ERIN"
## [105] "HURRICANE FELIX" "HURRICANE OPAL"
## [107] "HURRICANE OPAL/HIGH WINDS" "HURRICANE-GENERATED SWELLS"
## [109] "HURRICANE/TYPHOON" "HYPERTHERMIA/EXPOSURE"
## [111] "HYPOTHERMIA" "Hypothermia/Exposure"
## [113] "HYPOTHERMIA/EXPOSURE" "ICE"
## [115] "ICE ON ROAD" "ICE ROADS"
## [117] "ICE STORM" "ICE STORM/FLASH FLOOD"
## [119] "ICY ROADS" "LANDSLIDE"
## [121] "LANDSLIDES" "LIGHT SNOW"
## [123] "LIGHTNING" "LIGHTNING AND THUNDERSTORM WIN"
## [125] "LIGHTNING INJURY" "LIGHTNING."
## [127] "LOW TEMPERATURE" "Marine Accident"
## [129] "MARINE HIGH WIND" "MARINE MISHAP"
## [131] "MARINE STRONG WIND" "MARINE THUNDERSTORM WIND"
## [133] "MARINE TSTM WIND" "MINOR FLOODING"
## [135] "MIXED PRECIP" "Mudslide"
## [137] "Mudslides" "NON TSTM WIND"
## [139] "NON-SEVERE WIND DAMAGE" "OTHER"
## [141] "RAIN/SNOW" "RAIN/WIND"
## [143] "RAPIDLY RISING WATER" "RECORD COLD"
## [145] "RECORD HEAT" "RECORD/EXCESSIVE HEAT"
## [147] "RIP CURRENT" "RIP CURRENTS"
## [149] "RIP CURRENTS/HEAVY SURF" "RIVER FLOOD"
## [151] "River Flooding" "RIVER FLOODING"
## [153] "ROGUE WAVE" "ROUGH SEAS"
## [155] "ROUGH SURF" "SLEET"
## [157] "SMALL HAIL" "Snow"
## [159] "SNOW" "SNOW AND ICE"
## [161] "SNOW SQUALL" "Snow Squalls"
## [163] "SNOW/ BITTER COLD" "SNOW/HIGH WINDS"
## [165] "STORM SURGE" "STORM SURGE/TIDE"
## [167] "STRONG WIND" "Strong Winds"
## [169] "STRONG WINDS" "THUNDERSNOW"
## [171] "THUNDERSTORM" "THUNDERSTORM WINDS"
## [173] "THUNDERSTORM WIND" "THUNDERSTORM WIND (G40)"
## [175] "THUNDERSTORM WIND G52" "THUNDERSTORM WINDS"
## [177] "THUNDERSTORM WINDS 13" "THUNDERSTORM WINDS/HAIL"
## [179] "THUNDERSTORM WINDSS" "THUNDERSTORMS WINDS"
## [181] "THUNDERSTORMW" "THUNDERTORM WINDS"
## [183] "TIDAL FLOODING" "TORNADO"
## [185] "TORNADO F2" "TORNADO F3"
## [187] "TORNADOES, TSTM WIND, HAIL" "Torrential Rainfall"
## [189] "TROPICAL STORM" "TROPICAL STORM GORDON"
## [191] "TSTM WIND" "TSTM WIND (G35)"
## [193] "TSTM WIND (G40)" "TSTM WIND (G45)"
## [195] "TSTM WIND/HAIL" "TSUNAMI"
## [197] "TYPHOON" "UNSEASONABLY COLD"
## [199] "UNSEASONABLY WARM" "UNSEASONABLY WARM AND DRY"
## [201] "URBAN AND SMALL STREAM FLOODIN" "URBAN/SML STREAM FLD"
## [203] "WARM WEATHER" "WATERSPOUT"
## [205] "WATERSPOUT TORNADO" "WATERSPOUT/TORNADO"
## [207] "Whirlwind" "WILD FIRES"
## [209] "WILD/FOREST FIRE" "WILDFIRE"
## [211] "WIND" "WIND STORM"
## [213] "WINDS" "WINTER STORM"
## [215] "WINTER STORM HIGH WINDS" "WINTER STORMS"
## [217] "WINTER WEATHER" "WINTER WEATHER MIX"
## [219] "WINTER WEATHER/MIX" "WINTRY MIX"
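A likely reason the recoding seemed to need an aggregate step after each grep is that `grep()` is case-sensitive by default, while the event types listed above are mostly upper case. The lower-case patterns can only match once the values have been lowercased. A small sketch on a made-up vector `x`:

```r
x <- c("THUNDERSTORM WIND", "Thunderstorm Wind", "TORNADO")

grep("thunder", x)                      # integer(0): the lower-case pattern matches nothing
grep("thunder", x, ignore.case = TRUE)  # 1 2
grep("thunder", tolower(x))             # 1 2
```

Either lowercasing the column once up front or passing `ignore.case = TRUE` should make the patterns match without relying on a later step to change the case.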
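##grep() is case-sensitive and EVTYPE is not lowercased until the aggregate() call below, so ignore case here.
nozeroes$data.EVTYPE[grep("thunder", nozeroes$data.EVTYPE, ignore.case = TRUE)] <- "thunderstorm"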
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("coastal fl", nozeroes$data.EVTYPE)] <- "coastal flood"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("cold", nozeroes$data.EVTYPE)] <- "cold"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("heat", nozeroes$data.EVTYPE)] <- "heat"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("wint", nozeroes$data.EVTYPE)] <- "winter weather"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("fire", nozeroes$data.EVTYPE)] <- "wild fire"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("wind", nozeroes$data.EVTYPE)] <- "wind"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("watersp", nozeroes$data.EVTYPE)] <- "waterspout"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("unseas", nozeroes$data.EVTYPE)] <- "heat"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("tornado", nozeroes$data.EVTYPE)] <- "tornado"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("snow", nozeroes$data.EVTYPE)] <- "snow"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("rip", nozeroes$data.EVTYPE)] <- "rip current"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("ice", nozeroes$data.EVTYPE)] <- "ice hazard"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("hurricane", nozeroes$data.EVTYPE)] <- "hurricane"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("aval", nozeroes$data.EVTYPE)] <- "avalanche"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
unique(nozeroes$data.EVTYPE)
## [1] "avalanche" "blizzard"
## [3] "coastal flood" "coastal storm"
## [5] "coastalstorm" "cold"
## [7] "dense fog" "drought"
## [9] "drowning" "dry microburst"
## [11] "dust devil" "dust storm"
## [13] "excessive rainfall" "flash flood"
## [15] "flash flood/flood" "flash flooding"
## [17] "flash flooding/flood" "flash floods"
## [19] "flood" "flood & heavy rain"
## [21] "flood/flash flood" "flood/river flood"
## [23] "flooding" "fog"
## [25] "freeze" "freezing drizzle"
## [27] "freezing rain" "freezing spray"
## [29] "frost" "funnel cloud"
## [31] "glaze" "hail"
## [33] "hazardous surf" "heat"
## [35] "heavy rain" "heavy rains"
## [37] "heavy seas" "heavy surf"
## [39] "heavy surf/high surf" "high"
## [41] "high seas" "high surf"
## [43] "high swells" "high water"
## [45] "high waves" "hurricane"
## [47] "hyperthermia/exposure" "hypothermia"
## [49] "hypothermia/exposure" "ice hazard"
## [51] "icy roads" "landslide"
## [53] "landslides" "lightning"
## [55] "lightning and thunderstorm win" "lightning injury"
## [57] "lightning." "low temperature"
## [59] "marine accident" "marine mishap"
## [61] "minor flooding" "mixed precip"
## [63] "mudslide" "mudslides"
## [65] "other" "rapidly rising water"
## [67] "rip current" "river flood"
## [69] "river flooding" "rogue wave"
## [71] "rough seas" "rough surf"
## [73] "sleet" "small hail"
## [75] "snow" "storm surge"
## [77] "storm surge/tide" "thunderstorm"
## [79] "thunderstormw" "tidal flooding"
## [81] "tornado" "torrential rainfall"
## [83] "tropical storm" "tropical storm gordon"
## [85] "tsunami" "typhoon"
## [87] "urban and small stream floodin" "urban/sml stream fld"
## [89] "warm weather" "waterspout"
## [91] "wild fire" "wind"
## [93] "winter weather"
##A few more
nozeroes$data.EVTYPE[grep("flood", nozeroes$data.EVTYPE)] <- "flood"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("heavy seas", nozeroes$data.EVTYPE)] <- "rough seas"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("landsl", nozeroes$data.EVTYPE)] <- "landslide"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("flash", nozeroes$data.EVTYPE)] <- "flood"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("high seas", nozeroes$data.EVTYPE)] <- "rough seas"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("high water", nozeroes$data.EVTYPE)] <- "rough seas"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("river flood", nozeroes$data.EVTYPE)] <- "flood"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("warm weath", nozeroes$data.EVTYPE)] <- "heat"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("dust", nozeroes$data.EVTYPE)] <- "dust storm"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("hazardous surf", nozeroes$data.EVTYPE)] <- "rough seas"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("heavy surf", nozeroes$data.EVTYPE)] <- "rough seas"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("high surf", nozeroes$data.EVTYPE)] <- "rough seas"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("high waves", nozeroes$data.EVTYPE)] <- "rough seas"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("hypothe", nozeroes$data.EVTYPE)] <- "hypothermia"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("lightning", nozeroes$data.EVTYPE)] <- "lightning"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("tropical", nozeroes$data.EVTYPE)] <- "tropical storm"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("urban", nozeroes$data.EVTYPE)] <- "flood"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("tidal flood", nozeroes$data.EVTYPE)] <- "flood"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("minor flood", nozeroes$data.EVTYPE)] <- "flood"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
nozeroes$data.EVTYPE[grep("torrential rain", nozeroes$data.EVTYPE)] <- "excessive rainfall"
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
##Aggregate that and check again to see if everything looks like it's in a good category
nozeroes <- aggregate(. ~ data.EVTYPE, transform(nozeroes, data.EVTYPE = tolower(data.EVTYPE)), sum)
unique(nozeroes$data.EVTYPE)
## [1] "avalanche" "blizzard" "coastal storm"
## [4] "coastalstorm" "cold" "dense fog"
## [7] "drought" "drowning" "dry microburst"
## [10] "dust storm" "excessive rainfall" "flood"
## [13] "fog" "freeze" "freezing drizzle"
## [16] "freezing rain" "freezing spray" "frost"
## [19] "funnel cloud" "glaze" "hail"
## [22] "heat" "heavy rain" "heavy rains"
## [25] "high" "high swells" "hurricane"
## [28] "hyperthermia/exposure" "hypothermia" "ice hazard"
## [31] "icy roads" "landslide" "lightning"
## [34] "low temperature" "marine accident" "marine mishap"
## [37] "mixed precip" "mudslide" "mudslides"
## [40] "other" "rapidly rising water" "rip current"
## [43] "rogue wave" "rough seas" "rough surf"
## [46] "sleet" "small hail" "snow"
## [49] "storm surge" "storm surge/tide" "thunderstorm"
## [52] "thunderstormw" "tornado" "tropical storm"
## [55] "tsunami" "typhoon" "waterspout"
## [58] "wild fire" "wind" "winter weather"
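The recode-then-aggregate steps above repeat the same two lines for every pattern. As a minimal sketch of the same idea, the patterns can be kept in one lookup vector and the aggregation run once at the end. The toy data frame, the `recode_map` vector, and the `recode_events` helper are hypothetical names made up for illustration; they are not part of the original analysis.

```r
# Toy data standing in for nozeroes (values are made up for illustration)
toy <- data.frame(
  data.EVTYPE = c("TORNADO", "Tornado F2", "RIP CURRENTS", "rip current", "HAIL"),
  total = c(10, 5, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: literal pattern -> category label, applied in order
recode_map <- c("tornado" = "tornado", "rip" = "rip current")

# Lower-case first, then overwrite every entry that contains a pattern
recode_events <- function(x, map) {
  x <- tolower(x)
  for (pattern in names(map)) {
    x[grepl(pattern, x, fixed = TRUE)] <- map[[pattern]]
  }
  x
}

toy$data.EVTYPE <- recode_events(toy$data.EVTYPE, recode_map)

# A single aggregate at the end sums the rows that now share a label
toy <- aggregate(total ~ data.EVTYPE, data = toy, FUN = sum)
toy
```

Lower-casing before matching means every pattern sees the same case, and because the patterns are applied in order, a later pattern can still overwrite an earlier one, just as in the step-by-step version.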
Next, the upper quartile of the data is selected to answer our question of which events are the most hazardous to public safety.
##Looks good to me. Now we want the 75% quantile, because the events at or above it are the most dangerous ones.
quantile(nozeroes$total)
## 0% 25% 50% 75% 100%
## 1.0 4.0 27.5 573.0 96997.0
##We'll use 783.5 as the cutoff. (Note that the quantile output above lists 573.0 at 75%.)
upper75 <- subset(nozeroes, nozeroes$total >= 783.5)
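The cutoff above is typed in by hand. A small sketch of reading it straight from `quantile()` instead, so that it always matches the data, might look like this. The `totals` data frame and its values are invented for illustration only.

```r
# Toy totals standing in for nozeroes$total (values are made up)
totals <- data.frame(
  event = c("a", "b", "c", "d", "e"),
  total = c(1, 4, 30, 600, 97000),
  stringsAsFactors = FALSE
)

# Ask for the 75% quantile only; names = FALSE returns a plain number
cutoff <- quantile(totals$total, probs = 0.75, names = FALSE)

# Keep the events at or above the cutoff
upper <- subset(totals, total >= cutoff)
upper
```

With this approach the threshold cannot drift out of sync if the cleaning steps are changed and the totals are recomputed.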
If we take the maximum of our upper quartile, we see that tornadoes are the most dangerous event. Next we will make a plot that shows where tornadoes stand along with the other events in this upper quartile.
##So, what's the most dangerous event? This line returns tornado!
upper75[which.max(upper75$total),]
## data.EVTYPE data.FATALITIES data.INJURIES total
## 53 tornado 5633 91364 96997
##We're going to make a plot, so let's prepare our variables to be plotted.
trial1 <- as.numeric(as.factor(upper75$data.EVTYPE))
trial2 <- as.data.frame(trial1)
trial2$total <- upper75$total
Here is the code for making a simple line plot. Tornadoes appear to be quite dangerous indeed for public health.
##Calling quantile again to see where I want my ticks to be for the y axis.
quantile(upper75$total)
## 0% 25% 50% 75% 100%
## 796.00 1298.00 1972.50 9187.75 96997.00
marks <- c(0, 20000, 40000, 60000, 80000, 90000, 100000)
plot(trial2$trial1, trial2$total, type="l", col="red", lwd=3, xlab="", ylab="Total", main="Most dangerous weather events", xaxt = "n", yaxt = "n")
axis(1, xaxp = c(1, 15, 15), at = c(1:15), labels = c("Blizzard", "Flood", "Fog", "Hail", "Heat", "Hurricane", "Ice hazard", "Lightning","Rip current", "Snow", "Thunderstorm", "Tornado", "Wildfire", "Wind", "Winter Weather"), cex.axis = 0.7, las = 2)
axis(2,at=marks,labels=marks)
##Tornadoes sure do blow it out of the ballpark!
This plot shows the events along the x-axis and the number of combined injuries and fatalities along the y-axis.
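Because the x-axis here is a set of categories rather than a continuous variable, a bar plot is one alternative to the line plot. The sketch below takes the labels from the data itself instead of typing them into `axis()`, which avoids any mismatch between labels and values. The `toy` data frame and its numbers are invented for illustration and are not results from the analysis.

```r
# Toy totals (made-up values) standing in for upper75
toy <- data.frame(
  event = c("heat", "tornado", "flood"),
  total = c(120, 970, 100),
  stringsAsFactors = FALSE
)

# Sort so the tallest bar comes first
toy <- toy[order(toy$total, decreasing = TRUE), ]

# names.arg pulls the labels from the data; las = 2 turns them vertical
mids <- barplot(
  toy$total,
  names.arg = toy$event,
  las = 2,
  cex.names = 0.7,
  col = "red",
  ylab = "Total",
  main = "Most dangerous weather events (toy data)"
)
```

`barplot()` returns the bar midpoints, which can be handy if text or points need to be added on top of the bars afterwards.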
In this section, we will now assess which weather events are the most economically expensive. First, we will load the necessary packages, look at the data, and narrow it down to the columns we need.
library(dplyr)
library(data.table)
library(purrr)
##
## Attaching package: 'purrr'
## The following object is masked from 'package:data.table':
##
## transpose
##Let's look and see what variables we'll need.
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
##Narrow down the dataset to the variables we want
econdata <- data.frame(data$EVTYPE, data$PROPDMG, data$PROPDMGEXP, data$CROPDMG, data$CROPDMGEXP)
Next, we see that the dollar value for each event is represented as a decimal number followed by an H, K, M, or B (for hundreds, thousands, millions, and billions). The first thing we're going to do about this is make these values uniform by capitalizing them and removing rows that contain nonsensical inputs in these columns (+, ?, 2, among others). This will remove a large portion of the dataset, but unfortunately these values cannot be worked with if they cannot be interpreted.
##Only H, K, M, and B should appear in these columns,
##but we see that there are lots of weird typo values!
unique(econdata$data.PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(econdata$data.CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
##Let's remove the nonsensical entries.
condensedata <- subset(econdata, !(data.PROPDMGEXP %in% c("+", "", "0","6", "5", "?", "4", "2", "3", "7", "-", "1", "8")))
##Filter the already-filtered data (not econdata again), so that the first filter is not thrown away.
condensedata <- subset(condensedata, !(data.CROPDMGEXP %in% c("", "?", "0", "2")))
##This took a significant amount of data out.
##But each damage value has to be interpretable for the analysis to make sense.
##Make the capitalizations uniform.
condensedata$data.PROPDMGEXP <- toupper(condensedata$data.PROPDMGEXP)
condensedata$data.CROPDMGEXP <- toupper(condensedata$data.CROPDMGEXP)
condensedata$data.EVTYPE <- tolower(condensedata$data.EVTYPE)
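The filtering above lists the values to throw away. A sketch of the opposite approach, keeping only the four valid letters and measuring how much data that removes, is shown here. The `toy` data frame, its column names, and its values are made up for illustration.

```r
# Toy exponent columns (made-up values) standing in for PROPDMGEXP / CROPDMGEXP
toy <- data.frame(
  prop_exp = c("K", "m", "", "+", "B", "K"),
  crop_exp = c("K", "M", "K", "", "?", "k"),
  stringsAsFactors = FALSE
)

# The only codes that can be interpreted as multipliers
valid <- c("H", "K", "M", "B")

# A row is kept only if BOTH exponents are valid (case-insensitive)
keep <- toupper(toy$prop_exp) %in% valid & toupper(toy$crop_exp) %in% valid
kept <- toy[keep, ]

# Share of rows removed by the filter
share_removed <- mean(!keep)
share_removed
```

A whitelist like this also catches unexpected codes that were never written into the removal list.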
This next section will organize the event types by giving similar and identical events a uniform name, so that they can be aggregated later.
##Let's check out how many different events there are, and how many typos we have.
unique(condensedata$data.EVTYPE)
## [1] "hurricane opal/high winds" "thunderstorm winds"
## [3] "hurricane erin" "hurricane opal"
## [5] "tornado" "thunderstorm winds/hail"
## [7] "flash flood" "flash flooding"
## [9] "thunderstorm winds hail" "hail"
## [11] "flooding" "heavy rain"
## [13] "heat" "high winds heavy rains"
## [15] "flood" "thunderstorm wind"
## [17] "river flood" "winter storm"
## [19] "high winds" "winter storm high winds"
## [21] "winter storms" "severe thunderstorms"
## [23] "severe thunderstorm winds" "flood/rain/winds"
## [25] "thunderstorms" "winds"
## [27] "flood/flash flood" "severe thunderstorm"
## [29] "extreme cold" "lightning"
## [31] "heavy snow" "tornadoes, tstm wind, hail"
## [33] "tropical storm" "tropical storm gordon"
## [35] "tropical storm jerry" "tornado f0"
## [37] "blizzard" "damaging freeze"
## [39] "frost" "cool and wet"
## [41] "glaze ice" "heat wave"
## [43] "drought" "cold and wet conditions"
## [45] "excessive wetness" "urban flood"
## [47] "urban flooding" "thunderstorms winds"
## [49] "gustnado" "freeze"
## [51] "cold air tornado" "wind damage"
## [53] "high wind" "snow"
## [55] "small stream flood" "hail/winds"
## [57] "hail/wind" "drought/excessive heat"
## [59] "heavy rains" "tstm wind"
## [61] "thunderstorm winds lightning" "ice storm"
## [63] "dust storm/high winds" "ice jam flooding"
## [65] "forest fires" "hvy rain"
## [67] "hail 150" "hail 075"
## [69] "hail 100" "hail 125"
## [71] "thunderstorm wind g60" "thunderstorm winds g60"
## [73] "hard freeze" "hail 200"
## [75] "heavy snow/high winds & flood" "floods"
## [77] "hurricane felix" "thunderstorm"
## [79] "thunderstorm wind." "strong winds"
## [81] "thunderstorm hail" "thuderstorm winds"
## [83] "extreme heat" "heat wave drought"
## [85] "coastal flooding" "heavy rains/flooding"
## [87] "high winds/cold" "river flooding"
## [89] "wild/forest fire" "tornadoes"
## [91] "thunderstorms wind" "flash flood/flood"
## [93] "flash flooding/flood" "thunderstorm windss"
## [95] "tropical storm dean" "thunderstorm winds/ flood"
## [97] "wild/forest fires" "rain"
## [99] "wildfires" "hurricane"
## [101] "funnel cloud" "tstm wind/hail"
## [103] "excessive heat" "urban/sml stream fld"
## [105] "waterspout" "heavy rain/high surf"
## [107] "unseasonable cold" "early frost"
## [109] "storm surge" "frost/freeze"
## [111] "agricultural freeze" "other"
## [113] "unseasonably cold" "typhoon"
## [115] "small hail" "unseasonal rain"
## [117] "gusty winds" "unseasonably warm"
## [119] "icy roads" "fog"
## [121] "dust storm" "gusty wind"
## [123] "dry microburst" "landslide"
## [125] "wind" "strong wind"
## [127] "extreme cold/wind chill" "wildfire"
## [129] "extreme windchill" "hurricane/typhoon"
## [131] "heavy surf/high surf" "tropical depression"
## [133] "rip current" "astronomical high tide"
## [135] "coastal flood" "dense fog"
## [137] "winter weather" "high surf"
## [139] "cold/wind chill" "tsunami"
## [141] "avalanche" "sleet"
## [143] "lake-effect snow" "storm surge/tide"
## [145] "freezing fog" "seiche"
## [147] "lakeshore flood" "marine thunderstorm wind"
## [149] "marine hail" "marine strong wind"
## [151] "dust devil" "astronomical low tide"
## [153] "dense smoke" "marine high wind"
## [155] "volcanic ashfall"
##Take care of some of the many typos...
condensedata$data.EVTYPE[grep("thunder", condensedata$data.EVTYPE)] <- "thunderstorm"
condensedata$data.EVTYPE[grep("coastal fl", condensedata$data.EVTYPE)] <- "coastal flood"
condensedata$data.EVTYPE[grep("cold", condensedata$data.EVTYPE)] <- "cold"
condensedata$data.EVTYPE[grep("heat", condensedata$data.EVTYPE)] <- "heat"
condensedata$data.EVTYPE[grep("wint", condensedata$data.EVTYPE)] <- "winter weather"
condensedata$data.EVTYPE[grep("fire", condensedata$data.EVTYPE)] <- "wild fire"
condensedata$data.EVTYPE[grep("wind", condensedata$data.EVTYPE)] <- "wind"
condensedata$data.EVTYPE[grep("watersp", condensedata$data.EVTYPE)] <- "waterspout"
condensedata$data.EVTYPE[grep("unseas", condensedata$data.EVTYPE)] <- "heat"
condensedata$data.EVTYPE[grep("tornado", condensedata$data.EVTYPE)] <- "tornado"
condensedata$data.EVTYPE[grep("snow", condensedata$data.EVTYPE)] <- "snow"
condensedata$data.EVTYPE[grep("rip", condensedata$data.EVTYPE)] <- "rip current"
condensedata$data.EVTYPE[grep("ice", condensedata$data.EVTYPE)] <- "ice hazard"
condensedata$data.EVTYPE[grep("hurricane", condensedata$data.EVTYPE)] <- "hurricane"
condensedata$data.EVTYPE[grep("aval", condensedata$data.EVTYPE)] <- "avalanche"
condensedata$data.EVTYPE[grep("flood", condensedata$data.EVTYPE)] <- "flood"
condensedata$data.EVTYPE[grep("heavy seas", condensedata$data.EVTYPE)] <- "rough seas"
condensedata$data.EVTYPE[grep("landsl", condensedata$data.EVTYPE)] <- "landslide"
condensedata$data.EVTYPE[grep("flash", condensedata$data.EVTYPE)] <- "flood"
condensedata$data.EVTYPE[grep("high seas", condensedata$data.EVTYPE)] <- "rough seas"
condensedata$data.EVTYPE[grep("high water", condensedata$data.EVTYPE)] <- "rough seas"
condensedata$data.EVTYPE[grep("river flood", condensedata$data.EVTYPE)] <- "flood"
condensedata$data.EVTYPE[grep("warm weath", condensedata$data.EVTYPE)] <- "heat"
condensedata$data.EVTYPE[grep("dust", condensedata$data.EVTYPE)] <- "dust storm"
condensedata$data.EVTYPE[grep("hazardous surf", condensedata$data.EVTYPE)] <- "rough seas"
condensedata$data.EVTYPE[grep("heavy surf", condensedata$data.EVTYPE)] <- "rough seas"
condensedata$data.EVTYPE[grep("high surf", condensedata$data.EVTYPE)] <- "rough seas"
condensedata$data.EVTYPE[grep("high waves", condensedata$data.EVTYPE)] <- "rough seas"
condensedata$data.EVTYPE[grep("hypothe", condensedata$data.EVTYPE)] <- "hypothermia"
condensedata$data.EVTYPE[grep("lightning", condensedata$data.EVTYPE)] <- "lightning"
condensedata$data.EVTYPE[grep("tropical", condensedata$data.EVTYPE)] <- "tropical storm"
condensedata$data.EVTYPE[grep("urban", condensedata$data.EVTYPE)] <- "flood"
condensedata$data.EVTYPE[grep("tidal flood", condensedata$data.EVTYPE)] <- "flood"
condensedata$data.EVTYPE[grep("minor flood", condensedata$data.EVTYPE)] <- "flood"
condensedata$data.EVTYPE[grep("torrential rain", condensedata$data.EVTYPE)] <- "excessive rainfall"
In order to aggregate our similar event types, we need to do something about the columns with the H, K, M and B values. We need to make them numerical so that the values can be summed.
##Now I will sort PROPDMGEXP as an ordered factor.
trial1 <- factor(condensedata$data.PROPDMGEXP, levels = c("B", "M", "K", "H"), ordered = TRUE)
condensedata$data.PROPDMGEXP <- trial1
##Same with CROPDMGEXP.
trial2 <- factor(condensedata$data.CROPDMGEXP, levels = c("B", "M", "K", "H"), ordered = TRUE)
condensedata$data.CROPDMGEXP <- trial2
##We need to change these numeric values to non scientific notation!
##Also, we need to change those factor values to their numeric counterpart.
levels(condensedata$data.PROPDMGEXP)[levels(condensedata$data.PROPDMGEXP) %in% c("B", "M", "K", "H")] <- c(1000000000, 1000000, 1000, 100)
condensedata$data.PROPDMGEXP <- as.numeric(condensedata$data.PROPDMGEXP)
condensedata$data.PROPDMGEXP[grep(1, condensedata$data.PROPDMGEXP)] <-1000000000
condensedata$data.PROPDMGEXP[grep(2, condensedata$data.PROPDMGEXP)] <-1000000
condensedata$data.PROPDMGEXP[grep(3, condensedata$data.PROPDMGEXP)] <-1000
condensedata$data.PROPDMGEXP[grep(4, condensedata$data.PROPDMGEXP)] <-100
levels(condensedata$data.CROPDMGEXP)[levels(condensedata$data.CROPDMGEXP) %in% c("B", "M", "K", "H")] <- c(1000000000, 1000000, 1000, 100)
condensedata$data.CROPDMGEXP <- as.numeric(condensedata$data.CROPDMGEXP)
condensedata$data.CROPDMGEXP[grep(1, condensedata$data.CROPDMGEXP)] <-1000000000
condensedata$data.CROPDMGEXP[grep(2, condensedata$data.CROPDMGEXP)] <-1000000
condensedata$data.CROPDMGEXP[grep(3, condensedata$data.CROPDMGEXP)] <-1000
condensedata$data.CROPDMGEXP[grep(4, condensedata$data.CROPDMGEXP)] <-100
condensedata$data.PROPDMGEXP <- format(condensedata$data.PROPDMGEXP, scientific = F)
condensedata$data.CROPDMGEXP <- format(condensedata$data.CROPDMGEXP, scientific = F)
condensedata$data.PROPDMGEXP <- as.numeric(condensedata$data.PROPDMGEXP)
## Warning: NAs introduced by coercion
condensedata$data.CROPDMGEXP <- as.numeric(condensedata$data.CROPDMGEXP)
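The conversion above goes through ordered factors, integer codes, and `format()`. As a hedged alternative sketch, a named lookup vector can map each letter straight to its multiplier, after which the dollar amount is a single multiplication. The `exp_lookup` vector and the `toy` data frame are hypothetical names and values used only for illustration.

```r
# Letter -> multiplier, as described in the text (hundreds, thousands, millions, billions)
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Toy damage rows (made-up values)
toy <- data.frame(
  dmg = c(25, 2.5, 1, 3),
  exp = c("K", "m", "B", "h"),
  stringsAsFactors = FALSE
)

# Index the lookup by the upper-cased letter; unname() drops the letter names
toy$multiplier <- unname(exp_lookup[toupper(toy$exp)])

# Dollar amount = reported number times its multiplier
toy$dollars <- toy$dmg * toy$multiplier
toy
```

Any letter that is not in the lookup comes back as `NA`, so uninterpretable codes stay visible instead of being silently converted.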
We can now multiply the columns to get our desired numerical value for the dollar amount, and aggregate.
##Now let's multiply the corresponding damage columns to finally get a real workable number!
condensedata$propdamage <- condensedata$data.PROPDMG * condensedata$data.PROPDMGEXP
condensedata$propdamage <- format(condensedata$propdamage, scientific = F)
condensedata$cropdamage <- condensedata$data.CROPDMG * condensedata$data.CROPDMGEXP
condensedata$cropdamage <- format(condensedata$cropdamage, scientific = F)
condensedata$cropdamage <- as.numeric(condensedata$cropdamage)
condensedata$propdamage <- as.numeric(condensedata$propdamage)
## Warning: NAs introduced by coercion
##Double check what it all looks like.
head(condensedata)
## data.EVTYPE data.PROPDMG data.PROPDMGEXP data.CROPDMG data.CROPDMGEXP
## 187566 wind 0.1 1e+09 10 1e+06
## 187571 thunderstorm 5.0 1e+06 500 1e+03
## 187581 hurricane 25.0 1e+06 1 1e+06
## 187583 hurricane 48.0 1e+06 4 1e+06
## 187584 hurricane 20.0 1e+06 10 1e+06
## 187653 thunderstorm 50.0 1e+03 50 1e+03
## propdamage cropdamage
## 187566 1.0e+08 1e+07
## 187571 5.0e+06 5e+05
## 187581 2.5e+07 1e+06
## 187583 4.8e+07 4e+06
## 187584 2.0e+07 1e+07
## 187653 5.0e+04 5e+04
##Great! Now let's get rid of those old variables.
condensedata <- subset(condensedata, select = -c(data.CROPDMGEXP, data.PROPDMGEXP, data.PROPDMG, data.CROPDMG))
##Aggregate and condense identicals!
condensedata <- aggregate(. ~ data.EVTYPE, transform(condensedata, data.EVTYPE = tolower(data.EVTYPE)), sum)
Now that things are tidied up a bit, we can order our values in decreasing order to find that floods are the most economically damaging weather event.
##Make a total column.
condensedata$total <- condensedata$propdamage + condensedata$cropdamage
##And order it in decreasing order!
##From this we'll see that floods cause the most damage.
condensedata <- condensedata[order(condensedata$total, decreasing = TRUE),]
And, finally, we make a plot to show how floods compare to the other events.
##Now we have our dataset that's been cleaned up a bit, let's make a plot!
##I'll get the quantiles to narrow our plot to the events at or above the 75th percentile of total damage.
quantile(condensedata$total)
## 0% 25% 50% 75% 100%
## 0 105750 44027500 1092444525 157763987440
upper75 <- subset(condensedata, condensedata$total >= 1092444525)
eventtype <- as.numeric(as.factor(upper75$data.EVTYPE))
eventtype <- as.data.frame(eventtype)
eventtype$total <- upper75$total
##In order to make a plot in this format, I needed to look at eventtype and upper75 again and MATCH
##the row numbers correctly!
plot(eventtype$eventtype, eventtype$total, type="p", col="red", lwd=3, xlab="", ylab="Total", main="Most expensive weather events", xaxt = "n")
axis(1, xaxp = c(1, 12, 12), at = c(1:12), labels = c("Drought", "Flood", "Hail", "Hurricane", "Ice Hazard", "Storm Surge", "Thunderstorm", "Tornado","Tropical Storm", "Wildfire", "Wind", "Winter Weather"), cex.axis = 0.7, las = 2)
This plot shows the events on the x-axis and the total cost in dollars, in scientific notation, on the y-axis. Because the numbers are so large, it made sense to leave them in scientific notation. If the reader wishes to see the exact numbers, they can access them via:
eventtype
## eventtype total
## 1 2 157763987440
## 2 4 44203445800
## 3 8 16520165550
## 4 3 10020591590
## 5 11 6262199250
## 6 5 5935452600
## 7 6 4641493000
## 8 7 4329075140
## 9 10 3838549570
## 10 1 1886417000
## 11 9 1531654350
## 12 12 1142465700
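The table above shows numeric codes rather than event names. A small sketch of printing names next to comma-separated totals follows. It uses the first two rows of the table and the first four axis labels from the plot call, and assumes (as the plot does) that the codes follow the alphabetical label order. The `readable` data frame is a made-up name for illustration.

```r
# First four axis labels from the plot call, in factor (alphabetical) order
labels <- c("Drought", "Flood", "Hail", "Hurricane")

# First two rows of the table above: event codes and their totals
codes  <- c(2, 4)
totals <- c(157763987440, 44203445800)

# Look up the names and print the totals with thousands separators
readable <- data.frame(
  event = labels[codes],
  total = format(totals, big.mark = ",", scientific = FALSE),
  stringsAsFactors = FALSE
)
readable
```

Keeping the event name as its own column, rather than converting it to a number for plotting, means the table stays readable without having to cross-reference the axis labels.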
From our analysis, we can conclude that the most dangerous weather events in the United States are tornadoes, and the most economically expensive weather events are floods. This information will help the respective agencies prepare for and minimize the future costs of these two impactful weather events.