The purpose of this analysis is to determine which types of storm events have the biggest impact on human health and on the economy in the United States. For this purpose, we will use a dataset from the National Weather Service and an accompanying document that describes the methodology of storm event classification and recording, hereinafter referred to as the “description document”. The results are presented in two bar plots showing the ten event types that cause the greatest health and economic damage.
Sources:
The main dataset:
File: repdata%2Fdata%2FStormData.csv.bz2
Available at: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
The description document:
Title: National Weather Service Instruction 10-1605, August 17, 2007, Operations and Services, Performance, NWSPD 10-16, Storm Data Preparation
File: repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
Available at: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
Tools used:
- Acer A517-51G, Intel Core i5-8250U 1.60GHz, 12.0 GB, Windows 10 Home 64 bit
- RStudio 3.4.3
- MS Excel
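# Download the compressed dataset only if it is not already present
if (!file.exists("data.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "data.csv.bz2")
}
# read.csv can read the bz2-compressed file directly
a <- read.csv("data.csv.bz2")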
Let’s look at the columns first:
data <- a
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
We will select only the columns we need, i.e. the event type and the columns summarising fatalities, injuries and economic damage:
library(dplyr)
data <- select(data, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
str(data)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
We will convert the factors into character vectors:
data$EVTYPE <- as.character(data$EVTYPE)
data$PROPDMGEXP <- as.character(data$PROPDMGEXP)
data$CROPDMGEXP <- as.character(data$CROPDMGEXP)
Now let’s see what the data look like. First check for NAs and missing values:
table(is.na(data))
##
## FALSE
## 6316079
table(data == "")
##
## FALSE TRUE
## 5231732 1084347
So there are no NAs, but there are empty strings. Let’s look at those more closely:
lapply(data, function(x) sum(x==""))
## $EVTYPE
## [1] 0
##
## $FATALITIES
## [1] 0
##
## $INJURIES
## [1] 0
##
## $PROPDMG
## [1] 0
##
## $PROPDMGEXP
## [1] 465934
##
## $CROPDMG
## [1] 0
##
## $CROPDMGEXP
## [1] 618413
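The two checks above can also be combined into a single per-column summary. The following is only a minimal sketch on a made-up toy data frame (`toy`, `na_count`, `empty_count` and `summary_tbl` are illustrative names, not part of the analysis):

```r
# Toy data frame standing in for the storm data (values are made up).
toy <- data.frame(
  EVTYPE     = c("HAIL", "FLOOD", "TORNADO"),
  PROPDMG    = c(25, 0, 2.5),
  PROPDMGEXP = c("K", "", ""),
  stringsAsFactors = FALSE
)

# Count NAs and empty strings for every column in one pass.
na_count    <- vapply(toy, function(x) sum(is.na(x)), integer(1))
empty_count <- vapply(toy, function(x) sum(!is.na(x) & x == ""), integer(1))

summary_tbl <- data.frame(n_na = na_count, n_empty = empty_count)
print(summary_tbl)
```

Using `vapply` instead of `lapply` returns a plain named vector, which is easier to read than a list when there are many columns.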
So the empty values are only in the columns PROPDMGEXP and CROPDMGEXP, which contain the characters that signify the magnitude of the damage. Therefore, they should only be missing when the damage is zero. Let’s check that:
nrow(filter(data, PROPDMGEXP == "" & PROPDMG !=0))
## [1] 76
nrow(filter(data, CROPDMGEXP == "" & CROPDMG !=0))
## [1] 3
We can see that the characters are missing even with non-zero damage. Those observations are useless so we will remove them:
data <- filter(data, !(PROPDMGEXP == "" & PROPDMG !=0))
data <- filter(data, !(CROPDMGEXP == "" & CROPDMG !=0))
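To make the logic of this filter explicit, here is a small sketch in base R on made-up values (the analysis itself uses `dplyr::filter`; `toy`, `unusable` and `kept` are hypothetical names). A row is dropped only when a non-zero amount comes without a magnitude code; rows with zero damage and an empty code are kept.

```r
# Toy data: only row 3 has a non-zero amount without a magnitude code.
toy <- data.frame(
  PROPDMG    = c(25, 0, 2.5, 0),
  PROPDMGEXP = c("K", "", "", "M"),
  stringsAsFactors = FALSE
)

# Flag rows whose amount cannot be interpreted.
unusable <- toy$PROPDMGEXP == "" & toy$PROPDMG != 0

# Keep everything else, including zero-damage rows with an empty code.
kept <- toy[!unusable, ]
print(kept)
```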
Let’s now examine the damage-related columns further. The numerical columns are fine: they have the “num” class and contain neither NAs nor empty fields. What about the character columns? The only acceptable values in those are K, M and B.
unique(data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "0" "k" "?" "2"
Those are clearly not the only values appearing there. First, let’s convert the lower-case versions of the acceptable characters to upper case:
data[data$PROPDMGEXP == "m",]$PROPDMGEXP <- "M"
data[data$CROPDMGEXP == "m",]$CROPDMGEXP <- "M"
data[data$CROPDMGEXP == "k",]$CROPDMGEXP <- "K"
What about the characters that make no sense? There is a chance that people entered them arbitrarily for zero-damage cases, so we should keep those observations to preserve the data. Rows that combine such a character with non-zero damage must be deleted because the amount cannot be interpreted.
data <- filter(data, !(!(PROPDMGEXP %in% c("M", "K", "B")) & PROPDMG != 0))
data <- filter(data, !(!(CROPDMGEXP %in% c("M", "K", "B")) & CROPDMG != 0))
So the “EXP” columns are now clean and we can use them to calculate the real value of the economic damage:
data[data$PROPDMGEXP == "K",]$PROPDMG <- data[data$PROPDMGEXP == "K",]$PROPDMG * 1000
data[data$PROPDMGEXP == "M",]$PROPDMG <- data[data$PROPDMGEXP == "M",]$PROPDMG * 1000000
data[data$PROPDMGEXP == "B",]$PROPDMG <- data[data$PROPDMGEXP == "B",]$PROPDMG * 1000000000
data[data$CROPDMGEXP == "K",]$CROPDMG <- data[data$CROPDMGEXP == "K",]$CROPDMG * 1000
data[data$CROPDMGEXP == "M",]$CROPDMG <- data[data$CROPDMGEXP == "M",]$CROPDMG * 1000000
data[data$CROPDMGEXP == "B",]$CROPDMG <- data[data$CROPDMGEXP == "B",]$CROPDMG * 1000000000
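The same conversion can be written with a lookup vector instead of one assignment per code. This is just a sketch on made-up values; `multiplier`, `m` and `PROPDMG_USD` are illustrative names, and the sketch assumes that any code other than K, M or B only occurs with zero damage, as ensured by the filtering above.

```r
# Toy data covering the three valid codes plus an empty one.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 0),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: magnitude code -> multiplier.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each row's multiplier; unknown or empty codes give NA.
m <- unname(multiplier[toupper(toy$PROPDMGEXP)])

# Rows without a valid code are zero-damage rows here, so 1 is a safe default.
m[is.na(m)] <- 1

toy$PROPDMG_USD <- toy$PROPDMG * m
print(toy)
```

Keeping the result in a separate column leaves the raw amounts available for later checks.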
Now let’s move to the EVTYPE column:
unique(sort(data$EVTYPE))
## [1] " HIGH SURF ADVISORY" " COASTAL FLOOD"
## [3] " FLASH FLOOD" " LIGHTNING"
## [5] " TSTM WIND" " TSTM WIND (G45)"
## [7] " WATERSPOUT" " WIND"
## [9] "?" "ABNORMAL WARMTH"
## [11] "ABNORMALLY DRY" "ABNORMALLY WET"
## [13] "ACCUMULATED SNOWFALL" "AGRICULTURAL FREEZE"
## [15] "APACHE COUNTY" "ASTRONOMICAL HIGH TIDE"
## [17] "ASTRONOMICAL LOW TIDE" "AVALANCE"
## [19] "AVALANCHE" "BEACH EROSIN"
## [21] "Beach Erosion" "BEACH EROSION"
## [23] "BEACH EROSION/COASTAL FLOOD" "BEACH FLOOD"
## [25] "BELOW NORMAL PRECIPITATION" "BITTER WIND CHILL"
## [27] "BITTER WIND CHILL TEMPERATURES" "Black Ice"
## [29] "BLACK ICE" "BLIZZARD"
## [31] "BLIZZARD AND EXTREME WIND CHIL" "BLIZZARD AND HEAVY SNOW"
## [33] "Blizzard Summary" "BLIZZARD WEATHER"
## [35] "BLIZZARD/FREEZING RAIN" "BLIZZARD/HEAVY SNOW"
## [37] "BLIZZARD/HIGH WIND" "BLIZZARD/WINTER STORM"
## [39] "BLOW-OUT TIDE" "BLOW-OUT TIDES"
## [41] "BLOWING DUST" "blowing snow"
## [43] "Blowing Snow" "BLOWING SNOW"
## [45] "BLOWING SNOW- EXTREME WIND CHI" "BLOWING SNOW & EXTREME WIND CH"
## [47] "BLOWING SNOW/EXTREME WIND CHIL" "BRUSH FIRE"
## [49] "BRUSH FIRES" "COASTAL FLOODING/EROSION"
## [51] "COASTAL EROSION" "Coastal Flood"
## [53] "COASTAL FLOOD" "coastal flooding"
## [55] "Coastal Flooding" "COASTAL FLOODING"
## [57] "COASTAL FLOODING/EROSION" "Coastal Storm"
## [59] "COASTAL STORM" "COASTAL SURGE"
## [61] "COASTAL/TIDAL FLOOD" "COASTALFLOOD"
## [63] "COASTALSTORM" "Cold"
## [65] "COLD" "COLD AIR FUNNEL"
## [67] "COLD AIR FUNNELS" "COLD AIR TORNADO"
## [69] "Cold and Frost" "COLD AND FROST"
## [71] "COLD AND SNOW" "COLD AND WET CONDITIONS"
## [73] "Cold Temperature" "COLD TEMPERATURES"
## [75] "COLD WAVE" "COLD WEATHER"
## [77] "COLD WIND CHILL TEMPERATURES" "COLD/WIND CHILL"
## [79] "COLD/WINDS" "COOL AND WET"
## [81] "COOL SPELL" "CSTL FLOODING/EROSION"
## [83] "DAM BREAK" "DAM FAILURE"
## [85] "Damaging Freeze" "DAMAGING FREEZE"
## [87] "DEEP HAIL" "DENSE FOG"
## [89] "DENSE SMOKE" "DOWNBURST"
## [91] "DOWNBURST WINDS" "DRIEST MONTH"
## [93] "Drifting Snow" "DROUGHT"
## [95] "DROUGHT/EXCESSIVE HEAT" "DROWNING"
## [97] "DRY" "DRY CONDITIONS"
## [99] "DRY HOT WEATHER" "DRY MICROBURST"
## [101] "DRY MICROBURST 50" "DRY MICROBURST 53"
## [103] "DRY MICROBURST 58" "DRY MICROBURST 61"
## [105] "DRY MICROBURST 84" "DRY MICROBURST WINDS"
## [107] "DRY MIRCOBURST WINDS" "DRY PATTERN"
## [109] "DRY SPELL" "DRY WEATHER"
## [111] "DRYNESS" "DUST DEVEL"
## [113] "Dust Devil" "DUST DEVIL"
## [115] "DUST DEVIL WATERSPOUT" "DUST STORM"
## [117] "DUST STORM/HIGH WINDS" "DUSTSTORM"
## [119] "EARLY FREEZE" "Early Frost"
## [121] "EARLY FROST" "EARLY RAIN"
## [123] "EARLY SNOW" "Early snowfall"
## [125] "EARLY SNOWFALL" "Erosion/Cstl Flood"
## [127] "EXCESSIVE" "Excessive Cold"
## [129] "EXCESSIVE HEAT" "EXCESSIVE HEAT/DROUGHT"
## [131] "EXCESSIVE PRECIPITATION" "EXCESSIVE RAIN"
## [133] "EXCESSIVE RAINFALL" "EXCESSIVE SNOW"
## [135] "EXCESSIVE WETNESS" "EXCESSIVELY DRY"
## [137] "Extended Cold" "Extreme Cold"
## [139] "EXTREME COLD" "EXTREME COLD/WIND CHILL"
## [141] "EXTREME HEAT" "EXTREME WIND CHILL"
## [143] "EXTREME WIND CHILL/BLOWING SNO" "EXTREME WIND CHILLS"
## [145] "EXTREME WINDCHILL" "EXTREME WINDCHILL TEMPERATURES"
## [147] "EXTREME/RECORD COLD" "EXTREMELY WET"
## [149] "FALLING SNOW/ICE" "FIRST FROST"
## [151] "FIRST SNOW" "FLASH FLOOD"
## [153] "FLASH FLOOD - HEAVY RAIN" "FLASH FLOOD FROM ICE JAMS"
## [155] "FLASH FLOOD LANDSLIDES" "FLASH FLOOD/"
## [157] "FLASH FLOOD/ FLOOD" "FLASH FLOOD/ STREET"
## [159] "FLASH FLOOD/FLOOD" "FLASH FLOOD/HEAVY RAIN"
## [161] "FLASH FLOOD/LANDSLIDE" "FLASH FLOODING"
## [163] "FLASH FLOODING/FLOOD" "FLASH FLOODING/THUNDERSTORM WI"
## [165] "FLASH FLOODS" "FLASH FLOOODING"
## [167] "Flood" "FLOOD"
## [169] "FLOOD & HEAVY RAIN" "FLOOD FLASH"
## [171] "FLOOD FLOOD/FLASH" "FLOOD WATCH/"
## [173] "FLOOD/FLASH" "Flood/Flash Flood"
## [175] "FLOOD/FLASH FLOOD" "FLOOD/FLASH FLOODING"
## [177] "FLOOD/FLASH/FLOOD" "FLOOD/FLASHFLOOD"
## [179] "FLOOD/RAIN/WIND" "FLOOD/RAIN/WINDS"
## [181] "FLOOD/RIVER FLOOD" "Flood/Strong Wind"
## [183] "FLOODING" "FLOODS"
## [185] "FOG" "FOG AND COLD TEMPERATURES"
## [187] "FOREST FIRES" "Freeze"
## [189] "FREEZE" "Freezing drizzle"
## [191] "Freezing Drizzle" "FREEZING DRIZZLE"
## [193] "FREEZING DRIZZLE AND FREEZING" "Freezing Fog"
## [195] "FREEZING FOG" "Freezing rain"
## [197] "Freezing Rain" "FREEZING RAIN"
## [199] "FREEZING RAIN AND SLEET" "FREEZING RAIN AND SNOW"
## [201] "FREEZING RAIN SLEET AND" "FREEZING RAIN SLEET AND LIGHT"
## [203] "FREEZING RAIN/SLEET" "FREEZING RAIN/SNOW"
## [205] "Freezing Spray" "Frost"
## [207] "FROST" "Frost/Freeze"
## [209] "FROST/FREEZE" "FROST\\FREEZE"
## [211] "FUNNEL" "Funnel Cloud"
## [213] "FUNNEL CLOUD" "FUNNEL CLOUD."
## [215] "FUNNEL CLOUD/HAIL" "FUNNEL CLOUDS"
## [217] "FUNNELS" "Glaze"
## [219] "GLAZE" "GLAZE ICE"
## [221] "GLAZE/ICE STORM" "gradient wind"
## [223] "Gradient wind" "GRADIENT WIND"
## [225] "GRADIENT WINDS" "GRASS FIRES"
## [227] "GROUND BLIZZARD" "GUSTNADO"
## [229] "GUSTNADO AND" "GUSTY LAKE WIND"
## [231] "GUSTY THUNDERSTORM WIND" "GUSTY THUNDERSTORM WINDS"
## [233] "Gusty Wind" "GUSTY WIND"
## [235] "GUSTY WIND/HAIL" "GUSTY WIND/HVY RAIN"
## [237] "Gusty wind/rain" "Gusty winds"
## [239] "Gusty Winds" "GUSTY WINDS"
## [241] "HAIL" "HAIL 0.75"
## [243] "HAIL 0.88" "HAIL 075"
## [245] "HAIL 088" "HAIL 1.00"
## [247] "HAIL 1.75" "HAIL 1.75)"
## [249] "HAIL 100" "HAIL 125"
## [251] "HAIL 150" "HAIL 175"
## [253] "HAIL 200" "HAIL 225"
## [255] "HAIL 275" "HAIL 450"
## [257] "HAIL 75" "HAIL 80"
## [259] "HAIL 88" "HAIL ALOFT"
## [261] "HAIL DAMAGE" "HAIL FLOODING"
## [263] "HAIL STORM" "Hail(0.75)"
## [265] "HAIL/ICY ROADS" "HAIL/WIND"
## [267] "HAIL/WINDS" "HAILSTORM"
## [269] "HAILSTORMS" "HARD FREEZE"
## [271] "HAZARDOUS SURF" "HEAT"
## [273] "HEAT DROUGHT" "Heat Wave"
## [275] "HEAT WAVE" "HEAT WAVE DROUGHT"
## [277] "HEAT WAVES" "HEAT/DROUGHT"
## [279] "Heatburst" "HEAVY LAKE SNOW"
## [281] "HEAVY MIX" "HEAVY PRECIPATATION"
## [283] "Heavy Precipitation" "HEAVY PRECIPITATION"
## [285] "Heavy rain" "Heavy Rain"
## [287] "HEAVY RAIN" "HEAVY RAIN AND FLOOD"
## [289] "Heavy Rain and Wind" "HEAVY RAIN EFFECTS"
## [291] "HEAVY RAIN/FLOODING" "Heavy Rain/High Surf"
## [293] "HEAVY RAIN/LIGHTNING" "HEAVY RAIN/MUDSLIDES/FLOOD"
## [295] "HEAVY RAIN/SEVERE WEATHER" "HEAVY RAIN/SMALL STREAM URBAN"
## [297] "HEAVY RAIN/SNOW" "HEAVY RAIN/URBAN FLOOD"
## [299] "HEAVY RAIN/WIND" "HEAVY RAIN; URBAN FLOOD WINDS;"
## [301] "HEAVY RAINFALL" "HEAVY RAINS"
## [303] "HEAVY RAINS/FLOODING" "HEAVY SEAS"
## [305] "HEAVY SHOWER" "HEAVY SHOWERS"
## [307] "HEAVY SNOW" "HEAVY SNOW-SQUALLS"
## [309] "HEAVY SNOW FREEZING RAIN" "HEAVY SNOW & ICE"
## [311] "HEAVY SNOW AND" "HEAVY SNOW AND HIGH WINDS"
## [313] "HEAVY SNOW AND ICE" "HEAVY SNOW AND ICE STORM"
## [315] "HEAVY SNOW AND STRONG WINDS" "HEAVY SNOW ANDBLOWING SNOW"
## [317] "Heavy snow shower" "HEAVY SNOW SQUALLS"
## [319] "HEAVY SNOW/BLIZZARD" "HEAVY SNOW/BLIZZARD/AVALANCHE"
## [321] "HEAVY SNOW/BLOWING SNOW" "HEAVY SNOW/FREEZING RAIN"
## [323] "HEAVY SNOW/HIGH" "HEAVY SNOW/HIGH WIND"
## [325] "HEAVY SNOW/HIGH WINDS" "HEAVY SNOW/HIGH WINDS & FLOOD"
## [327] "HEAVY SNOW/HIGH WINDS/FREEZING" "HEAVY SNOW/ICE"
## [329] "HEAVY SNOW/ICE STORM" "HEAVY SNOW/SLEET"
## [331] "HEAVY SNOW/SQUALLS" "HEAVY SNOW/WIND"
## [333] "HEAVY SNOW/WINTER STORM" "HEAVY SNOWPACK"
## [335] "Heavy Surf" "HEAVY SURF"
## [337] "Heavy surf and wind" "HEAVY SURF COASTAL FLOODING"
## [339] "HEAVY SURF/HIGH SURF" "HEAVY SWELLS"
## [341] "HEAVY WET SNOW" "HIGH"
## [343] "HIGH SWELLS" "HIGH WINDS"
## [345] "HIGH SEAS" "High Surf"
## [347] "HIGH SURF" "HIGH SURF ADVISORIES"
## [349] "HIGH SURF ADVISORY" "HIGH SWELLS"
## [351] "HIGH TEMPERATURE RECORD" "HIGH TIDES"
## [353] "HIGH WATER" "HIGH WAVES"
## [355] "High Wind" "HIGH WIND"
## [357] "HIGH WIND (G40)" "HIGH WIND 48"
## [359] "HIGH WIND 63" "HIGH WIND 70"
## [361] "HIGH WIND AND HEAVY SNOW" "HIGH WIND AND HIGH TIDES"
## [363] "HIGH WIND AND SEAS" "HIGH WIND DAMAGE"
## [365] "HIGH WIND/ BLIZZARD" "HIGH WIND/BLIZZARD"
## [367] "HIGH WIND/BLIZZARD/FREEZING RA" "HIGH WIND/HEAVY SNOW"
## [369] "HIGH WIND/LOW WIND CHILL" "HIGH WIND/SEAS"
## [371] "HIGH WIND/WIND CHILL" "HIGH WIND/WIND CHILL/BLIZZARD"
## [373] "HIGH WINDS" "HIGH WINDS 55"
## [375] "HIGH WINDS 57" "HIGH WINDS 58"
## [377] "HIGH WINDS 63" "HIGH WINDS 66"
## [379] "HIGH WINDS 67" "HIGH WINDS 73"
## [381] "HIGH WINDS 76" "HIGH WINDS 80"
## [383] "HIGH WINDS 82" "HIGH WINDS AND WIND CHILL"
## [385] "HIGH WINDS DUST STORM" "HIGH WINDS HEAVY RAINS"
## [387] "HIGH WINDS/" "HIGH WINDS/COASTAL FLOOD"
## [389] "HIGH WINDS/COLD" "HIGH WINDS/FLOODING"
## [391] "HIGH WINDS/HEAVY RAIN" "HIGH WINDS/SNOW"
## [393] "HIGHWAY FLOODING" "Hot and Dry"
## [395] "HOT PATTERN" "HOT SPELL"
## [397] "HOT WEATHER" "HOT/DRY PATTERN"
## [399] "HURRICANE" "HURRICANE-GENERATED SWELLS"
## [401] "Hurricane Edouard" "HURRICANE EMILY"
## [403] "HURRICANE ERIN" "HURRICANE FELIX"
## [405] "HURRICANE GORDON" "HURRICANE OPAL"
## [407] "HURRICANE OPAL/HIGH WINDS" "HURRICANE/TYPHOON"
## [409] "HVY RAIN" "HYPERTHERMIA/EXPOSURE"
## [411] "HYPOTHERMIA" "Hypothermia/Exposure"
## [413] "HYPOTHERMIA/EXPOSURE" "ICE"
## [415] "ICE AND SNOW" "ICE FLOES"
## [417] "Ice Fog" "ICE JAM"
## [419] "Ice jam flood (minor" "ICE JAM FLOODING"
## [421] "ICE ON ROAD" "ICE PELLETS"
## [423] "ICE ROADS" "ICE STORM"
## [425] "ICE STORM AND SNOW" "ICE STORM/FLASH FLOOD"
## [427] "Ice/Snow" "ICE/SNOW"
## [429] "ICE/STRONG WINDS" "Icestorm/Blizzard"
## [431] "Icy Roads" "ICY ROADS"
## [433] "LACK OF SNOW" "LAKE-EFFECT SNOW"
## [435] "Lake Effect Snow" "LAKE EFFECT SNOW"
## [437] "LAKE FLOOD" "LAKESHORE FLOOD"
## [439] "LANDSLIDE" "LANDSLIDE/URBAN FLOOD"
## [441] "LANDSLIDES" "Landslump"
## [443] "LANDSLUMP" "LANDSPOUT"
## [445] "LARGE WALL CLOUD" "Late-season Snowfall"
## [447] "LATE FREEZE" "LATE SEASON HAIL"
## [449] "LATE SEASON SNOW" "Late Season Snowfall"
## [451] "LATE SNOW" "LIGHT FREEZING RAIN"
## [453] "Light snow" "Light Snow"
## [455] "LIGHT SNOW" "LIGHT SNOW AND SLEET"
## [457] "Light Snow/Flurries" "LIGHT SNOW/FREEZING PRECIP"
## [459] "Light Snowfall" "LIGHTING"
## [461] "LIGHTNING" "LIGHTNING WAUSEON"
## [463] "LIGHTNING AND HEAVY RAIN" "LIGHTNING AND THUNDERSTORM WIN"
## [465] "LIGHTNING AND WINDS" "LIGHTNING DAMAGE"
## [467] "LIGHTNING FIRE" "LIGHTNING INJURY"
## [469] "LIGHTNING THUNDERSTORM WINDS" "LIGHTNING THUNDERSTORM WINDSS"
## [471] "LIGHTNING." "LIGHTNING/HEAVY RAIN"
## [473] "LIGNTNING" "LOCAL FLASH FLOOD"
## [475] "LOCAL FLOOD" "LOCALLY HEAVY RAIN"
## [477] "LOW TEMPERATURE" "LOW TEMPERATURE RECORD"
## [479] "LOW WIND CHILL" "MAJOR FLOOD"
## [481] "Marine Accident" "MARINE HAIL"
## [483] "MARINE HIGH WIND" "MARINE MISHAP"
## [485] "MARINE STRONG WIND" "MARINE THUNDERSTORM WIND"
## [487] "MARINE TSTM WIND" "Metro Storm, May 26"
## [489] "Microburst" "MICROBURST"
## [491] "MICROBURST WINDS" "Mild and Dry Pattern"
## [493] "MILD PATTERN" "MILD/DRY PATTERN"
## [495] "MINOR FLOOD" "Minor Flooding"
## [497] "MINOR FLOODING" "MIXED PRECIP"
## [499] "Mixed Precipitation" "MIXED PRECIPITATION"
## [501] "MODERATE SNOW" "MODERATE SNOWFALL"
## [503] "MONTHLY PRECIPITATION" "Monthly Rainfall"
## [505] "MONTHLY RAINFALL" "Monthly Snowfall"
## [507] "MONTHLY SNOWFALL" "MONTHLY TEMPERATURE"
## [509] "Mountain Snows" "MUD SLIDE"
## [511] "MUD SLIDES" "MUD SLIDES URBAN FLOODING"
## [513] "MUD/ROCK SLIDE" "Mudslide"
## [515] "MUDSLIDE" "MUDSLIDE/LANDSLIDE"
## [517] "Mudslides" "MUDSLIDES"
## [519] "NEAR RECORD SNOW" "No Severe Weather"
## [521] "NON-SEVERE WIND DAMAGE" "NON-TSTM WIND"
## [523] "NON SEVERE HAIL" "NON TSTM WIND"
## [525] "NONE" "NORMAL PRECIPITATION"
## [527] "NORTHERN LIGHTS" "Other"
## [529] "OTHER" "PATCHY DENSE FOG"
## [531] "PATCHY ICE" "Prolong Cold"
## [533] "PROLONG COLD" "PROLONG COLD/SNOW"
## [535] "PROLONG WARMTH" "PROLONGED RAIN"
## [537] "RAIN" "RAIN (HEAVY)"
## [539] "RAIN AND WIND" "Rain Damage"
## [541] "RAIN/SNOW" "RAIN/WIND"
## [543] "RAINSTORM" "RAPIDLY RISING WATER"
## [545] "RECORD COLD" "Record Cold"
## [547] "RECORD COLD" "RECORD COLD AND HIGH WIND"
## [549] "RECORD COLD/FROST" "RECORD COOL"
## [551] "Record dry month" "RECORD DRYNESS"
## [553] "Record Heat" "RECORD HEAT"
## [555] "RECORD HEAT WAVE" "Record High"
## [557] "RECORD HIGH" "RECORD HIGH TEMPERATURE"
## [559] "RECORD HIGH TEMPERATURES" "RECORD LOW"
## [561] "RECORD LOW RAINFALL" "Record May Snow"
## [563] "RECORD PRECIPITATION" "RECORD RAINFALL"
## [565] "RECORD SNOW" "RECORD SNOW/COLD"
## [567] "RECORD SNOWFALL" "Record temperature"
## [569] "RECORD TEMPERATURE" "Record Temperatures"
## [571] "RECORD TEMPERATURES" "RECORD WARM"
## [573] "RECORD WARM TEMPS." "Record Warmth"
## [575] "RECORD WARMTH" "Record Winter Snow"
## [577] "RECORD/EXCESSIVE HEAT" "RECORD/EXCESSIVE RAINFALL"
## [579] "RED FLAG CRITERIA" "RED FLAG FIRE WX"
## [581] "REMNANTS OF FLOYD" "RIP CURRENT"
## [583] "RIP CURRENTS" "RIP CURRENTS HEAVY SURF"
## [585] "RIP CURRENTS/HEAVY SURF" "RIVER AND STREAM FLOOD"
## [587] "RIVER FLOOD" "River Flooding"
## [589] "RIVER FLOODING" "ROCK SLIDE"
## [591] "ROGUE WAVE" "ROTATING WALL CLOUD"
## [593] "ROUGH SEAS" "ROUGH SURF"
## [595] "RURAL FLOOD" "Saharan Dust"
## [597] "SAHARAN DUST" "Seasonal Snowfall"
## [599] "SEICHE" "SEVERE COLD"
## [601] "SEVERE THUNDERSTORM" "SEVERE THUNDERSTORM WINDS"
## [603] "SEVERE THUNDERSTORMS" "SEVERE TURBULENCE"
## [605] "SLEET" "SLEET & FREEZING RAIN"
## [607] "SLEET STORM" "SLEET/FREEZING RAIN"
## [609] "SLEET/ICE STORM" "SLEET/RAIN/SNOW"
## [611] "SLEET/SNOW" "small hail"
## [613] "Small Hail" "SMALL HAIL"
## [615] "SMALL STREAM" "SMALL STREAM AND"
## [617] "SMALL STREAM AND URBAN FLOOD" "SMALL STREAM AND URBAN FLOODIN"
## [619] "SMALL STREAM FLOOD" "SMALL STREAM FLOODING"
## [621] "SMALL STREAM URBAN FLOOD" "SMALL STREAM/URBAN FLOOD"
## [623] "Sml Stream Fld" "SMOKE"
## [625] "Snow" "SNOW"
## [627] "SNOW- HIGH WIND- WIND CHILL" "Snow Accumulation"
## [629] "SNOW ACCUMULATION" "SNOW ADVISORY"
## [631] "SNOW AND COLD" "SNOW AND HEAVY SNOW"
## [633] "Snow and Ice" "SNOW AND ICE"
## [635] "SNOW AND ICE STORM" "Snow and sleet"
## [637] "SNOW AND SLEET" "SNOW AND WIND"
## [639] "SNOW DROUGHT" "SNOW FREEZING RAIN"
## [641] "SNOW SHOWERS" "SNOW SLEET"
## [643] "SNOW SQUALL" "Snow squalls"
## [645] "Snow Squalls" "SNOW SQUALLS"
## [647] "SNOW/ BITTER COLD" "SNOW/ ICE"
## [649] "SNOW/BLOWING SNOW" "SNOW/COLD"
## [651] "SNOW/FREEZING RAIN" "SNOW/HEAVY SNOW"
## [653] "SNOW/HIGH WINDS" "SNOW/ICE"
## [655] "SNOW/ICE STORM" "SNOW/RAIN"
## [657] "SNOW/RAIN/SLEET" "SNOW/SLEET"
## [659] "SNOW/SLEET/FREEZING RAIN" "SNOW/SLEET/RAIN"
## [661] "SNOW\\COLD" "SNOWFALL RECORD"
## [663] "SNOWMELT FLOODING" "SNOWSTORM"
## [665] "SOUTHEAST" "STORM FORCE WINDS"
## [667] "STORM SURGE" "STORM SURGE/TIDE"
## [669] "STREAM FLOODING" "STREET FLOOD"
## [671] "STREET FLOODING" "Strong Wind"
## [673] "STRONG WIND" "STRONG WIND GUST"
## [675] "Strong winds" "Strong Winds"
## [677] "STRONG WINDS" "Summary August 10"
## [679] "Summary August 11" "Summary August 17"
## [681] "Summary August 2-3" "Summary August 21"
## [683] "Summary August 28" "Summary August 4"
## [685] "Summary August 7" "Summary August 9"
## [687] "Summary Jan 17" "Summary July 23-24"
## [689] "Summary June 18-19" "Summary June 5-6"
## [691] "Summary June 6" "Summary of April 12"
## [693] "Summary of April 13" "Summary of April 21"
## [695] "Summary of April 27" "Summary of April 3rd"
## [697] "Summary of August 1" "Summary of July 11"
## [699] "Summary of July 2" "Summary of July 22"
## [701] "Summary of July 26" "Summary of July 29"
## [703] "Summary of July 3" "Summary of June 10"
## [705] "Summary of June 11" "Summary of June 12"
## [707] "Summary of June 13" "Summary of June 15"
## [709] "Summary of June 16" "Summary of June 18"
## [711] "Summary of June 23" "Summary of June 24"
## [713] "Summary of June 3" "Summary of June 30"
## [715] "Summary of June 4" "Summary of June 6"
## [717] "Summary of March 14" "Summary of March 23"
## [719] "Summary of March 24" "SUMMARY OF MARCH 24-25"
## [721] "SUMMARY OF MARCH 27" "SUMMARY OF MARCH 29"
## [723] "Summary of May 10" "Summary of May 13"
## [725] "Summary of May 14" "Summary of May 22"
## [727] "Summary of May 22 am" "Summary of May 22 pm"
## [729] "Summary of May 26 am" "Summary of May 26 pm"
## [731] "Summary of May 31 am" "Summary of May 31 pm"
## [733] "Summary of May 9-10" "Summary Sept. 25-26"
## [735] "Summary September 20" "Summary September 23"
## [737] "Summary September 3" "Summary September 4"
## [739] "Summary: Nov. 16" "Summary: Nov. 6-7"
## [741] "Summary: Oct. 20-21" "Summary: October 31"
## [743] "Summary: Sept. 18" "Temperature record"
## [745] "THUDERSTORM WINDS" "THUNDEERSTORM WINDS"
## [747] "THUNDERESTORM WINDS" "THUNDERSNOW"
## [749] "Thundersnow shower" "THUNDERSTORM"
## [751] "THUNDERSTORM WINDS" "THUNDERSTORM DAMAGE"
## [753] "THUNDERSTORM DAMAGE TO" "THUNDERSTORM HAIL"
## [755] "THUNDERSTORM W INDS" "Thunderstorm Wind"
## [757] "THUNDERSTORM WIND" "THUNDERSTORM WIND (G40)"
## [759] "THUNDERSTORM WIND 50" "THUNDERSTORM WIND 52"
## [761] "THUNDERSTORM WIND 56" "THUNDERSTORM WIND 59"
## [763] "THUNDERSTORM WIND 59 MPH" "THUNDERSTORM WIND 59 MPH."
## [765] "THUNDERSTORM WIND 60 MPH" "THUNDERSTORM WIND 65 MPH"
## [767] "THUNDERSTORM WIND 65MPH" "THUNDERSTORM WIND 69"
## [769] "THUNDERSTORM WIND 98 MPH" "THUNDERSTORM WIND G50"
## [771] "THUNDERSTORM WIND G51" "THUNDERSTORM WIND G52"
## [773] "THUNDERSTORM WIND G55" "THUNDERSTORM WIND G60"
## [775] "THUNDERSTORM WIND G61" "THUNDERSTORM WIND TREES"
## [777] "THUNDERSTORM WIND." "THUNDERSTORM WIND/ TREE"
## [779] "THUNDERSTORM WIND/ TREES" "THUNDERSTORM WIND/AWNING"
## [781] "THUNDERSTORM WIND/HAIL" "THUNDERSTORM WIND/LIGHTNING"
## [783] "THUNDERSTORM WINDS" "THUNDERSTORM WINDS LE CEN"
## [785] "THUNDERSTORM WINDS 13" "THUNDERSTORM WINDS 2"
## [787] "THUNDERSTORM WINDS 50" "THUNDERSTORM WINDS 52"
## [789] "THUNDERSTORM WINDS 53" "THUNDERSTORM WINDS 60"
## [791] "THUNDERSTORM WINDS 61" "THUNDERSTORM WINDS 62"
## [793] "THUNDERSTORM WINDS 63 MPH" "THUNDERSTORM WINDS AND"
## [795] "THUNDERSTORM WINDS FUNNEL CLOU" "THUNDERSTORM WINDS G"
## [797] "THUNDERSTORM WINDS G60" "THUNDERSTORM WINDS HAIL"
## [799] "THUNDERSTORM WINDS HEAVY RAIN" "THUNDERSTORM WINDS LIGHTNING"
## [801] "THUNDERSTORM WINDS SMALL STREA" "THUNDERSTORM WINDS URBAN FLOOD"
## [803] "THUNDERSTORM WINDS." "THUNDERSTORM WINDS/ FLOOD"
## [805] "THUNDERSTORM WINDS/ HAIL" "THUNDERSTORM WINDS/FLASH FLOOD"
## [807] "THUNDERSTORM WINDS/FLOODING" "THUNDERSTORM WINDS/FUNNEL CLOU"
## [809] "THUNDERSTORM WINDS/HAIL" "THUNDERSTORM WINDS/HEAVY RAIN"
## [811] "THUNDERSTORM WINDS53" "THUNDERSTORM WINDSHAIL"
## [813] "THUNDERSTORM WINDSS" "THUNDERSTORM WINS"
## [815] "THUNDERSTORMS" "THUNDERSTORMS WIND"
## [817] "THUNDERSTORMS WINDS" "THUNDERSTORMW"
## [819] "THUNDERSTORMW 50" "THUNDERSTORMW WINDS"
## [821] "THUNDERSTORMWINDS" "THUNDERSTROM WIND"
## [823] "THUNDERSTROM WINDS" "THUNDERTORM WINDS"
## [825] "THUNDERTSORM WIND" "THUNDESTORM WINDS"
## [827] "THUNERSTORM WINDS" "TIDAL FLOOD"
## [829] "Tidal Flooding" "TIDAL FLOODING"
## [831] "TORNADO" "TORNADO DEBRIS"
## [833] "TORNADO F0" "TORNADO F1"
## [835] "TORNADO F2" "TORNADO F3"
## [837] "TORNADO/WATERSPOUT" "TORNADOES"
## [839] "TORNADOES, TSTM WIND, HAIL" "TORNADOS"
## [841] "TORNDAO" "TORRENTIAL RAIN"
## [843] "Torrential Rainfall" "TROPICAL DEPRESSION"
## [845] "TROPICAL STORM" "TROPICAL STORM ALBERTO"
## [847] "TROPICAL STORM DEAN" "TROPICAL STORM GORDON"
## [849] "TROPICAL STORM JERRY" "TSTM"
## [851] "TSTM HEAVY RAIN" "Tstm Wind"
## [853] "TSTM WIND" "TSTM WIND (G45)"
## [855] "TSTM WIND (41)" "TSTM WIND (G35)"
## [857] "TSTM WIND (G40)" "TSTM WIND (G45)"
## [859] "TSTM WIND 40" "TSTM WIND 45"
## [861] "TSTM WIND 50" "TSTM WIND 51"
## [863] "TSTM WIND 52" "TSTM WIND 55"
## [865] "TSTM WIND 65)" "TSTM WIND AND LIGHTNING"
## [867] "TSTM WIND DAMAGE" "TSTM WIND G45"
## [869] "TSTM WIND G58" "TSTM WIND/HAIL"
## [871] "TSTM WINDS" "TSTM WND"
## [873] "TSTMW" "TSUNAMI"
## [875] "TUNDERSTORM WIND" "TYPHOON"
## [877] "Unseasonable Cold" "UNSEASONABLY COLD"
## [879] "UNSEASONABLY COOL" "UNSEASONABLY COOL & WET"
## [881] "UNSEASONABLY DRY" "UNSEASONABLY HOT"
## [883] "UNSEASONABLY WARM" "UNSEASONABLY WARM & WET"
## [885] "UNSEASONABLY WARM AND DRY" "UNSEASONABLY WARM YEAR"
## [887] "UNSEASONABLY WARM/WET" "UNSEASONABLY WET"
## [889] "UNSEASONAL LOW TEMP" "UNSEASONAL RAIN"
## [891] "UNUSUAL WARMTH" "UNUSUAL/RECORD WARMTH"
## [893] "UNUSUALLY COLD" "UNUSUALLY LATE SNOW"
## [895] "UNUSUALLY WARM" "URBAN AND SMALL"
## [897] "URBAN AND SMALL STREAM" "URBAN AND SMALL STREAM FLOOD"
## [899] "URBAN AND SMALL STREAM FLOODIN" "Urban flood"
## [901] "Urban Flood" "URBAN FLOOD"
## [903] "URBAN FLOOD LANDSLIDE" "Urban Flooding"
## [905] "URBAN FLOODING" "URBAN FLOODS"
## [907] "URBAN SMALL" "URBAN SMALL STREAM FLOOD"
## [909] "URBAN/SMALL" "URBAN/SMALL FLOODING"
## [911] "URBAN/SMALL STREAM" "URBAN/SMALL STREAM FLOOD"
## [913] "URBAN/SMALL STREAM FLOOD" "URBAN/SMALL STREAM FLOODING"
## [915] "URBAN/SMALL STRM FLDG" "URBAN/SML STREAM FLD"
## [917] "URBAN/SML STREAM FLDG" "URBAN/STREET FLOODING"
## [919] "VERY DRY" "VERY WARM"
## [921] "VOG" "Volcanic Ash"
## [923] "VOLCANIC ASH" "Volcanic Ash Plume"
## [925] "VOLCANIC ASHFALL" "VOLCANIC ERUPTION"
## [927] "WAKE LOW WIND" "WALL CLOUD"
## [929] "WALL CLOUD/FUNNEL CLOUD" "WARM DRY CONDITIONS"
## [931] "WARM WEATHER" "WATER SPOUT"
## [933] "WATERSPOUT" "WATERSPOUT-"
## [935] "WATERSPOUT-TORNADO" "WATERSPOUT FUNNEL CLOUD"
## [937] "WATERSPOUT TORNADO" "WATERSPOUT/"
## [939] "WATERSPOUT/ TORNADO" "WATERSPOUT/TORNADO"
## [941] "WATERSPOUTS" "WAYTERSPOUT"
## [943] "wet micoburst" "WET MICROBURST"
## [945] "Wet Month" "WET SNOW"
## [947] "WET WEATHER" "Wet Year"
## [949] "Whirlwind" "WHIRLWIND"
## [951] "WILD FIRES" "WILD/FOREST FIRE"
## [953] "WILD/FOREST FIRES" "WILDFIRE"
## [955] "WILDFIRES" "Wind"
## [957] "WIND" "WIND ADVISORY"
## [959] "WIND AND WAVE" "WIND CHILL"
## [961] "WIND CHILL/HIGH WIND" "Wind Damage"
## [963] "WIND DAMAGE" "WIND GUSTS"
## [965] "WIND STORM" "WIND/HAIL"
## [967] "WINDS" "WINTER MIX"
## [969] "WINTER STORM" "WINTER STORM HIGH WINDS"
## [971] "WINTER STORM/HIGH WIND" "WINTER STORM/HIGH WINDS"
## [973] "WINTER STORMS" "Winter Weather"
## [975] "WINTER WEATHER" "WINTER WEATHER MIX"
## [977] "WINTER WEATHER/MIX" "WINTERY MIX"
## [979] "Wintry mix" "Wintry Mix"
## [981] "WINTRY MIX" "WND"
According to the description document, there are 48 permitted event types. The dataset contains 982 distinct values, so the EVTYPE column is obviously very untidy. Let’s try to clean it. First, we will convert all characters to upper case to make string matching easier:
data$EVTYPE <- toupper(data$EVTYPE)
Now let’s fix a couple of things that are obvious at first glance:
data$EVTYPE <- gsub("^\\s+|\\s+$", "", data$EVTYPE) #remove head and tail white spaces
data$EVTYPE <- gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", data$EVTYPE, perl=TRUE) #remove double white spaces between words
data <- filter(data, !grepl("SUMMARY", EVTYPE)) #remove summaries
data <- filter(data, !grepl("\\?", EVTYPE)) #remove the "?" description
data$EVTYPE <- sub("TSTM", "THUNDERSTORM", data$EVTYPE)
data$EVTYPE <- sub("WND", "WIND", data$EVTYPE)
data$EVTYPE <- sub("W INDS", "WINDS", data$EVTYPE)
data$EVTYPE <- sub("THUNDERSTORMWINDS", "THUNDERSTORM WINDS", data$EVTYPE)
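The effect of these clean-up steps is easiest to see on a handful of raw values. The sketch below uses a slightly different formulation (`trimws` plus a single `\\s+` replacement) than the regular expressions above, so treat it as an equivalent illustration rather than the code that was run; `raw` and `clean` are made-up names.

```r
# A few raw values in the style of the EVTYPE column.
raw <- c("  TSTM WIND", "HIGH  SURF ADVISORY ", "THUNDERSTORM W INDS",
         "Summary of May 22")

clean <- toupper(raw)

# Trim both ends, then collapse any run of whitespace to one space.
clean <- gsub("\\s+", " ", trimws(clean))

# Drop the summary entries, which are not event types.
clean <- clean[!grepl("SUMMARY", clean, fixed = TRUE)]

# Expand the abbreviation and repair the split word.
clean <- sub("TSTM", "THUNDERSTORM", clean, fixed = TRUE)
clean <- sub("W INDS", "WINDS", clean, fixed = TRUE)

print(clean)
```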
I have created a .csv file from the description document containing the permitted event-type categories. We will use it to tidy the event types further:
names <- read.csv("Event_types.csv", header = FALSE)
names <- as.character(names[,1])
names <- toupper(names)
names
## [1] "ASTRONOMICAL LOW TIDE" "AVALANCHE"
## [3] "BLIZZARD" "COASTAL FLOOD"
## [5] "COLD/WIND CHILL" "DEBRIS FLOW"
## [7] "DENSE FOG" "DENSE SMOKE"
## [9] "DROUGHT" "DUST DEVIL"
## [11] "DUST STORM" "EXCESSIVE HEAT"
## [13] "EXTREME COLD/WIND CHILL" "FLASH FLOOD"
## [15] "FLOOD" "FROST/FREEZE"
## [17] "FUNNEL CLOUD" "FREEZING FOG"
## [19] "HAIL" "HEAT"
## [21] "HEAVY RAIN" "HEAVY SNOW"
## [23] "HIGH SURF" "HIGH WIND"
## [25] "HURRICANE (TYPHOON)" "ICE STORM"
## [27] "LAKE-EFFECT SNOW" "LAKESHORE FLOOD"
## [29] "LIGHTNING" "MARINE HAIL"
## [31] "MARINE HIGH WIND" "MARINE STRONG WIND"
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"
## [35] "SEICHE" "SLEET"
## [37] "STORM SURGE/TIDE" "STRONG WIND"
## [39] "THUNDERSTORM WIND" "TORNADO"
## [41] "TROPICAL DEPRESSION" "TROPICAL STORM"
## [43] "TSUNAMI" "VOLCANIC ASH"
## [45] "WATERSPOUT" "WILDFIRE"
## [47] "WINTER STORM" "WINTER WEATHER"
First we will look for values that contain one of the permitted strings together with something else (as in “HAIL 59”) and replace them with the permitted string:
for (i in seq_along(names)) {
  data$EVTYPE[grepl(names[i], data$EVTYPE)] <- names[i]
}
length(sort(unique(data$EVTYPE)))
## [1] 417
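One thing to keep in mind with this kind of substring replacement is that the result can depend on the order of the permitted names when one name is contained in another (for example HEAT inside EXCESSIVE HEAT, or FLOOD inside FLASH FLOOD). The toy sketch below shows the effect and one possible variant that matches against the untouched raw values and applies the longest names last. It is an illustration under made-up inputs, not the procedure used in this analysis; `permitted`, `raw`, `naive` and `longest_last` are hypothetical names.

```r
permitted <- c("EXCESSIVE HEAT", "FLASH FLOOD", "FLOOD", "HEAT")
raw       <- c("EXCESSIVE HEAT", "FLASH FLOODING", "HEAT WAVE", "FLOOD 59")

# Order-dependent variant: later, shorter names overwrite earlier matches.
naive <- raw
for (p in permitted) {
  naive[grepl(p, naive, fixed = TRUE)] <- p
}

# Variant: always match against the raw values and apply the shortest names
# first, so that the most specific (longest) name is assigned last.
longest_last <- raw
for (p in permitted[order(nchar(permitted))]) {
  longest_last[grepl(p, raw, fixed = TRUE)] <- p
}

print(data.frame(raw, naive, longest_last))
```

`fixed = TRUE` treats the names as literal strings, which also avoids surprises with names that contain regular-expression characters such as the parentheses in HURRICANE (TYPHOON).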
We will do the same with approximate matching:
for (i in seq_along(names)) {
  data$EVTYPE[agrepl(names[i], data$EVTYPE)] <- names[i]
}
Now let’s see how far we got:
length(sort(unique(data$EVTYPE)))
## [1] 354
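To give a feel for what approximate matching does, here is a short sketch with `agrepl` and its default tolerance (10% of the pattern length, rounded up). The candidate strings are taken from the EVTYPE listing above; the variable names are made up. It also hints at why the result still needs manual review: a short pattern such as HEAT tolerates one edit and can therefore match unrelated values.

```r
# Misspellings of LIGHTNING from the raw data, plus an unrelated value.
candidates <- c("LIGNTNING", "LIGHTING", "HAIL")
is_close   <- agrepl("LIGHTNING", candidates)
print(data.frame(candidates, is_close))

# A short pattern: one allowed edit is already a quarter of its length.
heat_vs_heavy_rain <- agrepl("HEAT", "HEAVY RAIN")
print(heat_vs_heavy_rain)
```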
So we have reduced the number of distinct event-type values from 982 to 354, i.e. by more than half, which is better but still far from perfect. I tried hierarchical clustering and partial matching with the “amatch” function, but the results still needed a lot of manual adjustment. Besides, a closer look at the description document shows that some of the terms are sub-terms of the main ones (for example, a gustnado is a manifestation of a thunderstorm wind and should therefore be classified that way). We will therefore fix the rest of the event types manually. We will save them in a .csv file, go through them in a spreadsheet application (in this case MS Excel) and assign to each term either one of the terms from the description document or the exclusion value “X”.
To assign the correct values, we will run a full-text search for each mismatching term in the description document and pick the event type whose description mentions that term. In case of ambiguity, we will use the “X” marking, and those rows will be deleted. The result, of course, leaves some room for personal interpretation and bias. We present the mapping below so that reviewers can discuss or correct the way it was done.
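Once such a two-column mapping exists, applying it to the data could look like the sketch below. It uses three pairs taken from the mapping table in this section and a made-up `events` data frame; the column names RAW and Event_Type follow the table, everything else is hypothetical.

```r
# Three rows of the manual mapping (raw term -> permitted type or "X").
types <- data.frame(
  RAW        = c("BRUSH FIRE", "DROWNING", "DRY SPELL"),
  Event_Type = c("WILDFIRE", "X", "DROUGHT"),
  stringsAsFactors = FALSE
)

# Made-up observations standing in for the storm data.
events <- data.frame(
  EVTYPE     = c("DRY SPELL", "BRUSH FIRE", "DROWNING", "BRUSH FIRE"),
  FATALITIES = c(0, 1, 2, 0),
  stringsAsFactors = FALSE
)

# Replace each raw term by its mapped type; unmapped terms become NA.
events$EVTYPE <- types$Event_Type[match(events$EVTYPE, types$RAW)]

# Drop rows that were ruled out ("X") or have no mapping at all.
events <- events[!is.na(events$EVTYPE) & events$EVTYPE != "X", ]
print(events)
```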
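# Export the remaining distinct values for manual editing
write.csv(sort(unique(data$EVTYPE)), "types.csv")
# types2.csv is the manually edited copy with the assigned event types
types <- read.csv("types2.csv", header = FALSE, col.names = c("RAW", "Event_Type"), stringsAsFactors = FALSE)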
types
## RAW Event_Type
## 1 ABNORMAL WARMTH EXCESSIVE HEAT
## 2 ABNORMALLY DRY DROUGHT
## 3 ABNORMALLY WET X
## 4 ACCUMULATED SNOWFALL HEAVY SNOW
## 5 AGRICULTURAL FREEZE FROST/FREEZE
## 6 APACHE COUNTY X
## 7 ASTRONOMICAL HIGH TIDE HIGH SURF
## 8 ASTRONOMICAL LOW TIDE ASTRONOMICAL LOW TIDE
## 9 AVALANCHE AVALANCHE
## 10 BEACH EROSIN X
## 11 BEACH EROSION X
## 12 BELOW NORMAL PRECIPITATION X
## 13 BLACK ICE X
## 14 BLIZZARD BLIZZARD
## 15 BLOW-OUT TIDE X
## 16 BLOW-OUT TIDES X
## 17 BLOWING DUST DUST STORM
## 18 BLOWING SNOW X
## 19 BLOWING SNOW- EXTREME WIND CHI X
## 20 BLOWING SNOW & EXTREME WIND CH X
## 21 BRUSH FIRE WILDFIRE
## 22 BRUSH FIRES WILDFIRE
## 23 COASTAL EROSION X
## 24 COASTAL STORM X
## 25 COASTAL SURGE STORM SURGE/TIDE
## 26 COASTALSTORM X
## 27 COLD COLD/WIND CHILL
## 28 COLD AIR FUNNEL FUNNEL CLOUD
## 29 COLD AIR FUNNELS FUNNEL CLOUD
## 30 COLD AND FROST FROST/FREEZE
## 31 COLD AND SNOW COLD/WIND CHILL
## 32 COLD AND WET CONDITIONS COLD/WIND CHILL
## 33 COLD TEMPERATURE COLD/WIND CHILL
## 34 COLD TEMPERATURES COLD/WIND CHILL
## 35 COLD WAVE COLD/WIND CHILL
## 36 COLD/WINDS COLD/WIND CHILL
## 37 COOL AND WET COLD/WIND CHILL
## 38 COOL SPELL COLD/WIND CHILL
## 39 DAM BREAK FLASH FLOOD
## 40 DAMAGING FREEZE FROST/FREEZE
## 41 DENSE FOG DENSE FOG
## 42 DENSE SMOKE DENSE SMOKE
## 43 DOWNBURST X
## 44 DOWNBURST WINDS X
## 45 DRIEST MONTH DROUGHT
## 46 DRIFTING SNOW WINTER WEATHER
## 47 DROUGHT DROUGHT
## 48 DROWNING X
## 49 DRY DROUGHT
## 50 DRY CONDITIONS DROUGHT
## 51 DRY MICROBURST THUNDERSTORM WIND
## 52 DRY MICROBURST 50 THUNDERSTORM WIND
## 53 DRY MICROBURST 53 THUNDERSTORM WIND
## 54 DRY MICROBURST 58 THUNDERSTORM WIND
## 55 DRY MICROBURST 61 THUNDERSTORM WIND
## 56 DRY MICROBURST 84 THUNDERSTORM WIND
## 57 DRY MICROBURST WINDS THUNDERSTORM WIND
## 58 DRY MIRCOBURST WINDS THUNDERSTORM WIND
## 59 DRY PATTERN DROUGHT
## 60 DRY SPELL DROUGHT
## 61 DRYNESS DROUGHT
## 62 DUST DEVIL DUST DEVIL
## 63 DUST STORM DUST STORM
## 64 EARLY FREEZE FROST/FREEZE
## 65 EARLY FROST FROST/FREEZE
## 66 EARLY RAIN X
## 67 EARLY SNOW WINTER WEATHER
## 68 EARLY SNOWFALL WINTER WEATHER
## 69 EXCESSIVE X
## 70 EXCESSIVE COLD EXTREME COLD/WIND CHILL
## 71 EXCESSIVE PRECIPITATION HEAVY RAIN
## 72 EXCESSIVE RAIN HEAVY RAIN
## 73 EXCESSIVE RAINFALL HEAVY RAIN
## 74 EXCESSIVE SNOW HEAVY SNOW
## 75 EXCESSIVELY DRY DROUGHT
## 76 EXTENDED COLD COLD/WIND CHILL
## 77 EXTREME COLD EXTREME COLD/WIND CHILL
## 78 EXTREME/RECORD COLD EXTREME COLD/WIND CHILL
## 79 EXTREMELY WET X
## 80 FALLING SNOW/ICE HEAVY SNOW
## 81 FIRST FROST FROST/FREEZE
## 82 FIRST SNOW X
## 83 FLOOD FLOOD
## 84 FOG DENSE FOG
## 85 FOG AND COLD TEMPERATURES DENSE FOG
## 86 FOREST FIRES WILDFIRE
## 87 FREEZE FROST/FREEZE
## 88 FREEZING DRIZZLE WINTER WEATHER
## 89 FREEZING DRIZZLE AND FREEZING WINTER WEATHER
## 90 FREEZING FOG FREEZING FOG
## 91 FREEZING RAIN WINTER WEATHER
## 92 FREEZING RAIN AND SNOW WINTER WEATHER
## 93 FREEZING RAIN/SNOW WINTER WEATHER
## 94 FREEZING SPRAY X
## 95 FROST FROST/FREEZE
## 96 FROST/FREEZE FROST/FREEZE
## 97 FUNNEL FUNNEL CLOUD
## 98 FUNNEL CLOUD FUNNEL CLOUD
## 99 FUNNELS FUNNEL CLOUD
## 100 GLAZE X
## 101 GLAZE ICE X
## 102 GRADIENT WIND X
## 103 GRADIENT WINDS X
## 104 GRASS FIRES WILDFIRE
## 105 GUSTNADO THUNDERSTORM WIND
## 106 GUSTNADO AND THUNDERSTORM WIND
## 107 GUSTY LAKE WIND STRONG WIND
## 108 GUSTY WIND STRONG WIND
## 109 GUSTY WIND/HVY RAIN STRONG WIND
## 110 GUSTY WIND/RAIN STRONG WIND
## 111 GUSTY WINDS STRONG WIND
## 112 HAIL HAIL
## 113 HARD FREEZE FROST/FREEZE
## 114 HAZARDOUS SURF HIGH SURF
## 115 HEAT HEAT
## 116 HIGH X
## 117 HIGH SEAS X
## 118 HIGH SURF HIGH SURF
## 119 HIGH SWELLS X
## 120 HIGH TEMPERATURE RECORD HEAT
## 121 HIGH TIDES HIGH SURF
## 122 HIGH WATER X
## 123 HIGH WAVES X
## 124 HIGH WIND HIGH WIND
## 125 HOT AND DRY DROUGHT
## 126 HOT PATTERN HEAT
## 127 HOT SPELL HEAT
## 128 HOT/DRY PATTERN DROUGHT
## 129 HURRICANE HURRICANE (TYPHOON)
## 130 HURRICANE-GENERATED SWELLS HURRICANE (TYPHOON)
## 131 HURRICANE EDOUARD HURRICANE (TYPHOON)
## 132 HURRICANE EMILY HURRICANE (TYPHOON)
## 133 HURRICANE ERIN HURRICANE (TYPHOON)
## 134 HURRICANE FELIX HURRICANE (TYPHOON)
## 135 HURRICANE GORDON HURRICANE (TYPHOON)
## 136 HURRICANE OPAL HURRICANE (TYPHOON)
## 137 HURRICANE/TYPHOON HURRICANE (TYPHOON)
## 138 HVY RAIN HEAVY RAIN
## 139 HYPERTHERMIA/EXPOSURE X
## 140 HYPOTHERMIA X
## 141 HYPOTHERMIA/EXPOSURE X
## 142 ICE X
## 143 ICE AND SNOW WINTER WEATHER
## 144 ICE FLOES X
## 145 ICE FOG FREEZING FOG
## 146 ICE JAM X
## 147 ICE ON ROAD X
## 148 ICE PELLETS X
## 149 ICE ROADS X
## 150 ICE STORM ICE STORM
## 151 ICE/SNOW X
## 152 ICY ROADS X
## 153 LACK OF SNOW X
## 154 LAKE-EFFECT SNOW LAKE-EFFECT SNOW
## 155 LANDSLIDE DEBRIS FLOW
## 156 LANDSLIDES DEBRIS FLOW
## 157 LANDSLUMP X
## 158 LANDSPOUT TORNADO
## 159 LARGE WALL CLOUD X
## 160 LATE-SEASON SNOWFALL X
## 161 LATE FREEZE FROST/FREEZE
## 162 LATE SEASON SNOW X
## 163 LATE SEASON SNOWFALL X
## 164 LATE SNOW X
## 165 LIGHT FREEZING RAIN X
## 166 LIGHT SNOW X
## 167 LIGHT SNOW/FLURRIES X
## 168 LIGHT SNOW/FREEZING PRECIP X
## 169 LIGHT SNOWFALL X
## 170 LIGHTNING LIGHTNING
## 171 LOW TEMPERATURE X
## 172 LOW TEMPERATURE RECORD FROST/FREEZE
## 173 MARINE ACCIDENT X
## 174 MARINE MISHAP X
## 175 METRO STORM, MAY 26 X
## 176 MICROBURST THUNDERSTORM WIND
## 177 MICROBURST WINDS THUNDERSTORM WIND
## 178 MILD AND DRY PATTERN X
## 179 MILD PATTERN X
## 180 MILD/DRY PATTERN X
## 181 MIXED PRECIP X
## 182 MIXED PRECIPITATION X
## 183 MODERATE SNOW X
## 184 MODERATE SNOWFALL X
## 185 MONTHLY PRECIPITATION X
## 186 MONTHLY RAINFALL X
## 187 MONTHLY SNOWFALL X
## 188 MONTHLY TEMPERATURE X
## 189 MOUNTAIN SNOWS X
## 190 MUD SLIDE X
## 191 MUD SLIDES X
## 192 MUD/ROCK SLIDE X
## 193 MUDSLIDE X
## 194 MUDSLIDE/LANDSLIDE X
## 195 MUDSLIDES X
## 196 NEAR RECORD SNOW HEAVY SNOW
## 197 NON-SEVERE WIND DAMAGE X
## 198 NONE X
## 199 NORMAL PRECIPITATION X
## 200 NORTHERN LIGHTS X
## 201 OTHER X
## 202 PATCHY ICE X
## 203 PROLONG COLD COLD/WIND CHILL
## 204 PROLONG COLD/SNOW COLD/WIND CHILL
## 205 PROLONG WARMTH HEAT
## 206 PROLONGED RAIN X
## 207 RAIN X
## 208 RAIN AND WIND X
## 209 RAIN DAMAGE X
## 210 RAIN/SNOW X
## 211 RAIN/WIND X
## 212 RAINSTORM X
## 213 RAPIDLY RISING WATER X
## 214 RECORD COLD EXTREME COLD/WIND CHILL
## 215 RECORD COLD/FROST EXTREME COLD/WIND CHILL
## 216 RECORD COOL EXTREME COLD/WIND CHILL
## 217 RECORD DRY MONTH DROUGHT
## 218 RECORD DRYNESS DROUGHT
## 219 RECORD HIGH X
## 220 RECORD HIGH TEMPERATURE EXCESSIVE HEAT
## 221 RECORD HIGH TEMPERATURES EXCESSIVE HEAT
## 222 RECORD LOW X
## 223 RECORD LOW RAINFALL DROUGHT
## 224 RECORD MAY SNOW X
## 225 RECORD PRECIPITATION X
## 226 RECORD RAINFALL X
## 227 RECORD SNOW X
## 228 RECORD SNOW/COLD X
## 229 RECORD SNOWFALL X
## 230 RECORD TEMPERATURE X
## 231 RECORD TEMPERATURES X
## 232 RECORD WARM EXCESSIVE HEAT
## 233 RECORD WARM TEMPS. EXCESSIVE HEAT
## 234 RECORD WARMTH EXCESSIVE HEAT
## 235 RECORD WINTER SNOW X
## 236 RECORD/EXCESSIVE RAINFALL X
## 237 RED FLAG CRITERIA X
## 238 RED FLAG FIRE WX X
## 239 RIP CURRENT RIP CURRENT
## 240 ROCK SLIDE DEBRIS FLOW
## 241 ROGUE WAVE X
## 242 ROTATING WALL CLOUD X
## 243 ROUGH SEAS X
## 244 ROUGH SURF X
## 245 SAHARAN DUST X
## 246 SEASONAL SNOWFALL X
## 247 SEICHE SEICHE
## 248 SEVERE COLD X
## 249 SEVERE THUNDERSTORM X
## 250 SEVERE THUNDERSTORMS X
## 251 SEVERE TURBULENCE X
## 252 SLEET SLEET
## 253 SMALL STREAM X
## 254 SMALL STREAM AND X
## 255 SML STREAM FLD X
## 256 SMOKE X
## 257 SNOW X
## 258 SNOW ACCUMULATION X
## 259 SNOW ADVISORY X
## 260 SNOW AND COLD X
## 261 SNOW AND ICE X
## 262 SNOW AND WIND X
## 263 SNOW FREEZING RAIN X
## 264 SNOW SHOWERS X
## 265 SNOW SQUALL WINTER WEATHER
## 266 SNOW SQUALLS WINTER WEATHER
## 267 SNOW/ BITTER COLD X
## 268 SNOW/ ICE X
## 269 SNOW/BLOWING SNOW X
## 270 SNOW/COLD X
## 271 SNOW/FREEZING RAIN X
## 272 SNOW/ICE X
## 273 SNOW/RAIN X
## 274 SNOW\\COLD X
## 275 SNOWFALL RECORD HEAVY SNOW
## 276 SNOWSTORM X
## 277 STORM FORCE WINDS X
## 278 STORM SURGE STORM SURGE/TIDE
## 279 STORM SURGE/TIDE STORM SURGE/TIDE
## 280 STRONG WIND STRONG WIND
## 281 TEMPERATURE RECORD X
## 282 THUNDERSNOW X
## 283 THUNDERSNOW SHOWER X
## 284 THUNDERSTORM X
## 285 THUNDERSTORM DAMAGE X
## 286 THUNDERSTORM DAMAGE TO X
## 287 THUNDERSTORM WIND THUNDERSTORM WIND
## 288 THUNDERSTORMS X
## 289 THUNDERSTORMW X
## 290 THUNDERSTORMW 50 X
## 291 TORNADO TORNADO
## 292 TORNDAO TORNADO
## 293 TORRENTIAL RAIN X
## 294 TORRENTIAL RAINFALL X
## 295 TROPICAL DEPRESSION TROPICAL DEPRESSION
## 296 TROPICAL STORM TROPICAL STORM
## 297 TSUNAMI TSUNAMI
## 298 TYPHOON HURRICANE (TYPHOON)
## 299 UNSEASONABLE COLD COLD/WIND CHILL
## 300 UNSEASONABLY COLD COLD/WIND CHILL
## 301 UNSEASONABLY COOL COLD/WIND CHILL
## 302 UNSEASONABLY COOL & WET COLD/WIND CHILL
## 303 UNSEASONABLY DRY DROUGHT
## 304 UNSEASONABLY HOT HEAT
## 305 UNSEASONABLY WARM HEAT
## 306 UNSEASONABLY WARM & WET HEAT
## 307 UNSEASONABLY WARM AND DRY DROUGHT
## 308 UNSEASONABLY WARM YEAR X
## 309 UNSEASONABLY WARM/WET HEAT
## 310 UNSEASONABLY WET X
## 311 UNSEASONAL LOW TEMP X
## 312 UNSEASONAL RAIN X
## 313 UNUSUAL WARMTH HEAT
## 314 UNUSUAL/RECORD WARMTH EXCESSIVE HEAT
## 315 UNUSUALLY COLD COLD/WIND CHILL
## 316 UNUSUALLY LATE SNOW WINTER WEATHER
## 317 UNUSUALLY WARM HEAT
## 318 URBAN AND SMALL X
## 319 URBAN AND SMALL STREAM X
## 320 URBAN SMALL X
## 321 URBAN/SMALL X
## 322 URBAN/SMALL STREAM X
## 323 URBAN/SMALL STRM FLDG X
## 324 URBAN/SML STREAM FLD X
## 325 URBAN/SML STREAM FLDG X
## 326 VERY DRY DROUGHT
## 327 VERY WARM HEAT
## 328 VOG X
## 329 VOLCANIC ASH VOLCANIC ASH
## 330 VOLCANIC ERUPTION X
## 331 WAKE LOW WIND X
## 332 WALL CLOUD X
## 333 WARM DRY CONDITIONS DROUGHT
## 334 WATERSPOUT TORNADO
## 335 WET MICOBURST THUNDERSTORM WIND
## 336 WET MICROBURST THUNDERSTORM WIND
## 337 WET MONTH X
## 338 WET SNOW X
## 339 WET YEAR X
## 340 WHIRLWIND X
## 341 WILD/FOREST FIRE WILDFIRE
## 342 WILD/FOREST FIRES WILDFIRE
## 343 WILDFIRE WILDFIRE
## 344 WIND X
## 345 WIND ADVISORY X
## 346 WIND AND WAVE X
## 347 WIND DAMAGE X
## 348 WIND GUSTS X
## 349 WIND STORM X
## 350 WINDS X
## 351 WINTER MIX X
## 352 WINTER STORM X
## 353 WINTERY MIX X
## 354 WINTRY MIX X
Let’s now replace the original “EVTYPE” column with the corrected values and filter out all the rows marked “X”:
data <- inner_join(data, types, by = c("EVTYPE" = "RAW")) %>%
  select(Event_Type, FATALITIES, INJURIES, CROPDMG, PROPDMG) %>%
  filter(Event_Type != "X")
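Because `inner_join` keeps only rows whose `EVTYPE` appears in the `RAW` column, any value missing from the hand-edited table would be dropped silently. A quick check along the following lines could be run before the join. This is a base R sketch on made-up data; `events` and `lookup` are hypothetical stand-ins for `data` and `types`, and `merge` is used in place of `inner_join` only to keep the example free of package dependencies.

```r
# Hypothetical stand-ins for `data` and `types` (toy values only)
events <- data.frame(EVTYPE = c("HAIL", "TORNDAO", "APACHE COUNTY", "MYSTERY"),
                     FATALITIES = c(0, 2, 0, 1),
                     stringsAsFactors = FALSE)
lookup <- data.frame(RAW = c("HAIL", "TORNDAO", "APACHE COUNTY"),
                     Event_Type = c("HAIL", "TORNADO", "X"),
                     stringsAsFactors = FALSE)

# Raw values without a mapping would vanish in an inner join
unmapped <- setdiff(unique(events$EVTYPE), lookup$RAW)

# Inner join on EVTYPE == RAW, then drop the ruled-out "X" rows
joined <- merge(events, lookup, by.x = "EVTYPE", by.y = "RAW")
kept <- joined[joined$Event_Type != "X", c("Event_Type", "FATALITIES")]

print(unmapped)
print(kept)
```

An empty `unmapped` vector would indicate that every raw value has an entry in the mapping table.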
The task is to determine which events cause the greatest economic harm and the greatest harm to human health. We measure economic harm as the simple sum of property damage and crop damage. For human health, we sum fatalities and injuries but give fatalities a larger weight (0.7 versus 0.3): injuries can be serious but also relatively minor, whereas a death is always serious. We create the new variables:
data <- mutate(data, Health_Damage = 0.7*FATALITIES + 0.3*INJURIES, Property_Damage = CROPDMG+PROPDMG)
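As a worked illustration of the weighting, with made-up counts rather than figures from the dataset: an event with 10 fatalities and 100 injuries scores 0.7 * 10 + 0.3 * 100 = 37, while an event with 40 fatalities and no injuries scores 28. The hypothetical helper `health_score` below simply restates the formula from the `mutate` call.

```r
# Hypothetical helper restating the weights used in the mutate() call
health_score <- function(fatalities, injuries) {
  0.7 * fatalities + 0.3 * injuries
}

# Made-up counts for two imaginary events
score_a <- health_score(fatalities = 10, injuries = 100)  # 7 + 30
score_b <- health_score(fatalities = 40, injuries = 0)    # 28 + 0

print(c(a = score_a, b = score_b))
```

With these weights, a large number of injuries can still outweigh a smaller number of fatalities, which is worth keeping in mind when reading the ranking.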
We will now show the top 10 event types in two bar plots. The first shows the 10 event types with the most severe impact on human health, and the second shows the 10 with the most severe economic impact.
library(ggplot2)
health <- data %>% group_by(Event_Type) %>% summarise(Total_Health_Damage = sum(Health_Damage)) %>% arrange(desc(Total_Health_Damage))
health <- health[1:10,]
health$Event_Type <- factor(health$Event_Type, levels = health$Event_Type[order(-health$Total_Health_Damage)])
property <- data %>% group_by(Event_Type) %>% summarise(Total_Property_Damage = sum(Property_Damage)) %>% arrange(desc(Total_Property_Damage))
property <- property[1:10,]
property$Event_Type <- factor(property$Event_Type, levels = property$Event_Type[order(-property$Total_Property_Damage)])
ggplot(health, aes(x=Event_Type, y=Total_Health_Damage)) + geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
ggplot(property, aes(x=Event_Type, y=Total_Property_Damage)) + geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
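The ranking step can also be checked without the plotting layer. The following base R sketch, on made-up scores, mirrors the group, sum, sort and truncate sequence used above; `toy` and `top_n_types` are hypothetical names and the numbers are not results from the dataset.

```r
# Made-up per-event scores (not results from the storm dataset)
toy <- data.frame(Event_Type = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
                  Health_Damage = c(5, 1, 7, 4, 0.5),
                  stringsAsFactors = FALSE)

# Sum per event type, sort in decreasing order, keep the first n rows
top_n_types <- function(df, n) {
  totals <- aggregate(Health_Damage ~ Event_Type, data = df, FUN = sum)
  totals <- totals[order(-totals$Health_Damage), ]
  head(totals, n)
}

top2 <- top_n_types(toy, 2)

# Fix the factor levels in ranked order so a bar plot keeps that order
top2$Event_Type <- factor(top2$Event_Type, levels = top2$Event_Type)
print(top2)
```

Setting the factor levels explicitly matters because ggplot2 otherwise orders a discrete axis alphabetically rather than by value.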