Synopsis

The purpose of this analysis is to determine which types of storm events have the biggest impact on human health and on the economy in the United States. For this purpose, we use a dataset from the National Weather Service and an accompanying document that describes the methodology of storm event classification and recording, hereinafter referred to as the “description document”. The results are presented in two bar plots showing the ten event types that cause the most health and economic damage.

Sources:

The main dataset:
File: repdata%2Fdata%2FStormData.csv.bz2
Available at: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

The description document:
Title: National Weather Service Instruction 10-1605, August 17, 2007, Operations And Services, Performance, NWSPD 10-16, Storm Data Preparation
File: repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
Available at: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

Tools used:
- Acer A517-51G, Intel Core i5-8250U 1.60GHz, 12.0 GB, Windows 10 Home 64 bit
- R 3.4.3 (RStudio)
- MS Excel

Data Processing

Loading data

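# Download the compressed dataset only if it is not already present
if (!file.exists("data.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "data.csv.bz2", mode = "wb")
}
# read.csv() reads the bzip2-compressed file directly
a <- read.csv("data.csv.bz2")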
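`read.csv()` decompresses `.bz2` files on the fly, so the archive does not need to be unpacked first. The sketch below uses a tiny made-up file written to a temporary location as a stand-in for the real dataset. It also passes `stringsAsFactors = FALSE` (the default since R 4.0.0), which keeps text columns as character vectors, so the factor-to-character conversion further down would not be needed.

```r
# Hypothetical stand-in for StormData.csv.bz2: a tiny bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() detects the bzip2 compression and reads the file directly;
# stringsAsFactors = FALSE keeps EVTYPE as a character vector
storm <- read.csv(tmp, stringsAsFactors = FALSE)
str(storm)
```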

Cleaning data

Let’s look at the columns first:

data <- a
names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We will select only the columns we need, i.e. the event type and the columns summarising fatalities, injuries and economic damage:

library(dplyr)
data <- select(data, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
str(data)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...

We will convert the factors into character vectors:

data$EVTYPE <- as.character(data$EVTYPE)
data$PROPDMGEXP <- as.character(data$PROPDMGEXP)
data$CROPDMGEXP <- as.character(data$CROPDMGEXP)
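The three assignments above can also be written as a single step that converts every factor column, whatever its name. A minimal base-R sketch on a made-up data frame (column names borrowed from the dataset, values invented):

```r
# Toy data frame standing in for the selected storm data
toy <- data.frame(EVTYPE = factor(c("HAIL", "TORNADO")),
                  PROPDMG = c(25, 2.5),
                  PROPDMGEXP = factor(c("K", "M")))
# Convert every factor column to character; other columns are left unchanged.
# Assigning to toy[] keeps the result a data frame.
toy[] <- lapply(toy, function(x) if (is.factor(x)) as.character(x) else x)
str(toy)
```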

Now let’s see what the data look like. First, check for NAs and empty strings:

table(is.na(data))
## 
##   FALSE 
## 6316079
table(data == "")
## 
##   FALSE    TRUE 
## 5231732 1084347

So there are no NAs, but there are empty values. Let’s look at them more closely, column by column:

lapply(data, function(x) sum(x==""))
## $EVTYPE
## [1] 0
## 
## $FATALITIES
## [1] 0
## 
## $INJURIES
## [1] 0
## 
## $PROPDMG
## [1] 0
## 
## $PROPDMGEXP
## [1] 465934
## 
## $CROPDMG
## [1] 0
## 
## $CROPDMGEXP
## [1] 618413
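The same per-column counts can be obtained more compactly with `colSums()`, because comparing a data frame with `""` returns a logical matrix with one column per variable. A sketch on made-up rows:

```r
# Made-up rows; numeric columns are coerced to character for the comparison
toy <- data.frame(EVTYPE = c("HAIL", "TORNADO", "FLOOD"),
                  PROPDMG = c(0, 25, 0),
                  PROPDMGEXP = c("", "K", ""),
                  stringsAsFactors = FALSE)
# toy == "" yields a logical matrix; colSums() counts the TRUE values per column
empty_counts <- colSums(toy == "")
empty_counts
```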

So the empty values occur only in the columns PROPDMGEXP and CROPDMGEXP, which contain the characters signifying the magnitude of the damage. Therefore, they should be empty only when the corresponding damage is zero. Let’s check that:

nrow(filter(data, PROPDMGEXP == "" & PROPDMG !=0))
## [1] 76
nrow(filter(data, CROPDMGEXP == "" & CROPDMG !=0))
## [1] 3

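However, the magnitude character is missing for 76 observations with non-zero property damage and for 3 with non-zero crop damage. Since the magnitude of that damage cannot be determined, these observations are unusable, so we remove them: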

data <- filter(data, !(PROPDMGEXP == "" & PROPDMG !=0))
data <- filter(data, !(CROPDMGEXP == "" & CROPDMG !=0))
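The two filters can also be expressed as one logical condition, which makes it easy to report how many rows were dropped. A base-R sketch on made-up rows (the column names match the dataset; the values do not):

```r
# Made-up rows: rows 3 and 4 have non-zero damage but no magnitude character
toy <- data.frame(PROPDMG    = c(0, 25, 3, 0),
                  PROPDMGEXP = c("", "K", "", ""),
                  CROPDMG    = c(0, 0, 0, 5),
                  CROPDMGEXP = c("", "", "", ""),
                  stringsAsFactors = FALSE)
n_before <- nrow(toy)
# A row is dropped if either damage value is non-zero while its magnitude is empty
bad <- (toy$PROPDMGEXP == "" & toy$PROPDMG != 0) |
       (toy$CROPDMGEXP == "" & toy$CROPDMG != 0)
toy <- toy[!bad, ]
n_dropped <- n_before - nrow(toy)
n_dropped
```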

Let’s now examine the damage-related columns further. The numerical columns are fine: they have class “num” and contain neither NAs nor empty fields. What about the character columns? Apart from empty strings for zero damage, the only acceptable values there are K (thousands), M (millions) and B (billions).

unique(data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(data$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "0" "k" "?" "2"

Those are clearly not the only values appearing there. First, let’s convert the acceptable lower-case characters to upper case:

data[data$PROPDMGEXP == "m",]$PROPDMGEXP <- "M"
data[data$CROPDMGEXP == "m",]$CROPDMGEXP <- "M"
data[data$CROPDMGEXP == "k",]$CROPDMGEXP <- "K"
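A more compact alternative is `toupper()`, which converts a whole column at once and does not depend on which lower-case variants happen to occur (the row-subset assignment above stops with an error if a given variant has no matching rows). It also turns “h” into “H”, which is harmless here because “H” is not an accepted character either. A sketch on made-up codes:

```r
# Made-up magnitude codes resembling those in PROPDMGEXP / CROPDMGEXP
exp_codes <- c("K", "m", "k", "B", "h", "+", "")
# toupper() upper-cases every letter in one pass; symbols and "" are unchanged
exp_upper <- toupper(exp_codes)
exp_upper
# On the real data this would be, e.g.:
# data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
```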

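What about the characters that make no sense, such as “+”, “?”, “H” or digits? Some of them may have been entered arbitrarily in records with zero damage, where the character has no effect on the result; we keep those observations so that their fatality and injury counts are preserved. Where the damage is non-zero, however, its magnitude cannot be interpreted, so those observations are deleted: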

# Keep a row if its magnitude character is valid or its damage is zero
data <- filter(data, PROPDMGEXP %in% c("M", "K", "B") | PROPDMG == 0)
data <- filter(data, CROPDMGEXP %in% c("M", "K", "B") | CROPDMG == 0)
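Before the magnitude characters are used, a quick sanity check can confirm the rule the filter is meant to enforce: every row with non-zero damage carries K, M or B, while zero-damage rows may keep any character. A sketch on made-up rows, using base R's `stopifnot()`:

```r
# Made-up rows that already satisfy the cleaning rule
toy <- data.frame(PROPDMG    = c(0, 25, 0, 1.5),
                  PROPDMGEXP = c("?", "K", "", "B"),
                  stringsAsFactors = FALSE)
# Only the magnitude characters of non-zero damage values matter
valid_exp <- all(toy$PROPDMGEXP[toy$PROPDMG != 0] %in% c("K", "M", "B"))
# Stops with an error if an uninterpretable character slipped through
stopifnot(valid_exp)
```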

The “EXP” columns are now clean, so we can use them to calculate the actual dollar value of the economic damage:

data[data$PROPDMGEXP == "K",]$PROPDMG <- data[data$PROPDMGEXP == "K",]$PROPDMG * 1000
data[data$PROPDMGEXP == "M",]$PROPDMG <- data[data$PROPDMGEXP == "M",]$PROPDMG * 1000000
data[data$PROPDMGEXP == "B",]$PROPDMG <- data[data$PROPDMGEXP == "B",]$PROPDMG * 1000000000
data[data$CROPDMGEXP == "K",]$CROPDMG <- data[data$CROPDMGEXP == "K",]$CROPDMG * 1000
data[data$CROPDMGEXP == "M",]$CROPDMG <- data[data$CROPDMGEXP == "M",]$CROPDMG * 1000000
data[data$CROPDMGEXP == "B",]$CROPDMG <- data[data$CROPDMGEXP == "B",]$CROPDMG * 1000000000
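The six assignments above can also be driven by a named lookup vector that maps each magnitude character to its multiplier. The sketch below shows this for one column on made-up values; characters that are not in the table (such as the empty string) get a multiplier of 1, which leaves zero-damage rows at zero, as in the code above.

```r
# Made-up damage values with their magnitude characters
toy <- data.frame(PROPDMG    = c(25, 2.5, 0, 1.5),
                  PROPDMGEXP = c("K", "M", "", "B"),
                  stringsAsFactors = FALSE)
# Named lookup vector: magnitude character -> multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
# Indexing by name returns NA for characters not in the table ("" included)
factor_used <- multiplier[toy$PROPDMGEXP]
factor_used[is.na(factor_used)] <- 1
toy$PROPDMG <- toy$PROPDMG * unname(factor_used)
toy$PROPDMG
```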

Now let’s move to the EVTYPE column:

unique(sort(data$EVTYPE))
##   [1] "   HIGH SURF ADVISORY"          " COASTAL FLOOD"                
##   [3] " FLASH FLOOD"                   " LIGHTNING"                    
##   [5] " TSTM WIND"                     " TSTM WIND (G45)"              
##   [7] " WATERSPOUT"                    " WIND"                         
##   [9] "?"                              "ABNORMAL WARMTH"               
##  [11] "ABNORMALLY DRY"                 "ABNORMALLY WET"                
##  [13] "ACCUMULATED SNOWFALL"           "AGRICULTURAL FREEZE"           
##  [15] "APACHE COUNTY"                  "ASTRONOMICAL HIGH TIDE"        
##  [17] "ASTRONOMICAL LOW TIDE"          "AVALANCE"                      
##  [19] "AVALANCHE"                      "BEACH EROSIN"                  
##  [21] "Beach Erosion"                  "BEACH EROSION"                 
##  [23] "BEACH EROSION/COASTAL FLOOD"    "BEACH FLOOD"                   
##  [25] "BELOW NORMAL PRECIPITATION"     "BITTER WIND CHILL"             
##  [27] "BITTER WIND CHILL TEMPERATURES" "Black Ice"                     
##  [29] "BLACK ICE"                      "BLIZZARD"                      
##  [31] "BLIZZARD AND EXTREME WIND CHIL" "BLIZZARD AND HEAVY SNOW"       
##  [33] "Blizzard Summary"               "BLIZZARD WEATHER"              
##  [35] "BLIZZARD/FREEZING RAIN"         "BLIZZARD/HEAVY SNOW"           
##  [37] "BLIZZARD/HIGH WIND"             "BLIZZARD/WINTER STORM"         
##  [39] "BLOW-OUT TIDE"                  "BLOW-OUT TIDES"                
##  [41] "BLOWING DUST"                   "blowing snow"                  
##  [43] "Blowing Snow"                   "BLOWING SNOW"                  
##  [45] "BLOWING SNOW- EXTREME WIND CHI" "BLOWING SNOW & EXTREME WIND CH"
##  [47] "BLOWING SNOW/EXTREME WIND CHIL" "BRUSH FIRE"                    
##  [49] "BRUSH FIRES"                    "COASTAL  FLOODING/EROSION"     
##  [51] "COASTAL EROSION"                "Coastal Flood"                 
##  [53] "COASTAL FLOOD"                  "coastal flooding"              
##  [55] "Coastal Flooding"               "COASTAL FLOODING"              
##  [57] "COASTAL FLOODING/EROSION"       "Coastal Storm"                 
##  [59] "COASTAL STORM"                  "COASTAL SURGE"                 
##  [61] "COASTAL/TIDAL FLOOD"            "COASTALFLOOD"                  
##  [63] "COASTALSTORM"                   "Cold"                          
##  [65] "COLD"                           "COLD AIR FUNNEL"               
##  [67] "COLD AIR FUNNELS"               "COLD AIR TORNADO"              
##  [69] "Cold and Frost"                 "COLD AND FROST"                
##  [71] "COLD AND SNOW"                  "COLD AND WET CONDITIONS"       
##  [73] "Cold Temperature"               "COLD TEMPERATURES"             
##  [75] "COLD WAVE"                      "COLD WEATHER"                  
##  [77] "COLD WIND CHILL TEMPERATURES"   "COLD/WIND CHILL"               
##  [79] "COLD/WINDS"                     "COOL AND WET"                  
##  [81] "COOL SPELL"                     "CSTL FLOODING/EROSION"         
##  [83] "DAM BREAK"                      "DAM FAILURE"                   
##  [85] "Damaging Freeze"                "DAMAGING FREEZE"               
##  [87] "DEEP HAIL"                      "DENSE FOG"                     
##  [89] "DENSE SMOKE"                    "DOWNBURST"                     
##  [91] "DOWNBURST WINDS"                "DRIEST MONTH"                  
##  [93] "Drifting Snow"                  "DROUGHT"                       
##  [95] "DROUGHT/EXCESSIVE HEAT"         "DROWNING"                      
##  [97] "DRY"                            "DRY CONDITIONS"                
##  [99] "DRY HOT WEATHER"                "DRY MICROBURST"                
## [101] "DRY MICROBURST 50"              "DRY MICROBURST 53"             
## [103] "DRY MICROBURST 58"              "DRY MICROBURST 61"             
## [105] "DRY MICROBURST 84"              "DRY MICROBURST WINDS"          
## [107] "DRY MIRCOBURST WINDS"           "DRY PATTERN"                   
## [109] "DRY SPELL"                      "DRY WEATHER"                   
## [111] "DRYNESS"                        "DUST DEVEL"                    
## [113] "Dust Devil"                     "DUST DEVIL"                    
## [115] "DUST DEVIL WATERSPOUT"          "DUST STORM"                    
## [117] "DUST STORM/HIGH WINDS"          "DUSTSTORM"                     
## [119] "EARLY FREEZE"                   "Early Frost"                   
## [121] "EARLY FROST"                    "EARLY RAIN"                    
## [123] "EARLY SNOW"                     "Early snowfall"                
## [125] "EARLY SNOWFALL"                 "Erosion/Cstl Flood"            
## [127] "EXCESSIVE"                      "Excessive Cold"                
## [129] "EXCESSIVE HEAT"                 "EXCESSIVE HEAT/DROUGHT"        
## [131] "EXCESSIVE PRECIPITATION"        "EXCESSIVE RAIN"                
## [133] "EXCESSIVE RAINFALL"             "EXCESSIVE SNOW"                
## [135] "EXCESSIVE WETNESS"              "EXCESSIVELY DRY"               
## [137] "Extended Cold"                  "Extreme Cold"                  
## [139] "EXTREME COLD"                   "EXTREME COLD/WIND CHILL"       
## [141] "EXTREME HEAT"                   "EXTREME WIND CHILL"            
## [143] "EXTREME WIND CHILL/BLOWING SNO" "EXTREME WIND CHILLS"           
## [145] "EXTREME WINDCHILL"              "EXTREME WINDCHILL TEMPERATURES"
## [147] "EXTREME/RECORD COLD"            "EXTREMELY WET"                 
## [149] "FALLING SNOW/ICE"               "FIRST FROST"                   
## [151] "FIRST SNOW"                     "FLASH FLOOD"                   
## [153] "FLASH FLOOD - HEAVY RAIN"       "FLASH FLOOD FROM ICE JAMS"     
## [155] "FLASH FLOOD LANDSLIDES"         "FLASH FLOOD/"                  
## [157] "FLASH FLOOD/ FLOOD"             "FLASH FLOOD/ STREET"           
## [159] "FLASH FLOOD/FLOOD"              "FLASH FLOOD/HEAVY RAIN"        
## [161] "FLASH FLOOD/LANDSLIDE"          "FLASH FLOODING"                
## [163] "FLASH FLOODING/FLOOD"           "FLASH FLOODING/THUNDERSTORM WI"
## [165] "FLASH FLOODS"                   "FLASH FLOOODING"               
## [167] "Flood"                          "FLOOD"                         
## [169] "FLOOD & HEAVY RAIN"             "FLOOD FLASH"                   
## [171] "FLOOD FLOOD/FLASH"              "FLOOD WATCH/"                  
## [173] "FLOOD/FLASH"                    "Flood/Flash Flood"             
## [175] "FLOOD/FLASH FLOOD"              "FLOOD/FLASH FLOODING"          
## [177] "FLOOD/FLASH/FLOOD"              "FLOOD/FLASHFLOOD"              
## [179] "FLOOD/RAIN/WIND"                "FLOOD/RAIN/WINDS"              
## [181] "FLOOD/RIVER FLOOD"              "Flood/Strong Wind"             
## [183] "FLOODING"                       "FLOODS"                        
## [185] "FOG"                            "FOG AND COLD TEMPERATURES"     
## [187] "FOREST FIRES"                   "Freeze"                        
## [189] "FREEZE"                         "Freezing drizzle"              
## [191] "Freezing Drizzle"               "FREEZING DRIZZLE"              
## [193] "FREEZING DRIZZLE AND FREEZING"  "Freezing Fog"                  
## [195] "FREEZING FOG"                   "Freezing rain"                 
## [197] "Freezing Rain"                  "FREEZING RAIN"                 
## [199] "FREEZING RAIN AND SLEET"        "FREEZING RAIN AND SNOW"        
## [201] "FREEZING RAIN SLEET AND"        "FREEZING RAIN SLEET AND LIGHT" 
## [203] "FREEZING RAIN/SLEET"            "FREEZING RAIN/SNOW"            
## [205] "Freezing Spray"                 "Frost"                         
## [207] "FROST"                          "Frost/Freeze"                  
## [209] "FROST/FREEZE"                   "FROST\\FREEZE"                 
## [211] "FUNNEL"                         "Funnel Cloud"                  
## [213] "FUNNEL CLOUD"                   "FUNNEL CLOUD."                 
## [215] "FUNNEL CLOUD/HAIL"              "FUNNEL CLOUDS"                 
## [217] "FUNNELS"                        "Glaze"                         
## [219] "GLAZE"                          "GLAZE ICE"                     
## [221] "GLAZE/ICE STORM"                "gradient wind"                 
## [223] "Gradient wind"                  "GRADIENT WIND"                 
## [225] "GRADIENT WINDS"                 "GRASS FIRES"                   
## [227] "GROUND BLIZZARD"                "GUSTNADO"                      
## [229] "GUSTNADO AND"                   "GUSTY LAKE WIND"               
## [231] "GUSTY THUNDERSTORM WIND"        "GUSTY THUNDERSTORM WINDS"      
## [233] "Gusty Wind"                     "GUSTY WIND"                    
## [235] "GUSTY WIND/HAIL"                "GUSTY WIND/HVY RAIN"           
## [237] "Gusty wind/rain"                "Gusty winds"                   
## [239] "Gusty Winds"                    "GUSTY WINDS"                   
## [241] "HAIL"                           "HAIL 0.75"                     
## [243] "HAIL 0.88"                      "HAIL 075"                      
## [245] "HAIL 088"                       "HAIL 1.00"                     
## [247] "HAIL 1.75"                      "HAIL 1.75)"                    
## [249] "HAIL 100"                       "HAIL 125"                      
## [251] "HAIL 150"                       "HAIL 175"                      
## [253] "HAIL 200"                       "HAIL 225"                      
## [255] "HAIL 275"                       "HAIL 450"                      
## [257] "HAIL 75"                        "HAIL 80"                       
## [259] "HAIL 88"                        "HAIL ALOFT"                    
## [261] "HAIL DAMAGE"                    "HAIL FLOODING"                 
## [263] "HAIL STORM"                     "Hail(0.75)"                    
## [265] "HAIL/ICY ROADS"                 "HAIL/WIND"                     
## [267] "HAIL/WINDS"                     "HAILSTORM"                     
## [269] "HAILSTORMS"                     "HARD FREEZE"                   
## [271] "HAZARDOUS SURF"                 "HEAT"                          
## [273] "HEAT DROUGHT"                   "Heat Wave"                     
## [275] "HEAT WAVE"                      "HEAT WAVE DROUGHT"             
## [277] "HEAT WAVES"                     "HEAT/DROUGHT"                  
## [279] "Heatburst"                      "HEAVY LAKE SNOW"               
## [281] "HEAVY MIX"                      "HEAVY PRECIPATATION"           
## [283] "Heavy Precipitation"            "HEAVY PRECIPITATION"           
## [285] "Heavy rain"                     "Heavy Rain"                    
## [287] "HEAVY RAIN"                     "HEAVY RAIN AND FLOOD"          
## [289] "Heavy Rain and Wind"            "HEAVY RAIN EFFECTS"            
## [291] "HEAVY RAIN/FLOODING"            "Heavy Rain/High Surf"          
## [293] "HEAVY RAIN/LIGHTNING"           "HEAVY RAIN/MUDSLIDES/FLOOD"    
## [295] "HEAVY RAIN/SEVERE WEATHER"      "HEAVY RAIN/SMALL STREAM URBAN" 
## [297] "HEAVY RAIN/SNOW"                "HEAVY RAIN/URBAN FLOOD"        
## [299] "HEAVY RAIN/WIND"                "HEAVY RAIN; URBAN FLOOD WINDS;"
## [301] "HEAVY RAINFALL"                 "HEAVY RAINS"                   
## [303] "HEAVY RAINS/FLOODING"           "HEAVY SEAS"                    
## [305] "HEAVY SHOWER"                   "HEAVY SHOWERS"                 
## [307] "HEAVY SNOW"                     "HEAVY SNOW-SQUALLS"            
## [309] "HEAVY SNOW   FREEZING RAIN"     "HEAVY SNOW & ICE"              
## [311] "HEAVY SNOW AND"                 "HEAVY SNOW AND HIGH WINDS"     
## [313] "HEAVY SNOW AND ICE"             "HEAVY SNOW AND ICE STORM"      
## [315] "HEAVY SNOW AND STRONG WINDS"    "HEAVY SNOW ANDBLOWING SNOW"    
## [317] "Heavy snow shower"              "HEAVY SNOW SQUALLS"            
## [319] "HEAVY SNOW/BLIZZARD"            "HEAVY SNOW/BLIZZARD/AVALANCHE" 
## [321] "HEAVY SNOW/BLOWING SNOW"        "HEAVY SNOW/FREEZING RAIN"      
## [323] "HEAVY SNOW/HIGH"                "HEAVY SNOW/HIGH WIND"          
## [325] "HEAVY SNOW/HIGH WINDS"          "HEAVY SNOW/HIGH WINDS & FLOOD" 
## [327] "HEAVY SNOW/HIGH WINDS/FREEZING" "HEAVY SNOW/ICE"                
## [329] "HEAVY SNOW/ICE STORM"           "HEAVY SNOW/SLEET"              
## [331] "HEAVY SNOW/SQUALLS"             "HEAVY SNOW/WIND"               
## [333] "HEAVY SNOW/WINTER STORM"        "HEAVY SNOWPACK"                
## [335] "Heavy Surf"                     "HEAVY SURF"                    
## [337] "Heavy surf and wind"            "HEAVY SURF COASTAL FLOODING"   
## [339] "HEAVY SURF/HIGH SURF"           "HEAVY SWELLS"                  
## [341] "HEAVY WET SNOW"                 "HIGH"                          
## [343] "HIGH  SWELLS"                   "HIGH  WINDS"                   
## [345] "HIGH SEAS"                      "High Surf"                     
## [347] "HIGH SURF"                      "HIGH SURF ADVISORIES"          
## [349] "HIGH SURF ADVISORY"             "HIGH SWELLS"                   
## [351] "HIGH TEMPERATURE RECORD"        "HIGH TIDES"                    
## [353] "HIGH WATER"                     "HIGH WAVES"                    
## [355] "High Wind"                      "HIGH WIND"                     
## [357] "HIGH WIND (G40)"                "HIGH WIND 48"                  
## [359] "HIGH WIND 63"                   "HIGH WIND 70"                  
## [361] "HIGH WIND AND HEAVY SNOW"       "HIGH WIND AND HIGH TIDES"      
## [363] "HIGH WIND AND SEAS"             "HIGH WIND DAMAGE"              
## [365] "HIGH WIND/ BLIZZARD"            "HIGH WIND/BLIZZARD"            
## [367] "HIGH WIND/BLIZZARD/FREEZING RA" "HIGH WIND/HEAVY SNOW"          
## [369] "HIGH WIND/LOW WIND CHILL"       "HIGH WIND/SEAS"                
## [371] "HIGH WIND/WIND CHILL"           "HIGH WIND/WIND CHILL/BLIZZARD" 
## [373] "HIGH WINDS"                     "HIGH WINDS 55"                 
## [375] "HIGH WINDS 57"                  "HIGH WINDS 58"                 
## [377] "HIGH WINDS 63"                  "HIGH WINDS 66"                 
## [379] "HIGH WINDS 67"                  "HIGH WINDS 73"                 
## [381] "HIGH WINDS 76"                  "HIGH WINDS 80"                 
## [383] "HIGH WINDS 82"                  "HIGH WINDS AND WIND CHILL"     
## [385] "HIGH WINDS DUST STORM"          "HIGH WINDS HEAVY RAINS"        
## [387] "HIGH WINDS/"                    "HIGH WINDS/COASTAL FLOOD"      
## [389] "HIGH WINDS/COLD"                "HIGH WINDS/FLOODING"           
## [391] "HIGH WINDS/HEAVY RAIN"          "HIGH WINDS/SNOW"               
## [393] "HIGHWAY FLOODING"               "Hot and Dry"                   
## [395] "HOT PATTERN"                    "HOT SPELL"                     
## [397] "HOT WEATHER"                    "HOT/DRY PATTERN"               
## [399] "HURRICANE"                      "HURRICANE-GENERATED SWELLS"    
## [401] "Hurricane Edouard"              "HURRICANE EMILY"               
## [403] "HURRICANE ERIN"                 "HURRICANE FELIX"               
## [405] "HURRICANE GORDON"               "HURRICANE OPAL"                
## [407] "HURRICANE OPAL/HIGH WINDS"      "HURRICANE/TYPHOON"             
## [409] "HVY RAIN"                       "HYPERTHERMIA/EXPOSURE"         
## [411] "HYPOTHERMIA"                    "Hypothermia/Exposure"          
## [413] "HYPOTHERMIA/EXPOSURE"           "ICE"                           
## [415] "ICE AND SNOW"                   "ICE FLOES"                     
## [417] "Ice Fog"                        "ICE JAM"                       
## [419] "Ice jam flood (minor"           "ICE JAM FLOODING"              
## [421] "ICE ON ROAD"                    "ICE PELLETS"                   
## [423] "ICE ROADS"                      "ICE STORM"                     
## [425] "ICE STORM AND SNOW"             "ICE STORM/FLASH FLOOD"         
## [427] "Ice/Snow"                       "ICE/SNOW"                      
## [429] "ICE/STRONG WINDS"               "Icestorm/Blizzard"             
## [431] "Icy Roads"                      "ICY ROADS"                     
## [433] "LACK OF SNOW"                   "LAKE-EFFECT SNOW"              
## [435] "Lake Effect Snow"               "LAKE EFFECT SNOW"              
## [437] "LAKE FLOOD"                     "LAKESHORE FLOOD"               
## [439] "LANDSLIDE"                      "LANDSLIDE/URBAN FLOOD"         
## [441] "LANDSLIDES"                     "Landslump"                     
## [443] "LANDSLUMP"                      "LANDSPOUT"                     
## [445] "LARGE WALL CLOUD"               "Late-season Snowfall"          
## [447] "LATE FREEZE"                    "LATE SEASON HAIL"              
## [449] "LATE SEASON SNOW"               "Late Season Snowfall"          
## [451] "LATE SNOW"                      "LIGHT FREEZING RAIN"           
## [453] "Light snow"                     "Light Snow"                    
## [455] "LIGHT SNOW"                     "LIGHT SNOW AND SLEET"          
## [457] "Light Snow/Flurries"            "LIGHT SNOW/FREEZING PRECIP"    
## [459] "Light Snowfall"                 "LIGHTING"                      
## [461] "LIGHTNING"                      "LIGHTNING  WAUSEON"            
## [463] "LIGHTNING AND HEAVY RAIN"       "LIGHTNING AND THUNDERSTORM WIN"
## [465] "LIGHTNING AND WINDS"            "LIGHTNING DAMAGE"              
## [467] "LIGHTNING FIRE"                 "LIGHTNING INJURY"              
## [469] "LIGHTNING THUNDERSTORM WINDS"   "LIGHTNING THUNDERSTORM WINDSS" 
## [471] "LIGHTNING."                     "LIGHTNING/HEAVY RAIN"          
## [473] "LIGNTNING"                      "LOCAL FLASH FLOOD"             
## [475] "LOCAL FLOOD"                    "LOCALLY HEAVY RAIN"            
## [477] "LOW TEMPERATURE"                "LOW TEMPERATURE RECORD"        
## [479] "LOW WIND CHILL"                 "MAJOR FLOOD"                   
## [481] "Marine Accident"                "MARINE HAIL"                   
## [483] "MARINE HIGH WIND"               "MARINE MISHAP"                 
## [485] "MARINE STRONG WIND"             "MARINE THUNDERSTORM WIND"      
## [487] "MARINE TSTM WIND"               "Metro Storm, May 26"           
## [489] "Microburst"                     "MICROBURST"                    
## [491] "MICROBURST WINDS"               "Mild and Dry Pattern"          
## [493] "MILD PATTERN"                   "MILD/DRY PATTERN"              
## [495] "MINOR FLOOD"                    "Minor Flooding"                
## [497] "MINOR FLOODING"                 "MIXED PRECIP"                  
## [499] "Mixed Precipitation"            "MIXED PRECIPITATION"           
## [501] "MODERATE SNOW"                  "MODERATE SNOWFALL"             
## [503] "MONTHLY PRECIPITATION"          "Monthly Rainfall"              
## [505] "MONTHLY RAINFALL"               "Monthly Snowfall"              
## [507] "MONTHLY SNOWFALL"               "MONTHLY TEMPERATURE"           
## [509] "Mountain Snows"                 "MUD SLIDE"                     
## [511] "MUD SLIDES"                     "MUD SLIDES URBAN FLOODING"     
## [513] "MUD/ROCK SLIDE"                 "Mudslide"                      
## [515] "MUDSLIDE"                       "MUDSLIDE/LANDSLIDE"            
## [517] "Mudslides"                      "MUDSLIDES"                     
## [519] "NEAR RECORD SNOW"               "No Severe Weather"             
## [521] "NON-SEVERE WIND DAMAGE"         "NON-TSTM WIND"                 
## [523] "NON SEVERE HAIL"                "NON TSTM WIND"                 
## [525] "NONE"                           "NORMAL PRECIPITATION"          
## [527] "NORTHERN LIGHTS"                "Other"                         
## [529] "OTHER"                          "PATCHY DENSE FOG"              
## [531] "PATCHY ICE"                     "Prolong Cold"                  
## [533] "PROLONG COLD"                   "PROLONG COLD/SNOW"             
## [535] "PROLONG WARMTH"                 "PROLONGED RAIN"                
## [537] "RAIN"                           "RAIN (HEAVY)"                  
## [539] "RAIN AND WIND"                  "Rain Damage"                   
## [541] "RAIN/SNOW"                      "RAIN/WIND"                     
## [543] "RAINSTORM"                      "RAPIDLY RISING WATER"          
## [545] "RECORD  COLD"                   "Record Cold"                   
## [547] "RECORD COLD"                    "RECORD COLD AND HIGH WIND"     
## [549] "RECORD COLD/FROST"              "RECORD COOL"                   
## [551] "Record dry month"               "RECORD DRYNESS"                
## [553] "Record Heat"                    "RECORD HEAT"                   
## [555] "RECORD HEAT WAVE"               "Record High"                   
## [557] "RECORD HIGH"                    "RECORD HIGH TEMPERATURE"       
## [559] "RECORD HIGH TEMPERATURES"       "RECORD LOW"                    
## [561] "RECORD LOW RAINFALL"            "Record May Snow"               
## [563] "RECORD PRECIPITATION"           "RECORD RAINFALL"               
## [565] "RECORD SNOW"                    "RECORD SNOW/COLD"              
## [567] "RECORD SNOWFALL"                "Record temperature"            
## [569] "RECORD TEMPERATURE"             "Record Temperatures"           
## [571] "RECORD TEMPERATURES"            "RECORD WARM"                   
## [573] "RECORD WARM TEMPS."             "Record Warmth"                 
## [575] "RECORD WARMTH"                  "Record Winter Snow"            
## [577] "RECORD/EXCESSIVE HEAT"          "RECORD/EXCESSIVE RAINFALL"     
## [579] "RED FLAG CRITERIA"              "RED FLAG FIRE WX"              
## [581] "REMNANTS OF FLOYD"              "RIP CURRENT"                   
## [583] "RIP CURRENTS"                   "RIP CURRENTS HEAVY SURF"       
## [585] "RIP CURRENTS/HEAVY SURF"        "RIVER AND STREAM FLOOD"        
## [587] "RIVER FLOOD"                    "River Flooding"                
## [589] "RIVER FLOODING"                 "ROCK SLIDE"                    
## [591] "ROGUE WAVE"                     "ROTATING WALL CLOUD"           
## [593] "ROUGH SEAS"                     "ROUGH SURF"                    
## [595] "RURAL FLOOD"                    "Saharan Dust"                  
## [597] "SAHARAN DUST"                   "Seasonal Snowfall"             
## [599] "SEICHE"                         "SEVERE COLD"                   
## [601] "SEVERE THUNDERSTORM"            "SEVERE THUNDERSTORM WINDS"     
## [603] "SEVERE THUNDERSTORMS"           "SEVERE TURBULENCE"             
## [605] "SLEET"                          "SLEET & FREEZING RAIN"         
## [607] "SLEET STORM"                    "SLEET/FREEZING RAIN"           
## [609] "SLEET/ICE STORM"                "SLEET/RAIN/SNOW"               
## [611] "SLEET/SNOW"                     "small hail"                    
## [613] "Small Hail"                     "SMALL HAIL"                    
## [615] "SMALL STREAM"                   "SMALL STREAM AND"              
## [617] "SMALL STREAM AND URBAN FLOOD"   "SMALL STREAM AND URBAN FLOODIN"
## [619] "SMALL STREAM FLOOD"             "SMALL STREAM FLOODING"         
## [621] "SMALL STREAM URBAN FLOOD"       "SMALL STREAM/URBAN FLOOD"      
## [623] "Sml Stream Fld"                 "SMOKE"                         
## [625] "Snow"                           "SNOW"                          
## [627] "SNOW- HIGH WIND- WIND CHILL"    "Snow Accumulation"             
## [629] "SNOW ACCUMULATION"              "SNOW ADVISORY"                 
## [631] "SNOW AND COLD"                  "SNOW AND HEAVY SNOW"           
## [633] "Snow and Ice"                   "SNOW AND ICE"                  
## [635] "SNOW AND ICE STORM"             "Snow and sleet"                
## [637] "SNOW AND SLEET"                 "SNOW AND WIND"                 
## [639] "SNOW DROUGHT"                   "SNOW FREEZING RAIN"            
## [641] "SNOW SHOWERS"                   "SNOW SLEET"                    
## [643] "SNOW SQUALL"                    "Snow squalls"                  
## [645] "Snow Squalls"                   "SNOW SQUALLS"                  
## [647] "SNOW/ BITTER COLD"              "SNOW/ ICE"                     
## [649] "SNOW/BLOWING SNOW"              "SNOW/COLD"                     
## [651] "SNOW/FREEZING RAIN"             "SNOW/HEAVY SNOW"               
## [653] "SNOW/HIGH WINDS"                "SNOW/ICE"                      
## [655] "SNOW/ICE STORM"                 "SNOW/RAIN"                     
## [657] "SNOW/RAIN/SLEET"                "SNOW/SLEET"                    
## [659] "SNOW/SLEET/FREEZING RAIN"       "SNOW/SLEET/RAIN"               
## [661] "SNOW\\COLD"                     "SNOWFALL RECORD"               
## [663] "SNOWMELT FLOODING"              "SNOWSTORM"                     
## [665] "SOUTHEAST"                      "STORM FORCE WINDS"             
## [667] "STORM SURGE"                    "STORM SURGE/TIDE"              
## [669] "STREAM FLOODING"                "STREET FLOOD"                  
## [671] "STREET FLOODING"                "Strong Wind"                   
## [673] "STRONG WIND"                    "STRONG WIND GUST"              
## [675] "Strong winds"                   "Strong Winds"                  
## [677] "STRONG WINDS"                   "Summary August 10"             
## [679] "Summary August 11"              "Summary August 17"             
## [681] "Summary August 2-3"             "Summary August 21"             
## [683] "Summary August 28"              "Summary August 4"              
## [685] "Summary August 7"               "Summary August 9"              
## [687] "Summary Jan 17"                 "Summary July 23-24"            
## [689] "Summary June 18-19"             "Summary June 5-6"              
## [691] "Summary June 6"                 "Summary of April 12"           
## [693] "Summary of April 13"            "Summary of April 21"           
## [695] "Summary of April 27"            "Summary of April 3rd"          
## [697] "Summary of August 1"            "Summary of July 11"            
## [699] "Summary of July 2"              "Summary of July 22"            
## [701] "Summary of July 26"             "Summary of July 29"            
## [703] "Summary of July 3"              "Summary of June 10"            
## [705] "Summary of June 11"             "Summary of June 12"            
## [707] "Summary of June 13"             "Summary of June 15"            
## [709] "Summary of June 16"             "Summary of June 18"            
## [711] "Summary of June 23"             "Summary of June 24"            
## [713] "Summary of June 3"              "Summary of June 30"            
## [715] "Summary of June 4"              "Summary of June 6"             
## [717] "Summary of March 14"            "Summary of March 23"           
## [719] "Summary of March 24"            "SUMMARY OF MARCH 24-25"        
## [721] "SUMMARY OF MARCH 27"            "SUMMARY OF MARCH 29"           
## [723] "Summary of May 10"              "Summary of May 13"             
## [725] "Summary of May 14"              "Summary of May 22"             
## [727] "Summary of May 22 am"           "Summary of May 22 pm"          
## [729] "Summary of May 26 am"           "Summary of May 26 pm"          
## [731] "Summary of May 31 am"           "Summary of May 31 pm"          
## [733] "Summary of May 9-10"            "Summary Sept. 25-26"           
## [735] "Summary September 20"           "Summary September 23"          
## [737] "Summary September 3"            "Summary September 4"           
## [739] "Summary: Nov. 16"               "Summary: Nov. 6-7"             
## [741] "Summary: Oct. 20-21"            "Summary: October 31"           
## [743] "Summary: Sept. 18"              "Temperature record"            
## [745] "THUDERSTORM WINDS"              "THUNDEERSTORM WINDS"           
## [747] "THUNDERESTORM WINDS"            "THUNDERSNOW"                   
## [749] "Thundersnow shower"             "THUNDERSTORM"                  
## [751] "THUNDERSTORM  WINDS"            "THUNDERSTORM DAMAGE"           
## [753] "THUNDERSTORM DAMAGE TO"         "THUNDERSTORM HAIL"             
## [755] "THUNDERSTORM W INDS"            "Thunderstorm Wind"             
## [757] "THUNDERSTORM WIND"              "THUNDERSTORM WIND (G40)"       
## [759] "THUNDERSTORM WIND 50"           "THUNDERSTORM WIND 52"          
## [761] "THUNDERSTORM WIND 56"           "THUNDERSTORM WIND 59"          
## [763] "THUNDERSTORM WIND 59 MPH"       "THUNDERSTORM WIND 59 MPH."     
## [765] "THUNDERSTORM WIND 60 MPH"       "THUNDERSTORM WIND 65 MPH"      
## [767] "THUNDERSTORM WIND 65MPH"        "THUNDERSTORM WIND 69"          
## [769] "THUNDERSTORM WIND 98 MPH"       "THUNDERSTORM WIND G50"         
## [771] "THUNDERSTORM WIND G51"          "THUNDERSTORM WIND G52"         
## [773] "THUNDERSTORM WIND G55"          "THUNDERSTORM WIND G60"         
## [775] "THUNDERSTORM WIND G61"          "THUNDERSTORM WIND TREES"       
## [777] "THUNDERSTORM WIND."             "THUNDERSTORM WIND/ TREE"       
## [779] "THUNDERSTORM WIND/ TREES"       "THUNDERSTORM WIND/AWNING"      
## [781] "THUNDERSTORM WIND/HAIL"         "THUNDERSTORM WIND/LIGHTNING"   
## [783] "THUNDERSTORM WINDS"             "THUNDERSTORM WINDS      LE CEN"
## [785] "THUNDERSTORM WINDS 13"          "THUNDERSTORM WINDS 2"          
## [787] "THUNDERSTORM WINDS 50"          "THUNDERSTORM WINDS 52"         
## [789] "THUNDERSTORM WINDS 53"          "THUNDERSTORM WINDS 60"         
## [791] "THUNDERSTORM WINDS 61"          "THUNDERSTORM WINDS 62"         
## [793] "THUNDERSTORM WINDS 63 MPH"      "THUNDERSTORM WINDS AND"        
## [795] "THUNDERSTORM WINDS FUNNEL CLOU" "THUNDERSTORM WINDS G"          
## [797] "THUNDERSTORM WINDS G60"         "THUNDERSTORM WINDS HAIL"       
## [799] "THUNDERSTORM WINDS HEAVY RAIN"  "THUNDERSTORM WINDS LIGHTNING"  
## [801] "THUNDERSTORM WINDS SMALL STREA" "THUNDERSTORM WINDS URBAN FLOOD"
## [803] "THUNDERSTORM WINDS."            "THUNDERSTORM WINDS/ FLOOD"     
## [805] "THUNDERSTORM WINDS/ HAIL"       "THUNDERSTORM WINDS/FLASH FLOOD"
## [807] "THUNDERSTORM WINDS/FLOODING"    "THUNDERSTORM WINDS/FUNNEL CLOU"
## [809] "THUNDERSTORM WINDS/HAIL"        "THUNDERSTORM WINDS/HEAVY RAIN" 
## [811] "THUNDERSTORM WINDS53"           "THUNDERSTORM WINDSHAIL"        
## [813] "THUNDERSTORM WINDSS"            "THUNDERSTORM WINS"             
## [815] "THUNDERSTORMS"                  "THUNDERSTORMS WIND"            
## [817] "THUNDERSTORMS WINDS"            "THUNDERSTORMW"                 
## [819] "THUNDERSTORMW 50"               "THUNDERSTORMW WINDS"           
## [821] "THUNDERSTORMWINDS"              "THUNDERSTROM WIND"             
## [823] "THUNDERSTROM WINDS"             "THUNDERTORM WINDS"             
## [825] "THUNDERTSORM WIND"              "THUNDESTORM WINDS"             
## [827] "THUNERSTORM WINDS"              "TIDAL FLOOD"                   
## [829] "Tidal Flooding"                 "TIDAL FLOODING"                
## [831] "TORNADO"                        "TORNADO DEBRIS"                
## [833] "TORNADO F0"                     "TORNADO F1"                    
## [835] "TORNADO F2"                     "TORNADO F3"                    
## [837] "TORNADO/WATERSPOUT"             "TORNADOES"                     
## [839] "TORNADOES, TSTM WIND, HAIL"     "TORNADOS"                      
## [841] "TORNDAO"                        "TORRENTIAL RAIN"               
## [843] "Torrential Rainfall"            "TROPICAL DEPRESSION"           
## [845] "TROPICAL STORM"                 "TROPICAL STORM ALBERTO"        
## [847] "TROPICAL STORM DEAN"            "TROPICAL STORM GORDON"         
## [849] "TROPICAL STORM JERRY"           "TSTM"                          
## [851] "TSTM HEAVY RAIN"                "Tstm Wind"                     
## [853] "TSTM WIND"                      "TSTM WIND  (G45)"              
## [855] "TSTM WIND (41)"                 "TSTM WIND (G35)"               
## [857] "TSTM WIND (G40)"                "TSTM WIND (G45)"               
## [859] "TSTM WIND 40"                   "TSTM WIND 45"                  
## [861] "TSTM WIND 50"                   "TSTM WIND 51"                  
## [863] "TSTM WIND 52"                   "TSTM WIND 55"                  
## [865] "TSTM WIND 65)"                  "TSTM WIND AND LIGHTNING"       
## [867] "TSTM WIND DAMAGE"               "TSTM WIND G45"                 
## [869] "TSTM WIND G58"                  "TSTM WIND/HAIL"                
## [871] "TSTM WINDS"                     "TSTM WND"                      
## [873] "TSTMW"                          "TSUNAMI"                       
## [875] "TUNDERSTORM WIND"               "TYPHOON"                       
## [877] "Unseasonable Cold"              "UNSEASONABLY COLD"             
## [879] "UNSEASONABLY COOL"              "UNSEASONABLY COOL & WET"       
## [881] "UNSEASONABLY DRY"               "UNSEASONABLY HOT"              
## [883] "UNSEASONABLY WARM"              "UNSEASONABLY WARM & WET"       
## [885] "UNSEASONABLY WARM AND DRY"      "UNSEASONABLY WARM YEAR"        
## [887] "UNSEASONABLY WARM/WET"          "UNSEASONABLY WET"              
## [889] "UNSEASONAL LOW TEMP"            "UNSEASONAL RAIN"               
## [891] "UNUSUAL WARMTH"                 "UNUSUAL/RECORD WARMTH"         
## [893] "UNUSUALLY COLD"                 "UNUSUALLY LATE SNOW"           
## [895] "UNUSUALLY WARM"                 "URBAN AND SMALL"               
## [897] "URBAN AND SMALL STREAM"         "URBAN AND SMALL STREAM FLOOD"  
## [899] "URBAN AND SMALL STREAM FLOODIN" "Urban flood"                   
## [901] "Urban Flood"                    "URBAN FLOOD"                   
## [903] "URBAN FLOOD LANDSLIDE"          "Urban Flooding"                
## [905] "URBAN FLOODING"                 "URBAN FLOODS"                  
## [907] "URBAN SMALL"                    "URBAN SMALL STREAM FLOOD"      
## [909] "URBAN/SMALL"                    "URBAN/SMALL FLOODING"          
## [911] "URBAN/SMALL STREAM"             "URBAN/SMALL STREAM  FLOOD"     
## [913] "URBAN/SMALL STREAM FLOOD"       "URBAN/SMALL STREAM FLOODING"   
## [915] "URBAN/SMALL STRM FLDG"          "URBAN/SML STREAM FLD"          
## [917] "URBAN/SML STREAM FLDG"          "URBAN/STREET FLOODING"         
## [919] "VERY DRY"                       "VERY WARM"                     
## [921] "VOG"                            "Volcanic Ash"                  
## [923] "VOLCANIC ASH"                   "Volcanic Ash Plume"            
## [925] "VOLCANIC ASHFALL"               "VOLCANIC ERUPTION"             
## [927] "WAKE LOW WIND"                  "WALL CLOUD"                    
## [929] "WALL CLOUD/FUNNEL CLOUD"        "WARM DRY CONDITIONS"           
## [931] "WARM WEATHER"                   "WATER SPOUT"                   
## [933] "WATERSPOUT"                     "WATERSPOUT-"                   
## [935] "WATERSPOUT-TORNADO"             "WATERSPOUT FUNNEL CLOUD"       
## [937] "WATERSPOUT TORNADO"             "WATERSPOUT/"                   
## [939] "WATERSPOUT/ TORNADO"            "WATERSPOUT/TORNADO"            
## [941] "WATERSPOUTS"                    "WAYTERSPOUT"                   
## [943] "wet micoburst"                  "WET MICROBURST"                
## [945] "Wet Month"                      "WET SNOW"                      
## [947] "WET WEATHER"                    "Wet Year"                      
## [949] "Whirlwind"                      "WHIRLWIND"                     
## [951] "WILD FIRES"                     "WILD/FOREST FIRE"              
## [953] "WILD/FOREST FIRES"              "WILDFIRE"                      
## [955] "WILDFIRES"                      "Wind"                          
## [957] "WIND"                           "WIND ADVISORY"                 
## [959] "WIND AND WAVE"                  "WIND CHILL"                    
## [961] "WIND CHILL/HIGH WIND"           "Wind Damage"                   
## [963] "WIND DAMAGE"                    "WIND GUSTS"                    
## [965] "WIND STORM"                     "WIND/HAIL"                     
## [967] "WINDS"                          "WINTER MIX"                    
## [969] "WINTER STORM"                   "WINTER STORM HIGH WINDS"       
## [971] "WINTER STORM/HIGH WIND"         "WINTER STORM/HIGH WINDS"       
## [973] "WINTER STORMS"                  "Winter Weather"                
## [975] "WINTER WEATHER"                 "WINTER WEATHER MIX"            
## [977] "WINTER WEATHER/MIX"             "WINTERY MIX"                   
## [979] "Wintry mix"                     "Wintry Mix"                    
## [981] "WINTRY MIX"                     "WND"

According to the description document, there are 48 permitted event types. The EVTYPE column of the dataset contains 982 distinct values, so it is clearly very untidy. Let’s try to clean it. First, we will convert all characters to upper case to make string matching easier:

data$EVTYPE <- toupper(data$EVTYPE)

Now let’s fix a few things that are obvious at first glance:

library(dplyr) # filter() comes from dplyr
data$EVTYPE <- gsub("^\\s+|\\s+$", "", data$EVTYPE) # remove leading and trailing white space
data$EVTYPE <- gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", data$EVTYPE, perl = TRUE) # collapse runs of white space between words into a single space
data <- filter(data, !grepl("SUMMARY", EVTYPE)) # drop "Summary of ..." entries, which are not event types
data <- filter(data, !grepl("\\?", EVTYPE)) # drop entries containing "?"
data$EVTYPE <- sub("TSTM", "THUNDERSTORM", data$EVTYPE) # expand the TSTM abbreviation
data$EVTYPE <- sub("WND", "WIND", data$EVTYPE) # expand the WND abbreviation
data$EVTYPE <- sub("W INDS", "WINDS", data$EVTYPE) # fix "THUNDERSTORM W INDS"
data$EVTYPE <- sub("THUNDERSTORMWINDS", "THUNDERSTORM WINDS", data$EVTYPE) # add the missing space
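For checking, the string fixes above can be collected into one function and tried on a few sample labels. This is a sketch, not the code that produced the results: the function name normalise_evtype() is hypothetical, it leaves out the two row filters, and it replaces the two regular expressions with trimws() plus a single gsub() that collapses runs of white space, which should have essentially the same effect on this data.

```r
# Hypothetical helper bundling the first-pass EVTYPE string fixes (row filters not included)
normalise_evtype <- function(x) {
  x <- toupper(as.character(x))                       # upper case for easier matching
  x <- gsub("\\s+", " ", trimws(x))                   # trim the ends, collapse inner runs of spaces
  x <- sub("TSTM", "THUNDERSTORM", x, fixed = TRUE)   # expand the TSTM abbreviation
  x <- sub("WND", "WIND", x, fixed = TRUE)            # expand the WND abbreviation
  x <- sub("W INDS", "WINDS", x, fixed = TRUE)        # fix "THUNDERSTORM W INDS"
  x <- sub("THUNDERSTORMWINDS", "THUNDERSTORM WINDS", x, fixed = TRUE)
  x
}

normalise_evtype(c("  Tstm Wind ", "THUNDERSTORM  WINDS", "TSTM WND", "THUNDERSTORM W INDS"))
# returns "THUNDERSTORM WIND" "THUNDERSTORM WINDS" "THUNDERSTORM WIND" "THUNDERSTORM WINDS"
```

Using fixed = TRUE makes each replacement a literal string substitution, so no regular-expression escaping is needed.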

I have created a .csv file from the description document containing the permitted event-type categories. We will use it to tidy the event types further:

names <- read.csv("Event_types.csv", header = FALSE)
names <- as.character(names[,1])
names <- toupper(names)
names
##  [1] "ASTRONOMICAL LOW TIDE"    "AVALANCHE"               
##  [3] "BLIZZARD"                 "COASTAL FLOOD"           
##  [5] "COLD/WIND CHILL"          "DEBRIS FLOW"             
##  [7] "DENSE FOG"                "DENSE SMOKE"             
##  [9] "DROUGHT"                  "DUST DEVIL"              
## [11] "DUST STORM"               "EXCESSIVE HEAT"          
## [13] "EXTREME COLD/WIND CHILL"  "FLASH FLOOD"             
## [15] "FLOOD"                    "FROST/FREEZE"            
## [17] "FUNNEL CLOUD"             "FREEZING FOG"            
## [19] "HAIL"                     "HEAT"                    
## [21] "HEAVY RAIN"               "HEAVY SNOW"              
## [23] "HIGH SURF"                "HIGH WIND"               
## [25] "HURRICANE (TYPHOON)"      "ICE STORM"               
## [27] "LAKE-EFFECT SNOW"         "LAKESHORE FLOOD"         
## [29] "LIGHTNING"                "MARINE HAIL"             
## [31] "MARINE HIGH WIND"         "MARINE STRONG WIND"      
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"             
## [35] "SEICHE"                   "SLEET"                   
## [37] "STORM SURGE/TIDE"         "STRONG WIND"             
## [39] "THUNDERSTORM WIND"        "TORNADO"                 
## [41] "TROPICAL DEPRESSION"      "TROPICAL STORM"          
## [43] "TSUNAMI"                  "VOLCANIC ASH"            
## [45] "WATERSPOUT"               "WILDFIRE"                
## [47] "WINTER STORM"             "WINTER WEATHER"
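Because Event_types.csv was prepared by hand from the description document and does not come with the dataset, readers who want to reproduce the analysis without it can type the same 48 names in directly. The vector below is copied from the output above; the variable name permitted is only illustrative (the report itself keeps using names).

```r
# The 48 permitted event types, copied from the printed contents of Event_types.csv
permitted <- c(
  "ASTRONOMICAL LOW TIDE", "AVALANCHE", "BLIZZARD", "COASTAL FLOOD",
  "COLD/WIND CHILL", "DEBRIS FLOW", "DENSE FOG", "DENSE SMOKE",
  "DROUGHT", "DUST DEVIL", "DUST STORM", "EXCESSIVE HEAT",
  "EXTREME COLD/WIND CHILL", "FLASH FLOOD", "FLOOD", "FROST/FREEZE",
  "FUNNEL CLOUD", "FREEZING FOG", "HAIL", "HEAT",
  "HEAVY RAIN", "HEAVY SNOW", "HIGH SURF", "HIGH WIND",
  "HURRICANE (TYPHOON)", "ICE STORM", "LAKE-EFFECT SNOW", "LAKESHORE FLOOD",
  "LIGHTNING", "MARINE HAIL", "MARINE HIGH WIND", "MARINE STRONG WIND",
  "MARINE THUNDERSTORM WIND", "RIP CURRENT", "SEICHE", "SLEET",
  "STORM SURGE/TIDE", "STRONG WIND", "THUNDERSTORM WIND", "TORNADO",
  "TROPICAL DEPRESSION", "TROPICAL STORM", "TSUNAMI", "VOLCANIC ASH",
  "WATERSPOUT", "WILDFIRE", "WINTER STORM", "WINTER WEATHER"
)
length(permitted)  # 48
```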

First, we will look for values that contain one of the permitted names but also contain something else (as in “HAIL 59”), and replace them with the permitted name:

for(i in 1:length(names)) {
  data$EVTYPE[grepl(names[i], data$EVTYPE)] <- names[i]
}
length(sort(unique(data$EVTYPE)))
## [1] 417
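Because grepl() does substring matching and the loop overwrites values in place, the order of the names matters. When one permitted name contains another (“FLASH FLOOD” and “FLOOD”, “MARINE HAIL” and “HAIL”, “EXCESSIVE HEAT” and “HEAT”), every value containing the longer name ends up as the shorter one, whichever comes first in the list. Also, grepl() reads each name as a regular expression, so the parentheses in “HURRICANE (TYPHOON)” act as a group rather than literal characters. The sketch below shows the effect on a toy vector and one possible alternative under my own assumptions: a hypothetical helper match_longest() that tries the longest names first, matches literally with fixed = TRUE, and never overwrites a value that has already been assigned. It was not used to produce the counts in this report.

```r
# Hypothetical helper: give each value the longest permitted name it contains
match_longest <- function(x, permitted) {
  permitted <- permitted[order(nchar(permitted), decreasing = TRUE)]  # longest names first
  assigned <- rep(FALSE, length(x))
  for (p in permitted) {
    hit <- !assigned & grepl(p, x, fixed = TRUE)  # literal match, no regex metacharacters
    x[hit] <- p
    assigned[hit] <- TRUE                          # keep the first (longest) match
  }
  x
}

toy <- c("FLASH FLOODING", "MARINE HAIL 45", "HAIL 59", "RIVER FLOOD")
names_in_list_order <- c("FLASH FLOOD", "FLOOD", "HAIL", "MARINE HAIL")

# The loop used above, applied to the toy vector
loop_result <- toy
for (p in names_in_list_order) loop_result[grepl(p, loop_result)] <- p
loop_result                              # "FLOOD" "HAIL" "HAIL" "FLOOD"

match_longest(toy, names_in_list_order)  # "FLASH FLOOD" "MARINE HAIL" "HAIL" "FLOOD"

# As a regular expression the parentheses form a group, so the literal label does not match
grepl("HURRICANE (TYPHOON)", "HURRICANE (TYPHOON)")  # FALSE
```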

We will do the same with approximate matching (the “agrepl” function) to catch misspelled variants:

for(i in 1:length(names)) {
  data$EVTYPE[agrepl(names[i], data$EVTYPE)] <- names[i]
}

Now let’s see how far we got:

length(sort(unique(data$EVTYPE)))
## [1] 354
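agrepl() deserves a closer look here. With its default max.distance = 0.1, the number of allowed edits is that fraction of the pattern length rounded up, so every pattern of up to ten characters is allowed one edit; moreover, the pattern may match anywhere inside the value rather than the whole value. Short names therefore match far too easily: “HAIL” matches “COLD/WIND CHILL” (drop the A to get the “HIL” in “CHILL”) and “HEAT” matches “WINTER WEATHER”. One possible alternative, sketched below as an assumption rather than the method used in this report, is to compare whole strings with adist() and accept only the closest permitted name within a small edit distance. The helper name fix_typos() and the threshold of 2 are my own choices.

```r
# Pitfall: agrepl() searches for an approximate match anywhere inside x,
# and with max.distance = 0.1 a 4-letter pattern is allowed 1 edit
agrepl("HAIL", "COLD/WIND CHILL")  # TRUE: "HIL" in "CHILL" is one deletion away
agrepl("HEAT", "WINTER WEATHER")   # TRUE: "WEAT" is one substitution away

# Hypothetical alternative: whole-string Levenshtein distance via utils::adist()
fix_typos <- function(x, permitted, max_dist = 2) {
  d <- adist(x, permitted)                        # length(x) by length(permitted) distance matrix
  best <- apply(d, 1, which.min)                  # index of the closest permitted name per value
  ok <- d[cbind(seq_along(x), best)] <= max_dist  # accept only near-exact matches
  x[ok] <- permitted[best[ok]]
  x
}

fix_typos(c("TORNDAO", "THUNDEERSTORM WINDS", "COLD/WIND CHILL", "BLACK ICE"),
          c("TORNADO", "HAIL", "HEAT", "THUNDERSTORM WIND", "COLD/WIND CHILL"))
# returns "TORNADO" "THUNDERSTORM WIND" "COLD/WIND CHILL" "BLACK ICE"
```

Values with no permitted name within the threshold (here “BLACK ICE”) are left unchanged for the manual step. The threshold should stay small relative to the length of the shortest names, otherwise the same over-matching problem comes back.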

So we have reduced the number of distinct event-type values from 982 to 354, which is better but still far from perfect. I also tried hierarchical clustering and approximate matching with the “amatch” function (from the stringdist package), but the results still needed a lot of manual adjustment. Moreover, a closer look at the description document shows that some terms are sub-types of the permitted ones (for example, a gustnado is a manifestation of thunderstorm wind and should be classified as such). We will therefore fix the remaining event types manually: we save them to a .csv file, go through them in a spreadsheet program (in this case MS Excel), and assign each term either one of the permitted event types from the description document or the exclusion value “X”.

To assign the correct values, we run a full-text search for each unmatched term in the description document and pick the event type whose description mentions that term. Where the assignment is ambiguous, we mark the term with “X”, and the corresponding rows will be deleted. This approach inevitably leaves some room for personal interpretation and bias, so the full mapping is listed below for reviewers to discuss or correct.
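Once the spreadsheet has been filled in, applying it amounts to a lookup from each raw value to its assigned event type, followed by dropping the rows marked “X”. A minimal sketch on made-up rows: the toy data frames lookup and events are hypothetical, and only the column names RAW and Event_Type follow the types table used in this report.

```r
# Toy lookup table in the same shape as types2.csv: raw value -> assigned type or "X"
lookup <- data.frame(RAW = c("GUSTNADO", "TORNDAO", "BLACK ICE"),
                     Event_Type = c("THUNDERSTORM WIND", "TORNADO", "X"),
                     stringsAsFactors = FALSE)

# Toy event rows (made-up numbers, for illustration only)
events <- data.frame(EVTYPE = c("TORNDAO", "GUSTNADO", "BLACK ICE", "HAIL"),
                     FATALITIES = c(1, 0, 2, 0),
                     stringsAsFactors = FALSE)

idx <- match(events$EVTYPE, lookup$RAW)            # row in the lookup table, or NA
mapped <- !is.na(idx)
events$EVTYPE[mapped] <- lookup$Event_Type[idx[mapped]]
events <- events[events$EVTYPE != "X", ]           # drop ambiguous terms marked "X"
events
```

Values missing from the lookup table (here “HAIL”) keep their current label, so a value that is already a permitted name does not need its own row.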

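write.csv(sort(unique(data$EVTYPE)), "types.csv") # export the remaining distinct values for manual review
# types2.csv is the manually edited copy: column 1 = raw value, column 2 = assigned event type or "X"
types <- read.csv("types2.csv", header = FALSE, col.names = c("RAW", "Event_Type"), stringsAsFactors = FALSE)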
types
##                                RAW              Event_Type
## 1                  ABNORMAL WARMTH          EXCESSIVE HEAT
## 2                   ABNORMALLY DRY                 DROUGHT
## 3                   ABNORMALLY WET                       X
## 4             ACCUMULATED SNOWFALL              HEAVY SNOW
## 5              AGRICULTURAL FREEZE            FROST/FREEZE
## 6                    APACHE COUNTY                       X
## 7           ASTRONOMICAL HIGH TIDE               HIGH SURF
## 8            ASTRONOMICAL LOW TIDE   ASTRONOMICAL LOW TIDE
## 9                        AVALANCHE               AVALANCHE
## 10                    BEACH EROSIN                       X
## 11                   BEACH EROSION                       X
## 12      BELOW NORMAL PRECIPITATION                       X
## 13                       BLACK ICE                       X
## 14                        BLIZZARD                BLIZZARD
## 15                   BLOW-OUT TIDE                       X
## 16                  BLOW-OUT TIDES                       X
## 17                    BLOWING DUST              DUST STORM
## 18                    BLOWING SNOW                       X
## 19  BLOWING SNOW- EXTREME WIND CHI                       X
## 20  BLOWING SNOW & EXTREME WIND CH                       X
## 21                      BRUSH FIRE                WILDFIRE
## 22                     BRUSH FIRES                WILDFIRE
## 23                 COASTAL EROSION                       X
## 24                   COASTAL STORM                       X
## 25                   COASTAL SURGE        STORM SURGE/TIDE
## 26                    COASTALSTORM                       X
## 27                            COLD         COLD/WIND CHILL
## 28                 COLD AIR FUNNEL            FUNNEL CLOUD
## 29                COLD AIR FUNNELS            FUNNEL CLOUD
## 30                  COLD AND FROST            FROST/FREEZE
## 31                   COLD AND SNOW         COLD/WIND CHILL
## 32         COLD AND WET CONDITIONS         COLD/WIND CHILL
## 33                COLD TEMPERATURE         COLD/WIND CHILL
## 34               COLD TEMPERATURES         COLD/WIND CHILL
## 35                       COLD WAVE         COLD/WIND CHILL
## 36                      COLD/WINDS         COLD/WIND CHILL
## 37                    COOL AND WET         COLD/WIND CHILL
## 38                      COOL SPELL         COLD/WIND CHILL
## 39                       DAM BREAK             FLASH FLOOD
## 40                 DAMAGING FREEZE            FROST/FREEZE
## 41                       DENSE FOG               DENSE FOG
## 42                     DENSE SMOKE             DENSE SMOKE
## 43                       DOWNBURST                       X
## 44                 DOWNBURST WINDS                       X
## 45                    DRIEST MONTH                 DROUGHT
## 46                   DRIFTING SNOW          WINTER WEATHER
## 47                         DROUGHT                 DROUGHT
## 48                        DROWNING                       X
## 49                             DRY                 DROUGHT
## 50                  DRY CONDITIONS                 DROUGHT
## 51                  DRY MICROBURST       THUNDERSTORM WIND
## 52               DRY MICROBURST 50       THUNDERSTORM WIND
## 53               DRY MICROBURST 53       THUNDERSTORM WIND
## 54               DRY MICROBURST 58       THUNDERSTORM WIND
## 55               DRY MICROBURST 61       THUNDERSTORM WIND
## 56               DRY MICROBURST 84       THUNDERSTORM WIND
## 57            DRY MICROBURST WINDS       THUNDERSTORM WIND
## 58            DRY MIRCOBURST WINDS       THUNDERSTORM WIND
## 59                     DRY PATTERN                 DROUGHT
## 60                       DRY SPELL                 DROUGHT
## 61                         DRYNESS                 DROUGHT
## 62                      DUST DEVIL              DUST DEVIL
## 63                      DUST STORM              DUST STORM
## 64                    EARLY FREEZE            FROST/FREEZE
## 65                     EARLY FROST            FROST/FREEZE
## 66                      EARLY RAIN                       X
## 67                      EARLY SNOW          WINTER WEATHER
## 68                  EARLY SNOWFALL          WINTER WEATHER
## 69                       EXCESSIVE                       X
## 70                  EXCESSIVE COLD EXTREME COLD/WIND CHILL
## 71         EXCESSIVE PRECIPITATION              HEAVY RAIN
## 72                  EXCESSIVE RAIN              HEAVY RAIN
## 73              EXCESSIVE RAINFALL              HEAVY RAIN
## 74                  EXCESSIVE SNOW              HEAVY SNOW
## 75                 EXCESSIVELY DRY                 DROUGHT
## 76                   EXTENDED COLD         COLD/WIND CHILL
## 77                    EXTREME COLD EXTREME COLD/WIND CHILL
## 78             EXTREME/RECORD COLD EXTREME COLD/WIND CHILL
## 79                   EXTREMELY WET                       X
## 80                FALLING SNOW/ICE              HEAVY SNOW
## 81                     FIRST FROST            FROST/FREEZE
## 82                      FIRST SNOW                       X
## 83                           FLOOD                   FLOOD
## 84                             FOG               DENSE FOG
## 85       FOG AND COLD TEMPERATURES               DENSE FOG
## 86                    FOREST FIRES                WILDFIRE
## 87                          FREEZE            FROST/FREEZE
## 88                FREEZING DRIZZLE          WINTER WEATHER
## 89   FREEZING DRIZZLE AND FREEZING          WINTER WEATHER
## 90                    FREEZING FOG            FREEZING FOG
## 91                   FREEZING RAIN          WINTER WEATHER
## 92          FREEZING RAIN AND SNOW          WINTER WEATHER
## 93              FREEZING RAIN/SNOW          WINTER WEATHER
## 94                  FREEZING SPRAY                       X
## 95                           FROST            FROST/FREEZE
## 96                    FROST/FREEZE            FROST/FREEZE
## 97                          FUNNEL            FUNNEL CLOUD
## 98                    FUNNEL CLOUD            FUNNEL CLOUD
## 99                         FUNNELS            FUNNEL CLOUD
## 100                          GLAZE                       X
## 101                      GLAZE ICE                       X
## 102                  GRADIENT WIND                       X
## 103                 GRADIENT WINDS                       X
## 104                    GRASS FIRES                WILDFIRE
## 105                       GUSTNADO       THUNDERSTORM WIND
## 106                   GUSTNADO AND       THUNDERSTORM WIND
## 107                GUSTY LAKE WIND             STRONG WIND
## 108                     GUSTY WIND             STRONG WIND
## 109            GUSTY WIND/HVY RAIN             STRONG WIND
## 110                GUSTY WIND/RAIN             STRONG WIND
## 111                    GUSTY WINDS             STRONG WIND
## 112                           HAIL                    HAIL
## 113                    HARD FREEZE            FROST/FREEZE
## 114                 HAZARDOUS SURF               HIGH SURF
## 115                           HEAT                    HEAT
## 116                           HIGH                       X
## 117                      HIGH SEAS                       X
## 118                      HIGH SURF               HIGH SURF
## 119                    HIGH SWELLS                       X
## 120        HIGH TEMPERATURE RECORD                    HEAT
## 121                     HIGH TIDES               HIGH SURF
## 122                     HIGH WATER                       X
## 123                     HIGH WAVES                       X
## 124                      HIGH WIND               HIGH WIND
## 125                    HOT AND DRY                 DROUGHT
## 126                    HOT PATTERN                    HEAT
## 127                      HOT SPELL                    HEAT
## 128                HOT/DRY PATTERN                 DROUGHT
## 129                      HURRICANE     HURRICANE (TYPHOON)
## 130     HURRICANE-GENERATED SWELLS     HURRICANE (TYPHOON)
## 131              HURRICANE EDOUARD     HURRICANE (TYPHOON)
## 132                HURRICANE EMILY     HURRICANE (TYPHOON)
## 133                 HURRICANE ERIN     HURRICANE (TYPHOON)
## 134                HURRICANE FELIX     HURRICANE (TYPHOON)
## 135               HURRICANE GORDON     HURRICANE (TYPHOON)
## 136                 HURRICANE OPAL     HURRICANE (TYPHOON)
## 137              HURRICANE/TYPHOON     HURRICANE (TYPHOON)
## 138                       HVY RAIN              HEAVY RAIN
## 139          HYPERTHERMIA/EXPOSURE                       X
## 140                    HYPOTHERMIA                       X
## 141           HYPOTHERMIA/EXPOSURE                       X
## 142                            ICE                       X
## 143                   ICE AND SNOW          WINTER WEATHER
## 144                      ICE FLOES                       X
## 145                        ICE FOG            FREEZING FOG
## 146                        ICE JAM                       X
## 147                    ICE ON ROAD                       X
## 148                    ICE PELLETS                       X
## 149                      ICE ROADS                       X
## 150                      ICE STORM               ICE STORM
## 151                       ICE/SNOW                       X
## 152                      ICY ROADS                       X
## 153                   LACK OF SNOW                       X
## 154               LAKE-EFFECT SNOW        LAKE-EFFECT SNOW
## 155                      LANDSLIDE             DEBRIS FLOW
## 156                     LANDSLIDES             DEBRIS FLOW
## 157                      LANDSLUMP                       X
## 158                      LANDSPOUT                 TORNADO
## 159               LARGE WALL CLOUD                       X
## 160           LATE-SEASON SNOWFALL                       X
## 161                    LATE FREEZE            FROST/FREEZE
## 162               LATE SEASON SNOW                       X
## 163           LATE SEASON SNOWFALL                       X
## 164                      LATE SNOW                       X
## 165            LIGHT FREEZING RAIN                       X
## 166                     LIGHT SNOW                       X
## 167            LIGHT SNOW/FLURRIES                       X
## 168     LIGHT SNOW/FREEZING PRECIP                       X
## 169                 LIGHT SNOWFALL                       X
## 170                      LIGHTNING               LIGHTNING
## 171                LOW TEMPERATURE                       X
## 172         LOW TEMPERATURE RECORD            FROST/FREEZE
## 173                MARINE ACCIDENT                       X
## 174                  MARINE MISHAP                       X
## 175            METRO STORM, MAY 26                       X
## 176                     MICROBURST       THUNDERSTORM WIND
## 177               MICROBURST WINDS       THUNDERSTORM WIND
## 178           MILD AND DRY PATTERN                       X
## 179                   MILD PATTERN                       X
## 180               MILD/DRY PATTERN                       X
## 181                   MIXED PRECIP                       X
## 182            MIXED PRECIPITATION                       X
## 183                  MODERATE SNOW                       X
## 184              MODERATE SNOWFALL                       X
## 185          MONTHLY PRECIPITATION                       X
## 186               MONTHLY RAINFALL                       X
## 187               MONTHLY SNOWFALL                       X
## 188            MONTHLY TEMPERATURE                       X
## 189                 MOUNTAIN SNOWS                       X
## 190                      MUD SLIDE                       X
## 191                     MUD SLIDES                       X
## 192                 MUD/ROCK SLIDE                       X
## 193                       MUDSLIDE                       X
## 194             MUDSLIDE/LANDSLIDE                       X
## 195                      MUDSLIDES                       X
## 196               NEAR RECORD SNOW              HEAVY SNOW
## 197         NON-SEVERE WIND DAMAGE                       X
## 198                           NONE                       X
## 199           NORMAL PRECIPITATION                       X
## 200                NORTHERN LIGHTS                       X
## 201                          OTHER                       X
## 202                     PATCHY ICE                       X
## 203                   PROLONG COLD         COLD/WIND CHILL
## 204              PROLONG COLD/SNOW         COLD/WIND CHILL
## 205                 PROLONG WARMTH                    HEAT
## 206                 PROLONGED RAIN                       X
## 207                           RAIN                       X
## 208                  RAIN AND WIND                       X
## 209                    RAIN DAMAGE                       X
## 210                      RAIN/SNOW                       X
## 211                      RAIN/WIND                       X
## 212                      RAINSTORM                       X
## 213           RAPIDLY RISING WATER                       X
## 214                    RECORD COLD EXTREME COLD/WIND CHILL
## 215              RECORD COLD/FROST EXTREME COLD/WIND CHILL
## 216                    RECORD COOL EXTREME COLD/WIND CHILL
## 217               RECORD DRY MONTH                 DROUGHT
## 218                 RECORD DRYNESS                 DROUGHT
## 219                    RECORD HIGH                       X
## 220        RECORD HIGH TEMPERATURE          EXCESSIVE HEAT
## 221       RECORD HIGH TEMPERATURES          EXCESSIVE HEAT
## 222                     RECORD LOW                       X
## 223            RECORD LOW RAINFALL                 DROUGHT
## 224                RECORD MAY SNOW                       X
## 225           RECORD PRECIPITATION                       X
## 226                RECORD RAINFALL                       X
## 227                    RECORD SNOW                       X
## 228               RECORD SNOW/COLD                       X
## 229                RECORD SNOWFALL                       X
## 230             RECORD TEMPERATURE                       X
## 231            RECORD TEMPERATURES                       X
## 232                    RECORD WARM          EXCESSIVE HEAT
## 233             RECORD WARM TEMPS.          EXCESSIVE HEAT
## 234                  RECORD WARMTH          EXCESSIVE HEAT
## 235             RECORD WINTER SNOW                       X
## 236      RECORD/EXCESSIVE RAINFALL                       X
## 237              RED FLAG CRITERIA                       X
## 238               RED FLAG FIRE WX                       X
## 239                    RIP CURRENT             RIP CURRENT
## 240                     ROCK SLIDE             DEBRIS FLOW
## 241                     ROGUE WAVE                       X
## 242            ROTATING WALL CLOUD                       X
## 243                     ROUGH SEAS                       X
## 244                     ROUGH SURF                       X
## 245                   SAHARAN DUST                       X
## 246              SEASONAL SNOWFALL                       X
## 247                         SEICHE                  SEICHE
## 248                    SEVERE COLD                       X
## 249            SEVERE THUNDERSTORM                       X
## 250           SEVERE THUNDERSTORMS                       X
## 251              SEVERE TURBULENCE                       X
## 252                          SLEET                   SLEET
## 253                   SMALL STREAM                       X
## 254               SMALL STREAM AND                       X
## 255                 SML STREAM FLD                       X
## 256                          SMOKE                       X
## 257                           SNOW                       X
## 258              SNOW ACCUMULATION                       X
## 259                  SNOW ADVISORY                       X
## 260                  SNOW AND COLD                       X
## 261                   SNOW AND ICE                       X
## 262                  SNOW AND WIND                       X
## 263             SNOW FREEZING RAIN                       X
## 264                   SNOW SHOWERS                       X
## 265                    SNOW SQUALL          WINTER WEATHER
## 266                   SNOW SQUALLS          WINTER WEATHER
## 267              SNOW/ BITTER COLD                       X
## 268                      SNOW/ ICE                       X
## 269              SNOW/BLOWING SNOW                       X
## 270                      SNOW/COLD                       X
## 271             SNOW/FREEZING RAIN                       X
## 272                       SNOW/ICE                       X
## 273                      SNOW/RAIN                       X
## 274                     SNOW\\COLD                       X
## 275                SNOWFALL RECORD              HEAVY SNOW
## 276                      SNOWSTORM                       X
## 277              STORM FORCE WINDS                       X
## 278                    STORM SURGE        STORM SURGE/TIDE
## 279               STORM SURGE/TIDE        STORM SURGE/TIDE
## 280                    STRONG WIND             STRONG WIND
## 281             TEMPERATURE RECORD                       X
## 282                    THUNDERSNOW                       X
## 283             THUNDERSNOW SHOWER                       X
## 284                   THUNDERSTORM                       X
## 285            THUNDERSTORM DAMAGE                       X
## 286         THUNDERSTORM DAMAGE TO                       X
## 287              THUNDERSTORM WIND       THUNDERSTORM WIND
## 288                  THUNDERSTORMS                       X
## 289                  THUNDERSTORMW                       X
## 290               THUNDERSTORMW 50                       X
## 291                        TORNADO                 TORNADO
## 292                        TORNDAO                 TORNADO
## 293                TORRENTIAL RAIN                       X
## 294            TORRENTIAL RAINFALL                       X
## 295            TROPICAL DEPRESSION     TROPICAL DEPRESSION
## 296                 TROPICAL STORM          TROPICAL STORM
## 297                        TSUNAMI                 TSUNAMI
## 298                        TYPHOON     HURRICANE (TYPHOON)
## 299              UNSEASONABLE COLD         COLD/WIND CHILL
## 300              UNSEASONABLY COLD         COLD/WIND CHILL
## 301              UNSEASONABLY COOL         COLD/WIND CHILL
## 302        UNSEASONABLY COOL & WET         COLD/WIND CHILL
## 303               UNSEASONABLY DRY                 DROUGHT
## 304               UNSEASONABLY HOT                    HEAT
## 305              UNSEASONABLY WARM                    HEAT
## 306        UNSEASONABLY WARM & WET                    HEAT
## 307      UNSEASONABLY WARM AND DRY                 DROUGHT
## 308         UNSEASONABLY WARM YEAR                       X
## 309          UNSEASONABLY WARM/WET                    HEAT
## 310               UNSEASONABLY WET                       X
## 311            UNSEASONAL LOW TEMP                       X
## 312                UNSEASONAL RAIN                       X
## 313                 UNUSUAL WARMTH                    HEAT
## 314          UNUSUAL/RECORD WARMTH          EXCESSIVE HEAT
## 315                 UNUSUALLY COLD         COLD/WIND CHILL
## 316            UNUSUALLY LATE SNOW          WINTER WEATHER
## 317                 UNUSUALLY WARM                    HEAT
## 318                URBAN AND SMALL                       X
## 319         URBAN AND SMALL STREAM                       X
## 320                    URBAN SMALL                       X
## 321                    URBAN/SMALL                       X
## 322             URBAN/SMALL STREAM                       X
## 323          URBAN/SMALL STRM FLDG                       X
## 324           URBAN/SML STREAM FLD                       X
## 325          URBAN/SML STREAM FLDG                       X
## 326                       VERY DRY                 DROUGHT
## 327                      VERY WARM                    HEAT
## 328                            VOG                       X
## 329                   VOLCANIC ASH            VOLCANIC ASH
## 330              VOLCANIC ERUPTION                       X
## 331                  WAKE LOW WIND                       X
## 332                     WALL CLOUD                       X
## 333            WARM DRY CONDITIONS                 DROUGHT
## 334                     WATERSPOUT                 TORNADO
## 335                  WET MICOBURST       THUNDERSTORM WIND
## 336                 WET MICROBURST       THUNDERSTORM WIND
## 337                      WET MONTH                       X
## 338                       WET SNOW                       X
## 339                       WET YEAR                       X
## 340                      WHIRLWIND                       X
## 341               WILD/FOREST FIRE                WILDFIRE
## 342              WILD/FOREST FIRES                WILDFIRE
## 343                       WILDFIRE                WILDFIRE
## 344                           WIND                       X
## 345                  WIND ADVISORY                       X
## 346                  WIND AND WAVE                       X
## 347                    WIND DAMAGE                       X
## 348                     WIND GUSTS                       X
## 349                     WIND STORM                       X
## 350                          WINDS                       X
## 351                     WINTER MIX                       X
## 352                   WINTER STORM                       X
## 353                    WINTERY MIX                       X
## 354                     WINTRY MIX                       X

Let’s now replace the original “EVTYPE” column with the cleaned event types and filter out all rows mapped to “X”:

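library(dplyr)

# Map each raw EVTYPE to its cleaned Event_Type (rows whose EVTYPE is missing
# from `types` are dropped by the inner join), keep only the needed columns
# and remove the event types that could not be classified ("X")
data <- inner_join(data, types, by = c("EVTYPE" = "RAW")) %>%
  select(Event_Type, FATALITIES, INJURIES, CROPDMG, PROPDMG) %>%
  filter(Event_Type != "X")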
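Because `inner_join()` keeps only rows that find a match, any raw EVTYPE value missing from `types` disappears together with its fatalities, injuries and damage, and no warning is raised. The sketch below uses base R on made-up rows to show the mapping, the removal of “X” and a check for unmatched types. `toy_raw` and `toy_types` are hypothetical stand-ins; only the column names EVTYPE, RAW and Event_Type come from the code above.

```r
# Toy stand-ins (hypothetical data) for the raw storm data and for the lookup
# table `types`, which maps each RAW event name to a cleaned Event_Type or "X".
toy_raw <- data.frame(EVTYPE = c("TORNADO", "TORNDAO", "SNOW", "HAIL"),
                      FATALITIES = c(5, 1, 0, 2),
                      stringsAsFactors = FALSE)
toy_types <- data.frame(RAW = c("TORNADO", "TORNDAO", "SNOW"),
                        Event_Type = c("TORNADO", "TORNADO", "X"),
                        stringsAsFactors = FALSE)

# An inner join silently drops rows whose EVTYPE has no entry in the lookup
# table, so list them (and the casualties they carry) before joining.
unmatched <- toy_raw[!(toy_raw$EVTYPE %in% toy_types$RAW), ]
print(unmatched)

# merge() with its default all = FALSE is an inner join, like dplyr::inner_join().
joined <- merge(toy_raw, toy_types, by.x = "EVTYPE", by.y = "RAW")

# Drop the rows mapped to "X" and keep the cleaned type with its fatalities.
cleaned <- joined[joined$Event_Type != "X", c("Event_Type", "FATALITIES")]
print(cleaned)
```

Here the misspelled “TORNDAO” is folded into TORNADO, SNOW is removed as “X”, and HAIL, with its 2 fatalities, would be lost without any message if the unmatched check were skipped.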

Evaluation

The task is to determine which event types cause the greatest economic damage and the greatest harm to human health. We will measure economic damage as the simple sum of property damage and crop damage. For human health, we will sum fatalities and injuries but give fatalities a larger weight (0.7, versus 0.3 for injuries): an injury can be serious but also relatively minor, whereas a death is always serious. We will create the two new variables as follows:

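# Health_Damage: weighted sum of fatalities (0.7) and injuries (0.3)
# Property_Damage: total economic damage, i.e. crop damage + property damage
data <- mutate(data,
               Health_Damage   = 0.7 * FATALITIES + 0.3 * INJURIES,
               Property_Damage = CROPDMG + PROPDMG)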
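Two points about these variables are easy to miss. First, with weights 0.7 and 0.3, one fatality counts as 0.7/0.3 ≈ 2.33 injuries, so an event type with very many injuries can still outrank one with more deaths. Second, the description document specifies that damage estimates carry a magnitude letter (“K” for thousands, “M” for millions, “B” for billions), and in the raw data that letter is stored separately in PROPDMGEXP and CROPDMGEXP. If the multipliers were not applied in an earlier step, summing PROPDMG and CROPDMG directly mixes thousands with billions. The sketch below computes both variables on hypothetical toy rows with the multipliers applied; `exp_multiplier()` and the column name `Economic_Damage` are illustrative, and treating every code other than K, M and B as 1 is an assumption.

```r
# Hypothetical helper: convert a magnitude code to a numeric multiplier.
# "K", "M" and "B" follow the description document; every other code
# (blank, digits, symbols) is treated as 1, which is an assumption.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  ifelse(code == "K", 1e3,
         ifelse(code == "M", 1e6,
                ifelse(code == "B", 1e9, 1)))
}

# Toy rows (made up) using the raw column names of the storm dataset.
events <- data.frame(
  FATALITIES = c(10, 0),
  INJURIES   = c(100, 5),
  PROPDMG    = c(2.5, 1.55),
  PROPDMGEXP = c("M", "B"),
  CROPDMG    = c(500, 0),
  CROPDMGEXP = c("K", ""),
  stringsAsFactors = FALSE
)

# Weighted health score: 0.7 per fatality, 0.3 per injury.
events$Health_Damage <- 0.7 * events$FATALITIES + 0.3 * events$INJURIES

# Economic damage with each value scaled by its magnitude code before summing.
events$Economic_Damage <- events$PROPDMG * exp_multiplier(events$PROPDMGEXP) +
  events$CROPDMG * exp_multiplier(events$CROPDMGEXP)

print(events)
```

The first toy row scores 0.7 × 10 + 0.3 × 100 = 37 for health and 2.5 million + 500 thousand = 3 million for economic damage.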

Results

We will now show the top 10 event types in two bar plots. The first shows the 10 event types with the most severe impact on human health; the second shows the 10 event types causing the greatest economic damage (property plus crop damage).

library(ggplot2)

# Total weighted health damage per event type, in decreasing order; keep the top 10
health <- data %>% group_by(Event_Type) %>%
  summarise(Total_Health_Damage = sum(Health_Damage)) %>%
  arrange(desc(Total_Health_Damage))
health <- health[1:10,]
# Set the factor levels in decreasing order so ggplot draws the bars ranked rather than alphabetically
health$Event_Type <- factor(health$Event_Type, levels = health$Event_Type[order(-health$Total_Health_Damage)])

# The same for total economic damage (property + crop)
property <- data %>% group_by(Event_Type) %>%
  summarise(Total_Property_Damage = sum(Property_Damage)) %>%
  arrange(desc(Total_Property_Damage))
property <- property[1:10,]
property$Event_Type <- factor(property$Event_Type, levels = property$Event_Type[order(-property$Total_Property_Damage)])

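ggplot(health, aes(x = Event_Type, y = Total_Health_Damage)) + geom_bar(stat = "identity") +
  labs(title = "Top 10 event types by health damage", x = "Event type", y = "0.7*fatalities + 0.3*injuries") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

ggplot(property, aes(x = Event_Type, y = Total_Property_Damage)) + geom_bar(stat = "identity") +
  labs(title = "Top 10 event types by economic damage", x = "Event type", y = "Property + crop damage") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))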
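The two summarising pipelines above differ only in the column being totalled. As a sketch of how they could share one code path, the hypothetical helper `top_event_types()` below (base R, so it runs without dplyr) totals any value column per event type, keeps the n largest and sets the factor levels in ranked order. ggplot2 lays out a discrete axis in factor-level order, which for a plain character column would be alphabetical. The toy data are made up for illustration.

```r
# Hypothetical helper (not part of the original analysis): total a value column
# per event type, keep the n largest totals, and make Event_Type a factor whose
# levels follow that descending ranking.
top_event_types <- function(df, value_col, n = 10) {
  totals <- aggregate(df[[value_col]],
                      by = list(Event_Type = as.character(df$Event_Type)),
                      FUN = sum)
  names(totals) <- c("Event_Type", "Total")
  totals <- head(totals[order(-totals$Total), ], n)
  ranked <- as.character(totals$Event_Type)
  totals$Event_Type <- factor(ranked, levels = ranked)
  rownames(totals) <- NULL
  totals
}

# Toy data: totals are TORNADO 300, FLOOD 120, HEAT 50, HAIL 10.
toy <- data.frame(Event_Type = c("HEAT", "TORNADO", "FLOOD", "TORNADO", "HAIL"),
                  Health_Damage = c(50, 200, 120, 100, 10),
                  stringsAsFactors = FALSE)

top3 <- top_event_types(toy, "Health_Damage", n = 3)
print(levels(top3$Event_Type))
```

The levels come out as TORNADO, FLOOD, HEAT rather than the alphabetical FLOOD, HEAT, TORNADO. Applied to the analysis data, `top_event_types(data, "Health_Damage")` would return the top 10 with the total in a column named `Total`, so the plotting call would use `y = Total`.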