Synopsis

This data analysis aims to determine which severe weather events have the greatest consequences for both population health and the economy. The data comes from the U.S. National Oceanic and Atmospheric Administration’s storm database. After the data is read and cleaned, the average yearly totals of fatalities, injuries and economic losses are computed for each type of severe event, in order to identify the most impactful event types. The results show that some events, such as heat waves, are particularly dangerous for people’s health but not for the economy, whereas hail and severe weather events massively threaten property but are comparatively less problematic in terms of human lives. The results also show that tornadoes, floods and thunderstorm winds are among the worst event types in terms of both population health and economic consequences.

Data Processing

The first step consists of reading the data into R. The data then needs to be quality-checked before the actual analysis can take place.

Data Reading

Since the input data file is large, we first read only the first 1000 rows in order to explore its structure.

library(readr)
dataset_sample <- read_csv("./data/repdata-data-StormData.csv.bz2", n_max = 1000)
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   BGN_DATE = col_character(),
##   BGN_TIME = col_character(),
##   TIME_ZONE = col_character(),
##   COUNTYNAME = col_character(),
##   STATE = col_character(),
##   EVTYPE = col_character(),
##   BGN_AZI = col_logical(),
##   BGN_LOCATI = col_logical(),
##   END_DATE = col_logical(),
##   END_TIME = col_logical(),
##   COUNTYENDN = col_logical(),
##   END_AZI = col_logical(),
##   END_LOCATI = col_logical(),
##   PROPDMGEXP = col_character(),
##   CROPDMGEXP = col_logical(),
##   WFO = col_logical(),
##   STATEOFFIC = col_logical(),
##   ZONENAMES = col_logical(),
##   REMARKS = col_logical()
## )
## ℹ Use `spec()` for the full column specifications.
str(dataset_sample)
## tibble [1,000 × 37] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ STATE__   : num [1:1000] 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr [1:1000] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr [1:1000] "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr [1:1000] "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num [1:1000] 97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr [1:1000] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr [1:1000] "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr [1:1000] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : logi [1:1000] NA NA NA NA NA NA ...
##  $ BGN_LOCATI: logi [1:1000] NA NA NA NA NA NA ...
##  $ END_DATE  : logi [1:1000] NA NA NA NA NA NA ...
##  $ END_TIME  : logi [1:1000] NA NA NA NA NA NA ...
##  $ COUNTY_END: num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi [1:1000] NA NA NA NA NA NA ...
##  $ END_RANGE : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : logi [1:1000] NA NA NA NA NA NA ...
##  $ END_LOCATI: logi [1:1000] NA NA NA NA NA NA ...
##  $ LENGTH    : num [1:1000] 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num [1:1000] 100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : num [1:1000] 3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num [1:1000] 0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num [1:1000] 15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num [1:1000] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr [1:1000] "K" "K" "K" "K" ...
##  $ CROPDMG   : num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: logi [1:1000] NA NA NA NA NA NA ...
##  $ WFO       : logi [1:1000] NA NA NA NA NA NA ...
##  $ STATEOFFIC: logi [1:1000] NA NA NA NA NA NA ...
##  $ ZONENAMES : logi [1:1000] NA NA NA NA NA NA ...
##  $ LATITUDE  : num [1:1000] 3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num [1:1000] 8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num [1:1000] 3051 0 0 0 0 ...
##  $ LONGITUDE_: num [1:1000] 8806 0 0 0 0 ...
##  $ REMARKS   : logi [1:1000] NA NA NA NA NA NA ...
##  $ REFNUM    : num [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   STATE__ = col_double(),
##   ..   BGN_DATE = col_character(),
##   ..   BGN_TIME = col_character(),
##   ..   TIME_ZONE = col_character(),
##   ..   COUNTY = col_double(),
##   ..   COUNTYNAME = col_character(),
##   ..   STATE = col_character(),
##   ..   EVTYPE = col_character(),
##   ..   BGN_RANGE = col_double(),
##   ..   BGN_AZI = col_logical(),
##   ..   BGN_LOCATI = col_logical(),
##   ..   END_DATE = col_logical(),
##   ..   END_TIME = col_logical(),
##   ..   COUNTY_END = col_double(),
##   ..   COUNTYENDN = col_logical(),
##   ..   END_RANGE = col_double(),
##   ..   END_AZI = col_logical(),
##   ..   END_LOCATI = col_logical(),
##   ..   LENGTH = col_double(),
##   ..   WIDTH = col_double(),
##   ..   F = col_double(),
##   ..   MAG = col_double(),
##   ..   FATALITIES = col_double(),
##   ..   INJURIES = col_double(),
##   ..   PROPDMG = col_double(),
##   ..   PROPDMGEXP = col_character(),
##   ..   CROPDMG = col_double(),
##   ..   CROPDMGEXP = col_logical(),
##   ..   WFO = col_logical(),
##   ..   STATEOFFIC = col_logical(),
##   ..   ZONENAMES = col_logical(),
##   ..   LATITUDE = col_double(),
##   ..   LONGITUDE = col_double(),
##   ..   LATITUDE_E = col_double(),
##   ..   LONGITUDE_ = col_double(),
##   ..   REMARKS = col_logical(),
##   ..   REFNUM = col_double()
##   .. )
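
Several columns above are reported as logical and contain only NA. A minimal base R sketch of how this can happen is given below. It assumes the cause is that the type is guessed from the rows actually read, and it uses `read.csv` on an invented three-row CSV as a self-contained stand-in for `read_csv`; the values are illustrative and not taken from the storm data.

```r
# Invented CSV text: CROPDMGEXP is empty in the first two rows only.
csv_text <- "EVTYPE,CROPDMGEXP\nTORNADO,\nHAIL,\nFLOOD,K\n"

# Reading only the first two rows: the column is all NA, so it is typed as logical.
head_only <- read.csv(text = csv_text, nrows = 2, na.strings = "",
                      stringsAsFactors = FALSE)

# Reading every row: the "K" in the third row makes the column character.
full <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

print(class(head_only$CROPDMGEXP))
print(class(full$CROPDMGEXP))
```

A type guessed from a sample of rows is therefore only provisional, which is one reason to state the column types explicitly when reading the full file.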

We note that some column data types do not appear to match the type expected for the column. Moreover, since we are interested only in harm to people and property, we can drop a number of variables and keep only those related to the date and location of the event, its type, and its consequences in terms of fatalities, injuries and economic damages.

dataset <- read_csv("./data/repdata-data-StormData.csv.bz2",
                    col_types = "ncccnccc-------------nnnncnc---------")
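
The compact `col_types` string above has one character per column, in file order: `n` for number, `c` for character and `-` to skip the column. As a rough sketch of how such a string could be assembled and checked against the 37 column names printed by `str()`, the helper vectors `all_cols` and `kept` below are hypothetical names introduced for illustration. Read this way, the string also retains `MAG`, the 22nd column.

```r
# Column names in file order, copied from the str() output above.
all_cols <- c("STATE__", "BGN_DATE", "BGN_TIME", "TIME_ZONE", "COUNTY",
              "COUNTYNAME", "STATE", "EVTYPE", "BGN_RANGE", "BGN_AZI",
              "BGN_LOCATI", "END_DATE", "END_TIME", "COUNTY_END", "COUNTYENDN",
              "END_RANGE", "END_AZI", "END_LOCATI", "LENGTH", "WIDTH", "F",
              "MAG", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
              "CROPDMG", "CROPDMGEXP", "WFO", "STATEOFFIC", "ZONENAMES",
              "LATITUDE", "LONGITUDE", "LATITUDE_E", "LONGITUDE_", "REMARKS",
              "REFNUM")

# Columns to keep, with their one-letter readr type codes (hypothetical helper).
kept <- c(STATE__ = "n", BGN_DATE = "c", BGN_TIME = "c", TIME_ZONE = "c",
          COUNTY = "n", COUNTYNAME = "c", STATE = "c", EVTYPE = "c",
          MAG = "n", FATALITIES = "n", INJURIES = "n", PROPDMG = "n",
          PROPDMGEXP = "c", CROPDMG = "n", CROPDMGEXP = "c")

# Every column not listed in `kept` gets "-", which tells readr to skip it.
codes <- unname(ifelse(all_cols %in% names(kept), kept[all_cols], "-"))
col_spec <- paste(codes, collapse = "")

print(col_spec)
print(nchar(col_spec))
```

Building the string from names makes it easier to verify that each type code lines up with the intended column than counting dashes by hand.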

Data Preprocessing

First, we check for missing values in the variables relevant for the analysis. We look at each variable individually because we are willing to keep partial observations.

nrow(dataset) - sum(is.na(dataset$FATALITIES) | is.na(dataset$INJURIES) | is.na(dataset$PROPDMG) | is.na(dataset$PROPDMGEXP) | is.na(dataset$CROPDMG) | is.na(dataset$CROPDMGEXP) )
## [1] 279566

Only 279566 of the observations are complete in all six variables. To see where the gaps are, we also check the variables individually:

nrow(dataset) - sum(is.na(dataset$FATALITIES))
## [1] 902297
nrow(dataset) - sum(is.na(dataset$INJURIES))
## [1] 902297
nrow(dataset) - sum(is.na(dataset$PROPDMG))
## [1] 902297
nrow(dataset) - sum(is.na(dataset$CROPDMG))
## [1] 902297

The four count and damage columns do not contain missing values, so we do not need to remove any rows on that basis.

nrow(dataset) - sum(is.na(dataset$PROPDMGEXP))
## [1] 436363
nrow(dataset) - sum(is.na(dataset$CROPDMGEXP))
## [1] 283884
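
The per-variable checks above can also be condensed into a single pass. The sketch below runs on a small invented data frame (`toy`, a hypothetical stand-in with made-up values) rather than on the storm data, and shows the difference between counting non-missing values per column, counting fully populated rows, and counting rows where both exponent columns are missing.

```r
# Toy stand-in for the storm data; values are invented for illustration only.
toy <- data.frame(
  FATALITIES = c(0, 1, 0, 2),
  PROPDMG    = c(25, 2.5, 0, 10),
  PROPDMGEXP = c("K", NA, NA, "M"),
  CROPDMGEXP = c(NA, NA, NA, "K"),
  stringsAsFactors = FALSE
)

# Non-missing values per column, for all columns at once.
non_missing <- colSums(!is.na(toy))

# Rows in which every column is populated.
complete_rows <- sum(complete.cases(toy))

# Rows in which both exponent columns are missing at the same time.
both_exp_missing <- sum(is.na(toy$PROPDMGEXP) & is.na(toy$CROPDMGEXP))

print(non_missing)
print(complete_rows)
print(both_exp_missing)
```

Combining the conditions with `|` flags rows that are missing at least one value, whereas `&` flags only rows that are missing all of them; the two answer different questions.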

However, there are missing values in the exponent variables, which we will need to take into account later. Another variable which will be useful for the analysis is BGN_DATE, containing the date of the event. We check its format to ensure it is suitable for processing.

head(dataset$BGN_DATE)
## [1] "4/18/1950 0:00:00"  "4/18/1950 0:00:00"  "2/20/1951 0:00:00" 
## [4] "6/8/1951 0:00:00"   "11/15/1951 0:00:00" "11/15/1951 0:00:00"

The date turns out to be stored as a character string, with non-padded month and day and a trailing time component, which is awkward to work with. Since only the year is relevant for the analysis, the following preprocessing extracts it:

library(dplyr)
# Split "m/d/yyyy h:mm:ss" on "/" and " "; the third piece is the year (kept as character)
dataset <- mutate(dataset, YEAR = sapply(strsplit(BGN_DATE, "[/ ]"), function(x) x[3]))
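
To make the splitting step concrete, here is a small sketch on two of the date strings shown above. It also shows one possible alternative, parsing with `as.Date` and formatting the year back out; this is offered as an assumption-level cross-check, not as the method used in the analysis.

```r
# Two date strings taken from the head() output above.
sample_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# Splitting on "/" or " " yields month, day, year, time; the year is piece 3.
parts <- strsplit(sample_dates, "[/ ]")
years_split <- sapply(parts, function(x) x[3])

# Alternative: parse the leading m/d/Y part as a Date, then format the year.
# Characters after the matched format (the time) are ignored by the parser.
years_parsed <- format(as.Date(sample_dates, format = "%m/%d/%Y"), "%Y")

print(years_split)
print(years_parsed)
```

Both approaches return the year as a character string; it can be wrapped in `as.integer()` if a numeric year is needed downstream.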

Now we take a look at the different event types recorded in the data set.

unique(dataset$EVTYPE)
##   [1] "TORNADO"                        "TSTM WIND"                     
##   [3] "HAIL"                           "FREEZING RAIN"                 
##   [5] "SNOW"                           "ICE STORM/FLASH FLOOD"         
##   [7] "SNOW/ICE"                       "WINTER STORM"                  
##   [9] "HURRICANE OPAL/HIGH WINDS"      "THUNDERSTORM WINDS"            
##  [11] "RECORD COLD"                    "HURRICANE ERIN"                
##  [13] "HURRICANE OPAL"                 "HEAVY RAIN"                    
##  [15] "LIGHTNING"                      "THUNDERSTORM WIND"             
##  [17] "DENSE FOG"                      "RIP CURRENT"                   
##  [19] "THUNDERSTORM WINS"              "FLASH FLOOD"                   
##  [21] "FLASH FLOODING"                 "HIGH WINDS"                    
##  [23] "FUNNEL CLOUD"                   "TORNADO F0"                    
##  [25] "THUNDERSTORM WINDS LIGHTNING"   "THUNDERSTORM WINDS/HAIL"       
##  [27] "HEAT"                           "WIND"                          
##  [29] "LIGHTING"                       "HEAVY RAINS"                   
##  [31] "LIGHTNING AND HEAVY RAIN"       "FUNNEL"                        
##  [33] "WALL CLOUD"                     "FLOODING"                      
##  [35] "THUNDERSTORM WINDS HAIL"        "FLOOD"                         
##  [37] "COLD"                           "HEAVY RAIN/LIGHTNING"          
##  [39] "FLASH FLOODING/THUNDERSTORM WI" "WALL CLOUD/FUNNEL CLOUD"       
##  [41] "THUNDERSTORM"                   "WATERSPOUT"                    
##  [43] "EXTREME COLD"                   "HAIL 1.75)"                    
##  [45] "LIGHTNING/HEAVY RAIN"           "HIGH WIND"                     
##  [47] "BLIZZARD"                       "BLIZZARD WEATHER"              
##  [49] "WIND CHILL"                     "BREAKUP FLOODING"              
##  [51] "HIGH WIND/BLIZZARD"             "RIVER FLOOD"                   
##  [53] "HEAVY SNOW"                     "FREEZE"                        
##  [55] "COASTAL FLOOD"                  "HIGH WIND AND HIGH TIDES"      
##  [57] "HIGH WIND/BLIZZARD/FREEZING RA" "HIGH TIDES"                    
##  [59] "HIGH WIND AND HEAVY SNOW"       "RECORD COLD AND HIGH WIND"     
##  [61] "RECORD HIGH TEMPERATURE"        "RECORD HIGH"                   
##  [63] "HIGH WINDS HEAVY RAINS"         "HIGH WIND/ BLIZZARD"           
##  [65] "ICE STORM"                      "BLIZZARD/HIGH WIND"            
##  [67] "HIGH WIND/LOW WIND CHILL"       "HEAVY SNOW/HIGH"               
##  [69] "RECORD LOW"                     "HIGH WINDS AND WIND CHILL"     
##  [71] "HEAVY SNOW/HIGH WINDS/FREEZING" "LOW TEMPERATURE RECORD"        
##  [73] "AVALANCHE"                      "MARINE MISHAP"                 
##  [75] "WIND CHILL/HIGH WIND"           "HIGH WIND/WIND CHILL/BLIZZARD" 
##  [77] "HIGH WIND/WIND CHILL"           "HIGH WIND/HEAVY SNOW"          
##  [79] "HIGH TEMPERATURE RECORD"        "FLOOD WATCH/"                  
##  [81] "RECORD HIGH TEMPERATURES"       "HIGH WIND/SEAS"                
##  [83] "HIGH WINDS/HEAVY RAIN"          "HIGH SEAS"                     
##  [85] "SEVERE TURBULENCE"              "RECORD RAINFALL"               
##  [87] "RECORD SNOWFALL"                "RECORD WARMTH"                 
##  [89] "HEAVY SNOW/WIND"                "EXTREME HEAT"                  
##  [91] "WIND DAMAGE"                    "DUST STORM"                    
##  [93] "APACHE COUNTY"                  "SLEET"                         
##  [95] "HAIL STORM"                     "FUNNEL CLOUDS"                 
##  [97] "FLASH FLOODS"                   "DUST DEVIL"                    
##  [99] "EXCESSIVE HEAT"                 "THUNDERSTORM WINDS/FUNNEL CLOU"
## [101] "WINTER STORM/HIGH WIND"         "WINTER STORM/HIGH WINDS"       
## [103] "GUSTY WINDS"                    "STRONG WINDS"                  
## [105] "FLOODING/HEAVY RAIN"            "SNOW AND WIND"                 
## [107] "HEAVY SURF COASTAL FLOODING"    "HEAVY SURF"                    
## [109] "HEAVY PRECIPATATION"            "URBAN FLOODING"                
## [111] "HIGH SURF"                      "BLOWING DUST"                  
## [113] "URBAN/SMALL"                    "WILD FIRES"                    
## [115] "HIGH"                           "URBAN/SMALL FLOODING"          
## [117] "WATER SPOUT"                    "HIGH WINDS DUST STORM"         
## [119] "WINTER STORM HIGH WINDS"        "LOCAL FLOOD"                   
## [121] "WINTER STORMS"                  "MUDSLIDES"                     
## [123] "RAINSTORM"                      "SEVERE THUNDERSTORM"           
## [125] "SEVERE THUNDERSTORMS"           "SEVERE THUNDERSTORM WINDS"     
## [127] "THUNDERSTORMS WINDS"            "DRY MICROBURST"                
## [129] "FLOOD/FLASH FLOOD"              "FLOOD/RAIN/WINDS"              
## [131] "WINDS"                          "DRY MICROBURST 61"             
## [133] "THUNDERSTORMS"                  "FLASH FLOOD WINDS"             
## [135] "URBAN/SMALL STREAM FLOODING"    "MICROBURST"                    
## [137] "STRONG WIND"                    "HIGH WIND DAMAGE"              
## [139] "STREAM FLOODING"                "URBAN AND SMALL"               
## [141] "HEAVY SNOWPACK"                 "ICE"                           
## [143] "FLASH FLOOD/"                   "DOWNBURST"                     
## [145] "GUSTNADO AND"                   "FLOOD/RAIN/WIND"               
## [147] "WET MICROBURST"                 "DOWNBURST WINDS"               
## [149] "DRY MICROBURST WINDS"           "DRY MIRCOBURST WINDS"          
## [151] "DRY MICROBURST 53"              "SMALL STREAM URBAN FLOOD"      
## [153] "MICROBURST WINDS"               "HIGH WINDS 57"                 
## [155] "DRY MICROBURST 50"              "HIGH WINDS 66"                 
## [157] "HIGH WINDS 76"                  "HIGH WINDS 63"                 
## [159] "HIGH WINDS 67"                  "BLIZZARD/HEAVY SNOW"           
## [161] "HEAVY SNOW/HIGH WINDS"          "BLOWING SNOW"                  
## [163] "HIGH WINDS 82"                  "HIGH WINDS 80"                 
## [165] "HIGH WINDS 58"                  "FREEZING DRIZZLE"              
## [167] "LIGHTNING THUNDERSTORM WINDSS"  "DRY MICROBURST 58"             
## [169] "HAIL 75"                        "HIGH WINDS 73"                 
## [171] "HIGH WINDS 55"                  "LIGHT SNOW AND SLEET"          
## [173] "URBAN FLOOD"                    "DRY MICROBURST 84"             
## [175] "THUNDERSTORM WINDS 60"          "HEAVY RAIN/FLOODING"           
## [177] "THUNDERSTORM WINDSS"            "TORNADOS"                      
## [179] "GLAZE"                          "RECORD HEAT"                   
## [181] "COASTAL FLOODING"               "HEAT WAVE"                     
## [183] "FIRST SNOW"                     "FREEZING RAIN AND SLEET"       
## [185] "UNSEASONABLY DRY"               "UNSEASONABLY WET"              
## [187] "WINTRY MIX"                     "WINTER WEATHER"                
## [189] "UNSEASONABLY COLD"              "EXTREME/RECORD COLD"           
## [191] "RIP CURRENTS HEAVY SURF"        "SLEET/RAIN/SNOW"               
## [193] "UNSEASONABLY WARM"              "DROUGHT"                       
## [195] "NORMAL PRECIPITATION"           "HIGH WINDS/FLOODING"           
## [197] "DRY"                            "RAIN/SNOW"                     
## [199] "SNOW/RAIN/SLEET"                "WATERSPOUT/TORNADO"            
## [201] "WATERSPOUTS"                    "WATERSPOUT TORNADO"            
## [203] "URBAN/SMALL STREAM FLOOD"       "STORM SURGE"                   
## [205] "WATERSPOUT-TORNADO"             "WATERSPOUT-"                   
## [207] "TORNADOES, TSTM WIND, HAIL"     "TROPICAL STORM ALBERTO"        
## [209] "TROPICAL STORM"                 "TROPICAL STORM GORDON"         
## [211] "TROPICAL STORM JERRY"           "LIGHTNING THUNDERSTORM WINDS"  
## [213] "WAYTERSPOUT"                    "MINOR FLOODING"                
## [215] "LIGHTNING INJURY"               "URBAN/SMALL STREAM  FLOOD"     
## [217] "LIGHTNING AND THUNDERSTORM WIN" "THUNDERSTORM WINDS53"          
## [219] "URBAN AND SMALL STREAM FLOOD"   "URBAN AND SMALL STREAM"        
## [221] "WILDFIRE"                       "DAMAGING FREEZE"               
## [223] "THUNDERSTORM WINDS 13"          "SMALL HAIL"                    
## [225] "HEAVY SNOW/HIGH WIND"           "HURRICANE"                     
## [227] "WILD/FOREST FIRE"               "SMALL STREAM FLOODING"         
## [229] "MUD SLIDE"                      "LIGNTNING"                     
## [231] "FROST"                          "FREEZING RAIN/SNOW"            
## [233] "HIGH WINDS/"                    "THUNDERSNOW"                   
## [235] "FLOODS"                         "EXTREME WIND CHILLS"           
## [237] "COOL AND WET"                   "HEAVY RAIN/SNOW"               
## [239] "SMALL STREAM AND URBAN FLOODIN" "SMALL STREAM/URBAN FLOOD"      
## [241] "SNOW/SLEET/FREEZING RAIN"       "SEVERE COLD"                   
## [243] "GLAZE ICE"                      "COLD WAVE"                     
## [245] "EARLY SNOW"                     "SMALL STREAM AND URBAN FLOOD"  
## [247] "HIGH  WINDS"                    "RURAL FLOOD"                   
## [249] "SMALL STREAM AND"               "MUD SLIDES"                    
## [251] "HAIL 80"                        "EXTREME WIND CHILL"            
## [253] "COLD AND WET CONDITIONS"        "EXCESSIVE WETNESS"             
## [255] "GRADIENT WINDS"                 "HEAVY SNOW/BLOWING SNOW"       
## [257] "SLEET/ICE STORM"                "THUNDERSTORM WINDS URBAN FLOOD"
## [259] "THUNDERSTORM WINDS SMALL STREA" "ROTATING WALL CLOUD"           
## [261] "LARGE WALL CLOUD"               "COLD AIR FUNNEL"               
## [263] "GUSTNADO"                       "COLD AIR FUNNELS"              
## [265] "BLOWING SNOW- EXTREME WIND CHI" "SNOW AND HEAVY SNOW"           
## [267] "GROUND BLIZZARD"                "MAJOR FLOOD"                   
## [269] "SNOW/HEAVY SNOW"                "FREEZING RAIN/SLEET"           
## [271] "ICE JAM FLOODING"               "SNOW- HIGH WIND- WIND CHILL"   
## [273] "STREET FLOOD"                   "COLD AIR TORNADO"              
## [275] "SMALL STREAM FLOOD"             "FOG"                           
## [277] "THUNDERSTORM WINDS 2"           "FUNNEL CLOUD/HAIL"             
## [279] "ICE/SNOW"                       "TSTM WIND 51"                  
## [281] "TSTM WIND 50"                   "TSTM WIND 52"                  
## [283] "TSTM WIND 55"                   "HEAVY SNOW/BLIZZARD"           
## [285] "THUNDERSTORM WINDS 61"          "HAIL 0.75"                     
## [287] "THUNDERSTORM DAMAGE"            "THUNDERTORM WINDS"             
## [289] "HAIL 1.00"                      "HAIL/WINDS"                    
## [291] "SNOW AND ICE"                   "WIND STORM"                    
## [293] "SNOWSTORM"                      "GRASS FIRES"                   
## [295] "LAKE FLOOD"                     "PROLONG COLD"                  
## [297] "HAIL/WIND"                      "HAIL 1.75"                     
## [299] "THUNDERSTORMW 50"               "WIND/HAIL"                     
## [301] "SNOW AND ICE STORM"             "URBAN AND SMALL STREAM FLOODIN"
## [303] "THUNDERSTORMS WIND"             "THUNDERSTORM  WINDS"           
## [305] "HEAVY SNOW/SLEET"               "AGRICULTURAL FREEZE"           
## [307] "DROUGHT/EXCESSIVE HEAT"         "TUNDERSTORM WIND"              
## [309] "TROPICAL STORM DEAN"            "THUNDERTSORM WIND"             
## [311] "THUNDERSTORM WINDS/ HAIL"       "THUNDERSTORM WIND/LIGHTNING"   
## [313] "HEAVY RAIN/SEVERE WEATHER"      "THUNDESTORM WINDS"             
## [315] "WATERSPOUT/ TORNADO"            "LIGHTNING."                    
## [317] "WARM DRY CONDITIONS"            "HURRICANE-GENERATED SWELLS"    
## [319] "HEAVY SNOW/ICE STORM"           "RIVER AND STREAM FLOOD"        
## [321] "HIGH WIND 63"                   "COASTAL SURGE"                 
## [323] "HEAVY SNOW AND ICE STORM"       "MINOR FLOOD"                   
## [325] "HIGH WINDS/COASTAL FLOOD"       "RAIN"                          
## [327] "RIVER FLOODING"                 "SNOW/RAIN"                     
## [329] "ICE FLOES"                      "HIGH WAVES"                    
## [331] "SNOW SQUALLS"                   "SNOW SQUALL"                   
## [333] "THUNDERSTORM WIND G50"          "LIGHTNING FIRE"                
## [335] "BLIZZARD/FREEZING RAIN"         "HEAVY LAKE SNOW"               
## [337] "HEAVY SNOW/FREEZING RAIN"       "LAKE EFFECT SNOW"              
## [339] "HEAVY WET SNOW"                 "DUST DEVIL WATERSPOUT"         
## [341] "THUNDERSTORM WINDS/HEAVY RAIN"  "THUNDERSTROM WINDS"            
## [343] "THUNDERSTORM WINDS      LE CEN" "HAIL 225"                      
## [345] "BLIZZARD AND HEAVY SNOW"        "HEAVY SNOW AND ICE"            
## [347] "ICE STORM AND SNOW"             "HEAVY SNOW ANDBLOWING SNOW"    
## [349] "HEAVY SNOW/ICE"                 "BLIZZARD AND EXTREME WIND CHIL"
## [351] "LOW WIND CHILL"                 "BLOWING SNOW & EXTREME WIND CH"
## [353] "WATERSPOUT/"                    "URBAN/SMALL STREAM"            
## [355] "TORNADO F3"                     "FUNNEL CLOUD."                 
## [357] "TORNDAO"                        "HAIL 0.88"                     
## [359] "FLOOD/RIVER FLOOD"              "MUD SLIDES URBAN FLOODING"     
## [361] "TORNADO F1"                     "THUNDERSTORM WINDS G"          
## [363] "DEEP HAIL"                      "GLAZE/ICE STORM"               
## [365] "HEAVY SNOW/WINTER STORM"        "AVALANCE"                      
## [367] "BLIZZARD/WINTER STORM"          "DUST STORM/HIGH WINDS"         
## [369] "ICE JAM"                        "FOREST FIRES"                  
## [371] "THUNDERSTORM WIND G60"          "FROST\\FREEZE"                 
## [373] "THUNDERSTORM WINDS."            "HAIL 88"                       
## [375] "HAIL 175"                       "HVY RAIN"                      
## [377] "HAIL 100"                       "HAIL 150"                      
## [379] "HAIL 075"                       "THUNDERSTORM WIND G55"         
## [381] "HAIL 125"                       "THUNDERSTORM WINDS G60"        
## [383] "HARD FREEZE"                    "HAIL 200"                      
## [385] "THUNDERSTORM WINDS FUNNEL CLOU" "THUNDERSTORM WINDS 62"         
## [387] "WILDFIRES"                      "RECORD HEAT WAVE"              
## [389] "HEAVY SNOW AND HIGH WINDS"      "HEAVY SNOW/HIGH WINDS & FLOOD" 
## [391] "HAIL FLOODING"                  "THUNDERSTORM WINDS/FLASH FLOOD"
## [393] "HIGH WIND 70"                   "WET SNOW"                      
## [395] "HEAVY RAIN AND FLOOD"           "LOCAL FLASH FLOOD"             
## [397] "THUNDERSTORM WINDS 53"          "FLOOD/FLASH FLOODING"          
## [399] "TORNADO/WATERSPOUT"             "RAIN AND WIND"                 
## [401] "THUNDERSTORM WIND 59"           "THUNDERSTORM WIND 52"          
## [403] "COASTAL/TIDAL FLOOD"            "SNOW/ICE STORM"                
## [405] "BELOW NORMAL PRECIPITATION"     "RIP CURRENTS/HEAVY SURF"       
## [407] "FLASH FLOOD/FLOOD"              "EXCESSIVE RAIN"                
## [409] "RECORD/EXCESSIVE HEAT"          "HEAT WAVES"                    
## [411] "LIGHT SNOW"                     "THUNDERSTORM WIND 69"          
## [413] "HAIL DAMAGE"                    "LIGHTNING DAMAGE"              
## [415] "RECORD TEMPERATURES"            "LIGHTNING AND WINDS"           
## [417] "FOG AND COLD TEMPERATURES"      "OTHER"                         
## [419] "RECORD SNOW"                    "SNOW/COLD"                     
## [421] "FLASH FLOOD FROM ICE JAMS"      "TSTM WIND G58"                 
## [423] "MUDSLIDE"                       "HEAVY SNOW SQUALLS"            
## [425] "HEAVY SNOW/SQUALLS"             "HEAVY SNOW-SQUALLS"            
## [427] "ICY ROADS"                      "HEAVY MIX"                     
## [429] "SNOW FREEZING RAIN"             "LACK OF SNOW"                  
## [431] "SNOW/SLEET"                     "SNOW/FREEZING RAIN"            
## [433] "SNOW DROUGHT"                   "THUNDERSTORMW WINDS"           
## [435] "THUNDERSTORM WIND 60 MPH"       "THUNDERSTORM WIND 65MPH"       
## [437] "THUNDERSTORM WIND/ TREES"       "THUNDERSTORM WIND/AWNING"      
## [439] "THUNDERSTORM WIND 98 MPH"       "THUNDERSTORM WIND TREES"       
## [441] "TORRENTIAL RAIN"                "TORNADO F2"                    
## [443] "RIP CURRENTS"                   "HURRICANE EMILY"               
## [445] "HURRICANE GORDON"               "HURRICANE FELIX"               
## [447] "THUNDERSTORM WIND 59 MPH"       "THUNDERSTORM WINDS 63 MPH"     
## [449] "THUNDERSTORM WIND/ TREE"        "THUNDERSTORM DAMAGE TO"        
## [451] "THUNDERSTORM WIND 65 MPH"       "FLASH FLOOD - HEAVY RAIN"      
## [453] "THUNDERSTORM WIND."             "FLASH FLOOD/ STREET"           
## [455] "THUNDERSTORM WIND 59 MPH."      "HEAVY SNOW   FREEZING RAIN"    
## [457] "DAM FAILURE"                    "THUNDERSTORM HAIL"             
## [459] "HAIL 088"                       "THUNDERSTORM WINDSHAIL"        
## [461] "LIGHTNING  WAUSEON"             "THUDERSTORM WINDS"             
## [463] "ICE AND SNOW"                   "RECORD COLD/FROST"             
## [465] "STORM FORCE WINDS"              "FREEZING RAIN AND SNOW"        
## [467] "FREEZING RAIN SLEET AND"        "SOUTHEAST"                     
## [469] "HEAVY SNOW & ICE"               "FREEZING DRIZZLE AND FREEZING" 
## [471] "THUNDERSTORM WINDS AND"         "HAIL/ICY ROADS"                
## [473] "FLASH FLOOD/HEAVY RAIN"         "HEAVY RAIN; URBAN FLOOD WINDS;"
## [475] "HEAVY PRECIPITATION"            "TSTM WIND DAMAGE"              
## [477] "HIGH WATER"                     "FLOOD FLASH"                   
## [479] "RAIN/WIND"                      "THUNDERSTORM WINDS 50"         
## [481] "THUNDERSTORM WIND G52"          "FLOOD FLOOD/FLASH"             
## [483] "THUNDERSTORM WINDS 52"          "SNOW SHOWERS"                  
## [485] "THUNDERSTORM WIND G51"          "HEAT WAVE DROUGHT"             
## [487] "HEAVY SNOW/BLIZZARD/AVALANCHE"  "RECORD SNOW/COLD"              
## [489] "WET WEATHER"                    "UNSEASONABLY WARM AND DRY"     
## [491] "FREEZING RAIN SLEET AND LIGHT"  "RECORD/EXCESSIVE RAINFALL"     
## [493] "TIDAL FLOOD"                    "BEACH EROSIN"                  
## [495] "THUNDERSTORM WIND G61"          "FLOOD/FLASH"                   
## [497] "LOW TEMPERATURE"                "SLEET & FREEZING RAIN"         
## [499] "HEAVY RAINS/FLOODING"           "THUNDERESTORM WINDS"           
## [501] "THUNDERSTORM WINDS/FLOODING"    "THUNDEERSTORM WINDS"           
## [503] "HIGHWAY FLOODING"               "THUNDERSTORM W INDS"           
## [505] "HYPOTHERMIA"                    "FLASH FLOOD/ FLOOD"            
## [507] "THUNDERSTORM WIND 50"           "THUNERSTORM WINDS"             
## [509] "HEAVY RAIN/MUDSLIDES/FLOOD"     "MUD/ROCK SLIDE"                
## [511] "HIGH WINDS/COLD"                "BEACH EROSION/COASTAL FLOOD"   
## [513] "COLD/WINDS"                     "SNOW/ BITTER COLD"             
## [515] "THUNDERSTORM WIND 56"           "SNOW SLEET"                    
## [517] "DRY HOT WEATHER"                "COLD WEATHER"                  
## [519] "RAPIDLY RISING WATER"           "HAIL ALOFT"                    
## [521] "EARLY FREEZE"                   "ICE/STRONG WINDS"              
## [523] "EXTREME WIND CHILL/BLOWING SNO" "SNOW/HIGH WINDS"               
## [525] "HIGH WINDS/SNOW"                "EARLY FROST"                   
## [527] "SNOWMELT FLOODING"              "HEAVY SNOW AND STRONG WINDS"   
## [529] "SNOW ACCUMULATION"              "BLOWING SNOW/EXTREME WIND CHIL"
## [531] "SNOW/ ICE"                      "SNOW/BLOWING SNOW"             
## [533] "TORNADOES"                      "THUNDERSTORM WIND/HAIL"        
## [535] "FLASH FLOODING/FLOOD"           "HAIL 275"                      
## [537] "HAIL 450"                       "FLASH FLOOODING"               
## [539] "EXCESSIVE RAINFALL"             "THUNDERSTORMW"                 
## [541] "HAILSTORM"                      "TSTM WINDS"                    
## [543] "BEACH FLOOD"                    "HAILSTORMS"                    
## [545] "TSTMW"                          "FUNNELS"                       
## [547] "TSTM WIND 65)"                  "THUNDERSTORM WINDS/ FLOOD"     
## [549] "HEAVY RAINFALL"                 "HEAT/DROUGHT"                  
## [551] "HEAT DROUGHT"                   "NEAR RECORD SNOW"              
## [553] "LANDSLIDE"                      "HIGH WIND AND SEAS"            
## [555] "THUNDERSTORMWINDS"              "THUNDERSTORM WINDS HEAVY RAIN" 
## [557] "SLEET/SNOW"                     "EXCESSIVE"                     
## [559] "SNOW/SLEET/RAIN"                "WILD/FOREST FIRES"             
## [561] "HEAVY SEAS"                     "DUSTSTORM"                     
## [563] "FLOOD & HEAVY RAIN"             "?"                             
## [565] "THUNDERSTROM WIND"              "FLOOD/FLASHFLOOD"              
## [567] "SNOW AND COLD"                  "HOT PATTERN"                   
## [569] "PROLONG COLD/SNOW"              "BRUSH FIRES"                   
## [571] "SNOW\\COLD"                     "WINTER MIX"                    
## [573] "EXCESSIVE PRECIPITATION"        "SNOWFALL RECORD"               
## [575] "HOT/DRY PATTERN"                "DRY PATTERN"                   
## [577] "MILD/DRY PATTERN"               "MILD PATTERN"                  
## [579] "LANDSLIDES"                     "HEAVY SHOWERS"                 
## [581] "HEAVY SNOW AND"                 "HIGH WIND 48"                  
## [583] "LAKE-EFFECT SNOW"               "BRUSH FIRE"                    
## [585] "WATERSPOUT FUNNEL CLOUD"        "URBAN SMALL STREAM FLOOD"      
## [587] "SAHARAN DUST"                   "HEAVY SHOWER"                  
## [589] "URBAN FLOOD LANDSLIDE"          "HEAVY SWELLS"                  
## [591] "URBAN SMALL"                    "URBAN FLOODS"                  
## [593] "SMALL STREAM"                   "HEAVY RAIN/URBAN FLOOD"        
## [595] "FLASH FLOOD/LANDSLIDE"          "LANDSLIDE/URBAN FLOOD"         
## [597] "HEAVY RAIN/SMALL STREAM URBAN"  "FLASH FLOOD LANDSLIDES"        
## [599] "EXTREME WINDCHILL"              "URBAN/SML STREAM FLD"          
## [601] "TSTM WIND/HAIL"                 "Other"                         
## [603] "Record dry month"               "Temperature record"            
## [605] "Minor Flooding"                 "Ice jam flood (minor"          
## [607] "High Wind"                      "Tstm Wind"                     
## [609] "ROUGH SURF"                     "Wind"                          
## [611] "Heavy Surf"                     "Dust Devil"                    
## [613] "Wind Damage"                    "Marine Accident"               
## [615] "Snow"                           "Freeze"                        
## [617] "Snow Squalls"                   "Coastal Flooding"              
## [619] "Heavy Rain"                     "Strong Wind"                   
## [621] "COASTAL STORM"                  "COASTALFLOOD"                  
## [623] "Erosion/Cstl Flood"             "Heavy Rain and Wind"           
## [625] "Light Snow/Flurries"            "Wet Month"                     
## [627] "Wet Year"                       "Tidal Flooding"                
## [629] "River Flooding"                 "Damaging Freeze"               
## [631] "Beach Erosion"                  "Hot and Dry"                   
## [633] "Flood/Flash Flood"              "Icy Roads"                     
## [635] "High Surf"                      "Heavy Rain/High Surf"          
## [637] "Thunderstorm Wind"              "Rain Damage"                   
## [639] "Unseasonable Cold"              "Early Frost"                   
## [641] "Wintry Mix"                     "blowing snow"                  
## [643] "STREET FLOODING"                "Record Cold"                   
## [645] "Extreme Cold"                   "Ice Fog"                       
## [647] "Excessive Cold"                 "Torrential Rainfall"           
## [649] "Freezing Rain"                  "Landslump"                     
## [651] "Late-season Snowfall"           "Hurricane Edouard"             
## [653] "Coastal Storm"                  "Flood"                         
## [655] "HEAVY RAIN/WIND"                "TIDAL FLOODING"                
## [657] "Winter Weather"                 "Snow squalls"                  
## [659] "Strong Winds"                   "Strong winds"                  
## [661] "RECORD WARM TEMPS."             "Ice/Snow"                      
## [663] "Mudslide"                       "Glaze"                         
## [665] "Extended Cold"                  "Snow Accumulation"             
## [667] "Freezing Fog"                   "Drifting Snow"                 
## [669] "Whirlwind"                      "Heavy snow shower"             
## [671] "Heavy rain"                     "LATE SNOW"                     
## [673] "Record May Snow"                "Record Winter Snow"            
## [675] "Heavy Precipitation"            "Record temperature"            
## [677] "Light snow"                     "Late Season Snowfall"          
## [679] "Gusty Wind"                     "small hail"                    
## [681] "Light Snow"                     "MIXED PRECIP"                  
## [683] "Black Ice"                      "Mudslides"                     
## [685] "Gradient wind"                  "Snow and Ice"                  
## [687] "Freezing Spray"                 "Summary Jan 17"                
## [689] "Summary of March 14"            "Summary of March 23"           
## [691] "Summary of March 24"            "Summary of April 3rd"          
## [693] "Summary of April 12"            "Summary of April 13"           
## [695] "Summary of April 21"            "Summary August 11"             
## [697] "Summary of April 27"            "Summary of May 9-10"           
## [699] "Summary of May 10"              "Summary of May 13"             
## [701] "Summary of May 14"              "Summary of May 22 am"          
## [703] "Summary of May 22 pm"           "Heatburst"                     
## [705] "Summary of May 26 am"           "Summary of May 26 pm"          
## [707] "Metro Storm, May 26"            "Summary of May 31 am"          
## [709] "Summary of May 31 pm"           "Summary of June 3"             
## [711] "Summary of June 4"              "Summary June 5-6"              
## [713] "Summary June 6"                 "Summary of June 11"            
## [715] "Summary of June 12"             "Summary of June 13"            
## [717] "Summary of June 15"             "Summary of June 16"            
## [719] "Summary June 18-19"             "Summary of June 23"            
## [721] "Summary of June 24"             "Summary of June 30"            
## [723] "Summary of July 2"              "Summary of July 3"             
## [725] "Summary of July 11"             "Summary of July 22"            
## [727] "Summary July 23-24"             "Summary of July 26"            
## [729] "Summary of July 29"             "Summary of August 1"           
## [731] "Summary August 2-3"             "Summary August 7"              
## [733] "Summary August 9"               "Summary August 10"             
## [735] "Summary August 17"              "Summary August 21"             
## [737] "Summary August 28"              "Summary September 4"           
## [739] "Summary September 20"           "Summary September 23"          
## [741] "Summary Sept. 25-26"            "Summary: Oct. 20-21"           
## [743] "Summary: October 31"            "Summary: Nov. 6-7"             
## [745] "Summary: Nov. 16"               "Microburst"                    
## [747] "wet micoburst"                  "Hail(0.75)"                    
## [749] "Funnel Cloud"                   "Urban Flooding"                
## [751] "No Severe Weather"              "Urban flood"                   
## [753] "Urban Flood"                    "Cold"                          
## [755] "Summary of May 22"              "Summary of June 6"             
## [757] "Summary August 4"               "Summary of June 10"            
## [759] "Summary of June 18"             "Summary September 3"           
## [761] "Summary: Sept. 18"              "Coastal Flood"                 
## [763] "coastal flooding"               "Small Hail"                    
## [765] "Record Temperatures"            "Light Snowfall"                
## [767] "Freezing Drizzle"               "Gusty wind/rain"               
## [769] "GUSTY WIND/HVY RAIN"            "Blowing Snow"                  
## [771] "Early snowfall"                 "Monthly Snowfall"              
## [773] "Record Heat"                    "Seasonal Snowfall"             
## [775] "Monthly Rainfall"               "Cold Temperature"              
## [777] "Sml Stream Fld"                 "Heat Wave"                     
## [779] "MUDSLIDE/LANDSLIDE"             "Saharan Dust"                  
## [781] "Volcanic Ash"                   "Volcanic Ash Plume"            
## [783] "Thundersnow shower"             "NONE"                          
## [785] "COLD AND SNOW"                  "DAM BREAK"                     
## [787] "TSTM WIND (G45)"                "SLEET/FREEZING RAIN"           
## [789] "BLACK ICE"                      "BLOW-OUT TIDES"                
## [791] "UNSEASONABLY COOL"              "TSTM HEAVY RAIN"               
## [793] "Gusty Winds"                    "GUSTY WIND"                    
## [795] "TSTM WIND 40"                   "TSTM WIND 45"                  
## [797] "TSTM WIND (41)"                 "TSTM WIND (G40)"               
## [799] "TSTM WND"                       "Wintry mix"                    
## [801] "Frost"                          "Frost/Freeze"                  
## [803] "RAIN (HEAVY)"                   "Record Warmth"                 
## [805] "Prolong Cold"                   "Cold and Frost"                
## [807] "URBAN/SML STREAM FLDG"          "STRONG WIND GUST"              
## [809] "LATE FREEZE"                    "BLOW-OUT TIDE"                 
## [811] "Hypothermia/Exposure"           "HYPOTHERMIA/EXPOSURE"          
## [813] "Lake Effect Snow"               "Mixed Precipitation"           
## [815] "Record High"                    "COASTALSTORM"                  
## [817] "Snow and sleet"                 "Freezing rain"                 
## [819] "Gusty winds"                    "Blizzard Summary"              
## [821] "SUMMARY OF MARCH 24-25"         "SUMMARY OF MARCH 27"           
## [823] "SUMMARY OF MARCH 29"            "GRADIENT WIND"                 
## [825] "Icestorm/Blizzard"              "Flood/Strong Wind"             
## [827] "TSTM WIND AND LIGHTNING"        "gradient wind"                 
## [829] "Freezing drizzle"               "Mountain Snows"                
## [831] "URBAN/SMALL STRM FLDG"          "Heavy surf and wind"           
## [833] "Mild and Dry Pattern"           "COLD AND FROST"                
## [835] "TYPHOON"                        "HIGH SWELLS"                   
## [837] "HIGH  SWELLS"                   "VOLCANIC ASH"                  
## [839] "DRY SPELL"                      "BEACH EROSION"                 
## [841] "UNSEASONAL RAIN"                "EARLY RAIN"                    
## [843] "PROLONGED RAIN"                 "WINTERY MIX"                   
## [845] "COASTAL FLOODING/EROSION"       "HOT SPELL"                     
## [847] "UNSEASONABLY HOT"               "TSTM WIND  (G45)"              
## [849] "HIGH WIND (G40)"                "TSTM WIND (G35)"               
## [851] "DRY WEATHER"                    "ABNORMAL WARMTH"               
## [853] "UNUSUAL WARMTH"                 "WAKE LOW WIND"                 
## [855] "MONTHLY RAINFALL"               "COLD TEMPERATURES"             
## [857] "COLD WIND CHILL TEMPERATURES"   "MODERATE SNOW"                 
## [859] "MODERATE SNOWFALL"              "URBAN/STREET FLOODING"         
## [861] "COASTAL EROSION"                "UNUSUAL/RECORD WARMTH"         
## [863] "BITTER WIND CHILL"              "BITTER WIND CHILL TEMPERATURES"
## [865] "SEICHE"                         "TSTM"                          
## [867] "COASTAL  FLOODING/EROSION"      "UNSEASONABLY WARM YEAR"        
## [869] "HYPERTHERMIA/EXPOSURE"          "ROCK SLIDE"                    
## [871] "ICE PELLETS"                    "PATCHY DENSE FOG"              
## [873] "RECORD COOL"                    "RECORD WARM"                   
## [875] "HOT WEATHER"                    "RECORD TEMPERATURE"            
## [877] "TROPICAL DEPRESSION"            "VOLCANIC ERUPTION"             
## [879] "COOL SPELL"                     "WIND ADVISORY"                 
## [881] "GUSTY WIND/HAIL"                "RED FLAG FIRE WX"              
## [883] "FIRST FROST"                    "EXCESSIVELY DRY"               
## [885] "SNOW AND SLEET"                 "LIGHT SNOW/FREEZING PRECIP"    
## [887] "VOG"                            "MONTHLY PRECIPITATION"         
## [889] "MONTHLY TEMPERATURE"            "RECORD DRYNESS"                
## [891] "EXTREME WINDCHILL TEMPERATURES" "MIXED PRECIPITATION"           
## [893] "DRY CONDITIONS"                 "REMNANTS OF FLOYD"             
## [895] "EARLY SNOWFALL"                 "FREEZING FOG"                  
## [897] "LANDSPOUT"                      "DRIEST MONTH"                  
## [899] "RECORD  COLD"                   "LATE SEASON HAIL"              
## [901] "EXCESSIVE SNOW"                 "DRYNESS"                       
## [903] "FLOOD/FLASH/FLOOD"              "WIND AND WAVE"                 
## [905] "LIGHT FREEZING RAIN"            "MONTHLY SNOWFALL"              
## [907] "RECORD PRECIPITATION"           "ICE ROADS"                     
## [909] "ROUGH SEAS"                     "UNSEASONABLY WARM/WET"         
## [911] "UNSEASONABLY COOL & WET"        "UNUSUALLY WARM"                
## [913] "TSTM WIND G45"                  "NON SEVERE HAIL"               
## [915] "NON-SEVERE WIND DAMAGE"         "UNUSUALLY COLD"                
## [917] "WARM WEATHER"                   "LANDSLUMP"                     
## [919] "THUNDERSTORM WIND (G40)"        "UNSEASONABLY WARM & WET"       
## [921] "LOCALLY HEAVY RAIN"             "WIND GUSTS"                    
## [923] "UNSEASONAL LOW TEMP"            "HIGH SURF ADVISORY"            
## [925] "LATE SEASON SNOW"               "GUSTY LAKE WIND"               
## [927] "ABNORMALLY DRY"                 "WINTER WEATHER MIX"            
## [929] "RED FLAG CRITERIA"              "WND"                           
## [931] "CSTL FLOODING/EROSION"          "SMOKE"                         
## [933] "SNOW ADVISORY"                  "EXTREMELY WET"                 
## [935] "UNUSUALLY LATE SNOW"            "VERY DRY"                      
## [937] "RECORD LOW RAINFALL"            "ROGUE WAVE"                    
## [939] "PROLONG WARMTH"                 "ACCUMULATED SNOWFALL"          
## [941] "FALLING SNOW/ICE"               "DUST DEVEL"                    
## [943] "NON-TSTM WIND"                  "NON TSTM WIND"                 
## [945] "GUSTY THUNDERSTORM WINDS"       "PATCHY ICE"                    
## [947] "HEAVY RAIN EFFECTS"             "EXCESSIVE HEAT/DROUGHT"        
## [949] "NORTHERN LIGHTS"                "MARINE TSTM WIND"              
## [951] "HAZARDOUS SURF"                 "FROST/FREEZE"                  
## [953] "WINTER WEATHER/MIX"             "ASTRONOMICAL HIGH TIDE"        
## [955] "WHIRLWIND"                      "VERY WARM"                     
## [957] "ABNORMALLY WET"                 "TORNADO DEBRIS"                
## [959] "EXTREME COLD/WIND CHILL"        "ICE ON ROAD"                   
## [961] "DROWNING"                       "GUSTY THUNDERSTORM WIND"       
## [963] "MARINE HAIL"                    "HIGH SURF ADVISORIES"          
## [965] "HURRICANE/TYPHOON"              "HEAVY SURF/HIGH SURF"          
## [967] "SLEET STORM"                    "STORM SURGE/TIDE"              
## [969] "COLD/WIND CHILL"                "MARINE HIGH WIND"              
## [971] "TSUNAMI"                        "DENSE SMOKE"                   
## [973] "LAKESHORE FLOOD"                "MARINE THUNDERSTORM WIND"      
## [975] "MARINE STRONG WIND"             "ASTRONOMICAL LOW TIDE"         
## [977] "VOLCANIC ASHFALL"

We note that some entries are not actual events but summaries for a given day. Since we are interested in individual events, we remove these entries. We also remove observations with no event type (i.e., marked with a “?” in the EVTYPE variable), since these are useless for the analysis.

dataset <- dataset[-grep("Summary", dataset$EVTYPE),]
dataset <- dataset[!dataset$EVTYPE == "?",]
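
Note that the pattern "Summary" is case-sensitive, so upper-case entries such as "SUMMARY OF MARCH 27" in the listing above are not matched by it. A minimal sketch of a case-insensitive filter is shown below on a small hypothetical data frame (`toy` and `toy_clean` are illustrative names, not part of the analysis). Using a logical mask from grepl() also avoids a pitfall of negative indexing with grep(): when nothing matches, the index is empty and every row is dropped.

```r
# Hypothetical toy data; the EVTYPE values are taken from the listing above
toy <- data.frame(
    EVTYPE = c("TORNADO", "Summary of May 22", "SUMMARY OF MARCH 27",
               "?", "Blizzard Summary", "HAIL"),
    stringsAsFactors = FALSE
)

# Logical mask: TRUE for rows whose EVTYPE mentions "summary" in any letter case
is_summary <- grepl("summary", toy$EVTYPE, ignore.case = TRUE)

# Keep only rows that are neither summaries nor unlabelled ("?") events
toy_clean <- toy[!is_summary & toy$EVTYPE != "?", , drop = FALSE]
toy_clean$EVTYPE
```

Applying such a filter to the full dataset would change the number of event types slightly, so the counts reported further down would need to be recomputed.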

We also noted that some event types appear to be similar, if not identical. However, we choose not to merge similar event types, since that operation would require more domain knowledge than we have available.

Damages are encoded with two variables each: one holding the numeric value (PROPDMG and CROPDMG for property and crop damages respectively), and another holding its magnitude (PROPDMGEXP and CROPDMGEXP respectively). This format is unsuitable for the analysis, since we need the actual value of the damage in order to perform calculations. Therefore, we need to compute the actual values. First, we take a look at the magnitude variables:

unique(dataset$PROPDMGEXP)
##  [1] "K" "M" NA  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(dataset$CROPDMGEXP)
## [1] NA  "M" "K" "m" "B" "?" "0" "k" "2"

With a bit of help from the dataset documentation, we interpret numeric values as exponents of the order of magnitude (i.e., powers of ten), “h” as hundreds, “k” as thousands, “M” as millions and “B” as billions, regardless of letter case. “+” is interpreted as a signal that the estimate is a lower bound, and we decide to keep the corresponding value as is; we treat “-” in the same way. In order to decide what to do with the “?” value, we look at the values taken by the PROPDMG or CROPDMG variable when PROPDMGEXP or CROPDMGEXP respectively is “?”:

# which() drops the NA comparisons, so only rows with an explicit "?" are selected
unique(dataset$PROPDMG[which(dataset$PROPDMGEXP == "?")])
## [1] 0
unique(dataset$CROPDMG[which(dataset$CROPDMGEXP == "?")])
## [1] 0

Therefore, we can compute damages as follows:

dataset <- dataset %>% mutate(PROPDMGFULL = case_when(
    PROPDMGEXP == "0" | PROPDMGEXP == "?" | PROPDMGEXP == "-" | PROPDMGEXP == "+" | is.na(PROPDMGEXP) ~ PROPDMG,
    PROPDMGEXP == "1" ~ PROPDMG * 10,
    PROPDMGEXP == "h" | PROPDMGEXP == "H" | PROPDMGEXP == "2" ~ PROPDMG * 100,
    PROPDMGEXP == "k" | PROPDMGEXP == "K" | PROPDMGEXP == "3" ~ PROPDMG * 1000,
    PROPDMGEXP == "4" ~ PROPDMG * 10000,
    PROPDMGEXP == "5" ~ PROPDMG * 100000,
    PROPDMGEXP == "m" | PROPDMGEXP == "M" | PROPDMGEXP == "6" ~ PROPDMG * 1000000,
    PROPDMGEXP == "7" ~ PROPDMG * 10000000,
    PROPDMGEXP == "8" ~ PROPDMG * 100000000,
    PROPDMGEXP == "b" | PROPDMGEXP == "B" | PROPDMGEXP == "9" ~ PROPDMG * 1000000000
)) %>% mutate(CROPDMGFULL = case_when(
    CROPDMGEXP == "0" | CROPDMGEXP == "?" | CROPDMGEXP == "-" | CROPDMGEXP == "+" | is.na(CROPDMGEXP) ~ CROPDMG,
    CROPDMGEXP == "1" ~ CROPDMG * 10,
    CROPDMGEXP == "h" | CROPDMGEXP == "H" | CROPDMGEXP == "2" ~ CROPDMG * 100,
    CROPDMGEXP == "k" | CROPDMGEXP == "K" | CROPDMGEXP == "3" ~ CROPDMG * 1000,
    CROPDMGEXP == "4" ~ CROPDMG * 10000,
    CROPDMGEXP == "5" ~ CROPDMG * 100000,
    CROPDMGEXP == "m" | CROPDMGEXP == "M" | CROPDMGEXP == "6" ~ CROPDMG * 1000000,
    CROPDMGEXP == "7" ~ CROPDMG * 10000000,
    CROPDMGEXP == "8" ~ CROPDMG * 100000000,
    CROPDMGEXP == "b" | CROPDMGEXP == "B" | CROPDMGEXP == "9" ~ CROPDMG * 1000000000
)) %>% mutate(DMGTOTAL = PROPDMGFULL + CROPDMGFULL)
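
The same interpretation of the magnitude codes can also be written as a lookup table, which makes each power of ten explicit and easy to check. The following is only a sketch under the assumptions stated in the text; `exp_power` and `full_damage` are hypothetical names that do not appear in the analysis.

```r
# Named vector mapping each magnitude code to its power of ten.
# Codes "?", "-", "+" and "0" leave the value unchanged (10^0 = 1).
exp_power <- c("?" = 0, "-" = 0, "+" = 0, "0" = 0, "1" = 1,
               "h" = 2, "H" = 2, "2" = 2,
               "k" = 3, "K" = 3, "3" = 3,
               "4" = 4, "5" = 5,
               "m" = 6, "M" = 6, "6" = 6,
               "7" = 7, "8" = 8,
               "b" = 9, "B" = 9, "9" = 9)

# Hypothetical helper: convert a value/magnitude pair into the full amount
full_damage <- function(value, exp_code) {
    power <- unname(exp_power[exp_code])   # NA for missing or unknown codes
    power[is.na(exp_code)] <- 0            # a missing code means "keep as is"
    value * 10^power                       # unknown codes yield NA
}

full_damage(c(25, 2.5, 1, 3, 7), c("K", "M", "B", NA, "5"))
```

With these inputs the helper returns 25000, 2500000, 1e9, 3 and 700000. A code that is not in the table yields NA, which matches the behaviour of case_when() when no condition applies.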

Now the data is finally ready for analysis.

Data Analysis

In order to evaluate the impact of each type of event, we take the total of each impact over all the occurrences of the event within a year, then average these totals across all years. In other words, our summarising operation is the yearly average total.

We know that the number of recorded events has increased over the years. Therefore, the first step in the analysis is to check this via a bar plot.

library(ggplot2)
years <- dataset %>% group_by(YEAR) %>% summarise(NUMEVS = n())
qplot(YEAR, NUMEVS, data = years, geom = 'col', fill = I("red")) + 
    labs(title = "Events recorded per year") +
    xlab("Year") + ylab("Number of recorded events") +
    theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(size = 6, angle=45)) +
    geom_hline(yintercept = 20000, colour = "steelblue", size = 1.2)

The bar plot clearly shows the imbalance in the number of recorded events across the years, with a clear step in the early 1990s. Consequently, we decide to establish a cutoff at 20,000 recorded events per year and filter out the years below this threshold from the dataset. We also compute the fraction of observations collected in the years above the threshold.

n <- nrow(dataset)
dataset <- filter(dataset, YEAR %in% years[years$NUMEVS > 20000, ]$YEAR)
nrow(dataset)/n
## [1] 0.7781413

Interestingly, almost 78% of the observations were collected from 1994 onwards (the cutoff year, as can be seen from the bar plot).

We are now ready to compute the yearly average totals.

data_out <- dataset %>% group_by(EVTYPE, YEAR) %>% 
            summarise(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), DMG = sum(DMGTOTAL)) %>% 
            summarise(FATALITIES = mean(FATALITIES), INJURIES = mean(INJURIES), DMG =  mean(DMG))
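
To make the two-step summary concrete, here is a minimal sketch on a small hypothetical data frame (`toy` and `toy_out` are illustrative names, and the numbers are made up). The first summarise() computes the yearly totals and drops the YEAR grouping level, so the second summarise() averages those totals within each EVTYPE.

```r
library(dplyr)

# Hypothetical toy data: three FLOOD records over two years, one HAIL record
toy <- data.frame(
    EVTYPE = c("FLOOD", "FLOOD", "FLOOD", "HAIL"),
    YEAR = c(2000, 2000, 2001, 2001),
    FATALITIES = c(2, 4, 10, 3),
    stringsAsFactors = FALSE
)

toy_out <- toy %>%
    group_by(EVTYPE, YEAR) %>%
    summarise(FATALITIES = sum(FATALITIES)) %>%   # yearly totals: FLOOD 6 and 10, HAIL 3
    summarise(FATALITIES = mean(FATALITIES))      # average of the yearly totals per EVTYPE

toy_out
```

Here FLOOD averages to 8 and HAIL to 3. Note that HAIL is not averaged down to 1.5: a year with no record for an event type contributes no row, so the mean is taken only over the years in which that event type was recorded.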

Results

We are interested in the top 10 event types that are most harmful to people and to the economy. Harm to people can be measured in terms of fatalities or injuries (here we do not try to combine these two metrics, as is sometimes done, since we have no basis for comparing a death with an injury).

data_out %>% arrange(desc(FATALITIES)) %>% select(-"DMG")
## # A tibble: 857 x 3
##    EVTYPE             FATALITIES INJURIES
##    <chr>                   <dbl>    <dbl>
##  1 EXCESSIVE HEAT          106.     362. 
##  2 TORNADO                  88.5   1254. 
##  3 HEAT                     77.5    175. 
##  4 HEAT WAVE                57.3    103  
##  5 FLASH FLOOD              52.8     97.4
##  6 EXTREME HEAT             48       77.5
##  7 LIGHTNING                44.1    284. 
##  8 RIP CURRENT              28.3     17.8
##  9 THUNDERSTORM WINDS       25.5    364. 
## 10 FLOOD                    25      377. 
## # … with 847 more rows
data_out %>% arrange(desc(INJURIES)) %>% select(-"DMG")
## # A tibble: 857 x 3
##    EVTYPE             FATALITIES INJURIES
##    <chr>                   <dbl>    <dbl>
##  1 TORNADO                 88.5     1254.
##  2 FLOOD                   25        377.
##  3 THUNDERSTORM WINDS      25.5      364.
##  4 EXCESSIVE HEAT         106.       362.
##  5 HURRICANE/TYPHOON       16        319.
##  6 LIGHTNING               44.1      284.
##  7 TSTM WIND               18.5      279.
##  8 THUNDERSTORM WIND       16.6      184.
##  9 HEAT                    77.5      175.
## 10 ICE STORM                4.78     110.
## # … with 847 more rows

Although the two rankings do not perfectly match, we can see that many event types, such as tornadoes, heat-related events, floods, thunderstorm winds and lightning, appear in the top 10 positions of both. We can therefore conclude that these events are the most harmful to population health.

When it comes to the economy, we already have a single evaluation metric.

data_out %>% arrange(desc(DMG)) %>% select(c("EVTYPE", "DMG"))
## # A tibble: 857 x 2
##    EVTYPE                             DMG
##    <chr>                            <dbl>
##  1 HURRICANE/TYPHOON         17978428200 
##  2 FLOOD                      8315919514.
##  3 STORM SURGE                3926685545.
##  4 HURRICANE OPAL             3191846000 
##  5 HEAVY RAIN/SEVERE WEATHER  2500000000 
##  6 TORNADO                    1443511937.
##  7 HURRICANE                  1123402232.
##  8 HAIL                       1017669088.
##  9 FLASH FLOOD                 988850635.
## 10 THUNDERSTORM WINDS          844045443.
## # … with 847 more rows

These are the 10 event types with the greatest economic consequences.

Discussion

It is interesting to observe that hail and severe weather events are amongst the most economically damaging events, but not amongst those causing the most harm to human health, whereas heat waves are particularly dangerous for people while having a less severe impact on property. Hurricanes deserve special mention, since they are amongst the most threatening events in terms of both property damage and injuries, but they result in relatively fewer deaths compared to other events. Lastly, floods, tornadoes and thunderstorm winds are the biggest threats to both people and property, since they make their way into the top 10 of all three rankings.
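
The overlap between the three rankings can be checked mechanically. The sketch below copies the event types from the three top 10 tables in the Results section into plain character vectors (the vector names are hypothetical) and intersects them.

```r
# Top 10 event types, copied from the three tables in the Results section
top_fatalities <- c("EXCESSIVE HEAT", "TORNADO", "HEAT", "HEAT WAVE", "FLASH FLOOD",
                    "EXTREME HEAT", "LIGHTNING", "RIP CURRENT", "THUNDERSTORM WINDS",
                    "FLOOD")
top_injuries <- c("TORNADO", "FLOOD", "THUNDERSTORM WINDS", "EXCESSIVE HEAT",
                  "HURRICANE/TYPHOON", "LIGHTNING", "TSTM WIND", "THUNDERSTORM WIND",
                  "HEAT", "ICE STORM")
top_damage <- c("HURRICANE/TYPHOON", "FLOOD", "STORM SURGE", "HURRICANE OPAL",
                "HEAVY RAIN/SEVERE WEATHER", "TORNADO", "HURRICANE", "HAIL",
                "FLASH FLOOD", "THUNDERSTORM WINDS")

# Event types that appear in all three rankings
in_all_three <- Reduce(intersect, list(top_fatalities, top_injuries, top_damage))
in_all_three
```

The intersection contains TORNADO, THUNDERSTORM WINDS and FLOOD, which is consistent with the conclusion above.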