Introduction

Read in the data

dat <- read.csv("repdata-data-StormData.csv")

This is a report of storm data from the National Oceanic and Atmospheric Administration (NOAA). Table 2.1.1 within the documentation found at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf contains a list of 48 different weather events. These are listed here:

##  [1] "astronomical low tide"    "avalanche"               
##  [3] "blizzard"                 "coastal flood"           
##  [5] "cold/wind chill"          "debris flow"             
##  [7] "dense fog"                "dense smoke"             
##  [9] "drought"                  "dust devil"              
## [11] "dust storm"               "excessive heat"          
## [13] "extreme cold/wind chill"  "flash flood"             
## [15] "flood"                    "frost/freeze"            
## [17] "funnel cloud"             "freezing fog"            
## [19] "hail"                     "heat"                    
## [21] "heavy rain"               "heavy snow"              
## [23] "high surf"                "high wind"               
## [25] "hurricane (typhoon)"      "ice storm"               
## [27] "lake-effect snow"         "lakeshore flood"         
## [29] "lightning"                "marine hail"             
## [31] "marine high wind"         "marine strong wind"      
## [33] "marine thunderstorm wind" "rip current"             
## [35] "seiche"                   "sleet"                   
## [37] "storm surge/tide"         "strong wind"             
## [39] "thunderstorm wind"        "tornado"                 
## [41] "tropical depression"      "tropical storm"          
## [43] "tsunami"                  "volcanic ash"            
## [45] "waterspout"               "wildfire"                
## [47] "winter storm"             "winter weather"

However, the list of events found within the variable EVTYPE contains 985 levels and therefore needs to be heavily cleaned up.
In addition, for the purposes of reporting the effects of these events, I re-group them by combining similar events. For example, blizzard, cold/wind chill, extreme cold/wind chill, frost/freeze, heavy snow, ice storm, lake-effect snow, sleet, and winter storm can all fit within the larger category of winter weather. This leaves us with the following list of 26 event types:

##  [1] "astronomical low tide" "avalanche"            
##  [3] "debris flow"           "fog"                  
##  [5] "smoke"                 "drought"              
##  [7] "dust devil"            "dust storm"           
##  [9] "heat"                  "flood"                
## [11] "hail"                  "heavy rain"           
## [13] "high surf"             "high wind"            
## [15] "hurricane"             "lightning"            
## [17] "seiche"                "surge"                
## [19] "thunderstorm"          "tornado"              
## [21] "tropical storm"        "tsunami"              
## [23] "volcanic ash"          "waterspout"           
## [25] "wildfire"              "winter weather"
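
The regrouping described above can be pictured as a lookup from official event names to broader reporting categories. The sketch below is only an illustration: the `regroup` vector and the `events` sample are hypothetical names and made-up data, not the code used later in this report.

```r
# Hypothetical lookup: official event name -> broader reporting category
regroup <- c(
  "blizzard"                = "winter weather",
  "cold/wind chill"         = "winter weather",
  "extreme cold/wind chill" = "winter weather",
  "frost/freeze"            = "winter weather",
  "heavy snow"              = "winter weather",
  "ice storm"               = "winter weather",
  "lake-effect snow"        = "winter weather",
  "sleet"                   = "winter weather",
  "winter storm"            = "winter weather"
)

# Made-up sample of event types
events <- c("blizzard", "tornado", "sleet", "winter storm")

# Replace events that appear in the lookup; leave the rest unchanged
in_lookup <- events %in% names(regroup)
grouped <- events
grouped[in_lookup] <- unname(regroup[events[in_lookup]])
print(grouped)
```

Events that are not in the lookup, such as "tornado" here, pass through untouched.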

Data processing

In order to process this data, I created a function to find words or character strings within EVTYPE and then create a list of each value that contains that character string. I can then subset that list appropriately to replace that value with one that matches the appropriate storm event. In this way, I can make sure that EVTYPE includes only the list of 26 event types and that they are appropriately categorized.
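
As a rough sketch of this find-then-replace idea, the example below runs both steps on a small made-up vector. It assumes base R only and replaces whole values with `%in%` rather than using the qdap function applied later in this report; `evtype` and `matches` are hypothetical names.

```r
# Made-up sample standing in for dat$EVTYPE
evtype <- c("avalance", "avalanche", "tornado", "heavy snow/blizzard/avalanche")

# Step 1: list every distinct value that contains the search string
matches <- unique(grep("aval", evtype, value = TRUE))
print(matches)

# Step 2: after reviewing the matches, replace those whole values
# with the chosen category name
evtype[evtype %in% matches] <- "avalanche"
print(evtype)
```

Reviewing `matches` before replacing is the important step, since a search string can pick up values that belong to a different category.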

Load libraries

library(qdap)
## Loading required package: qdapDictionaries
## Loading required package: qdapRegex
## Loading required package: qdapTools
## Loading required package: RColorBrewer
## 
## Attaching package: 'qdap'
## The following object is masked from 'package:base':
## 
##     Filter
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:qdap':
## 
##     %>%
## The following object is masked from 'package:qdapTools':
## 
##     id
## The following objects are masked from 'package:qdapRegex':
## 
##     escape, explain
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

findEvent function

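# Collects every EVTYPE value that contains `event` and stores the result
# as a factor in the global variable `name`, which the later steps inspect.
findEvent <- function(event){
        matches <- grep(event, dat$EVTYPE, value = TRUE)
        name <<- as.factor(matches)
}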
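
Because findEvent writes to a global variable, each call overwrites the previous result. One possible alternative, shown here only as a sketch, returns the matches instead. The name `find_event`, its `x` argument, and the `dat_demo` data frame are hypothetical and are not used elsewhere in this report.

```r
# Made-up data frame standing in for the storm data
dat_demo <- data.frame(EVTYPE = c("dense fog", "fog", "hail", "fog"),
                       stringsAsFactors = FALSE)

# Hypothetical variant: returns the distinct matching values
# instead of assigning to a global variable
find_event <- function(event, x) {
  sort(unique(grep(event, x, value = TRUE)))
}

fog_names <- find_event("fog", dat_demo$EVTYPE)
print(fog_names)
```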

Initial cleaning

dat$EVTYPE <- gsub("^ *", "", dat$EVTYPE) # removes leading spaces
dat$EVTYPE <- tolower(dat$EVTYPE) # everything to lower case
dat$EVTYPE <- gsub("^summary.*", "na", dat$EVTYPE) # summary rows are not events
dat$EVTYPE <- gsub("[?]", "na", dat$EVTYPE) # this line and the previous one recode entries that do not make sense as 'na'
dat$EVTYPE <- as.factor(dat$EVTYPE)
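
To see what these cleaning steps do, the sketch below applies the same four calls to a small made-up vector. The `raw` and `clean` names and the sample strings are assumptions for illustration, not values taken from the data set.

```r
# Made-up raw values illustrating each cleaning case
raw <- c("   HIGH SURF ADVISORY", "Summary of March 14", "?", "Tornado")

clean <- gsub("^ *", "", raw)            # strip leading spaces
clean <- tolower(clean)                  # everything to lower case
clean <- gsub("^summary.*", "na", clean) # summary rows become "na"
clean <- gsub("[?]", "na", clean)        # "?" becomes "na"
print(clean)
```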

Finding and replacing values in order

# astronomical low tide
findEvent("tide")
levels(name) #no replacing needed for low tide
## [1] "astronomical high tide"   "astronomical low tide"   
## [3] "blow-out tide"            "blow-out tides"          
## [5] "high tides"               "high wind and high tides"
## [7] "storm surge/tide"
# avalanche
findEvent("aval") 
levels(name)
## [1] "avalance"                      "avalanche"                    
## [3] "heavy snow/blizzard/avalanche"
dat$EVTYPE <- multigsub(levels(name), "avalanche", dat$EVTYPE)
#debris flow
findEvent("slide")
levels(name)
##  [1] "flash flood landslides"     "flash flood/landslide"     
##  [3] "heavy rain/mudslides/flood" "landslide"                 
##  [5] "landslide/urban flood"      "landslides"                
##  [7] "mud slide"                  "mud slides"                
##  [9] "mud slides urban flooding"  "mud/rock slide"            
## [11] "mudslide"                   "mudslide/landslide"        
## [13] "mudslides"                  "rock slide"                
## [15] "urban flood landslide"
dat$EVTYPE <- multigsub(levels(name), "debris flow", dat$EVTYPE)
# fog
findEvent("fog") 
levels(name)
## [1] "dense fog"                 "fog"                      
## [3] "fog and cold temperatures" "freezing fog"             
## [5] "ice fog"                   "patchy dense fog"
dat$EVTYPE <- multigsub(levels(name), "fog", dat$EVTYPE)
# smoke
findEvent("smoke")
levels(name)
## [1] "dense smoke" "smoke"
dat$EVTYPE <- multigsub(levels(name), "smoke", dat$EVTYPE)
# drought
findEvent("drought")
levels(name)
## [1] "drought"                "drought/excessive heat"
## [3] "excessive heat/drought" "heat drought"          
## [5] "heat wave drought"      "heat/drought"          
## [7] "snow drought"
dat$EVTYPE <- multigsub(levels(name), "drought", dat$EVTYPE)
findEvent("dry")
levels(name)
##  [1] "abnormally dry"            "dry"                      
##  [3] "dry conditions"            "dry hot weather"          
##  [5] "dry microburst"            "dry microburst 50"        
##  [7] "dry microburst 53"         "dry microburst 58"        
##  [9] "dry microburst 61"         "dry microburst 84"        
## [11] "dry microburst winds"      "dry mircoburst winds"     
## [13] "dry pattern"               "dry spell"                
## [15] "dry weather"               "dryness"                  
## [17] "excessively dry"           "hot and dry"              
## [19] "hot/dry pattern"           "mild and dry pattern"     
## [21] "mild/dry pattern"          "record dry month"         
## [23] "record dryness"            "unseasonably dry"         
## [25] "unseasonably warm and dry" "very dry"                 
## [27] "warm dry conditions"
dat$EVTYPE <- multigsub(levels(name), "drought", dat$EVTYPE)
# dust devil
findEvent("dust")
levels(name)
## [1] "blowing dust"          "dust devel"            "dust devil"           
## [4] "dust devil waterspout" "dust storm"            "dust storm/high winds"
## [7] "duststorm"             "high winds dust storm" "saharan dust"
devil <- levels(name)[2:4]
devil <- as.factor(devil)
dat$EVTYPE <- multigsub(levels(devil), "dust devil", dat$EVTYPE)
# dust storm
findEvent("dust")
levels(name)
## [1] "blowing dust"          "dust devil"            "dust storm"           
## [4] "dust storm/high winds" "duststorm"             "high winds dust storm"
## [7] "saharan dust"
dust <- levels(name)[-(2)] # removed dust devil
dust <- as.factor(dust)
dat$EVTYPE <- multigsub(levels(dust), "dust storm", dat$EVTYPE)
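
Picking levels by position, as in `levels(name)[2:4]`, depends on the order in which the matches happen to be listed. A possible alternative, sketched below under that assumption, selects the dust devil variants by pattern instead. The `dust_levels` vector copies the values printed above; `is_devil`, `devil_demo`, and `dust_demo` are hypothetical names.

```r
# Values copied from the first findEvent("dust") output above
dust_levels <- c("blowing dust", "dust devel", "dust devil",
                 "dust devil waterspout", "dust storm",
                 "dust storm/high winds", "duststorm",
                 "high winds dust storm", "saharan dust")

# Select the dust devil variants by pattern rather than by position
is_devil   <- grepl("dust dev", dust_levels)
devil_demo <- dust_levels[is_devil]
dust_demo  <- dust_levels[!is_devil]
print(devil_demo)
print(dust_demo)
```

The pattern "dust dev" catches both the correct spelling and the "dust devel" typo, so the split keeps working even if the list order changes.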
# heat
findEvent("hot")
levels(name)
## [1] "hot pattern"      "hot spell"        "hot weather"     
## [4] "unseasonably hot"
dat$EVTYPE <- multigsub(levels(name), "heat", dat$EVTYPE)
findEvent("heat")
levels(name)
## [1] "excessive heat"        "extreme heat"          "heat"                 
## [4] "heat wave"             "heat waves"            "heatburst"            
## [7] "record heat"           "record heat wave"      "record/excessive heat"
dat$EVTYPE <- multigsub(levels(name), "heat", dat$EVTYPE)
findEvent("warm")
levels(name)
##  [1] "abnormal warmth"         "prolong warmth"         
##  [3] "record warm"             "record warm temps."     
##  [5] "record warmth"           "unseasonably warm"      
##  [7] "unseasonably warm & wet" "unseasonably warm year" 
##  [9] "unseasonably warm/wet"   "unusual warmth"         
## [11] "unusual/record warmth"   "unusually warm"         
## [13] "very warm"               "warm weather"
dat$EVTYPE <- multigsub(levels(name), "heat", dat$EVTYPE)
# flood
findEvent("flood")
levels(name)
##  [1] "beach erosion/coastal flood"    "beach flood"                   
##  [3] "breakup flooding"               "coastal flood"                 
##  [5] "coastal flooding"               "coastal flooding/erosion"      
##  [7] "coastal/tidal flood"            "coastalflood"                  
##  [9] "cstl flooding/erosion"          "erosion/cstl flood"            
## [11] "flash flood"                    "flash flood - heavy rain"      
## [13] "flash flood from ice jams"      "flash flood winds"             
## [15] "flash flood/"                   "flash flood/ flood"            
## [17] "flash flood/ street"            "flash flood/flood"             
## [19] "flash flood/heavy rain"         "flash flooding"                
## [21] "flash flooding/flood"           "flash flooding/thunderstorm wi"
## [23] "flash floods"                   "flood"                         
## [25] "flood & heavy rain"             "flood flash"                   
## [27] "flood flood/flash"              "flood watch/"                  
## [29] "flood/flash"                    "flood/flash flood"             
## [31] "flood/flash flooding"           "flood/flash/flood"             
## [33] "flood/flashflood"               "flood/rain/wind"               
## [35] "flood/rain/winds"               "flood/river flood"             
## [37] "flood/strong wind"              "flooding"                      
## [39] "flooding/heavy rain"            "floods"                        
## [41] "hail flooding"                  "heavy rain and flood"          
## [43] "heavy rain; urban flood winds;" "heavy rain/flooding"           
## [45] "heavy rain/urban flood"         "heavy rains/flooding"          
## [47] "heavy snow/high winds & flood"  "heavy surf coastal flooding"   
## [49] "high winds/coastal flood"       "high winds/flooding"           
## [51] "highway flooding"               "ice jam flood (minor"          
## [53] "ice jam flooding"               "ice storm/flash flood"         
## [55] "lake flood"                     "lakeshore flood"               
## [57] "local flash flood"              "local flood"                   
## [59] "major flood"                    "minor flood"                   
## [61] "minor flooding"                 "river and stream flood"        
## [63] "river flood"                    "river flooding"                
## [65] "rural flood"                    "small stream and urban flood"  
## [67] "small stream and urban floodin" "small stream flood"            
## [69] "small stream flooding"          "small stream urban flood"      
## [71] "small stream/urban flood"       "snowmelt flooding"             
## [73] "stream flooding"                "street flood"                  
## [75] "street flooding"                "thunderstorm winds urban flood"
## [77] "thunderstorm winds/ flood"      "thunderstorm winds/flash flood"
## [79] "thunderstorm winds/flooding"    "tidal flood"                   
## [81] "tidal flooding"                 "urban and small stream flood"  
## [83] "urban and small stream floodin" "urban flood"                   
## [85] "urban flooding"                 "urban floods"                  
## [87] "urban small stream flood"       "urban/small flooding"          
## [89] "urban/small stream flood"       "urban/small stream flooding"   
## [91] "urban/street flooding"
dat$EVTYPE <- multigsub(levels(name), "flood", dat$EVTYPE)
# hail
findEvent("hail")
levels(name)
##  [1] "deep hail"                  "funnel cloud/hail"         
##  [3] "gusty wind/hail"            "hail"                      
##  [5] "hail 0.75"                  "hail 0.88"                 
##  [7] "hail 075"                   "hail 088"                  
##  [9] "hail 1.00"                  "hail 1.75"                 
## [11] "hail 1.75)"                 "hail 100"                  
## [13] "hail 125"                   "hail 150"                  
## [15] "hail 175"                   "hail 200"                  
## [17] "hail 225"                   "hail 275"                  
## [19] "hail 450"                   "hail 75"                   
## [21] "hail 80"                    "hail 88"                   
## [23] "hail aloft"                 "hail damage"               
## [25] "hail storm"                 "hail(0.75)"                
## [27] "hail/icy roads"             "hail/wind"                 
## [29] "hail/winds"                 "hailstorm"                 
## [31] "hailstorms"                 "late season hail"          
## [33] "marine hail"                "non severe hail"           
## [35] "small hail"                 "thunderstorm hail"         
## [37] "thunderstorm wind/hail"     "thunderstorm winds hail"   
## [39] "thunderstorm winds/ hail"   "thunderstorm winds/hail"   
## [41] "thunderstorm windshail"     "tornadoes, tstm wind, hail"
## [43] "tstm wind/hail"             "wind/hail"
dat$EVTYPE <- multigsub(levels(name), "hail", dat$EVTYPE)
# heavy rain
findEvent("heavy rain")
levels(name)
##  [1] "heavy rain"                    "heavy rain and wind"          
##  [3] "heavy rain effects"            "heavy rain/high surf"         
##  [5] "heavy rain/lightning"          "heavy rain/severe weather"    
##  [7] "heavy rain/small stream urban" "heavy rain/snow"              
##  [9] "heavy rain/wind"               "heavy rainfall"               
## [11] "heavy rains"                   "high winds heavy rains"       
## [13] "high winds/heavy rain"         "lightning and heavy rain"     
## [15] "lightning/heavy rain"          "locally heavy rain"           
## [17] "thunderstorm winds heavy rain" "thunderstorm winds/heavy rain"
## [19] "tstm heavy rain"
dat$EVTYPE <- multigsub(levels(name), "heavy rain", dat$EVTYPE)
findEvent("wet")
levels(name)
##  [1] "abnormally wet"          "cold and wet conditions"
##  [3] "cool and wet"            "excessive wetness"      
##  [5] "extremely wet"           "heavy wet snow"         
##  [7] "unseasonably cool & wet" "unseasonably wet"       
##  [9] "wet micoburst"           "wet microburst"         
## [11] "wet month"               "wet snow"               
## [13] "wet weather"             "wet year"
dat$EVTYPE <- multigsub(levels(name), "heavy rain", dat$EVTYPE)
# high surf
findEvent("high surf")
levels(name)
## [1] "heavy surf/high surf" "high surf"            "high surf advisories"
## [4] "high surf advisory"
dat$EVTYPE <- multigsub(levels(name), "high surf", dat$EVTYPE)
findEvent("rip current")
levels(name)
## [1] "rip current"             "rip currents"           
## [3] "rip currents heavy surf" "rip currents/heavy surf"
dat$EVTYPE <- multigsub(levels(name), "high surf", dat$EVTYPE)
#high wind
findEvent("high wind")
levels(name)
##  [1] "blizzard/high wind"             "heavy snow and high winds"     
##  [3] "heavy snow/high wind"           "heavy snow/high winds"         
##  [5] "heavy snow/high winds/freezing" "high wind"                     
##  [7] "high wind (g40)"                "high wind 48"                  
##  [9] "high wind 63"                   "high wind 70"                  
## [11] "high wind and heavy snow"       "high wind and high tides"      
## [13] "high wind and seas"             "high wind damage"              
## [15] "high wind/ blizzard"            "high wind/blizzard"            
## [17] "high wind/blizzard/freezing ra" "high wind/heavy snow"          
## [19] "high wind/low wind chill"       "high wind/seas"                
## [21] "high wind/wind chill"           "high wind/wind chill/blizzard" 
## [23] "high winds"                     "high winds 55"                 
## [25] "high winds 57"                  "high winds 58"                 
## [27] "high winds 63"                  "high winds 66"                 
## [29] "high winds 67"                  "high winds 73"                 
## [31] "high winds 76"                  "high winds 80"                 
## [33] "high winds 82"                  "high winds and wind chill"     
## [35] "high winds/"                    "high winds/cold"               
## [37] "high winds/snow"                "hurricane opal/high winds"     
## [39] "marine high wind"               "record cold and high wind"     
## [41] "snow- high wind- wind chill"    "snow/high winds"               
## [43] "wind chill/high wind"           "winter storm high winds"       
## [45] "winter storm/high wind"         "winter storm/high winds"
dat$EVTYPE <- multigsub(levels(name), "high wind", dat$EVTYPE)
findEvent("wind")
levels(name)
##   [1] "bitter wind chill"              "bitter wind chill temperatures"
##   [3] "blizzard and extreme wind chil" "blowing snow & extreme wind ch"
##   [5] "blowing snow- extreme wind chi" "blowing snow/extreme wind chil"
##   [7] "cold wind chill temperatures"   "cold/wind chill"               
##   [9] "cold/winds"                     "downburst winds"               
##  [11] "extreme cold/wind chill"        "extreme wind chill"            
##  [13] "extreme wind chill/blowing sno" "extreme wind chills"           
##  [15] "extreme windchill"              "extreme windchill temperatures"
##  [17] "gradient wind"                  "gradient winds"                
##  [19] "gusty lake wind"                "gusty thunderstorm wind"       
##  [21] "gusty thunderstorm winds"       "gusty wind"                    
##  [23] "gusty wind/hvy rain"            "gusty wind/rain"               
##  [25] "gusty winds"                    "heavy snow and strong winds"   
##  [27] "heavy snow/wind"                "heavy surf and wind"           
##  [29] "high wind"                      "ice/strong winds"              
##  [31] "lightning and winds"            "lightning thunderstorm winds"  
##  [33] "lightning thunderstorm windss"  "low wind chill"                
##  [35] "marine strong wind"             "marine thunderstorm wind"      
##  [37] "marine tstm wind"               "microburst winds"              
##  [39] "non tstm wind"                  "non-severe wind damage"        
##  [41] "non-tstm wind"                  "rain and wind"                 
##  [43] "rain/wind"                      "severe thunderstorm winds"     
##  [45] "snow and wind"                  "storm force winds"             
##  [47] "strong wind"                    "strong wind gust"              
##  [49] "strong winds"                   "thuderstorm winds"             
##  [51] "thundeerstorm winds"            "thunderestorm winds"           
##  [53] "thunderstorm wind"              "thunderstorm wind (g40)"       
##  [55] "thunderstorm wind 50"           "thunderstorm wind 52"          
##  [57] "thunderstorm wind 56"           "thunderstorm wind 59"          
##  [59] "thunderstorm wind 59 mph"       "thunderstorm wind 59 mph."     
##  [61] "thunderstorm wind 60 mph"       "thunderstorm wind 65 mph"      
##  [63] "thunderstorm wind 65mph"        "thunderstorm wind 69"          
##  [65] "thunderstorm wind 98 mph"       "thunderstorm wind g50"         
##  [67] "thunderstorm wind g51"          "thunderstorm wind g52"         
##  [69] "thunderstorm wind g55"          "thunderstorm wind g60"         
##  [71] "thunderstorm wind g61"          "thunderstorm wind trees"       
##  [73] "thunderstorm wind."             "thunderstorm wind/ tree"       
##  [75] "thunderstorm wind/ trees"       "thunderstorm wind/awning"      
##  [77] "thunderstorm wind/lightning"    "thunderstorm winds"            
##  [79] "thunderstorm winds 13"          "thunderstorm winds 2"          
##  [81] "thunderstorm winds 50"          "thunderstorm winds 52"         
##  [83] "thunderstorm winds 53"          "thunderstorm winds 60"         
##  [85] "thunderstorm winds 61"          "thunderstorm winds 62"         
##  [87] "thunderstorm winds 63 mph"      "thunderstorm winds and"        
##  [89] "thunderstorm winds funnel clou" "thunderstorm winds g"          
##  [91] "thunderstorm winds g60"         "thunderstorm winds le cen"     
##  [93] "thunderstorm winds lightning"   "thunderstorm winds small strea"
##  [95] "thunderstorm winds."            "thunderstorm winds/funnel clou"
##  [97] "thunderstorm winds53"           "thunderstorm windss"           
##  [99] "thunderstorms wind"             "thunderstorms winds"           
## [101] "thunderstormw winds"            "thunderstormwinds"             
## [103] "thunderstrom wind"              "thunderstrom winds"            
## [105] "thundertorm winds"              "thundertsorm wind"             
## [107] "thundestorm winds"              "thunerstorm winds"             
## [109] "tstm wind"                      "tstm wind (41)"                
## [111] "tstm wind (g35)"                "tstm wind (g40)"               
## [113] "tstm wind (g45)"                "tstm wind 40"                  
## [115] "tstm wind 45"                   "tstm wind 50"                  
## [117] "tstm wind 51"                   "tstm wind 52"                  
## [119] "tstm wind 55"                   "tstm wind 65)"                 
## [121] "tstm wind and lightning"        "tstm wind damage"              
## [123] "tstm wind g45"                  "tstm wind g58"                 
## [125] "tstm winds"                     "tunderstorm wind"              
## [127] "wake low wind"                  "whirlwind"                     
## [129] "wind"                           "wind advisory"                 
## [131] "wind and wave"                  "wind chill"                    
## [133] "wind damage"                    "wind gusts"                    
## [135] "wind storm"                     "winds"
dat$EVTYPE <- multigsub(levels(name), "high wind", dat$EVTYPE)
# hurricane
findEvent("hurricane")
levels(name)
## [1] "hurricane"                  "hurricane edouard"         
## [3] "hurricane emily"            "hurricane erin"            
## [5] "hurricane felix"            "hurricane gordon"          
## [7] "hurricane opal"             "hurricane-generated swells"
## [9] "hurricane/typhoon"
dat$EVTYPE <- multigsub(levels(name), "hurricane", dat$EVTYPE)
findEvent("typhoon")
levels(name)
## [1] "typhoon"
dat$EVTYPE <- gsub("typhoon", "hurricane", dat$EVTYPE)
#lightning
findEvent("lightning")
levels(name)
## [1] "lightning"                      "lightning and thunderstorm win"
## [3] "lightning damage"               "lightning fire"                
## [5] "lightning injury"               "lightning wauseon"             
## [7] "lightning."
dat$EVTYPE <- multigsub(levels(name), "lightning", dat$EVTYPE)
# seiche
findEvent("seiche")
levels(name)
## [1] "seiche"
#storm surge/tide
findEvent("surge")
levels(name)
## [1] "coastal surge"    "storm surge"      "storm surge/tide"
dat$EVTYPE <- multigsub(levels(name), "surge", dat$EVTYPE)
#thunderstorm
findEvent("thunderstorm")
levels(name)
##  [1] "severe thunderstorm"    "severe thunderstorms"  
##  [3] "thunderstorm"           "thunderstorm damage"   
##  [5] "thunderstorm damage to" "thunderstorm w inds"   
##  [7] "thunderstorm wins"      "thunderstorms"         
##  [9] "thunderstormw"          "thunderstormw 50"
dat$EVTYPE <- multigsub(levels(name), "thunderstorm", dat$EVTYPE)
findEvent("tstm")
levels(name)
## [1] "tstm"     "tstm wnd" "tstmw"
dat$EVTYPE <- multigsub(levels(name), "thunderstorm", dat$EVTYPE)
# tornado
findEvent("tornado")
levels(name)
##  [1] "cold air tornado"    "tornado"             "tornado debris"     
##  [4] "tornado f0"          "tornado f1"          "tornado f2"         
##  [7] "tornado f3"          "tornado/waterspout"  "tornadoes"          
## [10] "tornados"            "waterspout tornado"  "waterspout-tornado" 
## [13] "waterspout/ tornado" "waterspout/tornado"
torn <- levels(name)[c(1:7, 9:10)] # save for waterspout
torn <- as.factor(torn)
dat$EVTYPE <- multigsub(levels(torn), "tornado", dat$EVTYPE)
#tropical storm
findEvent("tropical")
levels(name)
## [1] "tropical depression"    "tropical storm"        
## [3] "tropical storm alberto" "tropical storm dean"   
## [5] "tropical storm gordon"  "tropical storm jerry"
dat$EVTYPE <- multigsub(levels(name), "tropical storm", dat$EVTYPE)
#tsunami
findEvent("tsunami")
levels(name)
## [1] "tsunami"
#volcanic ash
findEvent("volcanic ash")
levels(name)
## [1] "volcanic ash"       "volcanic ash plume" "volcanic ashfall"
dat$EVTYPE <- multigsub(levels(name), "volcanic ash", dat$EVTYPE)
# waterspout
findEvent("waterspout")
levels(name)
##  [1] "tornado/waterspout"      "waterspout"             
##  [3] "waterspout funnel cloud" "waterspout tornado"     
##  [5] "waterspout-"             "waterspout-tornado"     
##  [7] "waterspout/"             "waterspout/ tornado"    
##  [9] "waterspout/tornado"      "waterspouts"
dat$EVTYPE <- multigsub(levels(name), "waterspout", dat$EVTYPE)
# wildfire
findEvent("fire")
levels(name)
##  [1] "brush fire"        "brush fires"       "forest fires"     
##  [4] "grass fires"       "red flag fire wx"  "wild fires"       
##  [7] "wild/forest fire"  "wild/forest fires" "wildfire"         
## [10] "wildfires"
dat$EVTYPE <- multigsub(levels(name), "wildfire", dat$EVTYPE)
# winter weather
findEvent("wint")
levels(name)
##  [1] "blizzard/winter storm"   "heavy snow/winter storm"
##  [3] "record winter snow"      "winter mix"             
##  [5] "winter storm"            "winter storms"          
##  [7] "winter weather"          "winter weather mix"     
##  [9] "winter weather/mix"      "wintery mix"            
## [11] "wintry mix"
dat$EVTYPE <- multigsub(levels(name), "winter weather", dat$EVTYPE)
findEvent("bliz")
levels(name)
## [1] "blizzard"                "blizzard and heavy snow"
## [3] "blizzard summary"        "blizzard weather"       
## [5] "blizzard/freezing rain"  "blizzard/heavy snow"    
## [7] "ground blizzard"         "heavy snow/blizzard"    
## [9] "icestorm/blizzard"
dat$EVTYPE <- multigsub(levels(name), "winter weather", dat$EVTYPE)
findEvent("cold")
levels(name)
##  [1] "cold"                "cold air funnel"     "cold air funnels"   
##  [4] "cold and frost"      "cold and snow"       "cold temperature"   
##  [7] "cold temperatures"   "cold wave"           "cold weather"       
## [10] "excessive cold"      "extended cold"       "extreme cold"       
## [13] "extreme/record cold" "prolong cold"        "prolong cold/snow"  
## [16] "record cold"         "record cold/frost"   "record snow/cold"   
## [19] "severe cold"         "snow and cold"       "snow/ bitter cold"  
## [22] "snow/cold"           "snow\\cold"          "unseasonable cold"  
## [25] "unseasonably cold"   "unusually cold"
dat$EVTYPE <- multigsub(levels(name), "winter weather", dat$EVTYPE)
findEvent("frost")
levels(name)
## [1] "early frost"   "first frost"   "frost"         "frost/freeze" 
## [5] "frost\\freeze"
dat$EVTYPE <- multigsub(levels(name), "winter weather", dat$EVTYPE)
findEvent("freez")
levels(name)
##  [1] "agricultural freeze"           "damaging freeze"              
##  [3] "early freeze"                  "freeze"                       
##  [5] "freezing drizzle"              "freezing drizzle and freezing"
##  [7] "freezing rain"                 "freezing rain and sleet"      
##  [9] "freezing rain and snow"        "freezing rain sleet and"      
## [11] "freezing rain sleet and light" "freezing rain/sleet"          
## [13] "freezing rain/snow"            "freezing spray"               
## [15] "hard freeze"                   "heavy snow freezing rain"     
## [17] "heavy snow/freezing rain"      "late freeze"                  
## [19] "light freezing rain"           "light snow/freezing precip"   
## [21] "sleet & freezing rain"         "sleet/freezing rain"          
## [23] "snow freezing rain"            "snow/freezing rain"           
## [25] "snow/sleet/freezing rain"
dat$EVTYPE <- multigsub(levels(name), "winter weather", dat$EVTYPE)
findEvent("snow")
levels(name)
##  [1] "accumulated snowfall"       "blowing snow"              
##  [3] "drifting snow"              "early snow"                
##  [5] "early snowfall"             "excessive snow"            
##  [7] "falling snow/ice"           "first snow"                
##  [9] "heavy lake snow"            "heavy snow"                
## [11] "heavy snow & ice"           "heavy snow and"            
## [13] "heavy snow and ice"         "heavy snow and ice storm"  
## [15] "heavy snow andblowing snow" "heavy snow shower"         
## [17] "heavy snow squalls"         "heavy snow-squalls"        
## [19] "heavy snow/blowing snow"    "heavy snow/high"           
## [21] "heavy snow/ice"             "heavy snow/ice storm"      
## [23] "heavy snow/sleet"           "heavy snow/squalls"        
## [25] "heavy snowpack"             "ice and snow"              
## [27] "ice storm and snow"         "ice/snow"                  
## [29] "lack of snow"               "lake effect snow"          
## [31] "lake-effect snow"           "late season snow"          
## [33] "late season snowfall"       "late snow"                 
## [35] "late-season snowfall"       "light snow"                
## [37] "light snow and sleet"       "light snow/flurries"       
## [39] "light snowfall"             "moderate snow"             
## [41] "moderate snowfall"          "monthly snowfall"          
## [43] "mountain snows"             "near record snow"          
## [45] "rain/snow"                  "record may snow"           
## [47] "record snow"                "record snowfall"           
## [49] "seasonal snowfall"          "sleet/rain/snow"           
## [51] "sleet/snow"                 "snow"                      
## [53] "snow accumulation"          "snow advisory"             
## [55] "snow and heavy snow"        "snow and ice"              
## [57] "snow and ice storm"         "snow and sleet"            
## [59] "snow showers"               "snow sleet"                
## [61] "snow squall"                "snow squalls"              
## [63] "snow/ ice"                  "snow/blowing snow"         
## [65] "snow/heavy snow"            "snow/ice"                  
## [67] "snow/ice storm"             "snow/rain"                 
## [69] "snow/rain/sleet"            "snow/sleet"                
## [71] "snow/sleet/rain"            "snowfall record"           
## [73] "snowstorm"                  "thundersnow"               
## [75] "thundersnow shower"         "unusually late snow"
dat$EVTYPE <- multigsub(levels(name), "winter weather", dat$EVTYPE)
findEvent("ice")
levels(name)
##  [1] "black ice"       "glaze ice"       "glaze/ice storm"
##  [4] "ice"             "ice floes"       "ice jam"        
##  [7] "ice on road"     "ice pellets"     "ice roads"      
## [10] "ice storm"       "patchy ice"      "sleet/ice storm"
dat$EVTYPE <- multigsub(levels(name), "winter weather", dat$EVTYPE)
findEvent("sleet")
levels(name)
## [1] "sleet"       "sleet storm"
dat$EVTYPE <- multigsub(levels(name), "winter weather", dat$EVTYPE)
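
The regrouping steps above depend on findEvent() and multigsub(), which are set up earlier in the report. As a rough base-R sketch of the same idea (not the code used for the results here), the example below folds every label that contains one of several keywords into a single group. The vector `ev` and its values are hypothetical.

```r
# Toy event labels (hypothetical); the real labels live in dat$EVTYPE.
ev <- c("heavy snow", "ice storm", "sleet", "tornado", "snow/ice", "flood")

# Keywords whose matches should be folded into one group.
keywords <- c("snow", "ice", "sleet")

# Build one alternation pattern and relabel every matching entry.
pattern <- paste(keywords, collapse = "|")
regrouped <- ifelse(grepl(pattern, ev), "winter weather", ev)

print(regrouped)
```

Substring matching is broad: a keyword such as "ice" matches any label that contains those letters, so it helps to inspect the matched labels before replacing them, as is done with levels(name) above.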

How many levels are left in EVTYPE, and what are they?

dat$EVTYPE <- as.factor(dat$EVTYPE)
length(levels(dat$EVTYPE))
## [1] 151
levels(dat$EVTYPE)
##   [1] "apache county"              "astronomical high tide"    
##   [3] "astronomical low tide"      "avalanche"                 
##   [5] "beach erosin"               "beach erosion"             
##   [7] "below normal precipitation" "blow-out tide"             
##   [9] "blow-out tides"             "coastal erosion"           
##  [11] "coastal storm"              "coastalstorm"              
##  [13] "cool spell"                 "dam break"                 
##  [15] "dam failure"                "debris flow"               
##  [17] "downburst"                  "driest month"              
##  [19] "drought"                    "drowning"                  
##  [21] "dust devil"                 "dust storm"                
##  [23] "early rain"                 "excessive"                 
##  [25] "excessive precipitation"    "excessive rain"            
##  [27] "excessive rainfall"         "flash floooding"           
##  [29] "flood"                      "fog"                       
##  [31] "funnel"                     "funnel cloud"              
##  [33] "funnel cloud."              "funnel clouds"             
##  [35] "funnels"                    "glaze"                     
##  [37] "gustnado"                   "gustnado and"              
##  [39] "hail"                       "hazardous surf"            
##  [41] "heat"                       "heavy mix"                 
##  [43] "heavy precipatation"        "heavy precipitation"       
##  [45] "heavy rain"                 "heavy seas"                
##  [47] "heavy shower"               "heavy showers"             
##  [49] "heavy surf"                 "heavy swells"              
##  [51] "high"                       "high high wind"            
##  [53] "high seas"                  "high surf"                 
##  [55] "high swells"                "high temperature record"   
##  [57] "high tides"                 "high water"                
##  [59] "high waves"                 "high wind"                 
##  [61] "hurricane"                  "hvy rain"                  
##  [63] "hyperthermia/exposure"      "hypothermia"               
##  [65] "hypothermia/exposure"       "icy roads"                 
##  [67] "landslump"                  "landspout"                 
##  [69] "large wall cloud"           "lighting"                  
##  [71] "lightning"                  "ligntning"                 
##  [73] "low temperature"            "low temperature record"    
##  [75] "marine accident"            "marine mishap"             
##  [77] "metro storm, may 26"        "microburst"                
##  [79] "mild pattern"               "mixed precip"              
##  [81] "mixed precipitation"        "monthly precipitation"     
##  [83] "monthly rainfall"           "monthly temperature"       
##  [85] "na"                         "no severe weather"         
##  [87] "none"                       "normal precipitation"      
##  [89] "northern lights"            "other"                     
##  [91] "prolonged rain"             "rain"                      
##  [93] "rain (heavy)"               "rain damage"               
##  [95] "rainstorm"                  "rapidly rising water"      
##  [97] "record cool"                "record high"               
##  [99] "record high temperature"    "record high temperatures"  
## [101] "record low"                 "record low rainfall"       
## [103] "record precipitation"       "record rainfall"           
## [105] "record temperature"         "record temperatures"       
## [107] "record/excessive rainfall"  "red flag criteria"         
## [109] "remnants of floyd"          "rogue wave"                
## [111] "rotating wall cloud"        "rough seas"                
## [113] "rough surf"                 "seiche"                    
## [115] "severe turbulence"          "small stream"              
## [117] "small stream and"           "sml stream fld"            
## [119] "smoke"                      "southeast"                 
## [121] "surge"                      "temperature record"        
## [123] "thunderstorm"               "tornado"                   
## [125] "torndao"                    "torrential rain"           
## [127] "torrential rainfall"        "tropical storm"            
## [129] "tsunami"                    "unseasonably cool"         
## [131] "unseasonal low temp"        "unseasonal rain"           
## [133] "urban and small"            "urban and small stream"    
## [135] "urban small"                "urban/small"               
## [137] "urban/small stream"         "urban/small strm fldg"     
## [139] "urban/sml stream fld"       "urban/sml stream fldg"     
## [141] "vog"                        "volcanic ash"              
## [143] "volcanic eruption"          "wall cloud"                
## [145] "wall cloud/funnel cloud"    "water spout"               
## [147] "waterspout"                 "wayterspout"               
## [149] "wildfire"                   "winter weather"            
## [151] "wnd"

Levels that do not match any of the 26 regrouped event types are excluded from the analysis.

newdat <- subset(dat, EVTYPE %in% regroup)
newdat$EVTYPE <- factor(newdat$EVTYPE)
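
The second line matters because subsetting a data frame keeps every original factor level, even those with no rows left. A minimal sketch on made-up data, where `toy` and `keep` are hypothetical stand-ins for dat and regroup:

```r
# Toy data (hypothetical); `keep` stands in for the `regroup` vector.
toy <- data.frame(EVTYPE = factor(c("flood", "other", "hail", "none", "flood")),
                  INJURIES = c(1, 0, 2, 0, 3))
keep <- c("flood", "hail")

# Keep only the rows whose event type is in the list.
kept <- subset(toy, EVTYPE %in% keep)
levels_before <- nlevels(kept$EVTYPE)   # unused levels are still counted

# Drop the levels that no longer occur in the data.
kept$EVTYPE <- droplevels(kept$EVTYPE)
levels_after <- levels(kept$EVTYPE)

print(levels_before)
print(levels_after)
```

Here droplevels() has the same effect as the factor() call above; without it, plots by event type would still show the empty categories.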

The analysis addresses two questions: 1) which types of events are most harmful with respect to population health, and 2) which types of events have the greatest economic consequences?

Create a summary data frame for injuries and fatalities (population health).

Inj <- newdat %>%
        group_by(EVTYPE) %>%
        summarise(inj_max = max(INJURIES),
                  inj_ave = mean(INJURIES),
                  fat_max = max(FATALITIES),
                  fat_ave = mean(FATALITIES))
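
To illustrate what these two statistics capture, the sketch below computes a per-event mean and maximum on made-up numbers using base R's tapply() (the report itself uses dplyr). The data frame `toy` and its values are hypothetical.

```r
# Toy data (hypothetical values), one row per recorded event.
toy <- data.frame(EVTYPE = c("heat", "heat", "flood", "flood", "flood"),
                  INJURIES = c(10, 0, 1, 2, 0),
                  stringsAsFactors = FALSE)

# Per-event-type mean and maximum, analogous to inj_ave and inj_max.
inj_ave <- tapply(toy$INJURIES, toy$EVTYPE, mean)
inj_max <- tapply(toy$INJURIES, toy$EVTYPE, max)

print(inj_ave)
print(inj_max)
```

A single large event raises both the maximum and the mean of its group, which is worth keeping in mind when reading the averages.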

To look at property and crop damage, we need to convert the magnitude codes in PROPDMGEXP and CROPDMGEXP (H = hundreds, K = thousands, M = millions, B = billions) into numeric multipliers and multiply them by the PROPDMG and CROPDMG variables, respectively.

# Replace magnitude codes with numeric multipliers (either letter case).
newdat$PROPDMGEXP <- gsub("[Hh]", 100, newdat$PROPDMGEXP)
newdat$PROPDMGEXP <- gsub("[Mm]", 1000000, newdat$PROPDMGEXP)
newdat$PROPDMGEXP <- gsub("[Bb]", 1000000000, newdat$PROPDMGEXP)
newdat$PROPDMGEXP <- gsub("[Kk]", 1000, newdat$PROPDMGEXP)
# Codes that are not numbers after substitution (e.g. "+", "?") become NA.
newdat$PROPDMGEXP <- as.numeric(newdat$PROPDMGEXP)
## Warning: NAs introduced by coercion
newdat$property <- with(newdat, PROPDMG * PROPDMGEXP) # new variable expressing property damage value

# Same conversion for crops; match either letter case so that "k" is not lost.
newdat$CROPDMGEXP <- gsub("[Mm]", 1000000, newdat$CROPDMGEXP)
newdat$CROPDMGEXP <- gsub("[Bb]", 1000000000, newdat$CROPDMGEXP)
newdat$CROPDMGEXP <- gsub("[Kk]", 1000, newdat$CROPDMGEXP)
newdat$CROPDMGEXP <- as.numeric(newdat$CROPDMGEXP)
## Warning: NAs introduced by coercion
newdat$crops <- with(newdat, CROPDMG * CROPDMGEXP) # new variable expressing crop damage value
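
The code-to-multiplier conversion can also be written as a single lookup table. The following is a minimal sketch, not the code used for the results in this report: exp_to_multiplier() is a hypothetical helper, and treating unrecognized codes as a multiplier of 1 is an assumption that differs from the NA handling above.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Codes missing from the table (e.g. "", "+", "?") are given a multiplier
# of 1 here, which is an assumption made only for this sketch.
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1
  mult
}

codes <- c("K", "m", "B", "", "h", "?")
multipliers <- exp_to_multiplier(codes)
print(multipliers)
```

A lookup table handles both letter cases in one place and makes the treatment of unexpected codes explicit, so the choice between NA and 1 is visible rather than a side effect of as.numeric().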

dmg <- newdat %>%
        group_by(EVTYPE) %>%
        summarise(prop_max = max(property, na.rm = TRUE),
                  prop_ave = mean(property, na.rm = TRUE),
                  crop_max = max(crops, na.rm = TRUE),
                  crop_ave = mean(crops, na.rm = TRUE))

Results

Which events are most harmful to human population health?

par(mfrow = c(1, 1))
plot(Inj$EVTYPE, Inj$inj_ave, ylab = "Avg # injuries", xlab = "Event type")

with(Inj, which(inj_ave > 3)) # 10 14 22
## [1] 10 14 22
Inj$EVTYPE[10]
## [1] heat
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather
Inj$EVTYPE[14]
## [1] hurricane
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather
Inj$EVTYPE[22]
## [1] tsunami
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather

The events with the highest average number of injuries per event (more than 3) are heat, hurricanes, and tsunamis.
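
The lookup by row number can be shortened by indexing EVTYPE with the condition itself. A small sketch on made-up numbers, where `toy_inj` and its values are hypothetical and only shaped like Inj:

```r
# Toy summary (hypothetical values), shaped like the Inj data frame.
toy_inj <- data.frame(EVTYPE = factor(c("flood", "heat", "hurricane", "tsunami")),
                      inj_ave = c(0.3, 3.5, 4.6, 6.4))

# Select the labels directly with a logical condition instead of row numbers.
high <- as.character(toy_inj$EVTYPE[toy_inj$inj_ave > 3])
print(high)
```

This avoids copying indices by hand, so the result stays correct if the grouping or the threshold changes.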

plot(Inj$EVTYPE, Inj$fat_ave, ylab = "Avg # fatalities", xlab = "Event type")

with(Inj, which(fat_ave > 0.5)) # 2 10 22
## [1]  2 10 22
Inj$EVTYPE[2]
## [1] avalanche
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather

The events with the highest average number of fatalities per event (more than 0.5) are avalanches, heat, and tsunamis.

Which events have the greatest economic consequences?

par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
plot(dmg$EVTYPE, dmg$prop_ave, ylab = "Avg property damage", xlab = "Event type")
with(dmg, which(prop_ave > 6e4)) #14 16 18 19 20 21 23
##  [1]  3  4  7  9 11 12 13 14 15 16 18 19 20 21 22 23 25 26
plot(dmg$EVTYPE, dmg$crop_ave, ylab = "Avg crop damage", xlab = "Event type")

with(dmg, which.max(crop_ave)) #14
## [1] 14
dmg$EVTYPE[14]
## [1] hurricane
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather
dmg$EVTYPE[16]
## [1] seiche
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather
dmg$EVTYPE[18]
## [1] surge
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather
dmg$EVTYPE[19]
## [1] thunderstorm
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather
dmg$EVTYPE[21]
## [1] tropical storm
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather
dmg$EVTYPE[23]
## [1] volcanic ash
## 26 Levels: astronomical low tide avalanche debris flow ... winter weather

The greatest average property damage comes from hurricanes, seiches, storm surges, thunderstorms, tropical storms, and volcanic ash. The greatest average crop damage comes from hurricanes.
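
An alternative to picking a threshold is to rank the event types by their average damage and read off the top entries. A minimal sketch with made-up values, where `toy_dmg` is hypothetical and only shaped like dmg:

```r
# Toy summary (hypothetical values), shaped like the dmg data frame.
toy_dmg <- data.frame(EVTYPE = c("hail", "hurricane", "flood", "tornado"),
                      prop_ave = c(2e4, 9e7, 5e5, 3e6),
                      stringsAsFactors = FALSE)

# Sort from largest to smallest average property damage.
ranked <- toy_dmg[order(toy_dmg$prop_ave, decreasing = TRUE), ]

# The first rows are the event types with the greatest average damage.
top3 <- head(ranked$EVTYPE, 3)
print(top3)
```

Ranking gives the same answer regardless of the damage scale, whereas a fixed cutoff has to be tuned to the data.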