This analysis studies the impact of severe weather events on population health and on the economy. It is based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which was downloaded and preprocessed before the analysis. The results show that tornadoes have the greatest impact on population health, while floods have the greatest economic impact.
First, we download the data from the hosting website.
# set the working directory
setwd('/Users/Daniel/COURSERA DATA SCIENCE/reproducible research/Assignment 2')
# download the file from the hosting website and rename the downloaded file
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2', destfile = 'FStormData.csv.bz2', method = 'curl')
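Re-running the report repeats the download every time. A minimal sketch of a guard that skips the download when the file is already present is shown below. The helper name `fetch_if_missing` and its `downloader` argument are hypothetical and not part of the original analysis; they are introduced only so the idea can be tried without network access.

```r
# Hypothetical helper (an assumption, not in the original analysis):
# download only when the destination file does not exist yet.
fetch_if_missing <- function(url, destfile, downloader = utils::download.file) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))  # file already present, nothing downloaded
  }
  # mode = 'wb' keeps the compressed file intact on every platform
  downloader(url, destfile = destfile, mode = 'wb')
  invisible(TRUE)             # a download was performed
}

# Intended usage with the URL from this report (commented out to avoid
# a network call when the sketch is run on its own):
# fetch_if_missing(
#   'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2',
#   'FStormData.csv.bz2'
# )
```

The return value only reports whether a download happened, so the call can be left in place at the top of the report.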
The file is in .bz2 format, so we use the ‘bunzip2’ function from the ‘R.utils’ package to decompress it. Let’s load the package before proceeding any further.
# load the 'R.utils' library
library(R.utils)
## Warning: package 'R.utils' was built under R version 3.2.5
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.20.0 (2016-02-17) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
## R.utils v2.3.0 (2016-04-13) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
##
## timestamp
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
We can now decompress the .bz2 file using the bunzip2 function.
# unzip the file
bunzip2('FStormData.csv.bz2', overwrite = T, remove = F, destname = 'FStormData.csv')
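As an aside, base R can also read bzip2-compressed text through a `bzfile()` connection, without writing a decompressed copy to disk. The sketch below is an alternative to the approach used in this report, not a replacement for it. It builds a tiny compressed CSV with made-up rows so that it runs on its own.

```r
# Build a tiny bzip2-compressed CSV so the example is self-contained.
# The two data rows are invented for illustration only.
tmp <- tempfile(fileext = '.csv.bz2')
con <- bzfile(tmp, open = 'wt')
writeLines(c('EVTYPE,FATALITIES', 'TORNADO,1', 'FLOOD,0'), con)
close(con)

# read.csv() accepts a connection, so no separate decompression step is needed.
demo <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(demo)
```

This route trades disk space for reading speed: base `read.csv()` is much slower than `fread()` on a file of this size.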
Because the dataset is fairly large, we use a data.table instead of a data frame for better performance.
# load the file into data table
library(data.table)
dt.raw <- fread(input = 'FStormData.csv', sep = ',', header = T, stringsAsFactors = F)
##
Read 22.7% of 967216 rows
Read 49.6% of 967216 rows
Read 68.2% of 967216 rows
Read 80.6% of 967216 rows
Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:07
## Warning in fread(input = "FStormData.csv", sep = ",", header = T,
## stringsAsFactors = F): Read less rows (902297) than were allocated
## (967216). Run again with verbose=TRUE and please report.
Let’s look at the structure of the data and its dimensions.
str(dt.raw)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : chr "3" "2" "2" "2" ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
dim(dt.raw)
## [1] 902297 37
Since we are only concerned with the impact of weather events on population health and the economy, we retain only the variables relevant to the analysis.
# retain the relevant variables in the data and exclude the rest
dt.raw <- dt.raw[, .(STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)]
dim(dt.raw)
## [1] 902297 8
# compute the fraction of NA values in each variable
library(dplyr) # load 'dplyr' for the pipe operator (%>%)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
apply(dt.raw, 2, function(x){is.na(x) %>% sum()/length(x)})
## STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 0 0 0 0 0 0
## CROPDMG CROPDMGEXP
## 0 0
head(dt.raw)
## STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1: AL TORNADO 0 15 25.0 K 0
## 2: AL TORNADO 0 0 2.5 K 0
## 3: AL TORNADO 0 2 25.0 K 0
## 4: AL TORNADO 0 2 2.5 K 0
## 5: AL TORNADO 0 2 2.5 K 0
## 6: AL TORNADO 0 6 2.5 K 0
The dataset has been reduced to 8 variables while the row count remains unchanged. The check above also shows that none of these variables contains missing values.
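The missing-value check can also be written without `apply()` or the pipe. The sketch below uses `colMeans(is.na(...))` from base R on a small made-up data frame; `toy` and `na.fraction` are illustrative names, not objects from the analysis.

```r
# Toy data with deliberately missing values (invented for illustration).
toy <- data.frame(
  EVTYPE     = c('TORNADO', NA, 'HAIL', 'FLOOD'),
  FATALITIES = c(0, 1, NA, NA),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix; the column means of that matrix are
# the fractions of missing values per variable.
na.fraction <- colMeans(is.na(toy))
print(na.fraction)
```

Here one of four `EVTYPE` values and two of four `FATALITIES` values are missing, so the fractions are 0.25 and 0.5.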
Next, we look at the summary of each variable.
summary(dt.raw)
## STATE EVTYPE FATALITIES
## Length:902297 Length:902297 Min. : 0.0000
## Class :character Class :character 1st Qu.: 0.0000
## Mode :character Mode :character Median : 0.0000
## Mean : 0.0168
## 3rd Qu.: 0.0000
## Max. :583.0000
## INJURIES PROPDMG PROPDMGEXP
## Min. : 0.0000 Min. : 0.00 Length:902297
## 1st Qu.: 0.0000 1st Qu.: 0.00 Class :character
## Median : 0.0000 Median : 0.00 Mode :character
## Mean : 0.1557 Mean : 12.06
## 3rd Qu.: 0.0000 3rd Qu.: 0.50
## Max. :1700.0000 Max. :5000.00
## CROPDMG CROPDMGEXP
## Min. : 0.000 Length:902297
## 1st Qu.: 0.000 Class :character
## Median : 0.000 Mode :character
## Mean : 1.527
## 3rd Qu.: 0.000
## Max. :990.000
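According to the documentation, PROPDMGEXP and CROPDMGEXP give the magnitude (exponent unit) of the damage figures in PROPDMG and CROPDMG. The codes handled in this analysis are H for hundreds (10^2), K for thousands (10^3), M for millions (10^6) and B for billions (10^9).
Let’s look at the unique values in PROPDMGEXP and CROPDMGEXP.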
# tabulate the unique value in PROPDMGEXP and CROPDMGEXP
table(dt.raw$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
table(dt.raw$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
As the tables show, many entries fall outside the designated codes. We treat these as exceptions and assign them a multiplier of 1.
# create a helper function to map the exponent character to a numeric multiplier
f.exp <- function(x){
switch (toupper(x),
H = 10^2,
K = 10^3,
M = 10^6,
B = 10^9,
1
)
}
# calculate the PROPDMG in full numeric format and store it in a new column
dt.raw[, PROPDMG.1:= mapply(f.exp, PROPDMGEXP) * PROPDMG]
dt.raw[, CROPDMG.1:= mapply(f.exp, CROPDMGEXP) * CROPDMG]
head(dt.raw)
## STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1: AL TORNADO 0 15 25.0 K 0
## 2: AL TORNADO 0 0 2.5 K 0
## 3: AL TORNADO 0 2 25.0 K 0
## 4: AL TORNADO 0 2 2.5 K 0
## 5: AL TORNADO 0 2 2.5 K 0
## 6: AL TORNADO 0 6 2.5 K 0
## PROPDMG.1 CROPDMG.1
## 1: 25000 0
## 2: 2500 0
## 3: 25000 0
## 4: 2500 0
## 5: 2500 0
## 6: 2500 0
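Calling a `switch()`-based helper once per row through `mapply()` works, but it is slow on roughly 900,000 rows. A minimal sketch of a vectorised alternative is a named lookup vector, shown below. The names `exp.lookup` and `exp_multiplier` are hypothetical; the mapping and the fallback of 1 mirror the helper used in this report.

```r
# Lookup table for the exponent codes handled in this analysis.
exp.lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical vectorised helper: returns one multiplier per input code.
exp_multiplier <- function(x) {
  m <- unname(exp.lookup[toupper(x)])  # unknown codes index to NA
  m[is.na(m)] <- 1                     # fall back to 1, as in the report
  m
}

# Example with made-up damage values and codes:
c(25, 2.5, 10) * exp_multiplier(c('K', 'k', ''))
# → 25000 2500 10
```

Because the whole column is indexed at once, the multiplier can be applied with a plain multiplication instead of `mapply()`.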
Let’s now look at the character variables, since free-text fields are often inconsistent. We start with the variable EVTYPE.
# print the unique values in ascending order
event <- unique(dt.raw$EVTYPE)
event[order(event)]
## [1] " HIGH SURF ADVISORY" " COASTAL FLOOD"
## [3] " FLASH FLOOD" " LIGHTNING"
## [5] " TSTM WIND" " TSTM WIND (G45)"
## [7] " WATERSPOUT" " WIND"
## [9] "?" "ABNORMAL WARMTH"
## [11] "ABNORMALLY DRY" "ABNORMALLY WET"
## [13] "ACCUMULATED SNOWFALL" "AGRICULTURAL FREEZE"
## [15] "APACHE COUNTY" "ASTRONOMICAL HIGH TIDE"
## [17] "ASTRONOMICAL LOW TIDE" "AVALANCE"
## [19] "AVALANCHE" "BEACH EROSIN"
## [21] "Beach Erosion" "BEACH EROSION"
## [23] "BEACH EROSION/COASTAL FLOOD" "BEACH FLOOD"
## [25] "BELOW NORMAL PRECIPITATION" "BITTER WIND CHILL"
## [27] "BITTER WIND CHILL TEMPERATURES" "Black Ice"
## [29] "BLACK ICE" "BLIZZARD"
## [31] "BLIZZARD AND EXTREME WIND CHIL" "BLIZZARD AND HEAVY SNOW"
## [33] "Blizzard Summary" "BLIZZARD WEATHER"
## [35] "BLIZZARD/FREEZING RAIN" "BLIZZARD/HEAVY SNOW"
## [37] "BLIZZARD/HIGH WIND" "BLIZZARD/WINTER STORM"
## [39] "BLOW-OUT TIDE" "BLOW-OUT TIDES"
## [41] "BLOWING DUST" "blowing snow"
## [43] "Blowing Snow" "BLOWING SNOW"
## [45] "BLOWING SNOW & EXTREME WIND CH" "BLOWING SNOW- EXTREME WIND CHI"
## [47] "BLOWING SNOW/EXTREME WIND CHIL" "BREAKUP FLOODING"
## [49] "BRUSH FIRE" "BRUSH FIRES"
## [51] "COASTAL FLOODING/EROSION" "COASTAL EROSION"
## [53] "Coastal Flood" "COASTAL FLOOD"
## [55] "coastal flooding" "Coastal Flooding"
## [57] "COASTAL FLOODING" "COASTAL FLOODING/EROSION"
## [59] "Coastal Storm" "COASTAL STORM"
## [61] "COASTAL SURGE" "COASTAL/TIDAL FLOOD"
## [63] "COASTALFLOOD" "COASTALSTORM"
## [65] "Cold" "COLD"
## [67] "COLD AIR FUNNEL" "COLD AIR FUNNELS"
## [69] "COLD AIR TORNADO" "Cold and Frost"
## [71] "COLD AND FROST" "COLD AND SNOW"
## [73] "COLD AND WET CONDITIONS" "Cold Temperature"
## [75] "COLD TEMPERATURES" "COLD WAVE"
## [77] "COLD WEATHER" "COLD WIND CHILL TEMPERATURES"
## [79] "COLD/WIND CHILL" "COLD/WINDS"
## [81] "COOL AND WET" "COOL SPELL"
## [83] "CSTL FLOODING/EROSION" "DAM BREAK"
## [85] "DAM FAILURE" "Damaging Freeze"
## [87] "DAMAGING FREEZE" "DEEP HAIL"
## [89] "DENSE FOG" "DENSE SMOKE"
## [91] "DOWNBURST" "DOWNBURST WINDS"
## [93] "DRIEST MONTH" "Drifting Snow"
## [95] "DROUGHT" "DROUGHT/EXCESSIVE HEAT"
## [97] "DROWNING" "DRY"
## [99] "DRY CONDITIONS" "DRY HOT WEATHER"
## [101] "DRY MICROBURST" "DRY MICROBURST 50"
## [103] "DRY MICROBURST 53" "DRY MICROBURST 58"
## [105] "DRY MICROBURST 61" "DRY MICROBURST 84"
## [107] "DRY MICROBURST WINDS" "DRY MIRCOBURST WINDS"
## [109] "DRY PATTERN" "DRY SPELL"
## [111] "DRY WEATHER" "DRYNESS"
## [113] "DUST DEVEL" "Dust Devil"
## [115] "DUST DEVIL" "DUST DEVIL WATERSPOUT"
## [117] "DUST STORM" "DUST STORM/HIGH WINDS"
## [119] "DUSTSTORM" "EARLY FREEZE"
## [121] "Early Frost" "EARLY FROST"
## [123] "EARLY RAIN" "EARLY SNOW"
## [125] "Early snowfall" "EARLY SNOWFALL"
## [127] "Erosion/Cstl Flood" "EXCESSIVE"
## [129] "Excessive Cold" "EXCESSIVE HEAT"
## [131] "EXCESSIVE HEAT/DROUGHT" "EXCESSIVE PRECIPITATION"
## [133] "EXCESSIVE RAIN" "EXCESSIVE RAINFALL"
## [135] "EXCESSIVE SNOW" "EXCESSIVE WETNESS"
## [137] "EXCESSIVELY DRY" "Extended Cold"
## [139] "Extreme Cold" "EXTREME COLD"
## [141] "EXTREME COLD/WIND CHILL" "EXTREME HEAT"
## [143] "EXTREME WIND CHILL" "EXTREME WIND CHILL/BLOWING SNO"
## [145] "EXTREME WIND CHILLS" "EXTREME WINDCHILL"
## [147] "EXTREME WINDCHILL TEMPERATURES" "EXTREME/RECORD COLD"
## [149] "EXTREMELY WET" "FALLING SNOW/ICE"
## [151] "FIRST FROST" "FIRST SNOW"
## [153] "FLASH FLOOD" "FLASH FLOOD - HEAVY RAIN"
## [155] "FLASH FLOOD FROM ICE JAMS" "FLASH FLOOD LANDSLIDES"
## [157] "FLASH FLOOD WINDS" "FLASH FLOOD/"
## [159] "FLASH FLOOD/ FLOOD" "FLASH FLOOD/ STREET"
## [161] "FLASH FLOOD/FLOOD" "FLASH FLOOD/HEAVY RAIN"
## [163] "FLASH FLOOD/LANDSLIDE" "FLASH FLOODING"
## [165] "FLASH FLOODING/FLOOD" "FLASH FLOODING/THUNDERSTORM WI"
## [167] "FLASH FLOODS" "FLASH FLOOODING"
## [169] "Flood" "FLOOD"
## [171] "FLOOD & HEAVY RAIN" "FLOOD FLASH"
## [173] "FLOOD FLOOD/FLASH" "FLOOD WATCH/"
## [175] "FLOOD/FLASH" "Flood/Flash Flood"
## [177] "FLOOD/FLASH FLOOD" "FLOOD/FLASH FLOODING"
## [179] "FLOOD/FLASH/FLOOD" "FLOOD/FLASHFLOOD"
## [181] "FLOOD/RAIN/WIND" "FLOOD/RAIN/WINDS"
## [183] "FLOOD/RIVER FLOOD" "Flood/Strong Wind"
## [185] "FLOODING" "FLOODING/HEAVY RAIN"
## [187] "FLOODS" "FOG"
## [189] "FOG AND COLD TEMPERATURES" "FOREST FIRES"
## [191] "Freeze" "FREEZE"
## [193] "Freezing drizzle" "Freezing Drizzle"
## [195] "FREEZING DRIZZLE" "FREEZING DRIZZLE AND FREEZING"
## [197] "Freezing Fog" "FREEZING FOG"
## [199] "Freezing rain" "Freezing Rain"
## [201] "FREEZING RAIN" "FREEZING RAIN AND SLEET"
## [203] "FREEZING RAIN AND SNOW" "FREEZING RAIN SLEET AND"
## [205] "FREEZING RAIN SLEET AND LIGHT" "FREEZING RAIN/SLEET"
## [207] "FREEZING RAIN/SNOW" "Freezing Spray"
## [209] "Frost" "FROST"
## [211] "Frost/Freeze" "FROST/FREEZE"
## [213] "FROST\\FREEZE" "FUNNEL"
## [215] "Funnel Cloud" "FUNNEL CLOUD"
## [217] "FUNNEL CLOUD." "FUNNEL CLOUD/HAIL"
## [219] "FUNNEL CLOUDS" "FUNNELS"
## [221] "Glaze" "GLAZE"
## [223] "GLAZE ICE" "GLAZE/ICE STORM"
## [225] "gradient wind" "Gradient wind"
## [227] "GRADIENT WIND" "GRADIENT WINDS"
## [229] "GRASS FIRES" "GROUND BLIZZARD"
## [231] "GUSTNADO" "GUSTNADO AND"
## [233] "GUSTY LAKE WIND" "GUSTY THUNDERSTORM WIND"
## [235] "GUSTY THUNDERSTORM WINDS" "Gusty Wind"
## [237] "GUSTY WIND" "GUSTY WIND/HAIL"
## [239] "GUSTY WIND/HVY RAIN" "Gusty wind/rain"
## [241] "Gusty winds" "Gusty Winds"
## [243] "GUSTY WINDS" "HAIL"
## [245] "HAIL 0.75" "HAIL 0.88"
## [247] "HAIL 075" "HAIL 088"
## [249] "HAIL 1.00" "HAIL 1.75"
## [251] "HAIL 1.75)" "HAIL 100"
## [253] "HAIL 125" "HAIL 150"
## [255] "HAIL 175" "HAIL 200"
## [257] "HAIL 225" "HAIL 275"
## [259] "HAIL 450" "HAIL 75"
## [261] "HAIL 80" "HAIL 88"
## [263] "HAIL ALOFT" "HAIL DAMAGE"
## [265] "HAIL FLOODING" "HAIL STORM"
## [267] "Hail(0.75)" "HAIL/ICY ROADS"
## [269] "HAIL/WIND" "HAIL/WINDS"
## [271] "HAILSTORM" "HAILSTORMS"
## [273] "HARD FREEZE" "HAZARDOUS SURF"
## [275] "HEAT" "HEAT DROUGHT"
## [277] "Heat Wave" "HEAT WAVE"
## [279] "HEAT WAVE DROUGHT" "HEAT WAVES"
## [281] "HEAT/DROUGHT" "Heatburst"
## [283] "HEAVY LAKE SNOW" "HEAVY MIX"
## [285] "HEAVY PRECIPATATION" "Heavy Precipitation"
## [287] "HEAVY PRECIPITATION" "Heavy rain"
## [289] "Heavy Rain" "HEAVY RAIN"
## [291] "HEAVY RAIN AND FLOOD" "Heavy Rain and Wind"
## [293] "HEAVY RAIN EFFECTS" "HEAVY RAIN; URBAN FLOOD WINDS;"
## [295] "HEAVY RAIN/FLOODING" "Heavy Rain/High Surf"
## [297] "HEAVY RAIN/LIGHTNING" "HEAVY RAIN/MUDSLIDES/FLOOD"
## [299] "HEAVY RAIN/SEVERE WEATHER" "HEAVY RAIN/SMALL STREAM URBAN"
## [301] "HEAVY RAIN/SNOW" "HEAVY RAIN/URBAN FLOOD"
## [303] "HEAVY RAIN/WIND" "HEAVY RAINFALL"
## [305] "HEAVY RAINS" "HEAVY RAINS/FLOODING"
## [307] "HEAVY SEAS" "HEAVY SHOWER"
## [309] "HEAVY SHOWERS" "HEAVY SNOW"
## [311] "HEAVY SNOW FREEZING RAIN" "HEAVY SNOW & ICE"
## [313] "HEAVY SNOW AND" "HEAVY SNOW AND HIGH WINDS"
## [315] "HEAVY SNOW AND ICE" "HEAVY SNOW AND ICE STORM"
## [317] "HEAVY SNOW AND STRONG WINDS" "HEAVY SNOW ANDBLOWING SNOW"
## [319] "Heavy snow shower" "HEAVY SNOW SQUALLS"
## [321] "HEAVY SNOW-SQUALLS" "HEAVY SNOW/BLIZZARD"
## [323] "HEAVY SNOW/BLIZZARD/AVALANCHE" "HEAVY SNOW/BLOWING SNOW"
## [325] "HEAVY SNOW/FREEZING RAIN" "HEAVY SNOW/HIGH"
## [327] "HEAVY SNOW/HIGH WIND" "HEAVY SNOW/HIGH WINDS"
## [329] "HEAVY SNOW/HIGH WINDS & FLOOD" "HEAVY SNOW/HIGH WINDS/FREEZING"
## [331] "HEAVY SNOW/ICE" "HEAVY SNOW/ICE STORM"
## [333] "HEAVY SNOW/SLEET" "HEAVY SNOW/SQUALLS"
## [335] "HEAVY SNOW/WIND" "HEAVY SNOW/WINTER STORM"
## [337] "HEAVY SNOWPACK" "Heavy Surf"
## [339] "HEAVY SURF" "Heavy surf and wind"
## [341] "HEAVY SURF COASTAL FLOODING" "HEAVY SURF/HIGH SURF"
## [343] "HEAVY SWELLS" "HEAVY WET SNOW"
## [345] "HIGH" "HIGH SWELLS"
## [347] "HIGH WINDS" "HIGH SEAS"
## [349] "High Surf" "HIGH SURF"
## [351] "HIGH SURF ADVISORIES" "HIGH SURF ADVISORY"
## [353] "HIGH SWELLS" "HIGH TEMPERATURE RECORD"
## [355] "HIGH TIDES" "HIGH WATER"
## [357] "HIGH WAVES" "High Wind"
## [359] "HIGH WIND" "HIGH WIND (G40)"
## [361] "HIGH WIND 48" "HIGH WIND 63"
## [363] "HIGH WIND 70" "HIGH WIND AND HEAVY SNOW"
## [365] "HIGH WIND AND HIGH TIDES" "HIGH WIND AND SEAS"
## [367] "HIGH WIND DAMAGE" "HIGH WIND/ BLIZZARD"
## [369] "HIGH WIND/BLIZZARD" "HIGH WIND/BLIZZARD/FREEZING RA"
## [371] "HIGH WIND/HEAVY SNOW" "HIGH WIND/LOW WIND CHILL"
## [373] "HIGH WIND/SEAS" "HIGH WIND/WIND CHILL"
## [375] "HIGH WIND/WIND CHILL/BLIZZARD" "HIGH WINDS"
## [377] "HIGH WINDS 55" "HIGH WINDS 57"
## [379] "HIGH WINDS 58" "HIGH WINDS 63"
## [381] "HIGH WINDS 66" "HIGH WINDS 67"
## [383] "HIGH WINDS 73" "HIGH WINDS 76"
## [385] "HIGH WINDS 80" "HIGH WINDS 82"
## [387] "HIGH WINDS AND WIND CHILL" "HIGH WINDS DUST STORM"
## [389] "HIGH WINDS HEAVY RAINS" "HIGH WINDS/"
## [391] "HIGH WINDS/COASTAL FLOOD" "HIGH WINDS/COLD"
## [393] "HIGH WINDS/FLOODING" "HIGH WINDS/HEAVY RAIN"
## [395] "HIGH WINDS/SNOW" "HIGHWAY FLOODING"
## [397] "Hot and Dry" "HOT PATTERN"
## [399] "HOT SPELL" "HOT WEATHER"
## [401] "HOT/DRY PATTERN" "HURRICANE"
## [403] "Hurricane Edouard" "HURRICANE EMILY"
## [405] "HURRICANE ERIN" "HURRICANE FELIX"
## [407] "HURRICANE GORDON" "HURRICANE OPAL"
## [409] "HURRICANE OPAL/HIGH WINDS" "HURRICANE-GENERATED SWELLS"
## [411] "HURRICANE/TYPHOON" "HVY RAIN"
## [413] "HYPERTHERMIA/EXPOSURE" "HYPOTHERMIA"
## [415] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [417] "ICE" "ICE AND SNOW"
## [419] "ICE FLOES" "Ice Fog"
## [421] "ICE JAM" "Ice jam flood (minor"
## [423] "ICE JAM FLOODING" "ICE ON ROAD"
## [425] "ICE PELLETS" "ICE ROADS"
## [427] "ICE STORM" "ICE STORM AND SNOW"
## [429] "ICE STORM/FLASH FLOOD" "Ice/Snow"
## [431] "ICE/SNOW" "ICE/STRONG WINDS"
## [433] "Icestorm/Blizzard" "Icy Roads"
## [435] "ICY ROADS" "LACK OF SNOW"
## [437] "Lake Effect Snow" "LAKE EFFECT SNOW"
## [439] "LAKE FLOOD" "LAKE-EFFECT SNOW"
## [441] "LAKESHORE FLOOD" "LANDSLIDE"
## [443] "LANDSLIDE/URBAN FLOOD" "LANDSLIDES"
## [445] "Landslump" "LANDSLUMP"
## [447] "LANDSPOUT" "LARGE WALL CLOUD"
## [449] "LATE FREEZE" "LATE SEASON HAIL"
## [451] "LATE SEASON SNOW" "Late Season Snowfall"
## [453] "LATE SNOW" "Late-season Snowfall"
## [455] "LIGHT FREEZING RAIN" "Light snow"
## [457] "Light Snow" "LIGHT SNOW"
## [459] "LIGHT SNOW AND SLEET" "Light Snow/Flurries"
## [461] "LIGHT SNOW/FREEZING PRECIP" "Light Snowfall"
## [463] "LIGHTING" "LIGHTNING"
## [465] "LIGHTNING WAUSEON" "LIGHTNING AND HEAVY RAIN"
## [467] "LIGHTNING AND THUNDERSTORM WIN" "LIGHTNING AND WINDS"
## [469] "LIGHTNING DAMAGE" "LIGHTNING FIRE"
## [471] "LIGHTNING INJURY" "LIGHTNING THUNDERSTORM WINDS"
## [473] "LIGHTNING THUNDERSTORM WINDSS" "LIGHTNING."
## [475] "LIGHTNING/HEAVY RAIN" "LIGNTNING"
## [477] "LOCAL FLASH FLOOD" "LOCAL FLOOD"
## [479] "LOCALLY HEAVY RAIN" "LOW TEMPERATURE"
## [481] "LOW TEMPERATURE RECORD" "LOW WIND CHILL"
## [483] "MAJOR FLOOD" "Marine Accident"
## [485] "MARINE HAIL" "MARINE HIGH WIND"
## [487] "MARINE MISHAP" "MARINE STRONG WIND"
## [489] "MARINE THUNDERSTORM WIND" "MARINE TSTM WIND"
## [491] "Metro Storm, May 26" "Microburst"
## [493] "MICROBURST" "MICROBURST WINDS"
## [495] "Mild and Dry Pattern" "MILD PATTERN"
## [497] "MILD/DRY PATTERN" "MINOR FLOOD"
## [499] "Minor Flooding" "MINOR FLOODING"
## [501] "MIXED PRECIP" "Mixed Precipitation"
## [503] "MIXED PRECIPITATION" "MODERATE SNOW"
## [505] "MODERATE SNOWFALL" "MONTHLY PRECIPITATION"
## [507] "Monthly Rainfall" "MONTHLY RAINFALL"
## [509] "Monthly Snowfall" "MONTHLY SNOWFALL"
## [511] "MONTHLY TEMPERATURE" "Mountain Snows"
## [513] "MUD SLIDE" "MUD SLIDES"
## [515] "MUD SLIDES URBAN FLOODING" "MUD/ROCK SLIDE"
## [517] "Mudslide" "MUDSLIDE"
## [519] "MUDSLIDE/LANDSLIDE" "Mudslides"
## [521] "MUDSLIDES" "NEAR RECORD SNOW"
## [523] "No Severe Weather" "NON SEVERE HAIL"
## [525] "NON TSTM WIND" "NON-SEVERE WIND DAMAGE"
## [527] "NON-TSTM WIND" "NONE"
## [529] "NORMAL PRECIPITATION" "NORTHERN LIGHTS"
## [531] "Other" "OTHER"
## [533] "PATCHY DENSE FOG" "PATCHY ICE"
## [535] "Prolong Cold" "PROLONG COLD"
## [537] "PROLONG COLD/SNOW" "PROLONG WARMTH"
## [539] "PROLONGED RAIN" "RAIN"
## [541] "RAIN (HEAVY)" "RAIN AND WIND"
## [543] "Rain Damage" "RAIN/SNOW"
## [545] "RAIN/WIND" "RAINSTORM"
## [547] "RAPIDLY RISING WATER" "RECORD COLD"
## [549] "Record Cold" "RECORD COLD"
## [551] "RECORD COLD AND HIGH WIND" "RECORD COLD/FROST"
## [553] "RECORD COOL" "Record dry month"
## [555] "RECORD DRYNESS" "Record Heat"
## [557] "RECORD HEAT" "RECORD HEAT WAVE"
## [559] "Record High" "RECORD HIGH"
## [561] "RECORD HIGH TEMPERATURE" "RECORD HIGH TEMPERATURES"
## [563] "RECORD LOW" "RECORD LOW RAINFALL"
## [565] "Record May Snow" "RECORD PRECIPITATION"
## [567] "RECORD RAINFALL" "RECORD SNOW"
## [569] "RECORD SNOW/COLD" "RECORD SNOWFALL"
## [571] "Record temperature" "RECORD TEMPERATURE"
## [573] "Record Temperatures" "RECORD TEMPERATURES"
## [575] "RECORD WARM" "RECORD WARM TEMPS."
## [577] "Record Warmth" "RECORD WARMTH"
## [579] "Record Winter Snow" "RECORD/EXCESSIVE HEAT"
## [581] "RECORD/EXCESSIVE RAINFALL" "RED FLAG CRITERIA"
## [583] "RED FLAG FIRE WX" "REMNANTS OF FLOYD"
## [585] "RIP CURRENT" "RIP CURRENTS"
## [587] "RIP CURRENTS HEAVY SURF" "RIP CURRENTS/HEAVY SURF"
## [589] "RIVER AND STREAM FLOOD" "RIVER FLOOD"
## [591] "River Flooding" "RIVER FLOODING"
## [593] "ROCK SLIDE" "ROGUE WAVE"
## [595] "ROTATING WALL CLOUD" "ROUGH SEAS"
## [597] "ROUGH SURF" "RURAL FLOOD"
## [599] "Saharan Dust" "SAHARAN DUST"
## [601] "Seasonal Snowfall" "SEICHE"
## [603] "SEVERE COLD" "SEVERE THUNDERSTORM"
## [605] "SEVERE THUNDERSTORM WINDS" "SEVERE THUNDERSTORMS"
## [607] "SEVERE TURBULENCE" "SLEET"
## [609] "SLEET & FREEZING RAIN" "SLEET STORM"
## [611] "SLEET/FREEZING RAIN" "SLEET/ICE STORM"
## [613] "SLEET/RAIN/SNOW" "SLEET/SNOW"
## [615] "small hail" "Small Hail"
## [617] "SMALL HAIL" "SMALL STREAM"
## [619] "SMALL STREAM AND" "SMALL STREAM AND URBAN FLOOD"
## [621] "SMALL STREAM AND URBAN FLOODIN" "SMALL STREAM FLOOD"
## [623] "SMALL STREAM FLOODING" "SMALL STREAM URBAN FLOOD"
## [625] "SMALL STREAM/URBAN FLOOD" "Sml Stream Fld"
## [627] "SMOKE" "Snow"
## [629] "SNOW" "Snow Accumulation"
## [631] "SNOW ACCUMULATION" "SNOW ADVISORY"
## [633] "SNOW AND COLD" "SNOW AND HEAVY SNOW"
## [635] "Snow and Ice" "SNOW AND ICE"
## [637] "SNOW AND ICE STORM" "Snow and sleet"
## [639] "SNOW AND SLEET" "SNOW AND WIND"
## [641] "SNOW DROUGHT" "SNOW FREEZING RAIN"
## [643] "SNOW SHOWERS" "SNOW SLEET"
## [645] "SNOW SQUALL" "Snow squalls"
## [647] "Snow Squalls" "SNOW SQUALLS"
## [649] "SNOW- HIGH WIND- WIND CHILL" "SNOW/ BITTER COLD"
## [651] "SNOW/ ICE" "SNOW/BLOWING SNOW"
## [653] "SNOW/COLD" "SNOW/FREEZING RAIN"
## [655] "SNOW/HEAVY SNOW" "SNOW/HIGH WINDS"
## [657] "SNOW/ICE" "SNOW/ICE STORM"
## [659] "SNOW/RAIN" "SNOW/RAIN/SLEET"
## [661] "SNOW/SLEET" "SNOW/SLEET/FREEZING RAIN"
## [663] "SNOW/SLEET/RAIN" "SNOW\\COLD"
## [665] "SNOWFALL RECORD" "SNOWMELT FLOODING"
## [667] "SNOWSTORM" "SOUTHEAST"
## [669] "STORM FORCE WINDS" "STORM SURGE"
## [671] "STORM SURGE/TIDE" "STREAM FLOODING"
## [673] "STREET FLOOD" "STREET FLOODING"
## [675] "Strong Wind" "STRONG WIND"
## [677] "STRONG WIND GUST" "Strong winds"
## [679] "Strong Winds" "STRONG WINDS"
## [681] "Summary August 10" "Summary August 11"
## [683] "Summary August 17" "Summary August 2-3"
## [685] "Summary August 21" "Summary August 28"
## [687] "Summary August 4" "Summary August 7"
## [689] "Summary August 9" "Summary Jan 17"
## [691] "Summary July 23-24" "Summary June 18-19"
## [693] "Summary June 5-6" "Summary June 6"
## [695] "Summary of April 12" "Summary of April 13"
## [697] "Summary of April 21" "Summary of April 27"
## [699] "Summary of April 3rd" "Summary of August 1"
## [701] "Summary of July 11" "Summary of July 2"
## [703] "Summary of July 22" "Summary of July 26"
## [705] "Summary of July 29" "Summary of July 3"
## [707] "Summary of June 10" "Summary of June 11"
## [709] "Summary of June 12" "Summary of June 13"
## [711] "Summary of June 15" "Summary of June 16"
## [713] "Summary of June 18" "Summary of June 23"
## [715] "Summary of June 24" "Summary of June 3"
## [717] "Summary of June 30" "Summary of June 4"
## [719] "Summary of June 6" "Summary of March 14"
## [721] "Summary of March 23" "Summary of March 24"
## [723] "SUMMARY OF MARCH 24-25" "SUMMARY OF MARCH 27"
## [725] "SUMMARY OF MARCH 29" "Summary of May 10"
## [727] "Summary of May 13" "Summary of May 14"
## [729] "Summary of May 22" "Summary of May 22 am"
## [731] "Summary of May 22 pm" "Summary of May 26 am"
## [733] "Summary of May 26 pm" "Summary of May 31 am"
## [735] "Summary of May 31 pm" "Summary of May 9-10"
## [737] "Summary Sept. 25-26" "Summary September 20"
## [739] "Summary September 23" "Summary September 3"
## [741] "Summary September 4" "Summary: Nov. 16"
## [743] "Summary: Nov. 6-7" "Summary: Oct. 20-21"
## [745] "Summary: October 31" "Summary: Sept. 18"
## [747] "Temperature record" "THUDERSTORM WINDS"
## [749] "THUNDEERSTORM WINDS" "THUNDERESTORM WINDS"
## [751] "THUNDERSNOW" "Thundersnow shower"
## [753] "THUNDERSTORM" "THUNDERSTORM WINDS"
## [755] "THUNDERSTORM DAMAGE" "THUNDERSTORM DAMAGE TO"
## [757] "THUNDERSTORM HAIL" "THUNDERSTORM W INDS"
## [759] "Thunderstorm Wind" "THUNDERSTORM WIND"
## [761] "THUNDERSTORM WIND (G40)" "THUNDERSTORM WIND 50"
## [763] "THUNDERSTORM WIND 52" "THUNDERSTORM WIND 56"
## [765] "THUNDERSTORM WIND 59" "THUNDERSTORM WIND 59 MPH"
## [767] "THUNDERSTORM WIND 59 MPH." "THUNDERSTORM WIND 60 MPH"
## [769] "THUNDERSTORM WIND 65 MPH" "THUNDERSTORM WIND 65MPH"
## [771] "THUNDERSTORM WIND 69" "THUNDERSTORM WIND 98 MPH"
## [773] "THUNDERSTORM WIND G50" "THUNDERSTORM WIND G51"
## [775] "THUNDERSTORM WIND G52" "THUNDERSTORM WIND G55"
## [777] "THUNDERSTORM WIND G60" "THUNDERSTORM WIND G61"
## [779] "THUNDERSTORM WIND TREES" "THUNDERSTORM WIND."
## [781] "THUNDERSTORM WIND/ TREE" "THUNDERSTORM WIND/ TREES"
## [783] "THUNDERSTORM WIND/AWNING" "THUNDERSTORM WIND/HAIL"
## [785] "THUNDERSTORM WIND/LIGHTNING" "THUNDERSTORM WINDS"
## [787] "THUNDERSTORM WINDS LE CEN" "THUNDERSTORM WINDS 13"
## [789] "THUNDERSTORM WINDS 2" "THUNDERSTORM WINDS 50"
## [791] "THUNDERSTORM WINDS 52" "THUNDERSTORM WINDS 53"
## [793] "THUNDERSTORM WINDS 60" "THUNDERSTORM WINDS 61"
## [795] "THUNDERSTORM WINDS 62" "THUNDERSTORM WINDS 63 MPH"
## [797] "THUNDERSTORM WINDS AND" "THUNDERSTORM WINDS FUNNEL CLOU"
## [799] "THUNDERSTORM WINDS G" "THUNDERSTORM WINDS G60"
## [801] "THUNDERSTORM WINDS HAIL" "THUNDERSTORM WINDS HEAVY RAIN"
## [803] "THUNDERSTORM WINDS LIGHTNING" "THUNDERSTORM WINDS SMALL STREA"
## [805] "THUNDERSTORM WINDS URBAN FLOOD" "THUNDERSTORM WINDS."
## [807] "THUNDERSTORM WINDS/ FLOOD" "THUNDERSTORM WINDS/ HAIL"
## [809] "THUNDERSTORM WINDS/FLASH FLOOD" "THUNDERSTORM WINDS/FLOODING"
## [811] "THUNDERSTORM WINDS/FUNNEL CLOU" "THUNDERSTORM WINDS/HAIL"
## [813] "THUNDERSTORM WINDS/HEAVY RAIN" "THUNDERSTORM WINDS53"
## [815] "THUNDERSTORM WINDSHAIL" "THUNDERSTORM WINDSS"
## [817] "THUNDERSTORM WINS" "THUNDERSTORMS"
## [819] "THUNDERSTORMS WIND" "THUNDERSTORMS WINDS"
## [821] "THUNDERSTORMW" "THUNDERSTORMW 50"
## [823] "THUNDERSTORMW WINDS" "THUNDERSTORMWINDS"
## [825] "THUNDERSTROM WIND" "THUNDERSTROM WINDS"
## [827] "THUNDERTORM WINDS" "THUNDERTSORM WIND"
## [829] "THUNDESTORM WINDS" "THUNERSTORM WINDS"
## [831] "TIDAL FLOOD" "Tidal Flooding"
## [833] "TIDAL FLOODING" "TORNADO"
## [835] "TORNADO DEBRIS" "TORNADO F0"
## [837] "TORNADO F1" "TORNADO F2"
## [839] "TORNADO F3" "TORNADO/WATERSPOUT"
## [841] "TORNADOES" "TORNADOES, TSTM WIND, HAIL"
## [843] "TORNADOS" "TORNDAO"
## [845] "TORRENTIAL RAIN" "Torrential Rainfall"
## [847] "TROPICAL DEPRESSION" "TROPICAL STORM"
## [849] "TROPICAL STORM ALBERTO" "TROPICAL STORM DEAN"
## [851] "TROPICAL STORM GORDON" "TROPICAL STORM JERRY"
## [853] "TSTM" "TSTM HEAVY RAIN"
## [855] "Tstm Wind" "TSTM WIND"
## [857] "TSTM WIND (G45)" "TSTM WIND (41)"
## [859] "TSTM WIND (G35)" "TSTM WIND (G40)"
## [861] "TSTM WIND (G45)" "TSTM WIND 40"
## [863] "TSTM WIND 45" "TSTM WIND 50"
## [865] "TSTM WIND 51" "TSTM WIND 52"
## [867] "TSTM WIND 55" "TSTM WIND 65)"
## [869] "TSTM WIND AND LIGHTNING" "TSTM WIND DAMAGE"
## [871] "TSTM WIND G45" "TSTM WIND G58"
## [873] "TSTM WIND/HAIL" "TSTM WINDS"
## [875] "TSTM WND" "TSTMW"
## [877] "TSUNAMI" "TUNDERSTORM WIND"
## [879] "TYPHOON" "Unseasonable Cold"
## [881] "UNSEASONABLY COLD" "UNSEASONABLY COOL"
## [883] "UNSEASONABLY COOL & WET" "UNSEASONABLY DRY"
## [885] "UNSEASONABLY HOT" "UNSEASONABLY WARM"
## [887] "UNSEASONABLY WARM & WET" "UNSEASONABLY WARM AND DRY"
## [889] "UNSEASONABLY WARM YEAR" "UNSEASONABLY WARM/WET"
## [891] "UNSEASONABLY WET" "UNSEASONAL LOW TEMP"
## [893] "UNSEASONAL RAIN" "UNUSUAL WARMTH"
## [895] "UNUSUAL/RECORD WARMTH" "UNUSUALLY COLD"
## [897] "UNUSUALLY LATE SNOW" "UNUSUALLY WARM"
## [899] "URBAN AND SMALL" "URBAN AND SMALL STREAM"
## [901] "URBAN AND SMALL STREAM FLOOD" "URBAN AND SMALL STREAM FLOODIN"
## [903] "Urban flood" "Urban Flood"
## [905] "URBAN FLOOD" "URBAN FLOOD LANDSLIDE"
## [907] "Urban Flooding" "URBAN FLOODING"
## [909] "URBAN FLOODS" "URBAN SMALL"
## [911] "URBAN SMALL STREAM FLOOD" "URBAN/SMALL"
## [913] "URBAN/SMALL FLOODING" "URBAN/SMALL STREAM"
## [915] "URBAN/SMALL STREAM FLOOD" "URBAN/SMALL STREAM FLOOD"
## [917] "URBAN/SMALL STREAM FLOODING" "URBAN/SMALL STRM FLDG"
## [919] "URBAN/SML STREAM FLD" "URBAN/SML STREAM FLDG"
## [921] "URBAN/STREET FLOODING" "VERY DRY"
## [923] "VERY WARM" "VOG"
## [925] "Volcanic Ash" "VOLCANIC ASH"
## [927] "Volcanic Ash Plume" "VOLCANIC ASHFALL"
## [929] "VOLCANIC ERUPTION" "WAKE LOW WIND"
## [931] "WALL CLOUD" "WALL CLOUD/FUNNEL CLOUD"
## [933] "WARM DRY CONDITIONS" "WARM WEATHER"
## [935] "WATER SPOUT" "WATERSPOUT"
## [937] "WATERSPOUT FUNNEL CLOUD" "WATERSPOUT TORNADO"
## [939] "WATERSPOUT-" "WATERSPOUT-TORNADO"
## [941] "WATERSPOUT/" "WATERSPOUT/ TORNADO"
## [943] "WATERSPOUT/TORNADO" "WATERSPOUTS"
## [945] "WAYTERSPOUT" "wet micoburst"
## [947] "WET MICROBURST" "Wet Month"
## [949] "WET SNOW" "WET WEATHER"
## [951] "Wet Year" "Whirlwind"
## [953] "WHIRLWIND" "WILD FIRES"
## [955] "WILD/FOREST FIRE" "WILD/FOREST FIRES"
## [957] "WILDFIRE" "WILDFIRES"
## [959] "Wind" "WIND"
## [961] "WIND ADVISORY" "WIND AND WAVE"
## [963] "WIND CHILL" "WIND CHILL/HIGH WIND"
## [965] "Wind Damage" "WIND DAMAGE"
## [967] "WIND GUSTS" "WIND STORM"
## [969] "WIND/HAIL" "WINDS"
## [971] "WINTER MIX" "WINTER STORM"
## [973] "WINTER STORM HIGH WINDS" "WINTER STORM/HIGH WIND"
## [975] "WINTER STORM/HIGH WINDS" "WINTER STORMS"
## [977] "Winter Weather" "WINTER WEATHER"
## [979] "WINTER WEATHER MIX" "WINTER WEATHER/MIX"
## [981] "WINTERY MIX" "Wintry mix"
## [983] "Wintry Mix" "WINTRY MIX"
## [985] "WND"
The event list shows that event naming is inconsistent. For instance, ‘TSTM WIND’ and ‘TSTM WIND (G45)’ describe the same kind of event and can be grouped together. We need to normalise these values.
# normalisation of the values of variable EVTYPE
# load the 'stringr' package for string manipulation
library(stringr)
# change the character to upper case and remove leading/trailing spaces
dt.raw[, EVTYPE:= str_trim(toupper(EVTYPE), side = 'both')]
# standardize the event naming
dt.raw$EVTYPE <- gsub(' AND ', '/', dt.raw$EVTYPE)
dt.raw$EVTYPE <- gsub('ABNORMALLY', 'ABNORMAL', dt.raw$EVTYPE)
dt.raw$EVTYPE <- gsub('AVALANCE', 'AVALANCHE', dt.raw$EVTYPE)
dt.raw$EVTYPE <- gsub('EROSIN', 'EROSION', dt.raw$EVTYPE)
dt.raw$EVTYPE <- gsub('BITTER WIND CHILL TEMPERATURES', 'BITTER WIND CHILL', dt.raw$EVTYPE)
dt.raw$EVTYPE <- gsub('CHIL\\>', 'CHILL', dt.raw$EVTYPE, fixed = F)
dt.raw$EVTYPE[grep('^blizzard', dt.raw$EVTYPE, ignore.case = T)] = 'BLIZZARD'
dt.raw$EVTYPE[grep('^extreme wind', dt.raw$EVTYPE, ignore.case = T)] = 'EXTREME COLD/WIND CHILL'
dt.raw$EVTYPE[grep('^dry microburst', dt.raw$EVTYPE, ignore.case = T)] = 'DRY MICROBURST'
dt.raw$EVTYPE[grep('^FLASH FLOOD.*', dt.raw$EVTYPE, ignore.case = T)] = 'FLASH FLOOD'
dt.raw$EVTYPE[grep('^FLOOD.*', dt.raw$EVTYPE, ignore.case = T)] = 'FLOOD'
dt.raw$EVTYPE[grep('^((?!flash).*flood)+', dt.raw$EVTYPE, perl = T, ignore.case = T)] = 'FLOOD'
dt.raw$EVTYPE[grep('^HAIL.*', dt.raw$EVTYPE, ignore.case = T)] = 'HAIL'
dt.raw$EVTYPE[grep('^HEAT.*', dt.raw$EVTYPE, ignore.case = T)] = 'HEAT'
dt.raw$EVTYPE[grep('^HEAVY RAIN.*', dt.raw$EVTYPE, ignore.case = T)] = 'HEAVY RAIN'
dt.raw$EVTYPE[grep('^HEAVY SNOW.*', dt.raw$EVTYPE, ignore.case = T)] = 'HEAVY SNOW'
dt.raw$EVTYPE[grep('^HIGH WIND.*', dt.raw$EVTYPE, ignore.case = T)] = 'HIGH WIND'
dt.raw$EVTYPE[grep('^HURRICANE.*', dt.raw$EVTYPE, ignore.case = T)] = 'HURRICANE'
dt.raw$EVTYPE[grep('^LIGHT SNOW.*', dt.raw$EVTYPE, ignore.case = T)] = 'LIGHT SNOW'
dt.raw$EVTYPE[grep('^LIGHTNING.*', dt.raw$EVTYPE, ignore.case = T)] = 'LIGHTNING'
dt.raw$EVTYPE[grep('^MUD ?.*SLIDE.*', dt.raw$EVTYPE, ignore.case = T)] = 'MUD SLIDE'
dt.raw$EVTYPE[grep('^RECORD HEAT.*', dt.raw$EVTYPE, ignore.case = T)] = 'HEAT'
dt.raw$EVTYPE[grep('^RIP CURRENT.*', dt.raw$EVTYPE, ignore.case = T)] = 'RIP CURRENT'
dt.raw$EVTYPE[grep('^SLEET.*', dt.raw$EVTYPE, ignore.case = T)] = 'SLEET'
dt.raw$EVTYPE[grep('^THU.*M.*WIND.*', dt.raw$EVTYPE, ignore.case = T)] = 'THUNDERSTORM WIND'
dt.raw$EVTYPE[grep('^TORN.*D.*O.*', dt.raw$EVTYPE, ignore.case = T)] = 'TORNADO'
dt.raw$EVTYPE[grep('^TSTM.*', dt.raw$EVTYPE, ignore.case = T)] = 'THUNDERSTORM WIND'
dt.raw$EVTYPE[grep('^WATERSP.*', dt.raw$EVTYPE, ignore.case = T)] = 'WATERSPOUT'
dt.raw$EVTYPE[grep('^WINTER STO.*', dt.raw$EVTYPE, ignore.case = T)] = 'WINTER STORM'
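To see how these pattern rules behave, here is a minimal sketch on a toy vector, using base R only. The helper `normalise_evtype`, the `rules` vector and the sample values are hypothetical names introduced for illustration; only a handful of the patterns above are reproduced, and they are applied in order, as in the code above.

```r
# Hypothetical helper: trim, upper-case, then apply ordered regex -> label rules
normalise_evtype <- function(x, rules) {
  x <- trimws(toupper(x))
  for (pattern in names(rules)) {
    # every value matching the pattern is replaced by the canonical label
    x[grepl(pattern, x, ignore.case = TRUE, perl = TRUE)] <- rules[[pattern]]
  }
  x
}

# A small subset of the rules used above (order matters)
rules <- c(
  '^FLASH FLOOD'      = 'FLASH FLOOD',
  '^(?!FLASH).*FLOOD' = 'FLOOD',
  '^THU.*M.*WIND'     = 'THUNDERSTORM WIND',
  '^TSTM'             = 'THUNDERSTORM WIND',
  '^TORN.*D.*O'       = 'TORNADO'
)

# Toy input, made up for this illustration
sample_events <- c(' TSTM WIND (G45)', 'Thunderstorm Winds', 'FLASH FLOODING',
                   'Coastal Flood', 'TORNDAO', 'HAIL')

normalised <- normalise_evtype(sample_events, rules)
normalised
# "THUNDERSTORM WIND" "THUNDERSTORM WIND" "FLASH FLOOD" "FLOOD" "TORNADO" "HAIL"
```

Values that match no rule, such as 'HAIL' here, pass through unchanged apart from trimming and upper-casing.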
Next, we identify the top 10 weather event types with the greatest impact on population health.
# sum the injuries and fatalities of each event type
top10.event <- dt.raw[, .(EVTYPE,total = FATALITIES + INJURIES)]
# calculate the sum of impact on health aggregate by event type
top10.event <- top10.event[, .(total = sum(total, na.rm = T)), EVTYPE]
# extract the top 10 event type with highest impact
top10.event <- arrange(top10.event, -total)[1:10, ]
# show the result
head(top10.event)
## EVTYPE total
## 1: TORNADO 97022
## 2: THUNDERSTORM WIND 10179
## 3: EXCESSIVE HEAT 8428
## 4: FLOOD 7326
## 5: LIGHTNING 6049
## 6: HEAT 3664
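The same ranking can also be written as a single data.table chain, without a separate sorting function. The sketch below runs on a toy table whose values are made up for illustration; `toy` and `health_rank` are hypothetical names, and only the column names come from the dataset.

```r
library(data.table)

# Toy data, invented for this illustration
toy <- data.table(
  EVTYPE     = c('TORNADO', 'FLOOD', 'TORNADO', 'HEAT', 'FLOOD'),
  FATALITIES = c(2, 1, 0, 3, 0),
  INJURIES   = c(10, 4, 5, 0, 1)
)

# aggregate fatalities + injuries per event type, then sort in descending order
health_rank <- toy[, .(total = sum(FATALITIES + INJURIES, na.rm = TRUE)), by = EVTYPE][order(-total)]

# keep the top rows; TORNADO (17) ranks first, then FLOOD (6)
head(health_rank, 2)
```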
# subset the dataset to contain only top 10 event type and calculate the sum of health impact
top10.data <- dt.raw[EVTYPE %in% top10.event$EVTYPE, .(EVTYPE, FATALITIES, INJURIES)]
top10.data <- data.table::melt(top10.data, id.vars = 'EVTYPE', measure.vars = c('INJURIES', 'FATALITIES'))
top10.data <- top10.data[, .(value = sum(value)), .(EVTYPE, variable)]
# sort the dataset by the variable
top10.data <- arrange(top10.data, variable)
head(top10.data)
## EVTYPE variable value
## 1: TORNADO INJURIES 91364
## 2: THUNDERSTORM WIND INJURIES 9469
## 3: FLOOD INJURIES 6819
## 4: WINTER STORM INJURIES 1353
## 5: LIGHTNING INJURIES 5232
## 6: FLASH FLOOD INJURIES 1785
# reorder the level in the EVTYPE so that the chart will display from highest to lowest
top10.data$EVTYPE <- reorder(as.factor(top10.data$EVTYPE), top10.data$value, FUN = sum)
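As a small illustration of what `reorder` with `FUN = sum` does, the sketch below uses a toy data frame (values made up, `toy` is a hypothetical name). The factor levels end up sorted by the summed value per event type, in ascending order; after `coord_flip()` the last level is drawn at the top of the chart.

```r
# Toy long-format data, invented for this illustration
toy <- data.frame(
  EVTYPE   = c('FLOOD', 'TORNADO', 'HEAT', 'FLOOD', 'TORNADO', 'HEAT'),
  variable = rep(c('INJURIES', 'FATALITIES'), each = 3),
  value    = c(6, 17, 0, 1, 2, 3)
)

# order the levels by the sum of injuries and fatalities per event type
toy$EVTYPE <- reorder(as.factor(toy$EVTYPE), toy$value, FUN = sum)

levels(toy$EVTYPE)
# "HEAT" "FLOOD" "TORNADO"   (sums: 3, 7, 19)
```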
Now we can plot the chart to see the top 10 event types that have the greatest impact on population health.
library(ggplot2)
ggplot(top10.data) + geom_bar(mapping = aes(EVTYPE, value, fill=variable), stat = 'identity', position = 'stack') + coord_flip() + labs(title = 'Top 10 Event Types with Greatest Impact on Population Health', y = 'Total of injuries and fatalities', x = 'Event type')
From the graph, we can see that TORNADO, with a combined total of 97,022 injuries and fatalities, is the event type with the greatest impact on population health.
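The split between injuries and fatalities for TORNADO can be recovered from the two tables printed earlier. This is only a quick arithmetic check; it assumes that neither column contains missing values for this event type, so that the two aggregates are consistent with each other.

```r
# Figures copied from the printed tables above
tornado_total    <- 97022   # injuries + fatalities
tornado_injuries <- 91364   # injuries only

# fatalities follow by subtraction (assuming no missing values)
tornado_fatalities <- tornado_total - tornado_injuries
tornado_fatalities
# 5658

# share of injuries in the combined total, in percent
injury_share <- 100 * tornado_injuries / tornado_total
round(injury_share, 1)
# 94.2
```

Under that assumption, injuries make up most of the tornado total.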
Next, we identify the top 10 weather event types with the greatest economic impact.
# sum the property dmg and crop dmg of each event type
top10.econ <- dt.raw[, .(EVTYPE,total = PROPDMG.1 + CROPDMG.1)]
top10.econ <- top10.econ[, .(total = sum(total)), EVTYPE]
# sort it by total damage in descending manner
setorder(top10.econ, -total)
# subset the top 10 event type
top10.econ <- top10.econ[1:10,]
top10.econ
## EVTYPE total
## 1: FLOOD 161739817704
## 2: HURRICANE 90271472810
## 3: TORNADO 58959393949
## 4: STORM SURGE 43323541000
## 5: HAIL 19000564666
## 6: FLASH FLOOD 18170032328
## 7: DROUGHT 15018672000
## 8: THUNDERSTORM WIND 10986143694
## 9: ICE STORM 8967041360
## 10: TROPICAL STORM 8382236550
# subset the top 10 damage event type from dataset
top10.data1 <- dt.raw[EVTYPE %in% top10.econ$EVTYPE,]
# melt the data table so that we can plot the stacked chart
top10.data1 <- data.table::melt(top10.data1, id.vars = 'EVTYPE', measure.vars = c('PROPDMG.1', 'CROPDMG.1'))
# calculate the aggregated sum by variable type
top10.data1 <- top10.data1[, .(total = sum(value, na.rm = T)), .(EVTYPE, variable)]
# sort the dataset by variable type for better order arrangement in bar chart
setorder(top10.data1, variable)
# change the total unit to billion
top10.data1[, total:=total/(10^9)]
# reorder the level in the EVTYPE so that the chart will display from highest to lowest
top10.data1$EVTYPE <- reorder(as.factor(top10.data1$EVTYPE), top10.data1$total, FUN = sum)
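The melt-then-aggregate step can be seen more easily on a toy table. In the sketch below the values are made up and `toy`, `long` and `agg` are hypothetical names; the columns `PROPDMG.1` and `CROPDMG.1` are assumed to hold damage amounts already converted to dollars.

```r
library(data.table)

# Toy data, invented for this illustration (one crop value is missing)
toy <- data.table(
  EVTYPE    = c('FLOOD', 'HAIL', 'FLOOD'),
  PROPDMG.1 = c(2e9, 5e8, 1e9),
  CROPDMG.1 = c(5e8, 1e9, NA)
)

# wide to long: one row per (event, damage type)
long <- melt(toy, id.vars = 'EVTYPE', measure.vars = c('PROPDMG.1', 'CROPDMG.1'))

# sum per event type and damage type, ignoring NA, expressed in billions
agg <- long[, .(total = sum(value, na.rm = TRUE) / 1e9), by = .(EVTYPE, variable)]
agg
```

The long format gives one row per event type and damage type, which is what a stacked bar chart with `fill = variable` expects.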
Now we can plot the chart to see the top 10 event types that have the greatest economic impact.
ggplot(top10.data1) + geom_bar(mapping = aes(x = EVTYPE, y = total, fill = variable), stat = 'identity') + coord_flip() + labs(title = 'Top 10 Event Types with Greatest Economic Impact', y = 'Total damage to property and crops (billion USD)', x = 'Event type')
From the graph, we can see that FLOOD, with about 162 billion US dollars in combined crop and property damage, is the event type with the greatest economic impact.
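To put the flood figure in perspective, we can compute its share of the damage caused by the top 10 event types. The sketch below only reuses the totals printed in the table above; `top10_totals` and `flood_share` are hypothetical names, and the share is relative to the top 10, not to the whole dataset.

```r
# Totals copied from the printed top 10 table above
top10_totals <- c(
  FLOOD = 161739817704, HURRICANE = 90271472810, TORNADO = 58959393949,
  `STORM SURGE` = 43323541000, HAIL = 19000564666, `FLASH FLOOD` = 18170032328,
  DROUGHT = 15018672000, `THUNDERSTORM WIND` = 10986143694,
  `ICE STORM` = 8967041360, `TROPICAL STORM` = 8382236550
)

# share of FLOOD in the top 10 total, in percent
flood_share <- 100 * top10_totals[['FLOOD']] / sum(top10_totals)
round(flood_share, 1)
# 37.2
```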