A look at significant weather events in the United States shows that some types of events mainly cause bodily harm while others mainly cause financial harm. Tornadoes top the list for bodily harm, while floods and hurricanes cause the most property damage. Finally, droughts cause the most crop damage, with floods a close second.
First, the data was downloaded from the National Oceanic and Atmospheric Administration (NOAA). It contains records of significant weather phenomena.
if(!file.exists("StormData.csv.bz2")){
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"StormData.csv.bz2",method="curl")
}
Load the libraries we will need.
library(dplyr)
library(tidyr)
library(ggplot2)
Next, the data was read in from the file with stringsAsFactors turned off, since converting strings to factors is a costly operation for large datasets.
data <- read.csv("StormData.csv.bz2",stringsAsFactors=FALSE)
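If load time or memory becomes a concern, one option is to skip unneeded columns while reading. The sketch below is not part of the original analysis. It writes a small temporary CSV with made-up values and shows how a colClasses entry of "NULL" makes read.csv drop that column.

```r
# Toy CSV standing in for the storm file (hypothetical rows, real column names).
tmp <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,FATALITIES,REMARKS",
             "TORNADO,0,some long text",
             "HAIL,1,more text"), tmp)

# One colClasses entry per column; the string "NULL" skips the column entirely.
small <- read.csv(tmp,
                  stringsAsFactors = FALSE,
                  colClasses = c("character", "numeric", "NULL"))
str(small)

# Clean up the temporary file.
unlink(tmp)
```

For the real file the colClasses vector would need one entry for each of the 37 columns, in file order.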
Now we are going to have a quick look around our dataset to see what we have.
dim(data)
## [1] 902297 37
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
summary(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI
## Min. : 0.000 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character
## Median : 0.000 Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_DATE END_TIME COUNTY_END COUNTYENDN
## Length:902297 Length:902297 Min. :0 Mode:logical
## Class :character Class :character 1st Qu.:0 NA's:902297
## Mode :character Mode :character Median :0
## Mean :0
## 3rd Qu.:0
## Max. :0
##
## END_RANGE END_AZI END_LOCATI
## Min. : 0.0000 Length:902297 Length:902297
## 1st Qu.: 0.0000 Class :character Class :character
## Median : 0.0000 Mode :character Mode :character
## Mean : 0.9862
## 3rd Qu.: 0.0000
## Max. :925.0000
##
## LENGTH WIDTH F MAG
## Min. : 0.0000 Min. : 0.000 Min. :0.0 Min. : 0.0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.:0.0 1st Qu.: 0.0
## Median : 0.0000 Median : 0.000 Median :1.0 Median : 50.0
## Mean : 0.2301 Mean : 7.503 Mean :0.9 Mean : 46.9
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.:1.0 3rd Qu.: 75.0
## Max. :2315.0000 Max. :4400.000 Max. :5.0 Max. :22000.0
## NA's :843563
## FATALITIES INJURIES PROPDMG
## Min. : 0.0000 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 0.0000 Median : 0.0000 Median : 0.00
## Mean : 0.0168 Mean : 0.1557 Mean : 12.06
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.50
## Max. :583.0000 Max. :1700.0000 Max. :5000.00
##
## PROPDMGEXP CROPDMG CROPDMGEXP
## Length:902297 Min. : 0.000 Length:902297
## Class :character 1st Qu.: 0.000 Class :character
## Mode :character Median : 0.000 Mode :character
## Mean : 1.527
## 3rd Qu.: 0.000
## Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
So first let’s reduce this dataset to something manageable. We have no interest in events that caused no fatalities, injuries, property damage, or crop damage, so let’s remove those.
data <- filter(data,FATALITIES > 0 | INJURIES >0 | PROPDMG > 0 | CROPDMG > 0)
dim(data)
## [1] 254633 37
Now we are down to about 250,000 events from roughly 900,000.
According to the documentation for this dataset, the PROPDMG and CROPDMG numbers have to be scaled by the magnitude codes in PROPDMGEXP and CROPDMGEXP.
table(data$PROPDMGEXP)
##
##             -      +      0      2      3      4      5      6      7
##  11585      1      5    210      1      1      4     18      3      3
##      B      h      H      K      m      M
##     40      1      6 231428      7  11320
table(data$CROPDMGEXP)
##
##             ?      0      B      k      K      m      M
## 152664      6     17      7     21  99932      1   1985
We can see that in some cases people didn’t follow the instructions. While we could guess what some of these codes mean, the guesses wouldn’t be reliable. So we treat lowercase k, m and b the same as their uppercase forms, assume that 0 means no multiplier, and simply drop the rest.
data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)
data <- subset(data, PROPDMGEXP %in% c("","0","K","M","B"))
data <- subset(data, CROPDMGEXP %in% c("","0","K","M","B"))
dim(data)
## [1] 254584 37
Now we can move on to replacing those letters with numbers.
data$PROPMAG<-1
data$PROPMAG[data$PROPDMGEXP=="K"] <- 1000
data$PROPMAG[data$PROPDMGEXP=="M"] <- 1000000
data$PROPMAG[data$PROPDMGEXP=="B"] <- 1000000000
data$CROPMAG<-1
data$CROPMAG[data$CROPDMGEXP=="K"] <- 1000
data$CROPMAG[data$CROPDMGEXP=="M"] <- 1000000
data$CROPMAG[data$CROPDMGEXP=="B"] <- 1000000000
Now we can compute the actual damage amounts by multiplying the columns.
data$PROPDMG <- data$PROPDMG * data$PROPMAG
data$CROPDMG <- data$CROPDMG * data$CROPMAG
summary(data$PROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 2.000e+03 1.000e+04 1.678e+06 3.500e+04 1.150e+11
summary(data$CROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 1.928e+05 0.000e+00 5.000e+09
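The letter-to-multiplier step can also be written as a lookup in a named vector. This is a minimal sketch on made-up rows, not the code used for the results above; the names toy, mult and damage are illustrative only. Note that it maps every unrecognised code to 1, so it assumes the unwanted codes have already been dropped as described earlier.

```r
# Hypothetical toy rows; the values are made up for illustration.
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 3),
                  PROPDMGEXP = c("K", "m", "B", ""),
                  stringsAsFactors = FALSE)

# Multipliers keyed by exponent code.
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; blank and "0" have no entry and come back as NA.
m <- unname(mult[toupper(toy$PROPDMGEXP)])
m[is.na(m)] <- 1   # no multiplier

toy$damage <- toy$PROPDMG * m
toy
```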
We also have too many columns, so let’s keep just the ones we want to work with.
data <- select(data,EVTYPE,FATALITIES,INJURIES,PROPDMG,CROPDMG)
dim(data)
## [1] 254584 5
Now let’s have a look at the EVTYPE field.
unique(data$EVTYPE)
## [1] "TORNADO" "TSTM WIND"
## [3] "HAIL" "ICE STORM/FLASH FLOOD"
## [5] "WINTER STORM" "HURRICANE OPAL/HIGH WINDS"
## [7] "THUNDERSTORM WINDS" "HURRICANE ERIN"
## [9] "HURRICANE OPAL" "HEAVY RAIN"
## [11] "LIGHTNING" "THUNDERSTORM WIND"
## [13] "DENSE FOG" "RIP CURRENT"
## [15] "THUNDERSTORM WINS" "FLASH FLOODING"
## [17] "FLASH FLOOD" "TORNADO F0"
## [19] "THUNDERSTORM WINDS LIGHTNING" "THUNDERSTORM WINDS/HAIL"
## [21] "HEAT" "HIGH WINDS"
## [23] "WIND" "HEAVY RAINS"
## [25] "LIGHTNING AND HEAVY RAIN" "THUNDERSTORM WINDS HAIL"
## [27] "COLD" "HEAVY RAIN/LIGHTNING"
## [29] "FLASH FLOODING/THUNDERSTORM WI" "FLOODING"
## [31] "WATERSPOUT" "EXTREME COLD"
## [33] "LIGHTNING/HEAVY RAIN" "HIGH WIND"
## [35] "FREEZE" "RIVER FLOOD"
## [37] "HIGH WINDS HEAVY RAINS" "AVALANCHE"
## [39] "MARINE MISHAP" "HIGH TIDES"
## [41] "HIGH WIND/SEAS" "HIGH WINDS/HEAVY RAIN"
## [43] "HIGH SEAS" "COASTAL FLOOD"
## [45] "SEVERE TURBULENCE" "RECORD RAINFALL"
## [47] "HEAVY SNOW" "HEAVY SNOW/WIND"
## [49] "DUST STORM" "FLOOD"
## [51] "APACHE COUNTY" "SLEET"
## [53] "DUST DEVIL" "ICE STORM"
## [55] "EXCESSIVE HEAT" "THUNDERSTORM WINDS/FUNNEL CLOU"
## [57] "GUSTY WINDS" "HEAVY SURF COASTAL FLOODING"
## [59] "HIGH SURF" "WILD FIRES"
## [61] "HIGH" "WINTER STORM HIGH WINDS"
## [63] "WINTER STORMS" "MUDSLIDES"
## [65] "RAINSTORM" "SEVERE THUNDERSTORM"
## [67] "SEVERE THUNDERSTORMS" "SEVERE THUNDERSTORM WINDS"
## [69] "THUNDERSTORMS WINDS" "FLOOD/FLASH FLOOD"
## [71] "FLOOD/RAIN/WINDS" "THUNDERSTORMS"
## [73] "WINDS" "FUNNEL CLOUD"
## [75] "HIGH WIND DAMAGE" "STRONG WIND"
## [77] "HEAVY SNOWPACK" "FLASH FLOOD/"
## [79] "HEAVY SURF" "DRY MIRCOBURST WINDS"
## [81] "DRY MICROBURST" "URBAN FLOOD"
## [83] "THUNDERSTORM WINDSS" "MICROBURST WINDS"
## [85] "HEAT WAVE" "UNSEASONABLY WARM"
## [87] "COASTAL FLOODING" "STRONG WINDS"
## [89] "BLIZZARD" "WATERSPOUT/TORNADO"
## [91] "WATERSPOUT TORNADO" "STORM SURGE"
## [93] "URBAN/SMALL STREAM FLOOD" "WATERSPOUT-"
## [95] "TORNADOES, TSTM WIND, HAIL" "TROPICAL STORM ALBERTO"
## [97] "TROPICAL STORM" "TROPICAL STORM GORDON"
## [99] "TROPICAL STORM JERRY" "LIGHTNING THUNDERSTORM WINDS"
## [101] "URBAN FLOODING" "MINOR FLOODING"
## [103] "WATERSPOUT-TORNADO" "LIGHTNING INJURY"
## [105] "LIGHTNING AND THUNDERSTORM WIN" "FLASH FLOODS"
## [107] "THUNDERSTORM WINDS53" "WILDFIRE"
## [109] "DAMAGING FREEZE" "THUNDERSTORM WINDS 13"
## [111] "HURRICANE" "SNOW"
## [113] "LIGNTNING" "FROST"
## [115] "FREEZING RAIN/SNOW" "HIGH WINDS/"
## [117] "THUNDERSNOW" "FLOODS"
## [119] "COOL AND WET" "HEAVY RAIN/SNOW"
## [121] "GLAZE ICE" "MUD SLIDE"
## [123] "HIGH WINDS" "RURAL FLOOD"
## [125] "MUD SLIDES" "EXTREME HEAT"
## [127] "DROUGHT" "COLD AND WET CONDITIONS"
## [129] "EXCESSIVE WETNESS" "SLEET/ICE STORM"
## [131] "GUSTNADO" "FREEZING RAIN"
## [133] "SNOW AND HEAVY SNOW" "GROUND BLIZZARD"
## [135] "EXTREME WIND CHILL" "MAJOR FLOOD"
## [137] "SNOW/HEAVY SNOW" "FREEZING RAIN/SLEET"
## [139] "ICE JAM FLOODING" "COLD AIR TORNADO"
## [141] "WIND DAMAGE" "FOG"
## [143] "TSTM WIND 55" "SMALL STREAM FLOOD"
## [145] "THUNDERTORM WINDS" "HAIL/WINDS"
## [147] "SNOW AND ICE" "WIND STORM"
## [149] "GRASS FIRES" "LAKE FLOOD"
## [151] "HAIL/WIND" "WIND/HAIL"
## [153] "ICE" "SNOW AND ICE STORM"
## [155] "THUNDERSTORM WINDS" "WINTER WEATHER"
## [157] "DROUGHT/EXCESSIVE HEAT" "THUNDERSTORMS WIND"
## [159] "TUNDERSTORM WIND" "URBAN AND SMALL STREAM FLOODIN"
## [161] "THUNDERSTORM WIND/LIGHTNING" "HEAVY RAIN/SEVERE WEATHER"
## [163] "THUNDERSTORM" "WATERSPOUT/ TORNADO"
## [165] "LIGHTNING." "HURRICANE-GENERATED SWELLS"
## [167] "RIVER AND STREAM FLOOD" "HIGH WINDS/COASTAL FLOOD"
## [169] "RAIN" "RIVER FLOODING"
## [171] "ICE FLOES" "LIGHTNING FIRE"
## [173] "HEAVY LAKE SNOW" "RECORD COLD"
## [175] "HEAVY SNOW/FREEZING RAIN" "COLD WAVE"
## [177] "DUST DEVIL WATERSPOUT" "TORNADO F3"
## [179] "TORNDAO" "FLOOD/RIVER FLOOD"
## [181] "MUD SLIDES URBAN FLOODING" "TORNADO F1"
## [183] "GLAZE/ICE STORM" "GLAZE"
## [185] "HEAVY SNOW/WINTER STORM" "MICROBURST"
## [187] "AVALANCE" "BLIZZARD/WINTER STORM"
## [189] "DUST STORM/HIGH WINDS" "ICE JAM"
## [191] "FOREST FIRES" "FROST\\FREEZE"
## [193] "THUNDERSTORM WINDS." "HVY RAIN"
## [195] "HAIL 150" "HAIL 075"
## [197] "HAIL 100" "THUNDERSTORM WIND G55"
## [199] "HAIL 125" "THUNDERSTORM WIND G60"
## [201] "THUNDERSTORM WINDS G60" "HARD FREEZE"
## [203] "HAIL 200" "HEAVY SNOW AND HIGH WINDS"
## [205] "HEAVY SNOW/HIGH WINDS & FLOOD" "HEAVY RAIN AND FLOOD"
## [207] "RIP CURRENTS/HEAVY SURF" "URBAN AND SMALL"
## [209] "WILDFIRES" "FOG AND COLD TEMPERATURES"
## [211] "SNOW/COLD" "FLASH FLOOD FROM ICE JAMS"
## [213] "TSTM WIND G58" "MUDSLIDE"
## [215] "HEAVY SNOW SQUALLS" "SNOW SQUALL"
## [217] "SNOW/ICE STORM" "HEAVY SNOW/SQUALLS"
## [219] "HEAVY SNOW-SQUALLS" "ICY ROADS"
## [221] "HEAVY MIX" "SNOW FREEZING RAIN"
## [223] "SNOW/SLEET" "SNOW/FREEZING RAIN"
## [225] "SNOW SQUALLS" "SNOW/SLEET/FREEZING RAIN"
## [227] "RECORD SNOW" "HAIL 0.75"
## [229] "RECORD HEAT" "THUNDERSTORM WIND 65MPH"
## [231] "THUNDERSTORM WIND/ TREES" "THUNDERSTORM WIND/AWNING"
## [233] "THUNDERSTORM WIND 98 MPH" "THUNDERSTORM WIND TREES"
## [235] "TORNADO F2" "RIP CURRENTS"
## [237] "HURRICANE EMILY" "COASTAL SURGE"
## [239] "HURRICANE GORDON" "HURRICANE FELIX"
## [241] "THUNDERSTORM WIND 60 MPH" "THUNDERSTORM WINDS 63 MPH"
## [243] "THUNDERSTORM WIND/ TREE" "THUNDERSTORM DAMAGE TO"
## [245] "THUNDERSTORM WIND 65 MPH" "FLASH FLOOD - HEAVY RAIN"
## [247] "THUNDERSTORM WIND." "FLASH FLOOD/ STREET"
## [249] "BLOWING SNOW" "HEAVY SNOW/BLIZZARD"
## [251] "THUNDERSTORM HAIL" "THUNDERSTORM WINDSHAIL"
## [253] "LIGHTNING WAUSEON" "THUDERSTORM WINDS"
## [255] "ICE AND SNOW" "STORM FORCE WINDS"
## [257] "HEAVY SNOW/ICE" "LIGHTING"
## [259] "HIGH WIND/HEAVY SNOW" "THUNDERSTORM WINDS AND"
## [261] "HEAVY PRECIPITATION" "HIGH WIND/BLIZZARD"
## [263] "TSTM WIND DAMAGE" "FLOOD FLASH"
## [265] "RAIN/WIND" "SNOW/ICE"
## [267] "HAIL 75" "HEAT WAVE DROUGHT"
## [269] "HEAVY SNOW/BLIZZARD/AVALANCHE" "HEAT WAVES"
## [271] "UNSEASONABLY WARM AND DRY" "UNSEASONABLY COLD"
## [273] "RECORD/EXCESSIVE HEAT" "THUNDERSTORM WIND G52"
## [275] "HIGH WAVES" "FLASH FLOOD/FLOOD"
## [277] "FLOOD/FLASH" "LOW TEMPERATURE"
## [279] "HEAVY RAINS/FLOODING" "THUNDERESTORM WINDS"
## [281] "THUNDERSTORM WINDS/FLOODING" "HYPOTHERMIA"
## [283] "THUNDEERSTORM WINDS" "THUNERSTORM WINDS"
## [285] "HIGH WINDS/COLD" "COLD/WINDS"
## [287] "SNOW/ BITTER COLD" "COLD WEATHER"
## [289] "RAPIDLY RISING WATER" "WILD/FOREST FIRE"
## [291] "ICE/STRONG WINDS" "SNOW/HIGH WINDS"
## [293] "HIGH WINDS/SNOW" "SNOWMELT FLOODING"
## [295] "HEAVY SNOW AND STRONG WINDS" "SNOW ACCUMULATION"
## [297] "SNOW/ ICE" "SNOW/BLOWING SNOW"
## [299] "TORNADOES" "THUNDERSTORM WIND/HAIL"
## [301] "FREEZING DRIZZLE" "HAIL 175"
## [303] "FLASH FLOODING/FLOOD" "HAIL 275"
## [305] "HAIL 450" "EXCESSIVE RAINFALL"
## [307] "THUNDERSTORMW" "HAILSTORM"
## [309] "TSTM WINDS" "TSTMW"
## [311] "TSTM WIND 65)" "TROPICAL STORM DEAN"
## [313] "THUNDERSTORM WINDS/ FLOOD" "LANDSLIDE"
## [315] "HIGH WIND AND SEAS" "THUNDERSTORMWINDS"
## [317] "WILD/FOREST FIRES" "HEAVY SEAS"
## [319] "HAIL DAMAGE" "FLOOD & HEAVY RAIN"
## [321] "?" "THUNDERSTROM WIND"
## [323] "FLOOD/FLASHFLOOD" "HIGH WATER"
## [325] "HIGH WIND 48" "LANDSLIDES"
## [327] "URBAN/SMALL STREAM" "BRUSH FIRE"
## [329] "HEAVY SHOWER" "HEAVY SWELLS"
## [331] "URBAN SMALL" "URBAN FLOODS"
## [333] "FLASH FLOOD/LANDSLIDE" "HEAVY RAIN/SMALL STREAM URBAN"
## [335] "FLASH FLOOD LANDSLIDES" "TSTM WIND/HAIL"
## [337] "Other" "Ice jam flood (minor"
## [339] "Tstm Wind" "URBAN/SML STREAM FLD"
## [341] "ROUGH SURF" "Heavy Surf"
## [343] "Dust Devil" "Marine Accident"
## [345] "Freeze" "Strong Wind"
## [347] "COASTAL STORM" "Erosion/Cstl Flood"
## [349] "River Flooding" "Damaging Freeze"
## [351] "Beach Erosion" "High Surf"
## [353] "Heavy Rain/High Surf" "Unseasonable Cold"
## [355] "Early Frost" "Wintry Mix"
## [357] "Extreme Cold" "Coastal Flooding"
## [359] "Torrential Rainfall" "Landslump"
## [361] "Hurricane Edouard" "Coastal Storm"
## [363] "TIDAL FLOODING" "Tidal Flooding"
## [365] "Strong Winds" "EXTREME WINDCHILL"
## [367] "Glaze" "Extended Cold"
## [369] "Whirlwind" "Heavy snow shower"
## [371] "Light snow" "Light Snow"
## [373] "MIXED PRECIP" "Freezing Spray"
## [375] "DOWNBURST" "Mudslides"
## [377] "Microburst" "Mudslide"
## [379] "Cold" "Coastal Flood"
## [381] "Snow Squalls" "Wind Damage"
## [383] "Light Snowfall" "Freezing Drizzle"
## [385] "Gusty wind/rain" "GUSTY WIND/HVY RAIN"
## [387] "Wind" "Cold Temperature"
## [389] "Heat Wave" "Snow"
## [391] "COLD AND SNOW" "RAIN/SNOW"
## [393] "TSTM WIND (G45)" "Gusty Winds"
## [395] "GUSTY WIND" "TSTM WIND 40"
## [397] "TSTM WIND 45" "TSTM WIND (41)"
## [399] "TSTM WIND (G40)" "Frost/Freeze"
## [401] "AGRICULTURAL FREEZE" "OTHER"
## [403] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [405] "Lake Effect Snow" "Freezing Rain"
## [407] "Mixed Precipitation" "BLACK ICE"
## [409] "COASTALSTORM" "LIGHT SNOW"
## [411] "DAM BREAK" "Gusty winds"
## [413] "blowing snow" "GRADIENT WIND"
## [415] "TSTM WIND AND LIGHTNING" "gradient wind"
## [417] "Gradient wind" "Freezing drizzle"
## [419] "WET MICROBURST" "Heavy surf and wind"
## [421] "TYPHOON" "HIGH SWELLS"
## [423] "SMALL HAIL" "UNSEASONAL RAIN"
## [425] "COASTAL FLOODING/EROSION" " TSTM WIND (G45)"
## [427] "TSTM WIND (G45)" "HIGH WIND (G40)"
## [429] "TSTM WIND (G35)" "COASTAL EROSION"
## [431] "SEICHE" "COASTAL FLOODING/EROSION"
## [433] "HYPERTHERMIA/EXPOSURE" "WINTRY MIX"
## [435] "ROCK SLIDE" "GUSTY WIND/HAIL"
## [437] " TSTM WIND" "LANDSPOUT"
## [439] "EXCESSIVE SNOW" "LAKE EFFECT SNOW"
## [441] "FLOOD/FLASH/FLOOD" "MIXED PRECIPITATION"
## [443] "WIND AND WAVE" "LIGHT FREEZING RAIN"
## [445] "ICE ROADS" "ROUGH SEAS"
## [447] "TSTM WIND G45" "NON-SEVERE WIND DAMAGE"
## [449] "WARM WEATHER" "THUNDERSTORM WIND (G40)"
## [451] " FLASH FLOOD" "LATE SEASON SNOW"
## [453] "WINTER WEATHER MIX" "ROGUE WAVE"
## [455] "FALLING SNOW/ICE" "NON-TSTM WIND"
## [457] "NON TSTM WIND" "BLOWING DUST"
## [459] "VOLCANIC ASH" " HIGH SURF ADVISORY"
## [461] "HAZARDOUS SURF" "WHIRLWIND"
## [463] "ICE ON ROAD" "DROWNING"
## [465] "EXTREME COLD/WIND CHILL" "MARINE TSTM WIND"
## [467] "HURRICANE/TYPHOON" "WINTER WEATHER/MIX"
## [469] "FROST/FREEZE" "ASTRONOMICAL HIGH TIDE"
## [471] "HEAVY SURF/HIGH SURF" "TROPICAL DEPRESSION"
## [473] "LAKE-EFFECT SNOW" "MARINE HIGH WIND"
## [475] "TSUNAMI" "STORM SURGE/TIDE"
## [477] "COLD/WIND CHILL" "LAKESHORE FLOOD"
## [479] "MARINE THUNDERSTORM WIND" "MARINE STRONG WIND"
## [481] "ASTRONOMICAL LOW TIDE" "DENSE SMOKE"
## [483] "MARINE HAIL" "FREEZING FOG"
Wow, this field is not very tidy. We are only really concerned with major events, so let’s try to identify which types of events we even want to worry about.
temp <- group_by(data,EVTYPE)
temp <- summarise(temp,FATALITIES=sum(FATALITIES))
temp <- arrange(temp,desc(FATALITIES))
head(temp,10)
## Source: local data frame [10 x 2]
##
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 246
## 10 AVALANCHE 224
temp <- group_by(data,EVTYPE)
temp <- summarise(temp,INJURIES=sum(INJURIES))
temp <- arrange(temp,desc(INJURIES))
head(temp,10)
## Source: local data frame [10 x 2]
##
## EVTYPE INJURIES
## 1 TORNADO 91345
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1359
temp <- group_by(data,EVTYPE)
temp <- summarise(temp,PROPDMG=sum(PROPDMG))
temp <- arrange(temp,desc(PROPDMG))
head(temp,10)
## Source: local data frame [10 x 2]
##
## EVTYPE PROPDMG
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56937160617
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16140811979
## 6 HAIL 15732267013
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046260
temp <- group_by(data,EVTYPE)
temp <- summarise(temp,CROPDMG=sum(CROPDMG))
temp <- arrange(temp,desc(CROPDMG))
head(temp,10)
## Source: local data frame [10 x 2]
##
## EVTYPE CROPDMG
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3000954473
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1420887100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
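The four summaries above repeat the same group, sum and sort pattern. One way to factor it out is a small helper. The function top_events below is a hypothetical name, written in base R as a sketch and shown on made-up rows. It is not the code that produced the tables above.

```r
# Sum one numeric column per event type and return the n largest totals.
top_events <- function(df, column, n = 10) {
  totals <- tapply(df[[column]], df$EVTYPE, sum)
  head(sort(totals, decreasing = TRUE), n)
}

# Hypothetical toy rows for illustration.
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
                  FATALITIES = c(3, 1, 2, 0),
                  INJURIES   = c(10, 4, 0, 7),
                  stringsAsFactors = FALSE)

res <- top_events(toy, "FATALITIES", n = 2)
res
```

With the real data, the same call could be repeated for "INJURIES", "PROPDMG" and "CROPDMG".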
OK, so now we have a list of categories we should target. I’m not going to do anything fancy here, just group up most of the data. Floods are a bit problematic because of the distinction between flash floods and regular floods, so I am going to group them into one category of flood. We can always come back and break them out later if we wish. I will group all of the winter events into one category as well. This should be good enough for our purposes.
data$EVTYPE[grep("tornado",data$EVTYPE,ignore.case = TRUE)]<-"Tornado"
data$EVTYPE[grep("heat",data$EVTYPE,ignore.case = TRUE)]<-"Excessive Heat"
data$EVTYPE[grep("flood",data$EVTYPE,ignore.case = TRUE)]<-"Flood"
data$EVTYPE[grep("lightning",data$EVTYPE,ignore.case = TRUE)]<-"Lightning"
data$EVTYPE[grep("thunderstorm",data$EVTYPE,ignore.case = TRUE)]<-"Thunderstorm"
data$EVTYPE[grep("tstm",data$EVTYPE,ignore.case = TRUE)]<-"Thunderstorm"
data$EVTYPE[grep("hail",data$EVTYPE,ignore.case = TRUE)]<-"Hail"
data$EVTYPE[grep("hurricane",data$EVTYPE,ignore.case = TRUE)]<-"Hurricane (Typhoon)"
data$EVTYPE[grep("typhoon",data$EVTYPE,ignore.case = TRUE)]<-"Hurricane (Typhoon)"
data$EVTYPE[grep("winter",data$EVTYPE,ignore.case = TRUE)]<-"Winter Storm"
data$EVTYPE[grep("snow",data$EVTYPE,ignore.case = TRUE)]<-"Winter Storm"
data$EVTYPE[grep("sleet",data$EVTYPE,ignore.case = TRUE)]<-"Winter Storm"
data$EVTYPE[grep("ice",data$EVTYPE,ignore.case = TRUE)]<-"Winter Storm"
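Because each replacement overwrites the label, the order of the patterns decides where a mixed label ends up. The sketch below illustrates this with a shortened rule list and a hypothetical helper named regroup; it is an illustration of the approach, not the exact rules used above.

```r
# Ordered rules: pattern (name) -> category (value). Earlier rules win,
# because a relabelled entry no longer contains the later patterns.
rules <- c(tornado      = "Tornado",
           flood        = "Flood",
           thunderstorm = "Thunderstorm",
           tstm         = "Thunderstorm",
           hail         = "Hail",
           ice          = "Winter Storm")

# Apply each rule in turn, case-insensitively.
regroup <- function(evtype, rules) {
  for (pattern in names(rules)) {
    hits <- grepl(pattern, evtype, ignore.case = TRUE)
    evtype[hits] <- rules[[pattern]]
  }
  evtype
}

labels <- c("THUNDERSTORM WINDS/HAIL", "ICE STORM/FLASH FLOOD",
            "TSTM WIND", "DROUGHT")
grouped <- regroup(labels, rules)
grouped
```

Here "THUNDERSTORM WINDS/HAIL" is counted as a thunderstorm rather than hail, and "ICE STORM/FLASH FLOOD" as a flood rather than a winter storm, purely because of rule order. Labels that match no rule, such as "DROUGHT", are left unchanged.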
First let’s have a look at deaths and injuries.
DandI <- group_by(data,EVTYPE)
DandI <- summarise(DandI,deaths=sum(FATALITIES),injuries=sum(INJURIES))
# keep event types whose totals exceed the 5th-largest value of either measure (the two sets may overlap)
tdeaths <- arrange(DandI,desc(deaths))$deaths[5]
tinjuries <- arrange(DandI,desc(injuries))$injuries[5]
DandI <- filter(DandI,deaths>tdeaths | injuries>tinjuries)
DandI$both <- DandI$deaths+DandI$injuries
DandI <- gather(DandI, type,total,deaths:both)
#DandI$EVTYPE <- as.factor(DandI$EVTYPE)
#DandI$EVTYPE <- factor(DandI$EVTYPE,levels(DandI$EVTYPE)[c(5,1,4,2,3)])
ggplot(DandI,aes(x=reorder(EVTYPE,desc(total)),y=total,fill=type)) +
geom_bar(position="dodge",stat="identity") +
ggtitle("Number of Deaths and Injuries for the Most Harmful Types of Events") +
xlab("Event Type") +
labs(fill="")
It is pretty clear that Tornadoes are the most dangerous type of event.
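One detail of the threshold filter used for this chart is worth spelling out: comparing against the 5th-largest value with a strict > keeps only the values above it. The sketch below uses made-up totals and hypothetical event names to show the difference between > and >=.

```r
# Hypothetical totals for six event types.
totals <- c(Ev1 = 50, Ev2 = 40, Ev3 = 30, Ev4 = 20, Ev5 = 10, Ev6 = 5)

# 5th-largest value, used as the cutoff.
cutoff <- sort(totals, decreasing = TRUE)[[5]]

strict    <- names(totals)[totals >  cutoff]  # values above the cutoff only
inclusive <- names(totals)[totals >= cutoff]  # includes the 5th-ranked value

strict
inclusive
```

If exactly five event types per measure are wanted, the inclusive comparison would be the one to use, assuming there are no ties at the cutoff.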
Moving on to property damage, we will repeat pretty much what we did before, but this time using the total property damage.
dmg <- group_by(data,EVTYPE)
dmg <- summarise(dmg,total=sum(PROPDMG))
dmg <- arrange(dmg,desc(total))
dmg <- head(dmg)
dmg
## Source: local data frame [6 x 2]
##
## EVTYPE total
## 1 Flood 167528840813
## 2 Hurricane (Typhoon) 85356410010
## 3 Tornado 58593097867
## 4 STORM SURGE 43323536000
## 5 Hail 15974564013
## 6 Winter Storm 11765149361
Let’s convert that to billions and put it on a chart.
dmg$total <- dmg$total/1000000000
ggplot(dmg,aes(x=reorder(EVTYPE,desc(total)),y=total)) +
geom_bar(stat="identity",fill="blue") +
ggtitle("Property Damage for Different Types of Event") +
ylab("Property Damage (in billions)") +
xlab("Event Type")
It is pretty clear that Floods cause the most property damage with Hurricanes in 2nd place.
Moving on to crop damage.
dmg <- group_by(data,EVTYPE)
dmg <- summarise(dmg,total=sum(CROPDMG))
dmg <- arrange(dmg,desc(total))
dmg <- head(dmg)
dmg
## Source: local data frame [6 x 2]
##
## EVTYPE total
## 1 DROUGHT 13972566000
## 2 Flood 12379679100
## 3 Hurricane (Typhoon) 5516117800
## 4 Winter Storm 5204221400
## 5 Hail 3021887623
## 6 EXTREME COLD 1292973000
Let’s convert that to billions and put it on a chart.
dmg$total <- dmg$total/1000000000
ggplot(dmg,aes(x=reorder(EVTYPE,desc(total)),y=total)) +
geom_bar(stat="identity",fill="blue") +
ggtitle("Crop Damage for Different Types of Event") +
ylab("Crop Damage (in billions)") +
xlab("Event Type")
We can see here that droughts cause the most crop damage, with floods a close second. I was surprised to see how much damage winter storms cause to crops.