Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011.
The aim of this project is to analyze this database to identify the events with the greatest impact on public health and the economy. In the following sections the data is downloaded and cleaned, and the relevant analysis is carried out. Events with maximum impact on public health are determined by the number of fatalities and injuries, while events with maximum economic impact are determined by crop damage and property damage. It is found that tornadoes cause the most harm to public health and floods cause the most economic damage.
library(dplyr)
library(ggplot2)
The data and its documentation are available at the URLs listed at the end of this report. The following code downloads the data and saves it on the local machine.
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
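# Download to a fixed path and read from that same path; stringsAsFactors=TRUE keeps EVTYPE a factor (the default before R 4.0)
destFile <- "~/DataScientistsToolbox/05ReproducibleResearch/Week4/Assignment_Final/repdata_data_StormData.csv.bz2"
download.file(url=fileUrl,destfile=destFile)
stormdata <- read.csv(destFile,stringsAsFactors=TRUE)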
dim(stormdata)
## [1] 902297 37
length(levels(stormdata$EVTYPE))
## [1] 985
The output above shows 902,297 observations of 37 variables. The variables of interest are EVTYPE, the event type (storm, rain, hail, etc.); CROPDMG and CROPDMGEXP, which describe damage to crops; PROPDMG and PROPDMGEXP, which describe damage to property; and FATALITIES and INJURIES. The EVTYPE variable has 985 unique entries. This number is high because a single event type has been entered in many different ways. For example, “Wnterstorm”, “Winter Storm”, “winterstorm”, and “winterstorms” all refer to the single event type “winterstorm”.
This extreme variability is the main challenge in processing the data. To complicate matters, some of the entries in the EVTYPE variable cannot be unequivocally assigned to a single weather event. For example, it is not clear if “heavysnowhighwinds&flood” specifically refers to heavy snow, high winds, or floods.
A few things can be done right away to reduce this variability. The first is to replace all upper-case letters with lower-case letters. The next is to remove all blank spaces, characters such as “(”, “)”, “-” and “/”, and all digits.
stormdata$EVTYPE <- as.factor(tolower(stormdata$EVTYPE))
stormdata$EVTYPE <- as.factor(gsub(" ","",stormdata$EVTYPE))
stormdata$EVTYPE <- as.factor(gsub("/","",stormdata$EVTYPE))
stormdata$EVTYPE <- as.factor(gsub("-","",stormdata$EVTYPE))
stormdata$EVTYPE <- as.factor(gsub("[0-9]","",stormdata$EVTYPE))
stormdata$EVTYPE <- as.factor(gsub("\\(|\\)","",stormdata$EVTYPE))
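The same clean-up can also be written as a single pass. The following is only a minimal sketch and is not the code that produced the results in this report; the helper name normalize_evtype is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): lower-case an
# event label, then strip spaces, "/", "-", "(", ")" and digits using one
# regular expression.
normalize_evtype <- function(x) {
  gsub("[ /()0-9-]", "", tolower(as.character(x)))
}

# A few illustrative labels
normalize_evtype(c("Winter Storm", "TSTM WIND (G45)",
                   "HEAVY RAIN/WIND", "Lake-Effect Snow"))
```

Placing the hyphen last inside the character class keeps it a literal character rather than a range operator.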
Page 6 of the documentation lists 50 main categories of event type.
The general strategy is as follows. As far as possible, each entry in the EVTYPE variable is assigned to one of the categories in that list. For example, an entry reading “wintryweather” or “winterweather” is assigned to the “winterweather” category (the 50th element of the list). An entry that belongs to multiple categories is placed in a combined category that names all of them. For example, “Heavyrain/winterstorm” and “Winterstorm Severerain” would be grouped into the category “heavyrainwinterstorm”. How the consequences are extracted on the basis of this strategy is described later.
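The assignment of spelling variants to a category can be pictured as a lookup table. The sketch below is illustrative only: the names evtype_map and recode_evtype are hypothetical, and the table holds just a few example entries rather than the full set of categories.

```r
# Hypothetical lookup table (illustration only): canonical category on the
# left, the spellings that should be folded into it on the right.
evtype_map <- list(
  winterweather = c("wintryweather", "winterweather", "wintermix"),
  avalanche     = c("avalance", "avalanche"),
  ripcurrent    = c("ripcurrent", "ripcurrents")
)

# Replace every listed spelling by its canonical name; labels that appear
# in no entry are left unchanged.
recode_evtype <- function(x, map) {
  x <- as.character(x)
  for (canonical in names(map)) {
    x[x %in% map[[canonical]]] <- canonical
  }
  x
}

recode_evtype(c("avalance", "ripcurrents", "wintryweather", "seiche"),
              evtype_map)
```

The loop applies the entries in list order, so a canonical name that also appears among the spellings of a later entry is recoded again, just as with a sequence of separate assignments.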
This event has been properly entered.
stormdata$EVTYPE <- as.character(stormdata$EVTYPE)
avalanche_cases <- c("avalance","avalanche")
stormdata$EVTYPE[stormdata$EVTYPE %in% avalanche_cases] <- "avalanche"
bliz_cases <- c("blizzard","blizzardfreezingrain","blizzardsummary","blizzardweather","groundblizzard")
stormdata$EVTYPE[stormdata$EVTYPE %in% bliz_cases] <- "blizzard"
coastalflood_cases <-
c("beacherosioncoastalflood","coastalflood","coastalflooding","coastalfloodingerosion","coastaltidalflood","cstlfloodingerosion","beachflood")
stormdata$EVTYPE[stormdata$EVTYPE %in% coastalflood_cases] <- "coastalflood"
coldwindchill_cases <-
c("bitterwindchill","bitterwindchilltemperatures","cold","coldtemperature",
"coldtemperatures","coldwave","coldweather","coldwindchill","coldwindchilltemperatures",
"coldwinds","excessivecold","extendedcold","extremecold","extremecoldwindchill",
"extremerecordcold","extremewindchill","extremewindchillblowingsno","extremewindchills",
"extremewindchilltemperatures","prolongcold","recordcold","severecold",
"unseasonablecold","unseasonablycold","unusuallycold","windchill","unseasonablycool","blowingsnow&extremewindch","blowingsnowextremewindchil")
stormdata$EVTYPE[stormdata$EVTYPE %in% coldwindchill_cases] <- "coldwindchill"
This event has been properly entered.
densefog_cases <- c("densefog","fog","patchydensefog","vog")
stormdata$EVTYPE[stormdata$EVTYPE %in% densefog_cases] <- "densefog"
drought_cases <- c("drought","snowdrought")
stormdata$EVTYPE[stormdata$EVTYPE %in% drought_cases] <- "drought"
heatdrought_cases <- c("droughtexcessiveheat","excessiveheatdrought","heatdrought","heatwavedrought")
stormdata$EVTYPE[stormdata$EVTYPE %in% heatdrought_cases] <- "heatdrought"
dust_cases <- c("blowingdust","dustdevel","dustdevil")
stormdata$EVTYPE[stormdata$EVTYPE %in% dust_cases] <- "dustdevil"
This event has been properly entered.
heat_cases <-
c("excessiveheat","extremeheat","heat","heatburst","heatwave","heatwaves",
"recordexcessiveheat","recordheat","recordheatwave","recordwarmtemps.","recordwarm","unusuallywarm","unseasonablywarmyear","unseasonablyhot","abnormalwarmth","hotweather","hightemperaturerecord","recordhightemperatures")
stormdata$EVTYPE[stormdata$EVTYPE %in% heat_cases] <- "excessiveheat"
coldwindchill_cases <-
c("bitterwindchill","bitterwindchilltemperatures","cold","coldandsnow","coldandwetconditions","coldtemperature","coldtemperatures","coldwave","coldweather","coldwindchill","coldwindchilltemperatures","coldwinds","excessivecold","extendedcold","extremecold","extremecoldwindchill","extremerecordcold","extremewindchill","extremewindchillblowingsno","extremewindchills","extremewindchilltemperatures","fogandcoldtemperatures","highwindlowwindchill","lowwindchill","prolongcold","prolongcoldsnow","recordcold","recordsnowcold","severecold","snowandcold","snowbittercold","snowcold","snow\\cold","unseasonablecold","unseasonablycold","unusuallycold","windchill")
stormdata$EVTYPE[stormdata$EVTYPE %in% coldwindchill_cases] <- "extremecoldwindchill"
#
flash_cases <-
c("flashflood","flashfloodflood", "flashfloodfromicejams","flashfloodheavyrain",
"flashflooding","flashfloodingflood","flashfloodingthunderstormwi",
"flashfloodlandslide","flashfloodlandslides","flashfloods","flashfloodstreet",
"flashfloodwinds","flashfloooding","floodflash","floodflashflood","floodflashflooding",
"floodfloodflash","icestormflashflood","localflashflood")
stormdata$EVTYPE[stormdata$EVTYPE %in% flash_cases] <- "flashflood"
#
#
flood_cases <-
c("flood","flooding","floodrainwind","floodrainwinds","floodriverflood","floods","floodwatch","localflood","majorflood","minorflood","minorflooding","riverandstreamflood","riverflood","riverflooding","ruralflood","smallstreamflood","smallstreamflooding","snowmeltflooding","streamflooding","streetflood","streetflooding","smallstreamandurbanflood","smallstreamandurbanfloodin","smallstreamurbanflood","urbanandsmall","urbanandsmallstream","urbanandsmallstreamflood","urbanandsmallstreamfloodin","urbanflood","urbanflooding","urbanfloods","urbansmall","urbansmallflooding","urbansmallstream","urbansmallstreamflood","urbansmallstreamflooding","urbansmallstrmfldg","urbansmlstreamfld","urbansmlstreamfldg","urbanstreetflooding","icejamflooding")
stormdata$EVTYPE[stormdata$EVTYPE %in% flood_cases] <- "flood"
#
frostfreeze_cases <-
c("coldandfrost","earlyfrost","firstfrost","frost","frostfreeze",
"frost\\freeze","recordcoldfrost","agriculturalfreeze","damagingfreeze",
"earlyfreeze","freeze","hardfreeze","latefreeze")
stormdata$EVTYPE[stormdata$EVTYPE %in% frostfreeze_cases] <- "frostfreeze"
funnel_cases <- c("coldairfunnel","coldairfunnels","funnel","funnelcloud","funnelcloud.","funnelclouds","funnels","wallcloudfunnelcloud","funnelcloudhail")
stormdata$EVTYPE[stormdata$EVTYPE %in% funnel_cases] <- "funnelcloud"
This event has been properly entered.
hail_cases <-
c("deephail","hail","hail.","hailaloft","haildamage","hailicyroads",
"hailstorm","hailstorms","hailwind","hailwinds","lateseasonhail",
"nonseverehail","smallhail","thunderstormhail","windhail")
stormdata$EVTYPE[stormdata$EVTYPE %in% hail_cases] <- "hail"
The “heat” event type has been included in “excessiveheat”.
heavyrain_cases <-
c("heavyprecipatation","heavyprecipitation","heavyshower","heavyshowers","excessiverain","excessiverainfall","heavyrain","heavyrainandwind","heavyraineffects","heavyrainfall","heavyrains","heavyrainsevereweather","heavyrainsmallstreamurban","heavyrainwind","hvyrain","locallyheavyrain","prolongedrain","rain","raindamage","rainheavy","rainstorm","rainwind","recordexcessiverainfall","recordrainfall","torrentialrain","torrentialrainfall",
"unseasonalrain","excessiveprecipitation")
stormdata$EVTYPE[stormdata$EVTYPE %in% heavyrain_cases] <- "heavyrain"
heavysnowice_cases <-
c("heavysnow","heavysnowand","heavysnowandblowingsnow","heavysnowandice",
"heavysnowblowingsnow","heavysnowhigh","heavysnowice","heavysnow&ice",
"heavysnowpack","heavysnowshower","snowandheavysnow",
"snowadvisory","snowheavysnow","snowice","iceandsnow","icesnow","snowsquall","snowsqualls","snowstorm","snowandice")
stormdata$EVTYPE[stormdata$EVTYPE %in% heavysnowice_cases] <- "heavysnowandice"
surf_cases <-
c("hazardoussurf","heavysurf","heavysurfhighsurf","highsurf","highsurfadvisories","highsurfadvisory","roughsurf")
stormdata$EVTYPE[stormdata$EVTYPE %in% surf_cases] <- "highsurf"
highwind_cases <-c("highwind","highwindandseas","highwinddamage","highwindg","highwinds","highwindscold","highwindseas","highwindssnow","snowhighwinds")
stormdata$EVTYPE[stormdata$EVTYPE %in% highwind_cases] <- "highwind"
hurtyp_cases <-
c("hurricane","hurricaneedouard","hurricaneemily","hurricaneerin","hurricanefelix","hurricanegeneratedswells","hurricanegordon","hurricaneopal","hurricaneopalhighwinds","hurricanetyphoon","typhoon")
stormdata$EVTYPE[stormdata$EVTYPE %in% hurtyp_cases] <- "hurricanetyphoon"
icestorm_cases <- c("glazeicestorm","icestorm")
stormdata$EVTYPE[stormdata$EVTYPE %in% icestorm_cases] <- "icestorm"
lakeeffectsnow_cases <- c("heavylakesnow","lakeeffectsnow")
stormdata$EVTYPE[stormdata$EVTYPE %in% lakeeffectsnow_cases] <- "lakeeffectsnow"
lakeshoreflood_cases <- c("lakeflood","lakeshoreflood")
stormdata$EVTYPE[stormdata$EVTYPE %in% lakeshoreflood_cases] <- "lakeshoreflood"
lightning_cases <- c("lightning","lightning.","lightningandwinds","lightningdamage","lightningfire","lightninginjury","lightningwauseon")
stormdata$EVTYPE[stormdata$EVTYPE %in% lightning_cases] <- "lightning"
This event has been properly entered.
This event has been properly entered.
This event has been properly entered.
marinethunderstormwind_cases <- c("marinethunderstormwind","marinetstmwind")
stormdata$EVTYPE[stormdata$EVTYPE %in% marinethunderstormwind_cases] <- "marinethunderstormwind"
rip_cases <- c("ripcurrent","ripcurrents")
stormdata$EVTYPE[stormdata$EVTYPE %in% rip_cases] <- "ripcurrent"
#
This event has been properly entered.
sleet_cases <- c("sleet","sleetstorm","snowsleet","snowandsleet","lightsnowandsleet","sleetsnow","snowsleetrain")
stormdata$EVTYPE[stormdata$EVTYPE %in% sleet_cases] <- "sleet"
tide_cases <- c("astronomicalhightide","blowouttide","blowouttides","hightides","stormsurgetide","coastalsurge","stormsurge","tidalflooding")
stormdata$EVTYPE[stormdata$EVTYPE %in% tide_cases] <- "stormsurgetide"
strongwind_cases <- c("stormforcewinds","strongwind","strongwinds","strongwindgust","windadvisory","gustnado","windgusts","gustywind","wind")
stormdata$EVTYPE[stormdata$EVTYPE %in% strongwind_cases] <- "strongwind"
thunderstormwind_cases <-
c("gustythunderstormwind","gustythunderstormwinds","severethunderstorm","severethunderstorms","severethunderstormwinds","thunderestormwinds","thunderstorm","thunderstormdamage","thunderstormdamageto","thunderstorms","thunderstormswind","thunderstormswinds","thunderstormw","thunderstormwind","thunderstormwind.","thunderstormwindawning","thunderstormwindg","thunderstormwindmph","thunderstormwindmph.","thunderstormwinds","thunderstormwinds.","thunderstormwindsand","thunderstormwindsg","thunderstormwindslecen","thunderstormwindslightning","thunderstormwindsmph","thunderstormwindss","thunderstormwindssmallstrea","thunderstormwindtree","thunderstormwindtrees","thunderstormwins","thunderstormwwinds","thunderstromwind","thunderstromwinds","thundertormwinds","thundertsormwind","tstm","tstmheavyrain","tstmw","tstmwind","tstmwindandlightning","tstmwinddamage","tstmwindg","tstmwindhail","tstmwinds","tstmwnd","thundeerstormwinds","tunderstormwind")
stormdata$EVTYPE[stormdata$EVTYPE %in% thunderstormwind_cases] <- "thunderstormwind"
#tstmwindandlightning,tstmheavyrain,thunderstormwindslightning - leave out?
tornado_cases <-
c("coldairtornado","tornado","tornadodebris","tornadoes","tornadof","tornados","torndao")
stormdata$EVTYPE[stormdata$EVTYPE %in% tornado_cases] <- "tornado"
This event has been properly entered.
tropicalstorm_cases <- c("tropicalstorm","tropicalstormalberto","tropicalstormdean","tropicalstormgordon" ,"tropicalstormjerry")
stormdata$EVTYPE[stormdata$EVTYPE %in% tropicalstorm_cases] <- "tropicalstorm"
This event has been properly entered.
volc_cases <- c("volcanicash","volcanicashfall","volcanicashplume","volcaniceruption")
stormdata$EVTYPE[stormdata$EVTYPE %in% volc_cases] <- "volcanicash"
waterspout_cases <-
c("waterspout","waterspoutfunnelcloud","waterspouts","waterspouttornado","wayterspout",
"tornadowaterspout")
stormdata$EVTYPE[stormdata$EVTYPE %in% waterspout_cases] <- "waterspout"
wildfire_cases <-
c("brushfire","brushfires","forestfires","grassfires","redflagfirewx","wildfire","wildfires","wildforestfire","wildforestfires")
stormdata$EVTYPE[stormdata$EVTYPE %in% wildfire_cases] <- "wildfire"
winterstorm_cases <- c("winterstorm","winterstorms")
stormdata$EVTYPE[stormdata$EVTYPE %in% winterstorm_cases] <- "winterstorm"
winter_cases <- c("wintermix","winterweather","winterweathermix","winterymix")
stormdata$EVTYPE[stormdata$EVTYPE %in% winter_cases] <- "winterweather"
#mudslide and landslide
slide_cases <-
c("landslide","landslides","mudrockslide",
"mudslide","mudslidelandslide","mudslides","rockslide")
stormdata$EVTYPE[stormdata$EVTYPE %in% slide_cases] <- "landslide"
#thunderstormwindlightning
thunderstormwindlightning_cases <- c("thunderstormwindlightning","lightningandthunderstormwin")
stormdata$EVTYPE[stormdata$EVTYPE %in% thunderstormwindlightning_cases] <- "thunderstormwindlightning"
#heavyrainlightning
heavyrainlightning_cases <- c("lightningandheavyrain","heavyrainlightning","lightningheavyrain")
stormdata$EVTYPE[stormdata$EVTYPE %in% heavyrainlightning_cases] <- "heavyrainlightning"
#winterstormhighwind
winterstormhighwind_cases <-
c("winterstormhighwind","winterstormhighwinds")
stormdata$EVTYPE[stormdata$EVTYPE %in% winterstormhighwind_cases] <- "winterstormhighwind"
#highwindwindchill
highwindwindchill_cases <-
c("highwindwindchill","highwindsandwindchill","windchillhighwind","snowhighwindwindchill")
stormdata$EVTYPE[stormdata$EVTYPE %in% highwindwindchill_cases] <- "highwindwindchill"
#highwindblizzard
highwindblizzard_cases <-
c("highwindblizzard","highwindblizzardfreezingra","blizzardhighwind")
stormdata$EVTYPE[stormdata$EVTYPE %in% highwindblizzard_cases] <- "highwindblizzard"
#heavysnowhighwind
heavysnowhighwind_cases <-
c("heavysnowhighwind","heavysnowhighwindsfreezing","heavysnowhighwinds","highwindandheavysnow","highwindheavysnow")
stormdata$EVTYPE[stormdata$EVTYPE %in% heavysnowhighwind_cases] <- "heavysnowhighwind"
#highwindheavyrain
highwindheavyrain_cases <-
c("highwindheavyrain","highwindsheavyrains","highwindsheavyrain")
stormdata$EVTYPE[stormdata$EVTYPE %in% highwindheavyrain_cases] <- "highwindheavyrain"
#unseasonablywarmwet
stormdata$EVTYPE[stormdata$EVTYPE %in% c("unseasonablywarmwet","unseasonablywarm&wet")] <- "unseasonablywarmwet"
#heavysnowicestorm
stormdata$EVTYPE[stormdata$EVTYPE %in% c("snowandicestorm","heavysnowandicestorm","heavysnowicestorm","icestormandsnow")] <- "heavysnowicestorm"
#dryweather
dry_cases <- c("excessivelydry","recorddryness","dryconditions","driestmonth","dryness","abnormallydry","verydry","dryspell","mildanddrypattern","drypattern","hotdrypattern","milddrypattern","unseasonablydry")
stormdata$EVTYPE[stormdata$EVTYPE %in% dry_cases] <- "dryweather"
stormdata$EVTYPE <- as.character(stormdata$EVTYPE)
length(unique(stormdata$EVTYPE))
## [1] 329
unique(stormdata$EVTYPE)
## [1] "tornado" "thunderstormwind"
## [3] "hail" "freezingrain"
## [5] "snow" "flashflood"
## [7] "heavysnowandice" "winterstorm"
## [9] "hurricanetyphoon" "extremecoldwindchill"
## [11] "heavyrain" "lightning"
## [13] "densefog" "ripcurrent"
## [15] "highwind" "funnelcloud"
## [17] "thunderstormwindshail" "excessiveheat"
## [19] "strongwind" "lighting"
## [21] "heavyrainlightning" "wallcloud"
## [23] "flood" "waterspout"
## [25] "blizzard" "breakupflooding"
## [27] "highwindblizzard" "frostfreeze"
## [29] "coastalflood" "highwindandhightides"
## [31] "stormsurgetide" "heavysnowhighwind"
## [33] "recordcoldandhighwind" "recordhightemperature"
## [35] "recordhigh" "highwindheavyrain"
## [37] "icestorm" "recordlow"
## [39] "highwindwindchill" "lowtemperaturerecord"
## [41] "avalanche" "marinemishap"
## [43] "highwindwindchillblizzard" "highseas"
## [45] "severeturbulence" "recordsnowfall"
## [47] "recordwarmth" "heavysnowwind"
## [49] "winddamage" "duststorm"
## [51] "apachecounty" "sleet"
## [53] "dustdevil" "thunderstormwindsfunnelclou"
## [55] "winterstormhighwind" "gustywinds"
## [57] "floodingheavyrain" "snowandwind"
## [59] "heavysurfcoastalflooding" "highsurf"
## [61] "wildfire" "high"
## [63] "highwindsduststorm" "landslide"
## [65] "drymicroburst" "winds"
## [67] "microburst" "ice"
## [69] "downburst" "gustnadoand"
## [71] "wetmicroburst" "downburstwinds"
## [73] "drymicroburstwinds" "drymircoburstwinds"
## [75] "microburstwinds" "blizzardheavysnow"
## [77] "blowingsnow" "freezingdrizzle"
## [79] "lightningthunderstormwindss" "heavyrainflooding"
## [81] "glaze" "firstsnow"
## [83] "freezingrainandsleet" "dryweather"
## [85] "unseasonablywet" "wintrymix"
## [87] "winterweather" "ripcurrentsheavysurf"
## [89] "sleetrainsnow" "unseasonablywarm"
## [91] "drought" "normalprecipitation"
## [93] "highwindsflooding" "dry"
## [95] "rainsnow" "snowrainsleet"
## [97] "tornadoes,tstmwind,hail" "tropicalstorm"
## [99] "lightningthunderstormwinds" "thunderstormwindlightning"
## [101] "ligntning" "freezingrainsnow"
## [103] "thundersnow" "coolandwet"
## [105] "heavyrainsnow" "snowsleetfreezingrain"
## [107] "glazeice" "earlysnow"
## [109] "smallstreamand" "excessivewetness"
## [111] "gradientwinds" "sleeticestorm"
## [113] "thunderstormwindsurbanflood" "rotatingwallcloud"
## [115] "largewallcloud" "blowingsnowextremewindchi"
## [117] "freezingrainsleet" "heavysnowblizzard"
## [119] "windstorm" "lakeshoreflood"
## [121] "heavysnowicestorm" "heavysnowsleet"
## [123] "heatdrought" "thundestormwinds"
## [125] "warmdryconditions" "highwindscoastalflood"
## [127] "snowrain" "icefloes"
## [129] "highwaves" "lakeeffectsnow"
## [131] "heavysnowfreezingrain" "heavywetsnow"
## [133] "dustdevilwaterspout" "thunderstormwindsheavyrain"
## [135] "blizzardandheavysnow" "blizzardandextremewindchil"
## [137] "mudslidesurbanflooding" "heavysnowwinterstorm"
## [139] "blizzardwinterstorm" "duststormhighwinds"
## [141] "icejam" "heavysnowandhighwinds"
## [143] "heavysnowhighwinds&flood" "hailflooding"
## [145] "thunderstormwindsflashflood" "wetsnow"
## [147] "heavyrainandflood" "rainandwind"
## [149] "snowicestorm" "belownormalprecipitation"
## [151] "lightsnow" "recordtemperatures"
## [153] "other" "recordsnow"
## [155] "heavysnowsqualls" "icyroads"
## [157] "heavymix" "snowfreezingrain"
## [159] "lackofsnow" "damfailure"
## [161] "thuderstormwinds" "freezingrainandsnow"
## [163] "freezingrainsleetand" "southeast"
## [165] "freezingdrizzleandfreezing" "heavyrain;urbanfloodwinds;"
## [167] "highwater" "snowshowers"
## [169] "heavysnowblizzardavalanche" "wetweather"
## [171] "unseasonablywarmanddry" "freezingrainsleetandlight"
## [173] "tidalflood" "beacherosin"
## [175] "lowtemperature" "sleet&freezingrain"
## [177] "heavyrainsflooding" "thunderstormwindsflooding"
## [179] "highwayflooding" "hypothermia"
## [181] "thunerstormwinds" "heavyrainmudslidesflood"
## [183] "dryhotweather" "rapidlyrisingwater"
## [185] "icestrongwinds" "heavysnowandstrongwinds"
## [187] "snowaccumulation" "snowblowingsnow"
## [189] "thunderstormwindhail" "thunderstormwindsflood"
## [191] "nearrecordsnow" "excessive"
## [193] "heavyseas" "flood&heavyrain"
## [195] "?" "hotpattern"
## [197] "snowfallrecord" "mildpattern"
## [199] "saharandust" "urbanfloodlandslide"
## [201] "heavyswells" "smallstream"
## [203] "heavyrainurbanflood" "landslideurbanflood"
## [205] "recorddrymonth" "temperaturerecord"
## [207] "icejamfloodminor" "marineaccident"
## [209] "coastalstorm" "erosioncstlflood"
## [211] "lightsnowflurries" "wetmonth"
## [213] "wetyear" "beacherosion"
## [215] "hotanddry" "heavyrainhighsurf"
## [217] "icefog" "landslump"
## [219] "lateseasonsnowfall" "freezingfog"
## [221] "driftingsnow" "whirlwind"
## [223] "latesnow" "recordmaysnow"
## [225] "recordwintersnow" "recordtemperature"
## [227] "mixedprecip" "blackice"
## [229] "gradientwind" "freezingspray"
## [231] "summaryjan" "summaryofmarch"
## [233] "summaryofaprilrd" "summaryofapril"
## [235] "summaryaugust" "summaryofmay"
## [237] "summaryofmayam" "summaryofmaypm"
## [239] "metrostorm,may" "summaryofjune"
## [241] "summaryjune" "summaryofjuly"
## [243] "summaryjuly" "summaryofaugust"
## [245] "summaryseptember" "summarysept."
## [247] "summary:oct." "summary:october"
## [249] "summary:nov." "wetmicoburst"
## [251] "nosevereweather" "summary:sept."
## [253] "lightsnowfall" "gustywindrain"
## [255] "gustywindhvyrain" "earlysnowfall"
## [257] "monthlysnowfall" "seasonalsnowfall"
## [259] "monthlyrainfall" "smlstreamfld"
## [261] "volcanicash" "thundersnowshower"
## [263] "none" "dambreak"
## [265] "sleetfreezingrain" "hypothermiaexposure"
## [267] "mixedprecipitation" "icestormblizzard"
## [269] "floodstrongwind" "mountainsnows"
## [271] "heavysurfandwind" "highswells"
## [273] "earlyrain" "hotspell"
## [275] "unusualwarmth" "wakelowwind"
## [277] "moderatesnow" "moderatesnowfall"
## [279] "coastalerosion" "unusualrecordwarmth"
## [281] "seiche" "hyperthermiaexposure"
## [283] "icepellets" "recordcool"
## [285] "tropicaldepression" "coolspell"
## [287] "gustywindhail" "lightsnowfreezingprecip"
## [289] "monthlyprecipitation" "monthlytemperature"
## [291] "remnantsoffloyd" "landspout"
## [293] "excessivesnow" "windandwave"
## [295] "lightfreezingrain" "recordprecipitation"
## [297] "iceroads" "roughseas"
## [299] "unseasonablywarmwet" "unseasonablycool&wet"
## [301] "nonseverewinddamage" "warmweather"
## [303] "unseasonallowtemp" "lateseasonsnow"
## [305] "gustylakewind" "redflagcriteria"
## [307] "wnd" "smoke"
## [309] "extremelywet" "unusuallylatesnow"
## [311] "recordlowrainfall" "roguewave"
## [313] "prolongwarmth" "accumulatedsnowfall"
## [315] "fallingsnowice" "nontstmwind"
## [317] "patchyice" "northernlights"
## [319] "marinethunderstormwind" "verywarm"
## [321] "abnormallywet" "iceonroad"
## [323] "drowning" "marinehail"
## [325] "marinehighwind" "tsunami"
## [327] "densesmoke" "marinestrongwind"
## [329] "astronomicallowtide"
The number of injuries and fatalities will be used to determine events with maximum impact on public health. The following calculates the sum of injuries and fatalities for each event type and plots the top ten events with maximum number of injuries and fatalities.
# Injuries
event_injury <- aggregate(INJURIES~EVTYPE,data=stormdata,sum)
event_injury <- arrange(event_injury,desc(INJURIES))
head(event_injury,10)
## EVTYPE INJURIES
## 1 tornado 91364
## 2 thunderstormwind 9507
## 3 excessiveheat 9209
## 4 flood 6873
## 5 lightning 5231
## 6 icestorm 1990
## 7 flashflood 1802
## 8 wildfire 1608
## 9 highwind 1506
## 10 hail 1371
injury_top10 <- event_injury[1:10,]
# Fatalities
event_fatality <- aggregate(FATALITIES~EVTYPE,data=stormdata,sum)
event_fatality <- arrange(event_fatality,desc(FATALITIES))
head(event_fatality,10)
## EVTYPE FATALITIES
## 1 tornado 5633
## 2 excessiveheat 3132
## 3 flashflood 1035
## 4 lightning 817
## 5 thunderstormwind 711
## 6 ripcurrent 572
## 7 flood 511
## 8 extremecoldwindchill 468
## 9 highwind 293
## 10 avalanche 225
fatality_top10 <- event_fatality[1:10,]
g_inj <- ggplot(injury_top10,aes(x=reorder(EVTYPE,-INJURIES),y=INJURIES,fill=INJURIES))
g_inj + geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90,vjust=0.5,hjust=1)) + xlab("") + ggtitle("Top 10 events with maximum injuries")
g_fat <- ggplot(fatality_top10,aes(x=reorder(EVTYPE,-FATALITIES),y=FATALITIES,fill=FATALITIES))
g_fat + geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90,vjust=0.5,hjust=1)) + xlab("") + ggtitle("Top 10 events with maximum fatalities")
Tornadoes have caused the most harm to public health, with 5,633 fatalities and 91,364 injuries from 1950 to 2011.
The amount of damage to crops and property will be used to determine the events with maximum impact on the economy.
According to page 12 of the documentation, crop damage is given to three significant digits in the CROPDMG column and its exponent is given in the CROPDMGEXP column. For property damage the corresponding columns are PROPDMG and PROPDMGEXP. However, this convention has not been followed consistently in the data, so some care is needed when extracting the figures for economic damage. The following code achieves this objective.
# Damage to crops:
stormdata$CROPDMGEXP_ACTUAL <- 0
for (ii in 0:9) {
stormdata$CROPDMGEXP_ACTUAL[stormdata$CROPDMGEXP == as.character(ii)] <- ii
}
stormdata$CROPDMGEXP_ACTUAL[stormdata$CROPDMGEXP %in% c("","+","-","?")] <- 0
stormdata$CROPDMGEXP_ACTUAL[stormdata$CROPDMGEXP %in% c("h","H")] <- 2
stormdata$CROPDMGEXP_ACTUAL[stormdata$CROPDMGEXP %in% c("k","K")] <- 3
stormdata$CROPDMGEXP_ACTUAL[stormdata$CROPDMGEXP %in% c("m","M")] <- 6
stormdata$CROPDMGEXP_ACTUAL[stormdata$CROPDMGEXP %in% c("b","B")] <- 9
stormdata$CROPDMG_ACTUAL <- stormdata$CROPDMG * 10^stormdata$CROPDMGEXP_ACTUAL
# Damage to property:
stormdata$PROPDMGEXP_ACTUAL <- 0
for (ii in 0:9) {
stormdata$PROPDMGEXP_ACTUAL[stormdata$PROPDMGEXP == as.character(ii)] <- ii
}
stormdata$PROPDMGEXP_ACTUAL[stormdata$PROPDMGEXP %in% c("","+","-","?")] <- 0
stormdata$PROPDMGEXP_ACTUAL[stormdata$PROPDMGEXP %in% c("h","H")] <- 2
stormdata$PROPDMGEXP_ACTUAL[stormdata$PROPDMGEXP %in% c("k","K")] <- 3
stormdata$PROPDMGEXP_ACTUAL[stormdata$PROPDMGEXP %in% c("m","M")] <- 6
stormdata$PROPDMGEXP_ACTUAL[stormdata$PROPDMGEXP %in% c("b","B")] <- 9
stormdata$PROPDMG_ACTUAL <- stormdata$PROPDMG * 10^stormdata$PROPDMGEXP_ACTUAL
# Total damage:
stormdata$TOTAL_DAMAGE <- stormdata$CROPDMG_ACTUAL + stormdata$PROPDMG_ACTUAL
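The exponent handling can also be expressed as one reusable function. This is a minimal sketch under the same assumptions as the code above (digits are taken as powers of ten, h/k/m/b as 2/3/6/9, and any other code as 0); the helper name decode_exp and the sample values are hypothetical.

```r
# Hypothetical helper (illustration only): translate an exponent code into
# a power of ten. Digits map to themselves, h/k/m/b (either case) map to
# 2/3/6/9, and every other code ("", "+", "-", "?") maps to 0.
decode_exp <- function(code) {
  code <- tolower(as.character(code))
  lookup <- c(h = 2, k = 3, m = 6, b = 9)
  out <- rep(0, length(code))
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- as.numeric(code[is_digit])
  is_letter <- code %in% names(lookup)
  out[is_letter] <- unname(lookup[code[is_letter]])
  out
}

# Made-up sample values, not taken from the storm database
dmg      <- c(2.5, 10, 1.2, 7)
exp_code <- c("K", "m", "B", "?")
dmg * 10^decode_exp(exp_code)
```

A single function of this kind could be applied to both CROPDMGEXP and PROPDMGEXP, which avoids repeating the same assignments for each column.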
In the stormdata columns computed above, CROPDMG_ACTUAL and PROPDMG_ACTUAL give the damage in US dollars to crops and property, respectively, and TOTAL_DAMAGE is the sum of these two quantities. The following code plots the ten events that caused the most damage.
event_damage <- arrange(aggregate(TOTAL_DAMAGE~EVTYPE,data=stormdata,sum),desc(TOTAL_DAMAGE))
head(event_damage,10)
## EVTYPE TOTAL_DAMAGE
## 1 flood 161063274407
## 2 hurricanetyphoon 90872527810
## 3 tornado 57367113946
## 4 stormsurgetide 47975517150
## 5 flashflood 19122009246
## 6 hail 19024483136
## 7 drought 15018672000
## 8 thunderstormwind 12449411764
## 9 icestorm 8967041360
## 10 wildfire 8899910130
damage_top10 <- event_damage[1:10,]
gdam <- ggplot(damage_top10,aes(x=reorder(EVTYPE,-TOTAL_DAMAGE),y=TOTAL_DAMAGE/1e9,fill=TOTAL_DAMAGE/1e9))
gdam + geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90,vjust=0.5,hjust=1)) + xlab("") + ylab("DAMAGE (BILLION USD)") + scale_fill_continuous("BILLION USD") + ggtitle("Top 10 events with maximum economic damage")
Floods have caused the maximum economic damage, with loss of property and crops worth more than 160 billion USD from 1950 to 2011.
This assignment is part of the course “Reproducible Research” offered by Coursera at https://www.coursera.org/learn/reproducible-research and taught by Prof. Roger D. Peng, Johns Hopkins University.
The data is available at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and the documentation is available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.