This analysis studies the effects of different weather events across the United States. The goal is to determine which events are most harmful to human health and which cause the most material damage. The analysis relies on five fields that record injuries, fatalities, damage costs and the type of weather event. To calculate material damage I used the fields that hold the estimated cost of each disaster. Those costs are expressed in thousands, millions or billions of dollars. Before working with the data I had to clean it, because it contained repeated values, among other problems. I extracted the data needed to answer the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
This report presents the steps I followed to answer these questions.
First, I set the working directory and loaded the dataset.
data<-read.csv("repdata-data-StormData.csv")
I kept the relevant columns: the first eight (up to EVTYPE) and columns 23 to 26 (FATALITIES, INJURIES, PROPDMG and PROPDMGEXP).
relevData<-data[,c(1:8,23:26)]
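Selecting columns by position depends on the exact layout of the file. A minimal sketch of selecting by name instead, on a small made-up data frame (the values are invented; only the column names mirror the storm data):

```r
# Toy stand-in for the storm data; the values are invented for illustration.
storm <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  MAG        = c(0, 175, 0),
  FATALITIES = c(2, 0, 1),
  INJURIES   = c(10, 0, 0),
  PROPDMG    = c(25, 5, 2.5),
  PROPDMGEXP = c("K", "K", "M"),
  stringsAsFactors = FALSE
)

# Select by name rather than by position, so the code still works
# if the columns are reordered.
keep  <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP")
relev <- storm[, keep]
names(relev)
```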
Because we want the harm caused by each event, we keep only the rows where both the fatalities and the injuries fields are non-zero.
healthDmg<-relevData[relevData$FATALITIES!=0 & relevData$INJURIES!=0,]
This leaves 2649 rows of interest: those where neither fatalities nor injuries is 0.
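The filter above keeps a row only when both counts are non-zero. The sketch below, on invented toy data, contrasts that condition with the alternative of keeping rows where either count is non-zero. It illustrates the difference between the two conditions and is not a change to the analysis.

```r
# Invented toy data: one row with both counts, two with only one, one with none.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "HAIL", "FLOOD"),
  FATALITIES = c(3, 2, 0, 0),
  INJURIES   = c(12, 0, 0, 4)
)

# Condition used in this report: both fields non-zero.
both <- toy[toy$FATALITIES != 0 & toy$INJURIES != 0, ]

# Alternative: at least one field non-zero.
either <- toy[toy$FATALITIES != 0 | toy$INJURIES != 0, ]

nrow(both)    # 1 (only TORNADO)
nrow(either)  # 3 (TORNADO, HEAT, FLOOD)
```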
Because not all event types were in upper case, I converted the entire column to upper case to avoid duplicated events.
healthDmg$EVTYPE<-toupper(healthDmg$EVTYPE)
Some observations list more than one event, so I kept only the first one that appears.
Keeping the event before the word “AND”:
for (i in grep(" AND ",healthDmg$EVTYPE)){
  pos<-regexpr(" AND ",healthDmg$EVTYPE[[i]],fixed=TRUE)
  healthDmg$EVTYPE[[i]]<-substring(healthDmg$EVTYPE[[i]],1,pos-1)
}
Keeping the event before the “/” character:
for (i in grep("/",healthDmg$EVTYPE)){
  pos<-regexpr("/",healthDmg$EVTYPE[[i]],fixed=TRUE)
  healthDmg$EVTYPE[[i]]<-substring(healthDmg$EVTYPE[[i]],1,pos-1)
}
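The two loops above can also be written as a single vectorized call. This is a sketch under the assumption that only the text before the first " AND " or "/" is wanted, using base R's sub on a few sample labels:

```r
# Sample labels in the style of the EVTYPE column.
ev <- c("HEAVY SNOW AND ICE", "WINTER STORM/HIGH WINDS",
        "TORNADO", "FLASH FLOOD/FLOOD")

# Drop everything from the first " AND " or "/" to the end of the string.
# Labels without either separator are returned unchanged.
first_event <- sub("( AND |/).*$", "", ev)
first_event
```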
Next, I created a sorted vector of the 74 unique values to detect duplicates and anomalies.
length(unique(healthDmg$EVTYPE))
## [1] 74
uniqueEvType<-sort(unique(healthDmg$EVTYPE))
uniqueEvType
## [1] "AVALANCHE" "BLACK ICE"
## [3] "BLIZZARD" "BLOWING SNOW"
## [5] "COASTAL STORM" "COLD"
## [7] "DENSE FOG" "DUST STORM"
## [9] "EXCESSIVE HEAT" "EXCESSIVE RAINFALL"
## [11] "EXTREME COLD" "EXTREME WINDCHILL"
## [13] "FLASH FLOOD" "FLOOD"
## [15] "FLOODING" "FOG"
## [17] "FREEZING DRIZZLE" "FREEZING RAIN"
## [19] "FROST" "GLAZE"
## [21] "GUSTY WINDS" "HAIL"
## [23] "HEAT" "HEAT WAVE"
## [25] "HEAT WAVE DROUGHT" "HEAVY RAIN"
## [27] "HEAVY SNOW" "HEAVY SURF"
## [29] "HIGH SEAS" "HIGH SURF"
## [31] "HIGH WIND" "HIGH WINDS"
## [33] "HURRICANE" "ICE"
## [35] "ICE STORM" "ICY ROADS"
## [37] "LANDSLIDE" "LANDSLIDES"
## [39] "LIGHT SNOW" "LIGHTNING"
## [41] "MARINE ACCIDENT" "MARINE HIGH WIND"
## [43] "MARINE MISHAP" "MARINE STRONG WIND"
## [45] "MARINE THUNDERSTORM WIND" "MARINE TSTM WIND"
## [47] "MIXED PRECIP" "RAIN"
## [49] "RIP CURRENT" "RIP CURRENTS"
## [51] "ROUGH SEAS" "ROUGH SURF"
## [53] "SNOW" "STORM SURGE"
## [55] "STRONG WIND" "STRONG WINDS"
## [57] "THUNDERSNOW" "THUNDERSTORM WIND"
## [59] "THUNDERSTORM WINDS" "TORNADO"
## [61] "TROPICAL STORM" "TROPICAL STORM GORDON"
## [63] "TSTM WIND" "TSUNAMI"
## [65] "URBAN" "WATERSPOUT"
## [67] "WILD" "WILD FIRES"
## [69] "WILDFIRE" "WIND"
## [71] "WINTER STORM" "WINTER STORM HIGH WINDS"
## [73] "WINTER STORMS" "WINTER WEATHER"
As you can see, many event types appear under more than one name, so I fixed that by building matching lists of bad and good terms. I based this part on Section 2.1.1, Table 1 (Storm Data Event Table) of the Storm Data documentation file (repdata-peer2_doc-pd01016005curr.pdf).
badOnes<-c("HIGH WINDS","ICE","LANDSLIDES","MARINE MISHAP","MARINE TSTM WIND","TSTM WIND",
"RIP CURRENTS","ROUGH SEAS","STRONG WINDS","THUNDERSNOW","THUNDERSTORM WINDS",
"TROPICAL STORM GORDON","WILD FIRES","WILD/FOREST FIRE","WINTER STORM HIGH WINDS",
"WIND","HIGH WINDS","WINTER STORMS", "URBAN/SML STREAM FLD","WINTER WEATHER/MIX",
"FLOODING","HEAT WAVE","HEAT WAVE DROUGHT","GLAZE","HEAVY SURF","ROUGH SURF",
"SNOW","WILD","URBAN","THUDERSTORM WINDS","COASTAL FLOODING","EXTREME WIND CHILL",
"FLOODS", "HIGH WINDS","ICE ROADS","LIGHTING","LIGNTNING","MARINE TS", "MUDSLIDE",
"MUDSLIDES","RECO","THUNDERSTORM WINDS","THUNDERSTORMW","TORNDAO","TSTMW",
"TUNDERSTORM WIND","WATERSP"," FLASH FLOO", " TSTM WIN"," HIGH SURF ADVISORY")
goodOnes<-c("HIGH WIND","ICE STORM","LANDSLIDE","MARINE ACCIDENT","MARINE THUNDERSTORM WIND",
"MARINE THUNDERSTORM WIND","RIP CURRENT","ROUGH SURF","STRONG WIND",
"THUNDERSTORM WIND","THUNDERSTORM WIND","TROPICAL STORM","WILDFIRE",
"WILDFIRE","WINTER STORM","HIGH WIND","HIGH WIND","WINTER STORM","HEAVY RAIN",
"WINTER WEATHER","FLOOD","HEAT","HEAT","FROST","HIGH SURF","HIGH SURF",
"HEAVY SNOW","WILDFIRE","FLOOD","THUNDERSTORM WIND","COASTAL FLOOD","EXTREME WINDCHILL",
"FLOOD","HIGH WIND","ICY ROADS","LIGHTNING","LIGHTNING","MARINE THUNDERSTORM WIND",
"MUD SLIDE","MUD SLIDE","RECORD SNOW","THUNDERSTORM WIND","THUNDERSTORMS","TORNADO",
"THUNDERSTORM WIND","THUNDERSTORM WIND","WATERSPOUT","FLASH FLOOD","THUNDERSTORM WIND",
"HIGH SURF ADVISORY")
Then, using the two lists, I replaced each bad term with its good counterpart.
for (i in seq_along(badOnes)){
  healthDmg[healthDmg$EVTYPE==badOnes[i],"EVTYPE"]<-goodOnes[i]
}
healthDmg$EVTYPE<-factor(healthDmg$EVTYPE)
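A named character vector is another way to express the bad-to-good mapping. It keeps each pair together and avoids relying on matching positions in two separate lists. A minimal sketch with three of the pairs from the lists above (the names lookup and ev are illustrative):

```r
# Each name is a bad term; each value is its replacement.
lookup <- c("HIGH WINDS"   = "HIGH WIND",
            "RIP CURRENTS" = "RIP CURRENT",
            "WILD FIRES"   = "WILDFIRE")

ev <- c("HIGH WINDS", "TORNADO", "WILD FIRES", "RIP CURRENTS")

# Replace only the labels that appear in the lookup; leave the rest untouched.
hit <- ev %in% names(lookup)
ev[hit] <- unname(lookup[ev[hit]])
ev
```

With this layout a mismatched pair is easier to spot, because the bad and good terms sit side by side.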
I then recomputed the vector of unique event types. Only 49 unique values remain, which is much better.
This is the recomputed vector:
uniqueEvType<-sort(unique(healthDmg$EVTYPE))
uniqueEvType
## [1] AVALANCHE BLACK ICE
## [3] BLIZZARD BLOWING SNOW
## [5] COASTAL STORM COLD
## [7] DENSE FOG DUST STORM
## [9] EXCESSIVE HEAT EXCESSIVE RAINFALL
## [11] EXTREME COLD EXTREME WINDCHILL
## [13] FLASH FLOOD FLOOD
## [15] FOG FREEZING DRIZZLE
## [17] FREEZING RAIN FROST
## [19] GUSTY WINDS HAIL
## [21] HEAT HEAVY RAIN
## [23] HEAVY SNOW HIGH SEAS
## [25] HIGH SURF HIGH WIND
## [27] HURRICANE ICE STORM
## [29] ICY ROADS LANDSLIDE
## [31] LIGHT SNOW LIGHTNING
## [33] MARINE ACCIDENT MARINE HIGH WIND
## [35] MARINE STRONG WIND MARINE THUNDERSTORM WIND
## [37] MIXED PRECIP RAIN
## [39] RIP CURRENT STORM SURGE
## [41] STRONG WIND THUNDERSTORM WIND
## [43] TORNADO TROPICAL STORM
## [45] TSUNAMI WATERSPOUT
## [47] WILDFIRE WINTER STORM
## [49] WINTER WEATHER
## 49 Levels: AVALANCHE BLACK ICE BLIZZARD BLOWING SNOW ... WINTER WEATHER
Now it is time to process the data to answer question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
I created a new data frame with the total fatalities and injuries by event type.
harmPob<-aggregate(cbind(FATALITIES,INJURIES)~EVTYPE,data=healthDmg, FUN = sum)
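I ordered harmPob by fatalities. These are the five event types most harmful to population health. Note that the replacement lists above map "TSTM WIND" to "MARINE THUNDERSTORM WIND", so that row also includes events originally recorded as TSTM WIND.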
head(harmPob[order(-harmPob$FATALITIES),],5)
## EVTYPE FATALITIES INJURIES
## 43 TORNADO 5227 60187
## 9 EXCESSIVE HEAT 402 4791
## 32 LIGHTNING 283 649
## 36 MARINE THUNDERSTORM WIND 211 666
## 13 FLASH FLOOD 171 641
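For readers who want to see the aggregation step in isolation, here is a small sketch on invented numbers. It sums both measures per event type and then ranks by fatalities and, separately, by injuries, since the two rankings need not agree:

```r
# Invented toy data, one row per recorded event.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 5, 2, 1),
  INJURIES   = c(10, 4, 0, 1, 2)
)

# Total fatalities and injuries per event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by each measure in decreasing order.
by_fatalities <- totals[order(-totals$FATALITIES), ]
by_injuries   <- totals[order(-totals$INJURIES), ]

by_fatalities
by_injuries
```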
Based on Section 2.1.1 of the National Weather Service Instruction 10-1605 document, I created by hand (copy, paste and edit) a CSV file with the event designators from Table 1, page 6.
designators<-read.csv("Designator.csv")
designators$Event.Name<-toupper(designators$Event.Name)
I merged the two tables to assign a designator to each event type and then, based on the information in the table, filled in the NA values by hand.
harmPobD<- merge(harmPob,designators,by.x="EVTYPE",by.y="Event.Name",all.x=TRUE)
harmPobD$Designator<-as.character(harmPobD$Designator)
harmPobD[is.na(harmPobD$Designator),4]<-c("Z","Z","Z","Z","C","Z","Z","Z","Z","Z","Z","Z","M","Z","Z","C","Z","M","C","C","Z")
harmPobD$Designator<-factor(harmPobD$Designator)
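Filling the NA values with a positional vector depends on the row order produced by merge. A sketch of a name-based alternative follows. The toy tables, and in particular the designators assigned to ICY ROADS and MARINE ACCIDENT in manual, are illustrative assumptions and are not taken from the analysis:

```r
# Toy event totals and a partial designator table (illustrative values).
events <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "ICY ROADS", "MARINE ACCIDENT"),
  FATALITIES = c(10, 5, 1, 2),
  stringsAsFactors = FALSE
)
designators <- data.frame(
  Event.Name = c("TORNADO", "HEAT"),
  Designator = c("C", "Z"),
  stringsAsFactors = FALSE
)

# Left join: event types missing from the table get NA as designator.
merged <- merge(events, designators,
                by.x = "EVTYPE", by.y = "Event.Name", all.x = TRUE)

# Hand-assigned designators, keyed by event name instead of row position.
manual <- c("ICY ROADS" = "Z", "MARINE ACCIDENT" = "M")
missing <- is.na(merged$Designator)
merged$Designator[missing] <- unname(manual[merged$EVTYPE[missing]])
merged
```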
Then I computed the mean fatalities and injuries per event type for each designator.
bydes<-aggregate(cbind(FATALITIES,INJURIES)~Designator,harmPobD,mean)
Then I drew the following bar plot.
library(ggplot2)
library(reshape2)
# Reshape to long format: one row per (Designator, measure) pair
bydes.long<-melt(bydes,id.vars="Designator")
p<-ggplot(bydes.long,aes(variable,value,fill=as.factor(Designator)))+
geom_bar(position="dodge",stat="identity")+ labs(x="Harms")+labs(title="Harm by Designation")
p
As you can see, injuries are more frequent for county-designated events, and fatalities are significantly less frequent than injuries. One possible explanation is that people in a county are better adapted to the events that occur more frequently there and know how to protect themselves from death.
The event type data for the second question were much dirtier, so the cleaning process was more complicated.
Preprocessing the data:
matDmg<-relevData[relevData$PROPDMG!=0,]
matDmg$EVTYPE<-toupper(matDmg$EVTYPE)
First, I created a vector with the sorted unique values.
length(unique(matDmg$EVTYPE))
## [1] 375
uniqueEvType<-sort(unique(matDmg$EVTYPE))
uniqueEvType
## [1] " HIGH SURF ADVISORY" " FLASH FLOOD"
## [3] " TSTM WIND" " TSTM WIND (G45)"
## [5] "?" "APACHE COUNTY"
## [7] "ASTRONOMICAL HIGH TIDE" "ASTRONOMICAL LOW TIDE"
## [9] "AVALANCHE" "BEACH EROSION"
## [11] "BLIZZARD" "BLIZZARD/WINTER STORM"
## [13] "BLOWING DUST" "BLOWING SNOW"
## [15] "BREAKUP FLOODING" "BRUSH FIRE"
## [17] "COASTAL FLOODING/EROSION" "COASTAL EROSION"
## [19] "COASTAL FLOOD" "COASTAL FLOODING"
## [21] "COASTAL FLOODING/EROSION" "COASTAL STORM"
## [23] "COASTAL SURGE" "COLD"
## [25] "COLD AIR TORNADO" "COLD/WIND CHILL"
## [27] "DAM BREAK" "DAMAGING FREEZE"
## [29] "DENSE FOG" "DENSE SMOKE"
## [31] "DOWNBURST" "DROUGHT"
## [33] "DRY MICROBURST" "DUST DEVIL"
## [35] "DUST DEVIL WATERSPOUT" "DUST STORM"
## [37] "DUST STORM/HIGH WINDS" "EROSION/CSTL FLOOD"
## [39] "EXCESSIVE HEAT" "EXCESSIVE SNOW"
## [41] "EXTENDED COLD" "EXTREME COLD"
## [43] "EXTREME COLD/WIND CHILL" "EXTREME HEAT"
## [45] "EXTREME WIND CHILL" "EXTREME WINDCHILL"
## [47] "FLASH FLOOD" "FLASH FLOOD - HEAVY RAIN"
## [49] "FLASH FLOOD FROM ICE JAMS" "FLASH FLOOD LANDSLIDES"
## [51] "FLASH FLOOD WINDS" "FLASH FLOOD/"
## [53] "FLASH FLOOD/ STREET" "FLASH FLOOD/FLOOD"
## [55] "FLASH FLOOD/LANDSLIDE" "FLASH FLOODING"
## [57] "FLASH FLOODING/FLOOD" "FLASH FLOODING/THUNDERSTORM WI"
## [59] "FLASH FLOODS" "FLOOD"
## [61] "FLOOD & HEAVY RAIN" "FLOOD FLASH"
## [63] "FLOOD/FLASH" "FLOOD/FLASH FLOOD"
## [65] "FLOOD/FLASH/FLOOD" "FLOOD/FLASHFLOOD"
## [67] "FLOOD/RIVER FLOOD" "FLOODING"
## [69] "FLOODING/HEAVY RAIN" "FLOODS"
## [71] "FOG" "FOREST FIRES"
## [73] "FREEZE" "FREEZING DRIZZLE"
## [75] "FREEZING FOG" "FREEZING RAIN"
## [77] "FREEZING RAIN/SLEET" "FREEZING RAIN/SNOW"
## [79] "FROST" "FROST/FREEZE"
## [81] "FROST\\FREEZE" "FUNNEL CLOUD"
## [83] "GLAZE" "GLAZE ICE"
## [85] "GRADIENT WIND" "GRASS FIRES"
## [87] "GROUND BLIZZARD" "GUSTNADO"
## [89] "GUSTY WIND" "GUSTY WIND/HAIL"
## [91] "GUSTY WIND/HVY RAIN" "GUSTY WIND/RAIN"
## [93] "GUSTY WINDS" "HAIL"
## [95] "HAIL 0.75" "HAIL 100"
## [97] "HAIL 175" "HAIL 275"
## [99] "HAIL 450" "HAIL 75"
## [101] "HAIL DAMAGE" "HAIL/WIND"
## [103] "HAIL/WINDS" "HAILSTORM"
## [105] "HEAT" "HEAT WAVE"
## [107] "HEAT WAVE DROUGHT" "HEAVY LAKE SNOW"
## [109] "HEAVY MIX" "HEAVY PRECIPITATION"
## [111] "HEAVY RAIN" "HEAVY RAIN AND FLOOD"
## [113] "HEAVY RAIN/HIGH SURF" "HEAVY RAIN/LIGHTNING"
## [115] "HEAVY RAIN/SEVERE WEATHER" "HEAVY RAIN/SMALL STREAM URBAN"
## [117] "HEAVY RAIN/SNOW" "HEAVY RAINS"
## [119] "HEAVY RAINS/FLOODING" "HEAVY SHOWER"
## [121] "HEAVY SNOW" "HEAVY SNOW-SQUALLS"
## [123] "HEAVY SNOW AND STRONG WINDS" "HEAVY SNOW SHOWER"
## [125] "HEAVY SNOW SQUALLS" "HEAVY SNOW/BLIZZARD"
## [127] "HEAVY SNOW/BLIZZARD/AVALANCHE" "HEAVY SNOW/FREEZING RAIN"
## [129] "HEAVY SNOW/HIGH WINDS & FLOOD" "HEAVY SNOW/ICE"
## [131] "HEAVY SNOW/SQUALLS" "HEAVY SNOW/WIND"
## [133] "HEAVY SNOW/WINTER STORM" "HEAVY SNOWPACK"
## [135] "HEAVY SURF" "HEAVY SURF COASTAL FLOODING"
## [137] "HEAVY SURF/HIGH SURF" "HEAVY SWELLS"
## [139] "HIGH WINDS" "HIGH SEAS"
## [141] "HIGH SURF" "HIGH SWELLS"
## [143] "HIGH TIDES" "HIGH WATER"
## [145] "HIGH WIND" "HIGH WIND (G40)"
## [147] "HIGH WIND 48" "HIGH WIND AND SEAS"
## [149] "HIGH WIND DAMAGE" "HIGH WIND/BLIZZARD"
## [151] "HIGH WIND/HEAVY SNOW" "HIGH WIND/SEAS"
## [153] "HIGH WINDS" "HIGH WINDS HEAVY RAINS"
## [155] "HIGH WINDS/" "HIGH WINDS/COASTAL FLOOD"
## [157] "HIGH WINDS/COLD" "HIGH WINDS/HEAVY RAIN"
## [159] "HIGH WINDS/SNOW" "HURRICANE"
## [161] "HURRICANE-GENERATED SWELLS" "HURRICANE EMILY"
## [163] "HURRICANE ERIN" "HURRICANE FELIX"
## [165] "HURRICANE GORDON" "HURRICANE OPAL"
## [167] "HURRICANE OPAL/HIGH WINDS" "HURRICANE/TYPHOON"
## [169] "ICE" "ICE AND SNOW"
## [171] "ICE FLOES" "ICE JAM"
## [173] "ICE JAM FLOOD (MINOR" "ICE JAM FLOODING"
## [175] "ICE ROADS" "ICE STORM"
## [177] "ICE/STRONG WINDS" "ICY ROADS"
## [179] "LAKE-EFFECT SNOW" "LAKE EFFECT SNOW"
## [181] "LAKE FLOOD" "LAKESHORE FLOOD"
## [183] "LANDSLIDE" "LANDSLIDES"
## [185] "LANDSLUMP" "LANDSPOUT"
## [187] "LATE SEASON SNOW" "LIGHT FREEZING RAIN"
## [189] "LIGHT SNOW" "LIGHT SNOWFALL"
## [191] "LIGHTING" "LIGHTNING"
## [193] "LIGHTNING WAUSEON" "LIGHTNING AND HEAVY RAIN"
## [195] "LIGHTNING FIRE" "LIGHTNING THUNDERSTORM WINDS"
## [197] "LIGHTNING/HEAVY RAIN" "LIGNTNING"
## [199] "MAJOR FLOOD" "MARINE ACCIDENT"
## [201] "MARINE HAIL" "MARINE HIGH WIND"
## [203] "MARINE STRONG WIND" "MARINE THUNDERSTORM WIND"
## [205] "MARINE TSTM WIND" "MICROBURST"
## [207] "MICROBURST WINDS" "MINOR FLOODING"
## [209] "MIXED PRECIPITATION" "MUD SLIDE"
## [211] "MUD SLIDES" "MUD SLIDES URBAN FLOODING"
## [213] "MUDSLIDE" "MUDSLIDES"
## [215] "NON-SEVERE WIND DAMAGE" "NON-TSTM WIND"
## [217] "OTHER" "RAIN"
## [219] "RAINSTORM" "RECORD COLD"
## [221] "RECORD RAINFALL" "RECORD SNOW"
## [223] "RIP CURRENT" "RIP CURRENTS"
## [225] "RIVER AND STREAM FLOOD" "RIVER FLOOD"
## [227] "RIVER FLOODING" "ROCK SLIDE"
## [229] "ROUGH SURF" "RURAL FLOOD"
## [231] "SEICHE" "SEVERE THUNDERSTORM"
## [233] "SEVERE THUNDERSTORM WINDS" "SEVERE THUNDERSTORMS"
## [235] "SEVERE TURBULENCE" "SLEET/ICE STORM"
## [237] "SMALL HAIL" "SNOW"
## [239] "SNOW ACCUMULATION" "SNOW AND HEAVY SNOW"
## [241] "SNOW AND ICE" "SNOW AND ICE STORM"
## [243] "SNOW FREEZING RAIN" "SNOW SQUALL"
## [245] "SNOW SQUALLS" "SNOW/ BITTER COLD"
## [247] "SNOW/ ICE" "SNOW/BLOWING SNOW"
## [249] "SNOW/COLD" "SNOW/FREEZING RAIN"
## [251] "SNOW/HEAVY SNOW" "SNOW/HIGH WINDS"
## [253] "SNOW/ICE" "SNOW/ICE STORM"
## [255] "SNOW/SLEET" "SNOW/SLEET/FREEZING RAIN"
## [257] "SNOWMELT FLOODING" "STORM FORCE WINDS"
## [259] "STORM SURGE" "STORM SURGE/TIDE"
## [261] "STRONG WIND" "STRONG WINDS"
## [263] "THUDERSTORM WINDS" "THUNDEERSTORM WINDS"
## [265] "THUNDERESTORM WINDS" "THUNDERSNOW"
## [267] "THUNDERSTORM" "THUNDERSTORM WINDS"
## [269] "THUNDERSTORM DAMAGE TO" "THUNDERSTORM HAIL"
## [271] "THUNDERSTORM WIND" "THUNDERSTORM WIND 60 MPH"
## [273] "THUNDERSTORM WIND 65 MPH" "THUNDERSTORM WIND 65MPH"
## [275] "THUNDERSTORM WIND 98 MPH" "THUNDERSTORM WIND G50"
## [277] "THUNDERSTORM WIND G55" "THUNDERSTORM WIND TREES"
## [279] "THUNDERSTORM WIND/ TREE" "THUNDERSTORM WIND/ TREES"
## [281] "THUNDERSTORM WIND/AWNING" "THUNDERSTORM WIND/HAIL"
## [283] "THUNDERSTORM WIND/LIGHTNING" "THUNDERSTORM WINDS"
## [285] "THUNDERSTORM WINDS 13" "THUNDERSTORM WINDS 63 MPH"
## [287] "THUNDERSTORM WINDS AND" "THUNDERSTORM WINDS HAIL"
## [289] "THUNDERSTORM WINDS LIGHTNING" "THUNDERSTORM WINDS."
## [291] "THUNDERSTORM WINDS/ FLOOD" "THUNDERSTORM WINDS/FLOODING"
## [293] "THUNDERSTORM WINDS/FUNNEL CLOU" "THUNDERSTORM WINDS/HAIL"
## [295] "THUNDERSTORM WINDS53" "THUNDERSTORM WINDSHAIL"
## [297] "THUNDERSTORM WINDSS" "THUNDERSTORM WINS"
## [299] "THUNDERSTORMS" "THUNDERSTORMS WIND"
## [301] "THUNDERSTORMS WINDS" "THUNDERSTORMW"
## [303] "THUNDERSTORMWINDS" "THUNDERSTROM WIND"
## [305] "THUNDERTORM WINDS" "THUNERSTORM WINDS"
## [307] "TIDAL FLOODING" "TORNADO"
## [309] "TORNADO F0" "TORNADO F1"
## [311] "TORNADO F2" "TORNADO F3"
## [313] "TORNADOES, TSTM WIND, HAIL" "TORNDAO"
## [315] "TROPICAL DEPRESSION" "TROPICAL STORM"
## [317] "TROPICAL STORM ALBERTO" "TROPICAL STORM DEAN"
## [319] "TROPICAL STORM GORDON" "TROPICAL STORM JERRY"
## [321] "TSTM WIND" "TSTM WIND (G45)"
## [323] "TSTM WIND (41)" "TSTM WIND (G35)"
## [325] "TSTM WIND (G40)" "TSTM WIND (G45)"
## [327] "TSTM WIND 40" "TSTM WIND 45"
## [329] "TSTM WIND 55" "TSTM WIND 65)"
## [331] "TSTM WIND AND LIGHTNING" "TSTM WIND DAMAGE"
## [333] "TSTM WIND G45" "TSTM WIND G58"
## [335] "TSTM WIND/HAIL" "TSTM WINDS"
## [337] "TSTMW" "TSUNAMI"
## [339] "TUNDERSTORM WIND" "TYPHOON"
## [341] "URBAN AND SMALL" "URBAN FLOOD"
## [343] "URBAN FLOODING" "URBAN FLOODS"
## [345] "URBAN SMALL" "URBAN/SMALL STREAM"
## [347] "URBAN/SMALL STREAM FLOOD" "URBAN/SML STREAM FLD"
## [349] "VOLCANIC ASH" "WATERSPOUT"
## [351] "WATERSPOUT-" "WATERSPOUT-TORNADO"
## [353] "WATERSPOUT TORNADO" "WATERSPOUT/ TORNADO"
## [355] "WATERSPOUT/TORNADO" "WET MICROBURST"
## [357] "WHIRLWIND" "WILD FIRES"
## [359] "WILD/FOREST FIRE" "WILD/FOREST FIRES"
## [361] "WILDFIRE" "WILDFIRES"
## [363] "WIND" "WIND AND WAVE"
## [365] "WIND DAMAGE" "WIND STORM"
## [367] "WIND/HAIL" "WINDS"
## [369] "WINTER STORM" "WINTER STORM HIGH WINDS"
## [371] "WINTER STORMS" "WINTER WEATHER"
## [373] "WINTER WEATHER MIX" "WINTER WEATHER/MIX"
## [375] "WINTRY MIX"
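This function processes every element that matches cadena. If content is TRUE, it is meant to replace the original string with cadena; if content is FALSE, it is meant to keep only the substring before the position where cadena starts. Note that, as written, the position is always looked up for "/" rather than for cadena.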
clearSbst<-function(dataset,cadena,fix="TRUE",content=FALSE){
for (i in grep(cadena,dataset)){
pos<-regexpr("/",dataset[[i]],fixed=fix)
# print(pos)
if (!content)
dataset[[i]]<-substring(dataset[[i]],1,pos-1)
else
dataset[[i]]<-substring(dataset[[i]],pos,pos+(nchar(cadena)+1))
}
dataset
}
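For comparison, this is a sketch of a variant that looks up the position of the search term itself and matches it literally. The name clear_sketch and its argument names are hypothetical, and the outputs shown in this report were produced by clearSbst as written above, not by this variant:

```r
# Hypothetical variant: x is a character vector, term a literal string.
clear_sketch <- function(x, term, content = FALSE) {
  hit <- grepl(term, x, fixed = TRUE)
  if (!any(hit)) return(x)
  if (content) {
    # Collapse every matching label to the term itself.
    x[hit] <- term
  } else {
    # Keep only the text that precedes the first occurrence of the term.
    pos <- as.integer(regexpr(term, x[hit], fixed = TRUE))
    x[hit] <- substring(x[hit], 1, pos - 1)
  }
  x
}

ev <- c("HEAVY SNOW/ICE", "TSTM WIND (G45)", "TORNADO F2", "HAIL")
ev <- clear_sketch(ev, "/")
ev <- clear_sketch(ev, "TSTM WIND", content = TRUE)
ev <- clear_sketch(ev, "TORNADO", content = TRUE)
ev
```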
Using the clearSbst function, I replaced some of the values of the event type field.
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE,"/")
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "-")
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "&")
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE," AND ")
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE,"\\\\")
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "WINTER STORM",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "WINTER WEATHER",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "TSTM WIND",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "TROPICAL STORM",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "TORNADO",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "FLASH FLOOD",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "COASTAL FLOOD",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "COLD",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "DUST DEVIL",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "GLAZE",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "GUSTY WIND",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "HAIL",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "HEAVY SNOW",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "HEAVY SURF",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "HIGH WIND",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "HURRICANE",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "ICE JAM",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "LANDSLIDE",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "LIGHT SNOW",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "LIGHTNING",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "MICROBURST",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "MUD SLIDE",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "RIP CURRENT",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "RIVER FLOOD",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "SEVERE THUNDERSTORM",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "SNOW",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "STRONG WIND",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "THUNDEERSTORM WIND",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "URBAN FLOOD",fix="TRUE",content=TRUE)
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE, "WIND",fix="TRUE",content=TRUE)
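The repeated calls above could be driven by a vector of terms. The sketch below assumes the intent is to collapse every label that contains a term to that term, using literal matching; collapse_to is a hypothetical helper. It also shows that a broad term such as "WIND", applied last, overrides the more specific labels assigned before it:

```r
# Hypothetical helper: for each term in order, any label containing
# the term (literal match) is replaced by the term.
collapse_to <- function(x, terms) {
  for (term in terms) {
    x[grepl(term, x, fixed = TRUE)] <- term
  }
  x
}

ev <- c("TSTM WIND 55", "HIGH WIND DAMAGE", "TORNADO F1", "WIND STORM")

# Specific terms only: each label keeps its own category.
specific <- collapse_to(ev, c("TSTM WIND", "HIGH WIND", "TORNADO"))

# Adding the broad term "WIND" at the end merges every wind label.
with_broad <- collapse_to(ev, c("TSTM WIND", "HIGH WIND", "TORNADO", "WIND"))

specific
with_broad
```

The order of the terms therefore matters: under this reading, broad terms are safer applied first or left out.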
I then used the lists of bad and good terms to clean the data further.
for (i in seq_along(badOnes)){
  matDmg[matDmg$EVTYPE==badOnes[i],"EVTYPE"]<-goodOnes[i]
}
I then recomputed the vector of unique event types. Only 135 unique values remain, down from 375, although several of them are truncated fragments such as "TSTM", "THUN" and "HEAV".
This is the recomputed vector:
uniqueEvType<-sort(unique(matDmg$EVTYPE))
uniqueEvType
## [1] "" "?"
## [3] "APACHE COUNTY" "ASTRONOMICAL HIGH TIDE"
## [5] "ASTRONOMICAL LOW TIDE" "AVALANCHE"
## [7] "BEACH EROSION" "BLIZZARD"
## [9] "BLOW" "BLOWING DUST"
## [11] "BREAKUP FLOODING" "BRUSH FIRE"
## [13] "COASTAL EROSION" "COASTAL FLOOD"
## [15] "COASTAL STORM" "COASTAL SURGE"
## [17] "COLD" "DAM BREAK"
## [19] "DAMAGING FREEZE" "DENSE FOG"
## [21] "DENSE SMOKE" "DOWNBURST"
## [23] "DROUGHT" "DRY MICROB"
## [25] "DUST DEVIL" "DUST STORM"
## [27] "EROSION" "EXCE"
## [29] "EXCESSIVE HEAT" "EXTE"
## [31] "EXTR" "EXTREME HEAT"
## [33] "FLASH FLOOD" "FLOOD"
## [35] "FLOOD FLASH" "FOG"
## [37] "FOREST FIRES" "FREEZE"
## [39] "FREEZING DRIZZLE" "FREEZING FOG"
## [41] "FREEZING RAIN" "FROST"
## [43] "FUNNEL CLOUD" "GRAD"
## [45] "GRASS FIRES" "GROUND BLIZZARD"
## [47] "GUST" "GUSTNADO"
## [49] "HAIL" "HEAT"
## [51] "HEAV" "HEAVY MIX"
## [53] "HEAVY PRECIPITATION" "HEAVY RAIN"
## [55] "HEAVY RAINS" "HEAVY SHOWER"
## [57] "HEAVY SNOW" "HEAVY SWELLS"
## [59] "HIGH" "HIGH SEAS"
## [61] "HIGH SURF" "HIGH SURF ADVISORY"
## [63] "HIGH SWELLS" "HIGH TIDES"
## [65] "HIGH WATER" "HIGH WIND"
## [67] "HURRICANE" "ICE FLOES"
## [69] "ICE JAM" "ICE STORM"
## [71] "ICY ROADS" "LAKE"
## [73] "LAKE FLOOD" "LAKESHORE FLOOD"
## [75] "LANDSLIDE" "LANDSLUMP"
## [77] "LANDSPOUT" "LATE"
## [79] "LIGH" "LIGHT FREEZING RAIN"
## [81] "LIGHTNING" "MAJOR FLOOD"
## [83] "MARI" "MARINE ACCIDENT"
## [85] "MARINE HI" "MARINE STRO"
## [87] "MARINE THUNDERSTORM WIND" "MICROBURST"
## [89] "MINOR FLOODING" "MIXED PRECIPITATION"
## [91] "MUD SLIDE" "OTHER"
## [93] "RAIN" "RAINSTORM"
## [95] "RECORD RAINFALL" "RECORD SNOW"
## [97] "RIP CURRENT" "RIVER FLOOD"
## [99] "ROCK SLIDE" "RURAL FLOOD"
## [101] "SEICHE" "SEVERE THUNDERSTORM"
## [103] "SEVERE TURBULENCE" "SLEET"
## [105] "SMAL" "STOR"
## [107] "STORM SURGE" "STRO"
## [109] "THUD" "THUN"
## [111] "THUNDERST" "THUNDERSTORM"
## [113] "THUNDERSTORM DAMAGE TO" "THUNDERSTORM WIND"
## [115] "THUNDERSTORM WINS" "THUNDERSTORMS"
## [117] "TIDAL FLOODING" "TORNADO"
## [119] "TROPICAL DEPRESSION" "TROPICAL STORM"
## [121] "TSTM" "TSUNAMI"
## [123] "TUND" "TYPHOON"
## [125] "URBAN FLOOD" "URBAN SMALL"
## [127] "VOLCANIC ASH" "WATERSPOUT"
## [129] "WET MICROB" "WHIR"
## [131] "WILDFIRE" "WILDFIRES"
## [133] "WINTER STORM" "WINTER WEATHER"
## [135] "WINTRY MIX"
Finally, I computed the total property damage by event type to answer the second question: 2. Across the United States, which types of events have the greatest economic consequences?
matDmg<-aggregate(cbind(PROPDMG,PROPDMGEXP)~EVTYPE,data=matDmg, FUN = sum)
head(matDmg[order(-matDmg$PROPDMG),],5)
## EVTYPE PROPDMG PROPDMGEXP
## 118 TORNADO 3214534.0 672670
## 33 FLASH FLOOD 1455187.6 358013
## 121 TSTM 1344802.7 1036905
## 110 THUN 1327667.5 938622
## 34 FLOOD 952186.6 185413
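As you can see, tornadoes have the largest total of raw PROPDMG values. These totals do not apply the PROPDMGEXP magnitude (thousands, millions or billions), and the "TSTM" and "THUN" rows are both truncated thunderstorm labels, so the ranking by actual dollar cost may differ.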
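PROPDMG only becomes a dollar amount once it is scaled by PROPDMGEXP. A minimal sketch of that scaling on invented data, assuming K, M and B stand for thousands, millions and billions as described at the start of this report. Treating any other code as a multiplier of 1 is an assumption made here for illustration:

```r
# Invented toy data; the last row has an empty exponent code.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "HAIL"),
  PROPDMG    = c(25, 1.5, 2, 500, 10),
  PROPDMGEXP = c("K", "B", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Multiplier for each magnitude code.
mult <- c(K = 1e3, M = 1e6, B = 1e9)
m <- mult[toupper(as.character(toy$PROPDMGEXP))]
m[is.na(m)] <- 1  # assumption: unknown or empty codes leave the value as is

# Damage in dollars, then total per event type, largest first.
toy$DAMAGE_USD <- toy$PROPDMG * unname(m)
totals <- aggregate(DAMAGE_USD ~ EVTYPE, data = toy, FUN = sum)
ranked <- totals[order(-totals$DAMAGE_USD), ]
ranked
```

In this toy example a single flood recorded in billions outweighs several tornado rows, which is why the ranking can change once the magnitude is applied.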