In this data analysis I studied the effects of different weather events. The goal is to determine which events are most harmful to human health and which cause the greatest material damage. To achieve this goal, I based the analysis on five fields that hold information about injuries, fatalities, costs, and weather events (FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, and EVTYPE). To calculate material damage I used the fields that hold the estimated cost of each event; those costs are expressed in thousands, millions, or billions of dollars. Before working with the data I had to clean it, because it contained repeated event types, among other problems. I extracted the data needed to answer the following questions:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

This report presents the steps I followed to answer these questions.

Data processing

First, I set the working directory and loaded the dataset:

data<-read.csv("repdata-data-StormData.csv")

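I selected the relevant columns for question 1 (relevData is reused for question 2):

# columns 1-8: location, date and EVTYPE; columns 23-26: FATALITIES, INJURIES, PROPDMG, PROPDMGEXP
relevData<-data[,c(1:8,23:26)]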

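Because we need to know the harm caused by each event, I kept only the rows in which both the FATALITIES and INJURIES fields are non-zero. Note that this filter also drops events that caused only fatalities or only injuries; replacing & with | in the filter would keep them, but all the counts and totals below come from the & filter.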

healthDmg<-relevData[relevData$FATALITIES!=0 & relevData$INJURIES!=0,]

We now have 2649 rows of interest: those in which neither fatalities nor injuries are 0.
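A minimal sketch, on made-up rows, of how the & filter used above differs from an | filter. The toy data frame is hypothetical; only its column names mirror the StormData fields.

```r
# Hypothetical toy rows; column names mirror FATALITIES/INJURIES in StormData.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "LIGHTNING", "HAIL"),
  FATALITIES = c(2, 1, 0, 0),
  INJURIES   = c(5, 0, 3, 0),
  stringsAsFactors = FALSE
)

# Both fields non-zero (the filter used for healthDmg): only TORNADO survives.
both <- toy[toy$FATALITIES != 0 & toy$INJURIES != 0, ]

# Either field non-zero: also keeps events with only deaths or only injuries.
either <- toy[toy$FATALITIES != 0 | toy$INJURIES != 0, ]

print(both$EVTYPE)
print(either$EVTYPE)
```

Only the row with no harm at all (HAIL) is dropped by the | version.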

Because most event types were written in upper case but not all of them, I converted the entire column to upper case so that the same event written with different capitalization is not counted twice:

healthDmg$EVTYPE<-toupper(healthDmg$EVTYPE)
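Upper-casing does not remove stray whitespace: the event-type list for question 2 contains values such as " TSTM WIND" and "   HIGH SURF ADVISORY", which otherwise have to be handled one by one in the bad/good term lists. A minimal sketch, on hypothetical values, of normalizing case and whitespace in one step with base R's trimws() (available since R 3.2.0):

```r
# Hypothetical EVTYPE values with mixed case and leading spaces.
evtype <- c("Tstm Wind", "TSTM WIND", " TSTM WIND", "   HIGH SURF ADVISORY")

# toupper() alone unifies case but keeps the leading spaces.
upperOnly <- toupper(evtype)

# trimws() additionally strips leading and trailing whitespace.
normalized <- trimws(toupper(evtype))

print(unique(upperOnly))   # three distinct values remain
print(unique(normalized))  # two distinct values remain
```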

Cleaning process for Question 1:

Since some observations list more than one event, I decided to keep only the first one.

Keeping the event before the word “AND”:

for (i in grep(" AND ",healthDmg$EVTYPE)){
        # position of the first " AND "; keep only the text before it
        pos<-regexpr(" AND ",healthDmg$EVTYPE[[i]],fixed=TRUE)
        healthDmg$EVTYPE[[i]]<-substring(healthDmg$EVTYPE[[i]],1,pos-1)
}

Keeping the event before the “/” character:

for (i in grep("/",healthDmg$EVTYPE)){
        # position of the first "/"; keep only the text before it
        pos<-regexpr("/",healthDmg$EVTYPE[[i]],fixed=TRUE)
        healthDmg$EVTYPE[[i]]<-substring(healthDmg$EVTYPE[[i]],1,pos-1)
}
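The two loops above can also be written without an explicit loop by deleting everything from the first separator onward with sub(). A minimal sketch on hypothetical values; loopFirst and subFirst are illustrative helper names, and loopFirst passes fixed = TRUE to grep() as well, which makes no difference for "/" and " AND " because neither contains regex metacharacters:

```r
# Hypothetical EVTYPE values containing separators.
evtype <- c("HEAT WAVE AND DROUGHT", "TSTM WIND/HAIL", "TORNADO", "FLOOD/FLASH/FLOOD")

# Loop version, mirroring the loops in the text.
loopFirst <- function(x, sep) {
  for (i in grep(sep, x, fixed = TRUE)) {
    pos <- regexpr(sep, x[[i]], fixed = TRUE)
    x[[i]] <- substring(x[[i]], 1, pos - 1)
  }
  x
}

# Vectorized version: remove the first separator and everything after it.
# Safe here only because sep contains no regex metacharacters.
subFirst <- function(x, sep) sub(paste0(sep, ".*$"), "", x)

viaLoop <- loopFirst(loopFirst(evtype, " AND "), "/")
viaSub  <- subFirst(subFirst(evtype, " AND "), "/")
print(viaSub)
```

Both versions keep only "FLOOD" from "FLOOD/FLASH/FLOOD", because the match starts at the leftmost separator.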

Next, I created a sorted vector of the 74 unique values to detect duplicates and anomalies:

length(unique(healthDmg$EVTYPE))
## [1] 74
uniqueEvType<-sort(unique(healthDmg$EVTYPE))
uniqueEvType
##  [1] "AVALANCHE"                "BLACK ICE"               
##  [3] "BLIZZARD"                 "BLOWING SNOW"            
##  [5] "COASTAL STORM"            "COLD"                    
##  [7] "DENSE FOG"                "DUST STORM"              
##  [9] "EXCESSIVE HEAT"           "EXCESSIVE RAINFALL"      
## [11] "EXTREME COLD"             "EXTREME WINDCHILL"       
## [13] "FLASH FLOOD"              "FLOOD"                   
## [15] "FLOODING"                 "FOG"                     
## [17] "FREEZING DRIZZLE"         "FREEZING RAIN"           
## [19] "FROST"                    "GLAZE"                   
## [21] "GUSTY WINDS"              "HAIL"                    
## [23] "HEAT"                     "HEAT WAVE"               
## [25] "HEAT WAVE DROUGHT"        "HEAVY RAIN"              
## [27] "HEAVY SNOW"               "HEAVY SURF"              
## [29] "HIGH SEAS"                "HIGH SURF"               
## [31] "HIGH WIND"                "HIGH WINDS"              
## [33] "HURRICANE"                "ICE"                     
## [35] "ICE STORM"                "ICY ROADS"               
## [37] "LANDSLIDE"                "LANDSLIDES"              
## [39] "LIGHT SNOW"               "LIGHTNING"               
## [41] "MARINE ACCIDENT"          "MARINE HIGH WIND"        
## [43] "MARINE MISHAP"            "MARINE STRONG WIND"      
## [45] "MARINE THUNDERSTORM WIND" "MARINE TSTM WIND"        
## [47] "MIXED PRECIP"             "RAIN"                    
## [49] "RIP CURRENT"              "RIP CURRENTS"            
## [51] "ROUGH SEAS"               "ROUGH SURF"              
## [53] "SNOW"                     "STORM SURGE"             
## [55] "STRONG WIND"              "STRONG WINDS"            
## [57] "THUNDERSNOW"              "THUNDERSTORM WIND"       
## [59] "THUNDERSTORM WINDS"       "TORNADO"                 
## [61] "TROPICAL STORM"           "TROPICAL STORM GORDON"   
## [63] "TSTM WIND"                "TSUNAMI"                 
## [65] "URBAN"                    "WATERSPOUT"              
## [67] "WILD"                     "WILD FIRES"              
## [69] "WILDFIRE"                 "WIND"                    
## [71] "WINTER STORM"             "WINTER STORM HIGH WINDS" 
## [73] "WINTER STORMS"            "WINTER WEATHER"

As you can see, many values are repeated or are variants of the same event (for example, HIGH WIND and HIGH WINDS), so I fixed that by building a list of bad (non-standard) terms and a matching list of good (standard) terms. I based this part on section 2.1.1, Table 1 (Storm Data Event Table) of the Storm Data documentation file (repdata-peer2_doc-pd01016005curr.pdf).

badOnes<-c("HIGH WINDS","ICE","LANDSLIDES","MARINE MISHAP","MARINE TSTM WIND","TSTM WIND",
           "RIP CURRENTS","ROUGH SEAS","STRONG WINDS","THUNDERSNOW","THUNDERSTORM WINDS",
           "TROPICAL STORM GORDON","WILD FIRES","WILD/FOREST FIRE","WINTER STORM HIGH WINDS",
           "WIND","HIGH WINDS","WINTER STORMS", "URBAN/SML STREAM FLD","WINTER WEATHER/MIX",
           "FLOODING","HEAT WAVE","HEAT WAVE DROUGHT","GLAZE","HEAVY SURF","ROUGH SURF",
           "SNOW","WILD","URBAN","THUDERSTORM WINDS","COASTAL  FLOODING","EXTREME WIND CHILL",
           "FLOODS", "HIGH  WINDS","ICE ROADS","LIGHTING","LIGNTNING","MARINE TS", "MUDSLIDE",
           "MUDSLIDES","RECO","THUNDERSTORM  WINDS","THUNDERSTORMW","TORNDAO","TSTMW",
           "TUNDERSTORM WIND","WATERSP"," FLASH FLOO", " TSTM WIN","   HIGH SURF ADVISORY")

goodOnes<-c("HIGH WIND","ICE STORM","LANDSLIDE","MARINE ACCIDENT","MARINE THUNDERSTORM WIND",
            "MARINE THUNDERSTORM WIND","RIP CURRENT","ROUGH SURF","STRONG WIND",
            "THUNDERSTORM WIND","THUNDERSTORM WIND","TROPICAL STORM","WILDFIRE",
            "WILDFIRE","WINTER STORM","HIGH WIND","HIGH WIND","WINTER STORM","HEAVY RAIN",
            "WINTER WEATHER","FLOOD","HEAT","HEAT","FROST","HIGH SURF","HIGH SURF",
            "HEAVY SNOW","WILDFIRE","FLOOD","THUNDERSTORM WIND","COASTAL FLOOD","EXTREME WINDCHILL",
            "FLOOD","HIGH WIND","ICY ROADS","LIGHTNING","LIGHTNING","MARINE THUNDERSTORM WIND",
            "MUD SLIDE","MUD SLIDE","RECORD SNOW","THUNDERSTORM WIND","THUNDERSTORM WIND","TORNADO",
            "THUNDERSTORM WIND","THUNDERSTORM WIND","WATERSPOUT","FLASH FLOOD","THUNDERSTORM WIND",
            "HIGH SURF ADVISORY")

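Then, using the two lists, I replaced each bad term with the corresponding good term. The pairs are applied in list order, so a value can be replaced more than once (ROUGH SEAS becomes ROUGH SURF and then HIGH SURF). Note that the sixth pair maps TSTM WIND to MARINE THUNDERSTORM WIND; since TSTM WIND is the abbreviation for (land) thunderstorm wind, THUNDERSTORM WIND is probably the intended target, and the MARINE THUNDERSTORM WIND totals below should be read with this in mind.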

for (i in seq_along(badOnes)){
        # rows whose event type equals the i-th bad term get the i-th good term
        healthDmg$EVTYPE[healthDmg$EVTYPE==badOnes[i]]<-goodOnes[i]
}
healthDmg$EVTYPE<-factor(healthDmg$EVTYPE)
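The loop above applies the pairs one at a time and in list order, so a replaced value can be replaced again by a later pair (ROUGH SEAS to ROUGH SURF, then ROUGH SURF to HIGH SURF). A minimal sketch of a vectorized alternative using match(), on a hypothetical excerpt of the term lists. A single match() pass does not chain, so the sketch repeats it until nothing changes. For the full lists I would expect the same result as the loop, because their only chain runs forward in list order, but this is an assumption worth checking:

```r
# Hypothetical excerpt of the bad/good term lists.
badTerms  <- c("HIGH WINDS", "ROUGH SEAS", "ROUGH SURF", "TORNDAO")
goodTerms <- c("HIGH WIND",  "ROUGH SURF", "HIGH SURF",  "TORNADO")

# One vectorized pass: every value found in bad is replaced by its partner.
recodeOnce <- function(x, bad, good) {
  idx <- match(x, bad)                      # NA where x is not a bad term
  x[!is.na(idx)] <- good[idx[!is.na(idx)]]
  x
}

# Repeat until stable so chained mappings resolve; maxIter guards against cycles.
recode <- function(x, bad, good, maxIter = 10) {
  for (k in seq_len(maxIter)) {
    y <- recodeOnce(x, bad, good)
    if (identical(y, x)) return(y)
    x <- y
  }
  stop("term mapping did not converge; check the lists for cycles")
}

evtype <- c("ROUGH SEAS", "HIGH WINDS", "TORNDAO", "HAIL")
print(recodeOnce(evtype, badTerms, goodTerms))  # ROUGH SEAS only reaches ROUGH SURF
print(recode(evtype, badTerms, goodTerms))      # ROUGH SEAS ends at HIGH SURF
```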

I then recomputed the sorted vector of unique event types. There are now only 49 unique values, down from 74:

uniqueEvType<-sort(unique(healthDmg$EVTYPE))
uniqueEvType
##  [1] AVALANCHE                BLACK ICE               
##  [3] BLIZZARD                 BLOWING SNOW            
##  [5] COASTAL STORM            COLD                    
##  [7] DENSE FOG                DUST STORM              
##  [9] EXCESSIVE HEAT           EXCESSIVE RAINFALL      
## [11] EXTREME COLD             EXTREME WINDCHILL       
## [13] FLASH FLOOD              FLOOD                   
## [15] FOG                      FREEZING DRIZZLE        
## [17] FREEZING RAIN            FROST                   
## [19] GUSTY WINDS              HAIL                    
## [21] HEAT                     HEAVY RAIN              
## [23] HEAVY SNOW               HIGH SEAS               
## [25] HIGH SURF                HIGH WIND               
## [27] HURRICANE                ICE STORM               
## [29] ICY ROADS                LANDSLIDE               
## [31] LIGHT SNOW               LIGHTNING               
## [33] MARINE ACCIDENT          MARINE HIGH WIND        
## [35] MARINE STRONG WIND       MARINE THUNDERSTORM WIND
## [37] MIXED PRECIP             RAIN                    
## [39] RIP CURRENT              STORM SURGE             
## [41] STRONG WIND              THUNDERSTORM WIND       
## [43] TORNADO                  TROPICAL STORM          
## [45] TSUNAMI                  WATERSPOUT              
## [47] WILDFIRE                 WINTER STORM            
## [49] WINTER WEATHER          
## 49 Levels: AVALANCHE BLACK ICE BLIZZARD BLOWING SNOW ... WINTER WEATHER

Results for question 1

Now it is time to process the data to answer question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

I created a new data frame with the total fatalities and injuries by event type:

harmPob<-aggregate(cbind(FATALITIES,INJURIES)~EVTYPE,data=healthDmg, FUN = sum)

I ordered harmPob by fatalities. These are the five events most harmful to population health:

head(harmPob[order(-harmPob$FATALITIES),],5)
##                      EVTYPE FATALITIES INJURIES
## 43                  TORNADO       5227    60187
## 9            EXCESSIVE HEAT        402     4791
## 32                LIGHTNING        283      649
## 36 MARINE THUNDERSTORM WIND        211      666
## 13              FLASH FLOOD        171      641
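A minimal sketch, on hypothetical rows, of the aggregate-and-rank step used for harmPob. Breaking ties in fatalities by injuries is an assumption added here; the original call orders by fatalities only.

```r
# Hypothetical rows standing in for healthDmg.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "LIGHTNING", "HEAT"),
  FATALITIES = c(3, 1, 2, 2, 1),
  INJURIES   = c(10, 2, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Sum both columns within each event type (same call shape as harmPob).
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Largest fatalities first; ties broken by injuries (assumption).
ranked <- totals[order(-totals$FATALITIES, -totals$INJURIES), ]
print(head(ranked, 3))
```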

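Based on section 2.1.1 of NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, I created by hand (copy, paste, and edit) a CSV file, Designator.csv, with the designator of each event from Table 1 (page 6). The designator indicates whether an event is reported by county or parish (C), by forecast zone (Z), or for marine zones (M).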

designators<-read.csv("Designator.csv")
designators$Event.Name<-toupper(designators$Event.Name)

I merged the two tables to assign a designator to each event, and then filled in the NA values by hand from the same table:

harmPobD<- merge(harmPob,designators,by.x="EVTYPE",by.y="Event.Name",all.x=TRUE)
harmPobD$Designator<-as.character(harmPobD$Designator)
# designators for the 21 unmatched event types, in the alphabetical order produced by merge()
harmPobD[is.na(harmPobD$Designator),4]<-c("Z","Z","Z","Z","C","Z","Z","Z","Z","Z","Z","Z","M","Z","Z","C","Z","M","C","C","Z")
harmPobD$Designator<-factor(harmPobD$Designator)
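The positional vector in the previous step depends on the exact order of the unmatched rows. A minimal sketch of filling the missing designators by event name instead; the data frames are hypothetical stand-ins for harmPob and Designator.csv, and the manual designator value is illustrative only:

```r
# Hypothetical stand-ins for harmPob and the hand-made Designator.csv table.
harm <- data.frame(EVTYPE = c("TORNADO", "HIGH SURF", "ICY ROADS"),
                   FATALITIES = c(5, 1, 2), INJURIES = c(50, 3, 4),
                   stringsAsFactors = FALSE)
designators <- data.frame(Event.Name = c("TORNADO", "HIGH SURF"),
                          Designator = c("C", "Z"),
                          stringsAsFactors = FALSE)

# Left join: every event type is kept; unmatched ones get NA.
merged <- merge(harm, designators, by.x = "EVTYPE", by.y = "Event.Name", all.x = TRUE)

# Designators for unmatched names, keyed by event name (illustrative values).
manual <- c("ICY ROADS" = "Z")

missing <- is.na(merged$Designator)
merged$Designator[missing] <- unname(manual[merged$EVTYPE[missing]])

# Fail loudly if an event type still has no designator.
stopifnot(!anyNA(merged$Designator))
print(merged)
```

Keying by name means a new or renamed event type raises an error instead of silently receiving a neighbour's designator.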

Then, for each designator, I computed the mean fatalities and injuries per event type:

bydes<-aggregate(cbind(FATALITIES,INJURIES)~Designator,harmPobD,mean)
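Because bydes uses mean, each bar is an average over the event types that share a designator, not the total harm for that designator. A minimal sketch on hypothetical per-event-type totals showing how the two summaries differ:

```r
# Hypothetical per-event-type totals with a designator, like harmPobD.
byType <- data.frame(
  Designator = c("C", "C", "Z", "M"),
  FATALITIES = c(10, 2, 6, 1),
  INJURIES   = c(100, 20, 30, 4),
  stringsAsFactors = FALSE
)

# Mean over event types within each designator (the call used for bydes).
meanByDes <- aggregate(cbind(FATALITIES, INJURIES) ~ Designator, byType, mean)

# Total harm within each designator, for comparison.
sumByDes <- aggregate(cbind(FATALITIES, INJURIES) ~ Designator, byType, sum)

print(meanByDes)
print(sumByDes)
```

Here C and Z have the same mean fatalities (6) although C has twice the total (12 versus 6), so the choice of summary affects how the bar plot reads.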

Then I made the following bar plot:

library(ggplot2)
library(reshape2)
# long format: one row per (Designator, variable) pair, where variable is FATALITIES or INJURIES
bydes.long<-melt(bydes,id.vars="Designator")
p<-ggplot(bydes.long,aes(variable,value,fill=Designator))+
        geom_bar(position="dodge",stat="identity")+
        labs(x="Harms",title="Harm by Designation")
p

As the plot shows, the mean number of injuries is highest for county-designated (C) events, and fatalities are much lower than injuries. One possible explanation is that people in a county are better adapted to events that occur frequently there and know how to protect themselves from death.

Cleaning process for Question 2:

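The event type data for this question were much dirtier, so the cleaning process was more complicated. Only property damage is considered here, because relevData does not include the crop damage fields.

Preprocessing the data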

matDmg<-relevData[relevData$PROPDMG!=0,]
matDmg$EVTYPE<-toupper(matDmg$EVTYPE)

First, I created a sorted vector of the 375 unique values:

length(unique(matDmg$EVTYPE))
## [1] 375
uniqueEvType<-sort(unique(matDmg$EVTYPE))
uniqueEvType
##   [1] "   HIGH SURF ADVISORY"          " FLASH FLOOD"                  
##   [3] " TSTM WIND"                     " TSTM WIND (G45)"              
##   [5] "?"                              "APACHE COUNTY"                 
##   [7] "ASTRONOMICAL HIGH TIDE"         "ASTRONOMICAL LOW TIDE"         
##   [9] "AVALANCHE"                      "BEACH EROSION"                 
##  [11] "BLIZZARD"                       "BLIZZARD/WINTER STORM"         
##  [13] "BLOWING DUST"                   "BLOWING SNOW"                  
##  [15] "BREAKUP FLOODING"               "BRUSH FIRE"                    
##  [17] "COASTAL  FLOODING/EROSION"      "COASTAL EROSION"               
##  [19] "COASTAL FLOOD"                  "COASTAL FLOODING"              
##  [21] "COASTAL FLOODING/EROSION"       "COASTAL STORM"                 
##  [23] "COASTAL SURGE"                  "COLD"                          
##  [25] "COLD AIR TORNADO"               "COLD/WIND CHILL"               
##  [27] "DAM BREAK"                      "DAMAGING FREEZE"               
##  [29] "DENSE FOG"                      "DENSE SMOKE"                   
##  [31] "DOWNBURST"                      "DROUGHT"                       
##  [33] "DRY MICROBURST"                 "DUST DEVIL"                    
##  [35] "DUST DEVIL WATERSPOUT"          "DUST STORM"                    
##  [37] "DUST STORM/HIGH WINDS"          "EROSION/CSTL FLOOD"            
##  [39] "EXCESSIVE HEAT"                 "EXCESSIVE SNOW"                
##  [41] "EXTENDED COLD"                  "EXTREME COLD"                  
##  [43] "EXTREME COLD/WIND CHILL"        "EXTREME HEAT"                  
##  [45] "EXTREME WIND CHILL"             "EXTREME WINDCHILL"             
##  [47] "FLASH FLOOD"                    "FLASH FLOOD - HEAVY RAIN"      
##  [49] "FLASH FLOOD FROM ICE JAMS"      "FLASH FLOOD LANDSLIDES"        
##  [51] "FLASH FLOOD WINDS"              "FLASH FLOOD/"                  
##  [53] "FLASH FLOOD/ STREET"            "FLASH FLOOD/FLOOD"             
##  [55] "FLASH FLOOD/LANDSLIDE"          "FLASH FLOODING"                
##  [57] "FLASH FLOODING/FLOOD"           "FLASH FLOODING/THUNDERSTORM WI"
##  [59] "FLASH FLOODS"                   "FLOOD"                         
##  [61] "FLOOD & HEAVY RAIN"             "FLOOD FLASH"                   
##  [63] "FLOOD/FLASH"                    "FLOOD/FLASH FLOOD"             
##  [65] "FLOOD/FLASH/FLOOD"              "FLOOD/FLASHFLOOD"              
##  [67] "FLOOD/RIVER FLOOD"              "FLOODING"                      
##  [69] "FLOODING/HEAVY RAIN"            "FLOODS"                        
##  [71] "FOG"                            "FOREST FIRES"                  
##  [73] "FREEZE"                         "FREEZING DRIZZLE"              
##  [75] "FREEZING FOG"                   "FREEZING RAIN"                 
##  [77] "FREEZING RAIN/SLEET"            "FREEZING RAIN/SNOW"            
##  [79] "FROST"                          "FROST/FREEZE"                  
##  [81] "FROST\\FREEZE"                  "FUNNEL CLOUD"                  
##  [83] "GLAZE"                          "GLAZE ICE"                     
##  [85] "GRADIENT WIND"                  "GRASS FIRES"                   
##  [87] "GROUND BLIZZARD"                "GUSTNADO"                      
##  [89] "GUSTY WIND"                     "GUSTY WIND/HAIL"               
##  [91] "GUSTY WIND/HVY RAIN"            "GUSTY WIND/RAIN"               
##  [93] "GUSTY WINDS"                    "HAIL"                          
##  [95] "HAIL 0.75"                      "HAIL 100"                      
##  [97] "HAIL 175"                       "HAIL 275"                      
##  [99] "HAIL 450"                       "HAIL 75"                       
## [101] "HAIL DAMAGE"                    "HAIL/WIND"                     
## [103] "HAIL/WINDS"                     "HAILSTORM"                     
## [105] "HEAT"                           "HEAT WAVE"                     
## [107] "HEAT WAVE DROUGHT"              "HEAVY LAKE SNOW"               
## [109] "HEAVY MIX"                      "HEAVY PRECIPITATION"           
## [111] "HEAVY RAIN"                     "HEAVY RAIN AND FLOOD"          
## [113] "HEAVY RAIN/HIGH SURF"           "HEAVY RAIN/LIGHTNING"          
## [115] "HEAVY RAIN/SEVERE WEATHER"      "HEAVY RAIN/SMALL STREAM URBAN" 
## [117] "HEAVY RAIN/SNOW"                "HEAVY RAINS"                   
## [119] "HEAVY RAINS/FLOODING"           "HEAVY SHOWER"                  
## [121] "HEAVY SNOW"                     "HEAVY SNOW-SQUALLS"            
## [123] "HEAVY SNOW AND STRONG WINDS"    "HEAVY SNOW SHOWER"             
## [125] "HEAVY SNOW SQUALLS"             "HEAVY SNOW/BLIZZARD"           
## [127] "HEAVY SNOW/BLIZZARD/AVALANCHE"  "HEAVY SNOW/FREEZING RAIN"      
## [129] "HEAVY SNOW/HIGH WINDS & FLOOD"  "HEAVY SNOW/ICE"                
## [131] "HEAVY SNOW/SQUALLS"             "HEAVY SNOW/WIND"               
## [133] "HEAVY SNOW/WINTER STORM"        "HEAVY SNOWPACK"                
## [135] "HEAVY SURF"                     "HEAVY SURF COASTAL FLOODING"   
## [137] "HEAVY SURF/HIGH SURF"           "HEAVY SWELLS"                  
## [139] "HIGH  WINDS"                    "HIGH SEAS"                     
## [141] "HIGH SURF"                      "HIGH SWELLS"                   
## [143] "HIGH TIDES"                     "HIGH WATER"                    
## [145] "HIGH WIND"                      "HIGH WIND (G40)"               
## [147] "HIGH WIND 48"                   "HIGH WIND AND SEAS"            
## [149] "HIGH WIND DAMAGE"               "HIGH WIND/BLIZZARD"            
## [151] "HIGH WIND/HEAVY SNOW"           "HIGH WIND/SEAS"                
## [153] "HIGH WINDS"                     "HIGH WINDS HEAVY RAINS"        
## [155] "HIGH WINDS/"                    "HIGH WINDS/COASTAL FLOOD"      
## [157] "HIGH WINDS/COLD"                "HIGH WINDS/HEAVY RAIN"         
## [159] "HIGH WINDS/SNOW"                "HURRICANE"                     
## [161] "HURRICANE-GENERATED SWELLS"     "HURRICANE EMILY"               
## [163] "HURRICANE ERIN"                 "HURRICANE FELIX"               
## [165] "HURRICANE GORDON"               "HURRICANE OPAL"                
## [167] "HURRICANE OPAL/HIGH WINDS"      "HURRICANE/TYPHOON"             
## [169] "ICE"                            "ICE AND SNOW"                  
## [171] "ICE FLOES"                      "ICE JAM"                       
## [173] "ICE JAM FLOOD (MINOR"           "ICE JAM FLOODING"              
## [175] "ICE ROADS"                      "ICE STORM"                     
## [177] "ICE/STRONG WINDS"               "ICY ROADS"                     
## [179] "LAKE-EFFECT SNOW"               "LAKE EFFECT SNOW"              
## [181] "LAKE FLOOD"                     "LAKESHORE FLOOD"               
## [183] "LANDSLIDE"                      "LANDSLIDES"                    
## [185] "LANDSLUMP"                      "LANDSPOUT"                     
## [187] "LATE SEASON SNOW"               "LIGHT FREEZING RAIN"           
## [189] "LIGHT SNOW"                     "LIGHT SNOWFALL"                
## [191] "LIGHTING"                       "LIGHTNING"                     
## [193] "LIGHTNING  WAUSEON"             "LIGHTNING AND HEAVY RAIN"      
## [195] "LIGHTNING FIRE"                 "LIGHTNING THUNDERSTORM WINDS"  
## [197] "LIGHTNING/HEAVY RAIN"           "LIGNTNING"                     
## [199] "MAJOR FLOOD"                    "MARINE ACCIDENT"               
## [201] "MARINE HAIL"                    "MARINE HIGH WIND"              
## [203] "MARINE STRONG WIND"             "MARINE THUNDERSTORM WIND"      
## [205] "MARINE TSTM WIND"               "MICROBURST"                    
## [207] "MICROBURST WINDS"               "MINOR FLOODING"                
## [209] "MIXED PRECIPITATION"            "MUD SLIDE"                     
## [211] "MUD SLIDES"                     "MUD SLIDES URBAN FLOODING"     
## [213] "MUDSLIDE"                       "MUDSLIDES"                     
## [215] "NON-SEVERE WIND DAMAGE"         "NON-TSTM WIND"                 
## [217] "OTHER"                          "RAIN"                          
## [219] "RAINSTORM"                      "RECORD COLD"                   
## [221] "RECORD RAINFALL"                "RECORD SNOW"                   
## [223] "RIP CURRENT"                    "RIP CURRENTS"                  
## [225] "RIVER AND STREAM FLOOD"         "RIVER FLOOD"                   
## [227] "RIVER FLOODING"                 "ROCK SLIDE"                    
## [229] "ROUGH SURF"                     "RURAL FLOOD"                   
## [231] "SEICHE"                         "SEVERE THUNDERSTORM"           
## [233] "SEVERE THUNDERSTORM WINDS"      "SEVERE THUNDERSTORMS"          
## [235] "SEVERE TURBULENCE"              "SLEET/ICE STORM"               
## [237] "SMALL HAIL"                     "SNOW"                          
## [239] "SNOW ACCUMULATION"              "SNOW AND HEAVY SNOW"           
## [241] "SNOW AND ICE"                   "SNOW AND ICE STORM"            
## [243] "SNOW FREEZING RAIN"             "SNOW SQUALL"                   
## [245] "SNOW SQUALLS"                   "SNOW/ BITTER COLD"             
## [247] "SNOW/ ICE"                      "SNOW/BLOWING SNOW"             
## [249] "SNOW/COLD"                      "SNOW/FREEZING RAIN"            
## [251] "SNOW/HEAVY SNOW"                "SNOW/HIGH WINDS"               
## [253] "SNOW/ICE"                       "SNOW/ICE STORM"                
## [255] "SNOW/SLEET"                     "SNOW/SLEET/FREEZING RAIN"      
## [257] "SNOWMELT FLOODING"              "STORM FORCE WINDS"             
## [259] "STORM SURGE"                    "STORM SURGE/TIDE"              
## [261] "STRONG WIND"                    "STRONG WINDS"                  
## [263] "THUDERSTORM WINDS"              "THUNDEERSTORM WINDS"           
## [265] "THUNDERESTORM WINDS"            "THUNDERSNOW"                   
## [267] "THUNDERSTORM"                   "THUNDERSTORM  WINDS"           
## [269] "THUNDERSTORM DAMAGE TO"         "THUNDERSTORM HAIL"             
## [271] "THUNDERSTORM WIND"              "THUNDERSTORM WIND 60 MPH"      
## [273] "THUNDERSTORM WIND 65 MPH"       "THUNDERSTORM WIND 65MPH"       
## [275] "THUNDERSTORM WIND 98 MPH"       "THUNDERSTORM WIND G50"         
## [277] "THUNDERSTORM WIND G55"          "THUNDERSTORM WIND TREES"       
## [279] "THUNDERSTORM WIND/ TREE"        "THUNDERSTORM WIND/ TREES"      
## [281] "THUNDERSTORM WIND/AWNING"       "THUNDERSTORM WIND/HAIL"        
## [283] "THUNDERSTORM WIND/LIGHTNING"    "THUNDERSTORM WINDS"            
## [285] "THUNDERSTORM WINDS 13"          "THUNDERSTORM WINDS 63 MPH"     
## [287] "THUNDERSTORM WINDS AND"         "THUNDERSTORM WINDS HAIL"       
## [289] "THUNDERSTORM WINDS LIGHTNING"   "THUNDERSTORM WINDS."           
## [291] "THUNDERSTORM WINDS/ FLOOD"      "THUNDERSTORM WINDS/FLOODING"   
## [293] "THUNDERSTORM WINDS/FUNNEL CLOU" "THUNDERSTORM WINDS/HAIL"       
## [295] "THUNDERSTORM WINDS53"           "THUNDERSTORM WINDSHAIL"        
## [297] "THUNDERSTORM WINDSS"            "THUNDERSTORM WINS"             
## [299] "THUNDERSTORMS"                  "THUNDERSTORMS WIND"            
## [301] "THUNDERSTORMS WINDS"            "THUNDERSTORMW"                 
## [303] "THUNDERSTORMWINDS"              "THUNDERSTROM WIND"             
## [305] "THUNDERTORM WINDS"              "THUNERSTORM WINDS"             
## [307] "TIDAL FLOODING"                 "TORNADO"                       
## [309] "TORNADO F0"                     "TORNADO F1"                    
## [311] "TORNADO F2"                     "TORNADO F3"                    
## [313] "TORNADOES, TSTM WIND, HAIL"     "TORNDAO"                       
## [315] "TROPICAL DEPRESSION"            "TROPICAL STORM"                
## [317] "TROPICAL STORM ALBERTO"         "TROPICAL STORM DEAN"           
## [319] "TROPICAL STORM GORDON"          "TROPICAL STORM JERRY"          
## [321] "TSTM WIND"                      "TSTM WIND  (G45)"              
## [323] "TSTM WIND (41)"                 "TSTM WIND (G35)"               
## [325] "TSTM WIND (G40)"                "TSTM WIND (G45)"               
## [327] "TSTM WIND 40"                   "TSTM WIND 45"                  
## [329] "TSTM WIND 55"                   "TSTM WIND 65)"                 
## [331] "TSTM WIND AND LIGHTNING"        "TSTM WIND DAMAGE"              
## [333] "TSTM WIND G45"                  "TSTM WIND G58"                 
## [335] "TSTM WIND/HAIL"                 "TSTM WINDS"                    
## [337] "TSTMW"                          "TSUNAMI"                       
## [339] "TUNDERSTORM WIND"               "TYPHOON"                       
## [341] "URBAN AND SMALL"                "URBAN FLOOD"                   
## [343] "URBAN FLOODING"                 "URBAN FLOODS"                  
## [345] "URBAN SMALL"                    "URBAN/SMALL STREAM"            
## [347] "URBAN/SMALL STREAM FLOOD"       "URBAN/SML STREAM FLD"          
## [349] "VOLCANIC ASH"                   "WATERSPOUT"                    
## [351] "WATERSPOUT-"                    "WATERSPOUT-TORNADO"            
## [353] "WATERSPOUT TORNADO"             "WATERSPOUT/ TORNADO"           
## [355] "WATERSPOUT/TORNADO"             "WET MICROBURST"                
## [357] "WHIRLWIND"                      "WILD FIRES"                    
## [359] "WILD/FOREST FIRE"               "WILD/FOREST FIRES"             
## [361] "WILDFIRE"                       "WILDFIRES"                     
## [363] "WIND"                           "WIND AND WAVE"                 
## [365] "WIND DAMAGE"                    "WIND STORM"                    
## [367] "WIND/HAIL"                      "WINDS"                         
## [369] "WINTER STORM"                   "WINTER STORM HIGH WINDS"       
## [371] "WINTER STORMS"                  "WINTER WEATHER"                
## [373] "WINTER WEATHER MIX"             "WINTER WEATHER/MIX"            
## [375] "WINTRY MIX"

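The clearSbst function below is meant to work as follows: if content is TRUE, a value that contains cadena is replaced by cadena; if content is FALSE, only the substring before the first occurrence of cadena is kept. As written, however, it behaves differently, because regexpr() always searches for "/" instead of cadena. Once the first call has removed every "/", pos is -1 in all later calls, so with content = FALSE a matching value becomes the empty string (substring(x, 1, -2)), and with content = TRUE it is cut to its first nchar(cadena) characters. This explains the empty string and the truncated labels (such as "TSTM" and "THUN") in the cleaned list below.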

clearSbst<-function(dataset,cadena,fix=TRUE,content=FALSE){
        for (i in grep(cadena,dataset)){
                # NOTE: this looks for "/" rather than cadena, so pos is -1
                # whenever the value contains no "/"
                pos<-regexpr("/",dataset[[i]],fixed=fix)
                if (!content)
                        dataset[[i]]<-substring(dataset[[i]],1,pos-1)
                else
                        dataset[[i]]<-substring(dataset[[i]],pos,pos+(nchar(cadena)+1))
        }
        dataset
}
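A minimal sketch of a function that does what the description of clearSbst says; clearSbstFixed is a hypothetical name, and it is not the code that produced the results in this report. Unlike the original, it also passes fixed = fix to grep(), so a backslash separator would be written as "\\" in R source rather than the regex "\\\\", and it trims whitespace left before the separator (an added assumption).

```r
# Hypothetical corrected version of clearSbst, following its stated intent.
# content = TRUE : a value containing cadena is replaced by cadena itself.
# content = FALSE: a value containing cadena is cut just before cadena.
clearSbstFixed <- function(dataset, cadena, fix = TRUE, content = FALSE) {
  hits <- grep(cadena, dataset, fixed = fix)
  if (content) {
    dataset[hits] <- cadena
  } else {
    pos <- regexpr(cadena, dataset[hits], fixed = fix)
    dataset[hits] <- trimws(substring(dataset[hits], 1, pos - 1))
  }
  dataset
}

ev <- c("FLASH FLOOD/FLOOD", "HEAVY SNOW-SQUALLS",
        "THUNDERSTORM WINDS 63 MPH", "TSTM WIND (G45)")
ev <- clearSbstFixed(ev, "/")
ev <- clearSbstFixed(ev, "-")
ev <- clearSbstFixed(ev, "THUNDERSTORM WIND", content = TRUE)
ev <- clearSbstFixed(ev, "TSTM WIND", content = TRUE)
print(ev)
```

Because content = TRUE replaces the whole value, the order of the calls matters: a broad term such as WIND applied early would absorb values that a more specific term (HIGH WIND, STRONG WIND) should catch, so specific terms should go first.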

Using the clearSbst function, I cleaned the values of the event type field:

matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE,"/")
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE,"-")
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE,"&")
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE," AND ")
matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE,"\\\\")
# terms applied with content=TRUE, in the original order
# ("THUNDEERSTORM WIND" deliberately matches a misspelling in the data)
contentTerms<-c("WINTER STORM","WINTER WEATHER","TSTM WIND","TROPICAL STORM","TORNADO",
                "FLASH FLOOD","COASTAL FLOOD","COLD","DUST DEVIL","GLAZE","GUSTY WIND",
                "HAIL","HEAVY SNOW","HEAVY SURF","HIGH WIND","HURRICANE","ICE JAM",
                "LANDSLIDE","LIGHT SNOW","LIGHTNING","MICROBURST","MUD SLIDE",
                "RIP CURRENT","RIVER FLOOD","SEVERE THUNDERSTORM","SNOW","STRONG WIND",
                "THUNDEERSTORM WIND","URBAN FLOOD","WIND")
for (term in contentTerms){
        matDmg$EVTYPE<-clearSbst(matDmg$EVTYPE,term,fix=TRUE,content=TRUE)
}

Then I used the same bad and good term lists to clean the data further:

for (i in seq_along(badOnes)){
        matDmg$EVTYPE[matDmg$EVTYPE==badOnes[i]]<-goodOnes[i]
}

I then recomputed the sorted vector of unique event types. There are now 135 unique values, down from 375, although the list still contains artifacts of clearSbst such as the empty string and truncated labels like "TSTM", "THUN", and "HEAV":

uniqueEvType<-sort(unique(matDmg$EVTYPE))
uniqueEvType
##   [1] ""                         "?"                       
##   [3] "APACHE COUNTY"            "ASTRONOMICAL HIGH TIDE"  
##   [5] "ASTRONOMICAL LOW TIDE"    "AVALANCHE"               
##   [7] "BEACH EROSION"            "BLIZZARD"                
##   [9] "BLOW"                     "BLOWING DUST"            
##  [11] "BREAKUP FLOODING"         "BRUSH FIRE"              
##  [13] "COASTAL EROSION"          "COASTAL FLOOD"           
##  [15] "COASTAL STORM"            "COASTAL SURGE"           
##  [17] "COLD"                     "DAM BREAK"               
##  [19] "DAMAGING FREEZE"          "DENSE FOG"               
##  [21] "DENSE SMOKE"              "DOWNBURST"               
##  [23] "DROUGHT"                  "DRY MICROB"              
##  [25] "DUST DEVIL"               "DUST STORM"              
##  [27] "EROSION"                  "EXCE"                    
##  [29] "EXCESSIVE HEAT"           "EXTE"                    
##  [31] "EXTR"                     "EXTREME HEAT"            
##  [33] "FLASH FLOOD"              "FLOOD"                   
##  [35] "FLOOD FLASH"              "FOG"                     
##  [37] "FOREST FIRES"             "FREEZE"                  
##  [39] "FREEZING DRIZZLE"         "FREEZING FOG"            
##  [41] "FREEZING RAIN"            "FROST"                   
##  [43] "FUNNEL CLOUD"             "GRAD"                    
##  [45] "GRASS FIRES"              "GROUND BLIZZARD"         
##  [47] "GUST"                     "GUSTNADO"                
##  [49] "HAIL"                     "HEAT"                    
##  [51] "HEAV"                     "HEAVY MIX"               
##  [53] "HEAVY PRECIPITATION"      "HEAVY RAIN"              
##  [55] "HEAVY RAINS"              "HEAVY SHOWER"            
##  [57] "HEAVY SNOW"               "HEAVY SWELLS"            
##  [59] "HIGH"                     "HIGH SEAS"               
##  [61] "HIGH SURF"                "HIGH SURF ADVISORY"      
##  [63] "HIGH SWELLS"              "HIGH TIDES"              
##  [65] "HIGH WATER"               "HIGH WIND"               
##  [67] "HURRICANE"                "ICE FLOES"               
##  [69] "ICE JAM"                  "ICE STORM"               
##  [71] "ICY ROADS"                "LAKE"                    
##  [73] "LAKE FLOOD"               "LAKESHORE FLOOD"         
##  [75] "LANDSLIDE"                "LANDSLUMP"               
##  [77] "LANDSPOUT"                "LATE"                    
##  [79] "LIGH"                     "LIGHT FREEZING RAIN"     
##  [81] "LIGHTNING"                "MAJOR FLOOD"             
##  [83] "MARI"                     "MARINE ACCIDENT"         
##  [85] "MARINE HI"                "MARINE STRO"             
##  [87] "MARINE THUNDERSTORM WIND" "MICROBURST"              
##  [89] "MINOR FLOODING"           "MIXED PRECIPITATION"     
##  [91] "MUD SLIDE"                "OTHER"                   
##  [93] "RAIN"                     "RAINSTORM"               
##  [95] "RECORD RAINFALL"          "RECORD SNOW"             
##  [97] "RIP CURRENT"              "RIVER FLOOD"             
##  [99] "ROCK SLIDE"               "RURAL FLOOD"             
## [101] "SEICHE"                   "SEVERE THUNDERSTORM"     
## [103] "SEVERE TURBULENCE"        "SLEET"                   
## [105] "SMAL"                     "STOR"                    
## [107] "STORM SURGE"              "STRO"                    
## [109] "THUD"                     "THUN"                    
## [111] "THUNDERST"                "THUNDERSTORM"            
## [113] "THUNDERSTORM DAMAGE TO"   "THUNDERSTORM WIND"       
## [115] "THUNDERSTORM WINS"        "THUNDERSTORMS"           
## [117] "TIDAL FLOODING"           "TORNADO"                 
## [119] "TROPICAL DEPRESSION"      "TROPICAL STORM"          
## [121] "TSTM"                     "TSUNAMI"                 
## [123] "TUND"                     "TYPHOON"                 
## [125] "URBAN FLOOD"              "URBAN SMALL"             
## [127] "VOLCANIC ASH"             "WATERSPOUT"              
## [129] "WET MICROB"               "WHIR"                    
## [131] "WILDFIRE"                 "WILDFIRES"               
## [133] "WINTER STORM"             "WINTER WEATHER"          
## [135] "WINTRY MIX"

Results for question 2

Finally, I calculated the total property damage by event type to answer the second question: Across the United States, which types of events have the greatest economic consequences?

matDmg<-aggregate(cbind(PROPDMG,PROPDMGEXP)~EVTYPE,data=matDmg, FUN = sum)
head(matDmg[order(-matDmg$PROPDMG),],5)
##          EVTYPE   PROPDMG PROPDMGEXP
## 118     TORNADO 3214534.0     672670
## 33  FLASH FLOOD 1455187.6     358013
## 121        TSTM 1344802.7    1036905
## 110        THUN 1327667.5     938622
## 34        FLOOD  952186.6     185413

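As you can see, tornadoes have the largest total PROPDMG. Two caveats apply. First, PROPDMG is summed without applying the PROPDMGEXP multipliers (K, M, B), so the totals mix thousands, millions, and billions of dollars, and the summed PROPDMGEXP column has no meaning; applying the multipliers could change the ranking. Second, "TSTM" and "THUN" are both truncated thunderstorm wind labels, so thunderstorm wind damage is split across two rows.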
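A minimal sketch of converting PROPDMG to dollars before aggregating, on hypothetical rows. It assumes K, M, and B mean thousands, millions, and billions (compared case-insensitively) and, as a simplification, treats a blank or any other code as a multiplier of 1; PROPDMGUSD is a hypothetical column name.

```r
# Hypothetical rows standing in for matDmg.
dmg <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(250, 2.5, 1.2, 50),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping; any code not listed here becomes NA and is then set to 1.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- multiplier[toupper(dmg$PROPDMGEXP)]
mult[is.na(mult)] <- 1
dmg$PROPDMGUSD <- dmg$PROPDMG * unname(mult)

# Total damage in dollars per event type, largest first.
totals <- aggregate(PROPDMGUSD ~ EVTYPE, data = dmg, FUN = sum)
ranked <- totals[order(-totals$PROPDMGUSD), ]
print(ranked)
```

In this toy data TORNADO has the largest raw PROPDMG sum (252.5) but FLOOD has the largest dollar total, which is the kind of reversal the multipliers can cause.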