Synopsis

After loading the original data, we processed and cleaned it as described in the Data Processing section of this document. We then built two barplots showing the five event types that caused the most health damage and the five that caused the most economic damage. We hope these conclusions can inform better prevention measures to minimize the impact of such weather events on human life and the economy.

Data Processing

We start by loading the dplyr package, which is very helpful for processing this data.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Next, we check whether the data file is already in the working directory, download it from the original website if it is not, and then load it.

if (!"StormData.csv.bz2" %in% dir(".")) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(url, "StormData.csv.bz2")
}


if (!"data" %in% ls()){
  data <- read.csv("StormData.csv.bz2")
}
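The str() output further down shows the text columns as factors, which is what read.csv returned by default before R 4.0.0. Newer versions return character columns unless stringsAsFactors = TRUE is passed. The sketch below illustrates the difference on a tiny inline CSV; the csv_text string and its values are made up for illustration and are not part of the storm data.

```r
# Hypothetical three-row CSV, only to show how the column type changes.
csv_text <- "EVTYPE,FATALITIES\nTORNADO,0\nHAIL,1\nTORNADO,2"

# Explicitly request factors (the pre-4.0.0 default behaviour).
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Explicitly request plain character columns (the default since R 4.0.0).
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$EVTYPE)  # "factor"
class(as_char$EVTYPE)    # "character"
```

Passing the argument explicitly when reading StormData.csv.bz2 would make the column types independent of the R version in use.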

Let’s start by checking the structure of this data:

str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

As we can see, it is quite a complex dataset. Let us simplify it a little before going into deeper cleaning and processing. Having read the National Weather Service Storm Data Documentation (available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf), and since our objective is to analyse the impact of weather event types on human health and the economy, we can see that not all variables are of interest to us.

For our objective, the following variables suffice: EVTYPE (the event type), FATALITIES and INJURIES (the main source of information about health damage), PROPDMG and CROPDMG (the main source of information about economic damage, that is, property damage and crop damage), and PROPDMGEXP and CROPDMGEXP (which tell us whether a damage figure is in thousands, millions, billions, etc.). It therefore makes sense to keep only these variables. We will also need the dates, for reasons that will be explained soon.

data <- select(data, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG, PROPDMGEXP, CROPDMGEXP)
head(data)
##             BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## 1  4/18/1950 0:00:00 TORNADO          0       15    25.0       0
## 2  4/18/1950 0:00:00 TORNADO          0        0     2.5       0
## 3  2/20/1951 0:00:00 TORNADO          0        2    25.0       0
## 4   6/8/1951 0:00:00 TORNADO          0        2     2.5       0
## 5 11/15/1951 0:00:00 TORNADO          0        2     2.5       0
## 6 11/15/1951 0:00:00 TORNADO          0        6     2.5       0
##   PROPDMGEXP CROPDMGEXP
## 1          K           
## 2          K           
## 3          K           
## 4          K           
## 5          K           
## 6          K

From the information that was given, we know that the events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records, so more recent years should be considered more complete.

With this in mind, it is important to identify the years with the most records, since those should give us better-quality data. Restricting the analysis to those years is also a good way to lessen the complexity of the dataset.

To do that, we first need to extract the year from each date (which requires some date processing) and then build a histogram of the number of events per year.

data$BGN_DATE <- as.Date(data$BGN_DATE, "%m/%d/%Y %H:%M:%S")
data$BGN_DATE <- as.numeric(format(data$BGN_DATE, "%Y"))
head(data)
##   BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG CROPDMG PROPDMGEXP
## 1     1950 TORNADO          0       15    25.0       0          K
## 2     1950 TORNADO          0        0     2.5       0          K
## 3     1951 TORNADO          0        2    25.0       0          K
## 4     1951 TORNADO          0        2     2.5       0          K
## 5     1951 TORNADO          0        2     2.5       0          K
## 6     1951 TORNADO          0        6     2.5       0          K
##   CROPDMGEXP
## 1           
## 2           
## 3           
## 4           
## 5           
## 6
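The year extraction above can be checked in isolation on a couple of literal date strings. This is a minimal sketch; the raw_dates vector simply reuses two values visible in the earlier head() output.

```r
# Two date strings in the same format as BGN_DATE in the raw file.
raw_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# Parse month/day/year (the time part is read and then discarded by as.Date).
parsed <- as.Date(raw_dates, "%m/%d/%Y %H:%M:%S")

# Keep only the four-digit year, as a number.
years <- as.numeric(format(parsed, "%Y"))
years  # 1950 1951
```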

Everything went well. Let us build the histogram next.

hist(data$BGN_DATE, main="Number of Events per Year", xlab="Year", ylab="Frequency of Events", col="red")

Judging from the histogram, a good starting point is to keep only the events from 1990 through 2011.

data <- filter(data, BGN_DATE >= 1990)
head(data)
##   BGN_DATE    EVTYPE FATALITIES INJURIES PROPDMG CROPDMG PROPDMGEXP
## 1     1990      HAIL          0        0     0.0       0           
## 2     1990 TSTM WIND          0        0     0.0       0           
## 3     1990 TSTM WIND          0        0     0.0       0           
## 4     1990 TSTM WIND          0        0     0.0       0           
## 5     1990   TORNADO          0       28     2.5       0          M
## 6     1990 TSTM WIND          0        0     0.0       0           
##   CROPDMGEXP
## 1           
## 2           
## 3           
## 4           
## 5           
## 6

To lessen the complexity of this dataset even more, let us keep only the rows where FATALITIES, INJURIES, PROPDMG or CROPDMG is greater than 0, as those are the ones we are interested in.

data <- filter(data, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
head(data)
##   BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG CROPDMG PROPDMGEXP
## 1     1990 TORNADO          0       28     2.5       0          M
## 2     1990 TORNADO          0        0    25.0       0          K
## 3     1990 TORNADO          0        0    25.0       0          K
## 4     1990 TORNADO          0        3     2.5       0          M
## 5     1990 TORNADO          0        2     2.5       0          M
## 6     1990 TORNADO          0       15     2.5       0          M
##   CROPDMGEXP
## 1           
## 2           
## 3           
## 4           
## 5           
## 6

As we are interested in health damage and economic damage, we add fatalities to injuries (new variable: HEALTHDMG) and property damage to crop damage (new variable: ECONDMG).

data <- mutate(data, HEALTHDMG = FATALITIES + INJURIES)

This takes care of the variable HEALTHDMG. ECONDMG needs more processing, because the sum of PROPDMG and CROPDMG has to take into account the values of PROPDMGEXP and CROPDMGEXP. According to the documentation (linked earlier in this report), these EXP codes mean the following: “K” is thousands, “M” is millions and “B” is billions. Using the ifelse function, we can build a factor that equals the corresponding multiplier for those three codes and 1 for any other value (so the damage figure is left unchanged when the EXP is not specified or not recognised), in the following way:

PROPFACTOR <- ifelse(data$PROPDMGEXP == "K", 1000, ifelse(data$PROPDMGEXP == "M", 1000000, ifelse(data$PROPDMGEXP == "B", 1000000000, 1)))

CROPFACTOR <- ifelse(data$CROPDMGEXP == "K", 1000, ifelse(data$CROPDMGEXP == "M", 1000000, ifelse(data$CROPDMGEXP == "B", 1000000000, 1)))

PROPDMG <- data$PROPDMG * PROPFACTOR

CROPDMG <- data$CROPDMG * CROPFACTOR

data$ECONDMG <- PROPDMG + CROPDMG
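The nested ifelse calls above can also be written as a lookup table. The sketch below is one possible variant, not the method used in this report: exp_multiplier is a hypothetical helper, and it assumes that lowercase codes should be treated like their uppercase counterparts. The str() output earlier shows 19 levels for PROPDMGEXP and 9 for CROPDMGEXP, so codes other than "K", "M" and "B" do occur; here they all fall back to a multiplier of 1, as in the original code.

```r
# Hypothetical helper: map an EXP code to its numeric multiplier.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # as.character() handles factor input; toupper() is the added assumption.
  value <- lookup[toupper(as.character(code))]
  # Any code not in the lookup table (blank, "?", digits, ...) becomes 1.
  value[is.na(value)] <- 1
  unname(value)
}

multipliers <- exp_multiplier(c("K", "m", "B", "", "?"))
multipliers  # 1e+03 1e+06 1e+09 1e+00 1e+00
```

With the uppercase-only comparison used in the report, the "m" in this example would get a multiplier of 1 instead of one million.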

Taking another look at the data, everything seems to be all right.

head(data)
##   BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG CROPDMG PROPDMGEXP
## 1     1990 TORNADO          0       28     2.5       0          M
## 2     1990 TORNADO          0        0    25.0       0          K
## 3     1990 TORNADO          0        0    25.0       0          K
## 4     1990 TORNADO          0        3     2.5       0          M
## 5     1990 TORNADO          0        2     2.5       0          M
## 6     1990 TORNADO          0       15     2.5       0          M
##   CROPDMGEXP HEALTHDMG ECONDMG
## 1                   28 2500000
## 2                    0   25000
## 3                    0   25000
## 4                    3 2500000
## 5                    2 2500000
## 6                   15 2500000

However, let us look at the different categories of EVTYPE that exist in this dataset:

unique_evtype <- unique(data$EVTYPE)
str(unique_evtype)
##  Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 856 244 429 972 410 786 406 409 290 ...

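str() reports a factor with 985 levels, but that count includes every level from the original file, even those no longer present after filtering; the listing below shows that 488 distinct values remain. That is still far more than the 48 categories in the documentation (table 2.1.1 - page 6).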

unique_evtype
##   [1] TORNADO                        TSTM WIND                     
##   [3] HAIL                           ICE STORM/FLASH FLOOD         
##   [5] WINTER STORM                   HURRICANE OPAL/HIGH WINDS     
##   [7] THUNDERSTORM WINDS             HURRICANE ERIN                
##   [9] HURRICANE OPAL                 HEAVY RAIN                    
##  [11] LIGHTNING                      THUNDERSTORM WIND             
##  [13] DENSE FOG                      RIP CURRENT                   
##  [15] THUNDERSTORM WINS              FLASH FLOODING                
##  [17] FLASH FLOOD                    TORNADO F0                    
##  [19] THUNDERSTORM WINDS LIGHTNING   THUNDERSTORM WINDS/HAIL       
##  [21] HEAT                           HIGH WINDS                    
##  [23] WIND                           HEAVY RAINS                   
##  [25] LIGHTNING AND HEAVY RAIN       THUNDERSTORM WINDS HAIL       
##  [27] COLD                           HEAVY RAIN/LIGHTNING          
##  [29] FLASH FLOODING/THUNDERSTORM WI FLOODING                      
##  [31] WATERSPOUT                     EXTREME COLD                  
##  [33] LIGHTNING/HEAVY RAIN           BREAKUP FLOODING              
##  [35] HIGH WIND                      FREEZE                        
##  [37] RIVER FLOOD                    HIGH WINDS HEAVY RAINS        
##  [39] AVALANCHE                      MARINE MISHAP                 
##  [41] HIGH TIDES                     HIGH WIND/SEAS                
##  [43] HIGH WINDS/HEAVY RAIN          HIGH SEAS                     
##  [45] COASTAL FLOOD                  SEVERE TURBULENCE             
##  [47] RECORD RAINFALL                HEAVY SNOW                    
##  [49] HEAVY SNOW/WIND                DUST STORM                    
##  [51] FLOOD                          APACHE COUNTY                 
##  [53] SLEET                          DUST DEVIL                    
##  [55] ICE STORM                      EXCESSIVE HEAT                
##  [57] THUNDERSTORM WINDS/FUNNEL CLOU GUSTY WINDS                   
##  [59] FLOODING/HEAVY RAIN            HEAVY SURF COASTAL FLOODING   
##  [61] HIGH SURF                      WILD FIRES                    
##  [63] HIGH                           WINTER STORM HIGH WINDS       
##  [65] WINTER STORMS                  MUDSLIDES                     
##  [67] RAINSTORM                      SEVERE THUNDERSTORM           
##  [69] SEVERE THUNDERSTORMS           SEVERE THUNDERSTORM WINDS     
##  [71] THUNDERSTORMS WINDS            FLOOD/FLASH FLOOD             
##  [73] FLOOD/RAIN/WINDS               THUNDERSTORMS                 
##  [75] FLASH FLOOD WINDS              WINDS                         
##  [77] FUNNEL CLOUD                   HIGH WIND DAMAGE              
##  [79] STRONG WIND                    HEAVY SNOWPACK                
##  [81] FLASH FLOOD/                   HEAVY SURF                    
##  [83] DRY MIRCOBURST WINDS           DRY MICROBURST                
##  [85] URBAN FLOOD                    THUNDERSTORM WINDSS           
##  [87] MICROBURST WINDS               HEAT WAVE                     
##  [89] UNSEASONABLY WARM              COASTAL FLOODING              
##  [91] STRONG WINDS                   BLIZZARD                      
##  [93] WATERSPOUT/TORNADO             WATERSPOUT TORNADO            
##  [95] STORM SURGE                    URBAN/SMALL STREAM FLOOD      
##  [97] WATERSPOUT-                    TORNADOES, TSTM WIND, HAIL    
##  [99] TROPICAL STORM ALBERTO         TROPICAL STORM                
## [101] TROPICAL STORM GORDON          TROPICAL STORM JERRY          
## [103] LIGHTNING THUNDERSTORM WINDS   URBAN FLOODING                
## [105] MINOR FLOODING                 WATERSPOUT-TORNADO            
## [107] LIGHTNING INJURY               LIGHTNING AND THUNDERSTORM WIN
## [109] FLASH FLOODS                   THUNDERSTORM WINDS53          
## [111] WILDFIRE                       DAMAGING FREEZE               
## [113] THUNDERSTORM WINDS 13          HURRICANE                     
## [115] SNOW                           LIGNTNING                     
## [117] FROST                          FREEZING RAIN/SNOW            
## [119] HIGH WINDS/                    THUNDERSNOW                   
## [121] FLOODS                         COOL AND WET                  
## [123] HEAVY RAIN/SNOW                GLAZE ICE                     
## [125] MUD SLIDE                      HIGH  WINDS                   
## [127] RURAL FLOOD                    MUD SLIDES                    
## [129] EXTREME HEAT                   DROUGHT                       
## [131] COLD AND WET CONDITIONS        EXCESSIVE WETNESS             
## [133] SLEET/ICE STORM                GUSTNADO                      
## [135] FREEZING RAIN                  SNOW AND HEAVY SNOW           
## [137] GROUND BLIZZARD                EXTREME WIND CHILL            
## [139] MAJOR FLOOD                    SNOW/HEAVY SNOW               
## [141] FREEZING RAIN/SLEET            ICE JAM FLOODING              
## [143] COLD AIR TORNADO               WIND DAMAGE                   
## [145] FOG                            TSTM WIND 55                  
## [147] SMALL STREAM FLOOD             THUNDERTORM WINDS             
## [149] HAIL/WINDS                     SNOW AND ICE                  
## [151] WIND STORM                     GRASS FIRES                   
## [153] LAKE FLOOD                     HAIL/WIND                     
## [155] WIND/HAIL                      ICE                           
## [157] SNOW AND ICE STORM             THUNDERSTORM  WINDS           
## [159] WINTER WEATHER                 DROUGHT/EXCESSIVE HEAT        
## [161] THUNDERSTORMS WIND             TUNDERSTORM WIND              
## [163] URBAN AND SMALL STREAM FLOODIN THUNDERSTORM WIND/LIGHTNING   
## [165] HEAVY RAIN/SEVERE WEATHER      THUNDERSTORM                  
## [167] WATERSPOUT/ TORNADO            LIGHTNING.                    
## [169] HURRICANE-GENERATED SWELLS     RIVER AND STREAM FLOOD        
## [171] HIGH WINDS/COASTAL FLOOD       RAIN                          
## [173] RIVER FLOODING                 ICE FLOES                     
## [175] THUNDERSTORM WIND G50          LIGHTNING FIRE                
## [177] HEAVY LAKE SNOW                RECORD COLD                   
## [179] HEAVY SNOW/FREEZING RAIN       COLD WAVE                     
## [181] DUST DEVIL WATERSPOUT          TORNADO F3                    
## [183] TORNDAO                        FLOOD/RIVER FLOOD             
## [185] MUD SLIDES URBAN FLOODING      TORNADO F1                    
## [187] GLAZE/ICE STORM                GLAZE                         
## [189] HEAVY SNOW/WINTER STORM        MICROBURST                    
## [191] AVALANCE                       BLIZZARD/WINTER STORM         
## [193] DUST STORM/HIGH WINDS          ICE JAM                       
## [195] FOREST FIRES                   FROST\\FREEZE                 
## [197] THUNDERSTORM WINDS.            HVY RAIN                      
## [199] HAIL 150                       HAIL 075                      
## [201] HAIL 100                       THUNDERSTORM WIND G55         
## [203] HAIL 125                       THUNDERSTORM WIND G60         
## [205] THUNDERSTORM WINDS G60         HARD FREEZE                   
## [207] HAIL 200                       HEAVY SNOW AND HIGH WINDS     
## [209] HEAVY SNOW/HIGH WINDS & FLOOD  HEAVY RAIN AND FLOOD          
## [211] RIP CURRENTS/HEAVY SURF        URBAN AND SMALL               
## [213] WILDFIRES                      FOG AND COLD TEMPERATURES     
## [215] SNOW/COLD                      FLASH FLOOD FROM ICE JAMS     
## [217] TSTM WIND G58                  MUDSLIDE                      
## [219] HEAVY SNOW SQUALLS             SNOW SQUALL                   
## [221] SNOW/ICE STORM                 HEAVY SNOW/SQUALLS            
## [223] HEAVY SNOW-SQUALLS             ICY ROADS                     
## [225] HEAVY MIX                      SNOW FREEZING RAIN            
## [227] SNOW/SLEET                     SNOW/FREEZING RAIN            
## [229] SNOW SQUALLS                   SNOW/SLEET/FREEZING RAIN      
## [231] RECORD SNOW                    HAIL 0.75                     
## [233] RECORD HEAT                    THUNDERSTORM WIND 65MPH       
## [235] THUNDERSTORM WIND/ TREES       THUNDERSTORM WIND/AWNING      
## [237] THUNDERSTORM WIND 98 MPH       THUNDERSTORM WIND TREES       
## [239] TORNADO F2                     RIP CURRENTS                  
## [241] HURRICANE EMILY                COASTAL SURGE                 
## [243] HURRICANE GORDON               HURRICANE FELIX               
## [245] THUNDERSTORM WIND 60 MPH       THUNDERSTORM WINDS 63 MPH     
## [247] THUNDERSTORM WIND/ TREE        THUNDERSTORM DAMAGE TO        
## [249] THUNDERSTORM WIND 65 MPH       FLASH FLOOD - HEAVY RAIN      
## [251] THUNDERSTORM WIND.             FLASH FLOOD/ STREET           
## [253] BLOWING SNOW                   HEAVY SNOW/BLIZZARD           
## [255] THUNDERSTORM HAIL              THUNDERSTORM WINDSHAIL        
## [257] LIGHTNING  WAUSEON             THUDERSTORM WINDS             
## [259] ICE AND SNOW                   STORM FORCE WINDS             
## [261] HEAVY SNOW/ICE                 LIGHTING                      
## [263] HIGH WIND/HEAVY SNOW           THUNDERSTORM WINDS AND        
## [265] HEAVY PRECIPITATION            HIGH WIND/BLIZZARD            
## [267] TSTM WIND DAMAGE               FLOOD FLASH                   
## [269] RAIN/WIND                      SNOW/ICE                      
## [271] HAIL 75                        HEAT WAVE DROUGHT             
## [273] HEAVY SNOW/BLIZZARD/AVALANCHE  HEAT WAVES                    
## [275] UNSEASONABLY WARM AND DRY      UNSEASONABLY COLD             
## [277] RECORD/EXCESSIVE HEAT          THUNDERSTORM WIND G52         
## [279] HIGH WAVES                     FLASH FLOOD/FLOOD             
## [281] FLOOD/FLASH                    LOW TEMPERATURE               
## [283] HEAVY RAINS/FLOODING           THUNDERESTORM WINDS           
## [285] THUNDERSTORM WINDS/FLOODING    HYPOTHERMIA                   
## [287] THUNDEERSTORM WINDS            THUNERSTORM WINDS             
## [289] HIGH WINDS/COLD                COLD/WINDS                    
## [291] SNOW/ BITTER COLD              COLD WEATHER                  
## [293] RAPIDLY RISING WATER           WILD/FOREST FIRE              
## [295] ICE/STRONG WINDS               SNOW/HIGH WINDS               
## [297] HIGH WINDS/SNOW                SNOWMELT FLOODING             
## [299] HEAVY SNOW AND STRONG WINDS    SNOW ACCUMULATION             
## [301] SNOW/ ICE                      SNOW/BLOWING SNOW             
## [303] TORNADOES                      THUNDERSTORM WIND/HAIL        
## [305] FREEZING DRIZZLE               HAIL 175                      
## [307] FLASH FLOODING/FLOOD           HAIL 275                      
## [309] HAIL 450                       EXCESSIVE RAINFALL            
## [311] THUNDERSTORMW                  HAILSTORM                     
## [313] TSTM WINDS                     TSTMW                         
## [315] TSTM WIND 65)                  TROPICAL STORM DEAN           
## [317] THUNDERSTORM WINDS/ FLOOD      LANDSLIDE                     
## [319] HIGH WIND AND SEAS             THUNDERSTORMWINDS             
## [321] WILD/FOREST FIRES              HEAVY SEAS                    
## [323] HAIL DAMAGE                    FLOOD & HEAVY RAIN            
## [325] ?                              THUNDERSTROM WIND             
## [327] FLOOD/FLASHFLOOD               HIGH WATER                    
## [329] HIGH WIND 48                   LANDSLIDES                    
## [331] URBAN/SMALL STREAM             BRUSH FIRE                    
## [333] HEAVY SHOWER                   HEAVY SWELLS                  
## [335] URBAN SMALL                    URBAN FLOODS                  
## [337] FLASH FLOOD/LANDSLIDE          HEAVY RAIN/SMALL STREAM URBAN 
## [339] FLASH FLOOD LANDSLIDES         TSTM WIND/HAIL                
## [341] Other                          Ice jam flood (minor          
## [343] Tstm Wind                      URBAN/SML STREAM FLD          
## [345] ROUGH SURF                     Heavy Surf                    
## [347] Dust Devil                     Marine Accident               
## [349] Freeze                         Strong Wind                   
## [351] COASTAL STORM                  Erosion/Cstl Flood            
## [353] River Flooding                 Damaging Freeze               
## [355] Beach Erosion                  High Surf                     
## [357] Heavy Rain/High Surf           Unseasonable Cold             
## [359] Early Frost                    Wintry Mix                    
## [361] Extreme Cold                   Coastal Flooding              
## [363] Torrential Rainfall            Landslump                     
## [365] Hurricane Edouard              Coastal Storm                 
## [367] TIDAL FLOODING                 Tidal Flooding                
## [369] Strong Winds                   EXTREME WINDCHILL             
## [371] Glaze                          Extended Cold                 
## [373] Whirlwind                      Heavy snow shower             
## [375] Light snow                     Light Snow                    
## [377] MIXED PRECIP                   Freezing Spray                
## [379] DOWNBURST                      Mudslides                     
## [381] Microburst                     Mudslide                      
## [383] Cold                           Coastal Flood                 
## [385] Snow Squalls                   Wind Damage                   
## [387] Light Snowfall                 Freezing Drizzle              
## [389] Gusty wind/rain                GUSTY WIND/HVY RAIN           
## [391] Wind                           Cold Temperature              
## [393] Heat Wave                      Snow                          
## [395] COLD AND SNOW                  RAIN/SNOW                     
## [397] TSTM WIND (G45)                Gusty Winds                   
## [399] GUSTY WIND                     TSTM WIND 40                  
## [401] TSTM WIND 45                   TSTM WIND (41)                
## [403] TSTM WIND (G40)                Frost/Freeze                  
## [405] AGRICULTURAL FREEZE            OTHER                         
## [407] Hypothermia/Exposure           HYPOTHERMIA/EXPOSURE          
## [409] Lake Effect Snow               Freezing Rain                 
## [411] Mixed Precipitation            BLACK ICE                     
## [413] COASTALSTORM                   LIGHT SNOW                    
## [415] DAM BREAK                      Gusty winds                   
## [417] blowing snow                   GRADIENT WIND                 
## [419] TSTM WIND AND LIGHTNING        gradient wind                 
## [421] Gradient wind                  Freezing drizzle              
## [423] WET MICROBURST                 Heavy surf and wind           
## [425] TYPHOON                        HIGH SWELLS                   
## [427] SMALL HAIL                     UNSEASONAL RAIN               
## [429] COASTAL FLOODING/EROSION        TSTM WIND (G45)              
## [431] TSTM WIND  (G45)               HIGH WIND (G40)               
## [433] TSTM WIND (G35)                COASTAL EROSION               
## [435] SEICHE                         COASTAL  FLOODING/EROSION     
## [437] HYPERTHERMIA/EXPOSURE          WINTRY MIX                    
## [439] ROCK SLIDE                     GUSTY WIND/HAIL               
## [441]  TSTM WIND                     LANDSPOUT                     
## [443] EXCESSIVE SNOW                 LAKE EFFECT SNOW              
## [445] FLOOD/FLASH/FLOOD              MIXED PRECIPITATION           
## [447] WIND AND WAVE                  LIGHT FREEZING RAIN           
## [449] ICE ROADS                      ROUGH SEAS                    
## [451] TSTM WIND G45                  NON-SEVERE WIND DAMAGE        
## [453] WARM WEATHER                   THUNDERSTORM WIND (G40)       
## [455]  FLASH FLOOD                   LATE SEASON SNOW              
## [457] WINTER WEATHER MIX             ROGUE WAVE                    
## [459] FALLING SNOW/ICE               NON-TSTM WIND                 
## [461] NON TSTM WIND                  BLOWING DUST                  
## [463] VOLCANIC ASH                      HIGH SURF ADVISORY         
## [465] HAZARDOUS SURF                 WHIRLWIND                     
## [467] ICE ON ROAD                    DROWNING                      
## [469] EXTREME COLD/WIND CHILL        MARINE TSTM WIND              
## [471] HURRICANE/TYPHOON              WINTER WEATHER/MIX            
## [473] FROST/FREEZE                   ASTRONOMICAL HIGH TIDE        
## [475] HEAVY SURF/HIGH SURF           TROPICAL DEPRESSION           
## [477] LAKE-EFFECT SNOW               MARINE HIGH WIND              
## [479] TSUNAMI                        STORM SURGE/TIDE              
## [481] COLD/WIND CHILL                LAKESHORE FLOOD               
## [483] MARINE THUNDERSTORM WIND       MARINE STRONG WIND            
## [485] ASTRONOMICAL LOW TIDE          DENSE SMOKE                   
## [487] MARINE HAIL                    FREEZING FOG                  
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND
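The gap between the 985 levels reported and the 488 values printed comes from how factors behave when rows are filtered: unused levels are kept until they are dropped explicitly. A minimal sketch on a made-up factor (the values in f are illustrative only):

```r
# A small factor with three levels.
f <- factor(c("HAIL", "TORNADO", "FLOOD", "HAIL"))

# Filtering out FLOOD removes the value but not the level.
kept <- f[f != "FLOOD"]

nlevels(kept)              # 3: FLOOD is still a level
length(unique(kept))       # 2: only HAIL and TORNADO occur
nlevels(droplevels(kept))  # 2: droplevels() discards unused levels
```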

The listing shows that many categories are misspelled or are ad hoc categories that do not appear in the documentation. A full-blown cleaning of this dataset would need a specialist to deal with all the metadata. Instead, we will use regular expressions to merge most of the scattered categories into the categories from the documentation. Note that the substitutions run in sequence, so a later pattern can relabel events that an earlier one has already assigned.

data$EVTYPE[grep("^Aval", data$EVTYPE, ignore.case=TRUE)] <- "AVALANCHE"
data$EVTYPE[grep("^Blizz", data$EVTYPE, ignore.case=TRUE)] <- "BLIZZARD"
data$EVTYPE[grep("^Astronomic", data$EVTYPE, ignore.case=TRUE)] <- "ASTRONOMICAL LOW TIDE"
data$EVTYPE[grep("^Cold", data$EVTYPE, ignore.case=TRUE)] <- "COLD/WIND CHILL"
data$EVTYPE[grep("^Dry", data$EVTYPE, ignore.case=TRUE)] <- "HEAT"
data$EVTYPE[grep("Excessive Heat", data$EVTYPE, ignore.case=TRUE)] <- "EXCESSIVE HEAT"
data$EVTYPE[grep("Fire", data$EVTYPE, ignore.case=TRUE)] <- "WILDFIRE"
data$EVTYPE[grep("Marine", data$EVTYPE, ignore.case=TRUE)] <- "MARINE THUNDERSTORM WIND"
data$EVTYPE[grep("^Flood", data$EVTYPE, ignore.case=TRUE)] <- "FLOOD"
data$EVTYPE[grep("Fog", data$EVTYPE, ignore.case=TRUE)] <- "DENSE FOG"
data$EVTYPE[grep("Hurricane", data$EVTYPE, ignore.case=TRUE)] <- "HURRICANE"
data$EVTYPE[grep("Typh", data$EVTYPE, ignore.case=TRUE)] <- "HURRICANE"
data$EVTYPE[grep("Winter", data$EVTYPE, ignore.case=TRUE)] <- "WINTER STORM"
data$EVTYPE[grep("Torn", data$EVTYPE, ignore.case=TRUE)] <- "TORNADO"
data$EVTYPE[grep("^RIP", data$EVTYPE, ignore.case=TRUE)] <- "RIP CURRENT"
data$EVTYPE[grep("^TSTM", data$EVTYPE, ignore.case=TRUE)] <- "THUNDERSTORM WIND"
data$EVTYPE[grep("^Snow", data$EVTYPE, ignore.case=TRUE)] <- "HEAVY SNOW"
data$EVTYPE[grep("Snow$", data$EVTYPE, ignore.case=TRUE)] <- "LAKE-EFFECT SNOW"
data$EVTYPE[grep("Storm surge", data$EVTYPE, ignore.case=TRUE)] <- "STORM SURGE/TIDE"
data$EVTYPE[grep("^Ice", data$EVTYPE, ignore.case=TRUE)] <- "ICE STORM"
data$EVTYPE[grep("^Water", data$EVTYPE, ignore.case=TRUE)] <- "WATERSPOUT"
data$EVTYPE[grep("rain", data$EVTYPE, ignore.case=TRUE)] <- "HEAVY RAIN"
data$EVTYPE[grep("Mud", data$EVTYPE, ignore.case=TRUE)] <- "AVALANCHE"
data$EVTYPE[grep("FLD", data$EVTYPE, ignore.case=TRUE)] <- "FLOOD"
data$EVTYPE[grep("High Wind", data$EVTYPE, ignore.case=TRUE)] <- "HIGH WIND"
data$EVTYPE[grep("^HEAVY SNOW", data$EVTYPE, ignore.case=TRUE)] <- "HEAVY SNOW"
data$EVTYPE[intersect(grep("THUNDERSTORM WIND", data$EVTYPE, ignore.case=TRUE), grep("MARINE", data$EVTYPE, ignore.case=TRUE, invert=TRUE))] <- "THUNDERSTORM WIND"
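Because each substitution above works on the result of the previous one, the order of the rules affects the final labels. For instance, the "Snow$" rule runs after the "^Snow" rule, so events that were just set to HEAVY SNOW end in "SNOW" and are relabelled LAKE-EFFECT SNOW. The sketch below reproduces that interaction on three sample strings; recode_events and the rules table are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: apply pattern/label rules one after another.
recode_events <- function(x, rules) {
  x <- as.character(x)
  for (i in seq_len(nrow(rules))) {
    hits <- grepl(rules$pattern[i], x, ignore.case = TRUE)
    x[hits] <- rules$label[i]
  }
  x
}

# The two snow rules, in the same order as in the report.
rules <- data.frame(
  pattern = c("^Snow", "Snow$"),
  label   = c("HEAVY SNOW", "LAKE-EFFECT SNOW"),
  stringsAsFactors = FALSE
)

recoded <- recode_events(c("SNOW/ICE", "HEAVY SNOW", "Light Snow"), rules)
recoded  # all three become "LAKE-EFFECT SNOW"
```

Keeping the rules in a table like this makes it easier to reorder them or to tighten a pattern (for example, matching only labels that mention "LAKE") and re-run the recoding.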

Now that the hard part of cleaning and pre-processing the data is done, it is time to analyse the data and get some answers for the questions we have.

Results

Using the dplyr package, let us group the dataset by event type. After that, it’s a matter of summing HEALTHDMG and ECONDMG for each event type and ordering the result from the highest damage value to the lowest.

by_event <- group_by(data, EVTYPE)

by_event <- summarise(by_event, HEALTHDMG = sum(HEALTHDMG), ECONDMG = sum(ECONDMG))
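The grouping and summing steps above can be tried on a small made-up data frame before running them on the full dataset. This is a sketch under stated assumptions: the toy values are invented, and the pipe operator from dplyr is used here only for compactness.

```r
library(dplyr)

# Invented rows with the same column names as the cleaned dataset.
toy <- data.frame(
  EVTYPE    = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  HEALTHDMG = c(28, 3, 15, 0),
  ECONDMG   = c(2.5e6, 4e7, 2.5e4, 1e3)
)

# One row per event type, with totals, ordered by health damage.
totals <- toy %>%
  group_by(EVTYPE) %>%
  summarise(HEALTHDMG = sum(HEALTHDMG), ECONDMG = sum(ECONDMG)) %>%
  arrange(desc(HEALTHDMG))

totals
```

In this toy example TORNADO comes first with a total HEALTHDMG of 43, followed by FLOOD with 3 and HAIL with 0.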

We start with the health damage and build a barplot of the top 5 events that cause it:

by_event <- arrange(by_event, desc(HEALTHDMG))

barplot(by_event$HEALTHDMG[1:5], col = rainbow(5), 
        main = "Top 5 Events that cause Health Damage",
        xlab = "Events",
        ylab = "Health Damage caused")
legend("topright", legend = by_event$EVTYPE[1:5], fill = rainbow(5))
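The plot above identifies the bars through a colour legend. An alternative, sketched below with invented totals, is to label each bar directly through the names.arg argument of barplot. The toy_totals vector is hypothetical and does not reflect the real results.

```r
# Invented totals per event type, for illustration only.
toy_totals <- c(HAIL = 15, TORNADO = 120, FLOOD = 80)

# Order from highest to lowest, as in the report.
top <- sort(toy_totals, decreasing = TRUE)

# names.arg puts the event name under each bar; las = 2 rotates the labels.
mids <- barplot(top, names.arg = names(top), las = 2, cex.names = 0.7,
                main = "Top Events (toy data)",
                ylab = "Health Damage caused")
```

barplot returns the x positions of the bar midpoints, stored here in mids, which can be used to add further annotations.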

From this barplot, we can see that the top 5 events that cause health damage are, in order: Tornadoes, Excessive Heat, Thunderstorm Winds, Floods and Lightning.

Doing the same for the ECONDMG column:

by_event <- arrange(by_event, desc(ECONDMG))

barplot(by_event$ECONDMG[1:5], col = rainbow(5), 
        main = "Top 5 Events that cause Economic Damage",
        xlab = "Events",
        ylab = "Economic Damage caused")
legend("topright", legend = by_event$EVTYPE[1:5], fill = rainbow(5))

From this barplot, we conclude that the top 5 events that cause economic damage are, in order: Floods, Hurricanes, Storm Surges/Tides, Tornadoes and Hail.