Synopsis

Looking at significant natural disasters in the USA, it becomes clear that certain types of events cause more bodily harm while other types cause more financial harm. Tornadoes top the list for bodily harm, while floods and hurricanes cause the most property damage. Finally, droughts cause the most crop damage, with floods a close second.

Loading and Processing the Raw Data

First, the data was downloaded from the National Oceanic and Atmospheric Administration (NOAA), whose storm dataset records significant weather phenomena.

if(!file.exists("StormData.csv.bz2")){
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              "StormData.csv.bz2",method="curl")
}

Load the libraries we will need.

library(dplyr)
library(tidyr)
library(ggplot2)

Reading in the data

The data was read in from the file with stringsAsFactors turned off, since converting strings to factors is a costly operation for large datasets.

data <- read.csv("StormData.csv.bz2",stringsAsFactors=FALSE)

Now we are going to have a quick look around our dataset to see what we have.

dim(data)
## [1] 902297     37
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
summary(data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI       
##  Min.   :   0.000   Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character  
##  Mean   :   1.484                                        
##  3rd Qu.:   1.000                                        
##  Max.   :3749.000                                        
##                                                          
##    END_DATE           END_TIME           COUNTY_END COUNTYENDN    
##  Length:902297      Length:902297      Min.   :0    Mode:logical  
##  Class :character   Class :character   1st Qu.:0    NA's:902297   
##  Mode  :character   Mode  :character   Median :0                  
##                                        Mean   :0                  
##                                        3rd Qu.:0                  
##                                        Max.   :0                  
##                                                                   
##    END_RANGE          END_AZI           END_LOCATI       
##  Min.   :  0.0000   Length:902297      Length:902297     
##  1st Qu.:  0.0000   Class :character   Class :character  
##  Median :  0.0000   Mode  :character   Mode  :character  
##  Mean   :  0.9862                                        
##  3rd Qu.:  0.0000                                        
##  Max.   :925.0000                                        
##                                                          
##      LENGTH              WIDTH                F               MAG         
##  Min.   :   0.0000   Min.   :   0.000   Min.   :0.0      Min.   :    0.0  
##  1st Qu.:   0.0000   1st Qu.:   0.000   1st Qu.:0.0      1st Qu.:    0.0  
##  Median :   0.0000   Median :   0.000   Median :1.0      Median :   50.0  
##  Mean   :   0.2301   Mean   :   7.503   Mean   :0.9      Mean   :   46.9  
##  3rd Qu.:   0.0000   3rd Qu.:   0.000   3rd Qu.:1.0      3rd Qu.:   75.0  
##  Max.   :2315.0000   Max.   :4400.000   Max.   :5.0      Max.   :22000.0  
##                                         NA's   :843563                    
##    FATALITIES          INJURIES            PROPDMG       
##  Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Median :  0.0000   Median :   0.0000   Median :   0.00  
##  Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##  3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##  Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##                                                          
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000                     
##                                                         
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

First, let's reduce this dataset to something manageable. We have no interest in events that caused no fatalities, injuries, property damage or crop damage, so let's remove those.

data <- filter(data, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
dim(data)
## [1] 254633     37

Now we are down to about 255,000 events from just over 900,000.

According to the documentation for this dataset, the PROPDMG and CROPDMG numbers have to be scaled by the magnitude codes stored in PROPDMGEXP and CROPDMGEXP (K for thousands, M for millions, B for billions).

table(data$PROPDMGEXP)
## 
##             -      +      0      2      3      4      5      6      7 
##  11585      1      5    210      1      1      4     18      3      3 
##      B      h      H      K      m      M 
##     40      1      6 231428      7  11320
table(data$CROPDMGEXP)
## 
##             ?      0      B      k      K      m      M 
## 152664      6     17      7     21  99932      1   1985

We can see that in some cases people did not follow the instructions. While we could guess what some of these codes mean, the guesses would not be reliable. So the only assumptions we will make are that an empty code or 0 means no multiplier and that lowercase letters mean the same as their uppercase forms; rows with any other code are simply dropped, which removes 49 rows.

data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)
data <- subset(data, PROPDMGEXP %in% c("","0","K","M","B"))
data <- subset(data, CROPDMGEXP %in% c("","0","K","M","B"))
dim(data)
## [1] 254584     37

Now we can move on to replacing those letters with numeric multipliers.

data$PROPMAG<-1
data$PROPMAG[data$PROPDMGEXP=="K"] <- 1000
data$PROPMAG[data$PROPDMGEXP=="M"] <- 1000000
data$PROPMAG[data$PROPDMGEXP=="B"] <- 1000000000
data$CROPMAG<-1
data$CROPMAG[data$CROPDMGEXP=="K"] <- 1000
data$CROPMAG[data$CROPDMGEXP=="M"] <- 1000000
data$CROPMAG[data$CROPDMGEXP=="B"] <- 1000000000
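
The same mapping could also be written as a lookup with a named vector. This is only a sketch, not the code used for this analysis: `exp_lookup`, `exp_to_multiplier` and `toy_codes` are hypothetical names, and the function assumes the codes have already been restricted to "", "0", "K", "M" and "B" as described above.

```r
# Hypothetical lookup from exponent code to multiplier.
exp_lookup <- c("0" = 1, "K" = 1e3, "M" = 1e6, "B" = 1e9)

exp_to_multiplier <- function(codes) {
  # Look each code up by name; codes without an entry (here only the
  # empty string, given the assumption above) come back as NA.
  mult <- unname(exp_lookup[toupper(codes)])
  # Treat a missing entry as "no multiplier".
  mult[is.na(mult)] <- 1
  mult
}

# Toy input, not taken from the storm data.
toy_codes <- c("K", "m", "", "0", "B")
exp_to_multiplier(toy_codes)
```

A lookup keeps the code-to-multiplier pairs in one place, so the property and crop columns can share them.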

Now we can compute the dollar amount of the damage by multiplying each damage value by its multiplier.

data$PROPDMG <- data$PROPDMG * data$PROPMAG
data$CROPDMG <- data$CROPDMG * data$CROPMAG
summary(data$PROPDMG)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 2.000e+03 1.000e+04 1.678e+06 3.500e+04 1.150e+11
summary(data$CROPDMG)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 1.928e+05 0.000e+00 5.000e+09
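
As a quick sanity check of the arithmetic, here is a small worked example using the first two rows shown by head() earlier (PROPDMG of 25.0 and 2.5, both with PROPDMGEXP "K"). The data frame `toy` and the column `DOLLARS` are hypothetical names used only for this illustration.

```r
# First two rows of the raw data as printed by head(): 25.0 K and 2.5 K.
toy <- data.frame(PROPDMG = c(25.0, 2.5),
                  PROPDMGEXP = c("K", "K"),
                  stringsAsFactors = FALSE)

# "K" means thousands; anything else in this toy example means no multiplier.
toy$PROPMAG <- ifelse(toy$PROPDMGEXP == "K", 1000, 1)

# Dollar amount = reported value times multiplier.
toy$DOLLARS <- toy$PROPDMG * toy$PROPMAG
toy$DOLLARS  # 25000 and 2500
```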

We also have too many columns, so let's keep only the columns we want to deal with.

data <- select(data,EVTYPE,FATALITIES,INJURIES,PROPDMG,CROPDMG)
dim(data)
## [1] 254584      5

Now let's have a look at the EVTYPE field.

unique(data$EVTYPE)
##   [1] "TORNADO"                        "TSTM WIND"                     
##   [3] "HAIL"                           "ICE STORM/FLASH FLOOD"         
##   [5] "WINTER STORM"                   "HURRICANE OPAL/HIGH WINDS"     
##   [7] "THUNDERSTORM WINDS"             "HURRICANE ERIN"                
##   [9] "HURRICANE OPAL"                 "HEAVY RAIN"                    
##  [11] "LIGHTNING"                      "THUNDERSTORM WIND"             
##  [13] "DENSE FOG"                      "RIP CURRENT"                   
##  [15] "THUNDERSTORM WINS"              "FLASH FLOODING"                
##  [17] "FLASH FLOOD"                    "TORNADO F0"                    
##  [19] "THUNDERSTORM WINDS LIGHTNING"   "THUNDERSTORM WINDS/HAIL"       
##  [21] "HEAT"                           "HIGH WINDS"                    
##  [23] "WIND"                           "HEAVY RAINS"                   
##  [25] "LIGHTNING AND HEAVY RAIN"       "THUNDERSTORM WINDS HAIL"       
##  [27] "COLD"                           "HEAVY RAIN/LIGHTNING"          
##  [29] "FLASH FLOODING/THUNDERSTORM WI" "FLOODING"                      
##  [31] "WATERSPOUT"                     "EXTREME COLD"                  
##  [33] "LIGHTNING/HEAVY RAIN"           "HIGH WIND"                     
##  [35] "FREEZE"                         "RIVER FLOOD"                   
##  [37] "HIGH WINDS HEAVY RAINS"         "AVALANCHE"                     
##  [39] "MARINE MISHAP"                  "HIGH TIDES"                    
##  [41] "HIGH WIND/SEAS"                 "HIGH WINDS/HEAVY RAIN"         
##  [43] "HIGH SEAS"                      "COASTAL FLOOD"                 
##  [45] "SEVERE TURBULENCE"              "RECORD RAINFALL"               
##  [47] "HEAVY SNOW"                     "HEAVY SNOW/WIND"               
##  [49] "DUST STORM"                     "FLOOD"                         
##  [51] "APACHE COUNTY"                  "SLEET"                         
##  [53] "DUST DEVIL"                     "ICE STORM"                     
##  [55] "EXCESSIVE HEAT"                 "THUNDERSTORM WINDS/FUNNEL CLOU"
##  [57] "GUSTY WINDS"                    "HEAVY SURF COASTAL FLOODING"   
##  [59] "HIGH SURF"                      "WILD FIRES"                    
##  [61] "HIGH"                           "WINTER STORM HIGH WINDS"       
##  [63] "WINTER STORMS"                  "MUDSLIDES"                     
##  [65] "RAINSTORM"                      "SEVERE THUNDERSTORM"           
##  [67] "SEVERE THUNDERSTORMS"           "SEVERE THUNDERSTORM WINDS"     
##  [69] "THUNDERSTORMS WINDS"            "FLOOD/FLASH FLOOD"             
##  [71] "FLOOD/RAIN/WINDS"               "THUNDERSTORMS"                 
##  [73] "WINDS"                          "FUNNEL CLOUD"                  
##  [75] "HIGH WIND DAMAGE"               "STRONG WIND"                   
##  [77] "HEAVY SNOWPACK"                 "FLASH FLOOD/"                  
##  [79] "HEAVY SURF"                     "DRY MIRCOBURST WINDS"          
##  [81] "DRY MICROBURST"                 "URBAN FLOOD"                   
##  [83] "THUNDERSTORM WINDSS"            "MICROBURST WINDS"              
##  [85] "HEAT WAVE"                      "UNSEASONABLY WARM"             
##  [87] "COASTAL FLOODING"               "STRONG WINDS"                  
##  [89] "BLIZZARD"                       "WATERSPOUT/TORNADO"            
##  [91] "WATERSPOUT TORNADO"             "STORM SURGE"                   
##  [93] "URBAN/SMALL STREAM FLOOD"       "WATERSPOUT-"                   
##  [95] "TORNADOES, TSTM WIND, HAIL"     "TROPICAL STORM ALBERTO"        
##  [97] "TROPICAL STORM"                 "TROPICAL STORM GORDON"         
##  [99] "TROPICAL STORM JERRY"           "LIGHTNING THUNDERSTORM WINDS"  
## [101] "URBAN FLOODING"                 "MINOR FLOODING"                
## [103] "WATERSPOUT-TORNADO"             "LIGHTNING INJURY"              
## [105] "LIGHTNING AND THUNDERSTORM WIN" "FLASH FLOODS"                  
## [107] "THUNDERSTORM WINDS53"           "WILDFIRE"                      
## [109] "DAMAGING FREEZE"                "THUNDERSTORM WINDS 13"         
## [111] "HURRICANE"                      "SNOW"                          
## [113] "LIGNTNING"                      "FROST"                         
## [115] "FREEZING RAIN/SNOW"             "HIGH WINDS/"                   
## [117] "THUNDERSNOW"                    "FLOODS"                        
## [119] "COOL AND WET"                   "HEAVY RAIN/SNOW"               
## [121] "GLAZE ICE"                      "MUD SLIDE"                     
## [123] "HIGH  WINDS"                    "RURAL FLOOD"                   
## [125] "MUD SLIDES"                     "EXTREME HEAT"                  
## [127] "DROUGHT"                        "COLD AND WET CONDITIONS"       
## [129] "EXCESSIVE WETNESS"              "SLEET/ICE STORM"               
## [131] "GUSTNADO"                       "FREEZING RAIN"                 
## [133] "SNOW AND HEAVY SNOW"            "GROUND BLIZZARD"               
## [135] "EXTREME WIND CHILL"             "MAJOR FLOOD"                   
## [137] "SNOW/HEAVY SNOW"                "FREEZING RAIN/SLEET"           
## [139] "ICE JAM FLOODING"               "COLD AIR TORNADO"              
## [141] "WIND DAMAGE"                    "FOG"                           
## [143] "TSTM WIND 55"                   "SMALL STREAM FLOOD"            
## [145] "THUNDERTORM WINDS"              "HAIL/WINDS"                    
## [147] "SNOW AND ICE"                   "WIND STORM"                    
## [149] "GRASS FIRES"                    "LAKE FLOOD"                    
## [151] "HAIL/WIND"                      "WIND/HAIL"                     
## [153] "ICE"                            "SNOW AND ICE STORM"            
## [155] "THUNDERSTORM  WINDS"            "WINTER WEATHER"                
## [157] "DROUGHT/EXCESSIVE HEAT"         "THUNDERSTORMS WIND"            
## [159] "TUNDERSTORM WIND"               "URBAN AND SMALL STREAM FLOODIN"
## [161] "THUNDERSTORM WIND/LIGHTNING"    "HEAVY RAIN/SEVERE WEATHER"     
## [163] "THUNDERSTORM"                   "WATERSPOUT/ TORNADO"           
## [165] "LIGHTNING."                     "HURRICANE-GENERATED SWELLS"    
## [167] "RIVER AND STREAM FLOOD"         "HIGH WINDS/COASTAL FLOOD"      
## [169] "RAIN"                           "RIVER FLOODING"                
## [171] "ICE FLOES"                      "LIGHTNING FIRE"                
## [173] "HEAVY LAKE SNOW"                "RECORD COLD"                   
## [175] "HEAVY SNOW/FREEZING RAIN"       "COLD WAVE"                     
## [177] "DUST DEVIL WATERSPOUT"          "TORNADO F3"                    
## [179] "TORNDAO"                        "FLOOD/RIVER FLOOD"             
## [181] "MUD SLIDES URBAN FLOODING"      "TORNADO F1"                    
## [183] "GLAZE/ICE STORM"                "GLAZE"                         
## [185] "HEAVY SNOW/WINTER STORM"        "MICROBURST"                    
## [187] "AVALANCE"                       "BLIZZARD/WINTER STORM"         
## [189] "DUST STORM/HIGH WINDS"          "ICE JAM"                       
## [191] "FOREST FIRES"                   "FROST\\FREEZE"                 
## [193] "THUNDERSTORM WINDS."            "HVY RAIN"                      
## [195] "HAIL 150"                       "HAIL 075"                      
## [197] "HAIL 100"                       "THUNDERSTORM WIND G55"         
## [199] "HAIL 125"                       "THUNDERSTORM WIND G60"         
## [201] "THUNDERSTORM WINDS G60"         "HARD FREEZE"                   
## [203] "HAIL 200"                       "HEAVY SNOW AND HIGH WINDS"     
## [205] "HEAVY SNOW/HIGH WINDS & FLOOD"  "HEAVY RAIN AND FLOOD"          
## [207] "RIP CURRENTS/HEAVY SURF"        "URBAN AND SMALL"               
## [209] "WILDFIRES"                      "FOG AND COLD TEMPERATURES"     
## [211] "SNOW/COLD"                      "FLASH FLOOD FROM ICE JAMS"     
## [213] "TSTM WIND G58"                  "MUDSLIDE"                      
## [215] "HEAVY SNOW SQUALLS"             "SNOW SQUALL"                   
## [217] "SNOW/ICE STORM"                 "HEAVY SNOW/SQUALLS"            
## [219] "HEAVY SNOW-SQUALLS"             "ICY ROADS"                     
## [221] "HEAVY MIX"                      "SNOW FREEZING RAIN"            
## [223] "SNOW/SLEET"                     "SNOW/FREEZING RAIN"            
## [225] "SNOW SQUALLS"                   "SNOW/SLEET/FREEZING RAIN"      
## [227] "RECORD SNOW"                    "HAIL 0.75"                     
## [229] "RECORD HEAT"                    "THUNDERSTORM WIND 65MPH"       
## [231] "THUNDERSTORM WIND/ TREES"       "THUNDERSTORM WIND/AWNING"      
## [233] "THUNDERSTORM WIND 98 MPH"       "THUNDERSTORM WIND TREES"       
## [235] "TORNADO F2"                     "RIP CURRENTS"                  
## [237] "HURRICANE EMILY"                "COASTAL SURGE"                 
## [239] "HURRICANE GORDON"               "HURRICANE FELIX"               
## [241] "THUNDERSTORM WIND 60 MPH"       "THUNDERSTORM WINDS 63 MPH"     
## [243] "THUNDERSTORM WIND/ TREE"        "THUNDERSTORM DAMAGE TO"        
## [245] "THUNDERSTORM WIND 65 MPH"       "FLASH FLOOD - HEAVY RAIN"      
## [247] "THUNDERSTORM WIND."             "FLASH FLOOD/ STREET"           
## [249] "BLOWING SNOW"                   "HEAVY SNOW/BLIZZARD"           
## [251] "THUNDERSTORM HAIL"              "THUNDERSTORM WINDSHAIL"        
## [253] "LIGHTNING  WAUSEON"             "THUDERSTORM WINDS"             
## [255] "ICE AND SNOW"                   "STORM FORCE WINDS"             
## [257] "HEAVY SNOW/ICE"                 "LIGHTING"                      
## [259] "HIGH WIND/HEAVY SNOW"           "THUNDERSTORM WINDS AND"        
## [261] "HEAVY PRECIPITATION"            "HIGH WIND/BLIZZARD"            
## [263] "TSTM WIND DAMAGE"               "FLOOD FLASH"                   
## [265] "RAIN/WIND"                      "SNOW/ICE"                      
## [267] "HAIL 75"                        "HEAT WAVE DROUGHT"             
## [269] "HEAVY SNOW/BLIZZARD/AVALANCHE"  "HEAT WAVES"                    
## [271] "UNSEASONABLY WARM AND DRY"      "UNSEASONABLY COLD"             
## [273] "RECORD/EXCESSIVE HEAT"          "THUNDERSTORM WIND G52"         
## [275] "HIGH WAVES"                     "FLASH FLOOD/FLOOD"             
## [277] "FLOOD/FLASH"                    "LOW TEMPERATURE"               
## [279] "HEAVY RAINS/FLOODING"           "THUNDERESTORM WINDS"           
## [281] "THUNDERSTORM WINDS/FLOODING"    "HYPOTHERMIA"                   
## [283] "THUNDEERSTORM WINDS"            "THUNERSTORM WINDS"             
## [285] "HIGH WINDS/COLD"                "COLD/WINDS"                    
## [287] "SNOW/ BITTER COLD"              "COLD WEATHER"                  
## [289] "RAPIDLY RISING WATER"           "WILD/FOREST FIRE"              
## [291] "ICE/STRONG WINDS"               "SNOW/HIGH WINDS"               
## [293] "HIGH WINDS/SNOW"                "SNOWMELT FLOODING"             
## [295] "HEAVY SNOW AND STRONG WINDS"    "SNOW ACCUMULATION"             
## [297] "SNOW/ ICE"                      "SNOW/BLOWING SNOW"             
## [299] "TORNADOES"                      "THUNDERSTORM WIND/HAIL"        
## [301] "FREEZING DRIZZLE"               "HAIL 175"                      
## [303] "FLASH FLOODING/FLOOD"           "HAIL 275"                      
## [305] "HAIL 450"                       "EXCESSIVE RAINFALL"            
## [307] "THUNDERSTORMW"                  "HAILSTORM"                     
## [309] "TSTM WINDS"                     "TSTMW"                         
## [311] "TSTM WIND 65)"                  "TROPICAL STORM DEAN"           
## [313] "THUNDERSTORM WINDS/ FLOOD"      "LANDSLIDE"                     
## [315] "HIGH WIND AND SEAS"             "THUNDERSTORMWINDS"             
## [317] "WILD/FOREST FIRES"              "HEAVY SEAS"                    
## [319] "HAIL DAMAGE"                    "FLOOD & HEAVY RAIN"            
## [321] "?"                              "THUNDERSTROM WIND"             
## [323] "FLOOD/FLASHFLOOD"               "HIGH WATER"                    
## [325] "HIGH WIND 48"                   "LANDSLIDES"                    
## [327] "URBAN/SMALL STREAM"             "BRUSH FIRE"                    
## [329] "HEAVY SHOWER"                   "HEAVY SWELLS"                  
## [331] "URBAN SMALL"                    "URBAN FLOODS"                  
## [333] "FLASH FLOOD/LANDSLIDE"          "HEAVY RAIN/SMALL STREAM URBAN" 
## [335] "FLASH FLOOD LANDSLIDES"         "TSTM WIND/HAIL"                
## [337] "Other"                          "Ice jam flood (minor"          
## [339] "Tstm Wind"                      "URBAN/SML STREAM FLD"          
## [341] "ROUGH SURF"                     "Heavy Surf"                    
## [343] "Dust Devil"                     "Marine Accident"               
## [345] "Freeze"                         "Strong Wind"                   
## [347] "COASTAL STORM"                  "Erosion/Cstl Flood"            
## [349] "River Flooding"                 "Damaging Freeze"               
## [351] "Beach Erosion"                  "High Surf"                     
## [353] "Heavy Rain/High Surf"           "Unseasonable Cold"             
## [355] "Early Frost"                    "Wintry Mix"                    
## [357] "Extreme Cold"                   "Coastal Flooding"              
## [359] "Torrential Rainfall"            "Landslump"                     
## [361] "Hurricane Edouard"              "Coastal Storm"                 
## [363] "TIDAL FLOODING"                 "Tidal Flooding"                
## [365] "Strong Winds"                   "EXTREME WINDCHILL"             
## [367] "Glaze"                          "Extended Cold"                 
## [369] "Whirlwind"                      "Heavy snow shower"             
## [371] "Light snow"                     "Light Snow"                    
## [373] "MIXED PRECIP"                   "Freezing Spray"                
## [375] "DOWNBURST"                      "Mudslides"                     
## [377] "Microburst"                     "Mudslide"                      
## [379] "Cold"                           "Coastal Flood"                 
## [381] "Snow Squalls"                   "Wind Damage"                   
## [383] "Light Snowfall"                 "Freezing Drizzle"              
## [385] "Gusty wind/rain"                "GUSTY WIND/HVY RAIN"           
## [387] "Wind"                           "Cold Temperature"              
## [389] "Heat Wave"                      "Snow"                          
## [391] "COLD AND SNOW"                  "RAIN/SNOW"                     
## [393] "TSTM WIND (G45)"                "Gusty Winds"                   
## [395] "GUSTY WIND"                     "TSTM WIND 40"                  
## [397] "TSTM WIND 45"                   "TSTM WIND (41)"                
## [399] "TSTM WIND (G40)"                "Frost/Freeze"                  
## [401] "AGRICULTURAL FREEZE"            "OTHER"                         
## [403] "Hypothermia/Exposure"           "HYPOTHERMIA/EXPOSURE"          
## [405] "Lake Effect Snow"               "Freezing Rain"                 
## [407] "Mixed Precipitation"            "BLACK ICE"                     
## [409] "COASTALSTORM"                   "LIGHT SNOW"                    
## [411] "DAM BREAK"                      "Gusty winds"                   
## [413] "blowing snow"                   "GRADIENT WIND"                 
## [415] "TSTM WIND AND LIGHTNING"        "gradient wind"                 
## [417] "Gradient wind"                  "Freezing drizzle"              
## [419] "WET MICROBURST"                 "Heavy surf and wind"           
## [421] "TYPHOON"                        "HIGH SWELLS"                   
## [423] "SMALL HAIL"                     "UNSEASONAL RAIN"               
## [425] "COASTAL FLOODING/EROSION"       " TSTM WIND (G45)"              
## [427] "TSTM WIND  (G45)"               "HIGH WIND (G40)"               
## [429] "TSTM WIND (G35)"                "COASTAL EROSION"               
## [431] "SEICHE"                         "COASTAL  FLOODING/EROSION"     
## [433] "HYPERTHERMIA/EXPOSURE"          "WINTRY MIX"                    
## [435] "ROCK SLIDE"                     "GUSTY WIND/HAIL"               
## [437] " TSTM WIND"                     "LANDSPOUT"                     
## [439] "EXCESSIVE SNOW"                 "LAKE EFFECT SNOW"              
## [441] "FLOOD/FLASH/FLOOD"              "MIXED PRECIPITATION"           
## [443] "WIND AND WAVE"                  "LIGHT FREEZING RAIN"           
## [445] "ICE ROADS"                      "ROUGH SEAS"                    
## [447] "TSTM WIND G45"                  "NON-SEVERE WIND DAMAGE"        
## [449] "WARM WEATHER"                   "THUNDERSTORM WIND (G40)"       
## [451] " FLASH FLOOD"                   "LATE SEASON SNOW"              
## [453] "WINTER WEATHER MIX"             "ROGUE WAVE"                    
## [455] "FALLING SNOW/ICE"               "NON-TSTM WIND"                 
## [457] "NON TSTM WIND"                  "BLOWING DUST"                  
## [459] "VOLCANIC ASH"                   "   HIGH SURF ADVISORY"         
## [461] "HAZARDOUS SURF"                 "WHIRLWIND"                     
## [463] "ICE ON ROAD"                    "DROWNING"                      
## [465] "EXTREME COLD/WIND CHILL"        "MARINE TSTM WIND"              
## [467] "HURRICANE/TYPHOON"              "WINTER WEATHER/MIX"            
## [469] "FROST/FREEZE"                   "ASTRONOMICAL HIGH TIDE"        
## [471] "HEAVY SURF/HIGH SURF"           "TROPICAL DEPRESSION"           
## [473] "LAKE-EFFECT SNOW"               "MARINE HIGH WIND"              
## [475] "TSUNAMI"                        "STORM SURGE/TIDE"              
## [477] "COLD/WIND CHILL"                "LAKESHORE FLOOD"               
## [479] "MARINE THUNDERSTORM WIND"       "MARINE STRONG WIND"            
## [481] "ASTRONOMICAL LOW TIDE"          "DENSE SMOKE"                   
## [483] "MARINE HAIL"                    "FREEZING FOG"
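
Several of the values above differ only in letter case or stray whitespace, for example "Heavy Surf" and "HEAVY SURF", or " TSTM WIND" and "TSTM WIND". A minimal sketch of one way to collapse those variants follows; `normalize_evtype` and `sample_types` are hypothetical names, and this step is not part of the analysis itself. It does not merge misspellings or synonyms such as "THUNDERSTORM WINS".

```r
# Hypothetical helper: upper-case, collapse repeated whitespace, trim the ends.
normalize_evtype <- function(x) {
  x <- toupper(x)
  x <- gsub("\\s+", " ", x)  # runs of whitespace become a single space
  trimws(x)                  # drop leading and trailing whitespace
}

# Sample values copied from the listing above.
sample_types <- c("Heavy Surf", "HEAVY SURF", " TSTM WIND", "TSTM WIND",
                  "HIGH  WINDS", "   HIGH SURF ADVISORY")
unique(normalize_evtype(sample_types))
```

The six sample values reduce to four distinct event types.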

This field is not very tidy. We are only really concerned with major events, so let's try to identify which types of events we even want to worry about.

temp <- group_by(data,EVTYPE)
temp <- summarise(temp,FATALITIES=sum(FATALITIES))
temp <- arrange(temp,desc(FATALITIES))
head(temp,10)
## Source: local data frame [10 x 2]
## 
##            EVTYPE FATALITIES
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        246
## 10      AVALANCHE        224
temp <- group_by(data,EVTYPE)
temp <- summarise(temp,INJURIES=sum(INJURIES))
temp <- arrange(temp,desc(INJURIES))
head(temp,10)
## Source: local data frame [10 x 2]
## 
##               EVTYPE INJURIES
## 1            TORNADO    91345
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1359
temp <- group_by(data,EVTYPE)
temp <- summarise(temp,PROPDMG=sum(PROPDMG))
temp <- arrange(temp,desc(PROPDMG))
head(temp,10)
## Source: local data frame [10 x 2]
## 
##               EVTYPE      PROPDMG
## 1              FLOOD 144657709807
## 2  HURRICANE/TYPHOON  69305840000
## 3            TORNADO  56937160617
## 4        STORM SURGE  43323536000
## 5        FLASH FLOOD  16140811979
## 6               HAIL  15732267013
## 7          HURRICANE  11868319010
## 8     TROPICAL STORM   7703890550
## 9       WINTER STORM   6688497251
## 10         HIGH WIND   5270046260
temp <- group_by(data,EVTYPE)
temp <- summarise(temp,CROPDMG=sum(CROPDMG))
temp <- arrange(temp,desc(CROPDMG))
head(temp,10)
## Source: local data frame [10 x 2]
## 
##               EVTYPE     CROPDMG
## 1            DROUGHT 13972566000
## 2              FLOOD  5661968450
## 3        RIVER FLOOD  5029459000
## 4          ICE STORM  5022113500
## 5               HAIL  3000954473
## 6          HURRICANE  2741910000
## 7  HURRICANE/TYPHOON  2607872800
## 8        FLASH FLOOD  1420887100
## 9       EXTREME COLD  1292973000
## 10      FROST/FREEZE  1094086000
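
The four summaries above repeat the same group, sum and sort steps with a different column each time. A small helper could express that pattern once. This is only a sketch: `top_events` is a hypothetical name, the toy data frame stands in for the real `data`, and the `.data[[...]]` pronoun assumes a reasonably recent version of dplyr.

```r
library(dplyr)

# Sum one measure per event type and return the n largest totals
top_events <- function(df, measure, n = 10) {
  df %>%
    group_by(EVTYPE) %>%
    summarise(total = sum(.data[[measure]])) %>%
    arrange(desc(total)) %>%
    head(n)
}

# Toy data for illustration only (not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 2, 1, 4),
  INJURIES   = c(10, 5, 7, 0),
  stringsAsFactors = FALSE
)

top_fatal <- top_events(toy, "FATALITIES", n = 2)
print(top_fatal)
```

With the real data, `top_events(data, "PROPDMG")` would play the role of the third summary above.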

OK, so now we have a list of categories we should target. I’m not going to do anything fancy here, just group up most of the data. Floods are a bit problematic because of the distinction between flash floods and regular floods, so I am going to group them into one category of flood. We can always come back and break them out later if we wish. I will group all of the winter events into one category as well. This should be good enough for our purposes.

data$EVTYPE[grep("tornado", data$EVTYPE, ignore.case = TRUE)] <- "Tornado"
data$EVTYPE[grep("heat", data$EVTYPE, ignore.case = TRUE)] <- "Excessive Heat"
data$EVTYPE[grep("flood", data$EVTYPE, ignore.case = TRUE)] <- "Flood"
data$EVTYPE[grep("lightning", data$EVTYPE, ignore.case = TRUE)] <- "Lightning"
data$EVTYPE[grep("thunderstorm", data$EVTYPE, ignore.case = TRUE)] <- "Thunderstorm"
data$EVTYPE[grep("tstm", data$EVTYPE, ignore.case = TRUE)] <- "Thunderstorm"
data$EVTYPE[grep("hail", data$EVTYPE, ignore.case = TRUE)] <- "Hail"
data$EVTYPE[grep("hurricane", data$EVTYPE, ignore.case = TRUE)] <- "Hurricane (Typhoon)"
data$EVTYPE[grep("typhoon", data$EVTYPE, ignore.case = TRUE)] <- "Hurricane (Typhoon)"
data$EVTYPE[grep("winter", data$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
data$EVTYPE[grep("snow", data$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
data$EVTYPE[grep("sleet", data$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
data$EVTYPE[grep("ice", data$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
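
Because each replacement above overwrites the label, the order of the patterns matters: a record is claimed by the first pattern that matches it, and later patterns only see the new label. The sketch below applies the same patterns in the same order from a lookup table, which makes that order explicit. The names `rules` and `regroup_evtype` are hypothetical, and the toy vector is made up for illustration.

```r
# Ordered pattern -> label table (same patterns and order as above)
rules <- c(
  tornado      = "Tornado",
  heat         = "Excessive Heat",
  flood        = "Flood",
  lightning    = "Lightning",
  thunderstorm = "Thunderstorm",
  tstm         = "Thunderstorm",
  hail         = "Hail",
  hurricane    = "Hurricane (Typhoon)",
  typhoon      = "Hurricane (Typhoon)",
  winter       = "Winter Storm",
  snow         = "Winter Storm",
  sleet        = "Winter Storm",
  ice          = "Winter Storm"
)

# Apply each rule in turn; an earlier match changes what later rules see
regroup_evtype <- function(evtype, rules) {
  for (pattern in names(rules)) {
    hit <- grepl(pattern, evtype, ignore.case = TRUE)
    evtype[hit] <- rules[[pattern]]
  }
  evtype
}

toy <- c("TORNADO F2", " FLASH FLOOD", "TSTM WIND/HAIL", "ICE STORM", "DROUGHT")
regrouped <- regroup_evtype(toy, rules)
print(regrouped)
```

Note that "TSTM WIND/HAIL" ends up under Thunderstorm rather than Hail, because the "tstm" rule runs before the "hail" rule, and that unmatched labels such as "DROUGHT" are left untouched.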

Results

First let's have a look at deaths and injuries.

DandI <- group_by(data,EVTYPE)
DandI <- summarise(DandI,deaths=sum(FATALITIES),injuries=sum(INJURIES))

# now let's grab the top 5 of each (which may overlap)
# use >= so that the 5th-ranked event itself is kept

tdeaths <- arrange(DandI,desc(deaths))$deaths[5]
tinjuries <- arrange(DandI,desc(injuries))$injuries[5]
DandI <- filter(DandI,deaths>=tdeaths | injuries>=tinjuries)
DandI$both <- DandI$deaths+DandI$injuries
DandI <- gather(DandI, type,total,deaths:both)
#DandI$EVTYPE <- as.factor(DandI$EVTYPE)
#DandI$EVTYPE <- factor(DandI$EVTYPE,levels(DandI$EVTYPE)[c(5,1,4,2,3)])
ggplot(DandI,aes(x=reorder(EVTYPE,desc(total)),y=total,fill=type)) +
  geom_bar(position="dodge",stat="identity") +
  ggtitle("Number of Deaths and Injuries for the Most Harmful Types of Events") +
  xlab("Event Type") +
  labs(fill="")

It is pretty clear that Tornadoes are the most dangerous type of event.
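
The selection of the most harmful events above works by comparing each total against the fifth-largest value. An alternative sketch, using only base R, ranks each measure and keeps any row that is in the top n for at least one of them. `top_n_union` is a hypothetical helper and the toy data frame is invented for illustration.

```r
# Keep rows that rank in the top n for at least one of the given measures
top_n_union <- function(df, measures, n = 5) {
  in_top <- lapply(measures, function(m) {
    rank(-df[[m]], ties.method = "min") <= n
  })
  df[Reduce(`|`, in_top), , drop = FALSE]
}

# Toy totals for illustration only
toy <- data.frame(
  EVTYPE   = c("A", "B", "C", "D", "E", "F"),
  deaths   = c(60, 50, 40, 30, 20, 10),
  injuries = c(100, 200, 300, 400, 500, 600),
  stringsAsFactors = FALSE
)

picked <- top_n_union(toy, c("deaths", "injuries"), n = 2)
print(picked)
```

With tied totals this can return more than n rows per measure, since tied values share the same rank.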

Moving on to property damage, we will repeat pretty much what we did before, but this time using the total property damage.

dmg <- group_by(data,EVTYPE)
dmg <- summarise(dmg,total=sum(PROPDMG))
dmg <- arrange(dmg,desc(total))
dmg <- head(dmg)
dmg
## Source: local data frame [6 x 2]
## 
##                EVTYPE        total
## 1               Flood 167528840813
## 2 Hurricane (Typhoon)  85356410010
## 3             Tornado  58593097867
## 4         STORM SURGE  43323536000
## 5                Hail  15974564013
## 6        Winter Storm  11765149361

Let's convert that to billions and throw it on a chart.

dmg$total <- dmg$total/1000000000
ggplot(dmg,aes(x=reorder(EVTYPE,desc(total)),y=total)) + 
  geom_bar(stat="identity",fill="blue") +
  ggtitle("Property Damage for Different Types of Event") +
  ylab("Property Damage (billions of USD)") +
  xlab("Event Type")

It is pretty clear that Floods cause the most property damage, with Hurricanes in second place.

Moving on to crop damage.

dmg <- group_by(data,EVTYPE)
dmg <- summarise(dmg,total=sum(CROPDMG))
dmg <- arrange(dmg,desc(total))
dmg <- head(dmg)
dmg
## Source: local data frame [6 x 2]
## 
##                EVTYPE       total
## 1             DROUGHT 13972566000
## 2               Flood 12379679100
## 3 Hurricane (Typhoon)  5516117800
## 4        Winter Storm  5204221400
## 5                Hail  3021887623
## 6        EXTREME COLD  1292973000

Let's convert that to billions and throw it on a chart.

dmg$total <- dmg$total/1000000000
ggplot(dmg,aes(x=reorder(EVTYPE,desc(total)),y=total)) + 
  geom_bar(stat="identity",fill="blue") +
  ggtitle("Crop Damage for Different Types of Event") +
  ylab("Crop Damage (billions of USD)") +
  xlab("Event Type")

We can see here that Droughts cause the most crop damage, while Floods come in a close second. I was surprised to see how much damage winter storms cause to crops.
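
The property and crop charts above follow identical steps: total a damage column per event type, keep the six largest, convert to billions and plot. As a sketch under those assumptions, the steps could be wrapped in one function. `plot_top_damage` is a hypothetical name, the aggregation uses base R instead of dplyr, and the toy data frame is invented for illustration.

```r
library(ggplot2)

# Total one damage column per event type, keep the n largest, plot in billions
plot_top_damage <- function(df, measure, title, ylab_text, n = 6) {
  totals <- aggregate(df[[measure]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- "total"
  totals <- head(totals[order(-totals$total), ], n)
  totals$total <- totals$total / 1e9  # convert to billions
  ggplot(totals, aes(x = reorder(EVTYPE, -total), y = total)) +
    geom_bar(stat = "identity", fill = "blue") +
    ggtitle(title) +
    ylab(ylab_text) +
    xlab("Event Type")
}

# Toy data for illustration only
toy <- data.frame(
  EVTYPE  = c("Flood", "Flood", "Hail", "DROUGHT"),
  PROPDMG = c(2e9, 1e9, 5e8, 1e8),
  stringsAsFactors = FALSE
)

p <- plot_top_damage(toy, "PROPDMG",
                     "Property Damage for Different Types of Event",
                     "Property Damage (in billions)")
```

Printing `p` draws the chart; with the real data the same call would take `data` and either "PROPDMG" or "CROPDMG".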