Synopsis

Storms cause public health and economic problems in the US. Preventing fatalities, injuries and asset loss is a major concern for local authorities.

The goal of this analysis was to identify the most hazardous weather events in terms of population health and the economy in the US. The analysis is based on data from the US National Oceanic and Atmospheric Administration (NOAA) covering events from 1950 to 2011.

The analysis shows that tornadoes are the most harmful events for population health and that floods are the most harmful events for the economy.

Prerequisite packages

This analysis used four packages: reshape2, dplyr, lubridate, ggplot2.

library(reshape2)
library(dplyr)
library(lubridate)
library(ggplot2)

Load data

Data from NOAA is available from cloud storage here. Data description is here. More information about the storm data is available in the FAQ.

The approach to loading the data is to download the compressed file from the URL if it is not found in the working directory, load it with read.csv, and then validate it by checking the file size and data dimensions against the values available in the supporting forum.

raw_data_url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("repdata_data_StormData.csv.bz2")) {
    download.file(url = raw_data_url, 
                  destfile = "repdata_data_StormData.csv.bz2")
    }

file_info <- file.info("repdata_data_StormData.csv.bz2")
storm_data <- read.csv("repdata_data_StormData.csv.bz2", sep = ",", header = TRUE)

# Validate the download: expected file size in bytes and expected dimensions
stopifnot(file_info$size == 49177144)
stopifnot(all(dim(storm_data) == c(902297, 37)))

Data processing

Subsetting dataset

Seven variables are required for this analysis: event type, fatalities, injuries, property damage, property damage multiplier, crop damage and crop damage multiplier. They can be identified from the variable names in the raw data.

names(storm_data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

The list of necessary variables follows:
- EVTYPE - event type;
- FATALITIES - number of fatalities in event;
- INJURIES - number of injuries in event;
- PROPDMG - property damage;
- PROPDMGEXP - property damage multiplier;
- CROPDMG - crop damage;
- CROPDMGEXP - crop damage multiplier.

The dataset is reduced to these variables.

subset_data1 <- storm_data %>% 
    select(event_type = EVTYPE,
           fatalities = FATALITIES,
           injuries = INJURIES,
           property_damage = PROPDMG,
           property_multiplier = PROPDMGEXP,
           crop_damage = CROPDMG,
           crop_multiplier = CROPDMGEXP)

Some observations have the event type NONE; these are dropped. In addition, only observations with at least one value greater than zero (fatalities, injuries, property damage or crop damage) are kept.

# Parentheses are needed because & binds tighter than | in R
subset_data2 <- subset_data1 %>% 
    filter((fatalities > 0 | injuries > 0 | 
                property_damage > 0 | crop_damage > 0) & event_type != "NONE")
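
R evaluates & before |, so an unparenthesised mix of the two operators groups differently from how it reads. A minimal sketch on a made-up data frame (toy and its values are hypothetical, not taken from the storm data) shows the difference:

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  event_type  = c("NONE", "FLOOD", "NONE", "HAIL"),
  fatalities  = c(1, 0, 0, 0),
  crop_damage = c(0, 2, 3, 0),
  stringsAsFactors = FALSE
)

# Parsed as: fatalities > 0 | (crop_damage > 0 & event_type != "NONE")
loose <- toy[toy$fatalities > 0 |
               toy$crop_damage > 0 & toy$event_type != "NONE", ]

# Parsed as: (fatalities > 0 | crop_damage > 0) & event_type != "NONE"
strict <- toy[(toy$fatalities > 0 | toy$crop_damage > 0) &
                toy$event_type != "NONE", ]

loose$event_type   # still contains a NONE row that has a fatality
strict$event_type  # NONE rows are excluded regardless of their values
```

In the storm subset both forms give the same rows, since NONE does not appear among the remaining event types, but the parenthesised form states the intent directly.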

The subset is reduced to 254633 rows. Check that there are no missing values in the subset.

dim(subset_data2)
## [1] 254633      7
sum(complete.cases(subset_data2))
## [1] 254633

Modifying event type

There are 48 official event types. The subset contains 488 unique event types.

length(unique(subset_data2$event_type))
## [1] 488

The reason for the large number of types is that the dataset contains typos and the same events written in different letter case, such as wind and WIND.

unique(subset_data2$event_type)
##   [1] TORNADO                        TSTM WIND                     
##   [3] HAIL                           ICE STORM/FLASH FLOOD         
##   [5] WINTER STORM                   HURRICANE OPAL/HIGH WINDS     
##   [7] THUNDERSTORM WINDS             HURRICANE ERIN                
##   [9] HURRICANE OPAL                 HEAVY RAIN                    
##  [11] LIGHTNING                      THUNDERSTORM WIND             
##  [13] DENSE FOG                      RIP CURRENT                   
##  [15] THUNDERSTORM WINS              FLASH FLOODING                
##  [17] FLASH FLOOD                    TORNADO F0                    
##  [19] THUNDERSTORM WINDS LIGHTNING   THUNDERSTORM WINDS/HAIL       
##  [21] HEAT                           HIGH WINDS                    
##  [23] WIND                           HEAVY RAINS                   
##  [25] LIGHTNING AND HEAVY RAIN       THUNDERSTORM WINDS HAIL       
##  [27] COLD                           HEAVY RAIN/LIGHTNING          
##  [29] FLASH FLOODING/THUNDERSTORM WI FLOODING                      
##  [31] WATERSPOUT                     EXTREME COLD                  
##  [33] LIGHTNING/HEAVY RAIN           BREAKUP FLOODING              
##  [35] HIGH WIND                      FREEZE                        
##  [37] RIVER FLOOD                    HIGH WINDS HEAVY RAINS        
##  [39] AVALANCHE                      MARINE MISHAP                 
##  [41] HIGH TIDES                     HIGH WIND/SEAS                
##  [43] HIGH WINDS/HEAVY RAIN          HIGH SEAS                     
##  [45] COASTAL FLOOD                  SEVERE TURBULENCE             
##  [47] RECORD RAINFALL                HEAVY SNOW                    
##  [49] HEAVY SNOW/WIND                DUST STORM                    
##  [51] FLOOD                          APACHE COUNTY                 
##  [53] SLEET                          DUST DEVIL                    
##  [55] ICE STORM                      EXCESSIVE HEAT                
##  [57] THUNDERSTORM WINDS/FUNNEL CLOU GUSTY WINDS                   
##  [59] FLOODING/HEAVY RAIN            HEAVY SURF COASTAL FLOODING   
##  [61] HIGH SURF                      WILD FIRES                    
##  [63] HIGH                           WINTER STORM HIGH WINDS       
##  [65] WINTER STORMS                  MUDSLIDES                     
##  [67] RAINSTORM                      SEVERE THUNDERSTORM           
##  [69] SEVERE THUNDERSTORMS           SEVERE THUNDERSTORM WINDS     
##  [71] THUNDERSTORMS WINDS            FLOOD/FLASH FLOOD             
##  [73] FLOOD/RAIN/WINDS               THUNDERSTORMS                 
##  [75] FLASH FLOOD WINDS              WINDS                         
##  [77] FUNNEL CLOUD                   HIGH WIND DAMAGE              
##  [79] STRONG WIND                    HEAVY SNOWPACK                
##  [81] FLASH FLOOD/                   HEAVY SURF                    
##  [83] DRY MIRCOBURST WINDS           DRY MICROBURST                
##  [85] URBAN FLOOD                    THUNDERSTORM WINDSS           
##  [87] MICROBURST WINDS               HEAT WAVE                     
##  [89] UNSEASONABLY WARM              COASTAL FLOODING              
##  [91] STRONG WINDS                   BLIZZARD                      
##  [93] WATERSPOUT/TORNADO             WATERSPOUT TORNADO            
##  [95] STORM SURGE                    URBAN/SMALL STREAM FLOOD      
##  [97] WATERSPOUT-                    TORNADOES, TSTM WIND, HAIL    
##  [99] TROPICAL STORM ALBERTO         TROPICAL STORM                
## [101] TROPICAL STORM GORDON          TROPICAL STORM JERRY          
## [103] LIGHTNING THUNDERSTORM WINDS   URBAN FLOODING                
## [105] MINOR FLOODING                 WATERSPOUT-TORNADO            
## [107] LIGHTNING INJURY               LIGHTNING AND THUNDERSTORM WIN
## [109] FLASH FLOODS                   THUNDERSTORM WINDS53          
## [111] WILDFIRE                       DAMAGING FREEZE               
## [113] THUNDERSTORM WINDS 13          HURRICANE                     
## [115] SNOW                           LIGNTNING                     
## [117] FROST                          FREEZING RAIN/SNOW            
## [119] HIGH WINDS/                    THUNDERSNOW                   
## [121] FLOODS                         COOL AND WET                  
## [123] HEAVY RAIN/SNOW                GLAZE ICE                     
## [125] MUD SLIDE                      HIGH  WINDS                   
## [127] RURAL FLOOD                    MUD SLIDES                    
## [129] EXTREME HEAT                   DROUGHT                       
## [131] COLD AND WET CONDITIONS        EXCESSIVE WETNESS             
## [133] SLEET/ICE STORM                GUSTNADO                      
## [135] FREEZING RAIN                  SNOW AND HEAVY SNOW           
## [137] GROUND BLIZZARD                EXTREME WIND CHILL            
## [139] MAJOR FLOOD                    SNOW/HEAVY SNOW               
## [141] FREEZING RAIN/SLEET            ICE JAM FLOODING              
## [143] COLD AIR TORNADO               WIND DAMAGE                   
## [145] FOG                            TSTM WIND 55                  
## [147] SMALL STREAM FLOOD             THUNDERTORM WINDS             
## [149] HAIL/WINDS                     SNOW AND ICE                  
## [151] WIND STORM                     GRASS FIRES                   
## [153] LAKE FLOOD                     HAIL/WIND                     
## [155] WIND/HAIL                      ICE                           
## [157] SNOW AND ICE STORM             THUNDERSTORM  WINDS           
## [159] WINTER WEATHER                 DROUGHT/EXCESSIVE HEAT        
## [161] THUNDERSTORMS WIND             TUNDERSTORM WIND              
## [163] URBAN AND SMALL STREAM FLOODIN THUNDERSTORM WIND/LIGHTNING   
## [165] HEAVY RAIN/SEVERE WEATHER      THUNDERSTORM                  
## [167] WATERSPOUT/ TORNADO            LIGHTNING.                    
## [169] HURRICANE-GENERATED SWELLS     RIVER AND STREAM FLOOD        
## [171] HIGH WINDS/COASTAL FLOOD       RAIN                          
## [173] RIVER FLOODING                 ICE FLOES                     
## [175] THUNDERSTORM WIND G50          LIGHTNING FIRE                
## [177] HEAVY LAKE SNOW                RECORD COLD                   
## [179] HEAVY SNOW/FREEZING RAIN       COLD WAVE                     
## [181] DUST DEVIL WATERSPOUT          TORNADO F3                    
## [183] TORNDAO                        FLOOD/RIVER FLOOD             
## [185] MUD SLIDES URBAN FLOODING      TORNADO F1                    
## [187] GLAZE/ICE STORM                GLAZE                         
## [189] HEAVY SNOW/WINTER STORM        MICROBURST                    
## [191] AVALANCE                       BLIZZARD/WINTER STORM         
## [193] DUST STORM/HIGH WINDS          ICE JAM                       
## [195] FOREST FIRES                   FROST\\FREEZE                 
## [197] THUNDERSTORM WINDS.            HVY RAIN                      
## [199] HAIL 150                       HAIL 075                      
## [201] HAIL 100                       THUNDERSTORM WIND G55         
## [203] HAIL 125                       THUNDERSTORM WIND G60         
## [205] THUNDERSTORM WINDS G60         HARD FREEZE                   
## [207] HAIL 200                       HEAVY SNOW AND HIGH WINDS     
## [209] HEAVY SNOW/HIGH WINDS & FLOOD  HEAVY RAIN AND FLOOD          
## [211] RIP CURRENTS/HEAVY SURF        URBAN AND SMALL               
## [213] WILDFIRES                      FOG AND COLD TEMPERATURES     
## [215] SNOW/COLD                      FLASH FLOOD FROM ICE JAMS     
## [217] TSTM WIND G58                  MUDSLIDE                      
## [219] HEAVY SNOW SQUALLS             SNOW SQUALL                   
## [221] SNOW/ICE STORM                 HEAVY SNOW/SQUALLS            
## [223] HEAVY SNOW-SQUALLS             ICY ROADS                     
## [225] HEAVY MIX                      SNOW FREEZING RAIN            
## [227] SNOW/SLEET                     SNOW/FREEZING RAIN            
## [229] SNOW SQUALLS                   SNOW/SLEET/FREEZING RAIN      
## [231] RECORD SNOW                    HAIL 0.75                     
## [233] RECORD HEAT                    THUNDERSTORM WIND 65MPH       
## [235] THUNDERSTORM WIND/ TREES       THUNDERSTORM WIND/AWNING      
## [237] THUNDERSTORM WIND 98 MPH       THUNDERSTORM WIND TREES       
## [239] TORNADO F2                     RIP CURRENTS                  
## [241] HURRICANE EMILY                COASTAL SURGE                 
## [243] HURRICANE GORDON               HURRICANE FELIX               
## [245] THUNDERSTORM WIND 60 MPH       THUNDERSTORM WINDS 63 MPH     
## [247] THUNDERSTORM WIND/ TREE        THUNDERSTORM DAMAGE TO        
## [249] THUNDERSTORM WIND 65 MPH       FLASH FLOOD - HEAVY RAIN      
## [251] THUNDERSTORM WIND.             FLASH FLOOD/ STREET           
## [253] BLOWING SNOW                   HEAVY SNOW/BLIZZARD           
## [255] THUNDERSTORM HAIL              THUNDERSTORM WINDSHAIL        
## [257] LIGHTNING  WAUSEON             THUDERSTORM WINDS             
## [259] ICE AND SNOW                   STORM FORCE WINDS             
## [261] HEAVY SNOW/ICE                 LIGHTING                      
## [263] HIGH WIND/HEAVY SNOW           THUNDERSTORM WINDS AND        
## [265] HEAVY PRECIPITATION            HIGH WIND/BLIZZARD            
## [267] TSTM WIND DAMAGE               FLOOD FLASH                   
## [269] RAIN/WIND                      SNOW/ICE                      
## [271] HAIL 75                        HEAT WAVE DROUGHT             
## [273] HEAVY SNOW/BLIZZARD/AVALANCHE  HEAT WAVES                    
## [275] UNSEASONABLY WARM AND DRY      UNSEASONABLY COLD             
## [277] RECORD/EXCESSIVE HEAT          THUNDERSTORM WIND G52         
## [279] HIGH WAVES                     FLASH FLOOD/FLOOD             
## [281] FLOOD/FLASH                    LOW TEMPERATURE               
## [283] HEAVY RAINS/FLOODING           THUNDERESTORM WINDS           
## [285] THUNDERSTORM WINDS/FLOODING    HYPOTHERMIA                   
## [287] THUNDEERSTORM WINDS            THUNERSTORM WINDS             
## [289] HIGH WINDS/COLD                COLD/WINDS                    
## [291] SNOW/ BITTER COLD              COLD WEATHER                  
## [293] RAPIDLY RISING WATER           WILD/FOREST FIRE              
## [295] ICE/STRONG WINDS               SNOW/HIGH WINDS               
## [297] HIGH WINDS/SNOW                SNOWMELT FLOODING             
## [299] HEAVY SNOW AND STRONG WINDS    SNOW ACCUMULATION             
## [301] SNOW/ ICE                      SNOW/BLOWING SNOW             
## [303] TORNADOES                      THUNDERSTORM WIND/HAIL        
## [305] FREEZING DRIZZLE               HAIL 175                      
## [307] FLASH FLOODING/FLOOD           HAIL 275                      
## [309] HAIL 450                       EXCESSIVE RAINFALL            
## [311] THUNDERSTORMW                  HAILSTORM                     
## [313] TSTM WINDS                     TSTMW                         
## [315] TSTM WIND 65)                  TROPICAL STORM DEAN           
## [317] THUNDERSTORM WINDS/ FLOOD      LANDSLIDE                     
## [319] HIGH WIND AND SEAS             THUNDERSTORMWINDS             
## [321] WILD/FOREST FIRES              HEAVY SEAS                    
## [323] HAIL DAMAGE                    FLOOD & HEAVY RAIN            
## [325] ?                              THUNDERSTROM WIND             
## [327] FLOOD/FLASHFLOOD               HIGH WATER                    
## [329] HIGH WIND 48                   LANDSLIDES                    
## [331] URBAN/SMALL STREAM             BRUSH FIRE                    
## [333] HEAVY SHOWER                   HEAVY SWELLS                  
## [335] URBAN SMALL                    URBAN FLOODS                  
## [337] FLASH FLOOD/LANDSLIDE          HEAVY RAIN/SMALL STREAM URBAN 
## [339] FLASH FLOOD LANDSLIDES         TSTM WIND/HAIL                
## [341] Other                          Ice jam flood (minor          
## [343] Tstm Wind                      URBAN/SML STREAM FLD          
## [345] ROUGH SURF                     Heavy Surf                    
## [347] Dust Devil                     Marine Accident               
## [349] Freeze                         Strong Wind                   
## [351] COASTAL STORM                  Erosion/Cstl Flood            
## [353] River Flooding                 Damaging Freeze               
## [355] Beach Erosion                  High Surf                     
## [357] Heavy Rain/High Surf           Unseasonable Cold             
## [359] Early Frost                    Wintry Mix                    
## [361] Extreme Cold                   Coastal Flooding              
## [363] Torrential Rainfall            Landslump                     
## [365] Hurricane Edouard              Coastal Storm                 
## [367] TIDAL FLOODING                 Tidal Flooding                
## [369] Strong Winds                   EXTREME WINDCHILL             
## [371] Glaze                          Extended Cold                 
## [373] Whirlwind                      Heavy snow shower             
## [375] Light snow                     Light Snow                    
## [377] MIXED PRECIP                   Freezing Spray                
## [379] DOWNBURST                      Mudslides                     
## [381] Microburst                     Mudslide                      
## [383] Cold                           Coastal Flood                 
## [385] Snow Squalls                   Wind Damage                   
## [387] Light Snowfall                 Freezing Drizzle              
## [389] Gusty wind/rain                GUSTY WIND/HVY RAIN           
## [391] Wind                           Cold Temperature              
## [393] Heat Wave                      Snow                          
## [395] COLD AND SNOW                  RAIN/SNOW                     
## [397] TSTM WIND (G45)                Gusty Winds                   
## [399] GUSTY WIND                     TSTM WIND 40                  
## [401] TSTM WIND 45                   TSTM WIND (41)                
## [403] TSTM WIND (G40)                Frost/Freeze                  
## [405] AGRICULTURAL FREEZE            OTHER                         
## [407] Hypothermia/Exposure           HYPOTHERMIA/EXPOSURE          
## [409] Lake Effect Snow               Freezing Rain                 
## [411] Mixed Precipitation            BLACK ICE                     
## [413] COASTALSTORM                   LIGHT SNOW                    
## [415] DAM BREAK                      Gusty winds                   
## [417] blowing snow                   GRADIENT WIND                 
## [419] TSTM WIND AND LIGHTNING        gradient wind                 
## [421] Gradient wind                  Freezing drizzle              
## [423] WET MICROBURST                 Heavy surf and wind           
## [425] TYPHOON                        HIGH SWELLS                   
## [427] SMALL HAIL                     UNSEASONAL RAIN               
## [429] COASTAL FLOODING/EROSION        TSTM WIND (G45)              
## [431] TSTM WIND  (G45)               HIGH WIND (G40)               
## [433] TSTM WIND (G35)                COASTAL EROSION               
## [435] SEICHE                         COASTAL  FLOODING/EROSION     
## [437] HYPERTHERMIA/EXPOSURE          WINTRY MIX                    
## [439] ROCK SLIDE                     GUSTY WIND/HAIL               
## [441]  TSTM WIND                     LANDSPOUT                     
## [443] EXCESSIVE SNOW                 LAKE EFFECT SNOW              
## [445] FLOOD/FLASH/FLOOD              MIXED PRECIPITATION           
## [447] WIND AND WAVE                  LIGHT FREEZING RAIN           
## [449] ICE ROADS                      ROUGH SEAS                    
## [451] TSTM WIND G45                  NON-SEVERE WIND DAMAGE        
## [453] WARM WEATHER                   THUNDERSTORM WIND (G40)       
## [455]  FLASH FLOOD                   LATE SEASON SNOW              
## [457] WINTER WEATHER MIX             ROGUE WAVE                    
## [459] FALLING SNOW/ICE               NON-TSTM WIND                 
## [461] NON TSTM WIND                  BLOWING DUST                  
## [463] VOLCANIC ASH                      HIGH SURF ADVISORY         
## [465] HAZARDOUS SURF                 WHIRLWIND                     
## [467] ICE ON ROAD                    DROWNING                      
## [469] EXTREME COLD/WIND CHILL        MARINE TSTM WIND              
## [471] HURRICANE/TYPHOON              WINTER WEATHER/MIX            
## [473] FROST/FREEZE                   ASTRONOMICAL HIGH TIDE        
## [475] HEAVY SURF/HIGH SURF           TROPICAL DEPRESSION           
## [477] LAKE-EFFECT SNOW               MARINE HIGH WIND              
## [479] TSUNAMI                        STORM SURGE/TIDE              
## [481] COLD/WIND CHILL                LAKESHORE FLOOD               
## [483] MARINE THUNDERSTORM WIND       MARINE STRONG WIND            
## [485] ASTRONOMICAL LOW TIDE          DENSE SMOKE                   
## [487] MARINE HAIL                    FREEZING FOG                  
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD  FLASH FLOOD ... WND

One way to reduce this problem is to convert all event types to lowercase. This reduces the number of event types to 447.

subset_data3 <- subset_data2 %>% 
    mutate(event_type = tolower(event_type))
length(unique(subset_data3$event_type))
## [1] 447
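
Lowercasing does not merge labels that differ only in whitespace, such as " TSTM WIND" or "HIGH  WINDS" in the listing above. A possible extra normalisation step could look like the sketch below. It is not part of the original analysis, so the counts reported here would change if it were applied, and normalise_event is a hypothetical helper name.

```r
# Hypothetical helper: lowercase, trim, and collapse repeated whitespace
normalise_event <- function(x) {
  x <- tolower(as.character(x))
  x <- trimws(x)
  gsub("[[:space:]]+", " ", x)
}

# Made-up sample of labels in the style of the raw event types
raw <- c("WIND", "Wind", " TSTM WIND", "HIGH  WINDS", "High Winds")
clean <- normalise_event(raw)
unique(clean)
```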

Further reduction is achieved by grouping the event types using keywords from the official event type names. Events that do not match any keyword are grouped into other.

subset_data3$event <- "other"
subset_data3$event[grep("avalanche", subset_data3$event_type)] <- "avalanche"
subset_data3$event[grep("blizzard", subset_data3$event_type)] <- "snow"
subset_data3$event[grep("flood", subset_data3$event_type)] <- "flood"
subset_data3$event[grep("wind", subset_data3$event_type)] <- "wind"
subset_data3$event[grep("fog", subset_data3$event_type)] <- "fog"
subset_data3$event[grep("cold", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("chill", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("frost", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("freeze", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("tornado", subset_data3$event_type)] <- "tornado"
subset_data3$event[grep("hail", subset_data3$event_type)] <- "hail"
subset_data3$event[grep("winds", subset_data3$event_type)] <- "wind"
subset_data3$event[grep("win", subset_data3$event_type)] <- "wind"
subset_data3$event[grep("wins", subset_data3$event_type)] <- "wind"
subset_data3$event[grep("storm", subset_data3$event_type)] <- "storm"
subset_data3$event[grep("rainstorm", subset_data3$event_type)] <- "storm"
subset_data3$event[grep("thunderstorm", subset_data3$event_type)] <- "storm"
subset_data3$event[grep("snow", subset_data3$event_type)] <- "snow"
subset_data3$event[grep("rain", subset_data3$event_type)] <- "rain"
subset_data3$event[grep("heat", subset_data3$event_type)] <- "heat"
subset_data3$event[grep("hurricane", subset_data3$event_type)] <- "hurricane"
subset_data3$event[grep("fld", subset_data3$event_type)] <- "flood"
subset_data3$event[grep("current", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("surf", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("fire", subset_data3$event_type)] <- "fire"
subset_data3$event[grep("water", subset_data3$event_type)] <- "flood"
subset_data3$event[grep("wave", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("tsunami", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("lightning", subset_data3$event_type)] <- "storm"
subset_data3$event[grep("warm", subset_data3$event_type)] <- "heat"
subset_data3$event[grep("torndao", subset_data3$event_type)] <- "tornado"
subset_data3$event[grep("high tides", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("high seas", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("marine mishap", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("slide", subset_data3$event_type)] <- "slides"
subset_data3$event[grep("dust devil", subset_data3$event_type)] <- "heat"
subset_data3$event[grep("dry microburst", subset_data3$event_type)] <- "heat"
subset_data3$event[grep("low temperature", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("freezing spray", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("dam break", subset_data3$event_type)] <- "flood"
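
Because each assignment overwrites the earlier ones, the last matching keyword decides the group when a label matches several keywords. The same logic can be sketched as an ordered rule table. The names rules and classify_event are hypothetical, and the table is a short excerpt of the keywords above rather than the full list.

```r
# Hypothetical, abbreviated rule table: keyword -> group, applied in order
rules <- c(flood = "flood", wind = "wind", storm = "storm", snow = "snow")

classify_event <- function(event_type, rules) {
  event <- rep("other", length(event_type))
  for (pattern in names(rules)) {
    # Later rules overwrite earlier ones, as in the assignments above
    event[grepl(pattern, event_type, fixed = TRUE)] <- rules[[pattern]]
  }
  event
}

labels <- c("flash flood", "thunderstorm wind", "heavy snow/wind", "drought")
groups <- classify_event(labels, rules)
groups
```

With this ordering, "thunderstorm wind" ends up in storm rather than wind, and "heavy snow/wind" ends up in snow, so the position of a rule in the table is part of the classification.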

This operation reduces the event types in the dataset to 16 unique values.

length(unique(subset_data3$event))
## [1] 16

The top 10 event types by number of rows cover 99% of all observations.

sum(sort(table(subset_data3$event), decreasing = TRUE)[1:10]) / nrow(subset_data3)
## [1] 0.9927072

The dataset is reduced to the top 10 events.

sum(sort(table(subset_data3$event), decreasing = TRUE)[1:10])
## [1] 252776
sort(table(subset_data3$event), decreasing = TRUE)[1:10]
## 
##    wind   storm tornado   flood    hail    snow    fire    rain    heat   waves 
##   74491   72246   39961   33223   26157    2134    1258    1246    1126     934
subset_data4 <- subset_data3 %>% 
    filter(event %in% c("wind", "storm", "tornado", "flood", "hail", "snow", 
                        "fire", "rain", "heat", "waves"))
dim(subset_data4)
## [1] 252776      8
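
As a quick cross-check, the coverage can be recomputed from the counts printed above. This sketch only re-enters those published numbers by hand; the variable names are illustrative.

```r
# Row counts per event group, copied from the table above
top10 <- c(wind = 74491, storm = 72246, tornado = 39961, flood = 33223,
           hail = 26157, snow = 2134, fire = 1258, rain = 1246,
           heat = 1126, waves = 934)
total_rows <- 254633  # rows in the filtered subset

coverage  <- sum(top10) / total_rows      # share covered by all ten groups
cum_share <- cumsum(top10) / total_rows   # running share, largest group first
round(cum_share, 4)
```

The running share makes it visible that the first five groups already account for most rows, while the remaining five add comparatively little.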

Cleaning units

Property damage multiplier and crop damage multiplier contain different units.

unique(subset_data4$property_multiplier)
##  [1] K M   B + 0 5 m 6 4 h 2 7 3 H -
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(subset_data4$crop_multiplier)
## [1]   K M B ? 0 k
## Levels:  ? 0 2 B k K m M

Units such as k or K, m or M, and B denote thousands, millions and billions of USD. Digits from 1 to 8 give the exponent of the multiplier:

\[ e = 10^k \]

where $e$ is the multiplier and $k$ is a digit from 1 to 8. The digits 1 and 8 do not occur in this subset.

Other symbols do not translate to a currency multiplier directly. For the purpose of this analysis those symbols are treated as a multiplier of 1 (exponent 0).

subset_data5 <- subset_data4 %>% 
    mutate(property_multiplier = tolower(property_multiplier),
           crop_multiplier = tolower(crop_multiplier))

subset_data5$property_multiplier <- as.character(subset_data5$property_multiplier)
subset_data5$property_multiplier[is.na(subset_data5$property_multiplier)] <- 0
subset_data5$property_multiplier[!grepl("k|m|b|h|2|3|4|5|6|7", subset_data5$property_multiplier)] <- 0
subset_data5$property_multiplier[grep("k", subset_data5$property_multiplier)] <- "3"
subset_data5$property_multiplier[grep("m", subset_data5$property_multiplier)] <- "6"
subset_data5$property_multiplier[grep("b", subset_data5$property_multiplier)] <- "9"
subset_data5$property_multiplier[grep("h", subset_data5$property_multiplier)] <- "2"

subset_data5$crop_multiplier <- as.character(subset_data5$crop_multiplier)
subset_data5$crop_multiplier[is.na(subset_data5$crop_multiplier)] <- 0
subset_data5$crop_multiplier[!grepl("k|m|b|h|2|3|4|5|6|7", subset_data5$crop_multiplier)] <- 0
subset_data5$crop_multiplier[grep("k", subset_data5$crop_multiplier)] <- "3"
subset_data5$crop_multiplier[grep("m", subset_data5$crop_multiplier)] <- "6"
subset_data5$crop_multiplier[grep("b", subset_data5$crop_multiplier)] <- "9"
subset_data5$crop_multiplier[grep("h", subset_data5$crop_multiplier)] <- "2"

subset_data5$property_multiplier <- as.numeric(as.character(subset_data5$property_multiplier))
subset_data5$crop_multiplier <- as.numeric(as.character(subset_data5$crop_multiplier))

subset_data5 <- subset_data5 %>% 
    mutate(prop_damage = property_damage * 10^property_multiplier,
           crop_damage = crop_damage * 10^crop_multiplier) %>% 
    select(event, fatalities, injuries, crop_damage, prop_damage)

head(subset_data5)
##     event fatalities injuries crop_damage prop_damage
## 1 tornado          0       15           0       25000
## 2 tornado          0        0           0        2500
## 3 tornado          0        2           0       25000
## 4 tornado          0        2           0        2500
## 5 tornado          0        2           0        2500
## 6 tornado          0        6           0        2500
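
The unit conversion above can also be expressed as a lookup table, which keeps the letter-to-exponent mapping in one place. This is a minimal sketch under the same assumptions as the code above (letters h, k, m, b and digits 2 to 7 are kept, everything else becomes exponent 0). The names exponent_lookup and to_exponent are hypothetical.

```r
# Letter codes and their powers of ten
exponent_lookup <- c(h = 2, k = 3, m = 6, b = 9)

# Hypothetical helper: convert a multiplier code to a power-of-ten exponent
to_exponent <- function(code) {
  code <- tolower(as.character(code))
  out <- rep(0, length(code))              # default: exponent 0, multiplier 1
  is_letter <- code %in% names(exponent_lookup)
  out[is_letter] <- unname(exponent_lookup[code[is_letter]])
  is_digit <- code %in% as.character(2:7)
  out[is_digit] <- as.numeric(code[is_digit])
  out
}

codes <- c("K", "m", "B", "h", "5", "", "+", "?", "0")
exponents <- to_exponent(codes)
exponents

# Illustrative value: a recorded damage of 25 with code "K"
damage_usd <- 25 * 10^to_exponent("K")
damage_usd
```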

Data aggregation by health and economic results

The data is aggregated by event type using the summarise function, resulting in two datasets: health and economic. Economic figures are converted to billions of USD.

health_dt <- subset_data5 %>% 
    group_by(event) %>% summarise(fatalities = sum(fatalities),
                                  injuries = sum(injuries))
fatal_h <- health_dt %>% 
    select(event, result = fatalities) %>% 
    mutate(type = "fatal")
injuries_h <- health_dt %>% 
    select(event, result = injuries) %>% 
    mutate(type = "injury")
health_dt <- rbind(fatal_h, injuries_h)
health_dt
## # A tibble: 20 x 3
##    event   result type  
##    <chr>    <dbl> <chr> 
##  1 fire        90 fatal 
##  2 flood     1562 fatal 
##  3 hail        15 fatal 
##  4 heat      3002 fatal 
##  5 rain       114 fatal 
##  6 snow       265 fatal 
##  7 storm     1450 fatal 
##  8 tornado   5633 fatal 
##  9 waves      968 fatal 
## 10 wind      1291 fatal 
## 11 fire      1608 injury
## 12 flood     8753 injury
## 13 hail      1371 injury
## 14 heat      8920 injury
## 15 rain       305 injury
## 16 snow      1969 injury
## 17 storm    11923 injury
## 18 tornado  91364 injury
## 19 waves     1313 injury
## 20 wind      9616 injury
economic_dt <- subset_data5 %>% 
    group_by(event) %>% summarise(crop_damage_bln = sum(crop_damage) / 1000000000, 
                                  prop_damage_bln = sum(prop_damage) / 1000000000)
prop_econ <- economic_dt %>% 
    select(event, result = prop_damage_bln) %>% 
    mutate(type = "property damage")
crop_econ <- economic_dt %>% 
    select(event, result = crop_damage_bln) %>% 
    mutate(type = "crop damage")
economic_dt <- rbind(prop_econ, crop_econ)
economic_dt
## # A tibble: 20 x 3
##    event      result type           
##    <chr>       <dbl> <chr>          
##  1 fire      8.50    property damage
##  2 flood   168.      property damage
##  3 hail     15.7     property damage
##  4 heat      0.0171  property damage
##  5 rain      3.25    property damage
##  6 snow      1.68    property damage
##  7 storm    74.2     property damage
##  8 tornado  57.0     property damage
##  9 waves     0.271   property damage
## 10 wind     12.4     property damage
## 11 fire      0.403   crop damage    
## 12 flood    12.3     crop damage    
## 13 hail      3.05    crop damage    
## 14 heat      0.899   crop damage    
## 15 rain      0.918   crop damage    
## 16 snow      0.247   crop damage    
## 17 storm     6.42    crop damage    
## 18 tornado   0.415   crop damage    
## 19 waves     0.00712 crop damage    
## 20 wind      1.41    crop damage

Results

Which types of events are most harmful to population health

Tornadoes are the most harmful of all severe weather events in the US. The total number of injuries and deaths is 96997, which is about 7 times more than for storms.

health_dts <- health_dt %>% 
    group_by(event) %>% summarise(result = sum(result)) %>% 
    arrange(desc(result))
health_dts
## # A tibble: 10 x 2
##    event   result
##    <chr>    <dbl>
##  1 tornado  96997
##  2 storm    13373
##  3 heat     11922
##  4 wind     10907
##  5 flood    10315
##  6 waves     2281
##  7 snow      2234
##  8 fire      1698
##  9 hail      1386
## 10 rain       419
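
The totals and the tornado-to-storm ratio can be checked directly from the fatality and injury counts in the health table above. The sketch below re-enters those published figures by hand; the data frame and column names are illustrative.

```r
# Fatalities and injuries per event group, copied from the health table
health <- data.frame(
  event  = c("fire", "flood", "hail", "heat", "rain",
             "snow", "storm", "tornado", "waves", "wind"),
  fatal  = c(90, 1562, 15, 3002, 114, 265, 1450, 5633, 968, 1291),
  injury = c(1608, 8753, 1371, 8920, 305, 1969, 11923, 91364, 1313, 9616),
  stringsAsFactors = FALSE
)

health$total <- health$fatal + health$injury
ranked <- health[order(-health$total), ]

# Ratio of the most harmful group to the runner-up
ratio <- ranked$total[1] / ranked$total[2]
# Share of all casualties in the top-10 groups attributed to the leader
top_share <- ranked$total[1] / sum(ranked$total)
round(c(ratio = ratio, top_share = top_share), 2)
```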

Distribution of health impact is given in the figure below.

ggplot(health_dt) +
    theme_bw() +
    ggtitle("Weather impact on population health") +
    xlab("Event type") + ylab("Fatalities and Injuries") +
    geom_col(aes(reorder(event, result), result, fill = type))

Which types of events have the greatest economic consequences

In economic terms, floods caused a loss of 181 bln USD, which makes them the most harmful event.

economic_dts <- economic_dt %>% 
    group_by(event) %>% summarise(result = sum(result)) %>% 
    arrange(desc(result))
economic_dts
## # A tibble: 10 x 2
##    event    result
##    <chr>     <dbl>
##  1 flood   181.   
##  2 storm    80.6  
##  3 tornado  57.4  
##  4 hail     18.8  
##  5 wind     13.8  
##  6 fire      8.90 
##  7 rain      4.17 
##  8 snow      1.93 
##  9 heat      0.916
## 10 waves     0.278

Distribution of economic impact is given in the figure below.

ggplot(economic_dt) +
    theme_bw() +
    ggtitle("Weather impact on economy") +
    xlab("Event type") + ylab("Loss in bln USD") +
    geom_col(aes(reorder(event, result), result, fill = type))

Conclusion

Tornadoes are the major cause of death and injury among severe weather events. They caused almost 100k deaths and injuries. As for the economy, floods are the most harmful, with 181 bln USD in total loss. The major source of loss is property damage.