Analysis of NOAA Storm Data and identifying the sources of events which have the greatest impact on population health and economy of the United States
Synopsis
What is meant by harmful? On exploring the dataset we have found that FATALITIES and INJURIES are two variables that explain the term: HARMFUL
Now the variable BGN_DATE is one variable we can use for analysis. We are given to understand that the data is right from 1950 to 2011 i.e. almost 6 decades. Now based on mentor feedback and some EDA & Research I found that only Tornadoes, Thunderstorm Wind and Hail were recorded till 1995. Hence let us just exclude data till 1995.
Then we have another variable WFO. Let us explore that as well.
Now EVTYPE should ideally have only 48 values but has 1000+. Now if we map all these 1000+ to 48 that is not working out. So the approach is to find the contenders to top spots and only find the matches for them. But before that let us just exclude the rows which have neither any health or economic risks. That actually removes 70% of data and brings down distinct EVTYPE values by half.
We need to check on outliers as well since at many times one single large event can skew numbers but that does not reflect on the overall picture.
Conclusion of my analysis is towards the end
Data Processing
Let us first load the data with caching enabled
stormData <- read.csv("repdata_data_StormData.csv.bz2")
Let us explore the data and see the columns
head(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.6.3
library(knitr)
library(RColorBrewer)
Approach
stormData$BGN_YEAR <- stormData$BGN_DATE
lvl.year <- levels(stormData$BGN_YEAR)
lvl.year <- as.Date.character(lvl.year,format="%m/%d/%Y")
lvl.year <- as.numeric(format(lvl.year,format="%Y"))
levels(stormData$BGN_YEAR) <- lvl.year
stormData$BGN_DECADE <- stormData$BGN_YEAR
lvl.decade <- as.numeric(levels(stormData$BGN_DECADE))
lvl.decade <- (lvl.decade %/% 10) * 10
levels(stormData$BGN_DECADE) <- lvl.decade
stormData <- stormData[as.numeric(levels(stormData$BGN_YEAR)[stormData$BGN_YEAR]) >= 1996,]
stormData <- stormData[(stormData$PROPDMG != 0) | (stormData$CROPDMG != 0) | (stormData$FATALITIES != 0) | (stormData$INJURIES != 0),]
stormData$PROPDMG_REAL <- stormData$PROPDMG
stormData$CROPDMG_REAL <- stormData$CROPDMG
exp.multiplier <- data.frame(PROPDMGEXP=c("H","h","K","k","M","m","B","b","-","?",""," ",0:8),
MULTIPLIER=c(100,100,1000,1000,1000000,1000000,1000000000,1000000000,0,0,0,0,rep(10,9))
)
stormData <- merge(stormData,exp.multiplier)
stormData$PROPDMG_REAL <- stormData$PROPDMG * stormData$MULTIPLIER
stormData <- stormData[!(colnames(stormData) %in% "MULTIPLIER")]
colnames(exp.multiplier)[1] <- "CROPDMGEXP"
stormData <- merge(stormData,exp.multiplier)
stormData$CROPDMG_REAL <- stormData$CROPDMG * stormData$MULTIPLIER
stormData <- stormData[!(colnames(stormData) %in% "MULTIPLIER")]
stormData$ECODMG_REAL <- round((stormData$CROPDMG_REAL + stormData$PROPDMG_REAL)/1000000,0)
nrow(stormData)
## [1] 201318
head(stormData)
## CROPDMGEXP PROPDMGEXP STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY
## 1 12 6/6/2001 0:00:00 12:00:00 PM CST 33
## 2 12 8/21/2001 0:00:00 01:43:00 PM EST 103
## 3 39 6/24/2004 0:00:00 05:30:00 PM EST 153
## 4 12 7/19/2002 0:00:00 02:20:00 PM CST 113
## 5 34 8/23/1998 0:00:00 08:00:00 AM EST 9
## 6 12 8/21/2001 0:00:00 01:15:00 PM EST 103
## COUNTYNAME STATE EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI
## 1 ESCAMBIA FL RIP CURRENTS 0.0 PENSACOLA BEACH
## 2 PINELLAS FL LIGHTNING 2.0 NW ST PETERSBURG
## 3 SUMMIT OH LIGHTNING 0.0 PENINSULA
## 4 SANTA ROSA FL LIGHTNING 0.0 BAGDAD
## 5 CAPE MAY NJ RIP CURRENTS 0.0 OCEAN CITY
## 6 PINELLAS FL LIGHTNING 2.5 NW ST PETERSBURG
## END_DATE END_TIME COUNTY_END COUNTYENDN END_RANGE END_AZI
## 1 6/6/2001 0:00:00 12:10:00 PM 0 NA 0.0
## 2 8/21/2001 0:00:00 01:43:00 PM 0 NA 2.0 NW
## 3 6/24/2004 0:00:00 05:30:00 PM 0 NA 0.0
## 4 7/19/2002 0:00:00 02:20:00 PM 0 NA 0.0
## 5 8/29/1998 0:00:00 06:00:00 PM 0 NA 0.0
## 6 8/21/2001 0:00:00 01:15:00 PM 0 NA 2.5 NW
## END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG CROPDMG WFO
## 1 PENSACOLA BEACH 0 0 NA 0 2 2 0 0 MOB
## 2 ST PETERSBURG 0 0 NA 0 0 1 0 0 TBW
## 3 PENINSULA 0 0 NA 0 0 9 0 0 CLE
## 4 BAGDAD 0 0 NA 0 0 1 0 0 MOB
## 5 CAPE MAY PT 0 0 NA 0 0 7 0 0 PHI
## 6 ST PETERSBURG 0 0 NA 0 0 1 0 0 TBW
## STATEOFFIC ZONENAMES LATITUDE LONGITUDE LATITUDE_E
## 1 FLORIDA, West Panhandle 0 0 0
## 2 FLORIDA, West Central 0 0 0
## 3 OHIO, North 0 0 0
## 4 FLORIDA, West Panhandle 0 0 0
## 5 NEW JERSEY, South and Northwest 0 0 0
## 6 FLORIDA, West Central 0 0 0
## LONGITUDE_
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## REMARKS
## 1 A 59 year old male from Tennessee and his daughters father in-law, a 60 year old male from Great Britain, both drowned while swimming in rough waters near the Santa Rosa and Escambia county lines. The male from England was swimming in the rough surf when he encountered a strong rip current. The male from Tennessee jumped in to help the man from England and they both drowned. Two firefighters attempting to rescue the men had to be pulled from the heavy surf and were transported to a local hospital. They were checked over and released. Rescuers on the scene said the surf was the roughest they had ever seen. The drownings occurred near Parking Lot number 1 east of Pensacola Beach.\n
## 2 A 47 year old male was injured by a nearby lightning strike while using a garden hose outdoors along the 1800 block of 28th Street N. in St. Petersburg.\n
## 3 Nine boys ages 11 to 14 were injured when lightning struck a tree near their campsite. All of the injuries were minor. The boys were staying at a campground in Boston Township.\n
## 4 A female went outside to roll up her automobile windows as a thunderstorm was approaching. She was hit by lightning when a strike hit a nearby tree, went through the root system of the tree and hit her. She was taken to a local hospital, examined and released.\n
## 5 Rip currents and large waves associated with Hurricane Bonnie caused hundreds of water rescues, at least eight injuries and resulted in swimming restrictions up and down the New Jersey Coast through the full week of August 23rd. Bonnie made her closest pass to New Jersey (at tropical storm force strength) during the afternoon of the 28th when she passed about 140 miles east of Atlantic City.\n\nLarger waves and rip currents started on the 23rd. That day alone the Atlantic City Beach Patrol rescued about 75 swimmers. There were nearly one hundred water rescues that day at the beaches in Cape May, North Wildwood (a 10-year-old boy was pulled out 100 feet) and Wildwood. In Cape May, the pounding surf caused five injuries that included separated shoulders, neck injuries and broken noses. In Monmouth County, 10 swimmers were rescued late in the day at Bradley Beach. On the 24th, swimming restrictions started as swells increased to six to eight feet. Boogie boards and inflatable rafts were banned. Swimmers were generally restricted to waist high water. Margate (Atlantic County) ordered sailboats to be cleared from beach areas. The most reported rescues on the 24th were in Monmouth County (about 25) in Manasquan and Spring Lake. One teenager in Spring Lake was injured. On the 25th a few beaches were closed, especially at high tide. The surf became so rough that even five lifeguards were rescued in Point Pleasant (Ocean County) and Atlantic City. Two juveniles suffered neck injuries in Ocean City (Cape May County). Strong winds also sent beach umbrellas flying. Rough swells in Barnegat Inlet capsized one boat and knocked several people off their jet skis. Seven people were rescued. As Bonnie neared the North Carolina Coast on the 26th, beach restrictions became tighter. Numerous beaches were closed and surfing was banned in several communities. Some beaches permitted bathers to only go into the water up to their ankles. The United States Coast Guard issued a warning for boaters and personal watercraft users to stay off the ocean. The surfing and bathing bands peaked on the 27th and 28th as Bonnie made her closest approach to New Jersey. \n\nSome minor beach erosion occurred because of Bonnie, but coastal New Jersey received only fringe effects from the tropical system on the 28th. The strongest wind gusts were mainly around 35 mph. The highest wind gust at the Delaware Bay Buoy was 46 mph. Rainfall amounts were under an inch. Tidal departures were only between 1.0 and 1.5 feet above normal and were below the thresholds for even minor tidal flooding. The most lasting effects of Bonnie were the rough surf, larger waves and rip currents. According to researchers on the subject of rip currents, rip currents and the sandbars that cause them are always present. Rip currents also form near rock piles, jetties and fishing piers. The theory states conditions get exacerbated (and the rip currents become stronger) when larger than average waves combine with stronger than normal onshore winds. The steady winds pile the water onto the beach longer than usual, building or enlarging the offshore sandbars. The excess water then cuts a channel and/or moves quickly through a pre-existing one in the sandbar and races back to sea through this break. The northeasters of the previous winter were believed to have created a larger than normal number of offshore sandbars.
## 6 A man was injured by the electrical charge of a lightning bolt during a thunderstorm that struck his auto repair business along the 1800 block of 28th Street N. The man was injured while turning on a metal water spigot when lightning struck the commercial building and the charge traveled through the plumping system. The electrical charge from the lightning strike was intense enough to throw the victim backward to the ground and cause chest pains.\n
## REFNUM BGN_YEAR BGN_DECADE PROPDMG_REAL CROPDMG_REAL ECODMG_REAL
## 1 417936 2001 2000 0 0 0
## 2 418401 2001 2000 0 0 0
## 3 549915 2004 2000 0 0 0
## 4 452426 2002 2000 0 0 0
## 5 331164 1998 1990 0 0 0
## 6 418399 2001 2000 0 0 0
Getting Unique EVTYPE Values
evtypes <- unique(stormData$EVTYPE)
evtypes
## [1] RIP CURRENTS LIGHTNING
## [3] HAIL TSTM WIND
## [5] FLASH FLOOD HEAT
## [7] EXCESSIVE HEAT HIGH WIND
## [9] FLOOD TORNADO
## [11] STRONG WIND Coastal Flooding
## [13] BLIZZARD RIP CURRENT
## [15] WINTER STORM TSTM WIND/HAIL
## [17] HEAVY SNOW HEAVY RAIN
## [19] FOG River Flooding
## [21] HEAVY SURF/HIGH SURF URBAN/SML STREAM FLD
## [23] STORM SURGE DUST STORM
## [25] STRONG WINDS AVALANCHE
## [27] RIVER FLOOD ICE STORM
## [29] Heavy Surf COASTAL FLOOD
## [31] WIND WILD/FOREST FIRE
## [33] HURRICANE/TYPHOON DRY MICROBURST
## [35] HIGH WATER HIGH SURF
## [37] EXTREME COLD/WIND CHILL EXTREME COLD
## [39] EXTREME WINDCHILL WINTER WEATHER/MIX
## [41] HURRICANE WET MICROBURST
## [43] ROGUE WAVE High Surf
## [45] UNSEASONABLY WARM Freezing Spray
## [47] Wind TROPICAL STORM
## [49] Gusty Winds Torrential Rainfall
## [51] HEAVY SURF MARINE TSTM WIND
## [53] LIGHT SNOW Lake Effect Snow
## [55] WINTRY MIX COLD
## [57] GUSTY WINDS OTHER
## [59] Hypothermia/Exposure WATERSPOUT
## [61] SNOW COLD/WIND CHILL
## [63] Strong Winds WILDFIRE
## [65] TSTM WIND (G45) ROUGH SURF
## [67] GLAZE ICY ROADS
## [69] LANDSLIDE FREEZING RAIN
## [71] WINTER WEATHER Light Snow
## [73] Gradient wind THUNDERSTORM
## [75] HIGH SEAS ROUGH SEAS
## [77] Whirlwind TSTM WIND (G40)
## [79] Freezing drizzle FUNNEL CLOUD
## [81] MIXED PRECIP DROWNING
## [83] LIGHT FREEZING RAIN SNOW SQUALL
## [85] HIGH SWELLS Mudslide
## [87] GUSTY WIND DUST DEVIL
## [89] Heavy snow shower FALLING SNOW/ICE
## [91] BLACK ICE Dust Devil
## [93] HYPERTHERMIA/EXPOSURE Mixed Precipitation
## [95] NON TSTM WIND Freezing Drizzle
## [97] Gusty wind/rain COLD WEATHER
## [99] GRADIENT WIND DENSE FOG
## [101] HYPOTHERMIA/EXPOSURE Heavy surf and wind
## [103] Strong Wind Beach Erosion
## [105] RECORD HEAT WINTER WEATHER MIX
## [107] RIVER FLOODING LANDSPOUT
## [109] Extreme Cold WARM WEATHER
## [111] SNOW AND ICE GUSTY WIND/HVY RAIN
## [113] WINDS Marine Accident
## [115] TSTM WIND 45 MUD SLIDE
## [117] Hurricane Edouard DOWNBURST
## [119] Heat Wave Cold
## [121] HEAVY SEAS Glaze
## [123] COASTAL STORM ICE ON ROAD
## [125] ICE ROADS DAM BREAK
## [127] SEICHE FREEZING DRIZZLE
## [129] FROST SMALL HAIL
## [131] Microburst COLD AND SNOW
## [133] TSTM WIND AND LIGHTNING TSTM WIND (41)
## [135] gradient wind Coastal Flood
## [137] RAIN/SNOW TSTM WIND G45
## [139] EXCESSIVE SNOW RAIN
## [141] LANDSLIDES TSTM WIND 40
## [143] HIGH WIND (G40) MIXED PRECIPITATION
## [145] TSTM WIND (G35) LAKE EFFECT SNOW
## [147] Cold Temperature COASTAL FLOODING
## [149] THUNDERSTORM WIND (G40) Mudslides
## [151] HAZARDOUS SURF BRUSH FIRE
## [153] TIDAL FLOODING Other
## [155] Ice jam flood (minor Tstm Wind
## [157] COASTALSTORM Wintry Mix
## [159] Gusty winds Tidal Flooding
## [161] Extended Cold Light snow
## [163] FLASH FLOOD blowing snow
## [165] Snow Squalls Wind Damage
## [167] Light Snowfall Snow
## [169] DROUGHT FLOOD/FLASH/FLOOD
## [171] TYPHOON FLASH FLOOD/FLOOD
## [173] Freezing Rain COASTAL EROSION
## [175] SNOW SQUALLS HIGH WINDS
## [177] TSTM WIND (G45) TSTM WIND (G45)
## [179] LATE SEASON SNOW ASTRONOMICAL HIGH TIDE
## [181] MARINE HIGH WIND Landslump
## [183] Coastal Storm NON-TSTM WIND
## [185] ROCK SLIDE BLOWING DUST
## [187] GUSTY WIND/HAIL TSTM WIND
## [189] MUDSLIDE VOLCANIC ASH
## [191] HIGH SURF ADVISORY TROPICAL DEPRESSION
## [193] LAKE-EFFECT SNOW NON-SEVERE WIND DAMAGE
## [195] STORM SURGE/TIDE MARINE STRONG WIND
## [197] DAMAGING FREEZE WHIRLWIND
## [199] Erosion/Cstl Flood COASTAL FLOODING/EROSION
## [201] COASTAL FLOODING/EROSION WIND AND WAVE
## [203] THUNDERSTORM WIND UNSEASONABLY COLD
## [205] FROST/FREEZE FREEZE
## [207] Damaging Freeze LAKESHORE FLOOD
## [209] ASTRONOMICAL LOW TIDE DENSE SMOKE
## [211] MARINE THUNDERSTORM WIND FREEZING FOG
## [213] TSUNAMI MARINE HAIL
## [215] Frost/Freeze Freeze
## [217] UNSEASONAL RAIN Heavy Rain/High Surf
## [219] AGRICULTURAL FREEZE Unseasonable Cold
## [221] Early Frost HARD FREEZE
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD FLASH FLOOD ... WND
We can see that many of the events are the same with different variations. If I try to map all the 500 odd event types to the 48 recorded event types from NOAA, I will go crazy. So let me just take top 10 events on Fatalities and Damages (Property + Crop) first and see what is happening here
stormSummary.Health <- stormData %>% group_by(EVTYPE) %>%
summarize(FATALITIES=sum(FATALITIES), INJURIES=sum(INJURIES))
stormSummary.Health <- stormSummary.Health %>% arrange(desc(FATALITIES))
stormSummary.Health
## # A tibble: 222 x 3
## EVTYPE FATALITIES INJURIES
## <fct> <dbl> <dbl>
## 1 EXCESSIVE HEAT 1797 6391
## 2 TORNADO 1511 20667
## 3 FLASH FLOOD 887 1674
## 4 LIGHTNING 651 4141
## 5 FLOOD 414 6758
## 6 RIP CURRENT 340 209
## 7 TSTM WIND 241 3629
## 8 HEAT 237 1222
## 9 HIGH WIND 235 1083
## 10 AVALANCHE 223 156
## # ... with 212 more rows
Now on Injuries
stormSummary.Health <- stormSummary.Health %>% arrange(desc(INJURIES))
stormSummary.Health
## # A tibble: 222 x 3
## EVTYPE FATALITIES INJURIES
## <fct> <dbl> <dbl>
## 1 TORNADO 1511 20667
## 2 FLOOD 414 6758
## 3 EXCESSIVE HEAT 1797 6391
## 4 LIGHTNING 651 4141
## 5 TSTM WIND 241 3629
## 6 FLASH FLOOD 887 1674
## 7 THUNDERSTORM WIND 130 1400
## 8 WINTER STORM 191 1292
## 9 HURRICANE/TYPHOON 64 1275
## 10 HEAT 237 1222
## # ... with 212 more rows
Now on Economic Risks
stormSummary.Economy <- stormData %>% group_by(EVTYPE) %>%
summarize(ECO_DAMAGE=sum(PROPDMG_REAL+CROPDMG_REAL))
stormSummary.Economy <- stormSummary.Economy %>% arrange(desc(ECO_DAMAGE))
stormSummary.Economy
## # A tibble: 222 x 2
## EVTYPE ECO_DAMAGE
## <fct> <dbl>
## 1 FLOOD 148919611950
## 2 HURRICANE/TYPHOON 71913712800
## 3 STORM SURGE 43193541000
## 4 TORNADO 24900370720
## 5 HAIL 17071172870
## 6 FLASH FLOOD 16557105610
## 7 HURRICANE 14554229010
## 8 DROUGHT 14413667000
## 9 TROPICAL STORM 8320186550
## 10 HIGH WIND 5881421660
## # ... with 212 more rows
So after taking the top 10 in each of the categories, we will do some substitutions and consolidate all the similar named EVTYPES into the main EVTYPES. Some Regex at work here.
stormData$NEWEVTYPE <- stormData$EVTYPE
stormData$NEWEVTYPE <- as.character(stormData$NEWEVTYPE)
stormData$NEWEVTYPE[grep("T[A-Z]*S[A-Z]*T[A-Z]*M[A-Z ]*WIND",stormData$NEWEVTYPE,TRUE)] <- "TSTM WIND"
stormData$NEWEVTYPE[grep("(HUR|TYP)",stormData$NEWEVTYPE,TRUE)] <- "HURRICANE/TYPHOON"
stormData$NEWEVTYPE[grep("FLASH[A-Z ]*FLOOD",stormData$NEWEVTYPE,TRUE)] <- "FLASH FLOOD"
stormData$NEWEVTYPE[grep("EX[A-Z ]*COLD",stormData$NEWEVTYPE,TRUE)] <- "EXTREME COLD"
stormData$NEWEVTYPE[grep("(RE|EX)[A-Z ]*HEAT",stormData$NEWEVTYPE,TRUE)] <- "EXCESSIVE HEAT"
stormData$NEWEVTYPE[grep("HEAT[A-Z ]*WAVE",stormData$NEWEVTYPE,TRUE)] <- "EXCESSIVE HEAT"
stormData$NEWEVTYPE[grep("^(EXCESSIVE)*HEAT",stormData$NEWEVTYPE,TRUE)] <- "HEAT"
stormData$NEWEVTYPE[grep("RIVER[A-Z ]*FLOOD",stormData$NEWEVTYPE,TRUE)] <- "FLOOD"
stormData$NEWEVTYPE[grep("^(FLASH)*FLOOD",stormData$NEWEVTYPE,TRUE)] <- "FLOOD"
stormData$NEWEVTYPE[grep("COLD",stormData$NEWEVTYPE,TRUE)] <- "COLD"
stormData$NEWEVTYPE[grep("(FROST|FREEZE)",stormData$NEWEVTYPE,TRUE)] <- "FROST/FREEZE"
stormData$NEWEVTYPE[grep("RAIN",stormData$NEWEVTYPE,TRUE)] <- "RAIN"
stormData$NEWEVTYPE[grep("ICE[A-Z ]*STORM",stormData$NEWEVTYPE,TRUE)] <- "ICE STORM"
stormData$NEWEVTYPE[grep("DROUGHT",stormData$NEWEVTYPE,TRUE)] <- "DROUGHT"
stormData$NEWEVTYPE[grep("TORNADO",stormData$NEWEVTYPE,TRUE)] <- "TORNADO"
stormData$NEWEVTYPE[grep("STORM[A-Z ]*SURGE",stormData$NEWEVTYPE,TRUE)] <- "STORM SURGE"
stormData$NEWEVTYPE[grep("TROP[A-Z ]*STORM",stormData$NEWEVTYPE,TRUE)] <- "TROPICAL STORM"
stormData$NEWEVTYPE[grep("WINT[A-Z ]*STORM",stormData$NEWEVTYPE,TRUE)] <- "WINTER STORM"
stormData$NEWEVTYPE[grep("HAIL",stormData$NEWEVTYPE,TRUE)] <- "HAIL"
stormData$NEWEVTYPE[grep("RIP[A-Z ]*CURR",stormData$NEWEVTYPE,TRUE)] <- "RIP CURRENT"
stormData$NEWEVTYPE[grep("LIGHTNING",stormData$NEWEVTYPE,TRUE)] <- "LIGHTNING"
stormData$NEWEVTYPE[grep("AVALANCHE",stormData$NEWEVTYPE,TRUE)] <- "AVALANCHE"
stormData$NEWEVTYPE[grep("LIGHTNING",stormData$NEWEVTYPE,TRUE)] <- "LIGHTNING"
stormData$NEWEVTYPE[grep("HIGH[ ]*WIND",stormData$NEWEVTYPE,TRUE)] <- "HIGH WIND"
stormData$NEWEVTYPE[grep("WIND",stormData$NEWEVTYPE,TRUE)] <- "HIGH WIND"
newevtypes <- unique(stormData$NEWEVTYPE)
newevtypes
## [1] "RIP CURRENT" "LIGHTNING"
## [3] "HAIL" "HIGH WIND"
## [5] "FLASH FLOOD" "HEAT"
## [7] "EXCESSIVE HEAT" "FLOOD"
## [9] "TORNADO" "Coastal Flooding"
## [11] "BLIZZARD" "WINTER STORM"
## [13] "HEAVY SNOW" "RAIN"
## [15] "FOG" "HEAVY SURF/HIGH SURF"
## [17] "URBAN/SML STREAM FLD" "STORM SURGE"
## [19] "DUST STORM" "AVALANCHE"
## [21] "ICE STORM" "Heavy Surf"
## [23] "COASTAL FLOOD" "WILD/FOREST FIRE"
## [25] "HURRICANE/TYPHOON" "DRY MICROBURST"
## [27] "HIGH WATER" "HIGH SURF"
## [29] "COLD" "WINTER WEATHER/MIX"
## [31] "WET MICROBURST" "ROGUE WAVE"
## [33] "High Surf" "UNSEASONABLY WARM"
## [35] "Freezing Spray" "TROPICAL STORM"
## [37] "HEAVY SURF" "LIGHT SNOW"
## [39] "Lake Effect Snow" "WINTRY MIX"
## [41] "OTHER" "Hypothermia/Exposure"
## [43] "WATERSPOUT" "SNOW"
## [45] "WILDFIRE" "ROUGH SURF"
## [47] "GLAZE" "ICY ROADS"
## [49] "LANDSLIDE" "WINTER WEATHER"
## [51] "Light Snow" "THUNDERSTORM"
## [53] "HIGH SEAS" "ROUGH SEAS"
## [55] "Freezing drizzle" "FUNNEL CLOUD"
## [57] "MIXED PRECIP" "DROWNING"
## [59] "SNOW SQUALL" "HIGH SWELLS"
## [61] "Mudslide" "DUST DEVIL"
## [63] "Heavy snow shower" "FALLING SNOW/ICE"
## [65] "BLACK ICE" "Dust Devil"
## [67] "HYPERTHERMIA/EXPOSURE" "Mixed Precipitation"
## [69] "Freezing Drizzle" "DENSE FOG"
## [71] "HYPOTHERMIA/EXPOSURE" "Beach Erosion"
## [73] "WINTER WEATHER MIX" "LANDSPOUT"
## [75] "WARM WEATHER" "SNOW AND ICE"
## [77] "Marine Accident" "MUD SLIDE"
## [79] "DOWNBURST" "HEAVY SEAS"
## [81] "Glaze" "COASTAL STORM"
## [83] "ICE ON ROAD" "ICE ROADS"
## [85] "DAM BREAK" "SEICHE"
## [87] "FREEZING DRIZZLE" "FROST/FREEZE"
## [89] "Microburst" "Coastal Flood"
## [91] "EXCESSIVE SNOW" "LANDSLIDES"
## [93] "MIXED PRECIPITATION" "LAKE EFFECT SNOW"
## [95] "COASTAL FLOODING" "Mudslides"
## [97] "HAZARDOUS SURF" "BRUSH FIRE"
## [99] "TIDAL FLOODING" "Other"
## [101] "Ice jam flood (minor" "COASTALSTORM"
## [103] "Wintry Mix" "Tidal Flooding"
## [105] "Light snow" "blowing snow"
## [107] "Snow Squalls" "Light Snowfall"
## [109] "Snow" "DROUGHT"
## [111] "COASTAL EROSION" "SNOW SQUALLS"
## [113] "LATE SEASON SNOW" "ASTRONOMICAL HIGH TIDE"
## [115] "Landslump" "Coastal Storm"
## [117] "ROCK SLIDE" "BLOWING DUST"
## [119] "MUDSLIDE" "VOLCANIC ASH"
## [121] " HIGH SURF ADVISORY" "TROPICAL DEPRESSION"
## [123] "LAKE-EFFECT SNOW" "Erosion/Cstl Flood"
## [125] "COASTAL FLOODING/EROSION" "COASTAL FLOODING/EROSION"
## [127] "LAKESHORE FLOOD" "ASTRONOMICAL LOW TIDE"
## [129] "DENSE SMOKE" "FREEZING FOG"
## [131] "TSUNAMI"
Results
Let us present some Cross Tabulations. Also we will only show to an extent where the CUMULATIVE TOTAL is up to 80%
Year Wise Fatalities
storm.xtab.Year.Fatalities <- stormData %>%
group_by(NEWEVTYPE, BGN_YEAR)%>%
summarise(FATALITIES=sum(FATALITIES))%>%
spread(BGN_YEAR, FATALITIES)
## `summarise()` has grouped output by 'NEWEVTYPE'. You can override using the `.groups` argument.
storm.xtab.Year.Fatalities$TOTAL <- rowSums(storm.xtab.Year.Fatalities[2:ncol(storm.xtab.Year.Fatalities)],na.rm = TRUE)
storm.xtab.Year.Fatalities <- storm.xtab.Year.Fatalities %>% arrange(desc(TOTAL))
sum.of.total.Fatalities <- sum(storm.xtab.Year.Fatalities$TOTAL)
sum.of.total.Fatalities.80 <- sum.of.total.Fatalities * 0.8
storm.xtab.Year.Fatalities$PERCENTAGE <- round(storm.xtab.Year.Fatalities$TOTAL*100/sum.of.total.Fatalities,2)
storm.xtab.Year.Fatalities$CUMTOTAL <- cumsum(storm.xtab.Year.Fatalities$TOTAL)
storm.xtab.Year.Fatalities <- storm.xtab.Year.Fatalities %>% filter(CUMTOTAL < sum.of.total.Fatalities.80)
storm.xtab.Year.Fatalities %>% select(-CUMTOTAL) %>% kable()
| NEWEVTYPE | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | TOTAL | PERCENTAGE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EXCESSIVE HEAT | 36 | 80 | 168 | 502 | 157 | 165 | 167 | 36 | 6 | 158 | 205 | 22 | 15 | 7 | 39 | 36 | 1799 | 20.60 |
| TORNADO | 26 | 68 | 130 | 94 | 41 | 40 | 55 | 54 | 35 | 38 | 67 | 81 | 129 | 21 | 45 | 587 | 1511 | 17.30 |
| FLASH FLOOD | 92 | 76 | 70 | 49 | 27 | 34 | 32 | 69 | 58 | 28 | 59 | 70 | 55 | 33 | 67 | 68 | 887 | 10.16 |
| HIGH WIND | 60 | 82 | 65 | 62 | 57 | 30 | 45 | 44 | 48 | 23 | 52 | 31 | 60 | 37 | 32 | 73 | 801 | 9.17 |
| LIGHTNING | 53 | 42 | 44 | 46 | 52 | 44 | 51 | 44 | 33 | 38 | 45 | 45 | 25 | 34 | 29 | 26 | 651 | 7.46 |
| RIP CURRENT | 25 | 10 | 22 | 23 | 29 | 49 | 43 | 41 | 32 | 35 | 26 | 46 | 44 | 39 | 49 | 29 | 542 | 6.21 |
| FLOOD | 33 | 39 | 62 | 16 | 6 | 13 | 10 | 18 | 24 | 15 | 19 | 17 | 22 | 23 | 41 | 58 | 416 | 4.76 |
| COLD | 56 | 39 | 11 | 7 | 20 | 4 | 11 | 19 | 27 | 24 | 10 | 27 | 39 | 28 | 27 | 23 | 372 | 4.26 |
So we can see that mostly the fatalities are due to Excessive Heat and not due to tornadoes as my initial raw understanding of data had suggested. Even if you look at tornado, there is one big chunk of fatalities in 2011.
On doing some Google Search I found out this article on Wikipedia which explains the 2011 figure
https://en.wikipedia.org/wiki/2011_Super_Outbreak
If you ignore that, the fatalities from Tornado comes down drastically. Perhaps the reason might be that Government gives enough advance warnings and relocates people. Excessive Heat caused most fatalities perhaps it was not given that much footage.
Now let us look at the Cross Tab for Injuries
storm.xtab.Year.Injuries <- stormData %>%
group_by(NEWEVTYPE, BGN_YEAR)%>%
summarise(INJURIES=sum(INJURIES))%>%
spread(BGN_YEAR, INJURIES)
## `summarise()` has grouped output by 'NEWEVTYPE'. You can override using the `.groups` argument.
storm.xtab.Year.Injuries$TOTAL <- rowSums(storm.xtab.Year.Injuries[2:ncol(storm.xtab.Year.Injuries)],na.rm = TRUE)
storm.xtab.Year.Injuries <- storm.xtab.Year.Injuries %>% arrange(desc(TOTAL))
sum.of.total.Injuries <- sum(storm.xtab.Year.Injuries$TOTAL)
sum.of.total.Injuries.80 <- sum.of.total.Injuries * 0.8
storm.xtab.Year.Injuries$PERCENTAGE <- round(storm.xtab.Year.Injuries$TOTAL*100/sum.of.total.Injuries,2)
storm.xtab.Year.Injuries$CUMTOTAL <- cumsum(storm.xtab.Year.Injuries$TOTAL)
storm.xtab.Year.Injuries <- storm.xtab.Year.Injuries %>% filter(CUMTOTAL < sum.of.total.Injuries.80)
storm.xtab.Year.Injuries %>% select(-CUMTOTAL) %>% kable()
| NEWEVTYPE | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | TOTAL | PERCENTAGE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TORNADO | 705 | 1033 | 1874 | 1842 | 882 | 743 | 968 | 1087 | 396 | 537 | 992 | 659 | 1690 | 397 | 699 | 6163 | 20667 | 35.65 |
| FLOOD | 19 | 286 | 6130 | 9 | 6 | 8 | 5 | 5 | 99 | 16 | 5 | 8 | 16 | 10 | 127 | 10 | 6759 | 11.66 |
| HIGH WIND | 450 | 552 | 949 | 473 | 445 | 441 | 410 | 382 | 311 | 228 | 329 | 320 | 342 | 238 | 370 | 436 | 6676 | 11.52 |
| EXCESSIVE HEAT | 129 | 530 | 633 | 1461 | 466 | 445 | 378 | 174 | 74 | 298 | 993 | 536 | 131 | 12 | 63 | 138 | 6461 | 11.14 |
| LIGHTNING | 315 | 307 | 283 | 243 | 371 | 372 | 256 | 238 | 279 | 309 | 245 | 139 | 207 | 201 | 182 | 194 | 4141 | 7.14 |
| FLASH FLOOD | 44 | 233 | 304 | 281 | 32 | 263 | 71 | 61 | 29 | 22 | 18 | 51 | 30 | 17 | 188 | 30 | 1674 | 2.89 |
Tornadoes cause more injuries than fatalities as is observed here. Tornadoes are more or less consistent across years. At #2 are floods but that as one can see is primarily due to a large number in 1998. Otherwise it is Tornadoes and Excessive Heat.
Let us try to investigate this 1998 outlier a bit
storm.1998 <- stormData[as.numeric(levels(stormData$BGN_YEAR)[stormData$BGN_YEAR]) == 1998,]
storm.1998 <- storm.1998[storm.1998$NEWEVTYPE=="FLOOD",]
storm.1998 <- storm.1998[order(-storm.1998$INJURIES),]
storm.1998 %>% select(WFO,INJURIES) %>% head(20)
## WFO INJURIES
## 184522 EWX 800
## 199991 EWX 750
## 194360 EWX 600
## 196556 EWX 550
## 143853 EWX 500
## 188559 EWX 500
## 197658 EWX 500
## 199938 EWX 500
## 186071 EWX 300
## 191096 EWX 125
## 114242 EWX 100
## 179026 EWX 100
## 198550 EWX 100
## 198994 EWX 100
## 114328 EWX 70
## 114295 EWX 50
## 160481 EWX 50
## 179799 EWX 50
## 181078 EWX 50
## 182371 EWX 50
We can see that most of them are EWX. After doing some Google Search, EWX is the Weather Forecast office in Austin/San Antonio, Texas. My guess is that this event relates to the below Wikipedia Article -
https://en.wikipedia.org/wiki/October_1998_Central_Texas_floods
Now let us look at the Cross Tab for Economic Damage (figures in million). Since both property and crop damages ultimately are in terms are money, we are clubbing them together here.
storm.xtab.Year.Eco <- stormData %>%
group_by(NEWEVTYPE, BGN_YEAR)%>%
summarise(ECONOMIC_DAMAGE=sum(ECODMG_REAL))%>%
spread(BGN_YEAR, ECONOMIC_DAMAGE)
## `summarise()` has grouped output by 'NEWEVTYPE'. You can override using the `.groups` argument.
storm.xtab.Year.Eco$TOTAL <- rowSums(storm.xtab.Year.Eco[2:ncol(storm.xtab.Year.Eco)],na.rm = TRUE)
storm.xtab.Year.Eco <- storm.xtab.Year.Eco %>% arrange(desc(TOTAL))
sum.of.total.Eco <- sum(storm.xtab.Year.Eco$TOTAL)
sum.of.total.90.Eco <- sum.of.total.Eco * 0.9
storm.xtab.Year.Eco$PERCENTAGE <- round(storm.xtab.Year.Eco$TOTAL*100/sum.of.total.Eco,2)
storm.xtab.Year.Eco$CUMTOTAL <- cumsum(storm.xtab.Year.Eco$TOTAL)
storm.xtab.Year.Eco <- storm.xtab.Year.Eco %>% filter(CUMTOTAL < sum.of.total.90.Eco)
storm.xtab.Year.Eco %>% select(-CUMTOTAL) %>% kable()
| NEWEVTYPE | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | TOTAL | PERCENTAGE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FLOOD | 1418 | 6207 | 2048 | 416 | 1308 | 580 | 456 | 551 | 1149 | 1292 | 116599 | 997 | 3070 | 586 | 4126 | 7803 | 148606 | 37.57 |
| HURRICANE/TYPHOON | 1731 | 668 | 3719 | 5539 | 6 | 1 | 1013 | 1018 | 18921 | 51802 | NA | 37 | 2591 | 0 | NA | 21 | 87067 | 22.01 |
| STORM SURGE | 5 | 1 | 5 | 4 | 1 | 0 | 9 | 56 | 43 | 43058 | 0 | 4 | 4592 | 0 | 0 | 41 | 47819 | 12.09 |
| TORNADO | 694 | 701 | 1680 | 1960 | 394 | 595 | 776 | 1234 | 495 | 465 | 718 | 1366 | 1661 | 534 | 1113 | 9751 | 24137 | 6.10 |
| HAIL | 763 | 368 | 1457 | 577 | 519 | 2591 | 440 | 550 | 492 | 502 | 1381 | 501 | 551 | 1750 | 3484 | 474 | 16400 | 4.15 |
| FLASH FLOOD | 1048 | 725 | 493 | 1327 | 555 | 618 | 228 | 2071 | 795 | 269 | 2123 | 1202 | 1486 | 370 | 909 | 1411 | 15630 | 3.95 |
| DROUGHT | 638 | 276 | 2182 | 1334 | 2439 | 1274 | 737 | 1217 | 1 | 1388 | 2498 | 373 | 0 | 13 | 0 | 31 | 14401 | 3.64 |
So Floods are the single largest contributors to economic loss. Hurricanes and Storm Surge come second but a bulk of its figures come in 2005. My memory does vaguely remind me of Katrina but let us do some verification here.
hurricane.2005 <- stormData[as.numeric(levels(stormData$BGN_YEAR)[stormData$BGN_YEAR]) == 2005,]
hurricane.2005 <- hurricane.2005[(hurricane.2005$NEWEVTYPE=="HURRICANE/TYPHOON") | (hurricane.2005$NEWEVTYPE=="STORM SURGE"),]
hurricane.2005 <- hurricane.2005[order(-hurricane.2005$ECODMG_REAL),]
hurricane.2005 %>% select(WFO,ECODMG_REAL) %>% head(20)
## WFO ECODMG_REAL
## 3526 LIX 31300
## 359 LIX 16930
## 4609 LIX 11260
## 10099 MFL 10000
## 102769 JAN 7390
## 356 LIX 7350
## 5721 LCH 4000
## 2429 LCH 2090
## 111234 MOB 1500
## 1461 MOB 1000
## 199859 MFL 523
## 100867 LIX 432
## 99836 MOB 250
## 73359 HGX 160
## 182263 MOB 120
## 98675 TBW 101
## 98423 EYW 99
## 98954 TAE 62
## 200383 MHX 54
## 201172 JAN 53
So as we can see the maximum Economic Damage is reported by Weather Forecast Office LIX which is based out of New Orleans, followed by MFL & JAN which are based out of Miami, Florida and Jackson, Mississippi. We can very safely presume that this damage was caused by Katrina.
https://en.wikipedia.org/wiki/Hurricane_Katrina
Plots
Since we have outliers in all three departments, we are going to have an extra column for outlier and show a panel plot for each. One for outlier year and other for all years except the outlier.
Only then we can get a correct picture otherwise it will be heavily influenced by the outlier year.
spectral.colors <- brewer.pal(name="Spectral",n=11)
stormSummary.Fatalities <- stormData %>% mutate(OUTLIER_YEAR=ifelse(levels(BGN_YEAR)[BGN_YEAR]==2011,"2011","1996 to 2010")) %>%
group_by(NEWEVTYPE, OUTLIER_YEAR) %>%
summarize(FATALITIES=sum(FATALITIES)) %>% arrange(desc(FATALITIES))
## `summarise()` has grouped output by 'NEWEVTYPE'. You can override using the `.groups` argument.
ggplot(data=head(stormSummary.Fatalities,20), aes(NEWEVTYPE,FATALITIES)) +
geom_bar(stat="identity", size = 1, fill=spectral.colors[1]) +
labs(title = "Health Risks of Events (Fatalities)",
subtitle = "",
y = "Fatalities", x = "Event") +
facet_grid(. ~ OUTLIER_YEAR) +
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip()
stormSummary.Injuries <- stormData %>% mutate(OUTLIER_YEAR=ifelse(levels(BGN_YEAR)[BGN_YEAR]==1998,"1998","OTHER YEARS from 1996 to 2011")) %>%
group_by(NEWEVTYPE, OUTLIER_YEAR) %>%
summarize(INJURIES=sum(INJURIES)) %>% arrange(desc(INJURIES))
## `summarise()` has grouped output by 'NEWEVTYPE'. You can override using the `.groups` argument.
ggplot(data=head(stormSummary.Injuries,20), aes(NEWEVTYPE,INJURIES)) +
geom_bar(stat="identity", size = 1, fill=spectral.colors[2]) +
labs(title = "Health Risks of Events (Injuries)",
subtitle = "",
y = "Injuries", x = "Event") +
facet_grid(. ~ OUTLIER_YEAR) +
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip()
stormSummary.EcoDamages <- stormData %>% mutate(OUTLIER_YEAR=ifelse(levels(BGN_YEAR)[BGN_YEAR]==2005,"2005","OTHER YEARS from 1996 to 2011")) %>%
group_by(NEWEVTYPE, OUTLIER_YEAR) %>%
summarize(ECODMG_REAL=sum(ECODMG_REAL)) %>% arrange(desc(ECODMG_REAL))
## `summarise()` has grouped output by 'NEWEVTYPE'. You can override using the `.groups` argument.
ggplot(data=head(stormSummary.EcoDamages,20), aes(NEWEVTYPE,ECODMG_REAL)) +
geom_bar(stat="identity", size = 1, fill=spectral.colors[3]) +
labs(title = "Economic Risks of Events",
subtitle = "",
y = "Combined Property + Crop Damages (in million)", x = "Event") +
facet_grid(. ~ OUTLIER_YEAR) +
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip()
CONCLUSION
Till 1995 only Tornadoes, Thunderstom Wind and Hail were recorded. So whatever damage was caused Health or Economy was not correctly captured. We cannot come to any conclusions till 1995 since we do not have complete data. This is on basis of the NOAA Portal.
Health Risks are identified into Fatalities and Injuries.
Except for 2011, maximum fatalities seem to have been caused due to Excessive Heat. Secondary factors are Tornadoes, Flash Floods and High Winds. In 2011 there was a Tornado Super Outbreak which caused a lot of fatalities.
Except for 1998, maximum injuries seem to have been caused due to Tornadoes with secondary factors being Excessive Heat and High Winds. In 1998, there was Texas Flooding which looks to have caused lot of injuries.
Economic Risks are identified into Property and Crop Damages. We have clubbed them together since both ultimately boil down to money. Except for 2005, Floods are the major contributors for economic damages with other factors being Hurricanes/Typhoons and Tornadoes. In 2005 Hurricane Katrina caused a lot of devastation and economic damage.
Thanks
Jayesh