Synopsis

This paper shows the result of little investigation, based on NOAA Storm Database (more details can be covered on NOAA web-site). The database contains events from 1950 to November 2011 (on the site there are data up to June 2020), we use a dataset precompiled for educational purposes, keep that in mind when trying to compare analysis. To answer the desired questions, I’ve grouped data by event type with changing some similar values, also estimated total sums of money in order to make a conclusion about most expensive disaster. Two plots are presented at the end of this document, with top 20 position of each impact - to people and economics.

Libraries and environment

In this case I’ve used libraries ggplot2 and dplyr.

library(ggplot2)
library(dplyr)

Also, here’s my environment.

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Russian_Russia.1251  LC_CTYPE=Russian_Russia.1251   
## [3] LC_MONETARY=Russian_Russia.1251 LC_NUMERIC=C                   
## [5] LC_TIME=Russian_Russia.1251    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dplyr_1.0.2   ggplot2_3.3.2
## 
## loaded via a namespace (and not attached):
##  [1] knitr_1.29       magrittr_1.5     tidyselect_1.1.0 munsell_0.5.0   
##  [5] colorspace_1.4-1 R6_2.4.1         rlang_0.4.7      stringr_1.4.0   
##  [9] tools_4.0.2      grid_4.0.2       gtable_0.3.0     xfun_0.17       
## [13] withr_2.2.0      htmltools_0.5.0  ellipsis_0.3.1   yaml_2.2.1      
## [17] digest_0.6.25    tibble_3.0.3     lifecycle_0.2.0  crayon_1.3.4    
## [21] purrr_0.3.4      vctrs_0.3.4      glue_1.4.2       evaluate_0.14   
## [25] rmarkdown_2.3    stringi_1.5.3    compiler_4.0.2   pillar_1.4.6    
## [29] generics_0.0.2   scales_1.1.1     pkgconfig_2.0.3

Here’s the code for downloading the data.

url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
localFile <- 'stormData.csv.bz2'
if(!file.exists(localFile)) download.file(url, localFile) 
storm <- read.csv(localFile)

Data Processing

The two questions of this analysis are what types of events are most harmful to population and to economic. Let’s see the total of them:

length(unique(storm$EVTYPE))
## [1] 985

There are 985 types, much more than 48 listed in the official document - the National Weather Service Storm Data Documentation. This amount obviously must be reduced.

x <- as.data.frame(sort(table(storm$EVTYPE), decreasing = T))
head(x, 15)
##                  Var1   Freq
## 1                HAIL 288661
## 2           TSTM WIND 219940
## 3   THUNDERSTORM WIND  82563
## 4             TORNADO  60652
## 5         FLASH FLOOD  54277
## 6               FLOOD  25326
## 7  THUNDERSTORM WINDS  20843
## 8           HIGH WIND  20212
## 9           LIGHTNING  15754
## 10         HEAVY SNOW  15708
## 11         HEAVY RAIN  11723
## 12       WINTER STORM  11433
## 13     WINTER WEATHER   7026
## 14       FUNNEL CLOUD   6839
## 15   MARINE TSTM WIND   6175

There seems to be some similar types in our dataset with different names (like TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS on the beginning of this list), so I will try to transform 985 original types to the list of official 48. The list is available at this link.

I will do this part according to frequency in variable x. If two or more types are in one row, I take first type.

Hail, quite simple. Exclude winds and marine hail - there are such types, and winds are more frequent.

hail <- unique(storm$EVTYPE[grepl('HAIL', storm$EVTYPE) & !(grepl('WIND', storm$EVTYPE)) & !(grepl('MARINE', storm$EVTYPE))])
storm$EVTYPE[storm$EVTYPE %in% hail] <- 'HAIL'

Thunderstorm Wind also has several types (tstm wind, Thunderstorm Winds, Thunderstorm Wind). Exclude marine, tornado, flash flood and the word ‘Non’.

tstm <- unique(storm$EVTYPE[(grepl('TSTM', storm$EVTYPE) | (grepl('THUNDERSTORM', storm$EVTYPE))) & !(grepl('NON', storm$EVTYPE)) & !(grepl('MARINE', storm$EVTYPE)) & !(grepl('TORNADO', storm$EVTYPE))])
storm$EVTYPE[storm$EVTYPE %in% tstm] <- 'THUNDERSTORM WIND'

Tornado - just exclude waterspout.

tornado <- unique(storm$EVTYPE[grepl('TORNADO', storm$EVTYPE) & !(grepl('WATERSPOUT', storm$EVTYPE))])
storm$EVTYPE[storm$EVTYPE %in% tornado] <- 'TORNADO'

Flash flood. Exclude ice storms.

flashfl <- unique(storm$EVTYPE[grepl('FLASH', storm$EVTYPE) & !(grepl('ICE', storm$EVTYPE))])
storm$EVTYPE[storm$EVTYPE %in% flashfl] <- 'FLASH FLOOD'

Flood, exclude lakeshore, coastal and flash floods. Exclude heavy rain if it goes first.

flood <- unique(storm$EVTYPE[grepl('FLOOD', storm$EVTYPE) & !(grepl('LAKE', storm$EVTYPE)) & !(grepl('COAST', storm$EVTYPE)) & !(grepl('CSTL', storm$EVTYPE)) & !(grepl('FLASH', storm$EVTYPE)) & !(grepl('RAIN/', storm$EVTYPE))])
storm$EVTYPE[storm$EVTYPE %in% flood] <- 'FLOOD'

High wind - exclude marine, blizzard (if first), hurricane, winter storm and dust storm.

highWind <- unique(storm$EVTYPE[grepl('HIGH WIND', storm$EVTYPE) & !(grepl('MARINE', storm$EVTYPE)) & !(grepl('BLIZZARD/', storm$EVTYPE)) & !(grepl('HURRICANE', storm$EVTYPE)) & !(grepl('WINTER STORM', storm$EVTYPE)) & !(grepl('DUST', storm$EVTYPE))])
storm$EVTYPE[storm$EVTYPE %in% highWind] <- 'HIGH WIND'

Ligthning. This is simple.

lightning <- unique(storm$EVTYPE[grepl('LIGHTNING', storm$EVTYPE)])
storm$EVTYPE[storm$EVTYPE %in% lightning] <- 'LIGHTNING'

Heavy snow, just exclude blizzard.

hsnow <- unique(storm$EVTYPE[grepl('HEAVY', storm$EVTYPE) & grepl('SNOW', storm$EVTYPE) & !grepl('BLIZZARD', storm$EVTYPE)])
storm$EVTYPE[storm$EVTYPE %in% hsnow] <- 'HEAVY SNOW'

Heavy rain.

hrain <- unique(storm$EVTYPE[grepl('HEAVY', storm$EVTYPE) & grepl('RAIN', storm$EVTYPE)])
storm$EVTYPE[storm$EVTYPE %in% hrain] <- 'HEAVY RAIN'

Winter storm.

wintstorm <- unique(storm$EVTYPE[grepl('WINTER', storm$EVTYPE) & grepl('STORM', storm$EVTYPE)])
storm$EVTYPE[storm$EVTYPE %in% wintstorm] <- 'WINTER STORM'

Storm surge/tide.

tide <- unique(storm$EVTYPE[grepl('TIDE', storm$EVTYPE)])
storm$EVTYPE[storm$EVTYPE %in% tide] <- 'STORM SURGE/TIDE'

Rip current.

rip <- unique(storm$EVTYPE[grepl('RIP', storm$EVTYPE)])
storm$EVTYPE[storm$EVTYPE %in% rip] <- 'RIP CURRENT'

Fog.

fog <- unique(storm$EVTYPE[grepl('FOG', storm$EVTYPE)])
storm$EVTYPE[storm$EVTYPE %in% fog] <- 'FREEZING FOG'

Last transformation, I merge similar data types manually: 1) MARINE TSTM WIND with MARINE THUNDERSTORM WIND 2) HURRICANE/TYPHOON with HURRICANE 3) WILDFIRE with WILD/FOREST FIRE

storm$EVTYPE[storm$EVTYPE == 'MARINE TSTM WIND'] <- 'MARINE THUNDERSTORM WIND'
storm$EVTYPE[storm$EVTYPE == 'HURRICANE'] <- 'HURRICANE/TYPHOON'
storm$EVTYPE[storm$EVTYPE == 'WILD/FOREST FIRE'] <- 'WILDFIRE'

Now the frequency decreases fast, all other types seems not to be of any interest until I get information about their consequences.

Economical

Now let’s look at the economical part. The economical consequences in the original dataset are presented in 4 columns, two of them are numbers, other two (PROPDMGEXP and CROPDMGEXP) seems to be multipliers.

unique(storm$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

K/k looks like Kilo~ (10^3), M/m for Millions (10^6) and B for Billions (10^9). Before estimating I drop all rows with all other values, using logical ‘OR’ - because there can be some cases that caused damage only to property or only to crops. After that clean all messy values that are still there and transform multipliers to numbers, then add a column with total sum.

econStorm <- storm %>% filter(PROPDMGEXP %in% c("K", "M", "B", "m") | CROPDMGEXP %in% c("M", "K", "m", "B", "k"))
#property
econStorm$PROPDMGEXP[!(econStorm$PROPDMGEXP %in% c("K", "M", "B", "m"))] <- 0
econStorm$PROPDMGEXP[(econStorm$PROPDMGEXP %in% c("K"))] <- 1000
econStorm$PROPDMGEXP[(econStorm$PROPDMGEXP %in% c("M", "m"))] <- 1000000
econStorm$PROPDMGEXP[(econStorm$PROPDMGEXP %in% c("B"))] <- 1000000000
econStorm$PROPDMGEXP <- as.integer(econStorm$PROPDMGEXP)
#crops
econStorm$CROPDMGEXP[!(econStorm$CROPDMGEXP %in% c("M", "K", "m", "B", "k"))] <- 0
econStorm$CROPDMGEXP[(econStorm$CROPDMGEXP %in% c("K", "k"))] <- 1000
econStorm$CROPDMGEXP[(econStorm$CROPDMGEXP %in% c("M", "m"))] <- 1000000
econStorm$CROPDMGEXP[(econStorm$CROPDMGEXP %in% c("B"))] <- 1000000000
econStorm$CROPDMGEXP <- as.integer(econStorm$CROPDMGEXP)
#total
econStorm <- econStorm %>% mutate(total = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)

To check if total sum was calculated properly, I compare it with REMARKS column. This column can give some additional info about economical consequences.

econStorm$total[which.max(econStorm$total)]
## [1] 115032500000
econStorm$REMARKS[which.max(econStorm$total)]
## [1] "Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million."

So, this two sums are to far from each other, $115.03 billion and $76 million. Now I try to interprete B as million.

econStorm <- storm %>% filter(PROPDMGEXP %in% c("K", "M", "B", "m") | CROPDMGEXP %in% c("M", "K", "m", "B", "k"))
#property
econStorm$PROPDMGEXP[!(econStorm$PROPDMGEXP %in% c("K", "M", "B", "m"))] <- 0
econStorm$PROPDMGEXP[(econStorm$PROPDMGEXP %in% c("K"))] <- 1000
econStorm$PROPDMGEXP[(econStorm$PROPDMGEXP %in% c("M", "m"))] <- 1000000
econStorm$PROPDMGEXP[(econStorm$PROPDMGEXP %in% c("B"))] <- 1000000
econStorm$PROPDMGEXP <- as.integer(econStorm$PROPDMGEXP)
#crops
econStorm$CROPDMGEXP[!(econStorm$CROPDMGEXP %in% c("M", "K", "m", "B", "k"))] <- 0
econStorm$CROPDMGEXP[(econStorm$CROPDMGEXP %in% c("K", "k"))] <- 1000
econStorm$CROPDMGEXP[(econStorm$CROPDMGEXP %in% c("M", "m"))] <- 1000000
econStorm$CROPDMGEXP[(econStorm$CROPDMGEXP %in% c("B"))] <- 1000000
econStorm$CROPDMGEXP <- as.integer(econStorm$CROPDMGEXP)
#total
econStorm <- econStorm %>% mutate(total = (PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP) / 1000000000)

Check again.

econStorm$total[which.max(econStorm$total)]
## [1] 1.104
econStorm$REMARKS[which.max(econStorm$total)]
## [1] "The collective effects of Hurricane Charley in southwest and west central Florida during August 13th resulted in 8 direct fatalities, 16 indirect fatalities, 792 injuries, eight tornadoes, an eight foot storm surge in Lee County, an estimated $11.2 billion in property damage (estimated to be about twice that of the insured damage), and $460 million in crop damage.  The fast movement of Hurricane Charley limited rain fall totals to 4 to 6 inches along the track of the hurricane's eye wall. River flooding of one to three feet above flood stage was found on area rivers south of Interstate 4.  \n\nIn Hardee County there were 6 injuries, zero fatalities, and damage was estimated at $750 million.  A wind gust of 109 mph was recorded in Wauchula.   Approximately 1,400 homes were destroyed and 3,600 homes had minor to major damage.  The storm knocked out the power to the entire county.  Many citrus groves north of Wauchula were severely damaged. \n\nIn Highlands County there were reports of severely damaged homes in the Sebring and Avon Park areas.  A radio tower was blown down near Sebring and numerous trees and power lines were downed in the eastern and northern parts of the county. \n\nIn Polk County there was one direct fatality and six indirect fatalities. There were six injuries, mostly due to carbon monoxide poisoning from backup generators used after the storm.  Half of the county lost power.  A 50 foot deep sink hole opened up and swallowed a car on State Road 60 east of Bartow.  A wind gust of 64 mph was recorded at the airport in Winter Haven before the instrument stopped working due to a power outage and before the hurricane arrived.  A peak gust of 69 knots (79 mph) was recorded in Haines City.  The Bartow and Lake Wales area were hit the hardest.  In all, 22,994 structures were damaged and 739  were destroyed.  Citrus groves around and south of Lake Wales were severely damaged. \n\nThe damage total listed was estimated by doubling the insured losses.  The doubling accounted for uninsured losses and damage to the public infrastructure.  Flood damage was included within these totals because it was not possible to extract the flood damage information from the available data.  M61VE"

Now I have $1.104 billion in total and $11.2 billion in REMARKS. Try another field.

econStorm$total[econStorm$REFNUM == 267880]
## [1] 0.79215
econStorm$REMARKS[econStorm$REFNUM == 267880]
## [1] "Hurricane Fran moved onshore near Cape Fear Thursday evening (9/6) and raced north toward Raleigh cutting a swath of destruction.  The Category 3 hurricane destroyed or damaged 90% of the homes in North Topsail Beach.  The town hall and police station were destroyed.  A 15 foot storm surge cut a 100-foot-wide inlet through thye middle of the island.  A Camp Lejeune-based Marine lost his life when he and two others mistakenly drove onto the island at the height of the storm.  State Route 1568 was washed out and NC210 was covered with sand.  Damage in North Topsail and Onslow County alone exceeds $500 million as 6,688 structures were destroyed or damaged.  In Carteret County, Emerald Isle reported 67 homes destroyed and 409 with major damage.  Thirty-three mobile homes were destroyed.  The Emerald Isle fishing pier was destroyed, and Bogue Sound Pier lost 150 feet.  Erosion along the dunes ranged from 5 to 20 feet.  Winds gusted to 100 mph at Atlantic Beach.  Storm surge in Swansboro was measured at 10 feet.  Several businesses along the waterfront were destroyed and water covered Main Street.  Storm surges approaching 9 feet flooded portions of Washington and Belhaven.  The storm surge in Washington was the highest since the September 3, 1912 unnamed hurricane.  New Bern had a storm surge on the Neuse River of 10 feet.  One bridge was closed when an approach road collapsed.  Other bridges were out near Stella and Swansboro.  One man was killed near Greenville when his car struck a tree.  In Duplin County, near Rose Hill, a woman perished when her chimney collapsed through the house.  A woman died when a tree fell through her trailer in Catherine Lake, Onslow County.  Wind gusts measured included 94 mph at the New River Marine Corps Air Station in Jacksonville and 92 mph at Duke Marine Labs in Beaufort.  M42VE, F92PH, F18MH, M19IW"

$500 million in REMARKS vs $792.15 million in total. Close enough, try with with B for billions to make sure. The total is just the same, so finally I make conclusion, that B in DMGEXP columns means million.

The top-20 types by economical damage.

econAgg <- aggregate(econStorm$total, by = list(econStorm$EVTYPE), sum)
names(econAgg) <- c('Type', 'USD.total')
econTop20 <- econAgg[order(econAgg$USD.total, decreasing = T),][1:20, ]

Social

Now the social damage.

socStorm <- storm %>% filter(FATALITIES > 0 | INJURIES > 0) %>% 
  mutate(total = FATALITIES + INJURIES) %>%
  select(EVTYPE, FATALITIES, INJURIES, total)

And the social top-20.

socAgg <- aggregate(socStorm[2:4], by = list(socStorm$EVTYPE), sum)
names(socAgg) <- c('Type', 'Fatalities', 'Injuries','People.total')
socTop20 <- socAgg[order(socAgg$People.total, decreasing = T),][1:20, ]

Results

ggplot(socTop20, aes(reorder(Type, -People.total), People.total)) + geom_col() +
  ggtitle('Top-20 disasters by social damage, USA, 1950-2011 years') +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(name = 'People total') +
  scale_x_discrete(name = 'Type of disaster') +
  theme(axis.text.x = element_text(angle = 45, hjust=1))

Tornadoes are most harmful with respect to population health.

ggplot(econTop20, aes(reorder(Type, -USD.total), USD.total)) + geom_col() +
  ggtitle('Top-20 disasters by economic consequences, USA, 1950-2011 years') +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(name = 'USD Billions total') +
  scale_x_discrete(name = 'Type of disaster') +
  theme(axis.text.x = element_text(angle = 45, hjust=1))

Tornadoes have the greatest economic consequences.