Synopsis

For this study the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database is used. With this its possibel to examine the effect that severe weather events have in the U.S. The main goal will be to show which weather events that are most harmfull to the population, and which events that have the greatest economic consequences. There will also be a short notice, on which subjects could benefit from further analysis. The data processing is included for reproducibility, so feel free to skip to the results.

Data Processing

# packages used
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## De følgende objekter er maskerede fra 'package:stats':
## 
##     filter, lag
## 
## De følgende objekter er maskerede fra 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
library(reshape2)
library(ggplot2)

# read the NOAA data
storm.data <- read_csv("repdata-data-StormData.csv.bz2")

The purpose of this study is to analyse the most harmful events, and the economic consequences, so the data gets minimized to the variables that will help in that study.

storm.data <- storm.data %>% select(one_of("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
                                "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
storm.data$BGN_DATE <- as.Date(storm.data$BGN_DATE, "%m/%d/%Y")

# show histogram
hist(year(storm.data$BGN_DATE), xlab="year", main="Recorded weather events", breaks=60)

The NOAA database contains data back to 1950, but in the histogram we can see that the frequency that events gets recorded has increased significantly during the 90s. With that in mind and to lessen inflations effect on the economic part of the analysis we will look on the data from 1990 until november 2011.

storm.data <- storm.data %>% filter(year(storm.data$BGN_DATE)>=1990)
# analysis of event types
events <- as.data.frame(table(storm.data$EVTYPE))
events <- events %>% arrange(desc(Freq)) %>% mutate(Event=tolower(Var1), Var1=NULL) %>%
        select(one_of(c("Event", "Freq")))
str(events)
## 'data.frame':    977 obs. of  2 variables:
##  $ Event: chr  "hail" "tstm wind" "thunderstorm wind" "flash flood" ...
##  $ Freq : int  240945 147991 82563 54278 29764 25326 20843 20212 15755 15708 ...
head(events, 20)
##                       Event   Freq
## 1                      hail 240945
## 2                 tstm wind 147991
## 3         thunderstorm wind  82563
## 4               flash flood  54278
## 5                   tornado  29764
## 6                     flood  25326
## 7        thunderstorm winds  20843
## 8                 high wind  20212
## 9                 lightning  15755
## 10               heavy snow  15708
## 11               heavy rain  11723
## 12             winter storm  11433
## 13           winter weather   7026
## 14             funnel cloud   6839
## 15         marine tstm wind   6175
## 16 marine thunderstorm wind   5812
## 17               waterspout   3797
## 18              strong wind   3566
## 19     urban/sml stream fld   3392
## 20                 wildfire   2761

NOAA has defined 48 different event types, but the amount of registered event types is over 900. We will try to categorize the over 900 different registered types. This categorization will be an approcimation, where we try to atleast make sure that the most common registered events will be properly categorized, and also try to group the types further than what NOAA do.

storm.data <- mutate(storm.data, EVTYPE=tolower(EVTYPE))
storm.data$EVTYPE[grepl("hail", storm.data$EVTYPE)] <- "Hail"
storm.data$EVTYPE[grepl("tstm|thunder", storm.data$EVTYPE)] <- "Thunderstorm"
storm.data$EVTYPE[grepl("flash|flood|fld", storm.data$EVTYPE)] <- "Flood"
storm.data$EVTYPE[grepl("tornado|funnel|waterspout", storm.data$EVTYPE)] <- "Tornado"
storm.data$EVTYPE[grepl("light", storm.data$EVTYPE)] <- "Lightning"
storm.data$EVTYPE[grepl("snow|ice|blizzard|sleet|icy", storm.data$EVTYPE)] <- "Ice Storm"
storm.data$EVTYPE[grepl("rain|shower|precip", storm.data$EVTYPE)] <- "Rain"
storm.data$EVTYPE[grepl("wint|freez|cool|frost|cold|chill|glaze", storm.data$EVTYPE)] <- "Cold"
storm.data$EVTYPE[grepl("wind", storm.data$EVTYPE)] <- "Wind"
storm.data$EVTYPE[grepl("fire", storm.data$EVTYPE)] <- "Fire"
storm.data$EVTYPE[grepl("drought|dry", storm.data$EVTYPE)] <- "Drought"
storm.data$EVTYPE[grepl("heat|hot|warm|record", storm.data$EVTYPE)] <- "Heat"
storm.data$EVTYPE[grepl("fog", storm.data$EVTYPE)] <- "Fog"
storm.data$EVTYPE[grepl("wave|tide|surf|rip current|storm surge|tsunami", storm.data$EVTYPE)] <- "Wave & Tide"
storm.data$EVTYPE[grepl("hurricane|typhoon|tropical", storm.data$EVTYPE)] <- "Hurricane"
storm.data$EVTYPE[grepl("slide", storm.data$EVTYPE)] <- "Landslide"
storm.data$EVTYPE[grepl("dust", storm.data$EVTYPE)] <- "Dust Storm"
storm.data$EVTYPE[grepl("smoke", storm.data$EVTYPE)] <- "Smoke"

# some of the events are clearly not registered correctly by their event type, so they are removed.
storm.data <- storm.data[!grepl("summary", storm.data$EVTYPE),]
storm.data <- storm.data[!grepl("other", storm.data$EVTYPE),]
storm.data <- storm.data[!grepl("\\?", storm.data$EVTYPE),]

Results

Harmfull events

By grouping the data on the event types, and looking at fatalities and injuries, we can see which events are the most harmfull to the population.

harm.events <- storm.data %>% group_by(event=EVTYPE) %>%
        summarise(tot.fat=sum(FATALITIES), tot.inj=sum(INJURIES)) %>% 
        arrange(desc(tot.fat+tot.inj)) %>% 
        head(10) %>% melt(id=1)
colnames(harm.events) <- c("event", "harm.type", "value")
harm.events$event <- reorder(harm.events$event, harm.events$value)

ggplot(data=harm.events, aes(x=event, y=value, fill=harm.type)) +
        geom_bar(stat="identity", position="stack") + ylab("casualties") +
        coord_flip() + ggtitle("Harmfull events to the population 1990 - 2011") +
        scale_fill_discrete(name="legend", labels=c("fatalities", "injuries"))

The graphic shows that in the period 1990-2011 tornados have been the most harmfull, when you look at the amount of total casualties. But more importantly, we can also see that heat has caused the highest amount of fatalities.

Economic consequences

In this part we will look at the variables that register the estimated expenses in property damage and crop damage

# to calculate the economic effect we have to combine the prop/crop-dmg with the multiplier in prop/crop-dmgexp

# a helpful function
calc <- function(dmg, exp){
        if (is.numeric(exp)==TRUE)
                expo <-  as.numeric(exp)
        if (exp == "h")
                expo <- 2
        else if (exp[1] == "k")
                expo <- 3
        else if (exp[1] == "m")
                expo <- 6
        else if (exp[1] == "b")
                expo <- 9
        else
                expo <- 0
        dmg * 10^expo
}
# clean exp variables
storm.data$PROPDMGEXP <- tolower(storm.data$PROPDMGEXP)
storm.data$CROPDMGEXP <- tolower(storm.data$CROPDMGEXP)
storm.data$PROPDMGEXP[is.na(storm.data$PROPDMGEXP)] <- 0
storm.data$CROPDMGEXP[is.na(storm.data$CROPDMGEXP)] <- 0

# add calculated expenses to the df
storm.data <- mutate(storm.data, prop.dmg = mapply(calc, PROPDMG, PROPDMGEXP))
storm.data <- mutate(storm.data, crop.dmg = mapply(calc, CROPDMG, CROPDMGEXP))

# calculate and show the 10 most expensive weather events
tot.dmg <- storm.data %>% group_by(EVTYPE) %>% summarise(total=sum(prop.dmg+crop.dmg)) %>% arrange(desc(total))
tot.dmg10 <- tot.dmg[1:10,]
tot.dmg10
## Source: local data frame [10 x 2]
## 
##          EVTYPE        total
##           (chr)        (dbl)
## 1         Flood 179976092782
## 2     Hurricane  99173551360
## 3   Wave & Tide  48221044150
## 4       Tornado  30939173679
## 5          Hail  20734204440
## 6       Drought  15025675380
## 7  Thunderstorm  12138241370
## 8     Ice Storm  10914748662
## 9          Cold  10528439651
## 10         Fire   8899910130

We can clearly see that in the period 1990-2011 that flooding has caused the highest amout of expenses, followed by hurricanes and wave & tide events. Next We will visualize the data by year.

# sum the expenses by year
tot.dmg.year <- storm.data %>% filter(EVTYPE %in% tot.dmg10$EVTYPE) %>%
        group_by(year=year(BGN_DATE), EVTYPE) %>%
        summarise(total=sum(prop.dmg+crop.dmg))
# visualize in a lineplot by event over the years 1990-2011
ggplot(data=tot.dmg.year, aes(x=year, y=total/1000000, colour=EVTYPE)) + geom_line() +
        ylab("total expenses in millions") + ggtitle("Total expenses by year and event") +
        scale_color_discrete(name="legend")

The graphic shows that the economic consequences is very spiky with Flood spike in 2006, and smaller Hurricane and Wave & Tide spike in the year before. With spiky events that will have expenses far exceeding what you can cointain in one budget year we have to look for other solutions. Probably have to save up in funds, waiting to be used in years with excesive spikes.

Further study

Suggested topics for further study :