Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

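This report uses the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database to answer two questions about severe weather events: which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences.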

The answers should help in prioritizing resources for the different types of severe weather events.

Data Processing

The following code downloads the Storm Data into a data folder (unless it is already there) and loads all of it into the variable wdata.

if (!file.exists("data")) {
        dir.create("data")
}

if (!file.exists("./data/StormData.csv.bz2")) {
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="./data/StormData.csv.bz2", method="curl")
}

wdata <- read.csv(bzfile("./data/StormData.csv.bz2"), stringsAsFactors=FALSE)

Next, convert the data to a tibble (class tbl_df) with the dplyr package and look at its structure.

library(dplyr)
wdata1 <- as_tibble(wdata) # tbl_df() is deprecated in current dplyr; as_tibble() gives the same tbl_df class
str(wdata1)
## Classes 'tbl_df', 'tbl' and 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

First, let's find out how many events were recorded in each year and how the records are spread out over time.

library(tidyr)
library(lubridate)
library(grid)
library(gridExtra)
library(ggplot2)
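names(wdata1) <- tolower(names(wdata1)) # Convert all variable names to lowercase for quicker typing
# Extract the event year and bin it into decades. Explicit left-closed breaks keep the
# bins aligned with the labels (breaks=6 would give equal-width ~10.2-year intervals).
wdata1 <- mutate(wdata1, year=year(mdy_hms(bgn_date)),
                 year10s=cut(year, breaks=c(seq(1950, 2000, by=10), Inf), right=FALSE,
                             labels=c("1950-1960", "1960-1970", "1970-1980", "1980-1990", "1990-2000", "2000 onwards")))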

plot1 <- ggplot(wdata1, aes(x=factor(year))) + 
        geom_bar() + 
        scale_x_discrete(breaks=c(1950, 1960, 1970, 1980, 1990, 2000, 2010)) + 
        ggtitle("Events logged per year, 1950 to 2011") +
        xlab("Year") +
        ylab("Number of events recorded")

plot2 <- ggplot(wdata1, aes(x="", y=year)) + 
        geom_boxplot() + # default stat summarises the distribution of event years
        ggtitle("Spread of event years") +
        xlab("") +
        ylab("Year") +
        coord_flip() 

plot3 <- ggplot(droplevels(subset(wdata1, year10s != "1990-2000" & year10s != "2000 onwards")), 
                aes(x=year10s, fill=evtype)) + 
        geom_bar() + 
        ggtitle("Events recorded between 1950 and 1990") +
        xlab("Year") +
        ylab("Number of events recorded") +
        scale_fill_discrete(name="Type of events") # legend title for the fill aesthetic

plot4 <- ggplot(droplevels(subset(wdata1, year10s != "1990-2000" & year10s != "2000 onwards")), 
                aes(x=state)) + 
        geom_bar() + 
        facet_grid(evtype ~ .) + 
        ggtitle("Events recorded in each state between 1950 and 1990") +
        xlab("State") +
        ylab("Number of events recorded")

grid.arrange(plot1, plot2, plot3, plot4, ncol=1)

The first two plots show that relatively few events were recorded between 1950 and 1990, and that the number of recorded events increases sharply from 1995 onwards. To check which types of events were recorded, and in which states, over time, the years were binned into 6 periods: 1950-1960, 1960-1970, 1970-1980, 1980-1990, 1990-2000 and 2000 onwards. The 3rd plot shows that only 3 event types were logged between 1950 and 1990: HAIL, TORNADO and TSTM WIND (thunderstorm wind). Lastly, the 4th plot shows that the counts for all three are highest in TX.
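As a base-R cross-check of the year extraction and decade binning, the sketch below parses the year from `BGN_DATE` strings without lubridate and bins it into left-closed decades, so that 1960 lands in "1960-1970" rather than in the previous bin. It assumes the dates always follow the `m/d/Y H:MM:SS` layout shown by `str()` above; `year_of()` and `decade_of()` are hypothetical helpers, not part of the report. Splitting 1950-2011 with `breaks = 6` would instead give six equal-width intervals of about 10.2 years that do not line up with the decade labels.

```r
# Hypothetical helper: calendar year from NOAA BGN_DATE strings such as
# "4/18/1950 0:00:00". strptime() (used by as.Date) ignores the trailing time.
year_of <- function(bgn_date) {
        as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))
}

# Hypothetical helper: left-closed decade bins [1950, 1960), ..., with
# everything from 2000 onwards in the last bin.
decade_of <- function(year) {
        cut(year,
            breaks = c(seq(1950, 2000, by = 10), Inf),
            right  = FALSE,
            labels = c("1950-1960", "1960-1970", "1970-1980",
                       "1980-1990", "1990-2000", "2000 onwards"))
}

dates <- c("4/18/1950 0:00:00", "1/1/1960 0:00:00",
           "12/31/1999 0:00:00", "11/30/2011 0:00:00")
years <- year_of(dates)
print(years)                           # 1950 1960 1999 2011
print(as.character(decade_of(years)))  # "1950-1960" "1960-1970" "1990-2000" "2000 onwards"
```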

Next, I restrict the analysis to the years between the 1st and 3rd quartiles of the event-year distribution shown in the 2nd plot, i.e. 1995 to 2007.
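The cut-off years are the edges of the box in the 2nd plot. A minimal base-R sketch of deriving such a window from a `year` column, assuming R's default type-7 quantiles (another `type` can shift the edges); `iqr_window()` is a hypothetical helper and the toy data are invented:

```r
# Hypothetical helper: keep rows whose year lies between the 1st and 3rd
# quartiles (inclusive), using R's default type-7 quantiles.
iqr_window <- function(df) {
        q <- quantile(df$year, probs = c(0.25, 0.75), names = FALSE)
        df[df$year >= q[1] & df$year <= q[2], , drop = FALSE]
}

# Toy data: one event per year from 1990 to 2009 (20 rows).
toy  <- data.frame(year = 1990:2009)
kept <- iqr_window(toy)
range(kept$year)   # quartiles are 1994.75 and 2004.25, so 1995 to 2004 is kept
```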

wdata2 <- wdata1 %>%
        select(refnum, year, state, evtype, fatalities, injuries, propdmg, propdmgexp, cropdmg, cropdmgexp, remarks) %>%
        filter(year >= 1995 & year <= 2007)

wdata2$propdmgexp <- tolower(wdata2$propdmgexp)
wdata2$cropdmgexp <- tolower(wdata2$cropdmgexp)
wdata2$evtype <- tolower(wdata2$evtype)
wdata2$remarks <- as.character(wdata2$remarks)

for(i in c("b", "m", "k", "h")) {
        if (i == "b") {
                wdata2[grep(i, wdata2$propdmgexp), "propdmg"] <- wdata2[grep(i, wdata2$propdmgexp), "propdmg"] * 1000000000
                wdata2[grep(i, wdata2$cropdmgexp), "cropdmg"] <- wdata2[grep(i, wdata2$cropdmgexp), "cropdmg"] * 1000000000
        } else if (i == "m") {
                wdata2[grep(i, wdata2$propdmgexp), "propdmg"] <- wdata2[grep(i, wdata2$propdmgexp), "propdmg"] * 1000000
                wdata2[grep(i, wdata2$cropdmgexp), "cropdmg"] <- wdata2[grep(i, wdata2$cropdmgexp), "cropdmg"] * 1000000
        } else if (i == "k") {
                wdata2[grep(i, wdata2$propdmgexp), "propdmg"] <- wdata2[grep(i, wdata2$propdmgexp), "propdmg"] * 1000
                wdata2[grep(i, wdata2$cropdmgexp), "cropdmg"] <- wdata2[grep(i, wdata2$cropdmgexp), "cropdmg"] * 1000
        } else if (i == "h") {
                wdata2[grep(i, wdata2$propdmgexp), "propdmg"] <- wdata2[grep(i, wdata2$propdmgexp), "propdmg"] * 100
                wdata2[grep(i, wdata2$cropdmgexp), "cropdmg"] <- wdata2[grep(i, wdata2$cropdmgexp), "cropdmg"] * 100
        }
}
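The loop above rescales `propdmg` and `cropdmg` by the unit code in the matching `*exp` column (`h` = hundreds, `k` = thousands, `m` = millions, `b` = billions); any other code, such as `""`, `"?"` or a digit, leaves the value as recorded. The same rule can be written as a vectorized lookup on a named vector. This is a sketch on invented values; `exp_multiplier()` is a hypothetical helper, and it matches codes exactly rather than by substring as `grep()` does (equivalent here, since the codes are single characters).

```r
# Hypothetical helper: map lowercase magnitude codes to multipliers.
# Codes not in the table (including "") map to 1, i.e. no rescaling.
exp_multiplier <- function(code) {
        lookup <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
        mult <- unname(lookup[code])   # unknown or empty names give NA
        mult[is.na(mult)] <- 1
        mult
}

propdmg    <- c(25, 2.5, 1.5, 10, 3)
propdmgexp <- c("k", "m", "b", "", "?")
scaled <- propdmg * exp_multiplier(propdmgexp)
print(scaled)   # 25000, 2500000, 1.5e9, 10, 3
```

Used in place of the loop (not in addition to it, which would scale twice), this would be `wdata2$propdmg <- wdata2$propdmg * exp_multiplier(wdata2$propdmgexp)`, and likewise for crop damage.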

Now let's look at the records with at least one fatality, and clean up their event types by mapping the many free-text variants onto standard event names.

fatal <- wdata2 %>%
        select(year, state, evtype, fatalities, remarks) %>%
        filter(fatalities > 0)

fatal$evtype <- sub("^blowing snow$", "blizzard", fatal$evtype)
fatal$evtype <- sub("^black ice$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^cold wave$", "extreme cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^cold weather$", "cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^cold$", "cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^coastal flooding$", "coastal flood", fatal$evtype)
fatal$evtype <- sub("^coastalstorm$", "coastal storm", fatal$evtype)
fatal$evtype <- sub("^cold temperature$", "cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^cold and snow$", "cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^drought/excessive heat$", "excessive heat", fatal$evtype)
fatal$evtype <- sub("^dry microburst$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^drowning$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^excessive rainfall$", "heavy rain", fatal$evtype)
fatal$evtype <- sub("^extended cold$", "cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^extreme heat$", "excessive heat", fatal$evtype)
fatal$evtype <- sub("^extreme cold$", "extreme cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^extreme windchill$", "extreme cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^falling snow/ice$", "heavy snow", fatal$evtype)
fatal$evtype <- sub("^flash flooding$", "flash flood", fatal$evtype)
fatal$evtype <- sub("^flash floods$", "flash flood", fatal$evtype)
fatal$evtype <- sub("^flood & heavy rain$", "flood", fatal$evtype)
fatal$evtype <- sub("^flood/flash flood$", "flood", fatal$evtype)
fatal$evtype <- sub("^flooding$", "flood", fatal$evtype)
fatal$evtype <- sub("^fog$", "dense fog", fatal$evtype)
fatal$evtype <- sub("^freezing drizzle$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^freezing rain$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^freezing spray$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^frost$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^glaze$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^gusty wind$", "strong wind", fatal$evtype)
fatal$evtype <- sub("^heat wave drought$", "excessive heat", fatal$evtype)
fatal$evtype <- sub("^heat wave$", "excessive heat", fatal$evtype)
fatal$evtype <- sub("^heavy seas$", "marina strong wind", fatal$evtype)
fatal$evtype <- sub("^heavy snow and high winds$", "heavy snow", fatal$evtype)
fatal$evtype <- sub("^heavy surf$", "high surf", fatal$evtype)
fatal$evtype <- sub("^heavy surf and wind$", "high surf", fatal$evtype)
fatal$evtype <- sub("^high seas$", "high surf", fatal$evtype)
fatal$evtype <- sub("^heavy surf/high surf$", "high surf", fatal$evtype)
fatal$evtype <- sub("^high swells$", "high surf", fatal$evtype)
fatal$evtype <- sub("^high water$", "high surf", fatal$evtype)
fatal$evtype <- sub("^high waves$", "high surf", fatal$evtype)
fatal$evtype <- sub("^high winds$", "high wind", fatal$evtype)
fatal$evtype <- sub("^hurricane erin$", "hurricane", fatal$evtype)
fatal$evtype <- sub("^hurricane felix$", "hurricane", fatal$evtype)
fatal$evtype <- sub("^hurricane opal$", "hurricane", fatal$evtype)
fatal$evtype <- sub("^hurricane opal/high winds$", "hurricane", fatal$evtype)
fatal$evtype <- sub("^hurricane/typhoon$", "hurricane", fatal$evtype)
fatal$evtype <- sub("^hypothermia/exposure$", "cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^hyperthermia/exposure$", "excessive heat", fatal$evtype) # hyperthermia = overheating
fatal$evtype <- sub("^ice$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^ice on road$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^icy roads$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^landslide(s)?", "debris flow", fatal$evtype)
fatal$evtype <- sub("^light snow$", "winter weather", fatal$evtype)
fatal$evtype <- sub("^lightning.$", "lightning", fatal$evtype)
fatal$evtype <- sub("^marina strong wind$", "marine strong wind", fatal$evtype)
fatal$evtype <- sub("^marine accident$", "marine strong wind", fatal$evtype)
fatal$evtype <- sub("^marine mishap$", "marine strong wind", fatal$evtype)
fatal$evtype <- sub("^marine tstm wind$", "marine thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^mixed precip$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^mudslide(s)?$", "flash flood", fatal$evtype)
fatal$evtype <- sub("^rip currents$", "rip current", fatal$evtype)
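# NOTE: these two rows are relabelled by position, which only works for this exact
# row order of `fatal`; matching on evtype would be more robust
fatal[752, "evtype"] <- "flood"
fatal[753, "evtype"] <- "frost/freeze"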
fatal$evtype <- sub("^rapidly rising water$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^record heat$", "excessive heat", fatal$evtype)
fatal$evtype <- sub("^rip currents/heavy surf$", "rip current", fatal$evtype)
fatal$evtype <- sub("^river flood(ing)?$", "flood", fatal$evtype)
fatal$evtype <- sub("^rough seas$", "marine strong wind", fatal$evtype)
fatal$evtype <- sub("^rough surf$", "high surf", fatal$evtype)
fatal$evtype <- sub("^snow and ice$", "winter weather", fatal$evtype)
fatal$evtype <- sub("^snow squall(s)?$", "heavy snow", fatal$evtype)
fatal$evtype <- sub("^storm surge$", "storm surge/tide", fatal$evtype)
fatal$evtype <- sub("^strong winds$", "strong wind", fatal$evtype)
fatal$evtype <- sub("^thunderstorm$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^thunderstorm wind \\(g40\\)", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^thunderstorm wind g52$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^thundertorm wind(s)?$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^thunderstorm winds$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^tstm wind$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^tstm wind \\(g35\\)$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^tstm wind/hail$", "thunderstorm wind", fatal$evtype)
fatal$evtype <- sub("^thundersnow$", "frost/freeze", fatal$evtype)
fatal$evtype <- sub("^unseasonably cold$", "extreme cold/wind chill", fatal$evtype)
fatal$evtype <- sub("^unseasonably warm( and dry)?$", "excessive heat", fatal$evtype)
fatal$evtype <- sub("^urban and small stream floodin$", "heavy rain", fatal$evtype)
fatal$evtype <- sub("^urban.*fld$", "heavy rain", fatal$evtype)
fatal$evtype <- sub("^whirlwind$", "dust devil", fatal$evtype)
fatal$evtype <- sub("^wild/forest fire$", "wildfire", fatal$evtype)
position <- grep("^wind$", fatal$evtype)
for (i in position) {
        # Generic "wind" events: use the remarks to separate hurricanes from strong wind
        if (grepl("[Hh]urricane", fatal$remarks[i])) {
                fatal$evtype[i] <- "hurricane"
        } else {
                fatal$evtype[i] <- "strong wind"
        }
}
fatal$evtype <- sub("^wind storm$", "strong wind", fatal$evtype)
fatal$evtype <- sub("^winds$", "strong wind", fatal$evtype)
fatal$evtype <- sub("^winter storm high winds$", "strong wind", fatal$evtype)
fatal$evtype <- sub("^wintry mix$", "winter weather", fatal$evtype)
fatal$evtype <- sub("^winter weather/mix$", "winter weather", fatal$evtype)
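Each anchored `sub("^...$", ...)` call above replaces one exact string, so that part of the chain can also be expressed as a single lookup table: a named character vector from raw to clean label, applied with `match()`. The sketch below uses only a handful of the mappings from the report; `evtype_map` and `normalize_evtype()` are hypothetical names. Patterns with wildcards or optional groups (such as `^urban.*fld$`) would still need `sub()`, and because a lookup applies each mapping once, chained renames need their final target in the table.

```r
# A few of the report's exact-match mappings, as raw -> clean pairs.
evtype_map <- c(
        "tstm wind"          = "thunderstorm wind",
        "thunderstorm winds" = "thunderstorm wind",
        "flash flooding"     = "flash flood",
        "heat wave"          = "excessive heat",
        "rip currents"       = "rip current",
        "fog"                = "dense fog"
)

# Hypothetical helper: replace values found in the table, keep all others.
normalize_evtype <- function(evtype, map = evtype_map) {
        idx <- match(evtype, names(map))
        ifelse(is.na(idx), evtype, unname(map[idx]))
}

cleaned <- normalize_evtype(c("tstm wind", "heat wave", "tornado", "fog"))
print(cleaned)   # "thunderstorm wind" "excessive heat" "tornado" "dense fog"
```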

Results

Let's look at which event types occur most often among the fatal events, and which are the most harmful with respect to population health, measured here by fatalities.

fatal %>%
        group_by(evtype) %>%
        summarize(eventCount = n(), fatalitiesCount=sum(fatalities)) %>%
        arrange(desc(fatalitiesCount), desc(eventCount)) %>%
        print
## Source: local data frame [39 x 3]
## 
##                     evtype eventCount fatalitiesCount
## 1           excessive heat        556            2106
## 2                     heat         79             764
## 3                  tornado        295             763
## 4              flash flood        454             723
## 5                lightning        572             616
## 6              rip current        361             408
## 7        thunderstorm wind        268             311
## 8                    flood        189             297
## 9  extreme cold/wind chill        174             247
## 10               high wind        166             226
## ..                     ...        ...             ...
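For readers without dplyr, the same per-type summary (number of fatal events and total deaths, ranked by deaths and then by event count) can be sketched in base R. The toy rows below are invented for illustration and are not taken from the NOAA data.

```r
# Invented fatal-event records, for illustration only.
fatal_toy <- data.frame(
        evtype     = c("tornado", "heat", "tornado", "lightning", "heat", "lightning", "lightning"),
        fatalities = c(3, 10, 2, 1, 5, 2, 2),
        stringsAsFactors = FALSE
)

# Event count and fatality total per event type (tapply results aligned by name).
types <- sort(unique(fatal_toy$evtype))
by_type <- data.frame(
        evtype          = types,
        eventCount      = as.vector(tapply(fatal_toy$fatalities, fatal_toy$evtype, length)[types]),
        fatalitiesCount = as.vector(tapply(fatal_toy$fatalities, fatal_toy$evtype, sum)[types]),
        stringsAsFactors = FALSE
)

# Rank by total fatalities, breaking ties by the number of events.
ranked <- by_type[order(-by_type$fatalitiesCount, -by_type$eventCount), ]
rownames(ranked) <- NULL
print(ranked)
```

Here lightning and tornado tie on 5 deaths, and lightning ranks higher because it has more fatal events.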

Next, let's look at the top 1% of fatal events by death toll, i.e. the events whose fatalities are at or above the 99th percentile, quantile(fatal$fatalities, probs=0.99).
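Because fatality counts are small integers with many ties, the 99th percentile (R's default `type = 7`) is interpolated between order statistics, and the `>=` filter keeps every event at or above it, so the selection is not necessarily exactly 1% of the rows. A toy illustration with 100 invented counts:

```r
# 100 invented fatality counts; the two largest values are tied.
x <- c(rep(1, 90), rep(2, 5), 3, 5, 10, 20, 20)

threshold <- quantile(x, probs = 0.99, names = FALSE)
top <- x[x >= threshold]

print(threshold)      # 20: both neighbouring order statistics equal 20
print(length(top))    # 2 events, i.e. 2% of the rows, because of the tie
```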

top_fatal <- fatal %>%
        select(year, state, evtype, fatalities) %>%
        filter(fatalities >= quantile(fatal$fatalities, probs=0.99)) %>%
        arrange(desc(fatalities))

top_fatal_summary <- top_fatal %>% 
        group_by(evtype) %>%
        summarize(eventCount = n(), fatalitiesCount=sum(fatalities)) %>%
        arrange(desc(fatalitiesCount), desc(eventCount)) %>%
        print
## Source: local data frame [10 x 3]
## 
##             evtype eventCount fatalitiesCount
## 1   excessive heat         29             938
## 2             heat          3             630
## 3          tornado          5             120
## 4   tropical storm          1              22
## 5       heavy rain          1              19
## 6      flash flood          1              16
## 7        hurricane          1              15
## 8  cold/wind chill          1              14
## 9      debris flow          1              14
## 10        wildfire          1              14
ggplot(top_fatal, aes(x=evtype, y=fatalities, fill=state)) +
        geom_bar(position="dodge", stat="identity") +
        facet_grid(year ~ .)

Although a single heat event caused the highest death toll in the United States, it was an isolated case that did not recur in the other years. The event type most harmful to population health is excessive heat: it appears in the top 1% almost every year (29 events, 938 deaths) and leads the overall ranking with 2106 deaths across 556 fatal events.

Now let's look at which event types have the greatest economic consequences. First, extract the records with property or crop damage, then fix the names of the event types as before.

damage <- wdata2 %>%
        select(year, state, evtype, propdmg, cropdmg, remarks) %>%
        filter(propdmg > 0 | cropdmg > 0)

damage$evtype <- sub("^agricultural freeze$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^astronomical high tide$", "high surf", damage$evtype)
damage$evtype <- sub("^beach erosion$", "high surf", damage$evtype)
damage$evtype <- sub("^blowing dust$", "dust storm", damage$evtype)
damage$evtype <- sub("^blowing snow$", "winter weather", damage$evtype)
damage$evtype <- sub("^breakup flooding$", "flash flood", damage$evtype)
damage$evtype <- sub("^coastal  flooding/erosion$", "coastal flood", damage$evtype)
damage$evtype <- sub("^coastal flooding/erosion$", "coastal flood", damage$evtype)
damage$evtype <- sub("^coastal flooding$", "coastal flood", damage$evtype)
damage$evtype <- sub("^coastal erosion$", "high surf", damage$evtype)
damage$evtype <- sub("^cold$", "cold/wind chill", damage$evtype)
damage$evtype <- sub("^cold and wet conditions$", "heavy rain", damage$evtype)
damage$evtype <- sub("^dam break$", "flood", damage$evtype)
damage$evtype <- sub("^damaging freeze$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^downburst$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^drought/excessive heat$", "drought", damage$evtype)
damage$evtype <- sub("^dry microburst$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^dust devil waterspout$", "dust devil", damage$evtype)
damage$evtype <- sub("^early frost$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^erosion/cstl flood$", "coastal flood", damage$evtype)
damage$evtype <- sub("^excessive snow$", "heavy snow", damage$evtype)
damage$evtype <- sub("^excessive wetness$", "heavy rain", damage$evtype)
damage$evtype <- sub("^extended cold$", "extreme cold/wind chill", damage$evtype)
damage$evtype <- sub("^extreme cold$", "extreme cold/wind chill", damage$evtype)
damage$evtype <- sub("^extreme heat$", "excessive heat", damage$evtype)
damage$evtype <- sub("^extreme wind chill$", "extreme cold/wind chill", damage$evtype)
damage$evtype <- sub("^extreme windchill$", "extreme cold/wind chill", damage$evtype)
damage$evtype <- sub("^flash flood - heavy rain$", "flash flood", damage$evtype)
damage$evtype <- sub("^flash flood winds$", "flash flood", damage$evtype)
damage$evtype <- sub("^flash flood/ street$", "flash flood", damage$evtype)
damage$evtype <- sub("^flash flood/flood$", "flash flood", damage$evtype)
damage$evtype <- sub("^flash flooding$", "flash flood", damage$evtype)
damage$evtype <- sub("^flash floods$", "flash flood", damage$evtype)
damage$evtype <- sub("^flood & heavy rain$", "flood", damage$evtype)
damage$evtype <- sub("^flood/flash$", "flash flood", damage$evtype)
damage$evtype <- sub("^flood/flash flood$", "flood", damage$evtype)
damage$evtype <- sub("^flood/flash/flood$", "flood", damage$evtype)
damage$evtype <- sub("^flood/rain/winds$", "flood", damage$evtype)
damage$evtype <- sub("^flooding$", "flood", damage$evtype)
damage$evtype <- sub("^flooding/heavy rain$", "flood", damage$evtype)
damage$evtype <- sub("^floods$", "flood", damage$evtype)
damage$evtype <- sub("^fog$", "dense fog", damage$evtype)
damage$evtype <- sub("^freeze$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^freezing drizzle$", "winter weather", damage$evtype)
damage$evtype <- sub("^freezing rain$", "winter weather", damage$evtype)
damage$evtype <- sub("^freezing rain/snow$", "winter weather", damage$evtype)
damage$evtype <- sub("^frost$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^glaze$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^glaze ice$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^gradient wind$", "tropical depression", damage$evtype)
damage$evtype <- sub("^grass fires$", "wildfire", damage$evtype)
damage$evtype <- sub("^gusty wind$", "strong wind", damage$evtype)
damage$evtype <- sub("^gusty wind/hail$", "strong wind", damage$evtype)
damage$evtype <- sub("^gusty wind/hvy rain$", "strong wind", damage$evtype)
damage$evtype <- sub("^gusty wind/rain$", "strong wind", damage$evtype)
damage$evtype <- sub("^gusty winds$", "strong wind", damage$evtype)
damage$evtype <- sub("^gustnado$", "strong wind", damage$evtype)
damage$evtype <- sub("^hail 0.75$", "hail", damage$evtype)
damage$evtype <- sub("^hail 075$", "hail", damage$evtype)
damage$evtype <- sub("^hail 100$", "hail", damage$evtype)
damage$evtype <- sub("^hail 125$", "hail", damage$evtype)
damage$evtype <- sub("^hail 150$", "hail", damage$evtype)
damage$evtype <- sub("^hail 175$", "hail", damage$evtype)
damage$evtype <- sub("^hail 200$", "hail", damage$evtype)
damage$evtype <- sub("^hail 275$", "hail", damage$evtype)
damage$evtype <- sub("^hail 450$", "hail", damage$evtype)
damage$evtype <- sub("^hail 75$", "hail", damage$evtype)
damage$evtype <- sub("^hail damage$", "hail", damage$evtype)
damage$evtype <- sub("^hailstorm$", "hail", damage$evtype)
damage$evtype <- sub("^hard freeze$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^heat wave$", "excessive heat", damage$evtype)
damage$evtype <- sub("^heat wave drought$", "excessive heat", damage$evtype)
damage$evtype <- sub("^heavy mix$", "heavy rain", damage$evtype)
damage$evtype <- sub("^heavy rain and flood$", "heavy rain", damage$evtype)
damage$evtype <- sub("^heavy rain/high surf$", "heavy rain", damage$evtype)
damage$evtype <- sub("^heavy rain/severe weather$", "heavy rain", damage$evtype)
damage$evtype <- sub("^heavy rains$", "heavy rain", damage$evtype)
damage$evtype <- sub("^heavy snow and strong winds$", "heavy snow", damage$evtype)
damage$evtype <- sub("^heavy snow shower$", "heavy snow", damage$evtype)
damage$evtype <- sub("^heavy snow squalls$", "heavy snow", damage$evtype)
damage$evtype <- sub("^heavy snow-squalls$", "heavy snow", damage$evtype)
damage$evtype <- sub("^heavy snow/high winds & flood$", "heavy snow", damage$evtype)
damage$evtype <- sub("^heavy snow/ice$", "heavy snow", damage$evtype)
damage$evtype <- sub("^heavy surf$", "high surf", damage$evtype)
damage$evtype <- sub("^heavy surf coastal flooding$", "high surf", damage$evtype)
damage$evtype <- sub("^heavy surf/high surf$", "high surf", damage$evtype)
damage$evtype <- sub("^heavy swells$", "high surf", damage$evtype)
damage$evtype <- sub("^high  winds$", "high wind", damage$evtype)
damage$evtype <- sub("^high seas$", "high surf", damage$evtype)
damage$evtype <- sub("^high swells$", "high surf", damage$evtype)
damage$evtype <- sub("^high wind \\(g40\\)$", "high wind", damage$evtype)
damage$evtype <- sub("^high wind damage$", "high wind", damage$evtype)
damage$evtype <- sub("^high winds$", "high wind", damage$evtype)
damage$evtype <- sub("^high winds heavy rains$", "high wind", damage$evtype)
damage$evtype <- sub("^hurricane erin$", "hurricane", damage$evtype)
damage$evtype <- sub("^hurricane felix$", "hurricane", damage$evtype)
damage$evtype <- sub("^hurricane opal$", "hurricane", damage$evtype)
damage$evtype <- sub("^hurricane opal/high winds$", "hurricane", damage$evtype)
damage$evtype <- sub("^hurricane-generated swells$", "hurricane", damage$evtype)
damage$evtype <- sub("^hurricane/typhoon$", "hurricane", damage$evtype)
damage$evtype <- sub("^hvy rain$", "heavy rain", damage$evtype)
damage$evtype <- sub("^ice$", "winter weather", damage$evtype)
damage$evtype <- sub("^ice jam flood \\(minor$", "flash flood", damage$evtype)
damage$evtype <- sub("^ice jam flooding$", "flash flood", damage$evtype)
damage$evtype <- sub("^ice roads$", "winter weather", damage$evtype)
damage$evtype <- sub("^ice/strong winds$", "ice storm", damage$evtype)
damage$evtype <- sub("^icy roads$", "winter weather", damage$evtype)
damage$evtype <- sub("^lake effect snow$", "lake-effect snow", damage$evtype)
damage$evtype <- sub("^lake flood$", "lakeshore flood", damage$evtype)
damage$evtype <- sub("^landslide(s)?", "debris flow", damage$evtype)
damage$evtype <- sub("^landslump$", "debris flow", damage$evtype)
damage$evtype <- sub("^landspout$", "tornado", damage$evtype)
damage$evtype <- sub("^late season snow$", "winter weather", damage$evtype)
damage$evtype <- sub("^light freezing rain$", "winter weather", damage$evtype)
damage$evtype <- sub("^light snow$", "winter weather", damage$evtype)
damage$evtype <- sub("^light snowfall$", "winter weather", damage$evtype)
damage$evtype <- sub("^lightning  wauseon$", "lightning", damage$evtype)
damage$evtype <- sub("^lightning and heavy rain$", "lightning", damage$evtype)
damage$evtype <- sub("^lightning fire$", "lightning", damage$evtype)
damage$evtype <- sub("^lightning thunderstorm winds$", "lightning", damage$evtype)
damage$evtype <- sub("^ligntning$", "lightning", damage$evtype)
damage$evtype <- sub("^marine accident$", "marine high wind", damage$evtype)
damage$evtype <- sub("^marine tstm wind$", "marine thunderstorm wind", damage$evtype)
damage$evtype <- sub("^microburst$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^mixed precipitation$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^mudslide(s)?$", "flash flood", damage$evtype)
damage$evtype <- sub("^mud slides urban flooding$", "flash flood", damage$evtype)
position <- grep("^other$", damage$evtype)

for (i in position) {
        # Re-label generic "other" events from keywords in the remarks
        if (grepl("avalanches", damage$remarks[i])) {
                damage$evtype[i] <- "avalanche"
        } else if (grepl("dust ?devil", damage$remarks[i])) { # "dustdevil" or "dust devil"
                damage$evtype[i] <- "dust devil"
        } else {
                damage$evtype[i] <- "heavy rain"
        }
}

damage$evtype <- sub("^rain$", "heavy rain", damage$evtype)
damage$evtype <- sub("^record cold$", "cold/wind chill", damage$evtype)
damage$evtype <- sub("^river and stream flood$", "flash flood", damage$evtype)
damage$evtype <- sub("^river flood$", "flood", damage$evtype)
damage$evtype <- sub("^river flooding$", "flood", damage$evtype)
damage$evtype <- sub("^rock slide$", "debris flow", damage$evtype)
damage$evtype <- sub("^rough surf$", "high surf", damage$evtype)
damage$evtype <- sub("^rural flood$", "flash flood", damage$evtype)
damage$evtype <- sub("^rip currents$", "rip current", damage$evtype)
damage$evtype <- sub("^severe thunderstorm$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^severe thunderstorm winds$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^severe thunderstorms$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^small hail$", "hail", damage$evtype)
damage$evtype <- sub("^snow$", "winter weather", damage$evtype)
damage$evtype <- sub("^snow and ice$", "winter weather", damage$evtype)
damage$evtype <- sub("^snow freezing rain$", "winter weather", damage$evtype)
damage$evtype <- sub("^snow squall(s)?$", "heavy snow", damage$evtype)
damage$evtype <- sub("^snow/ice$", "winter weather", damage$evtype)
damage$evtype <- sub("^snow/sleet/freezing rain$", "winter weather", damage$evtype)
damage$evtype <- sub("^storm force winds$", "strong wind", damage$evtype)
damage$evtype <- sub("^storm surge$", "storm surge/tide", damage$evtype)
damage$evtype <- sub("^strong winds$", "strong wind", damage$evtype)
damage$evtype <- sub("^thundeerstorm winds$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderestorm winds$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thundersnow$", "frost/freeze", damage$evtype)
damage$evtype <- sub("^thunderstorm$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm  winds$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm damage to$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind 60 mph$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind 65 mph$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind 65mph$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind 98 mph$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind g55$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind g60$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind trees$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind.$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind/ tree$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind/ trees$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind/awning$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wind/lightning$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm winds$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm winds 63 mph$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm winds and$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm winds g60$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm winds hail$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm winds lightning$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm winds/ flood$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm winds/hail$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm winds53$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm windshail$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm windss$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorm wins$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorms wind$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstorms winds$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunderstormw$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thundertorm winds$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^thunerstorm winds$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tornado f0$", "tornado", damage$evtype)
damage$evtype <- sub("^tornado f1$", "tornado", damage$evtype)
damage$evtype <- sub("^tornado f2$", "tornado", damage$evtype)
damage$evtype <- sub("^tornado f3$", "tornado", damage$evtype)
damage$evtype <- sub("^tstm wind$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind  \\(g45\\)$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind \\(41\\)$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind \\(g35\\)$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind \\(g40\\)$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind \\(g45\\)$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind 40$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind 45$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind 55$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind 65\\)$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind and lightning$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind damage$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind g45$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind g58$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm wind/hail$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tstm winds$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^tunderstorm wind$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^typhoon$", "hurricane", damage$evtype)
damage$evtype <- sub("^tidal flooding$", "flood", damage$evtype)
damage$evtype <- sub("^tropical storm dean$", "tropical storm", damage$evtype)
damage$evtype <- sub("^tropical storm jerry$", "tropical storm", damage$evtype)
damage$evtype <- sub("^unseasonable cold$", "extreme cold/wind chill", damage$evtype)
damage$evtype <- sub("^unseasonably warm$", "excessive heat", damage$evtype)
damage$evtype <- sub("^unseasonal rain$", "heavy rain", damage$evtype)
damage$evtype <- sub("^urban flood$", "flash flood", damage$evtype)
damage$evtype <- sub("^urban flooding$", "flash flood", damage$evtype)
damage$evtype <- sub("^urban/small stream flood$", "flash flood", damage$evtype)
damage$evtype <- sub("^urban/sml stream fld$", "flash flood", damage$evtype)
damage$evtype <- sub("^waterspout tornado$", "waterspout", damage$evtype)
damage$evtype <- sub("^waterspout-tornado$", "waterspout", damage$evtype)
damage$evtype <- sub("^waterspout/ tornado$", "waterspout", damage$evtype)
damage$evtype <- sub("^waterspout/tornado$", "waterspout", damage$evtype)
damage$evtype <- sub("^wet microburst$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^whirlwind$", "tornado", damage$evtype)
damage$evtype <- sub("^wild fires$", "wildfire", damage$evtype)
damage$evtype <- sub("^wild/forest fire$", "wildfire", damage$evtype)
damage$evtype <- sub("^wild/forest fires$", "wildfire", damage$evtype)
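# Records labelled plain "wind" are reassigned by searching column 6 of damage
# (the text field the original loop checked) for the word "hurricane":
# matches become "hurricane", everything else becomes "strong wind".
position <- grep("^wind$", damage$evtype)
is_hurricane <- grepl("[Hh]urricane", damage[[6]][position])
damage$evtype[position] <- ifelse(is_hurricane, "hurricane", "strong wind")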
damage$evtype <- sub("^wind and wave$", "strong wind", damage$evtype)
damage$evtype <- sub("^wind damage$", "strong wind", damage$evtype)
damage$evtype <- sub("^winds$", "strong wind", damage$evtype)
damage$evtype <- sub("^wind storm$", "strong wind", damage$evtype)
damage$evtype <- sub("^winter storm high winds$", "winter storm", damage$evtype)
damage$evtype <- sub("^winter weather mix$", "winter weather", damage$evtype)
damage$evtype <- sub("^winter weather/mix$", "winter weather", damage$evtype)
damage$evtype <- sub("^wintry mix$", "winter weather", damage$evtype)
damage$evtype <- sub("^ tstm wind$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^   high surf advisory$", "high surf", damage$evtype)
damage$evtype <- sub("^ tstm wind \\(g45\\)$", "thunderstorm wind", damage$evtype)
damage$evtype <- sub("^ flash flood$", "flash flood", damage$evtype)
damage$evtype <- sub("^non-severe wind damage$", "high wind", damage$evtype)
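The chain of sub() calls above maps each raw label to a canonical event type one pattern at a time. The same mapping could instead be kept in a single named lookup vector. The base R sketch below is an illustration, not the author's code: `evtype_map` and `normalize_evtype()` are hypothetical names, only a handful of the mappings are reproduced, and leading/trailing whitespace is stripped first so the " tstm wind"-style variants need no separate entries.

```r
# Hypothetical lookup table: cleaned raw label -> canonical event type.
# Only a few of the mappings from the sub() calls above are reproduced here.
evtype_map <- c(
        "tstm wind"          = "thunderstorm wind",
        "tstm wind (g45)"    = "thunderstorm wind",
        "tornado f0"         = "tornado",
        "typhoon"            = "hurricane",
        "wild/forest fire"   = "wildfire",
        "urban flood"        = "flash flood",
        "high surf advisory" = "high surf"
)

# Hypothetical helper: trims whitespace, then replaces every label that has
# an entry in `map`; labels without an entry are returned unchanged.
normalize_evtype <- function(x, map) {
        x <- gsub("^[[:space:]]+|[[:space:]]+$", "", x)
        hit <- x %in% names(map)
        x[hit] <- unname(map[x[hit]])
        x
}

raw_labels <- c(" tstm wind", "typhoon", "tornado", "   high surf advisory")
normalize_evtype(raw_labels, evtype_map)
# -> "thunderstorm wind" "hurricane" "tornado" "high surf"
```

One difference from the sub() chain: each label is looked up only once, so a mapping whose target is itself a key elsewhere in the table is not applied a second time. A table used this way (for example `damage$evtype <- normalize_evtype(damage$evtype, evtype_map)`) should therefore map every raw label straight to its final canonical name.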

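Let’s look at which event types caused the highest combined property and crop damage between 1995 and 2007. The property and crop damage columns are first gathered into long format, so each event contributes two rows (one per damage type); eventCount in the summary therefore counts rows, which is twice the number of events.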

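library(tidyr)  # provides gather()

# Reshape to long format: one row per event and damage type (propdmg / cropdmg)
damage <- damage %>%
        select(year, state, evtype, propdmg, cropdmg) %>%
        gather(damageType, damageValue, -(year:evtype))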

damage %>% 
        group_by(evtype) %>% 
        summarize(eventCount=n(), totalDamage=sum(damageValue)) %>%
        arrange(desc(totalDamage)) %>%
        print
## Source: local data frame [49 x 3]
## 
##               evtype eventCount  totalDamage
## 1              flood      10220 134183167672
## 2          hurricane        394  88155747810
## 3   storm surge/tide        350  43198086000
## 4            drought        390  14916349780
## 5        flash flood      27976  12351307550
## 6            tornado      17438  11902851727
## 7               hail      36312  11485708853
## 8     tropical storm        384   7919064200
## 9  thunderstorm wind     147050   7633704667
## 10          wildfire       1196   6962370360
## ..               ...        ...          ...
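The gather() step turns each event's two damage columns into two rows. The small base R sketch below mimics that reshape on made-up toy records (without using tidyr) and shows why n() on the long table counts each event twice, while counting only the propdmg rows counts each event once.

```r
# Made-up toy records, one row per event, in the wide layout
wide <- data.frame(
        year    = c(1995, 1995, 1996),
        state   = c("TX", "FL", "TX"),
        evtype  = c("flood", "hurricane", "flood"),
        propdmg = c(100, 5000, 200),
        cropdmg = c(10, 0, 0),
        stringsAsFactors = FALSE)

# Long layout like gather() produces: all propdmg rows first, then all cropdmg rows
long <- data.frame(
        wide[rep(seq_len(nrow(wide)), times = 2), c("year", "state", "evtype")],
        damageType  = rep(c("propdmg", "cropdmg"), each = nrow(wide)),
        damageValue = c(wide$propdmg, wide$cropdmg),
        stringsAsFactors = FALSE)

# Rows per event type (what n() returns): two per event
rows_per_type <- tapply(long$damageValue, long$evtype, length)
# Events per event type: count only one damage type per event
events_per_type <- tapply(long$damageType == "propdmg", long$evtype, sum)
# Total damage is unaffected by the reshape
damage_per_type <- tapply(long$damageValue, long$evtype, sum)

rows_per_type    # flood 4, hurricane 2
events_per_type  # flood 2, hurricane 1
damage_per_type  # flood 310, hurricane 5000
```

In the dplyr pipeline, the corresponding event count would be `eventCount=sum(damageType == "propdmg")` inside summarize().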

Next, let’s look at the top 1% of damage records, i.e. those whose property or crop damage value is at or above the 99th percentile, quantile(damage$damageValue, probs=0.99).

library(ggplot2)
library(gridExtra)  # provides grid.arrange()

# Keep only records at or above the 99th percentile of damage value
top_damage <- damage %>%
        filter(damageValue >= quantile(damage$damageValue, probs=0.99))

# Total damage per year, event type and state
top_damage_summary <- top_damage %>%
        group_by(year, evtype, state) %>%
        summarize(eventCount=n(), totalDamage=sum(damageValue))

# The five event types with the highest total damage in the summary table
top_events <- c("flood", "hurricane", "storm surge/tide", "drought", "flash flood")

plot5 <- top_damage_summary %>%
        filter(evtype %in% top_events) %>%
        ggplot(aes(x=factor(year), y=totalDamage, fill=evtype)) +
        geom_bar(position="dodge", stat="identity") +
        labs(x="Year", y="Total damage (USD)", fill="Event type")

plot6 <- top_damage_summary %>%
        filter(evtype %in% top_events) %>%
        ggplot(aes(x=evtype, y=totalDamage, fill=state)) +
        geom_bar(position="dodge", stat="identity") +
        facet_grid(year ~ .) +
        labs(x="Event type", y="Total damage (USD)", fill="State")

grid.arrange(plot5, plot6, ncol=1)
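The 1% cut-off is the 99th percentile returned by quantile(), which by default (type 7) interpolates linearly between the two nearest order statistics, so the threshold need not be an observed value. A toy base R sketch with made-up values shows the effect:

```r
# Made-up damage values: 150 zeros, the integers 1 to 49, and one huge value (200 in total)
values <- c(rep(0, 150), 1:49, 1e9)

# 99th percentile with R's default (type 7) interpolation:
# position 1 + (200 - 1) * 0.99 = 198.01 lies between the sorted values 48 and 49
threshold <- quantile(values, probs = 0.99)

# Records at or above the threshold form the "top 1%"
top <- values[values >= threshold]

threshold  # about 48.01
top        # 49 and 1e9, i.e. 2 of 200 records
```

Because the comparison is `>=`, every record tied at the threshold is kept, so the filtered set can exceed 1% of records when many values share the threshold.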

Judging by the summary table and the first chart, flood is the event type that causes the highest property and crop damage. However, the second chart shows that flood’s total is so high because of a single event in 2007. Per event, hurricanes cause far more damage than floods: in the summary table, hurricane’s total comes from 394 rows and flood’s from 10,220.
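The per-event comparison can be made concrete with the totals and row counts printed in the summary table. A short R sketch of the arithmetic follows; since each event contributes two rows to both counts, the per-row averages are half the per-event averages, but their ratio is the same.

```r
# Totals and row counts copied from the summary table above
flood_total     <- 134183167672
flood_rows      <- 10220
hurricane_total <- 88155747810
hurricane_rows  <- 394

flood_avg     <- flood_total / flood_rows          # about 13.1 million per row
hurricane_avg <- hurricane_total / hurricane_rows  # about 223.7 million per row

# The factor of 2 (two rows per event) cancels in the ratio
ratio <- hurricane_avg / flood_avg                 # about 17
```

On these figures, an average hurricane causes roughly 17 times the damage of an average flood, even with the single large flood event included in flood's total.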