The purpose of this analysis is to identify types of storm events that pose the greatest potential harm to population health, and which ones have the greatest economic consequences.

The event data has been captured by the National Oceanographic and Atmospheric Administration (NOAA) in a storm database. I modified the data to facilitate analysis, deriving an event year variable, converting damage estimates to an equivalent dollar-based metric, mapping the free-text event types to 22 categories, and dropping data from 1995 and prior years, which may be incomplete and hence not well suited for estimating future impacts.

I generated boxplots representing the annual distribution of event counts, fatalities, injuries, property damage estimates, and crop damage estimates, ordered consistently by descending median event frequency, and tagged with thresholds for each metric that enable easy visual identification of high-impact events.

The analysis showed that heat events, while relatively less frequent, are associated with high absolute fatality and injury rates, constituting a major risk to human health. Tornadoes, floods, and lightning strikes are common and associated with high fatality and/or injury rates.

The events with the greatest economic consequences are somewhat different. Hurricanes result in huge property and crop losses, despite relatively low incidence rates. Droughts and floods are somewhat more common, and have particularly negative effects on crops. Meanwhile, thunderstorms, hail storms, floods, tornadoes, and winter storms are relatively frequent AND cause heavy property damages.

Data Processing

A number of supplemental packages must be loaded to support the analysis (assumed already installed).

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(grid)
library(gridExtra)
library(lubridate)
library(scales)
library(stringr)

I first read the contents of the Storm Data dataset into a data frame called storm, downloading the dataset into the current working directory if not already present.

if (!file.exists("StormData.csv.bz2"))   
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="StormData.csv.bz2")
storm <- read.csv("StormData.csv.bz2", stringsAsFactors=FALSE)

I combined the BGN_DATE / BGN_TIME and END_DATE / END_TIME fields into single date/time variables and reformatted them as POSIXct. The time component was ignored in cases with invalid entries.

storm$BGN_DT <- with(storm, substr(BGN_DATE, 1, nchar(BGN_DATE)-8))
storm$BGN_DT <- with(storm, paste(BGN_DT, " ", substr(BGN_TIME,1,2), ":", substr(BGN_TIME,3,4), sep=""))
storm$BGN_DT <- mdy_hm(storm$BGN_DT)
## Warning: 5 failed to parse.
# Revert to midnight if invalid time
storm$BGN_DT <- ifelse(is.na(storm$BGN_DT), mdy_hms(storm$BGN_DATE), storm$BGN_DT)

storm$END_DT <- with(storm, substr(END_DATE, 1, nchar(END_DATE)-8))
storm$END_DT <- with(storm, paste(END_DT, " ", substr(END_TIME,1,2), ":", substr(END_TIME,3,4), sep=""))
storm$END_DT <- mdy_hm(storm$END_DT)
## Warning: 243795 failed to parse.
# Revert to midnight if invalid time
storm$END_DT <- ifelse(is.na(storm$END_DT), mdy_hms(storm$END_DATE), storm$END_DT)

# Extract the year component of the BEGIN value for later use
storm$BGN_YR <- year(as.POSIXlt(storm$BGN_DT, origin=origin))

Property and crop damage estimates in some cases must be converted to the same scale using the magnitude indicator provided in the database (K/k = thousands, M/m = millions, B/b = billions). I created PROPDMG_AMT and CROPDMG_AMT variables to house damage estimates on an equivalent dollar basis. Modifiers other than K/k, M/m, and B/b were ignored (assumed to be data-entry errors).

storm$PROPDMG_AMT <- ifelse((storm$PROPDMGEXP %in% c("k","K")), storm$PROPDMG*1000, 
                            ifelse((storm$PROPDMGEXP %in% c("m","M")), storm$PROPDMG*1000000,
                                   ifelse((storm$PROPDMGEXP %in% c("b","B")), storm$PROPDMG*1000000000,
                                          storm$PROPDMG
                                   )
                            )
)
summary(storm$PROPDMG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00   12.06    0.50 5000.00
summary(storm$PROPDMG_AMT)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 4.736e+05 5.000e+02 1.150e+11
storm$CROPDMG_AMT <- ifelse((storm$CROPDMGEXP %in% c("k","K")), storm$CROPDMG*1000, 
                            ifelse((storm$CROPDMGEXP %in% c("m","M")), storm$CROPDMG*1000000,
                                   ifelse((storm$CROPDMGEXP %in% c("b","B")), storm$CROPDMG*1000000000,
                                          storm$CROPDMG
                                   )
                            )
)
summary(storm$CROPDMG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   1.527   0.000 990.000
summary(storm$CROPDMG_AMT)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09

The EVTYPE field appears to be free-form text, in which the same event type can arise in a number of different forms.

I examined the contents to derive a handful of simple standardization rules that eliminate needless variation due to misspellings and intentional abbreviations.

storm$EVTYPE_MOD <- toupper(storm$EVTYPE)
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "TSTM", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "THUNDERSTORMS", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "THUNDERSTROM", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "THUNDESTORM", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "THUNDEERSTORM", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "SNOW-SQUALLS", "SNOW SQUALLS")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "LAKE-EFFECT", "LAKE EFFECT")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "WINDS", "WIND")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "WND", "WIND")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "WAYTERSPOUT", "WATERSPOUT")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "WATER SPOUT", "WATERSPOUT")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "CURRENTS", "CURRENT")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "MICOBURST", "MICROBURST")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "LIGHTING", "LIGHTNING")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "LIGNTNING", "LIGHTNING")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "TORNDAO", "TORNADO")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "SML", "SMALL")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "FLD", "FLOOD")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "FLOOOD", "FLOOD")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "AVALANCE", "AVALANCHE")

#Examine common field values (100 or more instances)
data.frame(table(storm$EVTYPE_MOD)) %>% filter(Freq >= 100) %>% arrange(desc(Freq))
##                        Var1   Freq
## 1         THUNDERSTORM WIND 323383
## 2                      HAIL 288661
## 3                   TORNADO  60653
## 4               FLASH FLOOD  54277
## 5                     FLOOD  25327
## 6                 HIGH WIND  21747
## 7                 LIGHTNING  15758
## 8                HEAVY SNOW  15708
## 9  MARINE THUNDERSTORM WIND  11987
## 10               HEAVY RAIN  11742
## 11             WINTER STORM  11433
## 12           WINTER WEATHER   7045
## 13             FUNNEL CLOUD   6844
## 14               WATERSPOUT   3798
## 15              STRONG WIND   3773
## 16 URBAN/SMALL STREAM FLOOD   3422
## 17                 WILDFIRE   2761
## 18                 BLIZZARD   2719
## 19                  DROUGHT   2488
## 20                ICE STORM   2006
## 21           EXCESSIVE HEAT   1678
## 22         WILD/FOREST FIRE   1457
## 23             FROST/FREEZE   1343
## 24                DENSE FOG   1293
## 25       WINTER WEATHER/MIX   1104
## 26   THUNDERSTORM WIND/HAIL   1053
## 27  EXTREME COLD/WIND CHILL   1002
## 28              RIP CURRENT    774
## 29                     HEAT    767
## 30                HIGH SURF    734
## 31           TROPICAL STORM    690
## 32           FLASH FLOODING    683
## 33         LAKE EFFECT SNOW    659
## 34             EXTREME COLD    657
## 35            COASTAL FLOOD    656
## 36        FLOOD/FLASH FLOOD    625
## 37                     SNOW    617
## 38                LANDSLIDE    600
## 39          COLD/WIND CHILL    539
## 40                      FOG    538
## 41              MARINE HAIL    442
## 42               DUST STORM    427
## 43                AVALANCHE    387
## 44                     WIND    383
## 45              STORM SURGE    261
## 46            FREEZING RAIN    260
## 47              URBAN FLOOD    251
## 48     HEAVY SURF/HIGH SURF    228
## 49        EXTREME WINDCHILL    204
## 50           DRY MICROBURST    186
## 51         COASTAL FLOODING    183
## 52               LIGHT SNOW    176
## 53    ASTRONOMICAL LOW TIDE    174
## 54                HURRICANE    174
## 55              RIVER FLOOD    173
## 56            RECORD WARMTH    154
## 57               DUST DEVIL    149
## 58         STORM SURGE/TIDE    148
## 59         MARINE HIGH WIND    135
## 60        UNSEASONABLY WARM    126
## 61                 FLOODING    120
## 62   ASTRONOMICAL HIGH TIDE    103
## 63        MODERATE SNOWFALL    101

I also mapped the modified event types to general categories to ease interpretation. In cases that match multiple conditions, the last transformation applied will stand.

storm$EVTYPE_GRP = "OTHER"
storm[grep("TEMPERATUR", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "TEMPERATURE"
storm[grep("AVALANCHE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "AVALANCHE"
storm[grep("WIND|TURBULENCE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "WIND"
storm[grep("RAIN|WET|SHOWER", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "RAIN"
storm[grep("HAIL", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "HAIL"
storm[grep("FIRE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "FIRE"
storm[grep("FOG", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "FOG"
storm[grep("HEAT|WARM|HOT|HIGH TEMP", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "HEAT"
storm[grep("COLD|COOL|LOW TEMP", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "COLD"
storm[grep("DROUGHT|DRY", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "DROUGHT"
storm[grep("FLOOD|HIGH WATER", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "FLOOD"
storm[grep("SMOKE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "SMOKE"
storm[grep("DUST", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "DUST"
storm[grep("VOLCAN", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "VOLCANIC ERUPTION"
storm[grep("SNOW|WINTER|BLIZZARD|ICE|FROST|SLEET|MIX|FREEZ|ICY", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "WINTER STORM"
storm[grep("LIGHTNING", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "LIGHTNING"
storm[grep("SLIDE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "LANDSLIDE"
storm[grep("TORNADO|FUNNEL|WATERSPOUT|MICROBURST|ROTATING|GUSTNADO", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "TORNADO"
storm[grep("THUNDERSTORM", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "THUNDERSTORM"
storm[grep("HURRICANE|TYPHOON|TROPICAL", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "HURRICANE"
storm[grep("CURRENT|SURF|TIDE|SURGE|TSUNAMI|HIGH SEAS|SWELLS|WAVE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "WAVES"

The storm dataset was now in a form well suited for the objectives of the analysis.

Analysis

The NOAA database documentation indicates that data collection has been sparse in the past. I first examined the data for clues as to when relatively complete reporting may have begun. As a first step, I summarized the event stream by year, examining event frequency, the number of reporting Weather Field Offices (WFOs), and the number of distinct state office (STATEOFFIC) values, then graphed the tallies by year.

storm_rptg <- storm %>%
  group_by(BGN_YR) %>%
  summarize(N=n(),
            N_WFO=length(unique(WFO)),
            N_SO=length(unique(STATEOFFIC))
            ) %>%
  arrange(BGN_YR)
tbl_df(storm_rptg)
## Source: local data frame [62 x 4]
## 
##    BGN_YR     N N_WFO  N_SO
##     (dbl) (int) (int) (int)
## 1    1950   223     1     1
## 2    1951   269     1     1
## 3    1952   272     1     1
## 4    1953   492     1     1
## 5    1954   609     1     1
## 6    1955  1413     1     1
## 7    1956  1703     1     1
## 8    1957  2184     1     1
## 9    1958  2213     1     1
## 10   1959  1813     1     1
## ..    ...   ...   ...   ...
p1 <- ggplot(data=storm_rptg, aes(x=BGN_YR, y=N))+
  geom_bar(stat="identity", fill="slategrey")+
  labs(x="", y="Nbr of Events")+
  geom_vline(xintercept=1995.5,color="darkorange", size=1.5)+
  annotate("text", label="1996", x=1996, y=60000, color="darkorange", hjust=0)
p2 <- ggplot(data=storm_rptg, aes(x=BGN_YR, y=N_WFO))+
  geom_bar(stat="identity", fill="slategrey")+
  labs(x="",y="Nbr of WFOs Reporting")+
  geom_vline(xintercept=1995.5,color="darkorange", size=1.5)
p3 <- ggplot(data=storm_rptg, aes(x=BGN_YR, y=N_SO))+
  geom_bar(stat="identity", fill="slategrey")+
  labs(x="Year (BGN_DATE)",y="Nbr of State Offices")+
  geom_vline(xintercept=1995.5,color="darkorange", size=1.5)
grid.arrange(p1, p2, p3, nrow=3)

The number of events captured in the database increases steadily starting in 1994, whereas the WFO and State Office fields become consistently populated as of 1996. For the current purpose of assessing drivers of injury and loss, I decided to discard data collected prior to 1996.

I next aggregated the raw data structure by event type grouping and year (1996 forward only), creating a new data frame called storm_grpyr. This data structure captures the total number of events recorded, as well as annual fatalities, injuries, property damage amounts (in $millions), and crop damage amounts (also in $millions).

In the process, I also established certain thresholds for each metric, easing the identification of high-frequency/magnitude events.

storm_grpyr <- storm %>%
  filter(BGN_YR >= 1996) %>% 
  group_by(EVTYPE_GRP, BGN_YR) %>%
  summarize(
    N=n(),
    TF=sum(FATALITIES),
    TI=sum(INJURIES),
    TP=sum(PROPDMG_AMT/1000000),
    TC=sum(CROPDMG_AMT/1000000)
  ) %>%
  arrange(desc(N))

# Calculate summary statistics for key metrics
medn <- storm_grpyr %>%
  group_by(EVTYPE_GRP) %>%
  summarize(
    N_med=median(N),
    N_mean=mean(N),
    TF_med=median(TF),
    TI_med=median(TI),
    TP_med=median(TP),
    TC_med=median(TC)
  )

# Create factors representing "of-interest" thresholds per metric
medn$N_high <- as.factor(medn$N_med > 2500)
medn$TF_high <- as.factor(medn$TF_med > 25)
medn$TI_high <- as.factor(medn$TI_med > 250)
medn$TP_high <- as.factor(medn$TP_med > 250)
medn$TC_high <- as.factor(medn$TC_med > 100)

# Merge summary statistics back to main table
storm_grpyr <- left_join(storm_grpyr, medn, by="EVTYPE_GRP")
storm_grpyr$EVTYPE_GRP <- factor(storm_grpyr$EVTYPE_GRP, levels=storm_grpyr[order(storm_grpyr$N_med),]$EVTYPE_GRP, ordered=TRUE)
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
# Examine the result
tbl_df(storm_grpyr)
## Source: local data frame [324 x 18]
## 
##    EVTYPE_GRP BGN_YR     N    TF    TI     TP    TC N_med N_mean TF_med
##        (fctr)  (dbl) (int) (dbl) (dbl)  (dbl) (dbl) (dbl)  (dbl)  (dbl)
## 1   AVALANCHE   2010    50    22    20 0.2018     0  24.5 23.625   12.5
## 2   AVALANCHE   2008    39    25    14 1.1260     0  24.5 23.625   12.5
## 3   AVALANCHE   2006    36    13     6 0.0000     0  24.5 23.625   12.5
## 4   AVALANCHE   2005    35    11    16 0.2000     0  24.5 23.625   12.5
## 5   AVALANCHE   2000    28    16    18 0.7550     0  24.5 23.625   12.5
## 6   AVALANCHE   2007    27    18    12 0.0020     0  24.5 23.625   12.5
## 7   AVALANCHE   2001    25    18     5 0.0440     0  24.5 23.625   12.5
## 8   AVALANCHE   2002    25    18    11 0.0315     0  24.5 23.625   12.5
## 9   AVALANCHE   1999    24    26    10 0.0650     0  24.5 23.625   12.5
## 10  AVALANCHE   2011    21     9     8 0.0550     0  24.5 23.625   12.5
## ..        ...    ...   ...   ...   ...    ...   ...   ...    ...    ...
## Variables not shown: TI_med (dbl), TP_med (dbl), TC_med (dbl), N_high
##   (fctr), TF_high (fctr), TI_high (fctr), TP_high (fctr), TC_high (fctr)

Results

The analysis is driven by a series of boxplots showing event and key metric distributions over the 1996-2011 period represented in the database.

First, I examined overall event frequencies and impacts on population health, as expressed by the numbers of fatalities and injuries associated with each event type. Note that the event types are presented in descending order of the median number of annual events (the associated factor conversion generated several warnings). I also used log10-scale conversions to ease visual interpretation.

p1 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=N, fill=N_high))+
  geom_boxplot()+
  coord_flip()+
  labs(title="Annual Events", x="", y="")+
  theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
  scale_fill_manual(values=c("slategrey","darkorange"))+
  geom_hline(yintercept=2500, color="darkorange", size=1)
p2 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=N, fill=N_high))+
  geom_boxplot()+
  coord_flip()+
  labs(title="Annual Events (log10)", x="", y="")+
  scale_y_log10()+
  theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
  scale_fill_manual(values=c("slategrey","darkorange"))+
  geom_hline(yintercept=2500, color="darkorange", size=1)
p3 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TF, fill=TF_high))+
  geom_boxplot()+
  coord_flip()+
  labs(title="Annual Fatalities (log10)", x="", y="")+
  scale_y_log10()+
  theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
  scale_fill_manual(values=c("slategrey","darkorange"))+
  geom_hline(yintercept=25, color="darkorange", size=1)
p4 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TI, fill=TI_high))+
  geom_boxplot()+
  coord_flip()+
  labs(title="Annual Injuries (log10)", x="", y="")+
  scale_y_log10()+
  theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
  scale_fill_manual(values=c("slategrey","darkorange"))+
  geom_hline(yintercept=250, color="darkorange", size=1)
grid.arrange(p1, p2, p3, p4, nrow=2, ncol=2)
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 86 rows containing non-finite values (stat_boxplot).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 69 rows containing non-finite values (stat_boxplot).

Thunderstorms, hail storms, and floods constitute the most frequent event types, with low and inconsistent incident rates for most other categories. Events generating relatively high fatality rates (medians exceeding the threshold rate of 25 per year) included floods, tornadoes, wind, lightning, waves, and heat. Heat events displayed a wide range of fatality rates over the period with the highest median of any category. Events generating relatively high injury rates (medians exceeding the threshold rate of 250 per year) included thunderstorms, tornadoes, lightning, and heat.

In summary, heat events have dramatic impacts on population health, despite being relatively uncommon. Tornadoes, floods, and lightning strikes are relatively common and also associated with high fatality and/or injury rates.

Next, I generated similar boxplots representing overall property and crop damage estimates, again showing both basic and log10-scale views.

p1 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TP, fill=TP_high))+
  geom_boxplot()+
  coord_flip()+
  labs(title="Property Damage (MM)", x="", y="")+
  theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
  scale_fill_manual(values=c("slategrey","darkorange"))+
  scale_y_continuous(labels=dollar, limits=c(0,1000))+
  geom_hline(yintercept=250, color="darkorange", size=1)
p2 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TP, fill=TP_high))+
  geom_boxplot()+
  coord_flip()+
  labs(title="Property Damage (MM, log10)", x="", y="")+
  scale_y_log10(labels=dollar)+
  theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
  scale_fill_manual(values=c("slategrey","darkorange"))+
  geom_hline(yintercept=250, color="darkorange", size=1)
p3 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TC, fill=TC_high))+
  geom_boxplot()+
  coord_flip()+
  labs(title="Crop Damage (MM)", x="", y="")+
  theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
  scale_fill_manual(values=c("slategrey","darkorange"))+
  scale_y_continuous(labels=dollar, limits=c(0,2000))+
  geom_hline(yintercept=100, color="darkorange", size=1)
p4 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TC, fill=TC_high))+
  geom_boxplot()+
  coord_flip()+
  labs(title="Crop Damage (MM, log10)", x="", y="")+
  scale_y_log10(labels=dollar)+
  theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
  scale_fill_manual(values=c("slategrey","darkorange"))+
  geom_hline(yintercept=100, color="darkorange", size=1)
grid.arrange(p1, p2, p3, p4, nrow=2, ncol=2)
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 45 rows containing non-finite values (stat_boxplot).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 47 rows containing non-finite values (stat_boxplot).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 4 rows containing non-finite values (stat_boxplot).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 127 rows containing non-finite values (stat_boxplot).

From the standpoint of property and crop damage amounts, hurricanes stand out as high-severity events, despite relatively low frequency. Drought and flood, as might be expected, are both major contributors to crop losses, although floods are much more frequent than droughts in the available database. Thunderstorms, hail, floods, tornadoes, and winter storms are also major contributors to property damage, while also occurring at high frequency. It is interesting to note that property damages rank-order similarly to event frequency, again with the exception of hurricanes, the high-severity outlier.