The purpose of this analysis is to identify types of storm events that pose the greatest potential harm to population health, and which ones have the greatest economic consequences.
The event data has been captured by the National Oceanographic and Atmospheric Administration (NOAA) in a storm database. I modified the data to facilitate analysis, deriving an event year variable, converting damage estimates to an equivalent dollar-based metric, mapping the free-text event types to 22 categories, and dropping data from 1995 and prior years, which may be incomplete and hence not well suited for estimating future impacts.
I generated boxplots representing the annual distribution of event counts, fatalities, injuries, property damage estimates, and crop damage estimates, ordered consistently by descending median event frequency, and tagged with thresholds for each metric that enable easy visual identification of high-impact events.
The analysis showed that heat events, while relatively less frequent, are associated with high absolute fatality and injury rates, constituting a major risk to human health. Tornadoes, floods, and lightning strikes are common and associated with high fatality and/or injury rates.
The events with the greatest economic consequences are somewhat different. Hurricanes result in huge property and crop losses, despite relatively low incidence rates. Droughts and floods are somewhat more common, and have particularly negative effects on crops. Meanwhile, thunderstorms, hail storms, floods, tornadoes, and winter storms are relatively frequent AND cause heavy property damages.
A number of supplemental packages must be loaded to support the analysis (assumed already installed).
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(grid)
library(gridExtra)
library(lubridate)
library(scales)
library(stringr)
I first read the contents of the Storm Data dataset into a data frame called storm, downloading the dataset into the current working directory if not already present.
if (!file.exists("StormData.csv.bz2"))
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="StormData.csv.bz2")
storm <- read.csv("StormData.csv.bz2", stringsAsFactors=FALSE)
I combined the BGN_DATE / BGN_TIME and END_DATE / END_TIME fields into single date/time variables and reformatted them as POSIXct. The time component was ignored in cases with invalid entries.
storm$BGN_DT <- with(storm, substr(BGN_DATE, 1, nchar(BGN_DATE)-8))
storm$BGN_DT <- with(storm, paste(BGN_DT, " ", substr(BGN_TIME,1,2), ":", substr(BGN_TIME,3,4), sep=""))
storm$BGN_DT <- mdy_hm(storm$BGN_DT)
## Warning: 5 failed to parse.
# Revert to midnight if invalid time
storm$BGN_DT <- ifelse(is.na(storm$BGN_DT), mdy_hms(storm$BGN_DATE), storm$BGN_DT)
storm$END_DT <- with(storm, substr(END_DATE, 1, nchar(END_DATE)-8))
storm$END_DT <- with(storm, paste(END_DT, " ", substr(END_TIME,1,2), ":", substr(END_TIME,3,4), sep=""))
storm$END_DT <- mdy_hm(storm$END_DT)
## Warning: 243795 failed to parse.
# Revert to midnight if invalid time
storm$END_DT <- ifelse(is.na(storm$END_DT), mdy_hms(storm$END_DATE), storm$END_DT)
# Extract the year component of the BEGIN value for later use
storm$BGN_YR <- year(as.POSIXlt(storm$BGN_DT, origin=origin))
Property and crop damage estimates in some cases must be converted to the same scale using the magnitude indicator provided in the database (K/k = thousands, M/m = millions, B/b = billions). I created PROPDMG_AMT and CROPDMG_AMT variables to house damage estimates on an equivalent dollar basis. Modifiers other than K/k, M/m, and B/b were ignored (assumed to be data-entry errors).
storm$PROPDMG_AMT <- ifelse((storm$PROPDMGEXP %in% c("k","K")), storm$PROPDMG*1000,
ifelse((storm$PROPDMGEXP %in% c("m","M")), storm$PROPDMG*1000000,
ifelse((storm$PROPDMGEXP %in% c("b","B")), storm$PROPDMG*1000000000,
storm$PROPDMG
)
)
)
summary(storm$PROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 12.06 0.50 5000.00
summary(storm$PROPDMG_AMT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 4.736e+05 5.000e+02 1.150e+11
storm$CROPDMG_AMT <- ifelse((storm$CROPDMGEXP %in% c("k","K")), storm$CROPDMG*1000,
ifelse((storm$CROPDMGEXP %in% c("m","M")), storm$CROPDMG*1000000,
ifelse((storm$CROPDMGEXP %in% c("b","B")), storm$CROPDMG*1000000000,
storm$CROPDMG
)
)
)
summary(storm$CROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 1.527 0.000 990.000
summary(storm$CROPDMG_AMT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09
The EVTYPE field appears to be free-form text, in which the same event type can arise in a number of different forms.
I examined the contents to derive a handful of simple standardization rules that eliminate needless variation due to misspellings and intentional abbreviations.
storm$EVTYPE_MOD <- toupper(storm$EVTYPE)
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "TSTM", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "THUNDERSTORMS", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "THUNDERSTROM", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "THUNDESTORM", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "THUNDEERSTORM", "THUNDERSTORM")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "SNOW-SQUALLS", "SNOW SQUALLS")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "LAKE-EFFECT", "LAKE EFFECT")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "WINDS", "WIND")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "WND", "WIND")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "WAYTERSPOUT", "WATERSPOUT")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "WATER SPOUT", "WATERSPOUT")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "CURRENTS", "CURRENT")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "MICOBURST", "MICROBURST")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "LIGHTING", "LIGHTNING")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "LIGNTNING", "LIGHTNING")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "TORNDAO", "TORNADO")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "SML", "SMALL")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "FLD", "FLOOD")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "FLOOOD", "FLOOD")
storm$EVTYPE_MOD <- str_replace(storm$EVTYPE_MOD, "AVALANCE", "AVALANCHE")
#Examine common field values (100 or more instances)
data.frame(table(storm$EVTYPE_MOD)) %>% filter(Freq >= 100) %>% arrange(desc(Freq))
## Var1 Freq
## 1 THUNDERSTORM WIND 323383
## 2 HAIL 288661
## 3 TORNADO 60653
## 4 FLASH FLOOD 54277
## 5 FLOOD 25327
## 6 HIGH WIND 21747
## 7 LIGHTNING 15758
## 8 HEAVY SNOW 15708
## 9 MARINE THUNDERSTORM WIND 11987
## 10 HEAVY RAIN 11742
## 11 WINTER STORM 11433
## 12 WINTER WEATHER 7045
## 13 FUNNEL CLOUD 6844
## 14 WATERSPOUT 3798
## 15 STRONG WIND 3773
## 16 URBAN/SMALL STREAM FLOOD 3422
## 17 WILDFIRE 2761
## 18 BLIZZARD 2719
## 19 DROUGHT 2488
## 20 ICE STORM 2006
## 21 EXCESSIVE HEAT 1678
## 22 WILD/FOREST FIRE 1457
## 23 FROST/FREEZE 1343
## 24 DENSE FOG 1293
## 25 WINTER WEATHER/MIX 1104
## 26 THUNDERSTORM WIND/HAIL 1053
## 27 EXTREME COLD/WIND CHILL 1002
## 28 RIP CURRENT 774
## 29 HEAT 767
## 30 HIGH SURF 734
## 31 TROPICAL STORM 690
## 32 FLASH FLOODING 683
## 33 LAKE EFFECT SNOW 659
## 34 EXTREME COLD 657
## 35 COASTAL FLOOD 656
## 36 FLOOD/FLASH FLOOD 625
## 37 SNOW 617
## 38 LANDSLIDE 600
## 39 COLD/WIND CHILL 539
## 40 FOG 538
## 41 MARINE HAIL 442
## 42 DUST STORM 427
## 43 AVALANCHE 387
## 44 WIND 383
## 45 STORM SURGE 261
## 46 FREEZING RAIN 260
## 47 URBAN FLOOD 251
## 48 HEAVY SURF/HIGH SURF 228
## 49 EXTREME WINDCHILL 204
## 50 DRY MICROBURST 186
## 51 COASTAL FLOODING 183
## 52 LIGHT SNOW 176
## 53 ASTRONOMICAL LOW TIDE 174
## 54 HURRICANE 174
## 55 RIVER FLOOD 173
## 56 RECORD WARMTH 154
## 57 DUST DEVIL 149
## 58 STORM SURGE/TIDE 148
## 59 MARINE HIGH WIND 135
## 60 UNSEASONABLY WARM 126
## 61 FLOODING 120
## 62 ASTRONOMICAL HIGH TIDE 103
## 63 MODERATE SNOWFALL 101
I also mapped the modified event types to general categories to ease interpretation. In cases that match multiple conditions, the last transformation applied will stand.
storm$EVTYPE_GRP = "OTHER"
storm[grep("TEMPERATUR", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "TEMPERATURE"
storm[grep("AVALANCHE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "AVALANCHE"
storm[grep("WIND|TURBULENCE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "WIND"
storm[grep("RAIN|WET|SHOWER", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "RAIN"
storm[grep("HAIL", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "HAIL"
storm[grep("FIRE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "FIRE"
storm[grep("FOG", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "FOG"
storm[grep("HEAT|WARM|HOT|HIGH TEMP", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "HEAT"
storm[grep("COLD|COOL|LOW TEMP", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "COLD"
storm[grep("DROUGHT|DRY", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "DROUGHT"
storm[grep("FLOOD|HIGH WATER", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "FLOOD"
storm[grep("SMOKE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "SMOKE"
storm[grep("DUST", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "DUST"
storm[grep("VOLCAN", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "VOLCANIC ERUPTION"
storm[grep("SNOW|WINTER|BLIZZARD|ICE|FROST|SLEET|MIX|FREEZ|ICY", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "WINTER STORM"
storm[grep("LIGHTNING", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "LIGHTNING"
storm[grep("SLIDE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "LANDSLIDE"
storm[grep("TORNADO|FUNNEL|WATERSPOUT|MICROBURST|ROTATING|GUSTNADO", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "TORNADO"
storm[grep("THUNDERSTORM", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "THUNDERSTORM"
storm[grep("HURRICANE|TYPHOON|TROPICAL", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "HURRICANE"
storm[grep("CURRENT|SURF|TIDE|SURGE|TSUNAMI|HIGH SEAS|SWELLS|WAVE", storm$EVTYPE_MOD),]$EVTYPE_GRP <- "WAVES"
The storm dataset was now in a form well suited for the objectives of the analysis.
The NOAA database documentation indicates that data collection has been sparse in the past. I first examined the data for clues as to when relatively complete reporting may have begun. As a first step, I summarized the event stream by year, examining event frequency, the number of reporting Weather Field Offices (WFOs), and the number of distinct state office (STATEOFFIC) values, then graphed the tallies by year.
storm_rptg <- storm %>%
group_by(BGN_YR) %>%
summarize(N=n(),
N_WFO=length(unique(WFO)),
N_SO=length(unique(STATEOFFIC))
) %>%
arrange(BGN_YR)
tbl_df(storm_rptg)
## Source: local data frame [62 x 4]
##
## BGN_YR N N_WFO N_SO
## (dbl) (int) (int) (int)
## 1 1950 223 1 1
## 2 1951 269 1 1
## 3 1952 272 1 1
## 4 1953 492 1 1
## 5 1954 609 1 1
## 6 1955 1413 1 1
## 7 1956 1703 1 1
## 8 1957 2184 1 1
## 9 1958 2213 1 1
## 10 1959 1813 1 1
## .. ... ... ... ...
p1 <- ggplot(data=storm_rptg, aes(x=BGN_YR, y=N))+
geom_bar(stat="identity", fill="slategrey")+
labs(x="", y="Nbr of Events")+
geom_vline(xintercept=1995.5,color="darkorange", size=1.5)+
annotate("text", label="1996", x=1996, y=60000, color="darkorange", hjust=0)
p2 <- ggplot(data=storm_rptg, aes(x=BGN_YR, y=N_WFO))+
geom_bar(stat="identity", fill="slategrey")+
labs(x="",y="Nbr of WFOs Reporting")+
geom_vline(xintercept=1995.5,color="darkorange", size=1.5)
p3 <- ggplot(data=storm_rptg, aes(x=BGN_YR, y=N_SO))+
geom_bar(stat="identity", fill="slategrey")+
labs(x="Year (BGN_DATE)",y="Nbr of State Offices")+
geom_vline(xintercept=1995.5,color="darkorange", size=1.5)
grid.arrange(p1, p2, p3, nrow=3)
The number of events captured in the database increases steadily starting in 1994, whereas the WFO and State Office fields become consistently populated as of 1996. For the current purpose of assessing drivers of injury and loss, I decided to discard data collected prior to 1996.
I next aggregated the raw data structure by event type grouping and year (1996 forward only), creating a new data frame called storm_grpyr. This data structure captures the total number of events recorded, as well as annual fatalities, injuries, property damage amounts (in $millions), and crop damage amounts (also in $millions).
In the process, I also established certain thresholds for each metric, easing the identification of high-frequency/magnitude events.
storm_grpyr <- storm %>%
filter(BGN_YR >= 1996) %>%
group_by(EVTYPE_GRP, BGN_YR) %>%
summarize(
N=n(),
TF=sum(FATALITIES),
TI=sum(INJURIES),
TP=sum(PROPDMG_AMT/1000000),
TC=sum(CROPDMG_AMT/1000000)
) %>%
arrange(desc(N))
# Calculate summary statistics for key metrics
medn <- storm_grpyr %>%
group_by(EVTYPE_GRP) %>%
summarize(
N_med=median(N),
N_mean=mean(N),
TF_med=median(TF),
TI_med=median(TI),
TP_med=median(TP),
TC_med=median(TC)
)
# Create factors representing "of-interest" thresholds per metric
medn$N_high <- as.factor(medn$N_med > 2500)
medn$TF_high <- as.factor(medn$TF_med > 25)
medn$TI_high <- as.factor(medn$TI_med > 250)
medn$TP_high <- as.factor(medn$TP_med > 250)
medn$TC_high <- as.factor(medn$TC_med > 100)
# Merge summary statistics back to main table
storm_grpyr <- left_join(storm_grpyr, medn, by="EVTYPE_GRP")
storm_grpyr$EVTYPE_GRP <- factor(storm_grpyr$EVTYPE_GRP, levels=storm_grpyr[order(storm_grpyr$N_med),]$EVTYPE_GRP, ordered=TRUE)
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
# Examine the result
tbl_df(storm_grpyr)
## Source: local data frame [324 x 18]
##
## EVTYPE_GRP BGN_YR N TF TI TP TC N_med N_mean TF_med
## (fctr) (dbl) (int) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
## 1 AVALANCHE 2010 50 22 20 0.2018 0 24.5 23.625 12.5
## 2 AVALANCHE 2008 39 25 14 1.1260 0 24.5 23.625 12.5
## 3 AVALANCHE 2006 36 13 6 0.0000 0 24.5 23.625 12.5
## 4 AVALANCHE 2005 35 11 16 0.2000 0 24.5 23.625 12.5
## 5 AVALANCHE 2000 28 16 18 0.7550 0 24.5 23.625 12.5
## 6 AVALANCHE 2007 27 18 12 0.0020 0 24.5 23.625 12.5
## 7 AVALANCHE 2001 25 18 5 0.0440 0 24.5 23.625 12.5
## 8 AVALANCHE 2002 25 18 11 0.0315 0 24.5 23.625 12.5
## 9 AVALANCHE 1999 24 26 10 0.0650 0 24.5 23.625 12.5
## 10 AVALANCHE 2011 21 9 8 0.0550 0 24.5 23.625 12.5
## .. ... ... ... ... ... ... ... ... ... ...
## Variables not shown: TI_med (dbl), TP_med (dbl), TC_med (dbl), N_high
## (fctr), TF_high (fctr), TI_high (fctr), TP_high (fctr), TC_high (fctr)
The analysis is driven by a series of boxplots showing event and key metric distributions over the 1996-2011 period represented in the database.
First, I examined overall event frequencies and impacts on population health, as expressed by the numbers of fatalities and injuries associated with each event type. Note that the event types are presented in descending order of the median number of annual events (the associated factor conversion generated several warnings). I also used log10-scale conversions to ease visual interpretation.
p1 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=N, fill=N_high))+
geom_boxplot()+
coord_flip()+
labs(title="Annual Events", x="", y="")+
theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
scale_fill_manual(values=c("slategrey","darkorange"))+
geom_hline(yintercept=2500, color="darkorange", size=1)
p2 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=N, fill=N_high))+
geom_boxplot()+
coord_flip()+
labs(title="Annual Events (log10)", x="", y="")+
scale_y_log10()+
theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
scale_fill_manual(values=c("slategrey","darkorange"))+
geom_hline(yintercept=2500, color="darkorange", size=1)
p3 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TF, fill=TF_high))+
geom_boxplot()+
coord_flip()+
labs(title="Annual Fatalities (log10)", x="", y="")+
scale_y_log10()+
theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
scale_fill_manual(values=c("slategrey","darkorange"))+
geom_hline(yintercept=25, color="darkorange", size=1)
p4 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TI, fill=TI_high))+
geom_boxplot()+
coord_flip()+
labs(title="Annual Injuries (log10)", x="", y="")+
scale_y_log10()+
theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
scale_fill_manual(values=c("slategrey","darkorange"))+
geom_hline(yintercept=250, color="darkorange", size=1)
grid.arrange(p1, p2, p3, p4, nrow=2, ncol=2)
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 86 rows containing non-finite values (stat_boxplot).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 69 rows containing non-finite values (stat_boxplot).
Thunderstorms, hail storms, and floods constitute the most frequent event types, with low and inconsistent incident rates for most other categories. Events generating relatively high fatality rates (medians exceeding the threshold rate of 25 per year) included floods, tornadoes, wind, lightning, waves, and heat. Heat events displayed a wide range of fatality rates over the period with the highest median of any category. Events generating relatively high injury rates (medians exceeding the threshold rate of 250 per year) included thunderstorms, tornadoes, lightning, and heat.
In summary, heat events have dramatic impacts on population health, despite being relatively uncommon. Tornadoes, floods, and lightning strikes are relatively common and also associated with high fatality and/or injury rates.
Next, I generated similar boxplots representing overall property and crop damage estimates, again showing both basic and log10-scale views.
p1 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TP, fill=TP_high))+
geom_boxplot()+
coord_flip()+
labs(title="Property Damage (MM)", x="", y="")+
theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
scale_fill_manual(values=c("slategrey","darkorange"))+
scale_y_continuous(labels=dollar, limits=c(0,1000))+
geom_hline(yintercept=250, color="darkorange", size=1)
p2 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TP, fill=TP_high))+
geom_boxplot()+
coord_flip()+
labs(title="Property Damage (MM, log10)", x="", y="")+
scale_y_log10(labels=dollar)+
theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
scale_fill_manual(values=c("slategrey","darkorange"))+
geom_hline(yintercept=250, color="darkorange", size=1)
p3 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TC, fill=TC_high))+
geom_boxplot()+
coord_flip()+
labs(title="Crop Damage (MM)", x="", y="")+
theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
scale_fill_manual(values=c("slategrey","darkorange"))+
scale_y_continuous(labels=dollar, limits=c(0,2000))+
geom_hline(yintercept=100, color="darkorange", size=1)
p4 <- ggplot(data=storm_grpyr, aes(x=EVTYPE_GRP, y=TC, fill=TC_high))+
geom_boxplot()+
coord_flip()+
labs(title="Crop Damage (MM, log10)", x="", y="")+
scale_y_log10(labels=dollar)+
theme(axis.text.y=element_text(size=7), legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"))+
scale_fill_manual(values=c("slategrey","darkorange"))+
geom_hline(yintercept=100, color="darkorange", size=1)
grid.arrange(p1, p2, p3, p4, nrow=2, ncol=2)
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 45 rows containing non-finite values (stat_boxplot).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 47 rows containing non-finite values (stat_boxplot).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 4 rows containing non-finite values (stat_boxplot).
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 127 rows containing non-finite values (stat_boxplot).
From the standpoint of property and crop damage amounts, hurricanes stand out as high-severity events, despite relatively low frequency. Drought and flood, as might be expected, are both major contributors to crop losses, although floods are much more frequent than droughts in the available database. Thunderstorms, hail, floods, tornadoes, and winter storms are also major contributors to property damage, while also occurring at high frequency. It is interesting to note that property damages rank-order similarly to event frequency, again with the exception of hurricanes, the high-severity outlier.