Both public and economic health can be impacted by weather events. The U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database tracks the characteristics of major weather events in the United States of America and, amongst other information, it provides information regarding fatalities, injuries, crop and property damages.
This project aims to investigate these data to identify which weather events negatively impact public health and which result in economic damages. The analysis showed that between the years 1950 and 2011 tornados were the leading cause of both fatalities and injuries, whereas floods and drought were the leading causes of property and crop damages, respectively.
The following sections will describe how the data was loaded, transformed, and analysed.
For the analyses in this assignment, the following packages were installed with install.packages() and loaded in RStudio with library(): "dplyr" (package version 0.8.3), "ggplot2" (package version 3.3.3), and "scales" (package version 1.1.1).
# loading packages
library(dplyr)
library(ggplot2)
library(scales)
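The install step mentioned above is not shown in the code. A minimal sketch of a check for missing packages follows, assuming the three package names from the text; `missing_packages()` is a hypothetical helper and not part of the original analysis. It only reports what is missing and leaves the actual installation commented out.

```r
# Packages named in the text above
required <- c("dplyr", "ggplot2", "scales")

# Hypothetical helper: which of `pkgs` are not yet installed?
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

Keeping the check separate from the install call avoids reinstalling packages every time the report is knitted.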
The raw data was downloaded into the working directory on the 1st of April 2021 from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and loaded as "Storm" in RStudio using the command read.csv(). The "Storm" dataframe contains 902297 observations and 37 variables.
# downloading file into working directory
if (!file.exists("StormData.csv.bz2")) {
  fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileurl, destfile = "StormData.csv.bz2", method = "curl")
}
# reading in the data (read.csv() reads the bzip2-compressed file directly)
Storm <- read.csv("StormData.csv.bz2")
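The archive is never unpacked by hand because read.csv() can read a bzip2-compressed file directly. A small sketch of that behaviour on a temporary toy file follows; the two-row dataframe is illustrative and has nothing to do with the NOAA data.

```r
# Write a tiny CSV through a bzip2 connection (toy data, not the NOAA file)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() detects the compression and reads the file directly
toy <- read.csv(tmp)
dim(toy)   # number of rows and columns, the same check used to size "Storm"
unlink(tmp)
```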
The "Storm" dataframe was initially explored using the commands colnames() and str() to get an idea of the variables present.
# exploring the dataset
colnames(Storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
str(Storm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
For this analysis, only a subset of the data is relevant. The following section describes how the data was preprocessed to arrive at new, tidy dataframes that are ready for analysis.
The analysis looks at the number of fatalities and injuries, as well as the total crop and property damages caused by the various weather events. Although the database starts in 1950, the early years contain only a few entries. For this analysis, all data collected between 1950 and 2011 is considered. The following list summarises the variables of interest for this analysis:
* "EVTYPE": the weather event (for example, "FLOOD", "WIND", "SNOW", etc.)
* "PROPDMG": the approximate property damage in USD, before scaling by "PROPDMGEXP"
* "PROPDMGEXP": the exponent to the corresponding value in the column "PROPDMG"
* "CROPDMG": the approximate crop damage in USD, before scaling by "CROPDMGEXP"
* "CROPDMGEXP": the exponent to the corresponding value in the column "CROPDMG"
* "FATALITIES": the estimated number of fatalities
* "INJURIES": the estimated number of people injured
First, a copy of the original "Storm" dataframe called "data1" was created to facilitate the data processing stages. Using the select() command, the columns "EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "FATALITIES", and "INJURIES" were kept, and the filter() command then retained only the rows in which at least one of the four variables "PROPDMG", "CROPDMG", "FATALITIES", or "INJURIES" had a value greater than 0. This reduced the original dataframe to 254633 observations and 7 variables.
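The row filter described above can be illustrated on a toy dataframe. The sketch below uses base R instead of the dplyr commands of the analysis, and all values are made up for illustration.

```r
# Toy data with the same column names as the subset described above
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "FOG"),
  PROPDMG    = c(25, 0, 0, 0),
  CROPDMG    = c(0, 0, 5, 0),
  FATALITIES = c(0, 0, 0, 0),
  INJURIES   = c(15, 0, 0, 0)
)

impact_cols <- c("PROPDMG", "CROPDMG", "FATALITIES", "INJURIES")

# Keep a row if at least one impact column is greater than 0
keep <- rowSums(toy[impact_cols] > 0) > 0
toy_kept <- toy[keep, ]
toy_kept$EVTYPE   # the HAIL and FOG rows are dropped
```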
Secondly, the categories in the column "EVTYPE" needed to be sorted. According to NOAA, the data can be grouped into 48 main event types. However, upon looking at the raw data using the command unique(data1$EVTYPE), 488 entries are returned, indicating a lot of redundancy. Converting all of the terms to capital letters using the command toupper() reduced this number to 447 event types. To avoid inspecting and sorting each term individually, a new column called "EVENT_TYPES" was created and filled with the term "(OTHER EVENT TYPES)", and the command grep() was then used to assign each "EVTYPE" to a group according to key words. Because later assignments overwrite earlier ones, the order of the grep() calls decides the group of any term that matches several key words. This technique is by no means perfect, but it was an efficient way to reduce the number of event types to 38 (including a mixed category named "(OTHER EVENT TYPES)", containing 158 rows of data).
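The key-word grouping can be sketched in a table-driven form. This is a variant of the repeated grep() assignments used in the analysis, not the original code: the `rules` vector and the toy labels are assumptions chosen to show why the order of the rules matters.

```r
# Toy event labels (illustrative, already upper case)
evtype <- c("FLASH FLOOD", "COASTAL FLOODING", "RIVER FLOOD",
            "FLASH FLOODING/FLOOD", "VOLCANIC ASH")

# Ordered rules: key word -> group label; later rules overwrite earlier ones
rules <- c("FLOOD" = "FLOOD",
           "FLASH" = "FLASH FLOOD",
           "COAST" = "COASTAL FLOOD")

group <- rep("(OTHER EVENT TYPES)", length(evtype))
for (pattern in names(rules)) {
  group[grepl(pattern, evtype, fixed = TRUE)] <- rules[[pattern]]
}
data.frame(evtype, group)
```

If the "FLOOD" rule came last, every flash flood and coastal flood would be relabelled as "FLOOD", which is why the general key word is applied first.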
Thirdly, the damage in USD to property and crops is split across two columns: one with a number, and one with an exponent code that the number must be multiplied by to obtain the complete value of economic damage. The codes "K", "M", and "B" indicate that the value in the "PROPDMG" or "CROPDMG" column needs to be multiplied by 1000, 1000000, and 1000000000, respectively. To this end, the grep() command was used to replace these letters with the corresponding numbers in the "PROPDMGEXP" and "CROPDMGEXP" columns. These columns were then converted to class "numeric", and "PROPDMG" was multiplied by "PROPDMGEXP" and "CROPDMG" by "CROPDMGEXP" to form the two new columns "PROPDMG_TOTAL" and "CROPDMG_TOTAL", respectively.
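The scaling step can be sketched with a named lookup vector. This is an alternative to the grep() replacement used in the analysis, shown on made-up values; `multiplier` is a hypothetical name. One difference is worth noting: here every code other than "K", "M", or "B" becomes NA, whereas as.numeric() in the grep()-based approach would keep a purely numeric code as a number.

```r
# Toy damage values and exponent codes (illustrative)
dmg     <- c(25, 2.5, 1.2, 3)
dmg_exp <- c("K", "M", "B", "")

# Lookup table for the three codes described above
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Codes without an entry (here the blank one) come back as NA
factor_num <- unname(multiplier[dmg_exp])
total <- dmg * factor_num
total[is.na(total)] <- 0   # missing totals are set to 0, as in the analysis
total
```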
Finally, the columns relevant for the analysis were selected from "data1" into the dataframe "data2": "EVENT_TYPES", "PROPDMG_TOTAL", "CROPDMG_TOTAL", "FATALITIES", and "INJURIES". "data2" was converted to a plain dataframe with as.data.frame() and all "NA"s were replaced with "0".
# creating a copy of the original dataframe for processing
data1 <- Storm
# selecting the relevant columns
data1 <- data1 %>%
select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES, INJURIES) %>%
filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
# changing all terms to uppercase
data1$EVTYPE <- toupper(data1$EVTYPE)
# create a new variable "EVENT_TYPES" that groups the values of EVTYPE according to key words
# (later assignments overwrite earlier ones, so the order of the lines matters)
data1$EVENT_TYPES <- "(OTHER EVENT TYPES)"
data1$EVENT_TYPES[grep("FLOOD", data1$EVTYPE)] <- "FLOOD"
data1$EVENT_TYPES[grep("FLASH", data1$EVTYPE)] <- "FLASH FLOOD"
data1$EVENT_TYPES[grep("COAST", data1$EVTYPE)] <- "COASTAL FLOOD"
data1$EVENT_TYPES[grep("DUST", data1$EVTYPE)] <- "DUST STORM"
data1$EVENT_TYPES[grep("DEVIL", data1$EVTYPE)] <- "DUST DEVIL"
data1$EVENT_TYPES[grep("FROST", data1$EVTYPE)] <- "FROST/FREEZE"
data1$EVENT_TYPES[grep("FREEZE", data1$EVTYPE)] <- "FROST/FREEZE"
data1$EVENT_TYPES[grep("AVALAN", data1$EVTYPE)] <- "AVALANCHE"
data1$EVENT_TYPES[grep("BLIZZARD", data1$EVTYPE)] <- "BLIZZARD"
data1$EVENT_TYPES[grep("WIND", data1$EVTYPE)] <- "WIND"
data1$EVENT_TYPES[grep("DEBRIS", data1$EVTYPE)] <- "DEBRIS FLOW"
data1$EVENT_TYPES[grep("FOG", data1$EVTYPE)] <- "FOG"
data1$EVENT_TYPES[grep("SMOKE", data1$EVTYPE)] <- "SMOKE"
data1$EVENT_TYPES[grep("DROUGHT", data1$EVTYPE)] <- "DROUGHT"
data1$EVENT_TYPES[grep("WIND", data1$EVTYPE)] <- "WIND"
data1$EVENT_TYPES[grep("HEAT", data1$EVTYPE)] <- "HEAT"
data1$EVENT_TYPES[grep("CHILL", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("EXTREME COLD", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("HYPOTHERMIA", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("EXPOSURE", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("RAIN", data1$EVTYPE)] <- "RAIN"
data1$EVENT_TYPES[grep("SNOW", data1$EVTYPE)] <- "SNOW"
data1$EVENT_TYPES[grep("FUNNEL", data1$EVTYPE)] <- "FUNNELCLOUD"
data1$EVENT_TYPES[grep("HAIL", data1$EVTYPE)] <- "HAIL"
data1$EVENT_TYPES[grep("HURRICANE", data1$EVTYPE)] <- "HURRICANE/TYPHOON"
data1$EVENT_TYPES[grep("TYPHOON", data1$EVTYPE)] <- "HURRICANE/TYPHOON"
data1$EVENT_TYPES[grep("LIGHTNING", data1$EVTYPE)] <- "LIGHTNING"
data1$EVENT_TYPES[grep("RIP CURRENT", data1$EVTYPE)] <- "RIP CURRENT"
data1$EVENT_TYPES[grep("TORNADO", data1$EVTYPE)] <- "TORNADO"
data1$EVENT_TYPES[grep("GLAZE", data1$EVTYPE)] <- "GLAZE"
data1$EVENT_TYPES[grep("FIRE", data1$EVTYPE)] <- "FIRE"
data1$EVENT_TYPES[grep("TSUNAMI", data1$EVTYPE)] <- "TSUNAMI"
data1$EVENT_TYPES[grep("LANDSLIDE", data1$EVTYPE)] <- "LANDSLIDE"
data1$EVENT_TYPES[grep("SURF", data1$EVTYPE)] <- "SURF"
data1$EVENT_TYPES[grep("MIXED PRECIP", data1$EVTYPE)] <- "RAIN"
data1$EVENT_TYPES[grep("BLACK ICE", data1$EVTYPE)] <- "ICE"
data1$EVENT_TYPES[grep("ICY ROAD", data1$EVTYPE)] <- "ICE"
data1$EVENT_TYPES[grep("ASTRONOMICAL", data1$EVTYPE)] <- "ASTRONOMICAL TIDE"
data1$EVENT_TYPES[grep("COLD", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("ICE STORM", data1$EVTYPE)] <- "ICE STORM"
data1$EVENT_TYPES[grep("SURGE", data1$EVTYPE)] <- "SURGE/TIDE"
data1$EVENT_TYPES[grep("TROPICAL", data1$EVTYPE)] <- "TROPICAL STORM"
data1$EVENT_TYPES[grep("HIGH SEA", data1$EVTYPE)] <- "SURF"
data1$EVENT_TYPES[grep("URBAN/SML STREAM FLD", data1$EVTYPE)] <- "URBAN/SML STREAM FLD"
data1$EVENT_TYPES[grep("WINTER STORM", data1$EVTYPE)] <- "W STORM"
data1$EVENT_TYPES[grep("WINTER", data1$EVTYPE)] <- "WINTER WEATHER"
data1$EVENT_TYPES[grep("WINTRY", data1$EVTYPE)] <- "WINTER WEATHER"
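# note: "W STORM" is matched against EVTYPE and appears to match no rows there, so winter storms
# remain in "WINTER WEATHER" (no "WINTER STORM" group shows up in the table below)
data1$EVENT_TYPES[grep("W STORM", data1$EVTYPE)] <- "WINTER STORM"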
data1$EVENT_TYPES[grep("WATERSPOUT", data1$EVTYPE)] <- "WATERSPOUT"
data1$EVENT_TYPES[grep("SEICHE", data1$EVTYPE)] <- "SEICHE"
data1$EVENT_TYPES[grep("MICROBURST", data1$EVTYPE)] <- "THUNDERSTORM"
data1$EVENT_TYPES[grep("THUNDERSTORM", data1$EVTYPE)] <- "THUNDERSTORM"
# checking the number of Event Types
sort(table(data1$EVENT_TYPES), decreasing = TRUE)
##
## WIND THUNDERSTORM TORNADO
## 73271 56045 39960
## HAIL FLASH FLOOD LIGHTNING
## 26607 21597 13300
## FLOOD WINTER WEATHER SNOW
## 10622 2064 1893
## FIRE RAIN HEAT
## 1259 1248 980
## ICE STORM URBAN/SML STREAM FLD RIP CURRENT
## 714 702 641
## EXTREME COLD/WIND CHILL TROPICAL STORM AVALANCHE
## 497 456 269
## DROUGHT BLIZZARD COASTAL FLOOD
## 266 255 239
## SURF HURRICANE/TYPHOON SURGE/TIDE
## 235 233 225
## LANDSLIDE FOG (OTHER EVENT TYPES)
## 198 188 158
## FROST/FREEZE DUST STORM DUST DEVIL
## 155 104 95
## WATERSPOUT GLAZE ICE
## 64 23 23
## TSUNAMI FUNNELCLOUD ASTRONOMICAL TIDE
## 14 13 10
## SEICHE SMOKE
## 9 1
# PROPDMG and CROPDMG values needed to be multiplied by their corresponding exponent columns
# K = 1000, M = 1000000, and B = 1000000000
data1$PROPDMGEXP[grep("K", data1$PROPDMGEXP)] <- "1000"
data1$PROPDMGEXP[grep("M", data1$PROPDMGEXP)] <- "1000000"
data1$PROPDMGEXP[grep("B", data1$PROPDMGEXP)] <- "1000000000"
data1$CROPDMGEXP[grep("K", data1$CROPDMGEXP)] <- "1000"
data1$CROPDMGEXP[grep("M", data1$CROPDMGEXP)] <- "1000000"
data1$CROPDMGEXP[grep("B", data1$CROPDMGEXP)] <- "1000000000"
data1$PROPDMGEXP <- as.numeric(data1$PROPDMGEXP)
data1$CROPDMGEXP <- as.numeric(data1$CROPDMGEXP)
# creating a new variable "PROPDMG_TOTAL" and "CROPDMG_TOTAL"
data1$PROPDMG_TOTAL <- data1$PROPDMG * data1$PROPDMGEXP
data1$CROPDMG_TOTAL <- data1$CROPDMG * data1$CROPDMGEXP
# reorganising the columns and keeping only those that are needed
data2 <- data1 %>%
select(EVENT_TYPES, PROPDMG_TOTAL, CROPDMG_TOTAL, FATALITIES, INJURIES)
# replacing all NA values with 0
data2 <- as.data.frame(data2)
data2[is.na(data2)] <- 0
The dataframe "data2" contains all of the information needed to answer the questions of interest, but the data is not yet in tidy form. The following section describes how "data2" is further transformed to yield two tidy dataframes that are then used directly in the "Results" section of this analysis.
For the first tidy dataframe, regarding the health impact (i.e. looking at "EVENT_TYPES", "FATALITIES" and "INJURIES"), "data2" was subset to contain only the rows for which the recorded value of "FATALITIES" or "INJURIES" was greater than 0. These values were then summed by "EVENT_TYPES" to generate "data3", which contains 35 observations and 3 variables ("EVENT_TYPES", "FATALITIES", and "INJURIES"). "data3" was then split into two dataframes, "data3_FAT" and "data3_INJ", containing the "EVENT_TYPES" column together with the "FATALITIES" or "INJURIES" column, respectively. Both dataframes initially contain 35 observations and 2 variables. The "FATALITIES" and "INJURIES" columns were renamed to "COUNT", and a new column "HARM" containing the term "FATALITIES" or "INJURY", respectively, was added to each dataframe. This allowed "data3_FAT" and "data3_INJ" to be recombined using the rbind() command into the dataframe "data3_tidy", sorted by descending "COUNT". This dataframe contains 70 observations of 3 variables ("EVENT_TYPES", "COUNT" (i.e. the number of fatalities or injuries recorded), and "HARM" (i.e. either "FATALITIES" or "INJURY")).
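The sum-then-reshape step can be traced on a toy example. The sketch below uses base R (aggregate() and rbind()) rather than the dplyr pipeline of the analysis, and the numbers are illustrative rather than taken from the database.

```r
# Toy per-event records (illustrative numbers)
toy <- data.frame(
  EVENT_TYPES = c("TORNADO", "TORNADO", "HEAT", "HAIL"),
  FATALITIES  = c(1, 2, 4, 0),
  INJURIES    = c(10, 5, 0, 0)
)

# Keep rows with any harm, then sum per event type
harm <- toy[toy$FATALITIES > 0 | toy$INJURIES > 0, ]
sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVENT_TYPES, data = harm, FUN = sum)

# Wide -> long: one row per event type and harm category
long <- rbind(
  data.frame(EVENT_TYPES = sums$EVENT_TYPES, COUNT = sums$FATALITIES, HARM = "FATALITIES"),
  data.frame(EVENT_TYPES = sums$EVENT_TYPES, COUNT = sums$INJURIES,   HARM = "INJURY")
)
long <- long[order(-long$COUNT), ]
nrow(long) == 2 * nrow(sums)   # the long table has twice as many rows as the summed one
```

The doubling of the row count is the same effect that turns the 35 summed rows into 70 tidy rows in the analysis.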
For the second tidy dataframe, regarding the economic impact (i.e. looking at "EVENT_TYPES", "PROPDMG_TOTAL" and "CROPDMG_TOTAL"), "data2" was subset to contain only the rows for which the recorded value of "CROPDMG_TOTAL" or "PROPDMG_TOTAL" was greater than 0. These values were then summed by "EVENT_TYPES" to generate "data4", which contains 38 observations and 3 variables ("EVENT_TYPES", "CROP" (the summed "CROPDMG_TOTAL"), and "PROP" (the summed "PROPDMG_TOTAL")). "data4" was then split into two dataframes, "data4_CROP" and "data4_PROP", containing the "EVENT_TYPES" column together with the "CROP" or "PROP" column, respectively. Both dataframes initially contain 38 observations and 2 variables. The "CROP" and "PROP" columns were renamed to "COUNT", and a new column "DAMAGE" containing the term "CROP DAMAGE" or "PROPERTY DAMAGE", respectively, was added to each dataframe. This allowed "data4_CROP" and "data4_PROP" to be recombined using the rbind() command into the dataframe "data4_tidy", sorted by descending "COUNT". This dataframe contains 76 observations of 3 variables ("EVENT_TYPES", "COUNT" (i.e. the amount of crop or property damage in USD), and "DAMAGE" (i.e. either "CROP DAMAGE" or "PROPERTY DAMAGE")).
# generating a tidy dataframe to investigate the health impact ("FATALITIES" and "INJURIES")
data3 <- data2 %>%
filter(FATALITIES > 0 | INJURIES > 0) %>%
group_by(EVENT_TYPES) %>%
summarise(FATALITIES = sum(FATALITIES),
INJURIES = sum(INJURIES))
# making the data tidy
data3_FAT <- data3 %>%
select(EVENT_TYPES, FATALITIES) %>%
rename(COUNT = FATALITIES)
data3_FAT$HARM <- "FATALITIES"
data3_INJ <- data3 %>%
select(EVENT_TYPES, INJURIES) %>%
rename(COUNT = INJURIES)
data3_INJ$HARM <- "INJURY"
data3_tidy <- rbind(data3_FAT, data3_INJ) %>%
arrange(-COUNT)
# generating a tidy dataframe to investigate the economic impact ("CROPDMG_TOTAL" and "PROPDMG_TOTAL")
data4 <- data2 %>%
filter(CROPDMG_TOTAL > 0 | PROPDMG_TOTAL > 0) %>%
group_by(EVENT_TYPES) %>%
summarise(CROP = sum(CROPDMG_TOTAL),
PROP = sum(PROPDMG_TOTAL))
# making the data tidy
data4_CROP <- data4 %>%
select(EVENT_TYPES, CROP) %>%
rename(COUNT = CROP)
data4_CROP$DAMAGE <- "CROP DAMAGE"
data4_PROP <- data4 %>%
select(EVENT_TYPES, PROP) %>%
rename(COUNT = PROP)
data4_PROP$DAMAGE <- "PROPERTY DAMAGE"
data4_tidy <- rbind(data4_CROP, data4_PROP) %>%
arrange(-COUNT)
The ten largest counts of injuries or fatalities were inspected using the head() command. The data for all categories was then plotted using ggplot(). The results show that the most harmful event type is the tornado, which resulted in 91364 injuries and 5658 fatalities between the years 1950 and 2011.
# Table of the ten largest injury and/or fatality counts by event type
head(data3_tidy, 10)
## # A tibble: 10 x 3
## EVENT_TYPES COUNT HARM
## <chr> <dbl> <chr>
## 1 TORNADO 91364 INJURY
## 2 HEAT 9224 INJURY
## 3 WIND 8861 INJURY
## 4 FLOOD 6795 INJURY
## 5 TORNADO 5658 FATALITIES
## 6 LIGHTNING 5231 INJURY
## 7 HEAT 3138 FATALITIES
## 8 THUNDERSTORM 2507 INJURY
## 9 ICE STORM 1992 INJURY
## 10 WINTER WEATHER 1968 INJURY
# Plotting the data
plot1 <- ggplot(data3_tidy, aes(x = reorder(EVENT_TYPES, -COUNT), y = COUNT)) +
geom_col(aes(fill = HARM)) +
labs(x = "Weather Event", y = "Sum Total") +
ggtitle("Total Number of Fatalities and Injuries Summarised by Weather Event \n (1950-2011)",
subtitle = "Figure 1: Tornados are the leading cause of injuries and fatalities") +
theme(plot.title = element_text(hjust = 0.5),
panel.background = element_rect(fill = "white"),
panel.border = element_rect(colour = "black", fill=NA, size=0.5),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25),
legend.position =c(0.88,0.7),
legend.title = element_blank(),
legend.text = element_text(size = 8)) +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))
print(plot1)
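The top-10 table above ranks injuries and fatalities together, so injury counts take most of the rows. A small sketch of ranking each category separately follows; `top_n_by_harm()` is a hypothetical helper, and the input contains only five rows copied from the table above, not the full "data3_tidy" dataframe.

```r
# Rows copied from the top-10 table above (a partial extract, not the full data)
tidy <- data.frame(
  EVENT_TYPES = c("TORNADO", "HEAT", "WIND", "TORNADO", "HEAT"),
  COUNT       = c(91364, 9224, 8861, 5658, 3138),
  HARM        = c("INJURY", "INJURY", "INJURY", "FATALITIES", "FATALITIES")
)

# Hypothetical helper: the n largest rows within each HARM category
top_n_by_harm <- function(df, n) {
  parts <- split(df, df$HARM)
  lapply(parts, function(p) head(p[order(-p$COUNT), ], n))
}

top2 <- top_n_by_harm(tidy, 2)
top2$FATALITIES
top2$INJURY
```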
The ten largest totals of crop or property damage were inspected using the head() command. The data for all categories was then plotted using ggplot(). The results show that, between the years 1950 and 2011, the event type causing the most property damage was flooding (150,167,161,680 USD), whereas the event type causing the most crop damage was drought (13,972,566,000 USD).
# Table of the ten largest crop and/or property damage totals by event type
head(data4_tidy, 10)
## # A tibble: 10 x 3
## EVENT_TYPES COUNT DAMAGE
## <chr> <dbl> <chr>
## 1 FLOOD 150167161680. PROPERTY DAMAGE
## 2 HURRICANE/TYPHOON 85336410010 PROPERTY DAMAGE
## 3 TORNADO 58530432191 PROPERTY DAMAGE
## 4 SURGE/TIDE 47965224000 PROPERTY DAMAGE
## 5 FLASH FLOOD 16907365196. PROPERTY DAMAGE
## 6 HAIL 16013999370 PROPERTY DAMAGE
## 7 DROUGHT 13972566000 CROP DAMAGE
## 8 FLOOD 10734652950 CROP DAMAGE
## 9 WIND 10580518930 PROPERTY DAMAGE
## 10 FIRE 8501628500 PROPERTY DAMAGE
# Plotting the data
plot2 <- ggplot(data4_tidy, aes(x = reorder(EVENT_TYPES, -COUNT), y = COUNT)) +
geom_col(aes(fill = DAMAGE)) +
labs(x = "Weather Event", y = "Total Damages in USD") +
ggtitle("Crop and Property Damages (USD) Summarised by Weather Event \n (1950-2011)",
subtitle = "Figure 2: Flood and drought are the leading causes of property and crop damage, respectively") +
theme(plot.title = element_text(hjust = 0.5),
panel.background = element_rect(fill = "white"),
panel.border = element_rect(colour = "black", fill=NA, size=0.5),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25),
legend.position =c(0.85,0.75),
legend.title = element_blank(),
legend.text = element_text(size = 8)) +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))
print(plot2)
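Both figures use a log10 y-axis with breaks at powers of ten, which can make the size of the totals hard to judge. As a reading aid, the short sketch below places the two leading totals reported above on that scale; the variable names are illustrative.

```r
# Totals reported in the table above (USD)
flood_property <- 150167161680
drought_crop   <- 13972566000

# Position on a log10 axis, relative to the power-of-ten breaks
log10(flood_property)   # lies between the 10^11 and 10^12 breaks
log10(drought_crop)     # lies between the 10^10 and 10^11 breaks

# Ratio of the two leading totals on the original (linear) scale
flood_property / drought_crop
```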