Exploring the NOAA Storm Database

Identifying the Key Weather Events Negatively Impacting Human Health and Causing Economic Damage

Synopsis

Both public and economic health can be impacted by weather events. The U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database tracks the characteristics of major weather events in the United States of America and, amongst other information, it provides information regarding fatalities, injuries, crop and property damages.

This project aims to investigate these data to identify which weather events negatively impact public health and which result in economic damages. The analysis showed that between the years 1950 and 2011 tornados were the leading cause of both fatalities and injuries, whereas floods and drought were the leading causes of property and crop damages, respectively.

Data Processing

The following sections will describe how the data was loaded, transformed, and analysed.

Loading Required Packages

For the analyses of this assignment, the following packages were installed with install.packages() and loaded into RStudio with library(): "dplyr" (package version 0.8.3), "ggplot2" (package version 3.3.3), and "scales" (package version 1.1.1).

# loading packages
library(dplyr)
library(ggplot2)
library(scales) 
Reading in the Data

The raw data was downloaded into the working directory on the 1st of April 2021 from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and loaded as "Storm" into RStudio using the command read.csv(). The "Storm" dataframe contains 902297 observations and 37 variables.

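# downloading file into working directory (skipped if the file already exists)
# the same file name is used for the existence check, the download, and the read
if(!file.exists("StormData.csv.bz2")) {
    fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(fileurl, destfile = "StormData.csv.bz2", method = "curl")
}

# reading in the data (read.csv() reads the bz2-compressed file directly)
Storm <- read.csv("StormData.csv.bz2")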
Taking an Initial Look at the Data

The "Storm" dataframe was initially explored using the commands colnames() and str() to get an idea of the variables present.

# exploring the dataset
colnames(Storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
str(Storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
Pre-Processing the Data

For this analysis, only a subset of the data is relevant. The following section describes how the data was preprocessed to arrive at new, tidy dataframes that are ready for analysis.

The analysis looks at the number of fatalities and injuries, as well as the total crop and property damage caused by the various weather events. Although the database starts in 1950, the early years contain only a few entries. For this analysis, all data collected between 1950 and 2011 is considered. The following list summarises the variables of interest for this analysis:
* "EVTYPE": the weather event (for example, "FLOOD", "WIND", "SNOW", etc)
* "PROPDMG": the approximate property damage in USD, before scaling by "PROPDMGEXP"
* "PROPDMGEXP": the exponent to the corresponding value in the column "PROPDMG"
* "CROPDMG": the approximate crop damage in USD, before scaling by "CROPDMGEXP"
* "CROPDMGEXP": the exponent to the corresponding value in the column "CROPDMG"
* "FATALITIES": the estimated number of fatalities
* "INJURIES": the estimated number of people injured

First, a copy of the original "Storm" dataframe called "data1" was created to facilitate the data processing stages. Using the select() command, the columns "EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "FATALITIES", and "INJURIES" were selected, and the filter() command was then used to keep only the rows in which at least one of the four categories "PROPDMG", "CROPDMG", "FATALITIES", or "INJURIES" had a value greater than 0. This reduced the original dataframe to 254633 observations and 7 variables.

Secondly, the categories in the column "EVTYPE" needed to be consolidated. According to NOAA, the data can be grouped into 48 main event types. However, the command unique(data1$EVTYPE) returns 488 distinct entries, indicating a lot of redundancy. Converting all of the terms to capital letters with toupper() reduces this number to 447 event types. To avoid inspecting and sorting each term individually, the events were grouped by key words: a new column called "EVENT_TYPES" was created and filled with the term "(OTHER EVENT TYPES)", and the command grep() was then used to overwrite this default for every row whose "EVTYPE" matches a key word. Because the assignments run in sequence, a row that matches several key words ends up in the group assigned last. This technique is by no means perfect, but it was an efficient way to reduce the number of event types to 38 (including a mixed category named "(OTHER EVENT TYPES)", containing 158 rows of data).
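
The key-word grouping described above can also be expressed as a small table of rules. The following is a minimal sketch in base R, not the code used in this report: the helper name group_events(), the rule table "rules", and the toy input are all hypothetical and only illustrate how a later rule overwrites an earlier one.

```r
# Hypothetical rule table (illustration only): names are key words, values are group labels.
# Rules are applied top to bottom, so a later match overwrites an earlier one.
rules <- c(
  "FLOOD"   = "FLOOD",
  "FLASH"   = "FLASH FLOOD",
  "WIND"    = "WIND",
  "TORNADO" = "TORNADO"
)

# Hypothetical helper: assigns each event name to a group by key word.
group_events <- function(evtype, rules, default = "(OTHER EVENT TYPES)") {
  evtype <- toupper(evtype)                 # make matching case-insensitive
  groups <- rep(default, length(evtype))    # start with the catch-all group
  for (pattern in names(rules)) {
    # fixed = TRUE treats the key word as a literal string, not a regular expression
    groups[grepl(pattern, evtype, fixed = TRUE)] <- rules[[pattern]]
  }
  groups
}

# Toy input, not taken from the database
toy <- c("Flash Flood", "RIVER FLOOD", "High Wind", "TORNADO F2", "VOLCANIC ASH")
grouped <- group_events(toy, rules)
print(grouped)
```

In this toy example "Flash Flood" first matches "FLOOD" and is then reassigned by the later "FLASH" rule, which is the same ordering effect the sequential grep() assignments rely on.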

Thirdly, the damage in USD to property and crops is split across two columns each: one holding a number and one holding an exponent code by which that number must be multiplied to obtain the full value of the economic damage. The codes "K", "M", and "B" indicate that the value in the "PROPDMG" or "CROPDMG" column needs to be multiplied by 1000, 1000000, and 1000000000, respectively. To this end, the grep() command was used to replace these letters with the corresponding numbers in the "PROPDMGEXP" and "CROPDMGEXP" columns. The class of these columns was then changed to "numeric", and "PROPDMG" was multiplied by "PROPDMGEXP" and "CROPDMG" by "CROPDMGEXP" to form the two new columns "PROPDMG_TOTAL" and "CROPDMG_TOTAL", respectively.
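
As an illustration of the exponent arithmetic, the sketch below maps the three codes to their multipliers with a named lookup vector. It is a simplified sketch in base R rather than the code used in this report: the names "multiplier" and damage_total() are hypothetical, the input values are toy data, and any code other than "K", "M", or "B" is assumed here to contribute a total of 0.

```r
# Multipliers for the three codes described above.
# Assumption of this sketch: any other code (including "") yields NA and is set to 0.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: combines a damage figure with its exponent code.
damage_total <- function(value, exp_code) {
  mult <- unname(multiplier[exp_code])  # look up each code; unknown codes give NA
  total <- value * mult
  total[is.na(total)] <- 0              # treat missing multipliers as no recorded damage
  total
}

# Toy rows, not taken from the database
totals <- damage_total(c(25, 2.5, 1.2, 10), c("K", "M", "B", ""))
print(totals)
```

For example, a "PROPDMG" value of 25 with the code "K" corresponds to 25 * 1000 = 25000 USD.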

Finally, the columns of "data1" relevant for the analysis were selected into the dataframe "data2": "EVENT_TYPES", "PROPDMG_TOTAL", "CROPDMG_TOTAL", "FATALITIES", and "INJURIES". "data2" was converted to a plain dataframe using as.data.frame() and all "NA"s were replaced with "0".

# creating a copy of the original dataframe for processing  
data1 <- Storm

# selecting the relevant columns
data1 <- data1 %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES, INJURIES) %>%
  filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

# changing all terms to uppercase 
data1$EVTYPE <- toupper(data1$EVTYPE)

# create a new variable "EVENT_TYPES" that groups the values of EVTYPE according to key words
data1$EVENT_TYPES <- "(OTHER EVENT TYPES)"

# the assignments run in order, so a later match overwrites an earlier one
# (e.g. "FLASH FLOOD" is first labelled "FLOOD" and then relabelled "FLASH FLOOD")
data1$EVENT_TYPES[grep("FLOOD", data1$EVTYPE)] <- "FLOOD"
data1$EVENT_TYPES[grep("FLASH", data1$EVTYPE)] <- "FLASH FLOOD"
data1$EVENT_TYPES[grep("COAST", data1$EVTYPE)] <- "COASTAL FLOOD" 
data1$EVENT_TYPES[grep("DUST", data1$EVTYPE)] <- "DUST STORM"
data1$EVENT_TYPES[grep("DEVIL", data1$EVTYPE)] <- "DUST DEVIL"
data1$EVENT_TYPES[grep("FROST", data1$EVTYPE)] <- "FROST/FREEZE"
data1$EVENT_TYPES[grep("FREEZE", data1$EVTYPE)] <- "FROST/FREEZE"
data1$EVENT_TYPES[grep("AVALAN", data1$EVTYPE)] <- "AVALANCHE"
data1$EVENT_TYPES[grep("BLIZZARD", data1$EVTYPE)] <- "BLIZZARD"
data1$EVENT_TYPES[grep("DEBRIS", data1$EVTYPE)] <- "DEBRIS FLOW"
data1$EVENT_TYPES[grep("FOG", data1$EVTYPE)] <- "FOG"
data1$EVENT_TYPES[grep("SMOKE", data1$EVTYPE)] <- "SMOKE"
data1$EVENT_TYPES[grep("DROUGHT", data1$EVTYPE)] <- "DROUGHT"
# assigned after the four groups above so that any entry mentioning WIND ends up in "WIND"
data1$EVENT_TYPES[grep("WIND", data1$EVTYPE)] <- "WIND"
data1$EVENT_TYPES[grep("HEAT", data1$EVTYPE)] <- "HEAT"
data1$EVENT_TYPES[grep("CHILL", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("EXTREME COLD", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("HYPOTHERMIA", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("EXPOSURE", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("RAIN", data1$EVTYPE)] <- "RAIN"
data1$EVENT_TYPES[grep("SNOW", data1$EVTYPE)] <- "SNOW"
data1$EVENT_TYPES[grep("FUNNEL", data1$EVTYPE)] <- "FUNNELCLOUD"
data1$EVENT_TYPES[grep("HAIL", data1$EVTYPE)] <- "HAIL"
data1$EVENT_TYPES[grep("HURRICANE", data1$EVTYPE)] <- "HURRICANE/TYPHOON"
data1$EVENT_TYPES[grep("TYPHOON", data1$EVTYPE)] <- "HURRICANE/TYPHOON"
data1$EVENT_TYPES[grep("LIGHTNING", data1$EVTYPE)] <- "LIGHTNING"
data1$EVENT_TYPES[grep("RIP CURRENT", data1$EVTYPE)] <- "RIP CURRENT"
data1$EVENT_TYPES[grep("TORNADO", data1$EVTYPE)] <- "TORNADO"
data1$EVENT_TYPES[grep("GLAZE", data1$EVTYPE)] <- "GLAZE"
data1$EVENT_TYPES[grep("FIRE", data1$EVTYPE)] <- "FIRE"
data1$EVENT_TYPES[grep("TSUNAMI", data1$EVTYPE)] <- "TSUNAMI"
data1$EVENT_TYPES[grep("LANDSLIDE", data1$EVTYPE)] <- "LANDSLIDE"
data1$EVENT_TYPES[grep("SURF", data1$EVTYPE)] <- "SURF"
data1$EVENT_TYPES[grep("MIXED PRECIP", data1$EVTYPE)] <- "RAIN"
data1$EVENT_TYPES[grep("BLACK ICE", data1$EVTYPE)] <- "ICE"
data1$EVENT_TYPES[grep("ICY ROAD", data1$EVTYPE)] <- "ICE"
data1$EVENT_TYPES[grep("ASTRONOMICAL", data1$EVTYPE)] <- "ASTRONOMICAL TIDE"
data1$EVENT_TYPES[grep("COLD", data1$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
data1$EVENT_TYPES[grep("ICE STORM", data1$EVTYPE)] <- "ICE STORM"
data1$EVENT_TYPES[grep("SURGE", data1$EVTYPE)] <- "SURGE/TIDE"
data1$EVENT_TYPES[grep("TROPICAL", data1$EVTYPE)] <- "TROPICAL STORM"
data1$EVENT_TYPES[grep("HIGH SEA", data1$EVTYPE)] <- "SURF"
data1$EVENT_TYPES[grep("URBAN/SML STREAM FLD", data1$EVTYPE)] <- "URBAN/SML STREAM FLD"
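# NOTE: the "W STORM" placeholder set on the next line is overwritten by the "WINTER" rule
# that follows it, and the final "W STORM" pattern is matched against EVTYPE rather than
# EVENT_TYPES, so winter storms are counted under "WINTER WEATHER" (see the table below)
data1$EVENT_TYPES[grep("WINTER STORM", data1$EVTYPE)] <- "W STORM"
data1$EVENT_TYPES[grep("WINTER", data1$EVTYPE)] <- "WINTER WEATHER"
data1$EVENT_TYPES[grep("WINTRY", data1$EVTYPE)] <- "WINTER WEATHER"
data1$EVENT_TYPES[grep("W STORM", data1$EVTYPE)] <- "WINTER STORM"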
data1$EVENT_TYPES[grep("WATERSPOUT", data1$EVTYPE)] <- "WATERSPOUT"
data1$EVENT_TYPES[grep("SEICHE", data1$EVTYPE)] <- "SEICHE"
data1$EVENT_TYPES[grep("MICROBURST", data1$EVTYPE)] <- "THUNDERSTORM"
data1$EVENT_TYPES[grep("THUNDERSTORM", data1$EVTYPE)] <- "THUNDERSTORM"

# checking the number of Event Types 
sort(table(data1$EVENT_TYPES), decreasing = TRUE)
## 
##                    WIND            THUNDERSTORM                 TORNADO 
##                   73271                   56045                   39960 
##                    HAIL             FLASH FLOOD               LIGHTNING 
##                   26607                   21597                   13300 
##                   FLOOD          WINTER WEATHER                    SNOW 
##                   10622                    2064                    1893 
##                    FIRE                    RAIN                    HEAT 
##                    1259                    1248                     980 
##               ICE STORM    URBAN/SML STREAM FLD             RIP CURRENT 
##                     714                     702                     641 
## EXTREME COLD/WIND CHILL          TROPICAL STORM               AVALANCHE 
##                     497                     456                     269 
##                 DROUGHT                BLIZZARD           COASTAL FLOOD 
##                     266                     255                     239 
##                    SURF       HURRICANE/TYPHOON              SURGE/TIDE 
##                     235                     233                     225 
##               LANDSLIDE                     FOG     (OTHER EVENT TYPES) 
##                     198                     188                     158 
##            FROST/FREEZE              DUST STORM              DUST DEVIL 
##                     155                     104                      95 
##              WATERSPOUT                   GLAZE                     ICE 
##                      64                      23                      23 
##                 TSUNAMI             FUNNELCLOUD       ASTRONOMICAL TIDE 
##                      14                      13                      10 
##                  SEICHE                   SMOKE 
##                       9                       1
# PROPDMG and CROPDMG values needed to be multiplied by their corresponding exponent columns 
# K = 1000, M = 1000000, and B = 1000000000 
data1$PROPDMGEXP[grep("K", data1$PROPDMGEXP)] <- "1000"
data1$PROPDMGEXP[grep("M", data1$PROPDMGEXP)] <- "1000000"
data1$PROPDMGEXP[grep("B", data1$PROPDMGEXP)] <- "1000000000"
data1$CROPDMGEXP[grep("K", data1$CROPDMGEXP)] <- "1000"
data1$CROPDMGEXP[grep("M", data1$CROPDMGEXP)] <- "1000000"
data1$CROPDMGEXP[grep("B", data1$CROPDMGEXP)] <- "1000000000"

data1$PROPDMGEXP <- as.numeric(data1$PROPDMGEXP)
data1$CROPDMGEXP <- as.numeric(data1$CROPDMGEXP)

# creating a new variable "PROPDMG_TOTAL" and "CROPDMG_TOTAL" 
data1$PROPDMG_TOTAL <- data1$PROPDMG * data1$PROPDMGEXP
data1$CROPDMG_TOTAL <- data1$CROPDMG * data1$CROPDMGEXP

# keeping only the columns that are needed for the analysis
data2 <- data1 %>%
    select(EVENT_TYPES, PROPDMG_TOTAL, CROPDMG_TOTAL, FATALITIES, INJURIES)

# replacing all NA values with 0
data2 <- as.data.frame(data2)
data2[is.na(data2)] <- 0
Tidying the Data

The dataframe "data2" contains all of the information needed to answer the questions of interest; however, the data is not yet in tidy form. The following section describes how "data2" is further transformed to yield two tidy dataframes that are then used directly in the "Results" section of this analysis.

For the first tidy dataframe regarding the health impact (i.e. looking at "EVENT_TYPES", "FATALITIES" and "INJURIES"), "data2" was subset to contain only the rows for which the recorded value for "FATALITIES" or "INJURIES" was greater than 0. These values were then summed up by "EVENT_TYPES" to generate "data3". The dataframe "data3" contains a total of 35 observations and 3 variables ("EVENT_TYPES", "FATALITIES", and "INJURIES"). "data3" was then split into two dataframes "data3_FAT" and "data3_INJ", containing "EVENT_TYPES" together with the "FATALITIES" or "INJURIES" column, respectively. Both dataframes initially contain 35 observations and two variables. The "FATALITIES" and "INJURIES" columns were renamed to "COUNT", and for each dataframe a new column "HARM" was created containing the term "FATALITIES" or "INJURY", respectively. This then allowed "data3_FAT" and "data3_INJ" to be recombined using the rbind() command into the dataframe "data3_tidy". This dataframe contains 70 observations of 3 variables ("EVENT_TYPES", "COUNT" (i.e. the number of fatalities or injuries recorded), and "HARM" (i.e. either "FATALITIES" or "INJURY")).

For the second tidy dataframe regarding the economic impact (i.e. looking at "EVENT_TYPES", "PROPDMG_TOTAL" and "CROPDMG_TOTAL"), "data2" was subset to contain only the rows for which the recorded value for "CROPDMG_TOTAL" or "PROPDMG_TOTAL" was greater than 0. These values were then summed up by "EVENT_TYPES" to generate "data4". The dataframe "data4" contains a total of 38 observations and 3 variables ("EVENT_TYPES", "CROP", and "PROP", where the latter two hold the summed "CROPDMG_TOTAL" and "PROPDMG_TOTAL" values). "data4" was then split into two dataframes "data4_CROP" and "data4_PROP", containing "EVENT_TYPES" together with the "CROP" or "PROP" column, respectively. Both dataframes initially contain 38 observations and two variables. The "CROP" and "PROP" columns were renamed to "COUNT", and for each dataframe a new column "DAMAGE" was created containing the term "CROP DAMAGE" or "PROPERTY DAMAGE", respectively. This then allowed "data4_CROP" and "data4_PROP" to be recombined using the rbind() command into the dataframe "data4_tidy". This dataframe contains 76 observations of 3 variables ("EVENT_TYPES", "COUNT" (i.e. the amount of crop or property damage in USD), and "DAMAGE" (i.e. either "CROP DAMAGE" or "PROPERTY DAMAGE")).

# generating a tidy dataframe to investigate the health impact ("FATALITIES" and "INJURIES") 
data3 <- data2 %>%
  filter(FATALITIES > 0 | INJURIES > 0)  %>%
  group_by(EVENT_TYPES) %>%
  summarise(FATALITIES = sum(FATALITIES), 
            INJURIES = sum(INJURIES)) 

# making the data tidy
data3_FAT <- data3 %>%
  select(EVENT_TYPES, FATALITIES) %>%
  rename(COUNT = FATALITIES)
data3_FAT$HARM <- "FATALITIES"

data3_INJ <- data3 %>%
  select(EVENT_TYPES, INJURIES) %>%
  rename(COUNT = INJURIES)
data3_INJ$HARM <- "INJURY"

data3_tidy <- rbind(data3_FAT, data3_INJ) %>%
    arrange(-COUNT)

# generating a tidy dataframe to investigate the economic impact ("CROPDMG_TOTAL" and "PROPDMG_TOTAL") 
data4 <- data2 %>%
  filter(CROPDMG_TOTAL > 0 | PROPDMG_TOTAL > 0)  %>%
  group_by(EVENT_TYPES) %>%
  summarise(CROP = sum(CROPDMG_TOTAL), 
            PROP = sum(PROPDMG_TOTAL)) 

# making the data tidy
data4_CROP <- data4 %>%
  select(EVENT_TYPES, CROP) %>%
  rename(COUNT = CROP)
data4_CROP$DAMAGE <- "CROP DAMAGE"

data4_PROP <- data4 %>%
  select(EVENT_TYPES, PROP) %>%
  rename(COUNT = PROP)
data4_PROP$DAMAGE <- "PROPERTY DAMAGE"

data4_tidy <- rbind(data4_CROP, data4_PROP) %>%
    arrange(-COUNT)
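
The split, rename, and rbind() steps used for both tidy dataframes amount to a wide-to-long reshape. The sketch below shows the same idea in base R on a toy dataframe. It is an illustration under stated assumptions, not the code used in this report: the dataframe "wide" and its values are invented, and only the column names follow the analysis.

```r
# Toy wide dataframe (invented values): one row per event type, one column per harm type
wide <- data.frame(
  EVENT_TYPES = c("TORNADO", "HEAT"),
  FATALITIES  = c(10, 4),
  INJURIES    = c(100, 20),
  stringsAsFactors = FALSE
)

# Long (tidy) form: one row per event type and harm type
long <- data.frame(
  EVENT_TYPES = rep(wide$EVENT_TYPES, times = 2),             # event names repeated per harm type
  COUNT       = c(wide$FATALITIES, wide$INJURIES),            # values stacked into one column
  HARM        = rep(c("FATALITIES", "INJURY"), each = nrow(wide)),
  stringsAsFactors = FALSE
)

# Sort by descending count, as arrange(-COUNT) does in the analysis
long <- long[order(-long$COUNT), ]
print(long)
```

Each event type now appears once per harm type, which is the layout that allows a single column to drive the fill colour in the plots.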

Results

Across the United States, which types of events are most harmful with respect to population health?

The ten largest injury and fatality totals were inspected using the head() command, and the data for all categories was then plotted using ggplot(). The results show that the most harmful event type is the tornado, which resulted in 91364 injuries and 5658 fatalities between the years 1950 and 2011.

# Table of the ten largest injury and fatality totals by event type
head(data3_tidy, 10)
## # A tibble: 10 x 3
##    EVENT_TYPES    COUNT HARM      
##    <chr>          <dbl> <chr>     
##  1 TORNADO        91364 INJURY    
##  2 HEAT            9224 INJURY    
##  3 WIND            8861 INJURY    
##  4 FLOOD           6795 INJURY    
##  5 TORNADO         5658 FATALITIES
##  6 LIGHTNING       5231 INJURY    
##  7 HEAT            3138 FATALITIES
##  8 THUNDERSTORM    2507 INJURY    
##  9 ICE STORM       1992 INJURY    
## 10 WINTER WEATHER  1968 INJURY
# Plotting the data 
plot1 <- ggplot(data3_tidy, aes(x = reorder(EVENT_TYPES, -COUNT), y = COUNT)) +
  geom_col(aes(fill = HARM)) +   # geom_col() draws bars from pre-computed values
   labs(x = "Weather Event", y = "Sum Total") +
  ggtitle("Total Number of Fatalities and Injuries Summarised by Weather Event \n (1950-2011)", 
          subtitle = "Figure 1: Tornados are the leading cause of injuries and fatalities") +
  theme(plot.title = element_text(hjust = 0.5),
        panel.background = element_rect(fill = "white"),
        panel.border = element_rect(colour = "black", fill=NA, size=0.5), 
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25), 
        legend.position =c(0.88,0.7), 
        legend.title = element_blank(), 
        legend.text = element_text(size = 8)) + 
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) 

print(plot1)
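
The leading event type for each kind of harm can be cross-checked directly from the ten rows printed above. The following base R sketch re-enters those rows by hand (the dataframe name "top10" and the object "leaders" are hypothetical) and picks the row with the largest count per "HARM" value. Because the table is sorted by descending count, any fatality or injury total not shown is smaller than the values listed.

```r
# The ten rows from the table above, re-entered by hand for illustration
top10 <- data.frame(
  EVENT_TYPES = c("TORNADO", "HEAT", "WIND", "FLOOD", "TORNADO",
                  "LIGHTNING", "HEAT", "THUNDERSTORM", "ICE STORM", "WINTER WEATHER"),
  COUNT = c(91364, 9224, 8861, 6795, 5658, 5231, 3138, 2507, 1992, 1968),
  HARM = c("INJURY", "INJURY", "INJURY", "INJURY", "FATALITIES",
           "INJURY", "FATALITIES", "INJURY", "INJURY", "INJURY"),
  stringsAsFactors = FALSE
)

# For each harm type, keep the row with the largest count
leaders <- do.call(
  rbind,
  lapply(split(top10, top10$HARM), function(d) d[which.max(d$COUNT), ])
)
print(leaders)
```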

Across the United States, which types of events have the greatest economic consequences?

The ten largest crop and property damage totals were inspected using the head() command, and the data for all categories was then plotted using ggplot(). The results show that flooding caused the most property damage (150,167,161,680 USD), whereas drought caused the most crop damage (13,972,566,000 USD), between the years 1950 and 2011.

# Table of the ten largest crop and property damage totals by event type
head(data4_tidy, 10)
## # A tibble: 10 x 3
##    EVENT_TYPES               COUNT DAMAGE         
##    <chr>                     <dbl> <chr>          
##  1 FLOOD             150167161680. PROPERTY DAMAGE
##  2 HURRICANE/TYPHOON  85336410010  PROPERTY DAMAGE
##  3 TORNADO            58530432191  PROPERTY DAMAGE
##  4 SURGE/TIDE         47965224000  PROPERTY DAMAGE
##  5 FLASH FLOOD        16907365196. PROPERTY DAMAGE
##  6 HAIL               16013999370  PROPERTY DAMAGE
##  7 DROUGHT            13972566000  CROP DAMAGE    
##  8 FLOOD              10734652950  CROP DAMAGE    
##  9 WIND               10580518930  PROPERTY DAMAGE
## 10 FIRE                8501628500  PROPERTY DAMAGE
# Plotting the data 
plot2 <- ggplot(data4_tidy, aes(x = reorder(EVENT_TYPES, -COUNT), y = COUNT)) +
  geom_col(aes(fill = DAMAGE)) +   # geom_col() draws bars from pre-computed values
   labs(x = "Weather Event", y = "Total Damages in USD") +
  ggtitle("Crop and Property Damages (USD) Summarised by Weather Event \n (1950-2011)",
           subtitle = "Figure 2: Flood and drought are the leading causes of property and crop damage, respectively") +
  theme(plot.title = element_text(hjust = 0.5),
        panel.background = element_rect(fill = "white"),
        panel.border = element_rect(colour = "black", fill=NA, size=0.5), 
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25), 
        legend.position =c(0.85,0.75), 
        legend.title = element_blank(), 
        legend.text = element_text(size = 8)) + 
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))
  
print(plot2)
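
Totals of this size are easier to compare when expressed in billions of USD. The sketch below is a small base R helper for that conversion. The function name usd_billions() is hypothetical and not part of the analysis above, and the two input values are the flood property damage and drought crop damage totals reported in the table.

```r
# Hypothetical helper: format a USD amount in billions with a fixed number of decimals
usd_billions <- function(x, digits = 1) {
  paste0(formatC(x / 1e9, format = "f", digits = digits), " billion USD")
}

# Flood property damage and drought crop damage, as reported in the table above
labels <- usd_billions(c(150167161680, 13972566000))
print(labels)
```

On this scale, the flood property damage is roughly 150 billion USD and the drought crop damage roughly 14 billion USD.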