For the cost to public health, Tornados are the most dangerous in terms of absolute number of fatalities and injuries. However, such events are frequent and, on average, only 1 death occurs every 10 tornado events despite there being an average of nearly 2 injuries to eveny event. Tsunamis on the other hand, while having a low total number of deaths, are the most deadly event when they occur, killing at least 1 person and injuring an average of 6 people per event.
For the cost to property and crop damage; property damage was significantly greater than that to crops. Such property damage was largely caused by Floods and Hurricanes with 150.483 Billion USD and 85.356 billion USD respectivly. In terms of crop damage however, this was unuprisingly largely due to Droughts and Floods, causing 13.973 bilion USD and 10.956 billion USD worth of damage respectivly.
# Load in relevant R packages
library(dplyr)
library(ggplot2)
library(reshape2)
# Set the number format of output numbers
options(scipen=99999, digits=3)
# Set the working directory to the data file location
wd <- '/Users/rek514/Documents/Data_science'
setwd(wd)
# Details on the location of the downloaded file
fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
destfile <- 'StormData.csv.bz2'
# If we havent downlaoded the data yet, downlaod it now.
if(!file.exists(destfile)){ download.file(fileURL,destfile)}
# Read in data set
dataset <- read.csv('StormData.csv.bz2', na.strings = "")
# Create a copy of the dataset to modify
clean_dataset <- dataset
# Convert the EVTYPE variable to a chavter rather then factor
clean_dataset$EVTYPE <- as.character(clean_dataset$EVTYPE)
# Some inputs have multiple event types so we want to remove them where possible
# This looks for seperaters (/, and, &) and only keep the first event type of this input.
problementry <- grep('/', clean_dataset$EVTYPE)
for (i in 1:length(problementry)){
clean_dataset$EVTYPE[(problementry[i])] <- gsub("/.*", "",clean_dataset$EVTYPE[(problementry[i])])}
problementry2 <- grep('&', clean_dataset$EVTYPE)
for (i in 1:length(problementry2)){
clean_dataset$EVTYPE[(problementry2[i])] <- sub("&.*", "",clean_dataset$EVTYPE[(problementry[i])])}
problementry3 <- grep('and', clean_dataset$EVTYPE, ignore.case = TRUE)
for (i in 1:length(problementry3)){
clean_dataset$EVTYPE[(problementry3[i])] <- sub("and.*", "",clean_dataset$EVTYPE[(problementry[i])])}
This means that there will be some cases not used as they do not fit in which one of these predefined 48 weather events or it is unclear which category the event belongs to (e.g. ‘Record high’ may refer to heat, rain or wind). If the entry met either of these criteria, the case was removed from the analysis.
# Some inputs have not been given the standard EV Type name so we want to ensure all the same event type, regardless of exact naming convention, are analysed under the same category.
# Create a new coolumn variable to save his category information
clean_dataset[,38] <- NA
colnames(clean_dataset)[38] <- 'Global_event'
# Make all the EVTYPE enries lower case to avoid duplciates
clean_dataset$EVTYPE <- tolower(clean_dataset$EVTYPE)
# Create a variable to store the inputted event names associated with each of the 48 weather event types
# For example 'Heavy Rain', 'Heavy Precipitation' and 'Torrential Rain' will all be categorised as 'Heavy Rain'
event_types <- vector(mode="list", length=48)
# Create a variable to save the index locations of entries associated with each event type
event_index <- vector(mode="list", length=48)
# Predefined names of the 48 weather event types
evnames <- c("Astronomical Low Tide","Avalanche","Blizzard","Coastal Flood","Cold/Wind Chill",
"Debris Flow","Dense Fog","Dense Smoke","Drought","Dust Devil","Dust Storm","Excessive Heat",
"Extreme Cold/Wind Chill","Flash Flood","Flood","Frost/Freeze","Funnel Cloud","Freezing Fog","Hail",
"Heat","Heavy Rain","Heavy Snow","High Surf","High Wind","Hurricane (Typhoon)","Ice Storm","Lake-Effect Snow","Lakeshore Flood","Lightning","Marine Hail","Marine High Wind","Marine Strong Wind","Marine Thunderstorm Wind","Rip Current","Seiche","Sleet","Storm Surge/Tide","Strong Wind","Thunderstorm Wind","Tornado","Tropical Depression","Tropical Storm","Tsunami","Volcanic Ash","Waterspout","Wildfire","Winter Storm","Winter Weather")
names(event_types) <- evnames
names(event_index) <- evnames
# Creates a list of all the unique inputs for the evtype variable
labels <- unique(as.character(clean_dataset$EVTYPE))
# This chunk of text goes through each of the 48 variables and and finds the associated EVTYPE variable names and the indices of the entries in the main dataset.
# Each of these were manually checked to ensure each search criteria encompased all the relavant entries
# In some cases, when the event type still stated 2 categories, the first category inputted was chosen.
# Similarly, some evtype inputs did not fit into any category or were not specific enough to be assogned to 1 cateogy or the other e.g. 'Record high' may refer to heat, rain or wind, so was not included.
event_index[["High Wind"]]<- grep("(?=.*wnd.*)(?!.*tstm)|(?=.*wind.*)(?!.*low)(?!.*marine)(?!.*strong)(?!.*down)(?!.*T.*e.*o.*m.*ind)(?!.*rain)(?!.*whirl)(?!.*tstm)(?!.*mi.*o)(?!.*flood)(?!.*chill.*)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[["Tornado"]]<- grep("(?=.*torn.*)(?!.*waterspout)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[["Dust Devil"]]<- grep('devil|devel', x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[["Coastal Flood"]]<-grep('(?=.*coastal.*)(?!.*surf)(?!.*erosion)|(?=.*cstl.*)|(?=.*tidal*)', x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[["Lightning"]]<- grep('(?=.*^lig.*ing)(?!.*rain)|(?=.*^ lig.*ing)', x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Lakeshore Flood']]<- grep(c("Lake.*flood"), x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Winter Storm']]<- grep("^win.*sto", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Winter Weather']]<-grep("^win.*wea", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Seiche']]<-grep("Seiche", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Wildfire']]<-grep("(?=.*fire)(?!.*lightning)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Waterspout']]<-grep("^Wa.*ter.*spout|^ Wa.*ter.*spout", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Hurricane (Typhoon)']]<-grep('Hurricane|Typhoon', x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Thunderstorm Wind']]<-grep('^(?=.*TSTM)(?!.*tornado)(?!.*marine)(?!.*non)|^(?=.*T.*e.*o.*m)(?!.*winter)(?!.*marine)(?!.*non)(?!.*tornado)(?!.*lightning)|(?=.*gustnado)|(?=.*downburst)|(?=.*mic.*oburst)|(?=.*whirlwind)', x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Tropical Storm']]<-grep("^Tropical Storm", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Tropical Depression']]<-grep("depression", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Flash Flood']]<-grep("Flash", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Hail']]<-grep("^(?=.*hail)(?!.*marine)(?!.*wind)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Heavy Rain']]<-grep("^(?=.*Precip)(?!.*monthly)|(?=.*rain)(?!.*monthly)(?!.*freezing)(?!.*low)(?!.*wind)|(?=.*urba.*flo)(?!.*thunderstorm)(?!.*mud)|(?=.*stre.*fl.*d)|(?=.*shower)(?!.*snow)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Drought']]<-grep("drought", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Funnel Cloud']]<-grep("(?=.*funnel)(?!.*thunderstorm)(?!.*waterspout)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Frost/Freeze']]<-grep("(?=.*frost)|(?=.*freeze)(?!.*snow)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Lake-Effect Snow']]<-grep("lake.* snow", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Heavy Snow']]<-grep("(?=.*snow)(?!.*lake)(?!.*lake)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Ice Storm']]<-grep("(?=.*ice.*storm)|(?=.*freezing)(?!.*fog)(?!.*snow)(?!.*road)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Heat']]<-grep("(?=.*heat)(?!.*ex.*)|(?=.*warm)|(?=.*hot)|(?=.*HIGH.*TEMP)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Excessive Heat']]<-grep("(?=.*ex.*heat)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Volcanic Ash']]<-grep("volcanic", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Flood']]<-grep("(?=.*flood)(?!.*urban)(?!.*flash)(?!.*street)(?!.*urban)(?!.*tidal)(?!.*c.*l)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Marine Hail']]<-grep("marine.*hail", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Marine High Wind']]<-grep("marine.*high", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Marine Strong Wind']]<-grep("marine.*strong", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Marine Thunderstorm Wind']]<-grep("marine.*tstm|marine.*thunder", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Extreme Cold/Wind Chill']]<-grep("(?=.*ext.*chill)|(?=.*ex.*cold)|(?=.*sev.*cold)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Cold/Wind Chill']]<-grep("(?=.*wind.*chill)(?!.*ex.*)(?!.*snow)|(?=.*cold)(?!.*ex.*)(?!.*snow)|(?=.*low.*temp.*)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Dense Fog']]<-grep("(?=.*fog)(?!.*ice)(?!.*freezing)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Freezing Fog']]<-grep("(?=.*fog)(?=.*freezing)|(?=.*fog)(?=.*ice)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Sleet']]<-grep("(?=.*sleet)(?!.*snow)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Rip Current']]<-grep("rip", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['High Surf']]<-grep("(?=.*surf)(?!.*rip)|(?=.*waves)(?!.*heat)|(?=.*swell)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Astronomical Low Tide']]<-grep("ASTRONOMICAL LOW TIDE", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Avalanche']]<-grep("Aval.*|rock|.*slide", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Dense Smoke']]<-grep("smoke", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Tsunami']]<-grep("TSUNAMI", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Blizzard']]<-grep("BLIZZARD", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Dust Storm']]<-grep("(?=.*dust)(?!.*dev.*l)(?!.*wind)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
event_index[['Storm Surge/Tide']]<-grep("storm surge|high tides", x=clean_dataset$EVTYPE, ignore.case = TRUE)
event_index[['Strong Wind']]<-grep("(?=.*strong.*wind.*)(?!.*marine)", x=clean_dataset$EVTYPE, ignore.case = TRUE,perl=TRUE)
# Here we loop though each of the 48 weather conditions and save out the associated evntype names and we alter the Global_type variable to represent the associated global event category (out of a possible 48)
used_labels <- c()
for (i in (1:(length(event_index)))){
this_name <- names(event_types)[i]
clean_dataset$Global_event[event_index[[this_name]]] <- this_name
event_types[[this_name]] <- unique(clean_dataset$EVTYPE[event_index[[this_name]]])
used_labels <- c(used_labels,(event_types[[this_name]]))
}
# Here we save out the unused names for the EVTYPE variable - this was again checked to ensure none met the desires criteria
nonlabels <- labels[-(grep(paste("!",paste(used_labels, collapse = "|", sep="|")),labels, ignore.case = TRUE))]
# For property damage, remove any unwanted magnitude factors
clean_dataset$PROPDMGEXP <- gsub("0|1|2|3|4|5|6|7|8|[?]|[+]|[-]", NA, clean_dataset$PROPDMGEXP, perl=TRUE)
# For H,K,M ana B magnitude factors, replace with actual numeric value
clean_dataset$PROPDMGEXP <- gsub("b", "1000000000", clean_dataset$PROPDMGEXP, ignore.case = TRUE)
clean_dataset$PROPDMGEXP <- gsub("m", "1000000", clean_dataset$PROPDMGEXP, ignore.case = TRUE)
clean_dataset$PROPDMGEXP <- gsub("k", "1000", clean_dataset$PROPDMGEXP, ignore.case = TRUE)
clean_dataset$PROPDMGEXP <- gsub("h", "100", clean_dataset$PROPDMGEXP, ignore.case = TRUE)
clean_dataset$PROPDMGEXP[is.na(clean_dataset$PROPDMGEXP)] <- 1
# Caculate the total cost taking the magnitude into consideration
clean_dataset$PROPDMG<-as.numeric(clean_dataset$PROPDMG)*as.numeric(clean_dataset$PROPDMGEXP)
# For crop damage, remove any unwanted magnitude factors
clean_dataset$CROPDMGEXP <- gsub("0|2|[?]", NA, clean_dataset$CROPDMGEXP, perl=TRUE)
# For H,K,M ana B magnitude factors, replace with actual numeric value
clean_dataset$CROPDMGEXP <- gsub("b", "1000000000", clean_dataset$CROPDMGEXP, ignore.case = TRUE)
clean_dataset$CROPDMGEXP <- gsub("m", "1000000", clean_dataset$CROPDMGEXP, ignore.case = TRUE)
clean_dataset$CROPDMGEXP <- gsub("k", "1000", clean_dataset$CROPDMGEXP, ignore.case = TRUE)
clean_dataset$CROPDMGEXP <- gsub("h", "100", clean_dataset$CROPDMGEXP, ignore.case = TRUE)
clean_dataset$CROPDMGEXP[is.na(clean_dataset$CROPDMGEXP)] <- 1
# Caculate the total cost taking the magnitude into consideration
clean_dataset$CROPDMG<-as.numeric(clean_dataset$CROPDMG)*as.numeric(clean_dataset$CROPDMGEXP)
——————————————————————————————————–
# Total fatality/injurt for each event type
type_grouped <- group_by(clean_dataset, Global_event)
type_data <- summarise(type_grouped,Fatalities_total=sum(FATALITIES, na.rm = TRUE),Fatalities_mean=mean(FATALITIES, na.rm = TRUE),Injuries_total=sum(INJURIES, na.rm = TRUE),Injuries_mean=mean(INJURIES, na.rm = TRUE))
# Order data by fatalities total with greatest first
type_data <- type_data[with(type_data, order(-Fatalities_total)), ]
#Melt the data
melted_fatalities <- melt((type_data[1:3]), id.vars = "Global_event")
# Remove rows with an event name of NA
melted_fatalities <- filter(melted_fatalities, !is.na(Global_event))
# Plots event types by total fatality
g <- ggplot(melted_fatalities, aes(Global_event,value, fill=factor(variable)))
g + geom_bar(stat='identity', position='dodge')+theme(axis.text.x = element_text(angle = 90, hjust = 1))+ facet_grid(melted_fatalities$variable~., scales = 'free')+ggtitle('Total and Average Fatality rate from Weather events across the US')+xlab("Weather Event")+ylab("Total")+ theme(legend.position="none")
Figure 1: Bar graph to show the total number of fatalities from each weather event and the mean number of fatalities for each individual event.
——————————————————————————————————–
The above graphs show that Tornados caused the highest number of fatalities in the US with 5658 deaths. This was followed by Excessive Heat and Heat which caused 1999 and 1131 deaths respectivly. However, when taking into consideration the average fatality rate (ratio of deaths to recorded weather events), Tsunamis are the most dangerous to public health with a average of 1.65 weather per event - in that sense, there is at least 1 death for every excessive heat event. Excessive Heat had the second highest fatality rate, again with atleast 1 death per event.
What about injury rates however?
# Order data by injuries total
type_data <- type_data[with(type_data, order(-Injuries_total)), ]
# Melt the data and remove NA rowns
melted_injuries <- melt((type_data[,-(2:3)]), id.vars = "Global_event")
melted_injuries <- filter(melted_injuries, !is.na(Global_event))
# Plot both the total and mean injury rates
g <- ggplot(melted_injuries, aes(Global_event,value, fill=factor(variable)))
g + geom_bar(stat='identity', position='dodge')+theme(axis.text.x = element_text(angle = 90, hjust = 1))+ facet_grid(melted_fatalities$variable~., scales = 'free')+ggtitle('Total and Average Injury rate from Weather events across the US')+xlab("Weather Event")+ylab("Total")+ theme(legend.position="none")
Figure 2: Bar graph to show the total number of injuries from each weather event and the mean number of injuries for each individual event
——————————————————————————————————–
The above graphs also appear to show a similar story to the fatality data with Tornados caused the highest number of injuries in the US. However, both Thunder storms and Floods provide a greater danger to the public in terms of injuries compared to the heat events.
When looking at the average injury rate however, this shows a different picture. Here there is atleast one injury for every Tsunami, Hurricane, Excessive Heat, Heat, Tornado and Dust storm event. This is particually dangerous for Tsunami events where on average over 6 people are injured in every event.
RECOMMENDATION: Based on these findings, we would recommend that resourses should be put into the preduction and preparation for tornado events which are the most harmful to the US population whereby 1.5 people becomes injured on average in each event. However, such events tend to occur with high frequency and, when event occurance is taken into account, Tsunami and Heat events show both high fartality and injury rates.
In that sense, as soon as authorities realise a Tsunami and Heat event is apparant, critical resourses should be put into preventing such problems.
——————————————————————————————————–
# Total property and crop damage cost for each event type
type_grouped <- group_by(clean_dataset, Global_event)
cost_data <- summarise(type_grouped,Property_total=sum(PROPDMG),Crops_total=sum(CROPDMG))
#Melt data and remove NA rows
melted_cost <- melt(cost_data, id.vars = "Global_event")
melted_cost <- filter(melted_cost, !is.na(Global_event))
# Plots the top 10 event types by total fatality
g <- ggplot(melted_cost, aes(Global_event,value/1000000, fill=factor(variable)))
g + geom_bar(stat='identity', position='dodge')+theme(axis.text.x = element_text(angle = 90, hjust = 1))+ facet_grid(melted_cost$variable~., scales = 'free')+ggtitle('Total Financial costs from Weather events across the US')+xlab("Weather Event")+ylab("Total Cost (Million US$)")+ theme(legend.position="none")
# Calculate the higehest 2 property damage costs
highestcost <-cost_data$Property_total[order(-cost_data$Property_total)[1]]
secondcost <-cost_data$Property_total[order(-cost_data$Property_total)[2]]
# Calculate the higehest 2 crop damage costs
highestcostcrop <-cost_data$Crops_total[order(-cost_data$Crops_total)[1]]
secondcostcrop <-cost_data$Crops_total[order(-cost_data$Crops_total)[2]]
Figure 3: Bar graph to show the total damage cost to both property and crops from each weather event.
——————————————————————————————————–
In terms of Property damage, there has been 427.319 billion USD worth of damage compared to 49.104 billion USD for crop damage.
Property damage has been largely caused by Floods and Hurricanes (Typhoon), with 150.483 Billion USD and 85.356 billion USD respectivly. This was then followed by Tornados and Storm Surges and Tides. In terms of crop damage however, this was unsuprisingly largely due to Droughts and Floods, causing 13.973 bilion USD and 10.956 billion USD worth of damge respectivly.
——————————————————————————————————–