Analysis author: “John Mastapeter” date: “7/29/2020”
#Using Historic Storm Data to Calculate Harmful Events and the Consequences to Public Health and the Economy#
The United States of America is a large and geographically diverse country that experiences numerous environmental and weather-related events each year. For over sixty years the federal government has kept records of the weather-related events that cause damage to population centers and the economy.
Injuries and fatalities provide measures of an event’s harm to public health, while the estimates of total crop damage and property damage provide figures for the economic consequences of natural disasters.
With a short R script, plots can be created to show the ten event types that cause the most injuries and fatalities, along with the ten event types that cause the most property damage and crop damage.
Before the data can be processed and analyzed, set the knitr code chunk global options and load the additional libraries needed for later analysis.
knitr::opts_chunk$set(echo = TRUE)
library(RCurl)
library(data.table)
library(ggplot2)
library(cowplot)
##
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
## default ggplot2 theme anymore. To recover the previous
## behavior, execute:
## theme_set(theme_cowplot())
## ********************************************************
Assign the working directory, download the data from the URL provided, and verify that the file was successfully downloaded to the working directory.
#Set working directory
working_dir <- "C:/Users/mastapeterj/Documents/Coursera_DataScience/RPubsAssignment1"
setwd(working_dir)
#Assign link with data for analysis
data_link <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
#Download data to working directory and check to see if it downloaded
download.file(data_link, 'repdata_data_StormData.csv.bz2', method = 'curl')
file.exists("repdata_data_StormData.csv.bz2")
## [1] TRUE
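Re-running the analysis repeats the download every time. A minimal sketch of a guard that downloads only when the file is missing could look like the following. The helper name `fetch_if_missing` and its `downloader` argument are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
# `downloader` defaults to download.file; it is a parameter so the logic can be
# exercised without network access.
fetch_if_missing <- function(url, dest, downloader = download.file) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the compressed file in binary mode, which matters on Windows
    downloader(url, dest, mode = "wb")
  }
  # Report whether the file is present after the attempt
  file.exists(dest)
}
```

With the variables defined above, a call would be `fetch_if_missing(data_link, "repdata_data_StormData.csv.bz2")`.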
Review the data in the working directory.
#Read and review csv
data_1 <- read.csv("repdata_data_StormData.csv.bz2", header = TRUE)
data_1_ex <- head(data_1)
data_1_ex
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
#List and count Event Types
EventTypes <- unique(data_1$EVTYPE)
Num_of_Types <- length(EventTypes)
Num_of_Types
## [1] 985
The data set contains 985 distinct event type labels, but not all of them cause significant consequences for public health and the economy.
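The count above is a count of labels as they were typed, so variants of the same event are counted separately. The injury table later in this analysis, for example, lists both TSTM WIND and THUNDERSTORM WIND. As a rough sketch on made-up labels (not taken from the data set), upper-casing and trimming whitespace shows how such variants inflate the count. Merging true synonyms would need a manual mapping, which this sketch does not attempt.

```r
# Toy labels (illustrative only, not taken from the data set)
evtype <- c("TSTM WIND", " TSTM WIND", "Tstm Wind", "FLOOD", "flood ")

# Upper-case and trim surrounding whitespace before counting distinct labels
evtype_clean <- toupper(trimws(evtype))

length(unique(evtype))        # 5 distinct raw labels
length(unique(evtype_clean))  # 2 distinct labels after cleaning
```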
Isolate the information relevant to public health (EVTYPE, FATALITIES, and INJURIES) by extracting those columns into a new data frame. Calculating the total number of fatalities and injuries is more efficient on a smaller data frame.
#Extract columns relevant to population health; FATALITIES and INJURIES
data_health <- data_1[,c("EVTYPE", "FATALITIES", "INJURIES")]
#Calculate total injuries and fatalities
total_health <- setDT(data_health)[, lapply(.SD, sum), by = EVTYPE]
#Extract top ten EVTYPES for FATALITIES and INJURIES
total_health_by_fat <- total_health[order(total_health$FATALITIES, decreasing = TRUE),]
topten_events_fatalities <- total_health_by_fat[,c("EVTYPE", "FATALITIES")][1:10]
total_health_by_inj <- total_health[order(total_health$INJURIES, decreasing = TRUE),]
topten_events_injuries <- total_health_by_inj[,c("EVTYPE", "INJURIES")][1:10]
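The grouped sum and top-N selection used here can be followed on a small scale. This is a minimal sketch on made-up records. The names `toy`, `toy_totals`, and `toy_top` are illustrative only.

```r
library(data.table)

# Toy records (illustrative values only)
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3)
)

# Sum every non-grouping column within each event type
toy_totals <- toy[, lapply(.SD, sum), by = EVTYPE]

# Sort by fatalities, largest first, and keep the top two rows
toy_top <- head(toy_totals[order(-FATALITIES)], 2)
toy_top
```

Here `.SD` stands for the columns other than the grouping column, so a single call totals both FATALITIES and INJURIES per event type.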
Repeat the process of isolating the relevant columns for economic impact, PROPDMG and CROPDMG.
However, the economic impact also depends on the magnitude columns, PROPDMGEXP and CROPDMGEXP. These columns hold a code for the scale of each estimate: H for hundreds, K for thousands, M for millions, or B for billions. Each damage figure has to be multiplied by the matching power of ten in order to reach an accurate assessment of the damage caused by each event.
#Extract columns relevant to economic impact; PROPDMG and CROPDMG with their exponent codes
data_property <- data_1[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
#Replace empty exponent codes and NAs with 0
data_property$PROPDMGEXP <- sub("^$", 0, data_property$PROPDMGEXP)
data_property$CROPDMGEXP <- sub("^$", 0, data_property$CROPDMGEXP)
data_property[is.na(data_property)] <- 0
#Incorporate PROPDMGEXP into PROPDMG
data_property$PROPDMGEXP <- as.character(data_property$PROPDMGEXP)
data_property$PROPDMGEXP[is.na(data_property$PROPDMGEXP)] <- 0
propdmg_estvals <- data_property$PROPDMGEXP[!grepl("K|M|B|H", data_property$PROPDMGEXP, ignore.case = TRUE)]
data_property$PROPDMGEXP[grep("H", data_property$PROPDMGEXP, ignore.case = TRUE)] <- "2"
data_property$PROPDMGEXP[grep("K", data_property$PROPDMGEXP, ignore.case = TRUE)] <- "3"
data_property$PROPDMGEXP[grep("M", data_property$PROPDMGEXP, ignore.case = TRUE)] <- "6"
data_property$PROPDMGEXP[grep("B", data_property$PROPDMGEXP, ignore.case = TRUE)] <- "9"
data_property$PROPDMGEXP <- as.numeric(as.character(data_property$PROPDMGEXP))
## Warning: NAs introduced by coercion
data_property$PROPEST <- data_property$PROPDMG * 10^data_property$PROPDMGEXP
#Incorporate CROPDMGEXP into CROPDMG
data_property$CROPDMGEXP <- as.character(data_property$CROPDMGEXP)
data_property$CROPDMGEXP[is.na(data_property$CROPDMGEXP)] <- 0
cropdmg_estvals <-data_property$CROPDMGEXP[!grepl("K|M|B", data_property$CROPDMGEXP, ignore.case = TRUE)]
data_property$CROPDMGEXP[grep("K", data_property$CROPDMGEXP, ignore.case = TRUE)] <- "3"
data_property$CROPDMGEXP[grep("M", data_property$CROPDMGEXP, ignore.case = TRUE)] <- "6"
data_property$CROPDMGEXP[grep("B", data_property$CROPDMGEXP, ignore.case = TRUE)] <- "9"
data_property$CROPDMGEXP <- as.numeric(as.character(data_property$CROPDMGEXP))
## Warning: NAs introduced by coercion
data_property$CROPEST <- data_property$CROPDMG * 10^data_property$CROPDMGEXP
#Extract top ten EVTYPES for property damage and crop damage
total_property <- setDT(data_property)[, lapply(.SD, sum), by = EVTYPE]
total_damage_prop <- total_property[order(total_property$PROPEST, decreasing = TRUE),]
topten_events_propdmg <- total_damage_prop[,c("EVTYPE", "PROPEST")][1:10]
total_damage_crop <- total_property[order(total_property$CROPEST, decreasing = TRUE),]
topten_events_cropdmg <- total_damage_crop[,c("EVTYPE", "CROPEST")][1:10]
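The coercion warnings above indicate that some exponent codes are neither letters from the H/K/M/B set nor digits, so they become NA. Because `sum()` returns NA as soon as one input is NA, any event type with such a row may end up with an NA total and sort to the end of the ranking. The following is a minimal sketch of an alternative conversion on made-up records, using a named lookup vector. It rests on the assumption that blank or unrecognized codes mean "no multiplier" (10^0). Unlike the code above, it does not use digit codes as exponents.

```r
# Toy damage records (illustrative values only); "+" stands in for any
# code outside H, K, M, B
toy <- data.frame(
  DMG = c(25, 2.5, 1, 5, 3),
  EXP = c("K", "m", "B", "", "+"),
  stringsAsFactors = FALSE
)

# Named lookup vector: letter code -> power of ten
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Match case-insensitively; codes with no match come back as NA
toy$POW <- unname(exp_lookup[toupper(toy$EXP)])

# Assumption for this sketch: treat blank or unrecognized codes as 10^0
toy$POW[is.na(toy$POW)] <- 0

# Dollar estimate = reported figure times its power of ten
toy$EST <- toy$DMG * 10^toy$POW
toy$EST
```

With no NA left in the exponent column, every event type gets a numeric total when the estimates are summed.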
Print the top ten event types for injuries, fatalities, property damage, and crop damage.
#Top Ten Injuries
topten_events_injuries
## EVTYPE INJURIES
## 1: TORNADO 91346
## 2: TSTM WIND 6957
## 3: FLOOD 6789
## 4: EXCESSIVE HEAT 6525
## 5: LIGHTNING 5230
## 6: HEAT 2100
## 7: ICE STORM 1975
## 8: FLASH FLOOD 1777
## 9: THUNDERSTORM WIND 1488
## 10: HAIL 1361
#Top Ten Fatalities
topten_events_fatalities
## EVTYPE FATALITIES
## 1: TORNADO 5633
## 2: EXCESSIVE HEAT 1903
## 3: FLASH FLOOD 978
## 4: HEAT 937
## 5: LIGHTNING 816
## 6: TSTM WIND 504
## 7: FLOOD 470
## 8: RIP CURRENT 368
## 9: HIGH WIND 248
## 10: AVALANCHE 224
#Top Ten for Property Damage
topten_events_propdmg
## EVTYPE PROPEST
## 1: FLOOD 144657709807
## 2: HURRICANE/TYPHOON 69305840000
## 3: STORM SURGE 43323536000
## 4: HURRICANE 11868319010
## 5: TROPICAL STORM 7703890550
## 6: WINTER STORM 6688497251
## 7: RIVER FLOOD 5118945500
## 8: WILDFIRE 4765114000
## 9: STORM SURGE/TIDE 4641188000
## 10: TSTM WIND 4484928495
#Top Ten for Crop Damage
topten_events_cropdmg
## EVTYPE CROPEST
## 1: DROUGHT 13972566000
## 2: FLOOD 5661968450
## 3: RIVER FLOOD 5029459000
## 4: ICE STORM 5022113500
## 5: HAIL 3025954473
## 6: HURRICANE 2741910000
## 7: HURRICANE/TYPHOON 2607872800
## 8: FLASH FLOOD 1421317100
## 9: EXTREME COLD 1292973000
## 10: FROST/FREEZE 1094086000
# Injuries Plot
injurychart <- ggplot(topten_events_injuries, aes(x = EVTYPE, color = EVTYPE))+geom_point(aes(y = INJURIES), shape = "square")+xlab("Event Type")+ylab("Injuries")+ggtitle("Injuries by Event")+theme(axis.text.x = element_blank())
#Fatalities plot
fatalitieschart <- ggplot(topten_events_fatalities, aes(x = EVTYPE, color = EVTYPE))+geom_point(aes(y = FATALITIES), shape = "triangle")+xlab("Event Type")+ylab("Fatalities")+ggtitle("Fatalities by Event")+theme(axis.text.x = element_blank())
#Population Health Plot
plot_grid(injurychart, fatalitieschart, labels = "AUTO")
#Crop Damage Plot
cropchart <- ggplot(topten_events_cropdmg, aes(x = EVTYPE, color = EVTYPE))+geom_point(aes(y = CROPEST), shape = "square")+xlab("Event Type")+ylab("Damage")+ggtitle("Crop Damage")+theme(axis.text.x = element_blank())
#Property Damage Plot
propertychart <- ggplot(topten_events_propdmg, aes(x = EVTYPE, color = EVTYPE))+geom_point(aes(y = PROPEST), shape = "triangle")+xlab("Event Type")+ylab("Damage")+ggtitle("Property Damage")+theme(axis.text.x = element_blank())
#Economic Consequences Plot
plot_grid(cropchart, propertychart, labels = "AUTO")
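The point charts above hide the axis labels and rely on the color legend to identify each event type. As one possible alternative, a horizontal bar chart sorted by value keeps the labels readable and makes the ranking visible at a glance. This is only a sketch, using the top three injury totals from the table above. The names `top_inj` and `inj_bar` are illustrative.

```r
library(ggplot2)

# Top three injury totals copied from the injuries table above
top_inj <- data.frame(
  EVTYPE   = c("TORNADO", "TSTM WIND", "FLOOD"),
  INJURIES = c(91346, 6957, 6789)
)

# reorder() sorts the event types by their injury totals;
# coord_flip() puts the labels on the vertical axis, where they stay legible
inj_bar <- ggplot(top_inj, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
  geom_col() +
  coord_flip() +
  labs(x = "Event Type", y = "Injuries", title = "Injuries by Event")
inj_bar
```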