The goal of this assignment is to explore the National Oceanic and Atmospheric Administration (NOAA) storm database in order to establish which severe weather events affect public health and economic activity in the US the most.
We begin by preparing the data: clearing the environment, setting the working directory, and then downloading, extracting and reading the data into R.
rm(list = ls()) #clean up working environment
setwd("H:/Data Science/Reproducible Research/week4/Assignment/") #set working directory
# Load required libraries
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.5
# URL of the storm data file to download
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
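if (!file.exists("Data")) { # create the data directory if it does not exist
  dir.create("Data")
}
if (!file.exists("Data/StormData.csv.bz2")) { # download the data file only if it is not already present
  # mode = "wb" requests a binary transfer, which matters for compressed files on Windows
  download.file(fileURL, destfile = "Data/StormData.csv.bz2", mode = "wb")
}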
# Load the data into R (read.csv reads the bz2-compressed file directly)
StormData <- read.csv("Data/StormData.csv.bz2", sep = ",", header = TRUE, quote = "\"")
## Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
## na.strings, : EOF within quoted string
# Prepare data for analysis: keep only the columns needed for health and economic impact
subsetSD <- StormData[, c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
# Create a numeric property damage column and initialize it to 0
subsetSD$PROPDMGNUM <- 0
# Apply the H, K, M, B exponent codes (hundred, thousand, million, billion) to property damage
subsetSD[subsetSD$PROPDMGEXP == "H", ]$PROPDMGNUM = subsetSD[subsetSD$PROPDMGEXP == "H", ]$PROPDMG * 10^2
subsetSD[subsetSD$PROPDMGEXP == "K", ]$PROPDMGNUM = subsetSD[subsetSD$PROPDMGEXP == "K", ]$PROPDMG * 10^3
subsetSD[subsetSD$PROPDMGEXP == "M", ]$PROPDMGNUM = subsetSD[subsetSD$PROPDMGEXP == "M", ]$PROPDMG * 10^6
subsetSD[subsetSD$PROPDMGEXP == "B", ]$PROPDMGNUM = subsetSD[subsetSD$PROPDMGEXP == "B", ]$PROPDMG * 10^9
# Create a numeric crop damage column and apply the same H, K, M, B exponent codes
subsetSD$CROPDMGNUM <- 0
subsetSD[subsetSD$CROPDMGEXP == "H", ]$CROPDMGNUM = subsetSD[subsetSD$CROPDMGEXP == "H", ]$CROPDMG * 10^2
subsetSD[subsetSD$CROPDMGEXP == "K", ]$CROPDMGNUM = subsetSD[subsetSD$CROPDMGEXP == "K", ]$CROPDMG * 10^3
subsetSD[subsetSD$CROPDMGEXP == "M", ]$CROPDMGNUM = subsetSD[subsetSD$CROPDMGEXP == "M", ]$CROPDMG * 10^6
subsetSD[subsetSD$CROPDMGEXP == "B", ]$CROPDMGNUM = subsetSD[subsetSD$CROPDMGEXP == "B", ]$CROPDMG * 10^9
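The eight assignments above all follow one pattern: look up a multiplier for the exponent code and apply it to the damage figure. As a minimal sketch of the same conversion, a named lookup vector could replace the repeated lines. The data frame `toy` and the names `multiplier` and `mult` are hypothetical and used only for illustration; the values are made up, not taken from the NOAA file. Like the code above, the sketch recognises only uppercase H, K, M and B and leaves every other code at 0.

```r
# Toy data standing in for subsetSD (illustrative values, not from the NOAA file)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 5),
  PROPDMGEXP = c("K", "M", "B", "H", ""),
  stringsAsFactors = FALSE
)

# Named vector mapping each exponent code to its multiplier
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up the multiplier for every row; codes that are not in the table give NA
mult <- unname(multiplier[as.character(toy$PROPDMGEXP)])

# Rows with a blank or unrecognised code keep the initial value of 0
toy$PROPDMGNUM <- ifelse(is.na(mult), 0, toy$PROPDMG * mult)

print(toy)
```

The same lookup would apply unchanged to `CROPDMG` and `CROPDMGEXP`, so one table covers both damage columns.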
For this analysis, we look at the weather events that cause death, injury and economic damage.
To establish the major causes of death, we look at the number of fatalities recorded for each event type.
Additionally, we look at the events that cause the most injuries.
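One way to rank event types by fatalities and injuries is to sum both columns within each `EVTYPE` and sort the totals. The sketch below assumes base R's `aggregate` and uses a small made-up data frame; `toy`, `health`, `byFatalities` and `byInjuries` are hypothetical names, and the numbers are not results from the storm database.

```r
# Toy data with the same column names as subsetSD (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3)
)

# Sum fatalities and injuries within each event type
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types from most to fewest fatalities, and separately by injuries
byFatalities <- health[order(health$FATALITIES, decreasing = TRUE), ]
byInjuries   <- health[order(health$INJURIES, decreasing = TRUE), ]

print(head(byFatalities))
print(head(byInjuries))
```

Applied to the real data, `toy` would be replaced by `subsetSD`, and `head()` would limit the output to the top few event types.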
To understand the extent of the economic damage, we sum the value of crop and property damage for each event type.
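A minimal sketch of that calculation, assuming the numeric damage columns `PROPDMGNUM` and `CROPDMGNUM` have already been computed: add the two columns row by row, then total the result per event type. The column name `TOTALDMG` and the data frames `toy` and `economic` are hypothetical, and the figures are invented for illustration rather than drawn from the storm database.

```r
# Toy data with precomputed numeric damage columns (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD", "DROUGHT"),
  PROPDMGNUM = c(2e6, 5e4, 1e6, 0),
  CROPDMGNUM = c(5e5, 1e5, 0, 3e6)
)

# Combined property and crop damage for each record
toy$TOTALDMG <- toy$PROPDMGNUM + toy$CROPDMGNUM

# Total damage per event type, sorted from largest to smallest
economic <- aggregate(TOTALDMG ~ EVTYPE, data = toy, FUN = sum)
economic <- economic[order(economic$TOTALDMG, decreasing = TRUE), ]

print(economic)
```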
As it turns out, floods cause the most crop and property damage.