The purpose of this exercise is to explore the NOAA Storm Database and answer the following questions:
* Which types of events are most harmful with respect to population health across the United States?
* Which types of events have the greatest economic consequences across the United States?
The steps performed in this data analysis are as follows.
Setting up the variables for downloading the dataset
require(knitr)
## Loading required package: knitr
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
downloaded.date <- date()
Downloading the NOAA Storm dataset from the URL https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 on Sat May 19 01:30:40 2018.
dest.file <- paste0(getwd(),"/repdata-data-StormData.csv.bz2")
download.file(url, dest.file, quiet=TRUE)
#unlink(dest.file)
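Because the archive is large, re-running the report downloads it again each time. A minimal sketch of a guard that skips the download when the file is already present is shown here; `fetch_once` and its `fetch` argument are hypothetical names introduced for illustration and are not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist.
# `fetch` defaults to download.file and can be swapped out for testing.
fetch_once <- function(url, dest, fetch = download.file) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on Windows
    fetch(url, dest, quiet = TRUE, mode = "wb")
  }
  invisible(dest)
}
```

With this helper, the call would be `fetch_once(url, dest.file)`, and a second run would reuse the local copy.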
Creating the dataset
zip.file <- paste0(getwd(),"/repdata-data-StormData.csv.bz2")
storm.df <- read.csv(bzfile(zip.file), header=TRUE, na.strings=c("", "NA", "-", "+", "?"), stringsAsFactors=FALSE)
Subsetting the dataset
reqd.columns <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
filtered.df <- (subset(storm.df, FATALITIES>0 | INJURIES>0 | PROPDMG>0 | CROPDMG>0, select=reqd.columns))
filtered.df$PROPDMGEXP <- ifelse(is.na(filtered.df$PROPDMGEXP), 0, as.numeric(chartr("hHkKmMbB", "22336699", filtered.df$PROPDMGEXP)))
filtered.df$CROPDMGEXP <- ifelse(is.na(filtered.df$CROPDMGEXP), 0, as.numeric(chartr("hHkKmMbB", "22336699", filtered.df$CROPDMGEXP)))
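The two conversions above turn the letter codes in PROPDMGEXP and CROPDMGEXP into powers of ten (h/H to 2, k/K to 3, m/M to 6, b/B to 9), leave digit codes unchanged, and treat missing values as 0. The same logic can be sketched as a small function and tried on a toy vector; `decode_exponent` is a hypothetical name used only for this illustration.

```r
# Hypothetical helper mirroring the chartr()/ifelse() conversion used above.
decode_exponent <- function(x) {
  mapped <- chartr("hHkKmMbB", "22336699", x)  # letters -> exponent digits
  ifelse(is.na(x), 0, as.numeric(mapped))      # missing code -> exponent 0
}

decode_exponent(c("K", "m", "B", "h", "5", NA))
# 3 6 9 2 5 0
```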
filtered.df$EVTYPE <- toupper(filtered.df$EVTYPE)
filtered.df$health.impact <- filtered.df$FATALITIES + filtered.df$INJURIES
filtered.df$economic.impact <- (filtered.df$PROPDMG * 10 ^ filtered.df$PROPDMGEXP) + (filtered.df$CROPDMG * 10 ^ filtered.df$CROPDMGEXP)
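As a quick check of the economic impact formula, the arithmetic can be worked through for a single made-up record. The values below are invented for illustration and do not come from the dataset.

```r
# Made-up single record, for illustration only.
prop.dmg <- 25; prop.exp <- 3   # property damage of 25 with exponent code "K"
crop.dmg <- 2;  crop.exp <- 6   # crop damage of 2 with exponent code "M"

economic.impact <- prop.dmg * 10^prop.exp + crop.dmg * 10^crop.exp
economic.impact  # 25,000 + 2,000,000 = 2,025,000
```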
Finding the events having harmful impact on population health
health.df <- aggregate(x=filtered.df$health.impact, by=list(filtered.df$EVTYPE), FUN=sum, na.rm=TRUE)
names(health.df)[1] <- "event.types"
names(health.df)[2] <- "health.impact"
health.impact.df <- head(health.df[order(-health.df[,2]), ])
barplot(tapply(health.impact.df$health.impact, health.impact.df$event.types, sum), main="Events having harmful impact on population health", xlab="Event Types", ylab="Health Impact")
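One thing to note about the bar plot above: `tapply` groups by event type and returns the bars in alphabetical order of the labels, not in order of impact. If ranked bars are preferred, a named vector built from the already sorted data frame keeps that order. The sketch below uses a toy data frame with invented labels and values.

```r
# Toy data, already sorted by impact (values are invented).
top.df <- data.frame(event.types   = c("EVENT C", "EVENT A", "EVENT B"),
                     health.impact = c(300, 120, 80),
                     stringsAsFactors = FALSE)

# tapply() reorders alphabetically; a named vector keeps the ranking.
alphabetical <- tapply(top.df$health.impact, top.df$event.types, sum)
ranked       <- setNames(top.df$health.impact, top.df$event.types)

# las = 2 rotates the labels so long event names remain readable.
barplot(ranked, las = 2, cex.names = 0.7,
        main = "Toy example: bars in ranked order", ylab = "Health Impact")
```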
Finding the events having economic impact
economic.df <- aggregate(x=filtered.df$economic.impact, by=list(filtered.df$EVTYPE), FUN=sum, na.rm=TRUE)
names(economic.df)[1] <- "event.types"
names(economic.df)[2] <- "economic.impact"
economic.impact.df <- head(economic.df[order(-economic.df[,2]), ])
barplot(tapply(economic.impact.df$economic.impact, economic.impact.df$event.types, sum), main="Events having economic impact", xlab="Event Types", ylab="Economic Impact")
After performing the data analysis, it is found that:
* TORNADO, EXCESSIVE HEAT and TSTM WIND have the most harmful impact on population health across the United States.
* FLOOD, HURRICANE/TYPHOON and TORNADO have the greatest economic impact across the United States.