Title

The purpose of this exercise is to explore the NOAA Storm Database and answer the following questions:

  • Which types of events are most harmful with respect to population health across the United States?
  • Which types of events have the greatest economic consequences across the United States?

Synopsis

The data analysis proceeds through the following steps.

  1. The dataset is downloaded and loaded into a dataframe.
  2. The attributes of interest are
    • EVTYPE - Event types
    • FATALITIES - Count of fatalities
    • INJURIES - Count of injuries
    • PROPDMG - Damage to properties in dollars
    • PROPDMGEXP - Multiplication factor for property damages
    • CROPDMG - Damage to crops in dollars
    • CROPDMGEXP - Multiplication factor for crop damages
  3. The dataset is filtered to the rows where fatalities, injuries, property damage, or crop damage is greater than 0.
  4. The event type attribute is converted to upper case.
  5. A new attribute for health impact is calculated as follows. Health impact = Count of fatalities + Count of injuries
  6. A new attribute for economic impact is calculated as follows. Economic impact = (Property damage) * 10 ^ (Property damage multiplication factor) + (Crop damage) * 10 ^ (Crop damage multiplication factor)
  7. A new dataset for health impact is created by summing the health impact attribute by event type and keeping only the 6 event types with the largest totals.
  8. A bar plot of health impact by event type is drawn.
  9. A new dataset for economic impact is created by summing the economic impact attribute by event type and keeping only the 6 event types with the largest totals.
  10. A bar plot of economic impact by event type is drawn.
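
As a rough illustration of steps 3 to 6, the sketch below applies the filter and the two formulas to a small made-up data frame. The rows and the name toy.df are invented for this example and are not taken from the NOAA data, and the exponent columns are assumed to be numeric already.

```r
# Toy rows (hypothetical values, not from the NOAA dataset)
toy.df <- data.frame(
  EVTYPE     = c("tornado", "Flood", "hail"),
  FATALITIES = c(2, 0, 0),
  INJURIES   = c(10, 0, 0),
  PROPDMG    = c(25, 1.5, 0),
  PROPDMGEXP = c(3, 6, 0),  # assumed already numeric: 3 = thousands, 6 = millions
  CROPDMG    = c(0, 2, 0),
  CROPDMGEXP = c(0, 3, 0),
  stringsAsFactors = FALSE
)

# Step 3: keep rows with at least one non-zero impact column
toy.df <- subset(toy.df, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

# Step 4: convert event types to upper case
toy.df$EVTYPE <- toupper(toy.df$EVTYPE)

# Step 5: health impact = fatalities + injuries
toy.df$health.impact <- toy.df$FATALITIES + toy.df$INJURIES

# Step 6: economic impact = property damage + crop damage, each scaled by 10^exponent
toy.df$economic.impact <- toy.df$PROPDMG * 10^toy.df$PROPDMGEXP +
  toy.df$CROPDMG * 10^toy.df$CROPDMGEXP
```

In this toy case the all-zero "hail" row is dropped by the filter, the tornado row gets 25 * 10^3 + 0 * 10^0 in economic impact, and the flood row gets 1.5 * 10^6 + 2 * 10^3.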

Data Processing

Setting up the variables for downloading the dataset

require(knitr)
## Loading required package: knitr
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
downloaded.date <- date()

Downloading NOAA Storm dataset from the URL https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 on Sat May 19 01:30:40 2018.

dest.file <- paste0(getwd(),"/repdata-data-StormData.csv.bz2")
download.file(url, dest.file, quiet=TRUE)
#unlink(dest.file)

Creating the dataset

zip.file <- paste0(getwd(),"/repdata-data-StormData.csv.bz2")
storm.df <- read.csv(bzfile(zip.file), header=TRUE, na.strings=c("", "NA", "-", "+", "?"), stringsAsFactors=FALSE)

Subsetting the dataset

reqd.columns <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
filtered.df <- (subset(storm.df, FATALITIES>0 | INJURIES>0 | PROPDMG>0 | CROPDMG>0, select=reqd.columns))
# Translate exponent letters to powers of ten (h/H = 2, k/K = 3, m/M = 6, b/B = 9); missing codes become 0
filtered.df$PROPDMGEXP <- ifelse(is.na(filtered.df$PROPDMGEXP), 0, as.numeric(chartr("hHkKmMbB", "22336699", filtered.df$PROPDMGEXP)))
filtered.df$CROPDMGEXP <- ifelse(is.na(filtered.df$CROPDMGEXP), 0, as.numeric(chartr("hHkKmMbB", "22336699", filtered.df$CROPDMGEXP)))
filtered.df$EVTYPE <- toupper(filtered.df$EVTYPE)
filtered.df$health.impact <- filtered.df$FATALITIES + filtered.df$INJURIES
filtered.df$economic.impact <- (filtered.df$PROPDMG * 10 ^ filtered.df$PROPDMGEXP) + (filtered.df$CROPDMG * 10 ^ filtered.df$CROPDMGEXP)
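
To see what the exponent conversion does, the following minimal sketch runs the same chartr() and ifelse() pattern on a short hand-written vector. The sample codes and the variable names codes, digits, exponents and multipliers are assumptions made for this illustration only.

```r
# Hypothetical sample of exponent codes, including a digit and a missing value
codes <- c("K", "m", "B", "h", "5", NA)

# chartr() replaces each letter with its matching digit and leaves other characters as they are
digits <- chartr("hHkKmMbB", "22336699", codes)

# Missing codes become an exponent of 0, which gives a multiplier of 10^0 = 1
exponents <- ifelse(is.na(digits), 0, as.numeric(digits))

# Multiplier applied to the PROPDMG or CROPDMG value
multipliers <- 10^exponents
```

Here "K" maps to 3, "m" to 6, "B" to 9 and "h" to 2, the digit "5" is read as an exponent of 5, and the missing value falls back to 0. Running table() on the raw exponent column before converting is one way to check whether any codes fall outside this mapping.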

Finding the events having harmful impact on population health

health.df <- aggregate(x=filtered.df$health.impact, by=list(filtered.df$EVTYPE), FUN=sum, na.rm=TRUE)
names(health.df)[1] <- "event.types"
names(health.df)[2] <- "health.impact"
# Sort by total health impact in descending order; head() keeps the first 6 rows by default
health.impact.df <- head(health.df[order(-health.df[,2]), ])
barplot(tapply(health.impact.df$health.impact, health.impact.df$event.types, sum), main="Events having harmful impact on population health", xlab="Event Types", ylab="Health Impact")
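
The aggregate, sort and truncate pattern used above can be sketched on a few invented rows. The data frame toy and its values are hypothetical, and n = 3 is used instead of the default of 6 only to keep the example short.

```r
# Toy per-event records (hypothetical values, not from the NOAA dataset)
toy <- data.frame(
  event.types   = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  health.impact = c(10, 3, 5, 7, 1, 0),
  stringsAsFactors = FALSE
)

# Sum the health impact within each event type
totals <- aggregate(x = toy$health.impact, by = list(event.types = toy$event.types), FUN = sum)
names(totals)[2] <- "health.impact"

# Order by descending total and keep the first n rows
top <- head(totals[order(-totals$health.impact), ], n = 3)
```

One thing to be aware of: tapply() returns its groups in alphabetical order, so the bars drawn from it are not sorted by impact. Passing top$health.impact to barplot() with names.arg = top$event.types would keep the descending order instead.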

Finding the events having economic impact

economic.df <- aggregate(x=filtered.df$economic.impact, by=list(filtered.df$EVTYPE), FUN=sum, na.rm=TRUE)
names(economic.df)[1] <- "event.types"
names(economic.df)[2] <- "economic.impact"
# Sort by total economic impact in descending order; head() keeps the first 6 rows by default
economic.impact.df <- head(economic.df[order(-economic.df[,2]), ])
barplot(tapply(economic.impact.df$economic.impact, economic.impact.df$event.types, sum), main="Events having economic impact", xlab="Event Types", ylab="Economic Impact")

Results

The data analysis finds that:

  • TORNADO, EXCESSIVE HEAT and TSTM WIND have the most harmful impact on population health across the United States.
  • FLOOD, HURRICANE/TYPHOON and TORNADO have the greatest economic impact across the United States.