Synopsis

This analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, downloading the dataset if a local copy does not already exist. The dataset covers major storms and weather events in the United States, with estimates of the fatalities, injuries, and property damage they caused.

This project addresses two main questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

This part loads the R packages needed to run the analysis and downloads the storm data directly from the web.

## Install any packages that are missing, then attach each one.
load.packages <- function(pkgs)
{
    for(pkg in pkgs){
        if(!(pkg %in% installed.packages()[,"Package"])){
            install.packages(pkg, repos="http://cran.us.r-project.org")
        }
        library(pkg,character.only=TRUE)
    }
}
load.packages(c("dplyr", "ggplot2", "knitr"))

Loading and preprocessing the data

## Check whether the file already exists. If not, download it.
## The same file name is used for the check, the download, and the read.
stormfile <- "repdata_data_StormData.csv.bz2"
if(!file.exists(stormfile))
{ 
  fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileurl, stormfile, mode="wb")
}

## Read the compressed CSV into stormdata (read.csv decompresses .bz2 files itself).
stormdata <- read.csv(stormfile)

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? To measure harm, I used the FATALITIES variable. The code sums fatalities by event type and keeps the 10 event types with the most fatalities; the plot appears under Results.

## Sum FATALITIES by event type, sort in decreasing order, and keep the top 10.
fatalitysummary <- aggregate(FATALITIES ~ EVTYPE, data = stormdata, FUN = "sum", na.rm = TRUE)
fatalitysummary <- fatalitysummary[order(fatalitysummary$FATALITIES, decreasing=TRUE), ]
fatalitysummary <- fatalitysummary[1:10, ]
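The aggregate, sort, and truncate steps can be easier to follow on a small example. The sketch below uses a made-up data frame (the `toy` values are invented for illustration and are not NOAA figures) and keeps the top 2 rows instead of the top 10.

```r
# Toy data, invented for illustration only (not NOAA figures).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0)
)

# Sum fatalities within each event type.
toysummary <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from most to least fatal, then keep the first 2 rows.
toysummary <- toysummary[order(toysummary$FATALITIES, decreasing = TRUE), ]
toytop <- head(toysummary, 2)
print(toytop)
```

Here `head(x, n)` is used instead of `x[1:n, ]`; the two agree when there are at least `n` rows, but `head` does not pad the result with NA rows when there are fewer.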

Across the United States, which types of events have the greatest economic consequences?

# Convert property damage (PROPDMG) to a dollar amount based on PROPDMGEXP:
# multiply by 100 for H, 1,000 for K, 1,000,000 for M, and 1,000,000,000 for B.
# Store the result in the PROPDMGVALUE column; rows with any other code stay at 0.
stormdata$PROPDMGVALUE <- 0
stormdata[stormdata$PROPDMGEXP == "H", ]$PROPDMGVALUE <- stormdata[stormdata$PROPDMGEXP == "H", ]$PROPDMG * 100
stormdata[stormdata$PROPDMGEXP == "K", ]$PROPDMGVALUE <- stormdata[stormdata$PROPDMGEXP == "K", ]$PROPDMG * 1000
stormdata[stormdata$PROPDMGEXP == "M", ]$PROPDMGVALUE <- stormdata[stormdata$PROPDMGEXP == "M", ]$PROPDMG * 1000000
stormdata[stormdata$PROPDMGEXP == "B", ]$PROPDMGVALUE <- stormdata[stormdata$PROPDMGEXP == "B", ]$PROPDMG * 1000000000

# Do the same calculation for crop damage. 
stormdata$CROPDMGVALUE <- 0
stormdata[stormdata$CROPDMGEXP == "H", ]$CROPDMGVALUE <- stormdata[stormdata$CROPDMGEXP == "H", ]$CROPDMG * 100
stormdata[stormdata$CROPDMGEXP == "K", ]$CROPDMGVALUE <- stormdata[stormdata$CROPDMGEXP == "K", ]$CROPDMG * 1000
stormdata[stormdata$CROPDMGEXP == "M", ]$CROPDMGVALUE <- stormdata[stormdata$CROPDMGEXP == "M", ]$CROPDMG * 1000000
stormdata[stormdata$CROPDMGEXP == "B", ]$CROPDMGVALUE <- stormdata[stormdata$CROPDMGEXP == "B", ]$CROPDMG * 1000000000
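The same conversion can also be written as a single lookup, which may be easier to maintain than one assignment per code. This is a minimal sketch on made-up data, not the method used above: the `toy` data frame and the `multipliers` vector are hypothetical names, and treating lower-case codes such as "m" as equivalent to "M" is an assumption that the code above does not make.

```r
# Toy data, invented for illustration only (not NOAA figures).
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 1, 3, 7),
  PROPDMGEXP = c("K", "m", "B", "", "H")
)

# Named vector mapping each exponent code to its multiplier.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Upper-case the codes (assumption: "m" means the same as "M"),
# then look each one up by name; empty or unknown codes give NA.
mult <- multipliers[toupper(as.character(toy$PROPDMGEXP))]

# Treat empty or unknown codes as zero damage.
mult[is.na(mult)] <- 0

toy$PROPDMGVALUE <- toy$PROPDMG * unname(mult)
print(toy)
```

Whether lower-case codes should be counted is a judgment call about the data; including them can change the totals compared with matching the upper-case codes only.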


# Sum property and crop damage together for each event type.
totaldamage <- aggregate(PROPDMGVALUE + CROPDMGVALUE ~ EVTYPE, data = stormdata, sum)
names(totaldamage) <- c("EVTYPE", "TOTALDMG")

totaldamage <- totaldamage[order(totaldamage$TOTALDMG, decreasing=TRUE), ]
totaldamage <- totaldamage[1:10,]

Results

The first bar plot shows the 10 weather event types with the most fatalities.

## use ggplot to plot the top 10 event types with fatalities. 
ggplot(fatalitysummary, aes(x=EVTYPE, y=FATALITIES)) +
    geom_bar(stat="identity") +
    xlab("Event Type") +
    ylab("Fatalities count") +
    ggtitle("Top 10 event types with most fatalities") +
    theme(axis.text.x = element_text(angle = 90))

The second bar plot shows the 10 weather event types with the most combined property and crop damage, in dollars.

## use ggplot to plot the top 10 event types with most property and crop damage
ggplot(totaldamage, aes(x = EVTYPE, y = TOTALDMG)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90)) +
    xlab("Event Type") +
    ylab("Total damage in dollars") +
    ggtitle("Top 10 Event types with most Property & Crop Damage")
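Both plots place the event types on the x axis in alphabetical order, because that is how factor levels are sorted by default. If bars ordered by height are preferred, one option is base R's `reorder()`. The sketch below uses made-up data (the `toy` values are invented, not NOAA figures) and only shows the effect on the level order.

```r
# Toy data, invented for illustration only (not NOAA figures).
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(1, 4, 5)
)

# By default the factor levels, and so the bars, are alphabetical.
default_levels <- levels(factor(toy$EVTYPE))

# reorder() sorts the levels by a second variable; negating that
# variable puts the largest value first.
toy$EVTYPE <- reorder(factor(toy$EVTYPE), -toy$FATALITIES)
ordered_levels <- levels(toy$EVTYPE)
print(ordered_levels)
```

In the plots above, the same idea could be applied inside the aesthetic mapping, for example `aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)`.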