Goal of the Assignment:

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

About the data:

Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) which documents:

* The occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce;
* Rare, unusual weather phenomena that generate media attention, such as snow flurries in South Florida or the San Diego coastal area; and
* Other significant meteorological events, such as record maximum or minimum temperatures or precipitation that occur in connection with another event.

The types of events include: Astronomical Low Tide, Avalanche, Blizzard, Coastal Flood, Cold/Wind Chill, Debris Flow, Dense Fog, Dense Smoke, Drought, Dust Devil, Dust Storm, Excessive Heat, Extreme Cold/Wind Chill, Flash Flood, Flood, Frost/Freeze, Funnel Cloud, Freezing Fog, Hail, Heat, Heavy Rain, Heavy Snow, High Surf, High Wind, Hurricane (Typhoon), Ice Storm, Lake-Effect Snow, Lakeshore Flood, Lightning, Marine Hail, Marine High Wind, Marine Strong Wind, Marine Thunderstorm Wind, Rip Current, Seiche, Sleet, Storm Surge/Tide, Strong Wind, Thunderstorm Wind, Tornado, Tropical Depression, Tropical Storm, Tsunami, Volcanic Ash, Waterspout, Wildfire, Winter Storm, and Winter Weather.

Environment/Session Information

This analysis was conducted in the following environment: R version 3.2.0 (2015-04-16); platform x86_64-w64-mingw32/x64 (64-bit); running under Windows 8 x64 (build 9200).

Synopsis

This analysis examines which types of severe weather events in the United States are most harmful to population health, measured by the deaths and injuries they caused. It also identifies which event types have the greatest economic consequences, measured as total damage: property and crop damage combined.

The raw data contain 902,297 observations of 37 variables. Data processing selects only the variables relevant to deaths, injuries, and economic damage (to property and crops). The data are also subset to the years 1991 to 2011, when the data collection process leaves fewer missing values (NAs); any remaining damage values that cannot be evaluated are set to 0. Lastly, the data are plotted to show which events rank highest in damage, measured by number of deaths, number of injuries, and billions of dollars.

Data Processing

1. Load the necessary packages. Loading them normally produces warnings; use the chunk option “warning=FALSE” to suppress them. If the data are not in the designated folder, “./Data/” in the working directory, R downloads the compressed file and unzips it there. To avoid re-reading the 409.4 MB data file, a condition checks whether the data are already stored in a variable named “stormData” in the Global Environment.
library(ggplot2)
library(dplyr)
library(plyr)     # loaded after dplyr, so plyr's rename(), arrange() and mutate() mask dplyr's
library(R.utils)  # provides bunzip2()

# NOTE: use setwd() to point to the folder you wish to work in. On my computer it was setwd("~/Desktop/Coursera/DataScience/5ReproducibleResearch/Project2"). Save the .Rmd file there; the code creates the "./Data/" folder inside the working directory.

if (!dir.exists("./Data")) {
    dir.create("./Data")
}

if (!file.exists("./Data/repdata-data-StormData.csv")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = "./Data/repdata-data-StormData.csv.bz2")
    bunzip2("./Data/repdata-data-StormData.csv.bz2",
            destname = "./Data/repdata-data-StormData.csv",
            overwrite = TRUE, remove = FALSE)
}

# Skip the slow read if stormData already exists in the Global Environment
if (!exists("stormData")) {
    stormData <- read.csv("./Data/repdata-data-StormData.csv")
}
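As a possible shortcut, read.csv() can read a .bz2 file directly, since R's file() detects bzip2 compression when opening a file for reading; in principle the unzip step could then be replaced by reading "./Data/repdata-data-StormData.csv.bz2" as-is. The sketch below checks this on a tiny temporary file standing in for the real download; the data in it are made up.

```r
# Write a small CSV through a bzip2 connection (stand-in for StormData.csv.bz2)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
                     FATALITIES = c(2, 0, 1)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the compressed file directly, with no bunzip2() step
sampleData <- read.csv(tmp)
print(sampleData)
```

Reading the compressed file directly saves disk space, but the full file is still parsed on every read, so the exists("stormData") guard remains useful.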
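2. Keep only the columns needed for the analysis and add a YEAR column derived from each event's begin date (BGN_DATE).
smallerStormData <- stormData[, c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
smallerStormData$YEAR <- as.numeric(format(as.Date(as.character(smallerStormData$BGN_DATE), format = "%m/%d/%Y"), "%Y"))
3. Since fewer events were recorded before 1991, the analysis focuses on 1991 to 2011.
muchsmallerStormData <- smallerStormData[smallerStormData$YEAR > 1990, ]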
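To show how a year can be pulled out of a begin-date string and used to filter rows, here is a small sketch on made-up rows. It assumes BGN_DATE strings look like "4/18/1950 0:00:00" (month/day/year followed by a time); as.Date() stops parsing once the date format has matched, so the time part is ignored.

```r
# Made-up rows with BGN_DATE in the assumed "m/d/Y H:MM:SS" layout
events <- data.frame(
    BGN_DATE = c("4/18/1950 0:00:00", "6/2/1991 0:00:00", "11/30/2011 0:00:00"),
    EVTYPE   = c("TORNADO", "HAIL", "FLOOD"),
    stringsAsFactors = FALSE
)

# Parse the date part, extract the four-digit year, and convert to a number
events$YEAR <- as.numeric(format(as.Date(events$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Keep 1991 onwards, mirroring the YEAR > 1990 filter
recent <- events[events$YEAR > 1990, ]
print(recent)
```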
4. Consolidate PROPDMG and PROPDMGEXP into PROPTOTALDMG, and CROPDMG and CROPDMGEXP into CROPTOTALDMG, storing the damage amount after evaluating the alphabetical magnitude codes: “H” for hundreds, “K” for thousands, “M” for millions, and “B” for billions.
# This function converts a magnitude code to its numerical multiplier.
# H, K, M, B (any case) map to 100, 1e3, 1e6, 1e9; a blank code means 1;
# any other code (digits, "+", "-", "?") returns NA.
evaluateMagnitudeExpenses <- function(x){
    # Factors and stray spaces are normalised before comparing
    x <- toupper(trimws(as.character(x)))
    if (x == "H") {
        100
    } else if (x == "K") {
        1000
    } else if (x == "M") {
        1000000
    } else if (x == "B") {
        1000000000
    } else if (x == "") {
        1
    } else {
        NA
    }
}
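Because evaluateMagnitudeExpenses() handles one value at a time, it must be called row by row. A vectorised alternative, sketched below, looks codes up in a named vector; the function name magnitudeMultiplier is hypothetical and not part of the original analysis. Indexing a named vector by an unknown name (or by "") yields NA, so only the blank code needs special handling.

```r
# Hypothetical vectorised version of evaluateMagnitudeExpenses()
magnitudeMultiplier <- function(code) {
    lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    code <- toupper(trimws(as.character(code)))
    mult <- unname(lookup[code])     # unknown codes become NA
    mult[which(code == "")] <- 1     # blank code means "no multiplier"
    mult
}

magnitudeMultiplier(c("K", "m", "", "B", "?"))
# returns 1000, 1e6, 1, 1e9, NA
```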
5. The “calculateAmt” function takes two parameters, an amount and a magnitude code, and returns the amount multiplied by the magnitude evaluated by the function “evaluateMagnitudeExpenses”.
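calculateAmt <- function(amt, mag){
    mag <- evaluateMagnitudeExpenses(mag)
    amt <- amt * mag

    # An unrecognised magnitude yields NA. amt * NA is still numeric, so
    # is.numeric() cannot detect it; is.na() is used to set it to 0.
    if (is.na(amt)) {
        amt <- 0
    }

    amt
}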
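The NA handling matters because of how R types the result: multiplying a number by NA gives a numeric NA, so a check such as is.numeric() reports TRUE and cannot flag an unrecognised magnitude. The short sketch below demonstrates this with illustrative values and shows the is.na() replacement that turns such amounts into 0.

```r
# 25 multiplied by an NA multiplier (e.g. from a "?" code)
amt <- 25 * NA
is.numeric(amt)   # TRUE: the result is a numeric NA
is.na(amt)        # TRUE: only is.na() reveals the missing value

# Setting unevaluable amounts to 0 therefore needs is.na()
amounts <- c(25000, NA, 1.5e6)
amounts[is.na(amounts)] <- 0
amounts
```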
6. To calculate the total economic damage of each event, the property and crop damages are calculated and added together. New columns hold the property damage, the crop damage, and their sum, and a cleaner, trimmed dataset is kept for the analysis.
# Call calculateAmt on every row, for property and for crop damage
muchsmallerStormData$PROPTOTALDMG <- mapply(calculateAmt, muchsmallerStormData$PROPDMG, muchsmallerStormData$PROPDMGEXP)
muchsmallerStormData$CROPTOTALDMG <- mapply(calculateAmt, muchsmallerStormData$CROPDMG, muchsmallerStormData$CROPDMGEXP)
# Trimming the data to what is needed for analysis
stormDataExpTot <- muchsmallerStormData[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPTOTALDMG", "CROPTOTALDMG", "YEAR")]
# plyr::rename takes a named vector of old = new column names
stormDataExpTotal <- plyr::rename(stormDataExpTot, c("EVTYPE"="event", "FATALITIES"="fatalities", "INJURIES"="injuries", "PROPTOTALDMG"="propdmg", "CROPTOTALDMG"="cropdmg", "YEAR"="year"))
stormDataExpTotal <- mutate(stormDataExpTotal, total = propdmg + cropdmg)
stormDataTrimmed <- stormDataExpTotal[, c("event", "fatalities", "injuries", "total", "year")]
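The whole of this step can be sketched end to end on a few made-up rows: each damage amount is scaled by its own magnitude code, unrecognised codes count as 0, and property and crop damage are summed into a total. The helper toMultiplier is a hypothetical vectorised stand-in for evaluateMagnitudeExpenses(), not the original code.

```r
# Made-up rows using the raw column names
toyStorm <- data.frame(
    EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
    PROPDMG    = c(2.5, 50, 10),
    PROPDMGEXP = c("B", "K", "?"),
    CROPDMG    = c(10, 0, 3),
    CROPDMGEXP = c("M", "", "K"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: H/K/M/B -> multiplier, blank -> 1, anything else -> NA
toMultiplier <- function(code) {
    lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    code <- toupper(trimws(code))
    mult <- unname(lookup[code])
    mult[which(code == "")] <- 1
    mult
}

propdmg <- toyStorm$PROPDMG * toMultiplier(toyStorm$PROPDMGEXP)
cropdmg <- toyStorm$CROPDMG * toMultiplier(toyStorm$CROPDMGEXP)
propdmg[is.na(propdmg)] <- 0   # "?" is unrecognised, so it counts as 0
cropdmg[is.na(cropdmg)] <- 0
toyStorm$total <- propdmg + cropdmg
toyStorm[, c("EVTYPE", "total")]
```

Here FLOOD totals 2.5e9 + 1e7 = 2.51e9 dollars, HAIL 50,000, and TORNADO only its 3,000 of crop damage.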
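7. Many EVTYPE values overlap (for example, both HEAT and EXCESSIVE HEAT appear in the results) and could be combined based on commonality. For now, count the distinct event types and the number of observations: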
factorEVData <- factor(stormDataTrimmed$event)
factorYearData <- factor(stormDataTrimmed$year)
countDistinctEV <- unique(factorEVData)  
countEVtype <- length(countDistinctEV)
countofObservations <- nrow(stormData)
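One hypothetical first pass at combining event types is to normalise the labels before counting them: upper-casing and trimming merge labels that differ only in case or spacing, and a substitution can expand an abbreviation. The "TSTM" to "THUNDERSTORM" rule below is an illustrative assumption, not a rule taken from the original analysis, and the labels are made up.

```r
# Made-up labels that differ only in case, spacing, or abbreviation
evtype <- c("TSTM WIND", "Thunderstorm Wind ", "THUNDERSTORM WIND", "HAIL", "hail")

normalised <- toupper(trimws(evtype))
# Hypothetical rule: expand the "TSTM" abbreviation at the start of a label
normalised <- sub("^TSTM ", "THUNDERSTORM ", normalised)

length(unique(evtype))      # distinct labels before
length(unique(normalised))  # distinct labels after
```

Any such mapping changes the rankings, so each rule should be checked against the official list of event types before it is applied.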
8. Given the large number of event types, fatalities are summed per event type and the ten highest are kept.
aggrFatalitiesEV <- aggregate(fatalities ~ event, data = stormDataTrimmed, sum)
aggrFatalitiesEV <- arrange(aggrFatalitiesEV, desc(fatalities))
top10Fatalities <- aggrFatalitiesEV[1:10,]
9. Injuries are summed per event type in the same way, and the ten highest are kept.
aggrInjuriesEV <- aggregate(injuries ~ event, data = stormDataTrimmed, sum)
aggrInjuriesEV <- arrange(aggrInjuriesEV, desc(injuries))
top10Injuries <- aggrInjuriesEV[1:10,]
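Steps 8 and 9 can also be done with a single aggregate() call: putting cbind() of both columns on the left-hand side of the formula sums fatalities and injuries per event at once, and each column is then ranked separately with order(). The rows below are made up for illustration.

```r
# Made-up rows with the trimmed column names
toy <- data.frame(
    event      = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
    fatalities = c(3, 5, 1, 2, 4),
    injuries   = c(40, 10, 25, 7, 6)
)

# Sum both measures per event in one pass
byEvent <- aggregate(cbind(fatalities, injuries) ~ event, data = toy, FUN = sum)

# Rank each measure in descending order and keep up to 10 rows
topFatalities <- head(byEvent[order(-byEvent$fatalities), c("event", "fatalities")], 10)
topInjuries   <- head(byEvent[order(-byEvent$injuries),   c("event", "injuries")], 10)
topFatalities
topInjuries
```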

Plots:

1. Plotting fatalities and injuries for the top 10 events shows which events were most harmful to population health in the U.S.
# Bars are sorted from highest to lowest with reorder()
fatalitiesPlot <- ggplot(top10Fatalities, aes(x = reorder(event, -fatalities), y = fatalities)) +
    geom_bar(stat = "identity") +
    scale_y_continuous("Number of Fatalities") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Events") +
    ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1991 - 2011")
injuriesPlot <- ggplot(top10Injuries, aes(x = reorder(event, -injuries), y = injuries)) +
    geom_bar(stat = "identity") +
    scale_y_continuous("Number of Injuries") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Events") +
    ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1991 - 2011")
print(fatalitiesPlot)
print(injuriesPlot)
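By default, bars for a factor are drawn in the order of its levels, which for text labels is alphabetical. The sketch below shows how reorder() re-levels a factor by a numeric value (negated here for descending order), so a bar chart built from it lists the largest value first. The three fatality counts are taken from the results in this report; the object name top is illustrative.

```r
top <- data.frame(event = c("FLASH FLOOD", "EXCESSIVE HEAT", "TORNADO"),
                  fatalities = c(978, 1903, 1699))

# Re-level by descending fatalities
top$event <- reorder(factor(top$event), -top$fatalities)
levels(top$event)

# With ggplot2 loaded, the re-levelled factor can be used directly, e.g.
# ggplot(top, aes(x = event, y = fatalities)) + geom_bar(stat = "identity")
```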

2. Plotting the Total Amount of Economic Damages (includes both property and crop expenses)
aggrTotalEconDMG <- aggregate(total ~ event, data = stormDataTrimmed, sum)
aggrTotalEconDMG <- arrange(aggrTotalEconDMG, desc(total))
aggrTotalEconDMG <- mutate(aggrTotalEconDMG, billionsofdollars = total/1000000000)
top10EconDMG <- aggrTotalEconDMG[1:10,]

econDMGPlot <- ggplot(top10EconDMG, aes(x = reorder(event, -billionsofdollars), y = billionsofdollars)) +
    geom_bar(stat = "identity") +
    scale_y_continuous("Total Economic Damage ($Billions)") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Events") +
    ggtitle("Total amount in Billions of dollars \n of Property and Crop Damages by \nSevere Weather Events in the U.S.\n from 1991 - 2011")
print(econDMGPlot)
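Taking the top rows with a fixed index such as [1:10, ] assumes at least that many rows exist. If fewer do, R pads the result with all-NA rows, whereas head() simply returns the rows that exist. A small sketch with illustrative values:

```r
ranked <- data.frame(event = c("FLOOD", "HAIL"), total = c(150.3, 18.7))

padded <- ranked[1:5, ]    # 5 rows, the last three entirely NA
nrow(padded)
nrow(head(ranked, 5))      # only the 2 rows that exist
```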

Results

For the data from 1991 to 2011, the top 5 event types for each measure of damage in the U.S. are listed below.

#Top 5 events and the number of fatalities they caused:
top10Fatalities[1:5,]
##            event fatalities
## 1 EXCESSIVE HEAT       1903
## 2        TORNADO       1699
## 3    FLASH FLOOD        978
## 4           HEAT        937
## 5      LIGHTNING        816
#Top 5 events and the number of injuries they caused:
top10Injuries[1:5,]
##            event injuries
## 1        TORNADO    25497
## 2          FLOOD     6789
## 3 EXCESSIVE HEAT     6525
## 4      LIGHTNING     5230
## 5      TSTM WIND     4441
#Top 5 events and their total economic damage in billions of dollars:
top10EconDMG[1:5,]
##               event        total billionsofdollars
## 1             FLOOD 150319678257         150.31968
## 2 HURRICANE/TYPHOON  71913712800          71.91371
## 3       STORM SURGE  43323541000          43.32354
## 4           TORNADO  29262722353          29.26272
## 5              HAIL  18733216730          18.73322