The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
Storm Data Disclaimer. Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) which documents:
The occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce; rare, unusual weather phenomena that generate media attention, such as snow flurries in South Florida or the San Diego coastal area; and other significant meteorological events, such as record maximum or minimum temperatures or precipitation that occur in connection with another event.
The types of events include: Astronomical Low Tide, Avalanche, Blizzard, Coastal Flood, Cold/Wind Chill, Debris Flow, Dense Fog, Dense Smoke, Drought, Dust Devil, Dust Storm, Excessive Heat, Extreme Cold/Wind Chill, Flash Flood, Flood, Frost/Freeze, Funnel Cloud, Freezing Fog, Hail, Heat, Heavy Rain, Heavy Snow, High Surf, High Wind, Hurricane (Typhoon), Ice Storm, Lake-Effect Snow, Lakeshore Flood, Lightning, Marine Hail, Marine High Wind, Marine Strong Wind, Marine Thunderstorm Wind, Rip Current, Seiche, Sleet, Storm Surge/Tide, Strong Wind, Thunderstorm Wind, Tornado, Tropical Depression, Tropical Storm, Tsunami, Volcanic Ash, Waterspout, Wildfire, Winter Storm, and Winter Weather.
The following describes the session and environment in which this analysis was conducted: R version 3.2.0 (2015-04-16), Platform: x86_64-w64-mingw32/x64 (64-bit), Running under: Windows 8 x64 (build 9200).
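The session details above can also be captured from within R so that they travel with the report. A minimal sketch using base R only (the output depends on the machine it runs on, so none is shown here):

```r
# Print the R version, platform, operating system, and attached packages
sessionInfo()

# The version string on its own, e.g. for a report footer
R.version.string
```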
The data analysis addresses how much harm certain events in the United States cause to population health, particularly the events listed above that caused deaths and injuries. It also aims to identify which types of events have the greatest economic consequences in terms of total damage expenses, that is, property and crop damage combined.
The raw data is loaded with 902,297 observations of 37 variables. Data processing selects only the information relevant to deaths, injuries, and economic damage (to property and crops). The data is also subset to readings from 1991 through 2011, a period in which the data collection process produced fewer NAs; any NAs remaining in the subset are set to 0. Lastly, the data is plotted to show which events rank highest in harm, measured by number of deaths, number of injuries, and billions of dollars.
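The processing steps described above can be illustrated on a toy data frame. This is only a sketch: the rows are invented, and it assumes the begin date is stored in a `BGN_DATE` column as month/day/year text followed by a time, as in the NOAA file.

```r
# Toy stand-in for the storm data; the values are invented for illustration
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "6/1/1991 0:00:00", "8/29/2005 0:00:00"),
  EVTYPE     = c("TORNADO", "HAIL", "HURRICANE/TYPHOON"),
  FATALITIES = c(0, NA, 3),
  stringsAsFactors = FALSE
)

# Derive the year from the begin date (the trailing time part is ignored)
toy$YEAR <- as.numeric(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Keep only readings from 1991 onward
toyRecent <- toy[toy$YEAR > 1990, ]

# Set any remaining missing counts to 0
toyRecent$FATALITIES[is.na(toyRecent$FATALITIES)] <- 0

toyRecent[, c("EVTYPE", "FATALITIES", "YEAR")]
```

Only the 1991 and 2005 rows survive the filter, and the missing fatality count in the 1991 row becomes 0.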
library(ggplot2)
library(dplyr)
# plyr is attached last, so rename(), mutate(), arrange(), and desc() below
# resolve to the plyr versions; the rename() call relies on plyr's c(old = new) syntax
library(plyr)
# NOTE: use setwd() to point to the folder you wish to work in. On my computer it was
# setwd("~/Desktop/Coursera/DataScience/5ReproducibleResearch/Project2").
# Save the .Rmd file there; the code below works with a "./Data/" folder inside the working directory.
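# Download the compressed file once, then decompress it into ./Data
if (!file.exists("./Data")) dir.create("./Data")
if (!file.exists("./Data/repdata-data-StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "./Data/repdata-data-StormData.csv.bz2", mode = "wb")
}
if (!file.exists("./Data/repdata-data-StormData.csv")) {
  # bunzip2() comes from the R.utils package
  R.utils::bunzip2("./Data/repdata-data-StormData.csv.bz2", destname = "./Data/repdata-data-StormData.csv", overwrite = TRUE, remove = FALSE)
}
# Read the CSV only if stormData is not already in the workspace
if (!exists("stormData")) {
  stormData <- read.csv("./Data/repdata-data-StormData.csv")
}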
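# ASSUMPTION: the step that builds smallerStormData is missing from this write-up.
# It is reconstructed here from the columns used later: keep the relevant
# variables and derive YEAR from BGN_DATE (month/day/year text).
smallerStormData <- stormData[, c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
smallerStormData$YEAR <- as.numeric(format(as.Date(as.character(smallerStormData$BGN_DATE), format = "%m/%d/%Y"), "%Y"))
# Keep only readings from 1991 onward
muchsmallerStormData <- smallerStormData[smallerStormData$YEAR > 1990, ]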
# Convert a magnitude code (h, k, m, b, or blank) to its numerical multiplier.
# Any other code yields NA.
evaluateMagnitudeExpenses <- function(x) {
  if (grepl("h", x, ignore.case = TRUE)) {
    x <- 100
  } else if (grepl("k", x, ignore.case = TRUE)) {
    x <- 1000
  } else if (grepl("m", x, ignore.case = TRUE)) {
    x <- 1000000
  } else if (grepl("b", x, ignore.case = TRUE)) {
    x <- 1000000000
  } else if (x == "" || x == " ") {
    x <- 1
  } else {
    x <- NA
  }
  x
}
# Multiply an amount by the multiplier for its magnitude code.
# Non-numeric amounts are set to 0.
calculateAmt <- function(amt, mag) {
  if (is.numeric(amt)) {
    mag <- evaluateMagnitudeExpenses(mag)
    amt <- amt * mag
  }
  if (!is.numeric(amt)) {
    amt <- 0
  }
  amt
}
# Call calculateAmt on every row to get property and crop damage in dollars
muchsmallerStormData$PROPTOTALDMG <- mapply(calculateAmt, muchsmallerStormData$PROPDMG, muchsmallerStormData$PROPDMGEXP)
muchsmallerStormData$CROPTOTALDMG <- mapply(calculateAmt, muchsmallerStormData$CROPDMG, muchsmallerStormData$CROPDMGEXP)
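The two `mapply()` calls above evaluate the magnitude code one row at a time, which can be slow on several hundred thousand rows. As a sketch of an alternative, a named lookup vector can map all codes at once. The helper name `magnitudeMultiplier` is hypothetical and not part of the original analysis, and it matches codes exactly rather than with `grepl()`, which gives the same result for single-character codes.

```r
# Hypothetical helper (not part of the original analysis): map exponent codes
# to multipliers with a named lookup vector instead of row-by-row mapply()
magnitudeMultiplier <- function(exp) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(trimws(as.character(exp)))
  mult <- unname(lookup[code])   # NA for any code not in the lookup
  mult[code %in% ""] <- 1        # blank code: the amount is already in dollars
  mult
}

# Invented example codes, including one unrecognised code
codes <- c("K", "m", "B", "", "h", "?")
magnitudeMultiplier(codes)
```

Under these assumptions, the damage columns could be computed as `PROPDMG * magnitudeMultiplier(PROPDMGEXP)`, with unrecognised codes still producing NA as in the original function.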
# Trimming the data to what is needed for analysis
stormDataExpTot <- muchsmallerStormData[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPTOTALDMG", "CROPTOTALDMG", "YEAR")]
stormDataExpTotal <- rename(stormDataExpTot, c("EVTYPE"="event", "FATALITIES"="fatalities", "INJURIES"="injuries", "PROPTOTALDMG"="propdmg", "CROPTOTALDMG"="cropdmg", "YEAR"="year"))
stormDataExpTotal <- mutate(stormDataExpTotal, total = propdmg + cropdmg)
stormDataTrimmed <- stormDataExpTotal[, c("event", "fatalities", "injuries", "total", "year")]
factorEVData <- factor(stormDataTrimmed$event)
factorYearData <- factor(stormDataTrimmed$year)
countDistinctEV <- unique(factorEVData)
countEVtype <- length(countDistinctEV)
countofObservations <- nrow(stormData)
aggrFatalitiesEV <- aggregate(fatalities ~ event, data = stormDataTrimmed, sum)
aggrFatalitiesEV <- arrange(aggrFatalitiesEV, desc(fatalities))
top10Fatalities <- aggrFatalitiesEV[1:10,]
aggrInjuriesEV <- aggregate(injuries ~ event, data = stormDataTrimmed, sum)
aggrInjuriesEV <- arrange(aggrInjuriesEV, desc(injuries))
top10Injuries <- aggrInjuriesEV[1:10,]
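The same aggregate, sort, and slice pattern is applied to fatalities and injuries above and again to economic damage further down. A small helper could express it once. The sketch below uses base R only and toy data; `topEvents` is a hypothetical name, and it sorts with `order()` rather than `arrange(desc())`.

```r
# Hypothetical helper (not in the original): total a column by event
# and return the n largest totals
topEvents <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(event = data$event), FUN = sum)
  names(totals)[2] <- column
  totals <- totals[order(-totals[[column]]), ]
  rownames(totals) <- NULL
  head(totals, n)
}

# Toy data with invented values
toy <- data.frame(
  event      = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  fatalities = c(2, 5, 1, 0, 4)
)
topEvents(toy, "fatalities", n = 2)
```

On the toy data this returns HEAT (9) followed by TORNADO (3).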
fatalitiesPlot <- qplot(event, data = top10Fatalities, weight = fatalities, geom = "bar", binwidth = 1) +
scale_y_continuous("Number of Fatalities") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + xlab("Events") +
ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1991 - 2011")
injuriesPlot <- qplot(event, data = top10Injuries, weight = injuries, geom = "bar", binwidth = 1) +
scale_y_continuous("Number of Injuries") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + xlab("Events") +
ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1991 - 2011")
aggrTotalEconDMG <- aggregate(total ~ event, data = stormDataTrimmed, sum)
aggrTotalEconDMG <- arrange(aggrTotalEconDMG, desc(total))
aggrTotalEconDMG <- mutate(aggrTotalEconDMG, billionsofdollars = total/1000000000)
top10EconDMG <- aggrTotalEconDMG[1:10,]
econDMGPlot <- qplot(event, data = top10EconDMG, weight = billionsofdollars, geom = "bar", binwidth = 1) +
scale_y_continuous("Total Economic Damage ($Billions)") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + xlab("Events") +
ggtitle("Total amount in Billions of dollars \n of Property and Crop Damages by \nSevere Weather Events in the U.S.\n from 1991 - 2011")
From the data gathered between 1991 and 2011, the top 5 events and the harm they caused in the U.S. are shown below.
#Top 5 events and the number of fatalities they caused:
top10Fatalities[1:5,]
##            event fatalities
## 1 EXCESSIVE HEAT       1903
## 2        TORNADO       1699
## 3    FLASH FLOOD        978
## 4           HEAT        937
## 5      LIGHTNING        816
#Top 5 events and the number of injuries they caused:
top10Injuries[1:5,]
##            event injuries
## 1        TORNADO    25497
## 2          FLOOD     6789
## 3 EXCESSIVE HEAT     6525
## 4      LIGHTNING     5230
## 5      TSTM WIND     4441
#Top 5 events and the damage they caused, in dollars and in billions of dollars:
top10EconDMG[1:5,]
##               event        total billionsofdollars
## 1             FLOOD 150319678257         150.31968
## 2 HURRICANE/TYPHOON  71913712800          71.91371
## 3       STORM SURGE  43323541000          43.32354
## 4           TORNADO  29262722353          29.26272
## 5              HAIL  18733216730          18.73322