The goal of this assignment is to explore the U.S. National Oceanic and Atmospheric Administration's (NOAA) Storm Database and answer two questions about severe weather events. The first is which types of weather events are most harmful with respect to population health. The second is which types of events have the greatest economic consequences. Based on an analysis of this database, tornadoes appear to cause the most harm to population health, while flooding appears to cause the greatest economic damage.
The working directory was set, the data were read into R, the columns needed for the analysis were copied into a smaller data frame, and the necessary packages were loaded.
setwd("~/Data Science/Course 5/Week 4 Assignment/")
data<-read.csv("Stormdata.csv", stringsAsFactors = FALSE)
storm <- data.frame(data$EVTYPE, data$FATALITIES, data$INJURIES, data$PROPDMG,
data$PROPDMGEXP, data$CROPDMG, data$CROPDMGEXP, stringsAsFactors = FALSE)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2)
library(utils)
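The column selection above builds the new data frame from `data$...` expressions, which is why every column in `storm` carries a `data.` prefix in the code that follows. As a minimal sketch of the difference, assuming a made-up table called `raw` (not the real storm file), selecting columns by name keeps the original names instead:

```r
# Toy stand-in for the raw storm table; all values are invented for illustration.
raw <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(3, 0),
  INJURIES   = c(15, 2),
  STATE      = c("AL", "TX"),   # a column the analysis does not need
  stringsAsFactors = FALSE
)

# Selecting by name keeps the original column names.
keep <- c("EVTYPE", "FATALITIES", "INJURIES")
subset_by_name <- raw[, keep]

# Building a data frame from raw$... expressions derives the names from
# those expressions, which produces the prefixed form used in this report.
subset_by_expr <- data.frame(raw$EVTYPE, raw$FATALITIES, raw$INJURIES,
                             stringsAsFactors = FALSE)

names(subset_by_name)
names(subset_by_expr)
```

Either form works for the analysis; the report uses the prefixed names throughout, so the remaining code is written against `storm$data.EVTYPE` and its siblings.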
Using the storm data, blanks and NAs were replaced with zeros. Each code in the PROPDMGEXP and CROPDMGEXP fields was converted to the power of ten it represents (for example, H to 2, K to 3, M to 6, and B to 9). The PROPDMG and CROPDMG values were then multiplied by 10 raised to the corresponding exponent to obtain the true property and crop damage values. Their sum was stored in a new column (damage_total), which was bound to the storm data table.
The tapply function was then used to total fatalities, injuries, and economic damage (property + crop damage) separately by event type, and each result was sorted in descending order.
# These columns are read as numeric, so missing values appear as NA rather than "" or "NA"
storm$data.FATALITIES[is.na(storm$data.FATALITIES)] <- 0
storm$data.INJURIES[is.na(storm$data.INJURIES)] <- 0
storm$data.PROPDMG[is.na(storm$data.PROPDMG)] <- 0
storm$data.CROPDMG[is.na(storm$data.CROPDMG)] <- 0
storm$data.PROPDMGEXP[(storm$data.PROPDMGEXP == "")] <- 0
storm$data.PROPDMGEXP[(storm$data.PROPDMGEXP == "+") | (storm$data.PROPDMGEXP ==
"-") | (storm$data.PROPDMGEXP == "?")] <- 1
storm$data.PROPDMGEXP[(storm$data.PROPDMGEXP == "h") | (storm$data.PROPDMGEXP ==
"H")] <- 2
storm$data.PROPDMGEXP[(storm$data.PROPDMGEXP == "k") | (storm$data.PROPDMGEXP ==
"K")] <- 3
storm$data.PROPDMGEXP[(storm$data.PROPDMGEXP == "m") | (storm$data.PROPDMGEXP ==
"M")] <- 6
storm$data.PROPDMGEXP[(storm$data.PROPDMGEXP == "B")] <- 9
storm$data.CROPDMGEXP[(storm$data.CROPDMGEXP == "")] <- 0
storm$data.CROPDMGEXP[(storm$data.CROPDMGEXP == "+") | (storm$data.CROPDMGEXP ==
"-") | (storm$data.CROPDMGEXP == "?")] <- 1
storm$data.CROPDMGEXP[(storm$data.CROPDMGEXP == "h") | (storm$data.CROPDMGEXP ==
"H")] <- 2
storm$data.CROPDMGEXP[(storm$data.CROPDMGEXP == "k") | (storm$data.CROPDMGEXP ==
"K")] <- 3
storm$data.CROPDMGEXP[(storm$data.CROPDMGEXP == "m") | (storm$data.CROPDMGEXP ==
"M")] <- 6
storm$data.CROPDMGEXP[(storm$data.CROPDMGEXP == "B")] <- 9
storm$data.PROPDMGEXP <- as.integer(storm$data.PROPDMGEXP)
storm$data.CROPDMGEXP <- as.integer(storm$data.CROPDMGEXP)
damage_total <- storm$data.PROPDMG * 10^storm$data.PROPDMGEXP + storm$data.CROPDMG *
10^storm$data.CROPDMGEXP
storm <- cbind(storm, damage_total)
te <- sort(tapply(storm$damage_total, storm$data.EVTYPE, sum), decreasing = TRUE)
tf <- sort(tapply(storm$data.FATALITIES, storm$data.EVTYPE, sum), decreasing = TRUE)
ti <- sort(tapply(storm$data.INJURIES, storm$data.EVTYPE, sum), decreasing = TRUE)
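The exponent conversion and aggregation above can also be expressed with a lookup vector, which may be easier to audit than a series of subset assignments. The following is only a sketch on invented records: the data frame `toy`, the vector `exp_lookup`, and the helper `to_exponent` are hypothetical names, and the mapping simply mirrors the choices made in this report (blank to 0; "+", "-", and "?" to 1; digit codes kept as they are).

```r
# Hypothetical toy records; amounts and codes are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  CROPDMG    = c(0, 5, 0, 3),
  CROPDMGEXP = c("", "k", "", "H"),
  stringsAsFactors = FALSE
)

# Named vector mapping each code to its power of ten.
exp_lookup <- c("+" = 1, "-" = 1, "?" = 1, H = 2, K = 3, M = 6, B = 9)
# Digit codes "0" to "9" map to themselves, as as.integer() does in the report.
exp_lookup <- c(exp_lookup, setNames(0:9, as.character(0:9)))

to_exponent <- function(code) {
  code <- toupper(code)                  # treat "k" and "K" alike
  out <- unname(exp_lookup[code])        # unknown codes become NA
  out[is.na(code) | code == ""] <- 0     # blank or missing code: 10^0 = 1
  out
}

# Example: 25 with code "K" becomes 25 * 10^3 = 25000.
toy$damage_total <- toy$PROPDMG * 10^to_exponent(toy$PROPDMGEXP) +
  toy$CROPDMG * 10^to_exponent(toy$CROPDMGEXP)

# Total by event type, largest first, as in the report.
totals <- sort(tapply(toy$damage_total, toy$EVTYPE, sum), decreasing = TRUE)
totals
```

A code that is not in the lookup yields NA instead of being silently converted, which makes unexpected values in the exponent fields easier to spot.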
For each measure, the six event types with the highest totals and their values were then extracted into vectors, combined into a data frame, and plotted as bar charts.
names <- as.vector(names(head(ti)))
vals <- as.vector(head(ti))
df <- data.frame(names, vals)
health <- ggplot(data = df, aes(x = names, y = vals)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") +
ylab("# of Injuries") + ggtitle("NOAA Top 6: Highest Injury Counts, 1950-2011")
print(health)
names <- as.vector(names(head(tf)))
vals <- as.vector(head(tf))
df <- data.frame(names, vals)
death <- ggplot(data = df, aes(x = names, y = vals)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") +
ylab("# of Fatalities") + ggtitle("NOAA Top 6: Highest Fatality Counts, 1950-2011")
print(death)
names <- as.vector(names(head(te)))
vals <- as.vector(head(te))
df <- data.frame(names, vals)
cost <- ggplot(data = df, aes(x = names, y = vals)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") +
ylab("Economic Damage ($)") + ggtitle("NOAA Top 6: Highest Damage (Property + Crop), 1950-2011")
print(cost)
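The three plotting steps above repeat the same pattern: take the top entries of a sorted, named vector, build a data frame, and draw a bar chart. One possible way to factor that out is sketched below. The helper `top_events` and the toy totals are hypothetical and are not part of the original analysis. Storing the event names as a factor with explicit levels is an added design choice so that the bars keep their descending order instead of being sorted alphabetically.

```r
# Turn a named vector of totals (such as the output of sort(tapply(...)))
# into a data frame of the n largest entries, ordered for plotting.
top_events <- function(totals, n = 6) {
  top <- head(sort(totals, decreasing = TRUE), n)
  data.frame(
    event = factor(names(top), levels = names(top)),  # keep descending bar order
    value = as.vector(top)
  )
}

# Hypothetical totals, invented for illustration only.
toy_totals <- c(HAIL = 12, TORNADO = 90, FLOOD = 40, HEAT = 25)
df_top <- top_events(toy_totals, n = 3)
df_top

# Build the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_top, aes(x = event, y = value)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    xlab("Event Type") + ylab("Total") + ggtitle("Top events (toy data)")
}
```

With the real results, a call such as `top_events(ti)` would give the data frame for the injury plot, and `print(p)` would display a chart built this way.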
Therefore, tornadoes had the greatest impact on population health, while flooding caused the greatest damage to property and crops.