I selected the five event types with the highest totals in each of four categories: injuries, deaths, crop damage, and property damage. I used the dplyr package to filter the data, then looked at the impact of each variable by event type. Depending on where you live and which of these weather events are most common there, you can use this report to get a better picture of the health and economic impacts these events have on your community. Tornados have the most impact on health, but the damage results are split: tornados cause the most property damage, while hail has a larger impact on crops.
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Include Libraries
library(RCurl)
## Loading required package: bitops
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
Read the csv into the workspace
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, "StormData.csv.bz2", method = "curl")
stormdata <- read.csv("StormData.csv.bz2")
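The download above runs every time the report is knitted. A minimal sketch of a guard that skips the download when the file is already on disk follows; the helper name `fetch_once` is hypothetical, and it uses the default download method rather than `method = "curl"`.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is missing.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the compressed file in binary mode
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage with the URL from this report (commented out to avoid a network call):
# fetch_once(url, "StormData.csv.bz2")
# stormdata <- read.csv("StormData.csv.bz2")
```

With this in place, deleting the local file is enough to force a fresh download.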
Filter the data to relevant fields using the dplyr package
sdata <- filter(stormdata, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
sd <- select(sdata, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
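Note that this report sums PROPDMG and CROPDMG exactly as recorded. The dataset also has PROPDMGEXP and CROPDMGEXP columns holding a magnitude code for each amount, and the selection above does not keep them. If dollar amounts were wanted, one possible approach is sketched below on made-up rows. It assumes that H, K, M and B stand for hundreds, thousands, millions and billions, and that any other code (blank, digits, symbols) is treated as a multiplier of 1. The helper name `exp_multiplier` is hypothetical.

```r
# Hypothetical helper: turn an exponent code into a numeric multiplier.
# Assumption: H/K/M/B = 1e2/1e3/1e6/1e9; every other code counts as 1.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows, not real storm records
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3),
  PROPDMGEXP = c("K", "M", "", "b")
)
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
```

Applying a conversion like this before summing could change which event types rank highest, so the totals in this report should be read as sums of the raw recorded values.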
Filtering the data by total deaths
##Use tapply to apply the sum function by event type
t <- tapply(sd$FATALITIES, sd$EVTYPE, sum)
##sort the data from largest to smallest
td <- sort(t, decreasing = TRUE)
##select the top 5 observations
tds <- td[1:5]
##change the vector to a data frame
dff<-as.data.frame(tds)
## Add the row names as a column
dff[,2] <- row.names(dff)
## Add the column names
colnames(dff) <- c("Total_Deaths", "Event")
## Change the total deaths column to numeric
dff$Total_Deaths<-as.numeric(dff$Total_Deaths)
##remove the row names
row.names(dff)<-NULL
## Store the plot of total deaths
d_plot <- qplot(Event, Total_Deaths, data = dff, main = "Total Deaths from Weather Events", xlab = "Events", ylab = "Total Deaths")
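The tapply, sort and data frame steps above can also be written as a single dplyr pipeline. The following is a minimal sketch on made-up rows, not the code used for this report; the helper name `top_events` and the toy values are my own illustration.

```r
library(dplyr)

# Hypothetical helper: total one numeric column by event type
# and keep the n event types with the largest totals.
top_events <- function(data, column, n = 5) {
  data %>%
    group_by(Event = EVTYPE) %>%
    summarise(Total = sum(.data[[column]])) %>%
    arrange(desc(Total)) %>%
    head(n)
}

# Toy rows, not real storm records
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL", "HEAT"),
  FATALITIES = c(3, 0, 2, 1, 0, 4),
  INJURIES   = c(10, 1, 5, 0, 2, 3)
)
top_deaths <- top_events(toy, "FATALITIES", n = 2)
```

The same helper could be called with "INJURIES", "PROPDMG" or "CROPDMG" in place of "FATALITIES", which would avoid repeating the block for each category.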
Filtering the data by total injuries
##Use tapply to apply the sum function by event type
t2 <- tapply(sd$INJURIES, sd$EVTYPE, sum)
##sort the data from largest to smallest
ti <- sort(t2, decreasing = TRUE)
##select the top 5 observations
tis <- ti[1:5]
##change the vector to a data frame
dfi<-as.data.frame(tis)
## Add the row names as a column
dfi[,2] <- row.names(dfi)
## Add the column names
colnames(dfi) <- c("Total_Injuries", "Event")
## Change the total injuries column to numeric
dfi$Total_Injuries<-as.numeric(dfi$Total_Injuries)
##remove the row names
row.names(dfi)<-NULL
## Store the plot of total injuries
i_plot <- qplot(Event, Total_Injuries, data = dfi, main = "Total Injuries from Weather Events", xlab = "Events", ylab = "Total Injuries")
Filtering the data by total property damage
##Use tapply to apply the sum function by event type
t3 <- tapply(sd$PROPDMG, sd$EVTYPE, sum)
##sort the data from largest to smallest
tp <- sort(t3, decreasing = TRUE)
##select the top 5 observations
tps <- tp[1:5]
##change the vector to a data frame
dfp<-as.data.frame(tps)
## Add the row names as a column
dfp[,2] <- row.names(dfp)
## Add the column names
colnames(dfp) <- c("Total_Property_Damage", "Event")
## Change the total property damage column to numeric
dfp$Total_Property_Damage<-as.numeric(dfp$Total_Property_Damage)
##remove the row names
row.names(dfp)<-NULL
## Store the plot of total property damage
p_plot <- qplot(Event, Total_Property_Damage, data = dfp, main = "Total Property Damage from Weather Events", xlab = "Events", ylab = "Total Property Damage")
Filtering the data by total crop damage
##Use tapply to apply the sum function by event type
t4 <- tapply(sd$CROPDMG, sd$EVTYPE, sum)
##sort the data from largest to smallest
tc <- sort(t4, decreasing = TRUE)
##select the top 5 observations
tcs <- tc[1:5]
##change the vector to a data frame
dfc<-as.data.frame(tcs)
## Add the row names as a column
dfc[,2] <- row.names(dfc)
## Add the column names
colnames(dfc) <- c("Total_Crop_Damage", "Event")
## Change the total crop damage column to numeric to graph
dfc$Total_Crop_Damage<-as.numeric(dfc$Total_Crop_Damage)
##remove the row names
row.names(dfc)<-NULL
## Store the plot of total crop damage
c_plot <- qplot(Event, Total_Crop_Damage, data = dfc, main = "Total Crop Damage from Weather Events (6e+05 = 600,000)", xlab = "Events", ylab = "Total Crop Damage")
Tornados are by far the most harmful to the population in both deaths and injuries. Hail appears to have the highest economic effect on crops while tornados have the greatest effect on property damage.
grid.arrange(d_plot, i_plot)
grid.arrange(c_plot, p_plot)
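The plots above draw one point per event type. A bar chart ordered from largest to smallest total is one alternative that may make the ranking easier to read. The sketch below uses made-up totals and is not the plotting code of this report; the object names `totals` and `bar_plot` are hypothetical.

```r
library(ggplot2)

# Toy totals, not real results
totals <- data.frame(
  Event        = c("HEAT", "FLOOD", "TORNADO"),
  Total_Deaths = c(4, 1, 5)
)

# Order the event types so the largest total is drawn first
totals$Event <- reorder(totals$Event, -totals$Total_Deaths)

bar_plot <- ggplot(totals, aes(x = Event, y = Total_Deaths)) +
  geom_col() +
  labs(title = "Total Deaths from Weather Events",
       x = "Events", y = "Total Deaths")
```

A plot built this way can be passed to `grid.arrange()` in the same way as the stored plots.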