The basic goal of this assignment is to explore the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The data were downloaded and saved on the local computer, then loaded into R with the read.csv command. If the object storm_data is already loaded, that cached object can be used instead of reloading the file each time the Rmd file is knitted. This assignment also uses the libraries reshape2, reshape, ggplot2, plyr, and dplyr.
# Load the libraries
library(reshape2)
library(reshape)
library(ggplot2)
library(plyr)
library(dplyr)
setwd("~/Desktop/Training/Data Science-Coursera/5 /Week 4/Peer-graded Assignment_Course Project 2")
storm_data <- read.csv("StormData.csv.bz2", header = TRUE)
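The loading code above reads the file unconditionally, so the caching mentioned earlier is not shown. A minimal sketch of one way to do it follows; `load_cached` is a hypothetical helper introduced here for illustration and is not part of the original analysis.

```r
# Hypothetical helper (an assumption, not the original code): return the
# object called `name` if it already exists in `envir`; otherwise call
# `loader()` once, store the result under that name, and return it.
load_cached <- function(name, loader, envir = globalenv()) {
  if (!exists(name, envir = envir, inherits = FALSE)) {
    assign(name, loader(), envir = envir)
  }
  get(name, envir = envir, inherits = FALSE)
}

# Possible usage with the file from this report (commented out because it
# needs StormData.csv.bz2 in the working directory):
# storm_data <- load_cached("storm_data", function() {
#   read.csv("StormData.csv.bz2", header = TRUE)
# })
```

One caveat with this approach: the property damage step later in this report overwrites PROPDMG in place, so a cached copy that has already been transformed would have the multipliers applied a second time.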
To understand the impact on population health, we looked at fatalities and injuries by event type. As a first step toward a list of the 10 most harmful weather events, we summed both measures for each event type:
total_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storm_data, sum, na.rm = TRUE)
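To illustrate how the cbind() formula in aggregate behaves, here is a small sketch on made-up data. The `toy` data frame and its values are invented for illustration and do not come from the storm database.

```r
# Toy data (made up for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# cbind() on the left-hand side sums both columns within each event type,
# giving one row per EVTYPE with a FATALITIES total and an INJURIES total
toy_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy_health
```

The two TORNADO rows collapse into a single row, which is the same shape that total_health has for the full data set.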
We then kept the ten event types with the highest combined total and reshaped the data into long format, adding the health cause as a factor so that injuries and fatalities can be grouped in a single plot:
total_health_factor <- melt(total_health[order(-(total_health$FATALITIES + total_health$INJURIES)),][1:10,], id.vars = "EVTYPE")
names(total_health_factor) <- c("EVTYPE","CAUSE","COUNT")
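The reshaping step can be easier to follow on a tiny example. The sketch below uses made-up totals (the `toy_health` values are assumptions for illustration only) and the melt function from reshape2.

```r
library(reshape2)

# Toy totals (made up for illustration)
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT"),
  FATALITIES = c(5, 4),
  INJURIES   = c(15, 1)
)

# Wide to long: one row per combination of event type and health cause.
# melt() names the new columns "variable" and "value", so they are renamed.
toy_long <- melt(toy_health, id.vars = "EVTYPE")
names(toy_long) <- c("EVTYPE", "CAUSE", "COUNT")
toy_long
```

Each event type now appears once per cause, which is what lets ggplot2 stack injuries and fatalities in a single bar via the fill aesthetic.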
For property damage we took a similar approach, aggregating the damage by event type and then selecting the ten event types that have caused the greatest economic impact.
Because the property damage estimates were collected from various data sources and recorded in different units (the PROPDMGEXP column marks hundreds, thousands, millions, or billions), we first had to convert them to a single comparable dollar value:
storm_data <- transform(storm_data, PROPDMG =
    ifelse(PROPDMGEXP %in% "B", PROPDMG * 10^9,
    ifelse(PROPDMGEXP %in% c("m", "M"), PROPDMG * 10^6,
    ifelse(PROPDMGEXP %in% c("k", "K"), PROPDMG * 10^3,
    ifelse(PROPDMGEXP %in% c("h", "H"), PROPDMG * 100,
           PROPDMG)))))
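As an alternative to the nested ifelse calls, the same conversion can be sketched with a named lookup vector. This is not the method used in this report. The `toy` data frame, the `multiplier` vector, and the `PROPDMG_USD` column are assumptions introduced for illustration. Unlike the code above, this version would also accept a lowercase "b".

```r
# Multipliers for the exponent codes handled in the report; any other code
# (blank, digits, symbols) falls back to 1, like the final ifelse() branch.
multiplier <- c(B = 1e9, M = 1e6, K = 1e3, H = 1e2)

# Toy data (made up for illustration)
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 3, 7, 1),
  PROPDMGEXP = c("K", "m", "B", "", "h"),
  stringsAsFactors = FALSE
)

# Look up each code by name; codes missing from the table give NA
exp_code <- toupper(as.character(toy$PROPDMGEXP))
mult <- unname(multiplier[exp_code])
mult[is.na(mult)] <- 1

# Write to a new column so that re-running the step does not compound
toy$PROPDMG_USD <- toy$PROPDMG * mult
toy
```

Storing the result in a separate column keeps the raw PROPDMG values intact, so the step can be re-run safely on an object that is already in the workspace.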
We could then aggregate the values and create the list of the ten events with the most property damage:
property_damage <- aggregate(PROPDMG ~ EVTYPE, data = storm_data, sum, na.rm = TRUE)
top_list <- property_damage[order(property_damage$PROPDMG, decreasing = TRUE), ][1:10, ]
With the total_health_factor data set we can plot the 10 most harmful events with respect to population health:
ggplot(total_health_factor, aes(x = reorder(EVTYPE, COUNT), y = COUNT, fill = CAUSE)) +
    geom_bar(stat = "identity") + coord_flip() +
    scale_y_continuous(breaks = seq(0, 100000, by = 2500)) +
    ylab("Total injuries and fatalities") +
    xlab("Event type") +
    ggtitle("The 10 most harmful events with respect to population health") +
    theme(axis.text.x = element_text(angle = 60, hjust = 1))
Another key concern is property damage caused by severe weather events. Although the storm database also includes crop damage, this report focuses on property damage because it affects many people's personal lives.
# Divide by 1e9 (not 10e9, which is 10^10) so the axis really is in billions
ggplot(top_list, aes(x = reorder(EVTYPE, PROPDMG/1e9), y = PROPDMG/1e9)) +
    geom_bar(stat = "identity", fill = "darkturquoise") + coord_flip() +
    ylab("Property damage in billions of dollars") +
    xlab("Event type") +
    ggtitle("The 10 most harmful events with respect to property damage")