This report fulfils the requirements of Project 2 of the Reproducible Research class.
Our aim is to explore the National Oceanic and Atmospheric Administration's (NOAA) Storm Database to identify the top 5 weather events that cause the most deaths and the top 5 that cause the most injuries. We expect the two top-5 lists to overlap, although some weather events may cause many injuries without causing many deaths.
We obtained the data from the Reproducible Research class website. The raw data file is provided as a compressed csv.bz2 file.
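The report does not show the step that creates Storm1. A minimal sketch, assuming the compressed file is read directly: R's read.csv() opens .bz2 files transparently, so no manual unzipping is needed. To keep the example self-contained, it first writes a tiny made-up table (not NOAA data) to a temporary .csv.bz2 file; the filename in the final comment is hypothetical.

```r
# Tiny illustrative table (made-up values, not NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO"),
  FATALITIES = c(2, 0, 1),
  INJURIES   = c(10, 3, 4)
)

# Write it as a bzip2-compressed CSV, mimicking the course download
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the bzip2 compression and decompresses on the fly
Storm1 <- read.csv(path)
str(Storm1)

# With the real download this would look like (hypothetical filename):
# Storm1 <- read.csv("StormData.csv.bz2")
```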
We then created two new variables, SumFatal and SumInjury, which hold the total fatalities and total injuries for each weather event type (EVTYPE).
library(plyr)
# Split Storm1 by event type and sum fatalities and injuries within each group
HealthOutcome <- ddply(Storm1, c("EVTYPE"), summarize, SumFatal = sum(FATALITIES), SumInjury = sum(INJURIES))
str(HealthOutcome)
## 'data.frame': 985 obs. of 3 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ SumFatal : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SumInjury: num 0 0 0 0 0 0 0 0 0 0 ...
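ddply() splits Storm1 by EVTYPE and sums each group. For readers without plyr, a roughly equivalent sketch in base R uses aggregate() with a formula. This swaps in a different function from the one the report uses, and the three-row table is made up for illustration.

```r
# Illustrative stand-in for Storm1 (made-up values)
Storm1 <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO"),
  FATALITIES = c(2, 0, 1),
  INJURIES   = c(10, 3, 4)
)

# Sum fatalities and injuries within each event type
HealthOutcome <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                           data = Storm1, FUN = sum)

# Rename to match the columns produced by ddply() in the report
names(HealthOutcome) <- c("EVTYPE", "SumFatal", "SumInjury")
HealthOutcome
```

aggregate() returns one row per event type, sorted alphabetically by EVTYPE, so HAIL comes before TORNADO here.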
In this project we focus on the top 5 weather events causing deaths and the top 5 causing injuries, so we extracted two small data sets containing those events.
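# Sort event types by total fatalities and by total injuries, largest first
TopFatal <- arrange(HealthOutcome, desc(SumFatal))
TopInjury <- arrange(HealthOutcome, desc(SumInjury))
# Keep the five highest-ranked event types in each ordering
Top5Fatal <- TopFatal[1:5, ]
Top5Injury <- TopInjury[1:5, ]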
Top5Fatal
## EVTYPE SumFatal SumInjury
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
Top5Injury
## EVTYPE SumFatal SumInjury
## 1 TORNADO 5633 91346
## 2 TSTM WIND 504 6957
## 3 FLOOD 470 6789
## 4 EXCESSIVE HEAT 1903 6525
## 5 LIGHTNING 816 5230
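Sorting in decreasing order and keeping the first five rows is a standard top-N selection. The sketch below does the same in base R with order() and head(), in place of plyr's arrange(). It uses only the seven event types printed above, with the totals shown in the report; the helper take_top() is a hypothetical name introduced for illustration.

```r
# The seven event types that appear in the two top-5 tables above
events <- data.frame(
  EVTYPE    = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT",
                "LIGHTNING", "TSTM WIND", "FLOOD"),
  SumFatal  = c(5633, 1903, 978, 937, 816, 504, 470),
  SumInjury = c(91346, 6525, 1777, 2100, 5230, 6957, 6789)
)

# Hypothetical helper: sort by one column, largest first, keep n rows
take_top <- function(df, column, n = 5) {
  head(df[order(df[[column]], decreasing = TRUE), ], n)
}

Top5Fatal  <- take_top(events, "SumFatal")
Top5Injury <- take_top(events, "SumInjury")
Top5Fatal$EVTYPE
Top5Injury$EVTYPE
```

On this subset the selection reproduces the order of both tables printed in the report.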
The results above show that tornadoes are the number 1 cause of both deaths and injuries. The two lists also clearly overlap: tornado, excessive heat, and lightning appear in both. Related events can still differ: flash floods caused more deaths than floods (978 versus 470), while floods caused far more injuries (6789 versus 1777).
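The overlap between the two lists can be checked directly with set operations on the event names. A short sketch using the event types from the two tables printed above:

```r
# Event types from the two top-5 tables printed above
fatal_top5  <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING")
injury_top5 <- c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING")

intersect(fatal_top5, injury_top5)  # "TORNADO" "EXCESSIVE HEAT" "LIGHTNING"
setdiff(fatal_top5, injury_top5)    # "FLASH FLOOD" "HEAT" (top 5 for deaths only)
setdiff(injury_top5, fatal_top5)    # "TSTM WIND" "FLOOD" (top 5 for injuries only)
```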
The following plots show the same results visually and give a better sense of the relative impact of these different types of weather events.
To create the plots, we first convert the two small data sets to long format.
library(reshape)
##
## Attaching package: 'reshape'
##
## The following objects are masked from 'package:plyr':
##
## rename, round_any
Top5FatalLong <- melt(Top5Fatal, id.vars="EVTYPE")
Top5InjuryLong <- melt(Top5Injury, id.vars="EVTYPE")
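melt() turns each wide row (one event with two total columns) into two long rows, one per measure, storing the column name in variable and the number in value. This is what lets ggplot2 draw a fatalities bar and an injuries bar side by side for each event. A base-R sketch of the same reshaping, shown without the reshape package and applied to the first two rows of Top5Fatal as printed above:

```r
# Wide form: first two rows of Top5Fatal as printed in the report
wide <- data.frame(
  EVTYPE    = c("TORNADO", "EXCESSIVE HEAT"),
  SumFatal  = c(5633, 1903),
  SumInjury = c(91346, 6525)
)

# Long form: stack the two measure columns, repeating EVTYPE as the id
measures <- c("SumFatal", "SumInjury")
long <- data.frame(
  EVTYPE   = rep(wide$EVTYPE, times = length(measures)),
  variable = factor(rep(measures, each = nrow(wide)), levels = measures),
  value    = unlist(wide[measures], use.names = FALSE)
)
long
```

Each event now appears once per measure, with all SumFatal rows first and then all SumInjury rows.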
par(mfrow = c(1, 3))
with(subset(Storm1, STATE =="CA"), plot(INJURIES, FATALITIES, col = "blue", pch=19, main="California"))
with(subset(Storm1, STATE =="NY"), plot(INJURIES, FATALITIES, col = "green", pch=19, main="New York"))
with(subset(Storm1, STATE =="AL"), plot(INJURIES, FATALITIES, col = "red", pch=19, main="Alabama"))
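The scatter plots above, one for each of three states (California, New York, and Alabama), plot fatalities against injuries for individual storm records. Records with many injuries do not always have many fatalities, which suggests that the top causes of injuries need not be the same as the top causes of fatalities. We still expect an overlap between them.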
library(ggplot2)
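# Order the bars by total fatalities (the row order of Top5Fatal);
# reorder(EVTYPE, -value) would instead order by the mean of both measures
ggplot(Top5FatalLong, aes(x = factor(EVTYPE, levels = as.character(Top5Fatal$EVTYPE)), y = value, fill = variable)) +
geom_bar(stat = "identity", position = position_dodge()) +
labs(x = "Type of Weather Events", y = "Total Count", title = "Top 5 Weather Events with Most Fatalities - USA") +
scale_fill_brewer(palette = "Set1")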
# Order the bars by total injuries (the row order of Top5Injury)
ggplot(Top5InjuryLong, aes(x = factor(EVTYPE, levels = as.character(Top5Injury$EVTYPE)), y = value, fill = variable)) +
geom_bar(stat = "identity", position = position_dodge()) +
labs(x = "Type of Weather Events", y = "Total Count", title = "Top 5 Weather Events with Most Injuries - USA") +
scale_fill_brewer(palette = "Set1")