This report fulfils the requirements of Project 2 of the Reproducible Research class.

Synopsis

Our aim is to explore the National Oceanic and Atmospheric Administration’s (NOAA) Storm Database to identify the top 5 weather events causing deaths and the top 5 causing injuries. We expect these two lists to overlap, although some weather events might cause many injuries without being especially fatal.

Data processing

We accessed the data through the Reproducible Research class website. The original raw data file came as a bzip2-compressed CSV (csv.bz2) file.
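The loading step is not shown in this report, so the following is only a minimal sketch of how a csv.bz2 file can be read into a data frame named Storm1. It writes a tiny toy table to a temporary compressed file and reads it back; the file name and the toy rows are assumptions made for illustration, not the real storm data.

```r
# Toy stand-in for the storm data (invented rows, for illustration only).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  STATE      = c("AL", "NY"),
  FATALITIES = c(1, 0),
  INJURIES   = c(14, 2)
)

# Write a bzip2-compressed CSV to a temporary path (hypothetical file name).
path <- file.path(tempdir(), "toy_storm.csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression,
# so no manual decompression step is needed.
Storm1 <- read.csv(path, stringsAsFactors = FALSE)
str(Storm1)
```

With the real file, only the path passed to read.csv() would change.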

We created two new variables, SumFatal and SumInjury, which hold the total fatalities and total injuries for each weather event type.

library(plyr)
HealthOutcome <- ddply(Storm1, c("EVTYPE"), summarize, SumFatal=sum(FATALITIES), SumInjury=sum(INJURIES))

str(HealthOutcome)
## 'data.frame':    985 obs. of  3 variables:
##  $ EVTYPE   : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ SumFatal : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SumInjury: num  0 0 0 0 0 0 0 0 0 0 ...
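The str() output lists 985 distinct event labels, and the first one starts with leading spaces, so labels that differ only in spacing are summed as separate events. The sketch below repeats the per-event sum in base R on invented toy rows and adds an optional cleanup step. The cleanup (trimming whitespace and upper-casing) and the mixed-case toy labels are assumptions for illustration and were not part of the analysis reported here.

```r
# Toy rows invented for illustration; labels differ only in spacing and case.
toy <- data.frame(
  EVTYPE     = c("TORNADO", " tornado", "FLOOD", "FLOOD "),
  FATALITIES = c(3, 1, 0, 2),
  INJURIES   = c(10, 4, 5, 1)
)

# Base-R counterpart of the ddply() call: sum both columns within each label.
raw_totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                        data = toy, FUN = sum)

# Optional cleanup (an assumption, not done in this report):
# trim whitespace and upper-case the labels before summing.
toy$EVTYPE_CLEAN <- toupper(trimws(toy$EVTYPE))
clean_totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE_CLEAN,
                          data = toy, FUN = sum)

nrow(raw_totals)    # 4 distinct raw labels
nrow(clean_totals)  # 2 labels after cleanup
clean_totals
```

Whether such merging is appropriate depends on the labels involved, so any cleanup rule should be checked against the raw labels before it is applied to the full dataset.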

In this project, we focus on the top 5 weather events causing deaths and the top 5 causing injuries, so we extracted two small datasets containing those events.

TopFatal <- arrange(HealthOutcome, desc(SumFatal))
TopInjury <- arrange(HealthOutcome, desc(SumInjury))

Top5Fatal <- TopFatal[1:5, ]
Top5Injury <- TopInjury[1:5, ]
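The same sort-and-slice step can be sketched in base R without plyr. The data frame below contains only the seven event types that appear in the Results tables of this report, with their totals copied from there, so it is a small stand-in for the full HealthOutcome table rather than a reproduction of it.

```r
# Totals copied from the Results tables of this report (seven event types only).
HealthOutcome <- data.frame(
  EVTYPE    = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT",
                "LIGHTNING", "TSTM WIND", "FLOOD"),
  SumFatal  = c(5633, 1903, 978, 937, 816, 504, 470),
  SumInjury = c(91346, 6525, 1777, 2100, 5230, 6957, 6789)
)

# order() with decreasing = TRUE sorts largest first; head() keeps the top 5.
Top5Fatal  <- head(HealthOutcome[order(HealthOutcome$SumFatal,
                                       decreasing = TRUE), ], 5)
Top5Injury <- head(HealthOutcome[order(HealthOutcome$SumInjury,
                                       decreasing = TRUE), ], 5)

Top5Fatal$EVTYPE
Top5Injury$EVTYPE
```

Using head() instead of indexing with 1:5 also behaves sensibly if the table ever has fewer than five rows.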

Results

Top5Fatal 
##           EVTYPE SumFatal SumInjury
## 1        TORNADO     5633     91346
## 2 EXCESSIVE HEAT     1903      6525
## 3    FLASH FLOOD      978      1777
## 4           HEAT      937      2100
## 5      LIGHTNING      816      5230
Top5Injury 
##           EVTYPE SumFatal SumInjury
## 1        TORNADO     5633     91346
## 2      TSTM WIND      504      6957
## 3          FLOOD      470      6789
## 4 EXCESSIVE HEAT     1903      6525
## 5      LIGHTNING      816      5230

From the above results we see that tornadoes are the number 1 cause of both deaths and injuries. We also see a clear overlap between the two lists: tornado, excessive heat, and lightning appear in both. The lists differ elsewhere. Flash floods caused more fatalities than floods (978 versus 470), while floods caused far more injuries than flash floods (6789 versus 1777).
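The overlap and the flood comparison can be checked directly from the two tables. This is a small sketch that retypes the event names and totals shown above; the injuries-per-fatality ratio is an illustrative summary added here, not a measure used elsewhere in this report.

```r
# Event names as listed in the Top5Fatal and Top5Injury tables above.
fatal_top  <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING")
injury_top <- c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING")

# Events present in both top-5 lists.
overlap <- intersect(fatal_top, injury_top)
overlap  # "TORNADO" "EXCESSIVE HEAT" "LIGHTNING"

# Injuries per fatality, using the totals from the tables above.
ratio <- c("FLASH FLOOD" = 1777 / 978, "FLOOD" = 6789 / 470)
round(ratio, 1)  # FLASH FLOOD 1.8, FLOOD 14.4
```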

The following plots show the same results visually, which gives a better grasp of the impact of these different types of weather events.

To create our plots, we first create a long form of the two small datasets.

library(reshape)
## 
## Attaching package: 'reshape'
## 
## The following objects are masked from 'package:plyr':
## 
##     rename, round_any
Top5FatalLong <- melt(Top5Fatal, id.vars="EVTYPE")
Top5InjuryLong <- melt(Top5Injury, id.vars="EVTYPE")
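To make the long form concrete, the sketch below builds the same three-column layout (EVTYPE, variable, value) in base R, so it runs without the reshape package. It is an illustrative equivalent of the melt() call, not the code used for the figures, and the wide table is retyped from the Top5Fatal output above.

```r
# Wide table retyped from the Top5Fatal output in the Results section.
wide <- data.frame(
  EVTYPE    = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  SumFatal  = c(5633, 1903, 978, 937, 816),
  SumInjury = c(91346, 6525, 1777, 2100, 5230)
)

# Stack the two measure columns: one row per (event, measure) pair.
long <- data.frame(
  EVTYPE   = rep(wide$EVTYPE, times = 2),
  variable = factor(rep(c("SumFatal", "SumInjury"), each = nrow(wide))),
  value    = c(wide$SumFatal, wide$SumInjury)
)

long
```

Each event now appears twice, once per measure, which is the shape ggplot2 needs to draw the two bars side by side with fill=variable.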

Figure 1: Association between injuries and fatalities from weather events in three states

par(mfrow = c(1, 3))
with(subset(Storm1, STATE =="CA"), plot(INJURIES, FATALITIES, col = "blue", pch=19, main="California"))
with(subset(Storm1, STATE =="NY"), plot(INJURIES, FATALITIES, col = "green", pch=19, main="New York"))
with(subset(Storm1, STATE =="AL"), plot(INJURIES, FATALITIES, col = "red", pch=19, main="Alabama"))

The scatter plots above, for three states in the United States, suggest that events with many injuries are not always the events with many fatalities, so the top causes of injuries might not be the same as the top causes of fatalities. We still think there is an overlap between them.

Figures 2 & 3: Top 5 Weather Events causing Deaths and Injuries

library(ggplot2)
ggplot(Top5FatalLong, aes(x=reorder(EVTYPE, -value), y=value, fill=variable)) +
 geom_bar(stat="identity", position = position_dodge()) +
 labs(x = "Type of Weather Events", y="Total Fatalities", title = "Top 5 Weather Events with Most Fatalities - USA")+
 scale_fill_brewer(palette="Set1")

library(ggplot2)
ggplot(Top5InjuryLong, aes(x=reorder(EVTYPE, -value), y=value, fill=variable)) +
  geom_bar(stat="identity", position = position_dodge()) +
  labs(x = "Type of Weather Events", y="Total Injuries", title = "Top 5 Weather Events with Most Injuries - USA")+
  scale_fill_brewer(palette="Set1")