This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers events recorded between 1950 and 2011. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. We address two questions with these data: which event types are most harmful to population health, and which have the greatest economic consequences.
After reading the data, a number of manipulation techniques were applied to summarize the data by event type, including aggregate, renaming of variables, and some simple arithmetic. The main findings are that TORNADO causes by far the most combined fatalities and injuries, and that it also ranks first for combined property and crop damage.
setwd("h://MOOC/Johnhopkins/Reproducible Research/assignment_2/")
Read the data.
storms = read.csv("StormData.csv", as.is=TRUE,
colClasses=c(rep("NULL", 7),
NA, # EVTYPE
rep("NULL", 14),
rep(NA, 3), # FATALITIES, INJURIES, PROPDMG
"NULL",
NA, # CROPDMG
rep("NULL", 10)))
## Warning: EOF within quoted string
names(storms) = c("Event", "Fatalities", "Injuries", "Property", "Crop")
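The colClasses argument above is what limits the read to five columns. As a minimal sketch of the same idea on a toy CSV (the column names a to d and the values are made up for illustration and are not part of the storm data): an entry of "NULL" drops a column, while NA lets read.csv infer its type.

```r
# Toy CSV with four columns; only "b" and "d" are wanted.
# (Hypothetical data, used only to illustrate colClasses.)
csv_text <- "a,b,c,d
1,x,10,0.5
2,y,20,1.5"

# "NULL" skips a column entirely; NA keeps it and lets read.csv guess the type.
toy <- read.csv(text = csv_text, as.is = TRUE,
                colClasses = c("NULL", NA, "NULL", NA))

names(toy)  # only the kept columns remain
str(toy)    # b is character (as.is = TRUE), d is numeric
```

Skipping columns this way keeps memory use down on a wide file, at the cost of having to count column positions by hand.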
We also need to know how many times each type of severe weather occurs, so add a counter column that can be summed per event type.
storms$EventCount = 1
Total the fatalities, injuries, property damage, crop damage, and event counts per event type, then find the top 5 severe weather event types from these totals.
totals = aggregate(cbind(as.integer(Fatalities), as.integer(Injuries),
as.integer(Property), as.integer(Crop), EventCount) ~ Event, storms, sum)
names(totals) = c("Event", "Fatalities", "Injuries", "Property", "Crop", "EventCount")
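To make the aggregation step concrete, here is a small sketch on made-up rows (the toy data frame and its values are hypothetical). It uses the same cbind(...) ~ Event formula with sum. Because the columns are passed as plain names here, aggregate keeps their names; wrapping them in as.integer(), as above, loses the names, which is presumably why names(totals) is reassigned afterwards.

```r
# Hypothetical per-event rows, one row per recorded event.
toy <- data.frame(
  Event      = c("HAIL", "FLOOD", "HAIL", "FLOOD", "HAIL"),
  Fatalities = c(0, 2, 1, 0, 0),
  Injuries   = c(3, 1, 0, 4, 2),
  stringsAsFactors = FALSE
)
toy$EventCount <- 1  # counter column: summing it gives the number of events

# Sum every column on the left-hand side within each Event group.
toy_totals <- aggregate(cbind(Fatalities, Injuries, EventCount) ~ Event, toy, sum)
toy_totals
```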
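COUNT = 5
# Return the n-th largest value of v, used as the cutoff for a "top n" filter.
top5 = function(v, n=COUNT) {
i = length(v) - n + 1 # position of the n-th largest value in ascending order
sort(v, partial=i)[i]
}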
p_dmg = totals[totals$Fatalities + totals$Injuries >= top5(totals$Fatalities + totals$Injuries),
c("Event", "Fatalities", "Injuries")]
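The selection above works by finding the n-th largest value and keeping every row at or above it. A minimal sketch of that cutoff logic on a made-up vector (nth_largest and harm are illustrative names, not part of the analysis) also shows one consequence: ties at the cutoff can return more than n entries.

```r
# Same idea as top5(): partial sort so that position i holds the n-th largest value.
nth_largest <- function(v, n) {
  i <- length(v) - n + 1
  sort(v, partial = i)[i]
}

# Hypothetical harm totals for six event types.
harm <- c(A = 12, B = 7, C = 30, D = 7, E = 1, F = 19)

cutoff <- nth_largest(harm, 3)
names(harm)[harm >= cutoff]      # the three largest: A, C, F

# With n = 4 the cutoff is 7, which is tied, so five entries pass the filter.
sum(harm >= nth_largest(harm, 4))
```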
The graph below shows that TORNADO is by far the most harmful severe weather event type in terms of combined fatalities and injuries.
library(lattice)
barchart(Fatalities + Injuries ~ Event, p_dmg,
xlab="Severe Weather Event", ylab="Fatalities / Injuries", auto.key=TRUE, stack=FALSE)
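However, this may simply reflect how often tornadoes are recorded in the United States, so we extract the five most frequently recorded event types to check. Note that in the table of values later in this report TSTM WIND has more recorded events than TORNADO, so frequency alone does not account for the difference.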
e_times = totals[totals$EventCount>=tail(sort(totals$EventCount), 5)[1], c(1,6)]
We represent the impact on population health by the sum of the number of injuries and the number of fatalities. Create a new variable dmg holding this sum for each event type.
Then divide dmg by EventCount to create a variable dmg_rate, which measures the average harm per occurrence of each event type.
totals$dmg = totals$Injuries + totals$Fatalities
totals$dmg_rate = totals$dmg / totals$EventCount
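As a quick worked check of the rate, the sketch below recomputes dmg_rate for three rows copied from the table of values printed later in this report (the data frame name check is illustrative). It shows that ranking by rate can differ from ranking by total: TORNADO has the largest total, but EXCESSIVE HEAT has the highest harm per occurrence.

```r
# Three rows copied from the table of values in this report.
check <- data.frame(
  Event      = c("TORNADO", "TSTM WIND", "EXCESSIVE HEAT"),
  dmg        = c(84742, 6923, 5770),
  EventCount = c(49289, 189928, 991),
  stringsAsFactors = FALSE
)

# Average injuries plus fatalities per recorded event.
check$dmg_rate <- check$dmg / check$EventCount

# Highest rate first.
check[order(-check$dmg_rate), ]
```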
Order the dataset by total harm and by harm per occurrence, most harmful first.
totalMostDangerous = totals[order(-totals$dmg),]
MostDangerousPerEvent = totals[order(-totals$dmg_rate),]
Create datasets limited to only the five most dangerous events, by total and per occurrence.
top5MostDangerous = totalMostDangerous[1:5,]
top5MostDangerousPerEvent = MostDangerousPerEvent[1:5,]
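Sum PROPDMG (property damage) and CROPDMG (crop damage) for an analysis of total damage expenses. The exponent columns PROPDMGEXP and CROPDMGEXP were dropped when the data was read, so this sum uses the raw, unscaled figures.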
totals$mdmg = totals$Property + totals$Crop
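The sum above adds the raw damage figures as stored. For reference, here is a hedged sketch of how a magnitude code could be applied if the exponent columns had been kept. The mapping K = thousand, M = million, B = billion is an assumption about how the codes are meant to be read, any other code is treated as a multiplier of 1, and the toy rows are made up.

```r
# Hypothetical rows: a raw damage figure plus its magnitude code.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping from code to multiplier; not taken from this analysis.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; unknown or empty codes come back NA and fall back to 1.
scale <- unname(multiplier[toupper(toy$PROPDMGEXP)])
scale[is.na(scale)] <- 1

toy$PropertyUSD <- toy$PROPDMG * scale
toy
```

Under that assumption, scaled totals could rank event types differently from the unscaled sums used here.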
Then find the top 5 event types with the greatest economic consequences.
m_dmg = totals[totals$Property + totals$Crop >= top5(totals$Property + totals$Crop),
c("Event", "Property", "Crop")]
By this measure, TORNADO also has the greatest economic consequences.
Graph of the most dangerous weather events
library(ggplot2)
# Plot harm per occurrence (dmg_rate) for the five event types with the largest totals.
g = ggplot(top5MostDangerous,aes(Event,dmg_rate))
g = g + ylab("Injuries and Fatalities per Event")
g = g + xlab("Weather Events")
g = g + geom_bar(stat="identity")
g = g + ggtitle("Top 5 Most Dangerous Weather Events")
g = g + theme(axis.text.x=element_text(angle=90))
print(g)
Top 5 Most Dangerous - table of values
top5MostDangerous
## Event Fatalities Injuries Property Crop EventCount dmg
## 829 TORNADO 4658 80084 2571436 39105 49289 84742
## 851 TSTM WIND 471 6452 1058797 95361 189928 6923
## 170 FLOOD 258 6499 339304 60838 9586 6757
## 130 EXCESSIVE HEAT 1416 4354 53 2 991 5770
## 463 LIGHTNING 562 3628 334017 2407 9953 4190
## dmg_rate
## 829 1.71929
## 851 0.03645
## 170 0.70488
## 130 5.82240
## 463 0.42098
Graph of the events with the greatest economic consequences
library(lattice)
barchart(Property + Crop ~ Event, m_dmg,
xlab="Severe Weather Event", ylab="Property / Crop Damage", auto.key=TRUE, stack=FALSE)