Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers events recorded between 1950 and 2011. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. We address two questions: which event types are most harmful to population health, and which have the greatest economic consequences.

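After reading the data, a number of manipulation techniques were applied to summarize it by event type, including aggregation with aggregate, conversion of character variables, and some simple arithmetic. The main finding is that TORNADO accounts for by far the most fatalities and injuries, and also for the largest raw property and crop damage totals.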

Notes and Assumptions
  • StormData.csv is in the working directory
  • the “plyr”, “lattice” and “ggplot2” packages have been installed and loaded

Data Processing

Loading and pre-processing the Data

setwd("h://MOOC/Johnhopkins/Reproducible Research/assignment_2/")

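Read the data, keeping only the five columns needed: event type, fatalities, injuries, property damage, and crop damage. A colClasses entry of "NULL" skips that column.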

storms = read.csv("StormData.csv", as.is=TRUE,
                  colClasses=c(rep("NULL", 7), 
                               NA,                # EVTYPE
                               rep("NULL", 14),
                               rep(NA, 3),        # FATALITIES, INJURIES, PROPDMG
                               "NULL",
                               NA,                # CROPDMG
                               rep("NULL", 10)))
## Warning: EOF within quoted string
names(storms) = c("Event", "Fatalities", "Injuries", "Property", "Crop")
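
The column-skipping behaviour of colClasses can be tried on a small inline table. The following is a minimal sketch with invented column names and values, not the NOAA file: a "NULL" entry drops the column, and NA lets read.csv infer the type.

```r
# Toy CSV with four columns; the names and values are invented for illustration.
csv_text <- "STATE,EVTYPE,FATALITIES,REMARKS
AL,TORNADO,0,none
TX,HAIL,2,none"

# "NULL" skips a column entirely; NA lets read.csv infer the column type.
toy <- read.csv(text = csv_text, as.is = TRUE,
                colClasses = c("NULL", NA, NA, "NULL"))

# Only the two kept columns remain, so two names are enough.
names(toy) <- c("Event", "Fatalities")
str(toy)
```

Skipping unused columns this way keeps memory use down when the source file is large.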

Data Transformations

Finding the events that cause the most injuries and fatalities

We need to know how many times each type of severe weather event occurs, so we add a column of ones that can later be summed per event type.

storms$EventCount = 1
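
As a small illustration of why a column of ones is useful, the sketch below sums it per group with aggregate. The data frame toy and its values are invented for this example.

```r
# Toy data; the event names and injury counts are invented for illustration.
toy <- data.frame(Event = c("HAIL", "FLOOD", "HAIL", "HAIL", "FLOOD"),
                  Injuries = c(0, 3, 1, 0, 2),
                  stringsAsFactors = FALSE)

# Every row is one recorded event, so a column of ones sums to a count.
toy$EventCount <- 1

# Sum both columns within each event type.
toy_totals <- aggregate(cbind(Injuries, EventCount) ~ Event, toy, sum)
print(toy_totals)
```

Here EventCount in toy_totals is the number of rows per event type, obtained in the same pass as the injury totals.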

Sum fatalities, injuries, property damage, and crop damage by event type, then find the top 5 severe weather event types by combined fatalities and injuries.

totals = aggregate(cbind(as.integer(Fatalities), as.integer(Injuries), 
                         as.integer(Property), as.integer(Crop), EventCount) ~ Event, storms, sum)
names(totals) = c("Event", "Fatalities", "Injuries", "Property", "Crop", "EventCount")
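COUNT = 5
# Return the n-th largest value of v, used below as the cut-off for the top n.
top5 = function(v, n=COUNT) {
    i = length(v) - n + 1     # position of the n-th largest value in ascending order
    sort(v, partial=i)[i]     # partial sort: only position i is guaranteed correct
}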
p_dmg = totals[totals$Fatalities + totals$Injuries >= top5(totals$Fatalities + totals$Injuries), 
                 c("Event", "Fatalities", "Injuries")]
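
The threshold approach above can be checked on a short vector. This is a minimal sketch with invented numbers: it finds the fifth-largest value with a partial sort, confirms that a full sort gives the same cut-off, and keeps every value at or above it.

```r
# Invented totals for seven hypothetical event types.
v <- c(12, 850, 3, 40, 97, 5, 310)
n <- 5

# Position of the n-th largest value when v is in ascending order.
i <- length(v) - n + 1

# A partial sort only guarantees that position i holds the right value.
threshold <- sort(v, partial = i)[i]

# A full sort followed by tail() gives the same cut-off.
threshold_full <- tail(sort(v), n)[1]

# Keep everything at or above the cut-off, in the original order.
top <- v[v >= threshold]
print(top)
```

If several values tie at the cut-off, this selection returns more than n rows, which is worth keeping in mind when reading a "top 5" list built this way.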

The chart below shows that TORNADO causes by far the most fatalities and injuries of any severe weather event type.

library(lattice)
barchart(Fatalities + Injuries ~ Event, p_dmg, 
         xlab="Severe Weather Event", ylab="Fatalities / Injuries", auto.key=TRUE, stack=FALSE)

Figure: bar chart of fatalities and injuries for the top 5 severe weather event types

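However, this may partly reflect how often tornadoes are recorded in the United States, so we also extract the five most frequently recorded event types. Note that the totals in the Results section list 189928 TSTM WIND events against 49289 TORNADO events, so frequency alone does not explain the tornado casualties.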

e_times = totals[totals$EventCount>=tail(sort(totals$EventCount), 5)[1], c(1,6)]

We take the sum of injuries and fatalities as a measure of the impact on population health, and create a new variable dmg holding that sum for each event type.

Then we divide dmg by EventCount to create dmg_rate, the average number of injuries and fatalities per recorded event, as a measure of the relative harm of each event type.

totals$dmg = totals$Injuries + totals$Fatalities
totals$dmg_rate = totals$dmg / totals$EventCount
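
As a worked check of this arithmetic, the sketch below recomputes dmg and dmg_rate for two rows, using the Fatalities, Injuries, and EventCount values printed in the table of totals in the Results section. The data frame name toy is only for illustration.

```r
# Values copied from the table of totals printed in the Results section.
toy <- data.frame(Event = c("TORNADO", "EXCESSIVE HEAT"),
                  Fatalities = c(4658, 1416),
                  Injuries = c(80084, 4354),
                  EventCount = c(49289, 991),
                  stringsAsFactors = FALSE)

# Total casualties per event type.
toy$dmg <- toy$Injuries + toy$Fatalities

# Average casualties per recorded event.
toy$dmg_rate <- toy$dmg / toy$EventCount
print(toy)
```

TORNADO has far more casualties in total, while EXCESSIVE HEAT has the higher rate per recorded event, which is why the two rankings can differ.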

Order the dataset by the most fatal/injurious weather events

totalMostDangerous = totals[with(totals, order(-dmg)), ]
MostDangerousPerEvent = totals[with(totals, order(-dmg_rate)), ]
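
The two orderings can give different rankings. A minimal sketch on invented values, showing descending order both by negating the key and by using the decreasing argument of order:

```r
# Invented event labels and values for illustration only.
toy <- data.frame(Event = c("A", "B", "C", "D"),
                  dmg = c(10, 500, 75, 20),
                  dmg_rate = c(2.0, 0.5, 7.5, 0.1),
                  stringsAsFactors = FALSE)

# Descending by total: negate the numeric key.
by_total <- toy[order(-toy$dmg), ]

# Descending by rate: equivalent, using the decreasing argument.
by_rate <- toy[order(toy$dmg_rate, decreasing = TRUE), ]

# head() keeps the first rows of each ranking.
print(head(by_total, 2))
print(head(by_rate, 2))
```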

Create datasets limited to only the five most dangerous events

top5MostDangerous = totalMostDangerous[1:5,]
top5MostDangerousPerEvent = MostDangerousPerEvent[1:5,]

Finding the events that cause the most expensive damage

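Sum PROPDMG (property damage) and CROPDMG (crop damage) for an analysis of total damage expenses. The magnitude columns PROPDMGEXP and CROPDMGEXP were not loaded, so these totals are raw figures without the K/M/B multipliers applied.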

totals$mdmg = totals$Property + totals$Crop
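
In the NOAA data, the PROPDMG and CROPDMG figures are paired with magnitude codes in PROPDMGEXP and CROPDMGEXP, which the read step in this report skips. The following is a minimal sketch of how such codes could be applied, assuming only the codes K, M, and B (thousands, millions, billions) matter and treating anything else as a multiplier of 1. The rows and the PropertyUSD column are hypothetical.

```r
# Hypothetical rows; the EXP columns were not loaded in this report.
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 7),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Assumed lookup: K = thousands, M = millions, B = billions.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; blank or unrecognised codes give NA.
m <- unname(multiplier[toupper(toy$PROPDMGEXP)])

# Fall back to 1 for those codes (an assumption of this sketch).
m[is.na(m)] <- 1

# Damage in dollars under the assumptions above.
toy$PropertyUSD <- toy$PROPDMG * m
print(toy)
```

Totals computed after this kind of scaling could rank event types differently from the raw sums used here.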

Then find the top 5 event types with the greatest economic consequences.

m_dmg = totals[totals$Property + totals$Crop >= top5(totals$Property + totals$Crop), 
                 c("Event", "Property", "Crop")]

On these raw totals, TORNADO also has the greatest combined property and crop damage.

Results

The most dangerous weather events

Graph of the most dangerous weather events

library(ggplot2)
g = ggplot(top5MostDangerous,aes(Event,dmg_rate))
g = g + ylab("Injuries and Fatalities per Event")
g = g + xlab("Weather Events")
g = g + geom_bar(stat="identity")
g = g + ggtitle("Top 5 Most Dangerous Weather Events")
g = g + theme(axis.text.x=element_text(angle=90))
print(g)

Figure: bar chart of dmg_rate for the top 5 most dangerous weather events

Top 5 Most Dangerous - table of values

top5MostDangerous
##              Event Fatalities Injuries Property  Crop EventCount   dmg
## 829        TORNADO       4658    80084  2571436 39105      49289 84742
## 851      TSTM WIND        471     6452  1058797 95361     189928  6923
## 170          FLOOD        258     6499   339304 60838       9586  6757
## 130 EXCESSIVE HEAT       1416     4354       53     2        991  5770
## 463      LIGHTNING        562     3628   334017  2407       9953  4190
##     dmg_rate
## 829  1.71929
## 851  0.03645
## 170  0.70488
## 130  5.82240
## 463  0.42098

Graph of the events with the greatest economic consequences

library(lattice)
barchart(Property + Crop ~ Event, m_dmg, 
         xlab="Severe Weather Event", ylab="Property / Crop Damage", auto.key=TRUE, stack=FALSE)

Figure: bar chart of property and crop damage for the top 5 event types

References