Severe weather events such as storms can have a great impact on public health and property.
In this project, I examine data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The recorded events start in 1950 and end in November 2011. The questions I would like to address are: 1) which types of events are most harmful to population health? 2) which types of events have the greatest economic consequences?
The original data can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
First, load the required packages:
library(knitr)
library(ggplot2)
opts_chunk$set(echo = TRUE)
Download the file, extract it to the working directory, then load it:
data <- read.csv("repdata-data-StormData.csv",
header = TRUE, sep = ",", dec = ".")
The data set contains 902,297 observations of 37 variables.
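Because `read.csv()` opens a file path through `file()`, which detects bzip2 compression when reading, the manual extraction step can likely be skipped. The sketch below demonstrates this on a tiny temporary file written with `bzfile()`; the three-row CSV is made up for illustration and is not NOAA data, and the commented download lines are an untested suggestion for the real file.

```r
# Write a small bzip2-compressed CSV to a temporary file.
# The contents are made up; they only mimic a few StormData columns.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,2",
             "HAIL,0,1"), con)
close(con)

# read.csv() reads the compressed file directly; no manual extraction needed.
storm <- read.csv(tmp)
print(dim(storm))

# For the real data, something like this should work (not run here):
# download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#               destfile = "StormData.csv.bz2", mode = "wb")
# data <- read.csv("StormData.csv.bz2")
```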
1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Let’s look at injuries and fatalities by event type:
event.type <- data$EVTYPE
injure <- data$INJURIES
fatal <- data$FATALITIES
combined.harm <- injure + fatal
#subset a portion from the original data
sub.data <- data.frame(event.type, combined.harm)
sum.harm <- aggregate(combined.harm ~ event.type, sub.data, sum)
# row indices sorted by total harm, largest first
harm.order <- order(sum.harm$combined.harm, decreasing = TRUE)
order.sum.harm <- sum.harm[harm.order, ]
colnames(order.sum.harm) <- c("Events",
"Total_Injuries_and_Fatalities")
high.20.harm <- head(order.sum.harm, 20)
# reorder the factor levels so the bars plot in descending order
order.harm.factor <- reorder(high.20.harm$Events,
-high.20.harm$Total_Injuries_and_Fatalities)
head(high.20.harm)
## Events Total_Injuries_and_Fatalities
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 275 HEAT 3037
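The events causing the most combined injuries and fatalities are TORNADO (96,979), EXCESSIVE HEAT (8,428) and TSTM WIND (7,461), followed by FLOOD (7,259), LIGHTNING (6,046) and HEAT (3,037). Tornadoes alone account for more than ten times as many injuries and fatalities as the next event type.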
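Summing injuries and fatalities weights a death the same as an injury, so an event type with many minor injuries can outrank one with fewer but deadlier outcomes. A minimal sketch of ranking each measure separately with `aggregate()` and a `cbind()` formula follows; the toy records are made up to show how the ranking can flip and are not taken from the NOAA data.

```r
# Made-up toy records (not NOAA data) with two event types.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TORNADO", "EXCESSIVE HEAT"),
  FATALITIES = c(1, 5, 0, 3),
  INJURIES   = c(20, 2, 15, 1)
)

# Sum both columns per event type in one aggregate() call.
by.event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
by.event$COMBINED <- by.event$FATALITIES + by.event$INJURIES

# Rank event types by each measure separately.
by.fatal <- by.event[order(by.event$FATALITIES, decreasing = TRUE), ]
by.combined <- by.event[order(by.event$COMBINED, decreasing = TRUE), ]
print(by.fatal$EVTYPE)     # EXCESSIVE HEAT first (8 vs 1 fatalities)
print(by.combined$EVTYPE)  # TORNADO first (36 vs 11 combined)
```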
Let’s visualize the top 20 event types in a bar plot:
plot1 <- ggplot(high.20.harm,
                aes(x = order.harm.factor, y = Total_Injuries_and_Fatalities)) +
  geom_col()  # bar heights are the summed values themselves
plot1 + labs(title = "Total Injuries and Fatalities versus Events",
             x = "Event Types", y = "Number of Injuries/Fatalities") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
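ggplot2 draws a discrete axis in the order of the factor's levels, which are alphabetical by default; that is why the script calls `reorder()` with the negated totals before plotting. A small illustration with made-up totals:

```r
events <- factor(c("HAIL", "TORNADO", "FLOOD"))
totals <- c(5, 100, 20)   # made-up totals, one per event

# Default levels are alphabetical.
print(levels(events))              # "FLOOD" "HAIL" "TORNADO"

# reorder() sorts levels by the mean of the second argument per level,
# in ascending order; negating the totals puts the largest total first.
ordered.events <- reorder(events, -totals)
print(levels(ordered.events))      # "TORNADO" "FLOOD" "HAIL"
```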
2: Across the United States, which types of events have the greatest economic consequences?
First, let’s extract the property and crop damage columns and relate them to event type:
prop.damage <- data$PROPDMG
crop.damage <- data$CROPDMG
combined.damage <- prop.damage + crop.damage
#subset a portion from the original data
sub.data2 <- data.frame(event.type, combined.damage)
sum.damage <- aggregate(combined.damage ~ event.type, sub.data2,sum)
# row indices sorted by total damage, largest first
order2 <- order(sum.damage$combined.damage, decreasing = TRUE)
order.sum.damage <- sum.damage[order2,]
colnames(order.sum.damage) <- c("Events", "Total_Economy_Damage")
high.20.damage <- head(order.sum.damage, 20)
# reorder the factor levels so the bars plot in descending order
order.damage.factor <- reorder(high.20.damage$Events,
-high.20.damage$Total_Economy_Damage)
head(high.20.damage)
## Events Total_Economy_Damage
## 834 TORNADO 3312276.7
## 153 FLASH FLOOD 1599325.1
## 856 TSTM WIND 1445168.2
## 244 HAIL 1268289.7
## 170 FLOOD 1067976.4
## 760 THUNDERSTORM WIND 943635.6
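The table also shows that EVTYPE labels are not standardized: TSTM WIND and THUNDERSTORM WIND describe the same kind of event but are counted separately. A minimal sketch of a hypothetical `normalize.evtype()` helper follows; its few rules are illustrative assumptions, not a complete mapping to the official NWS event types. Applied to the totals from the table above, the merged thunderstorm-wind total (about 2.39 million) would rank ahead of FLASH FLOOD.

```r
# Hypothetical helper; the rules below are illustrative assumptions.
normalize.evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))    # unify case, drop stray spaces
  x <- gsub(" +", " ", x)                  # collapse repeated spaces
  x <- sub("^TSTM ", "THUNDERSTORM ", x)   # TSTM abbreviates THUNDERSTORM
  x <- sub("WINDS$", "WIND", x)            # treat plural and singular alike
  x
}

# Totals copied from the damage table above.
totals <- data.frame(
  Events = c("TSTM WIND", "THUNDERSTORM WIND", "FLASH FLOOD"),
  Total_Economy_Damage = c(1445168.2, 943635.6, 1599325.1)
)
totals$Events <- normalize.evtype(totals$Events)

# Sums can be re-aggregated safely, so merging labels is just another sum.
merged <- aggregate(Total_Economy_Damage ~ Events, data = totals, FUN = sum)
print(merged[order(merged$Total_Economy_Damage, decreasing = TRUE), ])
```

In the full analysis the helper would be applied before any aggregation, for example `event.type <- normalize.evtype(data$EVTYPE)`, so that both the harm and the damage totals use the merged labels.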
Next, plot the combined damage for the top 20 event types:
plot2 <- ggplot(high.20.damage,
                aes(x = order.damage.factor, y = Total_Economy_Damage)) +
  geom_col()  # bar heights are the summed values themselves
plot2 + labs(title = "Total Economic Damage versus Events",
             x = "Event Types", y = "Combined PROPDMG + CROPDMG (unscaled)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
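By this measure, tornadoes caused the most combined property and crop damage between 1950 and 2011, followed by flash floods and thunderstorm winds (TSTM WIND). Note, however, that these totals ignore the magnitude codes stored in PROPDMGEXP and CROPDMGEXP, so they are not dollar amounts, and the ranking may change once each value is scaled by its multiplier.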
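The damage totals above add PROPDMG and CROPDMG directly, but in the NOAA data each amount is paired with a magnitude code in PROPDMGEXP or CROPDMGEXP, such as "K" (thousands), "M" (millions) or "B" (billions). A minimal sketch of converting the amounts to dollars before summing follows. The helper `exp.to.multiplier()` is hypothetical, its handling of the less common codes ("H", single digits, and symbols such as "+", "-" or "?") is an assumption, and the toy rows are made up rather than taken from the data set.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumptions: H = 1e2, K = 1e3, M = 1e6, B = 1e9 (case-insensitive),
# a single digit d means 10^d, and anything else ("", "+", "-", "?", NA) means 1.
exp.to.multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  code[is.na(code)] <- ""
  mult <- rep(1, length(code))
  mult[code == "H"] <- 1e2
  mult[code == "K"] <- 1e3
  mult[code == "M"] <- 1e6
  mult[code == "B"] <- 1e9
  is.digit <- grepl("^[0-9]$", code)
  mult[is.digit] <- 10^as.numeric(code[is.digit])
  mult
}

# Made-up toy rows: a raw sum would rank TORNADO (265) far above FLOOD (2.5).
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "TORNADO"),
  PROPDMG    = c(2.5, 250, 10),
  PROPDMGEXP = c("B", "K", "M"),
  CROPDMG    = c(0, 0, 5),
  CROPDMGEXP = c("", "", "K")
)
toy$DAMAGE.USD <- toy$PROPDMG * exp.to.multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp.to.multiplier(toy$CROPDMGEXP)

usd.by.event <- aggregate(DAMAGE.USD ~ EVTYPE, data = toy, FUN = sum)
print(usd.by.event)  # FLOOD 2.5e9 vs TORNADO 10,255,000
```

With the full data, the same helper would replace the raw `combined.damage` sum, for example `data$PROPDMG * exp.to.multiplier(data$PROPDMGEXP) + data$CROPDMG * exp.to.multiplier(data$CROPDMGEXP)`, after which the aggregation and plotting steps stay the same.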