Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. The goal of this study is to explore the NOAA Storm Database and answer some basic questions about severe weather events.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the data set from here: Storm Data.
Other documentation can be found here:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
For question 1, the top 10 most harmful events in the US are Tornado, Excessive Heat, TSTM Wind, Flood, Lightning, Heat, Flash Flood, Ice Storm, Thunderstorm Wind, and Winter Storm. Tornado, the most harmful event type, accounts for 96979 fatalities and injuries combined, more than ten times the total for the next most harmful event type, Excessive Heat (8428). For question 2, the top 10 most costly events in the US are Tornado, Flood, Hail, Flash Flood, Drought, Hurricane, TSTM Wind, Hurricane/Typhoon, High Wind, and Wildfire. Tornado, the costliest event type, has almost double the cost of Flood.
library(data.table)
library(ggplot2)
dat <- read.csv("repdata-data-StormData.csv.bz2")
# names(dat)
# str(dat)
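No separate decompression step is needed above because R's file connections detect bzip2 compression when a file is opened for reading. The following is a minimal sketch of that behaviour on a throwaway file; the toy data frame and the temporary path are made up for illustration and are not part of the storm data.

```r
# Write a tiny CSV through a bzip2 connection, then read it back with plain read.csv().
# The toy data and the temporary file are illustrative assumptions only.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which recognises the compressed format
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(back)
```

Passing stringsAsFactors = FALSE keeps EVTYPE as character, which avoids surprises when the values are later transformed.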
Convert the event type values to uppercase
Aggregate the data by event type
Find the top 10 events
# 1. EVTYPE (event type) contains mixed-case values, so convert them to uppercase
dat$EVTYPE <- toupper(dat$EVTYPE)
eventTypeUnique <- unique(dat$EVTYPE)
# 2. Aggregate fatalities and injuries by event type
results <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = dat, FUN = sum, na.rm = TRUE)
## Keep only event types with at least one fatality and at least one injury
results <- subset(results, FATALITIES > 0 & INJURIES > 0)
rownames(results) <- NULL
results$total <- as.numeric(results$FATALITIES) + as.numeric(results$INJURIES)
# 3. Reorder and get top 10
results <- results[order(results$total,decreasing=TRUE),]
top10 <- results[1:10,]
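To see why the uppercase conversion matters before aggregating, here is a small sketch on invented data. The toy values are assumptions chosen only to show that mixed-case labels would otherwise be counted as separate event types.

```r
# Toy data (invented): the same event types appear with different capitalisation
toy <- data.frame(
  EVTYPE     = c("Tornado", "TORNADO", "Flood", "flood", "Hail"),
  FATALITIES = c(2, 3, 1, 0, 0),
  INJURIES   = c(10, 5, 0, 4, 1),
  stringsAsFactors = FALSE
)

n_before <- length(unique(toy$EVTYPE))  # mixed case inflates the number of types
toy$EVTYPE <- toupper(toy$EVTYPE)
n_after <- length(unique(toy$EVTYPE))   # duplicates collapse after toupper()

# Sum both harm measures per event type, then rank by their combined total
toy_sum <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy_sum$total <- toy_sum$FATALITIES + toy_sum$INJURIES
toy_sum <- toy_sum[order(toy_sum$total, decreasing = TRUE), ]
print(toy_sum)
```

Uppercasing only merges labels that differ by case. Near-duplicates with different spellings, such as TSTM WIND and THUNDERSTORM WIND, remain separate categories.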
Plot the data for the top ten events
## Reorder the event types by total so the plot runs from most to least dangerous
top10 <- transform(top10, EVTYPE = reorder(EVTYPE, total))
## Use ggplot2 for a better-looking figure
ggplot(data = top10, aes(EVTYPE, total, fill = total)) +
  geom_bar(stat = "identity", position = "dodge", fill = rainbow(n = length(top10$EVTYPE))) +
  xlab("Event Type") +
  ylab("Total (Fatalities + Injuries)") +
  ggtitle("Top 10 Most Harmful Events (Fatalities + Injuries)") +
  theme(legend.position = "none") +
  coord_flip()
Based on the plot, Tornado is the most harmful event type in terms of combined fatalities and injuries.
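The bar order in the plot depends on the factor levels of the event type, which reorder() sets from a numeric score. A minimal sketch with invented totals shows the effect; the toy values are assumptions, not figures from the storm data.

```r
# Toy totals (invented) in arbitrary row order
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
  total  = c(50, 900, 300),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the score, ascending
toy$EVTYPE <- reorder(toy$EVTYPE, toy$total)
print(levels(toy$EVTYPE))
```

With coord_flip(), the first level is drawn at the bottom of the chart, so ascending levels place the largest bar at the top.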
Convert PROPDMGEXP and CROPDMGEXP to uppercase
Multiply the property and crop damage values by the multiplier indicated by PROPDMGEXP and CROPDMGEXP
Find the top 10 events for the combined property and crop damage estimates
# 1. Convert PROPDMGEXP and CROPDMGEXP to uppercase
dat$PROPDMGEXP <- toupper(dat$PROPDMGEXP)
dat$CROPDMGEXP <- toupper(dat$CROPDMGEXP)
# 2. Multiply the property and crop damage values by the multiplier indicated by PROPDMGEXP and CROPDMGEXP
expSymbols <- c("K", "M", "B")
expVal <- c(1e+3, 1e+6, 1e+9)
for (i in seq_along(expSymbols)) {
  # which() drops NA comparisons, so rows with a missing code are left unchanged
  propRows <- which(dat$PROPDMGEXP == expSymbols[i])
  cropRows <- which(dat$CROPDMGEXP == expSymbols[i])
  dat$PROPDMG[propRows] <- dat$PROPDMG[propRows] * expVal[i]
  dat$CROPDMG[cropRows] <- dat$CROPDMG[cropRows] * expVal[i]
}
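The same scaling can also be written without a loop by looking up each code in a named vector. This is a sketch on invented rows, not the code used for the results in this report; the column name PROPDMG_USD is hypothetical. As in the loop above, any code other than K, M, or B leaves the value unscaled.

```r
# Toy rows (invented): damage values with their magnitude codes
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: code -> multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Indexing by name returns NA for codes that are not in the lookup (including "")
mult <- multiplier[toupper(toy$PROPDMGEXP)]
mult[is.na(mult)] <- 1  # unrecognised codes: keep the raw value

# PROPDMG_USD is a hypothetical column name used only in this sketch
toy$PROPDMG_USD <- toy$PROPDMG * unname(mult)
print(toy)
```

Writing the result to a new column keeps the raw values available, so running the step twice cannot scale the same rows twice.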
# head(dat)
EconomicCost <- c(dat$CROPDMG + dat$PROPDMG)
EventType <- c(dat$EVTYPE)
datNew <- data.frame(EventType, EconomicCost)
# head(datNew)
datNew <- aggregate(EconomicCost~EventType, data = datNew, sum)
# 3. Find the top 10 events for the combined property and crop damage estimates
EcoTop10 <- datNew[order(datNew$EconomicCost, decreasing=TRUE),][1:10,]
## Reorder the event types by cost so the plot runs from most to least costly
EcoTop10 <- transform(EcoTop10, EventType = reorder(EventType, EconomicCost))
## Size the colour palette from EcoTop10 itself rather than from top10
ggplot(data = EcoTop10, aes(EventType, EconomicCost, fill = EconomicCost)) +
  geom_bar(stat = "identity", position = "dodge", fill = rainbow(n = nrow(EcoTop10))) +
  xlab("Event Type") +
  ylab("Total Economic Cost") +
  ggtitle("Top 10 Most Costly Events") +
  theme(legend.position = "none") +
  coord_flip()
From the chart above, Tornado is the costliest event type in the US.
After analyzing the data, I found that, in the United States, Tornado is both the most harmful and the costliest event type. The other event types are also harmful and costly, but less so than tornadoes. These results suggest that the US government should pay particular attention to tornadoes in order to reduce deaths, injuries, and economic losses.