Abstract

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. The goal of this study is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

Introduction

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The data set can be downloaded from here: Storm Data.

Other documentation can be found here:

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Questions to answer:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

Results:

For question 1, the top 10 most harmful events in the US are Tornado, Excessive Heat, TSTM Wind, Flood, Lightning, Heat, Flash Flood, Ice Storm, Thunderstorm Wind, and Winter Storm. Tornado, the most harmful event, accounts for 96,979 fatalities and injuries combined, more than ten times the total for the next most harmful event, Excessive Heat (8,428). For question 2, the top 10 most costly events in the US are Tornado, Flood, Hail, Flash Flood, Drought, Hurricane, TSTM Wind, Hurricane/Typhoon, High Wind, and Wildfire. The estimated cost of Tornado, the most costly event, is almost double that of Flood.
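
As a quick check of the question 1 comparison, the ratio of the two reported totals can be computed directly. This is a minimal sketch that uses only the two figures quoted above; the variable names are illustrative.

```r
# Combined fatalities and injuries, as quoted in the Results paragraph
tornado_total <- 96979
excessive_heat_total <- 8428

# How many times larger the tornado total is than the excessive heat total
ratio <- tornado_total / excessive_heat_total
round(ratio, 1)
```

The ratio is a little above eleven, which is consistent with the "more than ten times" statement.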

Methods

Data loading

library(data.table)
library(ggplot2)
# read.csv() can read the bzip2-compressed file directly, without unpacking it first
dat <- read.csv("repdata-data-StormData.csv.bz2")
# names(dat)
# str(dat)

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Data Processing:

  1. Convert the event type values (EVTYPE) to uppercase so that case variants are counted together

  2. Aggregate the data by event type

  3. Find the top 10 events

# 1. EVTYPE (event type) contains mixed-case values, so convert it to uppercase
dat$EVTYPE <- toupper(dat$EVTYPE)
eventTypeUnique <- unique(dat$EVTYPE)



# 2. Aggregate data based on the event type
results <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = dat, FUN = sum, na.rm = TRUE)
## Keep only event types with at least one fatality and at least one injury
results <- subset(results, FATALITIES > 0 & INJURIES > 0)
rownames(results) <- NULL
results$total <- as.numeric(results$FATALITIES)+ as.numeric(results$INJURIES)


# 3. Reorder and get top 10
results <- results[order(results$total,decreasing=TRUE),]
top10 <- results[1:10,]
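
The three processing steps above can be traced on a tiny data set. The sketch below uses a hypothetical toy data frame (the rows and the names `toy`, `toy_sums`, and `toy_top` are made up for illustration) rather than the storm data itself.

```r
# Hypothetical toy records; the real data has one row per recorded event
toy <- data.frame(
  EVTYPE     = c("Tornado", "TORNADO", "Heat", "HEAT", "Hail"),
  FATALITIES = c(2, 3, 1, 0, 0),
  INJURIES   = c(10, 5, 0, 4, 0),
  stringsAsFactors = FALSE
)

# 1. Normalize case so "Tornado" and "TORNADO" count as one event type
toy$EVTYPE <- toupper(toy$EVTYPE)

# 2. Sum fatalities and injuries per event type, then combine them
toy_sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy_sums$total <- toy_sums$FATALITIES + toy_sums$INJURIES

# 3. Sort by the combined total and keep the top entries (top 2 here)
toy_top <- head(toy_sums[order(toy_sums$total, decreasing = TRUE), ], 2)
toy_top
```

Without the uppercase step, "Tornado" and "TORNADO" would be summed as two separate event types, which is why the conversion comes first.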

Plotting Step:

Plot the data for the top ten events

## Reorder the event types by total so the plot runs from most to least dangerous
top10 <- transform(top10, EVTYPE = reorder(EVTYPE, total))

## Use ggplot to have better figure
library(ggplot2)
ggplot(data = top10, aes(EVTYPE, total, fill = total)) +
  geom_bar(stat = "identity", position = "dodge", fill = rainbow(n = length(top10$EVTYPE))) +
  xlab("Events Type") +
  ylab("Total (Fatalities + Injuries)") +
  ggtitle("Top 10 Most Harmful Events (Fatalities + Injuries)") +
  theme(legend.position = "none") +
  coord_flip()

Based on the plot, we can tell that Tornado is the most harmful event of all in terms of combined fatalities and injuries.
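
The bar order in the plot comes from the factor levels of EVTYPE, which `reorder()` sorts by a numeric score. A minimal sketch with hypothetical totals (the data frame `toy` and its values are illustrative, not taken from the storm data):

```r
# Hypothetical totals, deliberately not sorted
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
  total  = c(30, 100, 10),
  stringsAsFactors = FALSE
)

# reorder() sorts the factor levels by the supplied scores, in ascending order
toy$EVTYPE <- reorder(toy$EVTYPE, toy$total)
levels(toy$EVTYPE)
```

Because `coord_flip()` draws the first level at the bottom of the axis, ascending levels put the event with the largest total at the top of the chart.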

Question 2: Across the United States, which types of events have the greatest economic consequences?

Data Processing Step:

  1. Convert PROPDMGEXP and CROPDMGEXP to uppercase

  2. Multiply the property and crop damage values by the multiplier indicated by PROPDMGEXP and CROPDMGEXP (K = thousand, M = million, B = billion)

  3. Find the top 10 events for the combined property and crop damage estimates.

# 1. Convert PROPDMGEXP and CROPDMGEXP to uppercase
dat$PROPDMGEXP <- toupper(dat$PROPDMGEXP)
dat$CROPDMGEXP <- toupper(dat$CROPDMGEXP)

# 2. Multiply the property and crop damage values by the multiplier indicated by PROPDMGEXP and CROPDMGEXP
#    Rows with any other exponent code are left unscaled
symbols <- c("K", "M", "B")
expVal <- c(1e+3, 1e+6, 1e+9)
for (i in seq_along(symbols)) {
  isProp <- dat$PROPDMGEXP == symbols[i]
  isCrop <- dat$CROPDMGEXP == symbols[i]
  dat$PROPDMG[isProp] <- dat$PROPDMG[isProp] * expVal[i]
  dat$CROPDMG[isCrop] <- dat$CROPDMG[isCrop] * expVal[i]
}

# head(dat)
EconomicCost <- c(dat$CROPDMG + dat$PROPDMG)
EventType <- c(dat$EVTYPE)

datNew <- data.frame(EventType, EconomicCost)
# head(datNew)
datNew <- aggregate(EconomicCost~EventType, data = datNew, sum)

# 3. Find the top 10 events for the combined property and crop damage estimates.
EcoTop10 <- datNew[order(datNew$EconomicCost, decreasing=TRUE),][1:10,]
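
The exponent scaling in step 2 can also be written without a loop by using a named lookup vector. The sketch below is an alternative formulation on hypothetical toy rows, not the code used for the reported results; the names `toy`, `multiplier`, and `mult` are illustrative, and treating unrecognized codes as a multiplier of 1 is an assumption that mirrors the loop's behavior.

```r
# Hypothetical toy rows; "" and "5" stand in for codes other than K, M, B
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "M", "B", "", "5"),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent code -> multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each row's multiplier; codes not in the table come back as NA
mult <- unname(multiplier[toy$PROPDMGEXP])

# Assumption: leave rows with unrecognized codes unscaled
mult[is.na(mult)] <- 1

toy$PROPDMG <- toy$PROPDMG * mult
toy
```

The lookup makes the handling of unrecognized codes explicit in one place, which is easy to change if those codes should be treated differently.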

Plotting Step:

library(ggplot2)
## Reorder the event types by cost so the plot runs from most to least costly
EcoTop10 <- transform(EcoTop10, EventType = reorder(EventType, EconomicCost))

## Size the color palette from EcoTop10 itself rather than from the question 1 data
ggplot(data = EcoTop10, aes(EventType, EconomicCost, fill = EconomicCost)) +
  geom_bar(stat = "identity", position = "dodge", fill = rainbow(n = nrow(EcoTop10))) +
  xlab("Events Type") +
  ylab("Total Economic Cost") +
  ggtitle("Top 10 Most Costly Events") +
  theme(legend.position = "none") +
  coord_flip()

From the above chart, we can tell that Tornado is the most costly event in the US.

Conclusion

After analyzing the data, I found that, in the United States, Tornado is both the most harmful and the most costly event type. The other events are also harmful and costly, but none as much as tornadoes. The US government should therefore pay close attention to tornado events in order to reduce death, injury, and economic loss.