Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The analysis of the storm event database shows that tornadoes are the weather event most harmful to population health, and excessive heat is the second most harmful event type. The economic impact of weather events was also analyzed: flash floods and thunderstorm winds caused billions of dollars in property damage between 1950 and 2011, while drought caused the largest crop damage, followed by floods and hail.
The analysis was performed on the Storm Events Database, provided by the National Climatic Data Center. The data are distributed as a comma-separated-value file, and documentation of the data is also available.
The first step is to read the comma-separated data file into a data frame with read.csv(). This is done in the "Read data from file" step below.
Before the analysis, the data need some preprocessing. Event types do not follow a consistent format. For instance, the types Frost/Freeze, FROST/FREEZE and FROST\\FREEZE clearly refer to the same kind of event.
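The code in this report does not correct the event-type inconsistency described above; it aggregates the raw EVTYPE values. Below is a minimal sketch of one possible normalization step. The function name normalizeEvtype() and its rules are assumptions chosen for illustration: upper-casing, trimming, treating a backslash as a slash, and collapsing whitespace. Merging all real variants would need further rules.

```r
# Hypothetical helper (not part of the original analysis): normalize
# event-type labels so that simple spelling variants collapse to one value.
normalizeEvtype <- function(x) {
  x <- toupper(trimws(x))           # ignore case and leading/trailing spaces
  x <- gsub("\\\\", "/", x)         # treat a backslash separator as "/"
  x <- gsub("[[:space:]]+", " ", x) # collapse runs of internal whitespace
  x
}

labels <- c("Frost/Freeze", "FROST/FREEZE", "FROST\\FREEZE", " frost/freeze ")
normalizeEvtype(labels)
unique(normalizeEvtype(labels))
```

In the pipeline it would be applied once, e.g. data$EVTYPE <- normalizeEvtype(data$EVTYPE), before the aggregate() call. This would change the per-type totals, which in this report are computed on the raw labels.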
Load required packages
library(ggplot2)  # used for all plots below
Read data from file
file <- "/Users/Malter/Desktop/Coursera/Reproducible Research/Assessment 2/repdata-data-StormData.csv"
# Get info about the file from a sample of rows (not run):
# init <- read.csv(file, sep = ',', header = TRUE,
#                  nrows = 5000, stringsAsFactors = FALSE, quote = '')
# classes <- sapply(init, class)
# cols <- colnames(init)
data <- read.csv(file, header = TRUE, stringsAsFactors = FALSE)
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0
## 2        0     2.5          K       0
## 3        2    25.0          K       0
## 4        2     2.5          K       0
## 5        2     2.5          K       0
## 6        6     2.5          K       0
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
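The commented-out lines in the read step sample the first 5000 rows to learn the column classes. The sketch below shows one way to use such a sample to read only the columns this analysis needs, via read.csv()'s colClasses argument, where "NULL" skips a column. A small temporary CSV with made-up rows stands in for the storm file. One caveat: a class inferred from a sample can be too narrow, for example "logical" for a column that happens to be empty in the first rows.

```r
# Sketch: sniff column classes from a sample, then read only needed columns.
# The temporary file and its two rows are made up for illustration.
csvFile <- tempfile(fileext = ".csv")
writeLines(c(
  "STATE__,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,REMARKS",
  "1,TORNADO,0,15,25,K,0,,some text",
  "1,HAIL,0,0,2.5,M,1,K,more text"
), csvFile)

# 1. Learn the column classes from a sample of rows
sampleRows <- read.csv(csvFile, nrows = 5000, stringsAsFactors = FALSE)
classes <- sapply(sampleRows, class)

# 2. Keep the sniffed class for needed columns; "NULL" tells read.csv to skip
needed <- c("EVTYPE", "FATALITIES", "INJURIES",
            "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
colClasses <- ifelse(names(classes) %in% needed, classes, "NULL")

# 3. Read the full file with only those columns
small <- read.csv(csvFile, colClasses = colClasses, stringsAsFactors = FALSE)
str(small)
```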
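We will use the following variables:
EVTYPE: event type (e.g. flood, tornado)
FATALITIES: number of fatalities
INJURIES: number of injuries
PROPDMG: property damage, scaled by the exponent code in PROPDMGEXP
CROPDMG: crop damage, scaled by the exponent code in CROPDMGEXP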
Clean up the data: convert PROPDMG and CROPDMG to a common scale (US dollars) by replacing the exponent codes in PROPDMGEXP and CROPDMGEXP with numeric multipliers.
# Map property-damage exponent codes to numeric multipliers
data$PROPDMGEXP <- toupper(as.character(data$PROPDMGEXP))
data$PROPDMGEXP[data$PROPDMGEXP %in% c("", "+", "?", "-")] <- "1"  # missing/invalid code
data$PROPDMGEXP[data$PROPDMGEXP == "H"] <- "100"         # hundreds
data$PROPDMGEXP[data$PROPDMGEXP == "K"] <- "1000"        # thousands
data$PROPDMGEXP[data$PROPDMGEXP == "M"] <- "1000000"     # millions
data$PROPDMGEXP[data$PROPDMGEXP == "B"] <- "1000000000"  # billions
# Any remaining codes (e.g. digits) are converted by as.numeric() as-is
data$PROPDMGEXP <- as.numeric(data$PROPDMGEXP)
data$PROPDMGUSD <- data$PROPDMG * data$PROPDMGEXP
# Map crop-damage exponent codes to numeric multipliers
data$CROPDMGEXP <- toupper(as.character(data$CROPDMGEXP))
data$CROPDMGEXP[data$CROPDMGEXP %in% c("", "?")] <- "1"  # missing/invalid code
data$CROPDMGEXP[data$CROPDMGEXP == "K"] <- "1000"        # thousands
data$CROPDMGEXP[data$CROPDMGEXP == "M"] <- "1000000"     # millions
data$CROPDMGEXP[data$CROPDMGEXP == "B"] <- "1000000000"  # billions
data$CROPDMGEXP <- as.numeric(data$CROPDMGEXP)
data$CROPDMGUSD <- data$CROPDMG * data$CROPDMGEXP
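The property and crop conversions apply the same code-to-multiplier idea twice. Below is a sketch of a single shared lookup using a hypothetical helper, expToMultiplier(). It mirrors the rules used here, including passing any other code (such as a digit) through as.numeric() unchanged. That treats a digit as a literal multiplier rather than as a power of ten.

```r
# Hypothetical helper: convert exponent codes to numeric multipliers,
# following the same rules as the conversion code in this report.
expToMultiplier <- function(code) {
  code <- toupper(as.character(code))
  out <- rep(NA_real_, length(code))
  unit  <- code %in% c("", "+", "?", "-")   # missing/invalid codes: factor 1
  named <- code %in% c("H", "K", "M", "B")  # letter codes: powers of ten
  out[unit] <- 1
  out[named] <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code[named]]
  other <- !unit & !named
  # Other codes (e.g. "5") go through as.numeric(); unknown letters become NA
  out[other] <- as.numeric(code[other])
  out
}

expToMultiplier(c("K", "m", "", "B", "?", "h", "5"))
```

It could be used as data$PROPDMGUSD <- data$PROPDMG * expToMultiplier(data$PROPDMGEXP). For CROPDMGEXP the shared table also maps "+", "-" and "H", which the crop block leaves to as.numeric(). Results differ only if those codes occur in that column.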
Aggregate the data per event type and calculate two new columns: health, the total of fatalities and injuries, and damage, the total of property and crop damage.
# Aggregate data per EVTYPE
agg <- aggregate(cbind(FATALITIES, INJURIES, PROPDMGUSD, CROPDMGUSD) ~ EVTYPE, data = data, FUN = sum)
# Add calculated column 'health' as the sum of FATALITIES and INJURIES
agg$health <- agg$FATALITIES + agg$INJURIES
# Add calculated column 'damage' as the sum of PROPDMGUSD and CROPDMGUSD
agg$damage <- agg$PROPDMGUSD + agg$CROPDMGUSD
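One detail of this aggregate() call is worth checking. Its formula method drops incomplete rows by default (na.action = na.omit). A row with NA in any of the four cbind() columns is therefore excluded from every total, fatalities and injuries included. Such an NA could come from an exponent code that as.numeric() could not convert. The made-up toy data below show the effect and one way to keep such rows.

```r
# Made-up toy data: the second FLOOD row has NA property damage
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  FATALITIES = c(1, 2, 0),
  INJURIES   = c(0, 5, 1),
  PROPDMGUSD = c(1e6, NA, 2e3),
  CROPDMGUSD = c(0, 1e3, 0)
)

# Default na.action = na.omit: the incomplete FLOOD row is dropped entirely
dropped <- aggregate(cbind(FATALITIES, INJURIES, PROPDMGUSD, CROPDMGUSD) ~ EVTYPE,
                     data = toy, FUN = sum)

# na.pass keeps the row; na.rm = TRUE is forwarded to sum() to skip the NA
kept <- aggregate(cbind(FATALITIES, INJURIES, PROPDMGUSD, CROPDMGUSD) ~ EVTYPE,
                  data = toy, FUN = sum, na.rm = TRUE, na.action = na.pass)
dropped
kept
```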
Prepare data sets for graphing of health impact
# Examine fatalities on their own
fatalities <- agg[order(agg$FATALITIES, decreasing = TRUE),][1:10,]
fatalities <- transform(fatalities, EVTYPE=reorder(EVTYPE, -FATALITIES) )
fatalities$TYPE <- "FATALITIES"
# Examine combined fatalities and injuries
health <- agg[order(agg$health, decreasing = TRUE),][1:10,]
healthFatalities <- health[,1:2]
names(healthFatalities)[2] <- "PEOPLE"
healthFatalities$TYPE <- "FATALITIES"
healthInjuries <- health[,c(1,3)]
names(healthInjuries)[2] <- "PEOPLE"
healthInjuries$TYPE <- "INJURIES"
healthPlot <- rbind(healthFatalities, healthInjuries)
healthPlot <- transform(healthPlot, EVTYPE=reorder(EVTYPE, -PEOPLE) )
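The health and damage blocks follow the same pattern. Each takes the top ten rows by one column, then stacks two value columns into long format for a stacked bar chart. Below is a sketch of a hypothetical helper, topNLong(), that captures this pattern, with made-up toy numbers. It uses head() rather than [1:10, ], so fewer than ten event types do not produce NA rows.

```r
# Hypothetical helper: top-n rows by one column, stacked into long format
topNLong <- function(df, by, cols, n = 10, valueName = "VALUE") {
  top <- head(df[order(df[[by]], decreasing = TRUE), ], n)
  # Names of 'cols' (if given) become the TYPE labels, else the column names
  typeLabels <- if (is.null(names(cols))) cols else names(cols)
  parts <- lapply(seq_along(cols), function(i) {
    data.frame(EVTYPE = top$EVTYPE, VALUE = top[[cols[i]]],
               TYPE = typeLabels[i], stringsAsFactors = FALSE)
  })
  long <- do.call(rbind, parts)
  # Order bars by decreasing mean value, as reorder(EVTYPE, -PEOPLE) does above
  long$EVTYPE <- reorder(long$EVTYPE, -long$VALUE)
  names(long)[names(long) == "VALUE"] <- valueName
  long
}

# Made-up toy aggregate
toyAgg <- data.frame(EVTYPE = c("TORNADO", "HEAT", "FLOOD"),
                     FATALITIES = c(50, 30, 5), INJURIES = c(900, 100, 400),
                     stringsAsFactors = FALSE)
toyAgg$health <- toyAgg$FATALITIES + toyAgg$INJURIES
plotData <- topNLong(toyAgg, by = "health", cols = c("FATALITIES", "INJURIES"),
                     n = 2, valueName = "PEOPLE")
plotData
levels(plotData$EVTYPE)
```

For the economic plot it could be called as topNLong(agg, by = "damage", cols = c(PROPERTY = "PROPDMGUSD", CROP = "CROPDMGUSD"), valueName = "DAMAGE"), where the names of cols become the TYPE labels.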
Prepare data sets for graphing of economic impact
Select the ten event types with the largest total damage and reshape them for plotting
damage <- agg[order(agg$damage, decreasing = TRUE),][1:10,]
damageProperty <- damage[,c(1,4)]
names(damageProperty)[2] <- "DAMAGE"
damageProperty$TYPE <- "PROPERTY"
damageCrop <- damage[,c(1,5)]
names(damageCrop)[2] <- "DAMAGE"
damageCrop$TYPE <- "CROP"
damagePlot <- rbind(damageProperty, damageCrop)
damagePlot <- transform(damagePlot, EVTYPE=reorder(EVTYPE, -DAMAGE) )
damagePlot$DAMAGE <- damagePlot$DAMAGE / 1000000
Number of fatalities
Tornadoes are the most dangerous event type by number of fatalities.
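# ggplot() replaces qplot(), which recent ggplot2 releases deprecate;
# geom_col() draws bars at the given values, like geom_bar(stat = "identity")
ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES, fill = TYPE)) +
  geom_col() +
  labs(title = "Fatalities", x = "", y = "Number of people") +
  scale_fill_discrete("") +
  theme(axis.text.x = element_text(angle = 90))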
Combined number of fatalities and injuries
Tornadoes are also the most dangerous event type by the combined number of fatalities and injuries.
# Fatalities and injuries are stacked per event type (geom_col() stacks by default)
ggplot(healthPlot, aes(x = EVTYPE, y = PEOPLE, fill = TYPE)) +
  geom_col() +
  labs(title = "Fatalities & Injuries", x = "", y = "Number of people") +
  scale_fill_discrete("") +
  theme(axis.text.x = element_text(angle = 90))
Which event type causes the most economic damage
Flood is the most economically damaging event type.
# Property and crop damage (in millions of USD) are stacked per event type
ggplot(damagePlot, aes(x = EVTYPE, y = DAMAGE, fill = TYPE)) +
  geom_col() +
  labs(title = "Economic damage", x = "", y = "Damage in million $") +
  scale_fill_discrete("") +
  theme(axis.text.x = element_text(angle = 90))