This project uses 1950-2011 weather event data from NOAA to determine which storm events across the United States have the most impact on public health (fatalities and injuries) and the largest economic consequences (property and crop damage).
Overall, the analysis found that tornados are both the deadliest and the most injurious weather event across the United States, with 5633 fatalities and 91346 injuries. The next largest causes of fatalities were heat and floods. In terms of economic damage, floods, hurricanes, and tornados produced the most property damage, while droughts, floods, and ice storms produced the most crop damage.
library(R.utils)
library(ggplot2)
library(reshape2)
library(dplyr)
Before loading the data, the file was unzipped into the working directory. Because the file is large, reading it takes a few minutes.
setwd("~/Data_Science_Specialization/5_ReproducibleResearch/CourseProj2")
#bunzip2("repdata-data-StormData.csv.bz2", "stormData.csv")
storm <- read.csv("stormData.csv")
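As an aside, base R can usually read a bzip2-compressed CSV without a separate unzip step, because `read.csv()` opens its input through `file()`, which detects compression when reading. A minimal sketch, using a small made-up data frame and a temporary file (both are illustrative assumptions, not part of the storm data set):

```r
# Toy data standing in for the storm file (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write a bzip2-compressed CSV to a temporary location.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads the compressed file directly.
toy_back <- read.csv(tmp)
print(toy_back)
```

Whether this is faster than unzipping first depends on the machine, so it is worth timing both on the real file.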
Next, we explore the data briefly to see what the analysis requires.
str(storm)
head(storm)
summary(storm)
colnames(storm)
[1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
[6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
[11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
[16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
[21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
[26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
[31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
[36] "REMARKS" "REFNUM"
Subset only the necessary columns of the data to make it more manageable.
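# Select by name (columns 8 and 23 to 28) so the subset does not depend on column order
storm <- storm[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
head(storm, 5)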
EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
1 TORNADO 0 15 25.0 K 0
2 TORNADO 0 0 2.5 K 0
3 TORNADO 0 2 25.0 K 0
4 TORNADO 0 2 2.5 K 0
5 TORNADO 0 2 2.5 K 0
This section uses the storm type, fatality, and injury columns. First, we remove all rows that have zero fatalities and zero injuries.
# Keep only events with at least one fatality or injury
keep <- which(storm$FATALITIES != 0 | storm$INJURIES != 0)
storm_1 <- storm[keep, ]
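A note on row filtering: dropping rows with a negative `which()` index and keeping rows with a logical mask give the same result when matches exist, but they differ when nothing matches. The sketch below uses a made-up three-row data frame (an assumption for illustration) in which no row has zero fatalities and zero injuries:

```r
# Toy data (hypothetical): every row has at least one casualty,
# so the "both zero" condition matches nothing.
toy <- data.frame(FATALITIES = c(1, 0, 2), INJURIES = c(0, 3, 0))

none <- which(toy$FATALITIES == 0 & toy$INJURIES == 0)  # integer(0)

# Negative indexing with an empty index selects zero rows, not all rows.
dropped <- toy[-none, ]

# A logical mask keeps every row that has at least one casualty.
kept <- toy[toy$FATALITIES > 0 | toy$INJURIES > 0, ]

nrow(dropped)  # 0
nrow(kept)     # 3
```

The storm data contains many zero-casualty rows, so this edge case does not arise here, but it matters if the same code is reused on a smaller subset.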
Next, the data are summarized over each storm type for fatalities and injuries and arranged in order of most fatal and most injuries.
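# Total fatalities and injuries per event type (aggregate() names the result columns Group.1 and x)
fatal <- aggregate(storm_1$FATALITIES, by = list(storm_1$EVTYPE), FUN = sum)
injured <- aggregate(storm_1$INJURIES, by = list(storm_1$EVTYPE), FUN = sum)
# Combine both totals into one table keyed by event type
health <- merge(fatal, injured, by = "Group.1")
# Sort the individual tables by their total column, x
fatal <- arrange(fatal, desc(x))
injured <- arrange(injured, desc(x))
colnames(health) <- c("EVTYPE", "FATALITIES", "INJURIES")
# Two orderings of the combined table: by fatalities first, and by injuries first
health_FAT <- arrange(health, desc(FATALITIES), desc(INJURIES))
health_INJ <- arrange(health, desc(INJURIES), desc(FATALITIES))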
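The summarize-and-sort step above can also be written with a single `aggregate()` call using the formula interface. This is only a sketch of an alternative, not the method used for the results in this report; the toy data frame and the name `toy_health` are made up for illustration:

```r
# Toy data (hypothetical values) with the same column names as `storm`.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# Sum both casualty columns per event type in one call.
toy_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                        data = toy, FUN = sum)

# Sort by fatalities, breaking ties by injuries (base R counterpart of arrange()).
toy_health <- toy_health[order(-toy_health$FATALITIES, -toy_health$INJURIES), ]
print(toy_health)
```

The formula interface drops rows with missing values by default, which may or may not be wanted on real data.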
Plot the ten storm types that cause the most fatalities and the ten that cause the most injuries.
# Wide bottom margin for the rotated event-type labels; two panels side by side
par(mar = c(11, 6, 4, 1), mgp = c(4, 1, 0))
par(mfrow = c(1, 2))
barplot(health_FAT$FATALITIES[1:10], names.arg = health_FAT$EVTYPE[1:10], las = 2, col = "red",
ylab = "Number of Fatalities")
title(main = "Top 10 Fatalities \n by Weather Event")
barplot(health_INJ$INJURIES[1:10], names.arg = health_INJ$EVTYPE[1:10], las = 2, col = "blue",
ylab = "Number of Injuries")
title(main = "Top 10 Injuries \n by Weather Event")
From the plot we see that tornados produce by far the most fatalities and injuries. Heat, thunderstorms, and floods also produce significant casualties.
First, check the values of PROPDMGEXP and CROPDMGEXP so that the damage figures can be standardized to millions of dollars.
unique(storm$PROPDMGEXP)
[1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
According to the NOAA documentation, the acceptable values for PROPDMGEXP are b (billions), m (millions), and k (thousands). For this analysis, we assume that any other value means the figure in the PROPDMG or CROPDMG column is already in dollars.
econ_storm <- storm
econ_storm$PROPMULT <- NA # Property damage multiplier (converts PROPDMG to millions of USD)
b <- grep("b", econ_storm$PROPDMGEXP, ignore.case = TRUE)
m <- grep("m", econ_storm$PROPDMGEXP, ignore.case = TRUE)
k <- grep("k", econ_storm$PROPDMGEXP, ignore.case = TRUE)
econ_storm$PROPMULT[b] <- 1000 # billions
econ_storm$PROPMULT[m] <- 1 # millions
econ_storm$PROPMULT[k] <- .001 # thousands
econ_storm$PROPMULT[is.na(econ_storm$PROPMULT)] <- .000001 # not b, m, or k: plain dollars
unique(econ_storm$CROPDMGEXP)
[1] M K m B ? 0 k 2
Levels: ? 0 2 B k K m M
econ_storm$CROPMULT <- NA # Crop damage multiplier (converts CROPDMG to millions of USD)
b <- grep("b", econ_storm$CROPDMGEXP, ignore.case = TRUE)
m <- grep("m", econ_storm$CROPDMGEXP, ignore.case = TRUE)
k <- grep("k", econ_storm$CROPDMGEXP, ignore.case = TRUE)
econ_storm$CROPMULT[b] <- 1000 # billions
econ_storm$CROPMULT[m] <- 1 # millions
econ_storm$CROPMULT[k] <- .001 # thousands
econ_storm$CROPMULT[is.na(econ_storm$CROPMULT)] <- .000001 # not b, m, or k: plain dollars
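Since the property and crop multipliers follow the same rule, the mapping could be collapsed into one small helper. The sketch below is one possible way to do that; `exp_to_millions` is a hypothetical name, and the helper matches whole codes exactly rather than using `grep()`, which is equivalent here only because every code is a single character. It keeps the assumption stated above that any code other than B, M, or K means plain dollars.

```r
# Hypothetical helper: map an exponent code to a multiplier in millions of USD.
# Assumption carried over from the text: anything other than B, M, or K
# (in either case) is treated as plain dollars.
exp_to_millions <- function(code) {
  lookup <- c(B = 1000, M = 1, K = 0.001)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1e-6  # unrecognised or empty codes: plain dollars
  unname(mult)
}

# Example with codes that occur in the data, including an empty one.
mults <- exp_to_millions(c("K", "m", "B", "", "5", "?"))
print(mults)
```

With such a helper, a damage column in millions would be the damage figure multiplied by `exp_to_millions()` of the matching exponent column.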
The data are summarized across each storm type by both property and crop damage in millions of dollars.
# Keep only events with some property or crop damage
keep <- which(econ_storm$PROPDMG != 0 | econ_storm$CROPDMG != 0)
storm_1 <- econ_storm[keep, ]
PropDamage <- aggregate(storm_1$PROPDMG*storm_1$PROPMULT,
by = list(storm_1$EVTYPE), FUN = sum)
CropDamage <- aggregate(storm_1$CROPDMG*storm_1$CROPMULT,
by = list(storm_1$EVTYPE), FUN = sum)
Damage <- merge(PropDamage, CropDamage, by = "Group.1")
colnames(Damage) <- c("EVTYPE", "PROPDAMAGE", "CROPDAMAGE")
Damage_prop <- arrange(Damage, desc(PROPDAMAGE), desc(CROPDAMAGE))
Damage_crop <- arrange(Damage, desc(CROPDAMAGE), desc(PROPDAMAGE))
Now, plot the top ten storm types by property damage and by crop damage.
par(mar=c(11,6,4,1), mgp = c(4, 1, 0))
par(mfrow = c(1,2))
barplot(Damage_prop$PROPDAMAGE[1:10], names.arg = Damage_prop$EVTYPE[1:10],
las = 2, col = "red", ylab = "Property Damage (Millions USD)")
title(main = "Top 10 Weather Events \n for Property Damage")
barplot(Damage_crop$CROPDAMAGE[1:10], names.arg = Damage_crop$EVTYPE[1:10],
las = 2, col = "blue", ylab = "Crop Damage (Millions USD)")
title(main = "Top 10 Weather Events \n for Crop Damage")
From these plots, it appears that floods are the most damaging event for property and droughts the most damaging for crops. For property damage, wind events such as typhoons, tornados, and hurricanes also produce significant damage. For crops, temperature-dependent events such as ice storms and cold or freezing conditions produce significant damage.