Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database to identify event types which are most harmful to population health, and those which have the greatest economic consequences.
We download the storm database file from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and process it in R as follows -
# Read data from BZ2 file -
data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
total_rows <- nrow(data)
After loading the CSV data into R, the resulting data frame contains 902297 rows.
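The download step can also be scripted so that the analysis is easier to reproduce. The following is a minimal sketch, assuming the archive is saved under the same local file name that the `read.csv` call above expects; `fetch_storm_data` is a hypothetical helper, not part of the original analysis.

```r
# Sketch: download the compressed CSV once and reuse the local copy afterwards.
# `fetch_storm_data` is a hypothetical helper introduced for illustration.
storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata-data-StormData.csv.bz2"

fetch_storm_data <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary bz2 archive intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

# Uncomment to fetch the file (the archive is large):
# fetch_storm_data(storm_url, storm_file)
```

Skipping the download when the file already exists avoids fetching the archive again each time the report is rebuilt.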
The dataset records the number of injuries and fatalities for each event, along with its event type. We sum the injuries and the fatalities separately for each event type, then add the two sums to get the total number of health-related cases per event type.
# Sum on injuries & fatalities, grouped by event type
evt_health <- aggregate(cbind(data$INJURIES, data$FATALITIES), list(Event = data$EVTYPE),
sum)
names(evt_health) <- c("Event", "Injuries", "Fatalities")
# Get total health cases by adding injuries and fatalities
evt_health$Total <- evt_health$Injuries + evt_health$Fatalities
# Sort on the total health cases, descending
evt_health_top <- evt_health[order(-evt_health[, 4]), c(1:4)]
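To make the grouping step easier to follow, here is a small sketch of the same sum-then-sort pattern on a made-up data frame. The `toy` values are invented for illustration and are not taken from the NOAA file; the sketch also uses the formula interface of `aggregate`, which is an alternative to the call above rather than the method used in this analysis.

```r
# Toy data (made up for illustration; not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 0, 3)
)

# Formula interface: sum both columns within each event type
toy_health <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined health cases per event type
toy_health$Total <- toy_health$INJURIES + toy_health$FATALITIES

# Sort by the combined total, largest first
toy_health <- toy_health[order(-toy_health$Total), ]
rownames(toy_health) <- NULL
toy_health
```

One difference to keep in mind: the formula method drops rows containing `NA` by default, whereas the non-formula call used above passes them through to `sum`.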
The dataset contains columns for property damage and crop damage, each paired with a unit column. The unit can be a digit (0-9), which represents a power-of-ten exponent (damage * 10^exp), or a letter: “k/K” for thousand, “m/M” for million, or “b/B” for billion. Any other unit leaves the damage value unchanged. We sum the property and crop damage amounts separately for each event type, then add the two sums to get the total financial damage.
# Convert a (damage value, unit) pair into an absolute amount
computeAmount <- function(entry) {
    value <- as.numeric(entry[1])
    # Treat missing or non-numeric damage values as zero
    if (is.na(value)) {
        value <- 0
    }
    exp <- as.character(entry[2])
    # Default: an unrecognized unit leaves the value unchanged
    amount <- value
    if (exp %in% as.character(0:9)) {
        amount <- value * (10^as.numeric(exp))
    } else if (exp %in% c("b", "B")) {
        amount <- value * (10^9)
    } else if (exp %in% c("m", "M")) {
        amount <- value * (10^6)
    } else if (exp %in% c("k", "K")) {
        amount <- value * (10^3)
    }
    amount
}
# Compute property damage amount per row -
data$PROPDMGAMOUNT <- apply(data[, c("PROPDMG", "PROPDMGEXP")], 1, computeAmount)
# Compute crop damage amount per row -
data$CROPDMGAMOUNT <- apply(data[, c("CROPDMG", "CROPDMGEXP")], 1, computeAmount)
# Sum on damage amounts, aggregated by event type
evt_finance <- aggregate(cbind(data$PROPDMGAMOUNT, data$CROPDMGAMOUNT), list(Event = data$EVTYPE),
sum)
names(evt_finance) <- c("Event", "PropertyDamageAmount", "CropDamageAmount")
# Get total damage by adding property and crop damage
evt_finance$TotalDamageAmount <- evt_finance$PropertyDamageAmount + evt_finance$CropDamageAmount
# Sort on the total damage, descending
evt_finance_top <- evt_finance[order(-evt_finance[, 4]), c(1:4)]
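Calling `computeAmount` through `apply` processes one row at a time, which can be slow on a data frame of this size. The sketch below shows a vectorized variant that is intended to follow the same unit rules; `compute_amount_vec` is a hypothetical name and was not used to produce the results in this report.

```r
# Hypothetical vectorized counterpart of computeAmount(); for illustration only.
compute_amount_vec <- function(value, unit) {
  # Missing or non-numeric damage values count as zero
  value <- suppressWarnings(as.numeric(value))
  value[is.na(value)] <- 0

  # Lookup table: digits map to 10^digit, letters to thousand/million/billion
  multipliers <- c(setNames(10^(0:9), 0:9), K = 1e3, M = 1e6, B = 1e9)

  # Upper-case the unit so "k"/"K", "m"/"M", and "b"/"B" are treated alike
  mult <- unname(multipliers[toupper(as.character(unit))])

  # Unrecognized or empty units leave the value unchanged
  mult[is.na(mult)] <- 1
  value * mult
}

# Small made-up example covering each kind of unit
example_amounts <- compute_amount_vec(c(25, 2.5, 10, 3, NA),
                                      c("K", "m", "", "2", "B"))
example_amounts
```

On the full dataset this could be applied as `compute_amount_vec(data$PROPDMG, data$PROPDMGEXP)`, assuming the column names used earlier.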
Here are the top five event types affecting population health -
rownames(evt_health_top) <- NULL
evt_health_top$Event <- as.factor(evt_health_top$Event)
evt_health_top <- evt_health_top[1:5, ]
evt_health_top
##            Event Injuries Fatalities Total
## 1        TORNADO    91346       5633 96979
## 2 EXCESSIVE HEAT     6525       1903  8428
## 3      TSTM WIND     6957        504  7461
## 4          FLOOD     6789        470  7259
## 5      LIGHTNING     5230        816  6046
The following chart shows the total health cases for the top five event types.
barplot(evt_health_top$Total, main = "Storm events affecting population health",
xlab = "Event Type", ylab = "Total number of health cases", col = c("lightblue",
"mistyrose", "lightcyan", "lavender", "cornsilk"), legend = evt_health_top$Event)
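TORNADO is by far the most harmful event type for population health, with 96979 combined injuries and fatalities, more than eleven times the total for the second-ranked event type, EXCESSIVE HEAT (8428).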
Here are the top five event types causing maximum financial damage -
rownames(evt_finance_top) <- NULL
evt_finance_top$Event <- as.factor(evt_finance_top$Event)
evt_finance_top <- evt_finance_top[1:5, ]
evt_finance_top
##               Event PropertyDamageAmount CropDamageAmount TotalDamageAmount
## 1             FLOOD            1.447e+11        5.662e+09         1.503e+11
## 2 HURRICANE/TYPHOON            6.931e+10        2.608e+09         7.191e+10
## 3           TORNADO            5.695e+10        4.150e+08         5.736e+10
## 4       STORM SURGE            4.332e+10        5.000e+03         4.332e+10
## 5              HAIL            1.574e+10        3.026e+09         1.876e+10
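The totals above are printed in scientific notation, which can be hard to compare at a glance. As a small worked example, the sketch below rescales the printed totals to billions; the values are copied from the TotalDamageAmount column of the table above, and the variable names are introduced here for illustration.

```r
# Totals copied from the TotalDamageAmount column of the table above
total_damage <- c(FLOOD = 1.503e+11, `HURRICANE/TYPHOON` = 7.191e+10,
                  TORNADO = 5.736e+10, `STORM SURGE` = 4.332e+10,
                  HAIL = 1.876e+10)

# Express each total in billions, rounded to two decimals
damage_billions <- round(total_damage / 1e9, 2)
damage_billions

# Ratio of the largest total to the second largest
flood_ratio <- unname(total_damage["FLOOD"] / total_damage["HURRICANE/TYPHOON"])
flood_ratio
```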
The following chart shows the total damage for the top five event types.
barplot(evt_finance_top$TotalDamageAmount, main = "Storm events causing financial damage",
xlab = "Event Type", ylab = "Total financial damage", col = c("lightblue",
"mistyrose", "lightcyan", "lavender", "cornsilk"), legend = evt_finance_top$Event)
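FLOOD causes the greatest financial damage of any event type, with about 1.503e+11 (roughly 150 billion U.S. dollars) in combined property and crop damage, about twice the total for HURRICANE/TYPHOON (7.191e+10).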