Synopsis: This data analysis project examines the types of hazardous weather events most harmful to a populations’ health and those events with the greatest economic consequences. It was written to assist individuals responsible for preparing for severe weather events to understand the greatest threats.
Extreme heat was found to be the greatest threat to population health in terms of the average number of deaths and injuries reported in each incident. Tornadoes resulted in the highest numbers of reported deaths and injuries overall.
Hurricanes and typhoons resulted in the greatest average property and crop losses per event. Flooding also had a major economic impact on crops.
The first step to be taken in reproducing the results from this analysis would be to download the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and load it into R:
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
# Download data file
# download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","data/StormData.csv.bz2")
# Unzip data file
# bunzip2("data/StormData.csv.bz2", "data/StormData.csv", remove = F, skip = T)
# Read column header
h1 <- readLines("data/StormData.csv",1)
# Remove uneeded characters from header
h1 <- gsub("\"","",h1)
# Create a list of column names
h1 <- strsplit(h1,',',fixed = T)
# Read in rest of storm data file
sdata <- read.csv("data/StormData.csv", skip = 1, header = F)
# Set column names on storm data file
names(sdata) <- make.names(h1[[1]])
The next step would be to calculate the health impacts from the various types of weather events listed in the storm database. NOAA classifies severe weather events into 48 categories.
# Load libraries
library(dplyr)
# Extract "event type" (col 8), "fatalities" (col 23), and
# "injuries"" (col 24) from storm data table
ph <- sdata[,c(8,23,24)]
# Group the storm data by "event type"
byGrp <- group_by(ph,EVTYPE)
# Summarize by total number of fatalities and injuries by event,
# and average fatalities and injuries per event.
phSum <- summarize(byGrp, COUNT = n(), FATALITY = sum(FATALITIES), INJURY = sum(INJURIES), FATAL_PER_EVT = FATALITY/COUNT, INJURE_PER_EVT = INJURY/COUNT)
# Arrange storm data in descending order by number
# of fatalities associated with each event
descFatal <- arrange(phSum, desc(FATALITY))
# Filter out non-standard naming conventions used in reporting
# weather event types by ensuring at least 100 events were reported
phSum2 <- filter(phSum, COUNT > 100)
# Arrange storm data in descending order by average
# fatalities associated with each event
descAvFatal <- arrange(phSum2, desc(FATAL_PER_EVT))
Next, we examine the economic impacts of severe weather events. NOAA stores information on the economic consequences of severe weather as multiples of three alphabetic codes: K for 1,000’s, M for millions, and B for billions. Converting these values to numeric figures facilitates comparison.
# Load libraries
library(dplyr)
# Extract "event type" (col 8), "property damage" (col 25),
# "prop damage magnitude" (col 26) from storm data table
pd <- sdata[,c(8,25:26)]
# Create function to convert alphabetic loss figures to numeric values
magTimes <- function(x, y) {
if (y=="K"|y=="k") {
result <- x * 1000;
} else if (y=="M"|y=="m") {
result <- x * 1000000;
} else if (y=="B"|y=="b") {
result <- x * 1000000000;
} else {
result <- x;
}
return(result)
}
# Add new column "propdmgamt" to property damage table
# to create numeric loss figures
pd$propdmgamt <- mapply(magTimes,pd$PROPDMG,pd$PROPDMGEXP)
# Group the storm data by "event type"
byGrp <- group_by(pd,EVTYPE)
# Summarize by total property damage loss associated with the event type, and
# average property damage per event
pdSum <- summarize(byGrp, COUNT = n(), PROP_DMG_AMT = sum(propdmgamt), PROP_DMG_PER_EVT = PROP_DMG_AMT/COUNT)
# Arrange storm data in descending order by property
# damage loss associated with each event
descPropLoss <- arrange(pdSum, desc(PROP_DMG_AMT))
# Load libraries
library(dplyr)
# Extract "event type" (col 8), "crop damage" (col 27),
# and "crop damage magnitude (col 28) from storm data table
cd <- sdata[,c(8,27:28)]
# Add new column "cropdmgamt" to crop damage table
# to create numeric loss figures
cd$cropdmgamt <- mapply(magTimes,cd$CROPDMG,cd$CROPDMGEXP)
# Group the storm data by "event type"
byGrp <- group_by(cd,EVTYPE)
# Summarize by total crop damage loss associated with each event,
# and average loss reported per event
cdSum <- summarize(byGrp, COUNT = n(), CROP_DMG_AMT = sum(cropdmgamt), CROP_DMG_PER_EVT = CROP_DMG_AMT/COUNT)
# Arrange storm data in descending order by crop
# damage loss associated with each event
descCropLoss <- arrange(cdSum, desc(CROP_DMG_AMT))
# Load libraries
library(knitr)
# Summarize Top 10 Event Types by total fatalities and injuries reported
kable(head(descFatal[,c(1,3:4)],10), caption = "Table 1: Total Reported Fatalities and Injuries by Event Type", col.names = c("Event Type","Fatalities","Injuries"))
| Event Type | Fatalities | Injuries |
|---|---|---|
| TORNADO | 5633 | 91346 |
| EXCESSIVE HEAT | 1903 | 6525 |
| FLASH FLOOD | 978 | 1777 |
| HEAT | 937 | 2100 |
| LIGHTNING | 816 | 5230 |
| TSTM WIND | 504 | 6957 |
| FLOOD | 470 | 6789 |
| RIP CURRENT | 368 | 232 |
| HIGH WIND | 248 | 1137 |
| AVALANCHE | 224 | 170 |
# Summarize Top 10 Event Types by average fatalities and injuries per event
kable(head(descAvFatal[,c(1,5:6)],10), caption = "Table 2: Average Fatalities and Injuries per Event", col.names = c("Event Type","Average Fatalities","Average Injuries"))
| Event Type | Average Fatalities | Average Injuries |
|---|---|---|
| HEAT | 1.2216428 | 2.7379400 |
| EXCESSIVE HEAT | 1.1340882 | 3.8885578 |
| RIP CURRENT | 0.7829787 | 0.4936170 |
| RIP CURRENTS | 0.6710526 | 0.9769737 |
| AVALANCHE | 0.5803109 | 0.4404145 |
| HURRICANE | 0.3505747 | 0.2643678 |
| EXTREME COLD | 0.2442748 | 0.3526718 |
| HEAVY SURF/HIGH SURF | 0.1842105 | 0.2105263 |
| COLD/WIND CHILL | 0.1762523 | 0.0222635 |
| HIGH SURF | 0.1393103 | 0.2096552 |
There were 60,652 “Tornado” and 54,277 “Flash Flood” events reported in NOAA’s storm database as of March 4, 2017. To avoid skewing the results with events reported with non-standard naming conventions, Table 2 was filtered by events names with more than 100 instances. This analysis indicated that weather events involving extreme heat and cold were the source of the greatest average numbers of fatalities.
Table 2 is sorted in descending order by the number of fatalities associated with each event. “Excessive Heat” remains at the top of the list even when the data is sorted by the number of injuries associated with each event. Other event types such as “Tornado” and “Ice Storm” rise into the top 10 list when the number of reported injuries is used to rank the events.
# Don't use scientific notation
options(scipen = 999)
# Calculate top 10 reported event types
top10propLoss <- head(descPropLoss,10)
# Refactor the event type to limit to 10 factors/Event Types
top10propLoss$EVTYPE <- factor(top10propLoss$EVTYPE)
# Set axis label orientation and font size
par(las = 2, cex.axis = 0.4)
# Create a barplot of the total number of property loss incidents reported
barplot(top10propLoss$COUNT, names.arg = top10propLoss$EVTYPE, ylim=c(0,300000), main = "Total Property Loss Events Reported")
Figure 1
The total number of reported incidents for each of the top 10 property loss types is reported above. “Hail” was by far the most frequently reported property damage event.
# Set axis label orientation and font size
par(las = 2, cex.axis = 0.4)
# Create barplot of top 10, average property loss per event type
barplot(top10propLoss$PROP_DMG_PER_EVT, names.arg = top10propLoss$EVTYPE, ylim=c(0,800000000), main = "Top 10 Average Property Loss per Event Type")
Figure 2
“Hurricanes” and “Typhoons” resulted in the greatest property losses with average losses approaching $800 million dollars.
# Calculate top 10 reported event types
top10cropLoss <- head(descCropLoss,10)
# Refactor the event type to 10 returned factors/Event Types
top10cropLoss$EVTYPE <- factor(top10cropLoss$EVTYPE)
# Set axis label orientation and font size
par(las = 2, cex.axis = 0.4)
# Create barplot of top 10, average crop loss per event type
barplot(top10cropLoss$CROP_DMG_PER_EVT, names.arg = top10cropLoss$EVTYPE, ylim=c(0,30000000), main = "Top 10 Average Crop Loss per Event Type")
Figure 3
“Hurricanes,” “Typhoons,” and “Floods” resulted in the greatest crop losses with average losses approaching $30 million.