Synopsis
The primary goal of this report is to explore and process the NOAA Storm Database in order to answer the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
In addition, the report contains a Results section and several plots that illustrate the analysis and the answers to these questions. All computer code used in the analysis, as well as links to the original data, is included.
Accomplishing these two objectives took the following steps:
Downloading the data set
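The data was downloaded as a bzip2-compressed file from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and loaded into RStudio as the data frame ‘rawdata’. No separate extraction step is needed, because read.csv() decompresses .bz2 files transparently (unzip() handles only .zip archives).
# Download the compressed file only if it is not already present
if (!file.exists('repdata-data-StormData.csv.bz2')) download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2', 'repdata-data-StormData.csv.bz2', mode = 'wb')
rawdata <- read.csv("repdata-data-StormData.csv.bz2") # read.csv() reads the .bz2 file directly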
Initial inspection of the data
raw.cols <- ncol(rawdata) # Number of raw data columns
raw.rows <- nrow(rawdata) # Number of raw data rows
event.names.unique <- length(unique(rawdata$EVTYPE)) # Unique event types
paste("After loading the data, a review of the data set revealed that it contained", raw.cols, "columns and",format(raw.rows, big.mark=',') ,"rows of information, with each row having a unique reference number corresponding to a particular event.", sep=" ")
## [1] "After loading the data, a review of the data set revealed that it contained 37 columns and 902,297 rows of information, with each row having a unique reference number corresponding to a particular event."
paste("These events were associated with", event.names.unique,"unique event types.", sep=" ")
## [1] "These events were associated with 985 unique event types."
The purpose of the data processing was to transform the raw data into a format suitable for analyzing the two key questions addressed in this report. The data processing included the following actions:
- Extracting the relevant data for further evaluation
- Adding or removing data to aid in the analysis
Extracting relevant data
A review of the 37 columns revealed that only four columns contained information that was relevant to the questions at hand: EVTYPE (column 8), FATALITIES (column 23), INJURIES (column 24), and PROPDMG (column 25).
These four columns, along with a fifth column, REFNUM (column 37), which had a unique reference number for each observation, were placed into the data frame object ‘keydata’ for further analysis. Also, the column names were changed for clarity to the following: event.type, deaths, injuries, damage, and event.id.
# Included only relevant columns plus reference ID numbers for further analysis
keydata <- rawdata[,c(8,23:25,37)]
# Changed the column names of the keydata data frame
colnames(keydata) <- c("event.type", "deaths", "injuries", "damage", "event.id")
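The damage column above holds the raw PROPDMG figures. In the NOAA data set, PROPDMG is paired with a magnitude code in the PROPDMGEXP column (for example K for thousands, M for millions, B for billions), so dollar amounts require scaling. The following is a minimal sketch on made-up rows and is not part of the analysis in this report; the helper name scale.damage and the treatment of unrecognized codes as a multiplier of 1 are assumptions.

```r
# Hypothetical helper: convert a PROPDMG value and its PROPDMGEXP code to dollars.
# Assumption: only K, M, and B (case-insensitive) are scaled; other codes use 1.
scale.damage <- function(amount, exp.code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- multipliers[toupper(as.character(exp.code))]
  m[is.na(m)] <- 1
  amount * unname(m)
}

# Made-up example rows (not taken from the storm data)
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 10),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)
toy$damage.dollars <- scale.damage(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

Keeping the scaled value in a separate column leaves the original figures available for comparison.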
Adding or removing data to aid in the analysis
To aid further analysis, three logical columns (death.event, injury.event, and damage.event) were added to identify observations involving fatalities, injuries, or damage. Also, the ‘event.type’ column was converted to type character (it had been read in as a factor, whose underlying storage type is integer). Observations with no deaths, injuries, or damage were then removed.
# Added columns of logical vectors to indicate the presence of deaths, injuries, or damage
keydata$death.event <- keydata$deaths > 0
keydata$injury.event <- keydata$injuries > 0
keydata$damage.event <- keydata$damage > 0
# Changing the event.type column from type integer to type character
keydata$event.type <- as.character(keydata$event.type)
# Logical vector identifies observations (rows) without deaths, injuries, or damage
no.harm <- ((!keydata$death.event)&(!keydata$injury.event)&(!keydata$damage.event))
removed.rows <- sum(no.harm) # Number of rows to be removed
keydata <- keydata[!no.harm,] # Removed rows lacking deaths, injuries, or damage
unique.event.names <- unique(keydata$event.type)
paste("There were",format(removed.rows, big.mark=','), "events that did not result in any deaths, injuries, or damage, and they were removed from the 'keydata' data frame.", sep=" ")
## [1] "There were 653,495 events that did not result in any deaths, injuries, or damage, and they were removed from the 'keydata' data frame."
paste("The remaining ", format(nrow(keydata), big.mark=','), " rows of data, which accounted for ", format(100*nrow(keydata)/nrow(rawdata), digits = 4), "% of the ",format(nrow(rawdata), big.mark=',')," observations, represented events associated with at least one death or injury to a person, or some level of damage to property, and is the group of observations that would be subjected to further analysis.", sep = "")
## [1] "The remaining 248,802 rows of data, which accounted for 27.57% of the 902,297 observations, represented events associated with at least one death or injury to a person, or some level of damage to property, and is the group of observations that would be subjected to further analysis."
The raw data contained a substantial amount of information that was not needed to answer the two questions addressed by this report. Once the unnecessary information was eliminated, and once flags were added to identify the events that involved death, injury, or measurable damage, it became possible to review how these events of interest were distributed. The distributions of the magnitudes of deaths, injuries, and damage were highly skewed, with a large proportion of these events having a small number of deaths or injuries, or a relatively low level of economic damage.
Events that cause harm
Events that are harmful to public health are taken here to be those that cause deaths, injuries, or economic damage. The kind of event is recorded in the ‘event.type’ variable. As noted above, most of these events cause relatively low levels of harm, and a relatively small proportion of events accounts for a significant fraction of the total harm.
For deaths, injuries, and damage alike, the events whose magnitude was at or above the respective 90th percentile caused more than half of all harm. Focusing on the events that were most harmful to public health helps to identify the types of events that cause the most harm.
In the remainder of this report, events that are most harmful to public health will be defined as those that were associated with death, injury, or damage events that were at or above the 90th percentile for their respective category.
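The flagging step that follows relies on three thresholds (death.90th.percentile, injury.90th.percentile, and damage.90th.percentile) whose computation is not shown in this report. The following is a minimal sketch of one way such a threshold could be obtained, assuming the percentile is taken over only the events with a nonzero value for that kind of harm. The helper threshold.90 and the toy values are hypothetical.

```r
# Hypothetical helper: 90th percentile of the strictly positive values of x.
# Assumption: zero-harm observations are excluded before taking the percentile.
threshold.90 <- function(x) {
  quantile(x[x > 0], probs = 0.90, names = FALSE)
}

# Made-up death counts: three zero-death events and ten events with deaths
toy.deaths <- c(0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 5, 40)
toy.threshold <- threshold.90(toy.deaths)

# Events at or above the threshold, and their share of all deaths
at.or.above <- toy.deaths >= toy.threshold
share <- sum(toy.deaths[at.or.above]) / sum(toy.deaths)
print(toy.threshold)
print(share)
```

In this skewed toy example a single event sits at or above the threshold yet accounts for most of the deaths, which is the pattern the report describes for the real data.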
keydata$percentile90 <- keydata$deaths>=death.90th.percentile | keydata$injuries>=injury.90th.percentile | keydata$damage >= damage.90th.percentile
number.significant.harm <- sum(keydata$percentile90)
paste("- ",format(100*number.significant.harm/nrow(rawdata), digits = 3), "% of the ",format(nrow(rawdata), big.mark=',')," observations, representing ", format(number.significant.harm, big.mark=','), " events, are considered to be those that were most harmful to public health.", sep = "")
## [1] "- 3.5% of the 902,297 observations, representing 31,621 events, are considered to be those that were most harmful to public health."
paste("- These noteworthy events represented ", format(100*number.significant.harm/nrow(keydata), digits = 3), "% of those ",format(nrow(keydata), big.mark=',')," events that caused at least one death or injury, or that caused some level of damage.", sep = "")
## [1] "- These noteworthy events represented 12.7% of those 248,802 events that caused at least one death or injury, or that caused some level of damage."
paste("There were ", length(unique(keydata[keydata$percentile90, "event.type"])), " unique descriptions used for those events that were considered most harmful to human health.", sep = "")
## [1] "There were 192 unique descriptions used for those events that were considered most harmful to human health."
# Object containing only those events that were most harmful to human health
big.events <- keydata[keydata$percentile90,c("event.type","deaths","injuries","damage")]
rownames(big.events) <- NULL
# Unique descriptions of the most harmful events
unique.desc <- unique(keydata[keydata$percentile90, "event.type"])
# Number of unique descriptions
unique.desc.num <- length(unique.desc)
# Create an object for summary information for the three kinds of 90th percentile plus events
big.event.summary <- NULL
big.event.summary <- cbind(big.event.summary,unique.desc)
# Death sums from these unique events
big.death.sum <- sapply(1:unique.desc.num, function(x) {
sum(big.events[big.events$event.type==unique.desc[x],"deaths"])
})
big.event.summary <- cbind(big.event.summary,big.death.sum)
# Injury sums from these unique events
big.injury.sum <- sapply(1:unique.desc.num, function(x) {
sum(big.events[big.events$event.type==unique.desc[x],"injuries"])
})
big.event.summary <- cbind(big.event.summary,big.injury.sum)
# Damage sums from these unique events
big.damage.sum <- sapply(1:unique.desc.num, function(x) {
sum(big.events[big.events$event.type==unique.desc[x],"damage"])
})
big.event.summary <- cbind(big.event.summary,big.damage.sum)
# Ensure that the big.event.summary object is a data frame
big.event.summary <- as.data.frame(big.event.summary)
# big.event.summary by deaths
big.event.death.sort <- big.event.summary[order(-big.death.sum),]
# big.event.summary by injuries
big.event.injury.sort <- big.event.summary[order(-big.injury.sum),]
# big.event.summary by damage
big.event.damage.sort <- big.event.summary[order(-big.damage.sum),]
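The three sapply() loops above can also be expressed as a single grouped sum. The following is a minimal sketch using base R's aggregate() on made-up rows; the toy data frame is hypothetical, and this is an alternative formulation rather than the code used for the results shown. Because cbind() applied to a character vector produces a character matrix, the sums in big.event.summary are stored as text, whereas aggregate() keeps them numeric.

```r
# Made-up rows standing in for big.events (not taken from the storm data)
toy.events <- data.frame(
  event.type = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  deaths     = c(10, 2, 5, 0, 1),
  injuries   = c(100, 4, 50, 3, 0),
  damage     = c(250, 500, 100, 75, 25),
  stringsAsFactors = FALSE
)

# One grouped sum per harm column, keyed by event type
toy.summary <- aggregate(cbind(deaths, injuries, damage) ~ event.type,
                         data = toy.events, FUN = sum)

# Sort by total deaths, largest first; the columns are numeric, so order() works directly
toy.by.deaths <- toy.summary[order(-toy.summary$deaths), ]
print(toy.by.deaths, row.names = FALSE)
```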
paste("Top 10 event types for total deaths in 90th percentile or above death events:")
## [1] "Top 10 event types for total deaths in 90th percentile or above death events:"
print(big.event.death.sort[1:10,c("unique.desc","big.death.sum")], row.names=FALSE)
##    unique.desc big.death.sum
##        TORNADO          5071
## EXCESSIVE HEAT          1405
##           HEAT           761
##    FLASH FLOOD           398
##          FLOOD           195
##      HEAT WAVE           161
##      TSTM WIND           122
##   WINTER STORM            94
##   EXTREME HEAT            91
##      HIGH WIND            89
paste("Top 10 event types for total injuries in 90th percentile or above injury events:")
## [1] "Top 10 event types for total injuries in 90th percentile or above injury events:"
print(big.event.injury.sort[1:10,c("unique.desc","big.injury.sum")], row.names=FALSE)
##       unique.desc big.injury.sum
##           TORNADO          79864
##             FLOOD           6601
##    EXCESSIVE HEAT           6202
##         TSTM WIND           2246
##              HEAT           2004
##         ICE STORM           1889
## HURRICANE/TYPHOON           1256
##       FLASH FLOOD           1163
##      WINTER STORM           1011
##              HAIL            858
paste("Top 10 event types for total damage amounts in 90th percentile or above damage events:")
## [1] "Top 10 event types for total damage amounts in 90th percentile or above damage events:"
print(big.event.damage.sort[1:10,c("unique.desc","big.damage.sum")], row.names=FALSE)
##        unique.desc big.damage.sum
##            TORNADO     2772450.25
##        FLASH FLOOD     1122434.52
##              FLOOD      776705.14
##          TSTM WIND      692550.19
##               HAIL       451042.5
##  THUNDERSTORM WIND       434193.9
##          LIGHTNING       432924.4
## THUNDERSTORM WINDS         291592
##          HIGH WIND       245969.6
##       WINTER STORM       109352.9
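Several entries in the damage list above (TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS) appear to be variant labels for the same kind of event, which splits their totals across rows. The following is a minimal sketch of how such labels could be consolidated before summing. The helper normalize.event.type and its two substitution rules are hypothetical and were not applied to the results in this report.

```r
# Hypothetical helper: collapse a few variant spellings of event labels.
# Assumption: "TSTM" abbreviates "THUNDERSTORM" and a trailing "WINDS" means "WIND".
normalize.event.type <- function(x) {
  x <- toupper(trimws(x))
  x <- sub("^TSTM ", "THUNDERSTORM ", x)
  x <- sub(" WINDS$", " WIND", x)
  x
}

# Example labels; the first three are taken from the damage list above
raw.labels <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS",
                "High Wind ", "TORNADO")
normalized <- normalize.event.type(raw.labels)
print(unique(normalized))
```

Any such mapping would need to be reviewed against the full list of event descriptions, since two rules cover only a small part of the variation.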
The analysis section identified the types of events associated with the greatest magnitudes of harmful effects on the public, specifically the events with the greatest numbers of deaths and injuries and the greatest amounts of damage.
Government or municipal managers who are responsible for preparing for severe weather events may need to prioritize resources among different types of events. As a first step toward managing and reducing threats to public health and safety, it may be prudent to identify the types of events that have been associated with the greatest magnitude of harmful weather-related outcomes across the United States and to see which, if any, are relevant to their stakeholders, especially when a type of event is associated with significant deaths, injuries, and economic losses.
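# Matching top injury, death, and damage event types: TRUE where a top-10 injury type is also in both other top-10 lists
# (chaining %in% twice would compare a logical vector against event names and always return FALSE)
matching.events.vector <- as.character(big.event.injury.sort$unique.desc[1:10]) %in% as.character(big.event.death.sort$unique.desc[1:10]) & as.character(big.event.injury.sort$unique.desc[1:10]) %in% as.character(big.event.damage.sort$unique.desc[1:10])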
# Event types present in each top ten list
big.event.intersection.death.injury <- intersect(as.character(big.event.injury.sort$unique.desc[1:10]), as.character(big.event.death.sort$unique.desc[1:10]))
big.event.intersection.all.harm <- intersect(big.event.intersection.death.injury, as.character(big.event.damage.sort$unique.desc[1:10]))
paste("The following ",length(big.event.intersection.all.harm), " harmful event descriptions appeared in all three top 10 lists (most deaths, most injuries, and highest damage amounts) for events where the magnitude of the harm was at or above the 90th percentile for each respective type of harm.", sep="")
## [1] "The following 5 harmful event descriptions appeared in all three top 10 lists (most deaths, most injuries, and highest damage amounts) for events where the magnitude of the harm was at or above the 90th percentile for each respective type of harm."
cat(sort(big.event.intersection.all.harm),sep="\n")
## FLASH FLOOD
## FLOOD
## TORNADO
## TSTM WIND
## WINTER STORM
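The pairwise intersect() calls above generalize to any number of lists with Reduce(). The following is a minimal sketch using the three top 10 lists printed earlier in this report; the variable names are hypothetical.

```r
# Top 10 event types copied from the death, injury, and damage lists in this report
top.deaths <- c("TORNADO", "EXCESSIVE HEAT", "HEAT", "FLASH FLOOD", "FLOOD",
                "HEAT WAVE", "TSTM WIND", "WINTER STORM", "EXTREME HEAT", "HIGH WIND")
top.injuries <- c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "TSTM WIND", "HEAT",
                  "ICE STORM", "HURRICANE/TYPHOON", "FLASH FLOOD", "WINTER STORM", "HAIL")
top.damage <- c("TORNADO", "FLASH FLOOD", "FLOOD", "TSTM WIND", "HAIL",
                "THUNDERSTORM WIND", "LIGHTNING", "THUNDERSTORM WINDS", "HIGH WIND",
                "WINTER STORM")

# Fold intersect() over the list to keep only types present in every top 10
common.types <- Reduce(intersect, list(top.deaths, top.injuries, top.damage))
cat(sort(common.types), sep = "\n")
```

Adding a fourth ranking would only require appending another vector to the list.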