Synopsis

The primary goal of this report is to explore and process the NOAA Storm Database in order to answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

In addition, the report contains a Results section and several plots to illustrate the analysis and the answers to these questions. All computer code used in the analysis, as well as links to the original data, is included.

Accomplishing these two objectives took the following steps:

  1. Downloading the data set
  2. Performing an initial inspection of the data
  3. Extracting data relevant for the analysis
  4. Providing an overview of deaths, injuries, and damage
  5. Performing the analyses needed to answer the two questions
  6. Providing a summary of the analysis

Downloading the data set

The data was downloaded as a bzip2-compressed file from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2, decompressed to obtain the CSV file ‘repdata-data-StormData.csv’, and loaded in RStudio into the data frame ‘rawdata’.

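# unzip() cannot extract .bz2 files; read.csv() reads bzip2-compressed files directly,
# so fall back to the compressed file when the extracted CSV is not present
csv.file <- if (file.exists('repdata-data-StormData.csv')) 'repdata-data-StormData.csv' else 'repdata-data-StormData.csv.bz2'
rawdata <- read.csv(csv.file)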
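
Because the download is bzip2-compressed rather than a zip archive, it may help to note that read.csv() can read such a file without a separate extraction step. A minimal sketch of that behavior using a throwaway file; the object names toy.path, toy.data, and toy.read are hypothetical and not part of the analysis:

```r
# Write a small bzip2-compressed CSV to a temporary location
toy.path <- tempfile(fileext = ".csv.bz2")
toy.data <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
                       FATALITIES = c(3, 0, 1),
                       stringsAsFactors = FALSE)
con <- bzfile(toy.path, open = "w")
write.csv(toy.data, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading
toy.read <- read.csv(toy.path, stringsAsFactors = FALSE)

# Clean up the temporary file
unlink(toy.path)
```

The same call applied to the downloaded storm data file would take longer, since the full data set has to be decompressed each time it is read.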

Initial inspection of the data

raw.cols <- ncol(rawdata) # Number of raw data columns
raw.rows <- nrow(rawdata) # Number of raw data rows
event.names.unique <- length(unique(rawdata$EVTYPE)) # Unique event types

paste("After loading the data, a review of the data set revealed that it contained",  raw.cols, "columns and",format(raw.rows, big.mark=',') ,"rows of information, with each row having a unique reference number corresponding to a particular event.", sep=" ")
## [1] "After loading the data, a review of the data set revealed that it contained 37 columns and 902,297 rows of information, with each row having a unique reference number corresponding to a particular event."
paste("These events were associated with", event.names.unique,"unique event types.", sep=" ")
## [1] "These events were associated with 985 unique event types."

Data processing

Data processing modified the raw data into a format suitable for the analysis needed to address the two key questions of this report. It involved two activities:

  - Extracting the relevant data for further evaluation
  - Adding or removing data to aid in the analysis

Extracting relevant data

A review of the 37 columns revealed that only four columns contained information that was relevant to the questions at hand:

  1. EVTYPE - Event type
  2. FATALITIES - Number of deaths
  3. INJURIES - Number of injuries
  4. PROPDMG - Amount of property damage

These four columns, along with a fifth column, REFNUM, which had a unique reference number for each observation, were placed into the data frame object ‘keydata’ for further analysis. Also, the column names were changed for clarity to the following:

  1. EVTYPE - event.type
  2. FATALITIES - deaths
  3. INJURIES - injuries
  4. PROPDMG - damage
  5. REFNUM - event.id

# Included only relevant columns plus reference ID numbers for further analysis
keydata <- rawdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "REFNUM")]

# Changed the column names of the keydata data frame
colnames(keydata) <- c("event.type", "deaths", "injuries", "damage", "event.id")

Adding or removing data to aid in the analysis

To aid in further analysis, three columns of logical vectors (death.event, injury.event, and damage.event) were added to identify observations involving fatalities, injuries, or damage. Also, the ‘event.type’ column was changed from type integer to type character. Finally, observations with no deaths, injuries, or damage were removed.

# Added columns of logical vectors to indicate the presence of deaths, injuries, or damage
keydata$death.event <- keydata$deaths > 0
keydata$injury.event <- keydata$injuries > 0
keydata$damage.event <- keydata$damage > 0

# Changing the event.type column from type integer to type character
keydata$event.type <- as.character(keydata$event.type)

# Logical vector identifies observations (rows) without deaths, injuries, or damage
no.harm <- ((!keydata$death.event)&(!keydata$injury.event)&(!keydata$damage.event))
removed.rows <- sum(no.harm) # Number of rows to be removed

keydata <- keydata[!no.harm,] # Removed rows lacking deaths, injuries, or damage
unique.event.names <- unique(keydata$event.type)

paste("There were",format(removed.rows, big.mark=','), "events that did not result in any deaths, injuries, or damage, and they were removed from the 'keydata' data frame.", sep=" ")
## [1] "There were 653,495 events that did not result in any deaths, injuries, or damage, and they were removed from the 'keydata' data frame."
paste("The remaining ", format(nrow(keydata), big.mark=','), " rows of data, which accounted for ", format(100*nrow(keydata)/nrow(rawdata), digits = 4), "% of the ",format(nrow(rawdata), big.mark=',')," observations, represented events associated with at least one death or injury to a person, or some level of damage to property, and is the group of observations that would be subjected to further analysis.", sep = "")
## [1] "The remaining 248,802 rows of data, which accounted for 27.57% of the 902,297 observations, represented events associated with at least one death or injury to a person, or some level of damage to property, and is the group of observations that would be subjected to further analysis."

Analysis

The raw data contained a substantial amount of information that was not needed to answer the two questions addressed by this report. Once the unnecessary information was eliminated and indicators were added to identify the events that involved death, injury, or measurable damage, it became possible to review how these events of interest were distributed. The distributions of deaths, injuries, and damage were highly skewed: a large proportion of these events involved a small number of deaths or injuries, or a relatively low level of economic damage.

Events that cause harm

Events that are harmful to public health are assumed to be those that cause deaths, injuries, or economic damage. The category of each such event is captured by the ‘event.type’ variable. However, as noted above, most of these events cause relatively low levels of harm, and a relatively small proportion of events accounts for a significant fraction of the total harm.

Among events with deaths, injuries, or damage, those whose magnitude of harm was at or above the 90th percentile for their respective category caused more than half of all harm. Focusing on these events will help to identify the types of events that cause the most harm.

In the remainder of this report, events that are most harmful to public health will be defined as those that were associated with death, injury, or damage events that were at or above the 90th percentile for their respective category.
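
The comparison that follows relies on three thresholds (death.90th.percentile, injury.90th.percentile, and damage.90th.percentile) whose definitions do not appear in the code shown in this report. A minimal sketch of one way they might be computed, assuming each threshold is the 90th percentile taken over only those events with a non-zero value in that category; the toy data frame and the helper name threshold.90 are hypothetical:

```r
# Toy stand-in for the 'keydata' data frame (values are illustrative only)
toy <- data.frame(
        deaths   = c(0, 1, 1, 2, 0, 50, 1, 0, 3, 1),
        injuries = c(5, 0, 2, 0, 1, 300, 0, 4, 0, 2),
        damage   = c(10, 25, 0, 5, 500, 250, 0, 50, 2.5, 0)
)

# Hypothetical helper: 90th percentile of the non-zero values in a column
threshold.90 <- function(x) unname(quantile(x[x > 0], probs = 0.9))

death.90th.percentile  <- threshold.90(toy$deaths)
injury.90th.percentile <- threshold.90(toy$injuries)
damage.90th.percentile <- threshold.90(toy$damage)

# Flag events at or above the threshold in at least one category
toy$percentile90 <- toy$deaths >= death.90th.percentile |
        toy$injuries >= injury.90th.percentile |
        toy$damage >= damage.90th.percentile

# Share of all deaths contributed by events at or above the death threshold
death.share <- sum(toy$deaths[toy$deaths >= death.90th.percentile]) / sum(toy$deaths)
```

Whether the thresholds should be taken over non-zero values only or over all retained rows is a design choice; the two options can give very different cut-offs when most events have zero deaths or injuries.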

keydata$percentile90 <- keydata$deaths>=death.90th.percentile | keydata$injuries>=injury.90th.percentile | keydata$damage >= damage.90th.percentile
number.significant.harm <- sum(keydata$percentile90)

paste("- ",format(100*number.significant.harm/nrow(rawdata), digits = 3), "% of the ",format(nrow(rawdata), big.mark=',')," observations, representing ", format(number.significant.harm, big.mark=','), " events, are considered to be those that were most harmful to public health.", sep = "")
## [1] "- 3.5% of the 902,297 observations, representing 31,621 events, are considered to be those that were most harmful to public health."
paste("- These noteworthy events represented ", format(100*number.significant.harm/nrow(keydata), digits = 3), "% of those ",format(nrow(keydata), big.mark=',')," events that caused at least one death or injury, or that caused some level of damage.", sep = "")
## [1] "- These noteworthy events represented 12.7% of those 248,802 events that caused at least one death or injury, or that caused some level of damage."
paste("There were ", length(unique(keydata[keydata$percentile90, "event.type"])), " unique descriptions used for those events that were considered most harmful to human health.", sep = "")
## [1] "There were 192 unique descriptions used for those events that were considered most harmful to human health."
# Object containing only those events that were most harmful to human health
big.events <- keydata[keydata$percentile90,c("event.type","deaths","injuries","damage")]
rownames(big.events) <- NULL

# Unique descriptions of the most harmful events
unique.desc <- unique(keydata[keydata$percentile90, "event.type"])

# Number of unique descriptions
unique.desc.num <- length(unique.desc)

# Create an object for summary information for the three kinds of 90th percentile plus events
big.event.summary <- NULL
big.event.summary <- cbind(big.event.summary,unique.desc)

# Death sums from these unique events
big.death.sum <- sapply(1:unique.desc.num, function(x) {
        sum(big.events[big.events$event.type==unique.desc[x],"deaths"])
})
big.event.summary <- cbind(big.event.summary,big.death.sum)

# Injury sums from these unique events
big.injury.sum <- sapply(1:unique.desc.num, function(x) {
        sum(big.events[big.events$event.type==unique.desc[x],"injuries"])
})
big.event.summary <- cbind(big.event.summary,big.injury.sum)

# Damage sums from these unique events
big.damage.sum <- sapply(1:unique.desc.num, function(x) {
        sum(big.events[big.events$event.type==unique.desc[x],"damage"])
})
big.event.summary <- cbind(big.event.summary,big.damage.sum)

# Ensure that the big.event.summary object is a data frame
big.event.summary <- as.data.frame(big.event.summary)


# big.event.summary sorted by total deaths, largest first
big.event.death.sort <- big.event.summary[order(-big.death.sum),]
# big.event.summary sorted by total injuries, largest first
big.event.injury.sort <- big.event.summary[order(-big.injury.sum),]
# big.event.summary sorted by total damage, largest first
big.event.damage.sort <- big.event.summary[order(-big.damage.sum),]
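
The three sapply() loops above compute the per-type totals one column at a time, and because cbind() starts from a character vector, the summed columns in big.event.summary are no longer numeric. A minimal sketch of an alternative, assuming base R's aggregate() is acceptable here; the toy data frame is hypothetical and only mirrors the columns of big.events:

```r
# Toy stand-in for 'big.events' (values are illustrative only)
toy.events <- data.frame(
        event.type = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
        deaths     = c(10, 2, 5, 7, 1),
        injuries   = c(100, 4, 50, 20, 6),
        damage     = c(250, 500, 100, 0, 50),
        stringsAsFactors = FALSE
)

# Sum all three harm columns within each event type in a single call
toy.summary <- aggregate(cbind(deaths, injuries, damage) ~ event.type,
                         data = toy.events, FUN = sum)

# Sort by total deaths, largest first; the summed columns stay numeric
toy.death.sort <- toy.summary[order(-toy.summary$deaths), ]
```

Keeping the totals numeric also means they print with consistent formatting and can be sorted from the data frame itself rather than from the separate sum vectors.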

paste("Top 10 event types for total deaths in 90th percentile or above death events:")
## [1] "Top 10 event types for total deaths in 90th percentile or above death events:"
print(big.event.death.sort[1:10,c("unique.desc","big.death.sum")], row.names=FALSE)
##     unique.desc big.death.sum
##         TORNADO          5071
##  EXCESSIVE HEAT          1405
##            HEAT           761
##     FLASH FLOOD           398
##           FLOOD           195
##       HEAT WAVE           161
##       TSTM WIND           122
##    WINTER STORM            94
##    EXTREME HEAT            91
##       HIGH WIND            89
paste("Top 10 event types for total injuries in 90th percentile or above injury events:")
## [1] "Top 10 event types for total injuries in 90th percentile or above injury events:"
print(big.event.injury.sort[1:10,c("unique.desc","big.injury.sum")], row.names=FALSE)
##        unique.desc big.injury.sum
##            TORNADO          79864
##              FLOOD           6601
##     EXCESSIVE HEAT           6202
##          TSTM WIND           2246
##               HEAT           2004
##          ICE STORM           1889
##  HURRICANE/TYPHOON           1256
##        FLASH FLOOD           1163
##       WINTER STORM           1011
##               HAIL            858
paste("Top 10 event types for total damage amounts in 90th percentile or above damage events:")
## [1] "Top 10 event types for total damage amounts in 90th percentile or above damage events:"
print(big.event.damage.sort[1:10,c("unique.desc","big.damage.sum")], row.names=FALSE)
##         unique.desc big.damage.sum
##             TORNADO     2772450.25
##         FLASH FLOOD     1122434.52
##               FLOOD      776705.14
##           TSTM WIND      692550.19
##                HAIL       451042.5
##   THUNDERSTORM WIND       434193.9
##           LIGHTNING       432924.4
##  THUNDERSTORM WINDS         291592
##           HIGH WIND       245969.6
##        WINTER STORM       109352.9
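
The damage totals above are sums of the raw PROPDMG values. In the NOAA storm data, PROPDMG is accompanied by a PROPDMGEXP column holding a magnitude code, so these totals are not dollar amounts. A minimal sketch of how the values might be scaled, assuming only the codes K, M, and B (thousands, millions, billions) and treating any other code as a multiplier of 1; the helper name scale.damage and the toy values are hypothetical:

```r
# Hypothetical helper: scale a damage amount by its magnitude code
scale.damage <- function(amount, exp.code) {
        multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
        code <- toupper(as.character(exp.code))
        mult <- unname(multipliers[code])
        mult[is.na(mult)] <- 1  # assumption: unrecognized or empty codes leave the amount unscaled
        amount * mult
}

# Toy rows mirroring the PROPDMG / PROPDMGEXP columns (values are illustrative only)
toy.damage <- data.frame(
        PROPDMG    = c(25, 2.5, 1.2, 10),
        PROPDMGEXP = c("K", "M", "B", ""),
        stringsAsFactors = FALSE
)
toy.damage$damage.usd <- scale.damage(toy.damage$PROPDMG, toy.damage$PROPDMGEXP)
```

Applying a scaling of this kind before summing could change the ranking of event types by damage, since a single event recorded in billions outweighs many events recorded in thousands.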

Results

The Analysis section identified the event types associated with the greatest magnitudes of harmful effects on the public, specifically the greatest magnitudes of deaths, injuries, and damage.

Government or municipal managers responsible for preparing for severe weather events may need to prioritize resources among different types of events. As a first step toward managing and reducing threats to public health and safety, it may be prudent to identify the types of events that have been associated with the greatest magnitude of harmful weather-related outcomes across the United States and to see which, if any, are relevant to their stakeholders, especially when a type of event is associated with significant deaths, injuries, and economic losses.

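# Flag the top-10 injury event types that also appear in both the top-10 death and
# top-10 damage lists (a chained a %in% b %in% c would test logical values against names)
matching.events.vector <- as.character(big.event.injury.sort$unique.desc[1:10]) %in% as.character(big.event.death.sort$unique.desc[1:10]) &
        as.character(big.event.injury.sort$unique.desc[1:10]) %in% as.character(big.event.damage.sort$unique.desc[1:10])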


# Event types present in each top ten list
big.event.intersection.death.injury <- intersect(as.character(big.event.injury.sort$unique.desc[1:10]), as.character(big.event.death.sort$unique.desc[1:10]))

big.event.intersection.all.harm <- intersect(big.event.intersection.death.injury, as.character(big.event.damage.sort$unique.desc[1:10]))

paste("The following ",length(big.event.intersection.all.harm), " harmful event descriptions appeared in all three top 10 lists (most deaths, most injuries, and highest damage amounts) for events where the magnitude of harm was at or above the 90th percentile for the respective type of harm.", sep="")
## [1] "The following 5 harmful event descriptions appeared in all three top 10 lists (most deaths, most injuries, and highest damage amounts) for events where the magnitude of harm was at or above the 90th percentile for the respective type of harm."
cat(sort(big.event.intersection.all.harm),sep="\n")
## FLASH FLOOD
## FLOOD
## TORNADO
## TSTM WIND
## WINTER STORM
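
The overlap listed above can be checked directly from the three top 10 tables printed in the Analysis section. A minimal sketch, assuming the event names are copied by hand from those tables; the vector names top.deaths, top.injuries, top.damage, and common.events are hypothetical:

```r
# Event types copied from the three top 10 tables in the Analysis section
top.deaths <- c("TORNADO", "EXCESSIVE HEAT", "HEAT", "FLASH FLOOD", "FLOOD",
                "HEAT WAVE", "TSTM WIND", "WINTER STORM", "EXTREME HEAT", "HIGH WIND")
top.injuries <- c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "TSTM WIND", "HEAT",
                  "ICE STORM", "HURRICANE/TYPHOON", "FLASH FLOOD", "WINTER STORM", "HAIL")
top.damage <- c("TORNADO", "FLASH FLOOD", "FLOOD", "TSTM WIND", "HAIL",
                "THUNDERSTORM WIND", "LIGHTNING", "THUNDERSTORM WINDS", "HIGH WIND",
                "WINTER STORM")

# Reduce() applies intersect() pairwise across any number of lists
common.events <- sort(Reduce(intersect, list(top.deaths, top.injuries, top.damage)))
```

Using Reduce() keeps the comparison in one expression and extends without change if a fourth list, such as crop damage, were added.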