Impact of severe weather events in the United States

Beatriz Gutierrez

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The basic goal is to explore the NOAA Storm Database and answer the following questions about severe weather events:

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

Data Processing

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.

The data set is downloaded only if it does not already exist in the working directory. It is then decompressed and loaded into memory using the read.csv R command, which reads bzip2-compressed files directly. The resulting data frame contains the raw data:

# download the data set (bzip2-compressed CSV)
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file_name <- "repdata_data_StormData.csv.bz2"

# download only if the file is not already in the working directory
if (!file.exists(file_name)) {
        download.file(file_url, destfile = file_name, mode = 'wb')
        date_download <- date()
}
# read.csv decompresses the bzip2 file while loading it into memory
raw_data <- read.csv(file_name)

Some documentation of the database is available, such as the National Weather Service Storm Data Documentation, which describes how the variables are constructed and defined. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
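The statement that earlier years contain fewer records could be checked by counting events per year. The sketch below does this on a small made-up data frame. It assumes a date column named BGN_DATE in month/day/year format, which is not among the variables used in this analysis and should be verified against the documentation before applying the same steps to raw_data.

```r
# Toy stand-in for raw_data (made-up rows; BGN_DATE and its format are assumptions)
toy_events <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1950 0:00:00",
               "3/1/2011 0:00:00", "5/22/2011 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)
# parse the date part; characters after the matched format are ignored
event_year <- as.integer(format(as.Date(toy_events$BGN_DATE, format = "%m/%d/%Y"), "%Y"))
# number of recorded events per year
events_per_year <- table(event_year)
print(events_per_year)
```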

The variables that will be used for this analysis include:

EVTYPE: Event Type
FATALITIES: # of fatalities caused by each type of event
INJURIES: # of injuries caused by each type of event
PROPDMG: property damage cost
PROPDMGEXP: magnitude code (H, K, M or B) that scales the property damage cost
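Since only these five variables are needed, the data could be reduced to them before any further processing. The following is a minimal sketch on a made-up data frame, not a step the analysis itself performs; the column OTHER_COLUMN is hypothetical and stands for any variable that is not used.

```r
# columns required by the analysis
analysis_cols <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP")

# Toy stand-in for raw_data (made-up values, for illustration only)
toy_raw <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(3, 0),
  INJURIES = c(15, 2),
  PROPDMG = c(25, 1.5),
  PROPDMGEXP = c("K", "M"),
  OTHER_COLUMN = c("a", "b"),  # hypothetical unused column
  stringsAsFactors = FALSE
)

# keep only the columns of interest
storm_subset <- toy_raw[, analysis_cols]
str(storm_subset)
```

The same indexing, applied as raw_data[, analysis_cols], would reduce memory use without changing any of the results.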

Analysis of the most harmful types of events with respect to population health across the United States.

The impact of severe weather events on population health is represented in the data by fatalities and injuries, so both were analyzed.

Fatalities

The total number of fatalities for each type of event was computed, omitting the missing values.

fatalities_per_event <- aggregate(FATALITIES ~ EVTYPE, data = raw_data, FUN = sum, na.rm = TRUE)

The most harmful types of events were identified using the 95th percentile of the fatality totals: event types whose total exceeds this percentile, roughly the top 5%, were selected. For that purpose, the fatality totals by event type were sorted in ascending order and only values greater than zero were considered.

order_fatalities_asc <- fatalities_per_event[order(fatalities_per_event$FATALITIES), ]
# remove events that did not cause fatalities
order_fatalities_asc <- order_fatalities_asc[which(order_fatalities_asc$FATALITIES > 0), ]
# compute the 95th percentile of the fatality totals
fatalities_95_percentil <- quantile(order_fatalities_asc$FATALITIES, c(.95))
# keep the events above the 95th percentile (roughly the top 5%)
fatalities_most_harmful <- order_fatalities_asc[which(order_fatalities_asc$FATALITIES > fatalities_95_percentil), ]
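To make the selection rule concrete, the sketch below applies the same three steps to made-up totals. The data frame toy and its numbers are purely illustrative. It relies on the default behavior of quantile, which interpolates between observed values, so the cutoff need not be one of the totals.

```r
# Toy totals per event type (made-up numbers, for illustration only)
toy <- data.frame(
  EVTYPE = paste0("EVENT_", 1:20),
  FATALITIES = c(0, 0, 1:18),
  stringsAsFactors = FALSE
)
# keep only event types with at least one fatality
toy_pos <- toy[toy$FATALITIES > 0, ]
# 95th percentile of the positive totals (interpolated by default)
cutoff <- quantile(toy_pos$FATALITIES, 0.95)
# event types strictly above the cutoff
toy_top <- toy_pos[toy_pos$FATALITIES > cutoff, ]
print(cutoff)
print(toy_top)
```

With 18 positive totals only one event type lies above the cutoff, which shows that the number of selected events depends on how many event types have non-zero totals.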

Injuries

The total number of injuries for each type of event was computed, omitting the missing values.

injuries_per_event <- aggregate(INJURIES ~ EVTYPE, data = raw_data, FUN = sum, na.rm = TRUE)

The most harmful events were identified using the 95th percentile of the injury totals: event types whose total exceeds this percentile, roughly the top 5%, were selected. For that purpose, the injury totals by event type were sorted in ascending order and only values greater than zero were considered.

order_injuries_asc <- injuries_per_event[order(injuries_per_event$INJURIES), ]
# remove events that did not cause injuries
order_injuries_asc <- order_injuries_asc[which(order_injuries_asc$INJURIES > 0), ]
# compute the 95th percentile of the injury totals
injuries_95_percentil <- quantile(order_injuries_asc$INJURIES, c(.95))
# keep the events above the 95th percentile (roughly the top 5%)
injuries_most_harmful <- order_injuries_asc[which(order_injuries_asc$INJURIES > injuries_95_percentil), ]

Analysis of the types of events with the greatest economic consequences across the United States.

Many severe events result in property damage, which is represented in the data set by the PROPDMG variable. Its magnitude is given by the variable PROPDMGEXP, for example H (100), K (1,000), M (1,000,000) or B (1,000,000,000). As a pre-processing step, all property damage amounts were converted into dollars and stored in a new variable, PROPDMG_dollar. Amounts with any other PROPDMGEXP value were left unscaled.

damage_data <- raw_data
#add a new column with the damage in dollars
damage_data$PROPDMG_dollar <- ifelse (damage_data$PROPDMGEXP == "B", damage_data$PROPDMG*10^9,
                                      ifelse (damage_data$PROPDMGEXP %in% c("M", "m"), damage_data$PROPDMG*10^6,
                                      ifelse (damage_data$PROPDMGEXP %in% c("K", "k"), damage_data$PROPDMG*10^3,
                                      ifelse (damage_data$PROPDMGEXP %in% c("H", "h"), damage_data$PROPDMG*10^2,
                                      damage_data$PROPDMG))))
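The nested ifelse calls could also be written as a lookup table, which some readers may find easier to extend. The sketch below is an alternative formulation, not the code used in this analysis; exp_multiplier, to_dollars and the toy data frame are hypothetical names introduced for illustration. One difference is that it converts codes to upper case first, so a lowercase "b" would also be treated as billions.

```r
# multiplier for each magnitude code; any other code leaves the amount unscaled
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

to_dollars <- function(amount, exp_code) {
  # look up the multiplier by upper-cased code; unknown or empty codes give NA
  mult <- exp_multiplier[toupper(as.character(exp_code))]
  # fall back to a multiplier of 1 for codes not in the table
  mult[is.na(mult)] <- 1
  amount * unname(mult)
}

# Toy data (made-up values, for illustration only)
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3, 10),
                  PROPDMGEXP = c("K", "m", "B", "h", ""),
                  stringsAsFactors = FALSE)
toy$PROPDMG_dollar <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```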

Then, the property damage values in dollars were aggregated by event type omitting the missing values.

damage_per_event <- aggregate(PROPDMG_dollar ~ EVTYPE, data = damage_data, FUN = sum, na.rm = TRUE)

The same criterion was used to determine the most harmful events. The 95th percentile of the property damage totals was computed and the event types above this percentile, roughly the top 5%, were selected. For that purpose, the property damage totals by event type were sorted in ascending order and only values greater than zero were considered.

# order the data by damages in ascending order
order_damage_asc <- damage_per_event[order(damage_per_event$PROPDMG_dollar), ]
# remove events that did not cause damage
order_damage_asc <- order_damage_asc[which(order_damage_asc$PROPDMG_dollar > 0), ]
# compute the 95th percentile of the property damage totals
damages_95_percentil <- quantile(order_damage_asc$PROPDMG_dollar, c(.95))
# keep the events above the 95th percentile (roughly the top 5%)
damages_most_harmful <- order_damage_asc[which(order_damage_asc$PROPDMG_dollar > damages_95_percentil), ]
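Because fatalities, injuries and property damage all follow the same aggregate, filter and percentile steps, the procedure could be wrapped in one helper. The function top_events below is a hypothetical sketch of such a refactoring, shown on made-up data. It uses the non-formula interface of aggregate, so missing values are dropped inside sum rather than by row removal.

```r
# Hypothetical helper: total a column per event type and keep the
# event types whose total lies above the given percentile
top_events <- function(data, value_col, prob = 0.95) {
  totals <- aggregate(data[[value_col]], by = list(EVTYPE = data$EVTYPE),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- value_col
  # only event types with a positive total are considered
  totals <- totals[totals[[value_col]] > 0, ]
  cutoff <- quantile(totals[[value_col]], prob)
  top <- totals[totals[[value_col]] > cutoff, ]
  # ascending order, as in the steps above
  top[order(top[[value_col]]), ]
}

# Toy data (made-up values, for illustration only)
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "HAIL", "HEAT", "WIND"),
  FATALITIES = c(50, 30, 10, 0, 5, 1),
  stringsAsFactors = FALSE
)
toy_top <- top_events(toy, "FATALITIES")
print(toy_top)
```

Under these assumptions, calls such as top_events(raw_data, "INJURIES") or top_events(damage_data, "PROPDMG_dollar") would follow the same pattern.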

Results

Question 1: Which types of events are most harmful to population health?

A two-row panel depicts the most harmful event types in terms of fatalities and injuries. The library ggplot2 is used to plot the information and the library cowplot is used to arrange the two plots in a two-row panel. Note that both the event types and their number differ between fatalities and injuries, because they were selected using the 95th percentile in each case.

library(cowplot)

library(ggplot2)

# plot fatalities
fatalities_plot <- ggplot(fatalities_most_harmful, aes(x = FATALITIES, y = reorder(EVTYPE, FATALITIES))) +
        geom_bar(stat = "identity", fill = "darkorange") +
        xlab('Fatalities') +
        ylab('Events') 

# plot injuries
injuries_plot <- ggplot(injuries_most_harmful, aes(x = INJURIES, y = reorder(EVTYPE, INJURIES))) +
        geom_bar(stat = "identity", fill = "darkturquoise") +
        xlab('Injuries') +
        ylab('Events')

plot_grid(fatalities_plot, injuries_plot,  align = "v", nrow = 2)

Fatalities (top) and injuries (bottom) caused by most harmful events

These graphs show that tornadoes are the most harmful weather events with respect to population health, in terms of both fatalities and injuries.

Question 2: Which types of events have the greatest economic consequences?

The most harmful types of events in terms of property damage are depicted in this plot using the library ggplot2. The damage amounts are large enough to be expressed in billions of dollars.

# plot damages
library(ggplot2)
ggplot(damages_most_harmful, aes(x = PROPDMG_dollar/1e9, y = reorder(EVTYPE, PROPDMG_dollar))) +
        geom_bar(stat = "identity", fill = "blue") +
        xlab('Property damage [billions of dollars]') +
        ylab('Events') +
        ggtitle('Property damage caused by most harmful events')

This graph shows that floods have the greatest economic consequences in terms of property damage.

Reproducible Research Peer-graded Assignment: Course Project 2