Exploratory Analysis of the Impact of Severe Weather Events in the US

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

Our analysis aims to answer two basic questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Loading the data

  1. Set up the environment by loading the required libraries.
  2. Download the compressed data file, repdata_data_StormData.csv.bz2, if it is not already present.
  3. Decompress the file and read it into a data frame with read.csv().
# Load the required libraries
library(xtable)
library(data.table)
library(reshape2)
library(ggplot2)
# Read the data
file.con <- bzfile("repdata_data_StormData.csv.bz2", open = "r")
stormData <- read.csv(file.con, stringsAsFactors = FALSE)
close(file.con)

# Convert the names to lowercase
old.names <- names(stormData)
new.names <- tolower(old.names)
setnames(stormData, old.names, new.names)
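Step 2 of the list above (downloading the file when it is missing) is not part of the code, which assumes a local copy already exists. One possible way to wrap loading into a single call is sketched below. The helper name load_storm_data() is hypothetical, and url is a placeholder argument: the caller supplies the zipped file's actual location. Because read.csv() opens a path through file(), which decompresses bzip2 input transparently, no explicit bzfile() connection is needed in this version.

```r
# Hypothetical helper: download the compressed CSV only if it is missing,
# then read it and lowercase the column names as the code above does.
load_storm_data <- function(path = "repdata_data_StormData.csv.bz2", url = NULL) {
  if (!file.exists(path)) {
    if (is.null(url)) {
      stop("Data file not found and no download URL supplied: ", path)
    }
    # mode = "wb" keeps the binary .bz2 file intact on Windows
    download.file(url, destfile = path, mode = "wb")
  }
  # read.csv() opens the path with file(), which detects and
  # decompresses bzip2 input when reading
  storm <- read.csv(path, stringsAsFactors = FALSE)
  names(storm) <- tolower(names(storm))
  storm
}

# Usage (the URL is a placeholder for the zipped file's real location):
# stormData <- load_storm_data(url = "<zipped file URL>")
```

Returning the data frame, rather than assigning a global, makes the step easy to rerun and test.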

Data Processing

We focus on the following variables: evtype, fatalities, injuries, propdmg and cropdmg.

We group the records by event type (evtype) and sum the fatalities, injuries, propdmg and cropdmg for each type. We are only interested in event types for which these totals are greater than zero.

  1. Sum each of the other variables by 'evtype'
  2. Combine the per-event totals into a single data frame
  3. Keep the rows where 'fatalities' or 'injuries' is greater than zero (interest in population health)
  4. Keep the rows where 'propdmg' or 'cropdmg' is greater than zero (interest in economic consequences)
# Aggregate by event type: sum of fatalities
fatalities <- sapply(split(stormData$fatalities, stormData$evtype), sum)
# Aggregate by event type: sum of injuries
injuries <- sapply(split(stormData$injuries, stormData$evtype), sum)
# Aggregate by event type: sum of propdmg
propdmg <- sapply(split(stormData$propdmg, stormData$evtype), sum)
# Aggregate by event type: sum of cropdmg
cropdmg <- sapply(split(stormData$cropdmg, stormData$evtype), sum)
# Create a single data frame
eventsDmg <- as.data.frame(cbind(fatalities, injuries, propdmg, cropdmg))
eventsDmg <- cbind(events = rownames(eventsDmg), eventsDmg)
rownames(eventsDmg) <- NULL

# Subset where fatalities > 0 | injuries > 0, sorted by fatalities, then injuries
dmgHealth <- subset(eventsDmg, fatalities > 0 | injuries > 0)
dmgHealth <- dmgHealth[with(dmgHealth, order(fatalities, injuries, decreasing = TRUE)), ]
# Subset where propdmg > 0 | cropdmg > 0, sorted by propdmg, then cropdmg
dmgEcomcs <- subset(eventsDmg, propdmg > 0 | cropdmg > 0)
dmgEcomcs <- dmgEcomcs[with(dmgEcomcs, order(propdmg, cropdmg, decreasing = TRUE)), ]
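The four sapply(split(...)) calls and the cbind() step can also be collapsed into one call to base R's aggregate() with a cbind() formula. The sketch below runs steps 1 to 4 on a small hypothetical data frame (toy values, not NOAA data) with the same lowercase column names; note that the grouping column keeps the name evtype instead of being renamed to events.

```r
# Hypothetical toy stand-in for stormData (values invented for illustration)
toy <- data.frame(
  evtype     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  fatalities = c(2, 0, 3, 0),
  injuries   = c(5, 0, 1, 0),
  propdmg    = c(10, 4, 6, 0),
  cropdmg    = c(0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Steps 1-2: sum every damage variable per event type in a single call;
# groups come back sorted alphabetically by evtype
agg <- aggregate(cbind(fatalities, injuries, propdmg, cropdmg) ~ evtype,
                 data = toy, FUN = sum)

# Step 3: event types that harmed people, most fatalities first
health <- subset(agg, fatalities > 0 | injuries > 0)
health <- health[order(health$fatalities, health$injuries, decreasing = TRUE), ]

# Step 4: event types with recorded damage, most property damage first
econ <- subset(agg, propdmg > 0 | cropdmg > 0)
econ <- econ[order(econ$propdmg, econ$cropdmg, decreasing = TRUE), ]
```

Here FLOOD has all-zero totals and is dropped by both subsets, which is the filtering described in steps 3 and 4.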

Results

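We want to answer the two questions posed in the Synopsis.

Question 1: Across the United States, which types of events are most harmful with respect to population health?

To answer it, we show a bar plot of the five most harmful event types, ranked by number of fatalities and then by number of injuries.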

# Columns: events, fatalities, injuries
top5Health <- head(dmgHealth, n = 5)[, c(1, 2, 3)]

top5HealthMelt <- melt(top5Health, id.vars = "events")
levels(top5HealthMelt$variable) <- c("fatalities", "injuries")

ggplot(top5HealthMelt, aes(x = events, y = value)) +
    geom_bar(stat = "identity", position = "stack") +
    facet_grid(variable ~ .) +
    geom_text(aes(label = value), vjust = -0.4, size = 3) +
    labs(x = "Event", y = "Count")
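reshape2::melt() with id.vars = "events" turns the wide top-5 table into long format: one row per event and measure, with the measure name in a variable column. That is the shape facet_grid(variable ~ .) needs to draw one panel per measure. For readers unfamiliar with melt(), the same reshaping can be sketched in base R on a hypothetical two-row table (invented values):

```r
# Hypothetical wide table shaped like top5Health (values invented)
top <- data.frame(events     = c("TORNADO", "HEAT"),
                  fatalities = c(10, 4),
                  injuries   = c(50, 7),
                  stringsAsFactors = FALSE)

# Base-R equivalent of melt(top, id.vars = "events"):
# the measure columns are stacked one after another
measures <- c("fatalities", "injuries")
long <- data.frame(
  events   = rep(top$events, times = length(measures)),
  variable = factor(rep(measures, each = nrow(top)), levels = measures),
  value    = unlist(top[measures], use.names = FALSE),
  stringsAsFactors = FALSE
)
```

Because the measure names become factor levels in column order, the panels appear in the order fatalities, injuries.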

[Figure: fatalities (top panel) and injuries (bottom panel) for the five event types with the most fatalities]

Answer:
Event: TORNADO
Fatalities: 5,633
Injuries: 91,346

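Question 2: Across the United States, which types of events have the greatest economic consequences?

To answer it, we show a bar plot of the five event types with the greatest economic consequences, ranked by the amount of property damage and then by the amount of crop damage.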

# Columns: events, propdmg, cropdmg
top5Ecomcs <- head(dmgEcomcs, n = 5)[, c(1, 4, 5)]
top5EcomcsMelt <- melt(top5Ecomcs, id.vars = "events")
levels(top5EcomcsMelt$variable) <- c("propdmg", "cropdmg")

ggplot(top5EcomcsMelt, aes(x = events, y = value)) +
    geom_bar(stat = "identity", position = "dodge") +
    facet_grid(variable ~ .) +
    geom_text(aes(label = value), vjust = -0.4, size = 3) +
    labs(x = "Event", y = "Damage (sum of raw values)")

[Figure: property damage (top panel) and crop damage (bottom panel) for the five event types with the most property damage]

Answer:
Event: TORNADO
Property damage (sum of raw propdmg values): 3.2123 × 10^6
Crop damage (sum of raw cropdmg values): 1.0002 × 10^5
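The damage totals above add up propdmg and cropdmg as recorded. In the NOAA data each figure comes with a magnitude code in a companion column (propdmgexp and cropdmgexp after lowercasing), and NOAA's documentation uses "K" for thousands, "M" for millions and "B" for billions. The following sketch converts property damage to dollars before summing. It relies on a hypothetical helper, exp_to_multiplier(), and treats every other code (blank, digits, "+", "?", and so on, which do occur in the raw file) as a multiplier of 1. That treatment is a simplifying assumption, not the analysis used in this report. The toy rows are invented.

```r
# Hypothetical helper: map magnitude codes to numeric multipliers.
# Assumption: only K, M and B (any case) are recognised; all other codes,
# including blanks and NA, are treated as a multiplier of 1.
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(code))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- unname(mult[code])
  out[is.na(out)] <- 1
  out
}

# Hypothetical toy rows with lowercase names as in stormData
toy <- data.frame(evtype     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
                  propdmg    = c(25, 2.5, 1.5, 10),
                  propdmgexp = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Dollar amounts per record, then total per event type
toy$propdmg.usd <- toy$propdmg * exp_to_multiplier(toy$propdmgexp)
usd <- tapply(toy$propdmg.usd, toy$evtype, sum)
```

The same multiplier would apply to cropdmg with cropdmgexp. With the conversion applied, the ranking can differ from one based on raw sums, as it does for these toy rows.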