1 Synopsis
2 Data Processing
3 Dangerous Events with respect to Population Health
4 Question 2: Across the United States, which types of events have the greatest economic consequences?
5 Results
- 5.1 Plot for injuries and death due to weather events
- 5.2 Plot for damage of properties and crops due to weather events

knitr::opts_chunk$set(echo=TRUE, eval=TRUE, warning=FALSE, message=FALSE)

1 Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The data can be downloaded from the following link.

Storm Data 47Mb

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

The objective of the assignment is to understand the impact of weather events on population health and the national economy, in particular the damage caused to property and crops.

2 Data Processing

# The code will check if the data directory exists in the working directory,
# if the directory is present, it will check if the storm data file has been
# downloaded earlier. If so, it will read the file. Else, it will create the
# data directory and download the file. Thereafter it will read the file.

# create data directory if it does not exist
wrkdir <- getwd()
subdir <- "data"
dir.create(file.path(wrkdir, subdir), showWarnings = FALSE)
datadir <- file.path(wrkdir, subdir)

# Download the stormdata file into the data directory if the file does not exist
setwd(datadir)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "StormData.csv.bz2"

# Download the file only if the file does not exist in the data directory
if (!file.exists(destfile)) {
        download.file(url, destfile, method="curl")
        closeAllConnections()
        }
# Read the stormdata
if (!exists("origdata")){
        origdata <- read.csv(bzfile(destfile))
        }

# Set back the working directory
setwd(wrkdir)

# Keep the original data for future reference
stormdata <- origdata

# Clean the eventtype data
# translate all event types to lowercase
stormdata$EVTYPE <- tolower(stormdata$EVTYPE)

# replace each blank or punctuation character with a single space
stormdata$EVTYPE  <- gsub("[[:blank:][:punct:]]", " ", stormdata$EVTYPE)

# Substitute winds with wind
stormdata$EVTYPE <- gsub("winds", "wind", stormdata$EVTYPE)

# Substitute tstm with thunderstorm
stormdata$EVTYPE <- gsub("tstm", "thunderstorm", stormdata$EVTYPE)

# Trim all leading whitespace
stormdata$EVTYPE <- sub("^\\s+", "", stormdata$EVTYPE)
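To see what this normalization does to individual labels, the sketch below applies the same sequence of substitutions to a handful of sample strings. The helper name `clean_evtype` and the sample labels are assumptions made for illustration; they are not part of the original analysis.

```r
# Hypothetical helper that bundles the cleaning steps into one function
clean_evtype <- function(x) {
  x <- tolower(x)                            # lowercase everything
  x <- gsub("[[:blank:][:punct:]]", " ", x)  # blanks and punctuation become spaces
  x <- gsub("winds", "wind", x)              # plural to singular
  x <- gsub("tstm", "thunderstorm", x)       # expand the abbreviation
  sub("^\\s+", "", x)                        # drop leading whitespace
}

# Made-up labels in the style of the raw EVTYPE column
samples <- c("TSTM WIND", " High Winds", "FROST/FREEZE", "Thunderstorm Winds")
cleaned <- clean_evtype(samples)
print(cleaned)
```

The first and last labels collapse to the same value, which is the point of the cleaning: otherwise their casualties and damage would be counted under separate event types.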

3 Dangerous Events with respect to Population Health

3.1 Data processing

# Load the required libraries
library(knitr)
library(dplyr)
library(tidyr)
library(ggplot2)

# group data by event type
dfgrp <- group_by (stormdata, EVTYPE)

# summarise data by event type while totaling the fatalities and injuries
casualty <- summarize(dfgrp, 
                      fatalities = sum(FATALITIES),
                      injuries = sum(INJURIES))

# Keep only event types that caused both fatalities and injuries
casualty <- filter(casualty, fatalities!=0, injuries!=0)

# Sorted on fatalities
casualty.fatalities <- arrange(casualty, desc(fatalities))

# Sorted on injuries
casualty.injuries <- arrange(casualty, desc(injuries))
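The group, sum, filter, and sort pattern used here can be tried on a small example. The sketch below uses base R's `aggregate` in place of dplyr so that it runs without extra packages; the `toy` data frame and its values are invented for illustration.

```r
# Invented records: one row per event, as in the storm data
toy <- data.frame(
  EVTYPE     = c("tornado", "flood", "tornado", "hail", "flood"),
  FATALITIES = c(3, 1, 4, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# Total fatalities and injuries per event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Keep event types with both fatalities and injuries, as the filter above does
totals <- totals[totals$FATALITIES != 0 & totals$INJURIES != 0, ]

# Sort by fatalities, largest first
totals <- totals[order(-totals$FATALITIES), ]
print(totals)
```

Here hail is dropped because it has no fatalities, even though it caused injuries. The same filter in the analysis removes any event type with a zero in either column.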

3.2 The top 10 events that caused the largest number of fatalities are

# Using Kable for neater output of the table
knitr::kable(select(casualty.fatalities, EVTYPE, fatalities)[1:10,])

EVTYPE	fatalities
tornado	5633
excessive heat	1903
flash flood	978
heat	937
lightning	816
thunderstorm wind	701
flood	470
rip current	368
high wind	283
avalanche	224

3.3 The top 10 events that caused the largest number of injuries are

# Using Kable for neater output of the table
knitr::kable(select(casualty.injuries, EVTYPE, injuries)[1:10,])

EVTYPE	injuries
tornado	91346
thunderstorm wind	9353
flood	6789
excessive heat	6525
lightning	5230
heat	2100
ice storm	1975
flash flood	1777
high wind	1439
hail	1361

4 Question 2: Across the United States, which types of events have the greatest economic consequences?

To analyze the impact of weather events on the economy, the available reports and estimates of property damage and crop damage were used.

In the raw data, property damage is represented by two fields: a numeric value, PROPDMG, and a magnitude code, PROPDMGEXP, that gives the power of ten by which the value is multiplied (for example K for thousands, M for millions, and B for billions). Crop damage is represented in the same way by CROPDMG and CROPDMGEXP. The first step in the analysis is to calculate the property and crop damage in dollars for each event.
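As a worked illustration of this encoding, the following sketch multiplies a few made-up damage values by the power of ten implied by their magnitude codes. The sample rows and the `multiplier` lookup vector are assumptions for illustration only and are not taken from the storm database.

```r
# Power of ten for each magnitude code (hundreds, thousands, millions, billions)
multiplier <- c(H = 2, K = 3, M = 6, B = 9)

# Made-up records in the shape of the PROPDMG / PROPDMGEXP columns
dmg <- data.frame(
  PROPDMG    = c(25, 1.5, 2),
  PROPDMGEXP = c("K", "m", "B")
)

# Dollar amount = value * 10^exponent; toupper() makes the lookup case-insensitive
dmg$dollars <- unname(dmg$PROPDMG * 10^multiplier[toupper(dmg$PROPDMGEXP)])
print(dmg)
```

So a record of 25 with code K stands for 25,000 dollars, and 1.5 with code M stands for 1.5 million dollars.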

4.1 Data processing

# The purpose of this function is to determine the power of ten encoded by
# the character in the PROPDMGEXP and CROPDMGEXP columns of the dataset
trans_exp <- function(exp) {
        # work on a character value so that factor codes are never used
        exp <- as.character(exp)
        if (exp %in% c('h', 'H'))
                return(2)
        else if (exp %in% c('k', 'K'))
                return(3)
        else if (exp %in% c('m', 'M'))
                return(6)
        else if (exp %in% c('b', 'B'))
                return(9)
        else if (exp %in% c('', '-', '?', '+'))
                return(0)
        else if (grepl("^[0-9]+$", exp))
                return(as.numeric(exp))
        else {
                stop("Invalid exponent value.")
                }
        }

# Create a column containing the exponent
stormdata$propdmg_exp   <- sapply(stormdata$PROPDMGEXP, FUN=trans_exp)
stormdata$cropdmg_exp   <- sapply(stormdata$CROPDMGEXP, FUN=trans_exp)

# Compute the dollar amount of the property and crop damage
stormdata$prop_dmg <- stormdata$PROPDMG * (10 ^ stormdata$propdmg_exp)
stormdata$crop_dmg <- stormdata$CROPDMG * (10 ^ stormdata$cropdmg_exp)

# Compute the economic loss by event type

# group by event type
dfgrp <- group_by (stormdata, EVTYPE)

# summarise by event type
economicloss <- summarize(dfgrp, 
                          prop_dmg = sum(prop_dmg),
                          crop_dmg = sum(crop_dmg))

# Keep only event types that caused both property and crop damage
economicloss <- filter(economicloss, prop_dmg!=0, crop_dmg!=0)

# Sorted on property damage
economicloss.prop_dmg <- arrange(economicloss, desc(prop_dmg))

# Sorted on crop damage
economicloss.crop_dmg <- arrange(economicloss, desc(crop_dmg))

4.2 The top 10 events that caused the most property damage (in dollars) are

# Using Kable for neater output of the table
knitr::kable(select(economicloss.prop_dmg, EVTYPE, prop_dmg)[1:10,])

EVTYPE	prop_dmg
tornado	3.163480e+23
thunderstorm wind	2.645861e+23
flash flood	1.405882e+23
flood	8.785298e+22
hail	6.751067e+22
lightning	6.028593e+22
high wind	3.760199e+22
winter storm	1.311573e+22
heavy snow	1.214391e+22
wildfire	8.081400e+21

4.3 The top 10 events that caused the most crop damage (in dollars) are

# Using Kable for neater output of the table
knitr::kable(select(economicloss.crop_dmg, EVTYPE, crop_dmg)[1:10,])

EVTYPE	crop_dmg
hail	5.769940e+12
thunderstorm wind	1.937181e+12
flash flood	1.780814e+12
flood	1.630884e+12
tornado	9.957467e+11
drought	2.284111e+11
high wind	1.844799e+11
heavy rain	1.047210e+11
frost freeze	6.154814e+10
tropical storm	5.293312e+10

5 Results

All data are shown on a logarithmic scale (Log10) because the range of values is quite large.
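A quick calculation with two values from the injuries table above (tornado and hail) shows how much the logarithm compresses the range. This is only a sketch of the arithmetic; the variable names are chosen for illustration.

```r
# Injury totals for the first and last rows of the top 10 injuries table
injuries <- c(tornado = 91346, hail = 1361)

# On the raw scale the tornado bar would be roughly 67 times longer
raw_ratio <- unname(injuries["tornado"] / injuries["hail"])

# On the log10 scale the two values differ by less than 2 units
log_values <- log10(injuries)
log_gap <- unname(log_values["tornado"] - log_values["hail"])

print(raw_ratio)
print(log_values)
print(log_gap)
```

On a linear axis the smaller bars would be barely visible next to the tornado bar, whereas the log scale keeps all ten events readable. The trade-off is that bar lengths no longer compare proportionally.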

5.1 Plot for injuries and death due to weather events

# create one dataframe with all the data for plotting using facets
fataldata            <- casualty.fatalities[1:10,]
fataldata$dangertype <- "Fatalities"
fataldata$number     <- fataldata$fatalities

injurdata            <- casualty.injuries[1:10,]
injurdata$dangertype <- "Injuries"
injurdata$number     <- injurdata$injuries

# Combine fatalities and injuries
dangerdata <- rbind(fataldata,injurdata)

# Retain relevant columns only
dangerdata <- select(dangerdata, EVTYPE, dangertype, number)

# define the plot data
plotdata <- dangerdata 

# labels
xlabel <- "Event Type"
ylabel <- "Fatalities & Injuries (Log10 scale)"
title  <- "Health consequences of Weather Events"

# creating the plot
ggplot(plotdata, aes(x = reorder(toupper(EVTYPE), number), 
                     y = log10(number), 
                     fill = EVTYPE)) +
        geom_bar(stat="identity") +
        labs (x= xlabel, y=ylabel) +
        labs(title=title)+
        facet_wrap(~dangertype)+
        theme(legend.position="none")+
        coord_flip()

Tornadoes caused the most deaths and injuries of any event type. After tornadoes, the deadliest events were excessive heat and flash floods, while thunderstorm wind, floods, and excessive heat caused the most injuries. Excessive heat therefore ranks among the leading causes of both deaths and injuries.

5.2 Plot for damage of properties and crops due to weather events

# create one dataframe with all the data for plotting using facets
propdata            <- economicloss.prop_dmg[1:10,]
propdata$damagetype <- "Property damage"
propdata$damageamt  <- propdata$prop_dmg

cropdata            <- economicloss.crop_dmg[1:10,]
cropdata$damagetype <- "Crop damage"
cropdata$damageamt  <- cropdata$crop_dmg

# Combine property and crop damage
damagedata <- rbind(propdata,cropdata)

# Retain relevant columns only
damagedata <- select(damagedata, EVTYPE, damagetype, damageamt)

# define the plot data
plotdata <- damagedata

# labels
xlabel <- "Event Type"
ylabel <- "Damage in dollars (Log10 scale)"
title  <- "Economic consequences of Weather Events"

# creating the plot
ggplot(plotdata, aes(x = reorder(toupper(EVTYPE), damageamt), 
                     y = log10(damageamt), 
                     fill = EVTYPE)) +
        geom_bar(stat="identity") +
        labs (x= xlabel, y=ylabel) +
        labs(title=title)+
        facet_wrap(~damagetype)+
        theme(legend.position="none")+
        coord_flip()

Tornadoes also caused immense losses of both property and crops. However, the greatest crop damage was caused by hail, followed by thunderstorm wind and flash floods. Thunderstorm wind and flash floods rank as the 2nd and 3rd largest causes of damage to both property and crops.

Health and Economic Consequences of the Weather in US between 1950-2011

Sarajit Poddar

22 March 2015