knitr::opts_chunk$set(echo=TRUE, eval=TRUE, warning=FALSE, message=FALSE)
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Data
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The data can be downloaded from the following link.
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
The objective of the assignment is to understand the impact of weather events on public health and the national economy, especially the damage caused to property and crops.
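Because the file is a bzip2-compressed CSV, it can be read without decompressing it to disk first. The following is a minimal sketch on an invented two-row data frame; the names `tmp`, `sample_rows`, and `roundtrip` are illustrative and not part of the analysis.

```r
# Write a tiny CSV through a bzip2 connection, then read it back.
tmp <- tempfile(fileext = ".csv.bz2")
sample_rows <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                          FATALITIES = c(1, 0))

con <- bzfile(tmp, open = "wt")
write.csv(sample_rows, con, row.names = FALSE)
close(con)

# read.csv() accepts a connection; bzfile() decompresses on the fly.
# stringsAsFactors = FALSE keeps text columns as character vectors.
roundtrip <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(roundtrip)
```

Reading text columns as character rather than factor avoids surprises later when codes such as the damage exponents are converted to numbers.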
# The code will check if the data directory exists in the working directory,
# if the directory is present, it will check if the storm data file has been
# downloaded earlier. If so, it will read the file. Else, it will create the
# data directory and download the file. Thereafter it will read the file.
# Create the data directory if it does not exist
wrkdir <- getwd()
subdir <- "data"
dir.create(file.path(wrkdir, subdir), showWarnings = FALSE)
datadir <- file.path(wrkdir, subdir)
# Download the storm data file into the data directory if the file does not exist
setwd(datadir)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "StormData.csv.bz2"
# Download the file only if the file does not exist in the data directory
if (!file.exists(destfile)) {
download.file(url, destfile, method="curl")
closeAllConnections()
}
# Read the stormdata
if (!exists("origdata")){
origdata <- read.csv(bzfile(destfile))
}
# Set back the working directory
setwd(wrkdir)
# Keep the original data for future reference
stormdata <- origdata
# Clean the eventtype data
# translate all event types to lowercase
stormdata$EVTYPE <- tolower(stormdata$EVTYPE)
# replace every blank or punctuation character with a space
stormdata$EVTYPE <- gsub("[[:blank:][:punct:]+]", " ", stormdata$EVTYPE)
# Substitute winds with wind
stormdata$EVTYPE <- gsub("winds", "wind", stormdata$EVTYPE)
# Substitute tstm with thunderstorm
stormdata$EVTYPE <- gsub("tstm", "thunderstorm", stormdata$EVTYPE)
# Trim all leading whitespace (the pattern "^\\s" would remove only one character)
stormdata$EVTYPE <- sub("^\\s+", "", stormdata$EVTYPE)
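To show what the cleaning achieves, the sketch below applies the same sequence of steps to a few invented raw labels. The helper name `clean_evtype` and the sample strings are assumptions made for illustration; they are not taken from the dataset documentation.

```r
# Hypothetical helper that bundles the cleaning steps into one function
clean_evtype <- function(x) {
  x <- tolower(x)                                  # unify case
  x <- gsub("[[:blank:][:punct:]+]", " ", x)       # blanks/punctuation -> space
  x <- gsub("winds", "wind", x)                    # plural -> singular
  x <- gsub("tstm", "thunderstorm", x)             # expand abbreviation
  sub("^\\s+", "", x)                              # drop leading whitespace
}

raw <- c("TSTM WIND", "THUNDERSTORM WINDS", " Frost/Freeze", "TSTM WIND/HAIL")
cleaned <- clean_evtype(raw)
print(cleaned)
# The first two labels collapse into the same event type
print(length(unique(cleaned)))
```

Merging spelling variants this way matters because the later totals are grouped by the exact text of `EVTYPE`.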
# Load the required libraries
library(knitr)
library(dplyr)
library(tidyr)
library(ggplot2)
# group data by event type
dfgrp <- group_by (stormdata, EVTYPE)
# summarise data by event type while totaling the fatalities and injuries
casualty <- summarize(dfgrp,
fatalities = sum(FATALITIES),
injuries = sum(INJURIES))
# Keep only event types that caused both fatalities and injuries
casualty <- filter(casualty, fatalities!=0, injuries!=0)
# Sorted on fatalities
casualty.fatalities <- arrange(casualty, desc(fatalities))
# Sorted on injuries
casualty.injuries <- arrange(casualty, desc(injuries))
# Using Kable for neater output of the table
knitr::kable(select(casualty.fatalities, EVTYPE, fatalities)[1:10,])
| EVTYPE | fatalities |
|---|---|
| tornado | 5633 |
| excessive heat | 1903 |
| flash flood | 978 |
| heat | 937 |
| lightning | 816 |
| thunderstorm wind | 701 |
| flood | 470 |
| rip current | 368 |
| high wind | 283 |
| avalanche | 224 |
# Using Kable for neater output of the table
knitr::kable(select(casualty.injuries, EVTYPE, injuries)[1:10,])
| EVTYPE | injuries |
|---|---|
| tornado | 91346 |
| thunderstorm wind | 9353 |
| flood | 6789 |
| excessive heat | 6525 |
| lightning | 5230 |
| heat | 2100 |
| ice storm | 1975 |
| flash flood | 1777 |
| high wind | 1439 |
| hail | 1361 |
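The group, total, filter, and rank steps behind these two tables can be sketched on a toy data frame. This version uses base R (`aggregate` and `order`) rather than dplyr so that it runs without extra packages; the data and the names `toy`, `totals`, and `top_fatal` are invented for illustration.

```r
# Invented records: one row per event
toy <- data.frame(
  EVTYPE     = c("tornado", "tornado", "heat", "flood", "heat", "hail"),
  FATALITIES = c(3, 2, 4, 1, 0, 0),
  INJURIES   = c(10, 5, 0, 2, 1, 0),
  stringsAsFactors = FALSE
)

# Total fatalities and injuries per event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Keep event types with both fatalities and injuries, as in the analysis
totals <- totals[totals$FATALITIES != 0 & totals$INJURIES != 0, ]

# Rank by fatalities, highest first
top_fatal <- totals[order(-totals$FATALITIES), ]
print(top_fatal)
```

Here "hail" drops out because both of its totals are zero, and "tornado" ranks first with 5 fatalities.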
To analyze the impact of weather events on the economy, the available reports and estimates of property damage and crop damage were used.
In the raw data, property damage is stored in two fields: PROPDMG, the numeric amount, and PROPDMGEXP, a code for the power of ten by which that amount is multiplied (for example, K for thousands, M for millions, and B for billions). Crop damage is stored the same way in CROPDMG and CROPDMGEXP. The first step in the analysis is therefore to convert each pair into a dollar amount for every event.
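As a worked illustration of this encoding, the dollar amount for one record can be written as follows. The symbol $e(\cdot)$ is shorthand introduced here for the exponent implied by the code; it is not a name from the NOAA documentation.

$$
\text{damage} = \text{PROPDMG} \times 10^{\,e(\text{PROPDMGEXP})}, \qquad e(\text{H}) = 2,\quad e(\text{K}) = 3,\quad e(\text{M}) = 6,\quad e(\text{B}) = 9
$$

For example, a record with PROPDMG = 25 and PROPDMGEXP = K represents $25 \times 10^{3} = 25{,}000$ dollars. The same formula applies to CROPDMG and CROPDMGEXP.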
# The purpose of this function is to determine the power of ten based
# on the character in the PROPDMGEXP and CROPDMGEXP columns in the dataset
trans_exp <- function(exp) {
    # Work on the label itself; as.numeric() on a factor returns its level code
    exp <- as.character(exp)
    if (exp %in% c('h', 'H'))
        return(2)
    else if (exp %in% c('k', 'K'))
        return(3)
    else if (exp %in% c('m', 'M'))
        return(6)
    else if (exp %in% c('b', 'B'))
        return(9)
    else if (exp %in% c('', '-', '?', '+'))
        return(0)
    else if (!is.na(suppressWarnings(as.numeric(exp))))
        return(as.numeric(exp))
    else {
        stop("Invalid exponent value.")
    }
}
# Create a column containing the exponent
stormdata$propdmg_exp <- sapply(stormdata$PROPDMGEXP, FUN=trans_exp)
stormdata$cropdmg_exp <- sapply(stormdata$CROPDMGEXP, FUN=trans_exp)
# Compute the dollar amount of the property and crop damage
stormdata$prop_dmg <- stormdata$PROPDMG * (10 ** stormdata$propdmg_exp)
stormdata$crop_dmg <- stormdata$CROPDMG * (10 ** stormdata$cropdmg_exp)
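Calling a function once per row with `sapply` can be slow on a large data frame. A vectorised lookup is one possible alternative; the sketch below is an illustration under stated assumptions, not the method used in this analysis. The name `exp_to_power` is hypothetical, and unlike `trans_exp` it maps unrecognised codes to 0 instead of stopping with an error.

```r
# Hypothetical vectorised translation of exponent codes to powers of ten
exp_to_power <- function(code) {
  code <- tolower(as.character(code))       # labels, not factor level codes
  power <- rep(0, length(code))             # default: '', '-', '?', '+', unknown
  letter_powers <- c(h = 2, k = 3, m = 6, b = 9)
  is_letter <- code %in% names(letter_powers)
  power[is_letter] <- unname(letter_powers[code[is_letter]])
  is_digit <- grepl("^[0-9]$", code)        # a single digit is used as-is
  power[is_digit] <- as.numeric(code[is_digit])
  power
}

# Invented amounts and codes for illustration
amount <- c(25, 2.5, 10, 3)
code   <- c("K", "M", "", "2")
dollars <- amount * 10^exp_to_power(code)
print(dollars)
```

With these inputs the four records come to 25,000, 2,500,000, 10, and 300 dollars respectively.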
# Compute the economic loss by event type
# group by event type
dfgrp <- group_by (stormdata, EVTYPE)
# summarise by event type
economicloss <- summarize(dfgrp,
prop_dmg = sum(prop_dmg),
crop_dmg = sum(crop_dmg))
# Keep only event types with both property damage and crop damage
economicloss <- filter(economicloss, prop_dmg!=0, crop_dmg!=0)
# Sorted on property damage
economicloss.prop_dmg <- arrange(economicloss, desc(prop_dmg))
# Sorted on crop damage
economicloss.crop_dmg <- arrange(economicloss, desc(crop_dmg))
# Using Kable for neater output of the table
knitr::kable(select(economicloss.prop_dmg, EVTYPE, prop_dmg)[1:10,])
| EVTYPE | prop_dmg |
|---|---|
| tornado | 3.163480e+23 |
| thunderstorm wind | 2.645861e+23 |
| flash flood | 1.405882e+23 |
| flood | 8.785298e+22 |
| hail | 6.751067e+22 |
| lightning | 6.028593e+22 |
| high wind | 3.760199e+22 |
| winter storm | 1.311573e+22 |
| heavy snow | 1.214391e+22 |
| wildfire | 8.081400e+21 |
# Using Kable for neater output of the table
knitr::kable(select(economicloss.crop_dmg, EVTYPE, crop_dmg)[1:10,])
| EVTYPE | crop_dmg |
|---|---|
| hail | 5.769940e+12 |
| thunderstorm wind | 1.937181e+12 |
| flash flood | 1.780814e+12 |
| flood | 1.630884e+12 |
| tornado | 9.957467e+11 |
| drought | 2.284111e+11 |
| high wind | 1.844799e+11 |
| heavy rain | 1.047210e+11 |
| frost freeze | 6.154814e+10 |
| tropical storm | 5.293312e+10 |
All data are shown on a logarithmic scale (log10) because the values span several orders of magnitude.
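To illustrate why a log scale helps, the sketch below applies `log10` to three totals taken from the casualty tables above. The names `totals`, `ratio_raw`, and `log_totals` are illustrative only.

```r
# Three totals from the casualty tables, spanning very different magnitudes
totals <- c(tornado_injuries     = 91346,
            tornado_fatalities   = 5633,
            avalanche_fatalities = 224)

# On the raw scale the largest value is about 400 times the smallest
ratio_raw <- max(totals) / min(totals)
print(ratio_raw)

# On the log10 scale the same values fall between roughly 2 and 5
log_totals <- round(log10(totals), 2)
print(log_totals)

# log10(0) is -Inf, so an event type with a zero total cannot be drawn
# as a finite bar and has to be filtered out before plotting
print(log10(0))
```

The trade-off is that bar lengths no longer compare linearly: one unit on the axis is a tenfold difference in the underlying count.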
# create one dataframe with all the data for plotting using facets
fataldata <- casualty.fatalities[1:10,]
fataldata$dangertype <- "Fatalities"
fataldata$number <- fataldata$fatalities
injurdata <- casualty.injuries[1:10,]
injurdata$dangertype <- "Injuries"
injurdata$number <- injurdata$injuries
# Combine fatalities and injuries
dangerdata <- rbind(fataldata,injurdata)
# Retain relevant columns only
dangerdata <- select(dangerdata, EVTYPE, dangertype, number)
# define the plot data
plotdata <- dangerdata
# labels
xlabel <- "Event Type"
ylabel <- "Fatalities & Injuries (Log10 scale)"
title <- "Health consequences of Weather Events"
# creating the plot
ggplot(plotdata, aes(x = reorder(toupper(EVTYPE), number),
y = log10(number),
fill = EVTYPE)) +
geom_bar(stat="identity") +
labs (x= xlabel, y=ylabel) +
labs(title=title)+
facet_wrap(~dangertype)+
theme(legend.position="none")+
coord_flip()
Tornadoes caused by far the most deaths (5,633) and injuries (91,346) of all event types. The next deadliest events are excessive heat (1,903 deaths) and flash floods (978 deaths). After tornadoes, the most injuries were caused by thunderstorm wind (9,353), floods (6,789), and excessive heat (6,525). Excessive heat therefore ranks among the leading causes of both deaths and injuries after tornadoes.
# create one dataframe with all the data for plotting using facets
propdata <- economicloss.prop_dmg[1:10,]
propdata$damagetype <- "Property damage"
propdata$damageamt <- propdata$prop_dmg
cropdata <- economicloss.crop_dmg[1:10,]
cropdata$damagetype <- "Crop damage"
cropdata$damageamt <- cropdata$crop_dmg
# Combine property and crop damage
damagedata <- rbind(propdata,cropdata)
# Retain relevant columns only
damagedata <- select(damagedata, EVTYPE, damagetype, damageamt)
# define the plot data
plotdata <- damagedata
# labels
xlabel <- "Event Type"
ylabel <- "Damage in dollars (Log10 scale)"
title <- "Economic consequences of Weather Events"
# creating the plot
ggplot(plotdata, aes(x = reorder(toupper(EVTYPE), damageamt),
y = log10(damageamt),
fill = EVTYPE)) +
geom_bar(stat="identity") +
labs (x= xlabel, y=ylabel) +
labs(title=title)+
facet_wrap(~damagetype)+
theme(legend.position="none")+
coord_flip()
Tornadoes also caused the largest property losses and substantial crop losses. The greatest crop damage, however, was caused by hail, followed by thunderstorm wind and flash floods. Thunderstorm wind and flash floods rank second and third for both property damage and crop damage.