Devansh Saxena

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

The analysis was performed on Storm Events Database, provided by National Climatic Data Center. The data is from a comma-separated-value file available here. There is also some documentation of the data available here.

The first step is to read the data into a data frame.

data.file <- "stormData.csv.bz2"
if (!file.exists(data.file)) {
    cat("Downloading data file")
    data.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(data.url, destfile=data.file, method="curl")
}
## Downloading data file

Economic Effects of Weather Events

To analyze the impact of weather events on the economy, available property damage and crop damage reportings/estimates were used.

In the raw data, the property damage is represented with two fields, a number PROPDMG in dollars and the exponent PROPDMGEXP. Similarly, the crop damage is represented using two fields, CROPDMG and CROPDMGEXP. The first step in the analysis is to calculate the property and crop damage for each event.

This data set spans a long time period (1950-2011). Because of the increase in population density over this period (which significantly affects all aspects of storm damage), this analysis is restricted to Jan 1st, 2001 and later. Note that there are no incomplete cases in this dataset. We also combine the Event types HURRICANE/TYPHOON and HURRICANE.

storm.data <- read.csv(bzfile(data.file), stringsAsFactors=FALSE,
                       nrows=903000)
# Do this cut first so we can count identified exponent cut losses
storm.data$BGN_DATE <- as.Date(storm.data$BGN_DATE, format="%m/%d/%Y")
desired.cols <- c("BGN_DATE", "FATALITIES", "INJURIES", "EVTYPE",
                 "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")     
storm.data <- subset(storm.data, BGN_DATE >= "2001/01/01", desired.cols)
nrows.storm <- nrow(storm.data)
storm.data$PROPDMGEXP <- toupper(storm.data$PROPDMGEXP)
storm.data$CROPDMGEXP <- toupper(storm.data$CROPDMGEXP)
good.exponents <- c("", "K", "M", "B")
storm.data <- subset(storm.data, PROPDMGEXP %in% good.exponents &
                    CROPDMGEXP %in% good.exponents)
cat(paste("Requiring understood exponents removed",
          nrows.storm - nrow(storm.data), "of", nrows.storm, "rows"))
## Requiring understood exponents removed 1 of 488692 rows
storm.data$EVTYPE <- as.factor(storm.data$EVTYPE)
names(storm.data) <- c("Begin.Date", "Fatalities", "Injuries",
                        "Event.Type", "Property.Damage", 
                        "Property.Damage.Exp",
                        "Crop.Damage", "Crop.Damage.Exp")
cat(paste("Number of incomplete cases:", sum(!complete.cases(storm.data))))
## Number of incomplete cases: 0

Now multiply in the damage exponent scalings (e.g., Billions).

val.prefix <- function(dmg, exponent) {
    if (exponent == "K") {dmg * 1e3}
    else if (exponent == "M") {dmg * 1e6}
    else if (exponent == "B") {dmg * 1e9}
    else dmg
}
storm.data$Property.Damage <- mapply(val.prefix, storm.data$Property.Damage, 
                                    storm.data$Property.Damage.Exp)
storm.data$Crop.Damage <- mapply(val.prefix, storm.data$Crop.Damage,
                                 storm.data$Crop.Damage.Exp)  
storm.data$Property.Damage.Exp <- NULL  # No longer needed
storm.data$Crop.Damage.Exp <- NULL

It is convenient for later to combine HURRICANE/TYPHOON events and HURRICANE events, since the governmental response is similar.

whtyph <- which(storm.data$Event.Type=="HURRICANE/TYPHOON")
storm.data[whtyph, "Event.Type"] <- "HURRICANE"

Inflation correction

Inflation is not included in the damage values. This is not a huge effect between 2001 and 2011, but is worth including because the relative rarity of some major effects means that not correction will introduce significant noise in the estimates. Inflation can be computed using information on the consumer price index, here dowloaded on 2014/10/14 at 11:13pm UTC.

cpi.file <- "CPI.csv"
if (!file.exists(cpi.file)) {
    cpi.url <- "http://research.stlouisfed.org/fred2/data/CPIAUCSL.csv"
    download.file(cpi.url, destfile=cpi.file, method="curl")
}

Convert this to a function that gives the CPI relative to 2001/01/01 using linear interpolation, and apply it to the damages. This turns out to be much faster if we convert the dates to numeric:

Now all of the damages are in Jan 2001 dollars.

Results

Tornadoes cause most number of deaths and injuries among all event types. There are more than 5,000 deaths and more than 10,000 injuries in the last 60 years in US, due to tornadoes. The other event types that are most dangerous with respect to population health are excessive heat and flash floods.

Economic impact of weather events

Start by forming the summary statistics:

library(plyr, quiet=TRUE)
storm.agg <- ddply(storm.data, ~ Event.Type, summarize,
                    mean.Fatalities=mean(Fatalities),
                    total.Fatalities=sum(Fatalities),
                    mean.Injuries=mean(Injuries),
                    total.Injuries=sum(Injuries),
                    mean.Crop=mean(Crop.Damage),
                    total.Crop=sum(Crop.Damage),
                    mean.Property=mean(Property.Damage),
                    total.Property=sum(Property.Damage))

There are too many types of events to easily visualize, so here only the top 5 are shown (i.e., top fatalities, top injuries, etc.). Damages are converted to billions of dollars.

nshow <- 5
top.Fatalities <- storm.agg[order(storm.agg$total.Fatalities, decreasing=T),
                            c("Event.Type", "total.Fatalities")]
top.Fatalities <- head(top.Fatalities, nshow)
top.Injuries <- storm.agg[order(storm.agg$total.Injuries, decreasing=T),
                            c("Event.Type", "total.Injuries")]
top.Injuries <- head(top.Injuries, nshow)
top.Property <- storm.agg[order(storm.agg$total.Property, decreasing=T),
                            c("Event.Type", "total.Property")]
top.Property <- head(top.Property, nshow)
top.Property <- mutate(top.Property, total.Property=total.Property*1e-9)
top.Crop <- storm.agg[order(storm.agg$total.Crop, decreasing=T),
                            c("Event.Type", "total.Crop")]
top.Crop <- head(top.Crop, nshow)
top.Crop <- mutate(top.Crop, total.Crop=total.Crop*1e-9)
require(ggplot2) || stop("Install ggplot2")
## Loading required package: ggplot2
## [1] TRUE
require(grid) || stop("Install ggplot2")
## Loading required package: grid
## [1] TRUE
fat.plot <- ggplot(top.Fatalities, aes(Event.Type, total.Fatalities)) +           
            geom_bar(stat="identity", fill="blue") +
            theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
            labs(y="Total Fatalities",x="Storm Type")
inj.plot <- ggplot(top.Injuries, aes(Event.Type, total.Injuries)) +
            geom_bar(stat="identity", fill="blue") +
            theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
            labs(y="Total Injuries", x="Storm Type")
prop.plot <- ggplot(top.Property, aes(Event.Type, total.Property)) +
            geom_bar(stat="identity", fill="blue") +
            theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
            labs(y="Total Property Damage [Billions of $]", 
                 x="Storm Type")
crop.plot <- ggplot(top.Crop, aes(Event.Type, total.Crop)) +
            geom_bar(stat="identity", fill="blue") +
            theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
            labs(y="Total Crop Damage [Billions of $]", 
                 x="Storm Type")
grid.newpage()
vp <- viewport(layout = grid.layout(2, 4, height=unit(c(1, 6), "null")))
pushViewport(vp)
print(fat.plot, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
print(inj.plot, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))
print(prop.plot, vp = viewport(layout.pos.row = 2, layout.pos.col = 3))
print(crop.plot, vp = viewport(layout.pos.row = 2, layout.pos.col = 4))

grid.text("Top 5 storm types in the US 2001-2011 by total damages", 
          vp = viewport(layout.pos.row = 1, layout.pos.col = 1:4))

As can be seen, Tornadoes cause the largest number of injuries and fatalities, while floods and droughts cause the largest amount of financial damage over the period studied.

Mean damage per event

We can do the same thing by average damages per event. Because of the way that tornadoes are tracked in the NOAA database (when they cross county lines or temporarily lift off, they are counted as a new event) the mean damage per event information is difficult to interpret for tornadoes. Fixing this would be rather difficult, unfortunately, but it is worth keeping in mind.

mean.Fatalities <- storm.agg[order(storm.agg$mean.Fatalities, decreasing=T),
                            c("Event.Type", "mean.Fatalities")]
mean.Fatalities <- head(mean.Fatalities, nshow)
mean.Injuries <- storm.agg[order(storm.agg$mean.Injuries, decreasing=T),
                            c("Event.Type", "mean.Injuries")]
mean.Injuries <- head(mean.Injuries, nshow)
mean.Property <- storm.agg[order(storm.agg$mean.Property, decreasing=T),
                            c("Event.Type", "mean.Property")]
mean.Property <- head(mean.Property, nshow)
mean.Property <- mutate(mean.Property, mean.Property=mean.Property*1e-6)
mean.Crop <- storm.agg[order(storm.agg$mean.Crop, decreasing=T),
                            c("Event.Type", "mean.Crop")]
mean.Crop <- head(mean.Crop, nshow)
mean.Crop <- mutate(mean.Crop, mean.Property=mean.Crop*1e-6)
fat.mplot <- ggplot(mean.Fatalities, aes(Event.Type, mean.Fatalities)) +           
            geom_bar(stat="identity", fill="blue") +
            theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
            labs(y="Mean Fatalities",x="Storm Type")
inj.mplot <- ggplot(mean.Injuries, aes(Event.Type, mean.Injuries)) +
            geom_bar(stat="identity", fill="blue") +
            theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
            labs(y="Mean Injuries", x="Storm Type")
prop.mplot <- ggplot(mean.Property, aes(Event.Type, mean.Property)) +
            geom_bar(stat="identity", fill="blue") +
            theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
            labs(y="Mean Property Damage [Millions of $]", x="Storm Type")
crop.mplot <- ggplot(mean.Crop, aes(Event.Type, mean.Crop)) +
            geom_bar(stat="identity", fill="blue") +
            theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
            labs(y="Mean Crop Damage [Millions of $]", 
                 x="Storm Type")
grid.newpage()
vp <- viewport(layout = grid.layout(2, 4, height=unit(c(1, 6), "null")))
pushViewport(vp)
print(fat.mplot, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
print(inj.mplot, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))
print(prop.mplot, vp = viewport(layout.pos.row = 2, layout.pos.col = 3))
print(crop.mplot, vp = viewport(layout.pos.row = 2, layout.pos.col = 4))

grid.text("Top 5 storm types in the US 2001-2011 by mean damages", 
          vp = viewport(layout.pos.row = 1, layout.pos.col = 1:4))

Property damages are given in logarithmic scale due to large range of values. The data shows that flash floods and thunderstorm winds cost the largest property damages among weather-related natural diseasters. Note that, due to untidy nature of the available data, type flood and flash flood are separate values and should be merged for more accurate data-driven conclusions.The total damages caused by Hurricanes every year will vary significantly can be seen in the following plot:

library(lubridate, quiet=TRUE)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:plyr':
## 
##     here
## The following object is masked from 'package:base':
## 
##     date
hurricane.data <- storm.data[storm.data$Event.Type == "HURRICANE",]
hurricane.data <- mutate(hurricane.data, 
                  Year = as.factor(year(Begin.Date)),
                  Total.Damage = (Property.Damage + Crop.Damage) * 1e-9)
hurricane.agg <- ddply(hurricane.data, ~ Year, summarize,
                       Total.Damage=sum(Total.Damage))
ggplot(hurricane.data, aes(Year, Total.Damage)) +
    geom_bar(stat="identity") + 
    labs(y="Crop and Property Damage [Billions of $]",
         title="Total Hurricane Damage by Year")