Local severe weather events have nation-wide effects

U.S. National Oceanic and Atmospheric Administration (NOAA) publishes database which describes characteristics of major storms and weather events in the United States covering timespan from 1950 to 2011. Present report investigates the nation-wide economic consequences and public health effects of adverse weather events recorded in the database.

Overall,tornadoes are the cause of majority of weather-related deaths and injuries in addition to the significant economic impact. Economically, floods are the biggest cause of loss.

Typically weather events are relatively local even though their impact can be nation wide, depending on the area they occur. Therefore it is noted, that the preparedness for severe weather events should be based on information of local weather conditions.

Preparing the working environment

library(data.table)
library(ggplot2)
library(plyr)
library(maps)

setwd('/Users/majulass/Documents/2014/reproducible-research/peer-assesment-02')
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "storm-dataset.csv.bz2", method = "curl")
weather.df <- read.csv(bzfile("./storm-dataset.csv.bz2"))

Utility functions

# Process multipliers in exponent field.

process.multipliers <- function(multiplier,number){
  switch(tolower(as.character(multiplier)), 
  h={
    100 * number 
  },
  k={
    1000 * number
  },
  m={
   1000000 * number    
  },
  b={
    1000000000 * number
  },
  {
   number
  }
  )


  }

# Function for calculating total impact on public health
calculate.health.cost <- function(injuries,fatalities){
  injuries + fatalities
  }

# Utility function for creating Title Case strings.
# From: http://stackoverflow.com/a/15778614
topropper <- function(x) {
  # Makes Proper Capitalization out of a string or collection of strings. 
  sapply(x, function(strn)
   { s <- strsplit(strn, "\\s")[[1]]
       paste0(toupper(substring(s, 1,1)), 
             tolower(substring(s, 2)),
             collapse=" ")}, USE.NAMES=FALSE)
}

# Utility function for wrapping long titles in ggplot plots.
wrapper <- function(x, ...) paste(strwrap(x, ...), collapse = "\n")

Data processing

The original, unprocessed dataset contains 902297 rows. The dataset is prepared for analysis by renaming the variables following the principles of tidy data.

# Renaming the field based on information found at http://ire.org/nicar/database-library/databases/storm-events/
weather.names<-c(
  'state.code',
  'event.began.date',
  'event.began.time',
  'timezone',
  'county.code',
  'county.names',
  'state',
  'event.type',
  'event.start.point',
  'event.begin.azimuth',
  'event.start.location',
  'event.end.date',
  'event.end.time',
  'event.end.county',
  'event.coundn.end',
  'event.start.point',
  'event.end.azimuth',
  'event.end.location',
  'tornado.path.length',
  'tordano.path.width',
  'tornado.intensity.scale',
  'hail.magnitude',
  'fatalities',
  'injuries',
  'property.damage',
  'property.damage.exp',
  'crop.damage',
  'crop.damage.exp',
  'wfo.office.id',
  'event.us.state',
  'event.zones',
  'event.start.latitude',
  'event.start.longitude',
  'event.end.latitude',
  'event.end.longitude',
  'remarks',
  'ncdc.reference')

names(weather.df)<-weather.names

By observing the dataset visually it can be seen that event though the dataset is published by established govermental organization, the dataset is rather messy; compared to the codebook it contains non-standard and missing values which can distort the result of analyses based on the dataset. Therefore a strategy for cleaning the dataset is needed. For serious academic research or policy-making, the chosen strategy should try to minimize the loss of observations resulted from cleaning. Eg. observations with non-standard or unknown values should be imputed with replacement values.

For the purposes of this assingment more straightforward strategy is sufficient – the ambiquous observations are removed and only the major errors are corrected.

The following table lists top 50 weather event codes in the dataset:

event.type.counts<-as.data.frame(sort(table(weather.df$event.type),decreasing=TRUE))[c(1:50),]
event.type.counts <- as.data.frame(event.type.counts)
kable(as.data.frame(event.type.counts), 
      format="html",
      caption="Table 1: Top 50 weather events in unprocessed NOAA data.")

Table 1: Top 50 weather events in unprocessed NOAA data.
	event.type.counts
HAIL	288661
TSTM WIND	219940
THUNDERSTORM WIND	82563
TORNADO	60652
FLASH FLOOD	54277
FLOOD	25326
THUNDERSTORM WINDS	20843
HIGH WIND	20212
LIGHTNING	15754
HEAVY SNOW	15708
HEAVY RAIN	11723
WINTER STORM	11433
WINTER WEATHER	7026
FUNNEL CLOUD	6839
MARINE TSTM WIND	6175
MARINE THUNDERSTORM WIND	5812
WATERSPOUT	3796
STRONG WIND	3566
URBAN/SML STREAM FLD	3392
WILDFIRE	2761
BLIZZARD	2719
DROUGHT	2488
ICE STORM	2006
EXCESSIVE HEAT	1678
HIGH WINDS	1533
WILD/FOREST FIRE	1457
FROST/FREEZE	1342
DENSE FOG	1293
WINTER WEATHER/MIX	1104
TSTM WIND/HAIL	1028
EXTREME COLD/WIND CHILL	1002
HEAT	767
HIGH SURF	725
TROPICAL STORM	690
FLASH FLOODING	682
EXTREME COLD	655
COASTAL FLOOD	650
LAKE-EFFECT SNOW	636
FLOOD/FLASH FLOOD	624
LANDSLIDE	600
SNOW	587
COLD/WIND CHILL	539
FOG	538
RIP CURRENT	470
MARINE HAIL	442
DUST STORM	427
AVALANCHE	386
WIND	340
RIP CURRENTS	304
STORM SURGE	261

Based on the table, at least non-standard event codes related to thunderstorm winds (TSTM WIND, THUNDERSTORM WINDS) should be mapped to official values given in National Weather Service Instruction (NWSI) 10-1605, hereafter NWSI-10-1065. For the purposes of this assingment the effect of mapping other values is negligible taking into account the total number of observations with valid event type code.

weather.df$event.type <- mapvalues(weather.df$event.type, from = c("TSTM WIND", "THUNDERSTORM WINDS"), to = c("THUNDERSTORM WIND", "THUNDERSTORM WIND"))

Next, the dataset is filtered using permitted weather event types and exponent values.

weather.dt <- data.table(weather.df)

permitted.event.types <- c(
  "ASTRONOMICAL LOW TIDE",
    "AVALANCHE",
    "BLIZZARD",
    "COASTAL FLOOD",
    "COLD/WIND CHILL",
    "DEBRIS FLOW",
    "DENSE FOG",
    "DENSE SMOKE",
    "DROUGHT",
    "DUST DEVIL",
    "DUST STORM",
    "EXCESSIVE HEAT",
    "EXTREME COLD/WIND CHILL",
    "FLASH FLOOD",
    "FLOOD",
    "FROST/FREEZE",
    "FUNNEL CLOUD",
    "FREEZING FOG",
    "HAIL",
    "HEAT",
    "HEAVY RAIN",
    "HEAVY SNOW",
    "HIGH SURF",
    "HIGH WIND",
    "HURRICANE (TYPHOON)",
    "ICE STORM",
    "LAKE-EFFECT SNOW",
    "LAKESHORE FLOOD",
    "LIGHTNING",
    "MARINE HAIL",
    "MARINE HIGH WIND",
    "MARINE STRONG WIND",
    "MARINE THUNDERSTORM WIND",
    "RIP CURRENT",
    "SEICHE",
    "SLEET",
    "STORM SURGE/TIDE",
    "STRONG WIND",
    "THUNDERSTORM WIND",
    "TORNADO",
    "TROPICAL DEPRESSION",
    "TROPICAL STORM",
    "TSUNAMI",
    "VOLCANIC ASH",
    "WATERSPOUT",
    "WILDFIRE",
    "WINTER STORM",
    "WINTER WEATHER")

In the original data, the dollar impact values of weather events are given in a format in which the numeric value is given separatery of the exponent of the number. In the NWSI-10-1065 the permitted exponents are:

H (hundreds)
K (thousands)
M (millions)
B (billions)

Only observations with permitted exponent values are included in the analysis, to remove potentially ambiquous observations.

multipliers=c('h','k','m','b','')
weather.dt$property.damage.exp <- tolower(weather.dt$property.damage.exp)
weather.dt$crop.damage.exp <- tolower(weather.dt$crop.damage.exp)
weather.dt.subset <- weather.dt[weather.dt$event.type %in% permitted.event.types,]
weather.dt.subset.subset <- weather.dt.subset[weather.dt.subset$property.damage.exp %in% multipliers & weather.dt.subset$crop.damage.exp %in% multipliers ,]

The original dataset had 26239 observations with ambiquous weather event type and 341 observations with and exponent value with unknown meaning. When rows with ambiquous values are removed, the resulting final dataset used in the analysis contains 875737 rows.

In the last data processing step, the exponent values are transformed to actual dollar amounts. After the last processing step, the dataset is ready for analysis, in which the following variables are used:

property.damage.dollars – property damage in dollars (derived variable)
crop.damage.dollars – crop damage in dollars (derived variable)
event.type – type of the weather event, as described in NWSI-10-1065 codebook.
injuries – number of injuries caused by weather event
fatalities – number of deaths caused by weather event

In later steps, derived variable state.name is included to the analysis to enable geographic investigation of the data.

weather.dt.subset.subset$property.damage.dollars <- mapply(
  process.multipliers, 
  weather.dt.subset.subset$property.damage.exp,
  weather.dt.subset.subset$property.damage)

weather.dt.subset.subset$crop.damage.dollars <- mapply(
  process.multipliers,
  weather.dt.subset.subset$crop.damage.exp,
  weather.dt.subset.subset$crop.damage)

weather.dt.subset.subset$total.damage.dollars <- weather.dt.subset.subset$property.damage.dollars + 
  weather.dt.subset.subset$crop.damage.dollars

weather.dt.subset.subset$population.health.cost <-mapply(
  calculate.health.cost,
  weather.dt.subset.subset$injuries,
  weather.dt.subset.subset$fatalities)

Calculating the total impact to the US economy.

# Prepare dataset for summing over event type

setkey(weather.dt.subset.subset, event.type)

## Group damages by event type, US wide.

# Damages to economy

total.us.crop.damages <- weather.dt.subset.subset[,sum(crop.damage.dollars),by = list(event.type)]
total.us.crop.damages$damage.type <- 'Crop'

total.us.property.damages <- weather.dt.subset.subset[,sum(property.damage.dollars),by = list(event.type)]
total.us.property.damages$damage.type <- 'Property'

total.us.economic.damages <- rbind(total.us.crop.damages,total.us.property.damages)
total.us.economic.damages <- total.us.economic.damages[order(total.us.economic.damages$event.type),]
total.us.economic.damages$event.name <- lapply(total.us.economic.damages$event.type,as.character)
total.us.economic.damages$event.name <- topropper(total.us.economic.damages$event.name)

Calculating the total damage to public health.

# Damages to public health

total.us.fatalities <- weather.dt.subset.subset[,sum(fatalities),by = list(event.type)]
total.us.fatalities$damage.type <- 'Fatalities'

total.us.injuries <- weather.dt.subset.subset[,sum(injuries),by = list(event.type)]
total.us.injuries$damage.type <- 'Injuries'

total.us.public.health.effect <- rbind(total.us.fatalities,total.us.injuries)

total.us.public.health.effect <- total.us.public.health.effect[order(total.us.public.health.effect$event.type),]
total.us.public.health.effect$event.name <- lapply(total.us.public.health.effect$event.type,as.character)
total.us.public.health.effect$event.name <- topropper(total.us.public.health.effect$event.name)

Results

Nation-wide effect on public health

Overall, tornadoes are the cause of majority of weather-related deaths and injuries (Figure 1). Other significant causes are thunderstorm winds, flood and extensive heat, but in relation to the effect of tornadoes, their scale is minuscule.

dollar.impact.labels<- c('<1M','50B','100B')
dollar.impact.breaks<- c(1000000,50000000000,100000000000)


ggplot(
    data = total.us.public.health.effect,
    aes(
        x = event.name,
        y = V1,
        group = damage.type,
        fill = damage.type)) +       
    geom_bar(stat="identity") + 
    xlab("Event") +
    ylab("Number of casualties") +
    ggtitle(wrapper("Figure 1. Casualties of weather-related events from 1950 to 2011.", width=80)) +
    guides(fill=guide_legend(title="Category of casualty")) +
    coord_flip() +
    theme_minimal()

plot of chunk total-public-health-effect

Total impact on US economy

Nation-wide, floods and tornadoes have the greatest impact on the economy (Figure 2). When damage to the agriculture is inspected separately, droughts are the number one cause of economic loss.

ggplot(
    data = total.us.economic.damages,
    aes(
        x = event.name,
        y = V1,
        group = damage.type,
        fill = damage.type)) +       
    geom_bar(stat="identity") + 
    scale_y_continuous(labels=dollar.impact.labels, breaks=dollar.impact.breaks) + 
    xlab("Event") +
    ylab("Economic impact") +
    ggtitle(wrapper("Figure 2. Total economic damage of weather-related events from 1950 to 2011.", width=80)) +
    guides(fill=guide_legend(title="Category of damage")) +
    coord_flip() +
    theme_minimal()

plot of chunk total-economic-damage

Geographic distribution of weather event impact

Even though the nation-wide aggregation of the weather event statistics gives a good overview of the total economic and public health impact of weather events, it should be noted that for preparing for severe weather events, more localized information is needed.

As the geographic area of the US is vast, the possible weather conditions vary highly – there's no such thing as the weather of the USA (Tables 2 and 3). Nation-wide aggregation also hides the fact that the weather events can be extremely local; for example, floods typically affect relatively small areas but can cause serious damage. Also, tornadoes which are the major cause of deaths and injuries related to weather events nation-wide, are occuring typically in particular geographical area, known as Tornado Alley.

## Derive state names from the more detailed area descriptions.

weather.dt.subset.subset$state.name <- NA
weather.dt.subset.subset$state.name <- lapply(weather.dt.subset.subset$event.us.state,as.character)
weather.dt.subset.subset$state.name <- sub(',.*','',weather.dt.subset.subset$state.name)


weather.dt.subset.geographic <- weather.dt.subset.subset[weather.dt.subset.subset$state.name !="",]


setkey(weather.dt.subset.subset, event.type, event.us.state)

## State-part level damages

area.property.damages<-weather.dt.subset.geographic[,sum(total.damage.dollars),by = list(event.us.state,event.type,state.name)]
area.property.damages <- area.property.damages[area.property.damages$V1>1000000,]
area.property.damages<-area.property.damages[order(area.property.damages$event.us.state, -area.property.damages$V1), ]

# Detailed look on state level
new.hampshire.property.damages <- area.property.damages[area.property.damages$state.name == "NEW HAMPSHIRE",-3, with=FALSE]
new.hampshire.property.damages$event.name <- lapply(new.hampshire.property.damages$event.type,as.character)
new.hampshire.property.damages$event.name <- topropper(new.hampshire.property.damages$event.name)
new.hampshire.property.damages <- new.hampshire.property.damages[,-2, with=FALSE]
new.hampshire.property.damages <- new.hampshire.property.damages[,c(1,3,2),with=FALSE]

montana.property.damages <- area.property.damages[area.property.damages$state.name == "MONTANA",-3, with=FALSE]
montana.property.damages$event.name <- lapply(montana.property.damages$event.type,as.character)
montana.property.damages$event.name <- topropper(montana.property.damages$event.name)
montana.property.damages <- montana.property.damages[,-2, with=FALSE]
montana.property.damages <- montana.property.damages[,c(1,3,2),with=FALSE]

setnames(new.hampshire.property.damages,
         c('event.us.state','event.name','V1'),
         c('US state','Event','Economic impact (in USD)')
         )
setnames(montana.property.damages,
         c('event.us.state','event.name','V1'),
         c('US state','Event','Economic impact (in USD)')
         )

kable(montana.property.damages,
      format = "html",
      caption="Table 2: Economic effect of weather-related events in Montana")

Table 2: Economic effect of weather-related events in Montana
US state	Event	Economic impact (in USD)
MONTANA, Central	Hail	60056600
MONTANA, Central	Tornado	4301000
MONTANA, Central	Flood	3218400
MONTANA, Central	Winter Storm	3215000
MONTANA, Central	High Wind	2165000
MONTANA, East	Hail	43283100
MONTANA, East	Thunderstorm Wind	22990900
MONTANA, East	Wildfire	4075500
MONTANA, East	Flash Flood	3447000
MONTANA, East	Winter Storm	3310000
MONTANA, East	Tornado	1312000
MONTANA, South	Tornado	63094000
MONTANA, South	Flood	20430000
MONTANA, South	Hail	16606000
MONTANA, South	Flash Flood	4030000
MONTANA, South	Winter Storm	3000000
MONTANA, South	Wildfire	1700000
MONTANA, South	Blizzard	1500000
MONTANA, South	Heavy Snow	1214000
MONTANA, West	Flash Flood	7048600
MONTANA, West	Flood	3061000
MONTANA, West	Thunderstorm Wind	2864000
MONTANA, West	Hail	2206000
MONTANA, West	High Wind	1252800
MONTANA, West	Heavy Snow	1042000

kable(new.hampshire.property.damages,
      format = "html",
      caption="Table 3: Economic effect of weather-related events in New Hampshire")

Table 3: Economic effect of weather-related events in New Hampshire
US state	Event	Economic impact (in USD)
NEW HAMPSHIRE, North and Central	Flood	57595270
NEW HAMPSHIRE, North and Central	Ice Storm	27629000
NEW HAMPSHIRE, North and Central	Flash Flood	21868000
NEW HAMPSHIRE, North and Central	Lightning	8895000
NEW HAMPSHIRE, North and Central	Coastal Flood	5304500
NEW HAMPSHIRE, Southern	Ice Storm	37305000
NEW HAMPSHIRE, Southern	Flash Flood	14975000
NEW HAMPSHIRE, Southern	Heavy Snow	7433000
NEW HAMPSHIRE, Southern	High Wind	4645000
NEW HAMPSHIRE, Southern	Flood	3918000
NEW HAMPSHIRE, Southern	Thunderstorm Wind	1551000
NEW HAMPSHIRE, Southern	Lightning	1131500

## Group health costs damages by state, statewide
statewide.area.health.cost <- weather.dt.subset.subset[,sum(population.health.cost),by = list(state.name,event.type)]
statewide.area.health.cost <- statewide.area.health.cost[statewide.area.health.cost$V1>10,]
statewide.area.health.cost <- statewide.area.health.cost[order(statewide.area.health.cost$state.name, -statewide.area.health.cost$V1), ]
statewide.area.health.cost$cost.breaks<-cut(statewide.area.health.cost$V1, 
                      breaks=c(0, 50, 250, 500, 1000, 2000, 10000), 
                      labels=c("<50", "50-250", "250-500", "500-1000", "1000-2000", "2000<"),
                      include.lowest=TRUE)


tornado.health.cost.state <- weather.dt.subset.subset[weather.dt.subset.subset$event.type=="TORNADO",sum(population.health.cost),by = list(state.name)]

tornado.health.cost.state$cost.breaks<-cut(tornado.health.cost.state$V1, 
                      breaks=c(0, 50, 250, 500, 1000, 2000, 10000), 
                      labels=c("<50", "50-250", "250-500", "500-1000", "1000-2000", "2000<"),
                      include.lowest=TRUE)


setnames(tornado.health.cost.state, 'state.name','region')
tornado.health.cost.state$region<-tolower(tornado.health.cost.state$region)

all_states <- map_data("state")
map.state <- data.table(map_data('state'))
setkey(map.state,region)


tornado.health.map.df <- merge(map.state,tornado.health.cost.state, by="region")
ggplot(tornado.health.map.df, aes(x=long, y=lat, group=group, fill=cost.breaks)) + 
  geom_polygon() +
  coord_map() +
  scale_fill_brewer(type="seq",palette="Reds") +
  labs(
    fill = "Number of casualties", 
    title = "Figure 3. Geographic distribution of tornado casualties from 1950 to 2011",
    x="",
    y="") + 
  scale_y_continuous(breaks=c()) +
  scale_x_continuous(breaks=c()) +
  theme(panel.border =  element_blank()) +
  theme_minimal()

plot of chunk tornado-geographic-area-impact