Impact Study: Weather Related Events Across the U.S.

The following data analysis addresses the two questions:

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

Synopsis

This report is intended as an aid for municipal managers and other government officials who need to prioritize resources for weather-related events. Though we make no specific recommendations, we use the NOAA Storm Database to identify the events that have the largest impact on health and the economy. Using simple graphs, tables and summaries, the results point to a clear primary source of concern for each question. The report is written to be reproducible: it contains all the code used in the analysis along with a description of the reasoning behind each step. The results show that the events most harmful to health are primarily wind related, with tornadoes far ahead of every other event. The events with the greatest economic consequences, by contrast, are concentrated in coastal areas and are primarily water related.

Data Processing

Data Source

The data comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. It can be found at the following URL: Data

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

How the NOAA Data is Loaded

After downloading the data to the working directory for R, we use a single command to load and parse it from the raw data file repdata-data-StormData.csv.bz2.

# load the data
stormData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
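One detail worth knowing when re-running the load step: since R 4.0.0, read.csv no longer converts strings to factors by default, so columns such as EVTYPE will come back as character unless stringsAsFactors = TRUE is passed. The sketch below illustrates this with a small made-up data frame (the toy values and the temporary file are assumptions for illustration, not part of the NOAA data) that is written to a bzip2-compressed CSV and read back the same way the report reads the real file.

```r
# Toy stand-in for the storm file; the values are invented for illustration.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO"),
  FATALITIES = c(2, 1, 0)
)

# Write the toy data as a bzip2-compressed CSV in a temporary location.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through bzfile(), asking for factors explicitly so the
# result matches the Factor columns produced by older versions of R.
loaded <- read.csv(bzfile(path), stringsAsFactors = TRUE)
str(loaded)
```

The analysis itself works with either character or factor event names; the option only matters when comparing str() output against the listings in this report.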

Processing the Data

We process the data with a different technique for each question; both techniques start from the stormData variable.

The loaded data frame, stormData, contains 902,297 observations of 37 variables.

Impact of Severe Weather on Health of United States Population

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question we use the FATALITIES and INJURIES columns. We could simply have summed the two (and ultimately we want to see them together), but we keep them separate and use a stacked barplot so that each column's contribution is visible for each of the most harmful events.

Health Data Transformation

The first step is to aggregate the data into a new data.frame, which we'll call health. We start by totaling the FATALITIES column by event type (EVTYPE) and set the column names to event and fatalaties (the spelling used for this column throughout the code). Finally, we append a third column, injuries, which holds the INJURIES totals by event.

# Health Impact - Data Processing
health <- aggregate(c(stormData$FATALITIES), by = list(stormData$EVTYPE), "sum")
colnames(health) <- c("event", "fatalaties")
health <- cbind(health, injuries = aggregate(c(stormData$INJURIES), by = list(stormData$EVTYPE), 
    "sum")$x)
str(health)

## 'data.frame':    985 obs. of  3 variables:
##  $ event     : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ fatalaties: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ injuries  : num  0 0 0 0 0 0 0 0 0 0 ...

head(health)

##                   event fatalaties injuries
## 1    HIGH SURF ADVISORY          0        0
## 2         COASTAL FLOOD          0        0
## 3           FLASH FLOOD          0        0
## 4             LIGHTNING          0        0
## 5             TSTM WIND          0        0
## 6       TSTM WIND (G45)          0        0
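As a possible alternative to the two aggregate calls joined with cbind, the formula interface of aggregate can total both columns in a single call. The sketch below uses a small made-up data frame (toy and health_toy are hypothetical names, and the values are invented) and keeps the report's column names. Note that the formula interface drops rows with missing values by default, which the approach above does not.

```r
# Toy data with the same column names as the storm data; values are invented.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES = c(10, 0, 5, 0)
)

# Sum both columns per event type in one call.
health_toy <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rename to match the column names used in this report.
names(health_toy) <- c("event", "fatalaties", "injuries")
health_toy
```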

The full result is a large list (985 observations), which we narrow down by dropping every row where both columns are 0. The row names are not helpful at this point, so they are removed.

# drop values that are just zero
health <- health[health$fatalaties > 0 | health$injuries > 0, ]
rownames(health) <- NULL
head(health)

##          event fatalaties injuries
## 1     AVALANCE          1        0
## 2    AVALANCHE        224      170
## 3    BLACK ICE          1       24
## 4     BLIZZARD        101      805
## 5 blowing snow          1        1
## 6 BLOWING SNOW          1       13
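The listing above shows that the same event can appear under several labels: "blowing snow" and "BLOWING SNOW" differ only in case, and the first factor level in the earlier output starts with spaces. This report does not merge such labels. A minimal sketch of one way to do so before aggregating, using a few labels taken from the output above, is to trim whitespace and upper-case the names (raw_events and clean_events are hypothetical names):

```r
# A few labels as they appear in the output above.
raw_events <- c("   HIGH SURF ADVISORY", "blowing snow", "BLOWING SNOW", "AVALANCE")

# Strip leading/trailing whitespace and convert to upper case.
clean_events <- toupper(trimws(raw_events))

# Count how often each cleaned label occurs.
table(clean_events)
```

This merges the two blowing snow labels, but it does not fix misspellings such as AVALANCE; those would need an explicit mapping.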

The next step is to extract just the most important events. The simplest way is to sort by fatalaties in decreasing order, using injuries to break ties, and keep the top six rows (the default for head).

health <- head(health[order(health$fatalaties, health$injuries, decreasing = TRUE), ])
health

##              event fatalaties injuries
## 184        TORNADO       5633    91346
## 32  EXCESSIVE HEAT       1903     6525
## 42     FLASH FLOOD        978     1777
## 69            HEAT        937     2100
## 123      LIGHTNING        816     5230
## 191      TSTM WIND        504     6957
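Sorting by fatalities first is one choice among several. As an illustration only, the sketch below re-ranks the six rows shown above by combined fatalities and injuries (top6 and ranked are hypothetical names). It re-orders these six events only; it does not establish which six events would lead if the full dataset were ranked by the combined total.

```r
# The six rows from the output above, typed in by hand.
top6 <- data.frame(
  event = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING", "TSTM WIND"),
  fatalaties = c(5633, 1903, 978, 937, 816, 504),
  injuries = c(91346, 6525, 1777, 2100, 5230, 6957)
)

# Combined count of people affected per event.
top6$total <- top6$fatalaties + top6$injuries

# Re-rank by the combined total, largest first.
ranked <- top6[order(top6$total, decreasing = TRUE), c("event", "total")]
ranked
```

Under this ordering TSTM WIND and LIGHTNING move ahead of HEAT and FLASH FLOOD, while TORNADO and EXCESSIVE HEAT keep the first two places.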

The plotting function requires a matrix as its input, so we create a matrix from the health data and set the event column as its row names.

plotHealth <- as.matrix(health[, c("fatalaties", "injuries")])

rownames(plotHealth) <- health$event

barplot draws one bar per column, so the matrix has to be transposed to get one bar per event; we do this as part of the plotting call. The transposed matrix looks like this:

t(plotHealth)

##            TORNADO EXCESSIVE HEAT FLASH FLOOD HEAT LIGHTNING TSTM WIND
## fatalaties    5633           1903         978  937       816       504
## injuries     91346           6525        1777 2100      5230      6957
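To see why the transpose matters, the following sketch builds a two-event version of the matrix from the values above and passes it to barplot on a null graphics device, so it runs without opening a window. The names m and mids are hypothetical. barplot returns the bar midpoints, and there is one per column of its input.

```r
# Two of the events from the table above; rows are events, columns are measures.
m <- matrix(
  c(5633, 1903, 91346, 6525),
  nrow = 2,
  dimnames = list(c("TORNADO", "EXCESSIVE HEAT"), c("fatalaties", "injuries"))
)

# After transposing, each column is an event and each row is a stacked segment.
colnames(t(m))

pdf(NULL)  # null device: nothing is written to disk
mids <- barplot(t(m), col = 1:2, legend.text = TRUE)
invisible(dev.off())

# One midpoint per bar, i.e. one bar per event.
length(mids)
```

Without the transpose, barplot would instead draw one bar for fatalaties and one for injuries, each stacked by event.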

A stacked bar chart shows how the number of people affected differs across the top events.

par(oma = c(4, 1, 0, 1))
barplot(height = t(plotHealth), width = 1, col = 1:2, legend.text = c("fatalities", 
    "injuries"), main = "Events Most Harmful with Respect to Population Health", 
    ylab = "Affected People", las = 3)

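[Figure: stacked bar chart titled "Events Most Harmful with Respect to Population Health"; y-axis "Affected People", one bar per event]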

CAPTION

The black portion of each bar represents the fatalities for that event, and the red portion represents the injuries. The graph is meant to show the relative size of the health impact; the table above gives the absolute values if needed.

RESULTS

Which types of events are most harmful to population health

Nothing comes close to tornadoes in terms of harm to public health: they account for 5,633 fatalities and 91,346 injuries in our dataset. Though thunderstorm wind (TSTM WIND) causes more injuries than excessive heat (6,957 vs 6,525), excessive heat causes far more deaths (1,903 vs 504). The six worst weather-related events, in order of fatalities, are: Tornado, Excessive Heat, Flash Flood, Heat, Lightning and TSTM Wind.

Events that have the Greatest Economic Consequences

Question 2: Across the United States, which types of events have the greatest economic consequences?

To answer this question we have to review the data in some detail. The economic data contains many errors, which can be quite misleading. Since we are looking only for the greatest economic consequences, we decided to set aside the smaller details of economic impact and concentrate on the events with the very largest damage figures.

Reviewing the data reveals that very few records report property damage in the billions, that is, with a PROPDMGEXP value of "B".

length(stormData$PROPDMGEXP[stormData$PROPDMGEXP == "B"])

## [1] 40

Since it would take 1,000 entries of $1 million to match a single $1 billion entry, we should be able to set aside all other data in view of these very large impacts. Listing the unique event types among these records confirms our suspicions.

unique(stormData$EVTYPE[stormData$PROPDMGEXP == "B"])

##  [1] WINTER STORM               HURRICANE OPAL/HIGH WINDS 
##  [3] HURRICANE OPAL             TORNADOES, TSTM WIND, HAIL
##  [5] RIVER FLOOD                HEAVY RAIN/SEVERE WEATHER 
##  [7] SEVERE THUNDERSTORM        FLOOD                     
##  [9] HURRICANE                  WILD/FOREST FIRE          
## [11] TROPICAL STORM             FLASH FLOOD               
## [13] WILDFIRE                   HURRICANE/TYPHOON         
## [15] HIGH WIND                  STORM SURGE               
## [17] STORM SURGE/TIDE           HAIL                      
## [19] TORNADO                   
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

Economic Data Transformation

These are the types of events one would associate with the majority of the economic impact from weather-related events. Continuing in this vein, we create an economic dataset that counts the billion-dollar records for each event type, keeping only event types with more than one such record.

economicImpact <- table(stormData$EVTYPE[stormData$PROPDMGEXP == "B"])
sort(economicImpact[economicImpact > 1], decreasing = TRUE)

## 
## HURRICANE/TYPHOON             FLOOD         HURRICANE           TORNADO 
##                12                 5                 3                 3 
##    HURRICANE OPAL       STORM SURGE 
##                 2                 2
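Several of the labels in the table above describe the same kind of storm. As a rough sketch, the counts can be regrouped so that every label containing "HURRICANE" is treated as one family. The counts are typed in from the table above, and the names billion_counts, family and grouped are hypothetical. Labels with only a single billion-dollar record, such as HURRICANE OPAL/HIGH WINDS, are not in that table and so are left out here.

```r
# Counts copied from the table above (event types with more than one record).
billion_counts <- c(
  "HURRICANE/TYPHOON" = 12, "FLOOD" = 5, "HURRICANE" = 3,
  "TORNADO" = 3, "HURRICANE OPAL" = 2, "STORM SURGE" = 2
)

# Map every hurricane label to one family; leave other labels unchanged.
family <- ifelse(
  grepl("HURRICANE", names(billion_counts)),
  "HURRICANE (all labels)",
  names(billion_counts)
)

# Sum the counts within each family and sort, largest first.
grouped <- sort(vapply(split(billion_counts, family), sum, numeric(1)), decreasing = TRUE)
grouped
```

Grouped this way, the hurricane labels together account for 17 of the records in the table, which widens their lead over FLOOD.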

Plotting these counts shows which event types most often cause damage in the billions.

plotData <- economicImpact[economicImpact > 1]
barplot(rev(sort(plotData)), legend.text = names(rev(sort(plotData))), col = 1:6, 
    axisnames = FALSE, ylab = "Number of Billion Dollar Events", main = "Events with the Greatest Economic Consequences")

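[Figure: bar chart titled "Events with the Greatest Economic Consequences"; y-axis "Number of Billion Dollar Events", one colored bar per event type]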

CAPTION

Each bar's color matches the event type shown in the same color in the legend. The y-axis gives the number of billion-dollar events, so the bar heights show how often each event type reaches that scale.

RESULTS

Which types of events have the greatest economic consequences

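Hurricanes (and typhoons) plainly have the largest impact on property: HURRICANE/TYPHOON alone accounts for 12 of the billion-dollar records, with a further 3 under HURRICANE and 2 under HURRICANE OPAL. Flooding is next with 5, followed by tornadoes with 3.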
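The ranking above counts billion-dollar records, so a $1 billion record and a much larger one weigh the same. A possible refinement, sketched below under stated assumptions, is to convert each record to dollars and total by event type. The sketch assumes the damage magnitude is stored in a column named PROPDMG alongside PROPDMGEXP, and that the exponent codes K, M and B stand for thousands, millions and billions. The toy rows are invented and are not NOAA figures.

```r
# Invented rows for illustration; PROPDMG is an assumed column name.
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "TORNADO", "HURRICANE"),
  PROPDMG = c(2.5, 500, 1.2, 3),
  PROPDMGEXP = c("B", "M", "B", "K"),
  stringsAsFactors = FALSE
)

# Assumed meaning of the exponent codes; any other code would yield NA.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Dollar value of each record.
toy$dollars <- toy$PROPDMG * unname(multiplier[toy$PROPDMGEXP])

# Total dollars per event type, largest first.
totals <- aggregate(dollars ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(totals$dollars, decreasing = TRUE), ]
totals
```

Given the data errors mentioned earlier, any totals produced this way on the real data would need the same caution as the counts.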