Analysis of the Health and Economic Impacts of Severe Weather Events in the U.S., from data housed in the NOAA Storm Database (1950-2011).

Synopsis

In the following report, we will summarize and chart data contained in the NOAA (U.S. National Oceanic and Atmospheric Administration) Storm Database pertaining to severe weather events occurring domestically from 1950-2011. This analysis will specifically focus on the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Summaries and charts will be used to underscore which events result in the greatest negative impact, thereby signalling to authorities how best to prioritize resources for maximum harm reduction. We found that health was most greatly impacted by tornadoes, while floods had the most deleterious economic consequences.

Data Processing

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb] https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf National Climatic Data Center Storm Events FAQ https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Load necessary libraries.

# Load necessary libraries
library(dplyr)
library(ggplot2)
library(knitr)
library(gridExtra)
library(scales)
library(reshape2)

Load and read data into table to be analyzed.

# Download file
if (!file.exists("StormData.csv.bz2")) {
  fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
  download.file(fileURL, destfile='StormData.csv.bz2')
}
# Read into table - only read file if not in cache
if(!file.exists("stormData.Rdata")) {
     stormData <- read.csv(bzfile('StormData.csv.bz2'),header=TRUE, stringsAsFactors = FALSE)
     save(stormData, file = "stormData.Rdata")
} else {
     load("stormData.Rdata")
}

View structure / summaries of the data. Select relevant fields for analysis.

Note: We will be focusing on EVTYPE (type of weather event), FATALITIES (count of fatalities), INJURIES (count of injuries), PROPDMG (a numerical measure of property damage), PROPDMGEXP (an exponent used as a multiplication factor for PROPDMG), CROPDMG (a numerical measure of crop damage), and CROPDMGEXP (an exponent used as a multiplication factor for CROPDMG).

# View structure of the data
str(stormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
# Select relevant fields for analysis
stormAnalysis <- select(stormData, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
str(stormAnalysis)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...

Clean data and prepare for analysis.

The PROPDMGEXP and CROPDMGEXP fields are populated by character values meant to denote numerical values (ex. “K” = 1000). Some values are not applicable and stripped from the data for the purpose of the downstream analysis (ex. “+”, “5”, “6”, “0”).

Once these exponent fields are cleaned, they are assigned the appropriate numerical values. Finally, they are multiplied with their corresponding PROPDMG and CROPDMG columns to arrive at the true total amount.

# Convert exponential Property Damage and Crop Damage to number values
## Find unique values for PROPDMGEXP 
unique(stormAnalysis$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
## Assign based on number value. Using 1 for non-assignables, and 0 for null.
stormAnalysis2 <- stormAnalysis
stormAnalysis_PROP <- mutate(stormAnalysis2, PROPDMGEXP_NUM = ifelse(PROPDMGEXP == "K" ,1000,
                                       ifelse(PROPDMGEXP == "M" | PROPDMGEXP == "m" ,1000000,
                                       ifelse(PROPDMGEXP == "B",1000000000,
                                       ifelse(PROPDMGEXP == "H" | PROPDMGEXP == "h", 100,
                                       ifelse(PROPDMGEXP == "", 0, 1))
                                                            ))))
## Multiply the new numeric exponent field by the property damage to find true value. Add to dataframe as PROP_DAMAGE
stormAnalysis_PROP$PROP_DAMAGE <- stormAnalysis_PROP$PROPDMGEXP_NUM * stormAnalysis_PROP$PROPDMG
rm(stormAnalysis2)

## Follow same steps for Crop Damage...

## Find unique values for CROPDMGEXP 
unique(stormAnalysis$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
## Assign based on number value. Using 1 for non-assignables, and 0 for null.
stormAnalysis2 <- stormAnalysis
stormAnalysis_CROP <- mutate(stormAnalysis2, CROPDMGEXP_NUM = ifelse(CROPDMGEXP == "K" | CROPDMGEXP == "k" ,1000,
                                                              ifelse(CROPDMGEXP == "M" | CROPDMGEXP == "m" ,1000000,
                                                              ifelse(CROPDMGEXP == "B",1000000000,
                                                              ifelse(CROPDMGEXP == "0", 0,
                                                              ifelse(CROPDMGEXP == "", 0, 1))
                                                                            ))))
## Multiply the new numeric exponent field by the crop damage to find true value. Add to dataframe as CROP_DAMAGE
stormAnalysis_CROP$CROP_DAMAGE <- stormAnalysis_CROP$CROPDMGEXP_NUM * stormAnalysis_CROP$CROPDMG
rm(stormAnalysis2)

Analysis

Top 10 event types impacting population health

Displaying the top 10 weather event types, ordered by count of fatalities.

Tornadoes are at the top of the list, followed by heat related events and flash floods.

FATALITY <- stormAnalysis %>% 
              group_by(EVTYPE) %>%
                summarise(total = sum(FATALITIES)) %>%
                  arrange(desc(total))
FATALITY[1:10,]
## Source: local data frame [10 x 2]
## 
##            EVTYPE total
##             (chr) (dbl)
## 1         TORNADO  5633
## 2  EXCESSIVE HEAT  1903
## 3     FLASH FLOOD   978
## 4            HEAT   937
## 5       LIGHTNING   816
## 6       TSTM WIND   504
## 7           FLOOD   470
## 8     RIP CURRENT   368
## 9       HIGH WIND   248
## 10      AVALANCHE   224

Displaying the top 10 weather event types, ordered by count of injuries.

Again, Tornadoes are at the top, followed by wind, floods and excessive heat.

INJURY <- stormAnalysis %>% 
            group_by(EVTYPE) %>%
              summarise(total = sum(INJURIES)) %>%
                arrange(desc(total))
INJURY[1:10,]  
## Source: local data frame [10 x 2]
## 
##               EVTYPE total
##                (chr) (dbl)
## 1            TORNADO 91346
## 2          TSTM WIND  6957
## 3              FLOOD  6789
## 4     EXCESSIVE HEAT  6525
## 5          LIGHTNING  5230
## 6               HEAT  2100
## 7          ICE STORM  1975
## 8        FLASH FLOOD  1777
## 9  THUNDERSTORM WIND  1488
## 10              HAIL  1361

Results

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

We observe from the following grid-chart that tornadoes are by far the leading cause of both fatality and injury from the sampled data. The charts display the top ten event types in diminishing order of frequency.

## Display data in side-by-side bar chart form.

### Redefining tables for chart (choosing top 10 and sorting the EVTYPE)
FATALITY <- FATALITY[1:10,]
FATALITY$EVTYPE <- factor(FATALITY$EVTYPE, levels = FATALITY$EVTYPE[order(FATALITY$total)])
INJURY <- INJURY[1:10,]  
INJURY$EVTYPE <- factor(INJURY$EVTYPE, levels = INJURY$EVTYPE[order(INJURY$total)])

g1 <- ggplot(FATALITY, aes(x = EVTYPE, y = total)) +
              geom_bar(stat = "identity", fill = "green") +
              scale_y_continuous() +
              coord_flip() +
              labs(title = "Top 10 Fatality", x = "Weather Event", y = "Count of Fatalities")

g2 <- ggplot(INJURY, aes(x = EVTYPE, y = total)) +
              geom_bar(stat = "identity", fill = "blue") +
              scale_y_continuous() +
              coord_flip() +
              labs(title = "Top 10 Injury", x = "Weather Event", y = "Count of Injuries") 

grid.arrange(g1, g2, ncol = 2, top = "Top 10 Harmful Events to Population Health: Fatalities / Injuries")


2. Across the United States, which types of events have the greatest economic consequences?

First, let’s examine a summary of the top ten events causing property damage. Floods, Hurricane/Typhoons and Tornadoes are the top three causes, with floods as the dominant event.

## Top 10 event types impacting property and crop damage

## Top 10 by Property
### Remove all but required fields
PROPERTY_CHART <- select(stormAnalysis_PROP, EVTYPE, PROP_DAMAGE)
PROPERTY <- PROPERTY_CHART %>% 
                group_by(EVTYPE) %>%
                summarise(total = sum(PROP_DAMAGE)) %>%
                arrange(desc(total))

PROPERTY10 <- PROPERTY[1:10,]
### display "total" as currency - Create separate table for displayed version (will use numberic version for combined chart below)
PROPERTY_DISPLAY <- PROPERTY10
PROPERTY_DISPLAY$total <- dollar(PROPERTY_DISPLAY$total)
PROPERTY_DISPLAY
## Source: local data frame [10 x 2]
## 
##               EVTYPE            total
##                (chr)            (chr)
## 1              FLOOD $144,657,709,800
## 2  HURRICANE/TYPHOON  $69,305,840,000
## 3            TORNADO  $56,937,160,776
## 4        STORM SURGE  $43,323,536,000
## 5        FLASH FLOOD  $16,140,811,860
## 6               HAIL  $15,732,267,486
## 7          HURRICANE  $11,868,319,010
## 8     TROPICAL STORM   $7,703,890,550
## 9       WINTER STORM   $6,688,497,251
## 10         HIGH WIND   $5,270,046,295

Next, we’ll look at the top ten weather events causing crop damage. Drought is the top cause, followed by Flood, River Flood and Ice Storm.

## Top 10 by Crop
### Remove all but required fields
CROP_CHART <- select(stormAnalysis_CROP, EVTYPE, CROP_DAMAGE)
CROP <- CROP_CHART %>% 
                group_by(EVTYPE) %>%
                summarise(total = sum(CROP_DAMAGE)) %>%
                arrange(desc(total))
CROP10 <- CROP[1:10,] 


### display "total" as currency - Create separate table for displayed version (will use numberic version for combined chart below)
CROP_DISPLAY <- CROP
CROP_DISPLAY$total <- dollar(CROP_DISPLAY$total)
CROP_DISPLAY
## Source: local data frame [985 x 2]
## 
##               EVTYPE           total
##                (chr)           (chr)
## 1            DROUGHT $13,972,566,000
## 2              FLOOD  $5,661,968,450
## 3        RIVER FLOOD  $5,029,459,000
## 4          ICE STORM  $5,022,113,500
## 5               HAIL  $3,025,954,450
## 6          HURRICANE  $2,741,910,000
## 7  HURRICANE/TYPHOON  $2,607,872,800
## 8        FLASH FLOOD  $1,421,317,100
## 9       EXTREME COLD  $1,292,973,000
## 10      FROST/FREEZE  $1,094,086,000
## ..               ...             ...

Finally, we produce a chart displaying the top ten weather events by total overall damage (property combined with crop). Floods are the clearly dominant cause of economic damage.

## Display totals on a bar chart, thus outlining which events caused the most overall damage

PROP_CROP <- full_join(PROPERTY, CROP, by = "EVTYPE")

PROP_CROP <- mutate(PROP_CROP, GRAND_TOTAL = total.x + total.y)

## Take only top 10 by GRAND_TOTAL

PROP_CROP <- arrange(PROP_CROP, desc(GRAND_TOTAL))

PROP_CROP <- PROP_CROP[1:10,]

## Remove subtotals and display GRAND TOTAL as dollar amount in TOTAL version
PROP_CROP_TOTAL <- select(PROP_CROP, EVTYPE, GRAND_TOTAL)
#PROP_CROP_TOTAL$GRAND_TOTAL <- dollar(PROP_CROP_TOTAL$GRAND_TOTAL)

# Create chart of top 10 total economic damage

## Sort by factor
PROP_CROP_TOTAL$EVTYPE <- factor(PROP_CROP_TOTAL$EVTYPE, levels = PROP_CROP_TOTAL$EVTYPE[order(PROP_CROP_TOTAL$GRAND_TOTAL, decreasing = TRUE)])

g3 <- ggplot(PROP_CROP_TOTAL, aes(x = EVTYPE, y = GRAND_TOTAL)) +
              geom_bar(stat = "identity", fill = "red") +
              scale_y_continuous(labels = comma) +
              #coord_flip() +
              theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
              labs(title = "Top 10 Total Economic Damage", x = "Weather Event", y = "Total Amount (in dollars)")

g3