Synopsis

To inform policy on preventative measures against harmful and damaging weather events in the United States, this analysis attempts to determine which weather event types are most harmful to population health and which are most damaging to the country’s economy. It uses data provided by the National Oceanic and Atmospheric Administration (NOAA). It follows a simple approach to produce two rankings, one listing the most harmful and one listing the most damaging weather event types observed in the United States from 1997 to 2011.

Introduction

Weather events often have negative consequences on the health of a population, as well as on the economy of a country. Policy makers need to make informed decisions on allocating resources to counteract these weather events. This report contributes by attempting to answer two questions:

  1. Across the United States, which types of weather events are most harmful with respect to population health?
  2. Across the United States, which types of weather events have the most severe economic consequences?

The analysis employs a simple ranking approach to answer these questions. It proceeds in the following steps:

  1. Start with an initial data set of recorded weather events and subset the data to exclude dirty observations.
  2. Choose a set of variables that are indicative of population health and another set of variables that are indicative of economic consequences.
  3. Create subsets containing valid observations.
  4. Aggregate values per event type.
  5. Sort data in descending order.
  6. Select the top 10 event types.
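
The steps above can be sketched in base R on a small made-up data frame. The data values and the helper name `rank_event_types` are assumptions for illustration only; they are not part of the NOAA data set, and the analysis below uses `plyr` rather than `aggregate` for the per-type totals.

```r
# Toy data: invented values, for illustration only.
toy_events <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO", "HAIL"),
  FATALITIES = c(2, 5, 1, 0, 3, 0),
  INJURIES   = c(10, 40, 0, 0, 25, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep valid rows, aggregate per event type,
# sort in descending order and return the top n event types.
rank_event_types <- function(events, n = 10) {
  # Steps 1 and 3: keep only observations that report some harm.
  valid <- subset(events, FATALITIES > 0 | INJURIES > 0)
  # Step 2: combine the chosen variables into one indicator.
  valid$HARMFULNESS <- valid$FATALITIES + valid$INJURIES
  # Step 4: aggregate values per event type.
  totals <- aggregate(HARMFULNESS ~ EVTYPE, data = valid, FUN = sum)
  # Steps 5 and 6: sort in descending order and select the top n.
  head(totals[order(totals$HARMFULNESS, decreasing = TRUE), ], n)
}

toy_ranking <- rank_event_types(toy_events, n = 2)
print(toy_ranking)
```

With these toy values, TORNADO ranks first and FLOOD second, because their combined fatalities and injuries are the largest totals.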

Data

The data set used in this analysis is provided by the National Oceanic and Atmospheric Administration (NOAA).

The NOAA data set covers observations from 1950 to 2011. However, only events that began after 1996, that is, from 1997 onward, are used.

Data Processing

Loading the data

packages <- c("R.utils", "plyr", "ggplot2")

# Compare against package names only, not every cell of the matrix
installed <- packages %in% rownames(installed.packages())
if (any(!installed)) {
  install.packages(packages[!installed])
}

# invisible() suppresses the list of attached packages that lapply() returns
invisible(lapply(packages, library, character.only = TRUE))
if (!file.exists("storm_data.csv")) {
  file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  file_name <- "storm_data.csv.bz2"
  # mode = "wb" keeps the compressed archive intact on Windows
  download.file(file_url, file_name, method = "auto", mode = "wb")

  # Decompress to storm_data.csv (bunzip2 comes from R.utils)
  bunzip2(file_name, overwrite = TRUE)
}
original_storm_data <- read.csv("storm_data.csv")
nrow(original_storm_data)
## [1] 902297

Cleaning the data

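# Keep only the variables needed for the analysis
storm_data <- original_storm_data[, c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Parse the begin date; as.Date() drops the time-of-day part
storm_data$BGN_DATE <- as.Date(as.character(storm_data$BGN_DATE), "%m/%d/%Y %H:%M:%S")
# Keep events whose begin year is later than 1996 (1997 onward)
storm_data <- subset(storm_data, as.numeric(format(BGN_DATE, "%Y")) > 1996)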
nrow(storm_data)
## [1] 621260
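
As a minimal sketch of the date filter, the following parses a few made-up strings in the same month/day/year hour:minute:second layout as BGN_DATE and keeps only those whose year is greater than 1996. The sample strings and variable names are assumptions for illustration.

```r
# Made-up date strings in the layout used by BGN_DATE.
sample_dates <- c("4/18/1950 0:00:00", "12/31/1996 0:00:00",
                  "1/1/1997 0:00:00", "11/28/2011 0:00:00")

# The time fields in the format string are matched but discarded,
# because Date objects carry no time of day.
parsed <- as.Date(sample_dates, format = "%m/%d/%Y %H:%M:%S")

# Extract the calendar year and apply the same strict comparison.
years <- as.numeric(format(parsed, "%Y"))
kept <- parsed[years > 1996]
print(kept)
```

Because the comparison is strict, an event dated 31 December 1996 is dropped, while one dated 1 January 1997 is kept.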

Subsetting data with negative consequences for population health

For an observation to be informative about negative consequences on population health, at least one of the FATALITIES and INJURIES variables must be greater than 0.

storm_data_for_harmfulness <- subset(storm_data, FATALITIES > 0 | INJURIES > 0)

For analysing consequences on population health, only EVTYPE, FATALITIES, and INJURIES are considered.

storm_data_for_harmfulness <- storm_data_for_harmfulness[,c("EVTYPE","FATALITIES","INJURIES")]
nrow(storm_data_for_harmfulness)
## [1] 11851

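To approximate the total effect, fatalities and injuries are summed into a single HARMFULNESS value, which weights a fatality and an injury equally. The values are then totalled per event type.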

storm_data_for_harmfulness$HARMFULNESS <- storm_data_for_harmfulness$FATALITIES + storm_data_for_harmfulness$INJURIES
storm_data_for_harmfulness_grouped_per_event_type <- ddply(storm_data_for_harmfulness, .(EVTYPE), numcolwise(sum))

Subsetting data with negative consequences for the economy

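For economic impact, only observations with non-empty property and crop damage exponents (PROPDMGEXP and CROPDMGEXP) and with property or crop damage greater than 0 are kept.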

storm_data_for_economy <- subset(storm_data, PROPDMGEXP != "" & CROPDMGEXP != "")
storm_data_for_economy <- subset(storm_data_for_economy, PROPDMG > 0 | CROPDMG > 0)
storm_data_for_economy <- storm_data_for_economy[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
nrow(storm_data_for_economy)
## [1] 91502

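The exponent codes act as multipliers: K stands for thousands, M for millions, and B for billions. PROPDMG and CROPDMG are multiplied accordingly to obtain full damage values in US dollars; any other exponent code yields 0.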

storm_data_for_economy$PROPDMGEXP <- toupper(storm_data_for_economy$PROPDMGEXP)
storm_data_for_economy$CROPDMGEXP <- toupper(storm_data_for_economy$CROPDMGEXP)

storm_data_for_economy$PROPDMGFULL <- ifelse(storm_data_for_economy$PROPDMGEXP == "K", storm_data_for_economy$PROPDMG * 1000,
  ifelse(storm_data_for_economy$PROPDMGEXP == "M", storm_data_for_economy$PROPDMG * 1000000,
    ifelse(storm_data_for_economy$PROPDMGEXP == "B", storm_data_for_economy$PROPDMG * 1000000000, 0)))

storm_data_for_economy$CROPDMGFULL <- ifelse(storm_data_for_economy$CROPDMGEXP == "K", storm_data_for_economy$CROPDMG * 1000,
  ifelse(storm_data_for_economy$CROPDMGEXP == "M", storm_data_for_economy$CROPDMG * 1000000,
    ifelse(storm_data_for_economy$CROPDMGEXP == "B", storm_data_for_economy$CROPDMG * 1000000000, 0)))
storm_data_for_economy$DAMAGE <- storm_data_for_economy$PROPDMGFULL + storm_data_for_economy$CROPDMGFULL
storm_data_for_economy_grouped_per_event_type <- ddply(storm_data_for_economy, .(EVTYPE), numcolwise(sum))
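
The exponent handling could also be expressed with a named lookup vector instead of nested `ifelse()` calls. The following is a sketch of that alternative, not the code used to produce the results in this report; the names `multipliers` and `expand_damage` and the sample values are hypothetical. One assumption differs slightly from the nested version: a missing (NA) exponent is mapped to 0 here as well.

```r
# Named lookup vector: exponent code -> multiplier.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale damage values by their exponent code.
# Codes other than K, M and B (and missing codes) give a multiplier of 0.
expand_damage <- function(value, exponent) {
  scale <- unname(multipliers[toupper(as.character(exponent))])
  scale[is.na(scale)] <- 0
  value * scale
}

# Made-up values: 25 thousand, 2.5 million, 1 billion, unknown code "H".
toy_damage <- expand_damage(c(25, 2.5, 1, 10), c("K", "m", "B", "H"))
print(toy_damage)
```

A lookup vector keeps the mapping in one place, so the same helper could serve both the property and the crop damage columns.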

Results

Population health

top_10_events_for_harmfulness <- storm_data_for_harmfulness_grouped_per_event_type[
  order(storm_data_for_harmfulness_grouped_per_event_type$HARMFULNESS, decreasing = TRUE),
][1:10, ]

top_10_events_for_harmfulness[, c("EVTYPE", "HARMFULNESS")]
##                EVTYPE HARMFULNESS
## 96            TORNADO       21447
## 22     EXCESSIVE HEAT        8093
## 29              FLOOD        7121
## 66          LIGHTNING        4424
## 98          TSTM WIND        3524
## 28        FLASH FLOOD        2425
## 94  THUNDERSTORM WIND        1530
## 42               HEAT        1459
## 55  HURRICANE/TYPHOON        1339
## 112      WINTER STORM        1209

Economy

top_10_events_for_damage <- storm_data_for_economy_grouped_per_event_type[
  order(storm_data_for_economy_grouped_per_event_type$DAMAGE, decreasing = TRUE),
][1:10, ]

top_10_events_for_damage[, c("EVTYPE", "DAMAGE")]
##               EVTYPE       DAMAGE
## 17             FLOOD 136877233900
## 32 HURRICANE/TYPHOON  29348167800
## 51           TORNADO  16203902150
## 31         HURRICANE  11474663000
## 25              HAIL   9172124220
## 16       FLASH FLOOD   8246133530
## 48  STORM SURGE/TIDE   4641493000
## 50 THUNDERSTORM WIND   3780985440
## 61          WILDFIRE   3684468370
## 30         HIGH WIND   2873328540
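
Damage totals of this size are easier to compare when expressed in billions of US dollars. The sketch below shows one way to do that for the two largest entries of the ranking above; the data frame name `damage_ranking` and the column name `DAMAGE_BN_USD` are assumptions introduced for illustration.

```r
# Two rows copied from the ranking above.
damage_ranking <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON"),
  DAMAGE = c(136877233900, 29348167800),
  stringsAsFactors = FALSE
)

# Express damage in billions of US dollars, rounded to one decimal place.
damage_ranking$DAMAGE_BN_USD <- round(damage_ranking$DAMAGE / 1e9, 1)
print(damage_ranking[, c("EVTYPE", "DAMAGE_BN_USD")])
```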

Conclusion

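This analysis shows that tornadoes are by far the most harmful weather event type to population health, with 21,447 combined fatalities and injuries, followed by excessive heat, floods, and lightning. Floods cause by far the greatest economic damage, at roughly 136.9 billion US dollars, followed by hurricanes, tornadoes, and hail.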