Synopsis

In this report, we analyze data on the effects of damaging weather events: harm to public health through death or injury, and economic damage to property and agricultural crops. To determine which weather events caused the most harm to health and the most economic damage, we rank the events by the amount of damage caused (counts of fatalities and injuries for health, US dollars for property and crops) and compare the ranking with the cumulative distribution of the damage using Pareto charts. (See Wikipedia for more about Pareto charts.)

Data Processing

Load the necessary packages.

knitr::opts_chunk$set(echo = TRUE)
library(data.table)
library(dplyr)

Read the health data into R.

Here we also replace some of the variable names with more readable ones and compute the total fatalities and injuries for each event type.

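current_path <- "/home/rob/Data Science/reproducibleresearch/week4"
setwd(current_path) # Author use only.

# Download the archive only if it is not already present. The check must use
# the same file name that download.file() writes to and fread() reads from.
# data_url must hold the address of the storm data archive; it is not defined in this section.
if (!file.exists("StormData.csv.bz2")) {
    download.file(data_url, "StormData.csv.bz2", mode = "wb")
    time_data_downloaded <- Sys.time()
}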

relevant_variables <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

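# Decompress the archive, strip embedded NUL bytes, and read only the columns we need.
# Shell commands are passed to fread() through its cmd argument.
storm_data <- fread(cmd = sprintf("bzcat %s | tr -d '\\000'", "StormData.csv.bz2"), na.strings = "", select = relevant_variables)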
names(storm_data) <- c("event_type", "fatalities", "injuries", "property_damage", "property_damage_exponent", "crop_damage", "crop_damage_exponent")

fatality_data <- storm_data %>%
        group_by(event_type) %>%
        summarise(total = sum(fatalities)) %>%
        arrange(desc(total))

injury_data <- storm_data %>%
        group_by(event_type) %>%
        summarise(total = sum(injuries)) %>%
        arrange(desc(total))
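
The two summaries above follow the same group-and-sum pattern. As a minimal sketch of that pattern, the following uses base R's aggregate() on a small invented data frame to total both health variables in one call. The object toy and its values are assumptions made for illustration and are not part of the storm data.

```r
# Toy stand-in for storm_data; the values are invented for illustration.
toy <- data.frame(
  event_type = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  fatalities = c(3, 1, 2, 4, 0),
  injuries   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Sum both health variables for each event type in a single call.
health_totals <- aggregate(cbind(fatalities, injuries) ~ event_type,
                           data = toy, FUN = sum)

# Rank the event types by total fatalities, largest first.
health_totals <- health_totals[order(-health_totals$fatalities), ]
rownames(health_totals) <- NULL
health_totals
```

Keeping both totals in one table makes it easy to compare the two rankings side by side, at the cost of choosing a single sort order.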

Prepare the property and crop damage data.

Handling the property and crop damage data takes more care, as the raw data use letters to indicate which power of ten the given damage value should be multiplied by.

property_data <- select(storm_data, 
                        event_type,
                        property_damage,
                        property_damage_exponent)

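property_data <- within(property_data, {
        # Start from a full-length vector of zeros so that any code not listed
        # below (including missing values) defaults to a multiplier of 10^0.
        exponent = rep(0, length(property_damage_exponent))
        exponent[property_damage_exponent %in% c("H", "h")] = 2  # hundreds
        exponent[property_damage_exponent %in% c("K", "k")] = 3  # thousands
        exponent[property_damage_exponent %in% c("M", "m")] = 6  # millions
        exponent[property_damage_exponent %in% c("B", "b")] = 9  # billions
        exponent[property_damage_exponent %in% c("+", "-", "?", " ", "")] = 0
        })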

property_damage <- property_data %>%
        mutate(property_damage_cost = property_damage * 10^exponent) %>%
        group_by(event_type) %>%
        summarise(total_property_cost = sum(property_damage_cost)) %>%
        arrange(desc(total_property_cost)) %>%
        filter(row_number() <= 180)
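
The letter codes described above can also be translated with a lookup table. The following is a minimal sketch of that idea, not the method used in this report: exponent_lookup and damage_in_dollars() are hypothetical names, the input values are invented, and sending every unlisted code to an exponent of 0 is an assumption that mirrors the default chosen above.

```r
# Named vector mapping each exponent code to a power of ten.
exponent_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: convert damage figures and their exponent codes to dollars.
# Codes absent from the table (symbols, blanks, missing values) fall back to 0,
# which is an assumption rather than a documented rule of the data set.
damage_in_dollars <- function(amount, code) {
  power <- unname(exponent_lookup[toupper(code)])
  power[is.na(power)] <- 0
  amount * 10^power
}

# Invented values: 25 thousand, 2.5 million, 1 billion, and 7 dollars.
damage_in_dollars(c(25, 2.5, 1, 7), c("K", "m", "B", NA))
```

A lookup table keeps the code-to-exponent mapping in one place, so the same helper could serve both the property and the crop columns.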


crop_data <- select(storm_data,
                    event_type,
                    crop_damage,
                    crop_damage_exponent)

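crop_data <- within(crop_data, {
        # Start from a full-length vector of zeros so that any code not listed
        # below (including missing values) defaults to a multiplier of 10^0.
        exponent = rep(0, length(crop_damage_exponent))
        exponent[crop_damage_exponent %in% c("H", "h")] = 2  # hundreds
        exponent[crop_damage_exponent %in% c("K", "k")] = 3  # thousands
        exponent[crop_damage_exponent %in% c("M", "m")] = 6  # millions
        exponent[crop_damage_exponent %in% c("B", "b")] = 9  # billions
        exponent[crop_damage_exponent %in% c("+", "-", "?", " ", "")] = 0
        })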

crop_damage <- crop_data %>%
        mutate(crop_damage_cost = crop_damage * 10^exponent) %>%
        group_by(event_type) %>%
        summarise(total_crop_cost = sum(crop_damage_cost)) %>%
        arrange(desc(total_crop_cost)) %>%
        filter(row_number() <= 37)

Results

Here we produce Pareto charts to show which events cause the majority of the damage to health and property. The code that produces these figures is lengthy, but we believe the analysis justifies it. Because there are so many unique event types, and the vast majority of them cause comparatively little damage, full Pareto charts would be impractical. Only the highest-ranked events are therefore shown; together they account for at least 75% of the damage caused.
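
As a rough illustration of how such a cutoff could be chosen, the sketch below ranks events by total damage, computes each event's cumulative share of the overall total, and keeps rows up to the first one that reaches a target share. The helper pareto_head(), its target argument, and the toy totals are assumptions introduced here; this is not the code used to produce the figures in this report.

```r
# Hypothetical helper: rank events by total damage, add the cumulative share
# of the overall total, and keep rows up to the first one reaching `target`.
pareto_head <- function(event, total, target = 0.75) {
  ranked <- order(total, decreasing = TRUE)
  out <- data.frame(event = event[ranked],
                    total = total[ranked],
                    stringsAsFactors = FALSE)
  out$cumulative_share <- cumsum(out$total) / sum(out$total)
  cutoff <- which(out$cumulative_share >= target)[1]
  out[seq_len(cutoff), ]
}

# Invented totals, for illustration only.
toy_pareto <- pareto_head(
  event = c("A", "B", "C", "D", "E"),
  total = c(10, 50, 5, 30, 5)
)
toy_pareto
```

In this toy case the two largest events, B and D, already cover 80% of the total, so only they are kept. The bars of a Pareto chart would be drawn from the total column and the cumulative line from cumulative_share.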