1 - Synopsis

Every year, storms in the U.S. cause deaths, injuries, and property damage. The U.S. National Oceanic and Atmospheric Administration (NOAA) tracks these events and their outcomes in its storm database and makes the data publicly available. Using this data, I analyzed the types of storm events and the health and economic outcomes that resulted from them.

2 - Data Processing

2.1 - Load in the Data

The data is downloaded from an available NOAA Storm Data set and saved as StormData.csv.bz2. The download is skipped if the file already exists locally.

library(data.table)
library(ggplot2)
library(plyr)
library(gridExtra)
library(grid)
destfile <- "StormData.csv.bz2"
location <- "/Users/edloessi/Documents/data"
# Download the compressed file only if it is not already present.
if (!file.exists(file.path(location, destfile))) {
    fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    # mode = "wb" keeps the binary archive intact on Windows.
    download.file(fileUrl, destfile = file.path(location, destfile), mode = "wb")
}

2.2 - Read the data into a data table.

Because reading the data is slow, I set cache=TRUE on this chunk so that knitting the R Markdown document does not reprocess the data each time.

stormData <- read.csv(bzfile(paste(location, destfile, sep="/")), header = TRUE)
stormDataT <- as.data.table(stormData)

2.3 - Check the column names from the dataset.

colnames(stormDataT)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

2.4 - Subsetting the dataset

This project looks specifically at population health and property damage. The data set is subset below so that we are only working with the variables relevant to those questions.

stormDataSub <- stormDataT[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
stormDataSub
##             EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
##      1:    TORNADO          0       15    25.0          K       0
##      2:    TORNADO          0        0     2.5          K       0
##      3:    TORNADO          0        2    25.0          K       0
##      4:    TORNADO          0        2     2.5          K       0
##      5:    TORNADO          0        2     2.5          K       0
##     ---                                                          
## 902293:  HIGH WIND          0        0     0.0          K       0
## 902294:  HIGH WIND          0        0     0.0          K       0
## 902295:  HIGH WIND          0        0     0.0          K       0
## 902296:   BLIZZARD          0        0     0.0          K       0
## 902297: HEAVY SNOW          0        0     0.0          K       0
##         CROPDMGEXP
##      1:           
##      2:           
##      3:           
##      4:           
##      5:           
##     ---           
## 902293:          K
## 902294:          K
## 902295:          K
## 902296:          K
## 902297:          K

Organizing the data for making calculations

Both the property and crop damage data and the injury data need some reorganization before the calculations and plots that follow are practical.

2.5 - Calculate the specific damage to property and crops

The data file records each damage amount as a numeric value (PROPDMG, CROPDMG) together with an exponent code (PROPDMGEXP, CROPDMGEXP) that scales it. To turn these into a usable dollar figure (for example, 2.5 with exponent K becomes 2500), first define the function below, which maps each exponent code to a power of ten.

getExp <- function(e) {
    # Convert first: on a factor, as.numeric() would return the level code.
    e <- as.character(e)
    if (e %in% c("h", "H"))
        return(2)
    else if (e %in% c("k", "K"))
        return(3)
    else if (e %in% c("m", "M"))
        return(6)
    else if (e %in% c("b", "B"))
        return(9)
    else if (e %in% c("", "-", "?", "+"))
        return(0)
    else if (grepl("^[0-9]+$", e))
        # A plain digit is used directly as the power of ten.
        return(as.numeric(e))
    else {
        stop("Invalid value: ", e)
    }
}

Next, you must calculate the correct property and crop damage figures.

propExp <- sapply(stormDataSub$PROPDMGEXP, FUN=getExp)
stormDataSub$propDamage <- stormDataSub$PROPDMG * (10 ^ propExp)
cropExp <- sapply(stormDataSub$CROPDMGEXP, FUN=getExp)
stormDataSub$cropDamage <- stormDataSub$CROPDMG * (10 ^ cropExp)
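
Calling getExp once per row through sapply can be slow on roughly 900,000 rows. As a minimal sketch of a vectorized alternative, the block below converts a whole vector of exponent codes in one pass. The names expLookup and expToPower are hypothetical and not part of the original analysis, the mapping is assumed to mirror getExp, and the sample values are made up for illustration.

```r
# Lookup table from exponent code to power of ten (assumed to mirror getExp).
expLookup <- c(h = 2, k = 3, m = 6, b = 9)

# Hypothetical helper: convert a vector of exponent codes in one pass.
expToPower <- function(codes) {
    codes <- tolower(as.character(codes))    # handles factors and mixed case
    power <- unname(expLookup[codes])        # NA for anything not in the table
    isDigit <- grepl("^[0-9]$", codes)
    power[isDigit] <- as.numeric(codes[isDigit])   # a digit is its own power
    power[codes %in% c("", "-", "?", "+")] <- 0    # treated as no scaling
    if (anyNA(power)) stop("Invalid exponent code found.")
    power
}

# Toy values, not taken from the NOAA file.
dmg  <- c(2.5, 25, 1, 3, 7)
code <- c("K", "k", "M", "", "2")
dmg * 10^expToPower(code)
# 2500, 25000, 1e+06, 3, 700
```

On the real data this would be used as stormDataSub$PROPDMG * 10^expToPower(stormDataSub$PROPDMGEXP), under the same assumptions about which codes are valid.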

Next, the financial damage to crops and property has to be summed for each event type.

econDamage <- ddply(stormDataSub, .(EVTYPE), summarize, propDamage = sum(propDamage), cropDamage = sum(cropDamage))
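
Since data.table is already loaded, the same per-event totals could probably be computed without plyr. The sketch below shows grouped sums with data.table's by argument on a toy table. The table toy and the result toyDamage are hypothetical names with made-up values; this is an illustration of the grouping step, not the code used for the results in this report.

```r
library(data.table)

# Toy data standing in for stormDataSub; values are made up for illustration.
toy <- data.table(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
    propDamage = c(2500, 1e6, 25000, 0, 5e5),
    cropDamage = c(0, 2e5, 0, 0, 1e5)
)

# Sum both damage columns within each event type.
toyDamage <- toy[, .(propDamage = sum(propDamage),
                     cropDamage = sum(cropDamage)),
                 by = EVTYPE]
toyDamage
```

Note that by keeps groups in order of first appearance, whereas ddply returns them sorted by EVTYPE; using keyby = EVTYPE instead would sort the result.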

Next, events not causing any financial damage will be omitted using the code below.

econDamage <- econDamage[(econDamage$propDamage > 0 | econDamage$cropDamage > 0), ]

Lastly, sort the property and crop damage in decreasing order for use in future graphing and visualization.

propDmgSorted <- econDamage[order(econDamage$propDamage, decreasing = TRUE), ]
cropDmgSorted <- econDamage[order(econDamage$cropDamage, decreasing = TRUE), ]

2.6 - Population Health

The fatalities and injuries need to be summed for each event type and put in decreasing order.

harm2health <- ddply(stormDataSub, .(EVTYPE), summarize, fatalities = sum(FATALITIES), injuries = sum(INJURIES))
fatal <- harm2health[order(harm2health$fatalities, decreasing = TRUE), ]
injury <- harm2health[order(harm2health$injuries, decreasing = TRUE), ]
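
To make the summarize-then-sort step concrete, here is a small base R sketch on made-up numbers. It uses aggregate rather than ddply, so it is an alternative illustration and not the method used above; the names toy, toyHealth, and toyFatal are hypothetical.

```r
# Toy data with made-up counts, for illustration only.
toy <- data.frame(
    EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
    FATALITIES = c(3, 10, 2, 1, 4),
    INJURIES   = c(40, 5, 60, 2, 0)
)

# Sum fatalities and injuries within each event type.
toyHealth <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by total fatalities, largest first, and keep the top two.
toyFatal <- toyHealth[order(toyHealth$FATALITIES, decreasing = TRUE), ]
head(toyFatal, 2)
```

Here HEAT (14 fatalities) ranks ahead of TORNADO (5), even though TORNADO has far more injuries, which is why the report sorts the two measures separately.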

3 - Results

3.1 - The Population Health Analysis

These two plots show the five event types with the most injuries and the five event types with the most fatalities.

p1 <- ggplot(data=head(injury,5), aes(x=reorder(EVTYPE, -injuries), y=injuries)) +
   geom_bar(fill="dodgerblue3",stat="identity")  + 
    ylab("Total number of injuries") + xlab("Event type") +
    ggtitle("Population Health impact of weather events in the US - Top 5") +
    theme(legend.position="none")

p2 <- ggplot(data=head(fatal,5), aes(x=reorder(EVTYPE, -fatalities), y=fatalities)) +
    geom_bar(fill="violetred4",stat="identity") +
    ylab("Total number of fatalities") + xlab("Event type") +
    theme(legend.position="none")

grid.arrange(p1, p2, nrow =2)
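
The bars appear from tallest to shortest because of reorder(EVTYPE, -injuries) in the aesthetic mapping: it turns EVTYPE into a factor whose levels are sorted by the negated count, and ggplot2 draws a discrete axis in level order. A minimal sketch with made-up numbers (toy and evOrdered are hypothetical names):

```r
# Toy data with made-up injury counts.
toy <- data.frame(
    EVTYPE   = c("FLOOD", "TORNADO", "HEAT"),
    injuries = c(10, 100, 50)
)

# Levels are sorted by ascending score; negating puts the largest count first.
evOrdered <- reorder(toy$EVTYPE, -toy$injuries)
levels(evOrdered)
# "TORNADO" "HEAT" "FLOOD"
```

Without the minus sign, the levels would run from the smallest count to the largest and the bars would ascend from left to right.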

3.2 - Property and Crop Damage Analysis

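These two plots show the five event types with the greatest property damage and the five event types with the greatest crop damage. Property damage is plotted on a log10 scale, while crop damage is plotted in raw dollars.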

p1 <- ggplot(data=head(propDmgSorted,5), aes(x=reorder(EVTYPE, -propDamage), y=log10(propDamage), fill=propDamage )) +
    geom_bar(fill="cyan4", stat="identity") +
    xlab("Event type") + ylab("Property damage in dollars (log10)") +
    ggtitle("Property and crop damage economic impact of weather events in the US - Top 5") +
    theme(plot.title = element_text(hjust = 0))

p2 <- ggplot(data=head(cropDmgSorted,5), aes(x=reorder(EVTYPE, -cropDamage), y=cropDamage, fill=cropDamage)) +
    geom_bar(fill="forestgreen", stat="identity") + 
    xlab("Event type") + ylab("Crop damage in dollars") + 
    theme(legend.position="none")

grid.arrange(p1, p2, ncol=1, nrow =2)