Synopsis

This report will take NOAA weather data and determine which weather events cause the most harm to human health and the most economic damage. The data will be cleanned to accurately and consistently represent the weather types and damage. Graphs will be produced that will show the weather events that are most problematic.

Data Processing

# clear out global environment, 
rm(list=ls())  

setwd("~/Documents/datasciencecoursera/ReproducibleResearch/PA2")
d <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))

#load libraries
library(plyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following object is masked from 'package:plyr':
## 
##     here
library(ggplot2)

Start Cleaning the Data

From the Document https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf page 6 we see that data for all weather types was not collected until January 1996. The belief is this will give an inaccuate picture of what events cause the most problems because it will show Tornados and other events that were recorded earlier as being more prevelant in total than other events that were only recoreded after 1996.

#  limit data to that with dates before jan 1996
rdate <- as.POSIXct("1/1/1996", "%m/%d/%Y", tz="")
data<- d[which(strptime(as.character(d$BGN_DATE), "%m/%d/%Y %H:%M:%S") > rdate),]

Lets also get rid of data that does not report any health or economic loss. We will do this by dropping rows that don’t have any property, crop, injury or fatality data.

#subset on events with damage to persons or property
data <- subset(data, data$PROPDMG > 0 | data$CROPDMG > 0 | 
                   data$FATALITIES > 0 | data$INJURIES > 0)

The data has Crop damage and Property damage numbers and multipliers in it. I am going to calculate a total property damage value. This will be property damage times the the multiplier plus the same for crop damage- crop damage times the multiplier. The multipliers need to be convert from a letter code to a numeric value.

##First MAKE CHANGE TO Incorrect value pointed out by TA on Discussion board.
data[data$REFNUM == 605943,"PROPDMGEXP"] <- "M" 
 
data$PROPX[data$PROPDMGEXP %in% c("K","k") ] <- 1000
data$PROPX[data$PROPDMGEXP %in% c("M","m") ] <- 1000000
data$PROPX[data$PROPDMGEXP %in% c("B","b") ] <- 1000000000

# now calculate and add a complete property damage value
data <- mutate(data, PropertyDamage = PROPX * PROPDMG)  

#Do the same for crops
data$CROPX[data$CROPDMGEXP %in% c("K","k") ] <- 1000
data$CROPX[data$CROPDMGEXP %in% c("M","m") ] <- 1000000
data$CROPX[data$CROPDMGEXP %in% c("B","b") ] <- 1000000000
# now calculate and add a complete property damage value
data <- mutate(data, CropDamage = CROPX * CROPDMG) 

For my analysis I will be using a total for crop damages and property damages called EconCost.
This will represent the total recorded economic cost of a weather event.

## remove NA values with mapvalues so EconCost will not be NA
data <- mutate(data, EconCost = mapvalues(CropDamage, NA, 0) + 
                       mapvalues(PropertyDamage, NA, 0))

For my analysis I will sum the Fatalities and Injuries into a value called Casualties. I will use this number to represent the recorded harm to human health caused by a weather event.

data <- mutate(data, Casualties=  FATALITIES + INJURIES)

At this point I will clean out the columns of data I will no longer need for analysis.

data <- data[ , c(8,42, 43)]

c_names <- c("EventType","EconCost", "Casualties")
colnames(data) <- c_names

Correcting Event Type names

The event type names that are supposed to be used can be seen here: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf page 6 I have added them to a vector to use for verification purposes.

#list of valid names from report
ValidNames <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", 
"Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil","Dust Storm","Excessive Heat", 
"Extreme Cold/Wind Chill", "Flash Flood", 
"Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", 
"High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", 
"Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", 
"Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", 
"Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", 
"Wildfire", "Winter Storm", "Winter Weather")

There are a lot more Event Type names than the 48 listed. We can see some of them here.

head(unique(data[,1]), 20)
##  [1] WINTER STORM         TORNADO              TSTM WIND           
##  [4] HIGH WIND            FLASH FLOOD          FREEZING RAIN       
##  [7] EXTREME COLD         LIGHTNING            HAIL                
## [10] FLOOD                TSTM WIND/HAIL       EXCESSIVE HEAT      
## [13] RIP CURRENTS         Other                HEAVY SNOW          
## [16] WILD/FOREST FIRE     ICE STORM            BLIZZARD            
## [19] STORM SURGE          Ice jam flood (minor
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

The next batch of code will correct some the Event Type Names so they will be consistent. I have commented the code for each step.

## trim extra spaces
data[,1] <- str_trim(as.character(data[,1]), side = "both")

## update Thunderstorm before other changes
data[,1] <- sub("tstm","thunderstorm", data[,1], ignore.case = TRUE)

##function to Capitalise words at begining for with space beforehand 
data[,1] <- tolower(as.character(data[,1]))  ## start with all lower case
## functions to Uppercase first letter
capwords <- function(s, strict = FALSE) {
        cap <- function(s) paste(toupper(substring(s, 1, 1)),
{s <- substring(s, 2); if(strict) tolower(s) else s},
sep = "", collapse = " " )
sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}
## Capitalise words with / before them
capwordsSlash <- function(s, strict = FALSE) {
        cap <- function(s) paste(toupper(substring(s, 1, 1)),
{s <- substring(s, 2); if(strict) tolower(s) else s},
sep = "", collapse = "/" )
sapply(strsplit(s, split = "/"), cap, USE.NAMES = !is.null(names(s)))
}
# run functions
data[,1] <- sapply(data[,1], capwords)
data[,1] <- sapply(data[,1], capwordsSlash)

#Make some changes to Event names before running replacement table.
## These correct several problems and helped reduce the size of the update table.
data[,1] <- sub("Thunderstorm Wind.+", "Thunderstorm Wind", data[,1], ignore.case = FALSE, perl = TRUE
    )
data[,1] <- sub(".*Snow.*", "Heavy Snow", data[,1], ignore.case = FALSE, perl = TRUE
)

data[,1] <- sub(".*wind.*", "High Wind", data[,1], ignore.case = TRUE, perl = TRUE
)
data[,1] <- sub(".*(hurricane|typhoon).*", "Hurricane (Typhoon)", data[,1], ignore.case = TRUE, perl = TRUE
)

data[,1] <- sub(".*Ice Jam Flood \\(minor.*", "Flood", data[,1], ignore.case = TRUE)

Next I used an update table to make changes to the remaining incorrect event types. I had create the update table by hand, but I am including a print out of it the appendix so it can be reproduced.

The following code uses the event update table to make the remaining changes to Event Type names.

## search through Event names, if they are not valid, change them to valid names using the update
##table
updateTable <- read.csv("EventUpdate.csv", stringsAsFactors = FALSE)

for (j in 1:nrow(data)){
       if (data[j,][[1]] %in% ValidNames)
               next
    for (i in 1:nrow(updateTable)){
        if (grepl (updateTable[i,][[1]], data[j,][[1]]) )
                data[j,][[1]] <- updateTable[i,][[2]]        
    }
}

RESULTS

The Results can be clearly seen by these graphs.
Here is a graph showing the Weather Events with the greatest Economic Costs on top.

## include just event types with economic costs and 
## remove any remaining event types that don't match Valid Names
d1 <- subset(data, data$EconCost >0 & data$EventType %in% ValidNames )

d1$EconCost <- d1$EconCost / 1000000000 

##group and order events
d1 <- d1 %>% group_by(EventType) %>% summarise_each(funs(sum))
d1 <- arrange(d1,desc(EconCost))
d1$EventType <- factor(d1$EventType, levels = d1$EventType[order(d1$EconCost)])

##Plot events by EconCost
g <- ggplot(d1[1:25,], aes(EventType, EconCost))
g + geom_bar(stat = "identity", fill="#0072B2") + coord_flip() +
        labs(title="Top Weather Event Types by Total Economic Cost",
             x = "Event Type",
             y = "Total Economic Cost(Billions of dollars)")

Here is a graph of the Weather Events showing the greater Casualty rates on top.

## include just event types with casualties and
## remove any remaining invalid type names
d2 <- subset(data, data$Casualties > 0 & data$EventType %in% ValidNames)

## group by event type and summarise
d2 <- d2 %>% group_by(EventType) %>% summarise_each(funs(sum))
##order the Events by number of casulties
d2$EventType <- factor(d2$EventType, levels = d2$EventType[order(d2$Casualties)])

##Plot results for casualties
g <- ggplot(d2, aes(EventType, Casualties))
g + geom_bar(stat = "identity", fill="#D55E00") +
      coord_flip() +
      labs(title="Top Weather Event Types by Total Casualties",
           x="Event Type",
           y = "Total Casualties")

Appendix

1. Event Type update table

Here is a printout of the update table used to correct Event Type values.

updateTable <- read.csv("EventUpdate.csv", stringsAsFactors = FALSE)

print(updateTable)
##                  CurrentName             CorrectName
## 1        Agricultural Freeze         Cold/Wind Chill
## 2     Astronomical High Tide        Storm Surge/Tide
## 3              Beach Erosion        Storm Surge/Tide
## 4                  Black Ice          Winter Weather
## 5               Blowing Dust              Dust Storm
## 6  Coastal  Flooding/Erosion        Storm Surge/Tide
## 7            Coastal Erosion        Storm Surge/Tide
## 8           Coastal Flooding        Storm Surge/Tide
## 9           Coastal Flooding        Storm Surge/Tide
## 10  Coastal Flooding/Erosion        Storm Surge/Tide
## 11  Coastal Flooding/Erosion        Storm Surge/Tide
## 12             Coastal Storm        Storm Surge/Tide
## 13             Coastal Storm        Storm Surge/Tide
## 14              Coastalstorm        Storm Surge/Tide
## 15                      Cold         Cold/Wind Chill
## 16                 Dam Break                   Flood
## 17           Damaging Freeze Extreme Cold/Wind Chill
## 18                 Downburst       Thunderstorm Wind
## 19            Dry Microburst       Thunderstorm Wind
## 20               Early Frost         Cold/Wind Chill
## 21        Erosion/Cstl Flood        Storm Surge/Tide
## 22              Extreme Cold Extreme Cold/Wind Chill
## 23               Flash Flood             Flash Flood
## 24         Flash Flood/Flood                   Flood
## 25         Flood/Flash/Flood                   Flood
## 26                       Fog             FreezingFog
## 27                    Freeze         Cold/Wind Chill
## 28                    Freeze         Cold/Wind Chill
## 29          Freezing Drizzle                   Sleet
## 30          Freezing Drizzle                   Sleet
## 31          Freezing Drizzle                   Sleet
## 32             Freezing Rain                   Sleet
## 33            Freezing Spray          Winter Weather
## 34               FreezingFog            Freezing Fog
## 35               FreezingFog            Freezing Fog
## 36               FreezingFog            Freezing Fog
## 37                     Frost         Cold/Wind Chill
## 38                     Glaze         Cold/Wind Chill
## 39                     Glaze         Cold/Wind Chill
## 40               Hard Freeze         Cold/Wind Chill
## 41            Hazardous Surf        Storm Surge/Tide
## 42                 Heat Wave          Excessive Heat
## 43      Heavy Rain/High Surf              Heavy Rain
## 44                Heavy Seas        Storm Surge/Tide
## 45                Heavy Surf        Storm Surge/Tide
## 46      Heavy Surf/High Surf        Storm Surge/Tide
## 47                 High Seas        Storm Surge/Tide
## 48        High Surf Advisory        Storm Surge/Tide
## 49               High Swells        Storm Surge/Tide
## 50                High Water        Storm Surge/Tide
## 51     Hyperthermia/Exposure         Cold/Wind Chill
## 52      Hypothermia/Exposure         Cold/Wind Chill
## 53      Hypothermia/Exposure         Cold/Wind Chill
## 54               Ice On Road               Ice Storm
## 55                 Ice Roads               Ice Storm
## 56                 Icy Roads               Ice Storm
## 57                 Landslide             Flash Flood
## 58                 Landslump             Flash Flood
## 59                 Landspout               Ice Storm
## 60                Microburst       Thunderstorm Wind
## 61              Mixed Precip          Winter Weather
## 62       Mixed Precipitation          Winter Weather
## 63                 Mud Slide             Flash Flood
## 64                  Mudslide             Flash Flood
## 65                      Rain                   Flood
## 66               Record Heat          Excessive Heat
## 67              Rip Currents             Rip Current
## 68               River Flood                   Flood
## 69            River Flooding                   Flood
## 70                Rogue Wave                 Tsunami
## 71                Rough Seas        Storm Surge/Tide
## 72                Rough Surf      Marine Strong Wind
## 73                Small Hail                    Hail
## 74               Storm Surge        Storm Surge/Tide
## 75              Thunderstorm       Thunderstorm Wind
## 76            Tidal Flooding        Storm Surge/Tide
## 77            Tidal Flooding        Storm Surge/Tide
## 78       Torrential Rainfall             Flash Flood
## 79         Unseasonably Warm                    Heat
## 80           Unseasonal Rain                   Flood
## 81      Urban/Sml Stream Fld                   Flood
## 82              Warm Weather                    Heat
## 83            Wet Microburst       Thunderstorm Wind
## 84          Wild/Forest Fire                Wildfire
## 85        Winter Weather Mix          Winter Weather
## 86        Winter Weather/Mix          Winter Weather
## 87                Wintry Mix          Winter Weather

2. Session information for the system used to create this report:

sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.9.5 (Mavericks)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_1.0.1   lubridate_1.3.3 stringr_1.0.0   dplyr_0.4.3    
## [5] plyr_1.8.3     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.1      knitr_1.11       magrittr_1.5     MASS_7.3-43     
##  [5] munsell_0.4.2    colorspace_1.2-6 R6_2.1.1         tools_3.2.2     
##  [9] parallel_3.2.2   grid_3.2.2       gtable_0.1.2     DBI_0.3.1       
## [13] htmltools_0.2.6  yaml_2.1.13      lazyeval_0.1.10  assertthat_0.1  
## [17] digest_0.6.8     reshape2_1.4.1   memoise_0.2.1    evaluate_0.8    
## [21] rmarkdown_0.8.1  labeling_0.3     stringi_1.0-1    scales_0.3.0    
## [25] proto_0.3-10