Reproducible Research: Peer Assessment 2

Impact of Severe Weather Events on Public Health and Economy in the United States

Synopsis

This report examines the public health and economic impacts of severe weather events in the United States. The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and are available at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. The analysis shows that, from 1995 to 2011, tornadoes caused the most casualties and the most injuries, while floods caused the greatest total economic cost and the most property damage.

Basic settings

knitr::opts_chunk$set(echo = TRUE)  # Always make code visible in the knitted report
options(scipen = 1)  # Penalize scientific notation when printing numbers
library(R.utils)
library(ggplot2)
library(plyr)
library(gridExtra)

Data Processing

Read the downloaded csv file.

stormData <- read.csv("repdata-data-StormData.csv", sep = ",")
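As a hedged aside, the compressed download may not need to be unpacked first: `read.csv()` opens files through `file()`, which detects bzip2 compression when reading. The minimal sketch below checks this on a tiny hypothetical data frame written to a temporary `.csv.bz2` file with `bzfile()`; the toy columns are illustrative only and are not the NOAA data.

```r
# Hypothetical toy data standing in for the NOAA storm file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it as a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path via file(), which detects the bzip2 compression
roundTrip <- read.csv(path)
print(roundTrip)
```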

The events in the database start in 1950 and end in November 2011. Fewer events are recorded in the earlier years, most likely because of incomplete record keeping, so the more recent years should be considered more complete.

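## add a numeric year column parsed from BGN_DATE (skipped if the column already exists)
if (dim(stormData)[2] == 37) {
    stormData$year <- as.numeric(format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
}
## histogram of the number of recorded events per year
hist(stormData$year, breaks = 30)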

The histogram shows that the number of recorded events starts to increase around 1995, so the analysis uses the subset of the data from 1995 to 2011.

storm <- stormData[stormData$year >= 1995, ]  ## keep only events from 1995 onward
dim(storm)
## [1] 681500     38
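The year extraction and the 1995 cut-off can be checked on a few hand-written date strings. This sketch assumes `BGN_DATE` values follow the `month/day/year hour:minute:second` layout implied by the format string used above; the sample values are hypothetical.

```r
# Hypothetical sample of BGN_DATE strings in the "m/d/Y H:M:S" layout
bgnDate <- c("4/18/1950 0:00:00", "1/6/1995 0:00:00", "11/30/2011 0:00:00")

# Parse to Date (the time part is matched by the format but dropped),
# then keep only the four-digit year as a number
year <- as.numeric(format(as.Date(bgnDate, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
print(year)

# Logical filter equivalent to stormData[stormData$year >= 1995, ]
keep <- year >= 1995
print(bgnDate[keep])
```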

From this point on, the analysis uses only the data from 1995 onward.

Impact on Public Health

In this section, we total the fatalities and injuries caused by each type of severe weather event. We then compute casualties, defined as the sum of fatalities and injuries, and identify the top 5 event types for each measure.

sortFun <- function(ColHeader, top = 5, dataset = stormData) {
    ## find the column whose name matches the chosen ColHeader
    index <- which(colnames(dataset) == ColHeader)
    ## sum the chosen column within each EVTYPE and store the totals in a temporary data frame
    field <- aggregate(dataset[, index], by = list(dataset$EVTYPE), FUN = "sum")
    ## name the columns of the temporary data frame: EVTYPE and the chosen ColHeader
    names(field) <- c("EVTYPE", ColHeader)
    ## sort the rows by the totals (second column) in decreasing order
    field <- field[order(field[, 2], decreasing = TRUE), ]
    ## keep only the top few rows, as set by the top argument
    field <- head(field, n = top)
    ## fix the factor levels to the sorted order so that plots keep this order
    field <- within(field, EVTYPE <- factor(x = EVTYPE, levels = field$EVTYPE))
    return(field)
}
## find the top 5 EVTYPEs with the highest fatalities and injuries
fatalities <- sortFun("FATALITIES", dataset = storm)
injuries <- sortFun("INJURIES", dataset = storm)

## casualties (fatalities + injuries) measure the overall public health impact
storm$Casualties <- storm$FATALITIES + storm$INJURIES
casualties <- sortFun("Casualties", dataset = storm)
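The steps inside `sortFun` (sum per event type, sort in decreasing order, keep the top rows) can be illustrated on toy data. The sketch below uses a hypothetical helper `topEvents` and made-up counts, written with base R `aggregate()` and `order()`; it mirrors the logic of `sortFun` rather than reproducing it exactly.

```r
# Hypothetical toy data with repeated event types
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
    FATALITIES = c(5, 1, 2, 7, 0, 0),
    INJURIES   = c(20, 3, 10, 1, 2, 4)
)

# Casualties are fatalities plus injuries, row by row
toy$Casualties <- toy$FATALITIES + toy$INJURIES

# Hypothetical helper: sum a column per EVTYPE, sort decreasing, keep the top n
topEvents <- function(data, column, n = 3) {
    totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
    names(totals)[2] <- column
    totals <- totals[order(totals[[column]], decreasing = TRUE), ]
    head(totals, n)
}

# expected order: TORNADO (37), HEAT (8), FLOOD (6)
print(topEvents(toy, "Casualties"))
```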

Impact on Economy

We convert the property damage and crop damage data into comparable dollar values according to the unit codes described in the code book (Storm Events). The PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation: Hundred (H), Thousand (K), Million (M) and Billion (B). Finally, we add property and crop damage together to get the total cost to the economy.

convertFun <- function(dataset = storm, ColHeader, newColHeader) {
    ## number of columns before the new one is added
    totalLen <- dim(dataset)[2]
    ## find the column whose name matches the chosen ColHeader
    index <- which(colnames(dataset) == ColHeader)
    ## convert the exponent column to character
    dataset[, index] <- as.character(dataset[, index])
    ## TRUE where the exponent is not missing
    logic <- !is.na(toupper(dataset[, index]))
    ## replace letter codes with powers of ten: B -> 9, M -> 6, K -> 3, H -> 2, blank -> 0
    dataset[logic & toupper(dataset[, index]) == "B", index] <- "9"
    dataset[logic & toupper(dataset[, index]) == "M", index] <- "6"
    dataset[logic & toupper(dataset[, index]) == "K", index] <- "3"
    dataset[logic & toupper(dataset[, index]) == "H", index] <- "2"
    dataset[logic & toupper(dataset[, index]) == "", index] <- "0"
    ## convert back to numeric; any remaining non-numeric codes become NA (hence the coercion warnings)
    dataset[, index] <- as.numeric(dataset[, index])
    ## treat missing exponents as 0, i.e. a multiplier of 1
    dataset[is.na(dataset[, index]), index] <- 0
    ## multiply the base value (the column just before the exponent, e.g. PROPDMG) by 10^exponent and append it
    dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
    ## name the new column
    names(dataset)[totalLen + 1] <- newColHeader
    return(dataset)
}
## create the two new columns propertyDamage and cropDamage
storm <- convertFun(storm, "PROPDMGEXP", "propertyDamage")
## Warning in convertFun(storm, "PROPDMGEXP", "propertyDamage"): NAs
## introduced by coercion
storm <- convertFun(storm, "CROPDMGEXP", "cropDamage")
## Warning in convertFun(storm, "CROPDMGEXP", "cropDamage"): NAs introduced
## by coercion
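The exponent handling in `convertFun` can be sketched as a lookup table. This is an alternative formulation, not the original code: letter codes map to powers of ten, single digits are used directly as exponents (as happens when `convertFun` coerces them with `as.numeric()`), and any other code falls back to exponent 0. The sample values, including the `"+"` code, are hypothetical; because non-numeric codes are never passed to `as.numeric()` here, this version produces no coercion warnings.

```r
# Hypothetical sample of PROPDMG / PROPDMGEXP values
base <- c(25, 1.5, 3, 10, 4, 7)
expCode <- c("K", "m", "B", "", "+", "5")

# Letter codes mapped to powers of ten, as in convertFun
letterPower <- c(H = 2, K = 3, M = 6, B = 9)

# Look up each upper-cased code; unknown codes (including "") give NA
code <- toupper(expCode)
power <- unname(letterPower[code])

# Single digits are used directly as exponents
isDigit <- is.na(power) & grepl("^[0-9]$", code)
power[isDigit] <- as.numeric(code[isDigit])

# Anything else ("", "+", ...) falls back to exponent 0, i.e. a multiplier of 1
power[is.na(power)] <- 0

dollars <- base * 10^power
print(data.frame(base, expCode, power, dollars))
```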
## check that the two new columns have been added

names(storm)
##  [1] "STATE__"        "BGN_DATE"       "BGN_TIME"       "TIME_ZONE"     
##  [5] "COUNTY"         "COUNTYNAME"     "STATE"          "EVTYPE"        
##  [9] "BGN_RANGE"      "BGN_AZI"        "BGN_LOCATI"     "END_DATE"      
## [13] "END_TIME"       "COUNTY_END"     "COUNTYENDN"     "END_RANGE"     
## [17] "END_AZI"        "END_LOCATI"     "LENGTH"         "WIDTH"         
## [21] "F"              "MAG"            "FATALITIES"     "INJURIES"      
## [25] "PROPDMG"        "PROPDMGEXP"     "CROPDMG"        "CROPDMGEXP"    
## [29] "WFO"            "STATEOFFIC"     "ZONENAMES"      "LATITUDE"      
## [33] "LONGITUDE"      "LATITUDE_E"     "LONGITUDE_"     "REMARKS"       
## [37] "REFNUM"         "year"           "Casualties"     "propertyDamage"
## [41] "cropDamage"
## suppress scientific notation when printing large dollar amounts
options(scipen = 999)
## find the top 5 EVTYPEs with the highest property damage and crop damage
property <- sortFun("propertyDamage", dataset = storm)
crop <- sortFun("cropDamage", dataset = storm)

## property plus crop damage gives the total economic cost
storm$Cost <- storm$propertyDamage + storm$cropDamage
cost <- sortFun("Cost", dataset = storm)

Results

To assess the impact on public health, the following pair of graphs shows the total fatalities and total injuries caused by these severe weather events.

## each EVTYPE appears once, so geom_col() draws its total directly
fatalitiesPlot <- ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) +
    geom_col() +
    scale_y_continuous("Number of Fatalities") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Severe Weather Type") +
    ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
injuriesPlot <- ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) +
    geom_col() +
    scale_y_continuous("Number of Injuries") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Severe Weather Type") +
    ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)
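The bars appear from largest to smallest because ggplot2 draws a discrete axis in factor-level order, and `sortFun` sets the `EVTYPE` levels to the sorted row order; a plain character column would instead be ordered alphabetically. A small base-R check on a hypothetical top-3 table:

```r
# Hypothetical top-3 table already sorted in decreasing order
top <- data.frame(EVTYPE = c("TORNADO", "HEAT", "FLOOD"), Casualties = c(37, 8, 6))

# Default factor levels are alphabetical, so bars would be drawn A to Z
defaultLevels <- levels(factor(top$EVTYPE))
print(defaultLevels)

# Fixing the levels to the row order keeps the bars in decreasing order
top$EVTYPE <- factor(top$EVTYPE, levels = top$EVTYPE)
print(levels(top$EVTYPE))
```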

Excessive heat caused the most fatalities, and tornadoes caused the most injuries, in the United States from 1995 to 2011.

For the economic impact, the following pair of graphs shows the total property damage and total crop damage caused by these severe weather events.

propertyPlot <- ggplot(property, aes(x = EVTYPE, y = propertyDamage)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous("Property Damage in US dollars") +
    xlab("Severe Weather Type") +
    ggtitle("Total Property Damage by\n Severe Weather Events in\n the U.S. from 1995 - 2011")

cropPlot <- ggplot(crop, aes(x = EVTYPE, y = cropDamage)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous("Crop Damage in US dollars") +
    xlab("Severe Weather Type") +
    ggtitle("Total Crop Damage by \nSevere Weather Events in\n the U.S. from 1995 - 2011")
grid.arrange(propertyPlot, cropPlot, ncol = 2)

Floods caused the most property damage, while droughts caused the most crop damage, in the United States from 1995 to 2011.

casualtiesPlot <- ggplot(casualties, aes(x = EVTYPE, y = Casualties)) +
    geom_col() +
    scale_y_continuous("Number of Casualties") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Severe Weather Type") +
    ggtitle("Total Casualties by Severe Weather\n Events in the U.S.\n from 1995 - 2011")

## total cost is property damage plus crop damage
costPlot <- ggplot(cost, aes(x = EVTYPE, y = Cost)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous("Total Cost in US dollars") +
    xlab("Severe Weather Type") +
    ggtitle("Total Cost by \nSevere Weather Events in\n the U.S. from 1995 - 2011")
grid.arrange(casualtiesPlot, costPlot, ncol = 2)

The final pair of graphs combines both measures: total casualties for public health and total cost (property plus crop damage) for the economy.

Conclusion

The combined graphs show that tornadoes caused the most casualties (and the most injuries), while floods caused the greatest total economic cost (and the most property damage) in the United States from 1995 to 2011.