This report looks at the public health and economic impacts of severe weather events. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
The analysis shows that, in the United States from 1995 to 2011, tornadoes caused the most casualties (fatalities plus injuries) and the most injuries, while floods caused the greatest economic cost and the most property damage.
knitr::opts_chunk$set(echo = TRUE) # Always make code visible in the knitted report
options(scipen = 1) # Penalize scientific notation so numbers print in fixed notation where possible
library(R.utils)
library(ggplot2)
library(plyr)
library(gridExtra)
Read the downloaded csv file.
stormData <- read.csv("repdata-data-StormData.csv", sep = ",")
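As an alternative to decompressing the archive first, read.csv() can usually read the .bz2 file directly, because R's file() connection detects bzip2 compression when a file is opened for reading. The sketch below checks this on a tiny made-up CSV written to a temporary bzip2 file; the temporary file and its two columns are hypothetical stand-ins for StormData.csv.bz2, not the real data.

```r
# Write a small, made-up CSV to a bzip2-compressed temporary file
# (a hypothetical stand-in for StormData.csv.bz2)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES",
             "TORNADO,2",
             "FLOOD,0"), con)
close(con)

# read.csv() opens the file through file(), which decompresses bzip2 input when reading
demo <- read.csv(tmp)
print(demo)

# Remove the temporary file; the data frame stays in memory
unlink(tmp)
```

On the real archive this would presumably be read.csv("StormData.csv.bz2"), assuming the file was saved under that name.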
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
## add a numeric year column; the column-count check keeps this from running a second time
if (dim(stormData)[2] == 37) {
stormData$year <- as.numeric(format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
}
hist(stormData$year, breaks = 30)
Based on the histogram above, the number of recorded events increases markedly around 1995, so the analysis uses only the subset of the data from 1995 to 2011.
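The year extraction and the 1995 cut-off can be illustrated on a few made-up BGN_DATE strings written in the same month/day/year layout as the storm data (the values below are hypothetical, not records from the database). table() gives the per-year counts that the histogram summarizes in bins.

```r
# Hypothetical BGN_DATE strings in the "%m/%d/%Y %H:%M:%S" layout used above
toy <- data.frame(BGN_DATE = c("4/18/1950 0:00:00",
                               "6/1/1994 0:00:00",
                               "1/5/1995 0:00:00",
                               "11/30/2011 0:00:00"))

# Parse each date, then keep only the four-digit year as a number
toy$year <- as.numeric(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Number of events per year
print(table(toy$year))

# Keep only the events from 1995 onwards
recent <- toy[toy$year >= 1995, ]
print(recent)
```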
storm <- stormData[stormData$year >= 1995, ] ## subset containing all events from 1995 onwards
dim(storm)
## [1] 681500 38
From this point on, the analysis uses only the data from 1995 onwards.
In this section, we check the number of fatalities and injuries caused by severe weather events. We then calculate the number of casualties, defined as the sum of fatalities and injuries, for each weather event type, and identify the top 5 most harmful event types for each measure.
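The steps described above (defining casualties as fatalities plus injuries, summing each measure per event type, and keeping the top entries in decreasing order) can be sketched on a small made-up data frame. This sketch uses base R's formula interface to aggregate() and order() rather than the sortFun helper below, and keeps only the top 2 rows so the output stays short; the event records are hypothetical, not NOAA figures.

```r
# Hypothetical event records (not real NOAA figures)
events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 5, 0, 0),
  INJURIES   = c(20, 4, 10, 1, 2, 1)
)

# Casualties are fatalities plus injuries for each record
events$Casualties <- events$FATALITIES + events$INJURIES

# Sum casualties within each event type
totals <- aggregate(Casualties ~ EVTYPE, data = events, FUN = sum)

# Sort in decreasing order and keep the top n types (n = 2 here)
totals <- totals[order(totals$Casualties, decreasing = TRUE), ]
top <- head(totals, n = 2)

# Fix the factor level order so that a bar chart keeps this order
top$EVTYPE <- factor(top$EVTYPE, levels = top$EVTYPE)
print(top)
```

Setting the factor levels explicitly is what makes the bars appear in decreasing order; without it, ggplot2 would order them alphabetically.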
sortFun <- function(ColHeader, top = 5, dataset = stormData) {
## find the column whose name matches the chosen ColHeader
index <- which(colnames(dataset) == ColHeader)
## sum the chosen column within each EVTYPE and store the result in a temporary data frame
field <- aggregate(dataset[, index], by = list(dataset$EVTYPE), FUN = "sum")
## name the columns of the temporary data frame EVTYPE and the chosen ColHeader
names(field) <- c("EVTYPE", ColHeader)
## sort the rows by the second column (the chosen ColHeader) in decreasing order
field <- field[order(field[, 2], decreasing = TRUE), ]
## keep only the first `top` rows
field <- head(field, n = top)
## set the factor levels in this sorted order so the bars are plotted in decreasing order
field <- within(field, EVTYPE <- factor(x = EVTYPE, levels = field$EVTYPE))
return(field)
}
## find the top 5 EVTYPEs with the highest fatalities and injuries
fatalities <- sortFun("FATALITIES", dataset = storm)
injuries <- sortFun("INJURIES", dataset = storm)
## the number of fatalities and injuries added together gives the overall public health impact
storm$Casualties <- storm$FATALITIES + storm$INJURIES
casualties <- sortFun("Casualties", dataset = storm)
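We convert the property damage and crop damage data into comparable dollar amounts using the units described in the code book (Storm Events). The PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation: Hundred (H), Thousand (K), Million (M) and Billion (B), in upper or lower case. The convertFun function maps these letters to the corresponding powers of ten, treats a blank code as an exponent of 0, and uses digit codes directly as exponents. Any other symbol (such as +, - or ?) cannot be converted to a number, which produces the "NAs introduced by coercion" warnings, and is also treated as an exponent of 0 (a multiplier of 1). Finally, we add property and crop damage together to get the total cost to the economy.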
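A minimal sketch of the same conversion, using a named lookup vector instead of repeated subsetting. It assumes the rules that convertFun applies: letter codes (case-insensitive) map to powers of ten, digit codes are used as the exponent directly, and a blank code, an unrecognized symbol or a missing value keeps an exponent of 0. The helper name exp_to_multiplier and the sample values are hypothetical.

```r
# Hypothetical helper: turn a PROPDMGEXP/CROPDMGEXP-style code into a multiplier
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  # Exponents of 10 for the letter codes from the code book
  letters_map <- c(H = 2, K = 3, M = 6, B = 9)
  expo <- rep(0, length(code))
  is_letter <- code %in% names(letters_map)
  expo[is_letter] <- letters_map[code[is_letter]]
  # Single-digit codes "0" to "9" are used as the exponent directly
  is_digit <- grepl("^[0-9]$", code)
  expo[is_digit] <- as.numeric(code[is_digit])
  # Everything else ("", "+", "-", "?", NA) keeps exponent 0, i.e. multiplier 1
  10^expo
}

# Hypothetical damage values and their exponent codes
dmg <- c(25, 1.5, 3, 2, 4, 7)
dmg_exp <- c("K", "m", "", "5", "+", "b")
print(dmg * exp_to_multiplier(dmg_exp))
```

Because only known codes are converted with as.numeric(), this version does not emit the coercion warnings that convertFun produces.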
convertFun <- function(dataset = storm, ColHeader, newColHeader) {
## find the exponent column (e.g. PROPDMGEXP) and the matching value column (e.g. PROPDMG)
index <- which(colnames(dataset) == ColHeader)
valueIndex <- which(colnames(dataset) == sub("EXP$", "", ColHeader))
## work with upper-case character codes so that "k" and "K" are treated the same
codes <- toupper(as.character(dataset[, index]))
## TRUE where the code is not missing
logic <- !is.na(codes)
## replace the letter codes (and the blank code) with their exponents of 10
codes[logic & codes == "B"] <- "9"
codes[logic & codes == "M"] <- "6"
codes[logic & codes == "K"] <- "3"
codes[logic & codes == "H"] <- "2"
codes[logic & codes == ""] <- "0"
## convert to numeric; other symbols such as "+", "-" and "?" become NA (with a warning)
dataset[, index] <- as.numeric(codes)
## treat every unconvertible or missing exponent as 0, i.e. a multiplier of 1
dataset[is.na(dataset[, index]), index] <- 0
## add a new column holding the value times 10 to the power of the exponent
dataset[[newColHeader]] <- dataset[, valueIndex] * 10^dataset[, index]
return(dataset)
}
## create the two new columns propertyDamage and cropDamage
storm <- convertFun(storm, "PROPDMGEXP", "propertyDamage")
## Warning in convertFun(storm, "PROPDMGEXP", "propertyDamage"): NAs
## introduced by coercion
storm <- convertFun(storm, "CROPDMGEXP", "cropDamage")
## Warning in convertFun(storm, "CROPDMGEXP", "cropDamage"): NAs introduced
## by coercion
## check that the two new columns have been added
names(storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE"
## [5] "COUNTY" "COUNTYNAME" "STATE" "EVTYPE"
## [9] "BGN_RANGE" "BGN_AZI" "BGN_LOCATI" "END_DATE"
## [13] "END_TIME" "COUNTY_END" "COUNTYENDN" "END_RANGE"
## [17] "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES"
## [25] "PROPDMG" "PROPDMGEXP" "CROPDMG" "CROPDMGEXP"
## [29] "WFO" "STATEOFFIC" "ZONENAMES" "LATITUDE"
## [33] "LONGITUDE" "LATITUDE_E" "LONGITUDE_" "REMARKS"
## [37] "REFNUM" "year" "Casualties" "propertyDamage"
## [41] "cropDamage"
## print large dollar amounts in fixed rather than scientific notation
options(scipen = 999)
## find the top 5 EVTYPEs with the highest property damage and crop damage
property <- sortFun("propertyDamage", dataset = storm)
crop <- sortFun("cropDamage", dataset = storm)
## the cost of property and crop damage added together gives the overall economic impact
storm$Cost <- storm$propertyDamage + storm$cropDamage
cost <- sortFun("Cost", dataset = storm)
To assess the impact on public health, the following pair of graphs shows the total fatalities and total injuries caused by these severe weather events.
fatalitiesPlot <- ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) + geom_col() +
scale_y_continuous("Number of Fatalities") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Severe Weather Type") +
ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
injuriesPlot <- ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) + geom_col() +
scale_y_continuous("Number of Injuries") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Severe Weather Type") +
ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)
Excessive heat caused the most fatalities, and tornadoes caused the most injuries, in the United States from 1995 to 2011.
As for the impact on the economy, the following pair of graphs shows the total property damage and total crop damage caused by these severe weather events.
propertyPlot <- ggplot(property, aes(x = EVTYPE, y = propertyDamage)) + geom_col() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Property Damage in US dollars") +
xlab("Severe Weather Type") + ggtitle("Total Property Damage by\n Severe Weather Events in\n the U.S. from 1995 - 2011")
cropPlot <- ggplot(crop, aes(x = EVTYPE, y = cropDamage)) + geom_col() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Crop Damage in US dollars") +
xlab("Severe Weather Type") + ggtitle("Total Crop Damage by \nSevere Weather Events in\n the U.S. from 1995 - 2011")
grid.arrange(propertyPlot, cropPlot, ncol = 2)
Floods caused the most property damage, while drought caused the most crop damage, in the United States from 1995 to 2011.
casualtiesPlot <- ggplot(casualties, aes(x = EVTYPE, y = Casualties)) + geom_col() +
scale_y_continuous("Number of Casualties") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Severe Weather Type") +
ggtitle("Total Casualties by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
costPlot <- ggplot(cost, aes(x = EVTYPE, y = Cost)) + geom_col() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Total Cost in US dollars") +
xlab("Severe Weather Type") + ggtitle("Total Cost by \nSevere Weather Events in\n the U.S. from 1995 - 2011")
grid.arrange(casualtiesPlot, costPlot, ncol = 2)
The final pair of plots above shows the combined impact on public health (casualties) and on the economy (total cost of property and crop damage).
Taken together, these graphs show that in the United States from 1995 to 2011, tornadoes caused the most casualties and the most injuries, while floods caused the greatest total economic cost and the most property damage.