library(ggplot2)
library(gridExtra)
This report draws from the US National Weather Service’s Storm Data dataset to identify the weather events that have had the greatest impact on population health (fatalities and injuries) and economic loss (property damage and crop damage) for the time period 1950 - 2011.
Contained within this report are graphs illustrating the weather events that have had the greatest impact in these regards. These graphs and the accompanying text indicate which event types warrant the highest priority for preventative and remedial expenditures.
The Storm Data dataset was downloaded on the date of this report’s publication, read into R, and subset to the 7 columns containing the data relevant to this analysis.
## setwd("5. Reproducible Research/Quizzes/Project 2")
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "StormData.csv.bz2"
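## Download once; binary mode avoids corrupting the archive on Windows
if (!file.exists(destfile)) download.file(url, destfile, mode = "wb")
## read.csv decompresses .bz2 files directly
sd <- read.csv(destfile)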
sd1 <- sd[, c(8,23:28)]
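Selecting by position depends on the column order of the downloaded file staying the same. As a hedged alternative, the sketch below selects by name instead, assuming the seven columns are the ones referred to later in this report (EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP). The `storm` dataframe and its values are invented for illustration.

```r
# Toy stand-in for the full Storm Data frame (values are invented).
storm <- data.frame(
  STATE      = c("AL", "TX"),
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 10),
  CROPDMGEXP = c("", "K"),
  stringsAsFactors = FALSE
)

# Columns needed for the analysis, selected by name rather than by index.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm_subset <- storm[, keep]
names(storm_subset)
```

If a named column were missing, the subset would fail with an error rather than silently pick up a different variable.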
Data transformations were undertaken on the subsetted dataframe, to produce four dataframes corresponding to fatalities, injuries, property damage and crop damage. These four dataframes were set up to facilitate data plotting.
In each of the four cases, the data were subset to observations with a relevant statistic (i.e. a value greater than 0 for fatalities, injuries, etc.). The EVTYPE variable was converted to character and back to a factor so that each dataset retains only the event types (factor levels) that actually occur in it.
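The effect of the character round trip on EVTYPE can be seen on a small example. The sketch below uses an invented `toy` dataframe and also shows base R's `droplevels()`, which should give the same result in one call. Note that in R 4.0.0 and later `read.csv` returns character columns by default, in which case the `as.character` step has nothing to change.

```r
# Toy data: three event types, only two of which have fatalities.
toy <- data.frame(
  EVTYPE     = factor(c("TORNADO", "FLOOD", "HAIL", "FLOOD")),
  FATALITIES = c(3, 0, 0, 1)
)

deadly <- toy[toy$FATALITIES > 0, ]
nlevels(deadly$EVTYPE)   # subsetting keeps the unused level "HAIL"

# Round trip through character, as done in this report.
roundtrip <- as.factor(as.character(deadly$EVTYPE))

# Base R shortcut that drops unused levels directly.
dropped <- droplevels(deadly$EVTYPE)

levels(roundtrip)
levels(dropped)
```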
In the case of property damage and crop damage, datasets were subsetted depending on whether the exponent of damage values was K (thousands of dollars), M (millions of dollars) or B (billions of dollars). The damage value for K and B amounts was converted to the equivalent value in millions of dollars, and the data was recombined. In both cases a small amount of data which did not possess a correct exponent value was lost - this amounted to 327 observations out of 239,174 for property damage, and 15 observations out of 22,099 for crop damage. For the purpose of this report these observations are regarded as insignificant to the final analysis.
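The conversion to millions of dollars can also be expressed as a single lookup rather than three separate subsets. This is a minimal sketch under the same rule as above (only K, M and B, in either case, count as valid exponents). The names `to_millions`, `toy` and `PROPDMG_M` are hypothetical and the values are invented.

```r
# Multipliers that express each exponent code in millions of dollars.
to_millions <- c(K = 1 / 1000, M = 1, B = 1000)

# Toy data; the last row carries an exponent code that is not K, M or B.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "m", "B", "5"),
  stringsAsFactors = FALSE
)

# Look up the multiplier by upper-cased code; unrecognised codes give NA.
mult <- unname(to_millions[toupper(toy$PROPDMGEXP)])
toy$PROPDMG_M <- toy$PROPDMG * mult

# Drop rows without a recognised exponent, mirroring the observations lost above.
valid <- toy[!is.na(toy$PROPDMG_M), ]
valid
```

Counting the rows where `mult` is `NA` gives the number of observations lost to invalid exponents.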
A final dataframe for the four statistics was then constructed, containing the number of fatalities / injuries or the value of damage in millions of dollars for each event type for which a relevant value was recorded.
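The per-event totals described here amount to a grouped sum followed by a sort. As a hedged illustration on invented data, the same result can be sketched with base R's `aggregate()`; this is an alternative formulation, not the loop-based code used in this report.

```r
# Toy data (values are invented for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Rank event types from most to fewest fatalities and keep the top ten.
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
top <- head(totals, 10)
top
```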
## Data transformations: fatalities data
sd1a <- sd1[sd1$FATALITIES > 0, ]
sd1a$EVTYPE <- as.character(sd1a$EVTYPE)
sd1a$EVTYPE <- as.factor(sd1a$EVTYPE)
sde1 <- split(sd1a, list(sd1a$EVTYPE))
sde1f <- data.frame(matrix(ncol=2,nrow=1, dimnames=list(NULL, c("EVTYPE", "FATALITIES"))))
for (i in 1:length(sde1)) {
sde1f[i,1] <- as.character(sde1[[i]]$EVTYPE[1])
sde1f[i,2] <- sum(sde1[[i]]$FATALITIES)
}
sde1f <- sde1f[order(sde1f$FATALITIES, decreasing=TRUE),]
## Data transformations: injuries data
sd1b <- sd1[sd1$INJURIES > 0, ]
sd1b$EVTYPE <- as.character(sd1b$EVTYPE)
sd1b$EVTYPE <- as.factor(sd1b$EVTYPE)
sde2 <- split(sd1b, list(sd1b$EVTYPE))
sde1i <- data.frame(matrix(ncol=2,nrow=1, dimnames=list(NULL, c("EVTYPE", "INJURIES"))))
for (i in 1:length(sde2)) {
sde1i[i,1] <- as.character(sde2[[i]]$EVTYPE[1])
sde1i[i,2] <- sum(sde2[[i]]$INJURIES)
}
sde1i <- sde1i[order(sde1i$INJURIES, decreasing=TRUE),]
## Data transformations: property damage data
sd1c <- sd1[sd1$PROPDMG > 0, ]
sd1c$EVTYPE <- as.character(sd1c$EVTYPE)
sd1c$EVTYPE <- as.factor(sd1c$EVTYPE)
## Revalue PROPDMG amounts by exponent
sd1c_b <- sd1c[sd1c$PROPDMGEXP == "B" | sd1c$PROPDMGEXP == "b", ]
sd1c_b$PROPDMG <- sd1c_b$PROPDMG * 1000  ## billions to millions
sd1c_k <- sd1c[sd1c$PROPDMGEXP == "K" | sd1c$PROPDMGEXP == "k", ]
sd1c_k$PROPDMG <- sd1c_k$PROPDMG / 1000  ## thousands to millions
sd1c_m <- sd1c[sd1c$PROPDMGEXP == "M" | sd1c$PROPDMGEXP == "m", ]
sd1c2 <- data.frame(rbind(sd1c_b, sd1c_m, sd1c_k))
sde3 <- split(sd1c2, list(sd1c2$EVTYPE))
## Set up PROPDMG dataframe for plotting
sde1p <- data.frame(matrix(ncol=2,nrow=1, dimnames=list(NULL, c("EVTYPE", "PROPDMG"))))
for (i in 1:length(sde3)) {
sde1p[i,1] <- as.character(sde3[[i]]$EVTYPE[1])
sde1p[i,2] <- sum(sde3[[i]]$PROPDMG)
}
sde1p <- sde1p[order(sde1p$PROPDMG, decreasing=TRUE),]
## Data transformations: crop damage data
sd1d <- sd1[sd1$CROPDMG > 0, ]
sd1d$EVTYPE <- as.character(sd1d$EVTYPE)
sd1d$EVTYPE <- as.factor(sd1d$EVTYPE)
## Revalue CROPDMG amounts by exponent
sd1d_b <- sd1d[sd1d$CROPDMGEXP == "B" | sd1d$CROPDMGEXP == "b", ]
sd1d_b$CROPDMG <- sd1d_b$CROPDMG * 1000  ## billions to millions
sd1d_k <- sd1d[sd1d$CROPDMGEXP == "K" | sd1d$CROPDMGEXP == "k", ]
sd1d_k$CROPDMG <- sd1d_k$CROPDMG / 1000  ## thousands to millions
sd1d_m <- sd1d[sd1d$CROPDMGEXP == "M" | sd1d$CROPDMGEXP == "m", ]
sd1d2 <- data.frame(rbind(sd1d_b, sd1d_m, sd1d_k))
sde4 <- split(sd1d2, list(sd1d2$EVTYPE))
## Set up CROPDMG dataframe for plotting
sde1c <- data.frame(matrix(ncol=2,nrow=1, dimnames=list(NULL, c("EVTYPE", "CROPDMG"))))
for (i in 1:length(sde4)) {
sde1c[i,1] <- as.character(sde4[[i]]$EVTYPE[1])
sde1c[i,2] <- sum(sde4[[i]]$CROPDMG)
}
sde1c <- sde1c[order(sde1c$CROPDMG, decreasing=TRUE),]
The following plots outline the weather event types that have had the most significant impact on population health (in terms of fatalities and injuries) and the most significant economic consequences (in terms of property damage and crop damage) in the US for the time period 1950-2011.
## Fatalities Graph
sde1fa <- sde1f[1:10,]
p1 <- ggplot(sde1fa, aes(EVTYPE, FATALITIES, col = EVTYPE, fill = FATALITIES))
p1 <- p1 + geom_col() + scale_x_discrete(labels=NULL)
p1 <- p1 + labs(title="Most serious event types: Fatalities", x="Event Type", y="Fatalities", caption = "Top causes of fatalities by event type in the US, 1950-2011", fill="Fatalities", col="Event Type")
p1 <- p1 + scale_fill_gradient(low="white", high="black")
p1 <- p1 + theme(plot.title = element_text(size = 10))
p1 <- p1 + theme(plot.caption = element_text(size = 6, hjust=1))
p1 <- p1 + guides(shape = guide_legend(override.aes = list(size = 0.5)))
p1 <- p1 + guides(color = guide_legend(override.aes = list(size = 0.5)))
p1 <- p1 + theme(legend.title = element_text(size = 6), legend.text = element_text(size = 6))
## Injuries Graph
sde1ia <- sde1i[1:10,]
p2 <- ggplot(sde1ia, aes(EVTYPE, INJURIES, col = EVTYPE, fill = INJURIES))
p2 <- p2 + geom_col() + scale_x_discrete(labels=NULL)
p2 <- p2 + labs(title="Most serious event types: Injuries", x="Event Type", y="Injuries", caption = "Top causes of injuries by event type in the US, 1950-2011", fill="Injuries", col="Event Type")
p2 <- p2 + scale_fill_gradient(low="white", high="yellow")
p2 <- p2 + theme(plot.title = element_text(size = 10))
p2 <- p2 + theme(plot.caption = element_text(size = 6, hjust=1))
p2 <- p2 + guides(shape = guide_legend(override.aes = list(size = 0.5)))
p2 <- p2 + guides(color = guide_legend(override.aes = list(size = 0.5)))
p2 <- p2 + theme(legend.title = element_text(size = 6), legend.text = element_text(size = 6))
## Fatalities Graph excluding tornadoes
sde1fa <- sde1f[2:10,]
p3 <- ggplot(sde1fa, aes(EVTYPE, FATALITIES, col = EVTYPE, fill = FATALITIES))
p3 <- p3 + geom_col() + scale_x_discrete(labels=NULL)
p3 <- p3 + labs(title="Top causes of Fatalities (excl. tornadoes)", x="Event Type", y="Fatalities", caption = "Top fatalities by event type, excluding tornadoes", fill="Fatalities", col="Event Type")
p3 <- p3 + scale_fill_gradient(low="white", high="black")
p3 <- p3 + theme(plot.title = element_text(size = 10))
p3 <- p3 + theme(plot.caption = element_text(size = 6, hjust=1))
p3 <- p3 + guides(shape = guide_legend(override.aes = list(size = 0.5)))
p3 <- p3 + guides(color = guide_legend(override.aes = list(size = 0.5)))
p3 <- p3 + theme(legend.title = element_text(size = 6), legend.text = element_text(size = 6))
## Injuries Graph excluding tornadoes
sde1ia <- sde1i[2:10,]
p4 <- ggplot(sde1ia, aes(EVTYPE, INJURIES, col = EVTYPE, fill = INJURIES))
p4 <- p4 + geom_col() + scale_x_discrete(labels=NULL)
p4 <- p4 + labs(title="Top causes of Injuries (excl. tornadoes)", x="Event Type", y="Injuries", caption = "Top injuries by event type, excluding tornadoes", fill="Injuries", col="Event Type")
p4 <- p4 + scale_fill_gradient(low="white", high="yellow")
p4 <- p4 + theme(plot.title = element_text(size = 10))
p4 <- p4 + theme(plot.caption = element_text(size = 6, hjust=1))
p4 <- p4 + guides(shape = guide_legend(override.aes = list(size = 0.5)))
p4 <- p4 + guides(color = guide_legend(override.aes = list(size = 0.5)))
p4 <- p4 + theme(legend.title = element_text(size = 6), legend.text = element_text(size = 6))
## Property damage graph
sde1pa <- sde1p[1:10,]
p5 <- ggplot(sde1pa, aes(EVTYPE, PROPDMG, col = EVTYPE, fill = PROPDMG))
p5 <- p5 + geom_col() + scale_x_discrete(labels=NULL)
p5 <- p5 + labs(title="Most serious event types: Property Damage", x="Event Type", y="Property Damage (millions of dollars)", caption = "Top causes of property damage by event type", fill="Property Damage", col="Event Type")
p5 <- p5 + scale_fill_gradient(low="white", high="red")
p5 <- p5 + theme(plot.title = element_text(size = 10))
p5 <- p5 + theme(plot.caption = element_text(size = 6, hjust=1))
p5 <- p5 + guides(shape = guide_legend(override.aes = list(size = 0.5)))
p5 <- p5 + guides(color = guide_legend(override.aes = list(size = 0.5)))
p5 <- p5 + theme(legend.title = element_text(size = 6), legend.text = element_text(size = 6))
## Crop damage graph
sde1ca <- sde1c[1:10,]
p6 <- ggplot(sde1ca, aes(EVTYPE, CROPDMG, col = EVTYPE, fill = CROPDMG))
p6 <- p6 + geom_col() + scale_x_discrete(labels=NULL)
p6 <- p6 + labs(title="Most serious event types: Crop Damage", x="Event Type", y="Crop Damage (millions of dollars)", caption = "Top causes of crop damage by event type", fill="Crop Damage", col="Event Type")
p6 <- p6 + scale_fill_gradient(low="white", high="green")
p6 <- p6 + theme(plot.title = element_text(size = 10))
p6 <- p6 + theme(plot.caption = element_text(size = 6, hjust=1))
p6 <- p6 + guides(shape = guide_legend(override.aes = list(size = 0.5)))
p6 <- p6 + guides(color = guide_legend(override.aes = list(size = 0.5)))
p6 <- p6 + theme(legend.title = element_text(size = 6), legend.text = element_text(size = 6))
Plot 1 outlines the fatalities and injuries incurred by the most impactful weather event types for each of these two parameters. The plot illustrates that tornadoes have had by far the most impact in terms of fatalities and injuries during this time period.
grid.arrange(p1, p2, nrow = 1)
Plot 2 replots the fatalities and injuries data but excludes tornadoes from the analysis: this provides a clearer picture of the comparative impact of the remaining most impactful event types on fatalities and injuries.
grid.arrange(p3, p4, nrow = 1)
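The tornado-free plots take rows 2 to 10 of the ranked dataframes, which relies on tornadoes occupying the first row. A minimal sketch of excluding the event type by name instead is shown below; the `ranked` dataframe and its values are invented for illustration.

```r
# Toy ranked totals (values are invented, already sorted in decreasing order).
ranked <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT"),
  FATALITIES = c(100, 40, 30, 20),
  stringsAsFactors = FALSE
)

# Remove tornadoes by name, then keep up to nine remaining event types.
no_tornado <- ranked[ranked$EVTYPE != "TORNADO", ]
top_rest <- head(no_tornado, 9)
top_rest
```

Filtering by name keeps the exclusion correct even if the ranking changes with updated data.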
Plot 3 outlines the property damage and crop damage incurred by the most impactful weather event types for each of these two parameters, in millions of dollars. The plot illustrates that floods have had the most impact in terms of property damage, and droughts have had the most impact in terms of crop damage during this time period.
grid.arrange(p5, p6, nrow = 1)
Taken together, these plots provide an indication of the particular areas for which funding could be prioritised in order to prevent, mitigate or respond most effectively to the population health and economic impacts of weather events in the US.