This report was created as the end-of-module project for the Reproducible Research course offered by the Johns Hopkins Bloomberg School of Public Health on Coursera. For the assignment, storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA) website was analysed to ascertain the public health and financial impact of natural disasters in the US. From NOAA’s website, “the database contains storm data from January 1950 [to date] as entered by NOAA’s National Weather Service (NWS)”. The report was required to serve as a briefing for government officials responsible for preparing for severe weather events, and to be published in HTML format on RPubs.
For the purposes of the assignment, the data was downloaded from the URL provided with the assignment.
library(tidyverse, warn.conflicts = FALSE)
DlURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("Noaa.bz2")){
download.file(DlURL, destfile = "./Noaa.bz2")
}
if(!exists("rawData")){
rawData <- as_tibble(read.csv(bzfile("./Noaa.bz2"))) ##as_tibble() replaces the deprecated as.tibble()
}
After loading the data into the local environment, the function str() was used, in conjunction with the database documentation from the NOAA website (see supplemental material), to identify the columns required for the analysis.
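The inspection step can be sketched as follows. This is only an illustration: `toy` is a hypothetical two-row stand-in for rawData with made-up values, and `needed` simply lists the columns that the later code in this report selects.

```r
# Hypothetical stand-in for rawData; values are illustrative, not NOAA figures
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  INJURIES   = c(3, 0),
  PROPDMG    = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 10),
  CROPDMGEXP = c("", "K")
)

# str() prints each column's name, type, and first few values
str(toy)

# Columns used later in the analysis
needed <- c("EVTYPE", "FATALITIES", "INJURIES",
            "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Any required column that is absent from the data
missing_cols <- setdiff(needed, names(toy))
missing_cols
```

An empty `missing_cols` indicates that every column the analysis relies on is present.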
The following subsets of data were created for the purposes of analysis:
Public Health Impact
Financial Impact
While both fatalities and injuries call for prevention, injuries also require resources to manage and treat. Summing the two would obscure their respective impacts; hence, it was decided to present injuries and fatalities separately.
##Public Health Subset
PHSubsetData <- rawData %>% select(EVTYPE, FATALITIES, INJURIES) %>%
group_by(EVTYPE) %>% summarise_all(sum)
TopInjuries <- PHSubsetData %>%
arrange(desc(INJURIES)) %>%
top_n(10, INJURIES) %>%
mutate(INJURIES = INJURIES/1000) ##In 1000s for better graphical representation
TopFatalities <- PHSubsetData %>%
arrange(desc(FATALITIES)) %>%
top_n(10, FATALITIES) %>%
mutate(INJURIES = INJURIES/1000) ##Injuries scaled to 1000s here too, so rows match TopInjuries in the union
TopPHIssues <- union(TopInjuries, TopFatalities) %>% arrange(desc(FATALITIES))
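The union() step above keeps a single copy of any event that appears in both top lists, so the combined table can hold fewer rows than the two lists together. A minimal sketch on hypothetical totals (the event names and counts below are made up, and slice_max() is used here in place of top_n(), assuming dplyr 1.0.0 or later):

```r
library(dplyr)

# Hypothetical per-event totals; not NOAA figures
totals <- tibble(
  EVTYPE     = c("A", "B", "C", "D"),
  FATALITIES = c(50, 5, 30, 1),
  INJURIES   = c(1000, 900, 20, 400)
)

# Top two events by each measure
top_inj <- totals %>% slice_max(INJURIES, n = 2)
top_fat <- totals %>% slice_max(FATALITIES, n = 2)

# Event "A" is in both lists; union() keeps it once
combined <- union(top_inj, top_fat) %>% arrange(desc(FATALITIES))
combined
```

Deduplication only works because both tables carry identical values for the shared rows, which is why INJURIES is rescaled in both TopInjuries and TopFatalities.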
FinSubsetData <- rawData %>%
select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
unique(FinSubsetData$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(FinSubsetData$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
Per NOAA documentation (see supplemental material for link), the columns “CROPDMGEXP” and “PROPDMGEXP” modify the columns “CROPDMG” and “PROPDMG” by using “B”, “M”, and “K” to denote figures in billions, millions, and thousands, respectively. For the purposes of this analysis, rows whose exponent was blank, a symbol (“-”, “+”, “?”), or one of the digits listed in the filters below were dropped. Furthermore, the letter codes (“B”, “b”, “M”, “m”, “K”, “k”, and also “h”/“H”, which the code treats as hundreds) were substituted with numerical values using gsub() to aid in computing the financial impact. Finally, three separate subsets were created to convey the results of the analysis. Given that some authorities would be more concerned about crop damage than property damage, or vice versa, it was decided to present those data separately.
FinSubsetData <- FinSubsetData %>%
filter(!(PROPDMGEXP %in% (c("-", "+", "?", "", "0", "5", "3")))) %>%
filter(!(CROPDMGEXP %in% (c("-", "+", "?", "", "0"))))
##Function to change the damage suffix to a numerical multiplier
SufFun <- function(char){
char <- as.character(char) ##factors must become character before gsub()
char <- gsub("h|H", "100", char) ##hundreds
char <- gsub("m|M", "1e6", char) ##millions
char <- gsub("b|B", "1e9", char) ##billions
char <- gsub("k|K", "1e3", char) ##thousands
as.numeric(char) ##return the multiplier explicitly
}
FinSubsetData$CROPDMGEXP <- SufFun(FinSubsetData$CROPDMGEXP)
FinSubsetData$PROPDMGEXP <- SufFun(FinSubsetData$PROPDMGEXP)
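As an alternative to chained gsub() calls, the same suffix-to-multiplier conversion could be sketched with a named lookup vector. This is not the method used in this report: `exp_lookup` and `exp_to_multiplier` are hypothetical names, and unlike SufFun the sketch returns NA for anything that is not one of the letter codes (including stray digits).

```r
# Multipliers for the letter codes handled in this report
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: upper-case the code, then look it up by name;
# codes outside the lookup (symbols, digits, blanks) become NA
exp_to_multiplier <- function(x) {
  unname(exp_lookup[toupper(as.character(x))])
}

codes <- c("K", "m", "B", "h", "?")
multipliers <- exp_to_multiplier(codes)
multipliers

# Worked example: a PROPDMG of 25 with exponent "K" is 25 thousand dollars
25 * exp_to_multiplier("K")
```

Returning NA for unrecognised codes makes dropped or ambiguous entries visible, at the cost of needing an explicit na.rm or filter step before summing.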
##Total Financial Impact in Billions
SumFinImpact <- FinSubsetData %>%
mutate(Cost = (CROPDMG*CROPDMGEXP) + (PROPDMG*PROPDMGEXP)) %>%
group_by(EVTYPE) %>%
summarise_all(sum) %>%
mutate(Cost = Cost/10^9) %>%
arrange(desc(Cost)) %>%
top_n(15, Cost)
SumFinImpact$Cost <- round(SumFinImpact$Cost, digits = 2)
##Property Damage in Billions
PropDamage <- FinSubsetData %>%
select(EVTYPE, PROPDMG, PROPDMGEXP) %>%
mutate(Cost = PROPDMG*PROPDMGEXP) %>%
group_by(EVTYPE) %>%
summarise_all(sum) %>%
mutate(Cost = Cost/10^9) %>%
arrange(desc(Cost)) %>%
top_n(10, Cost)
PropDamage$Cost <- round(PropDamage$Cost, digits = 2)
##CropDamage
CropDamage <- FinSubsetData %>%
select(EVTYPE, CROPDMG, CROPDMGEXP) %>%
mutate(Cost = CROPDMG*CROPDMGEXP) %>%
group_by(EVTYPE) %>%
summarise_all(sum) %>%
mutate(Cost = Cost/10^9) %>%
arrange(desc(Cost)) %>%
top_n(10, Cost)
CropDamage$Cost <- round(CropDamage$Cost, digits = 2)
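The cost arithmetic in the three subsets above can be checked by hand on a small example. The sketch below uses base R and hypothetical rows (made-up amounts, with the exponent columns already converted to numeric multipliers); it is an illustration of the calculation, not a reproduction of the NOAA results.

```r
# Hypothetical rows; exponents are already numeric multipliers
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2, 500, 10),
  PROPDMGEXP = c(1e9, 1e6, 1e3),
  CROPDMG    = c(100, 0, 50),
  CROPDMGEXP = c(1e6, 1e3, 1e3)
)

# Cost per row in dollars: crop damage plus property damage
toy$Cost <- toy$CROPDMG * toy$CROPDMGEXP + toy$PROPDMG * toy$PROPDMGEXP

# Sum per event type and convert to billions of dollars
cost_bn <- tapply(toy$Cost, toy$EVTYPE, sum) / 1e9

# Largest first, rounded as in the report
round(sort(cost_bn, decreasing = TRUE), 2)
```

Here the two FLOOD rows contribute 2.1 and 0.5 billion dollars respectively, while the HAIL row (60,000 dollars) rounds to zero at two decimal places in billions.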
The following bar charts were created to visualize the events with the greatest (top ten) impact on public health, from the perspective of fatalities and injuries.
par(mar = c(5,11,5,0))
par(mfrow = c(1,2))
barplot(TopPHIssues$FATALITIES, names.arg = TopPHIssues$EVTYPE, las=2, horiz = T, xlim = c(0,6000), main = "Fatalities", col = "pink")
par(mar = c(5,2,5,5))
barplot(TopPHIssues$INJURIES, las=2, horiz = T, main = "Injuries (in 1000's)", xlim = c(0,100), col = "purple")
As can be seen from the chart, Tornadoes, Excessive Heat, and Flooding pose the greatest threat to public health.
The following bar-charts were created to visualize the events that result in the most (top-ten) financial loss due to severe weather.
par(mfrow = c(1,2))
par(mar = c(5,11,5,0))
barplot(PropDamage$Cost, names.arg = PropDamage$EVTYPE, las=2, horiz = T, xlim = c(0,150), main = "Property Damage Cost in $B", col = "Magenta")
par(mar = c(5,11,5,2))
barplot(CropDamage$Cost, names.arg = CropDamage$EVTYPE, las=2, horiz = T, main = "Crop Damage Cost in $B", xlim = c(0,6), col = "Coral")
As can be seen from the chart, the top causes of crop damage are adverse weather events such as Frost, Drought, Hurricanes, and Floods.
Flooding constitutes the main cause of property damage, followed by Hurricanes and Tornadoes.
Additionally, the following bar-chart was created to visualize the overall financial impact from severe weather.
par(mfrow = c(1,1))
par(mar = c(5,14,5,2))
barplot(SumFinImpact$Cost, names.arg = SumFinImpact$EVTYPE, las=2, horiz = T, main = "Top 15 Costliest Forms of Natural Disasters (NOAA Data)", xlim = c(0,150), col = "Orange", sub = "Cost in Billions")
As can be seen from the chart, Flooding is a major cause of both crop and property damage.
All files (minus the original data) and code have been uploaded to GitHub (Link).
Documentation for the Storm Data database can be sourced directly from NOAA’s website (Link).
For more information on the Reproducible Research course offered by the Johns Hopkins Bloomberg School of Public Health on Coursera, visit the course homepage (Link).