Reproducible Research Course Project 2
Peer-graded Assignment
This course project is available on GitHub
Storms and other extreme weather events affect both population health and economic stability. This report, “Analysis of U.S. Storm Event Data and the Impact on Population Health and the Economy”, examines those effects. An overview of the project is available on GitHub. Understanding the effects of severe weather matters because the goal is to prevent fatalities, injuries, and property damage as far as possible. The analysis identifies the weather events with the greatest impact on population health and on the economy, based on the recorded estimates of fatalities, injuries, property damage, and crop damage. To keep the work reproducible, the environment setup loads the required packages and sets the knitr options. The data is then downloaded and analyzed.
### In short
The estimates for fatalities and injuries were used to determine weather events with the most harmful impact to population health. Property damage and crop damage cost estimates were used to determine weather events with the greatest economic consequences.
if (!require(ggplot2)) {
install.packages("ggplot2")
library(ggplot2)
}
## Loading required package: ggplot2
if (!require(dplyr)) {
install.packages("dplyr")
library(dplyr, warn.conflicts = FALSE)
}
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
if (!require(xtable)) {
install.packages("xtable")
library(xtable, warn.conflicts = FALSE)
}
## Loading required package: xtable
sessionInfo()
## R version 4.3.2 (2023-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: Asia/Seoul
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] xtable_1.8-4 dplyr_1.1.4 ggplot2_3.4.4
##
## loaded via a namespace (and not attached):
## [1] vctrs_0.6.5 cli_3.6.2 knitr_1.45 rlang_1.1.3
## [5] xfun_0.41 generics_0.1.3 jsonlite_1.8.8 glue_1.7.0
## [9] colorspace_2.1-0 htmltools_0.5.7 sass_0.4.8 fansi_1.0.6
## [13] scales_1.3.0 rmarkdown_2.25 grid_4.3.2 evaluate_0.23
## [17] munsell_0.5.0 jquerylib_0.1.4 tibble_3.2.1 fastmap_1.1.1
## [21] yaml_2.3.8 lifecycle_1.0.4 compiler_4.3.2 pkgconfig_2.0.3
## [25] rstudioapi_0.15.0 digest_0.6.34 R6_2.5.1 tidyselect_1.2.0
## [29] utf8_1.2.4 pillar_1.9.0 magrittr_2.0.3 bslib_0.6.1
## [33] withr_3.0.0 tools_4.3.2 gtable_0.3.4 cachem_1.0.8
stormDataFileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Store the download inside the 'data' directory created below
stormDataFile <- file.path("data", "storm-data.csv.bz2")
if (!file.exists('data')) {
dir.create('data')
}
if (!file.exists(stormDataFile)) {
download.file(url = stormDataFileURL, destfile = stormDataFile)
}
# Verify the downloaded file before parsing it
stopifnot(file.size(stormDataFile) == 49177144)
data <- read.csv(stormDataFile, sep = ",", header = TRUE)
# Confirm the expected number of rows and columns
stopifnot(dim(data) == c(902297, 37))
harmfuldata <- data[, c("EVTYPE", "FATALITIES", "INJURIES")]
economicdata <- data[, c("EVTYPE", "PROPDMG","PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
PROPDMGEXP and CROPDMGEXP should each contain an alphabetical
character indicating the magnitude of the damage value: “K” for
thousands, “M” for millions, and “B” for billions. However, a cursory
examination of the data reveals that several other characters are
also recorded.
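One way to carry out that cursory examination is to tabulate the distinct codes. The sketch below uses a small made-up vector, `exp_codes`, as a stand-in for `data$PROPDMGEXP`; the values are illustrative only and are not taken from the storm data.

```r
# Toy vector standing in for data$PROPDMGEXP (illustrative values only)
exp_codes <- c("K", "M", "", "B", "k", "+", "0", "5", "?", "K", "m", "h")

# Count each distinct code after normalising to upper case
exp_counts <- table(toupper(exp_codes))
print(exp_counts)

# Codes that fall outside the documented K/M/B set
unexpected <- setdiff(names(exp_counts), c("K", "M", "B"))
print(unexpected)
```

On the real data, replacing `exp_codes` with `data$PROPDMGEXP` or `data$CROPDMGEXP` should list every code that needs a rule in the multiplier function.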
getMultiplier <- function(exp) {
exp <- toupper(exp);
if (exp == "") return (10^0);
if (exp == "-") return (10^0);
if (exp == "?") return (10^0);
if (exp == "+") return (10^0);
if (exp == "0") return (10^0);
if (exp == "1") return (10^1);
if (exp == "2") return (10^2);
if (exp == "3") return (10^3);
if (exp == "4") return (10^4);
if (exp == "5") return (10^5);
if (exp == "6") return (10^6);
if (exp == "7") return (10^7);
if (exp == "8") return (10^8);
if (exp == "9") return (10^9);
if (exp == "H") return (10^2);
if (exp == "K") return (10^3);
if (exp == "M") return (10^6);
if (exp == "B") return (10^9);
return (NA);
}
# calculate property damage and crop damage costs (in billions)
economicdata$PROP_COST <- with(economicdata, as.numeric(PROPDMG) * sapply(PROPDMGEXP, getMultiplier))/10^9
economicdata$CROP_COST <- with(economicdata, as.numeric(CROPDMG) * sapply(CROPDMGEXP, getMultiplier))/10^9
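Calling a scalar function through `sapply` on roughly 900,000 rows can be slow. As an alternative sketch (not the method used in this report), the same mapping can be expressed as a named lookup vector and applied to the whole column at once. The names `multipliers` and `lookupMultiplier` are hypothetical and introduced here only for illustration.

```r
# Named lookup vector mirroring the rules in getMultiplier()
multipliers <- c("-" = 1, "?" = 1, "+" = 1, H = 1e2, K = 1e3, M = 1e6, B = 1e9)
# Digits "0" to "9" map to 10^0 to 10^9
multipliers[as.character(0:9)] <- 10^(0:9)

# Vectorised lookup (hypothetical helper): takes a character vector of codes
lookupMultiplier <- function(exp) {
  exp <- toupper(exp)
  out <- unname(multipliers[exp])  # unknown codes become NA
  out[which(exp == "")] <- 1       # empty code means no scaling
  out
}

print(lookupMultiplier(c("k", "M", "", "b", "5", "x")))
```

Under these assumptions, `lookupMultiplier(PROPDMGEXP)` could replace `sapply(PROPDMGEXP, getMultiplier)` in the cost calculation.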
sum_by_EVTYPE <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, harmfuldata, sum)
data_filtered <- sum_by_EVTYPE[sum_by_EVTYPE[,2] != 0 | sum_by_EVTYPE[,3] != 0, ]
data_filtered$harmful <- data_filtered[,2] + data_filtered[,3]
par(mar = c(8, 4, 4, 2) + 0.1)
# Create a bar plot
barplot(data_filtered$harmful,
names.arg = data_filtered$EVTYPE,
xlab = "", # Remove default x-axis label
ylab = "Harmful Impact",
main = "Harmful Impact by Event Type",
col = "skyblue",
border = "black",
las = 3, # Rotate labels vertically for better readability
cex.names = 0.5) # Adjust font size of names on x-axis
The resulting plot contains too many event types to read easily, so it is better to keep only the top 10 values and plot them again.
sorted_data <- data_filtered[order(data_filtered$harmful, decreasing = TRUE), ]
top_10 <- head(sorted_data, 10)
# Plot the top 10 values
barplot(top_10$harmful,
names.arg = top_10$EVTYPE,
xlab = "",
ylab = "Harmful Impact",
main = "Top 10 Harmful Impacts by Event Type",
col = "skyblue",
border = "black",
las = 3, # Rotate labels vertically for better readability
cex.names = 0.8) # Adjust font size of names on x-axis
#Q2
economicdata$Damage <- economicdata$PROP_COST + economicdata$CROP_COST
economicdataFiltered <- economicdata[economicdata$Damage != 0, ]
sum_by_Damage <- aggregate( Damage ~ EVTYPE, data = economicdataFiltered, FUN = sum)
sorted_economicdata <- sum_by_Damage[order(sum_by_Damage$Damage, decreasing = TRUE), ]
economicdata_top_15 <- head(sorted_economicdata, 15)
# Plot the top 15 values
barplot(economicdata_top_15$Damage,
names.arg = economicdata_top_15$EVTYPE,
xlab = "",
ylab = "Economic Impact (billions USD)",
main = "Top 15 Economic Impacts by Event Type",
col = "skyblue",
border = "black",
las = 3, # Rotate labels vertically for better readability
cex.names = 0.8) # Adjust font size of names on x-axis
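The plot above shows only the combined cost. To see how much of each total comes from property damage and how much from crop damage, the two components can be summed separately per event type. The sketch below uses a made-up data frame, `toy`, with the same column names as `economicdata`; the numbers are invented for illustration and are not results from the storm data.

```r
# Toy per-event costs in billions of USD (invented values, not from the storm data)
toy <- data.frame(
  EVTYPE    = c("FLOOD", "FLOOD", "DROUGHT", "TORNADO", "DROUGHT"),
  PROP_COST = c(2.0, 1.5, 0.1, 3.0, 0.2),
  CROP_COST = c(0.5, 0.25, 4.0, 0.0, 1.0)
)

# Sum each cost component separately for every event type
cost_by_type <- aggregate(cbind(PROP_COST, CROP_COST) ~ EVTYPE, data = toy, FUN = sum)
cost_by_type$TOTAL <- cost_by_type$PROP_COST + cost_by_type$CROP_COST

# Order by total cost, largest first
cost_by_type <- cost_by_type[order(cost_by_type$TOTAL, decreasing = TRUE), ]
print(cost_by_type)
```

Applied to `economicdata`, the same `aggregate` call would show whether an event type ranks highly because of property damage, crop damage, or both.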
## Results
The following conclusions can be made in light of the information presented in this analysis and backed by the data and graphs that are included:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Tornadoes cause the highest combined number of deaths and injuries.
Across the United States, which types of events have the greatest economic consequences?
The majority of crop destruction and property damage expenses are attributed to floods.