Introduction:
WHO reported that, in 2020, people within the European Region lacked access to basic drinking
water (15 million), basic sanitation (29 million), and hygiene services (4 million) (WHO, 2022).
Much of the attention has gone to households, with little to no focus on unserved or
homeless persons. Surface water was still reported to be used by 2.5 million persons
in western and central Asia as well as southern and eastern Europe in 2020. Although open
defecation and surface water drinking have declined since 2000, almost 99.8% of the rural
population is reported to still practice open defecation. One of the main reasons reported for
this gap is social inequalities (WHO, 2022).
The data extracted from WHO comprise 8 columns covering SDG Goal 6,
European regions, residence type (rural/urban/total), service type (drinking water/sanitation/
hygiene), year (2010-2022), coverage, population, and service level (safely managed/basic/
limited/at least basic/unimproved/surface water). The data provide an opportunity to identify
potential gaps in each service type and service level within specific regions, and to determine
which regions might need more effort as their populations grow.
Three Novel Questions for Further Investigation:
Question_1: How does the distribution of basic drinking water, sanitation, and hygiene services differ among the various European regions?
Question_2: What trends can be observed in the coverage of essential services from 2010 to 2022, and how are these trends associated with changes in population?
Question_3: Have there been significant disparities in service levels between rural and urban areas within the European Region, and how have these disparities evolved over time?
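One possible way to make Question_3 concrete is to compute the urban minus rural coverage gap for each year. The sketch below uses a small hypothetical data frame rather than the WHO file: the column name `Residence` and all values are assumptions for illustration, and the real data would first need to be filtered to a single service type and service level.

```r
# Hypothetical toy rows mimicking the WHO columns; values are illustrative only.
toy <- data.frame(
  Year      = c(2010, 2010, 2022, 2022),
  Residence = c("rural", "urban", "rural", "urban"),
  Coverage  = c(60, 90, 75, 95)
)

# Mean coverage for every Year x Residence combination (rows = years).
cov_table <- tapply(toy$Coverage, list(toy$Year, toy$Residence), mean)

# Gap in percentage points: positive means urban coverage is higher.
gap <- cov_table[, "urban"] - cov_table[, "rural"]
print(gap)
```

A gap that shrinks from one year to the next would suggest the rural and urban service levels are converging.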
Research Purpose:
The primary purpose of this project is to compare the service levels that different regions
receive over time, to assess whether coverage keeps pace with changes in population, and to
identify regions that could need attention in the coming years. A further aim is to predict
which regions are moving towards a more sustainable lifestyle in the coming years.
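The prediction aim could be approached in several ways. As a minimal sketch, assuming coverage changes roughly linearly over time, a straight-line trend can be fitted per region and extrapolated. The region names, the values, and the helper `forecast_coverage` are hypothetical and not part of the WHO data or the analysis in this report.

```r
# Hypothetical toy data; region names and coverage values are illustrative only.
toy <- data.frame(
  Region   = rep(c("Region A", "Region B"), each = 3),
  Year     = rep(c(2020, 2021, 2022), times = 2),
  Coverage = c(80, 82, 84, 50, 55, 60)
)

# Fit Coverage ~ Year for one region and extrapolate to target_year.
forecast_coverage <- function(d, target_year) {
  fit  <- lm(Coverage ~ Year, data = d)
  pred <- predict(fit, newdata = data.frame(Year = target_year))
  min(unname(pred), 100)  # coverage is a percentage, so cap at 100
}

# Apply the helper to each region separately.
forecasts <- sapply(split(toy, toy$Region), forecast_coverage, target_year = 2025)
print(forecasts)
```

A linear trend is only a first approximation: coverage is bounded between 0 and 100, so extrapolating far beyond the observed years should be treated with caution.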
library(readr)
library(dplyr)
library(ggplot2)
library(MASS)
# Load the data
df <- read_csv("/Users/mohammedhossain/Desktop/washdash-download (1).csv")
summary(df)
##      Type              Region          Residence Type     Service Type
##  Length:3367        Length:3367        Length:3367        Length:3367
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##
##
##       Year         Coverage         Population        Service level
##  Min.   :2010   Min.   :  0.000   Min.   :0.000e+00   Length:3367
##  1st Qu.:2013   1st Qu.:  2.486   1st Qu.:4.366e+06   Class :character
##  Median :2016   Median : 12.110   Median :3.306e+07   Mode  :character
##  Mean   :2016   Mean   : 22.447   Mean   :1.497e+08
##  3rd Qu.:2019   3rd Qu.: 34.190   3rd Qu.:1.755e+08
##  Max.   :2022   Max.   :100.000   Max.   :2.173e+09
options(scipen = 999)
# Aggregating coverage and population data by year
aggregated_data <- df %>%
  group_by(Year) %>%
  summarize(mean_coverage = mean(Coverage), mean_population = mean(Population))
# Rename the columns
names(aggregated_data)[1] <- "Year"
names(aggregated_data)[2] <- "Mean_Coverage"
names(aggregated_data)[3] <- "Mean_Population"
# Convert Year column to integer
aggregated_data$Year <- as.integer(aggregated_data$Year)
# Round Mean Value to 2 digits
aggregated_data$Mean_Coverage <- round(aggregated_data$Mean_Coverage, digits = 2)
aggregated_data$Mean_Population <- round(aggregated_data$Mean_Population, digits = 2)
# Plotting trends in coverage and population over time
ggplot(aggregated_data, aes(x = Year)) +
  geom_line(aes(y = Mean_Coverage, color = "Coverage"), linetype = "solid", linewidth = 1.5) +
  geom_line(aes(y = Mean_Population, color = "Population"), linetype = "dashed", linewidth = 1.5) +
  labs(title = "Trends in Coverage of Essential Services and Population (2010-2022)",
       x = "Year", y = "Mean Value") +
  scale_color_manual(values = c("Coverage" = "blue", "Population" = "red")) +
  theme_minimal() +
  theme(legend.key.size = unit(1.5, "lines"))
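In the plot above, coverage is a percentage (0 to 100) while mean population is in the hundreds of millions, so on a shared y-axis the coverage line would sit almost flat near zero. One common workaround, sketched here under assumed values, is to index each series to its first year (first year = 100) so that relative changes become comparable. The data frame `agg`, its values, and the helper `to_index` are hypothetical stand-ins, not the real `aggregated_data`.

```r
# Hypothetical yearly means; illustrative values, not taken from the WHO file.
agg <- data.frame(
  Year            = c(2010, 2016, 2022),
  Mean_Coverage   = c(20, 22, 25),
  Mean_Population = c(100e6, 110e6, 120e6)
)

# Index a series so that its first observation equals 100.
to_index <- function(x) 100 * x / x[1]

agg$Coverage_Index   <- to_index(agg$Mean_Coverage)
agg$Population_Index <- to_index(agg$Mean_Population)
print(agg[, c("Year", "Coverage_Index", "Population_Index")])
```

The two index columns could then be passed to the same `geom_line()` calls in place of `Mean_Coverage` and `Mean_Population`.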
# Bar plot
ggplot(df, aes(x = Region, y = Population, fill = Region)) +
  geom_bar(stat = "identity") +
  labs(title = "Population by Region", x = "Region", y = "Population") +
  theme_minimal()
library(ppcor)
# Calculate Pearson correlation coefficient and its significance
cor_result <- cor.test(df$Coverage, df$Population, method = "pearson")
cor_result
##
##  Pearson's product-moment correlation
##
## data:  df$Coverage and df$Population
## t = 45.286, df = 3365, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5939276 0.6359223
## sample estimates:
##       cor
## 0.6153614
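The coefficient above pools every row, so it mixes all service levels, service types, and residence types. A possible follow-up check, sketched here on hypothetical data, is to compute the correlation separately within each service level, using Spearman's rank correlation because population is heavily right-skewed. The column name `Service_Level` and all values are assumptions chosen to show how a pooled figure can hide opposite within-group relationships.

```r
# Hypothetical toy data: two service levels with opposite relationships.
toy <- data.frame(
  Service_Level = rep(c("basic", "limited"), each = 4),
  Coverage      = c(10, 20, 30, 40, 40, 30, 20, 10),
  Population    = c(1e6, 2e6, 3e6, 4e6, 1e6, 2e6, 3e6, 4e6)
)

# Spearman correlation within each service level.
by_level <- sapply(split(toy, toy$Service_Level), function(d) {
  cor(d$Coverage, d$Population, method = "spearman")
})

# Spearman correlation over all rows pooled together.
pooled <- cor(toy$Coverage, toy$Population, method = "spearman")

print(by_level)
print(pooled)
```

In this toy case the within-level correlations are +1 and -1 while the pooled value is 0, which is why a single pooled coefficient may be worth breaking down before interpreting it.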