Assessing Service Level Disparities in European Regions: A Comparative Analysis and Future Projections

Introduction:
The WHO reported that, in 2020, people within the European Region lacked access to basic drinking water (15 million), basic sanitation (29 million), and hygiene services (4 million) (WHO, 2022). Most attention has focused on households, with little to none given to unserved or homeless persons. In 2020, surface water was still used by 2.5 million people in western and central Asia as well as southern and eastern Europe. Although open defecation and surface-water drinking have declined since 2000, almost 99.8% of the rural population is still reported to practice open defecation. Social inequalities are reported as one of the main reasons for this gap (WHO, 2022).
The data extracted from the WHO comprise 8 columns: SDG goal 6, European region, residence type (rural/urban/total), service type (drinking water/sanitation/hygiene), year (2010-2022), coverage, population, and service level (safely managed/basic/limited/at least basic/unimproved/surface water). The data make it possible to identify gaps in service type and service level within specific regions, and to determine which regions might need more effort as their populations grow.

Three Novel Questions for Further Investigation:
Question_1: How does the distribution of basic drinking water, sanitation, and hygiene services differ among various European regions?
Question_2: What trends can be observed in the coverage of essential services from 2010 to 2022, and how are these trends associated with changes in population?
Question_3: Have there been significant disparities in service levels between rural and urban areas within the European Region, and how have these disparities evolved over time?

Research Purpose:
The primary purpose of this project is to compare the service levels that different regions receive over time. It also aims to determine whether coverage keeps pace with changes in population, and to identify regions that may need attention in the coming years. A further aim is to predict which regions are moving towards a more sustainable lifestyle in the coming years.

library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(MASS)

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

# Load the data
df <- read_csv("/Users/mohammedhossain/Desktop/washdash-download (1).csv")

## Rows: 3367 Columns: 8

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Type, Region, Residence Type, Service Type, Service level
## dbl (3): Year, Coverage, Population
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

summary(df)

##      Type              Region          Residence Type     Service Type      
##  Length:3367        Length:3367        Length:3367        Length:3367       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##       Year         Coverage         Population        Service level     
##  Min.   :2010   Min.   :  0.000   Min.   :0.000e+00   Length:3367       
##  1st Qu.:2013   1st Qu.:  2.486   1st Qu.:4.366e+06   Class :character  
##  Median :2016   Median : 12.110   Median :3.306e+07   Mode  :character  
##  Mean   :2016   Mean   : 22.447   Mean   :1.497e+08                     
##  3rd Qu.:2019   3rd Qu.: 34.190   3rd Qu.:1.755e+08                     
##  Max.   :2022   Max.   :100.000   Max.   :2.173e+09

options(scipen = 999)

# Aggregate mean coverage and mean population by year, rounded to 2 digits
aggregated_data <- df %>%
  group_by(Year) %>%
  summarize(Mean_Coverage = round(mean(Coverage), digits = 2),
            Mean_Population = round(mean(Population), digits = 2)) %>%
  # Store Year as an integer
  mutate(Year = as.integer(Year))

# Plotting trends in coverage and population over time
ggplot(aggregated_data, aes(x = Year)) +
  geom_line(aes(y = Mean_Coverage, color = "Coverage"), linetype = "solid", linewidth = 1.5) +
  geom_line(aes(y = Mean_Population, color = "Population"), linetype = "dashed", linewidth = 1.5) +
  labs(title = "Trends in Coverage of Essential Services and Population (2010-2022)",
       x = "Year", y = "Mean Value") +
  scale_color_manual(values = c("Coverage" = "blue", "Population" = "red")) +
  theme_minimal() +
  theme(legend.key.size = unit(1.5, "lines"))
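
Because mean coverage lies between 0 and 100 while mean population is in the hundreds of millions, the two lines share one y-axis and the coverage line is likely to appear flat near zero. One possible way to make both trends readable is a secondary axis. The sketch below is only an illustration: the data frame `toy` and its values are hypothetical stand-ins for `aggregated_data`, and `scale_factor` is a helper introduced here, not part of the original analysis.

```r
library(ggplot2)

# Hypothetical toy values standing in for aggregated_data (not real WHO figures)
toy <- data.frame(
  Year = 2010:2014,
  Mean_Coverage = c(21.5, 21.9, 22.3, 22.8, 23.1),
  Mean_Population = c(1.40e8, 1.43e8, 1.46e8, 1.49e8, 1.52e8)
)

# Factor that maps the population range onto the coverage range
scale_factor <- max(toy$Mean_Population) / max(toy$Mean_Coverage)

p <- ggplot(toy, aes(x = Year)) +
  geom_line(aes(y = Mean_Coverage, color = "Coverage"), linewidth = 1.5) +
  # Population is divided by scale_factor so it is drawn on the coverage scale
  geom_line(aes(y = Mean_Population / scale_factor, color = "Population"),
            linetype = "dashed", linewidth = 1.5) +
  # The secondary axis multiplies back so its labels show real population values
  scale_y_continuous(
    name = "Mean Coverage (%)",
    sec.axis = sec_axis(~ . * scale_factor, name = "Mean Population")
  ) +
  scale_color_manual(values = c("Coverage" = "blue", "Population" = "red")) +
  labs(x = "Year", color = NULL) +
  theme_minimal()
p
```

Replacing `toy` with `aggregated_data` should apply the same idea to the real yearly means. Dual axes can suggest relationships that depend on the chosen scaling, so two separate panels may be a safer alternative.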

# Bar plot
ggplot(df, aes(x = Region, y = Population, fill = Region)) +
  geom_bar(stat = "identity") +
  labs(title = "Population by Region", x = "Region", y = "Population") +
  theme_minimal()
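
With `stat = "identity"`, `geom_bar()` stacks every row that shares a region, so each bar is the sum of `Population` over all years, residence types, service types, and service levels in `df`. If the intent is a population figure per region, one option is to aggregate first. The sketch below uses a hypothetical data frame `toy` with made-up labels and values; it assumes that `Population` counts the people at each service level, so that summing the levels for a single year gives a regional total. With the real data, one residence type and one service type would also need to be selected before summing.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows mimicking df's columns (labels and values are made up)
toy <- data.frame(
  Region = rep(c("Region A", "Region B"), each = 4),
  Year = rep(c(2021, 2022), times = 4),
  `Service level` = rep(c("Basic", "Basic", "Limited", "Limited"), times = 2),
  Population = c(50, 52, 10, 9, 80, 83, 20, 18),
  check.names = FALSE
)

# Keep the latest year, then add up the service-level populations per region
region_totals <- toy %>%
  dplyr::filter(Year == max(Year)) %>%
  group_by(Region) %>%
  summarize(Population = sum(Population), .groups = "drop")

# geom_col() draws one bar per row, using the y value as the bar height
p <- ggplot(region_totals, aes(x = Region, y = Population, fill = Region)) +
  geom_col() +
  labs(title = "Population by Region (latest year)",
       x = "Region", y = "Population") +
  theme_minimal()
p
```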

library(ppcor)
# Calculate Pearson correlation coefficient and its significance
cor_result <- cor.test(df$Coverage, df$Population, method = "pearson")
cor_result

## 
##  Pearson's product-moment correlation
## 
## data:  df$Coverage and df$Population
## t = 45.286, df = 3365, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5939276 0.6359223
## sample estimates:
##       cor 
## 0.6153614
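
The summary above shows that `Population` is strongly right-skewed (mean 1.497e+08 against a median of 3.306e+07), so a few very large values could dominate the Pearson estimate. A rank-based Spearman correlation is one possible robustness check. The sketch below runs on a hypothetical data frame `toy` with made-up values, purely to show the call; it does not reproduce any result from the WHO data.

```r
# Hypothetical toy rows with the same column names as df (not real WHO data)
toy <- data.frame(
  Coverage   = c(0.5, 12.1, 2.4, 34.2, 60.0, 98.7),
  Population = c(1.2e5, 4.4e6, 3.3e7, 1.8e8, 4.0e8, 2.1e9)
)

# Spearman correlation works on ranks, so it is less sensitive to extreme values;
# exact = FALSE requests the approximate p-value, which also avoids the
# warning R gives when the data contain ties
spearman_result <- cor.test(toy$Coverage, toy$Population,
                            method = "spearman", exact = FALSE)
spearman_result$estimate
```

Swapping `toy` for `df` would give the rank-based counterpart of the Pearson result above. If the two estimates differ noticeably, the linear correlation is probably being driven by the largest regions.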