library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The dataset on progress in household drinking water, sanitation, and hygiene is loaded as follows:

WA<-read.csv("/Users/rupeshswarnakar/Desktop/washdash-download.csv")

Below is a summary of each column in the dataset (WA):

summary(WA)
##      Type              Region          Residence.Type     Service.Type      
##  Length:260         Length:260         Length:260         Length:260        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##       Year         Coverage         Population        Service.level     
##  Min.   :2022   Min.   :  0.000   Min.   :0.000e+00   Length:260        
##  1st Qu.:2022   1st Qu.:  1.803   1st Qu.:3.524e+06   Class :character  
##  Median :2022   Median : 10.250   Median :2.916e+07   Mode  :character  
##  Mean   :2022   Mean   : 22.571   Mean   :1.660e+08                     
##  3rd Qu.:2022   3rd Qu.: 33.408   3rd Qu.:1.748e+08                     
##  Max.   :2022   Max.   :100.000   Max.   :2.173e+09
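
Because summary() reports only length and class for character columns, it does not show which categories they contain. The sketch below shows one way to list them. It runs on a small made-up data frame (wa_toy, with hypothetical values) rather than on WA itself; only the column names are taken from the summary above.

```r
# Toy stand-in for WA: made-up rows, column names taken from the summary above
wa_toy <- data.frame(
  Region        = c("Oceania", "Oceania", "Sub-Saharan Africa", "Sub-Saharan Africa"),
  Service.Type  = c("Drinking water", "Sanitation", "Drinking water", "Hygiene"),
  Service.level = c("Unimproved", "Open defecation", "Surface water", "No facility"),
  Coverage      = c(16.8, 10.2, 5.8, 40.1),
  stringsAsFactors = FALSE
)

# Find the character columns, then list the distinct values in each one
char_cols <- names(wa_toy)[sapply(wa_toy, is.character)]
distinct_values <- lapply(wa_toy[char_cols], unique)
print(distinct_values)

# Cross-tabulate two categorical columns to see which combinations occur
combo_counts <- table(wa_toy$Service.Type, wa_toy$Service.level)
print(combo_counts)
```

Replacing wa_toy with WA would apply the same calls to the real data.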

Novel questions to investigate:

Q.1 Which region is comparatively behind on progress in household drinking water?

Q.2 Is there a relationship between level of development and progress on drinking water?

Q.3 Does population size affect drinking water, sanitation, and hygiene? A larger population can strain the availability of these services.

Addressing Q.2 using an aggregation function:

Let’s look at regions that are relatively behind in development, such as Sub-Saharan Africa and Oceania:

WA|>
  filter(Region=='Sub-Saharan Africa')|>
  filter(Service.Type== 'Drinking water')|>
  aggregate(Coverage~Service.level, mean)
##            Service.level Coverage
## 1          Basic service 33.60807
## 2        Limited service 13.35015
## 3 Safely managed service 33.17985
## 4          Surface water  5.82792
## 5             Unimproved 14.03401
WA|>
  filter(Region=='Oceania')|>
  filter(Service.Type== 'Drinking water')|>
  aggregate(Coverage~Service.level, mean)
##            Service.level Coverage
## 1         At least basic 55.36870
## 2          Basic service 37.50377
## 3        Limited service  1.76193
## 4 Safely managed service 55.32068
## 5          Surface water 13.61583
## 6             Unimproved 16.76829

Let’s compare the data above with that of a comparatively developed region, Europe and Northern America:

WA|>
  filter(Region=='Europe and Northern America')|>
  filter(Service.Type== 'Drinking water')|>
  aggregate(Coverage~Service.level, mean)
##            Service.level   Coverage
## 1          Basic service  6.3636600
## 2        Limited service  0.3317933
## 3 Safely managed service 92.1247667
## 4          Surface water  0.0279600
## 5             Unimproved  1.1518233
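
The per-region blocks above repeat the same filter-and-aggregate pattern. As a sketch of an alternative, a single aggregate() call with two grouping variables produces all regions at once. The data frame wa_toy and its values are made up for illustration; only the column names match WA.

```r
# Toy stand-in for WA with made-up coverage values
wa_toy <- data.frame(
  Region        = c("Oceania", "Oceania", "Oceania",
                    "Sub-Saharan Africa", "Sub-Saharan Africa", "Sub-Saharan Africa"),
  Service.Type  = c("Drinking water", "Drinking water", "Sanitation",
                    "Drinking water", "Drinking water", "Drinking water"),
  Service.level = c("Unimproved", "Unimproved", "Unimproved",
                    "Unimproved", "Surface water", "Surface water"),
  Coverage      = c(10, 20, 50, 14, 4, 8),
  stringsAsFactors = FALSE
)

# Keep only drinking water rows, then average Coverage per Region and Service.level
drinking <- subset(wa_toy, Service.Type == "Drinking water")
by_region_level <- aggregate(Coverage ~ Region + Service.level,
                             data = drinking, FUN = mean)
print(by_region_level)

# Same means as a Region x Service.level matrix; missing combinations show as NA
wide <- with(drinking, tapply(Coverage, list(Region, Service.level), mean))
print(wide)
```

The wide layout puts regions side by side for each service level, which makes the comparison in the following paragraphs easier to read off.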

We can also look at the ‘Safely managed service’ and ‘Unimproved’ service levels across all regions to see where coverage is comparatively high or low:

WA|>
  filter(Service.Type== 'Drinking water')|>
  filter(Service.level=='Safely managed service')|>
  aggregate(Coverage~Region, mean)
##                             Region Coverage
## 1        Australia and New Zealand 99.53387
## 2        Central and Southern Asia 67.47019
## 3   Eastern and South-Eastern Asia 76.75868
## 4      Europe and Northern America 92.12477
## 5  Latin America and the Caribbean 69.51487
## 6 Northern Africa and Western Asia 78.96726
## 7                          Oceania 55.32068
## 8               Sub-Saharan Africa 33.17985
WA|>
  filter(Service.Type== 'Drinking water')|>
  filter(Service.level=='Unimproved')|>
  aggregate(Coverage~Region, mean)
##                             Region  Coverage
## 1        Australia and New Zealand  0.018150
## 2        Central and Southern Asia  2.044047
## 3   Eastern and South-Eastern Asia  2.350480
## 4      Europe and Northern America  1.151823
## 5  Latin America and the Caribbean  1.488903
## 6 Northern Africa and Western Asia  2.290723
## 7                          Oceania 16.768293
## 8               Sub-Saharan Africa 14.034013

From the two tables above, we can see that Sub-Saharan Africa and Oceania have made comparatively less progress on drinking water: they have the lowest safely managed coverage (about 33% and 55%) and the highest unimproved coverage (about 14% and 17%).

Let’s use boxplots to check whether this holds.
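
The ranking can be made explicit by sorting the regional means. The sketch below copies the values from the two tables printed above into a small data frame (the name regional and its column names are chosen here for illustration) and orders it.

```r
# Regional means copied from the two aggregate() outputs above
regional <- data.frame(
  Region = c("Australia and New Zealand", "Central and Southern Asia",
             "Eastern and South-Eastern Asia", "Europe and Northern America",
             "Latin America and the Caribbean", "Northern Africa and Western Asia",
             "Oceania", "Sub-Saharan Africa"),
  safely_managed = c(99.53387, 67.47019, 76.75868, 92.12477,
                     69.51487, 78.96726, 55.32068, 33.17985),
  unimproved     = c(0.018150, 2.044047, 2.350480, 1.151823,
                     1.488903, 2.290723, 16.768293, 14.034013),
  stringsAsFactors = FALSE
)

# Ascending by safely managed coverage: regions furthest behind come first
ranked <- regional[order(regional$safely_managed), ]
print(head(ranked, 2))

# The two regions with the highest unimproved coverage
top_unimproved <- regional$Region[order(-regional$unimproved)][1:2]
print(top_unimproved)
```

Both orderings put Sub-Saharan Africa and Oceania at the unfavourable end.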

ggplot(WA, aes(x=Service.Type,
               y=Population,
               fill=Region))+
  geom_boxplot()+
  labs(x="Different Types of Services",
       y="Population using Services",
       title="Population vs Types of Services in Different SDG Regions")+
  scale_fill_brewer(palette='Dark2')  # Region is mapped to fill, so the fill scale is the one that applies

  • This visualization shows that the median (2nd quartile) population for drinking water is higher in Sub-Saharan Africa than in other regions.
  • This suggests that most of the population in Sub-Saharan Africa receives services of a similar quality (whether good or bad).
  • The quality of those services can be seen in the visualization below.
ggplot(WA, aes(x=Service.level,
               y=Coverage,
               fill=Region))+
  geom_boxplot()+
  labs(x="Quality of Different Services",
       y="% Coverage using Services",
       title="% Coverage of Population vs Quality of Services in Different SDG Regions")+
  scale_fill_brewer(palette='Dark2')  # Region is mapped to fill, so the fill scale is the one that applies

  • This visualization shows that Sub-Saharan Africa and Oceania have higher coverage at poor service levels such as unimproved, open defecation, and no hand-washing facility.

  • It also shows that Sub-Saharan Africa and Oceania have the lowest coverage of safely managed services.

  • In addition, Sub-Saharan Africa and Oceania rely on surface water more than any other region. This could mean it is being used for drinking or for sanitary purposes.
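
Readings taken from a boxplot by eye can be checked numerically. The sketch below computes per-region medians and five-number summaries for one service level. It uses a made-up data frame (wa_toy, with hypothetical coverage values), so the numbers are illustrative only.

```r
# Toy stand-in for WA, restricted to one service level; coverage values are made up
wa_toy <- data.frame(
  Region        = rep(c("Oceania", "Europe and Northern America"), each = 3),
  Service.level = "Safely managed service",
  Coverage      = c(40, 55, 70, 90, 92, 97),
  stringsAsFactors = FALSE
)

# Median coverage per region: the middle line of each box
medians <- aggregate(Coverage ~ Region, data = wa_toy, FUN = median)
print(medians)

# Minimum, lower hinge, median, upper hinge, maximum per region.
# geom_boxplot() builds its box from quantile() and caps whiskers at 1.5 * IQR,
# so its hinges and whisker ends can differ slightly from these values.
five_num <- tapply(wa_toy$Coverage, wa_toy$Region, fivenum)
print(five_num)
```

Running the same calls on WA, filtered to a single service type and level, would give the figures behind each box.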

Conclusion:

  • The visualizations above show that regions such as Sub-Saharan Africa and Oceania, which are relatively behind other regions in development, are also behind in drinking water services.

  • This analysis opens the door to further questions, such as comparing mortality rates between developed and developing regions, or the health impact of urbanization versus unsanitary behavior.