library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The dataset on Progress in Household Drinking Water, Sanitation, and Hygiene is loaded as follows:

WA <- read.csv("/Users/rupeshswarnakar/Desktop/washdash-download.csv")
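The columns used throughout (Service.Type, Residence.Type, Service.level) have dotted names. Assuming the raw CSV headers contain spaces, this comes from read.csv(), which passes the header through make.names() when check.names = TRUE (the default). The sketch below uses a tiny made-up CSV string, not the real file, to show the conversion.

```r
# Hypothetical two-row CSV; the header spelling with spaces is an assumption,
# not taken from the real washdash-download.csv file
csv_text <- "Region,Residence Type,Service Type,Year,Coverage
Oceania,Total,Drinking water,2022,55.3
Sub-Saharan Africa,Total,Drinking water,2022,33.2"

# check.names = TRUE (the default) runs make.names() on the header,
# which replaces each space with a dot
toy <- read.csv(text = csv_text)
names(toy)   # "Region" "Residence.Type" "Service.Type" "Year" "Coverage"

# check.names = FALSE keeps the header text exactly as written
toy_raw <- read.csv(text = csv_text, check.names = FALSE)
names(toy_raw)
```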

Below is a summary of each column in the dataset (WA):

summary(WA)
##      Type              Region          Residence.Type     Service.Type      
##  Length:260         Length:260         Length:260         Length:260        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##       Year         Coverage         Population        Service.level     
##  Min.   :2022   Min.   :  0.000   Min.   :0.000e+00   Length:260        
##  1st Qu.:2022   1st Qu.:  1.803   1st Qu.:3.524e+06   Class :character  
##  Median :2022   Median : 10.250   Median :2.916e+07   Mode  :character  
##  Mean   :2022   Mean   : 22.571   Mean   :1.660e+08                     
##  3rd Qu.:2022   3rd Qu.: 33.408   3rd Qu.:1.748e+08                     
##  Max.   :2022   Max.   :100.000   Max.   :2.173e+09

Here we can see that the mean coverage is about 22.6%, while the median is only 10.25%. Since each row appears to report the share of a group's population at one service level, this mean averages good and poor service levels together; it is not the share of the population using a service. The gap between the mean and the median shows that the distribution is right-skewed: most rows have low coverage, and a few service levels with high coverage pull the mean up. The 3rd quartile of about 33.4% means that 75% of the rows report a coverage of 33.4% or less. Note also that Year is 2022 in every row, so the dataset is a single-year snapshot rather than a trend.
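To make the quartile interpretation concrete, the sketch below uses a small invented vector of coverage values (not the real data). It shows that the 3rd quartile is the value at or below which 75% of the rows fall, and that a mean above the median signals right skew.

```r
# Hypothetical coverage percentages for eight service-level rows (invented values)
coverage <- c(0, 1, 3, 6, 10, 18, 40, 95)

q3 <- quantile(coverage, 0.75)   # 3rd quartile (R's default type 7)
share_at_or_below_q3 <- mean(coverage <= q3)

unname(q3)              # 23.5
share_at_or_below_q3    # 0.75: three quarters of the rows are at or below Q3
median(coverage)        # 8
mean(coverage)          # 21.625: pulled above the median by the two large values
```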

Novel questions to investigate:

Q.1 Which Region is comparatively behind on progress in household drinking water?

Q.2 Is there a relationship between a region's level of development and its progress on drinking water?

Q.3 Does population size affect drinking water, sanitation and hygiene? A larger population can sometimes strain the availability of drinking water, sanitation and hygiene services.

Addressing Q.2 using an aggregation function:

Let’s start with a region that is relatively behind in development, Sub-Saharan Africa:

WA|>
  filter(Region=='Sub-Saharan Africa')|>
  filter(Service.Type== 'Drinking water')|>
  aggregate(Coverage~Service.level, mean)
##            Service.level Coverage
## 1          Basic service 33.60807
## 2        Limited service 13.35015
## 3 Safely managed service 33.17985
## 4          Surface water  5.82792
## 5             Unimproved 14.03401

From the above analysis, we can see that Sub-Saharan Africa has an average safely managed coverage of about 33%, which is low compared with the developed regions examined below. Unimproved sources account for about 14% and surface water for about 6%, so roughly one in five people in the region (about 20%) relies on an unimproved source or surface water for drinking. Note that these figures are unweighted means across the region's rows (for example, its different residence types), not population-weighted totals.
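The combined share can be computed directly from the table above. This short sketch copies the printed averages into a data frame. Treating "Unimproved" and "Surface water" as the levels without an improved source is an assumption based on the usual service-ladder definitions, and the result inherits the unweighted averaging of the rows.

```r
# Mean drinking-water coverage (%) for Sub-Saharan Africa, copied from the
# aggregate() output above
ssa <- data.frame(
  Service.level = c("Basic service", "Limited service",
                    "Safely managed service", "Surface water", "Unimproved"),
  Coverage = c(33.60807, 13.35015, 33.17985, 5.82792, 14.03401)
)

sum(ssa$Coverage)   # 100: the five levels together cover the whole population

# Assumption: these two levels are the ones without an improved source
no_improved <- c("Unimproved", "Surface water")
no_improved_share <- sum(ssa$Coverage[ssa$Service.level %in% no_improved])
no_improved_share   # 19.86193
```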

Next, let’s look at Oceania, another region that is relatively behind in development:

WA|>
  filter(Region=='Oceania')|>
  filter(Service.Type== 'Drinking water')|>
  aggregate(Coverage~Service.level, mean)
##            Service.level Coverage
## 1         At least basic 55.36870
## 2          Basic service 37.50377
## 3        Limited service  1.76193
## 4 Safely managed service 55.32068
## 5          Surface water 13.61583
## 6             Unimproved 16.76829

From the above analysis, we can see that Oceania has an average safely managed coverage of about 55%. This is higher than Sub-Saharan Africa but still low overall. Unimproved sources account for about 17% and surface water for about 14%, so roughly 30% of the population relies on an unimproved source or surface water for drinking, an even larger share than in Sub-Saharan Africa.

Let’s compare these figures with a developed region, Europe and Northern America:

WA|>
  filter(Region=='Europe and Northern America')|>
  filter(Service.Type== 'Drinking water')|>
  aggregate(Coverage~Service.level, mean)
##            Service.level   Coverage
## 1          Basic service  6.3636600
## 2        Limited service  0.3317933
## 3 Safely managed service 92.1247667
## 4          Surface water  0.0279600
## 5             Unimproved  1.1518233

From the above analysis, we can see that Europe and Northern America has about 92% safely managed coverage, far higher than Sub-Saharan Africa and Oceania. Unimproved sources account for only about 1%, and surface water is close to 0% (about 0.03%). Overall, this suggests that developed regions provide drinking water at a much higher service level.
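The same filter-and-aggregate pipeline is repeated once per region above. As a sketch, a small helper (the name compare_regions is hypothetical) can return all regions in one long table. The toy rows below only mimic the WA column names; their values are invented.

```r
# Hypothetical rows mimicking the WA columns; all values are invented
toy_wa <- data.frame(
  Region = rep(c("Region A", "Region B"), each = 4),
  Residence.Type = rep(c("Urban", "Rural"), times = 4),
  Service.Type = "Drinking water",
  Service.level = rep(c("Safely managed service", "Safely managed service",
                        "Unimproved", "Unimproved"), times = 2),
  Coverage = c(80, 60, 5, 15, 40, 20, 10, 30)
)

# Hypothetical helper: one call replaces the repeated per-region pipelines
compare_regions <- function(data, regions, service = "Drinking water") {
  rows <- data[data$Region %in% regions & data$Service.Type == service, ]
  # Unweighted mean of Coverage within each Region x Service.level group
  aggregate(Coverage ~ Region + Service.level, data = rows, FUN = mean)
}

result <- compare_regions(toy_wa, c("Region A", "Region B"))
result
```

With the real data, `compare_regions(WA, c("Sub-Saharan Africa", "Oceania", "Europe and Northern America"))` would return the three regional tables above as one long data frame, which is also easier to plot.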

We can also compare the ‘Safely managed service’ level across all regions to see which regions have higher coverage:

WA|>
  filter(Service.Type== 'Drinking water')|>
  filter(Service.level=='Safely managed service')|>
  aggregate(Coverage~Region, mean)
##                             Region Coverage
## 1        Australia and New Zealand 99.53387
## 2        Central and Southern Asia 67.47019
## 3   Eastern and South-Eastern Asia 76.75868
## 4      Europe and Northern America 92.12477
## 5  Latin America and the Caribbean 69.51487
## 6 Northern Africa and Western Asia 78.96726
## 7                          Oceania 55.32068
## 8               Sub-Saharan Africa 33.17985

From the above analysis, we can see that Sub-Saharan Africa (about 33%) and Oceania (about 55%) have the lowest safely managed coverage. Developed regions are far higher: Europe and Northern America reaches about 92% and Australia and New Zealand over 99%. Rapidly developing regions (Central and Southern Asia, Latin America and the Caribbean, Eastern and South-Eastern Asia, and Northern Africa and Western Asia) fall between about 67% and 79%.
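The grouping described above can be made explicit by sorting the table and binning it. This sketch copies the printed averages; the 60% and 85% cut-offs are illustrative assumptions, not thresholds from the data source.

```r
# Mean safely managed drinking-water coverage (%) by region, copied from the
# aggregate() output above
safely <- data.frame(
  Region = c("Australia and New Zealand", "Central and Southern Asia",
             "Eastern and South-Eastern Asia", "Europe and Northern America",
             "Latin America and the Caribbean", "Northern Africa and Western Asia",
             "Oceania", "Sub-Saharan Africa"),
  Coverage = c(99.53387, 67.47019, 76.75868, 92.12477,
               69.51487, 78.96726, 55.32068, 33.17985)
)

# Rank regions from highest to lowest coverage
ranked <- safely[order(safely$Coverage, decreasing = TRUE), ]

# Illustrative tiers: (0, 60], (60, 85], (85, 100]; the cut-offs are assumptions
ranked$Tier <- cut(ranked$Coverage, breaks = c(0, 60, 85, 100),
                   labels = c("Low", "Middle", "High"))
ranked
```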

Similarly, we can compare the ‘Unimproved’ service level across all regions:

WA|>
  filter(Service.Type== 'Drinking water')|>
  filter(Service.level=='Unimproved')|>
  aggregate(Coverage~Region, mean)
##                             Region  Coverage
## 1        Australia and New Zealand  0.018150
## 2        Central and Southern Asia  2.044047
## 3   Eastern and South-Eastern Asia  2.350480
## 4      Europe and Northern America  1.151823
## 5  Latin America and the Caribbean  1.488903
## 6 Northern Africa and Western Asia  2.290723
## 7                          Oceania 16.768293
## 8               Sub-Saharan Africa 14.034013

From the above comparison we can see that Sub-Saharan Africa (about 14%) and Oceania (about 17%) rely on unimproved sources far more than the other regions, where the figure is at most about 2.4%. This again raises the question of which aspects of development affect such a basic need as drinking water.
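The two regional tables can be combined into a simple check of Q.2: do regions with lower safely managed coverage rely more on unimproved sources? This sketch copies the printed averages and uses a Spearman rank correlation, chosen because two regions have much larger unimproved values than the rest. With only eight unweighted regional means it shows an association, not a cause.

```r
# Regional means copied from the two aggregate() outputs above, in the same order:
# Australia and New Zealand, Central and Southern Asia, Eastern and South-Eastern Asia,
# Europe and Northern America, Latin America and the Caribbean,
# Northern Africa and Western Asia, Oceania, Sub-Saharan Africa
safely_managed <- c(99.53387, 67.47019, 76.75868, 92.12477,
                    69.51487, 78.96726, 55.32068, 33.17985)
unimproved <- c(0.018150, 2.044047, 2.350480, 1.151823,
                1.488903, 2.290723, 16.768293, 14.034013)

# Spearman rank correlation: does lower safely managed coverage go together
# with higher reliance on unimproved sources?
rho <- cor(safely_managed, unimproved, method = "spearman")
rho   # about -0.79
```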

Let’s use boxplots to visualize the analysis above.

ggplot(WA, aes(x=Service.Type,
               y=Population,
               fill=Region))+
  geom_boxplot()+
  labs(x="Different Types of Services",
       y="Population using Services",
       title="Population vs Types of Services in Different SDG Regions")+
  scale_fill_brewer(palette='Dark2')  # boxes are coloured via fill, so use the fill scale

  • This visualization shows that the median (2nd quartile) population for Drinking water is higher in Sub-Saharan Africa than in other regions.
  • This suggests that large parts of Sub-Saharan Africa’s population receive services of a similar quality level (whether good or poor).
  • The quality of those services can be seen in the visualization below.
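The line inside each box is the median (2nd quartile), so the visual comparison can be checked numerically. This sketch uses invented population values for two hypothetical regions; with the real data, the same aggregate() call would run on WA.

```r
# Hypothetical population rows (invented numbers) for two regions
toy_pop <- data.frame(
  Region = rep(c("Region A", "Region B"), each = 5),
  Service.Type = "Drinking water",
  Population = c(1e6, 2e6, 4e6, 8e6, 3e7,
                 5e5, 9e5, 1.5e6, 3e6, 2e8)
)

# The median drawn inside each box, per Region and Service.Type
medians <- aggregate(Population ~ Region + Service.Type, data = toy_pop, FUN = median)
medians

# Totals tell a different story: Region B is larger overall because of one big row
tapply(toy_pop$Population, toy_pop$Region, sum)
```

Region B has the larger total but the smaller median, which is why a higher box alone does not say how many people a region serves in total.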
ggplot(WA, aes(x=Service.level,
               y=Coverage,
               fill=Region))+
  geom_boxplot()+
  labs(x="Quality of Different Services",
       y="% Coverage using Services",
       title="% Coverage of Population vs Quality of Services in Different SDG Regions")+
  scale_fill_brewer(palette='Dark2')  # boxes are coloured via fill, so use the fill scale

  • This visualization shows that Sub-Saharan Africa and Oceania have higher coverage at the poorest service levels, such as unimproved, open defecation and no handwashing facility.

  • It also shows that Sub-Saharan Africa and Oceania have the lowest coverage of safely managed services.

  • Finally, Sub-Saharan Africa and Oceania rely on surface water more than any other region. Since surface water is recorded as a drinking-water service level, this most likely means people drink untreated water from rivers, lakes or ponds, although the same sources may also be used for sanitary purposes.
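Boxplots with eight regions per service level can be hard to read. As a suggestion rather than part of the original analysis, a single ordered bar chart of one service level may show the ranking more directly. This sketch plots the safely managed averages printed earlier and uses scale_fill_brewer(), since the bars are coloured through fill.

```r
library(ggplot2)

# Mean safely managed drinking-water coverage (%) by region, copied from the
# aggregate() output above
safely <- data.frame(
  Region = c("Australia and New Zealand", "Central and Southern Asia",
             "Eastern and South-Eastern Asia", "Europe and Northern America",
             "Latin America and the Caribbean", "Northern Africa and Western Asia",
             "Oceania", "Sub-Saharan Africa"),
  Coverage = c(99.53387, 67.47019, 76.75868, 92.12477,
               69.51487, 78.96726, 55.32068, 33.17985)
)

# reorder() sorts the bars by coverage; coord_flip() keeps long names readable
p <- ggplot(safely, aes(x = reorder(Region, Coverage), y = Coverage, fill = Region)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  scale_fill_brewer(palette = "Dark2") +   # Dark2 has exactly 8 colours
  labs(x = NULL,
       y = "Mean % coverage (safely managed drinking water)",
       title = "Safely Managed Drinking Water by SDG Region")
p
```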

Conclusion:

  • The analysis and visualizations above show that regions that are relatively behind in development, such as Sub-Saharan Africa and Oceania, also lag on drinking water: they have lower safely managed coverage and rely more on unimproved sources and surface water.

  • This opens the door to further questions, such as how mortality rates compare between developed and developing regions, or how the health impacts of urbanization compare with those of unsanitary behavior.

  • Further work could also examine which aspects of development, such as political instability, employment, public literacy, GDP, geographical barriers or international affairs, affect access to drinking water in each region.