library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Read the downloaded WASH dashboard export (the path is specific to the author's machine)
WA <- read.csv("/Users/rupeshswarnakar/Desktop/washdash-download.csv")
summary(WA)
## Type Region Residence.Type Service.Type
## Length:260 Length:260 Length:260 Length:260
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Year Coverage Population Service.level
## Min. :2022 Min. : 0.000 Min. :0.000e+00 Length:260
## 1st Qu.:2022 1st Qu.: 1.803 1st Qu.:3.524e+06 Class :character
## Median :2022 Median : 10.250 Median :2.916e+07 Mode :character
## Mean :2022 Mean : 22.571 Mean :1.660e+08
## 3rd Qu.:2022 3rd Qu.: 33.408 3rd Qu.:1.748e+08
## Max. :2022 Max. :100.000 Max. :2.173e+09
Here we can see that the mean of Coverage is around 22.6%, while the median is only about 10.3%. Each row appears to describe one combination of region, residence type, service type and service level, so this mean is an average over rows and not the share of the whole population using a service. The 3rd quartile is around 33.4%, which is also significant: it tells us that 75% of the rows have a coverage at or below 33.4%. Together with a median well below the mean, this points to a right-skewed distribution in which most combinations cover a small share of the population and a few cover a very large share (up to 100%).
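As a small illustration of how a third quartile is read, the sketch below uses made-up numbers, not the washdash data. The vector `coverage` is hypothetical and was chosen only so that its mean sits well above its median, as in the summary above.

```r
# Hypothetical coverage values (percent); NOT taken from the washdash file
coverage <- c(0, 1, 2, 5, 10, 20, 40, 100)

# Third quartile using R's default quantile method (type 7)
q3 <- quantile(coverage, 0.75)

# Share of observations at or below the third quartile
share_below_q3 <- mean(coverage <= q3)

# In a right-skewed sample the mean is pulled above the median
avg <- mean(coverage)
med <- median(coverage)

print(c(q3 = unname(q3), share_below_q3 = share_below_q3, mean = avg, median = med))
```

The third quartile says where 75% of the rows fall; it does not say how the population itself is distributed.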
Q.1 Which region is comparatively behind in progress on household drinking water?
Q.2 Is there any relationship between the development of a country and its progress on drinking water?
Q.3 Is population affecting drinking water, sanitation and hygiene? A larger population can sometimes reduce the availability of drinking water, sanitation and hygiene.
Let’s take a look at a region that is relatively behind on development, Sub-Saharan Africa:
WA|>
filter(Region=='Sub-Saharan Africa')|>
filter(Service.Type== 'Drinking water')|>
aggregate(Coverage~Service.level, mean)
## Service.level Coverage
## 1 Basic service 33.60807
## 2 Limited service 13.35015
## 3 Safely managed service 33.17985
## 4 Surface water 5.82792
## 5 Unimproved 14.03401
From the above analysis, we can see that Sub-Saharan Africa has around 33% coverage for safely managed drinking water services. This is low compared to the more developed regions, which we will see in further analysis. Also, the unimproved service level is around 14% and surface water is around 6%. Because the data are filtered to drinking water, this observation tells us that a notable share of the population still drinks from surface water or unimproved sources.
Next, let’s take a look at another region that is relatively behind on development, Oceania:
WA|>
filter(Region=='Oceania')|>
filter(Service.Type== 'Drinking water')|>
aggregate(Coverage~Service.level, mean)
## Service.level Coverage
## 1 At least basic 55.36870
## 2 Basic service 37.50377
## 3 Limited service 1.76193
## 4 Safely managed service 55.32068
## 5 Surface water 13.61583
## 6 Unimproved 16.76829
From the above analysis, we can see that Oceania has around 55% coverage for safely managed drinking water services. This is higher than Sub-Saharan Africa but overall not very high. Also, the unimproved service level is around 17% and surface water is around 14%. This observation tells us that a considerable share of the population in Oceania still drinks from surface water or unimproved sources.
Let’s compare these figures with those of a developed region, Europe and Northern America:
WA|>
filter(Region=='Europe and Northern America')|>
filter(Service.Type== 'Drinking water')|>
aggregate(Coverage~Service.level, mean)
## Service.level Coverage
## 1 Basic service 6.3636600
## 2 Limited service 0.3317933
## 3 Safely managed service 92.1247667
## 4 Surface water 0.0279600
## 5 Unimproved 1.1518233
From the above analysis, we can see that Europe and Northern America has around 92% coverage for safely managed drinking water services. This is much higher than Sub-Saharan Africa and Oceania. Also, the unimproved service level is around 1% and surface water is close to 0% (about 0.03%). Overall, this observation tells us that developed regions manage their drinking water at a much higher level.
We can further look at the ‘safely managed’ service level across all regions to see which regions have higher coverage than others:
WA|>
filter(Service.Type== 'Drinking water')|>
filter(Service.level=='Safely managed service')|>
aggregate(Coverage~Region, mean)
## Region Coverage
## 1 Australia and New Zealand 99.53387
## 2 Central and Southern Asia 67.47019
## 3 Eastern and South-Eastern Asia 76.75868
## 4 Europe and Northern America 92.12477
## 5 Latin America and the Caribbean 69.51487
## 6 Northern Africa and Western Asia 78.96726
## 7 Oceania 55.32068
## 8 Sub-Saharan Africa 33.17985
From the above analysis, we can see that Sub-Saharan Africa (about 33%) and Oceania (about 55%) have the lowest coverage of safely managed drinking water services. The developed regions are far higher, at above 90% (about 92% for Europe and Northern America and nearly 100% for Australia and New Zealand). The regions that are developing rapidly fall in between, at roughly 67% to 79%.
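To make the ranking explicit, the following sketch sorts the regional means from lowest to highest. The values are copied from the output above and rounded to two decimals; the names `safely_managed`, `ranked` and `gap` are introduced here only for illustration.

```r
# Mean 'Safely managed service' coverage for drinking water,
# copied from the aggregate() output above (rounded to two decimals)
safely_managed <- data.frame(
  Region = c("Australia and New Zealand",
             "Central and Southern Asia",
             "Eastern and South-Eastern Asia",
             "Europe and Northern America",
             "Latin America and the Caribbean",
             "Northern Africa and Western Asia",
             "Oceania",
             "Sub-Saharan Africa"),
  Coverage = c(99.53, 67.47, 76.76, 92.12, 69.51, 78.97, 55.32, 33.18)
)

# Sort from lowest to highest coverage
ranked <- safely_managed[order(safely_managed$Coverage), ]
rownames(ranked) <- NULL
print(ranked)

# Difference between the best and worst region, in percentage points
gap <- max(ranked$Coverage) - min(ranked$Coverage)
print(gap)
```

With the real data, the same ordering could be applied directly to the result of the `aggregate()` call instead of retyping the values.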
We can also compare the ‘unimproved’ service level across all regions to see which regions have higher coverage than others:
WA|>
filter(Service.Type== 'Drinking water')|>
filter(Service.level=='Unimproved')|>
aggregate(Coverage~Region, mean)
## Region Coverage
## 1 Australia and New Zealand 0.018150
## 2 Central and Southern Asia 2.044047
## 3 Eastern and South-Eastern Asia 2.350480
## 4 Europe and Northern America 1.151823
## 5 Latin America and the Caribbean 1.488903
## 6 Northern Africa and Western Asia 2.290723
## 7 Oceania 16.768293
## 8 Sub-Saharan Africa 14.034013
From the above comparison, we can see that Sub-Saharan Africa (about 14%) and Oceania (about 17%) have a far higher unimproved coverage than the other, more developed regions, all of which are below 3%. This analysis again opens the door for further investigation into which aspects of development are affecting the crucial basic need of drinking water.
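The filter-then-aggregate pattern used repeatedly above could be wrapped in a small helper. This is only a sketch in base R: `mean_coverage()` and the data frame `toy` are hypothetical and not part of the original analysis, and the column names are assumed to match those shown by `summary(WA)`.

```r
# Hypothetical helper: mean Coverage for one service type, grouped by the
# column named in `by`, optionally restricted to one region or service level
mean_coverage <- function(data, service_type, by, region = NULL, level = NULL) {
  keep <- data$Service.Type == service_type
  if (!is.null(region)) keep <- keep & data$Region == region
  if (!is.null(level)) keep <- keep & data$Service.level == level
  # Build the formula Coverage ~ <by> and average within each group
  aggregate(reformulate(by, response = "Coverage"), data = data[keep, ], FUN = mean)
}

# Made-up rows, only to show the call; NOT the washdash data
toy <- data.frame(
  Region        = c("A", "A", "B", "B"),
  Service.Type  = "Drinking water",
  Service.level = "Unimproved",
  Coverage      = c(10, 20, 1, 3)
)

res <- mean_coverage(toy, "Drinking water", by = "Region", level = "Unimproved")
print(res)
```

On the real data, a call such as `mean_coverage(WA, "Drinking water", by = "Service.level", region = "Oceania")` should correspond to the Oceania table above, assuming the column names are unchanged.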
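# Population by service type, one box per region
ggplot(WA, aes(x = Service.Type,
               y = Population,
               fill = Region)) +
  geom_boxplot() +
  labs(x = "Different Types of Services",
       y = "Population using Services",
       title = "Population vs Types of Services in Different SDG Regions") +
  # Region is mapped to fill, so the palette has to be set on the fill scale
  scale_fill_brewer(palette = "Dark2")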
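# Coverage (%) by service level, one box per region
ggplot(WA, aes(x = Service.level,
               y = Coverage,
               fill = Region)) +
  geom_boxplot() +
  labs(x = "Quality of Different Services",
       y = "% Coverage using Services",
       title = "% Coverage of Population vs Quality of Services in Different SDG Regions") +
  # Region is mapped to fill, so the palette has to be set on the fill scale
  scale_fill_brewer(palette = "Dark2")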
The coverage plot shows that Sub-Saharan Africa and Oceania have the highest coverage for the poorest service levels, such as unimproved, open defecation and no hand-washing facility.
It also shows that Sub-Saharan Africa and Oceania are the lowest in terms of safely managed services.
In addition, it shows that Sub-Saharan Africa and Oceania use surface water more than any other region. This could mean that they use it for drinking or for sanitary purposes.
The analysis and visualizations above show that regions like Sub-Saharan Africa and Oceania, which are behind other regions in terms of development, are also behind on drinking water.
This analysis opens the door for investigating questions such as how the mortality rate compares between developed and developing regions, or how urbanization and unsanitary behavior each affect health.
A key point for further investigation might also be which aspects of development, such as political instability, employment, public literacy, GDP, geographical difficulties and international affairs, may be affecting drinking water in the various regions.