Safe Water Life Expectancy by Region/Continent

About The Data

This data set is a combination of two sources. The safe water life expectancy data set has only three variables. It was merged with the nations data set from the course data set resources. The merged data now has five variables: country, region, percentage of the population with safe water, average life expectancy, and income. There were more than one thousand observations, now reduced to 81, so no major cleaning of this data was needed. Here is an article that gives quick background on my topic: https://pubmed.ncbi.nlm.nih.gov/10845266

Introduction and Brief

When it comes to quality of life on a global scale, it is imperative to look at the basic needs that are available to the people of individual countries and regions. Using the nations and safe water life expectancy data, I am interested in figuring out the factors that influence the life expectancy of people all around the world. I also wanted to find out more about the region I come from, because I have experienced this situation. I asked myself: is it because our region is extremely poor, or because we don't have opportunities? So I was eager to learn about other regions and countries as well. What factors might link safe water and life expectancy? Does having safe water increase your life expectancy?

I believe some variables are directly correlated with life expectancy, namely the percentage of the population with safe water and income. These are basic needs that the citizens of every country should be granted. In my project, I provide a more detailed analysis through my Tableau Public link, included below, where you can see the country names, regions, GDP, and life expectancy.

In fact, safe water in a country is essential for public health, whether it is used for drinking, domestic use, food production, agriculture, or recreational purposes. Improved water supply and sanitation can boost a country's economic growth and contribute greatly to poverty reduction. In my research project, I was curious to find out about safe water and life expectancy in every region of the world. The variables include country, income, region, average life expectancy, GDP per capita, and population.

Load the needed libraries

These are the libraries I need to be able to run my graphs.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(rvest)
## 
## Attaching package: 'rvest'
## The following object is masked from 'package:readr':
## 
##     guess_encoding
library(plotly)
## Warning: package 'plotly' was built under R version 4.2.1
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(cowplot)
## Warning: package 'cowplot' was built under R version 4.2.1
library(ggplot2)
library(RColorBrewer)
library(dplyr)
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

Load the data

setwd("C:/Users/baise/OneDrive/Desktop/Baidata110summer")
nations <- read_csv("nations.csv")
## Rows: 5275 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): iso2c, iso3c, country, region, income
## dbl (5): year, gdp_percap, population, birth_rate, neonat_mortal_rate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
safewater <- read_csv("SafeWaterLifeExpectancy1.csv")
## Rows: 81 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country
## dbl (2): percent_pop_safe_water, avgLifeExp
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Create the merged data

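# nations has one row per country and year, so keep one row per country
nations2 <- nations %>%
  select(country, region, income) %>%
  distinct()
# Add region and income to each country in the safe water data
safenew <- safewater %>%
  left_join(nations2, by = "country")
# Drop any remaining duplicate rows
safe2 <- distinct(safenew)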
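The describe() output later in this report shows n = 78 for region and income, which suggests that three countries in the safe water data found no match in the nations data. Here is a minimal sketch of how unmatched names could be listed with anti_join(). The two small tables and the spelling mismatch are made up for illustration and are not taken from the real files.

```r
library(dplyr)

# Hypothetical stand-ins for the two files (values are made up)
safewater_demo <- tibble(
  country = c("Uganda", "Nepal", "Korea, South"),
  percent_pop_safe_water = c(6.4, 26.8, 98.0)
)
nations_demo <- tibble(
  country = c("Uganda", "Uganda", "Nepal", "Korea, Rep."),
  region  = c("Sub-Saharan Africa", "Sub-Saharan Africa",
              "South Asia", "East Asia & Pacific")
)

# One row per country before joining, so the join does not multiply rows
lookup <- distinct(nations_demo, country, region)

# Rows of safewater_demo whose country has no match in the lookup table
unmatched <- anti_join(safewater_demo, lookup, by = "country")
print(unmatched)

# After a left join, the unmatched countries have NA for region
joined <- left_join(safewater_demo, lookup, by = "country")
print(sum(is.na(joined$region)))
```

If the real data shows a few unmatched names, they could be fixed by recoding the spelling in one of the tables before the join.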

View the data

I previewed this chunk to see some of the countries and regions at the bottom of the list, which are low income or poor countries. I was curious to know which countries are at the bottom. It was not a surprise to me to see countries from Africa and South Asia. As you can see, the percentage of the population with safe water in these countries is low. In my Tableau data, you will be able to view each country's population, life expectancy, and GDP per capita.

head(safe2)
## # A tibble: 6 × 5
##   country  percent_pop_safe_water avgLifeExp region              income         
##   <chr>                     <dbl>      <dbl> <chr>               <chr>          
## 1 Uganda                     6.44       59.5 Sub-Saharan Africa  Low income     
## 2 Ethiopia                  10.5        65.0 Sub-Saharan Africa  Low income     
## 3 Nigeria                   19.4        53.0 Sub-Saharan Africa  Lower middle i…
## 4 Cambodia                  24.1        68.5 East Asia & Pacific Low income     
## 5 Nepal                     26.8        69.9 South Asia          Low income     
## 6 Ghana                     26.9        62.4 Sub-Saharan Africa  Lower middle i…

This is a quick overview of the variables in the data set: country, percentage of the population with safe water, average life expectancy, region, and income. As the table displays, the first six rows are mostly in Africa and Asia. This is because the list appears to be ordered from the lowest percentage of population with safe water, and I also believe it is because these regions are extremely poor. The percentage of the population with safe water in these countries is below 50%, which may potentially have an impact on their life expectancy. Other factors include the lack of income, resources, and good health care facilities.
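To look only at low income countries rather than the first six rows, the data could be filtered and sorted explicitly. This is a small sketch that uses the column names from this data set, with made-up countries and values.

```r
library(dplyr)

# Hypothetical rows using the same column names as safe2
demo <- tibble(
  country = c("A", "B", "C", "D"),
  percent_pop_safe_water = c(40, 10, 95, 25),
  income = c("Low income", "Low income", "High income", "Lower middle income")
)

# Keep only low income countries, lowest safe water percentage first
low_income <- demo %>%
  filter(income == "Low income") %>%
  arrange(percent_pop_safe_water)

print(low_income)
```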

Here is a quick summary of the data

This is a summary of the rows and columns. As you can see, there are 81 rows and 5 columns in this data set.

str(safe2)
## tibble [81 × 5] (S3: tbl_df/tbl/data.frame)
##  $ country               : chr [1:81] "Uganda" "Ethiopia" "Nigeria" "Cambodia" ...
##  $ percent_pop_safe_water: num [1:81] 6.44 10.54 19.4 24.1 26.75 ...
##  $ avgLifeExp            : num [1:81] 59.5 65 53 68.5 69.9 ...
##  $ region                : chr [1:81] "Sub-Saharan Africa" "Sub-Saharan Africa" "Sub-Saharan Africa" "East Asia & Pacific" ...
##  $ income                : chr [1:81] "Low income" "Low income" "Lower middle income" "Low income" ...

Describing the data

I describe the data because I wanted to view all the information the data contains.

describe(safe2)
##                        vars  n  mean    sd median trimmed   mad   min    max
## country*                  1 81 41.00 23.53  41.00   41.00 29.65  1.00  81.00
## percent_pop_safe_water    2 81 79.40 24.87  91.69   83.84 10.68  6.44 100.00
## avgLifeExp                3 81 75.68  6.55  76.64   76.51  7.04 52.98  83.84
## region*                   4 78  2.92  1.63   2.00    2.66  0.00  1.00   7.00
## income*                   5 78  3.21  1.42   3.00    3.25  1.48  1.00   5.00
##                        range  skew kurtosis   se
## country*               80.00  0.00    -1.24 2.61
## percent_pop_safe_water 93.56 -1.30     0.57 2.76
## avgLifeExp             30.87 -1.26     1.90 0.73
## region*                 6.00  1.41     0.84 0.18
## income*                 4.00  0.05    -1.54 0.16
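The rows marked with an asterisk (country, region, income) are text variables that describe() converts to numbers, so their mean and sd are not meaningful. For those columns, a count per category and a grouped summary may be easier to read. This is a sketch with made-up rows that reuse the column names of this data set.

```r
library(dplyr)

# Hypothetical rows; one country has no region, as after an unmatched join
demo <- tibble(
  country = c("A", "B", "C", "D", "E"),
  region = c("South Asia", "South Asia",
             "Sub-Saharan Africa", "Sub-Saharan Africa", NA),
  percent_pop_safe_water = c(30, 50, 10, 20, 90),
  avgLifeExp = c(68, 70, 55, 61, 80)
)

# How many countries fall in each region (NA is shown as its own row)
region_counts <- count(demo, region)
print(region_counts)

# Mean safe water percentage and life expectancy per region
region_summary <- demo %>%
  filter(!is.na(region)) %>%
  group_by(region) %>%
  summarise(
    n_countries = n(),
    mean_safe_water = mean(percent_pop_safe_water),
    mean_life_exp = mean(avgLifeExp),
    .groups = "drop"
  )
print(region_summary)
```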

Plot One

This graph shows the regions and the percentage of the population with safe water. Each dot represents a country within its region, and the data covers 1996-2014. Each country has its own percentage of population with safe water. I made the plot interactive so that you can hover over each dot and read its values.

Plot one overview

In this plot, there are a few dots in each region, and each dot represents a country. The data is from 1996 to 2014, and the percentage of the population with safe water has changed over those years.

The first region is East Asia and Pacific. In this region the percentage of the population with safe water has been gradually increasing. Many of these countries have a very high income and GDP, such as China, Japan, South Korea, Australia, and New Zealand, so their percentage of population with safe water is very good. Their life expectancy is probably high as well, because they have all types of resources, including good health care facilities.

Next is Europe and Central Asia. There are a lot of dots in this region, meaning there are a lot of countries in it, including Turkey, Ukraine, Russia, Hungary, Romania, Armenia, and Belarus. These countries used to be behind, especially in the percentage of the population with safe water, but as technology has advanced they have been improving every year. Back in the 1990s, their life expectancy was below average. Now it has increased tremendously.

Third, the Latin America and Caribbean region also has a good percentage of population with safe water, and it has been growing over the years. Its countries include Mexico, Peru, Argentina, Brazil, Colombia, and Costa Rica. There are 33 countries in this region, and they have been improving in terms of the percentage of the population getting safe water.

Fourth, the Middle East and North Africa. There are 21 countries in this region.

Fifth, North America is one of the most developed continents in the world. Its countries include the United States and Canada, and the continent has 23 countries. The percentage of the population with safe water has increased tremendously.

Sixth, South Asia is still developing. Its percentage of population with safe water is below average, which might be because its GDP is too low to support a good standard of living.

Lastly, Sub-Saharan Africa is where most of the poorest countries can be found. This region will take decades to reach a better percentage of population with safe water, because there are not enough resources available to support its people. As a result, life expectancy is very low. Health care is poor, and income and other facilities are very limited. The GDP is very low in this region. I created a Tableau dashboard that shows each country, region, GDP, and average life expectancy.

Tableau link

This link provides a clear view of regions, countries, GDP, population, and life expectancy, in case you are wondering why the percentage of population with safe water is low or high in a region. The chart gives a detailed summary of the variables, from which you can infer the differences. Here is the link:

https://public.tableau.com/app/profile/bai.sesay/viz/LifeExpectancyandWaterAccess/SafeWaterLifeExpectancybyRegionandCountry?publish=yes

# Each dot is one country; coord_flip() puts the regions on the vertical axis
p <- safe2 %>% 
  ggplot(aes(region, percent_pop_safe_water, color = region)) +
  geom_point() +
  coord_flip() +
  xlab("region") + ylab("percent_pop_safe_water") +
  ggtitle("Percent Population safe water by region")
# Convert to an interactive plot so each dot can be hovered over
ggplotly(p)

Plot two

I created this box plot to show some outliers in some developed regions. These outliers may indicate that some countries in these regions have a lower percentage of population with safe water. The contributing factors may be low GDP, overpopulation, insufficient income, and low life expectancy.

# fill = region (unquoted) colors each box by its region
plot2 <- safe2 %>%
  ggplot(aes(x = region, y = percent_pop_safe_water, fill = region)) +
  geom_boxplot() + coord_flip() + ggtitle("percent_pop_safe_water by regions")
plot2
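To see which countries are behind the outlier points, the usual box plot rule (values more than 1.5 times the interquartile range beyond the quartiles) can be applied per region. This is a sketch on made-up values. The region name and countries are hypothetical, and quantile() may compute the quartiles slightly differently from the hinges that geom_boxplot() uses.

```r
library(dplyr)

# Hypothetical region with one country far below the others
demo <- tibble(
  country = paste0("C", 1:6),
  region = rep("Region X", 6),
  percent_pop_safe_water = c(95, 96, 97, 98, 99, 40)
)

# Flag values more than 1.5 * IQR beyond the quartiles, within each region
outliers <- demo %>%
  group_by(region) %>%
  mutate(
    q1 = quantile(percent_pop_safe_water, 0.25),
    q3 = quantile(percent_pop_safe_water, 0.75),
    iqr = q3 - q1
  ) %>%
  ungroup() %>%
  filter(percent_pop_safe_water < q1 - 1.5 * iqr |
           percent_pop_safe_water > q3 + 1.5 * iqr)

print(select(outliers, country, region, percent_pop_safe_water))
```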

Plot 3

This plot is where I focus my final visualization. I am only interested in regions, the percentage of population with safe water, and life expectancy. The countries, population, and GDP can be viewed and compared by region.

# Bar chart counting the countries in each region, one panel per region
safe2 %>%
  ggplot(aes(x = region, fill = region)) + 
  geom_bar(alpha = 0.8) +
  coord_flip() +
  facet_wrap(vars(region))

To sum up

The first graph shows the percentage of the population with safe water by region. The interactive plot enables the reader to explore the regions over the period the data covers, 1996-2014. As we have seen from the graph, most regions improved over time, while some regions, like Sub-Saharan Africa and South Asia, declined.

According to World Vision, not having safe water not only poses a threat to life expectancy in the world's poorest regions and countries, but also exposes girls and women to abuse, especially sexual abuse. In Africa, girls and women walk very long distances to fetch water for their homes, mostly in the dry season, and they are usually alone, especially at night on their way back.

Illness is another consequence. Many water sources that are not potable are filled with harmful bacteria, which make children and adults sick and threaten their lives and futures. Illnesses may include typhoid, malaria, diarrhea, dysentery, polio, and cholera, and even blindness: washing with dirty water can cause eye diseases such as trachoma, which can lead to blindness if untreated. Lost education can be an even greater problem, because reaching safe, potable water can take hours. That may cause children and teenagers to miss school, or to be too tired to do well in school.

To reflect, when I was a teenager living in my country, we usually walked miles at 12-1 am to fetch potable water, especially in the dry season. The weather there is tropical, with two seasons, rainy and dry, of six months each. In the dry season it is very difficult to find safe drinking water unless you walk miles in the middle of the night.

The data suggests that safe water access is strongly correlated with economic development in regions and countries. Improvement in economic conditions is also a vital force behind life expectancy. We can conclude that life expectancy increases as a country improves its standard of living. If I had more time for this project, I would research the mortality and fertility rates to see whether high income countries have good health care access and facilities and a better mortality rate.
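The relationship between safe water and life expectancy could also be checked with a correlation coefficient. This is a minimal sketch on made-up values that reuse the column names of this data set. The numbers are hypothetical, and a high correlation on the real data would still not prove that safe water alone causes longer lives.

```r
# Hypothetical values using the same column names as safe2
demo <- data.frame(
  percent_pop_safe_water = c(10, 25, 50, 80, 95, 100),
  avgLifeExp = c(55, 62, 66, 74, 79, 82)
)

# Pearson correlation between safe water access and life expectancy
r <- cor(demo$percent_pop_safe_water, demo$avgLifeExp, use = "complete.obs")
print(r)

# Test whether the correlation differs from zero
test <- cor.test(demo$percent_pop_safe_water, demo$avgLifeExp)
print(test$p.value)
```

With the real data, the same two calls could be run on safe2$percent_pop_safe_water and safe2$avgLifeExp.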

Definition of terms

  1. Region: an area or division, especially part of a country or the world, having definable characteristics but not always fixed boundaries.

  2. Percentage population: divide the target demographic by the entire population, and then multiply the result by 100 to convert it to a percentage.

  3. Safe water: potable water, that is, water that is safe to be used as drinking water.

  4. Gross Domestic Product (GDP): a monetary measure of the market value of all the final goods and services produced in a specific time period by a country (Wikipedia).
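The percentage population calculation defined above can be written out as a short example. The two counts are hypothetical and are only there to show the arithmetic.

```r
# Hypothetical counts: 450 people with safe water out of a population of 600
people_with_safe_water <- 450
total_population <- 600

# Divide the target group by the whole population, then multiply by 100
percent_pop_safe_water <- people_with_safe_water / total_population * 100
print(percent_pop_safe_water)  # 75
```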

Sources

Source i: https://pubmed.ncbi.nlm.nih.gov/10845266

Source ii: https://www.globalcitizen.org/en/content/why-do-so-many-people-still-struggle-to-access-cle/

Source iii: https://www.prb.org/resources/clean-waters-historic-effect-on-u-s-mortality-rates-provides-hope-for-developing-countries/