Data Preparation

library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(dplyr)
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Population Growth Data set

# Original population growth dataset (pg)
# Read the original .csv file, which was downloaded from http://data.un.org/

pg_original_file <- read.csv("https://raw.githubusercontent.com/gpadmaperuma/DATA606/master/SYB62_T03_201907_Population%20Growth%2C%20Fertility%20and%20Mortality%20Indicators.csv", header = TRUE, skip = 1)
head(pg_original_file) %>% kable()
| Region.Country.Area | X | Year | Series | Value | Footnotes | Source |
|---|---|---|---|---|---|---|
| 1 | Total, all countries or areas | 2005 | Population annual rate of increase (percent) | 1.2570 | Data refers to a 5-year period preceding the reference year. | United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019. |
| 1 | Total, all countries or areas | 2005 | Total fertility rate (children per women) | 2.6513 | Data refers to a 5-year period preceding the reference year. | United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019. |
| 1 | Total, all countries or areas | 2005 | Infant mortality for both sexes (per 1,000 live births) | 49.2161 | Data refers to a 5-year period preceding the reference year. | United Nations Statistics Division, New York, “Demographic Yearbook 2015” and the demographic statistics database, last accessed June 2017. |
| 1 | Total, all countries or areas | 2005 | Maternal mortality ratio (deaths per 100,000 population) | 288.0000 | | World Health Organization (WHO), the United Nations Children’s Fund (UNICEF), the United Nations Population Fund (UNFPA), the World Bank and the United Nations Population Division, “Trends in Maternal Mortality 1990 - 2015.” |
| 1 | Total, all countries or areas | 2005 | Life expectancy at birth for both sexes (years) | 67.0455 | Data refers to a 5-year period preceding the reference year. | United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019. |
| 1 | Total, all countries or areas | 2005 | Life expectancy at birth for males (years) | 64.8082 | Data refers to a 5-year period preceding the reference year. | United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019. |
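
The effect of `skip = 1` and the origin of the column named `X` can be illustrated on a small inline example. This is only a sketch: it assumes, judging from the column names printed above, that the UNdata file starts with a title line followed by a header row whose second column name is blank. The text in `toy_text` is made up for illustration and is not the real file.

```r
# Toy CSV text mimicking the ASSUMED layout of the UNdata file:
# line 1 is a title row, line 2 is the real header (its second name is blank).
toy_text <- paste(
  "T03,Toy title row,,,",
  "Region/Country/Area,,Year,Series,Value",
  "1,\"Total, all countries or areas\",2005,Population annual rate of increase (percent),1.2570",
  sep = "\n"
)

# skip = 1 drops the title row, so the second line is used as the header.
toy <- read.csv(text = toy_text, header = TRUE, skip = 1)

# read.csv() makes the names syntactically valid:
# "/" becomes "." and the blank name becomes "X".
names(toy)
```

If this assumption about the file layout holds, it would explain why the area names end up in a column called `X` and why that column is renamed later in the report.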

Population Data set

#Original population dataset(p)
p_original <- read.csv("https://raw.githubusercontent.com/gpadmaperuma/DATA606/master/Population%2C%20Surface%20Area%20and%20Density.csv", header = TRUE, skip = 1)
head(p_original) %>% kable()
| Region.Country.Area | X | Year | Series | Value | Footnotes | Source |
|---|---|---|---|---|---|---|
| 1 | Total, all countries or areas | 2005 | Population mid-year estimates (millions) | 6541.9070 | | United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019. |
| 1 | Total, all countries or areas | 2005 | Population mid-year estimates for males (millions) | 3296.4853 | | United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019. |
| 1 | Total, all countries or areas | 2005 | Population mid-year estimates for females (millions) | 3245.4217 | | United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019. |
| 1 | Total, all countries or areas | 2005 | Sex ratio (males per 100 females) | 101.5734 | | United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019. |
| 1 | Total, all countries or areas | 2005 | Population aged 0 to 14 years old (percentage) | 28.1425 | | United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019. |
| 1 | Total, all countries or areas | 2005 | Population aged 60+ years old (percentage) | 10.2516 | | United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019. |
dim(pg_original_file)
## [1] 4984    7
dim(p_original)
## [1] 7351    7

Research question

In analyzing these data I will try to answer two questions:
(1) Which region has the highest life expectancy? (2) Is there a relationship among population, fertility rate and infant mortality rate in these regions?
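
As a rough sketch of how the two questions might be approached once the data are in wide format, the snippet below uses a toy data frame. Its values are copied from the Africa and Asia rows printed later in this report; the object names `toy`, `latest`, `top_region` and `r_fert_inf` are hypothetical. Four rows from two regions are far too few to support a conclusion, so this only shows the mechanics.

```r
# Toy wide-format data; values copied from the 2005 and 2015 rows for
# Africa and Asia shown later in this report.
toy <- data.frame(
  Region = c("Africa", "Africa", "Asia", "Asia"),
  Year = c(2005, 2015, 2005, 2015),
  Infant_Mortality = c(81.0492, 55.9325, 45.8017, 29.5012),
  Life_Expectancy = c(53.5269, 60.2471, 68.3315, 71.8300),
  Total_fertility_rate = c(5.0771, 4.7301, 2.4467, 2.2098),
  stringsAsFactors = FALSE
)

# Question 1: region with the highest life expectancy in the latest year.
latest <- toy[toy$Year == max(toy$Year), ]
top_region <- latest$Region[which.max(latest$Life_Expectancy)]

# Question 2: a first look at association, here the Pearson correlation
# between fertility rate and infant mortality.
r_fert_inf <- cor(toy$Total_fertility_rate, toy$Infant_Mortality)
```

A correlation coefficient only describes linear association in observational data; it would not by itself show that one indicator drives the other.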

Cases

There are 4984 cases in the population growth dataset. Each case is the value of one population growth, fertility or mortality indicator for one region, country or area in one year. There are 7351 cases in the population dataset. Each case is the value of one population indicator (for example the population of males, females or seniors) for one region, country or area in one year.
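
The idea that one row is one case can be checked mechanically. The sketch below uses a toy data frame whose values are taken from the first rows of the population growth file; `toy_cases`, `n_cases` and `n_dup` are hypothetical names. It assumes that a case is identified by the combination of area, year and series, which has not been verified against the full files here.

```r
# Toy long-format rows; values taken from the head() of the population growth file.
toy_cases <- data.frame(
  X = rep("Total, all countries or areas", 3),
  Year = c(2005, 2005, 2005),
  Series = c(
    "Population annual rate of increase (percent)",
    "Total fertility rate (children per women)",
    "Infant mortality for both sexes (per 1,000 live births)"
  ),
  Value = c(1.2570, 2.6513, 49.2161),
  stringsAsFactors = FALSE
)

# The number of cases is the number of rows.
n_cases <- nrow(toy_cases)

# ASSUMPTION: a case is a unique (area, year, series) combination.
# anyDuplicated() returns 0 when no combination of the key columns repeats.
n_dup <- anyDuplicated(toy_cases[, c("X", "Year", "Series")])
```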

Data collection

These data were obtained from the United Nations database UNdata: A world of information.
UNdata is a web-based data service for the global user community. The data are maintained by the Statistics Division of the Department of Economic and Social Affairs (UN DESA) of the UN Secretariat. Most of the data are sourced from UN partner organizations such as UNICEF, UNDP, UNHCR and WHO.

Type of study

These data were obtained as part of UN research efforts aimed at solving economic, health and other problems around the world. They are observational data collected through UN research in those countries or regions.

Variables

Dependent Variable

The response variable for this dataset is Value, which is a quantitative variable. It holds the values of all the population, fertility and mortality indicators.

Independent Variable

The two qualitative independent variables are Region/Country/Area and Series, and the one quantitative independent variable is Year, the year the data refer to.
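
One way to separate quantitative from qualitative columns is to test each column's storage type. The following is a minimal sketch on a toy data frame with values taken from this report; `toy_vars` and `is_quant` are hypothetical names.

```r
# Toy data frame with the same kinds of columns as the tidied data.
toy_vars <- data.frame(
  X = c("Africa", "Asia"),
  Year = c(2005L, 2005L),
  Series = rep("Total fertility rate (children per women)", 2),
  Value = c(5.0771, 2.4467),
  stringsAsFactors = FALSE
)

# TRUE for columns stored as numbers, FALSE otherwise.
is_quant <- sapply(toy_vars, is.numeric)

names(toy_vars)[is_quant]    # candidates for quantitative variables
names(toy_vars)[!is_quant]   # candidates for qualitative variables
```

A numeric storage type does not guarantee a quantitative variable: in the original files `Region.Country.Area` is stored as a number but acts as an identifier code, so its mean and quartiles have no useful interpretation.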

Exploring data with Relevant summary statistics

Summary statistics for each of the variables, with appropriate visualizations.

#summary of original population growth file
summary(pg_original_file)
##  Region.Country.Area           X             Year     
##  Min.   :  1.0       Afghanistan:  21   Min.   :2000  
##  1st Qu.:152.0       Albania    :  21   1st Qu.:2005  
##  Median :388.0       Algeria    :  21   Median :2010  
##  Mean   :393.4       Angola     :  21   Mean   :2010  
##  3rd Qu.:624.0       Argentina  :  21   3rd Qu.:2015  
##  Max.   :894.0       Armenia    :  21   Max.   :2018  
##                      (Other)    :4858                 
##                                                       Series   
##  Infant mortality for both sexes (per 1,000 live births) :702  
##  Life expectancy at birth for both sexes (years)         :705  
##  Life expectancy at birth for females (years)            :735  
##  Life expectancy at birth for males (years)              :735  
##  Maternal mortality ratio (deaths per 100,000 population):573  
##  Population annual rate of increase (percent)            :799  
##  Total fertility rate (children per women)               :735  
##      Value         
##  Min.   :  -4.978  
##  1st Qu.:   3.074  
##  Median :  52.536  
##  Mean   :  57.959  
##  3rd Qu.:  73.586  
##  Max.   :1986.136  
##                    
##                                                                                                                                                                                                                                                                          Footnotes   
##  Data refers to a 5-year period preceding the reference year.                                                                                                                                                                                                                 :3835  
##                                                                                                                                                                                                                                                                               : 659  
##  Data refers to a 5-year period preceding the reference year.;For statistical purposes, the data for China do not include those for the Hong Kong Special Administrative Region (Hong Kong SAR), Macao Special Administrative Region (Macao SAR) and Taiwan Province of China.:  18  
##  Data refers to a 5-year period preceding the reference year.;Including Abkhazia and South Ossetia.                                                                                                                                                                           :  18  
##  Data refers to a 5-year period preceding the reference year.;Including Agalega, Rodrigues and Saint Brandon.                                                                                                                                                                 :  18  
##  Data refers to a 5-year period preceding the reference year.;Including Åland Islands.                                                                                                                                                                                        :  18  
##  (Other)                                                                                                                                                                                                                                                                      : 418  
##                                                                                                                                                                                                                                                                                                        Source    
##  United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019.                                                                                                                                                                                    : 799  
##  United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019.:2910  
##  United Nations Statistics Division, New York, "Demographic Yearbook 2015" and the demographic statistics database, last accessed June 2017.                                                                                                                                                              : 702  
##  World Health Organization (WHO), the United Nations Children's Fund (UNICEF), the United Nations Population Fund (UNFPA), the World Bank and the United Nations Population Division, "Trends in Maternal Mortality 1990 - 2015."                                                                         : 573  
##                                                                                                                                                                                                                                                                                                                  
##                                                                                                                                                                                                                                                                                                                  
## 
#summary of original population file
summary(p_original)
##  Region.Country.Area           X             Year     
##  Min.   :  1.0       Montserrat :  30   Min.   :2000  
##  1st Qu.:151.0       Afghanistan:  29   1st Qu.:2006  
##  Median :384.0       Africa     :  29   Median :2016  
##  Mean   :391.3       Albania    :  29   Mean   :2013  
##  3rd Qu.:624.0       Algeria    :  29   3rd Qu.:2017  
##  Max.   :894.0       Americas   :  29   Max.   :2019  
##                      (Other)    :7176                 
##                                                   Series    
##  Population density                                  :1115  
##  Population mid-year estimates (millions)            :1115  
##  Sex ratio (males per 100 females)                   :1018  
##  Population aged 0 to 14 years old (percentage)      : 993  
##  Population aged 60+ years old (percentage)          : 993  
##  Population mid-year estimates for females (millions): 925  
##  (Other)                                             :1192  
##      Value                                                    Footnotes   
##  Min.   :     0.00                                                 :6373  
##  1st Qu.:     5.11   Projected estimate (medium fertility variant).:  88  
##  Median :    21.73   De jure population.                           :  55  
##  Mean   :   201.69   Calculated by the UN Statistics Division.     :  32  
##  3rd Qu.:    94.51   Including Åland Islands.                      :  29  
##  Max.   :136162.00   Including Svalbard and Jan Mayen Islands.     :  29  
##                      (Other)                                       : 745  
##                                                                                                                                                                                                                                                                                                        Source    
##  United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019.                                                                                                                                                                                    :4080  
##  United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019.:3004  
##  United Nations Statistics Division, New York, "Demographic Yearbook 2015" and the demographic statistics database, last accessed June 2017.                                                                                                                                                              : 267  
##                                                                                                                                                                                                                                                                                                                  
##                                                                                                                                                                                                                                                                                                                  
##                                                                                                                                                                                                                                                                                                                  
## 
#Population Growth
describe(pg_original_file)
##                     vars    n    mean     sd  median trimmed    mad
## Region.Country.Area    1 4984  393.39 264.59  388.00  385.88 349.89
## X*                     2 4984  133.17  76.52  132.00  133.08  99.33
## Year                   3 4984 2009.95   4.12 2010.00 2009.97   7.41
## Series*                4 4984    4.03   2.02    4.00    4.03   2.97
## Value                  5 4984   57.96 108.95   52.54   41.12  43.83
## Footnotes*             6 4984   10.95   6.20   11.00   10.55   0.00
## Source*                7 4984    2.21   0.85    2.00    2.14   0.00
##                         min     max   range  skew kurtosis   se
## Region.Country.Area    1.00  894.00  893.00  0.16    -1.23 3.75
## X*                     1.00  265.00  264.00  0.01    -1.20 1.08
## Year                2000.00 2018.00   18.00 -0.03    -1.42 0.06
## Series*                1.00    7.00    6.00  0.00    -1.28 0.03
## Value                 -4.98 1986.14 1991.11  6.45    58.89 1.54
## Footnotes*             1.00   42.00   41.00  1.70     6.19 0.09
## Source*                1.00    4.00    3.00  0.72     0.07 0.01
#Population
describe(p_original)
##                     vars    n    mean      sd  median trimmed    mad  min
## Region.Country.Area    1 7351  391.27  265.72  384.00  383.42 349.89    1
## X*                     2 7351  133.16   76.79  132.00  133.03  99.33    1
## Year                   3 7351 2012.69    5.62 2016.00 2012.91   4.45 2000
## Series*                4 7351    4.11    2.09    4.00    4.09   2.97    1
## Value                  5 7351  201.69 2094.52   21.73   38.65  29.74    0
## Footnotes*             6 7351    6.18   14.86    1.00    1.54   0.00    1
## Source*                7 7351    1.48    0.57    1.00    1.43   0.00    1
##                        max  range  skew kurtosis    se
## Region.Country.Area    894    893  0.16    -1.23  3.10
## X*                     266    265  0.02    -1.20  0.90
## Year                  2019     19 -0.29    -1.49  0.07
## Series*                  8      7  0.10    -1.14  0.02
## Value               136162 136162 41.93  2475.20 24.43
## Footnotes*              74     73  2.84     6.86  0.17
## Source*                  3      2  0.67    -0.57  0.01
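
Because `Value` pools several indicators measured in different units, overall figures such as the mean of 57.96 reported by `describe()` for the population growth file are hard to interpret. A per-series summary may be more informative. The sketch below shows the idea on a toy data frame with values taken from this report; `toy_long` and `by_series` are hypothetical names, and the same pattern would need to be applied to the full data to be meaningful.

```r
library(dplyr)

# Toy long-format data: two indicators, two areas each (2005 values from this report).
toy_long <- data.frame(
  X = c("Total, all countries or areas", "Africa",
        "Total, all countries or areas", "Africa"),
  Series = c("Total fertility rate (children per women)",
             "Total fertility rate (children per women)",
             "Life expectancy at birth for both sexes (years)",
             "Life expectancy at birth for both sexes (years)"),
  Value = c(2.6513, 5.0771, 67.0455, 53.5269),
  stringsAsFactors = FALSE
)

# One row of summary statistics per indicator, so units are never mixed.
by_series <- toy_long %>%
  group_by(Series) %>%
  summarise(n = n(), mean_value = mean(Value), sd_value = sd(Value))

by_series
```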
#Population Growth
head(pg_original_file)
##   Region.Country.Area                             X Year
## 1                   1 Total, all countries or areas 2005
## 2                   1 Total, all countries or areas 2005
## 3                   1 Total, all countries or areas 2005
## 4                   1 Total, all countries or areas 2005
## 5                   1 Total, all countries or areas 2005
## 6                   1 Total, all countries or areas 2005
##                                                     Series    Value
## 1             Population annual rate of increase (percent)   1.2570
## 2                Total fertility rate (children per women)   2.6513
## 3  Infant mortality for both sexes (per 1,000 live births)  49.2161
## 4 Maternal mortality ratio (deaths per 100,000 population) 288.0000
## 5          Life expectancy at birth for both sexes (years)  67.0455
## 6               Life expectancy at birth for males (years)  64.8082
##                                                      Footnotes
## 1 Data refers to a 5-year period preceding the reference year.
## 2 Data refers to a 5-year period preceding the reference year.
## 3 Data refers to a 5-year period preceding the reference year.
## 4                                                             
## 5 Data refers to a 5-year period preceding the reference year.
## 6 Data refers to a 5-year period preceding the reference year.
##                                                                                                                                                                                                                                                                                                      Source
## 1                                                                                                                                                                                     United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019.
## 2 United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019.
## 3                                                                                                                                                               United Nations Statistics Division, New York, "Demographic Yearbook 2015" and the demographic statistics database, last accessed June 2017.
## 4                                                                          World Health Organization (WHO), the United Nations Children's Fund (UNICEF), the United Nations Population Fund (UNFPA), the World Bank and the United Nations Population Division, "Trends in Maternal Mortality 1990 - 2015."
## 5 United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019.
## 6 United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019.
#Population
head(p_original)
##   Region.Country.Area                             X Year
## 1                   1 Total, all countries or areas 2005
## 2                   1 Total, all countries or areas 2005
## 3                   1 Total, all countries or areas 2005
## 4                   1 Total, all countries or areas 2005
## 5                   1 Total, all countries or areas 2005
## 6                   1 Total, all countries or areas 2005
##                                                 Series     Value Footnotes
## 1             Population mid-year estimates (millions) 6541.9070          
## 2   Population mid-year estimates for males (millions) 3296.4853          
## 3 Population mid-year estimates for females (millions) 3245.4217          
## 4                    Sex ratio (males per 100 females)  101.5734          
## 5       Population aged 0 to 14 years old (percentage)   28.1425          
## 6           Population aged 60+ years old (percentage)   10.2516          
##                                                                                                                                                                                                                                                                                                      Source
## 1                                                                                                                                                                                     United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019.
## 2                                                                                                                                                                                     United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019.
## 3                                                                                                                                                                                     United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2019.
## 4 United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019.
## 5 United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019.
## 6 United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2015 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2019.

Tidying data for easy visualization

The original data include values for both regions and countries. I will create two subsets, one for regions and one for countries, which makes it easier to visualize the data in an organized manner.

# Drop the unneeded columns from the original file and save the result as a new data frame.
UN_PopulationGrowth <-
  select(pg_original_file, -c("Region.Country.Area","Footnotes", "Source"))

head(UN_PopulationGrowth)
##                               X Year
## 1 Total, all countries or areas 2005
## 2 Total, all countries or areas 2005
## 3 Total, all countries or areas 2005
## 4 Total, all countries or areas 2005
## 5 Total, all countries or areas 2005
## 6 Total, all countries or areas 2005
##                                                     Series    Value
## 1             Population annual rate of increase (percent)   1.2570
## 2                Total fertility rate (children per women)   2.6513
## 3  Infant mortality for both sexes (per 1,000 live births)  49.2161
## 4 Maternal mortality ratio (deaths per 100,000 population) 288.0000
## 5          Life expectancy at birth for both sexes (years)  67.0455
## 6               Life expectancy at birth for males (years)  64.8082
UN_Population <-
  select(p_original, -c("Region.Country.Area","Footnotes", "Source"))

head(UN_Population)
##                               X Year
## 1 Total, all countries or areas 2005
## 2 Total, all countries or areas 2005
## 3 Total, all countries or areas 2005
## 4 Total, all countries or areas 2005
## 5 Total, all countries or areas 2005
## 6 Total, all countries or areas 2005
##                                                 Series     Value
## 1             Population mid-year estimates (millions) 6541.9070
## 2   Population mid-year estimates for males (millions) 3296.4853
## 3 Population mid-year estimates for females (millions) 3245.4217
## 4                    Sex ratio (males per 100 females)  101.5734
## 5       Population aged 0 to 14 years old (percentage)   28.1425
## 6           Population aged 60+ years old (percentage)   10.2516

Creating a subset for Regions

#Population Growth by region
PG_Region <- UN_PopulationGrowth %>%
  slice(22:564)

names(PG_Region)[names(PG_Region) == "X"] <- "Region"

head(PG_Region)
##   Region Year                                                  Series
## 1 Africa 2005            Population annual rate of increase (percent)
## 2 Africa 2005               Total fertility rate (children per women)
## 3 Africa 2005 Infant mortality for both sexes (per 1,000 live births)
## 4 Africa 2005         Life expectancy at birth for both sexes (years)
## 5 Africa 2005              Life expectancy at birth for males (years)
## 6 Africa 2005            Life expectancy at birth for females (years)
##     Value
## 1  2.4390
## 2  5.0771
## 3 81.0492
## 4 53.5269
## 5 51.9582
## 6 55.1343
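
Selecting rows by position, as `slice(22:564)` does, depends on the row order of the downloaded file. A name-based filter is a possible alternative. The sketch below is illustrative only: `toy_growth`, `region_names` and `toy_regions` are hypothetical names, the list of region names is deliberately incomplete, and the `NA` for Afghanistan is a placeholder rather than a real figure.

```r
library(dplyr)

# Toy data mixing a world total, two regions and one country.
toy_growth <- data.frame(
  X = c("Total, all countries or areas", "Africa", "Asia", "Afghanistan"),
  Year = c(2005, 2005, 2005, 2005),
  Series = rep("Total fertility rate (children per women)", 4),
  Value = c(2.6513, 5.0771, 2.4467, NA),  # NA is a placeholder, not real data
  stringsAsFactors = FALSE
)

# HYPOTHETICAL and incomplete list of names to treat as regions.
region_names <- c("Africa", "Asia")

# Keep only rows whose area name is in the list, then rename X to Region.
toy_regions <- toy_growth %>%
  dplyr::filter(X %in% region_names) %>%
  rename(Region = X)

toy_regions
```

The trade-off is that the list of region names has to be maintained by hand, but the result no longer changes if rows are added to or reordered in the source file.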
#Population by Region
Total_Population_Region <- UN_Population %>%
  slice(30:870)

names(Total_Population_Region)[names(Total_Population_Region) == "X"] <- "Region"
head(Total_Population_Region)
##   Region Year                                               Series
## 1 Africa 2005             Population mid-year estimates (millions)
## 2 Africa 2005   Population mid-year estimates for males (millions)
## 3 Africa 2005 Population mid-year estimates for females (millions)
## 4 Africa 2005                    Sex ratio (males per 100 females)
## 5 Africa 2005       Population aged 0 to 14 years old (percentage)
## 6 Africa 2005           Population aged 60+ years old (percentage)
##      Value
## 1 916.1543
## 2 456.6481
## 3 459.5062
## 4  99.3780
## 5  41.8707
## 6   5.1112
PG_by_Region <- PG_Region %>%
  spread(key = Series, value = Value)

names(PG_by_Region)[names(PG_by_Region) == "Infant mortality for both sexes (per 1,000 live births)"] <- "Infant_Mortality"
names(PG_by_Region)[names(PG_by_Region) == "Life expectancy at birth for both sexes (years)"] <- "Life_Expectancy"
names(PG_by_Region)[names(PG_by_Region) == "Maternal mortality ratio (deaths per 100,000 population)"] <- "Maternal_mortality_ratio"
names(PG_by_Region)[names(PG_by_Region) == "Life expectancy at birth for males (years)"] <- "LifeExpectancy_males"
names(PG_by_Region)[names(PG_by_Region) == "Life expectancy at birth for females (years)"] <- "LifeExpectancy_females"
names(PG_by_Region)[names(PG_by_Region) == "Population annual rate of increase (percent)"] <- "Population_increase_rate"
names(PG_by_Region)[names(PG_by_Region) == "Total fertility rate (children per women)"] <- "Total_fertility_rate"
# Note: this line has no effect here, because "X" was already renamed to "Region" in PG_Region.
names(PG_by_Region)[names(PG_by_Region) == "X"] <- "Region"
head(PG_by_Region)
##   Region Year Infant_Mortality Life_Expectancy LifeExpectancy_females
## 1 Africa 2005          81.0492         53.5269                55.1343
## 2 Africa 2010          67.7143         56.7825                58.3389
## 3 Africa 2015          55.9325         60.2471                61.9302
## 4   Asia 2005          45.8017         68.3315                70.1224
## 5   Asia 2010          37.1114         70.0293                71.9654
## 6   Asia 2015          29.5012         71.8300                74.0127
##   LifeExpectancy_males Maternal_mortality_ratio Population_increase_rate
## 1              51.9582                       NA                    2.439
## 2              55.2459                       NA                    2.522
## 3              58.5824                       NA                    2.581
## 4              66.6483                       NA                    1.227
## 5              68.2175                       NA                    1.132
## 6              69.8000                       NA                    1.036
##   Total_fertility_rate
## 1               5.0771
## 2               4.9000
## 3               4.7301
## 4               2.4467
## 5               2.3281
## 6               2.2098
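
The reshape-and-rename step can also be written as a single pipeline with `dplyr::rename()`, which takes `new_name = old_name` pairs. This is a sketch on a toy data frame with values from this report, not the code used to produce the output above; `toy_region` and `toy_wide` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Toy long-format data: two indicators for two regions in 2005.
toy_region <- data.frame(
  Region = c("Africa", "Africa", "Asia", "Asia"),
  Year = c(2005, 2005, 2005, 2005),
  Series = c("Total fertility rate (children per women)",
             "Life expectancy at birth for both sexes (years)",
             "Total fertility rate (children per women)",
             "Life expectancy at birth for both sexes (years)"),
  Value = c(5.0771, 53.5269, 2.4467, 68.3315),
  stringsAsFactors = FALSE
)

# spread() turns each Series into its own column; rename() then shortens
# the long column names (backticks are needed because they contain spaces).
toy_wide <- toy_region %>%
  spread(key = Series, value = Value) %>%
  rename(
    Life_Expectancy = `Life expectancy at birth for both sexes (years)`,
    Total_fertility_rate = `Total fertility rate (children per women)`
  )

toy_wide
```

In tidyr 1.0.0 and later, `pivot_wider()` is the newer counterpart of `spread()`; `spread()` is used here because it matches the tidyr version loaded in this report.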
P_by_Region <- Total_Population_Region %>%
  spread(key = Series, value = Value)

names(P_by_Region)[names(P_by_Region) == "Population mid-year estimates (millions)"] <- "Pop.est.total"
names(P_by_Region)[names(P_by_Region) == "Population mid-year estimates for males (millions)"] <- "Pop.est.males"
names(P_by_Region)[names(P_by_Region) == "Population mid-year estimates for females (millions)"] <- "Pop.est.females"
names(P_by_Region)[names(P_by_Region) == "Sex ratio (males per 100 females)"] <- "m.to.f.ratio"

P_by_Region <-
  select(P_by_Region, -c("Population aged 0 to 14 years old (percentage)","Population aged 60+ years old (percentage)", "Surface area (thousand km2)"))

head(P_by_Region)
##     Region Year Population density Pop.est.total Pop.est.females
## 1   Africa 2005            30.9005      916.1543        459.5062
## 2   Africa 2010            35.0542     1039.3040        521.0514
## 3   Africa 2017            41.9658     1244.2223        622.8357
## 4   Africa 2019            44.1191     1308.0642        654.5505
## 5 Americas 2005            20.9061      884.7882        448.2641
## 6 Americas 2010            22.0840      934.6398        473.6061
##   Pop.est.males m.to.f.ratio
## 1      456.6481      99.3780
## 2      518.2526      99.4629
## 3      621.3865      99.7673
## 4      653.5137      99.8416
## 5      436.5240      97.3810
## 6      461.0337      97.3454
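
The spread-then-rename steps above can also be written as one pipeline. The following is a minimal sketch, not the method used in this report: `long_sample` and `wide_sample` are hypothetical names, and the four rows are the Africa values shown in the output above. It assumes `dplyr::rename()` with backticked column names is an acceptable substitute for the repeated `names()<-` assignments.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format sample using values printed earlier in this report
long_sample <- data.frame(
  Region = rep("Africa", 4),
  Year   = c(2005, 2005, 2010, 2010),
  Series = rep(c("Life expectancy at birth for both sexes (years)",
                 "Total fertility rate (children per women)"), 2),
  Value  = c(53.5269, 5.0771, 56.7825, 4.9000),
  stringsAsFactors = FALSE
)

# Spread the Series column into one column per indicator,
# then rename the long indicator labels in a single call
wide_sample <- long_sample %>%
  spread(key = Series, value = Value) %>%
  rename(
    Life_Expectancy      = `Life expectancy at birth for both sexes (years)`,
    Total_fertility_rate = `Total fertility rate (children per women)`
  )

print(wide_sample)
```

Keeping the renames in one `rename()` call makes it easier to see at a glance which source labels map to which short names.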

Data Visualization

Using measures such as infant mortality rate and life expectancy (both sexes, males, and females), I create interactive scatter plots to better understand these populations around the world. The data cover the major regions for the years 2005, 2010, and 2015.

# Infant mortality rate by region
g <- ggplot(PG_by_Region, aes(x = Infant_Mortality, y = Region, text = Year))+
  geom_point(aes(color=Region))
ggplotly(g)
# Life expectancy for both sexes by region (Main Regions)
g<-ggplot(subset(PG_by_Region, Region %in% c("Africa", "Asia", "Australia and New Zealand", "Europe", "Caribbean", "South America", "Northern America")),
          aes(x = Life_Expectancy, y = Region, text = Year))+
  geom_point(aes(color=Region))
ggplotly(g)
# Life expectancy for Males by region
g<-ggplot(PG_by_Region, aes(x = LifeExpectancy_males, y = Region, fill = Region, text = Year))+
  geom_point(aes(color=Region))
ggplotly(g)
# Life expectancy for Females by region
g<-ggplot(PG_by_Region, aes(x = LifeExpectancy_females, y = Region, text = Year))+
  geom_point(aes(color=Region))

knitr::opts_chunk$set(fig.width=12, fig.height=8) 
ggplotly(g)
# Life expectancy for both sexes by region and year (dodged bars)
g <- ggplot(PG_by_Region, aes(x = Region, y = Life_Expectancy, fill = as.character(Year))) +
  geom_bar(stat = "identity", position = "dodge") +
  # Refer to the column directly inside aes() rather than via PG_by_Region$...
  geom_text(aes(label = round(Life_Expectancy, 0)), hjust=-0.5, color="black", position = position_dodge(1), size = 2)+
  scale_fill_brewer(palette = "Paired") +
  theme(axis.text.x=element_text(angle = 0, vjust = 1)) +
  theme(plot.title = element_text(hjust = 0.5), legend.position = "bottom") +
  ggtitle("Life Expectancy by Region") +
  xlab("Regions") +  ylab ("Age in Years") + 
  coord_flip()
ggplotly(g)

Data Visualization with merged data frames

This section compares the Population Growth and Population data to examine the relationship between fertility rate, infant mortality, and population growth rate. The fertility rate, measured in children per woman, clearly declined between 2005 and 2010, which has contributed to a slower rate of population increase. Infant mortality rates have also declined, and this is reflected in the lower fertility rates: in an environment with high child mortality, women give birth to more children than they want in order to insure against the loss of children (source: https://ourworldindata.org/fertility-rate).

#Merged population data for 2005 and 2010  
merged_population_data <- PG_by_Region %>% 
  inner_join(P_by_Region, by = c("Region" = "Region", "Year" = "Year"))
## Warning: Column `Region` joining factors with different levels, coercing to
## character vector
head(merged_population_data) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>% 
  scroll_box(width="100%",height="400px")
Region Year Infant_Mortality Life_Expectancy LifeExpectancy_females LifeExpectancy_males Maternal_mortality_ratio Population_increase_rate Total_fertility_rate Population density Pop.est.total Pop.est.females Pop.est.males m.to.f.ratio
Africa 2005 81.0492 53.5269 55.1343 51.9582 NA 2.439 5.0771 30.9005 916.1543 459.5062 456.6481 99.3780
Africa 2010 67.7143 56.7825 58.3389 55.2459 NA 2.522 4.9000 35.0542 1039.3040 521.0514 518.2526 99.4629
Asia 2005 45.8017 68.3315 70.1224 66.6483 NA 1.227 2.4467 128.1851 3977.9865 1942.7627 2035.2237 104.7593
Asia 2010 37.1114 70.0293 71.9654 68.2175 NA 1.132 2.3281 135.6484 4209.5937 2054.2814 2155.3123 104.9181
Australia and New Zealand 2005 5.0247 80.1206 82.5737 77.6371 NA 1.242 1.8043 3.0600 24.3139 12.2181 12.0958 98.9993
Australia and New Zealand 2010 4.5033 81.2801 83.5156 79.0327 NA 1.741 1.9850 3.3383 26.5247 13.2996 13.2251 99.4400
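
The join warning above appears because `Region` is a factor with different levels in the two data frames. One way to avoid it is to convert `Region` to character on both sides before joining. The sketch below is an illustration under that assumption; `pg_sample` and `pop_sample` are hypothetical stand-ins built from a few values printed in this report.

```r
library(dplyr)

# Hypothetical samples; Region is deliberately a factor with different levels
pg_sample <- data.frame(
  Region = factor(c("Africa", "Africa", "Asia")),
  Year = c(2005, 2010, 2005),
  Total_fertility_rate = c(5.0771, 4.9000, 2.4467)
)
pop_sample <- data.frame(
  Region = factor(c("Africa", "Africa", "Americas")),
  Year = c(2005, 2010, 2005),
  Pop.est.total = c(916.1543, 1039.3040, 884.7882)
)

# Convert the key to character on both sides, then join on Region and Year
merged_sample <- pg_sample %>%
  mutate(Region = as.character(Region)) %>%
  inner_join(mutate(pop_sample, Region = as.character(Region)),
             by = c("Region", "Year"))

print(merged_sample)
```

Because this is an inner join, only Region and Year pairs present in both samples (here the two Africa rows) are kept.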
summary(merged_population_data)
##     Region               Year      Infant_Mortality  Life_Expectancy
##  Length:56          Min.   :2005   Min.   :  3.762   Min.   :49.54  
##  Class :character   1st Qu.:2005   1st Qu.: 16.112   1st Qu.:63.64  
##  Mode  :character   Median :2008   Median : 31.162   Median :69.75  
##                     Mean   :2008   Mean   : 36.569   Mean   :68.00  
##                     3rd Qu.:2010   3rd Qu.: 51.484   3rd Qu.:74.52  
##                     Max.   :2010   Max.   :103.615   Max.   :81.28  
##                                                                     
##  LifeExpectancy_females LifeExpectancy_males Maternal_mortality_ratio
##  Min.   :50.51          Min.   :48.58        Min.   : 36.0           
##  1st Qu.:64.72          1st Qu.:61.54        1st Qu.: 83.5           
##  Median :72.87          Median :67.58        Median :103.0           
##  Mean   :70.48          Mean   :65.58        Mean   :199.1           
##  3rd Qu.:77.26          3rd Qu.:71.41        3rd Qu.:207.2           
##  Max.   :83.52          Max.   :79.03        Max.   :717.0           
##                                              NA's   :42              
##  Population_increase_rate Total_fertility_rate Population density
##  Min.   :-0.4270          Min.   :1.260        Min.   :  3.06    
##  1st Qu.: 0.8057          1st Qu.:1.990        1st Qu.: 20.53    
##  Median : 1.3230          Median :2.534        Median : 40.50    
##  Mean   : 1.3953          Mean   :2.975        Mean   : 71.56    
##  3rd Qu.: 1.8373          3rd Qu.:3.216        3rd Qu.:130.78    
##  Max.   : 3.2270          Max.   :6.381        Max.   :267.58    
##                                                                  
##  Pop.est.total      Pop.est.females     Pop.est.males      
##  Min.   :   0.498   Min.   :   0.2467   Min.   :   0.2516  
##  1st Qu.:  87.932   1st Qu.:  44.8318   1st Qu.:  43.0999  
##  Median : 250.161   Median : 122.5598   Median : 127.6007  
##  Mean   : 559.181   Mean   : 276.6138   Mean   : 282.5668  
##  3rd Qu.: 630.032   3rd Qu.: 316.6489   3rd Qu.: 311.5643  
##  Max.   :4209.594   Max.   :2054.2814   Max.   :2155.3123  
##                                                            
##   m.to.f.ratio   
##  Min.   : 88.71  
##  1st Qu.: 97.10  
##  Median : 98.98  
##  Mean   : 99.36  
##  3rd Qu.:101.30  
##  Max.   :108.33  
## 
population_subset <- merged_population_data[c(1:3,8:9)]
head(population_subset)
##                      Region Year Infant_Mortality Population_increase_rate
## 1                    Africa 2005          81.0492                    2.439
## 2                    Africa 2010          67.7143                    2.522
## 3                      Asia 2005          45.8017                    1.227
## 4                      Asia 2010          37.1114                    1.132
## 5 Australia and New Zealand 2005           5.0247                    1.242
## 6 Australia and New Zealand 2010           4.5033                    1.741
##   Total_fertility_rate
## 1               5.0771
## 2               4.9000
## 3               2.4467
## 4               2.3281
## 5               1.8043
## 6               1.9850
library(ggpubr)
## Loading required package: magrittr
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
# One scatter plot per indicator, by region
Population_Increase <- ggplot(population_subset, aes(y=Region, x=Population_increase_rate))+
  geom_point(color = "blue")

Fertility_Rate <- ggplot(population_subset, aes(y=Region, x=Total_fertility_rate))+
  geom_point(color = "red")

Infant_Mortality <- ggplot(population_subset, aes(y=Region, x=Infant_Mortality))+
  geom_point(color = "green")

# Arrange the three plots side by side in a single row.
# ggarrange() has no scales argument; an extra unnamed-style argument is treated as a plot.
figure <- ggarrange(Population_Increase, Fertility_Rate, Infant_Mortality,
                    labels = c("A", "B", "C"),
                    ncol = 3, nrow = 1)
figure

Statistical Analysis

Create a function to calculate the correlation and round it to 4 decimal digits

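# Correlation between infant mortality and total fertility rate, rounded to 4 decimals
findCorrelation <- function() {
  x <- population_subset$Infant_Mortality
  y <- population_subset$Total_fertility_rate
  corr <- round(cor(x, y), 4)
  print(paste0("Correlation = ", corr))
  return(corr)
}

# Stored as corr_value rather than c, so the name does not shadow base::c()
corr_value <- findCorrelation()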
## [1] "Correlation = 0.9209"

Create a function for Linear Model

# Fit Infant_Mortality as a linear function of Total_fertility_rate,
# print the model summary, and return the fitted model
findStatsFunction <- function() {
  m <- lm(Infant_Mortality ~ Total_fertility_rate, data = population_subset)
  print(summary(m))
  return(m)
}
m <- findStatsFunction()
## 
## Call:
## lm(formula = Infant_Mortality ~ Total_fertility_rate, data = population_subset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.212  -7.672  -2.871   7.250  23.868 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -15.682      3.328  -4.712 1.76e-05 ***
## Total_fertility_rate   17.564      1.012  17.356  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.62 on 54 degrees of freedom
## Multiple R-squared:  0.848,  Adjusted R-squared:  0.8452 
## F-statistic: 301.2 on 1 and 54 DF,  p-value: < 2.2e-16
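
The slope, intercept, and R-squared reported in the summary can also be pulled out of the model object directly. The sketch below is an illustration only: `sample_rows` is a hypothetical six-row stand-in taken from the `head(population_subset)` output above, so its numbers will differ from the full 56-row fit. It also shows that, for a simple linear regression, R-squared equals the squared correlation coefficient.

```r
# Hypothetical six-row sample copied from head(population_subset)
sample_rows <- data.frame(
  Infant_Mortality     = c(81.0492, 67.7143, 45.8017, 37.1114, 5.0247, 4.5033),
  Total_fertility_rate = c(5.0771, 4.9000, 2.4467, 2.3281, 1.8043, 1.9850)
)

fit <- lm(Infant_Mortality ~ Total_fertility_rate, data = sample_rows)

# Extract the pieces of the fit by name
intercept <- unname(coef(fit)["(Intercept)"])
slope     <- unname(coef(fit)["Total_fertility_rate"])
r_squared <- summary(fit)$r.squared

# For one predictor, R-squared is the square of the Pearson correlation
r <- cor(sample_rows$Infant_Mortality, sample_rows$Total_fertility_rate)

print(c(intercept = intercept, slope = slope,
        r_squared = r_squared, r_times_r = r^2))
```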

Display the Linear Model

plot = ggplot(population_subset, aes(Infant_Mortality, Total_fertility_rate)) + geom_point(colour="blue") + 
    xlab("Infant mortality") + ylab("Fertility Rate") + labs(title = "Infant Mortality vs. Total Fertility Rate")
ggplotly(plot)
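
The scatter plot above shows the points but not the fitted line. A possible way to overlay it is `geom_smooth(method = "lm")`, with the axes arranged to match the model formula (infant mortality on the y axis). This is a sketch under assumptions, not the plot produced in this report: `sample_rows` and `fit_plot` are hypothetical names, and the six rows come from the `head(population_subset)` output.

```r
library(ggplot2)

# Hypothetical six-row sample copied from head(population_subset)
sample_rows <- data.frame(
  Infant_Mortality     = c(81.0492, 67.7143, 45.8017, 37.1114, 5.0247, 4.5033),
  Total_fertility_rate = c(5.0771, 4.9000, 2.4467, 2.3281, 1.8043, 1.9850)
)

# x = predictor, y = response, matching lm(Infant_Mortality ~ Total_fertility_rate)
fit_plot <- ggplot(sample_rows, aes(x = Total_fertility_rate, y = Infant_Mortality)) +
  geom_point(colour = "blue") +
  # Least-squares line; se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  xlab("Fertility Rate") + ylab("Infant mortality") +
  labs(title = "Infant Mortality vs. Total Fertility Rate")

print(fit_plot)
```

The resulting object can be passed to `ggplotly()` in the same way as the other plots in this report.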

Hypothesis Testing for Infant mortality and fertility relationship

\(H_0\) : Null Hypothesis - There is no relationship between Infant Mortality and Fertility Rate

\(H_A\) : Alternative Hypothesis - There is a relationship between Infant Mortality and Fertility Rate

Here the multiple R value is 0.9209, which shows a strong positive correlation between Infant Mortality and Fertility Rate. The R-squared value is 0.848, meaning that about 85% of the variation in Infant Mortality is explained by Fertility Rate in the linear model. The p-value for the slope is below 2e-16, far under the usual 0.05 threshold. Therefore, we reject the null hypothesis \(H_0\) in favor of the alternative hypothesis \(H_A\).
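
The same hypothesis can be checked directly on the correlation coefficient with `cor.test()` from base R's stats package. The sketch below is illustrative only: `sample_rows` is a hypothetical six-row stand-in from the `head(population_subset)` output, so the estimate and p-value will not match the full 56-row result reported above.

```r
# Hypothetical six-row sample copied from head(population_subset)
sample_rows <- data.frame(
  Infant_Mortality     = c(81.0492, 67.7143, 45.8017, 37.1114, 5.0247, 4.5033),
  Total_fertility_rate = c(5.0771, 4.9000, 2.4467, 2.3281, 1.8043, 1.9850)
)

# Pearson correlation test: H_0 is that the true correlation is zero
ct <- cor.test(sample_rows$Infant_Mortality,
               sample_rows$Total_fertility_rate,
               method = "pearson")

r_hat   <- unname(ct$estimate)  # sample correlation
p_value <- ct$p.value           # two-sided p-value

# Reject H_0 at the 5% level when the p-value is below 0.05
reject_null <- p_value < 0.05
print(list(r = r_hat, p_value = p_value, reject_null = reject_null))
```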

Conclusion with Research Question answers

  1. Which Region has the highest Life Expectancy?

The graph of life expectancy by region clearly shows that the Australia and New Zealand region has the highest life expectancy, at about 82 years. This region has stayed at the top for the last 15 years, with life expectancy over 80.

  2. Is there an increase/decrease in population, fertility rate and infant mortality rate in these countries/regions/areas?

Region                     Year  Infant.Mortality  Pop.increase  Fertility
Australia and New Zealand  2005            5.0247         1.242     1.8043
Australia and New Zealand  2010            4.5033         1.741     1.9850

In the Australia and New Zealand region, the infant mortality rate decreased while the population increase rate and fertility rate increased. This is a good sign and probably a good indicator of the high life expectancy in that region. Overall, the regions show a decrease in infant mortality, while populations continue to grow, generally at a slower pace than before.