Old-age dependency ratio 65+/(15-64), 1950-2100

This data is used to analyze the ratio of people aged 65+ to the population aged 15-64. It comes from the World Population Prospects and shows how the old-age population has increased or decreased relative to the working-age population over the years.

This data was sourced from the website https://population.un.org/wpp/Download/Standard/Population/. I have used one dataset, which shows “Old-age dependency ratio (ratio of population aged 65+ per 100 population 15-64)” from 1950 to 2020 by “Region, subregion, country or area”.

Here are the variable descriptions for the dataset:
  • Region, subregion, country or area = Data are presented by region, subregion, country or area, using the four classifications to group countries

  • Country code = Specifies the country code

  • Type = Type of classification used to group countries

  • 1950, 1955, .., 2020 = One column per year, each holding the “ratio of population aged 65+ per 100 population 15-64” for that year
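As a minimal sketch of what each yearly value means, the ratio can be computed from two population counts. The counts below are hypothetical and chosen only for illustration; the helper name `old_age_dependency_ratio` is not part of the dataset or of any package.

```r
# Hypothetical population counts (in thousands), for illustration only
pop_65_plus <- 120
pop_15_64   <- 1000

# Old-age dependency ratio: people aged 65+ per 100 people aged 15-64
old_age_dependency_ratio <- function(pop_65_plus, pop_15_64) {
  100 * pop_65_plus / pop_15_64
}

ratio <- old_age_dependency_ratio(pop_65_plus, pop_15_64)
print(ratio)  # 12 people aged 65+ per 100 people aged 15-64
```

A value of 12 therefore does not mean that 12% of the whole population is aged 65+; it is measured against the working-age population only.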

Coding:

Loading the required packages:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Note: Using kable, I’m only displaying the first 30 rows of each table, as the dataset has more than 200 rows.

Data Cleaning:

Reading the data from the CSV file and treating empty strings, “...”, “NA” and “N/A” as NA

agedep <- read_csv(file="Assignment1-Konuganti-Old-Age-Dependency-Data-Wrangling-Visualization.csv",na = c("", "...", "NA", "N/A"))
## Rows: 255 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): Variant, Region, subregion, country or area *, Notes, Type
## dbl (18): Index, Country code, Parent code, 1950, 1955, 1960, 1965, 1970, 19...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
kbl(agedep[1:30,]) %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Index Variant Region, subregion, country or area * Notes Country code Type Parent code 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020
1 Estimates WORLD NA 900 World 0 8.4 8.5 8.6 8.9 9.3 9.7 10.0 9.9 10.1 10.6 10.9 11.2 11.6 12.6 14.3
2 Estimates UN development groups a 1803 Label/Separator 900 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 Estimates More developed regions b 901 Development Group 1803 11.9 12.6 13.4 14.3 15.5 16.7 17.8 17.5 18.7 20.4 21.3 22.6 23.7 26.6 30.0
4 Estimates Less developed regions c 902 Development Group 1803 6.5 6.4 6.2 6.3 6.6 6.8 7.1 7.3 7.4 7.8 8.2 8.5 8.8 9.6 11.3
5 Estimates Least developed countries d 941 Development Group 902 5.9 5.4 5.3 5.4 5.5 5.8 5.9 5.9 5.9 6.0 6.1 6.1 6.1 6.2 6.2
6 Estimates Less developed regions, excluding least developed countries e 934 Development Group 902 6.6 6.5 6.4 6.4 6.7 7.0 7.3 7.4 7.6 8.0 8.5 8.8 9.2 10.2 12.1
7 Estimates Less developed regions, excluding China NA 948 Development Group 1803 6.2 6.1 6.1 6.4 6.5 6.6 6.7 6.7 6.9 7.2 7.5 7.7 8.0 8.5 9.4
8 Estimates Land-locked Developing Countries (LLDC) f 1636 Special other 1803 6.8 6.7 6.6 6.7 6.8 6.9 7.0 6.7 6.8 7.0 6.9 6.8 6.5 6.3 6.6
9 Estimates Small Island Developing States (SIDS) g 1637 Special other 1803 6.8 6.8 6.7 7.2 7.9 8.5 9.0 9.3 9.5 9.7 10.0 10.5 11.0 11.9 13.7
10 Estimates World Bank income groups NA 1802 Label/Separator 900 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
11 Estimates High-income countries h 1503 Income Group 1802 12.1 13.0 13.7 14.5 15.5 16.4 17.3 17.1 18.2 19.3 20.3 21.3 22.5 25.3 28.2
12 Estimates Middle-income countries h 1517 Income Group 1802 6.9 6.8 6.7 6.8 7.2 7.6 7.9 7.9 8.1 8.5 8.9 9.2 9.5 10.4 12.3
13 Estimates Upper-middle-income countries h 1502 Income Group 1517 7.2 7.2 7.1 7.1 7.6 8.1 8.6 8.6 8.9 9.6 10.2 10.7 11.1 12.6 15.8
14 Estimates Lower-middle-income countries h 1501 Income Group 1517 6.5 6.3 6.2 6.5 6.7 6.9 7.0 7.0 7.0 7.3 7.5 7.7 7.8 8.2 9.1
15 Estimates Low-income countries h 1500 Income Group 1802 5.8 5.6 5.5 5.5 5.6 5.7 5.9 5.9 6.0 6.1 6.0 6.0 6.0 6.0 6.0
16 Estimates No income group available NA 1518 Income Group 1802 8.4 8.4 8.5 8.1 8.2 9.3 9.7 10.1 10.2 10.8 11.4 11.8 13.1 15.2 17.7
17 Estimates Geographic regions i 1840 Label/Separator 900 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
18 Estimates Africa j 903 Region 1840 5.9 5.7 5.8 5.9 6.0 6.1 6.1 6.2 6.2 6.2 6.2 6.0 6.0 6.1 6.3
19 Estimates Asia k 935 Region 1840 6.8 6.6 6.4 6.4 6.8 7.1 7.5 7.7 8.0 8.5 9.1 9.5 10.0 11.0 13.1
20 Estimates Europe l 908 Region 1840 12.1 12.7 13.6 14.8 16.3 17.8 18.9 17.8 19.0 20.9 21.8 23.3 24.0 26.4 29.5
21 Estimates Latin America and the Caribbean m 904 Region 1840 6.3 6.4 6.7 7.1 7.3 7.5 7.8 7.9 8.2 8.7 9.1 9.8 10.5 11.6 13.4
22 Estimates Northern America n 905 Region 1840 12.6 14.1 15.0 15.4 15.9 16.3 17.2 18.0 18.9 19.2 18.7 18.5 19.5 22.3 25.8
23 Estimates Oceania o 909 Region 1840 11.6 12.2 12.4 12.2 11.7 12.1 12.9 13.3 14.1 14.9 15.3 15.7 16.3 18.1 20.1
24 Estimates Sustainable Development Goal (SDG) regions p 1828 Label/Separator 900 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
25 Estimates SUB-SAHARAN AFRICA NA 947 SDG region 1828 5.9 5.6 5.6 5.6 5.7 5.7 5.8 5.9 5.9 5.8 5.7 5.5 5.4 5.4 5.5
26 Estimates Eastern Africa NA 910 Subregion 947 5.5 5.6 5.5 5.6 5.6 5.7 5.8 5.7 5.7 5.6 5.5 5.3 5.2 5.2 5.3
27 Estimates Burundi NA 108 Country/Area 910 5.8 5.6 5.5 5.7 5.8 6.4 6.0 5.8 5.6 5.4 5.3 4.5 4.1 4.0 4.5
28 Estimates Comoros NA 174 Country/Area 910 6.6 6.0 5.8 5.7 5.8 6.0 6.2 6.3 6.2 6.0 5.7 5.5 5.3 5.1 5.4
29 Estimates Djibouti NA 262 Country/Area 910 3.9 4.1 4.3 4.4 4.6 4.8 4.6 4.8 4.9 5.2 5.4 5.6 6.0 6.6 7.1
30 Estimates Eritrea NA 232 Country/Area 910 6.2 5.6 5.1 4.9 4.8 4.8 4.9 5.0 5.3 7.0 7.6 6.6 7.1 8.2 8.3

Removing the unnecessary columns (Column Names - Index, Notes)

agedep_rm <- agedep[ -c(1,4) ]
kbl(agedep_rm[1:30,]) %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Variant Region, subregion, country or area * Country code Type Parent code 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020
Estimates WORLD 900 World 0 8.4 8.5 8.6 8.9 9.3 9.7 10.0 9.9 10.1 10.6 10.9 11.2 11.6 12.6 14.3
Estimates UN development groups 1803 Label/Separator 900 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Estimates More developed regions 901 Development Group 1803 11.9 12.6 13.4 14.3 15.5 16.7 17.8 17.5 18.7 20.4 21.3 22.6 23.7 26.6 30.0
Estimates Less developed regions 902 Development Group 1803 6.5 6.4 6.2 6.3 6.6 6.8 7.1 7.3 7.4 7.8 8.2 8.5 8.8 9.6 11.3
Estimates Least developed countries 941 Development Group 902 5.9 5.4 5.3 5.4 5.5 5.8 5.9 5.9 5.9 6.0 6.1 6.1 6.1 6.2 6.2
Estimates Less developed regions, excluding least developed countries 934 Development Group 902 6.6 6.5 6.4 6.4 6.7 7.0 7.3 7.4 7.6 8.0 8.5 8.8 9.2 10.2 12.1
Estimates Less developed regions, excluding China 948 Development Group 1803 6.2 6.1 6.1 6.4 6.5 6.6 6.7 6.7 6.9 7.2 7.5 7.7 8.0 8.5 9.4
Estimates Land-locked Developing Countries (LLDC) 1636 Special other 1803 6.8 6.7 6.6 6.7 6.8 6.9 7.0 6.7 6.8 7.0 6.9 6.8 6.5 6.3 6.6
Estimates Small Island Developing States (SIDS) 1637 Special other 1803 6.8 6.8 6.7 7.2 7.9 8.5 9.0 9.3 9.5 9.7 10.0 10.5 11.0 11.9 13.7
Estimates World Bank income groups 1802 Label/Separator 900 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Estimates High-income countries 1503 Income Group 1802 12.1 13.0 13.7 14.5 15.5 16.4 17.3 17.1 18.2 19.3 20.3 21.3 22.5 25.3 28.2
Estimates Middle-income countries 1517 Income Group 1802 6.9 6.8 6.7 6.8 7.2 7.6 7.9 7.9 8.1 8.5 8.9 9.2 9.5 10.4 12.3
Estimates Upper-middle-income countries 1502 Income Group 1517 7.2 7.2 7.1 7.1 7.6 8.1 8.6 8.6 8.9 9.6 10.2 10.7 11.1 12.6 15.8
Estimates Lower-middle-income countries 1501 Income Group 1517 6.5 6.3 6.2 6.5 6.7 6.9 7.0 7.0 7.0 7.3 7.5 7.7 7.8 8.2 9.1
Estimates Low-income countries 1500 Income Group 1802 5.8 5.6 5.5 5.5 5.6 5.7 5.9 5.9 6.0 6.1 6.0 6.0 6.0 6.0 6.0
Estimates No income group available 1518 Income Group 1802 8.4 8.4 8.5 8.1 8.2 9.3 9.7 10.1 10.2 10.8 11.4 11.8 13.1 15.2 17.7
Estimates Geographic regions 1840 Label/Separator 900 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Estimates Africa 903 Region 1840 5.9 5.7 5.8 5.9 6.0 6.1 6.1 6.2 6.2 6.2 6.2 6.0 6.0 6.1 6.3
Estimates Asia 935 Region 1840 6.8 6.6 6.4 6.4 6.8 7.1 7.5 7.7 8.0 8.5 9.1 9.5 10.0 11.0 13.1
Estimates Europe 908 Region 1840 12.1 12.7 13.6 14.8 16.3 17.8 18.9 17.8 19.0 20.9 21.8 23.3 24.0 26.4 29.5
Estimates Latin America and the Caribbean 904 Region 1840 6.3 6.4 6.7 7.1 7.3 7.5 7.8 7.9 8.2 8.7 9.1 9.8 10.5 11.6 13.4
Estimates Northern America 905 Region 1840 12.6 14.1 15.0 15.4 15.9 16.3 17.2 18.0 18.9 19.2 18.7 18.5 19.5 22.3 25.8
Estimates Oceania 909 Region 1840 11.6 12.2 12.4 12.2 11.7 12.1 12.9 13.3 14.1 14.9 15.3 15.7 16.3 18.1 20.1
Estimates Sustainable Development Goal (SDG) regions 1828 Label/Separator 900 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Estimates SUB-SAHARAN AFRICA 947 SDG region 1828 5.9 5.6 5.6 5.6 5.7 5.7 5.8 5.9 5.9 5.8 5.7 5.5 5.4 5.4 5.5
Estimates Eastern Africa 910 Subregion 947 5.5 5.6 5.5 5.6 5.6 5.7 5.8 5.7 5.7 5.6 5.5 5.3 5.2 5.2 5.3
Estimates Burundi 108 Country/Area 910 5.8 5.6 5.5 5.7 5.8 6.4 6.0 5.8 5.6 5.4 5.3 4.5 4.1 4.0 4.5
Estimates Comoros 174 Country/Area 910 6.6 6.0 5.8 5.7 5.8 6.0 6.2 6.3 6.2 6.0 5.7 5.5 5.3 5.1 5.4
Estimates Djibouti 262 Country/Area 910 3.9 4.1 4.3 4.4 4.6 4.8 4.6 4.8 4.9 5.2 5.4 5.6 6.0 6.6 7.1
Estimates Eritrea 232 Country/Area 910 6.2 5.6 5.1 4.9 4.8 4.8 4.9 5.0 5.3 7.0 7.6 6.6 7.1 8.2 8.3
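Dropping columns by position, as above, depends on the column order staying the same. A hedged alternative is to drop them by name with `dplyr::select()`. The sketch below uses a small made-up tibble (`toy`) in place of `agedep`; only the column names Index, Variant and Notes mirror the real data.

```r
library(dplyr)

# Toy stand-in for the raw data (values are illustrative, not the full dataset)
toy <- tibble(
  Index   = 1:2,
  Variant = c("Estimates", "Estimates"),
  Region  = c("WORLD", "More developed regions"),
  Notes   = c(NA, "b"),
  `1950`  = c(8.4, 11.9)
)

# Drop columns by name rather than by position
toy_rm <- toy %>% select(-Index, -Notes)
print(names(toy_rm))
```

Selecting by name keeps working if the source file gains or reorders columns, whereas `-c(1,4)` would silently drop different columns.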

Dropping rows/records with NA values

agedep_complete <- na.omit(agedep_rm) # Remove NA
kbl(agedep_complete[1:30,]) %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Variant Region, subregion, country or area * Country code Type Parent code 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020
Estimates WORLD 900 World 0 8.4 8.5 8.6 8.9 9.3 9.7 10.0 9.9 10.1 10.6 10.9 11.2 11.6 12.6 14.3
Estimates More developed regions 901 Development Group 1803 11.9 12.6 13.4 14.3 15.5 16.7 17.8 17.5 18.7 20.4 21.3 22.6 23.7 26.6 30.0
Estimates Less developed regions 902 Development Group 1803 6.5 6.4 6.2 6.3 6.6 6.8 7.1 7.3 7.4 7.8 8.2 8.5 8.8 9.6 11.3
Estimates Least developed countries 941 Development Group 902 5.9 5.4 5.3 5.4 5.5 5.8 5.9 5.9 5.9 6.0 6.1 6.1 6.1 6.2 6.2
Estimates Less developed regions, excluding least developed countries 934 Development Group 902 6.6 6.5 6.4 6.4 6.7 7.0 7.3 7.4 7.6 8.0 8.5 8.8 9.2 10.2 12.1
Estimates Less developed regions, excluding China 948 Development Group 1803 6.2 6.1 6.1 6.4 6.5 6.6 6.7 6.7 6.9 7.2 7.5 7.7 8.0 8.5 9.4
Estimates Land-locked Developing Countries (LLDC) 1636 Special other 1803 6.8 6.7 6.6 6.7 6.8 6.9 7.0 6.7 6.8 7.0 6.9 6.8 6.5 6.3 6.6
Estimates Small Island Developing States (SIDS) 1637 Special other 1803 6.8 6.8 6.7 7.2 7.9 8.5 9.0 9.3 9.5 9.7 10.0 10.5 11.0 11.9 13.7
Estimates High-income countries 1503 Income Group 1802 12.1 13.0 13.7 14.5 15.5 16.4 17.3 17.1 18.2 19.3 20.3 21.3 22.5 25.3 28.2
Estimates Middle-income countries 1517 Income Group 1802 6.9 6.8 6.7 6.8 7.2 7.6 7.9 7.9 8.1 8.5 8.9 9.2 9.5 10.4 12.3
Estimates Upper-middle-income countries 1502 Income Group 1517 7.2 7.2 7.1 7.1 7.6 8.1 8.6 8.6 8.9 9.6 10.2 10.7 11.1 12.6 15.8
Estimates Lower-middle-income countries 1501 Income Group 1517 6.5 6.3 6.2 6.5 6.7 6.9 7.0 7.0 7.0 7.3 7.5 7.7 7.8 8.2 9.1
Estimates Low-income countries 1500 Income Group 1802 5.8 5.6 5.5 5.5 5.6 5.7 5.9 5.9 6.0 6.1 6.0 6.0 6.0 6.0 6.0
Estimates No income group available 1518 Income Group 1802 8.4 8.4 8.5 8.1 8.2 9.3 9.7 10.1 10.2 10.8 11.4 11.8 13.1 15.2 17.7
Estimates Africa 903 Region 1840 5.9 5.7 5.8 5.9 6.0 6.1 6.1 6.2 6.2 6.2 6.2 6.0 6.0 6.1 6.3
Estimates Asia 935 Region 1840 6.8 6.6 6.4 6.4 6.8 7.1 7.5 7.7 8.0 8.5 9.1 9.5 10.0 11.0 13.1
Estimates Europe 908 Region 1840 12.1 12.7 13.6 14.8 16.3 17.8 18.9 17.8 19.0 20.9 21.8 23.3 24.0 26.4 29.5
Estimates Latin America and the Caribbean 904 Region 1840 6.3 6.4 6.7 7.1 7.3 7.5 7.8 7.9 8.2 8.7 9.1 9.8 10.5 11.6 13.4
Estimates Northern America 905 Region 1840 12.6 14.1 15.0 15.4 15.9 16.3 17.2 18.0 18.9 19.2 18.7 18.5 19.5 22.3 25.8
Estimates Oceania 909 Region 1840 11.6 12.2 12.4 12.2 11.7 12.1 12.9 13.3 14.1 14.9 15.3 15.7 16.3 18.1 20.1
Estimates SUB-SAHARAN AFRICA 947 SDG region 1828 5.9 5.6 5.6 5.6 5.7 5.7 5.8 5.9 5.9 5.8 5.7 5.5 5.4 5.4 5.5
Estimates Eastern Africa 910 Subregion 947 5.5 5.6 5.5 5.6 5.6 5.7 5.8 5.7 5.7 5.6 5.5 5.3 5.2 5.2 5.3
Estimates Burundi 108 Country/Area 910 5.8 5.6 5.5 5.7 5.8 6.4 6.0 5.8 5.6 5.4 5.3 4.5 4.1 4.0 4.5
Estimates Comoros 174 Country/Area 910 6.6 6.0 5.8 5.7 5.8 6.0 6.2 6.3 6.2 6.0 5.7 5.5 5.3 5.1 5.4
Estimates Djibouti 262 Country/Area 910 3.9 4.1 4.3 4.4 4.6 4.8 4.6 4.8 4.9 5.2 5.4 5.6 6.0 6.6 7.1
Estimates Eritrea 232 Country/Area 910 6.2 5.6 5.1 4.9 4.8 4.8 4.9 5.0 5.3 7.0 7.6 6.6 7.1 8.2 8.3
Estimates Ethiopia 231 Country/Area 910 5.7 5.2 4.9 4.8 5.1 5.3 6.2 5.9 6.3 6.1 6.1 6.2 6.4 6.4 6.3
Estimates Kenya 404 Country/Area 910 7.0 7.3 7.5 7.5 7.1 6.7 6.1 5.6 5.1 4.7 4.3 3.9 3.5 3.7 4.3
Estimates Madagascar 450 Country/Area 910 5.4 5.7 6.0 6.4 6.8 7.2 6.7 6.1 5.9 5.8 5.7 5.6 5.3 5.1 5.5
Estimates Malawi 454 Country/Area 910 6.0 6.1 6.0 5.8 5.7 5.4 5.5 5.9 6.0 6.6 6.1 5.6 5.2 5.0 4.9
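Before discarding rows, it can help to see which ones contain NA. The sketch below is an illustration on a three-row toy data frame (values copied from the tables above, the data frame itself is hypothetical); it uses base R `complete.cases()` to list the rows that `na.omit()` would remove.

```r
# Toy stand-in with one separator row that has no yearly values
toy <- data.frame(
  Region = c("WORLD", "UN development groups", "Africa"),
  Type   = c("World", "Label/Separator", "Region"),
  `1950` = c(8.4, NA, 5.9),
  `2020` = c(14.3, NA, 6.3),
  check.names = FALSE
)

# Rows with at least one NA are the ones na.omit() will drop
has_na  <- !complete.cases(toy)
dropped <- toy[has_na, c("Region", "Type")]
kept    <- na.omit(toy)

print(dropped)
print(nrow(kept))
```

Because `na.omit()` drops a row when any column is NA, removing the Notes column first matters: a row such as WORLD has NA in Notes and would otherwise be lost.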

Data Wrangling:

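The data is untidy because it does not satisfy the tidy data principles:

  • Each row is not a single observation: one row holds the values for all 15 years.
  • Not every variable forms its own column: the year is spread across the column headers.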

The process below tidies the data.

Using pivot_longer:

  • In this dataset, the column names (1950, 1955, .., 2020) are values of a variable, not variables themselves. I use pivot_longer to create two new columns, “Year” and “Age_Dependency_Ratio”, to tidy the dataset.
agedep_longer=agedep_complete %>%
  pivot_longer(c('1950', '1955','1960', '1965','1970', '1975','1980', '1985','1990', '1995',
                 '2000', '2005','2010', '2015','2020'), names_to = "Year", values_to = "Age_Dependency_Ratio")
kbl(agedep_longer[1:30,]) %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Variant Region, subregion, country or area * Country code Type Parent code Year Age_Dependency_Ratio
Estimates WORLD 900 World 0 1950 8.4
Estimates WORLD 900 World 0 1955 8.5
Estimates WORLD 900 World 0 1960 8.6
Estimates WORLD 900 World 0 1965 8.9
Estimates WORLD 900 World 0 1970 9.3
Estimates WORLD 900 World 0 1975 9.7
Estimates WORLD 900 World 0 1980 10.0
Estimates WORLD 900 World 0 1985 9.9
Estimates WORLD 900 World 0 1990 10.1
Estimates WORLD 900 World 0 1995 10.6
Estimates WORLD 900 World 0 2000 10.9
Estimates WORLD 900 World 0 2005 11.2
Estimates WORLD 900 World 0 2010 11.6
Estimates WORLD 900 World 0 2015 12.6
Estimates WORLD 900 World 0 2020 14.3
Estimates More developed regions 901 Development Group 1803 1950 11.9
Estimates More developed regions 901 Development Group 1803 1955 12.6
Estimates More developed regions 901 Development Group 1803 1960 13.4
Estimates More developed regions 901 Development Group 1803 1965 14.3
Estimates More developed regions 901 Development Group 1803 1970 15.5
Estimates More developed regions 901 Development Group 1803 1975 16.7
Estimates More developed regions 901 Development Group 1803 1980 17.8
Estimates More developed regions 901 Development Group 1803 1985 17.5
Estimates More developed regions 901 Development Group 1803 1990 18.7
Estimates More developed regions 901 Development Group 1803 1995 20.4
Estimates More developed regions 901 Development Group 1803 2000 21.3
Estimates More developed regions 901 Development Group 1803 2005 22.6
Estimates More developed regions 901 Development Group 1803 2010 23.7
Estimates More developed regions 901 Development Group 1803 2015 26.6
Estimates More developed regions 901 Development Group 1803 2020 30.0
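The year columns can also be selected by pattern instead of being typed out one by one. The following is a sketch on a hypothetical two-row data frame (values copied from the tables above), assuming every year column has a four-digit name. `names_transform` is used here to store Year as an integer, which differs from the analysis above, where Year stays a character column.

```r
library(tidyr)

# Toy wide data: one column per year
toy <- data.frame(
  Region = c("WORLD", "Africa"),
  `1950` = c(8.4, 5.9),
  `2020` = c(14.3, 6.3),
  check.names = FALSE
)

# Select all four-digit column names and stack them into Year / ratio pairs
toy_long <- pivot_longer(
  toy,
  cols = matches("^[0-9]{4}$"),
  names_to = "Year",
  values_to = "Age_Dependency_Ratio",
  names_transform = list(Year = as.integer)
)

print(toy_long)
```

Keeping Year as character, as the analysis does, makes ggplot2 treat it as a discrete axis, which is what produces one box per year in a box plot. With an integer Year, a box plot would need an explicit grouping such as `aes(group = Year)`.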

Using pivot_wider:

  • In this dataset, the “Variant” column names the kind of value stored in “Age_Dependency_Ratio” (in the rows shown, always “Estimates”). I have used pivot_wider to turn each Variant value into its own column, which creates a new column “Estimates” filled with the values from “Age_Dependency_Ratio”.
agedep_wider = agedep_longer %>% pivot_wider(names_from = Variant, values_from = Age_Dependency_Ratio)
kbl(agedep_wider[1:30,]) %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Region, subregion, country or area * Country code Type Parent code Year Estimates
WORLD 900 World 0 1950 8.4
WORLD 900 World 0 1955 8.5
WORLD 900 World 0 1960 8.6
WORLD 900 World 0 1965 8.9
WORLD 900 World 0 1970 9.3
WORLD 900 World 0 1975 9.7
WORLD 900 World 0 1980 10.0
WORLD 900 World 0 1985 9.9
WORLD 900 World 0 1990 10.1
WORLD 900 World 0 1995 10.6
WORLD 900 World 0 2000 10.9
WORLD 900 World 0 2005 11.2
WORLD 900 World 0 2010 11.6
WORLD 900 World 0 2015 12.6
WORLD 900 World 0 2020 14.3
More developed regions 901 Development Group 1803 1950 11.9
More developed regions 901 Development Group 1803 1955 12.6
More developed regions 901 Development Group 1803 1960 13.4
More developed regions 901 Development Group 1803 1965 14.3
More developed regions 901 Development Group 1803 1970 15.5
More developed regions 901 Development Group 1803 1975 16.7
More developed regions 901 Development Group 1803 1980 17.8
More developed regions 901 Development Group 1803 1985 17.5
More developed regions 901 Development Group 1803 1990 18.7
More developed regions 901 Development Group 1803 1995 20.4
More developed regions 901 Development Group 1803 2000 21.3
More developed regions 901 Development Group 1803 2005 22.6
More developed regions 901 Development Group 1803 2010 23.7
More developed regions 901 Development Group 1803 2015 26.6
More developed regions 901 Development Group 1803 2020 30.0
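To make the effect of pivot_wider concrete, here is a small sketch on a hypothetical two-row data frame (values copied from the WORLD rows above). With a single Variant value, the number of rows does not change; the Variant column disappears and its value becomes the name of the value column.

```r
library(tidyr)

# Toy long data with a single Variant value
toy_long <- data.frame(
  Variant = c("Estimates", "Estimates"),
  Region  = c("WORLD", "WORLD"),
  Year    = c("1950", "1955"),
  Age_Dependency_Ratio = c(8.4, 8.5)
)

# Each distinct Variant value becomes a column holding the ratio
toy_wide <- pivot_wider(
  toy_long,
  names_from  = Variant,
  values_from = Age_Dependency_Ratio
)

print(names(toy_wide))
print(nrow(toy_wide))
```

If the data contained a second Variant value, it would appear as an additional column next to Estimates.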

The data is now tidy and ready for data visualization:

  • We removed the unnecessary columns.
  • We checked for NA and special values and removed the affected records.
  • We applied pivot_longer to fix values spread across multiple columns.
  • We applied pivot_wider to fix observations scattered across multiple rows.

DATA VISUALIZATION:

1. Box Plot of Year vs Age_Dependency_Ratio(%)

theme_set(theme_bw())
ggplot(data = agedep_wider, mapping = aes(x = Year, y = Estimates)) +
  geom_boxplot()+ labs(x="Year", y="%Ratio of population aged 65+ per 100 population 15-64",title = "Box Plot of Old-age dependency ratio")

From the box plot above, we can infer that the old-age dependency ratio increased from 1950 to 2020, based on the median values.
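The medians read off a box plot can also be computed directly. The sketch below is only an illustration: it uses the six geographic regions for 1950 and 2020 (values copied from the tables above), not the full dataset, so its medians will differ from those in the plot.

```r
library(dplyr)

# 1950 and 2020 values for the six geographic regions
# (Africa, Asia, Europe, Latin America and the Caribbean,
#  Northern America, Oceania)
regions <- tibble(
  Year = rep(c("1950", "2020"), each = 6),
  Estimates = c(5.9, 6.8, 12.1, 6.3, 12.6, 11.6,
                6.3, 13.1, 29.5, 13.4, 25.8, 20.1)
)

# Median ratio per year
medians <- regions %>%
  group_by(Year) %>%
  summarise(median_ratio = median(Estimates))

print(medians)
```

Applying the same `group_by()` and `summarise()` steps to `agedep_wider` would give the median for every year in the analysis.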

2. Scatter Plot of Year vs Age_Dependency_Ratio(%) grouped by type of classification

ggplot(data = agedep_wider) +
  geom_point(mapping = aes(x = Year, y = Estimates)) +
  facet_wrap(~ Type, nrow = 1) + coord_flip() + labs(x="Year", y="%Ratio of population aged 65+ per 100 population 15-64",title = "Scatter Plot of Old-age dependency ratio grouped by type of classification")

From the scatter plot above, which groups the old-age dependency ratio by type of classification, we can infer that the ratio increased over the last 10 years for all classifications.

As there are more than 100 regions/countries, I create a vector of continent names and subset the data to those rows for better visualization

Region= c('Africa','Asia','Europe','Latin America and the Caribbean','Northern America','Oceania')
agedep_vis=agedep_wider[agedep_wider$`Region, subregion, country or area *` %in% Region,]
kbl(agedep_vis[1:30,]) %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Region, subregion, country or area * Country code Type Parent code Year Estimates
Africa 903 Region 1840 1950 5.9
Africa 903 Region 1840 1955 5.7
Africa 903 Region 1840 1960 5.8
Africa 903 Region 1840 1965 5.9
Africa 903 Region 1840 1970 6.0
Africa 903 Region 1840 1975 6.1
Africa 903 Region 1840 1980 6.1
Africa 903 Region 1840 1985 6.2
Africa 903 Region 1840 1990 6.2
Africa 903 Region 1840 1995 6.2
Africa 903 Region 1840 2000 6.2
Africa 903 Region 1840 2005 6.0
Africa 903 Region 1840 2010 6.0
Africa 903 Region 1840 2015 6.1
Africa 903 Region 1840 2020 6.3
Asia 935 Region 1840 1950 6.8
Asia 935 Region 1840 1955 6.6
Asia 935 Region 1840 1960 6.4
Asia 935 Region 1840 1965 6.4
Asia 935 Region 1840 1970 6.8
Asia 935 Region 1840 1975 7.1
Asia 935 Region 1840 1980 7.5
Asia 935 Region 1840 1985 7.7
Asia 935 Region 1840 1990 8.0
Asia 935 Region 1840 1995 8.5
Asia 935 Region 1840 2000 9.1
Asia 935 Region 1840 2005 9.5
Asia 935 Region 1840 2010 10.0
Asia 935 Region 1840 2015 11.0
Asia 935 Region 1840 2020 13.1
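The same subsetting can be written with `dplyr::filter()`, which some readers find easier to follow than bracket indexing. This is a sketch on a hypothetical three-row tibble with a shortened column name (`area`); the values are the 2020 figures from the tables above.

```r
library(dplyr)

# Toy data: two continents and one country
toy <- tibble(
  area      = c("Africa", "Burundi", "Asia"),
  Estimates = c(6.3, 4.5, 13.1)
)

continents <- c("Africa", "Asia")

# Keep only the rows whose area is one of the continents
toy_vis <- toy %>% filter(area %in% continents)

print(toy_vis)
```

With the real data, the column name contains spaces and an asterisk, so it has to be wrapped in backticks inside `filter()`, just as in the bracket version.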

3. Scatter Plot of Year vs Age_Dependency_Ratio(%) grouped by Continent

ggplot(data = agedep_vis) +
  geom_point(mapping = aes(x = Year, y = Estimates, color = `Region, subregion, country or area *`, size = Estimates))+ labs(x="Year", y="%Ratio of population aged 65+ per 100 population 15-64",title = "Scatter plot of Ratio of population aged 65+ per 100 by Continent")

From the scatter plot above, we can infer that Europe has the highest old-age dependency ratio in 2020, about 30 (29.5) people aged 65+ per 100 aged 15-64, and Africa has the lowest, about 6 (6.3).
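The highest and lowest values can be confirmed numerically rather than read off the plot. The sketch below enters the 2020 values for the six regions by hand (copied from the tables above) and uses `dplyr::slice_max()` and `dplyr::slice_min()`; the tibble name `regions_2020` is made up for this example.

```r
library(dplyr)

# 2020 old-age dependency ratio for the six geographic regions
regions_2020 <- tibble(
  Region = c("Africa", "Asia", "Europe",
             "Latin America and the Caribbean",
             "Northern America", "Oceania"),
  Estimates = c(6.3, 13.1, 29.5, 13.4, 25.8, 20.1)
)

# Region with the highest and the lowest ratio
highest <- regions_2020 %>% slice_max(Estimates, n = 1)
lowest  <- regions_2020 %>% slice_min(Estimates, n = 1)

print(highest)
print(lowest)
```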

Creating a vector of region/country groupings and subsetting the data to those rows for better visualization

Region2= c('More developed regions','Less developed regions','High-income countries','Middle-income countries','Low-income countries','Small Island Developing States (SIDS)')
agedep_vis2=agedep_wider[agedep_wider$`Region, subregion, country or area *` %in% Region2,]
kbl(agedep_vis2[1:30,]) %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Region, subregion, country or area * Country code Type Parent code Year Estimates
More developed regions 901 Development Group 1803 1950 11.9
More developed regions 901 Development Group 1803 1955 12.6
More developed regions 901 Development Group 1803 1960 13.4
More developed regions 901 Development Group 1803 1965 14.3
More developed regions 901 Development Group 1803 1970 15.5
More developed regions 901 Development Group 1803 1975 16.7
More developed regions 901 Development Group 1803 1980 17.8
More developed regions 901 Development Group 1803 1985 17.5
More developed regions 901 Development Group 1803 1990 18.7
More developed regions 901 Development Group 1803 1995 20.4
More developed regions 901 Development Group 1803 2000 21.3
More developed regions 901 Development Group 1803 2005 22.6
More developed regions 901 Development Group 1803 2010 23.7
More developed regions 901 Development Group 1803 2015 26.6
More developed regions 901 Development Group 1803 2020 30.0
Less developed regions 902 Development Group 1803 1950 6.5
Less developed regions 902 Development Group 1803 1955 6.4
Less developed regions 902 Development Group 1803 1960 6.2
Less developed regions 902 Development Group 1803 1965 6.3
Less developed regions 902 Development Group 1803 1970 6.6
Less developed regions 902 Development Group 1803 1975 6.8
Less developed regions 902 Development Group 1803 1980 7.1
Less developed regions 902 Development Group 1803 1985 7.3
Less developed regions 902 Development Group 1803 1990 7.4
Less developed regions 902 Development Group 1803 1995 7.8
Less developed regions 902 Development Group 1803 2000 8.2
Less developed regions 902 Development Group 1803 2005 8.5
Less developed regions 902 Development Group 1803 2010 8.8
Less developed regions 902 Development Group 1803 2015 9.6
Less developed regions 902 Development Group 1803 2020 11.3

4. Scatter Plot of Region/Country vs Type of classification

ggplot(data = agedep_vis2) +
  geom_point(mapping = aes(x = `Region, subregion, country or area *`, y = Type, size = Estimates))+ labs(x="Region/Country Grouping", y="Type of classification",title = "Scatter plot based on region/country by classification type") +
  theme(plot.title = element_text(hjust = 0.5)) + coord_flip()

From the scatter plot above, we can infer that more developed regions and high-income countries have a high old-age dependency ratio, whereas less developed regions and low-income countries have a low one.