1 Introduction

1.1 Introduction

The prospect of the world population reaching unsustainable levels concerns many people. Population growth has been a significant demographic trend over the past few centuries, and it has been especially rapid since the Industrial Revolution. This report presents specific figures on the human population. The hope is that these data raise awareness of the need to manage population growth before resource depletion, food insecurity, urbanization and overcrowding, and other negative effects take hold.

The data were obtained from Kaggle.

1.2 Brief

To analyze and visualize the data, we will use the following libraries:

library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(glue)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggpubr)
library(scales)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(tidyr)
library(stringr)
library(viridis)
## Loading required package: viridisLite
## 
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
## 
##     viridis_pal

Theme for visualization

theme_gpt <- theme(
  legend.key = element_rect(fill = "black"),
  legend.background = element_rect(color = "white", fill = "#2F4F4F"),
  plot.subtitle = element_text(size = 6, color = "white"),
  panel.background = element_rect(fill = "#F8F8FF"),
  panel.border = element_rect(fill = NA),
  panel.grid.minor.x = element_blank(),
  panel.grid.major.x = element_blank(),
  panel.grid.major.y = element_line(color = "darkgrey", linetype = 2),
  panel.grid.minor.y = element_blank(),
  plot.background = element_rect(fill = "#263238"),
  text = element_text(color = "white"),
  axis.text = element_text(color = "white")
)

2 Data Exploration

2.1 Data 1

This dataset shows the world population by country.

2.1.1 Data input & Structure Data

pop_country <- read.csv("data_assets/countries-table_updated.csv")
head(pop_country)
glimpse(pop_country)
## Rows: 234
## Columns: 21
## $ country         <chr> "India", "China", "United States", "Indonesia", "Pakis…
## $ continent       <chr> "Asia", "Asia", "North America", "Asia", "Asia", "Afri…
## $ subcontinent    <chr> "South Asia", "East Asia", "North America", "Southeast…
## $ rank            <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ area            <dbl> 3287590, 9706961, 9372610, 1904569, 881912, 923768, 85…
## $ landAreaKm      <dbl> 2973190, 9424703, 9147420, 1877519, 770880, 910770, 83…
## $ cca2            <chr> "IN", "CN", "US", "ID", "PK", "NG", "BR", "BD", "RU", …
## $ cca3            <chr> "IND", "CHN", "USA", "IDN", "PAK", "NGA", "BRA", "BGD"…
## $ netChange       <dbl> 0.4184, -0.0113, 0.0581, 0.0727, 0.1495, 0.1680, 0.039…
## $ growthRate      <dbl> 0.0081, -0.0002, 0.0050, 0.0074, 0.0198, 0.0241, 0.005…
## $ worldPercentage <dbl> 0.1785, 0.1781, 0.0425, 0.0347, 0.0300, 0.0280, 0.0270…
## $ density         <dbl> 480.5033, 151.2696, 37.1686, 147.8196, 311.9625, 245.7…
## $ densityMi       <dbl> 1244.5036, 391.7884, 96.2666, 382.8528, 807.9829, 636.…
## $ place           <int> 356, 156, 840, 360, 586, 566, 76, 50, 643, 484, 231, 3…
## $ pop1980         <int> 696828385, 982372466, 223140018, 148177096, 80624057, …
## $ pop2000         <int> 1059633675, 1264099069, 282398554, 214072421, 15436992…
## $ pop2010         <int> 1240613620, 1348191368, 311182845, 244016173, 19445449…
## $ pop2022         <int> 1417173173, 1425887337, 338289857, 275501339, 23582486…
## $ pop2023         <int> 1428627663, 1425671352, 339996563, 277534122, 24048565…
## $ pop2030         <int> 1514994080, 1415605906, 352162301, 292150100, 27402983…
## $ pop2050         <int> 1670490596, 1312636325, 375391963, 317225213, 36780846…

Each column already has an appropriate data type, so no conversion is needed.

2.1.2 Missing value

colSums(is.na(pop_country))
##         country       continent    subcontinent            rank            area 
##               0               0               0               0               0 
##      landAreaKm            cca2            cca3       netChange      growthRate 
##               0               1               0               8               0 
## worldPercentage         density       densityMi           place         pop1980 
##               6               0               0               0               0 
##         pop2000         pop2010         pop2022         pop2023         pop2030 
##               0               0               0               0               0 
##         pop2050 
##               0

The dataset has some missing values: one in cca2, eight in netChange, and six in worldPercentage. Because every row holds important data for a country, we cannot drop the rows that contain missing values. Instead, we replace the missing values with zero.

pop_country[is.na(pop_country)] <- 0
colSums(is.na(pop_country))
##         country       continent    subcontinent            rank            area 
##               0               0               0               0               0 
##      landAreaKm            cca2            cca3       netChange      growthRate 
##               0               0               0               0               0 
## worldPercentage         density       densityMi           place         pop1980 
##               0               0               0               0               0 
##         pop2000         pop2010         pop2022         pop2023         pop2030 
##               0               0               0               0               0 
##         pop2050 
##               0
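
A blanket replacement with zero also touches the character column cca2, where zero is not a meaningful country code. The sketch below uses base R on a small hypothetical data frame (`toy` and its values are illustrative, not taken from the dataset) and fills numeric and character columns separately. It also illustrates one possible cause of the single missing cca2 value. This is an assumption that has not been checked against the source file: `read.csv()` treats the literal string "NA" as missing by default, and "NA" is also the ISO code for Namibia.

```r
# Hypothetical miniature of the country table (values are illustrative only)
toy <- data.frame(
  country   = c("A", "B", "C"),
  cca2      = c("AA", NA, "CC"),
  netChange = c(0.01, NA, 0.03),
  stringsAsFactors = FALSE
)

# Fill each column with a value that makes sense for its type
toy_filled <- toy
toy_filled$cca2[is.na(toy_filled$cca2)] <- "unknown"
toy_filled$netChange[is.na(toy_filled$netChange)] <- 0

print(colSums(is.na(toy_filled)))

# read.csv() parses the string "NA" as missing unless na.strings is changed
csv_text <- "country,cca2\nNamibia,NA\nIndia,IN"
parsed_default <- read.csv(text = csv_text, stringsAsFactors = FALSE)
parsed_literal <- read.csv(text = csv_text, na.strings = "",
                           stringsAsFactors = FALSE)

print(is.na(parsed_default$cca2))  # the first code is read as missing
print(parsed_literal$cca2)         # the first code is kept as the text "NA"
```

Keep in mind that zero-filled numeric values still enter `summary()`, so they pull means and quartiles toward zero.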

2.1.3 Subsetting and Practical Statistic

cca2 and cca3 (country code abbreviations) are not needed for the analysis or visualization, so we drop these columns.

pop_country <- pop_country %>%
  select(-cca2, -cca3)
names(pop_country)
##  [1] "country"         "continent"       "subcontinent"    "rank"           
##  [5] "area"            "landAreaKm"      "netChange"       "growthRate"     
##  [9] "worldPercentage" "density"         "densityMi"       "place"          
## [13] "pop1980"         "pop2000"         "pop2010"         "pop2022"        
## [17] "pop2023"         "pop2030"         "pop2050"

Next, we inspect the statistical summary of the dataset:

summary(pop_country)
##    country           continent         subcontinent            rank       
##  Length:234         Length:234         Length:234         Min.   :  1.00  
##  Class :character   Class :character   Class :character   1st Qu.: 59.25  
##  Mode  :character   Mode  :character   Mode  :character   Median :117.50  
##                                                           Mean   :117.50  
##                                                           3rd Qu.:175.75  
##                                                           Max.   :234.00  
##       area            landAreaKm         netChange           growthRate       
##  Min.   :       0   Min.   :       0   Min.   :-0.028600   Min.   :-0.074500  
##  1st Qu.:    2650   1st Qu.:    2626   1st Qu.: 0.000000   1st Qu.: 0.002325  
##  Median :   81200   Median :   75689   Median : 0.000650   Median : 0.008200  
##  Mean   :  581450   Mean   :  557112   Mean   : 0.009954   Mean   : 0.009737  
##  3rd Qu.:  430426   3rd Qu.:  404788   3rd Qu.: 0.007075   3rd Qu.: 0.016850  
##  Max.   :17098242   Max.   :16376870   Max.   : 0.418400   Max.   : 0.049800  
##  worldPercentage       density            densityMi            place      
##  Min.   :0.000000   Min.   :    0.138   Min.   :    0.36   Min.   :  4.0  
##  1st Qu.:0.000100   1st Qu.:   39.748   1st Qu.:  102.95   1st Qu.:223.0  
##  Median :0.000700   Median :   97.481   Median :  252.48   Median :439.0  
##  Mean   :0.004294   Mean   :  451.288   Mean   : 1168.84   Mean   :439.1  
##  3rd Qu.:0.002900   3rd Qu.:  242.929   3rd Qu.:  629.19   3rd Qu.:659.8  
##  Max.   :0.178500   Max.   :21402.705   Max.   :55433.01   Max.   :894.0  
##     pop1980             pop2000             pop2010         
##  Min.   :      733   Min.   :6.510e+02   Min.   :5.960e+02  
##  1st Qu.:   229614   1st Qu.:3.272e+05   1st Qu.:3.931e+05  
##  Median :  3141146   Median :4.293e+06   Median :4.943e+06  
##  Mean   : 18984617   Mean   :2.627e+07   Mean   :2.985e+07  
##  3rd Qu.:  9826054   3rd Qu.:1.576e+07   3rd Qu.:1.916e+07  
##  Max.   :982372466   Max.   :1.264e+09   Max.   :1.348e+09  
##     pop2022             pop2023             pop2030         
##  Min.   :5.100e+02   Min.   :5.180e+02   Min.   :5.610e+02  
##  1st Qu.:4.197e+05   1st Qu.:4.226e+05   1st Qu.:4.561e+05  
##  Median :5.560e+06   Median :5.644e+06   Median :6.178e+06  
##  Mean   :3.407e+07   Mean   :3.437e+07   Mean   :3.651e+07  
##  3rd Qu.:2.248e+07   3rd Qu.:2.325e+07   3rd Qu.:2.616e+07  
##  Max.   :1.426e+09   Max.   :1.429e+09   Max.   :1.515e+09  
##     pop2050         
##  Min.   :7.310e+02  
##  1st Qu.:5.466e+05  
##  Median :6.352e+06  
##  Mean   :4.149e+07  
##  3rd Qu.:3.569e+07  
##  Max.   :1.670e+09

Quick insights from this summary:

  • The most populated country in the world has 1,428,627,663 people (2023)
  • The least populated country in the world has 518 people (2023)
  • The average country population is 34,374,425 people (2023)
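
`summary()` reports the extreme values but not which countries they belong to. As a minimal sketch, the lookup below runs on only the first three rows shown in the `glimpse()` output above (`pop_sample` is a hypothetical name), so its minimum and mean are those of this sample and not of the full table. The same calls should work on `pop_country`.

```r
# First three rows of the country table, as printed by glimpse() above
pop_sample <- data.frame(
  country = c("India", "China", "United States"),
  pop2023 = c(1428627663, 1425671352, 339996563),
  stringsAsFactors = FALSE
)

# Rows holding the largest and smallest 2023 population in this sample
most_populated  <- pop_sample[which.max(pop_sample$pop2023), ]
least_populated <- pop_sample[which.min(pop_sample$pop2023), ]

# Mean population across the sampled rows
average_pop <- mean(pop_sample$pop2023)

cat("Most populated: ", most_populated$country,
    format(most_populated$pop2023, big.mark = ","), "\n")
cat("Least populated:", least_populated$country,
    format(least_populated$pop2023, big.mark = ","), "\n")
cat("Sample average: ", format(round(average_pop), big.mark = ","), "\n")
```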

2.2 Data 2

This dataset shows the world population and its growth rate year by year from 1950 to 2023.

2.2.1 Data Input

growth <- read.csv("data_assets/Population Growth.csv")
head(growth)
glimpse(growth)
## Rows: 74
## Columns: 3
## $ Year                   <int> 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957,…
## $ Population.Growth.Rate <chr> "2,499,322,157", "2,543,130,380", "2,590,270,89…
## $ Growth.Rate            <dbl> 0.0000, 0.0175, 0.0185, 0.0193, 0.0196, 0.0201,…

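The Population.Growth.Rate column is stored as character because its values contain thousands separators, so it must be converted to a numeric type. Despite its name, this column holds the total world population for each year; the yearly growth rate is in Growth.Rate.

Remove the commas from Population.Growth.Rate and convert the column to numeric: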

growth$Population.Growth.Rate <- as.numeric(gsub(",","",growth$Population.Growth.Rate))
glimpse(growth)
## Rows: 74
## Columns: 3
## $ Year                   <int> 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957,…
## $ Population.Growth.Rate <dbl> 2499322157, 2543130380, 2590270899, 2640278797,…
## $ Growth.Rate            <dbl> 0.0000, 0.0175, 0.0185, 0.0193, 0.0196, 0.0201,…

2.2.2 Missing Value

colSums(is.na(growth))
##                   Year Population.Growth.Rate            Growth.Rate 
##                      0                      0                      0

There are no missing values in this dataset.

2.2.3 Practical Statistic

summary(growth)
##       Year      Population.Growth.Rate  Growth.Rate     
##  Min.   :1950   Min.   :2.499e+09      Min.   :0.00000  
##  1st Qu.:1968   1st Qu.:3.565e+09      1st Qu.:0.01270  
##  Median :1986   Median :4.996e+09      Median :0.01750  
##  Mean   :1986   Mean   :5.090e+09      Mean   :0.01593  
##  3rd Qu.:2005   3rd Qu.:6.538e+09      3rd Qu.:0.01890  
##  Max.   :2023   Max.   :8.045e+09      Max.   :0.02240

Quick insights from this summary:

  • The current world population is 8,045,311,447 people (2023)
  • The average yearly growth rate is 0.01593 (about 1.59%)
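
To make the relationship between the two columns explicit, the sketch below recomputes the yearly growth rate from the first four population values shown in the `glimpse()` output. It assumes the rate is defined as this year's population divided by last year's, minus one; the rounded results agree with the Growth.Rate values listed above. It also computes a compound annual growth rate between the 1950 and 2023 totals as an alternative to the arithmetic mean, which includes the zero recorded for the 1950 baseline year. The variable names are illustrative.

```r
# World population for 1950 to 1953, taken from the glimpse() output above
population <- c(2499322157, 2543130380, 2590270899, 2640278797)

# Year-over-year growth: P[t] / P[t-1] - 1 (undefined for the first year)
n <- length(population)
growth_rate <- c(NA, population[-1] / population[-n] - 1)
print(round(growth_rate, 4))

# Compound annual growth rate over the 73 yearly steps from 1950 to 2023
pop_1950 <- 2499322157
pop_2023 <- 8045311447
cagr <- (pop_2023 / pop_1950)^(1 / (2023 - 1950)) - 1
print(round(cagr, 4))
```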

3 Study Case

3.1 Population Growth Year by Year from 1950

plot_agg_growth <- growth %>%
  mutate(label = glue("Year: {Year}
                      Population: {format(Population.Growth.Rate, big.mark = ',')}
                      Growth rate: {Growth.Rate}"))

plot_growth <- plot_agg_growth %>%
  ggplot(aes(x = Year,
             y = Population.Growth.Rate,
             text = label)) + 
  geom_point(color = "navy",size = 0.5) +
  geom_line(aes(group = 1),color = "red",linewidth = 1) +
  geom_area(aes(group = 1), alpha = 0.3) +
  scale_y_continuous(labels = function(x) paste0(x / 1e9, " bn")) + 
  labs(x = "Year", y = "Population", title = "Population Growth Year by Year from 1950") +
  theme_gpt

ggplotly(plot_growth, tooltip = "text")

Quick insights:

  • The current world population is 8,045,311,447 people

  • The world population has increased every year since 1950: the yearly growth rate is never negative and ranges from 0 (the 1950 baseline) up to 0.0224

3.2 World Population in Each Continent

library(scales)
country_agg <- pop_country %>%
  select(continent, pop2023, pop2050) %>%
  group_by(continent) %>%
  summarise(pop2023_sum = sum(pop2023),
            pop2050_sum = sum(pop2050)) %>%
  mutate(label = sprintf("Current Population: %s\nExpected Population in 2050: %s",
                         comma(pop2023_sum),
                         comma(pop2050_sum)))

# Overlay two bars per continent: the 2050 projection (gray) is drawn first,
# then the 2023 population (dark red) is drawn on top of it
plot_country2 <- country_agg %>%
  ggplot(aes(y = reorder(continent, pop2023_sum), text = label)) +
  geom_col(aes(x = pop2050_sum), fill = "gray") +
  geom_col(aes(x = pop2023_sum), fill = "darkred") +
  labs(x = "Population", y = "Continent", title = "Current Population in Each Continent") +
  theme_gpt +
  scale_x_continuous(labels = function(x) paste0(x / 1e6, " m"))

ggplotly(plot_country2, tooltip = "text")

Quick insights:

  • In this plot, the gray bar is the expected population in 2050 and the red bar is the current population

  • If no gray bar is visible for a continent, its population is expected to decline by 2050, which is a good thing!

  • Europe is the only continent whose population is expected to decline by 2050

  • Asia is the most populated continent in the world

  • Oceania is the least populated continent in the world
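
Because the red bar covers the gray bar whenever the projection is lower than the current total, the decline is easier to confirm numerically than visually. The sketch below shows one way to flag such cases. The data frame `toy_agg` and all of its numbers are hypothetical and only mimic the shape of `country_agg`; they are not the real continent totals.

```r
# Hypothetical aggregate with the same columns as country_agg
toy_agg <- data.frame(
  continent   = c("X", "Y", "Z"),
  pop2023_sum = c(100, 250, 40),
  pop2050_sum = c(120, 230, 55),
  stringsAsFactors = FALSE
)

# Absolute and relative change between the current and projected totals
toy_agg$change     <- toy_agg$pop2050_sum - toy_agg$pop2023_sum
toy_agg$pct_change <- toy_agg$change / toy_agg$pop2023_sum

# Continents whose projected population is below the current one
declining <- toy_agg$continent[toy_agg$change < 0]

print(toy_agg)
print(declining)
```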

3.3 Density vs Land Area in each Country of Current Population

What is density?

Population density is the number of people per square kilometer (km²) in a specific area. It is a commonly used measure of how crowded or sparsely populated an area is.
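
As a small worked example of this definition, the sketch below divides population by land area for the first two rows shown in the `glimpse()` output. It assumes the dataset's density column was derived from pop2023 and landAreaKm; the results should come out very close to the listed densities of 480.5033 and 151.2696. The names `pop_sample` and `density_check` are illustrative.

```r
# First two rows of the country table, as printed by glimpse() above
pop_sample <- data.frame(
  country    = c("India", "China"),
  pop2023    = c(1428627663, 1425671352),
  landAreaKm = c(2973190, 9424703),
  stringsAsFactors = FALSE
)

# Density = people per square kilometer of land area
pop_sample$density_check <- pop_sample$pop2023 / pop_sample$landAreaKm

print(round(pop_sample$density_check, 4))
```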

country_agg1 <- pop_country %>%
  mutate(label = glue("{country}, {subcontinent}
                      World Rank : {rank}
                      Population : {comma(pop2023)}
                      Land area : {comma(landAreaKm)} km²
                      Density : {comma(density, accuracy = 1)} people/km²"))

plot_country1 <- country_agg1 %>%  
  ggplot(aes(x = landAreaKm,
             y = density,
             color = continent,
             text = label)) + 
  geom_point() +
  labs(x = "Land Area (km²)", y = "Density (people/km²)", title = "Density vs Land Area in each Country of Current Population",color = "Continent") +
  theme_gpt + 
  scale_color_viridis_d()+
  scale_x_continuous(labels = function(x) paste0(x / 1e6, " m"))+
  scale_y_continuous(labels = comma)

ggplotly(plot_country1, tooltip = "text")

Quick insights:

  • This plot suggests little to no correlation between density and land area

  • A country with high density and a small land area is a crowded environment, while a country with low density and a large land area has plenty of vacant space

  • Most countries cluster in the region of low density and small land area

4 Conclusion

From the data shown above, we can conclude the following:

  • The world population has increased every year since 1950

  • Europe is the only continent whose population is expected to decline by 2050

  • Asia is the most populated continent in the world

  • Oceania is the least populated continent in the world

  • Most countries cluster in the region of low density and small land area