Many people are concerned that the world population is reaching unsustainable levels. Global population growth has been a significant demographic trend over the past few centuries, and the increase has been especially rapid since the Industrial Revolution. This report presents specific figures on the human population. Hopefully, this data makes us aware of the need to manage population growth before resource depletion, food insecurity, urbanization and overcrowding, and other negative impacts affect our world.
The data was obtained from Kaggle.
To analyze and visualize the data, we will use the following libraries:
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(glue)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggpubr)
library(scales)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(tidyr)
library(stringr)
library(viridis)
## Loading required package: viridisLite
##
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
##
## viridis_pal
Theme for visualization
theme_gpt <- theme(
legend.key = element_rect(fill = "black"),
legend.background = element_rect(color = "white", fill = "#2F4F4F"),
plot.subtitle = element_text(size = 6, color = "white"),
panel.background = element_rect(fill = "#F8F8FF"),
panel.border = element_rect(fill = NA),
panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(color = "darkgrey", linetype = 2),
panel.grid.minor.y = element_blank(),
plot.background = element_rect(fill = "#263238"),
text = element_text(color = "white"),
axis.text = element_text(color = "white")
)
This data shows the world population by country.
pop_country <- read.csv("data_assets/countries-table_updated.csv")
head(pop_country)
glimpse(pop_country)
## Rows: 234
## Columns: 21
## $ country <chr> "India", "China", "United States", "Indonesia", "Pakis…
## $ continent <chr> "Asia", "Asia", "North America", "Asia", "Asia", "Afri…
## $ subcontinent <chr> "South Asia", "East Asia", "North America", "Southeast…
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ area <dbl> 3287590, 9706961, 9372610, 1904569, 881912, 923768, 85…
## $ landAreaKm <dbl> 2973190, 9424703, 9147420, 1877519, 770880, 910770, 83…
## $ cca2 <chr> "IN", "CN", "US", "ID", "PK", "NG", "BR", "BD", "RU", …
## $ cca3 <chr> "IND", "CHN", "USA", "IDN", "PAK", "NGA", "BRA", "BGD"…
## $ netChange <dbl> 0.4184, -0.0113, 0.0581, 0.0727, 0.1495, 0.1680, 0.039…
## $ growthRate <dbl> 0.0081, -0.0002, 0.0050, 0.0074, 0.0198, 0.0241, 0.005…
## $ worldPercentage <dbl> 0.1785, 0.1781, 0.0425, 0.0347, 0.0300, 0.0280, 0.0270…
## $ density <dbl> 480.5033, 151.2696, 37.1686, 147.8196, 311.9625, 245.7…
## $ densityMi <dbl> 1244.5036, 391.7884, 96.2666, 382.8528, 807.9829, 636.…
## $ place <int> 356, 156, 840, 360, 586, 566, 76, 50, 643, 484, 231, 3…
## $ pop1980 <int> 696828385, 982372466, 223140018, 148177096, 80624057, …
## $ pop2000 <int> 1059633675, 1264099069, 282398554, 214072421, 15436992…
## $ pop2010 <int> 1240613620, 1348191368, 311182845, 244016173, 19445449…
## $ pop2022 <int> 1417173173, 1425887337, 338289857, 275501339, 23582486…
## $ pop2023 <int> 1428627663, 1425671352, 339996563, 277534122, 24048565…
## $ pop2030 <int> 1514994080, 1415605906, 352162301, 292150100, 27402983…
## $ pop2050 <int> 1670490596, 1312636325, 375391963, 317225213, 36780846…
Each column already has an appropriate data type, so no conversion is needed.
colSums(is.na(pop_country))
##         country       continent    subcontinent            rank            area
##               0               0               0               0               0
##      landAreaKm            cca2            cca3       netChange      growthRate
##               0               1               0               8               0
## worldPercentage         density       densityMi           place         pop1980
##               6               0               0               0               0
##         pop2000         pop2010         pop2022         pop2023         pop2030
##               0               0               0               0               0
##         pop2050
##               0
The dataset has some missing values, but every row carries important data for its country, so we cannot simply drop the rows that contain them. Instead, we replace the missing values with zero.
pop_country[is.na(pop_country)] <- 0
colSums(is.na(pop_country))
##         country       continent    subcontinent            rank            area
##               0               0               0               0               0
##      landAreaKm            cca2            cca3       netChange      growthRate
##               0               0               0               0               0
## worldPercentage         density       densityMi           place         pop1980
##               0               0               0               0               0
##         pop2000         pop2010         pop2022         pop2023         pop2030
##               0               0               0               0               0
##         pop2050
##               0
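One possible explanation for the single missing cca2 value, offered here as an assumption rather than something the dataset confirms: Namibia's ISO 3166-1 alpha-2 code is "NA", and read.csv treats the string NA as missing by default. The sketch below uses a small made-up CSV (not the Kaggle file) to show the effect and how the na.strings argument avoids it.

```r
# Tiny illustrative CSV; the rows are made up for this sketch, not read from the Kaggle file.
csv_text <- "country,cca2\nNamibia,NA\nIndia,IN"

# Default behaviour: the literal string "NA" is parsed as a missing value.
default_read <- read.csv(text = csv_text)

# Treat only empty fields as missing, so "NA" survives as a country code.
kept_read <- read.csv(text = csv_text, na.strings = "")

is.na(default_read$cca2)
kept_read$cca2
```

If that is the cause, the zero replacement turns a valid code into "0". Since the code columns are not used in the analysis, the results are unaffected.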
The cca2 and cca3 columns (country code abbreviations) are irrelevant to the analysis and visualization, so we drop them.
pop_country <- pop_country %>%
select(-cca2, -cca3)
names(pop_country)
## [1] "country" "continent" "subcontinent" "rank"
## [5] "area" "landAreaKm" "netChange" "growthRate"
## [9] "worldPercentage" "density" "densityMi" "place"
## [13] "pop1980" "pop2000" "pop2010" "pop2022"
## [17] "pop2023" "pop2030" "pop2050"
Next, we inspect the statistical summary of the dataset:
summary(pop_country)
## country continent subcontinent rank
## Length:234 Length:234 Length:234 Min. : 1.00
## Class :character Class :character Class :character 1st Qu.: 59.25
## Mode :character Mode :character Mode :character Median :117.50
## Mean :117.50
## 3rd Qu.:175.75
## Max. :234.00
## area landAreaKm netChange growthRate
## Min. : 0 Min. : 0 Min. :-0.028600 Min. :-0.074500
## 1st Qu.: 2650 1st Qu.: 2626 1st Qu.: 0.000000 1st Qu.: 0.002325
## Median : 81200 Median : 75689 Median : 0.000650 Median : 0.008200
## Mean : 581450 Mean : 557112 Mean : 0.009954 Mean : 0.009737
## 3rd Qu.: 430426 3rd Qu.: 404788 3rd Qu.: 0.007075 3rd Qu.: 0.016850
## Max. :17098242 Max. :16376870 Max. : 0.418400 Max. : 0.049800
## worldPercentage density densityMi place
## Min. :0.000000 Min. : 0.138 Min. : 0.36 Min. : 4.0
## 1st Qu.:0.000100 1st Qu.: 39.748 1st Qu.: 102.95 1st Qu.:223.0
## Median :0.000700 Median : 97.481 Median : 252.48 Median :439.0
## Mean :0.004294 Mean : 451.288 Mean : 1168.84 Mean :439.1
## 3rd Qu.:0.002900 3rd Qu.: 242.929 3rd Qu.: 629.19 3rd Qu.:659.8
## Max. :0.178500 Max. :21402.705 Max. :55433.01 Max. :894.0
## pop1980 pop2000 pop2010
## Min. : 733 Min. :6.510e+02 Min. :5.960e+02
## 1st Qu.: 229614 1st Qu.:3.272e+05 1st Qu.:3.931e+05
## Median : 3141146 Median :4.293e+06 Median :4.943e+06
## Mean : 18984617 Mean :2.627e+07 Mean :2.985e+07
## 3rd Qu.: 9826054 3rd Qu.:1.576e+07 3rd Qu.:1.916e+07
## Max. :982372466 Max. :1.264e+09 Max. :1.348e+09
## pop2022 pop2023 pop2030
## Min. :5.100e+02 Min. :5.180e+02 Min. :5.610e+02
## 1st Qu.:4.197e+05 1st Qu.:4.226e+05 1st Qu.:4.561e+05
## Median :5.560e+06 Median :5.644e+06 Median :6.178e+06
## Mean :3.407e+07 Mean :3.437e+07 Mean :3.651e+07
## 3rd Qu.:2.248e+07 3rd Qu.:2.325e+07 3rd Qu.:2.616e+07
## Max. :1.426e+09 Max. :1.429e+09 Max. :1.515e+09
## pop2050
## Min. :7.310e+02
## 1st Qu.:5.466e+05
## Median :6.352e+06
## Mean :4.149e+07
## 3rd Qu.:3.569e+07
## Max. :1.670e+09
Quick insight from this summary:
This data shows world population growth year by year from 1950 to 2023.
growth <- read.csv("data_assets/Population Growth.csv")
head(growth)
glimpse(growth)
## Rows: 74
## Columns: 3
## $ Year <int> 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957,…
## $ Population.Growth.Rate <chr> "2,499,322,157", "2,543,130,380", "2,590,270,89…
## $ Growth.Rate <dbl> 0.0000, 0.0175, 0.0185, 0.0193, 0.0196, 0.0201,…
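The Population.Growth.Rate column is stored as character (<chr>) because its values contain thousands separators, so it needs to be converted to a numeric type. Despite its name, the values in this column appear to be total population counts rather than rates.
Remove the commas from Population.Growth.Rate and convert the column to numeric: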
growth$Population.Growth.Rate <- as.numeric(gsub(",","",growth$Population.Growth.Rate))
glimpse(growth)
## Rows: 74
## Columns: 3
## $ Year <int> 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957,…
## $ Population.Growth.Rate <dbl> 2499322157, 2543130380, 2590270899, 2640278797,…
## $ Growth.Rate <dbl> 0.0000, 0.0175, 0.0185, 0.0193, 0.0196, 0.0201,…
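To see why the conversion needs the gsub() step, here is a minimal sketch using the first three population values shown in the glimpse() output above. Calling as.numeric() directly on strings with thousands separators yields NA.

```r
# First three values of Population.Growth.Rate as shown by glimpse(), still as text.
raw <- c("2,499,322,157", "2,543,130,380", "2,590,270,899")

# Direct conversion fails: the commas make the strings non-numeric (NA with a warning).
direct <- suppressWarnings(as.numeric(raw))

# Stripping the commas first gives proper numbers; fixed = TRUE matches a literal comma.
cleaned <- as.numeric(gsub(",", "", raw, fixed = TRUE))

direct
cleaned
```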
colSums(is.na(growth))
##                   Year Population.Growth.Rate            Growth.Rate
##                      0                      0                      0
There are no missing values in this dataset.
summary(growth)
## Year Population.Growth.Rate Growth.Rate
## Min. :1950 Min. :2.499e+09 Min. :0.00000
## 1st Qu.:1968 1st Qu.:3.565e+09 1st Qu.:0.01270
## Median :1986 Median :4.996e+09 Median :0.01750
## Mean :1986 Mean :5.090e+09 Mean :0.01593
## 3rd Qu.:2005 3rd Qu.:6.538e+09 3rd Qu.:0.01890
## Max. :2023 Max. :8.045e+09 Max. :0.02240
Quick insight from this summary:
plot_agg_growth <- growth %>%
mutate(label = glue("Year: {Year}
Population: {format(Population.Growth.Rate, big.mark = ',')}
Growth rate: {Growth.Rate}"))
plot_growth <- plot_agg_growth %>%
ggplot(aes(x = Year,
y = Population.Growth.Rate,
text = label)) +
geom_point(color = "navy",size = 0.5) +
geom_line(aes(group = 1),color = "red",linewidth = 1) +
geom_area(aes(group = 1), alpha = 0.3) +
scale_y_continuous(labels = function(x) paste0(x / 1e9, " bn")) +
labs(x = "Year", y = "Population", title = "Population Growth Year by Year from 1950") +
theme_gpt
ggplotly(plot_growth, tooltip = "text")
Quick insight:
The world population in 2023 is 8,045,311,447 people.
The world population has grown every year since 1950; the data shows no year of decline.
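The Growth.Rate column appears to be the year-over-year relative change in population, that is, the population of a year divided by the population of the previous year, minus 1. This is an inference from the values shown in the glimpse() output, not something the data source documents here. A minimal check using the first four years:

```r
# Population values for 1950-1953, copied from the glimpse() output above.
pop <- c(2499322157, 2543130380, 2590270899, 2640278797)

# Relative change from one year to the next; the first year has no predecessor,
# so it is set to 0 to mirror the dataset's first Growth.Rate entry.
rate <- c(0, diff(pop) / head(pop, -1))

round(rate, 4)
```

The rounded results match the first four Growth.Rate values in the dataset (0.0000, 0.0175, 0.0185, 0.0193).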
library(scales)
country_agg <- pop_country %>%
select(continent, pop2023, pop2050) %>%
group_by(continent) %>%
summarise(pop2023_sum = sum(pop2023),
pop2050_sum = sum(pop2050)) %>%
mutate(label = sprintf("Current Population: %s\nExpected Population in 2050: %s",
comma(pop2023_sum),
comma(pop2050_sum)))
plot_country2 <- country_agg %>%
  ggplot(aes(y = reorder(continent, pop2023_sum), text = label)) +
  # Draw the 2050 projection (gray) first so the 2023 bar (dark red) overlays it
  geom_col(aes(x = pop2050_sum), fill = "gray") +
  geom_col(aes(x = pop2023_sum), fill = "darkred") +
  labs(x = "Population", y = "Continent", title = "Current Population of Each Continent") +
  # Show the x axis in millions
  scale_x_continuous(labels = function(x) paste0(x / 1e6, " m")) +
  # No fill scale is needed: both fills are fixed colors, not mapped aesthetics
  theme_gpt
ggplotly(plot_country2, tooltip = "text")
Quick insight:
In this plot, the gray bar is the expected population in 2050 and the red bar is the current population.
If no gray bar is visible, the continent's population is expected to decline, which is a good thing!
Europe is the only continent whose population is expected to decline by 2050.
Asia is the most populated continent in the world.
Oceania is the least populated continent in the world.
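The "no gray bar" reading can also be made explicit in a table by comparing the two totals directly. The sketch below is illustrative only: it uses just three countries from the glimpse() output (so the totals are not real continent totals), and the column names change and declining are hypothetical additions.

```r
library(dplyr)

# Three rows copied from the glimpse() output; NOT full continent totals.
sample_pop <- data.frame(
  country   = c("India", "China", "United States"),
  continent = c("Asia", "Asia", "North America"),
  pop2023   = c(1428627663, 1425671352, 339996563),
  pop2050   = c(1670490596, 1312636325, 375391963)
)

sample_agg <- sample_pop %>%
  group_by(continent) %>%
  summarise(pop2023_sum = sum(pop2023),
            pop2050_sum = sum(pop2050)) %>%
  mutate(change    = pop2050_sum / pop2023_sum - 1,  # relative change 2023 -> 2050
         declining = pop2050_sum < pop2023_sum)      # TRUE when the gray bar is hidden

sample_agg
```

Applied to the full country_agg table, the same mutate() step would flag every continent whose projected 2050 total is below its 2023 total.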
What is density?
Population density is the number of people per square kilometer (km²) in a specific area. It is a commonly used measure of how crowded or sparsely populated an area is.
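As a quick check of this definition, the sketch below recomputes density from population and land area for two countries, using the pop2023 and landAreaKm values shown in the glimpse() output. That the dataset's density column is based on pop2023 and landAreaKm (rather than total area) is inferred from these numbers, not stated by the source.

```r
# Values copied from the glimpse() output for India and China.
pop2023    <- c(India = 1428627663, China = 1425671352)
landAreaKm <- c(India = 2973190,    China = 9424703)

# Population density: people per square kilometre of land.
density <- pop2023 / landAreaKm

round(density, 4)
```

These agree with the density values listed in the dataset for the two countries (480.5033 and 151.2696).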
country_agg1 <- pop_country %>%
mutate(label = glue("{country}, {subcontinent}
World Rank : {rank}
Population :{comma_format()(pop2023)}
Land area : {comma_format()(landAreaKm)} km²
Density : {comma_format(accuracy = 1)(density)} people/km² " ))
plot_country1 <- country_agg1 %>%
ggplot(aes(x = landAreaKm,
y = density,
color = continent,
text = label)) +
geom_point() +
labs(x = "Land Area (km²)", y = "Density (people/km²)", title = "Density vs Land Area in each Country of Current Population",color = "Continent") +
theme_gpt +
scale_color_viridis_d()+
scale_x_continuous(labels = function(x) paste0(x / 1e6, " m"))+
scale_y_continuous(labels = comma)
ggplotly(plot_country1, tooltip = "text")
Quick insight:
The plot suggests little to no correlation between density and land area.
A country with high density and a small land area is a crowded environment, whereas a country with low density and a large land area is sparsely populated.
Most countries cluster in the region of the plot with low density and small land area.
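One way to put a number on the visual impression is a rank correlation. The sketch below assumes Spearman's coefficient is a reasonable choice here, since the summary shows heavily skewed values; the source does not compute any correlation. It uses only the first five countries from the glimpse() output, so the result illustrates the call and is not a finding about all 234 countries.

```r
# First five rows from the glimpse() output
# (India, China, United States, Indonesia, Pakistan).
sample_df <- data.frame(
  landAreaKm = c(2973190, 9424703, 9147420, 1877519, 770880),
  density    = c(480.5033, 151.2696, 37.1686, 147.8196, 311.9625)
)

# Spearman's rank correlation is less sensitive to extreme values than Pearson's.
rho <- cor(sample_df$landAreaKm, sample_df$density, method = "spearman")
rho
```

On the full dataset, the equivalent call would be cor(pop_country$landAreaKm, pop_country$density, method = "spearman").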
From the data shown above, we can conclude:
The world population has increased every year since 1950.
Europe is the only continent whose population is expected to decline by 2050.
Asia is the most populated continent in the world.
Oceania is the least populated continent in the world.
Most countries have both low density and a small land area.