TPLA_P2_J0303201065_Rahma Fairuz Rania
More projects: rpubs.com/rahmarania or github.com/rahmarania
Preprocessing
Business Question
As a growth observer, I want to know which country has the smallest area and which continent has the largest population.
Import Libraries
Load the libraries we need (install any missing ones first with install.packages()).
library(dplyr)   # Data manipulation
library(ggplot2) # Visualization
## Warning: package 'ggplot2' was built under R version 4.2.3
library(glue)    # String interpolation for hover text labels
library(plotly)  # Interactive plots
## Warning: package 'plotly' was built under R version 4.2.3
Import Data
Place the .csv file inside the project folder (this project keeps it in a data/ subfolder), or change the path passed to read.csv() to wherever the file is stored.
wpop <- read.csv("data/world_population.csv")
head(wpop)
## Rank CCA3 Country.Territory Capital Continent X2022.Population
## 1 36 AFG Afghanistan Kabul Asia 41128771
## 2 138 ALB Albania Tirana Europe 2842321
## 3 34 DZA Algeria Algiers Africa 44903225
## 4 213 ASM American Samoa Pago Pago Oceania 44273
## 5 203 AND Andorra Andorra la Vella Europe 79824
## 6 42 AGO Angola Luanda Africa 35588987
## X2020.Population X2015.Population X2010.Population X2000.Population
## 1 38972230 33753499 28189672 19542982
## 2 2866849 2882481 2913399 3182021
## 3 43451666 39543154 35856344 30774621
## 4 46189 51368 54849 58230
## 5 77700 71746 71519 66097
## 6 33428485 28127721 23364185 16394062
## X1990.Population X1980.Population X1970.Population Area..km..
## 1 10694796 12486631 10752971 652230
## 2 3295066 2941651 2324731 28748
## 3 25518074 18739378 13795915 2381741
## 4 47818 32886 27075 199
## 5 53569 35611 19860 468
## 6 11828638 8330047 6029700 1246700
## Density..per.km.. Growth.Rate World.Population.Percentage
## 1 63.0587 1.0257 0.52
## 2 98.8702 0.9957 0.04
## 3 18.8531 1.0164 0.56
## 4 222.4774 0.9831 0.00
## 5 170.5641 1.0100 0.00
## 6 28.5466 1.0315 0.45
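The column names above (X2022.Population, Country.Territory, Growth.Rate) come from read.csv(), whose check.names argument defaults to TRUE and passes the file's header through make.names(). The sketch below assumes raw headers like the strings shown; the actual headers of the CSV file are not shown in this project, so they are an assumption.

```r
# Hypothetical raw CSV headers (assumed; the real file header is not shown here)
raw_headers <- c("2022 Population", "Country/Territory", "Growth Rate",
                 "World Population Percentage")

# make.names() replaces invalid characters with "." and prefixes "X"
# to names that start with a digit; read.csv() does this by default
clean_headers <- make.names(raw_headers)
print(clean_headers)

# With check.names = FALSE the raw headers are kept as-is
# (they then have to be referred to with backticks in R code)
df <- read.csv(text = "2022 Population,Growth Rate\n41128771,1.0257",
               check.names = FALSE)
print(names(df))
```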
About Dataset
This dataset contains information about the world's population. The data can be found here.
Rank : Rank of the country by total population
CCA3 : Three-letter country code (ISO 3166-1 alpha-3)
Country.Territory : Name of the country or territory
Capital : Name of the capital city
Continent : Continent the country belongs to
X2022.Population : Total population in 2022
X2020.Population : Total population in 2020
X2015.Population : Total population in 2015
X2010.Population : Total population in 2010
X2000.Population : Total population in 2000
X1990.Population : Total population in 1990
X1980.Population : Total population in 1980
X1970.Population : Total population in 1970
Area..km.. : Total area of the country in km²
Density..per.km.. : Population density (people per km²)
Growth.Rate : Population growth rate, stored as a ratio (values above 1 mean the population is growing)
World.Population.Percentage : Share of the world population living in the country (%)
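Because Growth.Rate is a ratio, it can be read as a yearly multiplier. That reading is an assumption, but the data supports it: for Albania, the 2020 to 2022 change (2866849 to 2842321) works out to about 0.9957 per year, which matches its Growth.Rate. A rough sketch, assuming the rate stays constant (a strong simplification); project_population is a hypothetical helper, not part of the project:

```r
# Hypothetical helper: project a population n years ahead,
# assuming Growth.Rate is a constant yearly multiplier
project_population <- function(population, growth_rate, years) {
  population * growth_rate^years
}

# Yearly multiplier implied by Albania's 2020 -> 2022 populations (two years)
alb_implied <- sqrt(2842321 / 2866849)
print(round(alb_implied, 4))

# Afghanistan, first row of head(wpop)
afg_2022 <- 41128771
afg_rate <- 1.0257

# Percentage change per year implied by the ratio
pct_per_year <- (afg_rate - 1) * 100
print(round(pct_per_year, 2))

# Projected population one and five years after 2022
projected <- round(project_population(afg_2022, afg_rate, c(1, 5)))
print(projected)
```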
Missing Values
Our data is clean: no column contains NA values.
colSums(is.na(wpop))
## Rank CCA3
## 0 0
## Country.Territory Capital
## 0 0
## Continent X2022.Population
## 0 0
## X2020.Population X2015.Population
## 0 0
## X2010.Population X2000.Population
## 0 0
## X1990.Population X1980.Population
## 0 0
## X1970.Population Area..km..
## 0 0
## Density..per.km.. Growth.Rate
## 0 0
## World.Population.Percentage
## 0
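is.na() returns a TRUE/FALSE value for every cell and colSums() counts the TRUE values per column, so an all-zero result means nothing is missing. A small sketch on a made-up data frame (hypothetical values, with one NA inserted on purpose) shows what a non-zero count looks like:

```r
# Toy data frame (hypothetical; Albania's area deliberately set to NA)
toy <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria"),
  Area    = c(652230, NA, 2381741),
  Growth  = c(1.0257, 0.9957, 1.0164)
)

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Yes/no check for the whole data frame
print(anyNA(toy))

# Keep only rows without any missing value
complete_rows <- toy[complete.cases(toy), ]
print(nrow(complete_rows))
```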
Check Data Types
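Convert columns to the correct data type where needed. glimpse() shows that Continent is stored as character, but it only holds six repeated categories, so it is better stored as a factor.
glimpse(wpop)
## Rows: 234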
## Columns: 17
## $ Rank <int> 36, 138, 34, 213, 203, 42, 224, 201, 33, 1…
## $ CCA3 <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", …
## $ Country.Territory <chr> "Afghanistan", "Albania", "Algeria", "Amer…
## $ Capital <chr> "Kabul", "Tirana", "Algiers", "Pago Pago",…
## $ Continent <chr> "Asia", "Europe", "Africa", "Oceania", "Eu…
## $ X2022.Population <int> 41128771, 2842321, 44903225, 44273, 79824,…
## $ X2020.Population <int> 38972230, 2866849, 43451666, 46189, 77700,…
## $ X2015.Population <int> 33753499, 2882481, 39543154, 51368, 71746,…
## $ X2010.Population <int> 28189672, 2913399, 35856344, 54849, 71519,…
## $ X2000.Population <int> 19542982, 3182021, 30774621, 58230, 66097,…
## $ X1990.Population <int> 10694796, 3295066, 25518074, 47818, 53569,…
## $ X1980.Population <int> 12486631, 2941651, 18739378, 32886, 35611,…
## $ X1970.Population <int> 10752971, 2324731, 13795915, 27075, 19860,…
## $ Area..km.. <int> 652230, 28748, 2381741, 199, 468, 1246700,…
## $ Density..per.km.. <dbl> 63.0587, 98.8702, 18.8531, 222.4774, 170.5…
## $ Growth.Rate <dbl> 1.0257, 0.9957, 1.0164, 0.9831, 1.0100, 1.…
## $ World.Population.Percentage <dbl> 0.52, 0.04, 0.56, 0.00, 0.00, 0.45, 0.00, …
wpop <- wpop %>% mutate(Continent = as.factor(Continent))
str(wpop)
## 'data.frame': 234 obs. of 17 variables:
## $ Rank : int 36 138 34 213 203 42 224 201 33 140 ...
## $ CCA3 : chr "AFG" "ALB" "DZA" "ASM" ...
## $ Country.Territory : chr "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## $ Capital : chr "Kabul" "Tirana" "Algiers" "Pago Pago" ...
## $ Continent : Factor w/ 6 levels "Africa","Asia",..: 2 3 1 5 3 1 4 4 6 2 ...
## $ X2022.Population : int 41128771 2842321 44903225 44273 79824 35588987 15857 93763 45510318 2780469 ...
## $ X2020.Population : int 38972230 2866849 43451666 46189 77700 33428485 15585 92664 45036032 2805608 ...
## $ X2015.Population : int 33753499 2882481 39543154 51368 71746 28127721 14525 89941 43257065 2878595 ...
## $ X2010.Population : int 28189672 2913399 35856344 54849 71519 23364185 13172 85695 41100123 2946293 ...
## $ X2000.Population : int 19542982 3182021 30774621 58230 66097 16394062 11047 75055 37070774 3168523 ...
## $ X1990.Population : int 10694796 3295066 25518074 47818 53569 11828638 8316 63328 32637657 3556539 ...
## $ X1980.Population : int 12486631 2941651 18739378 32886 35611 8330047 6560 64888 28024803 3135123 ...
## $ X1970.Population : int 10752971 2324731 13795915 27075 19860 6029700 6283 64516 23842803 2534377 ...
## $ Area..km.. : int 652230 28748 2381741 199 468 1246700 91 442 2780400 29743 ...
## $ Density..per.km.. : num 63.1 98.9 18.9 222.5 170.6 ...
## $ Growth.Rate : num 1.026 0.996 1.016 0.983 1.01 ...
## $ World.Population.Percentage: num 0.52 0.04 0.56 0 0 0.45 0 0 0.57 0.03 ...
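as.factor() stores each distinct text value once as a level (sorted alphabetically by default) and keeps one integer code per row, which is what the "Factor w/ 6 levels ... 2 3 1 5 3 1" line from str() shows. A small sketch using the first six Continent values from head(wpop):

```r
# First six Continent values from head(wpop)
continent <- c("Asia", "Europe", "Africa", "Oceania", "Europe", "Africa")

cont_fct <- as.factor(continent)

# Levels are the sorted distinct values
print(levels(cont_fct))

# Each row is stored as an integer code pointing into levels()
print(as.integer(cont_fct))

# Number of rows per level
print(table(cont_fct))
```

With only four continents in this sample, Oceania gets code 4; in the full data it gets code 5 because North America sorts before it.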
Data Wrangling
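The wrangling steps in this section follow two patterns: sort the rows and keep the n smallest, and group by a category and sum a column. The sketch below shows both patterns in base R (an alternative to the dplyr pipelines used here, not the project's own code), on a data frame built from the six rows shown by head(wpop):

```r
# Six rows taken from head(wpop)
sample_pop <- data.frame(
  Country   = c("Afghanistan", "Albania", "Algeria",
                "American Samoa", "Andorra", "Angola"),
  Continent = c("Asia", "Europe", "Africa", "Oceania", "Europe", "Africa"),
  Area      = c(652230, 28748, 2381741, 199, 468, 1246700),
  WorldPct  = c(0.52, 0.04, 0.56, 0.00, 0.00, 0.45)
)

# Pattern 1: sort by area (ascending) and keep the n smallest rows
n <- 3
smallest <- head(sample_pop[order(sample_pop$Area), ], n)
print(smallest$Country)

# Pattern 2: group by continent and sum the world-population share
by_continent <- aggregate(WorldPct ~ Continent, data = sample_pop, FUN = sum)
print(by_continent)
```

Six rows are far too few to say anything about the real continent totals; the point is only the mechanics of each step.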
# Smallest countries: the 10 countries with the smallest area
couare <- wpop %>%
  select(Country.Territory, ar = Area..km.., grow = Growth.Rate) %>%
  arrange(ar) %>%
  head(10) %>%
  mutate(label = glue("Total Area {ar} Km2\nGrowth Rate {grow}"))

# Continent with the most population: sum each country's share of the world population
conpop <- wpop %>%
  select(Continent, World.Population.Percentage) %>%
  group_by(Continent) %>%
  summarise(poptotal = sum(World.Population.Percentage)) %>%
  ungroup()

# Asian countries with the largest combined 2020 + 2022 population (top 6).
# Convert to double before summing: the combined totals for the largest countries
# exceed R's integer limit (about 2.1 billion), so this avoids integer overflow.
aspop <- wpop %>%
  filter(Continent == "Asia") %>%
  group_by(Country.Territory) %>%
  summarise(total = sum(as.numeric(X2022.Population), as.numeric(X2020.Population))) %>%
  arrange(desc(total)) %>%
  head() %>%
  ungroup()
Visualization
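The bar charts in this section pass reorder(Country.Territory, value) to the y aesthetic so that bars are sorted by value instead of alphabetically. reorder() turns the labels into a factor whose levels are ordered by a summary (the mean, by default) of the second argument. A minimal sketch with the areas of three countries from head(wpop):

```r
countries <- c("Albania", "American Samoa", "Andorra")
areas     <- c(28748, 199, 468)

# Default factor levels: alphabetical
print(levels(factor(countries)))

# reorder(): levels sorted by mean area within each level (ascending)
ordered_fct <- reorder(countries, areas)
print(levels(ordered_fct))

# Negating the values reverses the order (largest first)
print(levels(reorder(countries, -areas)))
```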
# Smallest Countries
plot1 <- ggplot(couare, aes(x = ar, y = reorder(Country.Territory, ar), text = label)) +
  geom_col(aes(fill = ar)) +
  labs(title = "Smallest Countries by Area", x = "Area (km²)", y = "Country") +
  scale_fill_gradient(low = "red", high = "green") +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold", hjust = 0.5))
ggplotly(plot1, tooltip = "text")
# Continent with the most population
ggplot(conpop, aes(x = "", y = poptotal, fill = Continent)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +
  geom_text(aes(label = paste0(round(poptotal), "%")),
            position = position_stack(vjust = 0.5), color = "white") +
  labs(title = "Share of World Population by Continent") +
  scale_fill_brewer(palette = "Paired") +
  theme_void() +
  theme(plot.title = element_text(color = "black", size = 16, face = "bold", hjust = 0.5))
# Most populous Asian countries
ggplot(aspop, aes(x = total, y = reorder(Country.Territory, total))) +
  geom_col(aes(fill = Country.Territory)) +
  labs(title = "Most Populous Asian Countries (2020 + 2022 Combined)",
       x = "Total population", y = "Country") +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold", hjust = 0.5))
Conclusion
From the visualizations above, we can conclude that the smallest country by area is Vatican, and its growth rate does not suggest any risk of overcrowding. The pie chart shows that Asia is the most populous continent, and the bar chart shows that China is the most populous country in Asia. Policies that restrain the birth rate may help keep population density from becoming excessive.