TPLA_P2_J0303201065_Rahma Fairuz Rania

More projects: rpubs.com/rahmarania or github.com/rahmarania

Preprocessing

Business Question

As a growth observer, I want to know which country has the smallest area and which continent has the largest population.

Import Library

Load the libraries we need.

library(dplyr) # Data manipulation
library(ggplot2) # Visualization
## Warning: package 'ggplot2' was built under R version 4.2.3
library(glue) # string interpolation for the hover-text labels
library(plotly) # interactive plot
## Warning: package 'plotly' was built under R version 4.2.3

Import data

Place the .csv file in a data/ folder inside the project directory (the path used below), or change the path to wherever the file is stored.

wpop <- read.csv("data/world_population.csv")
head(wpop)
##   Rank CCA3 Country.Territory          Capital Continent X2022.Population
## 1   36  AFG       Afghanistan            Kabul      Asia         41128771
## 2  138  ALB           Albania           Tirana    Europe          2842321
## 3   34  DZA           Algeria          Algiers    Africa         44903225
## 4  213  ASM    American Samoa        Pago Pago   Oceania            44273
## 5  203  AND           Andorra Andorra la Vella    Europe            79824
## 6   42  AGO            Angola           Luanda    Africa         35588987
##   X2020.Population X2015.Population X2010.Population X2000.Population
## 1         38972230         33753499         28189672         19542982
## 2          2866849          2882481          2913399          3182021
## 3         43451666         39543154         35856344         30774621
## 4            46189            51368            54849            58230
## 5            77700            71746            71519            66097
## 6         33428485         28127721         23364185         16394062
##   X1990.Population X1980.Population X1970.Population Area..km..
## 1         10694796         12486631         10752971     652230
## 2          3295066          2941651          2324731      28748
## 3         25518074         18739378         13795915    2381741
## 4            47818            32886            27075        199
## 5            53569            35611            19860        468
## 6         11828638          8330047          6029700    1246700
##   Density..per.km.. Growth.Rate World.Population.Percentage
## 1           63.0587      1.0257                        0.52
## 2           98.8702      0.9957                        0.04
## 3           18.8531      1.0164                        0.56
## 4          222.4774      0.9831                        0.00
## 5          170.5641      1.0100                        0.00
## 6           28.5466      1.0315                        0.45

About dataset

This dataset contains information about the world's population. The data can be found here.
Rank : Rank of the country by total population
CCA3 : Three-letter country code (ISO 3166-1 alpha-3)
Country.Territory : Name of the country or territory
Capital : Capital city of the country
Continent : Continent of the country
X2022.Population : Total population in 2022
X2020.Population : Total population in 2020
X2015.Population : Total population in 2015
X2010.Population : Total population in 2010
X2000.Population : Total population in 2000
X1990.Population : Total population in 1990
X1980.Population : Total population in 1980
X1970.Population : Total population in 1970
Area..km.. : Total area of the country in km²
Density..per.km.. : Population density (2022 population per km²)
Growth.Rate : Population growth rate, stored as a ratio (values above 1 mean the population is growing)
World.Population.Percentage : The country's share of the world population, in percent
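The column names above (for example X2022.Population and Country.Territory) are not necessarily the raw CSV headers: by default read.csv() passes the headers through make.names() (check.names = TRUE), which replaces characters that are not valid in R names with dots and prefixes names starting with a digit with X. The sketch below assumes raw headers such as "Country/Territory" and "2022 Population"; those strings are illustrative, not read from the actual file.

```r
# Illustrative raw headers (an assumption about the original CSV, not read from it)
raw_headers <- c("Country/Territory", "2022 Population", "Growth Rate")

# make.names() is what read.csv() applies when check.names = TRUE (the default)
cleaned <- make.names(raw_headers)
print(cleaned)

# The same effect through read.csv(), using a throwaway CSV file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Country/Territory,2022 Population", "Albania,2842321"), tmp)
mangled <- names(read.csv(tmp))                      # dots and X prefix added
kept    <- names(read.csv(tmp, check.names = FALSE)) # headers left as written
print(mangled)
print(kept)
unlink(tmp)
```

Passing check.names = FALSE keeps the original headers, but names containing spaces or starting with a digit then have to be wrapped in backticks inside dplyr code.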

Missing Values

The data has no missing (NA) values in any column.

colSums(is.na(wpop))
##                        Rank                        CCA3 
##                           0                           0 
##           Country.Territory                     Capital 
##                           0                           0 
##                   Continent            X2022.Population 
##                           0                           0 
##            X2020.Population            X2015.Population 
##                           0                           0 
##            X2010.Population            X2000.Population 
##                           0                           0 
##            X1990.Population            X1980.Population 
##                           0                           0 
##            X1970.Population                  Area..km.. 
##                           0                           0 
##           Density..per.km..                 Growth.Rate 
##                           0                           0 
## World.Population.Percentage 
##                           0
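is.na() only counts real NA values. With read.csv()'s default na.strings = "NA", an empty cell in a character column is read as an empty string "" rather than NA, so blanks are worth checking as well. A minimal sketch on a small made-up data frame (not the population data):

```r
# Made-up example: one blank country name and one missing area
df <- data.frame(
  Country = c("Albania", "", "Andorra"),
  Area = c(28748L, NA, 468L)
)

# Real NA values per column
na_counts <- colSums(is.na(df))

# Empty strings per column: df == "" gives a logical matrix,
# and na.rm = TRUE skips the cells that are NA
blank_counts <- colSums(df == "", na.rm = TRUE)

print(na_counts)
print(blank_counts)
```

Here the blank country name shows up only in blank_counts, and the missing area shows up only in na_counts.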

Check Data Types

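Check each column's type and convert where needed. Continent holds a small, fixed set of categories, so it is converted to a factor.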

glimpse(wpop)
## Rows: 234
## Columns: 17
## $ Rank                        <int> 36, 138, 34, 213, 203, 42, 224, 201, 33, 1…
## $ CCA3                        <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", …
## $ Country.Territory           <chr> "Afghanistan", "Albania", "Algeria", "Amer…
## $ Capital                     <chr> "Kabul", "Tirana", "Algiers", "Pago Pago",…
## $ Continent                   <chr> "Asia", "Europe", "Africa", "Oceania", "Eu…
## $ X2022.Population            <int> 41128771, 2842321, 44903225, 44273, 79824,…
## $ X2020.Population            <int> 38972230, 2866849, 43451666, 46189, 77700,…
## $ X2015.Population            <int> 33753499, 2882481, 39543154, 51368, 71746,…
## $ X2010.Population            <int> 28189672, 2913399, 35856344, 54849, 71519,…
## $ X2000.Population            <int> 19542982, 3182021, 30774621, 58230, 66097,…
## $ X1990.Population            <int> 10694796, 3295066, 25518074, 47818, 53569,…
## $ X1980.Population            <int> 12486631, 2941651, 18739378, 32886, 35611,…
## $ X1970.Population            <int> 10752971, 2324731, 13795915, 27075, 19860,…
## $ Area..km..                  <int> 652230, 28748, 2381741, 199, 468, 1246700,…
## $ Density..per.km..           <dbl> 63.0587, 98.8702, 18.8531, 222.4774, 170.5…
## $ Growth.Rate                 <dbl> 1.0257, 0.9957, 1.0164, 0.9831, 1.0100, 1.…
## $ World.Population.Percentage <dbl> 0.52, 0.04, 0.56, 0.00, 0.00, 0.45, 0.00, …
wpop <- wpop %>% mutate(Continent = as.factor(Continent))
str(wpop)
## 'data.frame':    234 obs. of  17 variables:
##  $ Rank                       : int  36 138 34 213 203 42 224 201 33 140 ...
##  $ CCA3                       : chr  "AFG" "ALB" "DZA" "ASM" ...
##  $ Country.Territory          : chr  "Afghanistan" "Albania" "Algeria" "American Samoa" ...
##  $ Capital                    : chr  "Kabul" "Tirana" "Algiers" "Pago Pago" ...
##  $ Continent                  : Factor w/ 6 levels "Africa","Asia",..: 2 3 1 5 3 1 4 4 6 2 ...
##  $ X2022.Population           : int  41128771 2842321 44903225 44273 79824 35588987 15857 93763 45510318 2780469 ...
##  $ X2020.Population           : int  38972230 2866849 43451666 46189 77700 33428485 15585 92664 45036032 2805608 ...
##  $ X2015.Population           : int  33753499 2882481 39543154 51368 71746 28127721 14525 89941 43257065 2878595 ...
##  $ X2010.Population           : int  28189672 2913399 35856344 54849 71519 23364185 13172 85695 41100123 2946293 ...
##  $ X2000.Population           : int  19542982 3182021 30774621 58230 66097 16394062 11047 75055 37070774 3168523 ...
##  $ X1990.Population           : int  10694796 3295066 25518074 47818 53569 11828638 8316 63328 32637657 3556539 ...
##  $ X1980.Population           : int  12486631 2941651 18739378 32886 35611 8330047 6560 64888 28024803 3135123 ...
##  $ X1970.Population           : int  10752971 2324731 13795915 27075 19860 6029700 6283 64516 23842803 2534377 ...
##  $ Area..km..                 : int  652230 28748 2381741 199 468 1246700 91 442 2780400 29743 ...
##  $ Density..per.km..          : num  63.1 98.9 18.9 222.5 170.6 ...
##  $ Growth.Rate                : num  1.026 0.996 1.016 0.983 1.01 ...
##  $ World.Population.Percentage: num  0.52 0.04 0.56 0 0 0.45 0 0 0.57 0.03 ...
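as.factor() stores Continent as a set of categories whose levels are sorted alphabetically, which is why str() lists "Africa" and "Asia" first. A small sketch using the Continent values of the first six rows printed by head():

```r
# Continent values of the first six rows printed by head(wpop)
continent <- c("Asia", "Europe", "Africa", "Oceania", "Europe", "Africa")

continent_f <- as.factor(continent)

levels(continent_f)   # categories, sorted alphabetically
nlevels(continent_f)  # number of distinct categories in this subset
table(continent_f)    # rows per category
```

This subset has 4 levels; the full dataset has 6, as the str() output shows.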

Data Wrangling

# Smallest countries: the 10 countries/territories with the smallest area
couare <- wpop %>%
  select(Country.Territory, ar = Area..km.., grow = Growth.Rate) %>%
  arrange(ar) %>%
  head(10) %>%
  mutate(label = glue("Total Area {ar} Km2\nGrowth Rate {grow}"))
# Continent with the most population: add up each country's share of the world population
conpop <- wpop %>%
  group_by(Continent) %>%
  summarise(poptotal = sum(World.Population.Percentage))
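# Six most populous Asian countries, using the average of the 2020 and 2022 populations
# (as.numeric() avoids integer overflow when two years are added for China and India)
aspop <- wpop %>%
  filter(Continent == "Asia") %>%
  mutate(total = (as.numeric(X2022.Population) + as.numeric(X2020.Population)) / 2) %>%
  select(Country.Territory, total) %>%
  arrange(desc(total)) %>%
  head(6)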
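The population columns are stored as R integers, which are 32-bit, so any total above .Machine$integer.max (2,147,483,647) overflows: sum() returns NA with a warning instead of a number. Adding the 2022 and 2020 columns as integers crosses that limit for any country above roughly 1.07 billion people, which here means China and India; since arrange() sorts NA rows last, head() would then leave them out of the Asia chart. A minimal sketch with two made-up populations of that size:

```r
# Two hypothetical yearly populations, each small enough to be an integer
pop_a <- 1400000000L
pop_b <- 1390000000L

# Integer sum overflows: sum() returns NA (its warning is suppressed here)
int_total <- suppressWarnings(sum(pop_a, pop_b))

# Converting to double first gives the correct total
dbl_total <- sum(as.numeric(pop_a), as.numeric(pop_b))

print(.Machine$integer.max)
print(int_total)
print(dbl_total)
```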

Visualization

# Smallest countries
plot1 <- ggplot(couare, aes(x = ar, y = reorder(Country.Territory, ar), text = label)) +
  geom_col(aes(fill = ar)) +
  labs(title = "Smallest Countries by Area", x = "Area (km²)", y = NULL) +
  scale_fill_gradient(low = "red", high = "green") +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold", hjust = 0.5))

ggplotly(plot1, tooltip = "text")
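For the smallest-countries table, sorting areas in descending order and keeping tail(10) selects the same rows as sorting in ascending order and keeping head(10); only the row order differs, and that does not affect this chart because reorder() sets the bar order. A base-R sketch on made-up areas:

```r
# Made-up areas in km2 for five hypothetical countries
area <- data.frame(
  country  = c("A", "B", "C", "D", "E"),
  area_km2 = c(500, 2, 90, 1, 30)
)
n <- 3

# Largest first, then keep the last n rows
via_tail <- tail(area[order(area$area_km2, decreasing = TRUE), ], n)

# Smallest first, then keep the first n rows
via_head <- head(area[order(area$area_km2), ], n)

print(via_tail$country)
print(via_head$country)
```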
# Continent with the most population
ggplot(conpop, aes(x = "", y = poptotal, fill = Continent)) +
  geom_col(width = 1, color = "white") +
  coord_polar("y", start = 0) +
  labs(title = "Share of World Population by Continent") +
  scale_fill_brewer(palette = "Paired") +
  geom_text(aes(label = paste0(round(poptotal), "%")),
            position = position_stack(vjust = 0.5), color = "white") +
  theme_void() +
  theme(plot.title = element_text(color = "black", size = 16, face = "bold", hjust = 0.5))
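World.Population.Percentage is rounded to two decimals per country, so very small territories count as 0.00 and the continent totals can drift slightly from 100. One alternative, sketched here rather than the method used above, is to compute each continent's share from the raw 2022 population counts. The sketch uses only the six rows printed by head(), so its shares describe that subset, not the world:

```r
# The six rows shown by head(wpop): continent and 2022 population
pop <- data.frame(
  continent = c("Asia", "Europe", "Africa", "Oceania", "Europe", "Africa"),
  pop_2022  = c(41128771, 2842321, 44903225, 44273, 79824, 35588987)
)

# Total population per continent, then each continent's percentage share
totals <- tapply(pop$pop_2022, pop$continent, sum)
share  <- totals / sum(totals) * 100

print(round(share, 2))
```

American Samoa appears as 0.00 in World.Population.Percentage, yet it still contributes a small nonzero share for Oceania here.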

# Most populous Asian countries
ggplot(aspop, aes(x = total, y = reorder(Country.Territory, total))) +
  geom_col(aes(fill = Country.Territory)) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Most Populous Asian Countries", x = "Population", y = NULL) +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold", hjust = 0.5))
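reorder(Country.Territory, total) turns the country names into a factor whose levels are sorted by total in ascending order (with one row per country, the default FUN = mean just returns that row's value). On a discrete y axis ggplot2 draws the first level at the bottom, so the largest bar ends up on top. A minimal sketch with made-up totals:

```r
# Made-up totals for three hypothetical countries
df <- data.frame(country = c("B", "A", "C"), total = c(20, 50, 10))

ordered_country <- reorder(df$country, df$total)

# Levels run from the smallest to the largest total
levels(ordered_country)
```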

Conclusion

From the visualizations above, we can conclude that the smallest country by area is Vatican City, and its growth rate gives no reason to worry about overcrowding. The pie chart shows that Asia is the most populous continent, and within Asia the most populous country is China. Policies that restrain the birth rate could help keep population density from rising further.