Assignment 7

Introduction

For this assignment, I wanted to see whether there are indicators or causes behind a country's population growth or decline. To explore this, I retrieved a dataset from the Worldometer website, which reports population statistics for countries all over the world. I scraped the data from the website, saved it as a static CSV file, and load it into this document below. As you will see, several columns do not load in the form we need, so some data wrangling is necessary.

I will analyze this dataset by creating visualizations that show the relationships between the quantitative measurements provided, such as fertility rate, median age, country size, and urban population share. From these visualizations, I hope to identify the most prominent factors in a country's population growth.

library(skimr)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)

Loading the Data

worldometer <- 
  read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/olsonp2_xavier_edu/IQAq0nojylhgTKForNOX9qzLAZU4lA1KciJh22WsrGul1X0?download=1")
Rows: 233 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Country..or.dependency., Yearly.Change, Net.Change, Migrants..net.,...
dbl (3): X., Fert..Rate, Median.Age
num (3): Population.2025, Density..P.Km.., Land.Area..Km..

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Viewing the Data

view(worldometer)

Data Wrangling

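Many of these values were read in as text strings (chr columns) because they contain percent signs, thousands separators (commas), or the Unicode minus sign (−) instead of a plain hyphen. This prevents any numeric analysis. The commands below remove those characters and convert each affected column to numeric.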

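# Yearly.Change: swap the Unicode minus sign (−) for an ASCII hyphen,
# drop the percent sign, then convert to numeric
worldometer <- worldometer %>%
  mutate(
    Yearly.Change = gsub("−", "-", Yearly.Change),
    Yearly.Change = gsub("%", "", Yearly.Change),
    Yearly.Change = as.numeric(Yearly.Change)
  )

# Urban.Pop.. and World.Share: each column is converted from its OWN values,
# not from Yearly.Change; any non-numeric entries become NA
worldometer <- worldometer %>%
  mutate(
    Urban.Pop.. = gsub("−", "-", Urban.Pop..),
    Urban.Pop.. = gsub("%", "", Urban.Pop..),
    Urban.Pop.. = as.numeric(Urban.Pop..)
  )

worldometer <- worldometer %>%
  mutate(
    World.Share = gsub("−", "-", World.Share),
    World.Share = gsub("%", "", World.Share),
    World.Share = as.numeric(World.Share)
  )

# Population.2025 was already parsed as a number by read_csv() (type "num");
# removing commas first is a harmless safeguard
worldometer$Population.2025 <- str_remove_all(worldometer$Population.2025, ",")
worldometer$Population.2025 <- as.numeric(worldometer$Population.2025)

# Net.Change and Migrants..net. are head counts that may contain thousands
# separators, so commas are removed as well before converting
worldometer <- worldometer %>%
  mutate(
    Net.Change = gsub("−", "-", Net.Change),
    Net.Change = gsub(",", "", Net.Change),
    Net.Change = as.numeric(Net.Change)
  )

worldometer <- worldometer %>%
  mutate(
    Migrants..net. = gsub("−", "-", Migrants..net.),
    Migrants..net. = gsub(",", "", Migrants..net.),
    Migrants..net. = as.numeric(Migrants..net.)
  )

# Density and land area were also parsed as "num" by read_csv()
worldometer$Density..P.Km.. <- as.numeric(worldometer$Density..P.Km..)

worldometer$Land.Area..Km.. <- str_remove_all(worldometer$Land.Area..Km.., ",")
worldometer$Land.Area..Km.. <- as.numeric(worldometer$Land.Area..Km..)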
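The same cleanup steps repeat for every text column. One option is to wrap them in a small helper and apply it to all affected columns at once with dplyr's across(). This is a minimal sketch. `clean_number()` is a hypothetical helper, not part of any package. The sample strings are made up to mimic the scraped format; they are not values from the dataset.

```r
library(dplyr)

# Hypothetical helper: turns scraped text such as "−0.23 %" or "1,234,567"
# into a number. Unicode minus (U+2212) becomes an ASCII hyphen, "%" and
# commas are dropped, spaces are trimmed, and anything left that is not a
# number (e.g. "N.A.") becomes NA instead of raising a warning.
clean_number <- function(x) {
  x <- gsub("\u2212", "-", x, fixed = TRUE)
  x <- gsub("[%,]", "", x)
  suppressWarnings(as.numeric(trimws(x)))
}

# Made-up rows in the scraped text format (not real Worldometer values)
sample_rows <- tibble(
  Yearly.Change  = c("0.89 %", "\u22120.23 %"),
  Net.Change     = c("1,234,567", "\u221245,000"),
  Migrants..net. = c("\u2212268,767", "12,000"),
  Urban.Pop..    = c("57 %", "N.A."),
  World.Share    = c("17.78 %", "0.01 %")
)

# Convert every listed column from its own values in a single step
cleaned <- sample_rows %>%
  mutate(across(c(Yearly.Change, Net.Change, Migrants..net.,
                  Urban.Pop.., World.Share), clean_number))
```

On the real data, the same `mutate(across(...))` call would be applied to `worldometer` instead of `sample_rows`. Because each column is passed to the helper on its own, it cannot accidentally be filled from a different column.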

Additional Column - Growth Category

I wanted to add one column that labels whether each country experienced positive or negative population growth. This two-level category is used below to split and color the visualizations, so that countries with positive and negative growth can be compared separately.

worldometer <- worldometer %>%
  mutate(Growth_Category = if_else(Yearly.Change > 0,
                                   "Positive",
                                   "Negative"))

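The condition is `Yearly.Change > 0`. As a result, a country with exactly zero growth is labeled "Negative", and a country whose growth value is missing (NA) keeps NA rather than getting a label. The short sketch below shows this behavior using made-up growth values. If a separate "No change" group were wanted, dplyr's `case_when()` is one alternative; it is shown only for comparison and is not what the analysis uses.

```r
library(dplyr)

# Made-up Yearly.Change values (percent), including the edge cases 0 and NA
growth <- c(1.2, -0.4, 0, NA)

# Same rule as Growth_Category above: zero falls into "Negative", NA stays NA
label_if_else <- if_else(growth > 0, "Positive", "Negative")

# Alternative rule with a separate group for exactly zero growth
label_case_when <- case_when(
  growth > 0 ~ "Positive",
  growth < 0 ~ "Negative",
  growth == 0 ~ "No change"
)
```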
Fertility Rate to Population Growth (Scatterplot)

worldometer %>%
  ggplot(aes(x = Fert..Rate, y = Yearly.Change)) +
  geom_point() +
  labs(title = "Fertility Rate to Population Growth", x = "Fertility Rate (births per woman)", y = "Population Growth (%)")

This is a relationship we could likely have predicted; however, it is still interesting to see. Although the points are somewhat scattered, there is a clear positive relationship between fertility rate and population growth.
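A "clear positive relationship" can be backed by a number. One option is the Pearson correlation coefficient from `cor()`, computed only over countries where both values are present. A fitted least-squares line can also be layered onto the scatterplot with `geom_smooth(method = "lm")`. This sketch uses made-up values. On the real data, the correlation call would be `cor(worldometer$Fert..Rate, worldometer$Yearly.Change, use = "complete.obs")`.

```r
library(ggplot2)

# Made-up fertility rates (births per woman) and growth rates (percent);
# one growth value is missing, as can happen after cleaning
fert   <- c(1.2, 1.6, 2.1, 3.0, 4.5, 5.8)
growth <- c(-0.5, 0.1, 0.6, 1.4, 2.5, NA)

# Pearson correlation using only rows where both values are present
r <- cor(fert, growth, use = "complete.obs")

# Scatterplot with a fitted least-squares line on top
p <- ggplot(data.frame(fert, growth), aes(x = fert, y = growth)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE) +
  labs(x = "Fertility Rate (births per woman)", y = "Population Growth (%)")
```

A value of `r` close to 1 indicates a strong positive linear association. A value near 0 indicates little linear association.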

Median Age to Population Growth by Growth Category (Scatterplot)

worldometer %>%
  ggplot(aes(x = Median.Age, y = Yearly.Change)) +
  geom_point() +
  facet_wrap(~ Growth_Category) +
  labs(title = "Median Age to Population Growth", x = "Median Age", y = "Population Growth (%)")

The results of this visualization are more nuanced. For countries with negative population growth, there is no recognizable relationship with median age. For countries with positive growth, however, a higher median age tends to go with lower growth. Part of the difference may come from how the panels are built. Growth_Category is defined by the sign of Yearly.Change itself, so each panel shows only part of the growth range, and this can weaken a trend that is visible across all countries together. Overall, I believe there is still an inverse relationship between a country's median age and its population growth.
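The impression from the two panels can be checked with numbers. The correlation between median age and growth can be computed within each growth category using `group_by()` and `summarise()`, and then compared with the correlation across all countries. This is a sketch on made-up values, not the real dataset; with the real data, `worldometer` would take the place of `sample_data`.

```r
library(dplyr)

# Made-up rows: median age (years) and growth (percent), labeled with the
# same rule used for Growth_Category
sample_data <- tibble(
  Median.Age    = c(18, 22, 28, 35, 40, 44, 47, 50),
  Yearly.Change = c(2.8, 2.1, 1.3, 0.5, 0.2, -0.1, -0.6, -0.3)
) %>%
  mutate(Growth_Category = if_else(Yearly.Change > 0, "Positive", "Negative"))

# Correlation within each growth category, plus the number of countries
by_group <- sample_data %>%
  group_by(Growth_Category) %>%
  summarise(r = cor(Median.Age, Yearly.Change, use = "complete.obs"),
            n = n())

# Correlation across all countries, ignoring the split
overall_r <- cor(sample_data$Median.Age, sample_data$Yearly.Change,
                 use = "complete.obs")
```

Comparing `overall_r` with the per-group values shows how much of the trend is lost when the data are split by the sign of the outcome.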

Migrants to Population Growth (Scatterplot)

worldometer %>%
  ggplot(aes(x = Migrants..net., y = Yearly.Change)) +
  geom_point() +
  labs(title = "Migrants to Population Growth", x = "Net Migrants (people)", y = "Population Growth (%)")

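In this plot, the points fell on a perfectly straight line. For two different real-world variables, that is a warning sign rather than a finding. A perfect line is exactly what appears when Migrants..net. contains a copy of Yearly.Change, which is what the wrangling step produces if the column is converted from Yearly.Change instead of from its own values. Net migration is also a head count, while Yearly.Change is a percentage, so even a genuine relationship would not form a clean line. This relationship should be re-checked with the correctly converted column before it is used in the conclusion.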
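A perfectly straight line can come from duplicated data rather than a real relationship. A quick way to tell is to test whether the two columns hold identical values. `is_duplicate_column()` below is a hypothetical helper, and the vectors are made up. On the real data, the check would compare `worldometer$Migrants..net.` with `worldometer$Yearly.Change`.

```r
# Hypothetical helper: TRUE when two columns hold exactly the same values
# (NAs included), in which case plotting one against the other just draws y = x
is_duplicate_column <- function(a, b) identical(as.numeric(a), as.numeric(b))

# Made-up columns: one copied by mistake, one converted from its own text
yearly_change      <- c(0.89, -0.23, 1.50, NA)
migrants_copied    <- yearly_change
migrants_converted <- c(-268767, 12000, 3500, NA)

copied_flag    <- is_duplicate_column(yearly_change, migrants_copied)
converted_flag <- is_duplicate_column(yearly_change, migrants_converted)
```

The same check applies to Urban.Pop.. or any other column that produces a suspiciously perfect line.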

Country Size by Growth Category (Scatterplot)

worldometer %>%
  ggplot(aes(x = Land.Area..Km.., y = Yearly.Change, color = Growth_Category)) +
  geom_point() +
  facet_wrap(~ Growth_Category) +
  scale_x_log10() +
  labs(title = "Country Size to Population Growth",
       x = "Country Size (km², log scale)",
       y = "Population Growth (%)",
       color = "Growth Category")
Warning in scale_x_log10(): log-10 transformation introduced infinite values.

There is no recognizable relationship between the size of a country and its population growth, even when countries are separated by growth category. This is somewhat surprising, since I expected at least a weak positive relationship. One likely explanation is that many large countries contain large areas of uninhabitable land. Therefore, country size will not be included in our conclusion as an indicator of population growth.
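The warning printed above comes from the log scale. Because log10(0) is -Inf, any country listed with a land area of 0 km² cannot be placed on the axis. A sketch of making this explicit is to count and filter those rows before plotting. The land areas below are made up; on the real data, the filter would be applied to `worldometer` before the `ggplot()` call.

```r
library(dplyr)

# Made-up land areas in km²; the 0 mimics a country listed with no land area
sample_sizes <- tibble(
  Land.Area..Km.. = c(0, 2, 160, 41285, 9147420),
  Yearly.Change   = c(0.2, -0.5, 0.9, 0.6, 0.4)
)

# Number of rows a log10 axis cannot show (log10 of 0 is -Inf)
n_zero_area <- sum(sample_sizes$Land.Area..Km.. <= 0, na.rm = TRUE)

# Keep only rows that can be placed on a log10 axis
plottable <- sample_sizes %>% filter(Land.Area..Km.. > 0)
```

Filtering first removes the warning and makes it visible how many countries are left out of the plot.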

Country Density by Growth Category (Boxplot)

worldometer %>%
  ggplot(aes(x = Growth_Category, y = Density..P.Km..)) +
  geom_boxplot() +
  labs(title = "Country Density by Growth Category", x = "Growth Category", y = "Country Density (people per km²)")

I found these results rather surprising. I initially assumed that density would be noticeably higher for countries that experienced population growth than for those that did not. However, apart from some outliers, density looks about the same across both categories. Therefore, we will not include this variable in our final conclusion.
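A few very dense territories can stretch the y-axis, squeezing the boxes near the bottom and making the groups hard to compare by eye. Two simple aids are to compare group medians, which extreme values barely move, and to draw the same boxplot on a log10 y-axis. This sketch uses made-up densities; with the real data, `worldometer` would replace `sample_density`.

```r
library(dplyr)
library(ggplot2)

# Made-up densities (people per km²); the last value in each group is an outlier
sample_density <- tibble(
  Growth_Category = rep(c("Negative", "Positive"), each = 5),
  Density..P.Km.. = c(30, 70, 110, 160, 20000, 25, 60, 95, 150, 8000)
)

# Medians resist the outliers; means are pulled far upward by them
density_summary <- sample_density %>%
  group_by(Growth_Category) %>%
  summarise(median_density = median(Density..P.Km..),
            mean_density   = mean(Density..P.Km..))

# Same boxplot as above, with a log10 y-axis so the boxes are not squashed
p <- ggplot(sample_density, aes(x = Growth_Category, y = Density..P.Km..)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(x = "Growth Category", y = "Country Density (people per km², log scale)")
```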

Urban Population Share to Population Growth (Scatterplot)

worldometer %>%
  ggplot(aes(x = Urban.Pop.., y = Yearly.Change)) +
  geom_point() +
  labs(title = "Urban Population Share to Population Growth", x = "Urban Population (%)", y = "Population Growth (%)")

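This plot also showed a perfectly straight line, similar to the migrants visualization above, and for the same reason it should not be taken at face value. A perfect 1:1 line is what appears when Urban.Pop.. contains a copy of Yearly.Change, which the wrangling step produces if the column is converted from Yearly.Change instead of from its own values. In addition, Urban.Pop.. is the share of a country's population living in urban areas, a level rather than a change over time, so there is no reason for it to track the growth rate one-for-one. This variable should be re-checked with the correctly converted column before it is included in our conclusion.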

Final Conclusion

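After analyzing these variables and their relationship with yearly population growth, we have been able to narrow down the list of variables worth considering. The net migrants and urban population plots initially looked like the strongest results, because their points fell on perfectly straight lines. However, a perfect line between two different variables is what appears when a column holds a copy of Yearly.Change. The wrangling code produces exactly that if Migrants..net. and Urban.Pop.. are converted from Yearly.Change instead of from their own values. These two variables therefore need to be re-plotted from correctly converted columns before either can be treated as an indicator of growth.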

Apart from the two variables above, two others appear to be associated with population growth. First, countries with higher fertility rates tended to have higher population growth. This relationship is not a perfect line, but the positive trend is clear. Second, among countries that experienced population growth, a higher median age went with lower growth, while among countries with negative growth there was no detectable relationship with median age. Despite this inconsistency between growth categories, I would still consider a country's median age to be related to the growth of its population. Because these are observational comparisons, they show association rather than proof that fertility or age causes the change in growth.
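The two remaining variables could also be examined together rather than one plot at a time. One option, not used in the analysis above, is a linear regression of growth on fertility rate and median age with `lm()`. This estimates each variable's association with growth while holding the other fixed. Fertility rate and median age tend to move together, since countries with high fertility have younger populations, so their individual coefficients should be read with care. This is a sketch on made-up values; the real call would use `data = worldometer`.

```r
# Made-up country-level values (not from the dataset)
sample_data <- data.frame(
  Fert..Rate    = c(1.3, 1.5, 1.8, 2.2, 2.9, 3.6, 4.4, 5.1, 6.0),
  Median.Age    = c(48, 45, 41, 36, 29, 24, 20, 18, 16),
  Yearly.Change = c(-0.6, -0.2, 0.1, 0.5, 1.2, 1.9, 2.4, 2.7, 3.2)
)

# Growth modeled on both variables at once; rows with NA are dropped by default
fit <- lm(Yearly.Change ~ Fert..Rate + Median.Age, data = sample_data)

summary(fit)$coefficients  # estimate, std. error, t value, p value per term
summary(fit)$r.squared     # share of the variation in growth explained
```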