Introduction to Statistics - Midterm Project

Midterm Project - Group 5 - Le Ngo Mai Uyen - Nguyen Van Trung - Nguyen Duong Thai Anh - Nguyen Chau Ngoc


1. INTRODUCTION

1.1 Motivation

Our group focused on Tunisia, Egypt, Libya, and Sudan out of the six North African countries because these four nations show the most dynamic and contrasting transformations in how income factors relate to health outcomes. They capture diverse aspects of economic growth, inequality levels, and demographic transitions that make them ideal for analyzing how wealth distribution and income growth influence health indicators such as life expectancy and fertility rates. By comparing these four, we can uncover not only general regional patterns but also the unique national stories behind North Africa’s socio-economic evolution.

1.2 Data Description

The analysis focuses on income categories and examines how income-related inputs translate into health outcomes across the four countries over time. The features used from the dataset include:

  • Time: From 1800 to 2040. Reflects historical progress and key developmental milestones.
  • Country: Four chosen North African countries — Egypt, Tunisia, Libya, and Sudan.
  • income_level: Classifies a country into one of five income levels, from level 1 to level 5.
  • household_income: The mean income of all households in the country, usually expressed in local currency or USD. Indicates general wealth and living standards.
  • gini: A measure of income inequality in a country.
    • 0 = perfect equality (everyone has the same income)
    • 100 = perfect inequality (one person has all the income)
  • life_expectancy_at_birth: Average expected lifespan of a newborn.
  • babies_per_woman: The average number of children a woman will have in her lifetime.
  • Share of people on level 1–5: Percentage of the population whose income falls within each income level, from the poorest (Level 1) to the richest (Level 5).
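
To make the gini scale in the list above concrete, here is a minimal sketch of how such an index can be computed from a vector of incomes. The helper gini_index() and the toy income vectors are assumptions made for illustration; the dataset already provides the gini column, and the source does not state which estimator was used to produce it.

```r
# gini_index() is a hypothetical helper written for this illustration only.
gini_index <- function(income) {
  x <- sort(income)   # incomes in ascending order
  n <- length(x)
  # Rank-based formula for sorted data:
  # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
  g <- 2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
  100 * g             # rescale to the 0-100 range used in this report
}

# Invented toy incomes, not taken from the dataset
equal_incomes  <- gini_index(c(10, 10, 10, 10))  # everyone earns the same
one_has_it_all <- gini_index(c(0, 0, 0, 100))    # one person holds everything
```

With only four people, the second case does not reach 100, because this sample formula has a maximum of 100 * (n - 1) / n, which approaches 100 as the group grows.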

1.3 Key Questions for Analysis

1.3.1 How did income_level, gini, babies_per_woman and life_expectancy_at_birth evolve from the 1800s to the 2000s across Tunisia, Egypt, Libya, and Sudan?
Observation: From the 1800s to the 2000s, all four North African countries transitioned from low-income, high-fertility societies with short life expectancy to more urbanized and economically developed nations with improved health outcomes. Between 1800 and 1950, income levels remained low, inequality moderate, and life expectancy rarely exceeded 40 years. After 1950, however, rapid socio-economic change began.

1.3.2 How do income-related factors influence babies_per_woman and life_expectancy_at_birth in North African countries over time?
Observation: Income plays a central role in shaping health outcomes across North African countries. As income levels and household earnings rise, people gain better access to healthcare, education, and nutrition - leading to longer life expectancy and lower fertility rates. Overall, the data reveals that sustainable health progress depends on both economic growth and social equality.

1.3.3 Do all four countries follow the same general trend between income inputs and health outcomes?
Observation: No, they don’t. While the overall relationship between income and health outcomes is generally positive across the four countries, the specific patterns differ notably due to each nation’s unique historical and socio-economic context. All countries show that higher income levels are associated with longer life expectancy and lower fertility rates, but their paths are not identical.

2. DATA PROCESSING

2.1 Libraries

To process and visualize the datasets effectively, we used several R libraries:

  • readr: Efficiently read CSV datasets.
  • dplyr: Manipulate and transform data, including filtering, joining, grouping, and summarizing variables.
  • tidyr: Reshape and organize data for analysis and visualization.
  • ggplot2: Create detailed and flexible plots.
  • plotly: Add interactivity to visualizations for dynamic exploration.
  • patchwork: Combine multiple plots into cohesive layouts.
library(dplyr)
library(readr)
library(ggplot2)
library(tidyr)
library(plotly)
library(patchwork)

2.2 Dataset Reading and Country Selection

We began by importing the datasets income_level.csv and world_data.csv. The analysis focused on four North African countries: Egypt, Libya, Sudan, and Tunisia. Using dplyr, we filtered both datasets to retain only the observations for these countries.

# Read datasets
income_level <- read_csv("income_level.csv", show_col_types = FALSE)
world_data <- read_csv("world_data.csv", show_col_types = FALSE)

# Define countries of interest
countries <- c("Egypt", "Libya", "Sudan", "Tunisia")

# Filter datasets
income_filtered <- income_level %>%
  filter(name %in% countries)

world_filtered <- world_data %>%
  filter(country %in% countries)

2.3 Combining and Verifying the Data

Since income_level only extends to 2040 while world_data continues to 2100, we aligned the datasets for the period 1800–2040 using a left join. This preserved all rows from income_level while appending corresponding variables from world_data.

combined_data_beforeclean <- income_filtered %>%
  left_join(world_filtered, by = c("name" = "country", "year" = "year"))
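
The effect of the left join can be sketched on a toy example. The two small data frames below are invented stand-ins for income_filtered and world_filtered (their values are not from the real files); the point is only that rows from the longer right-hand table, such as years after 2040, do not appear in the result.

```r
library(dplyr)

# Toy stand-in for income_filtered (ends in 2040); values are invented
income_toy <- data.frame(
  name = c("Egypt", "Egypt", "Libya"),
  year = c(2039, 2040, 2040),
  income_level = c(3, 3, 4)
)

# Toy stand-in for world_filtered (continues past 2040); values are invented
world_toy <- data.frame(
  country = c("Egypt", "Egypt", "Egypt", "Libya"),
  year = c(2039, 2040, 2041, 2040),
  gini = c(31, 31, 31, 35)
)

# Same join specification as in the report
joined_toy <- income_toy %>%
  left_join(world_toy, by = c("name" = "country", "year" = "year"))

# A left join keeps exactly the rows of the left table when keys are unique
same_row_count <- nrow(joined_toy) == nrow(income_toy)
latest_year <- max(joined_toy$year)
```

Comparing the row count before and after the join is a quick way to confirm that no country-year pair was duplicated by the merge.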

We verified the merge with several checks. These steps ensured that all entries were correctly aligned by country and year.

# Check unique country names
unique(income_filtered$name)
unique(world_filtered$country)

# Check year ranges
range(income_filtered$year)
range(world_filtered$year)

# Check missing values
sum(is.na(combined_data_beforeclean))
colSums(is.na(combined_data_beforeclean))
combined_data_beforeclean %>% filter(if_any(everything(), is.na))

# Spot-check specific rows
combined_data_beforeclean %>% 
  filter(name == "Egypt" & year %in% c(1800, 1900, 2000))

# Check for duplicate country-year pairs
combined_data_beforeclean %>%
  group_by(name, year) %>%
  summarise(n = n(), .groups = "drop") %>%
  filter(n > 1)

2.4 Cleaning the Data

Finally, we removed unnecessary columns automatically generated during import (e.g., those starting with ...1) and verified the structure and completeness of the cleaned dataset.

combined_data <- combined_data_beforeclean %>%
  select(-starts_with("...1"))

# Inspect structure and missing values
str(combined_data)
sum(is.na(combined_data))
colSums(is.na(combined_data))
combined_data %>% filter(if_any(everything(), is.na))

# Preview cleaned dataset
head(combined_data)

The resulting combined_data contains aligned observations for all variables of interest for the selected countries, covering the years 1800–2040. This verified and cleaned dataset serves as the basis for subsequent statistical analysis and visualization.
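
One further check that could complement the steps above is to confirm that every country has a row for every year. The sketch below is an assumption about how this might be done, not part of the original workflow: it builds the full grid of expected country-year pairs and uses anti_join to list any that are missing. A three-year toy range and a deliberately deleted row stand in for the real data.

```r
library(dplyr)

countries <- c("Egypt", "Libya", "Sudan", "Tunisia")

# Full grid of expected country-year pairs (toy range; the report uses 1800:2040)
expected <- expand.grid(
  name = countries,
  year = 1800:1802,
  stringsAsFactors = FALSE
)

# Toy stand-in for combined_data with one row removed to simulate a gap
toy_combined <- expected[-5, ]

# Pairs that are expected but absent from the data
missing_pairs <- anti_join(expected, toy_combined, by = c("name", "year"))
```

An empty missing_pairs table on the real data would indicate complete coverage of the period for all four countries.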


4. THE RELATIONSHIP BETWEEN INCOME AND HEALTH OUTCOMES

After carefully examining the dataset, our group chose to focus on the relationship between income indicators and health outcomes. Specifically, we aim to explore how different measures of income - such as Income Level, Household Income, and the Gini Index - affect key health indicators like Life Expectancy and Babies per Woman. Our goal is to understand how economic conditions shape population health and fertility trends across countries.

To uncover broader patterns that transcend national and temporal boundaries, we used boxplots to examine the relationship between income and health variables across all countries. By removing time and country identifiers, we transformed the dataset into a unified pool of observations. This allowed us to focus purely on the interaction between income indicators - such as income level, household income, and the Gini index - and health outcomes like life expectancy and fertility rate. Boxplots provided a clear visual summary of these relationships, highlighting central tendencies, variability, and outliers across income groups. This approach enabled us to detect consistent trends and anomalies, offering insights into how economic conditions may influence population health regardless of specific historical or geographic contexts.
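
The pooling and binning described above can be sketched as follows. The break points match the household income ranges quoted in Section 4.2 (0-2K up to 20K-35K), but the exact binning code used for the figures is not shown in the report, so the use of cut(), the label text, and the toy values are all assumptions.

```r
library(ggplot2)

# Invented toy values standing in for combined_data; not taken from the dataset
toy <- data.frame(
  household_income = c(900, 1500, 3200, 4800, 7500, 12000, 26000),
  babies_per_woman = c(6.8, 6.5, 5.9, 4.2, 3.1, 2.4, 2.2)
)

# Income ranges as quoted in Section 4.2
income_breaks <- c(0, 2000, 5000, 10000, 20000, 35000)
income_labels <- c("0-2K", "2K-5K", "5K-10K", "10K-20K", "20K-35K")

# Assign each pooled observation to an income range (upper bounds inclusive)
toy$income_range <- cut(
  toy$household_income,
  breaks = income_breaks,
  labels = income_labels,
  include.lowest = TRUE
)

# One box per income range, ignoring country and year
p <- ggplot(toy, aes(x = income_range, y = babies_per_woman)) +
  geom_boxplot() +
  labs(x = "Household income range", y = "Babies per woman")
```

Because country and year are not mapped to any aesthetic, each box summarizes all observations that fall in that income range, which is what makes the comparison independent of time and place.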

Additionally, we investigated the direct relationship between Babies per Woman and Life Expectancy, as these two indicators often reflect a country’s stage of economic and social development. This holistic approach helps us reveal cross-cutting trends and anomalies that might otherwise be hidden in country-specific analyses.

4.1 Shared Income Level contributions

4.1.1 Income Level and Life Expectancy: Positive Relationship


The diagram illustrates a clear positive relationship between income level and life expectancy at birth across countries. As income level rises, life expectancy increases substantially. Countries within income level 1 record the lowest life expectancy, averaging around 35 years, while those in level 3 reach approximately 75 years. At income level 4, the life expectancy stabilizes at a consistently high value, showing almost no variation among nations in this category. The wide dispersion observed in level 2 reflects uneven progress in healthcare and living standards among lower-middle-income countries. Overall, the figure demonstrates that economic development is strongly associated with improved longevity and health outcomes.

4.1.2 Income Level and Babies Per Woman: Negative Relationship


The box plot shows a strong negative relationship between income level and fertility rate across countries. As income increases, the number of babies per woman decreases markedly. Countries in income level 1 have the highest fertility rates, averaging around seven children per woman. In contrast, fertility rates in income level 3 drop sharply to approximately three children per woman. Although income levels 3 and 4 differ in their minimum and maximum values, both display similar average fertility rates, indicating that fertility stabilizes once nations reach higher income groups; at income level 4, almost no variation is observed. In addition to the overall decreasing trend, countries within income levels 2 and 3 show a wide range of fertility rates, reflecting differences in social, cultural, and economic development among middle-income nations.

4.2 Average household income contributions

4.2.1 Household Income and Babies Per Woman: Negative Relationship


The box plot reveals a clear negative relationship between household income and fertility rate across countries. As household income rises, the average number of babies per woman declines. In the lowest income range (0-2,000), fertility is high, with a median of around six to seven children per woman and several outliers below this level. In the next ranges (2,000-5,000 and 5,000-10,000), the median fertility rate drops sharply and the spread widens, indicating greater variation among middle-income households. In the 10,000-20,000 range, fertility continues to decline, with most values at around three children or fewer. However, in the highest income range (20,000-35,000), fertility unexpectedly reaches its highest value of around eight children. This points to an outlier country, possibly the result of data imbalance or of unique cultural and social factors such as education and social development. Overall, the negative relationship between household income and fertility still holds.

4.2.2 Household Income and Life Expectancy: Positive Relationship


The box plot displays a generally positive relationship between household income and life expectancy at birth. As household income increases, average life expectancy tends to rise. Countries in the lowest income range (0-2K) have the shortest life expectancies and the widest dispersion, suggesting that poverty goes hand in hand with unstable access to healthcare and poor living conditions. From 2K to 10K, life expectancy increases sharply, peaking at around 80 years in the 5K-10K range. Beyond this point, the pattern slightly reverses: in the higher income ranges (10K-20K and 20K-35K), life expectancy no longer rises and even shows a small decline, implying that beyond a certain income threshold, further increases in household income may not significantly extend average lifespan. Overall, the pattern reflects how improvements in health and living conditions depend on household finances.

4.3 Gini Ranges contributions

4.3.1 Gini and Life Expectancy: Fluctuating Relationship


The boxplot presents a fluctuating relationship between life expectancy at birth and Gini coefficient range across countries. The figure shows no clear correlation between income inequality and life expectancy: life expectancy varies irregularly across the Gini ranges, suggesting that inequality alone does not strongly determine a country’s average lifespan. Although countries with moderate inequality (Gini 30-40) tend to show higher and more dispersed life expectancies, the overall trend remains inconsistent. This indicates that other factors, such as healthcare systems, education, and government policies, may play a more decisive role in shaping life expectancy than inequality levels alone.
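
A visual impression of "no clear correlation" can be complemented with a rank correlation. The sketch below is a supplementary check suggested here, not a method used in the report, and the toy values are invented to mimic an irregular pattern. Spearman's coefficient is chosen because it does not assume a linear relationship.

```r
# Invented toy values: life expectancy moves irregularly as gini rises
toy <- data.frame(
  gini = c(25, 35, 45, 55, 65, 75),
  life_expectancy_at_birth = c(60, 72, 55, 70, 58, 66)
)

# Spearman rank correlation; values near 0 suggest no monotonic trend
rho <- cor(
  toy$gini,
  toy$life_expectancy_at_birth,
  method = "spearman"
)
```

On the real data, the same call applied to the gini and life_expectancy_at_birth columns would give a single number to set beside the boxplot; a value close to zero would be consistent with the interpretation above.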

4.3.2 Gini and Babies Per Woman: Independent Relationship


The boxplot illustrates the relationship between the Gini coefficient and fertility rate across countries. Overall, there is no clear negative or positive trend between income inequality and the number of babies per woman, as fertility remains high throughout most Gini ranges. Countries within the 30-40 Gini range exhibit the widest variation, with fertility rates spanning from approximately two to eight children per woman, and those in the 40-50 range display a set of outliers. Both features reflect the diverse economic and social conditions among nations with moderate inequality. In contrast, countries with higher inequality, particularly those within the 50-70 Gini range, display consistently high fertility rates with limited variation. Meanwhile, the lowest (20-30) and highest (70-80) Gini ranges differ little from each other, likely due to similar cultural and social structures across North Africa.

4.4 Fertility rate and Life Expectancy

Babies Per Woman and Life Expectancy: Negative Relationship


The box plot presents a negative relationship between life expectancy at birth and fertility rate across countries. As life expectancy increases, the number of babies per woman declines notably. Countries within the two lowest life expectancy ranges (6-21 years and 21-37 years) display a stable fertility pattern, with women having around six to seven children on average and very little variation between nations. This suggests that fertility remains consistently high where lifespans are short. In contrast, as life expectancy rises beyond 37 years, fertility rates begin to drop sharply. Nations within the highest range (68-83 years) record the lowest fertility levels, averaging about two children per woman. The declining and then stabilizing trend at higher life expectancy levels indicates that as countries develop and citizens live longer, family sizes tend to shrink.
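
The life expectancy ranges quoted above (6-21, 21-37, and 68-83 years) are consistent with five equal-width bins between 6 and 83 years. That reading is an inference from the quoted ranges rather than something the report states, and the toy values below are invented. Under that assumption, the bins and the median fertility per bin can be sketched as:

```r
# Five equal-width bins between 6 and 83 years (assumed from the quoted ranges)
life_breaks <- seq(6, 83, length.out = 6)
life_labels <- paste0(
  head(round(life_breaks), -1), "-", tail(round(life_breaks), -1)
)

# Invented toy values, not taken from the dataset
toy <- data.frame(
  life_expectancy_at_birth = c(20, 30, 45, 60, 75, 80),
  babies_per_woman = c(6.5, 6.8, 5.5, 4.0, 2.2, 2.0)
)

# Assign each observation to a life expectancy range
toy$life_range <- cut(
  toy$life_expectancy_at_birth,
  breaks = life_breaks,
  labels = life_labels,
  include.lowest = TRUE
)

# Median fertility within each range
median_babies <- tapply(toy$babies_per_woman, toy$life_range, median)
```

Printing median_babies gives one median per range, which is the quantity the centre line of each box represents.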


5. UNIQUE COUNTRY PATTERNS

5.1 Income Level and Babies Per Woman


5.1.1 Libya’s unique patterns

The box plot of Libya presents a distinctive pattern compared to the general global trend between income level and fertility rate. While most countries experience a gradual decline in fertility as income rises, Libya shows a sudden drop. At income levels 1 and 2, Libya’s fertility rate remains consistently high, averaging around seven babies per woman, whereas in other nations the rate has already fallen to about six children at income level 2. Once Libya reaches income level 3, however, the fertility rate plummets and displays wide variation.
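
A comparison of this kind, one country against the others at each income level, can be sketched by computing medians per country and income level. The data frame below is an invented two-country toy, so the numbers only imitate the pattern described above; the column names follow those used in the report.

```r
library(dplyr)

# Invented toy values imitating a late, sharp drop for Libya
toy <- data.frame(
  name = rep(c("Libya", "Egypt"), each = 4),
  income_level = rep(c(1, 2, 3, 3), times = 2),
  babies_per_woman = c(7.0, 7.1, 5.0, 2.5, 6.9, 6.0, 4.0, 3.0)
)

# Median fertility for each country at each income level
medians_by_country <- toy %>%
  group_by(name, income_level) %>%
  summarise(median_babies = median(babies_per_woman), .groups = "drop")

# Libya's median at income level 3 in the toy data
libya_level3 <- medians_by_country %>%
  filter(name == "Libya", income_level == 3) %>%
  pull(median_babies)
```

Placing these per-country medians next to the pooled medians makes it easier to state at which income level a country departs from the overall trend.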

This unusual trend can be explained through Libya’s historical and social context. After the discovery of oil in the 1950s, Libya rapidly shifted from a low-income nation to a wealthy oil-dependent economy. Yet, despite this economic growth, traditional family structures and limited female employment kept fertility rates high for decades. During Gaddafi’s rule, expanded healthcare and education gradually lowered birth rates, but regional inequality and conservative norms, such as religious beliefs and traditional rules, maintained strong variations. The sharp fluctuation at level 3 may also reflect the instability following the 2011 civil conflict, which disrupted economic conditions and household decisions.

5.1.2 Tunisia’s unique patterns

The box plot of Tunisia exhibits a distinct pattern compared to the general global trend between income level and fertility rate. While most nations experience a smooth and gradual decline as income rises, Tunisia shows a sharper drop in fertility, particularly between levels 2 and 3. At income level 2, the fertility rate remains moderately high, around six babies per woman, which is higher than the global median for this income group. However, once Tunisia transitions into income level 3, the fertility rate falls dramatically to just above two babies per woman, one of the steepest declines among the four North African countries.

This pattern reflects Tunisia’s unique historical trajectory. Following independence in 1956, Tunisia became one of the first Arab nations to implement progressive family planning policies and expand female education. The 1966 National Family Planning Program drastically lowered birth rates by empowering women’s reproductive choices. Additionally, Tunisia’s early investment in girls’ education and urbanization accelerated the demographic transition, with more women joining the workforce and delaying marriage. The sharp contrast between levels 2 and 3 therefore mirrors the country’s rapid modernization and progress toward gender equality.

5.2 Household Income and Life Expectancy

Libya’s unique patterns


In the highest household income range (20K-35K), Libya’s life expectancy shows a surprising decline compared to the previous income bracket, in contrast to the global pattern in which health outcomes typically continue improving with wealth. While most countries in this range maintain stable life expectancies of around 75-80 years, Libya’s median drops to roughly 68-70 years. This drop shows that higher income in Libya does not necessarily translate into better living conditions or longevity.

Historically, this paradox reflects the country’s resource-dependent economy and political instability. During Gaddafi’s era, oil revenues created a growing wealthy elite whose income was not matched by improvements in healthcare infrastructure or social services. After the 2011 civil war, the collapse of public health systems, shortages of medical supplies, and mass displacement further reduced life expectancy, even among high-income groups. Consequently, high income no longer guarantees access to reliable healthcare or security. The broad range and presence of outliers in this income group thus mirror a nation where economic privilege coexists with weak systemic governance, underscoring the importance of social and economic equality.

5.3 Household Income and Babies Per Woman

Libya’s unique patterns


The box plot of Libya reveals a striking deviation from the global trend between household income and fertility rate. While the overall global pattern shows a consistent decline in fertility as income rises, Libya’s distribution is far less consistent. At lower income ranges (0-5,000 USD), fertility remains exceptionally high and stable, averaging around seven babies per woman, higher than the global median for similar income groups. Interestingly, in the highest income bracket (20,000-35,000 USD), fertility unexpectedly rebounds to about eight babies per woman, a rebound not seen in the other three countries.

This rare phenomenon can be traced back to Libya’s complicated economic and political history. The discovery of oil in the 1950s transformed Libya from a poor desert nation into one of Africa’s richest economies. However, wealth distribution remained uneven and largely dependent on state control, resulting in a large gap in living standards between urban and rural populations. During Gaddafi’s rule, expanded healthcare and education gradually lowered birth rates, but regional inequality and conservative norms maintained strong variations. The unexpected fertility rebound among the highest-income group may be explained by government policies and subsidies: Libya’s oil wealth financed free healthcare, education, housing, and child benefits, removing much of the financial pressure associated with raising children. In other words, financial security removes the economic constraints that limit family size in other societies, allowing wealthy households to maintain high fertility while still enjoying comfortable living conditions.

5.4 Gini And Babies Per Woman

Tunisia’s unique patterns


The box plot of Tunisia displays noticeably wider variation in fertility within the 40-50 Gini range than the global pattern. While most countries show a narrower fertility range at this level of income inequality, around five to seven babies per woman, Tunisia’s data reveal extreme dispersion, stretching from as low as two to above six babies per woman, along with multiple outliers.

Historically, this divergence can be traced back to the country’s uneven modernization after independence. Tunisia’s early adoption of family planning in the 1960s and its rapid expansion of female education sharply lowered fertility in urban and coastal areas. However, the inland and rural regions were left behind and remained culturally conservative. As a result, families in wealthier urban centers followed small-family norms, while those in poorer rural communities continued traditional high-birth patterns. The abundance of outliers and the wide range in this Gini bracket thus reflect Tunisia’s dual social reality: a modern, low-fertility elite coexisting with a less developed population that still follows large-family norms, both shaped by decades of uneven development and regional inequality.

6. CONCLUSION

This study of Tunisia, Egypt, Libya, and Sudan reveals how income development strongly influences health outcomes across North Africa. Over time, all four countries have shifted from low-income, high-fertility, and low-life-expectancy societies to more developed economies with longer lifespans and smaller families, especially after the 1950s, when independence, education, and healthcare reforms accelerated change.

Overall, higher income levels and household earnings are closely linked to improved life expectancy and lower fertility rates. However, the relationship is not uniform. Libya’s oil-driven boom and Tunisia’s persistent inequality show that economic growth alone does not ensure better health for all. The Gini index proves crucial: where inequality remains high, health outcomes improve more slowly.

In short, rising income is a key driver of health progress, but its impact depends on how evenly wealth is shared and how effectively governments invest in human development. North Africa’s experience demonstrates that true progress requires not just growth, but inclusive and equitable growth.

REFERENCES

Nasser’s socialist reforms and industrialization policies:
Rutherford, B. K. (2024). Understanding change in Egypt’s social contract since 2011. Mediterranean Politics, 1-21. https://doi.org/10.1080/13629395.2024.2379734

Tunisia under Habib Bourguiba:
Admin, M. (2022, January 17). The first population policies implemented in Africa: the case of Tunisia. MAHB. https://mahb.stanford.edu/blog/first-population-policies-implemented-africa-case-tunisia/

Libya’s oil discovery:
Cordell, D. D., Fowler, L. G., Buru, M., Barbour, N., & Brown, L. C. (2025, October 11). Libya | History, people, Map, & Government. Encyclopedia Britannica. https://www.britannica.com/place/Libya/The-discovery-of-oil

Sudan’s agriculture growth and financial aids:
Elhaj Mustafa, M., Mahagoub Elshakh, M., & Ebaidalla, E. M. (2018). Does Foreign Aid Promote Economic Growth in Sudan? Evidence from ARDL Bounds Testing Analysis. In ERF, ERF’ 24th Annual Conference [Conference-proceeding]. https://erf.org.eg/app/uploads/2018/02/4-23-Mahjoub-Ebaidallah.pdf

Libya’s demographic history during 1950s–2000s:
Ibrahim, M. (2022). (The demographic transition in Libya, the factors influencing it and its effects). ResearchGate. https://www.researchgate.net/publication/367412771_The_demographic_transition_in_Libya_the_factors_influencing_it_and_its_effects_alantqal_aldymwghrafy_fy_lybya_alwaml_almwthrt_fyh_watharh

Tunisia’s National Family Planning:
Admin, M. (2022, January 17). The first population policies implemented in Africa: the case of Tunisia. MAHB. https://mahb.stanford.edu/blog/first-population-policies-implemented-africa-case-tunisia/