Project 2


Economic Development and Life Expectancy Across Countries

Question:

Do richer countries live longer?

Introduction

This project uses 2020 data from the World Bank’s World Development Indicators to examine whether richer countries tend to have higher life expectancy than poorer ones. The dataset includes categorical variables (country name and income group) and quantitative variables:

- GDP per capita, in current US dollars
- life expectancy at birth, in years
- total population
- gross secondary school enrollment, as a percentage
- the unemployment rate, as a percentage of the labor force

The goal of my analysis is to understand how economic and social conditions are related to life expectancy across countries.
Source: World Bank, World Development Indicators

# Packages for data import, data wrangling, static and interactive plots, and number formatting
library(readr)
library(dplyr)
library(ggplot2)
library(plotly)
library(scales)
library(RColorBrewer)

Data cleaning

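The data were downloaded from the World Bank as one CSV file per indicator, plus a country metadata file. I kept only the 2020 values and gave each indicator column a descriptive name. I then merged the five indicator files by country name and code and added each country's income group from the metadata file. Finally, I dropped every country with a missing value in any variable. Regional and income-level aggregates such as "World" have no income group in the metadata file, so this last step also removes them. This leaves 128 countries.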

# Load files (each World Bank CSV has 4 metadata lines above the header, hence skip = 4)
gdp <- read_csv("~/Downloads/project 2/API_NY.GDP.PCAP.CD_DS2_en_csv_v2_245.csv", skip = 4)
life <- read_csv("~/Downloads/project 2/API_SP.DYN.LE00.IN_DS2_en_csv_v2_163.csv", skip = 4)
pop <- read_csv("~/Downloads/project 2/API_SP.POP.TOTL_DS2_en_csv_v2_58.csv", skip = 4)
school <- read_csv("~/Downloads/project 2/API_SE.SEC.ENRR_DS2_en_csv_v2_758.csv", skip = 4)
unemp <- read_csv("~/Downloads/project 2/API_SL.UEM.TOTL.ZS_DS2_en_csv_v2_36.csv", skip = 4)
meta <- read_csv("~/Downloads/project 2/Metadata_Country_API_NY.GDP.PCAP.CD_DS2_en_csv_v2_245.csv")

# Keep only 2020
gdp <- gdp %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(GDP_per_capita = `2020`)

life <- life %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(Life_expectancy = `2020`)

pop <- pop %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(Population = `2020`)

school <- school %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(School_enrollment = `2020`)

unemp <- unemp %>%
  select(`Country Name`, `Country Code`, `2020`) %>%
  rename(Unemployment_rate = `2020`)

meta_small <- meta %>%
  select(`Country Code`, IncomeGroup)

# Merge all indicators by country, add income group, and keep only complete rows
project2_data <- gdp %>%
  inner_join(life, by = c("Country Name", "Country Code")) %>%
  inner_join(pop, by = c("Country Name", "Country Code")) %>%
  inner_join(school, by = c("Country Name", "Country Code")) %>%
  inner_join(unemp, by = c("Country Name", "Country Code")) %>%
  inner_join(meta_small, by = "Country Code") %>%
  filter(
    !is.na(GDP_per_capita),
    !is.na(Life_expectancy),
    !is.na(Population),
    !is.na(School_enrollment),
    !is.na(Unemployment_rate),
    !is.na(IncomeGroup)
  )

glimpse(project2_data)

Rows: 128
Columns: 8
$ `Country Name`    <chr> "Albania", "United Arab Emirates", "Argentina", "Arm…
$ `Country Code`    <chr> "ALB", "ARE", "ARG", "ARM", "AUS", "AUT", "AZE", "BD…
$ GDP_per_capita    <dbl> 6027.9135, 37991.7493, 8535.5994, 4268.6809, 51983.4…
$ Life_expectancy   <dbl> 77.82400, 81.93600, 75.87800, 73.37561, 83.20000, 81…
$ Population        <dbl> 2528480, 9401038, 45191965, 2961500, 25649248, 89168…
$ School_enrollment <dbl> 95.92945, 99.16354, 112.37673, 88.50803, 137.49074, …
$ Unemployment_rate <dbl> 11.639, 4.294, 11.461, 18.175, 6.394, 5.201, 7.240, …
$ IncomeGroup       <chr> "Upper middle income", "High income", "Upper middle …
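The five select()/rename() blocks and the chain of inner_join() calls above all follow the same pattern, so they could be written once. The sketch below illustrates that idea on small made-up tables. The helper `keep_year()`, the toy tables, and all of their values are hypothetical and are not part of the World Bank data. The sketch also shows why the IncomeGroup filter removes aggregate rows such as "World": in the toy metadata, as in the real file, they have no income group.

```r
library(dplyr)

# Toy stand-ins for two World Bank indicator files (made-up values)
gdp_raw <- tibble(
  `Country Name` = c("Country A", "Country B", "World"),
  `Country Code` = c("AAA", "BBB", "WLD"),
  `2019` = c(1000, 2000, 1500),
  `2020` = c(1100, NA, 1600)
)
life_raw <- tibble(
  `Country Name` = c("Country A", "Country B", "World"),
  `Country Code` = c("AAA", "BBB", "WLD"),
  `2019` = c(70, 74, 72),
  `2020` = c(71, 75, 72)
)
# Toy metadata: the aggregate "World" has no income group
meta_toy <- tibble(
  `Country Code` = c("AAA", "BBB", "WLD"),
  IncomeGroup = c("Low income", "High income", NA)
)

# Hypothetical helper: keep the two country identifiers plus one year,
# then rename that year's column to `new_name`
keep_year <- function(df, new_name, year = "2020") {
  out <- select(df, `Country Name`, `Country Code`, all_of(year))
  names(out)[names(out) == year] <- new_name
  out
}

indicators <- list(
  keep_year(gdp_raw, "GDP_per_capita"),
  keep_year(life_raw, "Life_expectancy")
)

# Join every indicator table on both country columns, add income group,
# then keep only rows with no missing value in any column
toy_data <- Reduce(
  function(x, y) inner_join(x, y, by = c("Country Name", "Country Code")),
  indicators
) %>%
  inner_join(meta_toy, by = "Country Code") %>%
  filter(if_all(everything(), ~ !is.na(.x)))

toy_data
```

Only "Country A" survives. "Country B" is dropped for its missing 2020 GDP value, and "World" is dropped because its income group is missing. With the real files, `indicators` would simply hold all five indicator tables.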

Multiple Linear Regression

The dependent variable is life expectancy at birth. The independent variables are GDP per capita, school enrollment, and the unemployment rate. Population and income group are not in the model; they are used only in the visualization.
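Written out, the regression estimated below has the following form. Here i indexes countries and ε_i is the error term; the short variable names are my own shorthand for the columns of project2_data.

```latex
\text{LifeExp}_i = \beta_0 + \beta_1\,\text{GDPpc}_i + \beta_2\,\text{School}_i + \beta_3\,\text{Unemp}_i + \varepsilon_i
```

Each β_j is the expected change in life expectancy, in years, for a one-unit increase in that variable with the other two held fixed. This means β_1 is measured in years per US dollar of GDP per capita.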

model <- lm(Life_expectancy ~ GDP_per_capita + School_enrollment + Unemployment_rate,
            data = project2_data)

summary(model)


Call:
lm(formula = Life_expectancy ~ GDP_per_capita + School_enrollment + 
    Unemployment_rate, data = project2_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.4412  -1.9395   0.5143   2.4052   8.4438 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       6.069e+01  1.485e+00  40.882  < 2e-16 ***
GDP_per_capita    1.353e-04  2.076e-05   6.518 1.61e-09 ***
School_enrollment 1.217e-01  1.864e-02   6.529 1.53e-09 ***
Unemployment_rate 1.212e-02  6.802e-02   0.178    0.859    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.859 on 124 degrees of freedom
Multiple R-squared:  0.646, Adjusted R-squared:  0.6374 
F-statistic: 75.41 on 3 and 124 DF,  p-value: < 2.2e-16
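GDP per capita is measured in single US dollars, so its coefficient looks tiny. One way to read the estimates is to plug numbers into the fitted equation. The sketch below does this by hand, using the rounded coefficients printed above. The country profile is hypothetical: GDP per capita of $10,000, 90% gross secondary enrollment, and 7% unemployment. With the fitted object itself, `predict(model, newdata = ...)` would give the same kind of answer without rounding.

```r
# Coefficients copied from summary(model) above (rounded as printed)
b <- c(intercept = 60.69, gdp = 1.353e-04, school = 0.1217, unemp = 0.01212)

# Hypothetical country profile (illustrative values, not from the dataset)
gdp_pc <- 10000  # GDP per capita, current US$
school <- 90     # gross secondary school enrollment, %
unemp  <- 7      # unemployment, % of labor force

# Fitted equation: intercept plus each coefficient times its value
pred <- b[["intercept"]] + b[["gdp"]] * gdp_pc +
  b[["school"]] * school + b[["unemp"]] * unemp
round(pred, 2)
#> [1] 73.08

# Change in predicted life expectancy, other variables held fixed
gdp_per_1000  <- b[["gdp"]] * 1000   # +$1,000 GDP per capita
school_per_10 <- b[["school"]] * 10  # +10 points of enrollment
gdp_per_1000
#> [1] 0.1353
school_per_10
#> [1] 1.217
```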

Visualization

This graph shows the relationship between GDP per capita and life expectancy across countries. Income group is shown with color, and population is shown with point size.

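# One point per country: x = GDP per capita, y = life expectancy,
# color = income group, point size = population
final_plot <- ggplot(
  project2_data,
  aes(
    x = GDP_per_capita,
    y = Life_expectancy,
    color = IncomeGroup,
    size = Population,
    # Hover text, shown later by ggplotly(tooltip = "text")
    text = paste(
      "Country:", `Country Name`,
      "<br>Income Group:", IncomeGroup,
      "<br>GDP per capita:", round(GDP_per_capita, 2),
      "<br>Life expectancy:", round(Life_expectancy, 2),
      "<br>Population:", comma(Population)
    )
  )
) +
  geom_point(alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  # Readable numbers: dollar signs on the x-axis, thousands separators in the size legend
  scale_x_continuous(labels = label_dollar()) +
  scale_size_continuous(labels = label_comma()) +
  labs(
    title = "Do Richer Countries Live Longer?",
    x = "GDP per Capita (USD)",
    y = "Life Expectancy (Years)",
    color = "Income Group",
    size = "Population",
    caption = "Source: World Bank, World Development Indicators"
  ) +
  theme_minimal()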

final_plot

This interactive version of the graph lets you hover over each point to see detailed information about that country: its name, income group, GDP per capita, life expectancy, and population. I found it interesting to implement.
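The hover text is an ordinary string built with paste(), which joins its arguments with spaces; plotly renders each `<br>` as a line break. The sketch below rebuilds that string by hand for Albania as a self-contained check. It uses the values displayed by glimpse() above, so the real tooltip could differ slightly if those displayed values were rounded.

```r
library(scales)

# Albania's values as displayed by glimpse(project2_data) above
country    <- "Albania"
income     <- "Upper middle income"
gdp_pc     <- 6027.9135
life_exp   <- 77.824
population <- 2528480

# Same paste() pattern as the `text` aesthetic of final_plot
tooltip <- paste(
  "Country:", country,
  "<br>Income Group:", income,
  "<br>GDP per capita:", round(gdp_pc, 2),
  "<br>Life expectancy:", round(life_exp, 2),
  "<br>Population:", comma(population)
)
tooltip
```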

ggplotly(final_plot, tooltip = "text")

Final Analysis

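The results suggest that richer countries tend to have higher life expectancy. The visualization shows a positive relationship between GDP per capita and life expectancy: countries with higher incomes usually have longer average lifespans.

In the regression, GDP per capita and school enrollment both have positive, statistically significant coefficients (p < 0.001). Holding the other variables constant:

- an additional $1,000 of GDP per capita is associated with about 0.14 more years of life expectancy
- a 10-percentage-point increase in school enrollment is associated with about 1.2 more years
- the unemployment rate has no significant relationship with life expectancy (p = 0.859)

Together, the three variables explain about 65% of the variation in life expectancy across countries (R² = 0.646).

One limitation of this analysis is that other important variables, such as healthcare spending or access to clean water, were not included. The data also cover a single year and only the 128 countries with complete data. Finally, the results describe associations rather than causal effects.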