Understanding the Relationship Between Population, Employment, and Economic Output Across Countries using the Penn World Table

This analysis explores how population and employment levels relate to economic output, measured by rgdpo (output-side real GDP at chained PPPs, in millions of 2017 US dollars), using the Penn World Table (PWT). We will:

  1. Wrangle the dataset to extract relevant information.

  2. Present summarized tables for key insights.

  3. Visualize the relationships using effective plots.

  4. Interpret the findings to provide insights into economic trends.

First, we load the packages used for wrangling and plotting, then read the Penn World Table (version 10.01, file pwt1001.xlsx).

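library(readxl)   # read_excel()
library(dplyr)    # %>%, select(), filter(), mutate(), group_by(), summarise()
library(ggplot2)  # ggplot() and the plots below
# Adjust the path to wherever pwt1001.xlsx is saved; the observations are on the "Data" sheet
pwt1001 <- read_excel("C:/Users/admin/Documents/pwt1001.xlsx",
                      sheet = "Data")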

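For data wrangling, we keep only the five columns we need and drop rows with missing (NA) values in pop, emp, or rgdpo. PWT records pop and emp in millions, so we multiply both by 1e6 to express them as numbers of persons.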

data_selected <- pwt1001 %>%
  select(country, year, pop, emp, rgdpo) %>%
  filter(!is.na(pop), !is.na(emp), !is.na(rgdpo)) %>%
  mutate(pop = pop * 1e6, emp = emp * 1e6) # Convert to absolute numbers

data_selected
## # A tibble: 9,529 × 5
##    country  year   pop    emp rgdpo
##    <chr>   <dbl> <dbl>  <dbl> <dbl>
##  1 Aruba    1991 64622 29200. 3177.
##  2 Aruba    1992 68235 30903. 3371.
##  3 Aruba    1993 72504 32912. 3699.
##  4 Aruba    1994 76700 34896. 4173.
##  5 Aruba    1995 80324 36628. 4184.
##  6 Aruba    1996 83200 38026. 3977.
##  7 Aruba    1997 85451 39143. 4282.
##  8 Aruba    1998 87277 40070. 4661.
##  9 Aruba    1999 89005 40956. 4854.
## 10 Aruba    2000 90853 41900. 4130.
## # ℹ 9,519 more rows

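Next, we compute each country's average population, employment, and GDP output, and display the top 10 countries ranked by average GDP output. Each country is averaged over the years for which it has complete data, so the averaging period differs between countries.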

summary_table <- data_selected %>%
  group_by(country) %>%
  summarise(
    avg_population = mean(pop, na.rm = TRUE),
    avg_employment = mean(emp, na.rm = TRUE),
    avg_gdp_output = mean(rgdpo, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_gdp_output))


# Display top 10 countries
head(summary_table, 10)
## # A tibble: 10 × 4
##    country            avg_population avg_employment avg_gdp_output
##    <chr>                       <dbl>          <dbl>          <dbl>
##  1 United States          242932211.     109262384.       9626137.
##  2 China                 1052894442.     550352060.       5129267.
##  3 Japan                  114751281.      59044377.       2860134.
##  4 Russian Federation     145646567.      69524551.       2653549.
##  5 India                  811894048.     311509443.       2185836.
##  6 Germany                 78162630.      38430277.       2097340.
##  7 France                  56028346.      23275317.       1483397.
##  8 United Kingdom          57329230.      26283842.       1481765.
##  9 Italy                   55287731.      22122705.       1307158.
## 10 Brazil                 133036516.      51949627.       1167996.

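Next, we plot population against GDP output as a scatter plot. Both variables span several orders of magnitude (from tens of thousands of people in Aruba to over a billion in China), so we take natural logs of both and add a fitted linear (least-squares) trend line.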

Visualization of Relationship Between Population and GDP Output

ggplot(data_selected, aes(x = log(pop), y = log(rgdpo))) +
  geom_point(alpha = 0.5, color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(
    title = "Log-Scale Population vs GDP Output",
    x = "Log(Population)",
    y = "Log(Real GDP Output)"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
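The red trend line is an ordinary least-squares fit of log(rgdpo) = a + b * log(pop) + error, so its slope b can be read as an elasticity: a 1% larger population goes with roughly a b% larger output. The sketch below uses simulated data, not PWT values, generated with an assumed elasticity of 0.9, to show that lm() with the same formula recovers that slope; `sim`, `true_elasticity`, and the other names are illustrative only.

```r
# Sketch: estimate the slope of the log-log line drawn by geom_smooth(method = "lm").
# The data are SIMULATED for illustration; they are not Penn World Table values.
set.seed(42)
n <- 200
sim <- data.frame(pop = exp(runif(n, min = 10, max = 20)))  # roughly 22 thousand to 485 million people
true_elasticity <- 0.9                                       # assumed value used to generate the data
sim$rgdpo <- exp(-5 + true_elasticity * log(sim$pop) + rnorm(n, sd = 0.3))

# Same specification as the red line: log(output) regressed on log(population)
fit <- lm(log(rgdpo) ~ log(pop), data = sim)
elasticity <- unname(coef(fit)[2])       # slope on log(pop)
r <- cor(log(sim$pop), log(sim$rgdpo))   # strength of the linear association in logs

cat(sprintf("Estimated elasticity: %.3f\n", elasticity))
cat(sprintf("Correlation of logs:  %.3f\n", r))
```

On the actual data, the corresponding call would be lm(log(rgdpo) ~ log(pop), data = data_selected). Because data_selected pools many years for each country, the slope is a descriptive summary rather than a causal effect, and the default standard errors from summary() are likely to overstate its precision.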

Next, we look at employment trends for the top 5 countries by average GDP output using a line chart.
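The code for this chart is not included above; a minimal sketch follows. It assumes the same column layout as data_selected (country, year, emp in persons, rgdpo). The helpers `top_countries_by_output()` and `plot_top_employment()` and the toy panel are hypothetical, with made-up values, so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: names of the k countries with the highest average rgdpo
top_countries_by_output <- function(panel, k = 5) {
  panel %>%
    group_by(country) %>%
    summarise(avg_gdp_output = mean(rgdpo), .groups = "drop") %>%
    slice_max(avg_gdp_output, n = k) %>%
    pull(country)
}

# Hypothetical helper: line chart of employment (in millions) for those countries
plot_top_employment <- function(panel, k = 5) {
  top_countries <- top_countries_by_output(panel, k)
  panel %>%
    filter(country %in% top_countries) %>%
    ggplot(aes(x = year, y = emp / 1e6, color = country)) +
    geom_line() +
    labs(
      title = paste("Employment in the top", k, "countries by average GDP output"),
      x = "Year",
      y = "Employment (millions of persons)",
      color = "Country"
    ) +
    theme_minimal()
}

# Toy panel with the same columns as data_selected; all values are made up
toy <- expand.grid(
  year = 2000:2004,
  country = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)
size <- match(toy$country, LETTERS)               # A = 1, ..., F = 6
toy$emp <- size * 1e6 + (toy$year - 2000) * 1e4   # made-up employment counts
toy$rgdpo <- size * 1000 + (toy$year - 2000)      # made-up output; F is largest

top5 <- top_countries_by_output(toy)
p <- plot_top_employment(toy)
print(p)
```

With the objects defined earlier, plot_top_employment(data_selected) would select the same five countries as the first five rows of summary_table, since both rank countries by mean rgdpo.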

Summary and Interpretation

Findings:

  1. Population and GDP Output: On log scales, population and GDP output are positively correlated: more populous countries tend to produce more output. This is an association across country-years and does not by itself show that population drives output.

  2. Top Performing Countries: The top 5 countries by average GDP output show consistent employment growth over the years.

  3. Employment Trends: Countries with higher employment tend to show steady growth in output, but the pattern is descriptive and has exceptions.
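Finding 2 is read off the line chart. One way to make "consistent employment growth" concrete is to compute each country's year-over-year change in emp and the share of years in which that change was positive. The sketch below does this; the helper `employment_growth_summary()` and the toy panel are hypothetical, and on the real data it could be applied to data_selected filtered to the top five countries.

```r
library(dplyr)

# Hypothetical helper: year-over-year employment growth per country,
# summarised as the mean growth rate and the share of years with positive growth
employment_growth_summary <- function(panel) {
  panel %>%
    arrange(country, year) %>%
    group_by(country) %>%
    mutate(emp_growth = emp / dplyr::lag(emp) - 1) %>%   # first year per country is NA
    summarise(
      mean_growth = mean(emp_growth, na.rm = TRUE),
      share_positive = mean(emp_growth > 0, na.rm = TRUE),
      .groups = "drop"
    )
}

# Toy panel with made-up employment figures
toy <- data.frame(
  country = rep(c("Steady", "Mixed"), each = 4),
  year = rep(2000:2003, times = 2),
  emp = c(100, 110, 121, 133.1,  # +10% every year
          100, 90, 99, 99)       # -10%, +10%, 0%
)

growth <- employment_growth_summary(toy)
print(growth)
```

A share_positive close to 1 would support "consistent" growth for a country, while a lower share flags years of decline that a line chart can hide.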