This analysis explores how population and employment levels relate to economic output (measured by rgdpo, output-side real GDP) using the Penn World Table, version 10.01. We will:
Wrangle the dataset to extract relevant information.
Present summarized tables for key insights.
Visualize the relationships with a scatter plot and a line chart.
Interpret the findings to provide insights into economic trends.
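First, we load the dataset from the Penn World Table workbook, along with the packages used in the rest of the analysis.
library(readxl)
library(dplyr)    # provides %>%, select(), filter(), mutate(), summarise()
library(ggplot2)  # provides ggplot() for the plots further down
pwt1001 <- read_excel("C:/Users/admin/Documents/pwt1001.xlsx",
                      sheet = "Data")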
For data wrangling, we keep only the columns we need, drop rows with missing values (NA) in pop, emp, or rgdpo, and convert pop and emp from millions to absolute numbers.
data_selected <- pwt1001 %>%
  select(country, year, pop, emp, rgdpo) %>%
  filter(!is.na(pop), !is.na(emp), !is.na(rgdpo)) %>%
  mutate(pop = pop * 1e6, emp = emp * 1e6) # Convert to absolute numbers
data_selected
## # A tibble: 9,529 × 5
##    country  year   pop    emp rgdpo
##    <chr>   <dbl> <dbl>  <dbl> <dbl>
##  1 Aruba    1991 64622 29200. 3177.
##  2 Aruba    1992 68235 30903. 3371.
##  3 Aruba    1993 72504 32912. 3699.
##  4 Aruba    1994 76700 34896. 4173.
##  5 Aruba    1995 80324 36628. 4184.
##  6 Aruba    1996 83200 38026. 3977.
##  7 Aruba    1997 85451 39143. 4282.
##  8 Aruba    1998 87277 40070. 4661.
##  9 Aruba    1999 89005 40956. 4854.
## 10 Aruba    2000 90853 41900. 4130.
## # ℹ 9,519 more rows
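Before dropping rows, it can help to see how much data the NA filter removes. The sketch below is illustrative only: `toy` is a made-up stand-in for pwt1001 (its values are not from the Penn World Table), and it uses base R so that it runs without extra packages.

```r
# Toy stand-in for pwt1001; all values are invented for illustration
toy <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  year    = c(2000, 2001, 2000, 2001, 2000),
  pop     = c(1.2, 1.3, NA, 5.1, 0.4),
  emp     = c(0.5, NA, 2.0, 2.1, 0.2),
  rgdpo   = c(100, 110, 900, NA, 30)
)

cols <- c("pop", "emp", "rgdpo")

# Number of missing values in each column used by the filter
na_counts <- colSums(is.na(toy[, cols]))

# A row is kept only if all three columns are present,
# which mirrors filter(!is.na(pop), !is.na(emp), !is.na(rgdpo))
keep <- complete.cases(toy[, cols])
rows_dropped <- sum(!keep)

print(na_counts)
print(rows_dropped)
```

On the real data, replacing `toy` with pwt1001 would give the same kind of per-column count and the number of rows removed.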
Now we average population, employment, and GDP output for each country over all of its available years, and display the top 10 countries by average GDP output (rgdpo is reported in millions of 2017 US dollars).
summary_table <- data_selected %>%
  group_by(country) %>%
  summarise(
    avg_population = mean(pop, na.rm = TRUE),
    avg_employment = mean(emp, na.rm = TRUE),
    avg_gdp_output = mean(rgdpo, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_gdp_output))
# Display top 10 countries
head(summary_table, 10)
## # A tibble: 10 × 4
##    country            avg_population avg_employment avg_gdp_output
##    <chr>                       <dbl>          <dbl>          <dbl>
##  1 United States          242932211.     109262384.       9626137.
##  2 China                 1052894442.     550352060.       5129267.
##  3 Japan                  114751281.      59044377.       2860134.
##  4 Russian Federation     145646567.      69524551.       2653549.
##  5 India                  811894048.     311509443.       2185836.
##  6 Germany                 78162630.      38430277.       2097340.
##  7 France                  56028346.      23275317.       1483397.
##  8 United Kingdom          57329230.      26283842.       1481765.
##  9 Italy                   55287731.      22122705.       1307158.
## 10 Brazil                 133036516.      51949627.       1167996.
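These averages are taken over whatever years each country has complete data for, and that window can differ between countries (Aruba's first retained row is 1991, for example). A minimal sketch of a coverage check follows; `toy` and its values are made up for illustration, and base R is used so the block runs on its own.

```r
# Toy panel with different year coverage per country (invented values)
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  year    = c(1950, 1951, 1952, 1990, 1991),
  rgdpo   = c(10, 12, 14, 100, 120)
)

# Count of years and first/last year observed for each country
n_years    <- tapply(toy$year, toy$country, length)
first_year <- tapply(toy$year, toy$country, min)
last_year  <- tapply(toy$year, toy$country, max)

coverage <- data.frame(
  country    = names(n_years),
  n_years    = as.vector(n_years),
  first_year = as.vector(first_year),
  last_year  = as.vector(last_year)
)

print(coverage)
```

If coverage differs a lot across countries, restricting the averages to a common year range may make the ranking easier to compare.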
Here, we visualize the relationship between population and GDP output. We use a scatter plot on log scales, with a fitted linear trend, to analyse the relation.
ggplot(data_selected, aes(x = log(pop), y = log(rgdpo))) +
  geom_point(alpha = 0.5, color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(
    title = "Log-Scale Population vs GDP Output",
    x = "Log(Population)",
    y = "Log(Real GDP Output)"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
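The red line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit of log(rgdpo) on log(pop). To put numbers on the relationship, the same fit can be run directly with lm(), alongside a correlation coefficient. The sketch below uses invented values in `toy`; with the real data one would pass data_selected instead.

```r
# Toy data; values are invented and only roughly increasing together
toy <- data.frame(
  pop   = c(1e5, 1e6, 1e7, 1e8, 1e9),
  rgdpo = c(2e3, 3e4, 2e5, 4e6, 2e7)
)

# Same model that geom_smooth(method = "lm") fits on the log-log plot
fit <- lm(log(rgdpo) ~ log(pop), data = toy)

# Slope of the log-log fit
slope <- coef(fit)[["log(pop)"]]

# Pearson correlation between the two logged variables
r <- cor(log(toy$pop), log(toy$rgdpo))

print(slope)
print(r)
```

In a log-log fit the slope can be read as an elasticity: the approximate percentage change in output associated with a one percent difference in population.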
Next, we look at employment trends for the top 5 countries by average GDP output and plot them as a line chart.
# Identify top 5 countries by average GDP output
# (slice_max() replaces the superseded top_n(); requires dplyr >= 1.0.0)
top_countries <- summary_table %>% slice_max(avg_gdp_output, n = 5) %>% pull(country)
# Filter data for top countries
top_countries_data <- data_selected %>% filter(country %in% top_countries)
# Plot
# emp was converted to persons earlier, so rescale to millions to match the axis label
ggplot(top_countries_data, aes(x = year, y = emp / 1e6, color = country)) +
  geom_line(linewidth = 1) +  # linewidth replaces the deprecated size for lines (ggplot2 >= 3.4.0)
  labs(
    title = "Employment Trends Over Time (Top 5 Countries by GDP)",
    x = "Year",
    y = "Employment (in millions)"
  ) +
  theme_minimal()
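A line chart gives a visual impression of the trend, but whether employment rose in every year can also be checked directly. The sketch below is one possible check, not part of the original analysis; `toy` holds invented values, and with the real data top_countries_data would take its place.

```r
# Toy employment series for two hypothetical countries (invented values)
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = rep(2000:2003, times = 2),
  emp     = c(10, 11, 12, 13, 20, 19, 21, 22)
)

# Make sure each country's rows are in chronological order
toy <- toy[order(toy$country, toy$year), ]

# TRUE if employment increased in every consecutive pair of years
always_rising <- tapply(toy$emp, toy$country, function(e) all(diff(e) > 0))

print(always_rising)
```

Here country A rises every year while country B has one decline, so only A would count as growing without interruption.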
Findings:
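Population and GDP Output: The log-log scatter plot shows a positive association between population size and GDP output: observations with larger populations tend to have higher real GDP. This is an association and does not by itself show that a larger population causes higher output.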
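Top Performing Countries: The top 5 countries by average GDP output (United States, China, Japan, Russian Federation, and India) generally show employment growth over the years.
Employment Trends: Countries with higher average employment tend to have higher average GDP output, but this is not a strict rule: in the summary table, China and India have far larger average employment than the United States yet lower average output.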