ECON 465 – Week 4 Lab: Exploratory Data Analysis for Economic Insight
# Load required packages
library(tidyverse)
library(dslabs)
Lab Objectives
By the end of this lab, you will be able to:
- Understand the three main components of a ggplot2 plot
- Create and interpret scatterplots, histograms, and boxplots
- Use faceting to create small multiples for comparison
- Create time series plots to visualize trends over time
- Apply data transformations (log transforms) to reveal patterns
- Use real-world data to challenge common misconceptions about global development
The Economic Question
Is the world really divided into “Western rich nations” and “developing nations” in Africa, Asia, and Latin America? Has income inequality across countries worsened during the last 40 years? In this lab, we use data visualization to answer these questions, following the work of Hans Rosling and the Gapminder Foundation.
Datasets for This Lab
We will use the gapminder dataset from the dslabs package. This dataset contains life expectancy, fertility rates, GDP, and population data for 10,545 country-year observations.
# To check the details of the gapminder dataset (variable descriptions), run ??gapminder
# Load the gapminder dataset and examine it
data(gapminder)
gapminder |> as_tibble()
1 Quick Introduction to ggplot2
1.1 The Three Main Components of a ggplot
Every ggplot2 plot has three essential components:
- Data: The dataset containing the variables we want to plot
- Aesthetics (aes): Mappings from variables to visual properties (x-axis, y-axis, color, size, etc.)
- Geometry (geom): The type of plot (points, lines, bars, etc.)
Basic template:
ggplot(data = dataset, aes(x = variable1, y = variable2)) +
  geom_something()
1.2 Simple Example with Gapminder
Let’s create a scatterplot of life expectancy vs. fertility rate for 1962:
# Filter for 1962 and create scatterplot
gapminder |>
  filter(year == 1962) |>
  ggplot(aes(x = fertility, y = life_expectancy)) +
  geom_point() +
  labs(
    title = "Life Expectancy vs. Fertility Rate (1962)",
    x = "Fertility (children per woman)",
    y = "Life Expectancy (years)"
  ) +
  theme_minimal()
This plot reveals two distinct clusters: countries with high fertility and low life expectancy, and countries with low fertility and high life expectancy.
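One way to probe these clusters is to add a third aesthetic mapping. The sketch below is a possible extension rather than part of the original lab code: it maps the `continent` column of the dslabs gapminder data to point color, which may help show whether the two groups line up with world regions. The object name `p_1962` is introduced here only for illustration.

```r
# Load packages and data so the sketch runs on its own
library(tidyverse)
library(dslabs)
data(gapminder)

# Same 1962 scatterplot as above, with continent mapped to color.
# p_1962 is an illustrative name, not one used elsewhere in the lab.
p_1962 <- gapminder |>
  filter(year == 1962) |>
  ggplot(aes(x = fertility, y = life_expectancy, color = continent)) +
  geom_point() +   # rows with missing values are dropped with a warning
  labs(
    title = "Life Expectancy vs. Fertility Rate (1962)",
    x = "Fertility (children per woman)",
    y = "Life Expectancy (years)",
    color = "Continent"
  ) +
  theme_minimal()

# Print the plot
p_1962
```

Only the `aes()` call and the legend label change here; the data and the geometry are the same as before, which illustrates how the three components can be varied independently.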
2 Case Study 1: New Insights on Poverty
Based on Chapter 10.1-10.7 of Irizarry’s “Introduction to Data Science”
2.1 Background: Testing Our Knowledge
Hans Rosling, co-founder of the Gapminder Foundation, often began his talks with a quiz. For each pair below, which country had higher infant mortality in 2015?
Sri Lanka or Turkey
Poland or South Korea
Malaysia or Russia
Pakistan or Vietnam
Thailand or South Africa
Without data, most people pick the non-European countries. Let’s check with data:
# Compare infant mortality rates for 2015
comparisons <- c("Sri Lanka", "Turkey", "Poland", "South Korea",
                 "Malaysia", "Russia", "Pakistan", "Vietnam",
                 "Thailand", "South Africa")