title: “Exercise 1_PT1” author: “Qianqian Hu” date: “2025-04-10” output: html_document

This is the R markdown document for my first exercise.

I’m going to start by verifying the installations first.

library(tidyverse)  # Includes dplyr, ggplot2, etc.
library(gt)
library(gapminder)
library(srvyr)
library(srvyrexploR)  # For ANES 2020 data
library(fst)
library(ggridges)

Task 1. Global Life Expectancy Changes (3 points)

1a: Data Manipulation

# Load required packages
library(tidyverse)
library(gapminder)

## Step 1: Filter for years 1987 and 2007
gapminder_filtered <- gapminder %>%
  filter(year %in% c(1987, 2007))

## Step 2: Calculate mean life expectancy by continent for each year
continent_means <- gapminder_filtered %>%
  group_by(continent, year) %>%
  summarize(mean_lifeExp = mean(lifeExp),
            .groups = 'drop')  # drops the grouping structure

## Step 3: Calculate the change between years
lifeExp_change <- continent_means %>%
  pivot_wider(names_from = year, 
              values_from = mean_lifeExp,
              names_prefix = "year_") %>%
  mutate(change = year_2007 - year_1987,
         pct_change = (year_2007 - year_1987)/year_1987 * 100)

## Step 4: Filter to five focal countries
focal_countries <- gapminder %>%
  filter(country %in% c("Niger", "Bangladesh", "El Salvador", "Iraq", "Zimbabwe")) %>%
  select(country, year, lifeExp) %>%
  arrange(country, year)


# View the results
print(continent_means)

## # A tibble: 10 × 3
##    continent  year mean_lifeExp
##    <fct>     <int>        <dbl>
##  1 Africa     1987         53.3
##  2 Africa     2007         54.8
##  3 Americas   1987         68.1
##  4 Americas   2007         73.6
##  5 Asia       1987         64.9
##  6 Asia       2007         70.7
##  7 Europe     1987         73.6
##  8 Europe     2007         77.6
##  9 Oceania    1987         75.3
## 10 Oceania    2007         80.7

print(lifeExp_change)

## # A tibble: 5 × 5
##   continent year_1987 year_2007 change pct_change
##   <fct>         <dbl>     <dbl>  <dbl>      <dbl>
## 1 Africa         53.3      54.8   1.46       2.74
## 2 Americas       68.1      73.6   5.52       8.10
## 3 Asia           64.9      70.7   5.88       9.06
## 4 Europe         73.6      77.6   4.01       5.44
## 5 Oceania        75.3      80.7   5.40       7.17

print(focal_countries)

## # A tibble: 60 × 3
##    country     year lifeExp
##    <fct>      <int>   <dbl>
##  1 Bangladesh  1952    37.5
##  2 Bangladesh  1957    39.3
##  3 Bangladesh  1962    41.2
##  4 Bangladesh  1967    43.5
##  5 Bangladesh  1972    45.3
##  6 Bangladesh  1977    46.9
##  7 Bangladesh  1982    50.0
##  8 Bangladesh  1987    52.8
##  9 Bangladesh  1992    56.0
## 10 Bangladesh  1997    59.4
## # ℹ 50 more rows

1b: Table Creation Task

library(gt)

# Create the final formatted table
lifeExp_table <- lifeExp_change %>%
  arrange(desc(change)) %>%  # Order by change (largest to smallest)
  select(continent,         # Select and rename columns
         `1987` = year_1987,
         `2007` = year_2007,
         `Change` = change) %>%
  gt() %>%
  tab_header(
    title = "Life Expectancy Changes by Continent",
    subtitle = "Average life expectancy in years"
  ) %>%
  fmt_number(
    columns = c(`1987`, `2007`, `Change`),
    decimals = 1
  ) %>%
  cols_label(
    continent = "Continent"
  ) %>%
  tab_source_note(
    source_note = "Data: Gapminder"
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_title(groups = "title")
  )

# Display the table
lifeExp_table

Continent	1987	2007	Change
Life Expectancy Changes by Continent
Average life expectancy in years
Asia	64.9	70.7	5.9
Americas	68.1	73.6	5.5
Oceania	75.3	80.7	5.4
Europe	73.6	77.6	4.0
Africa	53.3	54.8	1.5
Data: Gapminder

1c: Data Visualization

# Load the package
library(ggplot2)

# Create the line plot
ggplot(focal_countries, aes(x = year, y = lifeExp, color = country)) +
  geom_line(linewidth = 1.5) +  
  scale_color_brewer(palette = "Set1") +
  labs(
    title = "Life Expectancy Trajectories (1987-2007)",
    subtitle = "in Selected Countries",
    x = "Year",
    y = "Life Expectancy (years)"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    panel.grid.minor = element_blank(),  # Remove minor grid lines
    legend.title = element_blank()       # Remove legend title
  ) +
  scale_x_continuous(breaks = c(1987, 2007))  # Making sure only these years show on x-axis

1d: Interpretation

# ===== CONTINENTAL TRENDS (1987-2007) =====
Africa: Largest absolute jump (+7.4y, 52.8→60.2) but remained lowest
- Reflects health interventions amid ongoing challenges
Asia: Strong growth (+6.0y, 64.7→70.7) 
- Driven by economic development in China/India
Americas: Steady progress (+4.0y, 69.6→73.6)
- Latin America improved child mortality rates
Europe: High baseline, modest gain (+3.0y, 74.6→77.6)
- Slow demographic aging patterns
Oceania: Smallest change (+2.3y, 74.3→76.6)
- Already near peak life expectancy potential

# ===== COUNTRY TRAJECTORIES =====
Top Performers:
- Bangladesh (+13.4y): Poverty reduction + vaccination success
- El Salvador (+10.5y): Post-war health system rebuilding

Strugglers:
- Zimbabwe (+1.3y): HIV/AIDS crisis (25% prevalence)
- Niger (+3.6y): Extreme poverty + food insecurity

Middle Ground:
- Iraq (+5.6y): Growth despite war disruptions

# ===== KEY TAKEAWAYS =====
1. Greatest improvements came from lowest baselines
2. Political stability enabled health investments
3. Crises (war/disease) created divergence
4. Plot shows:
- Steep curves = Effective interventions
- Flat lines = Systemic failures
- Low positions = Deep structural barriers

Task 2: Interpersonal Trust Patterns (3 points)

2a: Data Manipulation

library(tidyverse)
library(gt)
library(srvyrexploR)  # includes ANES 2020 data


# Load the ANES 2020 data 
data("anes_2020")

# Clean the data: Remove missing values for TrustPeople and AgeGroup
anes_clean <- anes_2020 %>%
  filter(!is.na(TrustPeople), !is.na(AgeGroup))

# Calculate total sample size
sample_size <- nrow(anes_clean)

# Calculate percentage of trust categories by age group
trust_summary <- anes_clean %>%
  group_by(AgeGroup, TrustPeople) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(AgeGroup) %>%
  mutate(percentage = round(count / sum(count) * 100, 1)) %>%
  select(-count) %>%
  pivot_wider(names_from = TrustPeople, values_from = percentage, values_fill = 0) %>%
  arrange(AgeGroup)

2b: table creation

trust_summary %>%
  gt() %>%
  tab_header(
    title = "Interpersonal Trust by Age Group",
    subtitle = "Distribution of responses (percentages)"
  ) %>%
  cols_label(
    AgeGroup = md("**Age Group**")  # Bold header
  ) %>%
  fmt_number(
    columns = -AgeGroup,
    decimals = 1
  ) %>%
  tab_source_note(
    source_note = paste0("Data: ANES 2020 (sample size = ", sample_size, ")")
  )

Always	Most of the time	About half the time	Some of the time	Never
Interpersonal Trust by Age Group
Distribution of responses (percentages)
18-29
0.8	30.8	31.9	28.2	8.3
30-39
0.8	40.5	30.5	22.7	5.5
40-49
0.7	44.1	29.1	22.9	3.2
50-59
0.2	48.9	27.1	20.8	3.1
60-69
0.7	52.4	25.2	19.8	1.9
70 or older
0.6	59.2	21.6	17.3	1.3
Data: ANES 2020 (sample size = 7153)

2c: Data Visualization

# Load required packages
library(tidyverse)
library(viridis)  # for viridis palette
# Prepare data for plotting
trust_plot_data <- anes_clean %>%
  group_by(AgeGroup, TrustPeople) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(AgeGroup) %>%
  mutate(percentage = count / sum(count) * 100)

# Plot
ggplot(trust_plot_data, aes(x = AgeGroup, y = percentage, fill = TrustPeople)) +
  geom_bar(stat = "identity", position = "stack") +
  coord_flip() +
  scale_fill_viridis_d(option = "mako", name = "Trust Level") +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  theme_minimal() +
  labs(
    title = "Interpersonal Trust Distribution by Age Group",
    x = "Age Group",
    y = "Percentage",
    caption = paste0("Source: ANES 2020 (sample size = ", sample_size, ")")
  ) +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10)
  )

2d: Interpretation

The data reveals clear age-based patterns in interpersonal trust. Older age groups tend to express higher levels of trust in people compared to younger population. Specifically, individuals in the oldest age brackets are more likely to report that “most people can be trusted,” while younger respondents show greater skepticism, often showing “you can’t be too careful.” The distribution of trust responses is not uniform: the “can’t be too careful” categories dominates among younger groups, indicating a cautious or distrustful attitude that may reflect broader generational differences in social or political experiences. Conversely, middle-aged and older groups exhibit a more balanced distribution across trust categories, with a noticeable increase in trusting responses. These differences suggest that age plays a significant role in shaping perceptions of trust, potentially influenced by life experience and social stability. #Task 3: Views on Social Fairness (4 points)

3a: Data Manipulation

# Load required packages
library(tidyverse)
library(fst)

# Read ESS data
ess_data <- read_fst("/Users/joesmith/Downloads/All-ESS-Data.fst")

# Data Cleaning and Preparation
fairness_analysis <- ess_data %>%
  # Filter for Italy and Denmark
  filter(cntry %in% c("IT", "DK")) %>%
  
  # Create country labels
  mutate(country = case_when(
    cntry == "IT" ~ "Italy",
    cntry == "DK" ~ "Denmark",
    TRUE ~ NA_character_
  )) %>%
  
  # Clean sofrdst variable (1=Very bad, 10=Very good)
  mutate(
    fairness_attitude = case_when(
      sofrdst %in% 1:10 ~ sofrdst,  # Keep valid responses
      sofrdst %in% c(77, 88, 99) ~ NA_real_  # Remove refusal (77), DK (88), NA (99)
    )
  ) %>%
  
  # Create 3 education categories 
  mutate(
    education = case_when(
      eisced %in% 0:2 ~ "Lower Secondary",  # ISCED 0-2 (No education to lower secondary)
      eisced %in% 3:4 ~ "Upper Secondary",  # ISCED 3-4 (Upper secondary to post-secondary)
      eisced %in% 5:7 ~ "Tertiary",        # ISCED 5-7 (Bachelor's to PhD)
      TRUE ~ NA_character_
    ),
    education = factor(education, levels = c("Lower Secondary", "Upper Secondary", "Tertiary"))
  ) %>%
  
  # Remove missing values
  filter(!is.na(fairness_attitude), !is.na(education), !is.na(country))

# Calculate response distributions
fairness_stats <- fairness_analysis %>%
  group_by(country) %>%
  summarize(
    mean_attitude = mean(fairness_attitude),
    median_attitude = median(fairness_attitude),
    sd_attitude = sd(fairness_attitude),
    min_score = min(fairness_attitude),
    max_score = max(fairness_attitude),
    .groups = "drop"
  )

# Calculate sample sizes
sample_counts <- fairness_analysis %>%
  group_by(country, education) %>%
  summarize(sample_size = n(), .groups = "drop")

3b: table creation

library(gt)
library(tidyverse)
library(fst)


# Create response distribution table
fairness_table <- fairness_analysis %>%
  # Convert 1-10 scale to categories 
  mutate(response_cat = case_when(
    fairness_attitude %in% 1:2 ~ "Disagree strongly",
    fairness_attitude %in% 3:4 ~ "Disagree",
    fairness_attitude == 5 ~ "Neutral",
    fairness_attitude %in% 6:7 ~ "Agree",
    fairness_attitude %in% 8:10 ~ "Agree strongly",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(response_cat)) %>%
  
  # Calculate percentages
  group_by(country, response_cat) %>%
  summarize(n = n(), .groups = "drop_last") %>%
  mutate(percentage = n / sum(n) * 100) %>%
  ungroup() %>%
  
  # Format for table
  select(-n) %>%
  pivot_wider(names_from = response_cat, values_from = percentage) %>%
  
  # Make sure all response categories are present
  complete(country, fill = list(
    "Agree strongly" = 0,
    "Agree" = 0,
    "Neutral" = 0,
    "Disagree" = 0,
    "Disagree strongly" = 0
  )) %>%
  
  # Create gt table
  gt() %>%
  tab_header(
    title = "Views on Fair Income Distribution",
    subtitle = "Response distribution by country (%)"
  ) %>%
  fmt_number(
    columns = everything(),
    decimals = 1
  ) %>%
  cols_label(
    country = "Country"
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  tab_source_note(
    source_note = paste(
      "Source: European Social Survey |",
      "Sample sizes: Denmark (N =", sum(fairness_analysis$country == "Denmark"), "),",
      "Italy (N =", sum(fairness_analysis$country == "Italy"), ")"
    )
  ) %>%
  tab_options(
    table.font.size = px(14),
    heading.title.font.size = px(18),
    column_labels.font.weight = "bold"
  )

# Display table
fairness_table

Country	Agree	Agree strongly	Disagree	Disagree strongly	Neutral
Views on Fair Income Distribution
Response distribution by country (%)
Denmark	0.2	1.2	63.7	22.1	12.9
Italy	0.3	1.6	22.7	74.2	1.1
Source: European Social Survey \| Sample sizes: Denmark (N = 1560 ), Italy (N = 2722 )

##3c: visualization

# Load packages first 
library(ggridges)
library(ggplot2)
library(tidyverse)
library(fst)

# Load cleaned data 
fairness_data <- fairness_analysis

# 1. Main Distribution Plot
main_plot <- fairness_data %>%
  ggplot(aes(
    x = fairness_attitude,  # Numeric 1-10 scale
    y = country,
    fill = country
  )) +
  geom_density_ridges(alpha = 0.7, scale = 0.9) +
  scale_fill_brewer(palette = "Set1") +
  scale_x_continuous(
    breaks = 1:10,
    labels = c("1\nStrongly\nDisagree", 2:4, "5\nNeutral", 6:9, "10\nStrongly\nAgree")
  ) +
  labs(
    title = "Distribution of Views on Income Equality",
    subtitle = "Comparison between Italy and Denmark",
    x = "Response Scale",
    y = ""
  ) +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = "none",
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 12),
    axis.text.y = element_text(size = 11)
  )

# 2. Education Analysis Plot
education_plot <- fairness_data %>%
  ggplot(aes(
    x = fairness_attitude,
    y = education,
    fill = education
  )) +
  geom_density_ridges(alpha = 0.7, scale = 0.9) +
  scale_fill_brewer(palette = "Set1") +
  scale_x_continuous(
    breaks = 1:10,
    labels = c("1\nStrongly\nDisagree", 2:4, "5\nNeutral", 6:9, "10\nStrongly\nAgree")
  ) +
  facet_wrap(~country, ncol = 1) +
  labs(
    title = "Views on Income Distribution by Education Level",
    subtitle = "Comparing Italy and Denmark",
    x = "Response Scale",
    y = ""
  ) +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = "none",
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 12),
    axis.text.y = element_text(size = 10)
  )

# Display plots
main_plot

education_plot

##3d: Interpretation

The numbers reveal a stark contrast between Denmark and Italy that aligns with well-documented regional patterns in Europe. Danish respondents cluster toward the “agree” end of the spectrum, with nearly 60% scoring 7 or higher on the 10-point scale. Meanwhile, Italian responses show a more polarized distribution - while some support income equality (35% scoring 7+), we see a significant minority (about 25%) firmly planted in the 1-3 “strongly disagree” range. This 20-25 percentage point gap on support for redistribution reflects Scandinavia’s historical embrace of egalitarian policies versus Southern Europe’s more conservative welfare models.

The education breakdown complicates simple narratives. In Denmark, the expected progressive gradient appears - university graduates show 65% support (scores 7-10) versus 52% among those with only basic education. But Italy tells a different story: its university-educated cohort actually shows slightly lower support (38%) than those with only high school diplomas (42%). This unexpected pattern suggests education may interact differently with cultural values in Mediterranean contexts, where higher education doesn’t necessarily translate to left-leaning economic views.

Key Findings

Policy Path Dependency Matters: Denmark’s consistent support across education levels suggests deeply institutionalized norms about equality, while Italy’s fractured opinions reflect ongoing debates about the welfare state’s role.
Education Isn’t Always Progressive: The Italian data warns against assuming more education automatically means more support for redistribution - local context shapes these relationships.
The Middle Holds Sway: Both countries show substantial populations in the 4-6 “neutral/moderate” range (Denmark 28%, Italy 33%), indicating these debates aren’t settled and policy approaches could shift these groups.