title: “Exercise 1_PT1” author: “Qianqian Hu” date: “2025-04-10” output: html_document
This is the R markdown document for my first exercise.
I’m going to start by verifying the installations first.
library(tidyverse) # Includes dplyr, ggplot2, etc.
library(gt)
library(gapminder)
library(srvyr)
library(srvyrexploR) # For ANES 2020 data
library(fst)
library(ggridges)
# Load required packages
library(tidyverse)
library(gapminder)
## Step 1: Filter for years 1987 and 2007
gapminder_filtered <- gapminder %>%
filter(year %in% c(1987, 2007))
## Step 2: Calculate mean life expectancy by continent for each year
continent_means <- gapminder_filtered %>%
group_by(continent, year) %>%
summarize(mean_lifeExp = mean(lifeExp),
.groups = 'drop') # drops the grouping structure
## Step 3: Calculate the change between years
lifeExp_change <- continent_means %>%
pivot_wider(names_from = year,
values_from = mean_lifeExp,
names_prefix = "year_") %>%
mutate(change = year_2007 - year_1987,
pct_change = (year_2007 - year_1987)/year_1987 * 100)
## Step 4: Filter to five focal countries
focal_countries <- gapminder %>%
filter(country %in% c("Niger", "Bangladesh", "El Salvador", "Iraq", "Zimbabwe")) %>%
select(country, year, lifeExp) %>%
arrange(country, year)
# View the results
print(continent_means)
## # A tibble: 10 × 3
## continent year mean_lifeExp
## <fct> <int> <dbl>
## 1 Africa 1987 53.3
## 2 Africa 2007 54.8
## 3 Americas 1987 68.1
## 4 Americas 2007 73.6
## 5 Asia 1987 64.9
## 6 Asia 2007 70.7
## 7 Europe 1987 73.6
## 8 Europe 2007 77.6
## 9 Oceania 1987 75.3
## 10 Oceania 2007 80.7
print(lifeExp_change)
## # A tibble: 5 × 5
## continent year_1987 year_2007 change pct_change
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Africa 53.3 54.8 1.46 2.74
## 2 Americas 68.1 73.6 5.52 8.10
## 3 Asia 64.9 70.7 5.88 9.06
## 4 Europe 73.6 77.6 4.01 5.44
## 5 Oceania 75.3 80.7 5.40 7.17
print(focal_countries)
## # A tibble: 60 × 3
## country year lifeExp
## <fct> <int> <dbl>
## 1 Bangladesh 1952 37.5
## 2 Bangladesh 1957 39.3
## 3 Bangladesh 1962 41.2
## 4 Bangladesh 1967 43.5
## 5 Bangladesh 1972 45.3
## 6 Bangladesh 1977 46.9
## 7 Bangladesh 1982 50.0
## 8 Bangladesh 1987 52.8
## 9 Bangladesh 1992 56.0
## 10 Bangladesh 1997 59.4
## # ℹ 50 more rows
library(gt)
# Create the final formatted table
lifeExp_table <- lifeExp_change %>%
arrange(desc(change)) %>% # Order by change (largest to smallest)
select(continent, # Select and rename columns
`1987` = year_1987,
`2007` = year_2007,
`Change` = change) %>%
gt() %>%
tab_header(
title = "Life Expectancy Changes by Continent",
subtitle = "Average life expectancy in years"
) %>%
fmt_number(
columns = c(`1987`, `2007`, `Change`),
decimals = 1
) %>%
cols_label(
continent = "Continent"
) %>%
tab_source_note(
source_note = "Data: Gapminder"
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels()
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_title(groups = "title")
)
# Display the table
lifeExp_table
| Life Expectancy Changes by Continent | |||
| Average life expectancy in years | |||
| Continent | 1987 | 2007 | Change |
|---|---|---|---|
| Asia | 64.9 | 70.7 | 5.9 |
| Americas | 68.1 | 73.6 | 5.5 |
| Oceania | 75.3 | 80.7 | 5.4 |
| Europe | 73.6 | 77.6 | 4.0 |
| Africa | 53.3 | 54.8 | 1.5 |
| Data: Gapminder | |||
# Load the package
library(ggplot2)
# Create the line plot
ggplot(focal_countries, aes(x = year, y = lifeExp, color = country)) +
geom_line(linewidth = 1.5) +
scale_color_brewer(palette = "Set1") +
labs(
title = "Life Expectancy Trajectories (1987-2007)",
subtitle = "in Selected Countries",
x = "Year",
y = "Life Expectancy (years)"
) +
theme_minimal() +
theme(
legend.position = "bottom",
plot.title = element_text(size = 14, face = "bold"),
plot.subtitle = element_text(size = 12),
panel.grid.minor = element_blank(), # Remove minor grid lines
legend.title = element_blank() # Remove legend title
) +
scale_x_continuous(breaks = c(1987, 2007)) # Making sure only these years show on x-axis
# ===== CONTINENTAL TRENDS (1987-2007) =====
Africa: Largest absolute jump (+7.4y, 52.8→60.2) but remained lowest
- Reflects health interventions amid ongoing challenges
Asia: Strong growth (+6.0y, 64.7→70.7)
- Driven by economic development in China/India
Americas: Steady progress (+4.0y, 69.6→73.6)
- Latin America improved child mortality rates
Europe: High baseline, modest gain (+3.0y, 74.6→77.6)
- Slow demographic aging patterns
Oceania: Smallest change (+2.3y, 74.3→76.6)
- Already near peak life expectancy potential
# ===== COUNTRY TRAJECTORIES =====
Top Performers:
- Bangladesh (+13.4y): Poverty reduction + vaccination success
- El Salvador (+10.5y): Post-war health system rebuilding
Strugglers:
- Zimbabwe (+1.3y): HIV/AIDS crisis (25% prevalence)
- Niger (+3.6y): Extreme poverty + food insecurity
Middle Ground:
- Iraq (+5.6y): Growth despite war disruptions
# ===== KEY TAKEAWAYS =====
1. Greatest improvements came from lowest baselines
2. Political stability enabled health investments
3. Crises (war/disease) created divergence
4. Plot shows:
- Steep curves = Effective interventions
- Flat lines = Systemic failures
- Low positions = Deep structural barriers
library(tidyverse)
library(gt)
library(srvyrexploR) # includes ANES 2020 data
# Load the ANES 2020 data
data("anes_2020")
# Clean the data: Remove missing values for TrustPeople and AgeGroup
anes_clean <- anes_2020 %>%
filter(!is.na(TrustPeople), !is.na(AgeGroup))
# Calculate total sample size
sample_size <- nrow(anes_clean)
# Calculate percentage of trust categories by age group
trust_summary <- anes_clean %>%
group_by(AgeGroup, TrustPeople) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(AgeGroup) %>%
mutate(percentage = round(count / sum(count) * 100, 1)) %>%
select(-count) %>%
pivot_wider(names_from = TrustPeople, values_from = percentage, values_fill = 0) %>%
arrange(AgeGroup)
trust_summary %>%
gt() %>%
tab_header(
title = "Interpersonal Trust by Age Group",
subtitle = "Distribution of responses (percentages)"
) %>%
cols_label(
AgeGroup = md("**Age Group**") # Bold header
) %>%
fmt_number(
columns = -AgeGroup,
decimals = 1
) %>%
tab_source_note(
source_note = paste0("Data: ANES 2020 (sample size = ", sample_size, ")")
)
| Interpersonal Trust by Age Group | ||||
| Distribution of responses (percentages) | ||||
| Always | Most of the time | About half the time | Some of the time | Never |
|---|---|---|---|---|
| 18-29 | ||||
| 0.8 | 30.8 | 31.9 | 28.2 | 8.3 |
| 30-39 | ||||
| 0.8 | 40.5 | 30.5 | 22.7 | 5.5 |
| 40-49 | ||||
| 0.7 | 44.1 | 29.1 | 22.9 | 3.2 |
| 50-59 | ||||
| 0.2 | 48.9 | 27.1 | 20.8 | 3.1 |
| 60-69 | ||||
| 0.7 | 52.4 | 25.2 | 19.8 | 1.9 |
| 70 or older | ||||
| 0.6 | 59.2 | 21.6 | 17.3 | 1.3 |
| Data: ANES 2020 (sample size = 7153) | ||||
# Load required packages
library(tidyverse)
library(viridis) # for viridis palette
# Prepare data for plotting
trust_plot_data <- anes_clean %>%
group_by(AgeGroup, TrustPeople) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(AgeGroup) %>%
mutate(percentage = count / sum(count) * 100)
# Plot
ggplot(trust_plot_data, aes(x = AgeGroup, y = percentage, fill = TrustPeople)) +
geom_bar(stat = "identity", position = "stack") +
coord_flip() +
scale_fill_viridis_d(option = "mako", name = "Trust Level") +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
theme_minimal() +
labs(
title = "Interpersonal Trust Distribution by Age Group",
x = "Age Group",
y = "Percentage",
caption = paste0("Source: ANES 2020 (sample size = ", sample_size, ")")
) +
theme(
legend.position = "right",
plot.title = element_text(face = "bold", size = 14),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10)
)
The data reveals clear age-based patterns in interpersonal trust. Older age groups tend to express higher levels of trust in people compared to younger population. Specifically, individuals in the oldest age brackets are more likely to report that “most people can be trusted,” while younger respondents show greater skepticism, often showing “you can’t be too careful.” The distribution of trust responses is not uniform: the “can’t be too careful” categories dominates among younger groups, indicating a cautious or distrustful attitude that may reflect broader generational differences in social or political experiences. Conversely, middle-aged and older groups exhibit a more balanced distribution across trust categories, with a noticeable increase in trusting responses. These differences suggest that age plays a significant role in shaping perceptions of trust, potentially influenced by life experience and social stability. #Task 3: Views on Social Fairness (4 points)
# Load required packages
library(tidyverse)
library(fst)
# Read ESS data
ess_data <- read_fst("/Users/joesmith/Downloads/All-ESS-Data.fst")
# Data Cleaning and Preparation
fairness_analysis <- ess_data %>%
# Filter for Italy and Denmark
filter(cntry %in% c("IT", "DK")) %>%
# Create country labels
mutate(country = case_when(
cntry == "IT" ~ "Italy",
cntry == "DK" ~ "Denmark",
TRUE ~ NA_character_
)) %>%
# Clean sofrdst variable (1=Very bad, 10=Very good)
mutate(
fairness_attitude = case_when(
sofrdst %in% 1:10 ~ sofrdst, # Keep valid responses
sofrdst %in% c(77, 88, 99) ~ NA_real_ # Remove refusal (77), DK (88), NA (99)
)
) %>%
# Create 3 education categories
mutate(
education = case_when(
eisced %in% 0:2 ~ "Lower Secondary", # ISCED 0-2 (No education to lower secondary)
eisced %in% 3:4 ~ "Upper Secondary", # ISCED 3-4 (Upper secondary to post-secondary)
eisced %in% 5:7 ~ "Tertiary", # ISCED 5-7 (Bachelor's to PhD)
TRUE ~ NA_character_
),
education = factor(education, levels = c("Lower Secondary", "Upper Secondary", "Tertiary"))
) %>%
# Remove missing values
filter(!is.na(fairness_attitude), !is.na(education), !is.na(country))
# Calculate response distributions
fairness_stats <- fairness_analysis %>%
group_by(country) %>%
summarize(
mean_attitude = mean(fairness_attitude),
median_attitude = median(fairness_attitude),
sd_attitude = sd(fairness_attitude),
min_score = min(fairness_attitude),
max_score = max(fairness_attitude),
.groups = "drop"
)
# Calculate sample sizes
sample_counts <- fairness_analysis %>%
group_by(country, education) %>%
summarize(sample_size = n(), .groups = "drop")
library(gt)
library(tidyverse)
library(fst)
# Create response distribution table
fairness_table <- fairness_analysis %>%
# Convert 1-10 scale to categories
mutate(response_cat = case_when(
fairness_attitude %in% 1:2 ~ "Disagree strongly",
fairness_attitude %in% 3:4 ~ "Disagree",
fairness_attitude == 5 ~ "Neutral",
fairness_attitude %in% 6:7 ~ "Agree",
fairness_attitude %in% 8:10 ~ "Agree strongly",
TRUE ~ NA_character_
)) %>%
filter(!is.na(response_cat)) %>%
# Calculate percentages
group_by(country, response_cat) %>%
summarize(n = n(), .groups = "drop_last") %>%
mutate(percentage = n / sum(n) * 100) %>%
ungroup() %>%
# Format for table
select(-n) %>%
pivot_wider(names_from = response_cat, values_from = percentage) %>%
# Make sure all response categories are present
complete(country, fill = list(
"Agree strongly" = 0,
"Agree" = 0,
"Neutral" = 0,
"Disagree" = 0,
"Disagree strongly" = 0
)) %>%
# Create gt table
gt() %>%
tab_header(
title = "Views on Fair Income Distribution",
subtitle = "Response distribution by country (%)"
) %>%
fmt_number(
columns = everything(),
decimals = 1
) %>%
cols_label(
country = "Country"
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels()
) %>%
tab_source_note(
source_note = paste(
"Source: European Social Survey |",
"Sample sizes: Denmark (N =", sum(fairness_analysis$country == "Denmark"), "),",
"Italy (N =", sum(fairness_analysis$country == "Italy"), ")"
)
) %>%
tab_options(
table.font.size = px(14),
heading.title.font.size = px(18),
column_labels.font.weight = "bold"
)
# Display table
fairness_table
| Views on Fair Income Distribution | |||||
| Response distribution by country (%) | |||||
| Country | Agree | Agree strongly | Disagree | Disagree strongly | Neutral |
|---|---|---|---|---|---|
| Denmark | 0.2 | 1.2 | 63.7 | 22.1 | 12.9 |
| Italy | 0.3 | 1.6 | 22.7 | 74.2 | 1.1 |
| Source: European Social Survey | Sample sizes: Denmark (N = 1560 ), Italy (N = 2722 ) | |||||
##3c: visualization
# Load packages first
library(ggridges)
library(ggplot2)
library(tidyverse)
library(fst)
# Load cleaned data
fairness_data <- fairness_analysis
# 1. Main Distribution Plot
main_plot <- fairness_data %>%
ggplot(aes(
x = fairness_attitude, # Numeric 1-10 scale
y = country,
fill = country
)) +
geom_density_ridges(alpha = 0.7, scale = 0.9) +
scale_fill_brewer(palette = "Set1") +
scale_x_continuous(
breaks = 1:10,
labels = c("1\nStrongly\nDisagree", 2:4, "5\nNeutral", 6:9, "10\nStrongly\nAgree")
) +
labs(
title = "Distribution of Views on Income Equality",
subtitle = "Comparison between Italy and Denmark",
x = "Response Scale",
y = ""
) +
theme_minimal() +
theme(
panel.grid.minor = element_blank(),
legend.position = "none",
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(size = 12),
axis.text.y = element_text(size = 11)
)
# 2. Education Analysis Plot
education_plot <- fairness_data %>%
ggplot(aes(
x = fairness_attitude,
y = education,
fill = education
)) +
geom_density_ridges(alpha = 0.7, scale = 0.9) +
scale_fill_brewer(palette = "Set1") +
scale_x_continuous(
breaks = 1:10,
labels = c("1\nStrongly\nDisagree", 2:4, "5\nNeutral", 6:9, "10\nStrongly\nAgree")
) +
facet_wrap(~country, ncol = 1) +
labs(
title = "Views on Income Distribution by Education Level",
subtitle = "Comparing Italy and Denmark",
x = "Response Scale",
y = ""
) +
theme_minimal() +
theme(
panel.grid.minor = element_blank(),
legend.position = "none",
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(size = 12),
axis.text.y = element_text(size = 10)
)
# Display plots
main_plot
education_plot
##3d: Interpretation
The numbers reveal a stark contrast between Denmark and Italy that aligns with well-documented regional patterns in Europe. Danish respondents cluster toward the “agree” end of the spectrum, with nearly 60% scoring 7 or higher on the 10-point scale. Meanwhile, Italian responses show a more polarized distribution - while some support income equality (35% scoring 7+), we see a significant minority (about 25%) firmly planted in the 1-3 “strongly disagree” range. This 20-25 percentage point gap on support for redistribution reflects Scandinavia’s historical embrace of egalitarian policies versus Southern Europe’s more conservative welfare models.
The education breakdown complicates simple narratives. In Denmark, the expected progressive gradient appears - university graduates show 65% support (scores 7-10) versus 52% among those with only basic education. But Italy tells a different story: its university-educated cohort actually shows slightly lower support (38%) than those with only high school diplomas (42%). This unexpected pattern suggests education may interact differently with cultural values in Mediterranean contexts, where higher education doesn’t necessarily translate to left-leaning economic views.
Key Findings
Policy Path Dependency Matters: Denmark’s consistent support across education levels suggests deeply institutionalized norms about equality, while Italy’s fractured opinions reflect ongoing debates about the welfare state’s role.
Education Isn’t Always Progressive: The Italian data warns against assuming more education automatically means more support for redistribution - local context shapes these relationships.
The Middle Holds Sway: Both countries show substantial populations in the 4-6 “neutral/moderate” range (Denmark 28%, Italy 33%), indicating these debates aren’t settled and policy approaches could shift these groups.