Mini Project: Programming in R (ESS attitudes + wellbeing)

Author

Stephanie Bugler

1 Introduction

1.1 Why

Following on from an R workshop I attended in summer 2025, I wanted to develop my skills further by working with a different dataset at home. I used R extensively during my master's thesis but forgot all of it over the summer, so I wanted to start from scratch and relearn the basics over a longer period of time.

1.2 The dataset

The dataset comes from the European Social Survey (ESS) cumulative file for rounds 1–9 (so it pools multiple survey waves across different years). It’s a large cross-national survey with adult respondents from many European countries, using a standardised questionnaire so the same kinds of questions are comparable across countries and across time.

For this mini project I focus on a few key parts of the file:

  • Country and time info: cntry (country code) and essround (which survey round the person is from).

  • Demographics: agea (age) and gndr (gender).

  • Subjective wellbeing: happy (happiness) and stflife (life satisfaction), both on 0–10 scales.

  • Institutional trust: several items like trust in parliament, legal system, police, politicians, parties, the European Parliament, and the UN (e.g., trstprl, trstlgl, trstplc, etc.), also typically on 0–10 scales.

1.3 The file format

I am documenting my progress in a Quarto document. I learned this method during the workshop and like how it presents code visually and makes it more accessible. A side benefit is that you pick up a second language, Markdown, which I now use extensively for writing notes in my job at a tech start-up.

The following is a record of the skills I’ve been working on sporadically over the last few months, building on what I learned during my thesis and the R workshop.


2 06 Oct 2025

# make sure packages are installed.
# I got "there is no package called ..." a bunch of times, so i'm just installing first.

pkgs <- c(
  "readr", "dplyr", "ggplot2", "tidyr", "stringr", "forcats",
  "janitor", "skimr", "gtsummary", "broom", "scales",
  "survey", "haven"
)

to_install <- pkgs[!pkgs %in% installed.packages()[, "Package"]]
if (length(to_install) > 0) {
  install.packages(to_install)
}

# load packages after install
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(stringr)
library(forcats)

library(janitor)
library(skimr)
library(gtsummary)
library(broom)
library(scales)

library(survey)
library(haven)

# paths (windows = better with forward slashes)
project_dir <- "C:/Users/StephanieBugler/OneDrive - medicalvalues GmbH/Dokumente/R for Uni/R Project"
data_file   <- "C:/Users/StephanieBugler/OneDrive - medicalvalues GmbH/Dokumente/R for Uni/data/ESS1-9e01_1.sav"

# I got a file error once bc i copied the path wrong, so doing checks.
if (!dir.exists(project_dir)) stop("Project folder not found. Check project_dir path.")
if (!file.exists(data_file)) stop("Data file not found. Check data_file path.")

setwd(project_dir)
getwd()
[1] "C:/Users/StephanieBugler/OneDrive - medicalvalues GmbH/Dokumente/R for Uni/R Project"

2.1 Import the data file

# reading SPSS .sav
# if you get: 'Error: cannot allocate vector...' then the file is too big for memory.
# in that case i'd restart R to free memory, or read fewer vars with read_sav(col_select = ...).

ess_raw <- read_sav(data_file) |>
  clean_names()

glimpse(ess_raw)
Rows: 2,358
Columns: 35
$ cntry    <chr+lbl> "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE…
$ cname    <chr> "ESS1-9e01", "ESS1-9e01", "ESS1-9e01", "ESS1-9e01", "ESS1-9e0…
$ cedition <chr> "1.0", "1.0", "1.0", "1.0", "1.0", "1.0", "1.0", "1.0", "1.0"…
$ cproddat <chr> "10.12.2020", "10.12.2020", "10.12.2020", "10.12.2020", "10.1…
$ cseqno   <dbl> 101403, 101404, 101405, 101406, 101407, 101408, 101409, 10141…
$ name     <chr> "ESS9e03", "ESS9e03", "ESS9e03", "ESS9e03", "ESS9e03", "ESS9e…
$ essround <dbl+lbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, …
$ edition  <chr> "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "…
$ idno     <dbl> 9, 10, 64, 65, 91, 119, 150, 212, 255, 270, 279, 304, 311, 31…
$ dweight  <dbl> 0.9994662, 0.9994662, 0.9994662, 0.9994662, 0.9994662, 0.9994…
$ pspwght  <dbl> 1.2750090, 0.8540229, 0.7596949, 1.0794106, 1.2697877, 1.2750…
$ pweight  <dbl> 3.037345, 3.037345, 3.037345, 3.037345, 3.037345, 3.037345, 3…
$ anweight <dbl> 3.872642, 2.593962, 2.307456, 3.278542, 3.856783, 3.872642, 2…
$ trstprl  <dbl+lbl>  2,  7,  3,  3,  4,  9, 10,  5,  5,  7, NA,  6,  6,  8,  …
$ trstlgl  <dbl+lbl>  4,  8,  5,  4,  5,  7, 10,  7,  8,  8,  5,  9,  7,  9, 1…
$ trstplc  <dbl+lbl>  5,  8,  6,  4,  7,  7, 10,  7,  9,  8,  7,  9,  7,  8, 1…
$ trstplt  <dbl+lbl>  0,  6,  3,  3,  5,  8, 10,  5,  5,  6,  3,  5,  6,  7,  …
$ trstprt  <dbl+lbl> 2, 6, 5, 3, 5, 8, 4, 5, 5, 6, 1, 5, 3, 7, 9, 4, 3, 8, 5, …
$ trstep   <dbl+lbl>  3,  4,  5,  2,  4, 10, 10,  6,  3,  7,  3,  5,  5,  7,  …
$ trstun   <dbl+lbl>  0,  5,  6,  2,  5, 10, 10,  6,  7,  7, NA,  7,  5,  7,  …
$ lrscale  <dbl+lbl> 5, 5, 3, 2, 1, 5, 5, 2, 3, 5, 1, 4, 5, 6, 4, 3, 0, 5, 5, …
$ stflife  <dbl+lbl> 10,  8,  8,  6,  9,  9, 10,  8,  4,  9, 10,  9,  9,  8, 1…
$ stfeco   <dbl+lbl> 10,  8,  5,  6,  9,  8, 10,  8,  5,  9,  7,  9,  9, 10,  …
$ stfgov   <dbl+lbl>  7,  5,  6,  2,  5, NA,  5,  7,  4,  6, NA,  6,  8,  7,  …
$ stfdem   <dbl+lbl>  7,  9,  6,  4,  5, 10, 10,  7,  6,  9, NA,  5,  9,  9, 1…
$ imsmetn  <dbl+lbl> 1, 1, 1, 2, 1, 2, 2, 1, 1, 1, 3, 2, 1, 2, 2, 1, 1, 2, 1, …
$ imdfetn  <dbl+lbl>  1,  1,  1,  2, NA,  2,  2,  1,  2,  2,  3,  2,  2,  2,  …
$ impcntr  <dbl+lbl>  1,  1,  1,  2, NA,  1,  3,  1, NA,  3,  3,  3,  2,  2,  …
$ imbgeco  <dbl+lbl> NA, 10,  9,  4,  8,  8, 10,  8,  7,  7,  0,  9, 10,  5,  …
$ imueclt  <dbl+lbl>  8,  5,  8,  4,  8,  8, 10,  8,  7,  4,  0,  9, 10,  5,  …
$ imwbcnt  <dbl+lbl> 5, 5, 8, 3, 8, 7, 6, 8, 7, 5, 0, 5, 5, 5, 5, 7, 4, 0, 7, …
$ happy    <dbl+lbl> 10,  8, 10,  6,  9,  9,  9,  8,  5,  9, 10,  9,  8,  8, 1…
$ rlgdgr   <dbl+lbl>  9,  7,  9,  1,  5,  9,  5,  0,  3,  9,  5,  0, 10,  3, 1…
$ gndr     <dbl+lbl> 2, 2, 2, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, …
$ agea     <dbl+lbl> 26, 65, 74, 64, 54, 20, 71, 41, 62, 65, 67, 47, 67, 48, 6…
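If memory ever becomes the problem, one option is to read only the columns I need instead of loading everything and selecting afterwards. A minimal sketch, assuming haven's `col_select` argument and using a tiny made-up file (`toy` and the temp file are only for illustration, not the ESS data):

```r
library(haven)

# made-up stand-in for the real .sav file
toy <- data.frame(cntry = c("DE", "DE"), happy = c(8, 9), extra = c(1, 2))
tmp <- tempfile(fileext = ".sav")
write_sav(toy, tmp)

# col_select keeps only the named columns while reading
small <- read_sav(tmp, col_select = c(cntry, happy))
names(small)
```

The column names passed to `col_select` have to match the names in the file itself, so this step would come before `clean_names()`.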

3 12 Oct 2025

3.1 First look at variables + quick summaries

# sometimes skim() takes ages on huge files, so i skim a smaller set first.
vars_i_care <- c(
  "cntry","essround","gndr","agea","happy","stflife",
  "trstprl","trstlgl","trstplc","trstplt","trstprt","trstep","trstun",
  "pspwght","pweight","dweight"
)

ess_raw |>
  select(any_of(vars_i_care)) |>
  skim()
Data summary
Name select(ess_raw, any_of(va…
Number of rows 2358
Number of columns 16
_______________________
Column type frequency:
character 1
numeric 15
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
cntry 0 1 2 2 0 1 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
essround 0 1.00 9.00 0.00 9.00 9.00 9.00 9.00 9.00 ▁▁▇▁▁
gndr 0 1.00 1.49 0.50 1.00 1.00 1.00 2.00 2.00 ▇▁▁▁▇
agea 4 1.00 49.65 19.06 15.00 34.00 51.00 64.00 90.00 ▆▆▇▇▃
happy 3 1.00 7.82 1.70 0.00 7.00 8.00 9.00 10.00 ▁▁▂▇▆
stflife 6 1.00 7.66 1.96 0.00 7.00 8.00 9.00 10.00 ▁▁▂▇▆
trstprl 31 0.99 5.10 2.48 0.00 3.00 5.00 7.00 10.00 ▅▆▇▇▂
trstlgl 21 0.99 6.13 2.47 0.00 5.00 7.00 8.00 10.00 ▂▃▆▇▃
trstplc 5 1.00 7.10 2.14 0.00 6.00 8.00 9.00 10.00 ▁▂▃▇▅
trstplt 19 0.99 3.96 2.27 0.00 2.00 4.00 6.00 10.00 ▇▇▇▃▁
trstprt 22 0.99 3.99 2.15 0.00 2.00 4.00 5.00 10.00 ▆▇▇▃▁
trstep 81 0.97 4.56 2.40 0.00 3.00 5.00 6.00 10.00 ▅▆▇▅▁
trstun 104 0.96 4.90 2.36 0.00 3.00 5.00 7.00 10.00 ▃▆▇▆▁
pspwght 0 1.00 1.00 0.44 0.43 0.74 0.94 1.10 3.79 ▇▂▁▁▁
pweight 0 1.00 3.04 0.00 3.04 3.04 3.04 3.04 3.04 ▁▁▇▁▁
dweight 0 1.00 1.00 0.02 1.00 1.00 1.00 1.00 1.63 ▇▁▁▁▁
# check what's actually in the file 
missing_vars <- setdiff(vars_i_care, names(ess_raw))
missing_vars
character(0)

4 26 Oct 2025

4.1 Cleaning missing values + recoding

# ESS uses special missing codes like 7/8/9 and also 77/88/99.
# first time i forgot this and got silly means.
# i also made the mistake of applying na_if to character cols, so i guard it.

ess_to_na <- function(x) {
  if (!is.numeric(x)) return(x)
  dplyr::na_if(x, 77) |>
    dplyr::na_if(88) |>
    dplyr::na_if(99) |>
    dplyr::na_if(7)  |>
    dplyr::na_if(8)  |>
    dplyr::na_if(9)
}

ess <- ess_raw |>
  mutate(across(everything(), ess_to_na)) |>
  mutate(
    gndr_f = case_when(
      gndr == 1 ~ "Male",
      gndr == 2 ~ "Female",
      TRUE ~ NA_character_
    ) |> factor(),
    cntry = as.factor(cntry),
    essround = as.integer(essround)
  )

ess |>
  select(any_of(c("cntry","essround","gndr_f","agea","happy","stflife",
                 "trstprl","trstlgl","trstplc","trstplt","trstprt","trstep","trstun"))) |>
  skim()
Data summary
Name select(…)
Number of rows 2358
Number of columns 13
_______________________
Column type frequency:
factor 2
numeric 11
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
cntry 0 1 FALSE 1 DE: 2358
gndr_f 0 1 FALSE 2 Mal: 1212, Fem: 1146

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
essround 2358 0.00 NaN NA NA NA NA NA NA
agea 30 0.99 49.31 18.88 15 33 51 64 90 ▆▆▇▇▂
happy 1670 0.29 7.12 2.80 0 5 6 10 10 ▁▂▇▁▇
stflife 1549 0.34 6.81 3.01 0 5 6 10 10 ▂▃▆▁▇
trstprl 720 0.69 4.02 2.14 0 3 4 5 10 ▅▆▇▁▁
trstlgl 1084 0.54 4.65 2.44 0 3 5 6 10 ▃▅▇▁▂
trstplc 1411 0.40 5.81 2.79 0 4 5 6 10 ▂▃▇▁▅
trstplt 335 0.86 3.43 1.96 0 2 4 5 10 ▇▇▇▁▁
trstprt 304 0.87 3.52 1.84 0 2 4 5 10 ▆▇▇▁▁
trstep 571 0.76 3.74 2.02 0 3 4 5 10 ▅▆▇▁▁
trstun 691 0.71 3.96 1.99 0 3 4 5 10 ▃▆▇▁▁
# missingness check
ess |>
  summarise(
    n = n(),
    miss_happy = mean(is.na(happy)),
    miss_stflife = mean(is.na(stflife)),
    miss_agea = mean(is.na(agea)),
    miss_trstprl = if ("trstprl" %in% names(ess)) mean(is.na(trstprl)) else NA_real_
  )
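Looking at the summary above, essround is now missing for all 2358 rows and happy has lost most of its answers. That is what I'd expect if the blanket 7/8/9 recode also removes real values (round 9, or a score of 7, 8 or 9 on a 0-10 scale). In the first skim the 0-10 items already top out at 10, so read_sav seems to have turned the special codes into NA on import. A sketch of a narrower recode on made-up data, assuming 77/88/99 are the only missing codes on the 0-10 items and 9 is the missing code for gndr (these code lists are my assumption and should be checked against the ESS codebook):

```r
library(dplyr)

# made-up rows, not real ESS data
toy <- tibble(
  essround = c(9, 9, 9),
  gndr     = c(1, 2, 9),
  happy    = c(7, 88, 9)
)

# items where 7, 8 and 9 are valid answers
scale_0_10 <- c("happy")

toy_clean <- toy |>
  mutate(
    # only 77/88/99 count as missing on the 0-10 items
    across(all_of(scale_0_10), \(x) replace(x, x %in% c(77, 88, 99), NA)),
    # on gndr, 9 is assumed to mean "no answer"
    gndr = replace(gndr, gndr %in% 9, NA)
  )

toy_clean
```

Here essround is left alone, the happy scores of 7 and 9 survive, and only the 88 and the gndr code 9 become NA.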

5 08 Nov 2025

5.1 Descriptive stats

ess |>
  select(happy, stflife, agea, gndr_f) |>
  tbl_summary(
    statistic = all_continuous() ~ "{mean} ({sd})",
    missing_text = "(missing)"
  )
Characteristic N = 2,358¹
How happy are you
    0 8 (1.2%)
    1 4 (0.6%)
    2 17 (2.5%)
    3 37 (5.4%)
    4 46 (6.7%)
    5 128 (19%)
    6 138 (20%)
    10 310 (45%)
    (missing) 1,670
How satisfied with life as a whole
    0 17 (2.1%)
    1 13 (1.6%)
    2 30 (3.7%)
    3 54 (6.7%)
    4 66 (8.2%)
    5 143 (18%)
    6 141 (17%)
    10 345 (43%)
    (missing) 1,549
Age of respondent, calculated 49 (19)
    (missing) 30
gndr_f
    Female 1,146 (49%)
    Male 1,212 (51%)
¹ n (%); Mean (SD)
# center = 0 gives every whole score its own bar
# (with boundary = 0 the first bin is [0, 1], so scores 0 and 1 get merged)
ggplot(ess, aes(x = happy)) +
  geom_histogram(binwidth = 1, center = 0) +
  scale_x_continuous(breaks = 0:10) +
  labs(title = "Happiness (0-10)", x = "happy", y = "count")

ggplot(ess, aes(x = stflife)) +
  geom_histogram(binwidth = 1, center = 0) +
  scale_x_continuous(breaks = 0:10) +
  labs(title = "Life satisfaction (0-10)", x = "stflife", y = "count")

# group summary by gender
# i got an error once: "object 'gndr_f' not found" bc i was still using ess_raw.
ess |>
  filter(!is.na(gndr_f), !is.na(happy)) |>
  group_by(gndr_f) |>
  summarise(
    n = n(),
    mean_happy = mean(happy),
    sd_happy = sd(happy),
    .groups = "drop"
  )

6 21 Nov 2025

6.1 Create mean scores

trust_vars <- c("trstprl","trstlgl","trstplc","trstplt","trstprt","trstep","trstun")
trust_vars_use <- intersect(trust_vars, names(ess))

# if this ends up empty, then i didn't download the trust vars (or names differ)
trust_vars_use
[1] "trstprl" "trstlgl" "trstplc" "trstplt" "trstprt" "trstep"  "trstun" 
ess <- ess |>
  mutate(
    trust_mean = if (length(trust_vars_use) > 0) rowMeans(across(all_of(trust_vars_use)), na.rm = TRUE) else NA_real_,
    trust_n    = if (length(trust_vars_use) > 0) rowSums(!is.na(across(all_of(trust_vars_use)))) else NA_real_
  )

ess |>
  count(trust_n) |>
  arrange(trust_n)
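# boundary = 0 lines the bins up with the 0-10 limits, so the edge bars aren't dropped
ggplot(ess, aes(x = trust_mean)) +
  geom_histogram(binwidth = 0.5, boundary = 0) +
  scale_x_continuous(limits = c(0, 10)) +
  labs(title = "Mean institutional trust (0-10)", x = "trust_mean", y = "count")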

# keep trust_mean only when at least 3 trust items answered
# i picked 3 bc 1 item feels too random.
ess <- ess |>
  mutate(trust_mean = if_else(trust_n >= 3, trust_mean, NA_real_))
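The "at least 3 items" rule can also be wrapped in a small helper so the mean and the cut-off live in one place. A sketch in base R on made-up numbers; `row_mean_min` is a hypothetical helper name, not something from a package:

```r
# hypothetical helper: row mean that needs at least `min_items` answers
row_mean_min <- function(df, min_items = 3) {
  n_ok <- rowSums(!is.na(df))        # how many items were answered per row
  m    <- rowMeans(df, na.rm = TRUE) # mean over the answered items
  unname(ifelse(n_ok >= min_items, m, NA_real_))
}

# made-up item scores
toy <- data.frame(
  a = c(4, NA, 6),
  b = c(6, NA, NA),
  c = c(8, 2, NA)
)

row_mean_min(toy)
```

Only the first row has three answers, so it is the only one that gets a score; the other two rows come back as NA.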

7 05 Dec 2025

7.1 Correlations & scatterplots

ess |>
  select(happy, stflife, trust_mean, agea) |>
  cor(use = "pairwise.complete.obs") |>
  round(2)
           happy stflife trust_mean  agea
happy       1.00    0.83       0.27  0.05
stflife     0.83    1.00       0.34  0.11
trust_mean  0.27    0.34       1.00 -0.11
agea        0.05    0.11      -0.11  1.00
ggplot(ess, aes(x = trust_mean, y = stflife)) +
  geom_point(alpha = 0.15) +
  geom_smooth(method = "lm", se = TRUE) +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 10)) +
  labs(
    title = "Trust and life satisfaction",
    x = "Mean trust (0-10)",
    y = "Life satisfaction (0-10)"
  )

8 19 Dec 2025

8.1 Basic regression

m1 <- lm(stflife ~ trust_mean, data = ess)
summary(m1)

Call:
lm(formula = stflife ~ trust_mean, data = ess)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.0172  -2.0500  -0.4442   2.7424   5.5020 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.49805    0.24878  18.080   <2e-16 ***
trust_mean   0.55192    0.05768   9.569   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.824 on 706 degrees of freedom
  (1650 observations deleted due to missingness)
Multiple R-squared:  0.1148,    Adjusted R-squared:  0.1135 
F-statistic: 91.56 on 1 and 706 DF,  p-value: < 2.2e-16
tidy(m1, conf.int = TRUE)
# add age + gender as controls
table(ess$gndr_f, useNA = "ifany")

Female   Male 
  1146   1212 
m2 <- lm(stflife ~ trust_mean + agea + gndr_f, data = ess)
summary(m2)

Call:
lm(formula = stflife ~ trust_mean + agea + gndr_f, data = ess)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.7985 -1.9946 -0.3604  2.5787  6.2658 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.098822   0.421315   7.355 5.41e-13 ***
trust_mean  0.573568   0.058257   9.846  < 2e-16 ***
agea        0.021910   0.005701   3.843 0.000133 ***
gndr_fMale  0.347888   0.212796   1.635 0.102537    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.798 on 692 degrees of freedom
  (1662 observations deleted due to missingness)
Multiple R-squared:  0.1349,    Adjusted R-squared:  0.1312 
F-statistic: 35.97 on 3 and 692 DF,  p-value: < 2.2e-16
tidy(m2, conf.int = TRUE)
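One thing I noticed: m1 dropped 1650 rows and m2 dropped 1662, so the two models are fitted on slightly different people, which makes the R-squared values harder to compare. A sketch of fitting both on the same complete cases, using made-up data (the `toy` columns only mimic the real variable names):

```r
# made-up data, only to show the mechanics
set.seed(1)
toy <- data.frame(
  stflife    = sample(0:10, 50, replace = TRUE),
  trust_mean = runif(50, 0, 10),
  agea       = sample(c(18:90, NA), 50, replace = TRUE)
)

# keep rows that are complete on every variable used in the bigger model
model_vars <- c("stflife", "trust_mean", "agea")
toy_cc <- toy[complete.cases(toy[, model_vars]), ]

m_small <- lm(stflife ~ trust_mean, data = toy_cc)
m_big   <- lm(stflife ~ trust_mean + agea, data = toy_cc)

# both models now use the same number of rows
c(nobs(m_small), nobs(m_big))
```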

9 11 Jan 2026

by_country <- ess |>
  group_by(cntry) |>
  summarise(
    n = n(),
    mean_trust = mean(trust_mean, na.rm = TRUE),
    mean_stflife = mean(stflife, na.rm = TRUE),
    mean_happy = mean(happy, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(mean_stflife))

by_country
ggplot(by_country, aes(x = mean_trust, y = mean_stflife)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Country means: trust vs life satisfaction",
    x = "Mean trust (country average)",
    y = "Mean life satisfaction (country average)"
  )

10 28 Jan 2026

# find which weight columns exist
weight_candidates <- c("pspwght","pweight","dweight")
weight_candidates[weight_candidates %in% names(ess)]
[1] "pspwght" "pweight" "dweight"
weight_var <- if ("pspwght" %in% names(ess)) "pspwght" else
  if ("pweight" %in% names(ess)) "pweight" else
  if ("dweight" %in% names(ess)) "dweight" else
  NA_character_

weight_var
[1] "pspwght"
if (!is.na(weight_var)) {
  ess_svy <- svydesign(
    ids = ~1,
    weights = as.formula(paste0("~", weight_var)),
    data = ess
  )

  # inside { } only the last value auto-prints, so print each result explicitly
  print(svymean(~stflife, design = ess_svy, na.rm = TRUE))

  m2_w <- svyglm(stflife ~ trust_mean + agea + gndr_f, design = ess_svy)
  print(summary(m2_w))
  print(tidy(m2_w, conf.int = TRUE))
} else {
  # if this prints, it just means the weight vars aren't in the file i loaded.
  message("No weight column found in the dataset, skipping weighted analysis.")
}
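To remind myself what the weights actually do to a mean, here is a tiny example in base R with made-up numbers. As far as I understand it, the point estimate from svymean is this same weighted mean, and the survey package adds the design-based standard errors on top:

```r
# made-up scores and weights
x <- c(4, 8, 10)
w <- c(2, 1, 1)

mean(x)                  # plain mean, every row counts once
weighted.mean(x, w)      # first row counts double
sum(x * w) / sum(w)      # the same thing written out by hand
```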

11 30 Jan 2026

# saving cleaned file into the project folder
# i save it as a new file so i don't overwrite the original .sav or anything.
write_csv(ess, file.path(project_dir, "ess_cleaned_for_project.csv"))
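A CSV only stores plain values, so things like factor columns come back as text when the file is read again. If I want to pick up exactly where I left off in R, saving an .rds copy alongside the CSV might be handy. A sketch with a made-up data frame and a temp file (not the project path):

```r
# made-up data frame with a factor column
toy <- data.frame(cntry = factor(c("DE", "DE")), happy = c(8, 9))

tmp <- tempfile(fileext = ".rds")
saveRDS(toy, tmp)        # keeps column types as they are
back <- readRDS(tmp)

is.factor(back$cntry)
```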