Magdalena Fink Seminar Paper

library(foreign)
# Variable name                         Label
# -----------------------------------------------------------------------------------------------
# ess11_reg                             ESS11REG – Regional-level data (identifier)
# 
# reg11_area_2023                       Area (km²) – 2023
# reg11_tpopsz_2023                     Population size – Total – 2023
# reg11_fpopsz_2023                     Population size – Female – 2023
# reg11_mpopsz_2023                     Population size – Male – 2023
# reg11_pode_2023                       Population density – 2023
# 
# reg11_lbirth_2023                     Live births (total) – 2023
# reg11_death_2023                      Deaths (total) – 2023
# reg11_natgrow_2023                    Natural change of population – 2023
# reg11_cnmigrat_2023                   Net migration plus statistical adjustment – 2023
# reg11_grow_2023                       Total population change – 2023
# 
# reg11_gbirthrt_2023                   Crude birth rate – 2023
# reg11_gdeathrt_2023                   Crude death rate – 2023
# reg11_natgrowrt_2023                  Crude rate of natural change of population – 2023
# reg11_cnmigratrt_2023                 Crude rate of net migration plus statistical adjustment – 2023
# reg11_growrt_2023                     Crude rate of total population change – 2023
# 
# reg11_gdp_eurhab_2023                 GDP at current market prices – Euro per inhabitant – 2023
# reg11_gdp_mio_eur_2023                GDP at current market prices – Million euro – 2023
# reg11_gdp_eurhab_eu27_2020_2023       GDP per inhabitant in % of EU27 (from 2020) average – 2023
# reg11_gdp_mio_nac_2023                GDP at current market prices – Million units of national currency – 2023
# reg11_gdp_mio_pps_eu27_2020_2023      GDP at current market prices – Million PPS (EU27 from 2020) – 2023
# reg11_gdp_pps_eu27_2020_hab_2023      GDP at current market prices – PPS per inhabitant – 2023
# reg11_gdp_pps_hab_eu27_2020_2023      GDP per inhabitant in % of EU27 (from 2020) average (PPS) – 2023
# 

# read region data (see above)
dfreg <- read.spss("ESS11_ML_Region.sav", 
                   to.data.frame = TRUE)
names(dfreg)

##  [1] "region"                           "reg11_area_2023"                 
##  [3] "reg11_tpopsz_2023"                "reg11_fpopsz_2023"               
##  [5] "reg11_mpopsz_2023"                "reg11_pode_2023"                 
##  [7] "reg11_lbirth_2023"                "reg11_death_2023"                
##  [9] "reg11_natgrow_2023"               "reg11_cnmigrat_2023"             
## [11] "reg11_grow_2023"                  "reg11_gbirthrt_2023"             
## [13] "reg11_gdeathrt_2023"              "reg11_natgrowrt_2023"            
## [15] "reg11_cnmigratrt_2023"            "reg11_growrt_2023"               
## [17] "reg11_gdp_eurhab_2023"            "reg11_gdp_mio_eur_2023"          
## [19] "reg11_gdp_eurhab_eu27_2020_2023"  "reg11_gdp_mio_nac_2023"          
## [21] "reg11_gdp_mio_pps_eu27_2020_2023" "reg11_gdp_pps_eu27_2020_hab_2023"
## [23] "reg11_gdp_pps_hab_eu27_2020_2023"

dfreg$region = trimws(dfreg$region)
# check
unique(dfreg$region)

##   [1] "AT11 " "AT12 " "AT13 " "AT21 " "AT22 " "AT31 " "AT32 " "AT33 " "AT34 "
##  [10] "BE10 " "BE21 " "BE22 " "BE23 " "BE24 " "BE25 " "BE31 " "BE32 " "BE33 "
##  [19] "BE34 " "BE35 " "BG311" "BG312" "BG313" "BG314" "BG315" "BG321" "BG322"
##  [28] "BG323" "BG324" "BG325" "BG331" "BG332" "BG333" "BG334" "BG341" "BG342"
##  [37] "BG343" "BG344" "BG411" "BG412" "BG413" "BG414" "BG415" "BG421" "BG422"
##  [46] "BG423" "BG424" "BG425" "CY0  " "DE1  " "DE2  " "DE3  " "DE4  " "DE6  "
##  [55] "DE7  " "DE8  " "DE9  " "DEA  " "DEB  " "DEC  " "DED  " "DEE  " "DEF  "
##  [64] "DEG  " "EL30 " "EL41 " "EL42 " "EL43 " "EL51 " "EL52 " "EL53 " "EL54 "
##  [73] "EL61 " "EL62 " "EL63 " "EL64 " "EL65 " "ES11 " "ES12 " "ES13 " "ES21 "
##  [82] "ES22 " "ES23 " "ES24 " "ES30 " "ES41 " "ES42 " "ES43 " "ES51 " "ES52 "
##  [91] "ES53 " "ES61 " "ES62 " "ES64 " "ES70 " "FI196" "FI1B1" "FI1C1" "FI1C2"
## [100] "FI1C5" "FI1D5" "FI1D7" "FI200" "FR10 " "FRB0 " "FRC1 " "FRC2 " "FRD1 "
## [109] "FRD2 " "FRE1 " "FRE2 " "FRF1 " "FRF2 " "FRF3 " "FRG0 " "FRH0 " "FRI1 "
## [118] "FRI2 " "FRI3 " "FRJ1 " "FRJ2 " "FRK1 " "FRK2 " "FRL0 " "HR021" "HR022"
## [127] "HR023" "HR024" "HR025" "HR026" "HR027" "HR028" "HR031" "HR032" "HR033"
## [136] "HR034" "HR035" "HR036" "HR037" "HU110" "HU120" "HU211" "HU212" "HU213"
## [145] "HU221" "HU222" "HU223" "HU231" "HU232" "HU233" "HU311" "HU312" "HU313"
## [154] "HU321" "HU322" "HU323" "HU331" "HU332" "HU333" "IE041" "IE042" "IE051"
## [163] "IE052" "IE053" "IE061" "IE062" "IE063" "ITC  " "ITF  " "ITG  " "ITH  "
## [172] "ITI  " "LT011" "LT021" "LT022" "LT023" "LT024" "LT025" "LT026" "LT027"
## [181] "LT028" "LT029" "ME0  " "NL11 " "NL12 " "NL13 " "NL21 " "NL22 " "NL23 "

# the region codes still carry trailing whitespace that trimws() did not remove,
# so we strip regular and non-breaking whitespace (U+00A0) explicitly:
dfreg$region = gsub("^[[:space:]\u00A0]+|[[:space:]\u00A0]+$", "", dfreg$region)
# check
unique(dfreg$region) # yep, much better now

##   [1] "AT11"  "AT12"  "AT13"  "AT21"  "AT22"  "AT31"  "AT32"  "AT33"  "AT34" 
##  [10] "BE10"  "BE21"  "BE22"  "BE23"  "BE24"  "BE25"  "BE31"  "BE32"  "BE33" 
##  [19] "BE34"  "BE35"  "BG311" "BG312" "BG313" "BG314" "BG315" "BG321" "BG322"
##  [28] "BG323" "BG324" "BG325" "BG331" "BG332" "BG333" "BG334" "BG341" "BG342"
##  [37] "BG343" "BG344" "BG411" "BG412" "BG413" "BG414" "BG415" "BG421" "BG422"
##  [46] "BG423" "BG424" "BG425" "CY0"   "DE1"   "DE2"   "DE3"   "DE4"   "DE6"  
##  [55] "DE7"   "DE8"   "DE9"   "DEA"   "DEB"   "DEC"   "DED"   "DEE"   "DEF"  
##  [64] "DEG"   "EL30"  "EL41"  "EL42"  "EL43"  "EL51"  "EL52"  "EL53"  "EL54" 
##  [73] "EL61"  "EL62"  "EL63"  "EL64"  "EL65"  "ES11"  "ES12"  "ES13"  "ES21" 
##  [82] "ES22"  "ES23"  "ES24"  "ES30"  "ES41"  "ES42"  "ES43"  "ES51"  "ES52" 
##  [91] "ES53"  "ES61"  "ES62"  "ES64"  "ES70"  "FI196" "FI1B1" "FI1C1" "FI1C2"
## [100] "FI1C5" "FI1D5" "FI1D7" "FI200" "FR10"  "FRB0"  "FRC1"  "FRC2"  "FRD1" 
## [109] "FRD2"  "FRE1"  "FRE2"  "FRF1"  "FRF2"  "FRF3"  "FRG0"  "FRH0"  "FRI1" 
## [118] "FRI2"  "FRI3"  "FRJ1"  "FRJ2"  "FRK1"  "FRK2"  "FRL0"  "HR021" "HR022"
## [127] "HR023" "HR024" "HR025" "HR026" "HR027" "HR028" "HR031" "HR032" "HR033"
## [136] "HR034" "HR035" "HR036" "HR037" "HU110" "HU120" "HU211" "HU212" "HU213"
## [145] "HU221" "HU222" "HU223" "HU231" "HU232" "HU233" "HU311" "HU312" "HU313"
## [154] "HU321" "HU322" "HU323" "HU331" "HU332" "HU333" "IE041" "IE042" "IE051"
## [163] "IE052" "IE053" "IE061" "IE062" "IE063" "ITC"   "ITF"   "ITG"   "ITH"  
## [172] "ITI"   "LT011" "LT021" "LT022" "LT023" "LT024" "LT025" "LT026" "LT027"
## [181] "LT028" "LT029" "ME0"   "NL11"  "NL12"  "NL13"  "NL21"  "NL22"  "NL23"
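The two-step trimming above can be reproduced on toy data. This is a minimal sketch, assuming the leftover padding is a non-breaking space (U+00A0); the actual padding character in the .sav file is not confirmed by the output, and the vector `x` is made up for illustration.

```r
# Toy region codes padded with a non-breaking space (assumed padding character)
x <- c("AT11\u00A0", "CY0\u00A0\u00A0")

# trimws() with its default pattern only strips space, tab, CR and LF,
# so the non-breaking spaces survive
after_trimws <- trimws(x)
nchar(after_trimws)

# same pattern as in the chunk above: strip regular and non-breaking
# whitespace at both ends of each code
clean <- gsub("^[[:space:]\u00A0]+|[[:space:]\u00A0]+$", "", after_trimws)
clean
```

Comparing `nchar()` before and after is a quick way to confirm that invisible characters were actually removed.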

nrow(dfreg)

## [1] 189

ncol(dfreg)

## [1] 23

table(complete.cases(dfreg))

## 
## TRUE 
##  189

# number of regions
length(unique(dfreg$region))

## [1] 189

# read original data
df10 = read.spss("ESS10e03_3.sav", to.data.frame = TRUE, use.value.labels = FALSE)
df11 = read.spss("ESS11_unlabeled.0-10.sav", to.data.frame = TRUE, use.value.labels = FALSE)
df10$region = trimws(df10$region)
df11$region = trimws(df11$region)

nrow(df10)

## [1] 37611

ncol(df10)

## [1] 621

nrow(df11)

## [1] 40156

ncol(df11)

## [1] 640

table(complete.cases(df10))

## 
## FALSE 
## 37611

table(complete.cases(df11))

## 
## FALSE 
## 40156

# number of regions in df10
length(unique(df10$region))

## [1] 248

# number of regions in df11
length(unique(df11$region))

## [1] 264

# regions in df10 but not in df11
unique(df10$region)[!(unique(df10$region) %in% unique(df11$region))]

##  [1] "BG331" "BG334" "BG411" "BG423" "BG415" "BG344" "BG421" "BG342" "BG425"
## [10] "BG323" "BG321" "BG312" "BG322" "BG413" "BG332" "BG343" "BG314" "BG422"
## [19] "BG333" "BG341" "BG325" "BG324" "BG315" "BG414" "BG311" "BG412" "BG424"
## [28] "CZ051" "CZ071" "CZ042" "CZ010" "CZ031" "CZ032" "CZ080" "CZ063" "CZ053"
## [37] "CZ052" "CZ020" "CZ072" "CZ041" "CZ064" "EE004" "EE001" "EE008" "EE00A"
## [46] "EE009" "FI1D9" "FI1D8" "IS002" "IS001" NA      "ME0"   "MK002" "MK008"
## [55] "MK007" "MK005" "MK006" "MK003" "MK001" "MK004"

# regions in df11 but not in df10
unique(df11$region)[!(unique(df11$region) %in% unique(df10$region))]

##  [1] "AT31"  "AT22"  "AT33"  "AT32"  "AT12"  "AT11"  "AT13"  "AT34"  "AT21" 
## [10] "CY0"   "DEF"   "DE2"   "DEB"   "DE8"   "DEA"   "DE7"   "DE9"   "DE1"  
## [19] "DED"   "DE4"   "DEG"   "DE3"   "DE6"   "DEE"   "DEC"   "ES21"  "ES61" 
## [28] "ES12"  "ES52"  "ES30"  "ES11"  "ES51"  "ES24"  "ES42"  "ES41"  "ES64" 
## [37] "ES13"  "ES43"  "ES23"  "ES70"  "ES53"  "ES62"  "ES22"  "FI1D6" "FI1D4"
## [46] "IS01"  "IS02"  "PL91"  "PL71"  "PL81"  "PL61"  "PL21"  "PL41"  "PL51" 
## [55] "PL42"  "PL63"  "PL92"  "PL82"  "PL52"  "PL72"  "PL22"  "PL43"  "PL84" 
## [64] "PL62"  "RS21"  "RS22"  "RS11"  "RS12"  "SE22"  "SE23"  "SE12"  "SE31" 
## [73] "SE33"  "SE21"  "SE11"  "SE32"

# number of regions in df10 but not in df11
length(unique(df10$region)[!(unique(df10$region) %in% unique(df11$region))])

## [1] 60

#length(setdiff(df10$region, df11$region))
# number of regions in df11 but not in df10
length(unique(df11$region)[!(unique(df11$region) %in% unique(df10$region))])

## [1] 76

#length(setdiff(df11$region, df10$region))
# number of regions in df10 or df11 (union; NA counts as a region code here)
length(union(df10$region, df11$region))

## [1] 324

# number of regions in both df10 and df11 (intersection)
length(intersect(df10$region, df11$region))

## [1] 188
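The counts above are linked by the identity |A ∪ B| = |A| + |B| − |A ∩ B| (here 248 + 264 − 188 = 324). A small sketch with made-up region vectors `r10` and `r11` shows the same bookkeeping, including the fact that the set functions treat NA as an ordinary element:

```r
# Toy region codes for two rounds (hypothetical values, with a duplicate and an NA)
r10 <- c("AT11", "BG311", NA, "HU110", "HU110")
r11 <- c("AT11", "HU110", "PL21")

only_in_10 <- setdiff(r10, r11)    # codes that occur only in round 10 (NA is kept)
only_in_11 <- setdiff(r11, r10)    # codes that occur only in round 11
in_either  <- union(r10, r11)      # codes that occur in at least one round
in_both    <- intersect(r10, r11)  # codes that occur in both rounds

# the union size equals the two unique counts minus the overlap
length(in_either) ==
  length(unique(r10)) + length(unique(r11)) - length(in_both)
```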

# list of all variable names from both rounds
allvars = union(names(df10), names(df11))

# variable names from one round which do not exist in the other round
vars_not_in_df10 = setdiff(allvars, names(df10))
vars_not_in_df11 = setdiff(allvars, names(df11))

# add missing columns to both rounds and fill with NA
if (length(vars_not_in_df10) > 0) df10[vars_not_in_df10] = NA
if (length(vars_not_in_df11) > 0) df11[vars_not_in_df11] = NA

# align order of columns in both rounds
df10 = df10[allvars]   
df11 = df11[allvars]
nrow(df10)

## [1] 37611

ncol(df10)

## [1] 912

nrow(df11)

## [1] 40156

ncol(df11)

## [1] 912

# number of regions in df10
length(unique(df10$region))

## [1] 248

# number of regions in df11
length(unique(df11$region))

## [1] 264

# append both data sets
df = rbind(df10, df11)
nrow(df)

## [1] 77767

ncol(df)

## [1] 912

# number of regions in df
length(unique(df$region))

## [1] 324
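The harmonise-then-append step can be sketched on two tiny data frames. The frames `a` and `b` and their columns are invented for illustration; only the pattern (union of names, NA-filled missing columns, identical column order, `rbind`) follows the chunk above.

```r
# Two toy "rounds" that share only the id column
a <- data.frame(id = 1:2, x = c(10, 20))
b <- data.frame(id = 3:4, y = c(0.5, 1.5))

# all column names that occur in either data frame
allvars <- union(names(a), names(b))

# add the columns each data frame lacks, filled with NA
a[setdiff(allvars, names(a))] <- NA
b[setdiff(allvars, names(b))] <- NA

# rbind() needs matching columns, so both are put in the same order first
stacked <- rbind(a[allvars], b[allvars])
stacked
```

Rows from `a` carry NA in `y` and rows from `b` carry NA in `x`, which mirrors why the combined data set has 912 columns and no complete cases across all of them.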

####  "Right-wing Populist Voter", ESS10
# Country            ESS10_variable   Right-wing_populist_party_included                                   ESS10_Code(s)
# ----------------------------------------------------------------------------------------------------------------------
# Belgium            prtvtebe         Vlaams Belang                                                          6
# Bulgaria           prtvtebg         VMRO, Ataka, Vazrazhdane                                               6, 8, 9
# Switzerland        prtvthch         Swiss People's Party (SVP)                                             1
# Croatia            prtvtbhr         DP / Hrvatski suverenisti / Hrast (nationalist bloc)                   3
# Czechia            prtvtecz         Svoboda a přímá demokracie (SPD)                                       8
# Estonia            prtvthee         EKRE (Eesti Konservatiivne Rahvaerakond)                               6
# Finland            prtvtefi         True Finns                                                             5
# France             prtvtefr         Front National (FN)                                                    11
# Greece             prtvtdgr         Ελληνική Λύση, Χρυσή Αυγή                                              5, 7
# Hungary            prtvtghu         Fidesz, Jobbik                                                         3, 4
# Iceland            prtvtdis         Miðflokkurinn                                                          6
# Ireland            prtvtdie         (none included)                                                        —
# Italy              prtvtdit         Lega, Fratelli d’Italia, CasaPound                                     3, 5, 10
# Lithuania          prtvclt1–3       National Alliance (NS)                                                 7
# Montenegro         prtvtame         Democratic Front (DF)                                                  9
# Netherlands        prtvthnl         Party for Freedom (PVV), Forum for Democracy, JA21                     3, 13, 16
# North Macedonia    prtvtmk          VMRO-DPMNE, Levica                                                     2, 5
# Norway             prtvtbno         Fremskrittspartiet (Progress Party)                                    8
# Portugal           prtvtdpt         CHEGA, PNR                                                             4, 15
# Slovenia           prtvtfsi         SDS, SNS                                                               8, 11
# Slovakia           prtvtesk         ĽS Naše Slovensko                                                      4
# United Kingdom     prtvtdgb         UK Independence Party, Brexit Party                                    7, 8

####  "Right-wing Populist Voter", ESS11
# Country           ESS11_variable           Right-wing_populist_party_included                                   ESS11_Code(s)
# ----------------------------------------------------------------------------------------------------------------------------
# Austria           prtvtdat                 FPÖ                                                                  3
# Croatia           prtvtchr                 DP Miroslava Škore / Hrvatski suverenisti / Hrast (nationalist bloc) 3
# Finland           prtvtffi                 True Finns                                                           8
# Germany           prtvgde1, prtvgde2       Alternative for Germany (AfD)                                        6
# Hungary           prtvthhu                 Mi Hazánk                                                            5
# Ireland           prtvteie                 (none included)                                                      —
# Lithuania         prtvclt1–3               National Alliance (NS)                                               7
# Netherlands       prtvtinl                 Party for Freedom (PVV)                                              3
# Norway            prtvtcno                 Fremskrittspartiet (Progress Party)                                  8
# Slovakia          prtvtesk                 ĽS Naše Slovensko                                                    4
# Slovenia          prtvtgsi                 SDS – Slovenska demokratska stranka                                  8
# Switzerland       prtvthch                 Swiss People’s Party (SVP)                                           1
# United Kingdom    prtvtdgb                 UKIP, Brexit Party                                                   7,8

# there is voting information for 
countries = c("AT", "BE", "BG", "CH", "CZ", "DE", "EE", "FI", "FR",
              "GB", "GR", "HR", "HU", "IS", "IT", "LT", "ME", "MK",
              "NL", "NO", "PT", "SI", "SK")
# there is no voting information for 
setdiff(unique(df$cntry), countries)

## [1] "IE" "CY" "ES" "PL" "RS" "SE"

# reduce df to countries providing voting info
df = df[!(df$cntry %in% setdiff(unique(df$cntry), countries)),]
# check
length(unique(df$cntry)) == length(countries)

## [1] TRUE

# define right-wing populist voters
# (trailing comments 10 / 11 mark the ESS round a vote variable belongs to)
df$rwpop = 
  (df$cntry == "AT" & df$prtvtdat == 3) |
  (df$cntry == "BE" & df$prtvtebe == 6) |
  (df$cntry == "BG" & df$prtvtebg %in% c(6, 8, 9)) |
  (df$cntry == "CH" & df$prtvthch == 1) |
  (df$cntry == "CZ" & df$prtvtecz == 8) |
  (df$cntry == "DE" & (df$prtvgde1 %in% c(6) | df$prtvgde2 %in% c(6))) |
  (df$cntry == "EE" & df$prtvthee == 6) |
  (df$cntry == "FI" & df$prtvtffi %in% c(8)) | #11
  (df$cntry == "FI" & df$prtvtefi == 5) | #10
  (df$cntry == "FR" & df$prtvtefr == 11) |
  (df$cntry == "GB" & df$prtvtdgb %in% c(7,8)) |
  (df$cntry == "GR" & df$prtvtdgr %in% c(5, 7)) |
  (df$cntry == "HR" & df$prtvtchr == 3) |
  (df$cntry == "HU" & df$prtvthhu == 5) | # 11
  (df$cntry == "HU" & df$prtvtghu %in% c(3, 4)) | #10
  (df$cntry == "IS" & df$prtvtdis == 6) |
  (df$cntry == "IT" & df$prtvtdit %in% c(3, 5, 10)) |
  (df$cntry == "LT" & (
    df$prtvclt1 == 7 |
      df$prtvclt2 == 7 |
      df$prtvclt3 == 7 )) |
  (df$cntry == "ME" & df$prtvtame == 9) |
  (df$cntry == "MK" & df$prtvtmk %in% c(2, 5)) |
  (df$cntry == "NL" & df$prtvtinl == 3) | #11
  (df$cntry == "NL" & df$prtvthnl %in% c(3, 13, 16)) | #10
  (df$cntry == "NO" & df$prtvtbno == 8) |
  (df$cntry == "PT" & df$prtvtdpt %in% c(4, 15)) |
  (df$cntry == "SK" & df$prtvtesk == 4) |
  (df$cntry == "SI" & df$prtvtgsi == 8) | # 11
  (df$cntry == "SI" & df$prtvtfsi %in% c(8, 11))  # 10

# not run: would recode missing vote information as "no right-wing populist vote"
# df$rwpop[is.na(df$rwpop)] = FALSE

table(df$rwpop, useNA="always")

## 
## FALSE  TRUE  <NA> 
## 38496  3922 24798

tapply(df$rwpop, df$cntry, mean, na.rm=T)

##          AT          BE          BG          CH          CZ          DE 
## 0.154471545 0.087710084 0.028329654 0.226429102 0.064139942 0.042561983 
##          EE          FI          FR          GB          GR          HR 
## 0.130663857 0.239906832 0.122425629 0.004941758 0.015118790 0.054862843 
##          HU          IS          IT          LT          ME          MK 
## 0.394225983 0.045171340 0.048864668 1.000000000 0.245635910 0.158152554 
##          NL          NO          PT          SI          SK 
## 0.148014440 0.083333333 0.006228589 0.377445339 0.053929122
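The definition of `rwpop` mixes `==` and `%in%`, which treat missing values differently. The sketch below uses made-up vote codes to show the difference; it is one possible explanation for a country share of exactly 1 (as for LT above), not a verified diagnosis of the data.

```r
# Toy vote codes: 7 is the party of interest, NA means no vote recorded
vote <- c(7, 3, NA)

eq_result <- vote == 7     # comparison with NA stays NA
in_result <- vote %in% 7   # %in% never returns NA, a missing value gives FALSE

# With several vote variables combined by |, FALSE | NA is NA, so a respondent
# who voted for another party on one variable but has NA on another becomes NA
v1 <- c(7, 3, 3)
v2 <- c(NA, NA, 7)
flag <- v1 == 7 | v2 == 7

# only TRUE values remain after dropping NA, so the share is 1
share <- mean(flag, na.rm = TRUE)
share
```

Using `%in%` consistently would turn such cases into FALSE instead of NA; whether that is the intended coding depends on how non-voters should be treated.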

#### Define complex Variable "Depression Score"
# CES-D8 items
df$d1 = as.numeric(df$fltdpr)
df$d2 = as.numeric(df$flteeff)
df$d3 = as.numeric(df$fltlnl)
df$d4 = 5 - as.numeric(df$enjlf)
df$d5 = as.numeric(df$fltsd)
df$d6 = 5 - as.numeric(df$wrhpp)
df$d7 = as.numeric(df$slprl)
df$d8 = as.numeric(df$cldgng)

# CES-D8 score
df$cesd8 = rowMeans(df[, paste0("d", 1:8)])
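The scoring logic can be checked on three invented respondents. The sketch assumes the 1 to 4 response scale implied by the `5 - x` reversal above; the item values are made up.

```r
# Three toy respondents: lowest burden, highest burden, one missing item
items <- data.frame(
  fltdpr  = c(1, 4, NA),
  flteeff = c(1, 4, 2),
  fltlnl  = c(1, 4, 2),
  enjlf   = c(4, 1, 3),   # positively worded item
  fltsd   = c(1, 4, 2),
  wrhpp   = c(4, 1, 3),   # positively worded item
  slprl   = c(1, 4, 2),
  cldgng  = c(1, 4, 2)
)

# reverse the two positively worded items so that high values mean more symptoms
items$enjlf <- 5 - items$enjlf
items$wrhpp <- 5 - items$wrhpp

# mean over the eight items; without na.rm = TRUE one missing item gives NA
score <- rowMeans(items)
score
```

Because `rowMeans()` is called without `na.rm = TRUE`, a single unanswered item removes the respondent's score, which is consistent with the large number of NAs in `cesd8` when an item is absent from a round.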

#### Define complex Variable "BMI"
grep("weight", names(df))

## [1]   7   9  10 695

names(df)[grep("weight", names(df))]  # list all variable names containing "weight"

## [1] "dweight"  "pweight"  "anweight" "weighta"

names(df)[694:696] # look up adjacent variable names

## [1] "height"  "weighta" "dshltgp"

class(df$height)

## [1] "numeric"

#df$height = as.numeric(as.character(df$height))
mean(df$height, na.rm=T)

## [1] 171.082

df$height = df$height / 100 # body height in metres

class(df$weighta)

## [1] "numeric"

#df$weighta = as.numeric(as.character(df$weighta))

df$bmi = df$weighta / (df$height^2)

# https://apps.who.int/nutrition/landscape/help.aspx?menu=0&helpid=420
# BMI < 17.0 indicates moderate and severe thinness
# BMI < 18.5 indicates underweight
# BMI 18.5–24.9 indicates normal weight
# BMI ≥ 25.0 indicates overweight
# BMI ≥ 30.0 indicates obesity
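The BMI formula and the WHO cut-offs listed above can be combined in a short sketch. The weights and heights are invented, the category labels are chosen for illustration, and the separate thinness threshold (BMI < 17.0) is left out for brevity.

```r
# Toy body weights (kg) and heights (cm)
weight_kg <- c(50, 70, 80, 100)
height_cm <- c(170, 175, 178, 180)

# BMI = weight in kg divided by squared height in metres
bmi <- weight_kg / (height_cm / 100)^2

# right = FALSE makes the intervals closed on the left: [18.5, 25), [25, 30), ...
bmi_cat <- cut(
  bmi,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("underweight", "normal", "overweight", "obese"),
  right  = FALSE
)
data.frame(bmi = round(bmi, 1), bmi_cat)
```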


##################################################
##### SEMINAR PAPER STUDENT INPUT ################
##################################################
# define list of variable names to be used for aggregated data
#varnames = c("cntry", "region", "gndr", "agea", "health", "rwpop", "cesd8", "bmi",    "eisced",
#"hinctnta")
# 1. expanded list of variable names
varnames = c(
  "cntry",
  "region",
  "gndr",
  "agea",
  "health",
  "rwpop",
  "eisced",    # education
  "hinctnta",  # household income (deciles)
  "cesd8",     # depression score
  "bmi"        # body mass index
)

Hypothesis 1 (Age): Regions with a higher average age exhibit a higher share of right-wing populist voters.

Hypothesis 2 (Education): Regions with lower average levels of education exhibit a higher share of right-wing populist voters.

Hypothesis 3 (Health): Regions with poorer average health exhibit a higher share of right-wing populist voters.

Hypothesis 4 (Depression / Mental Health): Regions with higher average depression scores (CES-D8) exhibit a higher share of right-wing populist voters.

Hypothesis 5 (Gender): Regions with a higher proportion of men exhibit a higher share of right-wing populist voters.

##################################################
##### END SEMINAR PAPER STUDENT INPUT ############
##################################################

# and limit the dataset to complete cases for these variables
summary(df[,varnames]) ## check for NAs

##     cntry              region               gndr            agea      
##  Length:67216       Length:67216       Min.   :1.000   Min.   :15.00  
##  Class :character   Class :character   1st Qu.:1.000   1st Qu.:36.00  
##  Mode  :character   Mode  :character   Median :2.000   Median :52.00  
##                                        Mean   :1.535   Mean   :51.13  
##                                        3rd Qu.:2.000   3rd Qu.:66.00  
##                                        Max.   :2.000   Max.   :90.00  
##                                                        NA's   :446    
##      health        rwpop             eisced          hinctnta     
##  Min.   :1.000   Mode :logical   Min.   : 1.000   Min.   : 1.000  
##  1st Qu.:1.000   FALSE:38496     1st Qu.: 3.000   1st Qu.: 3.000  
##  Median :2.000   TRUE :3922      Median : 4.000   Median : 5.000  
##  Mean   :2.152   NA's :24798     Mean   : 4.241   Mean   : 5.488  
##  3rd Qu.:3.000                   3rd Qu.: 6.000   3rd Qu.: 8.000  
##  Max.   :5.000                   Max.   :55.000   Max.   :10.000  
##  NA's   :82                      NA's   :253      NA's   :14217   
##      cesd8            bmi       
##  Min.   :1.000   Min.   :16.00  
##  1st Qu.:1.375   1st Qu.:22.76  
##  Median :1.625   Median :25.39  
##  Mean   :1.703   Mean   :25.78  
##  3rd Qu.:2.000   3rd Qu.:28.23  
##  Max.   :4.000   Max.   :40.00  
##  NA's   :36423   NA's   :37615

nrow(df)

## [1] 67216

df = df[complete.cases(df[,varnames]), varnames]
nrow(df)

## [1] 14128

df = df[df$cntry %in% countries, ]
nrow(df)

## [1] 14128

# recode regions with fewer than 30 respondents into a single
# "other regions" category per country (e.g. "AT_OTH")
# step 1: count respondents per region and keep the small regions
region_n = aggregate(cbind(n = !is.na(region)) ~ cntry+region, df, sum, na.rm = TRUE)
region_n = region_n[region_n$n < 30, ]
# create a new region code comprising all small regions of a single country
region_n$newRegion = paste0(region_n$cntry, "_OTH")
# replace region in original data with new region code
tmp = merge(df, region_n[, c("region", "newRegion")],
            by = c("region"), all.x = TRUE, sort = FALSE)

unique(tmp$region)

##   [1] "AT34"  "BE34"  "BE31"  "CH07"  "DE8"   "DEC"   "FI1D5" "FI197" "FI1C1"
##  [10] "FI194" "FI1D7" "FI193" "FI1D1" "FI195" "FI196" "FI1D6" "FI1C3" "FI1C4"
##  [19] "FI1D2" "FI1C2" "FI1D4" "FI1C5" "FI1D3" "UKN"   "EL62"  "EL42"  "EL41" 
##  [28] "HR023" "HR021" "HR022" "HR036" "HR035" "HR026" "HR033" "HR032" "HR063"
##  [37] "HR024" "HR034" "HR061" "HR027" "HR037" "HU232" "HU211" "HU333" "HU212"
##  [46] "HU312" "HU223" "HU213" "HU221" "HU332" "HU313" "LT023" "LT011" "LT022"
##  [55] "NL23"  "SI038" "SI036" "SI035" "SI033" "AT31"  "AT22"  "AT33"  "AT12" 
##  [64] "AT11"  "AT13"  "AT32"  "DEA"   "DE1"   "AT21"  "DEF"   "DE2"   "DE3"  
##  [73] "DE7"   "DEB"   "DE4"   "DEG"   "BE24"  "BE23"  "BE25"  "BE32"  "BE10" 
##  [82] "BE21"  "BE33"  "BE22"  "DE6"   "BE35"  "HU120" "HU231" "HU311" "HU323"
##  [91] "HU321" "HU110" "HU331" "NL31"  "UKK"   "UKL"   "UKD"   "UKG"   "UKH"  
## [100] "UKC"   "UKF"   "NL42"  "UKJ"   "UKM"   "CH02"  "CH03"  "CH05"  "CH01" 
## [109] "CH06"  "CH04"  "UKI"   "EL64"  "EL63"  "EL52"  "EL61"  "DE9"   "DED"  
## [118] "DEE"   "EL30"  "EL51"  "EL43"  "EL65"  "EL53"  "EL54"  "ITI"   "HR025"
## [127] "HR050" "NL32"  "HR062" "NL33"  "NL34"  "HR065" "NL13"  "HR064" "NL41" 
## [136] "FI1B1" "NL21"  "HU222" "HU322" "NL22"  "NL12"  "UKE"   "PT11"  "PT18" 
## [145] "PT17"  "ITF"   "ITC"   "ITG"   "ITH"   "HR031" "SK023" "HR028" "SK010"
## [154] "SK031" "SK042" "NL11"  "SK032" "SK021" "SK041" "SK022" "PT16"  "PT15" 
## [163] "SI031" "SI041" "SI042" "SI034" "SI043" "SI044" "SI032" "SI037"

length(unique(tmp$region))

## [1] 170

# replace region where a newRegion is available
tmp$region = ifelse(!is.na(tmp$newRegion), tmp$newRegion, tmp$region)
unique(tmp$region)

##   [1] "AT_OTH" "BE_OTH" "CH_OTH" "DE_OTH" "FI_OTH" "GB_OTH" "GR_OTH" "HR_OTH"
##   [9] "HU_OTH" "LT_OTH" "NL_OTH" "SI_OTH" "AT31"   "AT22"   "AT33"   "AT12"  
##  [17] "AT11"   "AT13"   "AT32"   "DEA"    "DE1"    "AT21"   "DEF"    "DE2"   
##  [25] "DE3"    "DE7"    "DEB"    "DE4"    "DEG"    "BE24"   "BE23"   "BE25"  
##  [33] "BE32"   "BE10"   "BE21"   "BE33"   "BE22"   "DE6"    "BE35"   "HU120" 
##  [41] "HU231"  "HU311"  "HU323"  "HU321"  "HU110"  "HU331"  "NL31"   "UKK"   
##  [49] "UKL"    "UKD"    "UKG"    "UKH"    "UKC"    "UKF"    "NL42"   "UKJ"   
##  [57] "UKM"    "CH02"   "CH03"   "CH05"   "CH01"   "CH06"   "CH04"   "UKI"   
##  [65] "EL64"   "EL63"   "EL52"   "EL61"   "DE9"    "DED"    "DEE"    "EL30"  
##  [73] "EL51"   "EL43"   "EL65"   "EL53"   "EL54"   "ITI"    "HR025"  "HR050" 
##  [81] "NL32"   "HR062"  "NL33"   "NL34"   "HR065"  "NL13"   "HR064"  "NL41"  
##  [89] "FI1B1"  "NL21"   "HU222"  "HU322"  "NL22"   "NL12"   "UKE"    "PT11"  
##  [97] "PT18"   "PT17"   "ITF"    "ITC"    "ITG"    "ITH"    "HR031"  "SK023" 
## [105] "HR028"  "SK010"  "SK031"  "SK042"  "NL11"   "SK032"  "SK021"  "SK041" 
## [113] "SK022"  "PT16"   "PT15"   "SI031"  "SI041"  "SI042"  "SI034"  "SI043" 
## [121] "SI044"  "SI032"  "SI037"

length(unique(tmp$region))

## [1] 123
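The collapsing of small regions can also be sketched without `merge()`, which keeps the original row order. This is an alternative formulation on toy data, not the approach used above; the threshold `min_n` is set to 2 only so the tiny example has something to collapse (the document uses 30).

```r
# Toy respondents with country and region codes
toy <- data.frame(
  cntry  = c("AT", "AT", "AT", "AT", "BE", "BE", "BE"),
  region = c("AT11", "AT11", "AT11", "AT12", "BE10", "BE21", "BE21")
)

min_n <- 2  # hypothetical threshold for this example

# respondents per region, then the names of regions below the threshold
n_by_region <- table(toy$region)
small <- names(n_by_region)[as.vector(n_by_region) < min_n]

# small regions get the country-level code "<cntry>_OTH", others stay as they are
toy$region2 <- ifelse(toy$region %in% small,
                      paste0(toy$cntry, "_OTH"),
                      toy$region)
toy
```

Note that the pooled category itself can still end up below the threshold if a country has only a few respondents in small regions.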

# drop helper column (optional)
tmp$newRegion = NULL

df = tmp
unique(df$region)

##   [1] "AT_OTH" "BE_OTH" "CH_OTH" "DE_OTH" "FI_OTH" "GB_OTH" "GR_OTH" "HR_OTH"
##   [9] "HU_OTH" "LT_OTH" "NL_OTH" "SI_OTH" "AT31"   "AT22"   "AT33"   "AT12"  
##  [17] "AT11"   "AT13"   "AT32"   "DEA"    "DE1"    "AT21"   "DEF"    "DE2"   
##  [25] "DE3"    "DE7"    "DEB"    "DE4"    "DEG"    "BE24"   "BE23"   "BE25"  
##  [33] "BE32"   "BE10"   "BE21"   "BE33"   "BE22"   "DE6"    "BE35"   "HU120" 
##  [41] "HU231"  "HU311"  "HU323"  "HU321"  "HU110"  "HU331"  "NL31"   "UKK"   
##  [49] "UKL"    "UKD"    "UKG"    "UKH"    "UKC"    "UKF"    "NL42"   "UKJ"   
##  [57] "UKM"    "CH02"   "CH03"   "CH05"   "CH01"   "CH06"   "CH04"   "UKI"   
##  [65] "EL64"   "EL63"   "EL52"   "EL61"   "DE9"    "DED"    "DEE"    "EL30"  
##  [73] "EL51"   "EL43"   "EL65"   "EL53"   "EL54"   "ITI"    "HR025"  "HR050" 
##  [81] "NL32"   "HR062"  "NL33"   "NL34"   "HR065"  "NL13"   "HR064"  "NL41"  
##  [89] "FI1B1"  "NL21"   "HU222"  "HU322"  "NL22"   "NL12"   "UKE"    "PT11"  
##  [97] "PT18"   "PT17"   "ITF"    "ITC"    "ITG"    "ITH"    "HR031"  "SK023" 
## [105] "HR028"  "SK010"  "SK031"  "SK042"  "NL11"   "SK032"  "SK021"  "SK041" 
## [113] "SK022"  "PT16"   "PT15"   "SI031"  "SI041"  "SI042"  "SI034"  "SI043" 
## [121] "SI044"  "SI032"  "SI037"

##################################################
##### SEMINAR PAPER STUDENT INPUT ################
##################################################
# aggregate data by country and region
# 2. expanded aggregation
dfa = aggregate(
  cbind(
    pct_male = (gndr == 1),
    mean_age = agea,
    pct_good_health = (health %in% c(1, 2)),
    mean_education = eisced,
    mean_income = hinctnta,
    mean_cesd8 = cesd8,
    mean_bmi = bmi,
    pct_rwpop = rwpop
  ) ~ cntry + region,
  df, mean, na.rm = TRUE
)

The individual-level survey responses are aggregated to the regional level to obtain regional means and proportions.

For each region, this yields the proportion of men, the mean age, the proportion of respondents in good or very good health (health codes 1 and 2), the mean level of education, the mean household income decile, the mean depression score (CES-D8), the mean body mass index (BMI), and the proportion of right-wing populist voters.

Because the data were already restricted to complete cases on these variables, no missing values enter the calculations; na.rm = TRUE only acts as a safeguard.
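How the mean of a logical column turns into a proportion can be seen on a toy data set. The five respondents below are invented; the formula interface with `cbind()` on the left-hand side is the same as in the chunk above.

```r
# Toy individual-level data
toy <- data.frame(
  cntry  = c("AT", "AT", "AT", "BE", "BE"),
  region = c("AT11", "AT11", "AT12", "BE10", "BE10"),
  gndr   = c(1, 2, 1, 2, 2),
  agea   = c(30, 50, 40, 60, 20),
  rwpop  = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# mean() of a TRUE/FALSE indicator is the share of TRUE values in the group
agg <- aggregate(
  cbind(
    pct_male  = (gndr == 1),
    mean_age  = agea,
    pct_rwpop = rwpop
  ) ~ cntry + region,
  toy, mean
)
agg
```

Each row of `agg` is one region, so regional shares and means can be compared directly in the later analysis.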

##################################################
##### END SEMINAR PAPER STUDENT INPUT ############
##################################################

nrow(dfa)

## [1] 123

summary(dfa)

##     cntry              region             pct_male         mean_age    
##  Length:123         Length:123         Min.   :0.2388   Min.   :45.95  
##  Class :character   Class :character   1st Qu.:0.4364   1st Qu.:51.93  
##  Mode  :character   Mode  :character   Median :0.4878   Median :54.25  
##                                        Mean   :0.4890   Mean   :54.23  
##                                        3rd Qu.:0.5488   3rd Qu.:56.55  
##                                        Max.   :0.6935   Max.   :61.97  
##  pct_good_health  mean_education   mean_income      mean_cesd8   
##  Min.   :0.3846   Min.   :2.531   Min.   :3.609   Min.   :1.430  
##  1st Qu.:0.6200   1st Qu.:3.807   1st Qu.:5.006   1st Qu.:1.573  
##  Median :0.6848   Median :4.330   Median :5.560   Median :1.681  
##  Mean   :0.6789   Mean   :4.328   Mean   :5.568   Mean   :1.695  
##  3rd Qu.:0.7599   3rd Qu.:4.727   3rd Qu.:6.158   3rd Qu.:1.784  
##  Max.   :1.0000   Max.   :6.686   Max.   :7.420   Max.   :2.098  
##     mean_bmi       pct_rwpop      
##  Min.   :23.21   Min.   :0.00000  
##  1st Qu.:25.53   1st Qu.:0.00000  
##  Median :26.10   Median :0.03974  
##  Mean   :26.11   Mean   :0.09149  
##  3rd Qu.:26.74   3rd Qu.:0.11396  
##  Max.   :27.93   Max.   :1.00000

# should be equal
unique(df$region)

##   [1] "AT_OTH" "BE_OTH" "CH_OTH" "DE_OTH" "FI_OTH" "GB_OTH" "GR_OTH" "HR_OTH"
##   [9] "HU_OTH" "LT_OTH" "NL_OTH" "SI_OTH" "AT31"   "AT22"   "AT33"   "AT12"  
##  [17] "AT11"   "AT13"   "AT32"   "DEA"    "DE1"    "AT21"   "DEF"    "DE2"   
##  [25] "DE3"    "DE7"    "DEB"    "DE4"    "DEG"    "BE24"   "BE23"   "BE25"  
##  [33] "BE32"   "BE10"   "BE21"   "BE33"   "BE22"   "DE6"    "BE35"   "HU120" 
##  [41] "HU231"  "HU311"  "HU323"  "HU321"  "HU110"  "HU331"  "NL31"   "UKK"   
##  [49] "UKL"    "UKD"    "UKG"    "UKH"    "UKC"    "UKF"    "NL42"   "UKJ"   
##  [57] "UKM"    "CH02"   "CH03"   "CH05"   "CH01"   "CH06"   "CH04"   "UKI"   
##  [65] "EL64"   "EL63"   "EL52"   "EL61"   "DE9"    "DED"    "DEE"    "EL30"  
##  [73] "EL51"   "EL43"   "EL65"   "EL53"   "EL54"   "ITI"    "HR025"  "HR050" 
##  [81] "NL32"   "HR062"  "NL33"   "NL34"   "HR065"  "NL13"   "HR064"  "NL41"  
##  [89] "FI1B1"  "NL21"   "HU222"  "HU322"  "NL22"   "NL12"   "UKE"    "PT11"  
##  [97] "PT18"   "PT17"   "ITF"    "ITC"    "ITG"    "ITH"    "HR031"  "SK023" 
## [105] "HR028"  "SK010"  "SK031"  "SK042"  "NL11"   "SK032"  "SK021"  "SK041" 
## [113] "SK022"  "PT16"   "PT15"   "SI031"  "SI041"  "SI042"  "SI034"  "SI043" 
## [121] "SI044"  "SI032"  "SI037"

unique(dfa$region)

##   [1] "AT_OTH" "AT11"   "AT12"   "AT13"   "AT21"   "AT22"   "AT31"   "AT32"  
##   [9] "AT33"   "BE_OTH" "BE10"   "BE21"   "BE22"   "BE23"   "BE24"   "BE25"  
##  [17] "BE32"   "BE33"   "BE35"   "CH_OTH" "CH01"   "CH02"   "CH03"   "CH04"  
##  [25] "CH05"   "CH06"   "DE_OTH" "DE1"    "DE2"    "DE3"    "DE4"    "DE6"   
##  [33] "DE7"    "DE9"    "DEA"    "DEB"    "DED"    "DEE"    "DEF"    "DEG"   
##  [41] "EL30"   "EL43"   "EL51"   "EL52"   "EL53"   "EL54"   "EL61"   "EL63"  
##  [49] "EL64"   "EL65"   "FI_OTH" "FI1B1"  "GB_OTH" "GR_OTH" "HR_OTH" "HR025" 
##  [57] "HR028"  "HR031"  "HR050"  "HR062"  "HR064"  "HR065"  "HU_OTH" "HU110" 
##  [65] "HU120"  "HU222"  "HU231"  "HU311"  "HU321"  "HU322"  "HU323"  "HU331" 
##  [73] "ITC"    "ITF"    "ITG"    "ITH"    "ITI"    "LT_OTH" "NL_OTH" "NL11"  
##  [81] "NL12"   "NL13"   "NL21"   "NL22"   "NL31"   "NL32"   "NL33"   "NL34"  
##  [89] "NL41"   "NL42"   "PT11"   "PT15"   "PT16"   "PT17"   "PT18"   "SI_OTH"
##  [97] "SI031"  "SI032"  "SI034"  "SI037"  "SI041"  "SI042"  "SI043"  "SI044" 
## [105] "SK010"  "SK021"  "SK022"  "SK023"  "SK031"  "SK032"  "SK041"  "SK042" 
## [113] "UKC"    "UKD"    "UKE"    "UKF"    "UKG"    "UKH"    "UKI"    "UKJ"   
## [121] "UKK"    "UKL"    "UKM"

length(unique(df$region))

## [1] 123

length(unique(dfa$region))

## [1] 123

# different aggregation functions cannot be used in a single aggregate command
# that's why we add the group sizes separately
#dfn  = aggregate(cbind(n = !is.na(cntry)) ~ cntry, df, sum,  na.rm = TRUE)
dfn  = aggregate(cbind(n = !is.na(cntry)) ~ cntry+region, df, sum,  na.rm = TRUE)
nrow(dfn)

## [1] 123

# merge group sizes with data
#dfa = merge(dfa, dfn, by = c("cntry"))
dfa = merge(dfa, dfn, by = c("cntry","region"))
nrow(dfa)

## [1] 123
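
The two-step pattern used here (aggregate with one function, compute the group sizes in a second call, then merge on both keys) can be sketched on a small made-up data frame. The object `toy` and its columns are hypothetical and serve only as illustration.

```r
# Hypothetical toy data: respondents nested in regions within countries
toy <- data.frame(
  cntry  = c("AT", "AT", "AT", "BE", "BE"),
  region = c("AT11", "AT11", "AT12", "BE10", "BE10"),
  age    = c(40, 60, 55, 30, 50),
  stringsAsFactors = FALSE
)

# Step 1: one aggregation function (mean) per aggregate call
toy_means <- aggregate(age ~ cntry + region, data = toy, FUN = mean)

# Step 2: group sizes from a second call (length counts rows per group)
toy_n <- aggregate(cbind(n = age) ~ cntry + region, data = toy, FUN = length)

# Step 3: merge on both keys so each region stays tied to its country
toy_agg <- merge(toy_means, toy_n, by = c("cntry", "region"))
toy_agg
```

Comparing `nrow()` before and after the merge, as done above, is a quick check that no group was lost or duplicated.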

dfa

##     cntry region  pct_male mean_age pct_good_health mean_education mean_income
## 1      AT AT_OTH 0.4827586 57.44828       0.7586207       4.137931    5.896552
## 2      AT   AT11 0.4571429 59.54286       0.6571429       3.542857    5.985714
## 3      AT   AT12 0.4456929 59.82397       0.7003745       3.764045    5.393258
## 4      AT   AT13 0.3785047 61.11215       0.7009346       5.948598    4.985981
## 5      AT   AT21 0.2941176 61.00000       0.7450980       3.588235    4.568627
## 6      AT   AT22 0.4878049 54.67683       0.6768293       4.006098    4.628049
## 7      AT   AT31 0.4730539 58.86228       0.6886228       4.485030    5.083832
## 8      AT   AT32 0.4615385 59.95385       0.6153846       4.523077    5.323077
## 9      AT   AT33 0.4365079 61.53968       0.7063492       4.460317    5.301587
## 10     BE BE_OTH 0.5227273 51.70455       0.8636364       5.795455    6.477273
## 11     BE   BE10 0.5416667 56.62500       0.7083333       5.416667    5.375000
## 12     BE   BE21 0.5956284 52.46448       0.7267760       5.502732    6.415301
## 13     BE   BE22 0.5212766 57.44681       0.6914894       4.585106    5.680851
## 14     BE   BE23 0.5620438 53.71533       0.7080292       4.708029    6.408759
## 15     BE   BE24 0.6086957 55.03261       0.6956522       6.293478    6.804348
## 16     BE   BE25 0.5714286 55.38393       0.6696429       5.321429    6.258929
## 17     BE   BE32 0.4117647 54.98824       0.6470588       4.800000    5.517647
## 18     BE   BE33 0.3734940 52.46988       0.7108434       5.734940    5.710843
## 19     BE   BE35 0.5151515 54.60606       0.7272727       5.393939    5.030303
## 20     CH CH_OTH 0.6666667 54.25000       0.8333333       5.166667    5.333333
## 21     CH   CH01 0.5894737 55.40000       0.8421053       5.042105    5.642105
## 22     CH   CH02 0.5228758 54.84967       0.8039216       4.647059    5.830065
## 23     CH   CH03 0.5584416 59.53247       0.8311688       4.519481    5.493506
## 24     CH   CH04 0.5116279 59.46512       0.8488372       5.883721    6.081395
## 25     CH   CH05 0.6290323 54.61290       0.8709677       4.709677    5.790323
## 26     CH   CH06 0.6086957 54.62319       0.8115942       4.478261    5.289855
## 27     DE DE_OTH 0.4318182 52.40909       0.6818182       4.409091    4.500000
## 28     DE    DE1 0.4930556 48.90278       0.6527778       4.565972    6.663194
## 29     DE    DE2 0.4941520 49.45906       0.6578947       4.312865    6.479532
## 30     DE    DE3 0.3918919 49.58108       0.6081081       4.918919    6.094595
## 31     DE    DE4 0.6307692 49.27692       0.5230769       4.000000    5.692308
## 32     DE    DE6 0.4411765 50.23529       0.8823529       5.205882    7.235294
## 33     DE    DE7 0.5424837 48.15033       0.6209150       4.254902    5.921569
## 34     DE    DE9 0.5529412 53.34706       0.6058824       4.400000    6.270588
## 35     DE    DEA 0.5208333 51.07765       0.6136364       4.329545    6.380682
## 36     DE    DEB 0.5063291 45.94937       0.6708861       4.405063    6.113924
## 37     DE    DED 0.5625000 56.60417       0.5937500       4.697917    5.947917
## 38     DE    DEE 0.5689655 57.27586       0.4482759       3.896552    4.810345
## 39     DE    DEF 0.5301205 55.24096       0.6265060       4.530120    6.361446
## 40     DE    DEG 0.5416667 53.33333       0.6250000       4.333333    5.437500
## 41     FI FI_OTH 0.6935484 49.24194       0.6370968       4.516129    6.016129
## 42     FI  FI1B1 0.5588235 53.58824       0.8235294       4.882353    6.411765
## 43     GB GB_OTH 0.4482759 54.10345       0.6896552       3.551724    4.517241
## 44     GB    UKC 0.5681818 49.97727       0.5681818       3.977273    4.568182
## 45     GB    UKD 0.5980392 52.60784       0.6764706       4.539216    5.303922
## 46     GB    UKE 0.5921053 53.03947       0.6447368       5.197368    4.881579
## 47     GB    UKF 0.5148515 52.48515       0.6633663       4.405941    4.356436
## 48     GB    UKG 0.5108696 51.56522       0.6847826       5.413043    5.152174
## 49     GB    UKH 0.5688073 55.94495       0.7706422       4.532110    5.944954
## 50     GB    UKI 0.4683544 51.07595       0.7594937       5.367089    6.050633
## 51     GB    UKJ 0.5384615 55.22436       0.6538462       6.685897    6.435897
## 52     GB    UKK 0.4150943 55.33962       0.6603774       5.037736    5.330189
## 53     GB    UKL 0.5000000 51.96875       0.6875000       5.656250    4.500000
## 54     GB    UKM 0.5588235 55.54412       0.6617647       4.514706    5.029412
## 55     GR   EL30 0.4365482 52.13706       0.8147208       4.124365    6.200508
## 56     GR   EL43 0.5272727 56.59091       0.6727273       3.363636    5.500000
## 57     GR   EL51 0.4756098 49.51220       0.8536585       3.939024    5.012195
## 58     GR   EL52 0.4714286 50.29714       0.8657143       3.991429    5.100000
## 59     GR   EL53 0.4262295 54.60656       0.7704918       3.245902    5.065574
## 60     GR   EL54 0.5079365 61.96825       0.7936508       2.920635    5.412698
## 61     GR   EL61 0.4800000 46.90667       0.9000000       4.633333    5.066667
## 62     GR   EL63 0.3636364 52.86364       0.7272727       3.659091    4.568182
## 63     GR   EL64 0.4339623 60.03774       0.5471698       2.735849    4.113208
## 64     GR   EL65 0.4363636 48.96364       0.7636364       3.945455    5.272727
## 65     GR GR_OTH 0.6857143 49.51429       0.7428571       4.800000    7.142857
## 66     HR HR_OTH 0.5550239 58.03349       0.5311005       3.607656    4.870813
## 67     HR  HR025 0.4603175 56.01587       0.4285714       3.873016    4.809524
## 68     HR  HR028 0.3902439 57.53659       0.3902439       3.853659    4.634146
## 69     HR  HR031 0.5476190 58.42857       0.6190476       4.476190    5.928571
## 70     HR  HR050 0.3888889 56.51852       0.6851852       4.287037    5.574074
## 71     HR  HR062 0.4047619 57.78571       0.5000000       3.571429    6.142857
## 72     HR  HR064 0.3714286 54.80000       0.6285714       3.714286    5.914286
## 73     HR  HR065 0.3404255 53.91489       0.7872340       3.808511    6.000000
## 74     HU HU_OTH 0.4271845 56.07282       0.6165049       3.626214    4.932039
## 75     HU  HU110 0.3056995 49.90155       0.7823834       4.186528    7.419689
## 76     HU  HU120 0.2388060 49.95522       0.8358209       3.813433    7.320896
## 77     HU  HU222 0.4358974 59.66667       0.6666667       3.538462    6.564103
## 78     HU  HU231 0.3877551 53.69388       0.5918367       3.428571    4.326531
## 79     HU  HU311 0.3916667 54.10000       0.5000000       3.291667    3.975000
## 80     HU  HU321 0.4642857 50.35714       0.6964286       3.642857    4.392857
## 81     HU  HU322 0.3888889 51.69444       0.7500000       3.750000    3.666667
## 82     HU  HU323 0.4844720 51.56522       0.5403727       3.366460    4.850932
## 83     HU  HU331 0.4523810 54.78571       0.6428571       3.595238    5.714286
## 84     IT    ITC 0.4605010 52.40848       0.7398844       3.803468    5.190751
## 85     IT    ITF 0.4699739 54.02089       0.6240209       3.151436    4.146214
## 86     IT    ITG 0.4076433 53.35032       0.6815287       3.324841    4.980892
## 87     IT    ITH 0.5295858 52.36686       0.7603550       3.857988    5.997041
## 88     IT    ITI 0.4887640 52.04213       0.6601124       3.500000    5.887640
## 89     LT LT_OTH 0.3333333 49.33333       1.0000000       6.333333    6.333333
## 90     NL NL_OTH 0.3684211 54.42105       0.6315789       4.526316    5.578947
## 91     NL   NL11 0.5128205 54.41026       0.7948718       4.743590    5.769231
## 92     NL   NL12 0.5208333 57.62500       0.8125000       4.833333    5.750000
## 93     NL   NL13 0.6086957 56.21739       0.7826087       4.695652    6.586957
## 94     NL   NL21 0.5049505 51.51485       0.7920792       4.346535    6.356436
## 95     NL   NL22 0.5695364 53.62252       0.8079470       4.960265    6.370861
## 96     NL   NL31 0.4807692 50.64423       0.7884615       4.865385    6.567308
## 97     NL   NL32 0.5789474 54.35789       0.7631579       5.078947    6.968421
## 98     NL   NL33 0.4802260 53.97740       0.7401130       4.768362    6.468927
## 99     NL   NL34 0.5428571 54.71429       0.7428571       3.971429    6.714286
## 100    NL   NL41 0.4569536 51.95364       0.7152318       4.927152    6.649007
## 101    NL   NL42 0.5492958 53.22535       0.6901408       4.478873    6.309859
## 102    PT   PT11 0.4562842 52.17486       0.5573770       3.262295    4.937158
## 103    PT   PT15 0.5483871 51.54839       0.5806452       3.000000    4.580645
## 104    PT   PT16 0.3768116 57.25725       0.4601449       3.021739    3.800725
## 105    PT   PT17 0.4457364 53.05039       0.6550388       4.120155    5.465116
## 106    PT   PT18 0.3906250 59.78125       0.4218750       2.531250    3.609375
## 107    SI SI_OTH 0.5294118 51.90196       0.8235294       4.235294    5.000000
## 108    SI  SI031 0.5833333 54.61111       0.5000000       3.805556    5.166667
## 109    SI  SI032 0.4854369 54.53398       0.6601942       4.330097    5.213592
## 110    SI  SI034 0.4444444 49.74074       0.7283951       4.518519    6.000000
## 111    SI  SI037 0.5652174 50.73913       0.6739130       4.195652    6.173913
## 112    SI  SI041 0.5031056 53.77640       0.7204969       4.850932    6.192547
## 113    SI  SI042 0.6491228 50.73684       0.7368421       4.210526    6.105263
## 114    SI  SI043 0.5750000 54.37500       0.6500000       4.050000    4.925000
## 115    SI  SI044 0.3947368 60.39474       0.5000000       3.552632    4.947368
## 116    SK  SK010 0.4565217 58.97826       0.4565217       4.217391    7.108696
## 117    SK  SK021 0.4428571 58.57143       0.4285714       3.942857    5.414286
## 118    SK  SK022 0.4728682 55.74419       0.5038760       4.317829    5.286822
## 119    SK  SK023 0.5600000 53.36667       0.6933333       3.966667    5.586667
## 120    SK  SK031 0.4405594 57.73427       0.4545455       4.335664    5.314685
## 121    SK  SK032 0.4065934 56.98901       0.3846154       3.626374    4.307692
## 122    SK  SK041 0.4029851 54.53731       0.5671642       4.283582    5.388060
## 123    SK  SK042 0.5000000 57.33333       0.5476190       4.011905    5.559524
##     mean_cesd8 mean_bmi  pct_rwpop   n
## 1     1.530172 25.65850 0.17241379  29
## 2     1.571429 25.47077 0.14285714  70
## 3     1.562266 25.70209 0.11610487 267
## 4     1.568341 25.56164 0.10747664 214
## 5     1.607843 26.15031 0.15686275  51
## 6     1.529726 26.50661 0.18292683 164
## 7     1.690120 25.99666 0.19760479 167
## 8     1.665385 25.18255 0.15384615  65
## 9     1.681548 25.45578 0.16666667 126
## 10    1.480114 24.63566 0.00000000  44
## 11    1.789062 24.07798 0.00000000  48
## 12    1.599044 26.12873 0.13661202 183
## 13    1.595745 26.12325 0.18085106  94
## 14    1.562956 26.02955 0.10218978 137
## 15    1.600543 25.68808 0.07608696  92
## 16    1.575893 26.44825 0.13392857 112
## 17    1.794118 26.98694 0.00000000  85
## 18    1.722892 24.33762 0.00000000  83
## 19    1.768939 24.75450 0.03030303  33
## 20    1.614583 24.26184 0.16666667  12
## 21    1.515789 25.20338 0.09473684  95
## 22    1.526961 25.20670 0.26143791 153
## 23    1.521104 25.34956 0.24675325  77
## 24    1.578488 24.98325 0.15116279  86
## 25    1.469758 24.79281 0.37096774  62
## 26    1.465580 25.77022 0.17391304  69
## 27    1.789773 24.93724 0.11363636  44
## 28    1.661458 25.63838 0.02430556 288
## 29    1.637061 25.85977 0.02631579 342
## 30    1.746622 25.44850 0.04054054  74
## 31    1.788462 27.23272 0.10769231  65
## 32    1.452206 25.95352 0.00000000  34
## 33    1.755719 26.60990 0.01960784 153
## 34    1.687500 26.71238 0.04705882 170
## 35    1.680161 26.28106 0.05303030 528
## 36    1.759494 25.32294 0.00000000  79
## 37    1.683594 26.27621 0.08333333  96
## 38    1.801724 27.77265 0.06896552  58
## 39    1.656627 26.20553 0.02409639  83
## 40    1.679688 26.99110 0.10416667  48
## 41    1.571573 27.72343 1.00000000 124
## 42    1.602941 27.15768 1.00000000  34
## 43    1.706897 27.00836 0.00000000  29
## 44    2.008523 27.16156 0.00000000  44
## 45    1.685049 25.86093 0.00000000 102
## 46    1.588816 27.34275 0.00000000  76
## 47    1.730198 26.28055 0.01980198 101
## 48    1.748641 26.25703 0.01086957  92
## 49    1.663991 25.89544 0.00000000 109
## 50    1.634494 25.59192 0.00000000  79
## 51    1.698718 25.91027 0.01282051 156
## 52    1.700472 25.82941 0.00000000 106
## 53    1.792969 27.04563 0.00000000  32
## 54    1.696691 26.59329 0.01470588  68
## 55    1.945749 26.03364 0.00000000 394
## 56    1.879545 27.45669 0.00000000 110
## 57    1.766768 25.14961 0.00000000  82
## 58    1.720000 25.14670 0.00000000 350
## 59    1.928279 25.76392 0.00000000  61
## 60    2.043651 26.70398 0.00000000  63
## 61    2.098333 26.59869 0.00000000 150
## 62    1.963068 26.89429 0.00000000  44
## 63    1.889151 27.46300 0.00000000  53
## 64    1.970455 26.41764 0.00000000  55
## 65    1.939286 26.09709 0.00000000  35
## 66    1.711722 27.58606 0.08133971 209
## 67    1.777778 27.06480 0.09523810  63
## 68    1.789634 27.93299 0.02439024  41
## 69    1.565476 27.47418 0.00000000  42
## 70    1.515046 25.95715 0.03703704 108
## 71    1.970238 27.80422 0.00000000  42
## 72    1.664286 26.54907 0.02857143  35
## 73    1.534574 25.44099 0.06382979  47
## 74    1.805825 26.62052 0.11165049 206
## 75    1.672927 25.80950 0.15025907 193
## 76    1.669776 26.32125 0.04477612 134
## 77    1.663462 27.33137 0.05128205  39
## 78    1.681122 27.04244 0.02040816  49
## 79    1.968750 25.80854 0.03333333 120
## 80    1.801339 26.14935 0.07142857  56
## 81    2.059028 25.81821 0.08333333  36
## 82    1.909161 27.29849 0.03726708 161
## 83    2.014881 27.12768 0.02380952  42
## 84    1.697495 24.25050 0.00000000 519
## 85    1.933094 25.82954 0.00000000 383
## 86    1.771497 23.94052 0.00000000 157
## 87    1.689719 24.64319 0.00000000 338
## 88    1.749298 24.95986 0.00000000 356
## 89    1.750000 23.20759 1.00000000   3
## 90    1.473684 24.44203 0.10526316  19
## 91    1.493590 25.75522 0.02564103  39
## 92    1.479167 26.13090 0.06250000  48
## 93    1.510870 26.79257 0.02173913  46
## 94    1.558168 25.39421 0.06930693 101
## 95    1.488411 25.42422 0.06622517 151
## 96    1.627404 24.84196 0.01923077 104
## 97    1.561184 25.26850 0.01052632 190
## 98    1.473164 25.67125 0.05649718 177
## 99    1.446429 25.68057 0.11428571  35
## 100   1.603477 25.65819 0.03973510 151
## 101   1.501761 25.69814 0.08450704  71
## 102   1.899249 26.13351 0.00000000 366
## 103   1.633065 25.04001 0.00000000  31
## 104   1.938859 26.65494 0.00000000 276
## 105   1.764050 26.05313 0.00000000 258
## 106   1.929688 26.18804 0.00000000  64
## 107   1.600490 26.43409 0.23529412  51
## 108   1.572917 26.10682 0.19444444  36
## 109   1.648058 26.45109 0.22330097 103
## 110   1.577160 26.66520 0.30864198  81
## 111   1.573370 27.56286 0.23913043  46
## 112   1.561335 25.50544 0.18012422 161
## 113   1.429825 27.44512 0.29824561  57
## 114   1.515625 26.09102 0.12500000  40
## 115   1.585526 26.85990 0.10526316  38
## 116   2.005435 27.88829 0.00000000  46
## 117   1.994643 27.87327 0.08571429  70
## 118   1.696705 27.29013 0.03875969 129
## 119   1.610000 25.69103 0.04000000 150
## 120   1.698427 26.42991 0.05594406 143
## 121   1.883242 26.47323 0.14285714  91
## 122   1.779851 26.77478 0.02985075  67
## 123   1.702381 27.90144 0.02380952  84

unique(dfa$region)

##   [1] "AT_OTH" "AT11"   "AT12"   "AT13"   "AT21"   "AT22"   "AT31"   "AT32"  
##   [9] "AT33"   "BE_OTH" "BE10"   "BE21"   "BE22"   "BE23"   "BE24"   "BE25"  
##  [17] "BE32"   "BE33"   "BE35"   "CH_OTH" "CH01"   "CH02"   "CH03"   "CH04"  
##  [25] "CH05"   "CH06"   "DE_OTH" "DE1"    "DE2"    "DE3"    "DE4"    "DE6"   
##  [33] "DE7"    "DE9"    "DEA"    "DEB"    "DED"    "DEE"    "DEF"    "DEG"   
##  [41] "FI_OTH" "FI1B1"  "GB_OTH" "UKC"    "UKD"    "UKE"    "UKF"    "UKG"   
##  [49] "UKH"    "UKI"    "UKJ"    "UKK"    "UKL"    "UKM"    "EL30"   "EL43"  
##  [57] "EL51"   "EL52"   "EL53"   "EL54"   "EL61"   "EL63"   "EL64"   "EL65"  
##  [65] "GR_OTH" "HR_OTH" "HR025"  "HR028"  "HR031"  "HR050"  "HR062"  "HR064" 
##  [73] "HR065"  "HU_OTH" "HU110"  "HU120"  "HU222"  "HU231"  "HU311"  "HU321" 
##  [81] "HU322"  "HU323"  "HU331"  "ITC"    "ITF"    "ITG"    "ITH"    "ITI"   
##  [89] "LT_OTH" "NL_OTH" "NL11"   "NL12"   "NL13"   "NL21"   "NL22"   "NL31"  
##  [97] "NL32"   "NL33"   "NL34"   "NL41"   "NL42"   "PT11"   "PT15"   "PT16"  
## [105] "PT17"   "PT18"   "SI_OTH" "SI031"  "SI032"  "SI034"  "SI037"  "SI041" 
## [113] "SI042"  "SI043"  "SI044"  "SK010"  "SK021"  "SK022"  "SK023"  "SK031" 
## [121] "SK032"  "SK041"  "SK042"

unique(dfreg$region)

##   [1] "AT11"  "AT12"  "AT13"  "AT21"  "AT22"  "AT31"  "AT32"  "AT33"  "AT34" 
##  [10] "BE10"  "BE21"  "BE22"  "BE23"  "BE24"  "BE25"  "BE31"  "BE32"  "BE33" 
##  [19] "BE34"  "BE35"  "BG311" "BG312" "BG313" "BG314" "BG315" "BG321" "BG322"
##  [28] "BG323" "BG324" "BG325" "BG331" "BG332" "BG333" "BG334" "BG341" "BG342"
##  [37] "BG343" "BG344" "BG411" "BG412" "BG413" "BG414" "BG415" "BG421" "BG422"
##  [46] "BG423" "BG424" "BG425" "CY0"   "DE1"   "DE2"   "DE3"   "DE4"   "DE6"  
##  [55] "DE7"   "DE8"   "DE9"   "DEA"   "DEB"   "DEC"   "DED"   "DEE"   "DEF"  
##  [64] "DEG"   "EL30"  "EL41"  "EL42"  "EL43"  "EL51"  "EL52"  "EL53"  "EL54" 
##  [73] "EL61"  "EL62"  "EL63"  "EL64"  "EL65"  "ES11"  "ES12"  "ES13"  "ES21" 
##  [82] "ES22"  "ES23"  "ES24"  "ES30"  "ES41"  "ES42"  "ES43"  "ES51"  "ES52" 
##  [91] "ES53"  "ES61"  "ES62"  "ES64"  "ES70"  "FI196" "FI1B1" "FI1C1" "FI1C2"
## [100] "FI1C5" "FI1D5" "FI1D7" "FI200" "FR10"  "FRB0"  "FRC1"  "FRC2"  "FRD1" 
## [109] "FRD2"  "FRE1"  "FRE2"  "FRF1"  "FRF2"  "FRF3"  "FRG0"  "FRH0"  "FRI1" 
## [118] "FRI2"  "FRI3"  "FRJ1"  "FRJ2"  "FRK1"  "FRK2"  "FRL0"  "HR021" "HR022"
## [127] "HR023" "HR024" "HR025" "HR026" "HR027" "HR028" "HR031" "HR032" "HR033"
## [136] "HR034" "HR035" "HR036" "HR037" "HU110" "HU120" "HU211" "HU212" "HU213"
## [145] "HU221" "HU222" "HU223" "HU231" "HU232" "HU233" "HU311" "HU312" "HU313"
## [154] "HU321" "HU322" "HU323" "HU331" "HU332" "HU333" "IE041" "IE042" "IE051"
## [163] "IE052" "IE053" "IE061" "IE062" "IE063" "ITC"   "ITF"   "ITG"   "ITH"  
## [172] "ITI"   "LT011" "LT021" "LT022" "LT023" "LT024" "LT025" "LT026" "LT027"
## [181] "LT028" "LT029" "ME0"   "NL11"  "NL12"  "NL13"  "NL21"  "NL22"  "NL23"

# finally merge with regional data
ncol(dfa)

## [1] 11

nrow(dfa)

## [1] 123

ncol(dfreg)

## [1] 23

nrow(dfreg)

## [1] 189

# recode small regions before merge (see above)
tmp = merge(dfreg, region_n[, c("region", "newRegion")],
            by = c("region"), all.x = TRUE, sort = FALSE)

unique(tmp$region)

##   [1] "AT34"  "BE31"  "BE34"  "DE8"   "DEC"   "EL41"  "EL42"  "EL62"  "FI196"
##  [10] "FI1C1" "FI1C2" "FI1C5" "FI1D5" "FI1D7" "HR021" "HR022" "HR023" "HR024"
##  [19] "HR026" "HR027" "HR032" "HR033" "HR034" "HR035" "HR036" "HR037" "HU211"
##  [28] "HU212" "HU213" "HU221" "HU223" "HU232" "HU312" "HU313" "HU332" "HU333"
##  [37] "LT011" "LT022" "LT023" "NL23"  "AT11"  "AT12"  "AT13"  "AT21"  "AT22" 
##  [46] "AT31"  "AT32"  "AT33"  "CY0"   "BE10"  "BE21"  "BE22"  "BE23"  "BE24" 
##  [55] "BE25"  "BG324" "BE32"  "BE33"  "DEB"   "BE35"  "BG311" "BG312" "BG313"
##  [64] "BG314" "BG315" "BG321" "BG322" "BG323" "BG414" "BG325" "BG331" "BG332"
##  [73] "BG333" "BG334" "BG341" "BG342" "BG343" "BG344" "BG411" "BG412" "BG413"
##  [82] "DE7"   "BG415" "BG421" "BG422" "BG423" "BG424" "BG425" "ES51"  "DE1"  
##  [91] "DE2"   "DE3"   "DE4"   "DE6"   "EL43"  "EL51"  "DE9"   "DEA"   "LT026"
## [100] "HU110" "DED"   "DEE"   "DEF"   "DEG"   "EL30"  "FRC1"  "FRC2"  "ES21" 
## [109] "ES22"  "EL52"  "EL53"  "EL54"  "EL61"  "FRF3"  "EL63"  "EL64"  "EL65" 
## [118] "ES11"  "ES12"  "ES13"  "ES64"  "ES70"  "ES23"  "ES24"  "ES30"  "ES41" 
## [127] "ES42"  "ES43"  "HR025" "ES52"  "ES53"  "ES61"  "ES62"  "NL21"  "FRD1" 
## [136] "FRD2"  "FI1B1" "LT025" "FRF1"  "LT027" "HU120" "LT029" "FI200" "FR10" 
## [145] "FRB0"  "HU222" "FRJ1"  "FRJ2"  "FRK1"  "FRE1"  "FRE2"  "IE061" "FRF2" 
## [154] "HU321" "FRG0"  "FRH0"  "FRI1"  "FRI2"  "FRI3"  "HR031" "LT021" "HU231"
## [163] "FRK2"  "FRL0"  "HU311" "IE062" "IE063" "ITC"   "ITF"   "ITG"   "ITH"  
## [172] "HR028" "NL13"  "IE041" "IE042" "IE051" "LT024" "IE053" "ME0"   "NL11" 
## [181] "LT028" "HU322" "IE052" "HU323" "NL12"  "ITI"   "NL22"  "HU233" "HU331"

length(unique(tmp$region))

## [1] 189

# replace region where a newRegion is available
tmp$region = ifelse(!is.na(tmp$newRegion), tmp$newRegion, tmp$region)
unique(tmp$region)

##   [1] "AT_OTH" "BE_OTH" "DE_OTH" "GR_OTH" "FI_OTH" "HR_OTH" "HU_OTH" "LT_OTH"
##   [9] "NL_OTH" "AT11"   "AT12"   "AT13"   "AT21"   "AT22"   "AT31"   "AT32"  
##  [17] "AT33"   "CY0"    "BE10"   "BE21"   "BE22"   "BE23"   "BE24"   "BE25"  
##  [25] "BG324"  "BE32"   "BE33"   "DEB"    "BE35"   "BG311"  "BG312"  "BG313" 
##  [33] "BG314"  "BG315"  "BG321"  "BG322"  "BG323"  "BG414"  "BG325"  "BG331" 
##  [41] "BG332"  "BG333"  "BG334"  "BG341"  "BG342"  "BG343"  "BG344"  "BG411" 
##  [49] "BG412"  "BG413"  "DE7"    "BG415"  "BG421"  "BG422"  "BG423"  "BG424" 
##  [57] "BG425"  "ES51"   "DE1"    "DE2"    "DE3"    "DE4"    "DE6"    "EL43"  
##  [65] "EL51"   "DE9"    "DEA"    "LT026"  "HU110"  "DED"    "DEE"    "DEF"   
##  [73] "DEG"    "EL30"   "FRC1"   "FRC2"   "ES21"   "ES22"   "EL52"   "EL53"  
##  [81] "EL54"   "EL61"   "FRF3"   "EL63"   "EL64"   "EL65"   "ES11"   "ES12"  
##  [89] "ES13"   "ES64"   "ES70"   "ES23"   "ES24"   "ES30"   "ES41"   "ES42"  
##  [97] "ES43"   "HR025"  "ES52"   "ES53"   "ES61"   "ES62"   "NL21"   "FRD1"  
## [105] "FRD2"   "FI1B1"  "LT025"  "FRF1"   "LT027"  "HU120"  "LT029"  "FI200" 
## [113] "FR10"   "FRB0"   "HU222"  "FRJ1"   "FRJ2"   "FRK1"   "FRE1"   "FRE2"  
## [121] "IE061"  "FRF2"   "HU321"  "FRG0"   "FRH0"   "FRI1"   "FRI2"   "FRI3"  
## [129] "HR031"  "LT021"  "HU231"  "FRK2"   "FRL0"   "HU311"  "IE062"  "IE063" 
## [137] "ITC"    "ITF"    "ITG"    "ITH"    "HR028"  "NL13"   "IE041"  "IE042" 
## [145] "IE051"  "LT024"  "IE053"  "ME0"    "NL11"   "LT028"  "HU322"  "IE052" 
## [153] "HU323"  "NL12"   "ITI"    "NL22"   "HU233"  "HU331"

length(unique(tmp$region))

## [1] 158
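
The drop from 189 rows to 158 unique codes follows from several small regions receiving the same pooled code. A minimal sketch of the lookup-and-overwrite step, using made-up tables `reg` and `lookup` (both hypothetical, not the real `dfreg` and `region_n`):

```r
# Hypothetical regional table and lookup of small regions to pooled codes
reg <- data.frame(region = c("AT11", "AT34", "BE31", "BE34"),
                  gdp    = c(30, 45, 38, 27),
                  stringsAsFactors = FALSE)
lookup <- data.frame(region    = c("AT34", "BE31", "BE34"),
                     newRegion = c("AT_OTH", "BE_OTH", "BE_OTH"),
                     stringsAsFactors = FALSE)

# Left join keeps every regional row; unmatched rows get NA in newRegion
tmp_toy <- merge(reg, lookup, by = "region", all.x = TRUE, sort = FALSE)

# Overwrite the code only where a pooled code exists
tmp_toy$region <- ifelse(!is.na(tmp_toy$newRegion),
                         tmp_toy$newRegion, tmp_toy$region)
tmp_toy$newRegion <- NULL

# Fewer unique codes than rows means several rows share a pooled code
length(unique(tmp_toy$region))
nrow(tmp_toy)
```

The number of rows is unchanged by the recoding; only the number of distinct codes shrinks.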

# drop helper column (optional)
tmp$newRegion = NULL

dfreg = tmp
unique(dfreg$region)

##   [1] "AT_OTH" "BE_OTH" "DE_OTH" "GR_OTH" "FI_OTH" "HR_OTH" "HU_OTH" "LT_OTH"
##   [9] "NL_OTH" "AT11"   "AT12"   "AT13"   "AT21"   "AT22"   "AT31"   "AT32"  
##  [17] "AT33"   "CY0"    "BE10"   "BE21"   "BE22"   "BE23"   "BE24"   "BE25"  
##  [25] "BG324"  "BE32"   "BE33"   "DEB"    "BE35"   "BG311"  "BG312"  "BG313" 
##  [33] "BG314"  "BG315"  "BG321"  "BG322"  "BG323"  "BG414"  "BG325"  "BG331" 
##  [41] "BG332"  "BG333"  "BG334"  "BG341"  "BG342"  "BG343"  "BG344"  "BG411" 
##  [49] "BG412"  "BG413"  "DE7"    "BG415"  "BG421"  "BG422"  "BG423"  "BG424" 
##  [57] "BG425"  "ES51"   "DE1"    "DE2"    "DE3"    "DE4"    "DE6"    "EL43"  
##  [65] "EL51"   "DE9"    "DEA"    "LT026"  "HU110"  "DED"    "DEE"    "DEF"   
##  [73] "DEG"    "EL30"   "FRC1"   "FRC2"   "ES21"   "ES22"   "EL52"   "EL53"  
##  [81] "EL54"   "EL61"   "FRF3"   "EL63"   "EL64"   "EL65"   "ES11"   "ES12"  
##  [89] "ES13"   "ES64"   "ES70"   "ES23"   "ES24"   "ES30"   "ES41"   "ES42"  
##  [97] "ES43"   "HR025"  "ES52"   "ES53"   "ES61"   "ES62"   "NL21"   "FRD1"  
## [105] "FRD2"   "FI1B1"  "LT025"  "FRF1"   "LT027"  "HU120"  "LT029"  "FI200" 
## [113] "FR10"   "FRB0"   "HU222"  "FRJ1"   "FRJ2"   "FRK1"   "FRE1"   "FRE2"  
## [121] "IE061"  "FRF2"   "HU321"  "FRG0"   "FRH0"   "FRI1"   "FRI2"   "FRI3"  
## [129] "HR031"  "LT021"  "HU231"  "FRK2"   "FRL0"   "HU311"  "IE062"  "IE063" 
## [137] "ITC"    "ITF"    "ITG"    "ITH"    "HR028"  "NL13"   "IE041"  "IE042" 
## [145] "IE051"  "LT024"  "IE053"  "ME0"    "NL11"   "LT028"  "HU322"  "IE052" 
## [153] "HU323"  "NL12"   "ITI"    "NL22"   "HU233"  "HU331"

dfa = merge(dfa, dfreg, by = "region")
ncol(dfa)

## [1] 33

nrow(dfa)

## [1] 103
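
Because `dfreg` still has 189 rows but only 158 unique codes after the recoding, a pooled code can match several regional rows, and `merge()` then repeats the survey row for each match (the listing of small regions printed after this step shows LT_OTH on three rows). The sketch below uses made-up tables `survey` and `indic` to show how such keys can be detected. Collapsing by the mean is only one possible remedy and is an assumption here, not the method used in this analysis.

```r
# Hypothetical survey aggregates (one row per region) and regional indicators
survey <- data.frame(region = c("AT11", "BE_OTH"),
                     pct    = c(0.14, 0.05),
                     stringsAsFactors = FALSE)
indic  <- data.frame(region = c("AT11", "BE_OTH", "BE_OTH"),
                     gdp    = c(30, 38, 27),
                     stringsAsFactors = FALSE)

# Keys that occur more than once on the indicator side of the merge
dup_keys <- unique(indic$region[duplicated(indic$region)])
dup_keys

# A plain merge repeats the survey row once per matching indicator row
nrow(merge(survey, indic, by = "region"))

# One possible fix (an assumption): collapse the indicators to one row
# per region before merging
indic_1 <- aggregate(gdp ~ region, data = indic, FUN = mean)
merged  <- merge(survey, indic_1, by = "region")
nrow(merged)
```

A simple mean is only plausible for rates; counts such as population size or total GDP would rather be summed.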

# Final clean-up
# remove remaining very small regions; the cut-off of 28 used below keeps AT_OTH (n = 29)
dfa[dfa$n < 30, c("region", "n")]

##    region  n
## 1  AT_OTH 29
## 95 LT_OTH  3
## 96 LT_OTH  3
## 97 LT_OTH  3
## 98 NL_OTH 19

dfa = dfa[dfa$n >= 28, ]
nrow(dfa)

## [1] 99

##################################################
##### SEMINAR PAPER STUDENT INPUT ################
##################################################


# exclude regions with 0 or 100% right-wing populist vote
dfa <- dfa[dfa$pct_rwpop > 0 & dfa$pct_rwpop < 1, ]
#####end of student input
nrow(dfa)

## [1] 66

Prior to the analysis, regions in which the share of right-wing populist voters was exactly 0% or 100% were excluded, which reduced the data set from 99 to 66 observations. Such boundary values could distort the estimates and compromise the reliability of the correlation and regression analyses.
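
How many units such a filter removes can be checked before applying it. A minimal sketch with a made-up vector `shares` (hypothetical values, not taken from `dfa`):

```r
# Hypothetical vote shares for five regions
shares <- c(0, 0.12, 1, 0.30, 0)

# Flag boundary values, then keep only shares strictly between 0 and 1
at_boundary <- shares <= 0 | shares >= 1
table(at_boundary)

kept <- shares[!at_boundary]
kept
```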

### Analysis  

cor(dfa[,3:(ncol(dfa)-1)], dfa$pct_rwpop, use = "complete.obs") # correlation of each variable with pct_rwpop

##                                         [,1]
## pct_male                         -0.14233860
## mean_age                          0.39581613
## pct_good_health                   0.17119316
## mean_education                    0.01898300
## mean_income                      -0.14752417
## mean_cesd8                       -0.17813497
## mean_bmi                         -0.20696289
## pct_rwpop                         1.00000000
## n                                 0.01401506
## reg11_area_2023                  -0.21491942
## reg11_tpopsz_2023                -0.29314018
## reg11_fpopsz_2023                -0.29287131
## reg11_mpopsz_2023                -0.29340672
## reg11_pode_2023                   0.02869454
## reg11_lbirth_2023                -0.29797517
## reg11_death_2023                 -0.29643683
## reg11_natgrow_2023                0.26166004
## reg11_cnmigrat_2023              -0.28420719
## reg11_grow_2023                  -0.25274052
## reg11_gbirthrt_2023              -0.13695615
## reg11_gdeathrt_2023              -0.22940311
## reg11_natgrowrt_2023              0.14367709
## reg11_cnmigratrt_2023             0.08277691
## reg11_growrt_2023                 0.12868925
## reg11_gdp_eurhab_2023             0.18434332
## reg11_gdp_mio_eur_2023           -0.28938147
## reg11_gdp_eurhab_eu27_2020_2023   0.18551842
## reg11_gdp_mio_nac_2023            0.08874371
## reg11_gdp_mio_pps_eu27_2020_2023 -0.28718035
## reg11_gdp_pps_eu27_2020_hab_2023  0.21739747

# start with standard model (only structural data)
model = lm(pct_rwpop ~ pct_male + mean_age , dfa)
summary(model)

## 
## Call:
## lm(formula = pct_rwpop ~ pct_male + mean_age, data = dfa)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.084537 -0.023343 -0.004767  0.014157  0.098654 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.172536   0.090454  -1.907 0.061026 .  
## pct_male    -0.091189   0.064193  -1.421 0.160380    
## mean_age     0.005510   0.001556   3.540 0.000757 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04107 on 63 degrees of freedom
## Multiple R-squared:  0.1828, Adjusted R-squared:  0.1569 
## F-statistic: 7.048 on 2 and 63 DF,  p-value: 0.001728

# positive and highly significant age effect
# negative but non-significant effect of the male share (p = 0.16)

First, pairwise correlations were calculated between the share of right-wing populist voters and all explanatory variables. This step serves to identify initial associations and to detect potential factors influencing voting behavior.

# Step 1: Bivariate correlations with the dependent variable

biv_cor <- cor(
  dfa[, c("mean_age", "pct_male", "mean_education", "mean_income",
          "pct_good_health", "mean_cesd8", "mean_bmi")],
  dfa$pct_rwpop,
  use = "complete.obs"
)

round(biv_cor, 3)

##                   [,1]
## mean_age         0.396
## pct_male        -0.142
## mean_education   0.019
## mean_income     -0.148
## pct_good_health  0.171
## mean_cesd8      -0.178
## mean_bmi        -0.207

The bivariate results show that regional mean age has the strongest association with the share of right-wing populist voters (r = 0.40). Weaker associations are observed for BMI (r = -0.21), depression scores (CES-D8, r = -0.18), and the proportion of individuals reporting good health (r = 0.17), while education is essentially uncorrelated with the outcome (r = 0.02). Education and income are moderately correlated with each other (r = 0.51 in the predictor correlation matrix); therefore, only one of these variables is included in subsequent analyses to avoid redundancy and multicollinearity.
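
Whether a correlation of a given size is distinguishable from zero depends on the number of regions. The sketch below converts the reported r for mean age into a t statistic and derives the smallest |r| that reaches p < 0.05. It assumes n = 66 complete cases (the row count after the exclusion step); with missing values, `use = "complete.obs"` would leave fewer.

```r
# Reported correlation between mean age and the RW populist vote share
r <- 0.396
# Assumed number of complete cases (nrow(dfa) after the exclusion step)
n <- 66

# t statistic and two-sided p-value for H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

# Smallest |r| that reaches p < 0.05 with this n
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(t_crit^2 + n - 2)

round(c(t = t_stat, p = p_val, r_crit = r_crit), 3)
```

Under the assumed n, the age correlation lies clearly beyond the threshold, whereas correlations of about 0.2 in absolute value fall short of it.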

# Step 1b: Bivariate scatterplots

plot(dfa$mean_age, dfa$pct_rwpop,
     xlab = "Mean age",
     ylab = "RW populist vote share",
     main = "Age and RW vote",
     pch = 16)
abline(lm(pct_rwpop ~ mean_age, data = dfa), col = "red")

plot(dfa$pct_male, dfa$pct_rwpop,
     xlab = "Share of men",
     ylab = "RW populist vote share",
     main = "Gender and RW vote",
     pch = 16)

abline(lm(pct_rwpop ~ pct_male, data = dfa), col = "red")

# Graphical representation of the relationships

my_col <- rgb(0, 0, 1, 0.4)


# 1. BMI and Right-Wing Populist Vote
plot(dfa$mean_bmi, dfa$pct_rwpop, 
     xlab = "Mean BMI", ylab = "Share of RW Populist Voters",
     main = "BMI and Right-Wing Populist Vote", 
     pch = 16, col = my_col)
abline(lm(pct_rwpop ~ mean_bmi, data = dfa), col = "red", lwd = 2)

# 2. Depression and Right-Wing Populist Vote
plot(dfa$mean_cesd8, dfa$pct_rwpop, 
     xlab = "Mean Depression Score (CES-D8)", ylab = "Share of RW Populist Voters",
     main = "Depression and Right-Wing Populist Vote", 
     pch = 16, col = my_col)
abline(lm(pct_rwpop ~ mean_cesd8, data = dfa), col = "red", lwd = 2)

# 3. Good Health and Right-Wing Populist Vote
plot(dfa$pct_good_health, dfa$pct_rwpop, 
     xlab = "Share of Good Health", ylab = "Share of RW Populist Voters",
     main = "Good Health and Right-Wing Populist Vote", 
     pch = 16, col = my_col)
abline(lm(pct_rwpop ~ pct_good_health, data = dfa), col = "red", lwd = 2)

# 4. Education and Right-Wing Populist Vote
plot(dfa$mean_education, dfa$pct_rwpop, 
     xlab = "Mean Education Level", ylab = "Share of RW Populist Voters",
     main = "Education and Right-Wing Populist Vote", 
     pch = 16, col = my_col)
abline(lm(pct_rwpop ~ mean_education, data = dfa), col = "red", lwd = 2)

# 5. Income and Right-Wing Populist Vote
plot(dfa$mean_income, dfa$pct_rwpop, 
     xlab = "Mean Income (Deciles)", ylab = "Share of RW Populist Voters",
     main = "Income and Right-Wing Populist Vote", 
     pch = 16, col = my_col)
abline(lm(pct_rwpop ~ mean_income, data = dfa), col = "red", lwd = 2)

# 6. Age and Right-Wing Populist Vote
plot(dfa$mean_age, dfa$pct_rwpop, 
     xlab = "Mean Age", ylab = "Share of RW Populist Voters",
     main = "Age and Right-Wing Populist Vote", 
     pch = 16, col = my_col)
abline(lm(pct_rwpop ~ mean_age, data = dfa), col = "red", lwd = 2)

# 7. Gender (Male Share) and Right-Wing Populist Vote
plot(dfa$pct_male, dfa$pct_rwpop, 
     xlab = "Share of Men", ylab = "Share of RW Populist Voters",
     main = "Gender and Right-Wing Populist Vote", 
     pch = 16, col = my_col)
abline(lm(pct_rwpop ~ pct_male, data = dfa), col = "red", lwd = 2)

The scatterplots visualize the bivariate relationships and help identify potential outliers. For example, the plot for mean age shows a clear positive trend, whereas the association with the proportion of men appears much weaker. The plots provide an initial exploratory basis for the subsequent multivariate analysis.
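
The seven plot calls share one pattern, so they could also be produced in a loop. The sketch below runs on a simulated stand-in `toy` (hypothetical data, not `dfa`) and a named label vector. With the real data, `toy` would be replaced by `dfa` and the label vector extended to all predictors.

```r
# Hypothetical stand-in for dfa so the sketch runs on its own
set.seed(1)
toy <- data.frame(pct_rwpop = runif(30, 0.01, 0.4),
                  mean_age  = rnorm(30, 54, 4),
                  mean_bmi  = rnorm(30, 26, 1))

# Predictor columns mapped to axis labels
labels <- c(mean_age = "Mean Age", mean_bmi = "Mean BMI")

slopes <- numeric(0)
for (v in names(labels)) {
  plot(toy[[v]], toy$pct_rwpop,
       xlab = labels[[v]], ylab = "Share of RW Populist Voters",
       main = paste(labels[[v]], "and Right-Wing Populist Vote"),
       pch = 16, col = rgb(0, 0, 1, 0.4))
  # Bivariate regression line, built from the column name
  fit <- lm(reformulate(v, response = "pct_rwpop"), data = toy)
  abline(fit, col = "red", lwd = 2)
  slopes[v] <- coef(fit)[[2]]
}
slopes
```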

# Step 2: Correlations among the predictors
pred_vars <- dfa[, c("mean_age", "pct_male", "mean_education",
                     "mean_income", "pct_good_health", "mean_cesd8")]

cor_matrix <- cor(pred_vars, use = "complete.obs")
round(cor_matrix, 2)

##                 mean_age pct_male mean_education mean_income pct_good_health
## mean_age            1.00     0.05          -0.16       -0.35           -0.22
## pct_male            0.05     1.00           0.26        0.14           -0.24
## mean_education     -0.16     0.26           1.00        0.51            0.49
## mean_income        -0.35     0.14           0.51        1.00            0.50
## pct_good_health    -0.22    -0.24           0.49        0.50            1.00
## mean_cesd8         -0.18    -0.29          -0.51       -0.55           -0.48
##                 mean_cesd8
## mean_age             -0.18
## pct_male             -0.29
## mean_education       -0.51
## mean_income          -0.55
## pct_good_health      -0.48
## mean_cesd8            1.00

The correlation matrix shows the relationships among the explanatory variables. The largest correlations are moderate in size, for example between education and income (r = 0.51) and between income and depression scores (r = -0.55), which indicates potential redundancy. To limit multicollinearity, only education was retained as an indicator of socioeconomic status in the regression model.
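Pairwise correlations only capture redundancy between two predictors at a time. A complementary check is the variance inflation factor (VIF), which regresses each predictor on all the others. The following is a minimal sketch on simulated toy data (not the ESS regions); the helper `vif_manual()` and the simulated values are assumptions made for illustration.

```r
# Hypothetical toy data standing in for the regional predictors (not the ESS data).
set.seed(1)
n_regions <- 66
toy <- data.frame(mean_education = rnorm(n_regions))
toy$mean_income <- 0.5 * toy$mean_education + rnorm(n_regions, sd = 0.85)
toy$mean_age    <- rnorm(n_regions)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on all remaining predictors.
vif_manual <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vif_values <- vif_manual(toy)
round(vif_values, 2)
```

Values close to 1 indicate little overlap with the other predictors; common rules of thumb treat values above roughly 5 to 10 as a warning sign.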

# Step 3: Hierarchical regression analysis (nested models)

# Baseline model: Age + Gender
m1 <- lm(
  pct_rwpop ~ scale(mean_age) + scale(pct_male),
  data = dfa,
  weights = n
)
summary(m1)

## 
## Call:
## lm(formula = pct_rwpop ~ scale(mean_age) + scale(pct_male), data = dfa, 
##     weights = n)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.68414 -0.23467 -0.06358  0.13154  1.25331 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.090085   0.004277  21.063  < 2e-16 ***
## scale(mean_age)  0.019856   0.004148   4.787 1.06e-05 ***
## scale(pct_male) -0.011754   0.004566  -2.574   0.0124 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4201 on 63 degrees of freedom
## Multiple R-squared:  0.3059, Adjusted R-squared:  0.2839 
## F-statistic: 13.88 on 2 and 63 DF,  p-value: 1.01e-05

Result: The regional mean age shows a positive and statistically significant association with the share of right-wing populist voters (b = 0.020, p < 0.001). The proportion of men shows a negative and statistically significant association (b = -0.012, p = 0.012). The model explains about 31% of the variance (R² = 0.31) and serves as a reference for comparison with the extended model.
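The models pass `weights = n`, so regions with more respondents count more heavily in the fit. As a minimal sketch of what this does, the toy example below (simulated data, not the ESS regions; all values are assumptions) shows that `lm()` with weights reproduces the closed-form weighted least squares solution (X'WX)^(-1) X'Wy.

```r
# Hypothetical toy data: 20 regions with different numbers of respondents.
set.seed(7)
toy <- data.frame(
  mean_age = rnorm(20, mean = 48, sd = 3),
  n        = sample(30:300, 20)
)
# Shares from smaller regions are simulated with more noise.
toy$pct_rwpop <- 0.1 + 0.005 * (toy$mean_age - 48) +
  rnorm(20, sd = 0.2 / sqrt(toy$n))

# Weighted regression as used in the models above.
fit_w <- lm(pct_rwpop ~ mean_age, data = toy, weights = n)

# Closed-form weighted least squares: minimizes sum(n_i * residual_i^2).
X <- cbind(1, toy$mean_age)
W <- diag(toy$n)
beta_closed <- solve(t(X) %*% W %*% X, t(X) %*% W %*% toy$pct_rwpop)

all.equal(unname(coef(fit_w)), as.vector(beta_closed))
```

Because of the weighting, the residual quantiles and the residual standard error in the summary output are reported on the weighted scale.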

# Final model: Age + Gender + Education + Health
m2 <- lm(
  pct_rwpop ~ scale(mean_age) + scale(pct_male) +
    scale(mean_education) + scale(pct_good_health),
  data = dfa,
  weights = n
)
summary(m2)

## 
## Call:
## lm(formula = pct_rwpop ~ scale(mean_age) + scale(pct_male) + 
##     scale(mean_education) + scale(pct_good_health), data = dfa, 
##     weights = n)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.72772 -0.24498 -0.08542  0.18652  1.12733 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.091945   0.004077  22.549  < 2e-16 ***
## scale(mean_age)         0.023708   0.004106   5.774  2.8e-07 ***
## scale(pct_male)        -0.005136   0.005275  -0.974   0.3340    
## scale(mean_education)  -0.002035   0.005757  -0.353   0.7250    
## scale(pct_good_health)  0.018422   0.006994   2.634   0.0107 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3956 on 61 degrees of freedom
## Multiple R-squared:  0.4039, Adjusted R-squared:  0.3648 
## F-statistic: 10.33 on 4 and 61 DF,  p-value: 1.869e-06

Result: The positive effect of age remains robust and significant (b = 0.024, p < 0.001). Contrary to expectation, a higher proportion of the population reporting good health is associated with a higher share of votes for right-wing populist parties (b = 0.018, p = 0.011), although the effect is moderate. Neither education (p = 0.725) nor the proportion of men (p = 0.334) shows a significant effect once age and health are accounted for.
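Because every predictor is wrapped in `scale()`, each coefficient gives the expected change in the outcome for a one-standard-deviation increase in that predictor. If pct_rwpop is coded as a proportion between 0 and 1 (an assumption suggested by the intercept of about 0.09), the age coefficient of 0.024 corresponds to roughly 2.4 percentage points per standard deviation of regional mean age. The sketch below uses simulated toy data (not the ESS regions) to show how the standardized and raw slopes are related.

```r
# Hypothetical toy data (values are assumptions, not the ESS regions).
set.seed(42)
toy <- data.frame(mean_age = rnorm(50, mean = 48, sd = 3))
toy$pct_rwpop <- 0.09 + 0.006 * (toy$mean_age - 48) + rnorm(50, sd = 0.03)

fit_raw <- lm(pct_rwpop ~ mean_age, data = toy)         # slope per year of mean age
fit_std <- lm(pct_rwpop ~ scale(mean_age), data = toy)  # slope per standard deviation

b_raw <- unname(coef(fit_raw)["mean_age"])
b_std <- unname(coef(fit_std)["scale(mean_age)"])

# The standardized slope equals the raw slope times the predictor's SD.
all.equal(b_std, b_raw * sd(toy$mean_age))
```

Standardizing changes the unit of the coefficients but not the t values, p-values, or R² of the model.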

# Compare the nested models with an F-test
anova(m1, m2)

## Analysis of Variance Table
## 
## Model 1: pct_rwpop ~ scale(mean_age) + scale(pct_male)
## Model 2: pct_rwpop ~ scale(mean_age) + scale(pct_male) + scale(mean_education) + 
##     scale(pct_good_health)
##   Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
## 1     63 11.1163                                
## 2     61  9.5471  2    1.5692 5.0129 0.009646 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Result: The model comparison using ANOVA indicates that the extended model fits the data significantly better than the baseline model (F(2, 61) = 5.01, p = 0.010); R² rises from 0.31 to 0.40. Age remains the consistently strongest predictor of the share of right-wing populist voters.
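The F statistic in the ANOVA table can be reproduced by hand from the residual sums of squares and degrees of freedom printed above. The short sketch below only re-uses those printed numbers; the variable names are chosen here for illustration.

```r
# Values copied from the ANOVA table above.
rss1 <- 11.1163; df1 <- 63   # baseline model (m1)
rss2 <- 9.5471;  df2 <- 61   # extended model (m2)

# F = (reduction in RSS per added parameter) / (residual variance of the larger model)
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail p-value from the F distribution with (2, 61) degrees of freedom.
p_val <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 4)
```

Up to rounding of the printed sums of squares, this matches the reported F = 5.0129 and p = 0.009646.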

##################################################
##### END SEMINAR PAPER STUDENT INPUT ############
##################################################

Final Seminar Paper, Magdalena Fink

Introduction

Right-wing populist parties have gained increasing support across many European countries over the past decade. This trend has raised concerns about democratic stability and about the broader social conditions that can encourage political discontent. Research indicates that support for populist parties is often linked to perceived social decline, economic insecurity, and feelings of marginalization among certain population groups (Inglehart & Norris, 2016; Gidron & Hall, 2017).

More recently, health in particular has been discussed as a relevant determinant of political behavior. Poor mental and physical health can increase feelings of insecurity and pessimism. These feelings may lead to dissatisfaction with political institutions and increase support for exclusionary or anti-establishment parties (Marmot et al., 2012; Case & Deaton, 2020). Empirical studies also suggest that health-related disadvantage is socially patterned and closely linked with education, income, and labor market status (Wilkinson & Pickett, 2009). From a public health perspective, understanding the relationship between health indicators and political preferences is therefore highly relevant.

Laverty and Hopkinson (2025) demonstrate in an ecological study in England that poorer population health is associated with higher levels of right-wing populist voting. Building on this work, the present study aims to replicate and extend their findings at the regional level in Europe using data from the European Social Survey (ESS). The aim of this paper is to examine whether regional differences in health and socio-demographic characteristics are related to the share of right-wing populist voters across European countries. By considering both physical and mental health indicators, the study adds to existing research on the link between health inequalities and political preferences.

Literature review and Hypotheses

Previous studies have shown that health outcomes are socially patterned and closely linked to education, income, and labor market position. Poor health is more prevalent in socioeconomically disadvantaged groups, which are also more likely to experience political alienation and distrust in institutions (Kavanagh et al., 2021). Mental health, especially depressive symptoms, may help explain the link between health and political behavior. Depression is often related to pessimism, feelings of helplessness, and low trust in society, which can make people more open to populist messages. Previous studies show that regions with higher psychological distress tend to display stronger support for radical or anti-establishment parties (Gidron & Hall, 2017; Schraff, 2019).

Physical health indicators, such as obesity measured via Body Mass Index (BMI), can be interpreted as markers of cumulative disadvantage and unhealthy living conditions. Higher BMI levels are often associated with lower socioeconomic status and limited access to health-promoting resources, which may indirectly contribute to political dissatisfaction (Marmot et al., 2010). In addition to health-related factors, socio-demographic characteristics are important confounders in explaining variation in right-wing populist support, and many studies include age, education, income, and gender as predictors of radical right-wing voting behavior (Stockemer, Lentz, & Mayer, 2018).

Based on previous research, five hypotheses are formulated to examine the relationship between regional socio-demographic and health characteristics and support for right-wing populist parties. First, regions with an older population structure are expected to show higher proportions of right-wing populist voters (H1). Age has repeatedly been identified as an important determinant of voting behavior, with older populations tending to display stronger support for right-wing populist parties. Second, education is assumed to be negatively associated with right-wing populist voting. Regions with lower average levels of education are therefore expected to exhibit higher proportions of right-wing populist voters (H2). Third, physical health is hypothesized to play an important role. Regions characterized by poorer average health outcomes are expected to show higher levels of support for right-wing populist parties (H3). Fourth, mental health is considered an additional explanatory factor. Regions with higher average levels of depressive symptoms, measured by the CES-D8 scale, are expected to have higher proportions of right-wing populist voters (H4). Finally, gender composition is included as a socio-demographic factor. Regions with a higher proportion of men are expected to exhibit higher right-wing populist vote shares (H5).

Methods

The study uses data from the European Social Survey (ESS). Individual-level responses were aggregated to country-region units. Countries without comparable voting information were excluded. The dependent variable is the proportion of right-wing populist voters (pct_rwpop), constructed by identifying country-specific right-wing populist parties based on the ESS voting variables. The independent variables are:

• Average age (mean_age)
• Proportion of males (pct_male)
• Proportion reporting good or very good health (pct_good_health)
• Mean depression score based on the CES-D8 scale (mean_cesd8)
• Mean Body Mass Index (mean_bmi)
• Mean education level (mean_education)
• Mean household income (mean_income)

Regions with fewer than 30 respondents were aggregated into country-specific “other regions” to ensure sufficient sample sizes at the regional level. The analysis includes descriptive statistics, bivariate correlations, and multivariate linear regression models. In addition, graphical analyses were used to visually assess and illustrate the relationships implied by the hypotheses.
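The pooling of small regions and the aggregation to regional units can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the paper: the toy data and the column names `cntry`, `region`, `rwpop`, `age`, and `region_agg` are hypothetical.

```r
# Hypothetical individual-level toy data (not the ESS file).
ind <- data.frame(
  cntry  = c(rep("AT", 75), rep("DE", 70)),
  region = c(rep("AT1", 40), rep("AT2", 25), rep("AT3", 10),
             rep("DE1", 35), rep("DE2", 15), rep("DE3", 20)),
  rwpop  = rep(c(1, 0, 0, 0, 0), 29),   # 1 = voted for a right-wing populist party
  age    = rep(c(30, 45, 60, 75, 50), 29),
  stringsAsFactors = FALSE
)

min_n <- 30  # threshold named in the text

# Number of respondents in each respondent's own region.
key      <- paste(ind$cntry, ind$region)
region_n <- as.integer(table(key)[key])

# Regions below the threshold are pooled into a country-specific "other" unit.
ind$region_agg <- ifelse(region_n < min_n, paste0(ind$cntry, "_other"), ind$region)

# Aggregate to regional means and attach the regional sample size n.
agg_mean <- aggregate(cbind(rwpop, age) ~ cntry + region_agg, data = ind, FUN = mean)
names(agg_mean)[3:4] <- c("pct_rwpop", "mean_age")
agg_n <- aggregate(rwpop ~ cntry + region_agg, data = ind, FUN = length)
names(agg_n)[3] <- "n"
dfa_toy <- merge(agg_mean, agg_n, by = c("cntry", "region_agg"))
dfa_toy
```

The resulting column `n` is the kind of regional sample size that can later serve as a regression weight. A pooled "other" unit can itself remain below the threshold when a country has few small regions, so it is worth checking `n` after pooling.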

Results

4.1 Sample Description

The final dataset comprises 123 country–region units across several European countries. The proportion of right-wing populist voters varies considerably across regions. Substantial regional differences are observed in age structure, health indicators, educational attainment, and income levels, indicating pronounced social and health-related heterogeneity.

4.2 Bivariate Associations

4.2.1 Bivariate Correlations

Pairwise correlations between the share of right-wing populist voters and the potential explanatory variables were examined first. Regional mean age shows a relatively strong positive association with right-wing populist voting. Weaker positive associations are observed for BMI and depression scores (CES-D8), while good self-rated health and educational attainment are negatively associated. Education and income are moderately correlated (r = 0.51); to avoid redundancy and multicollinearity, only education is retained for the multivariate analysis.

4.2.2 Scatterplots

Scatterplots were used to visualize the bivariate relationships and to detect potential outliers. The plot for mean age displays a clear positive trend, whereas the association with the proportion of men is weaker and less consistent. These plots provide an initial exploratory basis for subsequent hypothesis testing in the multivariate analysis.

4.3 Predictor Correlations

4.4 Multivariate Regression Analysis

4.4.1 Baseline Model: Age + Gender

A baseline regression model including regional mean age and the proportion of men was estimated. Age has a positive and statistically significant association with the share of right-wing populist voters (b = 0.020, p < 0.001). The proportion of men shows a negative and statistically significant association (b = -0.012, p = 0.012). The model explains about 31% of the variance and provides a reference for comparison with the extended model.

4.4.2 Final Model: Age + Gender + Education + Health

The extended model additionally includes education and the proportion of individuals reporting good or very good health. All predictors were standardized. The positive effect of age remains robust and significant (b = 0.024, p < 0.001). Contrary to H3, a higher proportion of individuals reporting good health is associated with a higher share of votes for right-wing populist parties (b = 0.018, p = 0.011), although the effect is moderate. Neither education (p = 0.725) nor the proportion of men (p = 0.334) has a significant effect once age and health are accounted for.

4.4.3 Model Comparison

An F-test comparing the two nested models indicates that the extended model fits the data significantly better than the baseline model (F(2, 61) = 5.01, p = 0.010); R² increases from 0.31 to 0.40. Age remains the consistently strongest predictor of the share of right-wing populist voters.

4.5 Summary of Hypotheses Testing

The following section summarizes the results of the empirical hypothesis testing, highlighting the extent to which each proposed relationship is supported by the statistical analyses.

H1 (Age): Strongly supported. Regions with higher average age consistently exhibit higher shares of right-wing populist voters.

H2 (Education): Limited support. The negative association at the bivariate level is not significant in the multivariate model.

H3 (Health): Not supported in the multivariate model. Good self-rated health is negatively associated with right-wing populist voting at the bivariate level, but in the extended model a higher share reporting good health is associated with a higher share of right-wing populist voters.

H4 (Depression): Supported at the bivariate level, but not in the multivariate analysis.

H5 (Gender): Not supported. The proportion of men shows a negative association in the baseline model, which is no longer significant in the extended model.

Discussion

The results of this study support the proposed hypotheses only in part. At the bivariate level, regions with poorer self-rated health and higher depression scores show higher shares of right-wing populist voters. This pattern aligns with prior research suggesting that health-related disadvantage can contribute to political dissatisfaction and increase support for populist parties. In the extended regression model, however, the share reporting good health is positively associated with right-wing populist voting once age, gender, and education are controlled, so the bivariate pattern does not carry over to the multivariate analysis.

The positive bivariate association between BMI and populist voting may reflect broader structural inequalities, including access to health resources and socioeconomic disadvantage. The observed bivariate relationship with depressive symptoms points to mental health as a potentially important political determinant, indicating that psychological well-being may influence political attitudes and voting behavior. Education is negatively associated with right-wing populist voting at the bivariate level but is not significant in the multivariate analysis; income was excluded from the models because of its correlation with education.

Several limitations should be acknowledged. First, the analysis relies on aggregated cross-sectional data, which restricts causal inference. Second, regional aggregation may obscure individual-level mechanisms and heterogeneity. Finally, unmeasured contextual variables could also contribute to the observed regional differences. Despite these limitations, the findings indicate an association between health and political behavior at the regional level, although its direction depends on the model specification. Future research could use longitudinal or individual-level data to better disentangle causal pathways and explore the interaction between health, socioeconomic status, and political preferences.

Conclusion

This study extends previous findings by showing that regional health indicators are associated with right-wing populist voting in Europe. At the bivariate level, poorer self-rated health and higher depression scores go along with higher populist vote shares. In the multivariate model, self-rated health remains a significant predictor after adjusting for age, gender, and education, but the direction of the adjusted association runs counter to expectations. Regional age structure is the most consistent predictor. From a public health perspective, improving population health may have broader societal benefits beyond individual well-being, potentially fostering political stability and social cohesion. Whether addressing health inequalities would also reduce political polarization and populist support cannot be established with the present cross-sectional data.

References

Case, A., & Deaton, A. (2020). Deaths of despair and the future of capitalism. Princeton University Press.

Gidron, N., & Hall, P. A. (2017). The politics of social status: Economic and cultural roots of the populist right. The British Journal of Sociology, 68(S1), S57–S84. https://doi.org/10.1111/1468-4446.12319

Inglehart, R., & Norris, P. (2016). Trump, Brexit, and the rise of populism: Economic have-nots and cultural backlash (Faculty Research Working Paper Series RWP16-026). Harvard Kennedy School.

Kavanagh, N. M., Menon, A., & Heinze, J. E. (2021). Does health vulnerability predict voting for right-wing populist parties in Europe? American Political Science Review, 115(3), 1104–1109. https://doi.org/10.1017/S0003055421000265

Laverty, A. A., & Hopkinson, N. S. (2025). What is the relationship between population health and voting patterns? An ecological study in England. BMJ Open Respiratory Research, 12, e003526. https://doi.org/10.1136/bmjresp-2025-003526

Marmot, M., Allen, J., Bell, R., Bloomer, E., & Goldblatt, P. (2012). WHO European review of social determinants of health and the health divide. The Lancet, 380(9846), 1011–1029. https://doi.org/10.1016/S0140-6736(12)61228-8

Marmot, M., Allen, J., Goldblatt, P., Boyce, T., & McNeish, D. (2010). Fair society, healthy lives: The Marmot review. Institute of Health Equity. https://www.instituteofhealthequity.org/resources-reports/fair-society-healthy-lives-the-marmot-review

Schraff, D. (2019). Political trust during the COVID-19 pandemic: Rally around the flag or lockdown effects? European Journal of Political Research, 60(4), 1007–1017. https://doi.org/10.1111/1475-6765.12425

Stockemer, D., Lentz, T., & Mayer, D. (2018). Individual predictors of the radical right-wing vote in Europe: A meta-analysis of articles in peer-reviewed journals (1995–2016). Government and Opposition, 53(3), 569–593. https://doi.org/10.1017/gov.2018.2

Wilkinson, R. G., & Pickett, K. E. (2009). The spirit level: Why more equal societies almost always do better. Allen Lane.

Magdalena Fink Seminararbeit

2026-02-05

R Markdown