Weronika Nitecka_Hmwk_1QM2

Logistic regressions - HMWK 1QM2 - WN

Note on the use of AI: I used Gemini o3 Thinking and package documentation to understand how to clean variables in bulk, for the step described as: Clean union_member, diploma, religion, and income by recoding non-responses as NA.

Loading all necessary libraries:

library(ggeffects)
Warning: package 'ggeffects' was built under R version 4.4.3
library(ggplot2)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rio)
library(stargazer)

Please cite as: 

 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 
  1. Downloading the dataset and importing it to the local R environment
df <- import("https://github.com/scpo-quantimethods/quantitative_methods_two/raw/main/sessions/session2/data/poll2024.RData")
Warning: Missing `trust` will be set to FALSE by default for RData in 2.0.0.
  2. Determining the dependent/independent variables.

Since we are interested in how voting for the radical right varies by gender, let’s check the available variables with glimpse(). The independent variable (X) is gender: this is the variable whose categories we compare. The dependent variable (Y) is voting for the radical right, which we expect to vary with gender.

X: Gender is a categorical variable, coded as 1, 2, or 3 in the data.

Y: Vote choice is a categorical variable with a finite number of categories if we use second_round_vote, or a binary (0/1) variable if we use voteRN, which is created later on.

glimpse(df)
Rows: 5,109
Columns: 139
$ respondent_id                       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,…
$ gender                              <dbl> 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2…
$ birth_year                          <dbl> 1981, 1991, 1997, 1965, 1989, 1941…
$ region                              <dbl> 44, 11, 11, 11, 44, 27, 75, 32, 94…
$ commune                             <dbl> 1, 4, 2, 5, 3, 5, 4, 3, 2, 5, 2, 2…
$ diploma                             <dbl> 15, 19, 13, 15, 7, 6, 4, 10, 13, 8…
$ employment_status                   <dbl> 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 5, 1…
$ socio_professional_group            <dbl> 6666, 6666, 6666, 6666, 6666, 4, 6…
$ student_job                         <dbl> 6666, 6666, 6666, 6666, 6666, 6666…
$ previous_job                        <dbl> 6666, 6666, 6666, 6666, 6666, 6666…
$ socio_professional_category         <dbl> 34, 34, 31, 46, 54, 6666, 45, 37, …
$ professional_status                 <dbl> 2, 2, 1, 4, 3, 2, 2, 4, 4, 4, 2, 4…
$ company_size_index                  <dbl> 6666, 6666, 1, 6666, 6666, 6666, 6…
$ public_sector_qualification         <dbl> 1, 1, 6666, 6666, 6666, 2, 3, 6666…
$ private_sector_qualification        <dbl> 6666, 6666, 6666, 1, 4, 6666, 6666…
$ income                              <dbl> 11, 11, 10, 10, 7, 8, 4, 10, 9, 9,…
$ household                           <dbl> 4, 2, 4, 2, 2, 1, 1, 2, 4, 3, 1, 1…
$ children                            <dbl> 0, 0, 2, 0, 0, 0, 0, 1, 2, 0, 0, 0…
$ political_interest                  <dbl> 4, 4, 3, 3, 4, 4, 3, 2, 4, 3, 1, 3…
$ left_right_scale                    <dbl> 8, 2, 2, 3, 6, 4, 7, 5, 2, 5, 3, 2…
$ first_round_participation           <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4…
$ first_round_vote                    <dbl> 2, 2, 2, 5, 1, 3, 1, 3, 2, 3, 2, 2…
$ second_round_participation          <dbl> 4, 4, 7, 4, 4, 4, 4, 4, 4, 7, 4, 4…
$ second_round_configuration          <dbl> 2, 11, 6666, 4, 2, 1, 3, 1, 3, 666…
$ second_round_vote                   <dbl> 6, 2, 6666, 3, 1, 6, 9999, 2, 4, 6…
$ european_election_participation     <dbl> 4, 4, 4, 4, 4, 4, 4, 2, 4, 4, 4, 4…
$ european_election_vote              <dbl> 4, 4, 4, 2, 1, 2, 1, 6666, 4, 2, 9…
$ presidential_election_participation <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 4…
$ presidential_election_vote          <dbl> 3, 3, 3, 1, 2, 1, 2, 1, 3, 1, 6666…
$ leaflets                            <dbl> 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2…
$ meetings                            <dbl> 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2…
$ facebook                            <dbl> 3, 3, 2, 9999, 2, 9999, 2, 1, 1, 3…
$ x_twitter                           <dbl> 3, 3, 3, 1, 1, 9999, 2, 2, 3, 3, 3…
$ youtube                             <dbl> 1, 3, 2, 9999, 2, 9999, 2, 2, 1, 3…
$ instagram                           <dbl> 3, 3, 3, 9999, 2, 9999, 2, 2, 3, 3…
$ snapchat                            <dbl> 3, 3, 3, 9999, 3, 9999, 3, 2, 3, 3…
$ tiktok                              <dbl> 3, 3, 3, 9999, 3, 9999, 3, 3, 3, 3…
$ twitch                              <dbl> 2, 3, 3, 9999, 3, 9999, 3, 3, 3, 3…
$ whatsapp                            <dbl> 1, 2, 2, 9999, 2, 9999, 2, 2, 3, 3…
$ contact                             <dbl> 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ party                               <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ association                         <dbl> 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 2…
$ petition                            <dbl> 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1…
$ protest                             <dbl> 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2…
$ badge                               <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2…
$ pensions                            <dbl> 1, 1, 2, 3, 3, 4, 2, 3, 2, 4, 3, 2…
$ prioritize_ecology                  <dbl> 3, 1, 2, 2, 3, 5, 5, 2, 1, 4, 5, 2…
$ too_many_immigrants                 <dbl> 3, 4, 5, 3, 1, 3, 1, 3, 4, 2, 2, 4…
$ easy_for_men                        <dbl> 4, 2, 3, 1, 4, 2, 2, 2, 1, 3, 3, 2…
$ too_much_feminism                   <dbl> 2, 4, 5, 4, 2, 2, 1, 2, 4, 2, 5, 3…
$ adoption                            <dbl> 3, 1, 1, 1, 3, 1, 2, 3, 1, 4, 3, 1…
$ change_civil_status                 <dbl> 4, 1, 5, 1, 4, 2, 4, 3, 1, 2, 4, 2…
$ palestine                           <dbl> 1, 1, 2, 3, 4, 5, 4, 2, 1, 5, 5, 2…
$ eu_membership                       <dbl> 3, 1, 1, 1, 3, 1, 3, 1, 2, 1, 4, 1…
$ union_member                        <dbl> 1, 3, 3, 3, 1, 2, 2, 3, 3, 1, 3, 3…
$ cfdt                                <dbl> 0, 6666, 6666, 6666, 0, 0, 0, 6666…
$ cgt                                 <dbl> 0, 6666, 6666, 6666, 0, 0, 0, 6666…
$ fo                                  <dbl> 0, 6666, 6666, 6666, 0, 0, 0, 6666…
$ fsu                                 <dbl> 1, 6666, 6666, 6666, 0, 0, 0, 6666…
$ cfecgc                              <dbl> 0, 6666, 6666, 6666, 0, 0, 0, 6666…
$ cftc                                <dbl> 0, 6666, 6666, 6666, 0, 0, 1, 6666…
$ unsa                                <dbl> 0, 6666, 6666, 6666, 0, 0, 0, 6666…
$ sud                                 <dbl> 0, 6666, 6666, 6666, 0, 0, 0, 6666…
$ union_representative                <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2…
$ union_services                      <dbl> 2, 1, 2, 3, 2, 2, 2, 2, 1, 2, 2, 2…
$ facilitate_layoffs                  <dbl> 4, 4, 4, 3, 3, 5, 4, 3, 3, 3, 3, 3…
$ increase_salaries                   <dbl> 1, 1, 2, 2, 1, 3, 1, 3, 1, 5, 2, 2…
$ find_job                            <dbl> 3, 3, 3, 2, 2, 2, 3, 2, 4, 1, 3, 3…
$ rsa_work                            <dbl> 4, 3, 3, 2, 1, 2, 1, 2, 4, 1, 2, 4…
$ contract                            <dbl> 1, 1, 6666, 2, 2, 1, 1, 2, 2, 2, 3…
$ size                                <dbl> 1, 4, 1, 2, 5, 6, 5, 1, 2, 5, 2, 2…
$ time                                <dbl> 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 3, 1…
$ competition                         <dbl> 4, 3, 2, 3, 4, 4, 3, 4, 2, 1, 4, 4…
$ repetitive                          <dbl> 2, 2, 1, 2, 1, 9999, 1, 2, 1, 2, 1…
$ tiring                              <dbl> 1, 2, 1, 2, 1, 9999, 1, 2, 1, 1, 1…
$ autonomous                          <dbl> 1, 1, 1, 1, 1, 9999, 1, 1, 1, 1, 3…
$ rewarding                           <dbl> 1, 1, 1, 1, 2, 9999, 2, 1, 1, 1, 2…
$ dangerous                           <dbl> 2, 2, 2, 2, 1, 9999, 1, 2, 2, 2, 2…
$ interesting                         <dbl> 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2…
$ stressful                           <dbl> 1, 2, 2, 1, 1, 9999, 1, 1, 1, 1, 1…
$ pressure                            <dbl> 1, 1, 2, 2, 1, 9999, 1, 1, 1, 1, 1…
$ job_security                        <dbl> 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 3, 3…
$ remuneration                        <dbl> 2, 1, 2, 1, 2, 2, 2, 1, 1, 1, 3, 2…
$ atmosphere                          <dbl> 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 2, 1…
$ career_development                  <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2…
$ initiatives                         <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 3, 2…
$ deadlines                           <dbl> 2, 1, 1, 1, 2, 9999, 2, 1, 1, 2, 3…
$ learning                            <dbl> 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 3, 2…
$ useful                              <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 3, 2…
$ colleague_support                   <dbl> 1, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1…
$ management_support                  <dbl> 2, 1, 2, 1, 2, 2, 2, 1, 1, 1, 2, 1…
$ union_support                       <dbl> 1, 1, 1, 2, 1, 1, 2, 3, 3, 1, 3, 2…
$ employer_recognition                <dbl> 2, 2, 1, 1, 2, 9999, 2, 4, 1, 1, 2…
$ colleague_recognition               <dbl> 1, 1, 1, 1, 1, 9999, 2, 1, 1, 1, 2…
$ client_recognition                  <dbl> 2, 1, 1, 1, 1, 9999, 2, 1, 1, 1, 1…
$ company_recognition                 <dbl> 2, 1, 1, 1, 2, 9999, 2, 1, 1, 3, 2…
$ employer_relationship               <dbl> 5, 5, 9999, 4, 2, 9999, 5, 4, 3, 4…
$ management_relationship             <dbl> 5, 4, 9999, 4, 2, 9999, 4, 4, 3, 4…
$ colleague_relationship              <dbl> 6, 4, 9999, 4, 4, 9999, 4, 4, 3, 4…
$ user_relationship                   <dbl> 2, 3, 9999, 4, 4, 9999, 4, 3, 3, 4…
$ participate_in_decisions            <dbl> 2, 2, 2, 1, 4, 9999, 3, 1, 1, 1, 3…
$ colleague_discussions               <dbl> 1, 2, 2, 1, 1, 9999, 2, 2, 2, 2, 3…
$ socialize_with_colleagues           <dbl> 1, 2, 1, 1, 2, 6666, 2, 1, 2, 3, 6…
$ union_influence                     <dbl> 3, 3, 4, 6, 2, 9999, 4, 6, 5, 3, 6…
$ staff_representative_mandate        <dbl> 3, 3, 3, 3, 3, 9999, 3, 3, 3, 3, 3…
$ professional_vote_participation     <dbl> 1, 1, 3, 3, 1, 9999, 1, 3, 3, 1, 3…
$ strike_participation                <dbl> 1, 2, 1, 3, 1, 9999, 1, 3, 3, 3, 3…
$ collective_action_participation     <dbl> 3, 1, 1, 3, 3, 9999, 1, 3, 3, 1, 2…
$ professional_vote_choice            <dbl> 5, 2, 6666, 6666, 9, 6666, 6, 6666…
$ political_discussions_at_work       <dbl> 2, 2, 2, 1, 3, 2, 3, 1, 2, 4, 4, 3…
$ housing                             <dbl> 2, 2, 9999, 1, 2, 1, 2, 2, 3, 5, 3…
$ epices1                             <dbl> 1, 1, 9999, 2, 2, 1, 2, 1, 1, 1, 1…
$ epices2                             <dbl> 2, 1, 9999, 1, 1, 1, 2, 1, 1, 2, 1…
$ epices3                             <dbl> 1, 1, 9999, 1, 1, 1, 2, 1, 1, 1, 1…
$ epices4                             <dbl> 1, 1, 9999, 1, 1, 1, 2, 2, 1, 1, 1…
$ epices5                             <dbl> 1, 1, 9999, 1, 1, 1, 2, 1, 1, 2, 1…
$ epices6                             <dbl> 2, 2, 9999, 2, 2, 2, 2, 1, 2, 2, 1…
$ epices7                             <dbl> 1, 1, 9999, 1, 1, 1, 1, 1, 1, 1, 1…
$ epices8                             <dbl> 2, 2, 9999, 2, 2, 2, 2, 1, 1, 2, 2…
$ epices9                             <dbl> 1, 1, 9999, 1, 1, 1, 2, 1, 1, 1, 1…
$ disability                          <dbl> 3, 3, 9999, 3, 1, 2, 3, 3, 3, 3, 2…
$ nationality                         <dbl> 1, 1, 9999, 1, 1, 1, 1, 1, 1, 1, 2…
$ parental_nationality                <dbl> 2, 2, 9999, 2, 2, 2, 2, 2, 1, 2, 1…
$ religious_affiliation               <dbl> 9, 9, 9999, 9, 1, 2, 1, 9, 9, 9, 1…
$ religious_practice_frequency        <dbl> 9999, 9999, 9999, 9999, 5, 4, 5, 9…
$ heterosexual                        <dbl> 1, 1, 9999, 2, 1, 1, 1, 1, 1, 1, 1…
$ homosexual                          <dbl> 2, 2, 9999, 1, 2, 2, 2, 2, 2, 2, 2…
$ bisexual                            <dbl> 2, 2, 9999, 2, 2, 2, 2, 2, 2, 2, 2…
$ transgender                         <dbl> 2, 2, 9999, 2, 2, 9999, 2, 2, 2, 2…
$ origin_discrimination               <dbl> 2, 2, 9999, 2, 2, 9999, 2, 2, 2, 2…
$ disability_discrimination           <dbl> 2, 2, 9999, 2, 2, 9999, 2, 2, 2, 2…
$ color_discrimination                <dbl> 1, 2, 9999, 2, 2, 9999, 2, 2, 2, 2…
$ sex_discrimination                  <dbl> 2, 2, 9999, 2, 2, 9999, 2, 2, 2, 1…
$ sexuality_discrimination            <dbl> 2, 2, 9999, 2, 2, 9999, 2, 2, 2, 2…
$ religion_discrimination             <dbl> 2, 2, 9999, 2, 2, 1, 2, 2, 2, 2, 2…
$ perceived_origin                    <dbl> 6666, 6666, 6666, 6666, 6666, 6666…
$ epices                              <dbl> 7.100000e+00, -3.552714e-15, 6.686…
$ PCS                                 <dbl> 42, 34, 43, 37, 46, 43, 45, 37, 37…
$ weight                              <dbl> 0.3102469, 0.2236396, 0.3400078, 0…

Let’s drop all the variables except second_round_vote, gender, union_member, diploma, religious_affiliation, income and birth_year.

df <- df %>% select(second_round_vote, gender, union_member, diploma, religious_affiliation, income, birth_year)
  3. Inspecting the dataset and creating new variables (voteRN and age)
attributes(df$second_round_vote)
$label
[1] "Pour quel candidat avez-vous voté lors de ce second tour ?"

$format.stata
[1] "%10.0g"

$labels
 Un candidat du Rassemblement national Un candidat du nouveau Front populaire 
                                     1                                      2 
                Un candidat d'Ensemble           Un candidat des Républicains 
                                     3                                      4 
                     Un autre candidat            Vous avez voté blanc ou nul 
                                     5                                      6 
           Question non posée (filtre)                            Non réponse 
                                  6666                                   9999 
  • voteRN = 1 if the respondent voted for RN in the second round, 0 otherwise (NA if no answer)
table(df$second_round_vote)

   1    2    3    4    5    6 6666 9999 
1365 1042  803  260   73  375 1160   31 
df <- df %>% mutate(voteRN = case_when(
    second_round_vote == 1 ~ 1,
    second_round_vote %in% c(2, 3, 4, 5, 6) ~ 0,
    second_round_vote == 6666 ~ NA_real_,
    second_round_vote == 9999 ~ NA_real_,
    TRUE ~ NA_real_
  ))
  • Using 2026 as the reference year, create a variable age.
df <- df %>% mutate(age = 2026 - birth_year)
  4. Define a variable female = 1 if the respondent identifies as a woman, 0 if male, and NA_integer_ otherwise (too few respondents in other categories or no answer).
table(df$gender)

   1    2    3 
2428 2658   23 
df <- df %>% mutate(female = case_when(
    gender == 1 ~ 1,
    gender == 2 ~ 0,
    gender == 3 ~ NA_real_,
    TRUE ~ NA_real_
  ))
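Before recoding, it can be worth confirming which numeric code stands for which gender, in the same way attributes() was used for second_round_vote. The following is a minimal sketch on a toy vector: the label names are placeholders, not the actual coding of gender in this dataset, and code_table is a hypothetical helper introduced here for illustration.

```r
# Hypothetical helper: tabulate a labelled numeric vector next to its value labels.
code_table <- function(x) {
  lab <- attr(x, "labels")  # named numeric vector: names are labels, values are codes
  counts <- table(factor(x, levels = unname(lab)))
  data.frame(code = unname(lab), label = names(lab), n = as.vector(counts))
}

# Toy data with placeholder labels (NOT the real gender labels).
toy <- structure(
  c(1, 2, 2, 1, 3),
  labels = c("placeholder A" = 1, "placeholder B" = 2, "placeholder C" = 3)
)
codes <- code_table(toy)
print(codes)
```

On the real data, code_table(df$gender) would show whether code 1 is the category that female = 1 is meant to capture, provided the column still carries its labels attribute.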
  • Clean union_member, diploma, religion, and income by recoding non-responses as NA.
variables_to_clean <- c("union_member", "diploma", "religious_affiliation", "income")

df <- df %>% mutate(across(all_of(variables_to_clean), ~ case_when(
  . %in% c(6666, 9999) ~ NA_real_,
  TRUE ~ as.numeric(.)
)))
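The recoding logic can be sanity-checked on a small toy data frame. The sketch below uses base R instead of across(), with made-up values, and then confirms that no sentinel codes remain in the cleaned columns. The names toy, sentinels, and cols are introduced here for illustration only.

```r
sentinels <- c(6666, 9999)  # codes treated as non-response in this document

# Made-up values, only to exercise the recoding.
toy <- data.frame(
  union_member = c(1, 3, 9999, 2),
  income       = c(11, 6666, 7, 9999),
  birth_year   = c(1981, 1991, 1997, 1965)  # deliberately left untouched
)

cols <- c("union_member", "income")

# Replace sentinel codes with NA in the selected columns only.
toy[cols] <- lapply(toy[cols], function(x) replace(x, x %in% sentinels, NA))

# Checks: no sentinel left, and count of NAs per cleaned column.
remaining <- sapply(toy[cols], function(x) sum(x %in% sentinels))
n_missing <- colSums(is.na(toy[cols]))
print(remaining)
print(n_missing)
```

Running the same two checks on the real df after the mutate() call would confirm that the four cleaned variables no longer contain 6666 or 9999.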
  • Create a nominal variable cohort by grouping ages into decades: 20–29, 30–39, 40–49, …, 80+.
df <- df %>% mutate(cohort = case_when(
  age >= 20 & age <= 29 ~ "20-29",
  age >= 30 & age <= 39 ~ "30-39",
  age >= 40 & age <= 49 ~ "40-49",
  age >= 50 & age <= 59 ~ "50-59",
  age >= 60 & age <= 69 ~ "60-69",
  age >= 70 & age <= 79 ~ "70-79",
  age >= 80 ~ "80+",
  TRUE ~ NA_character_
))
df$cohort <- factor(
df$cohort,
levels = c("20-29","30-39","40-49","50-59","60-69","70-79","80+"),
ordered = FALSE
)
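As an alternative to case_when() followed by factor(), base R's cut() can build the same cohorts in one call. This is a sketch on made-up ages, assuming ages are whole numbers; age_toy and cohort_toy are illustrative names.

```r
age_toy <- c(19, 20, 29, 30, 59, 79, 80, 95)  # made-up ages

cohort_toy <- cut(
  age_toy,
  breaks = c(20, 30, 40, 50, 60, 70, 80, Inf),
  labels = c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80+"),
  right  = FALSE  # intervals are [20, 30), [30, 40), ..., [80, Inf)
)

# Ages below 20 fall outside every interval and become NA.
print(data.frame(age = age_toy, cohort = cohort_toy))
```

cut() returns an unordered factor with the levels in the order given, which matches the factor built above.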

5. Plot a histogram of the age variable with 10 breaks. Identify the most frequent decade in the dataset.

hist(df$age, breaks = 10)

table(df$cohort)

20-29 30-39 40-49 50-59 60-69 70-79   80+ 
  425   676   804  1144   900   916   244 

The 50-59 decade is the most frequent according to the histogram; table() confirms that it has 1144 entries.

  6. Run a logistic regression predicting the probability of voting for RN based on gender and age.
model1 <- glm(
data = df,
voteRN ~ age + female,
family = "binomial"
)
summary(model1)

Call:
glm(formula = voteRN ~ age + female, family = "binomial", data = df)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.088474   0.129171  -8.427  < 2e-16 ***
age          0.006902   0.002100   3.287  0.00101 ** 
female       0.181010   0.068116   2.657  0.00788 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5042.1  on 3897  degrees of freedom
Residual deviance: 5026.4  on 3895  degrees of freedom
  (1211 observations deleted due to missingness)
AIC: 5032.4

Number of Fisher Scoring iterations: 4

Interpret the results:

  • How does a one-year increase in age affect the odds of voting for RN?

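The logistic regression estimates the coefficient on age at 0.006902 on the log-odds scale. Exponentiating gives exp(0.006902) ≈ 1.007, meaning that, holding gender constant, each additional year of age multiplies the odds of voting for RN by 1.007, an increase of about 0.7%.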

  • How does being a woman affect the odds ratio?

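Being a woman (female = 1) has an estimated coefficient of 0.181010 on the log-odds scale. The corresponding odds ratio is exp(0.181010) ≈ 1.198, so, holding age constant, the odds of voting for RN are about 19.8% higher for women than for men.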
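The conversion from log-odds coefficients to odds ratios and percentage changes can be reproduced by hand. The sketch below copies the estimates from the summary(model1) output; the age-50 profile used for the predicted probabilities is a hypothetical example, not something taken from the assignment.

```r
# Coefficients copied from summary(model1).
b0       <- -1.088474
b_age    <-  0.006902
b_female <-  0.181010

# Odds ratios and the implied percentage change in the odds.
or_age     <- exp(b_age)
or_female  <- exp(b_female)
pct_age    <- (or_age - 1) * 100
pct_female <- (or_female - 1) * 100

# Effect of a ten-year age difference: coefficients add on the log-odds scale.
or_age_10 <- exp(10 * b_age)

# Predicted probabilities for a hypothetical 50-year-old, by value of female.
p_female_50 <- plogis(b0 + b_age * 50 + b_female * 1)
p_male_50   <- plogis(b0 + b_age * 50)

print(round(c(or_age = or_age, or_female = or_female, or_age_10 = or_age_10), 3))
print(round(c(p_female_50 = p_female_50, p_male_50 = p_male_50), 3))
```

Note that the odds ratio is constant across ages, while the gap in predicted probabilities depends on the profile chosen.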

  7. Include as.factor(diploma) in the regression.
model2 <- glm(
data = df,
voteRN ~ age + female + as.factor(diploma),
family = "binomial"
)
summary(model2)

Call:
glm(formula = voteRN ~ age + female + as.factor(diploma), family = "binomial", 
    data = df)

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -0.238619   0.198891  -1.200 0.230238    
age                  -0.003068   0.002369  -1.295 0.195211    
female                0.068059   0.070872   0.960 0.336900    
as.factor(diploma)2   0.252021   0.240866   1.046 0.295416    
as.factor(diploma)3   0.173461   0.183273   0.946 0.343914    
as.factor(diploma)4   0.273481   0.152113   1.798 0.072196 .  
as.factor(diploma)5   0.441467   0.171775   2.570 0.010169 *  
as.factor(diploma)6  -0.343566   0.194646  -1.765 0.077549 .  
as.factor(diploma)7  -0.460366   0.259491  -1.774 0.076045 .  
as.factor(diploma)8  -0.030958   0.171539  -0.180 0.856783    
as.factor(diploma)9  -0.653184   0.190811  -3.423 0.000619 ***
as.factor(diploma)10 -1.045290   0.232529  -4.495 6.95e-06 ***
as.factor(diploma)11 -0.416329   0.278993  -1.492 0.135632    
as.factor(diploma)12 -0.956555   0.269338  -3.551 0.000383 ***
as.factor(diploma)13 -0.855077   0.199986  -4.276 1.91e-05 ***
as.factor(diploma)14 -0.704965   0.241519  -2.919 0.003513 ** 
as.factor(diploma)15 -0.819945   0.206342  -3.974 7.08e-05 ***
as.factor(diploma)16 -1.425567   0.500326  -2.849 0.004382 ** 
as.factor(diploma)17 -0.653381   0.339124  -1.927 0.054021 .  
as.factor(diploma)18 -1.402429   0.297139  -4.720 2.36e-06 ***
as.factor(diploma)19 -1.634850   0.348500  -4.691 2.72e-06 ***
as.factor(diploma)20 -0.913743   0.400921  -2.279 0.022661 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5042.1  on 3897  degrees of freedom
Residual deviance: 4795.7  on 3876  degrees of freedom
  (1211 observations deleted due to missingness)
AIC: 4839.7

Number of Fisher Scoring iterations: 4
  8. Compare models using stargazer() by adapting the code below, which exponentiates the coefficients from your logistic regressions and adds the exponentiated values to the stargazer table:
exp_coef1 <- exp(coef(model1))
exp_coef2 <- exp(coef(model2))
exp_ci1 <- exp(confint(model1))
Waiting for profiling to be done...
exp_ci2 <- exp(confint(model2))
Waiting for profiling to be done...
stargazer(model1, model2,
type = "text",
coef = list(exp_coef1, exp_coef2),
ci = TRUE,
ci.custom = list(exp_ci1, exp_ci2),
p.auto = FALSE,
title = "Logistic Regression (Odds Ratios)")

Logistic Regression (Odds Ratios)
==================================================
                          Dependent variable:     
                     -----------------------------
                                voteRN            
                          (1)            (2)      
--------------------------------------------------
age                     1.007***        0.997     
                     (1.003, 1.011) (0.992, 1.002)
                                                  
female                  1.198***        1.070     
                     (1.049, 1.370) (0.932, 1.230)
                                                  
as.factor(diploma)2                     1.287     
                                    (0.801, 2.063)
                                                  
as.factor(diploma)3                     1.189     
                                    (0.831, 1.705)
                                                  
as.factor(diploma)4                     1.315*    
                                    (0.977, 1.774)
                                                  
as.factor(diploma)5                    1.555**    
                                    (1.112, 2.181)
                                                  
as.factor(diploma)6                     0.709*    
                                    (0.483, 1.037)
                                                  
as.factor(diploma)7                     0.631*    
                                    (0.376, 1.042)
                                                  
as.factor(diploma)8                     0.970     
                                    (0.693, 1.358)
                                                  
as.factor(diploma)9                    0.520***   
                                    (0.357, 0.755)
                                                  
as.factor(diploma)10                   0.352***   
                                    (0.221, 0.550)
                                                  
as.factor(diploma)11                    0.659     
                                    (0.377, 1.129)
                                                  
as.factor(diploma)12                   0.384***   
                                    (0.223, 0.643)
                                                  
as.factor(diploma)13                   0.425***   
                                    (0.286, 0.628)
                                                  
as.factor(diploma)14                   0.494***   
                                    (0.305, 0.787)
                                                  
as.factor(diploma)15                   0.440***   
                                    (0.293, 0.658)
                                                  
as.factor(diploma)16                   0.240***   
                                    (0.080, 0.590)
                                                  
as.factor(diploma)17                    0.520*    
                                    (0.260, 0.991)
                                                  
as.factor(diploma)18                   0.246***   
                                    (0.134, 0.431)
                                                  
as.factor(diploma)19                   0.195***   
                                    (0.094, 0.372)
                                                  
as.factor(diploma)20                   0.401**    
                                    (0.173, 0.848)
                                                  
Constant                0.337***        0.788     
                     (0.261, 0.433) (0.533, 1.162)
                                                  
--------------------------------------------------
Observations             3,898          3,898     
Log Likelihood         -2,513.209     -2,397.828  
Akaike Inf. Crit.      5,032.417      4,839.656   
==================================================
Note:                  *p<0.1; **p<0.05; ***p<0.01
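The intervals in the table come from confint(), which profiles the likelihood. As a rough cross-check, a Wald interval can be computed from the estimate and standard error alone and then exponentiated. The sketch below does this for the female coefficient in model 1, with values copied from summary(model1); it is an approximation and need not match the profile interval exactly.

```r
# Estimate and standard error for `female`, copied from summary(model1).
est <- 0.181010
se  <- 0.068116

z <- qnorm(0.975)  # two-sided 95% critical value

# Build the interval on the log-odds scale, then exponentiate the endpoints.
wald_ci <- exp(est + c(-1, 1) * z * se)
print(round(wald_ci, 3))
```

The interval must be built on the log-odds scale first: exponentiating the standard error itself would not give a valid standard error for the odds ratio.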
  • Interpret the results: How does attending a “grande école” affect the odds of voting for the RN? Is the coefficient on gender still significant ?

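Odds ratios are always positive, so the relevant comparison is with 1: a value above 1 means higher odds of voting for RN than in the reference category (diploma 1), and a value below 1 means lower odds. Categories 2 to 5 have odds ratios above 1, but only category 5 is significant at the 5% level. From category 9 upwards, the odds ratios are all below 1 and most are significant at the 1% level, so the odds of voting for RN tend to decrease as the diploma code increases. Taking “grande école” to be category 5 (bac +5), which should be checked against the diploma value labels in the original data, it has the highest odds ratio (1.555, 95% CI 1.112 to 2.181): the odds of voting for RN are about 55% higher than in the reference category, and the difference is statistically significant at the 5% level.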

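Comparing the two models, the coefficient on gender is no longer statistically significant once diploma is controlled for: the odds ratio falls from 1.198 in model 1 to 1.070 in model 2, and its confidence interval (0.932, 1.230) includes 1. The gender gap therefore appears only in the first model, which suggests that it is largely accounted for by differences in education.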
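Because models 1 and 2 are nested and fitted on the same 3,898 observations, the joint contribution of the diploma dummies can be assessed with a likelihood-ratio test. The sketch below uses only the residual deviances and degrees of freedom printed in the two summary() outputs.

```r
# Residual deviance and residual df, copied from the summary() outputs.
dev1 <- 5026.4; df1 <- 3895  # model1: age + female
dev2 <- 4795.7; df2 <- 3876  # model2: adds as.factor(diploma)

lr_stat <- dev1 - dev2  # drop in deviance
lr_df   <- df1 - df2    # number of added parameters (diploma dummies)

# Upper-tail chi-squared probability for the drop in deviance.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value))
```

With the fitted objects available, anova(model1, model2, test = "Chisq") performs the same comparison. The test would not be appropriate for model 3 against model 2 as fitted here, because model 3 drops additional observations due to missingness.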

  9. Include in the regression religion, union_member, and income. Only use as.factor(variable) for nominal variables with more than two categories. If it appears, disregard the following warning : Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred.
model3 <- glm(
data = df,
voteRN ~ age + female + as.factor(diploma) + as.factor(religious_affiliation) + as.factor(union_member) + as.factor(income),
family = "binomial"
)
summary(model3)

Call:
glm(formula = voteRN ~ age + female + as.factor(diploma) + as.factor(religious_affiliation) + 
    as.factor(union_member) + as.factor(income), family = "binomial", 
    data = df)

Coefficients:
                                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)                        -0.400298   0.390138  -1.026 0.304871    
age                                -0.009762   0.002739  -3.564 0.000365 ***
female                              0.127140   0.076490   1.662 0.096474 .  
as.factor(diploma)2                 0.307201   0.257595   1.193 0.233035    
as.factor(diploma)3                 0.135215   0.196548   0.688 0.491484    
as.factor(diploma)4                 0.253053   0.162067   1.561 0.118428    
as.factor(diploma)5                 0.519712   0.183548   2.831 0.004633 ** 
as.factor(diploma)6                -0.290422   0.205694  -1.412 0.157975    
as.factor(diploma)7                -0.313964   0.271642  -1.156 0.247763    
as.factor(diploma)8                -0.085215   0.183395  -0.465 0.642179    
as.factor(diploma)9                -0.603633   0.202513  -2.981 0.002876 ** 
as.factor(diploma)10               -0.859231   0.244328  -3.517 0.000437 ***
as.factor(diploma)11               -0.321312   0.296035  -1.085 0.277752    
as.factor(diploma)12               -0.896569   0.278965  -3.214 0.001309 ** 
as.factor(diploma)13               -0.932991   0.216742  -4.305 1.67e-05 ***
as.factor(diploma)14               -0.651978   0.260751  -2.500 0.012406 *  
as.factor(diploma)15               -0.694263   0.220071  -3.155 0.001607 ** 
as.factor(diploma)16               -1.166824   0.514271  -2.269 0.023275 *  
as.factor(diploma)17               -0.672384   0.365011  -1.842 0.065462 .  
as.factor(diploma)18               -1.291710   0.317387  -4.070 4.70e-05 ***
as.factor(diploma)19               -1.507559   0.359605  -4.192 2.76e-05 ***
as.factor(diploma)20               -0.706818   0.418755  -1.688 0.091430 .  
as.factor(religious_affiliation)2  -0.320644   0.242906  -1.320 0.186824    
as.factor(religious_affiliation)3   0.290019   0.539834   0.537 0.591104    
as.factor(religious_affiliation)4  -1.114308   0.531528  -2.096 0.036045 *  
as.factor(religious_affiliation)5  -2.599050   0.431088  -6.029 1.65e-09 ***
as.factor(religious_affiliation)6   0.111324   0.598702   0.186 0.852490    
as.factor(religious_affiliation)7 -10.472087 196.967866  -0.053 0.957599    
as.factor(religious_affiliation)8  -0.238394   0.327455  -0.728 0.466602    
as.factor(religious_affiliation)9  -0.900495   0.080109 -11.241  < 2e-16 ***
as.factor(union_member)2            0.268860   0.136529   1.969 0.048924 *  
as.factor(union_member)3            0.464084   0.119571   3.881 0.000104 ***
as.factor(income)2                  0.479501   0.470227   1.020 0.307861    
as.factor(income)3                  0.966915   0.406288   2.380 0.017318 *  
as.factor(income)4                  0.732616   0.397926   1.841 0.065609 .  
as.factor(income)5                  0.622555   0.351692   1.770 0.076699 .  
as.factor(income)6                  0.483632   0.352490   1.372 0.170050    
as.factor(income)7                  0.510010   0.337311   1.512 0.130538    
as.factor(income)8                  0.658218   0.335483   1.962 0.049762 *  
as.factor(income)9                  0.475442   0.337029   1.411 0.158337    
as.factor(income)10                 0.531670   0.336591   1.580 0.114204    
as.factor(income)11                 0.229343   0.344531   0.666 0.505624    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4732.8  on 3669  degrees of freedom
Residual deviance: 4285.6  on 3628  degrees of freedom
  (1439 observations deleted due to missingness)
AIC: 4369.6

Number of Fisher Scoring iterations: 10
  • Compare your results. Some religious practices may substantially reduce the odds of voting for RN. Which religions are these? Are the results in line with your expectations?

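Relative to the reference category (religion 1), all religious affiliations except numbers 3 and 6 have negative coefficients, i.e. lower odds of voting for RN. The affiliations that substantially and significantly reduce the odds are 4 (odds ratio exp(-1.114) ≈ 0.33), 5 (exp(-2.599) ≈ 0.07), and 9 (exp(-0.900) ≈ 0.41). Category 7 has the largest negative coefficient (-10.47), but its standard error (196.97) is so large that the estimate is uninformative (p = 0.958); this pattern typically arises when a category contains very few respondents, none of whom voted RN. I am not able to check which religions these codes correspond to, but the results do not align with my expectations, which were that some more conservative and restrictive religions would increase the odds of voting for RN.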
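To read the religion coefficients as odds ratios and to spot unstable estimates, the values printed in summary(model3) can be processed directly. In the sketch below, the cutoff of 10 for flagging a standard error as suspicious is an arbitrary illustrative choice, and the 5% significance check is a simple Wald test.

```r
# Estimates and standard errors for as.factor(religious_affiliation),
# copied from summary(model3); names are the category codes.
est <- c("2" = -0.320644, "3" = 0.290019, "4" = -1.114308, "5" = -2.599050,
         "6" = 0.111324, "7" = -10.472087, "8" = -0.238394, "9" = -0.900495)
se  <- c("2" = 0.242906, "3" = 0.539834, "4" = 0.531528, "5" = 0.431088,
         "6" = 0.598702, "7" = 196.967866, "8" = 0.327455, "9" = 0.080109)

odds_ratio <- round(exp(est), 3)  # odds relative to reference category 1

# Arbitrary illustrative cutoff: a standard error above 10 on the log-odds
# scale suggests the coefficient is not reliably estimated.
unstable <- names(se)[se > 10]

# Wald test at the 5% level.
sig_5pct <- names(est)[abs(est / se) > qnorm(0.975)]

print(odds_ratio)
print(unstable)
print(sig_5pct)
```

Checking table(df$religious_affiliation, df$voteRN) on the real data would show whether the flagged category has an empty cell, which is the usual cause of such estimates.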

  • Replace age with your cohort variable. Which cohort is most likely to vote for RN?
model4 <- glm(
data = df,
voteRN ~ female + as.factor(diploma) + as.factor(religious_affiliation) + as.factor(union_member) + as.factor(income) + as.factor(cohort),
family = "binomial"
)
summary(model4)

Call:
glm(formula = voteRN ~ female + as.factor(diploma) + as.factor(religious_affiliation) + 
    as.factor(union_member) + as.factor(income) + as.factor(cohort), 
    family = "binomial", data = df)

Coefficients:
                                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)                        -0.85985    0.39342  -2.186 0.028847 *  
female                              0.14096    0.07703   1.830 0.067272 .  
as.factor(diploma)2                 0.38658    0.25970   1.489 0.136612    
as.factor(diploma)3                 0.18392    0.19783   0.930 0.352541    
as.factor(diploma)4                 0.30318    0.16344   1.855 0.063592 .  
as.factor(diploma)5                 0.51157    0.18476   2.769 0.005626 ** 
as.factor(diploma)6                -0.24373    0.20676  -1.179 0.238462    
as.factor(diploma)7                -0.28112    0.27267  -1.031 0.302561    
as.factor(diploma)8                -0.09085    0.18470  -0.492 0.622796    
as.factor(diploma)9                -0.53138    0.20377  -2.608 0.009113 ** 
as.factor(diploma)10               -0.84598    0.24527  -3.449 0.000562 ***
as.factor(diploma)11               -0.32534    0.29753  -1.093 0.274196    
as.factor(diploma)12               -0.89185    0.28037  -3.181 0.001468 ** 
as.factor(diploma)13               -0.79268    0.22055  -3.594 0.000326 ***
as.factor(diploma)14               -0.60561    0.26237  -2.308 0.020985 *  
as.factor(diploma)15               -0.68181    0.22132  -3.081 0.002066 ** 
as.factor(diploma)16               -1.18630    0.51576  -2.300 0.021443 *  
as.factor(diploma)17               -0.68269    0.36706  -1.860 0.062903 .  
as.factor(diploma)18               -1.15672    0.32074  -3.606 0.000310 ***
as.factor(diploma)19               -1.38510    0.36095  -3.837 0.000124 ***
as.factor(diploma)20               -0.58070    0.42158  -1.377 0.168379    
as.factor(religious_affiliation)2  -0.36034    0.24403  -1.477 0.139786    
as.factor(religious_affiliation)3   0.19912    0.53507   0.372 0.709785    
as.factor(religious_affiliation)4  -1.19635    0.53517  -2.235 0.025389 *  
as.factor(religious_affiliation)5  -2.56114    0.43142  -5.937 2.91e-09 ***
as.factor(religious_affiliation)6   0.05488    0.60081   0.091 0.927224    
as.factor(religious_affiliation)7 -10.73765  196.96788  -0.055 0.956525    
as.factor(religious_affiliation)8  -0.21613    0.32608  -0.663 0.507452    
as.factor(religious_affiliation)9  -0.89973    0.08045 -11.183  < 2e-16 ***
as.factor(union_member)2            0.31847    0.13753   2.316 0.020580 *  
as.factor(union_member)3            0.50962    0.12057   4.227 2.37e-05 ***
as.factor(income)2                  0.43657    0.47170   0.926 0.354692    
as.factor(income)3                  0.95632    0.40669   2.351 0.018700 *  
as.factor(income)4                  0.71572    0.39869   1.795 0.072626 .  
as.factor(income)5                  0.60207    0.35223   1.709 0.087396 .  
as.factor(income)6                  0.48352    0.35336   1.368 0.171203    
as.factor(income)7                  0.51342    0.33829   1.518 0.129094    
as.factor(income)8                  0.65593    0.33605   1.952 0.050955 .  
as.factor(income)9                  0.47260    0.33828   1.397 0.162391    
as.factor(income)10                 0.52416    0.33773   1.552 0.120654    
as.factor(income)11                 0.18869    0.34586   0.546 0.585361    
as.factor(cohort)30-39             -0.25359    0.18377  -1.380 0.167603    
as.factor(cohort)40-49              0.02708    0.17925   0.151 0.879907    
as.factor(cohort)50-59              0.09141    0.17319   0.528 0.597649    
as.factor(cohort)60-69             -0.15999    0.18258  -0.876 0.380888    
as.factor(cohort)70-79             -0.48860    0.18543  -2.635 0.008415 ** 
as.factor(cohort)80+               -0.69715    0.24106  -2.892 0.003828 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4732.8  on 3669  degrees of freedom
Residual deviance: 4261.8  on 3623  degrees of freedom
  (1439 observations deleted due to missingness)
AIC: 4355.8

Number of Fisher Scoring iterations: 10

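Relative to the omitted reference cohort, the 50-59 cohort has the largest coefficient (0.091), so it is the cohort most likely to vote for RN, holding the other variables constant. However, this difference is not statistically significant (p = 0.598). The only cohorts that differ significantly from the reference are 70-79 (p = 0.008) and 80+ (p = 0.004), and both are less likely to vote for RN.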
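
The cohort coefficients are on the log-odds scale, so converting them to odds ratios may make the comparison easier to read. The sketch below hard-codes the cohort estimates from the summary above instead of refitting the model; the names cohort_logit and cohort_or are illustrative and not part of the original analysis. With the fitted object available, exp(coef(model4)) should give the same quantities.

```r
# Cohort coefficients (log-odds) copied from summary(model4).
# The omitted cohort is the reference category (odds ratio = 1).
cohort_logit <- c(
  "30-39" = -0.25359,
  "40-49" =  0.02708,
  "50-59" =  0.09141,
  "60-69" = -0.15999,
  "70-79" = -0.48860,
  "80+"   = -0.69715
)

# Odds ratios relative to the reference cohort
cohort_or <- exp(cohort_logit)
print(round(cohort_or, 2))

# Cohort with the largest estimated odds among the non-reference cohorts
top_cohort <- names(which.max(cohort_or))
print(top_cohort)
```

An odds ratio above 1 means higher odds of voting for RN than the reference cohort, holding the other covariates fixed; a value below 1 means lower odds.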

9. Run a logistic regression including the interaction age*female.

model5 <- glm(
data = df,
voteRN ~ age*female,
family = "binomial"
)
summary(model5)

Call:
glm(formula = voteRN ~ age * female, family = "binomial", data = df)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.6670028  0.1773442  -3.761 0.000169 ***
age         -0.0004861  0.0030147  -0.161 0.871901    
female      -0.6044836  0.2411561  -2.507 0.012190 *  
age:female   0.0142700  0.0042074   3.392 0.000695 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5042.1  on 3897  degrees of freedom
Residual deviance: 5014.9  on 3894  degrees of freedom
  (1211 observations deleted due to missingness)
AIC: 5022.9

Number of Fisher Scoring iterations: 4
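
To see what the interaction term implies, the coefficients can be turned into predicted probabilities by hand. This is a minimal sketch that copies the estimates from the summary above and assumes female is coded 0 for men and 1 for women; the helper p_rn and the chosen ages are illustrative only.

```r
# Coefficients copied from summary(model5)
b <- c(
  intercept  = -0.6670028,
  age        = -0.0004861,
  female     = -0.6044836,
  age_female =  0.0142700
)

# Predicted probability of an RN vote for a given age and gender.
# Assumption: female = 0 for men and 1 for women.
p_rn <- function(age, female) {
  eta <- b[["intercept"]] + b[["age"]] * age +
    b[["female"]] * female + b[["age_female"]] * age * female
  plogis(eta)  # inverse logit
}

ages <- c(20, 40, 60, 80)
print(data.frame(
  age   = ages,
  men   = round(p_rn(ages, 0), 3),
  women = round(p_rn(ages, 1), 3)
))

# Age at which the gender gap in log-odds changes sign:
# female + age_female * age = 0
crossover_age <- -b[["female"]] / b[["age_female"]]
print(crossover_age)
```

Under these estimates the gender gap is female + age:female × age on the log-odds scale, so the negative main effect of female only describes the gap at age 0, and the gap reverses at roughly age 42.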
  • Plot the interaction results using library(ggeffects) and ggpredict. Hint: use geom_ribbon() to display confidence intervals.
predicted_probs <- ggpredict(model5, terms = c("age", "female"))
Data were 'prettified'. Consider using `terms="age [all]"` to get smooth
  plots.
# ggpredict() stores age in x and female in group, so map group to colour and fill
# to draw one line and one confidence band per gender
ggplot(predicted_probs, aes(x = x, y = predicted, color = group, fill = group)) +
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.3, color = NA) +
geom_line(linewidth = 1) +
labs(
title = "Predicted Probability of Voting for RN by Age and Gender",
x = "Age",
y = "Predicted Probability of Voting for RN",
color = "female",
fill = "female"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10)
)

10. Run a similar regression with female*cohort and plot the interactions using library(ggeffects) and ggpredict(). Hint: use geom_errorbar() to display confidence intervals.
model6 <- glm(
data = df,
voteRN ~ female*cohort,
family = "binomial"
)
summary(model6)

Call:
glm(formula = voteRN ~ female * cohort, family = "binomial", 
    data = df)

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -0.81093    0.21246  -3.817 0.000135 ***
female             -0.18044    0.26149  -0.690 0.490175    
cohort30-39        -0.16393    0.25884  -0.633 0.526529    
cohort40-49         0.14925    0.24198   0.617 0.537387    
cohort50-59         0.34662    0.23322   1.486 0.137213    
cohort60-69         0.24562    0.23578   1.042 0.297547    
cohort70-79         0.01015    0.23689   0.043 0.965815    
cohort80+          -0.50475    0.32061  -1.574 0.115407    
female:cohort30-39  0.06153    0.33104   0.186 0.852556    
female:cohort40-49  0.26268    0.31044   0.846 0.397478    
female:cohort50-59  0.41181    0.29568   1.393 0.163687    
female:cohort60-69  0.43619    0.30688   1.421 0.155212    
female:cohort70-79  0.49136    0.30752   1.598 0.110083    
female:cohort80+    1.22282    0.41993   2.912 0.003591 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5042.1  on 3897  degrees of freedom
Residual deviance: 4976.3  on 3884  degrees of freedom
  (1211 observations deleted due to missingness)
AIC: 5004.3

Number of Fisher Scoring iterations: 4
predicted_probs2 <- ggpredict(model6, terms = c("female", "cohort"))
# ggpredict() stores female in x and cohort in group; both are categorical here,
# so use points with error bars (one colour per cohort) instead of a line and ribbon
ggplot(predicted_probs2, aes(x = factor(x), y = predicted, color = group)) +
geom_point(position = position_dodge(width = 0.6)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.6)) +
labs(
title = "Predicted Probability of Voting for RN by Gender and Cohort",
x = "female",
y = "Predicted Probability of Voting for RN",
color = "Cohort"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10)
)
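
With a categorical interaction, the gender gap inside each cohort is the main effect of female plus that cohort's interaction term. The sketch below works this out from the coefficients printed in summary(model6) without refitting the model; the variable names are illustrative, and the labels "men" and "women" assume female is coded 0/1.

```r
# Coefficients copied from summary(model6); the omitted cohort is the reference
intercept   <- -0.81093
female_main <- -0.18044
cohort_80   <- -0.50475
female_by_cohort <- c(
  "30-39" = 0.06153, "40-49" = 0.26268, "50-59" = 0.41181,
  "60-69" = 0.43619, "70-79" = 0.49136, "80+"   = 1.22282
)

# Gender gap in log-odds within each cohort = main effect + interaction term
gap_logit <- c("reference" = female_main, female_main + female_by_cohort)
print(round(gap_logit, 3))

# Example: predicted probabilities in the 80+ cohort
# (assumption: female = 0 for men and 1 for women)
p_men_80   <- plogis(intercept + cohort_80)
p_women_80 <- plogis(intercept + cohort_80 + female_main + female_by_cohort[["80+"]])
print(round(c(men = p_men_80, women = p_women_80), 3))
```

These point estimates ignore uncertainty; in the summary only the female:cohort80+ interaction is individually significant, so the smaller gaps in the other cohorts should be read with caution.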