class: center, middle, inverse, title-slide

.title[
# Advanced quantitative data analysis
]
.subtitle[
## OLS with interaction
]
.author[
### Mengni Chen
]
.institute[
### Department of Sociology, University of Copenhagen
]

---

<style type="text/css">
.remark-slide-content {
  font-size: 20px;
  padding: 20px 80px 20px 80px;
}
.remark-code, .remark-inline-code {
  background: #f0f0f0;
}
.remark-code {
  font-size: 12px;
}
</style>

#Let's get ready

```r
library(tidyverse) # Load the tidyverse packages.
library(haven)     # Import data.
library(Hmisc)     # Weighting.
library(ggplot2)   # Allows us to create nice figures.
library(estimatr)  # Allows us to estimate (cluster-)robust standard errors.
library(texreg)    # Allows us to make nicely formatted HTML & LaTeX regression tables.
```

---

#Model the relationship between marital status and life satisfaction

- [Prepare data](https://rpubs.com/fancycmn/1094159)
- a dataset of age, sex, marital status, years of education, life satisfaction, weight
- DV: life satisfaction
- IV: age, sex, **marital status**, years of education

---

#Running an OLS

```r
ols3_cat <- lm_robust(formula = sat6 ~ age + marital1,
                      data = wave1c, weights = cdweight)
```

```r
texreg::screenreg(list(ols3_cat), include.ci = FALSE, digits = 3)
```

```
##
## ==============================
##                   Model 1
## ------------------------------
## (Intercept)          8.549 ***
##                     (0.093)
## age                 -0.045 ***
##                     (0.004)
## marital1Married      0.735 ***
##                     (0.078)
## marital1Divorced    -0.528 **
##                     (0.169)
## ------------------------------
## R^2                  0.047
## Adj. R^2             0.046
## Num. obs.         6135
## RMSE                 1.756
## ==============================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```

---

#Running an OLS: make it beautiful

```r
ols3_cat <- lm_robust(formula = sat6 ~ age + marital1,
                      data = wave1c, weights = cdweight)
```

```r
texreg::screenreg(list(ols3_cat),
                  custom.model.names = "OLS", # change the model name
                  custom.coef.names = c(      # change the variable names
                    "Intercept", "Age", "Married", "Divorced"),
                  include.ci = FALSE, digits = 3)
```

```
##
## =======================
##            OLS
## -----------------------
## Intercept     8.549 ***
##              (0.093)
## Age          -0.045 ***
##              (0.004)
## Married       0.735 ***
##              (0.078)
## Divorced     -0.528 **
##              (0.169)
## -----------------------
## R^2           0.047
## Adj. R^2      0.046
## Num. obs.  6135
## RMSE          1.756
## =======================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```

---

#Running an OLS: make it beautiful

Export to HTML

```r
texreg::htmlreg(list(ols3_cat),
                custom.model.names = "OLS",
                custom.coef.names = c("Intercept", "Age", "Married", "Divorced"),
                include.ci = FALSE,
                file = "Nice fomatted result.html") # write the table to an HTML file
```

```
## The table was written to the file 'Nice fomatted result.html'.
```

---

#Interpretation of OLS

- What does the intercept "8.549" mean?
- Does it mean the life satisfaction of those never married?!

```r
texreg::screenreg(list(ols3_cat),
                  custom.model.names = "OLS", # change the model name
                  custom.coef.names = c("Intercept", "Age", "Married", "Divorced"), # change the variable names
                  include.ci = FALSE, digits = 3)
```

```
##
## =======================
##            OLS
## -----------------------
## Intercept     8.549 ***
##              (0.093)
## Age          -0.045 ***
##              (0.004)
## Married       0.735 ***
##              (0.078)
## Divorced     -0.528 **
##              (0.169)
## -----------------------
## R^2           0.047
## Adj. R^2      0.046
## Num. obs.  6135
## RMSE          1.756
## =======================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```

--

- It means that a person who is aged 0 and never married has a predicted life satisfaction of 8.549.
- Does it make sense?
  No: a person aged 0 never existed in the pairfam survey.

---

#Interpretation of OLS

- Take those divorced as the reference group
- What does the coefficient of never married (i.e. 0.528) mean here?

```r
# Use fct_relevel() to re-level marital1: "Divorced" is now the first level
# and will be treated as the reference category.
wave1c$marital1 <- fct_relevel(wave1c$marital1, "Divorced", "Nevermarried", "Married")

ols3_cat_relevel <- lm_robust(formula = sat6 ~ age + marital1,
                              data = wave1c, weights = cdweight)
texreg::screenreg(list(ols3_cat_relevel), include.ci = FALSE, digits = 3)
```

```
##
## ==================================
##                       Model 1
## ----------------------------------
## (Intercept)              8.021 ***
##                         (0.216)
## age                     -0.045 ***
##                         (0.004)
## marital1Nevermarried     0.528 **
##                         (0.169)
## marital1Married          1.263 ***
##                         (0.162)
## ----------------------------------
## R^2                      0.047
## Adj. R^2                 0.046
## Num. obs.             6135
## RMSE                     1.756
## ==================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```

--

- It means that, holding age constant, a person who is never married scores 0.528 points higher on life satisfaction than a person who is divorced.

---

#OLS with interaction: interaction between two categorical variables

- Does the relationship between marital status and life satisfaction vary across men and women?

```r
# We re-leveled marital1 on the last slide, so we now re-level it back.
wave1c$marital1 <- fct_relevel(wave1c$marital1, "Nevermarried", "Married", "Divorced")

# Use * to specify the interaction.
ols4_interactsex <- lm_robust(formula = sat6 ~ age + marital1*sex_gen,
                              data = wave1c, weights = cdweight)

# single.row = TRUE puts each standard error in the same row as its coefficient.
texreg::screenreg(list(ols4_interactsex), include.ci = FALSE, digits = 3, single.row = TRUE)
```

```
##
## ======================================================
##                                   Model 1
## ------------------------------------------------------
## (Intercept)                          8.522 (0.097) ***
## age                                 -0.044 (0.004) ***
## marital1Married                      0.770 (0.102) ***
## marital1Divorced                    -0.674 (0.293) *
## sex_gen2 Female                      0.053 (0.062)
## marital1Married:sex_gen2 Female     -0.074 (0.111)
## marital1Divorced:sex_gen2 Female     0.234 (0.337)
## ------------------------------------------------------
## R^2                                  0.047
## Adj. R^2                             0.046
## Num. obs.                         6135
## RMSE                                 1.756
## ======================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```

---

#Interpreting interaction between two categorical variables

- `$$sat6=8.522 -0.044age +0.770married -0.674divorced + 0.053female $$`
`$$-0.074married*female + 0.234divorced*female $$`

--

- A never-married man aged 20: what is his predicted life satisfaction?
  age=20, married=0, divorced=0, female=0
`$$sat6=8.522 -0.044*\color{red}{20}$$`

--

- A divorced man aged 20: what is his predicted life satisfaction?
  age=20, married=0, divorced=1, female=0
`$$sat6=8.522 -0.044*\color{red}{20} -0.674*\color{red}{1}$$`

--

- A never-married woman aged 20: what is her predicted life satisfaction?
  age=20, married=0, divorced=0, female=1
`$$sat6=8.522 -0.044*\color{red}{20} + 0.053*\color{red}{1}$$`

--

- A divorced woman aged 20: what is her predicted life satisfaction?
  age=20, married=0, divorced=1, female=1
`$$sat6=8.522 -0.044*\color{red}{20} -0.674*\color{red}{1} + 0.053*\color{red}{1}+ 0.234*\color{red}{1}*\color{red}{1}$$`

---

#Interpreting interaction between two categorical variables

With age held at its sample average:

<img src="https://github.com/fancycmn/2023advancequant_intro/blob/main/interaction1.png?raw=true" width="80%" style="display: block; margin: auto;" >

---

##OLS with interaction: interaction between one categorical and one continuous variable

- Does the relationship between age and life satisfaction vary across marital status?
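
A hedged aside before the model is fitted: in an R formula, `age*marital1` is shorthand for `age + marital1 + age:marital1`. The sketch below uses an invented three-row data frame (`toy`, an assumption for illustration, not the pairfam data) only to list the design-matrix columns that this expansion produces.

```r
# Toy data for illustration only; these rows are invented, not taken from wave1c.
toy <- data.frame(
  age = c(20, 30, 40),
  marital1 = factor(c("Nevermarried", "Married", "Divorced"),
                    levels = c("Nevermarried", "Married", "Divorced"))
)

# model.matrix() builds the design matrix implied by a formula.
# "Nevermarried" is the first factor level, so it is the reference category.
interaction_columns <- colnames(model.matrix(~ age * marital1, data = toy))
print(interaction_columns)
```

The two columns whose names start with `age:` are the interaction terms; each one lets the age slope of that marital group differ from the slope of the reference group.
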

```r
ols5_interactage <- lm_robust(formula = sat6 ~ age*marital1 + sex_gen,
                              data = wave1c, weights = cdweight)
texreg::screenreg(list(ols5_interactage), include.ci = FALSE, digits = 3, single.row = TRUE)
```

```
##
## ==========================================
##                       Model 1
## ------------------------------------------
## (Intercept)              8.690 (0.104) ***
## age                     -0.052 (0.005) ***
## marital1Married         -0.949 (0.363) **
## marital1Divorced        -1.947 (1.438)
## sex_gen2 Female          0.055 (0.051)
## age:marital1Married      0.052 (0.011) ***
## age:marital1Divorced     0.043 (0.041)
## ------------------------------------------
## R^2                      0.051
## Adj. R^2                 0.050
## Num. obs.             6135
## RMSE                     1.752
## ==========================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```

`$$sat6=8.690 -0.052age -0.949married -1.947divorced +0.055female +0.052married*age $$`
`$$+ 0.043divorced*age$$`

---

##Interpreting interaction between one categorical and one continuous variable

`$$sat6=8.690 -0.052age -0.949married -1.947divorced +0.055female +0.052married*age $$`
`$$+ 0.043divorced*age$$`

- Does the relationship between age and life satisfaction vary across marital status?
- Suppose that A, a never-married man, and B, a divorced man, both grow older
- The association between age and life satisfaction is not constant at -0.052; it depends on marital status
- For A, with a 1-year increase in age, sat6 changes by -0.052 (a decrease of 0.052)
- For B, with a 1-year increase in age, sat6 changes by -0.052 + 0.043 = -0.009 (a decrease of 0.009)

---

##Interpreting interaction between one categorical and one continuous variable

`$$sat6=8.690 -0.052age -0.949married -1.947divorced +0.055female +0.052married*age $$`
`$$+ 0.043divorced*age$$`

Given that the marital status is the reference level, i.e. "never-married":

.pull-left[
male
<img src="https://github.com/fancycmn/2023advancequant_intro/blob/main/interaction2_male.png?raw=true" width="110%" style="display: block; margin: auto;" >
]

.pull-right[
female
<img src="https://github.com/fancycmn/2023advancequant_intro/blob/main/interaction2_female.png?raw=true" width="110%" style="display: block; margin: auto;" >
]

---

#Take home

1. Modify the output of your regression
2. OLS with interaction
   - Interaction between two categorical variables
   - Interaction between one categorical variable and one continuous variable
   - Interpret OLS with interactions

---

class: center, middle

#[Exercise](https://rpubs.com/fancycmn/1095087)
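
---

#Checking the interpretation by hand

A minimal sketch for re-doing the hand calculations from the interpretation slides in R. The coefficients are copied from the two interaction tables in these slides; the helper `predict_sat6()` and the object names are hypothetical and exist only for this illustration. No data set is needed.

```r
# Coefficients copied from the marital1*sex_gen regression table.
b_sex <- c(intercept = 8.522, age = -0.044, married = 0.770, divorced = -0.674,
           female = 0.053, married_female = -0.074, divorced_female = 0.234)

# Hypothetical helper: predicted sat6 for one profile (dummies are 0 or 1).
predict_sat6 <- function(age, married, divorced, female) {
  unname(
    b_sex["intercept"] + b_sex["age"] * age +
      b_sex["married"] * married + b_sex["divorced"] * divorced +
      b_sex["female"] * female +
      b_sex["married_female"] * married * female +
      b_sex["divorced_female"] * divorced * female
  )
}

# The four profiles at age 20 discussed on the slides.
profiles <- c(
  man_nevermarried   = predict_sat6(20, 0, 0, 0),
  man_divorced       = predict_sat6(20, 0, 1, 0),
  woman_nevermarried = predict_sat6(20, 0, 0, 1),
  woman_divorced     = predict_sat6(20, 0, 1, 1)
)
print(round(profiles, 3))

# Age slopes by marital status, from the age*marital1 regression table:
# main effect of age plus the group's interaction coefficient.
age_slopes <- c(Nevermarried = -0.052,
                Married      = -0.052 + 0.052,
                Divorced     = -0.052 + 0.043)
print(round(age_slopes, 3))
```

The four predictions come out as 7.642, 6.968, 7.695 and 7.255, and the age slopes as -0.052, 0.000 and -0.009. These are point predictions only; they say nothing about whether the differences between groups are statistically significant.
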