project 4

linear regression - 1 (Direct Effects):

  1. Research question and hypotheses which you explain to the reader

  2. Descriptive statistics for each variable involved (for single variables and pairs of variables), e.g. histograms, barplots, scatter plots, and box plots

  3. Hierarchically built models: you start with one predictor and then add another one

  4. Comparison of model fits with ANOVA + you write down which model fits better To accompany your final model:

  5. you have provided a correct interpretation of the overall model fit (F-statistic, R-squared)

  6. correct interpretation of all coefficients

  7. you have written out the regression equation

  8. you have explained whether you hypotheses were falsified or supported

  9. you have created an output table with sjt.lm() or stargazer() and formulated some conclusion

  10. you have submitted a feedback to another group’s project on the forum

Variables

We gonna predict edlvpdua: Partner’s highest level of education, Ukraine

by edlvdua: (Respondent’s) highest level of education, Ukraine

and edlvfdua: Father’s highest level of education, Ukraine

and edlvmdua: Mother’s highest level of education, Ukraine

  • agea: age of respondent (factor)

  • gender: sex of respondent (factor)

Hypothesis

  1. Partner’s highest level of education correlate with (respondent’s) highest level of education like in Bourdieu’s concept of habitus - any person looking for partner with the same level of education.

  2. Mother’s and father’s levels of education influence through (respondent’s) level of education on respondent’s parnter’s highest level of education - parents influence on respondent’s level of education and that influence on respondent’s partner’s level of education - like in Bourdieu’s concept of habitus.

+) Additional: also we trying to show that younglings more tolerate in choosing partners by shared level of education that elder generations - that means nowadays shared (same) level of education influence not that strong on choosing partner then earlier.

Libraries+

library(car)
library(ggplot2)
library(psych)
library(QuantPsyc)
library(sjPlot)
library(stargazer)
library(car)
library(ggplot2)
library(gvlma)
library(magrittr)
library(effects)
options(scipen=999)

Dataset

getwd()
## [1] "/students/dstsimokha/DA_shirokaner"
politics = haven::read_sav("ESS6UA.sav")
politics <- dplyr::select(politics, edlvpdua, edlvdua, edlvfdua, edlvmdua, gndr, agea, eduyrs)
politics <- na.omit(politics)
politics <- subset(politics, edlvdua!="5555" & edlvfdua!="5555" & edlvmdua!="5555" & agea>"19")
politics$age_group <- as.factor(car::recode(politics$agea,"20:40='20-40';40:60='40-60';60:90='60-90'"))
politics$gndr <- as.factor(car::recode(politics$gndr,"1='male';2='female'"))

Descriptive statistics

All data statistics

##            vars    n  mean    sd median trimmed   mad min max range  skew
## edlvpdua      1 1018  6.89  2.74      8    7.01  2.97   0  12    12 -0.30
## edlvdua       2 1018  7.02  2.81      8    7.19  2.97   0  12    12 -0.41
## edlvfdua      3 1018  4.36  3.22      4    4.16  4.45   0  12    12  0.44
## edlvmdua      4 1018  4.24  3.22      4    4.04  4.45   0  11    11  0.42
## gndr*         5 1018  1.42  0.49      1    1.40  0.00   1   2     1  0.33
## agea          6 1018 47.22 15.75     46   46.57 19.27  20  87    67  0.26
## eduyrs        7 1018 12.72  3.29     13   12.82  2.97   1  30    29 -0.14
## age_group*    8 1018  1.85  0.77      2    1.81  1.48   1   3     2  0.27
##            kurtosis   se
## edlvpdua      -0.93 0.09
## edlvdua       -0.91 0.09
## edlvfdua      -0.93 0.10
## edlvmdua      -1.05 0.10
## gndr*         -1.89 0.02
## agea          -0.90 0.49
## eduyrs         3.89 0.10
## age_group*    -1.28 0.02

Outcome

Predictors


Visualization of relationship

Partner & respondent

Partner & respondent’s father

Partner & respondent’s mother

Correlation test

Partner & respondent

## 
##  Pearson's product-moment correlation
## 
## data:  politics$edlvpdua and politics$edlvdua
## t = 18.496, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4544714 0.5464887
## sample estimates:
##       cor 
## 0.5018989

Partner & respondent’s father

## 
##  Pearson's product-moment correlation
## 
## data:  politics$edlvpdua and politics$edlvfdua
## t = 13.688, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3414232 0.4452358
## sample estimates:
##      cor 
## 0.394588

Partner & respondent’s mother

## 
##  Pearson's product-moment correlation
## 
## data:  politics$edlvpdua and politics$edlvmdua
## t = 13.141, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3273713 0.4324617
## sample estimates:
##      cor 
## 0.381147

Assumptions

I will do it a moment later cause we don’t need it

Linearity and outliers

Linearity

fit4 <- lm(edlvpdua ~ edlvdua + edlvfdua + edlvmdua, data=politics)
 
ggplot(data = politics, aes(x = edlvpdua, y = edlvdua)) + 
  geom_point() + 
  geom_smooth(method = "lm") +
  theme_light()
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.

Outliers

outlierTest(fit4)
## 
## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
##      rstudent unadjusted p-value Bonferonni p
## 769 -3.186016          0.0014866           NA

Additivity

gvmodel1 <- gvlma(fit4) 
summary(gvmodel1)
## 
## Call:
## lm(formula = edlvpdua ~ edlvdua + edlvfdua + edlvmdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -7.2679 -1.6655  0.1428  1.5108  7.2571 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  3.27807    0.19601  16.724 < 0.0000000000000002 ***
## edlvdua      0.38260    0.02923  13.089 < 0.0000000000000002 ***
## edlvfdua     0.13203    0.03284   4.020            0.0000626 ***
## edlvmdua     0.08222    0.03296   2.494               0.0128 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.304 on 1014 degrees of freedom
## Multiple R-squared:  0.2946, Adjusted R-squared:  0.2926 
## F-statistic: 141.2 on 3 and 1014 DF,  p-value: < 0.00000000000000022
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit4) 
## 
##                     Value p-value                Decision
## Global Stat        5.1546 0.27180 Assumptions acceptable.
## Skewness           2.7703 0.09603 Assumptions acceptable.
## Kurtosis           1.0745 0.29993 Assumptions acceptable.
## Link Function      1.0471 0.30617 Assumptions acceptable.
## Heteroscedasticity 0.2626 0.60834 Assumptions acceptable.

Independent errors

durbinWatsonTest(fit4)
##  lag Autocorrelation D-W Statistic p-value
##    1       0.1047431      1.789712       0
##  Alternative hypothesis: rho != 0

Homoscedasticity

ncvTest(fit4)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 13.29712    Df = 1     p = 0.0002658145

Normality of the residuals

model1=aov(politics$edlvpdua ~ politics$edlvdua) 
res1=model1$residuals 
shapiro.test(res1)
## 
##  Shapiro-Wilk normality test
## 
## data:  res1
## W = 0.98922, p-value = 0.0000008359

for ONE

model2=aov(politics$edlvpdua ~ politics$edlvfdua) 
res2=model2$residuals 
shapiro.test(res2)
## 
##  Shapiro-Wilk normality test
## 
## data:  res2
## W = 0.98308, p-value = 0.000000001728

for SECOND

model3=aov(politics$edlvpdua ~ politics$edlvmdua) 
res3=model3$residuals 
shapiro.test(res3)
## 
##  Shapiro-Wilk normality test
## 
## data:  res3
## W = 0.9783, p-value = 0.00000000003407

for THIRD

No confounding varibales

Predictor variables are quantitative or dichotomous, outcome variable is continuous

Multicollinearity

# checking multicolinearity for independent variables.
#VIF(lm(edlvpdua ~ edlvdua * edlvfdua * edlvmdua, data=politics))

Global test for model assumptions

gvmodel <- gvlma(fit4) 
summary(gvmodel)
## 
## Call:
## lm(formula = edlvpdua ~ edlvdua + edlvfdua + edlvmdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -7.2679 -1.6655  0.1428  1.5108  7.2571 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  3.27807    0.19601  16.724 < 0.0000000000000002 ***
## edlvdua      0.38260    0.02923  13.089 < 0.0000000000000002 ***
## edlvfdua     0.13203    0.03284   4.020            0.0000626 ***
## edlvmdua     0.08222    0.03296   2.494               0.0128 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.304 on 1014 degrees of freedom
## Multiple R-squared:  0.2946, Adjusted R-squared:  0.2926 
## F-statistic: 141.2 on 3 and 1014 DF,  p-value: < 0.00000000000000022
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit4) 
## 
##                     Value p-value                Decision
## Global Stat        5.1546 0.27180 Assumptions acceptable.
## Skewness           2.7703 0.09603 Assumptions acceptable.
## Kurtosis           1.0745 0.29993 Assumptions acceptable.
## Link Function      1.0471 0.30617 Assumptions acceptable.
## Heteroscedasticity 0.2626 0.60834 Assumptions acceptable.

Linear regression:

Model 1

(edlvdua: Highest level of education, Ukraine)

ggplot(politics, aes(edlvpdua)) + geom_density(fill="blue")

ggplot(politics, aes(edlvdua)) + geom_density(fill="red")

qqplot(politics$edlvpdua, politics$edlvdua)

fit1 = lm(edlvpdua ~ edlvdua, data=politics)
summary(fit1)
## 
## Call:
## lm(formula = edlvpdua ~ edlvdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -6.8383 -1.4077  0.1221  1.6518  7.0625 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)   3.4474     0.2004    17.2 <0.0000000000000002 ***
## edlvdua       0.4901     0.0265    18.5 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.371 on 1016 degrees of freedom
## Multiple R-squared:  0.2519, Adjusted R-squared:  0.2512 
## F-statistic: 342.1 on 1 and 1016 DF,  p-value: < 0.00000000000000022
#par(mfrow=c(2,2))
#plot(fit1)

sjPlot::plot_model(fit1, type = "pred", terms = "edlvdua")

This model explains 25% of variance

Intercept = 3.43 & b coefficient = 0.5:

y(edlvpdua) = 0.5 * x(edlvdua) + 3.43

for every additonal (respondent’s) higher level of education the partner’s higher level of education increases by 0.5 from 3.43

Model 2

(edlvfdua: Father’s highest level of education, Ukraine)

fit2 = lm(as.numeric(edlvpdua) ~ as.numeric(edlvfdua), data=politics)
summary(fit2)
## 
## Call:
## lm(formula = as.numeric(edlvpdua) ~ as.numeric(edlvfdua), data = politics)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.7846 -2.0960  0.2318  1.8875  5.5761 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           5.42386    0.13306   40.76 <0.0000000000000002 ***
## as.numeric(edlvfdua)  0.33607    0.02455   13.69 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.519 on 1016 degrees of freedom
## Multiple R-squared:  0.1557, Adjusted R-squared:  0.1549 
## F-statistic: 187.4 on 1 and 1016 DF,  p-value: < 0.00000000000000022
par(mfrow=c(2,2))
plot(fit2)

This model explains 16% of variance

Intercept = 5.9 & b coefficient = 0.34:

y(edlvpdua) = 0.34 * x(edlvfdua) + 5.9

for every additonal (respondent’s) father’s higher level of education the partner’s higher level of education increases by 0.34 from 5.9

Model 3

(edlvmdua: Mother’s highest level of education, Ukraine)

fit3 = lm(as.numeric(edlvpdua) ~ as.numeric(edlvmdua), data=politics)
summary(fit3)
## 
## Call:
## lm(formula = as.numeric(edlvpdua) ~ as.numeric(edlvmdua), data = politics)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7596 -2.1108  0.2136  1.8892  5.4844 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           5.51558    0.13133   42.00 <0.0000000000000002 ***
## as.numeric(edlvmdua)  0.32440    0.02469   13.14 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.534 on 1016 degrees of freedom
## Multiple R-squared:  0.1453, Adjusted R-squared:  0.1444 
## F-statistic: 172.7 on 1 and 1016 DF,  p-value: < 0.00000000000000022
par(mfrow=c(2,2))
plot(fit3)

This model explains 15% of variance

Intercept = 5.5 & b coefficient = 0.33:

y(edlvpdua) = 0.33 * x(edlvmdua) + 5.5

for every additonal (respondent’s) mother’s higher level of education the partner’s higher level of education increases by 0.33 from 5.5

ANOVA:

## Analysis of Variance Table
## 
## Response: edlvpdua
##             Df Sum Sq Mean Sq F value                Pr(>F)    
## edlvdua      1 1922.9 1922.94  342.11 < 0.00000000000000022 ***
## Residuals 1016 5710.7    5.62                                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model 4

(final)

fit4 <- lm(as.numeric(edlvpdua) ~ as.numeric(edlvdua) + as.numeric(edlvfdua) + as.numeric(edlvmdua), data=politics)
summary(fit4)
## 
## Call:
## lm(formula = as.numeric(edlvpdua) ~ as.numeric(edlvdua) + as.numeric(edlvfdua) + 
##     as.numeric(edlvmdua), data = politics)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.2679 -1.6655  0.1428  1.5108  7.2571 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)           3.27807    0.19601  16.724 < 0.0000000000000002 ***
## as.numeric(edlvdua)   0.38260    0.02923  13.089 < 0.0000000000000002 ***
## as.numeric(edlvfdua)  0.13203    0.03284   4.020            0.0000626 ***
## as.numeric(edlvmdua)  0.08222    0.03296   2.494               0.0128 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.304 on 1014 degrees of freedom
## Multiple R-squared:  0.2946, Adjusted R-squared:  0.2926 
## F-statistic: 141.2 on 3 and 1014 DF,  p-value: < 0.00000000000000022
par(mfrow=c(2,2))  
plot(fit4)

lm.beta(fit4)
##  as.numeric(edlvdua) as.numeric(edlvfdua) as.numeric(edlvmdua) 
##           0.39183089           0.15501268           0.09660109
plot_model(fit4, type = "std")

fit4i <- lm(edlvpdua ~ edlvdua * edlvfdua * edlvmdua, data=politics)
summary(fit4i)
## 
## Call:
## lm(formula = edlvpdua ~ edlvdua * edlvfdua * edlvmdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -7.0086 -1.7259  0.1634  1.5722  7.8285 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##                            Estimate Std. Error t value
## (Intercept)                2.414640   0.361565   6.678
## edlvdua                    0.532523   0.054436   9.783
## edlvfdua                   0.387153   0.134628   2.876
## edlvmdua                   0.254588   0.138580   1.837
## edlvdua:edlvfdua          -0.040679   0.017111  -2.377
## edlvdua:edlvmdua          -0.030296   0.017090  -1.773
## edlvfdua:edlvmdua         -0.029865   0.025109  -1.189
## edlvdua:edlvfdua:edlvmdua  0.005302   0.002845   1.864
##                                       Pr(>|t|)    
## (Intercept)                    0.0000000000398 ***
## edlvdua                   < 0.0000000000000002 ***
## edlvfdua                               0.00412 ** 
## edlvmdua                               0.06648 .  
## edlvdua:edlvfdua                       0.01762 *  
## edlvdua:edlvmdua                       0.07658 .  
## edlvfdua:edlvmdua                      0.23456    
## edlvdua:edlvfdua:edlvmdua              0.06267 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.296 on 1010 degrees of freedom
## Multiple R-squared:  0.3023, Adjusted R-squared:  0.2975 
## F-statistic: 62.53 on 7 and 1010 DF,  p-value: < 0.00000000000000022
sjPlot::plot_model(fit4i, type = "pred", terms = "edlvpdua")

This model explains 30% of variance

Intercept = 3.2 & b1 coefficient = 0.38 & b2 coefficient = 0.13 + b3 coefficient = 0.08:

y(edlvpdua) = 0.38(edlvdua) + 0.13(edlvfdua) + 0.08(edlvmdua) + 3.2

for every additonal (respondent’s) highest level of education + (respondent’s) father’s highest level of education + (respondent’s) mother’s highest level of education the partner’s highest level of education increases by 0.38 + 0.13 + 0.08 from 3.2

Comparison of models

    edlvpdua   as.numeric(edlvpdua)   as.numeric(edlvpdua)   as.numeric(edlvpdua)
    B   B   B   B
(Intercept)   3.45 ***   5.42 ***   5.52 ***   3.28 ***
edlvdua   0.49 ***            
as.numeric(edlvfdua)       0.34 ***       0.13 ***
as.numeric(edlvmdua)           0.32 ***   0.08 *
as.numeric(edlvdua)               0.38 ***
Observations   1018   1018   1018   1018
R2 / adj. R2   .252 / .251   .156 / .155   .145 / .144   .295 / .293
Notes * p<.05   ** p<.01   *** p<.001

Picking the best model - the last fourth model has the best result in prediction of partner’s highest level of education = 30%


Interaction effect

    edlvpdua   edlvpdua   edlvpdua   edlvpdua   edlvpdua   edlvpdua   edlvpdua
    B   B   B   B   B   B   B
(Intercept)   2.98 ***   2.86 ***   5.35 ***   3.33 ***   3.33 ***   2.90 ***   2.41 ***
edlvdua   0.45 ***   0.46 ***       0.38 ***   0.38 ***   0.44 ***   0.53 ***
edlvmdua   0.28 ***       0.13 *   0.07    0.07    0.20 **   0.25 
edlvdua:edlvmdua   -0.01                        -0.03 
edlvfdua       0.32 ***   0.18 ***   0.12 **   0.12 **   0.13 ***   0.39 **
edlvdua:edlvfdua       -0.02                    -0.04 *
edlvfdua:edlvmdua           0.01    0.00             
edlvmdua:edlvfdua                   0.00        -0.03 
edlvmdua:edlvdua                       -0.02     
edlvdua:edlvmdua:edlvfdua                           0.01 
Observations   1018   1018   1018   1018   1018   1018   1018
R2 / adj. R2   .285 / .283   .293 / .291   .176 / .174   .295 / .292   .295 / .292   .297 / .294   .302 / .297
Notes * p<.05   ** p<.01   *** p<.001

Also in addition (because I was interested in it, that looks like Bourdeaurd) comparison of linear regressions of years of education respondent & partner with years of education of father & mother of respondent

## 
##  Pearson's product-moment correlation
## 
## data:  politics$edlvfdua and politics$edlvmdua
## t = 32.859, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6866184 0.7463080
## sample estimates:
##       cor 
## 0.7177794
    edlvfdua   edlvpdua
    B   B
(Intercept)   1.32 ***   3.45 ***
edlvmdua   0.72 ***    
edlvdua       0.49 ***
Observations   1018   1018
R2 / adj. R2   .515 / .515   .252 / .251
Notes * p<.05   ** p<.01   *** p<.001
  • сделать несколько кластеров по возрасту респондента и посмотреть на разницу поколений (создать новую переменную - группировать по возрасту, а потом сделать интерактивный эффект по возрасту, смотреть как возрастные группы влияют)
hist(politics$agea)

ggplot() +
  geom_boxplot(data = politics, aes(x = age_group, y = edlvdua, fill = age_group)) +
  theme_light()+ 
  xlab("age group") +
  ylab("(respondent's) highest level of education")
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.

fitFMa = lm(edlvfdua ~ edlvmdua * age_group, data=politics)
fitPRa = lm(edlvpdua ~ edlvdua * age_group, data=politics)


sjt.lm(fitFMa,fitPRa, show.ci = F, p.numeric = F)
## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.
    edlvfdua   edlvpdua
    B   B
(Intercept)   2.29 ***   4.06 ***
edlvmdua   0.60 ***    
age_group
age_group40-60   -0.93 **   -0.18 
age_group60-90   -1.35 ***   -1.30 **
edlvmdua:age_group40-60   0.08     
edlvmdua:age_group60-90   0.16 *    
edlvdua       0.45 ***
edlvdua:age_group40-60       -0.03 
edlvdua:age_group60-90       0.07 
Observations   1018   1018
R2 / adj. R2   .524 / .522   .266 / .263
Notes * p<.05   ** p<.01   *** p<.001
sjPlot::plot_model(fitFMa, type = "int", terms = c("edlvmdua","age_group"), mdrt.values = "meansd")

sjPlot::plot_model(fitPRa, type = "int", terms = c("edlvmdua","age_group"), mdrt.values = "meansd")

group2 <- subset(politics, agea < 30 & agea > 19)
group3 <- subset(politics, agea < 40 & agea > 29)
group4 <- subset(politics, agea < 50 & agea > 39)
group5 <- subset(politics, agea < 60 & agea > 49)
group6 <- subset(politics, agea < 70 & agea > 59)

fitPR2 = lm(edlvpdua ~ edlvdua, data=group2)
fitPR3 = lm(edlvpdua ~ edlvdua, data=group3)
fitPR4 = lm(edlvpdua ~ edlvdua, data=group4)
fitPR5 = lm(edlvpdua ~ edlvdua, data=group5)
fitPR6 = lm(edlvpdua ~ edlvdua, data=group6)

sjt.lm(fitPR2,fitPR3,fitPR4,fitPR5,fitPR6, show.ci = F, p.numeric = F)
    edlvpdua   edlvpdua   edlvpdua   edlvpdua   edlvpdua
    B   B   B   B   B
(Intercept)   3.58 ***   4.50 ***   3.78 ***   3.95 ***   4.40 ***
edlvdua   0.49 ***   0.42 ***   0.43 ***   0.41 ***   0.32 ***
Observations   158   220   197   190   153
R2 / adj. R2   .264 / .259   .204 / .200   .161 / .156   .201 / .196   .105 / .099
Notes * p<.05   ** p<.01   *** p<.001

project 5

Theory: it was in the project 4 - Habitus Theory.

Data: We will predict edlvpdua partner’s highest level of education by edlvdua (respondent’s) highest level of education. Then we trying to use age_group respondent’s age (3 levels) and gndr gender (2 levels) as moderators.

fit5 <- lm(edlvpdua ~ edlvdua, data=politics)
summary(fit5)
## 
## Call:
## lm(formula = edlvpdua ~ edlvdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -6.8383 -1.4077  0.1221  1.6518  7.0625 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)   3.4474     0.2004    17.2 <0.0000000000000002 ***
## edlvdua       0.4901     0.0265    18.5 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.371 on 1016 degrees of freedom
## Multiple R-squared:  0.2519, Adjusted R-squared:  0.2512 
## F-statistic: 342.1 on 1 and 1016 DF,  p-value: < 0.00000000000000022

Here we can see that this model predicts 25%. Let’s try to use different moderators.

With age:

fit5iA <- lm(edlvpdua ~ edlvdua * age_group, data=politics)
summary(fit5iA)
## 
## Call:
## lm(formula = edlvpdua ~ edlvdua * age_group, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -6.5523 -1.8124  0.1594  1.8732  7.7142 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##                        Estimate Std. Error t value             Pr(>|t|)
## (Intercept)             4.05926    0.35079  11.572 < 0.0000000000000002
## edlvdua                 0.45195    0.04351  10.388 < 0.0000000000000002
## age_group40-60         -0.17853    0.50522  -0.353              0.72389
## age_group60-90         -1.30013    0.48043  -2.706              0.00692
## edlvdua:age_group40-60 -0.02911    0.06523  -0.446              0.65555
## edlvdua:age_group60-90  0.07470    0.06497   1.150              0.25052
##                           
## (Intercept)            ***
## edlvdua                ***
## age_group40-60            
## age_group60-90         ** 
## edlvdua:age_group40-60    
## edlvdua:age_group60-90    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.352 on 1012 degrees of freedom
## Multiple R-squared:  0.2665, Adjusted R-squared:  0.2629 
## F-statistic: 73.53 on 5 and 1012 DF,  p-value: < 0.00000000000000022

This model predicts slightly better - 26%.

With gender:

fit5iG <- lm(edlvpdua ~ edlvdua * gndr, data=politics)
summary(fit5iG)
## 
## Call:
## lm(formula = edlvpdua ~ edlvdua * gndr, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -7.0159 -1.6567  0.2966  1.7732  7.4834 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##                  Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)       2.99329    0.27023  11.077 <0.0000000000000002 ***
## edlvdua           0.52335    0.03511  14.906 <0.0000000000000002 ***
## gndrmale          0.95242    0.40050   2.378              0.0176 *  
## edlvdua:gndrmale -0.06242    0.05329  -1.171              0.2417    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.358 on 1014 degrees of freedom
## Multiple R-squared:  0.2615, Adjusted R-squared:  0.2594 
## F-statistic: 119.7 on 3 and 1014 DF,  p-value: < 0.00000000000000022

This model predicts slightly better that the first but worse that second with age.

Now lets look closer on the table and choose the better model:

sjt.lm(fit5,fit5iA,fit5iG, show.ci = F, p.numeric = F)
## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.
    edlvpdua   edlvpdua   edlvpdua
    B   B   B
(Intercept)   3.45 ***   4.06 ***   2.99 ***
edlvdua   0.49 ***   0.45 ***   0.52 ***
age_group
age_group40-60       -0.18     
age_group60-90       -1.30 **    
edlvdua:age_group40-60       -0.03     
edlvdua:age_group60-90       0.07     
gndrmale           0.95 *
edlvdua:gndrmale           -0.06 
Observations   1018   1018   1018
R2 / adj. R2   .252 / .251   .266 / .263   .262 / .259
Notes * p<.05   ** p<.01   *** p<.001

As we can see the second model with the age is better.

And we test these models with ANOVA:

anova(fit5,fit5iA,fit5iG)
## Analysis of Variance Table
## 
## Model 1: edlvpdua ~ edlvdua
## Model 2: edlvpdua ~ edlvdua * age_group
## Model 3: edlvpdua ~ edlvdua * gndr
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1   1016 5710.7                                 
## 2   1012 5599.4  4   111.317 5.0297 0.000513 ***
## 3   1014 5637.1 -2   -37.691 3.4060 0.033555 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So here we can see that it’s better to use the second model.

Also we can use categorical variable as dependent and continuous variable as independent variable (eduyrs)

fit5iAe <- lm(edlvpdua ~ eduyrs * age_group, data=politics)
summary(fit5iAe)
## 
## Call:
## lm(formula = edlvpdua ~ eduyrs * age_group, data = politics)
## 
## Residuals:
## <Labelled double>
##      Min       1Q   Median       3Q      Max 
## -10.4839  -1.9937   0.3025   1.7881   6.1066 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##                       Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)            2.44700    0.58046   4.216            0.0000271 ***
## eduyrs                 0.37504    0.04219   8.890 < 0.0000000000000002 ***
## age_group40-60         1.18873    0.77598   1.532               0.1259    
## age_group60-90        -0.78013    0.78131  -0.998               0.3183    
## eduyrs:age_group40-60 -0.11742    0.05801  -2.024               0.0432 *  
## eduyrs:age_group60-90 -0.01447    0.05962  -0.243               0.8083    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.456 on 1012 degrees of freedom
## Multiple R-squared:  0.2005, Adjusted R-squared:  0.1965 
## F-statistic: 50.74 on 5 and 1012 DF,  p-value: < 0.00000000000000022

As we can see this model predicts worse that others. We can explain it by difference between each level of education - the same amount of years can be spent on getting bachelor degree in university and getting two college degrees but these two equals same amount of years of education. So we can not use eduyrs in this particular case because we don’t have the same variable in years on partner’s level of education (but even if we have still it will predict worse on the same reason).

Interaction plot:

sjPlot::plot_model(fit5iA, type = "int", terms = c("age_group"), mdrt.values = "meansd")

- parallel lines indicate that there is no interaction effect while different slopes suggest that one might be present - so here we have interaction effect between 40-60 and 60-90.