project 4

linear regression - 1 (Direct Effects):

Research question and hypotheses which you explain to the reader
Descriptive statistics for each variable involved (for single variables and pairs of variables), e.g. histograms, barplots, scatter plots, and box plots
Hierarchically built models: you start with one predictor and then add another one
Comparison of model fits with ANOVA + you write down which model fits better To accompany your final model:
you have provided a correct interpretation of the overall model fit (F-statistic, R-squared)
correct interpretation of all coefficients
you have written out the regression equation
you have explained whether you hypotheses were falsified or supported
you have created an output table with sjt.lm() or stargazer() and formulated some conclusion
you have submitted a feedback to another group’s project on the forum

Variables

We gonna predict edlvpdua: Partner’s highest level of education, Ukraine

by edlvdua: (Respondent’s) highest level of education, Ukraine

and edlvfdua: Father’s highest level of education, Ukraine

and edlvmdua: Mother’s highest level of education, Ukraine

agea: age of respondent (factor)
gender: sex of respondent (factor)

Hypothesis

Partner’s highest level of education correlate with (respondent’s) highest level of education like in Bourdieu’s concept of habitus - any person looking for partner with the same level of education.
Mother’s and father’s levels of education influence through (respondent’s) level of education on respondent’s parnter’s highest level of education - parents influence on respondent’s level of education and that influence on respondent’s partner’s level of education - like in Bourdieu’s concept of habitus.

+) Additional: also we trying to show that younglings more tolerate in choosing partners by shared level of education that elder generations - that means nowadays shared (same) level of education influence not that strong on choosing partner then earlier.

Libraries+

library(car)
library(ggplot2)
library(psych)
library(QuantPsyc)
library(sjPlot)
library(stargazer)
library(car)
library(ggplot2)
library(gvlma)
library(magrittr)
library(effects)
options(scipen=999)

Dataset

getwd()

## [1] "/students/dstsimokha/DA_shirokaner"

politics = haven::read_sav("ESS6UA.sav")
politics <- dplyr::select(politics, edlvpdua, edlvdua, edlvfdua, edlvmdua, gndr, agea, eduyrs)
politics <- na.omit(politics)
politics <- subset(politics, edlvdua!="5555" & edlvfdua!="5555" & edlvmdua!="5555" & agea>"19")
politics$age_group <- as.factor(car::recode(politics$agea,"20:40='20-40';40:60='40-60';60:90='60-90'"))
politics$gndr <- as.factor(car::recode(politics$gndr,"1='male';2='female'"))

Descriptive statistics

All data statistics

##            vars    n  mean    sd median trimmed   mad min max range  skew
## edlvpdua      1 1018  6.89  2.74      8    7.01  2.97   0  12    12 -0.30
## edlvdua       2 1018  7.02  2.81      8    7.19  2.97   0  12    12 -0.41
## edlvfdua      3 1018  4.36  3.22      4    4.16  4.45   0  12    12  0.44
## edlvmdua      4 1018  4.24  3.22      4    4.04  4.45   0  11    11  0.42
## gndr*         5 1018  1.42  0.49      1    1.40  0.00   1   2     1  0.33
## agea          6 1018 47.22 15.75     46   46.57 19.27  20  87    67  0.26
## eduyrs        7 1018 12.72  3.29     13   12.82  2.97   1  30    29 -0.14
## age_group*    8 1018  1.85  0.77      2    1.81  1.48   1   3     2  0.27
##            kurtosis   se
## edlvpdua      -0.93 0.09
## edlvdua       -0.91 0.09
## edlvfdua      -0.93 0.10
## edlvmdua      -1.05 0.10
## gndr*         -1.89 0.02
## agea          -0.90 0.49
## eduyrs         3.89 0.10
## age_group*    -1.28 0.02

Outcome

Predictors

Visualization of relationship

Partner & respondent

Partner & respondent’s father

Partner & respondent’s mother

Correlation test

Partner & respondent

## 
##  Pearson's product-moment correlation
## 
## data:  politics$edlvpdua and politics$edlvdua
## t = 18.496, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4544714 0.5464887
## sample estimates:
##       cor 
## 0.5018989

Partner & respondent’s father

## 
##  Pearson's product-moment correlation
## 
## data:  politics$edlvpdua and politics$edlvfdua
## t = 13.688, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3414232 0.4452358
## sample estimates:
##      cor 
## 0.394588

Partner & respondent’s mother

## 
##  Pearson's product-moment correlation
## 
## data:  politics$edlvpdua and politics$edlvmdua
## t = 13.141, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3273713 0.4324617
## sample estimates:
##      cor 
## 0.381147

Assumptions

I will do it a moment later cause we don’t need it

linearity
normally distributed residuals of the outcome
predictors have an unlimited range
independent observations (no correlation between their residuals)
independent variables (no multicollinearity)
no outliers
homoskedasticity (residual variances are the same along the values of Y)

Linearity and outliers

Linearity

fit4 <- lm(edlvpdua ~ edlvdua + edlvfdua + edlvmdua, data=politics)
 
ggplot(data = politics, aes(x = edlvpdua, y = edlvdua)) + 
  geom_point() + 
  geom_smooth(method = "lm") +
  theme_light()

## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.

Outliers

outlierTest(fit4)

## 
## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
##      rstudent unadjusted p-value Bonferonni p
## 769 -3.186016          0.0014866           NA

Additivity

gvmodel1 <- gvlma(fit4) 
summary(gvmodel1)

## 
## Call:
## lm(formula = edlvpdua ~ edlvdua + edlvfdua + edlvmdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -7.2679 -1.6655  0.1428  1.5108  7.2571 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  3.27807    0.19601  16.724 < 0.0000000000000002 ***
## edlvdua      0.38260    0.02923  13.089 < 0.0000000000000002 ***
## edlvfdua     0.13203    0.03284   4.020            0.0000626 ***
## edlvmdua     0.08222    0.03296   2.494               0.0128 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.304 on 1014 degrees of freedom
## Multiple R-squared:  0.2946, Adjusted R-squared:  0.2926 
## F-statistic: 141.2 on 3 and 1014 DF,  p-value: < 0.00000000000000022
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit4) 
## 
##                     Value p-value                Decision
## Global Stat        5.1546 0.27180 Assumptions acceptable.
## Skewness           2.7703 0.09603 Assumptions acceptable.
## Kurtosis           1.0745 0.29993 Assumptions acceptable.
## Link Function      1.0471 0.30617 Assumptions acceptable.
## Heteroscedasticity 0.2626 0.60834 Assumptions acceptable.

Independent errors

durbinWatsonTest(fit4)

##  lag Autocorrelation D-W Statistic p-value
##    1       0.1047431      1.789712       0
##  Alternative hypothesis: rho != 0

Homoscedasticity

ncvTest(fit4)

## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 13.29712    Df = 1     p = 0.0002658145

Normality of the residuals

model1=aov(politics$edlvpdua ~ politics$edlvdua) 
res1=model1$residuals 
shapiro.test(res1)

## 
##  Shapiro-Wilk normality test
## 
## data:  res1
## W = 0.98922, p-value = 0.0000008359

for ONE

model2=aov(politics$edlvpdua ~ politics$edlvfdua) 
res2=model2$residuals 
shapiro.test(res2)

## 
##  Shapiro-Wilk normality test
## 
## data:  res2
## W = 0.98308, p-value = 0.000000001728

for SECOND

model3=aov(politics$edlvpdua ~ politics$edlvmdua) 
res3=model3$residuals 
shapiro.test(res3)

## 
##  Shapiro-Wilk normality test
## 
## data:  res3
## W = 0.9783, p-value = 0.00000000003407

for THIRD

No confounding varibales

Predictor variables are quantitative or dichotomous, outcome variable is continuous

Multicollinearity

# checking multicolinearity for independent variables.
#VIF(lm(edlvpdua ~ edlvdua * edlvfdua * edlvmdua, data=politics))

Global test for model assumptions

gvmodel <- gvlma(fit4) 
summary(gvmodel)

## 
## Call:
## lm(formula = edlvpdua ~ edlvdua + edlvfdua + edlvmdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -7.2679 -1.6655  0.1428  1.5108  7.2571 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  3.27807    0.19601  16.724 < 0.0000000000000002 ***
## edlvdua      0.38260    0.02923  13.089 < 0.0000000000000002 ***
## edlvfdua     0.13203    0.03284   4.020            0.0000626 ***
## edlvmdua     0.08222    0.03296   2.494               0.0128 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.304 on 1014 degrees of freedom
## Multiple R-squared:  0.2946, Adjusted R-squared:  0.2926 
## F-statistic: 141.2 on 3 and 1014 DF,  p-value: < 0.00000000000000022
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = fit4) 
## 
##                     Value p-value                Decision
## Global Stat        5.1546 0.27180 Assumptions acceptable.
## Skewness           2.7703 0.09603 Assumptions acceptable.
## Kurtosis           1.0745 0.29993 Assumptions acceptable.
## Link Function      1.0471 0.30617 Assumptions acceptable.
## Heteroscedasticity 0.2626 0.60834 Assumptions acceptable.

Linear regression:

Model 1

(edlvdua: Highest level of education, Ukraine)

ggplot(politics, aes(edlvpdua)) + geom_density(fill="blue")

ggplot(politics, aes(edlvdua)) + geom_density(fill="red")

qqplot(politics$edlvpdua, politics$edlvdua)

fit1 = lm(edlvpdua ~ edlvdua, data=politics)
summary(fit1)

## 
## Call:
## lm(formula = edlvpdua ~ edlvdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -6.8383 -1.4077  0.1221  1.6518  7.0625 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)   3.4474     0.2004    17.2 <0.0000000000000002 ***
## edlvdua       0.4901     0.0265    18.5 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.371 on 1016 degrees of freedom
## Multiple R-squared:  0.2519, Adjusted R-squared:  0.2512 
## F-statistic: 342.1 on 1 and 1016 DF,  p-value: < 0.00000000000000022

#par(mfrow=c(2,2))
#plot(fit1)

sjPlot::plot_model(fit1, type = "pred", terms = "edlvdua")

This model explains 25% of variance

Intercept = 3.43 & b coefficient = 0.5:

y(edlvpdua) = 0.5 * x(edlvdua) + 3.43

for every additonal (respondent’s) higher level of education the partner’s higher level of education increases by 0.5 from 3.43

Model 2

(edlvfdua: Father’s highest level of education, Ukraine)

fit2 = lm(as.numeric(edlvpdua) ~ as.numeric(edlvfdua), data=politics)
summary(fit2)

## 
## Call:
## lm(formula = as.numeric(edlvpdua) ~ as.numeric(edlvfdua), data = politics)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.7846 -2.0960  0.2318  1.8875  5.5761 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           5.42386    0.13306   40.76 <0.0000000000000002 ***
## as.numeric(edlvfdua)  0.33607    0.02455   13.69 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.519 on 1016 degrees of freedom
## Multiple R-squared:  0.1557, Adjusted R-squared:  0.1549 
## F-statistic: 187.4 on 1 and 1016 DF,  p-value: < 0.00000000000000022

par(mfrow=c(2,2))
plot(fit2)

This model explains 16% of variance

Intercept = 5.9 & b coefficient = 0.34:

y(edlvpdua) = 0.34 * x(edlvfdua) + 5.9

for every additonal (respondent’s) father’s higher level of education the partner’s higher level of education increases by 0.34 from 5.9

Model 3

(edlvmdua: Mother’s highest level of education, Ukraine)

fit3 = lm(as.numeric(edlvpdua) ~ as.numeric(edlvmdua), data=politics)
summary(fit3)

## 
## Call:
## lm(formula = as.numeric(edlvpdua) ~ as.numeric(edlvmdua), data = politics)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7596 -2.1108  0.2136  1.8892  5.4844 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           5.51558    0.13133   42.00 <0.0000000000000002 ***
## as.numeric(edlvmdua)  0.32440    0.02469   13.14 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.534 on 1016 degrees of freedom
## Multiple R-squared:  0.1453, Adjusted R-squared:  0.1444 
## F-statistic: 172.7 on 1 and 1016 DF,  p-value: < 0.00000000000000022

par(mfrow=c(2,2))
plot(fit3)

This model explains 15% of variance

Intercept = 5.5 & b coefficient = 0.33:

y(edlvpdua) = 0.33 * x(edlvmdua) + 5.5

for every additonal (respondent’s) mother’s higher level of education the partner’s higher level of education increases by 0.33 from 5.5

ANOVA:

## Analysis of Variance Table
## 
## Response: edlvpdua
##             Df Sum Sq Mean Sq F value                Pr(>F)    
## edlvdua      1 1922.9 1922.94  342.11 < 0.00000000000000022 ***
## Residuals 1016 5710.7    5.62                                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model 4

(final)

fit4 <- lm(as.numeric(edlvpdua) ~ as.numeric(edlvdua) + as.numeric(edlvfdua) + as.numeric(edlvmdua), data=politics)
summary(fit4)

## 
## Call:
## lm(formula = as.numeric(edlvpdua) ~ as.numeric(edlvdua) + as.numeric(edlvfdua) + 
##     as.numeric(edlvmdua), data = politics)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.2679 -1.6655  0.1428  1.5108  7.2571 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)           3.27807    0.19601  16.724 < 0.0000000000000002 ***
## as.numeric(edlvdua)   0.38260    0.02923  13.089 < 0.0000000000000002 ***
## as.numeric(edlvfdua)  0.13203    0.03284   4.020            0.0000626 ***
## as.numeric(edlvmdua)  0.08222    0.03296   2.494               0.0128 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.304 on 1014 degrees of freedom
## Multiple R-squared:  0.2946, Adjusted R-squared:  0.2926 
## F-statistic: 141.2 on 3 and 1014 DF,  p-value: < 0.00000000000000022

par(mfrow=c(2,2))  
plot(fit4)

lm.beta(fit4)

##  as.numeric(edlvdua) as.numeric(edlvfdua) as.numeric(edlvmdua) 
##           0.39183089           0.15501268           0.09660109

plot_model(fit4, type = "std")

fit4i <- lm(edlvpdua ~ edlvdua * edlvfdua * edlvmdua, data=politics)
summary(fit4i)

## 
## Call:
## lm(formula = edlvpdua ~ edlvdua * edlvfdua * edlvmdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -7.0086 -1.7259  0.1634  1.5722  7.8285 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##                            Estimate Std. Error t value
## (Intercept)                2.414640   0.361565   6.678
## edlvdua                    0.532523   0.054436   9.783
## edlvfdua                   0.387153   0.134628   2.876
## edlvmdua                   0.254588   0.138580   1.837
## edlvdua:edlvfdua          -0.040679   0.017111  -2.377
## edlvdua:edlvmdua          -0.030296   0.017090  -1.773
## edlvfdua:edlvmdua         -0.029865   0.025109  -1.189
## edlvdua:edlvfdua:edlvmdua  0.005302   0.002845   1.864
##                                       Pr(>|t|)    
## (Intercept)                    0.0000000000398 ***
## edlvdua                   < 0.0000000000000002 ***
## edlvfdua                               0.00412 ** 
## edlvmdua                               0.06648 .  
## edlvdua:edlvfdua                       0.01762 *  
## edlvdua:edlvmdua                       0.07658 .  
## edlvfdua:edlvmdua                      0.23456    
## edlvdua:edlvfdua:edlvmdua              0.06267 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.296 on 1010 degrees of freedom
## Multiple R-squared:  0.3023, Adjusted R-squared:  0.2975 
## F-statistic: 62.53 on 7 and 1010 DF,  p-value: < 0.00000000000000022

sjPlot::plot_model(fit4i, type = "pred", terms = "edlvpdua")

This model explains 30% of variance

Intercept = 3.2 & b1 coefficient = 0.38 & b2 coefficient = 0.13 + b3 coefficient = 0.08:

y(edlvpdua) = 0.38(edlvdua) + 0.13(edlvfdua) + 0.08(edlvmdua) + 3.2

for every additonal (respondent’s) highest level of education + (respondent’s) father’s highest level of education + (respondent’s) mother’s highest level of education the partner’s highest level of education increases by 0.38 + 0.13 + 0.08 from 3.2

Comparison of models

		edlvpdua	as.numeric(edlvpdua)	as.numeric(edlvpdua)	as.numeric(edlvpdua)
		B	B	B	B
(Intercept)		3.45 ***	5.42 ***	5.52 ***	3.28 ***
edlvdua		0.49 ***
as.numeric(edlvfdua)			0.34 ***		0.13 ***
as.numeric(edlvmdua)				0.32 ***	0.08 *
as.numeric(edlvdua)					0.38 ***
Observations		1018	1018	1018	1018
R² / adj. R²		.252 / .251	.156 / .155	.145 / .144	.295 / .293
Notes	* p<.05 p<.01 * p<.001

Picking the best model - the last fourth model has the best result in prediction of partner’s highest level of education = 30%

Interaction effect

edlvpdua

(Intercept)

2.98 ***

2.86 ***

5.35 ***

3.33 ***

2.90 ***

2.41 ***

edlvdua

0.45 ***

0.46 ***

0.38 ***

0.44 ***

0.53 ***

edlvmdua

0.28 ***

0.13 *

0.07

0.20 **

0.25

edlvdua:edlvmdua

-0.01

-0.03

edlvfdua

0.32 ***

0.18 ***

0.12 **

0.13 ***

0.39 **

edlvdua:edlvfdua

-0.02

-0.04 *

edlvfdua:edlvmdua

0.01

0.00

edlvmdua:edlvfdua

0.00

-0.03

edlvmdua:edlvdua

-0.02

edlvdua:edlvmdua:edlvfdua

0.01

Observations

1018

R² / adj. R²

.285 / .283

.293 / .291

.176 / .174

.295 / .292

.297 / .294

.302 / .297

Notes

* p<.05 ** p<.01 *** p<.001

Also in addition (because I was interested in it, that looks like Bourdeaurd) comparison of linear regressions of years of education respondent & partner with years of education of father & mother of respondent

## 
##  Pearson's product-moment correlation
## 
## data:  politics$edlvfdua and politics$edlvmdua
## t = 32.859, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6866184 0.7463080
## sample estimates:
##       cor 
## 0.7177794

		edlvfdua	edlvpdua
		B	B
(Intercept)		1.32 ***	3.45 ***
edlvmdua		0.72 ***
edlvdua			0.49 ***
Observations		1018	1018
R² / adj. R²		.515 / .515	.252 / .251
Notes	* p<.05 p<.01 * p<.001

сделать несколько кластеров по возрасту респондента и посмотреть на разницу поколений (создать новую переменную - группировать по возрасту, а потом сделать интерактивный эффект по возрасту, смотреть как возрастные группы влияют)

hist(politics$agea)

ggplot() +
  geom_boxplot(data = politics, aes(x = age_group, y = edlvdua, fill = age_group)) +
  theme_light()+ 
  xlab("age group") +
  ylab("(respondent's) highest level of education")

## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.

fitFMa = lm(edlvfdua ~ edlvmdua * age_group, data=politics)
fitPRa = lm(edlvpdua ~ edlvdua * age_group, data=politics)


sjt.lm(fitFMa,fitPRa, show.ci = F, p.numeric = F)

## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.

		edlvfdua	edlvpdua
		B	B
(Intercept)		2.29 ***	4.06 ***
edlvmdua		0.60 ***
age_group
age_group40-60		-0.93 **	-0.18
age_group60-90		-1.35 ***	-1.30 **
edlvmdua:age_group40-60		0.08
edlvmdua:age_group60-90		0.16 *
edlvdua			0.45 ***
edlvdua:age_group40-60			-0.03
edlvdua:age_group60-90			0.07
Observations		1018	1018
R² / adj. R²		.524 / .522	.266 / .263
Notes	* p<.05 p<.01 * p<.001

sjPlot::plot_model(fitFMa, type = "int", terms = c("edlvmdua","age_group"), mdrt.values = "meansd")

sjPlot::plot_model(fitPRa, type = "int", terms = c("edlvmdua","age_group"), mdrt.values = "meansd")

group2 <- subset(politics, agea < 30 & agea > 19)
group3 <- subset(politics, agea < 40 & agea > 29)
group4 <- subset(politics, agea < 50 & agea > 39)
group5 <- subset(politics, agea < 60 & agea > 49)
group6 <- subset(politics, agea < 70 & agea > 59)

fitPR2 = lm(edlvpdua ~ edlvdua, data=group2)
fitPR3 = lm(edlvpdua ~ edlvdua, data=group3)
fitPR4 = lm(edlvpdua ~ edlvdua, data=group4)
fitPR5 = lm(edlvpdua ~ edlvdua, data=group5)
fitPR6 = lm(edlvpdua ~ edlvdua, data=group6)

sjt.lm(fitPR2,fitPR3,fitPR4,fitPR5,fitPR6, show.ci = F, p.numeric = F)

		edlvpdua	edlvpdua	edlvpdua	edlvpdua	edlvpdua
		B	B	B	B	B
(Intercept)		3.58 ***	4.50 ***	3.78 ***	3.95 ***	4.40 ***
edlvdua		0.49 ***	0.42 ***	0.43 ***	0.41 ***	0.32 ***
Observations		158	220	197	190	153
R² / adj. R²		.264 / .259	.204 / .200	.161 / .156	.201 / .196	.105 / .099
Notes	* p<.05 p<.01 * p<.001

project 5

Theory: it was in the project 4 - Habitus Theory.

Data: We will predict edlvpdua partner’s highest level of education by edlvdua (respondent’s) highest level of education. Then we trying to use age_group respondent’s age (3 levels) and gndr gender (2 levels) as moderators.

fit5 <- lm(edlvpdua ~ edlvdua, data=politics)
summary(fit5)

## 
## Call:
## lm(formula = edlvpdua ~ edlvdua, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -6.8383 -1.4077  0.1221  1.6518  7.0625 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)   3.4474     0.2004    17.2 <0.0000000000000002 ***
## edlvdua       0.4901     0.0265    18.5 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.371 on 1016 degrees of freedom
## Multiple R-squared:  0.2519, Adjusted R-squared:  0.2512 
## F-statistic: 342.1 on 1 and 1016 DF,  p-value: < 0.00000000000000022

Here we can see that this model predicts 25%. Let’s try to use different moderators.

With age:

fit5iA <- lm(edlvpdua ~ edlvdua * age_group, data=politics)
summary(fit5iA)

## 
## Call:
## lm(formula = edlvpdua ~ edlvdua * age_group, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -6.5523 -1.8124  0.1594  1.8732  7.7142 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##                        Estimate Std. Error t value             Pr(>|t|)
## (Intercept)             4.05926    0.35079  11.572 < 0.0000000000000002
## edlvdua                 0.45195    0.04351  10.388 < 0.0000000000000002
## age_group40-60         -0.17853    0.50522  -0.353              0.72389
## age_group60-90         -1.30013    0.48043  -2.706              0.00692
## edlvdua:age_group40-60 -0.02911    0.06523  -0.446              0.65555
## edlvdua:age_group60-90  0.07470    0.06497   1.150              0.25052
##                           
## (Intercept)            ***
## edlvdua                ***
## age_group40-60            
## age_group60-90         ** 
## edlvdua:age_group40-60    
## edlvdua:age_group60-90    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.352 on 1012 degrees of freedom
## Multiple R-squared:  0.2665, Adjusted R-squared:  0.2629 
## F-statistic: 73.53 on 5 and 1012 DF,  p-value: < 0.00000000000000022

This model predicts slightly better - 26%.

With gender:

fit5iG <- lm(edlvpdua ~ edlvdua * gndr, data=politics)
summary(fit5iG)

## 
## Call:
## lm(formula = edlvpdua ~ edlvdua * gndr, data = politics)
## 
## Residuals:
## <Labelled double>
##     Min      1Q  Median      3Q     Max 
## -7.0159 -1.6567  0.2966  1.7732  7.4834 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##                  Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)       2.99329    0.27023  11.077 <0.0000000000000002 ***
## edlvdua           0.52335    0.03511  14.906 <0.0000000000000002 ***
## gndrmale          0.95242    0.40050   2.378              0.0176 *  
## edlvdua:gndrmale -0.06242    0.05329  -1.171              0.2417    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.358 on 1014 degrees of freedom
## Multiple R-squared:  0.2615, Adjusted R-squared:  0.2594 
## F-statistic: 119.7 on 3 and 1014 DF,  p-value: < 0.00000000000000022

This model predicts slightly better that the first but worse that second with age.

Now lets look closer on the table and choose the better model:

sjt.lm(fit5,fit5iA,fit5iG, show.ci = F, p.numeric = F)

## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.

		edlvpdua	edlvpdua	edlvpdua
		B	B	B
(Intercept)		3.45 ***	4.06 ***	2.99 ***
edlvdua		0.49 ***	0.45 ***	0.52 ***
age_group
age_group40-60			-0.18
age_group60-90			-1.30 **
edlvdua:age_group40-60			-0.03
edlvdua:age_group60-90			0.07
gndrmale				0.95 *
edlvdua:gndrmale				-0.06
Observations		1018	1018	1018
R² / adj. R²		.252 / .251	.266 / .263	.262 / .259
Notes	* p<.05 p<.01 * p<.001

As we can see the second model with the age is better.

And we test these models with ANOVA:

anova(fit5,fit5iA,fit5iG)

## Analysis of Variance Table
## 
## Model 1: edlvpdua ~ edlvdua
## Model 2: edlvpdua ~ edlvdua * age_group
## Model 3: edlvpdua ~ edlvdua * gndr
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1   1016 5710.7                                 
## 2   1012 5599.4  4   111.317 5.0297 0.000513 ***
## 3   1014 5637.1 -2   -37.691 3.4060 0.033555 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So here we can see that it’s better to use the second model.

Also we can use categorical variable as dependent and continuous variable as independent variable (eduyrs)

fit5iAe <- lm(edlvpdua ~ eduyrs * age_group, data=politics)
summary(fit5iAe)

## 
## Call:
## lm(formula = edlvpdua ~ eduyrs * age_group, data = politics)
## 
## Residuals:
## <Labelled double>
##      Min       1Q   Median       3Q      Max 
## -10.4839  -1.9937   0.3025   1.7881   6.1066 
## 
## Labels:
##  value                                                        label
##      0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
##      1              Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
##      2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
##      3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
##      4  Povna serednja osvita (atestat pro povnu serednju osvitu za
##      5  Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
##      6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
##      7                          PTU na bazi povnoi seredn'oi osvity
##      8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
##      9                             Bazova vyshcha osvita (bakalavr)
##     10                            Povna vyshcha osvita (specialist)
##     11                               Povna vyshcha osvita (magistr)
##     12                                   Aspirantura, vchena stupin
##   5555                                                        Other
##   6666                                               Not applicable
##   7777                                                      Refusal
##   8888                                                   Don't know
##   9999                                                    No answer
## 
## Coefficients:
##                       Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)            2.44700    0.58046   4.216            0.0000271 ***
## eduyrs                 0.37504    0.04219   8.890 < 0.0000000000000002 ***
## age_group40-60         1.18873    0.77598   1.532               0.1259    
## age_group60-90        -0.78013    0.78131  -0.998               0.3183    
## eduyrs:age_group40-60 -0.11742    0.05801  -2.024               0.0432 *  
## eduyrs:age_group60-90 -0.01447    0.05962  -0.243               0.8083    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.456 on 1012 degrees of freedom
## Multiple R-squared:  0.2005, Adjusted R-squared:  0.1965 
## F-statistic: 50.74 on 5 and 1012 DF,  p-value: < 0.00000000000000022

As we can see this model predicts worse that others. We can explain it by difference between each level of education - the same amount of years can be spent on getting bachelor degree in university and getting two college degrees but these two equals same amount of years of education. So we can not use eduyrs in this particular case because we don’t have the same variable in years on partner’s level of education (but even if we have still it will predict worse on the same reason).

Interaction plot:

sjPlot::plot_model(fit5iA, type = "int", terms = c("age_group"), mdrt.values = "meansd")

- parallel lines indicate that there is no interaction effect while different slopes suggest that one might be present - so here we have interaction effect between 40-60 and 60-90.

KM: linear regression - 1 (Direct Effects) + linear regression - 2 (Interaction Effects)

Tsimokha D, Zaytseva N & Cherepanov G

14 05 2018

project 4

Variables

Hypothesis

Libraries+

Dataset

Descriptive statistics

All data statistics

Outcome

Predictors

Visualization of relationship

Partner & respondent

Partner & respondent’s father

Partner & respondent’s mother

Correlation test

Partner & respondent

Partner & respondent’s father

Partner & respondent’s mother

Assumptions

Linearity and outliers

Linearity

Outliers

Additivity

Independent errors

Homoscedasticity

Normality of the residuals

No confounding varibales

Predictor variables are quantitative or dichotomous, outcome variable is continuous

Multicollinearity

Global test for model assumptions

Linear regression:

Model 1

Model 2

Model 3

ANOVA:

Model 4

Comparison of models

Interaction effect

project 5