linear regression - 1 (Direct Effects):
Research question and hypotheses which you explain to the reader
Descriptive statistics for each variable involved (for single variables and pairs of variables), e.g. histograms, barplots, scatter plots, and box plots
Hierarchically built models: you start with one predictor and then add another one
Comparison of model fits with ANOVA + you write down which model fits better To accompany your final model:
you have provided a correct interpretation of the overall model fit (F-statistic, R-squared)
correct interpretation of all coefficients
you have written out the regression equation
you have explained whether you hypotheses were falsified or supported
you have created an output table with sjt.lm() or stargazer() and formulated some conclusion
you have submitted a feedback to another group’s project on the forum
We gonna predict edlvpdua: Partner’s highest level of education, Ukraine
by edlvdua: (Respondent’s) highest level of education, Ukraine
and edlvfdua: Father’s highest level of education, Ukraine
and edlvmdua: Mother’s highest level of education, Ukraine
agea: age of respondent (factor)
gender: sex of respondent (factor)
Partner’s highest level of education correlate with (respondent’s) highest level of education like in Bourdieu’s concept of habitus - any person looking for partner with the same level of education.
Mother’s and father’s levels of education influence through (respondent’s) level of education on respondent’s parnter’s highest level of education - parents influence on respondent’s level of education and that influence on respondent’s partner’s level of education - like in Bourdieu’s concept of habitus.
+) Additional: also we trying to show that younglings more tolerate in choosing partners by shared level of education that elder generations - that means nowadays shared (same) level of education influence not that strong on choosing partner then earlier.
library(car)
library(ggplot2)
library(psych)
library(QuantPsyc)
library(sjPlot)
library(stargazer)
library(car)
library(ggplot2)
library(gvlma)
library(magrittr)
library(effects)
options(scipen=999)
getwd()
## [1] "/students/dstsimokha/DA_shirokaner"
politics = haven::read_sav("ESS6UA.sav")
politics <- dplyr::select(politics, edlvpdua, edlvdua, edlvfdua, edlvmdua, gndr, agea, eduyrs)
politics <- na.omit(politics)
politics <- subset(politics, edlvdua!="5555" & edlvfdua!="5555" & edlvmdua!="5555" & agea>"19")
politics$age_group <- as.factor(car::recode(politics$agea,"20:40='20-40';40:60='40-60';60:90='60-90'"))
politics$gndr <- as.factor(car::recode(politics$gndr,"1='male';2='female'"))
## vars n mean sd median trimmed mad min max range skew
## edlvpdua 1 1018 6.89 2.74 8 7.01 2.97 0 12 12 -0.30
## edlvdua 2 1018 7.02 2.81 8 7.19 2.97 0 12 12 -0.41
## edlvfdua 3 1018 4.36 3.22 4 4.16 4.45 0 12 12 0.44
## edlvmdua 4 1018 4.24 3.22 4 4.04 4.45 0 11 11 0.42
## gndr* 5 1018 1.42 0.49 1 1.40 0.00 1 2 1 0.33
## agea 6 1018 47.22 15.75 46 46.57 19.27 20 87 67 0.26
## eduyrs 7 1018 12.72 3.29 13 12.82 2.97 1 30 29 -0.14
## age_group* 8 1018 1.85 0.77 2 1.81 1.48 1 3 2 0.27
## kurtosis se
## edlvpdua -0.93 0.09
## edlvdua -0.91 0.09
## edlvfdua -0.93 0.10
## edlvmdua -1.05 0.10
## gndr* -1.89 0.02
## agea -0.90 0.49
## eduyrs 3.89 0.10
## age_group* -1.28 0.02
##
## Pearson's product-moment correlation
##
## data: politics$edlvpdua and politics$edlvdua
## t = 18.496, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4544714 0.5464887
## sample estimates:
## cor
## 0.5018989
##
## Pearson's product-moment correlation
##
## data: politics$edlvpdua and politics$edlvfdua
## t = 13.688, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3414232 0.4452358
## sample estimates:
## cor
## 0.394588
##
## Pearson's product-moment correlation
##
## data: politics$edlvpdua and politics$edlvmdua
## t = 13.141, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3273713 0.4324617
## sample estimates:
## cor
## 0.381147
I will do it a moment later cause we don’t need it
fit4 <- lm(edlvpdua ~ edlvdua + edlvfdua + edlvmdua, data=politics)
ggplot(data = politics, aes(x = edlvpdua, y = edlvdua)) +
geom_point() +
geom_smooth(method = "lm") +
theme_light()
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
outlierTest(fit4)
##
## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
## rstudent unadjusted p-value Bonferonni p
## 769 -3.186016 0.0014866 NA
gvmodel1 <- gvlma(fit4)
summary(gvmodel1)
##
## Call:
## lm(formula = edlvpdua ~ edlvdua + edlvfdua + edlvmdua, data = politics)
##
## Residuals:
## <Labelled double>
## Min 1Q Median 3Q Max
## -7.2679 -1.6655 0.1428 1.5108 7.2571
##
## Labels:
## value label
## 0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
## 1 Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
## 2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
## 3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
## 4 Povna serednja osvita (atestat pro povnu serednju osvitu za
## 5 Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
## 6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
## 7 PTU na bazi povnoi seredn'oi osvity
## 8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
## 9 Bazova vyshcha osvita (bakalavr)
## 10 Povna vyshcha osvita (specialist)
## 11 Povna vyshcha osvita (magistr)
## 12 Aspirantura, vchena stupin
## 5555 Other
## 6666 Not applicable
## 7777 Refusal
## 8888 Don't know
## 9999 No answer
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.27807 0.19601 16.724 < 0.0000000000000002 ***
## edlvdua 0.38260 0.02923 13.089 < 0.0000000000000002 ***
## edlvfdua 0.13203 0.03284 4.020 0.0000626 ***
## edlvmdua 0.08222 0.03296 2.494 0.0128 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.304 on 1014 degrees of freedom
## Multiple R-squared: 0.2946, Adjusted R-squared: 0.2926
## F-statistic: 141.2 on 3 and 1014 DF, p-value: < 0.00000000000000022
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = fit4)
##
## Value p-value Decision
## Global Stat 5.1546 0.27180 Assumptions acceptable.
## Skewness 2.7703 0.09603 Assumptions acceptable.
## Kurtosis 1.0745 0.29993 Assumptions acceptable.
## Link Function 1.0471 0.30617 Assumptions acceptable.
## Heteroscedasticity 0.2626 0.60834 Assumptions acceptable.
durbinWatsonTest(fit4)
## lag Autocorrelation D-W Statistic p-value
## 1 0.1047431 1.789712 0
## Alternative hypothesis: rho != 0
ncvTest(fit4)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 13.29712 Df = 1 p = 0.0002658145
model1=aov(politics$edlvpdua ~ politics$edlvdua)
res1=model1$residuals
shapiro.test(res1)
##
## Shapiro-Wilk normality test
##
## data: res1
## W = 0.98922, p-value = 0.0000008359
for ONE
model2=aov(politics$edlvpdua ~ politics$edlvfdua)
res2=model2$residuals
shapiro.test(res2)
##
## Shapiro-Wilk normality test
##
## data: res2
## W = 0.98308, p-value = 0.000000001728
for SECOND
model3=aov(politics$edlvpdua ~ politics$edlvmdua)
res3=model3$residuals
shapiro.test(res3)
##
## Shapiro-Wilk normality test
##
## data: res3
## W = 0.9783, p-value = 0.00000000003407
for THIRD
# checking multicolinearity for independent variables.
#VIF(lm(edlvpdua ~ edlvdua * edlvfdua * edlvmdua, data=politics))
gvmodel <- gvlma(fit4)
summary(gvmodel)
##
## Call:
## lm(formula = edlvpdua ~ edlvdua + edlvfdua + edlvmdua, data = politics)
##
## Residuals:
## <Labelled double>
## Min 1Q Median 3Q Max
## -7.2679 -1.6655 0.1428 1.5108 7.2571
##
## Labels:
## value label
## 0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
## 1 Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
## 2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
## 3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
## 4 Povna serednja osvita (atestat pro povnu serednju osvitu za
## 5 Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
## 6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
## 7 PTU na bazi povnoi seredn'oi osvity
## 8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
## 9 Bazova vyshcha osvita (bakalavr)
## 10 Povna vyshcha osvita (specialist)
## 11 Povna vyshcha osvita (magistr)
## 12 Aspirantura, vchena stupin
## 5555 Other
## 6666 Not applicable
## 7777 Refusal
## 8888 Don't know
## 9999 No answer
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.27807 0.19601 16.724 < 0.0000000000000002 ***
## edlvdua 0.38260 0.02923 13.089 < 0.0000000000000002 ***
## edlvfdua 0.13203 0.03284 4.020 0.0000626 ***
## edlvmdua 0.08222 0.03296 2.494 0.0128 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.304 on 1014 degrees of freedom
## Multiple R-squared: 0.2946, Adjusted R-squared: 0.2926
## F-statistic: 141.2 on 3 and 1014 DF, p-value: < 0.00000000000000022
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = fit4)
##
## Value p-value Decision
## Global Stat 5.1546 0.27180 Assumptions acceptable.
## Skewness 2.7703 0.09603 Assumptions acceptable.
## Kurtosis 1.0745 0.29993 Assumptions acceptable.
## Link Function 1.0471 0.30617 Assumptions acceptable.
## Heteroscedasticity 0.2626 0.60834 Assumptions acceptable.
(edlvdua: Highest level of education, Ukraine)
ggplot(politics, aes(edlvpdua)) + geom_density(fill="blue")
ggplot(politics, aes(edlvdua)) + geom_density(fill="red")
qqplot(politics$edlvpdua, politics$edlvdua)
fit1 = lm(edlvpdua ~ edlvdua, data=politics)
summary(fit1)
##
## Call:
## lm(formula = edlvpdua ~ edlvdua, data = politics)
##
## Residuals:
## <Labelled double>
## Min 1Q Median 3Q Max
## -6.8383 -1.4077 0.1221 1.6518 7.0625
##
## Labels:
## value label
## 0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
## 1 Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
## 2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
## 3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
## 4 Povna serednja osvita (atestat pro povnu serednju osvitu za
## 5 Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
## 6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
## 7 PTU na bazi povnoi seredn'oi osvity
## 8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
## 9 Bazova vyshcha osvita (bakalavr)
## 10 Povna vyshcha osvita (specialist)
## 11 Povna vyshcha osvita (magistr)
## 12 Aspirantura, vchena stupin
## 5555 Other
## 6666 Not applicable
## 7777 Refusal
## 8888 Don't know
## 9999 No answer
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.4474 0.2004 17.2 <0.0000000000000002 ***
## edlvdua 0.4901 0.0265 18.5 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.371 on 1016 degrees of freedom
## Multiple R-squared: 0.2519, Adjusted R-squared: 0.2512
## F-statistic: 342.1 on 1 and 1016 DF, p-value: < 0.00000000000000022
#par(mfrow=c(2,2))
#plot(fit1)
sjPlot::plot_model(fit1, type = "pred", terms = "edlvdua")
This model explains 25% of variance
Intercept = 3.43 & b coefficient = 0.5:
y(edlvpdua) = 0.5 * x(edlvdua) + 3.43
for every additonal (respondent’s) higher level of education the partner’s higher level of education increases by 0.5 from 3.43
(edlvfdua: Father’s highest level of education, Ukraine)
fit2 = lm(as.numeric(edlvpdua) ~ as.numeric(edlvfdua), data=politics)
summary(fit2)
##
## Call:
## lm(formula = as.numeric(edlvpdua) ~ as.numeric(edlvfdua), data = politics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7846 -2.0960 0.2318 1.8875 5.5761
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.42386 0.13306 40.76 <0.0000000000000002 ***
## as.numeric(edlvfdua) 0.33607 0.02455 13.69 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.519 on 1016 degrees of freedom
## Multiple R-squared: 0.1557, Adjusted R-squared: 0.1549
## F-statistic: 187.4 on 1 and 1016 DF, p-value: < 0.00000000000000022
par(mfrow=c(2,2))
plot(fit2)
This model explains 16% of variance
Intercept = 5.9 & b coefficient = 0.34:
y(edlvpdua) = 0.34 * x(edlvfdua) + 5.9
for every additonal (respondent’s) father’s higher level of education the partner’s higher level of education increases by 0.34 from 5.9
(edlvmdua: Mother’s highest level of education, Ukraine)
fit3 = lm(as.numeric(edlvpdua) ~ as.numeric(edlvmdua), data=politics)
summary(fit3)
##
## Call:
## lm(formula = as.numeric(edlvpdua) ~ as.numeric(edlvmdua), data = politics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.7596 -2.1108 0.2136 1.8892 5.4844
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.51558 0.13133 42.00 <0.0000000000000002 ***
## as.numeric(edlvmdua) 0.32440 0.02469 13.14 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.534 on 1016 degrees of freedom
## Multiple R-squared: 0.1453, Adjusted R-squared: 0.1444
## F-statistic: 172.7 on 1 and 1016 DF, p-value: < 0.00000000000000022
par(mfrow=c(2,2))
plot(fit3)
This model explains 15% of variance
Intercept = 5.5 & b coefficient = 0.33:
y(edlvpdua) = 0.33 * x(edlvmdua) + 5.5
for every additonal (respondent’s) mother’s higher level of education the partner’s higher level of education increases by 0.33 from 5.5
## Analysis of Variance Table
##
## Response: edlvpdua
## Df Sum Sq Mean Sq F value Pr(>F)
## edlvdua 1 1922.9 1922.94 342.11 < 0.00000000000000022 ***
## Residuals 1016 5710.7 5.62
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(final)
fit4 <- lm(as.numeric(edlvpdua) ~ as.numeric(edlvdua) + as.numeric(edlvfdua) + as.numeric(edlvmdua), data=politics)
summary(fit4)
##
## Call:
## lm(formula = as.numeric(edlvpdua) ~ as.numeric(edlvdua) + as.numeric(edlvfdua) +
## as.numeric(edlvmdua), data = politics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.2679 -1.6655 0.1428 1.5108 7.2571
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.27807 0.19601 16.724 < 0.0000000000000002 ***
## as.numeric(edlvdua) 0.38260 0.02923 13.089 < 0.0000000000000002 ***
## as.numeric(edlvfdua) 0.13203 0.03284 4.020 0.0000626 ***
## as.numeric(edlvmdua) 0.08222 0.03296 2.494 0.0128 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.304 on 1014 degrees of freedom
## Multiple R-squared: 0.2946, Adjusted R-squared: 0.2926
## F-statistic: 141.2 on 3 and 1014 DF, p-value: < 0.00000000000000022
par(mfrow=c(2,2))
plot(fit4)
lm.beta(fit4)
## as.numeric(edlvdua) as.numeric(edlvfdua) as.numeric(edlvmdua)
## 0.39183089 0.15501268 0.09660109
plot_model(fit4, type = "std")
fit4i <- lm(edlvpdua ~ edlvdua * edlvfdua * edlvmdua, data=politics)
summary(fit4i)
##
## Call:
## lm(formula = edlvpdua ~ edlvdua * edlvfdua * edlvmdua, data = politics)
##
## Residuals:
## <Labelled double>
## Min 1Q Median 3Q Max
## -7.0086 -1.7259 0.1634 1.5722 7.8285
##
## Labels:
## value label
## 0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
## 1 Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
## 2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
## 3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
## 4 Povna serednja osvita (atestat pro povnu serednju osvitu za
## 5 Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
## 6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
## 7 PTU na bazi povnoi seredn'oi osvity
## 8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
## 9 Bazova vyshcha osvita (bakalavr)
## 10 Povna vyshcha osvita (specialist)
## 11 Povna vyshcha osvita (magistr)
## 12 Aspirantura, vchena stupin
## 5555 Other
## 6666 Not applicable
## 7777 Refusal
## 8888 Don't know
## 9999 No answer
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 2.414640 0.361565 6.678
## edlvdua 0.532523 0.054436 9.783
## edlvfdua 0.387153 0.134628 2.876
## edlvmdua 0.254588 0.138580 1.837
## edlvdua:edlvfdua -0.040679 0.017111 -2.377
## edlvdua:edlvmdua -0.030296 0.017090 -1.773
## edlvfdua:edlvmdua -0.029865 0.025109 -1.189
## edlvdua:edlvfdua:edlvmdua 0.005302 0.002845 1.864
## Pr(>|t|)
## (Intercept) 0.0000000000398 ***
## edlvdua < 0.0000000000000002 ***
## edlvfdua 0.00412 **
## edlvmdua 0.06648 .
## edlvdua:edlvfdua 0.01762 *
## edlvdua:edlvmdua 0.07658 .
## edlvfdua:edlvmdua 0.23456
## edlvdua:edlvfdua:edlvmdua 0.06267 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.296 on 1010 degrees of freedom
## Multiple R-squared: 0.3023, Adjusted R-squared: 0.2975
## F-statistic: 62.53 on 7 and 1010 DF, p-value: < 0.00000000000000022
sjPlot::plot_model(fit4i, type = "pred", terms = "edlvpdua")
This model explains 30% of variance
Intercept = 3.2 & b1 coefficient = 0.38 & b2 coefficient = 0.13 + b3 coefficient = 0.08:
y(edlvpdua) = 0.38(edlvdua) + 0.13(edlvfdua) + 0.08(edlvmdua) + 3.2
for every additonal (respondent’s) highest level of education + (respondent’s) father’s highest level of education + (respondent’s) mother’s highest level of education the partner’s highest level of education increases by 0.38 + 0.13 + 0.08 from 3.2
| edlvpdua | as.numeric(edlvpdua) | as.numeric(edlvpdua) | as.numeric(edlvpdua) | |||||
| B | B | B | B | |||||
| (Intercept) | 3.45 *** | 5.42 *** | 5.52 *** | 3.28 *** | ||||
| edlvdua | 0.49 *** | |||||||
| as.numeric(edlvfdua) | 0.34 *** | 0.13 *** | ||||||
| as.numeric(edlvmdua) | 0.32 *** | 0.08 * | ||||||
| as.numeric(edlvdua) | 0.38 *** | |||||||
| Observations | 1018 | 1018 | 1018 | 1018 | ||||
| R2 / adj. R2 | .252 / .251 | .156 / .155 | .145 / .144 | .295 / .293 | ||||
| Notes | * p<.05 ** p<.01 *** p<.001 | |||||||
Picking the best model - the last fourth model has the best result in prediction of partner’s highest level of education = 30%
| edlvpdua | edlvpdua | edlvpdua | edlvpdua | edlvpdua | edlvpdua | edlvpdua | ||||||||
| B | B | B | B | B | B | B | ||||||||
| (Intercept) | 2.98 *** | 2.86 *** | 5.35 *** | 3.33 *** | 3.33 *** | 2.90 *** | 2.41 *** | |||||||
| edlvdua | 0.45 *** | 0.46 *** | 0.38 *** | 0.38 *** | 0.44 *** | 0.53 *** | ||||||||
| edlvmdua | 0.28 *** | 0.13 * | 0.07 | 0.07 | 0.20 ** | 0.25 | ||||||||
| edlvdua:edlvmdua | -0.01 | -0.03 | ||||||||||||
| edlvfdua | 0.32 *** | 0.18 *** | 0.12 ** | 0.12 ** | 0.13 *** | 0.39 ** | ||||||||
| edlvdua:edlvfdua | -0.02 | -0.04 * | ||||||||||||
| edlvfdua:edlvmdua | 0.01 | 0.00 | ||||||||||||
| edlvmdua:edlvfdua | 0.00 | -0.03 | ||||||||||||
| edlvmdua:edlvdua | -0.02 | |||||||||||||
| edlvdua:edlvmdua:edlvfdua | 0.01 | |||||||||||||
| Observations | 1018 | 1018 | 1018 | 1018 | 1018 | 1018 | 1018 | |||||||
| R2 / adj. R2 | .285 / .283 | .293 / .291 | .176 / .174 | .295 / .292 | .295 / .292 | .297 / .294 | .302 / .297 | |||||||
| Notes | * p<.05 ** p<.01 *** p<.001 | |||||||||||||
Also in addition (because I was interested in it, that looks like Bourdeaurd) comparison of linear regressions of years of education respondent & partner with years of education of father & mother of respondent
##
## Pearson's product-moment correlation
##
## data: politics$edlvfdua and politics$edlvmdua
## t = 32.859, df = 1016, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6866184 0.7463080
## sample estimates:
## cor
## 0.7177794
| edlvfdua | edlvpdua | |||
| B | B | |||
| (Intercept) | 1.32 *** | 3.45 *** | ||
| edlvmdua | 0.72 *** | |||
| edlvdua | 0.49 *** | |||
| Observations | 1018 | 1018 | ||
| R2 / adj. R2 | .515 / .515 | .252 / .251 | ||
| Notes | * p<.05 ** p<.01 *** p<.001 | |||
hist(politics$agea)
ggplot() +
geom_boxplot(data = politics, aes(x = age_group, y = edlvdua, fill = age_group)) +
theme_light()+
xlab("age group") +
ylab("(respondent's) highest level of education")
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
fitFMa = lm(edlvfdua ~ edlvmdua * age_group, data=politics)
fitPRa = lm(edlvpdua ~ edlvdua * age_group, data=politics)
sjt.lm(fitFMa,fitPRa, show.ci = F, p.numeric = F)
## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.
| edlvfdua | edlvpdua | |||
| B | B | |||
| (Intercept) | 2.29 *** | 4.06 *** | ||
| edlvmdua | 0.60 *** | |||
| age_group | ||||
| age_group40-60 | -0.93 ** | -0.18 | ||
| age_group60-90 | -1.35 *** | -1.30 ** | ||
| edlvmdua:age_group40-60 | 0.08 | |||
| edlvmdua:age_group60-90 | 0.16 * | |||
| edlvdua | 0.45 *** | |||
| edlvdua:age_group40-60 | -0.03 | |||
| edlvdua:age_group60-90 | 0.07 | |||
| Observations | 1018 | 1018 | ||
| R2 / adj. R2 | .524 / .522 | .266 / .263 | ||
| Notes | * p<.05 ** p<.01 *** p<.001 | |||
sjPlot::plot_model(fitFMa, type = "int", terms = c("edlvmdua","age_group"), mdrt.values = "meansd")
sjPlot::plot_model(fitPRa, type = "int", terms = c("edlvmdua","age_group"), mdrt.values = "meansd")
group2 <- subset(politics, agea < 30 & agea > 19)
group3 <- subset(politics, agea < 40 & agea > 29)
group4 <- subset(politics, agea < 50 & agea > 39)
group5 <- subset(politics, agea < 60 & agea > 49)
group6 <- subset(politics, agea < 70 & agea > 59)
fitPR2 = lm(edlvpdua ~ edlvdua, data=group2)
fitPR3 = lm(edlvpdua ~ edlvdua, data=group3)
fitPR4 = lm(edlvpdua ~ edlvdua, data=group4)
fitPR5 = lm(edlvpdua ~ edlvdua, data=group5)
fitPR6 = lm(edlvpdua ~ edlvdua, data=group6)
sjt.lm(fitPR2,fitPR3,fitPR4,fitPR5,fitPR6, show.ci = F, p.numeric = F)
| edlvpdua | edlvpdua | edlvpdua | edlvpdua | edlvpdua | ||||||
| B | B | B | B | B | ||||||
| (Intercept) | 3.58 *** | 4.50 *** | 3.78 *** | 3.95 *** | 4.40 *** | |||||
| edlvdua | 0.49 *** | 0.42 *** | 0.43 *** | 0.41 *** | 0.32 *** | |||||
| Observations | 158 | 220 | 197 | 190 | 153 | |||||
| R2 / adj. R2 | .264 / .259 | .204 / .200 | .161 / .156 | .201 / .196 | .105 / .099 | |||||
| Notes | * p<.05 ** p<.01 *** p<.001 | |||||||||
Theory: it was in the project 4 - Habitus Theory.
Data: We will predict edlvpdua partner’s highest level of education by edlvdua (respondent’s) highest level of education. Then we trying to use age_group respondent’s age (3 levels) and gndr gender (2 levels) as moderators.
fit5 <- lm(edlvpdua ~ edlvdua, data=politics)
summary(fit5)
##
## Call:
## lm(formula = edlvpdua ~ edlvdua, data = politics)
##
## Residuals:
## <Labelled double>
## Min 1Q Median 3Q Max
## -6.8383 -1.4077 0.1221 1.6518 7.0625
##
## Labels:
## value label
## 0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
## 1 Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
## 2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
## 3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
## 4 Povna serednja osvita (atestat pro povnu serednju osvitu za
## 5 Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
## 6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
## 7 PTU na bazi povnoi seredn'oi osvity
## 8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
## 9 Bazova vyshcha osvita (bakalavr)
## 10 Povna vyshcha osvita (specialist)
## 11 Povna vyshcha osvita (magistr)
## 12 Aspirantura, vchena stupin
## 5555 Other
## 6666 Not applicable
## 7777 Refusal
## 8888 Don't know
## 9999 No answer
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.4474 0.2004 17.2 <0.0000000000000002 ***
## edlvdua 0.4901 0.0265 18.5 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.371 on 1016 degrees of freedom
## Multiple R-squared: 0.2519, Adjusted R-squared: 0.2512
## F-statistic: 342.1 on 1 and 1016 DF, p-value: < 0.00000000000000022
Here we can see that this model predicts 25%. Let’s try to use different moderators.
With age:
fit5iA <- lm(edlvpdua ~ edlvdua * age_group, data=politics)
summary(fit5iA)
##
## Call:
## lm(formula = edlvpdua ~ edlvdua * age_group, data = politics)
##
## Residuals:
## <Labelled double>
## Min 1Q Median 3Q Max
## -6.5523 -1.8124 0.1594 1.8732 7.7142
##
## Labels:
## value label
## 0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
## 1 Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
## 2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
## 3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
## 4 Povna serednja osvita (atestat pro povnu serednju osvitu za
## 5 Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
## 6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
## 7 PTU na bazi povnoi seredn'oi osvity
## 8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
## 9 Bazova vyshcha osvita (bakalavr)
## 10 Povna vyshcha osvita (specialist)
## 11 Povna vyshcha osvita (magistr)
## 12 Aspirantura, vchena stupin
## 5555 Other
## 6666 Not applicable
## 7777 Refusal
## 8888 Don't know
## 9999 No answer
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.05926 0.35079 11.572 < 0.0000000000000002
## edlvdua 0.45195 0.04351 10.388 < 0.0000000000000002
## age_group40-60 -0.17853 0.50522 -0.353 0.72389
## age_group60-90 -1.30013 0.48043 -2.706 0.00692
## edlvdua:age_group40-60 -0.02911 0.06523 -0.446 0.65555
## edlvdua:age_group60-90 0.07470 0.06497 1.150 0.25052
##
## (Intercept) ***
## edlvdua ***
## age_group40-60
## age_group60-90 **
## edlvdua:age_group40-60
## edlvdua:age_group60-90
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.352 on 1012 degrees of freedom
## Multiple R-squared: 0.2665, Adjusted R-squared: 0.2629
## F-statistic: 73.53 on 5 and 1012 DF, p-value: < 0.00000000000000022
This model predicts slightly better - 26%.
With gender:
fit5iG <- lm(edlvpdua ~ edlvdua * gndr, data=politics)
summary(fit5iG)
##
## Call:
## lm(formula = edlvpdua ~ edlvdua * gndr, data = politics)
##
## Residuals:
## <Labelled double>
## Min 1Q Median 3Q Max
## -7.0159 -1.6567 0.2966 1.7732 7.4834
##
## Labels:
## value label
## 0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
## 1 Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
## 2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
## 3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
## 4 Povna serednja osvita (atestat pro povnu serednju osvitu za
## 5 Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
## 6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
## 7 PTU na bazi povnoi seredn'oi osvity
## 8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
## 9 Bazova vyshcha osvita (bakalavr)
## 10 Povna vyshcha osvita (specialist)
## 11 Povna vyshcha osvita (magistr)
## 12 Aspirantura, vchena stupin
## 5555 Other
## 6666 Not applicable
## 7777 Refusal
## 8888 Don't know
## 9999 No answer
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.99329 0.27023 11.077 <0.0000000000000002 ***
## edlvdua 0.52335 0.03511 14.906 <0.0000000000000002 ***
## gndrmale 0.95242 0.40050 2.378 0.0176 *
## edlvdua:gndrmale -0.06242 0.05329 -1.171 0.2417
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.358 on 1014 degrees of freedom
## Multiple R-squared: 0.2615, Adjusted R-squared: 0.2594
## F-statistic: 119.7 on 3 and 1014 DF, p-value: < 0.00000000000000022
This model predicts slightly better that the first but worse that second with age.
Now lets look closer on the table and choose the better model:
sjt.lm(fit5,fit5iA,fit5iG, show.ci = F, p.numeric = F)
## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.
| edlvpdua | edlvpdua | edlvpdua | ||||
| B | B | B | ||||
| (Intercept) | 3.45 *** | 4.06 *** | 2.99 *** | |||
| edlvdua | 0.49 *** | 0.45 *** | 0.52 *** | |||
| age_group | ||||||
| age_group40-60 | -0.18 | |||||
| age_group60-90 | -1.30 ** | |||||
| edlvdua:age_group40-60 | -0.03 | |||||
| edlvdua:age_group60-90 | 0.07 | |||||
| gndrmale | 0.95 * | |||||
| edlvdua:gndrmale | -0.06 | |||||
| Observations | 1018 | 1018 | 1018 | |||
| R2 / adj. R2 | .252 / .251 | .266 / .263 | .262 / .259 | |||
| Notes | * p<.05 ** p<.01 *** p<.001 | |||||
As we can see the second model with the age is better.
And we test these models with ANOVA:
anova(fit5,fit5iA,fit5iG)
## Analysis of Variance Table
##
## Model 1: edlvpdua ~ edlvdua
## Model 2: edlvpdua ~ edlvdua * age_group
## Model 3: edlvpdua ~ edlvdua * gndr
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1016 5710.7
## 2 1012 5599.4 4 111.317 5.0297 0.000513 ***
## 3 1014 5637.1 -2 -37.691 3.4060 0.033555 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So here we can see that it’s better to use the second model.
Also we can use categorical variable as dependent and continuous variable as independent variable (eduyrs)
fit5iAe <- lm(edlvpdua ~ eduyrs * age_group, data=politics)
summary(fit5iAe)
##
## Call:
## lm(formula = edlvpdua ~ eduyrs * age_group, data = politics)
##
## Residuals:
## <Labelled double>
## Min 1Q Median 3Q Max
## -10.4839 -1.9937 0.3025 1.7881 6.1066
##
## Labels:
## value label
## 0 Nepovna pochatkova osvita (menshe 4-kh klasiv seredn'oi shko
## 1 Pochatkova osvita (4-7 klasiv seredn'oi shkoly)
## 2 Nepovna serednja osvita (atestat za 8-9 klasiv seredn'oi shk
## 3 PTU na bazi nepovnoi seredn'oi osviti, nemaje atestatu pro p
## 4 Povna serednja osvita (atestat pro povnu serednju osvitu za
## 5 Zakinchiv PTU na bazi nepovnoi seredn'oi osviti (atestat za
## 6 Dodatkove navchannja na bazi povnoi seredn'oi osvity (profes
## 7 PTU na bazi povnoi seredn'oi osvity
## 8 Nepovna vyshcha osvita (molodshij specialist - dyplom tekhni
## 9 Bazova vyshcha osvita (bakalavr)
## 10 Povna vyshcha osvita (specialist)
## 11 Povna vyshcha osvita (magistr)
## 12 Aspirantura, vchena stupin
## 5555 Other
## 6666 Not applicable
## 7777 Refusal
## 8888 Don't know
## 9999 No answer
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.44700 0.58046 4.216 0.0000271 ***
## eduyrs 0.37504 0.04219 8.890 < 0.0000000000000002 ***
## age_group40-60 1.18873 0.77598 1.532 0.1259
## age_group60-90 -0.78013 0.78131 -0.998 0.3183
## eduyrs:age_group40-60 -0.11742 0.05801 -2.024 0.0432 *
## eduyrs:age_group60-90 -0.01447 0.05962 -0.243 0.8083
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.456 on 1012 degrees of freedom
## Multiple R-squared: 0.2005, Adjusted R-squared: 0.1965
## F-statistic: 50.74 on 5 and 1012 DF, p-value: < 0.00000000000000022
As we can see this model predicts worse that others. We can explain it by difference between each level of education - the same amount of years can be spent on getting bachelor degree in university and getting two college degrees but these two equals same amount of years of education. So we can not use eduyrs in this particular case because we don’t have the same variable in years on partner’s level of education (but even if we have still it will predict worse on the same reason).
Interaction plot:
sjPlot::plot_model(fit5iA, type = "int", terms = c("age_group"), mdrt.values = "meansd")
- parallel lines indicate that there is no interaction effect while different slopes suggest that one might be present - so here we have interaction effect between 40-60 and 60-90.