In this practice, we will focus primarily on how to compare linear regression models and how linear regression relates to other methods we have already covered in this course.
library(foreign)
prac <- read.spss("C:/Users/ASUS/Documents/Data Analysis/gen/sem4_WV6_Russia.sav", to.data.frame = TRUE, use.value.labels = TRUE)
library(dplyr)
df <- select(prac,
V240, # sex
V242, # age
V248, # the highest educational level attained
V239, # scale of income
RESEMAVAL) # index of emancipative values
df$V242 <- as.numeric(as.character(df$V242))
df$V239 <- as.numeric(df$V239)
df$RESEMAVAL <- as.numeric(as.character(df$RESEMAVAL))
rm(prac) # remove "prac" from the working environment
library(ggplot2)
library(RColorBrewer)
library(psych)
df$V248[df$V248 == "No formal education"] <- "Complete primary school"
df$V248[df$V248 == "Incomplete primary school"] <- "Complete primary school"
df$V248 <- droplevels(df$V248) #drop levels with 0 observations from variable description
names(df) <- c("sex", "age", "edu", "income", "emval")
m1 <- lm(emval ~ income, data = df)
m2 <- lm(emval ~ income + age, data = df)
m3 <- lm(emval ~ income + age + sex, data = df)
m4 <- lm(emval ~ income + age + sex + edu, data = df)
To compare model fit across models, the anova() function is used with nested (hierarchical) models as arguments.
To compare models 1-4 estimated in the previous task:
# anova(m1, m2, m3, m4) # gives an error: the models were fitted to datasets of different sizes
Comment: This should return the error “Error in anova.lmlist(object, …) : models were not all fitted to the same size of dataset”.
Let’s see how many observations are lost as we fit larger models:
m1$df #2401 df
## [1] 2401
m2$df #2400 df
## [1] 2400
m3$df #2399 df
## [1] 2399
m4$df #2383 df
## [1] 2383
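(1) Models 1-3 have 2403 observations while model 4 has 2393 (observations = residual df + number of estimated coefficients, e.g. 2401 + 2 = 2403 for model 1 and 2383 + 10 = 2393 for model 4).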
This is so because some respondents failed to indicate their education.
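A quick way to see the mismatch is to count the rows each model actually used. The sketch below runs on simulated data (the data frame `toy` and its columns are made up for illustration), not on the survey file; `nobs()` is the base R extractor for the number of observations in a fitted model.

```r
# Simulated data: 'y' and 'x1' are complete, 'x2' has 5 missing values
set.seed(1)
toy <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
toy$x2[1:5] <- NA

small <- lm(y ~ x1, data = toy)       # uses all 100 rows
large <- lm(y ~ x1 + x2, data = toy)  # lm() silently drops the 5 incomplete rows

n_used <- c(small = nobs(small), large = nobs(large))
print(n_used)

# anova() refuses to compare models fitted to different numbers of rows
res <- try(anova(small, large), silent = TRUE)
print(inherits(res, "try-error"))
```

Checking `nobs()` before calling `anova()` shows directly which model lost rows.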
One way to go here is to delete all missing values (if there are not too many) and fit all the models on the same dataset with no missings.
How many missings are there? Look at the last row of data summary:
summary(df)
## sex age
## Male :1115 Min. :18.00
## Female:1385 1st Qu.:31.00
## Median :46.00
## Mean :46.06
## 3rd Qu.:59.00
## Max. :91.00
##
## edu
## Complete secondary school: technical/ vocational type :978
## University - level education, with degree :647
## Complete secondary school: university-preparatory type :367
## Incomplete secondary school: university-preparatory type:196
## Some university-level education, without degree :132
## (Other) :168
## NA's : 12
## income emval
## Min. : 1.000 Min. :0.0000
## 1st Qu.: 3.000 1st Qu.:0.2998
## Median : 4.000 Median :0.3917
## Mean : 4.208 Mean :0.3931
## 3rd Qu.: 5.000 3rd Qu.:0.4850
## Max. :10.000 Max. :0.9259
## NA's :75 NA's :22
library(mice)
##
## Attaching package: 'mice'
## The following objects are masked from 'package:base':
##
## cbind, rbind
md.pattern(df)
## sex age edu emval income
## 2393 1 1 1 1 1 0
## 73 1 1 1 1 0 1
## 22 1 1 1 0 1 1
## 10 1 1 0 1 1 1
## 2 1 1 0 1 0 2
## 0 0 12 22 75 109
(2) We see that there are 0 missing values in sex and age, while edu has 12 missings, income has 75, and emval has 22
We started out modelling by regressing emval on income. R automatically deleted all the missing values: 2,500 - 75 - 22 = 2,403 (in this case, the missings belong to different respondents).
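When model 4 added edu, 10 more respondents were dropped: 12 respondents have missing edu, but 2 of them were already excluded because of missing income. Now the models are not comparable.
Since the 107 incomplete rows are a small share of 2,500 (about 4.3 per cent), we can delete them before the analysis: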
df1 <- na.omit(df)
dim(df)[1] - dim(df1)[1] # difference between the number of rows between data frames
## [1] 107
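The number of missing values and the number of incomplete rows are not the same thing: in the data above, 109 missing values sit in 107 rows because 2 respondents miss both edu and income. A minimal sketch of that distinction, using a made-up data frame `toy` rather than the survey data:

```r
# Simulated data with overlapping missingness (made-up columns 'a', 'b', 'c')
toy <- data.frame(a = 1:10, b = 1:10, c = 1:10)
toy$a[c(1, 2)] <- NA   # two missing values in 'a'
toy$b[c(2, 3)] <- NA   # two missing values in 'b'; row 2 is missing in both

n_missing_values  <- sum(is.na(toy))            # counts cells
n_incomplete_rows <- sum(!complete.cases(toy))  # counts rows
share_dropped     <- n_incomplete_rows / nrow(toy)

print(c(values = n_missing_values, rows = n_incomplete_rows))
print(share_dropped)

# na.omit() removes exactly the incomplete rows
toy_complete <- na.omit(toy)
print(nrow(toy) - nrow(toy_complete))
```

The share of incomplete rows, not the share of missing cells, is what listwise deletion costs.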
Now, repeat all the four models on a new data.frame.
We can either replace the data = argument or update the models:
# Fill in the code here
m1upd <- lm(emval ~ income, data = df1)
m2upd <- lm(emval ~ income + age, data = df1)
m3upd <- lm(emval ~ income + age + sex, data = df1)
m4upd <- lm(emval ~ income + age + sex + edu, data = df1)
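The `update()` route mentioned above can be sketched as follows. The data are simulated and all object names (`toy`, `toy_complete`, `fit_old`, `fit_upd`) are illustrative; the formula `. ~ .` means "keep the outcome and the predictors as they are".

```r
# Simulated data; 'toy', 'toy_complete' and the column names are illustrative
set.seed(2)
toy <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50))
toy$x2[1:4] <- NA
toy_complete <- na.omit(toy)

fit_old <- lm(y ~ x1, data = toy)   # fitted to all 50 rows

# Route 1: refit by hand with the new data
fit_refit <- lm(y ~ x1, data = toy_complete)

# Route 2: keep the formula (. ~ .) and swap only the data argument
fit_upd <- update(fit_old, . ~ ., data = toy_complete)

print(c(old = nobs(fit_old), updated = nobs(fit_upd)))
print(all.equal(coef(fit_refit), coef(fit_upd)))
```

Both routes give the same model; `update()` just avoids retyping a long formula.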
Now you have four models fitted to the same 2,393 complete cases.
You can compare model fit across the updated models:
anova(m1upd, m2upd, m3upd, m4upd)
The output shows whether the larger model fits better than the previous smaller model. Thus, Model 2 is compared to Model 1, Model 3 is compared to Model 2, and Model 4 is compared to Model 3.
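The difference in model fit is tested using the F-ratio. If the larger model reduces the residual sum of squares by more than chance alone would explain, the difference is statistically significant (Pr(>F) is < 0.05).
The output also presents the residual degrees of freedom (Res.Df) and the residual sum of squares (RSS) of each model, plus the change in degrees of freedom (Df) and in the sum of squares (Sum of Sq) relative to the previous model.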
Now you can say that Model 4 fits significantly better than Model 3, even though the gain was not as big as in Models 2 and 3.
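The F-ratio described above can be reproduced by hand from the residual sums of squares. This is a sketch on the built-in `mtcars` data, which merely stands in for the survey data; the two models are arbitrary nested models chosen for illustration.

```r
# Built-in mtcars data stands in for the survey data
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)

rss_small <- deviance(small)      # residual sum of squares
rss_large <- deviance(large)
df_small  <- df.residual(small)
df_large  <- df.residual(large)

# F = (drop in RSS per added parameter) / (residual variance of the larger model)
f_manual <- ((rss_small - rss_large) / (df_small - df_large)) / (rss_large / df_large)
p_manual <- pf(f_manual, df_small - df_large, df_large, lower.tail = FALSE)

tab <- anova(small, large)
print(tab)
print(c(F = f_manual, p = p_manual))
```

The hand-computed values should match the `F` and `Pr(>F)` columns in the second row of the `anova()` table.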
ggplot2 - this is a visualization task.
m4upd <- update(m4, . ~ ., data = df1)
plot(m4upd)
To reproduce the 1st diagnostic plot in ggplot2, we will use the augment() function of the broom package (installed with the tidyverse, but loaded separately).
The augment function creates a data frame out of the model statistics where all the fitted values, residuals and essential diagnostic statistics can be found.
First, let’s augment the model and create a scatter plot following the 1st diagnostic plot:
library(broom)
am4upd <- augment(m4upd)
head(am4upd, 3)
ggplot(am4upd, aes(x = .fitted, y = .resid)) + geom_point()
Then add:
1. the horizontal line and the line of best fit (in a good model, they are both horizontal) - done
2. labels to residuals that are beyond 3.3 standard deviations (largest model error)
3. make the plot symmetric above and below zero, and
4. add the center-aligned title and lab titles - done
ggplot(am4upd, aes(x = .fitted, y = .resid)) +
  geom_hline(yintercept = 0) + # horizontal line
  geom_smooth(method = "lm", formula = y ~ x) + # best fit
  labs(
    title = "Linear regression residuals plot",
    subtitle = "index of emancipative values ~ income + age + sex + education",
    x = "Fitted Values",
    y = "Residuals") +
  geom_point() +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(plot.subtitle = element_text(hjust = 0.5))
# geom_text() # labels for the extreme residuals are not added yet
Now you see how to organize a printing-quality diagnostic plot with your own hands.
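The two steps not marked as done in the list above (labels for extreme residuals and a plot that is symmetric around zero) could look roughly like this. It is only a sketch: it uses the built-in `mtcars` data and an arbitrary model instead of `m4upd`, the names `diag_df`, `cutoff` and `y_lim` are made up, and the cutoff is lowered from 3.3 so that the small demo data is likely to show some labels.

```r
library(ggplot2)

# Built-in mtcars data and an arbitrary model stand in for m4upd
fit <- lm(mpg ~ wt + hp, data = mtcars)

diag_df <- data.frame(
  id        = rownames(mtcars),
  fitted    = fitted(fit),
  resid     = residuals(fit),
  std_resid = rstandard(fit)   # residuals in standard-deviation units
)

cutoff  <- 2   # the task above uses 3.3; lowered here for the small demo data
extreme <- diag_df[abs(diag_df$std_resid) > cutoff, ]
y_lim   <- max(abs(diag_df$resid)) * 1.05   # same limit above and below zero

p <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text(data = extreme, aes(label = id), vjust = -0.8, size = 3) +
  coord_cartesian(ylim = c(-y_lim, y_lim)) +
  labs(title = "Residuals vs fitted values", x = "Fitted values", y = "Residuals") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

print(p)
```

Passing a filtered data frame to `geom_text()` labels only the flagged points, and `coord_cartesian()` sets symmetric limits without dropping any observations.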
Compare:
library(sjPlot) # ! delete "eval = F" in the chunk header before running
plot_model(m4upd, type = "diag")
Let’s compare a linear regression of emval on sex and a t-test with the same variables involved.
First, run the linear regression:
gen <- lm(emval ~ sex, data = df1)
summary(gen)
##
## Call:
## lm(formula = emval ~ sex, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40451 -0.09273 0.00001 0.09177 0.52142
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.378791 0.003982 95.117 < 2e-16 ***
## sexFemale 0.025717 0.005346 4.811 1.6e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.13 on 2391 degrees of freedom
## Multiple R-squared: 0.009586, Adjusted R-squared: 0.009172
## F-statistic: 23.14 on 1 and 2391 DF, p-value: 1.598e-06
library(sjPlot)
## Learn more about sjPlot with 'browseVignettes("sjPlot")'.
tab_model(gen)
| Predictors | Estimates (emval) | CI | p |
|---|---|---|---|
| (Intercept) | 0.38 | 0.37 – 0.39 | <0.001 |
| sex: Female | 0.03 | 0.02 – 0.04 | <0.001 |
| Observations | 2393 | | |
| R2 / R2 adjusted | 0.010 / 0.009 | | |
anova(gen)
(4) What do we learn from the lm output? Is the model good? What values are females and males predicted to have?
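The model is statistically significant (p = 1.6e-06), but its R^2 is very small (about 0.01), while I’d like to see it bigger: sex explains only about 1% of the variance in emval. As for the predictions, males (the reference category) are predicted to have emval of 0.379 (the intercept), and females 0.379 + 0.026 = 0.405.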
t.test(df1$emval ~ df1$sex)
##
## Welch Two Sample t-test
##
## data: df1$emval by df1$sex
## t = -4.8324, df = 2315.9, p-value = 1.437e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.03615227 -0.01528086
## sample estimates:
## mean in group Male mean in group Female
## 0.3787908 0.4045074
library(effsize)
##
## Attaching package: 'effsize'
## The following object is masked from 'package:psych':
##
## cohen.d
cohen.d(df1$emval ~ df1$sex)
##
## Cohen's d
##
## d estimate: -0.1978776 (negligible)
## 95 percent confidence interval:
## lower upper
## -0.2787337 -0.1170215
So, what is in common and what is different that you learnt from both methods? Predicted scores? Statistical significance? Magnitude of effect?
In plain words, females have on average a slightly higher level of emval, but sex is not a good predictor of emancipative values.
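Big conclusion: a t-test and a linear regression provide the same information when the only predictor in the model is binary. The t-values differ slightly here (4.83 vs. 4.81) only because t.test() runs the Welch test by default, while lm() assumes equal variances; t.test(..., var.equal = TRUE) matches lm() exactly.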
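The equivalence can be checked on a small example. The sketch below uses simulated two-group data (`grp` and `score` are made-up names) and the classical equal-variance t-test, which is the version that matches `lm()`.

```r
# Simulated two-group data; 'grp' and 'score' are made-up names
set.seed(3)
toy <- data.frame(
  grp   = factor(rep(c("A", "B"), each = 40)),
  score = c(rnorm(40, mean = 0.38, sd = 0.13), rnorm(40, mean = 0.40, sd = 0.13))
)

fit <- lm(score ~ grp, data = toy)
tt  <- t.test(score ~ grp, data = toy, var.equal = TRUE)  # classical Student t-test

slope      <- unname(coef(fit)["grpB"])
group_mean <- tapply(toy$score, toy$grp, mean)
mean_diff  <- unname(group_mean["B"] - group_mean["A"])

# The slope is the difference in group means (B minus A) ...
print(all.equal(slope, mean_diff))

# ... and the t statistics agree up to sign (t.test reports A minus B)
t_lm <- summary(fit)$coefficients["grpB", "t value"]
t_tt <- unname(tt$statistic)
print(all.equal(abs(t_lm), abs(t_tt)))
```

The sign flips because `t.test()` subtracts the second group from the first, while the regression slope is the second group minus the reference group.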
Now, compare a model where emval is regressed on edu and a one-way ANOVA with the same variables in.
It is useful to have a list of all education levels for reference:
summary(df1$edu)
## Complete primary school
## 39
## Incomplete secondary school: technical/ vocational type
## 119
## Complete secondary school: technical/ vocational type
## 947
## Incomplete secondary school: university-preparatory type
## 187
## Complete secondary school: university-preparatory type
## 350
## Some university-level education, without degree
## 127
## University - level education, with degree
## 624
ggplot(data = df1,
aes(x = edu)) +
geom_bar(colour = "black",
fill = "cornflowerblue")+
coord_flip() +
theme_bw()
There are 7 levels of education with 39 to 947 observations each. Complete secondary school of the technical/vocational type is the most common level (947), followed by university education with a degree (624).
Then, let’s fit a linear regression:
lm2 <- lm(emval ~ edu, data = df1)
summary(lm2)
##
## Call:
## lm(formula = emval ~ edu, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39268 -0.09262 -0.00031 0.08986 0.53324
##
## Coefficients:
## Estimate
## (Intercept) 0.31808
## eduIncomplete secondary school: technical/ vocational type 0.03459
## eduComplete secondary school: technical/ vocational type 0.07460
## eduIncomplete secondary school: university-preparatory type 0.03056
## eduComplete secondary school: university-preparatory type 0.06060
## eduSome university-level education, without degree 0.12175
## eduUniversity - level education, with degree 0.09981
## Std. Error
## (Intercept) 0.02054
## eduIncomplete secondary school: technical/ vocational type 0.02367
## eduComplete secondary school: technical/ vocational type 0.02096
## eduIncomplete secondary school: university-preparatory type 0.02258
## eduComplete secondary school: university-preparatory type 0.02165
## eduSome university-level education, without degree 0.02348
## eduUniversity - level education, with degree 0.02117
## t value
## (Intercept) 15.486
## eduIncomplete secondary school: technical/ vocational type 1.461
## eduComplete secondary school: technical/ vocational type 3.560
## eduIncomplete secondary school: university-preparatory type 1.354
## eduComplete secondary school: university-preparatory type 2.798
## eduSome university-level education, without degree 5.185
## eduUniversity - level education, with degree 4.714
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## eduIncomplete secondary school: technical/ vocational type 0.144033
## eduComplete secondary school: technical/ vocational type 0.000379 ***
## eduIncomplete secondary school: university-preparatory type 0.175989
## eduComplete secondary school: university-preparatory type 0.005177 **
## eduSome university-level education, without degree 2.35e-07 ***
## eduUniversity - level education, with degree 2.57e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1283 on 2386 degrees of freedom
## Multiple R-squared: 0.03721, Adjusted R-squared: 0.03479
## F-statistic: 15.37 on 6 and 2386 DF, p-value: < 2.2e-16
What do we learn from the output? Is the model good?
Predicted emval for respondents with primary education (the reference category) is 0.32. Compared to them, respondents with complete secondary school (both vocational and university-preparatory) and respondents with any university education have significantly higher predicted levels of emval. For the other levels of education (incomplete secondary school of either type), the predicted difference from primary education is not statistically significant. The model as a whole is significant (F = 15.37, p < 2.2e-16), but it explains only about 3.7% of the variance in emval.
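The way dummy coefficients translate into predicted group values can be sketched on the built-in `PlantGrowth` data (a three-group factor standing in for edu): the intercept is the mean of the reference group, and each coefficient is that group's difference from the reference.

```r
# Built-in PlantGrowth data: 'weight' by 'group' (ctrl, trt1, trt2)
fit <- lm(weight ~ group, data = PlantGrowth)
group_mean <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

b <- coef(fit)
predicted <- c(
  ctrl = unname(b["(Intercept)"]),                   # reference category
  trt1 = unname(b["(Intercept)"] + b["grouptrt1"]),  # intercept + difference
  trt2 = unname(b["(Intercept)"] + b["grouptrt2"])
)

print(rbind(predicted = predicted, observed = as.vector(group_mean)))
```

With a single categorical predictor, the predicted values are simply the observed group means.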
Next, let’s run an ANOVA:
oneway.test(emval ~ edu, data = df1)
##
## One-way analysis of means (not assuming equal variances)
##
## data: emval and edu
## F = 14.4, num df = 6.00, denom df = 330.98, p-value = 1.338e-14
aov.out <- aov(emval ~ edu, data = df1)
summary(aov.out)
## Df Sum Sq Mean Sq F value Pr(>F)
## edu 6 1.52 0.25289 15.37 <2e-16 ***
## Residuals 2386 39.26 0.01645
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov.out)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = emval ~ edu, data = df1)
##
## $edu
## diff
## Incomplete secondary school: technical/ vocational type-Complete primary school 0.034587901
## Complete secondary school: technical/ vocational type-Complete primary school 0.074602902
## Incomplete secondary school: university-preparatory type-Complete primary school 0.030564877
## Complete secondary school: university-preparatory type-Complete primary school 0.060595942
## Some university-level education, without degree-Complete primary school 0.121747333
## University - level education, with degree-Complete primary school 0.099810716
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type 0.040015001
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type -0.004023024
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type 0.026008041
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type 0.087159432
## University - level education, with degree-Incomplete secondary school: technical/ vocational type 0.065222815
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type -0.044038025
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type -0.014006960
## Some university-level education, without degree-Complete secondary school: technical/ vocational type 0.047144431
## University - level education, with degree-Complete secondary school: technical/ vocational type 0.025207814
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type 0.030031066
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type 0.091182456
## University - level education, with degree-Incomplete secondary school: university-preparatory type 0.069245839
## Some university-level education, without degree-Complete secondary school: university-preparatory type 0.061151390
## University - level education, with degree-Complete secondary school: university-preparatory type 0.039214774
## University - level education, with degree-Some university-level education, without degree -0.021936617
## lwr
## Incomplete secondary school: technical/ vocational type-Complete primary school -0.035250911
## Complete secondary school: technical/ vocational type-Complete primary school 0.012757809
## Incomplete secondary school: university-preparatory type-Complete primary school -0.036065949
## Complete secondary school: university-preparatory type-Complete primary school -0.003301365
## Some university-level education, without degree-Complete primary school 0.052453599
## University - level education, with degree-Complete primary school 0.037335718
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type 0.003201746
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type -0.048408463
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type -0.014157443
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type 0.038868421
## University - level education, with degree-Incomplete secondary school: technical/ vocational type 0.027360888
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type -0.074327017
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type -0.037684417
## Some university-level education, without degree-Complete secondary school: technical/ vocational type 0.011376042
## University - level education, with degree-Complete secondary school: technical/ vocational type 0.005691631
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type -0.004254112
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type 0.047659714
## University - level education, with degree-Incomplete secondary school: university-preparatory type 0.037690605
## Some university-level education, without degree-Complete secondary school: university-preparatory type 0.021941341
## University - level education, with degree-Complete secondary school: university-preparatory type 0.013937674
## University - level education, with degree-Some university-level education, without degree -0.058783427
## upr
## Incomplete secondary school: technical/ vocational type-Complete primary school 0.104426713
## Complete secondary school: technical/ vocational type-Complete primary school 0.136447996
## Incomplete secondary school: university-preparatory type-Complete primary school 0.097195703
## Complete secondary school: university-preparatory type-Complete primary school 0.124493250
## Some university-level education, without degree-Complete primary school 0.191041067
## University - level education, with degree-Complete primary school 0.162285714
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type 0.076828256
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type 0.040362414
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type 0.066173525
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type 0.135450443
## University - level education, with degree-Incomplete secondary school: technical/ vocational type 0.103084742
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type -0.013749033
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type 0.009670498
## Some university-level education, without degree-Complete secondary school: technical/ vocational type 0.082912819
## University - level education, with degree-Complete secondary school: technical/ vocational type 0.044723998
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type 0.064316243
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type 0.134705198
## University - level education, with degree-Incomplete secondary school: university-preparatory type 0.100801074
## Some university-level education, without degree-Complete secondary school: university-preparatory type 0.100361440
## University - level education, with degree-Complete secondary school: university-preparatory type 0.064491874
## University - level education, with degree-Some university-level education, without degree 0.014910194
## p adj
## Incomplete secondary school: technical/ vocational type-Complete primary school 0.7677401
## Complete secondary school: technical/ vocational type-Complete primary school 0.0069404
## Incomplete secondary school: university-preparatory type-Complete primary school 0.8262350
## Complete secondary school: university-preparatory type-Complete primary school 0.0763027
## Some university-level education, without degree-Complete primary school 0.0000049
## University - level education, with degree-Complete primary school 0.0000526
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type 0.0229659
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type 0.9999701
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type 0.4732618
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type 0.0000023
## University - level education, with degree-Incomplete secondary school: technical/ vocational type 0.0000083
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type 0.0003716
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type 0.5851899
## Some university-level education, without degree-Complete secondary school: technical/ vocational type 0.0019896
## University - level education, with degree-Complete secondary school: technical/ vocational type 0.0027022
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type 0.1310377
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type 0.0000000
## University - level education, with degree-Incomplete secondary school: university-preparatory type 0.0000000
## Some university-level education, without degree-Complete secondary school: university-preparatory type 0.0000898
## University - level education, with degree-Complete secondary school: university-preparatory type 0.0001006
## University - level education, with degree-Some university-level education, without degree 0.5776114
The results are not readable with the current names.
Let’s shorten education level names to c(“prim”, “PTU_incompl”, “Sec_voc”, “Gymnas_incompl”, “Gymnas”, “Some_uni”, “Uni”)
levels(df1$edu) <- c("prim", "PTU_incompl", "Sec_voc", "Gymnas_incompl", "Gymnas", "Some_uni", "Uni") # delete eval = F
table(df1$edu)
Now, repeat the one-way ANOVA!
oneway.test(emval ~ edu, data = df1)
##
## One-way analysis of means (not assuming equal variances)
##
## data: emval and edu
## F = 14.4, num df = 6.00, denom df = 330.98, p-value = 1.338e-14
aov.out <- aov(emval ~ edu, data = df1)
summary(aov.out)
## Df Sum Sq Mean Sq F value Pr(>F)
## edu 6 1.52 0.25289 15.37 <2e-16 ***
## Residuals 2386 39.26 0.01645
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov.out)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = emval ~ edu, data = df1)
##
## $edu
##                                   diff          lwr          upr     p adj
## PTU_incompl-prim            0.034587901 -0.035250911  0.104426713 0.7677401
## Sec_voc-prim                0.074602902  0.012757809  0.136447996 0.0069404
## Gymnas_incompl-prim         0.030564877 -0.036065949  0.097195703 0.8262350
## Gymnas-prim                 0.060595942 -0.003301365  0.124493250 0.0763027
## Some_uni-prim               0.121747333  0.052453599  0.191041067 0.0000049
## Uni-prim                    0.099810716  0.037335718  0.162285714 0.0000526
## Sec_voc-PTU_incompl         0.040015001  0.003201746  0.076828256 0.0229659
## Gymnas_incompl-PTU_incompl -0.004023024 -0.048408463  0.040362414 0.9999701
## Gymnas-PTU_incompl          0.026008041 -0.014157443  0.066173525 0.4732618
## Some_uni-PTU_incompl        0.087159432  0.038868421  0.135450443 0.0000023
## Uni-PTU_incompl             0.065222815  0.027360888  0.103084742 0.0000083
## Gymnas_incompl-Sec_voc     -0.044038025 -0.074327017 -0.013749033 0.0003716
## Gymnas-Sec_voc             -0.014006960 -0.037684417  0.009670498 0.5851899
## Some_uni-Sec_voc            0.047144431  0.011376042  0.082912819 0.0019896
## Uni-Sec_voc                 0.025207814  0.005691631  0.044723998 0.0027022
## Gymnas-Gymnas_incompl       0.030031066 -0.004254112  0.064316243 0.1310377
## Some_uni-Gymnas_incompl     0.091182456  0.047659714  0.134705198 0.0000000
## Uni-Gymnas_incompl          0.069245839  0.037690605  0.100801074 0.0000000
## Some_uni-Gymnas             0.061151390  0.021941341  0.100361440 0.0000898
## Uni-Gymnas                  0.039214774  0.013937674  0.064491874 0.0001006
## Uni-Some_uni               -0.021936617 -0.058783427  0.014910194 0.5776114
(7) What do we learn about groups means? Is the effect size large?
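The F-value is 15.37 and the p-value is below 2e-16, so mean emval differs significantly across education groups. The effect size is small, though: eta-squared = 1.52 / (1.52 + 39.26) ≈ 0.037, which equals R-squared in the regression above.
As for the pairwise results, significant differences (p adj < 0.05) are found in 13 of the 21 pairs: prim vs. Sec_voc, Some_uni and Uni; PTU_incompl vs. Sec_voc, Some_uni and Uni; Sec_voc vs. Gymnas_incompl, Some_uni and Uni; Gymnas_incompl vs. Some_uni and Uni; Gymnas vs. Some_uni and Uni.
Also, I’ve noticed that some university education and a university degree give nearly the same results in all pairs and do not differ significantly from each other (p adj = 0.58).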
Comparing lm to one-way ANOVA, what do we learn from both methods?
In plain words, both models show that the longer the education of respondents, the more important emancipative values are for them.
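That the two methods are the same model underneath can be verified numerically. A minimal sketch on the built-in `PlantGrowth` data (standing in for emval and edu): the overall F from `lm()` equals the F from `aov()`, and eta-squared equals R-squared.

```r
# Built-in PlantGrowth data stands in for emval ~ edu
fit_lm  <- lm(weight ~ group, data = PlantGrowth)
fit_aov <- aov(weight ~ group, data = PlantGrowth)

f_lm    <- unname(summary(fit_lm)$fstatistic["value"])
aov_tab <- summary(fit_aov)[[1]]        # the ANOVA table as a data frame
f_aov   <- aov_tab[["F value"]][1]

# Eta-squared: share of the total sum of squares explained by the factor
ss        <- aov_tab[["Sum Sq"]]
eta_sq    <- ss[1] / sum(ss)
r_squared <- summary(fit_lm)$r.squared

print(c(F_lm = f_lm, F_aov = f_aov))
print(c(eta_sq = eta_sq, r_squared = r_squared))
```

What ANOVA adds in practice is the post-hoc comparison of all pairs of groups, whereas the regression table compares each group to the reference category only.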
Lastly, let’s compare linear regression to correlation. You have already learned about this, so it is time to put this into practice.
Correlation between emval and age:
cor(df1$emval, df1$age, method = "pearson")
## [1] -0.1878421
cor.test(df1$emval, df1$age, method=c("pearson"))
##
## Pearson's product-moment correlation
##
## data: df1$emval and df1$age
## t = -9.3515, df = 2391, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2262092 -0.1488930
## sample estimates:
## cor
## -0.1878421
What do we see in the output?
Regression of emval on age (age is standardized):
summary(lm(emval ~ scale(age, scale = T, center = T), data = df1))
##
## Call:
## lm(formula = emval ~ scale(age, scale = T, center = T), data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.41997 -0.09266 -0.00088 0.08883 0.49748
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.393062 0.002622 149.908 <2e-16
## scale(age, scale = T, center = T) -0.024525 0.002623 -9.352 <2e-16
##
## (Intercept) ***
## scale(age, scale = T, center = T) ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1283 on 2391 degrees of freedom
## Multiple R-squared: 0.03528, Adjusted R-squared: 0.03488
## F-statistic: 87.45 on 1 and 2391 DF, p-value: < 2.2e-16
In plain words, older respondents tend to have lower emancipative values, but, again, this is not a strong predictor as the model explains 3.5% of the variation in emancipative values.
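The link between the correlation coefficient and the regression slope can be made explicit. The sketch below uses the built-in `mtcars` data (`mpg` and `wt` stand in for emval and age): the slope equals Pearson's r only when both variables are standardized, while the t statistic and R-squared match in any case.

```r
# Built-in mtcars data: 'mpg' and 'wt' stand in for emval and age
r <- cor(mtcars$mpg, mtcars$wt)

# Standardize BOTH variables: the slope then equals Pearson's r
z <- data.frame(
  mpg_z = as.numeric(scale(mtcars$mpg)),
  wt_z  = as.numeric(scale(mtcars$wt))
)
fit_std   <- lm(mpg_z ~ wt_z, data = z)
slope_std <- unname(coef(fit_std)[2])

# Standardizing only the predictor leaves the slope in units of the outcome:
# it equals r multiplied by the standard deviation of the outcome
fit_x   <- lm(mpg ~ scale(wt), data = mtcars)
slope_x <- unname(coef(fit_x)[2])

print(c(r = r, slope_std = slope_std, slope_x_over_sd_y = slope_x / sd(mtcars$mpg)))
print(c(r_squared = summary(fit_x)$r.squared, r_times_r = r^2))

# The t statistic of the slope matches the one reported by cor.test()
t_lm  <- summary(fit_x)$coefficients[2, "t value"]
t_cor <- unname(cor.test(mtcars$mpg, mtcars$wt)$statistic)
print(c(t_lm = t_lm, t_cor = t_cor))
```

This is why the regression above reports the same t-value as cor.test() (-9.35) and an R-squared equal to the squared correlation, even though its slope is not equal to r.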
Linear regression is the ‘umbrella’ method over t-test, ANOVA, and correlation.
In contrast to the other three methods, linear regression can accommodate continuous and categorical predictors simultaneously.
Moreover, linear regression provides a number (the regression coefficient) that describes the relationship between a predictor and the outcome when the other predictors are held constant.
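A short sketch of what "held constant" means, again on the built-in `mtcars` data with made-up object names (`mt`, `new_cars`): the model mixes a continuous predictor (`wt`) with a categorical one (`cyl`), and the coefficient of `wt` is the predicted difference between two cases that differ by one unit of `wt` but share the same category.

```r
# Built-in mtcars data: 'wt' is continuous, 'cyl' is treated as categorical
mt  <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt + cyl, data = mt)

# Slope of 'wt': expected change in mpg per unit of wt at a fixed number of cylinders
b_wt <- unname(coef(fit)["wt"])

# Two hypothetical cars with the same 'cyl' whose 'wt' differs by one unit
new_cars <- data.frame(
  wt  = c(3, 4),
  cyl = factor(c("6", "6"), levels = levels(mt$cyl))
)
pred <- predict(fit, newdata = new_cars)
pred_diff <- unname(pred[2] - pred[1])

print(coef(fit))
print(c(b_wt = b_wt, pred_diff = pred_diff))
```

The difference between the two predictions equals the `wt` coefficient regardless of which `cyl` level is chosen, as long as it is the same for both cars.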
To sum up all that you learnt today, study the following table - locate familiar methods in the 1st column and how to do the same using linear model.
In broad strokes, all these methods are part of the General Linear Model (GLM) perspective, describing all models with linear relationships.