In this practice, we will focus primarily on how to compare linear regression models and how linear regression relates to other methods we have already covered in this course.

library(foreign)
prac <- read.spss("C:/Users/ASUS/Documents/Data Analysis/gen/sem4_WV6_Russia.sav", to.data.frame = T, use.value.labels = T)
library(dplyr)
df <- select(prac, 
             V240, # sex
             V242, # age
             V248, # the highest educational level attained
             V239, # scale of income
             RESEMAVAL) # index of emancipative values
df$V242 <- as.numeric(as.character(df$V242))
df$V239 <- as.numeric(df$V239)
df$RESEMAVAL <- as.numeric(as.character(df$RESEMAVAL))
rm(prac) # remove "prac" from the working environment
library(ggplot2)
library(RColorBrewer)
library(psych)
df$V248[df$V248 == "No formal education"] <- "Complete primary school" 
df$V248[df$V248 == "Incomplete primary school"] <- "Complete primary school"
df$V248 <- droplevels(df$V248) #drop levels with 0 observations from variable description

names(df) <- c("sex", "age", "edu", "income", "emval")
m1 <- lm(emval ~ income, data = df)
m2 <- lm(emval ~ income + age, data = df)
m3 <- lm(emval ~ income + age + sex, data = df)
m4 <- lm(emval ~ income + age + sex + edu, data = df)

Contents:

  1. Comparing model fit
  2. Diagnostic plots
  3. lm() vs. t.test()
  4. lm() vs. oneway.test()
  5. lm() vs. cor.test()

1. Compare model fit across models

To compare model fit across models, the anova() function is used with hierarchical (nested) models as arguments.

To compare models 1-4 estimated in the previous task:

# anova(m1, m2, m3, m4)  # gives an error: the models were fitted to datasets of different sizes

Comment: This should return an error “Error in anova.lmlist(object, …) : models were not all fitted to the same size of dataset”

  1. What to do if you get this error message?

Let’s see how many observations are lost as we fit larger models:

df.residual(m1) # 2401 residual df
## [1] 2401
df.residual(m2) # 2400 residual df
## [1] 2400
df.residual(m3) # 2399 residual df
## [1] 2399
df.residual(m4) # 2383 residual df
## [1] 2383

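(1) Residual degrees of freedom equal the number of observations minus the number of estimated coefficients. Models 1-3 were therefore fitted to 2,403 observations (2,401 + 2, 2,400 + 3, 2,399 + 4), while Model 4 was fitted to 2,393 (2,383 + 10).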

This is so because some respondents failed to indicate their education.
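
A quicker check is `nobs()`, which reports how many rows a model was actually fitted to. The sketch below uses simulated data (the data frame `d` and its variables are made up for illustration and are not the survey data):

```r
# Simulated data with missing values; all names are illustrative only
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
d$x1[1:5]  <- NA   # 5 rows lose x1
d$x2[6:12] <- NA   # 7 other rows lose x2

fit_a <- lm(y ~ x1, data = d)        # lm() silently drops rows with NA in y or x1
fit_b <- lm(y ~ x1 + x2, data = d)   # here rows with NA in x2 are dropped as well

# Number of observations actually used by each model
n_a <- nobs(fit_a)
n_b <- nobs(fit_b)
c(model_a = n_a, model_b = n_b)
```

If the numbers differ, the models were fitted to different samples and anova() will refuse to compare them.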

One way to go here is to delete all missing values (if there are not too many) and fit all the models on the same dataset with no missings.

How many missings are there? Look at the last row of data summary:

summary(df)
##      sex            age       
##  Male  :1115   Min.   :18.00  
##  Female:1385   1st Qu.:31.00  
##                Median :46.00  
##                Mean   :46.06  
##                3rd Qu.:59.00  
##                Max.   :91.00  
##                               
##                                                        edu     
##  Complete secondary school: technical/ vocational type   :978  
##  University - level education, with degree               :647  
##  Complete secondary school: university-preparatory type  :367  
##  Incomplete secondary school: university-preparatory type:196  
##  Some university-level education, without degree         :132  
##  (Other)                                                 :168  
##  NA's                                                    : 12  
##      income           emval       
##  Min.   : 1.000   Min.   :0.0000  
##  1st Qu.: 3.000   1st Qu.:0.2998  
##  Median : 4.000   Median :0.3917  
##  Mean   : 4.208   Mean   :0.3931  
##  3rd Qu.: 5.000   3rd Qu.:0.4850  
##  Max.   :10.000   Max.   :0.9259  
##  NA's   :75       NA's   :22
library(mice)
## 
## Attaching package: 'mice'
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
md.pattern(df)

##      sex age edu emval income    
## 2393   1   1   1     1      1   0
## 73     1   1   1     1      0   1
## 22     1   1   1     0      1   1
## 10     1   1   0     1      1   1
## 2      1   1   0     1      0   2
##        0   0  12    22     75 109

(2) We see that there are 0 missing values in sex and age, while edu has 12 missings, income has 75, and emval has 22

We started out modelling by regressing emval on income. R automatically deleted all the missing values: 2,500 - 75 - 22 = 2,403 (in this case, the missings belong to different respondents).

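When Model 4 added edu, 10 more respondents were dropped: 12 respondents have a missing edu, but 2 of them were already excluded because their income is missing too. The models are now fitted to different samples (2,403 vs. 2,393 observations) and are not comparable.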

Since the 107 incomplete rows are a small share of the 2,500 respondents (about 4.3 per cent), we can delete them before the analysis:

df1 <- na.omit(df)
dim(df)[1] - dim(df1)[1] # difference between the number of rows between data frames
## [1] 107
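
Note that missing values and incomplete rows are different counts: md.pattern() reports 109 missing values, but they sit in 107 rows because two respondents miss both edu and income. A minimal sketch of the distinction on simulated data (the data frame `d` is hypothetical):

```r
# Simulated data; names are illustrative only
set.seed(2)
d <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
d$a[c(1, 2, 3)] <- NA
d$b[c(3, 4)]    <- NA   # row 3 is missing both a and b

n_missing_values  <- sum(is.na(d))             # counts missing cells
n_incomplete_rows <- sum(!complete.cases(d))   # counts rows with at least one NA
share_dropped     <- n_incomplete_rows / nrow(d)

colSums(is.na(d))   # missing values per variable
c(values = n_missing_values, rows = n_incomplete_rows, share = share_dropped)
```

The row count is the one that matters for na.omit(), since it deletes whole rows.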

Now, refit all four models on the new data frame.

We can either replace the data = argument or update the models:

# Refit the four models on the complete-case data
m1upd <- lm(emval ~ income, data = df1)
m2upd <- lm(emval ~ income + age, data = df1)
m3upd <- lm(emval ~ income + age + sex, data = df1)
m4upd <- lm(emval ~ income + age + sex + edu, data = df1)
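
The update() route could look like the following sketch. It uses the built-in `mtcars` data as a stand-in, so the variable names are not those of the survey:

```r
# Stand-in example on built-in data
base_fit <- lm(mpg ~ wt, data = mtcars)

# Add a predictor: ". ~ . + hp" means "same outcome, same predictors, plus hp"
bigger_fit <- update(base_fit, . ~ . + hp)

# Refit the same formula on another data frame (here: a subset of rows)
subset_fit <- update(base_fit, data = mtcars[mtcars$cyl != 8, ])

formula(bigger_fit)
nobs(subset_fit)
```

update() saves retyping long formulas and makes it explicit that only the data or only one term has changed.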

Now the models are:

  1. hierarchical (each model contains all predictors of the previous one), and
  2. estimated on the same data.

You can compare model fit across the updated models:

anova(m1upd, m2upd, m3upd, m4upd)

The output shows whether the larger model fits better than the previous smaller model. Thus, Model 2 is compared to Model 1, Model 3 is compared to Model 2, and Model 4 is compared to Model 3.

The difference in model fit is tested using the F-ratio. If the larger model explains significantly more variance in the outcome than the smaller one, the difference is statistically significant (Pr(>F) < 0.05).

The output also presents:

  • the residual degrees of freedom,
  • the residual sum of squares (Error),
  • how many degrees of freedom were spent on adding a new predictor, and
  • the gain in model sum of squares due to the new predictor.
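
These four quantities are all that is needed to compute the F-ratio: the drop in residual sum of squares per degree of freedom spent, divided by the residual mean square of the larger model. A minimal sketch for two nested models, using the built-in `mtcars` data as a stand-in for the survey data:

```r
# Two nested models on stand-in data
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp + qsec, data = mtcars)

rss_small <- sum(residuals(small)^2)                    # residual sum of squares
rss_large <- sum(residuals(large)^2)
df_spent  <- df.residual(small) - df.residual(large)    # predictors added

# F = (gain in sum of squares / df spent) / (residual mean square of larger model)
f_by_hand <- ((rss_small - rss_large) / df_spent) /
             (rss_large / df.residual(large))
p_by_hand <- pf(f_by_hand, df_spent, df.residual(large), lower.tail = FALSE)

cmp <- anova(small, large)
c(by_hand = f_by_hand, anova = cmp$F[2])
```

With more than two models in one call, anova() uses the residual mean square of the largest model as the denominator for every row, so the by-hand numbers match exactly only in the two-model case.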

Now you can say that Model 4 fits significantly better than Model 3, even though the gain was not as big as in Models 2 and 3.

2. Reproduce the diagnostic plots using ggplot2 - this is a visualization task.

m4upd <- update(m4, .~., data = df1)
plot(m4upd)

To reproduce the 1st diagnostic plot in ggplot2, we will use the augment() function of the broom package (part of the tidyverse).

The augment function creates a data frame out of the model statistics where all the fitted values, residuals and essential diagnostic statistics can be found.

First, let’s augment the model and create a scatter plot following the 1st diagnostic plot:

library(broom)
am4upd <- augment(m4upd)
head(am4upd, 3)
  1. Now, create a ggplot scatterplot with fitted values on the X-axis and residuals on the Y-axis, using the augmented data frame:
ggplot(am4upd, aes(x = .fitted, y = .resid)) + geom_point()

Then add:

  1. the horizontal line and the line of best fit (in a good model, they are both horizontal) - done
  2. labels for residuals that are beyond 3.3 standard deviations (largest model errors)
  3. a Y-axis that is symmetric above and below zero, and
  4. the center-aligned title and axis titles - done

ggplot(am4upd, aes(x = .fitted, y = .resid)) + 
  geom_hline(yintercept = 0) +                   # horizontal reference line at zero
  geom_smooth(method = "lm", formula = y ~ x) +  # line of best fit
  geom_point() +
  labs(
    title = "Linear regression residuals plot", 
    subtitle = "index of emancipative values ~ income + age + sex + education",
    x = "Fitted Values",
    y = "Residuals") + 
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),     # center the title
        plot.subtitle = element_text(hjust = 0.5))  # center the subtitle

# Steps 2 and 3 (labels via geom_text(), symmetric Y-axis) are not implemented in this chunk
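
One possible way to add the labels and the symmetric axis is sketched below on simulated data with a planted outlier. The data frame `d`, its `id` column, and the choice to apply the 3.3 cutoff to standardized residuals (`.std.resid`) are assumptions made for illustration:

```r
library(ggplot2)
library(broom)

# Simulated data with one planted outlier; all names are illustrative
set.seed(3)
d <- data.frame(id = 1:200, x = rnorm(200))
d$y <- 1 + 0.5 * d$x + rnorm(200, sd = 0.3)
d$y[17] <- d$y[17] + 3                        # planted outlier

fit <- lm(y ~ x, data = d)
aug <- augment(fit, data = d)                 # adds .fitted, .resid, .std.resid, ...

extreme <- aug[abs(aug$.std.resid) > 3.3, ]   # points to be labelled
lim <- 1.1 * max(abs(aug$.resid))             # same range above and below zero

p <- ggplot(aug, aes(x = .fitted, y = .resid)) +
  geom_hline(yintercept = 0) +
  geom_point() +
  geom_text(data = extreme, aes(label = id), vjust = -0.8) +  # label extreme residuals
  coord_cartesian(ylim = c(-lim, lim)) +                      # symmetric Y-axis
  theme_bw()
p
```

Passing `data = d` to augment() keeps the `id` column available for labelling; coord_cartesian() zooms the axis without dropping any points from the fit.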

Now you know how to build a print-quality diagnostic plot by hand.

Compare:

library(sjPlot) # ! delete "eval = F" in the chunk header before running
plot_model(m4upd, type = "diag")

3. lm() vs. t.test()

Let’s compare a linear regression of emval on gender and a t-test with the same variables involved.

First, run the linear regression:

gen <- lm(emval ~ sex, data = df1)
summary(gen)
## 
## Call:
## lm(formula = emval ~ sex, data = df1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40451 -0.09273  0.00001  0.09177  0.52142 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.378791   0.003982  95.117  < 2e-16 ***
## sexFemale   0.025717   0.005346   4.811  1.6e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.13 on 2391 degrees of freedom
## Multiple R-squared:  0.009586,   Adjusted R-squared:  0.009172 
## F-statistic: 23.14 on 1 and 2391 DF,  p-value: 1.598e-06
library(sjPlot)
## Learn more about sjPlot with 'browseVignettes("sjPlot")'.
tab_model(gen)
                     emval
Predictors           Estimates   CI            p
(Intercept)          0.38        0.37 – 0.39   <0.001
sex: Female          0.03        0.02 – 0.04   <0.001
Observations         2393
R2 / R2 adjusted     0.010 / 0.009
anova(gen)

(4) What do we learn from the lm output? Is the model good? What values are females and males predicted to have?

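The model is statistically significant (F = 23.14, p < 0.001), but its R^2 is very small: sex explains only about 1 per cent of the variance in emval. As for the predictions, males (the reference category) are predicted to have emval = 0.379 (the intercept), and females 0.379 + 0.026 = 0.405.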

  1. What would we learn from the t-test output? What are the average levels of emval for males and females? What is the effect size?
t.test(df1$emval ~ df1$sex)
## 
##  Welch Two Sample t-test
## 
## data:  df1$emval by df1$sex
## t = -4.8324, df = 2315.9, p-value = 1.437e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.03615227 -0.01528086
## sample estimates:
##   mean in group Male mean in group Female 
##            0.3787908            0.4045074
library(effsize)
## 
## Attaching package: 'effsize'
## The following object is masked from 'package:psych':
## 
##     cohen.d
cohen.d(df1$emval ~ df1$sex)
## 
## Cohen's d
## 
## d estimate: -0.1978776 (negligible)
## 95 percent confidence interval:
##      lower      upper 
## -0.2787337 -0.1170215

So, what is in common and what is different that you learnt from both methods? Predicted scores? Statistical significance? Magnitude of effect?

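  1. Predicted scores are the same: the lm intercept (0.379) is the mean for males, and intercept plus coefficient (0.405) is the mean for females reported by t.test.
  2. Statistical significance is the same: the p-values are far below 0.05 both in lm and in t.test.
  3. The magnitude of the effect is small in both cases: R^2 is about 0.01 and Cohen's d is about -0.2.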

In plain words, females have on average a slightly higher level of emval, but sex is not a good predictor of emancipative values.
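
The effect size can be checked by hand: Cohen's d with a pooled standard deviation is the difference in group means divided by the pooled SD, and that pooled SD is the residual standard error of the one-predictor regression. With the numbers above, (0.3788 - 0.4045) / 0.13 is roughly -0.198, which agrees with the cohen.d() output. A sketch on simulated data (groups "A" and "B" are made up):

```r
# Simulated two-group data; names are illustrative only
set.seed(4)
g <- factor(rep(c("A", "B"), times = c(40, 60)))
y <- rnorm(100, mean = ifelse(g == "A", 0, 0.3))

m  <- tapply(y, g, mean)     # group means
s2 <- tapply(y, g, var)      # group variances
n  <- tapply(y, g, length)   # group sizes

pooled_sd <- sqrt(sum((n - 1) * s2) / (sum(n) - 2))
d_by_hand <- unname((m["A"] - m["B"]) / pooled_sd)

# The pooled SD equals the residual standard error of lm(y ~ g)
fit <- lm(y ~ g)
c(pooled_sd = pooled_sd, lm_sigma = sigma(fit))
d_by_hand
```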

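Big conclusion: a t-test and a linear regression provide the same information when the model has a single binary predictor. The small differences in the t statistic (4.81 vs. 4.83) and in the degrees of freedom (2391 vs. 2315.9) arise because t.test() applies the Welch correction for unequal variances by default, while lm() assumes equal variances.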
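
The match becomes exact once the t-test is told to assume equal variances. A minimal sketch on the built-in `mtcars` data (a stand-in, with `am` playing the role of the binary predictor):

```r
# Student's t-test (equal variances) vs. lm() on stand-in data
tt  <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
fit <- lm(mpg ~ factor(am), data = mtcars)
lm_row <- summary(fit)$coefficients[2, ]

# Same magnitude; the sign flips because t.test() takes group 0 minus group 1
c(t_test = unname(tt$statistic), lm = unname(lm_row["t value"]))
c(t_test = tt$p.value,           lm = unname(lm_row["Pr(>|t|)"]))
```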

4. lm() vs. oneway.test()

Now, compare a model where emval is regressed on edu with a one-way ANOVA of the same variables.

It is useful to have a list of all education levels for reference:

summary(df1$edu)
##                                  Complete primary school 
##                                                       39 
##  Incomplete secondary school: technical/ vocational type 
##                                                      119 
##    Complete secondary school: technical/ vocational type 
##                                                      947 
## Incomplete secondary school: university-preparatory type 
##                                                      187 
##   Complete secondary school: university-preparatory type 
##                                                      350 
##          Some university-level education, without degree 
##                                                      127 
##                University - level education, with degree 
##                                                      624
ggplot(data = df1, 
       aes(x = edu)) + 
  geom_bar(colour = "black",
       fill = "cornflowerblue")+
  coord_flip() +
  theme_bw()

There are seven education levels, with 39 to 947 observations in each. Complete secondary school of the technical/vocational type is the most common level (947), followed by university-level education with a degree (624).

Then, let’s fit a linear regression:

lm2 <- lm(emval ~ edu, data = df1)
summary(lm2)
## 
## Call:
## lm(formula = emval ~ edu, data = df1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39268 -0.09262 -0.00031  0.08986  0.53324 
## 
## Coefficients:
##                                                             Estimate
## (Intercept)                                                  0.31808
## eduIncomplete secondary school: technical/ vocational type   0.03459
## eduComplete secondary school: technical/ vocational type     0.07460
## eduIncomplete secondary school: university-preparatory type  0.03056
## eduComplete secondary school: university-preparatory type    0.06060
## eduSome university-level education, without degree           0.12175
## eduUniversity - level education, with degree                 0.09981
##                                                             Std. Error
## (Intercept)                                                    0.02054
## eduIncomplete secondary school: technical/ vocational type     0.02367
## eduComplete secondary school: technical/ vocational type       0.02096
## eduIncomplete secondary school: university-preparatory type    0.02258
## eduComplete secondary school: university-preparatory type      0.02165
## eduSome university-level education, without degree             0.02348
## eduUniversity - level education, with degree                   0.02117
##                                                             t value
## (Intercept)                                                  15.486
## eduIncomplete secondary school: technical/ vocational type    1.461
## eduComplete secondary school: technical/ vocational type      3.560
## eduIncomplete secondary school: university-preparatory type   1.354
## eduComplete secondary school: university-preparatory type     2.798
## eduSome university-level education, without degree            5.185
## eduUniversity - level education, with degree                  4.714
##                                                             Pr(>|t|)    
## (Intercept)                                                  < 2e-16 ***
## eduIncomplete secondary school: technical/ vocational type  0.144033    
## eduComplete secondary school: technical/ vocational type    0.000379 ***
## eduIncomplete secondary school: university-preparatory type 0.175989    
## eduComplete secondary school: university-preparatory type   0.005177 ** 
## eduSome university-level education, without degree          2.35e-07 ***
## eduUniversity - level education, with degree                2.57e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1283 on 2386 degrees of freedom
## Multiple R-squared:  0.03721,    Adjusted R-squared:  0.03479 
## F-statistic: 15.37 on 6 and 2386 DF,  p-value: < 2.2e-16

What do we learn from the output? Is the model good?

  1. The predicted level of emval for respondents with primary education (reference category) is 0.32. Compared to them, respondents with complete secondary school (both vocational and university-preparatory) and respondents with any university-level education have significantly higher predicted levels of emval. The predicted differences from the reference category are:
  • 0.07 for complete vocational secondary school;
  • 0.06 for complete university-preparatory secondary school;
  • 0.12 for some university-level education without a degree;
  • 0.10 for university-level education with a degree.

For the two incomplete secondary levels, the predicted difference from primary school education is not statistically significant. Overall, the model is statistically significant (F = 15.37, p < 0.001) but explains less than 4 per cent of the variance in emval (R^2 = 0.037).
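
With a single categorical predictor, the predicted value for each group is simply that group's mean: the intercept for the reference category, and intercept plus coefficient for every other level (for example, 0.318 + 0.100 = 0.418 for university education with a degree). A sketch on the built-in `PlantGrowth` data, used here only as a stand-in:

```r
# One factor with three levels; "ctrl" is the reference category
fit <- lm(weight ~ group, data = PlantGrowth)
b <- coef(fit)

# Predicted values rebuilt from the coefficients
predicted <- c(ctrl = b[["(Intercept)"]],
               trt1 = b[["(Intercept)"]] + b[["grouptrt1"]],
               trt2 = b[["(Intercept)"]] + b[["grouptrt2"]])

# Plain group means for comparison
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

rbind(predicted, group_means)
```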

Next, let’s run an ANOVA:

oneway.test(emval ~ edu, data = df1)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  emval and edu
## F = 14.4, num df = 6.00, denom df = 330.98, p-value = 1.338e-14
aov.out <- aov(emval ~ edu, data = df1) 
summary(aov.out)
##               Df Sum Sq Mean Sq F value Pr(>F)    
## edu            6   1.52 0.25289   15.37 <2e-16 ***
## Residuals   2386  39.26 0.01645                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov.out)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = emval ~ edu, data = df1)
## 
## $edu
##                                                                                                                          diff
## Incomplete secondary school: technical/ vocational type-Complete primary school                                   0.034587901
## Complete secondary school: technical/ vocational type-Complete primary school                                     0.074602902
## Incomplete secondary school: university-preparatory type-Complete primary school                                  0.030564877
## Complete secondary school: university-preparatory type-Complete primary school                                    0.060595942
## Some university-level education, without degree-Complete primary school                                           0.121747333
## University - level education, with degree-Complete primary school                                                 0.099810716
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type     0.040015001
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type -0.004023024
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type    0.026008041
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type           0.087159432
## University - level education, with degree-Incomplete secondary school: technical/ vocational type                 0.065222815
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type   -0.044038025
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type     -0.014006960
## Some university-level education, without degree-Complete secondary school: technical/ vocational type             0.047144431
## University - level education, with degree-Complete secondary school: technical/ vocational type                   0.025207814
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type   0.030031066
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type          0.091182456
## University - level education, with degree-Incomplete secondary school: university-preparatory type                0.069245839
## Some university-level education, without degree-Complete secondary school: university-preparatory type            0.061151390
## University - level education, with degree-Complete secondary school: university-preparatory type                  0.039214774
## University - level education, with degree-Some university-level education, without degree                        -0.021936617
##                                                                                                                           lwr
## Incomplete secondary school: technical/ vocational type-Complete primary school                                  -0.035250911
## Complete secondary school: technical/ vocational type-Complete primary school                                     0.012757809
## Incomplete secondary school: university-preparatory type-Complete primary school                                 -0.036065949
## Complete secondary school: university-preparatory type-Complete primary school                                   -0.003301365
## Some university-level education, without degree-Complete primary school                                           0.052453599
## University - level education, with degree-Complete primary school                                                 0.037335718
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type     0.003201746
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type -0.048408463
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type   -0.014157443
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type           0.038868421
## University - level education, with degree-Incomplete secondary school: technical/ vocational type                 0.027360888
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type   -0.074327017
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type     -0.037684417
## Some university-level education, without degree-Complete secondary school: technical/ vocational type             0.011376042
## University - level education, with degree-Complete secondary school: technical/ vocational type                   0.005691631
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type  -0.004254112
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type          0.047659714
## University - level education, with degree-Incomplete secondary school: university-preparatory type                0.037690605
## Some university-level education, without degree-Complete secondary school: university-preparatory type            0.021941341
## University - level education, with degree-Complete secondary school: university-preparatory type                  0.013937674
## University - level education, with degree-Some university-level education, without degree                        -0.058783427
##                                                                                                                           upr
## Incomplete secondary school: technical/ vocational type-Complete primary school                                   0.104426713
## Complete secondary school: technical/ vocational type-Complete primary school                                     0.136447996
## Incomplete secondary school: university-preparatory type-Complete primary school                                  0.097195703
## Complete secondary school: university-preparatory type-Complete primary school                                    0.124493250
## Some university-level education, without degree-Complete primary school                                           0.191041067
## University - level education, with degree-Complete primary school                                                 0.162285714
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type     0.076828256
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type  0.040362414
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type    0.066173525
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type           0.135450443
## University - level education, with degree-Incomplete secondary school: technical/ vocational type                 0.103084742
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type   -0.013749033
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type      0.009670498
## Some university-level education, without degree-Complete secondary school: technical/ vocational type             0.082912819
## University - level education, with degree-Complete secondary school: technical/ vocational type                   0.044723998
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type   0.064316243
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type          0.134705198
## University - level education, with degree-Incomplete secondary school: university-preparatory type                0.100801074
## Some university-level education, without degree-Complete secondary school: university-preparatory type            0.100361440
## University - level education, with degree-Complete secondary school: university-preparatory type                  0.064491874
## University - level education, with degree-Some university-level education, without degree                         0.014910194
##                                                                                                                      p adj
## Incomplete secondary school: technical/ vocational type-Complete primary school                                  0.7677401
## Complete secondary school: technical/ vocational type-Complete primary school                                    0.0069404
## Incomplete secondary school: university-preparatory type-Complete primary school                                 0.8262350
## Complete secondary school: university-preparatory type-Complete primary school                                   0.0763027
## Some university-level education, without degree-Complete primary school                                          0.0000049
## University - level education, with degree-Complete primary school                                                0.0000526
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type    0.0229659
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type 0.9999701
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type   0.4732618
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type          0.0000023
## University - level education, with degree-Incomplete secondary school: technical/ vocational type                0.0000083
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type   0.0003716
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type     0.5851899
## Some university-level education, without degree-Complete secondary school: technical/ vocational type            0.0019896
## University - level education, with degree-Complete secondary school: technical/ vocational type                  0.0027022
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type  0.1310377
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type         0.0000000
## University - level education, with degree-Incomplete secondary school: university-preparatory type               0.0000000
## Some university-level education, without degree-Complete secondary school: university-preparatory type           0.0000898
## University - level education, with degree-Complete secondary school: university-preparatory type                 0.0001006
## University - level education, with degree-Some university-level education, without degree                        0.5776114
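
Note that the F statistic from aov() (15.37) is identical to the F statistic in the lm() summary, while oneway.test() reports 14.4 because by default it does not assume equal variances (Welch correction). A sketch of this relationship on the built-in `PlantGrowth` data, used as a stand-in for the survey data:

```r
# F from the regression summary
fit  <- lm(weight ~ group, data = PlantGrowth)
f_lm <- unname(summary(fit)$fstatistic["value"])

# F from the classical ANOVA table
f_aov <- summary(aov(weight ~ group, data = PlantGrowth))[[1]][["F value"]][1]

# oneway.test(): equal-variance version vs. the default Welch version
f_classic <- unname(oneway.test(weight ~ group, data = PlantGrowth,
                                var.equal = TRUE)$statistic)
f_welch   <- unname(oneway.test(weight ~ group, data = PlantGrowth)$statistic)

c(lm = f_lm, aov = f_aov, oneway_equal_var = f_classic, oneway_welch = f_welch)
```

The first three values coincide; only the Welch version differs.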

The TukeyHSD() output is hard to read with such long education level names.

Let’s shorten education level names to c(“prim”, “PTU_incompl”, “Sec_voc”, “Gymnas_incompl”, “Gymnas”, “Some_uni”, “Uni”)

levels(df1$edu) <- c("prim", "PTU_incompl", "Sec_voc", "Gymnas_incompl", "Gymnas", "Some_uni", "Uni") # delete eval = F
table(df1$edu)

Now, repeat the one-way ANOVA!

oneway.test(emval ~ edu, data = df1)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  emval and edu
## F = 14.4, num df = 6.00, denom df = 330.98, p-value = 1.338e-14
aov.out <- aov(emval ~ edu, data = df1) 
summary(aov.out)
##               Df Sum Sq Mean Sq F value Pr(>F)    
## edu            6   1.52 0.25289   15.37 <2e-16 ***
## Residuals   2386  39.26 0.01645                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov.out)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = emval ~ edu, data = df1)
## 
## $edu
##                                                                                                                          diff
## Incomplete secondary school: technical/ vocational type-Complete primary school                                   0.034587901
## Complete secondary school: technical/ vocational type-Complete primary school                                     0.074602902
## Incomplete secondary school: university-preparatory type-Complete primary school                                  0.030564877
## Complete secondary school: university-preparatory type-Complete primary school                                    0.060595942
## Some university-level education, without degree-Complete primary school                                           0.121747333
## University - level education, with degree-Complete primary school                                                 0.099810716
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type     0.040015001
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type -0.004023024
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type    0.026008041
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type           0.087159432
## University - level education, with degree-Incomplete secondary school: technical/ vocational type                 0.065222815
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type   -0.044038025
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type     -0.014006960
## Some university-level education, without degree-Complete secondary school: technical/ vocational type             0.047144431
## University - level education, with degree-Complete secondary school: technical/ vocational type                   0.025207814
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type   0.030031066
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type          0.091182456
## University - level education, with degree-Incomplete secondary school: university-preparatory type                0.069245839
## Some university-level education, without degree-Complete secondary school: university-preparatory type            0.061151390
## University - level education, with degree-Complete secondary school: university-preparatory type                  0.039214774
## University - level education, with degree-Some university-level education, without degree                        -0.021936617
##                                                                                                                           lwr
## Incomplete secondary school: technical/ vocational type-Complete primary school                                  -0.035250911
## Complete secondary school: technical/ vocational type-Complete primary school                                     0.012757809
## Incomplete secondary school: university-preparatory type-Complete primary school                                 -0.036065949
## Complete secondary school: university-preparatory type-Complete primary school                                   -0.003301365
## Some university-level education, without degree-Complete primary school                                           0.052453599
## University - level education, with degree-Complete primary school                                                 0.037335718
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type     0.003201746
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type -0.048408463
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type   -0.014157443
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type           0.038868421
## University - level education, with degree-Incomplete secondary school: technical/ vocational type                 0.027360888
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type   -0.074327017
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type     -0.037684417
## Some university-level education, without degree-Complete secondary school: technical/ vocational type             0.011376042
## University - level education, with degree-Complete secondary school: technical/ vocational type                   0.005691631
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type  -0.004254112
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type          0.047659714
## University - level education, with degree-Incomplete secondary school: university-preparatory type                0.037690605
## Some university-level education, without degree-Complete secondary school: university-preparatory type            0.021941341
## University - level education, with degree-Complete secondary school: university-preparatory type                  0.013937674
## University - level education, with degree-Some university-level education, without degree                        -0.058783427
##                                                                                                                           upr
## Incomplete secondary school: technical/ vocational type-Complete primary school                                   0.104426713
## Complete secondary school: technical/ vocational type-Complete primary school                                     0.136447996
## Incomplete secondary school: university-preparatory type-Complete primary school                                  0.097195703
## Complete secondary school: university-preparatory type-Complete primary school                                    0.124493250
## Some university-level education, without degree-Complete primary school                                           0.191041067
## University - level education, with degree-Complete primary school                                                 0.162285714
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type     0.076828256
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type  0.040362414
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type    0.066173525
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type           0.135450443
## University - level education, with degree-Incomplete secondary school: technical/ vocational type                 0.103084742
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type   -0.013749033
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type      0.009670498
## Some university-level education, without degree-Complete secondary school: technical/ vocational type             0.082912819
## University - level education, with degree-Complete secondary school: technical/ vocational type                   0.044723998
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type   0.064316243
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type          0.134705198
## University - level education, with degree-Incomplete secondary school: university-preparatory type                0.100801074
## Some university-level education, without degree-Complete secondary school: university-preparatory type            0.100361440
## University - level education, with degree-Complete secondary school: university-preparatory type                  0.064491874
## University - level education, with degree-Some university-level education, without degree                         0.014910194
##                                                                                                                      p adj
## Incomplete secondary school: technical/ vocational type-Complete primary school                                  0.7677401
## Complete secondary school: technical/ vocational type-Complete primary school                                    0.0069404
## Incomplete secondary school: university-preparatory type-Complete primary school                                 0.8262350
## Complete secondary school: university-preparatory type-Complete primary school                                   0.0763027
## Some university-level education, without degree-Complete primary school                                          0.0000049
## University - level education, with degree-Complete primary school                                                0.0000526
## Complete secondary school: technical/ vocational type-Incomplete secondary school: technical/ vocational type    0.0229659
## Incomplete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type 0.9999701
## Complete secondary school: university-preparatory type-Incomplete secondary school: technical/ vocational type   0.4732618
## Some university-level education, without degree-Incomplete secondary school: technical/ vocational type          0.0000023
## University - level education, with degree-Incomplete secondary school: technical/ vocational type                0.0000083
## Incomplete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type   0.0003716
## Complete secondary school: university-preparatory type-Complete secondary school: technical/ vocational type     0.5851899
## Some university-level education, without degree-Complete secondary school: technical/ vocational type            0.0019896
## University - level education, with degree-Complete secondary school: technical/ vocational type                  0.0027022
## Complete secondary school: university-preparatory type-Incomplete secondary school: university-preparatory type  0.1310377
## Some university-level education, without degree-Incomplete secondary school: university-preparatory type         0.0000000
## University - level education, with degree-Incomplete secondary school: university-preparatory type               0.0000000
## Some university-level education, without degree-Complete secondary school: university-preparatory type           0.0000898
## University - level education, with degree-Complete secondary school: university-preparatory type                 0.0001006
## University - level education, with degree-Some university-level education, without degree                        0.5776114

(7) What do we learn about group means? Is the effect size large?

The F-value is 15.37 and the p-value is very small, so at least some of the group means differ significantly.

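As for the results, significant differences (p adj < 0.05) can be seen between the following groups:

  • university (with or without degree) VS complete primary
  • university VS incomplete technical/vocational
  • university VS complete technical/vocational
  • university VS university-preparatory (both complete and incomplete)
  • incomplete university-preparatory VS complete technical/vocational
  • complete technical/vocational VS complete primary (p adj = 0.007)
  • complete technical/vocational VS incomplete technical/vocational (p adj = 0.023).

Also, the two university groups (with and without a degree) give nearly the same results in every pair, and they do not differ significantly from each other (p adj = 0.58).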

Comparing lm to one-way ANOVA, what do we learn from both methods?

  1. Are the results the same? - Largely yes, but they are presented differently. lm() reports how each education level differs from the reference category, while ANOVA with post-hoc tests compares every pair of groups. For example, both methods single out the university groups: lm() shows that university education raises emval relative to the reference level, and the pairwise comparison shows that emval is higher for the university groups than for the primary-school group.
  2. Do both methods show differences between groups? - Yes, both methods detect statistically significant differences between groups.

In plain words, both models show that the longer the education of respondents, the more important emancipative values are for them.
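
The equivalence between lm() with one categorical predictor and one-way ANOVA can be checked on a small example. The sketch below uses simulated data (the data frame `sim` and its values are hypothetical, not the survey sample) and assumes the classical ANOVA with equal variances, i.e. `var.equal = TRUE` in oneway.test():

```r
# Simulated data (hypothetical; not the survey sample used above)
set.seed(1)
sim <- data.frame(
  edu   = factor(rep(c("primary", "secondary", "university"), each = 50)),
  emval = c(rnorm(50, 0.35, 0.1), rnorm(50, 0.40, 0.1), rnorm(50, 0.45, 0.1))
)

fit_lm  <- lm(emval ~ edu, data = sim)
fit_aov <- oneway.test(emval ~ edu, data = sim, var.equal = TRUE)

# Overall F statistic from each method
f_lm  <- unname(summary(fit_lm)$fstatistic["value"])
f_aov <- unname(fit_aov$statistic)
all.equal(f_lm, f_aov)  # the two F statistics coincide

# Each lm() coefficient is a difference from the reference level ("primary")
diff_means <- mean(sim$emval[sim$edu == "secondary"]) -
  mean(sim$emval[sim$edu == "primary"])
all.equal(unname(coef(fit_lm)["edusecondary"]), diff_means)
```

With the default `var.equal = FALSE`, oneway.test() applies the Welch correction, so its F statistic would no longer match the one from lm() exactly.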

5. lm() vs. cor.test()

Lastly, let’s compare linear regression to correlation. You have already learned about this, so it is time to put this into practice.

  1. Calculate Pearson’s correlation coefficient between emval and age:
cor(df1$emval, df1$age, method = c("pearson"))
## [1] -0.1878421
cor.test(df1$emval, df1$age, method=c("pearson"))
## 
##  Pearson's product-moment correlation
## 
## data:  df1$emval and df1$age
## t = -9.3515, df = 2391, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2262092 -0.1488930
## sample estimates:
##        cor 
## -0.1878421

What do we see in the output?

  • p-value is small - results are statistically significant;
  • correlation coefficient is equal to -0.19, indicating a negative, quite weak correlation between age and emancipative values
  2. What about linear regression? Estimate a model where you regress emval on age:
summary(lm(emval ~ scale(age, scale = T, center = T), data = df1))
## 
## Call:
## lm(formula = emval ~ scale(age, scale = T, center = T), data = df1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.41997 -0.09266 -0.00088  0.08883  0.49748 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)                        0.393062   0.002622 149.908   <2e-16
## scale(age, scale = T, center = T) -0.024525   0.002623  -9.352   <2e-16
##                                      
## (Intercept)                       ***
## scale(age, scale = T, center = T) ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1283 on 2391 degrees of freedom
## Multiple R-squared:  0.03528,    Adjusted R-squared:  0.03488 
## F-statistic: 87.45 on 1 and 2391 DF,  p-value: < 2.2e-16
  • Is the model good? - The p-value is small, so the relation between age and emancipative values is statistically significant, but it is weak (R^2 is 0.035).
  • How to interpret the regression coefficient? - Age is standardized here, so an increase of one standard deviation in age is associated with a decrease of about 0.025 in emval.
  • Is the coefficient t value close to the statistic in correlation? - Yes, they are the same: t = -9.352 in the regression and t = -9.3515 in cor.test(), both with 2391 degrees of freedom.
  • What is the relation between the correlation coefficient and R-squared in this model? - With a single predictor, R-squared is the squared correlation coefficient: (-0.1878)^2 ≈ 0.0353, which matches Multiple R-squared = 0.03528.
  3. Make your conclusions from comparing the two methods:
  1. Correlation is a good first step in the analysis: it is simple and clear, but it only tells us whether a linear relationship exists and how strong it is;
  2. the linear model gives more detailed output and can also be used for prediction.

In plain words, older respondents tend to have lower emancipative values, but, again, this is not a strong predictor as the model explains 3.5% of the variation in emancipative values.
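
The links between cor.test() and lm() noted in this section can be verified numerically. The following is a minimal sketch on simulated data (the vectors `age` and `emval` below are generated for illustration and are not the survey variables):

```r
# Simulated data (hypothetical; not the survey sample used above)
set.seed(2)
n     <- 200
age   <- round(runif(n, 18, 80))
emval <- 0.5 - 0.002 * age + rnorm(n, sd = 0.12)

r   <- cor(emval, age)
ct  <- cor.test(emval, age)
fit <- summary(lm(emval ~ scale(age)))

t_cor <- unname(ct$statistic)                      # t from the correlation test
t_lm  <- unname(fit$coefficients[2, "t value"])    # t of the slope
slope <- unname(fit$coefficients[2, "Estimate"])   # slope on standardized age

all.equal(t_cor, t_lm)             # same t statistic
all.equal(r^2, fit$r.squared)      # R-squared equals squared correlation
all.equal(slope, r * sd(emval))    # slope on standardized age = r * sd(outcome)
```

If the outcome were standardized as well, the slope would equal the correlation coefficient itself.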

Overall conclusions to points 3-5:

Linear regression is the ‘umbrella’ method over t-test, ANOVA, and correlation.

In contrast to the other three methods, linear regression can accommodate continuous and categorical predictors simultaneously.

Moreover, linear regression provides a number (the regression coefficient) that describes the relationship between a predictor and the outcome when the other predictors are held constant.
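
As an illustration of combining the two kinds of predictors, here is a minimal sketch with simulated data (the data frame `sim` and the effect sizes are made up for the example):

```r
# Simulated data (hypothetical; not the survey sample used above)
set.seed(3)
n <- 300
sim <- data.frame(
  age = round(runif(n, 18, 80)),
  sex = factor(sample(c("Male", "Female"), n, replace = TRUE))
)
sim$emval <- 0.45 - 0.002 * sim$age + 0.03 * (sim$sex == "Male") +
  rnorm(n, sd = 0.1)

# One continuous and one categorical predictor in the same model
fit <- lm(emval ~ age + sex, data = sim)
coef(fit)
# "age":     expected change in emval per extra year of age, sex held constant
# "sexMale": expected Male minus Female difference at the same age
#            ("Female" is the reference level because factor levels are sorted alphabetically)
```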

To sum up all that you learnt today, study the following table - find the familiar methods in the first column and see how to do the same with a linear model.

In broad strokes, all these methods are part of the General Linear Model (GLM) perspective, describing all models with linear relationships.

Source: https://lindeloev.github.io/tests-as-linear/