Team Members

  1. Nadezhda Bykova

  2. Anastasia Vlasenko

  3. Anna Gorobtsova

  4. Artyom Kulikov

Introduction

This time we were able to get our hands on some rich and insightful data - TIMSS 2015. We are looking at the scales that measure 8th grade students' attitudes towards Mathematics. With the help of Exploratory Factor Analysis (EFA) we will determine whether there are latent factors behind our variables and how the items are grouped. Afterwards, we will build a linear regression model to see how well these factors, along with gender, parental education and a child's place of birth, explain the variation in a student's math achievement.

Since our variables fall into blocks such as ‘Attitudes towards Learning Mathematics’, ‘Your Mathematics Lessons’ and ‘Attitudes towards Mathematics’, we can check whether a child’s performance is connected to how interested they are in Math and how they feel about it, both in class and in general. Some papers suggest that attitudes towards Math and achievement in school, both in Math and in other subjects, are positively correlated: those who are more positive about Math perform better (Nicolaidou and Philippou, 2003). Others show that attitudes can explain up to 32% of achievement in Math (Lipnevich et al., 2011). Thus, we suggest the following research questions for the linear regression:

  1. Do students who feel more confident in mathematics perform better?

  2. Does maternal education explain a child’s performance better than paternal education?

  3. Do girls perform better in Mathematics than boys?

Keeping the suggestions of other researchers in mind, we will conduct EFA to find any latent factors there are, then use them to build a linear regression model and test our research questions.

Getting started

Here we simply load our data, which are, again, 8th grade students’ questionnaire answers. We also load our main libraries, filter out the NAs and create two datasets: one for EFA and one for the regression analysis.

# Load the libraries used throughout the analysis
library(foreign)
library(psych)
library(knitr)
library(magrittr)
library(kableExtra)
library(polycor)
library(corrplot)
library(car)
library(ggplot2)
library(dplyr)
library(sjPlot)
# Read the 8th grade student questionnaire file
data1 <- read.spss("BSGSGPM6.sav", to.data.frame = TRUE, use.value.labels = TRUE)
# Keep the attitude items, the achievement score and the background variables
data2 <- data1[c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E", "BSMMAT01", "BSBG01", "BSBG07A", "BSBG07B", "BSBG10A")]
# Drop observations with missing values
data3 <- na.omit(data2)
# Attitude items go to the EFA dataset, the remaining variables to the regression dataset
save1 <- c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E")
save2 <- c("BSMMAT01", "BSBG01", "BSBG07A", "BSBG07B", "BSBG10A")
data_fa <- data3[save1]   # 24 attitude items for EFA
data_reg <- data3[save2]  # achievement, gender, parental education, place of birth

Descriptive statistics

First of all, it should be mentioned that the attitude variables are 4-point scales: ‘Agree a lot’ (1), ‘Agree a little’ (2), ‘Disagree a little’ (3) and ‘Disagree a lot’ (4). We should keep this inverse coding in mind (lower values mean stronger agreement) when interpreting the results.

All of our variables are categorical. Since there are so many, running separate tests for each would not be reasonable, so we will look at frequency tables for all variables instead.
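
The kable() calls below use the count tables tab1-tab3 and the header vectors name1-name3, which were built outside the chunks shown here. A minimal sketch of how they could be constructed (the exact code in the original script may differ):

# Hypothetical construction of the summary tables used below:
# response counts per category for the BSBM17*, BSBM18* and BSBM19* items
name1 <- grep("^BSBM17", names(data_fa), value = TRUE)
name2 <- grep("^BSBM18", names(data_fa), value = TRUE)
name3 <- grep("^BSBM19", names(data_fa), value = TRUE)
tab1 <- sapply(data_fa[name1], table)
tab2 <- sapply(data_fa[name2], table)
tab3 <- sapply(data_fa[name3], table)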

kable(tab1, col.names = name1) %>% 
  kable_styling(bootstrap_options = c("bordered", "striped", "hover"))
BSBM17A BSBM17B BSBM17C BSBM17D BSBM17E BSBM17F BSBM17G BSBM17H BSBM17I
Agree a lot 2232 964 610 1947 2021 947 1487 1178 1924
Agree a little 2433 1325 1672 2718 2296 2081 2185 2169 1566
Disagree a little 766 1751 2074 938 980 2088 1457 1747 1297
Disagree a lot 444 1835 1519 272 578 759 746 781 1088
kable(tab2, col.names = name2) %>% 
  kable_styling(bootstrap_options = c("bordered", "striped", "hover"))
BSBM18A BSBM18B BSBM18C BSBM18D BSBM18E BSBM18F BSBM18G BSBM18H BSBM18I BSBM18J
Agree a lot 2413 2265 1551 1186 2332 2688 1702 1929 2376 2137
Agree a little 3018 2592 2819 2600 2608 2348 2957 2774 2725 2824
Disagree a little 365 792 1190 1647 739 652 987 917 593 693
Disagree a lot 79 226 315 442 196 187 229 255 181 221
kable(tab3, col.names = name3) %>% 
  kable_styling(bootstrap_options = c("bordered", "striped", "hover"))
BSBM19A BSBM19B BSBM19C BSBM19D BSBM19E
Agree a lot 1452 749 1366 1222 1115
Agree a little 2311 1659 1486 2403 1968
Disagree a little 1305 2380 1664 1664 1823
Disagree a lot 807 1087 1359 586 969

Having checked each variable and its categories, we see that every response category has a reasonable number of observations. Now let’s look at the dimensions of our data.

dim(data_fa)
## [1] 5875   24

According to the output, after removing NAs and selecting the variables needed, the EFA dataset has 5875 observations and 24 attitude items; gender, parental education and place of birth are kept separately in the regression dataset.

EFA

Correlation matrix

Now let us look at the correlations between the variables. Since they are categorical ordinal, we compute heterogeneous (polychoric) correlations with hetcor().

corr <- hetcor(data_fa)
corrplot(corr$correlations)

The results show quite strong correlations between many of the variables, suggesting that some hidden (latent) factors underlie them. Thus, we need to find these factors.

To do that, we first convert our data to numeric.

datafa <- as.data.frame(lapply(data_fa, as.numeric))

These will be used later on in EFA and regression analysis.

Parallel analysis

We will now look at the parallel analysis and scree plot to see how many factors they suggest.

fa.parallel(datafa)

## Parallel analysis suggests that the number of factors =  4  and the number of components =  3

We get the suggestion of 4 factors for the best model. Let us now test it out.

Building model & Interpretation

Passing cor = "mixed" to fa() makes it compute a mixed (polychoric) correlation matrix, which is appropriate for our ordinal items; the default oblique rotation (oblimin) is kept, which allows the factors to correlate and should give the best results.

fa1 <- fa(datafa, 4, cor = "mixed")
## 
## mixed.cor is deprecated, please use mixedCor.
print(fa1$loadings,cutoff = 0.3)
## 
## Loadings:
##         MR1    MR2    MR3    MR4   
## BSBM17A  0.866                     
## BSBM17B -0.652                     
## BSBM17C -0.731                     
## BSBM17D  0.794                     
## BSBM17E  0.918                     
## BSBM17F  0.854                     
## BSBM17G  0.843                     
## BSBM17H  0.839                     
## BSBM17I  0.786                     
## BSBM18A                       0.421
## BSBM18B                       0.944
## BSBM18C  0.372                0.619
## BSBM18D  0.321  0.310         0.372
## BSBM18E                       0.659
## BSBM18F                       0.614
## BSBM18G         0.647              
## BSBM18H         0.811              
## BSBM18I         0.862              
## BSBM18J         0.604              
## BSBM19A               -0.632       
## BSBM19B                0.863       
## BSBM19C                0.801       
## BSBM19D  0.349        -0.477       
## BSBM19E                0.560       
## 
##                  MR1   MR2   MR3   MR4
## SS loadings    6.442 2.553 2.538 2.512
## Proportion Var 0.268 0.106 0.106 0.105
## Cumulative Var 0.268 0.375 0.481 0.585

After testing several models, we came to the conclusion that 4 factors give the best results. The rotation used is ‘oblimin’ (the default), so the factors are allowed to correlate. The loadings are mostly clean, apart from a few cross-loaded items: BSBM18D loads on three factors, while BSBM18C and BSBM19D each load on two.

fa.diagram(fa1)

The diagram shows how the variables are distributed among the 4 factors and the correlations between the factors. Each factor has at least 4 variables, which satisfies the usual sufficiency requirement.

  1. The first factor contains all 9 variables connected to learning mathematics and can be called ‘Attitudes towards Learning Mathematics’.

  2. The second factor has 4 variables connected to how students experience learning mathematics in class with their teachers. We call it ‘Feedback in Math Classes’, since it covers the feedback students receive and how well they feel they learn in their mathematics lessons.

  3. The third factor has 5 variables that capture students’ perception of how well they do in math. It can be called ‘Subjective Performance in Mathematics’.

  4. The fourth factor has 6 variables. It seems to cover how well the teacher explains Math and how easy the teacher is to understand and communicate with. Thus, it might be called ‘Teacher’s Input’, since these items reflect how the teacher’s way of teaching is perceived by the students.

Model fit

Here we can look at the absolute measures of fit and whether they pass the usual thresholds.

fa1
## Factor Analysis using method =  minres
## Call: fa(r = datafa, nfactors = 4, cor = "mixed")
## Standardized loadings (pattern matrix) based upon correlation matrix
##           MR1   MR2   MR3   MR4   h2    u2 com
## BSBM17A  0.87 -0.03 -0.07  0.05 0.86 0.138 1.0
## BSBM17B -0.65  0.05  0.21 -0.04 0.63 0.367 1.2
## BSBM17C -0.73  0.02  0.08 -0.03 0.62 0.380 1.0
## BSBM17D  0.79  0.19  0.10 -0.05 0.65 0.354 1.2
## BSBM17E  0.92 -0.03 -0.06  0.01 0.90 0.098 1.0
## BSBM17F  0.85  0.05  0.04 -0.07 0.67 0.329 1.0
## BSBM17G  0.84  0.01 -0.09 -0.02 0.80 0.205 1.0
## BSBM17H  0.84  0.05  0.10  0.12 0.76 0.241 1.1
## BSBM17I  0.79 -0.04 -0.22  0.03 0.88 0.116 1.2
## BSBM18A  0.10  0.26 -0.08  0.42 0.53 0.475 1.9
## BSBM18B -0.04 -0.04 -0.07  0.94 0.83 0.168 1.0
## BSBM18C  0.37  0.07  0.17  0.62 0.75 0.250 1.8
## BSBM18D  0.32  0.31  0.14  0.37 0.66 0.341 3.2
## BSBM18E -0.02  0.24 -0.04  0.66 0.74 0.259 1.3
## BSBM18F -0.02  0.30 -0.05  0.61 0.76 0.239 1.5
## BSBM18G  0.05  0.65 -0.03  0.13 0.61 0.387 1.1
## BSBM18H  0.08  0.81  0.03 -0.04 0.66 0.342 1.0
## BSBM18I -0.03  0.86 -0.06  0.02 0.76 0.241 1.0
## BSBM18J -0.03  0.60 -0.07  0.23 0.63 0.373 1.3
## BSBM19A  0.27  0.02 -0.63  0.06 0.72 0.279 1.4
## BSBM19B  0.08 -0.05  0.86 -0.01 0.68 0.324 1.0
## BSBM19C -0.15 -0.02  0.80 -0.01 0.83 0.170 1.1
## BSBM19D  0.35 -0.01 -0.48  0.10 0.62 0.383 1.9
## BSBM19E -0.10  0.01  0.56  0.00 0.40 0.604 1.1
## 
##                        MR1  MR2  MR3  MR4
## SS loadings           7.24 3.31 3.00 3.39
## Proportion Var        0.30 0.14 0.13 0.14
## Cumulative Var        0.30 0.44 0.56 0.71
## Proportion Explained  0.43 0.20 0.18 0.20
## Cumulative Proportion 0.43 0.62 0.80 1.00
## 
##  With factor correlations of 
##       MR1   MR2   MR3   MR4
## MR1  1.00  0.42 -0.62  0.48
## MR2  0.42  1.00 -0.06  0.80
## MR3 -0.62 -0.06  1.00 -0.20
## MR4  0.48  0.80 -0.20  1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 4 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  23.57 with Chi Square of  138249.6
## The degrees of freedom for the model are 186  and the objective function was  1.26 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  5875 with the empirical chi square  1291.38  with prob <  1.2e-164 
## The total number of observations was  5875  with Likelihood Chi Square =  7366.6  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.923
## RMSEA index =  0.081  and the 90 % confidence intervals are  0.079 0.083
## BIC =  5752.41
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1  MR2  MR3  MR4
## Correlation of (regression) scores with factors   0.99 0.95 0.95 0.96
## Multiple R square of scores with factors          0.97 0.91 0.91 0.93
## Minimum correlation of possible factor scores     0.94 0.82 0.82 0.86
  1. Cumulative variance explained reaches 71%, which is a great result: well over half of the variance in the items is accounted for.

  2. All factors pass the proportion-of-variance threshold, each explaining at least 10% of the variance.

  3. RMSR is far below 0.05; in fact, it is very close to 0, as it should be.

  4. The Tucker-Lewis Index of 0.923 indicates good, though not perfect, reliability of the factoring.

  5. The RMSEA is 0.081, which is above 0.05 but below 0.1, making it acceptable for our purposes.

Now let’s check Cronbach’s alpha for each factor to see how internally consistent it is and whether dropping any item would improve it.

MR1 <- as.data.frame(datafa[c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I")])
psych::alpha(MR1, check.keys = TRUE)
## 
## Reliability analysis   
## Call: psych::alpha(x = MR1, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.94      0.94    0.94      0.65  16 0.0011  2.2 0.79     0.64
## 
##  lower alpha upper     95% confidence boundaries
## 0.94 0.94 0.94 
## 
##  Reliability if an item is dropped:
##          raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## BSBM17A       0.93      0.93    0.93      0.63  14   0.0013 0.0060  0.64
## BSBM17B-      0.94      0.94    0.94      0.66  16   0.0012 0.0063  0.65
## BSBM17C-      0.94      0.94    0.94      0.66  15   0.0012 0.0069  0.65
## BSBM17D       0.94      0.94    0.94      0.67  16   0.0012 0.0058  0.66
## BSBM17E       0.93      0.93    0.93      0.63  13   0.0014 0.0049  0.63
## BSBM17F       0.94      0.94    0.94      0.66  15   0.0012 0.0069  0.65
## BSBM17G       0.93      0.93    0.93      0.64  14   0.0013 0.0068  0.64
## BSBM17H       0.94      0.94    0.94      0.65  15   0.0012 0.0076  0.64
## BSBM17I       0.93      0.93    0.93      0.63  14   0.0013 0.0058  0.64
## 
##  Item statistics 
##             n raw.r std.r r.cor r.drop mean   sd
## BSBM17A  5875  0.87  0.88  0.87   0.84  1.9 0.90
## BSBM17B- 5875  0.78  0.77  0.74   0.71  2.2 1.07
## BSBM17C- 5875  0.79  0.78  0.75   0.73  2.2 0.95
## BSBM17D  5875  0.74  0.75  0.71   0.69  1.9 0.82
## BSBM17E  5875  0.90  0.90  0.90   0.87  2.0 0.95
## BSBM17F  5875  0.79  0.79  0.76   0.73  2.5 0.91
## BSBM17G  5875  0.86  0.86  0.85   0.82  2.2 0.97
## BSBM17H  5875  0.82  0.82  0.79   0.77  2.4 0.95
## BSBM17I  5875  0.89  0.88  0.87   0.85  2.3 1.10
## 
## Non missing response frequency for each item
##            1    2    3    4 miss
## BSBM17A 0.38 0.41 0.13 0.08    0
## BSBM17B 0.16 0.23 0.30 0.31    0
## BSBM17C 0.10 0.28 0.35 0.26    0
## BSBM17D 0.33 0.46 0.16 0.05    0
## BSBM17E 0.34 0.39 0.17 0.10    0
## BSBM17F 0.16 0.35 0.36 0.13    0
## BSBM17G 0.25 0.37 0.25 0.13    0
## BSBM17H 0.20 0.37 0.30 0.13    0
## BSBM17I 0.33 0.27 0.22 0.19    0

‘Attitudes towards Learning Mathematics’ has an alpha of 0.94, which is excellent. Moreover, no item could be dropped to improve the measure.

MR2 <- as.data.frame(datafa[c("BSBM18I", "BSBM18H", "BSBM18G", "BSBM18J")])
psych::alpha(MR2, check.keys = TRUE)
## 
## Reliability analysis   
## Call: psych::alpha(x = MR2, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.84      0.84     0.8      0.56 5.1 0.0035  1.9 0.64     0.55
## 
##  lower alpha upper     95% confidence boundaries
## 0.83 0.84 0.84 
## 
##  Reliability if an item is dropped:
##         raw_alpha std.alpha G6(smc) average_r S/N alpha se   var.r med.r
## BSBM18I      0.77      0.77    0.69      0.53 3.4   0.0052 0.00036  0.52
## BSBM18H      0.80      0.80    0.73      0.57 3.9   0.0046 0.00190  0.56
## BSBM18G      0.80      0.80    0.73      0.58 4.1   0.0045 0.00308  0.60
## BSBM18J      0.80      0.80    0.73      0.57 4.0   0.0045 0.00082  0.56
## 
##  Item statistics 
##            n raw.r std.r r.cor r.drop mean   sd
## BSBM18I 5875  0.84  0.85  0.78   0.71  1.8 0.76
## BSBM18H 5875  0.82  0.81  0.72   0.66  1.9 0.81
## BSBM18G 5875  0.80  0.80  0.70   0.64  2.0 0.78
## BSBM18J 5875  0.81  0.81  0.71   0.65  1.8 0.78
## 
## Non missing response frequency for each item
##            1    2    3    4 miss
## BSBM18I 0.40 0.46 0.10 0.03    0
## BSBM18H 0.33 0.47 0.16 0.04    0
## BSBM18G 0.29 0.50 0.17 0.04    0
## BSBM18J 0.36 0.48 0.12 0.04    0

‘Feedback in Math Classes’ has a slightly lower alpha of 0.84, which still works for us since it is above 0.8. Again, no item can be dropped to improve the factor.

MR3 <- as.data.frame(datafa[c("BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E")])
psych::alpha(MR3, check.keys = TRUE)
## 
## Reliability analysis   
## Call: psych::alpha(x = MR3, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.86      0.86    0.84      0.54   6 0.0029  2.6 0.78     0.52
## 
##  lower alpha upper     95% confidence boundaries
## 0.85 0.86 0.86 
## 
##  Reliability if an item is dropped:
##          raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## BSBM19A-      0.81      0.81    0.78      0.52 4.3   0.0040 0.0102  0.50
## BSBM19B       0.82      0.82    0.80      0.54 4.7   0.0037 0.0145  0.54
## BSBM19C       0.80      0.80    0.76      0.50 4.0   0.0043 0.0092  0.48
## BSBM19D-      0.83      0.83    0.80      0.55 5.0   0.0035 0.0117  0.52
## BSBM19E       0.86      0.86    0.84      0.61 6.2   0.0029 0.0061  0.62
## 
##  Item statistics 
##             n raw.r std.r r.cor r.drop mean   sd
## BSBM19A- 5875  0.84  0.84  0.80   0.73  2.8 0.98
## BSBM19B  5875  0.80  0.80  0.74   0.68  2.6 0.92
## BSBM19C  5875  0.87  0.86  0.84   0.77  2.5 1.09
## BSBM19D- 5875  0.77  0.78  0.71   0.65  2.7 0.90
## BSBM19E  5875  0.70  0.70  0.57   0.53  2.5 0.98
## 
## Non missing response frequency for each item
##            1    2    3    4 miss
## BSBM19A 0.25 0.39 0.22 0.14    0
## BSBM19B 0.13 0.28 0.41 0.19    0
## BSBM19C 0.23 0.25 0.28 0.23    0
## BSBM19D 0.21 0.41 0.28 0.10    0
## BSBM19E 0.19 0.33 0.31 0.16    0

‘Subjective Performance in Mathematics’ also shows good reliability, with an alpha of 0.86. Looking at the items, none should be dropped.

MR4 <- as.data.frame(datafa[c("BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F")])
psych::alpha(MR4, check.keys = TRUE)
## 
## Reliability analysis   
## Call: psych::alpha(x = MR4, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.89      0.89    0.89      0.58 8.3 0.0021  1.9 0.63     0.56
## 
##  lower alpha upper     95% confidence boundaries
## 0.89 0.89 0.9 
## 
##  Reliability if an item is dropped:
##         raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## BSBM18A      0.89      0.89    0.88      0.62 8.3   0.0023 0.0051  0.61
## BSBM18B      0.86      0.86    0.85      0.56 6.4   0.0027 0.0092  0.55
## BSBM18C      0.87      0.87    0.85      0.57 6.6   0.0027 0.0087  0.56
## BSBM18D      0.88      0.88    0.86      0.59 7.2   0.0024 0.0076  0.58
## BSBM18E      0.87      0.87    0.85      0.57 6.6   0.0026 0.0074  0.56
## BSBM18F      0.87      0.87    0.85      0.57 6.7   0.0026 0.0066  0.56
## 
##  Item statistics 
##            n raw.r std.r r.cor r.drop mean   sd
## BSBM18A 5875  0.70  0.72  0.62   0.59  1.7 0.65
## BSBM18B 5875  0.85  0.85  0.81   0.77  1.8 0.80
## BSBM18C 5875  0.84  0.83  0.80   0.75  2.0 0.82
## BSBM18D 5875  0.80  0.79  0.73   0.69  2.2 0.85
## BSBM18E 5875  0.83  0.83  0.80   0.75  1.8 0.78
## BSBM18F 5875  0.83  0.83  0.79   0.74  1.7 0.78
## 
## Non missing response frequency for each item
##            1    2    3    4 miss
## BSBM18A 0.41 0.51 0.06 0.01    0
## BSBM18B 0.39 0.44 0.13 0.04    0
## BSBM18C 0.26 0.48 0.20 0.05    0
## BSBM18D 0.20 0.44 0.28 0.08    0
## BSBM18E 0.40 0.44 0.13 0.03    0
## BSBM18F 0.46 0.40 0.11 0.03    0

‘Teacher’s Input’ shows a great result of 0.89; thus, all four factors are reliable scales for our data. Also, dropping any item from this factor would decrease the alpha.

Linear regression

First of all, we save the factor scores obtained in the EFA to build a model, and we convert our outcome variable to numeric.

fascores <- as.data.frame(fa1$scores)
datareg <- cbind(data_reg,fascores)
datareg$BSMMAT01 <- as.numeric(as.character(datareg$BSMMAT01))

Baseline model

Let’s try out some options! After considering each factor individually, the third one (MR3) turned out to explain the most variance, so we will build the baseline model on it; a quick check of this comparison is sketched below.
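
A minimal sketch of that comparison (not shown in the original output; it assumes the datareg object created above): fit one regression per factor score and collect the R-squared values.

# Fit a one-predictor model for each factor and compare how much variance it explains
single_r2 <- sapply(c("MR1", "MR2", "MR3", "MR4"), function(f) {
  summary(lm(reformulate(f, response = "BSMMAT01"), data = datareg))$r.squared
})
round(single_r2, 3)  # MR3 should give the largest R-squared, hence the baseline model below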

regr <- lm(BSMMAT01 ~ MR3, data = datareg)
summary(regr)
## 
## Call:
## lm(formula = BSMMAT01 ~ MR3, data = datareg)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -297.413  -44.040    9.423   53.891  210.361 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 616.5026     0.9725  633.95   <2e-16 ***
## MR3          34.6906     0.9787   35.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 74.54 on 5873 degrees of freedom
## Multiple R-squared:  0.1762, Adjusted R-squared:  0.1761 
## F-statistic:  1256 on 1 and 5873 DF,  p-value: < 2.2e-16

The model explains approximately 18% of the variance and is significant. A one-unit increase in the factor (‘Subjective Performance in Mathematics’) is associated with a 34.7-point increase in the outcome (achievement); a short illustration follows below. In other words, students who are more confident in mathematics also do better at it. This answers our first research question: students who feel more comfortable with mathematics do achieve more.
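
As a quick illustration of what the coefficient means (a sketch based on the fitted baseline model above, not part of the original output), the predicted score at a few values of MR3 can be read off with predict():

# Predicted achievement for students one unit below, at, and one unit above the mean factor score
predict(regr, newdata = data.frame(MR3 = c(-1, 0, 1)))
# roughly 581.8, 616.5 and 651.2, i.e. about 34.7 points per unit of MR3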

Now let us add control variables to the model and see whether the mother’s education plays a more important role in a child’s performance than the father’s. We also add gender and place of birth to the models.

Below is the model with the mother’s education:

regrA <- lm(BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07A, data = datareg)
summary(regrA)
## 
## Call:
## lm(formula = BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07A, data = datareg)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -302.361  -41.290    8.086   48.860  197.352 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         591.0605     3.4002 173.832  < 2e-16
## MR3                                  32.8928     0.9436  34.859  < 2e-16
## BSBG10ANo                             2.7163     2.7778   0.978    0.328
## BSBG01Boy                           -12.7578     1.8684  -6.828 9.46e-12
## BSBG07ALower secondary                8.0071     5.2502   1.525    0.127
## BSBG07AUpper secondary               28.6197     3.9019   7.335 2.52e-13
## BSBG07APost-secondary, non-tertiary  36.9103     4.4165   8.357  < 2e-16
## BSBG07AShort-cycle tertiary          43.7808     4.4496   9.839  < 2e-16
## BSBG07ABachelor’s or equivalent      66.2337     4.0997  16.156  < 2e-16
## BSBG07APostgraduate degree           79.3071     5.4928  14.438  < 2e-16
## BSBG07ADon’t know                    16.2253     3.6913   4.396 1.12e-05
##                                        
## (Intercept)                         ***
## MR3                                 ***
## BSBG10ANo                              
## BSBG01Boy                           ***
## BSBG07ALower secondary                 
## BSBG07AUpper secondary              ***
## BSBG07APost-secondary, non-tertiary ***
## BSBG07AShort-cycle tertiary         ***
## BSBG07ABachelor’s or equivalent     ***
## BSBG07APostgraduate degree          ***
## BSBG07ADon’t know                   ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 70.98 on 5864 degrees of freedom
## Multiple R-squared:  0.2541, Adjusted R-squared:  0.2528 
## F-statistic: 199.8 on 10 and 5864 DF,  p-value: < 2.2e-16

Here is the model with the father’s education:

regrB <- lm(BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07B, data = datareg)
summary(regrB)
## 
## Call:
## lm(formula = BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07B, data = datareg)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -300.020  -41.073    8.216   48.901  196.870 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         590.3422     3.5876 164.552  < 2e-16
## MR3                                  32.4969     0.9416  34.513  < 2e-16
## BSBG10ANo                             0.4398     2.7757   0.158  0.87411
## BSBG01Boy                           -14.2534     1.8583  -7.670 2.00e-14
## BSBG07BLower secondary               14.8192     5.5453   2.672  0.00755
## BSBG07BUpper secondary               27.9192     4.2247   6.609 4.22e-11
## BSBG07BPost-secondary, non-tertiary  25.7833     4.5467   5.671 1.49e-08
## BSBG07BShort-cycle tertiary          43.7932     4.6268   9.465  < 2e-16
## BSBG07BBachelor’s or equivalent      67.8669     4.2489  15.973  < 2e-16
## BSBG07BPostgraduate degree           76.4767     4.8433  15.790  < 2e-16
## BSBG07BDon’t know                    18.8550     3.8439   4.905 9.58e-07
##                                        
## (Intercept)                         ***
## MR3                                 ***
## BSBG10ANo                              
## BSBG01Boy                           ***
## BSBG07BLower secondary              ** 
## BSBG07BUpper secondary              ***
## BSBG07BPost-secondary, non-tertiary ***
## BSBG07BShort-cycle tertiary         ***
## BSBG07BBachelor’s or equivalent     ***
## BSBG07BPostgraduate degree          ***
## BSBG07BDon’t know                   ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 70.82 on 5864 degrees of freedom
## Multiple R-squared:  0.2576, Adjusted R-squared:  0.2563 
## F-statistic: 203.4 on 10 and 5864 DF,  p-value: < 2.2e-16

Since the two models differ only slightly in the variance explained (the father-based model is marginally higher), let us test how different they really are. Because the models are non-nested, we compare them with AIC:

AIC(regrA, regrB)
##       df      AIC
## regrA 12 66769.03
## regrB 12 66741.72

Accordingly, the difference is small: the father-based model (regrB) has a slightly lower AIC, as the quick check below also shows. The research question, which asked whether maternal education affects performance more than paternal education, can therefore be answered with a no: paternal education plays a slightly bigger role, but only marginally.
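
For reference, the AIC gap can be computed directly from the two models (a small check based on the output above):

# Difference in AIC between the mother-based (regrA) and father-based (regrB) models
diff(AIC(regrA, regrB)$AIC)  # about -27, i.e. regrB (father's education) fits slightly better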

Thus, we can look at the paternal education-based model’s results.

summary(regrB)
## 
## Call:
## lm(formula = BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07B, data = datareg)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -300.020  -41.073    8.216   48.901  196.870 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         590.3422     3.5876 164.552  < 2e-16
## MR3                                  32.4969     0.9416  34.513  < 2e-16
## BSBG10ANo                             0.4398     2.7757   0.158  0.87411
## BSBG01Boy                           -14.2534     1.8583  -7.670 2.00e-14
## BSBG07BLower secondary               14.8192     5.5453   2.672  0.00755
## BSBG07BUpper secondary               27.9192     4.2247   6.609 4.22e-11
## BSBG07BPost-secondary, non-tertiary  25.7833     4.5467   5.671 1.49e-08
## BSBG07BShort-cycle tertiary          43.7932     4.6268   9.465  < 2e-16
## BSBG07BBachelor’s or equivalent      67.8669     4.2489  15.973  < 2e-16
## BSBG07BPostgraduate degree           76.4767     4.8433  15.790  < 2e-16
## BSBG07BDon’t know                    18.8550     3.8439   4.905 9.58e-07
##                                        
## (Intercept)                         ***
## MR3                                 ***
## BSBG10ANo                              
## BSBG01Boy                           ***
## BSBG07BLower secondary              ** 
## BSBG07BUpper secondary              ***
## BSBG07BPost-secondary, non-tertiary ***
## BSBG07BShort-cycle tertiary         ***
## BSBG07BBachelor’s or equivalent     ***
## BSBG07BPostgraduate degree          ***
## BSBG07BDon’t know                   ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 70.82 on 5864 degrees of freedom
## Multiple R-squared:  0.2576, Adjusted R-squared:  0.2563 
## F-statistic: 203.4 on 10 and 5864 DF,  p-value: < 2.2e-16
vif(regrB)
##             GVIF Df GVIF^(1/(2*Df))
## MR3     1.025399  1        1.012620
## BSBG10A 1.030155  1        1.014966
## BSBG01  1.010963  1        1.005467
## BSBG07B 1.045725  7        1.003199
plot(regrB)

The model explains about 26% of the variance. Place of birth is insignificant in this particular model. In general, we see that the higher the father’s education, the better a student performs in math. The education coefficients are all contrasts with the reference category (the lowest education level): having a father with Lower secondary education corresponds to a 14.8-point higher score, Upper secondary to 27.9 points, Post-secondary non-tertiary to 25.8 points, Short-cycle tertiary to 43.8 points, a Bachelor’s degree or equivalent to 67.9 points, and a Postgraduate degree to 76.5 points.

We also checked for multicollinearity: since all the GVIF values are under 5, we are safe here. The diagnostic plots show approximately normally distributed residuals, with 3 outliers flagged but no high-leverage points; a quick way to inspect them is sketched below.
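
A minimal sketch of how those outliers could be inspected more formally (this step is not part of the original output; the 4/n cutoff is just a common rule of thumb):

# Flag potentially influential observations with Cook's distance
cd <- cooks.distance(regrB)
sum(cd > 4 / nrow(datareg))           # how many cases exceed the rule-of-thumb cutoff
head(sort(cd, decreasing = TRUE), 3)  # the three largest values (cf. the outliers flagged in the plots)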

Interaction effect

Let us now see whether an interaction effect between the factor (MR3) and gender makes any difference.

regrint <- lm(BSMMAT01 ~ MR3*BSBG01 + BSBG10A + BSBG07A + BSBG07B, data = datareg)
summary(regrint)
## 
## Call:
## lm(formula = BSMMAT01 ~ MR3 * BSBG01 + BSBG10A + BSBG07A + BSBG07B, 
##     data = datareg)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -305.903  -41.240    7.218   48.594  200.566 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         577.7987     4.2985 134.419  < 2e-16
## MR3                                  27.8120     1.3062  21.292  < 2e-16
## BSBG01Boy                           -13.2622     1.8418  -7.201 6.75e-13
## BSBG10ANo                             0.3746     2.7511   0.136 0.891698
## BSBG07ALower secondary                5.2315     5.1825   1.009 0.312800
## BSBG07AUpper secondary               21.2476     3.9589   5.367 8.31e-08
## BSBG07APost-secondary, non-tertiary  27.5119     4.5457   6.052 1.52e-09
## BSBG07AShort-cycle tertiary          28.6710     4.6945   6.107 1.08e-09
## BSBG07ABachelor’s or equivalent      40.1462     4.6010   8.726  < 2e-16
## BSBG07APostgraduate degree           51.2433     6.0284   8.500  < 2e-16
## BSBG07ADon’t know                    11.9438     4.1129   2.904 0.003698
## BSBG07BLower secondary               12.5393     5.4874   2.285 0.022342
## BSBG07BUpper secondary               20.1077     4.2827   4.695 2.73e-06
## BSBG07BPost-secondary, non-tertiary  15.4958     4.6745   3.315 0.000922
## BSBG07BShort-cycle tertiary          31.1279     4.8528   6.414 1.52e-10
## BSBG07BBachelor’s or equivalent      46.6535     4.7081   9.909  < 2e-16
## BSBG07BPostgraduate degree           51.3849     5.3856   9.541  < 2e-16
## BSBG07BDon’t know                    15.4466     4.2561   3.629 0.000287
## MR3:BSBG01Boy                         8.7461     1.8447   4.741 2.17e-06
##                                        
## (Intercept)                         ***
## MR3                                 ***
## BSBG01Boy                           ***
## BSBG10ANo                              
## BSBG07ALower secondary                 
## BSBG07AUpper secondary              ***
## BSBG07APost-secondary, non-tertiary ***
## BSBG07AShort-cycle tertiary         ***
## BSBG07ABachelor’s or equivalent     ***
## BSBG07APostgraduate degree          ***
## BSBG07ADon’t know                   ** 
## BSBG07BLower secondary              *  
## BSBG07BUpper secondary              ***
## BSBG07BPost-secondary, non-tertiary ***
## BSBG07BShort-cycle tertiary         ***
## BSBG07BBachelor’s or equivalent     ***
## BSBG07BPostgraduate degree          ***
## BSBG07BDon’t know                   ***
## MR3:BSBG01Boy                       ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 69.92 on 5856 degrees of freedom
## Multiple R-squared:  0.2772, Adjusted R-squared:  0.275 
## F-statistic: 124.8 on 18 and 5856 DF,  p-value: < 2.2e-16

Well, unfortunately, the model did not improve much: it explains about 27.7% of the variance (27.5% adjusted). However, the interaction effect is significant.

plot_model(regrint, type = "int")

The plot illustrates the interaction. Higher self-evaluation of math ability is associated with better performance for both genders, but the slope is steeper for boys (by about 8.7 points per unit of MR3): girls score higher at low levels of self-evaluation, while at higher levels boys with the same self-evaluation start to outperform them. Thus, our third research question is not supported, since boys do better at the upper end of the scale.
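
That crossover point can be located directly from the coefficients (a quick back-of-the-envelope check; the exact value depends on the fitted model above):

# Gender gap (boys minus girls) as a function of MR3: -13.26 + 8.75 * MR3
b <- coef(regrint)
-b["BSBG01Boy"] / b["MR3:BSBG01Boy"]  # about 1.5: above this factor score boys are predicted to score higher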

plot(regrint)

vif(regrint)
##                GVIF Df GVIF^(1/(2*Df))
## MR3        2.024079  1        1.422701
## BSBG01     1.018722  1        1.009318
## BSBG10A    1.038029  1        1.018837
## BSBG07A    3.795913  7        1.099967
## BSBG07B    3.790416  7        1.099853
## MR3:BSBG01 1.990631  1        1.410897

The diagnostic plots show good results: approximately normal residuals and no high-leverage points. The multicollinearity check also looks fine, with all GVIF values under 5. Overall, we would say that this model is the best one.

Conclusion

The main takeaway from this analysis is that attitudes towards math and how students evaluate themselves matter a great deal for their performance in the subject. RQ2 and RQ3 were not supported, but RQ1 proved to be true: those who feel more confident in mathematics do better at it. As for parental education, the father’s education matters a little more than the mother’s. Also, at higher levels of confidence in their abilities, 8th grade boys perform better in math than 8th grade girls. Thus, we were able to replicate the findings of the research on attitudes and performance in Mathematics, reaching almost 28% of explained variance in the last model. Future research could focus more closely on gender differences in performance depending on how students evaluate themselves.

References

  1. A. A. Lipnevich, C. MacCann, S. Krumm, J. Burrus, and R. D. Roberts, “Mathematics attitudes and mathematics outcomes of US and Belarusian middle school students,” Journal of Educational Psychology, vol. 103, no. 1, pp. 105–118, 2011.

  2. M. Nicolaidou and G. Philippou, “Attitudes towards mathematics, self-efficacy and achievement in problem solving,” in European Research in Mathematics Education III, M. A. Mariotti, Ed., pp. 1–11, University of Pisa, Pisa, Italy, 2003.