Nadezhda Bykova
Anastasia Vlasenko
Anna Gorobtsova
Artyom Kulikov
This time we were able to get our hands on some really good and insightful data: TIMSS 2015. We are looking at the scales that measure 8th grade students’ attitudes towards Mathematics. With the help of Exploratory Factor Analysis (EFA) we will determine whether there are latent factors behind our variables and how the items are grouped. Afterwards, we will build a linear regression model to see how well these factors, along with gender, parental education and a child’s place of birth, explain the variation in students’ math achievement.
Since our variables fall into categories such as ‘Attitudes towards Learning Mathematics’, ‘Your Mathematics Lessons’ and ‘Attitudes towards Mathematics’, we can check whether a child’s performance is connected to how interested they are in Math and how they feel about it, both in class and in general. Some papers suggest that attitudes towards Math and achievement in school, both in Math and in other subjects, are positively correlated, so those who are more positive about Math perform better (Nicolaidou and Philippou, 2003). Others show that attitudes can explain up to 32% of achievement in Math (Lipnevich et al., 2011). Thus, I suggest the following research questions for the linear regression:
Do students who feel more confident in mathematics perform better?
Does maternal education explain a child’s performance better than paternal education?
Do girls perform better in Mathematics than boys?
Keeping the suggestions of other researchers in mind, we will conduct an EFA to uncover any latent factors, then use them to build a linear regression model and test our research questions.
Here we simply load our data, which is, again, the 8th grade students’ questionnaire answers. We also attach our main libraries, keep only the variables of interest and filter out the NAs. Finally, we create two datasets: one for the EFA and one for the regression analysis.
library(foreign)
library(psych)
library(knitr)
library(magrittr)
library(kableExtra)
library(polycor)
library(corrplot)
library(car)
library(ggplot2)
library(dplyr)
library(sjPlot)
data1 <- read.spss("BSGSGPM6.sav", to.data.frame = TRUE, use.value.labels = TRUE)
data2 <- data1[c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E", "BSMMAT01", "BSBG01", "BSBG07A", "BSBG07B", "BSBG10A")]
data3 <- na.omit(data2)
save1 <- c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E")
save2 <- c("BSMMAT01", "BSBG01", "BSBG07A", "BSBG07B", "BSBG10A")
data_fa <- data3[save1]
data_reg <- data3[save2]
First of all, I should mention that the variables are scales with ‘Agree a lot’ (1), ‘Agree a little’ (2), ‘Disagree a little’ (3) and ‘Disagree a lot’ (4). We should keep this inverse scale in mind when interpreting the results.
All of our attitude variables are categorical. Since there are so many, running separate tests for each of them would not be reasonable (and a waste of time), so we will look at summary tables for all variables instead.
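The count tables tab1–tab3 and the header vectors name1–name3 used below are created outside the chunks shown here; a minimal sketch of how they could have been built (the exact construction is an assumption):
# Assumed construction of the summary tables: cross-tabulate every item,
# so rows are the answer categories and columns are the items.
name1 <- c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E",
           "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I")
tab1 <- sapply(data_fa[name1], table)  # 4 x 9 matrix of response counts
# name2/tab2 (BSBM18A-J) and name3/tab3 (BSBM19A-E) follow the same pattern.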
kable(tab1, col.names = name1) %>%
kable_styling(bootstrap_options = c("bordered", "striped", "hover"))
|  | BSBM17A | BSBM17B | BSBM17C | BSBM17D | BSBM17E | BSBM17F | BSBM17G | BSBM17H | BSBM17I |
|---|---|---|---|---|---|---|---|---|---|
| Agree a lot | 2232 | 964 | 610 | 1947 | 2021 | 947 | 1487 | 1178 | 1924 |
| Agree a little | 2433 | 1325 | 1672 | 2718 | 2296 | 2081 | 2185 | 2169 | 1566 |
| Disagree a little | 766 | 1751 | 2074 | 938 | 980 | 2088 | 1457 | 1747 | 1297 |
| Disagree a lot | 444 | 1835 | 1519 | 272 | 578 | 759 | 746 | 781 | 1088 |
kable(tab2, col.names = name2) %>%
kable_styling(bootstrap_options = c("bordered", "striped", "hover"))
|  | BSBM18A | BSBM18B | BSBM18C | BSBM18D | BSBM18E | BSBM18F | BSBM18G | BSBM18H | BSBM18I | BSBM18J |
|---|---|---|---|---|---|---|---|---|---|---|
| Agree a lot | 2413 | 2265 | 1551 | 1186 | 2332 | 2688 | 1702 | 1929 | 2376 | 2137 |
| Agree a little | 3018 | 2592 | 2819 | 2600 | 2608 | 2348 | 2957 | 2774 | 2725 | 2824 |
| Disagree a little | 365 | 792 | 1190 | 1647 | 739 | 652 | 987 | 917 | 593 | 693 |
| Disagree a lot | 79 | 226 | 315 | 442 | 196 | 187 | 229 | 255 | 181 | 221 |
kable(tab3, col.names = name3) %>%
kable_styling(bootstrap_options = c("bordered", "striped", "hover"))
|  | BSBM19A | BSBM19B | BSBM19C | BSBM19D | BSBM19E |
|---|---|---|---|---|---|
| Agree a lot | 1452 | 749 | 1366 | 1222 | 1115 |
| Agree a little | 2311 | 1659 | 1486 | 2403 | 1968 |
| Disagree a little | 1305 | 2380 | 1664 | 1664 | 1823 |
| Disagree a lot | 807 | 1087 | 1359 | 586 | 969 |
Having checked each variable and its categories, we can see that every response category has a reasonable number of observations. Now let’s look at the dimensions of our data.
dim(data_fa)
## [1] 5875 24
According to the function, after removing NAs and keeping only the variables of interest, the EFA dataset has 5875 observations and 24 attitude variables; gender, parental education and place of birth are kept separately in the regression dataset.
Now let us look at the correlations between the variables. Since they are ordinal categorical, we use hetcor(), which computes polychoric correlations for such pairs.
corr <- hetcor(data_fa)
corrplot(corr$correlations)
The results show quite strong correlations between many variables, which suggests that some hidden (latent) factors underlie them. Thus, we need to find these factors.
To do that, we first convert our data to numeric.
datafa <- as.data.frame(lapply(data_fa, as.numeric))
These will be used later on in EFA and regression analysis.
We now run a parallel analysis and inspect the scree plot to see how many factors it suggests.
fa.parallel(datafa)
## Parallel analysis suggests that the number of factors = 4 and the number of components = 3
We get the suggestion of 4 factors for the best model. Let us now test it out.
Passing cor = "mixed" to the fa() function tells it to compute correlations suited to our ordinal items (via mixedCor, i.e. polychoric rather than plain Pearson correlations); the rotation itself stays at the default ‘oblimin’.
fa1 <- fa(datafa, 4, cor = "mixed")
##
## mixed.cor is deprecated, please use mixedCor.
print(fa1$loadings,cutoff = 0.3)
##
## Loadings:
## MR1 MR2 MR3 MR4
## BSBM17A 0.866
## BSBM17B -0.652
## BSBM17C -0.731
## BSBM17D 0.794
## BSBM17E 0.918
## BSBM17F 0.854
## BSBM17G 0.843
## BSBM17H 0.839
## BSBM17I 0.786
## BSBM18A 0.421
## BSBM18B 0.944
## BSBM18C 0.372 0.619
## BSBM18D 0.321 0.310 0.372
## BSBM18E 0.659
## BSBM18F 0.614
## BSBM18G 0.647
## BSBM18H 0.811
## BSBM18I 0.862
## BSBM18J 0.604
## BSBM19A -0.632
## BSBM19B 0.863
## BSBM19C 0.801
## BSBM19D 0.349 -0.477
## BSBM19E 0.560
##
## MR1 MR2 MR3 MR4
## SS loadings 6.442 2.553 2.538 2.512
## Proportion Var 0.268 0.106 0.106 0.105
## Cumulative Var 0.268 0.375 0.481 0.585
After testing out several models, I came to the conclusion that the 4-factor solution gives the best results; a sketch of how neighbouring solutions could be compared is given below. The rotation used is the default ‘oblimin’, which allows the factors to be correlated. The loadings are mostly clean, apart from one triple-loaded variable (BSBM18D) and two cross-loaded variables (BSBM18C and BSBM19D).
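The comparison of the alternative solutions is not shown in the original output; a brief sketch of how the 3-, 4- and 5-factor models could be lined up on the same fit indices (this comparison code is an assumption):
# Assumed comparison of neighbouring solutions on TLI, RMSEA and BIC
fa3 <- fa(datafa, 3, cor = "mixed")
fa5 <- fa(datafa, 5, cor = "mixed")
data.frame(factors = c(3, 4, 5),
           TLI     = c(fa3$TLI, fa1$TLI, fa5$TLI),
           RMSEA   = c(fa3$RMSEA[1], fa1$RMSEA[1], fa5$RMSEA[1]),
           BIC     = c(fa3$BIC, fa1$BIC, fa5$BIC))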
fa.diagram(fa1)
The diagram shows how the variables are distributed among the 4 factors and the correlations between the factors. Each factor has at least 4 variables, which satisfies the common rule of thumb of at least three to four items per factor.
The first factor contains all nine variables connected to learning mathematics and can be called ‘Attitudes towards Learning Mathematics’.
The second factor has 4 variables related to how students experience mathematics lessons with their teachers. We can call it ‘Feedback in Math Classes’, since it covers the feedback students receive and how well they feel they learn in class.
The third factor has 5 variables that capture a student’s perception of how well they do in math. It can be called ‘Subjective Performance in Mathematics’.
The fourth factor has 6 variables. It seems to cover how well a teacher explains Math and how easy the teacher is to communicate with about the subject. It might be called ‘Teacher’s Input’, since I believe these items reflect how a teacher’s ability to teach influences performance in math.
Here we can look at the overall measures of fit and check whether they pass the usual thresholds.
fa1
## Factor Analysis using method = minres
## Call: fa(r = datafa, nfactors = 4, cor = "mixed")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 MR3 MR4 h2 u2 com
## BSBM17A 0.87 -0.03 -0.07 0.05 0.86 0.138 1.0
## BSBM17B -0.65 0.05 0.21 -0.04 0.63 0.367 1.2
## BSBM17C -0.73 0.02 0.08 -0.03 0.62 0.380 1.0
## BSBM17D 0.79 0.19 0.10 -0.05 0.65 0.354 1.2
## BSBM17E 0.92 -0.03 -0.06 0.01 0.90 0.098 1.0
## BSBM17F 0.85 0.05 0.04 -0.07 0.67 0.329 1.0
## BSBM17G 0.84 0.01 -0.09 -0.02 0.80 0.205 1.0
## BSBM17H 0.84 0.05 0.10 0.12 0.76 0.241 1.1
## BSBM17I 0.79 -0.04 -0.22 0.03 0.88 0.116 1.2
## BSBM18A 0.10 0.26 -0.08 0.42 0.53 0.475 1.9
## BSBM18B -0.04 -0.04 -0.07 0.94 0.83 0.168 1.0
## BSBM18C 0.37 0.07 0.17 0.62 0.75 0.250 1.8
## BSBM18D 0.32 0.31 0.14 0.37 0.66 0.341 3.2
## BSBM18E -0.02 0.24 -0.04 0.66 0.74 0.259 1.3
## BSBM18F -0.02 0.30 -0.05 0.61 0.76 0.239 1.5
## BSBM18G 0.05 0.65 -0.03 0.13 0.61 0.387 1.1
## BSBM18H 0.08 0.81 0.03 -0.04 0.66 0.342 1.0
## BSBM18I -0.03 0.86 -0.06 0.02 0.76 0.241 1.0
## BSBM18J -0.03 0.60 -0.07 0.23 0.63 0.373 1.3
## BSBM19A 0.27 0.02 -0.63 0.06 0.72 0.279 1.4
## BSBM19B 0.08 -0.05 0.86 -0.01 0.68 0.324 1.0
## BSBM19C -0.15 -0.02 0.80 -0.01 0.83 0.170 1.1
## BSBM19D 0.35 -0.01 -0.48 0.10 0.62 0.383 1.9
## BSBM19E -0.10 0.01 0.56 0.00 0.40 0.604 1.1
##
## MR1 MR2 MR3 MR4
## SS loadings 7.24 3.31 3.00 3.39
## Proportion Var 0.30 0.14 0.13 0.14
## Cumulative Var 0.30 0.44 0.56 0.71
## Proportion Explained 0.43 0.20 0.18 0.20
## Cumulative Proportion 0.43 0.62 0.80 1.00
##
## With factor correlations of
## MR1 MR2 MR3 MR4
## MR1 1.00 0.42 -0.62 0.48
## MR2 0.42 1.00 -0.06 0.80
## MR3 -0.62 -0.06 1.00 -0.20
## MR4 0.48 0.80 -0.20 1.00
##
## Mean item complexity = 1.3
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 276 and the objective function was 23.57 with Chi Square of 138249.6
## The degrees of freedom for the model are 186 and the objective function was 1.26
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.02
##
## The harmonic number of observations is 5875 with the empirical chi square 1291.38 with prob < 1.2e-164
## The total number of observations was 5875 with Likelihood Chi Square = 7366.6 with prob < 0
##
## Tucker Lewis Index of factoring reliability = 0.923
## RMSEA index = 0.081 and the 90 % confidence intervals are 0.079 0.083
## BIC = 5752.41
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1 MR2 MR3 MR4
## Correlation of (regression) scores with factors 0.99 0.95 0.95 0.96
## Multiple R square of scores with factors 0.97 0.91 0.91 0.93
## Minimum correlation of possible factor scores 0.94 0.82 0.82 0.86
The cumulative variance explained reaches 71%, which is a great result: well over half of the variance is accounted for.
We also see that all factors pass the proportion-of-variance threshold, with each factor explaining at least 10%.
The RMSR is far below 0.05 and, in fact, very close to 0, as it should be.
The Tucker-Lewis Index of 0.923 is not perfect but still good, sitting above the conventional 0.9 cut-off.
The RMSEA of 0.081 is above 0.05 but below 0.1, which makes it acceptable for us.
Now let’s compute Cronbach’s alpha for each factor to check its internal consistency and to see whether dropping any item would improve it.
MR1 <- as.data.frame(datafa[c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I")])
psych::alpha(MR1, check.keys = TRUE)
##
## Reliability analysis
## Call: psych::alpha(x = MR1, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.94 0.94 0.94 0.65 16 0.0011 2.2 0.79 0.64
##
## lower alpha upper 95% confidence boundaries
## 0.94 0.94 0.94
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM17A 0.93 0.93 0.93 0.63 14 0.0013 0.0060 0.64
## BSBM17B- 0.94 0.94 0.94 0.66 16 0.0012 0.0063 0.65
## BSBM17C- 0.94 0.94 0.94 0.66 15 0.0012 0.0069 0.65
## BSBM17D 0.94 0.94 0.94 0.67 16 0.0012 0.0058 0.66
## BSBM17E 0.93 0.93 0.93 0.63 13 0.0014 0.0049 0.63
## BSBM17F 0.94 0.94 0.94 0.66 15 0.0012 0.0069 0.65
## BSBM17G 0.93 0.93 0.93 0.64 14 0.0013 0.0068 0.64
## BSBM17H 0.94 0.94 0.94 0.65 15 0.0012 0.0076 0.64
## BSBM17I 0.93 0.93 0.93 0.63 14 0.0013 0.0058 0.64
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## BSBM17A 5875 0.87 0.88 0.87 0.84 1.9 0.90
## BSBM17B- 5875 0.78 0.77 0.74 0.71 2.2 1.07
## BSBM17C- 5875 0.79 0.78 0.75 0.73 2.2 0.95
## BSBM17D 5875 0.74 0.75 0.71 0.69 1.9 0.82
## BSBM17E 5875 0.90 0.90 0.90 0.87 2.0 0.95
## BSBM17F 5875 0.79 0.79 0.76 0.73 2.5 0.91
## BSBM17G 5875 0.86 0.86 0.85 0.82 2.2 0.97
## BSBM17H 5875 0.82 0.82 0.79 0.77 2.4 0.95
## BSBM17I 5875 0.89 0.88 0.87 0.85 2.3 1.10
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## BSBM17A 0.38 0.41 0.13 0.08 0
## BSBM17B 0.16 0.23 0.30 0.31 0
## BSBM17C 0.10 0.28 0.35 0.26 0
## BSBM17D 0.33 0.46 0.16 0.05 0
## BSBM17E 0.34 0.39 0.17 0.10 0
## BSBM17F 0.16 0.35 0.36 0.13 0
## BSBM17G 0.25 0.37 0.25 0.13 0
## BSBM17H 0.20 0.37 0.30 0.13 0
## BSBM17I 0.33 0.27 0.22 0.19 0
‘Attitudes towards Learning Mathematics’ has an alpha of 0.94, which indicates excellent internal consistency. Moreover, no item can be dropped to improve the measure.
MR2 <- as.data.frame(datafa[c("BSBM18I", "BSBM18H", "BSBM18G", "BSBM18J")])
psych::alpha(MR2, check.keys = TRUE)
##
## Reliability analysis
## Call: psych::alpha(x = MR2, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.84 0.84 0.8 0.56 5.1 0.0035 1.9 0.64 0.55
##
## lower alpha upper 95% confidence boundaries
## 0.83 0.84 0.84
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM18I 0.77 0.77 0.69 0.53 3.4 0.0052 0.00036 0.52
## BSBM18H 0.80 0.80 0.73 0.57 3.9 0.0046 0.00190 0.56
## BSBM18G 0.80 0.80 0.73 0.58 4.1 0.0045 0.00308 0.60
## BSBM18J 0.80 0.80 0.73 0.57 4.0 0.0045 0.00082 0.56
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## BSBM18I 5875 0.84 0.85 0.78 0.71 1.8 0.76
## BSBM18H 5875 0.82 0.81 0.72 0.66 1.9 0.81
## BSBM18G 5875 0.80 0.80 0.70 0.64 2.0 0.78
## BSBM18J 5875 0.81 0.81 0.71 0.65 1.8 0.78
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## BSBM18I 0.40 0.46 0.10 0.03 0
## BSBM18H 0.33 0.47 0.16 0.04 0
## BSBM18G 0.29 0.50 0.17 0.04 0
## BSBM18J 0.36 0.48 0.12 0.04 0
‘Feedback in Math Classes’ has a slightly lower alpha of 0.84, which still works for us since it is above 0.8. Again, no item can be dropped to improve the factor.
MR3 <- as.data.frame(datafa[c("BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E")])
psych::alpha(MR3, check.keys = TRUE)
##
## Reliability analysis
## Call: psych::alpha(x = MR3, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.86 0.86 0.84 0.54 6 0.0029 2.6 0.78 0.52
##
## lower alpha upper 95% confidence boundaries
## 0.85 0.86 0.86
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM19A- 0.81 0.81 0.78 0.52 4.3 0.0040 0.0102 0.50
## BSBM19B 0.82 0.82 0.80 0.54 4.7 0.0037 0.0145 0.54
## BSBM19C 0.80 0.80 0.76 0.50 4.0 0.0043 0.0092 0.48
## BSBM19D- 0.83 0.83 0.80 0.55 5.0 0.0035 0.0117 0.52
## BSBM19E 0.86 0.86 0.84 0.61 6.2 0.0029 0.0061 0.62
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## BSBM19A- 5875 0.84 0.84 0.80 0.73 2.8 0.98
## BSBM19B 5875 0.80 0.80 0.74 0.68 2.6 0.92
## BSBM19C 5875 0.87 0.86 0.84 0.77 2.5 1.09
## BSBM19D- 5875 0.77 0.78 0.71 0.65 2.7 0.90
## BSBM19E 5875 0.70 0.70 0.57 0.53 2.5 0.98
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## BSBM19A 0.25 0.39 0.22 0.14 0
## BSBM19B 0.13 0.28 0.41 0.19 0
## BSBM19C 0.23 0.25 0.28 0.23 0
## BSBM19D 0.21 0.41 0.28 0.10 0
## BSBM19E 0.19 0.33 0.31 0.16 0
‘Subjective Performance in Mathematics’ also shows good internal consistency, with an alpha of 0.86. Looking at the items, none should be dropped.
MR4 <- as.data.frame(datafa[c("BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F")])
psych::alpha(MR4, check.keys = TRUE)
##
## Reliability analysis
## Call: psych::alpha(x = MR4, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.89 0.89 0.89 0.58 8.3 0.0021 1.9 0.63 0.56
##
## lower alpha upper 95% confidence boundaries
## 0.89 0.89 0.9
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM18A 0.89 0.89 0.88 0.62 8.3 0.0023 0.0051 0.61
## BSBM18B 0.86 0.86 0.85 0.56 6.4 0.0027 0.0092 0.55
## BSBM18C 0.87 0.87 0.85 0.57 6.6 0.0027 0.0087 0.56
## BSBM18D 0.88 0.88 0.86 0.59 7.2 0.0024 0.0076 0.58
## BSBM18E 0.87 0.87 0.85 0.57 6.6 0.0026 0.0074 0.56
## BSBM18F 0.87 0.87 0.85 0.57 6.7 0.0026 0.0066 0.56
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## BSBM18A 5875 0.70 0.72 0.62 0.59 1.7 0.65
## BSBM18B 5875 0.85 0.85 0.81 0.77 1.8 0.80
## BSBM18C 5875 0.84 0.83 0.80 0.75 2.0 0.82
## BSBM18D 5875 0.80 0.79 0.73 0.69 2.2 0.85
## BSBM18E 5875 0.83 0.83 0.80 0.75 1.8 0.78
## BSBM18F 5875 0.83 0.83 0.79 0.74 1.7 0.78
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## BSBM18A 0.41 0.51 0.06 0.01 0
## BSBM18B 0.39 0.44 0.13 0.04 0
## BSBM18C 0.26 0.48 0.20 0.05 0
## BSBM18D 0.20 0.44 0.28 0.08 0
## BSBM18E 0.40 0.44 0.13 0.03 0
## BSBM18F 0.46 0.40 0.11 0.03 0
‘Teacher’s Input’ shows a high alpha of 0.89; thus, all four factors are reliable measures. Here too, dropping any item would only decrease the alpha.
First of all, we save the factor scores obtained in the EFA to build a model. We also convert our outcome variable to numeric.
fascores <- as.data.frame(fa1$scores)
datareg <- cbind(data_reg,fascores)
datareg$BSMMAT01 <- as.numeric(as.character(datareg$BSMMAT01))
Let’s try out some options! After considering all four factors individually, the third one turned out to explain the most variance (the screening step is sketched below), so we will build a model based on it.
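The factor-by-factor screening itself is not shown; a minimal sketch of how the R-squared of a simple regression on each factor score could be compared (this code is an assumption):
# Assumed screening step: achievement regressed on each factor score separately
sapply(c("MR1", "MR2", "MR3", "MR4"), function(f) {
  summary(lm(reformulate(f, response = "BSMMAT01"), data = datareg))$r.squared
})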
regr <- lm(BSMMAT01 ~ MR3, data = datareg)
summary(regr)
##
## Call:
## lm(formula = BSMMAT01 ~ MR3, data = datareg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -297.413 -44.040 9.423 53.891 210.361
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 616.5026 0.9725 633.95 <2e-16 ***
## MR3 34.6906 0.9787 35.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 74.54 on 5873 degrees of freedom
## Multiple R-squared: 0.1762, Adjusted R-squared: 0.1761
## F-statistic: 1256 on 1 and 5873 DF, p-value: < 2.2e-16
The model explains approximately 18% of the variance and the factor is a significant predictor. A one-unit increase in the factor score (‘Subjective Performance in Mathematics’) is associated with a 34.7-point increase in the outcome (math achievement). In other words, students who are more confident in mathematics also do better at it. This answers our first research question: students who feel more comfortable with mathematics do achieve more.
Now let us add control variables to the model and see whether the mother’s education plays a more important role in a child’s performance than the father’s. We also add gender and place of birth to the models.
Below is the model with the mother’s education:
regrA <- lm(BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07A, data = datareg)
summary(regrA)
##
## Call:
## lm(formula = BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07A, data = datareg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -302.361 -41.290 8.086 48.860 197.352
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 591.0605 3.4002 173.832 < 2e-16
## MR3 32.8928 0.9436 34.859 < 2e-16
## BSBG10ANo 2.7163 2.7778 0.978 0.328
## BSBG01Boy -12.7578 1.8684 -6.828 9.46e-12
## BSBG07ALower secondary 8.0071 5.2502 1.525 0.127
## BSBG07AUpper secondary 28.6197 3.9019 7.335 2.52e-13
## BSBG07APost-secondary, non-tertiary 36.9103 4.4165 8.357 < 2e-16
## BSBG07AShort-cycle tertiary 43.7808 4.4496 9.839 < 2e-16
## BSBG07ABachelor’s or equivalent 66.2337 4.0997 16.156 < 2e-16
## BSBG07APostgraduate degree 79.3071 5.4928 14.438 < 2e-16
## BSBG07ADon’t know 16.2253 3.6913 4.396 1.12e-05
##
## (Intercept) ***
## MR3 ***
## BSBG10ANo
## BSBG01Boy ***
## BSBG07ALower secondary
## BSBG07AUpper secondary ***
## BSBG07APost-secondary, non-tertiary ***
## BSBG07AShort-cycle tertiary ***
## BSBG07ABachelor’s or equivalent ***
## BSBG07APostgraduate degree ***
## BSBG07ADon’t know ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70.98 on 5864 degrees of freedom
## Multiple R-squared: 0.2541, Adjusted R-squared: 0.2528
## F-statistic: 199.8 on 10 and 5864 DF, p-value: < 2.2e-16
Here is the model with the father’s education:
regrB <- lm(BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07B, data = datareg)
summary(regrB)
##
## Call:
## lm(formula = BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07B, data = datareg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -300.020 -41.073 8.216 48.901 196.870
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 590.3422 3.5876 164.552 < 2e-16
## MR3 32.4969 0.9416 34.513 < 2e-16
## BSBG10ANo 0.4398 2.7757 0.158 0.87411
## BSBG01Boy -14.2534 1.8583 -7.670 2.00e-14
## BSBG07BLower secondary 14.8192 5.5453 2.672 0.00755
## BSBG07BUpper secondary 27.9192 4.2247 6.609 4.22e-11
## BSBG07BPost-secondary, non-tertiary 25.7833 4.5467 5.671 1.49e-08
## BSBG07BShort-cycle tertiary 43.7932 4.6268 9.465 < 2e-16
## BSBG07BBachelor’s or equivalent 67.8669 4.2489 15.973 < 2e-16
## BSBG07BPostgraduate degree 76.4767 4.8433 15.790 < 2e-16
## BSBG07BDon’t know 18.8550 3.8439 4.905 9.58e-07
##
## (Intercept) ***
## MR3 ***
## BSBG10ANo
## BSBG01Boy ***
## BSBG07BLower secondary **
## BSBG07BUpper secondary ***
## BSBG07BPost-secondary, non-tertiary ***
## BSBG07BShort-cycle tertiary ***
## BSBG07BBachelor’s or equivalent ***
## BSBG07BPostgraduate degree ***
## BSBG07BDon’t know ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70.82 on 5864 degrees of freedom
## Multiple R-squared: 0.2576, Adjusted R-squared: 0.2563
## F-statistic: 203.4 on 10 and 5864 DF, p-value: < 2.2e-16
Since the two models show only a slight difference in the variance explained, let us test how different they really are. As the models are non-nested, we compare them with AIC:
AIC(regrA, regrB)
## df AIC
## regrA 12 66769.03
## regrB 12 66741.72
Accordingly, the father-based model (regrB) has the slightly lower AIC. The research question of whether maternal education explains a child’s performance better than paternal education can therefore be answered: no, if anything, paternal education plays a slightly bigger role, and the difference between the two models is small.
Thus, we can look at the paternal education-based model’s results.
summary(regrB)
##
## Call:
## lm(formula = BSMMAT01 ~ MR3 + BSBG10A + BSBG01 + BSBG07B, data = datareg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -300.020 -41.073 8.216 48.901 196.870
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 590.3422 3.5876 164.552 < 2e-16
## MR3 32.4969 0.9416 34.513 < 2e-16
## BSBG10ANo 0.4398 2.7757 0.158 0.87411
## BSBG01Boy -14.2534 1.8583 -7.670 2.00e-14
## BSBG07BLower secondary 14.8192 5.5453 2.672 0.00755
## BSBG07BUpper secondary 27.9192 4.2247 6.609 4.22e-11
## BSBG07BPost-secondary, non-tertiary 25.7833 4.5467 5.671 1.49e-08
## BSBG07BShort-cycle tertiary 43.7932 4.6268 9.465 < 2e-16
## BSBG07BBachelor’s or equivalent 67.8669 4.2489 15.973 < 2e-16
## BSBG07BPostgraduate degree 76.4767 4.8433 15.790 < 2e-16
## BSBG07BDon’t know 18.8550 3.8439 4.905 9.58e-07
##
## (Intercept) ***
## MR3 ***
## BSBG10ANo
## BSBG01Boy ***
## BSBG07BLower secondary **
## BSBG07BUpper secondary ***
## BSBG07BPost-secondary, non-tertiary ***
## BSBG07BShort-cycle tertiary ***
## BSBG07BBachelor’s or equivalent ***
## BSBG07BPostgraduate degree ***
## BSBG07BDon’t know ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70.82 on 5864 degrees of freedom
## Multiple R-squared: 0.2576, Adjusted R-squared: 0.2563
## F-statistic: 203.4 on 10 and 5864 DF, p-value: < 2.2e-16
vif(regrB)
## GVIF Df GVIF^(1/(2*Df))
## MR3 1.025399 1 1.012620
## BSBG10A 1.030155 1 1.014966
## BSBG01 1.010963 1 1.005467
## BSBG07B 1.045725 7 1.003199
plot(regrB)
The model explains about 26% of the variance. Place of birth is not significant in this particular model. In general, we see that the higher the father’s education, the better a student performs in math. Compared to the reference (lowest) education category, students whose father completed Lower secondary score 14.8 points higher; Upper secondary, 27.9 points higher; Post-secondary non-tertiary, 25.8 points higher; Short-cycle tertiary, 43.8 points higher; a Bachelor’s degree or equivalent, 67.9 points higher; and a Postgraduate degree, 76.5 points higher.
We also checked for multicollinearity; since all the (G)VIF values are well under 5, we are safe here. The diagnostic plots show approximately normally distributed residuals, three potential outliers, and no influential leverage points (a quick numeric check is sketched below).
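The outlier and leverage statements above come from reading the base diagnostic plots; the same check could be made more explicit with a helper from the already loaded car package (a sketch, assuming the default cut-offs):
# Assumed follow-up diagnostics for regrB: formal outlier test and largest Cook's distances
outlierTest(regrB)                                    # Bonferroni-adjusted studentized residuals
head(sort(cooks.distance(regrB), decreasing = TRUE))  # most influential observations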
Let us now see whether an interaction effect with gender makes any difference.
regrint <- lm(BSMMAT01 ~ MR3*BSBG01 + BSBG10A + BSBG07A + BSBG07B, data = datareg)
summary(regrint)
##
## Call:
## lm(formula = BSMMAT01 ~ MR3 * BSBG01 + BSBG10A + BSBG07A + BSBG07B,
## data = datareg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -305.903 -41.240 7.218 48.594 200.566
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 577.7987 4.2985 134.419 < 2e-16
## MR3 27.8120 1.3062 21.292 < 2e-16
## BSBG01Boy -13.2622 1.8418 -7.201 6.75e-13
## BSBG10ANo 0.3746 2.7511 0.136 0.891698
## BSBG07ALower secondary 5.2315 5.1825 1.009 0.312800
## BSBG07AUpper secondary 21.2476 3.9589 5.367 8.31e-08
## BSBG07APost-secondary, non-tertiary 27.5119 4.5457 6.052 1.52e-09
## BSBG07AShort-cycle tertiary 28.6710 4.6945 6.107 1.08e-09
## BSBG07ABachelor’s or equivalent 40.1462 4.6010 8.726 < 2e-16
## BSBG07APostgraduate degree 51.2433 6.0284 8.500 < 2e-16
## BSBG07ADon’t know 11.9438 4.1129 2.904 0.003698
## BSBG07BLower secondary 12.5393 5.4874 2.285 0.022342
## BSBG07BUpper secondary 20.1077 4.2827 4.695 2.73e-06
## BSBG07BPost-secondary, non-tertiary 15.4958 4.6745 3.315 0.000922
## BSBG07BShort-cycle tertiary 31.1279 4.8528 6.414 1.52e-10
## BSBG07BBachelor’s or equivalent 46.6535 4.7081 9.909 < 2e-16
## BSBG07BPostgraduate degree 51.3849 5.3856 9.541 < 2e-16
## BSBG07BDon’t know 15.4466 4.2561 3.629 0.000287
## MR3:BSBG01Boy 8.7461 1.8447 4.741 2.17e-06
##
## (Intercept) ***
## MR3 ***
## BSBG01Boy ***
## BSBG10ANo
## BSBG07ALower secondary
## BSBG07AUpper secondary ***
## BSBG07APost-secondary, non-tertiary ***
## BSBG07AShort-cycle tertiary ***
## BSBG07ABachelor’s or equivalent ***
## BSBG07APostgraduate degree ***
## BSBG07ADon’t know **
## BSBG07BLower secondary *
## BSBG07BUpper secondary ***
## BSBG07BPost-secondary, non-tertiary ***
## BSBG07BShort-cycle tertiary ***
## BSBG07BBachelor’s or equivalent ***
## BSBG07BPostgraduate degree ***
## BSBG07BDon’t know ***
## MR3:BSBG01Boy ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 69.92 on 5856 degrees of freedom
## Multiple R-squared: 0.2772, Adjusted R-squared: 0.275
## F-statistic: 124.8 on 18 and 5856 DF, p-value: < 2.2e-16
Well, unfortunately, the model did not improve much: we still explain only about 27.5% of the variance (adjusted R-squared). However, our interaction effect is significant; a formal comparison with the previous model is sketched below.
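Because regrB is nested in regrint (the interaction model only adds the mother’s education terms and the MR3 by gender interaction), the improvement could also be tested formally; a brief sketch, not part of the original output:
# Assumed nested-model comparison: does adding mother's education and the interaction help?
anova(regrB, regrint)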
plot_model(regrint, type = "int")
The plot supports this interpretation. Girls perform better at lower and moderate levels of mathematical self-evaluation, but the boys’ line is steeper (the interaction adds about 8.7 points per unit of the factor score), so around the ‘very good at math’ region boys overtake girls with the same self-evaluation. Thus, our third research question is disproved, since boys do better beyond that point; the crossover arithmetic is sketched below.
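The crossover point can be read off the coefficients reported above: boys start 13.26 points lower but gain 8.75 extra points per unit of MR3, so the lines cross at roughly 13.26 / 8.75 ≈ 1.5 factor-score units. A quick sketch of that arithmetic (coefficient names taken from the summary output):
# Back-of-the-envelope check: value of MR3 at which boys overtake girls
b <- coef(regrint)
-b["BSBG01Boy"] / b["MR3:BSBG01Boy"]  # roughly 1.5 factor-score units above the mean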
plot(regrint)
vif(regrint)
## GVIF Df GVIF^(1/(2*Df))
## MR3 2.024079 1 1.422701
## BSBG01 1.018722 1 1.009318
## BSBG10A 1.038029 1 1.018837
## BSBG07A 3.795913 7 1.099967
## BSBG07B 3.790416 7 1.099853
## MR3:BSBG01 1.990631 1 1.410897
The diagnostic plots look good: approximately normal residuals and no influential leverage points. The multicollinearity check is also fine, with all (G)VIF values under 5 (the slightly higher values for MR3 and the interaction term are expected once an interaction is included). Overall, I would say that this model is the best one.
The main thing we should take from this analysis is that attitudes towards math, and how students evaluate themselves, are crucial for performance in the subject. We disproved RQ2 and RQ3, while RQ1 proved to be true: those who feel more confident in mathematics do better at it. As for parental education, the father’s education matters slightly more than the mother’s. Also, at higher levels of confidence in their abilities, 8th-grade boys perform better in math than 8th-grade girls. Thus, we were able to repeat the success of the research on attitudes and performance in mathematics, reaching almost 28% of explained variance in the last model. Next time, the research could focus more on gender differences in performance depending on how students evaluate their own abilities.
A. A. Lipnevich, C. MacCann, S. Krumm, J. Burrus, and R. D. Roberts, “Mathematics attitudes and mathematics outcomes of US and Belarusian middle school students,” Journal of Educational Psychology, vol. 103, no. 1, pp. 105–118, 2011.
M. Nicolaidou and G. Philippou, “Attitudes towards mathematics, self-efficacy and achievement in problem solving,” in European Research in Mathematics Education III, M. A. Mariotti, Ed., pp. 1–11, University of Pisa, Pisa, Italy, 2003.