Anna Gorobtsova
Nadezda Bykova
Artem Kulikov
Anastasia Vlasenko
This report uses TIMSS 2015 data for Singapore to run an exploratory factor analysis (EFA) followed by a regression analysis. The variables selected for the EFA measure students’ attitudes towards mathematics. The first question is therefore whether the data contain latent factors, which we address with exploratory factor analysis. Second, we run a regression analysis to see which variables can predict students’ math achievement.
library(foreign)
library(psych)
library(dplyr)
library(polycor)
library(corrplot)
library(sjPlot)
library(summarytools)
library(lmtest)
library(car)
library(gridExtra)
library(ggplot2)
library(GPArotation)
data1 <- read.spss("BSGSGPM6.sav", to.data.frame = TRUE, use.value.labels = TRUE)
data2<-data1[c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E", "BSMMAT01", "BSBG01", "BSBG07A", "BSBG07B", "BSBG10A")]
data3<-na.omit(data2)
save1<-c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E")
save2<-c("BSMMAT01", "BSBG01", "BSBG07A", "BSBG07B", "BSBG10A")
data_fa <- data3[save1]
data_reg <- data3[save2]
print(dfSummary(data_fa, graph.magnif = 0.75,style="grid", varnumbers=FALSE, valid.col=FALSE, na.col=FALSE),
max.tbl.height = 300, method = "render")
| Variable | Stats / Values |
|---|---|
| BSBM17A [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM17B [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM17C [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM17D [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM17E [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM17F [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM17G [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM17H [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM17I [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18A [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18B [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18C [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18D [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18E [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18F [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18G [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18H [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18I [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM18J [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM19A [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM19B [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM19C [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM19D [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
| BSBM19E [factor] | 1. Agree a lot 2. Agree a little 3. Disagree a little 4. Disagree a lot |
Generated by summarytools 0.9.6 (R version 4.0.0)
2020-05-20
The factor analysis uses 24 factor variables: answers to Likert-scale questions about the attitudes of students in Singapore towards mathematics and mathematics lessons. Each item measures the extent to which a student agrees with a statement concerning mathematics lessons, so every variable has 4 levels ranging from ‘Agree a lot’ to ‘Disagree a lot’. The summary table above lists these response categories for each item.
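Because these items are stored as factors, the script later turns them into numbers with as.numeric(), while the achievement score is converted with as.numeric(as.character()). The two calls do different things. The following is a minimal sketch of the difference; all values are made up and are not TIMSS data.

```r
# Hypothetical Likert responses with the four levels used in the survey
likert <- factor(
  c("Agree a lot", "Disagree a lot", "Agree a little"),
  levels = c("Agree a lot", "Agree a little",
             "Disagree a little", "Disagree a lot")
)

# as.numeric() on a factor returns the position of each level,
# so 1 = "Agree a lot" and 4 = "Disagree a lot"
likert_codes <- as.numeric(likert)

# A numeric score that was read in as a factor (made-up values)
score <- factor(c("612.5", "580.1"))
score_codes  <- as.numeric(score)                # level positions, not scores
score_values <- as.numeric(as.character(score))  # the actual numbers

print(likert_codes)
print(score_codes)
print(score_values)
```

With this coding a low number means stronger agreement, which matters when reading the signs of the correlations and factor loadings.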
ggplot(data_reg, aes(x = BSBG01)) +
geom_bar(col = "white", fill = "cornflowerblue") +
ggtitle("Gender distribution barplot")+
xlab("boy/girl")+
ylab("Number of observations")+
theme_minimal()
The numbers of boys and girls in the data are almost equal, so the sample is balanced with respect to gender.
data_reg$BSMMAT01<-as.numeric(as.character(data_reg$BSMMAT01))
data_reg %>%
group_by(BSBG01) %>%
summarize(mean = mean(BSMMAT01))
## # A tibble: 2 x 2
## BSBG01 mean
## <fct> <dbl>
## 1 Girl 622.
## 2 Boy 612.
The mean math achievement score for girls is 621.55, slightly higher than the boys’ mean of 611.64. We can therefore hypothesise that gender will turn out to be a predictor of math achievement in the regression analysis, with girls performing better than boys.
m <- ggplot(data_reg, aes(x = BSBG07A)) +
geom_bar(col = "white", fill = "pink") +
ggtitle("Mother's education barplot")+
xlab("Level of mother's education")+
ylab("Number of observations")+
coord_flip()+
theme_minimal()
f <- ggplot(data_reg, aes(x = BSBG07B)) +
geom_bar(col = "white", fill = "cornflowerblue") +
ggtitle("Father's education barplot")+
xlab("Level of father's education")+
ylab("Number of observations")+
coord_flip()+
theme_minimal()
grid.arrange(m,f, nrow=2)
There are quite a lot of ‘Don’t know’ responses, which is inconvenient. Nevertheless, in both cases the most common levels are upper secondary education and a bachelor’s degree. More fathers than mothers have a postgraduate degree, while relatively more mothers have some primary or lower secondary education or did not go to school at all. For all the other levels of education the distributions are more or less the same for mothers and fathers.
data_reg %>%
group_by(BSBG07A) %>%
summarize(mean = mean(BSMMAT01))
## # A tibble: 8 x 2
## BSBG07A mean
## <fct> <dbl>
## 1 Some Primary or Lower secondary or did not go to school 580.
## 2 Lower secondary 591.
## 3 Upper secondary 613.
## 4 Post-secondary, non-tertiary 619.
## 5 Short-cycle tertiary 633.
## 6 Bachelor’s or equivalent 659.
## 7 Postgraduate degree 666.
## 8 Don’t know 598.
The table above shows that mean achievement scores rise with the mother’s level of education. As with gender, we can therefore hypothesise that the mother’s education is a predictor of a student’s math achievement, with higher levels of education associated with higher achievement scores.
data_reg %>%
group_by(BSBG07B) %>%
summarize(mean = mean(BSMMAT01))
## # A tibble: 8 x 2
## BSBG07B mean
## <fct> <dbl>
## 1 Some Primary or Lower secondary or did not go to school 579.
## 2 Lower secondary 593.
## 3 Upper secondary 609.
## 4 Post-secondary, non-tertiary 606.
## 5 Short-cycle tertiary 630.
## 6 Bachelor’s or equivalent 658.
## 7 Postgraduate degree 665.
## 8 Don’t know 600.
The same pattern can be observed for father’s education, with one small exception: the mean for post-secondary, non-tertiary education (606) is slightly below the mean for upper secondary education (609). We can therefore also hypothesise that the father’s level of education is a predictor of math achievement and that higher levels of father’s education are associated with higher scores.
ggplot(data_reg, aes(x = BSBG10A)) +
geom_bar(col = "white", fill = "cornflowerblue") +
ggtitle("Were you born in Singapore")+
xlab("yes/no")+
ylab("Number of observations")+
theme_minimal()
The bar plot shows that the majority of respondents were born in Singapore. The group of those who were not born there is almost 5 times smaller. It is therefore questionable whether including this variable in the regression analysis would give generalizable results, as respondents born outside Singapore are underrepresented in this sample.
data_reg %>%
group_by(BSBG10A) %>%
summarize(mean = mean(BSMMAT01))
## # A tibble: 2 x 2
## BSBG10A mean
## <fct> <dbl>
## 1 Yes 614.
## 2 No 631.
The mean score is higher for respondents who were not born in Singapore (631 versus 614). However, we must keep in mind that this group is small, which may have influenced the result.
Before starting the actual factor analysis we need to look at the correlation matrix and see whether we do have some potential factors:
dat.cor <- hetcor(data_fa)
dat.cor<- dat.cor$correlations
dat.cor
## BSBM17A BSBM17B BSBM17C BSBM17D BSBM17E BSBM17F
## BSBM17A 1.0000000 -0.7193920 -0.7288701 0.7355962 0.9111533 0.7247957
## BSBM17B -0.7193920 1.0000000 0.7543606 -0.5770894 -0.7337688 -0.6041232
## BSBM17C -0.7288701 0.7543606 1.0000000 -0.5989825 -0.7423630 -0.6128955
## BSBM17D 0.7355962 -0.5770894 -0.5989825 1.0000000 0.7402890 0.6496178
## BSBM17E 0.9111533 -0.7337688 -0.7423630 0.7402890 1.0000000 0.7576284
## BSBM17F 0.7247957 -0.6041232 -0.6128955 0.6496178 0.7576284 1.0000000
## BSBM17G 0.8157328 -0.6925543 -0.6838195 0.6717706 0.8482756 0.7881664
## BSBM17H 0.7707337 -0.6338387 -0.6630358 0.6922386 0.7863611 0.7173778
## BSBM17I 0.8771027 -0.7351604 -0.7198372 0.6815484 0.9046162 0.7419462
## BSBM18A 0.4365172 -0.3361437 -0.3483428 0.4440063 0.4220274 0.3824189
## BSBM18B 0.4239973 -0.3355252 -0.3407935 0.4049221 0.4128685 0.3541809
## BSBM18C 0.5570614 -0.4150562 -0.4632566 0.5509088 0.5336086 0.4726867
## BSBM18D 0.4954503 -0.3723860 -0.4110799 0.5337899 0.4682367 0.4417382
## BSBM18E 0.4109797 -0.3163164 -0.3273311 0.4056564 0.3953043 0.3436193
## BSBM18F 0.4153566 -0.3241408 -0.3476863 0.4314988 0.4049488 0.3287098
## BSBM18G 0.3701707 -0.2727177 -0.2972288 0.4104661 0.3596911 0.3385384
## BSBM18H 0.3450139 -0.2410395 -0.2769005 0.4173687 0.3393638 0.3221327
## BSBM18I 0.3482216 -0.2553291 -0.2809522 0.4138137 0.3366036 0.3053723
## BSBM18J 0.3411972 -0.2596593 -0.3045194 0.3714719 0.3247526 0.3060154
## BSBM19A 0.6897207 -0.5912628 -0.5229339 0.5005319 0.6939807 0.5508194
## BSBM19B -0.4832689 0.4906922 0.4422061 -0.3166211 -0.4799602 -0.3596238
## BSBM19C -0.6473779 0.6173160 0.5552635 -0.4500244 -0.6630397 -0.5156707
## BSBM19D 0.6748903 -0.5596667 -0.5335571 0.5080798 0.6680382 0.5677637
## BSBM19E -0.4147306 0.4672779 0.4091927 -0.2669650 -0.4391049 -0.3525716
## BSBM17G BSBM17H BSBM17I BSBM18A BSBM18B BSBM18C
## BSBM17A 0.8157328 0.7707337 0.8771027 0.4365172 0.4239973 0.5570614
## BSBM17B -0.6925543 -0.6338387 -0.7351604 -0.3361437 -0.3355252 -0.4150562
## BSBM17C -0.6838195 -0.6630358 -0.7198372 -0.3483428 -0.3407935 -0.4632566
## BSBM17D 0.6717706 0.6922386 0.6815484 0.4440063 0.4049221 0.5509088
## BSBM17E 0.8482756 0.7863611 0.9046162 0.4220274 0.4128685 0.5336086
## BSBM17F 0.7881664 0.7173778 0.7419462 0.3824189 0.3541809 0.4726867
## BSBM17G 1.0000000 0.7508902 0.8288109 0.4140434 0.3830032 0.4787517
## BSBM17H 0.7508902 1.0000000 0.7809962 0.4554865 0.4913806 0.6331046
## BSBM17I 0.8288109 0.7809962 1.0000000 0.3873663 0.4030038 0.5025247
## BSBM18A 0.4140434 0.4554865 0.3873663 1.0000000 0.6711455 0.6068716
## BSBM18B 0.3830032 0.4913806 0.4030038 0.6711455 1.0000000 0.7497485
## BSBM18C 0.4787517 0.6331046 0.5025247 0.6068716 0.7497485 1.0000000
## BSBM18D 0.4316913 0.5783473 0.4369200 0.5444553 0.6551203 0.7975683
## BSBM18E 0.3721247 0.4762511 0.3775339 0.5931073 0.7449749 0.6764502
## BSBM18F 0.3701431 0.4827557 0.3873250 0.5919589 0.7733271 0.6763973
## BSBM18G 0.3462319 0.4351105 0.3313616 0.5536995 0.5961035 0.6017152
## BSBM18H 0.3059449 0.4389687 0.3094893 0.4917850 0.5681181 0.5917624
## BSBM18I 0.3043495 0.4303230 0.3020523 0.6022166 0.6343243 0.6113786
## BSBM18J 0.3267990 0.4258687 0.3127983 0.5661035 0.6391627 0.6045479
## BSBM19A 0.6380101 0.5411217 0.7480869 0.3429004 0.3344971 0.3400501
## BSBM19B -0.4721218 -0.3479669 -0.5441074 -0.2049625 -0.2095191 -0.1819877
## BSBM19C -0.6232494 -0.5100653 -0.7380489 -0.2641201 -0.2751570 -0.2966087
## BSBM19D 0.6633560 0.5509171 0.7069892 0.3601472 0.3693986 0.3635825
## BSBM19E -0.4308426 -0.3459162 -0.4794271 -0.1647835 -0.1694237 -0.1725734
## BSBM18D BSBM18E BSBM18F BSBM18G BSBM18H BSBM18I
## BSBM17A 0.4954503 0.4109797 0.4153566 0.37017075 0.34501392 0.34822165
## BSBM17B -0.3723860 -0.3163164 -0.3241408 -0.27271765 -0.24103952 -0.25532907
## BSBM17C -0.4110799 -0.3273311 -0.3476863 -0.29722876 -0.27690045 -0.28095216
## BSBM17D 0.5337899 0.4056564 0.4314988 0.41046607 0.41736870 0.41381368
## BSBM17E 0.4682367 0.3953043 0.4049488 0.35969108 0.33936381 0.33660364
## BSBM17F 0.4417382 0.3436193 0.3287098 0.33853839 0.32213272 0.30537228
## BSBM17G 0.4316913 0.3721247 0.3701431 0.34623192 0.30594492 0.30434946
## BSBM17H 0.5783473 0.4762511 0.4827557 0.43511053 0.43896873 0.43032300
## BSBM17I 0.4369200 0.3775339 0.3873250 0.33136158 0.30948934 0.30205227
## BSBM18A 0.5444553 0.5931073 0.5919589 0.55369948 0.49178501 0.60221664
## BSBM18B 0.6551203 0.7449749 0.7733271 0.59610347 0.56811808 0.63432429
## BSBM18C 0.7975683 0.6764502 0.6763973 0.60171519 0.59176237 0.61137857
## BSBM18D 1.0000000 0.6591867 0.6307510 0.61235983 0.66709662 0.60363191
## BSBM18E 0.6591867 1.0000000 0.8344016 0.62571618 0.59227405 0.66727259
## BSBM18F 0.6307510 0.8344016 1.0000000 0.66018751 0.63356709 0.69893239
## BSBM18G 0.6123598 0.6257162 0.6601875 1.00000000 0.63725315 0.65738366
## BSBM18H 0.6670966 0.5922741 0.6335671 0.63725315 1.00000000 0.70150717
## BSBM18I 0.6036319 0.6672726 0.6989324 0.65738366 0.70150717 1.00000000
## BSBM18J 0.5777493 0.6665217 0.6571562 0.61485954 0.60417791 0.71258447
## BSBM19A 0.2962577 0.2861207 0.2886201 0.24803162 0.19295750 0.21627977
## BSBM19B -0.1523205 -0.1607949 -0.1621349 -0.12951734 -0.08171768 -0.08257291
## BSBM19C -0.2525691 -0.2332858 -0.2288745 -0.18561717 -0.14136484 -0.15154191
## BSBM19D 0.3281293 0.3171978 0.3048265 0.26355465 0.22709306 0.23710884
## BSBM19E -0.1553963 -0.1367925 -0.1257406 -0.09158992 -0.07198792 -0.07916595
## BSBM18J BSBM19A BSBM19B BSBM19C BSBM19D BSBM19E
## BSBM17A 0.3411972 0.6897207 -0.48326887 -0.6473779 0.6748903 -0.41473060
## BSBM17B -0.2596593 -0.5912628 0.49069224 0.6173160 -0.5596667 0.46727795
## BSBM17C -0.3045194 -0.5229339 0.44220608 0.5552635 -0.5335571 0.40919272
## BSBM17D 0.3714719 0.5005319 -0.31662112 -0.4500244 0.5080798 -0.26696505
## BSBM17E 0.3247526 0.6939807 -0.47996019 -0.6630397 0.6680382 -0.43910486
## BSBM17F 0.3060154 0.5508194 -0.35962382 -0.5156707 0.5677637 -0.35257160
## BSBM17G 0.3267990 0.6380101 -0.47212179 -0.6232494 0.6633560 -0.43084263
## BSBM17H 0.4258687 0.5411217 -0.34796694 -0.5100653 0.5509171 -0.34591620
## BSBM17I 0.3127983 0.7480869 -0.54410745 -0.7380489 0.7069892 -0.47942706
## BSBM18A 0.5661035 0.3429004 -0.20496246 -0.2641201 0.3601472 -0.16478346
## BSBM18B 0.6391627 0.3344971 -0.20951910 -0.2751570 0.3693986 -0.16942371
## BSBM18C 0.6045479 0.3400501 -0.18198775 -0.2966087 0.3635825 -0.17257339
## BSBM18D 0.5777493 0.2962577 -0.15232050 -0.2525691 0.3281293 -0.15539630
## BSBM18E 0.6665217 0.2861207 -0.16079494 -0.2332858 0.3171978 -0.13679254
## BSBM18F 0.6571562 0.2886201 -0.16213485 -0.2288745 0.3048265 -0.12574063
## BSBM18G 0.6148595 0.2480316 -0.12951734 -0.1856172 0.2635546 -0.09158992
## BSBM18H 0.6041779 0.1929575 -0.08171768 -0.1413648 0.2270931 -0.07198792
## BSBM18I 0.7125845 0.2162798 -0.08257291 -0.1515419 0.2371088 -0.07916595
## BSBM18J 1.0000000 0.2347068 -0.12377651 -0.1633140 0.2471607 -0.11813449
## BSBM19A 0.2347068 1.0000000 -0.62863345 -0.7683127 0.7386583 -0.50530068
## BSBM19B -0.1237765 -0.6286334 1.00000000 0.7645472 -0.5683495 0.53180002
## BSBM19C -0.1633140 -0.7683127 0.76454719 1.0000000 -0.6558558 0.57176167
## BSBM19D 0.2471607 0.7386583 -0.56834948 -0.6558558 1.0000000 -0.44979620
## BSBM19E -0.1181345 -0.5053007 0.53180002 0.5717617 -0.4497962 1.00000000
corrplot(dat.cor, method = "circle")
The correlation plot above shows 5 groups of variables with rather high correlations among themselves, which can therefore be our potential factors.
Turning variables into numeric form:
datafa<-as.data.frame(lapply(data_fa, as.numeric))
fa.parallel(datafa)
## Parallel analysis suggests that the number of factors = 4 and the number of components = 3
Interpretation of the parallel analysis scree plot:
From the plot it can be seen that 4 factors in the “Factor Analysis” lie above the corresponding simulated data line and 3 components in the “Principal Components” parallel analysis lie above the corresponding simulated data line.
Therefore, Parallel analysis suggests that the number of factors = 4 and the number of components = 3
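The logic behind this comparison can be illustrated in base R. The sketch below is a simplified version of the principal-components variant only: it compares the eigenvalues of an observed correlation matrix with the mean eigenvalues of uncorrelated random data of the same size. It is not the implementation used by fa.parallel, and the simulated two-factor data, the sample size and the number of replications are all assumptions made for illustration.

```r
set.seed(1)
n <- 500  # hypothetical number of respondents
p <- 6    # hypothetical number of items

# Simulate two latent factors with three items each (assumed structure)
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- cbind(
  sapply(1:3, function(i) 0.8 * f1 + rnorm(n, sd = 0.6)),
  sapply(1:3, function(i) 0.8 * f2 + rnorm(n, sd = 0.6))
)

# Eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(x))$values

# Eigenvalues of random, uncorrelated data of the same shape (100 replications)
sim_eig <- replicate(100, eigen(cor(matrix(rnorm(n * p), n, p)))$values)
sim_mean <- rowMeans(sim_eig)

# Retain the leading components whose eigenvalue exceeds the simulated mean
n_comp <- which(obs_eig <= sim_mean)[1] - 1

print(round(obs_eig, 2))
print(round(sim_mean, 2))
print(n_comp)
```

Components are retained only while the observed eigenvalue stays above what random data would produce, which is the same reading applied to the plot above.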
fa1<-fa(datafa, 4, cor = "mixed")
fa1
## Factor Analysis using method = minres
## Call: fa(r = datafa, nfactors = 4, cor = "mixed")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 MR3 MR4 h2 u2 com
## BSBM17A 0.87 -0.03 -0.07 0.05 0.86 0.138 1.0
## BSBM17B -0.65 0.05 0.21 -0.04 0.63 0.367 1.2
## BSBM17C -0.73 0.02 0.08 -0.03 0.62 0.380 1.0
## BSBM17D 0.79 0.19 0.10 -0.05 0.65 0.354 1.2
## BSBM17E 0.92 -0.03 -0.06 0.01 0.90 0.098 1.0
## BSBM17F 0.85 0.05 0.04 -0.07 0.67 0.329 1.0
## BSBM17G 0.84 0.01 -0.09 -0.02 0.80 0.205 1.0
## BSBM17H 0.84 0.05 0.10 0.12 0.76 0.241 1.1
## BSBM17I 0.79 -0.04 -0.22 0.03 0.88 0.116 1.2
## BSBM18A 0.10 0.26 -0.08 0.42 0.53 0.475 1.9
## BSBM18B -0.04 -0.04 -0.07 0.94 0.83 0.168 1.0
## BSBM18C 0.37 0.07 0.17 0.62 0.75 0.250 1.8
## BSBM18D 0.32 0.31 0.14 0.37 0.66 0.341 3.2
## BSBM18E -0.02 0.24 -0.04 0.66 0.74 0.259 1.3
## BSBM18F -0.02 0.30 -0.05 0.61 0.76 0.239 1.5
## BSBM18G 0.05 0.65 -0.03 0.13 0.61 0.387 1.1
## BSBM18H 0.08 0.81 0.03 -0.04 0.66 0.342 1.0
## BSBM18I -0.03 0.86 -0.06 0.02 0.76 0.241 1.0
## BSBM18J -0.03 0.60 -0.07 0.23 0.63 0.373 1.3
## BSBM19A 0.27 0.02 -0.63 0.06 0.72 0.279 1.4
## BSBM19B 0.08 -0.05 0.86 -0.01 0.68 0.324 1.0
## BSBM19C -0.15 -0.02 0.80 -0.01 0.83 0.170 1.1
## BSBM19D 0.35 -0.01 -0.48 0.10 0.62 0.383 1.9
## BSBM19E -0.10 0.01 0.56 0.00 0.40 0.604 1.1
##
## MR1 MR2 MR3 MR4
## SS loadings 7.24 3.31 3.00 3.39
## Proportion Var 0.30 0.14 0.13 0.14
## Cumulative Var 0.30 0.44 0.56 0.71
## Proportion Explained 0.43 0.20 0.18 0.20
## Cumulative Proportion 0.43 0.62 0.80 1.00
##
## With factor correlations of
## MR1 MR2 MR3 MR4
## MR1 1.00 0.42 -0.62 0.48
## MR2 0.42 1.00 -0.06 0.80
## MR3 -0.62 -0.06 1.00 -0.20
## MR4 0.48 0.80 -0.20 1.00
##
## Mean item complexity = 1.3
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 276 and the objective function was 23.57 with Chi Square of 138249
## The degrees of freedom for the model are 186 and the objective function was 1.26
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.02
##
## The harmonic number of observations is 5875 with the empirical chi square 1291.37 with prob < 1.2e-164
## The total number of observations was 5875 with Likelihood Chi Square = 7366.33 with prob < 0
##
## Tucker Lewis Index of factoring reliability = 0.923
## RMSEA index = 0.081 and the 90 % confidence intervals are 0.079 0.083
## BIC = 5752.13
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1 MR2 MR3 MR4
## Correlation of (regression) scores with factors 0.99 0.95 0.95 0.96
## Multiple R square of scores with factors 0.97 0.91 0.91 0.93
## Minimum correlation of possible factor scores 0.94 0.82 0.82 0.86
fa.diagram(fa1)
Description of the model fit:
First of all, the factor loadings show that almost all variables load mainly on one factor, which is also reflected in the low complexity values. Only the variable ‘BSBM18D’ has a high complexity value (3.2), with loadings between 0.31 and 0.37 on MR1, MR2 and MR4. Its uniqueness (0.34) is moderate, and its shared variance (communality, 0.66) is higher than its unique variance, which is a good sign.
Proportion explained: ideally the explained variance is evenly distributed among factors. In this model MR1 accounts for the largest share, 43%, while the other three factors account for 18% to 20% each. This is not perfect, but the other fit indices suggest that the model fits reasonably well.
Proportion variance: A factor should explain at least 10% of the variance. In this model it can be seen that all the factors meet this criterion.
Cumulative variance: overall, the model explains 71% of the variance.
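The variance rows of the output can be reproduced from the SS loadings. The sketch below assumes standardized items, so that the total variance equals the number of items (24); the SS loadings are copied from the rounded output above, so the results can differ from the printed values in the last digit.

```r
# SS loadings as printed in the fa() output (rounded to two decimals)
ss_loadings <- c(MR1 = 7.24, MR2 = 3.31, MR3 = 3.00, MR4 = 3.39)
n_items <- 24  # each standardized item contributes one unit of variance

# "Proportion Var": share of the total variance of all items
prop_var <- ss_loadings / n_items

# "Cumulative Var": running total across factors
cum_var <- cumsum(prop_var)

# "Proportion Explained": share of the variance explained by the four factors
prop_explained <- ss_loadings / sum(ss_loadings)

print(round(prop_var, 3))
print(round(cum_var, 3))
print(round(prop_explained, 3))
```

The two proportions answer different questions: MR1 explains about 30% of all item variance but about 43% of the variance that the four factors explain together.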
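Chi-square: the value of 138249 (276 degrees of freedom) belongs to the null model and shows that the items are far from uncorrelated. The likelihood chi-square of the 4-factor model is 7366.33 and is statistically significant, so the hypothesis of exact fit is rejected. With 5875 observations this test is very sensitive, so we rely on the other fit indices.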
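Tucker Lewis Index of factoring reliability = 0.923, which indicates good model fit (it should be > 0.9).
RMSR = 0.02, which is also good, as it should be < 0.05. RMSEA = 0.081 (90% confidence interval 0.079 to 0.083) is less favourable, since values up to about 0.08 are conventionally treated as acceptable.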
Therefore, judging by the values we got, the model with 4 factors fits reasonably well. Additionally, it is important to mention that fa() applies an oblique rotation (oblimin) by default, which allows the factors to correlate with each other; the argument ‘cor = "mixed"’ only selects the type of correlation matrix that is factored. Some of our factors are rather highly correlated, for example MR2 and MR4 (0.80) and MR1 and MR3 (-0.62).
Factor names:
MR1: Variables assigned to this factor mostly measure the extent to which a student likes doing mathematics. The factor includes Likert-scale questions such as “I like mathematics”, “Mathematics is boring”, “I look forward to mathematics class” etc. Therefore, the name for this factor can be Student’s attitude towards mathematics
MR2: Variables assigned to this factor mostly measure the teacher’s support: the extent to which he or she cares about students understanding the subject and helps them make sense of their mistakes and of tasks that they don’t understand. The factor includes Likert-scale questions such as “My teacher tells me how to do better when I make a mistake” or “My teacher listens to what I have to say” and others. Therefore, the name for this factor can be Level of teacher’s support
MR3: Variables assigned to this factor mostly measure how a student assesses his or her own level of understanding of mathematics. The factor includes Likert-scale questions such as “I usually do well in mathematics”, “I learn things quickly in mathematics” or “Mathematics is not one of my strengths” etc. Therefore, the name for this factor can be Self-perceived mathematical abilities
MR4: Variables assigned to this factor mostly measure the extent to which students understand the materials and requirements presented to them by the teacher. The factor includes Likert-scale questions such as “My teacher is good at explaining mathematics” or “My teacher is easy to understand”. Therefore, the name for this factor can be Level of clarity of teacher’s requirements and materials
MR1<- as.data.frame(datafa[c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I")])
psych::alpha(MR1,check.keys = TRUE)
##
## Reliability analysis
## Call: psych::alpha(x = MR1, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.94 0.94 0.94 0.65 16 0.0011 2.2 0.79 0.64
##
## lower alpha upper 95% confidence boundaries
## 0.94 0.94 0.94
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM17A 0.93 0.93 0.93 0.63 14 0.0013 0.0060 0.64
## BSBM17B- 0.94 0.94 0.94 0.66 16 0.0012 0.0063 0.65
## BSBM17C- 0.94 0.94 0.94 0.66 15 0.0012 0.0069 0.65
## BSBM17D 0.94 0.94 0.94 0.67 16 0.0012 0.0058 0.66
## BSBM17E 0.93 0.93 0.93 0.63 13 0.0014 0.0049 0.63
## BSBM17F 0.94 0.94 0.94 0.66 15 0.0012 0.0069 0.65
## BSBM17G 0.93 0.93 0.93 0.64 14 0.0013 0.0068 0.64
## BSBM17H 0.94 0.94 0.94 0.65 15 0.0012 0.0076 0.64
## BSBM17I 0.93 0.93 0.93 0.63 14 0.0013 0.0058 0.64
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## BSBM17A 5875 0.87 0.88 0.87 0.84 1.9 0.90
## BSBM17B- 5875 0.78 0.77 0.74 0.71 2.2 1.07
## BSBM17C- 5875 0.79 0.78 0.75 0.73 2.2 0.95
## BSBM17D 5875 0.74 0.75 0.71 0.69 1.9 0.82
## BSBM17E 5875 0.90 0.90 0.90 0.87 2.0 0.95
## BSBM17F 5875 0.79 0.79 0.76 0.73 2.5 0.91
## BSBM17G 5875 0.86 0.86 0.85 0.82 2.2 0.97
## BSBM17H 5875 0.82 0.82 0.79 0.77 2.4 0.95
## BSBM17I 5875 0.89 0.88 0.87 0.85 2.3 1.10
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## BSBM17A 0.38 0.41 0.13 0.08 0
## BSBM17B 0.16 0.23 0.30 0.31 0
## BSBM17C 0.10 0.28 0.35 0.26 0
## BSBM17D 0.33 0.46 0.16 0.05 0
## BSBM17E 0.34 0.39 0.17 0.10 0
## BSBM17F 0.16 0.35 0.36 0.13 0
## BSBM17G 0.25 0.37 0.25 0.13 0
## BSBM17H 0.20 0.37 0.30 0.13 0
## BSBM17I 0.33 0.27 0.22 0.19 0
Cronbach’s alpha is 0.9427067, which indicates very good scale reliability in the sense of internal consistency: the items of this scale are strongly intercorrelated and appear to measure the same construct.
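What the raw alpha measures, and why negatively worded items (marked with ‘-’ in the output) have to be reversed first, can be shown with a small hand computation. This is a sketch on simulated data, not the code used by psych::alpha; the helper cronbach_alpha and the four items are hypothetical.

```r
set.seed(42)
n <- 300
trait <- rnorm(n)  # hypothetical latent attitude

# Four hypothetical items; i4 is negatively worded
items <- data.frame(
  i1 = trait + rnorm(n),
  i2 = trait + rnorm(n),
  i3 = trait + rnorm(n),
  i4 = -trait + rnorm(n)
)

# Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of total)
cronbach_alpha <- function(x) {
  k <- ncol(x)
  item_var <- sum(apply(x, 2, var))
  total_var <- var(rowSums(x))
  (k / (k - 1)) * (1 - item_var / total_var)
}

# Without reversing the negatively worded item, alpha is badly understated
alpha_raw <- cronbach_alpha(items)

# Flipping the sign of i4 has the same effect on variances and covariances
# as reverse-scoring a Likert item
items_keyed <- items
items_keyed$i4 <- -items_keyed$i4
alpha_keyed <- cronbach_alpha(items_keyed)

print(alpha_raw)
print(alpha_keyed)
```

Alpha grows when the items covary strongly relative to their individual variances, which is why it is read as a measure of internal consistency.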
MR2<- as.data.frame(datafa[c("BSBM18I", "BSBM18H", "BSBM18G", "BSBM18J")])
psych::alpha(MR2,check.keys = TRUE)
##
## Reliability analysis
## Call: psych::alpha(x = MR2, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.84 0.84 0.8 0.56 5.1 0.0035 1.9 0.64 0.55
##
## lower alpha upper 95% confidence boundaries
## 0.83 0.84 0.84
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM18I 0.77 0.77 0.69 0.53 3.4 0.0052 0.00036 0.52
## BSBM18H 0.80 0.80 0.73 0.57 3.9 0.0046 0.00190 0.56
## BSBM18G 0.80 0.80 0.73 0.58 4.1 0.0045 0.00308 0.60
## BSBM18J 0.80 0.80 0.73 0.57 4.0 0.0045 0.00082 0.56
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## BSBM18I 5875 0.84 0.85 0.78 0.71 1.8 0.76
## BSBM18H 5875 0.82 0.81 0.72 0.66 1.9 0.81
## BSBM18G 5875 0.80 0.80 0.70 0.64 2.0 0.78
## BSBM18J 5875 0.81 0.81 0.71 0.65 1.8 0.78
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## BSBM18I 0.40 0.46 0.10 0.03 0
## BSBM18H 0.33 0.47 0.16 0.04 0
## BSBM18G 0.29 0.50 0.17 0.04 0
## BSBM18J 0.36 0.48 0.12 0.04 0
Cronbach’s alpha is 0.8358135, which indicates good scale reliability in the sense of internal consistency: the items of this scale are strongly intercorrelated and appear to measure the same construct.
MR3<- as.data.frame(datafa[c("BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E")])
psych::alpha(MR3,check.keys = TRUE)
##
## Reliability analysis
## Call: psych::alpha(x = MR3, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.86 0.86 0.84 0.54 6 0.0029 2.6 0.78 0.52
##
## lower alpha upper 95% confidence boundaries
## 0.85 0.86 0.86
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM19A- 0.81 0.81 0.78 0.52 4.3 0.0040 0.0102 0.50
## BSBM19B 0.82 0.82 0.80 0.54 4.7 0.0037 0.0145 0.54
## BSBM19C 0.80 0.80 0.76 0.50 4.0 0.0043 0.0092 0.48
## BSBM19D- 0.83 0.83 0.80 0.55 5.0 0.0035 0.0117 0.52
## BSBM19E 0.86 0.86 0.84 0.61 6.2 0.0029 0.0061 0.62
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## BSBM19A- 5875 0.84 0.84 0.80 0.73 2.8 0.98
## BSBM19B 5875 0.80 0.80 0.74 0.68 2.6 0.92
## BSBM19C 5875 0.87 0.86 0.84 0.77 2.5 1.09
## BSBM19D- 5875 0.77 0.78 0.71 0.65 2.7 0.90
## BSBM19E 5875 0.70 0.70 0.57 0.53 2.5 0.98
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## BSBM19A 0.25 0.39 0.22 0.14 0
## BSBM19B 0.13 0.28 0.41 0.19 0
## BSBM19C 0.23 0.25 0.28 0.23 0
## BSBM19D 0.21 0.41 0.28 0.10 0
## BSBM19E 0.19 0.33 0.31 0.16 0
Cronbach’s alpha is 0.856337, which indicates good scale reliability in the sense of internal consistency: the items of this scale are strongly intercorrelated and appear to measure the same construct.
MR4<- as.data.frame(datafa[c("BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F")])
psych::alpha(MR4,check.keys = TRUE)
##
## Reliability analysis
## Call: psych::alpha(x = MR4, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.89 0.89 0.89 0.58 8.3 0.0021 1.9 0.63 0.56
##
## lower alpha upper 95% confidence boundaries
## 0.89 0.89 0.9
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM18A 0.89 0.89 0.88 0.62 8.3 0.0023 0.0051 0.61
## BSBM18B 0.86 0.86 0.85 0.56 6.4 0.0027 0.0092 0.55
## BSBM18C 0.87 0.87 0.85 0.57 6.6 0.0027 0.0087 0.56
## BSBM18D 0.88 0.88 0.86 0.59 7.2 0.0024 0.0076 0.58
## BSBM18E 0.87 0.87 0.85 0.57 6.6 0.0026 0.0074 0.56
## BSBM18F 0.87 0.87 0.85 0.57 6.7 0.0026 0.0066 0.56
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## BSBM18A 5875 0.70 0.72 0.62 0.59 1.7 0.65
## BSBM18B 5875 0.85 0.85 0.81 0.77 1.8 0.80
## BSBM18C 5875 0.84 0.83 0.80 0.75 2.0 0.82
## BSBM18D 5875 0.80 0.79 0.73 0.69 2.2 0.85
## BSBM18E 5875 0.83 0.83 0.80 0.75 1.8 0.78
## BSBM18F 5875 0.83 0.83 0.79 0.74 1.7 0.78
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## BSBM18A 0.41 0.51 0.06 0.01 0
## BSBM18B 0.39 0.44 0.13 0.04 0
## BSBM18C 0.26 0.48 0.20 0.05 0
## BSBM18D 0.20 0.44 0.28 0.08 0
## BSBM18E 0.40 0.44 0.13 0.03 0
## BSBM18F 0.46 0.40 0.11 0.03 0
Cronbach’s alpha is 0.8924768, which indicates good scale reliability in the sense of internal consistency: the items of this scale are strongly intercorrelated and appear to measure the same construct.
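The standardized alpha of each scale follows from the number of items k and the average inter-item correlation via k * r / (1 + (k - 1) * r). The sketch below recomputes it from the rounded average_r values printed above, so small differences from the reported std.alpha are expected; the helper std_alpha is a hypothetical name.

```r
# Standardized alpha from the number of items and the mean inter-item correlation
std_alpha <- function(k, r_bar) k * r_bar / (1 + (k - 1) * r_bar)

# Values copied from the four psych::alpha outputs above (rounded)
scales <- data.frame(
  scale     = c("MR1", "MR2", "MR3", "MR4"),
  k         = c(9, 4, 5, 6),
  average_r = c(0.65, 0.56, 0.54, 0.58),
  reported  = c(0.94, 0.84, 0.86, 0.89)
)

scales$recomputed <- std_alpha(scales$k, scales$average_r)
print(scales)
```

The formula also shows why MR1 has the highest alpha: it combines the largest number of items with the highest average correlation.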
fascores<-as.data.frame(fa1$scores)
datareg<-cbind(data_reg,fascores)
datareg$BSMMAT01<-as.numeric(as.character(datareg$BSMMAT01))
names(datareg)[names(datareg) == "BSMMAT01"] <- "matach"
names(datareg)[names(datareg) == "BSBG01"] <- "gender"
names(datareg)[names(datareg) == "BSBG07A"] <- "educmat"
names(datareg)[names(datareg) == "BSBG07B"] <- "educfat"
names(datareg)[names(datareg) == "BSBG10A"] <- "countrybrn"
names(datareg)[names(datareg) == "MR1"] <- "likemat"
names(datareg)[names(datareg) == "MR2"] <- "teachsup"
names(datareg)[names(datareg) == "MR3"] <- "matab"
names(datareg)[names(datareg) == "MR4"] <- "clreq"
names(datareg)
## [1] "matach" "gender" "educmat" "educfat" "countrybrn"
## [6] "likemat" "teachsup" "matab" "clreq"
Do the education of parents and a student’s self-confidence in his or her math abilities influence math achievement?
Hypotheses:
H1: Higher educational levels of parents will be associated with higher math achievement of their children.
H2: Higher self-perceived math abilities of a student will be associated with higher math achievement irrespective of the student’s gender
Recoding variables:
# Education levels in ascending order; the position in this vector is the numeric code
edu_levels <- c("Some Primary or Lower secondary or did not go to school",
"Lower secondary",
"Upper secondary",
"Post-secondary, non-tertiary",
"Short-cycle tertiary",
"Bachelor’s or equivalent",
"Postgraduate degree")
# match() returns the position of each label (1 to 7) and NA for any other response
datareg$educmat <- match(as.character(datareg$educmat), edu_levels)
datareg$educfat <- match(as.character(datareg$educfat), edu_levels)
regmodel <- lm(matach~educfat+educmat+matab*gender, data=datareg)
summary(regmodel)
##
## Call:
## lm(formula = matach ~ educfat + educmat + matab * gender, data = datareg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -312.432 -39.042 7.688 46.332 190.835
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 565.0991 3.2186 175.574 < 2e-16 ***
## educfat 7.6696 0.7967 9.627 < 2e-16 ***
## educmat 8.2342 0.8263 9.965 < 2e-16 ***
## matab 26.0294 1.5306 17.006 < 2e-16 ***
## genderBoy -13.1746 2.2349 -5.895 4.09e-09 ***
## matab:genderBoy 8.8848 2.2362 3.973 7.23e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 66.86 on 3611 degrees of freedom
## (2258 observations deleted due to missingness)
## Multiple R-squared: 0.2907, Adjusted R-squared: 0.2897
## F-statistic: 296 on 5 and 3611 DF, p-value: < 2.2e-16
plot_model(regmodel, type = "int")
The model explains approximately 29% of the variance in math achievement (Multiple R-squared: 0.2907; Adjusted R-squared: 0.2897). All coefficients are statistically significant (p < 0.001) and can be interpreted as follows.
Meaningful interpretation of the results:
Father’s education: the higher the father’s educational level, the higher the student’s math achievement tends to be
Mother’s education: the same trend can be observed for mother’s education. Children whose mothers have higher educational levels tend to have higher math achievement. Therefore, one may conclude that children of well-educated parents tend to do better in mathematics
Self-perceived mathematical abilities of a student: the higher a student’s confidence in his or her math abilities, the higher the achievement scores. Students who are confident in their abilities tend to perform better
Gender: the coefficients show that boys tend to have lower math achievement scores than girls. Because the model includes an interaction, this difference refers to students with matab = 0, which is approximately the average factor score
Interaction effect: there is an interaction between gender and self-perceived math abilities. As self-perceived abilities increase, math achievement increases more steeply for boys than for girls, so the gap in favour of girls narrows as confidence grows. The model predicts that boys outperform girls with the same level of self-perceived abilities only when that level is high (matab above roughly 13.1746 / 8.8848 ≈ 1.48).
Technical interpretation of coefficients:
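With a one-unit increase in the variable “educfat”, math achievement increases by 7.6696 points, holding the other variables constant
With a one-unit increase in the variable “educmat”, math achievement increases by 8.2342 points, holding the other variables constant
With a one-unit increase in the variable “matab”, math achievement increases by 26.0294 points for girls (the reference category); for boys the increase is 26.0294 + 8.8848 = 34.9142 points
If a student is a boy, then his predicted math achievement is 13.1746 points lower than that of a girl when “matab” equals 0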
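Because the model contains the interaction matab:genderBoy, the slope of matab and the gender gap both depend on the other variable. The sketch below reproduces that arithmetic from the coefficients reported in the summary output; the object and function names (`slope_boys`, `gender_gap`, `crossover`) are illustrative assumptions.

```r
# Coefficients copied from the summary(regmodel) output above
b_matab       <- 26.0294   # slope of matab for girls (reference category)
b_boy         <- -13.1746  # boy-girl gap when matab = 0
b_interaction <- 8.8848    # additional slope of matab for boys

slope_girls <- b_matab
slope_boys  <- b_matab + b_interaction

# Boy-girl difference in predicted achievement at a given matab value
gender_gap <- function(matab) b_boy + b_interaction * matab

# matab value at which the predicted gap is zero
crossover <- -b_boy / b_interaction

slope_boys
gender_gap(c(-1, 0, 1, 2))
crossover
```

The predicted gap is negative (girls ahead) at low and average values of matab and changes sign at a matab value of about 1.48.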
par(mfrow = c(2,2))
plot(regmodel)
Residuals vs Fitted: the points are not quite evenly dispersed around zero, which suggests a problem of heteroscedasticity
Normal Q-Q: the plot suggests that the residuals (rather than the raw data) are approximately normally distributed
Residuals vs Leverage: there are no high-leverage or influential cases, as the Cook’s distance contours do not appear within the plotting range of the last plot
Checking for heteroscedasticity again:
bptest(regmodel)
##
## studentized Breusch-Pagan test
##
## data: regmodel
## BP = 120.17, df = 5, p-value < 2.2e-16
ncvTest(regmodel)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 126.3135, Df = 1, p = < 2.22e-16
Both tests produce significant p-values, so the null hypothesis of constant error variance is rejected. This confirms that the model has a problem of heteroscedasticity.
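Heteroscedasticity leaves the coefficient estimates unbiased but makes the usual standard errors unreliable. One common response, which was not applied in this report, is to use heteroscedasticity-consistent (robust) standard errors. The sketch below computes the HC3 version by hand in base R on simulated data; the variables `x`, `y` and `fit` are illustrative assumptions. With the sandwich package (not loaded above) the same matrix should be available as `sandwich::vcovHC(fit, type = "HC3")` and can be passed to `lmtest::coeftest()`.

```r
# Simulated data whose error variance grows with x (NOT the TIMSS data)
set.seed(42)
n <- 400
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 + 0.5 * x)
fit <- lm(y ~ x)

# HC3 heteroscedasticity-consistent covariance matrix, written out by hand
X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)
bread <- solve(crossprod(X))            # (X'X)^-1
meat  <- crossprod(X * (e / (1 - h)))   # X' diag(e^2 / (1 - h)^2) X
vcov_hc3 <- bread %*% meat %*% bread

classical_se <- sqrt(diag(vcov(fit)))
robust_se    <- sqrt(diag(vcov_hc3))
cbind(estimate = coef(fit), classical_se, robust_se)
```

Comparing the two columns of standard errors shows how much the inference depends on the constant-variance assumption.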
Checking for multicollinearity:
vif(regmodel)
## educfat educmat matab gender matab:gender
## 1.725496 1.718170 1.896043 1.008564 1.873943
All values are below 5. Therefore, it can be concluded that there is no problematic multicollinearity among the predictors.
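For a single-coefficient term, the variance inflation factor equals 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below illustrates this definition on simulated data; `vif_by_hand` and the variables `x1`, `x2`, `x3` are hypothetical names used only for illustration.

```r
# Simulated predictors, two of which are correlated (NOT the TIMSS data)
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n, sd = 0.8)   # correlated with x1
x3 <- rnorm(n)
d  <- data.frame(x1, x2, x3)

vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # R-squared from regressing predictor p on the remaining predictors
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(d, c("x1", "x2", "x3"))
vifs
```

A value of 1 means the predictor is uncorrelated with the others; values near the ones reported above (below 2) correspond to an R² of less than 0.5 in these auxiliary regressions.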
The EFA identified 4 latent factors:
Student’s attitude towards mathematics
Level of teacher’s support
Self-perceived mathematical abilities
Level of clarity of teacher’s requirements and materials
The regression analysis supported the first hypothesis: a higher level of parents’ education is a predictor of higher math achievement of their child. The second hypothesis was supported only partially. The regression analysis did show that higher self-perceived math abilities are associated with higher math achievement. However, as self-perceived math abilities increase, math achievement increases differently for boys and girls. To be more specific, girls tend to have higher math achievement than boys when the level of self-perceived math abilities is low or medium. When it is relatively high, boys experience a more rapid increase in math achievement than girls with the same level of self-perceived math abilities, which means that there is a significant interaction effect between self-perceived math abilities and gender.