- 0.1 Iran
- 0.2 Describe the aim and the research question of your small study - they might be different from those described in the article (1 pt)
- 0.3 Conduct EFA. Present your correlation matrix. How did you choose the number of factors? (1 pt)
- 0.4 Interpret the resulting factor structure. What type of rotation did you choose and why? Show the diagram. Is your structure different from the one Hanna Eklof got in her article? Why? (2 pts)
- 1 Analyze the model fit; Cronbach’s alpha. What can you say about your scale(s)? (1 pt)
- 1.1 Conduct regression analysis explaining math achievement (BSMMAT01) by the factors you got in the previous steps, controlling for gender, parental education and whether the student was born in the country or outside. Interpret the results. (2 pts)
- 1.2 Write a short conclusion with overall interpretation of the results (1 pt)
0.1 Iran
0.2 Describe the aim and the research question of your small study - they might be different from those described in the article (1 pt)
This study is part of a large imaginary project that aims to test the TIMSS scales in different countries.
“How universal are these scales and what are the differences?”
- Test whether Western-created measures work in Iran, a Middle Eastern country
- Test the explanatory power of the attitude factors for math achievement
Describe the data, give descriptive statistics for your variables - should be the same as in the study (2 pts)
library(foreign)
BSGIRNM6 <- read.spss("~/datanal/3year/FA/BSGIRNM6.sav", to.data.frame=TRUE)
library(dplyr)
df_Iran <- BSGIRNM6 %>% dplyr::select(BSBM19A:BSBM19D, BSBM17A, BSBM20A:BSBM20E, BSMMAT01, ITSEX, BSBG10A, BSBG07B, BSBG07A)
knitr::kable(head(df_Iran, 5))
BSBM19A | BSBM19B | BSBM19C | BSBM19D | BSBM17A | BSBM20A | BSBM20B | BSBM20C | BSBM20D | BSBM20E | BSMMAT01 | ITSEX | BSBG10A | BSBG07B | BSBG07A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Disagree a lot | Agree a lot | Agree a lot | Disagree a little | Disagree a little | Disagree a lot | Disagree a little | Disagree a little | Disagree a lot | Disagree a lot | 509.70228 | Female | Yes | Lower secondary | Lower secondary |
Agree a little | Disagree a lot | Agree a little | Agree a lot | Disagree a little | Agree a little | Agree a lot | Agree a lot | Agree a lot | Agree a little | 574.81151 | Female | Yes | Postgraduate degree | Post-secondary, non-tertiary |
Agree a little | Agree a little | Agree a lot | Agree a little | Agree a little | Agree a lot | Agree a little | Agree a lot | Agree a little | Agree a little | 544.69944 | Female | Yes | Bachelor’s or equivalent | Bachelor’s or equivalent |
Agree a little | Agree a little | Agree a little | Agree a little | Agree a lot | Agree a lot | Agree a lot | Agree a lot | Agree a lot | Agree a lot | 515.23793 | Female | Yes | Lower secondary | Upper secondary |
Agree a little | Agree a little | Agree a little | Agree a little | Agree a little | Agree a lot | Agree a lot | Agree a lot | Agree a lot | Agree a little | 542.62545 | Female | Yes | Lower secondary | Lower secondary |
code.book <- data.frame(code = colnames(df_Iran), question = c(" I usually do well in mathematics","Mathematics is more difficult for me than for many of my classmates",
"Mathematics is not one of my strengths", " I learn things quickly in mathematics", " I enjoy learning mathematics", " I think learning mathematics will help me in my daily life",
" I need mathematics to learn other school subjects", "I need to do well in mathematics to get into the <university> of my choice", " I need to do well in mathematics to get the job I want", " I would like a job that involves using mathematics", " the eighth grade mathematics achievement plausible values", "sex", "Were you born in Iran?", "What is the highest level of education completed by your father", "What is the highest level of education completed by your mother"))
knitr::kable(code.book)
code | question |
---|---|
BSBM19A | I usually do well in mathematics |
BSBM19B | Mathematics is more difficult for me than for many of my classmates |
BSBM19C | Mathematics is not one of my strengths |
BSBM19D | I learn things quickly in mathematics |
BSBM17A | I enjoy learning mathematics |
BSBM20A | I think learning mathematics will help me in my daily life |
BSBM20B | I need mathematics to learn other school subjects |
BSBM20C | I need to do well in mathematics to get into the &lt;university&gt; of my choice |
BSBM20D | I need to do well in mathematics to get the job I want |
BSBM20E | I would like a job that involves using mathematics |
BSMMAT01 | the eighth grade mathematics achievement plausible values |
ITSEX | sex |
BSBG10A | Were you born in Iran? |
BSBG07B | What is the highest level of education completed by your father |
BSBG07A | What is the highest level of education completed by your mother |
Only 10 variables are available in the 2015 data, so we won't be able to evaluate the 2 missing questions. Things change.
variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BSBM19A* | 1 | 6088 | 2.016097 | 9.202130e-01 | 2.0 | 1.908662 | 1.4826 | 1 | 4 | 3 | 0.6549524118 | -0.37393607 | 0.011793726 |
BSBM19B* | 2 | 6058 | 2.524431 | 1.066308e+00 | 2.0 | 2.530528 | 1.4826 | 1 | 4 | 3 | 0.0890651658 | -1.24687886 | 0.013699918 |
BSBM19C* | 3 | 5986 | 2.530738 | 1.111258e+00 | 2.0 | 2.538413 | 1.4826 | 1 | 4 | 3 | 0.0613887310 | -1.35456879 | 0.014363047 |
BSBM19D* | 4 | 5975 | 2.044184 | 9.551663e-01 | 2.0 | 1.931813 | 1.4826 | 1 | 4 | 3 | 0.5917264983 | -0.60051670 | 0.012356915 |
BSBM17A* | 5 | 6090 | 1.807882 | 9.705147e-01 | 2.0 | 1.639984 | 1.4826 | 1 | 4 | 3 | 1.0207785444 | -0.02814225 | 0.012436365 |
BSBM20A* | 6 | 6089 | 1.578256 | 8.565816e-01 | 1.0 | 1.399343 | 0.0000 | 1 | 4 | 3 | 1.4880037742 | 1.42473529 | 0.010977306 |
BSBM20B* | 7 | 6065 | 1.756966 | 8.868885e-01 | 2.0 | 1.618792 | 1.4826 | 1 | 4 | 3 | 1.0273595279 | 0.23902516 | 0.011388162 |
BSBM20C* | 8 | 6060 | 1.473267 | 8.089152e-01 | 1.0 | 1.284241 | 0.0000 | 1 | 4 | 3 | 1.7541325900 | 2.30715680 | 0.010391224 |
BSBM20D* | 9 | 6057 | 1.614000 | 8.855896e-01 | 1.0 | 1.442748 | 0.0000 | 1 | 4 | 3 | 1.3521593226 | 0.86799393 | 0.011378990 |
BSBM20E* | 10 | 6043 | 2.264604 | 1.086456e+00 | 2.0 | 2.205791 | 1.4826 | 1 | 4 | 3 | 0.3462894067 | -1.17039462 | 0.013976098 |
BSMMAT01* | 11 | 6130 | 3063.463948 | 1.768202e+03 | 3063.5 | 3063.454935 | 2269.8606 | 1 | 6126 | 6125 | 0.0000292468 | -1.20009588 | 22.584041406 |
ITSEX* | 12 | 6130 | 1.510604 | 4.999283e-01 | 2.0 | 1.513254 | 0.0000 | 1 | 2 | 1 | -0.0424135160 | -1.99852704 | 0.006385244 |
BSBG10A* | 13 | 6041 | 1.009436 | 9.668527e-02 | 1.0 | 1.000000 | 0.0000 | 1 | 2 | 1 | 10.1459767158 | 100.95755566 | 0.001243959 |
BSBG07B* | 14 | 6050 | 3.581983 | 2.123164e+00 | 3.0 | 3.408264 | 2.9652 | 1 | 8 | 7 | 0.5136830879 | -0.91494837 | 0.027296430 |
BSBG07A* | 15 | 6043 | 3.254675 | 2.083377e+00 | 3.0 | 3.026474 | 2.9652 | 1 | 8 | 7 | 0.7091241271 | -0.56728575 | 0.026800417 |
Age is not among the selected variables; TIMSS eighth-grade students are typically around 14 years old, not 18.
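In the descriptive table, BSMMAT01 is marked with an asterisk and has a mean of about 3063 with a range of 1 to 6126, which suggests the column was read as a factor and summarised by its level codes rather than by the scores themselves. The sketch below illustrates that pitfall on three values copied from the head of the data; storing them as a factor is an assumption made for illustration, not something checked against the .sav file.

```r
# Hypothetical mini-example: achievement scores stored as a factor
# (assumed to be what happened to BSMMAT01 on import).
scores_factor <- factor(c("509.70228", "574.81151", "544.69944"))

# as.numeric() on a factor returns the internal level codes, i.e. the
# position of each value among the sorted levels, not the value itself.
codes <- as.numeric(scores_factor)

# Going through character first recovers the actual numbers.
scores <- as.numeric(as.character(scores_factor))

print(codes)
print(scores)
```

If the same conversion applies to the real column, descriptive statistics and regression coefficients for BSMMAT01 are on the scale of ranks, not TIMSS score points.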
a <- nrow(df_Iran)
b <- nrow(na.omit(df_Iran))
cat("There were", a, "observations in the TIMSS data;", b, "remain after dropping rows with missing values.")
## There were 6130 observations in the TIMSS data; 5580 remain after dropping rows with missing values.
library(ggplot2)
library(ggthemes)
library(gridExtra)
# Keep complete cases only; set the achievement score aside for the regression
df_Iran_o <- df_Iran %>% na.omit() %>% dplyr::select(-BSMMAT01)
df_Iran_reg <- df_Iran %>% na.omit() %>% dplyr::select(BSMMAT01)
# Build one bar chart per variable
plots <- list()
for (nm in names(df_Iran_o)) {
  plots[[nm]] <- ggplot(df_Iran_o) +
    ggtitle(paste("Distribution of variable", nm)) +
    geom_bar(aes(.data[[nm]]), fill = "#E63946") +
    theme_wsj() +
    coord_flip() +
    theme(plot.title = element_text(size = 18, face = "bold"))
}
# Show one randomly chosen plot
plots[[sample(seq_along(plots), 1)]]
##
## Female Male
## 2752 2828
There are 2752 female and 2828 male students in the sample.
0.3 Conduct EFA. Present your correlation matrix. How did you choose the number of factors? (1 pt)
library(polycor)
# Columns 1:10 are the ten attitude items; the original index -10:-14 also dropped BSBM20E
corrr <- hetcor(df_Iran_o[, 1:10])
corrr <- corrr$correlations
cor.mtest <- function(mat, ...) {
mat <- as.matrix(mat)
n <- ncol(mat)
p.mat<- matrix(NA, n, n)
diag(p.mat) <- 0
for (i in 1:(n - 1)) {
for (j in (i + 1):n) {
tmp <- cor.test(mat[, i], mat[, j], ...)
p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
}
}
colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
p.mat
}
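# Matrix of correlation p-values; cor.test() must be run on the item responses
# (converted to numeric codes), not on the correlation matrix itself
p.mat <- cor.mtest(sapply(df_Iran_o[, colnames(corrr)], as.numeric))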
corrplot::corrplot(corrr, type="upper", order="hclust",
p.mat = p.mat, sig.level = 0.05, insig = "blank")
All variables are significantly correlated (p < 0.05). Most of them are strongly inter-correlated (up to about 0.7). There are two groups of correlated variables.
## Parallel analysis suggests that the number of factors = 4 and the number of components = 2
Parallel analysis suggested 4 factors, so let's try them.
With 4 factors, two of the factors are defined by only 2 variables each, which is too few to interpret them.
With 3 factors, the third factor is still defined by only two variables.
The 2-factor solution works better, even though one loading is weak (about 0.4).
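The idea behind the parallel analysis used above can be sketched in base R: eigenvalues of the observed correlation matrix are compared with eigenvalues obtained from random data of the same size, and only the leading ones that beat the random benchmark are kept. This is the component-based variant on simulated data with a hypothetical two-factor structure; it is not the `psych` routine that produced the output above and it does not use the TIMSS data.

```r
# Sketch of Horn's parallel analysis (component version) on simulated data.
set.seed(1)
n <- 500   # hypothetical number of respondents
p <- 10    # hypothetical number of items

# Ten items driven by two independent latent factors (five items each)
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- cbind(
  sapply(1:5, function(i) 0.7 * f1 + rnorm(n, sd = 0.7)),
  sapply(1:5, function(i) 0.7 * f2 + rnorm(n, sd = 0.7))
)

# Eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(items), only.values = TRUE)$values

# Eigenvalues of correlation matrices from pure noise of the same shape
rand_eig <- replicate(100, {
  noise <- matrix(rnorm(n * p), n, p)
  eigen(cor(noise), only.values = TRUE)$values
})
threshold <- apply(rand_eig, 1, quantile, probs = 0.95)

# Keep the leading eigenvalues that exceed the 95th percentile of the noise
n_retain <- sum(cumprod(obs_eig > threshold))
print(n_retain)
```

Because the simulated items were built from two factors, the procedure should retain two components here; on real data the suggested number is only a starting point, as the 4-, 3- and 2-factor comparison above shows.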
0.4 Interpret the resulting factor structure. What type of rotation did you choose and why? Show the diagram. Is your structure different from the one Hanna Eklof got in her article? Why? (2 pts)
Varimax rotation was used: oblimin showed a correlation between the factors, but the loadings barely changed. Rotation was applied to make the factor structure clearer, so that the interpretation relies on the data rather than on chance. The low loading (0.4) was also stable under oblimin rotation, so I decided not to follow the article.
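The comparison between an orthogonal and an oblique rotation can be sketched with `factanal()` from base R. The example uses simulated items with two correlated latent factors and `promax` as the oblique rotation (an assumption made to stay within base R; the analysis here used `oblimin` from `psych`).

```r
# Sketch: orthogonal (varimax) versus oblique (promax) rotation on simulated data.
set.seed(2)
n <- 1000

# Two latent factors with a true correlation of 0.5 (hypothetical)
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(1 - 0.5^2) * rnorm(n)

items <- data.frame(
  sapply(1:4, function(i) 0.8 * f1 + rnorm(n, sd = 0.6)),
  sapply(1:4, function(i) 0.8 * f2 + rnorm(n, sd = 0.6))
)
names(items) <- paste0("item", 1:8)

fit_varimax <- factanal(items, factors = 2, rotation = "varimax")
fit_promax  <- factanal(items, factors = 2, rotation = "promax")

# Loadings under each rotation
print(round(unclass(fit_varimax$loadings), 2))
print(round(unclass(fit_promax$loadings), 2))

# Factor correlation matrix implied by the oblique rotation
tmat <- solve(fit_promax$rotmat)
phi <- tmat %*% t(tmat)
print(round(phi, 2))
```

When the off-diagonal element of `phi` is clearly non-zero, an oblique rotation describes the data more faithfully; varimax forces the factors to be uncorrelated and pushes the shared variance into cross-loadings instead.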
BSBM19B, BSBM19C, BSBM17A, BSBM19D, BSBM19A – ML2
BSBM20A, BSBM20B, BSBM20D, BSBM20C – ML1
ML1: math is necessary for a person's future.
ML2: skills in math (19C and 19B correspond to a lack of skills).
The factors match the PCA results from the paper.
1 Analyze the model fit; Cronbach’s alpha. What can you say about your scale(s)? (1 pt)
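library(psych)
# Convert the factor columns to their numeric codes
df_Iran_num <- sapply(df_Iran_o, as.numeric)
df_Iran_num <- as.data.frame(df_Iran_num)
# Two-factor ML solution on the ten attitude items (columns 11:14 are controls);
# n.obs only matters when a correlation matrix is supplied instead of raw data
Iran_fa_model <- fa(df_Iran_num[,-11:-14], nfactors=2, rotate="varimax", fm="ml", n.obs = 5577, cor = "mixed")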
##
## mixed.cor is deprecated, please use mixedCor.
##
## Loadings:
## ML1 ML2
## BSBM19A 0.354 0.774
## BSBM19B -0.397
## BSBM19C -0.503
## BSBM19D 0.338 0.744
## BSBM17A 0.451 0.652
## BSBM20A 0.571 0.362
## BSBM20B 0.571
## BSBM20C 0.880
## BSBM20D 0.896
## BSBM20E 0.646 0.405
##
## ML1 ML2
## SS loadings 3.106 2.350
## Proportion Var 0.311 0.235
## Cumulative Var 0.311 0.546
BSBM20E and BSBM17A load noticeably on both factors.
##
## Factor analysis with Call: fa(r = df_Iran_num[, -11:-14], nfactors = 2, n.obs = 5577, rotate = "varimax",
## fm = "ml", cor = "mixed")
##
## Test of the hypothesis that 2 factors are sufficient.
## The degrees of freedom for the model is 26 and the objective function was 0.53
## The number of observations was 5580 with Chi Square = 2950.72 with prob < 0
##
## The root mean square of the residuals (RMSA) is 0.07
## The df corrected root mean square of the residuals is 0.09
##
## Tucker Lewis Index of factoring reliability = 0.828
## RMSEA index = 0.142 and the 10 % confidence intervals are 0.138 0.146
## BIC = 2726.42
RMSEA (0.142) is well above 0.05, and RMSR (0.07) is also above 0.05, so neither indicates good model fit. The Tucker-Lewis Index (0.828) does not reach the 0.9 threshold. This is not a good model.
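As a check on the fit output, RMSEA can be recomputed from the reported chi-square, degrees of freedom and sample size. The sketch uses the common textbook formula; `psych` may use a slightly different variant internally, so agreement is expected only up to rounding.

```r
# Values copied from the fa() output above
chi_sq <- 2950.72
df     <- 26
n_obs  <- 5580

# RMSEA = sqrt( max( (chi^2 / df - 1) / (N - 1), 0 ) )
rmsea <- sqrt(max((chi_sq / df - 1) / (n_obs - 1), 0))
print(round(rmsea, 3))
```

The result is about 0.142, matching the reported index, and far above the 0.05 cut-off used in the interpretation.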
##
## Reliability analysis
## Call: psych::alpha(x = df_Iran_num[c("BSBM19A", "BSBM19B", "BSBM19C",
## "BSBM19D", "BSBM17A", "BSBM20A", "BSBM20B", "BSBM20C", "BSBM20D",
## "BSBM20E")], check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.82 0.83 0.85 0.32 4.7 0.0036 1.9 0.59 0.34
##
## lower alpha upper 95% confidence boundaries
## 0.81 0.82 0.83
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM19A 0.79 0.80 0.82 0.30 3.9 0.0043 0.030 0.34
## BSBM19B- 0.84 0.84 0.85 0.37 5.3 0.0033 0.018 0.36
## BSBM19C- 0.82 0.82 0.84 0.34 4.7 0.0036 0.029 0.36
## BSBM19D 0.79 0.80 0.83 0.31 4.0 0.0042 0.030 0.34
## BSBM17A 0.79 0.80 0.83 0.30 3.9 0.0043 0.030 0.31
## BSBM20A 0.80 0.81 0.83 0.32 4.2 0.0040 0.030 0.31
## BSBM20B 0.81 0.81 0.84 0.33 4.4 0.0039 0.030 0.34
## BSBM20C 0.80 0.81 0.83 0.32 4.3 0.0039 0.027 0.34
## BSBM20D 0.80 0.81 0.82 0.32 4.2 0.0040 0.027 0.34
## BSBM20E 0.79 0.80 0.83 0.31 4.0 0.0043 0.030 0.31
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## BSBM19A 5580 0.72 0.72 0.70 0.64 2.0 0.92
## BSBM19B- 5580 0.39 0.35 0.25 0.22 2.5 1.07
## BSBM19C- 5580 0.53 0.50 0.42 0.38 2.5 1.11
## BSBM19D 5580 0.70 0.70 0.67 0.60 2.0 0.95
## BSBM17A 5580 0.73 0.73 0.69 0.63 1.8 0.97
## BSBM20A 5580 0.63 0.65 0.60 0.53 1.6 0.85
## BSBM20B 5580 0.58 0.60 0.53 0.47 1.8 0.88
## BSBM20C 5580 0.60 0.63 0.59 0.50 1.5 0.81
## BSBM20D 5580 0.63 0.65 0.63 0.53 1.6 0.88
## BSBM20E 5580 0.72 0.71 0.68 0.61 2.3 1.09
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## BSBM19A 0.33 0.42 0.16 0.09 0
## BSBM19B 0.19 0.35 0.21 0.26 0
## BSBM19C 0.21 0.32 0.19 0.28 0
## BSBM19D 0.34 0.38 0.18 0.10 0
## BSBM17A 0.49 0.31 0.10 0.09 0
## BSBM20A 0.61 0.26 0.07 0.06 0
## BSBM20B 0.48 0.34 0.11 0.06 0
## BSBM20C 0.69 0.21 0.06 0.05 0
## BSBM20D 0.60 0.24 0.09 0.06 0
## BSBM20E 0.31 0.32 0.19 0.19 0
Cronbach’s alpha indicates good internal consistency: the value is about 0.82 (0.9 or more would be excellent). The SD looks fine too, but it matters little here.
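To make the alpha value less of a black box, the coefficient can be computed by hand from the item variances and the variance of the total score. The sketch uses simulated items (not the TIMSS responses) and also shows why negatively worded items, such as those flagged with a minus sign in the output above, have to be reverse-keyed first.

```r
# Sketch: Cronbach's alpha from its definition, on simulated data.
set.seed(3)
n <- 300
trait <- rnorm(n)

# Five hypothetical items measuring the same trait; the last is negatively worded
items <- sapply(1:5, function(i) trait + rnorm(n))
items[, 5] <- -items[, 5]

cronbach_alpha <- function(x) {
  k <- ncol(x)
  item_var  <- apply(x, 2, var)      # variance of each item
  total_var <- var(rowSums(x))       # variance of the sum score
  k / (k - 1) * (1 - sum(item_var) / total_var)
}

# Alpha with the negatively worded item left as it is
alpha_raw <- cronbach_alpha(items)

# Reverse-key the item (for a 1-4 Likert item this would be 5 - x)
items_keyed <- items
items_keyed[, 5] <- -items_keyed[, 5]
alpha_keyed <- cronbach_alpha(items_keyed)

print(round(c(raw = alpha_raw, keyed = alpha_keyed), 2))
```

The keyed version should come out clearly higher, which is the effect `check.keys = TRUE` takes care of in `psych::alpha()`.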
1.1 Conduct regression analysis explaining math achievement (BSMMAT01) by the factors you got in the previous steps, controlling for gender, parental education and whether the student was born in the country or outside. Interpret the results. (2 pts)
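# Attach the factor scores and the achievement variable to the complete-case data
fascores <- as.data.frame(Iran_fa_model$scores)
Iran_fa <- cbind(df_Iran_o, fascores)
Iran_fa$BSMMAT01 <- df_Iran_reg$BSMMAT01
# Caution: the descriptive table suggests BSMMAT01 is stored as a factor; if so, as.numeric()
# yields level codes (ranks) and as.numeric(as.character(...)) would be needed for actual scores
model1 <- lm(as.numeric(BSMMAT01) ~ ML1 + ML2 + ITSEX + BSBG10A + ordered(BSBG07B) + ordered(BSBG07A), data = Iran_fa)
library("QuantPsyc")
summary(model1)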
##
## Call:
## lm(formula = as.numeric(BSMMAT01) ~ ML1 + ML2 + ITSEX + BSBG10A +
## ordered(BSBG07B) + ordered(BSBG07A), data = Iran_fa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4764.4 -1117.6 80.9 1081.7 4562.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3422.692 32.737 104.552 < 2e-16 ***
## ML1 -28.331 19.683 -1.439 0.150108
## ML2 -656.846 21.532 -30.505 < 2e-16 ***
## ITSEXMale -144.673 39.169 -3.694 0.000223 ***
## BSBG10ANo -702.698 208.502 -3.370 0.000756 ***
## ordered(BSBG07B).L 619.213 82.667 7.490 7.94e-14 ***
## ordered(BSBG07B).Q -660.461 78.051 -8.462 < 2e-16 ***
## ordered(BSBG07B).C -588.181 73.229 -8.032 1.16e-15 ***
## ordered(BSBG07B)^4 -320.505 69.324 -4.623 3.86e-06 ***
## ordered(BSBG07B)^5 -352.518 64.049 -5.504 3.88e-08 ***
## ordered(BSBG07B)^6 -151.424 59.202 -2.558 0.010562 *
## ordered(BSBG07B)^7 192.405 59.661 3.225 0.001267 **
## ordered(BSBG07A).L 365.225 89.491 4.081 4.54e-05 ***
## ordered(BSBG07A).Q -841.559 78.946 -10.660 < 2e-16 ***
## ordered(BSBG07A).C -489.827 80.254 -6.103 1.11e-09 ***
## ordered(BSBG07A)^4 -251.322 82.666 -3.040 0.002375 **
## ordered(BSBG07A)^5 -224.154 76.689 -2.923 0.003482 **
## ordered(BSBG07A)^6 4.865 66.724 0.073 0.941881
## ordered(BSBG07A)^7 288.831 61.575 4.691 2.79e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1449 on 5561 degrees of freedom
## Multiple R-squared: 0.3263, Adjusted R-squared: 0.3241
## F-statistic: 149.6 on 18 and 5561 DF, p-value: < 2.2e-16
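ML2 is the only significant factor (p < 0.05): self-assessed skill in math has a strong effect on math scores, whereas beliefs about the usefulness of mathematics (ML1) do not significantly affect achievement. Assuming the items are coded from 1 = "Agree a lot" to 4 = "Disagree a lot", a higher ML2 score means lower self-assessed skill, so the negative coefficient means that less confident students score lower.
Being male decreases math scores. Being born in another country decreases math scores. The linear terms for mother's and father's educational level are positive, so math scores increase with parental education, and father's education has the larger and more significant effect.
R-squared is about 0.32, similar to the original study.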
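The coefficient names ending in `.L`, `.Q`, `.C`, `^4` and so on come from the polynomial contrasts that R applies to `ordered()` factors: they are the linear, quadratic, cubic and higher-order trends across the ordered levels, not differences between individual levels. A small sketch on simulated data with hypothetical education levels shows how this coding relates to the more familiar dummy coding.

```r
# Sketch: polynomial contrasts for an ordered factor versus dummy coding.
set.seed(4)
levels_edu <- c("low", "mid", "high", "top")   # hypothetical ordered levels
edu <- factor(sample(levels_edu, 400, replace = TRUE),
              levels = levels_edu, ordered = TRUE)

# Outcome with a purely linear trend across the levels, by construction
y <- 10 * as.integer(edu) + rnorm(400)

# Ordered factor: coefficients edu.L, edu.Q, edu.C (linear, quadratic, cubic trend)
fit_poly <- lm(y ~ edu)

# Same factor treated as unordered: each coefficient is a difference from "low"
fit_dummy <- lm(y ~ factor(edu, ordered = FALSE))

print(round(coef(fit_poly), 2))
print(round(coef(fit_dummy), 2))
```

Both models give identical fitted values; only the parameterisation differs. Here the `.L` term carries the effect while `.Q` and `.C` stay near zero, so large higher-order terms, like those in the regression above, indicate that the relationship with education level is not simply linear.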
Iran_fa_num <- sapply( Iran_fa, as.numeric )
Iran_fa_num <- as.data.frame(
Iran_fa_num
)
model2 <- lm(as.numeric(BSMMAT01) ~ ML1 + ML2 + ITSEX + BSBG10A + BSBG07B + BSBG07A, data = Iran_fa_num)
library("QuantPsyc")
lm.beta(model2)
## ML1 ML2 ITSEX BSBG10A BSBG07B
## -0.005663678 -0.383103322 -0.054519928 -0.044915069 0.167236755
## BSBG07A
## 0.130957784
Self-assessed skill in math (ML2) has the strongest standardized effect on math scores.
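The standardized coefficients reported by `lm.beta()` can be reproduced by hand: each raw slope is multiplied by the standard deviation of its predictor and divided by the standard deviation of the outcome. The sketch below checks this on simulated data with hypothetical variables `x1` and `x2`, without using `QuantPsyc`.

```r
# Sketch: standardized regression coefficients computed two ways.
set.seed(5)
n  <- 200
x1 <- rnorm(n, sd = 2)     # hypothetical predictor on a small scale
x2 <- rnorm(n, sd = 10)    # hypothetical predictor on a large scale
y  <- 3 * x1 + 0.5 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Way 1: rescale the raw slopes by sd(x) / sd(y)
b <- coef(fit)[-1]
beta_manual <- b * c(sd(x1), sd(x2)) / sd(y)

# Way 2: refit the model on z-scored variables
z <- as.data.frame(scale(cbind(y = y, x1 = x1, x2 = x2)))
fit_z <- lm(y ~ x1 + x2, data = z)
beta_z <- coef(fit_z)[-1]

print(round(beta_manual, 3))
print(round(beta_z, 3))
```

Because the predictors are put on a common scale, the standardized values can be compared directly, which is how the ML2 coefficient (-0.38) stands out against the others in the output above.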
1.2 Write a short conclusion with overall interpretation of the results (1 pt)
Thus, it does not matter whether students are aware of the importance of the knowledge they receive, or where they are or are not going to apply it. If students believe they are unsuccessful in mathematics, they will indeed cope worse with mathematical tasks. Apparently, this works in Iran just as it does in Sweden, and no power of Arabic numerals saves the students.