0.1 Iran

0.2 Describe the aim and the research question of your small study - they might be different from those described in the article (1 pt)

This study is part of a large hypothetical project that tests the TIMSS attitude scales in different countries.

“How universal are these scales and what are the differences?”

Test whether measures created in the West work in Iran, a Middle Eastern country
Test the explanatory power of the attitude factors for math achievement

Describe the data, give descriptive statistics for your variables - should be the same as in the study (2 pts)

library(foreign)

BSGIRNM6 <- read.spss("~/datanal/3year/FA/BSGIRNM6.sav", to.data.frame=TRUE)

library(dplyr)
df_Iran <- BSGIRNM6 %>% dplyr::select(BSBM19A:BSBM19D,  BSBM17A, BSBM20A:BSBM20E, BSMMAT01, ITSEX, BSBG10A, BSBG07B, BSBG07A)

knitr::kable(head(df_Iran, 5))

BSBM19A	BSBM19B	BSBM19C	BSBM19D	BSBM17A	BSBM20A	BSBM20B	BSBM20C	BSBM20D	BSBM20E	BSMMAT01	ITSEX	BSBG10A	BSBG07B	BSBG07A
Disagree a lot	Agree a lot	Agree a lot	Disagree a little	Disagree a little	Disagree a lot	Disagree a little	Disagree a little	Disagree a lot	Disagree a lot	509.70228	Female	Yes	Lower secondary	Lower secondary
Agree a little	Disagree a lot	Agree a little	Agree a lot	Disagree a little	Agree a little	Agree a lot	Agree a lot	Agree a lot	Agree a little	574.81151	Female	Yes	Postgraduate degree	Post-secondary, non-tertiary
Agree a little	Agree a little	Agree a lot	Agree a little	Agree a little	Agree a lot	Agree a little	Agree a lot	Agree a little	Agree a little	544.69944	Female	Yes	Bachelor’s or equivalent	Bachelor’s or equivalent
Agree a little	Agree a little	Agree a little	Agree a little	Agree a lot	Agree a lot	Agree a lot	Agree a lot	Agree a lot	Agree a lot	515.23793	Female	Yes	Lower secondary	Upper secondary
Agree a little	Agree a little	Agree a little	Agree a little	Agree a little	Agree a lot	Agree a lot	Agree a lot	Agree a lot	Agree a little	542.62545	Female	Yes	Lower secondary	Lower secondary

code.book <- data.frame(code = colnames(df_Iran), question = c(" I usually do well in mathematics","Mathematics is more difficult for me than for many of my classmates",
"Mathematics is not one of my strengths", " I learn things quickly in mathematics", " I enjoy learning mathematics", " I think learning mathematics will help me in my daily life",
" I need mathematics to learn other school subjects", "I need to do well in mathematics to get into the <university> of my choice", " I need to do well in mathematics to get the job I want", " I would like a job that involves using mathematics", " the eighth grade mathematics achievement plausible values", "sex", "Were you born in Iran?", "What is the highest level of education completed by your father", "What is the highest level of education completed by your mother"))
knitr::kable(code.book)

code	question
BSBM19A	I usually do well in mathematics
BSBM19B	Mathematics is more difficult for me than for many of my classmates
BSBM19C	Mathematics is not one of my strengths
BSBM19D	I learn things quickly in mathematics
BSBM17A	I enjoy learning mathematics
BSBM20A	I think learning mathematics will help me in my daily life
BSBM20B	I need mathematics to learn other school subjects
BSBM20C	I need to do well in mathematics to get into the <university> of my choice
BSBM20D	I need to do well in mathematics to get the job I want
BSBM20E	I would like a job that involves using mathematics
BSMMAT01	the eighth grade mathematics achievement plausible values
ITSEX	sex
BSBG10A	Were you born in Iran?
BSBG07B	What is the highest level of education completed by your father
BSBG07A	What is the highest level of education completed by your mother
Only 10 variables are available in the 2015 data, so we won't be able to evaluate the 2 missing questions. Things change.

library(psych)
library(formattable)

formattable(describe(df_Iran))

	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
BSBM19A*	1	6088	2.016097	9.202130e-01	2.0	1.908662	1.4826	1	4	3	0.6549524118	-0.37393607	0.011793726
BSBM19B*	2	6058	2.524431	1.066308e+00	2.0	2.530528	1.4826	1	4	3	0.0890651658	-1.24687886	0.013699918
BSBM19C*	3	5986	2.530738	1.111258e+00	2.0	2.538413	1.4826	1	4	3	0.0613887310	-1.35456879	0.014363047
BSBM19D*	4	5975	2.044184	9.551663e-01	2.0	1.931813	1.4826	1	4	3	0.5917264983	-0.60051670	0.012356915
BSBM17A*	5	6090	1.807882	9.705147e-01	2.0	1.639984	1.4826	1	4	3	1.0207785444	-0.02814225	0.012436365
BSBM20A*	6	6089	1.578256	8.565816e-01	1.0	1.399343	0.0000	1	4	3	1.4880037742	1.42473529	0.010977306
BSBM20B*	7	6065	1.756966	8.868885e-01	2.0	1.618792	1.4826	1	4	3	1.0273595279	0.23902516	0.011388162
BSBM20C*	8	6060	1.473267	8.089152e-01	1.0	1.284241	0.0000	1	4	3	1.7541325900	2.30715680	0.010391224
BSBM20D*	9	6057	1.614000	8.855896e-01	1.0	1.442748	0.0000	1	4	3	1.3521593226	0.86799393	0.011378990
BSBM20E*	10	6043	2.264604	1.086456e+00	2.0	2.205791	1.4826	1	4	3	0.3462894067	-1.17039462	0.013976098
BSMMAT01*	11	6130	3063.463948	1.768202e+03	3063.5	3063.454935	2269.8606	1	6126	6125	0.0000292468	-1.20009588	22.584041406
ITSEX*	12	6130	1.510604	4.999283e-01	2.0	1.513254	0.0000	1	2	1	-0.0424135160	-1.99852704	0.006385244
BSBG10A*	13	6041	1.009436	9.668527e-02	1.0	1.000000	0.0000	1	2	1	10.1459767158	100.95755566	0.001243959
BSBG07B*	14	6050	3.581983	2.123164e+00	3.0	3.408264	2.9652	1	8	7	0.5136830879	-0.91494837	0.027296430
BSBG07A*	15	6043	3.254675	2.083377e+00	3.0	3.026474	2.9652	1	8	7	0.7091241271	-0.56728575	0.026800417
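
The asterisk next to BSMMAT01 and its range of 1 to 6126 suggest that `read.spss` imported the achievement score as a factor, in which case `describe` (and any later `as.numeric` call on that column) works with level codes rather than with the scores themselves. This is an inference from the table, not something the data file confirms. A minimal sketch of the difference, using three score values from the preview above as a toy stand-in for the real column:

```r
# Toy factor standing in for a score column that was read as a factor (assumption)
scores <- factor(c("509.70228", "574.81151", "544.69944"))

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(scores)
codes   # 1 3 2

# Converting through character recovers the stored values
values <- as.numeric(as.character(scores))
values
```

If the column really is a factor, converting it through `as.character` before modelling would put the regression coefficients on the TIMSS score scale.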

Mean age is near 18

a <-  nrow(df_Iran)
b <-  nrow(na.omit(df_Iran))
cat("TIMSS collected", a, "observations;", b, "remain after dropping rows with missing values.")

## TIMSS collected 6130 observations; 5580 remain after dropping rows with missing values.

library(ggplot2)
library(ggthemes)
library(gridExtra)
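# Complete cases only: items and background variables, without the outcome
df_Iran_o <- df_Iran %>% na.omit() %>% dplyr::select(-BSMMAT01)

# The outcome for the same complete cases, kept separately for the regression
df_Iran_reg <- df_Iran %>% na.omit() %>% dplyr::select(BSMMAT01)
plots <- list()

# One bar chart per variable
for (nm in names(df_Iran_o)) {
  plots[[nm]] <- ggplot(df_Iran_o) +
    ggtitle(paste("distribution of variable", nm)) +
    geom_bar(aes_string(nm), stat = "count", fill = "#E63946") +
    theme_wsj() +
    coord_flip() +
    theme(plot.title = element_text(size = 18, face = "bold"))
}

# Show one randomly chosen plot
plots[[sample(seq_along(plots), 1)]]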

table(df_Iran_o$ITSEX)

## 
## Female   Male 
##   2752   2828

There are 2752 female and 2828 male students in the sample.

0.3 Conduct EFA. Present your correlation matrix. How did you choose the number of factors? (1 pt)

library(polycor)

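# Heterogeneous (polychoric) correlations between the items.
# Dropping columns 10-14 removes BSBM20E and the four background variables, leaving 9 items.
corrr <- hetcor(df_Iran_o[, -10:-14])
corrr <- corrr$correlations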


cor.mtest <- function(mat, ...) {
    mat <- as.matrix(mat)
    n <- ncol(mat)
    p.mat<- matrix(NA, n, n)
    diag(p.mat) <- 0
    for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
            tmp <- cor.test(mat[, i], mat[, j], ...)
            p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
        }
    }
  colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
  p.mat
}
# matrix of the p-value of the correlation
p.mat <- cor.mtest(corrr)


corrplot::corrplot(corrr, type="upper", order="hclust", 
         p.mat = p.mat, sig.level = 0.05, insig = "blank")

All correlations are significant (p < 0.05). Most of the variables are strongly inter-correlated (up to almost 0.7). There are two groups of correlated variables.
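
`cor.mtest` calls `cor.test` on pairs of columns, so it is designed for a matrix of observations (one row per student). If it is given a correlation matrix, the tests are run on the columns of that matrix instead, with a sample size equal to the number of items. A minimal sketch on simulated data, where `toy` and its columns are hypothetical stand-ins for numerically coded items:

```r
set.seed(1)
n <- 200
common <- rnorm(n)

# Two related items (a, b) and one unrelated item (c)
toy <- data.frame(a = common + rnorm(n),
                  b = common + rnorm(n),
                  c = rnorm(n))

# p-value of the correlation test for every pair of columns, using raw observations
item_pairs <- combn(names(toy), 2)
p_values <- apply(item_pairs, 2, function(pr) {
  cor.test(toy[[pr[1]]], toy[[pr[2]]])$p.value
})
names(p_values) <- apply(item_pairs, 2, paste, collapse = "-")
p_values
```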

fa.parallel(corrr, fa="both", n.iter=100, n.obs = 5580)

## Parallel analysis suggests that the number of factors =  4  and the number of components =  2
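
Parallel analysis retains a component only while its eigenvalue exceeds the corresponding eigenvalue obtained from random data of the same size. A simplified sketch of the component version in base R, on simulated data with two built-in factors; this is not the exact procedure `fa.parallel` implements, and all object names are hypothetical:

```r
set.seed(42)
n <- 500
p <- 6

# Simulated items: three driven by f1, three driven by f2
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(f1 + rnorm(n), f1 + rnorm(n), f1 + rnorm(n),
           f2 + rnorm(n), f2 + rnorm(n), f2 + rnorm(n))

# Eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(X))$values

# Mean eigenvalues over 100 random data sets of the same shape
rand_eig <- rowMeans(replicate(100, eigen(cor(matrix(rnorm(n * p), n, p)))$values))

# Keep leading components until the first one that does not beat random data
below <- which(obs_eig <= rand_eig)
n_comp <- if (length(below) > 0) below[1] - 1 else p
n_comp
```

The count is only a starting point: as in the steps that follow, the suggested solution still has to be checked for interpretability.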

Parallel analysis suggests 4 factors, so let's try that first.

fa.diagram(fa(corrr, nfactors=4, rotate="varimax", fm="ml", n.obs = 5580))

Two of the factors are defined by only 2 variables each, which is too few to interpret, so we try fewer factors.

fa.diagram(fa(corrr, nfactors=3, rotate="varimax", fm="ml", n.obs = 5580))

The third factor is still defined by only two variables.

fa.diagram(fa(corrr, nfactors=2, rotate="varimax", fm="ml", n.obs = 5580))

The 2-factor solution is better, even though one loading is weak (about 0.4).

0.4 Interpret the resulting factor structure. What type of rotation did you choose and why? Show the diagram. Is your structure different from the one Hanna Eklof got in her article? Why? (2 pts)

Varimax rotation was used: oblimin showed some correlation between the factors, but the loadings did not change much. Rotation was applied to make the factor structure clearer, so that the interpretation rests on the data rather than on chance. The weak loading (about 0.4) stayed the same under oblimin too, so I decided not to follow the article.
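
The choice between an orthogonal and an oblique rotation can be checked by fitting both and comparing the loadings. A minimal sketch with base R `factanal` on simulated data; `promax` stands in for oblimin here only because it ships with base R, and the variables `x1` to `x6` are hypothetical:

```r
set.seed(7)
n <- 1000

# Two mildly correlated latent factors
f1 <- rnorm(n)
f2 <- 0.3 * f1 + rnorm(n)

X <- data.frame(x1 = f1 + rnorm(n), x2 = f1 + rnorm(n), x3 = f1 + rnorm(n),
                x4 = f2 + rnorm(n), x5 = f2 + rnorm(n), x6 = f2 + rnorm(n))

# Same maximum-likelihood solution, two different rotations
fit_orth <- factanal(X, factors = 2, rotation = "varimax")  # orthogonal
fit_obl  <- factanal(X, factors = 2, rotation = "promax")   # oblique

print(fit_orth$loadings, cutoff = 0.3)
print(fit_obl$loadings, cutoff = 0.3)
```

If both rotations assign the same items to the same factors, the simpler orthogonal solution is easier to report; if the pattern changes, the oblique one is usually the safer description.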

BSBM19B, BSBM19C, BSBM17A, BSBM19D, BSBM19A – ML2

BSBM20A, BSBM20B, BSBM20D, BSBM20C – ML1

ML1: math is necessary for a person's future (valuing of mathematics)

ML2: self-assessed skill in math (BSBM19C and BSBM19B are negatively worded and correspond to a lack of skill)

The factors match the PCA results from the paper.

1 Analyze the model fit; Cronbach’s alpha. What can you say about your scale(s)? (1 pt)

code.book$code

##  [1] BSBM19A  BSBM19B  BSBM19C  BSBM19D  BSBM17A  BSBM20A  BSBM20B 
##  [8] BSBM20C  BSBM20D  BSBM20E  BSMMAT01 ITSEX    BSBG10A  BSBG07B 
## [15] BSBG07A 
## 15 Levels: BSBG07A BSBG07B BSBG10A BSBM17A BSBM19A BSBM19B ... ITSEX

df_Iran_num <- sapply( df_Iran_o, as.numeric )
df_Iran_num <- as.data.frame(df_Iran_num)
Iran_fa_model <- fa(df_Iran_num[,-11:-14], nfactors=2, rotate="varimax", fm="ml", n.obs = 5577, cor = "mixed")

## 
## mixed.cor is deprecated, please use mixedCor.

print(Iran_fa_model$loadings,cutoff = 0.3)

## 
## Loadings:
##         ML1    ML2   
## BSBM19A  0.354  0.774
## BSBM19B        -0.397
## BSBM19C        -0.503
## BSBM19D  0.338  0.744
## BSBM17A  0.451  0.652
## BSBM20A  0.571  0.362
## BSBM20B  0.571       
## BSBM20C  0.880       
## BSBM20D  0.896       
## BSBM20E  0.646  0.405
## 
##                  ML1   ML2
## SS loadings    3.106 2.350
## Proportion Var 0.311 0.235
## Cumulative Var 0.311 0.546

BSBM20E and BSBM17A load noticeably on both factors (both loadings above 0.4).

summary(Iran_fa_model)

## 
## Factor analysis with Call: fa(r = df_Iran_num[, -11:-14], nfactors = 2, n.obs = 5577, rotate = "varimax", 
##     fm = "ml", cor = "mixed")
## 
## Test of the hypothesis that 2 factors are sufficient.
## The degrees of freedom for the model is 26  and the objective function was  0.53 
## The number of observations was  5580  with Chi Square =  2950.72  with prob <  0 
## 
## The root mean square of the residuals (RMSA) is  0.07 
## The df corrected root mean square of the residuals is  0.09 
## 
## Tucker Lewis Index of factoring reliability =  0.828
## RMSEA index =  0.142  and the 10 % confidence intervals are  0.138 0.146
## BIC =  2726.42

RMSEA (0.142) is well above the usual 0.05 cutoff, and the RMSR (0.07) is also above 0.05, so neither indicates good model fit. The Tucker Lewis Index (0.828) does not reach the 0.9 threshold. This is not a well-fitting model.
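
The reported RMSEA can be reproduced from the chi-square statistic, the degrees of freedom, and the number of observations printed in the summary. The sketch below uses one common form of the formula, RMSEA = sqrt(max(chi2/df - 1, 0) / (N - 1)); whether `psych` uses exactly this form is an assumption, but it yields the printed value:

```r
# Values copied from the summary output above
chi2 <- 2950.72
dof  <- 26
n    <- 5580

# Excess chi-square per degree of freedom, scaled by sample size
rmsea <- sqrt(max(chi2 / dof - 1, 0) / (n - 1))
round(rmsea, 3)
```

With a chi-square more than 100 times its degrees of freedom, no realistic sample size brings this index under 0.05.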

psych::alpha(df_Iran_num[c('BSBM19A',  'BSBM19B', 'BSBM19C',  'BSBM19D',  'BSBM17A',  'BSBM20A',  'BSBM20B',  'BSBM20C',  'BSBM20D', 'BSBM20E')], check.keys = TRUE)

## 
## Reliability analysis   
## Call: psych::alpha(x = df_Iran_num[c("BSBM19A", "BSBM19B", "BSBM19C", 
##     "BSBM19D", "BSBM17A", "BSBM20A", "BSBM20B", "BSBM20C", "BSBM20D", 
##     "BSBM20E")], check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.82      0.83    0.85      0.32 4.7 0.0036  1.9 0.59     0.34
## 
##  lower alpha upper     95% confidence boundaries
## 0.81 0.82 0.83 
## 
##  Reliability if an item is dropped:
##          raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## BSBM19A       0.79      0.80    0.82      0.30 3.9   0.0043 0.030  0.34
## BSBM19B-      0.84      0.84    0.85      0.37 5.3   0.0033 0.018  0.36
## BSBM19C-      0.82      0.82    0.84      0.34 4.7   0.0036 0.029  0.36
## BSBM19D       0.79      0.80    0.83      0.31 4.0   0.0042 0.030  0.34
## BSBM17A       0.79      0.80    0.83      0.30 3.9   0.0043 0.030  0.31
## BSBM20A       0.80      0.81    0.83      0.32 4.2   0.0040 0.030  0.31
## BSBM20B       0.81      0.81    0.84      0.33 4.4   0.0039 0.030  0.34
## BSBM20C       0.80      0.81    0.83      0.32 4.3   0.0039 0.027  0.34
## BSBM20D       0.80      0.81    0.82      0.32 4.2   0.0040 0.027  0.34
## BSBM20E       0.79      0.80    0.83      0.31 4.0   0.0043 0.030  0.31
## 
##  Item statistics 
##             n raw.r std.r r.cor r.drop mean   sd
## BSBM19A  5580  0.72  0.72  0.70   0.64  2.0 0.92
## BSBM19B- 5580  0.39  0.35  0.25   0.22  2.5 1.07
## BSBM19C- 5580  0.53  0.50  0.42   0.38  2.5 1.11
## BSBM19D  5580  0.70  0.70  0.67   0.60  2.0 0.95
## BSBM17A  5580  0.73  0.73  0.69   0.63  1.8 0.97
## BSBM20A  5580  0.63  0.65  0.60   0.53  1.6 0.85
## BSBM20B  5580  0.58  0.60  0.53   0.47  1.8 0.88
## BSBM20C  5580  0.60  0.63  0.59   0.50  1.5 0.81
## BSBM20D  5580  0.63  0.65  0.63   0.53  1.6 0.88
## BSBM20E  5580  0.72  0.71  0.68   0.61  2.3 1.09
## 
## Non missing response frequency for each item
##            1    2    3    4 miss
## BSBM19A 0.33 0.42 0.16 0.09    0
## BSBM19B 0.19 0.35 0.21 0.26    0
## BSBM19C 0.21 0.32 0.19 0.28    0
## BSBM19D 0.34 0.38 0.18 0.10    0
## BSBM17A 0.49 0.31 0.10 0.09    0
## BSBM20A 0.61 0.26 0.07 0.06    0
## BSBM20B 0.48 0.34 0.11 0.06    0
## BSBM20C 0.69 0.21 0.06 0.05    0
## BSBM20D 0.60 0.24 0.09 0.06    0
## BSBM20E 0.31 0.32 0.19 0.19    0

Cronbach’s alpha is 0.82, which indicates good internal consistency (0.9 or more would be excellent). The item standard deviations are similar (0.81 to 1.11), but that matters little here.
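
Cronbach's alpha compares the sum of the item variances with the variance of the total score: alpha = k/(k-1) * (1 - sum(item variances) / variance(total)). A minimal base R sketch on simulated items, where `cronbach_alpha` and `toy` are hypothetical names; the negatively worded item is reverse-keyed first, which is what `check.keys = TRUE` does for BSBM19B and BSBM19C:

```r
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_var  <- apply(items, 2, var)   # variance of each item
  total_var <- var(rowSums(items))    # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

set.seed(3)
trait <- rnorm(300)
toy <- data.frame(i1 = trait + rnorm(300),
                  i2 = trait + rnorm(300),
                  i3 = trait + rnorm(300),
                  i4 = -trait + rnorm(300))  # negatively worded item

# Reverse-key the negative item (for a 1-4 Likert item this would be 5 - x)
toy$i4 <- -toy$i4

alpha_toy <- cronbach_alpha(toy)
alpha_toy
```

Running the same function separately on the items of each factor would give one reliability estimate per scale instead of a single value for all ten items.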

1.1 Conduct regression analysis explaining math achievement (BSMMAT01) by the factors you got in the previous steps, controlling for gender, parental education and whether the student was born in the country or outside. Interpret the results. (2 pts)

load <- Iran_fa_model$loadings[,1:2] 
plot(load) # set up plot

fascores<-as.data.frame(Iran_fa_model$scores)
Iran_fa<-cbind(df_Iran_o,fascores)
Iran_fa$BSMMAT01 <- df_Iran_reg$BSMMAT01

model1 <- lm(as.numeric(BSMMAT01) ~ ML1 + ML2 + ITSEX + BSBG10A + ordered(BSBG07B) + ordered(BSBG07A), data = Iran_fa)
library("QuantPsyc")

summary(model1)

## 
## Call:
## lm(formula = as.numeric(BSMMAT01) ~ ML1 + ML2 + ITSEX + BSBG10A + 
##     ordered(BSBG07B) + ordered(BSBG07A), data = Iran_fa)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4764.4 -1117.6    80.9  1081.7  4562.9 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        3422.692     32.737 104.552  < 2e-16 ***
## ML1                 -28.331     19.683  -1.439 0.150108    
## ML2                -656.846     21.532 -30.505  < 2e-16 ***
## ITSEXMale          -144.673     39.169  -3.694 0.000223 ***
## BSBG10ANo          -702.698    208.502  -3.370 0.000756 ***
## ordered(BSBG07B).L  619.213     82.667   7.490 7.94e-14 ***
## ordered(BSBG07B).Q -660.461     78.051  -8.462  < 2e-16 ***
## ordered(BSBG07B).C -588.181     73.229  -8.032 1.16e-15 ***
## ordered(BSBG07B)^4 -320.505     69.324  -4.623 3.86e-06 ***
## ordered(BSBG07B)^5 -352.518     64.049  -5.504 3.88e-08 ***
## ordered(BSBG07B)^6 -151.424     59.202  -2.558 0.010562 *  
## ordered(BSBG07B)^7  192.405     59.661   3.225 0.001267 ** 
## ordered(BSBG07A).L  365.225     89.491   4.081 4.54e-05 ***
## ordered(BSBG07A).Q -841.559     78.946 -10.660  < 2e-16 ***
## ordered(BSBG07A).C -489.827     80.254  -6.103 1.11e-09 ***
## ordered(BSBG07A)^4 -251.322     82.666  -3.040 0.002375 ** 
## ordered(BSBG07A)^5 -224.154     76.689  -2.923 0.003482 ** 
## ordered(BSBG07A)^6    4.865     66.724   0.073 0.941881    
## ordered(BSBG07A)^7  288.831     61.575   4.691 2.79e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1449 on 5561 degrees of freedom
## Multiple R-squared:  0.3263, Adjusted R-squared:  0.3241 
## F-statistic: 149.6 on 18 and 5561 DF,  p-value: < 2.2e-16

ML2 (self-assessed skill in math) is the only significant factor (p < 0.05): students' view of their own ability is strongly related to their math scores. ML1 (the perceived usefulness of mathematics) is not a significant predictor.

Being male is associated with lower math scores, and so is being born in another country. The linear trends for both mother's and father's education are positive, so higher parental education goes with higher scores; the father's education has the larger and more significant linear effect.

R-squared is about 0.33 (adjusted 0.32), similar to the original study.
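
The `.L`, `.Q`, `.C` and `^4` to `^7` rows in the coefficient table come from wrapping the education variables in `ordered()`: R then uses orthogonal polynomial contrasts, so each row is a trend component (linear, quadratic, cubic, and so on) across the education levels rather than a comparison of one level with a baseline. A small sketch with a hypothetical three-level variable:

```r
# Hypothetical ordered variable with three levels
edu <- ordered(c("low", "mid", "high", "mid", "low", "high"),
               levels = c("low", "mid", "high"))

# Contrast matrix R uses for ordered factors in lm()
cm <- contrasts(edu)
cm

# The .L column increases steadily across levels; .Q captures curvature
colnames(cm)
```

A positive `.L` coefficient therefore means scores rise across education levels overall, while significant higher-order terms mean the rise is not a straight line.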

Iran_fa_num <- sapply( Iran_fa, as.numeric )
Iran_fa_num <- as.data.frame(
  Iran_fa_num
)

model2 <- lm(as.numeric(BSMMAT01) ~ ML1 + ML2 + ITSEX + BSBG10A + BSBG07B + BSBG07A, data = Iran_fa_num)
library("QuantPsyc")

lm.beta(model2)

##          ML1          ML2        ITSEX      BSBG10A      BSBG07B 
## -0.005663678 -0.383103322 -0.054519928 -0.044915069  0.167236755 
##      BSBG07A 
##  0.130957784

Self-assessed skill in math (ML2) has the most powerful standardized effect on math scores (beta = -0.38).
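
A standardized coefficient is the raw slope rescaled by the ratio of standard deviations, beta = b * sd(x) / sd(y), which is the same as refitting the model on z-scored variables. A minimal sketch on simulated data (all names hypothetical) that checks the two routes agree:

```r
set.seed(11)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200, sd = 3))
d$y <- 2 * d$x1 + 0.5 * d$x2 + rnorm(200)

# Route 1: rescale the raw slopes
fit <- lm(y ~ x1 + x2, data = d)
b <- coef(fit)[-1]
beta_manual <- b * sapply(d[c("x1", "x2")], sd) / sd(d$y)

# Route 2: fit the same model on z-scored variables
z <- as.data.frame(scale(d))
beta_scaled <- coef(lm(y ~ x1 + x2, data = z))[-1]

rbind(beta_manual, beta_scaled)
```

Because the betas are unit-free, they can be compared across predictors measured on different scales, which is how the ranking of effects above is read.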

1.2 Write a short conclusion with overall interpretation of the results (1 pt)

Thus, whether a student is aware of the significance of the knowledge he has received, and of where he is or is not going to apply it, does not appear to matter for achievement. If a student believes he is unsuccessful in mathematics, then he will indeed cope worse with mathematical tasks. Apparently, this works in Iran as well as in Sweden, and no power of Arabic numerals saves students.

EFA of Iran

Vsevolod Suschevskiy

2019-05-19