The data come from the TIMSS & PIRLS International Study Center. We selected one country for analysis and worked with a subset of its data.
The loaded data initially contained 436 variables and 4745 observations; 24 variables were pre-selected for factor analysis and renamed for simplicity. Observations with missing values (190 rows) were deleted, so the final dataset has 4555 observations.
## [1] 4555 24
## 'data.frame': 4555 obs. of 24 variables:
## $ enjoymath :Class 'haven_labelled' atomic [1:4555] 1 3 2 3 2 1 1 3 2 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\ENJOY LEARNING MATHEMATICS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ wishnotstudy :Class 'haven_labelled' atomic [1:4555] 4 1 3 3 3 4 3 2 2 3 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\WISH HAVE NOT TO STUDY MATH"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ mathboring :Class 'haven_labelled' atomic [1:4555] 4 1 3 3 2 4 4 2 2 3 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\MATH IS BORING"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ learninterest :Class 'haven_labelled' atomic [1:4555] 1 4 2 2 2 1 2 3 3 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\LEARN INTERESTING THINGS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ likemath :Class 'haven_labelled' atomic [1:4555] 1 3 2 3 3 1 1 3 3 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\LIKE MATHEMATICS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ likenumbers :Class 'haven_labelled' atomic [1:4555] 1 4 2 3 3 2 3 3 3 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\LIKE NUMBERS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ likemathproblems:Class 'haven_labelled' atomic [1:4555] 1 3 3 2 3 1 2 3 3 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\LIKE MATH PROBLEMS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ lookforwardmath :Class 'haven_labelled' atomic [1:4555] 1 3 3 2 3 3 2 3 3 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\LOOK FORWARD TO MATH CLASS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ favoritesubj :Class 'haven_labelled' atomic [1:4555] 1 3 2 2 3 1 1 3 3 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\FAVORITE SUBJECT"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ texpect :Class 'haven_labelled' atomic [1:4555] 4 2 3 3 3 3 3 2 4 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER EXPECTS TO DO"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ teasy :Class 'haven_labelled' atomic [1:4555] 4 3 2 2 2 3 3 2 3 4 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER IS EASY TO UNDERSTAND"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ tinterest :Class 'haven_labelled' atomic [1:4555] 4 1 3 2 3 3 3 2 1 4 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\INTERESTED IN WHAT TCHR SAYS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ interesttodo :Class 'haven_labelled' atomic [1:4555] 4 3 4 2 3 2 3 2 4 4 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\INTERESTING THINGS TO DO"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ clearanswer :Class 'haven_labelled' atomic [1:4555] 4 1 2 2 2 2 2 2 2 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER CLEAR ANSWERS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ explaingood :Class 'haven_labelled' atomic [1:4555] 4 3 2 2 2 3 3 2 3 3 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER EXPLAINS GOOD"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ showslearned :Class 'haven_labelled' atomic [1:4555] 4 3 2 2 2 3 2 3 3 3 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER SHOWS LEARNED"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ thingstohelp :Class 'haven_labelled' atomic [1:4555] 4 2 2 2 2 1 2 2 3 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\DIFFERENT THINGS TO HELP"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ howtobetter :Class 'haven_labelled' atomic [1:4555] 4 3 2 2 2 1 2 2 2 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\TELLS HOW TO DO BETTER"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ tlisten :Class 'haven_labelled' atomic [1:4555] 4 1 3 2 2 1 2 2 3 2 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER LISTENS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ wellmath :Class 'haven_labelled' atomic [1:4555] 4 3 3 4 3 3 3 3 4 1 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\USUALLY DO WELL IN MATH"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ difficultmath :Class 'haven_labelled' atomic [1:4555] 4 3 2 3 2 3 3 1 4 4 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\MATHEMATICS IS MORE DIFFICULT"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ notstrong :Class 'haven_labelled' atomic [1:4555] 4 2 2 2 2 4 3 1 1 4 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\MATHEMATICS NOT MY STRENGTH"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ mathquick :Class 'haven_labelled' atomic [1:4555] 4 3 3 3 3 1 3 3 3 3 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\LEARN QUICKLY IN MATHEMATICS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## $ mathnerv :Class 'haven_labelled' atomic [1:4555] 4 1 3 3 3 4 3 1 3 4 ...
## .. ..- attr(*, "label")= chr "MATH\\AGREE\\MAT MAKES NERVOUS"
## .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
## .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
## - attr(*, "na.action")=Class 'exclude' Named int [1:190] 15 40 127 138 148 194 243 247 265 274 ...
## .. ..- attr(*, "names")= chr [1:190] "15" "40" "127" "138" ...
The structure of our data shows that all variables have class haven_labelled, which we will convert to numeric later.
The names of our variables:
## The following objects are masked from dt (pos = 3):
##
## clearanswer, difficultmath, enjoymath, explaingood,
## favoritesubj, howtobetter, interesttodo, learninterest,
## likemath, likemathproblems, likenumbers, lookforwardmath,
## mathboring, mathnerv, mathquick, notstrong, showslearned,
## teasy, texpect, thingstohelp, tinterest, tlisten, wellmath,
## wishnotstudy
Here we see the distribution of all variables. All of them are measured on a scale from 1 to 4, coded as follows:
1: Agree a lot; 2: Agree a little; 3: Disagree a little; 4: Disagree a lot
For most items, the extreme answers 1 and 4 (agree a lot/disagree a lot) are less frequent than the middle answers 2 and 3 (agree a little/disagree a little).
The correlation matrix displays the positive (blue) and negative (red) correlations between our variables. The variables are clearly correlated and form groups, which suggests that they can be combined into factors.
## 'data.frame': 4555 obs. of 24 variables:
## $ enjoymath : num 1 3 2 3 2 1 1 3 2 2 ...
## $ wishnotstudy : num 4 1 3 3 3 4 3 2 2 3 ...
## $ mathboring : num 4 1 3 3 2 4 4 2 2 3 ...
## $ learninterest : num 1 4 2 2 2 1 2 3 3 2 ...
## $ likemath : num 1 3 2 3 3 1 1 3 3 2 ...
## $ likenumbers : num 1 4 2 3 3 2 3 3 3 2 ...
## $ likemathproblems: num 1 3 3 2 3 1 2 3 3 2 ...
## $ lookforwardmath : num 1 3 3 2 3 3 2 3 3 2 ...
## $ favoritesubj : num 1 3 2 2 3 1 1 3 3 2 ...
## $ texpect : num 4 2 3 3 3 3 3 2 4 2 ...
## $ teasy : num 4 3 2 2 2 3 3 2 3 4 ...
## $ tinterest : num 4 1 3 2 3 3 3 2 1 4 ...
## $ interesttodo : num 4 3 4 2 3 2 3 2 4 4 ...
## $ clearanswer : num 4 1 2 2 2 2 2 2 2 2 ...
## $ explaingood : num 4 3 2 2 2 3 3 2 3 3 ...
## $ showslearned : num 4 3 2 2 2 3 2 3 3 3 ...
## $ thingstohelp : num 4 2 2 2 2 1 2 2 3 2 ...
## $ howtobetter : num 4 3 2 2 2 1 2 2 2 2 ...
## $ tlisten : num 4 1 3 2 2 1 2 2 3 2 ...
## $ wellmath : num 4 3 3 4 3 3 3 3 4 1 ...
## $ difficultmath : num 4 3 2 3 2 3 3 1 4 4 ...
## $ notstrong : num 4 2 2 2 2 4 3 1 1 4 ...
## $ mathquick : num 4 3 3 3 3 1 3 3 3 3 ...
## $ mathnerv : num 4 1 3 3 3 4 3 1 3 4 ...
Our variables were converted to numeric for the next step of the analysis.
Since factor analysis (FA) does not work with missing values, we have already excluded them from our data. Now we can conduct FA.
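The conversion step can be sketched in base R as follows. This is a minimal illustration on a toy stand-in, not our actual code: the helper `make_item` and the object names `items` and `dt_num` are hypothetical, and the labelled class is only imitated with `structure()` so that the sketch runs without the haven package.

```r
# Hypothetical helper: imitate a haven_labelled Likert item without loading haven.
make_item <- function(values) {
  structure(values,
            labels = c("Agree a lot" = 1, "Agree a little" = 2,
                       "Disagree a little" = 3, "Disagree a lot" = 4),
            class = "haven_labelled")
}

# First six responses of two items, as shown in the str() output above.
items <- list(
  enjoymath  = make_item(c(1, 3, 2, 3, 2, 1)),
  mathboring = make_item(c(4, 1, 3, 3, 2, 4))
)

# as.numeric() drops the class and label attributes, leaving plain numbers.
dt_num <- as.data.frame(lapply(items, as.numeric))
str(dt_num)
```

After this step each column is a plain `num` vector, which is what the later analysis functions expect.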
How many factors should be extracted?
## Parallel analysis suggests that the number of factors = 5 and the number of components = 3
The plot is hard to interpret, but it does give us the suggested numbers: parallel analysis suggests 5 factors and 3 components.
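To make the logic of parallel analysis more concrete, here is a minimal base R sketch of the component version (Horn's approach): eigenvalues of the observed correlation matrix are compared with the average eigenvalues of random data of the same size. The data are simulated with two known groups of items, and all object names are illustrative; the psych function used for the output above also handles the factor version and may differ in detail.

```r
set.seed(1)
n <- 500

# Simulated items: three load on latent f1, three on latent f2 (an assumption).
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- cbind(
  sapply(1:3, function(i) 0.8 * f1 + 0.6 * rnorm(n)),
  sapply(1:3, function(i) 0.8 * f2 + 0.6 * rnorm(n))
)

# Eigenvalues of the observed correlation matrix.
obs_eig <- eigen(cor(items), only.values = TRUE)$values

# Eigenvalues of 100 random normal data sets with the same dimensions.
rand_eig <- replicate(100, {
  noise <- matrix(rnorm(n * ncol(items)), nrow = n)
  eigen(cor(noise), only.values = TRUE)$values
})
rand_mean <- rowMeans(rand_eig)

# Keep the leading components whose eigenvalue exceeds the random benchmark.
n_components <- which(obs_eig <= rand_mean)[1] - 1
n_components
```

With two clearly separated groups of items, the sketch retains two components; on real survey data the picture is usually less clear-cut, which is why we also compared several solutions by hand.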
## Factor Analysis using method = ml
## Call: fa(r = dt, nfactors = 4, rotate = "varimax", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML2 ML1 ML3 ML4 h2 u2 com
## enjoymath 0.25 0.74 0.29 0.24 0.75 0.25 1.8
## wishnotstudy -0.13 -0.32 -0.20 -0.61 0.53 0.47 1.9
## mathboring -0.21 -0.44 -0.17 -0.65 0.69 0.31 2.2
## learninterest 0.32 0.67 0.19 0.22 0.64 0.36 1.9
## likemath 0.20 0.80 0.39 0.18 0.87 0.13 1.7
## likenumbers 0.23 0.74 0.27 0.17 0.70 0.30 1.6
## likemathproblems 0.19 0.76 0.36 0.18 0.78 0.22 1.7
## lookforwardmath 0.38 0.66 0.15 0.25 0.66 0.34 2.0
## favoritesubj 0.17 0.75 0.44 0.16 0.81 0.19 1.8
## texpect 0.45 0.26 0.16 0.00 0.30 0.70 1.9
## teasy 0.78 0.13 0.08 0.17 0.65 0.35 1.2
## tinterest 0.69 0.29 0.04 0.14 0.59 0.41 1.4
## interesttodo 0.64 0.27 0.08 0.06 0.50 0.50 1.4
## clearanswer 0.73 0.09 0.07 0.07 0.55 0.45 1.1
## explaingood 0.80 0.07 0.04 0.16 0.66 0.34 1.1
## showslearned 0.53 0.14 0.03 -0.02 0.30 0.70 1.1
## thingstohelp 0.75 0.11 0.03 0.09 0.58 0.42 1.1
## howtobetter 0.74 0.10 0.04 0.05 0.56 0.44 1.0
## tlisten 0.68 0.14 0.06 0.04 0.49 0.51 1.1
## wellmath 0.13 0.34 0.66 0.02 0.56 0.44 1.6
## difficultmath 0.04 -0.17 -0.66 -0.19 0.51 0.49 1.3
## notstrong -0.03 -0.41 -0.72 -0.19 0.72 0.28 1.8
## mathquick 0.15 0.38 0.55 0.08 0.48 0.52 2.0
## mathnerv -0.16 -0.36 -0.32 -0.42 0.43 0.57 3.2
##
## ML2 ML1 ML3 ML4
## SS loadings 5.30 4.97 2.60 1.43
## Proportion Var 0.22 0.21 0.11 0.06
## Cumulative Var 0.22 0.43 0.54 0.60
## Proportion Explained 0.37 0.35 0.18 0.10
## Cumulative Proportion 0.37 0.72 0.90 1.00
##
## Mean item complexity = 1.6
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 276 and the objective function was 16.54 with Chi Square of 75155.14
## The degrees of freedom for the model are 186 and the objective function was 0.97
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 4555 with the empirical chi square 1786.77 with prob < 2.9e-259
## The total number of observations was 4555 with Likelihood Chi Square = 4404.41 with prob < 0
##
## Tucker Lewis Index of factoring reliability = 0.916
## RMSEA index = 0.071 and the 90 % confidence intervals are 0.069 0.072
## BIC = 2837.55
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML2 ML1 ML3 ML4
## Correlation of (regression) scores with factors 0.95 0.92 0.86 0.81
## Multiple R square of scores with factors 0.90 0.85 0.75 0.65
## Minimum correlation of possible factor scores 0.81 0.70 0.49 0.30
We tried FA with different numbers of factors (close to 5), with and without rotation. FA with 5 or more factors and varimax rotation gave poor results because no variable loaded on ML5. FA with 5 or more factors and oblimin rotation gave poor results because some factors had only 2 variables. FA with 4 or more factors without rotation left ML3, ML4 and ML5 with no variables.
Hence, our analysis showed that the best solution is FA with 4 factors and varimax rotation, which is the one reported above.
We see that for most variables the communality (h2), the proportion of a variable's variance explained by the factors, is higher than the uniqueness (u2), the unexplained or error variance. The higher the communality and the lower the uniqueness, the better. Here texpect, showslearned, tlisten, mathquick and mathnerv tell us less about the factors, since their uniqueness is higher than their communality. For interesttodo the two are equal (0.50), and for difficultmath the communality is only marginally higher (0.51 vs 0.49).
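As a quick check of how h2 and u2 relate to the loadings, the following base R sketch recomputes them for three items, using the rounded loadings from the output above. The object names are illustrative; for an orthogonal rotation such as varimax, communality is the row sum of squared loadings and uniqueness is one minus that.

```r
# Rounded loadings copied from the factor analysis output (three items only).
loadings <- matrix(
  c( 0.25,  0.74,  0.29,  0.24,
     0.45,  0.26,  0.16,  0.00,
    -0.16, -0.36, -0.32, -0.42),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("enjoymath", "texpect", "mathnerv"),
                  c("ML2", "ML1", "ML3", "ML4"))
)

h2 <- rowSums(loadings^2)  # communality: variance explained by the factors
u2 <- 1 - h2               # uniqueness: variance left unexplained

round(cbind(h2, u2), 2)
```

The results reproduce the h2 and u2 columns for these items: enjoymath is well explained, while texpect and mathnerv have more unique than shared variance.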
For Proportion Var, our rule of thumb is that each factor should explain at least 10% of the variance; a factor that explains less does not explain much. Only ML4 falls below this threshold (6%), which is a weak point of this solution.
For Proportion Explained, the factors should ideally explain roughly equal shares, although the first factor always explains more than the later ones. In our model ML2 explains 37%, ML1 35%, ML3 18% and ML4 10%.
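Both proportions follow directly from the SS loadings row, which the short sketch below illustrates with the reported values. Proportion Var divides each factor's SS loading by the number of items (24), while Proportion Explained divides it by the total of the four SS loadings. Object names are illustrative.

```r
# SS loadings copied from the factor analysis output.
ss_loadings <- c(ML2 = 5.30, ML1 = 4.97, ML3 = 2.60, ML4 = 1.43)
n_items <- 24

prop_var       <- ss_loadings / n_items           # share of total item variance
prop_explained <- ss_loadings / sum(ss_loadings)  # share of the explained variance

round(rbind(prop_var,
            cum_var = cumsum(prop_var),
            prop_explained), 2)
```

The four factors together account for about 60% of the total variance, and ML4 contributes only about 6% of it.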
RMSR = 0.03; it should be close to 0, so this is a good result. RMSEA = 0.071, which is acceptable since it is < 0.08. The Tucker-Lewis Index = 0.916, which is acceptable since it is > 0.9.
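The RMSEA and Tucker-Lewis values can be approximately reproduced from the chi-square statistics in the output. The sketch below uses common textbook formulas; the exact computation inside the psych package may differ slightly, so treat this as an illustration rather than its implementation. Object names are illustrative.

```r
# Values copied from the factor analysis output.
chisq_model <- 4404.41   # likelihood chi-square of the 4-factor model
df_model    <- 186
chisq_null  <- 75155.14  # chi-square of the null model
df_null     <- 276
n_obs       <- 4555

# RMSEA: misfit per degree of freedom, adjusted for sample size.
rmsea <- sqrt(max(chisq_model - df_model, 0) / (df_model * (n_obs - 1)))

# Tucker-Lewis Index: improvement over the null model in chi-square/df terms.
ratio_null  <- chisq_null / df_null
ratio_model <- chisq_model / df_model
tli <- (ratio_null - ratio_model) / (ratio_null - 1)

round(c(RMSEA = rmsea, TLI = tli), 3)
```

Both values agree with the reported indices after rounding, and both sit on the acceptable side of the usual cut-offs (0.08 and 0.9).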
The plot shows our 4 factors. Each of them has 3 or more variables, which is good, since factors with fewer variables do not make sense. Red lines on the plot indicate negative loadings.
Accordingly, we conclude that the results of FA are good and we can go further.
Let us name the factors.
ML1 - sympathy in learning math
ML2 - the role of the teacher in the study of math
ML3 - difficulties in learning math
ML4 - reluctance to learn math
##
## Reliability analysis
## Call: alpha(x = ML1, check.keys = T)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.95 0.95 0.94 0.72 18 0.0012 2.7 0.79 0.71
##
## lower alpha upper 95% confidence boundaries
## 0.95 0.95 0.95
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## likemath 0.93 0.93 0.92 0.70 14 0.0015 0.0018
## likemathproblems 0.94 0.94 0.93 0.71 15 0.0014 0.0035
## favoritesubj 0.94 0.94 0.93 0.72 15 0.0014 0.0028
## likenumbers 0.94 0.94 0.93 0.73 16 0.0013 0.0045
## enjoymath 0.94 0.94 0.93 0.72 15 0.0014 0.0040
## learninterest 0.94 0.94 0.94 0.74 17 0.0012 0.0035
## lookforwardmath 0.94 0.95 0.94 0.74 17 0.0012 0.0034
## med.r
## likemath 0.68
## likemathproblems 0.70
## favoritesubj 0.70
## likenumbers 0.71
## enjoymath 0.69
## learninterest 0.75
## lookforwardmath 0.75
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## likemath 4555 0.93 0.92 0.92 0.90 2.6 0.97
## likemathproblems 4555 0.89 0.89 0.87 0.85 2.6 0.93
## favoritesubj 4555 0.90 0.89 0.88 0.85 2.8 0.99
## likenumbers 4555 0.86 0.87 0.84 0.82 2.9 0.78
## enjoymath 4555 0.88 0.88 0.86 0.84 2.4 0.91
## learninterest 4555 0.83 0.83 0.79 0.77 2.5 0.85
## lookforwardmath 4555 0.82 0.82 0.78 0.76 2.8 0.85
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## likemath 0.15 0.28 0.37 0.20 0
## likemathproblems 0.13 0.30 0.38 0.19 0
## favoritesubj 0.13 0.20 0.38 0.28 0
## likenumbers 0.06 0.20 0.54 0.20 0
## enjoymath 0.16 0.37 0.34 0.13 0
## learninterest 0.12 0.34 0.42 0.12 0
## lookforwardmath 0.07 0.23 0.47 0.22 0
Cronbach’s alpha = 0.95, which is higher than 0.8, so this is a good result. The scale is consistent and reliable, and it can be used as a variable in further analysis. Dropping any item lowers alpha, so every item is needed.
##
## Reliability analysis
## Call: alpha(x = ML2, check.keys = T)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.91 0.91 0.91 0.5 9.8 0.002 2.3 0.59 0.49
##
## lower alpha upper 95% confidence boundaries
## 0.9 0.91 0.91
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## texpect 0.91 0.91 0.91 0.53 10.0 0.0020 0.0087
## teasy 0.89 0.89 0.90 0.48 8.4 0.0024 0.0099
## tinterest 0.90 0.90 0.90 0.49 8.6 0.0023 0.0117
## interesttodo 0.90 0.90 0.90 0.49 8.8 0.0023 0.0129
## clearanswer 0.90 0.90 0.90 0.49 8.7 0.0023 0.0120
## explaingood 0.89 0.89 0.90 0.49 8.5 0.0023 0.0090
## showslearned 0.91 0.91 0.91 0.52 9.7 0.0021 0.0112
## thingstohelp 0.90 0.90 0.90 0.49 8.6 0.0023 0.0114
## howtobetter 0.90 0.90 0.90 0.49 8.6 0.0023 0.0116
## tlisten 0.90 0.90 0.90 0.49 8.8 0.0023 0.0131
## med.r
## texpect 0.52
## teasy 0.48
## tinterest 0.48
## interesttodo 0.49
## clearanswer 0.48
## explaingood 0.49
## showslearned 0.52
## thingstohelp 0.48
## howtobetter 0.48
## tlisten 0.49
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## texpect 4555 0.59 0.59 0.52 0.50 2.8 0.79
## teasy 4555 0.80 0.80 0.79 0.74 2.1 0.82
## tinterest 4555 0.78 0.78 0.76 0.71 2.5 0.87
## interesttodo 4555 0.75 0.75 0.73 0.69 2.8 0.77
## clearanswer 4555 0.76 0.76 0.72 0.69 2.1 0.78
## explaingood 4555 0.79 0.79 0.78 0.73 2.1 0.83
## showslearned 4555 0.62 0.62 0.55 0.53 2.7 0.81
## thingstohelp 4555 0.77 0.78 0.75 0.71 2.0 0.76
## howtobetter 4555 0.77 0.77 0.75 0.71 2.1 0.78
## tlisten 4555 0.75 0.75 0.71 0.68 2.3 0.82
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## texpect 0.05 0.25 0.50 0.19 0
## teasy 0.20 0.52 0.21 0.07 0
## tinterest 0.14 0.38 0.37 0.12 0
## interesttodo 0.05 0.25 0.53 0.17 0
## clearanswer 0.22 0.54 0.18 0.05 0
## explaingood 0.24 0.50 0.19 0.07 0
## showslearned 0.07 0.32 0.46 0.15 0
## thingstohelp 0.23 0.56 0.16 0.05 0
## howtobetter 0.22 0.54 0.19 0.05 0
## tlisten 0.16 0.50 0.26 0.08 0
Cronbach’s alpha = 0.91, which is higher than 0.8, so this is a good result. The scale is consistent and reliable, and it can be used as a variable in further analysis. Dropping almost any item lowers alpha (for texpect and showslearned it stays at 0.91), so the items are worth keeping.
##
## Reliability analysis
## Call: alpha(x = ML3, check.keys = T)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.82 0.83 0.79 0.54 4.8 0.0041 2.3 0.72 0.56
##
## lower alpha upper 95% confidence boundaries
## 0.82 0.82 0.83
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## notstrong 0.75 0.76 0.68 0.51 3.1 0.0064 0.0051
## difficultmath 0.80 0.80 0.73 0.58 4.1 0.0051 0.0010
## wellmath- 0.77 0.77 0.70 0.53 3.4 0.0057 0.0055
## mathquick- 0.79 0.79 0.73 0.56 3.9 0.0052 0.0039
## med.r
## notstrong 0.49
## difficultmath 0.59
## wellmath- 0.54
## mathquick- 0.60
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## notstrong 4555 0.86 0.84 0.78 0.71 2.2 1.03
## difficultmath 4555 0.79 0.78 0.67 0.61 2.5 0.90
## wellmath- 4555 0.82 0.83 0.75 0.67 2.1 0.83
## mathquick- 4555 0.78 0.79 0.69 0.62 2.3 0.79
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## notstrong 0.32 0.30 0.26 0.13 0
## difficultmath 0.16 0.32 0.40 0.12 0
## wellmath 0.06 0.21 0.48 0.25 0
## mathquick 0.07 0.28 0.51 0.14 0
Cronbach’s alpha = 0.82 (standardized alpha = 0.83), which is higher than 0.8, so this is a good result. The scale is consistent and reliable, and it can be used as a variable in further analysis. Dropping any item lowers alpha, so every item is needed. Note that wellmath and mathquick were reverse-keyed automatically (they are marked with a minus sign in the output).
##
## Reliability analysis
## Call: alpha(x = ML4, check.keys = T)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.77 0.77 0.7 0.53 3.3 0.006 2.7 0.74 0.51
##
## lower alpha upper 95% confidence boundaries
## 0.76 0.77 0.78
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## mathboring 0.62 0.62 0.44 0.44 1.6 0.0114 NA
## wishnotstudy 0.68 0.68 0.51 0.51 2.1 0.0095 NA
## mathnerv 0.76 0.76 0.62 0.62 3.3 0.0070 NA
## med.r
## mathboring 0.44
## wishnotstudy 0.51
## mathnerv 0.62
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## mathboring 4555 0.85 0.86 0.76 0.67 2.7 0.86
## wishnotstudy 4555 0.84 0.83 0.71 0.61 2.7 0.92
## mathnerv 4555 0.79 0.79 0.60 0.53 2.8 0.92
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## mathboring 0.11 0.28 0.47 0.15 0
## wishnotstudy 0.12 0.25 0.42 0.20 0
## mathnerv 0.11 0.23 0.44 0.22 0
Cronbach’s alpha = 0.77; it is lower than 0.8 but still acceptable (> 0.7). The scale is consistent and reliable, and it can be used as a variable in further analysis. Dropping any item lowers alpha, so every item is needed.
The reliability of all four scales is good.
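To show where alpha comes from, the sketch below applies the usual formula for standardized alpha, k * r / (1 + (k - 1) * r), to the number of items k and the rounded average inter-item correlation r reported for each scale. The function name `std_alpha` and the table layout are illustrative.

```r
# Standardized Cronbach's alpha from the number of items and the mean correlation.
std_alpha <- function(k, r_bar) k * r_bar / (1 + (k - 1) * r_bar)

# k and average_r copied from the four reliability outputs above.
scales <- data.frame(
  scale     = c("ML1", "ML2", "ML3", "ML4"),
  k         = c(7, 10, 4, 3),
  average_r = c(0.72, 0.50, 0.54, 0.53)
)
scales$std_alpha <- round(std_alpha(scales$k, scales$average_r), 2)
scales
```

Because average_r is itself rounded to two decimals in the output, the recomputed value for ML3 comes out at 0.82 rather than the reported 0.83; the other three scales match the reported std.alpha. The formula also makes clear why ML4 has the lowest alpha: with only three items, the same average correlation yields a less reliable scale.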
Now we can save the factor scores for our regression and add the new control variables.
## [1] "enjoymath" "wishnotstudy" "mathboring"
## [4] "learninterest" "likemath" "likenumbers"
## [7] "likemathproblems" "lookforwardmath" "favoritesubj"
## [10] "texpect" "teasy" "tinterest"
## [13] "interesttodo" "clearanswer" "explaingood"
## [16] "showslearned" "thingstohelp" "howtobetter"
## [19] "tlisten" "wellmath" "difficultmath"
## [22] "notstrong" "mathquick" "mathnerv"
## [25] "ML2" "ML1" "ML3"
## [28] "ML4"
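The step of attaching factor scores to the data can be sketched as follows. This is only an illustration on simulated data: it uses `factanal()` from base R's stats package as a stand-in for the psych-based analysis above, so the score columns are called Factor1 and Factor2 rather than ML1 to ML4, and all variable names are hypothetical.

```r
set.seed(42)
n <- 300

# Simulated items: a1-a3 load on one latent factor, b1-b3 on another.
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a1 = 0.8 * f1 + 0.60 * rnorm(n),
  a2 = 0.7 * f1 + 0.71 * rnorm(n),
  a3 = 0.6 * f1 + 0.80 * rnorm(n),
  b1 = 0.8 * f2 + 0.60 * rnorm(n),
  b2 = 0.7 * f2 + 0.71 * rnorm(n),
  b3 = 0.6 * f2 + 0.80 * rnorm(n)
)

# Maximum-likelihood FA with varimax rotation and regression-based scores.
fit <- factanal(toy, factors = 2, rotation = "varimax", scores = "regression")

# Append one score column per factor to the original variables.
toy_scored <- cbind(toy, fit$scores)
names(toy_scored)
```

The resulting data frame has the original items followed by the score columns, which mirrors the list of 28 names printed above.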
We will predict math achievement from the factors obtained in the previous steps, controlling for gender, parental education and whether the student was born in the country or abroad.
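The four models share the same controls and differ only in the factor that is added, so they can be fitted in one loop. The sketch below does this on simulated data; the coding of the control variables and the simulated effect are assumptions made for illustration only, and the `data =` argument is used instead of the `regression$` prefix that appears in the calls below.

```r
set.seed(7)
n <- 200

# Simulated stand-in for the regression data (codings are assumptions).
regression <- data.frame(
  gender      = sample(1:2, n, replace = TRUE),
  edum        = sample(1:7, n, replace = TRUE),
  eduf        = sample(1:7, n, replace = TRUE),
  borncountry = sample(1:2, n, replace = TRUE, prob = c(0.95, 0.05)),
  ML1 = rnorm(n), ML2 = rnorm(n), ML3 = rnorm(n), ML4 = rnorm(n)
)
regression$mathachiev <- 550 - 30 * regression$ML3 + rnorm(n, sd = 80)

controls     <- c("gender", "edum", "eduf", "borncountry")
factor_names <- c("ML1", "ML2", "ML3", "ML4")

# One model per factor: mathachiev ~ controls + factor.
models <- lapply(factor_names, function(f) {
  lm(reformulate(c(controls, f), response = "mathachiev"), data = regression)
})
names(models) <- paste0("model", 1:4)

# Compare the models by AIC (lower is better).
sapply(models, AIC)
```

Using `data =` keeps the coefficient names short (for example `ML3` instead of `regression$ML3`) and makes the models easier to reuse for prediction.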
Model 1: ML1 as predictor.
##
## Call:
## lm(formula = regression$mathachiev ~ regression$gender + regression$edum +
## regression$eduf + regression$borncountry + regression$ML1)
##
## Residuals:
## <Labelled double>: 1ST PLAUSIBLE VALUE MATHEMATICS
## Min 1Q Median 3Q Max
## -312.070 -53.576 0.293 57.018 280.527
##
## Labels:
## value label
## 999 Omitted or invalid
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 596.8023 14.2690 41.825 < 2e-16 ***
## regression$gender -7.6913 2.5088 -3.066 0.00218 **
## regression$edum 0.3310 0.7510 0.441 0.65943
## regression$eduf 1.3327 0.7355 1.812 0.07007 .
## regression$borncountry -7.6325 13.1414 -0.581 0.56141
## regression$ML1 -28.5541 1.3546 -21.080 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 83.76 on 4549 degrees of freedom
## Multiple R-squared: 0.09075, Adjusted R-squared: 0.08975
## F-statistic: 90.8 on 5 and 4549 DF, p-value: < 2.2e-16
The p-value of the F-test is less than 0.05, so the model as a whole is statistically significant. R-squared measures how close the data are to the fitted regression line; the higher it is, the better the model fits the data. This model explains about 9.1% of the variability of the response around its mean (R-squared = 0.09075). Gender and ML1 are significant, whereas parental education and country of birth are not. Explanatory power is weak.
The intercept is 596.802, which means that if all predictors are equal to 0, the predicted math achievement is 596.802. ML1 is sympathy in learning math, and the model shows that it has a significant effect on math achievement: a one-unit increase in ML1 is associated with a score about 28.6 points lower.
Model 2: ML2 as predictor.
##
## Call:
## lm(formula = regression$mathachiev ~ regression$gender + regression$edum +
## regression$eduf + regression$borncountry + regression$ML2)
##
## Residuals:
## <Labelled double>: 1ST PLAUSIBLE VALUE MATHEMATICS
## Min 1Q Median 3Q Max
## -335.95 -56.80 1.76 59.94 267.94
##
## Labels:
## value label
## 999 Omitted or invalid
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 584.3310 14.8449 39.362 < 2e-16 ***
## regression$gender -2.0119 2.5962 -0.775 0.438
## regression$edum 1.0557 0.7820 1.350 0.177
## regression$eduf 1.2066 0.7659 1.575 0.115
## regression$borncountry -6.7636 13.6831 -0.494 0.621
## regression$ML2 -10.2510 1.3604 -7.535 5.84e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 87.21 on 4549 degrees of freedom
## Multiple R-squared: 0.01423, Adjusted R-squared: 0.01315
## F-statistic: 13.14 on 5 and 4549 DF, p-value: 1.004e-12
The p-value of the F-test is less than 0.05, so the model as a whole is statistically significant. This model explains about 1.4% of the variability of the response around its mean (R-squared = 0.01423). ML2 is significant, whereas gender, parental education and country of birth are not. Explanatory power is very weak.
The intercept is 584.331, which means that if all predictors are equal to 0, the predicted math achievement is 584.331. ML2 is the role of the teacher in the study of math, and the model shows that it has a significant effect on math achievement.
Model 3: ML3 as predictor.
##
## Call:
## lm(formula = regression$mathachiev ~ regression$gender + regression$edum +
## regression$eduf + regression$borncountry + regression$ML3)
##
## Residuals:
## <Labelled double>: 1ST PLAUSIBLE VALUE MATHEMATICS
## Min 1Q Median 3Q Max
## -327.96 -48.61 2.26 50.44 268.30
##
## Labels:
## value label
## 999 Omitted or invalid
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 588.9537 13.3699 44.051 < 2e-16 ***
## regression$gender -13.1268 2.3621 -5.557 2.9e-08 ***
## regression$edum 0.5002 0.7039 0.711 0.4774
## regression$eduf 1.7220 0.6899 2.496 0.0126 *
## regression$borncountry 5.0055 12.3277 0.406 0.6847
## regression$ML3 -45.8227 1.3634 -33.610 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 78.54 on 4549 degrees of freedom
## Multiple R-squared: 0.2005, Adjusted R-squared: 0.1996
## F-statistic: 228.1 on 5 and 4549 DF, p-value: < 2.2e-16
The p-value of the F-test is less than 0.05, so the model as a whole is statistically significant. This model explains about 20% of the variability of the response around its mean (R-squared = 0.2005). ML3 is significant, as are gender and the father's education, whereas the mother's education and country of birth are not. Explanatory power is better than in the previous models.
The intercept is 588.954, which means that if all predictors are equal to 0, the predicted math achievement is 588.954. ML3 is difficulties in learning math, and the model shows that it has a significant effect on math achievement.
Model 4: ML4 as predictor.
##
## Call:
## lm(formula = regression$mathachiev ~ regression$gender + regression$edum +
## regression$eduf + regression$borncountry + regression$ML4)
##
## Residuals:
## <Labelled double>: 1ST PLAUSIBLE VALUE MATHEMATICS
## Min 1Q Median 3Q Max
## -337.92 -55.84 1.92 60.10 275.81
##
## Labels:
## value label
## 999 Omitted or invalid
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 584.6347 14.9403 39.131 <2e-16 ***
## regression$gender -1.8269 2.6132 -0.699 0.4845
## regression$edum 0.8427 0.7866 1.071 0.2841
## regression$eduf 1.2697 0.7706 1.648 0.0995 .
## regression$borncountry -6.5698 13.7682 -0.477 0.6333
## regression$ML4 -0.4349 1.6160 -0.269 0.7878
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 87.75 on 4549 degrees of freedom
## Multiple R-squared: 0.001944, Adjusted R-squared: 0.0008468
## F-statistic: 1.772 on 5 and 4549 DF, p-value: 0.115
The p-value of the F-test is higher than 0.05 (p = 0.115), so the model as a whole is not statistically significant. It explains only about 0.2% of the variability of the response around its mean (R-squared = 0.001944). Neither ML4 nor any of the other variables is significant. Explanatory power is poor.
The intercept is 584.635, which means that if all predictors are equal to 0, the predicted math achievement is 584.635. ML4 is reluctance to learn math, and the model shows that it has no significant effect on math achievement.
## [1] 53272.72
## [1] 53640.74
## [1] 52686.93
## [1] 53697.17
We see that the AIC for model 3 is the lowest (52686.93), which means that this model is the best of the four for explaining math achievement.
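The comparison can be summarised in one small table. The sketch below collects the R-squared and AIC values reported above (assuming the four AIC values were printed in model order, which matches the value quoted for model 3), converts R-squared from a proportion to a percentage, and ranks the models by AIC. Object names are illustrative.

```r
# Values copied from the four regression outputs and the AIC lines above.
fit <- data.frame(
  model     = paste0("model", 1:4),
  factor    = c("ML1", "ML2", "ML3", "ML4"),
  r_squared = c(0.09075, 0.01423, 0.2005, 0.001944),
  aic       = c(53272.72, 53640.74, 52686.93, 53697.17)
)

# R-squared is a proportion, so multiply by 100 to read it as a percentage.
fit$r_squared_pct <- round(100 * fit$r_squared, 1)

# Difference to the best (lowest) AIC.
fit$delta_aic <- fit$aic - min(fit$aic)

fit[order(fit$aic), ]
```

The ranking by AIC is the same as the ranking by R-squared here, because all four models have the same number of parameters and were fitted to the same observations.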
First, we check our model for multicollinearity, that is, collinearity among three or more variables. For this we use the variance inflation factor (VIF), which measures how much the variance of a regression coefficient is inflated due to multicollinearity in the model.
## regression$gender regression$edum regression$eduf
## 1.029326 1.432432 1.423156
## regression$borncountry regression$ML3
## 1.001023 1.022360
The check showed that the VIF values of all predictors are below 5 (in fact, all are below 1.5), which is fine. There is no multicollinearity problem in our model.
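The definition of the VIF can be illustrated by computing it by hand: each predictor is regressed on all the other predictors, and VIF = 1 / (1 - R-squared) of that auxiliary regression. The sketch below does this on simulated predictors in which the two parental-education variables are moderately correlated; the data and names are assumptions for illustration, and the values printed above come from the car package.

```r
set.seed(3)
n <- 500

# Simulated predictors: edum and eduf are correlated, the others are independent.
edum   <- rnorm(n)
eduf   <- 0.5 * edum + rnorm(n, sd = sqrt(0.75))
gender <- rbinom(n, 1, 0.5)
ML3    <- rnorm(n)
predictors <- data.frame(gender, edum, eduf, ML3)

# VIF of each predictor from its auxiliary regression on the remaining ones.
vif_by_hand <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 2)
```

As in our model, the correlated education variables get somewhat higher values than the rest, but everything stays far below the threshold of 5.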
The next step in model diagnostics is a look at the residuals and leverage.
## [1] 846 1348
The first plot shows that the residuals and the fitted values are uncorrelated, as they should be in a homoscedastic linear model with normally distributed errors. So there is no sign of heteroscedasticity.
The Q-Q plot shows that the distributions match closely: the points follow the dotted line, except for observations 846 and 1348, so the residuals are approximately normally distributed. That is acceptable, and the residuals pass this visual check of normality.
The graphs show that we have outliers but no high-leverage points. No points lie beyond the Cook’s distance lines, which means there are no influential observations, which is good.
## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
## rstudent unadjusted p-value Bonferonni p
## 846 -4.185481 2.8994e-05 0.13207
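Observation 846 has the largest studentized residual (-4.19), but its Bonferroni-adjusted p-value is 0.132, which is above 0.05. The test is therefore not statistically significant: observation 846 is not a significant outlier and does not unduly influence the regression line.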
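The Bonferroni adjustment in this output is simple arithmetic: the unadjusted p-value of the largest studentized residual is multiplied by the number of observations, because every observation is a candidate outlier. A minimal sketch with the reported values (object names are illustrative):

```r
n_obs        <- 4555        # observations in the model
p_unadjusted <- 2.8994e-05  # unadjusted p-value for observation 846

# Bonferroni correction: multiply by the number of tests, capped at 1.
p_bonferroni <- min(1, n_obs * p_unadjusted)
round(p_bonferroni, 5)

# Flag as a significant outlier only if the adjusted p-value is below 0.05.
is_outlier <- p_bonferroni < 0.05
is_outlier
```

The adjusted value matches the 0.13207 shown in the output and stays above 0.05, so the observation is not flagged.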
The distribution of the studentized residuals is approximately normal.
The model diagnostics results are good.
All in all, the factor analysis worked quite well and produced logically interpretable factors. We then built 4 models to see which factor best predicts math achievement. Although all models have weak explanatory power, the model with ML3, the difficulties in learning math factor (which contains variables about how difficult it is for a child to learn math and how quickly and well they do it), explains mathematics achievement best. The model with ML4, the reluctance to learn math factor, was not significant at all and did not explain mathematics achievement. The best model was checked with diagnostics and showed good results.