Introduction

Project author: Lamova Tamara

Project co-authors: Likhodievskaya Yulya, Ryabova Anastasia.

TIMSS (Trends in International Mathematics and Science Study) is an international assessment of mathematics and science at the fourth and eighth grades. In 2015, 57 countries participated in TIMSS. Questionnaires were completed by students, parents, teachers, school principals, and curriculum specialists. The questionnaire data are publicly available.

For our project we chose the data for Japan.

The aim of the research project is to understand how the questionnaire items group into factors and how these factors relate to academic achievement.

Research question: Which factor best describes the math achievement of Japanese students?

Data description

The data come from the TIMSS & PIRLS International Study Center. We selected the country of interest and worked with a subset of its data.


The loaded data initially contained 436 variables and 4745 observations; we pre-selected 24 variables for factor analysis and renamed them for simplicity. Observations with missing values (NA) were deleted, so our final dataset has 4555 observations.

## [1] 4555   24
## 'data.frame':    4555 obs. of  24 variables:
##  $ enjoymath       :Class 'haven_labelled'  atomic [1:4555] 1 3 2 3 2 1 1 3 2 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\ENJOY LEARNING MATHEMATICS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ wishnotstudy    :Class 'haven_labelled'  atomic [1:4555] 4 1 3 3 3 4 3 2 2 3 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\WISH HAVE NOT TO STUDY MATH"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ mathboring      :Class 'haven_labelled'  atomic [1:4555] 4 1 3 3 2 4 4 2 2 3 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\MATH IS BORING"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ learninterest   :Class 'haven_labelled'  atomic [1:4555] 1 4 2 2 2 1 2 3 3 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\LEARN INTERESTING THINGS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ likemath        :Class 'haven_labelled'  atomic [1:4555] 1 3 2 3 3 1 1 3 3 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\LIKE MATHEMATICS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ likenumbers     :Class 'haven_labelled'  atomic [1:4555] 1 4 2 3 3 2 3 3 3 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\LIKE NUMBERS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ likemathproblems:Class 'haven_labelled'  atomic [1:4555] 1 3 3 2 3 1 2 3 3 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\LIKE MATH PROBLEMS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ lookforwardmath :Class 'haven_labelled'  atomic [1:4555] 1 3 3 2 3 3 2 3 3 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\LOOK FORWARD TO MATH CLASS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ favoritesubj    :Class 'haven_labelled'  atomic [1:4555] 1 3 2 2 3 1 1 3 3 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\FAVORITE SUBJECT"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ texpect         :Class 'haven_labelled'  atomic [1:4555] 4 2 3 3 3 3 3 2 4 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER EXPECTS TO DO"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ teasy           :Class 'haven_labelled'  atomic [1:4555] 4 3 2 2 2 3 3 2 3 4 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER IS EASY TO UNDERSTAND"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ tinterest       :Class 'haven_labelled'  atomic [1:4555] 4 1 3 2 3 3 3 2 1 4 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\INTERESTED IN WHAT TCHR SAYS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ interesttodo    :Class 'haven_labelled'  atomic [1:4555] 4 3 4 2 3 2 3 2 4 4 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\INTERESTING THINGS TO DO"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ clearanswer     :Class 'haven_labelled'  atomic [1:4555] 4 1 2 2 2 2 2 2 2 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER CLEAR ANSWERS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ explaingood     :Class 'haven_labelled'  atomic [1:4555] 4 3 2 2 2 3 3 2 3 3 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER EXPLAINS GOOD"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ showslearned    :Class 'haven_labelled'  atomic [1:4555] 4 3 2 2 2 3 2 3 3 3 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER SHOWS LEARNED"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ thingstohelp    :Class 'haven_labelled'  atomic [1:4555] 4 2 2 2 2 1 2 2 3 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\DIFFERENT THINGS TO HELP"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ howtobetter     :Class 'haven_labelled'  atomic [1:4555] 4 3 2 2 2 1 2 2 2 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\TELLS HOW TO DO BETTER"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ tlisten         :Class 'haven_labelled'  atomic [1:4555] 4 1 3 2 2 1 2 2 3 2 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\TEACHER LISTENS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ wellmath        :Class 'haven_labelled'  atomic [1:4555] 4 3 3 4 3 3 3 3 4 1 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\USUALLY DO WELL IN MATH"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ difficultmath   :Class 'haven_labelled'  atomic [1:4555] 4 3 2 3 2 3 3 1 4 4 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\MATHEMATICS IS MORE DIFFICULT"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ notstrong       :Class 'haven_labelled'  atomic [1:4555] 4 2 2 2 2 4 3 1 1 4 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\MATHEMATICS NOT MY STRENGTH"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ mathquick       :Class 'haven_labelled'  atomic [1:4555] 4 3 3 3 3 1 3 3 3 3 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\LEARN QUICKLY IN MATHEMATICS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  $ mathnerv        :Class 'haven_labelled'  atomic [1:4555] 4 1 3 3 3 4 3 1 3 4 ...
##   .. ..- attr(*, "label")= chr "MATH\\AGREE\\MAT MAKES NERVOUS"
##   .. ..- attr(*, "labels")= Named num [1:5] 1 2 3 4 9
##   .. .. ..- attr(*, "names")= chr [1:5] "Agree a lot" "Agree a little" "Disagree a little" "Disagree a lot" ...
##  - attr(*, "na.action")=Class 'exclude'  Named int [1:190] 15 40 127 138 148 194 243 247 265 274 ...
##   .. ..- attr(*, "names")= chr [1:190] "15" "40" "127" "138" ...
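
The row counts above can be checked with a small sketch. The toy data frame below is made up for illustration; only the counts 4745 and 190 come from the output above, where the `na.action` attribute lists 190 dropped rows. We assume here that rows were dropped with `na.omit()`; `na.exclude()` removes the same rows.

```r
# Toy stand-in for the questionnaire data; the values are made up.
toy <- data.frame(
  enjoymath  = c(1, 3, NA, 2, 4),
  mathboring = c(4, 1, 3, NA, 2),
  likemath   = c(1, 3, 2, 3, 1)
)

# Drop every row that has at least one missing answer.
toy_complete <- na.omit(toy)

dim(toy)           # rows and columns before
dim(toy_complete)  # rows and columns after

# R records which rows were dropped in the "na.action" attribute.
dropped <- attr(toy_complete, "na.action")
length(dropped)

# The same bookkeeping for the real data, using the counts reported above.
4745 - 190
```

The last line reproduces the 4555 observations of the final dataset.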

The structure of our data shows that all variables have the class haven_labelled; we convert them to numeric later.

The names of our variables:

## The following objects are masked from dt (pos = 3):
## 
##     clearanswer, difficultmath, enjoymath, explaingood,
##     favoritesubj, howtobetter, interesttodo, learninterest,
##     likemath, likemathproblems, likenumbers, lookforwardmath,
##     mathboring, mathnerv, mathquick, notstrong, showslearned,
##     teasy, texpect, thingstohelp, tinterest, tlisten, wellmath,
##     wishnotstudy


Here we see the distribution of all variables. Each variable is measured on a scale from 1 to 4, coded as follows:

1: Agree a lot; 2: Agree a little; 3: Disagree a little; 4: Disagree a lot

For most items, the extreme answers 1 and 4 (agree a lot/disagree a lot) are less frequent than the moderate answers 2 and 3 (agree a little/disagree a little).

Correlation matrix

The correlation matrix displays the positive (blue) and negative (red) correlations between our variables. Many variables are clearly correlated with each other, so we can group them into factors.
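
As a minimal sketch of how such a matrix can be computed, the example below simulates three items and uses base R's `cor()`. The data are simulated, and Pearson correlation is an assumption; the report does not state which correlation type was used for the plot.

```r
set.seed(1)

# One simulated trait drives all three items; mathboring is reverse-worded.
liking <- rnorm(200)
toy <- data.frame(
  likemath   =  liking + rnorm(200, sd = 0.5),
  enjoymath  =  liking + rnorm(200, sd = 0.5),
  mathboring = -liking + rnorm(200, sd = 0.5)
)

# Pairwise Pearson correlations between all items.
r <- cor(toy)
round(r, 2)
```

Items that measure the same thing correlate positively with each other and negatively with the reverse-worded item, which is the pattern of blue and red cells described above.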

## 'data.frame':    4555 obs. of  24 variables:
##  $ enjoymath       : num  1 3 2 3 2 1 1 3 2 2 ...
##  $ wishnotstudy    : num  4 1 3 3 3 4 3 2 2 3 ...
##  $ mathboring      : num  4 1 3 3 2 4 4 2 2 3 ...
##  $ learninterest   : num  1 4 2 2 2 1 2 3 3 2 ...
##  $ likemath        : num  1 3 2 3 3 1 1 3 3 2 ...
##  $ likenumbers     : num  1 4 2 3 3 2 3 3 3 2 ...
##  $ likemathproblems: num  1 3 3 2 3 1 2 3 3 2 ...
##  $ lookforwardmath : num  1 3 3 2 3 3 2 3 3 2 ...
##  $ favoritesubj    : num  1 3 2 2 3 1 1 3 3 2 ...
##  $ texpect         : num  4 2 3 3 3 3 3 2 4 2 ...
##  $ teasy           : num  4 3 2 2 2 3 3 2 3 4 ...
##  $ tinterest       : num  4 1 3 2 3 3 3 2 1 4 ...
##  $ interesttodo    : num  4 3 4 2 3 2 3 2 4 4 ...
##  $ clearanswer     : num  4 1 2 2 2 2 2 2 2 2 ...
##  $ explaingood     : num  4 3 2 2 2 3 3 2 3 3 ...
##  $ showslearned    : num  4 3 2 2 2 3 2 3 3 3 ...
##  $ thingstohelp    : num  4 2 2 2 2 1 2 2 3 2 ...
##  $ howtobetter     : num  4 3 2 2 2 1 2 2 2 2 ...
##  $ tlisten         : num  4 1 3 2 2 1 2 2 3 2 ...
##  $ wellmath        : num  4 3 3 4 3 3 3 3 4 1 ...
##  $ difficultmath   : num  4 3 2 3 2 3 3 1 4 4 ...
##  $ notstrong       : num  4 2 2 2 2 4 3 1 1 4 ...
##  $ mathquick       : num  4 3 3 3 3 1 3 3 3 3 ...
##  $ mathnerv        : num  4 1 3 3 3 4 3 1 3 4 ...

Our variables were recoded as numeric for the next step of the analysis.
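
One way to do this conversion is sketched below. The vector is built by hand to mimic the structure printed earlier, with the class name set manually so that the haven package is not needed; the real code may have used a different helper.

```r
# A hand-made vector that mimics the labelled structure shown earlier.
x <- structure(
  c(1, 3, 2, 3, 2),
  label  = "MATH\\AGREE\\ENJOY LEARNING MATHEMATICS",
  labels = c("Agree a lot" = 1, "Agree a little" = 2,
             "Disagree a little" = 3, "Disagree a lot" = 4),
  class  = "haven_labelled"
)

# as.numeric() drops the class and the label attributes and keeps the codes.
dt_num <- as.data.frame(lapply(list(enjoymath = x, likemath = x), as.numeric))
str(dt_num)
```

After this step every column is a plain numeric vector, as in the structure printed above.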

Factor Analysis

Since factor analysis (FA) does not work with missing values (NA), we have already excluded them from our data. Now we can conduct FA.

How many factors should be extracted?


## Parallel analysis suggests that the number of factors =  5  and the number of components =  3

The plot is hard to interpret, but we only need the suggested number of factors. Parallel analysis suggests that the number of factors = 5 and the number of components = 3.
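
The idea behind parallel analysis can be sketched in base R. This is a simplified version for components only (Horn's method): it keeps the leading components whose eigenvalues exceed the average eigenvalues of random data of the same size. The function name `parallel_components` and the simulated data are assumptions made for this illustration; it is not the routine that produced the message above, which also reports a factor solution.

```r
parallel_components <- function(data, n_sim = 50) {
  n <- nrow(data)
  p <- ncol(data)

  # Eigenvalues of the observed correlation matrix.
  observed <- eigen(cor(data), only.values = TRUE)$values

  # Eigenvalues of correlation matrices of pure-noise data of the same size.
  simulated <- replicate(n_sim, {
    noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(noise), only.values = TRUE)$values
  })
  reference <- rowMeans(simulated)

  # Count the leading components that beat the random reference.
  keep <- observed > reference
  if (all(keep)) p else which(!keep)[1] - 1
}

# Simulated data with two independent blocks of three items each.
set.seed(42)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a1 = 0.8 * f1 + 0.6 * rnorm(n),
  a2 = 0.8 * f1 + 0.6 * rnorm(n),
  a3 = 0.8 * f1 + 0.6 * rnorm(n),
  b1 = 0.8 * f2 + 0.6 * rnorm(n),
  b2 = 0.8 * f2 + 0.6 * rnorm(n),
  b3 = 0.8 * f2 + 0.6 * rnorm(n)
)

parallel_components(toy)
```

With two clearly separated blocks of items, the sketch recovers two components.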

## Factor Analysis using method =  ml
## Call: fa(r = dt, nfactors = 4, rotate = "varimax", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                    ML2   ML1   ML3   ML4   h2   u2 com
## enjoymath         0.25  0.74  0.29  0.24 0.75 0.25 1.8
## wishnotstudy     -0.13 -0.32 -0.20 -0.61 0.53 0.47 1.9
## mathboring       -0.21 -0.44 -0.17 -0.65 0.69 0.31 2.2
## learninterest     0.32  0.67  0.19  0.22 0.64 0.36 1.9
## likemath          0.20  0.80  0.39  0.18 0.87 0.13 1.7
## likenumbers       0.23  0.74  0.27  0.17 0.70 0.30 1.6
## likemathproblems  0.19  0.76  0.36  0.18 0.78 0.22 1.7
## lookforwardmath   0.38  0.66  0.15  0.25 0.66 0.34 2.0
## favoritesubj      0.17  0.75  0.44  0.16 0.81 0.19 1.8
## texpect           0.45  0.26  0.16  0.00 0.30 0.70 1.9
## teasy             0.78  0.13  0.08  0.17 0.65 0.35 1.2
## tinterest         0.69  0.29  0.04  0.14 0.59 0.41 1.4
## interesttodo      0.64  0.27  0.08  0.06 0.50 0.50 1.4
## clearanswer       0.73  0.09  0.07  0.07 0.55 0.45 1.1
## explaingood       0.80  0.07  0.04  0.16 0.66 0.34 1.1
## showslearned      0.53  0.14  0.03 -0.02 0.30 0.70 1.1
## thingstohelp      0.75  0.11  0.03  0.09 0.58 0.42 1.1
## howtobetter       0.74  0.10  0.04  0.05 0.56 0.44 1.0
## tlisten           0.68  0.14  0.06  0.04 0.49 0.51 1.1
## wellmath          0.13  0.34  0.66  0.02 0.56 0.44 1.6
## difficultmath     0.04 -0.17 -0.66 -0.19 0.51 0.49 1.3
## notstrong        -0.03 -0.41 -0.72 -0.19 0.72 0.28 1.8
## mathquick         0.15  0.38  0.55  0.08 0.48 0.52 2.0
## mathnerv         -0.16 -0.36 -0.32 -0.42 0.43 0.57 3.2
## 
##                        ML2  ML1  ML3  ML4
## SS loadings           5.30 4.97 2.60 1.43
## Proportion Var        0.22 0.21 0.11 0.06
## Cumulative Var        0.22 0.43 0.54 0.60
## Proportion Explained  0.37 0.35 0.18 0.10
## Cumulative Proportion 0.37 0.72 0.90 1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 4 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  16.54 with Chi Square of  75155.14
## The degrees of freedom for the model are 186  and the objective function was  0.97 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic number of observations is  4555 with the empirical chi square  1786.77  with prob <  2.9e-259 
## The total number of observations was  4555  with Likelihood Chi Square =  4404.41  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.916
## RMSEA index =  0.071  and the 90 % confidence intervals are  0.069 0.072
## BIC =  2837.55
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML2  ML1  ML3  ML4
## Correlation of (regression) scores with factors   0.95 0.92 0.86 0.81
## Multiple R square of scores with factors          0.90 0.85 0.75 0.65
## Minimum correlation of possible factor scores     0.81 0.70 0.49 0.30

We tried FA with different numbers of factors (close to 5), with and without rotation. FA with 5 or more factors and varimax rotation gave poor results, since no variable loaded on ML5. FA with 5 or more factors and oblimin rotation gave poor results, since some factors had only 2 variables. FA with 4 or more factors without rotation left ML3, ML4 and ML5 with no variables.

Hence, the best FA result is obtained with 4 factors and varimax rotation, which is the solution shown above.

Communality (h2) is the proportion of each variable's variance that can be explained by the factors; uniqueness (u2) is the remaining, unexplained proportion. The higher the communality and the lower the uniqueness, the better. For most variables communality is higher than uniqueness. The exceptions are texpect, showslearned, tlisten, mathquick and mathnerv, whose uniqueness exceeds their communality, and interesttodo, for which the two are equal (0.50); these variables tell us less about the factors.
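
To make these quantities concrete, the sketch below recomputes them for enjoymath from its four loadings in the table above. It relies on the factors being orthogonal (varimax), in which case the communality is the sum of squared loadings. The `com` column is assumed to be Hofmann's complexity index.

```r
# Varimax loadings of enjoymath, copied from the loadings table.
loadings_enjoymath <- c(ML2 = 0.25, ML1 = 0.74, ML3 = 0.29, ML4 = 0.24)

h2  <- sum(loadings_enjoymath^2)   # communality
u2  <- 1 - h2                      # uniqueness

# Hofmann's complexity: 1 means the item loads on a single factor only.
com <- sum(loadings_enjoymath^2)^2 / sum(loadings_enjoymath^4)

round(c(h2 = h2, u2 = u2, com = com), 2)
```

The rounded values agree with the h2 = 0.75, u2 = 0.25 and com = 1.8 reported for enjoymath.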

For Proportion Var, each factor should explain at least 10% of the total variance; a factor that explains less does not explain much. Only ML4 falls below this threshold (6%), which is a weak point of this EFA solution.

For Proportion Explained, the factors should ideally explain roughly equal shares, although the first extracted factor always explains more than the following ones. In our model ML2 explains 37%, ML1 35%, ML3 18% and ML4 10%.
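
Both proportions follow directly from the SS loadings row of the output. A short sketch of the arithmetic, using only the reported values and the 24 items:

```r
# SS loadings per factor, copied from the output above.
ss <- c(ML2 = 5.30, ML1 = 4.97, ML3 = 2.60, ML4 = 1.43)
n_items <- 24

prop_var       <- ss / n_items   # share of the total variance of all items
prop_explained <- ss / sum(ss)   # share of the variance the factors explain

round(rbind(prop_var,
            cum_var = cumsum(prop_var),
            prop_explained), 2)
```

Proportion Var divides by the number of items, because each standardized item contributes one unit of variance, while Proportion Explained divides by the variance captured by the four factors together.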

RMSR = 0.03; it should be close to 0, so this is a good result. RMSEA index = 0.071, which is acceptable since it is < 0.08. Tucker Lewis Index = 0.916, which is acceptable since it is > 0.9.
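
The two indices can be recomputed from the chi-square values in the output using the common textbook formulas. This is a sketch; the software's internal formulas may differ in small details such as using N or N - 1.

```r
# Values copied from the factor analysis output above.
chi_null  <- 75155.14; df_null  <- 276
chi_model <- 4404.41;  df_model <- 186
n_obs     <- 4555

# RMSEA: misfit per degree of freedom, scaled by sample size.
rmsea <- sqrt(max(chi_model / df_model - 1, 0) / (n_obs - 1))

# Tucker-Lewis Index: improvement over the null model per degree of freedom.
tli <- (chi_null / df_null - chi_model / df_model) / (chi_null / df_null - 1)

round(c(RMSEA = rmsea, TLI = tli), 3)
```

Both values match the reported RMSEA of 0.071 and Tucker Lewis Index of 0.916.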

The plot shows our 4 factors. Each of them has three or more variables, which is good, since factors with fewer variables do not make sense. Red lines on the plot indicate negative loadings.

Accordingly, we conclude that the results of FA are good and we can go further.

Let us name the factors.

ML1 - sympathy in learning math
ML2 - the role of the teacher in the study of math
ML3 - difficulties in learning math
ML4 - reluctance to learn math

Internal consistency: Cronbach’s alpha

## 
## Reliability analysis   
## Call: alpha(x = ML1, check.keys = T)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.95      0.95    0.94      0.72  18 0.0012  2.7 0.79     0.71
## 
##  lower alpha upper     95% confidence boundaries
## 0.95 0.95 0.95 
## 
##  Reliability if an item is dropped:
##                  raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r
## likemath              0.93      0.93    0.92      0.70  14   0.0015 0.0018
## likemathproblems      0.94      0.94    0.93      0.71  15   0.0014 0.0035
## favoritesubj          0.94      0.94    0.93      0.72  15   0.0014 0.0028
## likenumbers           0.94      0.94    0.93      0.73  16   0.0013 0.0045
## enjoymath             0.94      0.94    0.93      0.72  15   0.0014 0.0040
## learninterest         0.94      0.94    0.94      0.74  17   0.0012 0.0035
## lookforwardmath       0.94      0.95    0.94      0.74  17   0.0012 0.0034
##                  med.r
## likemath          0.68
## likemathproblems  0.70
## favoritesubj      0.70
## likenumbers       0.71
## enjoymath         0.69
## learninterest     0.75
## lookforwardmath   0.75
## 
##  Item statistics 
##                     n raw.r std.r r.cor r.drop mean   sd
## likemath         4555  0.93  0.92  0.92   0.90  2.6 0.97
## likemathproblems 4555  0.89  0.89  0.87   0.85  2.6 0.93
## favoritesubj     4555  0.90  0.89  0.88   0.85  2.8 0.99
## likenumbers      4555  0.86  0.87  0.84   0.82  2.9 0.78
## enjoymath        4555  0.88  0.88  0.86   0.84  2.4 0.91
## learninterest    4555  0.83  0.83  0.79   0.77  2.5 0.85
## lookforwardmath  4555  0.82  0.82  0.78   0.76  2.8 0.85
## 
## Non missing response frequency for each item
##                     1    2    3    4 miss
## likemath         0.15 0.28 0.37 0.20    0
## likemathproblems 0.13 0.30 0.38 0.19    0
## favoritesubj     0.13 0.20 0.38 0.28    0
## likenumbers      0.06 0.20 0.54 0.20    0
## enjoymath        0.16 0.37 0.34 0.13    0
## learninterest    0.12 0.34 0.42 0.12    0
## lookforwardmath  0.07 0.23 0.47 0.22    0

Cronbach’s alpha = 0.95, which is higher than 0.8 and therefore a good result. The scale is consistent and reliable, and it can be used as a variable in further analysis. Dropping any item lowers alpha, so each variable is needed.
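
The standardized alpha and the signal-to-noise ratio in the output can be reproduced from the number of items and the average inter-item correlation. The helper names `std_alpha` and `signal_to_noise` are made up for this sketch. Because the inputs are the rounded values from the output, results for other scales may differ in the last digit.

```r
# Standardized alpha from k items with average inter-item correlation r_bar.
std_alpha <- function(k, r_bar) k * r_bar / (1 + (k - 1) * r_bar)

# Signal-to-noise ratio as reported in the S/N column.
signal_to_noise <- function(k, r_bar) k * r_bar / (1 - r_bar)

# ML1 scale: 7 items, average_r = 0.72 (from the output above).
round(std_alpha(k = 7, r_bar = 0.72), 2)
round(signal_to_noise(k = 7, r_bar = 0.72))
```

The formula shows why long scales with moderately correlated items can still reach a high alpha: alpha grows with both the number of items and their average correlation.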

## 
## Reliability analysis   
## Call: alpha(x = ML2, check.keys = T)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.91      0.91    0.91       0.5 9.8 0.002  2.3 0.59     0.49
## 
##  lower alpha upper     95% confidence boundaries
## 0.9 0.91 0.91 
## 
##  Reliability if an item is dropped:
##              raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r
## texpect           0.91      0.91    0.91      0.53 10.0   0.0020 0.0087
## teasy             0.89      0.89    0.90      0.48  8.4   0.0024 0.0099
## tinterest         0.90      0.90    0.90      0.49  8.6   0.0023 0.0117
## interesttodo      0.90      0.90    0.90      0.49  8.8   0.0023 0.0129
## clearanswer       0.90      0.90    0.90      0.49  8.7   0.0023 0.0120
## explaingood       0.89      0.89    0.90      0.49  8.5   0.0023 0.0090
## showslearned      0.91      0.91    0.91      0.52  9.7   0.0021 0.0112
## thingstohelp      0.90      0.90    0.90      0.49  8.6   0.0023 0.0114
## howtobetter       0.90      0.90    0.90      0.49  8.6   0.0023 0.0116
## tlisten           0.90      0.90    0.90      0.49  8.8   0.0023 0.0131
##              med.r
## texpect       0.52
## teasy         0.48
## tinterest     0.48
## interesttodo  0.49
## clearanswer   0.48
## explaingood   0.49
## showslearned  0.52
## thingstohelp  0.48
## howtobetter   0.48
## tlisten       0.49
## 
##  Item statistics 
##                 n raw.r std.r r.cor r.drop mean   sd
## texpect      4555  0.59  0.59  0.52   0.50  2.8 0.79
## teasy        4555  0.80  0.80  0.79   0.74  2.1 0.82
## tinterest    4555  0.78  0.78  0.76   0.71  2.5 0.87
## interesttodo 4555  0.75  0.75  0.73   0.69  2.8 0.77
## clearanswer  4555  0.76  0.76  0.72   0.69  2.1 0.78
## explaingood  4555  0.79  0.79  0.78   0.73  2.1 0.83
## showslearned 4555  0.62  0.62  0.55   0.53  2.7 0.81
## thingstohelp 4555  0.77  0.78  0.75   0.71  2.0 0.76
## howtobetter  4555  0.77  0.77  0.75   0.71  2.1 0.78
## tlisten      4555  0.75  0.75  0.71   0.68  2.3 0.82
## 
## Non missing response frequency for each item
##                 1    2    3    4 miss
## texpect      0.05 0.25 0.50 0.19    0
## teasy        0.20 0.52 0.21 0.07    0
## tinterest    0.14 0.38 0.37 0.12    0
## interesttodo 0.05 0.25 0.53 0.17    0
## clearanswer  0.22 0.54 0.18 0.05    0
## explaingood  0.24 0.50 0.19 0.07    0
## showslearned 0.07 0.32 0.46 0.15    0
## thingstohelp 0.23 0.56 0.16 0.05    0
## howtobetter  0.22 0.54 0.19 0.05    0
## tlisten      0.16 0.50 0.26 0.08    0

Cronbach’s alpha = 0.91, which is higher than 0.8 and therefore a good result. The scale is consistent and reliable, and it can be used as a variable in further analysis. Dropping any item except texpect or showslearned lowers alpha (dropping either of these two leaves it at 0.91), so almost every variable is needed.

## 
## Reliability analysis   
## Call: alpha(x = ML3, check.keys = T)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.82      0.83    0.79      0.54 4.8 0.0041  2.3 0.72     0.56
## 
##  lower alpha upper     95% confidence boundaries
## 0.82 0.82 0.83 
## 
##  Reliability if an item is dropped:
##               raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r
## notstrong          0.75      0.76    0.68      0.51 3.1   0.0064 0.0051
## difficultmath      0.80      0.80    0.73      0.58 4.1   0.0051 0.0010
## wellmath-          0.77      0.77    0.70      0.53 3.4   0.0057 0.0055
## mathquick-         0.79      0.79    0.73      0.56 3.9   0.0052 0.0039
##               med.r
## notstrong      0.49
## difficultmath  0.59
## wellmath-      0.54
## mathquick-     0.60
## 
##  Item statistics 
##                  n raw.r std.r r.cor r.drop mean   sd
## notstrong     4555  0.86  0.84  0.78   0.71  2.2 1.03
## difficultmath 4555  0.79  0.78  0.67   0.61  2.5 0.90
## wellmath-     4555  0.82  0.83  0.75   0.67  2.1 0.83
## mathquick-    4555  0.78  0.79  0.69   0.62  2.3 0.79
## 
## Non missing response frequency for each item
##                  1    2    3    4 miss
## notstrong     0.32 0.30 0.26 0.13    0
## difficultmath 0.16 0.32 0.40 0.12    0
## wellmath      0.06 0.21 0.48 0.25    0
## mathquick     0.07 0.28 0.51 0.14    0

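Cronbach’s alpha = 0.82 (standardized alpha = 0.83), which is higher than 0.8 and therefore a good result. The scale is consistent and reliable, and it can be used as a variable in further analysis. Dropping any item lowers alpha, so each variable is needed. The items wellmath and mathquick were reverse-keyed automatically (they are marked with "-" in the output), because they are worded in the opposite direction to notstrong and difficultmath.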
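
Reverse keying on a 1 to 4 scale simply maps 1 to 4, 2 to 3, and so on. A minimal sketch with made-up answers (the column name `wellmath_rev` is hypothetical):

```r
# Made-up answers: the two items are worded in opposite directions.
toy <- data.frame(
  notstrong = c(1, 2, 4, 3),
  wellmath  = c(4, 3, 1, 2)
)

# On a 1-4 scale the reversed score is (min + max) - x = 5 - x.
toy$wellmath_rev <- 5 - toy$wellmath

cor(toy$notstrong, toy$wellmath)      # negative before reversing
cor(toy$notstrong, toy$wellmath_rev)  # positive after reversing
```

Alpha assumes that all items point in the same direction, which is why oppositely worded items have to be reversed before the scale is evaluated.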

## 
## Reliability analysis   
## Call: alpha(x = ML4, check.keys = T)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.77      0.77     0.7      0.53 3.3 0.006  2.7 0.74     0.51
## 
##  lower alpha upper     95% confidence boundaries
## 0.76 0.77 0.78 
## 
##  Reliability if an item is dropped:
##              raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## mathboring        0.62      0.62    0.44      0.44 1.6   0.0114    NA
## wishnotstudy      0.68      0.68    0.51      0.51 2.1   0.0095    NA
## mathnerv          0.76      0.76    0.62      0.62 3.3   0.0070    NA
##              med.r
## mathboring    0.44
## wishnotstudy  0.51
## mathnerv      0.62
## 
##  Item statistics 
##                 n raw.r std.r r.cor r.drop mean   sd
## mathboring   4555  0.85  0.86  0.76   0.67  2.7 0.86
## wishnotstudy 4555  0.84  0.83  0.71   0.61  2.7 0.92
## mathnerv     4555  0.79  0.79  0.60   0.53  2.8 0.92
## 
## Non missing response frequency for each item
##                 1    2    3    4 miss
## mathboring   0.11 0.28 0.47 0.15    0
## wishnotstudy 0.12 0.25 0.42 0.20    0
## mathnerv     0.11 0.23 0.44 0.22    0

Cronbach’s alpha = 0.77, which is lower than 0.8 but still acceptable (> 0.7). The scale is consistent and reliable, and it can be used as a variable in further analysis. Dropping any item lowers alpha, so each variable is needed.

The reliability analysis is good for all four factors.

Regression Analysis

Now we can save the factor scores for our regression and add new control variables.

##  [1] "enjoymath"        "wishnotstudy"     "mathboring"      
##  [4] "learninterest"    "likemath"         "likenumbers"     
##  [7] "likemathproblems" "lookforwardmath"  "favoritesubj"    
## [10] "texpect"          "teasy"            "tinterest"       
## [13] "interesttodo"     "clearanswer"      "explaingood"     
## [16] "showslearned"     "thingstohelp"     "howtobetter"     
## [19] "tlisten"          "wellmath"         "difficultmath"   
## [22] "notstrong"        "mathquick"        "mathnerv"        
## [25] "ML2"              "ML1"              "ML3"             
## [28] "ML4"
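
Appending factor scores to the item data can be sketched as follows. To keep the example runnable without extra packages, it swaps in `factanal()` from base R's stats package instead of the `fa()` call shown in the factor analysis output. The data are simulated, so the score columns are named Factor1 and Factor2 rather than ML1 to ML4.

```r
# Simulated data: two independent factors, three items each.
set.seed(7)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a1 = 0.8 * f1 + 0.6 * rnorm(n),
  a2 = 0.8 * f1 + 0.6 * rnorm(n),
  a3 = 0.8 * f1 + 0.6 * rnorm(n),
  b1 = 0.8 * f2 + 0.6 * rnorm(n),
  b2 = 0.8 * f2 + 0.6 * rnorm(n),
  b3 = 0.8 * f2 + 0.6 * rnorm(n)
)

# Maximum-likelihood factor analysis with varimax rotation and
# regression-based factor scores.
fit <- factanal(toy, factors = 2, rotation = "varimax", scores = "regression")

# Append one score column per factor to the original items.
scored <- cbind(toy, fit$scores)
names(scored)
```

Each respondent now has one score per factor, and these scores can enter a regression as ordinary predictors.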

We will predict math achievement from the factors obtained in the previous steps, controlling for gender, parental education and whether the student was born in the country or outside it.

Model 1 - by the ML1.

## 
## Call:
## lm(formula = regression$mathachiev ~ regression$gender + regression$edum + 
##     regression$eduf + regression$borncountry + regression$ML1)
## 
## Residuals:
## <Labelled double>: 1ST PLAUSIBLE VALUE MATHEMATICS
##      Min       1Q   Median       3Q      Max 
## -312.070  -53.576    0.293   57.018  280.527 
## 
## Labels:
##  value              label
##    999 Omitted or invalid
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            596.8023    14.2690  41.825  < 2e-16 ***
## regression$gender       -7.6913     2.5088  -3.066  0.00218 ** 
## regression$edum          0.3310     0.7510   0.441  0.65943    
## regression$eduf          1.3327     0.7355   1.812  0.07007 .  
## regression$borncountry  -7.6325    13.1414  -0.581  0.56141    
## regression$ML1         -28.5541     1.3546 -21.080  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 83.76 on 4549 degrees of freedom
## Multiple R-squared:  0.09075,    Adjusted R-squared:  0.08975 
## F-statistic:  90.8 on 5 and 4549 DF,  p-value: < 2.2e-16

The p-value of the F-test is less than 0.05, so the model as a whole is statistically significant. R-squared measures how close the data are to the fitted regression line: the higher the R-squared, the better the model fits the data. This model explains about 9.1% of the variability of the response around its mean (R-squared = 0.09075). Gender and ML1 are significant, whereas parental education and country of birth are not. The explanatory power is weak.
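
R-squared is a proportion, so it has to be multiplied by 100 to be read as a percentage. The sketch below fits a small model on simulated data (the coefficients used to generate it are made up) and shows the conversion. It also passes the data frame through the `data` argument, which keeps the formula shorter than prefixing every variable with `regression$`.

```r
# Simulated stand-in for the regression data.
set.seed(3)
n <- 200
toy <- data.frame(gender = rbinom(n, 1, 0.5), ML1 = rnorm(n))
toy$mathachiev <- 590 - 8 * toy$gender - 28 * toy$ML1 + rnorm(n, sd = 80)

fit <- lm(mathachiev ~ gender + ML1, data = toy)

# R-squared as a proportion, then as a percentage.
r2 <- summary(fit)$r.squared
sprintf("%.1f%%", 100 * r2)

# The same conversion for the R-squared reported for Model 1.
sprintf("%.1f%%", 100 * 0.09075)
```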

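The intercept is 596.802, which means that if all predictors are equal to 0, the predicted math achievement is 596.802. ML1 is sympathy in learning math, and the model shows that it has a significant effect on math achievement. Its coefficient is -28.55: since the items are coded from 1 (agree a lot) to 4 (disagree a lot) and load positively on ML1, a higher ML1 score means less liking of math and is associated with lower achievement.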

Model 2 - by the ML2.

## 
## Call:
## lm(formula = regression$mathachiev ~ regression$gender + regression$edum + 
##     regression$eduf + regression$borncountry + regression$ML2)
## 
## Residuals:
## <Labelled double>: 1ST PLAUSIBLE VALUE MATHEMATICS
##     Min      1Q  Median      3Q     Max 
## -335.95  -56.80    1.76   59.94  267.94 
## 
## Labels:
##  value              label
##    999 Omitted or invalid
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            584.3310    14.8449  39.362  < 2e-16 ***
## regression$gender       -2.0119     2.5962  -0.775    0.438    
## regression$edum          1.0557     0.7820   1.350    0.177    
## regression$eduf          1.2066     0.7659   1.575    0.115    
## regression$borncountry  -6.7636    13.6831  -0.494    0.621    
## regression$ML2         -10.2510     1.3604  -7.535 5.84e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 87.21 on 4549 degrees of freedom
## Multiple R-squared:  0.01423,    Adjusted R-squared:  0.01315 
## F-statistic: 13.14 on 5 and 4549 DF,  p-value: 1.004e-12

The p-value of the F-test is less than 0.05, so the model as a whole is statistically significant. This model explains about 1.4% of the variability of the response around its mean (R-squared = 0.01423). ML2 is significant, whereas gender, parental education and country of birth are not. The explanatory power is weak.

The intercept is 584.331, ,means that if all variables are egual to 0, the math achievement is egual to 584.331. Since ML2 is the role of the teacher in the study of math, model shows it is significant effect on math achievement.

Model 3 - by the ML3.

## 
## Call:
## lm(formula = regression$mathachiev ~ regression$gender + regression$edum + 
##     regression$eduf + regression$borncountry + regression$ML3)
## 
## Residuals:
## <Labelled double>: 1ST PLAUSIBLE VALUE MATHEMATICS
##     Min      1Q  Median      3Q     Max 
## -327.96  -48.61    2.26   50.44  268.30 
## 
## Labels:
##  value              label
##    999 Omitted or invalid
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            588.9537    13.3699  44.051  < 2e-16 ***
## regression$gender      -13.1268     2.3621  -5.557  2.9e-08 ***
## regression$edum          0.5002     0.7039   0.711   0.4774    
## regression$eduf          1.7220     0.6899   2.496   0.0126 *  
## regression$borncountry   5.0055    12.3277   0.406   0.6847    
## regression$ML3         -45.8227     1.3634 -33.610  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 78.54 on 4549 degrees of freedom
## Multiple R-squared:  0.2005, Adjusted R-squared:  0.1996 
## F-statistic: 228.1 on 5 and 4549 DF,  p-value: < 2.2e-16

The p-value of the overall F-test is below 0.05, so the model as a whole is statistically significant. It explains about 20% of the variability of the response around its mean (R-squared = 0.2005). ML3, gender and father's education are significant, whereas mother's education and country of birth are not. The explanatory power is better than in the previous models.

The intercept is 588.954, which means that if all predictors are equal to 0, the predicted math achievement is 588.954. Since ML3 is difficulties in learning math, the model shows that it has a significant effect on math achievement: the coefficient of -45.82 means that a one-unit increase in the ML3 score goes with a math score that is about 46 points lower.
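A minimal sketch of how a model of this form can be fitted and its R-squared read off as a percentage. It assumes a simulated stand-in for the `regression` data frame: every value below is randomly generated for illustration and is not TIMSS data, and the variable codings are hypothetical. Passing `data =` to `lm()` rather than writing `regression$...` keeps the coefficient names short.

```r
# Simulated stand-in for the `regression` data frame (NOT the TIMSS data).
set.seed(1)
n <- 500
sim <- data.frame(
  gender      = rbinom(n, 1, 0.5),      # hypothetical 0/1 coding
  edum        = sample(1:8, n, TRUE),   # hypothetical education levels
  eduf        = sample(1:8, n, TRUE),
  borncountry = rbinom(n, 1, 0.02),     # hypothetical 0/1 coding
  ML3         = rnorm(n)                # stand-in for a factor score
)
# Outcome generated with made-up coefficients, only so the sketch runs.
sim$mathachiev <- 590 - 13 * sim$gender + 1.7 * sim$eduf -
  46 * sim$ML3 + rnorm(n, sd = 78)

fit <- lm(mathachiev ~ gender + edum + eduf + borncountry + ML3, data = sim)
s <- summary(fit)

r2_percent <- 100 * s$r.squared   # share of variance explained, in percent
r2_percent
coef(s)["ML3", ]                  # estimate, std. error, t value, p-value
```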

Model 4 - by the ML4.

## 
## Call:
## lm(formula = regression$mathachiev ~ regression$gender + regression$edum + 
##     regression$eduf + regression$borncountry + regression$ML4)
## 
## Residuals:
## <Labelled double>: 1ST PLAUSIBLE VALUE MATHEMATICS
##     Min      1Q  Median      3Q     Max 
## -337.92  -55.84    1.92   60.10  275.81 
## 
## Labels:
##  value              label
##    999 Omitted or invalid
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            584.6347    14.9403  39.131   <2e-16 ***
## regression$gender       -1.8269     2.6132  -0.699   0.4845    
## regression$edum          0.8427     0.7866   1.071   0.2841    
## regression$eduf          1.2697     0.7706   1.648   0.0995 .  
## regression$borncountry  -6.5698    13.7682  -0.477   0.6333    
## regression$ML4          -0.4349     1.6160  -0.269   0.7878    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 87.75 on 4549 degrees of freedom
## Multiple R-squared:  0.001944,   Adjusted R-squared:  0.0008468 
## F-statistic: 1.772 on 5 and 4549 DF,  p-value: 0.115

The p-value of the overall F-test (0.115) is above 0.05, so the model as a whole is not statistically significant. It explains about 0.2% of the variability of the response around its mean (R-squared = 0.001944). Neither ML4 nor any other predictor is significant at the 0.05 level. The explanatory power is negligible.

The intercept is 584.635, which means that if all predictors are equal to 0, the predicted math achievement is 584.635. Since ML4 is reluctance to learn math, the model shows that it has no significant effect on math achievement.
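The R-squared values printed in the summaries are proportions, so multiplying by 100 gives the percentage of variance explained. The sketch below does this for Models 2 to 4 and, as a consistency check, recomputes each F-statistic from R-squared using F = (R²/k) / ((1 - R²)/df), with k = 5 predictors and df = 4549 residual degrees of freedom. The numbers are copied from the output above; the variable names are only illustrative.

```r
# R-squared values copied from the summaries of Models 2-4 above.
r2 <- c(model2 = 0.01423, model3 = 0.2005, model4 = 0.001944)
k <- 5            # number of predictors in each model
df_resid <- 4549  # residual degrees of freedom from the output

r2_percent <- 100 * r2                       # proportion -> percent
f_stat <- (r2 / k) / ((1 - r2) / df_resid)   # overall F-statistic

round(r2_percent, 2)
round(f_stat, 2)
```

Up to rounding of the printed R-squared, the recomputed F-statistics should agree with the ones in the summaries.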

Comparing models by AIC

## [1] 53272.72
## [1] 53640.74
## [1] 52686.93
## [1] 53697.17

The AIC of Model 3 is the lowest (52686.93), which means that, of the four models, it explains math achievement best.
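The comparison can be sketched as follows, assuming the four AIC values above are printed in model order (Model 1 to Model 4). AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L the maximized likelihood, so lower values are better; the difference from the smallest AIC shows how far each model lags behind the best one.

```r
# AIC values as printed above; the model order is an assumption.
aic <- c(model1 = 53272.72, model2 = 53640.74,
         model3 = 52686.93, model4 = 53697.17)

best  <- names(which.min(aic))  # model with the lowest AIC
delta <- aic - min(aic)         # gap to the best model

best
round(delta, 2)
```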

Model diagnostics

First, we check the model for multicollinearity, that is, strong correlation among the predictor variables. For this we use VIF, the variance inflation factor, which measures how much the variance of a regression coefficient is inflated due to multicollinearity in the model.

## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
## The following object is masked from 'package:dplyr':
## 
##     recode
##      regression$gender        regression$edum        regression$eduf 
##               1.029326               1.432432               1.423156 
## regression$borncountry         regression$ML3 
##               1.001023               1.022360

The VIF scores of all predictors lie between 1.0 and 1.5, well below the common threshold of 5, so the predictors are at most weakly correlated. There is no multicollinearity in our model.

The next step in model diagnostics is to look at residuals and leverage.
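To illustrate what the VIF measures, here is a minimal base-R sketch on simulated predictors (not the TIMSS data; `x1`, `x2`, `x3` are hypothetical). For each predictor j, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the others. For a model with only numeric predictors, this is the quantity that `car::vif()` reports.

```r
# Simulated predictors for illustration only.
set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)   # deliberately correlated with x1
x3 <- rnorm(n)              # independent of the others
X  <- data.frame(x1, x2, x3)

# Regress each predictor on the remaining ones and convert R^2 to a VIF.
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

round(vif_manual, 3)
```

A VIF of 1 means a predictor is uncorrelated with the others; the correlated pair gets values above 1, while the independent predictor stays close to 1.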

## [1]  846 1348

The first plot shows no pattern between the residuals and the fitted values, as expected in a homoscedastic linear model with normally distributed errors. So there is no sign of heteroscedasticity.

The Q-Q plot shows that the residuals are approximately normally distributed: the points follow the dotted line closely, except for observations 846 and 1348. That is acceptable, so the residuals pass the visual check of normality.

The graphs show that we have outliers but no influential high-leverage points. No points fall beyond the Cook’s distance lines, which means that no observation unduly influences the fit, which is good.
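The quantities behind the leverage plot can also be inspected numerically. The sketch below uses a simulated one-predictor model (not the TIMSS data; all names are hypothetical) and flags observations whose Cook's distance exceeds 0.5, the first contour that R's default diagnostic plot draws.

```r
# Simulated data for illustration only.
set.seed(3)
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
fit <- lm(y ~ x)

cd  <- cooks.distance(fit)   # influence of each observation on the fit
lev <- hatvalues(fit)        # leverage of each observation

influential <- which(cd > 0.5)   # 0.5 is a rule-of-thumb cut-off
length(influential)
```

An empty result corresponds to the situation described above: there may be large residuals, but no observation pulls the regression line noticeably.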

## No Studentized residuals with Bonferonni p < 0.05
## Largest |rstudent|:
##      rstudent unadjusted p-value Bonferonni p
## 846 -4.185481         2.8994e-05      0.13207

Observation 846 has the largest studentized residual (-4.19), but its Bonferroni-adjusted p-value (0.132) is above 0.05, so the test does not flag it as a statistically significant outlier, and it does not distort the regression line.
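The Bonferroni adjustment in this output can be reproduced by hand: the unadjusted p-value is multiplied by the number of observations tested (capped at 1). The sketch below uses the figures printed above; the degrees of freedom for the studentized residual (n minus 6 coefficients minus 1) are an assumption based on the model summary.

```r
# Figures copied from the outlier test output and the data description.
p_unadjusted <- 2.8994e-05
n_obs <- 4555

# Bonferroni correction: multiply by the number of tests, cap at 1.
p_bonferroni <- min(1, p_unadjusted * n_obs)

# Cross-check: two-sided p-value of the studentized residual itself.
p_from_t <- 2 * pt(-abs(-4.185481), df = n_obs - 6 - 1)

p_bonferroni
p_bonferroni < 0.05   # FALSE means: not a significant outlier
```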

## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select

The distribution of studentized residuals is normal.

The model diagnostic results are good.

Conclusion

All in all, we carried out a factor analysis whose results were reasonable and logically interpretable. We then built 4 models to find out which factor has the most influence on math achievement. Although all models have weak explanatory power, the model with ML3, the difficulties in learning math factor (which contains variables about how difficult or easy it is for a child to learn math and how quickly and well they do it), best explains mathematics achievement. The model with ML4, the reluctance to learn math factor, was not significant at all and did not explain mathematics achievement. The diagnostics of the best model showed good results.