Introduction

Aim of the project: The present study aims to determine what influences mathematics achivement among New Zealander pupils. In the following analysis, we seek to discover the exact factors of the named achievement, controlling also for the influence of gender, country of birth and parental education.

Research question: What factors influence a student’s achievement in mathematics?


Preparations

library(foreign)
library(psych)
library(polycor)
library(car)
library(ggplot2)
library(ggpubr)
nzl <- data.frame(ndata$BSBM17A, ndata$BSBM17B, ndata$BSBM17C, ndata$BSBM17D, ndata$BSBM17E, ndata$BSBM17F, ndata$BSBM17G, ndata$BSBM17H, ndata$BSBM17I, ndata$BSBM18A, ndata$BSBM18B, ndata$BSBM18C, ndata$BSBM18D, ndata$BSBM18E, ndata$BSBM18F, ndata$BSBM18G, ndata$BSBM18H, ndata$BSBM18I, ndata$BSBM18J, ndata$BSBM19A, ndata$BSBM19B, ndata$BSBM19C, ndata$BSBM19D, ndata$BSBM19E, ndata$BSMMAT01, ndata$BSDGEDUP, ndata$ITSEX, ndata$BSBG10A)
names(nzl) <- c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E", "BSMMAT01", "paredu", "gender", "country")
nzl <- subset(nzl, nzl$BSBM17A != "NA")
nzl <- subset(nzl, nzl$BSBM17B != "NA")
nzl <- subset(nzl, nzl$BSBM17C != "NA")
nzl <- subset(nzl, nzl$BSBM17D != "NA")
nzl <- subset(nzl, nzl$BSBM17E != "NA")
nzl <- subset(nzl, nzl$BSBM17F != "NA")
nzl <- subset(nzl, nzl$BSBM17G != "NA")
nzl <- subset(nzl, nzl$BSBM17H != "NA")
nzl <- subset(nzl, nzl$BSBM17I != "NA")
nzl <- subset(nzl, nzl$BSBM18A != "NA")
nzl <- subset(nzl, nzl$BSBM18B != "NA")
nzl <- subset(nzl, nzl$BSBM18C != "NA")
nzl <- subset(nzl, nzl$BSBM18D != "NA")
nzl <- subset(nzl, nzl$BSBM18E != "NA")
nzl <- subset(nzl, nzl$BSBM18F != "NA")
nzl <- subset(nzl, nzl$BSBM18G != "NA")
nzl <- subset(nzl, nzl$BSBM18H != "NA")
nzl <- subset(nzl, nzl$BSBM18I != "NA")
nzl <- subset(nzl, nzl$BSBM18J != "NA")
nzl <- subset(nzl, nzl$BSBM19A != "NA")
nzl <- subset(nzl, nzl$BSBM19B != "NA")
nzl <- subset(nzl, nzl$BSBM19C != "NA")
nzl <- subset(nzl, nzl$BSBM19D != "NA")
nzl <- subset(nzl, nzl$BSBM19E != "NA")
nzl <- subset(nzl, nzl$BSMMAT01 != "NA")
nzl <- subset(nzl, nzl$gender != "NA")
nzl <- subset(nzl, nzl$country != "NA")
nzl <- subset(nzl, nzl$paredu != "NA")

Descriptive statistics

Variables.

Variable Description and type
BSMMAT01 math achievement - continuous
gender gender of a pupil - categorical
country whether one was born in New Zealand - binary
paredu education level of parents - categorical
BSBM17A enjoys learning mathematics - ordinal
BSBM17B wish have not to study math - ordinal
BSBM17C math is boring - ordinal
BSBM17D learn interesting things - ordinal
BSBM17E like mathematics - ordinal
BSBM17F like numbers - ordinal
BSBM17G like math problems - ordinal
BSBM17H look forward to math classes - ordinal
BSBM17I favourite subject - ordinal
BSBM18A teacher expects to do - ordinal
BSBM18B teacher is easy to undestand - ordinal
BSBM18C interested in what teacher says - ordinal
BSBM18D interesting things to do - ordinal
BSBM18E teacher clear answers - ordinal
BSBM18F teacher explains good - ordinal
BSBM18G teacher shows learned - ordinal
BSBM18H different things to help - ordinal
BSBM18I tells how to do better - ordinal
BSBM18J teacher listens - ordinal
BSBM19A usually do well in math - ordinal
BSBM19B mathematics is more difficult - ordinal
BSBM19C mathematics is not my strength - ordinal
BSBM19D learn quickly in mathematics - ordinal
BSBM19E math makes nervous - ordinal
describe(nzl, na.rm = TRUE)
          vars    n    mean      sd median trimmed     mad min  max range  skew
BSBM17A*     1 7290    2.19    0.94    2.0    2.12    1.48   1    4     3  0.44
BSBM17B*     2 7290    2.83    1.04    3.0    2.91    1.48   1    4     3 -0.37
BSBM17C*     3 7290    2.44    0.96    2.0    2.42    1.48   1    4     3  0.13
BSBM17D*     4 7290    2.10    0.88    2.0    2.03    1.48   1    4     3  0.45
BSBM17E*     5 7290    2.29    0.97    2.0    2.23    1.48   1    4     3  0.32
BSBM17F*     6 7290    2.63    0.90    3.0    2.66    1.48   1    4     3 -0.17
BSBM17G*     7 7290    2.43    0.98    2.0    2.41    1.48   1    4     3  0.10
BSBM17H*     8 7290    2.69    0.96    3.0    2.73    1.48   1    4     3 -0.20
BSBM17I*     9 7290    2.77    1.10    3.0    2.83    1.48   1    4     3 -0.34
BSBM18A*    10 7290    1.66    0.71    2.0    1.55    1.48   1    4     3  0.92
BSBM18B*    11 7290    1.94    0.92    2.0    1.84    1.48   1    4     3  0.67
BSBM18C*    12 7290    2.15    0.90    2.0    2.09    1.48   1    4     3  0.37
BSBM18D*    13 7290    2.29    0.93    2.0    2.24    1.48   1    4     3  0.22
BSBM18E*    14 7290    2.00    0.93    2.0    1.91    1.48   1    4     3  0.58
BSBM18F*    15 7290    1.86    0.92    2.0    1.74    1.48   1    4     3  0.81
BSBM18G*    16 7290    2.15    0.90    2.0    2.08    1.48   1    4     3  0.38
BSBM18H*    17 7290    1.90    0.89    2.0    1.79    1.48   1    4     3  0.73
BSBM18I*    18 7290    1.84    0.88    2.0    1.73    1.48   1    4     3  0.82
BSBM18J*    19 7290    1.88    0.91    2.0    1.75    1.48   1    4     3  0.82
BSBM19A*    20 7290    2.02    0.83    2.0    1.94    1.48   1    4     3  0.60
BSBM19B*    21 7290    2.69    0.96    3.0    2.73    1.48   1    4     3 -0.21
BSBM19C*    22 7290    2.45    1.08    2.0    2.44    1.48   1    4     3  0.05
BSBM19D*    23 7290    2.29    0.89    2.0    2.24    1.48   1    4     3  0.19
BSBM19E*    24 7290    2.80    0.96    3.0    2.88    1.48   1    4     3 -0.32
BSMMAT01*   25 7290 4190.75 2342.11 4271.5 4213.60 2997.82   1 8127  8126 -0.07
paredu*     26 7290    4.16    2.10    6.0    4.32    0.00   1    6     5 -0.45
gender*     27 7290    1.46    0.50    1.0    1.45    0.00   1    2     1  0.17
country*    28 7290    1.16    0.37    1.0    1.08    0.00   1    2     1  1.84
          kurtosis    se
BSBM17A*     -0.68  0.01
BSBM17B*     -1.10  0.01
BSBM17C*     -0.94  0.01
BSBM17D*     -0.50  0.01
BSBM17E*     -0.88  0.01
BSBM17F*     -0.73  0.01
BSBM17G*     -0.98  0.01
BSBM17H*     -0.92  0.01
BSBM17I*     -1.22  0.01
BSBM18A*      0.60  0.01
BSBM18B*     -0.45  0.01
BSBM18C*     -0.64  0.01
BSBM18D*     -0.81  0.01
BSBM18E*     -0.60  0.01
BSBM18F*     -0.29  0.01
BSBM18G*     -0.65  0.01
BSBM18H*     -0.28  0.01
BSBM18I*     -0.11  0.01
BSBM18J*     -0.20  0.01
BSBM19A*     -0.07  0.01
BSBM19B*     -0.90  0.01
BSBM19C*     -1.26  0.01
BSBM19D*     -0.73  0.01
BSBM19E*     -0.89  0.01
BSMMAT01*    -1.19 27.43
paredu*      -1.55  0.02
gender*      -1.97  0.01
country*      1.38  0.00

All variables contain results, there are no empty variables. Similarly, no NA’s remained. Overall, there are 7290 observations.

Proceeding to relationship between math achievement and its potential predictors.

summary(as.numeric(as.character(nzl$BSMMAT01)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   197.3   444.8   510.0   505.5   568.0   752.6

Gender

summary(nzl$gender)
## Female   Male 
##   3946   3344
ggplot(nzl, aes(as.numeric(as.character(BSMMAT01)))) + geom_histogram(fill = "lightsteelblue1", color = "skyblue2") + labs(x = "Math achievement") + facet_grid(~gender) + theme_pubclean()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

In general, there are more girls than boys in the sampling, so it’s hard to assess their differences. Anyway, the results are similar for these two groups, and the distribution of achievement is skewed but slightly close to normality.

Parental education

summary(nzl$paredu)
##                       University or Higher 
##                                       1443 
##          Post-secondary but not University 
##                                        764 
##                            Upper Secondary 
##                                        790 
##                            Lower Secondary 
##                                        345 
## Some Primary, Lower Secondary or No School 
##                                         93 
##                                 Don't Know 
##                                       3855

Noting in advance that there are too many cases of ‘Don’t know’; the data is thus rather biased.

## Let us remove don't know answers for the plot
parenteduc <- as.data.frame(nzl)
parenteduc <- subset(parenteduc, parenteduc$paredu != "Don't Know")

ggplot(parenteduc, aes(as.numeric(as.character(BSMMAT01)))) + geom_histogram(fill = "lightsteelblue1", color = "skyblue2") + labs(x = "Math achievement") + facet_grid(~paredu) + theme_pubclean()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

More or less similar trend can be found in distribution for all parental education levels (except Some Primary, Lower Secondary or No School) . But still the number of cases differ significantly across the categories of education of parents; cannot be compared.

Born in the country or outside

ggplot(nzl, aes(as.numeric(as.character(BSMMAT01)))) + geom_histogram(color = "skyblue2", fill = "lightsteelblue1") + labs(x = "Math achievement") + facet_grid(~country) + theme_pubclean()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The majority of respondents were born in the country. However, the distribution of achievements is almost normal, a little left skewed.

Variables from BSBM17A to BSBM19E have four levels - “Disagree a lot”, Disagree a little“,”Agree a little" and “Agree a lot”.

#For instance,
summary(nzl$BSBM17A)
##       Agree a lot    Agree a little Disagree a little    Disagree a lot 
##              1826              3075              1539               850

Preparations for factor analysis

All variables except BSMMAT01 contain non-numeric data, so we need to change them.

##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1826
##   Agree a little       0    0 3075    0
##   Disagree a little    0 1539    0    0
##   Disagree a lot     850    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0  979
##   Agree a little       0    0 1756    0
##   Disagree a little    0 2078    0    0
##   Disagree a lot    2477    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1304
##   Agree a little       0    0 2695    0
##   Disagree a little    0 2102    0    0
##   Disagree a lot    1189    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1947
##   Agree a little       0    0 3196    0
##   Disagree a little    0 1621    0    0
##   Disagree a lot     526    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1687
##   Agree a little       0    0 2853    0
##   Disagree a little    0 1730    0    0
##   Disagree a lot    1020    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0  846
##   Agree a little       0    0 2238    0
##   Disagree a little    0 2959    0    0
##   Disagree a lot    1247    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1404
##   Agree a little       0    0 2532    0
##   Disagree a little    0 2181    0    0
##   Disagree a lot    1173    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0  930
##   Agree a little       0    0 2070    0
##   Disagree a little    0 2647    0    0
##   Disagree a lot    1643    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1296
##   Agree a little       0    0 1546    0
##   Disagree a little    0 2017    0    0
##   Disagree a lot    2431    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 3406
##   Agree a little       0    0 3117    0
##   Disagree a little    0  631    0    0
##   Disagree a lot     136    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 2777
##   Agree a little       0    0 2690    0
##   Disagree a little    0 1308    0    0
##   Disagree a lot     515    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1883
##   Agree a little       0    0 3025    0
##   Disagree a little    0 1795    0    0
##   Disagree a lot     587    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1583
##   Agree a little       0    0 2804    0
##   Disagree a little    0 2115    0    0
##   Disagree a lot     788    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 2575
##   Agree a little       0    0 2693    0
##   Disagree a little    0 1441    0    0
##   Disagree a lot     581    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 3169
##   Agree a little       0    0 2477    0
##   Disagree a little    0 1144    0    0
##   Disagree a lot     500    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1893
##   Agree a little       0    0 3012    0
##   Disagree a little    0 1784    0    0
##   Disagree a lot     601    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 2851
##   Agree a little       0    0 2793    0
##   Disagree a little    0 1193    0    0
##   Disagree a lot     453    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 3083
##   Agree a little       0    0 2718    0
##   Disagree a little    0 1072    0    0
##   Disagree a lot     417    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 3023
##   Agree a little       0    0 2678    0
##   Disagree a little    0 1064    0    0
##   Disagree a lot     525    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 2009
##   Agree a little       0    0 3594    0
##   Disagree a little    0 1248    0    0
##   Disagree a lot     439    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0  925
##   Agree a little       0    0 2057    0
##   Disagree a little    0 2676    0    0
##   Disagree a lot    1632    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1758
##   Agree a little       0    0 2018    0
##   Disagree a little    0 1956    0    0
##   Disagree a lot    1558    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0 1476
##   Agree a little       0    0 2920    0
##   Disagree a little    0 2212    0    0
##   Disagree a lot     682    0    0    0
##                    
##                        1    2    3    4
##   Agree a lot          0    0    0  791
##   Agree a little       0    0 1873    0
##   Disagree a little    0 2614    0    0
##   Disagree a lot    2012    0    0    0
nzld <- data.frame(nzl$BSBM17A1, nzl$BSBM17B1, nzl$BSBM17C1, nzl$BSBM17D1, nzl$BSBM17E1, nzl$BSBM17F1, nzl$BSBM17G1, nzl$BSBM17H1, nzl$BSBM17I1, nzl$BSBM18A1, nzl$BSBM18B1, nzl$BSBM18C1, nzl$BSBM18D1, nzl$BSBM18E1, nzl$BSBM18F1, nzl$BSBM18G1, nzl$BSBM18H1, nzl$BSBM18I1, nzl$BSBM18J1, nzl$BSBM19A1, nzl$BSBM19B1, nzl$BSBM19C1, nzl$BSBM19D1, nzl$BSBM19E1, nzl$BSMMAT01, nzl$paredu, nzl$gender, nzl$country)
names(nzld) <- c("BSBM17A1", "BSBM17B1", "BSBM17C1", "BSBM17D1", "BSBM17E1", "BSBM17F1", "BSBM17G1", "BSBM17H1", "BSBM17I1", "BSBM18A1", "BSBM18B1", "BSBM18C1", "BSBM18D1", "BSBM18E1", "BSBM18F1", "BSBM18G1", "BSBM18H1", "BSBM18I1", "BSBM18J1", "BSBM19A1", "BSBM19B1", "BSBM19C1", "BSBM19D1", "BSBM19E1", "BSMMAT01", "paredu", "gender", "country")

Factor analysis

Special data frame for factor analysis:

nzldfa <- data.frame(nzl$BSBM17A1, nzl$BSBM17B1, nzl$BSBM17C1, nzl$BSBM17D1, nzl$BSBM17E1, nzl$BSBM17F1, nzl$BSBM17G1, nzl$BSBM17H1, nzl$BSBM17I1, nzl$BSBM18A1, nzl$BSBM18B1, nzl$BSBM18C1, nzl$BSBM18D1, nzl$BSBM18E1, nzl$BSBM18F1, nzl$BSBM18G1, nzl$BSBM18H1, nzl$BSBM18I1, nzl$BSBM18J1, nzl$BSBM19A1, nzl$BSBM19B1, nzl$BSBM19C1, nzl$BSBM19D1, nzl$BSBM19E1)
names(nzldfa) <- c("BSBM17A1", "BSBM17B1", "BSBM17C1", "BSBM17D1", "BSBM17E1", "BSBM17F1", "BSBM17G1", "BSBM17H1", "BSBM17I1", "BSBM18A1", "BSBM18B1", "BSBM18C1", "BSBM18D1", "BSBM18E1", "BSBM18F1", "BSBM18G1", "BSBM18H1", "BSBM18I1", "BSBM18J1", "BSBM19A1", "BSBM19B1", "BSBM19C1", "BSBM19D1", "BSBM19E1")

Checking whether all variales are numeric.

sapply(nzldfa, class)
##  BSBM17A1  BSBM17B1  BSBM17C1  BSBM17D1  BSBM17E1  BSBM17F1  BSBM17G1  BSBM17H1 
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 
##  BSBM17I1  BSBM18A1  BSBM18B1  BSBM18C1  BSBM18D1  BSBM18E1  BSBM18F1  BSBM18G1 
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 
##  BSBM18H1  BSBM18I1  BSBM18J1  BSBM19A1  BSBM19B1  BSBM19C1  BSBM19D1  BSBM19E1 
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"

They are, so we can proceed further.

Getting correlations:

dat.cor <- hetcor(nzldfa)
dat.cor <- dat.cor$correlations
dat.cor
##            BSBM17A1   BSBM17B1   BSBM17C1    BSBM17D1   BSBM17E1   BSBM17F1
## BSBM17A1  1.0000000 -0.5768786 -0.5655810  0.62834768  0.8323609  0.6577364
## BSBM17B1 -0.5768786  1.0000000  0.6158734 -0.43266234 -0.5896195 -0.4771128
## BSBM17C1 -0.5655810  0.6158734  1.0000000 -0.43504951 -0.5901481 -0.4974803
## BSBM17D1  0.6283477 -0.4326623 -0.4350495  1.00000000  0.6340212  0.5346317
## BSBM17E1  0.8323609 -0.5896195 -0.5901481  0.63402123  1.0000000  0.7051352
## BSBM17F1  0.6577364 -0.4771128 -0.4974803  0.53463175  0.7051352  1.0000000
## BSBM17G1  0.7029365 -0.5114426 -0.5080860  0.55586174  0.7438328  0.7492198
## BSBM17H1  0.7053954 -0.5004388 -0.5648015  0.57592856  0.7227450  0.6573554
## BSBM17I1  0.7137881 -0.5194981 -0.5652939  0.52938661  0.7476362  0.6590231
## BSBM18A1  0.3363467 -0.2328558 -0.2310628  0.36018515  0.3329031  0.2722640
## BSBM18B1  0.3746519 -0.2722510 -0.2699911  0.38349082  0.3569243  0.2877919
## BSBM18C1  0.5318730 -0.3969848 -0.4103481  0.54303880  0.5183505  0.4348276
## BSBM18D1  0.5019071 -0.3545114 -0.4035136  0.52746717  0.4887656  0.4200531
## BSBM18E1  0.3707794 -0.2754567 -0.2819333  0.40988309  0.3578290  0.2964591
## BSBM18F1  0.3589903 -0.2623412 -0.2669856  0.39848855  0.3498627  0.2732231
## BSBM18G1  0.3546839 -0.2485478 -0.2548801  0.38031003  0.3493751  0.2966898
## BSBM18H1  0.3533003 -0.2447316 -0.2742504  0.41714836  0.3366282  0.2738079
## BSBM18I1  0.3301120 -0.2314396 -0.2416543  0.39139863  0.3190449  0.2508407
## BSBM18J1  0.3373492 -0.2614111 -0.2608939  0.38161483  0.3266244  0.2615856
## BSBM19A1  0.5236116 -0.3786238 -0.3447032  0.36535013  0.5399734  0.4798487
## BSBM19B1 -0.2882632  0.3131022  0.2807667 -0.14560072 -0.3106195 -0.2747911
## BSBM19C1 -0.4269891  0.4209089  0.3797490 -0.24700572 -0.4521639 -0.4142184
## BSBM19D1  0.5329095 -0.3856843 -0.3548225  0.38459148  0.5494962  0.5033834
## BSBM19E1 -0.2066615  0.2427975  0.2208979 -0.08275696 -0.2174522 -0.1875799
##            BSBM17G1   BSBM17H1   BSBM17I1    BSBM18A1   BSBM18B1    BSBM18C1
## BSBM17A1  0.7029365  0.7053954  0.7137881  0.33634674  0.3746519  0.53187304
## BSBM17B1 -0.5114426 -0.5004388 -0.5194981 -0.23285583 -0.2722510 -0.39698479
## BSBM17C1 -0.5080860 -0.5648015 -0.5652939 -0.23106281 -0.2699911 -0.41034806
## BSBM17D1  0.5558617  0.5759286  0.5293866  0.36018515  0.3834908  0.54303880
## BSBM17E1  0.7438328  0.7227450  0.7476362  0.33290307  0.3569243  0.51835047
## BSBM17F1  0.7492198  0.6573554  0.6590231  0.27226399  0.2877919  0.43482759
## BSBM17G1  1.0000000  0.6707390  0.6695092  0.29207641  0.2940582  0.45755313
## BSBM17H1  0.6707390  1.0000000  0.7524251  0.33600329  0.3911449  0.54875669
## BSBM17I1  0.6695092  0.7524251  1.0000000  0.28088590  0.3289157  0.45875566
## BSBM18A1  0.2920764  0.3360033  0.2808859  1.00000000  0.5748823  0.49294712
## BSBM18B1  0.2940582  0.3911449  0.3289157  0.57488229  1.0000000  0.62727427
## BSBM18C1  0.4575531  0.5487567  0.4587557  0.49294712  0.6272743  1.00000000
## BSBM18D1  0.4305047  0.5403949  0.4550318  0.46144318  0.5822664  0.72089200
## BSBM18E1  0.3043997  0.3975143  0.3330995  0.50944991  0.7150788  0.60465410
## BSBM18F1  0.2833863  0.3782298  0.3131238  0.49297448  0.7360868  0.58119967
## BSBM18G1  0.3045929  0.3864228  0.3177633  0.44607371  0.5228219  0.52357457
## BSBM18H1  0.2787567  0.3857662  0.3107298  0.45646831  0.5636863  0.52498541
## BSBM18I1  0.2612306  0.3446488  0.2837535  0.48483722  0.5842734  0.52307537
## BSBM18J1  0.2734775  0.3652734  0.2932462  0.47911980  0.6064992  0.54430381
## BSBM19A1  0.5272103  0.4412476  0.4929663  0.28231396  0.3016033  0.32385135
## BSBM19B1 -0.3129336 -0.2252724 -0.3207609 -0.10071192 -0.1403170 -0.12848220
## BSBM19C1 -0.4558835 -0.3484632 -0.4870410 -0.14830497 -0.1875101 -0.20933838
## BSBM19D1  0.5504441  0.4542709  0.5150073  0.28191586  0.3096618  0.32978748
## BSBM19E1 -0.2022902 -0.1726701 -0.2435620 -0.09137876 -0.1266766 -0.07749327
##             BSBM18D1   BSBM18E1    BSBM18F1    BSBM18G1    BSBM18H1    BSBM18I1
## BSBM17A1  0.50190710  0.3707794  0.35899028  0.35468388  0.35330027  0.33011196
## BSBM17B1 -0.35451145 -0.2754567 -0.26234118 -0.24854785 -0.24473156 -0.23143962
## BSBM17C1 -0.40351355 -0.2819333 -0.26698557 -0.25488009 -0.27425041 -0.24165425
## BSBM17D1  0.52746717  0.4098831  0.39848855  0.38031003  0.41714836  0.39139863
## BSBM17E1  0.48876561  0.3578290  0.34986270  0.34937507  0.33662825  0.31904492
## BSBM17F1  0.42005308  0.2964591  0.27322313  0.29668984  0.27380793  0.25084071
## BSBM17G1  0.43050466  0.3043997  0.28338629  0.30459285  0.27875668  0.26123058
## BSBM17H1  0.54039490  0.3975143  0.37822977  0.38642283  0.38576615  0.34464885
## BSBM17I1  0.45503181  0.3330995  0.31312376  0.31776332  0.31072983  0.28375347
## BSBM18A1  0.46144318  0.5094499  0.49297448  0.44607371  0.45646831  0.48483722
## BSBM18B1  0.58226643  0.7150788  0.73608679  0.52282187  0.56368633  0.58427341
## BSBM18C1  0.72089200  0.6046541  0.58119967  0.52357457  0.52498541  0.52307537
## BSBM18D1  1.00000000  0.6149318  0.57286780  0.55786548  0.60357276  0.54771173
## BSBM18E1  0.61493178  1.0000000  0.76576403  0.56896006  0.60517071  0.62876994
## BSBM18F1  0.57286780  0.7657640  1.00000000  0.56667565  0.62260931  0.63453985
## BSBM18G1  0.55786548  0.5689601  0.56667565  1.00000000  0.59988532  0.58241733
## BSBM18H1  0.60357276  0.6051707  0.62260931  0.59988532  1.00000000  0.65809505
## BSBM18I1  0.54771173  0.6287699  0.63453985  0.58241733  0.65809505  1.00000000
## BSBM18J1  0.54135589  0.6343709  0.62067304  0.55935865  0.59049838  0.66680277
## BSBM19A1  0.30510304  0.2688822  0.25062297  0.23569958  0.20480309  0.22471300
## BSBM19B1 -0.09853654 -0.1146815 -0.10124379 -0.07872459 -0.04968707 -0.06754569
## BSBM19C1 -0.19474441 -0.1666128 -0.15273425 -0.12267112 -0.09309949 -0.11485944
## BSBM19D1  0.31829632  0.2916006  0.28047162  0.27629340  0.23601229  0.24453325
## BSBM19E1 -0.06507009 -0.1029988 -0.08884738 -0.04894775 -0.02844946 -0.05621497
##             BSBM18J1   BSBM19A1    BSBM19B1    BSBM19C1   BSBM19D1    BSBM19E1
## BSBM17A1  0.33734922  0.5236116 -0.28826323 -0.42698905  0.5329095 -0.20666147
## BSBM17B1 -0.26141114 -0.3786238  0.31310220  0.42090892 -0.3856843  0.24279751
## BSBM17C1 -0.26089394 -0.3447032  0.28076670  0.37974896 -0.3548225  0.22089787
## BSBM17D1  0.38161483  0.3653501 -0.14560072 -0.24700572  0.3845915 -0.08275696
## BSBM17E1  0.32662437  0.5399734 -0.31061953 -0.45216385  0.5494962 -0.21745219
## BSBM17F1  0.26158556  0.4798487 -0.27479110 -0.41421841  0.5033834 -0.18757992
## BSBM17G1  0.27347749  0.5272103 -0.31293359 -0.45588352  0.5504441 -0.20229021
## BSBM17H1  0.36527343  0.4412476 -0.22527236 -0.34846323  0.4542709 -0.17267014
## BSBM17I1  0.29324619  0.4929663 -0.32076087 -0.48704098  0.5150073 -0.24356201
## BSBM18A1  0.47911980  0.2823140 -0.10071192 -0.14830497  0.2819159 -0.09137876
## BSBM18B1  0.60649920  0.3016033 -0.14031704 -0.18751012  0.3096618 -0.12667656
## BSBM18C1  0.54430381  0.3238514 -0.12848220 -0.20933838  0.3297875 -0.07749327
## BSBM18D1  0.54135589  0.3051030 -0.09853654 -0.19474441  0.3182963 -0.06507009
## BSBM18E1  0.63437094  0.2688822 -0.11468152 -0.16661284  0.2916006 -0.10299880
## BSBM18F1  0.62067304  0.2506230 -0.10124379 -0.15273425  0.2804716 -0.08884738
## BSBM18G1  0.55935865  0.2356996 -0.07872459 -0.12267112  0.2762934 -0.04894775
## BSBM18H1  0.59049838  0.2048031 -0.04968707 -0.09309949  0.2360123 -0.02844946
## BSBM18I1  0.66680277  0.2247130 -0.06754569 -0.11485944  0.2445332 -0.05621497
## BSBM18J1  1.00000000  0.2433607 -0.08253412 -0.12916724  0.2460654 -0.08517851
## BSBM19A1  0.24336071  1.0000000 -0.45243917 -0.53607512  0.6505018 -0.30194684
## BSBM19B1 -0.08253412 -0.4524392  1.00000000  0.63602971 -0.4484224  0.44065611
## BSBM19C1 -0.12916724 -0.5360751  0.63602971  1.00000000 -0.5180123  0.42977445
## BSBM19D1  0.24606539  0.6505018 -0.44842244 -0.51801227  1.0000000 -0.27456968
## BSBM19E1 -0.08517851 -0.3019468  0.44065611  0.42977445 -0.2745697  1.00000000

The result is rather immense and messy, so let’s make it look it a bit clearer.

mcor <- round(cor(dat.cor),2)

upper <- mcor
upper[upper.tri(mcor)] <- ""
upper <- as.data.frame(upper)
upper
##          BSBM17A1 BSBM17B1 BSBM17C1 BSBM17D1 BSBM17E1 BSBM17F1 BSBM17G1
## BSBM17A1        1                                                      
## BSBM17B1    -0.97        1                                             
## BSBM17C1    -0.97     0.96        1                                    
## BSBM17D1     0.94    -0.94    -0.94        1                           
## BSBM17E1     0.99    -0.97    -0.97     0.94        1                  
## BSBM17F1     0.96    -0.94    -0.94      0.9     0.97        1         
## BSBM17G1     0.97    -0.95    -0.94     0.91     0.98     0.98        1
## BSBM17H1     0.97    -0.96    -0.97     0.93     0.98     0.96     0.96
## BSBM17I1     0.98    -0.95    -0.96     0.91     0.98     0.97     0.97
## BSBM18A1     0.66    -0.71    -0.71     0.71     0.65      0.6     0.61
## BSBM18B1     0.66    -0.71    -0.71     0.72     0.64     0.58     0.59
## BSBM18C1     0.84    -0.87    -0.87     0.88     0.82     0.78     0.78
## BSBM18D1     0.81    -0.85    -0.86     0.86      0.8     0.75     0.76
## BSBM18E1     0.65    -0.71    -0.71     0.72     0.63     0.58     0.58
## BSBM18F1     0.62    -0.69    -0.68      0.7     0.61     0.55     0.55
## BSBM18G1     0.66    -0.71    -0.71     0.72     0.64     0.59      0.6
## BSBM18H1     0.62    -0.68    -0.69     0.71      0.6     0.54     0.55
## BSBM18I1     0.59    -0.65    -0.65     0.67     0.57     0.51     0.52
## BSBM18J1     0.62    -0.68    -0.68     0.69      0.6     0.54     0.55
## BSBM19A1      0.9    -0.87    -0.85     0.81      0.9     0.89     0.91
## BSBM19B1    -0.84     0.79     0.77    -0.76    -0.85    -0.84    -0.86
## BSBM19C1    -0.91     0.86     0.84    -0.83    -0.91    -0.91    -0.92
## BSBM19D1      0.9    -0.88    -0.86     0.82     0.91      0.9     0.91
## BSBM19E1    -0.78     0.73     0.71    -0.71    -0.78    -0.77    -0.78
##          BSBM17H1 BSBM17I1 BSBM18A1 BSBM18B1 BSBM18C1 BSBM18D1 BSBM18E1
## BSBM17A1                                                               
## BSBM17B1                                                               
## BSBM17C1                                                               
## BSBM17D1                                                               
## BSBM17E1                                                               
## BSBM17F1                                                               
## BSBM17G1                                                               
## BSBM17H1        1                                                      
## BSBM17I1     0.98        1                                             
## BSBM18A1     0.67     0.63        1                                    
## BSBM18B1     0.68     0.63      0.9        1                           
## BSBM18C1     0.86     0.81     0.85      0.9        1                  
## BSBM18D1     0.84     0.79     0.84      0.9     0.97        1         
## BSBM18E1     0.67     0.62     0.87     0.97      0.9      0.9        1
## BSBM18F1     0.65      0.6     0.86     0.97     0.88     0.89     0.98
## BSBM18G1     0.68     0.63     0.84     0.89     0.87     0.89     0.91
## BSBM18H1     0.65     0.59     0.83      0.9     0.86     0.89     0.92
## BSBM18I1     0.61     0.56     0.85     0.91     0.85     0.86     0.93
## BSBM18J1     0.64     0.59     0.86     0.93     0.87     0.87     0.94
## BSBM19A1     0.86      0.9     0.62     0.61     0.72     0.69     0.58
## BSBM19B1    -0.81    -0.85    -0.61    -0.62    -0.69    -0.67    -0.59
## BSBM19C1    -0.88    -0.92    -0.63    -0.63    -0.74    -0.72    -0.61
## BSBM19D1     0.87      0.9     0.62     0.62     0.73      0.7     0.59
## BSBM19E1    -0.76    -0.79    -0.62    -0.63    -0.67    -0.65     -0.6
##          BSBM18F1 BSBM18G1 BSBM18H1 BSBM18I1 BSBM18J1 BSBM19A1 BSBM19B1
## BSBM17A1                                                               
## BSBM17B1                                                               
## BSBM17C1                                                               
## BSBM17D1                                                               
## BSBM17E1                                                               
## BSBM17F1                                                               
## BSBM17G1                                                               
## BSBM17H1                                                               
## BSBM17I1                                                               
## BSBM18A1                                                               
## BSBM18B1                                                               
## BSBM18C1                                                               
## BSBM18D1                                                               
## BSBM18E1                                                               
## BSBM18F1        1                                                      
## BSBM18G1     0.91        1                                             
## BSBM18H1     0.93     0.92        1                                    
## BSBM18I1     0.94     0.91     0.94        1                           
## BSBM18J1     0.94     0.91     0.92     0.95        1                  
## BSBM19A1     0.55     0.57     0.52     0.51     0.54        1         
## BSBM19B1    -0.57    -0.58    -0.52    -0.52    -0.55    -0.94        1
## BSBM19C1    -0.58     -0.6    -0.54    -0.54    -0.57    -0.96     0.95
## BSBM19D1     0.57     0.59     0.53     0.53     0.55     0.96    -0.94
## BSBM19E1    -0.58    -0.57    -0.53    -0.54    -0.57    -0.85     0.85
##          BSBM19C1 BSBM19D1 BSBM19E1
## BSBM17A1                           
## BSBM17B1                           
## BSBM17C1                           
## BSBM17D1                           
## BSBM17E1                           
## BSBM17F1                           
## BSBM17G1                           
## BSBM17H1                           
## BSBM17I1                           
## BSBM18A1                           
## BSBM18B1                           
## BSBM18C1                           
## BSBM18D1                           
## BSBM18E1                           
## BSBM18F1                           
## BSBM18G1                           
## BSBM18H1                           
## BSBM18I1                           
## BSBM18J1                           
## BSBM19A1                           
## BSBM19B1                           
## BSBM19C1        1                  
## BSBM19D1    -0.96        1         
## BSBM19E1     0.85    -0.84        1
library(knitr)
library(kableExtra)
upper %>%
  kable() %>%
  kable_styling()
BSBM17A1 BSBM17B1 BSBM17C1 BSBM17D1 BSBM17E1 BSBM17F1 BSBM17G1 BSBM17H1 BSBM17I1 BSBM18A1 BSBM18B1 BSBM18C1 BSBM18D1 BSBM18E1 BSBM18F1 BSBM18G1 BSBM18H1 BSBM18I1 BSBM18J1 BSBM19A1 BSBM19B1 BSBM19C1 BSBM19D1 BSBM19E1
BSBM17A1 1
BSBM17B1 -0.97 1
BSBM17C1 -0.97 0.96 1
BSBM17D1 0.94 -0.94 -0.94 1
BSBM17E1 0.99 -0.97 -0.97 0.94 1
BSBM17F1 0.96 -0.94 -0.94 0.9 0.97 1
BSBM17G1 0.97 -0.95 -0.94 0.91 0.98 0.98 1
BSBM17H1 0.97 -0.96 -0.97 0.93 0.98 0.96 0.96 1
BSBM17I1 0.98 -0.95 -0.96 0.91 0.98 0.97 0.97 0.98 1
BSBM18A1 0.66 -0.71 -0.71 0.71 0.65 0.6 0.61 0.67 0.63 1
BSBM18B1 0.66 -0.71 -0.71 0.72 0.64 0.58 0.59 0.68 0.63 0.9 1
BSBM18C1 0.84 -0.87 -0.87 0.88 0.82 0.78 0.78 0.86 0.81 0.85 0.9 1
BSBM18D1 0.81 -0.85 -0.86 0.86 0.8 0.75 0.76 0.84 0.79 0.84 0.9 0.97 1
BSBM18E1 0.65 -0.71 -0.71 0.72 0.63 0.58 0.58 0.67 0.62 0.87 0.97 0.9 0.9 1
BSBM18F1 0.62 -0.69 -0.68 0.7 0.61 0.55 0.55 0.65 0.6 0.86 0.97 0.88 0.89 0.98 1
BSBM18G1 0.66 -0.71 -0.71 0.72 0.64 0.59 0.6 0.68 0.63 0.84 0.89 0.87 0.89 0.91 0.91 1
BSBM18H1 0.62 -0.68 -0.69 0.71 0.6 0.54 0.55 0.65 0.59 0.83 0.9 0.86 0.89 0.92 0.93 0.92 1
BSBM18I1 0.59 -0.65 -0.65 0.67 0.57 0.51 0.52 0.61 0.56 0.85 0.91 0.85 0.86 0.93 0.94 0.91 0.94 1
BSBM18J1 0.62 -0.68 -0.68 0.69 0.6 0.54 0.55 0.64 0.59 0.86 0.93 0.87 0.87 0.94 0.94 0.91 0.92 0.95 1
BSBM19A1 0.9 -0.87 -0.85 0.81 0.9 0.89 0.91 0.86 0.9 0.62 0.61 0.72 0.69 0.58 0.55 0.57 0.52 0.51 0.54 1
BSBM19B1 -0.84 0.79 0.77 -0.76 -0.85 -0.84 -0.86 -0.81 -0.85 -0.61 -0.62 -0.69 -0.67 -0.59 -0.57 -0.58 -0.52 -0.52 -0.55 -0.94 1
BSBM19C1 -0.91 0.86 0.84 -0.83 -0.91 -0.91 -0.92 -0.88 -0.92 -0.63 -0.63 -0.74 -0.72 -0.61 -0.58 -0.6 -0.54 -0.54 -0.57 -0.96 0.95 1
BSBM19D1 0.9 -0.88 -0.86 0.82 0.91 0.9 0.91 0.87 0.9 0.62 0.62 0.73 0.7 0.59 0.57 0.59 0.53 0.53 0.55 0.96 -0.94 -0.96 1
BSBM19E1 -0.78 0.73 0.71 -0.71 -0.78 -0.77 -0.78 -0.76 -0.79 -0.62 -0.63 -0.67 -0.65 -0.6 -0.58 -0.57 -0.53 -0.54 -0.57 -0.85 0.85 0.85 -0.84 1

Altogether, it seen that all variables are actually highly correlated, both positively and negatively (>= 0.5). Since we got high correlations, can procees to the next step.

Defining the number of factors:

fa.parallel(nzldfa, fa="both", n.iter=100) 

## Parallel analysis suggests that the number of factors =  4  and the number of components =  3

R suggests to remain 4 factors in the analyis.

Let’s try to rotate the diagrams to get the best result.

No rotation:

fa(nzldfa, nfactors = 4, rotate = "none", fm = "ml")
## Factor Analysis using method =  ml
## Call: fa(r = nzldfa, nfactors = 4, rotate = "none", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            ML1   ML2   ML3   ML4   h2   u2 com
## BSBM17A1  0.82 -0.26  0.16  0.01 0.77 0.23 1.3
## BSBM17B1 -0.62  0.24  0.00  0.32 0.54 0.46 1.8
## BSBM17C1 -0.63  0.22 -0.08  0.37 0.59 0.41 1.9
## BSBM17D1  0.70 -0.02  0.21  0.04 0.53 0.47 1.2
## BSBM17E1  0.84 -0.31  0.16  0.02 0.82 0.18 1.3
## BSBM17F1  0.72 -0.31  0.13  0.11 0.65 0.35 1.5
## BSBM17G1  0.76 -0.33  0.10  0.13 0.71 0.29 1.5
## BSBM17H1  0.79 -0.17  0.21 -0.03 0.69 0.31 1.2
## BSBM17I1  0.77 -0.30  0.09 -0.04 0.69 0.31 1.3
## BSBM18A1  0.53  0.34 -0.07  0.08 0.40 0.60 1.8
## BSBM18B1  0.64  0.50 -0.15  0.01 0.68 0.32 2.0
## BSBM18C1  0.73  0.29  0.09 -0.05 0.62 0.38 1.4
## BSBM18D1  0.71  0.32  0.11 -0.05 0.62 0.38 1.5
## BSBM18E1  0.65  0.54 -0.12 -0.01 0.72 0.28 2.0
## BSBM18F1  0.63  0.55 -0.12 -0.01 0.71 0.29 2.0
## BSBM18G1  0.57  0.41  0.00  0.04 0.50 0.50 1.8
## BSBM18H1  0.58  0.48  0.02 -0.01 0.58 0.42 1.9
## BSBM18I1  0.57  0.51 -0.05  0.02 0.60 0.40 2.0
## BSBM18J1  0.58  0.49 -0.07 -0.01 0.58 0.42 2.0
## BSBM19A1  0.60 -0.27 -0.32  0.24 0.59 0.41 2.4
## BSBM19B1 -0.36  0.34  0.58  0.10 0.59 0.41 2.4
## BSBM19C1 -0.50  0.40  0.50  0.09 0.67 0.33 3.0
## BSBM19D1  0.62 -0.26 -0.30  0.25 0.60 0.40 2.2
## BSBM19E1 -0.26  0.22  0.41  0.15 0.30 0.70 2.6
## 
##                         ML1  ML2  ML3  ML4
## SS loadings           10.00 3.09 1.21 0.45
## Proportion Var         0.42 0.13 0.05 0.02
## Cumulative Var         0.42 0.55 0.60 0.61
## Proportion Explained   0.68 0.21 0.08 0.03
## Cumulative Proportion  0.68 0.89 0.97 1.00
## 
## Mean item complexity =  1.8
## Test of the hypothesis that 4 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  16.69 with Chi Square of  121470.5
## The degrees of freedom for the model are 186  and the objective function was  0.74 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  7290 with the empirical chi square  1526.99  with prob <  4e-209 
## The total number of observations was  7290  with Likelihood Chi Square =  5395.06  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.936
## RMSEA index =  0.062  and the 90 % confidence intervals are  0.061 0.063
## BIC =  3740.72
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML1  ML2  ML3  ML4
## Correlation of (regression) scores with factors   0.98 0.95 0.87 0.72
## Multiple R square of scores with factors          0.97 0.90 0.76 0.52
## Minimum correlation of possible factor scores     0.94 0.80 0.52 0.03
fa.diagram(fa(nzldfa, nfactors = 4, rotate = "none", fm = "ml"))

If we look at indexes, we may see that proportion variance is above 0.1 just for 2 variables, while proportion explained differs a lot among variables - not okay. One factor took the majority of the variables. Then, RMSR is 0.02 - not bad, quite close to 0; RMSEA is 0.062 - not bad but nor good, less than 0.05 is excellent; Tucker Lewis Index is 0.936 which is perfect.

Varimax:

fa(nzldfa, nfactors = 4, rotate = "varimax", fm = "ml")
## Factor Analysis using method =  ml
## Call: fa(r = nzldfa, nfactors = 4, rotate = "varimax", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            ML2   ML1   ML3   ML4   h2   u2 com
## BSBM17A1  0.27  0.80  0.21 -0.01 0.77 0.23 1.4
## BSBM17B1 -0.18 -0.56 -0.31  0.31 0.54 0.46 2.5
## BSBM17C1 -0.19 -0.59 -0.25  0.38 0.59 0.41 2.4
## BSBM17D1  0.38  0.62  0.03  0.00 0.53 0.47 1.7
## BSBM17E1  0.24  0.84  0.23  0.00 0.82 0.18 1.3
## BSBM17F1  0.18  0.75  0.21  0.10 0.65 0.35 1.3
## BSBM17G1  0.19  0.77  0.25  0.12 0.71 0.29 1.4
## BSBM17H1  0.32  0.76  0.12 -0.06 0.69 0.31 1.4
## BSBM17I1  0.22  0.75  0.27 -0.05 0.69 0.31 1.5
## BSBM18A1  0.59  0.19  0.09  0.07 0.40 0.60 1.3
## BSBM18B1  0.80  0.15  0.15  0.01 0.68 0.32 1.1
## BSBM18C1  0.66  0.43  0.04 -0.08 0.62 0.38 1.8
## BSBM18D1  0.67  0.41  0.01 -0.08 0.62 0.38 1.7
## BSBM18E1  0.83  0.15  0.11 -0.02 0.72 0.28 1.1
## BSBM18F1  0.83  0.13  0.10 -0.01 0.71 0.29 1.1
## BSBM18G1  0.67  0.22  0.02  0.02 0.50 0.50 1.2
## BSBM18H1  0.73  0.20 -0.02 -0.03 0.58 0.42 1.2
## BSBM18I1  0.76  0.14  0.03  0.00 0.60 0.40 1.1
## BSBM18J1  0.74  0.15  0.06 -0.02 0.58 0.42 1.1
## BSBM19A1  0.19  0.44  0.53  0.28 0.59 0.41 2.8
## BSBM19B1 -0.02 -0.16 -0.75  0.02 0.59 0.41 1.1
## BSBM19C1 -0.05 -0.33 -0.74  0.02 0.67 0.33 1.4
## BSBM19D1  0.21  0.46  0.51  0.29 0.60 0.40 3.0
## BSBM19E1 -0.03 -0.10 -0.53  0.09 0.30 0.70 1.1
## 
##                        ML2  ML1  ML3  ML4
## SS loadings           6.00 5.83 2.46 0.46
## Proportion Var        0.25 0.24 0.10 0.02
## Cumulative Var        0.25 0.49 0.60 0.61
## Proportion Explained  0.41 0.40 0.17 0.03
## Cumulative Proportion 0.41 0.80 0.97 1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 4 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  16.69 with Chi Square of  121470.5
## The degrees of freedom for the model are 186  and the objective function was  0.74 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  7290 with the empirical chi square  1526.99  with prob <  4e-209 
## The total number of observations was  7290  with Likelihood Chi Square =  5395.06  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.936
## RMSEA index =  0.062  and the 90 % confidence intervals are  0.061 0.063
## BIC =  3740.72
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML2  ML1  ML3  ML4
## Correlation of (regression) scores with factors   0.96 0.95 0.89 0.72
## Multiple R square of scores with factors          0.92 0.91 0.79 0.52
## Minimum correlation of possible factor scores     0.84 0.81 0.59 0.04
fa.diagram(fa(nzldfa, nfactors = 4, rotate = "varimax", fm = "ml"))

RMSR, RMSEA and Tucker Lewis Index remained the same. Proportion variance is now above .1 for three variables, proportion explained still differs quite a lot.

Altogether, rather good result; variables seem to be assigned to factors logically (all 17, 18, 19 to different factors correspondingly).

Oblimin:

fa(nzldfa, nfactors = 4, rotate = "oblimin", fm = "ml")
## Loading required namespace: GPArotation
## Factor Analysis using method =  ml
## Call: fa(r = nzldfa, nfactors = 4, rotate = "oblimin", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            ML2   ML1   ML3   ML4   h2   u2 com
## BSBM17A1  0.04  0.84 -0.02 -0.04 0.77 0.23 1.0
## BSBM17B1 -0.05 -0.43  0.26  0.34 0.54 0.46 2.6
## BSBM17C1 -0.04 -0.47  0.20  0.41 0.59 0.41 2.4
## BSBM17D1  0.21  0.65  0.14 -0.02 0.53 0.47 1.3
## BSBM17E1  0.00  0.89 -0.03 -0.03 0.82 0.18 1.0
## BSBM17F1 -0.04  0.83 -0.01  0.07 0.65 0.35 1.0
## BSBM17G1 -0.05  0.85 -0.04  0.09 0.71 0.29 1.0
## BSBM17H1  0.11  0.78  0.06 -0.09 0.69 0.31 1.1
## BSBM17I1  0.00  0.76 -0.11 -0.08 0.69 0.31 1.1
## BSBM18A1  0.59  0.08 -0.01  0.08 0.40 0.60 1.1
## BSBM18B1  0.84 -0.06 -0.09  0.03 0.68 0.32 1.0
## BSBM18C1  0.58  0.32  0.07 -0.08 0.62 0.38 1.6
## BSBM18D1  0.60  0.30  0.10 -0.08 0.62 0.38 1.6
## BSBM18E1  0.87 -0.06 -0.05  0.00 0.72 0.28 1.0
## BSBM18F1  0.88 -0.08 -0.05  0.00 0.71 0.29 1.0
## BSBM18G1  0.67  0.10  0.06  0.03 0.50 0.50 1.1
## BSBM18H1  0.74  0.05  0.09 -0.02 0.58 0.42 1.0
## BSBM18I1  0.79 -0.03  0.03  0.02 0.60 0.40 1.0
## BSBM18J1  0.77 -0.03 -0.01 -0.01 0.58 0.42 1.0
## BSBM19A1  0.09  0.40 -0.41  0.28 0.59 0.41 2.9
## BSBM19B1 -0.02  0.07  0.80  0.01 0.59 0.41 1.0
## BSBM19C1  0.01 -0.14  0.74  0.02 0.67 0.33 1.1
## BSBM19D1  0.10  0.43 -0.38  0.29 0.60 0.40 2.9
## BSBM19E1 -0.04  0.09  0.58  0.09 0.30 0.70 1.1
## 
##                        ML2  ML1  ML3  ML4
## SS loadings           5.88 6.07 2.27 0.53
## Proportion Var        0.25 0.25 0.09 0.02
## Cumulative Var        0.25 0.50 0.59 0.61
## Proportion Explained  0.40 0.41 0.15 0.04
## Cumulative Proportion 0.40 0.81 0.96 1.00
## 
##  With factor correlations of 
##       ML2   ML1   ML3   ML4
## ML2  1.00  0.50 -0.16 -0.10
## ML1  0.50  1.00 -0.48 -0.15
## ML3 -0.16 -0.48  1.00 -0.06
## ML4 -0.10 -0.15 -0.06  1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 4 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  16.69 with Chi Square of  121470.5
## The degrees of freedom for the model are 186  and the objective function was  0.74 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  7290 with the empirical chi square  1526.99  with prob <  4e-209 
## The total number of observations was  7290  with Likelihood Chi Square =  5395.06  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.936
## RMSEA index =  0.062  and the 90 % confidence intervals are  0.061 0.063
## BIC =  3740.72
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML2  ML1  ML3  ML4
## Correlation of (regression) scores with factors   0.97 0.98 0.92 0.75
## Multiple R square of scores with factors          0.94 0.95 0.84 0.56
## Minimum correlation of possible factor scores     0.88 0.90 0.68 0.11
fa.diagram(fa(nzldfa, nfactors = 4, rotate = "oblimin", fm = "ml"))

RMSR, RMSEA and Tucker Lewis Index are again the same. However, the results of proportions became worse - proportion var is above .1 just for two factors, proportion explained almost the seame but in the previous case the values seem to differ less.

So, evidently just three factors should be remained. However, the indexes for oblimin and varimax rotation are quite similar, thus we need to look at them one more time, but with three factors left.

fa(nzldfa, nfactors = 3, rotate = "varimax", fm = "ml")
## Factor Analysis using method =  ml
## Call: fa(r = nzldfa, nfactors = 3, rotate = "varimax", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            ML2   ML1   ML3   h2   u2 com
## BSBM17A1  0.26  0.81  0.20 0.77 0.23 1.3
## BSBM17B1 -0.19 -0.56 -0.28 0.43 0.57 1.7
## BSBM17C1 -0.20 -0.59 -0.21 0.44 0.56 1.5
## BSBM17D1  0.38  0.62  0.03 0.53 0.47 1.7
## BSBM17E1  0.24  0.85  0.23 0.82 0.18 1.3
## BSBM17F1  0.17  0.74  0.22 0.63 0.37 1.3
## BSBM17G1  0.18  0.77  0.26 0.69 0.31 1.3
## BSBM17H1  0.31  0.76  0.12 0.69 0.31 1.4
## BSBM17I1  0.21  0.76  0.27 0.69 0.31 1.4
## BSBM18A1  0.59  0.20  0.10 0.40 0.60 1.3
## BSBM18B1  0.79  0.16  0.15 0.68 0.32 1.2
## BSBM18C1  0.65  0.44  0.04 0.62 0.38 1.8
## BSBM18D1  0.67  0.41  0.01 0.62 0.38 1.7
## BSBM18E1  0.82  0.16  0.11 0.72 0.28 1.1
## BSBM18F1  0.83  0.14  0.10 0.72 0.28 1.1
## BSBM18G1  0.67  0.22  0.03 0.50 0.50 1.2
## BSBM18H1  0.73  0.20 -0.01 0.58 0.42 1.2
## BSBM18I1  0.76  0.15  0.04 0.60 0.40 1.1
## BSBM18J1  0.74  0.16  0.06 0.58 0.42 1.1
## BSBM19A1  0.18  0.44  0.53 0.51 0.49 2.2
## BSBM19B1 -0.01 -0.16 -0.75 0.59 0.41 1.1
## BSBM19C1 -0.04 -0.33 -0.75 0.67 0.33 1.4
## BSBM19D1  0.20  0.46  0.51 0.51 0.49 2.3
## BSBM19E1 -0.03 -0.11 -0.52 0.29 0.71 1.1
## 
##                        ML2  ML1  ML3
## SS loadings           5.94 5.89 2.43
## Proportion Var        0.25 0.25 0.10
## Cumulative Var        0.25 0.49 0.59
## Proportion Explained  0.42 0.41 0.17
## Cumulative Proportion 0.42 0.83 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  16.69 with Chi Square of  121470.5
## The degrees of freedom for the model are 207  and the objective function was  0.98 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic number of observations is  7290 with the empirical chi square  2692.81  with prob <  0 
## The total number of observations was  7290  with Likelihood Chi Square =  7130.44  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.924
## RMSEA index =  0.068  and the 90 % confidence intervals are  0.066 0.069
## BIC =  5289.33
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML2  ML1  ML3
## Correlation of (regression) scores with factors   0.96 0.95 0.89
## Multiple R square of scores with factors          0.92 0.91 0.79
## Minimum correlation of possible factor scores     0.84 0.81 0.58
fa(nzldfa, nfactors = 3, rotate = "oblimin", fm = "ml")
## Factor Analysis using method =  ml
## Call: fa(r = nzldfa, nfactors = 3, rotate = "oblimin", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            ML1   ML2   ML3   h2   u2 com
## BSBM17A1  0.86  0.02 -0.01 0.77 0.23 1.0
## BSBM17B1 -0.55 -0.03  0.16 0.43 0.57 1.2
## BSBM17C1 -0.61 -0.02  0.08 0.44 0.56 1.0
## BSBM17D1  0.66  0.20  0.13 0.53 0.47 1.3
## BSBM17E1  0.91 -0.02 -0.03 0.82 0.18 1.0
## BSBM17F1  0.80 -0.06 -0.04 0.63 0.37 1.0
## BSBM17G1  0.81 -0.06 -0.08 0.69 0.31 1.0
## BSBM17H1  0.82  0.08  0.07 0.69 0.31 1.0
## BSBM17I1  0.79 -0.02 -0.10 0.69 0.31 1.0
## BSBM18A1  0.06  0.59 -0.04 0.40 0.60 1.0
## BSBM18B1 -0.06  0.84 -0.10 0.68 0.32 1.0
## BSBM18C1  0.35  0.57  0.09 0.62 0.38 1.7
## BSBM18D1  0.33  0.59  0.11 0.62 0.38 1.7
## BSBM18E1 -0.05  0.87 -0.06 0.72 0.28 1.0
## BSBM18F1 -0.08  0.88 -0.05 0.72 0.28 1.0
## BSBM18G1  0.09  0.66  0.05 0.50 0.50 1.0
## BSBM18H1  0.07  0.73  0.09 0.58 0.42 1.0
## BSBM18I1 -0.03  0.79  0.02 0.60 0.40 1.0
## BSBM18J1 -0.02  0.77  0.00 0.58 0.42 1.0
## BSBM19A1  0.32  0.09 -0.47 0.51 0.49 1.9
## BSBM19B1  0.05 -0.03  0.79 0.59 0.41 1.0
## BSBM19C1 -0.16  0.00  0.73 0.67 0.33 1.1
## BSBM19D1  0.34  0.11 -0.44 0.51 0.49 2.0
## BSBM19E1  0.04 -0.04  0.55 0.29 0.71 1.0
## 
##                        ML1  ML2  ML3
## SS loadings           6.26 5.78 2.22
## Proportion Var        0.26 0.24 0.09
## Cumulative Var        0.26 0.50 0.59
## Proportion Explained  0.44 0.40 0.16
## Cumulative Proportion 0.44 0.84 1.00
## 
##  With factor correlations of 
##       ML1   ML2   ML3
## ML1  1.00  0.51 -0.45
## ML2  0.51  1.00 -0.13
## ML3 -0.45 -0.13  1.00
## 
## Mean item complexity =  1.2
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  16.69 with Chi Square of  121470.5
## The degrees of freedom for the model are 207  and the objective function was  0.98 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic number of observations is  7290 with the empirical chi square  2692.81  with prob <  0 
## The total number of observations was  7290  with Likelihood Chi Square =  7130.44  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.924
## RMSEA index =  0.068  and the 90 % confidence intervals are  0.066 0.069
## BIC =  5289.33
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML1  ML2  ML3
## Correlation of (regression) scores with factors   0.98 0.97 0.91
## Multiple R square of scores with factors          0.95 0.94 0.83
## Minimum correlation of possible factor scores     0.90 0.88 0.67

So, the indexes of Tucker Lewis, RMSR and RMSEA are the same, and we look at proprotions. Finally, within varimax rotation all factors’ proportion variances are above .10, and apparently, as noted above, the explained proportions differ less in this case.

Altogether, varimax rotation seem to be the best solution.

And here’s the diagram:

fa.diagram(fa(nzldfa, nfactors = 3, rotate = "varimax", fm = "ml"))

Naming factors

In accordance with the related variables, the factors’ names were assigned as follows:

Factor Assigned name
M1 Attitude to and interest in math
M2 Interaction with a teacher
M3 Personal abilities in math

Internal consistency

ML1 <- as.data.frame(nzldfa[c("BSBM17A1", "BSBM17B1", "BSBM17C1", "BSBM17D1", "BSBM17E1", "BSBM17F1", "BSBM17G1", "BSBM17H1", "BSBM17I1")])

psych::alpha(ML1, check.keys = TRUE)
## Warning in psych::alpha(ML1, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: psych::alpha(x = ML1, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.93      0.93    0.94      0.61  14 0.0011  2.6 0.79      0.6
## 
##  lower alpha upper     95% confidence boundaries
## 0.93 0.93 0.94 
## 
##  Reliability if an item is dropped:
##           raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## BSBM17A1       0.92      0.92    0.92      0.60  12   0.0014 0.0096  0.58
## BSBM17B1-      0.93      0.93    0.93      0.64  14   0.0012 0.0091  0.66
## BSBM17C1-      0.93      0.93    0.93      0.63  14   0.0012 0.0101  0.66
## BSBM17D1       0.93      0.93    0.93      0.63  14   0.0012 0.0093  0.66
## BSBM17E1       0.92      0.92    0.92      0.59  12   0.0014 0.0084  0.57
## BSBM17F1       0.93      0.93    0.93      0.61  13   0.0013 0.0102  0.59
## BSBM17G1       0.92      0.93    0.92      0.61  12   0.0013 0.0101  0.59
## BSBM17H1       0.92      0.92    0.92      0.61  12   0.0013 0.0105  0.59
## BSBM17I1       0.92      0.92    0.92      0.61  12   0.0013 0.0102  0.59
## 
##  Item statistics 
##              n raw.r std.r r.cor r.drop mean   sd
## BSBM17A1  7290  0.87  0.88  0.87   0.84  2.8 0.94
## BSBM17B1- 7290  0.72  0.72  0.67   0.64  2.8 1.04
## BSBM17C1- 7290  0.73  0.73  0.68   0.66  2.4 0.96
## BSBM17D1  7290  0.72  0.73  0.68   0.65  2.9 0.88
## BSBM17E1  7290  0.90  0.90  0.90   0.87  2.7 0.97
## BSBM17F1  7290  0.81  0.81  0.79   0.76  2.4 0.90
## BSBM17G1  7290  0.84  0.84  0.82   0.79  2.6 0.98
## BSBM17H1  7290  0.84  0.84  0.82   0.80  2.3 0.96
## BSBM17I1  7290  0.85  0.84  0.83   0.80  2.2 1.10
## 
## Non missing response frequency for each item
##             1    2    3    4 miss
## BSBM17A1 0.12 0.21 0.42 0.25    0
## BSBM17B1 0.34 0.29 0.24 0.13    0
## BSBM17C1 0.16 0.29 0.37 0.18    0
## BSBM17D1 0.07 0.22 0.44 0.27    0
## BSBM17E1 0.14 0.24 0.39 0.23    0
## BSBM17F1 0.17 0.41 0.31 0.12    0
## BSBM17G1 0.16 0.30 0.35 0.19    0
## BSBM17H1 0.23 0.36 0.28 0.13    0
## BSBM17I1 0.33 0.28 0.21 0.18    0

Alpha is 0.93 which is a very good result. The responses inside the scale are consistent, so the scale may be used further

ML2 <- as.data.frame(nzldfa[c("BSBM18A1", "BSBM18B1", "BSBM18C1", "BSBM18D1", "BSBM18E1", "BSBM18F1", "BSBM18G1", "BSBM18H1", "BSBM18I1", "BSBM18J1")])
psych::alpha(ML2)
## 
## Reliability analysis   
## Call: psych::alpha(x = ML2)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean  sd median_r
##       0.93      0.93    0.93      0.58  14 0.0012    3 0.7     0.58
## 
##  lower alpha upper     95% confidence boundaries
## 0.93 0.93 0.94 
## 
##  Reliability if an item is dropped:
##          raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## BSBM18A1      0.93      0.93    0.93      0.60  14   0.0012 0.0037  0.60
## BSBM18B1      0.92      0.92    0.92      0.57  12   0.0013 0.0052  0.57
## BSBM18C1      0.93      0.93    0.92      0.58  13   0.0013 0.0055  0.58
## BSBM18D1      0.93      0.93    0.92      0.58  12   0.0013 0.0056  0.58
## BSBM18E1      0.92      0.92    0.92      0.57  12   0.0013 0.0047  0.57
## BSBM18F1      0.92      0.92    0.92      0.57  12   0.0013 0.0045  0.57
## BSBM18G1      0.93      0.93    0.93      0.59  13   0.0012 0.0059  0.59
## BSBM18H1      0.93      0.93    0.93      0.58  12   0.0013 0.0059  0.57
## BSBM18I1      0.93      0.92    0.92      0.58  12   0.0013 0.0057  0.57
## BSBM18J1      0.93      0.93    0.93      0.58  12   0.0013 0.0059  0.58
## 
##  Item statistics 
##             n raw.r std.r r.cor r.drop mean   sd
## BSBM18A1 7290  0.67  0.68  0.63   0.61  3.3 0.71
## BSBM18B1 7290  0.83  0.83  0.81   0.78  3.1 0.92
## BSBM18C1 7290  0.78  0.78  0.75   0.72  2.9 0.90
## BSBM18D1 7290  0.79  0.79  0.76   0.73  2.7 0.93
## BSBM18E1 7290  0.85  0.84  0.83   0.80  3.0 0.93
## BSBM18F1 7290  0.84  0.84  0.82   0.79  3.1 0.92
## BSBM18G1 7290  0.75  0.75  0.71   0.69  2.9 0.90
## BSBM18H1 7290  0.79  0.79  0.76   0.73  3.1 0.89
## BSBM18I1 7290  0.80  0.80  0.78   0.75  3.2 0.88
## BSBM18J1 7290  0.79  0.79  0.76   0.74  3.1 0.91
## 
## Non missing response frequency for each item
##             1    2    3    4 miss
## BSBM18A1 0.02 0.09 0.43 0.47    0
## BSBM18B1 0.07 0.18 0.37 0.38    0
## BSBM18C1 0.08 0.25 0.41 0.26    0
## BSBM18D1 0.11 0.29 0.38 0.22    0
## BSBM18E1 0.08 0.20 0.37 0.35    0
## BSBM18F1 0.07 0.16 0.34 0.43    0
## BSBM18G1 0.08 0.24 0.41 0.26    0
## BSBM18H1 0.06 0.16 0.38 0.39    0
## BSBM18I1 0.06 0.15 0.37 0.42    0
## BSBM18J1 0.07 0.15 0.37 0.41    0

The resulting number is again 0.93, which is also good. This scale may also be remained.

ML3 <- as.data.frame(nzldfa[c("BSBM19A1", "BSBM19B1", "BSBM19C1", "BSBM19D1", "BSBM19E1")])
psych::alpha(ML3, check.keys = TRUE)
## Warning in psych::alpha(ML3, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: psych::alpha(x = ML3, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.81      0.82     0.8      0.47 4.4 0.0034  2.3 0.72     0.45
## 
##  lower alpha upper     95% confidence boundaries
## 0.81 0.81 0.82 
## 
##  Reliability if an item is dropped:
##           raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## BSBM19A1-      0.77      0.77    0.73      0.46 3.4   0.0043 0.0140  0.44
## BSBM19B1       0.76      0.77    0.74      0.45 3.3   0.0045 0.0211  0.47
## BSBM19C1       0.74      0.75    0.72      0.43 3.0   0.0050 0.0180  0.44
## BSBM19D1-      0.78      0.78    0.74      0.47 3.5   0.0041 0.0126  0.45
## BSBM19E1       0.82      0.82    0.80      0.54 4.7   0.0034 0.0076  0.53
## 
##  Item statistics 
##              n raw.r std.r r.cor r.drop mean   sd
## BSBM19A1- 7290  0.76  0.78  0.71   0.63  2.0 0.83
## BSBM19B1  7290  0.79  0.79  0.72   0.65  2.3 0.96
## BSBM19C1  7290  0.84  0.82  0.78   0.70  2.5 1.08
## BSBM19D1- 7290  0.75  0.76  0.70   0.60  2.3 0.89
## BSBM19E1  7290  0.65  0.65  0.49   0.45  2.2 0.96
## 
## Non missing response frequency for each item
##             1    2    3    4 miss
## BSBM19A1 0.06 0.17 0.49 0.28    0
## BSBM19B1 0.22 0.37 0.28 0.13    0
## BSBM19C1 0.21 0.27 0.28 0.24    0
## BSBM19D1 0.09 0.30 0.40 0.20    0
## BSBM19E1 0.28 0.36 0.26 0.11    0

This alpha is just 0.81, but still it is satisfactory, so to say (alpha should be > .7). All scales are internally consistent and may be used in further analysis.


Regression analysis

Inserting factors into our dataset:

fa1 <- fa(nzldfa, nfactors = 3, rotate = "varimax", fm = "ml", scores=T) 
load <- fa1$loadings[,1:2] 
plot(load)

fascores <- as.data.frame(fa1$scores)
nzldfac <- cbind(nzld,fascores)

Checking types of the variables.

sapply(nzldfac, class)
##  BSBM17A1  BSBM17B1  BSBM17C1  BSBM17D1  BSBM17E1  BSBM17F1  BSBM17G1  BSBM17H1 
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 
##  BSBM17I1  BSBM18A1  BSBM18B1  BSBM18C1  BSBM18D1  BSBM18E1  BSBM18F1  BSBM18G1 
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 
##  BSBM18H1  BSBM18I1  BSBM18J1  BSBM19A1  BSBM19B1  BSBM19C1  BSBM19D1  BSBM19E1 
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 
##  BSMMAT01    paredu    gender   country       ML2       ML1       ML3 
##  "factor"  "factor"  "factor"  "factor" "numeric" "numeric" "numeric"

Math achievement is factor, so let’s change it in advance.

nzldfac$BSMMAT01 <- as.numeric(as.character(nzldfac$BSMMAT01))

Distributions of used variables:

ml1 <- ggplot(nzldfac, aes(ML1)) + geom_histogram(color = "skyblue2", fill = "lightsteelblue1") + theme_pubclean()
ml2 <- ggplot(nzldfac, aes(ML2)) + geom_histogram(color = "skyblue2", fill = "lightsteelblue1") + theme_pubclean()
ml3 <- ggplot(nzldfac, aes(ML3)) + geom_histogram(color = "skyblue2", fill = "lightsteelblue1") + theme_pubclean()
ggarrange(ml1, ml2, ml3, nrow = 1, ncol = 3)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

First of all, let’s start from adding the factors.

model1 <- lm(nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2)
summary(model1)

Call:
lm(formula = nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2)

Residuals:
     Min       1Q   Median       3Q      Max 
-265.099  -46.138    7.271   51.981  221.323 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 505.5126     0.8669 583.119  < 2e-16 ***
nzldfac$ML3  45.9883     0.9792  46.966  < 2e-16 ***
nzldfac$ML1  12.8915     0.9148  14.092  < 2e-16 ***
nzldfac$ML2   5.4430     0.9039   6.022 1.81e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 74.02 on 7286 degrees of freedom
Multiple R-squared:  0.2615,    Adjusted R-squared:  0.2612 
F-statistic: 859.9 on 3 and 7286 DF,  p-value: < 2.2e-16

The overall relationship is significant (p < 0.0001), as well as the relations with the predictor variables.

Estimates mean that: (1) the better abilities are associated with the increase in math achievement by almost 46 points, (2) the better attitude to math results increases achievement by 13 points, and (3) the better is the nature of interaction with a teacher the better is the success in math (by 5 points roughly).

R-squared is rather ok, it says that approximately 26% of math achievement variance can be predicted by personal abilities, interaction with a teacher, and own attitude to math.

Many studies suggest that girls outperform boys in studying results. So, let’s check whether there is some influence of gender:

model2 <- lm(nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2 + as.factor(nzldfac$gender))
summary(model2)

Call:
lm(formula = nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2 + 
    as.factor(nzldfac$gender))

Residuals:
     Min       1Q   Median       3Q      Max 
-263.815  -46.395    7.465   51.323  220.247 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   508.4376     1.1869 428.372  < 2e-16 ***
nzldfac$ML3                    46.4830     0.9880  47.049  < 2e-16 ***
nzldfac$ML1                    13.1780     0.9175  14.363  < 2e-16 ***
nzldfac$ML2                     5.6839     0.9056   6.277 3.66e-10 ***
as.factor(nzldfac$gender)Male  -6.3766     1.7690  -3.605 0.000315 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 73.96 on 7285 degrees of freedom
Multiple R-squared:  0.2628,    Adjusted R-squared:  0.2624 
F-statistic: 649.2 on 4 and 7285 DF,  p-value: < 2.2e-16

The following result supports the named statement. According to significant p-value (p = .0003), and the estimate being a boy potentially reduces the level of math achievement by 6 points. R-squared remained almost the same, still the same 26% of variance may be explained with this model.

Now, the role of country:

model3 <- lm(nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2 + as.factor(nzldfac$country))
summary(model3)

Call:
lm(formula = nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2 + 
    as.factor(nzldfac$country))

Residuals:
     Min       1Q   Median       3Q      Max 
-263.361  -46.084    7.206   51.803  222.902 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  503.8938     0.9462 532.559  < 2e-16 ***
nzldfac$ML3                   45.8449     0.9786  46.846  < 2e-16 ***
nzldfac$ML1                   12.6517     0.9155  13.819  < 2e-16 ***
nzldfac$ML2                    5.3229     0.9032   5.893 3.96e-09 ***
as.factor(nzldfac$country)No  10.0179     2.3602   4.244 2.22e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 73.93 on 7285 degrees of freedom
Multiple R-squared:  0.2633,    Adjusted R-squared:  0.2629 
F-statistic: 650.9 on 4 and 7285 DF,  p-value: < 2.2e-16

Surprisingly, being born outside of the country increases the achievement in mathematics by 10 points (p < 0.0001) Explained variance. however, is still around 26%.

And parental education:

model4 <- lm(nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML3 + nzldfac$ML3 + as.factor(nzldfac$paredu))
summary(model4)

Call:
lm(formula = nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML3 + nzldfac$ML3 + 
    as.factor(nzldfac$paredu))

Residuals:
     Min       1Q   Median       3Q      Max 
-289.931  -44.885    5.647   49.554  231.944 

Coefficients:
                                                                    Estimate
(Intercept)                                                         538.8343
nzldfac$ML3                                                          44.0363
as.factor(nzldfac$paredu)Post-secondary but not University          -17.2546
as.factor(nzldfac$paredu)Upper Secondary                            -48.6591
as.factor(nzldfac$paredu)Lower Secondary                            -57.2875
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School -76.7510
as.factor(nzldfac$paredu)Don't Know                                 -42.6433
                                                                    Std. Error
(Intercept)                                                             1.9308
nzldfac$ML3                                                             0.9701
as.factor(nzldfac$paredu)Post-secondary but not University              3.2612
as.factor(nzldfac$paredu)Upper Secondary                                3.2388
as.factor(nzldfac$paredu)Lower Secondary                                4.3793
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School     7.7976
as.factor(nzldfac$paredu)Don't Know                                     2.2672
                                                                    t value
(Intercept)                                                         279.074
nzldfac$ML3                                                          45.391
as.factor(nzldfac$paredu)Post-secondary but not University           -5.291
as.factor(nzldfac$paredu)Upper Secondary                            -15.024
as.factor(nzldfac$paredu)Lower Secondary                            -13.081
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School  -9.843
as.factor(nzldfac$paredu)Don't Know                                 -18.809
                                                                    Pr(>|t|)
(Intercept)                                                          < 2e-16
nzldfac$ML3                                                          < 2e-16
as.factor(nzldfac$paredu)Post-secondary but not University          1.25e-07
as.factor(nzldfac$paredu)Upper Secondary                             < 2e-16
as.factor(nzldfac$paredu)Lower Secondary                             < 2e-16
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School  < 2e-16
as.factor(nzldfac$paredu)Don't Know                                  < 2e-16
                                                                       
(Intercept)                                                         ***
nzldfac$ML3                                                         ***
as.factor(nzldfac$paredu)Post-secondary but not University          ***
as.factor(nzldfac$paredu)Upper Secondary                            ***
as.factor(nzldfac$paredu)Lower Secondary                            ***
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School ***
as.factor(nzldfac$paredu)Don't Know                                 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 72.81 on 7283 degrees of freedom
Multiple R-squared:  0.2857,    Adjusted R-squared:  0.2851 
F-statistic: 485.5 on 6 and 7283 DF,  p-value: < 2.2e-16

As is seen, p-value tells that there is a significant relationship between math achivement and unknown status of parental education. Besides, all the relationships are negative. As was mentioned above, parental education contains too many “Don’t know” answers, thus we cannot remove it because it is significant, but the results should not be trusted completely.

In addition, several research also suggest that teachers are more ‘gentle’ towards girls (i.e. attitude to girls is somewhat better than to boys). Likewise, personal abilities in math may also depend on gender. So, let’s check the following assumptions and add the rest predictors into one model:

nzldfac$BSMMAT01 <- as.numeric(as.character(nzldfac$BSMMAT01))
model5 <- lm(nzldfac$BSMMAT01 ~ nzldfac$ML1 + nzldfac$ML2 * as.factor(nzldfac$gender) + nzldfac$ML3 * as.factor(nzldfac$gender) + as.factor(nzldfac$paredu) + as.factor(nzldfac$country))
summary(model5)

Call:
lm(formula = nzldfac$BSMMAT01 ~ nzldfac$ML1 + nzldfac$ML2 * as.factor(nzldfac$gender) + 
    nzldfac$ML3 * as.factor(nzldfac$gender) + as.factor(nzldfac$paredu) + 
    as.factor(nzldfac$country))

Residuals:
     Min       1Q   Median       3Q      Max 
-273.500  -44.105    6.302   49.332  229.611 

Coefficients:
                                                                    Estimate
(Intercept)                                                         537.9639
nzldfac$ML1                                                          11.8999
nzldfac$ML2                                                           1.3688
as.factor(nzldfac$gender)Male                                        -5.8192
nzldfac$ML3                                                          39.3535
as.factor(nzldfac$paredu)Post-secondary but not University          -16.0919
as.factor(nzldfac$paredu)Upper Secondary                            -45.3729
as.factor(nzldfac$paredu)Lower Secondary                            -53.5314
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School -71.0809
as.factor(nzldfac$paredu)Don't Know                                 -40.7669
as.factor(nzldfac$country)No                                          5.4686
nzldfac$ML2:as.factor(nzldfac$gender)Male                             9.5339
as.factor(nzldfac$gender)Male:nzldfac$ML3                             9.1231
                                                                    Std. Error
(Intercept)                                                             2.1104
nzldfac$ML1                                                             0.8915
nzldfac$ML2                                                             1.1480
as.factor(nzldfac$gender)Male                                           1.7123
nzldfac$ML3                                                             1.3046
as.factor(nzldfac$paredu)Post-secondary but not University              3.2111
as.factor(nzldfac$paredu)Upper Secondary                                3.2026
as.factor(nzldfac$paredu)Lower Secondary                                4.3179
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School     7.6662
as.factor(nzldfac$paredu)Don't Know                                     2.2381
as.factor(nzldfac$country)No                                            2.2991
nzldfac$ML2:as.factor(nzldfac$gender)Male                               1.7752
as.factor(nzldfac$gender)Male:nzldfac$ML3                               1.9149
                                                                    t value
(Intercept)                                                         254.909
nzldfac$ML1                                                          13.348
nzldfac$ML2                                                           1.192
as.factor(nzldfac$gender)Male                                        -3.398
nzldfac$ML3                                                          30.164
as.factor(nzldfac$paredu)Post-secondary but not University           -5.011
as.factor(nzldfac$paredu)Upper Secondary                            -14.168
as.factor(nzldfac$paredu)Lower Secondary                            -12.397
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School  -9.272
as.factor(nzldfac$paredu)Don't Know                                 -18.215
as.factor(nzldfac$country)No                                          2.379
nzldfac$ML2:as.factor(nzldfac$gender)Male                             5.371
as.factor(nzldfac$gender)Male:nzldfac$ML3                             4.764
                                                                    Pr(>|t|)
(Intercept)                                                          < 2e-16
nzldfac$ML1                                                          < 2e-16
nzldfac$ML2                                                         0.233164
as.factor(nzldfac$gender)Male                                       0.000681
nzldfac$ML3                                                          < 2e-16
as.factor(nzldfac$paredu)Post-secondary but not University          5.53e-07
as.factor(nzldfac$paredu)Upper Secondary                             < 2e-16
as.factor(nzldfac$paredu)Lower Secondary                             < 2e-16
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School  < 2e-16
as.factor(nzldfac$paredu)Don't Know                                  < 2e-16
as.factor(nzldfac$country)No                                        0.017407
nzldfac$ML2:as.factor(nzldfac$gender)Male                           8.09e-08
as.factor(nzldfac$gender)Male:nzldfac$ML3                           1.93e-06
                                                                       
(Intercept)                                                         ***
nzldfac$ML1                                                         ***
nzldfac$ML2                                                            
as.factor(nzldfac$gender)Male                                       ***
nzldfac$ML3                                                         ***
as.factor(nzldfac$paredu)Post-secondary but not University          ***
as.factor(nzldfac$paredu)Upper Secondary                            ***
as.factor(nzldfac$paredu)Lower Secondary                            ***
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School ***
as.factor(nzldfac$paredu)Don't Know                                 ***
as.factor(nzldfac$country)No                                        *  
nzldfac$ML2:as.factor(nzldfac$gender)Male                           ***
as.factor(nzldfac$gender)Male:nzldfac$ML3                           ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 71.47 on 7277 degrees of freedom
Multiple R-squared:  0.3123,    Adjusted R-squared:  0.3112 
F-statistic: 275.4 on 12 and 7277 DF,  p-value: < 2.2e-16

As can be seen from the resulting table, there’s an interaction in both cases. It means that (1) the level of achievement determined by the interaction with a teacher depends on gender, and (2) achivement level defined by personal abilities depends also on gender. Contrary to expectations, being a boy in the case of interaction with a teacher is associated with increase in math achievement by almost 9 points. However, ML2 seems to be insignificant predictor per se in this model (p = .233). Similarly, males seems to have slightly better abilities in math, so being a boy in the case of personal abilities potentially increases math achievement by 9 points.

Diagnostics

Multicollinearity of predictors

vif(model5)
##                                           GVIF Df GVIF^(1/(2*Df))
## nzldfac$ML1                           1.027904  1        1.013856
## nzldfac$ML2                           1.733306  1        1.316551
## as.factor(nzldfac$gender)             1.038986  1        1.019306
## nzldfac$ML3                           1.918162  1        1.384977
## as.factor(nzldfac$paredu)             1.048717  5        1.004768
## as.factor(nzldfac$country)            1.022062  1        1.010971
## nzldfac$ML2:as.factor(nzldfac$gender) 1.727542  1        1.314360
## as.factor(nzldfac$gender):nzldfac$ML3 1.869469  1        1.367285

There is no multicollinearity among the predictors.

Outliers and leverages.

outlierTest(model5)
## No Studentized residuals with Bonferroni p < 0.05
## Largest |rstudent|:
##      rstudent unadjusted p-value Bonferroni p
## 5277  -3.8331         0.00012761      0.93024
qqPlot(model5, main = "QQ Plot")

## [1] 3278 5277

The distribution of residuals is not really close to normality, two significant outliers.

Leverages

leveragePlots(model5)

par(mfrow=c(2,2)) 
plot(model5) 

No outliers fall above Cook’s distance, which means that there are no leverages.

Studentized residuals:

library(MASS)
sresid <- studres(model5) 
hist(sresid, freq = FALSE, 
     main = "Distribution of Studentized Residuals")
xfit <- seq(min(sresid), max(sresid), length = 40) 
yfit <- dnorm(xfit) 
lines(xfit, yfit)

Left-skewed, but almost normal distribution.

ANOVA

Finally, ANOVA testing.

anova(model1, model2)
Analysis of Variance Table

Model 1: nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2
Model 2: nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2 + 
    as.factor(nzldfac$gender)
  Res.Df      RSS Df Sum of Sq      F    Pr(>F)    
1   7286 39917806                                  
2   7285 39846735  1     71072 12.994 0.0003146 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The second model (= gender as factor) is better.

anova(model2, model3)
Analysis of Variance Table

Model 1: nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2 + 
    as.factor(nzldfac$gender)
Model 2: nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2 + 
    as.factor(nzldfac$country)
  Res.Df      RSS Df Sum of Sq F Pr(>F)
1   7285 39846735                      
2   7285 39819335  0     27399         

It seems that there’s no significant difference.

anova(model2, model4)
Analysis of Variance Table

Model 1: nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2 + 
    as.factor(nzldfac$gender)
Model 2: nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML3 + nzldfac$ML3 + 
    as.factor(nzldfac$paredu)
  Res.Df      RSS Df Sum of Sq     F    Pr(>F)    
1   7285 39846735                                 
2   7283 38608351  2   1238384 116.8 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The fourth one is obviously better.

anova(model3, model4)
Analysis of Variance Table

Model 1: nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML1 + nzldfac$ML2 + 
    as.factor(nzldfac$country)
Model 2: nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML3 + nzldfac$ML3 + 
    as.factor(nzldfac$paredu)
  Res.Df      RSS Df Sum of Sq      F    Pr(>F)    
1   7285 39819335                                  
2   7283 38608351  2   1210985 114.22 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Again, fourth is better.

And let’s compare with a final:

anova(model4, model5)
Analysis of Variance Table

Model 1: nzldfac$BSMMAT01 ~ nzldfac$ML3 + nzldfac$ML3 + nzldfac$ML3 + 
    as.factor(nzldfac$paredu)
Model 2: nzldfac$BSMMAT01 ~ nzldfac$ML1 + nzldfac$ML2 * as.factor(nzldfac$gender) + 
    nzldfac$ML3 * as.factor(nzldfac$gender) + as.factor(nzldfac$paredu) + 
    as.factor(nzldfac$country)
  Res.Df      RSS Df Sum of Sq      F    Pr(>F)    
1   7283 38608351                                  
2   7277 37171184  6   1437167 46.892 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The final model (i.e. with interactions) is the best.

Let’s look at it again:

summary(model5)

Call:
lm(formula = nzldfac$BSMMAT01 ~ nzldfac$ML1 + nzldfac$ML2 * as.factor(nzldfac$gender) + 
    nzldfac$ML3 * as.factor(nzldfac$gender) + as.factor(nzldfac$paredu) + 
    as.factor(nzldfac$country))

Residuals:
     Min       1Q   Median       3Q      Max 
-273.500  -44.105    6.302   49.332  229.611 

Coefficients:
                                                                    Estimate
(Intercept)                                                         537.9639
nzldfac$ML1                                                          11.8999
nzldfac$ML2                                                           1.3688
as.factor(nzldfac$gender)Male                                        -5.8192
nzldfac$ML3                                                          39.3535
as.factor(nzldfac$paredu)Post-secondary but not University          -16.0919
as.factor(nzldfac$paredu)Upper Secondary                            -45.3729
as.factor(nzldfac$paredu)Lower Secondary                            -53.5314
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School -71.0809
as.factor(nzldfac$paredu)Don't Know                                 -40.7669
as.factor(nzldfac$country)No                                          5.4686
nzldfac$ML2:as.factor(nzldfac$gender)Male                             9.5339
as.factor(nzldfac$gender)Male:nzldfac$ML3                             9.1231
                                                                    Std. Error
(Intercept)                                                             2.1104
nzldfac$ML1                                                             0.8915
nzldfac$ML2                                                             1.1480
as.factor(nzldfac$gender)Male                                           1.7123
nzldfac$ML3                                                             1.3046
as.factor(nzldfac$paredu)Post-secondary but not University              3.2111
as.factor(nzldfac$paredu)Upper Secondary                                3.2026
as.factor(nzldfac$paredu)Lower Secondary                                4.3179
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School     7.6662
as.factor(nzldfac$paredu)Don't Know                                     2.2381
as.factor(nzldfac$country)No                                            2.2991
nzldfac$ML2:as.factor(nzldfac$gender)Male                               1.7752
as.factor(nzldfac$gender)Male:nzldfac$ML3                               1.9149
                                                                    t value
(Intercept)                                                         254.909
nzldfac$ML1                                                          13.348
nzldfac$ML2                                                           1.192
as.factor(nzldfac$gender)Male                                        -3.398
nzldfac$ML3                                                          30.164
as.factor(nzldfac$paredu)Post-secondary but not University           -5.011
as.factor(nzldfac$paredu)Upper Secondary                            -14.168
as.factor(nzldfac$paredu)Lower Secondary                            -12.397
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School  -9.272
as.factor(nzldfac$paredu)Don't Know                                 -18.215
as.factor(nzldfac$country)No                                          2.379
nzldfac$ML2:as.factor(nzldfac$gender)Male                             5.371
as.factor(nzldfac$gender)Male:nzldfac$ML3                             4.764
                                                                    Pr(>|t|)
(Intercept)                                                          < 2e-16
nzldfac$ML1                                                          < 2e-16
nzldfac$ML2                                                         0.233164
as.factor(nzldfac$gender)Male                                       0.000681
nzldfac$ML3                                                          < 2e-16
as.factor(nzldfac$paredu)Post-secondary but not University          5.53e-07
as.factor(nzldfac$paredu)Upper Secondary                             < 2e-16
as.factor(nzldfac$paredu)Lower Secondary                             < 2e-16
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School  < 2e-16
as.factor(nzldfac$paredu)Don't Know                                  < 2e-16
as.factor(nzldfac$country)No                                        0.017407
nzldfac$ML2:as.factor(nzldfac$gender)Male                           8.09e-08
as.factor(nzldfac$gender)Male:nzldfac$ML3                           1.93e-06
                                                                       
(Intercept)                                                         ***
nzldfac$ML1                                                         ***
nzldfac$ML2                                                            
as.factor(nzldfac$gender)Male                                       ***
nzldfac$ML3                                                         ***
as.factor(nzldfac$paredu)Post-secondary but not University          ***
as.factor(nzldfac$paredu)Upper Secondary                            ***
as.factor(nzldfac$paredu)Lower Secondary                            ***
as.factor(nzldfac$paredu)Some Primary, Lower Secondary or No School ***
as.factor(nzldfac$paredu)Don't Know                                 ***
as.factor(nzldfac$country)No                                        *  
nzldfac$ML2:as.factor(nzldfac$gender)Male                           ***
as.factor(nzldfac$gender)Male:nzldfac$ML3                           ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 71.47 on 7277 degrees of freedom
Multiple R-squared:  0.3123,    Adjusted R-squared:  0.3112 
F-statistic: 275.4 on 12 and 7277 DF,  p-value: < 2.2e-16

The model, though still have one insignificant predictor (which cannot be removed due to interaction), explains 31% of variance in math achievemetn. It is a rather good result.


Conclusions

In this project we discovered what factors affect math achievement among New Zealander pupils.

Exploratory factor analysis and regression analysis werre conducted to examine these questions.

In factor analysis the varimax rotation turned out to be the best solution. In regression analysis all of the received hidden factors were examined, and the role of gender, parental education, and country of origin were taken onto account. After having been done diagnostics, we discovered that the most suitable model is the final one.

Altogether, all of the factor predictors were discovered to impact math achievement, these are namely - (1) attitude to math, (2) interaction with a teacher, and (3) personal abilities in math. The rest predictor variables also had significant influnce, for instance having being born outside of the country was found to be associated with the slight increase in math achievement. Besides, two interactions were examined (interaction with a teacher and gender; personal abilities in math and gender) and actually were significant.