Libraries, Data, and Research Questions

library(dplyr)
library(ggplot2)
library(haven)
library(polycor)
library(corrplot)
library(tidyverse)
library(psych)
library(kableExtra)
library(sjPlot)
library(cowplot)
library(ggcorrplot)
library(car)

path = file.path("C:/Users/Zver/Documents/R projects", "data analysis 2021", "BSGCANM6.sav")

In this paper, I conduct a factor analysis of the variables describing students’ perception of mathematics (as a subject, as something to be learned, and as experienced during lessons) and then use the resulting factors to predict their math achievement. The underlying assumption behind this workflow is that the 24 closely connected variables can be reduced to a small set of composite, interpretable measures. So, the research questions I am concerned with are formulated as:

  1. Are there any meaningful factors behind the considered set of variables?
  2. How do these factors and other variables contribute to the explanation of the students’ math achievements?
canada = read_sav(path)

canada <- canada %>% 
  select("BSBM17A",  "BSBM17B",  "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E", "BSMMAT01")

The data used for this purpose is taken from the Trends in International Mathematics and Science Study (TIMSS) for Canada, 2015 (8th grade). In total, the dataset consists of 8,757 observations of 25 variables (24 will be used for the factor analysis, and 1 is the dependent variable in the future regression model). However, each of the 24 variables used for the factor analysis contains missing values (from 219 to 366 observations), while the dependent variable has none. Such observations were removed listwise, so the dataset considered further has 7,662 observations of 25 variables.
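A quick way to verify these counts before dropping the incomplete rows (a minimal check, assuming the haven-read columns carry NA for missing responses; this is not part of the original output):

## number of missing values per variable: the 24 attitude items should show
## between 219 and 366 NAs, and BSMMAT01 (the dependent variable) none
colSums(is.na(canada))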

canada <- na.omit(canada) %>% 
  as.data.frame()

Exploring the data

The table below presents the original names of the 25 selected variables, their interpretable names (the actual renaming is done later), their wording, and their mean values. To interpret the last column, note that the possible values for each item range from 1 (agree a lot) to 4 (disagree a lot). The students’ achievement variable is described in more detail below.

variables = c("BSBM17A",  "BSBM17B",  "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E", "BSMMAT01")

statements = c("I enjoy learning mathematics", "I wish I did not have to study mathematics", "Mathematics is boring", "I learn many interesting things in mathematics", "I like mathematics", "I like any schoolwork that involves numbers", "I like to solve mathematics problems", "I look forward to mathematics class", "Mathematics is one of my favorite subjects", "I know what my teacher expects me to do", "My teacher is easy to understand", "I am interested in what my teacher says", "My teacher gives me interesting things to do", "My teacher has clear answers to my questions", "My teacher is good at explaining mathematics", "My teacher lets me show what I have learned", "My teacher does a variety of things to help us learn", "My teacher tells me how to do better when I make a mistake", "My teacher listens to what I have to say", "I usually do well in mathematics", "Mathematics is more difficult for me than for many of my classmates", "Mathematics is not one of my strengths", "I learn things quickly in mathematics", "Mathematics makes me nervous", "Student's math achievement")

## column means, rounded to two digits, in the same order as `variables`
mean <- unname(round(sapply(canada[, variables], mean), digits = 2))


renamed <- c("enjoy_learning_math", "wish_not_to_study", "math_is_boring", "learning_interesting_things", "like_math", "like_schoolwork_with_numbers", "like_solving_maths", "look_forward_for_classes", "one_of_favorite_classes", "know_teachers_expectations", "teacher_is_understandable", "interested_in_teachers_words", "teacher_gives_interesting_things", "teacher_has_clear_answers", "teacher_is_good_at_explaning", "teacher_ask_to_show_knowledge", "teacher_does_a_lot_to_learn", "teacher_tells_how_to_do_better", "teacher_listens_to_me", "i_usually_do_well", "maths_is_harder_for_me_than_others", "maths_is_not_my_strength", "i_learn_quickly", "maths_makes_me_nervous", "math_achievement")

data.frame(variables, renamed, statements, mean, stringsAsFactors = FALSE) %>% kable() %>% kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| variables | renamed | statements | mean |
|---|---|---|---|
| BSBM17A | enjoy_learning_math | I enjoy learning mathematics | 2.03 |
| BSBM17B | wish_not_to_study | I wish I did not have to study mathematics | 2.74 |
| BSBM17C | math_is_boring | Mathematics is boring | 2.59 |
| BSBM17D | learning_interesting_things | I learn many interesting things in mathematics | 2.02 |
| BSBM17E | like_math | I like mathematics | 2.14 |
| BSBM17F | like_schoolwork_with_numbers | I like any schoolwork that involves numbers | 2.42 |
| BSBM17G | like_solving_maths | I like to solve mathematics problems | 2.33 |
| BSBM17H | look_forward_for_classes | I look forward to mathematics class | 2.58 |
| BSBM17I | one_of_favorite_classes | Mathematics is one of my favorite subjects | 2.50 |
| BSBM18A | know_teachers_expectations | I know what my teacher expects me to do | 1.56 |
| BSBM18B | teacher_is_understandable | My teacher is easy to understand | 1.74 |
| BSBM18C | interested_in_teachers_words | I am interested in what my teacher says | 1.91 |
| BSBM18D | teacher_gives_interesting_things | My teacher gives me interesting things to do | 2.09 |
| BSBM18E | teacher_has_clear_answers | My teacher has clear answers to my questions | 1.78 |
| BSBM18F | teacher_is_good_at_explaning | My teacher is good at explaining mathematics | 1.65 |
| BSBM18G | teacher_ask_to_show_knowledge | My teacher lets me show what I have learned | 1.80 |
| BSBM18H | teacher_does_a_lot_to_learn | My teacher does a variety of things to help us learn | 1.67 |
| BSBM18I | teacher_tells_how_to_do_better | My teacher tells me how to do better when I make a mistake | 1.67 |
| BSBM18J | teacher_listens_to_me | My teacher listens to what I have to say | 1.62 |
| BSBM19A | i_usually_do_well | I usually do well in mathematics | 1.78 |
| BSBM19B | maths_is_harder_for_me_than_others | Mathematics is more difficult for me than for many of my classmates | 2.90 |
| BSBM19C | maths_is_not_my_strength | Mathematics is not one of my strengths | 2.76 |
| BSBM19D | i_learn_quickly | I learn things quickly in mathematics | 2.04 |
| BSBM19E | maths_makes_me_nervous | Mathematics makes me nervous | 2.81 |
| BSMMAT01 | math_achievement | Student’s math achievement | 537.14 |

Next, I examine each block of questions (BSBM17, BSBM18, and BSBM19) separately in terms of its variables’ distributions. First, let’s look at the 17th block, which asks “How much do you agree with these statements about learning mathematics?”:

## helper to avoid repeating the same bar chart for every item;
## axis_size optionally shrinks the axis labels (used for the denser grids below)
item_bar <- function(var, subtitle, axis_size = NULL) {
  p <- ggplot(canada, aes(as.numeric(.data[[var]]))) + 
    geom_bar(fill = "#60AB9A") +
    theme_bw() +
    labs(subtitle = subtitle, x = "", y = "") + 
    theme(plot.subtitle = element_text(size = 8, hjust = 0.5, 
                                       face = "italic", color = "black"))
  if (!is.null(axis_size)) {
    p <- p + theme(axis.text = element_text(size = axis_size))
  }
  p
}

plots17 <- Map(item_bar,
               var = c("BSBM17A", "BSBM17B", "BSBM17C", "BSBM17D", "BSBM17E",
                       "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I"),
               subtitle = c("I enjoy learning mathematics",
                            "I wish I did not have to study mathematics",
                            "Mathematics is boring",
                            "I learn many interesting things in mathematics",
                            "I like mathematics",
                            "I like any schoolwork that involves numbers",
                            "I like to solve mathematics problems",
                            "I look forward to mathematics class",
                            "Mathematics is one of my favorite subjects"))

plot_grid(plotlist = plots17)

The 18th block (A-J) corresponds to the question “How much do you agree with these statements about your mathematics lessons?”:

plots18 <- Map(item_bar,
               var = c("BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E",
                       "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J"),
               subtitle = c("I know what my teacher expects me to do",
                            "My teacher is easy to understand",
                            "I am interested in what my teacher says",
                            "My teacher gives me interesting things to do",
                            "My teacher has clear answers to my questions",
                            "My teacher is good at explaining mathematics",
                            "My teacher lets me show what I have learned",
                            "My teacher does a variety of things to help us learn",
                            "My teacher tells me how to do better when I make a mistake",
                            "My teacher listens to what I have to say"),
               axis_size = 6)

plot_grid(plotlist = plots18, ncol = 2)

The last (19th) block falls under the question “How much do you agree with these statements about mathematics?”:

plots19 <- Map(item_bar,
               var = c("BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E"),
               subtitle = c("I usually do well in mathematics",
                            "Maths is more difficult for me than for many of my classmates",
                            "Mathematics is not one of my strengths",
                            "I learn things quickly in mathematics",
                            "Mathematics makes me nervous"),
               axis_size = 6)

plot_grid(plotlist = plots19, nrow = 2)

The distribution of the dependent variable, which interests us most:

## compute the mean once instead of hard-coding 537.14
mean_achievement <- mean(as.numeric(canada$BSMMAT01))

canada %>% 
  ggplot(aes(as.numeric(BSMMAT01))) + 
  geom_histogram(bins = 40, fill = "#60AB9A") +
  theme_bw() +
  labs(x = "student's math achievement",
       y = "",
       title = "The distribution of the math achievements across the data,",
       subtitle = paste("dashed line shows the variable mean =", round(mean_achievement, 2))) +
  geom_vline(aes(xintercept = mean_achievement), color = "red", linetype = "longdash", size = 1.1)

Finally, I create 2 new datasets, one for the factor analysis and one for the regression model, first renaming the variables:

canada1 <- rename(canada,
                 ##first block of variables
                 enjoy_learning_math = BSBM17A,
                 wish_not_to_study = BSBM17B,
                 math_is_boring = BSBM17C,
                 learning_interesting_things = BSBM17D,
                 like_math = BSBM17E,
                 like_schoolwork_with_numbers = BSBM17F,
                 like_solving_maths = BSBM17G,
                 look_forward_for_classes = BSBM17H,
                 one_of_favorite_classes = BSBM17I,
                 ##second block of variables
                 know_teachers_expectations = BSBM18A,
                 teacher_is_understandable = BSBM18B,
                 interested_in_teachers_words = BSBM18C,
                 teacher_gives_interesting_things = BSBM18D,
                 teacher_has_clear_answers = BSBM18E,
                 teacher_is_good_at_explaning = BSBM18F,
                 teacher_ask_to_show_knowledge = BSBM18G,
                 teacher_does_a_lot_to_learn = BSBM18H,
                 teacher_tells_how_to_do_better = BSBM18I,
                 teacher_listens_to_me = BSBM18J,
                 ##third block of variables
                 i_usually_do_well = BSBM19A,
                 maths_is_harder_for_me_than_others = BSBM19B,
                 maths_is_not_my_strength = BSBM19C,
                 i_learn_quickly = BSBM19D,
                 maths_makes_me_nervous = BSBM19E,
                 ##dependent variable
                 math_achievement = BSMMAT01)

canada_regression <- canada1
canada_factors <- canada1[,1:24]

Exploratory Factor Analysis

Correlation matrix

Before running the exploratory factor analysis, we should examine the correlation matrix: it allows a first guess at the latent factors. To make the picture clearer, I change the variables’ names (back again!) so that they contain only the question number (17-19) and the letter of the respective item.

canada <- as.data.frame(lapply(canada, as.numeric))
names(canada) <- gsub(pattern = "BSBM", replacement = "", x = names(canada))

hetcor <- hetcor(canada)
cor <- round(hetcor$correlations, 1)
ggcorrplot(cor, hc.order = TRUE, 
           type = "lower",
           outline.color = "black",
           ggtheme = ggplot2::theme_bw,
           colors =c("#E74D4D", "white", "#9ECE9A"),
           lab = TRUE,
           lab_col = "black",
           lab_size = 2,
           tl.cex = 8,
           title = "Correlation matrix of the selected variables",
           legend.title = "Coefficients:
           ")

The observations based on this matrix are:

  1. there are many high correlations among the variables (> 0.5), so the factor analysis does seem relevant (a formal check is sketched right after this list).
  2. besides the positive correlations, there are negative ones as well: for example, the variables from the 19th block, related to attitudes towards maths as a subject, are negatively correlated with plenty of other variables.
  3. judging by the color pattern, 3 clusters can be preliminarily identified (separated by the “white square” in the matrix center, which stands for near-zero correlations).
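As a formal complement to eyeballing the matrix, the usual factorability checks from psych could be run on the same correlation matrix (a hedged sketch, not part of the original analysis):

## KMO measure of sampling adequacy and Bartlett's test of sphericity,
## both computed on the heterogeneous correlation matrix
KMO(hetcor$correlations)
cortest.bartlett(hetcor$correlations, n = nrow(canada))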

Number of factors to be extracted & parameters

Parallel analysis is a technique for determining the largest reasonable number of latent factors to extract. I run it with 500 iterations.

## n.obs is the sample size behind the correlation matrix;
## n.iter sets the 500 simulated iterations
fa.parallel(hetcor$correlations, n.obs = nrow(canada), n.iter = 500, fa = "fa")

## Parallel analysis suggests that the number of factors =  3  and the number of components =  NA

The parallel analysis suggests extracting 3 factors. On the plot, the line of actual-data eigenvalues stays above the simulated (dashed) line up to the third factor and approaches it there. So, I use 3 factors while choosing the best factor analysis parameters: rotation (none vs. oblimin vs. varimax) and factoring method (minres vs. wls vs. ml).

Before comparing the results of the different setups, I make sure the variables are numeric. Next, I run the 9 analyses and compare their results in the table below:

canada_factors <- as.data.frame(lapply(canada_factors, as.numeric))

## creating FAs

none_minres <- fa(canada_factors, 
                     nfactors = 3, 
                     rotate = "none", 
                     fm = "minres",
                     cor = "mixed")
none_wls <- fa(canada_factors, 
                     nfactors = 3, 
                     rotate = "none", 
                     fm = "wls",
                     cor = "mixed")
none_ml <- fa(canada_factors, 
                     nfactors = 3, 
                     rotate = "none", 
                     fm = "ml",
                     cor = "mixed")


oblimin_minres <- fa(canada_factors, 
                     nfactors = 3, 
                     rotate = "oblimin", 
                     fm = "minres",
                     cor = "mixed")
oblimin_wls <- fa(canada_factors, 
                     nfactors = 3, 
                     rotate = "oblimin", 
                     fm = "wls",
                     cor = "mixed")
oblimin_ml <- fa(canada_factors, 
                     nfactors = 3, 
                     rotate = "oblimin", 
                     fm = "ml",
                     cor = "mixed")
  

varimax_minres <- fa(canada_factors, 
                     nfactors = 3, 
                     rotate = "varimax", 
                     fm = "minres",
                     cor = "mixed")
varimax_wls <- fa(canada_factors, 
                     nfactors = 3, 
                     rotate = "varimax", 
                     fm = "wls",
                     cor = "mixed")
varimax_ml <- fa(canada_factors, 
                     nfactors = 3, 
                     rotate = "varimax", 
                     fm = "ml",
                     cor = "mixed")


## making a table through data.frame

fits <- list(none_minres = none_minres, none_wls = none_wls, none_ml = none_ml,
             oblimin_minres = oblimin_minres, oblimin_wls = oblimin_wls, oblimin_ml = oblimin_ml,
             varimax_minres = varimax_minres, varimax_wls = varimax_wls, varimax_ml = varimax_ml)

parameters <- names(fits)
RMSR <- round(sapply(fits, function(f) f$rms), digits = 2)
RMSEA <- round(sapply(fits, function(f) f$RMSEA[1]), digits = 2)  ## [1] is the point estimate
Tucker_Lewis_index <- round(sapply(fits, function(f) f$TLI), digits = 2)
BIC <- sapply(fits, function(f) f$BIC)

data.frame(parameters, 
           RMSR, 
           RMSEA, 
           Tucker_Lewis_index, 
           BIC, 
           stringsAsFactors = FALSE) %>% 
  filter(RMSEA < 0.1) %>% 
  arrange(parameters) %>% 
  kable(row.names = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", 
                                      "responsive",
                                      "striped"), 
                full_width = FALSE)
| parameters | RMSR | RMSEA | Tucker_Lewis_index | BIC |
|---|---|---|---|---|
| none_minres | 0.02 | 0.08 | 0.93 | 8120.733 |
| none_ml | 0.02 | 0.08 | 0.93 | 7919.623 |
| none_wls | 0.02 | 0.08 | 0.93 | 8045.583 |
| oblimin_minres | 0.02 | 0.08 | 0.93 | 8120.733 |
| oblimin_ml | 0.02 | 0.08 | 0.93 | 7919.623 |
| oblimin_wls | 0.02 | 0.08 | 0.93 | 8045.583 |
| varimax_minres | 0.02 | 0.08 | 0.93 | 8120.733 |
| varimax_ml | 0.02 | 0.08 | 0.93 | 7919.623 |
| varimax_wls | 0.02 | 0.08 | 0.93 | 8045.583 |

As seen, the values of RMSR = 0.02 < 0.05, RMSEA = 0.08 > 0.05, and TLI = 0.93 > 0.9 are identical across the variations. The only difference lies in the BIC values: the models with the ml factoring method have the lowest ones. Strictly speaking, the RMSEA indicates only an acceptable (not “good”) fit, but overall the derived factors look fine.

The next step is to compare the loadings. First, the loadings without rotation:

print(none_ml$loadings, cutoff = 0.4)
## 
## Loadings:
##                                    ML1    ML2    ML3   
## enjoy_learning_math                 0.901              
## wish_not_to_study                  -0.679              
## math_is_boring                     -0.738              
## learning_interesting_things         0.757              
## like_math                           0.909              
## like_schoolwork_with_numbers        0.783              
## like_solving_maths                  0.825              
## look_forward_for_classes            0.861              
## one_of_favorite_classes             0.881              
## know_teachers_expectations          0.551  0.418       
## teacher_is_understandable           0.630  0.591       
## interested_in_teachers_words        0.712  0.433       
## teacher_gives_interesting_things    0.690  0.440       
## teacher_has_clear_answers           0.590  0.643       
## teacher_is_good_at_explaning        0.635  0.634       
## teacher_ask_to_show_knowledge       0.555  0.532       
## teacher_does_a_lot_to_learn         0.557  0.614       
## teacher_tells_how_to_do_better      0.522  0.630       
## teacher_listens_to_me               0.513  0.627       
## i_usually_do_well                   0.730              
## maths_is_harder_for_me_than_others -0.599         0.525
## maths_is_not_my_strength           -0.694  0.415  0.429
## i_learn_quickly                     0.731              
## maths_makes_me_nervous             -0.475              
## 
##                   ML1   ML2   ML3
## SS loadings    11.752 4.011 1.253
## Proportion Var  0.490 0.167 0.052
## Cumulative Var  0.490 0.657 0.709

oblimin rotation loadings:

print(oblimin_ml$loadings, cutoff = 0.4)
## 
## Loadings:
##                                    ML1    ML2    ML3   
## enjoy_learning_math                 0.901              
## wish_not_to_study                  -0.601              
## math_is_boring                     -0.753              
## learning_interesting_things         0.784              
## like_math                           0.913              
## like_schoolwork_with_numbers        0.836              
## like_solving_maths                  0.822              
## look_forward_for_classes            0.867              
## one_of_favorite_classes             0.820              
## know_teachers_expectations                 0.662       
## teacher_is_understandable                  0.879       
## interested_in_teachers_words        0.403  0.613       
## teacher_gives_interesting_things    0.406  0.606       
## teacher_has_clear_answers                  0.907       
## teacher_is_good_at_explaning               0.913       
## teacher_ask_to_show_knowledge              0.748       
## teacher_does_a_lot_to_learn                0.815       
## teacher_tells_how_to_do_better             0.827       
## teacher_listens_to_me                      0.830       
## i_usually_do_well                                -0.731
## maths_is_harder_for_me_than_others                0.906
## maths_is_not_my_strength                          0.826
## i_learn_quickly                                  -0.638
## maths_makes_me_nervous                            0.623
## 
##                  ML1   ML2   ML3
## SS loadings    6.431 6.287 3.047
## Proportion Var 0.268 0.262 0.127
## Cumulative Var 0.268 0.530 0.657

varimax rotation loadings:

print(varimax_ml$loadings, cutoff = 0.4)
## 
## Loadings:
##                                    ML2    ML1    ML3   
## enjoy_learning_math                        0.834       
## wish_not_to_study                         -0.592       
## math_is_boring                            -0.692       
## learning_interesting_things                0.693       
## like_math                                  0.849       
## like_schoolwork_with_numbers               0.754       
## like_solving_maths                         0.768       
## look_forward_for_classes                   0.789       
## one_of_favorite_classes                    0.791  0.426
## know_teachers_expectations          0.652              
## teacher_is_understandable           0.845              
## interested_in_teachers_words        0.708  0.449       
## teacher_gives_interesting_things    0.701  0.441       
## teacher_has_clear_answers           0.865              
## teacher_is_good_at_explaning        0.880              
## teacher_ask_to_show_knowledge       0.742              
## teacher_does_a_lot_to_learn         0.806              
## teacher_tells_how_to_do_better      0.805              
## teacher_listens_to_me               0.801              
## i_usually_do_well                                 0.751
## maths_is_harder_for_me_than_others               -0.847
## maths_is_not_my_strength                         -0.827
## i_learn_quickly                            0.430  0.687
## maths_makes_me_nervous                           -0.602
## 
##                  ML2   ML1   ML3
## SS loadings    6.810 6.361 3.845
## Proportion Var 0.284 0.265 0.160
## Cumulative Var 0.284 0.549 0.709

To start with, we should discard the unrotated baseline solution: its first factor absorbs nearly all the variables, which makes the SS loadings highly unequal. Comparing the varimax and oblimin rotations, the oblimin solution has fewer cross-loadings, i.e., a lower mean item complexity (about 1.1 vs. 1.4 for varimax, judging by the fa() summaries), but the varimax solution keeps the factors orthogonal, which makes them more convenient as regression predictors.

Given that the SS loadings, proportions of variance, and cumulative variance are approximately similar across the two rotations, the varimax composition was chosen. To finish this discussion: according to Kaiser’s rule, it is legitimate to keep all three factors, as the SS loadings (eigenvalues) are > 1 for each factor.
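For reference, the mean item complexities quoted above can be pulled straight from the fitted objects (a minimal sketch; the varimax value also appears in the printed fa() summary below):

## Hoffman's complexity index per item, averaged over the 24 items
round(mean(oblimin_ml$complexity), 1)
round(mean(varimax_ml$complexity), 1)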

Diagram

The full output of the chosen model:

varimax_ml
## Factor Analysis using method =  ml
## Call: fa(r = canada_factors, nfactors = 3, rotate = "varimax", fm = "ml", 
##     cor = "mixed")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                                      ML2   ML1   ML3   h2    u2 com
## enjoy_learning_math                 0.26  0.83  0.34 0.88 0.118 1.5
## wish_not_to_study                  -0.18 -0.59 -0.33 0.49 0.505 1.8
## math_is_boring                     -0.20 -0.69 -0.28 0.60 0.401 1.5
## learning_interesting_things         0.39  0.69  0.11 0.65 0.352 1.6
## like_math                           0.22  0.85  0.39 0.92 0.083 1.5
## like_schoolwork_with_numbers        0.20  0.75  0.28 0.69 0.313 1.4
## like_solving_maths                  0.19  0.77  0.36 0.76 0.243 1.6
## look_forward_for_classes            0.34  0.79  0.24 0.80 0.202 1.6
## one_of_favorite_classes             0.21  0.79  0.43 0.85 0.150 1.7
## know_teachers_expectations          0.65  0.19  0.19 0.50 0.503 1.3
## teacher_is_understandable           0.84  0.17  0.19 0.78 0.224 1.2
## interested_in_teachers_words        0.71  0.45  0.06 0.71 0.292 1.7
## teacher_gives_interesting_things    0.70  0.44  0.04 0.69 0.313 1.7
## teacher_has_clear_answers           0.86  0.14  0.12 0.78 0.217 1.1
## teacher_is_good_at_explaning        0.88  0.17  0.15 0.83 0.174 1.1
## teacher_ask_to_show_knowledge       0.74  0.19  0.09 0.60 0.404 1.2
## teacher_does_a_lot_to_learn         0.81  0.19  0.03 0.69 0.312 1.1
## teacher_tells_how_to_do_better      0.80  0.15  0.02 0.67 0.329 1.1
## teacher_listens_to_me               0.80  0.14  0.03 0.66 0.339 1.1
## i_usually_do_well                   0.19  0.39  0.75 0.75 0.245 1.7
## maths_is_harder_for_me_than_others -0.05 -0.25 -0.85 0.78 0.217 1.2
## maths_is_not_my_strength           -0.06 -0.39 -0.83 0.84 0.162 1.4
## i_learn_quickly                     0.20  0.43  0.69 0.70 0.304 1.9
## maths_makes_me_nervous             -0.06 -0.23 -0.60 0.42 0.582 1.3
## 
##                        ML2  ML1  ML3
## SS loadings           6.81 6.36 3.84
## Proportion Var        0.28 0.27 0.16
## Cumulative Var        0.28 0.55 0.71
## Proportion Explained  0.40 0.37 0.23
## Cumulative Proportion 0.40 0.77 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  24.46 with Chi Square of  187201.9
## The degrees of freedom for the model are 207  and the objective function was  1.28 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  7662 with the empirical chi square  1815.99  with prob <  8.5e-255 
## The total number of observations was  7662  with Likelihood Chi Square =  9771.04  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.932
## RMSEA index =  0.078  and the 90 % confidence intervals are  0.076 0.079
## BIC =  7919.62
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML2  ML1  ML3
## Correlation of (regression) scores with factors   0.97 0.96 0.94
## Multiple R square of scores with factors          0.95 0.93 0.89
## Minimum correlation of possible factor scores     0.90 0.86 0.78

The number of variables considered makes the diagram below rather cluttered:

fa.diagram(varimax_ml, simple = T)

Though the graph is hard to read, it is clear that the factors correspond exactly to the blocks of questions discussed above. The ML2 factor is related to the 18th block (10 variables, or 10 questions), ML1 to the 17th block (9 variables), and ML3 to the 19th block (5 variables). I name these factors according to the blocks’ themes:

1. learning maths (ML1) stands for the student’s attitudes towards the process of learning mathematics: enjoyment of the subject and of mathematics as a lesson.

2. teacher’s role (ML2) stands for the student’s attitudes towards the teacher’s performance: both how the teacher teaches and how he or she interacts with the pupils.

3. maths itself (ML3) stands for the perception of mathematics as a personal strength or weakness.

In addition, each factor comprises more than 3 variables, which is a good indicator of factor stability.
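Since the diagram is cluttered, the same block structure can also be confirmed numerically by sorting the loadings (a small sketch using psych’s fa.sort; not part of the original output):

## sort the items by their dominant factor and reprint the loadings
print(fa.sort(varimax_ml)$loadings, cutoff = 0.4)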

Cronbach’s alpha

In this part, I check each factor’s reliability using Cronbach’s alpha: if its value for a factor is > 0.7, then the reliability is good.

ML1 <- canada_factors[, 1:9]
alpha(ML1, check.keys = T)
## 
## Reliability analysis   
## Call: alpha(x = ML1, check.keys = T)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.94      0.94    0.94      0.64  16 0.001  2.3 0.82     0.64
## 
##  lower alpha upper     95% confidence boundaries
## 0.94 0.94 0.94 
## 
##  Reliability if an item is dropped:
##                              raw_alpha std.alpha G6(smc) average_r S/N alpha se
## enjoy_learning_math               0.93      0.93    0.93      0.62  13   0.0012
## wish_not_to_study-                0.94      0.94    0.94      0.66  16   0.0010
## math_is_boring-                   0.93      0.94    0.93      0.65  15   0.0011
## learning_interesting_things       0.94      0.94    0.94      0.65  15   0.0011
## like_math                         0.93      0.93    0.92      0.61  13   0.0013
## like_schoolwork_with_numbers      0.93      0.93    0.93      0.64  14   0.0011
## like_solving_maths                0.93      0.93    0.93      0.63  14   0.0012
## look_forward_for_classes          0.93      0.93    0.93      0.63  13   0.0012
## one_of_favorite_classes           0.93      0.93    0.93      0.62  13   0.0012
##                               var.r med.r
## enjoy_learning_math          0.0082  0.61
## wish_not_to_study-           0.0065  0.66
## math_is_boring-              0.0100  0.66
## learning_interesting_things  0.0083  0.66
## like_math                    0.0072  0.61
## like_schoolwork_with_numbers 0.0092  0.63
## like_solving_maths           0.0091  0.63
## look_forward_for_classes     0.0094  0.63
## one_of_favorite_classes      0.0083  0.62
## 
##  Item statistics 
##                                 n raw.r std.r r.cor r.drop mean   sd
## enjoy_learning_math          7662  0.88  0.89  0.88   0.85  2.0 0.93
## wish_not_to_study-           7662  0.72  0.71  0.65   0.63  2.3 1.07
## math_is_boring-              7662  0.78  0.78  0.74   0.72  2.4 0.99
## learning_interesting_things  7662  0.74  0.75  0.70   0.68  2.0 0.87
## like_math                    7662  0.90  0.90  0.90   0.87  2.1 1.00
## like_schoolwork_with_numbers 7662  0.81  0.81  0.78   0.76  2.4 0.93
## like_solving_maths           7662  0.85  0.85  0.83   0.80  2.3 1.00
## look_forward_for_classes     7662  0.85  0.85  0.83   0.81  2.6 0.99
## one_of_favorite_classes      7662  0.87  0.86  0.85   0.82  2.5 1.14
## 
## Non missing response frequency for each item
##                                 1    2    3    4 miss
## enjoy_learning_math          0.33 0.42 0.16 0.09    0
## wish_not_to_study            0.17 0.23 0.29 0.31    0
## math_is_boring               0.16 0.30 0.33 0.21    0
## learning_interesting_things  0.30 0.44 0.19 0.06    0
## like_math                    0.31 0.38 0.18 0.13    0
## like_schoolwork_with_numbers 0.17 0.37 0.32 0.14    0
## like_solving_maths           0.24 0.35 0.26 0.16    0
## look_forward_for_classes     0.16 0.30 0.34 0.20    0
## one_of_favorite_classes      0.27 0.23 0.24 0.26    0

The alpha of the learning maths factor indicates excellent reliability, as it is > 0.9.

ML2 <- canada_factors[, 10:19]
alpha(ML2, check.keys = T)
## 
## Reliability analysis   
## Call: alpha(x = ML2, check.keys = T)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.93      0.93    0.93      0.57  13 0.0012  1.7 0.64     0.57
## 
##  lower alpha upper     95% confidence boundaries
## 0.93 0.93 0.93 
## 
##  Reliability if an item is dropped:
##                                  raw_alpha std.alpha G6(smc) average_r S/N
## know_teachers_expectations            0.93      0.93    0.93      0.59  13
## teacher_is_understandable             0.92      0.92    0.92      0.56  11
## interested_in_teachers_words          0.92      0.92    0.92      0.57  12
## teacher_gives_interesting_things      0.92      0.92    0.92      0.57  12
## teacher_has_clear_answers             0.92      0.92    0.92      0.56  11
## teacher_is_good_at_explaning          0.92      0.92    0.92      0.55  11
## teacher_ask_to_show_knowledge         0.92      0.92    0.92      0.57  12
## teacher_does_a_lot_to_learn           0.92      0.92    0.92      0.56  12
## teacher_tells_how_to_do_better        0.92      0.92    0.92      0.57  12
## teacher_listens_to_me                 0.92      0.92    0.92      0.57  12
##                                  alpha se  var.r med.r
## know_teachers_expectations         0.0012 0.0036  0.58
## teacher_is_understandable          0.0014 0.0048  0.57
## interested_in_teachers_words       0.0013 0.0054  0.57
## teacher_gives_interesting_things   0.0013 0.0054  0.57
## teacher_has_clear_answers          0.0014 0.0047  0.56
## teacher_is_good_at_explaning       0.0014 0.0040  0.56
## teacher_ask_to_show_knowledge      0.0013 0.0061  0.57
## teacher_does_a_lot_to_learn        0.0013 0.0057  0.56
## teacher_tells_how_to_do_better     0.0013 0.0057  0.57
## teacher_listens_to_me              0.0013 0.0057  0.57
## 
##  Item statistics 
##                                     n raw.r std.r r.cor r.drop mean   sd
## know_teachers_expectations       7662  0.66  0.68  0.62   0.60  1.6 0.68
## teacher_is_understandable        7662  0.83  0.82  0.81   0.78  1.7 0.85
## interested_in_teachers_words     7662  0.78  0.77  0.75   0.71  1.9 0.83
## teacher_gives_interesting_things 7662  0.77  0.77  0.74   0.71  2.1 0.89
## teacher_has_clear_answers        7662  0.83  0.83  0.81   0.78  1.8 0.86
## teacher_is_good_at_explaning     7662  0.84  0.84  0.83   0.79  1.6 0.83
## teacher_ask_to_show_knowledge    7662  0.75  0.76  0.72   0.69  1.8 0.80
## teacher_does_a_lot_to_learn      7662  0.79  0.79  0.76   0.74  1.7 0.81
## teacher_tells_how_to_do_better   7662  0.78  0.78  0.75   0.72  1.7 0.81
## teacher_listens_to_me            7662  0.77  0.77  0.74   0.71  1.6 0.79
## 
## Non missing response frequency for each item
##                                     1    2    3    4 miss
## know_teachers_expectations       0.53 0.39 0.06 0.02    0
## teacher_is_understandable        0.48 0.35 0.12 0.05    0
## interested_in_teachers_words     0.35 0.45 0.16 0.05    0
## teacher_gives_interesting_things 0.28 0.42 0.23 0.07    0
## teacher_has_clear_answers        0.46 0.36 0.13 0.05    0
## teacher_is_good_at_explaning     0.54 0.32 0.10 0.05    0
## teacher_ask_to_show_knowledge    0.40 0.43 0.13 0.04    0
## teacher_does_a_lot_to_learn      0.51 0.34 0.10 0.04    0
## teacher_tells_how_to_do_better   0.51 0.35 0.10 0.04    0
## teacher_listens_to_me            0.54 0.34 0.08 0.04    0

The same holds for the second factor: the alpha is still > 0.9.

ML3 <- canada_factors[, 20:24]
alpha(ML3, check.keys = T)
## 
## Reliability analysis   
## Call: alpha(x = ML3, check.keys = T)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.88      0.88    0.86      0.59 7.2 0.0022  2.9 0.81     0.61
## 
##  lower alpha upper     95% confidence boundaries
## 0.87 0.88 0.88 
## 
##  Reliability if an item is dropped:
##                                    raw_alpha std.alpha G6(smc) average_r S/N
## i_usually_do_well-                      0.84      0.84    0.81      0.57 5.4
## maths_is_harder_for_me_than_others      0.83      0.84    0.81      0.57 5.2
## maths_is_not_my_strength                0.83      0.83    0.80      0.55 4.9
## i_learn_quickly-                        0.85      0.85    0.82      0.59 5.7
## maths_makes_me_nervous                  0.88      0.89    0.86      0.66 7.9
##                                    alpha se  var.r med.r
## i_usually_do_well-                   0.0029 0.0117  0.56
## maths_is_harder_for_me_than_others   0.0031 0.0144  0.58
## maths_is_not_my_strength             0.0032 0.0121  0.56
## i_learn_quickly-                     0.0028 0.0128  0.58
## maths_makes_me_nervous               0.0021 0.0026  0.65
## 
##  Item statistics 
##                                       n raw.r std.r r.cor r.drop mean   sd
## i_usually_do_well-                 7662  0.83  0.84  0.80   0.74  3.2 0.86
## maths_is_harder_for_me_than_others 7662  0.86  0.85  0.81   0.76  2.9 1.03
## maths_is_not_my_strength           7662  0.88  0.87  0.84   0.79  2.8 1.11
## i_learn_quickly-                   7662  0.81  0.82  0.77   0.71  3.0 0.93
## maths_makes_me_nervous             7662  0.72  0.71  0.58   0.55  2.8 1.01
## 
## Non missing response frequency for each item
##                                       1    2    3    4 miss
## i_usually_do_well                  0.45 0.36 0.14 0.05    0
## maths_is_harder_for_me_than_others 0.13 0.20 0.32 0.35    0
## maths_is_not_my_strength           0.19 0.20 0.26 0.34    0
## i_learn_quickly                    0.34 0.37 0.22 0.08    0
## maths_makes_me_nervous             0.12 0.25 0.32 0.31    0

For the last factor, maths itself, the alpha is a bit lower, but reliability is still good (> 0.8).
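To keep the three reliability estimates in one place, they can be collected as follows (a convenience sketch, not part of the original output):

## raw Cronbach's alphas for the three item blocks defined above
sapply(list(learning_maths = ML1, teachers_role = ML2, maths_itself = ML3),
       function(block) alpha(block, check.keys = TRUE)$total$raw_alpha)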

Regression

Two models

In the last part, I build 2 multiple linear regression models to predict the students’ math achievement scores (math_achievement, or BSMMAT01). The first model uses the previously derived factors as the only independent variables, while the second one additionally controls for (1) gender - BSBG01, (2) parental education, operationalized as the father’s highest education - BSBG07B, and (3) whether the student was born in the country - BSBG10A.

# 1st model

scores <- varimax_ml$scores
canada_factors1 <- cbind(canada_regression, scores)

canada_factors1 <- canada_factors1 %>% 
  rename(
    learning_maths = ML1,
    teachers_role = ML2,
    maths_itself = ML3
        )

model1 <- lm(math_achievement ~ teachers_role + learning_maths + maths_itself, canada_factors1)

# 2nd model

canada = read_sav(path)

canada <- canada %>% select(
  "BSBM17A",  "BSBM17B",  "BSBM17C", "BSBM17D", "BSBM17E", "BSBM17F", "BSBM17G", "BSBM17H", "BSBM17I", "BSBM18A", "BSBM18B", "BSBM18C", "BSBM18D", "BSBM18E", "BSBM18F", "BSBM18G", "BSBM18H", "BSBM18I", "BSBM18J", "BSBM19A", "BSBM19B", "BSBM19C", "BSBM19D", "BSBM19E", "BSMMAT01", "BSBG01", "BSBG07B", "BSBG10A"
)

canada <- rename(canada,
                 ##first block of variables
                 enjoy_learning_math = BSBM17A,
                 wish_not_to_study = BSBM17B,
                 math_is_boring = BSBM17C,
                 learning_interesting_things = BSBM17D,
                 like_math = BSBM17E,
                 like_schoolwork_with_numbers = BSBM17F,
                 like_solving_maths = BSBM17G,
                 look_forward_for_classes = BSBM17H,
                 one_of_favorite_classes = BSBM17I,
                 ##second block of variables
                 know_teachers_expectations = BSBM18A,
                 teacher_is_understandable = BSBM18B,
                 interested_in_teachers_words = BSBM18C,
                 teacher_gives_interesting_things = BSBM18D,
                 teacher_has_clear_answers = BSBM18E,
                 teacher_is_good_at_explaning = BSBM18F,
                 teacher_ask_to_show_knowledge = BSBM18G,
                 teacher_does_a_lot_to_learn = BSBM18H,
                 teacher_tells_how_to_do_better = BSBM18I,
                 teacher_listens_to_me = BSBM18J,
                 ##third block of variables
                 i_usually_do_well = BSBM19A,
                 maths_is_harder_for_me_than_others = BSBM19B,
                 maths_is_not_my_strength = BSBM19C,
                 i_learn_quickly = BSBM19D,
                 maths_makes_me_nervous = BSBM19E,
                 ##dependent variable
                 math_achievement = BSMMAT01,
                 ## control variables
                 gender = BSBG01,
                 fathers_education = BSBG07B,
                 place_of_birth = BSBG10A)


canada <- na.omit(canada) %>% 
  as.data.frame() %>% 
  left_join(canada_factors1)

canada$math_achievement = as.numeric(canada$math_achievement)
canada$gender = as.character(canada$gender)
canada$fathers_education = as.factor(canada$fathers_education)
canada$place_of_birth = as.character(canada$place_of_birth)

model2 <- lm(math_achievement ~ teachers_role + learning_maths + maths_itself + gender + fathers_education + place_of_birth, canada)

# table creation

tab_model(model1, model2, dv.labels = c("Factors only", "Factors with control for ..."), CSS = list(css.depvarhead = '+color: blue;'))
Model 1 is “Factors only”; model 2 is “Factors with control for …”.

| Predictors | Estimates (1) | CI (1) | p (1) | Estimates (2) | CI (2) | p (2) |
|---|---|---|---|---|---|---|
| (Intercept) | 537.14 | 535.87 – 538.41 | <0.001 | 519.61 | 504.90 – 534.33 | <0.001 |
| teachers_role | -0.39 | -1.62 – 0.85 | 0.542 | -0.54 | -1.75 – 0.66 | 0.378 |
| learning_maths | -10.45 | -11.76 – -9.13 | <0.001 | -9.10 | -10.39 – -7.80 | <0.001 |
| maths_itself | -36.60 | -37.92 – -35.28 | <0.001 | -34.13 | -35.44 – -32.81 | <0.001 |
| gender [2] | | | | -2.43 | -4.92 – 0.06 | 0.056 |
| fathers_education [2] | | | | 2.04 | -15.35 – 19.42 | 0.818 |
| fathers_education [3] | | | | 9.78 | -5.31 – 24.86 | 0.204 |
| fathers_education [4] | | | | 18.76 | 3.65 – 33.87 | 0.015 |
| fathers_education [5] | | | | 20.23 | 4.89 – 35.57 | 0.010 |
| fathers_education [6] | | | | 40.48 | 25.33 – 55.62 | <0.001 |
| fathers_education [7] | | | | 43.13 | 28.06 – 58.21 | <0.001 |
| fathers_education [8] | | | | 10.03 | -4.77 – 24.82 | 0.184 |
| place_of_birth [2] | | | | -5.06 | -8.73 – -1.40 | 0.007 |
| Observations | 7662 | | | 7565 | | |
| R2 / R2 adjusted | 0.313 / 0.312 | | | 0.349 / 0.348 | | |

To start, the adjusted R^2 does not rise much when controlling for the 3 mentioned variables: it increases by only 0.036 (a quick check follows the list below). The factors learning maths (ML1) and maths itself (ML3) are statistically significant in both models. Next, I examine the second model in more detail:

  1. With an increase in learning_maths by 1, the student’s math achievement score decreases by 9.10.
  2. With an increase in maths_itself by 1, the student’s math achievement score decreases by 34.13.
  3. If the father’s educational level is post-secondary & non-tertiary (fathers_education4), the student’s math achievement score increases by 18.76.
  4. If the father’s educational level is short-cycle tertiary (fathers_education5), the student’s math achievement score increases by 20.23.
  5. If the father’s educational level is Bachelor’s or equivalent (fathers_education6), the student’s math achievement score increases by 40.48.
  6. If the father’s educational level is postgraduate (fathers_education7), the student’s math achievement score increases by 43.13.
  7. If the student was born outside the country (place_of_birth2), then his or her math achievement score decreases by 5.06.

Note that the original items are coded from 1 (agree a lot) to 4 (disagree a lot), so higher factor scores reflect more negative attitudes; the negative coefficients of the first two factors therefore mean that worse attitudes go together with lower achievement.
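The adjusted R^2 gain mentioned above can be verified directly (a minimal sketch):

## difference in adjusted R^2 between the two models (about 0.036)
summary(model2)$adj.r.squared - summary(model1)$adj.r.squared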

Model diagnostics

The residuals are distributed (more or less) normally:

hist(resid(model2),
     xlab   = "Residuals",
     ylab = "",
     main   = "Second model: residuals' distribution",
     col    = "#60AB9A",
     border = "black",
     breaks = 20)

Checking the variance inflation factors (VIF) - no multicollinearity (values < 5):

vif(model2)
##                       GVIF Df GVIF^(1/(2*Df))
## teachers_role     1.008483  1        1.004232
## learning_maths    1.050537  1        1.024957
## maths_itself      1.057675  1        1.028433
## gender            1.012260  1        1.006111
## fathers_education 1.070114  7        1.004852
## place_of_birth    1.036728  1        1.018198

Residuals again:

Residuals vs. Fitted: no non-linear patterns detected (the red line is roughly horizontal). Normal Q-Q: the residuals are approximately normally distributed (the points lie close to the line). Scale-Location: no serious problems with homoscedasticity (the red line is not strictly horizontal, but close enough). Residuals vs. Leverage: there are no influential outliers (the red line is horizontal).

par(mfrow = c(2,2))
plot(model2)

Conclusion

The most important result of this paper is that the variables from the 3 question blocks (BSBM17-BSBM19) used in TIMSS 2015 can serve as the basis for constructing factors. The factors I obtained consist exactly of the variables from the blocks dedicated to (1) learning mathematics, (2) the teacher’s moderating role, and (3) attitudes towards maths as such. As the regression analysis has shown, the second factor does not contribute to the students’ math achievement. Controlling for gender, parental education, and place of birth showed that the factors alone explain approximately one third of the variation in the data.