Introduction and data description

Hello! This time I’m going to explore students’ attitudes towards studying maths in Canada. We have 25 variables describing students’ attitudes toward different aspects of maths: the subject itself, classes, teachers, and their own abilities. I’m interested in how these attitudes relate to each other, how they can be grouped into factors, and how many such factors there are.

Looking at the variables, I expect they will fall into three groups:

  • students’ attitude to maths (as a subject to learn and as a class to attend): do they like it or find it boring, do they like the subject itself, numbers, or problem solving. The variables falling here are: “enjoy”, “wish_not”, “boring”, “learn_intrst”, “like_math”, “like_num”, “like_prob”, “look_forw”, “favourite”;
  • students’ attitude to their math teacher: does the teacher explain the material and answer questions clearly, do they listen to students, suggest, help, and advise. The variables falling here are: “tchr_exp_do”, “tchr_easy_undrstnd”, “intrst_tchr_say”, “intrst_do”, “tchr_clear_answ”, “tchr_expl_good”, “tchr_show_learn”, “dif_help”, “tel_how_better”, “tchr_listen”;
  • how strong students feel in maths: are they doing well and learning quickly, or do they find it difficult and nerve-wracking. The variables falling here are: “us_do_well”, “difficult”, “not_strong”, “learn_quick”, “makes_nerv”, plus “math_achiv”, the achievement score that will later serve as the regression outcome.

(The attitude variables here are ordinal 4-point scales from “Agree a lot” to “Disagree a lot”.)

Also, there is a small group of background variables, which will be needed later: parental education, the student’s gender, and whether the student was born in Canada or abroad.
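
The data import itself is not shown in this report. Purely as a hedged sketch, the attitude items could be read in and declared as factors with the correct level order roughly like this (the file name is a placeholder, not the actual source):

library(dplyr)
library(knitr)
library(kableExtra)
library(psych)
library(car) # qqPlot() and vif() are used later

dat1 <- read.csv("timss_canada_attitudes.csv") # hypothetical file name
# the attitude items are 4-point ordinal scales
agree_levels <- c("Agree a lot", "Agree a little",
                  "Disagree a little", "Disagree a lot")
dat1[] <- lapply(dat1, factor, levels = agree_levels)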

Name_of_the_variable <- c("enjoy", "wish_not", "boring", "learn_intrst", "like_math", "like_num", "like_prob", "look_forw", "favourite",
                    "tchr_exp_do", "tchr_easy_undrstnd", "intrst_tchr_say", "intrst_do", "tchr_clear_answ", "tchr_expl_good", "tchr_show_learn", "dif_help", "tel_how_better", "tchr_listen",
                    "us_do_well", "difficult", "not_strong", "learn_quick", "makes_nerv",
                    "math_achiv", "sex_stud", "momedu", "fathedu", "if_native")
Label <- c("enjoy learning mathematics", "wish have not to study math", "math is boring", "learn interesting things", "like mathematics", "like numbers", "like math problems", "look forward to math class", "favorite subject",
           "teacher expects to do", "teacher is easy to understand", "interested in what tchr says", "interesting things to do", "teacher clear answers", "teacher explains good", "teacher shows learned", "different things to help", "tells how to do better", "teacher listens",
           "usually do well in math", "math is more difficult", "mathematics not my strength", "learn quickly in mathematics", "math makes nervous",
           "math achievement", "student's gender", "mother's education", "father's education", "if a student native or not")
Decoding <- data.frame(Name_of_the_variable, Label)
kable(Decoding) %>%
  kable_styling(c("bordered"))
Name_of_the_variable  Label
enjoy                 enjoy learning mathematics
wish_not              wish not to have to study math
boring                math is boring
learn_intrst          learn interesting things
like_math             like mathematics
like_num              like numbers
like_prob             like math problems
look_forw             look forward to math class
favourite             favorite subject
tchr_exp_do           teacher expects to do
tchr_easy_undrstnd    teacher is easy to understand
intrst_tchr_say       interested in what tchr says
intrst_do             interesting things to do
tchr_clear_answ       teacher clear answers
tchr_expl_good        teacher explains well
tchr_show_learn       teacher shows learned
dif_help              different things to help
tel_how_better        tells how to do better
tchr_listen           teacher listens
us_do_well            usually do well in math
difficult             math is more difficult
not_strong            mathematics not my strength
learn_quick           learn quickly in mathematics
makes_nerv            math makes nervous
math_achiv            math achievement
sex_stud              student’s gender
momedu                mother’s education
fathedu               father’s education
if_native             whether a student is native or not

Descriptive statistics are shown below: tables with the number of observations in each category of every variable, as well as summary values.

summary(dat1)
##                enjoy                   wish_not   
##  Agree a lot      :2464   Agree a lot      :1258  
##  Agree a little   :3160   Agree a little   :1762  
##  Disagree a little:1228   Disagree a little:2167  
##  Disagree a lot   : 696   Disagree a lot   :2361  
##                boring                learn_intrst 
##  Agree a lot      :1182   Agree a lot      :2293  
##  Agree a little   :2275   Agree a little   :3323  
##  Disagree a little:2491   Disagree a little:1461  
##  Disagree a lot   :1600   Disagree a lot   : 471  
##              like_math                 like_num   
##  Agree a lot      :2336   Agree a lot      :1301  
##  Agree a little   :2845   Agree a little   :2833  
##  Disagree a little:1395   Disagree a little:2377  
##  Disagree a lot   : 972   Disagree a lot   :1037  
##              like_prob                look_forw   
##  Agree a lot      :1795   Agree a lot      :1243  
##  Agree a little   :2628   Agree a little   :2236  
##  Disagree a little:1955   Disagree a little:2569  
##  Disagree a lot   :1170   Disagree a lot   :1500  
##              favourite               tchr_exp_do  
##  Agree a lot      :2008   Agree a lot      :4020  
##  Agree a little   :1758   Agree a little   :2959  
##  Disagree a little:1815   Disagree a little: 444  
##  Disagree a lot   :1967   Disagree a lot   : 125  
##          tchr_easy_undrstnd          intrst_tchr_say
##  Agree a lot      :3647     Agree a lot      :2631  
##  Agree a little   :2636     Agree a little   :3376  
##  Disagree a little: 904     Disagree a little:1165  
##  Disagree a lot   : 361     Disagree a lot   : 376  
##              intrst_do             tchr_clear_answ
##  Agree a lot      :2133   Agree a lot      :3461  
##  Agree a little   :3150   Agree a little   :2703  
##  Disagree a little:1734   Disagree a little:1014  
##  Disagree a lot   : 531   Disagree a lot   : 370  
##            tchr_expl_good          tchr_show_learn
##  Agree a lot      :4104   Agree a lot      :3045  
##  Agree a little   :2377   Agree a little   :3256  
##  Disagree a little: 728   Disagree a little: 976  
##  Disagree a lot   : 339   Disagree a lot   : 271  
##               dif_help              tel_how_better
##  Agree a lot      :3875   Agree a lot      :3822  
##  Agree a little   :2590   Agree a little   :2667  
##  Disagree a little: 791   Disagree a little: 783  
##  Disagree a lot   : 292   Disagree a lot   : 276  
##             tchr_listen               us_do_well  
##  Agree a lot      :4096   Agree a lot      :3419  
##  Agree a little   :2546   Agree a little   :2742  
##  Disagree a little: 632   Disagree a little:1019  
##  Disagree a lot   : 274   Disagree a lot   : 368  
##              difficult                not_strong  
##  Agree a lot      : 965   Agree a lot      :1393  
##  Agree a little   :1508   Agree a little   :1547  
##  Disagree a little:2387   Disagree a little:2006  
##  Disagree a lot   :2688   Disagree a lot   :2602  
##             learn_quick               makes_nerv  
##  Agree a lot      :2549   Agree a lot      : 926  
##  Agree a little   :2791   Agree a little   :1878  
##  Disagree a little:1625   Disagree a little:2407  
##  Disagree a lot   : 583   Disagree a lot   :2337
describeBy(dat1) # overall descriptive statistics (no grouping variable, so equivalent to describe())
# one count/percentage table per attitude item, instead of repeating the
# same pipeline once per variable by hand (the output is identical):
att_items <- c("enjoy", "wish_not", "boring", "learn_intrst", "like_math",
               "like_num", "like_prob", "look_forw", "favourite",
               "tchr_exp_do", "tchr_easy_undrstnd", "intrst_tchr_say",
               "intrst_do", "tchr_clear_answ", "tchr_expl_good",
               "tchr_show_learn", "dif_help", "tel_how_better", "tchr_listen",
               "us_do_well", "difficult", "not_strong", "learn_quick",
               "makes_nerv")
for (v in att_items) {
  dat1 %>%
    group_by(.data[[v]]) %>%
    summarize(count = n()) %>%
    mutate(perc = paste0(round(count / sum(count) * 100, 1), "%")) %>%
    as.data.frame() %>%
    print()
}

Exploratory factor analysis (EFA)

The correlation matrix between the variables is shown below.
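
(The matrix dat.cor was computed at the very beginning and the exact call is not shown. As a hedged sketch, with 4-point ordinal items it could have been built either from the plain numeric codes or as a polychoric matrix:)

dat_num <- as.data.frame(lapply(dat1, as.numeric))
dat.cor <- cor(dat_num)              # Pearson correlations on the 1-4 codes
# dat.cor <- polychoric(dat_num)$rho # polychoric alternative (psych package)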

dat.cor #was created at the very beginning
##                         enjoy   wish_not     boring learn_intrst
## enjoy               1.0000000 -0.6530110 -0.7311512    0.7337597
## wish_not           -0.6530110  1.0000000  0.6689014   -0.5301443
## boring             -0.7311512  0.6689014  1.0000000   -0.5909139
## learn_intrst        0.7337597 -0.5301443 -0.5909139    1.0000000
## like_math           0.9097056 -0.6735586 -0.7338107    0.7146248
## like_num            0.7665287 -0.5485120 -0.6177848    0.6438315
## like_prob           0.7994023 -0.5943102 -0.6563359    0.6550632
## look_forw           0.8202649 -0.6084189 -0.6883917    0.6938163
## favourite           0.8540816 -0.6228135 -0.7043775    0.6439868
## tchr_exp_do         0.3946110 -0.3065921 -0.2818359    0.4166977
## tchr_easy_undrstnd  0.4301234 -0.3110446 -0.3365352    0.4526249
## intrst_tchr_say     0.5870188 -0.4303067 -0.4879184    0.6337751
## intrst_do           0.5510505 -0.4140823 -0.4654591    0.6129997
## tchr_clear_answ     0.3846048 -0.2913330 -0.3230871    0.4449826
## tchr_expl_good      0.4376279 -0.2992816 -0.3357180    0.4768712
## tchr_show_learn     0.3893986 -0.2891893 -0.2974519    0.4275552
## dif_help            0.3783950 -0.2582635 -0.2981235    0.4609514
## tel_how_better      0.3339233 -0.2357417 -0.2551312    0.4200189
## tchr_listen         0.3292856 -0.2474333 -0.2607645    0.4052187
## us_do_well          0.6424544 -0.4900462 -0.4839159    0.4325017
## difficult          -0.5096370  0.4665074  0.4377293   -0.3042396
## not_strong         -0.6162819  0.5329517  0.5321202   -0.3779231
## learn_quick         0.6518604 -0.4669332 -0.4941101    0.4678809
## makes_nerv         -0.4073788  0.3773705  0.3785172   -0.2330107
##                     like_math   like_num  like_prob  look_forw  favourite
## enjoy               0.9097056  0.7665287  0.7994023  0.8202649  0.8540816
## wish_not           -0.6735586 -0.5485120 -0.5943102 -0.6084189 -0.6228135
## boring             -0.7338107 -0.6177848 -0.6563359 -0.6883917 -0.7043775
## learn_intrst        0.7146248  0.6438315  0.6550632  0.6938163  0.6439868
## like_math           1.0000000  0.7782737  0.8258742  0.8258958  0.8927466
## like_num            0.7782737  1.0000000  0.8102412  0.7329954  0.7448598
## like_prob           0.8258742  0.8102412  1.0000000  0.7638991  0.7881490
## look_forw           0.8258958  0.7329954  0.7638991  1.0000000  0.8358233
## favourite           0.8927466  0.7448598  0.7881490  0.8358233  1.0000000
## tchr_exp_do         0.3771673  0.3214364  0.3350991  0.4262023  0.3500125
## tchr_easy_undrstnd  0.4049418  0.3366131  0.3485542  0.4657342  0.3921987
## intrst_tchr_say     0.5470005  0.5033269  0.5001450  0.6101532  0.5040575
## intrst_do           0.5303567  0.4838652  0.4905195  0.6058280  0.4946253
## tchr_clear_answ     0.3506200  0.3208154  0.3175406  0.4349045  0.3491850
## tchr_expl_good      0.4003023  0.3473059  0.3447801  0.4729798  0.3899317
## tchr_show_learn     0.3579386  0.3368703  0.3388873  0.4141289  0.3427036
## dif_help            0.3505382  0.3087548  0.3081696  0.4381448  0.3360577
## tel_how_better      0.3171806  0.2846215  0.2892075  0.4009448  0.3050696
## tchr_listen         0.3061258  0.2734203  0.2592535  0.3939367  0.2858839
## us_do_well          0.6674715  0.5418959  0.6179009  0.5427544  0.6674118
## difficult          -0.5474407 -0.4326155 -0.5120452 -0.4213153 -0.5567467
## not_strong         -0.6576111 -0.5316221 -0.6015587 -0.5257382 -0.6817892
## learn_quick         0.6678710  0.5826812  0.6394750  0.5653425  0.6698119
## makes_nerv         -0.4355800 -0.3526284 -0.4132874 -0.3572428 -0.4502413
##                    tchr_exp_do tchr_easy_undrstnd intrst_tchr_say
## enjoy                0.3946110          0.4301234       0.5870188
## wish_not            -0.3065921         -0.3110446      -0.4303067
## boring              -0.2818359         -0.3365352      -0.4879184
## learn_intrst         0.4166977          0.4526249       0.6337751
## like_math            0.3771673          0.4049418       0.5470005
## like_num             0.3214364          0.3366131       0.5033269
## like_prob            0.3350991          0.3485542       0.5001450
## look_forw            0.4262023          0.4657342       0.6101532
## favourite            0.3500125          0.3921987       0.5040575
## tchr_exp_do          1.0000000          0.6622781       0.5727194
## tchr_easy_undrstnd   0.6622781          1.0000000       0.7061846
## intrst_tchr_say      0.5727194          0.7061846       1.0000000
## intrst_do            0.5437896          0.6523856       0.7904675
## tchr_clear_answ      0.5899970          0.7888271       0.6670746
## tchr_expl_good       0.5984295          0.8455601       0.6976676
## tchr_show_learn      0.5692919          0.6335037       0.6144980
## dif_help             0.5483149          0.6898528       0.6360824
## tel_how_better       0.5739399          0.6689336       0.6030622
## tchr_listen          0.5564177          0.6684535       0.6206567
## us_do_well           0.3715907          0.3693559       0.3574044
## difficult           -0.2467881         -0.2340042      -0.2186844
## not_strong          -0.2569218         -0.2572554      -0.2760368
## learn_quick          0.3406526          0.3692395       0.3765803
## makes_nerv          -0.1943283         -0.2063668      -0.1569602
##                     intrst_do tchr_clear_answ tchr_expl_good
## enjoy               0.5510505       0.3846048      0.4376279
## wish_not           -0.4140823      -0.2913330     -0.2992816
## boring             -0.4654591      -0.3230871     -0.3357180
## learn_intrst        0.6129997       0.4449826      0.4768712
## like_math           0.5303567       0.3506200      0.4003023
## like_num            0.4838652       0.3208154      0.3473059
## like_prob           0.4905195       0.3175406      0.3447801
## look_forw           0.6058280       0.4349045      0.4729798
## favourite           0.4946253       0.3491850      0.3899317
## tchr_exp_do         0.5437896       0.5899970      0.5984295
## tchr_easy_undrstnd  0.6523856       0.7888271      0.8455601
## intrst_tchr_say     0.7904675       0.6670746      0.6976676
## intrst_do           1.0000000       0.6660573      0.6629648
## tchr_clear_answ     0.6660573       1.0000000      0.8394925
## tchr_expl_good      0.6629648       0.8394925      1.0000000
## tchr_show_learn     0.6247443       0.6606140      0.6764640
## dif_help            0.6855708       0.6905161      0.7463640
## tel_how_better      0.6231466       0.7050020      0.7142226
## tchr_listen         0.6097310       0.7216724      0.7008700
## us_do_well          0.3328709       0.3061418      0.3480884
## difficult          -0.1929729      -0.1784544     -0.2076731
## not_strong         -0.2526752      -0.2023622     -0.2384670
## learn_quick         0.3511854       0.3171648      0.3514692
## makes_nerv         -0.1717992      -0.1661405     -0.1691876
##                    tchr_show_learn   dif_help tel_how_better tchr_listen
## enjoy                    0.3893986  0.3783950     0.33392335  0.32928557
## wish_not                -0.2891893 -0.2582635    -0.23574170 -0.24743332
## boring                  -0.2974519 -0.2981235    -0.25513123 -0.26076448
## learn_intrst             0.4275552  0.4609514     0.42001893  0.40521870
## like_math                0.3579386  0.3505382     0.31718059  0.30612582
## like_num                 0.3368703  0.3087548     0.28462154  0.27342034
## like_prob                0.3388873  0.3081696     0.28920747  0.25925350
## look_forw                0.4141289  0.4381448     0.40094479  0.39393667
## favourite                0.3427036  0.3360577     0.30506961  0.28588393
## tchr_exp_do              0.5692919  0.5483149     0.57393994  0.55641772
## tchr_easy_undrstnd       0.6335037  0.6898528     0.66893360  0.66845350
## intrst_tchr_say          0.6144980  0.6360824     0.60306215  0.62065668
## intrst_do                0.6247443  0.6855708     0.62314659  0.60973104
## tchr_clear_answ          0.6606140  0.6905161     0.70500196  0.72167240
## tchr_expl_good           0.6764640  0.7463640     0.71422265  0.70087000
## tchr_show_learn          1.0000000  0.6823598     0.65665394  0.67364836
## dif_help                 0.6823598  1.0000000     0.74371034  0.68564237
## tel_how_better           0.6566539  0.7437103     1.00000000  0.74304611
## tchr_listen              0.6736484  0.6856424     0.74304611  1.00000000
## us_do_well               0.2947806  0.2470531     0.24073244  0.23507386
## difficult               -0.1641726 -0.1152264    -0.09271954 -0.09978806
## not_strong              -0.1981051 -0.1524332    -0.12130732 -0.12665741
## learn_quick              0.2977383  0.2635005     0.22936755  0.22861487
## makes_nerv              -0.1496616 -0.1283584    -0.10061393 -0.11434392
##                    us_do_well   difficult not_strong learn_quick
## enjoy               0.6424544 -0.50963695 -0.6162819   0.6518604
## wish_not           -0.4900462  0.46650744  0.5329517  -0.4669332
## boring             -0.4839159  0.43772932  0.5321202  -0.4941101
## learn_intrst        0.4325017 -0.30423963 -0.3779231   0.4678809
## like_math           0.6674715 -0.54744070 -0.6576111   0.6678710
## like_num            0.5418959 -0.43261546 -0.5316221   0.5826812
## like_prob           0.6179009 -0.51204521 -0.6015587   0.6394750
## look_forw           0.5427544 -0.42131534 -0.5257382   0.5653425
## favourite           0.6674118 -0.55674667 -0.6817892   0.6698119
## tchr_exp_do         0.3715907 -0.24678808 -0.2569218   0.3406526
## tchr_easy_undrstnd  0.3693559 -0.23400419 -0.2572554   0.3692395
## intrst_tchr_say     0.3574044 -0.21868438 -0.2760368   0.3765803
## intrst_do           0.3328709 -0.19297292 -0.2526752   0.3511854
## tchr_clear_answ     0.3061418 -0.17845436 -0.2023622   0.3171648
## tchr_expl_good      0.3480884 -0.20767311 -0.2384670   0.3514692
## tchr_show_learn     0.2947806 -0.16417264 -0.1981051   0.2977383
## dif_help            0.2470531 -0.11522643 -0.1524332   0.2635005
## tel_how_better      0.2407324 -0.09271954 -0.1213073   0.2293676
## tchr_listen         0.2350739 -0.09978806 -0.1266574   0.2286149
## us_do_well          1.0000000 -0.73171095 -0.7716415   0.7944566
## difficult          -0.7317110  1.00000000  0.8199232  -0.6814399
## not_strong         -0.7716415  0.81992317  1.0000000  -0.7276280
## learn_quick         0.7944566 -0.68143986 -0.7276280   1.0000000
## makes_nerv         -0.5157156  0.59404085  0.5982757  -0.5031231
##                    makes_nerv
## enjoy              -0.4073788
## wish_not            0.3773705
## boring              0.3785172
## learn_intrst       -0.2330107
## like_math          -0.4355800
## like_num           -0.3526284
## like_prob          -0.4132874
## look_forw          -0.3572428
## favourite          -0.4502413
## tchr_exp_do        -0.1943283
## tchr_easy_undrstnd -0.2063668
## intrst_tchr_say    -0.1569602
## intrst_do          -0.1717992
## tchr_clear_answ    -0.1661405
## tchr_expl_good     -0.1691876
## tchr_show_learn    -0.1496616
## dif_help           -0.1283584
## tel_how_better     -0.1006139
## tchr_listen        -0.1143439
## us_do_well         -0.5157156
## difficult           0.5940408
## not_strong          0.5982757
## learn_quick        -0.5031231
## makes_nerv          1.0000000
#now all variables can be converted into numeric type, yay
dat1 = as.data.frame(lapply(dat1, as.numeric)) # now the scale runs from 1 - strong agreement - to 4 - strong disagreement. In other words: the higher the value, the more disagreement.
#str(dat1) #beautiful

As for the number of factors: parallel analysis suggests 4. This can also be seen in the graph below, where the crosses above the line mark the suggested factors. Well, let’s try this in practice.

fa.parallel(dat1, fa="both", n.iter=100)

## Parallel analysis suggests that the number of factors =  4  and the number of components =  3
# running different options of fa(): no rotation, varimax and oblimin rotation. I've also tried each with and without cor="mixed"
fa(dat1, nfactors=3, rotate="varimax", fm="ml", cor="mixed")  
## 
## mixed.cor is deprecated, please use mixedCor.
## Factor Analysis using method =  ml
## Call: fa(r = dat1, nfactors = 3, rotate = "varimax", fm = "ml", cor = "mixed")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                      ML2   ML1   ML3   h2    u2 com
## enjoy               0.26  0.84  0.34 0.88 0.117 1.5
## wish_not           -0.18 -0.59 -0.33 0.49 0.506 1.8
## boring             -0.20 -0.69 -0.28 0.60 0.401 1.5
## learn_intrst        0.39  0.69  0.11 0.65 0.355 1.6
## like_math           0.21  0.85  0.39 0.92 0.084 1.5
## like_num            0.20  0.75  0.28 0.69 0.312 1.4
## like_prob           0.19  0.77  0.36 0.76 0.242 1.6
## look_forw           0.34  0.79  0.24 0.80 0.203 1.6
## favourite           0.21  0.79  0.43 0.85 0.150 1.7
## tchr_exp_do         0.65  0.19  0.19 0.50 0.502 1.3
## tchr_easy_undrstnd  0.84  0.16  0.19 0.78 0.225 1.2
## intrst_tchr_say     0.71  0.45  0.06 0.71 0.293 1.7
## intrst_do           0.70  0.44  0.03 0.69 0.313 1.7
## tchr_clear_answ     0.87  0.14  0.12 0.78 0.216 1.1
## tchr_expl_good      0.88  0.17  0.15 0.83 0.173 1.1
## tchr_show_learn     0.74  0.19  0.09 0.60 0.401 1.2
## dif_help            0.81  0.19  0.03 0.69 0.310 1.1
## tel_how_better      0.80  0.15  0.02 0.67 0.328 1.1
## tchr_listen         0.80  0.14  0.03 0.66 0.337 1.1
## us_do_well          0.20  0.39  0.75 0.76 0.244 1.7
## difficult          -0.05 -0.25 -0.85 0.78 0.216 1.2
## not_strong         -0.05 -0.39 -0.83 0.84 0.163 1.4
## learn_quick         0.20  0.43  0.69 0.69 0.305 1.9
## makes_nerv         -0.06 -0.23 -0.60 0.42 0.582 1.3
## 
##                        ML2  ML1  ML3
## SS loadings           6.82 6.37 3.84
## Proportion Var        0.28 0.27 0.16
## Cumulative Var        0.28 0.55 0.71
## Proportion Explained  0.40 0.37 0.23
## Cumulative Proportion 0.40 0.77 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  24.48 with Chi Square of  184517.5
## The degrees of freedom for the model are 207  and the objective function was  1.28 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic number of observations is  7548 with the empirical chi square  1792.57  with prob <  2.7e-250 
## The total number of observations was  7548  with Likelihood Chi Square =  9653.09  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.932
## RMSEA index =  0.078  and the 90 % confidence intervals are  0.076 0.079
## BIC =  7804.78
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML2  ML1  ML3
## Correlation of (regression) scores with factors   0.97 0.96 0.94
## Multiple R square of scores with factors          0.95 0.93 0.89
## Minimum correlation of possible factor scores     0.90 0.86 0.78
# plot and diagram for the chosen model:
factor.plot(fa(dat1, nfactors=3, rotate="varimax", cor="mixed", fm="ml"))
## 
## mixed.cor is deprecated, please use mixedCor.

fa.diagram(fa(dat1, nfactors=3, rotate="varimax", cor="mixed", fm="ml"))
## 
## mixed.cor is deprecated, please use mixedCor.

#I've chosen the model with varimax rotation; the code for the other options is left here for reference, why not?
#fa(dat1, nfactors=3, rotate="none", fm="ml") 
#fa(dat1, nfactors=3, rotate="none", fm="ml", cor="mixed") 
#fa(dat1, nfactors=3, rotate="varimax", fm="ml") 
#fa(dat1, nfactors=3, rotate="oblimin", fm="ml") 
#fa(dat1, nfactors=3, rotate="oblimin", fm="ml", cor="mixed") 

# I've also tried the 4-factor model that parallel analysis suggested, but it was no better than the 3-factor one.
#fa(dat1, nfactors=4, rotate="varimax", fm="ml")
#fa(dat1, nfactors=4, rotate="varimax", fm="ml", cor="mixed")

I’ve chosen the model with varimax rotation, as it gave the best results. Here they are (the same figures can also be pulled from the fitted object, as sketched after this list):

  • Cumulative Variance = 0.71 - nice and high!
  • Proportion Var = 0.28, 0.27, 0.16 - every factor explains more than 10% of the total variance, which is great;
  • Proportion Explained = 0.40, 0.37, 0.23 - pretty good (at least better than in the other models);
  • RMSR = 0.02 (< 0.05 is excellent);
  • RMSEA = 0.078 (< 0.08 is acceptable);
  • Tucker-Lewis Index = 0.93 (> 0.90 is acceptable).
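
fit <- fa(dat1, nfactors=3, rotate="varimax", fm="ml", cor="mixed")
fit$Vaccounted # SS loadings, proportion and cumulative variance
fit$rms        # RMSR
fit$RMSEA      # RMSEA with its confidence bounds
fit$TLI        # Tucker-Lewis Index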

-> the model is pretty good. The factor diagram shows how the variables are distributed across the three factors. If we look closer, we can see that the factors split the variables in the same way as I suggested at the beginning:

  • 1st factor, ML1 - students’ attitude to maths;
  • 2nd factor, ML2 - students’ attitude to their math teacher, the teacher’s qualities;
  • 3rd factor, ML3 - how strong students feel in maths, their personal abilities.

Also, if we look at the plot, we can see that some variables are highlighted in red - the same ones that have negative loadings. These variables express “negative” attitudes: students finding maths boring, feeling nervous, not wanting to attend math classes, struggling with the subject. That makes sense: a student who agrees with the positively worded statements will tend to disagree with the negatively worded ones.
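
If one wanted all items keyed in the same direction before the analysis, the negatively worded ones could be reversed. A hedged sketch (run after the as.numeric() conversion above; 5 - x flips a 1-4 scale):

# reverse-key the negatively worded items so that a low score consistently
# means a positive attitude towards maths across all items
neg_items <- c("wish_not", "boring", "difficult", "not_strong", "makes_nerv")
dat1_keyed <- dat1
dat1_keyed[neg_items] <- lapply(dat1_keyed[neg_items], function(x) 5 - x)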

Moving forward, let’s test the reliability of our factors with Cronbach’s alpha (> 0.7 indicates good reliability). We get 0.94, 0.93 and 0.88 for the 1st, 2nd and 3rd factors respectively.

ML1<- as.data.frame(dat1[c("enjoy", "wish_not", "boring", "learn_intrst", "like_math", "like_num", "like_prob", "look_forw", "favourite")])
psych::alpha(ML1, check.keys=TRUE)
## Warning in psych::alpha(ML1, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: psych::alpha(x = ML1, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.94      0.94    0.94      0.63  16 0.001  2.3 0.81     0.64
## 
##  lower alpha upper     95% confidence boundaries
## 0.94 0.94 0.94 
## 
##  Reliability if an item is dropped:
##              raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r
## enjoy             0.93      0.93    0.93      0.62  13   0.0012 0.0082
## wish_not-         0.94      0.94    0.94      0.66  16   0.0010 0.0066
## boring-           0.93      0.94    0.93      0.65  15   0.0011 0.0100
## learn_intrst      0.94      0.94    0.94      0.65  15   0.0011 0.0083
## like_math         0.93      0.93    0.92      0.61  13   0.0013 0.0072
## like_num          0.93      0.93    0.93      0.64  14   0.0012 0.0092
## like_prob         0.93      0.93    0.93      0.63  14   0.0012 0.0091
## look_forw         0.93      0.93    0.93      0.63  13   0.0012 0.0094
## favourite         0.93      0.93    0.93      0.62  13   0.0012 0.0083
##              med.r
## enjoy         0.61
## wish_not-     0.66
## boring-       0.66
## learn_intrst  0.66
## like_math     0.61
## like_num      0.63
## like_prob     0.63
## look_forw     0.63
## favourite     0.62
## 
##  Item statistics 
##                 n raw.r std.r r.cor r.drop mean   sd
## enjoy        7548  0.88  0.89  0.88   0.85  2.0 0.93
## wish_not-    7548  0.72  0.71  0.65   0.63  2.3 1.07
## boring-      7548  0.78  0.78  0.74   0.72  2.4 0.99
## learn_intrst 7548  0.74  0.75  0.70   0.67  2.0 0.86
## like_math    7548  0.90  0.90  0.90   0.87  2.1 1.00
## like_num     7548  0.81  0.81  0.78   0.76  2.4 0.93
## like_prob    7548  0.85  0.85  0.83   0.80  2.3 1.00
## look_forw    7548  0.85  0.85  0.83   0.81  2.6 0.99
## favourite    7548  0.87  0.86  0.85   0.82  2.5 1.14
## 
## Non missing response frequency for each item
##                 1    2    3    4 miss
## enjoy        0.33 0.42 0.16 0.09    0
## wish_not     0.17 0.23 0.29 0.31    0
## boring       0.16 0.30 0.33 0.21    0
## learn_intrst 0.30 0.44 0.19 0.06    0
## like_math    0.31 0.38 0.18 0.13    0
## like_num     0.17 0.38 0.31 0.14    0
## like_prob    0.24 0.35 0.26 0.16    0
## look_forw    0.16 0.30 0.34 0.20    0
## favourite    0.27 0.23 0.24 0.26    0
ML2<- as.data.frame(dat1[c("tchr_exp_do", "tchr_easy_undrstnd", "intrst_tchr_say", "intrst_do", "tchr_clear_answ", "tchr_expl_good", "tchr_show_learn", "dif_help", "tel_how_better", "tchr_listen")])
psych::alpha(ML2, check.keys=TRUE)
## 
## Reliability analysis   
## Call: psych::alpha(x = ML2, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.93      0.93    0.93      0.57  13 0.0012  1.7 0.64     0.57
## 
##  lower alpha upper     95% confidence boundaries
## 0.93 0.93 0.93 
## 
##  Reliability if an item is dropped:
##                    raw_alpha std.alpha G6(smc) average_r S/N alpha se
## tchr_exp_do             0.93      0.93    0.93      0.59  13   0.0012
## tchr_easy_undrstnd      0.92      0.92    0.92      0.56  11   0.0014
## intrst_tchr_say         0.92      0.92    0.92      0.57  12   0.0013
## intrst_do               0.92      0.92    0.92      0.57  12   0.0013
## tchr_clear_answ         0.92      0.92    0.92      0.56  11   0.0014
## tchr_expl_good          0.92      0.92    0.92      0.55  11   0.0014
## tchr_show_learn         0.92      0.92    0.92      0.57  12   0.0013
## dif_help                0.92      0.92    0.92      0.56  12   0.0013
## tel_how_better          0.92      0.92    0.92      0.57  12   0.0013
## tchr_listen             0.92      0.92    0.92      0.57  12   0.0013
##                     var.r med.r
## tchr_exp_do        0.0035  0.58
## tchr_easy_undrstnd 0.0049  0.57
## intrst_tchr_say    0.0054  0.57
## intrst_do          0.0054  0.57
## tchr_clear_answ    0.0047  0.56
## tchr_expl_good     0.0040  0.56
## tchr_show_learn    0.0061  0.57
## dif_help           0.0057  0.56
## tel_how_better     0.0057  0.57
## tchr_listen        0.0057  0.57
## 
##  Item statistics 
##                       n raw.r std.r r.cor r.drop mean   sd
## tchr_exp_do        7548  0.66  0.68  0.62   0.60  1.6 0.68
## tchr_easy_undrstnd 7548  0.83  0.82  0.81   0.78  1.7 0.85
## intrst_tchr_say    7548  0.77  0.77  0.74   0.71  1.9 0.83
## intrst_do          7548  0.77  0.77  0.74   0.70  2.1 0.89
## tchr_clear_answ    7548  0.83  0.83  0.81   0.78  1.8 0.86
## tchr_expl_good     7548  0.84  0.84  0.83   0.80  1.6 0.83
## tchr_show_learn    7548  0.76  0.76  0.72   0.69  1.8 0.80
## dif_help           7548  0.79  0.79  0.77   0.74  1.7 0.81
## tel_how_better     7548  0.78  0.78  0.76   0.73  1.7 0.80
## tchr_listen        7548  0.77  0.77  0.74   0.71  1.6 0.79
## 
## Non missing response frequency for each item
##                       1    2    3    4 miss
## tchr_exp_do        0.53 0.39 0.06 0.02    0
## tchr_easy_undrstnd 0.48 0.35 0.12 0.05    0
## intrst_tchr_say    0.35 0.45 0.15 0.05    0
## intrst_do          0.28 0.42 0.23 0.07    0
## tchr_clear_answ    0.46 0.36 0.13 0.05    0
## tchr_expl_good     0.54 0.31 0.10 0.04    0
## tchr_show_learn    0.40 0.43 0.13 0.04    0
## dif_help           0.51 0.34 0.10 0.04    0
## tel_how_better     0.51 0.35 0.10 0.04    0
## tchr_listen        0.54 0.34 0.08 0.04    0
ML3<- as.data.frame(dat1[c("us_do_well", "difficult", "not_strong", "learn_quick", "makes_nerv")])
psych::alpha(ML3, check.keys=TRUE)
## Warning in psych::alpha(ML3, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: psych::alpha(x = ML3, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.88      0.88    0.86      0.59 7.1 0.0022  2.9 0.81     0.62
## 
##  lower alpha upper     95% confidence boundaries
## 0.87 0.88 0.88 
## 
##  Reliability if an item is dropped:
##              raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r
## us_do_well-       0.84      0.84    0.81      0.57 5.4   0.0029 0.0117
## difficult         0.83      0.84    0.81      0.57 5.2   0.0031 0.0143
## not_strong        0.83      0.83    0.80      0.55 4.9   0.0033 0.0121
## learn_quick-      0.85      0.85    0.82      0.59 5.7   0.0028 0.0129
## makes_nerv        0.88      0.89    0.86      0.66 7.9   0.0021 0.0026
##              med.r
## us_do_well-   0.56
## difficult     0.58
## not_strong    0.56
## learn_quick-  0.58
## makes_nerv    0.65
## 
##  Item statistics 
##                 n raw.r std.r r.cor r.drop mean   sd
## us_do_well-  7548  0.83  0.84  0.80   0.74  3.2 0.86
## difficult    7548  0.86  0.85  0.81   0.76  2.9 1.03
## not_strong   7548  0.88  0.87  0.84   0.79  2.8 1.11
## learn_quick- 7548  0.81  0.82  0.77   0.71  3.0 0.93
## makes_nerv   7548  0.72  0.71  0.58   0.55  2.8 1.01
## 
## Non missing response frequency for each item
##                1    2    3    4 miss
## us_do_well  0.45 0.36 0.14 0.05    0
## difficult   0.13 0.20 0.32 0.36    0
## not_strong  0.18 0.20 0.27 0.34    0
## learn_quick 0.34 0.37 0.22 0.08    0
## makes_nerv  0.12 0.25 0.32 0.31    0
fa1<-fa(dat1, nfactors=3, rotate="varimax", fm="ml", cor="mixed", scores=T)
## 
## mixed.cor is deprecated, please use mixedCor.
load <- fa1$loadings[, 1:3] # factor loadings for all three factors
fascores <- as.data.frame(fa1$scores)
datfa <- cbind(data1, fascores) # data1 is the full dataset (incl. achievement and background variables); now all factor scores are in one data frame, datfa
datfa1 <- datfa %>% select("math_achiv", "sex_stud", "momedu", "fathedu", "if_native", "ML2", "ML1", "ML3")
datfa1 <- as.data.frame(lapply(datfa1, as.numeric))
#names(datfa1) #good!

Regression analysis

Now we can dive into regression, where the outcome is students’ math achievement. I’ve built the regression models using forward selection: I started with one predictor (the first factor’s scores), then added the other factors one by one, followed by the background information about the student and their parents’ education.

The most reliable model, in my opinion, is the last one, combining all these predictors. It has a decent adjusted R-squared (0.317), explaining about 32% of the variability in math achievement. The summary for this model is shown below, along with its diagnostic plots. As for outliers and leverage: there are two outliers (241 and 4323) and no high-leverage points.

As for the direction of the predictors: both factors have negative estimates. That is because the scale runs from 1 to 4, where 1 means agreement and 4 means disagreement (as I mentioned at the beginning). So a strong negative relation for these predictors actually means a positive relation for positive attitudes. For this model we can say: when students like the subject and its classes, and when they are strong in it and catch the material easily, their achievement is better (completely unexpectedly, of course). In addition, mother’s education is positively related to students’ achievement.
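
If the negative signs feel awkward, the factor scores can simply be flipped before fitting; as a sketch (using datfa1 as defined above), the fit stays identical and only the coefficient signs change:

datfa1_rev <- datfa1 %>% mutate(ML1 = -ML1, ML2 = -ML2, ML3 = -ML3)
model_f_rev <- lm(math_achiv ~ ., data = datfa1_rev)
#summary(model_f_rev) # same R^2 = 0.317; ML1 and ML3 now have positive estimates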

#in order not to take too much space with summaries, I don't show them: R-squared, p-values and important notes are in the comments here :)
model1_1 <- lm(math_achiv ~ ML1, data= datfa1)
#summary(model1_1) # R^2 = 0.046, p-value < 2.2e-16, ML1 has a significant negative relation to the outcome
model1_2 <- lm(math_achiv ~ ML1 + ML2, data= datfa1)
#summary(model1_2) # R^2 = 0.046, p-value < 2.2e-16, ML2 gives insignificant results
model1_3 <- lm(math_achiv ~ ML1 + ML2 + ML3, data= datfa1)
#summary(model1_3) # R^2 = 0.317, p-value < 2.2e-16 - ML2 still insignificant, but R^2 is much better! ML3 is significant, with a negative relation
model2_1 <- lm(math_achiv ~ ML1 + ML2 + ML3 + sex_stud, data= datfa1)
#summary(model2_1) # R^2 = 0.317, p-value < 2.2e-16, student's gender is insignificant here
model2_2 <- lm(datfa1$math_achiv ~ ML1 + ML2 + ML3 + sex_stud + momedu, data= datfa1)
#summary(model2_2) # R^2 = 0.317, p-value < 2.2e-16, mother's education is significant, with a positive relation to the outcome
model2_3 <- lm(datfa1$math_achiv ~ ML1 + ML2 + ML3 + sex_stud + momedu + fathedu, data= datfa1)
#summary(model2_3) # R^2 = 0.317, p-value < 2.2e-16, father's education is insignificant (but mother's estimate became stronger)
model_f <- lm(datfa1$math_achiv ~., data= datfa1)
summary(model_f) # R^2 = 0.317, p-value < 2.2e-16, whether the student is native-born has no significant relation.
## 
## Call:
## lm(formula = datfa1$math_achiv ~ ., data = datfa1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6000.5 -1575.4    35.1  1643.7  6325.2 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4529.308    135.094  33.527   <2e-16 ***
## sex_stud      -81.523     48.369  -1.685   0.0919 .  
## momedu         32.341     15.904   2.034   0.0420 *  
## fathedu        -7.462     15.259  -0.489   0.6248    
## if_native     -50.839     70.426  -0.722   0.4704    
## ML2           -13.191     23.345  -0.565   0.5721    
## ML1          -379.704     24.960 -15.213   <2e-16 ***
## ML3         -1365.080     25.026 -54.546   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2082 on 7540 degrees of freedom
## Multiple R-squared:  0.3178, Adjusted R-squared:  0.3171 
## F-statistic: 501.7 on 7 and 7540 DF,  p-value: < 2.2e-16
plot(model_f) #no high-leverage points

qqPlot(model_f, main="QQ Plot") #outliers are: 241 and 4323

## [1]  241 4323

As for the AIC analysis: the lowest (=> the best) score belongs to the model with all three factors and mother’s education. However, it does not differ much from the AIC of the full model (the one I’ve chosen).

Model <- c("model1_1", "model1_2", "model1_3 - with all 3 factors", "model2_1", "model2_2 - lowest (best) score", "model2_3", "Full model")
AIC_score <- c("139297", "139299", "136780", "136780", "136777", "136779", "136780")
AIC_scores <- data.frame(Model, AIC_score)
kable(AIC_scores) %>%
  kable_styling(c("bordered"))
Model                           AIC_score
model1_1                        139297
model1_2                        139299
model1_3 - with all 3 factors   136780
model2_1                        136780
model2_2 - lowest (best) score  136777
model2_3                        136779
Full model                      136780
# to show this in a tidier way I've created a table and filled it with the scores calculated by the code below
# the lower the better
#AIC(model1_1) #139297
#AIC(model1_2) #139299
#AIC(model1_3) #136780
#AIC(model2_1) #136780
#AIC(model2_2) #136777 - the lowest
#AIC(model2_3) #136779
#AIC(model_f)  #136780
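
A more compact way to collect the same scores from the fitted models (a sketch using the model objects defined above; lower is better):

models <- list(model1_1 = model1_1, model1_2 = model1_2, model1_3 = model1_3,
               model2_1 = model2_1, model2_2 = model2_2, model2_3 = model2_3,
               model_f  = model_f)
round(sapply(models, AIC))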

After that I’ve created one more model, with only the significant predictors. Its AIC is the lowest, and the other characteristics are the same: R-squared = 0.317, p-value < 2.2e-16. Same outliers, no high-leverage points.

model2_4 <- lm(datfa1$math_achiv ~ ML1 + ML3 + momedu, data= datfa1)
summary(model2_4)
## 
## Call:
## lm(formula = datfa1$math_achiv ~ ML1 + ML3 + momedu, data = datfa1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5987.3 -1577.2    30.6  1634.7  6280.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4343.62      82.41  52.706   <2e-16 ***
## ML1          -376.41      24.67 -15.255   <2e-16 ***
## ML3         -1361.15      24.91 -54.634   <2e-16 ***
## momedu         25.95      12.79   2.028   0.0425 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2082 on 7544 degrees of freedom
## Multiple R-squared:  0.3174, Adjusted R-squared:  0.3171 
## F-statistic:  1169 on 3 and 7544 DF,  p-value: < 2.2e-16
AIC(model2_4)
## [1] 136776.8
#plot(model2_4) #no leverages
#qqPlot(model2_4, main="QQ Plot") #outliers are: 241 and 4323

To test for multicollinearity I’ve used the VIF (a VIF above 10 indicates multicollinearity; for predictor j, VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others). In this case everything is OK!

vif(model_f)
##  sex_stud    momedu   fathedu if_native       ML2       ML1       ML3 
##  1.017592  1.546183  1.550944  1.025065  1.008332  1.041084  1.027315
vif(model2_4)
##      ML1      ML3   momedu 
## 1.017399 1.018096 1.000696

Conclusion

In general, the following can be said:

  • Canadian students’ attitudes towards maths split into three factors: their teacher’s qualities, their own strength in the subject, and how much they like or dislike the subject and its classes;

  • two of these factors are significantly related to students’ math achievement: students’ interest in the subject and its classes, and their strengths and personal abilities. When students like the subject and have strong abilities in learning math, their achievement is higher;

  • among the other predictors, mother’s education matters for students’ achievement: the higher the educational capital, the higher the grades.