Hello! This time I’m going to explore students’ attitudes towards studying maths in Canada. We have 24 variables describing students’ attitudes toward different aspects of maths: the subject itself, the classes, the teachers, and their own abilities in it. I’m interested in how these attitudes relate to each other, how they can be grouped into factors, and how many such factors there are.
Looking at the variables, I suggest they will fall into three groups: attitudes towards the subject itself, perceptions of the teacher and classes, and students’ confidence in their own abilities.
(The attitude variables are ordinal 4-point scales, from “Agree a lot” to “Disagree a lot”.)
There is also a small group of variables, needed later, with information about the student and their family: math achievement, parental education, the student’s gender, and whether the student was born in Canada or abroad.
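For context, here is a minimal sketch of how the four-point items can be declared as factors in R. This is an assumption, not the original loading code: it presumes `dat1` already holds the 24 raw attitude items.
likert_levels <- c("Agree a lot", "Agree a little", "Disagree a little", "Disagree a lot")
dat1 <- as.data.frame(lapply(dat1, factor, levels = likert_levels)) # every attitude item on the same 4-point scale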
Name_of_the_variable <- c("enjoy", "wish_not", "boring", "learn_intrst", "like_math", "like_num", "like_prob", "look_forw", "favourite",
"tchr_exp_do", "tchr_easy_undrstnd", "intrst_tchr_say", "intrst_do", "tchr_clear_answ", "tchr_expl_good", "tchr_show_learn", "dif_help", "tel_how_better", "tchr_listen",
"us_do_well", "difficult", "not_strong", "learn_quick", "makes_nerv",
"math_achiv", "sex_stud", "momedu", "fathedu", "if_native")
Label <- c("enjoy learning mathematics", "wish have not to study math", "math is boring", "learn interesting things", "like mathematics", "like numbers", "like math problems", "look forward to math class", "favorite subject",
"teacher expects to do", "teacher is easy to understand", "interested in what tchr says", "interesting things to do", "teacher clear answers", "teacher explains good", "teacher shows learned", "different things to help", "tells how to do better", "teacher listens",
"usually do well in math", "math is more difficult", "mathematics not my strength", "learn quickly in mathematics", "math makes nervous",
"math achievement", "student's gender", "mother's education", "father's education", "if a student native or not")
Decoding <- data.frame(Name_of_the_variable, Label)
kable(Decoding) %>%
kable_styling(c("bordered"))
| Name_of_the_variable | Label |
|---|---|
| enjoy | enjoy learning mathematics |
| wish_not | wish have not to study math |
| boring | math is boring |
| learn_intrst | learn interesting things |
| like_math | like mathematics |
| like_num | like numbers |
| like_prob | like math problems |
| look_forw | look forward to math class |
| favourite | favorite subject |
| tchr_exp_do | teacher expects to do |
| tchr_easy_undrstnd | teacher is easy to understand |
| intrst_tchr_say | interested in what tchr says |
| intrst_do | interesting things to do |
| tchr_clear_answ | teacher clear answers |
| tchr_expl_good | teacher explains good |
| tchr_show_learn | teacher shows learned |
| dif_help | different things to help |
| tel_how_better | tells how to do better |
| tchr_listen | teacher listens |
| us_do_well | usually do well in math |
| difficult | math is more difficult |
| not_strong | mathematics not my strength |
| learn_quick | learn quickly in mathematics |
| makes_nerv | math makes nervous |
| math_achiv | math achievement |
| sex_stud | student’s gender |
| momedu | mother’s education |
| fathedu | father’s education |
| if_native | if a student native or not |
Descriptive statistics are shown below: tables with the number of observations in each category of every variable, plus summary statistics.
summary(dat1)
## enjoy wish_not
## Agree a lot :2464 Agree a lot :1258
## Agree a little :3160 Agree a little :1762
## Disagree a little:1228 Disagree a little:2167
## Disagree a lot : 696 Disagree a lot :2361
## boring learn_intrst
## Agree a lot :1182 Agree a lot :2293
## Agree a little :2275 Agree a little :3323
## Disagree a little:2491 Disagree a little:1461
## Disagree a lot :1600 Disagree a lot : 471
## like_math like_num
## Agree a lot :2336 Agree a lot :1301
## Agree a little :2845 Agree a little :2833
## Disagree a little:1395 Disagree a little:2377
## Disagree a lot : 972 Disagree a lot :1037
## like_prob look_forw
## Agree a lot :1795 Agree a lot :1243
## Agree a little :2628 Agree a little :2236
## Disagree a little:1955 Disagree a little:2569
## Disagree a lot :1170 Disagree a lot :1500
## favourite tchr_exp_do
## Agree a lot :2008 Agree a lot :4020
## Agree a little :1758 Agree a little :2959
## Disagree a little:1815 Disagree a little: 444
## Disagree a lot :1967 Disagree a lot : 125
## tchr_easy_undrstnd intrst_tchr_say
## Agree a lot :3647 Agree a lot :2631
## Agree a little :2636 Agree a little :3376
## Disagree a little: 904 Disagree a little:1165
## Disagree a lot : 361 Disagree a lot : 376
## intrst_do tchr_clear_answ
## Agree a lot :2133 Agree a lot :3461
## Agree a little :3150 Agree a little :2703
## Disagree a little:1734 Disagree a little:1014
## Disagree a lot : 531 Disagree a lot : 370
## tchr_expl_good tchr_show_learn
## Agree a lot :4104 Agree a lot :3045
## Agree a little :2377 Agree a little :3256
## Disagree a little: 728 Disagree a little: 976
## Disagree a lot : 339 Disagree a lot : 271
## dif_help tel_how_better
## Agree a lot :3875 Agree a lot :3822
## Agree a little :2590 Agree a little :2667
## Disagree a little: 791 Disagree a little: 783
## Disagree a lot : 292 Disagree a lot : 276
## tchr_listen us_do_well
## Agree a lot :4096 Agree a lot :3419
## Agree a little :2546 Agree a little :2742
## Disagree a little: 632 Disagree a little:1019
## Disagree a lot : 274 Disagree a lot : 368
## difficult not_strong
## Agree a lot : 965 Agree a lot :1393
## Agree a little :1508 Agree a little :1547
## Disagree a little:2387 Disagree a little:2006
## Disagree a lot :2688 Disagree a lot :2602
## learn_quick makes_nerv
## Agree a lot :2549 Agree a lot : 926
## Agree a little :2791 Agree a little :1878
## Disagree a little:1625 Disagree a little:2407
## Disagree a lot : 583 Disagree a lot :2337
describe(dat1) # psych::describe for the summary statistics (describeBy is meant for grouped data)
# the same frequency/percentage table for every attitude item (instead of repeating the pipeline 24 times)
att_vars <- Name_of_the_variable[1:24] # the attitude items from the decoding table above
for (v in att_vars) {
  print(
    dat1 %>%
      group_by(.data[[v]]) %>%
      summarize(count = n()) %>%
      mutate(perc = paste0(round(count / sum(count) * 100, 1), "%")) %>%
      as.data.frame()
  )
}
The correlation matrix between the attitude variables is shown below.
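The matrix `dat.cor` was created at the very beginning and that code is not shown here; a minimal sketch of how such a matrix could be built (an assumption: plain Pearson correlations on the numeric 1-4 codes; given the cor="mixed" option used in the factor analysis later, it may equally well have come from psych::mixedCor):
# hypothetical reconstruction, not the original code
dat.cor <- cor(as.data.frame(lapply(dat1, as.numeric)))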
dat.cor #was created at the very beginning
## enjoy wish_not boring learn_intrst
## enjoy 1.0000000 -0.6530110 -0.7311512 0.7337597
## wish_not -0.6530110 1.0000000 0.6689014 -0.5301443
## boring -0.7311512 0.6689014 1.0000000 -0.5909139
## learn_intrst 0.7337597 -0.5301443 -0.5909139 1.0000000
## like_math 0.9097056 -0.6735586 -0.7338107 0.7146248
## like_num 0.7665287 -0.5485120 -0.6177848 0.6438315
## like_prob 0.7994023 -0.5943102 -0.6563359 0.6550632
## look_forw 0.8202649 -0.6084189 -0.6883917 0.6938163
## favourite 0.8540816 -0.6228135 -0.7043775 0.6439868
## tchr_exp_do 0.3946110 -0.3065921 -0.2818359 0.4166977
## tchr_easy_undrstnd 0.4301234 -0.3110446 -0.3365352 0.4526249
## intrst_tchr_say 0.5870188 -0.4303067 -0.4879184 0.6337751
## intrst_do 0.5510505 -0.4140823 -0.4654591 0.6129997
## tchr_clear_answ 0.3846048 -0.2913330 -0.3230871 0.4449826
## tchr_expl_good 0.4376279 -0.2992816 -0.3357180 0.4768712
## tchr_show_learn 0.3893986 -0.2891893 -0.2974519 0.4275552
## dif_help 0.3783950 -0.2582635 -0.2981235 0.4609514
## tel_how_better 0.3339233 -0.2357417 -0.2551312 0.4200189
## tchr_listen 0.3292856 -0.2474333 -0.2607645 0.4052187
## us_do_well 0.6424544 -0.4900462 -0.4839159 0.4325017
## difficult -0.5096370 0.4665074 0.4377293 -0.3042396
## not_strong -0.6162819 0.5329517 0.5321202 -0.3779231
## learn_quick 0.6518604 -0.4669332 -0.4941101 0.4678809
## makes_nerv -0.4073788 0.3773705 0.3785172 -0.2330107
## like_math like_num like_prob look_forw favourite
## enjoy 0.9097056 0.7665287 0.7994023 0.8202649 0.8540816
## wish_not -0.6735586 -0.5485120 -0.5943102 -0.6084189 -0.6228135
## boring -0.7338107 -0.6177848 -0.6563359 -0.6883917 -0.7043775
## learn_intrst 0.7146248 0.6438315 0.6550632 0.6938163 0.6439868
## like_math 1.0000000 0.7782737 0.8258742 0.8258958 0.8927466
## like_num 0.7782737 1.0000000 0.8102412 0.7329954 0.7448598
## like_prob 0.8258742 0.8102412 1.0000000 0.7638991 0.7881490
## look_forw 0.8258958 0.7329954 0.7638991 1.0000000 0.8358233
## favourite 0.8927466 0.7448598 0.7881490 0.8358233 1.0000000
## tchr_exp_do 0.3771673 0.3214364 0.3350991 0.4262023 0.3500125
## tchr_easy_undrstnd 0.4049418 0.3366131 0.3485542 0.4657342 0.3921987
## intrst_tchr_say 0.5470005 0.5033269 0.5001450 0.6101532 0.5040575
## intrst_do 0.5303567 0.4838652 0.4905195 0.6058280 0.4946253
## tchr_clear_answ 0.3506200 0.3208154 0.3175406 0.4349045 0.3491850
## tchr_expl_good 0.4003023 0.3473059 0.3447801 0.4729798 0.3899317
## tchr_show_learn 0.3579386 0.3368703 0.3388873 0.4141289 0.3427036
## dif_help 0.3505382 0.3087548 0.3081696 0.4381448 0.3360577
## tel_how_better 0.3171806 0.2846215 0.2892075 0.4009448 0.3050696
## tchr_listen 0.3061258 0.2734203 0.2592535 0.3939367 0.2858839
## us_do_well 0.6674715 0.5418959 0.6179009 0.5427544 0.6674118
## difficult -0.5474407 -0.4326155 -0.5120452 -0.4213153 -0.5567467
## not_strong -0.6576111 -0.5316221 -0.6015587 -0.5257382 -0.6817892
## learn_quick 0.6678710 0.5826812 0.6394750 0.5653425 0.6698119
## makes_nerv -0.4355800 -0.3526284 -0.4132874 -0.3572428 -0.4502413
## tchr_exp_do tchr_easy_undrstnd intrst_tchr_say
## enjoy 0.3946110 0.4301234 0.5870188
## wish_not -0.3065921 -0.3110446 -0.4303067
## boring -0.2818359 -0.3365352 -0.4879184
## learn_intrst 0.4166977 0.4526249 0.6337751
## like_math 0.3771673 0.4049418 0.5470005
## like_num 0.3214364 0.3366131 0.5033269
## like_prob 0.3350991 0.3485542 0.5001450
## look_forw 0.4262023 0.4657342 0.6101532
## favourite 0.3500125 0.3921987 0.5040575
## tchr_exp_do 1.0000000 0.6622781 0.5727194
## tchr_easy_undrstnd 0.6622781 1.0000000 0.7061846
## intrst_tchr_say 0.5727194 0.7061846 1.0000000
## intrst_do 0.5437896 0.6523856 0.7904675
## tchr_clear_answ 0.5899970 0.7888271 0.6670746
## tchr_expl_good 0.5984295 0.8455601 0.6976676
## tchr_show_learn 0.5692919 0.6335037 0.6144980
## dif_help 0.5483149 0.6898528 0.6360824
## tel_how_better 0.5739399 0.6689336 0.6030622
## tchr_listen 0.5564177 0.6684535 0.6206567
## us_do_well 0.3715907 0.3693559 0.3574044
## difficult -0.2467881 -0.2340042 -0.2186844
## not_strong -0.2569218 -0.2572554 -0.2760368
## learn_quick 0.3406526 0.3692395 0.3765803
## makes_nerv -0.1943283 -0.2063668 -0.1569602
## intrst_do tchr_clear_answ tchr_expl_good
## enjoy 0.5510505 0.3846048 0.4376279
## wish_not -0.4140823 -0.2913330 -0.2992816
## boring -0.4654591 -0.3230871 -0.3357180
## learn_intrst 0.6129997 0.4449826 0.4768712
## like_math 0.5303567 0.3506200 0.4003023
## like_num 0.4838652 0.3208154 0.3473059
## like_prob 0.4905195 0.3175406 0.3447801
## look_forw 0.6058280 0.4349045 0.4729798
## favourite 0.4946253 0.3491850 0.3899317
## tchr_exp_do 0.5437896 0.5899970 0.5984295
## tchr_easy_undrstnd 0.6523856 0.7888271 0.8455601
## intrst_tchr_say 0.7904675 0.6670746 0.6976676
## intrst_do 1.0000000 0.6660573 0.6629648
## tchr_clear_answ 0.6660573 1.0000000 0.8394925
## tchr_expl_good 0.6629648 0.8394925 1.0000000
## tchr_show_learn 0.6247443 0.6606140 0.6764640
## dif_help 0.6855708 0.6905161 0.7463640
## tel_how_better 0.6231466 0.7050020 0.7142226
## tchr_listen 0.6097310 0.7216724 0.7008700
## us_do_well 0.3328709 0.3061418 0.3480884
## difficult -0.1929729 -0.1784544 -0.2076731
## not_strong -0.2526752 -0.2023622 -0.2384670
## learn_quick 0.3511854 0.3171648 0.3514692
## makes_nerv -0.1717992 -0.1661405 -0.1691876
## tchr_show_learn dif_help tel_how_better tchr_listen
## enjoy 0.3893986 0.3783950 0.33392335 0.32928557
## wish_not -0.2891893 -0.2582635 -0.23574170 -0.24743332
## boring -0.2974519 -0.2981235 -0.25513123 -0.26076448
## learn_intrst 0.4275552 0.4609514 0.42001893 0.40521870
## like_math 0.3579386 0.3505382 0.31718059 0.30612582
## like_num 0.3368703 0.3087548 0.28462154 0.27342034
## like_prob 0.3388873 0.3081696 0.28920747 0.25925350
## look_forw 0.4141289 0.4381448 0.40094479 0.39393667
## favourite 0.3427036 0.3360577 0.30506961 0.28588393
## tchr_exp_do 0.5692919 0.5483149 0.57393994 0.55641772
## tchr_easy_undrstnd 0.6335037 0.6898528 0.66893360 0.66845350
## intrst_tchr_say 0.6144980 0.6360824 0.60306215 0.62065668
## intrst_do 0.6247443 0.6855708 0.62314659 0.60973104
## tchr_clear_answ 0.6606140 0.6905161 0.70500196 0.72167240
## tchr_expl_good 0.6764640 0.7463640 0.71422265 0.70087000
## tchr_show_learn 1.0000000 0.6823598 0.65665394 0.67364836
## dif_help 0.6823598 1.0000000 0.74371034 0.68564237
## tel_how_better 0.6566539 0.7437103 1.00000000 0.74304611
## tchr_listen 0.6736484 0.6856424 0.74304611 1.00000000
## us_do_well 0.2947806 0.2470531 0.24073244 0.23507386
## difficult -0.1641726 -0.1152264 -0.09271954 -0.09978806
## not_strong -0.1981051 -0.1524332 -0.12130732 -0.12665741
## learn_quick 0.2977383 0.2635005 0.22936755 0.22861487
## makes_nerv -0.1496616 -0.1283584 -0.10061393 -0.11434392
## us_do_well difficult not_strong learn_quick
## enjoy 0.6424544 -0.50963695 -0.6162819 0.6518604
## wish_not -0.4900462 0.46650744 0.5329517 -0.4669332
## boring -0.4839159 0.43772932 0.5321202 -0.4941101
## learn_intrst 0.4325017 -0.30423963 -0.3779231 0.4678809
## like_math 0.6674715 -0.54744070 -0.6576111 0.6678710
## like_num 0.5418959 -0.43261546 -0.5316221 0.5826812
## like_prob 0.6179009 -0.51204521 -0.6015587 0.6394750
## look_forw 0.5427544 -0.42131534 -0.5257382 0.5653425
## favourite 0.6674118 -0.55674667 -0.6817892 0.6698119
## tchr_exp_do 0.3715907 -0.24678808 -0.2569218 0.3406526
## tchr_easy_undrstnd 0.3693559 -0.23400419 -0.2572554 0.3692395
## intrst_tchr_say 0.3574044 -0.21868438 -0.2760368 0.3765803
## intrst_do 0.3328709 -0.19297292 -0.2526752 0.3511854
## tchr_clear_answ 0.3061418 -0.17845436 -0.2023622 0.3171648
## tchr_expl_good 0.3480884 -0.20767311 -0.2384670 0.3514692
## tchr_show_learn 0.2947806 -0.16417264 -0.1981051 0.2977383
## dif_help 0.2470531 -0.11522643 -0.1524332 0.2635005
## tel_how_better 0.2407324 -0.09271954 -0.1213073 0.2293676
## tchr_listen 0.2350739 -0.09978806 -0.1266574 0.2286149
## us_do_well 1.0000000 -0.73171095 -0.7716415 0.7944566
## difficult -0.7317110 1.00000000 0.8199232 -0.6814399
## not_strong -0.7716415 0.81992317 1.0000000 -0.7276280
## learn_quick 0.7944566 -0.68143986 -0.7276280 1.0000000
## makes_nerv -0.5157156 0.59404085 0.5982757 -0.5031231
## makes_nerv
## enjoy -0.4073788
## wish_not 0.3773705
## boring 0.3785172
## learn_intrst -0.2330107
## like_math -0.4355800
## like_num -0.3526284
## like_prob -0.4132874
## look_forw -0.3572428
## favourite -0.4502413
## tchr_exp_do -0.1943283
## tchr_easy_undrstnd -0.2063668
## intrst_tchr_say -0.1569602
## intrst_do -0.1717992
## tchr_clear_answ -0.1661405
## tchr_expl_good -0.1691876
## tchr_show_learn -0.1496616
## dif_help -0.1283584
## tel_how_better -0.1006139
## tchr_listen -0.1143439
## us_do_well -0.5157156
## difficult 0.5940408
## not_strong 0.5982757
## learn_quick -0.5031231
## makes_nerv 1.0000000
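With 24 items the raw matrix is hard to scan; a heatmap makes the block structure visible at a glance. A sketch, assuming the corrplot package is installed:
library(corrplot)
corrplot(dat.cor, method = "color", order = "hclust", tl.cex = 0.6) # three blocks of high correlations foreshadow the three factors below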
#now all variables can be converted into numeric type, yay
dat1 = as.data.frame(lapply(dat1, as.numeric)) # the scale now runs from 1 (strong agreement) to 4 (strong disagreement). In other words: the higher the value, the more disagreement.
#str(dat1) #beautiful
As for the number of factors: 4 factors are suggested by parallel analysis (which compares the observed eigenvalues with those obtained from random data of the same size). This can also be seen on the graph below, where the crosses above the line mark the suggested factors. Well, let’s try this in practice.
fa.parallel(dat1, fa="both", n.iter=100)
## Parallel analysis suggests that the number of factors = 4 and the number of components = 3
# running different options of fa: no rotation, varimax and oblimin rotation; each tried both with and without cor="mixed"
fa(dat1, nfactors=3, rotate="varimax", fm="ml", cor="mixed")
##
## mixed.cor is deprecated, please use mixedCor.
## Factor Analysis using method = ml
## Call: fa(r = dat1, nfactors = 3, rotate = "varimax", fm = "ml", cor = "mixed")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML2 ML1 ML3 h2 u2 com
## enjoy 0.26 0.84 0.34 0.88 0.117 1.5
## wish_not -0.18 -0.59 -0.33 0.49 0.506 1.8
## boring -0.20 -0.69 -0.28 0.60 0.401 1.5
## learn_intrst 0.39 0.69 0.11 0.65 0.355 1.6
## like_math 0.21 0.85 0.39 0.92 0.084 1.5
## like_num 0.20 0.75 0.28 0.69 0.312 1.4
## like_prob 0.19 0.77 0.36 0.76 0.242 1.6
## look_forw 0.34 0.79 0.24 0.80 0.203 1.6
## favourite 0.21 0.79 0.43 0.85 0.150 1.7
## tchr_exp_do 0.65 0.19 0.19 0.50 0.502 1.3
## tchr_easy_undrstnd 0.84 0.16 0.19 0.78 0.225 1.2
## intrst_tchr_say 0.71 0.45 0.06 0.71 0.293 1.7
## intrst_do 0.70 0.44 0.03 0.69 0.313 1.7
## tchr_clear_answ 0.87 0.14 0.12 0.78 0.216 1.1
## tchr_expl_good 0.88 0.17 0.15 0.83 0.173 1.1
## tchr_show_learn 0.74 0.19 0.09 0.60 0.401 1.2
## dif_help 0.81 0.19 0.03 0.69 0.310 1.1
## tel_how_better 0.80 0.15 0.02 0.67 0.328 1.1
## tchr_listen 0.80 0.14 0.03 0.66 0.337 1.1
## us_do_well 0.20 0.39 0.75 0.76 0.244 1.7
## difficult -0.05 -0.25 -0.85 0.78 0.216 1.2
## not_strong -0.05 -0.39 -0.83 0.84 0.163 1.4
## learn_quick 0.20 0.43 0.69 0.69 0.305 1.9
## makes_nerv -0.06 -0.23 -0.60 0.42 0.582 1.3
##
## ML2 ML1 ML3
## SS loadings 6.82 6.37 3.84
## Proportion Var 0.28 0.27 0.16
## Cumulative Var 0.28 0.55 0.71
## Proportion Explained 0.40 0.37 0.23
## Cumulative Proportion 0.40 0.77 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 276 and the objective function was 24.48 with Chi Square of 184517.5
## The degrees of freedom for the model are 207 and the objective function was 1.28
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.02
##
## The harmonic number of observations is 7548 with the empirical chi square 1792.57 with prob < 2.7e-250
## The total number of observations was 7548 with Likelihood Chi Square = 9653.09 with prob < 0
##
## Tucker Lewis Index of factoring reliability = 0.932
## RMSEA index = 0.078 and the 90 % confidence intervals are 0.076 0.079
## BIC = 7804.78
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML2 ML1 ML3
## Correlation of (regression) scores with factors 0.97 0.96 0.94
## Multiple R square of scores with factors 0.95 0.93 0.89
## Minimum correlation of possible factor scores 0.90 0.86 0.78
# plot and diagram for the chosen model:
factor.plot(fa(dat1, nfactors=3, rotate="varimax", cor="mixed", fm="ml"))
##
## mixed.cor is deprecated, please use mixedCor.
fa.diagram(fa(dat1, nfactors=3, rotate="varimax", cor="mixed", fm="ml"))
##
## mixed.cor is deprecated, please use mixedCor.
#I've chosen the model with varimax rotation. However, I've decided to leave the code for the other options here, why not?
#fa(dat1, nfactors=3, rotate="none", fm="ml")
#fa(dat1, nfactors=3, rotate="none", fm="ml", cor="mixed")
#fa(dat1, nfactors=3, rotate="varimax", fm="ml")
#fa(dat1, nfactors=3, rotate="oblimin", fm="ml")
#fa(dat1, nfactors=3, rotate="oblimin", fm="ml", cor="mixed")
# I've also tried a model with 4 factors (the number suggested by parallel analysis), but it was no better than the 3-factor model.
#fa(dat1, nfactors=4, rotate="varimax", fm="ml")
#fa(dat1, nfactors=4, rotate="varimax", fm="ml", cor="mixed")
I’ve chosen the model with varimax rotation, as it gave the best results. Here they are:
-> the model is pretty good. The diagram shows how the variables are distributed across the three factors, and a closer look reveals that the factors group the variables just as I suggested at the beginning: ML1 gathers attitudes towards the subject itself, ML2 the perceptions of the teacher and classes, and ML3 students’ confidence in their own abilities.
Also, some variables on the graph are highlighted in red: these are the ones with negative loadings, and they correspond to “negative” attitudes - finding math boring, feeling nervous, wishing not to study math, finding the subject difficult. This is expected: a student who agrees with the positively worded items will tend to disagree with the negatively worded ones.
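On a 1-4 scale, reverse-keying such an item is simply 5 minus the response. The psych::alpha calls below handle this automatically via check.keys=TRUE, but here is a manual sketch for two of the negatively worded items (illustrative only, not used further):
# manual reverse-keying (dat1 is numeric at this point)
dat1_rev <- dat1
dat1_rev[c("wish_not", "boring")] <- 5 - dat1_rev[c("wish_not", "boring")]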
Moving forward, let’s test the reliability of our factors with Cronbach’s alpha (>.7 indicates good reliability). We get 0.94, 0.93 and 0.88 for the subject (ML1), teacher (ML2) and ability (ML3) scales respectively.
ML1<- as.data.frame(dat1[c("enjoy", "wish_not", "boring", "learn_intrst", "like_math", "like_num", "like_prob", "look_forw", "favourite")])
psych::alpha(ML1, check.keys=TRUE)
## Warning in psych::alpha(ML1, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
## This is indicated by a negative sign for the variable name.
##
## Reliability analysis
## Call: psych::alpha(x = ML1, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.94 0.94 0.94 0.63 16 0.001 2.3 0.81 0.64
##
## lower alpha upper 95% confidence boundaries
## 0.94 0.94 0.94
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## enjoy 0.93 0.93 0.93 0.62 13 0.0012 0.0082
## wish_not- 0.94 0.94 0.94 0.66 16 0.0010 0.0066
## boring- 0.93 0.94 0.93 0.65 15 0.0011 0.0100
## learn_intrst 0.94 0.94 0.94 0.65 15 0.0011 0.0083
## like_math 0.93 0.93 0.92 0.61 13 0.0013 0.0072
## like_num 0.93 0.93 0.93 0.64 14 0.0012 0.0092
## like_prob 0.93 0.93 0.93 0.63 14 0.0012 0.0091
## look_forw 0.93 0.93 0.93 0.63 13 0.0012 0.0094
## favourite 0.93 0.93 0.93 0.62 13 0.0012 0.0083
## med.r
## enjoy 0.61
## wish_not- 0.66
## boring- 0.66
## learn_intrst 0.66
## like_math 0.61
## like_num 0.63
## like_prob 0.63
## look_forw 0.63
## favourite 0.62
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## enjoy 7548 0.88 0.89 0.88 0.85 2.0 0.93
## wish_not- 7548 0.72 0.71 0.65 0.63 2.3 1.07
## boring- 7548 0.78 0.78 0.74 0.72 2.4 0.99
## learn_intrst 7548 0.74 0.75 0.70 0.67 2.0 0.86
## like_math 7548 0.90 0.90 0.90 0.87 2.1 1.00
## like_num 7548 0.81 0.81 0.78 0.76 2.4 0.93
## like_prob 7548 0.85 0.85 0.83 0.80 2.3 1.00
## look_forw 7548 0.85 0.85 0.83 0.81 2.6 0.99
## favourite 7548 0.87 0.86 0.85 0.82 2.5 1.14
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## enjoy 0.33 0.42 0.16 0.09 0
## wish_not 0.17 0.23 0.29 0.31 0
## boring 0.16 0.30 0.33 0.21 0
## learn_intrst 0.30 0.44 0.19 0.06 0
## like_math 0.31 0.38 0.18 0.13 0
## like_num 0.17 0.38 0.31 0.14 0
## like_prob 0.24 0.35 0.26 0.16 0
## look_forw 0.16 0.30 0.34 0.20 0
## favourite 0.27 0.23 0.24 0.26 0
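As a sanity check, raw alpha can also be computed straight from its definition, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), after reverse-keying the two negative items (a sketch, not part of the original analysis):
items <- ML1
items[c("wish_not", "boring")] <- 5 - items[c("wish_not", "boring")] # reverse-key the negatively worded items
k <- ncol(items)
total <- rowSums(items)
k / (k - 1) * (1 - sum(sapply(items, var)) / var(total)) # approx. 0.94, matching the raw_alpha above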
ML2<- as.data.frame(dat1[c("tchr_exp_do", "tchr_easy_undrstnd", "intrst_tchr_say", "intrst_do", "tchr_clear_answ", "tchr_expl_good", "tchr_show_learn", "dif_help", "tel_how_better", "tchr_listen")])
psych::alpha(ML2, check.keys=TRUE)
##
## Reliability analysis
## Call: psych::alpha(x = ML2, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.93 0.93 0.93 0.57 13 0.0012 1.7 0.64 0.57
##
## lower alpha upper 95% confidence boundaries
## 0.93 0.93 0.93
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se
## tchr_exp_do 0.93 0.93 0.93 0.59 13 0.0012
## tchr_easy_undrstnd 0.92 0.92 0.92 0.56 11 0.0014
## intrst_tchr_say 0.92 0.92 0.92 0.57 12 0.0013
## intrst_do 0.92 0.92 0.92 0.57 12 0.0013
## tchr_clear_answ 0.92 0.92 0.92 0.56 11 0.0014
## tchr_expl_good 0.92 0.92 0.92 0.55 11 0.0014
## tchr_show_learn 0.92 0.92 0.92 0.57 12 0.0013
## dif_help 0.92 0.92 0.92 0.56 12 0.0013
## tel_how_better 0.92 0.92 0.92 0.57 12 0.0013
## tchr_listen 0.92 0.92 0.92 0.57 12 0.0013
## var.r med.r
## tchr_exp_do 0.0035 0.58
## tchr_easy_undrstnd 0.0049 0.57
## intrst_tchr_say 0.0054 0.57
## intrst_do 0.0054 0.57
## tchr_clear_answ 0.0047 0.56
## tchr_expl_good 0.0040 0.56
## tchr_show_learn 0.0061 0.57
## dif_help 0.0057 0.56
## tel_how_better 0.0057 0.57
## tchr_listen 0.0057 0.57
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## tchr_exp_do 7548 0.66 0.68 0.62 0.60 1.6 0.68
## tchr_easy_undrstnd 7548 0.83 0.82 0.81 0.78 1.7 0.85
## intrst_tchr_say 7548 0.77 0.77 0.74 0.71 1.9 0.83
## intrst_do 7548 0.77 0.77 0.74 0.70 2.1 0.89
## tchr_clear_answ 7548 0.83 0.83 0.81 0.78 1.8 0.86
## tchr_expl_good 7548 0.84 0.84 0.83 0.80 1.6 0.83
## tchr_show_learn 7548 0.76 0.76 0.72 0.69 1.8 0.80
## dif_help 7548 0.79 0.79 0.77 0.74 1.7 0.81
## tel_how_better 7548 0.78 0.78 0.76 0.73 1.7 0.80
## tchr_listen 7548 0.77 0.77 0.74 0.71 1.6 0.79
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## tchr_exp_do 0.53 0.39 0.06 0.02 0
## tchr_easy_undrstnd 0.48 0.35 0.12 0.05 0
## intrst_tchr_say 0.35 0.45 0.15 0.05 0
## intrst_do 0.28 0.42 0.23 0.07 0
## tchr_clear_answ 0.46 0.36 0.13 0.05 0
## tchr_expl_good 0.54 0.31 0.10 0.04 0
## tchr_show_learn 0.40 0.43 0.13 0.04 0
## dif_help 0.51 0.34 0.10 0.04 0
## tel_how_better 0.51 0.35 0.10 0.04 0
## tchr_listen 0.54 0.34 0.08 0.04 0
ML3<- as.data.frame(dat1[c("us_do_well", "difficult", "not_strong", "learn_quick", "makes_nerv")])
psych::alpha(ML3, check.keys=TRUE)
## Warning in psych::alpha(ML3, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
## This is indicated by a negative sign for the variable name.
##
## Reliability analysis
## Call: psych::alpha(x = ML3, check.keys = TRUE)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.88 0.88 0.86 0.59 7.1 0.0022 2.9 0.81 0.62
##
## lower alpha upper 95% confidence boundaries
## 0.87 0.88 0.88
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## us_do_well- 0.84 0.84 0.81 0.57 5.4 0.0029 0.0117
## difficult 0.83 0.84 0.81 0.57 5.2 0.0031 0.0143
## not_strong 0.83 0.83 0.80 0.55 4.9 0.0033 0.0121
## learn_quick- 0.85 0.85 0.82 0.59 5.7 0.0028 0.0129
## makes_nerv 0.88 0.89 0.86 0.66 7.9 0.0021 0.0026
## med.r
## us_do_well- 0.56
## difficult 0.58
## not_strong 0.56
## learn_quick- 0.58
## makes_nerv 0.65
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## us_do_well- 7548 0.83 0.84 0.80 0.74 3.2 0.86
## difficult 7548 0.86 0.85 0.81 0.76 2.9 1.03
## not_strong 7548 0.88 0.87 0.84 0.79 2.8 1.11
## learn_quick- 7548 0.81 0.82 0.77 0.71 3.0 0.93
## makes_nerv 7548 0.72 0.71 0.58 0.55 2.8 1.01
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## us_do_well 0.45 0.36 0.14 0.05 0
## difficult 0.13 0.20 0.32 0.36 0
## not_strong 0.18 0.20 0.27 0.34 0
## learn_quick 0.34 0.37 0.22 0.08 0
## makes_nerv 0.12 0.25 0.32 0.31 0
fa1<-fa(dat1, nfactors=3, rotate="varimax", fm="ml", cor="mixed", scores=T)
##
## mixed.cor is deprecated, please use mixedCor.
load <- fa1$loadings[,1:2] # loadings of the first two factors (not used further below)
fascores<-as.data.frame(fa1$scores)
datfa<-cbind(data1,fascores) # data1 is presumably the full data set (attitudes plus achievement and background); now all factor scores sit in one data frame, datfa
datfa1 <- datfa %>% select("math_achiv", "sex_stud", "momedu", "fathedu", "if_native", "ML2", "ML1", "ML3")
datfa1 = as.data.frame(lapply(datfa1, as.numeric))
#names(datfa1) #good!
Now we can dive into regression, where the outcome is students’ math achievement. I’ve built the models by forward selection: I started with one predictor (the first factor’s scores), then added the other factors one by one, followed by the information about the student and their parents’ education.
The most reliable model, in my opinion, is the last one, combining all of these predictors. It has a decent adjusted R-squared (0.317), explaining about 32% of the variability in math achievement. Its summary can be seen below, as well as the diagnostic plots. As for outliers and leverage points: there are two outliers (observations 241 and 4323) and no influential leverage points.
As for the direction of the effects: both significant factors have negative estimates. That is because the scale runs from 1 (agreement) to 4 (disagreement), as I mentioned at the beginning, so a strong negative coefficient for these predictors actually means a positive relation between positive attitudes and achievement. For this model we can say: when students like the subject and its classes, and when they are strong in it and pick up the material easily, their achievement is better (completely unexpectedly). In addition, mother’s education is positively related to students’ achievement.
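If the sign flipping feels confusing, the factor scores can be negated so that higher values mean more positive attitudes, and the coefficients then read in the intuitive direction. A sketch of this equivalent respecification (not rerun here):
datfa1_pos <- datfa1
datfa1_pos[c("ML1", "ML2", "ML3")] <- -datfa1_pos[c("ML1", "ML2", "ML3")] # higher score = more positive attitude
# lm(math_achiv ~ ML1 + ML2 + ML3 + momedu, data = datfa1_pos) gives the same fit with positive estimates for ML1 and ML3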
#in order not to take too much space with summaries, I don't show them: R-squared, p-values and important notes are in the comments here :)
model1_1 <- lm(math_achiv ~ ML1, data= datfa1)
#summary(model1_1) # R^2 = 0.046 , p-value < 2.2e-16, ML1 has significant negative relation to the outcome
model1_2 <- lm(math_achiv ~ ML1 + ML2, data= datfa1)
#summary(model1_2) # R^2 = 0.046 , p-value < 2.2e-16, ML2 gives insignficant results
model1_3 <- lm(math_achiv ~ ML1 + ML2 + ML3, data= datfa1)
#summary(model1_3) # R^2 = 0.317 , p-value < 2.2e-16 - ML2 still insignificant, but R^2 is much better! ML3 is significant, giving negative relation
model2_1 <- lm(math_achiv ~ ML1 + ML2 + ML3 + sex_stud, data= datfa1)
#summary(model2_1) # R^2 = 0.317, p-value < 2.2e-16, student's gender in this case is insignificant
model2_2 <- lm(datfa1$math_achiv ~ ML1 + ML2 + ML3 + sex_stud + momedu, data= datfa1)
#summary(model2_2) # R^2 = 0.317, p-value < 2.2e-16, mother's education is significant, gives positive relation to the outcome
model2_3 <- lm(datfa1$math_achiv ~ ML1 + ML2 + ML3 + sex_stud + momedu + fathedu, data= datfa1)
#summary(model2_3) # R^2 = 0.317, p-value < 2.2e-16, father's education is insignificant (but mother's estimate became stronger)
model_f <- lm(datfa1$math_achiv ~., data= datfa1)
summary(model_f) # R^2 = 0.317, p-value < 2.2e-16; whether the student is native-born shows no significant relation
##
## Call:
## lm(formula = datfa1$math_achiv ~ ., data = datfa1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6000.5 -1575.4 35.1 1643.7 6325.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4529.308 135.094 33.527 <2e-16 ***
## sex_stud -81.523 48.369 -1.685 0.0919 .
## momedu 32.341 15.904 2.034 0.0420 *
## fathedu -7.462 15.259 -0.489 0.6248
## if_native -50.839 70.426 -0.722 0.4704
## ML2 -13.191 23.345 -0.565 0.5721
## ML1 -379.704 24.960 -15.213 <2e-16 ***
## ML3 -1365.080 25.026 -54.546 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2082 on 7540 degrees of freedom
## Multiple R-squared: 0.3178, Adjusted R-squared: 0.3171
## F-statistic: 501.7 on 7 and 7540 DF, p-value: < 2.2e-16
plot(model_f) #no leverages
qqPlot(model_f, main="QQ Plot") #outliers are: 241 and 4323
## [1] 241 4323
As for the AIC analysis: the lowest (i.e. the best) score belongs to model2_2, the model with all three factors, gender and mother’s education. However, it does not differ much from the AIC of the full model (the one I’ve chosen).
Model <- c("model1_1", "model1_2", "model1_3 - with all 3 factors", "model2_1", "model2_2 - lowest (best) score", "model2_3", "Full model")
AIC_score <- c("139297", "139299", "136780", "136780", "136777", "136779", "136780")
AIC_scores <- data.frame(Model, AIC_score)
kable(AIC_scores) %>%
kable_styling(c("bordered"))
| Model | AIC_score |
|---|---|
| model1_1 | 139297 |
| model1_2 | 139299 |
| model1_3 - with all 3 factors | 136780 |
| model2_1 | 136780 |
| model2_2 - lowest (best) score | 136777 |
| model2_3 | 136779 |
| Full model | 136780 |
# to show this in a tidier way, I created the table above and filled it with the scores calculated by the code below
# the lower the better
#AIC(model1_1) #139297
#AIC(model1_2) #139299
#AIC(model1_3) #136780
#AIC(model2_1) #136780
#AIC(model2_2) #136777 - the lowest
#AIC(model2_3) #136779
#AIC(model_f) #136780
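The same scores could also be collected in one call (a sketch, left commented like the lines above):
#models <- list(model1_1, model1_2, model1_3, model2_1, model2_2, model2_3, model_f)
#sapply(models, AIC)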
After that I’ve created one more model, keeping only the significant predictors. Its AIC is the lowest of all (136776.8), and the other characteristics are the same: R-squared = 0.317, p-value < 2.2e-16, the same outliers, no leverage points.
model2_4 <- lm(datfa1$math_achiv ~ ML1 + ML3 + momedu, data= datfa1)
summary(model2_4)
##
## Call:
## lm(formula = datfa1$math_achiv ~ ML1 + ML3 + momedu, data = datfa1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5987.3 -1577.2 30.6 1634.7 6280.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4343.62 82.41 52.706 <2e-16 ***
## ML1 -376.41 24.67 -15.255 <2e-16 ***
## ML3 -1361.15 24.91 -54.634 <2e-16 ***
## momedu 25.95 12.79 2.028 0.0425 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2082 on 7544 degrees of freedom
## Multiple R-squared: 0.3174, Adjusted R-squared: 0.3171
## F-statistic: 1169 on 3 and 7544 DF, p-value: < 2.2e-16
AIC(model2_4)
## [1] 136776.8
#plot(model2_4) #no leverages
#qqPlot(model2_4, main="QQ Plot") #outliers are: 241 and 4323
To test for multicollinearity I’ve used VIF analysis (a VIF above 10 indicates multicollinearity). Here all values are close to 1, so everything is OK!
vif(model_f)
## sex_stud momedu fathedu if_native ML2 ML1 ML3
## 1.017592 1.546183 1.550944 1.025065 1.008332 1.041084 1.027315
vif(model2_4)
## ML1 ML3 momedu
## 1.017399 1.018096 1.000696
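For intuition: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the remaining predictors. A quick illustrative check for ML1 in model2_4:
r2_ml1 <- summary(lm(ML1 ~ ML3 + momedu, data = datfa1))$r.squared
1 / (1 - r2_ml1) # approx. 1.017, matching vif(model2_4)["ML1"]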
In general, the following can be said:
- Canadian students’ attitudes towards maths divide into three factors: their teacher’s qualities, their own strength in the subject, and how much they like or dislike the subject and its classes;
- two of these factors are significantly related to students’ math achievement: interest in the subject and its classes, and self-assessed strength and ability. Students who like the subject and learn math easily show higher achievement;
- among the other predictors, mother’s education matters for students’ achievement: the higher the family’s educational capital, the higher the grades.