Data analysis. 3rd year. Final exam. 2019

Vsevolod Suschevskiy

2019-06-23

0.1 Research Problem

“What is more important for student achievement nowadays: books or computers?”

Perhaps the only thing I learned from the sociology of education classes is that the number of books at home is a good predictor of success at school. For the life of me, I cannot remember which article that comes from. However, the debate about the use of technology for learning has a long history (Aloraini, S. (2012). The impact of using multimedia on students’ academic achievement in the College of Education at King Saud University). More importantly for this study, even home use of technology has an impact on educational success (Schacter, J., & Fagnano, C. (1999). Does Computer Technology Improve Student Learning and Achievement? How, When, and under What Conditions? Journal of Educational Computing Research, 20(4), 329–343).

On the other hand, it is obvious that the better a student reads, the better the student will cope with school, because school (not the Higher School of Economics) teaches through reading (Smith, J. K. (2012). Students’ self-perception of reading ability, enjoyment of reading and reading achievement. Learning and Individual Differences, 22(2), 202–206). Success in reading, in turn, depends on the student’s motivation (Wang, J. H., & Guthrie, J. T. (2004). Modeling the effects of intrinsic motivation, extrinsic motivation, amount of reading, and past reading achievement on text comprehension between U.S. and Chinese students. Reading Research Quarterly, 39, 162–186. doi:10.1598/RRQ.39.2.2).

Taking the above into account, and given that I could not find an article that answers the research question, it would be very interesting to find out whether it is necessary to set up an experiment on younger brothers, or whether regression and factor analysis will provide a convincing answer.

0.2 Preparing packages and data

An index for computers was created and, for fun, the number of books was coded as an ordered factor.
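A minimal sketch of what this preparation step could look like in base R. The toy data frame `raw`, the book labels and the rule "computers = Yes if the student has a PC or a tablet" are illustrative assumptions, not taken from the original script.

```r
# Toy data with assumed column names and labels (not the original data)
raw <- data.frame(
  books = c("very few", "some", "few", "very many", "many"),
  pc    = c("Yes", "No", "Yes", "No", "Yes"),
  table = c("No", "No", "Yes", "Yes", "Yes")
)

# Ordered factor: lm() then applies polynomial contrasts (.L, .Q, .C, ^4)
book_levels <- c("very few", "few", "some", "many", "very many")
raw$books <- factor(raw$books, levels = book_levels, ordered = TRUE)

# Unordered factors with "Yes" as the reference level,
# which is what produces terms named pcNo and tableNo
raw$pc    <- factor(raw$pc, levels = c("Yes", "No"))
raw$table <- factor(raw$table, levels = c("Yes", "No"))

# Assumed index: the student has at least one of the two devices
has_device <- raw$pc == "Yes" | raw$table == "Yes"
raw$computers <- factor(ifelse(has_device, "Yes", "No"), levels = c("No", "Yes"))

str(raw)
```

With "No" as the first level of `computers`, a regression on it would report a term called `computersYes`, as in the tables below.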

variable vars n mean sd median trimmed mad min max range skew kurtosis se
math 1 4695 2398.314377 1376.6991372 2400 2400.327921 1765.7766 1 4775 4774 -0.009533691 -1.1985640 20.091910824
books* 2 4695 2.922045 1.0420912 3 2.863455 1.4826 1 5 4 0.321907912 -0.3794799 0.015208554
table* 3 4695 1.151864 0.3589269 1 1.064945 0.0000 1 2 1 1.939459704 1.7618793 0.005238275
pc* 4 4695 1.162513 0.3689603 1 1.078254 0.0000 1 2 1 1.829003130 1.3455392 0.005384705
gender* 5 4695 1.518637 0.4997058 2 1.523290 0.0000 1 2 1 -0.074575398 -1.9948633 0.007292838
migrant* 6 4695 1.177210 0.4671942 1 1.051371 0.0000 1 3 2 2.675987115 6.4160230 0.006818356
computers* 7 4695 1.979766 0.1408158 2 2.000000 0.0000 1 2 1 -6.812637956 44.4214975 0.002055103

Math scores are not normally distributed, but since there are more than 2,000 observations, we happily violate this assumption. Most students have 26-50 books and a PC or a tablet at home. Boys and girls are about equally represented in this sample. Most students are not from migrant families.

## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  rnorm(10^4) and data_reg$math
## D = 0.99936, p-value < 2.2e-16
## alternative hypothesis: two-sided

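As we see, the distribution is not normal: the test rejects the null hypothesis that the math scores and the normal sample come from the same distribution (p-value < 2.2e-16). Note that rnorm(10^4) is a standard normal sample while the scores are on their raw scale, so D = 0.999 mostly reflects the difference in location and scale. However, we are okay with this violation, because there are more than 2,000 observations.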
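A hedged sketch of a fairer normality check: standardize the scores before comparing them with N(0, 1). The vector `x` is a simulated stand-in for `data_reg$math` (an assumption, since the data are not available here); only base R functions are used.

```r
set.seed(1)
# Hypothetical stand-in for data_reg$math: 4,695 raw scores between 1 and 4775
x <- runif(4695, min = 1, max = 4775)

# Against a standard normal sample almost every raw score is larger,
# so D is close to 1 regardless of the shape of the distribution
raw_test <- ks.test(rnorm(10^4), x)

# Standardize first, then compare with the theoretical N(0, 1) distribution
z <- as.numeric(scale(x))
scaled_test <- ks.test(z, "pnorm")

raw_test$statistic
scaled_test$statistic
```

Because the mean and standard deviation are estimated from the same data, the p-value of the second test is only approximate; the point of the sketch is the drop in D once the scale is matched.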

0.3 Regression

Let's start building the regression with a forward approach.

0.3.1 Model

Obviously, we will start with the books.

## 
## Books and math
##                                             
## ============================================
##                               MATH          
## --------------------------------------------
## books.L                    606.088***       
##                             (66.833)        
##                                             
## books.Q                     -84.602         
##                             (58.984)        
##                                             
## books.C                    -133.278**       
##                             (50.681)        
##                                             
## books4                      -31.685         
##                             (39.373)        
##                                             
## Constant                  2,408.523***      
##                             (24.561)        
##                                             
## Observations                 4,695          
## R2                            .026          
## Adjusted R2                   .026          
## Residual Std. Error  1,358.957 (df = 4690)  
## F Statistic         31.842*** (df = 4; 4690)
## --------------------------------------------
## Notes:              *P < .05                
##                     **P < .01               
##                     ***P < .001

Books alone explain only about 2.6% of the variance in math scores, so we won't really discuss the results. There is a positive linear trend between the number of books at home and math scores, as well as a negative cubic component.
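The terms books.L, books.Q, books.C and books4 appear because R codes an ordered factor with orthogonal polynomial contrasts. The sketch below prints those contrasts for five levels; it assumes the default `contr.poly` coding was used, which the term names suggest but the report does not state.

```r
# Orthogonal polynomial contrasts that R applies to a 5-level ordered factor
cp <- contr.poly(5)
round(cp, 3)

# The linear column rises by a constant step between adjacent levels
linear_step <- unname(diff(cp[, ".L"]))
linear_step

# Linear-trend part of the difference between adjacent book levels,
# using the books.L estimate from the first model
606.088 * linear_step[1]
```

So the books.L coefficient is not the gain per additional level of books: under this coding the linear part of one step is the coefficient multiplied by about 0.316.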

## 
## Books and math and devices
##                                             
## ============================================
##                               MATH          
## --------------------------------------------
## books.L                    601.451***       
##                             (66.947)        
##                                             
## books.Q                     -82.384         
##                             (59.012)        
##                                             
## books.C                    -132.636**       
##                             (50.682)        
##                                             
## books4                      -30.669         
##                             (39.381)        
##                                             
## computersYes                166.068         
##                            (141.213)        
##                                             
## Constant                  2,246.092***      
##                            (140.287)        
##                                             
## Observations                 4,695          
## R2                            .027          
## Adjusted R2                   .026          
## Residual Std. Error  1,358.901 (df = 4689)  
## F Statistic         25.753*** (df = 5; 4689)
## --------------------------------------------
## Notes:              *P < .05                
##                     **P < .01               
##                     ***P < .001

The combined index changes nothing in the model, so let's add tablets and PCs separately.

## 
## Books and math and pc and tablets
##                                             
## ============================================
##                               MATH          
## --------------------------------------------
## books.L                    586.373***       
##                             (66.628)        
##                                             
## books.Q                     -83.268         
##                             (58.657)        
##                                             
## books.C                    -129.333*        
##                             (50.398)        
##                                             
## books4                      -35.179         
##                             (39.151)        
##                                             
## pcNo                      -349.642***       
##                             (53.609)        
##                                             
## tableNo                    205.942***       
##                             (55.028)        
##                                             
## Constant                  2,433.808***      
##                             (27.481)        
##                                             
## Observations                 4,695          
## R2                            .038          
## Adjusted R2                   .037          
## Residual Std. Error  1,350.866 (df = 4688)  
## F Statistic         31.208*** (df = 6; 4688)
## --------------------------------------------
## Notes:              *P < .05                
##                     **P < .01               
##                     ***P < .001

Separately they improve the model, but in opposite directions: owning a PC increases math scores, whereas owning a tablet decreases them. Adjusted R^2 = 0.037.

Control variables should be added.

## 
## Books and pc + controls
##                                             
## ============================================
##                               MATH          
## --------------------------------------------
## books.L                    600.232***       
##                             (66.648)        
##                                             
## books.Q                     -81.540         
##                             (58.604)        
##                                             
## books.C                    -130.625**       
##                             (50.352)        
##                                             
## books4                      -30.046         
##                             (39.128)        
##                                             
## pcNo                      -359.420***       
##                             (53.623)        
##                                             
## tableNo                    221.543***       
##                             (55.126)        
##                                             
## genderMale                 149.084***       
##                             (39.717)        
##                                             
## migrantNo                   -35.770         
##                             (64.538)        
##                                             
## migrantI don't know         -102.465        
##                            (105.800)        
##                                             
## Constant                  2,363.628***      
##                             (35.364)        
##                                             
## Observations                 4,695          
## R2                            .042          
## Adjusted R2                   .040          
## Residual Std. Error  1,349.068 (df = 4685)  
## F Statistic         22.584*** (df = 9; 4685)
## --------------------------------------------
## Notes:              *P < .05                
##                     **P < .01               
##                     ***P < .001

Being male increases math scores by about 150 points; the migration status of parents does not affect math scores. Adjusted R^2 = 0.04, so only 4% of the variance in math scores is explained. A very poor model. Let's add interactions.
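As a cross-check of what R^2 measures here, the fit statistics can be rebuilt from numbers printed elsewhere in this report: the standard deviation of math from the descriptive table and the residual sum of squares that the ANOVA table in section 0.3.2 reports for this model. This is a worked sketch, not part of the original analysis.

```r
# Numbers copied from the outputs in this report
n        <- 4695          # observations
sd_math  <- 1376.6991372  # sd of math from the descriptive table
rss      <- 8526622397    # RSS of math ~ books + pc + table + gender + migrant
df_resid <- 4685          # residual degrees of freedom

tss    <- (n - 1) * sd_math^2                      # total sum of squares
r2     <- 1 - rss / tss                            # share of variance explained
adj_r2 <- 1 - (rss / df_resid) / (tss / (n - 1))   # penalized for the number of terms
rse    <- sqrt(rss / df_resid)                     # residual standard error

round(c(r2 = r2, adj_r2 = adj_r2, rse = rse), 3)
```

The values agree with the table (R2 .042, Adjusted R2 .040, residual standard error 1,349.068), which confirms that R^2 is a share of variance, not a share of observations.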

0.3.2 Interaction

## 
## Books and pc + controls
##                                                  
## =================================================
##                                   MATH           
## -------------------------------------------------
## books.L                        600.739***        
##                                 (66.684)         
##                                                  
## books.Q                          -82.939         
##                                 (58.623)         
##                                                  
## books.C                         -128.807*        
##                                 (50.360)         
##                                                  
## books4                           -32.002         
##                                 (39.132)         
##                                                  
## genderMale                      146.181**        
##                                 (47.086)         
##                                                  
## tableNo                         189.358*         
##                                 (80.202)         
##                                                  
## pcNo                           -371.084***       
##                                 (89.556)         
##                                                  
## migrantNo                        -36.818         
##                                 (64.552)         
##                                                  
## migrantI don't know             -108.879         
##                                 (105.830)        
##                                                  
## tableNo:pcNo                     310.526         
##                                 (221.471)        
##                                                  
## genderMale:tableNo               88.621          
##                                 (119.007)        
##                                                  
## genderMale:pcNo                  30.502          
##                                 (116.582)        
##                                                  
## genderMale:tableNo:pcNo         -774.169*        
##                                 (320.429)        
##                                                  
## Constant                      2,363.999***       
##                                 (37.898)         
##                                                  
## Observations                      4,695          
## R2                                .043           
## Adjusted R2                       .040           
## Residual Std. Error       1,348.722 (df = 4681)  
## F Statistic             16.135*** (df = 13; 4681)
## -------------------------------------------------
## Notes:                  *P < .05                 
##                         **P < .01                
##                         ***P < .001

By the method of deliberate scientific trial and error, a single significant interaction effect was found: when boys have neither a computer nor a tablet, their math achievement falls significantly, although adding these variables as plain main effects did not show this.

While boys as a whole do better in math, the absence of any devices ruins their lives.
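To see what the three-way interaction implies, the coefficients of the interaction model can be added up for each group. The sketch below only does arithmetic on the estimates printed above; books and migrant status are held fixed, and the reference group is girls who have both a PC and a tablet.

```r
# Coefficients copied from the interaction model above
b <- c(
  "genderMale"              =  146.181,
  "tableNo"                 =  189.358,
  "pcNo"                    = -371.084,
  "tableNo:pcNo"            =  310.526,
  "genderMale:tableNo"      =   88.621,
  "genderMale:pcNo"         =   30.502,
  "genderMale:tableNo:pcNo" = -774.169
)

# Difference from the reference group (girls with both devices = 0)
girl_no_devices <- sum(b[c("tableNo", "pcNo", "tableNo:pcNo")])
boy_both        <- b[["genderMale"]]
boy_no_devices  <- sum(b)   # every term is switched on

c(girl_no_devices = girl_no_devices,
  boy_both        = boy_both,
  boy_no_devices  = boy_no_devices)
```

Boys with neither device end up about 380 points below the reference group, while girls with neither device are about 129 points above it. Most of the individual interaction terms have large standard errors, so these sums are rough.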

## 
## Books and pc + controls
##                                                                           
## ==========================================================================
##                                                MATH                       
##                                     1                        2            
## --------------------------------------------------------------------------
## books.L                        600.739***                600.232***       
##                                 (66.684)                  (66.648)        
##                                                                           
## books.Q                          -82.939                  -81.540         
##                                 (58.623)                  (58.604)        
##                                                                           
## books.C                         -128.807*                -130.625**       
##                                 (50.360)                  (50.352)        
##                                                                           
## books4                           -32.002                  -30.046         
##                                 (39.132)                  (39.128)        
##                                                                           
## genderMale                      146.181**                149.084***       
##                                 (47.086)                  (39.717)        
##                                                                           
## tableNo                         189.358*                 221.543***       
##                                 (80.202)                  (55.126)        
##                                                                           
## pcNo                           -371.084***              -359.420***       
##                                 (89.556)                  (53.623)        
##                                                                           
## migrantNo                        -36.818                  -35.770         
##                                 (64.552)                  (64.538)        
##                                                                           
## migrantI don't know             -108.879                  -102.465        
##                                 (105.830)                (105.800)        
##                                                                           
## tableNo:pcNo                     310.526                                  
##                                 (221.471)                                 
##                                                                           
## genderMale:tableNo               88.621                                   
##                                 (119.007)                                 
##                                                                           
## genderMale:pcNo                  30.502                                   
##                                 (116.582)                                 
##                                                                           
## genderMale:tableNo:pcNo         -774.169*                                 
##                                 (320.429)                                 
##                                                                           
## Constant                      2,363.999***              2,363.628***      
##                                 (37.898)                  (35.364)        
##                                                                           
## Observations                      4,695                    4,695          
## R2                                .043                      .042          
## Adjusted R2                       .040                      .040          
## Residual Std. Error       1,348.722 (df = 4681)    1,349.068 (df = 4685)  
## F Statistic             16.135*** (df = 13; 4681) 22.584*** (df = 9; 4685)
## --------------------------------------------------------------------------
## Notes:                  *P < .05                                          
##                         **P < .01                                         
##                         ***P < .001

Adjusted R^2 is the same for both models, so we need ANOVA or BIC to compare them.

## [1] 81011.14
## [1] 81012.73

The two BIC values are practically the same (they differ by about 1.6).

## Analysis of Variance Table
## 
## Model 1: math ~ books + gender + table * pc * gender + migrant
## Model 2: math ~ books + pc + table + gender + migrant
##   Res.Df        RSS Df Sum of Sq      F Pr(>F)
## 1   4681 8514982777                           
## 2   4685 8526622397 -4 -11639620 1.5997 0.1715

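The F-test is not significant (p = 0.17), so the interaction terms do not improve the fit. The model without interactions (Model 2 in the table above) uses fewer terms for the same poor result. That is our best model.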
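The F statistic in the ANOVA table can be reproduced by hand from the two residual sums of squares, which makes it clear what is being tested. A small sketch using only the numbers printed above:

```r
rss_full  <- 8514982777   # Model 1: with table * pc * gender
rss_small <- 8526622397   # Model 2: main effects only
df_full   <- 4681
df_small  <- 4685

extra_terms <- df_small - df_full   # 4 additional interaction coefficients

# Improvement per extra term, relative to the residual variance of the larger model
f_stat <- ((rss_small - rss_full) / extra_terms) / (rss_full / df_full)
p_val  <- pf(f_stat, extra_terms, df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 4)
```

A p-value above 0.05 means the four extra interaction coefficients do not reduce the residual sum of squares by more than chance would.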

0.3.3 Diagnostics

Let's check the model for multicollinearity.

##             GVIF Df GVIF^(1/(2*Df))
## books   1.017277  4        1.002143
## pc      1.009589  1        1.004783
## table   1.009702  1        1.004839
## gender  1.015926  1        1.007931
## migrant 1.006517  2        1.001625

All GVIF values are close to 1, far below the usual threshold of 5, so everything's OK!
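The last column of the table, GVIF^(1/(2*Df)), rescales the generalized VIF so that terms with several coefficients (books, migrant) can be compared with single-coefficient terms. A short sketch that recomputes it from the first two columns:

```r
# GVIF and degrees of freedom copied from the table above
gvif <- c(books = 1.017277, pc = 1.009589, table = 1.009702,
          gender = 1.015926, migrant = 1.006517)
df   <- c(books = 4, pc = 1, table = 1, gender = 1, migrant = 2)

# Puts terms with different degrees of freedom on a comparable scale
adj <- gvif^(1 / (2 * df))
round(adj, 6)
```

For a term with one degree of freedom this is simply the square root of the ordinary VIF, so values this close to 1 indicate practically no collinearity.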

## No Studentized residuals with Bonferroni p < 0.05
## Largest |rstudent|:
##      rstudent unadjusted p-value Bonferroni p
## 1146 2.204854           0.027513           NA

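No problems with outliers: even the largest studentized residual (observation 1146) is not significant after the Bonferroni correction. I will add this student to my list anyway.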
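The NA in the Bonferroni column is not a missing value in the data. A sketch of the correction, assuming all 4,695 residuals were tested (the exact count used by the test is not shown in the output):

```r
p_unadjusted <- 0.027513   # largest |rstudent|, observation 1146
n_obs        <- 4695       # assumed number of residuals tested

# Bonferroni: multiply the p-value by the number of tests
p_bonferroni <- p_unadjusted * n_obs
p_bonferroni

# A probability cannot exceed 1, so the corrected value is capped
min(1, p_bonferroni)
```

The product is far above 1, which is why the output shows NA: after the correction there is no evidence that observation 1146 is an outlier.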

## 1146 2759 
## 1121 2709

The residuals are not quite normally distributed, and there are some outliers: 2759 and 1146.

The model describes the data acceptably in each plot. The graphs look fine, although they look funny because of the data structure (the variables are, in general, not integer). The outliers are 2759, 1146, 1138, 1163, 114

## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.5022577, Df = 1, p = 0.47851

With p = 0.48 the test does not reject constant variance, so there is no indication of heteroscedasticity.
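The p-value of the score test follows directly from the chi-square statistic, so the reading of the output can be checked in one line. A sketch using the printed statistic:

```r
chisq <- 0.5022577   # statistic from the non-constant variance score test
p_val <- pchisq(chisq, df = 1, lower.tail = FALSE)
round(p_val, 5)

# Decision at the conventional 5% level: TRUE would mean heteroscedasticity
reject_constant_variance <- p_val < 0.05
reject_constant_variance
```

The null hypothesis of this test is constant error variance, so only a small p-value would point to heteroscedasticity.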

not so normal…

The model does not show a really good (in fact, a really bad) fit.

moving on!

0.4 EFA

For some reason there is only one significant correlation in this data. Could I have done something totally wrong? Yes, sure.

## Parallel analysis suggests that the number of factors =  5  and the number of components =  4
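The idea behind parallel analysis is to keep only those eigenvalues of the observed correlation matrix that are larger than the eigenvalues obtained from pure noise of the same size. The base R sketch below is a simplified, components-only version on simulated data; the report's own output comes from a dedicated function working on the real items, so the data, the sample size and the averaging rule here are all assumptions for illustration.

```r
set.seed(42)
n <- 500
p <- 6

# Hypothetical data: two blocks of three items, each block driven by one factor
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- cbind(f1 + rnorm(n), f1 + rnorm(n), f1 + rnorm(n),
               f2 + rnorm(n), f2 + rnorm(n), f2 + rnorm(n))

observed <- eigen(cor(items), only.values = TRUE)$values

# Eigenvalues of pure-noise data of the same size, averaged over 50 simulations
random <- rowMeans(replicate(50, {
  noise <- matrix(rnorm(n * p), nrow = n)
  eigen(cor(noise), only.values = TRUE)$values
}))

# Keep components whose observed eigenvalue exceeds the random benchmark
n_components <- sum(observed > random)
n_components
```

In this toy setup the two built-in blocks are recovered as two components. With binary items, as in this exam, the correlations are weaker and the suggestion is less clear-cut.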

5 factors are recommended.

One of the factors loads on only one variable, so we need to reduce the number of factors to 4.

Only two variables per factor is still not so good.

This looks better, but some of the factors have weak loadings (around 0.4) on their variables.

We need to try a different rotation.

It does not improve anything and only makes things worse by introducing correlation between the factors.

BSBG06E, BSBG06F, BSBG06B – ML2

BSBG06H – not explained

BSBG06C, BSBG06D, BSBG06A – ML1

BSBG06K, BSBG06G, BSBG06I, BSBG06J – ML3

ML2 – Internet connection, Your own mobile phone, A computer or tablet that is shared with other people at home – minimal needs, civilization

ML1 – Study desk/table for your use, Your own room, A computer or tablet of your own – Personal things

ML3 – indicators of wealth

## 
## mixed.cor is deprecated, please use mixedCor.
## 
## Loadings:
##         ML2    ML1    ML3   
## BSBG06A         0.373       
## BSBG06B  0.554              
## BSBG06C         0.912       
## BSBG06D         0.535       
## BSBG06E  0.729              
## BSBG06F  0.660              
## BSBG06G                0.571
## BSBG06H                     
## BSBG06I                0.425
## BSBG06J                0.382
## BSBG06K                0.616
## 
##                  ML2   ML1   ML3
## SS loadings    1.507 1.487 1.274
## Proportion Var 0.137 0.135 0.116
## Cumulative Var 0.137 0.272 0.388

Each variable loads on only one factor.

## 
## Factor analysis with Call: fa(r = data_num[, -12:-15], nfactors = 3, n.obs = 4579, rotate = "varimax", 
##     fm = "ml", cor = "mixed")
## 
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 25  and the objective function was  0.31 
## The number of observations was  4579  with Chi Square =  1426.67  with prob <  7.7e-286 
## 
## The root mean square of the residuals (RMSA) is  0.05 
## The df corrected root mean square of the residuals is  0.08 
## 
## Tucker Lewis Index of factoring reliability =  0.716
## RMSEA index =  0.111  and the 10 % confidence intervals are  0.106 0.116
## BIC =  1215.93

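The root mean square of the residuals (RMSA in the output) sits on the 0.05 border, which counts as good, and the df-corrected value (0.08) is above 0.05 but below the 0.1 border, so both indicate an acceptable fit. However, the RMSEA of 0.111 is above 0.1, and the Tucker Lewis Index of 0.716 does not reach the 0.9 border. Not a good model.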

0.4.1 Loadings

## 
## Reliability analysis   
## Call: psych::alpha(x = data_num[c("BSBG06A", "BSBG06B", "BSBG06C", 
##     "BSBG06D", "BSBG06E", "BSBG06F", "BSBG06G", "BSBG06H", "BSBG06I", 
##     "BSBG06J", "BSBG06K")], check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.49      0.51    0.51     0.087   1 0.011  1.3 0.16    0.073
## 
##  lower alpha upper     95% confidence boundaries
## 0.47 0.49 0.51 
## 
##  Reliability if an item is dropped:
##         raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r med.r
## BSBG06A      0.47      0.49    0.48     0.087 0.95    0.011 0.0040 0.069
## BSBG06B      0.50      0.52    0.51     0.097 1.08    0.011 0.0034 0.079
## BSBG06C      0.46      0.48    0.47     0.084 0.91    0.011 0.0037 0.069
## BSBG06D      0.45      0.48    0.47     0.083 0.91    0.012 0.0035 0.073
## BSBG06E      0.48      0.48    0.47     0.084 0.92    0.011 0.0041 0.073
## BSBG06F      0.48      0.49    0.48     0.088 0.97    0.011 0.0042 0.079
## BSBG06G      0.47      0.49    0.48     0.088 0.97    0.011 0.0038 0.078
## BSBG06H      0.49      0.50    0.50     0.092 1.01    0.011 0.0044 0.083
## BSBG06I      0.43      0.46    0.46     0.079 0.86    0.012 0.0041 0.067
## BSBG06J      0.45      0.48    0.48     0.086 0.94    0.012 0.0037 0.078
## BSBG06K      0.44      0.47    0.47     0.082 0.90    0.012 0.0039 0.069
## 
##  Item statistics 
##            n raw.r std.r r.cor r.drop mean   sd
## BSBG06A 4579  0.38  0.41  0.28  0.181  1.2 0.36
## BSBG06B 4579  0.28  0.31  0.13  0.066  1.2 0.37
## BSBG06C 4579  0.37  0.44  0.34  0.223  1.1 0.28
## BSBG06D 4579  0.49  0.44  0.35  0.241  1.3 0.47
## BSBG06E 4579  0.29  0.43  0.33  0.192  1.0 0.18
## BSBG06F 4579  0.23  0.39  0.26  0.156  1.0 0.14
## BSBG06G 4579  0.43  0.39  0.26  0.194  1.7 0.44
## BSBG06H 4579  0.41  0.36  0.21  0.145  1.6 0.48
## BSBG06I 4579  0.54  0.48  0.40  0.300  1.3 0.47
## BSBG06J 4579  0.50  0.42  0.30  0.238  1.5 0.50
## BSBG06K 4579  0.48  0.45  0.36  0.273  1.8 0.40
## 
## Non missing response frequency for each item
##            1    2 miss
## BSBG06A 0.85 0.15    0
## BSBG06B 0.84 0.16    0
## BSBG06C 0.92 0.08    0
## BSBG06D 0.68 0.32    0
## BSBG06E 0.97 0.03    0
## BSBG06F 0.98 0.02    0
## BSBG06G 0.26 0.74    0
## BSBG06H 0.37 0.63    0
## BSBG06I 0.68 0.32    0
## BSBG06J 0.51 0.49    0
## BSBG06K 0.21 0.79    0

Cronbach’s alpha shows low internal consistency: the value is about 0.49 (0.9 or more is expected for excellent and 0.7 for good reliability). The SD is nice, but who cares.
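The standardized alpha in the output is a simple function of the number of items and their average correlation, which shows why the value is so low here. A worked sketch using the printed `average_r`:

```r
k         <- 11      # number of items (BSBG06A ... BSBG06K)
average_r <- 0.087   # average inter-item correlation from the output

# Standardized alpha from the number of items and their average correlation
std_alpha <- k * average_r / (1 + (k - 1) * average_r)
round(std_alpha, 2)

# Average correlation that 11 items would need to reach alpha = 0.7
needed_r <- 0.7 / (k - 0.7 * (k - 1))
round(needed_r, 3)
```

With 11 items the average inter-item correlation would have to be roughly twice as large (0.175 instead of 0.087) to reach the 0.7 border.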

0.5 Regression 2

## 
## Books and pc + controls
##                                              
## =============================================
##                               MATH           
## ---------------------------------------------
## books.L                    560.433***        
##                             (67.722)         
##                                              
## books.Q                      -80.059         
##                             (58.919)         
##                                              
## books.C                     -129.143*        
##                             (50.634)         
##                                              
## books4                       -37.934         
##                             (39.413)         
##                                              
## genderMale                 168.029***        
##                             (40.425)         
##                                              
## tableNo                    307.775***        
##                             (61.431)         
##                                              
## pcNo                        -162.217*        
##                             (69.166)         
##                                              
## migrantNo                    -22.974         
##                             (65.081)         
##                                              
## migrantI don't know         -106.517         
##                             (106.605)        
##                                              
## moie                         -5.009          
##                             (22.720)         
##                                              
## minimum                    -141.056***       
##                             (26.777)         
##                                              
## wealth                       20.696          
##                             (25.518)         
##                                              
## Constant                  2,314.450***       
##                             (36.590)         
##                                              
## Observations                  4,579          
## R2                            .050           
## Adjusted R2                   .048           
## Residual Std. Error   1,342.038 (df = 4566)  
## F Statistic         20.042*** (df = 12; 4566)
## ---------------------------------------------
## Notes:              *P < .05                 
##                     **P < .01                
##                     ***P < .001

Of the factors, only minimal technical equipment shows a significant and negative result, i.e. not having a mobile phone and an internet connection decreases math scores.

0.5.1 Compare

## 
## Books and pc + controls
##                                                                       
## ======================================================================
##                                            MATH                       
##                                 1                        2            
## ----------------------------------------------------------------------
## books.L                    560.433***                600.232***       
##                             (67.722)                  (66.648)        
##                                                                       
## books.Q                      -80.059                  -81.540         
##                             (58.919)                  (58.604)        
##                                                                       
## books.C                     -129.143*                -130.625**       
##                             (50.634)                  (50.352)        
##                                                                       
## books4                       -37.934                  -30.046         
##                             (39.413)                  (39.128)        
##                                                                       
## genderMale                 168.029***                149.084***       
##                             (40.425)                  (39.717)        
##                                                                       
## tableNo                    307.775***                221.543***       
##                             (61.431)                  (55.126)        
##                                                                       
## pcNo                        -162.217*               -359.420***       
##                             (69.166)                  (53.623)        
##                                                                       
## migrantNo                    -22.974                  -35.770         
##                             (65.081)                  (64.538)        
##                                                                       
## migrantI don't know         -106.517                  -102.465        
##                             (106.605)                (105.800)        
##                                                                       
## moie                         -5.009                                   
##                             (22.720)                                  
##                                                                       
## minimum                    -141.056***                                
##                             (26.777)                                  
##                                                                       
## wealth                       20.696                                   
##                             (25.518)                                  
##                                                                       
## Constant                  2,314.450***              2,363.628***      
##                             (36.590)                  (35.364)        
##                                                                       
## Observations                  4,579                    4,695          
## R2                            .050                      .042          
## Adjusted R2                   .048                      .040          
## Residual Std. Error   1,342.038 (df = 4566)    1,349.068 (df = 4685)  
## F Statistic         20.042*** (df = 12; 4566) 22.584*** (df = 9; 4685)
## ----------------------------------------------------------------------
## Notes:              *P < .05                                          
##                     **P < .01                                         
##                     ***P < .001

The intercept is about 2,300 points in both models (2,314 and 2,364). The linear books contrast is positive: 560 points in the model with factors and 600 points in the base model. Because books is an ordered factor with polynomial contrasts, this is the size of the linear trend across levels rather than the gain per additional level. The cubic books contrast is negative and about equal in both models (roughly -130 points). Being male increases math scores by about 170 and 150 points respectively. Not having a tablet increases math scores by about 310 and 220 points, and having a PC increases them by about 160 and 360 points. Each one-unit increase on the minimal-equipment factor lowers math scores by about 140 points. Adding the factors raises adjusted R^2 from 4% to 4.8%.

## 
## Call:
## lm(formula = scale(math) ~ books + gender + table + pc + gender + 
##     migrant + moie + minimum + wealth, data = data_reg2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.14036 -0.84544  0.00178  0.83172  2.38561 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -0.066159   0.026608  -2.486   0.0129 *  
## books.L              0.407551   0.049248   8.276  < 2e-16 ***
## books.Q             -0.058220   0.042846  -1.359   0.1743    
## books.C             -0.093914   0.036822  -2.550   0.0108 *  
## books^4             -0.027586   0.028661  -0.962   0.3359    
## genderMale           0.122192   0.029397   4.157 3.29e-05 ***
## tableNo              0.223817   0.044673   5.010 5.65e-07 ***
## pcNo                -0.117965   0.050298  -2.345   0.0191 *  
## migrantNo           -0.016707   0.047328  -0.353   0.7241    
## migrantI don't know -0.077460   0.077524  -0.999   0.3178    
## moie                -0.003643   0.016522  -0.220   0.8255    
## minimum             -0.102577   0.019473  -5.268 1.45e-07 ***
## wealth               0.015050   0.018557   0.811   0.4174    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9759 on 4566 degrees of freedom
## Multiple R-squared:  0.05004,    Adjusted R-squared:  0.04754 
## F-statistic: 20.04 on 12 and 4566 DF,  p-value: < 2.2e-16

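As we see, books still have the strongest association with education in terms of math scores: the linear books coefficient (0.41) is the largest in the model with the standardized outcome.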
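Since only the outcome was standardized with `scale(math)`, every coefficient in this table should equal the raw coefficient divided by the standard deviation of math in `data_reg2`. The sketch below checks this with estimates copied from the two outputs; the selection of five terms is arbitrary.

```r
# Raw estimates (section 0.5) and estimates with scale(math) as the outcome
raw_coef <- c(books.L = 560.433, genderMale = 168.029, tableNo = 307.775,
              pcNo = -162.217, minimum = -141.056)
std_coef <- c(books.L = 0.407551, genderMale = 0.122192, tableNo = 0.223817,
              pcNo = -0.117965, minimum = -0.102577)

# scale(math) divides the outcome by its sd, so every ratio recovers that sd
implied_sd <- raw_coef / std_coef
round(implied_sd, 1)

# Ranking of these terms by absolute size on the standardized outcome
names(sort(abs(std_coef), decreasing = TRUE))
```

All ratios come out at about 1,375, so the standardized model is the same model on a different outcome scale. The predictors themselves were not standardized, which means a comparison between a dummy variable and a factor score remains rough.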

0.7 Conclusion

I will throw out my computer, as the analysis made it clear that books are more important than computers, no matter how they are measured. Here, take it, please.