submitted by: Navona
due date: 2019-11-26
last ran: 2019-11-26
website: http://rpubs.com/navona/PSY2002_assignment04

Question 1.

Build a path model, using the variables Fluid, Verbal, and Social Intelligence to predict Love for Basketball.

a. How do you define the model?

First.Model <- 'Luv4Baskt ~ FluidInt + VerbInt + SocInt'

b. Run the model using sem(). Draw the appropriate path model (outcome, predictors, predictor covariance, and outcome residual error) and label all paths with their standardized weights.

#use semPaths package for plotting
semPaths(q1.model, 
         what='std', #plot the standardize scores 
         rotation=2, #to look like class example
         nCharNodes=0, #plot full variable name
         sizeMan=12, #font of manifest variables
         edge.color='black', #edge colour
         edge.label.cex= 1, #edge text size
         nDigits = 3, #number of floats
         fade=FALSE, #edges aren't gradient colour
         style='lisrel', #remove residual variances for predictors
         curve=2, #increase angle of covariance curve
         filetype='jpg') #write out as an image

#read sem image back in
sem_image <- readJPEG("qgraph.jpg") #read the image back in

#feed sem image into ggplot, and annotate
qplot(1:10, 1:10, geom="blank") + #define graph size
  background_image(sem_image) + #set sem image as background
  annotate("text", x = 9.4, y = 6, label = "error") + #add error
  annotate("text", x = 7, y = 9, size = 7, parse = TRUE, label = as.character(expression(paste(chi^2, "(0, N=200) = 0.000, ", italic("p"), "<0.000")))) + #add statistics
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank())

Note: Square nodes indicate manifest variables. Directed edges indicate linear regression parameters (thicker lines indicate larger path coefficients). Dashed bidirectional edges indicate covariances, and assumes that the variables are exogenous. Residual variance (error) of the predicted variable is indicated with thick solid arrow.

c. Describe and interpret the regression analysis as you would for publication. Comment on which paths are significant and include the proportion of variance in Luv4Baskt that is predicted by its predictors.

Our path model tested if fluid, verbal, and/or social intelligence predicted love for basketball. We found that FludInt was not a significant indicator of love for basketball (p=.067), but VerbInt (p=.004) and SocInt (p=<.001) were. Specifically, FludInt accounted for 1.2% of the variance, where as VerbInt and SocInt accounted for 4% and 10.3%, respectively.

We also examined the global model fit with chi-square (\(X^2\)), which measures of deviance between the model-implied covariance matrix, and the observed covariance matrix. The \(X^2\) statistic was highly significant (p<.001), indicating poor fit.

Question 2.

a. Construct the same model as for Question 1, but this time, constrain the co-variances between the MVs to be 0. What is the chi-square and p value for this model? Is this model a good fit for this data?

#define the model 
Second.Model <- 'Luv4Baskt ~ FluidInt + VerbInt + SocInt
FluidInt ~~ 0*VerbInt
FluidInt ~~ 0*SocInt
VerbInt  ~~ 0*SocInt'

#run the model
q2.model = sem(Second.Model, data = AD)

#summarize
summary(q2.model)

When we constrained the co-variances between the predictors, all three paths were significant (fluid intelligence p = .036; verbal p = .001, social p < .001). However, the model fit remains poor, \(X^2\)(3, N=200) = 107.362, p<.000.

Likewise, the \(X^2/df\) ratio (Joreskog, 1969), which provides a measure of badness-of-fit, is very large (35.787); a ratio value of <1 is a typical heuristic of good fit.

b. Look at the Modification Indices for this model. What is expected to reduce the chi-square value the most, and how much is the chi-square value predicted to change by?

modification_q2 <- modindices(q2.model)
modification_q2[modification_q2$op == "~~",] #sort to include only indices for covariance

##        lhs op     rhs     mi    epc sepc.lv sepc.all sepc.nox
## 4 FluidInt ~~ VerbInt 20.584  8.317   8.317    0.321    0.321
## 5 FluidInt ~~  SocInt 45.010 24.393  24.393    0.474    0.474
## 6  VerbInt ~~  SocInt 46.992 13.945  13.945    0.485    0.485

Modication indices provide a rough estimate of how well the \(X^2\) test statistic of a model would improve, if a particular parameter(s) were unconstrained, i.e., free to covary with others. If parameter pairs with a high index are freed, model fit may improve.

We see that freeing the variance between VerbInt and SocInt is expected to reduce \(X^2\) the most. Specifically, the \(X^2\) value is expected to reduce by an index of 46.992 (mi), and it is expected that the standardized expected parameter change (sepc.all) will take a value of .485.

c. How would you define a new model with the variance unconstrained using the information from question 2(b)? Run the new model using sem() and give the new chi-square and p values.

#define new model
Second.Model.Mod <- 'Luv4Baskt ~ FluidInt + VerbInt + SocInt
VerbInt  ~~ SocInt
FluidInt ~~ 0*VerbInt
FluidInt ~~ 0*SocInt'

#run new model
q2.model.mod = sem(Second.Model.Mod, data = AD)

#see test statistic
show(q2.model.mod)

To run a new model with variance unconstrained between VerbInt and SocInt, but leave the varaince contrained between the other two combinations of manifest variables, we code:
Second.Model.Mod <- 'Luv4Baskt ~ FluidInt + VerbInt + SocInt
VerbInt ~~ SocInt
FluidInt ~~ 0*VerbInt
FluidInt ~~ 0*SocInt'

The test statistic for the modified model, with freed covariance between VerbInt and SocInt, is \(X^2\)(2, N=200) = 53.796, p<.000. Thus, our model remains a poor fit to our data.

Question 3.

Now we want to build a model with two unobserved (latent) variables, each of which underlies 3 of the observed variables. Generate a model where Fluid, Verbal and Social Intelligence are indicators for Factor 1 (“Intelligence”), while Raptors Dedication (RapDed), Psychology Today (PsyToday) and Stock Market (StkMrkt) are indicators for Factor 2 (“Long-term life satisfaction (LTLS)”). We also think that Intelligence will predict LTLS, so define a regression between LTLS and Intelligence in the model.

a. In a couple of sentences, explain what we are trying to test with this model (what is our hypothesis)?

Path analyses allow for a separate ‘measurement’ and ‘structural’ hypothesis. Our measurement hypothesis is that (i) fluid, verbal, and social intelligence are indicators of global intelligence, and that (ii) dedication to the Raptors, reading ‘Psychology Today’, and investing in the stock market are indicators of long-term life satisfacton. Our structural hypothesis is that long-term life satisfaction has a positive effect on global intelligence.

b. How do you define this new model?

Third.Model <- 'Intelligence =~ FluidInt + VerbInt + SocInt
LTLS =~ RapDed + PsyToday + StkMrkt
Intelligence ~ LTLS'

c. Run the model using sem(). What are the standardized coefficients for our MVs, and the beta value between our LVs?

The standardized coefficients for the manifest variables are as follows:

#make table of manifest coefficients
q3.model_pars[1:6, c(1:3, 11)]

##            lhs op      rhs std.all
## 1 Intelligence =~ FluidInt   0.557
## 2 Intelligence =~  VerbInt   0.605
## 3 Intelligence =~   SocInt   0.819
## 4         LTLS =~   RapDed   0.878
## 5         LTLS =~ PsyToday   0.822
## 6         LTLS =~  StkMrkt   0.880

The beta value between the latent variables (Intelligence and LTLS) is β=.651:

#beta value between latent variables
q3.model_pars[7, c(1:3, 11)]

##            lhs op  rhs std.all
## 7 Intelligence  ~ LTLS   0.651

d. How good is the model at explaining our observed data? Give the chi-square and p-value.

\(X^2\)(8, N=200)=17.538, p=.025. Thus, the model fit remains poor (but, it is improving).

e. Look at the modification indices and discuss what changes you might make to the model to improve the fit.

Click to show table of modification indices:

#print indices
(modification_q3 <- modindices(q3.model))

##             lhs op      rhs     mi    epc sepc.lv sepc.all sepc.nox
## 16 Intelligence =~   RapDed  1.713  0.102   0.386    0.102    0.102
## 17 Intelligence =~ PsyToday  0.041 -0.022  -0.084   -0.016   -0.016
## 18 Intelligence =~  StkMrkt  1.257 -0.204  -0.774   -0.087   -0.087
## 19         LTLS =~ FluidInt  0.816 -0.211  -0.702   -0.103   -0.103
## 20         LTLS =~  VerbInt  1.708  0.179   0.598    0.157    0.157
## 21         LTLS =~   SocInt  0.203 -0.173  -0.577   -0.076   -0.076
## 22     FluidInt ~~  VerbInt  0.203 -0.722  -0.722   -0.042   -0.042
## 23     FluidInt ~~   SocInt  1.708  5.393   5.393    0.220    0.220
## 24     FluidInt ~~   RapDed  3.006  1.640   1.640    0.159    0.159
## 25     FluidInt ~~ PsyToday  2.281  2.100   2.100    0.127    0.127
## 26     FluidInt ~~  StkMrkt 12.891 -7.929  -7.929   -0.332   -0.332
## 27      VerbInt ~~   SocInt  0.816 -2.338  -2.338   -0.178   -0.178
## 28      VerbInt ~~   RapDed  0.182  0.221   0.221    0.040    0.040
## 29      VerbInt ~~ PsyToday  2.247 -1.139  -1.139   -0.128   -0.128
## 30      VerbInt ~~  StkMrkt  2.659  1.972   1.972    0.154    0.154
## 31       SocInt ~~   RapDed  0.032 -0.170  -0.170   -0.022   -0.022
## 32       SocInt ~~ PsyToday  0.009 -0.130  -0.130   -0.010   -0.010
## 33       SocInt ~~  StkMrkt  0.007  0.192   0.192    0.010    0.010
## 34       RapDed ~~ PsyToday  1.257 -1.236  -1.236   -0.232   -0.232
## 35       RapDed ~~  StkMrkt  0.041 -0.459  -0.459   -0.060   -0.060
## 36     PsyToday ~~  StkMrkt  1.713  3.397   3.397    0.274    0.274

Most modification indices (mi) are relatively low, suggesting that our model wouldn’t change much by freeing covariance between pairs of variables. However, one pair has an mi value above 10:

subset(modification_q3, mi > 10) #sort to examine `mi` values over 10

##         lhs op     rhs     mi    epc sepc.lv sepc.all sepc.nox
## 26 FluidInt ~~ StkMrkt 12.891 -7.929  -7.929   -0.332   -0.332

When we sort our table to mi values > 10, we see that we may be able to improve the fit by freeing the covariance between FluidInt and StkMrkt.

Question 4.

Now let’s test whether a model holds similarly for two different groups. For this question, we are going to test whether a given model holds for both West coast and East coast Canadians. Generate a model with a single unobserved variable (Intelligence) predicting the three observed intelligence measures and Love for Basketball. Free the variance for the first MV, and constrain the variance in the LV ‘Intelligence’ to 1.

a. How do you define this model?

Fourth.Model <- 'Intelligence =~ NA*FluidInt + VerbInt + SocInt + Luv4Baskt
Intelligence ~~ 1*Intelligence'

b. Run the model using sem(). Then, run the model with loadings and intercepts equal between groups. Calculate an ANOVA between the original model and the group-equal model. Is there a difference between groups?

#run the model
q4.model <- sem(Fourth.Model, data = AD, group = "Group")

#run a new model with loading and intercepts equal between groups
q4.model.eq <- sem(Fourth.Model, data = AD,group = "Group",
                   group.equal= c("loadings","intercepts"))

#calculate an ANOVA
anova(q4.model,q4.model.eq)

## Chi-Squared Difference Test
## 
##             Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## q4.model     4 4252.8 4331.9 3.0732                              
## q4.model.eq 11 4240.7 4296.8 5.0531     1.9798       7     0.9609

The test statistic for this difference model is \(X^2_{diff}\)(7, N=200)=1.9798, p=.9609.

Thus, the full equality model suggests that there is not a difference between groups, i.e., there is no difference between the three predictors of Intelligence, between West coast and East coast Canadians.

c. Next, we have reason to think that Westerners and Easterners differ in the influence of Intelligence on Verbal and Social scores, but the two groups have the same relationship between Love for Basketball and Fluid Intelligence. Run a new model with the appropriate freed and fixed parameters between groups. Compare this partial equality model to the earlier full equality model.

#equality constraint over only some parameters
frees <- c("Intelligence =~ VerbInt", "Intelligence =~ SocInt")

#run model
q4.model.partial <- sem(Fourth.Model, data = AD,
                        group = "Group",
                        group.equal= c("loadings","intercepts"),
                        group.partial = frees)

#calculate ANOVA to compare between full equality model and partial fit model
anova(q4.model.eq, q4.model.partial)

## Chi-Squared Difference Test
## 
##                  Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## q4.model.partial  9 4244.5 4307.2 4.8350                              
## q4.model.eq      11 4240.7 4296.8 5.0531    0.21806       2     0.8967

The test statistic for this difference model is \(X^2_{diff}\)(2, N=200)=.21806, p=.8967.

The described partial equality model suggests that there is not a difference between groups, i.e., there is no difference between the three predictors of Intelligence, between West coast and East coast Canadians. This is the same result as the full equality model above.

d. In a few sentences, summarize and interpret your results for this model. Discuss what MVs contribute most significantly, and whether they differ between groups.

subset(q4_parEst, group == 1)

##             lhs op          rhs block group label    est    se      z
## 1  Intelligence =~     FluidInt     1     1  .p1.  3.884 0.507  7.666
## 2  Intelligence =~      VerbInt     1     1        2.387 0.421  5.670
## 3  Intelligence =~       SocInt     1     1        6.278 0.805  7.795
## 4  Intelligence =~    Luv4Baskt     1     1  .p4.  0.609 0.074  8.255
## 5  Intelligence ~~ Intelligence     1     1        1.000 0.000     NA
## 6      FluidInt ~~     FluidInt     1     1       32.031 5.234  6.120
## 7       VerbInt ~~      VerbInt     1     1       10.702 1.828  5.853
## 8        SocInt ~~       SocInt     1     1       20.910 6.797  3.076
## 9     Luv4Baskt ~~    Luv4Baskt     1     1        0.622 0.107  5.823
## 10     FluidInt ~1                  1     1 .p10. 28.732 0.582 49.400
## 11      VerbInt ~1                  1     1 .p11. 24.649 0.332 74.227
## 12       SocInt ~1                  1     1 .p12. 14.430 0.736 19.614
## 13    Luv4Baskt ~1                  1     1 .p13.  0.015 0.087  0.168
## 14 Intelligence ~1                  1     1        0.000 0.000     NA
##    pvalue ci.lower ci.upper std.lv std.all std.nox
## 1   0.000    2.891    4.877  3.884   0.566   0.566
## 2   0.000    1.562    3.212  2.387   0.589   0.589
## 3   0.000    4.699    7.856  6.278   0.808   0.808
## 4   0.000    0.464    0.753  0.609   0.611   0.611
## 5      NA    1.000    1.000  1.000   1.000   1.000
## 6   0.000   21.774   42.289 32.031   0.680   0.680
## 7   0.000    7.118   14.286 10.702   0.653   0.653
## 8   0.002    7.589   34.232 20.910   0.347   0.347
## 9   0.000    0.412    0.831  0.622   0.627   0.627
## 10  0.000   27.592   29.872 28.732   4.186   4.186
## 11  0.000   23.998   25.299 24.649   6.086   6.086
## 12  0.000   12.988   15.872 14.430   1.858   1.858
## 13  0.866   -0.156    0.185  0.015   0.015   0.015
## 14     NA    0.000    0.000  0.000   0.000   0.000

subset(q4_parEst, group == 2)

##             lhs op          rhs block group label    est    se      z
## 15 Intelligence =~     FluidInt     2     2  .p1.  3.884 0.507  7.666
## 16 Intelligence =~      VerbInt     2     2        2.224 0.365  6.099
## 17 Intelligence =~       SocInt     2     2        5.860 0.748  7.835
## 18 Intelligence =~    Luv4Baskt     2     2  .p4.  0.609 0.074  8.255
## 19 Intelligence ~~ Intelligence     2     2        1.000 0.000     NA
## 20     FluidInt ~~     FluidInt     2     2       30.419 4.998  6.087
## 21      VerbInt ~~      VerbInt     2     2        7.680 1.368  5.612
## 22       SocInt ~~       SocInt     2     2       19.524 5.848  3.338
## 23    Luv4Baskt ~~    Luv4Baskt     2     2        0.626 0.107  5.875
## 24     FluidInt ~1                  2     2 .p10. 28.732 0.582 49.400
## 25      VerbInt ~1                  2     2 .p11. 24.649 0.332 74.227
## 26       SocInt ~1                  2     2 .p12. 14.430 0.736 19.614
## 27    Luv4Baskt ~1                  2     2 .p13.  0.015 0.087  0.168
## 28 Intelligence ~1                  2     2       -0.047 0.164 -0.285
##    pvalue ci.lower ci.upper std.lv std.all std.nox
## 15  0.000    2.891    4.877  3.884   0.576   0.576
## 16  0.000    1.509    2.939  2.224   0.626   0.626
## 17  0.000    4.394    7.326  5.860   0.798   0.798
## 18  0.000    0.464    0.753  0.609   0.610   0.610
## 19     NA    1.000    1.000  1.000   1.000   1.000
## 20  0.000   20.624   40.214 30.419   0.668   0.668
## 21  0.000    4.998   10.362  7.680   0.608   0.608
## 22  0.001    8.061   30.986 19.524   0.362   0.362
## 23  0.000    0.417    0.835  0.626   0.628   0.628
## 24  0.000   27.592   29.872 28.732   4.259   4.259
## 25  0.000   23.998   25.299 24.649   6.937   6.937
## 26  0.000   12.988   15.872 14.430   1.966   1.966
## 27  0.866   -0.156    0.185  0.015   0.015   0.015
## 28  0.776   -0.369    0.275 -0.047  -0.047  -0.047

The partial equality model provides a good model fit. Across both groups, our test statistic is \(X^2\)(9, N=200)=4.835, p=.848.

The \(X^2\) test statistic for group 1 (Western Canadians) is 0.964. The order of the manifest variables’ contribution is SocInt (β=0.808), followed by Luv4Baskt (β=0.611), VerbInt (β=0.589), and FluidInt (β=0.566).

The \(X^2\) test statistic for group 2 (Eastern Canadians) is 3.871. A similar though not identical pattern exists regarding the order of the manifest variables’ contribution: Socint (β=0.798), VerbInt (β=0.626), Luv4Baskt (β=0.610), FluidInt (β=0.576).

PSY2002 Assignment 4 Structural Equation Modeling

Question 1.

Question 2.

Question 3.

Question 4.