Study 2 Results

The simulation to date has the following conditions:

  1. Means of estimating correlations between two factors:
    1. Scale score correlations
    2. Latent variable correlations using the mean of the posterior distribution
    3. Latent variable correlations using the mode of the posterior distribution
    4. Factor scores
    5. Plausible values (1,5,10,20,100)
  2. Sample size (100, 250, 500, 1000); replications are set such that the total sample size for each condition is always 100,000
  3. Models:
    1. Tau equivalence with standardized loadings of .70 and reliability of \( \omega = .8 \)
    2. Strong reliability \( \omega = .8 \)
    3. Poor reliability \( \omega = .4 \)
    4. Tau equivalence with heavy skew
    5. Poor reliability with moderate cross-loadings
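
The replication counts implied by the fixed total of 100,000 cases per condition can be worked out directly. A minimal sketch in Python; the variable names are illustrative and are not taken from the simulation scripts:

```python
# Total number of simulated cases per condition, as stated in the list above.
TOTAL_CASES = 100_000

# Sample sizes used in the simulation.
sample_sizes = [100, 250, 500, 1000]

# Replications needed so that N * replications equals TOTAL_CASES.
replications = {n: TOTAL_CASES // n for n in sample_sizes}

for n, reps in replications.items():
    print(f"N = {n:>4}: {reps} replications")
```

Smaller samples therefore get more replications, which keeps the total amount of simulated data constant across sample sizes.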

The population correlation between factor 1 and factor 2 is \( r = .5 \) for all models. Latent M (mean of the posterior) and Latent D (mode of the posterior) are both included because they are the two main ways of summarizing results from Bayesian analyses in Mplus. Latent M is the default, while Latent D is asymptotically equivalent to maximum likelihood as the sample size goes to infinity.

Tau Equivalence Results

Tau equivalence model with various sample sizes (\( N = 100, 250, 500, 1000 \)):

## N = 100
##          estimate   bias efficency   MSE
## scale       0.393 -0.214     0.085 0.053
## Latent_M    0.487 -0.027     0.107 0.012
## Latent_D    0.493 -0.014     0.134 0.018
## fs          0.507  0.014     0.115 0.013
## PV1         0.480 -0.040     0.160 0.027
## PV5         0.480 -0.039     0.126 0.017
## PV10        0.484 -0.032     0.114 0.014
## PV20        0.480 -0.041     0.117 0.015
## PV100       0.481 -0.038     0.112 0.014
## N = 250
##          estimate   bias efficency   MSE
## scale       0.396 -0.207     0.053 0.046
## Latent_M    0.495 -0.011     0.069 0.005
## Latent_D    0.498 -0.003     0.108 0.012
## fs          0.519  0.038     0.074 0.007
## PV1         0.491 -0.018     0.128 0.017
## PV5         0.492 -0.017     0.087 0.008
## PV10        0.490 -0.021     0.090 0.008
## PV20        0.491 -0.019     0.084 0.007
## PV100       0.490 -0.020     0.083 0.007
## N = 500
##          estimate   bias efficency   MSE
## scale       0.398 -0.204     0.040 0.043
## Latent_M    0.497 -0.006     0.055 0.003
## Latent_D    0.497 -0.007     0.118 0.014
## fs          0.517  0.034     0.111 0.013
## PV1         0.491 -0.018     0.097 0.010
## PV5         0.495 -0.010     0.075 0.006
## PV10        0.493 -0.015     0.083 0.007
## PV20        0.490 -0.020     0.093 0.009
## PV100       0.491 -0.018     0.084 0.007
## N = 1000
##          estimate   bias efficency   MSE
## scale       0.397 -0.207     0.026 0.043
## Latent_M    0.499 -0.001     0.034 0.001
## Latent_D    0.498 -0.003     0.036 0.001
## fs          0.524  0.048     0.035 0.003
## PV1         0.499 -0.002     0.041 0.002
## PV5         0.497 -0.006     0.041 0.002
## PV10        0.496 -0.009     0.062 0.004
## PV20        0.497 -0.007     0.049 0.002
## PV100       0.497 -0.005     0.046 0.002

With a high-reliability tau equivalence model, all methods apart from scale scores do quite well. A single plausible value tends to do poorly.
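
The table columns are not defined in the text. The printed values are, however, consistent with `bias` being relative bias, (estimate - .5)/.5, with `efficency` being the spread of the estimates across replications, and with `MSE` being bias squared plus efficency squared. The sketch below encodes that reading as an assumption inferred from the numbers, not as the scripts' actual definitions; the function name `summarise` and its arguments are hypothetical.

```python
from statistics import mean, pstdev

def summarise(estimates, rho=0.5):
    """Summarise replicate correlation estimates under ASSUMED column definitions."""
    est = mean(estimates)
    bias = (est - rho) / rho      # relative bias (assumption)
    eff = pstdev(estimates)       # spread across replications (assumption)
    mse = bias ** 2 + eff ** 2    # assumed decomposition of the MSE column
    return {"estimate": est, "bias": bias, "efficency": eff, "MSE": mse}

# Check the assumed relationships against the printed scale-score row at N = 100
# (estimate 0.393, bias -0.214, efficency 0.085, MSE 0.053).
row_estimate, row_bias, row_eff = 0.393, -0.214, 0.085
check_bias = round((row_estimate - 0.5) / 0.5, 3)
check_mse = round(row_bias ** 2 + row_eff ** 2, 3)
print(check_bias, check_mse)
```

Under this reading, a method can have a small bias and still a large MSE if its estimates vary a lot across replications, which is the pattern the single plausible value shows.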

Strong Reliability Results

Strong reliability model with various sample sizes (\( N = 100,250,500,1000 \)):

## N = 100
##          estimate   bias efficency   MSE
## scale       0.384 -0.232     0.085 0.061
## Latent_M    0.499 -0.002     0.099 0.010
## Latent_D    0.510  0.020     0.121 0.015
## fs          0.520  0.039     0.107 0.013
## PV1         0.492 -0.017     0.170 0.029
## PV5         0.494 -0.011     0.111 0.013
## PV10        0.493 -0.015     0.112 0.013
## PV20        0.495 -0.009     0.106 0.011
## PV100       0.494 -0.012     0.105 0.011
## N = 250
##          estimate   bias efficency   MSE
## scale       0.383 -0.234     0.055 0.058
## Latent_M    0.496 -0.008     0.066 0.004
## Latent_D    0.498 -0.004     0.105 0.011
## fs          0.517  0.035     0.095 0.010
## PV1         0.496 -0.008     0.114 0.013
## PV5         0.495 -0.009     0.075 0.006
## PV10        0.494 -0.012     0.077 0.006
## PV20        0.493 -0.015     0.080 0.007
## PV100       0.494 -0.012     0.071 0.005
## N = 500
##          estimate   bias efficency   MSE
## scale       0.385 -0.229     0.039 0.054
## Latent_M    0.500  0.000     0.045 0.002
## Latent_D    0.502  0.003     0.052 0.003
## fs          0.526  0.052     0.048 0.005
## PV1         0.499 -0.003     0.106 0.011
## PV5         0.498 -0.004     0.060 0.004
## PV10        0.499 -0.002     0.052 0.003
## PV20        0.498 -0.004     0.068 0.005
## PV100       0.497 -0.007     0.059 0.004
## N = 1000
##          estimate   bias efficency   MSE
## scale       0.383 -0.233     0.028 0.055
## Latent_M    0.498 -0.004     0.032 0.001
## Latent_D    0.500  0.000     0.036 0.001
## fs          0.522  0.044     0.032 0.003
## PV1         0.500  0.000     0.033 0.001
## PV5         0.492 -0.017     0.071 0.005
## PV10        0.496 -0.007     0.039 0.002
## PV20        0.496 -0.009     0.043 0.002
## PV100       0.497 -0.005     0.034 0.001

In this model, scale scores do poorly, but there is a trade-off between factor scores and the other methods: bias is smaller for latent variables and PVs, but efficiency is better for factor scores. A single plausible value tends to do poorly.

Weak Reliability Results

Weak reliability model with various sample sizes (\( N = 100,250,500,1000 \)):

## N = 100
##          estimate   bias efficency   MSE
## scale       0.184 -0.633     0.098 0.410
## Latent_M    0.394 -0.211     0.205 0.087
## Latent_D    0.426 -0.148     0.316 0.122
## fs          0.351 -0.298     0.202 0.130
## PV1         0.396 -0.208     0.321 0.147
## PV5         0.389 -0.222     0.234 0.104
## PV10        0.382 -0.237     0.226 0.107
## PV20        0.390 -0.220     0.219 0.097
## PV100       0.389 -0.221     0.211 0.094
## N = 250
##          estimate   bias efficency   MSE
## scale       0.186 -0.627     0.060 0.397
## Latent_M    0.465 -0.070     0.144 0.026
## Latent_D    0.475 -0.049     0.206 0.045
## fs          0.407 -0.186     0.127 0.051
## PV1         0.459 -0.082     0.223 0.056
## PV5         0.470 -0.060     0.164 0.031
## PV10        0.462 -0.076     0.155 0.030
## PV20        0.460 -0.081     0.147 0.028
## PV100       0.463 -0.075     0.144 0.026
## N = 500
##          estimate   bias efficency   MSE
## scale       0.189 -0.622     0.046 0.389
## Latent_M    0.485 -0.031     0.119 0.015
## Latent_D    0.492 -0.016     0.150 0.023
## fs          0.424 -0.153     0.100 0.033
## PV1         0.480 -0.040     0.192 0.038
## PV5         0.477 -0.047     0.147 0.024
## PV10        0.480 -0.039     0.126 0.018
## PV20        0.478 -0.045     0.133 0.020
## PV100       0.478 -0.045     0.125 0.018
## N = 1000
##          estimate   bias efficency   MSE
## scale       0.187 -0.626     0.029 0.392
## Latent_M    0.488 -0.024     0.081 0.007
## Latent_D    0.485 -0.029     0.101 0.011
## fs          0.428 -0.145     0.066 0.025
## PV1         0.490 -0.020     0.156 0.025
## PV5         0.491 -0.018     0.107 0.012
## PV10        0.488 -0.023     0.095 0.010
## PV20        0.488 -0.024     0.083 0.007
## PV100       0.486 -0.027     0.086 0.008

The findings are similar to those from the model above, but the pattern is clearer. Five PVs do quite well, although more PVs seem to be slightly better.
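
One way to see why extra plausible values give diminishing returns is through Rubin's rules, under which the between-set variance enters the total variance with a factor of (1 + 1/M). The sketch below assumes the PV-based correlations are pooled this way; the text does not say how the simulation scripts combine PV sets, and the function name `pool_rubin` is hypothetical. Correlations are often pooled on Fisher's z scale; that refinement is omitted here.

```python
from statistics import mean, variance

def pool_rubin(estimates, within_variances):
    """Pool per-PV-set estimates with Rubin's rules (assumed pooling method)."""
    m = len(estimates)
    if m < 2:
        # With a single PV set the between-set variance cannot be estimated.
        raise ValueError("need at least two PV sets to pool")
    pooled = mean(estimates)          # pooled point estimate
    w = mean(within_variances)        # average within-set sampling variance
    b = variance(estimates)           # between-set variance
    total = w + (1 + 1 / m) * b       # total variance of the pooled estimate
    return pooled, total

# Multiplier on the between-set variance for the PV counts used in the simulation.
inflation = {m: round(1 + 1 / m, 2) for m in (1, 5, 10, 20, 100)}
print(inflation)
```

Going from 1 to 5 PVs cuts the multiplier from 2.0 to 1.2, while going from 5 to 100 only moves it from 1.2 to 1.01, which fits the observation that 5 PVs already do quite well.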

MSE Plots - All

[Figure: MSE plots, all methods]

MSE Plots - Close up on PVs, Factor scores, and Latents

[Figure: MSE plots, close-up on PVs, factor scores, and latent variables]

Other simulations

Cross-loadings and poor reliability

Poor Reliability - small cross loadings

## N = 100
##          estimate   bias efficency   MSE
## scale       0.300 -0.401     0.092 0.169
## Latent_M    0.572  0.143     0.168 0.049
## Latent_D    0.617  0.234     0.257 0.121
## fs          0.509  0.018     0.160 0.026
## PV1         0.579  0.158     0.264 0.095
## PV5         0.564  0.129     0.193 0.054
## PV10        0.560  0.120     0.190 0.050
## PV20        0.563  0.127     0.177 0.047
## PV100       0.561  0.123     0.173 0.045
## N = 250
##          estimate   bias efficency   MSE
## scale       0.303 -0.393     0.056 0.158
## Latent_M    0.643  0.287     0.118 0.096
## Latent_D    0.684  0.368     0.168 0.163
## fs          0.566  0.133     0.093 0.026
## PV1         0.632  0.265     0.193 0.107
## PV5         0.626  0.253     0.143 0.084
## PV10        0.632  0.265     0.135 0.089
## PV20        0.635  0.271     0.126 0.089
## PV100       0.633  0.266     0.123 0.086
## N = 500
##          estimate   bias efficency   MSE
## scale       0.307 -0.387     0.042 0.151
## Latent_M    0.683  0.366     0.088 0.141
## Latent_D    0.687  0.373     0.123 0.154
## fs          0.592  0.184     0.067 0.038
## PV1         0.667  0.335     0.164 0.139
## PV5         0.667  0.335     0.100 0.122
## PV10        0.676  0.352     0.092 0.132
## PV20        0.674  0.348     0.091 0.129
## PV100       0.674  0.348     0.089 0.129
## N = 1000
##          estimate   bias efficency   MSE
## scale       0.305 -0.389     0.027 0.152
## Latent_M    0.692  0.384     0.069 0.153
## Latent_D    0.700  0.400     0.086 0.168
## fs          0.599  0.199     0.046 0.042
## PV1         0.689  0.379     0.086 0.151
## PV5         0.686  0.373     0.093 0.148
## PV10        0.688  0.376     0.079 0.148
## PV20        0.687  0.375     0.069 0.145
## PV100       0.684  0.367     0.075 0.141

Strong Reliability - small cross loadings

## N = 100
##          estimate   bias efficency   MSE
## scale       0.384 -0.232     0.085 0.061
## Latent_M    0.499 -0.002     0.099 0.010
## Latent_D    0.510  0.020     0.121 0.015
## fs          0.520  0.039     0.107 0.013
## PV1         0.492 -0.017     0.170 0.029
## PV5         0.494 -0.011     0.111 0.013
## PV10        0.493 -0.015     0.112 0.013
## PV20        0.495 -0.009     0.106 0.011
## PV100       0.494 -0.012     0.105 0.011
## N = 250
##          estimate   bias efficency   MSE
## scale       0.383 -0.234     0.055 0.058
## Latent_M    0.496 -0.008     0.066 0.004
## Latent_D    0.498 -0.004     0.105 0.011
## fs          0.517  0.035     0.095 0.010
## PV1         0.496 -0.008     0.114 0.013
## PV5         0.495 -0.009     0.075 0.006
## PV10        0.494 -0.012     0.077 0.006
## PV20        0.493 -0.015     0.080 0.007
## PV100       0.494 -0.012     0.071 0.005
## N = 500
##          estimate   bias efficency   MSE
## scale       0.385 -0.229     0.039 0.054
## Latent_M    0.500  0.000     0.045 0.002
## Latent_D    0.502  0.003     0.052 0.003
## fs          0.526  0.052     0.048 0.005
## PV1         0.499 -0.003     0.106 0.011
## PV5         0.498 -0.004     0.060 0.004
## PV10        0.499 -0.002     0.052 0.003
## PV20        0.498 -0.004     0.068 0.005
## PV100       0.497 -0.007     0.059 0.004
## N = 1000
##          estimate   bias efficency   MSE
## scale       0.383 -0.233     0.028 0.055
## Latent_M    0.498 -0.004     0.032 0.001
## Latent_D    0.500  0.000     0.036 0.001
## fs          0.522  0.044     0.032 0.003
## PV1         0.500  0.000     0.033 0.001
## PV5         0.492 -0.017     0.071 0.005
## PV10        0.496 -0.007     0.039 0.002
## PV20        0.496 -0.009     0.043 0.002
## PV100       0.497 -0.005     0.034 0.001

In the case of poor reliability and moderate factor loadings, factor scores tend to do better than latent variables or PVs. I wonder whether this is because the attenuation associated with scale scores and factor scores balances out the inflated correlations that result from not modelling non-zero cross-loadings. For strong reliability, PVs and latent variables are better.
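
The size of the attenuation can be gauged for the simpler models without cross-loadings using the classical correction-for-attenuation formula, in which the observed correlation equals the true correlation times the square root of the product of the two reliabilities. This is a sketch under the assumption that both scales have reliability \( \omega \); the function name `attenuated_r` is illustrative, and the formula does not account for cross-loadings.

```python
from math import sqrt

def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation under classical attenuation."""
    return true_r * sqrt(rel_x * rel_y)

# Population correlation of .5 with omega = .8 and omega = .4 on both scales.
expected_strong = round(attenuated_r(0.5, 0.8, 0.8), 3)
expected_weak = round(attenuated_r(0.5, 0.4, 0.4), 3)
print(expected_strong, expected_weak)
```

These expected values of about .40 and .20 are close to the scale-score estimates reported for the tau equivalence model (0.393 to 0.398) and the weak reliability model (0.184 to 0.189).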

Heavy Skew

Tau Equivalence Results

## N = 100
##          estimate   bias efficency   MSE
## scale       0.327 -0.346     0.092 0.128
## Latent_M    0.500 -0.001     0.146 0.021
## Latent_D    0.521  0.042     0.199 0.041
## fs          0.523  0.047     0.155 0.026
## PV1         0.495 -0.010     0.226 0.051
## PV5         0.492 -0.016     0.170 0.029
## PV10        0.494 -0.012     0.156 0.024
## PV20        0.496 -0.009     0.155 0.024
## PV100       0.493 -0.014     0.151 0.023
## N = 250
##          estimate   bias efficency   MSE
## scale       0.335 -0.330     0.056 0.112
## Latent_M    0.528  0.055     0.098 0.013
## Latent_D    0.540  0.080     0.115 0.020
## fs          0.560  0.119     0.094 0.023
## PV1         0.507  0.014     0.176 0.031
## PV5         0.527  0.053     0.107 0.014
## PV10        0.528  0.057     0.098 0.013
## PV20        0.525  0.051     0.098 0.012
## PV100       0.525  0.050     0.096 0.012
## N = 500
##          estimate   bias efficency   MSE
## scale       0.333 -0.335     0.041 0.114
## Latent_M    0.529  0.058     0.071 0.008
## Latent_D    0.526  0.052     0.127 0.019
## fs          0.553  0.105     0.070 0.016
## PV1         0.537  0.075     0.080 0.012
## PV5         0.524  0.048     0.099 0.012
## PV10        0.525  0.050     0.087 0.010
## PV20        0.522  0.044     0.107 0.013
## PV100       0.522  0.044     0.093 0.011
## N = 1000
##          estimate   bias efficency   MSE
## scale       0.336 -0.329     0.029 0.109
## Latent_M    0.538  0.075     0.047 0.008
## Latent_D    0.542  0.085     0.056 0.010
## fs          0.568  0.135     0.046 0.020
## PV1         0.527  0.054     0.109 0.015
## PV5         0.534  0.068     0.061 0.008
## PV10        0.534  0.067     0.061 0.008
## PV20        0.534  0.067     0.051 0.007
## PV100       0.534  0.069     0.053 0.008

This is the model in which the advantages of PVs and latent variables are most apparent.

Control files and all scripts are on my Bitbucket account. If you are interested, let me know and I can add you as a collaborator.

Missing data - Moderate (cross sectional missing)

Tau Equivalence Results

## N = 100
##          estimate   bias efficency   MSE
## scale       0.383 -0.235     0.085 0.062
## Latent_M    0.490 -0.019     0.111 0.013
## Latent_D    0.497 -0.007     0.152 0.023
## fs          0.509  0.018     0.123 0.015
## PV1         0.480 -0.039     0.176 0.033
## PV5         0.485 -0.029     0.132 0.018
## PV10        0.481 -0.037     0.125 0.017
## PV20        0.483 -0.033     0.120 0.015
## PV100       0.483 -0.034     0.118 0.015
## N = 250
##          estimate   bias efficency   MSE
## scale       0.383 -0.233     0.056 0.057
## Latent_M    0.493 -0.015     0.071 0.005
## Latent_D    0.502  0.004     0.086 0.007
## fs          0.518  0.037     0.077 0.007
## PV1         0.486 -0.027     0.126 0.017
## PV5         0.489 -0.021     0.087 0.008
## PV10        0.487 -0.026     0.089 0.009
## PV20        0.485 -0.030     0.090 0.009
## PV100       0.486 -0.028     0.083 0.008
## N = 500
##          estimate   bias efficency   MSE
## scale       0.382 -0.236     0.036 0.057
## Latent_M    0.496 -0.008     0.045 0.002
## Latent_D    0.497 -0.006     0.060 0.004
## fs          0.523  0.046     0.051 0.005
## PV1         0.493 -0.014     0.060 0.004
## PV5         0.494 -0.011     0.054 0.003
## PV10        0.496 -0.009     0.046 0.002
## PV20        0.496 -0.009     0.046 0.002
## PV100       0.495 -0.010     0.047 0.002
## N = 1000
##          estimate   bias efficency   MSE
## scale       0.383 -0.235     0.025 0.056
## Latent_M    0.503  0.005     0.032 0.001
## Latent_D    0.507  0.015     0.040 0.002
## fs          0.529  0.057     0.036 0.005
## PV1         0.500  0.001     0.041 0.002
## PV5         0.499 -0.001     0.049 0.002
## PV10        0.501  0.002     0.037 0.001
## PV20        0.501  0.002     0.039 0.001
## PV100       0.501  0.002     0.034 0.001

Missing data here were moderate, with more missingness for factor 1 than for factor 2. The data are MAR; however, the variable representing the missing-data mechanism is not included in the model, as is realistic in most applied research situations.
