class: center, middle, inverse, title-slide

.title[
# Computational Psychometrics Applied to Engineering
]
.subtitle[
## Bayesian Structural Equation Modeling
]
.author[
### Jorge Sinval
]
.date[
### 2024-11-27
]

---
class: inverse, center, middle

# Bayesian Confirmatory Factor Analysis

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>

<style>
.orange { color: #EB811B; }
.stan-color { color: #b2001d; }
.r-color { color: #2167B9; }
.kbd { display: inline-block; padding: .2em .5em; font-size: 0.75em; line-height: 1.75; color: #555; vertical-align: middle; background-color: #fcfcfc; border: solid 1px #ccc; border-bottom-color: #bbb; border-radius: 3px; box-shadow: inset 0 -1px 0 #bbb }
</style>

???
Slide content is based on this [document](https://mc-stan.org/users/documentation/case-studies/sem.html).

---
# Bayesian Confirmatory Factor Analysis

.pull-left-1[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Statistics is like building a bridge. <a href="https://t.co/E0UURWR4zy">pic.twitter.com/E0UURWR4zy</a></p>&mdash; Eric Novik (@ericnovik) <a href="https://twitter.com/ericnovik/status/1774512058836402645?ref_src=twsrc%5Etfw">March 31, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]
.pull-right-2[
<br>
<br>
<br>
<div class="figure" style="text-align: center">
<img src="assets/img/stats_is_like_bridge.png" alt="Statistics is like building a bridge." width="100%" />
<p class="caption">Statistics is like building a bridge.</p>
</div>
]

---
# Bayesian Confirmatory Factor Analysis

As a measurement model, and probably one of the most popular special cases of a SEM, CFA is often used to 1) validate a hypothesized factor structure among multiple variables, 2) estimate the correlations between factors, and 3) obtain factor scores.

For example, consider a two-factor `\(\left(\eta_{1j}, \eta_{2j}\right)\)` model in which each factor is measured by three of the six items `\(\left(y_{1j},\dots, y_{6j}\right)\)` for person `\(j\)`:

`$$\underbrace{\left[\begin{array}{l} y_{1 j} \\ y_{2 j} \\ y_{3 j} \\ y_{4 j} \\ y_{5 j} \\ y_{6 j} \end{array}\right]}_{\boldsymbol{y}_{j}}=\underbrace{\left[\begin{array}{c} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \beta_{4} \\ \beta_{5} \\ \beta_{6} \end{array}\right]}_{\boldsymbol{\beta}}+\underbrace{\left[\begin{array}{cc} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{array}\right]}_{\Lambda}\underbrace{\left[\begin{array}{l} \eta_{1 j} \\ \eta_{2 j} \end{array}\right]}_{\boldsymbol{\eta}_j}+\underbrace{\left[\begin{array}{c} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{array}\right]}_{\boldsymbol{\epsilon}_j}\\ \boldsymbol{y}_{j}=\boldsymbol{\beta}+\boldsymbol{\Lambda}\boldsymbol{\eta}_{j}+\boldsymbol{\epsilon}_{j}\\ \boldsymbol{\epsilon}_{j} \sim N_{I}(\mathbf{0}, \mathbf{\Theta}) \\ \boldsymbol{\eta}_{j} \sim N_{K}(\mathbf{0}, \boldsymbol{\Psi})$$`

Here the number of items (observed variables) is `\(I=6\)`, the number of factors is `\(K=2\)`, and `\(\mathbf{\Theta}\)` is often assumed to be a diagonal matrix.
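---
# Bayesian Confirmatory Factor Analysis

From these assumptions it follows (a standard CFA identity, spelled out here for reference) that the model-implied mean vector and covariance matrix of the responses are

`$$\mathbb{E}(\boldsymbol{y}_{j})=\boldsymbol{\beta}, \qquad \mathbb{Cov}(\boldsymbol{y}_{j})=\boldsymbol{\Sigma}=\boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}^{\top}+\mathbf{\Theta},$$`

so the `\(I \times I\)` covariance matrix of the items is fully determined by the loadings `\(\boldsymbol{\Lambda}\)`, the factor covariance matrix `\(\boldsymbol{\Psi}\)`, and the residual covariance matrix `\(\mathbf{\Theta}\)`. This implied mean and covariance structure underlies the multivariate normal likelihood used by both `lavaan` and `blavaan`.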
---
# Bayesian Confirmatory Factor Analysis

In this model, `\(y_{ij}\)` is the response of person `\(j\)` `\(\left(j=1,\dots,J\right)\)` on item `\(i\)` `\(\left(i=1,\dots,I\right)\)`, `\(\beta_i\)` is the intercept for item `\(i\)`, `\(\eta_{jk}\)` is the `\(k\)`th common factor for person `\(j\)`, `\(\lambda_{ik}\)` is the factor loading of item `\(i\)` on factor `\(k\)`, `\(\epsilon_{ij}\)` is the random error term for person `\(j\)` on item `\(i\)`, `\(\boldsymbol{\Psi}\)` is the variance-covariance matrix of the common factors `\(\boldsymbol{\eta}_j\)`, and `\(\mathbf{\Theta}\)` is the variance-covariance matrix of the residuals (or unique factors) `\(\boldsymbol{\epsilon}_j\)`.

---
# Bayesian Confirmatory Factor Analysis

Suppose the errors or residuals `\(\epsilon_{ij}\)` are independent of each other. Then `\(\psi_{kk}\)` is the variance of the `\(k\)`th factor, `\(\psi_{kk^\prime}\)` is the covariance between the `\(k\)`th and `\(k^\prime\)`th factors, `\(\theta_{ii}\)` is the variance of the `\(i\)`th residual, and `\(\theta_{ii^\prime}=0\)` for `\(i\neq i^\prime\)`. Specifically, the factor and residual covariance matrices are:

`$$\boldsymbol{\Psi}=\mathbb{Cov}\begin{pmatrix}\eta_{1j} \\ \eta_{2j} \end{pmatrix}= \left[\begin{array}{cc} \psi_{1 1}&\psi_{1 2} \\ \psi_{2 1}&\psi_{2 2} \end{array}\right]$$`

`$$\mathbf{\Theta}=\mathbb{Cov}\begin{pmatrix} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{pmatrix}= \left[\begin{array}{cccccc} \theta_{1 1} &0&0&0&0&0\\ 0&\theta_{2 2}&0&0&0&0 \\ 0&0&\theta_{3 3}&0&0&0 \\ 0&0&0&\theta_{4 4}&0&0 \\ 0&0&0&0&\theta_{5 5}&0 \\ 0&0&0&0&0&\theta_{6 6} \end{array}\right]$$`

---
# Bayesian Confirmatory Factor Analysis

To better illustrate the use of `blavaan` (Merkle and Rosseel, 2018), we simulate data so that we know the data-generating parameters. In our simulation, we set `\(\beta_i=0\)` for all `\(i\)`, `\(\psi_{11}=1\)`, `\(\psi_{22}=.8\)`, `\(\psi_{12}=\psi_{21}=.5\)`, `\(\lambda_{21}=1.5\)`, `\(\lambda_{31}=2\)`, `\(\lambda_{52}=1.5\)`, `\(\lambda_{62}=2\)`, and `\(\theta_{ii}=.3\)`. We simulate data from this model for `\(J=1000\)` persons.
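---
# Bayesian Confirmatory Factor Analysis

As a quick check, we can plug these true values into `\(\boldsymbol{\Sigma}=\boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}^{\top}+\mathbf{\Theta}\)` and later compare the result with the sample covariance of the simulated data. This is a minimal sketch; the `*_true` objects are introduced only for this illustration.

``` r
# true parameter values from the previous slide (note psi_22 = .8)
Lambda_true <- cbind(c(1, 1.5, 2, 0, 0, 0),
                     c(0, 0, 0, 1, 1.5, 2))
Psi_true    <- matrix(c(1, .5, .5, .8), nrow = 2)
Theta_true  <- diag(.3, nrow = 6)

# model-implied covariance matrix of the six items
Sigma_true <- Lambda_true %*% Psi_true %*% t(Lambda_true) + Theta_true
round(Sigma_true, 2)
```

With `\(J=1000\)` simulated persons, `cov(dat_cfa)` (available once the data are generated on the next slide) should be close to `Sigma_true`.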
<div class="pre-name">bcfa_sim_data.R</div> ``` r pacman::p_load(rstan, lavaan, blavaan, MASS, mvtnorm, tidyverse, semPlot, Matrix) options(mc.cores = parallel::detectCores()) J <- 1000; I <- 6; K <- 2 psi <- matrix(c(1, 0.5, 0.5, 0.8), nrow = K) beta <- seq(1, 2, by = .2) # loading matrix Lambda <- cbind(c(1, 1.5, 2, 0, 0, 0), c(0, 0, 0, 1, 1.5, 2)) # error covariance Theta <- diag(0.3, nrow = I) # factor scores eta <- mvrnorm(J, mu = c(0, 0), Sigma = psi) # error term epsilon <- mvrnorm(J, mu = rep(0, ncol(Theta)),Sigma = Theta) dat <- tcrossprod(eta, Lambda) + epsilon dat_cfa <- dat |> as.data.frame() |> setNames(c("Y1", "Y2", "Y3", "Y4", "Y5", "Y6")) ``` --- # Bayesian Confirmatory Factor Analysis We define the model for lavaan as follows: <div class="pre-name">bcfa-model.R</div> ``` r model <- ' eta1 =~ Y1 + Y2 + Y3 eta2 =~ Y4 + Y5 + Y6' ``` Two latent variables `eta1` and `eta2` are specified to be measured by three items each, denoted as `eta1 =~ Y1 + Y2 + Y3`, and similary for `eta2`. By not specifying other parts of the model, by default we assume that the error terms for items are uncorrelated with each other while the covariances between latent variables are free to be estimated. We represent the CFA model in a path diagram and then fit the model by maximum likelihood estimation using the cfa function in the lavaan package. By convention, latent variables `\(\eta_1\)` and `\(\eta_2\)` are represented by circles, and observed variables `\(Y_1\)` to `\(Y_6\)` by rectangles. Straight arrows represent linear relations (here with coefficients given by the factor loadings `\(\lambda\)`), and double-headed arrows represent variances and covariances. We could make the diagram by simply using the function call `semPaths(semPlotModel_lavaanModel(model))`. --- # Bayesian Confirmatory Factor Analysis Below is the more complex syntax to display Greek letters, subscripts, etc. 
<div class="pre-name">bcfa-diagram.R</div> ``` r fit <- semPlotModel_lavaanModel(model,auto.var = TRUE, auto.fix.first = TRUE, auto.cov.lv.x = TRUE) semPaths(fit, what = "paths", whatLabels = "par",edge.color = "black", nodeLabels = c(expression(paste(Y[1])),expression(paste(Y[2])),expression(paste(Y[3])), expression(paste(Y[4])),expression(paste(Y[5])),expression(paste(Y[6])), expression(paste(eta[1])),expression(paste(eta[2]))), edge.label.cex = 0.8, edgeLabels = c(expression(paste(lambda[1])),expression(paste(lambda[2])),expression(paste(lambda[3])), expression(paste(lambda[4])),expression(paste(lambda[5])),expression(paste(lambda[6])), expression(paste(epsilon[1])),expression(paste(epsilon[2])),expression(paste(epsilon[3])), expression(paste(epsilon[4])),expression(paste(epsilon[5])),expression(paste(epsilon[6])), expression(paste(psi[1])),expression(paste(psi[2])), expression(paste(Psi[12])))) ``` --- # Bayesian Confirmatory Factor Analysis <img src="data:image/png;base64,#slides19ofx_files/figure-html/bcfa-diagram-1.png" width="100%" height="99%" /> --- # Bayesian Confirmatory Factor Analysis ## The frequentist way <div class="pre-name">bcfa-run.R</div> ``` r fit_cfa <- cfa(m=model,d = dat_cfa) summary(fit_cfa, fit.m = T, std=T) ``` .scroll-box-20[ ``` ## lavaan 0.6-19 ended normally after 28 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 13 ## ## Number of observations 1000 ## ## Model Test User Model: ## ## Test statistic 12.513 ## Degrees of freedom 8 ## P-value (Chi-square) 0.130 ## ## Model Test Baseline Model: ## ## Test statistic 5828.783 ## Degrees of freedom 15 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.999 ## Tucker-Lewis Index (TLI) 0.999 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -7908.490 ## Loglikelihood unrestricted model (H1) -7902.233 ## ## Akaike (AIC) 15842.979 ## Bayesian (BIC) 15906.780 ## Sample-size adjusted Bayesian (SABIC) 15865.491 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.024 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.048 ## P-value H_0: RMSEA <= 0.050 0.966 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.008 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## eta1 =~ ## Y1 1.000 0.973 0.870 ## Y2 1.510 0.034 44.782 0.000 1.468 0.935 ## Y3 2.039 0.043 47.118 0.000 1.984 0.963 ## eta2 =~ ## Y4 1.000 0.876 0.843 ## Y5 1.500 0.038 39.087 0.000 1.314 0.916 ## Y6 2.023 0.049 41.092 0.000 1.772 0.953 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## eta1 ~~ ## eta2 0.477 0.034 13.860 0.000 0.560 0.560 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .Y1 0.303 0.016 18.998 0.000 0.303 0.242 ## .Y2 0.310 0.023 13.396 0.000 0.310 0.126 ## .Y3 0.308 0.036 8.485 0.000 0.308 0.073 ## .Y4 0.312 0.016 18.947 0.000 0.312 0.289 ## .Y5 0.330 0.024 13.763 0.000 0.330 0.160 ## .Y6 0.314 0.037 8.526 0.000 0.314 0.091 ## eta1 0.946 0.055 17.253 0.000 1.000 1.000 ## eta2 0.767 0.047 16.315 0.000 1.000 1.000 ``` ] --- # Bayesian Confirmatory Factor Analysis By default, `lavaan` uses maximum likelihood estimation. 
---
# Bayesian Confirmatory Factor Analysis

Estimates of the loadings `\(\lambda_{ik}\)` are given under "Latent Variables", and the estimated covariance between the common factors under "Covariances". Estimates of the item intercepts `\(\beta_i\)`, together with the means of the common factors (fixed at `\(0\)`, not estimated), would appear under "Intercepts" if a mean structure were included. Estimates of the residual variances of the responses (the variances `\(\theta_{ii}\)` of `\(\epsilon_{ij}\)`) and of the common factors (the variances `\(\psi_{kk}\)` of `\(\eta_{jk}\)`) are given under "Variances".

`blavaan` borrows its syntax from `lavaan` and acts as a wrapper for Bayesian estimation with `Stan` (<img width="17px" src="assets/img/stan.png"></img>). The .stan-color[Stan object] created by `blavaan` can be inspected with `blavInspect()`. By default, the `bcfa()` function in `blavaan` uses an anchoring item for each factor for identification, so the factor loading of the first item that loads on a factor is fixed at `\(1\)`. The `bcfa()` function also uses weakly informative priors for the factor loadings and residual variances.

---
# Bayesian Confirmatory Factor Analysis

## The Bayesian way

<div class="pre-name">bcfa-run-bayes.R</div>

``` r
fit_bcfa <- bcfa(m = model, d = dat_cfa, mcmcfile = T)
summary(fit_bcfa, std = T)
```

.scroll-box-20[

]

---
# Bayesian Confirmatory Factor Analysis

We can see that the output from `blavaan` resembles that from `lavaan`, although it reports posterior means and standard deviations for the model parameters. We present the maximum likelihood estimates from `lavaan` and the Bayesian estimates from `blavaan` next to each other for comparison. Since by default `blavaan` uses rather diffuse priors and the sample size is large, the two sets of estimates are very similar:

<div class="pre-name">bcfa-cfa-comparison.R</div>

``` r
bind_cols(parameterEstimates(fit_cfa)[, 1:4],
          parameterEstimates(fit_bcfa)[, 4]) %>%
  rename(ML = est, Bayes = ...5) %>%
  knitr::kable(caption = "Frequentist (Maximum Likelihood) and Bayesian Estimated Models")
```

.scroll-box-16[

<table>
<caption>Frequentist (Maximum Likelihood) and Bayesian Estimated Models</caption>
<thead>
<tr> <th style="text-align:left;"> lhs </th> <th style="text-align:left;"> op </th> <th style="text-align:left;"> rhs </th> <th style="text-align:right;"> ML </th> <th style="text-align:right;"> Bayes </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y1 </td> <td style="text-align:right;"> 1.0000000 </td> <td style="text-align:right;"> 1.0000000 </td> </tr>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y2 </td> <td style="text-align:right;"> 1.5095511 </td> <td style="text-align:right;"> 1.5109594 </td> </tr>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y3 </td> <td style="text-align:right;"> 2.0390182 </td> <td style="text-align:right;"> 2.0417438 </td> </tr>
<tr> <td style="text-align:left;"> eta2 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y4 </td> <td style="text-align:right;"> 1.0000000 </td> <td style="text-align:right;"> 1.0000000 </td> </tr>
<tr> <td style="text-align:left;"> eta2 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y5 </td> <td style="text-align:right;"> 1.5003525 </td> <td style="text-align:right;"> 1.5015390 </td> </tr>
<tr> <td style="text-align:left;"> eta2 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y6 </td> <td style="text-align:right;"> 2.0234836 </td> <td style="text-align:right;"> 2.0262906 </td> </tr>
<tr> <td style="text-align:left;"> Y1 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y1 </td> <td style="text-align:right;"> 0.3029230 </td> <td style="text-align:right;"> 0.3044087 </td> </tr>
<tr> <td style="text-align:left;"> Y2 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y2 </td> <td style="text-align:right;"> 0.3099979 </td> <td style="text-align:right;"> 0.3117275 </td> </tr>
<tr> <td style="text-align:left;"> Y3 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y3 </td> <td style="text-align:right;"> 0.3081206 </td> <td style="text-align:right;"> 0.3088713 </td> </tr>
<tr> <td style="text-align:left;"> Y4 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y4 </td> <td style="text-align:right;"> 0.3118137 </td> <td style="text-align:right;"> 0.3137974 </td> </tr>
<tr> <td style="text-align:left;"> Y5 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y5 </td> <td style="text-align:right;"> 0.3298803 </td> <td style="text-align:right;"> 0.3325729 </td> </tr>
<tr> <td style="text-align:left;"> Y6 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y6 </td> <td style="text-align:right;"> 0.3141533 </td> <td style="text-align:right;"> 0.3137212 </td> </tr>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> eta1 </td> <td style="text-align:right;"> 0.9463280 </td> <td style="text-align:right;"> 0.9483843 </td> </tr>
<tr> <td style="text-align:left;"> eta2 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> eta2 </td> <td style="text-align:right;"> 0.7666715 </td> <td style="text-align:right;"> 0.7693802 </td> </tr>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> eta2 </td> <td style="text-align:right;"> 0.4770401 </td> <td style="text-align:right;"> 0.4776647 </td> </tr>
</tbody>
</table>

]

---
# Bayesian Confirmatory Factor Analysis

To check the settings of the underlying .stan-color[Stan program] (i.e., the number of chains, the number of warm-up iterations, and the number of post-warmup draws):

<div class="pre-name">bcfa-stan-settings.R</div>

``` r
blavInspect(fit_bcfa, "mcobj")
```

.scroll-box-20[

```
## Inference for Stan model: stanmarg.
## 3 chains, each with iter=1500; warmup=500; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=3000.
## 
##                   mean se_mean   sd    2.5%     25%     50%     75%   97.5%
## ly_sign[1]        1.51    0.00 0.03    1.45    1.49    1.51    1.53    1.58
## ly_sign[2]        2.04    0.00 0.04    1.95    2.01    2.04    2.07    2.13
## ly_sign[3]        1.50    0.00 0.04    1.43    1.47    1.50    1.53    1.58
## ly_sign[4]        2.03    0.00 0.05    1.93    1.99    2.02    2.06    2.13
## Theta_var[1]      0.30    0.00 0.02    0.28    0.29    0.30    0.31    0.34
## Theta_var[2]      0.31    0.00 0.02    0.27    0.30    0.31    0.33    0.36
## Theta_var[3]      0.31    0.00 0.04    0.24    0.28    0.31    0.33    0.38
## Theta_var[4]      0.31    0.00 0.02    0.28    0.30    0.31    0.32    0.35
## Theta_var[5]      0.33    0.00 0.02    0.29    0.32    0.33    0.35    0.38
## Theta_var[6]      0.31    0.00 0.04    0.25    0.29    0.31    0.34    0.39
## Psi_cov[1]        0.48    0.00 0.04    0.41    0.45    0.48    0.50    0.55
## Psi_var[1]        0.95    0.00 0.06    0.84    0.91    0.95    0.99    1.07
## Psi_var[2]        0.77    0.00 0.05    0.68    0.74    0.77    0.80    0.87
## log_lik[1]     -102.38    0.07 2.50 -108.12 -103.82 -102.09 -100.59  -98.44
## log_lik_sat[1]   12.74    0.07 2.50    8.80   10.95   12.45   14.18   18.48
## ppp               0.28    0.01 0.45    0.00    0.00    0.00    1.00    1.00
## lp__           -128.89    0.07 2.50 -134.58 -130.35 -128.58 -127.11 -124.94
##                n_eff Rhat
## ly_sign[1]      2351    1
## ly_sign[2]      2358    1
## ly_sign[3]      2488    1
## ly_sign[4]      2243    1
## Theta_var[1]    4151    1
## Theta_var[2]    3014    1
## Theta_var[3]    2941    1
## Theta_var[4]    3170    1
## Theta_var[5]    2492    1
## Theta_var[6]    2634    1
## Psi_cov[1]      2405    1
## Psi_var[1]      2422    1
## Psi_var[2]      2192    1
## log_lik[1]      1318    1
## log_lik_sat[1]  1318    1
## ppp             2367    1
## lp__            1315    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Nov 27 05:40:22 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
```

]

---
# Bayesian Confirmatory Factor Analysis

We see that there were `\(3\)` chains with `\(1500\)` iterations each, the first `\(500\)` of which were warmup, giving `\(1000\)` post-warmup draws per chain and `\(3000\)` in total. In the .stan-color[Stan] output above, `se_mean` is the Monte Carlo error that arises because we use a finite number of posterior draws to empirically estimate posterior expectations. These Monte Carlo errors are given by `\(\frac{\mathrm{sd}}{\sqrt{n_{\mathrm{eff}}}}\)`, where `\(\mathrm{sd}\)` refers to the posterior standard deviation (i.e., `Post.SD` in the `blavaan` summary), which can be viewed as the Bayesian counterpart of a frequentist standard error. Because of autocorrelation among the draws within chains, the effective sample size `\(n_{\mathrm{eff}}\)` used to estimate the Monte Carlo errors is smaller than the number of post-warmup draws; it can be viewed as the number of independent draws that would give the same amount of information in terms of Monte Carlo error. The Monte Carlo errors go to zero as the number of draws goes to infinity.

`Rhat` (i.e., `\(\hat{R}\)`) quantifies how well the different chains mix with each other (overlap) after warmup, with values close to `\(1\)` (conventionally below `\(1.1\)`) indicating adequate mixing. The purpose of running multiple chains with different starting values for the parameters is to assess whether the chains have reached the stationary distribution, i.e., the target posterior distribution, by checking that they converge to the same distribution and hence mix with each other.
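---
# Bayesian Confirmatory Factor Analysis

Beyond the printed table, the same convergence diagnostics can be requested directly from the fitted `blavaan` object; a minimal sketch (the parameter indices passed to `pars` are only illustrative):

``` r
# effective sample sizes and potential scale reduction factors (Rhat)
blavInspect(fit_bcfa, "neff")
blavInspect(fit_bcfa, "psrf")

# trace plots for the first four free parameters
plot(fit_bcfa, pars = 1:4, plot.type = "trace")
```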
---
# Bayesian Confirmatory Factor Analysis

One benefit of running a Bayesian analysis is that we can incorporate prior information when we have it. The `blavaan` package allows users to specify priors for all the model parameters via the argument `dp`, and uses fairly diffuse default priors (with large variances) if this argument is left blank. When such diffuse priors are used for all parameters, the Bayesian estimates will be similar to their maximum likelihood counterparts.

To check the default priors:

<div class="pre-name">bcfa-priors.R</div>

``` r
(default_prior <- dpriors())
```

```
##                nu             alpha            lambda              beta 
##    "normal(0,32)"    "normal(0,10)"    "normal(0,10)"    "normal(0,10)" 
##             theta               psi               rho             ibpsi 
## "gamma(1,.5)[sd]" "gamma(1,.5)[sd]"       "beta(1,1)"  "wishart(3,iden)" 
##               tau 
##   "normal(0,1.5)" 
```

---
# Bayesian Confirmatory Factor Analysis

To change a prior, we simply substitute the desired one: for example, to change the prior for `beta` from `\(\mathcal{N}(0, 10)\)` to `\(\mathcal{N}(0, 1)\)`, we form a new prior vector:

<div class="pre-name">bcfa-new-prior.R</div>

``` r
(new_prior <- dpriors(beta = "normal(0, 1)"))
```

.scroll-box-20[

```
##                nu             alpha            lambda              beta 
##    "normal(0,32)"    "normal(0,10)"    "normal(0,10)"    "normal(0, 1)" 
##             theta               psi               rho             ibpsi 
## "gamma(1,.5)[sd]" "gamma(1,.5)[sd]"       "beta(1,1)"  "wishart(3,iden)" 
##               tau 
##   "normal(0,1.5)" 
```

]

---
# Bayesian Confirmatory Factor Analysis

We then refit the model with `bcfa()`, using the updated prior for `beta`:

<div class="pre-name">bcfa-new-prior-run.R</div>

``` r
fit_bcfa_new_prior <- bcfa(m = model, data = dat_cfa, dp = new_prior)
summary(fit_bcfa_new_prior, std = T)
```

.scroll-box-16[

]

---
# References

Merkle, E. C. and Y. Rosseel (2018). "blavaan: Bayesian Structural Equation Models via Parameter Expansion". In: _Journal of Statistical Software_ 85 (4), pp. 1-30. ISSN: 1548-7660. DOI: [10.18637/jss.v085.i04](https://doi.org/10.18637%2Fjss.v085.i04). URL: [http://www.jstatsoft.org/v85/i04/](http://www.jstatsoft.org/v85/i04/).

---
class: center, bottom, inverse

# More info

--

Slides created with the <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> package [`xaringan`](https://github.com/yihui/xaringan).
-- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> <g label="icon" id="layer6" groupmode="layer"> <path id="path2" d="M 132.62426,316.69067 C 119.2805,301.94483 112.56962,274.5073 112.56962,234.39862 v -54.79191 c 0,-37.32217 -5.81677,-63.58084 -17.532347,-78.83466 -11.6757,-15.293118 -31.159702,-22.922596 -58.353466,-22.922596 -5.958581,0 -11.409226,0.22492 -16.45319,0.5917 -5.04455,0.427121 -9.742846,1.037046 -14.1564111,1.83092 V 95.057199 H 16.671281 c 12.325533,0 20.908335,3.82414 25.667559,11.532201 4.77973,7.74964 7.139712,25.48587 7.139712,53.14663 v 68.01321 c 0,42.12298 13.016861,74.19672 39.233939,96.16314 19.627549,16.47424 46.636229,27.23363 81.030059,32.40064 v -20.17708 c -16.3928,-4.27176 -29.04346,-10.51565 -37.11829,-19.44413 z m 246.75144,0 c 13.34377,-14.74584 20.05466,-42.18337 20.05466,-82.29205 v -54.79191 c 0,-37.32217 5.81673,-63.58084 17.53235,-78.83466 11.67568,-15.293118 31.15971,-22.922596 58.35348,-22.922596 5.95858,0 11.40922,0.22492 16.45315,0.5917 5.04457,0.427121 9.74287,1.037046 14.15645,1.83092 v 14.785125 h -10.59712 c -12.32549,0 -20.90826,3.82414 -25.66752,11.532201 -4.77974,7.74964 -7.13972,25.48587 -7.13972,53.14663 v 68.01321 c 0,42.12298 -13.01688,74.19672 -39.23394,96.16314 -19.6275,16.47424 -46.63622,27.23363 -81.03006,32.40064 v -20.17708 c 16.39279,-4.27176 29.04347,-10.51565 37.11827,-19.44413 z M 303.95857,87.165762 c 8.42049,-6.691524 25.52576,-10.536158 51.23486,-11.492333 V 63.999997 H 156.80716 v 11.673432 c 26.1755,0.956175 43.38268,4.800809 51.68248,11.492333 8.31852,6.73139 12.40691,20.033568 12.40691,39.904818 V 384.6851 c 0,20.80641 -4.08839,34.5146 -12.40691,41.02332 -8.2998,6.56905 -25.50698,10.10729 -51.68248,10.65744 V 448 h 197.71597 l 0.67087,-11.63414 c -25.50471,-0.54955 -42.56835,-4.35266 -51.07201,-11.40918 -8.4182,-6.95638 -12.73153,-20.44184 -12.73153,-40.27158 V 127.07058 c 0,-19.87125 4.16983,-33.173428 12.56922,-39.904818 z" style="stroke-width:0.0753388"></path> </g></svg> + <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> = <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:red;"> [ comment ] <path d="M462.3 62.6C407.5 15.9 326 24.3 275.7 76.2L256 96.5l-19.7-20.3C186.1 24.3 104.5 15.9 49.7 62.6c-62.8 53.6-66.1 149.8-9.9 207.9l193.5 199.8c12.5 12.9 32.8 12.9 45.3 0l193.5-199.8c56.3-58.1 53-154.3-9.8-207.9z"></path></svg> -- <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 
14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> has infinite possibilities. -- Practice is the best strategy for learning. -- . -- _In God we trust, all others bring data_ -- Edwards Deming -- . -- . -- . -- THE END --- class: center, bottom, inverse 