class: center, middle, inverse, title-slide

.title[
# Computational Psychometrics Applied to Engineering
]
.subtitle[
## Bayesian Structural Equation Modeling
]
.author[
### Jorge Sinval
]
.date[
### 2024-11-27
]

---
class: inverse, center, middle

# Bayesian Confirmatory Factor Analysis

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>

<style>
.orange { color: #EB811B; }
.stan-color { color: #b2001d; }
.r-color { color: #2167B9; }
.kbd { display: inline-block; padding: .2em .5em; font-size: 0.75em; line-height: 1.75; color: #555; vertical-align: middle; background-color: #fcfcfc; border: solid 1px #ccc; border-bottom-color: #bbb; border-radius: 3px; box-shadow: inset 0 -1px 0 #bbb }
</style>

???
Slide content is based on this [document](https://mc-stan.org/users/documentation/case-studies/sem.html).

---
# Bayesian Confirmatory Factor Analysis

.pull-left-1[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Statistics is like building a bridge. <a href="https://t.co/E0UURWR4zy">pic.twitter.com/E0UURWR4zy</a></p>&mdash; Eric Novik (@ericnovik) <a href="https://twitter.com/ericnovik/status/1774512058836402645?ref_src=twsrc%5Etfw">March 31, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]
.pull-right-2[
<br>
<br>
<br>
<div class="figure" style="text-align: center">
<img src="assets/img/stats_is_like_bridge.png" alt="Statistics is like building a bridge." width="100%" />
<p class="caption">Statistics is like building a bridge.</p>
</div>
]

---
# Bayesian Confirmatory Factor Analysis

As a measurement model, and probably one of the most popular special cases of a SEM, CFA is often used to 1) validate a hypothesized factor structure among multiple variables, 2) estimate the correlations between factors, and 3) obtain factor scores.

For example, consider a two-factor `\(\left(\eta_{1j}, \eta_{2j}\right)\)` model in which each factor is measured by three of the six items `\(\left(y_{1j},\dots, y_{6j}\right)\)` for person `\(j\)`:

`$$\underbrace{\left[\begin{array}{l} y_{1 j} \\ y_{2 j} \\ y_{3 j} \\ y_{4 j} \\ y_{5 j} \\ y_{6 j} \end{array}\right]}_{\boldsymbol{y}_{j}}=\underbrace{\left[\begin{array}{c} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \beta_{4} \\ \beta_{5} \\ \beta_{6} \end{array}\right]}_{\boldsymbol{\beta}}+\underbrace{\left[\begin{array}{cc} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{array}\right]}_{\Lambda}\underbrace{\left[\begin{array}{l} \eta_{1 j} \\ \eta_{2 j} \end{array}\right]}_{\boldsymbol{\eta}_j}+\underbrace{\left[\begin{array}{c} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{array}\right]}_{\boldsymbol{\epsilon}_j}\\ \boldsymbol{y}_{j}=\boldsymbol{\beta}+\boldsymbol{\Lambda}\boldsymbol{\eta}_{j}+\boldsymbol{\epsilon}_{j}\\ \boldsymbol{\epsilon}_{j} \sim N_{I}(\mathbf{0}, \mathbf{\Theta}) \\ \boldsymbol{\eta}_{j} \sim N_{K}(\mathbf{0}, \boldsymbol{\Psi})$$`

Here the number of items (observed variables) is `\(I=6\)`, the number of factors is `\(K=2\)`, and `\(\mathbf{\Theta}\)` is often assumed to be a diagonal matrix.
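---
# Bayesian Confirmatory Factor Analysis

From these assumptions it follows (a standard CFA identity, spelled out here for reference) that the model-implied mean vector and covariance matrix of the responses are

`$$\mathbb{E}(\boldsymbol{y}_{j})=\boldsymbol{\beta}, \qquad \mathbb{Cov}(\boldsymbol{y}_{j})=\boldsymbol{\Sigma}=\boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}^{\top}+\mathbf{\Theta},$$`

so the `\(I \times I\)` covariance matrix of the items is fully determined by the loadings `\(\boldsymbol{\Lambda}\)`, the factor covariance matrix `\(\boldsymbol{\Psi}\)`, and the residual covariance matrix `\(\mathbf{\Theta}\)`. This implied mean and covariance structure underlies the multivariate normal likelihood used by both `lavaan` and `blavaan`.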
---
# Bayesian Confirmatory Factor Analysis

In this model, `\(y_{ij}\)` is the response of person `\(j\)` `\(\left(j=1,\dots,J\right)\)` on item `\(i\)` `\(\left(i=1,\dots,I\right)\)`, `\(\beta_i\)` is the intercept for item `\(i\)`, `\(\eta_{jk}\)` is the `\(k\)`th common factor for person `\(j\)`, `\(\lambda_{ik}\)` is the factor loading of item `\(i\)` on factor `\(k\)`, `\(\epsilon_{ij}\)` is the random error term for person `\(j\)` on item `\(i\)`, `\(\boldsymbol{\Psi}\)` is the variance-covariance matrix of the common factors `\(\boldsymbol{\eta}_j\)`, and `\(\mathbf{\Theta}\)` is the variance-covariance matrix of the residuals (or unique factors) `\(\boldsymbol{\epsilon}_j\)`.

---
# Bayesian Confirmatory Factor Analysis

Suppose the errors or residuals `\(\epsilon_{ij}\)` are independent of each other. Then `\(\psi_{kk}\)` is the variance of the `\(k\)`th factor, `\(\psi_{kk^\prime}\)` is the covariance between the `\(k\)`th and `\(k^\prime\)`th factors, `\(\theta_{ii}\)` is the variance of the `\(i\)`th residual, and `\(\theta_{ii^\prime}=0\)` for `\(i\neq i^\prime\)`. Specifically, the factor and residual covariance matrices are:

`$$\boldsymbol{\Psi}=\mathbb{Cov}\begin{pmatrix}\eta_{1j} \\ \eta_{2j} \end{pmatrix}= \left[\begin{array}{cc} \psi_{1 1}&\psi_{1 2} \\ \psi_{2 1}&\psi_{2 2} \end{array}\right]$$`

`$$\mathbf{\Theta}=\mathbb{Cov}\begin{pmatrix} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{pmatrix}= \left[\begin{array}{cccccc} \theta_{1 1} &0&0&0&0&0\\ 0&\theta_{2 2}&0&0&0&0 \\ 0&0&\theta_{3 3}&0&0&0 \\ 0&0&0&\theta_{4 4}&0&0 \\ 0&0&0&0&\theta_{5 5}&0 \\ 0&0&0&0&0&\theta_{6 6} \end{array}\right]$$`

---
# Bayesian Confirmatory Factor Analysis

To better illustrate the use of `blavaan` (Merkle and Rosseel, 2018), we simulate data so that we know the data-generating parameters. In our simulation, we set `\(\beta_i=0\)` for all `\(i\)`, `\(\psi_{11}=1\)`, `\(\psi_{22}=.8\)`, `\(\psi_{12}=\psi_{21}=.5\)`, `\(\lambda_{21}=1.5\)`, `\(\lambda_{31}=2\)`, `\(\lambda_{52}=1.5\)`, `\(\lambda_{62}=2\)`, and `\(\theta_{ii}=.3\)`. We simulate data from this model for `\(J=1000\)` persons.
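---
# Bayesian Confirmatory Factor Analysis

As a quick check, we can plug these true values into `\(\boldsymbol{\Sigma}=\boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}^{\top}+\mathbf{\Theta}\)` and later compare the result with the sample covariance of the simulated data. This is a minimal sketch; the `*_true` objects are introduced only for this illustration.

``` r
# true parameter values from the previous slide (note psi_22 = .8)
Lambda_true <- cbind(c(1, 1.5, 2, 0, 0, 0),
                     c(0, 0, 0, 1, 1.5, 2))
Psi_true    <- matrix(c(1, .5, .5, .8), nrow = 2)
Theta_true  <- diag(.3, nrow = 6)

# model-implied covariance matrix of the six items
Sigma_true <- Lambda_true %*% Psi_true %*% t(Lambda_true) + Theta_true
round(Sigma_true, 2)
```

With `\(J=1000\)` simulated persons, `cov(dat_cfa)` (available once the data are generated on the next slide) should be close to `Sigma_true`.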
<div class="pre-name">bcfa_sim_data.R</div> ``` r pacman::p_load(rstan, lavaan, blavaan, MASS, mvtnorm, tidyverse, semPlot, Matrix) options(mc.cores = parallel::detectCores()) J <- 1000; I <- 6; K <- 2 psi <- matrix(c(1, 0.5, 0.5, 0.8), nrow = K) beta <- seq(1, 2, by = .2) # loading matrix Lambda <- cbind(c(1, 1.5, 2, 0, 0, 0), c(0, 0, 0, 1, 1.5, 2)) # error covariance Theta <- diag(0.3, nrow = I) # factor scores eta <- mvrnorm(J, mu = c(0, 0), Sigma = psi) # error term epsilon <- mvrnorm(J, mu = rep(0, ncol(Theta)),Sigma = Theta) dat <- tcrossprod(eta, Lambda) + epsilon dat_cfa <- dat |> as.data.frame() |> setNames(c("Y1", "Y2", "Y3", "Y4", "Y5", "Y6")) ``` --- # Bayesian Confirmatory Factor Analysis We define the model for lavaan as follows: <div class="pre-name">bcfa-model.R</div> ``` r model <- ' eta1 =~ Y1 + Y2 + Y3 eta2 =~ Y4 + Y5 + Y6' ``` Two latent variables `eta1` and `eta2` are specified to be measured by three items each, denoted as `eta1 =~ Y1 + Y2 + Y3`, and similary for `eta2`. By not specifying other parts of the model, by default we assume that the error terms for items are uncorrelated with each other while the covariances between latent variables are free to be estimated. We represent the CFA model in a path diagram and then fit the model by maximum likelihood estimation using the cfa function in the lavaan package. By convention, latent variables `\(\eta_1\)` and `\(\eta_2\)` are represented by circles, and observed variables `\(Y_1\)` to `\(Y_6\)` by rectangles. Straight arrows represent linear relations (here with coefficients given by the factor loadings `\(\lambda\)`), and double-headed arrows represent variances and covariances. We could make the diagram by simply using the function call `semPaths(semPlotModel_lavaanModel(model))`. --- # Bayesian Confirmatory Factor Analysis Below is the more complex syntax to display Greek letters, subscripts, etc. 
<div class="pre-name">bcfa-diagram.R</div> ``` r fit <- semPlotModel_lavaanModel(model,auto.var = TRUE, auto.fix.first = TRUE, auto.cov.lv.x = TRUE) semPaths(fit, what = "paths", whatLabels = "par",edge.color = "black", nodeLabels = c(expression(paste(Y[1])),expression(paste(Y[2])),expression(paste(Y[3])), expression(paste(Y[4])),expression(paste(Y[5])),expression(paste(Y[6])), expression(paste(eta[1])),expression(paste(eta[2]))), edge.label.cex = 0.8, edgeLabels = c(expression(paste(lambda[1])),expression(paste(lambda[2])),expression(paste(lambda[3])), expression(paste(lambda[4])),expression(paste(lambda[5])),expression(paste(lambda[6])), expression(paste(epsilon[1])),expression(paste(epsilon[2])),expression(paste(epsilon[3])), expression(paste(epsilon[4])),expression(paste(epsilon[5])),expression(paste(epsilon[6])), expression(paste(psi[1])),expression(paste(psi[2])), expression(paste(Psi[12])))) ``` --- # Bayesian Confirmatory Factor Analysis <img src="data:image/png;base64,#slides19ofx_files/figure-html/bcfa-diagram-1.png" width="100%" height="99%" /> --- # Bayesian Confirmatory Factor Analysis ## The frequentist way <div class="pre-name">bcfa-run.R</div> ``` r fit_cfa <- cfa(m=model,d = dat_cfa) summary(fit_cfa, fit.m = T, std=T) ``` .scroll-box-20[ ``` ## lavaan 0.6-19 ended normally after 28 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 13 ## ## Number of observations 1000 ## ## Model Test User Model: ## ## Test statistic 12.513 ## Degrees of freedom 8 ## P-value (Chi-square) 0.130 ## ## Model Test Baseline Model: ## ## Test statistic 5828.783 ## Degrees of freedom 15 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.999 ## Tucker-Lewis Index (TLI) 0.999 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -7908.490 ## Loglikelihood unrestricted model (H1) -7902.233 ## ## Akaike (AIC) 15842.979 ## Bayesian (BIC) 15906.780 ## Sample-size adjusted Bayesian (SABIC) 15865.491 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.024 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.048 ## P-value H_0: RMSEA <= 0.050 0.966 ## P-value H_0: RMSEA >= 0.080 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.008 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## eta1 =~ ## Y1 1.000 0.973 0.870 ## Y2 1.510 0.034 44.782 0.000 1.468 0.935 ## Y3 2.039 0.043 47.118 0.000 1.984 0.963 ## eta2 =~ ## Y4 1.000 0.876 0.843 ## Y5 1.500 0.038 39.087 0.000 1.314 0.916 ## Y6 2.023 0.049 41.092 0.000 1.772 0.953 ## ## Covariances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## eta1 ~~ ## eta2 0.477 0.034 13.860 0.000 0.560 0.560 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) Std.lv Std.all ## .Y1 0.303 0.016 18.998 0.000 0.303 0.242 ## .Y2 0.310 0.023 13.396 0.000 0.310 0.126 ## .Y3 0.308 0.036 8.485 0.000 0.308 0.073 ## .Y4 0.312 0.016 18.947 0.000 0.312 0.289 ## .Y5 0.330 0.024 13.763 0.000 0.330 0.160 ## .Y6 0.314 0.037 8.526 0.000 0.314 0.091 ## eta1 0.946 0.055 17.253 0.000 1.000 1.000 ## eta2 0.767 0.047 16.315 0.000 1.000 1.000 ``` ] --- # Bayesian Confirmatory Factor Analysis By default, `lavaan` uses maximum likelihood estimation. 
---
# Bayesian Confirmatory Factor Analysis

Estimates of the loadings `\(\lambda_{ik}\)` are given under "Latent Variables", and the estimated covariance between the common factors under "Covariances". Estimates of the item intercepts `\(\beta_i\)`, together with the means of the common factors (fixed at `\(0\)`, not estimated), would appear under "Intercepts" if a mean structure were included. Estimates of the residual variances of the responses (the variances `\(\theta_{ii}\)` of `\(\epsilon_{ij}\)`) and of the common factors (the variances `\(\psi_{kk}\)` of `\(\eta_{jk}\)`) are given under "Variances".

`blavaan` borrows its syntax from `lavaan` and acts as a wrapper for Bayesian estimation with `Stan` (<img width="17px" src="assets/img/stan.png"></img>). The .stan-color[Stan object] created by `blavaan` can be inspected with `blavInspect()`. By default, the `bcfa()` function in `blavaan` uses an anchoring item for each factor for identification, so the factor loading of the first item that loads on a factor is fixed at `\(1\)`. The `bcfa()` function also uses weakly informative priors for the factor loadings and residual variances.

---
# Bayesian Confirmatory Factor Analysis

## The Bayesian way

<div class="pre-name">bcfa-run-bayes.R</div>

``` r
fit_bcfa <- bcfa(m = model, d = dat_cfa, mcmcfile = T)
summary(fit_bcfa, std = T)
```

.scroll-box-20[

]

---
# Bayesian Confirmatory Factor Analysis

We can see that the output from `blavaan` resembles that from `lavaan`, although it reports posterior means and standard deviations for the model parameters. We present the maximum likelihood estimates from `lavaan` and the Bayesian estimates from `blavaan` next to each other for comparison. Since by default `blavaan` uses rather diffuse priors and the sample size is large, the two sets of estimates are very similar:

<div class="pre-name">bcfa-cfa-comparison.R</div>

``` r
bind_cols(parameterEstimates(fit_cfa)[, 1:4],
          parameterEstimates(fit_bcfa)[, 4]) %>%
  rename(ML = est, Bayes = ...5) %>%
  knitr::kable(caption = "Frequentist (Maximum Likelihood) and Bayesian Estimated Models")
```

.scroll-box-16[

<table>
<caption>Frequentist (Maximum Likelihood) and Bayesian Estimated Models</caption>
<thead>
<tr> <th style="text-align:left;"> lhs </th> <th style="text-align:left;"> op </th> <th style="text-align:left;"> rhs </th> <th style="text-align:right;"> ML </th> <th style="text-align:right;"> Bayes </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y1 </td> <td style="text-align:right;"> 1.0000000 </td> <td style="text-align:right;"> 1.0000000 </td> </tr>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y2 </td> <td style="text-align:right;"> 1.5095511 </td> <td style="text-align:right;"> 1.5109594 </td> </tr>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y3 </td> <td style="text-align:right;"> 2.0390182 </td> <td style="text-align:right;"> 2.0417438 </td> </tr>
<tr> <td style="text-align:left;"> eta2 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y4 </td> <td style="text-align:right;"> 1.0000000 </td> <td style="text-align:right;"> 1.0000000 </td> </tr>
<tr> <td style="text-align:left;"> eta2 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y5 </td> <td style="text-align:right;"> 1.5003525 </td> <td style="text-align:right;"> 1.5015390 </td> </tr>
<tr> <td style="text-align:left;"> eta2 </td> <td style="text-align:left;"> =~ </td> <td style="text-align:left;"> Y6 </td> <td style="text-align:right;"> 2.0234836 </td> <td style="text-align:right;"> 2.0262906 </td> </tr>
<tr> <td style="text-align:left;"> Y1 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y1 </td> <td style="text-align:right;"> 0.3029230 </td> <td style="text-align:right;"> 0.3044087 </td> </tr>
<tr> <td style="text-align:left;"> Y2 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y2 </td> <td style="text-align:right;"> 0.3099979 </td> <td style="text-align:right;"> 0.3117275 </td> </tr>
<tr> <td style="text-align:left;"> Y3 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y3 </td> <td style="text-align:right;"> 0.3081206 </td> <td style="text-align:right;"> 0.3088713 </td> </tr>
<tr> <td style="text-align:left;"> Y4 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y4 </td> <td style="text-align:right;"> 0.3118137 </td> <td style="text-align:right;"> 0.3137974 </td> </tr>
<tr> <td style="text-align:left;"> Y5 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y5 </td> <td style="text-align:right;"> 0.3298803 </td> <td style="text-align:right;"> 0.3325729 </td> </tr>
<tr> <td style="text-align:left;"> Y6 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> Y6 </td> <td style="text-align:right;"> 0.3141533 </td> <td style="text-align:right;"> 0.3137212 </td> </tr>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> eta1 </td> <td style="text-align:right;"> 0.9463280 </td> <td style="text-align:right;"> 0.9483843 </td> </tr>
<tr> <td style="text-align:left;"> eta2 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> eta2 </td> <td style="text-align:right;"> 0.7666715 </td> <td style="text-align:right;"> 0.7693802 </td> </tr>
<tr> <td style="text-align:left;"> eta1 </td> <td style="text-align:left;"> ~~ </td> <td style="text-align:left;"> eta2 </td> <td style="text-align:right;"> 0.4770401 </td> <td style="text-align:right;"> 0.4776647 </td> </tr>
</tbody>
</table>

]

---
# Bayesian Confirmatory Factor Analysis

To check the settings of the underlying .stan-color[Stan program] (i.e., the number of chains, the number of warm-up iterations, and the number of post-warmup draws):

<div class="pre-name">bcfa-stan-settings.R</div>

``` r
blavInspect(fit_bcfa, "mcobj")
```

.scroll-box-20[

```
## Inference for Stan model: stanmarg.
## 3 chains, each with iter=1500; warmup=500; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=3000.
## 
##                   mean se_mean   sd    2.5%     25%     50%     75%   97.5%
## ly_sign[1]        1.51    0.00 0.03    1.45    1.49    1.51    1.53    1.58
## ly_sign[2]        2.04    0.00 0.04    1.95    2.01    2.04    2.07    2.13
## ly_sign[3]        1.50    0.00 0.04    1.43    1.47    1.50    1.53    1.58
## ly_sign[4]        2.03    0.00 0.05    1.93    1.99    2.02    2.06    2.13
## Theta_var[1]      0.30    0.00 0.02    0.28    0.29    0.30    0.31    0.34
## Theta_var[2]      0.31    0.00 0.02    0.27    0.30    0.31    0.33    0.36
## Theta_var[3]      0.31    0.00 0.04    0.24    0.28    0.31    0.33    0.38
## Theta_var[4]      0.31    0.00 0.02    0.28    0.30    0.31    0.32    0.35
## Theta_var[5]      0.33    0.00 0.02    0.29    0.32    0.33    0.35    0.38
## Theta_var[6]      0.31    0.00 0.04    0.25    0.29    0.31    0.34    0.39
## Psi_cov[1]        0.48    0.00 0.04    0.41    0.45    0.48    0.50    0.55
## Psi_var[1]        0.95    0.00 0.06    0.84    0.91    0.95    0.99    1.07
## Psi_var[2]        0.77    0.00 0.05    0.68    0.74    0.77    0.80    0.87
## log_lik[1]     -102.38    0.07 2.50 -108.12 -103.82 -102.09 -100.59  -98.44
## log_lik_sat[1]   12.74    0.07 2.50    8.80   10.95   12.45   14.18   18.48
## ppp               0.28    0.01 0.45    0.00    0.00    0.00    1.00    1.00
## lp__           -128.89    0.07 2.50 -134.58 -130.35 -128.58 -127.11 -124.94
##                n_eff Rhat
## ly_sign[1]      2351    1
## ly_sign[2]      2358    1
## ly_sign[3]      2488    1
## ly_sign[4]      2243    1
## Theta_var[1]    4151    1
## Theta_var[2]    3014    1
## Theta_var[3]    2941    1
## Theta_var[4]    3170    1
## Theta_var[5]    2492    1
## Theta_var[6]    2634    1
## Psi_cov[1]      2405    1
## Psi_var[1]      2422    1
## Psi_var[2]      2192    1
## log_lik[1]      1318    1
## log_lik_sat[1]  1318    1
## ppp             2367    1
## lp__            1315    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Nov 27 05:40:22 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
```

]

---
# Bayesian Confirmatory Factor Analysis

We see that there were `\(3\)` chains with `\(1500\)` iterations each, the first `\(500\)` of which were warmup, giving `\(1000\)` post-warmup draws per chain and `\(3000\)` in total. In the .stan-color[Stan] output above, `se_mean` is the Monte Carlo error that arises because we use a finite number of posterior draws to empirically estimate posterior expectations. These Monte Carlo errors are given by `\(\frac{\mathrm{sd}}{\sqrt{n_{\mathrm{eff}}}}\)`, where `\(\mathrm{sd}\)` refers to the posterior standard deviation (i.e., `Post.SD` in the `blavaan` summary), which can be viewed as the Bayesian counterpart of a frequentist standard error. Because of autocorrelation among the draws within chains, the effective sample size `\(n_{\mathrm{eff}}\)` used to estimate the Monte Carlo errors is smaller than the number of post-warmup draws; it can be viewed as the number of independent draws that would give the same amount of information in terms of Monte Carlo error. The Monte Carlo errors go to zero as the number of draws goes to infinity.

`Rhat` (i.e., `\(\hat{R}\)`) quantifies how well the different chains mix with each other (overlap) after warmup, with values close to `\(1\)` (conventionally below `\(1.1\)`) indicating adequate mixing. The purpose of running multiple chains with different starting values for the parameters is to assess whether the chains have reached the stationary distribution, i.e., the target posterior distribution, by checking that they converge to the same distribution and hence mix with each other.
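---
# Bayesian Confirmatory Factor Analysis

Beyond the printed table, the same convergence diagnostics can be requested directly from the fitted `blavaan` object; a minimal sketch (the parameter indices passed to `pars` are only illustrative):

``` r
# effective sample sizes and potential scale reduction factors (Rhat)
blavInspect(fit_bcfa, "neff")
blavInspect(fit_bcfa, "psrf")

# trace plots for the first four free parameters
plot(fit_bcfa, pars = 1:4, plot.type = "trace")
```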
---
# Bayesian Confirmatory Factor Analysis

One benefit of running a Bayesian analysis is that we can incorporate prior information when we have it. The `blavaan` package allows users to specify priors for all the model parameters via the argument `dp`, and uses fairly diffuse default priors (with large variances) if this argument is left blank. When such diffuse priors are used for all parameters, the Bayesian estimates will be similar to their maximum likelihood counterparts.

To check the default priors:

<div class="pre-name">bcfa-priors.R</div>

``` r
(default_prior <- dpriors())
```

```
##                nu             alpha            lambda              beta 
##    "normal(0,32)"    "normal(0,10)"    "normal(0,10)"    "normal(0,10)" 
##             theta               psi               rho             ibpsi 
## "gamma(1,.5)[sd]" "gamma(1,.5)[sd]"       "beta(1,1)"  "wishart(3,iden)" 
##               tau 
##   "normal(0,1.5)" 
```

---
# Bayesian Confirmatory Factor Analysis

To change a prior, we simply substitute the desired one: for example, to change the prior for `beta` from `\(\mathcal{N}(0, 10)\)` to `\(\mathcal{N}(0, 1)\)`, we form a new prior vector:

<div class="pre-name">bcfa-new-prior.R</div>

``` r
(new_prior <- dpriors(beta = "normal(0, 1)"))
```

.scroll-box-20[

```
##                nu             alpha            lambda              beta 
##    "normal(0,32)"    "normal(0,10)"    "normal(0,10)"    "normal(0, 1)" 
##             theta               psi               rho             ibpsi 
## "gamma(1,.5)[sd]" "gamma(1,.5)[sd]"       "beta(1,1)"  "wishart(3,iden)" 
##               tau 
##   "normal(0,1.5)" 
```

]

---
# Bayesian Confirmatory Factor Analysis

We then refit the model with `bcfa()`, using the updated prior for `beta`:

<div class="pre-name">bcfa-new-prior-run.R</div>

``` r
fit_bcfa_new_prior <- bcfa(m = model, data = dat_cfa, dp = new_prior)
summary(fit_bcfa_new_prior, std = T)
```

.scroll-box-16[

]

---
# References

Merkle, E. C. and Y. Rosseel (2018). "blavaan: Bayesian Structural Equation Models via Parameter Expansion". In: _Journal of Statistical Software_ 85 (4), pp. 1-30. ISSN: 1548-7660. DOI: [10.18637/jss.v085.i04](https://doi.org/10.18637%2Fjss.v085.i04). URL: [http://www.jstatsoft.org/v85/i04/](http://www.jstatsoft.org/v85/i04/).

---
class: center, bottom, inverse

# More info

--

Slides created with the <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> package [`xaringan`](https://github.com/yihui/xaringan).
-- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> <g label="icon" id="layer6" groupmode="layer"> <path id="path2" d="M 132.62426,316.69067 C 119.2805,301.94483 112.56962,274.5073 112.56962,234.39862 v -54.79191 c 0,-37.32217 -5.81677,-63.58084 -17.532347,-78.83466 -11.6757,-15.293118 -31.159702,-22.922596 -58.353466,-22.922596 -5.958581,0 -11.409226,0.22492 -16.45319,0.5917 -5.04455,0.427121 -9.742846,1.037046 -14.1564111,1.83092 V 95.057199 H 16.671281 c 12.325533,0 20.908335,3.82414 25.667559,11.532201 4.77973,7.74964 7.139712,25.48587 7.139712,53.14663 v 68.01321 c 0,42.12298 13.016861,74.19672 39.233939,96.16314 19.627549,16.47424 46.636229,27.23363 81.030059,32.40064 v -20.17708 c -16.3928,-4.27176 -29.04346,-10.51565 -37.11829,-19.44413 z m 246.75144,0 c 13.34377,-14.74584 20.05466,-42.18337 20.05466,-82.29205 v -54.79191 c 0,-37.32217 5.81673,-63.58084 17.53235,-78.83466 11.67568,-15.293118 31.15971,-22.922596 58.35348,-22.922596 5.95858,0 11.40922,0.22492 16.45315,0.5917 5.04457,0.427121 9.74287,1.037046 14.15645,1.83092 v 14.785125 h -10.59712 c -12.32549,0 -20.90826,3.82414 -25.66752,11.532201 -4.77974,7.74964 -7.13972,25.48587 -7.13972,53.14663 v 68.01321 c 0,42.12298 -13.01688,74.19672 -39.23394,96.16314 -19.6275,16.47424 -46.63622,27.23363 -81.03006,32.40064 v -20.17708 c 16.39279,-4.27176 29.04347,-10.51565 37.11827,-19.44413 z M 303.95857,87.165762 c 8.42049,-6.691524 25.52576,-10.536158 51.23486,-11.492333 V 63.999997 H 156.80716 v 11.673432 c 26.1755,0.956175 43.38268,4.800809 51.68248,11.492333 8.31852,6.73139 12.40691,20.033568 12.40691,39.904818 V 384.6851 c 0,20.80641 -4.08839,34.5146 -12.40691,41.02332 -8.2998,6.56905 -25.50698,10.10729 -51.68248,10.65744 V 448 h 197.71597 l 0.67087,-11.63414 c -25.50471,-0.54955 -42.56835,-4.35266 -51.07201,-11.40918 -8.4182,-6.95638 -12.73153,-20.44184 -12.73153,-40.27158 V 127.07058 c 0,-19.87125 4.16983,-33.173428 12.56922,-39.904818 z" style="stroke-width:0.0753388"></path> </g></svg> + <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> = <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:red;"> [ comment ] <path d="M462.3 62.6C407.5 15.9 326 24.3 275.7 76.2L256 96.5l-19.7-20.3C186.1 24.3 104.5 15.9 49.7 62.6c-62.8 53.6-66.1 149.8-9.9 207.9l193.5 199.8c12.5 12.9 32.8 12.9 45.3 0l193.5-199.8c56.3-58.1 53-154.3-9.8-207.9z"></path></svg> -- <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 
14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> has infinite possibilities. -- Practice is the best strategy for learning. -- . -- _In God we trust, all others bring data_ -- Edwards Deming -- . -- . -- . -- THE END --- class: center, bottom, inverse 