class: center, middle, inverse, title-slide .title[ # A priori distributions in Bayesian structural equation modeling ] .subtitle[ ##
A scoping review protocol ] .author[ ### Jorge Sinval
Sonja Winter
Joseph B. Kadane
Edgar C. Merkle ] .date[ ### SEM II (MAC: Johnson) | 2025-07-22 ]

---
class: inverse, center, middle

# .white[The authors]

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>
<style> .orange { color: #EB811B; } .white { color: #FFFFFF; } .red { color: #FF0000; } .green { color: #00FF00; } .stan-color { color: #b2001d; } .r-color { color: #2167B9; } .kbd { display: inline-block; padding: .2em .5em; font-size: 0.75em; line-height: 1.75; color: #555; vertical-align: middle; background-color: #fcfcfc; border: solid 1px #ccc; border-bottom-color: #bbb; border-radius: 3px; box-shadow: inset 0 -1px 0 #bbb } </style>

.white[ <br> <div style="display: flex; justify-content: space-around; align-items: flex-start; text-align: center;"> <div style="flex: 1; padding: 0 10px;"> <img src="https://i1.rgstatic.net/ii/profile.image/1104223538290690-1640278815770_Q512/Jorge-Sinval.jpg" style="height: 200px; border-radius: 50%;"> <p style="margin-top: 10px;"><b>Jorge Sinval</b><br>Nanyang Technological University</p> </div> <div style="flex: 1; padding: 0 10px;"> <img src="https://cehd.missouri.edu/wp-content/uploads/2022/04/Sonja-Winter-2023-web.jpg" style="height: 200px; border-radius: 50%;"> <p style="margin-top: 10px;"><b>Sonja Winter</b><br>University of Missouri</p> </div> <div style="flex: 1; padding: 0 10px;"> <img src="https://www.cmu.edu/dietrich/statistics-datascience/people/faculty/images/joseph-kadane-800x800-min.jpg" style="height: 200px; border-radius: 50%;"> <p style="margin-top: 10px;"><b>Joseph B.
Kadane</b><br>Carnegie Mellon University</p> </div> <div style="flex: 1; padding: 0 10px;"> <img src="https://psychology.missouri.edu/sites/default/files/styles/large/public/people-img/2023-04/merkle_2023_sm.jpg?itok=NTC-5eT1" style="height: 200px; border-radius: 50%;"> <p style="margin-top: 10px;"><b>Edgar C. Merkle</b><br>University of Missouri</p> </div> </div> ]

---
class: inverse, center, middle

# .white[Bayesian Statistics]

---

# Posterior

A Bayesian estimate is a posterior distribution over the parameters given the data, `\(Pr(parameters|data)\)`.

<center> <img src="data:image/png;base64,#assets/img/bayes_chat_theorem.jpg" width="90%" /> </center>

---

# Bayes' Rule/Theorem

Bayes' rule gives the posterior distribution `\(p(\theta|y)\)`, which represents the probability of our parameter(s) of interest `\(\left(\theta\right)\)` given the data `\(\left(y\right)\)`:

`$$p(\theta|y)=\frac{p(\theta,y)}{p(y)}=\frac{p(y|\theta)p(\theta)}{p(y)}$$`

`$$p(\theta|y) \propto p(y|\theta)p(\theta)$$`

That is, the posterior is proportional to the likelihood `\(p(y|\theta)\)` times the prior `\(p(\theta)\)`.

---
class: inverse, center, middle

# .white[Bayesian Structural Equation Modeling]

???

Slides content based on this [document](https://mc-stan.org/users/documentation/case-studies/sem.html).

---
class: middle, center

# The Rise of Bayesian SEM

.left-column[
### Challenges in Frequentist SEM
- Nonconvergence
- Heywood Cases (e.g., negative variances)
- Small Sample Sizes
- Inadmissible Solutions
]

.right-column[
### Bayesian Solutions
- A defining feature of Bayesian SEM is the use of **a priori distributions**.
- Allows integration of prior knowledge.
- More stable estimation in complex models.
]

---
class: inverse, center, middle

# .white[One Application: Bayesian Confirmatory Factor Analysis]

---

# Bayesian Confirmatory Factor Analysis

As a measurement model and probably one of the most popular special cases of SEM, CFA is often used to 1) validate a hypothesized factor structure among multiple variables, 2) estimate the correlation between factors, and 3) obtain factor scores.
For example, consider a two-factor `\(\left(\eta_{1j}, \eta_{2j}\right)\)` model in which six items `\(\left(y_{1j},\dots, y_{6j}\right)\)`, three per factor, are observed for person `\(j\)`:

`$$\underbrace{\left[\begin{array}{l} y_{1 j} \\ y_{2 j} \\ y_{3 j} \\ y_{4 j} \\ y_{5 j} \\ y_{6 j} \end{array}\right]}_{\boldsymbol{y}_{j}}=\underbrace{\left[\begin{array}{c} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \beta_{4} \\ \beta_{5} \\ \beta_{6} \end{array}\right]}_{\boldsymbol{\beta}}+\underbrace{\left[\begin{array}{cc} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{array}\right]}_{\boldsymbol{\Lambda}}\underbrace{\left[\begin{array}{l} \eta_{1 j} \\ \eta_{2 j} \end{array}\right]}_{\boldsymbol{\eta}_j}+\underbrace{\left[\begin{array}{c} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{array}\right]}_{\boldsymbol{\epsilon}_j}\\ \boldsymbol{y}_{j}=\boldsymbol{\beta}+\boldsymbol{\Lambda}\boldsymbol{\eta}_{j}+\boldsymbol{\epsilon}_{j}\\ \boldsymbol{\epsilon}_{j} \sim N_{I}(\mathbf{0}, \mathbf{\Theta}) \\ \boldsymbol{\eta}_{j} \sim N_{K}(\mathbf{0}, \boldsymbol{\Psi})$$`

Here the number of items or variables is `\(I=6\)`, the number of factors is `\(K=2\)`, and `\(\mathbf{\Theta}\)` is often assumed to be a diagonal matrix.

---

# Bayesian Confirmatory Factor Analysis

In this model:

- `\(y_{ij}\)` is the response of person `\(j\)` `\(\left(j=1,...,J\right)\)` on item `\(i\)` `\(\left(i=1,...,I\right)\)`,
- `\(\beta_i\)` is the intercept for item `\(i\)`,
- `\(\eta_{kj}\)` is the `\(k\)`th common factor for person `\(j\)`,
- `\(\lambda_{ik}\)` is the factor loading of item `\(i\)` on factor `\(k\)`,
- `\(\epsilon_{ij}\)` is the random error term for person `\(j\)` on item `\(i\)`,
- `\(\boldsymbol{\Psi}\)` is the variance-covariance matrix of the common factors `\(\boldsymbol{\eta}_j\)`, and
- `\(\mathbf{\Theta}\)` is the variance-covariance matrix of the residuals (or unique factors) `\(\boldsymbol{\epsilon}_j\)`.
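
---

# Bayesian Confirmatory Factor Analysis

If the factors and the residuals are assumed independent, the model above implies the item covariance `\(\mathbb{Cov}(\boldsymbol{y}_j)=\boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}^\top+\mathbf{\Theta}\)`. The base-R sketch below computes that matrix and compares it with the sample covariance of simulated data. All parameter values, the seed, and the sample size are illustrative assumptions of this sketch, not values taken from a fitted model.

``` r
# Illustrative parameter values (assumptions for this sketch only)
Lambda <- cbind(c(1, 1.5, 2, 0, 0, 0),   # loadings on factor 1
                c(0, 0, 0, 1, 1.5, 2))   # loadings on factor 2
Psi    <- matrix(c(1, 0.5,
                   0.5, 0.8), nrow = 2)  # factor covariance matrix
Theta  <- diag(0.3, nrow = 6)            # diagonal residual covariance matrix

# Model-implied covariance of the six items: Lambda Psi Lambda' + Theta
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta

# Monte Carlo check using base R only
set.seed(1)
J   <- 20000
eta <- matrix(rnorm(J * 2), J, 2) %*% chol(Psi)    # rows are draws from N(0, Psi)
eps <- matrix(rnorm(J * 6, sd = sqrt(0.3)), J, 6)  # independent residuals
y   <- eta %*% t(Lambda) + eps                     # intercepts set to zero

# Largest absolute gap between sample and implied covariance
max_abs_diff <- max(abs(cov(y) - Sigma))
round(Sigma, 2)
max_abs_diff
```

The gap should shrink as `J` grows. Priors in a Bayesian CFA are placed on the free elements of `\(\boldsymbol{\Lambda}\)`, `\(\boldsymbol{\Psi}\)`, and `\(\mathbf{\Theta}\)` that generate this matrix.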

---

# Bayesian Confirmatory Factor Analysis

Suppose the errors or residuals `\(\epsilon_{ij}\)` are independent of each other. Then:

- `\(\psi_{kk}\)` is the variance for the `\(k\)`th factor,
- `\(\psi_{kk^\prime}\)` is the covariance between the `\(k\)`th and `\(k^\prime\)`th factors,
- `\(\theta_{ii}\)` is the variance for the `\(i\)`th residual, and
- `\(\theta_{ii^\prime}=0\)` for `\(i\neq i^\prime\)`.

Specifically, the two covariance matrices can be written as:

`$$\boldsymbol{\Psi}=\mathbb{Cov}\begin{pmatrix}\eta_{1j} \\ \eta_{2j} \end{pmatrix}= \left[\begin{array}{cc} \psi_{1 1}&\psi_{1 2} \\ \psi_{2 1}&\psi_{2 2} \end{array}\right]$$`

`$$\mathbf{\Theta}=\mathbb{Cov}\begin{pmatrix} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{pmatrix}= \left[\begin{array}{cccccc} \theta_{1 1} &0&0&0&0&0\\ 0&\theta_{2 2}&0&0&0&0 \\ 0&0&\theta_{3 3}&0&0&0 \\ 0&0&0&\theta_{4 4}&0&0 \\ 0&0&0&0&\theta_{5 5}&0 \\ 0&0&0&0&0&\theta_{6 6} \end{array}\right]$$`

---
exclude: true

# Bayesian Confirmatory Factor Analysis

To better illustrate the use of `blavaan` (Merkle and Rosseel, 2018), we simulate data so that we know the data-generating parameters. In our simulation, we set `\(\beta_i=0\)` for all `\(i\)`, `\(\psi_{11}=1\)`, `\(\psi_{22}=.8\)`, `\(\psi_{12}=\psi_{21}=.5\)`, `\(\lambda_{21}=1.5\)`, `\(\lambda_{31}=2\)`, `\(\lambda_{52}=1.5\)`, `\(\lambda_{62}=2\)`, and `\(\theta_{ii}=.3\)`. We simulate data from the above model for `\(J=1000\)` units.
Let's start by loading <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> libraries and simulating the data.

<div class="pre-name">bcfa_sim_data.R</div>

``` r
pacman::p_load(rstan, lavaan, blavaan, MASS, mvtnorm, tidyverse, semPlot, Matrix)
options(mc.cores = parallel::detectCores())

# number of persons, items, and factors
J <- 1000; I <- 6; K <- 2
# factor covariance matrix
psi <- matrix(c(1, 0.5, 0.5, 0.8), nrow = K)
# intercepts (defined here but not added to dat below, so the items have mean zero)
beta <- seq(1, 2, by = .2)
# loading matrix
Lambda <- cbind(c(1, 1.5, 2, 0, 0, 0), c(0, 0, 0, 1, 1.5, 2))
# error covariance
Theta <- diag(0.3, nrow = I)
# factor scores
eta <- mvrnorm(J, mu = c(0, 0), Sigma = psi)
# error term
epsilon <- mvrnorm(J, mu = rep(0, ncol(Theta)), Sigma = Theta)

# observed responses: eta %*% t(Lambda) + epsilon
dat <- tcrossprod(eta, Lambda) + epsilon
dat_cfa <- dat |>
  as.data.frame() |>
  setNames(c("Y1", "Y2", "Y3", "Y4", "Y5", "Y6"))
```

---
exclude: true

# Bayesian Confirmatory Factor Analysis

We define the model for lavaan as follows:

<div class="pre-name">bcfa-model.R</div>

``` r
model <- '
eta1 =~ Y1 + Y2 + Y3
eta2 =~ Y4 + Y5 + Y6'
```

Two latent variables, `eta1` and `eta2`, are each specified to be measured by three items, written as `eta1 =~ Y1 + Y2 + Y3` and similarly for `eta2`.
Because no other parts of the model are specified, the defaults apply: the error terms for items are assumed to be uncorrelated with each other, while the covariance between the latent variables is free to be estimated. We represent the CFA model in a path diagram and then fit the model by maximum likelihood estimation using the `cfa` function in the `lavaan` package. By convention, latent variables `\(\eta_1\)` and `\(\eta_2\)` are represented by circles, and observed variables `\(Y_1\)` to `\(Y_6\)` by rectangles. Straight arrows represent linear relations (here with coefficients given by the factor loadings `\(\lambda\)`), and double-headed arrows represent variances and covariances. We could make the diagram by simply using the function call `semPaths(semPlotModel_lavaanModel(model))`.

---
exclude: true

# Bayesian Confirmatory Factor Analysis

Below is the more complex syntax to display Greek letters, subscripts, etc.

<div class="pre-name">bcfa-diagram.R</div>

``` r
fit <- semPlotModel_lavaanModel(model, auto.var = TRUE,
                                auto.fix.first = TRUE, auto.cov.lv.x = TRUE)
semPaths(fit, what = "paths", whatLabels = "par", edge.color = "black",
         # node labels: six items, then two factors
         nodeLabels = c(expression(paste(Y[1])), expression(paste(Y[2])), expression(paste(Y[3])),
                        expression(paste(Y[4])), expression(paste(Y[5])), expression(paste(Y[6])),
                        expression(paste(eta[1])), expression(paste(eta[2]))),
         edge.label.cex = 0.8,
         # edge labels: loadings, residuals, factor variances, factor covariance
         edgeLabels = c(expression(paste(lambda[1])), expression(paste(lambda[2])), expression(paste(lambda[3])),
                        expression(paste(lambda[4])), expression(paste(lambda[5])), expression(paste(lambda[6])),
                        expression(paste(epsilon[1])), expression(paste(epsilon[2])), expression(paste(epsilon[3])),
                        expression(paste(epsilon[4])), expression(paste(epsilon[5])), expression(paste(epsilon[6])),
                        expression(paste(psi[1])), expression(paste(psi[2])),
                        expression(paste(Psi[12]))))
```

---

# Bayesian Confirmatory Factor Analysis

<img src="data:image/png;base64,#imps2025_priors_protocol_files/figure-html/bcfa-diagram-1.png" width="100%" height="99%" />
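
---

# Bayesian Confirmatory Factor Analysis

For a two-factor CFA like the one in the diagram, prior choices can be made explicit in `blavaan` rather than left at their defaults. The sketch below is a hedged illustration only: the simulated data, the seed, the sample size, the `normal(1, 0.5)` loading prior, and the short chain lengths are all assumptions chosen for this example, and the Bayesian fit is skipped when `blavaan` is not installed.

``` r
# Simulate a small data set with base R (all values are illustrative assumptions)
set.seed(2025)
J      <- 300
Lambda <- cbind(c(1, 1.5, 2, 0, 0, 0), c(0, 0, 0, 1, 1.5, 2))
Psi    <- matrix(c(1, 0.5, 0.5, 0.8), nrow = 2)
eta    <- matrix(rnorm(J * 2), J, 2) %*% chol(Psi)
eps    <- matrix(rnorm(J * 6, sd = sqrt(0.3)), J, 6)
dat_cfa <- setNames(as.data.frame(eta %*% t(Lambda) + eps), paste0("Y", 1:6))

# Two-factor CFA in lavaan syntax
model <- '
  eta1 =~ Y1 + Y2 + Y3
  eta2 =~ Y4 + Y5 + Y6
'

if (requireNamespace("blavaan", quietly = TRUE)) {
  library(blavaan)

  # Inspect the package's default priors before changing anything
  print(dpriors())

  # Override only the prior on the loadings; other parameters keep their defaults
  my_priors <- dpriors(lambda = "normal(1, 0.5)")

  # Short chains to keep the sketch quick; real analyses need longer runs
  fit_bayes <- bcfa(model, data = dat_cfa, dp = my_priors,
                    n.chains = 3, burnin = 500, sample = 1000, seed = 2025)
  summary(fit_bayes)
}
```

Refitting with a different `dp` argument and comparing the posterior summaries is one simple form of prior sensitivity analysis.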

---

# The Problem: A Critical but Underexplored Method

.large[
**"The choice and specification of priors is a critical but underexplored aspect of Bayesian SEM."**
]

--

.pull-bottom[
Priors are a key strength, but also a source of skepticism. <br> Current practice often relies on defaults or lacks clear justification.
]

---

# Project Goal & Guiding Research Questions

### Goal:
To systematically map the literature on the application of *a priori* distributions in Bayesian SEM (CFA and full SEM) via a scoping review.

- Protocol: following PRISMA-P and PRISMA-ScR
- Scoping review: following PRISMA-ScR

--

### Research Questions:
1. How are priors for key parameters (loadings, variances, etc.) specified?
2. What distribution families & hyperparameters are common?
3. How are these choices justified?
4. What are the reported impacts on model results?

---

# Method Part 1: Search Strategy

### Databases:
Scopus, Web of Science, PsycINFO, ProQuest

--

### Core Search String:
`("Bayesian" OR "Bayes") AND ("Structural Equation Model*" OR "SEM" OR "Confirmatory Factor Analysis" OR "CFA") AND ("prior" OR "a priori" OR "prior distribution*" OR "priors")`

--

### Key Inclusion Criteria:
- Peer-reviewed journal articles
- Applied use of CFA or full SEM
- Sufficient detail on prior specification

---

# Method Part 2: Screening & Data Extraction

.left-column[
### Screening Process (Following PRISMA-ScR)
- **Identification**
- **Screening** (Title/Abstract)
- **Eligibility** (Full Text)
- **Included Studies**

*(Dual, independent reviewers at each stage)*
]

.right-column[
### Data to be Extracted
- Model Type (CFA/SEM)
- Type of Prior
- Parameters & Prior Families
- Hyperparameter Values
- **Justification for Prior Choice**
- Software Used (Mplus, Stan, etc.)
- Type of Study: application vs. simulation
]

---

# Method Part 3: Data Synthesis & Analysis

### Quantitative Synthesis:
- Descriptive statistics to summarize frequencies.
- *(e.g., % of studies using informative vs.
non-informative priors, common hyperparameter values).* -- ### Qualitative Synthesis: - Thematic analysis of authors' justifications. - *(e.g., identifying themes like "reliance on defaults," "theory-driven choices," "sensitivity analysis").* --- # Expected Contributions & Significance 1. **Map the Field:** - Create the first comprehensive overview of current practices in prior specification. -- 2. **Promote Informed Decisions:** - Provide an evidence base to help researchers move beyond default settings. -- 3. **Enhance Methodological Rigor:** - Improve the quality, transparency, and robustness of future Bayesian SEM applications. --- # Conclusion & Current Status ### Summary: The rigorous choice of priors is fundamental to robust Bayesian SEM. This scoping review will systematically map current practices to guide future research. -- ### Current Status: - ⏳ Protocol Registered - ⏳ Database Search Complete - ⏳ Screening Underway --- class: center, middle # Thank You ### Questions? .pull-bottom[ Jorge Sinval | jorge.sinval@nie.edu.sg <br> *Collaborators: Sonja Winter, Joseph B. Kadane, Edgar C. Merkle* ] --- class: center, bottom, inverse # More info -- Slides created with the <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> package [`xaringan`](https://github.com/yihui/xaringan). 
-- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> <g label="icon" id="layer6" groupmode="layer"> <path id="path2" d="M 132.62426,316.69067 C 119.2805,301.94483 112.56962,274.5073 112.56962,234.39862 v -54.79191 c 0,-37.32217 -5.81677,-63.58084 -17.532347,-78.83466 -11.6757,-15.293118 -31.159702,-22.922596 -58.353466,-22.922596 -5.958581,0 -11.409226,0.22492 -16.45319,0.5917 -5.04455,0.427121 -9.742846,1.037046 -14.1564111,1.83092 V 95.057199 H 16.671281 c 12.325533,0 20.908335,3.82414 25.667559,11.532201 4.77973,7.74964 7.139712,25.48587 7.139712,53.14663 v 68.01321 c 0,42.12298 13.016861,74.19672 39.233939,96.16314 19.627549,16.47424 46.636229,27.23363 81.030059,32.40064 v -20.17708 c -16.3928,-4.27176 -29.04346,-10.51565 -37.11829,-19.44413 z m 246.75144,0 c 13.34377,-14.74584 20.05466,-42.18337 20.05466,-82.29205 v -54.79191 c 0,-37.32217 5.81673,-63.58084 17.53235,-78.83466 11.67568,-15.293118 31.15971,-22.922596 58.35348,-22.922596 5.95858,0 11.40922,0.22492 16.45315,0.5917 5.04457,0.427121 9.74287,1.037046 14.15645,1.83092 v 14.785125 h -10.59712 c -12.32549,0 -20.90826,3.82414 -25.66752,11.532201 -4.77974,7.74964 -7.13972,25.48587 -7.13972,53.14663 v 68.01321 c 0,42.12298 -13.01688,74.19672 -39.23394,96.16314 -19.6275,16.47424 -46.63622,27.23363 -81.03006,32.40064 v -20.17708 c 16.39279,-4.27176 29.04347,-10.51565 37.11827,-19.44413 z M 303.95857,87.165762 c 8.42049,-6.691524 25.52576,-10.536158 51.23486,-11.492333 V 63.999997 H 156.80716 v 11.673432 c 26.1755,0.956175 43.38268,4.800809 51.68248,11.492333 8.31852,6.73139 12.40691,20.033568 12.40691,39.904818 V 384.6851 c 0,20.80641 -4.08839,34.5146 -12.40691,41.02332 -8.2998,6.56905 -25.50698,10.10729 -51.68248,10.65744 V 448 h 197.71597 l 0.67087,-11.63414 c -25.50471,-0.54955 -42.56835,-4.35266 -51.07201,-11.40918 -8.4182,-6.95638 -12.73153,-20.44184 -12.73153,-40.27158 V 127.07058 c 0,-19.87125 
4.16983,-33.173428 12.56922,-39.904818 z" style="stroke-width:0.0753388"></path> </g></svg> + <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> = <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:red;"> [ comment ] <path d="M462.3 62.6C407.5 15.9 326 24.3 275.7 76.2L256 96.5l-19.7-20.3C186.1 24.3 104.5 15.9 49.7 62.6c-62.8 53.6-66.1 149.8-9.9 207.9l193.5 199.8c12.5 12.9 32.8 12.9 45.3 0l193.5-199.8c56.3-58.1 53-154.3-9.8-207.9z"></path></svg> -- <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> has infinite possibilities. 
-- Practice is the best strategy for learning. -- . -- _In God we trust, all others bring data_ -- Edwards Deming -- . -- . -- . -- THE END --- class: center, bottom, inverse 