Latent Variables

Introduction to Latent Variables

Latent Variables (LVs) are common in the life and social sciences. They are generally required when measuring constructs or concepts that are agreed upon as relevant and valid but are hidden, are not directly measurable, or must be inferred from other readily available, measurable variables called manifest variables (MVs). Lay definitions describe LVs as unobservable, hypothetical and reductionist:

* Hypothetical: variables representing hypothetical constructs, or estimates of concepts, that are measured through other real phenomena
* Unobservable: variables that cannot be directly measured and are presumed never to be measurable
* Reductionist: the LV is a single score that represents a reduction of multiple measured factors (e.g., CFA, EFA)

Examples of LVs include standardized test scores (GRE), measures of intelligence (IQ), error in statistical models and clinical diagnoses. Terms used in daily life that are inherently latent include ‘growth’, ‘health’, ‘success’ and similar terms. These are universally accepted as existing concepts but need to be treated as latent in measurement models.

History of SEM and LVs

To fully capture the scope of LVs, we must first step back and consider Structural Equation Modeling (SEM), where LVs were first used. SEM was developed to understand the interaction and structure of unmeasurable concepts in education and the social sciences. In the early 1900s, Spearman and Pearson developed the factor analytic methods that are credited as the foundation of the larger SEM umbrella. With advances in computing, the three arms of SEM (described later) became widely used statistical techniques in many fields beyond the social sciences. Latent variables can be assessed in any of the three arms of SEM. They also appear in other statistical methods, for example Bayesian methods.

Fields using LVs

SEM and LVs are widely accepted and used in sociology, ecology, economics, chemistry, health sciences, education and many other fields. Field-specific examples include:

* SOCIAL: discrimination, motivation, aggression
* ECONOMY: utility, performance, development
* ECOLOGY: soil and habitat quality

Why are latent variables needed and what statistical purpose do they serve?

One may need to quantify or show the influence of a variable that cannot be measured. Latent variables derived from measurable variables can explain variation and differences in a model where the measured variables alone fall short. A latent variable can also combine several variables into one (reduction).

You have undoubtedly seen an LV model or a path model (an SEM model without LVs):

knitr::include_graphics('latent.model.png')

knitr::include_graphics('latent.sem.png')

Use and Theoretical Concerns

There are several theoretical concerns surrounding the use of latent variables in statistics. The most salient are:

* A single generalized definition of LVs is lacking
* The use of LVs in models is not universally accepted by statisticians
* Common application problems are obscured by the definitions currently in use

Bollen (2002) described the concerns surrounding existing formal definitions (which are summarized in the next section) and proposed the following adaptable definition of LVs:

“A latent random (or nonrandom) variable is a random (or nonrandom) variable for which there is no sample realization for at least some observations in a given sample…The sample realizations definition permits models with correlated errors of measurement, observed variables that directly or indirectly influence each other, and many other nonstandard models. The key criterion is whether a variable has values for cases in a given sample.” (Bollen 2002, p. 612)

This definition accommodates missing values and allows a variable to move from latent to observed as knowledge is acquired. It also removes independence restrictions across all variables in the model.

Formal Definitions

Local Independence

One or more LVs generate the association among observed variables that would otherwise be independent. Once the LVs are held constant, the observed variables are independent of each other.

\(P(Y_1, Y_2, \ldots, Y_K \mid \eta) = P(Y_1 \mid \eta)\,P(Y_2 \mid \eta) \cdots P(Y_K \mid \eta)\)

The left side of the equation is the joint probability of the observed random variables, conditional on the LV \(\eta\). The right side is the product of the conditional probabilities of each observed variable given the LV (Lord 1953, Lazarsfeld 1959, McDonald 1981, Bartholomew 1987, Hambleton et al. 1991).
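Here is a minimal R sketch of local independence on simulated data. It assumes a one-factor linear model with normal errors; the loadings and the names eta, lambda and Y are hypothetical. The indicators correlate with each other. Once the part explained by eta is regressed out, their residuals are essentially uncorrelated. In the linear-normal case, this is what the product of conditional probabilities expresses.

```r
# Sketch under assumptions: one LV, three indicators, hypothetical loadings.
set.seed(42)
n <- 10000
eta <- rnorm(n)                      # latent variable (never observed in practice)
lambda <- c(0.8, 0.7, 0.6)           # hypothetical loadings
Y <- sapply(lambda, function(l) l * eta + rnorm(n, sd = 0.6))
colnames(Y) <- c("Y1", "Y2", "Y3")

# Marginal correlations: the indicators are associated...
marginal_cor <- cor(Y)

# ...but after removing the part explained by eta they are (nearly) uncorrelated
resid_Y <- apply(Y, 2, function(y) resid(lm(y ~ eta)))
conditional_cor <- cor(resid_Y)

round(marginal_cor, 2)
round(conditional_cor, 2)
```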

Concerns: This definition makes several assumptions:

* errors are independent
* observed variables do not influence each other
* there are at least two observed variables
* each latent variable directly affects one or more observed variables
* the observed variables do not directly affect the latent variable

These constraints may cause a latent variable to be excluded from analysis: although it makes substantive sense, it does not meet the definition (Bollen, 2002).

Expected Value

There exists a true score, equal to the expected value of the observed variable for an individual. The hypothesis is that repeatedly measuring the same individual, without retest error or the influence of previous exposure, would converge on the true score:

\(T_i = E(Y_i)\)

Here \(T_i\) is the true score, \(E(\cdot)\) is the expected value and \(Y_i\) is the random observed variable for the \(i\)th individual (Lord & Novick 1968, Lumsden 1976, Jöreskog 1971).
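A small simulation sketches the true-score definition. It uses invented true scores and a hypothetical error SD of 5: averaging many imagined retests of each individual recovers \(T_i\). This illustrates the definition under the stated assumptions (no practice or retest effects); it is not an estimation method.

```r
# Sketch under assumptions: hypothetical true scores, error mean 0 and sd 5.
set.seed(1)
true_score <- c(95, 100, 110, 120, 130)   # hypothetical T_i for five individuals
n_reps <- 5000                             # imagined retests with no practice effects
# Each observation = true score + random measurement error
Y <- sapply(true_score, function(t) t + rnorm(n_reps, mean = 0, sd = 5))
estimated_T <- colMeans(Y)                 # sample analogue of E(Y_i)
round(cbind(true_score, estimated_T), 2)
```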

Concerns: As above, a latent variable that does not meet the definition will be excluded, even if it may contribute to or explain differences in the system (Bollen, 2002).

Nondeterministic Function of Observed Variables

A latent variable is a variable in a linear structural equation model that cannot be fully explained by the observed variables contributing to it (Bentler 1982).
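Bentler's definition can be illustrated roughly as follows. The sketch simulates a latent variable with three indicators (hypothetical loadings) and regresses the normally unobservable eta on them. The R² is high but stays below 1. In other words, the LV is not an exact linear function of its observed variables.

```r
# Sketch under assumptions: one LV, three indicators, hypothetical loadings.
set.seed(7)
n <- 5000
eta <- rnorm(n)                                  # latent variable
X <- sapply(c(0.9, 0.8, 0.7), function(l) l * eta + rnorm(n, sd = 0.5))
# Regress eta (observable here only because we simulated it) on the indicators
fit <- lm(eta ~ X)
r2 <- summary(fit)$r.squared
r2  # high, but strictly below 1: part of eta is left unexplained
```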

Concerns: This definition is restricted to linear models and does not work with categorical data. Also, as with the other definitions, a latent variable may qualify as contributing to a system under this definition but not under another. There is no consistency between definitions (Bollen, 2002).

LVs in Analysis

Structural Equation Models (SEM)

| Analysis               | Use                                                                                                                 |
|------------------------|---------------------------------------------------------------------------------------------------------------------|
| Canonical Correlation  | Measures correlations from cross-covariance matrices to show the observed variables two LVs share in common; use cancor() and the CCA package |
| Factor Analysis        | Represents the majority of SEM analyses; see the examples below for execution                                       |
| Item Response Theory   | Use the ltm package to link multiple items to a single LV                                                           |
| Regression/Correlation | Represents the majority of SEM analyses; see the examples below for execution                                       |
| ANOVA                  | Apply as you would ANOVA in a regression equation, using dummy-coded IVs to represent categories                    |
| Meta-Analysis          | Use the metaSEM package to conduct multilevel and multivariate meta-analysis with fixed/random effects on correlation/covariance matrices |
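The table points to cancor() for canonical correlation, so here is a minimal base-R sketch with simulated data. The variables x1, x2 and y1 to y3 and their shared source are hypothetical. Only x1 and y1 share a common source, so the first canonical correlation should be clearly positive and the second close to zero.

```r
# Sketch with hypothetical simulated variables; cancor() is in base R's stats package.
set.seed(3)
n <- 500
shared <- rnorm(n)                          # hypothetical common source linking the sets
x <- cbind(x1 = shared + rnorm(n), x2 = rnorm(n))
y <- cbind(y1 = shared + rnorm(n), y2 = rnorm(n), y3 = rnorm(n))
cc <- cancor(x, y)
round(cc$cor, 3)   # canonical correlations, largest first
cc$xcoef           # weights defining the x-side canonical variates
```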

Examples of SEM models

Time Series (growth) models

Jomnonkwao, S., Uttra, S., & Ratanavaraha, V. (2020). Forecasting Road Traffic Deaths in Thailand: Applications of Time-Series, Curve Estimation, Multiple Linear Regression, and Path Analysis Models. Sustainability, 12(1), 395.

Figure 3 from the article. Note that there are no LVs in this example (no oval shapes). EN_TRANSPORT is energy consumption and FA_RATE is the fatality rate.

knitr::include_graphics('sustainability-12-00395-g003.png')

Canonical Correlation Analysis models

Hart, P. D. (2017). A Canonical Correlation Analysis of Physical Activity Parameters and Body Composition Measures in College Students. American Journal of Sports Science and Medicine, 5(4), 64-68

PA=physical activity; BC=body composition; MMPA=min/week of moderate PA; MSA=days/week of muscle strengthening activity; PBF=percent body fat; BMI=body mass index; WC=waist circumference (cm); VO2Max=maximal oxygen uptake during exercise

knitr::include_graphics('10.12691.ajssm-5-4-1_20200327074438.png')

Figure 1. Graphical representation of a canonical correlation analysis of BC and PA constructs (Note. The graph represents a single canonical variate)

Item Response Theory

Lonka, E., Hasan, M., & Komulainen, E. (2011). Spoken language skills and educational placement in Finnish children with cochlear implants. Folia Phoniatrica et Logopaedica, 63(6), 296-304.

SLS=spoken language skills; CAP=categories of auditory performance; MC=main mode of communication; PHA=age at cochlear implantation divided by chronological age; GEN=gender

Unlike FA, in this IRT example two different scales of data contribute to one LV. Participant characteristics (age at implant, gender, additional disabilities and postlingual deafness) are combined with functional assessments.

knitr::include_graphics('Results-from-the-IRT-SEM-analysis-All-coefficients-are-in-standardized-form-DISA.png')

Fig. 1. Results from the IRT/SEM analysis. All coefficients are in standardized form. DISA = Additional disabilities; DFND = postlingual deafness

Scale Development (Factor analysis)

Hart, T. A., Flora, D. B., Palyo, S. A., Fresco, D. M., Holle, C., & Heimberg, R. G. (2008). Development and Examination of the Social Appearance Anxiety Scale. Assessment, 15(1), 48–59.

knitr::include_graphics('8-Figure2-1.png')

FIGURE 2 Structural Model Demonstrating the Relationship of the Social Appearance Anxiety Scale to Measures of Social Anxiety and Negative Body Image.

NOTE: All paths are significant (p < .05). SAAS = Social Appearance Anxiety Scale; Brief Fear of Negative Evaluation = Brief Fear of Negative Evaluation Scale; Social Interaction Anxiety Scale = Social Interaction Anxiety Scale; Social Phobia Scale = Social Phobia Scale; Appearance Schemas Inventory = Appearance Schemas Inventory; Body Image Ideals Questionnaire = Body Image Ideals Questionnaire; AppEval = Multidimensional Body-Self Relations Questionnaire - Appearance Evaluation Subscale; OwPreoc = Multidimensional Body-Self Relations Questionnaire - Overweight Preoccupation Scale; Social Physique Anxiety Scale = Social Physique Anxiety Scale; AppOrien = Multidimensional Body-Self Relations Questionnaire – Appearance Orientation Scale. High scores on the Appearance Evaluation Scale indicate the participant feels mostly positive and satisfied with his/her appearance, while low scores indicate unhappiness and dissatisfaction with physical appearance.

Steps involved in LV models

You will need to use theoretical principles to propose the variable relationships in your model. You can test mediators and moderators by comparing models with and without the variable serving in that role. A full mediator reduces the direct path to nonsignificance. A moderator instead changes the strength or direction of the relationship between two other variables.
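The model comparison for a mediator can be sketched with two ordinary regressions on simulated data. The sketch assumes full mediation, so X affects Y only through M; all names and coefficients are hypothetical. It shows the pattern only. It is not a formal mediation test such as a bootstrap of the indirect effect.

```r
# Sketch under assumptions: simulated full mediation X -> M -> Y, hypothetical effects.
set.seed(11)
n <- 2000
X <- rnorm(n)
M <- 0.7 * X + rnorm(n)      # X -> M
Y <- 0.7 * M + rnorm(n)      # M -> Y, no direct X -> Y path
total  <- coef(lm(Y ~ X))["X"]       # model without the mediator
direct <- coef(lm(Y ~ X + M))["X"]   # model with the mediator
round(c(total = unname(total), direct = unname(direct)), 3)
```

Adding M shrinks the X coefficient from about 0.49 toward 0, the signature of full mediation.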

1. Make the path diagram (reflective MVs for LVs, LV relationships, formative MVs predicting LVs). You need to determine the relationship between the MVs and each LV. The relationships can be direct (formative, reflective) or indirect (covariance).
   - Reflective relationships (lavaan =~; plspm “A”). The MVs are presumed to be caused by the LV, so differences in these MVs reflect differences in the LV. This is the principle behind scale development. The answer to each scale item depends on the individual's overall LV score, so the items collectively quantify the LV. Think of reflective MVs as interchangeable reflections of the same LV rather than as separate pieces of it.
   - Second-order relationships (lavaan =~; plspm 1 in LV matrix). When an LV is defined solely by other LVs, it is called a second-order or hierarchical LV.
   - Formative relationships (lavaan ~; plspm “B” and 1 in LV matrix). This applies to both LVs and MVs and implies a causal relationship (i.e., one variable is regressed onto the other). In typical regression the predictors are exogenous and the DV is the only endogenous variable. In SEM, however, an endogenous variable can in turn predict other variables (or not).

knitr::include_graphics('FormvsRefl.png')
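The difference between reflective and formative MVs shows up in how the MVs relate to each other. In this simulated sketch (hypothetical loadings and weights), reflective MVs share variance because they have a common cause. Formative MVs can be uncorrelated and still combine into an LV.

```r
# Sketch under assumptions: hypothetical loadings, weights and disturbance sd.
set.seed(5)
n <- 5000

# Reflective: the LV causes the MVs, so the MVs share variance
eta_r <- rnorm(n)
refl <- sapply(c(0.8, 0.7, 0.6), function(l) l * eta_r + rnorm(n, sd = 0.6))

# Formative: independent MVs combine (with a disturbance) to form the LV
form <- matrix(rnorm(n * 3), ncol = 3)
eta_f <- as.vector(form %*% c(0.5, 0.3, 0.2) + rnorm(n, sd = 0.3))

refl_cor <- cor(refl)[upper.tri(diag(3))]
form_cor <- cor(form)[upper.tri(diag(3))]
round(refl_cor, 2)   # clearly positive
round(form_cor, 2)   # near zero, yet each MV still contributes to eta_f
```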

2. Specifying the covariance parameters. Earlier SEM and path-analytic software required a basic understanding of matrix algebra. Users had to specify every possible relationship in the model, including error covariances, variable covariances, variable relationships and the scale of relationships for every possible path. Removing a relationship meant fixing its estimate to 0, which excludes that path from being estimated. Current SEM programs fix most covariances to 0 by default. The researcher therefore need only specify covariances when shared variance between variables or errors is possible or needs to be estimated. (lavaan's cfa() and sem() do, however, estimate covariances among exogenous latent variables by default.) Covariances are specified in lavaan with ~~. plspm reports cross-loadings of manifest variables onto latent variables they were not assigned to. It does not, however, allow covariances between manifest-variable errors or between latent variables in more complex SEMs. Keep in mind that the two packages use different estimation techniques: covariance-based estimation (maximum likelihood by default) in lavaan and partial least squares in plspm.
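To see what specifying a covariance means, consider one LV with three reflective MVs. The sketch builds the model-implied covariance matrix \(\Sigma = \Lambda \Phi \Lambda^\top + \Theta\) using hypothetical loadings. Residual covariances stay fixed at 0 unless freed. Freeing one (as y1 ~~ y2 would in lavaan) changes only that cell.

```r
# Sketch under assumptions: hypothetical loadings and residual variances.
Lambda <- matrix(c(0.8, 0.7, 0.6), ncol = 1)   # loadings
Phi    <- matrix(1)                             # LV variance fixed to 1
Theta  <- diag(c(0.36, 0.51, 0.64))             # residual variances; off-diagonals fixed to 0

Sigma_default <- Lambda %*% Phi %*% t(Lambda) + Theta

# Freeing one residual covariance adds it to the corresponding cells only
Theta_free <- Theta
Theta_free[1, 2] <- Theta_free[2, 1] <- 0.10
Sigma_free <- Lambda %*% Phi %*% t(Lambda) + Theta_free

Sigma_default
Sigma_free - Sigma_default   # nonzero only in cells [1, 2] and [2, 1]
```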

3. Identification and constraining. Identification and constraining occur for both the measurement model and the structural model.
   - Measurement model. You typically want to make sure that:
     - You set the latent variable scale to 1.0 (standardized) for reflective variables
     - You set the causal factor variance to 1.0, or, under select conditions, fix a path to 1.0
     - You have at least three measures for each construct that do not share error variance (there are tips for setting parameters if you have fewer than three)
     - Indicators do not load onto more than one construct (doing this requires very advanced techniques)
   - Structural model. You typically want to make sure that:
     - You meet the minimum condition of identifiability, \(q \ge p\). Here \(q = k(k-1)/2\) and \(k\) is the number of constructs. \(p\) is the number of paths + the number of exogenous variable correlations + the number of correlations with disturbances (a disturbance is the error variance of an endogenous LV)
     - If there is no direct path between two constructs, they must have correlated error variances or be correlated with additional exogenous variables
     - There are methods for correcting violations of these rules

   Violating these rules leads to an under-identified model, in which there are more covariances and paths to estimate than there is information available in the model. A just-identified model is saturated, with \(p = q\). In a parsimonious (over-identified) model, you have removed or constrained unimportant pathways. This leaves more information than there are estimates to make.

knitr::include_graphics('ident.png')
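The minimum condition for the structural model can be written as a tiny R helper. identification_status() is a hypothetical name. The helper checks only this necessary condition; it does not establish full identification.

```r
# Hypothetical helper: checks only the minimum condition q >= p described above.
identification_status <- function(k, n_paths, n_exog_cor = 0, n_dist_cor = 0) {
  q <- k * (k - 1) / 2                       # available correlations among constructs
  p <- n_paths + n_exog_cor + n_dist_cor     # parameters to estimate
  if (q < p) "under-identified" else if (q == p) "just-identified" else "over-identified"
}

# Structural part of the Political Democracy model used later:
# ind60 -> dem60, ind60 -> dem65, dem60 -> dem65
identification_status(k = 3, n_paths = 3)   # "just-identified"
# Dropping the direct ind60 -> dem65 path frees one piece of information
identification_status(k = 3, n_paths = 2)   # "over-identified"
```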

4. Estimating fit. In traditional SEM there are three checks of model fit:
   1) Did the model converge?
      * Iterations shown
      * Free parameters 0 or higher
   2) What is the coefficient of determination for the model and the LVs?
      * Regression p-values < .05
      * Model significant compared to the baseline model
   3) Fit indices
      * The chi-square goodness-of-fit statistic is not significant
      * Incremental fit indices (CFI, TLI) and related indices (GFI, AGFI) are > .90 or > .95, depending on the index
      * ‘Badness’ of fit indices (RMSEA, RMR, SRMR) are < .08
5. Modifying the model
   * You typically have a few proposed models to compare and contrast based on theory

R Code example for LVs.

LAVAAN()

Step 1. Load the necessary libraries for the exercise.

library(lavaan)

library(semPlot)

library(tidyverse)

library(tidygraph)

library(ggraph)

Step 2. Load the data

The data come from the lavaan tutorial, which uses the ‘famous’ Industrialization and Political Democracy dataset.

Data variables are as follows:

| Var | Description                                           |
|-----|-------------------------------------------------------|
| y1  | Expert ratings of the freedom of the press in 1960    |
| y2  | The freedom of political opposition in 1960           |
| y3  | The fairness of elections in 1960                     |
| y4  | The effectiveness of the elected legislature in 1960  |
| y5  | Expert ratings of the freedom of the press in 1965    |
| y6  | The freedom of political opposition in 1965           |
| y7  | The fairness of elections in 1965                     |
| y8  | The effectiveness of the elected legislature in 1965  |
| x1  | The gross national product (GNP) per capita in 1960   |
| x2  | The inanimate energy consumption per capita in 1960   |
| x3  | The percentage of the labor force in industry in 1960 |

data("PoliticalDemocracy")
head(PoliticalDemocracy,n=3)

##     y1  y2       y3       y4   y5       y6       y7       y8       x1       x2
## 1 2.50 0.0 3.333333 0.000000 1.25 0.000000 3.726360 3.333333 4.442651 3.637586
## 2 1.25 0.0 3.333333 0.000000 6.25 1.100000 6.666666 0.736999 5.384495 5.062595
## 3 7.50 8.8 9.999998 9.199991 8.75 8.094061 9.999998 8.211809 5.961005 6.255750
##         x3
## 1 2.557615
## 2 3.568079
## 3 5.224433

This is our model

knitr::include_graphics('sem-1.png')

Step 3. We will use the data to develop an LV model using the lavaan package.

Step 4. Write the LV model using =~ for reflective manifest-latent relationships (i.e., CFA), ~ for formative relationships (i.e., regressions), and ~~ to specify variances/covariances.

LVdata.model <-'
# CFAs
ind60 =~ x1 + x2 + x3 
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# Regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# Residuals
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'

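Step 5. Now we run the lavaan model. The last argument of the function controls how the latent variables are scaled, which is needed to identify the model:

* Unstandardized (std.lv = FALSE, the default): each LV is scaled by fixing its first loading to 1, so the LV is on the same scale as that MV
* Standardized LVs (std.lv = TRUE): the variances of exogenous LVs and the disturbance variances of endogenous LVs are fixed to 1.0 instead

Two further standardizations are reported after fitting rather than set as arguments. std.all standardizes both the LVs and the MVs (the Std.all column of summary(..., standardized = TRUE)). std.nox does the same but leaves exogenous observed covariates unstandardized (available through standardizedSolution(fit, type = "std.nox")).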

You can use cfa(), sem() or lavaan() to run the models (lavaan() offers the most flexibility in model specification).

We will use standardized LVs (std.lv = TRUE).

LVdata.fit <- lavaan::sem(LVdata.model, data = PoliticalDemocracy, std.lv = TRUE)
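A quick hand count shows where the degrees of freedom in the summary come from. This sketch assumes lavaan's usual defaults with std.lv = TRUE: all loadings free, LV and disturbance variances fixed to 1, and no mean structure. The counts are our own; lavaan does not print this breakdown.

```r
# Hand count under the stated assumptions about lavaan's defaults with std.lv = TRUE.
n_obs_vars <- 11                                  # x1-x3 and y1-y8
moments    <- n_obs_vars * (n_obs_vars + 1) / 2   # unique variances + covariances

free <- c(
  loadings           = 3 + 4 + 4,  # all loadings free because LV variances are fixed to 1
  regressions        = 1 + 2,      # dem60 ~ ind60; dem65 ~ ind60 + dem60
  residual_cov       = 6,          # y1~~y5, y2~~y4, y2~~y6, y3~~y7, y4~~y8, y6~~y8
  residual_variances = 11,         # one per observed variable
  lv_variances       = 0           # ind60 variance and LV disturbances fixed to 1
)
df_model <- moments - sum(free)
c(moments = moments, free = sum(free), df = df_model)   # 66, 31, 35
```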

Step 6. Print parameter estimates so we can determine the fit of the model

summary(LVdata.fit, fit.measures=TRUE, standardized = TRUE)

## lavaan 0.6-5 ended normally after 73 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         31
##                                                       
##   Number of observations                            75
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                38.125
##   Degrees of freedom                                35
##   P-value (Chi-square)                           0.329
## 
## Model Test Baseline Model:
## 
##   Test statistic                               730.654
##   Degrees of freedom                                55
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.995
##   Tucker-Lewis Index (TLI)                       0.993
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1547.791
##   Loglikelihood unrestricted model (H1)      -1528.728
##                                                       
##   Akaike (AIC)                                3157.582
##   Bayesian (BIC)                              3229.424
##   Sample-size adjusted Bayesian (BIC)         3131.720
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.035
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.092
##   P-value RMSEA <= 0.05                          0.611
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.044
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   ind60 =~                                                              
##     x1                0.670    0.065   10.346    0.000    0.670    0.920
##     x2                1.460    0.128   11.424    0.000    1.460    0.973
##     x3                1.218    0.128    9.480    0.000    1.218    0.872
##   dem60 =~                                                              
##     y1                1.989    0.232    8.589    0.000    2.223    0.850
##     y2                2.500    0.371    6.740    0.000    2.794    0.717
##     y3                2.104    0.308    6.833    0.000    2.351    0.722
##     y4                2.516    0.297    8.476    0.000    2.812    0.846
##   dem65 =~                                                              
##     y5                0.415    0.259    1.606    0.108    2.103    0.808
##     y6                0.492    0.311    1.584    0.113    2.493    0.746
##     y7                0.531    0.332    1.599    0.110    2.691    0.824
##     y8                0.526    0.329    1.596    0.111    2.662    0.828
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   dem60 ~                                                               
##     ind60             0.499    0.144    3.460    0.001    0.447    0.447
##   dem65 ~                                                               
##     ind60             0.923    0.628    1.469    0.142    0.182    0.182
##     dem60             4.010    2.617    1.533    0.125    0.885    0.885
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .y1 ~~                                                                 
##    .y5                0.624    0.358    1.741    0.082    0.624    0.296
##  .y2 ~~                                                                 
##    .y4                1.313    0.702    1.871    0.061    1.313    0.273
##    .y6                2.153    0.734    2.934    0.003    2.153    0.356
##  .y3 ~~                                                                 
##    .y7                0.795    0.608    1.308    0.191    0.795    0.191
##  .y4 ~~                                                                 
##    .y8                0.348    0.442    0.787    0.431    0.348    0.109
##  .y6 ~~                                                                 
##    .y8                1.356    0.568    2.386    0.017    1.356    0.338
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .x1                0.082    0.019    4.184    0.000    0.082    0.154
##    .x2                0.120    0.070    1.718    0.086    0.120    0.053
##    .x3                0.467    0.090    5.177    0.000    0.467    0.239
##    .y1                1.891    0.444    4.256    0.000    1.891    0.277
##    .y2                7.373    1.374    5.366    0.000    7.373    0.486
##    .y3                5.067    0.952    5.324    0.000    5.067    0.478
##    .y4                3.148    0.739    4.261    0.000    3.148    0.285
##    .y5                2.351    0.480    4.895    0.000    2.351    0.347
##    .y6                4.954    0.914    5.419    0.000    4.954    0.443
##    .y7                3.431    0.713    4.814    0.000    3.431    0.322
##    .y8                3.254    0.695    4.685    0.000    3.254    0.315
##     ind60             1.000                               1.000    1.000
##    .dem60             1.000                               0.800    0.800
##    .dem65             1.000                               0.039    0.039
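As a check, the main fit statistics in the output above can be recomputed by hand from the printed values using standard textbook formulas. This RMSEA uses N - 1. Software may use N instead, which shifts the value slightly; both versions round to 0.035 here.

```r
# Values copied from the summary(LVdata.fit, ...) output above
chisq   <- 38.125;  df_user <- 35   # user model
chisq_b <- 730.654; df_base <- 55   # baseline (independence) model
N       <- 75                       # number of observations
ll      <- -1547.791                # log-likelihood of the user model
n_par   <- 31                       # number of free parameters

# Textbook formulas (RMSEA here uses N - 1; some software uses N)
rmsea   <- sqrt(max((chisq - df_user) / (df_user * (N - 1)), 0))
cfi     <- 1 - max(chisq - df_user, 0) / max(chisq_b - df_base, chisq - df_user, 0)
tli     <- (chisq_b / df_base - chisq / df_user) / (chisq_b / df_base - 1)
aic     <- -2 * ll + 2 * n_par
bic     <- -2 * ll + log(N) * n_par
p_chisq <- pchisq(chisq, df_user, lower.tail = FALSE)

round(c(rmsea = rmsea, cfi = cfi, tli = tli, aic = aic, bic = bic, p = p_chisq), 3)
# Rule-of-thumb checks matching the fit criteria listed earlier
c(chisq_not_sig = p_chisq > 0.05, cfi_ok = cfi > 0.95,
  tli_ok = tli > 0.95, rmsea_ok = rmsea < 0.08)
```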

Step 7. Generate a plot diagramming your model using semPlot

Step 8. Add standardized parameters to the model.
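Steps 7 and 8 can be sketched with semPlot::semPaths(). This block refits the model so that it is self-contained, and it runs only if lavaan and semPlot are installed. what = "paths" draws the structure only, and what = "std" labels the edges with standardized estimates. The layout and label-size settings are our own choices.

```r
# Sketch: runs only if both packages are available; plotting settings are assumptions.
have_pkgs <- requireNamespace("lavaan", quietly = TRUE) &&
  requireNamespace("semPlot", quietly = TRUE)

if (have_pkgs) {
  utils::data("PoliticalDemocracy", package = "lavaan")
  model <- '
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
  '
  fit <- lavaan::sem(model, data = PoliticalDemocracy, std.lv = TRUE)
  # Step 7: path diagram of the model structure
  semPlot::semPaths(fit, what = "paths", layout = "tree")
  # Step 8: the same diagram with standardized estimates on the edges
  semPlot::semPaths(fit, what = "std", layout = "tree", edge.label.cex = 0.8)
}
```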

PLSPM()

Step 1. Load the necessary libraries for the exercise.

library(plspm)


Step 2. Load the data

The data are from PLS Path Modeling with R by Gaston Sanchez. The data are statistics on Spanish football (aka soccer) teams and we are using these data obviously because Spain is the best.

Data variables are as follows:

| Var  | Description                                          |
|------|------------------------------------------------------|
| GSH  | total number of goals scored at home                 |
| GSA  | total number of goals scored away                    |
| SSH  | percentage of matches with goals scored at home      |
| SSA  | percentage of matches with goals scored away         |
| GCH  | total number of goals conceded at home               |
| GCA  | total number of goals conceded away                  |
| CSH  | percentage of matches with no goals conceded at home |
| CSA  | percentage of matches with no goals conceded away    |
| WMH  | total number of matches won at home                  |
| WMA  | total number of matches won away                     |
| LWR  | longest run of won matches                           |
| LRWL | longest run of matches without losing                |
| YC   | total number of yellow cards                         |
| RC   | total number of red cards                            |

data(spainfoot)
head(spainfoot,n=3)

##            GSH GSA  SSH  SSA GCH GCA  CSH  CSA WMH WMA LWR LRWL  YC RC
## Barcelona   61  44 0.95 0.95  14  21 0.47 0.32  14  13  10   22  76  6
## RealMadrid  49  34 1.00 0.84  29  23 0.37 0.37  14  11  10   18 115  9
## Sevilla     28  26 0.74 0.74  20  19 0.42 0.53  11  10   4    7 100  8

This is our model

knitr::include_graphics('example model.png')

Step 3. We will use the data to develop an LV model using the plspm package.

Step 4. First we specify the “inner model” of relationships among the LVs as a square matrix. A 1 in a given row and column means that the LV named by the column has a path to the LV named by the row. Here, Attack and Defense both point to Success.

#rows of LV matrix
Attack = c(0,0,0)
Defense = c(0,0,0)
Success = c(1,1,0)
LVPath=rbind(Attack, Defense, Success)
colnames(LVPath) = rownames(LVPath)
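Before fitting, it can help to check the inner model. This base-R sketch rebuilds LVPath and lists each path by reading a 1 as "column LV -> row LV". The lower-triangular check reflects our understanding that plspm() expects an acyclic, lower-triangular path matrix. The names is_lower_tri, idx and paths are hypothetical.

```r
# Sketch: same inner-model matrix as above; the checks are our own additions.
Attack  <- c(0, 0, 0)
Defense <- c(0, 0, 0)
Success <- c(1, 1, 0)
LVPath <- rbind(Attack, Defense, Success)
colnames(LVPath) <- rownames(LVPath)

# No 1s on or above the diagonal (assumed plspm requirement)
is_lower_tri <- all(LVPath[upper.tri(LVPath, diag = TRUE)] == 0)

# Read each 1 as "column LV -> row LV"
idx <- which(LVPath == 1, arr.ind = TRUE)
paths <- paste(colnames(LVPath)[idx[, "col"]], "->", rownames(LVPath)[idx[, "row"]])
is_lower_tri
paths   # "Attack -> Success" "Defense -> Success"
```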

Step 5. Next we specify the “outer model”, which includes all the MVs in the model. LVBlocks selects the variables for each LV by column position in the data frame. LVmodes specifies whether each block is reflective (“A”) or formative (“B”).

LVBlocks=list(1:4,5:8,9:12)
LVmodes=c("A","A","B")
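Because LVBlocks indexes columns by position, it is easy to select the wrong ones. This sketch maps each block to column names. spain_cols is a hypothetical helper vector copied from the head(spainfoot) output rather than read from the data. The mapping makes it visible that YC and RC (columns 13 and 14) are left out.

```r
# Sketch: spain_cols is hypothetical, copied from the column order printed above.
spain_cols <- c("GSH", "GSA", "SSH", "SSA", "GCH", "GCA", "CSH", "CSA",
                "WMH", "WMA", "LWR", "LRWL", "YC", "RC")
LVBlocks <- list(1:4, 5:8, 9:12)
LVmodes  <- c("A", "A", "B")
blocks_named <- setNames(lapply(LVBlocks, function(b) spain_cols[b]),
                         c("Attack", "Defense", "Success"))
str(blocks_named)
setNames(LVmodes, names(blocks_named))   # reflective ("A") or formative ("B")
```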

Step 6. We run the analysis in plspm.

LVpls=plspm(spainfoot, LVPath,LVBlocks,modes=LVmodes)

Step 7. Print parameter estimates so we can determine the fit of the model

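Things to note for a good model in covariance-based SEM (such as the lavaan output above):

1) Chi-square is not significant: this indicates that the overall model fits well.
2) CFI is close to 1.00: this indicates that the model is an improvement over the null (baseline) model.
3) RMSEA is small (below about .06 to .08), and its close-fit test (P-value RMSEA <= 0.05) is not significant: this means that the model-implied covariances are close to the observed ones.

plspm does not report chi-square, CFI or RMSEA. For PLS-PM, look instead at:

* block unidimensionality (Cronbach's alpha and DG rho, conventionally above about 0.7)
* the R² of the endogenous LVs
* the GoF index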

summary(LVpls)

## PARTIAL LEAST SQUARES PATH MODELING (PLS-PM) 
## 
## ---------------------------------------------------------- 
## MODEL SPECIFICATION 
## 1   Number of Cases      20 
## 2   Latent Variables     3 
## 3   Manifest Variables   12 
## 4   Scale of Data        Standardized Data 
## 5   Non-Metric PLS       FALSE 
## 6   Weighting Scheme     centroid 
## 7   Tolerance Crit       1e-06 
## 8   Max Num Iters        100 
## 9   Convergence Iters    9 
## 10  Bootstrapping        FALSE 
## 11  Bootstrap samples    NULL 
## 
## ---------------------------------------------------------- 
## BLOCKS DEFINITION 
##       Block         Type   Size   Mode
## 1    Attack    Exogenous      4      A
## 2   Defense    Exogenous      4      A
## 3   Success   Endogenous      4      B
## 
## ---------------------------------------------------------- 
## BLOCKS UNIDIMENSIONALITY 
##          Mode  MVs  C.alpha  DG.rho  eig.1st  eig.2nd
## Attack      A    4    0.891   0.925     3.02    0.792
## Defense     A    4    0.000   0.026     2.39    1.175
## Success     B    4    0.000   0.000     3.22    0.537
## 
## ---------------------------------------------------------- 
## OUTER MODEL 
##           weight  loading  communality  redundancy
## Attack                                            
##   1 GSH    0.325    0.930        0.865       0.000
##   1 GSA    0.287    0.872        0.760       0.000
##   1 SSH    0.275    0.829        0.687       0.000
##   1 SSA    0.262    0.839        0.704       0.000
## Defense                                           
##   2 GCH   -0.125    0.468        0.219       0.000
##   2 GCA   -0.401    0.888        0.788       0.000
##   2 CSH    0.274   -0.720        0.518       0.000
##   2 CSA    0.430   -0.905        0.819       0.000
## Success                                           
##   3 WMH    0.359    0.698        0.487       0.428
##   3 WMA    0.727    0.941        0.886       0.779
##   3 LWR   -0.437    0.863        0.746       0.655
##   3 LRWL   0.487    0.909        0.826       0.726
## 
## ---------------------------------------------------------- 
## CROSSLOADINGS 
##           Attack  Defense  Success
## Attack                            
##   1 GSH    0.930   -0.511    0.847
##   1 GSA    0.872   -0.337    0.747
##   1 SSH    0.829   -0.413    0.715
##   1 SSA    0.839   -0.347    0.682
## Defense                           
##   2 GCH   -0.128    0.468   -0.207
##   2 GCA   -0.463    0.888   -0.666
##   2 CSH    0.312   -0.720    0.455
##   2 CSA    0.420   -0.905    0.714
## Success                           
##   3 WMH    0.697   -0.411    0.698
##   3 WMA    0.776   -0.721    0.941
##   3 LWR    0.840   -0.532    0.863
##   3 LRWL   0.858   -0.586    0.909
## 
## ---------------------------------------------------------- 
## INNER MODEL 
## $Success
##              Estimate   Std. Error     t value   Pr(>|t|)
## Intercept   -5.13e-16       0.0844   -6.08e-15   1.00e+00
## Attack       6.73e-01       0.0954    7.05e+00   1.93e-06
## Defense     -4.10e-01       0.0954   -4.29e+00   4.92e-04
## 
## ---------------------------------------------------------- 
## CORRELATIONS BETWEEN LVs 
##          Attack  Defense  Success
## Attack    1.000   -0.467    0.865
## Defense  -0.467    1.000   -0.724
## Success   0.865   -0.724    1.000
## 
## ---------------------------------------------------------- 
## SUMMARY INNER MODEL 
##                Type     R2  Block_Communality  Mean_Redundancy    AVE
## Attack    Exogenous  0.000              0.754            0.000  0.754
## Defense   Exogenous  0.000              0.586            0.000  0.586
## Success  Endogenous  0.879              0.736            0.647  0.000
## 
## ---------------------------------------------------------- 
## GOODNESS-OF-FIT 
## [1]  0.78
## 
## ---------------------------------------------------------- 
## TOTAL EFFECTS 
##         relationships  direct  indirect   total
## 1   Attack -> Defense   0.000         0   0.000
## 2   Attack -> Success   0.673         0   0.673
## 3  Defense -> Success  -0.410         0  -0.410
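
Several columns of this summary are simple functions of one another, so they can be checked by hand. The base-R sketch below re-enters the rounded values printed above and recomputes several of those columns. The object names (`loadings`, `block`, `gof`, `beta`) are illustrative and are not part of the plspm API. It assumes the standard PLS-PM definitions (Sanchez, 2013):

- communality is a squared loading;
- redundancy is communality times the R² of the indicator's LV;
- block communality is the mean communality in a block;
- GoF is the square root of (mean communality × mean R² of the endogenous LVs);
- the standardized path coefficients solve the normal equations built from the LV correlations.

Because the inputs are rounded, expect agreement only to about the second or third decimal.

```r
# Rounded values copied from the plspm summary above (illustrative re-entry)
loadings <- c(GSH = 0.930, GSA = 0.872, SSH = 0.829, SSA = 0.839,    # Attack
              GCH = 0.468, GCA = 0.888, CSH = -0.720, CSA = -0.905,  # Defense
              WMH = 0.698, WMA = 0.941, LWR = 0.863, LRWL = 0.909)   # Success
block <- rep(c("Attack", "Defense", "Success"), each = 4)
r2_success <- 0.879

# Communality of each MV = squared loading
communality <- loadings^2

# Redundancy = communality x R2 of the MV's LV (only Success is endogenous)
r2 <- c(Attack = 0, Defense = 0, Success = r2_success)
redundancy <- communality * r2[block]

# Block communality = mean communality within each block
block_communality <- tapply(communality, block, mean)

# Goodness-of-fit = sqrt(mean communality x mean R2 of endogenous LVs)
gof <- sqrt(mean(communality) * r2_success)

# Path coefficients: regress Success on Attack and Defense using the
# LV correlation matrix (standardized OLS via the normal equations)
R <- matrix(c( 1.000, -0.467,  0.865,
              -0.467,  1.000, -0.724,
               0.865, -0.724,  1.000), nrow = 3,
            dimnames = list(c("Attack", "Defense", "Success"),
                            c("Attack", "Defense", "Success")))
beta <- solve(R[1:2, 1:2], R[1:2, 3])
r2_check <- sum(beta * R[1:2, 3])   # R2 = sum of beta_j * cor(LV_j, Success)

round(block_communality, 3)
round(gof, 2)
round(beta, 3)
round(r2_check, 3)
```

The intercept in the INNER MODEL table is essentially zero because PLS-PM works with standardized LV scores. The TOTAL EFFECTS equal the direct effects because no LV lies between Attack or Defense and Success, so there are no indirect paths.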

Step 8. Generate a plot diagramming your model using plspm

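   # Inner model: path diagram of the LVs with their path coefficients
   plot(LVpls)

   # Outer model: loadings of each MV on its LV
   plot(LVpls, what = "loadings", arr.width = 0.1)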
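
The BLOCKS UNIDIMENSIONALITY table in the summary reports the following for each block:

- Cronbach's alpha (C.alpha);
- Dillon-Goldstein's rho (DG.rho);
- the first two eigenvalues of the block's correlation matrix (eig.1st, eig.2nd).

The sketch below is a minimal illustration of those quantities, written from the usual textbook definitions. The data are simulated (hypothetical), and `unidim()` is a helper written for this example, not a plspm function. plspm's internal computation may differ in detail, so treat the numbers as illustrative only. The sketch also reverses two indicators to show why a block such as Defense, where GCH and GCA load positively but CSH and CSA load negatively, can have a large first eigenvalue yet near-zero alpha and rho.

```r
# Hypothetical block: four indicators driven by one simulated latent factor
set.seed(42)
n <- 200
f <- rnorm(n)                                        # unobserved latent score
X <- sapply(1:4, function(j) 0.8 * f + rnorm(n, sd = 0.6))
colnames(X) <- paste0("x", 1:4)

unidim <- function(X) {
  R <- cor(X)                                        # work with standardized MVs
  p <- ncol(X)
  alpha <- (p / (p - 1)) * (1 - p / sum(R))          # Cronbach's alpha, standardized items
  eig <- eigen(R, symmetric = TRUE)                  # eigenvalues in decreasing order
  lambda <- eig$vectors[, 1] * sqrt(eig$values[1])   # loadings on the first principal component
  if (sum(lambda) < 0) lambda <- -lambda             # eigenvector sign is arbitrary
  rho <- sum(lambda)^2 / (sum(lambda)^2 + sum(1 - lambda^2))  # Dillon-Goldstein's rho
  c(C.alpha = alpha, DG.rho = rho,
    eig.1st = eig$values[1], eig.2nd = eig$values[2])
}

# Reverse two indicators so they measure the construct "in the opposite direction"
X_flipped <- X
X_flipped[, 3:4] <- -X_flipped[, 3:4]

round(unidim(X), 3)          # high alpha and rho, one eigenvalue above 1
round(unidim(X_flipped), 3)  # identical eigenvalues, but alpha and rho collapse
```

Reversing indicators leaves the eigenvalues unchanged, so eig.1st alone cannot detect the problem. A common remedy is to reverse-code the opposing indicators before refitting the model.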

References and Resources

Beaujean, A. A. (2014). Latent variable modeling using R: A step-by-step guide. Routledge. Figure 3.1
Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.
Cheung, M. W. L. (2015). metaSEM: An R package for meta-analysis using structural equation modeling. Frontiers in Psychology, 5, 1521.
González, I., Déjean, S., Martin, P. G., & Baccini, A. (2008). CCA: An R package to extend canonical correlation analysis. Journal of Statistical Software, 23(12), 1–14.
Sanchez, G. (2013). PLS path modeling with R. Trowchez Editions, Berkeley. http://www.gastonsanchez.com/PLS Path Modeling with R.pdf
Lefcheck, J. (2019). Structural equation modeling in R for ecology and evolution. GitHub. https://jslefche.github.io/sem_book/index.html
Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53(1), 605–634. https://doi.org/10.1146/annurev.psych.53.100901.135239
http://lavaan.ugent.be/start.html
https://psu-psychology.github.io/r-bootcamp-2018/talks/lavaan_tutorial.html
http://www.sachaepskamp.com/documentation/semPlot/semPaths.html

Latent Variables

Jennifer Andrews and Maureen Galindo

3/26/2020
