In this last lab we are going to examine structural equation modeling and then return to a principle that we have already harped on in this class: multiple paths to the same answer.

Introduction to SEM

Resources:

Book: Cause and Correlation in Biology, Bill Shipley 2016

Website: Introduction of the Framework

Website: Video and 3-part seminar. I highly recommend going through these seminars if you are interested in this method. The question often pops up, “Isn’t SEM just linear regression?”, and I think this series does a great job showing how these methods differ with respect to covariance in residual errors.

PDF: Reference Sheet

Structural Equation Modeling (SEM)

Structural equation modeling is a linear modeling framework that fits simultaneous regression equations, optionally with latent variables. Two types of models within this framework are confirmatory factor analysis and path analysis. The working definition here is that structural equation modeling is a model selection approach to path analysis that either includes latent variables or statistically compares different path models, with the goal of comparing different causal hypotheses and examining indirect effects. Latent variables are key components of SEM. Latent variables cannot be measured or observed directly, but their influence can be summarized through observable indicator variables (variables that are measured directly) that are caused by the latent construct. An example of a latent variable in ecology is climate change, whereby changes in the means or extremes of climate variables, or the rates of change in these variables, are used as indicators for the latent construct, climate change. These methods are widely used outside of ecology, in part because fields such as psychology use survey items to construct latent variables of interest such as depression, anxiety, and intelligence, but the theory and methods are also very useful to apply to, and consider for, ecological questions.

SEM accounts for the following relationships:

  • observed to observed variables (e.g., regression)
  • latent to observed variables (e.g., confirmatory factor analysis; see the sketch after this list)
  • latent to latent variables (e.g., structural regression)
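
To make the latent-to-observed case concrete, here is a minimal sketch of a measurement model written in lavaan syntax (the syntax itself is introduced below). The variable names are hypothetical and simply mirror the climate change example above.

climate_sketch <- '
  # latent construct defined by observed indicators (latent -> observed)
  climatechange =~ temp_mean + temp_extreme + warming_rate
  # structural regression of an observed response on the latent variable
  richness ~ climatechange
'
# fit_cc <- sem(climate_sketch, data = mydata)  # `mydata` is a hypothetical data frame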

Path Diagram

Let’s walk through a heuristic path diagram to understand the components of a path diagram and the strengths of the SEM approach.

[Figure: heuristic path diagram showing an exogenous variable (x1) and endogenous variables (y1 and y2) connected by arrows]

  • Exogenous variable: represented by (x1). These are independent variables (either observed or latent) that explain an endogenous variable. Arrows point away from them, not at them.

  • Endogenous variables: represented by (y1 and y2). These are dependent variables (either observed or latent) that are explained by exogenous variables in the model. Arrows point toward them, and these arrows represent causal paths.

  • Arrows: (also called edges) are relationships among variables. Sometimes arrows are used to denote positive relationships while blunted lines represent negative relationships.

  • Vertices: (also called nodes) are the variables themselves.

In the above model we are saying that x1 is a predictor of y1 and y2. Additionally, we are saying that y1 is a predictor of y2. This is another key feature of SEMs: we can separate the direct effect of x1 on y1 from the indirect effect of x1 on y2 mediated by y1. Each line in the diagram represents a statistical model. For instance, the model predicting y1 looks like this: y1 = x1 + error. Each response variable gets its own error term (\(\zeta\)).
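
Written in lavaan syntax (introduced in the next section), the diagram above amounts to just two regression statements; a minimal sketch using the generic names from the figure:

heuristic_model <- '
  y1 ~ x1        # direct effect of x1 on y1
  y2 ~ x1 + y1   # direct effect of x1 on y2 plus the path mediated by y1
'
# lavaan adds a residual error term for each endogenous variable automatically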

Typically, before conducting the analysis, it is useful to explicitly draw out the metamodel (a diagram of the variables of interest, including the hypothesized (+/-) relationships) to represent your hypotheses.

lavaan: R package to explore SEM

Frequently used syntax in lavaan

  • ~ predict, used for regression of an observed outcome on observed predictors (e.g., y ~ x). “…is regressed on…”
  • =~ indicator, used to link a latent variable to its observed indicators in factor analysis measurement models (e.g., f =~ q + r + s). “…is measured by…”
  • ~~ covariance (e.g., x1 ~~ x2) or variance (e.g., x1 ~~ x1). “…is correlated with…”
  • ~1 intercept or mean (e.g., x ~ 1 estimates the mean of variable x)
  • 1* fixes a parameter or loading to one (e.g., f =~ 1*q)
  • NA* frees a parameter or loading, useful to override the default marker method (e.g., f =~ NA*q)
  • a* labels the parameter ‘a’, used for model constraints (e.g., f =~ a*q)

Example model structure:

myModel <- '
# regressions
y1 + y2 ~ f1 + f2 + x1 + x2
f1 ~ f2 + f3
f2 ~ f3 + x1 + x2

# latent variable definitions
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f3 =~ y7 + y8 + y9 + y10

# variances and covariances
y1 ~~ y1
y1 ~~ y2
f1 ~~ f2

# intercepts
y1 ~ 1
f1 ~ 1
'

Practice

Let’s get on to working with data. For this lab we will simulate everything.

library(lavaan)
library(lavaanPlot)
library(piecewiseSEM)
library(tidyverse)
library(rjags)
library(jagsUI)
library(coda)
library(matrixStats)

Here we will simulate data of caterpillar richness measured over 60 years. First I simulate that temperature is increasing. Then I simulate that caterpillar richness is declining as a function of both temperature and year. Then I add a precip variable with no trends and no effect on caterpillars.

set.seed(2002)
## years
years <- c(1:60)

#precip variable
precip <- rnorm(60, mean=100, sd=10) # precip is simulated from a normal distribution with a mean of 100 and standard deviation of 10; 60 values are drawn from that distribution (one for each year)

## simulate the increasing temperature anomalies
temperature <-  .05 + .05*years + rnorm(length(years),1,.5) # temperature is modeled with an intercept of 0.05 and a positive slope of 0.05, so it increases by 0.05 degrees Celsius each year, plus some error/noise drawn from a Gaussian distribution

#visualize trend of temperature across the 60 years
plot(temperature ~ years,pch=19); abline(lm(temperature ~ years),lty=2)

## simulate the caterpillar richness as a function of years and temps
catrich <-  350 -2*years -50*temperature + rnorm(length(years),50,50)

#visualize caterpillar richness across years
plot(catrich ~ years,pch=19); abline(lm(catrich ~ years),lty=2)

#combine the simulated vectors to create one dataset
semdata <- cbind(catrich, years, temperature, precip)

In the case where we didn’t simulate the data and were hypothesizing about these patterns, we would draw a metamodel of the relationships we are about to test. Below is the metamodel for the model we are about to write. Note that this metamodel is very simplified in that it only displays the edges that represent regression or path coefficients. When creating your model, it is important to consider correlated error structure. The covariance of residuals and the residual variances of the variables are typically not displayed in a path diagram, but the model output will describe this information.

Let’s first examine the observed or sample covariance matrix to get a sense of these relationships.

cov(semdata)
##                 catrich       years temperature      precip
## catrich      8825.22098 -1358.42825 -82.1936621  90.2473145
## years       -1358.42825   305.00000  15.3312769 -23.4048821
## temperature   -82.19366    15.33128   1.0431215   0.2863871
## precip         90.24731   -23.40488   0.2863871  97.8564803

Now let’s run our hypothesized SEM. First we write a model statement, then we fit the model using sem() and plot it using lavaanPlot().

##Write the model
# In the model below the only exogenous variable is years
# Endogenous variables are temperature, caterpillar richness, and precipitation (each has at least one arrow pointing at it)
# Precipitation is predicted by years, caterpillar richness is predicted by years and temperature, and temperature is predicted by years

model1 <- 'precip ~ years 
    catrich ~ years 
 catrich ~ temperature
    temperature ~ years'

# fitting the sem model to your data
fit1 <- sem(model1, data = semdata) 
summary(fit1,rsq = T, fit.measures = TRUE, standardized = TRUE) # returns model fit measures (since we specified `fit.measures=TRUE`) and standardized estimates (since we specified `standardized=TRUE`); the standardized estimates are the same coefficients we would get had we scaled all the data, and `rsq = T` adds an R-squared value
## lavaan 0.6-10 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                            60
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 5.118
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.024
## 
## Model Test Baseline Model:
## 
##   Test statistic                               175.155
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.976
##   Tucker-Lewis Index (TLI)                       0.854
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -580.210
##   Loglikelihood unrestricted model (H1)       -577.651
##                                                       
##   Akaike (AIC)                                1176.420
##   Bayesian (BIC)                              1193.175
##   Sample-size adjusted Bayesian (BIC)         1168.013
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.262
##   90 Percent confidence interval - lower         0.077
##   90 Percent confidence interval - upper         0.504
##   P-value RMSEA <= 0.05                          0.034
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.054
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   precip ~                                                              
##     years            -0.077    0.072   -1.059    0.290   -0.077   -0.135
##   catrich ~                                                             
##     years            -1.702    0.652   -2.609    0.009   -1.702   -0.314
##     temperature     -54.746   11.124   -4.921    0.000  -54.746   -0.592
##   temperature ~                                                         
##     years             0.050    0.004   13.027    0.000    0.050    0.860
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .precip ~~                                                             
##    .catrich          64.990   57.207    1.136    0.256   64.990    0.148
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .precip           94.459   17.246    5.477    0.000   94.459    0.982
##    .catrich        2034.063  371.367    5.477    0.000 2034.063    0.231
##    .temperature       0.268    0.049    5.477    0.000    0.268    0.261
## 
## R-Square:
##                    Estimate
##     precip            0.018
##     catrich           0.769
##     temperature       0.739
# Plot model
lavaanPlot(name = "MODEL1", fit1, labels = semdata, coefs = TRUE) # prints a path diagram of the relationships specified in `model1`, along with the estimated path coefficients from the fitted model (here, the unstandardized coefficient estimates in the summary output)

Let’s break down the output:

[Figure: annotated lavaan summary() output, with the sections labeled A-K as described below]

Model Fit Estimates

The general goal of SEM is to identify a set of maximally likely population parameters given a) the variances and covariances of the observed data and b) several user-specified paths thought to give rise to said variances and covariances. Thus, model fit is based on the comparison of the observed variance-covariance matrix and model-implied covariance matrix. If the model is an accurate representation of the patterns among the data, the variance-covariance matrix using the estimated parameters should match the variance-covariance of the sample or observed matrix.
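
You can pull both matrices from a fitted lavaan object and compare them directly yourself; for example, with fit1 from above:

fitted(fit1)$cov              # model-implied covariance matrix
lavInspect(fit1, "sampstat")  # observed (sample) covariance matrix
resid(fit1)                   # residuals: observed minus model-implied covariances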

A) Estimator Methods
  • Indicates that Maximum Likelihood (ML) was used to estimate parameters. Still fuzzy on ML? Click here.
  • The optimization method is the optimizer used to find the best-fitting parameter values for this estimator (here: NLMINB).
  • The number of model parameters (here: 8): 4 regression parameters + 1 covariance + 3 variances = 8 parameters.

B) Model Test User Model

This is the section describing the fitted model. Values in this section are reported in publications for model fit (e.g., χ2 = 2.3, p=0.128, df=1)

  • Test statistic for the model that was specified by the user/ the fitted model. Here, the test statistic calculated is a Chi-squared test statistic and represents the observed Chi-square value for the fitted model.
  • Degrees of freedom for the fitted model (df = 1 here; in general, model df = (number of unique variances and covariances being modeled) - (number of estimated parameters)).
  • p-value for the fitted model, from the Chi-square null hypothesis test. The null hypothesis is that the fitted model-implied covariance matrix and the observed sample covariance matrix are equal. If our specified model is “good,” we fail to reject that null hypothesis; if the specified model is a poor fit to the data, we reject the null. Simply put, a high p-value is desired because it suggests the model’s structure adequately represents the relationships in the data.

C) Model Test Baseline Model

The baseline model represents a null model whereby your observed variables are constrained not to covary with any other variables: covariances are fixed to 0 so that only variances are estimated in the covariance matrix. It represents a poor-fitting model. This is tested against your fitted model.

  • minimum function test statistic is the test statistic for the null model (chi-square value)
  • degrees of freedom for the null model
  • p-value for the null model compared to the fitted model.

D-G) More model fit statistics

A bunch of other model fit statistics, beyond the scope of this lesson. In the spirit of somewhat arbitrary cut-offs: RMSEA less than 0.05, CFI/TLI above 0.90 to 0.95, and SRMR less than 0.08 are accepted as indicators of “good fit”; refer to Hu and Bentler (1999).
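
If you just want these indices without the full summary, lavaan’s fitMeasures() returns them by name; for example:

fitMeasures(fit1, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))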

Parameter Estimates

H) Regression Coefficients

Values in this section get interpreted just like any other linear model. These are the path coefficients that are presented on the path diagrams.

  • Estimates: Contains the (estimated or fixed) parameter value for each model parameter. You have the option to report unstandardized estimates (reported under Estimate) or standardized estimates (reported under Std.all). Standardized estimates are most useful for interpreting effects when neither predictors nor outcomes have natural scales, and for gauging relative effect sizes across predictors on different scales. It is also an option to do partial standardization if you wish to say that a one-unit increase in the predictor leads to some change in standard deviations of the response; if you wish for the output to give partially standardized coefficients use summary(fit, std.nox=TRUE, rsquare=TRUE). (See also the parameterEstimates() sketch after this list.) Example interpretation: the unstandardized coefficient for precip ~ years indicates that each year precipitation decreases by 0.077 units. The standardized coefficient indicates that for every 1 standard deviation increase in year, precipitation decreases by 0.135 standard deviations.

  • Std.err: Contains the standard error for each estimated parameter

  • Z-value: contains the Wald statistic (which is simply obtained by dividing the parameter value by its standard error), and the last column (P(>|z|)) contains the p-value for testing the null hypothesis that the parameter equals zero in the population
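
As an alternative to the long summary, parameterEstimates() returns every estimated parameter as a data frame (with confidence intervals), and standardized = TRUE appends the standardized columns; standardizedSolution() returns just the standardized estimates:

parameterEstimates(fit1, standardized = TRUE)
# standardizedSolution(fit1)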

I, J) Variance Covariance Outputs

In these sections it is important to note that a (.) before the variable term (e.g., .precip) indicates that it is a residual term. Also, variances and covariances can be displayed on the path diagram and there are specific ways to illustrate those relationships; however, this information is often omitted from published path diagrams for simplification. Published path diagrams, in ecology, typically only include edges representing the regression coefficients.

I) Covariances

This section displays the covariance of the residuals among the endogenous variables (precip and catrich, the predicted variables, i.e., those with arrows pointed at them). In our example the value can be interpreted as a positive association between the residuals of precip and catrich that was not accounted for by the predictors. As a default, lavaan allows all exogenous variables to covary but fixes the residual covariances among endogenous variables to zero. The residual terms of precip and catrich are freed to covary within this section. Recall, to designate a covariance (or variance) we use the ~~ operator. Thus, .precip ~~ .catrich tells lavaan to include a residual covariance for precip and catrich. Note that the value in the Std.all column represents the correlation.

J) Variances

This section represents the residual variance of the endogenous variables precip, temp and catrich: the left-over variance that is not explained by the predictor(s). Notice that precip has a lot of unexplained variance (0.982 in the Std.all column) while catrich (0.231) and temperature (0.261) have much smaller residual variances. This makes sense since we simulated temp and catrich to vary with each other and with years.

K) R-squared

The amount of total variance explained by the model for each variable. As we would expect, most of the variance in precip was not explained by the model. Had we simulated precip as a function of years, we would have expected much more of its variance to be explained by the model.
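
The R-squared values can also be extracted on their own from the fitted object:

lavInspect(fit1, "rsquare")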

Let’s try the same model with our variables scaled.

semdata_scaled <- apply(semdata, MARGIN = 2, scale)

model2 <- 'precip ~ years
    catrich ~ years 
    catrich ~ temperature
    temperature ~ years'

fit2 <- sem(model2, data = semdata_scaled)
summary(fit2, rsq = T, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-10 ended normally after 14 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                            60
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 5.118
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.024
## 
## Model Test Baseline Model:
## 
##   Test statistic                               175.155
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.976
##   Tucker-Lewis Index (TLI)                       0.854
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -168.878
##   Loglikelihood unrestricted model (H1)       -166.319
##                                                       
##   Akaike (AIC)                                 353.755
##   Bayesian (BIC)                               370.510
##   Sample-size adjusted Bayesian (BIC)          345.348
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.262
##   90 Percent confidence interval - lower         0.077
##   90 Percent confidence interval - upper         0.504
##   P-value RMSEA <= 0.05                          0.034
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.054
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   precip ~                                                              
##     years            -0.135    0.128   -1.059    0.290   -0.135   -0.135
##   catrich ~                                                             
##     years            -0.316    0.121   -2.609    0.009   -0.316   -0.314
##     temperature      -0.595    0.121   -4.921    0.000   -0.595   -0.592
##   temperature ~                                                         
##     years             0.860    0.066   13.027    0.000    0.860    0.860
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .precip ~~                                                             
##    .catrich           0.070    0.062    1.136    0.256    0.070    0.148
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .precip            0.965    0.176    5.477    0.000    0.965    0.982
##    .catrich           0.230    0.042    5.477    0.000    0.230    0.231
##    .temperature       0.257    0.047    5.477    0.000    0.257    0.261
## 
## R-Square:
##                    Estimate
##     precip            0.018
##     catrich           0.769
##     temperature       0.739
lavaanPlot(name = "MODEL2", fit2, labels = semdata_scaled, coefs = TRUE)

SEMs are especially powerful when we use them to test different hypotheses and perform model comparison. This next model does not include a trend in caterpillars over time. We can then perform model comparison using AIC or some other metric.

semdata_scaled <- apply(semdata, MARGIN = 2, scale)

model3 <- 'precip ~ years
    catrich ~ temperature
    temperature ~ years'

fit3 <- sem(model3, data = semdata_scaled)
summary(fit3, fit.measures = TRUE, standardized = TRUE, rsquare=TRUE)
## lavaan 0.6-10 ended normally after 13 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         7
##                                                       
##   Number of observations                            60
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                11.148
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.004
## 
## Model Test Baseline Model:
## 
##   Test statistic                               175.155
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.946
##   Tucker-Lewis Index (TLI)                       0.838
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -171.892
##   Loglikelihood unrestricted model (H1)       -166.319
##                                                       
##   Akaike (AIC)                                 357.785
##   Bayesian (BIC)                               372.445
##   Sample-size adjusted Bayesian (BIC)          350.428
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.276
##   90 Percent confidence interval - lower         0.134
##   90 Percent confidence interval - upper         0.444
##   P-value RMSEA <= 0.05                          0.007
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.055
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   precip ~                                                              
##     years            -0.101    0.127   -0.796    0.426   -0.101   -0.102
##   catrich ~                                                             
##     temperature      -0.870    0.066  -13.147    0.000   -0.870   -0.860
##   temperature ~                                                         
##     years             0.860    0.066   13.027    0.000    0.860    0.860
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .precip ~~                                                             
##    .catrich           0.112    0.067    1.680    0.093    0.112    0.222
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .precip            0.966    0.176    5.477    0.000    0.966    0.990
##    .catrich           0.262    0.048    5.477    0.000    0.262    0.260
##    .temperature       0.257    0.047    5.477    0.000    0.257    0.261
## 
## R-Square:
##                    Estimate
##     precip            0.010
##     catrich           0.740
##     temperature       0.739
lavaanPlot(name = "MODEL3", fit3, labels = semdata_scaled, coefs = TRUE)
AIC(fit1, fit2, fit3)
##      df       AIC
## fit1  8 1176.4203
## fit2  8  353.7550
## fit3  7  357.7849
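
Because model3 is nested within model2 (it simply drops the years effect on catrich), the two scaled fits can also be compared with a likelihood-ratio (chi-square difference) test; a quick sketch:

anova(fit2, fit3)   # equivalent to lavTestLRT(fit2, fit3)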

INTERPRETATION SUMMARY Our second model appears to be the best (note that fit1 is the same model fit to the unscaled data, so its AIC is not directly comparable; the meaningful comparison is fit2 versus fit3), so let’s take a closer look. We can see that years has a positive effect on temperature, which is to say that temperature is increasing over time. We can also see that temperature has a direct negative effect on caterpillar richness. There is also a slight negative trend in richness over time after accounting for temperature change, and a slight negative trend in precip. The precip result should be interpreted with caution, as we did not simulate a relationship between precip and year. In fact, if you look at the model output, there is no reason to suspect this is not due to random noise (high p-value).

Now for the indirect effect. We only have one in this model: the impact of year on caterpillar richness that is mediated by temperature change. The indirect effect is the product of the path coefficients along the mediated path (0.86 × -0.60), and the total effect is the direct effect plus the indirect effect. So the total impact of year on caterpillars is (-0.32 + (0.86 × -0.60)), which comes to about -0.83 using the unrounded estimates.
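
If you would rather have lavaan estimate (and give a standard error for) the indirect and total effects, you can label the paths and define derived quantities with the := operator; a sketch re-using the scaled data:

model2_ind <- 'precip ~ years
    catrich ~ c*years
    catrich ~ b*temperature
    temperature ~ a*years
    indirect := a*b        # year effect on richness mediated by temperature
    total := c + (a*b)     # direct plus indirect effect'
fit2_ind <- sem(model2_ind, data = semdata_scaled)
# summary(fit2_ind) now includes rows for `indirect` and `total`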

Checking for Understanding

  1. Try a few other SEMs with the above data. a) How does the AIC vary between them? b) Try reversing the causality. What happens then?

Multiple paths to the same answer

In this class we have introduced many techniques, and depending on your data and questions some will be more appropriate than others. That said, often what some may call the wrong technique will still yield reasonable interpretations that reflect some of the underlying patterns in your data. This is not to say picking the right technique is not important (it is very important), but it can be good to try several techniques to see where they agree and where they differ. It is also important to remember that you probably will not report everything you tried, and the results you do report should be chosen based on whether the test was appropriate, not on which one gives you the “best” results.

For this last section we will look at the same dataset in several different ways. I have simulated relationships among several variables. Elevation is a factor with two levels: high and low. At each of these two elevations 100 plots were placed. In each plot, the leaf area of the plant and the level of herbivory (an ordinal variable from 0 to 2) were measured, and survival at the end of the season was recorded.

dat <- read.csv("multiple_paths.csv")

Based on our knowledge of the system we expect these relationships between variables (this is how I simulated them). Recall that arrows indicate positive relationships and blunted arrows represent negative relationships. Also note that this metamodel specifies the relationships for the low elevation level.

SEMs

These data were simulated with SEMs in mind, so it seems reasonable to analyze them using SEMs. There are a couple of different ways we can treat our variables. The first interesting one is herbivory: it was measured on an ordinal scale with three categories, so treating it like any other continuous variable is probably not ideal, although it does make the model easier to interpret. Let’s start there.

#scale herbivory and leaf area
moddat1 <- dat
moddat1$herbivory <- scale(moddat1$herbivory)
moddat1$leaf_area <- scale(moddat1$leaf_area)

model <- 'leaf_area ~ elevation 
    herbivory ~ elevation
    survival ~ herbivory +elevation
'
#Note that writing the model like this is analogous to the model above
#model <- 'leaf_area ~ elevation 
#   survival ~  elevation
#   herbivory ~ elevation
#   survival ~ herbivory'

#fit model
fit <- sem(model, data = moddat1)

#summary output
summary(fit, rsq = T, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-10 ended normally after 10 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                           200
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 2.322
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.128
## 
## Model Test Baseline Model:
## 
##   Test statistic                               108.420
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.987
##   Tucker-Lewis Index (TLI)                       0.923
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -643.959
##   Loglikelihood unrestricted model (H1)             NA
##                                                       
##   Akaike (AIC)                                1303.919
##   Bayesian (BIC)                              1330.306
##   Sample-size adjusted Bayesian (BIC)         1304.961
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.081
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.224
##   P-value RMSEA <= 0.05                          0.220
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.031
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   leaf_area ~                                                           
##     elevation         0.598    0.135    4.441    0.000    0.598    0.300
##   herbivory ~                                                           
##     elevation         0.927    0.125    7.425    0.000    0.927    0.465
##   survival ~                                                            
##     herbivory        -0.191    0.034   -5.649    0.000   -0.191   -0.410
##     elevation         0.367    0.067    5.445    0.000    0.367    0.395
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .leaf_area ~~                                                          
##    .survival         -0.009    0.028   -0.333    0.739   -0.009   -0.024
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .leaf_area         0.906    0.091   10.000    0.000    0.906    0.910
##    .herbivory         0.780    0.078   10.000    0.000    0.780    0.784
##    .survival          0.178    0.018   10.000    0.000    0.178    0.826
## 
## R-Square:
##                    Estimate
##     leaf_area         0.090
##     herbivory         0.216
##     survival          0.174
#plot it in a path diagram
lavaanPlot(name = "MODEL1", fit, coefs = TRUE)

Instead of treating herbivory like a continuous variable, we can treat it as a categorical variable. This is a more conservative approach because we get an estimate for each herbivory level. The downside is that the model looks a bit messier and it will be slightly harder to fit; that said, we have more than enough data here. First I dummy code the categorical variable, then I scale leaf area, before finally running the model.
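
As an aside (not pursued in this lab), lavaan can also declare an ordinal endogenous variable directly via its ordered argument, which switches to a threshold-based (WLSMV-type) estimator rather than ML. A sketch, assuming the original unscaled herbivory column in `dat`:

# ord_model <- 'leaf_area ~ elevation
#     herbivory ~ elevation
#     survival ~ herbivory + elevation'
# fit_ord <- sem(ord_model, data = dat, ordered = "herbivory")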

# Let's first try dummy coding herbivory the tidyverse way. Result is `moddat2.1`

###dummy code herbivory into a categorical variable with 3 levels 
moddat2 <- dat
moddat2.1 <- moddat2 %>% 
  mutate(ones = 1) %>% 
  spread(key = herbivory, value = ones, fill = 0)

colnames(moddat2.1)[5:7] <- c("herb0", "herb1", "herb2") #each level of herbivory is now a 0/1 column where 1 indicates that the sample had that level of herbivory
moddat2.1
##     plot_num elevation leaf_area survival herb0 herb1 herb2
## 1          1      high  223.9341        0     0     1     0
## 2          2      high  241.1445        0     0     1     0
## 3          3      high  238.8826        0     0     0     1
## 4          4      high  232.0278        0     0     0     1
## 5          5      high  247.5750        1     1     0     0
## 6          6      high  223.0350        0     1     0     0
## 7          7      high  241.9685        0     0     1     0
## 8          8      high  201.4210        0     1     0     0
## 9          9      high  224.1859        0     0     1     0
## 10        10      high  218.1735        0     0     0     1
## 11        11      high  213.8530        1     1     0     0
## 12        12      high  227.9871        0     0     0     1
## 13        13      high  218.4115        1     1     0     0
## 14        14      high  217.1436        0     0     1     0
## 15        15      high  219.5362        1     1     0     0
## 16        16      high  233.8519        0     0     1     0
## 17        17      high  248.4063        0     0     1     0
## 18        18      high  221.2045        0     1     0     0
## 19        19      high  227.4765        0     0     1     0
## 20        20      high  227.5430        0     1     0     0
## 21        21      high  235.3578        1     0     1     0
## 22        22      high  223.6971        0     0     1     0
## 23        23      high  230.0555        1     1     0     0
## 24        24      high  218.9193        0     1     0     0
## 25        25      high  217.3493        0     1     0     0
## 26        26      high  220.5121        0     0     1     0
## 27        27      high  230.4666        0     0     1     0
## 28        28      high  227.0012        0     1     0     0
## 29        29      high  215.6307        0     1     0     0
## 30        30      high  223.9131        0     0     0     1
## 31        31      high  215.6951        1     1     0     0
## 32        32      high  200.1617        0     1     0     0
## 33        33      high  231.8544        0     0     1     0
## 34        34      high  231.0273        0     1     0     0
## 35        35      high  228.5112        0     0     1     0
## 36        36      high  216.3507        0     1     0     0
## 37        37      high  226.1610        0     0     1     0
## 38        38      high  211.7191        0     0     1     0
## 39        39      high  218.5448        0     1     0     0
## 40        40      high  220.7018        0     1     0     0
## 41        41      high  224.4569        0     0     1     0
## 42        42      high  230.9382        1     1     0     0
## 43        43      high  219.7702        0     0     0     1
## 44        44      high  219.6076        0     0     1     0
## 45        45      high  213.1130        0     1     0     0
## 46        46      high  208.9755        0     1     0     0
## 47        47      high  231.6221        0     0     1     0
## 48        48      high  245.4474        0     1     0     0
## 49        49      high  235.1771        0     1     0     0
## 50        50      high  214.3386        0     1     0     0
## 51        51      high  225.4341        0     0     0     1
## 52        52      high  237.1792        0     0     1     0
## 53        53      high  227.8549        1     1     0     0
## 54        54      high  219.0544        0     0     1     0
## 55        55      high  228.5753        0     0     1     0
## 56        56      high  230.8852        1     1     0     0
## 57        57      high  220.5188        1     1     0     0
## 58        58      high  225.0896        1     1     0     0
## 59        59      high  219.6584        1     1     0     0
## 60        60      high  223.4775        1     1     0     0
## 61        61      high  239.6187        0     0     0     1
## 62        62      high  229.7886        1     0     1     0
## 63        63      high  218.1291        0     0     1     0
## 64        64      high  237.7585        0     1     0     0
## 65        65      high  231.3198        1     1     0     0
## 66        66      high  223.6571        0     0     1     0
## 67        67      high  236.6569        0     0     0     1
## 68        68      high  213.6713        0     0     1     0
## 69        69      high  227.4810        0     0     0     1
## 70        70      high  199.1948        0     0     1     0
## 71        71      high  226.9965        0     1     0     0
## 72        72      high  212.5137        0     0     1     0
## 73        73      high  220.4063        0     0     1     0
## 74        74      high  219.2520        0     0     1     0
## 75        75      high  213.9397        0     1     0     0
## 76        76      high  222.2626        1     1     0     0
## 77        77      high  245.0445        0     1     0     0
## 78        78      high  221.4576        0     1     0     0
## 79        79      high  223.9098        0     0     0     1
## 80        80      high  200.2534        1     1     0     0
## 81        81      high  228.6278        0     0     1     0
## 82        82      high  230.3779        0     1     0     0
## 83        83      high  229.8062        0     0     0     1
## 84        84      high  209.7952        0     0     1     0
## 85        85      high  232.2660        0     1     0     0
## 86        86      high  230.5404        1     1     0     0
## 87        87      high  236.0617        0     1     0     0
## 88        88      high  222.2768        0     0     1     0
## 89        89      high  230.4982        0     1     0     0
## 90        90      high  219.8582        1     1     0     0
## 91        91      high  236.2455        0     0     1     0
## 92        92      high  219.1919        0     0     1     0
## 93        93      high  224.1482        0     0     1     0
## 94        94      high  223.0236        0     0     1     0
## 95        95      high  215.8499        1     0     0     1
## 96        96      high  230.0664        0     0     1     0
## 97        97      high  219.6478        0     0     0     1
## 98        98      high  220.0212        1     1     0     0
## 99        99      high  221.6218        0     0     1     0
## 100      100      high  250.0842        0     0     0     1
## 101      101       low  222.3875        0     0     0     1
## 102      102       low  246.2901        1     0     1     0
## 103      103       low  226.9909        0     0     0     1
## 104      104       low  210.1867        0     0     1     0
## 105      105       low  235.1495        0     0     1     0
## 106      106       low  235.7125        1     1     0     0
## 107      107       low  235.4264        0     0     1     0
## 108      108       low  225.9778        1     0     1     0
## 109      109       low  248.2218        0     0     0     1
## 110      110       low  235.0839        1     1     0     0
## 111      111       low  233.8128        1     0     1     0
## 112      112       low  241.9972        0     0     0     1
## 113      113       low  246.5572        0     0     0     1
## 114      114       low  226.4523        0     0     0     1
## 115      115       low  223.9053        0     0     1     0
## 116      116       low  225.2227        0     0     0     1
## 117      117       low  223.1540        0     0     0     1
## 118      118       low  215.2080        0     0     1     0
## 119      119       low  218.1263        0     1     0     0
## 120      120       low  230.6563        0     0     0     1
## 121      121       low  233.3210        0     0     1     0
## 122      122       low  214.4861        1     0     1     0
## 123      123       low  235.8683        0     0     0     1
## 124      124       low  259.2070        0     0     0     1
## 125      125       low  233.6055        1     0     1     0
## 126      126       low  244.4771        0     0     0     1
## 127      127       low  233.4685        0     0     0     1
## 128      128       low  221.5074        1     0     0     1
## 129      129       low  229.7615        0     0     0     1
## 130      130       low  234.2079        1     0     0     1
## 131      131       low  229.8631        0     0     0     1
## 132      132       low  229.7241        0     0     0     1
## 133      133       low  229.8016        1     0     1     0
## 134      134       low  234.5849        1     0     0     1
## 135      135       low  233.4515        0     0     0     1
## 136      136       low  225.3502        0     0     0     1
## 137      137       low  224.4843        1     0     1     0
## 138      138       low  243.0909        1     0     1     0
## 139      139       low  234.3399        0     0     0     1
## 140      140       low  222.7211        0     0     0     1
## 141      141       low  246.9550        0     0     0     1
## 142      142       low  216.8487        1     0     1     0
## 143      143       low  246.3051        0     0     1     0
## 144      144       low  228.3345        1     1     0     0
## 145      145       low  229.0458        1     0     1     0
## 146      146       low  234.9262        1     0     0     1
## 147      147       low  245.0594        0     0     0     1
## 148      148       low  239.1504        1     0     1     0
## 149      149       low  229.4033        0     0     0     1
## 150      150       low  233.2787        1     0     0     1
## 151      151       low  214.2124        0     0     0     1
## 152      152       low  230.0630        0     0     0     1
## 153      153       low  243.0422        0     0     1     0
## 154      154       low  234.0899        1     1     0     0
## 155      155       low  239.7423        1     1     0     0
## 156      156       low  225.4010        1     0     0     1
## 157      157       low  231.9806        0     0     0     1
## 158      158       low  230.6718        1     0     1     0
## 159      159       low  242.4544        0     1     0     0
## 160      160       low  238.9013        1     1     0     0
## 161      161       low  218.5473        1     0     1     0
## 162      162       low  220.0033        1     0     0     1
## 163      163       low  239.5431        0     0     0     1
## 164      164       low  226.3389        0     0     0     1
## 165      165       low  215.3665        1     1     0     0
## 166      166       low  225.2492        0     0     0     1
## 167      167       low  229.2681        1     0     0     1
## 168      168       low  234.3611        1     0     1     0
## 169      169       low  235.4550        0     0     0     1
## 170      170       low  230.2943        1     0     0     1
## 171      171       low  225.0157        0     0     0     1
## 172      172       low  226.7957        0     0     0     1
## 173      173       low  239.0662        1     0     0     1
## 174      174       low  241.6744        0     0     1     0
## 175      175       low  225.6601        1     0     0     1
## 176      176       low  231.4486        0     0     0     1
## 177      177       low  238.2591        0     0     1     0
## 178      178       low  241.1180        0     0     0     1
## 179      179       low  252.6027        1     0     0     1
## 180      180       low  236.7156        1     1     0     0
## 181      181       low  206.5731        0     0     0     1
## 182      182       low  222.4729        0     0     1     0
## 183      183       low  240.2680        0     0     0     1
## 184      184       low  216.1082        0     0     1     0
## 185      185       low  218.8058        1     0     1     0
## 186      186       low  217.7569        1     0     0     1
## 187      187       low  222.5113        0     0     0     1
## 188      188       low  239.5409        0     0     0     1
## 189      189       low  239.7349        1     0     1     0
## 190      190       low  231.0164        0     0     0     1
## 191      191       low  250.7407        0     0     0     1
## 192      192       low  225.4600        0     0     0     1
## 193      193       low  211.8487        1     0     0     1
## 194      194       low  239.9444        0     0     1     0
## 195      195       low  226.8627        0     0     1     0
## 196      196       low  239.9382        1     0     0     1
## 197      197       low  223.2647        1     1     0     0
## 198      198       low  219.6682        0     0     0     1
## 199      199       low  222.4671        1     1     0     0
## 200      200       low  232.4861        0     1     0     0
#Fun tangent. If you are interested, this is the same step as above but showing you the base R method to do this. Result is `moddat2.2`
herb_vals <- as.character(unique(moddat2$herbivory))
moddat2.2 <- data.frame(matrix(ncol = (ncol(moddat2)-1) + length(herb_vals)))
colnames(moddat2.2) <- c(colnames(moddat2)[-4], paste0("herb", herb_vals))

for(i in 1:length(herb_vals)) {
  temp_dat <- subset(moddat2, herbivory == herb_vals[i])
  temp_dat <- subset(temp_dat, select= -herbivory)
  new_mat <- matrix(data = 0, nrow = nrow(temp_dat), ncol = length(herb_vals))
  colnames(new_mat) <- paste0("herb", herb_vals)
  new_mat[,i] <- 1
  moddat2.2 <- rbind(moddat2.2, cbind(temp_dat, new_mat))
}
moddat2.2 <- moddat2.2[-1,]
# ok. End tangent

# Let's use `moddat2.1`
moddat2.1$leaf_area <- scale(moddat2.1$leaf_area) #scale leaf area (scaling the column already in moddat2.1 keeps the rows aligned)

Now that we have the dataset moddat2.1 ready to go with the dummy-coded herbivory columns, let’s run a similar model, updated to use categorical herbivory.

model <- '
  leaf_area ~ elevation 
    herb1 ~ elevation
    herb2 ~ elevation
    survival ~ herb1 + herb2 + elevation'

fit <- sem(model, data = moddat2.1)
summary(fit, rsq = T, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-10 ended normally after 14 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        11
##                                                       
##   Number of observations                           200
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                78.230
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               181.187
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.561
##   Tucker-Lewis Index (TLI)                      -0.465
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -634.292
##   Loglikelihood unrestricted model (H1)             NA
##                                                       
##   Akaike (AIC)                                1290.585
##   Bayesian (BIC)                              1326.866
##   Sample-size adjusted Bayesian (BIC)         1292.017
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.354
##   90 Percent confidence interval - lower         0.289
##   90 Percent confidence interval - upper         0.424
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.157
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   leaf_area ~                                                           
##     elevation         0.598    0.135    4.441    0.000    0.598    0.300
##   herb1 ~                                                               
##     elevation        -0.090    0.067   -1.345    0.179   -0.090   -0.095
##   herb2 ~                                                               
##     elevation         0.420    0.061    6.881    0.000    0.420    0.437
##   survival ~                                                            
##     herb1            -0.307    0.063   -4.889    0.000   -0.307   -0.293
##     herb2            -0.473    0.069   -6.869    0.000   -0.473   -0.455
##     elevation         0.361    0.066    5.439    0.000    0.361    0.362
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .leaf_area ~~                                                          
##    .survival         -0.010    0.028   -0.354    0.723   -0.010   -0.025
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .leaf_area         0.906    0.091   10.000    0.000    0.906    0.910
##    .herb1             0.224    0.022   10.000    0.000    0.224    0.991
##    .herb2             0.186    0.019   10.000    0.000    0.186    0.809
##    .survival          0.177    0.018   10.000    0.000    0.177    0.711
## 
## R-Square:
##                    Estimate
##     leaf_area         0.090
##     herb1             0.009
##     herb2             0.191
##     survival          0.289
lavaanPlot(name = "MODEL1", fit, coefs = TRUE)

You will probably recall from the complex models lab that we do not usually model binomial data with an ordinary Gaussian linear model, which is what both of the previous models have done. lavaan, as we have used it here, only fits Gaussian models, so we need the piecewiseSEM package for fancier models.

moddat3 <- dat

moddat3$leaf_area <- scale(moddat3$leaf_area) #first, scale leaf area like we did in the previous models

moddat3 #Notice we are back to the data frame where herbivory is a single column
##     plot_num elevation    leaf_area herbivory survival
## 1          1      high -0.386859392         1        0
## 2          2      high  1.250968150         1        0
## 3          3      high  1.035721351         2        0
## 4          4      high  0.383378059         2        0
## 5          5      high  1.862932911         0        1
## 6          6      high -0.472421078         0        0
## 7          7      high  1.329389156         1        0
## 8          8      high -2.529326397         0        0
## 9          9      high -0.362900296         1        0
## 10        10      high -0.935064826         2        0
## 11        11      high -1.346227357         0        1
## 12        12      high -0.001155493         2        0
## 13        13      high -0.912415163         0        1
## 14        14      high -1.033076232         1        0
## 15        15      high -0.805383708         0        1
## 16        16      high  0.556968705         1        0
## 17        17      high  1.942039618         1        0
## 18        18      high -0.646619551         0        0
## 19        19      high -0.049741144         1        0
## 20        20      high -0.043420017         0        0
## 21        21      high  0.700278477         1        1
## 22        22      high -0.409408863         1        0
## 23        23      high  0.195681968         0        1
## 24        24      high -0.864090347         0        0
## 25        25      high -1.013501716         0        0
## 26        26      high -0.712513954         1        0
## 27        27      high  0.234807737         1        0
## 28        28      high -0.094972981         0        0
## 29        29      high -1.177051720         0        0
## 30        30      high -0.388854861         2        0
## 31        31      high -1.170925310         0        1
## 32        32      high -2.649161876         0        0
## 33        33      high  0.366876542         1        0
## 34        34      high  0.288170685         0        0
## 35        35      high  0.048719137         1        0
## 36        36      high -1.108533303         0        0
## 37        37      high -0.174934172         1        0
## 38        38      high -1.549302409         1        0
## 39        39      high -0.899736474         0        0
## 40        40      high -0.694460156         0        0
## 41        41      high -0.337102914         1        0
## 42        42      high  0.279684132         0        1
## 43        43      high -0.783118903         2        0
## 44        44      high -0.798589126         1        0
## 45        45      high -1.416651696         0        0
## 46        46      high -1.810393690         0        0
## 47        47      high  0.344774474         1        0
## 48        48      high  1.660457149         0        0
## 49        49      high  0.683087451         0        0
## 50        50      high -1.300017036         0        0
## 51        51      high -0.244110090         2        0
## 52        52      high  0.873612531         1        0
## 53        53      high -0.013734154         0        1
## 54        54      high -0.851237517         1        0
## 55        55      high  0.054818147         1        0
## 56        56      high  0.274640588         0        1
## 57        57      high -0.711877095         0        1
## 58        58      high -0.276896579         0        1
## 59        59      high -0.793752164         0        1
## 60        60      high -0.430315371         0        1
## 61        61      high  1.105768155         2        0
## 62        62      high  0.170290564         1        1
## 63        63      high -0.939293261         1        0
## 64        64      high  0.928747279         0        0
## 65        65      high  0.316002961         0        1
## 66        66      high -0.413214677         1        0
## 67        67      high  0.823905270         2        0
## 68        68      high -1.363517709         1        0
## 69        69      high -0.049318860         2        0
## 70        70      high -2.741180305         1        0
## 71        71      high -0.095420843         0        0
## 72        72      high -1.473683560         1        0
## 73        73      high -0.722577974         1        0
## 74        74      high -0.832428450         1        0
## 75        75      high -1.337974245         0        0
## 76        76      high -0.545925345         0        1
## 77        77      high  1.622118335         0        0
## 78        78      high -0.622533343         0        0
## 79        79      high -0.389168583         2        0
## 80        80      high -2.640433719         0        1
## 81        81      high  0.059814883         1        0
## 82        82      high  0.226366742         0        0
## 83        83      high  0.171959698         2        0
## 84        84      high -1.732392809         1        0
## 85        85      high  0.406047071         0        0
## 86        86      high  0.241829371         0        1
## 87        87      high  0.767266416         0        0
## 88        88      high -0.544575500         1        0
## 89        89      high  0.237816426         0        0
## 90        90      high -0.774745912         0        1
## 91        91      high  0.784756224         1        0
## 92        92      high -0.838149930         1        0
## 93        93      high -0.366483692         1        0
## 94        94      high -0.473503104         1        0
## 95        95      high -1.156197238         2        1
## 96        96      high  0.196721534         1        0
## 97        97      high -0.794762790         2        0
## 98        98      high -0.759230643         0        1
## 99        99      high -0.606904524         1        0
## 100      100      high  2.101719414         2        0
## 101      101       low -0.534044612         2        0
## 102      102       low  1.740656614         1        1
## 103      103       low -0.095954806         2        0
## 104      104       low -1.695130785         1        0
## 105      105       low  0.680461023         1        0
## 106      106       low  0.734037519         0        1
## 107      107       low  0.706805622         1        0
## 108      108       low -0.192367403         1        1
## 109      109       low  1.924482471         2        0
## 110      110       low  0.674218074         0        1
## 111      111       low  0.553250284         1        1
## 112      112       low  1.332121500         2        0
## 113      113       low  1.766074012         2        0
## 114      114       low -0.147214441         2        0
## 115      115       low -0.389598323         1        0
## 116      116       low -0.264230832         2        0
## 117      117       low -0.461092416         2        0
## 118      118       low -1.217275280         1        0
## 119      119       low -0.939560582         0        0
## 120      120       low  0.252862788         2        0
## 121      121       low  0.506446537         1        0
## 122      122       low -1.285980373         1        1
## 123      123       low  0.748863896         2        0
## 124      124       low  2.969895577         2        0
## 125      125       low  0.533521440         1        1
## 126      126       low  1.568115861         2        0
## 127      127       low  0.520480735         2        0
## 128      128       low -0.617797805         2        1
## 129      129       low  0.167705476         2        0
## 130      130       low  0.590846548         2        1
## 131      131       low  0.177372490         2        0
## 132      132       low  0.164145974         2        0
## 133      133       low  0.171525126         1        1
## 134      134       low  0.626729390         2        1
## 135      135       low  0.518865811         2        0
## 136      136       low -0.252098664         2        0
## 137      137       low -0.334502287         1        1
## 138      138       low  1.436200428         1        1
## 139      139       low  0.603411303         2        0
## 140      140       low -0.502294754         2        0
## 141      141       low  1.803933617         2        0
## 142      142       low -1.061143362         1        1
## 143      143       low  1.742085939         1        0
## 144      144       low  0.031902118         0        1
## 145      145       low  0.099595635         1        1
## 146      146       low  0.659210171         2        1
## 147      147       low  1.623533495         2        0
## 148      148       low  1.061202237         1        1
## 149      149       low  0.133616860         2        0
## 150      150       low  0.502426385         2        1
## 151      151       low -1.312029969         2        0
## 152      152       low  0.196399461         2        0
## 153      153       low  1.431570336         1        0
## 154      154       low  0.579621631         0        1
## 155      155       low  1.117530229         0        1
## 156      156       low -0.247257325         2        1
## 157      157       low  0.378886722         2        0
## 158      158       low  0.254339272         1        1
## 159      159       low  1.375626357         0        0
## 160      160       low  1.037499978         0        1
## 161      161       low -0.899497145         1        1
## 162      162       low -0.760935216         2        1
## 163      163       low  1.098579462         2        0
## 164      164       low -0.158002294         2        0
## 165      165       low -1.202197910         0        1
## 166      166       low -0.261708100         2        0
## 167      167       low  0.120749966         2        1
## 168      168       low  0.605433511         1        1
## 169      169       low  0.709534161         2        0
## 170      170       low  0.218415143         2        1
## 171      171       low -0.283930114         2        0
## 172      172       low -0.114537721         2        0
## 173      173       low  1.053190485         2        1
## 174      174       low  1.301400904         1        0
## 175      175       low -0.222607475         2        1
## 176      176       low  0.328261763         2        0
## 177      177       low  0.976379636         1        0
## 178      178       low  1.248452523         2        0
## 179      179       low  2.341394959         2        1
## 180      180       low  0.829491958         0        1
## 181      181       low -2.039019723         2        0
## 182      182       low -0.525914175         1        0
## 183      183       low  1.167558159         2        0
## 184      184       low -1.131615875         1        0
## 185      185       low -0.874895418         1        1
## 186      186       low -0.974708559         2        1
## 187      187       low -0.522260773         2        0
## 188      188       low  1.098364418         2        0
## 189      189       low  1.116826507         1        1
## 190      190       low  0.287125926         2        0
## 191      191       low  2.164195489         2        0
## 192      192       low -0.241644332         2        0
## 193      193       low -1.536971088         2        1
## 194      194       low  1.136766052         1        0
## 195      195       low -0.108159197         1        0
## 196      196       low  1.136179406         2        1
## 197      197       low -0.450565687         0        1
## 198      198       low -0.792825915         2        0
## 199      199       low -0.526464012         0        1
## 200      200       low  0.426992224         0        0
# `psem` is a function in the `piecewiseSEM` package that allows you to fit an SEM while specifying non-normal distributions for individual responses- whereas the lavaan models above assume normally distributed responses.
# The syntax also differs: instead of enclosing the model in quotes, `psem()` wraps a set of component models, with an `lm()` (or `glm()`) call for each relationship.
glmsem <- psem(
  lm(leaf_area ~ elevation, moddat3), 
  lm(herbivory ~ elevation, moddat3),
  glm(survival ~ herbivory + leaf_area + elevation, "binomial", moddat3)
)

# As a summary output we will use `coefs`. This outputs the regression coefficients (path coefficients) for our model
coefs(glmsem)
##     Response        Predictor Estimate Std.Error  DF Crit.Value P.Value
## 1  leaf_area        elevation        -         -   1    19.5256  0.0000
## 2  leaf_area elevation = high  -0.2989    0.0956 198    -3.1245  0.0020
## 3  leaf_area  elevation = low   0.2989    0.0956 198     3.1245  0.0020
## 4  herbivory        elevation        -         -   1    54.5795  0.0000
## 5  herbivory elevation = high     0.69    0.0718 198     9.6121  0.0000
## 6  herbivory  elevation = low     1.44    0.0718 198    20.0600  0.0000
## 7   survival        herbivory  -1.3462    0.2766 196    -4.8670  0.0000
## 8   survival        leaf_area  -0.0766    0.1764 196    -0.4345  0.6639
## 9   survival        elevation        -         -   1    26.9222  0.0000
## 10  survival elevation = high  -2.0447    0.3288 Inf    -6.2180  0.0000
## 11  survival  elevation = low   0.1363     0.251 Inf     0.5431  0.5870
##    Std.Estimate    
## 1             - ***
## 2             -  **
## 3             -  **
## 4             - ***
## 5             - ***
## 6             - ***
## 7         -0.51 ***
## 8       -0.0359    
## 9             - ***
## 10            - ***
## 11            -
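
Beyond the path coefficients, piecewiseSEM can also report an overall goodness-of-fit test based on tests of directed separation (Fisher's C). A minimal sketch, assuming `summary()` and `fisherC()` behave as in recent piecewiseSEM releases:

# Overall model fit via tests of directed separation.
# A small p-value suggests a missing path; a large p-value suggests the
# hypothesized structure is consistent with the data.
summary(glmsem)  # coefficients, R2, d-sep tests, and Fisher's C in one output
fisherC(glmsem)  # just the Fisher's C statistic, df, and p-value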

piecewiseSEM does not have a "nice" plotting function, so you could make a path diagram in illustration software (e.g., Adobe Illustrator or PowerPoint). If you look at the coefficient directionality and the magnitude of the estimates, the patterns follow our overall story!
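
If you would rather stay in R, you can sketch a rough path diagram in code. This is only a sketch: it assumes the DiagrammeR package (not used elsewhere in this lab), whose `grViz()` function renders Graphviz DOT syntax, and the edge labels are simply copied from the `coefs(glmsem)` output above (shown as high / low where the estimate differs by elevation).

# A rough, code-drawn path diagram (sketch only; DiagrammeR is an assumption here).
# Edge labels are the unstandardized estimates from coefs(glmsem) above.
library(DiagrammeR)
grViz('
digraph sem {
  graph [rankdir = LR]
  node  [shape = box]
  elevation -> leaf_area [label = "-0.30 / 0.30"]
  elevation -> herbivory [label = "0.69 / 1.44"]
  elevation -> survival  [label = "-2.04 / 0.14"]
  herbivory -> survival  [label = "-1.35"]
  leaf_area -> survival  [label = "-0.08 (ns)"]
}
')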

Linear models

So far we have only looked at different flavors of SEM. What if we instead fit ordinary (generalized) linear models? Can we pick up any of the signal in the data?

moddat4 <- dat
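
# With no family argument, glm() defaults to gaussian, so this first model treats
# the 0/1 survival response as continuous (i.e., a linear probability model).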

mod4.1 <- glm(survival ~ elevation + herbivory + leaf_area + elevation*herbivory, data = moddat4)
summary(mod4.1)
## 
## Call:
## glm(formula = survival ~ elevation + herbivory + leaf_area + 
##     elevation * herbivory, data = moddat4)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7940  -0.2750  -0.1565   0.4634   1.0518  
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.612963   0.681641   0.899 0.369631    
## elevationlow            0.418324   0.115111   3.634 0.000357 ***
## herbivory              -0.214975   0.059984  -3.584 0.000428 ***
## leaf_area              -0.001088   0.003042  -0.358 0.721008    
## elevationlow:herbivory -0.041847   0.084693  -0.494 0.621795    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.182239)
## 
##     Null deviance: 43.155  on 199  degrees of freedom
## Residual deviance: 35.537  on 195  degrees of freedom
## AIC: 234.02
## 
## Number of Fisher Scoring iterations: 2
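
# Refit with the appropriate binomial family (logistic regression) for the 0/1 response.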
mod4.2 <- glm(survival ~ elevation + herbivory + leaf_area + elevation*herbivory, family = "binomial", data = moddat4)
summary(mod4.2)
## 
## Call:
## glm(formula = survival ~ elevation + herbivory + leaf_area + 
##     elevation * herbivory, family = "binomial", data = moddat4)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7652  -0.8026  -0.4411   1.0979   2.8727  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             1.04381    3.74622   0.279 0.780530    
## elevationlow            1.70865    0.60216   2.838 0.004546 ** 
## herbivory              -1.86889    0.56165  -3.327 0.000876 ***
## leaf_area              -0.00656    0.01673  -0.392 0.694908    
## elevationlow:herbivory  0.74054    0.64743   1.144 0.252700    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 249.22  on 199  degrees of freedom
## Residual deviance: 208.95  on 195  degrees of freedom
## AIC: 218.95
## 
## Number of Fisher Scoring iterations: 5
mod4.3 <- glm(survival ~ elevation, family = "binomial", data = moddat4)
summary(mod4.3)
## 
## Call:
## glm(formula = survival ~ elevation, family = "binomial", data = moddat4)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0273  -1.0273  -0.7049   1.3354   1.7402  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -1.2657     0.2414  -5.243 1.58e-07 ***
## elevationlow   0.9017     0.3156   2.857  0.00428 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 249.22  on 199  degrees of freedom
## Residual deviance: 240.75  on 198  degrees of freedom
## AIC: 244.75
## 
## Number of Fisher Scoring iterations: 4

We lose some of the indirect-effect interpretation with these models, but we still retain the interpretation that herbivory has a negative effect on survival, that high elevations have lower survival, and that leaf_area is not very predictive.
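
As a quick, hedged check on whether herbivory and leaf area add anything beyond elevation alone, we can compare the two binomial models directly (the Gaussian model is left out, since AIC is not comparable across different response distributions):

# Compare the full binomial model against the elevation-only binomial model.
# Lower AIC is better; from the summaries above, mod4.2 (~219) beats mod4.3 (~245),
# consistent with herbivory carrying real signal.
AIC(mod4.2, mod4.3)
anova(mod4.3, mod4.2, test = "Chisq")  # likelihood-ratio test of the added terms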

Bayesian Hierarchical Model

We can also model these data with a Bayesian hierarchical model. Since plots are nested within elevation, we can estimate both elevation-specific and across-elevation effects.

moddat5 <- dat

#Make each level of herbivory a separate 0/1 indicator column
moddat5 <- moddat5 %>% 
  mutate(ones = 1) %>% 
  spread(key = herbivory, value = ones, fill = 0)
colnames(moddat5)[5:7] <- c("herb0", "herb1", "herb2")
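
# Side note (an equivalent sketch, not the approach used below): the same 0/1
# indicator columns can be built in one line with base R's model.matrix();
# the -1 drops the intercept so every herbivory level gets its own column.
herb_dummies <- model.matrix(~ factor(herbivory) - 1, data = dat)
head(herb_dummies)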

#############################
# Step1:                    #
#Dataframe to list for JAGS #
#############################
# Put the model data into a list. The jags function requires this formatting.
model_data <- list(surv = moddat5$survival,
                   herb0 = moddat5$herb0,
                   herb1 = moddat5$herb1,
                   herb2 = moddat5$herb2,
                   area = scale(moddat5$leaf_area)[,1], #note we scale leaf area
                   elevs = as.numeric(factor(moddat5$elevation)), #we change elevation to a numeric variable so jags can interpret it
                   N = length(moddat5$plot_num), #jags requires the number of observations to be specified. In our case each row is a unique plot, so if we use length on the plot column, we will get the number of observations 
                   Nelev = length(unique(moddat5$elevation)))
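
# Sanity check (optional): confirm how elevs is coded, since the index order
# determines which element of each beta vector is which elevation. factor()
# orders levels alphabetically, so high = 1 and low = 2 (matching the labels
# used for plot_dat further below).
levels(factor(moddat5$elevation))
table(model_data$elevs, moddat5$elevation)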

#############################
# Step2:                    #
#Build Model                #
#############################
#Model specification (line-by-line explanations are included as comments inside the model string below)

sim_model <- "model {

             #`for(i in 1:N)`: for each observation i through N observations. Here N = 200.
             #`surv[i] ~ dbern(p[i])`: survival is distributed as a Bernoulli distribution with parameter p.
             # p = probability of success, which in this case is the probability of survival.
             # i refers to each trial. The number of trials equals the number of observations.
             #`logit(p[i])`: the logit probability of success for the ith observation.
             #`logit(p[i]) <- beta0[elevs[i]] * herb0[i] + beta1[elevs[i]] * herb1[i] +
             # beta2[elevs[i]] * herb2[i] + beta3[elevs[i]] * area[i]`: the logit probability of success
             #for the ith observation is a linear function of the different categories of herbivory
             #(herb0, herb1, herb2) and leaf area. The brackets next to the beta coefficients
             #(e.g. beta1[elevs[i]]) indicate the hierarchical components of the model, such that a
             #separate beta coefficient is calculated for each level of elevation, for each level of
             #herbivory and for leaf area.

          for(i in 1:N) {  
              surv[i] ~ dbern(p[i]) 
              logit(p[i]) <- beta0[elevs[i]] * herb0[i] + 
              beta1[elevs[i]] * herb1[i] +   
              beta2[elevs[i]] * herb2[i] +   
              beta3[elevs[i]] * area[i]      
          }   

          #Uninformative priors
          #Prior distributions are a central component of Bayesian models. This section defines the
          #priors.
          #`for (j in 1:Nelev)`: for the jth elevation in 1 through the number of elevations
          #(here, 2 levels of elevation: high and low).
          #`dnorm(hbmu0, 0.001)`: each beta coefficient has a prior distribution. In this example we
          #have chosen uninformative priors, defined as coming from a normal distribution with
          #two parameters: mean (hbmu) & precision (0.001). You may notice that typically in a Gaussian
          #distribution we specify the mean and standard deviation. For the priors, the convention in
          #JAGS is to use precision, which is the inverse of the variance (1/variance).
          
        
          for (j in 1:Nelev) {
               beta0[j] ~ dnorm(hbmu0, 0.001)
               beta1[j] ~ dnorm(hbmu1, 0.001)
               beta2[j] ~ dnorm(hbmu2, 0.001)
               beta3[j] ~ dnorm(hbmu3, 0.001)
          } 

          #Uninformative hyperpriors
          #Notice in the chunk above, hbmu is a parameter of a parameter (e.g. hbmu0 is a hyperparameter
          #since it is used to model the parameter beta0). The distribution placed on a hyperparameter
          #is called a hyperprior. We need to include hyperparameters in the model as random variables
          #since we do not know their values. As above, we made these hyperpriors uninformative, using
          #a normal distribution to model our uncertainty in hbmu: each hyperprior has a mean of 0 and
          #a precision of 0.001.
          
          hbmu0 ~ dnorm(0, 0.001)
          hbmu1 ~ dnorm(0, 0.001)
          hbmu2 ~ dnorm(0, 0.001)
          hbmu3 ~ dnorm(0, 0.001)
}   
"

writeLines(sim_model, con="sim_model.txt")
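
# To make the precision convention concrete: precision is the inverse of the
# variance, so a precision of 0.001 corresponds to a variance of 1000 and a
# standard deviation of about 31.6- a very flat prior on the logit scale.
prec <- 0.001
sqrt(1 / prec)  # ~31.6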

############################################################
# Step3:                                                   #
#Run data through the Sampler and specify sampler details  #
############################################################

#This specifies the details of the MCMC sampler (i.e., the number of adaptation and burn-in iterations, total iterations, and chains)
mod <- jags(data = model_data,
                    n.adapt = 2000,
                    n.burnin = 1000,
                    n.iter = 10000,
                    n.chains = 4,
                    modules = "glm",
                    model.file = "sim_model.txt",
                    parameters.to.save = c("beta0", "beta1", "beta2", "beta3", "hbmu0", "hbmu1", "hbmu2", "hbmu3"),
                    verbose = F)
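
# Before interpreting the posteriors, check that the chains converged. This is a
# minimal sketch and assumes the jags() call above comes from the jagsUI package
# (whose output stores R-hat values and which provides a traceplot() method).
# R-hat should be close to 1 (commonly < 1.1) for every parameter, and traceplots
# should look like well-mixed "fuzzy caterpillars".
mod$Rhat          # R-hat values by parameter
# traceplot(mod)  # visual check of mixing across the 4 chains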

# The `mod` output is a large list-like object. The section of code below pulls out the necessary pieces to make a handy dataframe called `plot_dat`. `med` is the median (50th percentile) of the posterior distribution- these are the beta coefficients we report. `lc` and `uc` are the lower and upper bounds of the 95% credible interval around that estimate. Recall that the beta coefficients and their credible intervals represent the effects of each herbivory level and of leaf area on survival, estimated separately for each elevation (high and low).
plot_dat <- data.frame(label = c("High_Herb0", "High_Herb1", "High_Herb2", "High_Leaf", 
                                 "Low_Herb0", "Low_Herb1", "Low_Herb2", "Low_Leaf"),
                       med = c(mod$q50$beta0[1], mod$q50$beta1[1], mod$q50$beta2[1], mod$q50$beta3[1],
                               mod$q50$beta0[2], mod$q50$beta1[2], mod$q50$beta2[2], mod$q50$beta3[2]),
                       lc = c(mod$q2.5$beta0[1], mod$q2.5$beta1[1], mod$q2.5$beta2[1], mod$q2.5$beta3[1],
                              mod$q2.5$beta0[2], mod$q2.5$beta1[2], mod$q2.5$beta2[2], mod$q2.5$beta3[2]),
                       uc = c(mod$q97.5$beta0[1], mod$q97.5$beta1[1], mod$q97.5$beta2[1], mod$q97.5$beta3[1],
                              mod$q97.5$beta0[2], mod$q97.5$beta1[2], mod$q97.5$beta2[2], mod$q97.5$beta3[2]))

plot_dat
##        label         med         lc         uc
## 1 High_Herb0 -0.34951837 -1.0022445  0.2887310
## 2 High_Herb1 -3.08448335 -5.0149007 -1.8334635
## 3 High_Herb2 -2.98748760 -6.3265173 -1.2062020
## 4  High_Leaf  0.03348905 -0.5248504  0.5779709
## 5  Low_Herb0  1.34646560  0.1252927  2.9063205
## 6  Low_Herb1  0.17019252 -0.5607322  0.9230453
## 7  Low_Herb2 -1.00938745 -1.6562110 -0.4146287
## 8   Low_Leaf -0.15907374 -0.6419356  0.3052717
# plot the estimated beta coefficients and their 95% credible intervals for the effects of herbivory and leaf area on survival at each elevation category
ggplot(data = plot_dat) +
  geom_vline(aes(xintercept = 0), linetype = "dashed", color = "gray") +
  geom_errorbarh(aes(xmin = lc, xmax = uc, y = label), height = 0) +
  geom_point(aes(x = med, y = label), pch = 21, fill = "gray65", color = "black") +
  labs(x = "Point Estimates of beta coefficients from posterior distribution (95% CI)", y="") +
  theme_classic()

(Figure: point estimates and 95% credible intervals of the elevation-specific beta coefficients from the posterior distribution.)

When a beta coefficient is close to zero and its credible interval overlaps zero, this indicates no effect of that variable on survival. Here we see the same pattern as before: increasing herbivory has increasingly negative effects, especially at high elevation; leaf area has no effect; and survival is greater overall at low elevations.

#This dataframe includes the OVERALL (across-elevation) beta coefficients and credible intervals for the effects of herbivory and leaf area on survival
plot_dat1 <- data.frame(label = c("Herb0", "Herb1", "Herb2", "Leaf"),
                       med = c(mod$q50$hbmu0, mod$q50$hbmu1, mod$q50$hbmu2, mod$q50$hbmu3),
                       lc = c(mod$q2.5$hbmu0, mod$q2.5$hbmu1, mod$q2.5$hbmu2, mod$q2.5$hbmu3),
                       uc = c(mod$q97.5$hbmu0, mod$q97.5$hbmu1, mod$q97.5$hbmu2, mod$q97.5$hbmu3))
  
#plot the overall (across-elevation) estimates and their 95% credible intervals
ggplot(data = plot_dat1) +
  geom_vline(aes(xintercept = 0), linetype = "dashed", color = "gray") +
  geom_errorbarh(aes(xmin = lc, xmax = uc, y = label), height = 0) +
  geom_point(aes(x = med, y = label), pch = 21, fill = "gray65", color = "black") +
  labs(x = "", y="") +
  theme_classic()

There are no consistent across-elevation effects, as we can see from the higher-level (hyperparameter) estimates overlapping zero. Effects shared across elevations were not part of the original simulation, so this also checks out.

Check for Understanding

  1. Go back to the SEM data (caterpillar richness over time) and try some other types of analyses, even if they are weird. One idea is to break one of the continuous variables into groups (maybe something like early and late years; a rough sketch of a median split follows below). Is the story the same?
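
One quick way to create early/late groups is a median split on year. This is only a hypothetical sketch: the object and column names (`semdat`, `year`, `richness`) are placeholders, so substitute whatever the caterpillar dataset actually uses.

# Hypothetical sketch: bin a continuous variable (year) into early/late groups
# and refit a simple model. semdat, year, and richness are placeholder names.
semdat$period <- ifelse(semdat$year <= median(semdat$year), "early", "late")
summary(lm(richness ~ period, data = semdat))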

Key Takeaways

  1. Structural equation modeling via path analysis is a useful framework for testing causal hypotheses and understanding direct and indirect relationships among variables, particularly when considering latent variables.

  2. Path diagrams are a strength of the method: they carefully outline hypotheses in metamodels and then illustrate the direct and indirect relationships among variables using model-estimated coefficients (and other properties, e.g. residual variance/covariance estimates).

  3. Model fit is evaluated by comparing the observed covariance matrix to the model-estimated covariance matrix.

  4. The piecewiseSEM package makes it possible to model variables with non-normal distributions.

  5. The overall relationships among variables can be recovered by many different approaches, and you gain more confidence in those relationships when multiple techniques agree- and learn something when they differ.