In this last lab we are going to examine structural equation modeling and then return to a principle that we have already harped on in this class: multiple paths to the same answer.
Resources:
Book: Cause and Correlation in Biology, Bill Shipley 2016
Website: Introduction of the Framework
Website: Video and 3-part seminar *I highly recommend going through these seminars if you are interested in this method. The question often pops up “Isn’t SEM just linear regression” and I think this does a great job showing how these methods differ with respect to covariance in residual errors.
PDF: Reference Sheet
Structural equation modeling (SEM) is a linear modeling framework that fits simultaneous regression equations, optionally with latent variables. Two types of models within this framework are confirmatory factor analysis and path analysis. The working definition here is that structural equation modeling is a model selection approach to path analysis that either includes latent variables or statistically compares different path models, with the goal of evaluating competing causal hypotheses and examining indirect effects. Latent variables are key components of SEM. They are variables that cannot be measured directly, but whose influence can be summarized through observable indicator variables (variables that are measured directly) that are caused by the latent construct. An example of a latent variable in ecology is climate change, whereby changes in the means or extremes of climate variables, or the rates of change in these variables, are used as indicators for the latent construct of climate change. These methods are widely used outside of ecology, in part because fields such as psychology use survey items to construct latent variables of interest such as depression, anxiety, and intelligence, but the theory and methods are also very useful to apply to, and consider for, ecological questions.
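As a quick illustration of this idea, below is a minimal lavaan-style sketch of how the climate change example could be written as a latent variable. The indicator column names (temp_mean, temp_extreme, warming_rate) are hypothetical placeholders, not variables from this lab.
# the =~ operator reads "is measured by"
climate_model <- '
  climate_change =~ temp_mean + temp_extreme + warming_rate
'
# once real indicator data exist, this could be fit with sem(climate_model, data = your_data)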
SEM accounts for the following relationships: direct effects, indirect effects mediated by other variables, and covariances among variables.
Let’s walk through the components of a path diagram using a heuristic example, to understand both the pieces of a path diagram and the strengths of the SEM approach.
Exogenous variables: represented by (x1). These are independent variables (either observed or latent) that explain an endogenous variable. Arrows point away from them, not at them.
Endogenous variables: represented by (y1 and y2). These are dependent variables (either observed or latent) that are explained by exogenous variables in the model. Arrows point toward them, and these arrows represent causal paths.
Arrows: (also called edges) are relationships among variables. Sometimes arrows are used to denote positive relationships while blunted lines represent negative relationships.
Vertices: (also called nodes) are the variables themselves.
In the above model we are saying that x1 is a predictor of y1 and y2. Additionally, we are saying that y1 is a predictor of y2. This is another key feature of SEMs: we can separate the direct effect of x1 on y2 from the indirect effect of x1 on y2 that is mediated by y1. Each line in the diagram represents a statistical model. For instance, the model predicting y1 looks like this: y1 = x1 + error. Each response variable gets its own error term (\(\zeta\)).
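To make the connection to code concrete, here is a sketch of how that heuristic diagram could be written in lavaan syntax, using the placeholder names x1, y1, and y2 from the figure:
# each line is one regression equation from the diagram
heuristic_model <- '
  y1 ~ x1        # direct effect of x1 on y1
  y2 ~ x1 + y1   # direct effect of x1 on y2, plus the path mediated by y1
'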
Typically, before conducting the analysis, it is useful to explicitly draw out the metamodel (a diagram of the variables of interest, including the hypothesized (+/-) relationships that represent your hypotheses).
Frequently used syntax in lavaan
Example model structure (note the operators: `~` for regressions, `=~` for latent variable definitions, `~~` for variances and covariances, and `~ 1` for intercepts):
myModel <- '
# regressions
y1 + y2 ~ f1 + f2 + x1 + x2
f1 ~ f2 + f3
f2 ~ f3 + x1 + x2
# latent variable definitions
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f3 =~ y7 + y8 + y9 + y10
# variances and covariances
y1 ~~ y1
y1 ~~ y2
f1 ~~ f2
# intercepts
y1 ~ 1
f1 ~ 1
'
Let’s get on to working with some data. For this lab we will simulate everything.
library(lavaan)
library(lavaanPlot)
library(piecewiseSEM)
library(tidyverse)
library(rjags)
library(jagsUI)
library(coda)
library(matrixStats)
Here we will simulate caterpillar richness data measured over 60 years. First I simulate an increasing temperature trend. Then I simulate caterpillar richness declining as a function of both temperature and year. Finally I add a precip variable with no trend and no effect on caterpillars.
set.seed(2002)
## years
years <- c(1:60)
#precip variable
precip <- rnorm(60, mean=100, sd=10) # precip is simulated from a normal distribution with a mean of 100 and standard deviation of 10; 60 values are drawn from that distribution (one for each year).
## simulate the increasing temperature anomalies
temperature <- .05 + .05*years + rnorm(length(years),1,.5) # temperature is modeled with an intercept of 0.05 and a positive slope of 0.05, so it increases by 0.05 degrees Celsius each year, plus some noise drawn from a Gaussian distribution.
#visualize trend of temperature across the 60 years
plot(temperature ~ years,pch=19); abline(lm(temperature ~ years),lty=2)
## simulate the caterpillar richness as a function of years and temps
catrich <- 350 -2*years -50*temperature + rnorm(length(years),50,50)
#visualize caterpillar richness across years
plot(catrich ~ years,pch=19); abline(lm(catrich ~ years),lty=2)
#combine the simulated vectors to create one dataset
semdata <- cbind(catrich, years, temperature, precip)
In the case where we didn’t simulate data and we were instead hypothesizing about these patterns, we would draw a metamodel of the relationships we are about to test. Below is the metamodel for the model we are about to write. Note that this metamodel is very simplified in that it only displays the edges that represent regression or path coefficients. When creating your model, it is important to consider correlated error structure. The covariances of residuals and the residual variances of the variables are typically not displayed in a path diagram, but the model output will describe this information.
Let’s first examine the observed (sample) covariance matrix to get a sense of these relationships.
cov(semdata)
## catrich years temperature precip
## catrich 8825.22098 -1358.42825 -82.1936621 90.2473145
## years -1358.42825 305.00000 15.3312769 -23.4048821
## temperature -82.19366 15.33128 1.0431215 0.2863871
## precip 90.24731 -23.40488 0.2863871 97.8564803
Now let’s run our hypothesized SEM. First, we write a model statement. Then we fit the model using sem() and plot it using lavaanPlot().
##Write the model
# In the model below the only exogenous variable is years
# Endogenous variables are precipitation, temperature, and caterpillar richness
# Precipitation is predicted by years, caterpillar richness is predicted by years and temperature, and temperature is predicted by years
model1 <- 'precip ~ years
catrich ~ years
catrich ~ temperature
temperature ~ years'
# fitting the sem model to your data
fit1 <- sem(model1, data = semdata)
summary(fit1, rsq = T, fit.measures = TRUE, standardized = TRUE) # get a summary with model fit measures (since we specified `fit.measures=TRUE`), standardized estimates (since we specified `standardized=TRUE`; these are the same coefficients we would get had we scaled all the data), and R-squared values (since we specified `rsq=T`)
## lavaan 0.6-10 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 60
##
## Model Test User Model:
##
## Test statistic 5.118
## Degrees of freedom 1
## P-value (Chi-square) 0.024
##
## Model Test Baseline Model:
##
## Test statistic 175.155
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.976
## Tucker-Lewis Index (TLI) 0.854
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -580.210
## Loglikelihood unrestricted model (H1) -577.651
##
## Akaike (AIC) 1176.420
## Bayesian (BIC) 1193.175
## Sample-size adjusted Bayesian (BIC) 1168.013
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.262
## 90 Percent confidence interval - lower 0.077
## 90 Percent confidence interval - upper 0.504
## P-value RMSEA <= 0.05 0.034
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.054
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## precip ~
## years -0.077 0.072 -1.059 0.290 -0.077 -0.135
## catrich ~
## years -1.702 0.652 -2.609 0.009 -1.702 -0.314
## temperature -54.746 11.124 -4.921 0.000 -54.746 -0.592
## temperature ~
## years 0.050 0.004 13.027 0.000 0.050 0.860
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .precip ~~
## .catrich 64.990 57.207 1.136 0.256 64.990 0.148
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .precip 94.459 17.246 5.477 0.000 94.459 0.982
## .catrich 2034.063 371.367 5.477 0.000 2034.063 0.231
## .temperature 0.268 0.049 5.477 0.000 0.268 0.261
##
## R-Square:
## Estimate
## precip 0.018
## catrich 0.769
## temperature 0.739
# Plot model
lavaanPlot(name = "MODEL1", fit1, labels = semdata, coefs = TRUE) # draws the path diagram for the relationships specified in `model1`, with the estimated path coefficients from the fitted model (here, the unstandardized estimates from the summary output)
Let’s break down the output:
Model Fit Estimates
The general goal of SEM is to identify a set of maximally likely population parameters given a) the variances and covariances of the observed data and b) several user-specified paths thought to give rise to said variances and covariances. Thus, model fit is based on the comparison of the observed variance-covariance matrix and model-implied covariance matrix. If the model is an accurate representation of the patterns among the data, the variance-covariance matrix using the estimated parameters should match the variance-covariance of the sample or observed matrix.
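If you want to inspect this comparison directly for the model we just fit (fit1), lavaan provides helper functions for the sample statistics, the model-implied moments, and their difference:
lavInspect(fit1, "sampstat")  # observed (sample) covariance matrix used by the model
fitted(fit1)                  # model-implied covariance matrix
resid(fit1)                   # difference between the two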
This is the section describing the fitted model. Values in this section are reported in publications for model fit (e.g., χ2 = 2.3, p=0.128, df=1)
The baseline model represents a null model in which your observed variables are constrained to covary with no other variables: covariances are fixed to 0 so that only variances are estimated in the covariance matrix. It represents a poorly fitting model. Your fitted model is tested against this baseline.
D-G) More model fit statistics
A collection of additional model fit statistics, largely beyond the scope of this lesson. In the spirit of somewhat arbitrary cut-offs: RMSEA less than 0.05, CFI/TLI above 0.90 to 0.95, and SRMR less than 0.08 are generally accepted as indicators of “good fit” (see Hu and Bentler 1999).
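If you want just these indices without the full summary, fitMeasures() extracts them by name, e.g. for our first model:
fitMeasures(fit1, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))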
Parameter Estimates
Values in this section get interpreted just like any other linear model. These are the path coefficients that are presented on the path diagrams.
Estimates: Contains the (estimated or fixed) parameter value for each model parameter. You have the option to report unstandardized estimates (reported under Estimate) or standardized estimates (reported under Std.all). Standardized estimates are most useful for interpreting effects when neither predictors nor outcomes have natural scales, and for gauging relative effect sizes across predictors measured on different scales. It is also an option to do partial standardization if you wish to say that a one unit increase in the predictor leads to some change in standard deviations of the response. If you wish for the output to give partially standardized coefficients, use summary(fit, std.nox=TRUE, rsquare=TRUE). Example interpretation: the unstandardized coefficient for precip ~ years (-0.077) indicates that with each additional year, precipitation decreases by 0.077 units. The standardized coefficient (-0.135) indicates that for every 1 standard deviation increase in year, precipitation decreases by 0.135 standard deviations.
Std.err: Contains the standard error for each estimated parameter
Z-value: contains the Wald statistic (which is simply obtained by dividing the parameter value by its standard error), and the last column (P(>|z|)) contains the p-value for testing the null hypothesis that the parameter equals zero in the population
I, J) Variance Covariance Outputs
In these sections it is important to note that a (.) before a variable name (e.g., .precip) indicates that it is a residual term. Also, variances and covariances can be displayed on the path diagram and there are specific ways to illustrate those relationships; however, this information is often omitted from published path diagrams for simplicity. Published path diagrams in ecology typically only include the edges representing the regression coefficients.
This section displays the covariance of the residuals among the endogenous variables (precip and catrich: the predicted variables, those with arrows pointing to them). In our example the value can be interpreted as a positive association between the residuals of precip and catrich that was not accounted for by their predictors. By default, lavaan’s sem() wrapper allows the exogenous variables to covary and also frees the residual covariance between precip and catrich (the two final response variables), which is why this term appears even though we did not specify it in the model. Recall, to designate a covariance (or variance) we use the ~~ operator. Thus, .precip ~~ .catrich tells lavaan to include a residual covariance for precip and catrich. Note that the value in the Std.all column represents the correlation.
This section represents the residual variance of the endogenous variables precip, catrich, and temperature: the left-over variance that is not explained by their predictor(s). Notice that precip has a lot of unexplained variance (Std.all = 0.982) while catrich (0.231) and temperature (0.261) have much smaller residual variances. This makes sense since we simulated temperature and catrich to vary with years (and catrich with temperature), but precip to be unrelated to everything.
The amount of total variance explained by the model for each variable. As we would expect, very little of the variance in precip was explained by the model. Had we simulated precip as a function of years, we would have expected much more of its variance to be explained.
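A quick way to pull just these R-squared values is lavInspect():
lavInspect(fit1, "rsquare")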
Let’s try the same model with our variables scaled.
semdata_scaled <- apply(semdata, MARGIN = 2, scale)
model2 <- 'precip ~ years
catrich ~ years
catrich ~ temperature
temperature ~ years'
fit2 <- sem(model2, data = semdata_scaled)
summary(fit2, rsq = T, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-10 ended normally after 14 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 60
##
## Model Test User Model:
##
## Test statistic 5.118
## Degrees of freedom 1
## P-value (Chi-square) 0.024
##
## Model Test Baseline Model:
##
## Test statistic 175.155
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.976
## Tucker-Lewis Index (TLI) 0.854
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -168.878
## Loglikelihood unrestricted model (H1) -166.319
##
## Akaike (AIC) 353.755
## Bayesian (BIC) 370.510
## Sample-size adjusted Bayesian (BIC) 345.348
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.262
## 90 Percent confidence interval - lower 0.077
## 90 Percent confidence interval - upper 0.504
## P-value RMSEA <= 0.05 0.034
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.054
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## precip ~
## years -0.135 0.128 -1.059 0.290 -0.135 -0.135
## catrich ~
## years -0.316 0.121 -2.609 0.009 -0.316 -0.314
## temperature -0.595 0.121 -4.921 0.000 -0.595 -0.592
## temperature ~
## years 0.860 0.066 13.027 0.000 0.860 0.860
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .precip ~~
## .catrich 0.070 0.062 1.136 0.256 0.070 0.148
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .precip 0.965 0.176 5.477 0.000 0.965 0.982
## .catrich 0.230 0.042 5.477 0.000 0.230 0.231
## .temperature 0.257 0.047 5.477 0.000 0.257 0.261
##
## R-Square:
## Estimate
## precip 0.018
## catrich 0.769
## temperature 0.739
lavaanPlot(name = "MODEL2", fit2, labels = semdata_scaled, coefs = TRUE)
SEMs are especially powerful when we use them to test different hypotheses and perform model comparison. This next model does not include a trend in caterpillars over time. We can then perform model comparison using AIC or some other metric.
semdata_scaled <- apply(semdata, MARGIN = 2, scale)
model3 <- 'precip ~ years
catrich ~ temperature
temperature ~ years'
fit3 <- sem(model3, data = semdata_scaled)
summary(fit3, fit.measures = TRUE, standardized = TRUE, rsquare=TRUE)
## lavaan 0.6-10 ended normally after 13 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 7
##
## Number of observations 60
##
## Model Test User Model:
##
## Test statistic 11.148
## Degrees of freedom 2
## P-value (Chi-square) 0.004
##
## Model Test Baseline Model:
##
## Test statistic 175.155
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.946
## Tucker-Lewis Index (TLI) 0.838
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -171.892
## Loglikelihood unrestricted model (H1) -166.319
##
## Akaike (AIC) 357.785
## Bayesian (BIC) 372.445
## Sample-size adjusted Bayesian (BIC) 350.428
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.276
## 90 Percent confidence interval - lower 0.134
## 90 Percent confidence interval - upper 0.444
## P-value RMSEA <= 0.05 0.007
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.055
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## precip ~
## years -0.101 0.127 -0.796 0.426 -0.101 -0.102
## catrich ~
## temperature -0.870 0.066 -13.147 0.000 -0.870 -0.860
## temperature ~
## years 0.860 0.066 13.027 0.000 0.860 0.860
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .precip ~~
## .catrich 0.112 0.067 1.680 0.093 0.112 0.222
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .precip 0.966 0.176 5.477 0.000 0.966 0.990
## .catrich 0.262 0.048 5.477 0.000 0.262 0.260
## .temperature 0.257 0.047 5.477 0.000 0.257 0.261
##
## R-Square:
## Estimate
## precip 0.010
## catrich 0.740
## temperature 0.739
lavaanPlot(name = "MODEL3", fit3, labels = semdata_scaled, coefs = TRUE)
AIC(fit1, fit2, fit3)
## df AIC
## fit1 8 1176.4203
## fit2 8 353.7550
## fit3 7 357.7849
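Because model3 is nested within model2 (it simply drops the catrich ~ years path) and both were fit to the same scaled data, we can also compare them with a likelihood-ratio (chi-square difference) test:
anova(fit2, fit3)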
INTERPRETATION SUMMARY Our second model appears to be the best (note that fit1 was fit to the unscaled data, so its AIC is not directly comparable; among the models fit to the scaled data, fit2 has the lower AIC), so let’s take a closer look. We can see that years has a positive effect on temperature, which is to say that temperature is increasing over time. We can also see that temperature has a direct negative effect on caterpillar richness. There is also a slight negative trend in richness over time after accounting for temperature change, and a slight negative trend in precip. The precip result should be interpreted with caution, as we did not simulate a relationship between precip and year. In fact, if you look at the model output, there is no reason to suspect this is anything other than random noise (high p-value).
Now for the indirect effect. We only have one in this model: the impact of year on caterpillar richness that is mediated by temperature change. The indirect effect is the product of the path coefficients along the mediated route, and the total effect is the direct effect plus the indirect effect. Using the standardized coefficients from fit2, the total impact of year on caterpillars is -0.316 + (0.860 × -0.595), which is about -0.83.
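lavaan can also compute indirect and total effects for us if we label the paths and define new parameters with the := operator. A sketch, using the new object names model2b and fit2b:
model2b <- 'precip ~ years
            catrich ~ b1*years + b2*temperature
            temperature ~ a1*years
            # defined parameters
            indirect := a1*b2
            total := b1 + (a1*b2)'
fit2b <- sem(model2b, data = semdata_scaled)
summary(fit2b, standardized = TRUE)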
Checking for Understanding
In this class we have introduced many techniques, and depending on your data and questions some will be more appropriate than others. That said, what some may call the wrong technique will often still yield reasonable interpretations that reflect some of the underlying patterns in your data. This is not to say that picking the right technique is unimportant (it is very important), but rather that it can be good to try several techniques and see where they agree and where they differ. It is important to remember that you probably will not report everything you tried, and the results you do report should be chosen based on whether the test was appropriate, not on which one gives you the “best” results.
For this last section we will look at the same dataset a bunch of different ways. I have simulated relationships between several variables. Elevation is a factor with two levels: high and low. At each of these two elevations 100 plots were placed. In each plot, the leaf area of the plant and the level of herbivory (an ordinal variable from 0 to 2) were measured, and survival at the end of the season was observed.
dat <- read.csv("multiple_paths.csv")
Based on our knowledge of the system we expect these relationships between variables (this is how I simulated them). Recall that arrows indicate positive relationships and blunted arrows represent negative relationships. Also note this metamodel specifies relationships with low elevation.
These data were simulated with SEMs in mind, so it seems reasonable to analyze them using SEMs. There are a couple of different ways we can consider treating our variables. The first interesting one we should examine is herbivory. It was measured in an ordinal fashion and has three categories, so treating it like any other continuous variable is probably not the best thing to do; however, doing so can be easier to interpret. Let’s start there.
#scale herbivory
moddat1 <- dat
moddat1$herbivory <- scale(moddat1$herbivory)
moddat1$leaf_area <- scale(moddat1$leaf_area)
model <- 'leaf_area ~ elevation
herbivory ~ elevation
survival ~ herbivory + elevation
'
#Note that writing the model like this is analogous to the model above
#model <- 'leaf_area ~ elevation
# survival ~ elevation
# herbivory ~ elevation
# survival ~ herbivory'
#fit model
fit <- sem(model, data = moddat1)
#summary output
summary(fit, rsq = T, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-10 ended normally after 10 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 200
##
## Model Test User Model:
##
## Test statistic 2.322
## Degrees of freedom 1
## P-value (Chi-square) 0.128
##
## Model Test Baseline Model:
##
## Test statistic 108.420
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.987
## Tucker-Lewis Index (TLI) 0.923
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -643.959
## Loglikelihood unrestricted model (H1) NA
##
## Akaike (AIC) 1303.919
## Bayesian (BIC) 1330.306
## Sample-size adjusted Bayesian (BIC) 1304.961
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.081
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.224
## P-value RMSEA <= 0.05 0.220
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.031
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## leaf_area ~
## elevation 0.598 0.135 4.441 0.000 0.598 0.300
## herbivory ~
## elevation 0.927 0.125 7.425 0.000 0.927 0.465
## survival ~
## herbivory -0.191 0.034 -5.649 0.000 -0.191 -0.410
## elevation 0.367 0.067 5.445 0.000 0.367 0.395
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .leaf_area ~~
## .survival -0.009 0.028 -0.333 0.739 -0.009 -0.024
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .leaf_area 0.906 0.091 10.000 0.000 0.906 0.910
## .herbivory 0.780 0.078 10.000 0.000 0.780 0.784
## .survival 0.178 0.018 10.000 0.000 0.178 0.826
##
## R-Square:
## Estimate
## leaf_area 0.090
## herbivory 0.216
## survival 0.174
#plot it in a path diagram
lavaanPlot(name = "MODEL1", fit, coefs = TRUE)
Instead of treating herbivory like a continuous variable, we can treat it as a categorical variable. This is a more conservative approach because we get an estimate for each herbivory level. The downside is that the model looks a bit messier and it will be slightly harder to fit. That said, we have more than enough data here. First I dummy code the categorical variable, then I scale leaf area, before finally running the model.
# Let's first try dummy coding herbivory the tidyverse way. Result is `moddat2.1`
###dummy code herbivory into a categorical variable with 3 levels
moddat2 <- dat
moddat2.1 <- moddat2 %>%
mutate(ones = 1) %>%
spread(key = herbivory, value = ones, fill = 0)
colnames(moddat2.1)[5:7] <- c("herb0", "herb1", "herb2") #each level of herbivory is now a column filled with 1/0, where 1 indicates that the sample had that level of herbivory
moddat2.1
## plot_num elevation leaf_area survival herb0 herb1 herb2
## 1 1 high 223.9341 0 0 1 0
## 2 2 high 241.1445 0 0 1 0
## 3 3 high 238.8826 0 0 0 1
## 4 4 high 232.0278 0 0 0 1
## 5 5 high 247.5750 1 1 0 0
## 6 6 high 223.0350 0 1 0 0
## 7 7 high 241.9685 0 0 1 0
## 8 8 high 201.4210 0 1 0 0
## 9 9 high 224.1859 0 0 1 0
## 10 10 high 218.1735 0 0 0 1
## 11 11 high 213.8530 1 1 0 0
## 12 12 high 227.9871 0 0 0 1
## 13 13 high 218.4115 1 1 0 0
## 14 14 high 217.1436 0 0 1 0
## 15 15 high 219.5362 1 1 0 0
## 16 16 high 233.8519 0 0 1 0
## 17 17 high 248.4063 0 0 1 0
## 18 18 high 221.2045 0 1 0 0
## 19 19 high 227.4765 0 0 1 0
## 20 20 high 227.5430 0 1 0 0
## 21 21 high 235.3578 1 0 1 0
## 22 22 high 223.6971 0 0 1 0
## 23 23 high 230.0555 1 1 0 0
## 24 24 high 218.9193 0 1 0 0
## 25 25 high 217.3493 0 1 0 0
## 26 26 high 220.5121 0 0 1 0
## 27 27 high 230.4666 0 0 1 0
## 28 28 high 227.0012 0 1 0 0
## 29 29 high 215.6307 0 1 0 0
## 30 30 high 223.9131 0 0 0 1
## 31 31 high 215.6951 1 1 0 0
## 32 32 high 200.1617 0 1 0 0
## 33 33 high 231.8544 0 0 1 0
## 34 34 high 231.0273 0 1 0 0
## 35 35 high 228.5112 0 0 1 0
## 36 36 high 216.3507 0 1 0 0
## 37 37 high 226.1610 0 0 1 0
## 38 38 high 211.7191 0 0 1 0
## 39 39 high 218.5448 0 1 0 0
## 40 40 high 220.7018 0 1 0 0
## 41 41 high 224.4569 0 0 1 0
## 42 42 high 230.9382 1 1 0 0
## 43 43 high 219.7702 0 0 0 1
## 44 44 high 219.6076 0 0 1 0
## 45 45 high 213.1130 0 1 0 0
## 46 46 high 208.9755 0 1 0 0
## 47 47 high 231.6221 0 0 1 0
## 48 48 high 245.4474 0 1 0 0
## 49 49 high 235.1771 0 1 0 0
## 50 50 high 214.3386 0 1 0 0
## 51 51 high 225.4341 0 0 0 1
## 52 52 high 237.1792 0 0 1 0
## 53 53 high 227.8549 1 1 0 0
## 54 54 high 219.0544 0 0 1 0
## 55 55 high 228.5753 0 0 1 0
## 56 56 high 230.8852 1 1 0 0
## 57 57 high 220.5188 1 1 0 0
## 58 58 high 225.0896 1 1 0 0
## 59 59 high 219.6584 1 1 0 0
## 60 60 high 223.4775 1 1 0 0
## 61 61 high 239.6187 0 0 0 1
## 62 62 high 229.7886 1 0 1 0
## 63 63 high 218.1291 0 0 1 0
## 64 64 high 237.7585 0 1 0 0
## 65 65 high 231.3198 1 1 0 0
## 66 66 high 223.6571 0 0 1 0
## 67 67 high 236.6569 0 0 0 1
## 68 68 high 213.6713 0 0 1 0
## 69 69 high 227.4810 0 0 0 1
## 70 70 high 199.1948 0 0 1 0
## 71 71 high 226.9965 0 1 0 0
## 72 72 high 212.5137 0 0 1 0
## 73 73 high 220.4063 0 0 1 0
## 74 74 high 219.2520 0 0 1 0
## 75 75 high 213.9397 0 1 0 0
## 76 76 high 222.2626 1 1 0 0
## 77 77 high 245.0445 0 1 0 0
## 78 78 high 221.4576 0 1 0 0
## 79 79 high 223.9098 0 0 0 1
## 80 80 high 200.2534 1 1 0 0
## 81 81 high 228.6278 0 0 1 0
## 82 82 high 230.3779 0 1 0 0
## 83 83 high 229.8062 0 0 0 1
## 84 84 high 209.7952 0 0 1 0
## 85 85 high 232.2660 0 1 0 0
## 86 86 high 230.5404 1 1 0 0
## 87 87 high 236.0617 0 1 0 0
## 88 88 high 222.2768 0 0 1 0
## 89 89 high 230.4982 0 1 0 0
## 90 90 high 219.8582 1 1 0 0
## 91 91 high 236.2455 0 0 1 0
## 92 92 high 219.1919 0 0 1 0
## 93 93 high 224.1482 0 0 1 0
## 94 94 high 223.0236 0 0 1 0
## 95 95 high 215.8499 1 0 0 1
## 96 96 high 230.0664 0 0 1 0
## 97 97 high 219.6478 0 0 0 1
## 98 98 high 220.0212 1 1 0 0
## 99 99 high 221.6218 0 0 1 0
## 100 100 high 250.0842 0 0 0 1
## 101 101 low 222.3875 0 0 0 1
## 102 102 low 246.2901 1 0 1 0
## 103 103 low 226.9909 0 0 0 1
## 104 104 low 210.1867 0 0 1 0
## 105 105 low 235.1495 0 0 1 0
## 106 106 low 235.7125 1 1 0 0
## 107 107 low 235.4264 0 0 1 0
## 108 108 low 225.9778 1 0 1 0
## 109 109 low 248.2218 0 0 0 1
## 110 110 low 235.0839 1 1 0 0
## 111 111 low 233.8128 1 0 1 0
## 112 112 low 241.9972 0 0 0 1
## 113 113 low 246.5572 0 0 0 1
## 114 114 low 226.4523 0 0 0 1
## 115 115 low 223.9053 0 0 1 0
## 116 116 low 225.2227 0 0 0 1
## 117 117 low 223.1540 0 0 0 1
## 118 118 low 215.2080 0 0 1 0
## 119 119 low 218.1263 0 1 0 0
## 120 120 low 230.6563 0 0 0 1
## 121 121 low 233.3210 0 0 1 0
## 122 122 low 214.4861 1 0 1 0
## 123 123 low 235.8683 0 0 0 1
## 124 124 low 259.2070 0 0 0 1
## 125 125 low 233.6055 1 0 1 0
## 126 126 low 244.4771 0 0 0 1
## 127 127 low 233.4685 0 0 0 1
## 128 128 low 221.5074 1 0 0 1
## 129 129 low 229.7615 0 0 0 1
## 130 130 low 234.2079 1 0 0 1
## 131 131 low 229.8631 0 0 0 1
## 132 132 low 229.7241 0 0 0 1
## 133 133 low 229.8016 1 0 1 0
## 134 134 low 234.5849 1 0 0 1
## 135 135 low 233.4515 0 0 0 1
## 136 136 low 225.3502 0 0 0 1
## 137 137 low 224.4843 1 0 1 0
## 138 138 low 243.0909 1 0 1 0
## 139 139 low 234.3399 0 0 0 1
## 140 140 low 222.7211 0 0 0 1
## 141 141 low 246.9550 0 0 0 1
## 142 142 low 216.8487 1 0 1 0
## 143 143 low 246.3051 0 0 1 0
## 144 144 low 228.3345 1 1 0 0
## 145 145 low 229.0458 1 0 1 0
## 146 146 low 234.9262 1 0 0 1
## 147 147 low 245.0594 0 0 0 1
## 148 148 low 239.1504 1 0 1 0
## 149 149 low 229.4033 0 0 0 1
## 150 150 low 233.2787 1 0 0 1
## 151 151 low 214.2124 0 0 0 1
## 152 152 low 230.0630 0 0 0 1
## 153 153 low 243.0422 0 0 1 0
## 154 154 low 234.0899 1 1 0 0
## 155 155 low 239.7423 1 1 0 0
## 156 156 low 225.4010 1 0 0 1
## 157 157 low 231.9806 0 0 0 1
## 158 158 low 230.6718 1 0 1 0
## 159 159 low 242.4544 0 1 0 0
## 160 160 low 238.9013 1 1 0 0
## 161 161 low 218.5473 1 0 1 0
## 162 162 low 220.0033 1 0 0 1
## 163 163 low 239.5431 0 0 0 1
## 164 164 low 226.3389 0 0 0 1
## 165 165 low 215.3665 1 1 0 0
## 166 166 low 225.2492 0 0 0 1
## 167 167 low 229.2681 1 0 0 1
## 168 168 low 234.3611 1 0 1 0
## 169 169 low 235.4550 0 0 0 1
## 170 170 low 230.2943 1 0 0 1
## 171 171 low 225.0157 0 0 0 1
## 172 172 low 226.7957 0 0 0 1
## 173 173 low 239.0662 1 0 0 1
## 174 174 low 241.6744 0 0 1 0
## 175 175 low 225.6601 1 0 0 1
## 176 176 low 231.4486 0 0 0 1
## 177 177 low 238.2591 0 0 1 0
## 178 178 low 241.1180 0 0 0 1
## 179 179 low 252.6027 1 0 0 1
## 180 180 low 236.7156 1 1 0 0
## 181 181 low 206.5731 0 0 0 1
## 182 182 low 222.4729 0 0 1 0
## 183 183 low 240.2680 0 0 0 1
## 184 184 low 216.1082 0 0 1 0
## 185 185 low 218.8058 1 0 1 0
## 186 186 low 217.7569 1 0 0 1
## 187 187 low 222.5113 0 0 0 1
## 188 188 low 239.5409 0 0 0 1
## 189 189 low 239.7349 1 0 1 0
## 190 190 low 231.0164 0 0 0 1
## 191 191 low 250.7407 0 0 0 1
## 192 192 low 225.4600 0 0 0 1
## 193 193 low 211.8487 1 0 0 1
## 194 194 low 239.9444 0 0 1 0
## 195 195 low 226.8627 0 0 1 0
## 196 196 low 239.9382 1 0 0 1
## 197 197 low 223.2647 1 1 0 0
## 198 198 low 219.6682 0 0 0 1
## 199 199 low 222.4671 1 1 0 0
## 200 200 low 232.4861 0 1 0 0
#Fun tangent. If you are interested, this is the same step as above but showing you the base R method to do this. Result is `moddat2.2`
herb_vals <- as.character(unique(moddat2$herbivory))
moddat2.2 <- data.frame(matrix(ncol = (ncol(moddat2)-1) + length(herb_vals)))
colnames(moddat2.2) <- c(colnames(moddat2)[-4], paste0("herb", herb_vals))
for(i in 1:length(herb_vals)) {
temp_dat <- subset(moddat2, herbivory == herb_vals[i])
temp_dat <- subset(temp_dat, select= -herbivory)
new_mat <- matrix(data = 0, nrow = nrow(temp_dat), ncol = length(herb_vals))
colnames(new_mat) <- paste0("herb", herb_vals)
new_mat[,i] <- 1
moddat2.2 <- rbind(moddat2.2, cbind(temp_dat, new_mat))
}
moddat2.2 <- moddat2.2[-1,]
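# An even more compact base R option (a sketch; `herb_dummies` is a new object name used only here):
# model.matrix() builds the same three 0/1 dummy columns in a single call
herb_dummies <- model.matrix(~ factor(herbivory) - 1, data = moddat2)
colnames(herb_dummies) <- c("herb0", "herb1", "herb2")
head(cbind(moddat2[, c("plot_num", "elevation", "leaf_area", "survival")], herb_dummies))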
# ok. End tangent
# Let's use `moddat2.1`
moddat2.1$leaf_area <- scale(moddat2$leaf_area) #scale leaf area
Now that we have the dataset moddat2.1 ready to go with the dummy-coded herbivory columns, let’s run a similar model, updated to use categorical herbivory.
model <- '
leaf_area ~ elevation
herb1 ~ elevation
herb2 ~ elevation
survival ~ herb1 + herb2 + elevation'
fit <- sem(model, data = moddat2.1)
summary(fit, rsq = T, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-10 ended normally after 14 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 11
##
## Number of observations 200
##
## Model Test User Model:
##
## Test statistic 78.230
## Degrees of freedom 3
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 181.187
## Degrees of freedom 10
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.561
## Tucker-Lewis Index (TLI) -0.465
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -634.292
## Loglikelihood unrestricted model (H1) NA
##
## Akaike (AIC) 1290.585
## Bayesian (BIC) 1326.866
## Sample-size adjusted Bayesian (BIC) 1292.017
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.354
## 90 Percent confidence interval - lower 0.289
## 90 Percent confidence interval - upper 0.424
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.157
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## leaf_area ~
## elevation 0.598 0.135 4.441 0.000 0.598 0.300
## herb1 ~
## elevation -0.090 0.067 -1.345 0.179 -0.090 -0.095
## herb2 ~
## elevation 0.420 0.061 6.881 0.000 0.420 0.437
## survival ~
## herb1 -0.307 0.063 -4.889 0.000 -0.307 -0.293
## herb2 -0.473 0.069 -6.869 0.000 -0.473 -0.455
## elevation 0.361 0.066 5.439 0.000 0.361 0.362
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .leaf_area ~~
## .survival -0.010 0.028 -0.354 0.723 -0.010 -0.025
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .leaf_area 0.906 0.091 10.000 0.000 0.906 0.910
## .herb1 0.224 0.022 10.000 0.000 0.224 0.991
## .herb2 0.186 0.019 10.000 0.000 0.186 0.809
## .survival 0.177 0.018 10.000 0.000 0.177 0.711
##
## R-Square:
## Estimate
## leaf_area 0.090
## herb1 0.009
## herb2 0.191
## survival 0.289
lavaanPlot(name = "MODEL1", fit, coefs = TRUE)
You will probably recall from the complex models lab that we do not usually model binomial data with a Gaussian linear model, which is what both of the previous models have done. The standard lavaan models we fit above assume Gaussian responses, so we need to use the piecewiseSEM package for fancier models.
moddat3 <- dat
moddat3$leaf_area <- scale(moddat3$leaf_area) #first, scale leaf area like we did in the previous models
moddat3 #Notice we are back to the data frame where herbivory is a single column
## plot_num elevation leaf_area herbivory survival
## 1 1 high -0.386859392 1 0
## 2 2 high 1.250968150 1 0
## 3 3 high 1.035721351 2 0
## 4 4 high 0.383378059 2 0
## 5 5 high 1.862932911 0 1
## 6 6 high -0.472421078 0 0
## 7 7 high 1.329389156 1 0
## 8 8 high -2.529326397 0 0
## 9 9 high -0.362900296 1 0
## 10 10 high -0.935064826 2 0
## 11 11 high -1.346227357 0 1
## 12 12 high -0.001155493 2 0
## 13 13 high -0.912415163 0 1
## 14 14 high -1.033076232 1 0
## 15 15 high -0.805383708 0 1
## 16 16 high 0.556968705 1 0
## 17 17 high 1.942039618 1 0
## 18 18 high -0.646619551 0 0
## 19 19 high -0.049741144 1 0
## 20 20 high -0.043420017 0 0
## 21 21 high 0.700278477 1 1
## 22 22 high -0.409408863 1 0
## 23 23 high 0.195681968 0 1
## 24 24 high -0.864090347 0 0
## 25 25 high -1.013501716 0 0
## 26 26 high -0.712513954 1 0
## 27 27 high 0.234807737 1 0
## 28 28 high -0.094972981 0 0
## 29 29 high -1.177051720 0 0
## 30 30 high -0.388854861 2 0
## 31 31 high -1.170925310 0 1
## 32 32 high -2.649161876 0 0
## 33 33 high 0.366876542 1 0
## 34 34 high 0.288170685 0 0
## 35 35 high 0.048719137 1 0
## 36 36 high -1.108533303 0 0
## 37 37 high -0.174934172 1 0
## 38 38 high -1.549302409 1 0
## 39 39 high -0.899736474 0 0
## 40 40 high -0.694460156 0 0
## 41 41 high -0.337102914 1 0
## 42 42 high 0.279684132 0 1
## 43 43 high -0.783118903 2 0
## 44 44 high -0.798589126 1 0
## 45 45 high -1.416651696 0 0
## 46 46 high -1.810393690 0 0
## 47 47 high 0.344774474 1 0
## 48 48 high 1.660457149 0 0
## 49 49 high 0.683087451 0 0
## 50 50 high -1.300017036 0 0
## 51 51 high -0.244110090 2 0
## 52 52 high 0.873612531 1 0
## 53 53 high -0.013734154 0 1
## 54 54 high -0.851237517 1 0
## 55 55 high 0.054818147 1 0
## 56 56 high 0.274640588 0 1
## 57 57 high -0.711877095 0 1
## 58 58 high -0.276896579 0 1
## 59 59 high -0.793752164 0 1
## 60 60 high -0.430315371 0 1
## 61 61 high 1.105768155 2 0
## 62 62 high 0.170290564 1 1
## 63 63 high -0.939293261 1 0
## 64 64 high 0.928747279 0 0
## 65 65 high 0.316002961 0 1
## 66 66 high -0.413214677 1 0
## 67 67 high 0.823905270 2 0
## 68 68 high -1.363517709 1 0
## 69 69 high -0.049318860 2 0
## 70 70 high -2.741180305 1 0
## 71 71 high -0.095420843 0 0
## 72 72 high -1.473683560 1 0
## 73 73 high -0.722577974 1 0
## 74 74 high -0.832428450 1 0
## 75 75 high -1.337974245 0 0
## 76 76 high -0.545925345 0 1
## 77 77 high 1.622118335 0 0
## 78 78 high -0.622533343 0 0
## 79 79 high -0.389168583 2 0
## 80 80 high -2.640433719 0 1
## 81 81 high 0.059814883 1 0
## 82 82 high 0.226366742 0 0
## 83 83 high 0.171959698 2 0
## 84 84 high -1.732392809 1 0
## 85 85 high 0.406047071 0 0
## 86 86 high 0.241829371 0 1
## 87 87 high 0.767266416 0 0
## 88 88 high -0.544575500 1 0
## 89 89 high 0.237816426 0 0
## 90 90 high -0.774745912 0 1
## 91 91 high 0.784756224 1 0
## 92 92 high -0.838149930 1 0
## 93 93 high -0.366483692 1 0
## 94 94 high -0.473503104 1 0
## 95 95 high -1.156197238 2 1
## 96 96 high 0.196721534 1 0
## 97 97 high -0.794762790 2 0
## 98 98 high -0.759230643 0 1
## 99 99 high -0.606904524 1 0
## 100 100 high 2.101719414 2 0
## 101 101 low -0.534044612 2 0
## 102 102 low 1.740656614 1 1
## 103 103 low -0.095954806 2 0
## 104 104 low -1.695130785 1 0
## 105 105 low 0.680461023 1 0
## 106 106 low 0.734037519 0 1
## 107 107 low 0.706805622 1 0
## 108 108 low -0.192367403 1 1
## 109 109 low 1.924482471 2 0
## 110 110 low 0.674218074 0 1
## 111 111 low 0.553250284 1 1
## 112 112 low 1.332121500 2 0
## 113 113 low 1.766074012 2 0
## 114 114 low -0.147214441 2 0
## 115 115 low -0.389598323 1 0
## 116 116 low -0.264230832 2 0
## 117 117 low -0.461092416 2 0
## 118 118 low -1.217275280 1 0
## 119 119 low -0.939560582 0 0
## 120 120 low 0.252862788 2 0
## 121 121 low 0.506446537 1 0
## 122 122 low -1.285980373 1 1
## 123 123 low 0.748863896 2 0
## 124 124 low 2.969895577 2 0
## 125 125 low 0.533521440 1 1
## 126 126 low 1.568115861 2 0
## 127 127 low 0.520480735 2 0
## 128 128 low -0.617797805 2 1
## 129 129 low 0.167705476 2 0
## 130 130 low 0.590846548 2 1
## 131 131 low 0.177372490 2 0
## 132 132 low 0.164145974 2 0
## 133 133 low 0.171525126 1 1
## 134 134 low 0.626729390 2 1
## 135 135 low 0.518865811 2 0
## 136 136 low -0.252098664 2 0
## 137 137 low -0.334502287 1 1
## 138 138 low 1.436200428 1 1
## 139 139 low 0.603411303 2 0
## 140 140 low -0.502294754 2 0
## 141 141 low 1.803933617 2 0
## 142 142 low -1.061143362 1 1
## 143 143 low 1.742085939 1 0
## 144 144 low 0.031902118 0 1
## 145 145 low 0.099595635 1 1
## 146 146 low 0.659210171 2 1
## 147 147 low 1.623533495 2 0
## 148 148 low 1.061202237 1 1
## 149 149 low 0.133616860 2 0
## 150 150 low 0.502426385 2 1
## 151 151 low -1.312029969 2 0
## 152 152 low 0.196399461 2 0
## 153 153 low 1.431570336 1 0
## 154 154 low 0.579621631 0 1
## 155 155 low 1.117530229 0 1
## 156 156 low -0.247257325 2 1
## 157 157 low 0.378886722 2 0
## 158 158 low 0.254339272 1 1
## 159 159 low 1.375626357 0 0
## 160 160 low 1.037499978 0 1
## 161 161 low -0.899497145 1 1
## 162 162 low -0.760935216 2 1
## 163 163 low 1.098579462 2 0
## 164 164 low -0.158002294 2 0
## 165 165 low -1.202197910 0 1
## 166 166 low -0.261708100 2 0
## 167 167 low 0.120749966 2 1
## 168 168 low 0.605433511 1 1
## 169 169 low 0.709534161 2 0
## 170 170 low 0.218415143 2 1
## 171 171 low -0.283930114 2 0
## 172 172 low -0.114537721 2 0
## 173 173 low 1.053190485 2 1
## 174 174 low 1.301400904 1 0
## 175 175 low -0.222607475 2 1
## 176 176 low 0.328261763 2 0
## 177 177 low 0.976379636 1 0
## 178 178 low 1.248452523 2 0
## 179 179 low 2.341394959 2 1
## 180 180 low 0.829491958 0 1
## 181 181 low -2.039019723 2 0
## 182 182 low -0.525914175 1 0
## 183 183 low 1.167558159 2 0
## 184 184 low -1.131615875 1 0
## 185 185 low -0.874895418 1 1
## 186 186 low -0.974708559 2 1
## 187 187 low -0.522260773 2 0
## 188 188 low 1.098364418 2 0
## 189 189 low 1.116826507 1 1
## 190 190 low 0.287125926 2 0
## 191 191 low 2.164195489 2 0
## 192 192 low -0.241644332 2 0
## 193 193 low -1.536971088 2 1
## 194 194 low 1.136766052 1 0
## 195 195 low -0.108159197 1 0
## 196 196 low 1.136179406 2 1
## 197 197 low -0.450565687 0 1
## 198 198 low -0.792825915 2 0
## 199 199 low -0.526464012 0 1
## 200 200 low 0.426992224 0 0
# `psem` is a function in the `piecewiseSEM` package that allows you to do an SEM while specifying non-normal distributions for the component models, unlike the lavaan models above. The syntax also differs: instead of enclosing the model in quotes, `psem()` wraps a set of fitted model objects, with `lm()` or `glm()` used for each relationship.
glmsem <- psem(
lm(leaf_area ~ elevation, moddat3),
lm(herbivory ~ elevation, moddat3),
glm(survival ~ herbivory + leaf_area + elevation, "binomial", moddat3)
)
# As a summary output we will use `coefs`. This outputs the regression coefficients (path coefficients) for our model
coefs(glmsem)
## Response Predictor Estimate Std.Error DF Crit.Value P.Value
## 1 leaf_area elevation - - 1 19.5256 0.0000
## 2 leaf_area elevation = high -0.2989 0.0956 198 -3.1245 0.0020
## 3 leaf_area elevation = low 0.2989 0.0956 198 3.1245 0.0020
## 4 herbivory elevation - - 1 54.5795 0.0000
## 5 herbivory elevation = high 0.69 0.0718 198 9.6121 0.0000
## 6 herbivory elevation = low 1.44 0.0718 198 20.0600 0.0000
## 7 survival herbivory -1.3462 0.2766 196 -4.8670 0.0000
## 8 survival leaf_area -0.0766 0.1764 196 -0.4345 0.6639
## 9 survival elevation - - 1 26.9222 0.0000
## 10 survival elevation = high -2.0447 0.3288 Inf -6.2180 0.0000
## 11 survival elevation = low 0.1363 0.251 Inf 0.5431 0.5870
## Std.Estimate
## 1 - ***
## 2 - **
## 3 - **
## 4 - ***
## 5 - ***
## 6 - ***
## 7 -0.51 ***
## 8 -0.0359
## 9 - ***
## 10 - ***
## 11 -
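Beyond the path coefficients, piecewiseSEM can also summarize overall model fit (tests of directed separation and Fisher’s C, along with AIC and R-squared values) through the usual summary method:
summary(glmsem)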
piecewiseSEM does not have a “nice” plotting function. You could make one in illustration software (e.g., Adobe Illustrator or PowerPoint). If you look at the coefficient directionality and the magnitude of the estimates, the patterns follow our overall story!
So far we have only looked at different types of SEMs. What if we perform regular linear models? Can we pick up any of the signal in the data?
moddat4 <- dat
mod4.1 <- glm(survival ~ elevation + herbivory + leaf_area + elevation*herbivory, data = moddat4)
summary(mod4.1)
##
## Call:
## glm(formula = survival ~ elevation + herbivory + leaf_area +
## elevation * herbivory, data = moddat4)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7940 -0.2750 -0.1565 0.4634 1.0518
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.612963 0.681641 0.899 0.369631
## elevationlow 0.418324 0.115111 3.634 0.000357 ***
## herbivory -0.214975 0.059984 -3.584 0.000428 ***
## leaf_area -0.001088 0.003042 -0.358 0.721008
## elevationlow:herbivory -0.041847 0.084693 -0.494 0.621795
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.182239)
##
## Null deviance: 43.155 on 199 degrees of freedom
## Residual deviance: 35.537 on 195 degrees of freedom
## AIC: 234.02
##
## Number of Fisher Scoring iterations: 2
mod4.2 <- glm(survival ~ elevation + herbivory + leaf_area + elevation*herbivory, family = "binomial", data = moddat4)
summary(mod4.2)
##
## Call:
## glm(formula = survival ~ elevation + herbivory + leaf_area +
## elevation * herbivory, family = "binomial", data = moddat4)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7652 -0.8026 -0.4411 1.0979 2.8727
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.04381 3.74622 0.279 0.780530
## elevationlow 1.70865 0.60216 2.838 0.004546 **
## herbivory -1.86889 0.56165 -3.327 0.000876 ***
## leaf_area -0.00656 0.01673 -0.392 0.694908
## elevationlow:herbivory 0.74054 0.64743 1.144 0.252700
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 249.22 on 199 degrees of freedom
## Residual deviance: 208.95 on 195 degrees of freedom
## AIC: 218.95
##
## Number of Fisher Scoring iterations: 5
mod4.3 <- glm(survival ~ elevation, family = "binomial", data = moddat4)
summary(mod4.3)
##
## Call:
## glm(formula = survival ~ elevation, family = "binomial", data = moddat4)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0273 -1.0273 -0.7049 1.3354 1.7402
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2657 0.2414 -5.243 1.58e-07 ***
## elevationlow 0.9017 0.3156 2.857 0.00428 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 249.22 on 199 degrees of freedom
## Residual deviance: 240.75 on 198 degrees of freedom
## AIC: 244.75
##
## Number of Fisher Scoring iterations: 4
We lose some of the indirect effect interpretation with these, but we still retain the interpretation that herbivory has a negative effect on survival, high elevations have lower survival, and leaf area is not very predictive.
We can also model these data as a Bayesian hierarchical model. Since plots are nested within elevation, we can estimate both elevation-specific and across-elevation effects.
moddat5 <- dat
#Make each level of herbivory a separate column
moddat5 <- moddat5 %>%
mutate(ones = 1) %>%
spread(key = herbivory, value = ones, fill = 0)
colnames(moddat5)[5:7] <- c("herb0", "herb1", "herb2")
#############################
# Step1: #
#Dataframe to list for rJAGS #
#############################
# Put the model data into a list. The jags function requires this formatting.
model_data <- list(surv = moddat5$survival,
herb0 = moddat5$herb0,
herb1 = moddat5$herb1,
herb2 = moddat5$herb2,
area = scale(moddat5$leaf_area)[,1], #note we scale leaf area
elevs = as.numeric(factor(moddat5$elevation)), #we change elevation to a numeric variable so jags can interpret it
N = length(moddat5$plot_num), #jags requires the number of observations to be specified. In our case each row is a unique plot, so if we use length on the plot column, we will get the number of observations
Nelev = length(unique(moddat5$elevation)))
#############################
# Step2: #
#Build Model #
#############################
#Model specification (see the line-by-line explanation in the comments below)
sim_model <- "model {
#`for(i in 1:N)` : for each observation i through N observations. Here N = 200
#`surv[i] ~ dbern(p[i])`: survival is distributed as a Bernoulli distribution with parameter p.
# p = probability of success, which in this case refers to the probability of survival.
# i refers to each trial. The number of trials equals the number of observations.
#`logit(p[i])`: The logit probability of success for the ith observation
#`logit(p[i]) <- beta0[elevs[i]] * herb0[i] + beta1[elevs[i]] * herb1[i] +
#beta2[elevs[i]] * herb2[i] + beta3[elevs[i]] * area[i]`: The logit probability of success
#for the ith observation is a linear function of the different categories of herbivory
#(herb0, herb1, herb2) and area. The brackets next to the beta coefficients
#(e.g. beta1[elevs[i]]) indicate the hierarchical components of the model, such that a beta
#coefficient for each level of elevation will be calculated for each level of
#herbivory and for area.
for(i in 1:N) {
surv[i] ~ dbern(p[i])
logit(p[i]) <- beta0[elevs[i]] * herb0[i] +
beta1[elevs[i]] * herb1[i] +
beta2[elevs[i]] * herb2[i] +
beta3[elevs[i]] * area[i]
}
#Uninformative priors
#Prior distributions are a central component of Bayesian models. This section defines the
#priors.
#`for (j in 1:Nelev)` For the jth elevation in 1: through the number of the elevations
#(which is 2 levels of elevation; high and low)
#`dnorm(hbmu0, 0.001)`: Each beta coefficient has a prior distribution. In this example, we
#have chosen uninformative priors that are defined as coming from a normal distribution with
#two parameters: mean (hbmu) & precision (0.001). You may notice that typically in a Gaussian
#distribution we specify the mean and standard deviation. For the priors it is the convention
#to use precision, which is the inverse variance (1/variance).
for (j in 1:Nelev) {
beta0[j] ~ dnorm(hbmu0, 0.001)
beta1[j] ~ dnorm(hbmu1, 0.001)
beta2[j] ~ dnorm(hbmu2, 0.001)
beta3[j] ~ dnorm(hbmu3, 0.001)
}
#Uninformative hyperpriors
#Notice in the chunk above, hbmu is a parameter of a parameter (e.g. hbmu0 is a hyperparameter
#since it is used to model the parameter beta0). Since it is a hyperparameter for the prior
#distribution, it is called a hyperprior. We need to include hyperparameters in the model as
#random variables since we do not know their value. Like above, we made these hyperpriors
#uninformative by using a normal distribution to model our uncertainty in hbmu. We model the
#hyperprior with a mean of 0 and precision of 0.001.
hbmu0 ~ dnorm(0, 0.001)
hbmu1 ~ dnorm(0, 0.001)
hbmu2 ~ dnorm(0, 0.001)
hbmu3 ~ dnorm(0, 0.001)
}
"
writeLines(sim_model, con="sim_model.txt")
############################################################
# Step3: #
#Run data through the Sampler and specify sampler details #
############################################################
#This specifies details of the MCMC chain (i.e., number of burnin, number of iterations, number of chains)
mod <- jags(data = model_data,
n.adapt = 2000,
n.burnin = 1000,
n.iter = 10000,
n.chains = 4,
modules = "glm",
model.file = "sim_model.txt",
parameters.to.save = c("beta0", "beta1", "beta2", "beta3", "hbmu0", "hbmu1", "hbmu2", "hbmu3"),
verbose = F)
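# Before interpreting the posteriors it is good practice to check convergence.
# jagsUI stores the Gelman-Rubin statistic (Rhat) for each saved parameter
# (values near 1 indicate convergence) and provides a traceplot method:
mod$Rhat
traceplot(mod)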
# The `mod` object stores posterior summaries for every saved parameter. The section of code below pulls out the necessary information to make a handy dataframe called `plot_dat`. `med` is the median (50th percentile) of the posterior distribution- these are the beta coefficients we report. `lc` and `uc` are the lower and upper bounds of the 95% credible interval around each estimate. Recall that the beta coefficients and their credible intervals represent the effect of each herbivory level and of leaf area on survival at each elevation (high and low)
plot_dat <- data.frame(label = c("High_Herb0", "High_Herb1", "High_Herb2", "High_Leaf",
"Low_Herb0", "Low_Herb1", "Low_Herb2", "Low_Leaf"),
med = c(mod$q50$beta0[1], mod$q50$beta1[1], mod$q50$beta2[1], mod$q50$beta3[1],
mod$q50$beta0[2], mod$q50$beta1[2], mod$q50$beta2[2], mod$q50$beta3[2]),
lc = c(mod$q2.5$beta0[1], mod$q2.5$beta1[1], mod$q2.5$beta2[1], mod$q2.5$beta3[1],
mod$q2.5$beta0[2], mod$q2.5$beta1[2], mod$q2.5$beta2[2], mod$q2.5$beta3[2]),
uc = c(mod$q97.5$beta0[1], mod$q97.5$beta1[1], mod$q97.5$beta2[1], mod$q97.5$beta3[1],
mod$q97.5$beta0[2], mod$q97.5$beta1[2], mod$q97.5$beta2[2], mod$q97.5$beta3[2]))
plot_dat
## label med lc uc
## 1 High_Herb0 -0.34951837 -1.0022445 0.2887310
## 2 High_Herb1 -3.08448335 -5.0149007 -1.8334635
## 3 High_Herb2 -2.98748760 -6.3265173 -1.2062020
## 4 High_Leaf 0.03348905 -0.5248504 0.5779709
## 5 Low_Herb0 1.34646560 0.1252927 2.9063205
## 6 Low_Herb1 0.17019252 -0.5607322 0.9230453
## 7 Low_Herb2 -1.00938745 -1.6562110 -0.4146287
## 8 Low_Leaf -0.15907374 -0.6419356 0.3052717
# plot the estimated beta coefficients and their credible intervals for the effects of herbivory and leaf area on survival at each elevation category
ggplot(data = plot_dat) +
geom_vline(aes(xintercept = 0), linetype = "dashed", color = "gray") +
geom_errorbarh(aes(xmin = lc, xmax = uc, y = label), height = 0) +
geom_point(aes(x = med, y = label), pch = 21, fill = "gray65", color = "black") +
labs(x = "Point Estimates of beta coefficients from posterior distribution (95% CI)", y="") +
theme_classic()
When beta coefficients are close to zero and the credible intervals overlap zero, this indicates no effect of those variables on survival. Here we see the same patterns as before: increasingly negative effects of higher herbivory levels, no effect of leaf area, and greater survival overall at low elevations.
#This dataframe includes the OVERALL beta coefficients and credible intervals of herbivory and leaf area effect on survival
plot_dat1 <- data.frame(label = c("Herb0", "Herb1", "Herb2", "Leaf"),
med = c(mod$q50$hbmu0, mod$q50$hbmu1, mod$q50$hbmu2, mod$q50$hbmu3),
lc = c(mod$q2.5$hbmu0, mod$q2.5$hbmu1, mod$q2.5$hbmu2, mod$q2.5$hbmu3),
uc = c(mod$q97.5$hbmu0, mod$q97.5$hbmu1, mod$q97.5$hbmu2, mod$q97.5$hbmu3))
#plot the overall (across-elevation) estimates and their credible intervals
ggplot(data = plot_dat1) +
geom_vline(aes(xintercept = 0), linetype = "dashed", color = "gray") +
geom_errorbarh(aes(xmin = lc, xmax = uc, y = label), height = 0) +
geom_point(aes(x = med, y = label), pch = 21, fill = "gray65", color = "black") +
labs(x = "", y="") +
theme_classic()
There are no clear across-elevation effects, as we can see in the estimates of the higher-level (hyperparameter) terms. This was not part of the original simulation, so this also checks out.
Check for Understanding
Structural equation modeling via path analysis is a useful framework for testing causal hypotheses and understanding direct and indirect relationships among variables, particularly when considering latent variables.
Path diagrams are a strength of the method for their ability to carefully outline hypotheses in metamodels and then illustrate the direct and indirect relationships among variables using model estimated coefficients (and other properties e.g. residual variance/covariance estimates).
Model fit is evaluated by comparing the observed covariance matrix to the model-estimated covariance matrix.
The piecewiseSEM package makes it possible to model variables with non-normal distributions.
The overall relationships among variables can be recovered with many approaches, and you gain more confidence in these relationships if you try several techniques and see where they agree and where they differ.