Path analysis is a form of multiple regression that examines hypothesized causal relationships between independent variables (IVs) and a dependent variable (DV). With this method, one can estimate both the magnitude and the statistical significance of the hypothesized causal connections between variables.
A path model consists of several regression equations that relate the DVs to the IVs in the order specified by the model and that are estimated together, which is why path analysis is also described as a system of simultaneous equations.
A path model can show both the direct and the indirect effects of the IVs on the DV.
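As a minimal illustration, assume a single IV $x$, a single intermediate variable $m$, and a DV $y$. These symbols are hypothetical and are not part of the dataset used later. The two regression equations and the resulting effect decomposition can then be written as:

$$
\begin{aligned}
m &= a\,x + e_m \\
y &= c\,x + b\,m + e_y \\
\text{direct effect of } x \text{ on } y &= c \\
\text{indirect effect of } x \text{ on } y \text{ via } m &= a\,b \\
\text{total effect of } x \text{ on } y &= c + a\,b
\end{aligned}
$$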
When conducting path analysis one should first construct an input path diagram, which illustrates the hypothesized relationships. After statistical analysis has been completed, an output path diagram can then be constructed, which illustrates the relationships as they actually exist, according to the analysis conducted.
First, we set our working directory and load required packages.
setwd("D:/Class Materials & Work/Summer 2020 practice/SEM/Path Analysis")
library(lavaan)
library(semPlot)
library(OpenMx)
library(tidyverse)
library(knitr)
library(GGally)
library(ggcorrplot)
The four steps in conducting a path analysis are:
1. Read in your data (as a correlation matrix or raw data)
2. Specify the model
3. Fit the model
4. View the results
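The four steps above can be sketched as follows. This is only an illustration: the two-equation model and the object names (vars, toy_model, fit_raw, fit_cov) are assumptions chosen for brevity, not the model analyzed below. The sketch also shows the summary-matrix route of step 1, where sem() receives a covariance matrix through sample.cov and the sample size through sample.nobs instead of raw data.

```r
library(lavaan)

# Step 1: read in the data, either raw ...
vars <- c("mpg", "hp", "wt", "disp")
raw_data <- mtcars[vars]
# ... or summarized as a covariance matrix plus the sample size
cov_matrix <- cov(raw_data)
n_obs <- nrow(raw_data)

# Step 2: specify the model (hypothetical toy model, for illustration only)
toy_model <- '
  mpg ~ hp + wt
  hp  ~ disp
'

# Step 3: fit the model from raw data or from the covariance matrix
fit_raw <- sem(toy_model, data = raw_data)
fit_cov <- sem(toy_model, sample.cov = cov_matrix, sample.nobs = n_obs)

# Step 4: view the results
summary(fit_cov, standardized = TRUE)
```

Both routes should lead to the same regression estimates, so the summary-matrix route is mainly useful when only published covariances are available.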
In this exercise, we will use the mtcars dataset.
ggcorr(mtcars[-c(5, 7, 8)],  # omit the 5th, 7th, and 8th variables (drat, qsec, vs)
       nbreaks = 6,
       label = TRUE, low = "red3", high = "green3",
       label_round = 2, name = "Correlation Scale", label_alpha = TRUE, hjust = 0.75) +
  ggtitle(label = "Correlation Plot") +
  theme(plot.title = element_text(hjust = 0.6))  # move the title toward the middle
We can also take a detour to test for correlation significance.
variables.to.use<-c("mpg","cyl","disp","hp","wt","am","gear","carb")
mtcars.corr<-cor(mtcars[variables.to.use],
method = "pearson",
use='pairwise.complete.obs')
ggcorrplot(mtcars.corr,
p.mat=cor_pmat(mtcars[variables.to.use]),
hc.order=T,
type='lower',
color=c('red3', 'white', 'green3'),
outline.color = 'darkgoldenrod1',
lab=T,
legend.title='Correlation',
pch=4,
pch.cex=12,
lab_size=6)+
labs(title="mtcars correlation")+
theme(plot.title=element_text(face='bold',size=14,hjust=0.5,colour="darkred"))+
theme(legend.position=c(0.10,0.80), legend.box.just = "bottom")
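To see what a single cell of such a p-value matrix represents, the same kind of test can be run for one pair of variables with base R's cor.test(). The sketch below uses mpg and wt as an arbitrary example pair; the object name mpg_wt_test is an assumption made for illustration.

```r
# Pearson correlation test for one pair of variables from mtcars
mpg_wt_test <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")

mpg_wt_test$estimate  # sample correlation coefficient
mpg_wt_test$p.value   # p-value for the null hypothesis of zero correlation
```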
First, we must specify the dependent and independent variables in our dataset using lavaan's model syntax.
We will use mpg as the DV, and cyl, disp, hp, gear, am, wt, and carb as the IVs. We will also assume that hp is a function of cyl, disp, and carb.
model <-'
mpg ~ hp + gear + cyl + disp + carb + am + wt
hp ~ cyl + disp + carb
'
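The cfa() function is lavaan's wrapper for fitting confirmatory factor analysis models. It also fits this path model, which contains no latent variables, although sem(), lavaan's general structural equation wrapper, is the more conventional choice for such a model.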
fit <- cfa(model, data = mtcars)
summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
## lavaan 0.6-6 ended normally after 62 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 12
##
## Number of observations 32
##
## Model Test User Model:
##
## Test statistic 7.901
## Degrees of freedom 3
## P-value (Chi-square) 0.048
##
## Model Test Baseline Model:
##
## Test statistic 132.831
## Degrees of freedom 13
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.959
## Tucker-Lewis Index (TLI) 0.823
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -220.099
## Loglikelihood unrestricted model (H1) -216.148
##
## Akaike (AIC) 464.198
## Bayesian (BIC) 481.787
## Sample-size adjusted Bayesian (BIC) 444.377
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.226
## 90 Percent confidence interval - lower 0.019
## 90 Percent confidence interval - upper 0.425
## P-value RMSEA <= 0.05 0.062
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.025
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## mpg ~
## hp -0.022 0.016 -1.388 0.165 -0.022 -0.243
## gear 0.586 1.247 0.470 0.638 0.586 0.071
## cyl -0.848 0.710 -1.194 0.232 -0.848 -0.248
## disp 0.006 0.012 0.512 0.609 0.006 0.127
## carb -0.472 0.620 -0.761 0.446 -0.472 -0.125
## am 1.624 1.542 1.053 0.292 1.624 0.133
## wt -2.671 1.267 -2.109 0.035 -2.671 -0.428
## hp ~
## cyl 7.717 6.554 1.177 0.239 7.717 0.201
## disp 0.233 0.087 2.666 0.008 0.233 0.421
## carb 20.273 3.405 5.954 0.000 20.273 0.478
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .mpg 5.011 1.253 4.000 0.000 5.011 0.139
## .hp 644.737 161.184 4.000 0.000 644.737 0.142
##
## R-Square:
## Estimate
## mpg 0.861
## hp 0.858
In the P(>|z|) column of the summary above, wt is a significant predictor of mpg (p = 0.035), and both disp (p = 0.008) and carb (p < 0.001) are significant predictors of hp. However, hp itself is not a significant predictor of mpg (p = 0.165).
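Because hp sits between cyl, disp, and carb on one side and mpg on the other, the indirect effects through hp can also be estimated. The sketch below is one possible way to do this with lavaan's parameter labels and the := operator. The labels (a1, a2, a3, b), the derived parameter names (ind_cyl, ind_disp, ind_carb), and the use of sem() are assumptions made for illustration and are not part of the model fitted above.

```r
library(lavaan)

# Same regressions as the model above, with labels on the paths that
# form the indirect routes cyl/disp/carb -> hp -> mpg
model_ind <- '
  mpg ~ b*hp + gear + cyl + disp + carb + am + wt
  hp  ~ a1*cyl + a2*disp + a3*carb

  # indirect effect = (path into hp) * (path from hp to mpg)
  ind_cyl  := a1 * b
  ind_disp := a2 * b
  ind_carb := a3 * b
'

fit_ind <- sem(model_ind, data = mtcars)

# The derived parameters appear as rows with op ":="
indirect <- subset(parameterEstimates(fit_ind), op == ":=")
indirect
```

Each indirect effect is a product of two coefficients, so its standard error here comes from lavaan's default delta-method approximation.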
One of the best ways to understand an SEM model is to inspect it visually using a path diagram from the semPlot package.
The semPaths() function provides a quick and easy way to draw the fitted model and to label each path between the DV and the IVs with its estimate.
semPaths(fit, what = 'std', layout = 'tree', edge.label.cex = .9, curvePivot = TRUE, color = "pink")
The “tree” layout leaves a good amount of space between the variables, which makes the diagram easier to read. The arrows and values between the DV and each IV (or mediator) are path coefficients: standardized versions of the linear regression weights that quantify the hypothesized causal relationships between variables.
Standardization multiplies the ordinary regression coefficient by the standard deviation of the corresponding explanatory variable and divides it by the standard deviation of the response variable. The resulting coefficients can then be compared to assess the relative effects of the variables within the fitted regression model.
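A common way to standardize a regression coefficient b for a predictor x and a response y is b * sd(x) / sd(y). The sketch below checks this by hand on an ordinary lm() fit with two predictors chosen purely for illustration (wt and hp); the object names are assumptions. lavaan's Std.all column is based on the variances of the fitted model, so it need not match a by-hand calculation from sample standard deviations exactly.

```r
# Unstandardized multiple regression on the raw variables
raw_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Standardize by hand: b * sd(x) / sd(y)
b <- coef(raw_fit)[c("wt", "hp")]
beta_by_hand <- b * sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)

# Same regression on z-scored variables, for comparison
scaled_data <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))
scaled_fit <- lm(mpg ~ wt + hp, data = scaled_data)

beta_by_hand
coef(scaled_fit)[c("wt", "hp")]
```

The two printed vectors should agree, which shows that standardized coefficients are the slopes obtained when every variable is first rescaled to unit standard deviation.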