Path Analysis

Path analysis is a form of multiple regression that examines hypothesized causal relationships between independent variables (IVs) and a dependent variable (DV). With this method, one can estimate both the magnitude and the statistical significance of the hypothesized causal connections between variables.

A path model consists of several regression equations that relate the DVs to the IVs in the order specified by the model. Because these equations are estimated together, path analysis is also called simultaneous equation modeling.

A path model can show both the direct and the indirect effects of the IVs on the DV.
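
As a generic illustration of the direct/indirect distinction, suppose X affects Y directly and also through an intermediate variable M. The symbols below are assumptions introduced for this sketch, not names from the model fitted later: a is the coefficient on the path from X to M, b is the coefficient on the path from M to Y, and c' is the coefficient on the direct path from X to Y.

```latex
\begin{aligned}
\text{direct effect}   &= c' \\
\text{indirect effect} &= a \, b \\
\text{total effect}    &= c' + a \, b
\end{aligned}
```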

How to use Path Analysis

When conducting path analysis, one should first construct an input path diagram, which illustrates the hypothesized relationships. After the statistical analysis has been completed, an output path diagram can be constructed, which illustrates the relationships as estimated from the data.

First, we set our working directory and load required packages.

setwd("D:/Class Materials & Work/Summer 2020 practice/SEM/Path Analysis")

library(lavaan)
library(semPlot)
library(OpenMx)
library(tidyverse)
library(knitr)
library(GGally)
library(ggcorrplot)

The four steps in conducting a path analysis are:
1. Read in your data (as a correlation matrix or raw data)
2. Specify the model
3. Fit the model
4. View the results

In this practice, we will use the mtcars dataset.

Read the data

ggcorr(mtcars[-c(5, 7, 8)], #omit the 5th, 7th, and 8th variables. 
       nbreaks = 6, 
       label = T, low = "red3", high = "green3",
       label_round = 2, name = "Correlation Scale", label_alpha = T, hjust = 0.75) +
  ggtitle(label = "Correlation Plot") +
  theme(plot.title = element_text(hjust = 0.6)) #move the title to the middle

We can also take a detour to test for correlation significance.

variables.to.use<-c("mpg","cyl","disp","hp","wt","am","gear","carb")
mtcars.corr<-cor(mtcars[variables.to.use],
                 method = "pearson",
                 use='pairwise.complete.obs')
ggcorrplot(mtcars.corr,
           p.mat=cor_pmat(mtcars[variables.to.use]),
           hc.order=T, 
           type='lower',
           color=c('red3', 'white', 'green3'),
           outline.color = 'darkgoldenrod1', 
           lab=T,
           legend.title='Correlation',
           pch=4, 
           pch.cex=12, 
           lab_size=6)+ 
  labs(title="mtcars correlation")+
  theme(plot.title=element_text(face='bold',size=14,hjust=0.5,colour="darkred"))+
  theme(legend.position=c(0.10,0.80), legend.box.just = "bottom")

Specify the model

First, we must identify the independent and dependent variables in our dataset and express their relationships in lavaan model syntax.

We will use mpg as the DV, and cyl, disp, hp, gear, am, wt, and carb as the IVs. We will also assume that hp is a function of cyl, disp, and carb, so hp is both a predictor of mpg and an outcome in its own right.

model <-'
mpg ~ hp + gear + cyl + disp + carb + am + wt
hp ~ cyl + disp + carb
'
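
Written out as equations, the two lavaan formulas above correspond to the pair of regressions below. The coefficient symbols (beta, gamma) and the residual terms (epsilon) are notation introduced here for illustration only. Intercepts are left out on the assumption that the model is fitted to the covariance structure, which is consistent with the absence of an intercepts section in the summary output.

```latex
\begin{aligned}
\text{mpg} &= \beta_1\,\text{hp} + \beta_2\,\text{gear} + \beta_3\,\text{cyl} + \beta_4\,\text{disp}
            + \beta_5\,\text{carb} + \beta_6\,\text{am} + \beta_7\,\text{wt} + \varepsilon_1 \\
\text{hp}  &= \gamma_1\,\text{cyl} + \gamma_2\,\text{disp} + \gamma_3\,\text{carb} + \varepsilon_2
\end{aligned}
```

That gives ten regression coefficients plus two residual variances, which matches the 12 free parameters reported in the summary.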

Fit the model

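We fit the model with cfa(). This function is designed for confirmatory factor analysis models, but it is a wrapper around the general lavaan() fitting function and also accepts a path model made up of observed variables only. The sem() function is the more conventional choice for such a model and takes the same arguments.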

fit <- cfa(model, data = mtcars)
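
Step 1 mentions that the data can also be read in as a matrix rather than as raw data. A minimal sketch of that route follows, assuming a covariance matrix computed from mtcars stands in for published summary statistics. The names vars, S, and fit_cov are introduced here for illustration, and sem() is used in place of cfa(); both accept the sample.cov and sample.nobs arguments.

```r
library(lavaan)

# Same model as in the text, redefined so the block is self-contained
model <- '
mpg ~ hp + gear + cyl + disp + carb + am + wt
hp ~ cyl + disp + carb
'

# Covariance matrix of the eight variables used in the model
vars <- c("mpg", "cyl", "disp", "hp", "wt", "am", "gear", "carb")
S <- cov(mtcars[vars])

# Fit from summary statistics; the sample size must be supplied explicitly
fit_cov <- sem(model, sample.cov = S, sample.nobs = nrow(mtcars))

# Point estimates of the free parameters
coef(fit_cov)
```

The point estimates should reproduce those from the raw-data fit, since the covariance matrix carries all the information this model uses.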

View the results

summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)

## lavaan 0.6-6 ended normally after 62 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         12
##                                                       
##   Number of observations                            32
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 7.901
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.048
## 
## Model Test Baseline Model:
## 
##   Test statistic                               132.831
##   Degrees of freedom                                13
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.959
##   Tucker-Lewis Index (TLI)                       0.823
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -220.099
##   Loglikelihood unrestricted model (H1)       -216.148
##                                                       
##   Akaike (AIC)                                 464.198
##   Bayesian (BIC)                               481.787
##   Sample-size adjusted Bayesian (BIC)          444.377
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.226
##   90 Percent confidence interval - lower         0.019
##   90 Percent confidence interval - upper         0.425
##   P-value RMSEA <= 0.05                          0.062
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.025
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all
##   mpg ~                                                                   
##     hp                -0.022    0.016   -1.388    0.165    -0.022   -0.243
##     gear               0.586    1.247    0.470    0.638     0.586    0.071
##     cyl               -0.848    0.710   -1.194    0.232    -0.848   -0.248
##     disp               0.006    0.012    0.512    0.609     0.006    0.127
##     carb              -0.472    0.620   -0.761    0.446    -0.472   -0.125
##     am                 1.624    1.542    1.053    0.292     1.624    0.133
##     wt                -2.671    1.267   -2.109    0.035    -2.671   -0.428
##   hp ~                                                                    
##     cyl                7.717    6.554    1.177    0.239     7.717    0.201
##     disp               0.233    0.087    2.666    0.008     0.233    0.421
##     carb              20.273    3.405    5.954    0.000    20.273    0.478
## 
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all
##    .mpg                5.011    1.253    4.000    0.000     5.011    0.139
##    .hp               644.737  161.184    4.000    0.000   644.737    0.142
## 
## R-Square:
##                    Estimate 
##     mpg                0.861
##     hp                 0.858

In the P(>|z|) column of the summary above, wt is a significant predictor of mpg (p = 0.035), and both disp (p = 0.008) and carb (p < 0.001) are significant predictors of hp. However, hp itself is not a significant predictor of mpg (p = 0.165).
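
The summary reports each path separately, but an indirect effect can be assembled from it by multiplying the coefficients along a route. The sketch below does this for disp, which reaches mpg both directly and through hp. It uses base R lm() rather than lavaan, on the assumption that for a recursive model with observed variables only and no correlated residuals, equation-by-equation least squares gives the same point estimates as the ML fit (the standard errors will differ slightly). The names m_mpg, m_hp, a, b, direct, indirect, and total are introduced here for illustration.

```r
# Equation-by-equation least-squares fits of the two regressions in the path model
m_mpg <- lm(mpg ~ hp + gear + cyl + disp + carb + am + wt, data = mtcars)
m_hp  <- lm(hp ~ cyl + disp + carb, data = mtcars)

a      <- unname(coef(m_hp)["disp"])   # path disp -> hp
b      <- unname(coef(m_mpg)["hp"])    # path hp -> mpg
direct <- unname(coef(m_mpg)["disp"])  # path disp -> mpg

# Indirect effect is the product of the paths along disp -> hp -> mpg
indirect <- a * b
total    <- direct + indirect

round(c(direct = direct, indirect = indirect, total = total), 3)
```

With the rounded estimates in the table, the indirect effect is roughly 0.233 × (-0.022) ≈ -0.005, about the same size as the direct effect of 0.006 but with the opposite sign.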

One of the best ways to understand a structural equation model is to inspect it visually using a path diagram from the semPlot package.

Building a Structural Equation Model (SEM)

The semPaths() function provides a quick and easy way to draw a path diagram of the fitted model, labeled with the estimates that describe the relationships between the DV and each IV.

semPaths(fit, what = 'std', layout = 'tree', edge.label.cex = .9, curvePivot = TRUE, color = "pink")

The “tree” layout leaves a good amount of space between the variables, which makes the diagram easier to read. The arrows and values between the DV and each IV (or mediator) show the path coefficients, which are standardized regression weights that quantify the hypothesized causal relationships between variables.

Standardization multiplies the ordinary regression coefficient by the standard deviation of the corresponding explanatory variable and divides it by the standard deviation of the response variable. The resulting coefficients can then be compared to assess the relative effects of the variables within the fitted regression model.
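
As a worked check of this standardization, the sketch below recomputes the standardized coefficient of wt in the mpg equation as b × SD(x) / SD(y), using numbers copied from the summary output. Two assumptions are made here: the SD of wt uses the ML divisor n rather than n - 1, and the SD of mpg is the model-implied one, recovered by dividing the residual variance of mpg (5.011) by its standardized value (0.139). The variable names are introduced for illustration.

```r
n <- nrow(mtcars)

# Unstandardized coefficient of wt in the mpg equation, copied from the summary
b_wt <- -2.671

# SD of wt with divisor n instead of n - 1 (assumed ML convention)
sd_wt <- sd(mtcars$wt) * sqrt((n - 1) / n)

# Model-implied SD of mpg, recovered from the Variances table:
# residual variance divided by the standardized residual variance
sd_mpg <- sqrt(5.011 / 0.139)

# Standardized coefficient: b * SD(x) / SD(y)
std_wt <- b_wt * sd_wt / sd_mpg
round(std_wt, 3)
```

The result should land close to the -0.428 shown for wt in the Std.all column. Small differences are expected because the inputs are rounded.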