What is Path Analysis?

Path analysis is a form of multiple regression analysis used to evaluate causal models by examining the relationships between a dependent variable and two or more independent variables. With this method, one can estimate both the magnitude and the significance of hypothesized causal connections between variables.

There are two main requirements for path analysis:

  1. All causal relationships between variables must go in one direction only (you cannot have a pair of variables that cause each other).

  2. The variables must have a clear time-ordering since one variable cannot be said to cause another unless it precedes it in time.
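Requirement 1 says that the arrows in the model must form a recursive (acyclic) system. As a sketch, the hypothetical helper is_recursive() below, which is not part of lavaan or any other package, checks this for a model written as a data frame of arrows. It repeatedly removes variables that no remaining arrow points into; any variables left over indicate a feedback loop.

```r
# Hypothetical helper (illustration only): TRUE when the arrows contain no
# cycle, i.e. when the model is recursive. `paths` has one row per arrow,
# from the cause (`from`) to the effect (`to`).
is_recursive <- function(paths) {
  remaining <- unique(c(paths$from, paths$to))
  edges <- paths
  repeat {
    # Variables that no remaining arrow points into can be peeled off
    sources <- setdiff(remaining, edges$to)
    if (length(sources) == 0) break
    remaining <- setdiff(remaining, sources)
    edges <- edges[!(edges$from %in% sources), , drop = FALSE]
    if (length(remaining) == 0) break
  }
  length(remaining) == 0
}

# Arrows of the model used later in this tutorial
tutorial_paths <- data.frame(
  from = c("cyl", "disp", "carb", "hp", "cyl", "disp", "carb", "gear", "am", "wt"),
  to   = c("hp",  "hp",   "hp",   "mpg", "mpg", "mpg", "mpg",  "mpg",  "mpg", "mpg"),
  stringsAsFactors = FALSE
)
is_recursive(tutorial_paths)

# A feedback loop (hp causes mpg and mpg causes hp) violates requirement 1
loop_paths <- data.frame(from = c("hp", "mpg"), to = c("mpg", "hp"),
                         stringsAsFactors = FALSE)
is_recursive(loop_paths)
```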

Path analysis is theoretically useful because, unlike other techniques, it forces us to specify relationships among all of the independent variables. This results in a model showing causal mechanisms through which independent variables produce both direct and indirect effects on a dependent variable.
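For a single mediating variable, the direct and indirect effects combine as shown below. The symbols are introduced here for illustration: a is the path from X to the mediator M, b is the path from M to Y, and c' is the direct path from X to Y. In the model used later in this tutorial, disp → hp → mpg is one such chain.

$$
\begin{aligned}
M &= a\,X + e_M \\
Y &= c'\,X + b\,M + e_Y \\
\text{indirect effect of } X \text{ on } Y &= a\,b \\
\text{total effect of } X \text{ on } Y &= c' + a\,b
\end{aligned}
$$

Substituting the first equation into the second gives Y = (c' + ab)X + b e_M + e_Y, which is where the total effect comes from.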

How to use Path Analysis

Typically, path analysis involves constructing a path diagram in which the relationships between all variables, and the causal direction of each, are explicitly laid out.

When conducting path analysis, one should first construct an input path diagram, which illustrates the hypothesized relationships. After the statistical analysis has been completed, an output path diagram can be constructed, which shows the relationships as estimated by the analysis.

While path analysis is useful for evaluating causal hypotheses, it cannot determine the direction of causality. The direction of each arrow is an assumption supplied by the analyst: the method estimates the strength of each hypothesized path, but it does not prove that the assumed causal direction is correct.
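A small base-R illustration of why the data alone cannot settle the direction of an arrow, using the mtcars data introduced below: for two standardized variables, the path coefficient is the same whether wt is treated as the cause of mpg or mpg as the cause of wt, because in both cases it equals their correlation. Both one-arrow models fit the data equally well, so the choice of direction has to come from theory.

```r
# Standardize both variables so the regression slopes are path coefficients
z_mpg <- as.numeric(scale(mtcars$mpg))
z_wt  <- as.numeric(scale(mtcars$wt))

# Arrow wt -> mpg, and the reverse arrow mpg -> wt
beta_wt_to_mpg <- unname(coef(lm(z_mpg ~ z_wt))["z_wt"])
beta_mpg_to_wt <- unname(coef(lm(z_wt ~ z_mpg))["z_mpg"])

# Both slopes equal the correlation between the two variables
r <- cor(mtcars$mpg, mtcars$wt)
c(wt_to_mpg = beta_wt_to_mpg, mpg_to_wt = beta_mpg_to_wt, correlation = r)
```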

R Packages used

NOTE: OpenMx is required to run semPlot. To install OpenMx, run the following command in the R console:

source('http://openmx.psyc.virginia.edu/getOpenMx.R')

OpenMx is also distributed on CRAN, so install.packages("OpenMx") works as well.

Once OpenMx is installed, load the required packages:

library(lavaan)
library(semPlot)
library(OpenMx)
library(tidyverse)
library(knitr)
library(kableExtra)
library(GGally)

# Collect package metadata for the table below
packages <- c("tidyverse", "knitr", "kableExtra", "lavaan", "semPlot", "OpenMx", "GGally")
display <- c("Package", "Title", "Maintainer", "Version", "URL")
pkg_info <- t(sapply(packages, function(pkg) {
  desc <- packageDescription(pkg)
  # Fields a package does not declare (often URL) become NA instead of being dropped
  vapply(display, function(field) {
    value <- desc[[field]]
    if (is.null(value)) NA_character_ else as.character(value)
  }, character(1))
}))
# Keep only the first URL when a package lists several
pkg_info[, "URL"] <- trimws(sub(",.*", "", pkg_info[, "URL"]))

# Table of packages
kable(pkg_info, format = "html", align = "c", row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Package | Title | Maintainer | Version | URL |
|---|---|---|---|---|
| tidyverse | Easily Install and Load ‘Tidyverse’ Packages | Hadley Wickham <hadley@rstudio.com> | 1.1.1 | http://tidyverse.org |
| knitr | A General-Purpose Package for Dynamic Report Generation in R | Yihui Xie <xie@yihui.name> | 1.16 | NA |
| kableExtra | Construct Complex Table with ‘kable’ and Pipe Syntax | Hao Zhu <haozhu233@gmail.com> | 0.4.0 | http://haozhu233.github.io/kableExtra/ |
| lavaan | Latent Variable Analysis | Yves Rosseel <Yves.Rosseel@UGent.be> | 0.5-23.1097 | NA |
| semPlot | Path Diagrams and Visual Analysis of Various SEM Packages’ Output | Sacha Epskamp <mail@sachaepskamp.com> | 1.1 | NA |
| OpenMx | Extended Structural Equation Modelling | Joshua N. Pritikin <jpritikin@pobox.com> | 2.7.12 | http://openmx.ssri.psu.edu |
| GGally | Extension to ‘ggplot2’ | Barret Schloerke <schloerke@gmail.com> | 1.3.2 | https://ggobi.github.io/ggally |

Conducting a Path Analysis in R

The four general steps for conducting a path analysis in R are:

  1. Read in your data (as a correlation matrix or raw data)
  2. Specify the model
  3. Fit the model
  4. View the results

Read in your data

For this tutorial, we will use the built-in mtcars dataset (raw data) to demonstrate how to conduct a path analysis. A covariance matrix can be used instead of raw data if that is all that is available.

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
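To see why a covariance matrix can stand in for raw data, note that regression (path) coefficients depend on the data only through variances and covariances: the slopes for an outcome y are S_xx^{-1} s_xy, where S_xx is the covariance matrix of the predictors and s_xy holds their covariances with y. The base-R sketch below recovers the hp equation of the model specified later from cov(mtcars) alone and compares it with lm() on the raw rows. In lavaan, a covariance matrix is supplied through the sample.cov and sample.nobs arguments instead of data.

```r
# Covariance matrix of the raw data; only this matrix is used for the slopes
S <- cov(mtcars)

predictors <- c("cyl", "disp", "carb")
outcome <- "hp"

# Slopes = S_xx^{-1} s_xy, computed without touching the raw rows
slopes_from_cov <- solve(S[predictors, predictors], S[predictors, outcome])

# The same slopes from an ordinary regression on the raw data
slopes_from_lm <- coef(lm(hp ~ cyl + disp + carb, data = mtcars))[predictors]

rbind(from_cov = slopes_from_cov, from_lm = slopes_from_lm)
```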

Specify the model

First, we must identify the independent and dependent variables within our dataset.

In the R environment, a regression formula has the following form: y ~ x1 + x2 + x3 + x4

In this formula, the tilde sign (“~”) is the regression operator. On the left-hand side of the operator, we have the dependent variable (y), and on the right-hand side, we have the independent variables, each one separated by the “+” operator.
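In R, such a formula is an ordinary object that can be taken apart, and lavaan's model syntax used below reuses the same ~ operator. The short base-R sketch below, using mpg and two of its predictors for illustration, extracts the dependent variable from the left-hand side and the independent variables from the right-hand side.

```r
f <- mpg ~ hp + wt

# Left-hand side of "~": the dependent variable
dependent <- all.vars(f[[2]])

# Right-hand side of "~": the independent variables, in order
independent <- attr(terms(f), "term.labels")

dependent    # "mpg"
independent  # "hp" "wt"
```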

For this demonstration, we will use mpg as the dependent variable and cyl, disp, hp, gear, am, wt and carb as the independent variables. Furthermore, we will assume that hp is itself a function of cyl, disp and carb. This makes hp a mediating variable: cyl, disp and carb can affect mpg both directly and indirectly through hp.

model <- '
  # mpg regressed on all predictors, including the mediator hp
  mpg ~ hp + gear + cyl + disp + carb + am + wt
  # hp regressed on cyl, disp and carb
  hp ~ cyl + disp + carb
'
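Because this model is recursive and its two residuals are uncorrelated, each line of the model string can also be estimated as an ordinary regression. The base-R sketch below fits the two equations with lm(), which should match the maximum-likelihood regression estimates lavaan reports in the next steps (up to rounding), and then applies the product-of-coefficients rule to get the indirect effects of cyl, disp and carb on mpg through hp. It is an illustration only; lavaan can compute the same quantities, with standard errors, from labeled parameters.

```r
# One regression per endogenous variable, mirroring the two lines of `model`
fit_mpg <- lm(mpg ~ hp + gear + cyl + disp + carb + am + wt, data = mtcars)
fit_hp  <- lm(hp ~ cyl + disp + carb, data = mtcars)

mediated <- c("cyl", "disp", "carb")

# Direct effects: paths from each variable straight to mpg
direct <- coef(fit_mpg)[mediated]

# Indirect effects: (variable -> hp) times (hp -> mpg)
indirect <- coef(fit_hp)[mediated] * coef(fit_mpg)["hp"]

# Total effect = direct + indirect
effects <- cbind(direct = direct, indirect = indirect, total = direct + indirect)
round(effects, 4)
```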

Fit the model

fit <- cfa(model, data = mtcars)

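The cfa() function is lavaan's wrapper for fitting confirmatory factor analysis models. It also fits this path model, because cfa() and sem(), the wrapper intended for structural and path models, call lavaan() with the same default settings, so sem(model, data = mtcars) should give identical results. The first argument is the user-specified model, and the second is the data frame that contains the observed variables. Once the model has been fitted, summary() reports the parameter estimates and fit statistics.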

View the results

summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
## lavaan (0.5-23.1097) converged normally after  62 iterations
## 
##   Number of observations                            32
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                7.901
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.048
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              132.831
##   Degrees of freedom                                13
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.959
##   Tucker-Lewis Index (TLI)                       0.823
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -541.437
##   Loglikelihood unrestricted model (H1)       -537.487
## 
##   Number of free parameters                         12
##   Akaike (AIC)                                1106.874
##   Bayesian (BIC)                              1124.463
##   Sample-size adjusted Bayesian (BIC)         1087.054
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.226
##   90 Percent Confidence Interval          0.019  0.425
##   P-value RMSEA <= 0.05                          0.062
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.025
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all
##   mpg ~                                                                   
##     hp                -0.022    0.016   -1.388    0.165    -0.022   -0.243
##     gear               0.586    1.247    0.470    0.638     0.586    0.071
##     cyl               -0.848    0.710   -1.194    0.232    -0.848   -0.248
##     disp               0.006    0.012    0.512    0.609     0.006    0.127
##     carb              -0.472    0.620   -0.761    0.446    -0.472   -0.125
##     am                 1.624    1.542    1.053    0.292     1.624    0.133
##     wt                -2.671    1.267   -2.109    0.035    -2.671   -0.428
##   hp ~                                                                    
##     cyl                7.717    6.554    1.177    0.239     7.717    0.201
##     disp               0.233    0.087    2.666    0.008     0.233    0.421
##     carb              20.273    3.405    5.954    0.000    20.273    0.478
## 
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all
##    .mpg                5.011    1.253    4.000    0.000     5.011    0.139
##    .hp               644.737  161.184    4.000    0.000   644.737    0.142
## 
## R-Square:
##                    Estimate 
##     mpg                0.861
##     hp                 0.858

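As the summary shows, wt is the only significant predictor of mpg (p = 0.035), while disp (p = 0.008) and carb (p < 0.001) are significant predictors of hp. However, hp itself is not a significant predictor of mpg (p = 0.165), so the model offers little evidence that cyl, disp or carb affect mpg indirectly through hp. The fit statistics also call for caution: the chi-square test is significant (χ² = 7.901, df = 3, p = 0.048) and the RMSEA is high (0.226), although with only 32 observations these estimates are imprecise (the 90% RMSEA confidence interval runs from 0.019 to 0.425).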
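The z-value column in the summary is the estimate divided by its standard error, and P(>|z|) is the two-sided p-value from the standard normal distribution. As a check, the base-R sketch below recomputes both from the rounded Estimate and Std.Err values printed above for three paths; the small differences from the printed columns come from that rounding.

```r
# Rounded estimates and standard errors copied from the summary() output
est <- c(mpg_on_wt = -2.671, mpg_on_hp = -0.022, hp_on_carb = 20.273)
se  <- c(mpg_on_wt =  1.267, mpg_on_hp =  0.016, hp_on_carb =  3.405)

# Wald z statistic and two-sided p-value
z <- est / se
p <- 2 * pnorm(-abs(z))

round(cbind(z = z, p = p), 3)
```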

One of the best ways to understand a fitted path model is to inspect it visually as a path diagram. The semPlot package makes this easy.

Visualizing the Model as a Path Diagram

The semPaths() function draws a path diagram directly from a fitted lavaan object. Passing "std" as the second argument (what) makes the edges reflect the standardized parameter estimates. The diagram produced below is for the mtcars model fitted earlier in this tutorial.

The semPaths documentation at https://rdrr.io/cran/semPlot/man/semPaths.html gives a good breakdown of the many additional customization options.

semPaths(fit, 'std', layout = 'circle')

Exercises

Exercise 1: What other layouts can you find that might make the diagram easier to read? HINT: look up the layout argument of semPaths().

semPaths(fit, "std", layout = "tree", edge.label.cex = 0.9, curvePivot = TRUE)

The “tree” layout leaves more space between the variables, which makes the diagram easier to read. The diagram can be customized much further; however, that is beyond the scope of this tutorial.

Exercise 2: What do the arrows and values between each independent variable and the dependent variable represent?

The arrows and values between each independent variable and the dependent variable (or the mediating variable, here hp) are path coefficients. In the standardized diagram they are standardized regression weights, which can be used to examine possible causal links between variables in the structural equation modeling approach. Standardization multiplies the ordinary regression coefficient by the standard deviation of the explanatory variable and divides it by the standard deviation of the outcome variable. The resulting coefficients are unit-free, so they can be compared to assess the relative effects of the variables within the fitted model.

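We can see from the path coefficients that, within this model, wt has the strongest estimated direct effect on mpg (standardized coefficient -0.43). As noted earlier, this supports the hypothesized causal link but does not prove it.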
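A base-R sketch of that standardization for the mpg equation: each coefficient is multiplied by sd(x)/sd(y), which gives the same result as refitting the regression on z-scored variables. lavaan's Std.all column may differ slightly from these numbers, because it standardizes with the variances implied by the fitted model rather than the sample variances; the ranking of the predictors is unaffected, since the outcome's standard deviation is a common divisor.

```r
fit_mpg <- lm(mpg ~ hp + gear + cyl + disp + carb + am + wt, data = mtcars)
predictors <- c("hp", "gear", "cyl", "disp", "carb", "am", "wt")

# Standardized coefficient = b * sd(x) / sd(y)
b <- coef(fit_mpg)[predictors]
sd_x <- sapply(mtcars[predictors], sd)
beta_std <- b * sd_x / sd(mtcars$mpg)

# Same thing by regressing z-scores on z-scores
z_data <- as.data.frame(scale(mtcars[c("mpg", predictors)]))
beta_refit <- coef(lm(mpg ~ hp + gear + cyl + disp + carb + am + wt,
                      data = z_data))[predictors]

round(cbind(beta_std, beta_refit), 3)
```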

Exercise 3: What other inferences can you draw about the relationship between variables from the above SEM?

Exercise 4: What do the arrows and values between the independent variables represent?

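# Keep only the variables used in the model (drops drat, qsec and vs)
model_vars <- c("mpg", "cyl", "disp", "hp", "wt", "am", "gear", "carb")
ggcorr(mtcars[model_vars], nbreaks = 6, label = TRUE, low = "red3", high = "green3",
       label_round = 2, name = "Correlation Scale", label_alpha = TRUE, hjust = 0.75) +
  ggtitle(label = "Correlation Plot") +
  theme(plot.title = element_text(hjust = 0.6))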

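As we can see, the values on the curved, double-headed arrows between the independent (exogenous) variables in the path diagram match the pairwise correlations in the correlation plot. These arrows represent covariances between the exogenous variables, and in the standardized diagram those covariances are shown as correlations.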
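Standardizing a covariance gives a correlation: cov(x, y) / (sd(x) sd(y)). The base-R sketch below checks this for the exogenous variables of the model; hp is left out because it is endogenous and has its own regression equation rather than a double-headed arrow.

```r
exogenous <- c("cyl", "disp", "carb", "gear", "am", "wt")

# Covariances between the exogenous variables
S <- cov(mtcars[exogenous])

# Standardize: divide each covariance by the product of the two SDs
R_from_cov <- S / outer(sqrt(diag(S)), sqrt(diag(S)))

# Identical to the correlation matrix used by the correlation plot
stopifnot(isTRUE(all.equal(R_from_cov, cor(mtcars[exogenous]))))
round(R_from_cov, 2)
```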