This tutorial is based on the R package plspm (Partial Least Squares Path Modeling), maintained by Gaston Sanchez.
From the Structural Equation Modeling (SEM) standpoint, PLS-PM offers a different approach that doesn’t impose any distributional assumptions on the data that are hard to meet in real life, especially for non-experimental data. When we use a covariance-based SEM approach we are implicitly assuming that the data is generated by some “true” theoretical model. In this scenario, the goal of covariance structure analysis (CSA) is to recover the “true” model that gave rise to the observed covariances.
Partial least squares is an alternative to covariance structure analysis in SEM. PLS-PM can also be regarded as a technique for analyzing a system of relationships between multiple blocks of variables or, put simply, multiple data tables.
For this example our goal will be to obtain an Index of Success using data of Spanish professional football teams. The data comes from the professional Spanish Football League, La Liga, and it consists of 14 variables measured on 20 teams. I collected and manually curated data from the 2008-2009 season from different websites like LeagueDay.com, BDFutbol, ceroacero.es, statto.com and LFP. The resulting data set comes with the package plspm under the name spainfoot.
| Variable | Description |
|---|---|
| GSH | total number of goals scored at home |
| GSA | total number of goals scored away |
| SSH | percentage of matches with scored goals at home |
| SSA | percentage of matches with scored goals away |
| GCH | total number of goals conceded at home |
| GCA | total number of goals conceded away |
| CSH | percentage of matches with no conceded goals at home |
| CSA | percentage of matches with no conceded goals away |
| WMH | total number of won matches at home |
| WMA | total number of won matches away |
| LWR | longest run of won matches |
| LRWL | longest run of matches without losing |
| YC | total number of yellow cards |
| RC | total number of red cards |
Common examples of indices built with PLS-PM include an Index of Satisfaction, an Index of Motivation, an Index of Usability or, in our case, an Index of Success. The issue with these four concepts (satisfaction, motivation, usability, success) is that they cannot be measured directly.
As is typical with structural models, we usually rely on some kind of theory to propose a model. It can be a very complex theory or an extremely simple one. In this case we are not going to reinvent the wheel, so let’s define a simple model based on a basic yet useful theory:
The better the quality of the Attack, as well as the quality of the Defense, the more Success.
Block of Attack

If you check the available variables in the spainfoot data, you will see that the first four columns have to do with scored goals, which in turn can be considered to reflect the Attack of a team. We are going to take those variables as indicators of Attack:

- GSH: number of goals scored at home
- GSA: number of goals scored away
- SSH: percentage of matches with scored goals at home
- SSA: percentage of matches with scored goals away
Block of Defense

The next four columns in the data (columns 5 to 8) have to do with the Defense construct:

- GCH: number of goals conceded at home
- GCA: number of goals conceded away
- CSH: percentage of matches with no conceded goals at home
- CSA: percentage of matches with no conceded goals away
Block of Success

Finally, columns 9 to 12 can be grouped in a third block of variables, the block associated with Success:

- WMH: number of won matches at home
- WMA: number of won matches away
- LWR: longest run of won matches
- LRWL: longest run of matches without losing
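As a quick sanity check on the three blocks described above, the column positions can be mapped back to variable names. This is only a sketch in base R: the vector `vars` is typed in by hand from the variable table and is assumed to match `colnames(spainfoot)`, while `block_idx` and `block_vars` are hypothetical names used just for this illustration.

```r
# Column names in the order given in the variable table
# (assumed to match colnames(spainfoot))
vars <- c("GSH", "GSA", "SSH", "SSA", "GCH", "GCA", "CSH", "CSA",
          "WMH", "WMA", "LWR", "LRWL", "YC", "RC")

# Hypothetical helper list: column positions of each block
block_idx <- list(Attack = 1:4, Defense = 5:8, Success = 9:12)

# Translate positions into variable names
block_vars <- lapply(block_idx, function(i) vars[i])
block_vars

# Columns that are not assigned to any block (the card counts)
unused <- setdiff(vars, unlist(block_vars))
unused
```

Working with names instead of bare positions makes it easier to spot an off-by-one mistake in a block definition.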
```r
# load plspm
library(plspm)
# load data spainfoot
data(spainfoot)
# first 6 rows of the spainfoot data
head(spainfoot)
##            GSH GSA  SSH  SSA GCH GCA  CSH  CSA WMH WMA LWR LRWL  YC RC
## Barcelona   61  44 0.95 0.95  14  21 0.47 0.32  14  13  10   22  76  6
## RealMadrid  49  34 1.00 0.84  29  23 0.37 0.37  14  11  10   18 115  9
## Sevilla     28  26 0.74 0.74  20  19 0.42 0.53  11  10   4    7 100  8
## AtleMadrid  47  33 0.95 0.84  23  34 0.37 0.16  13   7   6    9 116  5
## Villarreal  33  28 0.84 0.68  25  29 0.26 0.16  12   6   5   11 102  5
## Valencia    47  21 1.00 0.68  26  28 0.26 0.26  12   6   5    8 120  6
```
As we already mentioned, the inner or structural model represents the relationships between the latent variables. Looking at the diagram of the inner model, we can think of it as a flowchart representing a causal process. Moreover, we can think of the inner model as a network, and by doing so we can represent it in matrix format. Each row of the matrix is a latent variable, and a 1 in a given column means that the column variable has an effect on the row variable:
```r
# rows of the path matrix: a 1 means "the column variable affects the row variable"
Attack <- c(0, 0, 0)
Defense <- c(0, 0, 0)
Success <- c(1, 1, 0)
foot_path <- rbind(Attack, Defense, Success)
colnames(foot_path) <- rownames(foot_path)
foot_path
##         Attack Defense Success
## Attack       0       0       0
## Defense      0       0       0
## Success      1       1       0
# graph structural model
innerplot(foot_path)
```
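To make the reading of the path matrix explicit, the following base R sketch rebuilds the matrix and lists the arrows it encodes. The names `idx`, `arrows` and `is_lower_tri` are hypothetical, and the lower triangular check reflects our understanding of what plspm expects from a path matrix, so treat that requirement as an assumption rather than a statement about the package.

```r
# Rebuild the path matrix from the tutorial
Attack  <- c(0, 0, 0)
Defense <- c(0, 0, 0)
Success <- c(1, 1, 0)
foot_path <- rbind(Attack, Defense, Success)
colnames(foot_path) <- rownames(foot_path)

# Each 1 in row i, column j is read as "column j -> row i"
idx <- which(foot_path == 1, arr.ind = TRUE)
arrows <- paste(colnames(foot_path)[idx[, "col"]], "->",
                rownames(foot_path)[idx[, "row"]])
arrows

# Assumed requirement: no entries on or above the diagonal
is_lower_tri <- all(foot_path[upper.tri(foot_path, diag = TRUE)] == 0)
is_lower_tri
```

Listing the arrows this way is a cheap guard against accidentally transposing the matrix, which would reverse every relationship in the model.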
```r
# define the blocks of indicators (column indices) associated with each latent variable
foot_blocks <- list(1:4, 5:8, 9:12)
# vector of modes (reflective)
foot_modes <- c("A", "A", "A")
# run plspm analysis
foot_pls <- plspm(Data = spainfoot, path_matrix = foot_path,
                  blocks = foot_blocks, modes = foot_modes)
foot_pls
## Partial Least Squares Path Modeling (PLS-PM)
## ---------------------------------------------
##    NAME             DESCRIPTION
## 1  $outer_model     outer model
## 2  $inner_model     inner model
## 3  $path_coefs      path coefficients matrix
## 4  $scores          latent variable scores
## 5  $crossloadings   cross-loadings
## 6  $inner_summary   summary inner model
## 7  $effects         total effects
## 8  $unidim          unidimensionality
## 9  $gof             goodness-of-fit
## 10 $boot            bootstrap results
## 11 $data            data matrix
## ---------------------------------------------
## You can also use the function 'summary'
```
Unidimensionality means that the indicators of a block measure the same thing and point in the same direction. In PLS-PM, there are three main indices to check unidimensionality:

- Cronbach’s alpha
- Dillon-Goldstein’s rho
- the first eigenvalue of the indicators’ correlation matrix
```r
foot_pls$unidim
##         Mode MVs   C.alpha     DG.rho  eig.1st   eig.2nd
## Attack     A   4 0.8905919 0.92456079 3.017160 0.7923055
## Defense    A   4 0.0000000 0.02601677 2.393442 1.1752781
## Success    A   4 0.9165491 0.94232868 3.217294 0.5370492
```
The Attack block has an alpha of 0.89, Defense has an alpha of 0.00, and Success has an alpha of 0.92. As a rule of thumb, a Cronbach’s alpha greater than 0.7 is considered acceptable. According to this rule Attack and Success are good blocks, but Defense is not. This is a warning sign that something is potentially wrong with the manifest variables of Defense.
Another metric used to assess the unidimensionality of a reflective block is the Dillon-Goldstein’s rho, which focuses on the variance of the sum of variables in the block of interest. As a rule of thumb, a block is considered unidimensional when the Dillon-Goldstein’s rho is larger than 0.7. Attack (0.92) and Success (0.94) pass this check, while Defense (0.03) does not.
The third metric involves an eigen-analysis of the correlation matrix of each set of indicators. The use of this metric is based on the importance of the first eigenvalue. If a block is unidimensional, then the first eigenvalue should be much larger than 1 whereas the second eigenvalue should be smaller than 1. Here again Defense stands out, with a second eigenvalue of 1.18.
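The three checks can be sketched in base R on simulated data. Everything in this block is an assumption made for illustration: the data are simulated rather than taken from spainfoot, the helper names `cronbach_alpha` and `dg_rho` are hypothetical, and the formulas are the usual textbook versions (standardized alpha from the mean inter-item correlation, rho from correlations with the first principal component), which may differ in detail from what plspm computes internally. The last step flips the sign of two indicators to show how a block that does not point in one direction can push alpha below zero.

```r
set.seed(123)  # simulated data, for illustration only

# Four indicators driven by one common factor plus noise
n <- 200
f <- rnorm(n)
X <- sapply(1:4, function(j) f + rnorm(n, sd = 0.5))
colnames(X) <- paste0("x", 1:4)

# Standardized Cronbach's alpha from the mean inter-item correlation
cronbach_alpha <- function(X) {
  R <- cor(X)
  p <- ncol(X)
  r_bar <- mean(R[lower.tri(R)])
  (p * r_bar) / (1 + (p - 1) * r_bar)
}

# Dillon-Goldstein's rho from correlations with the first principal component
dg_rho <- function(X) {
  pc1 <- prcomp(X, scale. = TRUE)$x[, 1]
  lam <- as.vector(cor(X, pc1))
  sum(lam)^2 / (sum(lam)^2 + sum(1 - lam^2))
}

alpha <- cronbach_alpha(X)
rho   <- dg_rho(X)
eig   <- eigen(cor(X))$values

# Flip two indicators so the block no longer points in one direction
X_mixed <- X
X_mixed[, 1:2] <- -X_mixed[, 1:2]
alpha_mixed <- cronbach_alpha(X_mixed)

c(alpha = alpha, rho = rho, eig1 = eig[1], eig2 = eig[2],
  alpha_mixed = alpha_mixed)
```

In the simulated block all three criteria agree, and the sign-flipped version breaks alpha, which is the kind of pattern worth looking for when a block such as Defense fails these checks.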
```r
# plot the loadings of the outer model
plot(foot_pls, what = "loadings")
```
The loadings are correlations between a latent variable and its indicators. Communalities are simply squared loadings, and they measure the part of the variance that a latent variable and its indicator have in common.
```r
foot_pls$outer_model
##    name   block     weight    loading communality redundancy
## 1   GSH  Attack  0.3366220  0.9379455   0.8797418  0.0000000
## 2   GSA  Attack  0.2819492  0.8620997   0.7432159  0.0000000
## 3   SSH  Attack  0.2892480  0.8408295   0.7069942  0.0000000
## 4   SSA  Attack  0.2396082  0.8263084   0.6827856  0.0000000
## 5   GCH Defense -0.1087380  0.4836561   0.2339232  0.0000000
## 6   GCA Defense -0.3914625  0.8759007   0.7672021  0.0000000
## 7   CSH Defense  0.3273741 -0.7463736   0.5570735  0.0000000
## 8   CSA Defense  0.4035138 -0.8926150   0.7967615  0.0000000
## 9   WMH Success  0.2308955  0.7755070   0.6014111  0.5145455
## 10  WMA Success  0.3029553  0.8863662   0.7856450  0.6721694
## 11  LWR Success  0.2821408  0.9686187   0.9382222  0.8027089
## 12 LRWL Success  0.2957718  0.9437099   0.8905884  0.7619551
```
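The relationship between loadings and communalities can be verified by hand. The sketch below copies the four Attack loadings from the output above into a hypothetical vector `loading` and squares them; it needs only base R and makes no call to plspm.

```r
# Attack loadings copied from the outer model output above
loading <- c(GSH = 0.9379455, GSA = 0.8620997,
             SSH = 0.8408295, SSA = 0.8263084)

# Communality = squared loading
communality <- loading^2
round(communality, 7)

# Average communality of the Attack block
mean(communality)
```

The squared values should agree with the communality column for the Attack indicators up to rounding.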
Besides checking the loadings of the indicators with their own latent variables, we must also check the so-called cross-loadings, that is, the loadings of an indicator with the rest of the latent variables. An indicator should load more strongly on its own latent variable than on any other.
```r
foot_pls$crossloadings
##    name   block     Attack    Defense    Success
## 1   GSH  Attack  0.9379455 -0.5159446  0.8977256
## 2   GSA  Attack  0.8620997 -0.3390746  0.7519204
## 3   SSH  Attack  0.8408295 -0.4139277  0.7713854
## 4   SSA  Attack  0.8263084 -0.3361551  0.6390025
## 5   GCH Defense -0.1305171  0.4836561 -0.1597543
## 6   GCA Defense -0.4621560  0.8759007 -0.5751232
## 7   CSH Defense  0.3188076 -0.7463736  0.4809671
## 8   CSA Defense  0.4214853 -0.8926150  0.5928287
## 9   WMH Success  0.7085826 -0.4226144  0.7755070
## 10  WMA Success  0.7730524 -0.7114747  0.8863662
## 11  LWR Success  0.8444012 -0.5380149  0.9686187
## 12 LRWL Success  0.8600572 -0.5891724  0.9437099
```
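Scanning the cross-loadings by eye is error prone, so here is a small base R sketch that automates the check. The matrix `xl` is typed in from the output above, `own_block` and `strongest` are hypothetical names, and comparing absolute values is our assumption, chosen because the Defense loadings have mixed signs.

```r
# Cross-loadings copied from the output above
xl <- matrix(c(
   0.9379455, -0.5159446,  0.8977256,
   0.8620997, -0.3390746,  0.7519204,
   0.8408295, -0.4139277,  0.7713854,
   0.8263084, -0.3361551,  0.6390025,
  -0.1305171,  0.4836561, -0.1597543,
  -0.4621560,  0.8759007, -0.5751232,
   0.3188076, -0.7463736,  0.4809671,
   0.4214853, -0.8926150,  0.5928287,
   0.7085826, -0.4226144,  0.7755070,
   0.7730524, -0.7114747,  0.8863662,
   0.8444012, -0.5380149,  0.9686187,
   0.8600572, -0.5891724,  0.9437099),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("GSH", "GSA", "SSH", "SSA", "GCH", "GCA",
      "CSH", "CSA", "WMH", "WMA", "LWR", "LRWL"),
    c("Attack", "Defense", "Success")))

# Block each indicator belongs to
own_block <- rep(c("Attack", "Defense", "Success"), each = 4)

# Latent variable with the largest absolute loading for each indicator
strongest <- colnames(xl)[apply(abs(xl), 1, which.max)]

data.frame(indicator = rownames(xl), own_block, strongest,
           ok = strongest == own_block)
```

With these numbers every indicator loads most strongly on its own block, although several Attack and Success indicators also load heavily on each other's latent variable.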
After assessing the quality of the measurement model, the next stage is to assess the structural part. To inspect the results of each regression in the structural equations we need to display the results contained in $inner_model. Besides the results of the regression equations, the quality of the structural model is evaluated by examining three indices or quality metrics:

- the coefficient of determination \(R^2\)
- the redundancy index
- the Goodness-of-Fit (GoF)
```r
foot_pls$inner_model
## $Success
##                Estimate Std. Error       t value     Pr(>|t|)
## Intercept -1.997769e-16 0.09217513 -2.167362e-15 1.000000e+00
## Attack     7.572610e-01 0.10439994  7.253462e+00 1.348869e-06
## Defense   -2.836068e-01 0.10439994 -2.716542e+00 1.465963e-02
```
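Written out as an equation, with the estimates above rounded to three decimals, the single regression of the inner model reads roughly as follows. The intercept is numerically zero, which is consistent with the latent variable scores being standardized (an assumption on our part, not something shown in the output).

\[\widehat{Success} = 0.757 \cdot Attack - 0.284 \cdot Defense\]

One possible reading of the negative sign: in the outer model the goals-conceded indicators (GCH, GCA) load positively on Defense, so a higher Defense score here appears to mean a weaker defense, which would go hand in hand with less Success.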
For each regression in the structural model we have an \(R^2\) that is interpreted as in any multiple regression analysis: it indicates the amount of variance in the endogenous latent variable explained by its independent latent variables. As a rough guide, values of \(R^2\) can be classified as follows:

- Low: \(R^2 < 0.30\) (although some authors consider \(R^2 < 0.20\))
- Moderate: \(0.30 < R^2 < 0.60\) (you can also find \(0.20 < R^2 < 0.50\))
- High: \(R^2 > 0.60\) (alternatively there’s also \(R^2 > 0.50\))
```r
foot_pls$inner_summary
##               Type        R2 Block_Communality Mean_Redundancy       AVE
## Attack   Exogenous 0.0000000         0.7531844       0.0000000 0.7531844
## Defense  Exogenous 0.0000000         0.5887401       0.0000000 0.5887401
## Success Endogenous 0.8555637         0.8039667       0.6878447 0.8039667
```
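The rule of thumb for the coefficient of determination can be turned into a tiny helper. This is a sketch only: `classify_r2` is a hypothetical function, it uses the first set of cut-offs (0.30 and 0.60), and how values exactly on a boundary are treated is our own choice.

```r
# Cut-offs from the rule of thumb: low < 0.30, moderate 0.30 to 0.60, high > 0.60
classify_r2 <- function(r2) {
  cut(r2, breaks = c(-Inf, 0.30, 0.60, Inf),
      labels = c("low", "moderate", "high"))
}

# Coefficient of determination of Success, copied from the inner summary above
r2_success <- 0.8555637
as.character(classify_r2(r2_success))
```

For Success the helper returns "high", in line with the value of about 0.86 reported in the summary.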
Redundancy measures the percentage of the variance of the indicators in an endogenous block that is predicted from the independent latent variables associated with the endogenous LV. Put another way, it reflects the amount of variance in an endogenous construct explained by its independent latent variables. For the \(j\)-th manifest variable of the \(k\)-th block it is computed as:
\[Rd(LV_k, mv_{jk}) = \mathrm{loading}^2_{jk} \, R^2_k\]
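The redundancy values reported earlier for the Success indicators can be reproduced from this definition. The sketch below uses only base R; the communalities (squared loadings) and the coefficient of determination are copied from the outputs above, and the variable names are hypothetical.

```r
# Coefficient of determination of Success (from the inner summary)
r2_success <- 0.8555637

# Communalities (squared loadings) of the Success indicators (from the outer model)
communality <- c(WMH = 0.6014111, WMA = 0.7856450,
                 LWR = 0.9382222, LRWL = 0.8905884)

# Redundancy = squared loading times the R2 of the block's latent variable
redundancy <- communality * r2_success
round(redundancy, 7)

# Mean redundancy of the Success block
mean(redundancy)
```

Up to rounding, the results match the redundancy column of the outer model and the Mean_Redundancy entry for Success.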
The GoF index is a pseudo goodness-of-fit measure that accounts for the model quality at both the measurement and the structural levels. GoF is calculated as the geometric mean of the average communality and the average \(R^2\) value.
```r
foot_pls$gof
## [1] 0.7822929
```
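The GoF value can be recomputed from the inner summary. In this base R sketch the block communalities and the coefficient of determination are copied from the output above and the variable names are hypothetical. A plain average of the block communalities is used, which should be adequate here because all three blocks have four indicators; with blocks of different sizes a weighted average would presumably be needed, so treat that part as an assumption.

```r
# Block communalities and R2 copied from the inner summary
block_communality <- c(Attack = 0.7531844, Defense = 0.5887401,
                       Success = 0.8039667)
r2_endogenous <- c(Success = 0.8555637)  # only endogenous blocks have an R2

# Geometric mean of the average communality and the average R2
gof <- sqrt(mean(block_communality) * mean(r2_endogenous))
gof
```

The result agrees with the value stored in `foot_pls$gof` to the precision shown.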