Question 1

1a)

Partial least squares (PLS) regression effectively addresses both collinearity among predictors and collinearity among outcomes. While predictor reduction (as in PCR) handles collinearity among predictors, multivariate regression handles collinearity among outcomes; PLS regression integrates both approaches.

1b)

In PLS regression, there are usually no NHSTs because, like PCR, it is a descriptive method. Instead, the focus lies in describing the reduced space. However, for researchers seeking NHSTs, two general options exist:

  • Conduct an OLS multiple regression as a follow-up on the composite scores of extracted components.

  • Execute a jackknife test on the individual predictor coefficients.

To test individual predictors in PLS regression, the process involves extracting components and creating composite scores from the individual predictors per component. Subsequently, an OLS regression can be run on those composite scores. This approach is logically and mathematically acceptable because the extracted components are orthogonal to one another (r ~ 0), as in a varimax-rotated PCR solution, so they can be entered into an OLS multiple regression without collinearity concerns.
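
A minimal R sketch of both options, assuming a data frame dat with outcome y and predictors x1 through x5 (hypothetical names), using the pls package:

    library(pls)

    # dat, y, and x1-x5 are hypothetical; jackknifing requires cross-validation
    fit <- plsr(y ~ x1 + x2 + x3 + x4 + x5, data = dat,
                validation = "LOO", jackknife = TRUE)

    # Option 1: OLS multiple regression on the composite (component) scores
    comp_scores <- as.data.frame(scores(fit)[, 1:2])  # first two components
    names(comp_scores) <- c("comp1", "comp2")
    comp_scores$y <- dat$y
    summary(lm(y ~ comp1 + comp2, data = comp_scores))

    # Option 2: jackknife t-tests on the individual predictor coefficients
    jack.test(fit, ncomp = 2)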

1c)

Step by step instructions for S.B.:

Absence of Outliers:

  • Use a multivariate outlier screening technique such as MCD75 with alpha = .001.

Linearity of the Outcome-Predictor Relationship:

  • Ensure the shape of the relationship is amenable to the (partial) least-squares solution, i.e., a constant slope per predictor or latent construct.

  • Preliminary checks can be performed with zero-order correlations.

Acceptable Range of Inter-Predictor Correlations (Linearity & Separability):

  • Verify through zero-order correlations of predictors (approx. rs between .35 and .85) or VIFs on individual predictors (range: between 2.00 and 9.99).

Absence of Singularity:

  • Check using the determinant of the entire matrix (predictors and outcomes).
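
A minimal sketch of these checks in R, assuming a data frame dat holding the predictors and outcomes (hypothetical names); the MCD75 screen below uses MASS::cov.rob, one of several ways to obtain MCD-based Mahalanobis distances:

    library(MASS)

    X <- as.matrix(dat)  # hypothetical data: predictors and outcomes together

    # MCD75 outlier screen: robust center/covariance from the best 75% of cases
    mcd <- cov.rob(X, method = "mcd", quantile.used = floor(0.75 * nrow(X)))
    d2  <- mahalanobis(X, center = mcd$center, cov = mcd$cov)
    which(d2 > qchisq(.999, df = ncol(X)))  # flag outliers at alpha = .001

    # Inter-predictor correlations (look for rs roughly between .35 and .85)
    cor(dat[, c("x1", "x2", "x3", "x4", "x5")])
    # VIFs on individual predictors can come from an OLS fit, e.g.:
    # car::vif(lm(y ~ ., data = dat))

    # Singularity check: a determinant near zero signals a singular matrix
    det(cor(X))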

Once the assumptions are checked:

  • Build a PLS Regression Model: To build a PLS multiple regression model in R, S.B. can create a model object using the plsr() function from the pls package, supplying the regression equation to be estimated (see the sketch after this list).

  • Determine the Number of Components to Extract from the PLS Model: Once the PLS multiple regression model is built in R, S.B. can use the summary() function to get the results of the specified validation technique and the number of components that could be extracted. They can interpret the % of variance explained and how much variance the components cumulatively account for in the outcome, then pick the fewest components that give the biggest reduction in error.

  • Plot the RMSEP to Determine the Number of Components to Extract: S.B. can plot the root mean square error of prediction (RMSEP) to determine the number of components to extract. This is similar to (but distinct from) a Cattell scree plot on factors. They can use the plot() and RMSEP() functions together on the PLS model object, then interpret the “drops” on the y-axis: the steeper the drop, the more that component reduces prediction error. Consequently, the components producing the largest drops in RMSEP are the ones to extract.

  • Use a Principal Components Regression (PCR)-style loadings table to interpret the predictors in PLS.

  • Jackknife Test for Individual Predictor Contributions in the PLS Regression.
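
Putting the steps together, a minimal end-to-end sketch in R, again assuming a data frame dat with outcome y and predictors x1 through x5 (hypothetical names):

    library(pls)

    # Build the PLS model with LOO cross-validation and jackknifing enabled
    sb_model <- plsr(y ~ x1 + x2 + x3 + x4 + x5, data = dat,
                     validation = "LOO", jackknife = TRUE)

    # Validation results and % variance explained per component
    summary(sb_model)

    # Scree-like plot of prediction error; look for the largest drops
    plot(RMSEP(sb_model))

    # Loadings table for the retained components (here, the first two)
    sb_model$loadings[, 1:2]

    # Jackknife t-tests on the individual predictor coefficients (as in 1b)
    jack.test(sb_model, ncomp = 2)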

Question 2

2a)

Multinomial DFA would work well here because the DV has more than 2 values. Also, the correlations among the IVs are mostly between .35 and .80, and mDFA is more effective with correlated predictors.

2b)

MANOVA primarily tests for differences in group means across all dependent variables simultaneously. In Chad’s case, he’s interested in the individual predictor effects on the choice of occupation, not just overall differences between groups. Even if MANOVA finds a significant overall effect, it wouldn’t tell him which specific predictor drives this effect or for which specific occupation choice. Multinomial DFA primarily focuses on discriminating between the different occupational groups based on predictor variables. This aligns with Chad’s goals of understanding which predictors differentiate individuals drawn to different I/O psychology specializations.

2c)

Chad should delve into past research related to occupational choice, personality traits, and relevant I/O psychology concepts. This can reveal established relationships between variables and uncover theoretical frameworks supporting specific predictions.

There are different plots he can run, such as a biplot or histogram, that visually display the discriminant (separating) functions. He can further analyze each individual predictor’s discriminant weight for a deeper understanding. Additionally, by using an R function such as candisc(), Chad can see which discriminant functions are statistically significant. That way he can see how the outcome groups are separated from each other, as well as the discriminant power of the results he has analyzed.
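
A minimal sketch of this in R, assuming a data frame dat with a grouping variable occupation and predictors p1 through p4 (hypothetical names), using the candisc package:

    library(candisc)

    # Hypothetical names: predictors as responses, occupation as the factor
    mlm <- lm(cbind(p1, p2, p3, p4) ~ occupation, data = dat)

    # Canonical discriminant analysis
    cda <- candisc(mlm)
    cda              # likelihood-ratio tests of each discriminant function
    cda$coeffs.std   # standardized discriminant weights per predictor
    plot(cda)        # plot of group separation on the first two functions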

2d)

When requesting the loadings on the model object created in R, S.B. should specify the components wanted by typing a comma and the component range in brackets, as in ObjectName$loadings[,1:2]. This will show them the loadings on the 2 extracted components.
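
For example, with the hypothetical PLS model object from Question 1:

    sb_model$loadings[, 1:2]  # predictor loadings on components 1 and 2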

Question 3

3a)

Anna Mae should strongly consider using MANOVA in her analyses, because separate ANOVAs assume orthogonality of the dependent variables. When the DVs are combined into a canonical variate, better separation of the orthogonal aspects of those DVs is possible. For MANOVA, the multiple DVs should be correlated between approximately r = .35 and r = .80, which is the case here. If the omnibus MANOVA is significant, she can follow up with a Roy-Bargmann stepdown analysis:

  • Conduct a regular ANOVA on the first DV.

  • Perform an ANCOVA on the second DV, using the first DV as a covariate. This controls for the variance shared with the first DV.

3b)

Crucial Assumptions to Check for MANOVA:

Absence of Multivariate Outliers:

  • Assessed using Mahalanobis distance metrics (e.g., MCD75, alpha = .001).

Linearity and Separability of Dependent Variables (DVs):

  • Assessed by a zero-order correlation matrix, looking for correlations between .35 and .80.

Absence of Singularity:

  • Assessed by examining the determinant.

Homogeneity of the Variance-Covariance Matrices:

  • The multiple DVs must show equality of variances and covariances in relation to each other, similar to Levene’s test but in multivariate space. Assessed by Box’s M test, with violation flagged at p < .001.

Homogeneity of Regression Slopes of all DV pairs:

  • Assessed through visualization in Exploratory Data Analysis (EDA).
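
A minimal sketch of these checks in R, assuming a data frame dat with DVs dv1 through dv3 and a grouping factor group (hypothetical names); Box’s M here comes from the heplots package:

    library(heplots)

    Y <- as.matrix(dat[, c("dv1", "dv2", "dv3")])  # hypothetical DV names

    cor(Y)              # DV intercorrelations: look for rs between .35 and .80
    det(cov(Y))         # a determinant near zero signals singularity
    boxM(Y, dat$group)  # Box's M; interpret p < .001 as a violation
    pairs(dat[, c("dv1", "dv2", "dv3")])  # EDA view of DV-pair linearity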

Conducting the Initial MANOVA Analysis:

  • Perform the omnibus multivariate test to determine whether there is a difference somewhere between the levels of the independent variable on the canonical variate. Then determine where the levels differ by “peeling off” the DVs that comprise the canonical variate and testing whether the IV levels differ on each DV or DV segment.
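
In base R, the omnibus test is a short sketch (same hypothetical names):

    # Omnibus MANOVA: do the IV levels differ on the canonical variate?
    fit <- manova(cbind(dv1, dv2, dv3) ~ group, data = dat)
    summary(fit, test = "Wilks")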

Steps to Roy-Bargmann Stepdown Analysis:

  • First Step: Conduct a regular ANOVA on the first DV.

  • Second Step: Perform an ANCOVA on the second DV, controlling for the first DV.

  • Subsequent Steps: Continue adding prior DVs as covariates in an ANCOVA model.

  • Per Dr. Tate’s idea, it’s also possible to run the ANCOVA on all the DVs, assuming there are no issues from a pure math perspective.
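
A minimal stepdown sketch in base R with the same hypothetical names; entering prior DVs before the factor yields the sequential (Type I) adjustment the stepdown requires:

    summary(aov(dv1 ~ group, data = dat))               # Step 1: ANOVA on DV1
    summary(aov(dv2 ~ dv1 + group, data = dat))         # Step 2: ANCOVA on DV2
    summary(aov(dv3 ~ dv1 + dv2 + group, data = dat))   # Step 3: prior DVs as covariates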

3c)

There is little to no consensus in the field on choosing the DV order, so the choice is somewhat arbitrary, as long as homogeneity of regression slopes holds in all cases. Some authors argue the highest-priority DV goes in first; others argue it goes last.

The order should be chosen according to the relationships expected (from established literature) between the independent variable and the DVs; this can enhance interpretability.

Prioritizing DVs with larger expected effects or higher intercorrelations with other DVs can also be advantageous.