Barbara! When we look at items loading on a factor, we are basically measuring the “meaningful” variance shared by a set of items. You are not getting 0.40 loadings because there is more than one factor: these items are not unidimensional and are more appropriately fit by a 2-factor model. You have to conduct the following steps:
You have to make sure the ratio of cases to variables exceeds the 5:1 standard, and ideally the more conservative 20:1 standard used in many cognitive psychology applications like this one. There are 4 variables, so there have to be at least 80 (4*20) participants (cases) to have a sufficient cases-to-variables ratio for an EFA.
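The ratio check is simple arithmetic, and it can be sketched in base R. The helper names cases_needed() and meets_ratio() are hypothetical (they are not from any package), and "meets" is assumed here to mean having at least the required number of cases.

```r
# Hypothetical helpers: minimum cases for a given cases-to-variables ratio
cases_needed <- function(n_variables, ratio) n_variables * ratio

meets_ratio <- function(n_cases, n_variables, ratio) {
  n_cases >= cases_needed(n_variables, ratio)
}

n_variables <- 4
min_liberal <- cases_needed(n_variables, 5)        # 5:1 standard
min_conservative <- cases_needed(n_variables, 20)  # 20:1 standard

print(c(liberal = min_liberal, conservative = min_conservative))
```

With the real data, nrow(CogTasks.listwise.final) could be passed as n_cases once listwise deletion and outlier removal are done.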
You have to screen for outliers and delete any that are observed. You can use the MCD75 technique with an alpha of .001: a) set the seed for reproducibility with set.seed(), b) create an outlier variable using the .75 quantile and an alpha of .001, and c) if any outliers exist, use the which() function to delete them.
Example:
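library(mvoutlier)  # provides aq.plot()
set.seed(123)
CogTasks.listwise$MCD75 = aq.plot(CogTasks.listwise, quan = .75, alpha = .001)$outliers
# Run the next line only if at least one case is flagged; with no flagged cases, -which(...) would drop every row
CogTasks.listwise.final = CogTasks.listwise[-which(CogTasks.listwise$MCD75 == TRUE), ]
CogTasks.listwise.final$MCD75 = NULL  # drop the flag column so it is not treated as a variable later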
You should then run a KMO test to determine sampling adequacy. If the overall MSA exceeds the .60 standard, then the data are adequate for an EFA.
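To make the overall MSA less of a black box, here is a minimal sketch that computes it by hand in base R from the correlations and the partial (anti-image) correlations. The function name kmo_overall() and the simulated data are assumptions for illustration only; in practice the KMO() function in the psych package reports this index.

```r
# Hypothetical helper: overall KMO / MSA from a data frame of numeric items
kmo_overall <- function(data) {
  R <- cor(data)
  R_inv <- solve(R)
  # Partial correlations, obtained by rescaling the inverse correlation matrix
  Q <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
  off <- row(R) != col(R)  # off-diagonal cells only
  sum(R[off]^2) / (sum(R[off]^2) + sum(Q[off]^2))
}

# Simulated stand-in data: four tasks driven by one common factor
set.seed(123)
n <- 200
f <- rnorm(n)
sim <- data.frame(
  task1 = 0.7 * f + rnorm(n, sd = 0.7),
  task2 = 0.7 * f + rnorm(n, sd = 0.7),
  task3 = 0.7 * f + rnorm(n, sd = 0.7),
  task4 = 0.7 * f + rnorm(n, sd = 0.7)
)

msa <- kmo_overall(sim)
adequate <- msa > 0.60  # the .60 standard mentioned above
print(msa)
```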
You should then test for the absence of orthogonality (that is, check that the variables are correlated enough to factor) by running Bartlett’s test, like the following: cortest.bartlett(CogTasks.listwise.final)
The next step is to test for singularity by calculating the determinant of the entire correlation matrix. Example: det(cor(CogTasks.listwise.final))
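Bartlett's test and the determinant are closely linked, because the test statistic is built from the log of the determinant. The sketch below computes both in base R using the standard chi-square approximation. The function name bartlett_sphericity() and the simulated data are assumptions for illustration, not part of the psych package.

```r
# Hypothetical helper: Bartlett's test of sphericity plus the determinant
bartlett_sphericity <- function(data) {
  R <- cor(data)
  n <- nrow(data)
  p <- ncol(data)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(
    determinant = det(R),
    chisq = chisq,
    df = df,
    p.value = pchisq(chisq, df, lower.tail = FALSE)
  )
}

# Simulated stand-in data: four correlated tasks
set.seed(123)
n <- 200
f <- rnorm(n)
sim <- data.frame(
  task1 = 0.7 * f + rnorm(n, sd = 0.7),
  task2 = 0.7 * f + rnorm(n, sd = 0.7),
  task3 = 0.7 * f + rnorm(n, sd = 0.7),
  task4 = 0.7 * f + rnorm(n, sd = 0.7)
)

result <- bartlett_sphericity(sim)
print(result)
```

A significant p-value indicates the correlation matrix is not an identity matrix, and a determinant very close to 0 would point to singularity.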
Now it’s time to determine how many factors should be extracted. This can be done via various methods: a) apply Kaiser’s little jiffy (retain factors with eigenvalues greater than 1), b) generate Cattell’s scree plot to visualize how many factors can be extracted, and c) run Horn’s parallel analysis.
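The eigenvalue-based methods can be sketched in base R as follows. This is a simplified illustration on simulated two-factor data, using eigenvalues of the correlation matrix and random normal data for the parallel analysis. All object names are hypothetical, and the psych package's fa.parallel() is the more complete tool.

```r
# Simulated stand-in data: two pairs of tasks, each pair driven by its own factor
set.seed(42)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  task1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  task2 = 0.8 * f1 + rnorm(n, sd = 0.6),
  task3 = 0.8 * f2 + rnorm(n, sd = 0.6),
  task4 = 0.8 * f2 + rnorm(n, sd = 0.6)
)

# a) Kaiser criterion: count eigenvalues greater than 1
observed <- eigen(cor(sim))$values
kaiser_n <- sum(observed > 1)

# b) Scree plot (uncomment to draw):
# plot(observed, type = "b", xlab = "Factor", ylab = "Eigenvalue")

# c) Parallel analysis: compare with mean eigenvalues from random data of the same shape
random_eigs <- replicate(200, eigen(cor(matrix(rnorm(n * ncol(sim)), nrow = n)))$values)
threshold <- rowMeans(random_eigs)
# Keep the leading factors whose eigenvalues beat the random benchmark
parallel_n <- which(observed <= threshold)[1] - 1

print(c(kaiser = kaiser_n, parallel = parallel_n))
```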
Now it’s time to extract. If, based on the steps above, a 2-factor solution looks like the best option, proceed with a Varimax or Oblimin rotation. Oblimin is the better choice when the factors are correlated.
You can examine the Factor Correlation Matrix, which shows the correlations between each pair of factors. When you run the Oblimin rotation in R, the output also provides the factor inter-correlations.
Example: ML.CogTasks = fa(CogTasks.listwise.final, nfactors = 2, rotate = "oblimin", fm = "ml", max.iter = 1000)
Printing the factor loadings:
print.psych(ML.CogTasks, cut = 0.30, sort = T)
The reason that Mike’s univariate boxplots method is sub-par is actually written in the problem: it would “treat all predictors at the same level”.
Univariate boxplots, while useful for detecting outliers within each variable, are sub-optimal in Mike’s case because they don’t consider the hierarchical structure of the data. In Hierarchical Linear Modeling (HLM), data are nested (employees within companies), and univariate analyses don’t account for this nesting. This can lead to misleading conclusions about variability and outlier detection.
Mike could perform a multilevel EDA. For instance, he could create
boxplots of employee-level variables within each company to observe
within- and between-company variability. Additionally, using the lmerTest
package with the lmer() and influence() functions, Mike
could create a model somewhat like the one below:
Exploratory Data Analysis (EDA), also known as Assumption Checking, is based on:
ProductivityComp = conscientiousness + extraversion + grit + nfc + ahw + grit*ahw
library(lmerTest)  # attaches lme4, which provides lmer() and influence()
HLM_model = lmer(ProductivityComp ~ conscientiousness + extraversion + grit + nfc + ahw + grit:ahw + (1 | Company), data = datafile)
estimated.HLM_model = influence(HLM_model)
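The within-company boxplot idea can be sketched in base R. The data are simulated and every name here (sim, flagged, the C1 to C6 company labels) is a hypothetical stand-in for Mike's file. One value is planted that is ordinary for the sample as a whole but extreme for its own company.

```r
# Simulated stand-in data: 6 companies, 30 employees each, different company means
set.seed(123)
companies <- paste0("C", 1:6)
sim <- data.frame(
  Company = factor(rep(companies, each = 30), levels = companies),
  grit = rnorm(180, mean = rep(seq(40, 65, by = 5), each = 30), sd = 5)
)
sim$grit[1] <- 65  # planted value: typical overall, extreme within company C1

# Boxplot statistics per company (plot = FALSE returns the numbers without drawing)
bp <- boxplot(grit ~ Company, data = sim, plot = FALSE)
flagged <- data.frame(Company = bp$names[bp$group], grit = bp$out)

# The pooled, single-level boxplot for comparison
pooled_out <- boxplot(sim$grit, plot = FALSE)$out

# Between- and within-company variability
company_means <- tapply(sim$grit, sim$Company, mean)
company_sds <- tapply(sim$grit, sim$Company, sd)

print(flagged)
```

The planted value is flagged within its company but not in the pooled boxplot, which is the kind of case a single-level screen can miss.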
The reason is the same as in 2a: Mike’s collinearity analysis treats all predictors at the same level and ignores the multilevel nature of the model.
The potential solution for Mike’s collinearity issue is centering the terms that are part of the interaction effect. Centering helps reduce non-essential collinearity, which often arises from scaling or from the inclusion of interaction terms. It involves subtracting the mean of each predictor from its values. Centering the predictors around zero reduces multicollinearity between the main effects and their interaction term, because their correlation is no longer artificially inflated by the scale of the original variables.
datafile$centered_conscientiousness = datafile$conscientiousness - mean(datafile$conscientiousness, na.rm = TRUE)
datafile$centered_grit = datafile$grit - mean(datafile$grit, na.rm = TRUE)
datafile$centered_ahw = datafile$ahw - mean(datafile$ahw, na.rm = TRUE)
And then create the interaction term with these centered variables:
datafile$interaction_term = datafile$centered_grit * datafile$centered_ahw
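To see why centering helps, the sketch below compares the correlation between a predictor and its product term before and after grand-mean centering. The data are simulated and the means and standard deviations are arbitrary assumptions; only the pattern matters.

```r
# Simulated stand-in data with non-zero means (arbitrary values)
set.seed(123)
n <- 500
sim <- data.frame(
  grit = rnorm(n, mean = 50, sd = 10),
  ahw = rnorm(n, mean = 40, sd = 5)
)

# Product term from the raw variables
sim$raw_product <- sim$grit * sim$ahw

# Grand-mean center both variables, then form the product term
sim$grit_c <- sim$grit - mean(sim$grit)
sim$ahw_c <- sim$ahw - mean(sim$ahw)
sim$centered_product <- sim$grit_c * sim$ahw_c

cor_raw <- cor(sim$grit, sim$raw_product)
cor_centered <- cor(sim$grit_c, sim$centered_product)

print(c(raw = cor_raw, centered = cor_centered))
```

The raw product is strongly correlated with grit simply because both carry the scale of the original scores. After centering, that non-essential overlap largely disappears.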
Low tolerance values suggest that there is a high degree of multicollinearity. This means that conscientiousness and grit are highly correlated with each other. In regression analysis, high multicollinearity can distort the estimates of the regression coefficients and make them unstable or unreliable. It becomes difficult to isolate the individual effect of each predictor on the outcome variable due to this multicollinearity. This could be why Mike is seeing non-significant results even for variables expected to be significant (like conscientiousness).
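Tolerance for a predictor is 1 minus the R-squared from regressing it on the other predictors. A minimal base R sketch follows, with a hypothetical helper name tolerance() and simulated data in which conscientiousness and grit are deliberately made to overlap. It is a single-level calculation, so it ignores the nesting within companies.

```r
# Hypothetical helper: tolerance = 1 - R^2 of each predictor on all the others
tolerance <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 - summary(fit)$r.squared
  })
}

# Simulated stand-in data: grit is built to overlap heavily with conscientiousness
set.seed(123)
n <- 300
conscientiousness <- rnorm(n)
sim <- data.frame(
  conscientiousness = conscientiousness,
  grit = 0.9 * conscientiousness + rnorm(n, sd = 0.3),
  extraversion = rnorm(n)
)

tol <- tolerance(sim)
vif <- 1 / tol  # variance inflation factor is the reciprocal of tolerance
print(round(tol, 2))
```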
Options to Proceed:
-Centering Variables: As discussed earlier, centering the variables (either grand mean or group mean centering) can help reduce multicollinearity that arises due to scaling and the inclusion of interaction terms.
-Remove Highly Correlated Predictors: If certain predictors are very highly correlated, consider removing one to reduce multicollinearity. This decision should be based on theoretical justifications and the research questions at hand. Alternatively, he can combine the two correlated predictors into a single composite.
-Regularization Techniques: If retaining all predictors is important, Mike might consider using regularization techniques like Ridge Regression or Lasso, which are designed to handle multicollinearity by penalizing the regression coefficients.
-Partial Least Squares Regression (PLS): If the primary goal is prediction and not inference, PLS can be an option. It’s a technique that reduces the predictors to a smaller set of uncorrelated components.
When it comes to participants as a random factor, the 5:1 cases-to-variables ratio is appropriate, but when it comes to experimenter, 3 does not meet the minimum standard.
C.R. can test the experimenters’ gender identity as a potential moderator of the relationship between condition (the predictor) and the participants’ self-reported attraction to the human target (the outcome variable). This allows dummy coding of the three categories of gender identity and testing of any potential interactions.
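The moderator setup can be sketched in base R. Everything here is a hypothetical stand-in: the two-level condition factor, the placeholder labels gender_a, gender_b and gender_c, and the simulated attraction scores are assumptions, since the actual design details are not given.

```r
# Simulated stand-in data: 2 conditions crossed with 3 experimenter gender categories
set.seed(123)
n <- 120
sim <- data.frame(
  condition = factor(rep(c("control", "treatment"), times = n / 2)),
  experimenter_gender = factor(rep(c("gender_a", "gender_b", "gender_c"), each = n / 3)),
  attraction = rnorm(n)
)

# R dummy codes factors automatically: 3 categories become 2 dummy variables,
# and the * operator adds the condition-by-gender interaction terms
X <- model.matrix(~ condition * experimenter_gender, data = sim)
print(colnames(X))

# Moderation model: the interaction coefficients carry the moderator effect
fit <- lm(attraction ~ condition * experimenter_gender, data = sim)
print(coef(fit))
```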