1. Describe in some detail a research scenario in which you may use Structural Equation Modeling to analyze data.
Structural Equation Modeling (SEM) is useful for analyzing complex relationships among variables. For example, in a study on student academic performance, SEM can model both observed data (GPA, test scores) and latent constructs (motivation, study habits, institutional support). It assesses direct and indirect effects, validating measurement models and identifying key predictors of success.
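As a concrete illustration, a model of this kind could be specified in R with the lavaan package. The sketch below is only illustrative: the indicator names (mot1–mot3, habit1–habit3, supp1–supp3), the outcome variables (GPA, TestScore), and the data frame students are all hypothetical.
library(lavaan)
# Hypothetical SEM: three latent predictors of two observed academic outcomes
model <- '
  # Measurement model: latent constructs and their observed indicators
  Motivation  =~ mot1 + mot2 + mot3
  StudyHabits =~ habit1 + habit2 + habit3
  Support     =~ supp1 + supp2 + supp3
  # Structural model: direct effects of the latent constructs on the outcomes
  GPA       ~ Motivation + StudyHabits + Support
  TestScore ~ Motivation + StudyHabits + Support
'
fit <- sem(model, data = students)
summary(fit, standardized = TRUE, fit.measures = TRUE)
Indirect effects could be added by labelling paths and defining products of those labels with the := operator.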
2. Shawna works as an Administrator in an international school in Southeast Asia. She was interested in determining the job satisfaction of teachers working in international schools. She developed a job satisfaction questionnaire and was able to collect data from 163 teachers. She is seeking assistance from you to analyze her data. The data is found in ‘JobSatisfaction.xlsx’.
Shawna is interested in (a) determining the underlying dimensions of job satisfaction that her questionnaire might have measured; (b) the reliability of those dimensions; (c) how satisfied teachers are at these international schools; and (d) the extent to which job satisfaction may be related to gender or marital status.
As part of your analysis, be sure to screen your data for assumptions. Your report of the analysis should include, but not be limited to, factorability, the number of factors, rotation, factor loadings, and factor interpretation.
Importing the dataset:
library(MVN)      # multivariate normality checks
library(readxl)   # for reading Excel files
# Import the survey responses
data <- read_excel("JobSatisfaction.xlsx")
(a) Exploratory Factor Analysis (EFA):
library(dplyr)
# Keep only the job satisfaction items (columns starting with "JS") and coerce them to numeric
js_items <- data %>%
  select(starts_with("JS")) %>%
  mutate(across(everything(), ~ as.numeric(as.character(.))))
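The brief asks for assumption screening before the factor analysis. A minimal sketch of checks one might run on js_items is shown below: missing values, item-level descriptives, and Mardia's test of multivariate normality via the MVN package loaded above (the mvn() argument names assumed here follow MVN 5.x and may differ in other versions).
# Missing responses per item
colSums(is.na(js_items))
# Item-level descriptives (means, SDs, skew, kurtosis)
psych::describe(js_items)
# Mardia's test of multivariate normality (argument names per MVN 5.x)
mvn(js_items, mvnTest = "mardia")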
library(psych)
KMO(js_items)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = js_items)
## Overall MSA = 0.67
## MSA for each item =
## JS1 JS2 JS3 JS4 JS5 JS6 JS7 JS8 JS9 JS10 JS11 JS12 JS13 JS14 JS15 JS16
## 0.49 0.81 0.62 0.74 0.63 0.74 0.77 0.69 0.55 0.72 0.54 0.55 0.47 0.70 0.64 0.67
## JS17 JS18 JS19 JS20
## 0.56 0.73 0.67 0.87
cortest.bartlett(js_items)
## $chisq
## [1] 3865.718
##
## $p.value
## [1] 0
##
## $df
## [1] 190
The overall KMO score is 0.67, indicating that the data is suitable—though not optimal—for factor analysis. Certain items, such as JS1 (0.49) and JS13 (0.47), have low individual sampling adequacy values and might require removal or further evaluation. Bartlett’s test (p < 0.05) confirms that the correlation matrix significantly differs from an identity matrix, suggesting that the variables are sufficiently correlated to justify proceeding with factor analysis.
scree(js_items)
The scree plot shows a noticeable “elbow” after the third or fourth factor, indicating that retaining 3 to 4 factors is appropriate. In both the principal components (PC) and factor analysis (FA) solutions, the eigenvalue curves drop steeply for the first few components before flattening out. By Kaiser’s criterion, components with eigenvalues greater than 1 are retained.
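As a cross-check on the scree plot and Kaiser's criterion, parallel analysis compares the observed eigenvalues with those obtained from random data of the same size; a minimal sketch using psych::fa.parallel:
# Parallel analysis: retain factors whose eigenvalues exceed those from random data
fa.parallel(js_items, fm = "pa", fa = "both")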
For the Exploratory Factor Analysis (EFA), varimax rotation is used instead of promax because we assume the factors are uncorrelated, which makes them easier to interpret. As an orthogonal rotation, varimax simplifies the loading pattern, making it more apparent which items are associated with each factor.
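One way to check that the orthogonality assumption is defensible is to fit an oblique (promax) solution and inspect the factor intercorrelations; if they are modest (say, below about .3), varimax is a reasonable simplification. A quick diagnostic sketch:
# Oblique solution used only to inspect the inter-factor correlation matrix
efa_promax <- fa(js_items, nfactors = 4, rotate = "promax", fm = "pa")
round(efa_promax$Phi, 2)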
efa_result <- fa(js_items, nfactors = 4, rotate = "varimax", fm = "pa")
print(efa_result$loadings, cutoff = 0.3)
##
## Loadings:
## PA1 PA2 PA3 PA4
## JS1 0.539
## JS2 0.680 0.361
## JS3 0.701 -0.465 0.434
## JS4 0.617 0.542
## JS5 0.786
## JS6 0.730 0.512
## JS7 0.603 0.584
## JS8 0.900
## JS9 0.313 0.664
## JS10 0.680 0.571
## JS11 0.846
## JS12 0.594 0.393
## JS13 0.664
## JS14 0.790 0.506
## JS15 0.789 0.333
## JS16 0.862
## JS17 0.656
## JS18 0.551 0.309
## JS19 0.544 0.358
## JS20 0.475 0.478
##
## PA1 PA2 PA3 PA4
## SS loadings 4.272 4.211 2.621 2.539
## Proportion Var 0.214 0.211 0.131 0.127
## Cumulative Var 0.214 0.424 0.555 0.682
The print cutoff is raised to 0.5 to display only the stronger loadings, which makes the factor structure clearer and any remaining cross-loadings easier to spot.
# Same 4-factor solution as above; only the print cutoff changes
print(efa_result$loadings, cutoff = 0.5)
##
## Loadings:
## PA1 PA2 PA3 PA4
## JS1 0.539
## JS2 0.680
## JS3 0.701
## JS4 0.617 0.542
## JS5 0.786
## JS6 0.730 0.512
## JS7 0.603 0.584
## JS8 0.900
## JS9 0.664
## JS10 0.680 0.571
## JS11 0.846
## JS12 0.594
## JS13 0.664
## JS14 0.790 0.506
## JS15 0.789
## JS16 0.862
## JS17 0.656
## JS18 0.551
## JS19 0.544
## JS20
##
## PA1 PA2 PA3 PA4
## SS loadings 4.272 4.211 2.621 2.539
## Proportion Var 0.214 0.211 0.131 0.127
## Cumulative Var 0.214 0.424 0.555 0.682
To enhance the quality of the factors, items with cross-loadings—those loading 0.5 or higher on two or more factors—should be removed. Based on the revised loadings using a 0.5 cutoff, the items identified for removal are JS4, JS6, JS7, JS10, and JS14.
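Rather than scanning the printed table by eye, the cross-loading items can also be flagged programmatically from the loading matrix; under the loadings shown above, the flagged set should match the items just listed. A sketch:
# Flag items loading 0.5 or higher (in absolute value) on two or more factors
load_mat <- unclass(efa_result$loadings)
cross_loading_items <- rownames(load_mat)[rowSums(abs(load_mat) >= 0.5) >= 2]
cross_loading_items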
# Remove cross-loading items
cleaned_items <- js_items[, !names(js_items) %in% c("JS4", "JS6", "JS7", "JS10", "JS14")]
Now we rerun the factor analysis on the cleaned data:
efa_result_clean <- fa(cleaned_items, nfactors = 4, rotate = "varimax", fm = "pa")
print(efa_result_clean$loadings, cutoff = 0.5)
##
## Loadings:
## PA1 PA4 PA3 PA2
## JS1
## JS2 0.869
## JS3 0.715 -0.544
## JS5 0.712
## JS8 0.952
## JS9 0.592
## JS11 0.920
## JS12 0.538
## JS13 0.696
## JS15 0.726
## JS16 0.853
## JS17 0.663
## JS18 0.503
## JS19 0.728
## JS20 0.517 0.539
##
## PA1 PA4 PA3 PA2
## SS loadings 3.312 2.279 2.275 1.962
## Proportion Var 0.221 0.152 0.152 0.131
## Cumulative Var 0.221 0.373 0.524 0.655
Repeat the step:
# Remove cross-loading items again
cleaned_items1 <- cleaned_items[, !names(cleaned_items) %in% c("JS3", "JS20")]
efa_result_clean1 <- fa(cleaned_items1, nfactors = 4, rotate = "varimax", fm = "pa")
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
print(efa_result_clean1$loadings, cutoff = 0.5)
##
## Loadings:
## PA1 PA4 PA2 PA3
## JS1
## JS2 0.850
## JS5 0.694
## JS8 1.058
## JS9 0.641
## JS11 0.901
## JS12 0.524
## JS13 0.709
## JS15 0.652
## JS16 0.931
## JS17 0.600
## JS18
## JS19 0.778
##
## PA1 PA4 PA2 PA3
## SS loadings 2.492 2.182 2.023 1.706
## Proportion Var 0.192 0.168 0.156 0.131
## Cumulative Var 0.192 0.359 0.515 0.646
efa_result_clean1$communality
## JS1 JS2 JS5 JS8 JS9 JS11 JS12 JS13
## 0.2648907 0.7594103 0.5858315 1.1313641 0.6833168 0.9054743 0.4270738 0.6278099
## JS15 JS16 JS17 JS18 JS19
## 0.5891303 0.9245979 0.4047269 0.4160834 0.6820157
While there are no remaining cross-loadings, JS8 should be removed due to its communality value exceeding 1, which is not theoretically possible. Communality reflects the proportion of an item’s variance explained by the factors, and it cannot surpass the item’s total variance.
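Because the rotation is orthogonal, an item's communality is simply the sum of its squared loadings across the retained factors, so the offending value for JS8 can be reproduced directly from the loading matrix; a short check:
# Communality = sum of squared loadings under an orthogonal rotation
load_mat <- unclass(efa_result_clean1$loadings)
round(rowSums(load_mat^2), 3)   # JS8 exceeds 1, confirming the ultra-Heywood case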
Refit after removing the problematic item:
# Remove JS8 (ultra-Heywood case: communality > 1)
cleaned_items2 <- cleaned_items1[, !names(cleaned_items1) %in% c("JS8")]
efa_result_clean2 <- fa(cleaned_items2, nfactors = 4, rotate = "varimax", fm = "pa")
print(efa_result_clean2$loadings, cutoff = 0.5)
##
## Loadings:
## PA1 PA3 PA2 PA4
## JS1
## JS2 0.866
## JS5 0.702
## JS9 0.691
## JS11 0.778
## JS12 0.929
## JS13 0.790
## JS15
## JS16 0.892
## JS17 0.586
## JS18
## JS19 0.764
##
## PA1 PA3 PA2 PA4
## SS loadings 2.480 2.206 1.623 1.182
## Proportion Var 0.207 0.184 0.135 0.099
## Cumulative Var 0.207 0.391 0.526 0.624
Since each factor should ideally have at least three items to be reliable, we adjust the number of factors and the print cutoff.
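A quick count of salient items (loadings of at least 0.5 in absolute value) per factor in the four-factor solution above makes the problem explicit; a sketch:
# Number of items loading >= 0.5 on each factor of the 4-factor solution
load_mat <- unclass(efa_result_clean2$loadings)
colSums(abs(load_mat) >= 0.5)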
efa_result_clean3 <- fa(cleaned_items2, nfactors = 3, rotate = "varimax", fm = "pa")
print(efa_result_clean3$loadings, cutoff = 0.4)
##
## Loadings:
## PA1 PA3 PA2
## JS1 0.404
## JS2 0.869
## JS5 0.691
## JS9 0.698
## JS11 0.730
## JS12 0.469
## JS13 0.816
## JS15 0.413
## JS16 0.956
## JS17 0.549
## JS18 0.401
## JS19 0.755
##
## PA1 PA3 PA2
## SS loadings 2.406 2.195 1.965
## Proportion Var 0.200 0.183 0.164
## Cumulative Var 0.200 0.383 0.547
efa_result_clean3$communality
## JS1 JS2 JS5 JS9 JS11 JS12 JS13 JS15
## 0.2465750 0.7856015 0.5878296 0.6951755 0.6101121 0.2527464 0.7640217 0.2841782
## JS16 JS17 JS18 JS19
## 0.9189478 0.3310899 0.4160292 0.6733273
Finally, there are no remaining cross-loadings, every factor has at least three items, and no communality exceeds 1. Thus, the final solution has three factors.
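For factor interpretation, the final structure can also be visualised with psych::fa.diagram, which places each item beside the factor on which it loads most strongly; a minimal sketch:
# Path-style diagram of the final 3-factor solution
fa.diagram(efa_result_clean3, cut = 0.4, simple = TRUE)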
(b) Reliability (Cronbach’s Alpha):
# Factor 1
factor1 <- js_items[, c("JS1", "JS2", "JS5", "JS18", "JS19")]
# Factor 2
factor2 <- js_items[, c("JS9", "JS11", "JS13", "JS15")]
# Factor 3
factor3 <- js_items[, c( "JS12", "JS16", "JS17")]
# Cronbach's alpha for Factor 1
alpha1 <- psych::alpha(factor1)$total$raw_alpha
print(paste("Cronbach's alpha for Factor 1:", round(alpha1, 3)))
## [1] "Cronbach's alpha for Factor 1: 0.815"
# Cronbach's alpha for Factor 2
alpha2 <- psych::alpha(factor2)$total$raw_alpha
print(paste("Cronbach's alpha for Factor 2:", round(alpha2, 3)))
## [1] "Cronbach's alpha for Factor 2: 0.766"
# Cronbach's alpha for Factor 3
alpha3 <- psych::alpha(factor3)$total$raw_alpha
print(paste("Cronbach's alpha for Factor 3:", round(alpha3, 3)))
## [1] "Cronbach's alpha for Factor 3: 0.693"
Cronbach’s alpha confirmed the reliability of three job satisfaction factors. Factor 1 showed strong reliability (0.815), Factor 2 was acceptable (0.766), and Factor 3 (0.693) fell slightly below the 0.7 threshold, suggesting room for improvement. Overall, they are sufficiently reliable for further analysis.
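Since Factor 3 sits just under the conventional .70 threshold, one might check whether dropping any single item would raise its alpha, using the "alpha if item dropped" output from psych::alpha; a sketch:
# Reliability of Factor 3 if each item were dropped in turn
psych::alpha(factor3)$alpha.drop
With only three items, though, dropping one would leave a two-item scale, so retaining all three is usually the more defensible choice.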
(c) Descriptive Statistics:
# Compute each respondent's mean score on each factor
factor_scores <- data.frame(
  factor1 = rowMeans(factor1, na.rm = TRUE),
  factor2 = rowMeans(factor2, na.rm = TRUE),
  factor3 = rowMeans(factor3, na.rm = TRUE)
)
summary_table <- data.frame(
  Factor = c("Factor1", "Factor2", "Factor3"),
  n      = c(nrow(factor1), nrow(factor2), nrow(factor3)),
  mean   = c(mean(rowMeans(factor1)), mean(rowMeans(factor2)), mean(rowMeans(factor3))),
  median = c(median(rowMeans(factor1)), median(rowMeans(factor2)), median(rowMeans(factor3))),
  sd     = c(sd(rowMeans(factor1)), sd(rowMeans(factor2)), sd(rowMeans(factor3))),
  min    = c(min(rowMeans(factor1)), min(rowMeans(factor2)), min(rowMeans(factor3))),
  max    = c(max(rowMeans(factor1)), max(rowMeans(factor2)), max(rowMeans(factor3)))
)
print(summary_table)
## Factor n mean median sd min max
## 1 Factor1 163 3.801227 3.80 0.6180403 2.6 4.8
## 2 Factor2 163 3.969325 4.25 0.6819843 2.5 5.0
## 3 Factor3 163 3.852761 4.00 0.8765331 2.0 5.0
Teachers in international schools rated their job satisfaction across three factors. Factor 2 had the highest mean score (3.97, SD = 0.68), followed by Factor 3 (3.85, SD = 0.88) and Factor 1 (3.80, SD = 0.62). With all three factor means above 3 on a 5-point scale, the results indicate generally positive ratings across all areas of job satisfaction.
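A simple visual summary of these distributions can be produced with a boxplot of the three factor-score columns (assuming, as above, a 1-5 response scale); a minimal sketch:
# Distribution of factor mean scores across the 163 teachers
boxplot(factor_scores,
        names = c("Factor 1", "Factor 2", "Factor 3"),
        ylab  = "Mean job satisfaction score")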
(d) Relationship of job satisfaction with gender or marital status:
Normality test:
normality_results <- lapply(factor_scores, shapiro.test)
for (factor_name in names(normality_results)) {
  cat("\n---", toupper(factor_name), "---\n")
  print(normality_results[[factor_name]])
}
##
## --- FACTOR1 ---
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.93712, p-value = 1.342e-06
##
##
## --- FACTOR2 ---
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.91439, p-value = 3.399e-08
##
##
## --- FACTOR3 ---
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.8873, p-value = 8.589e-10
The Shapiro-Wilk tests were significant for all three factors (p < .001), indicating violations of normality. Because the job satisfaction factors are continuous dependent variables and the predictors (gender and marital status) are categorical, MANOVA and ANOVA were initially considered; with the normality assumption violated, nonparametric alternatives were used instead. The Mann–Whitney U test examined gender differences and the Kruskal–Wallis H test assessed differences across marital status groups, as neither method requires normality.
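Before the group comparisons, it is worth confirming how the grouping variables are coded and how large the groups are (assuming the columns are named Gender and MaritalStatus, as in the tests below); a quick check:
# Group sizes for the categorical predictors
table(data$Gender)
table(data$MaritalStatus)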
Mann-Whitney U-tests:
wilcox.test(factor_scores$factor1 ~ data$Gender)
##
## Wilcoxon rank sum test with continuity correction
##
## data: factor_scores$factor1 by data$Gender
## W = 3333, p-value = 0.9533
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(factor_scores$factor2 ~ data$Gender)
##
## Wilcoxon rank sum test with continuity correction
##
## data: factor_scores$factor2 by data$Gender
## W = 4631.5, p-value = 9.756e-06
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(factor_scores$factor3 ~ data$Gender)
##
## Wilcoxon rank sum test with continuity correction
##
## data: factor_scores$factor3 by data$Gender
## W = 2499, p-value = 0.005367
## alternative hypothesis: true location shift is not equal to 0
The Mann–Whitney U tests found no significant gender difference in job satisfaction for Factor 1 (W = 3333, p = 0.953), indicating similar satisfaction levels. However, significant differences were observed for Factor 2 (W = 4631.5, p < 0.001) and Factor 3 (W = 2499, p = 0.005), suggesting that satisfaction in these areas is related to gender.
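To see the direction of the significant gender differences, group medians (in keeping with the nonparametric tests) can be tabulated; a sketch:
# Median factor scores by gender
aggregate(factor_scores, by = list(Gender = data$Gender), FUN = median)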
Kruskal-Wallis tests:
kruskal.test(factor_scores$factor1 ~ data$MaritalStatus)
##
## Kruskal-Wallis rank sum test
##
## data: factor_scores$factor1 by data$MaritalStatus
## Kruskal-Wallis chi-squared = 6.0674, df = 2, p-value = 0.04814
kruskal.test(factor_scores$factor2 ~ data$MaritalStatus)
##
## Kruskal-Wallis rank sum test
##
## data: factor_scores$factor2 by data$MaritalStatus
## Kruskal-Wallis chi-squared = 16.81, df = 2, p-value = 0.0002237
kruskal.test(factor_scores$factor3 ~ data$MaritalStatus)
##
## Kruskal-Wallis rank sum test
##
## data: factor_scores$factor3 by data$MaritalStatus
## Kruskal-Wallis chi-squared = 17.448, df = 2, p-value = 0.0001627
The Kruskal–Wallis tests showed significant differences in Factor 1 job satisfaction across marital status groups (p = 0.048), with stronger effects for Factor 2 (p < 0.001) and Factor 3 (p < 0.001). This suggests that marital status is related to job satisfaction levels. To identify which specific groups differ, post hoc pairwise comparisons using Dunn’s test were applied.
library(dunn.test)
# Factor 1
dunn.test(factor_scores$factor1, data$MaritalStatus, kw = TRUE)
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 6.0674, df = 2, p-value = 0.05
##
##
## Comparison of x by group
## (No adjustment)
## Col Mean-|
## Row Mean | 0 1
## ---------+----------------------
## 1 | 0.096994
## | 0.4614
## |
## 2 | 2.348498 2.401360
## | 0.0094* 0.0082*
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
# Factor 2
dunn.test(factor_scores$factor2, data$MaritalStatus, kw = TRUE)
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 16.8101, df = 2, p-value = 0
##
##
## Comparison of x by group
## (No adjustment)
## Col Mean-|
## Row Mean | 0 1
## ---------+----------------------
## 1 | 1.068687
## | 0.1426
## |
## 2 | 4.086209 3.693833
## | 0.0000* 0.0001*
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
# Factor 3
dunn.test(factor_scores$factor3, data$MaritalStatus, kw = TRUE)
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 17.4477, df = 2, p-value = 0
##
##
## Comparison of x by group
## (No adjustment)
## Col Mean-|
## Row Mean | 0 1
## ---------+----------------------
## 1 | 0.227523
## | 0.4100
## |
## 2 | 4.001101 4.057644
## | 0.0000* 0.0000*
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
The Kruskal-Wallis tests confirmed significant differences in all three factors across marital status groups. Dunn’s post hoc comparisons showed that, for Factor 1, group 2 differed significantly from groups 0 and 1; likewise, for Factors 2 and 3, group 2 differed significantly from both groups 0 and 1. This indicates that marital status is strongly associated with these factor scores.
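Note that the Dunn’s tests above were run without a p-value adjustment (the output reports “(No adjustment)”). If a Bonferroni correction is preferred, dunn.test accepts a method argument; a sketch for Factor 1:
# Dunn's test with Bonferroni-adjusted p-values (Factor 1)
dunn.test(factor_scores$factor1, data$MaritalStatus, method = "bonferroni", kw = TRUE)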