1. Describe in some detail a research scenario in which you may use Structural Equation Modeling to analyze data.
Structural Equation Modeling (SEM) is useful for analyzing complex relationships among variables. For example, in a study on student academic performance, SEM can model both observed data (GPA, test scores) and latent constructs (motivation, study habits, institutional support). It assesses direct and indirect effects, validating measurement models and identifying key predictors of success.
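As a concrete illustration, a model of this kind could be specified in R with the lavaan package. The sketch below is only illustrative: the indicator names (mot1–mot3, habit1–habit3, supp1–supp3), the outcome variables (GPA, TestScore), and the data frame students are all hypothetical.
library(lavaan)
# Hypothetical SEM: three latent predictors of two observed academic outcomes
model <- '
  # Measurement model: latent constructs and their observed indicators
  Motivation  =~ mot1 + mot2 + mot3
  StudyHabits =~ habit1 + habit2 + habit3
  Support     =~ supp1 + supp2 + supp3
  # Structural model: direct effects of the latent constructs on the outcomes
  GPA       ~ Motivation + StudyHabits + Support
  TestScore ~ Motivation + StudyHabits + Support
'
fit <- sem(model, data = students)
summary(fit, standardized = TRUE, fit.measures = TRUE)
Indirect effects could be added by labelling paths and defining products of those labels with the := operator.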
2. Shawna works as an Administrator in an international school in Southeast Asia. She was interested in determining the job satisfaction of teachers working in international schools. She developed a job satisfaction questionnaire and was able to collect data from 163 teachers. She is seeking assistance from you to analyze her data. The data is found in ‘JobSatisfaction.xlsx’.
Shawna is interested in (a) determining the underlying dimensions of job satisfaction that her questionnaire might have measured; (b) the reliability of those dimensions; (c) how satisfied teachers are at these international schools; and (d) the extent to which job satisfaction may be related to gender or marital status.
As part of your analysis, be sure to screen your data for assumptions. Your report of the analysis should include, but not be limited to, factorability, the number of factors, rotation, factor loadings, and factor interpretation.
Importing the dataset:
library(MVN)      # multivariate normality checks
library(readxl)   # for reading Excel files
# Import the survey responses
data <- read_excel("JobSatisfaction.xlsx")
(a) Exploratory Factor Analysis (EFA):
library(dplyr)
# Keep only the job satisfaction items (columns starting with "JS") and coerce them to numeric
js_items <- data %>%
  select(starts_with("JS")) %>%
  mutate(across(everything(), ~ as.numeric(as.character(.))))
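The brief asks for assumption screening before the factor analysis. A minimal sketch of checks one might run on js_items is shown below: missing values, item-level descriptives, and Mardia's test of multivariate normality via the MVN package loaded above (the mvn() argument names assumed here follow MVN 5.x and may differ in other versions).
# Missing responses per item
colSums(is.na(js_items))
# Item-level descriptives (means, SDs, skew, kurtosis)
psych::describe(js_items)
# Mardia's test of multivariate normality (argument names per MVN 5.x)
mvn(js_items, mvnTest = "mardia")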
library(psych)
KMO(js_items)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = js_items)
## Overall MSA = 0.67
## MSA for each item =
## JS1 JS2 JS3 JS4 JS5 JS6 JS7 JS8 JS9 JS10 JS11 JS12 JS13 JS14 JS15 JS16
## 0.49 0.81 0.62 0.74 0.63 0.74 0.77 0.69 0.55 0.72 0.54 0.55 0.47 0.70 0.64 0.67
## JS17 JS18 JS19 JS20
## 0.56 0.73 0.67 0.87
cortest.bartlett(js_items)
## $chisq
## [1] 3865.718
##
## $p.value
## [1] 0
##
## $df
## [1] 190
The overall KMO score is 0.67, indicating that the data is suitable—though not optimal—for factor analysis. Certain items, such as JS1 (0.49) and JS13 (0.47), have low individual sampling adequacy values and might require removal or further evaluation. Bartlett’s test (p < 0.05) confirms that the correlation matrix significantly differs from an identity matrix, suggesting that the variables are sufficiently correlated to justify proceeding with factor analysis.
scree(js_items)
The scree plot shows a noticeable “elbow” after the third or fourth factor, indicating that retaining 3 to 4 factors is appropriate. In both the principal components (PC) and factor analysis (FA) solutions, the eigenvalue curves drop steeply for the first few components before flattening out. By Kaiser’s criterion, components with eigenvalues greater than 1 are retained.
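As a cross-check on the scree plot and Kaiser's criterion, parallel analysis compares the observed eigenvalues with those obtained from random data of the same size; a minimal sketch using psych::fa.parallel:
# Parallel analysis: retain factors whose eigenvalues exceed those from random data
fa.parallel(js_items, fm = "pa", fa = "both")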
For the Exploratory Factor Analysis (EFA), varimax rotation is used instead of promax because we assume the factors are uncorrelated, which makes them easier to interpret. As an orthogonal rotation, varimax simplifies the loading pattern, making it more apparent which items are associated with each factor.
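One way to check that the orthogonality assumption is defensible is to fit an oblique (promax) solution and inspect the factor intercorrelations; if they are modest (say, below about .3), varimax is a reasonable simplification. A quick diagnostic sketch:
# Oblique solution used only to inspect the inter-factor correlation matrix
efa_promax <- fa(js_items, nfactors = 4, rotate = "promax", fm = "pa")
round(efa_promax$Phi, 2)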
efa_result <- fa(js_items, nfactors = 4, rotate = "varimax", fm = "pa")
print(efa_result$loadings, cutoff = 0.3)
##
## Loadings:
## PA1 PA2 PA3 PA4
## JS1 0.539
## JS2 0.680 0.361
## JS3 0.701 -0.465 0.434
## JS4 0.617 0.542
## JS5 0.786
## JS6 0.730 0.512
## JS7 0.603 0.584
## JS8 0.900
## JS9 0.313 0.664
## JS10 0.680 0.571
## JS11 0.846
## JS12 0.594 0.393
## JS13 0.664
## JS14 0.790 0.506
## JS15 0.789 0.333
## JS16 0.862
## JS17 0.656
## JS18 0.551 0.309
## JS19 0.544 0.358
## JS20 0.475 0.478
##
## PA1 PA2 PA3 PA4
## SS loadings 4.272 4.211 2.621 2.539
## Proportion Var 0.214 0.211 0.131 0.127
## Cumulative Var 0.214 0.424 0.555 0.682
The print cutoff is raised to 0.5 to display only the stronger loadings, which makes the factor structure clearer and any remaining cross-loadings easier to spot.
# Same 4-factor solution as above; only the print cutoff changes
print(efa_result$loadings, cutoff = 0.5)
##
## Loadings:
## PA1 PA2 PA3 PA4
## JS1 0.539
## JS2 0.680
## JS3 0.701
## JS4 0.617 0.542
## JS5 0.786
## JS6 0.730 0.512
## JS7 0.603 0.584
## JS8 0.900
## JS9 0.664
## JS10 0.680 0.571
## JS11 0.846
## JS12 0.594
## JS13 0.664
## JS14 0.790 0.506
## JS15 0.789
## JS16 0.862
## JS17 0.656
## JS18 0.551
## JS19 0.544
## JS20
##
## PA1 PA2 PA3 PA4
## SS loadings 4.272 4.211 2.621 2.539
## Proportion Var 0.214 0.211 0.131 0.127
## Cumulative Var 0.214 0.424 0.555 0.682
To enhance the quality of the factors, items with cross-loadings—those loading 0.5 or higher on two or more factors—should be removed. Based on the revised loadings using a 0.5 cutoff, the items identified for removal are JS4, JS6, JS7, JS10, and JS14.
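Rather than scanning the printed table by eye, the cross-loading items can also be flagged programmatically from the loading matrix; under the loadings shown above, the flagged set should match the items just listed. A sketch:
# Flag items loading 0.5 or higher (in absolute value) on two or more factors
load_mat <- unclass(efa_result$loadings)
cross_loading_items <- rownames(load_mat)[rowSums(abs(load_mat) >= 0.5) >= 2]
cross_loading_items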
# Remove cross-loading items
cleaned_items <- js_items[, !names(js_items) %in% c("JS4", "JS6", "JS7", "JS10", "JS14")]
Now we rerun the factor analysis on the cleaned data:
efa_result_clean <- fa(cleaned_items, nfactors = 4, rotate = "varimax", fm = "pa")
print(efa_result_clean$loadings, cutoff = 0.5)
##
## Loadings:
## PA1 PA4 PA3 PA2
## JS1
## JS2 0.869
## JS3 0.715 -0.544
## JS5 0.712
## JS8 0.952
## JS9 0.592
## JS11 0.920
## JS12 0.538
## JS13 0.696
## JS15 0.726
## JS16 0.853
## JS17 0.663
## JS18 0.503
## JS19 0.728
## JS20 0.517 0.539
##
## PA1 PA4 PA3 PA2
## SS loadings 3.312 2.279 2.275 1.962
## Proportion Var 0.221 0.152 0.152 0.131
## Cumulative Var 0.221 0.373 0.524 0.655
Repeat the step:
# Remove cross-loading items again
cleaned_items1 <- cleaned_items[, !names(cleaned_items) %in% c("JS3", "JS20")]
efa_result_clean1 <- fa(cleaned_items1, nfactors = 4, rotate = "varimax", fm = "pa")
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
print(efa_result_clean1$loadings, cutoff = 0.5)
##
## Loadings:
## PA1 PA4 PA2 PA3
## JS1
## JS2 0.850
## JS5 0.694
## JS8 1.058
## JS9 0.641
## JS11 0.901
## JS12 0.524
## JS13 0.709
## JS15 0.652
## JS16 0.931
## JS17 0.600
## JS18
## JS19 0.778
##
## PA1 PA4 PA2 PA3
## SS loadings 2.492 2.182 2.023 1.706
## Proportion Var 0.192 0.168 0.156 0.131
## Cumulative Var 0.192 0.359 0.515 0.646
efa_result_clean1$communality
## JS1 JS2 JS5 JS8 JS9 JS11 JS12 JS13
## 0.2648907 0.7594103 0.5858315 1.1313641 0.6833168 0.9054743 0.4270738 0.6278099
## JS15 JS16 JS17 JS18 JS19
## 0.5891303 0.9245979 0.4047269 0.4160834 0.6820157
While there are no remaining cross-loadings, JS8 should be removed due to its communality value exceeding 1, which is not theoretically possible. Communality reflects the proportion of an item’s variance explained by the factors, and it cannot surpass the item’s total variance.
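Because the rotation is orthogonal, an item's communality is simply the sum of its squared loadings across the retained factors, so the offending value for JS8 can be reproduced directly from the loading matrix; a short check:
# Communality = sum of squared loadings under an orthogonal rotation
load_mat <- unclass(efa_result_clean1$loadings)
round(rowSums(load_mat^2), 3)   # JS8 exceeds 1, confirming the ultra-Heywood case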
Refit after removing the problematic item:
# Remove JS8 (ultra-Heywood case: communality > 1)
cleaned_items2 <- cleaned_items1[, !names(cleaned_items1) %in% c("JS8")]
efa_result_clean2 <- fa(cleaned_items2, nfactors = 4, rotate = "varimax", fm = "pa")
print(efa_result_clean2$loadings, cutoff = 0.5)
##
## Loadings:
## PA1 PA3 PA2 PA4
## JS1
## JS2 0.866
## JS5 0.702
## JS9 0.691
## JS11 0.778
## JS12 0.929
## JS13 0.790
## JS15
## JS16 0.892
## JS17 0.586
## JS18
## JS19 0.764
##
## PA1 PA3 PA2 PA4
## SS loadings 2.480 2.206 1.623 1.182
## Proportion Var 0.207 0.184 0.135 0.099
## Cumulative Var 0.207 0.391 0.526 0.624
Since each factor should ideally have at least three items to be reliable, we adjust the number of factors and the print cutoff.
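A quick count of salient items (loadings of at least 0.5 in absolute value) per factor in the four-factor solution above makes the problem explicit; a sketch:
# Number of items loading >= 0.5 on each factor of the 4-factor solution
load_mat <- unclass(efa_result_clean2$loadings)
colSums(abs(load_mat) >= 0.5)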
efa_result_clean3 <- fa(cleaned_items2, nfactors = 3, rotate = "varimax", fm = "pa")
print(efa_result_clean3$loadings, cutoff = 0.4)
##
## Loadings:
## PA1 PA3 PA2
## JS1 0.404
## JS2 0.869
## JS5 0.691
## JS9 0.698
## JS11 0.730
## JS12 0.469
## JS13 0.816
## JS15 0.413
## JS16 0.956
## JS17 0.549
## JS18 0.401
## JS19 0.755
##
## PA1 PA3 PA2
## SS loadings 2.406 2.195 1.965
## Proportion Var 0.200 0.183 0.164
## Cumulative Var 0.200 0.383 0.547
efa_result_clean3$communality
## JS1 JS2 JS5 JS9 JS11 JS12 JS13 JS15
## 0.2465750 0.7856015 0.5878296 0.6951755 0.6101121 0.2527464 0.7640217 0.2841782
## JS16 JS17 JS18 JS19
## 0.9189478 0.3310899 0.4160292 0.6733273
Finally, there are no remaining cross-loadings, every factor has at least three items, and no communality exceeds 1. Thus, the final solution has three factors.
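For factor interpretation, the final structure can also be visualised with psych::fa.diagram, which places each item beside the factor on which it loads most strongly; a minimal sketch:
# Path-style diagram of the final 3-factor solution
fa.diagram(efa_result_clean3, cut = 0.4, simple = TRUE)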
(b) Reliability (Cronbach’s Alpha):
# Factor 1
factor1 <- js_items[, c("JS1", "JS2", "JS5", "JS18", "JS19")]
# Factor 2
factor2 <- js_items[, c("JS9", "JS11", "JS13", "JS15")]
# Factor 3
factor3 <- js_items[, c( "JS12", "JS16", "JS17")]
# Cronbach's alpha for Factor 1
alpha1 <- psych::alpha(factor1)$total$raw_alpha
print(paste("Cronbach's alpha for Factor 1:", round(alpha1, 3)))
## [1] "Cronbach's alpha for Factor 1: 0.815"
# Cronbach's alpha for Factor 2
alpha2 <- psych::alpha(factor2)$total$raw_alpha
print(paste("Cronbach's alpha for Factor 2:", round(alpha2, 3)))
## [1] "Cronbach's alpha for Factor 2: 0.766"
# Cronbach's alpha for Factor 3
alpha3 <- psych::alpha(factor3)$total$raw_alpha
print(paste("Cronbach's alpha for Factor 3:", round(alpha3, 3)))
## [1] "Cronbach's alpha for Factor 3: 0.693"
Cronbach’s alpha confirmed the reliability of three job satisfaction factors. Factor 1 showed strong reliability (0.815), Factor 2 was acceptable (0.766), and Factor 3 (0.693) fell slightly below the 0.7 threshold, suggesting room for improvement. Overall, they are sufficiently reliable for further analysis.
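Since Factor 3 sits just under the conventional .70 threshold, one might check whether dropping any single item would raise its alpha, using the "alpha if item dropped" output from psych::alpha; a sketch:
# Reliability of Factor 3 if each item were dropped in turn
psych::alpha(factor3)$alpha.drop
With only three items, though, dropping one would leave a two-item scale, so retaining all three is usually the more defensible choice.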
(c) Descriptive Statistics:
# Compute each respondent's mean score on each factor
factor_scores <- data.frame(
  factor1 = rowMeans(factor1, na.rm = TRUE),
  factor2 = rowMeans(factor2, na.rm = TRUE),
  factor3 = rowMeans(factor3, na.rm = TRUE)
)
summary_table <- data.frame(
  Factor = c("Factor1", "Factor2", "Factor3"),
  n      = c(nrow(factor1), nrow(factor2), nrow(factor3)),
  mean   = c(mean(rowMeans(factor1)), mean(rowMeans(factor2)), mean(rowMeans(factor3))),
  median = c(median(rowMeans(factor1)), median(rowMeans(factor2)), median(rowMeans(factor3))),
  sd     = c(sd(rowMeans(factor1)), sd(rowMeans(factor2)), sd(rowMeans(factor3))),
  min    = c(min(rowMeans(factor1)), min(rowMeans(factor2)), min(rowMeans(factor3))),
  max    = c(max(rowMeans(factor1)), max(rowMeans(factor2)), max(rowMeans(factor3)))
)
print(summary_table)
## Factor n mean median sd min max
## 1 Factor1 163 3.801227 3.80 0.6180403 2.6 4.8
## 2 Factor2 163 3.969325 4.25 0.6819843 2.5 5.0
## 3 Factor3 163 3.852761 4.00 0.8765331 2.0 5.0
Teachers in international schools rated their job satisfaction across three factors. Factor 2 had the highest mean score (3.97, SD = 0.68), followed by Factor 3 (3.85, SD = 0.88) and Factor 1 (3.80, SD = 0.62). With all three factor means above 3 on a 5-point scale, the results indicate generally positive ratings across all areas of job satisfaction.
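A simple visual summary of these distributions can be produced with a boxplot of the three factor-score columns (assuming, as above, a 1-5 response scale); a minimal sketch:
# Distribution of factor mean scores across the 163 teachers
boxplot(factor_scores,
        names = c("Factor 1", "Factor 2", "Factor 3"),
        ylab  = "Mean job satisfaction score")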
(d) Relationship of job satisfaction with gender or marital status:
Normality test:
normality_results <- lapply(factor_scores, shapiro.test)
for (factor_name in names(normality_results)) {
  cat("\n---", toupper(factor_name), "---\n")
  print(normality_results[[factor_name]])
}
##
## --- FACTOR1 ---
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.93712, p-value = 1.342e-06
##
##
## --- FACTOR2 ---
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.91439, p-value = 3.399e-08
##
##
## --- FACTOR3 ---
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.8873, p-value = 8.589e-10
The Shapiro-Wilk tests were significant for all three factors (p < .001), indicating violations of normality. Because the job satisfaction factors are continuous dependent variables and the predictors (gender and marital status) are categorical, MANOVA and ANOVA were initially considered; with the normality assumption violated, nonparametric alternatives were used instead. The Mann–Whitney U test examined gender differences and the Kruskal–Wallis H test assessed differences across marital status groups, as neither method requires normality.
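Before the group comparisons, it is worth confirming how the grouping variables are coded and how large the groups are (assuming the columns are named Gender and MaritalStatus, as in the tests below); a quick check:
# Group sizes for the categorical predictors
table(data$Gender)
table(data$MaritalStatus)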
Mann-Whitney U-tests:
wilcox.test(factor_scores$factor1 ~ data$Gender)
##
## Wilcoxon rank sum test with continuity correction
##
## data: factor_scores$factor1 by data$Gender
## W = 3333, p-value = 0.9533
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(factor_scores$factor2 ~ data$Gender)
##
## Wilcoxon rank sum test with continuity correction
##
## data: factor_scores$factor2 by data$Gender
## W = 4631.5, p-value = 9.756e-06
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(factor_scores$factor3 ~ data$Gender)
##
## Wilcoxon rank sum test with continuity correction
##
## data: factor_scores$factor3 by data$Gender
## W = 2499, p-value = 0.005367
## alternative hypothesis: true location shift is not equal to 0
The Mann–Whitney U tests found no significant gender difference in job satisfaction for Factor 1 (W = 3333, p = 0.953), indicating similar satisfaction levels. However, significant differences were observed for Factor 2 (W = 4631.5, p < 0.001) and Factor 3 (W = 2499, p = 0.005), suggesting that satisfaction in these areas is related to gender.
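To see the direction of the significant gender differences, group medians (in keeping with the nonparametric tests) can be tabulated; a sketch:
# Median factor scores by gender
aggregate(factor_scores, by = list(Gender = data$Gender), FUN = median)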
Kruskal-Wallis tests:
kruskal.test(factor_scores$factor1 ~ data$MaritalStatus)
##
## Kruskal-Wallis rank sum test
##
## data: factor_scores$factor1 by data$MaritalStatus
## Kruskal-Wallis chi-squared = 6.0674, df = 2, p-value = 0.04814
kruskal.test(factor_scores$factor2 ~ data$MaritalStatus)
##
## Kruskal-Wallis rank sum test
##
## data: factor_scores$factor2 by data$MaritalStatus
## Kruskal-Wallis chi-squared = 16.81, df = 2, p-value = 0.0002237
kruskal.test(factor_scores$factor3 ~ data$MaritalStatus)
##
## Kruskal-Wallis rank sum test
##
## data: factor_scores$factor3 by data$MaritalStatus
## Kruskal-Wallis chi-squared = 17.448, df = 2, p-value = 0.0001627
The Kruskal–Wallis tests showed significant differences in Factor 1 job satisfaction across marital status groups (p = 0.048), with stronger effects for Factor 2 (p < 0.001) and Factor 3 (p < 0.001). This suggests that marital status is related to job satisfaction levels. To identify which specific groups differ, post hoc pairwise comparisons using Dunn’s test were applied.
library(dunn.test)
# Factor 1
dunn.test(factor_scores$factor1, data$MaritalStatus, kw = TRUE)
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 6.0674, df = 2, p-value = 0.05
##
##
## Comparison of x by group
## (No adjustment)
## Col Mean-|
## Row Mean | 0 1
## ---------+----------------------
## 1 | 0.096994
## | 0.4614
## |
## 2 | 2.348498 2.401360
## | 0.0094* 0.0082*
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
# Factor 2
dunn.test(factor_scores$factor2, data$MaritalStatus, kw = TRUE)
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 16.8101, df = 2, p-value = 0
##
##
## Comparison of x by group
## (No adjustment)
## Col Mean-|
## Row Mean | 0 1
## ---------+----------------------
## 1 | 1.068687
## | 0.1426
## |
## 2 | 4.086209 3.693833
## | 0.0000* 0.0001*
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
# Factor 3
dunn.test(factor_scores$factor3, data$MaritalStatus, kw = TRUE)
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 17.4477, df = 2, p-value = 0
##
##
## Comparison of x by group
## (No adjustment)
## Col Mean-|
## Row Mean | 0 1
## ---------+----------------------
## 1 | 0.227523
## | 0.4100
## |
## 2 | 4.001101 4.057644
## | 0.0000* 0.0000*
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
The Kruskal-Wallis tests confirmed significant differences in all three factors across marital status groups. Dunn’s post hoc comparisons showed that, for Factor 1, group 2 differed significantly from groups 0 and 1; likewise, for Factors 2 and 3, group 2 differed significantly from both groups 0 and 1. This indicates that marital status is strongly associated with these factor scores.
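Note that the Dunn’s tests above were run without a p-value adjustment (the output reports “(No adjustment)”). If a Bonferroni correction is preferred, dunn.test accepts a method argument; a sketch for Factor 1:
# Dunn's test with Bonferroni-adjusted p-values (Factor 1)
dunn.test(factor_scores$factor1, data$MaritalStatus, method = "bonferroni", kw = TRUE)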