The dataset used for this homework is Protestant Work Ethic Scale Responses, created by Lucas Greenwell. Retrieved from Kaggle: https://www.kaggle.com/datasets/lucasgreenwell/protestant-work-ethic-scale-responses?resource=download

I will be using the factor analysis method.

mydata <- read.table("~/data1.csv",
                     header = TRUE,
                     sep = ";")

head(mydata)
##   Q1A Q2A Q3A Q4A Q5A Q6A Q7A Q8A Q9A Q10A Q11A Q12A Q13A Q14A Q15A Q16A Q17A
## 1   4   1   5   5   5   2   4   3   5    5    3    4    2    3    5    2    5
## 2   4   4   4   5   4   2   5   2   2    2    4    4    4    2    2    4    4
## 3   3   3   4   2   2   3   3   2   3    3    4    3    3    4    4    3    3
## 4   4   2   4   4   1   4   5   5   5    5    4    4    2    5    5    5    5
## 5   4   1   5   1   1   1   5   5   5    2    1    2    5    5    5    4    2
## 6   1   1   2   1   5   1   5   1   5    3    2    4    4    4    5    4    3
##   Q18A Q19A gender age religion
## 1    1    2      1  24        6
## 2    4    5      2  66        6
## 3    2    2      2  17        2
## 4    4    2      2  23        2
## 5    4    2      2  19        1
## 6    1    1      2  40        7

All variables are measured on Likert scale (1: Disagree, 2: Slightly disagree, 3: Neutral, 4: Slightly agree, 5: Agree) for the following questions:

mydata <- mydata[, -c(20,21,22)]

Research Question:

““What are the underlying factors that shape individuals’ attitudes toward work according to the Protestant Work Ethic Scale?”

library(pastecs)
round(stat.desc(mydata, basic = FALSE), 2)
##               Q1A  Q2A  Q3A  Q4A  Q5A  Q6A  Q7A  Q8A  Q9A Q10A Q11A Q12A Q13A
## median       3.00 2.00 4.00 4.00 4.00 2.00 4.00 4.00 4.00 4.00 3.00 4.00 2.00
## mean         3.15 2.10 3.71 3.76 3.48 2.31 3.65 3.57 3.85 3.75 2.75 3.44 2.81
## SE.mean      0.04 0.04 0.03 0.04 0.04 0.04 0.04 0.04 0.03 0.04 0.04 0.04 0.04
## CI.mean.0.95 0.08 0.07 0.07 0.07 0.07 0.08 0.07 0.08 0.06 0.07 0.07 0.08 0.08
## var          2.07 1.70 1.65 1.73 1.91 2.00 1.88 1.98 1.42 1.91 1.97 2.14 2.02
## std.dev      1.44 1.31 1.28 1.31 1.38 1.41 1.37 1.41 1.19 1.38 1.40 1.46 1.42
## coef.var     0.46 0.62 0.35 0.35 0.40 0.61 0.38 0.39 0.31 0.37 0.51 0.43 0.50
##              Q14A Q15A Q16A Q17A Q18A Q19A
## median       4.00 4.00 4.00 4.00 4.00 4.00
## mean         3.30 3.63 3.97 3.72 3.28 3.24
## SE.mean      0.04 0.04 0.03 0.04 0.04 0.04
## CI.mean.0.95 0.08 0.07 0.06 0.07 0.08 0.08
## var          2.14 1.75 1.33 1.77 2.23 2.20
## std.dev      1.46 1.32 1.15 1.33 1.49 1.48
## coef.var     0.44 0.36 0.29 0.36 0.46 0.46

Q18A has the strongest variability.

sapply(mydata[-c(20, 21, 22)], FUN = min)
##  Q1A  Q2A  Q3A  Q4A  Q5A  Q6A  Q7A  Q8A  Q9A Q10A Q11A Q12A Q13A Q14A Q15A Q16A 
##    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
## Q17A Q18A Q19A 
##    0    0    0
sapply(mydata[-c(20, 21, 22)], FUN = max)
##  Q1A  Q2A  Q3A  Q4A  Q5A  Q6A  Q7A  Q8A  Q9A Q10A Q11A Q12A Q13A Q14A Q15A Q16A 
##    5    5    5    5    5    5    5    5    5    5    5    5    5    5    5    5 
## Q17A Q18A Q19A 
##    5    5    5
mydata[mydata == 0] <- NA #Since I got minimum values 0s, which shouldn't be here since the Likert scale is 1-5, I replaced the 0s with NAs and omitted them.
mydata <- na.omit(mydata)
sapply(mydata[1:19], min)
##  Q1A  Q2A  Q3A  Q4A  Q5A  Q6A  Q7A  Q8A  Q9A Q10A Q11A Q12A Q13A Q14A Q15A Q16A 
##    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1 
## Q17A Q18A Q19A 
##    1    1    1
R <- cor(mydata[-c(20, 21, 22)])

library(psych)
corPlot(R)

det(R)
## [1] 0.00266095

Checking the determinant of correlation matrix. If it weren’t bigger than 0.0001, I’d have to remove variables with the highest correlation.

library(psych)
cortest.bartlett(R, n = nrow(mydata)) #Bartlett's test of sphericity
## $chisq
## [1] 7949.897
## 
## $p.value
## [1] 0
## 
## $df
## [1] 171

H₀: P = I

H₁: P ≠ I

df: (px(p-1))/2 = (19x18)/2 = 171

We reject H0 at p < 0.001.

library(psych)
KMO(R)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = R)
## Overall MSA =  0.92
## MSA for each item = 
##  Q1A  Q2A  Q3A  Q4A  Q5A  Q6A  Q7A  Q8A  Q9A Q10A Q11A Q12A Q13A Q14A Q15A Q16A 
## 0.95 0.93 0.95 0.92 0.94 0.92 0.89 0.91 0.88 0.89 0.93 0.96 0.94 0.82 0.88 0.94 
## Q17A Q18A Q19A 
## 0.90 0.90 0.94
library(psych)
fa.parallel(mydata,
            sim = FALSE,
            fa = "fa")

## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA

Based on the parallel analysis, it makes sense to extract 5 factors from the indicators.

library(psych)
library(GPArotation)
## 
## Attaching package: 'GPArotation'
## The following objects are masked from 'package:psych':
## 
##     equamax, varimin
factors <- fa(mydata,
              covar = FALSE,
              nfactors = 5,
              fm = "minres",
              rotate = "oblimin",
              impute = "mean")
print.psych(factors,
            cut = 0.3,
            sort = TRUE)
## Factor Analysis using method =  minres
## Call: fa(r = mydata, nfactors = 5, rotate = "oblimin", covar = FALSE, 
##     impute = "mean", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
##      item   MR1   MR2   MR5   MR4   MR3   h2   u2 com
## Q10A   10  0.86                         0.72 0.28 1.0
## Q17A   17  0.76                         0.72 0.28 1.0
## Q13A   13 -0.54                         0.36 0.64 1.1
## Q16A   16  0.38                         0.33 0.67 2.4
## Q9A     9        0.79                   0.67 0.33 1.0
## Q15A   15        0.76                   0.60 0.40 1.0
## Q2A     2       -0.43  0.32             0.46 0.54 2.1
## Q6A     6              0.76             0.67 0.33 1.0
## Q11A   11              0.59             0.52 0.48 1.2
## Q18A   18                    0.62       0.37 0.63 1.1
## Q4A     4                    0.57       0.43 0.57 1.1
## Q19A   19                    0.40       0.50 0.50 2.0
## Q5A     5                               0.17 0.83 2.0
## Q8A     8                          0.51 0.29 0.71 1.2
## Q3A     3                          0.37 0.29 0.71 1.5
## Q1A     1                          0.37 0.45 0.55 2.3
## Q7A     7                          0.33 0.15 0.85 1.4
## Q14A   14                               0.11 0.89 3.0
## Q12A   12                               0.26 0.74 2.5
## 
##                        MR1  MR2  MR5  MR4  MR3
## SS loadings           2.10 1.76 1.51 1.44 1.27
## Proportion Var        0.11 0.09 0.08 0.08 0.07
## Cumulative Var        0.11 0.20 0.28 0.36 0.42
## Proportion Explained  0.26 0.22 0.19 0.18 0.16
## Cumulative Proportion 0.26 0.48 0.66 0.84 1.00
## 
##  With factor correlations of 
##       MR1   MR2   MR5   MR4   MR3
## MR1  1.00 -0.50  0.63  0.51  0.42
## MR2 -0.50  1.00 -0.54 -0.40 -0.36
## MR5  0.63 -0.54  1.00  0.42  0.49
## MR4  0.51 -0.40  0.42  1.00  0.47
## MR3  0.42 -0.36  0.49  0.47  1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 5 factors are sufficient.
## 
## df null model =  171  with the objective function =  5.93 with Chi Square =  7949.9
## df of  the model are 86  and the objective function was  0.14 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic n.obs is  1349 with the empirical chi square  131.91  with prob <  0.0011 
## The total n.obs was  1349  with Likelihood Chi Square =  183.26  with prob <  5.3e-09 
## 
## Tucker Lewis Index of factoring reliability =  0.975
## RMSEA index =  0.029  and the 90 % confidence intervals are  0.023 0.035
## BIC =  -436.55
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1  MR2  MR5  MR4  MR3
## Correlation of (regression) scores with factors   0.93 0.90 0.89 0.84 0.80
## Multiple R square of scores with factors          0.87 0.82 0.79 0.70 0.64
## Minimum correlation of possible factor scores     0.73 0.63 0.59 0.40 0.29
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:pastecs':
## 
##     first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
mydata1 <- mydata %>%
  select(-c("Q1A", "Q19A", "Q5A", "Q16A", "Q12A"))
library(psych)
fa.parallel(mydata1,
            sim = FALSE,
            fa = "fa")

## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA
library(psych)
library(GPArotation)
factors1 <- fa(mydata1,
              covar = FALSE,
              nfactors = 5,
              fm = "minres",
              rotate = "oblimin",
              impute = "mean")

print.psych(factors,
            cut = 0.3,
            sort = TRUE)
## Factor Analysis using method =  minres
## Call: fa(r = mydata, nfactors = 5, rotate = "oblimin", covar = FALSE, 
##     impute = "mean", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
##      item   MR1   MR2   MR5   MR4   MR3   h2   u2 com
## Q10A   10  0.86                         0.72 0.28 1.0
## Q17A   17  0.76                         0.72 0.28 1.0
## Q13A   13 -0.54                         0.36 0.64 1.1
## Q16A   16  0.38                         0.33 0.67 2.4
## Q9A     9        0.79                   0.67 0.33 1.0
## Q15A   15        0.76                   0.60 0.40 1.0
## Q2A     2       -0.43  0.32             0.46 0.54 2.1
## Q6A     6              0.76             0.67 0.33 1.0
## Q11A   11              0.59             0.52 0.48 1.2
## Q18A   18                    0.62       0.37 0.63 1.1
## Q4A     4                    0.57       0.43 0.57 1.1
## Q19A   19                    0.40       0.50 0.50 2.0
## Q5A     5                               0.17 0.83 2.0
## Q8A     8                          0.51 0.29 0.71 1.2
## Q3A     3                          0.37 0.29 0.71 1.5
## Q1A     1                          0.37 0.45 0.55 2.3
## Q7A     7                          0.33 0.15 0.85 1.4
## Q14A   14                               0.11 0.89 3.0
## Q12A   12                               0.26 0.74 2.5
## 
##                        MR1  MR2  MR5  MR4  MR3
## SS loadings           2.10 1.76 1.51 1.44 1.27
## Proportion Var        0.11 0.09 0.08 0.08 0.07
## Cumulative Var        0.11 0.20 0.28 0.36 0.42
## Proportion Explained  0.26 0.22 0.19 0.18 0.16
## Cumulative Proportion 0.26 0.48 0.66 0.84 1.00
## 
##  With factor correlations of 
##       MR1   MR2   MR5   MR4   MR3
## MR1  1.00 -0.50  0.63  0.51  0.42
## MR2 -0.50  1.00 -0.54 -0.40 -0.36
## MR5  0.63 -0.54  1.00  0.42  0.49
## MR4  0.51 -0.40  0.42  1.00  0.47
## MR3  0.42 -0.36  0.49  0.47  1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 5 factors are sufficient.
## 
## df null model =  171  with the objective function =  5.93 with Chi Square =  7949.9
## df of  the model are 86  and the objective function was  0.14 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic n.obs is  1349 with the empirical chi square  131.91  with prob <  0.0011 
## The total n.obs was  1349  with Likelihood Chi Square =  183.26  with prob <  5.3e-09 
## 
## Tucker Lewis Index of factoring reliability =  0.975
## RMSEA index =  0.029  and the 90 % confidence intervals are  0.023 0.035
## BIC =  -436.55
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1  MR2  MR5  MR4  MR3
## Correlation of (regression) scores with factors   0.93 0.90 0.89 0.84 0.80
## Multiple R square of scores with factors          0.87 0.82 0.79 0.70 0.64
## Minimum correlation of possible factor scores     0.73 0.63 0.59 0.40 0.29
library(dplyr)
mydata2 <- mydata %>%
  select(-c("Q4A", "Q18A"))
library(psych)
fa.parallel(mydata2,
            sim = FALSE,
            fa = "fa")

## Parallel analysis suggests that the number of factors =  4  and the number of components =  NA
library(psych)
library(GPArotation)
factors2 <- fa(mydata2,
              covar = FALSE,
              nfactors = 4,
              fm = "minres",
              rotate = "oblimin",
              impute = "mean")

print.psych(factors,
            cut = 0.3,
            sort = TRUE)
## Factor Analysis using method =  minres
## Call: fa(r = mydata, nfactors = 5, rotate = "oblimin", covar = FALSE, 
##     impute = "mean", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
##      item   MR1   MR2   MR5   MR4   MR3   h2   u2 com
## Q10A   10  0.86                         0.72 0.28 1.0
## Q17A   17  0.76                         0.72 0.28 1.0
## Q13A   13 -0.54                         0.36 0.64 1.1
## Q16A   16  0.38                         0.33 0.67 2.4
## Q9A     9        0.79                   0.67 0.33 1.0
## Q15A   15        0.76                   0.60 0.40 1.0
## Q2A     2       -0.43  0.32             0.46 0.54 2.1
## Q6A     6              0.76             0.67 0.33 1.0
## Q11A   11              0.59             0.52 0.48 1.2
## Q18A   18                    0.62       0.37 0.63 1.1
## Q4A     4                    0.57       0.43 0.57 1.1
## Q19A   19                    0.40       0.50 0.50 2.0
## Q5A     5                               0.17 0.83 2.0
## Q8A     8                          0.51 0.29 0.71 1.2
## Q3A     3                          0.37 0.29 0.71 1.5
## Q1A     1                          0.37 0.45 0.55 2.3
## Q7A     7                          0.33 0.15 0.85 1.4
## Q14A   14                               0.11 0.89 3.0
## Q12A   12                               0.26 0.74 2.5
## 
##                        MR1  MR2  MR5  MR4  MR3
## SS loadings           2.10 1.76 1.51 1.44 1.27
## Proportion Var        0.11 0.09 0.08 0.08 0.07
## Cumulative Var        0.11 0.20 0.28 0.36 0.42
## Proportion Explained  0.26 0.22 0.19 0.18 0.16
## Cumulative Proportion 0.26 0.48 0.66 0.84 1.00
## 
##  With factor correlations of 
##       MR1   MR2   MR5   MR4   MR3
## MR1  1.00 -0.50  0.63  0.51  0.42
## MR2 -0.50  1.00 -0.54 -0.40 -0.36
## MR5  0.63 -0.54  1.00  0.42  0.49
## MR4  0.51 -0.40  0.42  1.00  0.47
## MR3  0.42 -0.36  0.49  0.47  1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 5 factors are sufficient.
## 
## df null model =  171  with the objective function =  5.93 with Chi Square =  7949.9
## df of  the model are 86  and the objective function was  0.14 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic n.obs is  1349 with the empirical chi square  131.91  with prob <  0.0011 
## The total n.obs was  1349  with Likelihood Chi Square =  183.26  with prob <  5.3e-09 
## 
## Tucker Lewis Index of factoring reliability =  0.975
## RMSEA index =  0.029  and the 90 % confidence intervals are  0.023 0.035
## BIC =  -436.55
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1  MR2  MR5  MR4  MR3
## Correlation of (regression) scores with factors   0.93 0.90 0.89 0.84 0.80
## Multiple R square of scores with factors          0.87 0.82 0.79 0.70 0.64
## Minimum correlation of possible factor scores     0.73 0.63 0.59 0.40 0.29
  1. F1 -> Meritocracy (Q10A - Q13A)

  2. F2 -> Work-Leisure (Q9A - Q2A)

  3. F3 -> Financial Discipline (Q3A - Q1A)

  4. F4 -> Self-Reliance (Q6A - Q11A)

library(psych)
fa.diagram(factors2)

residuals <- factors2$residual

residuals <- as.matrix(residuals[upper.tri(residuals)])

highresiduals <- abs(residuals) > 0.05

head(highresiduals)
##       [,1]
## [1,] FALSE
## [2,] FALSE
## [3,] FALSE
## [4,] FALSE
## [5,] FALSE
## [6,] FALSE
sum(highresiduals)/nrow(residuals)
## [1] 0.01470588
head(factors2$scores)
##             MR1        MR2         MR3         MR4
## [1,]  0.6600824  0.8878537  0.04478136  0.03914653
## [2,] -0.4275908 -1.4535593  0.58208279  0.50479973
## [3,] -0.5698184 -0.2746627 -0.64392475  0.50933862
## [4,]  0.9194183  0.7515384  0.58109507  0.95065175
## [5,] -1.4894063  1.1462971  0.14256054 -1.15459613
## [6,] -0.7630665  1.2964196 -0.83266448 -1.10128591
print(factors2$weights)
##                MR1           MR2         MR3          MR4
## Q1A   0.0037222748 -0.0856051701  0.15115211  0.085331172
## Q2A  -0.0223718461 -0.1435261210  0.04779496  0.129077737
## Q3A   0.0037802471  0.0052559880  0.21241574  0.016846232
## Q5A   0.0035949828  0.0005757773  0.11399277  0.017232075
## Q6A   0.0478562495 -0.0385010541 -0.01852301  0.406440681
## Q7A  -0.0031480263  0.0210188093  0.14491478  0.001148358
## Q8A   0.0003228938 -0.0211155590  0.13881583  0.015948707
## Q9A  -0.0099524071  0.4328790239  0.02042869 -0.042827037
## Q10A  0.4059750981 -0.0122786950 -0.02225333  0.036307740
## Q11A  0.0392680786  0.0139193763  0.07599667  0.289982785
## Q12A  0.0078931905 -0.0387901283  0.14860369  0.013242708
## Q13A -0.1122640880  0.0295161368  0.01754068 -0.017043320
## Q14A -0.0194638286  0.0152363661  0.11160444  0.032429960
## Q15A -0.0328859611  0.3520586454  0.02784484  0.024572379
## Q16A  0.0781886654  0.0117393066  0.16348446 -0.029164505
## Q17A  0.3867711804 -0.0263986840  0.05106735  0.085638368
## Q19A  0.0513935796 -0.0160249147  0.19838069  0.066390322
mydata$F1_Meri <- factors2$scores[ ,1]
mydata$F2_Work <- factors2$scores[ ,2]
mydata$F3_FinD <- factors2$scores[ ,3]
mydata$F4_Self <- factors2$scores[ ,4]

print(mydata[100,])
##     Q1A Q2A Q3A Q4A Q5A Q6A Q7A Q8A Q9A Q10A Q11A Q12A Q13A Q14A Q15A Q16A Q17A
## 100   4   2   5   5   5   4   5   5   2    4    5    4    1    5    2    5    5
##     Q18A Q19A   F1_Meri   F2_Work  F3_FinD  F4_Self
## 100    5    5 0.9517155 -1.369416 1.579608 1.448549
F1_Meri <- mydata[,c("Q10A", "Q17A", "Q13A")]
head(F1_Meri)
##   Q10A Q17A Q13A
## 1    5    5    2
## 2    2    4    4
## 3    3    3    3
## 4    5    5    2
## 5    2    2    5
## 6    3    3    4
library(psych)
alpha(F1_Meri,
      check.keys = TRUE)
## Warning in alpha(F1_Meri, check.keys = TRUE): Some items were negatively correlated with the first principal component and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: alpha(x = F1_Meri, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean  sd median_r
##        0.8       0.8    0.75      0.58 4.1 0.0095  3.6 1.2     0.52
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.78   0.8  0.82
## Duhachek  0.78   0.8  0.82
## 
##  Reliability if an item is dropped:
##       raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Q10A       0.68      0.68    0.52      0.52 2.1   0.0174    NA  0.52
## Q17A       0.66      0.66    0.50      0.50 2.0   0.0183    NA  0.50
## Q13A-      0.84      0.84    0.72      0.72 5.2   0.0088    NA  0.72
## 
##  Item statistics 
##          n raw.r std.r r.cor r.drop mean  sd
## Q10A  1349  0.87  0.87  0.80   0.70  3.8 1.4
## Q17A  1349  0.87  0.88  0.81   0.71  3.7 1.3
## Q13A- 1349  0.80  0.79  0.59   0.55  3.2 1.4
## 
## Non missing response frequency for each item
##         1    2    3    4    5 miss
## Q10A 0.11 0.11 0.10 0.27 0.41    0
## Q17A 0.10 0.11 0.12 0.30 0.37    0
## Q13A 0.22 0.28 0.12 0.21 0.17    0

Factor analysis was conducted using the minimum residual method on 19 standardized indicators (n = 1350). The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy confirmed the appropriateness of the analysis, with an overall KMO value of 0.92, and individual MSA values all above 0.82, indicating strong suitability for factor analysis.

Based on the parallel analysis and eigenvalues, we determined a four-factor solution using oblique rotation (Oblimin method). The following four factors were identified based on their pattern loadings:

These factors explain 57.8% of the variation in the data. The reliability of F1 was assessed using Cronbach’s alpha, indicating acceptable reliability (α > 0.8).