The dataset used in this analysis is the Student Performance Dataset obtained from Kaggle. The file analyzed is student-por.csv (Student Performance in Portuguese).
The following packages are used in this analysis: psych for multivariate statistical methods, dplyr for data manipulation, and corrplot for correlation visualization.
The dataset used is student-por.csv. Column names are
standardized to X1, X2, ..., and only numeric variables are
retained for analysis.
data <- read.csv("student-por.csv")
colnames(data) <- paste0("X", 1:ncol(data))
data_numeric <- data[sapply(data, is.numeric)]
Prior to conducting PCA or Factor Analysis, two key assumptions must be verified.
A correlation matrix is computed and visualized to examine the linear relationships among variables. Sufficient inter-variable correlation is a prerequisite for dimensionality reduction.
r <- cor(data_numeric)
corrplot(r,
tl.col = "black",
tl.srt = 45,
tl.cex = 0.5,
title = "Correlation Matrix",
mar = c(0, 0, 1, 0))
The KMO measure assesses sampling adequacy for each variable and for the overall model. Values closer to 1 indicate that the data are well suited for factor analysis (minimum acceptable value: 0.5).
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r)
## Overall MSA = 0.73
## MSA for each item =
## X3 X7 X8 X13 X14 X15 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33
## 0.62 0.64 0.63 0.81 0.86 0.86 0.53 0.60 0.61 0.66 0.61 0.52 0.75 0.87 0.73 0.78
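The Call line above shows that the report used KMO(r) from psych. As a rough illustration of what that measure computes, the sketch below rebuilds it in base R from the correlation matrix and its scaled inverse. The built-in mtcars data is only a stand-in for data_numeric (an assumption so the snippet runs without the CSV), and kmo_manual is a hypothetical helper name, not a psych function.

```r
# Stand-in data (assumption): mtcars replaces data_numeric so the sketch runs without the CSV.
r <- cor(mtcars)

# Hypothetical helper: KMO from squared correlations and squared partial correlations.
kmo_manual <- function(r) {
  q <- cov2cor(solve(r))  # scaled inverse: off-diagonals are partial correlations up to sign
  diag(q) <- 0            # keep off-diagonal terms only
  diag(r) <- 0
  list(
    overall  = sum(r^2) / (sum(r^2) + sum(q^2)),
    per_item = colSums(r^2) / (colSums(r^2) + colSums(q^2))
  )
}

kmo <- kmo_manual(r)
print(round(kmo$overall, 2))   # overall MSA
print(round(kmo$per_item, 2))  # MSA for each variable
```

The measure approaches 1 when partial correlations are small relative to the raw correlations, which is why higher values suggest the variables share common structure.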
Bartlett’s test examines whether the correlation matrix is significantly different from an identity matrix. A p-value below 0.05 confirms that the correlations are significant and factor analysis is appropriate.
## $chisq
## [1] 3535.842
##
## $p.value
## [1] 0
##
## $df
## [1] 120
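The list structure above ($chisq, $p.value, $df) is consistent with psych's cortest.bartlett(), though the exact call is not shown in the report. The following base R sketch computes the same statistic by hand, again with mtcars as a stand-in for data_numeric (an assumption). For p variables the degrees of freedom are p(p - 1)/2, which gives 120 for the 16 variables in this analysis.

```r
# Stand-in data (assumption): mtcars replaces data_numeric.
r <- cor(mtcars)
n <- nrow(mtcars)  # number of observations
p <- ncol(r)       # number of variables

# Bartlett's test of sphericity: is the correlation matrix an identity matrix?
chisq   <- -(n - 1 - (2 * p + 5) / 6) * log(det(r))
df      <- p * (p - 1) / 2
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

print(list(chisq = chisq, p.value = p_value, df = df))
```

A printed p-value of 0, as in the output above, typically means the value is too small to be represented rather than exactly zero.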
Data is standardized using scale() before analysis.
Eigenvalues represent the amount of variance explained by each
component, while eigenvectors define the direction of each principal
component.
## Eigenvalues:
## [1] 3.66522444 1.94781579 1.43169697 1.32197163 1.13753974 1.01059364
## [7] 0.92486924 0.86029726 0.80786480 0.78054845 0.62896637 0.55434404
## [13] 0.36239871 0.31735807 0.17404805 0.07446281
##
## Eigenvectors:
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.14985339 -0.03072974 -0.16769887 0.25904502 0.65152307 0.20999128
## [2,] -0.22778366 -0.34350437 0.46521149 0.05378557 0.09141139 -0.13246905
## [3,] -0.20001281 -0.37039517 0.46487844 0.03856801 0.04284566 -0.11131647
## [4,] 0.14250860 0.12769961 -0.35760847 -0.01091853 -0.22002103 -0.13026619
## [5,] -0.19858804 0.10719291 -0.05903041 0.02697283 0.22305221 -0.14057688
## [6,] 0.28726402 0.06175917 0.04979487 0.09275109 0.43967470 0.07358447
## [7,] -0.05159628 -0.02533133 -0.04504691 -0.54908284 0.22422207 0.18629627
## [8,] 0.10442511 -0.23602892 -0.10530609 -0.50187008 0.21684623 -0.28119650
## [9,] 0.12115860 -0.40757729 -0.27824466 -0.22741374 0.15363838 -0.30955128
## [10,] 0.20654530 -0.42984408 -0.18983621 0.18829647 -0.19944469 0.12951925
## [11,] 0.19741902 -0.49027603 -0.20920886 0.12194532 -0.24291999 0.13310220
## [12,] 0.05303337 -0.10581936 0.14869974 -0.30715649 -0.06955755 0.77759489
## [13,] 0.11956444 -0.17069172 0.02006425 0.39664975 0.19295824 0.05887652
## [14,] -0.44885921 -0.09250038 -0.24653957 0.03434532 0.02076939 0.09753811
## [15,] -0.45850114 -0.09809644 -0.27185712 0.05818802 0.07982333 0.11375158
## [16,] -0.45243720 -0.07833531 -0.27919240 0.08611296 0.07270050 0.09567593
## [,7] [,8] [,9] [,10] [,11]
## [1,] 0.114015038 -0.16222683 0.13606380 0.193628830 -0.336115799
## [2,] 0.051197390 -0.24946875 0.09292882 -0.006075780 0.001706861
## [3,] -0.001394447 -0.29187735 0.10087873 -0.121636503 -0.048631536
## [4,] -0.030180354 -0.72581397 0.12425189 -0.459310367 -0.053248357
## [5,] 0.694200166 -0.08857760 -0.56600371 -0.142406257 0.138249125
## [6,] 0.057176654 -0.05755585 0.39371306 -0.123686937 0.455292013
## [7,] -0.346276697 -0.39512505 -0.34436116 0.387211263 0.226973064
## [8,] 0.024955959 0.31101478 0.14104994 -0.366786318 0.285482662
## [9,] 0.007049263 0.06374569 -0.06784500 0.006148621 -0.532870267
## [10,] 0.188057001 -0.09883508 -0.09018961 0.196375633 0.410075906
## [11,] 0.117368759 0.02284601 0.01145988 0.162399841 0.046834611
## [12,] 0.201494950 0.04944765 -0.01466885 -0.390958344 -0.216634708
## [13,] -0.521810725 0.11540732 -0.52412354 -0.433876239 0.068976773
## [14,] -0.020275687 0.06586853 0.12929409 -0.076215242 0.084678795
## [15,] -0.080247956 0.02449356 0.13433970 -0.027355273 0.079179844
## [16,] -0.093942377 0.01950597 0.10408048 -0.063543278 0.054472867
## [,12] [,13] [,14] [,15] [,16]
## [1,] 0.43729750 -0.064976291 0.054234948 -0.076697473 0.0149173422
## [2,] 0.02837008 0.553545501 0.450636657 0.030514154 0.0121337007
## [3,] 0.02941887 -0.551261195 -0.415771751 -0.028660109 -0.0015788903
## [4,] 0.07390635 0.028154359 0.078661109 -0.018743882 -0.0158783820
## [5,] -0.11253698 -0.093475518 0.059456701 0.025464342 -0.0157481237
## [6,] -0.56295211 0.010095784 -0.040018619 0.030545617 0.0110342543
## [7,] -0.06709189 -0.059781296 0.050993641 -0.030302950 0.0231102783
## [8,] 0.44028392 -0.072061191 0.109237384 0.007686881 0.0037100047
## [9,] -0.43023156 0.201269288 -0.227963872 0.018656344 -0.0029512550
## [10,] 0.22754179 0.342966810 -0.463482654 -0.003060498 0.0174839171
## [11,] -0.15134756 -0.443411341 0.567853105 0.017805781 -0.0002096884
## [12,] -0.05142337 0.102208766 -0.049819084 0.037794225 -0.0003304998
## [13,] -0.01790039 0.019556142 0.045925731 -0.039413469 -0.0212870145
## [14,] -0.09885720 0.033394199 0.002605489 -0.799282659 0.1840359090
## [15,] -0.02910992 0.008527162 -0.025007825 0.234189012 -0.7713986639
## [16,] -0.01849440 -0.017650374 -0.028241235 0.540463936 0.6072604228
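The code that produced the eigenvalues and eigenvectors is not shown. A minimal sketch of one way to obtain them is given below, reusing the object names scale_data and pc that appear elsewhere in this report. It assumes the decomposition is taken on the correlation matrix (for standardized data the covariance and correlation matrices coincide) and uses mtcars as a stand-in for data_numeric.

```r
# Stand-in data (assumption): mtcars replaces data_numeric.
scale_data <- scale(mtcars)      # center each column and scale to unit variance

# Eigen-decomposition of the correlation matrix of the standardized data.
pc <- eigen(cor(scale_data))

cat("Eigenvalues:\n")
print(pc$values)                 # variance explained by each component
cat("\nEigenvectors:\n")
print(pc$vectors)                # one column per principal component
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which is why each eigenvalue can be read as a share of total variance.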
This table summarizes the variance explained by each principal component. Components are typically retained until the cumulative variance reaches 70–80%.
sumvar <- sum(pc$values)
propvar <- (pc$values / sumvar) * 100
cumvar <- data.frame(
eigen_value = pc$values,
prop_var = propvar
) %>% mutate(cum_var = cumsum(prop_var))
row.names(cumvar) <- paste0("PC", 1:length(pc$values))
print(cumvar)
## eigen_value prop_var cum_var
## PC1 3.66522444 22.9076527 22.90765
## PC2 1.94781579 12.1738487 35.08150
## PC3 1.43169697 8.9481060 44.02961
## PC4 1.32197163 8.2623227 52.29193
## PC5 1.13753974 7.1096234 59.40155
## PC6 1.01059364 6.3162102 65.71776
## PC7 0.92486924 5.7804328 71.49820
## PC8 0.86029726 5.3768579 76.87505
## PC9 0.80786480 5.0491550 81.92421
## PC10 0.78054845 4.8784278 86.80264
## PC11 0.62896637 3.9310398 90.73368
## PC12 0.55434404 3.4646503 94.19833
## PC13 0.36239871 2.2649919 96.46332
## PC14 0.31735807 1.9834879 98.44681
## PC15 0.17404805 1.0878003 99.53461
## PC16 0.07446281 0.4653925 100.00000
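The retention criteria can be checked directly against the eigenvalues in this table. The sketch below copies the eigenvalues from the output above and counts how many components each rule would keep: the cumulative-variance thresholds of 70% and 80%, and the eigenvalue-greater-than-one criterion (Kaiser's rule). The variable names are illustrative only.

```r
# Eigenvalues copied from the table above.
eig <- c(3.66522444, 1.94781579, 1.43169697, 1.32197163, 1.13753974, 1.01059364,
         0.92486924, 0.86029726, 0.80786480, 0.78054845, 0.62896637, 0.55434404,
         0.36239871, 0.31735807, 0.17404805, 0.07446281)

cum_pct  <- cumsum(eig) / sum(eig) * 100  # cumulative variance in percent

n_kaiser <- sum(eig > 1)                  # components with eigenvalue above 1
n_70     <- which(cum_pct >= 70)[1]       # first component reaching 70%
n_80     <- which(cum_pct >= 80)[1]       # first component reaching 80%

print(c(kaiser = n_kaiser, cum70 = n_70, cum80 = n_80))
```

With these eigenvalues, six components exceed 1, while reaching 70% of cumulative variance takes seven components and 80% takes nine, so the two rules do not agree exactly.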
The scree plot visualizes eigenvalues across components. The red dashed line at eigenvalue = 1 represents Kaiser’s Rule — components above this threshold are generally retained.
plot(pc$values,
type = "b",
pch = 19,
xlab = "Principal Component",
ylab = "Eigenvalue",
main = "Scree Plot")
abline(h = 1, col = "red", lty = 2)
Principal component scores are obtained by projecting the standardized data onto the eigenvectors.
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.3292614 -0.0618981 2.1573345 -0.01230047 0.9384844 -1.33662794
## [2,] 0.4836354 1.8017441 -0.3527618 -0.89723782 0.4278207 0.07146723
## [3,] -0.1229392 0.6573366 -0.8498326 0.28489355 -1.3347787 0.40738294
## [4,] -2.5683083 0.9456003 1.1440652 0.31392107 -0.8903418 0.68025211
## [5,] -1.1999696 0.2833878 1.2290959 -0.55949421 -0.4868174 0.89345203
## [6,] -1.2069326 -0.4911520 1.5212548 -1.05872144 0.2643133 0.77670310
## [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] -0.02875205 -1.64788797 -0.38973340 0.05352091 -1.5768793 0.8647519
## [2,] -0.42351752 0.60905607 -0.84112406 1.19294478 -0.1631189 0.2311293
## [3,] -0.42436581 1.35763655 -0.91220979 0.43416981 1.3439901 -0.1895358
## [4,] 1.35191505 0.50231375 -0.08922503 -0.46951645 0.1793541 -0.7239119
## [5,] 0.39382003 0.24427036 0.27779824 0.05086577 0.1000340 0.1270744
## [6,] -0.55412130 0.07164701 -0.54306251 -0.47762282 0.7034832 0.4515991
## [,13] [,14] [,15] [,16]
## [1,] 0.04515699 -0.39254752 2.99452824 -0.7810214
## [2,] -0.03195025 -0.36373457 0.42907611 -0.1606329
## [3,] -0.29978654 0.08736565 0.01588474 -0.3469934
## [4,] 1.13316391 0.27980951 0.03707345 -0.1092970
## [5,] -0.30686446 0.09134754 0.50570986 -0.1869752
## [6,] 0.08444541 0.71473403 0.08572768 0.1557004
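The projection described above amounts to a single matrix product. The sketch below is an assumption about how the scores could be computed, since the original chunk is not shown; scores is a hypothetical object name and mtcars stands in for data_numeric.

```r
# Stand-in data (assumption): mtcars replaces data_numeric.
scale_data <- scale(mtcars)
pc <- eigen(cor(scale_data))

# Project the standardized observations onto the eigenvectors.
scores <- scale_data %*% pc$vectors
print(head(scores))

# Each score column has variance equal to its eigenvalue.
print(round(apply(scores, 2, var), 4))
```

The variance check is a quick way to confirm that the projection and the eigenvalues belong to the same decomposition.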
The principal() function from the psych package provides a streamlined approach to PCA. Here, 6 components are extracted without rotation.
## Principal Components Analysis
## Call: principal(r = scale_data, nfactors = 6, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 PC3 PC4 PC5 PC6 h2 u2 com
## X3 -0.29 0.04 0.20 -0.30 0.69 0.21 0.74 0.26 2.2
## X7 0.44 0.48 -0.56 -0.06 0.10 -0.13 0.76 0.24 3.1
## X8 0.38 0.52 -0.56 -0.04 0.05 -0.11 0.74 0.26 2.9
## X13 -0.27 -0.18 0.43 0.01 -0.23 -0.13 0.36 0.64 3.0
## X14 0.38 -0.15 0.07 -0.03 0.24 -0.14 0.25 0.75 2.5
## X15 -0.55 -0.09 -0.06 -0.11 0.47 0.07 0.55 0.45 2.2
## X24 0.10 0.04 0.05 0.63 0.24 0.19 0.50 0.50 1.6
## X25 -0.20 0.33 0.13 0.58 0.23 -0.28 0.63 0.37 3.0
## X26 -0.23 0.57 0.33 0.26 0.16 -0.31 0.68 0.32 3.4
## X27 -0.40 0.60 0.23 -0.22 -0.21 0.13 0.68 0.32 2.8
## X28 -0.38 0.68 0.25 -0.14 -0.26 0.13 0.78 0.22 2.4
## X29 -0.10 0.15 -0.18 0.35 -0.07 0.78 0.81 0.19 1.7
## X30 -0.23 0.24 -0.02 -0.46 0.21 0.06 0.36 0.64 2.6
## X31 0.86 0.13 0.29 -0.04 0.02 0.10 0.85 0.15 1.3
## X32 0.88 0.14 0.33 -0.07 0.09 0.11 0.92 0.08 1.4
## X33 0.87 0.11 0.33 -0.10 0.08 0.10 0.90 0.10 1.4
##
## PC1 PC2 PC3 PC4 PC5 PC6
## SS loadings 3.67 1.95 1.43 1.32 1.14 1.01
## Proportion Var 0.23 0.12 0.09 0.08 0.07 0.06
## Cumulative Var 0.23 0.35 0.44 0.52 0.59 0.66
## Proportion Explained 0.35 0.19 0.14 0.13 0.11 0.10
## Cumulative Proportion 0.35 0.53 0.67 0.80 0.90 1.00
##
## Mean item complexity = 2.3
## Test of the hypothesis that 6 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
## with the empirical chi square 436.27 with prob < 1.3e-68
##
## Fit based upon off diagonal values = 0.94
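The Call line above gives the psych call used in the report: principal(r = scale_data, nfactors = 6, rotate = "none"). To show where the loadings, h2, u2 and SS loadings columns come from, the following base R sketch rebuilds them from an eigen-decomposition. It uses mtcars as a stand-in for data_numeric and k = 2 components purely for brevity; both are assumptions, and column signs may differ from those psych reports.

```r
# Stand-in data (assumption): mtcars replaces data_numeric.
scale_data <- scale(mtcars)
pc <- eigen(cor(scale_data))
k  <- 2  # illustrative number of components (the report uses 6)

# Unrotated loadings: eigenvectors scaled by the square root of the eigenvalues.
L <- pc$vectors[, 1:k] %*% diag(sqrt(pc$values[1:k]))
dimnames(L) <- list(colnames(scale_data), paste0("PC", 1:k))

h2 <- rowSums(L^2)  # communality: variance of each variable explained by the k components
u2 <- 1 - h2        # uniqueness: the remainder

print(round(cbind(L, h2 = h2, u2 = u2), 2))
print(round(colSums(L^2), 2))  # SS loadings, equal to the first k eigenvalues
```

In the report's own output the SS loadings row (3.67, 1.95, ...) matches the first six eigenvalues listed earlier, as this relationship predicts.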
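PC scores are recovered from the loadings matrix and eigenvalues as follows. Dividing each loading column by the square root of its eigenvalue returns the corresponding eigenvector, and multiplying the standardized data by these eigenvectors gives the scores. The signs of some columns may differ from the manual decomposition above because the sign of an eigenvector is arbitrary.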
L <- as.matrix(pc_psych$loadings)
lambda_k <- pc_psych$values[1:ncol(L)]
V <- sweep(L, 2, sqrt(lambda_k), "/")
scores_PC <- scale_data %*% V
head(scores_PC)
## PC1 PC2 PC3 PC4 PC5 PC6
## [1,] -1.3292614 0.0618981 -2.1573345 0.01230047 0.9384844 -1.33662794
## [2,] -0.4836354 -1.8017441 0.3527618 0.89723782 0.4278207 0.07146723
## [3,] 0.1229392 -0.6573366 0.8498326 -0.28489355 -1.3347787 0.40738294
## [4,] 2.5683083 -0.9456003 -1.1440652 -0.31392107 -0.8903418 0.68025211
## [5,] 1.1999696 -0.2833878 -1.2290959 0.55949421 -0.4868174 0.89345203
## [6,] 1.2069326 0.4911520 -1.5212548 1.05872144 0.2643133 0.77670310
Factor analysis without rotation extracts the initial factor solution, here the same six-component principal() fit with rotate = "none" reported earlier. This serves as the baseline before applying any rotation method.
## Principal Components Analysis
## Call: principal(r = scale_data, nfactors = 6, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 PC3 PC4 PC5 PC6 h2 u2 com
## X3 -0.29 0.04 0.20 -0.30 0.69 0.21 0.74 0.26 2.2
## X7 0.44 0.48 -0.56 -0.06 0.10 -0.13 0.76 0.24 3.1
## X8 0.38 0.52 -0.56 -0.04 0.05 -0.11 0.74 0.26 2.9
## X13 -0.27 -0.18 0.43 0.01 -0.23 -0.13 0.36 0.64 3.0
## X14 0.38 -0.15 0.07 -0.03 0.24 -0.14 0.25 0.75 2.5
## X15 -0.55 -0.09 -0.06 -0.11 0.47 0.07 0.55 0.45 2.2
## X24 0.10 0.04 0.05 0.63 0.24 0.19 0.50 0.50 1.6
## X25 -0.20 0.33 0.13 0.58 0.23 -0.28 0.63 0.37 3.0
## X26 -0.23 0.57 0.33 0.26 0.16 -0.31 0.68 0.32 3.4
## X27 -0.40 0.60 0.23 -0.22 -0.21 0.13 0.68 0.32 2.8
## X28 -0.38 0.68 0.25 -0.14 -0.26 0.13 0.78 0.22 2.4
## X29 -0.10 0.15 -0.18 0.35 -0.07 0.78 0.81 0.19 1.7
## X30 -0.23 0.24 -0.02 -0.46 0.21 0.06 0.36 0.64 2.6
## X31 0.86 0.13 0.29 -0.04 0.02 0.10 0.85 0.15 1.3
## X32 0.88 0.14 0.33 -0.07 0.09 0.11 0.92 0.08 1.4
## X33 0.87 0.11 0.33 -0.10 0.08 0.10 0.90 0.10 1.4
##
## PC1 PC2 PC3 PC4 PC5 PC6
## SS loadings 3.67 1.95 1.43 1.32 1.14 1.01
## Proportion Var 0.23 0.12 0.09 0.08 0.07 0.06
## Cumulative Var 0.23 0.35 0.44 0.52 0.59 0.66
## Proportion Explained 0.35 0.19 0.14 0.13 0.11 0.10
## Cumulative Proportion 0.35 0.53 0.67 0.80 0.90 1.00
##
## Mean item complexity = 2.3
## Test of the hypothesis that 6 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
## with the empirical chi square 436.27 with prob < 1.3e-68
##
## Fit based upon off diagonal values = 0.94
Varimax rotation simplifies the factor structure by maximizing the variance of squared loadings within each factor. This yields a cleaner, more interpretable solution.
## Principal Components Analysis
## Call: principal(r = scale_data, nfactors = 6, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC1 RC2 RC3 RC4 RC5 RC6 h2 u2 com
## X3 0.01 0.04 -0.09 0.06 0.85 0.03 0.74 0.26 1.0
## X7 0.15 0.02 0.85 0.03 -0.09 -0.07 0.76 0.24 1.1
## X8 0.11 0.08 0.84 0.04 -0.12 -0.04 0.74 0.26 1.1
## X13 -0.08 0.14 -0.54 0.05 -0.11 -0.16 0.36 0.64 1.5
## X14 0.34 -0.32 0.06 0.03 0.07 -0.16 0.25 0.75 2.6
## X15 -0.42 -0.04 -0.10 0.08 0.59 0.03 0.55 0.45 1.9
## X24 0.10 -0.24 -0.01 0.47 -0.04 0.47 0.50 0.50 2.6
## X25 -0.13 0.04 0.02 0.78 -0.02 0.05 0.63 0.37 1.1
## X26 0.01 0.40 -0.01 0.70 0.09 -0.15 0.68 0.32 1.7
## X27 -0.10 0.81 -0.03 0.07 0.11 0.01 0.68 0.32 1.1
## X28 -0.07 0.86 -0.01 0.14 0.04 0.05 0.78 0.22 1.1
## X29 -0.08 0.15 0.07 -0.06 -0.01 0.87 0.81 0.19 1.1
## X30 -0.08 0.31 0.13 -0.15 0.44 -0.15 0.36 0.64 2.7
## X31 0.90 -0.08 0.14 -0.03 -0.15 0.02 0.85 0.15 1.1
## X32 0.94 -0.09 0.14 -0.03 -0.08 0.02 0.92 0.08 1.1
## X33 0.93 -0.09 0.11 -0.05 -0.08 -0.02 0.90 0.10 1.1
##
## RC1 RC2 RC3 RC4 RC5 RC6
## SS loadings 2.95 1.89 1.82 1.39 1.37 1.10
## Proportion Var 0.18 0.12 0.11 0.09 0.09 0.07
## Cumulative Var 0.18 0.30 0.42 0.50 0.59 0.66
## Proportion Explained 0.28 0.18 0.17 0.13 0.13 0.10
## Cumulative Proportion 0.28 0.46 0.63 0.77 0.90 1.00
##
## Mean item complexity = 1.5
## Test of the hypothesis that 6 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
## with the empirical chi square 436.27 with prob < 1.3e-68
##
## Fit based upon off diagonal values = 0.94
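The rotated solution above comes from principal() with rotate = "varimax". As a hedged illustration of what the rotation does, the sketch below applies base R's stats::varimax() to unrotated loadings. The stand-in data (mtcars), the choice of k = 3, and the object names are assumptions, and psych may order or sign the rotated columns differently from this sketch.

```r
# Stand-in data (assumption): mtcars replaces data_numeric.
scale_data <- scale(mtcars)
pc <- eigen(cor(scale_data))
k  <- 3  # illustrative number of components (the report uses 6)

# Unrotated loadings, then an orthogonal varimax rotation.
L <- pc$vectors[, 1:k] %*% diag(sqrt(pc$values[1:k]))
rownames(L) <- colnames(scale_data)
rot   <- varimax(L)             # returns the rotated loadings and the rotation matrix
L_rot <- unclass(rot$loadings)  # plain matrix of rotated loadings

print(round(L_rot, 2))
print(round(colSums(L_rot^2), 2))  # variance is redistributed across components
print(round(rowSums(L_rot^2), 2))  # communalities are unchanged by the rotation
```

The same behaviour is visible in the report's output: the SS loadings change from 3.67, 1.95, ... to 2.95, 1.89, ..., but the cumulative variance stays at 0.66 and the h2 column is identical before and after rotation.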
Factor scores are computed using the regression method, which multiplies the standardized data by the inverse of the correlation matrix and the loadings matrix.
## PC1 PC2 PC3 PC4 PC5 PC6
## [1,] -0.69432072 0.04435099 -1.8029843 0.0106982 0.8799212 -1.32960382
## [2,] -0.25262005 -1.29097894 0.2948194 0.7803627 0.4011239 0.07109167
## [3,] 0.06421553 -0.47099235 0.7102444 -0.2477830 -1.2514860 0.40524210
## [4,] 1.34151914 -0.67753800 -0.9561482 -0.2730294 -0.8347828 0.67667731
## [5,] 0.62678699 -0.20305197 -1.0272123 0.4866139 -0.4564391 0.88875685
## [6,] 0.63042399 0.35191843 -1.2713830 0.9208113 0.2478196 0.77262145
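The regression method described above can be written out as two matrix products. This is a sketch under stated assumptions: mtcars stands in for data_numeric, k = 2 components are used for brevity, and W and f_scores are hypothetical names.

```r
# Stand-in data (assumption): mtcars replaces data_numeric.
Z <- scale(mtcars)  # standardized data
R <- cor(Z)         # correlation matrix
e <- eigen(R)
k <- 2              # illustrative number of components (the report uses 6)

# Unrotated loadings.
L <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))

# Regression method: weights are the inverse correlation matrix times the loadings.
W        <- solve(R) %*% L
f_scores <- Z %*% W

print(head(f_scores))
print(round(apply(f_scores, 2, var), 4))  # each score column has unit variance
```

For unrotated component loadings this reduces to dividing each PC score by the square root of its eigenvalue. The first row above agrees with that: -1.3292614 / sqrt(3.66522444) is approximately -0.694.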
A descriptive statistics table is presented to provide an overview of the distribution of each numeric variable.
desc_table <- data.frame(
Min = sapply(data_numeric, min),
Max = sapply(data_numeric, max),
Mean = round(sapply(data_numeric, mean), 2),
Median = sapply(data_numeric, median),
SD = round(sapply(data_numeric, sd), 2)
)
print(desc_table)
## Min Max Mean Median SD
## X3 15 22 16.74 17 1.22
## X7 0 4 2.51 2 1.13
## X8 0 4 2.31 2 1.10
## X13 1 4 1.57 1 0.75
## X14 1 4 1.93 2 0.83
## X15 0 3 0.22 0 0.59
## X24 1 5 3.93 4 0.96
## X25 1 5 3.18 3 1.05
## X26 1 5 3.18 3 1.18
## X27 1 5 1.50 1 0.92
## X28 1 5 2.28 2 1.28
## X29 1 5 3.54 4 1.45
## X30 0 32 3.66 2 4.64
## X31 0 19 11.40 11 2.75
## X32 0 19 11.57 11 2.91
## X33 0 19 11.91 12 3.23