Reliability Analysis
Now that we have prepared the data set, we will perform internal
reliability and validity assessments in order to further investigate the
survey data set.
Student Experience Reliability Analysis
First, let’s look at the correlation plots for the student experience
subset of our survey data set.
The experience portion of the survey is very large, so we take a smaller subset of it to make the correlation plot easier to interpret. This subset contains the items from questions 5, 6, and 7 (q51 to q56, q61 to q63, and q7), stored in columns 25 to 34 of the data set. We will call this subset “experience.1”.
experience.1 = survey[, 25:34]
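Selecting columns by position will silently pick the wrong items if the column order of survey ever changes. The sketch below selects the same ten items by name instead. It assumes the column names match the item labels that appear in the loadings table later in this section (q51 to q56, q61 to q63, q7). The helper select_items() is hypothetical.

```r
# Hypothetical helper: pick survey items by name and fail loudly if any are missing
select_items <- function(df, items) {
  missing <- setdiff(items, names(df))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  df[, items, drop = FALSE]
}

# Item names assumed from the PCA loadings table for questions 5, 6 and 7
experience.items <- c(paste0("q5", 1:6), paste0("q6", 1:3), "q7")

# Usage against the real data (not run here):
# experience.1 <- select_items(survey, experience.items)
```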
Now, let’s look at the correlation plot for this subset of the
student experience portion of the survey.
library(corrplot)   # provides corrplot.mixed()
M = cor(experience.1)   # 10 x 10 item correlation matrix
corrplot.mixed(M, lower.col = "purple", upper = "ellipse", number.cex = .7, tl.cex = 0.7)

The plot shows some moderate correlation among the student experience items. Several of the ellipses are noticeably elongated, which indicates moderately strong pairwise correlations.
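To put numbers behind the visual impression, one can summarize the off-diagonal entries of M directly. The sketch below uses base R only. summarise_cor() is a hypothetical helper and is not part of corrplot.

```r
# Summarise the off-diagonal entries of a correlation matrix so that
# "moderate correlation" can be backed by numbers rather than ellipse shapes.
summarise_cor <- function(M, top = 5) {
  stopifnot(is.matrix(M), nrow(M) == ncol(M))
  idx <- which(upper.tri(M), arr.ind = TRUE)   # each item pair exactly once
  pairs <- data.frame(
    item1 = rownames(M)[idx[, 1]],
    item2 = colnames(M)[idx[, 2]],
    r = M[idx]
  )
  pairs <- pairs[order(-abs(pairs$r)), ]       # strongest pairs first
  list(
    mean_abs_r = mean(abs(pairs$r)),           # average strength of association
    strongest = head(pairs, top)
  )
}

# Usage on the experience correlation matrix computed above:
# summarise_cor(M)
```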
Student Satisfaction Reliability Analysis
Next, let’s look at the correlation plots for the student
satisfaction subset of the survey data set.
M=cor(satisfaction)
corrplot.mixed(M, lower.col = "purple", upper = "ellipse", number.cex = .7, tl.cex = 0.7)

The satisfaction portion of the survey is so short compared with the student experience portion that the plot contains only one pair of items (q17 and q18). This makes the correlation hard to judge. The single ellipse does appear noticeably stretched and long, indicating that a moderate correlation does exist between the two items.
Cronbach Alpha Levels
Next, we will calculate the Cronbach alpha levels for each of our two
subsets, along with their 95% confidence intervals. This will allow us
to assess the reliability of the two subsets.
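For reference, raw Cronbach's alpha can be computed directly from its definition: α = k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below uses base R only and assumes complete data with no missing values. psych::alpha() handles missing values differently, so small discrepancies are possible on real data. The function name cronbach_alpha_raw() is hypothetical.

```r
# Raw Cronbach's alpha from first principles:
# alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
cronbach_alpha_raw <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)                      # number of items
  item_var <- apply(items, 2, var)      # variance of each item
  total_var <- var(rowSums(items))      # variance of the summed scale score
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Usage (assumes no missing values in the subset):
# cronbach_alpha_raw(experience.1)
```

Identical items give alpha = 1, and uncorrelated items give alpha = 0. Both cases serve as quick sanity checks on the formula.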
Student Experience
First, let’s calculate the Cronbach alpha level for the student
experience portion of the survey data set.
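library(psych)   # provides alpha(); the psych:: prefix avoids a clash with ggplot2::alpha()
cronbach.e = psych::alpha(experience.1)$total$raw_alpha   # raw (unstandardized) alpha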
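The alpha() call also prints the following warning:
Some items ( q61 q62 q63 q7 ) were negatively correlated with the first principal component and probably should be reversed. To do this, run the function again with the 'check.keys=TRUE' option
These four items also have negative loadings on the first principal component in the PCA below. The alpha reported here is computed without reversing them. If they are in fact reverse-worded, it likely understates the internal consistency of the subset.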
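The simplest fix is the one the warning suggests: psych::alpha(experience.1, check.keys = TRUE), which reverses the flagged items automatically. If you prefer to reverse-code explicitly, a minimal sketch follows. The helper reverse_code() is hypothetical. The 1 to 5 response range in the usage line is an assumption, because the scale of q61, q62, q63 and q7 is not stated in this section. Reversing also only makes sense if the items are genuinely worded in the opposite direction.

```r
# Hypothetical helper: reverse-code Likert items so that all items point the
# same direction. For a scale running from lo to hi, a response x becomes
# (lo + hi) - x, e.g. on a 1-5 scale 1 <-> 5 and 2 <-> 4.
reverse_code <- function(df, items, lo, hi) {
  df[items] <- lapply(df[items], function(x) (lo + hi) - x)
  df
}

# Usage (scale bounds are an assumption; check the questionnaire first):
# experience.1r <- reverse_code(experience.1, c("q61", "q62", "q63", "q7"), lo = 1, hi = 5)
# psych::alpha(experience.1r)
```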
CI.e = cronbach.alpha.CI(alpha=cronbach.e, n=332, items=10, conf.level = 0.95)
CI.comp = cbind(LCI = CI.e[1], alpha = cronbach.e, UCI =CI.e[2])
row.names(CI.comp) = ""
pander(CI.comp, caption="Confidence Interval of Cronbach Alpha")
Confidence Interval of Cronbach Alpha
LCI | alpha | UCI
0.4298 | 0.5119 | 0.5866
The Cronbach alpha for the student experience portion of the survey data set is 0.5119, 95% CI [0.4298, 0.5866]. This is well below the commonly used 0.70 rule of thumb for acceptable internal consistency, but it is not very low either. The subset therefore shows moderate, but not strong, reliability.
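This section does not say which package cronbach.alpha.CI() comes from. One standard interval for alpha, Feldt's F-based interval, takes only a few lines of base R. It treats (1 − α)/(1 − α̂) as F-distributed with n − 1 and (n − 1)(k − 1) degrees of freedom. Whether cronbach.alpha.CI() uses exactly this method is an assumption. For the inputs used here, the Feldt bounds should land close to the intervals in the tables. The function name feldt_alpha_ci() is hypothetical.

```r
# Feldt F-based confidence interval for Cronbach's alpha.
# alpha = sample alpha, n = number of respondents, k = number of items.
feldt_alpha_ci <- function(alpha, n, k, conf.level = 0.95) {
  df1 <- n - 1
  df2 <- (n - 1) * (k - 1)
  tail <- (1 - conf.level) / 2
  c(lower = 1 - (1 - alpha) * qf(1 - tail, df1, df2),   # upper F quantile -> lower bound
    upper = 1 - (1 - alpha) * qf(tail, df1, df2))       # lower F quantile -> upper bound
}

# Usage with the student experience values from above
feldt_alpha_ci(0.5119, n = 332, k = 10)
```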
Student Satisfaction
Next, let’s calculate the Cronbach alpha level for the student
satisfaction portion of the survey data set.
cronbach.s = psych::alpha(satisfaction)$total$raw_alpha   # raw alpha for q17 and q18
CI.s = cronbach.alpha.CI(alpha=cronbach.s, n=332, items=2, conf.level = 0.95)
CI.comp = cbind(LCI = CI.s[1], alpha = cronbach.s, UCI =CI.s[2])
row.names(CI.comp) = ""
pander(CI.comp, caption="Confidence Interval of Cronbach Alpha")
Confidence Interval of Cronbach Alpha
LCI | alpha | UCI
0.4854 | 0.5853 | 0.6658
The Cronbach alpha for the student satisfaction portion of the survey data set is 0.5853, 95% CI [0.4854, 0.6658]. Once again, this value is below the 0.70 rule of thumb but not very low, which indicates moderate, but not strong, reliability. It is slightly higher than the alpha for the student experience subset, so the student satisfaction subset has the slightly better reliability of the two. However, the two confidence intervals overlap substantially, so this difference should not be over-interpreted.
Principal Component Analysis
Now we will perform principal component analysis (PCA) on the survey data set. We run PCA twice, once for the experience subset and once for the satisfaction subset of the student survey data.
We first define two helper functions. The first, My.plotnScree(), computes the eigenvalues of the item correlation matrix. It then draws a scree plot annotated with several criteria for choosing the number of components: Kaiser, parallel analysis, optimal coordinates, and acceleration factor.
library(nFactors)   # provides parallel() and nScree()

My.plotnScree = function(mat, legend = TRUE, method = "factors", main) {
  # Eigenvalues of the item correlation matrix
  ev <- eigen(cor(mat))
  # Parallel analysis: eigenvalues from 5000 random data sets of the same size
  ap <- parallel(subject = nrow(mat), var = ncol(mat), rep = 5000, cent = .05)
  # Non-graphical scree criteria (Kaiser, parallel analysis, OC, AF)
  nScree = nScree(x = ev$values, aparallel = ap$eigen$qevpea, model = method)
  if (!inherits(nScree, "nScree"))
    stop("Method is only for nScree objects")
  if (nScree$Model == "components")
    nkaiser = "Eigenvalues > mean: n = "
  if (nScree$Model == "factors")
    nkaiser = "Eigenvalues > zero: n = "
  xlab = nScree$Model
  ylab = "Eigenvalues"
  par(col = 1, pch = 18)
  par(mfrow = c(1, 1))
  eig <- nScree$Analysis$Eigenvalues
  k <- 1:length(eig)
  # Scree plot of the observed eigenvalues
  plot(1:length(eig), eig, type = "b", main = main,
       xlab = xlab, ylab = ylab, ylim = c(0, 1.2 * max(eig)))
  # Line through the (noc + 1)-th and last eigenvalues, extended back to
  # component 1 (the optimal-coordinates idea)
  nk <- length(eig)
  noc <- nScree$Components$noc
  vp.p <- lm(eig[c(noc + 1, nk)] ~ k[c(noc + 1, nk)])
  x <- sum(c(1, 1) * coef(vp.p))
  y <- sum(c(1, nk) * coef(vp.p))
  par(col = 10)
  lines(k[c(1, nk)], c(x, y))
  # Parallel-analysis eigenvalues for comparison
  par(col = 11, pch = 20)
  lines(1:nk, nScree$Analysis$Par.Analysis, type = "b")
  if (legend == TRUE) {
    leg.txt <- c(paste(nkaiser, nScree$Components$nkaiser),
                 c(paste("Parallel Analysis: n = ", nScree$Components$nparallel)),
                 c(paste("Optimal Coordinates: n = ", nScree$Components$noc)),
                 c(paste("Acceleration Factor: n = ", nScree$Components$naf))
    )
    legend("topright", legend = leg.txt, pch = c(18, 20, NA, NA),
           text.col = c(1, 3, 2, 4),
           col = c(1, 3, 2, 4), bty = "n", cex = 0.7)
  }
  # Mark the optimal-coordinate (OC) and acceleration-factor (AF) points
  naf <- nScree$Components$naf
  text(x = noc, y = eig[noc], label = " (OC)", cex = 0.7,
       adj = c(0, 0), col = 2)
  text(x = naf + 1, y = eig[naf + 1], label = " (AF)",
       cex = 0.7, adj = c(0, 0), col = 4)
}
The second function, My.loadings.var(), extracts the loadings and the proportion of the total variance explained by each component. With method = "pca" it uses PCA. With method = "fa" it uses each factor of a varimax-rotated factor analysis.
My.loadings.var <- function(mat, nfct, method = "fa") {
  if (method == "fa") {
    # Maximum-likelihood factor analysis with varimax rotation
    f1 <- factanal(mat, factors = nfct, rotation = "varimax")
    x <- loadings(f1)
    vx <- colSums(x^2)   # sum of squared loadings per factor
    varSS = rbind('SS loadings' = vx,
                  'Proportion Var' = vx / nrow(x),
                  'Cumulative Var' = cumsum(vx / nrow(x)))
    weight = f1$loadings[]
  } else if (method == "pca") {
    # PCA on standardized items; importance rows are SD, proportion and cumulative proportion
    pca <- prcomp(mat, center = TRUE, scale = TRUE)
    varSS = summary(pca)$importance[, 1:nfct]
    weight = pca$rotation[, 1:nfct]
  }
  list(Loadings = weight, Prop.Var = varSS)
}
Student Experience PCA
First, we will perform PCA for the student experience subset of the
survey data set.
Because the student experience portion of the survey is so large, we use the experience.1 subset created earlier during the reliability analysis, to avoid having too many principal components.
Let’s calculate the principal components for the experience.1 data set.
experience.pca <- prcomp(experience.1, center = TRUE, scale = TRUE)
Now we will look at the loadings of the PCA for the student experience survey questions.
kable(round(experience.pca$rotation, 2), caption="Factor Loadings of the PCA")
Factor Loadings of the PCA
Item | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10
q51 | 0.26 | -0.09 | -0.15 | 0.60 | -0.57 | 0.29 | 0.18 | -0.30 | 0.04 | 0.03
q52 | 0.41 | -0.07 | 0.05 | 0.09 | -0.16 | -0.54 | 0.08 | 0.43 | 0.55 | -0.09
q53 | 0.44 | -0.09 | -0.05 | -0.01 | -0.04 | -0.32 | 0.07 | 0.10 | -0.79 | -0.24
q54 | 0.41 | -0.02 | -0.02 | -0.28 | 0.18 | -0.07 | -0.18 | -0.71 | 0.25 | -0.35
q55 | 0.44 | -0.10 | -0.02 | -0.21 | 0.10 | 0.05 | -0.01 | -0.07 | -0.03 | 0.85
q56 | 0.39 | -0.15 | -0.08 | -0.03 | 0.22 | 0.67 | -0.23 | 0.44 | 0.07 | -0.26
q61 | -0.10 | -0.66 | -0.07 | -0.04 | 0.30 | 0.03 | 0.66 | -0.06 | 0.06 | -0.07
q62 | -0.01 | -0.32 | 0.80 | -0.27 | -0.41 | 0.13 | -0.07 | -0.01 | -0.04 | -0.05
q63 | -0.17 | -0.60 | -0.08 | 0.34 | 0.11 | -0.22 | -0.65 | -0.05 | -0.03 | 0.08
q7 | -0.14 | -0.22 | -0.56 | -0.56 | -0.53 | 0.01 | -0.09 | 0.08 | 0.03 | -0.04
Because experience.1 contains ten items, the PCA produces ten principal components for the student experience data.
kable(summary(experience.pca)$importance, caption="The Importance of Each Principal Component")
The Importance of Each Principal Component
Statistic | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10
Standard deviation | 1.903139 | 1.224847 | 0.9956294 | 0.9782884 | 0.9071515 | 0.7291144 | 0.7027027 | 0.6709049 | 0.5693246 | 0.5540031
Proportion of Variance | 0.362190 | 0.150030 | 0.0991300 | 0.0957000 | 0.0822900 | 0.0531600 | 0.0493800 | 0.0450100 | 0.0324100 | 0.0306900
Cumulative Proportion | 0.362190 | 0.512220 | 0.6113500 | 0.7070500 | 0.7893400 | 0.8425000 | 0.8918800 | 0.9369000 | 0.9693100 | 1.0000000
The first PC accounts for around 36.22% of the total variation. The
first two PCs account for around 51.22% of the total variation. The
first three PCs account for around 61.14% of the total variation.
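The proportions in the importance table follow directly from the component standard deviations. Each eigenvalue is the squared standard deviation, and with scale = TRUE the eigenvalues sum to the number of items (ten here). For PC1, 1.903139² ≈ 3.622, and 3.622 / 10 ≈ 0.3622, which matches the 36.22% above. The sketch below implements that calculation in base R. The name pve() is hypothetical.

```r
# Proportion and cumulative proportion of variance from prcomp()'s sdev:
# eigenvalue_j = sdev_j^2, proportion_j = eigenvalue_j / sum(eigenvalues)
pve <- function(sdev) {
  ev <- sdev^2
  rbind(Proportion = ev / sum(ev),
        Cumulative = cumsum(ev) / sum(ev))
}

# Usage with the fitted model (should reproduce the last two table rows,
# up to the rounding used by summary()):
# pve(experience.pca$sdev)
```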
Scree Plot
Let’s take a look at the scree plot for the student experience survey
portion.
My.plotnScree(mat=experience.1, legend = TRUE, method ="components",
main="Determination of Number of Components\n Student Experience (Positive)")

As expected, the plot shows ten principal components, matching the PCA output above. The elbow of the scree plot appears to occur at around two components. Retaining two components therefore seems a reasonable choice for further analysis of the student experience survey data.
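The legend of My.plotnScree() also reports the Kaiser criterion ("Eigenvalues > mean"). For a PCA on standardized items the mean eigenvalue is 1, so the rule keeps components whose eigenvalue exceeds 1. The eigenvalue is the squared standard deviation from the importance table. A minimal base-R check is shown below. kaiser_n() is a hypothetical helper, and the standard deviations are copied from the table above.

```r
# Kaiser criterion: retain components whose eigenvalue (variance of the
# standardized PC) exceeds 1, i.e. that explain more than one item's worth.
kaiser_n <- function(sdev) sum(sdev^2 > 1)

# Standard deviations copied from the importance table for experience.1
experience.sdev <- c(1.903139, 1.224847, 0.9956294, 0.9782884, 0.9071515,
                     0.7291144, 0.7027027, 0.6709049, 0.5693246, 0.5540031)
kaiser_n(experience.sdev)   # 2
# With the live object: kaiser_n(experience.pca$sdev)
```

This rule also points to two components, which is consistent with the elbow. PC3 only just misses the cutoff (0.9956² ≈ 0.991).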
Student Experience Distribution Histogram
Next, we look at a histogram of the distribution of the student experience index, defined as each respondent's score on the first principal component.
se.idx = experience.pca$x[, 1]   # PC1 scores from the PCA fitted above
hist(se.idx,
     main = "Distribution of Student Experience Index",
     breaks = seq(min(se.idx), max(se.idx), length.out = 9),
     xlab = "Student Experience Index",
     xlim = range(se.idx),
     border = "purple",
     col = "lightblue",
     freq = FALSE
)

The student experience index appears roughly symmetric and bell-shaped, with no obvious skew or outliers.
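A histogram with eight bins is a coarse check. A numeric skewness value and a formal normality test can complement it. The sketch below computes the simple moment estimator of skewness, g1 = m3 / m2^(3/2), where values near 0 suggest symmetry. It is paired with base R's shapiro.test(), which accepts 3 to 5000 observations. With several hundred respondents, Shapiro-Wilk will flag even small departures from normality, so read it together with the plot. skewness_g1() is a hypothetical helper.

```r
# Sample skewness via the moment estimator g1 = m3 / m2^(3/2).
# Missing values are dropped before computing the moments.
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)   # second central moment
  m3 <- mean((x - m)^3)   # third central moment
  m3 / m2^(3 / 2)
}

# Usage with the index computed above:
# skewness_g1(se.idx)
# shapiro.test(se.idx)
```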
Student Satisfaction PCA
Now, we will perform PCA again, this time for the student
satisfaction subset of the survey data set.
satisfaction.pca <- prcomp(satisfaction, center = TRUE, scale = TRUE)
Now we will look at the loadings of the PCA for the student satisfaction survey questions.
kable(round(satisfaction.pca$rotation, 2), caption="Factor Loadings of the PCA")
Factor Loadings of the PCA
Item | PC1 | PC2
q17 | 0.71 | -0.71
q18 | 0.71 | 0.71
There are two principal components for the student satisfaction data, one per item.
kable(summary(satisfaction.pca)$importance, caption="The Importance of Each Principal Component")
The Importance of Each Principal Component
Statistic | PC1 | PC2
Standard deviation | 1.231695 | 0.6949298
Proportion of Variance | 0.758540 | 0.2414600
Cumulative Proportion | 0.758540 | 1.0000000
The first PC accounts for around 75.85% of the total variation. The
first two PCs account for 100% of the total variation.
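With only two items, the PCA is fully determined by the correlation r between q17 and q18. This is why the loadings are exactly ±0.71, that is, ±1/√2. Assuming r > 0, as the positive PC1 loadings indicate, the worked derivation below recovers r from the importance table:

```latex
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\lambda_1 = 1 + r, \quad \lambda_2 = 1 - r, \qquad
v_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
v_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ 1 \end{pmatrix}

\frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{1 + r}{2} = 0.75854
\;\Rightarrow\; r \approx 0.517,
\qquad \sqrt{\lambda_1} = \sqrt{1.517} \approx 1.2317
```

The value 1.2317 matches the PC1 standard deviation in the table. For this subset, the share of variance explained by PC1 therefore carries the same information as the single correlation between q17 and q18.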
Student Satisfaction Distribution Histogram
Next, we look at a histogram of the distribution of the student satisfaction index, defined as each respondent's score on the first principal component.
sat.idx = satisfaction.pca$x[, 1]   # PC1 scores from the PCA fitted above
hist(sat.idx,
     main = "Distribution of Student Satisfaction Index",
     breaks = seq(min(sat.idx), max(sat.idx), length.out = 9),
     xlab = "Student Satisfaction Index",
     xlim = range(sat.idx),
     border = "purple",
     col = "lightblue",
     freq = FALSE
)

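The student satisfaction index does not appear to be normally distributed. Instead, the distribution is noticeably skewed to the right, with some potential outliers in the right tail. Both items are coded so that higher values are less favorable (q17: 1 = Yes, 2 = No; q18: 1 = Excellent to 4 = Poor), and both load positively on the first component. Higher index values therefore correspond to lower satisfaction, and the right tail represents a smaller group of less satisfied students. With only two items, the index can also take at most eight distinct values, which makes a smooth, normal-looking shape unlikely.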
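The sign of a principal component is arbitrary. The direction of an index built from PC1 therefore has to be read off the loadings. The sketch below fixes the direction explicitly so that higher scores always mean a more favorable response. It assumes that all items in the index share the same coding direction, as q17 and q18 do. The helper oriented_index() and its higher_is_worse argument are hypothetical.

```r
# Hypothetical helper: orient a PC1-based index so that higher scores mean a
# more favorable response. `higher_is_worse` describes the item coding; for q17
# (1 = Yes, 2 = No) and q18 (1 = Excellent ... 4 = Poor) higher codes are worse.
# Assumes all PC1 loadings share the same sign.
oriented_index <- function(pca, higher_is_worse = TRUE) {
  idx <- pca$x[, 1]
  s <- sign(sum(pca$rotation[, 1]))   # +1 if PC1 rises with the raw codes
  if (higher_is_worse) -s * idx else s * idx
}

# Usage: sat.idx2 <- oriented_index(satisfaction.pca)
```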
Conclusion and Recommendations
Overall, the principal component analysis allowed us to further analyze and better understand the survey data set. One thing that stood out, and that shapes our recommendations for this project, is how long the student experience portion of the survey is compared with the student satisfaction portion. Only two questions in the survey appear to relate directly to student satisfaction.
These two student satisfaction questions were questions 17 and 18,
with those being: “17. Would you recommend our School of Business to a
friend or family member? 1 -Yes, 2 - No.” and “18. How would you
evaluate your entire educational experience at our school? 1 -Excellent,
2 - Good, 3 - Fair, 4 - Poor”.
Because the student satisfaction portion of the survey is so short, we could not create a scree plot for it with My.plotnScree(). Its scree criteria need three or more principal components, and this subset has only two.
So, the main recommendation I would make for future projects is to expand the student satisfaction portion of the survey with more questions, to allow a fuller analysis of student satisfaction.
Project Questions
Lastly, we will now look at some potential project questions based
upon the analysis of the survey data set and the results that were
found.
Some potential questions include:
- Which factors of student experience show the strongest influence on student satisfaction?
- Do the factors with the strongest influence on satisfaction show stronger reliability than the others?
- Do students appear to be mostly satisfied with their school experience?
- How can we analyze the relationships between the principal components to learn more about the factors relevant to a student’s overall experience and satisfaction in college?
For this project, I would be particularly interested in which student experience factors have the greatest effect on a student’s satisfaction with their college experience. This would be helpful because it would identify specific factors that college faculty or advisors could watch for in their students. They could then help students have a more positive and satisfying experience during their time in college.
I think it would also be interesting to see whether the factors with the strongest influence on a student’s satisfaction are also the most reliable. This could be checked by comparing their reliability results with the average across all factors. It would help show whether these findings are truly reliable.
Additionally, we could use the findings of the principal component analysis to investigate the survey data set further. We could examine the relationships between the principal components to learn more about how the various factors in student experience and satisfaction relate to each other. It would also be interesting to take the items with the largest loadings and build models based on those specific items, to assess their importance and influence on overall student experience and satisfaction.
