1 Introduction

The data in this analysis come from a survey conducted at two regional universities in the U.S. to investigate which factors may affect the satisfaction of undergraduate business students with their institution. The study population was defined as all undergraduate business students at these two universities. A total of 332 sets of student responses were collected. The survey consists of several multi-item questions corresponding to different factors that may influence a student’s satisfaction with their school. The aim of this analysis is to evaluate the internal consistency of the subscale pertaining to the students’ level of engagement in learning, in order to assess whether principal component analysis (PCA) would be appropriate for aggregation purposes. This evaluation will be based on the calculation and interpretation of the subscale’s Cronbach alpha value and a bootstrap confidence interval for that value.

2 Data Management & Analysis of Subscale

The data set has been preprocessed, so there are no true missing values to handle. However, each observation is separated from the next by a full row of missing values, presumably for ease of reading. These rows will be deleted to eliminate the risk of calculation issues due to missing values later in the analysis. Then, a new data frame will be created that consists of only the columns corresponding to the questions in the learning engagement subscale.

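# Read the survey data and drop the blank separator rows
survey <- na.omit(read.csv("https://pengdsci.github.io/STA490/w11/at-risk-survey-data.csv"))
# Keep only the 21 learning engagement items (columns 4 to 24)
engagement <- survey[, 4:24]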
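
Because na.omit() drops every row that contains at least one missing value, it can be worth confirming that the only rows removed are the fully blank separator rows. The following is a minimal sketch of such a check on a made-up data frame; `toy` and its columns are hypothetical and are not part of the survey data.

```r
# Toy data frame (hypothetical): real rows separated by all-NA rows
toy <- data.frame(q41 = c(1, NA, 3, NA, 2),
                  q42 = c(2, NA, 4, NA, 1))

# Rows in which every column is missing (the blank separator rows)
all_blank <- unname(rowSums(is.na(toy)) == ncol(toy))

# Rows in which at least one column is missing (what na.omit() would drop)
any_blank <- !complete.cases(toy)

# TRUE means na.omit() would remove the separator rows and nothing else
only_separators <- all(all_blank == any_blank)
only_separators

# Drop the separator rows explicitly
toy_clean <- toy[!all_blank, ]
nrow(toy_clean)
```

If the check returned FALSE on real data, some rows would be partially missing and would deserve a closer look before being dropped.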

The subscale of interest consists of 21 questions relating to different forms or means of learning engagement and participation in which a student may partake, such as making a class presentation or working with other students on a class project. Each question is answered on a Likert-type scale with four possible responses indicating how frequently the student engages in the activity: 1 - Very often, 2 - Often, 3 - Sometimes, 4 - Never.

Most of the questions inquire about what would generally be considered a positive or productive activity. Thus, a lower numerical response (e.g., 1 for “Very often” as opposed to 4 for “Never”) can be seen as an indication of a greater level of learning engagement by the student. However, question 4.5 asks how frequently a student comes to class without having completed their readings or assignments, while question 4.21 asks how often they skip class entirely. Unlike the rest of the questions, a lower numerical response to these two questions would suggest that the student is less engaged in learning activities. Therefore, the scale for these two questions will be reversed so that every question’s scale runs in the same direction, from more engagement to less engagement as the numerical value increases.

# Copy the subscale, then reverse the 1-4 scale of q45 and q421 (1 <-> 4, 2 <-> 3)
engagement.new <- engagement
engagement.new$q45 <- 5 - engagement$q45
engagement.new$q421 <- 5 - engagement$q421
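
The subtraction from 5 used above is a special case of the general rule that a response x on a scale running from lo to hi is reversed as (lo + hi) - x. A minimal sketch of that rule follows; the helper name `reverse_item` and the sample responses are made up for illustration.

```r
# Reverse-score a Likert item: x on a scale from lo to hi maps to (lo + hi) - x
reverse_item <- function(x, lo = 1, hi = 4) {
  # Guard against values outside the stated scale
  stopifnot(all(x >= lo & x <= hi, na.rm = TRUE))
  (lo + hi) - x
}

responses <- c(1, 2, 3, 4, 4, 1)   # made-up answers on the 1-4 scale
reversed  <- reverse_item(responses)
reversed
```

Applying the helper twice returns the original responses, which is a quick way to check that a reversal was applied exactly once.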

3 Validity & Consistency Analysis

3.1 Correlation Plot

As a preliminary step in assessing the internal consistency of this subscale, a correlation plot will be generated and interpreted as a general visual representation of the correlation among the variables, i.e., questions.

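library(corrplot)
# Pairwise Pearson correlations among the 21 items
M <- cor(engagement.new)
# Numbers in the lower triangle, ellipses in the upper triangle
corrplot.mixed(M, lower.col = "royalblue3", upper = "ellipse", number.cex = 0.45, tl.cex = 0.55)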

Overall, the plot does offer some support for the use of PCA as an aggregation tool, as the great majority of the pairs of questions appear to exhibit a weak to moderate positive correlation, as indicated by the blue, upward-sloping ellipses. There are also a few pairs for which the correlation is noticeably strong, indicated by darker blue ellipses which are thinner in shape.

Interestingly, questions 4.5 & 4.21 actually exhibit a negative correlation, albeit a very weak one in most cases, with almost all of the other questions, even though their scales were reversed to match the others in theory. One possible explanation is that most students are unlikely to skip assignments or class frequently even if they rarely give presentations, discuss grades with their instructors, etc. Thus, many of the respondents whose higher-numbered answers (closer to 4) indicate minimal engagement across most activities may still have lower-numbered answers (closer to 1) for questions 4.5 & 4.21. It may also be the case that students were simply less forthcoming about frequently failing to do their work or attend class than they were about infrequently engaging in activities that are less essential to academic success, perhaps out of fear of disciplinary action. Admittedly, neither of these explanations fully accounts for the other side of the negative relationship, i.e., answers to questions 4.5 & 4.21 indicating frequent failure to complete assignments and attend class alongside answers to the remaining questions that suggest frequent engagement in other learning activities. That said, most of these negative correlations are quite weak, as noted previously, so they may not have much impact on the analysis as a whole.
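
One way to quantify which items run against the rest of a subscale, rather than reading it off the plot, is to correlate each item with the sum of the remaining items. The following is a minimal sketch on simulated data, not on the survey responses; the data frame `toy`, its item names, and the continuous (non-Likert) values are all assumptions made for illustration.

```r
set.seed(42)
n <- 200
latent <- rnorm(n)  # made-up common "engagement" factor

# Hypothetical items: q1, q2, q4, q5 track the factor; q3 runs against it
toy <- data.frame(
  q1 = latent + rnorm(n, sd = 0.5),
  q2 = latent + rnorm(n, sd = 0.5),
  q3 = -latent + rnorm(n, sd = 0.5),
  q4 = latent + rnorm(n, sd = 0.5),
  q5 = latent + rnorm(n, sd = 0.5)
)

# Correlation of each item with the sum of all the other items
item_rest <- sapply(names(toy), function(item) {
  rest <- rowSums(toy[, setdiff(names(toy), item), drop = FALSE])
  cor(toy[[item]], rest)
})

# Items whose item-rest correlation is negative
flagged <- names(item_rest)[item_rest < 0]
flagged
```

An item flagged this way is a candidate for reverse keying, or for a closer look at how respondents interpreted it.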

3.2 Cronbach Alpha

Having detected ample evidence for correlation among the questions from a visual representation, we will now compute a Cronbach alpha value for the subscale, a numerical metric indicating the level of internal consistency. This metric will be computed for this set of responses, along with a 95% confidence interval for the value generated via bootstrapping.
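
For reference, the usual raw form of Cronbach alpha for k items is k / (k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements that textbook formula in base R on made-up data; the function name `cronbach_alpha` is hypothetical, and no claim is made here that it reproduces the output of the psych package digit for digit.

```r
# Raw Cronbach alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_vars <- apply(items, 2, var)   # variance of each item
  total_var <- var(rowSums(items))    # variance of the summed score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Two identical items are perfectly consistent
x <- c(1, 2, 3, 4, 2, 3)
alpha_identical <- cronbach_alpha(cbind(x, x))

# Two exactly uncorrelated items share no common variance
u <- c(1, -1, 1, -1)
v <- c(1, 1, -1, -1)
alpha_uncorrelated <- cronbach_alpha(cbind(u, v))

c(identical = alpha_identical, uncorrelated = alpha_uncorrelated)
```

The two toy cases mark the ends of the usual range: duplicated items give an alpha of 1, while exactly uncorrelated items give an alpha of 0.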

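library(psych)
library(knitr)  # provides kable()
# check.keys = TRUE lets alpha() reverse-key items that load negatively on the first principal component
cronbach.eng <- psych::alpha(engagement.new, check.keys = TRUE)$total$raw_alpha
kable(cronbach.eng, caption = "Cronbach Alpha")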
Cronbach Alpha
x
0.8780093
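set.seed(123)  # for reproducibility
num_bootstraps <- 1000
bootstrap_alphas <- numeric(num_bootstraps)
for (i in 1:num_bootstraps) {
  # Resample respondents (rows) with replacement. Calling sample() on the data
  # frame itself would resample its columns (items) instead, so rerunning with
  # row resampling will change the interval reported below.
  boot_rows <- sample(nrow(engagement.new), replace = TRUE)
  boot_sample <- engagement.new[boot_rows, ]
  bootstrap_alphas[i] <- as.numeric(alpha(boot_sample, check.keys = TRUE)$total[1])
}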

# Percentile limits of the bootstrap distribution, rounded to four decimals
lci.025 <- round(quantile(bootstrap_alphas, 0.025, type = 2), 4)
uci.975 <- round(quantile(bootstrap_alphas, 0.975, type = 2), 4)
bootstrap.ci <- paste("[", lci.025, ", ", uci.975, "]")
kable(bootstrap.ci, caption = "95% Bootstrap Confidence Interval for Cronbach Alpha")
95% Bootstrap Confidence Interval for Cronbach Alpha
x
[ 0.8677 , 0.9189 ]
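
A percentile bootstrap for a reliability coefficient is normally built by resampling respondents, i.e., rows, with replacement and recomputing the statistic on each resample. The following is a minimal base R sketch of that idea on simulated data. The data frame `toy`, the helper `alpha_stat`, and the choice of 500 resamples are assumptions made for illustration and are not the survey data or the settings used in this report.

```r
set.seed(1)
n <- 100
latent <- rnorm(n)  # made-up common factor

# Hypothetical three-item scale
toy <- data.frame(q1 = latent + rnorm(n),
                  q2 = latent + rnorm(n),
                  q3 = latent + rnorm(n))

# Raw Cronbach alpha from item variances and total-score variance
alpha_stat <- function(d) {
  k <- ncol(d)
  (k / (k - 1)) * (1 - sum(sapply(d, var)) / var(rowSums(d)))
}

B <- 500
boot_alpha <- numeric(B)
for (b in seq_len(B)) {
  # Draw row indices with replacement, then subset the rows
  rows <- sample(nrow(toy), replace = TRUE)
  boot_alpha[b] <- alpha_stat(toy[rows, , drop = FALSE])
}

# 95% percentile interval from the bootstrap distribution
ci <- quantile(boot_alpha, c(0.025, 0.975))
ci
```

Note that sample() applied directly to a data frame treats it as a list of columns, so it would draw items rather than respondents; indexing rows explicitly avoids that.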

Both the point estimate of Cronbach alpha calculated directly from the data (alpha = 0.878) and the 95% bootstrap confidence interval (0.8677, 0.9189) indicate very good internal consistency among the items in this subscale. It should be noted that these values were computed using the original scale for questions 4.5 & 4.21, as uniform positive correlation among the questions simplified the Cronbach alpha calculation. When the reversed scales were used for these questions, the point estimate and the lower limit of the confidence interval dropped slightly but still fell within the range of values indicating very good internal consistency. Therefore, it can be concluded that aggregating this information via PCA is appropriate and advisable.