Introduction
For this assignment, we examine data from a survey on student
experience and satisfaction. The survey asks students about their
academic background, their year level and credit status, and their
engagement with class material both inside and outside of school. It
also asks which campus resources students have used during their time in
college and how satisfied they are with the college. Together, these
responses describe who the students are, what their college experience
has been, and how satisfied they are with it.
For this project, we begin with some exploratory data analysis of the
survey data set. Then, we perform an internal reliability analysis of
the subscales in the survey. Finally, we consider some potential
consulting questions that could be drawn from this survey data for
future projects.
Data Description
Let’s read in the survey data set from GitHub. We will call the data
set “survey”.
survey <- read.csv("https://raw.githubusercontent.com/JosieGallop/STA490/refs/heads/main/student-satisfaction-survey.csv")
The survey data set contains 332 observations of 121 variables. The
variables represent the different questions within the survey.
Exploratory Data Analysis
We will perform some exploratory data analysis to gain a better
understanding of the survey data set, and to further prepare the data
for additional analysis related to reliability and validation of the
survey data.
Dealing with the Missing Values
For the first step of our exploratory data analysis, we will check
for and fix any missing value concerns within our data set. First, let’s
check to see if there indeed are some missing values which we will need
to fix.
colSums(is.na(survey))
q1 q2 q3 q41 q42 q43 q44 q45 q46 q47
0 0 0 0 0 0 0 0 0 0
q48 q49 q410 q411 q412 q413 q414 q415 q416 q417
0 0 0 0 0 0 0 0 0 0
q418 q419 q420 q421 q51 q52 q53 q54 q55 q56
0 0 0 0 0 0 0 0 0 0
q61 q62 q63 q7 q81 q82 q83 q84 q85 q86
0 0 0 0 0 0 0 0 0 0
q87 q88 q89 q91 q92 q93 q94 q95 q96 q97
0 0 0 0 0 0 0 0 0 0
q101 q102 q103 q104 q105 q106 q107 q108 q109 q1010
0 0 0 0 0 0 0 0 0 0
q1011 q1012 q1013 q1014 q1015 q111.1 q111.2 q111.3 q112.1 q112.2
0 0 0 0 0 0 0 0 0 0
q112.3 q113.1 q113.2 q113.3 q114.1 q114.2 q114.3 q115.1 q115.2 q115.3
0 0 0 0 0 0 0 0 0 0
q116.1 q116.2 q116.3 q117.1 q117.2 q117.3 q118.1 q118.2 q118.3 q119.1
0 0 0 0 0 0 0 0 0 0
q119.2 q119.3 q1110.1 q1110.2 q1110.3 q1111.1 q1111.2 q1111.3 q121 q122
0 0 0 0 0 0 0 0 0 0
q123 q124 q125 q131 q132 q133 q134 q135 q136 q14
0 0 0 0 0 0 0 0 0 0
q15 q16 q17 q18 q19 q20 q21 q22 q23 q24
0 0 0 0 0 0 0 0 0 0
q25
0
Every column has a count of zero, so the survey data set contains no
missing values and no imputation is needed.
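Printing one count per column is hard to scan with 121 variables. A more compact check is sketched below on a small made-up data frame (`toy` and its columns are hypothetical, not part of the survey). It reports the total number of missing cells and names only the columns that contain any.

```r
# Hypothetical toy data standing in for the survey; NA values are planted on purpose.
toy <- data.frame(
  q1 = c(1, 2, NA, 4),
  q2 = c(3, 3, 2, 1),
  q3 = c(NA, NA, 5, 2)
)

# Total number of missing cells in the whole data frame.
total_missing <- sum(is.na(toy))

# Per-column counts, keeping only the columns with at least one NA.
na_counts <- colSums(is.na(toy))
cols_with_na <- na_counts[na_counts > 0]

print(total_missing)
print(cols_with_na)
```

Applied to the survey itself, the same idea reduces to `sum(is.na(survey))`, which gives the conclusion as a single number.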
Splitting the Survey Data Set
Our survey has two main components: the student experience portion and
the student satisfaction portion. We will split these apart for our
further analysis of the survey data set.
Student Experience Portion
First, let’s separate the student experience portion of the survey
data set. The student experience portion looks at the personal and
academic background of students at the college. This information gives
some background details about the student along with their experience at
the college. The goal of this survey is to see how these experience
factors may affect a student’s college satisfaction.
The student experience portion takes up the majority of the survey
questionnaire, so this subset will contain most of the questions in the
survey data set.
experience = survey[, 1:112]
Now we have created the experience subset of our survey data.
Student Satisfaction Portion
The second portion of the survey data set is the student satisfaction
questions. These questions look at a student’s overall satisfaction and
feelings towards their college. Only two questions in the survey
questionnaire ask about student satisfaction, so this subset will be
much smaller than the student experience portion of the data set.
satisfaction = survey[, 113:114]
Now we have created the satisfaction subset of our survey data.
Reliability Analysis
Now that we have prepared the data set, we will perform internal
reliability and validity assessments in order to further investigate the
survey data set.
Student Experience Reliability Analysis
First, let’s look at the correlation plots for the student experience
subset of our survey data set.
The experience portion of the survey is very large, so let’s take a
smaller subset of this portion in order to make our correlation plot
easier to interpret. We will look at questions 5, 6, and 7, which are
columns 25 through 34 of the data set (q51 to q56, q61 to q63, and q7).
We will call this subset “experience.1”.
experience.1 = survey[, 25:34]
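Positional indexing such as `25:34` silently picks the wrong columns if the file is ever reordered. As an alternative, the sketch below selects the same kind of subset by column name. The data frame `toy` is hypothetical, and the regular expression is an assumption based on the column names printed by `colSums` earlier (q51 to q56, q61 to q63, q7).

```r
# Hypothetical columns mimicking the survey's naming pattern.
toy <- data.frame(q421 = 1:3, q51 = 1:3, q56 = 1:3, q61 = 1:3,
                  q63 = 1:3, q7 = 1:3, q81 = 1:3)

# Keep q51-q56, q61-q63 and q7. The anchors ^ and $ force a full-name match,
# so a longer name that merely starts with "q5" is not picked up by accident.
keep <- grepl("^q(5[1-6]|6[1-3]|7)$", names(toy))
subset_by_name <- toy[, keep, drop = FALSE]

print(names(subset_by_name))
```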
Now, let’s look at the correlation plot for this subset of the
student experience portion of the survey.
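library(corrplot)  # provides corrplot.mixed()
# Pairwise Pearson correlations among the ten items in the subset
M = cor(experience.1)
corrplot.mixed(M, lower.col = "purple", upper = "ellipse", number.cex = .7, tl.cex = 0.7)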

We can see some moderate correlation among the student experience
items: several of the ellipses are moderately long and stretched in
shape.
Student Satisfaction Reliability Analysis
Next, let’s look at the correlation plots for the student
satisfaction subset of the survey data set.
M=cor(satisfaction)
corrplot.mixed(M, lower.col = "purple", upper = "ellipse", number.cex = .7, tl.cex = 0.7)

The satisfaction portion of the survey has only two items, so the plot
shows a single pairwise correlation, which makes it hard to judge any
broader pattern. The ellipse does appear to be noticeably stretched and
long, indicating that some moderate correlation does exist between the
two items.
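With only two items, a single number with a confidence interval may say more than an ellipse. The sketch below uses base R's `cor` and `cor.test` on simulated data. The vectors `item_a` and `item_b` are hypothetical stand-ins, not the survey's satisfaction items.

```r
set.seed(490)  # arbitrary seed so the simulated data are reproducible

# Hypothetical two-item data; item_b is built to correlate with item_a.
item_a <- rnorm(200)
item_b <- 0.5 * item_a + rnorm(200)

# Pearson correlation, plus its test and confidence interval (stats package).
r <- cor(item_a, item_b)
result <- cor.test(item_a, item_b)

print(r)
print(result$conf.int)
```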
Cronbach Alpha Levels
Next, we will calculate the Cronbach alpha levels for each of our two
subsets, along with their 95% confidence intervals. This will allow us
to assess the reliability of the two subsets.
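To make the statistic concrete, the sketch below computes Cronbach's alpha directly from its definition: k / (k - 1) times one minus the ratio of the summed item variances to the variance of the total score. The helper `alpha_by_hand` and the data frame `toy` are hypothetical names introduced for illustration. This is not the routine used in the analysis that follows.

```r
# Cronbach's alpha from its definition:
# alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
alpha_by_hand <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)                    # number of items
  item_vars <- apply(items, 2, var)   # variance of each item
  total_var <- var(rowSums(items))    # variance of the summed score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Hypothetical responses: 6 respondents, 3 items.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 3),
  b = c(2, 2, 3, 5, 4, 3),
  c = c(1, 3, 3, 4, 5, 2)
)

print(alpha_by_hand(toy))
```

Alpha rises when the items covary strongly relative to their individual variances, which is why a set of weakly related items produces a low value.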
Student Experience
First, let’s calculate the Cronbach alpha level for the student
experience portion of the survey data set.
library(psych)   # provides alpha()
library(pander)  # provides pander()
cronbach.e = as.numeric(alpha(experience.1)$total[1])
Some items ( q61 q62 q63 q7 ) were negatively correlated with the first principal component and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
CI.e = cronbach.alpha.CI(alpha=cronbach.e, n=332, items=10, conf.level = 0.95)
CI.comp = cbind(LCI = CI.e[1], alpha = cronbach.e, UCI =CI.e[2])
row.names(CI.comp) = ""
pander(CI.comp, caption="Confidence Interval of Cronbach Alpha")
Confidence Interval of Cronbach Alpha
LCI    | alpha  | UCI
0.4298 | 0.5119 | 0.5866
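The Cronbach alpha level for the experience.1 subset of the student
experience portion is 0.5119, 95% CI [0.4298, 0.5866]. This value is not
incredibly low, but it falls below the 0.70 level commonly treated as
acceptable, so the reliability of this subset is modest at best. The
warning printed above also reports that q61, q62, q63, and q7 were
negatively correlated with the first principal component, so this value
may understate the reliability until those items are reverse-scored.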
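The warning printed with the alpha calculation suggests reversing the flagged items, and names the `check.keys=TRUE` option as one way to do so. Reverse-scoring can also be done by hand, as in the minimal sketch below. The helper `reverse_item` is hypothetical, and the 1 to 5 response scale is an assumption made only for illustration, since the actual scale of q61, q62, q63, and q7 is not stated here.

```r
# Reverse-score an item on a bounded scale: the lowest value becomes the highest
# and vice versa, by reflecting each response around the scale midpoint.
reverse_item <- function(x, scale_min, scale_max) {
  (scale_min + scale_max) - x
}

# Hypothetical responses on an assumed 1-5 scale.
responses <- c(1, 2, 5, 4, 3)
reversed <- reverse_item(responses, scale_min = 1, scale_max = 5)

print(reversed)
```

After the flagged items are reversed, alpha can be recomputed on the adjusted subset and compared with the value reported above.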
Student Satisfaction
Next, let’s calculate the Cronbach alpha level for the student
satisfaction portion of the survey data set.
cronbach.s = as.numeric(alpha(satisfaction)$total[1])
CI.s = cronbach.alpha.CI(alpha=cronbach.s, n=332, items=2, conf.level = 0.95)
CI.comp = cbind(LCI = CI.s[1], alpha = cronbach.s, UCI =CI.s[2])
row.names(CI.comp) = ""
pander(CI.comp, caption="Confidence Interval of Cronbach Alpha")
Confidence Interval of Cronbach Alpha
LCI    | alpha  | UCI
0.4854 | 0.5853 | 0.6658
The Cronbach alpha level for the student satisfaction portion of the
survey data set is 0.5853, 95% CI [0.4854, 0.6658]. Once again, this
value is not that high: it is below the 0.70 level commonly treated as
acceptable, so the reliability is modest. The value is slightly higher
than that of the student experience subset, but the two confidence
intervals overlap, so the difference between the subsets should be
interpreted with caution.
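The intervals above come from `cronbach.alpha.CI`, whose definition is not shown in this report. One common construction is a Feldt-type interval, which treats (1 - alpha) / (1 - estimated alpha) as following an F distribution with n - 1 and (n - 1)(items - 1) degrees of freedom. The sketch below implements that idea under the hypothetical name `feldt_ci`. Whether `cronbach.alpha.CI` uses the same formula is an assumption, so the output is best compared against the reported interval and not taken as a replacement for it.

```r
# Feldt-type confidence interval for Cronbach's alpha.
# feldt_ci is a hypothetical helper, not the report's cronbach.alpha.CI.
feldt_ci <- function(alpha_hat, n, items, conf.level = 0.95) {
  tail_prob <- (1 - conf.level) / 2
  df1 <- n - 1
  df2 <- (n - 1) * (items - 1)
  # The upper F quantile gives the lower bound for alpha, and vice versa.
  lower <- 1 - (1 - alpha_hat) * qf(1 - tail_prob, df1, df2)
  upper <- 1 - (1 - alpha_hat) * qf(tail_prob, df1, df2)
  c(LCI = lower, UCI = upper)
}

# Inputs taken from the satisfaction subset: alpha = 0.5853, n = 332, 2 items.
print(round(feldt_ci(alpha_hat = 0.5853, n = 332, items = 2), 4))
```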
Project Questions
Lastly, we look at some potential project questions based on the
analysis of the survey data set and the results that were found.
Some potential questions include:
Which factors of student experience showed the strongest
influence on student satisfaction?
Did these particular factors with the strongest influence on
satisfaction show stronger reliability when compared to the
others?
Do students appear to be mostly satisfied with their school
experience?
For this project, I would be particularly interested in which student
experience factors have the greatest effect on a student’s satisfaction
with their college experience. This would be helpful because it would
give college faculty and advisors specific factors to look out for in
their students, in order to help them have a more positive and
satisfying experience during their time in college.
I think it would also be interesting to see whether the factors with a
stronger influence on a student’s satisfaction have stronger
reliability. Comparing the reliability results of these particular
factors against the average would show whether they appear stronger or
weaker than the rest. This would help to show whether these findings
truly are reliable.
