Introduction
This research project examines the level of student engagement and
the overall academic experience of undergraduate students enrolled at
two business schools in the US.
We examine the relationships among several of the survey variables.
The goal is to identify factors that support or hinder student success
and satisfaction at these universities.
The survey given includes the following sections:
Students’ Engagement in Learning
Student Learning Styles
Writing and Reading Load
Remedial Experience
Encouragement and Support
Growth and Development
Campus Resource Utilization
Retention
How Students Pay for College
The purpose of our analysis is to find patterns in student responses
and examine how these factors correlate with a student’s sense of
academic belonging, satisfaction, and potential persistence to
graduation.
Data Management
This section will prepare the survey data for analysis by handling
missing values across multiple sections of the questionnaire.
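The report does not state which method was used for the missing values. As one possible illustration only, the sketch below fills each missing response with the median of the observed answers to the same item. The helper name `impute_median` and the use of `None` to mark a missing answer are assumptions made for this example.

```python
from statistics import median

def impute_median(rows):
    """Replace None in each item (column) with the median of its observed values.

    `rows` is a list of respondents, each a list of item scores (None = missing).
    Hypothetical helper: the report does not say which method it used.
    """
    n_items = len(rows[0])
    medians = []
    for j in range(n_items):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if row[j] is None else row[j] for j in range(n_items)]
        for row in rows
    ]

# Made-up responses: 3 respondents x 3 items
responses = [
    [4, 3, None],
    [5, None, 2],
    [3, 3, 4],
]
completed = impute_median(responses)
```

Median imputation keeps every respondent in the data set, but it also shrinks the variance of the imputed items, so it is worth comparing against simply dropping incomplete rows.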
Students’ Engagement in Learning
Student Learning Styles
Writing and Reading Load
Remedial Experience
Encouragement and Support
Growth and Development
Campus Resource Utilization
Retention
How Students Pay for College
Reliability of the Analysis
For each section we first compute a correlation matrix of its items.
If many item pairs are negatively correlated, we split the items into
subscales; if all items are positively correlated, we compute
Cronbach’s alpha.
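The decision rule described here can be sketched as follows. This is a minimal illustration on made-up data, not the code used for the report: the function names are hypothetical, and alpha is computed with the standard formula k/(k-1) * (1 - sum of item variances / variance of the total score).

```python
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equal-length sequences (neither constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlation_matrix(rows):
    """Item-by-item correlation matrix; rows are respondents, columns are items."""
    cols = list(zip(*rows))
    return [[pearson(a, b) for b in cols] for a in cols]

def all_positive(corr):
    """True when every off-diagonal correlation is greater than zero."""
    k = len(corr)
    return all(corr[i][j] > 0 for i in range(k) for j in range(k) if i != j)

def cronbach_alpha(rows):
    """Cronbach's alpha from the item variances and the total-score variance."""
    cols = list(zip(*rows))
    k = len(cols)
    item_var = sum(variance(c) for c in cols)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)

# Made-up responses: 5 respondents x 3 items
section = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 5]]
corr = correlation_matrix(section)
# Only compute alpha when no item pair is negatively correlated
alpha = cronbach_alpha(section) if all_positive(corr) else None
```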
Students’ Engagement in Learning

The correlation matrix shows many negatively correlated item pairs.
Because of this, we will have to split the items into separate
subscales.
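When deciding how to split a section, it can help to list exactly which item pairs are negatively correlated. The sketch below does that on invented data; the item names follow the report's `q4x` naming, but the responses and the helper `negative_pairs` are assumptions made for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences (neither constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def negative_pairs(rows, names):
    """Return (item_a, item_b, r) for every pair of items with r < 0."""
    cols = list(zip(*rows))
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = pearson(cols[i], cols[j])
            if r < 0:
                pairs.append((names[i], names[j], round(r, 3)))
    return pairs

# Invented responses: 5 respondents x 3 items
data = [[1, 5, 2], [2, 4, 2], [3, 3, 4], [4, 2, 4], [5, 1, 5]]
pairs = negative_pairs(data, ["q41", "q42", "q43"])
```

Items that appear in many negative pairs are natural candidates to move into their own subscale.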
Student Learning Styles

The correlation matrix for the Student Learning Styles section shows
high correlations among the questions.
Confidence Interval of Cronbach’s Alpha
| Lower bound | Alpha | Upper bound |
| --- | --- | --- |
| 0.8325 | 0.8508 | 0.8677 |
Because all of the correlations were strongly positive, we computed
Cronbach’s alpha. We obtained an alpha of 0.8508 with a confidence
interval of (0.8325, 0.8677). These values are high, which indicates a
high level of internal consistency.
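The report does not say how the confidence interval for alpha was obtained. One common option, shown here purely as an assumption, is a percentile bootstrap: resample respondents with replacement, recompute alpha each time, and take the middle 95% of the estimates. The function names and the data below are hypothetical.

```python
import random
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows are respondents, columns are items."""
    cols = list(zip(*rows))
    k = len(cols)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(variance(c) for c in cols) / total_var)

def bootstrap_alpha_ci(rows, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for alpha (assumed method, not the report's)."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    while len(estimates) < n_boot:
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(cronbach_alpha(sample))
        except ZeroDivisionError:
            continue  # degenerate resample with zero total-score variance
    estimates.sort()
    lo_idx = round((1 - level) / 2 * (n_boot - 1))
    hi_idx = round((1 + level) / 2 * (n_boot - 1))
    return estimates[lo_idx], estimates[hi_idx]

# Made-up responses: 8 respondents x 3 items
data = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5],
        [5, 5, 5], [2, 3, 2], [4, 3, 4], [3, 3, 3]]
lower, upper = bootstrap_alpha_ci(data, n_boot=200, seed=1)
```

A narrow interval, such as the one reported for this section, indicates that the alpha estimate is stable across respondents.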
Writing and Reading Load

The correlation matrix shows that all of the items are positively
correlated, so we compute Cronbach’s alpha.
Confidence Interval of Cronbach’s Alpha
| Lower bound | Alpha | Upper bound |
| --- | --- | --- |
| 0.3965 | 0.4703 | 0.5365 |
For this section we obtained a lower Cronbach’s alpha of 0.4703 with a
confidence interval of (0.3965, 0.5365). An alpha this low indicates
poor internal consistency, so we cannot accept this section as a
reliable scale.
Encouragement and Support

The correlation matrix shows that all of the correlations are
positive, so we compute Cronbach’s alpha.
Confidence Interval of Cronbach’s Alpha
| Lower bound | Alpha | Upper bound |
| --- | --- | --- |
| 0.7851 | 0.8083 | 0.8298 |
We obtained a higher Cronbach’s alpha of 0.8083 with a confidence
interval of (0.7851, 0.8298). This indicates high internal consistency
among the questions in this section.
Campus Resource Utilization

The correlation matrix above shows a nearly equal mix of positive and
negative correlations, so we will have to split the items in this
section into subscales.

We separated the items into three subscales. This did not reveal any
clearer pattern; the matrix still contains many negative as well as
positive correlations. Since we cannot remove the negative correlations
from the matrix, we will not compute Cronbach’s alpha and will treat
this section as unusable.
Retention

In this correlation matrix every pair of items is positively
correlated, so we can calculate Cronbach’s alpha.
Confidence Interval of Cronbach’s Alpha
| Lower bound | Alpha | Upper bound |
| --- | --- | --- |
| 0.8589 | 0.8747 | 0.8891 |
We obtained a high Cronbach’s alpha of 0.8747 with a confidence
interval of (0.8589, 0.8891), so this section has high internal
consistency.
How Students Pay for College

The matrix for this section has a mix of negatively and positively
correlated items, so we cannot compute Cronbach’s alpha directly.
Instead, we split the How Students Pay for College section into two
subscales: personal income/savings and outside assistance.
The new matrix for outside assistance alone is an improvement over the
previous one. All of its correlations are positive, so we can compute
Cronbach’s alpha.
Confidence Interval of Cronbach’s Alpha
| Lower bound | Alpha | Upper bound |
| --- | --- | --- |
| 0.5828 | 0.6309 | 0.6747 |
This subscale has a lower Cronbach’s alpha of 0.6309 with a confidence
interval of (0.5828, 0.6747). With alpha at only 0.63, it is
questionable whether this subscale should be used.
PCA
This section presents the results of the principal component analysis
(PCA). The principal components retained for each section will be added
to the analytic data set for use in regression modeling.
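To use a principal component in a regression, each student needs a score on that component. A minimal sketch, assuming the PCA was run on standardized items: the score is the sum of the standardized responses weighted by the loadings. The responses below are made up, the weights reuse the PC1 loadings this report lists for q61 to q63 only as example numbers, and `component_scores` is a hypothetical helper.

```python
from statistics import mean, stdev

def standardize(column):
    """Center a column at 0 and scale it to unit sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]

def component_scores(rows, loadings):
    """Score for respondent i = sum over items j of loading_j * z_ij."""
    z_cols = [standardize(col) for col in zip(*rows)]
    z_rows = zip(*z_cols)
    return [sum(w * z for w, z in zip(loadings, z_row)) for z_row in z_rows]

# Made-up responses: 5 respondents x 3 items
responses = [[1, 2, 1], [2, 1, 2], [3, 3, 3], [4, 2, 5], [5, 4, 4]]
pc1_loadings = [0.676, 0.307, 0.670]  # example weights taken from the q61-q63 table
scores = component_scores(responses, pc1_loadings)
```

The resulting `scores` column is what would be merged into the analytic data set as a predictor or response.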
Students’ Engagement in Learning PCA
We first generate a plot to help us decide, both visually and
statistically, how many underlying dimensions the engagement data
contain. We do this before running PCA or using PCs in regression.

The plot above suggests that about two components are appropriate, so
we proceed with principal component analysis.
Factor loadings of the first two principal components in the
Engagement questionnaire.
| Item | PC1 | PC2 |
| --- | --- | --- |
| q41 | -0.241 | 0.202 |
| q42 | -0.280 | 0.050 |
| q43 | -0.229 | -0.039 |
| q44 | -0.215 | -0.184 |
| q45 | -0.074 | -0.082 |
| q46 | -0.168 | -0.284 |
| q47 | -0.265 | 0.030 |
| q48 | -0.204 | 0.220 |
| q49 | -0.254 | 0.163 |
| q410 | -0.018 | -0.462 |
| q411 | -0.074 | -0.470 |
| q412 | -0.285 | 0.047 |
| q413 | -0.273 | -0.078 |
| q414 | -0.257 | -0.130 |
| q415 | -0.187 | -0.347 |
| q416 | -0.165 | -0.278 |
| q417 | -0.278 | 0.164 |
| q418 | -0.286 | 0.105 |
| q419 | -0.222 | 0.134 |
| q420 | -0.237 | 0.144 |
| q421 | 0.008 | -0.156 |
Proportion and cumulative proportion of variance explained by each
principal component in the Engagement survey.
| Statistic | PC1 | PC2 |
| --- | --- | --- |
| Standard deviation | 2.487 | 1.701 |
| Proportion of Variance | 0.294 | 0.138 |
| Cumulative Proportion | 0.294 | 0.432 |
Both components explain a relatively small proportion of the variance:
PC1 explains 29.4% and PC2 explains 13.8%, for a cumulative 43.2%.
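The proportions can be reproduced from the reported standard deviations. The sketch below assumes the PCA was run on standardized items, so that the total variance equals the number of items (21 in this section); the report does not state this, but the numbers are consistent with it. The helper name is hypothetical.

```python
def variance_explained(sdevs, n_items):
    """Proportion of total variance per component.

    Assumes PCA on standardized items, so total variance = n_items and each
    component's variance is its standard deviation squared.
    """
    return [sd ** 2 / n_items for sd in sdevs]

# Standard deviations reported for PC1 and PC2 of the engagement section
proportions = variance_explained([2.487, 1.701], n_items=21)
cumulative = sum(proportions)
```

Small differences in the third decimal place are expected, because the published standard deviations are themselves rounded.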

The distribution plotted here appears to be skewed to the right.
Further analysis will be needed to decide whether a Box-Cox
transformation is required to fix the distributional issue in the
regression residuals.
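For reference, the Box-Cox transform of a positive value y is (y^lambda - 1) / lambda, with log(y) used when lambda is 0. The sketch below implements just that formula. Because component scores can be negative, it first shifts them to be strictly positive; that shift, the choice of lambda, and the scores themselves are assumptions for illustration. In practice lambda is usually estimated from the data.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def shift_positive(values, margin=1.0):
    """Shift values so the smallest one equals `margin` (assumed preprocessing)."""
    offset = margin - min(values)
    return [v + offset for v in values]

# Made-up right-skewed component scores
scores = [-1.2, -0.8, -0.5, 0.1, 2.4]
transformed = box_cox(shift_positive(scores), lam=0)
```

With lambda below 1 the transform compresses large values more than small ones, which is what pulls in a long right tail.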
Student Learning Styles PCA

The plot above suggests using one principal component.
Factor loadings of the first principal component in the Learning
Style questionnaire.
| Item | PC1 |
| --- | --- |
| q51 | 0.279 |
| q52 | 0.420 |
| q53 | 0.454 |
| q54 | 0.413 |
| q55 | 0.453 |
| q56 | 0.405 |
Proportion and cumulative proportion of variance explained by the
principal component in the Learning Style survey.
| Statistic | PC1 |
| --- | --- |
| Standard deviation | 1.877 |
| Proportion of Variance | 0.587 |
| Cumulative Proportion | 0.587 |
The first principal component explains a larger proportion of the
variance here, around 59%.

The plot above shows that the distribution appears to be skewed to the
right. Further analysis is needed to decide whether a Box-Cox
transformation is required to fix the distributional issue in the
regression residuals.
Writing and Reading Load PCA

The plot above suggests using one principal component.
Factor loadings of the first principal component in the Writing and
Reading questionnaire.
| Item | PC1 |
| --- | --- |
| q61 | 0.676 |
| q62 | 0.307 |
| q63 | 0.670 |
Proportion and cumulative proportion of variance explained by the
principal component in the Writing and Reading survey.
| Statistic | PC1 |
| --- | --- |
| Standard deviation | 1.216 |
| Proportion of Variance | 0.493 |
| Cumulative Proportion | 0.493 |
The first principal component explains around 49% of the variance.

The distribution plot is skewed to the right. Further analysis is
needed to decide whether a Box-Cox transformation is required.
Encouragement and Support PCA

The output above suggests using the first two principal components.
Factor loadings of the first two principal components in the
Encouragement and Support questionnaire.
| Item | PC1 | PC2 |
| --- | --- | --- |
| q91 | 0.297 | -0.378 |
| q92 | 0.426 | -0.162 |
| q93 | 0.416 | -0.178 |
| q94 | 0.435 | 0.347 |
| q95 | 0.430 | 0.331 |
| q96 | 0.376 | 0.278 |
| q97 | 0.203 | -0.701 |
Proportion and cumulative proportion of variance explained by each
principal component in the Encouragement and Support survey.
| Statistic | PC1 | PC2 |
| --- | --- | --- |
| Standard deviation | 1.832 | 1.102 |
| Proportion of Variance | 0.480 | 0.173 |
| Cumulative Proportion | 0.480 | 0.653 |
Our first component explains 48% of the total variance and the second
component only explains 17% of the total variance. We will only be using
the first component in our analysis.

The distribution shown above is approximately normal.
Retention PCA

The output above suggests using one component.
Factor loadings of the first principal component in the Retention
questionnaire.
| Item | PC1 |
| --- | --- |
| q121 | -0.458 |
| q122 | -0.461 |
| q123 | -0.439 |
| q124 | -0.455 |
| q125 | -0.422 |
Proportion and cumulative proportion of variance explained by the
principal component in the Retention survey.
| Statistic | PC1 |
| --- | --- |
| Standard deviation | 1.835 |
| Proportion of Variance | 0.673 |
| Cumulative Proportion | 0.673 |
The first PC explains 67% of our total variance.

The plot is skewed to the right, so we might need to perform a Box-Cox
transformation. Further analysis is needed to decide.
How Students Pay for College PCA

The plot above suggests using one PC.
Factor loadings of the first principal component in the Pay
questionnaire.
| Item | PC1 |
| --- | --- |
| q133 | -0.418 |
| q134 | -0.586 |
| q135 | -0.528 |
| q136 | -0.451 |
Proportion and cumulative proportion of variance explained by the
principal component in the Pay survey.
| Statistic | PC1 |
| --- | --- |
| Standard deviation | 1.391 |
| Proportion of Variance | 0.484 |
| Cumulative Proportion | 0.484 |
This PC explains 48.4% of the variance.

The graph above is skewed to the right. We will need to perform a
Box-Cox transformation to fix the distributional issue.
Project Questions
How well does student engagement inside and outside the classroom
predict overall satisfaction with the college experience?
Does how a student pays for school correlate with the student’s
engagement in school?
The first question explores the idea that students who are more
actively engaged in their learning may feel a stronger sense of
affiliation with their academic experience. The survey data provided to
us describe student engagement in detail through 21 questions (Section
4), which gives us enough variables to represent engagement.
The second question stems from the assumption that students who pay
for college themselves are more likely to be engaged in their learning.
Looking at these two questions together can offer meaningful insight
into how a student’s engagement depends on whether they paid for school
themselves or someone else did.
