In the absence of data, it seems to make the most sense to think, just as was presented to the SGA by Wendy Agrusa, that students who are closer to graduation would be less likely to benefit from switching to the new Gen Ed and to new major programs. Wendy and Valentina also mentioned that students’ majors might matter as well.
Based on their perfectly reasonable suppositions, one might think that, for example, students from CHSS might have left the most Gen Ed classes for their senior year and might tend to benefit more, and students from CNHS might tend to benefit less. As they said, we cannot know for everyone: there may be a student who has 110 credits but who still has more than 2 semesters of classes. But it seems likely that most students with 110 credits will not benefit from switching.
This simulation was modified considerably on 1 March 2015 when I learned that 55% of our students have more than 90 accumulated credits. I also added a section trying to explain why, even if we cannot perfectly say how many credits a student needs to graduate as a function of accumulated credits, accumulated credits still have utility for predicting credits needed to graduate. If the assumptions I got from Valentina and Wendy are correct, this only strengthens the case against setting it up as an opt-out decision, because their assumption (which I think makes sense) is that students who are farther along will be less likely to benefit from switching to the 201580 catalog term.
In this simulation, I set up a structure like the one that Wendy and Valentina presented to the CHSS College Council, to estimate the number of errors generated by a 100% opt-out strategy like the one we will be adopting vs. an opt-out/opt-in policy based on credits and college.
switch.good <- function(credits, g, b, m) {return(((1-m) / (1 + exp(g*(credits-b)))) + m)}
In the simulation, I use the logistic function to model the probability that switching to the new Gen Ed and major program will be better for a student. Note that the probability here never drops to 0. This indicates that there is always a chance that switching to the new Gen Ed and major program will help, but that this chance does, as Wendy and Valentina suggested, decrease as students get closer to graduation.
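To make the shape of this curve concrete, here is a minimal sketch that evaluates the function at a few credit totals. Written out, the function is \(p(c) = m + (1 - m) / (1 + e^{g(c - b)})\), so `m` is the floor the probability approaches, `b` is the credit total at which the probability is halfway between 1 and `m`, and `g` controls how quickly it falls. The definition of `switch.good` is repeated only so the block runs on its own, the parameter values are the CHSS ones used in the per-college assignments in this document, and the credit totals are arbitrary illustrative choices.

```r
# Same definition as in the document, repeated so this block is self-contained
switch.good <- function(credits, g, b, m) {
  ((1 - m) / (1 + exp(g * (credits - b)))) + m
}

# Illustrative credit totals (arbitrary choices, not from the data)
credits.demo <- c(0, 60, 90, 120, 150)

# CHSS parameters: g = .07 (steepness), b = 90 (midpoint), m = .15 (floor)
p.demo <- switch.good(credits.demo, g = .07, b = 90, m = .15)
round(p.demo, 3)

# At credits == b the probability is exactly (1 + m) / 2
switch.good(90, g = .07, b = 90, m = .15)   # 0.575
```

The probabilities decrease steadily with credits but stay above the floor `m`, which is the "never drops to 0" property described above.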
Based on the Spring 2015 Comparative Enrollment Report, there are the following numbers of undergraduate students in each college. This, I believe, includes students who are currently in OCP, so I do not count them separately (although based on something Wendy said about military credits, they might tend to be in a different situation than non-OCP students).
N <- c(cba = 913, chss = 840, cncs = 809, cnhs = 797)
I was given to understand that 55% of our students have more than 90 accumulated credits. I do not know what the distribution of credits looks like, but I use a truncated negatively skewed ex-Gaussian distribution with \(\mu = 120\), \(\sigma = 35\), and \(\lambda = .04\). I truncate the low end strictly at 0, and at the high end I do a “noisy” truncation at 140.
I am not sure if the distribution is correct, but I do get close to the one data point I was given. In this simulated distribution, 55% of students have more than 90 credits.
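The code that generated the credit distribution is not shown here, so the following is only a sketch of one way such a distribution could be drawn. It assumes that the negatively skewed ex-Gaussian is a normal draw minus an exponential draw, that \(\lambda\) is the exponential rate, that the strict truncation at 0 is done by redrawing, and that the "noisy" truncation at 140 pulls values above 140 back to slightly below it. All of these are assumptions, as are the names `draw.credits` and `credits.sim`, the noise scale of 5, and the seed, so the resulting proportion above 90 credits need not match the 55% reported above.

```r
set.seed(2015)                      # arbitrary seed, for reproducibility only
n.students <- 913 + 840 + 809 + 797 # total undergraduates across the four colleges

# Assumed form: normal(mu, sigma) minus exponential(rate = lambda)
draw.credits <- function(n, mu = 120, sigma = 35, lambda = .04) {
  rnorm(n, mean = mu, sd = sigma) - rexp(n, rate = lambda)
}

credits.sim <- draw.credits(n.students)

# Strict truncation at 0: redraw any negative values until none remain
bad <- credits.sim < 0
while (any(bad)) {
  credits.sim[bad] <- draw.credits(sum(bad))
  bad <- credits.sim < 0
}

# "Noisy" truncation at 140 (assumed mechanism): pull values above 140
# back to 140 minus a small random amount
over <- credits.sim > 140
credits.sim[over] <- 140 - abs(rnorm(sum(over), mean = 0, sd = 5))

# Proportion of simulated students with more than 90 credits
mean(credits.sim > 90)
```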
One concern that I have heard voiced by Valentina and now by the provost is that we cannot deterministically tell how many credits someone needs to graduate based on the number of credits they have. I can see how that is a problem, but I cannot see how that prevents us from using the number of credits in a student’s transcript to predict credits to graduate. This uncertainty is already built into the logistic model above, indirectly. For example, a simulated CHSS student with 120 credits still has about a 25% chance that switching to the new Gen Ed and program will help.
Even if the relationship between credits and credits to graduate looks something like the figure below, it would still have predictive validity. In other words, if you tell me how many credits a student has, I will be able to make a more precise prediction about how many credits he or she needs to graduate than if I do not have that information.
The key idea in this mini simulation (which does not directly influence any of the other simulations) is that, yes, we cannot perfectly predict credits to graduate based on accumulated credits, but we can make an imperfect, noisy prediction that is better than ignoring earned credits altogether. This can be shown by noting that in this simulation, using class level (Freshman, Sophomore, etc.) to predict credits remaining accounts for 83% of the variance. It can also be shown by noting that the correlation between accumulated credits and credits needed to graduate is \(r = -0.92\).
cred.rem.lm <- lm(cred.rem ~ class.level, students)
summary(cred.rem.lm)
##
## Call:
## lm(formula = cred.rem ~ class.level, data = students)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -23.553  -8.135  -1.553   6.447  62.447
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)           108.7603     0.7540  144.24   <2e-16 ***
## class.levelSophomore  -32.2626     0.9395  -34.34   <2e-16 ***
## class.levelJunior     -59.6253     0.8553  -69.71   <2e-16 ***
## class.levelSenior     -85.2072     0.8022 -106.22   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.73 on 3355 degrees of freedom
## Multiple R-squared: 0.8306, Adjusted R-squared: 0.8305
## F-statistic: 5484 on 3 and 3355 DF, p-value: < 2.2e-16
describeBy(cred.rem, students$class.level)
## group: Freshman
##   vars   n   mean sd median trimmed   mad min max range  skew kurtosis   se
## 1    1 242 108.76 10    110  109.35 14.83  90 120    30 -0.21    -1.45 0.64
## --------------------------------------------------------
## group: Sophomore
##   vars   n mean   sd median trimmed   mad min max range skew kurtosis   se
## 1    1 438 76.5 9.97     76   76.17 11.86  60 103    43 0.27    -0.69 0.48
## --------------------------------------------------------
## group: Junior
##   vars   n  mean    sd median trimmed   mad min max range skew kurtosis   se
## 1    1 844 49.14 12.46     48   48.35 13.34  30  93    63 0.57     0.02 0.43
## --------------------------------------------------------
## group: Senior
##   vars    n  mean    sd median trimmed mad min max range skew kurtosis   se
## 1    1 1835 23.55 11.98     22   22.34 8.9   0  86    86  1.2     2.31 0.28
The summary statistics by class level above and the boxplot below show that the regression above can still make good use of class level to predict the number of credits remaining even if, for example, among “seniors,” many still need 50 credits to graduate.
The point I have tried to make in several places in this section is that the inability to predict perfectly is not the same as the complete inability to predict. It seems as if decision makers are adopting a strategy that says, “if we do not have perfect information about every student, we should act as if we have no information.” This may well be justified, but I have not seen a good case for it.
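A minimal sketch of that point, using a made-up noisy relationship rather than the simulated student data: it compares the prediction error from ignoring accumulated credits (always guessing the mean) with the error from a simple linear prediction based on credits. The variable names, the uniform spread of credits, and the skewed noise (some students needing extra credits) are all illustrative assumptions.

```r
set.seed(42)                                 # arbitrary seed
n.demo <- 1000
credits.demo <- runif(n.demo, 0, 140)        # hypothetical accumulated credits

# Hypothetical noisy relationship: roughly 120 minus credits earned,
# plus a skewed amount of "extra" credits that some students still need
rem.demo <- pmax(0, 120 - credits.demo) + rexp(n.demo, rate = 1 / 10)

# Prediction error if accumulated credits are ignored (guess the mean)
rmse.ignore <- sqrt(mean((rem.demo - mean(rem.demo))^2))

# Prediction error if accumulated credits are used in a linear model
fit.demo <- lm(rem.demo ~ credits.demo)
rmse.use <- sqrt(mean(residuals(fit.demo)^2))

c(ignore.credits = rmse.ignore, use.credits = rmse.use)
```

The prediction from credits is far from perfect here, but its error is smaller than the error from ignoring credits, which is all the argument requires.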
students$prob.good[students$college == "CBA"] <-
  switch.good(students$credits[students$college == "CBA"], .11, 80, .10)
students$prob.good[students$college == "CHSS"] <-
  switch.good(students$credits[students$college == "CHSS"], .07, 90, .15)
students$prob.good[students$college == "CNCS"] <-
  switch.good(students$credits[students$college == "CNCS"], .095, 65, .125)
students$prob.good[students$college == "CNHS"] <-
  switch.good(students$credits[students$college == "CNHS"], .12, 70, .075)
## describeBy(students$prob.good, students$college)
This is where the actual simulation happens. It uses each student’s probability that it is a good idea to switch and “flips” a weighted coin. If a student has a 68% chance that it is good for them to switch, the coin will come up heads 68% of the time and tails 32% of the time. This means that a senior who probably should not switch could still turn out to be a student for whom switching really would have been better after all.
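The weighted coin can be illustrated in isolation. In this sketch, a uniform random number below `p` counts as heads, so heads come up with probability `p`; the value 0.68 is the example from the paragraph above, while the seed, the number of flips, and the names `p.demo` and `flips` are arbitrary.

```r
set.seed(2015)                      # arbitrary seed
p.demo <- 0.68                      # chance that switching is good for this student

# runif() draws from Uniform(0, 1), so the comparison is TRUE with probability p.demo
flips <- runif(100000) < p.demo

# The long-run share of heads should be close to 0.68
mean(flips)
```

Applying the same comparison to a whole vector of probabilities gives one independent flip per student.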
students$is.good <- runif(nrow(students)) < students$prob.good
with(students, table(is.good))
## is.good
## FALSE TRUE
## 2009 1350
with(students, table(college, is.good))
## is.good
## college FALSE TRUE
## CBA 524 389
## CHSS 384 456
## CNCS 556 253
## CNHS 545 252
with(students, table(class.level, is.good))
## is.good
## class.level FALSE TRUE
## Freshman 1 241
## Sophomore 39 399
## Junior 400 444
## Senior 1569 266
with(students, ftable(college, class.level, is.good))
##                     is.good FALSE TRUE
## college class.level
## CBA     Freshman                0   66
##         Sophomore               7  125
##         Junior                 84  138
##         Senior                433   60
## CHSS    Freshman                1   65
##         Sophomore               8   91
##         Junior                 48  170
##         Senior                327  130
## CNCS    Freshman                0   56
##         Sophomore              13   81
##         Junior                139   74
##         Senior                404   42
## CNHS    Freshman                0   54
##         Sophomore              11  102
##         Junior                129   62
##         Senior                405   34
Count the number of initial mis-classifications. The students are all still responsible for where they end up. This is just an initial guess based on the overall base-rate, class level, or class level plus college.
students$opt.out.err <- !students$is.good # not good to switch -> mis-classification
students$opt.in.err <- students$is.good # good to switch -> mis-classification
students$level.err[students$class.level == "Freshman"] <-   # Freshmen opt out
  !students$is.good[students$class.level == "Freshman"]
students$level.err[students$class.level == "Sophomore"] <-  # Sophomores opt out
  !students$is.good[students$class.level == "Sophomore"]
students$level.err[students$class.level == "Junior"] <-     # Juniors opt out
  !students$is.good[students$class.level == "Junior"]
students$level.err[students$class.level == "Senior"] <-     # Seniors opt in
  students$is.good[students$class.level == "Senior"]
## Paying attention to college does not help much here, except for CNCS and CNHS Juniors
students$level.college.err <- students$level.err
## Simulated CNCS & CNHS Juniors should actually opt in as well
students$level.college.err[students$class.level == "Junior" &
                           (students$college == "CNCS" | students$college == "CNHS")] <-
  students$is.good[students$class.level == "Junior" &
                   (students$college == "CNCS" | students$college == "CNHS")]
How many initial simulated mis-classifications would there be if every student had to opt out, as we are doing?
with(students, table(opt.out.err))
## opt.out.err
## FALSE TRUE
## 1350 2009
How many initial simulated mis-classifications would there be if every student had to opt in, which is the opposite of what we are doing?
with(students, table(opt.in.err))
## opt.in.err
## FALSE TRUE
## 2009 1350
Based on the simulated results above, if you are just paying attention to class level, you should have Freshmen, Sophomores, and Juniors opt out and Seniors opt in. How many initial simulated mis-classifications would there be if you followed that strategy?
with(students, table(level.err))
## level.err
## FALSE TRUE
## 2653 706
Based on the simulated results above, if you are paying attention to class level and college, you should have Freshmen and Sophomores opt out and Seniors opt in. For Juniors, you should have the CBA and CHSS Juniors opt out and the CNCS and CNHS Juniors opt in. How many initial simulated mis-classifications would there be if you followed that strategy?
with(students, table(level.college.err))
## level.college.err
## FALSE TRUE
## 2785 574
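As a cross-check, the four error counts can be recomputed directly from the college by class level by `is.good` table printed earlier, without the student-level data. This is only a sketch: the counts are copied by hand from that table, and the names `tab`, `count.errors`, `seniors`, and `sci.juniors` are introduced here for illustration. A cell that defaults to switching (opt out) errs on its "not good" students, and a cell that defaults to staying (opt in) errs on its "good" students.

```r
# Counts copied from the college x class.level x is.good table above
tab <- data.frame(
  college  = rep(c("CBA", "CHSS", "CNCS", "CNHS"), each = 4),
  level    = rep(c("Freshman", "Sophomore", "Junior", "Senior"), times = 4),
  not.good = c(0, 7, 84, 433,   1, 8, 48, 327,   0, 13, 139, 404,   0, 11, 129, 405),
  good     = c(66, 125, 138, 60,  65, 91, 170, 130,  56, 81, 74, 42,  54, 102, 62, 34)
)

# opt.in is a logical vector with one entry per cell:
# TRUE = the cell must opt in (default is to stay), FALSE = the cell must opt out
count.errors <- function(tab, opt.in) {
  sum(ifelse(opt.in, tab$good, tab$not.good))
}

seniors     <- tab$level == "Senior"
sci.juniors <- tab$level == "Junior" & tab$college %in% c("CNCS", "CNHS")

errors <- c(
  all.opt.out   = count.errors(tab, rep(FALSE, nrow(tab))),
  all.opt.in    = count.errors(tab, rep(TRUE, nrow(tab))),
  level         = count.errors(tab, seniors),
  level.college = count.errors(tab, seniors | sci.juniors)
)
errors

# Fewest errors achievable by any rule defined on these 16 cells
sum(pmin(tab$good, tab$not.good))
```

The last line gives the same count as the class level plus college strategy, so within this table no other assignment of cells to opt in or opt out would do better.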
I set up this simulation by translating some straightforward assumptions into math. This simulation assumes that as students accumulate credits, the likelihood that switching to the new Gen Ed and their new major programs will be beneficial decreases. I also assumed that the outcome would be probabilistic: given two students in the same college with the same number of credits, one might benefit from switching and one might not.
Given the way I instantiated these mild assumptions, the simulation suggests that we could go from 60% initial mis-classifications (2009 of 3359) to 21% (706 of 3359) just by attending to class level, and to 17% (574 of 3359) by also attending to college.
Each initial mis-classification has some potential to take up a lot of staff and faculty time. If we can cut initial mis-classifications by roughly two thirds, that should cut the amount of time staff and faculty need to devote to fixing initial mis-classifications that turn into problematic mis-classifications. We could also just have a big increase in happy students.
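The percentages behind this summary follow from the counts in the tables above. A short sketch of the arithmetic, with the counts typed in by hand and the names `n.total`, `err.counts`, `err.pct`, and `rel.cut` introduced only for this illustration:

```r
n.total <- 3359                         # simulated students (sum of the four colleges)

# Initial mis-classification counts from the tables above
err.counts <- c(opt.out = 2009, level = 706, level.college = 574)

# Share of students initially mis-classified under each strategy, in percent
err.pct <- round(100 * err.counts / n.total, 1)
err.pct

# Relative reduction in mis-classifications compared with 100% opt-out
rel.cut <- 1 - err.counts[c("level", "level.college")] / err.counts[["opt.out"]]
round(rel.cut, 2)
```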