Anshul Kumar
Jan 10 2020
This presentation introduces one question that predictive analytic methods can help us answer, using example student data and hypothetical analysis methods.
Similar predictive methods can also be applied to a broad range of questions with different forms and goals.
Link to these slides: tinyurl.com/ClassificationExample2020 (not case sensitive)
| Key you can press | What should happen when you press it |
|---|---|
| A | Toggle between seeing all slides and one slide at a time |
| S | Make everything on the slide smaller |
| B | Make everything on the slide bigger |
BE SURE TO PAUSE THE VIDEO ANY TIME YOU FEEL LIKE IT.
By the end of this presentation, our goals are to:
Identify the types of questions that predictive analytics methods can help us answer in education.
Build intuition about how machine learning algorithms can help us predict group membership using systematically organized data (also known as classification).
I really like to eat noodles and pasta. It’s January 10 2020 and the weather is really cold here in Boston. A perfect day for hot noodles. I cooked some noodles today in a pot of boiling water.
But now I have a problem to solve:
(Right now the noodles and water are together in the pot)
Our goal:
Keep all of the noodles.
Discard all of the water.
To be clear: Our goal is to take EVERYTHING IN THE POT and separate it into two groups:
KEEP and DISCARD.
We will have to use some kind of sorting mechanism to separate the noodles and water, both of which are currently mixed together in the pot.
This is the sorting mechanism we’re going to use. We’ll pour the noodles and water from the pot into a colander filter.
This is called colander filtering.
In plain words:
In predictive analytic / machine learning words:
In addition to liking noodles, I am also an educator and administrator in an educational program.
Here are some key details:
I have a problem to solve:
One-year program timeline for students:
Here’s what we do and don’t know for each cohort:
| Cohort | Fall term grades | Spring term grades | Final exam results |
|---|---|---|---|
| C1: 2018–19 | Yes | Yes | Yes |
| C2: 2019–20 | Yes | No | No |
Remember:
PAUSE TO MAKE SURE THIS MAKES SENSE, BEFORE YOU CONTINUE.
Our goal is to predict final exam results for C2, using only their fall term grades.
To be clear: Our goal is to take all of the students in C2 and sort them into two groups:
Predicted to pass final exam
and
Predicted to fail final exam (at-risk students)
Right now it’s Jan 10 2020, in between fall and spring terms.
The final exam for C2 students is not until May, in 5 months.
If we can identify RIGHT NOW who we think will fail in five months, we can give them extra remedial support and prevent them from failing!
Just like with the noodles, we will have to use some kind of sorting mechanism to separate the students in C2 predicted to fail from the students predicted to pass, all of whom are halfway through the one-year program at this point.
Just imagine that you’re “pouring” students into a colander filter that will catch the at-risk students and let the other ones pass through.
Noodles:
Students:
Use (complete) C1 data to look for patterns.
Apply what we learned from C1 to make predictions on C2.
What we already know:
What we want to know:
We want to exploit the completeness of the C1 student data to predict something unknown about the students in C2.
We ignore the spring 2019 grades for C1 because we don’t have them for C2, so we don’t want to use them to train (calibrate) our predictive model.
Now it’s time to look more closely at the data we have for C1 and C2.
See Excel data spreadsheet in video.
Each row in the data is a student.
We have separate data sheets for C1 and C2.
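The spreadsheet itself only appears in the video. If you are curious what loading this kind of layout might look like in Python with pandas, here is a minimal sketch; the file name, sheet names, and column descriptions are assumptions for illustration, not the real spreadsheet's.

```python
import pandas as pd

# Hypothetical workbook with one sheet per cohort; each row is one student.
c1 = pd.read_excel("cohort_data.xlsx", sheet_name="C1")  # fall 2018 grades, spring 2019 grades, final exam result
c2 = pd.read_excel("cohort_data.xlsx", sheet_name="C2")  # fall 2019 grades only

print(c1.head())  # inspect the first few C1 students
print(c2.head())  # inspect the first few C2 students
```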
We give the fall 2018 data and the final exam results for 75 C1 students to the computer.
The computer applies the SVM (support vector machine) algorithm to these 75 students to “learn” and look for patterns in this data.
We won’t go over the technical details of SVM today.
Remember the 25 C1 students we left out? Now we use them. They were not used to train/calibrate the model. So now we can pretend that these 25 C1 students are actually C2 students about whom we want to make predictions. Very sneaky!
But we know the final exam grades of these 25 students. We are keeping them secret from the computer though.
These 25 students from C1 who we left out of the calibration process are called the testing data. We know the final grades for the testing data students but we leave them out of the model calibration anyway, specifically so that we can use them to test how well our model works.
We will compare the predicted final exam grades to the true final exam grades in the testing data.
Pause and go back if needed.
Here’s what we are doing:
Goal: predict whether C2 students will pass or fail, using patterns in data from C1 students.
Randomly separate the students into two datasets: 75 students in training data, 25 students in testing data.
Use all fall grades (independent variables) and final grades (dependent variable) of training dataset students to train a machine learning model.
Plug the testing dataset students’ fall grades (independent variables) into the machine learning model to see if it predicts whether they passed or failed. Compare these predictions to what actually happened (which we know because we actually have their final results and we’re just pretending that we don’t).
If the accuracy of the predictions (the success rate) is good enough, use the same machine learning model to predict final grades for C2 students (for whom we do not have the actual final grades).
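For readers who want to see what steps 2 through 4 could look like in code, here is a minimal sketch using Python's scikit-learn, continuing the loading sketch above. The column names (`fall_course1` through `fall_course3`, `final_exam_passed`) and the 1 = pass / 0 = fail coding are assumptions for illustration, not the actual analysis from the video.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Independent variables: fall term grades. Dependent variable: final exam
# result, coded 1 = passed, 0 = failed (hypothetical column names).
X = c1[["fall_course1", "fall_course2", "fall_course3"]]
y = c1["final_exam_passed"]

# Step 2: randomly split the 100 C1 students into 75 training and 25 testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=75, test_size=25, random_state=1
)

# Step 3: train (calibrate) the SVM on the 75 training students only.
svm_model = SVC()
svm_model.fit(X_train, y_train)

# Step 4: predict for the 25 held-out testing students and compare to what
# really happened (scikit-learn's matrix puts actual results in rows and
# predictions in columns).
svm_predictions = svm_model.predict(X_test)
print(confusion_matrix(y_test, svm_predictions))
print(accuracy_score(y_test, svm_predictions))
```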
Look at the spreadsheet in the video.
What we wanted:
| | Actually failed | Actually passed |
|---|---|---|
| Predicted to fail | 3 | 0 |
| Predicted to pass | 0 | 22 |
What we actually got:
| | Actually failed | Actually passed |
|---|---|---|
| Predicted to fail | 2 | 6 |
| Predicted to pass | 1 | 16 |
Above:
The accuracy (success rate) of the sorting mechanism is the number of correct predictions divided by the total number of students.
\[\text{accuracy} = \frac{\text{correctly classified students}}{\text{all students}} = \frac{\text{true negatives + true positives}}{\text{total students}}\]
\[\text{ideal desired accuracy} = \frac{3+22}{25} = 1\]
\[\text{Sorting mechanism #1 (SVM) actual accuracy} = \frac{2 + 16}{25} = 0.72\]
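The same arithmetic, written out as a quick check using the counts from the SVM table above:

```python
# Confusion matrix counts from the SVM testing results above.
predicted_fail_and_failed = 2    # true positives
predicted_fail_but_passed = 6    # false positives
predicted_pass_but_failed = 1    # false negatives
predicted_pass_and_passed = 16   # true negatives

total_students = 25
accuracy = (predicted_fail_and_failed + predicted_pass_and_passed) / total_students
print(accuracy)  # 0.72
```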
We give the fall 2018 data and the final exam results for 75 C1 students to the computer.
The computer applies the Random Forest algorithm to these 75 students to “learn” and look for patterns in this data.
We won’t go over the technical details of Random Forest today. It looks for patterns differently than SVM does.
Here’s what we are doing again:
Goal: predict whether C2 students will pass or fail, using data from C1 students.
Randomly separate the students into two datasets: 75 students in training data, 25 students in testing data.
Use all fall grades (independent variables) and final grades (dependent variable) of training dataset students to train a machine learning model.
Plug the testing dataset students’ fall grades (independent variables) into the machine learning model to see if it predicts whether they passed or failed. Compare these predictions to what actually happened (which we know because we actually have their final results and we’re just pretending that we don’t).
If the accuracy of the predictions (the success rate) is good enough, use the same machine learning model to predict final grades for C2 students (for whom we do not have the actual final grades).
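In code, the only thing that would change from the SVM sketch earlier is the algorithm itself; the training/testing split and the comparison against the true results stay the same. A sketch, reusing the variables from that earlier (assumed) setup:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Reuse X_train, X_test, y_train, y_test from the SVM sketch above;
# only the sorting mechanism changes.
forest_model = RandomForestClassifier(random_state=1)
forest_model.fit(X_train, y_train)

forest_predictions = forest_model.predict(X_test)
print(confusion_matrix(y_test, forest_predictions))
print(accuracy_score(y_test, forest_predictions))
```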
What we wanted:
| | Actually failed | Actually passed |
|---|---|---|
| Predicted to fail | 3 | 0 |
| Predicted to pass | 0 | 22 |
What we actually got:
| | Actually failed | Actually passed |
|---|---|---|
| Predicted to fail | 3 | 8 |
| Predicted to pass | 0 | 14 |
Above:
The accuracy (success rate) of the sorting mechanism is the number of correct predictions divided by the total number of students.
\[\text{accuracy} = \frac{\text{correctly classified students}}{\text{all students}} = \frac{\text{true negatives + true positives}}{\text{total students}}\]
\[\text{ideal desired accuracy} = \frac{3+22}{25} = 1\]
\[\text{Sorting mechanism #2 (Random Forest) actual accuracy} = \frac{3 + 14}{25} = 0.68\]
Before we declare which sorting mechanism is best, let’s review what we’re going to use the best sorting mechanism to do:
Using the trained predictive model that made the best predictions on the testing dataset (from C1), we will make predictions about C2 students’ final exam results.
We will provide remedial support for all students in C2 predicted to fail.
We are trying to create a safety net for early detection of students at risk of failing.
Compare predictions made by each sorting mechanism on the 25 testing students from C1:
SVM:
| | Actually failed | Actually passed |
|---|---|---|
| Predicted to fail | 2 | 6 |
| Predicted to pass | 1 | 16 |
Random Forest:
| | Actually failed | Actually passed |
|---|---|---|
| Predicted to fail | 3 | 8 |
| Predicted to pass | 0 | 14 |
Criteria to consider while picking the best one:
Which and how many students will we remediate in each case? Do we have the resources to do so?
There is a trade-off between missing at-risk students (false negatives) and flagging students who would have passed anyway (false positives), which drives up the number of students requiring remediation.
Compare predictions made by each sorting mechanism on the 25 testing students from C1 (with hypothetical C2 numbers in parentheses):
SVM:
| | Actually failed | Actually passed |
|---|---|---|
| Predicted to fail | 2 (8) | 6 (24) |
| Predicted to pass | 1 (4) | 16 (64) |
Random Forest:
| | Actually failed | Actually passed |
|---|---|---|
| Predicted to fail | 3 (12) | 8 (32) |
| Predicted to pass | 0 (0 or 1) | 14 (56) |
Criteria to consider while picking the best one:
Which and how many students will we remediate in each case? Do we have the resources to do so?
In this example (not always), there is a trade-off between missing at-risk students (false negatives) and flagging students who would have passed anyway (false positives), which drives up the number of students requiring remediation.
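One way to make this trade-off concrete is to tally, for each sorting mechanism, how many at-risk students it misses and how many students it flags for remediation. A small sketch using the testing-data counts from the tables above:

```python
# Each entry: (predicted-fail & failed, predicted-fail & passed,
#              predicted-pass & failed, predicted-pass & passed)
testing_results = {
    "SVM": (2, 6, 1, 16),
    "Random Forest": (3, 8, 0, 14),
}

for name, (tp, fp, fn, tn) in testing_results.items():
    flagged_for_remediation = tp + fp   # everyone predicted to fail
    missed_at_risk = fn                 # failing students the model let through
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    print(f"{name}: flags {flagged_for_remediation} students, "
          f"misses {missed_at_risk} at-risk students, accuracy {accuracy:.2f}")
```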
In plain words:
Today, on January 10 2020, we “poured” all of our C2 students into a filter (a sorting mechanism).
The filter allowed students who are not at risk of failing to pass through. But it did not allow students who ARE at risk of failing to pass through; it caught and retained them.
We can now give remedial support to the students who were caught by the filter.
In predictive analytics / machine learning terms:
Using complete previous data (from C1), we made a prediction on incomplete current data (from C2). We classified C2 students into two classes: those predicted to pass and those predicted to fail (at-risk students).
Using predictive analytics, we created an early warning system to make predictions of who in C2 would pass and fail the final exam, five months in advance of the exam happening.
We can’t know for sure how accurate our predictions are for C2; but we have a sense for how accurate they might be, given our predictions and accuracy calculations using the 25-person testing dataset from C1.
We can now give remedial support to the students predicted to fail by the analytic models.
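In code, this last step amounts to feeding the C2 students’ fall grades into whichever trained model we chose. A sketch that continues the earlier ones, with the same assumed column names and 1 = pass / 0 = fail coding (here I use the random forest, but the chosen model could be either one):

```python
# Fall term grades for the current cohort. Their final exam results do not
# exist yet, so there is nothing to compare against; we can only predict.
X_c2 = c2[["fall_course1", "fall_course2", "fall_course3"]]

# Classify every C2 student with the chosen trained model.
c2_predictions = forest_model.predict(X_c2)

# Flag the students predicted to fail so they can be offered remedial support.
at_risk_students = c2[c2_predictions == 0]
print(at_risk_students)
```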
Most important:
Independent variables: These are all of the data that we are using to make a prediction. In our case, these are all of the fall term grades.
Dependent variable: This is what you are trying to predict. In our case, this is the final exam result (pass or fail).
Machine learning: Machine learning is a group of analysis techniques that help us do predictive analytics. Machine learning and predictive analytics are subsets of artificial intelligence and statistical analysis.
Optional:
We are using supervised machine learning in this example.
All of the algorithms above use different statistical and/or algorithmic approaches to predict classes (outcome categories) into which our observations (rows of data) fall.
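In scikit-learn, for example, classification algorithms share the same fit/predict interface, so trying several sorting mechanisms on the same training and testing data is straightforward. A sketch reusing the earlier (assumed) split; the extra algorithms shown here (logistic regression, k-nearest neighbors) are just common examples, not necessarily the ones listed on the slide:

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Each algorithm looks for patterns differently, but the workflow is identical.
candidates = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=1),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: testing accuracy = {score:.2f}")
```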
Generalizability: If C1 and C2 are very different from each other in any way, our predictions for C2’s final grades won’t be accurate. Example: if C2 WILL be affected by COVID-19 disruptions but C1 WASN’T, our predictions could be inaccurate. The computer doesn’t know about COVID-19; it only knows what we feed into it.
In educational analytics, it’s almost impossible to be 100% accurate. Students are not noodles. You should always compare results of your analytics to other sources of information as well as your own intuition and reasoning.
Predictive analytics is as much an art as it is a science. Correctly calibrating a machine learning algorithm for your specific situation can take time and effort. This presentation is a simplified summary of the entire process, just to introduce the concepts. The trade-off between SVM and Random Forest illustrated earlier in this presentation is an example of how human judgment and involvement are critical to predictive analytics.
Quantitative analysis, the topic of this presentation, can often be paired well with qualitative analysis to achieve your goal. Example: In this case, our goal is to minimize the number of students who fail the final exam. In addition to doing our predictive analysis, we could supplement it by qualitatively interviewing a subset of students who passed and failed the final exam in C1. We can ask them which study techniques they used. We can then compare the study techniques used by the students who passed with those of the students who failed. We can then recommend the successful study techniques to the C2 students, or even build them into the spring curriculum.
Analytics should be used only when they are useful to your work. This may not always be the case. Example: I never use a colander because I find it difficult to clean. For me, the benefits of the colander’s ability to filter well are outweighed by the extra cost of extra cleaning work.
This presentation was actually created in June 2020, but I’m pretending that the date is January 2020 because this makes the examples more logical.
Assistance for this presentation came from a number of people in the HPEd and PA programs at MGHIHP.
This presentation was created for the students of the MS and PhD programs in HPEd (health professions education) at MGHIHP.
The data and results used in this presentation were all fabricated for illustrative purposes. But they are reflective of true examples.
Here are some questions to consider:
How can predictive/learning analytics help you in your own work as an educator or at your organization/institution? What predictions would be useful for you to make?
How can you leverage data that your institution already collects (or that it is well-positioned to collect) using predictive analytic methods?
What would be the benefits and detriments of incorporating predictive analytics into your institution’s practices and processes?
Could predictive analysis complement any already-ongoing initiatives at your institution?
What would the ethical implications be of using predictive analytics at your institution? Would it cause unfair discrimination against particular learners? Would it help level the playing field for all learners?