Presentation information

Link to video that goes with this presentation: https://youtu.be/aql1u3Wi8Dg
Link to these slides: https://rpubs.com/AnshulKumar/ClassifyStudents1 (not case sensitive)
Most of the R code used to make this document can be found here: https://rpubs.com/anshulkumar/EducAnalytics1
You can change how you view this presentation by pressing keys on your keyboard:

Key you can press	What should happen when you press it
A	Toggle between seeing all slides and one slide at a time
S	Make everything on the slide smaller
B	Make everything on the slide bigger
right or left arrow	Go to next or previous slide
space-bar	Go to next slide
C	Show table of contents

BE SURE TO PAUSE THE VIDEO ANY TIME YOU FEEL LIKE IT.

Learning goals of this presentation

By the end of this presentation, our goals are to:

Identify the types of questions that predictive analytics methods can help us answer in education.
Examine real results from a predictive analytic method and brainstorm about how we would use the results.
Build intuition about how machine learning algorithms can help us predict group membership using systematically-organized data (also known as classification).

Sorting mechanism example – Colander filter for spaghetti

Image source: https://www.masterfile.com/image/en/600-02346521/hands-straining-pasta

We have spaghetti and water in a pot. We need to separate them.
We separate them with some kind of sorting mechanism, a colander filter in this case.
In machine learning terms, the colander filter solves a classification problem for us: it “classifies” (sorts) the contents of the pot into two groups.
Our goal is to make a sorting mechanism on the computer to classify (separate from each other) students at risk of failing and those not at risk.
We want to “pour” all of the students into this computerized “filter” and see who it “catches” and identifies as at-risk.

Learning analytics with students

This presentation uses an example from:

P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7. http://www3.dsi.uminho.pt/pcortez/student.pdf.
Download the data from: Student Performance Data Set. UCI Machine Learning Repository. Center for Machine Learning and Intelligent Systems. https://archive.ics.uci.edu/ml/datasets/Student+Performance. Download the file student.zip, then open the file student-por.csv to see the data.
Also available at kaggle.com: https://www.kaggle.com/larsen0966/student-performance-data-set

Overarching but vague goal

Imagine we are teaching a large course with many students…

How can we improve student outcomes and support our students better?

We have some data…

See examples in the video of LAST YEAR’s and THIS YEAR’s data.
n = 649 students in one Portuguese language course LAST YEAR.
A final grade of \(G3 \geq 10\) is a passing grade; anything below 10 is a failing grade.
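As a minimal sketch of this pass/fail rule, the snippet below turns a final grade into a label. It is written in Python for illustration only (the analysis linked at the top of this presentation uses R). The helper name `label_outcome` and the example grades are hypothetical and are not taken from the data set.

```python
PASS_MARK = 10  # from the slide: G3 >= 10 is a passing final grade

def label_outcome(g3, pass_mark=PASS_MARK):
    """Return "pass" if the final grade meets the pass mark, else "fail"."""
    return "pass" if g3 >= pass_mark else "fail"

# Made-up grades on the 0 to 20 scale, chosen only to show both outcomes.
example_grades = [0, 9, 10, 15, 20]
labels = [label_outcome(g) for g in example_grades]
print(labels)  # ['fail', 'fail', 'pass', 'pass', 'pass']
```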

Full analytics process – Roadmap

Order: green, black, red

Specific goal

Using…

The knowledge that 100 students failed and 549 passed LAST YEAR and
the demographic, mid-term, and final grade data on each student from LAST YEAR…

…Can we reduce the number of students who fail THIS YEAR?

Data dictionary / codebook – Data from last year

Attribute	Description (Domain)
sex	student’s sex (binary: female or male)
age	student’s age (numeric: from 15 to 22)
school	student’s school (binary: Gabriel Pereira or Mousinho da Silveira)
address	student’s home address type (binary: urban or rural)
Pstatus	parent’s cohabitation status (binary: living together or apart)
Medu	mother’s education (numeric: from 0 to 4 (a))
Mjob	mother’s job (nominal (b))
Fedu	father’s education (numeric: from 0 to 4 (a))
Fjob	father’s job (nominal (b))
guardian	student’s guardian (nominal: mother, father or other)
famsize	family size (binary: ≤ 3 or > 3)
famrel	quality of family relationships (numeric: from 1 – very bad to 5 – excellent)
reason	reason to choose this school (nominal: close to home, school reputation, course preference or other)
traveltime	home to school travel time (numeric: 1 – < 15 min., 2 – 15 to 30 min., 3 – 30 min. to 1 hour or 4 – > 1 hour).
studytime	weekly study time (numeric: 1 – < 2 hours, 2 – 2 to 5 hours, 3 – 5 to 10 hours or 4 – > 10 hours)
failures	number of past class failures (numeric: n if 1 ≤ n < 3, else 4)
schoolsup	extra educational school support (binary: yes or no)
famsup	family educational support (binary: yes or no)
activities	extra-curricular activities (binary: yes or no)
paidclass	extra paid classes (binary: yes or no)
internet	Internet access at home (binary: yes or no)
nursery	attended nursery school (binary: yes or no)
higher	wants to take higher education (binary: yes or no)
romantic	with a romantic relationship (binary: yes or no)
freetime	free time after school (numeric: from 1 – very low to 5 – very high)
goout	going out with friends (numeric: from 1 – very low to 5 – very high)
Walc	weekend alcohol consumption (numeric: from 1 – very low to 5 – very high)
Dalc	workday alcohol consumption (numeric: from 1 – very low to 5 – very high)
health	current health status (numeric: from 1 – very bad to 5 – very good)
absences	number of school absences (numeric: from 0 to 93)
G1	first period grade (numeric: from 0 to 20)
G2	second period grade (numeric: from 0 to 20)
G3	final grade (numeric: from 0 to 20)

Notes:

a: 0 – none, 1 – primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education.
b: teacher, health care related, civil services (e.g. administrative or police), at home or other.

Source: Table 1 in the Cortez & Silva article (p. 3 of 8).

Data summary and procedure

Here’s what we do and don’t know for each cohort:

Cohort	Demographic information	Mid-term grades	Final grades
Last year	Yes	Yes	Yes
This year	Yes	Yes	No

LAST YEAR’s data is complete. THIS YEAR’s data is incomplete.

Our procedure:

Look for patterns in LAST YEAR’s data.
Predict final grades for THIS YEAR’s students, using only their demographic information and mid-term grades.
Provide extra support to at-risk students in THIS YEAR’s cohort.

How can we achieve our goal?

We need a sorting mechanism that we can “pour” our students into.
The sorting mechanism will tell us who is predicted to fail the course (at-risk students) and who is predicted to pass.

Spaghetti:

Students:

Modified process diagram – Student learning analytics

Split up last year’s data

Important trick!

Randomly select 75% of students for training and 25% for testing.
Before we trust, we must test.
Hide 25% of the data from LAST YEAR, then use it to double-check.
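The split can be sketched as follows. This is an illustration in Python, not the R code behind the slides; the function name `split_students` and the seed value are assumptions made for the example. Only the cohort size (649) and the 75%/25% proportions come from the presentation.

```python
import random

N_STUDENTS = 649        # size of LAST YEAR's cohort, from the slides
TRAIN_FRACTION = 0.75   # share of students used for training

def split_students(student_ids, train_fraction=TRAIN_FRACTION, seed=1):
    """Shuffle the IDs and split them into (training, testing) lists."""
    shuffled = list(student_ids)
    # The seed is an arbitrary choice that makes the shuffle repeatable.
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

train_ids, test_ids = split_students(range(N_STUDENTS))
print(len(train_ids), len(test_ids))  # 487 162
```

Every student lands in exactly one of the two groups, so the hidden 25% never influences the model that is later tested on it.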

Sorting mechanism – Decision Tree

We give LAST YEAR’s data for 487 students to the computer.
The computer builds a decision tree from this training data to predict each student’s final grade (G3).
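To build intuition for what the computer is doing, here is a rough sketch of the single step a decision tree repeats many times: trying every possible split on one feature and keeping the one that best separates the final grades. It is plain Python written for this illustration, not the R code or the actual tree from the slides. The function names and the eight toy rows are invented; a real tree would search all features and keep splitting each branch.

```python
def mean(values):
    return sum(values) / len(values)

def squared_error(values):
    """Sum of squared distances from the group's mean (0 for an empty group)."""
    if not values:
        return 0.0
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)

def fit_one_split(rows, feature, target):
    """Find the threshold on `feature` that minimises squared error in `target`.

    Returns (threshold, left_prediction, right_prediction): rows with
    feature < threshold get left_prediction, all others get right_prediction.
    """
    distinct = sorted({row[feature] for row in rows})
    best = None
    # Candidate thresholds are midpoints between neighbouring distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [row[target] for row in rows if row[feature] < threshold]
        right = [row[target] for row in rows if row[feature] >= threshold]
        error = squared_error(left) + squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]

def predict(split, row, feature):
    """Predict the target for one row using a fitted split."""
    threshold, left_prediction, right_prediction = split
    return left_prediction if row[feature] < threshold else right_prediction

# Invented toy rows (first period grade G1, final grade G3); NOT real data.
toy_rows = [
    {"G1": 6, "G3": 7}, {"G1": 7, "G3": 8}, {"G1": 8, "G3": 8}, {"G1": 9, "G3": 9},
    {"G1": 12, "G3": 13}, {"G1": 13, "G3": 13}, {"G1": 14, "G3": 15}, {"G1": 15, "G3": 15},
]

split = fit_one_split(toy_rows, "G1", "G3")
print(split)                                # (10.5, 8.0, 14.0)
print(predict(split, {"G1": 11}, "G1"))     # 14.0
```

In this toy example the best question to ask is "Is G1 below 10.5?", and each side of the split predicts the average final grade of the students who fell on that side.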

What is a decision tree?

Feature (variable) importance in decision tree

Feature	Importance
G1	1.5531560
failures	0.5927510
age	0.5624101
absences	0.2821250
school	0.2774309
higher	0.2386127
Dalc	0.2171346
Mjob	0.1712615
Walc	0.1241261
schoolsup	0.1174620
Medu	0.0997061
activities	0.0802697
reason	0.0753583
Fedu	0.0465667
studytime	0.0334740
sex	0.0000000
address	0.0000000
famsize	0.0000000
Pstatus	0.0000000
Fjob	0.0000000
guardian	0.0000000
traveltime	0.0000000
famsup	0.0000000
paid	0.0000000
nursery	0.0000000
internet	0.0000000
romantic	0.0000000
famrel	0.0000000
freetime	0.0000000
goout	0.0000000
health	0.0000000

Test accuracy with the 162 students we left out before

162 students from LAST YEAR were not used to create the decision tree model. We know their final grades.

Model testing procedure:

Pretend that these students were not part of LAST YEAR’s cohort.
Ask the computer to predict their final grades.
But wait! We actually know their final grades.
Compare true final grades to predicted final grades.
If the predictions were close enough, use this model to make predictions for THIS YEAR’s students.

Did we achieve our goal?

Look at the spreadsheet in the video.

Confusion matrix – Predicted results

	Actually failed	Actually passed
Predicted to fail	24	12
Predicted to pass	5	121

Out of the 162 students used to test the model:

24 people who actually failed were correctly predicted to fail
12 people who actually passed were incorrectly predicted to fail
5 people who actually failed were incorrectly predicted to pass
121 people who actually passed were correctly predicted to pass
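The four counts above can be produced by comparing each student's true final grade with the predicted one. The Python sketch below shows one way to do that. It assumes that a predicted grade below the cutoff means "predicted to fail", which is how the cutoff on these slides appears to work. The function name `confusion_counts` and the six example students are invented for illustration.

```python
def confusion_counts(actual_grades, predicted_grades, cutoff=10, pass_mark=10):
    """Count students in each (predicted, actual) pass/fail combination.

    Assumption: predicted grade < cutoff means "predicted to fail";
    actual grade < pass_mark means "actually failed".
    """
    counts = {
        ("fail", "fail"): 0,  # predicted to fail, actually failed
        ("fail", "pass"): 0,  # predicted to fail, actually passed
        ("pass", "fail"): 0,  # predicted to pass, actually failed
        ("pass", "pass"): 0,  # predicted to pass, actually passed
    }
    for actual, predicted in zip(actual_grades, predicted_grades):
        predicted_label = "fail" if predicted < cutoff else "pass"
        actual_label = "fail" if actual < pass_mark else "pass"
        counts[(predicted_label, actual_label)] += 1
    return counts

# Six made-up students; NOT the 162 test students from the slides.
actual = [8, 12, 9, 15, 11, 6]
predicted = [9.0, 9.5, 10.5, 14.0, 12.0, 7.0]

at_10 = confusion_counts(actual, predicted, cutoff=10)
at_11_5 = confusion_counts(actual, predicted, cutoff=11.5)
print(at_10)
print(at_11_5)
```

With these made-up grades, raising the cutoff from 10 to 11.5 moves the one missed student (true grade 9, predicted 10.5) into the "predicted to fail" group.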

Pass/fail cutoff for predictions: 10

Success metrics – cutoff = 10

	Actually failed	Actually passed
Predicted to fail	24	12
Predicted to pass	5	121

\[\text{accuracy} = \frac{\text{correct predictions}}{\text{total predictions attempted}} = \frac{24+121}{162} = .895\]

\[\text{students to remediate} = 24+12 = 36\]

\[\text{students who fell through the cracks} = 5\]
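The same arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `success_metrics` and its argument names are made up for this example, while the four counts are the ones in the confusion matrix above.

```python
def success_metrics(fail_fail, fail_pass, pass_fail, pass_pass):
    """Compute the three metrics from the four confusion matrix counts.

    Argument names read as predicted_actual, so fail_pass means
    "predicted to fail, actually passed".
    """
    total = fail_fail + fail_pass + pass_fail + pass_pass
    return {
        "accuracy": (fail_fail + pass_pass) / total,
        "to_remediate": fail_fail + fail_pass,   # everyone predicted to fail
        "fell_through_cracks": pass_fail,        # failed but predicted to pass
    }

metrics = success_metrics(fail_fail=24, fail_pass=12, pass_fail=5, pass_pass=121)
print(round(metrics["accuracy"], 3))    # 0.895
print(metrics["to_remediate"])          # 36
print(metrics["fell_through_cracks"])   # 5
```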

Pass/fail cutoff for predictions: 10

PAUSE TO REVIEW IF YOU WANT

Success metrics – cutoff = 11.5 (increased)

	Actually failed	Actually passed
Predicted to fail	29	56
Predicted to pass	0	77

\[\text{accuracy} = \frac{\text{correct predictions}}{\text{total predictions attempted}} = \frac{29+77}{162} = .654\]

\[\text{students to remediate} = 29+56 = 85\]

\[\text{students who fell through the cracks} = 0\]
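One way to see the trade-off between the two cutoffs is to put both confusion matrices into the same calculation. The Python sketch below does this using only the counts shown on the slides. The dictionary layout, the function name `summarize`, and the extra "share of failures flagged" figure are choices made for this illustration.

```python
# Counts copied from the two confusion matrices in the slides.
# Keys read as predicted_actual: "fail_pass" = predicted to fail, actually passed.
matrices = {
    10:   {"fail_fail": 24, "fail_pass": 12, "pass_fail": 5, "pass_pass": 121},
    11.5: {"fail_fail": 29, "fail_pass": 56, "pass_fail": 0, "pass_pass": 77},
}

def summarize(m):
    """Return the slide metrics plus the share of actual failures that were flagged."""
    total = sum(m.values())
    actually_failed = m["fail_fail"] + m["pass_fail"]
    return {
        "accuracy": (m["fail_fail"] + m["pass_pass"]) / total,
        "to_remediate": m["fail_fail"] + m["fail_pass"],
        "missed": m["pass_fail"],
        "share_of_failures_flagged": m["fail_fail"] / actually_failed,
    }

low = summarize(matrices[10])
high = summarize(matrices[11.5])

extra_remediated = high["to_remediate"] - low["to_remediate"]  # 85 - 36
extra_caught = low["missed"] - high["missed"]                  # 5 - 0

print(round(low["accuracy"], 3), round(high["accuracy"], 3))   # 0.895 0.654
print(extra_remediated, extra_caught)                          # 49 5
print(round(low["share_of_failures_flagged"], 3),
      high["share_of_failures_flagged"])                       # 0.828 1.0
```

Raising the cutoff to 11.5 catches the 5 students who were missed at cutoff 10, at the price of offering remediation to 49 more students and a lower overall accuracy.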

Pass/fail cutoff for predictions: 11.5

PAUSE TO REVIEW IF YOU WANT

Side-by-side comparison

Decision tree prediction, cutoff = 10:

	Actually failed	Actually passed
Predicted to fail	24	12
Predicted to pass	5	121

Metrics:

Accuracy = .895
Students to remediate = 36
Students falling through cracks = 5

Decision tree prediction, cutoff = 11.5:

	Actually failed	Actually passed
Predicted to fail	29	56
Predicted to pass	0	77