School Lottery Working Example

I have created a working example of the Technical School student assignment lottery using this R Markdown. Use of R Markdown allows for all steps, including code, inputs, and outputs to be reviewed. I believe this is one example of how to have transparency throughout this process.

This document will follow the process I outlined previously in the flowchart. This process consists of five overarching steps:

Here’s the visual as a reminder:

Addressing Fairness and Randomness

Fairness

1.The proposed system provides equal opportunity to all students while using weighting criteria. Every student is entered into the lottery with the exception of those who are ineligible. The weighting criteria can include additional priority layering based on policy if the district chooses to do so.

2.The methods can be shared in their entirety with the public to provide transparency. The process is intended to be well documented and available for public review if requested.

3.Capacity of schools is considered allowing for the proper amount of students to be assigned to schools.

Randomness

1.The proposed process using a random shuffling function to generate the lottery order. Using R we are able to record the random seed so results can be replicated.

2.There are no guarantees when assigning weights to students. While odds increase, no student is guaranteed a seat unless capacity is larger than the number of applied student.

3.The code is unbiased. The only factors considered are those outlined in 603 CMR 4.00(d). Other factors cannot impact the results of the lottery.

4.The wait list is a continuation of the random list produced. After assigned to CTE School remaining students keep their existing lottery order within the waitlist.

Let’s jump in.

Identify Problem

Here I intend to create a fair and random system for a lottery for students interested in entering career and technical education high schools.

Using this format would allow for transparency, the electronic lottery to be conducted publicly, and public sharing of the methodology used.

Below I include critical information that was shared in the scenario. This information was used to guide much of the decision making in the process.

  1. We are assuming more students than seats. N = 150, this will result in a wait list for the school.

  2. There is specific criteria for weighting student lottery entries:

    1. Student Attendance

    2. Student Discipline

    3. Student Interest

Evaluate Data Structure

# Add Packages
#install.packages("dplyr")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
library(knitr)
## Warning: package 'knitr' was built under R version 4.4.1
# Load in data
##set working directory
setwd("~/Downloads")
##create data frame
student_list <- read_csv("SPS_Example.csv")
## Rows: 7 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Student_Name, CTE_Application, Met_Attendance_Criteria, Met_Discipl...
## dbl (2): Ballot_ID, Student_ID
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
##take a look at first few rows of data
kable(student_list[1:5,])
Ballot_ID Student_ID Student_Name CTE_Application Met_Attendance_Criteria Met_Discipline_Criteria Met_Student_Interest_Criteria
100001 12345678 Student One No NA NA NA
100002 89101112 Student Two Yes Yes Yes Yes
100003 13141516 Student Three Yes Yes Yes No
100004 17181920 Student Four Yes No Yes No
100005 21222324 Student Five Yes No Yes Yes
##check for duplicate data
duplicates <- student_list[student_list$Student_ID %in% student_list$Student_ID[duplicated(student_list$Student_ID)], ]

Things I’ve noticed:

  • There are some students who did not apply for career and technical education high school. I am making the assumption to exclude them from lottery. They will be removed during process.

  • Criteria are coded as Yes/No. I will assume criteria = Yes will mean that the student gains a weight. No will result in no additional entries for the student.

  • I will use Student_ID as the identifier variable moving forward.

  • No duplicates identified.

Assign Weights

Here I will create a new variable Weight. Things that will contribute to adding value to weight:

  1. All students will start with Weight = 1, we can think of this as one entry into lottery
  2. Each Criteria = Yes, will add 1 to weight
#Update data set to create new variable Weight based on the values of the criteria
student_list <- student_list %>%
  mutate(Weight = 1 +
           ifelse(Met_Attendance_Criteria == "Yes", 1, 0) +
           ifelse(Met_Discipline_Criteria == "Yes", 1, 0) +
           ifelse(Met_Student_Interest_Criteria == "Yes", 1, 0)
         )
#excluding students who did not apply to technical high school
student_list <- student_list %>%
  filter(CTE_Application != "No")

kable(student_list[1:7,])
Ballot_ID Student_ID Student_Name CTE_Application Met_Attendance_Criteria Met_Discipline_Criteria Met_Student_Interest_Criteria Weight
100002 89101112 Student Two Yes Yes Yes Yes 4
100003 13141516 Student Three Yes Yes Yes No 3
100004 17181920 Student Four Yes No Yes No 2
100005 21222324 Student Five Yes No Yes Yes 3
100006 31323334 Student Six Yes No No Yes 2
100007 45678910 Student Seven Yes No No No 1
NA NA NA NA NA NA NA NA
#Create new data set to show each row as ind. student entry (ex. if student has weight = 3 they receive 3 entire i.e. have three rows)
lottery <- student_list[rep(1:nrow(student_list), student_list$Weight), ]
kable(lottery[1:7,])
Ballot_ID Student_ID Student_Name CTE_Application Met_Attendance_Criteria Met_Discipline_Criteria Met_Student_Interest_Criteria Weight
100002 89101112 Student Two Yes Yes Yes Yes 4
100002 89101112 Student Two Yes Yes Yes Yes 4
100002 89101112 Student Two Yes Yes Yes Yes 4
100002 89101112 Student Two Yes Yes Yes Yes 4
100003 13141516 Student Three Yes Yes Yes No 3
100003 13141516 Student Three Yes Yes Yes No 3
100003 13141516 Student Three Yes Yes Yes No 3

Randomize

Here I will take the lottery list that was generated and add a varible LotteryRank that we can think of as the pull order. R comes with the function “set seed” that will allow for the random order to remain the same when replicating. I’ll use 811 as my seed.

#set random seed. Allow to replicate
set.seed(811)
#reorganize data to shuffle and assign lottery number to student
lottery <- lottery %>%
  sample_n(nrow(.)) %>%
  mutate(LotteryRank = row_number())

kable(lottery[1:15,])
Ballot_ID Student_ID Student_Name CTE_Application Met_Attendance_Criteria Met_Discipline_Criteria Met_Student_Interest_Criteria Weight LotteryRank
100003 13141516 Student Three Yes Yes Yes No 3 1
100003 13141516 Student Three Yes Yes Yes No 3 2
100005 21222324 Student Five Yes No Yes Yes 3 3
100006 31323334 Student Six Yes No No Yes 2 4
100004 17181920 Student Four Yes No Yes No 2 5
100002 89101112 Student Two Yes Yes Yes Yes 4 6
100002 89101112 Student Two Yes Yes Yes Yes 4 7
100005 21222324 Student Five Yes No Yes Yes 3 8
100002 89101112 Student Two Yes Yes Yes Yes 4 9
100002 89101112 Student Two Yes Yes Yes Yes 4 10
100004 17181920 Student Four Yes No Yes No 2 11
100006 31323334 Student Six Yes No No Yes 2 12
100007 45678910 Student Seven Yes No No No 1 13
100005 21222324 Student Five Yes No Yes Yes 3 14
100003 13141516 Student Three Yes Yes Yes No 3 15

In this table we can see that the order of the data frame now reflects the lottery order of the student name/ ballot number order that we previously saw.

##Create Student List

For the purpsose of this scenario I will pretend that the school only has 3 seats available. This way we can illustrate the generate of the wait list.

#set capacity of school
school_limit <- 3


lottery_unique <- lottery %>%
  distinct(Student_ID, .keep_all = TRUE)

CTE <- lottery_unique %>%
  slice(1:school_limit) %>%
  mutate(Assignment = "CTE High School")

Waitlist <- lottery_unique %>%
  slice((school_limit + 1):n()) %>%
  mutate(Assignment = "Waitlist")

School_Enroll <- bind_rows(CTE, Waitlist) %>%
  arrange(LotteryRank)

kable(School_Enroll[1:6,])
Ballot_ID Student_ID Student_Name CTE_Application Met_Attendance_Criteria Met_Discipline_Criteria Met_Student_Interest_Criteria Weight LotteryRank Assignment
100003 13141516 Student Three Yes Yes Yes No 3 1 CTE High School
100005 21222324 Student Five Yes No Yes Yes 3 3 CTE High School
100006 31323334 Student Six Yes No No Yes 2 4 CTE High School
100004 17181920 Student Four Yes No Yes No 2 5 Waitlist
100002 89101112 Student Two Yes Yes Yes Yes 4 6 Waitlist
100007 45678910 Student Seven Yes No No No 1 13 Waitlist

Results

In this document I have produced a final dataset that reflects the original data and provides a final column that shows CTE school assignment or wait list status. Along the way there are individual data sets we could extract if they are more useful. For example only the list of students assigned to CTE school without the wait list.

Based on the randomization used here students: students Three, Five, and Six have been admitted to the CTE school, Two, Four, and Seven have been added to a wait list, and student 1 has been removed based on not applying to CTE High Schools.

Based on your feedback all aspects of this can be adjusted. This has been a fun exercise.