School Lottery Working Example

I have created a working example of the Technical School student assignment lottery using this R Markdown. Use of R Markdown allows for all steps, including code, inputs, and outputs to be reviewed. I believe this is one example of how to have transparency throughout this process.

This document will follow the process I outlined previously in the flowchart. This process consists of five overarching steps:

Identify Problem
Evaluate Data Structure
Establish System to Assign Weights
Randomize
Create Student Lists

Here’s the visual as a reminder:

Addressing Fairness and Randomness

Fairness

1.The proposed system provides equal opportunity to all students while using weighting criteria. Every student is entered into the lottery with the exception of those who are ineligible. The weighting criteria can include additional priority layering based on policy if the district chooses to do so.

2.The methods can be shared in their entirety with the public to provide transparency. The process is intended to be well documented and available for public review if requested.

3.Capacity of schools is considered allowing for the proper amount of students to be assigned to schools.

Randomness

1.The proposed process using a random shuffling function to generate the lottery order. Using R we are able to record the random seed so results can be replicated.

2.There are no guarantees when assigning weights to students. While odds increase, no student is guaranteed a seat unless capacity is larger than the number of applied student.

3.The code is unbiased. The only factors considered are those outlined in 603 CMR 4.00(d). Other factors cannot impact the results of the lottery.

4.The wait list is a continuation of the random list produced. After assigned to CTE School remaining students keep their existing lottery order within the waitlist.

Let’s jump in.

Identify Problem

Here I intend to create a fair and random system for a lottery for students interested in entering career and technical education high schools.

Using this format would allow for transparency, the electronic lottery to be conducted publicly, and public sharing of the methodology used.

Below I include critical information that was shared in the scenario. This information was used to guide much of the decision making in the process.

We are assuming more students than seats. N = 150, this will result in a wait list for the school.
There is specific criteria for weighting student lottery entries:
1. Student Attendance
2. Student Discipline
3. Student Interest

Evaluate Data Structure

# Add Packages
#install.packages("dplyr")
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(readr)
library(knitr)

## Warning: package 'knitr' was built under R version 4.4.1

# Load in data
##set working directory
setwd("~/Downloads")
##create data frame
student_list <- read_csv("SPS_Example.csv")

## Rows: 7 Columns: 7

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Student_Name, CTE_Application, Met_Attendance_Criteria, Met_Discipl...
## dbl (2): Ballot_ID, Student_ID
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

##take a look at first few rows of data
kable(student_list[1:5,])

Ballot_ID	Student_ID	Student_Name	CTE_Application	Met_Attendance_Criteria	Met_Discipline_Criteria	Met_Student_Interest_Criteria
100001	12345678	Student One	No	NA	NA	NA
100002	89101112	Student Two	Yes	Yes	Yes	Yes
100003	13141516	Student Three	Yes	Yes	Yes	No
100004	17181920	Student Four	Yes	No	Yes	No
100005	21222324	Student Five	Yes	No	Yes	Yes

##check for duplicate data
duplicates <- student_list[student_list$Student_ID %in% student_list$Student_ID[duplicated(student_list$Student_ID)], ]

Things I’ve noticed:

There are some students who did not apply for career and technical education high school. I am making the assumption to exclude them from lottery. They will be removed during process.
Criteria are coded as Yes/No. I will assume criteria = Yes will mean that the student gains a weight. No will result in no additional entries for the student.
I will use Student_ID as the identifier variable moving forward.
No duplicates identified.

Assign Weights

Here I will create a new variable Weight. Things that will contribute to adding value to weight:

All students will start with Weight = 1, we can think of this as one entry into lottery
Each Criteria = Yes, will add 1 to weight

#Update data set to create new variable Weight based on the values of the criteria
student_list <- student_list %>%
  mutate(Weight = 1 +
           ifelse(Met_Attendance_Criteria == "Yes", 1, 0) +
           ifelse(Met_Discipline_Criteria == "Yes", 1, 0) +
           ifelse(Met_Student_Interest_Criteria == "Yes", 1, 0)
         )
#excluding students who did not apply to technical high school
student_list <- student_list %>%
  filter(CTE_Application != "No")

kable(student_list[1:7,])

Ballot_ID	Student_ID	Student_Name	CTE_Application	Met_Attendance_Criteria	Met_Discipline_Criteria	Met_Student_Interest_Criteria	Weight
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4
100003	13141516	Student Three	Yes	Yes	Yes	No	3
100004	17181920	Student Four	Yes	No	Yes	No	2
100005	21222324	Student Five	Yes	No	Yes	Yes	3
100006	31323334	Student Six	Yes	No	No	Yes	2
100007	45678910	Student Seven	Yes	No	No	No	1
NA	NA	NA	NA	NA	NA	NA	NA

#Create new data set to show each row as ind. student entry (ex. if student has weight = 3 they receive 3 entire i.e. have three rows)
lottery <- student_list[rep(1:nrow(student_list), student_list$Weight), ]
kable(lottery[1:7,])

Ballot_ID	Student_ID	Student_Name	CTE_Application	Met_Attendance_Criteria	Met_Discipline_Criteria	Met_Student_Interest_Criteria	Weight
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4
100003	13141516	Student Three	Yes	Yes	Yes	No	3
100003	13141516	Student Three	Yes	Yes	Yes	No	3
100003	13141516	Student Three	Yes	Yes	Yes	No	3

Randomize

Here I will take the lottery list that was generated and add a varible LotteryRank that we can think of as the pull order. R comes with the function “set seed” that will allow for the random order to remain the same when replicating. I’ll use 811 as my seed.

#set random seed. Allow to replicate
set.seed(811)
#reorganize data to shuffle and assign lottery number to student
lottery <- lottery %>%
  sample_n(nrow(.)) %>%
  mutate(LotteryRank = row_number())

kable(lottery[1:15,])

Ballot_ID	Student_ID	Student_Name	CTE_Application	Met_Attendance_Criteria	Met_Discipline_Criteria	Met_Student_Interest_Criteria	Weight	LotteryRank
100003	13141516	Student Three	Yes	Yes	Yes	No	3	1
100003	13141516	Student Three	Yes	Yes	Yes	No	3	2
100005	21222324	Student Five	Yes	No	Yes	Yes	3	3
100006	31323334	Student Six	Yes	No	No	Yes	2	4
100004	17181920	Student Four	Yes	No	Yes	No	2	5
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4	6
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4	7
100005	21222324	Student Five	Yes	No	Yes	Yes	3	8
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4	9
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4	10
100004	17181920	Student Four	Yes	No	Yes	No	2	11
100006	31323334	Student Six	Yes	No	No	Yes	2	12
100007	45678910	Student Seven	Yes	No	No	No	1	13
100005	21222324	Student Five	Yes	No	Yes	Yes	3	14
100003	13141516	Student Three	Yes	Yes	Yes	No	3	15

In this table we can see that the order of the data frame now reflects the lottery order of the student name/ ballot number order that we previously saw.

##Create Student List

For the purpsose of this scenario I will pretend that the school only has 3 seats available. This way we can illustrate the generate of the wait list.

#set capacity of school
school_limit <- 3


lottery_unique <- lottery %>%
  distinct(Student_ID, .keep_all = TRUE)

CTE <- lottery_unique %>%
  slice(1:school_limit) %>%
  mutate(Assignment = "CTE High School")

Waitlist <- lottery_unique %>%
  slice((school_limit + 1):n()) %>%
  mutate(Assignment = "Waitlist")

School_Enroll <- bind_rows(CTE, Waitlist) %>%
  arrange(LotteryRank)

kable(School_Enroll[1:6,])

Ballot_ID	Student_ID	Student_Name	CTE_Application	Met_Attendance_Criteria	Met_Discipline_Criteria	Met_Student_Interest_Criteria	Weight	LotteryRank	Assignment
100003	13141516	Student Three	Yes	Yes	Yes	No	3	1	CTE High School
100005	21222324	Student Five	Yes	No	Yes	Yes	3	3	CTE High School
100006	31323334	Student Six	Yes	No	No	Yes	2	4	CTE High School
100004	17181920	Student Four	Yes	No	Yes	No	2	5	Waitlist
100002	89101112	Student Two	Yes	Yes	Yes	Yes	4	6	Waitlist
100007	45678910	Student Seven	Yes	No	No	No	1	13	Waitlist

Results

In this document I have produced a final dataset that reflects the original data and provides a final column that shows CTE school assignment or wait list status. Along the way there are individual data sets we could extract if they are more useful. For example only the list of students assigned to CTE school without the wait list.

Based on the randomization used here students: students Three, Five, and Six have been admitted to the CTE school, Two, Four, and Seven have been added to a wait list, and student 1 has been removed based on not applying to CTE High Schools.

Based on your feedback all aspects of this can be adjusted. This has been a fun exercise.

Springfield Public Schools Interview 2

Haylea Hubacz

2025-08-11