I have created a working example of the Technical School student assignment lottery using this R Markdown. Use of R Markdown allows for all steps, including code, inputs, and outputs to be reviewed. I believe this is one example of how to have transparency throughout this process.
This document will follow the process I outlined previously in the flowchart. This process consists of five overarching steps:
Here’s the visual as a reminder:
1.The proposed system provides equal opportunity to all students while using weighting criteria. Every student is entered into the lottery with the exception of those who are ineligible. The weighting criteria can include additional priority layering based on policy if the district chooses to do so.
2.The methods can be shared in their entirety with the public to provide transparency. The process is intended to be well documented and available for public review if requested.
3.Capacity of schools is considered allowing for the proper amount of students to be assigned to schools.
1.The proposed process using a random shuffling function to generate the lottery order. Using R we are able to record the random seed so results can be replicated.
2.There are no guarantees when assigning weights to students. While odds increase, no student is guaranteed a seat unless capacity is larger than the number of applied student.
3.The code is unbiased. The only factors considered are those outlined in 603 CMR 4.00(d). Other factors cannot impact the results of the lottery.
4.The wait list is a continuation of the random list produced. After assigned to CTE School remaining students keep their existing lottery order within the waitlist.
Let’s jump in.
Here I intend to create a fair and random system for a lottery for students interested in entering career and technical education high schools.
Using this format would allow for transparency, the electronic lottery to be conducted publicly, and public sharing of the methodology used.
Below I include critical information that was shared in the scenario. This information was used to guide much of the decision making in the process.
We are assuming more students than seats. N = 150, this will result in a wait list for the school.
There is specific criteria for weighting student lottery entries:
Student Attendance
Student Discipline
Student Interest
# Add Packages
#install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(knitr)
## Warning: package 'knitr' was built under R version 4.4.1
# Load in data
##set working directory
setwd("~/Downloads")
##create data frame
student_list <- read_csv("SPS_Example.csv")
## Rows: 7 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Student_Name, CTE_Application, Met_Attendance_Criteria, Met_Discipl...
## dbl (2): Ballot_ID, Student_ID
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
##take a look at first few rows of data
kable(student_list[1:5,])
| Ballot_ID | Student_ID | Student_Name | CTE_Application | Met_Attendance_Criteria | Met_Discipline_Criteria | Met_Student_Interest_Criteria |
|---|---|---|---|---|---|---|
| 100001 | 12345678 | Student One | No | NA | NA | NA |
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes |
| 100003 | 13141516 | Student Three | Yes | Yes | Yes | No |
| 100004 | 17181920 | Student Four | Yes | No | Yes | No |
| 100005 | 21222324 | Student Five | Yes | No | Yes | Yes |
##check for duplicate data
duplicates <- student_list[student_list$Student_ID %in% student_list$Student_ID[duplicated(student_list$Student_ID)], ]
Things I’ve noticed:
There are some students who did not apply for career and technical education high school. I am making the assumption to exclude them from lottery. They will be removed during process.
Criteria are coded as Yes/No. I will assume criteria = Yes will mean that the student gains a weight. No will result in no additional entries for the student.
I will use Student_ID as the identifier variable moving forward.
No duplicates identified.
Here I will create a new variable Weight. Things that will contribute to adding value to weight:
#Update data set to create new variable Weight based on the values of the criteria
student_list <- student_list %>%
mutate(Weight = 1 +
ifelse(Met_Attendance_Criteria == "Yes", 1, 0) +
ifelse(Met_Discipline_Criteria == "Yes", 1, 0) +
ifelse(Met_Student_Interest_Criteria == "Yes", 1, 0)
)
#excluding students who did not apply to technical high school
student_list <- student_list %>%
filter(CTE_Application != "No")
kable(student_list[1:7,])
| Ballot_ID | Student_ID | Student_Name | CTE_Application | Met_Attendance_Criteria | Met_Discipline_Criteria | Met_Student_Interest_Criteria | Weight |
|---|---|---|---|---|---|---|---|
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 |
| 100003 | 13141516 | Student Three | Yes | Yes | Yes | No | 3 |
| 100004 | 17181920 | Student Four | Yes | No | Yes | No | 2 |
| 100005 | 21222324 | Student Five | Yes | No | Yes | Yes | 3 |
| 100006 | 31323334 | Student Six | Yes | No | No | Yes | 2 |
| 100007 | 45678910 | Student Seven | Yes | No | No | No | 1 |
| NA | NA | NA | NA | NA | NA | NA | NA |
#Create new data set to show each row as ind. student entry (ex. if student has weight = 3 they receive 3 entire i.e. have three rows)
lottery <- student_list[rep(1:nrow(student_list), student_list$Weight), ]
kable(lottery[1:7,])
| Ballot_ID | Student_ID | Student_Name | CTE_Application | Met_Attendance_Criteria | Met_Discipline_Criteria | Met_Student_Interest_Criteria | Weight |
|---|---|---|---|---|---|---|---|
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 |
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 |
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 |
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 |
| 100003 | 13141516 | Student Three | Yes | Yes | Yes | No | 3 |
| 100003 | 13141516 | Student Three | Yes | Yes | Yes | No | 3 |
| 100003 | 13141516 | Student Three | Yes | Yes | Yes | No | 3 |
Here I will take the lottery list that was generated and add a varible LotteryRank that we can think of as the pull order. R comes with the function “set seed” that will allow for the random order to remain the same when replicating. I’ll use 811 as my seed.
#set random seed. Allow to replicate
set.seed(811)
#reorganize data to shuffle and assign lottery number to student
lottery <- lottery %>%
sample_n(nrow(.)) %>%
mutate(LotteryRank = row_number())
kable(lottery[1:15,])
| Ballot_ID | Student_ID | Student_Name | CTE_Application | Met_Attendance_Criteria | Met_Discipline_Criteria | Met_Student_Interest_Criteria | Weight | LotteryRank |
|---|---|---|---|---|---|---|---|---|
| 100003 | 13141516 | Student Three | Yes | Yes | Yes | No | 3 | 1 |
| 100003 | 13141516 | Student Three | Yes | Yes | Yes | No | 3 | 2 |
| 100005 | 21222324 | Student Five | Yes | No | Yes | Yes | 3 | 3 |
| 100006 | 31323334 | Student Six | Yes | No | No | Yes | 2 | 4 |
| 100004 | 17181920 | Student Four | Yes | No | Yes | No | 2 | 5 |
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 | 6 |
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 | 7 |
| 100005 | 21222324 | Student Five | Yes | No | Yes | Yes | 3 | 8 |
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 | 9 |
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 | 10 |
| 100004 | 17181920 | Student Four | Yes | No | Yes | No | 2 | 11 |
| 100006 | 31323334 | Student Six | Yes | No | No | Yes | 2 | 12 |
| 100007 | 45678910 | Student Seven | Yes | No | No | No | 1 | 13 |
| 100005 | 21222324 | Student Five | Yes | No | Yes | Yes | 3 | 14 |
| 100003 | 13141516 | Student Three | Yes | Yes | Yes | No | 3 | 15 |
In this table we can see that the order of the data frame now reflects the lottery order of the student name/ ballot number order that we previously saw.
##Create Student List
For the purpsose of this scenario I will pretend that the school only has 3 seats available. This way we can illustrate the generate of the wait list.
#set capacity of school
school_limit <- 3
lottery_unique <- lottery %>%
distinct(Student_ID, .keep_all = TRUE)
CTE <- lottery_unique %>%
slice(1:school_limit) %>%
mutate(Assignment = "CTE High School")
Waitlist <- lottery_unique %>%
slice((school_limit + 1):n()) %>%
mutate(Assignment = "Waitlist")
School_Enroll <- bind_rows(CTE, Waitlist) %>%
arrange(LotteryRank)
kable(School_Enroll[1:6,])
| Ballot_ID | Student_ID | Student_Name | CTE_Application | Met_Attendance_Criteria | Met_Discipline_Criteria | Met_Student_Interest_Criteria | Weight | LotteryRank | Assignment |
|---|---|---|---|---|---|---|---|---|---|
| 100003 | 13141516 | Student Three | Yes | Yes | Yes | No | 3 | 1 | CTE High School |
| 100005 | 21222324 | Student Five | Yes | No | Yes | Yes | 3 | 3 | CTE High School |
| 100006 | 31323334 | Student Six | Yes | No | No | Yes | 2 | 4 | CTE High School |
| 100004 | 17181920 | Student Four | Yes | No | Yes | No | 2 | 5 | Waitlist |
| 100002 | 89101112 | Student Two | Yes | Yes | Yes | Yes | 4 | 6 | Waitlist |
| 100007 | 45678910 | Student Seven | Yes | No | No | No | 1 | 13 | Waitlist |
In this document I have produced a final dataset that reflects the original data and provides a final column that shows CTE school assignment or wait list status. Along the way there are individual data sets we could extract if they are more useful. For example only the list of students assigned to CTE school without the wait list.
Based on the randomization used here students: students Three, Five, and Six have been admitted to the CTE school, Two, Four, and Seven have been added to a wait list, and student 1 has been removed based on not applying to CTE High Schools.
Based on your feedback all aspects of this can be adjusted. This has been a fun exercise.