Rows: 4340 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): School Name*, Date of BOY MAP Math Test*, Date of MOY MAP Math Tes...
dbl (10): Student ID*, Student Grade Level*, Teacher ID*, BOY MAP Math Scale...
lgl (1): Treatment*
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(roster)
[1] "Student ID*" "Student Grade Level*"
[3] "School Name*" "Teacher ID*"
[5] "Treatment*" "BOY MAP Math Scaled Score*"
[7] "Date of BOY MAP Math Test*" "MOY MAP Math Scaled Score*"
[9] "Date of MOY MAP Math Test*" "EOY MAP Math Scaled Score*"
[11] "Date of EOY MAP Math Test*" "Gender"
[13] "IEP" "ELL"
[15] "FRPL" "Race"
[17] "Ethnicity" "Attendance Rate"
Change column headers for roster.
roster = roster %>%
  rename(
    student_id = "Student ID*",
    teacher_id = "Teacher ID*",
    grade = "Student Grade Level*",
    school = "School Name*",
    treat = "Treatment*",
    boy_math_score = "BOY MAP Math Scaled Score*",
    boy_math_date = "Date of BOY MAP Math Test*",
    moy_math_score = "MOY MAP Math Scaled Score*",
    moy_math_date = "Date of MOY MAP Math Test*",
    eoy_math_score = "EOY MAP Math Scaled Score*",
    eoy_math_date = "Date of EOY MAP Math Test*",
    gender = "Gender",
    iep = "IEP",
    ell = "ELL",
    frpl = "FRPL",
    race = "Race",
    ethn = "Ethnicity",
    attend_rate = "Attendance Rate"
  )
colnames(roster)
       Bennett Venture Academy    Burton Glen Charter Academy
                           225                            401
            Holly Park Academy                 Laurus Academy
                           323                            492
        Linden Charter Academy  North Saginaw Charter Academy
                           488                            455
          Oakside Prep Academy      Stambaugh Charter Academy
                           612                            260
        Walton Charter Academy Windemere Park Charter Academy
                           521                            360
   Winterfield Venture Academy
                           203
Create school_num for each school.
roster = roster %>%
  mutate(
    school_num = recode(
      school,
      "Bennett Venture Academy" = 1,
      "Burton Glen Charter Academy" = 2,
      "Holly Park Academy" = 3,
      "Laurus Academy" = 4,
      "Linden Charter Academy" = 5,
      "North Saginaw Charter Academy" = 6,
      "Oakside Prep Academy" = 7,
      "Stambaugh Charter Academy" = 8,
      "Walton Charter Academy" = 9,
      "Windemere Park Charter Academy" = 10,
      "Winterfield Venture Academy" = 11
    )
  )
table(roster$school_num)
Observation: Each duplicate student ID appears exactly twice. Each pair appears to be the same individual (e.g., same demographics, same teacher), but in most pairs the two records are tied to different schools: the student took one test at one school site and completed the EOY test at a different school site. This suggests that these students transferred at some point between BOY and EOY testing. One pair in the listing below (student 670363) instead differs in grade level within the same school.
# A tibble: 16 × 19
student_id grade school teacher_id treat boy_math_score boy_math_date
<dbl> <dbl> <chr> <dbl> <lgl> <dbl> <chr>
1 566870 7 Walton Charte… 19525 NA NA <NA>
2 566870 7 Oakside Prep … 19525 NA 216 9/10/2025
3 605643 8 Walton Charte… 21216 NA 229 8/28/2025
4 605643 8 Oakside Prep … 21216 NA NA <NA>
5 632862 7 Oakside Prep … 23257 NA NA <NA>
6 632862 7 Walton Charte… 23257 NA 204 8/28/2025
7 670363 3 Linden Charte… 18378 NA 196 9/29/2025
8 670363 4 Linden Charte… 18378 NA NA <NA>
9 680297 3 Oakside Prep … 23154 NA NA <NA>
10 680297 3 Walton Charte… 23154 NA 181 8/28/2025
11 732197 5 Walton Charte… 17341 NA NA <NA>
12 732197 5 Oakside Prep … 17341 NA 198 9/3/2025
13 732203 8 Walton Charte… 23087 NA NA <NA>
14 732203 8 Oakside Prep … 23087 NA 199 9/10/2025
15 832717 3 Burton Glen C… 18916 NA NA <NA>
16 832717 3 Holly Park Ac… 18916 NA 172 8/21/2025
# ℹ 12 more variables: moy_math_score <dbl>, moy_math_date <chr>,
# eoy_math_score <dbl>, eoy_math_date <chr>, gender <chr>, iep <dbl>,
# ell <dbl>, frpl <dbl>, race <chr>, ethn <chr>, attend_rate <dbl>,
# school_num <dbl>
Decision: Collapse each pair of rows into one to obtain full test data, keeping the school, grade, and demographics from the record where the EOY test was taken. Total observations are reduced from 4340 to 4332 (8 duplicate student records removed).
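The collapse described in the decision above could be sketched as follows. This is a minimal illustration on made-up toy data, not the code used for the actual roster: the helper `first_non_na()` and the toy values are assumptions, and only a few of the roster columns are shown.

```r
library(dplyr)

# Toy roster (made-up values): student 1 has two rows,
# BOY taken at school "A" and EOY taken at school "B".
toy <- tibble(
  student_id     = c(1, 1, 2),
  school         = c("A", "B", "A"),
  grade          = c(7, 7, 3),
  boy_math_score = c(216, NA, 181),
  eoy_math_score = c(NA, 221, 190)
)

# Hypothetical helper: first non-missing value, or a typed NA if none exists.
first_non_na <- function(x) x[!is.na(x)][1]

toy_unique <- toy %>%
  group_by(student_id) %>%
  # Rows with an EOY score sort first, so first() picks that row's school/grade.
  arrange(is.na(eoy_math_score), .by_group = TRUE) %>%
  summarise(
    school         = first(school),
    grade          = first(grade),
    boy_math_score = first_non_na(boy_math_score),
    eoy_math_score = first_non_na(eoy_math_score),
    .groups = "drop"
  )

print(toy_unique)
```

In this sketch, student 1 ends up with one row that carries the BOY score from the first school and the EOY score, school, and grade from the second.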
#ensure student_id is same format for merging
roster_unique$student_id = as.character(roster_unique$student_id)
tsl_nha_usage$student_id = as.character(tsl_nha_usage$student_id)
merge_tsl_nha = roster_unique %>%
  left_join(tsl_nha_usage, by = "student_id")
print(merge_tsl_nha)
Percentage missing for each variable. Observations: Most variables from the TSL file are missing, as expected: after merging the district roster with the TSL usage file, not every student is in the TSL system. Among the district data columns, most of the data is present (test info, demographics, etc.). The same missing percentage (6.37%) repeats across the EOY test info variables, school, grade, and demographics, which suggests that one small group of students is missing all of this info. The large majority of students are missing MOY test info. No student entry is missing an ID or treatment status.
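One common way to get the percentage missing per variable is `colMeans(is.na(...))`. The sketch below uses a made-up toy data frame with a few column names borrowed from this analysis; it is not necessarily how the figures above were produced.

```r
# Toy merged data (made-up values).
toy <- data.frame(
  student_id     = c("1", "2", "3", "4"),
  eoy_math_score = c(200, NA, 210, 190),
  tsl_acct_id    = c(NA, NA, "a9", NA)
)

# is.na() gives a TRUE/FALSE matrix; column means are the share missing.
pct_missing <- round(colMeans(is.na(toy)) * 100, 2)
print(pct_missing)
```

Columns that share exactly the same percentage, as with the 6.37% noted above, are a hint that the same rows are missing across those columns.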
Percentage of students who have both BOY and EOY data, by school. Assuming MOY is not of interest? Only 21% of the sample have all test info, including the MOY test.
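A sketch of how a 1/0 indicator such as `boy_eoy_math` and the by-school percentage might be computed. The toy values are made up, and the exact definition (both scores and both dates present) is an assumption based on the descriptions in this document.

```r
library(dplyr)

# Toy data (made-up values): one student in school 1 is missing BOY info.
toy <- tibble(
  school_num     = c(1, 1, 2, 2),
  boy_math_score = c(200, NA, 180, 190),
  boy_math_date  = as.Date(c("2025-09-01", NA, "2025-09-02", "2025-09-03")),
  eoy_math_score = c(210, 205, 185, 195),
  eoy_math_date  = as.Date(c("2026-05-01", "2026-05-01", "2026-05-02", "2026-05-03"))
)

# Assumed definition: 1 if BOY and EOY scores and dates are all present.
toy <- toy %>%
  mutate(
    boy_eoy_math = as.integer(
      !is.na(boy_math_score) & !is.na(boy_math_date) &
        !is.na(eoy_math_score) & !is.na(eoy_math_date)
    )
  )

# Share of students with complete BOY and EOY info, by school.
by_school <- toy %>%
  group_by(school_num) %>%
  summarise(pct_boy_eoy = 100 * mean(boy_eoy_math), .groups = "drop")

print(by_school)
```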
Create dummy for ethnicity. Observation: Ethn is often a duplication of Race, and Hispanic never appears under Race. Recode ethnicity into a Hispanic indicator?
table(merge_tsl_nha_test_info$ethn)
  American Indian or Alaskan Native    Asian
                                 39       41
          Black or African American Hispanic
                               2723      916
Native Hawaiian or Pacific Islander    White
                                  4      333
Decision: Keep Race as-is; mutate Ethn into a Hispanic indicator.
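The recode into a Hispanic indicator could look like the sketch below. The toy values are made up, and leaving missing ethnicity as `NA` (rather than coding it 0) is an assumption, not a documented choice.

```r
library(dplyr)

# Toy ethnicity column (made-up rows) using category labels from the table above.
toy <- tibble(
  ethn = c("Hispanic", "White", NA, "Black or African American")
)

# 1 if ethn is "Hispanic", 0 otherwise; NA stays NA (assumption).
toy <- toy %>%
  mutate(hisp_eth = as.integer(ethn == "Hispanic"))

print(toy)
```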
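Reformatted session_BOY_date given by TSL (start date of treatment): it arrives as DD/MM/YYYY, unlike the district test dates (MM/DD/YYYY), so it is parsed with its own format string when all date variables are converted to dates.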
#transform all date vars as dates
merge_tsl_nha_test_info = merge_tsl_nha_test_info %>%
  mutate(
    session_BOY_date = as.Date(session_BOY_date, format = "%d/%m/%Y"),
    eoy_math_date = as.Date(eoy_math_date, format = "%m/%d/%Y"),
    moy_math_date = as.Date(moy_math_date, format = "%m/%d/%Y"),
    boy_math_date = as.Date(boy_math_date, format = "%m/%d/%Y")
  )
Create num_weeks by taking the difference in time between session start date and EOY assessment date. Average implementation period was about 14 weeks.
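A minimal sketch of the num_weeks calculation for a single made-up student, using `difftime()` with `units = "weeks"`. The dates are invented for illustration; the example also shows why the two format strings matter, since the same string reads as a different day under each format.

```r
# Made-up dates: session start in DD/MM/YYYY, EOY test in MM/DD/YYYY.
session_BOY_date <- as.Date("03/02/2026", format = "%d/%m/%Y")  # 3 February 2026
eoy_math_date    <- as.Date("05/12/2026", format = "%m/%d/%Y")  # 12 May 2026

# Weeks between session start and the EOY assessment.
num_weeks <- as.numeric(
  difftime(eoy_math_date, session_BOY_date, units = "weeks")
)
print(num_weeks)

# The same string parsed with the wrong format would be read as 2 March 2026.
wrong_start <- as.Date("03/02/2026", format = "%m/%d/%Y")
print(wrong_start)
```

Averaging `num_weeks` over students (with `na.rm = TRUE` where dates are missing) would then give the mean implementation period.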
Save e2i file with all matched students from the merge, regardless of missing test info.
#select columns ready for e2i
tsl_nha_e2i_usage_test_info = merge_tsl_nha_test_info %>%
  dplyr::select(
    student_id, teacher_id, grade, school_num,
    boy_math_score, boy_math_date, eoy_math_score, eoy_math_date,
    male, iep, ell, frpl, attend_rate, treat,
    race_AI, race_A, race_B, race_NH, race_W, hisp_eth,
    boy_eoy_math, #1/0 indicator: boy and eoy math info present
    math_boy_missing_eoy_present,
    tsl_acct_id,
    session_BOY_date, #start date of TSL/treat
    n_co_confid, n_co_confid_same_imp, perc_confid_same_imp,
    score_confid_same_imp, enjoy_rate, activ_partic, n_lo_complete
  )
#export e2i with test indicators, all matched students from the merge, regardless of missing test info (all_tests == 1 or 0)
write.csv(tsl_nha_e2i_usage_test_info, file = "tsl_nha_e2i_all_students.csv", row.names = FALSE)
Save e2i file containing only matched students from the merge with complete test info (all_tests means the student has all BOY and EOY scores and dates; in the code this is boy_eoy_math == 1).
table(tsl_nha_e2i_usage_test_info$boy_eoy_math)
0 1
486 3846
#subset: keep only students with complete test info (all_tests == 1)
tsl_nha_e2i_all_tests = tsl_nha_e2i_usage_test_info %>%
  filter(boy_eoy_math == 1)
print(tsl_nha_e2i_all_tests) # N=3846
Count of treatment students among the subset with complete test info.
table(tsl_nha_e2i_all_tests$treat)
0 1
3238 608
#export e2i only containing matched students from the merge with complete test info (all_tests == 1)
write.csv(tsl_nha_e2i_all_tests, file = "tsl_nha_e2i_complete_tests.csv", row.names = FALSE)