Data Import and Preparation for Cleaning

Step 1 - Load R Packages and Import Data

This code chunk loads all packages needed for cleaning and organizing the data. If the packages are not downloaded on your RStudio software, you can first run install.packages() with the package name within quotes inside the parentheses.

library(tidyverse)
library(here)
library(janitor)
library(rio)
library(writexl)
library(lubridate)
library(stringr)


demographics1 <- import(here("data", "CA_SantaCruz_Demographics_12_2024_6_2025.xlsx"),
                  setclass = "tbl_df") 

demographics2 <- import(here("data", "CA_SantaCruz_Demographics_07_2025_12_2025.xlsx"),
                  setclass = "tbl_df") 

results1 <- import(here("data", "CA_SantaCruz_Assessments_12_2024_6_2025.xlsx"),
                  setclass = "tbl_df") 

results2 <- import(here("data", "CA_SantaCruz_Assessments_07_2025_12_2025.xlsx"),
                  setclass = "tbl_df")

Step 2 - Combine Raw Data Files

Since this year’s data is separated into multiple sets of files, we can merge them into one dataframe by following a set of steps to:

First, combine the demographic data into one dataframe and the results data into one dataframe using bind_rows()
Second, once our demographic data and results data are in single dataframes, we can join them by the Child ID variable

To properly use bind_rows(), make sure the columns in the original files have the same column names, are in the same column order, and have the same data structure.

Bind Demographic Data

names(demographics1)

## [1] "Child ID"         "Birthdate"        "Gender"           "Race"            
## [5] "Ethnicity"        "Language"         "Primary Language" "Petition Type"

names(demographics2)

## [1] "Child ID"         "Birthdate"        "Gender"           "Race"            
## [5] "Ethnicity"        "Language"         "Primary Language" "Petition Type"

str(demographics1)

## tibble [56 × 8] (S3: tbl_df/tbl/data.frame)
##  $ Child ID        : num [1:56] 4636048 4636553 5193896 5346463 5346465 ...
##  $ Birthdate       : POSIXct[1:56], format: "2004-07-26" "2006-08-28" ...
##  $ Gender          : chr [1:56] "Female" "Female" "Male" "Male" ...
##  $ Race            : chr [1:56] "White" "White" "White" "White" ...
##  $ Ethnicity       : chr [1:56] "Non-Hispanic" "Hispanic" "Non-Hispanic" "Hispanic" ...
##  $ Language        : chr [1:56] "English" "English" "English" "English" ...
##  $ Primary Language: chr [1:56] "True" "True" "True" "True" ...
##  $ Petition Type   : chr [1:56] "Dependency" "Dependency" "Dependency" "Dependency" ...

str(demographics2)

## tibble [65 × 8] (S3: tbl_df/tbl/data.frame)
##  $ Child ID        : num [1:65] 4636553 5193896 5549414 5799189 5799226 ...
##  $ Birthdate       : POSIXct[1:65], format: "2006-08-28" "2018-08-01" ...
##  $ Gender          : chr [1:65] "Female" "Male" "Male" "Female" ...
##  $ Race            : chr [1:65] "White" "White" "White" "Hispanic or Latino (of any race)" ...
##  $ Ethnicity       : chr [1:65] "Hispanic" "Non-Hispanic" "Hispanic" "Hispanic" ...
##  $ Language        : chr [1:65] "English" "English" "English" "English" ...
##  $ Primary Language: chr [1:65] "Yes" "Yes" "Yes" "Yes" ...
##  $ Petition Type   : chr [1:65] "Dependency" "Dependency" "Dependency" "Dependency" ...

demographics_bind <- bind_rows(demographics1,
                               demographics2,
                               .id = "SourceFile")

str(demographics_bind)

## tibble [121 × 9] (S3: tbl_df/tbl/data.frame)
##  $ SourceFile      : chr [1:121] "1" "1" "1" "1" ...
##  $ Child ID        : num [1:121] 4636048 4636553 5193896 5346463 5346465 ...
##  $ Birthdate       : POSIXct[1:121], format: "2004-07-26" "2006-08-28" ...
##  $ Gender          : chr [1:121] "Female" "Female" "Male" "Male" ...
##  $ Race            : chr [1:121] "White" "White" "White" "White" ...
##  $ Ethnicity       : chr [1:121] "Non-Hispanic" "Hispanic" "Non-Hispanic" "Hispanic" ...
##  $ Language        : chr [1:121] "English" "English" "English" "English" ...
##  $ Primary Language: chr [1:121] "True" "True" "True" "True" ...
##  $ Petition Type   : chr [1:121] "Dependency" "Dependency" "Dependency" "Dependency" ...

Bind Assessment Data

names(results1)

##  [1] "Child ID"                 "Assessment Date"         
##  [3] "Category"                 "Assessment"              
##  [5] "Total Child Score"        "Total Possible Score"    
##  [7] "Question Number"          "Question Name"           
##  [9] "Question"                 "Response Number"         
## [11] "Response"                 "Point Value"             
## [13] "Assigned to Program Date" "Program Closure Date"

names(results2)

##  [1] "Child ID"                 "Assessment Date"         
##  [3] "Category"                 "Assessment"              
##  [5] "Total Child Score"        "Total Possible Score"    
##  [7] "Question Number"          "Question Name"           
##  [9] "Question"                 "Response Number"         
## [11] "Response"                 "Point Value"             
## [13] "Assigned to Program Date" "Program Closure Date"

str(results1)

## tibble [3,738 × 14] (S3: tbl_df/tbl/data.frame)
##  $ Child ID                : num [1:3738] 7531459 7531459 7531459 7531459 7531459 ...
##  $ Assessment Date         : POSIXct[1:3738], format: "2025-06-04" "2025-06-04" ...
##  $ Category                : chr [1:3738] "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" ...
##  $ Assessment              : chr [1:3738] "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" ...
##  $ Total Child Score       : num [1:3738] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Total Possible Score    : num [1:3738] 10 10 10 10 10 10 10 10 10 10 ...
##  $ Question Number         : num [1:3738] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Question Name           : chr [1:3738] "ACEs-Intro" "ACEs-Q1-Emotional Neglect" "ACEs-Q2-Emotional Abuse" "ACEs-Q3-Physical Neglect" ...
##  $ Question                : chr [1:3738] "Document any adverse childhood experiences. This list of experiences represent child experiences and caregiver "| __truncated__ "Emotional Neglect: Do you think the child ever felt unsupported, unloved and/or unprotected?" "Emotional Abuse: Has a parent/caregiver ever insulted, humiliated, or put down the child?" "Physical Neglect: Has the child ever lacked appropriate care by any caregiver? (for example, not being protecte"| __truncated__ ...
##  $ Response Number         : num [1:3738] 1 1 1 1 1 2 1 2 1 1 ...
##  $ Response                : chr [1:3738] "Click here to acknowledge that you’ve read these instructions." "Yes" "Yes" "Yes" ...
##  $ Point Value             : num [1:3738] 0 0 0 0 0 1 0 1 0 0 ...
##  $ Assigned to Program Date: POSIXct[1:3738], format: "2025-04-09" "2025-04-09" ...
##  $ Program Closure Date    : logi [1:3738] NA NA NA NA NA NA ...

str(results2)

## tibble [3,320 × 14] (S3: tbl_df/tbl/data.frame)
##  $ Child ID                : num [1:3320] 6569605 6569605 6569605 6569605 6569605 ...
##  $ Assessment Date         : POSIXct[1:3320], format: "2025-07-01" "2025-07-01" ...
##  $ Category                : chr [1:3320] "CASA of Santa Cruz Advocacy Planning Survey:INTERIM" "CASA of Santa Cruz Advocacy Planning Survey:INTERIM" "CASA of Santa Cruz Advocacy Planning Survey:INTERIM" "CASA of Santa Cruz Advocacy Planning Survey:INTERIM" ...
##  $ Assessment              : chr [1:3320] "1. PACEs (Interim) 6+" "1. PACEs (Interim) 6+" "1. PACEs (Interim) 6+" "1. PACEs (Interim) 6+" ...
##  $ Total Child Score       : num [1:3320] 7 7 7 7 7 7 7 7 36 36 ...
##  $ Total Possible Score    : num [1:3320] 7 7 7 7 7 7 7 7 36 36 ...
##  $ Question Number         : num [1:3320] 1 2 3 4 5 6 7 8 1 2 ...
##  $ Question Name           : chr [1:3320] "PACEs-Intro" "PACEs-Q1-Access to Nature" "PACEs-Q2-Mental Health Care" "PACEs-Q3-Mindfulness" ...
##  $ Question                : chr [1:3320] "Document the types of positive experiences/activities you have engaged in or supported the youth/NMD to build r"| __truncated__ "Access to nature" "Mental Health Care" "Mindfulness" ...
##  $ Response Number         : num [1:3320] 1 1 1 1 1 1 1 1 1 3 ...
##  $ Response                : chr [1:3320] "Click here to acknowledge that you’ve read these instructions." "Yes" "Yes" "Yes" ...
##  $ Point Value             : num [1:3320] 0 1 1 1 1 1 1 1 0 3 ...
##  $ Assigned to Program Date: POSIXct[1:3320], format: "2022-02-07" "2022-02-07" ...
##  $ Program Closure Date    : POSIXct[1:3320], format: NA NA ...

results_bind <- bind_rows(results1,
                               results2,
                               .id = "SourceFile")

str(results_bind)

## tibble [7,058 × 15] (S3: tbl_df/tbl/data.frame)
##  $ SourceFile              : chr [1:7058] "1" "1" "1" "1" ...
##  $ Child ID                : num [1:7058] 7531459 7531459 7531459 7531459 7531459 ...
##  $ Assessment Date         : POSIXct[1:7058], format: "2025-06-04" "2025-06-04" ...
##  $ Category                : chr [1:7058] "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" ...
##  $ Assessment              : chr [1:7058] "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" ...
##  $ Total Child Score       : num [1:7058] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Total Possible Score    : num [1:7058] 10 10 10 10 10 10 10 10 10 10 ...
##  $ Question Number         : num [1:7058] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Question Name           : chr [1:7058] "ACEs-Intro" "ACEs-Q1-Emotional Neglect" "ACEs-Q2-Emotional Abuse" "ACEs-Q3-Physical Neglect" ...
##  $ Question                : chr [1:7058] "Document any adverse childhood experiences. This list of experiences represent child experiences and caregiver "| __truncated__ "Emotional Neglect: Do you think the child ever felt unsupported, unloved and/or unprotected?" "Emotional Abuse: Has a parent/caregiver ever insulted, humiliated, or put down the child?" "Physical Neglect: Has the child ever lacked appropriate care by any caregiver? (for example, not being protecte"| __truncated__ ...
##  $ Response Number         : num [1:7058] 1 1 1 1 1 2 1 2 1 1 ...
##  $ Response                : chr [1:7058] "Click here to acknowledge that you’ve read these instructions." "Yes" "Yes" "Yes" ...
##  $ Point Value             : num [1:7058] 0 0 0 0 0 1 0 1 0 0 ...
##  $ Assigned to Program Date: POSIXct[1:7058], format: "2025-04-09" "2025-04-09" ...
##  $ Program Closure Date    : POSIXct[1:7058], format: NA NA ...

Join Demographic and Assessment Data

To properly merge the data, we can use left_join(), but make sure the results data frame is on the left side of the function because it is larger, meaning all data in this data frame will be kept, and we will pull in the information from the demographic data that aligns based on the Child ID variable.

casa_data <- left_join(results_bind, demographics_bind)

Step 3 - Update Structure of Joined Data Frame

Prior to data wrangling by assessment time and assessment type for data analysis and visualization, I like to update the structure of the data frame to make it easier to manipulate the different types of data. I like to program data as either numeric, factors, or dates.

str(casa_data)

## tibble [7,396 × 22] (S3: tbl_df/tbl/data.frame)
##  $ SourceFile              : chr [1:7396] "1" "1" "1" "1" ...
##  $ Child ID                : num [1:7396] 7531459 7531459 7531459 7531459 7531459 ...
##  $ Assessment Date         : POSIXct[1:7396], format: "2025-06-04" "2025-06-04" ...
##  $ Category                : chr [1:7396] "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" ...
##  $ Assessment              : chr [1:7396] "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" ...
##  $ Total Child Score       : num [1:7396] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Total Possible Score    : num [1:7396] 10 10 10 10 10 10 10 10 10 10 ...
##  $ Question Number         : num [1:7396] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Question Name           : chr [1:7396] "ACEs-Intro" "ACEs-Q1-Emotional Neglect" "ACEs-Q2-Emotional Abuse" "ACEs-Q3-Physical Neglect" ...
##  $ Question                : chr [1:7396] "Document any adverse childhood experiences. This list of experiences represent child experiences and caregiver "| __truncated__ "Emotional Neglect: Do you think the child ever felt unsupported, unloved and/or unprotected?" "Emotional Abuse: Has a parent/caregiver ever insulted, humiliated, or put down the child?" "Physical Neglect: Has the child ever lacked appropriate care by any caregiver? (for example, not being protecte"| __truncated__ ...
##  $ Response Number         : num [1:7396] 1 1 1 1 1 2 1 2 1 1 ...
##  $ Response                : chr [1:7396] "Click here to acknowledge that you’ve read these instructions." "Yes" "Yes" "Yes" ...
##  $ Point Value             : num [1:7396] 0 0 0 0 0 1 0 1 0 0 ...
##  $ Assigned to Program Date: POSIXct[1:7396], format: "2025-04-09" "2025-04-09" ...
##  $ Program Closure Date    : POSIXct[1:7396], format: NA NA ...
##  $ Birthdate               : POSIXct[1:7396], format: "2012-12-17" "2012-12-17" ...
##  $ Gender                  : chr [1:7396] "Female" "Female" "Female" "Female" ...
##  $ Race                    : chr [1:7396] "Hispanic or Latino (of any race)" "Hispanic or Latino (of any race)" "Hispanic or Latino (of any race)" "Hispanic or Latino (of any race)" ...
##  $ Ethnicity               : chr [1:7396] "Hispanic" "Hispanic" "Hispanic" "Hispanic" ...
##  $ Language                : chr [1:7396] "Bi-Lingual (English/Spanish)" "Bi-Lingual (English/Spanish)" "Bi-Lingual (English/Spanish)" "Bi-Lingual (English/Spanish)" ...
##  $ Primary Language        : chr [1:7396] "True" "True" "True" "True" ...
##  $ Petition Type           : chr [1:7396] "Dependency" "Dependency" "Dependency" "Dependency" ...

casa_data[sapply(casa_data, is.character)] <- lapply(casa_data[sapply(casa_data, is.character)],
                                               as.factor)

casa_data <- casa_data %>% 
  mutate(`Child ID` = as.factor(`Child ID`),
         `Response Number` = as.factor(`Response Number`))


casa_data$`Assessment Date` <- ymd(casa_data$`Assessment Date`)

casa_data$`Assigned to Program Date` <- ymd(casa_data$`Assigned to Program Date`)

casa_data$`Program Closure Date` <- ymd(casa_data$`Program Closure Date`)

casa_data$Birthdate <- ymd(casa_data$Birthdate)


casa_data <- casa_data %>% 
  mutate(`Assessment Date` = as.Date(`Assessment Date`),
         `Assigned to Program Date` = as.Date(`Assigned to Program Date`),
         `Program Closure Date` = as.Date(`Program Closure Date`),
         Birthdate = as.Date(Birthdate))

Last thing to do before the data is ready, let’s recode the variable we created using bind_rows() to make it easier to understand what the new column represents.

And let’s also use our date variables of Assessment Date and Birthdate to make a new variable for AssessmentAge and AssessmentAgeGroup

casa_data %>% 
  count(SourceFile)

## # A tibble: 2 × 2
##   SourceFile     n
##   <fct>      <int>
## 1 1           3738
## 2 2           3658

casa_data$SourceFile <- recode(casa_data$SourceFile,
                               `1` = "December2024-June2025",
                               `2` = "July2025-December2025")


casa_data <- casa_data %>% 
  mutate(AssessmentAge = round(interval(Birthdate, `Assessment Date`) / years(1), 1))

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        132
##  2 1. ACEs (INTAKE) 6+         660
##  3 1. PACEs (Interim) 0-5       56
##  4 1. PACEs (Interim) 6+       456
##  5 2. Wellbeing (INTAKE) 6+    780
##  6 2. Wellbeing (Interim) 6+   741
##  7 2a. ASQ-3 (INTAKE) 0-5       66
##  8 2a. ASQ-3 (Interim) 0-5      42
##  9 2b. ASQ-SE (INTAKE) 0-5      24
## 10 2b. ASQ-SE (Interim) 0-5     14
## # ℹ 16 more rows

casa_data <- casa_data %>%
  mutate(
    AssessmentAgeGroup = case_when(
      Assessment %in% c("1. ACEs (INTAKE) 0-5",
                        "1. PACEs (Interim) 0-5",
                        "2a. ASQ-3 (INTAKE) 0-5",
                        "2a. ASQ-3 (Interim) 0-5",
                        "2b. ASQ-SE (INTAKE) 0-5",
                        "2b. ASQ-SE (Interim) 0-5",
                        "3. Needs and Resources: Physical Health (INTAKE) 0-5",
                        "3. Needs and Resources: Physical Health (Interim) 0-5",
                        "4. Needs and Resources: Emotional Health (INTAKE) 0-5",
                        "4. Needs and Resources: Emotional Health (Interim) 0-5",
                        "5. Needs and Resources: Learning (INTAKE) 0-5",
                        "5. Needs and Resources: Learning (Interim) 0-5") ~ "0-5",
      Assessment %in% c("1. ACEs (INTAKE) 6+",
                        "1. PACEs (Interim) 6+",
                        "2. Wellbeing (INTAKE) 6+",
                        "2. Wellbeing (Interim) 6+",
                        "3. Needs and Resources: Physical Health (INTAKE) 6+",
                        "3. Needs and Resources: Physical Health (Interim) 6+",
                        "4. Needs and Resources: Emotional Health (INTAKE) 6+",
                        "4. Needs and Resources: Emotional Health (Interim) 6+",
                        "5. Needs and Resources: Learning (INTAKE) 6+",
                        "5. Needs and Resources: Learning (Interim) 6+",
                        "6. Needs and Resources: Long Term Independence (INTAKE) 6+",
                        "6. Needs and Resources: Long Term Independence (Interim) 6+",
                        "7. Needs and Resources: Probation-Involved Youth (INTAKE) 6+",
                        "7. Needs and Resources: Probation-Involved Youth (Interim) 6+") ~ "6+",
    )
  )

casa_data <- casa_data %>% 
  select(c(1:2, 14:24, 3:13))

casa_data <- casa_data %>% 
  mutate(AssessmentAgeGroup = as.factor(AssessmentAgeGroup))

Another last step we can do is to remove the intro question statements from the data as they will not be needed for analysis or visualizations.With those introduction statements removed, we can then adjust the Question Number variable to appropriately align the Question Number and Question Name variables.

casa_data %>% 
  count(`Question Name`)

## # A tibble: 76 × 2
##    `Question Name`                         n
##    <fct>                               <int>
##  1 ACEs-Intro                             72
##  2 ACEs-Q1-Emotional Neglect              72
##  3 ACEs-Q10-Parent Separation/Divorce     72
##  4 ACEs-Q2-Emotional Abuse                72
##  5 ACEs-Q3-Physical Neglect               72
##  6 ACEs-Q4-Physical Abuse                 72
##  7 ACEs-Q5-Sexual Abuse                   72
##  8 ACEs-Q6-Incarcerated Caregiver         72
##  9 ACEs-Q7-Caregiver Treated Violently    72
## 10 ACEs-Q8-Mental Health                  72
## # ℹ 66 more rows

casa_data <- casa_data %>% 
  filter(`Question Name` != "ACEs-Intro" &
           `Question Name` != "ASQ-3-DVMLS-Intro" &
           `Question Name` != "EmotionalHealth-Intro" &
           `Question Name` != "Learning-Intro" &
           `Question Name` != "Longer Term-Intro" &
           `Question Name` != "PhysicalHealth-Intro" &
           `Question Name` != "ProbationYouth-Intro" &
           `Question Name` != "Wellbeing-Intro" &
           `Question Name` != "ASQ-SE Intro" & 
           `Question Name` != "PACEs-Intro")



casa_data <- casa_data %>% 
  mutate(`Question Number` = `Question Number` - 1)

Isolate Intake-Only Data

Step 1 - Filter Intake Data

The Category variable indicates whether the assessment entered was an intake or an interim. We can use that variable to isolate all intake assessments to be organized into their own dataframe and exported into .xlsx format.

casa_data %>% 
  count(Category)

## # A tibble: 2 × 2
##   Category                                                n
##   <fct>                                               <int>
## 1 CASA of Santa Cruz Advocacy Planning Survey:INTAKE   3515
## 2 CASA of Santa Cruz Advocacy Planning Survey:INTERIM  2979

casa_data_intake <- casa_data %>% 
  filter(Category == "CASA of Santa Cruz Advocacy Planning Survey:INTAKE")

Step 2 - Export Data

Create a subfolder in the working directory titled “data_export” to ensure the code below has a location and accurate file path to save the exported data.

write_xlsx(casa_data_intake, path = "data_export/Casa_SantaCruz_IntakeOnlyData_12.22.25.xlsx")

Isolate Interim Only Assessments (PACEs)

Since the PACEs assessment is only collected at interim, we can isolate it and export into its own dataframe.

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        120
##  2 1. ACEs (INTAKE) 6+         600
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+    720
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       55
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      12
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

casa_data_PACEs_interim <- casa_data %>% 
  filter(Assessment == "1. PACEs (Interim) 0-5" |
           Assessment == "1. PACEs (Interim) 6+")

write_xlsx(casa_data_PACEs_interim, path = "data_export/Casa_SantaCruz_InterimOnlyData_PACEs_12.22.25.xlsx")

Isolate Data by Earliest Intake and Latest Interim Assessment

To align the data by individuals’ earliest intake and latest interim assessments, we need to split apart the data by assessment and rebuild the complete dataframe in our desired structure.

The goal is to arrange the data identifying cases that match the following scenarios:

ASQ-3/SE intake and interim ages 0-5
Wellbeing intake and interim ages 6+
Needs and Resources Physical Health by:
- 0-5 intake and interim
- 6+ intake and interim
- 0-5 intake and 6+ interim
Needs and Resources Emotional Health by:
- 0-5 intake and interim
- 6+ intake and interim
- 0-5 intake and 6+ interim
Needs and Resources Learning by
- 0-5 intake and interim
- 6+ intake and interim
- 0-5 intake and 6+ interim
Needs and Resources Long-Term Independence intake and interim 6+
Needs and Resources Probation Involved Youth intake and interim 6+

This code chunk provides a function that we can replicate across all assessments.

filter_intake_interim <- function(
  df,
  intake_lbl,
  interim_lbl,
  id_col = "Child ID",
  date_col = "Assessment Date",
  assess_col = "Assessment"
) {

  df <- df %>%
    mutate(across(all_of(date_col), as.Date))

  kids_both <- df %>%
    group_by(.data[[id_col]]) %>%
    summarise(
      has_intake  = any(.data[[assess_col]] == intake_lbl,  na.rm = TRUE),
      has_interim = any(.data[[assess_col]] == interim_lbl, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    filter(has_intake & has_interim) %>%
    select(.data[[id_col]])

  df_filt <- df %>%
    semi_join(kids_both, by = id_col)

  intake_dates <- df_filt %>%
    filter(.data[[assess_col]] == intake_lbl) %>%
    group_by(.data[[id_col]]) %>%
    summarise(intake_date = min(.data[[date_col]], na.rm = TRUE), .groups = "drop")

  interim_dates <- df_filt %>%
    filter(.data[[assess_col]] == interim_lbl) %>%
    group_by(.data[[id_col]]) %>%
    summarise(interim_date = max(.data[[date_col]], na.rm = TRUE), .groups = "drop")

  intake <- df_filt %>%
    filter(.data[[assess_col]] == intake_lbl) %>%
    inner_join(intake_dates, by = id_col) %>%
    filter(.data[[date_col]] == intake_date) %>%
    select(-intake_date)

  interim <- df_filt %>%
    filter(.data[[assess_col]] == interim_lbl) %>%
    inner_join(interim_dates, by = id_col) %>%
    filter(.data[[date_col]] == interim_date) %>%
    select(-interim_date)

  list(intake = intake, interim = interim)
}



# Suffix non-key columns (idempotent + safer on reruns)
key_cols <- c("Child ID",
                           "Question Number",
                           "Question Name",
                           "Question")

suffix_once <- function(nm, suf) ifelse(grepl(paste0(suf, "$"), nm), nm, paste0(nm, suf))


demo_cols <- c(
  "Birthdate",
  "Gender",
  "Race",
  "Ethnicity",
  "Language",
  "Primary Language",
  "Petition Type",
  "Assigned to Program Date",
  "Program Closure Date"
)

ASQ-3

Step 1 - Filter ASQ-3 Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        120
##  2 1. ACEs (INTAKE) 6+         600
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+    720
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       55
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      12
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

asq3 <- casa_data %>% 
  filter(Assessment == "2a. ASQ-3 (INTAKE) 0-5" | 
           Assessment == "2a. ASQ-3 (Interim) 0-5")

Step 2 - Filter for Intakes and Interims

We want to identify cases with completed intake and interim assessments.

asq3_sets <- filter_intake_interim(
  asq3,
  intake_lbl  = "2a. ASQ-3 (INTAKE) 0-5",
  interim_lbl = "2a. ASQ-3 (Interim) 0-5"
)

asq3_intake  <- asq3_sets$intake
asq3_interim <- asq3_sets$interim



asq3_intake_s <- asq3_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

asq3_interim_s <- asq3_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# 5) Join safely at question-row level
asq3_joined <- left_join(
  asq3_intake_s,
  asq3_interim_s,
  by = key_cols
)

asq3_clean <- asq3_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

names(asq3_clean)

##  [1] "SourceFile_intake"            "Child ID"                    
##  [3] "Assigned to Program Date"     "Program Closure Date"        
##  [5] "Birthdate"                    "Gender"                      
##  [7] "Race"                         "Ethnicity"                   
##  [9] "Language"                     "Primary Language"            
## [11] "Petition Type"                "AssessmentAge_intake"        
## [13] "AssessmentAgeGroup_intake"    "Assessment Date_intake"      
## [15] "Category_intake"              "Assessment_intake"           
## [17] "Total Child Score_intake"     "Total Possible Score_intake" 
## [19] "Question Number"              "Question Name"               
## [21] "Question"                     "Response Number_intake"      
## [23] "Response_intake"              "Point Value_intake"          
## [25] "SourceFile_interim"           "AssessmentAge_interim"       
## [27] "AssessmentAgeGroup_interim"   "Assessment Date_interim"     
## [29] "Category_interim"             "Assessment_interim"          
## [31] "Total Child Score_interim"    "Total Possible Score_interim"
## [33] "Response Number_interim"      "Response_interim"            
## [35] "Point Value_interim"

asq3_clean <- asq3_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

ASQ-SE

Step 1 - Filter ASQ-SE Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        120
##  2 1. ACEs (INTAKE) 6+         600
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+    720
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       55
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      12
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

asq_se <- casa_data %>% 
  filter(Assessment == "2b. ASQ-SE (INTAKE) 0-5" | 
           Assessment == "2b. ASQ-SE (INTAKE) 0-5")

Only intake data is available for ASQ-SE results, so that data will be included in the Intakes only df and will not require intake/interim data joining.

Wellbeing

Step 1 - Filter Wellbeing Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        120
##  2 1. ACEs (INTAKE) 6+         600
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+    720
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       55
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      12
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

wellbeing <- casa_data %>% 
  filter(Assessment == "2. Wellbeing (INTAKE) 6+" | 
           Assessment == "2. Wellbeing (Interim) 6+")

Step 2 - Filter for Intakes and Interims

wellbeing_sets <- filter_intake_interim(
  wellbeing,
  intake_lbl  = "2. Wellbeing (INTAKE) 6+",
  interim_lbl = "2. Wellbeing (Interim) 6+"
)

wellbeing_intake  <- wellbeing_sets$intake
wellbeing_interim <- wellbeing_sets$interim



wellbeing_intake_s <- wellbeing_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

wellbeing_interim_s <- wellbeing_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# 5) Join safely at question-row level
wellbeing_joined <- left_join(
  wellbeing_intake_s,
  wellbeing_interim_s,
  by = key_cols
)

wellbeing_clean <- wellbeing_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

wellbeing_clean <- wellbeing_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

Needs and Resources Physical Health

The Needs and Resources assessments require a few extra steps because we need to account for children who have 0-5 intake/interim, 6+ intake/interim, and 0-5 intake/6+ interim

Step 1 - Filter Physical Health Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        120
##  2 1. ACEs (INTAKE) 6+         600
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+    720
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       55
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      12
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

physical_health_0_5 <- casa_data %>% 
  filter(Assessment == "3. Needs and Resources: Physical Health (INTAKE) 0-5" | 
           Assessment == "3. Needs and Resources: Physical Health (Interim) 0-5")

physical_health_6 <- casa_data %>% 
  filter(Assessment == "3. Needs and Resources: Physical Health (INTAKE) 6+" | 
           Assessment == "3. Needs and Resources: Physical Health (Interim) 6+")

physical_health_0_5_6 <- casa_data %>% 
  filter(Assessment == "3. Needs and Resources: Physical Health (INTAKE) 0-5" | 
           Assessment == "3. Needs and Resources: Physical Health (Interim) 6+")

Step 2 - Filter for Intakes and Interims

physicalhealth_sets_0_5 <- filter_intake_interim(
  physical_health_0_5,
  intake_lbl  = "3. Needs and Resources: Physical Health (INTAKE) 0-5",
  interim_lbl = "3. Needs and Resources: Physical Health (Interim) 0-5"
)

physicalhealth_0_5_intake  <- physicalhealth_sets_0_5$intake
physicalhealth_0_5_interim <- physicalhealth_sets_0_5$interim



physicalhealth_0_5_intake_s <- physicalhealth_0_5_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

physicalhealth_0_5_interim_s <- physicalhealth_0_5_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# 5) Join safely at question-row level
physicalhealth_0_5_joined <- left_join(
  physicalhealth_0_5_intake_s,
  physicalhealth_0_5_interim_s,
  by = key_cols
)

physicalhealth_0_5_clean <- physicalhealth_0_5_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

physicalhealth_sets_6 <- filter_intake_interim(
  physical_health_6,
  intake_lbl  = "3. Needs and Resources: Physical Health (INTAKE) 6+",
  interim_lbl = "3. Needs and Resources: Physical Health (Interim) 6+"
)

physicalhealth_6_intake  <- physicalhealth_sets_6$intake
physicalhealth_6_interim <- physicalhealth_sets_6$interim



physicalhealth_6_intake_s <- physicalhealth_6_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

physicalhealth_6_interim_s <- physicalhealth_6_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# 5) Join safely at question-row level
physicalhealth_6_joined <- left_join(
  physicalhealth_6_intake_s,
  physicalhealth_6_interim_s,
  by = key_cols
)

physicalhealth_6_clean <- physicalhealth_6_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

No cases meet the criteria of completing a physical health intake in the 0-5 age group and interim in the 6+ age group, but the code is provided below if ever needed.

# physicalhealth_sets_0_5_6 <- filter_intake_interim(
#   physical_health_0_5_6,
#   intake_lbl  = "3. Needs and Resources: Physical Health (INTAKE) 0-5",
#   interim_lbl = "3. Needs and Resources: Physical Health (Interim) 6+"
# )
# 
# physicalhealth_0_5_6_intake  <- physicalhealth_sets_0_5_6$intake
# physicalhealth_0_5_6_interim <- physicalhealth_sets_0_5_6$interim
# 
# 
# 
# physicalhealth_0_5_6_intake_s <- physicalhealth_0_5_6_intake %>%
#   rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))
# 
# physicalhealth_0_5_6_interim_s <- physicalhealth_0_5_6_interim %>%
#   rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))
# 
# # 5) Join safely at question-row level
# physicalhealth_0_5_6_joined <- left_join(
#   physicalhealth_0_5_6_intake_s,
#   physicalhealth_0_5_6_interim_s,
#   by = key_cols
# )
# 
# physicalhealth_0_5_6_clean <- physicalhealth_0_5_6_joined %>%
#   # keep intake version, rename back to original
#   rename_with(
#     ~ str_remove(.x, "_intake$"),
#     .cols = all_of(paste0(demo_cols, "_intake"))
#   ) %>%
#   # drop interim duplicates
#   select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

physicalhealth_0_5_clean <- physicalhealth_0_5_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

physicalhealth_6_clean <- physicalhealth_6_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

# physicalhealth_0_5_6_clean <- physicalhealth_0_5_6_clean %>% 
#   select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

Needs and Resources Emotional Health

Step 1 - Filter Emotional Health Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        120
##  2 1. ACEs (INTAKE) 6+         600
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+    720
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       55
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      12
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

emotional_health_0_5 <- casa_data %>% 
  filter(Assessment == "4. Needs and Resources: Emotional Health (INTAKE) 0-5" | 
           Assessment == "4. Needs and Resources: Emotional Health (Interim) 0-5")

emotional_health_6 <- casa_data %>% 
  filter(Assessment == "4. Needs and Resources: Emotional Health (INTAKE) 6+" | 
           Assessment == "4. Needs and Resources: Emotional Health (Interim) 6+")

emotional_health_0_5_6 <- casa_data %>% 
  filter(Assessment == "4. Needs and Resources: Emotional Health (INTAKE) 0-5" | 
           Assessment == "4. Needs and Resources: Emotional Health (Interim) 6+")

Step 2 - Filter for Intakes and Interims

emotionalhealth_sets_0_5 <- filter_intake_interim(
  emotional_health_0_5,
  intake_lbl  = "4. Needs and Resources: Emotional Health (INTAKE) 0-5",
  interim_lbl = "4. Needs and Resources: Emotional Health (Interim) 0-5"
)

emotionalhealth_0_5_intake  <- emotionalhealth_sets_0_5$intake
emotionalhealth_0_5_interim <- emotionalhealth_sets_0_5$interim



emotionalhealth_0_5_intake_s <- emotionalhealth_0_5_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

emotionalhealth_0_5_interim_s <- emotionalhealth_0_5_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# 5) Join safely at question-row level
emotionalhealth_0_5_joined <- left_join(
  emotionalhealth_0_5_intake_s,
  emotionalhealth_0_5_interim_s,
  by = key_cols
)

emotionalhealth_0_5_clean <- emotionalhealth_0_5_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

emotionalhealth_sets_6 <- filter_intake_interim(
  emotional_health_6,
  intake_lbl  = "4. Needs and Resources: Emotional Health (INTAKE) 6+",
  interim_lbl = "4. Needs and Resources: Emotional Health (Interim) 6+"
)

emotionalhealth_6_intake  <- emotionalhealth_sets_6$intake
emotionalhealth_6_interim <- emotionalhealth_sets_6$interim



emotionalhealth_6_intake_s <- emotionalhealth_6_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

emotionalhealth_6_interim_s <- emotionalhealth_6_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# 5) Join safely at question-row level
emotionalhealth_6_joined <- left_join(
  emotionalhealth_6_intake_s,
  emotionalhealth_6_interim_s,
  by = key_cols
)

emotionalhealth_6_clean <- emotionalhealth_6_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

No cases meet the criteria of completing an emotional health intake in the 0-5 age group and interim in the 6+ age group, but the code is provided below if ever needed.

# emotionalhealth_sets_0_5_6 <- filter_intake_interim(
#   emotional_health_0_5_6,
#   intake_lbl  = "4. Needs and Resources: Emotional Health (INTAKE) 0-5",
#   interim_lbl = "4. Needs and Resources: Emotional Health (Interim) 6+"
# )
# 
# emotionalhealth_0_5_6_intake  <- emotionalhealth_sets_0_5_6$intake
# emotionalhealth_0_5_6_interim <- emotionalhealth_sets_0_5_6$interim
# 
# 
# 
# emotionalhealth_0_5_6_intake_s <- emotionalhealth_0_5_6_intake %>%
#   rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))
# 
# emotionalhealth_0_5_6_interim_s <- emotionalhealth_0_5_6_interim %>%
#   rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))
# 
# # 5) Join safely at question-row level
# emotionalhealth_0_5_6_joined <- left_join(
#   emotionalhealth_0_5_6_intake_s,
#   emotionalhealth_0_5_6_interim_s,
#   by = key_cols
# )
# 
# emotionalhealth_0_5_6_clean <- emotionalhealth_0_5_6_joined %>%
#   # keep intake version, rename back to original
#   rename_with(
#     ~ str_remove(.x, "_intake$"),
#     .cols = all_of(paste0(demo_cols, "_intake"))
#   ) %>%
#   # drop interim duplicates
#   select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

emotionalhealth_0_5_clean <- emotionalhealth_0_5_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

emotionalhealth_6_clean <- emotionalhealth_6_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

# emotionalhealth_0_5_6_clean <- emotionalhealth_0_5_6_clean %>% 
#   select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

Needs and Resources Learning

Step 1 - Filter Learning Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        120
##  2 1. ACEs (INTAKE) 6+         600
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+    720
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       55
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      12
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

learning_0_5 <- casa_data %>% 
  filter(Assessment == "5. Needs and Resources: Learning (INTAKE) 0-5" | 
           Assessment == "5. Needs and Resources: Learning (Interim) 0-5")

learning_6 <- casa_data %>% 
  filter(Assessment == "5. Needs and Resources: Learning (INTAKE) 6+" | 
           Assessment == "5. Needs and Resources: Learning (Interim) 6+")

learning_0_5_6 <- casa_data %>% 
  filter(Assessment == "5. Needs and Resources: Learning (INTAKE) 0-5" | 
           Assessment == "5. Needs and Resources: Learning (Interim) 6+")

Step 2 - Filter for Intakes and Interims

learning_sets_0_5 <- filter_intake_interim(
  learning_0_5,
  intake_lbl  = "5. Needs and Resources: Learning (INTAKE) 0-5",
  interim_lbl = "5. Needs and Resources: Learning (Interim) 0-5"
)

learning_0_5_intake  <- learning_sets_0_5$intake
learning_0_5_interim <- learning_sets_0_5$interim



learning_0_5_intake_s <- learning_0_5_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

learning_0_5_interim_s <- learning_0_5_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# 5) Join safely at question-row level
learning_0_5_joined <- left_join(
  learning_0_5_intake_s,
  learning_0_5_interim_s,
  by = key_cols
)

learning_0_5_clean <- learning_0_5_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

learning_sets_6 <- filter_intake_interim(
  learning_6,
  intake_lbl  = "5. Needs and Resources: Learning (INTAKE) 6+",
  interim_lbl = "5. Needs and Resources: Learning (Interim) 6+"
)

learning_6_intake  <- learning_sets_6$intake
learning_6_interim <- learning_sets_6$interim



learning_6_intake_s <- learning_6_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

learning_6_interim_s <- learning_6_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# 5) Join safely at question-row level
learning_6_joined <- left_join(
  learning_6_intake_s,
  learning_6_interim_s,
  by = key_cols
)

learning_6_clean <- learning_6_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

No cases meet the criteria of completing a learning intake in the 0-5 age group and interim in the 6+ age group, but the code is provided below if ever needed.

# 
# learning_sets_0_5_6 <- filter_intake_interim(
#   learning_0_5_6,
#   intake_lbl  = "5. Needs and Resources: Learning (INTAKE) 0-5",
#   interim_lbl = "5. Needs and Resources: Learning (Interim) 6+"
# )
# 
# learning_0_5_6_intake  <- learning_sets_0_5_6$intake
# learning_0_5_6_interim <- learning_sets_0_5_6$interim
# 
# 
# 
# learning_0_5_6_intake_s <- learning_0_5_6_intake %>%
#   rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))
# 
# learning_0_5_6_interim_s <- learning_0_5_6_interim %>%
#   rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))
# 
# # 5) Join safely at question-row level
# learning_0_5_6_joined <- left_join(
#   learning_0_5_6_intake_s,
#   learning_0_5_6_interim_s,
#   by = key_cols
# )
# 
# learning_0_5_6_clean <- learning_0_5_6_joined %>%
#   # keep intake version, rename back to original
#   rename_with(
#     ~ str_remove(.x, "_intake$"),
#     .cols = all_of(paste0(demo_cols, "_intake"))
#   ) %>%
#   # drop interim duplicates
#   select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

learning_0_5_clean <- learning_0_5_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

learning_6_clean <- learning_6_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

# learning_0_5_6_clean <- learning_0_5_6_clean %>% 
#   select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

Needs and Resources Long Term Independence

Long Term Indepedence appeared to have duplicated data entry so the steps were modified.

Step 1 - Filter Long Term Independence Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        120
##  2 1. ACEs (INTAKE) 6+         600
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+    720
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       55
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      12
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

INTAKE_LBL  <- "6. Needs and Resources: Long Term Independence (INTAKE) 6+"
INTERIM_LBL <- "6. Needs and Resources: Long Term Independence (Interim) 6+"

# LTI-specific item key (stronger than your global key_cols)
key_cols_lti <- c("Child ID", "Question Number", "Question Name")

LTI <- casa_data %>%
  filter(Assessment %in% c(INTAKE_LBL, INTERIM_LBL)) %>%
  distinct(
    `Child ID`, Assessment, `Assessment Date`, SourceFile,
    `Question Number`, `Question Name`, Question, Response, `Point Value`,
    .keep_all = TRUE
  )

Step 2 - Filter for Intakes and Interims

LTI_sets <- filter_intake_interim(
  LTI,
  intake_lbl  = INTAKE_LBL,
  interim_lbl = INTERIM_LBL
)

LTI_intake  <- LTI_sets$intake
LTI_interim <- LTI_sets$interim

LTI_intake  %>% count(across(all_of(key_cols_lti))) %>% summarise(max_n = max(n))

## # A tibble: 1 × 1
##   max_n
##   <int>
## 1     1

LTI_interim %>% count(across(all_of(key_cols_lti))) %>% summarise(max_n = max(n))

## # A tibble: 1 × 1
##   max_n
##   <int>
## 1     1

LTI_intake_s <- LTI_intake %>%
  rename_with(~ suffix_once(.x, "_intake"),  .cols = -all_of(key_cols_lti))

LTI_interim_s <- LTI_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols_lti))


LTI_joined <- inner_join(
  LTI_intake_s,
  LTI_interim_s,
  by = key_cols_lti
)

LTI_clean <- LTI_joined %>%
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

names(LTI_clean)

##  [1] "SourceFile_intake"            "Child ID"                    
##  [3] "Assigned to Program Date"     "Program Closure Date"        
##  [5] "Birthdate"                    "Gender"                      
##  [7] "Race"                         "Ethnicity"                   
##  [9] "Language"                     "Primary Language"            
## [11] "Petition Type"                "AssessmentAge_intake"        
## [13] "AssessmentAgeGroup_intake"    "Assessment Date_intake"      
## [15] "Category_intake"              "Assessment_intake"           
## [17] "Total Child Score_intake"     "Total Possible Score_intake" 
## [19] "Question Number"              "Question Name"               
## [21] "Question_intake"              "Response Number_intake"      
## [23] "Response_intake"              "Point Value_intake"          
## [25] "SourceFile_interim"           "AssessmentAge_interim"       
## [27] "AssessmentAgeGroup_interim"   "Assessment Date_interim"     
## [29] "Category_interim"             "Assessment_interim"          
## [31] "Total Child Score_interim"    "Total Possible Score_interim"
## [33] "Question_interim"             "Response Number_interim"     
## [35] "Response_interim"             "Point Value_interim"

LTI_clean <- LTI_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31, 32, 34, 35, 36))

LTI_clean <- LTI_clean %>% 
  rename("Question" = Question_intake)

Needs and Resources Probation-Involved Youth

Step 1 - Filter Wellbeing Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        120
##  2 1. ACEs (INTAKE) 6+         600
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+    720
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       55
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      12
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

PIY <- casa_data %>% 
  filter(Assessment == "7. Needs and Resources: Probation-Involved Youth (INTAKE) 6+" | 
           Assessment == "7. Needs and Resources: Probation-Involved Youth (INTAKE) 6+")

Step 2 - Filter for Intakes and Interims

PIY_sets <- filter_intake_interim(
  PIY,
  intake_lbl  = "7. Needs and Resources: Probation-Involved Youth (INTAKE) 6+",
  interim_lbl = "7. Needs and Resources: Probation-Involved Youth (INTAKE) 6+"
)

PIY_intake  <- PIY_sets$intake
PIY_interim <- PIY_sets$interim



PIY_intake_s <- PIY_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

PIY_interim_s <- PIY_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# 5) Join safely at question-row level
PIY_joined <- left_join(
  PIY_intake_s,
  PIY_interim_s,
  by = key_cols
)

PIY_clean <- PIY_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

PIY_clean <- PIY_clean %>% 
  select(c(1, 25, 2:16, 26:30, 17:24, 31:35))

Bind Intake-Interim Data Objects

casa_intake_interim <- bind_rows(asq3_clean,
                                 wellbeing_clean,
                                 physicalhealth_0_5_clean,
                                 physicalhealth_6_clean,
                                 emotionalhealth_0_5_clean,
                                 emotionalhealth_6_clean,
                                 learning_0_5_clean,
                                 learning_6_clean,
                                 LTI_clean,
                                 PIY_clean,
                                 .id = "DataFrameSource")


casa_intake_interim <- casa_intake_interim %>% 
  arrange(`Child ID`)


str(casa_intake_interim)

## tibble [1,483 × 36] (S3: tbl_df/tbl/data.frame)
##  $ DataFrameSource             : chr [1:1483] "10" "10" "10" "10" ...
##  $ SourceFile_intake           : Factor w/ 2 levels "December2024-June2025",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ SourceFile_interim          : Factor w/ 2 levels "December2024-June2025",..: 1 1 1 1 2 2 2 2 2 2 ...
##  $ Child ID                    : Factor w/ 91 levels "4636048","4636553",..: 1 1 1 1 3 3 3 3 3 3 ...
##  $ Assigned to Program Date    : Date[1:1483], format: "2019-06-04" "2019-06-04" ...
##  $ Program Closure Date        : Date[1:1483], format: NA NA ...
##  $ Birthdate                   : Date[1:1483], format: "2004-07-26" "2004-07-26" ...
##  $ Gender                      : Factor w/ 2 levels "Female","Male": 1 1 1 1 2 2 2 2 2 2 ...
##  $ Race                        : Factor w/ 5 levels "American Indian or Alaska Native",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ Ethnicity                   : Factor w/ 2 levels "Hispanic","Non-Hispanic": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Language                    : Factor w/ 4 levels "Bi-Lingual (English/Spanish)",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Primary Language            : Factor w/ 4 levels "False","No","True",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Petition Type               : Factor w/ 4 levels "Dependency","Dual Status",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ AssessmentAge_intake        : num [1:1483] 20.9 20.9 20.9 20.9 6.4 6.4 6.4 6.4 6.4 6.4 ...
##  $ AssessmentAgeGroup_intake   : Factor w/ 2 levels "0-5","6+": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Assessment Date_intake      : Date[1:1483], format: "2025-06-30" "2025-06-30" ...
##  $ Category_intake             : Factor w/ 2 levels "CASA of Santa Cruz Advocacy Planning Survey:INTAKE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Assessment_intake           : Factor w/ 26 levels "1. ACEs (INTAKE) 0-5",..: 25 25 25 25 5 5 5 5 5 5 ...
##  $ AssessmentAge_interim       : num [1:1483] 20.9 20.9 20.9 20.9 7.3 7.3 7.3 7.3 7.3 7.3 ...
##  $ AssessmentAgeGroup_interim  : Factor w/ 2 levels "0-5","6+": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Assessment Date_interim     : Date[1:1483], format: "2025-06-30" "2025-06-30" ...
##  $ Category_interim            : Factor w/ 2 levels "CASA of Santa Cruz Advocacy Planning Survey:INTAKE",..: 1 1 1 1 2 2 2 2 2 2 ...
##  $ Assessment_interim          : Factor w/ 26 levels "1. ACEs (INTAKE) 0-5",..: 25 25 25 25 6 6 6 6 6 6 ...
##  $ Total Child Score_intake    : num [1:1483] 12 12 12 12 32 32 32 32 32 32 ...
##  $ Total Possible Score_intake : num [1:1483] 12 12 12 12 36 36 36 36 36 36 ...
##  $ Question Number             : num [1:1483] 1 2 3 4 1 2 3 4 5 6 ...
##  $ Question Name               : Factor w/ 76 levels "ACEs-Intro","ACEs-Q1-Emotional Neglect",..: 67 68 69 70 24 25 26 73 74 75 ...
##  $ Question                    : Factor w/ 77 levels "Able to calm themselves",..: 43 48 45 46 10 74 9 2 1 25 ...
##  $ Response Number_intake      : Factor w/ 5 levels "1","2","3","4",..: 5 5 5 5 3 2 2 2 3 2 ...
##  $ Response_intake             : Factor w/ 29 levels "Above cutoff",..: 20 20 20 20 5 27 27 27 5 27 ...
##  $ Point Value_intake          : num [1:1483] 3 3 3 3 3 2 2 2 3 2 ...
##  $ Total Child Score_interim   : num [1:1483] 12 12 12 12 28 28 28 28 28 28 ...
##  $ Total Possible Score_interim: num [1:1483] 12 12 12 12 36 36 36 36 36 36 ...
##  $ Response Number_interim     : Factor w/ 5 levels "1","2","3","4",..: 5 5 5 5 2 2 2 2 2 2 ...
##  $ Response_interim            : Factor w/ 29 levels "Above cutoff",..: 20 20 20 20 27 27 27 27 27 27 ...
##  $ Point Value_interim         : num [1:1483] 3 3 3 3 2 2 2 2 2 2 ...

casa_intake_interim <- casa_intake_interim %>% 
  mutate(DataFrameSource = as.factor(DataFrameSource))


casa_intake_interim$DataFrameSource <- recode(casa_intake_interim$DataFrameSource,
                                              `1` = "ASQ-3_0-5",
                                              `2` = "Wellbeing_6+",
                                              `3` = "PhysicalHealth_0-5",
                                              `4` = "PhysicalHealth_6+",
                                              `5` = "EmotionalHealth_0-5",
                                              `6` = "EmotionalHealth_6+",
                                              `7` = "Learning_0-5",
                                              `8` = "Learning_6+",
                                              `9` = "LongTermIndependence_6+",
                                              `10` = "ProbationInvolvedYouth_6+")

write_xlsx(casa_intake_interim, path = "data_export/Casa_SantaCruz_Intake_Interim_Data_12.22.25.xlsx")

Isolate Data by Cases with Interims but Missing Intakes

To section walks through how to isolate cases where there is a present interim assessment but no corresponding intake data for the following scenarios:

ASQ-3/SE ages 0-5
Wellbeing ages 6+
Needs and Resources Physical Health by:
- 0-5 interim but no matching intake
- 6+ interim but no matching intake
Needs and Resources Emotional Health by:
- 0-5 interim but no matching intake
- 6+ interim but no matching intake
Needs and Resources Learning by
- 0-5 interim but no matching intake
- 6+ interim but no matching intake
Needs and Resources Long-Term Independence ages 6+
Needs and Resources Probation Involved Youth ages 6+

ASQ-3

asq3_only_interim <- asq3 %>% 
  group_by(`Child ID`) %>%
  filter(all(str_detect(Assessment, "Interim")) & 
           !any(str_detect(Assessment, "INTAKE")))


asq3_most_recent_interim <- asq3_only_interim %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`)) %>%
  arrange(`Child ID`)

ASQ SE

No observations meet the criteria, but the code is provided below if/when needed.

# asq_se_only_interim <- asq_se %>% 
#   group_by(`Child ID`) %>%
#   filter(all(str_detect(Assessment, "Interim")) & 
#            !any(str_detect(Assessment, "INTAKE")))
# 
# 
# asq_se_most_recent_interim <- asq_se_only_interim %>%
#   group_by(`Child ID`) %>%
#   filter(`Assessment Date` == max(`Assessment Date`)) %>%
#   arrange(`Child ID`)

Wellbeing

wellbeing_only_interim <- wellbeing %>% 
  group_by(`Child ID`) %>%
  filter(all(str_detect(Assessment, "Interim")) & 
           !any(str_detect(Assessment, "INTAKE")))


wellbeing_most_recent_interim <- wellbeing_only_interim %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`)) %>%
  arrange(`Child ID`)

Needs and Resources Physical Health 0-5

ph05_only_interim <- physical_health_0_5 %>% 
  group_by(`Child ID`) %>%
  filter(all(str_detect(Assessment, "Interim")) & 
           !any(str_detect(Assessment, "INTAKE")))


ph05_most_recent_interim <- ph05_only_interim %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`)) %>%
  arrange(`Child ID`)

Needs and Resources Physical Health 6+

ph_6_only_interim <- physical_health_6 %>% 
  group_by(`Child ID`) %>%
  filter(all(str_detect(Assessment, "Interim")) & 
           !any(str_detect(Assessment, "INTAKE")))


ph_6_most_recent_interim <- ph_6_only_interim %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`)) %>%
  arrange(`Child ID`)

Needs and Resources Emotional Health 0-5

eh05_only_interim <- emotional_health_0_5 %>% 
  group_by(`Child ID`) %>%
  filter(all(str_detect(Assessment, "Interim")) & 
           !any(str_detect(Assessment, "INTAKE")))


eh05_most_recent_interim <- eh05_only_interim %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`)) %>%
  arrange(`Child ID`)

Needs and Resources emotional Health 6+

eh_6_only_interim <- emotional_health_6 %>% 
  group_by(`Child ID`) %>%
  filter(all(str_detect(Assessment, "Interim")) & 
           !any(str_detect(Assessment, "INTAKE")))


eh_6_most_recent_interim <- eh_6_only_interim %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`)) %>%
  arrange(`Child ID`)

Needs and Resources Learning 0-5

learning_05_only_interim <- learning_0_5 %>% 
  group_by(`Child ID`) %>%
  filter(all(str_detect(Assessment, "Interim")) & 
           !any(str_detect(Assessment, "INTAKE")))


learning_05_most_recent_interim <- learning_05_only_interim %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`)) %>%
  arrange(`Child ID`)

Needs and Resources Learning 6+

learning_6_only_interim <- learning_6 %>% 
  group_by(`Child ID`) %>%
  filter(all(str_detect(Assessment, "Interim")) & 
           !any(str_detect(Assessment, "INTAKE")))


learning_6_most_recent_interim <- learning_6_only_interim %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`)) %>%
  arrange(`Child ID`)

Needs and Resources Long Term Independence 6+

lti_6_only_interim <- LTI %>% 
  group_by(`Child ID`) %>%
  filter(all(str_detect(Assessment, "Interim")) & 
           !any(str_detect(Assessment, "INTAKE")))


lti_6_most_recent_interim <- lti_6_only_interim %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`)) %>%
  arrange(`Child ID`)

Needs and Resources Probation-Involved Youth 6+

No cases meet this criteria, but the code is provided below if/when needed in the future.

# piy_6_only_interim <- PIY %>% 
#   group_by(`Child ID`) %>%
#   filter(all(str_detect(Assessment, "Interim")) & 
#            !any(str_detect(Assessment, "INTAKE")))
# 
# 
# piy_6_most_recent_interim <- piy_6_only_interim %>%
#   group_by(`Child ID`) %>%
#   filter(`Assessment Date` == max(`Assessment Date`)) %>%
#   arrange(`Child ID`)

Bind Data Together

casa_data_missing_intake <- bind_rows(asq3_most_recent_interim,
                                 wellbeing_most_recent_interim,
                                 ph05_most_recent_interim,
                                 ph_6_most_recent_interim,
                                 eh05_most_recent_interim,
                                 eh_6_most_recent_interim,
                                 learning_05_most_recent_interim,
                                 learning_6_most_recent_interim,
                                 lti_6_most_recent_interim,
                                 .id = "DataFrameSource")



casa_data_missing_intake <- casa_data_missing_intake %>% 
  arrange(`Child ID`)


str(casa_data_missing_intake)

## gropd_df [856 × 25] (S3: grouped_df/tbl_df/tbl/data.frame)
##  $ DataFrameSource         : chr [1:856] "2" "2" "2" "2" ...
##  $ SourceFile              : Factor w/ 2 levels "December2024-June2025",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Child ID                : Factor w/ 91 levels "4636048","4636553",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Assigned to Program Date: Date[1:856], format: "2023-10-23" "2023-10-23" ...
##  $ Program Closure Date    : Date[1:856], format: NA NA ...
##  $ Birthdate               : Date[1:856], format: "2006-08-28" "2006-08-28" ...
##  $ Gender                  : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Race                    : Factor w/ 5 levels "American Indian or Alaska Native",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ Ethnicity               : Factor w/ 2 levels "Hispanic","Non-Hispanic": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Language                : Factor w/ 4 levels "Bi-Lingual (English/Spanish)",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Primary Language        : Factor w/ 4 levels "False","No","True",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ Petition Type           : Factor w/ 4 levels "Dependency","Dual Status",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ AssessmentAge           : num [1:856] 19.2 19.2 19.2 19.2 19.2 19.2 19.2 19.2 19.2 19.2 ...
##  $ AssessmentAgeGroup      : Factor w/ 2 levels "0-5","6+": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Assessment Date         : Date[1:856], format: "2025-11-18" "2025-11-18" ...
##  $ Category                : Factor w/ 2 levels "CASA of Santa Cruz Advocacy Planning Survey:INTAKE",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Assessment              : Factor w/ 26 levels "1. ACEs (INTAKE) 0-5",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ Total Child Score       : num [1:856] 25 25 25 25 25 25 25 25 25 25 ...
##  $ Total Possible Score    : num [1:856] 36 36 36 36 36 36 36 36 36 36 ...
##  $ Question Number         : num [1:856] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Question Name           : Factor w/ 76 levels "ACEs-Intro","ACEs-Q1-Emotional Neglect",..: 24 25 26 73 74 75 12 13 14 15 ...
##  $ Question                : Factor w/ 77 levels "Able to calm themselves",..: 10 74 9 2 1 25 14 58 77 75 ...
##  $ Response Number         : Factor w/ 5 levels "1","2","3","4",..: 3 3 3 2 2 1 1 2 2 3 ...
##  $ Response                : Factor w/ 29 levels "Above cutoff",..: 5 5 5 27 27 26 26 27 27 5 ...
##  $ Point Value             : num [1:856] 3 3 3 2 2 1 1 2 2 3 ...
##  - attr(*, "groups")= tibble [22 × 2] (S3: tbl_df/tbl/data.frame)
##   ..$ Child ID: Factor w/ 91 levels "4636048","4636553",..: 2 4 5 7 8 9 11 13 14 15 ...
##   ..$ .rows   : list<int> [1:22] 
##   .. ..$ : int [1:39] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..$ : int [1:39] 40 41 42 43 44 45 46 47 48 49 ...
##   .. ..$ : int [1:39] 79 80 81 82 83 84 85 86 87 88 ...
##   .. ..$ : int [1:39] 118 119 120 121 122 123 124 125 126 127 ...
##   .. ..$ : int [1:39] 157 158 159 160 161 162 163 164 165 166 ...
##   .. ..$ : int [1:70] 196 197 198 199 200 201 202 203 204 205 ...
##   .. ..$ : int [1:39] 266 267 268 269 270 271 272 273 274 275 ...
##   .. ..$ : int [1:39] 305 306 307 308 309 310 311 312 313 314 ...
##   .. ..$ : int [1:39] 344 345 346 347 348 349 350 351 352 353 ...
##   .. ..$ : int [1:70] 383 384 385 386 387 388 389 390 391 392 ...
##   .. ..$ : int [1:39] 453 454 455 456 457 458 459 460 461 462 ...
##   .. ..$ : int [1:39] 492 493 494 495 496 497 498 499 500 501 ...
##   .. ..$ : int [1:39] 531 532 533 534 535 536 537 538 539 540 ...
##   .. ..$ : int [1:39] 570 571 572 573 574 575 576 577 578 579 ...
##   .. ..$ : int [1:39] 609 610 611 612 613 614 615 616 617 618 ...
##   .. ..$ : int [1:39] 648 649 650 651 652 653 654 655 656 657 ...
##   .. ..$ : int [1:39] 687 688 689 690 691 692 693 694 695 696 ...
##   .. ..$ : int [1:24] 726 727 728 729 730 731 732 733 734 735 ...
##   .. ..$ : int [1:39] 750 751 752 753 754 755 756 757 758 759 ...
##   .. ..$ : int [1:39] 789 790 791 792 793 794 795 796 797 798 ...
##   .. ..$ : int [1:21] 828 829 830 831 832 833 834 835 836 837 ...
##   .. ..$ : int [1:8] 849 850 851 852 853 854 855 856
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE

casa_data_missing_intake <- casa_data_missing_intake %>% 
  mutate(DataFrameSource = as.factor(DataFrameSource))

casa_data_missing_intake %>% 
  count(DataFrameSource)

## # A tibble: 103 × 3
## # Groups:   Child ID [22]
##    `Child ID` DataFrameSource     n
##    <fct>      <fct>           <int>
##  1 4636553    2                  12
##  2 4636553    4                   3
##  3 4636553    6                   9
##  4 4636553    8                   7
##  5 4636553    9                   8
##  6 5346463    2                  12
##  7 5346463    4                   3
##  8 5346463    6                   9
##  9 5346463    8                   7
## 10 5346463    9                   8
## # ℹ 93 more rows

casa_data_missing_intake$DataFrameSource <- recode(casa_data_missing_intake$DataFrameSource,
                                              `1` = "ASQ-3_0-5",
                                              `2` = "Wellbeing_6+",
                                              `3` = "PhysicalHealth_0-5",
                                              `4` = "PhysicalHealth_6+",
                                              `5` = "EmotionalHealth_0-5",
                                              `6` = "EmotionalHealth_6+",
                                              `7` = "Learning_0-5",
                                              `8` = "Learning_6+",
                                              `9` = "LongTermIndependence_6+")

write_xlsx(casa_data_missing_intake, path = "data_export/Casa_SantaCruz_MissingIntakeInterim_Data_12.22.25.xlsx")

CASA Santa Cruz Optima Data Cleaning and Preparation

Jim Wright, Applied Survey Research

Data Import and Preparation for Cleaning

Step 1 - Load R Packages and Import Data

Step 2 - Combine Raw Data Files

Bind Demographic Data

Bind Assessment Data

Join Demographic and Assessment Data

Step 3 - Update Structure of Joined Data Frame

Isolate Intake-Only Data

Step 1 - Filter Intake Data

Step 2 - Export Data

Isolate Interim Only Assessments (PACEs)

Isolate Data by Earliest Intake and Latest Interim Assessment

ASQ-3

Step 1 - Filter ASQ-3 Assessments

Step 2 - Filter for Intakes and Interims

Step 3 - Rearrange Column Order

ASQ-SE

Step 1 - Filter ASQ-SE Assessments

Wellbeing

Step 1 - Filter Wellbeing Assessments

Step 2 - Filter for Intakes and Interims

Step 3 - Rearrange Column Order

Needs and Resources Physical Health

Step 1 - Filter Physical Health Assessments

Step 2 - Filter for Intakes and Interims

Step 3 - Rearrange Column Order

Needs and Resources Emotional Health

Step 1 - Filter Emotional Health Assessments

Step 2 - Filter for Intakes and Interims

Step 3 - Rearrange Column Order

Needs and Resources Learning

Step 1 - Filter Learning Assessments

Step 2 - Filter for Intakes and Interims

Step 3 - Rearrange Column Order

Needs and Resources Long Term Independence

Step 1 - Filter Long Term Independence Assessments

Step 2 - Filter for Intakes and Interims

Step 3 - Rearrange Column Order

Needs and Resources Probation-Involved Youth

Step 1 - Filter Wellbeing Assessments

Step 2 - Filter for Intakes and Interims

Step 3 - Rearrange Column Order

Bind Intake-Interim Data Objects

Isolate Data by Cases with Interims but Missing Intakes

ASQ-3

ASQ SE

Wellbeing

Needs and Resources Physical Health 0-5

Needs and Resources Physical Health 6+

Needs and Resources Emotional Health 0-5

Needs and Resources emotional Health 6+

Needs and Resources Learning 0-5

Needs and Resources Learning 6+

Needs and Resources Long Term Independence 6+

Needs and Resources Probation-Involved Youth 6+

Bind Data Together

Create Master Data File