Data Import and Preparation for Cleaning

Step 1 - Load R Packages and Import Data

This code chunk loads all packages needed for cleaning and organizing the data. If a package is not yet installed, first run install.packages() with the package name in quotes inside the parentheses, for example install.packages("janitor").
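
As a minimal sketch of that install step, the helper here (the name find_missing_packages is hypothetical and not part of the original script) lists which of the required packages are not yet installed, so only those need to be passed to install.packages(). The install call is left commented out so nothing is downloaded unintentionally.

```r
# Packages loaded in this step
required_pkgs <- c("tidyverse", "here", "janitor", "rio",
                   "writexl", "lubridate", "stringr")

# Hypothetical helper: return the packages that are not in the local library
find_missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

missing_pkgs <- find_missing_packages(required_pkgs)
missing_pkgs

# Uncomment to install only what is missing
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```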

library(tidyverse)
library(here)
library(janitor)
library(rio)
library(writexl)
library(lubridate)
library(stringr)

demographics <- import(here("data", "Running_Data_File_Demographics.xlsx"),
                  setclass = "tbl_df")


results <- import(here("data", "Running_Data_File_Assessments.xlsx"),
                  setclass = "tbl_df")

Step 2 - Join Demographic and Assessment Data

To merge the data, we use left_join() with the results data frame on the left. A left join keeps every row of the left data frame (here, every assessment response) and pulls in the matching demographic information for each child based on the Child ID variable. Because no by argument is supplied, left_join() joins on every column name the two data frames share, so check the join message in the console to confirm that the key is Child ID.

casa_data <- left_join(results, demographics)
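
The following sketch uses two small made-up data frames (results_demo and demographics_demo are stand-ins, not the real files) to show the same left join with the key named explicitly, plus two quick checks: that the join did not add rows, and which children have no demographic record. It assumes Child ID is unique in the demographic data.

```r
library(dplyr)

# Toy stand-ins for the assessment and demographic data (values are made up)
results_demo <- tibble(
  `Child ID` = c(1, 1, 2, 3),
  Response   = c("Yes", "No", "Yes", "No")
)
demographics_demo <- tibble(
  `Child ID` = c(1, 2),
  Gender     = c("Female", "Male")
)

# Naming the key makes the join explicit
joined_demo <- left_join(results_demo, demographics_demo, by = "Child ID")

# With a unique key on the right, a left join returns exactly the left rows
nrow(joined_demo) == nrow(results_demo)

# Children with assessments but no demographic record (Child ID 3 here)
unmatched_demo <- anti_join(results_demo, demographics_demo, by = "Child ID")
unmatched_demo
```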

Step 3 - Update Structure of Joined Data Frame

Before wrangling the data by assessment time and assessment type for analysis and visualization, I like to update the structure of the data frame so that each type of data is easier to manipulate. I store every variable as numeric, a factor, or a date.

str(casa_data)

## tibble [9,434 × 22] (S3: tbl_df/tbl/data.frame)
##  $ SourceFile              : chr [1:9434] "July2024-December2024" "July2024-December2024" "July2024-December2024" "July2024-December2024" ...
##  $ Child ID                : num [1:9434] 4637144 4637144 4637144 4637144 4637144 ...
##  $ Assessment Date         : POSIXct[1:9434], format: "2024-10-24" "2024-10-24" ...
##  $ Category                : chr [1:9434] "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" "CASA of Santa Cruz Advocacy Planning Survey:INTAKE" ...
##  $ Assessment              : chr [1:9434] "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" "1. ACEs (INTAKE) 6+" ...
##  $ Total Child Score       : num [1:9434] 5 5 5 5 5 5 5 5 5 5 ...
##  $ Total Possible Score    : num [1:9434] 10 10 10 10 10 10 10 10 10 10 ...
##  $ Question Number         : num [1:9434] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Question Name           : chr [1:9434] "ACEs-Intro" "ACEs-Q1-Emotional Neglect" "ACEs-Q2-Emotional Abuse" "ACEs-Q3-Physical Neglect" ...
##  $ Question                : chr [1:9434] "Document any adverse childhood experiences. This list of experiences represent child experiences and caregiver "| __truncated__ "Emotional Neglect: Do you think the child ever felt unsupported, unloved and/or unprotected?" "Emotional Abuse: Has a parent/caregiver ever insulted, humiliated, or put down the child?" "Physical Neglect: Has the child ever lacked appropriate care by any caregiver? (for example, not being protecte"| __truncated__ ...
##  $ Response Number         : num [1:9434] 1 1 1 1 2 2 2 2 2 1 ...
##  $ Response                : chr [1:9434] "Click here to acknowledge that you’ve read these instructions." "Yes" "Yes" "Yes" ...
##  $ Point Value             : num [1:9434] 0 0 0 0 1 1 1 1 1 0 ...
##  $ Assigned to Program Date: POSIXct[1:9434], format: "2023-10-20" "2023-10-20" ...
##  $ Program Closure Date    : logi [1:9434] NA NA NA NA NA NA ...
##  $ Birthdate               : POSIXct[1:9434], format: "2004-01-22" "2004-01-22" ...
##  $ Gender                  : chr [1:9434] "Female" "Female" "Female" "Female" ...
##  $ Race                    : chr [1:9434] "White" "White" "White" "White" ...
##  $ Ethnicity               : chr [1:9434] "Hispanic" "Hispanic" "Hispanic" "Hispanic" ...
##  $ Language                : chr [1:9434] "Bi-Lingual (English/Spanish)" "Bi-Lingual (English/Spanish)" "Bi-Lingual (English/Spanish)" "Bi-Lingual (English/Spanish)" ...
##  $ Primary Language        : chr [1:9434] NA NA NA NA ...
##  $ Petition Type           : chr [1:9434] "Dependency" "Dependency" "Dependency" "Dependency" ...

# Convert every character column to a factor (this includes Question Name),
# then convert the numeric Child ID to a factor as well
casa_data <- casa_data %>% 
  mutate(across(where(is.character), as.factor),
         `Child ID` = as.factor(`Child ID`))


# Convert the date-time columns to Date. as.Date() also turns the all-NA
# logical Program Closure Date column into a Date column, so a separate
# ymd() pass is not needed.
casa_data <- casa_data %>% 
  mutate(`Assessment Date` = as.Date(`Assessment Date`),
         `Assigned to Program Date` = as.Date(`Assigned to Program Date`),
         `Program Closure Date` = as.Date(`Program Closure Date`),
         Birthdate = as.Date(Birthdate))

Last things to do before the data are ready:

- Use Assessment Date and Birthdate to create AssessmentAge, the child's age in years at the time of the assessment.
- Create FiscalYear from the source file name of each data export.
- Create AssessmentAgeGroup (0-5 or 6+) from the age band at the end of each Assessment label.

casa_data <- casa_data %>% 
  mutate(AssessmentAge = round(interval(Birthdate, `Assessment Date`) / years(1), 1))
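
As a worked example of the age calculation, the sketch here applies the same expression to the birthdate and assessment date shown in the first row of the str() output. The variable names are illustrative only.

```r
library(lubridate)

birthdate       <- ymd("2004-01-22")
assessment_date <- ymd("2024-10-24")

# Elapsed time between the two dates, expressed in years, rounded to one decimal
age_at_assessment <- round(interval(birthdate, assessment_date) / years(1), 1)
age_at_assessment
```

The child in this example is between 20 and 21 years old at the assessment, so the result falls in that range.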


casa_data <- casa_data %>% 
  mutate(
    FiscalYear = case_when(
      SourceFile == "July2024-December2024"   ~ "FY 2024–25",
      SourceFile == "January2025-June2025"    ~ "FY 2024–25",
      SourceFile == "July2025-December2025"   ~ "FY 2025–26",
      TRUE                                    ~ NA_character_
    )
  )

casa_data <- casa_data %>% 
  select(c(24, 1:23))
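
Selecting by position works as long as the column count never changes. As an alternative sketch on a toy data frame (not the real data), relocate() moves a column by name and is unaffected if columns are added or removed later.

```r
library(dplyr)

# Toy data frame with the new column created last, as mutate() does by default
toy <- tibble(SourceFile = "July2024-December2024",
              `Child ID` = 1,
              FiscalYear = "FY 2024-25")

# Move FiscalYear to the front by name instead of by position
toy <- toy %>% relocate(FiscalYear)
names(toy)
```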

casa_data <- casa_data %>% 
  mutate(FiscalYear = as.factor(FiscalYear))

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        176
##  2 1. ACEs (INTAKE) 6+        1012
##  3 1. PACEs (Interim) 0-5       56
##  4 1. PACEs (Interim) 6+       456
##  5 2. Wellbeing (INTAKE) 6+   1196
##  6 2. Wellbeing (Interim) 6+   741
##  7 2a. ASQ-3 (INTAKE) 0-5       90
##  8 2a. ASQ-3 (Interim) 0-5      42
##  9 2b. ASQ-SE (INTAKE) 0-5      30
## 10 2b. ASQ-SE (Interim) 0-5     14
## # ℹ 16 more rows

casa_data <- casa_data %>%
  mutate(
    AssessmentAgeGroup = case_when(
      Assessment %in% c("1. ACEs (INTAKE) 0-5",
                        "1. PACEs (Interim) 0-5",
                        "2a. ASQ-3 (INTAKE) 0-5",
                        "2a. ASQ-3 (Interim) 0-5",
                        "2b. ASQ-SE (INTAKE) 0-5",
                        "2b. ASQ-SE (Interim) 0-5",
                        "3. Needs and Resources: Physical Health (INTAKE) 0-5",
                        "3. Needs and Resources: Physical Health (Interim) 0-5",
                        "4. Needs and Resources: Emotional Health (INTAKE) 0-5",
                        "4. Needs and Resources: Emotional Health (Interim) 0-5",
                        "5. Needs and Resources: Learning (INTAKE) 0-5",
                        "5. Needs and Resources: Learning (Interim) 0-5") ~ "0-5",
      Assessment %in% c("1. ACEs (INTAKE) 6+",
                        "1. PACEs (Interim) 6+",
                        "2. Wellbeing (INTAKE) 6+",
                        "2. Wellbeing (Interim) 6+",
                        "3. Needs and Resources: Physical Health (INTAKE) 6+",
                        "3. Needs and Resources: Physical Health (Interim) 6+",
                        "4. Needs and Resources: Emotional Health (INTAKE) 6+",
                        "4. Needs and Resources: Emotional Health (Interim) 6+",
                        "5. Needs and Resources: Learning (INTAKE) 6+",
                        "5. Needs and Resources: Learning (Interim) 6+",
                        "6. Needs and Resources: Long Term Independence (INTAKE) 6+",
                        "6. Needs and Resources: Long Term Independence (Interim) 6+",
                        "7. Needs and Resources: Probation-Involved Youth (INTAKE) 6+",
                        "7. Needs and Resources: Probation-Involved Youth (Interim) 6+") ~ "6+",
    )
  )

casa_data <- casa_data %>% 
  select(c(1:3, 15:25, 4:14))

casa_data <- casa_data %>% 
  mutate(AssessmentAgeGroup = as.factor(AssessmentAgeGroup))
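
Because every Assessment label listed above ends in its age band, the same grouping could also be derived with a pattern match. This is an alternative sketch, not the method used in this script, and it assumes the labels keep ending in "0-5" or "6+"; labels that do not match return NA, as they do with case_when(). For a factor column, wrap it in as.character() first.

```r
library(stringr)

# A few labels copied from the Assessment column
labels <- c("1. ACEs (INTAKE) 0-5",
            "2. Wellbeing (Interim) 6+",
            "5. Needs and Resources: Learning (INTAKE) 6+")

# Pull the age band from the end of each label;
# "+" is escaped because it is a regular expression operator
age_group <- str_extract(labels, "(0-5|6\\+)$")
age_group
```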

One more step is to remove the intro question statements from the data, as they are not needed for analysis or visualization. With those introduction statements removed, we can then subtract 1 from the Question Number variable so that it aligns with the question number in the Question Name variable.

casa_data %>% 
  count(`Question Name`)

## # A tibble: 76 × 2
##    `Question Name`                         n
##    <fct>                               <int>
##  1 ACEs-Intro                            108
##  2 ACEs-Q1-Emotional Neglect             108
##  3 ACEs-Q10-Parent Separation/Divorce    108
##  4 ACEs-Q2-Emotional Abuse               108
##  5 ACEs-Q3-Physical Neglect              108
##  6 ACEs-Q4-Physical Abuse                108
##  7 ACEs-Q5-Sexual Abuse                  108
##  8 ACEs-Q6-Incarcerated Caregiver        108
##  9 ACEs-Q7-Caregiver Treated Violently   108
## 10 ACEs-Q8-Mental Health                 108
## # ℹ 66 more rows

casa_data <- casa_data %>% 
  filter(`Question Name` != "ACEs-Intro" &
           `Question Name` != "ASQ-3-DVMLS-Intro" &
           `Question Name` != "EmotionalHealth-Intro" &
           `Question Name` != "Learning-Intro" &
           `Question Name` != "Longer Term-Intro" &
           `Question Name` != "PhysicalHealth-Intro" &
           `Question Name` != "ProbationYouth-Intro" &
           `Question Name` != "Wellbeing-Intro" &
           `Question Name` != "ASQ-SE Intro" & 
           `Question Name` != "PACEs-Intro")


casa_data <- casa_data %>% 
  mutate(`Question Number` = `Question Number` - 1)
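
All ten intro names in the filter above end in "Intro", so the same step can be sketched with a pattern match on a toy data frame. This is an illustration under that assumption, and it also assumes the intro is always question 1 of its assessment.

```r
library(dplyr)
library(stringr)

toy <- tibble(
  `Question Name`   = factor(c("ACEs-Intro",
                               "ACEs-Q1-Emotional Neglect",
                               "ACEs-Q2-Emotional Abuse")),
  `Question Number` = c(1, 2, 3)
)

toy_clean <- toy %>%
  # Drop any row whose question name ends in "Intro"
  filter(!str_ends(as.character(`Question Name`), "Intro")) %>%
  # Shift the numbering down so that Q1 is question 1 again
  mutate(`Question Number` = `Question Number` - 1)

toy_clean
```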

Isolate Intake-Only Data

Step 1 - Filter Intake Data

The Category variable indicates whether the assessment entered was an intake or an interim. We can use that variable to isolate all intake assessments, organize them into their own data frame, and export them in .xlsx format. The code below also keeps only the rows from each child's earliest intake assessment date.

names(casa_data)

##  [1] "FiscalYear"               "SourceFile"              
##  [3] "Child ID"                 "Assigned to Program Date"
##  [5] "Program Closure Date"     "Birthdate"               
##  [7] "Gender"                   "Race"                    
##  [9] "Ethnicity"                "Language"                
## [11] "Primary Language"         "Petition Type"           
## [13] "AssessmentAge"            "AssessmentAgeGroup"      
## [15] "Assessment Date"          "Category"                
## [17] "Assessment"               "Total Child Score"       
## [19] "Total Possible Score"     "Question Number"         
## [21] "Question Name"            "Question"                
## [23] "Response Number"          "Response"                
## [25] "Point Value"

casa_data %>% 
  count(Category)

## # A tibble: 2 × 2
##   Category                                                n
##   <fct>                                               <int>
## 1 CASA of Santa Cruz Advocacy Planning Survey:INTAKE   5312
## 2 CASA of Santa Cruz Advocacy Planning Survey:INTERIM  2979

casa_data_intake <- casa_data %>%
  filter(Category == "CASA of Santa Cruz Advocacy Planning Survey:INTAKE") %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == min(`Assessment Date`, na.rm = TRUE)) %>%
  ungroup()
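
The grouped filter can be seen on a toy example with made-up dates. Note that the comparison is made across all of a child's intake rows, so rows from any intake assessment completed on a later date than the child's first one are dropped.

```r
library(dplyr)

toy <- tibble(
  `Child ID`        = c("A", "A", "A", "B"),
  `Assessment Date` = as.Date(c("2024-08-01", "2024-08-01",
                                "2025-02-10", "2024-09-15")),
  `Question Number` = c(1, 2, 1, 1)
)

earliest <- toy %>%
  group_by(`Child ID`) %>%
  # Keep every row that shares the child's earliest assessment date
  filter(`Assessment Date` == min(`Assessment Date`, na.rm = TRUE)) %>%
  ungroup()

earliest
```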

Step 2 - Export Data

Before running the code below, create a subfolder named “data_export” in the working directory so that the file path used for the exported data exists.

write_xlsx(casa_data_intake, path = "data_export/Casa_SantaCruz_EarliestIntakeOnlyData_2.26.26.xlsx")
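
The folder can also be created from R before the export runs. The helper name ensure_dir is hypothetical, and the example writes to a temporary location; in this script the path would be "data_export".

```r
# Hypothetical helper: create a folder only when it is missing
ensure_dir <- function(path) {
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)
  invisible(path)
}

# Example with a temporary location
export_dir <- file.path(tempdir(), "data_export")
ensure_dir(export_dir)
dir.exists(export_dir)
```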

Isolate Interim-Only Assessments (PACEs)

Since the PACEs assessment is collected only at interim, we can isolate it in its own data frame, keep the rows from each child's latest assessment date, and export it.

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        160
##  2 1. ACEs (INTAKE) 6+         920
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+   1104
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       75
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      15
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

casa_data_PACEs_interim <- casa_data %>%
  filter(Assessment == "1. PACEs (Interim) 0-5" |
           Assessment == "1. PACEs (Interim) 6+") %>%
  group_by(`Child ID`) %>%
  filter(`Assessment Date` == max(`Assessment Date`, na.rm = TRUE)) %>%
  ungroup()

write_xlsx(casa_data_PACEs_interim, path = "data_export/Casa_SantaCruz_InterimOnlyData_LatestPACEs_2.26.26.xlsx")

Isolate Data by Earliest Intake and Latest Interim Assessment

To align the data by each individual's earliest intake and latest interim assessments, we need to split the data apart by assessment and rebuild the complete data frame in our desired structure.

The goal is to arrange the data identifying cases that match the following scenarios:

- ASQ-3/SE intake and interim, ages 0-5
- Wellbeing intake and interim, ages 6+
- Needs and Resources Physical Health by:
  - 0-5 intake and interim
  - 6+ intake and interim
  - 0-5 intake and 6+ interim
- Needs and Resources Emotional Health by:
  - 0-5 intake and interim
  - 6+ intake and interim
  - 0-5 intake and 6+ interim
- Needs and Resources Learning by:
  - 0-5 intake and interim
  - 6+ intake and interim
  - 0-5 intake and 6+ interim
- Needs and Resources Long-Term Independence intake and interim, ages 6+
- Needs and Resources Probation-Involved Youth intake and interim, ages 6+

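This code chunk defines a function that we can reuse across all assessments. For a given pair of intake and interim assessment labels, it keeps only the children who have both, then returns each child's earliest intake rows and latest interim rows as a list with the elements intake and interim.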

filter_intake_interim <- function(
  df,
  intake_lbl,
  interim_lbl,
  id_col = "Child ID",
  date_col = "Assessment Date",
  assess_col = "Assessment"
) {

  # Make sure the date column is a Date so min() and max() compare calendar days
  df <- df %>%
    mutate(across(all_of(date_col), as.Date))

  # Children with at least one intake row and at least one interim row
  kids_both <- df %>%
    group_by(.data[[id_col]]) %>%
    summarise(
      has_intake  = any(.data[[assess_col]] == intake_lbl,  na.rm = TRUE),
      has_interim = any(.data[[assess_col]] == interim_lbl, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    filter(has_intake & has_interim) %>%
    # all_of() is used because .data[[ ]] is deprecated inside select()
    select(all_of(id_col))

  # Drop every child who is missing either assessment
  df_filt <- df %>%
    semi_join(kids_both, by = id_col)

  # Earliest intake date per child
  intake_dates <- df_filt %>%
    filter(.data[[assess_col]] == intake_lbl) %>%
    group_by(.data[[id_col]]) %>%
    summarise(intake_date = min(.data[[date_col]], na.rm = TRUE), .groups = "drop")

  # Latest interim date per child
  interim_dates <- df_filt %>%
    filter(.data[[assess_col]] == interim_lbl) %>%
    group_by(.data[[id_col]]) %>%
    summarise(interim_date = max(.data[[date_col]], na.rm = TRUE), .groups = "drop")

  # Intake rows recorded on the earliest intake date
  intake <- df_filt %>%
    filter(.data[[assess_col]] == intake_lbl) %>%
    inner_join(intake_dates, by = id_col) %>%
    filter(.data[[date_col]] == intake_date) %>%
    select(-intake_date)

  # Interim rows recorded on the latest interim date
  interim <- df_filt %>%
    filter(.data[[assess_col]] == interim_lbl) %>%
    inner_join(interim_dates, by = id_col) %>%
    filter(.data[[date_col]] == interim_date) %>%
    select(-interim_date)

  # Return both subsets as a named list
  list(intake = intake, interim = interim)
}
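
The first stage of the function, keeping only children who have both an intake and an interim record, can be illustrated in condensed form on toy data. The labels "X (INTAKE)" and "X (Interim)" are placeholders, and this grouped filter is a simplified equivalent of the summarise() and semi_join() steps rather than the function itself.

```r
library(dplyr)

toy <- tibble(
  `Child ID` = c("A", "A", "B", "C", "C"),
  Assessment = c("X (INTAKE)", "X (Interim)", "X (INTAKE)",
                 "X (INTAKE)", "X (Interim)")
)

# Keep only children with at least one intake row and one interim row
both <- toy %>%
  group_by(`Child ID`) %>%
  filter(any(Assessment == "X (INTAKE)") & any(Assessment == "X (Interim)")) %>%
  ungroup()

# Child B has no interim record and is dropped
unique(both$`Child ID`)
```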



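# Columns that identify a question row; used as join keys and never suffixed
key_cols <- c("Child ID",
              "Question Number",
              "Question Name",
              "Question")

# Add a suffix to column names unless it is already present,
# so rerunning a chunk does not append the suffix twice
suffix_once <- function(nm, suf) ifelse(grepl(paste0(suf, "$"), nm), nm, paste0(nm, suf))


# Demographic columns, of which only the intake copy is kept after joining
demo_cols <- c(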
  "Birthdate",
  "Gender",
  "Race",
  "Ethnicity",
  "Language",
  "Primary Language",
  "Petition Type",
  "Assigned to Program Date",
  "Program Closure Date"
)
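
A short check of suffix_once() shows why it is safe on reruns: applying it a second time leaves already suffixed names unchanged. The definition is repeated here only so the example runs on its own.

```r
# Same definition as above, repeated so the example is self-contained
suffix_once <- function(nm, suf) ifelse(grepl(paste0(suf, "$"), nm), nm, paste0(nm, suf))

cols <- c("Response", "Point Value")

once  <- suffix_once(cols, "_intake")
twice <- suffix_once(once, "_intake")

once
identical(once, twice)
```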

ASQ-3

Step 1 - Filter ASQ-3 Assessments

asq3 <- casa_data %>% 
  filter(Assessment == "2a. ASQ-3 (INTAKE) 0-5" | 
           Assessment == "2a. ASQ-3 (Interim) 0-5")

Step 2 - Filter for Intakes and Interims

We want to identify cases with completed intake and interim assessments.

asq3_sets <- filter_intake_interim(
  asq3,
  intake_lbl  = "2a. ASQ-3 (INTAKE) 0-5",
  interim_lbl = "2a. ASQ-3 (Interim) 0-5"
)

asq3_intake  <- asq3_sets$intake
asq3_interim <- asq3_sets$interim



asq3_intake_s <- asq3_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

asq3_interim_s <- asq3_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# Join intake and interim rows at the question-row level
asq3_joined <- left_join(
  asq3_intake_s,
  asq3_interim_s,
  by = key_cols
)

asq3_clean <- asq3_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))
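
A left join at the question-row level only stays one row per question if the key is unique on the interim side. The sketch below, on made-up data with a deliberately duplicated key, shows one way to check for that before trusting the joined result. The object names are illustrative.

```r
library(dplyr)

key_cols_demo <- c("Child ID", "Question Number")

intake_demo <- tibble(`Child ID` = c("A", "A"),
                      `Question Number` = c(1, 2),
                      Response_intake = c("Yes", "No"))

# Question 1 appears twice on the interim side (made-up duplicate)
interim_demo <- tibble(`Child ID` = c("A", "A", "A"),
                       `Question Number` = c(1, 1, 2),
                       Response_interim = c("Yes", "Yes", "No"))

# Keys that occur more than once on the interim side
dup_keys <- interim_demo %>%
  count(`Child ID`, `Question Number`, name = "n_rows") %>%
  filter(n_rows > 1)

joined_demo <- left_join(intake_demo, interim_demo, by = key_cols_demo)

# More rows out than in means the join multiplied rows
nrow(joined_demo) > nrow(intake_demo)
```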

Step 3 - Rearrange Column Order

names(asq3_clean)

##  [1] "FiscalYear_intake"            "SourceFile_intake"           
##  [3] "Child ID"                     "Assigned to Program Date"    
##  [5] "Program Closure Date"         "Birthdate"                   
##  [7] "Gender"                       "Race"                        
##  [9] "Ethnicity"                    "Language"                    
## [11] "Primary Language"             "Petition Type"               
## [13] "AssessmentAge_intake"         "AssessmentAgeGroup_intake"   
## [15] "Assessment Date_intake"       "Category_intake"             
## [17] "Assessment_intake"            "Total Child Score_intake"    
## [19] "Total Possible Score_intake"  "Question Number"             
## [21] "Question Name"                "Question"                    
## [23] "Response Number_intake"       "Response_intake"             
## [25] "Point Value_intake"           "FiscalYear_interim"          
## [27] "SourceFile_interim"           "AssessmentAge_interim"       
## [29] "AssessmentAgeGroup_interim"   "Assessment Date_interim"     
## [31] "Category_interim"             "Assessment_interim"          
## [33] "Total Child Score_interim"    "Total Possible Score_interim"
## [35] "Response Number_interim"      "Response_interim"            
## [37] "Point Value_interim"

asq3_clean <- asq3_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

ASQ-SE

Step 1 - Filter ASQ-SE Assessments

asq_se <- casa_data %>% 
  filter(Assessment == "2b. ASQ-SE (INTAKE) 0-5" | 
           Assessment == "2b. ASQ-SE (Interim) 0-5")

Step 2 - Filter for Intakes and Interims

We want to identify cases with completed intake and interim assessments.

asq_se_sets <- filter_intake_interim(
  asq_se,
  intake_lbl  = "2b. ASQ-SE (INTAKE) 0-5",
  interim_lbl = "2b. ASQ-SE (Interim) 0-5"
)

asq_se_intake  <- asq_se_sets$intake
asq_se_interim <- asq_se_sets$interim



asq_se_intake_s <- asq_se_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

asq_se_interim_s <- asq_se_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# Join intake and interim rows at the question-row level
asq_se_joined <- left_join(
  asq_se_intake_s,
  asq_se_interim_s,
  by = key_cols
)

asq_se_clean <- asq_se_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

names(asq_se_clean)

##  [1] "FiscalYear_intake"            "SourceFile_intake"           
##  [3] "Child ID"                     "Assigned to Program Date"    
##  [5] "Program Closure Date"         "Birthdate"                   
##  [7] "Gender"                       "Race"                        
##  [9] "Ethnicity"                    "Language"                    
## [11] "Primary Language"             "Petition Type"               
## [13] "AssessmentAge_intake"         "AssessmentAgeGroup_intake"   
## [15] "Assessment Date_intake"       "Category_intake"             
## [17] "Assessment_intake"            "Total Child Score_intake"    
## [19] "Total Possible Score_intake"  "Question Number"             
## [21] "Question Name"                "Question"                    
## [23] "Response Number_intake"       "Response_intake"             
## [25] "Point Value_intake"           "FiscalYear_interim"          
## [27] "SourceFile_interim"           "AssessmentAge_interim"       
## [29] "AssessmentAgeGroup_interim"   "Assessment Date_interim"     
## [31] "Category_interim"             "Assessment_interim"          
## [33] "Total Child Score_interim"    "Total Possible Score_interim"
## [35] "Response Number_interim"      "Response_interim"            
## [37] "Point Value_interim"

asq_se_clean <- asq_se_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

Wellbeing

Step 1 - Filter Wellbeing Assessments

wellbeing <- casa_data %>% 
  filter(Assessment == "2. Wellbeing (INTAKE) 6+" | 
           Assessment == "2. Wellbeing (Interim) 6+")

Step 2 - Filter for Intakes and Interims

wellbeing_sets <- filter_intake_interim(
  wellbeing,
  intake_lbl  = "2. Wellbeing (INTAKE) 6+",
  interim_lbl = "2. Wellbeing (Interim) 6+"
)

wellbeing_intake  <- wellbeing_sets$intake
wellbeing_interim <- wellbeing_sets$interim



wellbeing_intake_s <- wellbeing_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

wellbeing_interim_s <- wellbeing_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# Join intake and interim rows at the question-row level
wellbeing_joined <- left_join(
  wellbeing_intake_s,
  wellbeing_interim_s,
  by = key_cols
)

wellbeing_clean <- wellbeing_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

wellbeing_clean <- wellbeing_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

Needs and Resources Physical Health

The Needs and Resources assessments require a few extra steps because we need to account for three groups of children: those with a 0-5 intake and interim, those with a 6+ intake and interim, and those with a 0-5 intake and a 6+ interim.

Step 1 - Filter Physical Health Assessments

physical_health_0_5 <- casa_data %>% 
  filter(Assessment == "3. Needs and Resources: Physical Health (INTAKE) 0-5" | 
           Assessment == "3. Needs and Resources: Physical Health (Interim) 0-5")

physical_health_6 <- casa_data %>% 
  filter(Assessment == "3. Needs and Resources: Physical Health (INTAKE) 6+" | 
           Assessment == "3. Needs and Resources: Physical Health (Interim) 6+")

physical_health_0_5_6 <- casa_data %>% 
  filter(Assessment == "3. Needs and Resources: Physical Health (INTAKE) 0-5" | 
           Assessment == "3. Needs and Resources: Physical Health (Interim) 6+")

Step 2 - Filter for Intakes and Interims

physicalhealth_sets_0_5 <- filter_intake_interim(
  physical_health_0_5,
  intake_lbl  = "3. Needs and Resources: Physical Health (INTAKE) 0-5",
  interim_lbl = "3. Needs and Resources: Physical Health (Interim) 0-5"
)

physicalhealth_0_5_intake  <- physicalhealth_sets_0_5$intake
physicalhealth_0_5_interim <- physicalhealth_sets_0_5$interim



physicalhealth_0_5_intake_s <- physicalhealth_0_5_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

physicalhealth_0_5_interim_s <- physicalhealth_0_5_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# Join intake and interim rows at the question-row level
physicalhealth_0_5_joined <- left_join(
  physicalhealth_0_5_intake_s,
  physicalhealth_0_5_interim_s,
  by = key_cols
)

physicalhealth_0_5_clean <- physicalhealth_0_5_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

physicalhealth_sets_6 <- filter_intake_interim(
  physical_health_6,
  intake_lbl  = "3. Needs and Resources: Physical Health (INTAKE) 6+",
  interim_lbl = "3. Needs and Resources: Physical Health (Interim) 6+"
)

physicalhealth_6_intake  <- physicalhealth_sets_6$intake
physicalhealth_6_interim <- physicalhealth_sets_6$interim



physicalhealth_6_intake_s <- physicalhealth_6_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

physicalhealth_6_interim_s <- physicalhealth_6_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# Join intake and interim rows at the question-row level
physicalhealth_6_joined <- left_join(
  physicalhealth_6_intake_s,
  physicalhealth_6_interim_s,
  by = key_cols
)

physicalhealth_6_clean <- physicalhealth_6_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

No cases meet the criteria of completing a physical health intake in the 0-5 age group and interim in the 6+ age group, but the code is provided below if ever needed.

# 
# physicalhealth_sets_0_5_6 <- filter_intake_interim(
#   physical_health_0_5_6,
#   intake_lbl  = "3. Needs and Resources: Physical Health (INTAKE) 0-5",
#   interim_lbl = "3. Needs and Resources: Physical Health (Interim) 6+"
# )
# 
# physicalhealth_0_5_6_intake  <- physicalhealth_sets_0_5_6$intake
# physicalhealth_0_5_6_interim <- physicalhealth_sets_0_5_6$interim
# 
# 
# 
# physicalhealth_0_5_6_intake_s <- physicalhealth_0_5_6_intake %>%
#   rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))
# 
# physicalhealth_0_5_6_interim_s <- physicalhealth_0_5_6_interim %>%
#   rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))
# 
# # Join intake and interim rows at the question-row level
# physicalhealth_0_5_6_joined <- left_join(
#   physicalhealth_0_5_6_intake_s,
#   physicalhealth_0_5_6_interim_s,
#   by = key_cols
# )
# 
# physicalhealth_0_5_6_clean <- physicalhealth_0_5_6_joined %>%
#   # keep intake version, rename back to original
#   rename_with(
#     ~ str_remove(.x, "_intake$"),
#     .cols = all_of(paste0(demo_cols, "_intake"))
#   ) %>%
#   # drop interim duplicates
#   select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

physicalhealth_0_5_clean <- physicalhealth_0_5_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

physicalhealth_6_clean <- physicalhealth_6_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

# physicalhealth_0_5_6_clean <- physicalhealth_0_5_6_clean %>% 
#   select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

Needs and Resources Emotional Health

Step 1 - Filter Emotional Health Assessments

emotional_health_0_5 <- casa_data %>% 
  filter(Assessment == "4. Needs and Resources: Emotional Health (INTAKE) 0-5" | 
           Assessment == "4. Needs and Resources: Emotional Health (Interim) 0-5")

emotional_health_6 <- casa_data %>% 
  filter(Assessment == "4. Needs and Resources: Emotional Health (INTAKE) 6+" | 
           Assessment == "4. Needs and Resources: Emotional Health (Interim) 6+")

emotional_health_0_5_6 <- casa_data %>% 
  filter(Assessment == "4. Needs and Resources: Emotional Health (INTAKE) 0-5" | 
           Assessment == "4. Needs and Resources: Emotional Health (Interim) 6+")

Step 2 - Filter for Intakes and Interims

emotionalhealth_sets_0_5 <- filter_intake_interim(
  emotional_health_0_5,
  intake_lbl  = "4. Needs and Resources: Emotional Health (INTAKE) 0-5",
  interim_lbl = "4. Needs and Resources: Emotional Health (Interim) 0-5"
)

emotionalhealth_0_5_intake  <- emotionalhealth_sets_0_5$intake
emotionalhealth_0_5_interim <- emotionalhealth_sets_0_5$interim


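# Original join on the full key_cols, kept for reference. It is superseded by
# the join on Child ID and Question Number further down.
# 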
# emotionalhealth_0_5_intake_s <- emotionalhealth_0_5_intake %>%
#   rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))
# 
# emotionalhealth_0_5_interim_s <- emotionalhealth_0_5_interim %>%
#   rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))
# 
# # Join intake and interim rows at the question-row level
# emotionalhealth_0_5_joined <- left_join(
#   emotionalhealth_0_5_intake_s,
#   emotionalhealth_0_5_interim_s,
#   by = key_cols
# )
# 
# emotionalhealth_0_5_clean <- emotionalhealth_0_5_joined %>%
#   # keep intake version, rename back to original
#   rename_with(
#     ~ str_remove(.x, "_intake$"),
#     .cols = all_of(paste0(demo_cols, "_intake"))
#   ) %>%
#   # drop interim duplicates
#   select(-all_of(paste0(demo_cols, "_interim")))




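# Join on Child ID and Question Number only. The QC check below shows that some
# question 7 intake rows have no interim match on the full key_cols.
join_cols <- c("Child ID", "Question Number")
# Columns left unsuffixed (same set as key_cols)
no_suffix_cols <- c("Child ID", "Question Number", "Question Name", "Question")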

emotionalhealth_0_5_intake_s <- emotionalhealth_0_5_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(no_suffix_cols)) %>%
  distinct(across(all_of(join_cols)), .keep_all = TRUE)

emotionalhealth_0_5_interim_s <- emotionalhealth_0_5_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(no_suffix_cols)) %>%
  distinct(across(all_of(join_cols)), .keep_all = TRUE)

emotionalhealth_0_5_joined <- left_join(
  emotionalhealth_0_5_intake_s,
  emotionalhealth_0_5_interim_s %>% select(-all_of(c("Question Name", "Question"))),
  by = join_cols
)

# QC: which intake rows would fail to match if the join used the full key_cols?
anti_join(emotionalhealth_0_5_intake, emotionalhealth_0_5_interim, by = key_cols) %>%
  select(all_of(key_cols)) %>%
  distinct() %>%
  head(30)

## # A tibble: 6 × 4
##   `Child ID` `Question Number` `Question Name`                          Question
##   <fct>                  <dbl> <fct>                                    <fct>   
## 1 7227234                    7 EmotionalHealth-Q7-Extracurricular/Enri… Has ext…
## 2 7187587                    7 EmotionalHealth-Q7-Extracurricular/Enri… Has ext…
## 3 7169486                    7 EmotionalHealth-Q7-Extracurricular/Enri… Has ext…
## 4 7502618                    7 EmotionalHealth-Q7-Extracurricular/Enri… Has ext…
## 5 7447643                    7 EmotionalHealth-Q7-Extracurricular/Enri… Has ext…
## 6 7428752                    7 EmotionalHealth-Q7-Extracurricular/Enri… Has ext…
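The QC output above lists intake rows for Question Number 7 that have no interim partner under the full `key_cols`. The sketch below reproduces that situation on toy data. It assumes the mismatch comes from the question label differing between intake and interim, which the output suggests but does not confirm. All object names ending in `_demo`, and the `full_key` and `short_key` vectors, are hypothetical.

```r
library(dplyr)

# Toy data: Question Number 7 carries a different label at interim (assumed cause)
intake_demo <- tibble(
  `Child ID`        = c("A", "A", "B"),
  `Question Number` = c(1, 7, 7),
  `Question Name`   = c("Q1-Sleep", "Q7-Extracurricular", "Q7-Extracurricular"),
  Response          = c("Yes", "No", "Yes")
)

interim_demo <- tibble(
  `Child ID`        = c("A", "A", "B"),
  `Question Number` = c(1, 7, 7),
  `Question Name`   = c("Q1-Sleep", "Q7-Enrichment", "Q7-Enrichment"),
  Response          = c("Yes", "Yes", "Yes")
)

full_key  <- c("Child ID", "Question Number", "Question Name")
short_key <- c("Child ID", "Question Number")

# Intake rows with no interim partner under each key
unmatched_full  <- anti_join(intake_demo, interim_demo, by = full_key)
unmatched_short <- anti_join(intake_demo, interim_demo, by = short_key)

nrow(unmatched_full)   # 2: both Question 7 rows fail to match
nrow(unmatched_short)  # 0: every intake row finds its interim row
```

Joining on the shorter key keeps those rows paired, which is the trade-off the `join_cols` approach above makes.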

emotionalhealth_0_5_clean <- emotionalhealth_0_5_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

emotionalhealth_sets_6 <- filter_intake_interim(
  emotional_health_6,
  intake_lbl  = "4. Needs and Resources: Emotional Health (INTAKE) 6+",
  interim_lbl = "4. Needs and Resources: Emotional Health (Interim) 6+"
)

emotionalhealth_6_intake  <- emotionalhealth_sets_6$intake
emotionalhealth_6_interim <- emotionalhealth_sets_6$interim



# emotionalhealth_6_intake_s <- emotionalhealth_6_intake %>%
#   rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))
# 
# emotionalhealth_6_interim_s <- emotionalhealth_6_interim %>%
#   rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))
# 
# # 5) Join safely at question-row level
# emotionalhealth_6_joined <- left_join(
#   emotionalhealth_6_intake_s,
#   emotionalhealth_6_interim_s,
#   by = key_cols
# )
# 
# emotionalhealth_6_clean <- emotionalhealth_6_joined %>%
#   # keep intake version, rename back to original
#   rename_with(
#     ~ str_remove(.x, "_intake$"),
#     .cols = all_of(paste0(demo_cols, "_intake"))
#   ) %>%
#   # drop interim duplicates
#   select(-all_of(paste0(demo_cols, "_interim")))


emotionalhealth_6_intake_s <- emotionalhealth_6_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(no_suffix_cols)) %>%
  distinct(across(all_of(join_cols)), .keep_all = TRUE)

emotionalhealth_6_interim_s <- emotionalhealth_6_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(no_suffix_cols)) %>%
  distinct(across(all_of(join_cols)), .keep_all = TRUE)

emotionalhealth_6_joined <- left_join(
  emotionalhealth_6_intake_s,
  emotionalhealth_6_interim_s %>% select(-all_of(c("Question Name", "Question"))),
  by = join_cols
)


emotionalhealth_6_clean <- emotionalhealth_6_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

No cases currently have an emotional health intake in the 0-5 age group and an interim in the 6+ age group, but the code is provided below in case it is ever needed.

# 
# emotionalhealth_sets_0_5_6 <- filter_intake_interim(
#   emotional_health_0_5_6,
#   intake_lbl  = "4. Needs and Resources: Emotional Health (INTAKE) 0-5",
#   interim_lbl = "4. Needs and Resources: Emotional Health (Interim) 6+"
# )
# 
# emotionalhealth_0_5_6_intake  <- emotionalhealth_sets_0_5_6$intake
# emotionalhealth_0_5_6_interim <- emotionalhealth_sets_0_5_6$interim
# 
# 
# 
# emotionalhealth_0_5_6_intake_s <- emotionalhealth_0_5_6_intake %>%
#   rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))
# 
# emotionalhealth_0_5_6_interim_s <- emotionalhealth_0_5_6_interim %>%
#   rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))
# 
# # 5) Join safely at question-row level
# emotionalhealth_0_5_6_joined <- left_join(
#   emotionalhealth_0_5_6_intake_s,
#   emotionalhealth_0_5_6_interim_s,
#   by = key_cols
# )
# 
# emotionalhealth_0_5_6_clean <- emotionalhealth_0_5_6_joined %>%
#   # keep intake version, rename back to original
#   rename_with(
#     ~ str_remove(.x, "_intake$"),
#     .cols = all_of(paste0(demo_cols, "_intake"))
#   ) %>%
#   # drop interim duplicates
#   select(-all_of(paste0(demo_cols, "_interim")))
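Before relying on the statement that no child crosses from a 0-5 intake to a 6+ interim, it may be worth re-checking whenever new data files are added. The following is a minimal sketch on toy data, not the check used in this document. The `demo` tibble and the `*_demo` label objects are hypothetical.

```r
library(dplyr)

intake_lbl_demo  <- "4. Needs and Resources: Emotional Health (INTAKE) 0-5"
interim_lbl_demo <- "4. Needs and Resources: Emotional Health (Interim) 6+"

# Toy data: no child has BOTH the 0-5 intake and the 6+ interim
demo <- tibble(
  `Child ID` = c("A", "A", "B", "C"),
  Assessment = c(intake_lbl_demo,
                 "4. Needs and Resources: Emotional Health (Interim) 0-5",
                 intake_lbl_demo,
                 interim_lbl_demo)
)

# One row per child, flagging which of the two labels the child has
crossover_ids <- demo %>%
  group_by(`Child ID`) %>%
  summarise(
    has_intake  = any(Assessment == intake_lbl_demo),
    has_interim = any(Assessment == interim_lbl_demo),
    .groups = "drop"
  ) %>%
  filter(has_intake, has_interim)

nrow(crossover_ids)  # 0 means the commented-out block is still not needed
```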

Step 3 - Rearrange Column Order

emotionalhealth_0_5_clean <- emotionalhealth_0_5_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

emotionalhealth_6_clean <- emotionalhealth_6_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

# emotionalhealth_0_5_6_clean <- emotionalhealth_0_5_6_clean %>% 
#   select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))
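Positional indices such as `c(1, 26, 2, 27, ...)` depend on the exact number and order of columns after the join, so they need to be re-checked if a column is ever added upstream. As a hedged alternative, the sketch below pairs intake and interim columns by name with `relocate()`. The toy tibble and its values are hypothetical and much narrower than the real data.

```r
library(dplyr)

# Toy joined frame: intake columns first, interim columns appended at the end
demo <- tibble(
  FiscalYear_intake  = "x",
  SourceFile_intake  = "x",
  `Child ID`         = "A",
  Response_intake    = "Yes",
  FiscalYear_interim = "y",
  SourceFile_interim = "y",
  Response_interim   = "No"
)

# Move each interim column next to its intake counterpart, by name
reordered <- demo %>%
  relocate(FiscalYear_interim, .after = FiscalYear_intake) %>%
  relocate(SourceFile_interim, .after = SourceFile_intake)

names(reordered)
```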

Needs and Resources Learning

Step 1 - Filter Learning Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        160
##  2 1. ACEs (INTAKE) 6+         920
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+   1104
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       75
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      15
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

learning_0_5 <- casa_data %>% 
  filter(Assessment == "5. Needs and Resources: Learning (INTAKE) 0-5" | 
           Assessment == "5. Needs and Resources: Learning (Interim) 0-5")

learning_6 <- casa_data %>% 
  filter(Assessment == "5. Needs and Resources: Learning (INTAKE) 6+" | 
           Assessment == "5. Needs and Resources: Learning (Interim) 6+")

learning_0_5_6 <- casa_data %>% 
  filter(Assessment == "5. Needs and Resources: Learning (INTAKE) 0-5" | 
           Assessment == "5. Needs and Resources: Learning (Interim) 6+")

Step 2 - Filter for Intakes and Interims

learning_sets_0_5 <- filter_intake_interim(
  learning_0_5,
  intake_lbl  = "5. Needs and Resources: Learning (INTAKE) 0-5",
  interim_lbl = "5. Needs and Resources: Learning (Interim) 0-5"
)

learning_0_5_intake  <- learning_sets_0_5$intake
learning_0_5_interim <- learning_sets_0_5$interim



learning_0_5_intake_s <- learning_0_5_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

learning_0_5_interim_s <- learning_0_5_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# Join intake and interim at the question-row level
learning_0_5_joined <- left_join(
  learning_0_5_intake_s,
  learning_0_5_interim_s,
  by = key_cols
)

learning_0_5_clean <- learning_0_5_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))
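The suffix, join, and clean-up pattern used above can be easier to follow on a two-row example. This is a sketch under assumptions: `suffix_once_demo()` is a hypothetical stand-in for the `suffix_once()` helper defined earlier in this document (the real helper may behave differently), and `key_demo` and `demo_cols_demo` are small stand-ins for `key_cols` and `demo_cols`.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in: append the suffix unless the name already ends with it
suffix_once_demo <- function(x, suffix) {
  ifelse(endsWith(x, suffix), x, paste0(x, suffix))
}

key_demo       <- c("Child ID", "Question Number")
demo_cols_demo <- c("Gender")  # assumed not to change between time points

intake_demo <- tibble(
  `Child ID` = c("A", "A"), `Question Number` = c(1, 2),
  Gender = "Female", Response = c("No", "Yes")
)
interim_demo <- tibble(
  `Child ID` = c("A", "A"), `Question Number` = c(1, 2),
  Gender = "Female", Response = c("Yes", "Yes")
)

# 1. Suffix every non-key column so the two time points stay distinguishable
intake_s  <- intake_demo %>%
  rename_with(~ suffix_once_demo(.x, "_intake"), .cols = -all_of(key_demo))
interim_s <- interim_demo %>%
  rename_with(~ suffix_once_demo(.x, "_interim"), .cols = -all_of(key_demo))

# 2. Join at the question-row level
joined <- left_join(intake_s, interim_s, by = key_demo)

# 3. Keep one copy of the demographic columns under their original names
clean <- joined %>%
  rename_with(~ str_remove(.x, "_intake$"),
              .cols = all_of(paste0(demo_cols_demo, "_intake"))) %>%
  select(-all_of(paste0(demo_cols_demo, "_interim")))

names(clean)
```

The result has one row per child and question, with `Response_intake` and `Response_interim` side by side.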

learning_sets_6 <- filter_intake_interim(
  learning_6,
  intake_lbl  = "5. Needs and Resources: Learning (INTAKE) 6+",
  interim_lbl = "5. Needs and Resources: Learning (Interim) 6+"
)

learning_6_intake  <- learning_sets_6$intake
learning_6_interim <- learning_sets_6$interim



learning_6_intake_s <- learning_6_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))

learning_6_interim_s <- learning_6_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))

# Join intake and interim at the question-row level
learning_6_joined <- left_join(
  learning_6_intake_s,
  learning_6_interim_s,
  by = key_cols
)

learning_6_clean <- learning_6_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

No cases currently have a learning intake in the 0-5 age group and an interim in the 6+ age group, but the code is provided below in case it is ever needed.

# 
# learning_sets_0_5_6 <- filter_intake_interim(
#   learning_0_5_6,
#   intake_lbl  = "5. Needs and Resources: Learning (INTAKE) 0-5",
#   interim_lbl = "5. Needs and Resources: Learning (Interim) 6+"
# )
# 
# learning_0_5_6_intake  <- learning_sets_0_5_6$intake
# learning_0_5_6_interim <- learning_sets_0_5_6$interim
# 
# 
# 
# learning_0_5_6_intake_s <- learning_0_5_6_intake %>%
#   rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))
# 
# learning_0_5_6_interim_s <- learning_0_5_6_interim %>%
#   rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))
# 
# # 5) Join safely at question-row level
# learning_0_5_6_joined <- left_join(
#   learning_0_5_6_intake_s,
#   learning_0_5_6_interim_s,
#   by = key_cols
# )
# 
# learning_0_5_6_clean <- learning_0_5_6_joined %>%
#   # keep intake version, rename back to original
#   rename_with(
#     ~ str_remove(.x, "_intake$"),
#     .cols = all_of(paste0(demo_cols, "_intake"))
#   ) %>%
#   # drop interim duplicates
#   select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

learning_0_5_clean <- learning_0_5_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

learning_6_clean <- learning_6_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

# learning_0_5_6_clean <- learning_0_5_6_clean %>% 
#   select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

Needs and Resources Long Term Independence

Long Term Independence appeared to contain duplicated data entries, so the steps below were modified.

Step 1 - Filter Long Term Independence Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        160
##  2 1. ACEs (INTAKE) 6+         920
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+   1104
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       75
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      15
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

INTAKE_LBL  <- "6. Needs and Resources: Long Term Independence (INTAKE) 6+"
INTERIM_LBL <- "6. Needs and Resources: Long Term Independence (Interim) 6+"

# LTI-specific item key: Question is left out (unlike the global key_cols), so it gets suffixed and is tidied in Step 3
key_cols_lti <- c("Child ID", "Question Number", "Question Name")

LTI <- casa_data %>%
  filter(Assessment %in% c(INTAKE_LBL, INTERIM_LBL)) %>%
  distinct(
    `Child ID`, Assessment, `Assessment Date`, SourceFile,
    `Question Number`, `Question Name`, Question, Response, `Point Value`,
    .keep_all = TRUE
  )

Step 2 - Filter for Intakes and Interims

LTI_sets <- filter_intake_interim(
  LTI,
  intake_lbl  = INTAKE_LBL,
  interim_lbl = INTERIM_LBL
)

LTI_intake  <- LTI_sets$intake
LTI_interim <- LTI_sets$interim

LTI_intake  %>% count(across(all_of(key_cols_lti))) %>% summarise(max_n = max(n))

## # A tibble: 1 × 1
##   max_n
##   <int>
## 1     1

LTI_interim %>% count(across(all_of(key_cols_lti))) %>% summarise(max_n = max(n))

## # A tibble: 1 × 1
##   max_n
##   <int>
## 1     1
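The two `max_n` checks above are read by eye. As an optional variation, the same check can stop the script when a key combination repeats. The helper name `assert_unique_key()` and the toy tibbles are hypothetical and not part of this document's pipeline.

```r
library(dplyr)

# Hypothetical helper: error out if any key combination occurs more than once
assert_unique_key <- function(df, key) {
  dupes <- df %>%
    count(across(all_of(key)), name = "n_rows") %>%
    filter(n_rows > 1)
  if (nrow(dupes) > 0) {
    stop(nrow(dupes), " key combination(s) appear more than once")
  }
  invisible(df)
}

key_demo <- c("Child ID", "Question Number")

ok_demo  <- tibble(`Child ID` = c("A", "A"), `Question Number` = c(1, 2))
dup_demo <- tibble(`Child ID` = c("A", "A"), `Question Number` = c(1, 1))

# Passes silently
assert_unique_key(ok_demo, key_demo)

# Fails; the message is captured here so the example keeps running
dup_error <- tryCatch(
  assert_unique_key(dup_demo, key_demo),
  error = function(e) conditionMessage(e)
)
dup_error
```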

LTI_intake_s <- LTI_intake %>%
  rename_with(~ suffix_once(.x, "_intake"),  .cols = -all_of(key_cols_lti))

LTI_interim_s <- LTI_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols_lti))


LTI_joined <- inner_join(
  LTI_intake_s,
  LTI_interim_s,
  by = key_cols_lti
)
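The LTI join uses `inner_join()` where the other sections use `left_join()`. The toy example below shows the practical difference. The reason for choosing `inner_join()` for LTI is not stated here, so this only illustrates the behavior; the `*_demo` objects are hypothetical.

```r
library(dplyr)

intake_demo  <- tibble(`Child ID` = c("A", "B"), Response_intake = c("No", "Yes"))
interim_demo <- tibble(`Child ID` = "A", Response_interim = "Yes")

# left_join keeps every intake row; unmatched interim columns become NA
left_demo  <- left_join(intake_demo, interim_demo, by = "Child ID")

# inner_join keeps only rows present at both time points
inner_demo <- inner_join(intake_demo, interim_demo, by = "Child ID")

nrow(left_demo)   # 2
nrow(inner_demo)  # 1
```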

LTI_clean <- LTI_joined %>%
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

names(LTI_clean)

##  [1] "FiscalYear_intake"            "SourceFile_intake"           
##  [3] "Child ID"                     "Assigned to Program Date"    
##  [5] "Program Closure Date"         "Birthdate"                   
##  [7] "Gender"                       "Race"                        
##  [9] "Ethnicity"                    "Language"                    
## [11] "Primary Language"             "Petition Type"               
## [13] "AssessmentAge_intake"         "AssessmentAgeGroup_intake"   
## [15] "Assessment Date_intake"       "Category_intake"             
## [17] "Assessment_intake"            "Total Child Score_intake"    
## [19] "Total Possible Score_intake"  "Question Number"             
## [21] "Question Name"                "Question_intake"             
## [23] "Response Number_intake"       "Response_intake"             
## [25] "Point Value_intake"           "FiscalYear_interim"          
## [27] "SourceFile_interim"           "AssessmentAge_interim"       
## [29] "AssessmentAgeGroup_interim"   "Assessment Date_interim"     
## [31] "Category_interim"             "Assessment_interim"          
## [33] "Total Child Score_interim"    "Total Possible Score_interim"
## [35] "Question_interim"             "Response Number_interim"     
## [37] "Response_interim"             "Point Value_interim"

LTI_clean <- LTI_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33, 34, 36, 37, 38))

LTI_clean <- LTI_clean %>% 
  rename("Question" = Question_intake)

Needs and Resources Probation-Involved Youth

Step 1 - Filter PIY Assessments

casa_data %>% 
  count(Assessment)

## # A tibble: 26 × 2
##    Assessment                    n
##    <fct>                     <int>
##  1 1. ACEs (INTAKE) 0-5        160
##  2 1. ACEs (INTAKE) 6+         920
##  3 1. PACEs (Interim) 0-5       49
##  4 1. PACEs (Interim) 6+       399
##  5 2. Wellbeing (INTAKE) 6+   1104
##  6 2. Wellbeing (Interim) 6+   684
##  7 2a. ASQ-3 (INTAKE) 0-5       75
##  8 2a. ASQ-3 (Interim) 0-5      35
##  9 2b. ASQ-SE (INTAKE) 0-5      15
## 10 2b. ASQ-SE (Interim) 0-5      7
## # ℹ 16 more rows

PIY <- casa_data %>% 
  filter(Assessment == "7. Needs and Resources: Probation-Involved Youth (INTAKE) 6+" | 
           Assessment == "7. Needs and Resources: Probation-Involved Youth (Interim) 6+")

Step 2 - Filter for Intakes and Interims

PIY_sets <- filter_intake_interim(
  PIY,
  intake_lbl  = "7. Needs and Resources: Probation-Involved Youth (INTAKE) 6+",
  interim_lbl = "7. Needs and Resources: Probation-Involved Youth (Interim) 6+"
)

PIY_intake  <- PIY_sets$intake
PIY_interim <- PIY_sets$interim


# 
# PIY_intake_s <- PIY_intake %>%
#   rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(key_cols))
# 
# PIY_interim_s <- PIY_interim %>%
#   rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(key_cols))
# 
# # 5) Join safely at question-row level
# PIY_joined <- left_join(
#   PIY_intake_s,
#   PIY_interim_s,
#   by = key_cols
# )
# 
# PIY_clean <- PIY_joined %>%
#   # keep intake version, rename back to original
#   rename_with(
#     ~ str_remove(.x, "_intake$"),
#     .cols = all_of(paste0(demo_cols, "_intake"))
#   ) %>%
#   # drop interim duplicates
#   select(-all_of(paste0(demo_cols, "_interim")))

PIY_intake_s <- PIY_intake %>%
  rename_with(~ suffix_once(.x, "_intake"), .cols = -all_of(no_suffix_cols)) %>%
  distinct(across(all_of(join_cols)), .keep_all = TRUE)

PIY_interim_s <- PIY_interim %>%
  rename_with(~ suffix_once(.x, "_interim"), .cols = -all_of(no_suffix_cols)) %>%
  distinct(across(all_of(join_cols)), .keep_all = TRUE)

PIY_joined <- left_join(
  PIY_intake_s,
  PIY_interim_s %>% select(-all_of(c("Question Name", "Question"))),
  by = join_cols
)


PIY_clean <- PIY_joined %>%
  # keep intake version, rename back to original
  rename_with(
    ~ str_remove(.x, "_intake$"),
    .cols = all_of(paste0(demo_cols, "_intake"))
  ) %>%
  # drop interim duplicates
  select(-all_of(paste0(demo_cols, "_interim")))

Step 3 - Rearrange Column Order

PIY_clean <- PIY_clean %>% 
  select(c(1, 26, 2, 27, 3:17, 28:32, 18:25, 33:37))

Bind Intake-Interim Data Objects

casa_intake_interim <- bind_rows(asq3_clean,
                                 wellbeing_clean,
                                 physicalhealth_0_5_clean,
                                 physicalhealth_6_clean,
                                 emotionalhealth_0_5_clean,
                                 emotionalhealth_6_clean,
                                 learning_0_5_clean,
                                 learning_6_clean,
                                 LTI_clean,
                                 PIY_clean,
                                 .id = "DataFrameSource")


casa_intake_interim <- casa_intake_interim %>% 
  arrange(`Child ID`)


str(casa_intake_interim)

## tibble [2,176 × 38] (S3: tbl_df/tbl/data.frame)
##  $ DataFrameSource             : chr [1:2176] "2" "2" "2" "2" ...
##  $ FiscalYear_intake           : Factor w/ 2 levels "FY 2024–25","FY 2025–26": 1 1 1 1 1 1 1 1 1 1 ...
##  $ FiscalYear_interim          : Factor w/ 2 levels "FY 2024–25","FY 2025–26": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SourceFile_intake           : Factor w/ 3 levels "January2025-June2025",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ SourceFile_interim          : Factor w/ 3 levels "January2025-June2025",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Child ID                    : Factor w/ 106 levels "4636048","4636223",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Assigned to Program Date    : Date[1:2176], format: "2023-10-23" "2023-10-23" ...
##  $ Program Closure Date        : Date[1:2176], format: NA NA ...
##  $ Birthdate                   : Date[1:2176], format: "2006-08-28" "2006-08-28" ...
##  $ Gender                      : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Race                        : Factor w/ 6 levels "American Indian or Alaska Native",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ Ethnicity                   : Factor w/ 2 levels "Hispanic","Non-Hispanic": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Language                    : Factor w/ 4 levels "Bi-Lingual (English/Spanish)",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Primary Language            : Factor w/ 4 levels "False","No","True",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ Petition Type               : Factor w/ 4 levels "Dependency","Dual Status",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ AssessmentAge_intake        : num [1:2176] 18.2 18.2 18.2 18.2 18.2 18.2 18.2 18.2 18.2 18.2 ...
##  $ AssessmentAgeGroup_intake   : Factor w/ 2 levels "0-5","6+": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Assessment Date_intake      : Date[1:2176], format: "2024-10-31" "2024-10-31" ...
##  $ Category_intake             : Factor w/ 2 levels "CASA of Santa Cruz Advocacy Planning Survey:INTAKE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Assessment_intake           : Factor w/ 26 levels "1. ACEs (INTAKE) 0-5",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ AssessmentAge_interim       : num [1:2176] 19.2 19.2 19.2 19.2 19.2 19.2 19.2 19.2 19.2 19.2 ...
##  $ AssessmentAgeGroup_interim  : Factor w/ 2 levels "0-5","6+": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Assessment Date_interim     : Date[1:2176], format: "2025-11-18" "2025-11-18" ...
##  $ Category_interim            : Factor w/ 2 levels "CASA of Santa Cruz Advocacy Planning Survey:INTAKE",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Assessment_interim          : Factor w/ 26 levels "1. ACEs (INTAKE) 0-5",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ Total Child Score_intake    : num [1:2176] 18 18 18 18 18 18 18 18 18 18 ...
##  $ Total Possible Score_intake : num [1:2176] 36 36 36 36 36 36 36 36 36 36 ...
##  $ Question Number             : num [1:2176] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Question Name               : Factor w/ 76 levels "ACEs-Intro","ACEs-Q1-Emotional Neglect",..: 24 25 26 73 74 75 12 13 14 15 ...
##  $ Question                    : Factor w/ 77 levels "Able to calm themselves",..: 10 74 9 2 1 25 14 58 77 75 ...
##  $ Response Number_intake      : num [1:2176] 2 1 2 2 1 1 1 3 1 1 ...
##  $ Response_intake             : Factor w/ 29 levels "Above cutoff",..: 27 26 27 27 26 26 26 5 26 26 ...
##  $ Point Value_intake          : num [1:2176] 2 1 2 2 1 1 1 3 1 1 ...
##  $ Total Child Score_interim   : num [1:2176] 25 25 25 25 25 25 25 25 25 25 ...
##  $ Total Possible Score_interim: num [1:2176] 36 36 36 36 36 36 36 36 36 36 ...
##  $ Response Number_interim     : num [1:2176] 3 3 3 2 2 1 1 2 2 3 ...
##  $ Response_interim            : Factor w/ 29 levels "Above cutoff",..: 5 5 5 27 27 26 26 27 27 5 ...
##  $ Point Value_interim         : num [1:2176] 3 3 3 2 2 1 1 2 2 3 ...

casa_intake_interim <- casa_intake_interim %>% 
  mutate(DataFrameSource = as.factor(DataFrameSource))


casa_intake_interim$DataFrameSource <- recode(casa_intake_interim$DataFrameSource,
                                              `1` = "ASQ-3_0-5",
                                              `2` = "Wellbeing_6+",
                                              `3` = "PhysicalHealth_0-5",
                                              `4` = "PhysicalHealth_6+",
                                              `5` = "EmotionalHealth_0-5",
                                              `6` = "EmotionalHealth_6+",
                                              `7` = "Learning_0-5",
                                              `8` = "Learning_6+",
                                              `9` = "LongTermIndependence_6+",
                                              `10` = "ProbationInvolvedYouth_6+")
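The numeric-to-label mapping above depends on the order of the data frames inside `bind_rows()`. One possible alternative, sketched here on toy data, is to pass a named list so that `.id` records the labels directly and no separate `recode()` step is needed. The `a_demo` and `b_demo` tibbles are hypothetical.

```r
library(dplyr)

a_demo <- tibble(`Child ID` = "A", Response_intake = "No")
b_demo <- tibble(`Child ID` = "B", Response_intake = "Yes")

# Names of the list elements become the values of the .id column
bound_demo <- bind_rows(
  list(`ASQ-3_0-5` = a_demo, `Wellbeing_6+` = b_demo),
  .id = "DataFrameSource"
) %>%
  mutate(DataFrameSource = as.factor(DataFrameSource))

as.character(bound_demo$DataFrameSource)
```

With this form, reordering or adding a data frame cannot silently shift the labels.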


# reorder to desired column order

names(casa_intake_interim)

##  [1] "DataFrameSource"              "FiscalYear_intake"           
##  [3] "FiscalYear_interim"           "SourceFile_intake"           
##  [5] "SourceFile_interim"           "Child ID"                    
##  [7] "Assigned to Program Date"     "Program Closure Date"        
##  [9] "Birthdate"                    "Gender"                      
## [11] "Race"                         "Ethnicity"                   
## [13] "Language"                     "Primary Language"            
## [15] "Petition Type"                "AssessmentAge_intake"        
## [17] "AssessmentAgeGroup_intake"    "Assessment Date_intake"      
## [19] "Category_intake"              "Assessment_intake"           
## [21] "AssessmentAge_interim"        "AssessmentAgeGroup_interim"  
## [23] "Assessment Date_interim"      "Category_interim"            
## [25] "Assessment_interim"           "Total Child Score_intake"    
## [27] "Total Possible Score_intake"  "Question Number"             
## [29] "Question Name"                "Question"                    
## [31] "Response Number_intake"       "Response_intake"             
## [33] "Point Value_intake"           "Total Child Score_interim"   
## [35] "Total Possible Score_interim" "Response Number_interim"     
## [37] "Response_interim"             "Point Value_interim"

casa_intake_interim <- casa_intake_interim %>% 
  select(DataFrameSource,
           FiscalYear_intake,
           FiscalYear_interim,
           SourceFile_intake,
           SourceFile_interim,
           `Child ID`,
           Birthdate,
           Gender,
           Race,
           Ethnicity,
           Language,
           `Primary Language`,
           `Petition Type`,
           `Assigned to Program Date`,
           `Program Closure Date`,
           `Assessment Date_intake`,
           `Assessment Date_interim`,
           Category_intake,
           Category_interim,
           Assessment_intake,
           Assessment_interim,
           AssessmentAge_intake,
           AssessmentAgeGroup_intake,
           AssessmentAge_interim,
           AssessmentAgeGroup_interim,
           `Question Number`,
           `Question Name`,
           Question,
           `Total Child Score_intake`,
           `Total Possible Score_intake`,
           `Response Number_intake`,
           Response_intake,
           `Point Value_intake`,
           `Total Child Score_interim`,
           `Total Possible Score_interim`,
           `Response Number_interim`,
           Response_interim,
           `Point Value_interim`)

write_xlsx(casa_intake_interim, path = "data_export/Casa_SantaCruz_Intake_Interim_Data_2.26.26.xlsx")

Isolate Data by Cases with Interims but Missing Intakes

This section walks through how to isolate cases that have an interim assessment but no corresponding intake data, for the following scenarios:

ASQ-3/SE ages 0-5
Wellbeing ages 6+
Needs and Resources Physical Health by:
- 0-5 interim but no matching intake
- 6+ interim but no matching intake
Needs and Resources Emotional Health by:
- 0-5 interim but no matching intake
- 6+ interim but no matching intake
Needs and Resources Learning by:
- 0-5 interim but no matching intake
- 6+ interim but no matching intake
Needs and Resources Long Term Independence ages 6+
Needs and Resources Probation-Involved Youth ages 6+
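The general idea behind each scenario listed above can be sketched with `anti_join()`, which keeps rows from the first table that have no match in the second. This is a minimal illustration on toy data, and the steps in this document may differ in detail. The `*_demo` objects are hypothetical.

```r
library(dplyr)

intake_demo  <- tibble(`Child ID` = c("A", "B"),
                       Assessment = "2. Wellbeing (INTAKE) 6+")
interim_demo <- tibble(`Child ID` = c("A", "C"),
                       Assessment = "2. Wellbeing (Interim) 6+")

# Interim rows whose Child ID never appears in the intake data
interim_only <- anti_join(interim_demo, intake_demo, by = "Child ID")

interim_only$`Child ID`  # "C"
```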