Introduction

This document describes the data used while analyzing the impact of Shamiri’s intervention through an RCT on the mental health status of students captured by various pre-determined scales in the field of psychology.

These scales include: PCS - Perceived Control Scale (6 questions, Academic and Staying out of trouble) PHQ - Patient Health Questionaire (8 questions + Functioning) GAD - Generalised Anxiety Disorder (7 questions + Check + Functioning) MSPSS - Multidimensional Scale of Perceived Social Support (8 questions) GQ - Gratitude Questionaire (6 questions) SWEMBS - Short Warwick Edinburgh Mental Well Being Scale (7 questions) SES - School Engagement Scale (19 questions)

The document is divided into four parts.

The first part checks for the missing values across the data set in both treatment and control groups.
The second part lays out the descriptive analysis of each variable in the demographic details for both treatment and control group
The third part checks for trends in responses to each of the questions in the scales mentioned above.
Work in Progress: The fourth part does correlational analysis of each of the scales with demographic information of the participants such as age, gender, form, perceived, area type (urban/rural), household situation and school metrics (academic performance and co-curricular engageement). This will help us understand the implication of the correlations toward our analysis in the RCT.

First we load the libraries as well as the data sets to perform basic cleaning before we begin the analysis.

# Installing essential libraries
library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

library(readxl)
library(fastDummies)

## Thank you for using fastDummies!

## To acknowledge our work, please cite the package:

## Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. Version 1.7.1. URL: https://github.com/jacobkap/fastDummies, https://jacobkap.github.io/fastDummies/.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(knitr)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lmtest)

## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(AER)

## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:purrr':
## 
##     some
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## Loading required package: sandwich
## Loading required package: survival

library(sandwich)
library(fixest)
library(ggplot2)
library(ggthemes)
library(rddensity)
library(GGally)

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

library(corrplot)

## corrplot 0.92 loaded

library(reshape2)

## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths

library(kableExtra)

## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(MissMech)
library(gridExtra)

## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine

library(grid)


# Setting the working directory
setwd('/Users/ishangupta/Desktop/Books for Masters/R Practice')


# Loading the data in R
treatment <- read_xlsx("Treatment.xlsx", sheet = "2023Studentlongtrialdata")
control <- read_xlsx("Control.xlsx", sheet = "2023NaiKiambuControl")


# Renaming the columns in the treatment group
treatment <- treatment %>% 
  rename(
    Age = `student age`,
    Form = form,
    Gender = `student gender`,
    School_ID = `school id`
  ) %>% 
  mutate(trt = 1)

# List of school IDs to be dropped
school_ids_to_drop <- c("ANS23_School_31", "ANS23_School_32", "ANS23_School_33", 
                        "ANS23_School_34", "ANS23_School_35", "ANS23_School_36", 
                        "ANS23_School_37")

# Assuming 'treatment' is your dataframe with the treatment group data
# Filtering out rows with specified school IDs
treatment <- treatment %>%
  filter(!School_ID %in% school_ids_to_drop)

# Renaming the columns in the control group
control <- control %>% 
  rename(
    School_ID = `school id`
  ) %>% 
  mutate(supervisor_id = "NA") %>% 
  mutate(hub = "NA") %>% 
  mutate(fellow_gender = "NA") %>% 
  mutate(fellow_age = "NA") %>% 
  mutate(trt = 0)

# Creating the RCT data set
rct <- rbind(treatment,control)

# Converting the columns into numeric
cols_to_numeric <- c(1, 11, 12, 13, 103, 104, 17:101)
rct[cols_to_numeric] <- lapply(rct[cols_to_numeric], as.numeric)

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

## Warning in lapply(rct[cols_to_numeric], as.numeric): NAs introduced by coercion

PART 1: Checking for missingness

# Define a function to count NAs in a row
count_NAs <- function(row) {
  sum(is.na(row)) + sum(row == "NA", na.rm = TRUE)
}

# Apply the function to the relevant columns (19 to 79) and create the new column NA_entries
rct$NA_entries <- apply(rct[, 27:90], 1, count_NAs)

# Checking for trends in NA entries for each time
rct_small <- rct %>% 
  group_by(trt,participant_id,time, NA_entries) %>% 
  summarise(count=n())

## `summarise()` has grouped output by 'trt', 'participant_id', 'time'. You can
## override using the `.groups` argument.

# Pivot the data to wide format
rct_wide <- rct_small %>%
  pivot_wider(names_from = time, values_from = NA_entries, names_prefix = "NA_entries_in_time_")

# Rename the columns to match the desired output
colnames(rct_wide) <- gsub("NA_entries_in_time_", "NA_entries_in_time_", colnames(rct_wide))

# Checking for absence of students in each data collection period.
rct_wide <- rct_wide %>%
  select(-count) %>%
  mutate(cases = case_when(
    NA_entries_in_time_0 == 64 & NA_entries_in_time_2 == 64 & NA_entries_in_time_4 == 64 & NA_entries_in_time_8 == 64 ~ "Absent in ALL",
    NA_entries_in_time_2 == 64 & NA_entries_in_time_4 == 64 & NA_entries_in_time_8 == 64 ~ "Absent in 2, 4, 8",
    NA_entries_in_time_0 == 64 & NA_entries_in_time_4 == 64 & NA_entries_in_time_8 == 64 ~ "Absent in 0, 4, 8",
    NA_entries_in_time_0 == 64 & NA_entries_in_time_2 == 64 & NA_entries_in_time_8 == 64 ~ "Absent in 0, 2, 8",
    NA_entries_in_time_0 == 64 & NA_entries_in_time_2 == 64 & NA_entries_in_time_4 == 64 ~ "Absent in 0, 2, 4",
    NA_entries_in_time_4 == 64 & NA_entries_in_time_8 == 64 ~ "Absent in 4, 8",
    NA_entries_in_time_2 == 64 & NA_entries_in_time_8 == 64 ~ "Absent in 2, 8",
    NA_entries_in_time_2 == 64 & NA_entries_in_time_4 == 64 ~ "Absent in 2, 4",
    NA_entries_in_time_0 == 64 & NA_entries_in_time_8 == 64 ~ "Absent in 0, 8",
    NA_entries_in_time_0 == 64 & NA_entries_in_time_4 == 64 ~ "Absent in 0, 4",
    NA_entries_in_time_0 == 64 & NA_entries_in_time_2 == 64 ~ "Absent in 0, 2",
    NA_entries_in_time_0 == 64 ~ "Absent in 0",
    NA_entries_in_time_2 == 64 ~ "Absent in 2",
    NA_entries_in_time_4 == 64 ~ "Absent in 4",
    NA_entries_in_time_8 == 64 ~ "Absent in 8",
    TRUE ~ "Present"  # Default case when none of the above conditions are met
  ))

# Summarize the count of cases by trt and cases
rct_present <- rct_wide %>%
  group_by(trt, cases) %>%
  summarise(count = n(), .groups = 'drop')

# Summarize the total count of entries for trt = 0 and trt = 1
total_count_trt <- rct_present %>%
  group_by(trt) %>%
  summarise(total_count = sum(count))

# Calculating the count of students answering the questions in each scales
rct <- rct %>%
  mutate(
    PCS_count = rowSums(!is.na(select(., PCS_1:PCS_6))),
    PHQ_count = rowSums(!is.na(select(., PHQ_1:PHQ_8))),
    GAD_count = rowSums(!is.na(select(., GAD_1:GAD_7))),
    MSPSS_count = rowSums(!is.na(select(., MSPSS_1:MSPSS_8))),
    GQ_count = rowSums(!is.na(select(., GQ_1:GQ_6))),
    SWEMWBS_count = rowSums(!is.na(select(., SWEMWBS_1:SWEMWBS_7))),
    SES_count = rowSums(!is.na(select(., SES_1:SES_19)))
  )

# Shortening the dataset to perform further analysis
rct_tiny <- rct %>% 
  select(trt,participant_id,PCS_count,PHQ_count,GAD_count,MSPSS_count,GQ_count,SWEMWBS_count,SES_count)

# Calculating the proportion of students in treatment and control group answering the survey as a whole
proportions_df <- rct_tiny %>%
  group_by(trt) %>%
  summarise(
    PCS_proportion = mean(PCS_count > 0, na.rm = TRUE),
    PHQ_proportion = mean(PHQ_count > 0, na.rm = TRUE),
    GAD_proportion = mean(GAD_count > 0, na.rm = TRUE),
    MSPSS_proportion = mean(MSPSS_count > 0, na.rm = TRUE),
    GQ_proportion = mean(GQ_count > 0, na.rm = TRUE),
    SWEMWBS_proportion = mean(SWEMWBS_count > 0, na.rm = TRUE),
    SES_proportion = mean(SES_count > 0, na.rm = TRUE)
  )

# Step 1: Create a duplicate dataset with the necessary columns
phq_gad_columns <- c("time", "participant_id", paste0("PHQ_", 1:8), "PHQ_Functioning", paste0("GAD_", 1:7), "GAD_Check", "GAD_Functioning")
duplicate_dataset <- treatment %>%
  select(all_of(phq_gad_columns))

# Step 2: Trim spaces and replace "NA" strings with actual NA values
duplicate_dataset <- duplicate_dataset %>%
  mutate(across(c(paste0("PHQ_", 1:8), "PHQ_Functioning", paste0("GAD_", 1:7), "GAD_Check", "GAD_Functioning"), ~ str_trim(.))) %>%
  mutate(across(c(paste0("PHQ_", 1:8), "PHQ_Functioning", paste0("GAD_", 1:7), "GAD_Check", "GAD_Functioning"), ~ na_if(., "NA"))) %>%
  mutate(across(c(paste0("PHQ_", 1:8), "PHQ_Functioning", paste0("GAD_", 1:7), "GAD_Check", "GAD_Functioning"), ~ na_if(., "")))

# Step 3: Count the number of missing entries in the PHQ and GAD columns for each row
duplicate_dataset <- duplicate_dataset %>%
  rowwise() %>%
  mutate(num_missing_entries = sum(
    is.na(`PHQ_1`) | `PHQ_1` == "NA", is.na(`PHQ_2`) | `PHQ_2` == "NA", is.na(`PHQ_3`) | `PHQ_3` == "NA",
    is.na(`PHQ_4`) | `PHQ_4` == "NA", is.na(`PHQ_5`) | `PHQ_5` == "NA", is.na(`PHQ_6`) | `PHQ_6` == "NA",
    is.na(`PHQ_7`) | `PHQ_7` == "NA", is.na(`PHQ_8`) | `PHQ_8` == "NA", is.na(`PHQ_Functioning`) | `PHQ_Functioning` == "NA",
    is.na(`GAD_1`) | `GAD_1` == "NA", is.na(`GAD_2`) | `GAD_2` == "NA", is.na(`GAD_3`) | `GAD_3` == "NA",
    is.na(`GAD_4`) | `GAD_4` == "NA", is.na(`GAD_5`) | `GAD_5` == "NA", is.na(`GAD_6`) | `GAD_6` == "NA",
    is.na(`GAD_7`) | `GAD_7` == "NA", is.na(`GAD_Check`) | `GAD_Check` == "NA", is.na(`GAD_Functioning`) | `GAD_Functioning` == "NA"
  ))

# Step 4: Calculate the percentage of missingness
duplicate_dataset <- duplicate_dataset %>%
  mutate(missingness_percentage = (num_missing_entries / 18) * 100) %>%
  ungroup()


# Step 5: Check the frequency of each missing value per time.

check_missing <- duplicate_dataset %>% 
  group_by(time, num_missing_entries) %>% 
  summarise(freq_miss = n())

## `summarise()` has grouped output by 'time'. You can override using the
## `.groups` argument.

  stargazer(check_missing, type = "text", summary = FALSE, rownames = FALSE)

## 
## ==================================
## time num_missing_entries freq_miss
## ----------------------------------
## 0             0             496   
## 0             1             184   
## 0             2             91    
## 0             3             38    
## 0             4             12    
## 0             5              8    
## 0             6              5    
## 0             7              2    
## 0             8              2    
## 0             9              9    
## 0            10              1    
## 0            11              2    
## 0            12              2    
## 0            13              1    
## 0            14              1    
## 0            15              4    
## 0            18              8    
## 2             0             362   
## 2             1             126   
## 2             2             107   
## 2             3             17    
## 2             4             10    
## 2             5              4    
## 2             8              3    
## 2             9              4    
## 2            10              1    
## 2            11              1    
## 2            14              4    
## 2            15              1    
## 2            17              2    
## 2            18             224   
## 4             0             297   
## 4             1             114   
## 4             2             115   
## 4             3             21    
## 4             4              7    
## 4             5              7    
## 4             6              1    
## 4             7              2    
## 4             8              3    
## 4            11              5    
## 4            14              2    
## 4            15              1    
## 4            16              1    
## 4            17              2    
## 4            18             288   
## 8             0             511   
## 8             1             89    
## 8             2             71    
## 8             3             14    
## 8             4              6    
## 8             5              1    
## 8             6              1    
## 8             7              1    
## 8             8              1    
## 8             9              1    
## 8            10              3    
## 8            12              1    
## 8            13              1    
## 8            14              1    
## 8            18             164   
## ----------------------------------

# List of columns to convert
columns_to_convert <- c("PHQ_1", "PHQ_2", "PHQ_3", "PHQ_4", "PHQ_5", "PHQ_6", "PHQ_7", "PHQ_8",
                        "GAD_1", "GAD_2", "GAD_3", "GAD_4", "GAD_5", "GAD_6", "GAD_7",
                        "GAD_Check", "PHQ_Functioning", "GAD_Functioning")

# Convert the specified columns from character to numeric
duplicate_dataset[columns_to_convert] <- lapply(duplicate_dataset[columns_to_convert], function(x) as.numeric(as.character(x)))

Part 2: Demographic Analysis**

Participant ID, time, group number

This is a unique identifier of each participant. The variable on time is indicative of when the data was collected for which participant. The group number is the unique identifier for the group that was assigned to the students in treatment. We want to check for missing entries if any. We also want to check if there was any attrition with respect to time. We also want to check the number of students in each group.

# Checking for total number of entries
total_stu <- nrow(rct)
unique_stu <- length(unique(rct$participant_id))
missing_stu <- sum(is.na(rct$participant_id))


# Creating a data frame to store the results
summary_stu <- data.frame(
  Metric = c("Total Entries", "Unique Entries", "Missing Entries"),
  Value = c(total_stu, unique_stu, missing_stu)
)

# Printing the summary table
stargazer(summary_stu, type = "text", summary = FALSE, rownames = FALSE)

## 
## =====================
## Metric          Value
## ---------------------
## Total Entries   6,864
## Unique Entries  1,716
## Missing Entries   0  
## ---------------------

time_stu <- rct %>% 
  group_by(trt,time) %>% 
  summarise(no_of_students = n_distinct(participant_id))

## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.

stargazer(time_stu, type = "text", summary = FALSE, rownames = FALSE)

## 
## =======================
## trt time no_of_students
## -----------------------
## 0    0        850      
## 0    2        850      
## 0    4        850      
## 0    8        850      
## 1    0        866      
## 1    2        866      
## 1    4        866      
## 1    8        866      
## -----------------------

We can note the following:

Every student has a unique participant ID
Every time period of data collection has equal number of students. After confirming with the team we found that the demographic information for each participant in the treatment and control group was carried forward regardless of them being actually present in the session and the subsequent survey - therefore this does not speak to the attrition or absenteeism

Condition, Implementer and School ID

. School ID is the unique identifier for each school. Here we will check for any missing entries. We’ll also group students in school ID to check how many students were in each school - we want to find any outliers in terms of grouping students.

# Checking for entries for condition and implementer
condition_stu <- rct %>% 
  group_by(trt,time, condition) %>% 
  summarise(no_of_students = n())

## `summarise()` has grouped output by 'trt', 'time'. You can override using the
## `.groups` argument.

condition_stu_wide <- condition_stu %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(condition_stu_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## =============================
## trt condition  0   2   4   8 
## -----------------------------
## 0   Control1  850 850 850 850
## 1    Shamiri  866 866 866 866
## -----------------------------

imp_stu <- rct %>% 
  group_by(trt,time, implementer) %>% 
  summarise(no_of_students = n())

## `summarise()` has grouped output by 'trt', 'time'. You can override using the
## `.groups` argument.

imp_stu_wide <- imp_stu %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(imp_stu_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ===============================
## trt implementer  0   2   4   8 
## -------------------------------
## 0      Imp_1    850 850 850 850
## 1      Imp_1    866 866 866 866
## -------------------------------

school_ID <- rct %>% 
  group_by(trt,time, School_ID) %>% 
  summarise(no_of_students = n())

## `summarise()` has grouped output by 'trt', 'time'. You can override using the
## `.groups` argument.

school_ID_wide <- school_ID %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(school_ID_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ===================================
## trt    School_ID     0   2   4   8 
## -----------------------------------
## 0   ANS23_School_17 174 174 174 174
## 0   ANS23_School_18 133 133 133 133
## 0   ANS23_School_25 137 137 137 137
## 0   ANS23_School_47 94  94  94  94 
## 0   ANS23_School_5  227 227 227 227
## 0   ANS23_School_7  85  85  85  85 
## 1   ANS23_School_17 138 138 138 138
## 1   ANS23_School_18 134 134 134 134
## 1   ANS23_School_25 135 135 135 135
## 1   ANS23_School_47 105 105 105 105
## 1   ANS23_School_5  224 224 224 224
## 1   ANS23_School_7  130 130 130 130
## -----------------------------------

school_group <- rct %>% 
  group_by(trt,time, School_ID,group_number) %>% 
  summarise(no_of_students = n())

## `summarise()` has grouped output by 'trt', 'time', 'School_ID'. You can
## override using the `.groups` argument.

school_group_wide <- school_group %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(school_group_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ================================================
## trt    School_ID    group_number  0   2   4   8 
## ------------------------------------------------
## 0   ANS23_School_17   3-3-C004   174 174 174 174
## 0   ANS23_School_18   2-2-C003   133 133 133 133
## 0   ANS23_School_25   1-1-C002   137 137 137 137
## 0   ANS23_School_47   6-6-C009   94  94  94  94 
## 0   ANS23_School_5    5-5-C007   227 227 227 227
## 0   ANS23_School_7    4-4-C005   85  85  85  85 
## 1   ANS23_School_17   17_S001    15  15  15  15 
## 1   ANS23_School_17   17_S002    18  18  18  18 
## 1   ANS23_School_17   17_S003    15  15  15  15 
## 1   ANS23_School_17   17_S004    15  15  15  15 
## 1   ANS23_School_17   17_S005    14  14  14  14 
## 1   ANS23_School_17   17_S006    15  15  15  15 
## 1   ANS23_School_17   17_S007    15  15  15  15 
## 1   ANS23_School_17   17_S008    15  15  15  15 
## 1   ANS23_School_17   17_S009    15  15  15  15 
## 1   ANS23_School_17               1   1   1   1 
## 1   ANS23_School_18   18_S001    15  15  15  15 
## 1   ANS23_School_18   18_S002    15  15  15  15 
## 1   ANS23_School_18   18_S003    15  15  15  15 
## 1   ANS23_School_18   18_S004    15  15  15  15 
## 1   ANS23_School_18   18_S005    14  14  14  14 
## 1   ANS23_School_18   18_S006    15  15  15  15 
## 1   ANS23_School_18   18_S007    15  15  15  15 
## 1   ANS23_School_18   18_S008    15  15  15  15 
## 1   ANS23_School_18   18_S009    15  15  15  15 
## 1   ANS23_School_25   25_S001    15  15  15  15 
## 1   ANS23_School_25   25_S002    15  15  15  15 
## 1   ANS23_School_25   25_S003    15  15  15  15 
## 1   ANS23_School_25   25_S004    15  15  15  15 
## 1   ANS23_School_25   25_S005    15  15  15  15 
## 1   ANS23_School_25   25_S006    15  15  15  15 
## 1   ANS23_School_25   25_S007    15  15  15  15 
## 1   ANS23_School_25   25_S008    15  15  15  15 
## 1   ANS23_School_25   25_S009    15  15  15  15 
## 1   ANS23_School_47   47_S001    15  15  15  15 
## 1   ANS23_School_47   47_S002    15  15  15  15 
## 1   ANS23_School_47   47_S003    15  15  15  15 
## 1   ANS23_School_47   47_S004    15  15  15  15 
## 1   ANS23_School_47   47_S005    15  15  15  15 
## 1   ANS23_School_47   47_S006    15  15  15  15 
## 1   ANS23_School_47   47_S007    15  15  15  15 
## 1   ANS23_School_5     5_S001    15  15  15  15 
## 1   ANS23_School_5     5_S002    14  14  14  14 
## 1   ANS23_School_5     5_S003    16  16  16  16 
## 1   ANS23_School_5     5_S004    15  15  15  15 
## 1   ANS23_School_5     5_S005    15  15  15  15 
## 1   ANS23_School_5     5_S006    15  15  15  15 
## 1   ANS23_School_5     5_S007    15  15  15  15 
## 1   ANS23_School_5     5_S008    15  15  15  15 
## 1   ANS23_School_5     5_S009    15  15  15  15 
## 1   ANS23_School_5     5_S010    15  15  15  15 
## 1   ANS23_School_5     5_S011    15  15  15  15 
## 1   ANS23_School_5     5_S012    15  15  15  15 
## 1   ANS23_School_5     5_S013    15  15  15  15 
## 1   ANS23_School_5     5_S014    15  15  15  15 
## 1   ANS23_School_5     5_S015    14  14  14  14 
## 1   ANS23_School_7     7_S001    13  13  13  13 
## 1   ANS23_School_7     7_S002    15  15  15  15 
## 1   ANS23_School_7     7_S003    16  16  16  16 
## 1   ANS23_School_7     7_S004    15  15  15  15 
## 1   ANS23_School_7     7_S005    15  15  15  15 
## 1   ANS23_School_7     7_S006    15  15  15  15 
## 1   ANS23_School_7     7_S007    12  12  12  12 
## 1   ANS23_School_7     7_S008    14  14  14  14 
## 1   ANS23_School_7     7_S009    15  15  15  15 
## ------------------------------------------------

We can note the following important points: - The number of groups in the control group are less - there’s only one control group for each school which makes sense as the purpose of those groups is only to collect data. But there are multiple groups within the treatment schools to possibly balance the fellow-to-student ratio

Fellow ID, Supervisor ID, Hub, Fellow Gender, and Fellow Age

We want to check for the following things: - How many students under each fellow and supervisor, hub. - Gender and age wise distribution of students under the fellow

# Checking for entries for condition and implementer
stu_hub <- rct %>% 
  group_by(hub, time) %>% 
  summarise(no_of_students = n_distinct(participant_id),
            no_of_supervisors = n_distinct(supervisor_id),
            no_of_fellows = n_distinct(fellow_id)
            )

## `summarise()` has grouped output by 'hub'. You can override using the `.groups`
## argument.

# Display the table using stargazer
stargazer(stu_hub, type = "text", summary = FALSE, rownames = FALSE)

## 
## ==============================================================
## hub        time no_of_students no_of_supervisors no_of_fellows
## --------------------------------------------------------------
## Kariobangi  0        377               6              23      
## Kariobangi  2        377               6              23      
## Kariobangi  4        377               6              23      
## Kariobangi  8        377               6              23      
## Kasarani    0        224               3              15      
## Kasarani    2        224               3              15      
## Kasarani    4        224               3              15      
## Kasarani    8        224               3              15      
## Kiambu      0        135               1               9      
## Kiambu      2        135               1               9      
## Kiambu      4        135               1               9      
## Kiambu      8        135               1               9      
## NA          0        850               1               1      
## NA          2        850               1               1      
## NA          4        850               1               1      
## NA          8        850               1               1      
## Ruiru       0        130               1               9      
## Ruiru       2        130               1               9      
## Ruiru       4        130               1               9      
## Ruiru       8        130               1               9      
## --------------------------------------------------------------

stu_fel <- rct %>% 
  group_by(fellow_id, time) %>% 
  summarise(no_of_students = n())

## `summarise()` has grouped output by 'fellow_id'. You can override using the
## `.groups` argument.

stu_fel_wide <- stu_fel %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_fel_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ================================
## fellow_id         0   2   4   8 
## --------------------------------
## SHAMIRI RESEARCH 850 850 850 850
## TFW23_S_031      16  16  16  16 
## TFW23_S_032      15  15  15  15 
## TFW23_S_034      14  14  14  14 
## TFW23_S_036      15  15  15  15 
## TFW23_S_040      15  15  15  15 
## TFW23_S_044      15  15  15  15 
## TFW23_S_046      15  15  15  15 
## TFW23_S_048      15  15  15  15 
## TFW23_S_049      15  15  15  15 
## TFW23_S_050      14  14  14  14 
## TFW23_S_061      15  15  15  15 
## TFW23_S_064      15  15  15  15 
## TFW23_S_065      15  15  15  15 
## TFW23_S_067      15  15  15  15 
## TFW23_S_069      15  15  15  15 
## TFW23_S_074      15  15  15  15 
## TFW23_S_075      29  29  29  29 
## TFW23_S_077      15  15  15  15 
## TFW23_S_081      15  15  15  15 
## TFW23_S_083      15  15  15  15 
## TFW23_S_084      15  15  15  15 
## TFW23_S_085      15  15  15  15 
## TFW23_S_086      15  15  15  15 
## TFW23_S_090      15  15  15  15 
## TFW23_S_092      15  15  15  15 
## TFW23_S_094      15  15  15  15 
## TFW23_S_095      15  15  15  15 
## TFW23_S_099      15  15  15  15 
## TFW23_S_100      18  18  18  18 
## TFW23_S_101      15  15  15  15 
## TFW23_S_102      15  15  15  15 
## TFW23_S_107      15  15  15  15 
## TFW23_S_108      15  15  15  15 
## TFW23_S_111      15  15  15  15 
## TFW23_S_114      30  30  30  30 
## TFW23_S_115      15  15  15  15 
## TFW23_S_117      15  15  15  15 
## TFW23_S_141      15  15  15  15 
## TFW23_S_232      15  15  15  15 
## TFW23_S_234      15  15  15  15 
## TFW23_S_235      15  15  15  15 
## TFW23_S_236      15  15  15  15 
## TFW23_S_237      15  15  15  15 
## TFW23_S_238      15  15  15  15 
## TFW23_S_239      15  15  15  15 
## TFW23_S_240      15  15  15  15 
## TFW23_S_241      15  15  15  15 
## TFW23_S_312      12  12  12  12 
## TFW23_S_313      15  15  15  15 
## TFW23_S_314      16  16  16  16 
## TFW23_S_316      15  15  15  15 
## TFW23_S_317      14  14  14  14 
## TFW23_S_318      15  15  15  15 
## TFW23_S_319      15  15  15  15 
## TFW23_S_320      13  13  13  13 
## TFW23_S_321      15  15  15  15 
## --------------------------------

stu_sup <- rct %>% 
  group_by(supervisor_id, time) %>% 
  summarise(no_of_students = n())

## `summarise()` has grouped output by 'supervisor_id'. You can override using the
## `.groups` argument.

stu_sup_wide <- stu_sup %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_sup_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## =============================
## supervisor_id  0   2   4   8 
## -----------------------------
## NA            850 850 850 850
## SPV23_S_03    130 130 130 130
## SPV23_S_11    75  75  75  75 
## SPV23_S_13    75  75  75  75 
## SPV23_S_16    135 135 135 135
## SPV23_S_17    93  93  93  93 
## SPV23_S_18    15  15  15  15 
## SPV23_S_20    60  60  60  60 
## SPV23_S_25    75  75  75  75 
## SPV23_S_30    75  75  75  75 
## SPV23_S_38    59  59  59  59 
## SPV23_S_40    74  74  74  74 
## -----------------------------

fel_gen <- rct %>% 
  group_by(fellow_gender, time) %>% 
  summarise(no_of_students = n())

## `summarise()` has grouped output by 'fellow_gender'. You can override using the
## `.groups` argument.

fel_gen_wide <- fel_gen %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(fel_gen_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## =============================
## fellow_gender  0   2   4   8 
## -----------------------------
## F             402 402 402 402
## M             464 464 464 464
## NA            850 850 850 850
## -----------------------------

fel_age <- rct %>% 
  group_by(fellow_age, time) %>% 
  summarise(no_of_students = n())

## `summarise()` has grouped output by 'fellow_age'. You can override using the
## `.groups` argument.

fel_age_wide <- fel_age %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(fel_age_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ==========================
## fellow_age  0   2   4   8 
## --------------------------
## 18         87  87  87  87 
## 19         124 124 124 124
## 20         45  45  45  45 
## 21         194 194 194 194
## 22         328 328 328 328
## 23         59  59  59  59 
## 24         29  29  29  29 
## NA         850 850 850 850
## --------------------------

We can note the following: - The total number of fellows are 75 in the RCT treatment group in comparison to the 400 fellows who we have data on - Similarly the total number of supervisors are 15, in comparison with the 40 supervisors who we have data on - Most fellows have 60 students under them. However, the number of students can be as high as 172 for one of the fellows. - Male fellows have more students than female fellows - The maximum number of students in the treatment group are with fellows who are of age 21-22. - The NAs in all the tables above represent the entries from the control group as they didn’t have the fellows assigned.

Student age, Student form, student gender, tribe and county

# Checking for entries for condition and implementer
stu_age <- rct %>% 
  group_by(trt,Age, Form) %>% 
  summarise(no_of_students = n_distinct(participant_id))

## `summarise()` has grouped output by 'trt', 'Age'. You can override using the
## `.groups` argument.

stargazer(stu_age, type = "text", summary = FALSE, rownames = FALSE)

## 
## ===========================
## trt Age Form no_of_students
## ---------------------------
## 0   12   1         1       
## 0   13   1         12      
## 0   13   NA        1       
## 0   14   1         70      
## 0   14   2         19      
## 0   14   3         1       
## 0   14   NA        1       
## 0   15   1        105      
## 0   15   2         87      
## 0   15   3         5       
## 0   15   4         2       
## 0   15   NA        1       
## 0   16   1         34      
## 0   16   2        123      
## 0   16   3         59      
## 0   16   4         2       
## 0   16   NA        4       
## 0   17   1         6       
## 0   17   2         38      
## 0   17   3         79      
## 0   17   4         25      
## 0   17   NA        5       
## 0   18   1         3       
## 0   18   2         10      
## 0   18   3         31      
## 0   18   4         40      
## 0   18   NA        3       
## 0   19   1         1       
## 0   19   2         2       
## 0   19   3         12      
## 0   19   4         12      
## 0   20   3         1       
## 0   20   4         5       
## 0   22   2         1       
## 0   22   4         3       
## 0   NA   1         12      
## 0   NA   2         9       
## 0   NA   3         7       
## 0   NA   4         3       
## 0   NA   NA        15      
## 1   13   1         15      
## 1   13   2         1       
## 1   13   3         1       
## 1   14   1         89      
## 1   14   2         20      
## 1   14   4         1       
## 1   14   NA        2       
## 1   15   1        121      
## 1   15   2         81      
## 1   15   3         5       
## 1   15   NA        2       
## 1   16   1         65      
## 1   16   2         84      
## 1   16   3         55      
## 1   17   1         13      
## 1   17   2         41      
## 1   17   3        109      
## 1   17   4         17      
## 1   17   NA        4       
## 1   18   1         3       
## 1   18   2         4       
## 1   18   3         22      
## 1   18   4         37      
## 1   18   NA        1       
## 1   19   2         2       
## 1   19   3         8       
## 1   19   4         9       
## 1   20   3         1       
## 1   20   4         3       
## 1   NA   1         17      
## 1   NA   2         6       
## 1   NA   3         10      
## 1   NA   NA        17      
## ---------------------------

stu_age_gen <- rct %>% 
  group_by(trt,Age,Gender) %>% 
    summarise(no_of_students = n_distinct(participant_id))

## `summarise()` has grouped output by 'trt', 'Age'. You can override using the
## `.groups` argument.

stargazer(stu_age_gen, type = "text", summary = FALSE, rownames = FALSE)

## 
## =============================
## trt Age Gender no_of_students
## -----------------------------
## 0   12   Male        1       
## 0   13  Female       5       
## 0   13   Male        8       
## 0   14  Female       43      
## 0   14   Male        47      
## 0   14    NA         1       
## 0   15  Female       97      
## 0   15   Male       102      
## 0   15    NA         1       
## 0   16  Female      123      
## 0   16   Male        98      
## 0   16    NA         1       
## 0   17  Female       75      
## 0   17   Male        73      
## 0   17    NA         5       
## 0   18  Female       46      
## 0   18   Male        38      
## 0   18    NA         3       
## 0   19  Female       11      
## 0   19   Male        16      
## 0   20  Female       2       
## 0   20   Male        4       
## 0   22  Female       2       
## 0   22   Male        1       
## 0   22    NA         1       
## 0   NA  Female       13      
## 0   NA   Male        17      
## 0   NA    NA         16      
## 1   13  Female       5       
## 1   13   Male        12      
## 1   14  Female       55      
## 1   14   Male        57      
## 1   15  Female      108      
## 1   15   Male        97      
## 1   15    NA         4       
## 1   16  Female       99      
## 1   16   Male       102      
## 1   16    NA         3       
## 1   17  Female       92      
## 1   17   Male        89      
## 1   17    NA         3       
## 1   18  Female       32      
## 1   18   Male        34      
## 1   18    NA         1       
## 1   19  Female       8       
## 1   19   Male        11      
## 1   20  Female       2       
## 1   20   Male        2       
## 1   NA  Female       18      
## 1   NA   Male        13      
## 1   NA    NA         19      
## -----------------------------

stu_gen <- rct %>% 
  group_by(Gender, Form) %>% 
  summarise(no_of_students = n_distinct(participant_id))

## `summarise()` has grouped output by 'Gender'. You can override using the
## `.groups` argument.

stargazer(stu_gen, type = "text", summary = FALSE, rownames = FALSE)

## 
## ==========================
## Gender Form no_of_students
## --------------------------
## Female  1        233      
## Female  2        278      
## Female  3        207      
## Female  4        113      
## Female  NA        5       
## Male    1        327      
## Male    2        245      
## Male    3        194      
## Male    4         45      
## Male    NA        11      
## NA      1         7       
## NA      2         5       
## NA      3         5       
## NA      4         1       
## NA      NA        40      
## --------------------------

stu_tribe <- rct %>% 
  group_by(trt,Tribe) %>% 
  summarise(no_of_students = n_distinct(participant_id))

## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.

# Print the summary table with stargazer
stargazer(stu_tribe, type = "text", summary = FALSE, rownames = FALSE)

## 
## ===========================================
## trt          Tribe           no_of_students
## -------------------------------------------
## 0           Abagusii               2       
## 0             Arab                 1       
## 0            Bantu                 2       
## 0            Borana                8       
## 0            Burji                 5       
## 0    Chirakinandi Dr Congo         1       
## 0            Congo                 1       
## 0          Congolese               2       
## 0          Digo Kisii              1       
## 0             Embu                 2       
## 0            Gaatta                1       
## 0            Gabra                 1       
## 0           Giriama                4       
## 0      Giriama Mijikenda           1       
## 0          Half Cast               1       
## 0   Half Of Kikuyu And Kisii       2       
## 0           Islamic                1       
## 0           Kalenjin               10      
## 0            Kamba                 90      
## 0         Kambakikuyu              2       
## 0            Kikuyu               255      
## 0           Kipsigis               1       
## 0            Kisii                 38      
## 0            Kuria                 1       
## 0            Luhya                125      
## 0       Luhya And Kisii            1       
## 0         Luhyakikuyu              2       
## 0             Luo                 184      
## 0            Maasai                5       
## 0            Masika                1       
## 0            Mbeere                1       
## 0             Meru                 15      
## 0          Mijikenda               3       
## 0            Muslim                4       
## 0              NA                  37      
## 0              Na                  2       
## 0             Nubi                 1       
## 0             Nuer                 1       
## 0            Oromo                 1       
## 0            Pokomo                1       
## 0            Rabai                 1       
## 0           Rwandese               1       
## 0            Somali                13      
## 0             Suba                 1       
## 0           Sudanese               3       
## 0           Swahili                1       
## 0        Taiba Nigerian            1       
## 0            Taita                 3       
## 0             Teso                 4       
## 0           Turkana                1       
## 0            Tutsi                 1       
## 0           Ugandan                3       
## 1           Abagusii               2       
## 1             Arab                 1       
## 1            Bantu                 1       
## 1            Borana                10      
## 1            Bukusu                1       
## 1            Burji                 5       
## 1            Chonyi                1       
## 1             Digo                 1       
## 1            Dinka                 1       
## 1            Duruma                1       
## 1             Embu                 4       
## 1            Gabra                 2       
## 1           Giriama                3       
## 1           Kalenjin               9       
## 1            Kamba                 84      
## 1            Kikuyu               257      
## 1           Kipsigis               1       
## 1            Kisii                 40      
## 1         Kisii Kamba              1       
## 1            Kuria                 1       
## 1             Lamu                 1       
## 1            Luhya                156      
## 1       Luhya And Kisii            1       
## 1             Luo                 180      
## 1            Luyha                 1       
## 1            Maasai                3       
## 1            Mbeere                1       
## 1             Meru                 13      
## 1          Mijikenda               1       
## 1            Muslim                3       
## 1           Mwilwana               1       
## 1              NA                  36      
## 1              Na                  4       
## 1            Nandi                 2       
## 1             Nubi                 1       
## 1             Nuer                 1       
## 1            Oromo                 2       
## 1            Pokomo                1       
## 1            Rabai                 2       
## 1           Rendille               1       
## 1           Rwandese               1       
## 1           Samburu                1       
## 1            Somali                9       
## 1             Suba                 2       
## 1           Sudanese               2       
## 1           Swahili                2       
## 1            Taita                 4       
## 1            Taveta                2       
## 1             Teso                 3       
## 1         Ugandakenya              1       
## 1           Ugandan                1       
## 1            Yoruba                1       
## -------------------------------------------

stu_county <- rct %>% 
  group_by(trt,County) %>% 
  summarise(no_of_students = n_distinct(participant_id)) %>%
  arrange(desc(no_of_students))

## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.

# Print the summary table with stargazer
stargazer(stu_county, type = "text", summary = FALSE, rownames = FALSE)

## 
## ===================================
## trt      County      no_of_students
## -----------------------------------
## 1       Nairobi           173      
## 0       Nairobi           163      
## 1        Kiambu           145      
## 0        Kiambu           122      
## 0        Siaya             72      
## 1        Siaya             64      
## 1       Kakamega           45      
## 0       Machakos           39      
## 0          NA              36      
## 1        Kisumu            35      
## 1       Machakos           35      
## 0       Kakamega           34      
## 0        Kisumu            34      
## 0       Muranga            32      
## 1        Busia             32      
## 1          NA              32      
## 0        Vihiga            30      
## 1       Muranga            29      
## 0        Busia             26      
## 1        Kisii             25      
## 1        Vihiga            22      
## 0        Kisii             21      
## 1       Makueni            20      
## 0        Kitui             19      
## 0       Makueni            19      
## 1       Homa Bay           19      
## 1        Migori            18      
## 1        Nakuru            18      
## 0        Nyeri             17      
## 1       Kajiado            15      
## 1        Kitui             15      
## 0       Homa Bay           14      
## 1          Na              13      
## 0       Nyamira            12      
## 0       Bungoma            11      
## 0         Meru             11      
## 0        Migori            11      
## 0        Nakuru            11      
## 1        Nyeri             11      
## 0       Kajiado            10      
## 0      Nyandarua           10      
## 0       Mombasa            8       
## 0          Na              8       
## 1         Meru             8       
## 1     Trans Nzoia          8       
## 0       Mandera            7       
## 1       Bungoma            7       
## 1       Marsabit           7       
## 0       Marsabit           6       
## 0        Narok             6       
## 1      Kirinyaga           6       
## 1       Nyamira            6       
## 1      Nyandarua           6       
## 1     Uasin Gishu          6       
## 0         Embu             5       
## 0        Kilifi            5       
## 1         Embu             5       
## 0       Laikipia           4       
## 0        Moyale            4       
## 0     Trans Nzoia          4       
## 0        Wajir             4       
## 1        Kilifi            4       
## 1       Mandera            4       
## 1     Taita Taveta         4       
## 0        Kenya             3       
## 0       Kericho            3       
## 0        Kitale            3       
## 0     Uasin Gishu          3       
## 0        Uganda            3       
## 1        Isiolo            3       
## 1       Mombasa            3       
## 1        Narok             3       
## 0       Gatundu            2       
## 0       Navaisha           2       
## 0     Taita Taveta         2       
## 0      Tana River          2       
## 1       Garissa            2       
## 1        Moyale            2       
## 1        Uganda            2       
## 0       Baringo            1       
## 0        Congo             1       
## 0       Eldoret            1       
## 0      Kirinyaga           1       
## 0         Lamu             1       
## 0     Mbooni East          1       
## 0        Mwingi            1       
## 0        Nandi             1       
## 0      Nyahururu           1       
## 0        Rwanda            1       
## 0    Tharaka Nithi         1       
## 0       Turkana            1       
## 1    Elgeyomarakwet        1       
## 1        Kenya             1       
## 1       Kericho            1       
## 1        Kwale             1       
## 1       Laikipia           1       
## 1         Lamu             1       
## 1        Nandi             1       
## 1       Navaisha           1       
## 1        Nueri             1       
## 1   Nyahururumandera       1       
## 1       Nyakach            1       
## 1       Samburu            1       
## 1     South Sudan          1       
## 1      Tana River          1       
## -----------------------------------

Home, Siblings, Religion, Parent’s home, Parent’s Dead, Father’s Education, Mother’s Education

stu_home <- rct %>% 
  group_by(trt,Home) %>% 
  summarise(no_of_students = n_distinct(participant_id)) %>% 
  mutate(Home = case_when(
    Home == 1 ~ "Rural Area",
    Home == 2 ~ "Small town",
    Home == 3 ~ "Big town",
    Home == 4 ~ "City",
  ))

## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.

# Print the summary table with stargazer
stargazer(stu_home, type = "text", summary = FALSE, rownames = FALSE)

## 
## =============================
## trt    Home    no_of_students
## -----------------------------
## 0   Rural Area      211      
## 0   Small town      273      
## 0    Big town       193      
## 0      City         128      
## 0                    45      
## 1   Rural Area      208      
## 1   Small town      304      
## 1    Big town       198      
## 1      City         115      
## 1                    41      
## -----------------------------

stu_sib <- rct %>% 
  group_by(trt,Siblings) %>% 
  summarise(no_of_students = n_distinct(participant_id))

## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.

# Print the summary table with stargazer
stargazer(stu_sib, type = "text", summary = FALSE, rownames = FALSE)

## 
## ===========================
## trt Siblings no_of_students
## ---------------------------
## 0      0           26      
## 0      1          124      
## 0      2          211      
## 0      3          177      
## 0      4          117      
## 0      5           81      
## 0      6           41      
## 0      7           41      
## 0      NA          32      
## 1      0           34      
## 1      1          143      
## 1      2          203      
## 1      3          185      
## 1      4          123      
## 1      5           72      
## 1      6           41      
## 1      7           31      
## 1      NA          34      
## ---------------------------

stu_rel<- rct %>% 
  group_by(trt,Religion) %>% 
  summarise(no_of_students = n_distinct(participant_id)) %>% 
  mutate(Home = case_when(
    Religion == 1 ~ "Christian Protestants",
    Religion == 2 ~ "Christian Catholics",
    Religion == 3 ~ "Muslim",
    Religion == 4 ~ "Hindu",
    Religion == 5 ~ "Buddhist",
    Religion == 6 ~ "Traditional Africans",
    Religion == 7 ~ "No Religion",
    Religion == 8 ~ "Other",
  ))

## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.

# Print the summary table with stargazer
stargazer(stu_rel, type = "text", summary = FALSE, rownames = FALSE)

## 
## =================================================
## trt Religion no_of_students         Home         
## -------------------------------------------------
## 0      1          507       Christian Protestants
## 0      2          224        Christian Catholics 
## 0      3           56              Muslim        
## 0      6           6        Traditional Africans 
## 0      7           6             No Religion     
## 0      8           21               Other        
## 0      NA          30                            
## 1      1          535       Christian Protestants
## 1      2          223        Christian Catholics 
## 1      3           45              Muslim        
## 1      6           6        Traditional Africans 
## 1      7           7             No Religion     
## 1      8           23               Other        
## 1      NA          27                            
## -------------------------------------------------

# Group by Parents_Home and time, and count the number of students
stu_parentshome <- rct %>% 
  group_by(trt,time,Parents_Home) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(Home = case_when(
    Parents_Home == 0 ~ "No Parent",
    Parents_Home == 1 ~ "Single Parent",
    Parents_Home == 2 ~ "Both Parents"
  ))

# Pivot the data to a wider format
stu_parentshome_wide <- stu_parentshome %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_parentshome_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ==============================================
## trt Parents_Home     Home       0   2   4   8 
## ----------------------------------------------
## 0        0         No Parent   27  27  27  27 
## 0        1       Single Parent 280 280 280 280
## 0        2       Both Parents  509 509 509 509
## 0        NA                    34  34  34  34 
## 1        0         No Parent   30  30  30  30 
## 1        1       Single Parent 330 330 330 330
## 1        2       Both Parents  491 491 491 491
## 1        NA                    15  15  15  15 
## ----------------------------------------------

# Group by Parents_Dead and time, and count the number of students
stu_parentdead <- rct %>% 
  group_by(trt, time,Parents_Dead) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(Dead = case_when(
    Parents_Dead == 1 ~ "Father",
    Parents_Dead == 2 ~ "Mother",
    Parents_Dead == 3 ~ "Both",
    Parents_Dead == 4 ~ "None"
  ))

# Pivot the data to a wider format
stu_parentdead_wide <- stu_parentdead %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_parentdead_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## =======================================
## trt Parents_Dead  Dead   0   2   4   8 
## ---------------------------------------
## 0        1       Father 84  84  84  84 
## 0        2       Mother 27  27  27  27 
## 0        3        Both  17  17  17  17 
## 0        4        None  688 688 688 688
## 0        NA             34  34  34  34 
## 1        1       Father 99  99  99  99 
## 1        2       Mother 35  35  35  35 
## 1        3        Both  15  15  15  15 
## 1        4        None  694 694 694 694
## 1        NA             23  23  23  23 
## ---------------------------------------

# Group by Fathers_Education and time, and count the number of students
stu_fatheredu <- rct %>% 
  group_by(trt,time,Fathers_Education) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(FEDU = case_when(
    Fathers_Education == 1 ~ "University",
    Fathers_Education == 2 ~ "Secondary School",
    Fathers_Education == 3 ~ "Primary School",
    Fathers_Education == 4 ~ "Not Aware"
  ))

# Pivot the data to a wider format
stu_fatheredu_wide <- stu_fatheredu %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_fatheredu_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ======================================================
## trt Fathers_Education       FEDU        0   2   4   8 
## ------------------------------------------------------
## 0           1            University    175 175 175 175
## 0           2         Secondary School 226 226 226 226
## 0           3          Primary School  74  74  74  74 
## 0           4            Not Aware     333 333 333 333
## 0          NA                          42  42  42  42 
## 1           1            University    172 172 172 172
## 1           2         Secondary School 240 240 240 240
## 1           3          Primary School  99  99  99  99 
## 1           4            Not Aware     313 313 313 313
## 1          NA                          42  42  42  42 
## ------------------------------------------------------

# Group by Mothers_Education and time, and count the number of students
stu_motheredu <- rct %>% 
  group_by(trt,time,Mothers_Education) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(MEDU = case_when(
    Mothers_Education == 1 ~ "University",
    Mothers_Education == 2 ~ "Secondary School",
    Mothers_Education == 3 ~ "Primary School",
    Mothers_Education == 4 ~ "Not Aware"
  ))

# Pivot the data to a wider format
stu_motheredu_wide <- stu_motheredu %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_motheredu_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ======================================================
## trt Mothers_Education       MEDU        0   2   4   8 
## ------------------------------------------------------
## 0           1            University    163 163 163 163
## 0           2         Secondary School 234 234 234 234
## 0           3          Primary School  171 171 171 171
## 0           4            Not Aware     251 251 251 251
## 0          NA                          31  31  31  31 
## 1           1            University    146 146 146 146
## 1           2         Secondary School 280 280 280 280
## 1           3          Primary School  184 184 184 184
## 1           4            Not Aware     221 221 221 221
## 1          NA                          35  35  35  35 
## ------------------------------------------------------

Co-curricular, Sports, Academics

# Group by student gender, Co_Curricular, and time, then count the number of students
stu_cocu_gen <- rct %>% 
  group_by(trt,time,Gender, Co_Curricular) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(Cocu = case_when(
    Co_Curricular == 1 ~ "Not",
    Co_Curricular == 2 ~ "Quite",
    Co_Curricular == 3 ~ "Extremely"
  ))

# Pivot the data to a wider format
stu_cocu_gen_wide <- stu_cocu_gen %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_cocu_gen_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ==================================================
## trt Gender Co_Curricular   Cocu     0   2   4   8 
## --------------------------------------------------
## 0   Female       1          Not    118 118 118 118
## 0   Female       2         Quite   161 161 161 161
## 0   Female       3       Extremely 124 124 124 124
## 0   Female      NA                 14  14  14  14 
## 0    Male        1          Not    107 107 107 107
## 0    Male        2         Quite   162 162 162 162
## 0    Male        3       Extremely 116 116 116 116
## 0    Male       NA                 20  20  20  20 
## 0     NA         1          Not     4   4   4   4 
## 0     NA         2         Quite    8   8   8   8 
## 0     NA         3       Extremely  9   9   9   9 
## 0     NA        NA                  7   7   7   7 
## 1   Female       1          Not    128 128 128 128
## 1   Female       2         Quite   164 164 164 164
## 1   Female       3       Extremely 112 112 112 112
## 1   Female      NA                 15  15  15  15 
## 1    Male        1          Not    117 117 117 117
## 1    Male        2         Quite   175 175 175 175
## 1    Male        3       Extremely 113 113 113 113
## 1    Male       NA                 12  12  12  12 
## 1     NA         1          Not    11  11  11  11 
## 1     NA         2         Quite    9   9   9   9 
## 1     NA         3       Extremely  7   7   7   7 
## 1     NA        NA                  3   3   3   3 
## --------------------------------------------------

# Group by student age, Co_Curricular, and time, then count the number of students
stu_cocu_age <- rct %>% 
  group_by(trt,Age, Co_Curricular, time) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(Cocu = case_when(
    Co_Curricular == 1 ~ "Not",
    Co_Curricular == 2 ~ "Quite",
    Co_Curricular == 3 ~ "Extremely"
  ))

# Pivot the data to a wider format
stu_cocu_age_wide <- stu_cocu_age %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_cocu_age_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ===============================================
## trt Age Co_Curricular   Cocu     0   2   4   8 
## -----------------------------------------------
## 0   12        2         Quite    1   1   1   1 
## 0   13        1          Not     5   5   5   5 
## 0   13        2         Quite    2   2   2   2 
## 0   13        3       Extremely  6   6   6   6 
## 0   14        1          Not    28  28  28  28 
## 0   14        2         Quite   43  43  43  43 
## 0   14        3       Extremely 16  16  16  16 
## 0   14       NA                  4   4   4   4 
## 0   15        1          Not    53  53  53  53 
## 0   15        2         Quite   80  80  80  80 
## 0   15        3       Extremely 54  54  54  54 
## 0   15       NA                 13  13  13  13 
## 0   16        1          Not    56  56  56  56 
## 0   16        2         Quite   101 101 101 101
## 0   16        3       Extremely 58  58  58  58 
## 0   16       NA                  7   7   7   7 
## 0   17        1          Not    39  39  39  39 
## 0   17        2         Quite   49  49  49  49 
## 0   17        3       Extremely 57  57  57  57 
## 0   17       NA                  8   8   8   8 
## 0   18        1          Not    24  24  24  24 
## 0   18        2         Quite   30  30  30  30 
## 0   18        3       Extremely 30  30  30  30 
## 0   18       NA                  3   3   3   3 
## 0   19        1          Not     7   7   7   7 
## 0   19        2         Quite   11  11  11  11 
## 0   19        3       Extremely  9   9   9   9 
## 0   20        1          Not     3   3   3   3 
## 0   20        2         Quite    1   1   1   1 
## 0   20        3       Extremely  2   2   2   2 
## 0   22        1          Not     2   2   2   2 
## 0   22        2         Quite    1   1   1   1 
## 0   22       NA                  1   1   1   1 
## 0   NA        1          Not    12  12  12  12 
## 0   NA        2         Quite   12  12  12  12 
## 0   NA        3       Extremely 17  17  17  17 
## 0   NA       NA                  5   5   5   5 
## 1   13        1          Not     8   8   8   8 
## 1   13        2         Quite    5   5   5   5 
## 1   13        3       Extremely  4   4   4   4 
## 1   14        1          Not    28  28  28  28 
## 1   14        2         Quite   56  56  56  56 
## 1   14        3       Extremely 24  24  24  24 
## 1   14       NA                  4   4   4   4 
## 1   15        1          Not    56  56  56  56 
## 1   15        2         Quite   86  86  86  86 
## 1   15        3       Extremely 66  66  66  66 
## 1   15       NA                  1   1   1   1 
## 1   16        1          Not    65  65  65  65 
## 1   16        2         Quite   69  69  69  69 
## 1   16        3       Extremely 62  62  62  62 
## 1   16       NA                  8   8   8   8 
## 1   17        1          Not    52  52  52  52 
## 1   17        2         Quite   73  73  73  73 
## 1   17        3       Extremely 51  51  51  51 
## 1   17       NA                  8   8   8   8 
## 1   18        1          Not    21  21  21  21 
## 1   18        2         Quite   31  31  31  31 
## 1   18        3       Extremely 13  13  13  13 
## 1   18       NA                  2   2   2   2 
## 1   19        1          Not     7   7   7   7 
## 1   19        2         Quite    7   7   7   7 
## 1   19        3       Extremely  4   4   4   4 
## 1   19       NA                  1   1   1   1 
## 1   20        1          Not     1   1   1   1 
## 1   20        2         Quite    1   1   1   1 
## 1   20        3       Extremely  1   1   1   1 
## 1   20       NA                  1   1   1   1 
## 1   NA        1          Not    18  18  18  18 
## 1   NA        2         Quite   20  20  20  20 
## 1   NA        3       Extremely  7   7   7   7 
## 1   NA       NA                  5   5   5   5 
## -----------------------------------------------

# Group by student gender, Sports, and time, then count the number of students
stu_sports_gen <- rct %>% 
  group_by(trt,Gender, Sports, time) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(spo = case_when(
    Sports == 1 ~ "Yes",
    Sports == 2 ~ "No"
  ))

# Pivot the data to a wider format
stu_sports_gen_wide <- stu_sports_gen %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_sports_gen_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## =====================================
## trt Gender Sports spo  0   2   4   8 
## -------------------------------------
## 0   Female   1    Yes 126 126 126 126
## 0   Female   2    No  280 280 280 280
## 0   Female   NA       11  11  11  11 
## 0    Male    1    Yes 131 131 131 131
## 0    Male    2    No  259 259 259 259
## 0    Male    NA       15  15  15  15 
## 0     NA     1    Yes 10  10  10  10 
## 0     NA     2    No  14  14  14  14 
## 0     NA     NA        4   4   4   4 
## 1   Female   1    Yes 113 113 113 113
## 1   Female   2    No  297 297 297 297
## 1   Female   NA        9   9   9   9 
## 1    Male    1    Yes 123 123 123 123
## 1    Male    2    No  281 281 281 281
## 1    Male    NA       13  13  13  13 
## 1     NA     1    Yes  6   6   6   6 
## 1     NA     2    No  20  20  20  20 
## 1     NA     NA        4   4   4   4 
## -------------------------------------

# Group by student age, Sports, and time, then count the number of students
stu_sports_age <- rct %>% 
  group_by(trt,Age, Sports, time) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(spo = case_when(
    Sports == 1 ~ "Yes",
    Sports == 2 ~ "No"
  ))

# Pivot the data to a wider format
stu_sports_age_wide <- stu_sports_age %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_sports_age_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ==================================
## trt Age Sports spo  0   2   4   8 
## ----------------------------------
## 0   12    1    Yes  1   1   1   1 
## 0   13    1    Yes  3   3   3   3 
## 0   13    2    No  10  10  10  10 
## 0   14    1    Yes 16  16  16  16 
## 0   14    2    No  71  71  71  71 
## 0   14    NA        4   4   4   4 
## 0   15    1    Yes 57  57  57  57 
## 0   15    2    No  138 138 138 138
## 0   15    NA        5   5   5   5 
## 0   16    1    Yes 64  64  64  64 
## 0   16    2    No  154 154 154 154
## 0   16    NA        4   4   4   4 
## 0   17    1    Yes 49  49  49  49 
## 0   17    2    No  95  95  95  95 
## 0   17    NA        9   9   9   9 
## 0   18    1    Yes 48  48  48  48 
## 0   18    2    No  35  35  35  35 
## 0   18    NA        4   4   4   4 
## 0   19    1    Yes  9   9   9   9 
## 0   19    2    No  17  17  17  17 
## 0   19    NA        1   1   1   1 
## 0   20    1    Yes  1   1   1   1 
## 0   20    2    No   4   4   4   4 
## 0   20    NA        1   1   1   1 
## 0   22    1    Yes  2   2   2   2 
## 0   22    2    No   2   2   2   2 
## 0   NA    1    Yes 17  17  17  17 
## 0   NA    2    No  27  27  27  27 
## 0   NA    NA        2   2   2   2 
## 1   13    1    Yes  3   3   3   3 
## 1   13    2    No  13  13  13  13 
## 1   13    NA        1   1   1   1 
## 1   14    1    Yes 25  25  25  25 
## 1   14    2    No  85  85  85  85 
## 1   14    NA        2   2   2   2 
## 1   15    1    Yes 51  51  51  51 
## 1   15    2    No  156 156 156 156
## 1   15    NA        2   2   2   2 
## 1   16    1    Yes 60  60  60  60 
## 1   16    2    No  138 138 138 138
## 1   16    NA        6   6   6   6 
## 1   17    1    Yes 54  54  54  54 
## 1   17    2    No  122 122 122 122
## 1   17    NA        8   8   8   8 
## 1   18    1    Yes 21  21  21  21 
## 1   18    2    No  43  43  43  43 
## 1   18    NA        3   3   3   3 
## 1   19    1    Yes  9   9   9   9 
## 1   19    2    No  10  10  10  10 
## 1   20    1    Yes  3   3   3   3 
## 1   20    2    No   1   1   1   1 
## 1   NA    1    Yes 16  16  16  16 
## 1   NA    2    No  30  30  30  30 
## 1   NA    NA        4   4   4   4 
## ----------------------------------

# Group by student gender, Percieved_Academic_Abilities, and time, then count the number of students
stu_acad_gen <- rct %>% 
  group_by(trt,Gender, Percieved_Academic_Abilities, time) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(acad = case_when(
    Percieved_Academic_Abilities == 1 ~ "Not Satisfactory",
    Percieved_Academic_Abilities == 2 ~ "Satisfactory",
    Percieved_Academic_Abilities == 3 ~ "Good",
    Percieved_Academic_Abilities == 4 ~ "Very Good",
    Percieved_Academic_Abilities == 5 ~ "Excellent"
  ))

# Pivot the data to a wider format
stu_acad_gen_wide <- stu_acad_gen %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_acad_gen_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## ========================================================================
## trt Gender Percieved_Academic_Abilities       acad        0   2   4   8 
## ------------------------------------------------------------------------
## 0   Female              1               Not Satisfactory 92  92  92  92 
## 0   Female              2                 Satisfactory   57  57  57  57 
## 0   Female              3                     Good       130 130 130 130
## 0   Female              4                  Very Good     59  59  59  59 
## 0   Female              5                  Excellent     65  65  65  65 
## 0   Female              NA                               14  14  14  14 
## 0    Male               1               Not Satisfactory 53  53  53  53 
## 0    Male               2                 Satisfactory   68  68  68  68 
## 0    Male               3                     Good       140 140 140 140
## 0    Male               4                  Very Good     64  64  64  64 
## 0    Male               5                  Excellent     63  63  63  63 
## 0    Male               NA                               17  17  17  17 
## 0     NA                1               Not Satisfactory  1   1   1   1 
## 0     NA                2                 Satisfactory    4   4   4   4 
## 0     NA                3                     Good        5   5   5   5 
## 0     NA                4                  Very Good      6   6   6   6 
## 0     NA                5                  Excellent      6   6   6   6 
## 0     NA                NA                                6   6   6   6 
## 1   Female              1               Not Satisfactory 99  99  99  99 
## 1   Female              2                 Satisfactory   66  66  66  66 
## 1   Female              3                     Good       121 121 121 121
## 1   Female              4                  Very Good     67  67  67  67 
## 1   Female              5                  Excellent     59  59  59  59 
## 1   Female              NA                                7   7   7   7 
## 1    Male               1               Not Satisfactory 50  50  50  50 
## 1    Male               2                 Satisfactory   70  70  70  70 
## 1    Male               3                     Good       158 158 158 158
## 1    Male               4                  Very Good     81  81  81  81 
## 1    Male               5                  Excellent     47  47  47  47 
## 1    Male               NA                               11  11  11  11 
## 1     NA                1               Not Satisfactory  4   4   4   4 
## 1     NA                2                 Satisfactory    4   4   4   4 
## 1     NA                3                     Good       12  12  12  12 
## 1     NA                4                  Very Good      1   1   1   1 
## 1     NA                5                  Excellent      6   6   6   6 
## 1     NA                NA                                3   3   3   3 
## ------------------------------------------------------------------------

# Group by student age, Percieved_Academic_Abilities, and time, then count the number of students
stu_acad_age <- rct %>% 
  group_by(trt, Age,Percieved_Academic_Abilities, time) %>% 
  summarise(no_of_students = n(), .groups = 'drop') %>% 
  mutate(acad = case_when(
    Percieved_Academic_Abilities == 1 ~ "Not Satisfactory",
    Percieved_Academic_Abilities == 2 ~ "Satisfactory",
    Percieved_Academic_Abilities == 3 ~ "Good",
    Percieved_Academic_Abilities == 4 ~ "Very Good",
    Percieved_Academic_Abilities == 5 ~ "Excellent"
  ))

# Pivot the data to a wider format
stu_acad_age_wide <- stu_acad_age %>%
  pivot_wider(names_from = time, values_from = no_of_students, values_fill = list(no_of_students = 0))

# Display the table using stargazer
stargazer(stu_acad_age_wide, type = "text", summary = FALSE, rownames = FALSE)

## 
## =================================================================
## trt Age Percieved_Academic_Abilities       acad       0  2  4  8 
## -----------------------------------------------------------------
## 0   12               1               Not Satisfactory 1  1  1  1 
## 0   13               1               Not Satisfactory 1  1  1  1 
## 0   13               2                 Satisfactory   1  1  1  1 
## 0   13               3                     Good       5  5  5  5 
## 0   13               4                  Very Good     1  1  1  1 
## 0   13               5                  Excellent     4  4  4  4 
## 0   13               NA                               1  1  1  1 
## 0   14               1               Not Satisfactory 11 11 11 11
## 0   14               2                 Satisfactory   10 10 10 10
## 0   14               3                     Good       28 28 28 28
## 0   14               4                  Very Good     22 22 22 22
## 0   14               5                  Excellent     16 16 16 16
## 0   14               NA                               4  4  4  4 
## 0   15               1               Not Satisfactory 39 39 39 39
## 0   15               2                 Satisfactory   30 30 30 30
## 0   15               3                     Good       57 57 57 57
## 0   15               4                  Very Good     32 32 32 32
## 0   15               5                  Excellent     35 35 35 35
## 0   15               NA                               7  7  7  7 
## 0   16               1               Not Satisfactory 38 38 38 38
## 0   16               2                 Satisfactory   38 38 38 38
## 0   16               3                     Good       70 70 70 70
## 0   16               4                  Very Good     33 33 33 33
## 0   16               5                  Excellent     32 32 32 32
## 0   16               NA                               11 11 11 11
## 0   17               1               Not Satisfactory 30 30 30 30
## 0   17               2                 Satisfactory   24 24 24 24
## 0   17               3                     Good       50 50 50 50
## 0   17               4                  Very Good     21 21 21 21
## 0   17               5                  Excellent     20 20 20 20
## 0   17               NA                               8  8  8  8 
## 0   18               1               Not Satisfactory 17 17 17 17
## 0   18               2                 Satisfactory   16 16 16 16
## 0   18               3                     Good       31 31 31 31
## 0   18               4                  Very Good     8  8  8  8 
## 0   18               5                  Excellent     13 13 13 13
## 0   18               NA                               2  2  2  2 
## 0   19               1               Not Satisfactory 3  3  3  3 
## 0   19               2                 Satisfactory   6  6  6  6 
## 0   19               3                     Good       11 11 11 11
## 0   19               4                  Very Good     2  2  2  2 
## 0   19               5                  Excellent     5  5  5  5 
## 0   20               1               Not Satisfactory 1  1  1  1 
## 0   20               3                     Good       4  4  4  4 
## 0   20               NA                               1  1  1  1 
## 0   22               1               Not Satisfactory 2  2  2  2 
## 0   22               3                     Good       1  1  1  1 
## 0   22               5                  Excellent     1  1  1  1 
## 0   NA               1               Not Satisfactory 3  3  3  3 
## 0   NA               2                 Satisfactory   4  4  4  4 
## 0   NA               3                     Good       18 18 18 18
## 0   NA               4                  Very Good     10 10 10 10
## 0   NA               5                  Excellent     8  8  8  8 
## 0   NA               NA                               3  3  3  3 
## 1   13               2                 Satisfactory   2  2  2  2 
## 1   13               3                     Good       8  8  8  8 
## 1   13               4                  Very Good     4  4  4  4 
## 1   13               5                  Excellent     3  3  3  3 
## 1   14               1               Not Satisfactory 14 14 14 14
## 1   14               2                 Satisfactory   15 15 15 15
## 1   14               3                     Good       38 38 38 38
## 1   14               4                  Very Good     27 27 27 27
## 1   14               5                  Excellent     16 16 16 16
## 1   14               NA                               2  2  2  2 
## 1   15               1               Not Satisfactory 37 37 37 37
## 1   15               2                 Satisfactory   24 24 24 24
## 1   15               3                     Good       75 75 75 75
## 1   15               4                  Very Good     43 43 43 43
## 1   15               5                  Excellent     29 29 29 29
## 1   15               NA                               1  1  1  1 
## 1   16               1               Not Satisfactory 31 31 31 31
## 1   16               2                 Satisfactory   45 45 45 45
## 1   16               3                     Good       63 63 63 63
## 1   16               4                  Very Good     31 31 31 31
## 1   16               5                  Excellent     26 26 26 26
## 1   16               NA                               8  8  8  8 
## 1   17               1               Not Satisfactory 46 46 46 46
## 1   17               2                 Satisfactory   31 31 31 31
## 1   17               3                     Good       55 55 55 55
## 1   17               4                  Very Good     24 24 24 24
## 1   17               5                  Excellent     24 24 24 24
## 1   17               NA                               4  4  4  4 
## 1   18               1               Not Satisfactory 15 15 15 15
## 1   18               2                 Satisfactory   11 11 11 11
## 1   18               3                     Good       28 28 28 28
## 1   18               4                  Very Good     10 10 10 10
## 1   18               5                  Excellent     1  1  1  1 
## 1   18               NA                               2  2  2  2 
## 1   19               1               Not Satisfactory 5  5  5  5 
## 1   19               2                 Satisfactory   4  4  4  4 
## 1   19               3                     Good       7  7  7  7 
## 1   19               4                  Very Good     1  1  1  1 
## 1   19               5                  Excellent     2  2  2  2 
## 1   20               1               Not Satisfactory 1  1  1  1 
## 1   20               3                     Good       1  1  1  1 
## 1   20               4                  Very Good     2  2  2  2 
## 1   NA               1               Not Satisfactory 4  4  4  4 
## 1   NA               2                 Satisfactory   8  8  8  8 
## 1   NA               3                     Good       16 16 16 16
## 1   NA               4                  Very Good     7  7  7  7 
## 1   NA               5                  Excellent     11 11 11 11
## 1   NA               NA                               4  4  4  4 
## -----------------------------------------------------------------

Part 3: Frequency of responses to Mental health questions in each scale

PCS, PHQ, GAD scores

# Function to get the question text based on PCS column name
get_question_text <- function(pcs_col) {
  question_texts <- list(
    PCS_1 = "If I try to behave, I can keep myself out of trouble.",
    PCS_2 = "I can not succeed at school no matter how hard I try.",
    PCS_3 = "I can stay out of trouble if I really try.",
    PCS_4 = "I can not get good grades no matter how hard I try.",
    PCS_5 = "I can not do well on tests at school even if I study hard.",
    PCS_6 = "I can not get good marks for my homework, even if I work hard at it."
  )
  return(question_texts[[pcs_col]])
}

# Function to extract the legend from a ggplot
get_legend <- function(myggplot) {
  tmp <- ggplotGrob(myggplot)
  legend <- tmp$grobs[which(sapply(tmp$grobs, function(x) x$name) == "guide-box")]
  return(legend[[1]])
}

# Function to summarize a PCS column by time and create a frequency plot
create_summary_and_plot <- function(data, pcs_col) {
  # Summarize the PCS column by time and treatment group
  freq_data <- data %>%
    select(time, !!sym(pcs_col), trt) %>%
    mutate(score = ifelse(is.na(!!sym(pcs_col)), "NA", as.character(!!sym(pcs_col)))) %>%
    group_by(score, time, trt) %>%
    summarise(Frequency = n(), .groups = 'drop')
  
  # Add score labels
  freq_data <- freq_data %>%
    mutate(score_label = case_when(
      score == "0" ~ "Very False",
      score == "1" ~ "Somewhat False",
      score == "2" ~ "Somewhat True",
      score == "3" ~ "Very True",
      score == "NA" ~ "NA",
      TRUE ~ as.character(score)  # For any other values
    ))
  
  # Pivot the summarized data to wide format
  freq_wide <- freq_data %>%
    pivot_wider(names_from = time, values_from = Frequency, values_fill = list(Frequency = 0))
  
  # Get the question text for the title
  question_text <- get_question_text(pcs_col)
  
  # Create plots for each treatment group
  plots <- freq_data %>%
    split(.$trt) %>%
    map(~ ggplot(.x, aes(x = time, y = Frequency, color = score_label, group = score_label)) +
          geom_line() +
          geom_point() +
          labs(title = paste("Treatment Group:", unique(.x$trt)),
               x = "Time",
               y = "Frequency") +
          theme_minimal() +
          theme(
            plot.title = element_text(hjust = 0.5, size = 12),
            axis.title = element_text(size = 10),
            axis.text = element_text(size = 8),
            legend.position = "none"
          ) +
          scale_x_continuous(breaks = c(0, 2, 4, 8)))  # Specify the breaks on x-axis
  
  # Extract the legend from one plot
  legend <- get_legend(plots[[1]] + labs(color = "Score") + theme(legend.position = "bottom"))
  
  # Combine the plots and add the common title
  combined_plot <- arrangeGrob(grobs = lapply(plots, ggplotGrob), ncol = 2, top = textGrob(
    paste("Frequency of Scores for", pcs_col, ":", question_text),
    gp = gpar(fontsize = 14), hjust = 0.5
  ))
  
  # Print stargazer table
  cat("\n", pcs_col, " summary by time and treatment group\n")
  stargazer(freq_wide, type = "text", summary = FALSE, rownames = FALSE)
  
  # Combine the plots and legend
  final_plot <- arrangeGrob(combined_plot, legend, ncol = 1, heights = c(10, 1))
  
  # Print the final plot
  grid.newpage()
  grid.draw(final_plot)
}

# List of PCS columns
pcs_columns <- paste0("PCS_", 1:6)

# Apply the function to each PCS column
for (pcs_col in pcs_columns) {
  create_summary_and_plot(rct, pcs_col)
}

## 
##  PCS_1  summary by time and treatment group
## 
## ========================================
## score trt  score_label    0   2   4   8 
## ----------------------------------------
## 0      0    Very False   31  27  48  56 
## 0      1    Very False   42  29  49  55 
## 1      0  Somewhat False 20   7  13  21 
## 1      1  Somewhat False 28  21  13  23 
## 2      0  Somewhat True  71  87  82  86 
## 2      1  Somewhat True  116 88  53  88 
## 3      0    Very True    701 572 584 552
## 3      1    Very True    654 501 468 529
## NA     0        NA       27  157 123 135
## NA     1        NA       26  227 283 171
## ----------------------------------------

## 
##  PCS_2  summary by time and treatment group
## 
## ========================================
## score trt  score_label    0   2   4   8 
## ----------------------------------------
## 0      0    Very False   44  33  32  39 
## 0      1    Very False   55  18  29  25 
## 1      0  Somewhat False 62  52  53  53 
## 1      1  Somewhat False 78  49  37  53 
## 2      0  Somewhat True  71  97  90  101
## 2      1  Somewhat True  99  94  68  94 
## 3      0    Very True    635 509 545 520
## 3      1    Very True    605 471 440 518
## NA     0        NA       38  159 130 137
## NA     1        NA       29  234 292 176
## ----------------------------------------

## 
##  PCS_3  summary by time and treatment group
## 
## ========================================
## score trt  score_label    0   2   4   8 
## ----------------------------------------
## 0      0    Very False   54  32  49  46 
## 0      1    Very False   58  38  39  57 
## 1      0  Somewhat False 21  15  23  24 
## 1      1  Somewhat False 46  25  24  37 
## 2      0  Somewhat True  143 125 107 129
## 2      1  Somewhat True  145 124 103 117
## 3      0    Very True    599 521 530 506
## 3      1    Very True    572 443 405 477
## NA     0        NA       33  157 141 145
## NA     1        NA       45  236 295 178
## ----------------------------------------

## 
##  PCS_4  summary by time and treatment group
## 
## ========================================
## score trt  score_label    0   2   4   8 
## ----------------------------------------
## 0      0    Very False   51  43  40  50 
## 0      1    Very False   61  41  28  30 
## 1      0  Somewhat False 72  63  51  50 
## 1      1  Somewhat False 82  53  32  49 
## 2      0  Somewhat True  92  81  89  88 
## 2      1  Somewhat True  101 75  75  91 
## 3      0    Very True    603 505 538 519
## 3      1    Very True    581 458 437 514
## NA     0        NA       32  158 132 143
## NA     1        NA       41  239 294 182
## ----------------------------------------

## 
##  PCS_5  summary by time and treatment group
## 
## ========================================
## score trt  score_label    0   2   4   8 
## ----------------------------------------
## 0      0    Very False   53  42  47  53 
## 0      1    Very False   57  37  36  37 
## 1      0  Somewhat False 71  62  58  61 
## 1      1  Somewhat False 104 69  46  71 
## 2      0  Somewhat True  116 84  89  94 
## 2      1  Somewhat True  129 104 66  86 
## 3      0    Very True    575 506 520 500
## 3      1    Very True    546 418 419 501
## NA     0        NA       35  156 136 142
## NA     1        NA       30  238 299 171
## ----------------------------------------

## 
##  PCS_6  summary by time and treatment group
## 
## ========================================
## score trt  score_label    0   2   4   8 
## ----------------------------------------
## 0      0    Very False   58  36  52  58 
## 0      1    Very False   49  41  27  41 
## 1      0  Somewhat False 59  43  52  53 
## 1      1  Somewhat False 81  46  37  53 
## 2      0  Somewhat True  116 115 81  91 
## 2      1  Somewhat True  151 88  77  93 
## 3      0    Very True    586 499 533 508
## 3      1    Very True    561 459 431 512
## NA     0        NA       31  157 132 140
## NA     1        NA       24  232 294 167
## ----------------------------------------

We can note the following:

The trends for each question remain roughly the same. Including how students have responded to NA values in each of the time periods. The exact frequency of responses might differ but the trend is consistent.
There are two types of questions about perceived control here - staying out of trouble and academic performances - both based on efforts from the students.
For both normally coded and reverse coded questions, we see that the response frequencies are similar. This means that the positives and the negatives have the same trends - there may be a measurement error (most probably by the respondents).
The issue here is - both the questions on staying out of trouble are normally coded and all the four questions on academic performance are negatively coded and are framed in the similar manner which is focused on test scores/grades/marks - which makes it difficult to isolate if the measurement error is actually present.
Number of NA responses increase every two weeks - except for the last week: possible explanation is that there was response fatigue or the NAs reduced for the endline because it was the last survey and there may be a push from the organisation to encourage students to engage with the survey.

# Function to get the question text based on PHQ column name
get_phq_question_text <- function(phq_col) {
  question_texts <- list(
    PHQ_1 = "Little interest or pleasure in doing things?",
    PHQ_2 = "Feeling sad, depressed or hopeless?",
    PHQ_3 = "Trouble falling or staying asleep, or sleeping too much?",
    PHQ_4 = "Feeling tired or having little energy?",
    PHQ_5 = "Poor appetite or eating too much (overeating)?",
    PHQ_6 = "Feeling bad about yourself– or that you are a failure or have let yourself or your family down?",
    PHQ_7 = "Trouble concentrating on things, such as reading a book?",
    PHQ_8 = "Moving or speaking so slowly that other people could have noticed. Or the opposite — being so energetic or nervous that you have been moving around a lot more than usual?",
    PHQ_Functioning = "If you experienced any of the problems listed above, how difficult have they made it for you to do your work or get along with other people?"
  )
  return(question_texts[[phq_col]])
}

# Function to extract the legend from a ggplot
get_legend <- function(myggplot) {
  tmp <- ggplotGrob(myggplot)
  legend <- tmp$grobs[which(sapply(tmp$grobs, function(x) x$name) == "guide-box")]
  return(legend[[1]])
}

# Function to summarize a PHQ column by time and create a frequency plot
create_summary_and_plot_phq <- function(data, phq_col) {
  # Summarize the PHQ column by time and treatment group
  freq_data <- data %>%
    select(time, !!sym(phq_col), trt) %>%
    mutate(score = ifelse(is.na(!!sym(phq_col)), "NA", as.character(!!sym(phq_col)))) %>%
    group_by(score, time, trt) %>%
    summarise(Frequency = n(), .groups = 'drop')
  
  # Add score labels
  freq_data <- freq_data %>%
    mutate(score_label = case_when(
      score == "0" ~ ifelse(phq_col == "PHQ_Functioning", "Not difficult at all", "Not at all"),
      score == "1" ~ ifelse(phq_col == "PHQ_Functioning", "Somewhat difficult", "Several days (between 2 and 7 days)"),
      score == "2" ~ ifelse(phq_col == "PHQ_Functioning", "Very difficult", "Over half the days (more than 7 days)"),
      score == "3" ~ ifelse(phq_col == "PHQ_Functioning", "Extremely difficult", "Nearly/almost every day"),
      score == "NA" ~ "NA",
      TRUE ~ as.character(score)  # For any other values
    ))
  
  # Pivot the summarized data to wide format
  freq_wide <- freq_data %>%
    pivot_wider(names_from = time, values_from = Frequency, values_fill = list(Frequency = 0))
  
  # Get the question text for the title
  question_text <- get_phq_question_text(phq_col)
  
  # Create plots for each treatment group
  plots <- freq_data %>%
    split(.$trt) %>%
    map(~ ggplot(.x, aes(x = time, y = Frequency, color = score_label, group = score_label)) +
          geom_line() +
          geom_point() +
          labs(title = paste("Treatment Group:", unique(.x$trt)),
               x = "Time",
               y = "Frequency") +
          theme_minimal() +
          theme(
            plot.title = element_text(hjust = 0.5, size = 12),
            axis.title = element_text(size = 10),
            axis.text = element_text(size = 8),
            legend.position = "none"
          ) +
          scale_x_continuous(breaks = c(0, 2, 4, 8)))  # Specify the breaks on x-axis
  
  # Extract the legend from one plot
  legend <- get_legend(plots[[1]] + labs(color = "Score") + theme(legend.position = "bottom"))
  
  # Combine the plots and add the common title
  combined_plot <- arrangeGrob(grobs = lapply(plots, ggplotGrob), ncol = 2, top = textGrob(
    paste("Frequency of Scores for", phq_col, ":", question_text),
    gp = gpar(fontsize = 14), hjust = 0.5
  ))
  
  # Print stargazer table
  cat("\n", phq_col, " summary by time and treatment group\n")
  stargazer(freq_wide, type = "text", summary = FALSE, rownames = FALSE)
  
  # Combine the plots and legend
  final_plot <- arrangeGrob(combined_plot, legend, ncol = 1, heights = c(10, 1))
  
  # Print the final plot
  grid.newpage()
  grid.draw(final_plot)
}

# List of PHQ columns (assuming PHQ_1 to PHQ_8 and PHQ_Functioning)
phq_columns <- c(paste0("PHQ_", 1:8), "PHQ_Functioning")

# Apply the function to each PHQ column
for (phq_col in phq_columns) {
  create_summary_and_plot_phq(rct, phq_col)
}

## 
##  PHQ_1  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               258 205 254 252
## 0      1               Not at all               251 168 202 253
## 1      0   Several days (between 2 and 7 days)  249 272 265 242
## 1      1   Several days (between 2 and 7 days)  286 245 211 260
## 2      0  Over half the days (more than 7 days) 77  89  100 106
## 2      1  Over half the days (more than 7 days) 107 108 97  86 
## 3      0         Nearly/almost every day        201 101 109 101
## 3      1         Nearly/almost every day        150 92  57  91 
## NA     0                   NA                   65  183 122 149
## NA     1                   NA                   72  253 299 176
## ---------------------------------------------------------------

## 
##  PHQ_2  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               371 316 368 345
## 0      1               Not at all               356 266 285 342
## 1      0   Several days (between 2 and 7 days)  221 202 205 202
## 1      1   Several days (between 2 and 7 days)  286 232 190 222
## 2      0  Over half the days (more than 7 days) 107 92  97  101
## 2      1  Over half the days (more than 7 days) 96  75  66  78 
## 3      0         Nearly/almost every day        101 74  58  59 
## 3      1         Nearly/almost every day        105 51  23  50 
## NA     0                   NA                   50  166 122 143
## NA     1                   NA                   23  242 302 174
## ---------------------------------------------------------------

## 
##  PHQ_3  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               405 345 401 382
## 0      1               Not at all               428 286 311 366
## 1      0   Several days (between 2 and 7 days)  173 171 176 182
## 1      1   Several days (between 2 and 7 days)  189 169 147 186
## 2      0  Over half the days (more than 7 days) 107 87  74  71 
## 2      1  Over half the days (more than 7 days) 94  99  60  73 
## 3      0         Nearly/almost every day        106 78  69  67 
## 3      1         Nearly/almost every day        121 68  36  63 
## NA     0                   NA                   59  169 130 148
## NA     1                   NA                   34  244 312 178
## ---------------------------------------------------------------

## 
##  PHQ_4  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               354 280 303 324
## 0      1               Not at all               344 232 228 308
## 1      0   Several days (between 2 and 7 days)  247 240 240 217
## 1      1   Several days (between 2 and 7 days)  250 262 202 229
## 2      0  Over half the days (more than 7 days) 118 103 121 91 
## 2      1  Over half the days (more than 7 days) 149 90  100 84 
## 3      0         Nearly/almost every day        77  54  56  67 
## 3      1         Nearly/almost every day        84  38  32  62 
## NA     0                   NA                   54  173 130 151
## NA     1                   NA                   39  244 304 183
## ---------------------------------------------------------------

## 
##  PHQ_5  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               430 368 416 380
## 0      1               Not at all               460 341 319 390
## 1      0   Several days (between 2 and 7 days)  155 153 157 161
## 1      1   Several days (between 2 and 7 days)  182 143 132 144
## 2      0  Over half the days (more than 7 days) 94  68  78  68 
## 2      1  Over half the days (more than 7 days) 81  71  57  78 
## 3      0         Nearly/almost every day        120 96  69  93 
## 3      1         Nearly/almost every day        98  69  47  75 
## NA     0                   NA                   51  165 130 148
## NA     1                   NA                   45  242 311 179
## ---------------------------------------------------------------

## 
##  PHQ_6  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               380 343 380 359
## 0      1               Not at all               360 283 300 362
## 1      0   Several days (between 2 and 7 days)  194 168 197 169
## 1      1   Several days (between 2 and 7 days)  228 190 149 173
## 2      0  Over half the days (more than 7 days) 78  77  61  85 
## 2      1  Over half the days (more than 7 days) 87  69  73  81 
## 3      0         Nearly/almost every day        150 92  84  87 
## 3      1         Nearly/almost every day        158 84  40  73 
## NA     0                   NA                   48  170 128 150
## NA     1                   NA                   33  240 304 177
## ---------------------------------------------------------------

## 
##  PHQ_7  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               296 265 316 305
## 0      1               Not at all               309 227 255 310
## 1      0   Several days (between 2 and 7 days)  224 229 240 207
## 1      1   Several days (between 2 and 7 days)  228 212 173 207
## 2      0  Over half the days (more than 7 days) 112 105 108 106
## 2      1  Over half the days (more than 7 days) 113 91  83  90 
## 3      0         Nearly/almost every day        170 86  58  87 
## 3      1         Nearly/almost every day        184 93  55  84 
## NA     0                   NA                   48  165 128 145
## NA     1                   NA                   32  243 300 175
## ---------------------------------------------------------------

## 
##  PHQ_8  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               449 382 405 406
## 0      1               Not at all               486 361 346 414
## 1      0   Several days (between 2 and 7 days)  163 160 169 167
## 1      1   Several days (between 2 and 7 days)  158 126 117 152
## 2      0  Over half the days (more than 7 days) 85  70  84  63 
## 2      1  Over half the days (more than 7 days) 86  62  55  63 
## 3      0         Nearly/almost every day        95  70  65  65 
## 3      1         Nearly/almost every day        89  71  42  63 
## NA     0                   NA                   58  168 127 149
## NA     1                   NA                   47  246 306 174
## ---------------------------------------------------------------

## 
##  PHQ_Functioning  summary by time and treatment group
## 
## ==============================================
## score trt     score_label       0   2   4   8 
## ----------------------------------------------
## 0      0  Not difficult at all 226 155 175 202
## 0      1  Not difficult at all 229 145 168 226
## 1      0   Somewhat difficult  339 305 302 317
## 1      1   Somewhat difficult  378 305 203 299
## 2      0     Very difficult    104 81  69  73 
## 2      1     Very difficult    113 54  55  72 
## 3      0  Extremely difficult  59  45  26  37 
## 3      1  Extremely difficult  55  24  14  28 
## NA     0           NA          122 264 278 221
## NA     1           NA          91  338 426 241
## ----------------------------------------------

We can note the following:

NA values follow exactly the same trend pattern. Also, the number of people responding NA to the questions in various time periods are also quite similar.
For every question, the frequency of reporting “several days”, “half the days” and “nearly every day” seems to be reducing in comparison to the baseline.
The frequency trend of “Not at all” remains consistent with all questions - decreases substantially from the baseline to week 2 and 4 and then rises up in the 8th week. This is exact opposite of the NA trend.

Possible explanation for this trend?

# Function to get the question text based on GAD column name
get_gad_question_text <- function(gad_col) {
  question_texts <- list(
    GAD_1 = "Feeling nervous, anxious, restless, or uneasy?",
    GAD_2 = "Not being able to stop or control worrying about things?",
    GAD_3 = "Worrying too much about different things?",
    GAD_4 = "Trouble relaxing?",
    GAD_5 = "Being so restless or uneasy that it's hard to sit still?",
    GAD_6 = "Becoming easily annoyed or irritable?",
    GAD_7 = "Feeling afraid as if something bad might happen?",
    GAD_Check = "If you read this question, respond 'Not at all'",
    GAD_Functioning = "If you experienced any of the problems listed above, how difficult have they made it for you to do your work or get along with other people?"
  )
  return(question_texts[[gad_col]])
}

# Function to extract the legend from a ggplot
get_legend <- function(myggplot) {
  tmp <- ggplotGrob(myggplot)
  legend <- tmp$grobs[which(sapply(tmp$grobs, function(x) x$name) == "guide-box")]
  return(legend[[1]])
}

# Function to summarize a GAD column by time and create a frequency plot
create_summary_and_plot_gad <- function(data, gad_col) {
  # Summarize the GAD column by time and treatment group
  freq_data <- data %>%
    select(time, !!sym(gad_col), trt) %>%
    mutate(score = ifelse(is.na(!!sym(gad_col)), "NA", as.character(!!sym(gad_col)))) %>%
    group_by(score, time, trt) %>%
    summarise(Frequency = n(), .groups = 'drop')
  
  # Add score labels
  freq_data <- freq_data %>%
    mutate(score_label = case_when(
      score == "0" ~ ifelse(gad_col == "GAD_Functioning", "Not difficult at all", "Not at all"),
      score == "1" ~ ifelse(gad_col == "GAD_Functioning", "Somewhat difficult", "Several days (between 2 and 7 days)"),
      score == "2" ~ ifelse(gad_col == "GAD_Functioning", "Very difficult", "Over half the days (more than 7 days)"),
      score == "3" ~ ifelse(gad_col == "GAD_Functioning", "Extremely difficult", "Nearly/almost every day"),
      score == "NA" ~ "NA",
      TRUE ~ as.character(score)  # For any other values
    ))
  
  # Pivot the summarized data to wide format
  freq_wide <- freq_data %>%
    pivot_wider(names_from = time, values_from = Frequency, values_fill = list(Frequency = 0))
  
  # Get the question text for the title
  question_text <- get_gad_question_text(gad_col)
  
  # Create plots for each treatment group
  plots <- freq_data %>%
    split(.$trt) %>%
    map(~ ggplot(.x, aes(x = time, y = Frequency, color = score_label, group = score_label)) +
          geom_line() +
          geom_point() +
          labs(title = paste("Treatment Group:", unique(.x$trt)),
               x = "Time",
               y = "Frequency") +
          theme_minimal() +
          theme(
            plot.title = element_text(hjust = 0.5, size = 12),
            axis.title = element_text(size = 10),
            axis.text = element_text(size = 8),
            legend.position = "none"
          ) +
          scale_x_continuous(breaks = c(0, 2, 4, 8)))  # Specify the breaks on x-axis
  
  # Extract the legend from one plot
  legend <- get_legend(plots[[1]] + labs(color = "Score") + theme(legend.position = "bottom"))
  
  # Combine the plots and add the common title
  combined_plot <- arrangeGrob(grobs = lapply(plots, ggplotGrob), ncol = 2, top = textGrob(
    paste("Frequency of Scores for", gad_col, ":", question_text),
    gp = gpar(fontsize = 14), hjust = 0.5
  ))
  
  # Print stargazer table
  cat("\n", gad_col, " summary by time and treatment group\n")
  stargazer(freq_wide, type = "text", summary = FALSE, rownames = FALSE)
  
  # Combine the plots and legend
  final_plot <- arrangeGrob(combined_plot, legend, ncol = 1, heights = c(10, 1))
  
  # Print the final plot
  grid.newpage()
  grid.draw(final_plot)
}

# List of GAD columns (assuming GAD_1 to GAD_7, GAD_Check, and GAD_Functioning)
gad_columns <- c(paste0("GAD_", 1:7), "GAD_Check", "GAD_Functioning")

# Apply the function to each GAD column
for (gad_col in gad_columns) {
  create_summary_and_plot_gad(rct, gad_col)
}

## 
##  GAD_1  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               384 321 377 380
## 0      1               Not at all               412 275 307 376
## 1      0   Several days (between 2 and 7 days)  246 219 235 212
## 1      1   Several days (between 2 and 7 days)  270 235 203 228
## 2      0  Over half the days (more than 7 days) 84  82  70  48 
## 2      1  Over half the days (more than 7 days) 83  75  45  45 
## 3      0         Nearly/almost every day        90  57  41  67 
## 3      1         Nearly/almost every day        62  38  13  45 
## NA     0                   NA                   46  171 127 143
## NA     1                   NA                   39  243 298 172
## ---------------------------------------------------------------

## 
##  GAD_2  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               284 267 296 309
## 0      1               Not at all               307 236 232 302
## 1      0   Several days (between 2 and 7 days)  259 226 243 212
## 1      1   Several days (between 2 and 7 days)  269 228 221 226
## 2      0  Over half the days (more than 7 days) 95  86  99  87 
## 2      1  Over half the days (more than 7 days) 93  90  66  92 
## 3      0         Nearly/almost every day        172 105 83  95 
## 3      1         Nearly/almost every day        153 70  42  69 
## NA     0                   NA                   40  166 129 147
## NA     1                   NA                   44  242 305 177
## ---------------------------------------------------------------

## 
##  GAD_3  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               248 230 263 282
## 0      1               Not at all               261 183 214 283
## 1      0   Several days (between 2 and 7 days)  260 226 268 208
## 1      1   Several days (between 2 and 7 days)  272 251 200 237
## 2      0  Over half the days (more than 7 days) 118 102 102 103
## 2      1  Over half the days (more than 7 days) 105 100 90  73 
## 3      0         Nearly/almost every day        175 122 86  107
## 3      1         Nearly/almost every day        188 88  52  95 
## NA     0                   NA                   49  170 131 150
## NA     1                   NA                   40  244 310 178
## ---------------------------------------------------------------

## 
##  GAD_4  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               444 361 389 381
## 0      1               Not at all               461 342 323 382
## 1      0   Several days (between 2 and 7 days)  167 156 170 169
## 1      1   Several days (between 2 and 7 days)  194 146 145 170
## 2      0  Over half the days (more than 7 days) 101 87  86  71 
## 2      1  Over half the days (more than 7 days) 85  75  58  76 
## 3      0         Nearly/almost every day        91  74  68  79 
## 3      1         Nearly/almost every day        76  54  27  54 
## NA     0                   NA                   47  172 137 150
## NA     1                   NA                   50  249 313 184
## ---------------------------------------------------------------

## 
##  GAD_5  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               506 423 426 410
## 0      1               Not at all               558 383 351 412
## 1      0   Several days (between 2 and 7 days)  149 138 163 160
## 1      1   Several days (between 2 and 7 days)  139 136 127 171
## 2      0  Over half the days (more than 7 days) 67  67  79  80 
## 2      1  Over half the days (more than 7 days) 78  64  54  61 
## 3      0         Nearly/almost every day        71  49  43  50 
## 3      1         Nearly/almost every day        46  40  22  40 
## NA     0                   NA                   57  173 139 150
## NA     1                   NA                   45  243 312 182
## ---------------------------------------------------------------

## 
##  GAD_6  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               336 266 318 292
## 0      1               Not at all               316 241 247 293
## 1      0   Several days (between 2 and 7 days)  223 202 215 213
## 1      1   Several days (between 2 and 7 days)  256 200 178 232
## 2      0  Over half the days (more than 7 days) 88  98  84  82 
## 2      1  Over half the days (more than 7 days) 108 84  70  85 
## 3      0         Nearly/almost every day        150 101 96  114
## 3      1         Nearly/almost every day        133 96  59  79 
## NA     0                   NA                   53  183 137 149
## NA     1                   NA                   53  245 312 177
## ---------------------------------------------------------------

## 
##  GAD_7  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               283 284 300 296
## 0      1               Not at all               317 227 240 311
## 1      0   Several days (between 2 and 7 days)  278 209 232 214
## 1      1   Several days (between 2 and 7 days)  266 224 174 237
## 2      0  Over half the days (more than 7 days) 94  96  87  106
## 2      1  Over half the days (more than 7 days) 96  89  85  69 
## 3      0         Nearly/almost every day        153 94  95  87 
## 3      1         Nearly/almost every day        147 87  49  71 
## NA     0                   NA                   42  167 136 147
## NA     1                   NA                   40  239 318 178
## ---------------------------------------------------------------

## 
##  GAD_Check  summary by time and treatment group
## 
## ===============================================================
## score trt              score_label               0   2   4   8 
## ---------------------------------------------------------------
## 0      0               Not at all               654 594 622 597
## 0      1               Not at all               658 542 480 581
## 1      0   Several days (between 2 and 7 days)  25  23  33  37 
## 1      1   Several days (between 2 and 7 days)  41  32  34  49 
## 2      0  Over half the days (more than 7 days) 22  12  18  20 
## 2      1  Over half the days (more than 7 days) 17  14   9  27 
## 3      0         Nearly/almost every day        26  13  19  24 
## 3      1         Nearly/almost every day        22  18  16  14 
## NA     0                   NA                   123 208 158 172
## NA     1                   NA                   128 260 327 195
## ---------------------------------------------------------------

## 
##  GAD_Functioning  summary by time and treatment group
## 
## ==============================================
## score trt     score_label       0   2   4   8 
## ----------------------------------------------
## 0      0  Not difficult at all 209 161 165 223
## 0      1  Not difficult at all 220 153 136 245
## 1      0   Somewhat difficult  306 245 249 286
## 1      1   Somewhat difficult  349 228 190 271
## 2      0     Very difficult    123 69  62  76 
## 2      1     Very difficult    109 53  45  72 
## 3      0  Extremely difficult  52  39  24  31 
## 3      1  Extremely difficult  47  20   8  25 
## NA     0           NA          160 336 350 234
## NA     1           NA          141 412 487 253
## ----------------------------------------------

We can note the following:

NA trends remain consistent among all GAD_1 to GAD_7 questions. And this trend is consistent with PHQ questions as well.

# Function to get the question text based on MSPSS column name
get_mspss_question_text <- function(mspss_col) {
  question_texts <- list(
    MSPSS_1 = "My family really tries to help me",
    MSPSS_2 = "I get the emotional help and support I need from my family",
    MSPSS_3 = "My friends really try to help me",
    MSPSS_4 = "I can count on my friends when things go wrong",
    MSPSS_5 = "I can talk about my problems with my family",
    MSPSS_6 = "I have friends with whom I can share my joys and sorrows",
    MSPSS_7 = "My family is willing to help me make decisions",
    MSPSS_8 = "I can talk about my problems with my friends"
  )
  return(question_texts[[mspss_col]])
}

# Function to extract the legend from a ggplot
get_legend <- function(myggplot) {
  tmp <- ggplotGrob(myggplot)
  legend <- tmp$grobs[which(sapply(tmp$grobs, function(x) x$name) == "guide-box")]
  return(legend[[1]])
}

# Function to summarize a MSPSS column by time and create a frequency plot
create_summary_and_plot_mspss <- function(data, mspss_col) {
  # Summarize the MSPSS column by time and treatment group
  freq_data <- data %>%
    select(time, !!sym(mspss_col), trt) %>%
    mutate(score = ifelse(is.na(!!sym(mspss_col)), "NA", as.character(!!sym(mspss_col)))) %>%
    group_by(score, time, trt) %>%
    summarise(Frequency = n(), .groups = 'drop')
  
  # Add score labels
  freq_data <- freq_data %>%
    mutate(score_label = case_when(
      score == "1" ~ "I very strongly disagree",
      score == "2" ~ "I strongly disagree",
      score == "3" ~ "I mildly disagree",
      score == "4" ~ "I don’t agree or disagree (neutral)",
      score == "5" ~ "I mildly agree",
      score == "6" ~ "I strongly agree",
      score == "7" ~ "I very strongly agree",
      score == "NA" ~ "NA",
      TRUE ~ as.character(score)  # For any other values
    ))
  
  # Pivot the summarized data to wide format
  freq_wide <- freq_data %>%
    pivot_wider(names_from = time, values_from = Frequency, values_fill = list(Frequency = 0))
  
  # Get the question text for the title
  question_text <- get_mspss_question_text(mspss_col)
  
  # Create plots for each treatment group
  plots <- freq_data %>%
    split(.$trt) %>%
    map(~ ggplot(.x, aes(x = time, y = Frequency, color = score_label, group = score_label)) +
          geom_line() +
          geom_point() +
          labs(title = paste("Treatment Group:", unique(.x$trt)),
               x = "Time",
               y = "Frequency") +
          theme_minimal() +
          theme(
            plot.title = element_text(hjust = 0.5, size = 12),
            axis.title = element_text(size = 10),
            axis.text = element_text(size = 8),
            legend.position = "none"
          ) +
          scale_x_continuous(breaks = c(0, 2, 4, 8)))  # Specify the breaks on x-axis
  
  # Extract the legend from one plot
  legend <- get_legend(plots[[1]] + labs(color = "Score") + theme(legend.position = "bottom"))
  
  # Combine the plots and add the common title
  combined_plot <- arrangeGrob(grobs = lapply(plots, ggplotGrob), ncol = 2, top = textGrob(
    paste("Frequency of Scores for", mspss_col, ":", question_text),
    gp = gpar(fontsize = 14), hjust = 0.5
  ))
  
  # Print stargazer table
  cat("\n", mspss_col, " summary by time and treatment group\n")
  stargazer(freq_wide, type = "text", summary = FALSE, rownames = FALSE)
  
  # Combine the plots and legend
  final_plot <- arrangeGrob(combined_plot, legend, ncol = 1, heights = c(10, 1))
  
  # Print the final plot
  grid.newpage()
  grid.draw(final_plot)
}

# List of MSPSS columns (assuming MSPSS_1 to MSPSS_8)
mspss_columns <- paste0("MSPSS_", 1:8)

# Apply the function to each MSPSS column
for (mspss_col in mspss_columns) {
  create_summary_and_plot_mspss(rct, mspss_col)
}

## 
##  MSPSS_1  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0       I very strongly disagree       43  35  37  52 
## 1      1       I very strongly disagree       35  31  25  39 
## 2      0          I strongly disagree         15  11  17  18 
## 2      1          I strongly disagree         12  11  18  12 
## 3      0           I mildly disagree          13   9  18  21 
## 3      1           I mildly disagree          16  14  12  18 
## 4      0  I don’t agree or disagree (neutral) 32  23  23  25 
## 4      1  I don’t agree or disagree (neutral) 28  23  19  33 
## 5      0            I mildly agree            54  45  49  44 
## 5      1            I mildly agree            58  48  35  48 
## 6      0           I strongly agree           193 152 156 149
## 6      1           I strongly agree           201 131 128 143
## 7      0         I very strongly agree        476 412 422 398
## 7      1         I very strongly agree        489 372 336 393
## NA     0                  NA                  24  163 128 143
## NA     1                  NA                  27  236 293 180
## -------------------------------------------------------------

## 
##  MSPSS_2  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0       I very strongly disagree       52  30  34  43 
## 1      1       I very strongly disagree       43  27  20  43 
## 2      0          I strongly disagree         37  23  29  32 
## 2      1          I strongly disagree         35  25  22  24 
## 3      0           I mildly disagree          30  19  23  31 
## 3      1           I mildly disagree          48  23  19  25 
## 4      0  I don’t agree or disagree (neutral) 47  40  31  51 
## 4      1  I don’t agree or disagree (neutral) 35  28  17  37 
## 5      0            I mildly agree            88  86  67  75 
## 5      1            I mildly agree            113 89  59  70 
## 6      0           I strongly agree           215 164 173 143
## 6      1           I strongly agree           199 156 152 150
## 7      0         I very strongly agree        343 323 369 329
## 7      1         I very strongly agree        360 280 280 339
## NA     0                  NA                  38  165 124 146
## NA     1                  NA                  33  238 297 178
## -------------------------------------------------------------

## 
##  MSPSS_3  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0       I very strongly disagree       75  70  64  86 
## 1      1       I very strongly disagree       93  64  39  71 
## 2      0          I strongly disagree         54  35  42  42 
## 2      1          I strongly disagree         46  34  19  32 
## 3      0           I mildly disagree          71  50  67  50 
## 3      1           I mildly disagree          67  34  42  54 
## 4      0  I don’t agree or disagree (neutral) 90  87  78  77 
## 4      1  I don’t agree or disagree (neutral) 110 67  54  77 
## 5      0            I mildly agree            201 209 179 169
## 5      1            I mildly agree            205 182 164 162
## 6      0           I strongly agree           197 141 158 150
## 6      1           I strongly agree           182 133 133 155
## 7      0         I very strongly agree        127 96  131 133
## 7      1         I very strongly agree        132 109 108 137
## NA     0                  NA                  35  162 131 143
## NA     1                  NA                  31  243 307 178
## -------------------------------------------------------------

## 
##  MSPSS_4  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0       I very strongly disagree       127 109 109 107
## 1      1       I very strongly disagree       145 93  61  100
## 2      0          I strongly disagree         76  57  55  63 
## 2      1          I strongly disagree         72  41  35  46 
## 3      0           I mildly disagree          60  75  62  62 
## 3      1           I mildly disagree          80  68  50  67 
## 4      0  I don’t agree or disagree (neutral) 97  85  96  91 
## 4      1  I don’t agree or disagree (neutral) 92  89  77  83 
## 5      0            I mildly agree            180 167 159 147
## 5      1            I mildly agree            177 150 127 146
## 6      0           I strongly agree           129 107 118 114
## 6      1           I strongly agree           134 74  112 125
## 7      0         I very strongly agree        138 85  113 120
## 7      1         I very strongly agree        121 105 96  116
## NA     0                  NA                  43  165 138 146
## NA     1                  NA                  45  246 308 183
## -------------------------------------------------------------

## 
##  MSPSS_5  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0       I very strongly disagree       88  58  55  66 
## 1      1       I very strongly disagree       121 54  34  61 
## 2      0          I strongly disagree         48  36  39  35 
## 2      1          I strongly disagree         55  26  28  37 
## 3      0           I mildly disagree          56  39  45  36 
## 3      1           I mildly disagree          51  33  28  53 
## 4      0  I don’t agree or disagree (neutral) 71  45  53  62 
## 4      1  I don’t agree or disagree (neutral) 69  47  44  58 
## 5      0            I mildly agree            102 99  90  100
## 5      1            I mildly agree            118 115 89  102
## 6      0           I strongly agree           143 141 152 148
## 6      1           I strongly agree           129 115 118 124
## 7      0         I very strongly agree        287 262 280 253
## 7      1         I very strongly agree        285 226 215 239
## NA     0                  NA                  55  170 136 150
## NA     1                  NA                  38  250 310 192
## -------------------------------------------------------------

## 
##  MSPSS_6  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0       I very strongly disagree       105 70  92  90 
## 1      1       I very strongly disagree       103 75  45  79 
## 2      0          I strongly disagree         50  46  34  48 
## 2      1          I strongly disagree         51  29  31  31 
## 3      0           I mildly disagree          53  42  53  50 
## 3      1           I mildly disagree          52  44  39  48 
## 4      0  I don’t agree or disagree (neutral) 79  71  74  73 
## 4      1  I don’t agree or disagree (neutral) 80  65  67  64 
## 5      0            I mildly agree            141 155 152 135
## 5      1            I mildly agree            165 137 118 123
## 6      0           I strongly agree           171 143 153 155
## 6      1           I strongly agree           181 119 121 150
## 7      0         I very strongly agree        203 151 153 154
## 7      1         I very strongly agree        196 153 140 178
## NA     0                  NA                  48  172 139 145
## NA     1                  NA                  38  244 305 193
## -------------------------------------------------------------

## 
##  MSPSS_7  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0       I very strongly disagree       50  35  34  43 
## 1      1       I very strongly disagree       49  25  30  45 
## 2      0          I strongly disagree         33  21  29  19 
## 2      1          I strongly disagree         28  22  13  20 
## 3      0           I mildly disagree          34  17  20  26 
## 3      1           I mildly disagree          27  19  15  24 
## 4      0  I don’t agree or disagree (neutral) 35  46  42  54 
## 4      1  I don’t agree or disagree (neutral) 49  30  32  41 
## 5      0            I mildly agree            88  73  70  71 
## 5      1            I mildly agree            103 75  69  61 
## 6      0           I strongly agree           189 142 162 145
## 6      1           I strongly agree           154 139 132 155
## 7      0         I very strongly agree        382 347 361 345
## 7      1         I very strongly agree        416 310 270 331
## NA     0                  NA                  39  169 132 147
## NA     1                  NA                  40  246 305 189
## -------------------------------------------------------------

## 
##  MSPSS_8  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0       I very strongly disagree       167 113 106 119
## 1      1       I very strongly disagree       183 100 76  105
## 2      0          I strongly disagree         73  65  60  60 
## 2      1          I strongly disagree         75  40  31  50 
## 3      0           I mildly disagree          66  58  70  52 
## 3      1           I mildly disagree          67  45  36  56 
## 4      0  I don’t agree or disagree (neutral) 86  84  95  91 
## 4      1  I don’t agree or disagree (neutral) 114 92  73  95 
## 5      0            I mildly agree            181 173 161 150
## 5      1            I mildly agree            167 152 139 137
## 6      0           I strongly agree           110 104 113 116
## 6      1           I strongly agree           117 91  102 109
## 7      0         I very strongly agree        130 86  114 116
## 7      1         I very strongly agree        116 106 104 136
## NA     0                  NA                  37  167 131 146
## NA     1                  NA                  27  240 305 178
## -------------------------------------------------------------

# Function to get the question text based on GQ column name
get_gq_question_text <- function(gq_col) {
  question_texts <- list(
    GQ_1 = "I have a lot to be thankful for in my life",
    GQ_2 = "I do not see a lot to be thankful for in my community and in the world",
    GQ_3 = "I am grateful to many different people",
    GQ_4 = "I don’t feel grateful very often",
    GQ_5 = "As I get older, I am able to appreciate things that happened in the past",
    GQ_6 = "I can write down a long list of things to be grateful for"
  )
  return(question_texts[[gq_col]])
}

# Function to extract the legend from a ggplot
get_legend <- function(myggplot) {
  tmp <- ggplotGrob(myggplot)
  legend <- tmp$grobs[which(sapply(tmp$grobs, function(x) x$name) == "guide-box")]
  return(legend[[1]])
}

# Function to summarize a GQ column by time and create a frequency plot
create_summary_and_plot_gq <- function(data, gq_col) {
  # Summarize the GQ column by time and treatment group
  freq_data <- data %>%
    select(time, !!sym(gq_col), trt) %>%
    mutate(score = ifelse(is.na(!!sym(gq_col)), "NA", as.character(!!sym(gq_col)))) %>%
    group_by(score, time, trt) %>%
    summarise(Frequency = n(), .groups = 'drop')
  
  # Add score labels
  freq_data <- freq_data %>%
    mutate(score_label = case_when(
      score == "1" ~ "I disagree very much",
      score == "2" ~ "I disagree",
      score == "3" ~ "I disagree a little",
      score == "4" ~ "I don’t agree or disagree (neutral)",
      score == "5" ~ "I agree a little",
      score == "6" ~ "I agree",
      score == "7" ~ "I agree very much",
      score == "NA" ~ "NA",
      TRUE ~ as.character(score)  # For any other values
    ))
  
  # Pivot the summarized data to wide format
  freq_wide <- freq_data %>%
    pivot_wider(names_from = time, values_from = Frequency, values_fill = list(Frequency = 0))
  
  # Get the question text for the title
  question_text <- get_gq_question_text(gq_col)
  
  # Create plots for each treatment group
  plots <- freq_data %>%
    split(.$trt) %>%
    map(~ ggplot(.x, aes(x = time, y = Frequency, color = score_label, group = score_label)) +
          geom_line() +
          geom_point() +
          labs(title = paste("Treatment Group:", unique(.x$trt)),
               x = "Time",
               y = "Frequency") +
          theme_minimal() +
          theme(
            plot.title = element_text(hjust = 0.5, size = 12),
            axis.title = element_text(size = 10),
            axis.text = element_text(size = 8),
            legend.position = "none"
          ) +
          scale_x_continuous(breaks = c(0, 2, 4, 8)))  # Specify the breaks on x-axis
  
  # Extract the legend from one plot
  legend <- get_legend(plots[[1]] + labs(color = "Score") + theme(legend.position = "bottom"))
  
  # Combine the plots and add the common title
  combined_plot <- arrangeGrob(grobs = lapply(plots, ggplotGrob), ncol = 2, top = textGrob(
    paste("Frequency of Scores for", gq_col, ":", question_text),
    gp = gpar(fontsize = 14), hjust = 0.5
  ))
  
  # Print stargazer table
  cat("\n", gq_col, " summary by time and treatment group\n")
  stargazer(freq_wide, type = "text", summary = FALSE, rownames = FALSE)
  
  # Combine the plots and legend
  final_plot <- arrangeGrob(combined_plot, legend, ncol = 1, heights = c(10, 1))
  
  # Print the final plot
  grid.newpage()
  grid.draw(final_plot)
}

# List of GQ columns (assuming GQ_1 to GQ_6)
gq_columns <- paste0("GQ_", 1:6)

# Apply the function to each GQ column
for (gq_col in gq_columns) {
  create_summary_and_plot_gq(rct, gq_col)
}

## 
##  GQ_1  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0         I disagree very much         23  14  16  24 
## 1      1         I disagree very much         18   9   7  13 
## 2      0              I disagree               7   4   5   9 
## 2      1              I disagree              16   2   6   4 
## 3      0          I disagree a little          8   2   3   4 
## 3      1          I disagree a little         10   5   5   7 
## 4      0  I don’t agree or disagree (neutral) 15  14  14  17 
## 4      1  I don’t agree or disagree (neutral) 13   8  11  12 
## 5      0           I agree a little           18  27  26  31 
## 5      1           I agree a little           43  32  26  38 
## 6      0                I agree               220 157 168 170
## 6      1                I agree               234 183 137 176
## 7      0           I agree very much          522 463 487 450
## 7      1           I agree very much          501 386 381 434
## NA     0                  NA                  37  169 131 145
## NA     1                  NA                  31  241 293 182
## -------------------------------------------------------------

## 
##  GQ_2  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0         I disagree very much         49  34  47  39 
## 1      1         I disagree very much         42  34  27  50 
## 2      0              I disagree              76  38  58  57 
## 2      1              I disagree              81  38  46  53 
## 3      0          I disagree a little         102 84  79  60 
## 3      1          I disagree a little         108 90  49  82 
## 4      0  I don’t agree or disagree (neutral) 98  74  74  91 
## 4      1  I don’t agree or disagree (neutral) 92  71  64  67 
## 5      0           I agree a little           45  53  40  40 
## 5      1           I agree a little           68  43  44  39 
## 6      0                I agree               179 147 154 134
## 6      1                I agree               198 147 132 151
## 7      0           I agree very much          252 249 263 277
## 7      1           I agree very much          239 197 204 245
## NA     0                  NA                  49  171 135 152
## NA     1                  NA                  38  246 300 179
## -------------------------------------------------------------

## 
##  GQ_3  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0         I disagree very much         24  15  24  25 
## 1      1         I disagree very much         21  13  10  18 
## 2      0              I disagree              27  19  10  23 
## 2      1              I disagree              29  13   6  23 
## 3      0          I disagree a little         14  18  19  11 
## 3      1          I disagree a little         27  15  15  10 
## 4      0  I don’t agree or disagree (neutral) 35  39  39  36 
## 4      1  I don’t agree or disagree (neutral) 61  39  21  43 
## 5      0           I agree a little           99  86  95  99 
## 5      1           I agree a little           122 88  60  84 
## 6      0                I agree               296 256 235 222
## 6      1                I agree               295 209 197 214
## 7      0           I agree very much          308 248 288 285
## 7      1           I agree very much          273 241 251 291
## NA     0                  NA                  47  169 140 149
## NA     1                  NA                  38  248 306 183
## -------------------------------------------------------------

## 
##  GQ_4  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0         I disagree very much         51  41  52  62 
## 1      1         I disagree very much         62  48  38  52 
## 2      0              I disagree              122 92  75  79 
## 2      1              I disagree              105 81  53  61 
## 3      0          I disagree a little         144 100 91  87 
## 3      1          I disagree a little         122 89  81  102
## 4      0  I don’t agree or disagree (neutral) 77  82  104 94 
## 4      1  I don’t agree or disagree (neutral) 113 60  65  75 
## 5      0           I agree a little           69  60  47  56 
## 5      1           I agree a little           78  66  48  55 
## 6      0                I agree               180 147 166 135
## 6      1                I agree               182 135 140 146
## 7      0           I agree very much          158 154 174 190
## 7      1           I agree very much          147 133 138 190
## NA     0                  NA                  49  174 141 147
## NA     1                  NA                  57  254 303 185
## -------------------------------------------------------------

## 
##  GQ_5  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0         I disagree very much         43  20  29  41 
## 1      1         I disagree very much         49  24  14  30 
## 2      0              I disagree              44  20  19  22 
## 2      1              I disagree              55  28  18  20 
## 3      0          I disagree a little         22  11  15  15 
## 3      1          I disagree a little         29  13  16  20 
## 4      0  I don’t agree or disagree (neutral) 50  39  29  52 
## 4      1  I don’t agree or disagree (neutral) 57  35  31  49 
## 5      0           I agree a little           85  68  62  61 
## 5      1           I agree a little           86  68  56  69 
## 6      0                I agree               234 211 201 188
## 6      1                I agree               221 183 140 162
## 7      0           I agree very much          333 315 365 323
## 7      1           I agree very much          331 261 294 336
## NA     0                  NA                  39  166 130 148
## NA     1                  NA                  38  254 297 180
## -------------------------------------------------------------

## 
##  GQ_6  summary by time and treatment group
## 
## =============================================================
## score trt             score_label              0   2   4   8 
## -------------------------------------------------------------
## 1      0         I disagree very much         40  23  31  41 
## 1      1         I disagree very much         40  25   9  36 
## 2      0              I disagree              26  15  20  25 
## 2      1              I disagree              39  18  15  19 
## 3      0          I disagree a little         17   9  19  16 
## 3      1          I disagree a little         40  19  19  18 
## 4      0  I don’t agree or disagree (neutral) 55  43  35  36 
## 4      1  I don’t agree or disagree (neutral) 42  35  26  37 
## 5      0           I agree a little           106 75  81  81 
## 5      1           I agree a little           115 85  61  66 
## 6      0                I agree               237 183 191 168
## 6      1                I agree               227 176 146 167
## 7      0           I agree very much          324 339 344 338
## 7      1           I agree very much          331 267 293 347
## NA     0                  NA                  45  163 129 145
## NA     1                  NA                  32  241 297 176
## -------------------------------------------------------------

# Function to get the question text based on SWEMWBS column name
get_swemwbs_question_text <- function(swemwbs_col) {
  question_texts <- list(
    SWEMWBS_1 = "Over the last two weeks, I’ve been feeling optimistic about the future",
    SWEMWBS_2 = "Over the last two weeks, I’ve been feeling useful",
    SWEMWBS_3 = "Over the last two weeks, I’ve been feeling relaxed",
    SWEMWBS_4 = "Over the last two weeks, I’ve been dealing with problems well",
    SWEMWBS_5 = "Over the last two weeks, I’ve been thinking clearly",
    SWEMWBS_6 = "Over the last two weeks, I’ve been feeling close to other people",
    SWEMWBS_7 = "Over the last two weeks, I’ve been able to make up my own mind about things"
  )
  return(question_texts[[swemwbs_col]])
}

# Function to extract the legend from a ggplot
get_legend <- function(myggplot) {
  tmp <- ggplotGrob(myggplot)
  legend <- tmp$grobs[which(sapply(tmp$grobs, function(x) x$name) == "guide-box")]
  return(legend[[1]])
}

# Function to summarize a SWEMWBS column by time and create a frequency plot
create_summary_and_plot_swemwbs <- function(data, swemwbs_col) {
  # Summarize the SWEMWBS column by time and treatment group
  freq_data <- data %>%
    select(time, !!sym(swemwbs_col), trt) %>%
    mutate(score = ifelse(is.na(!!sym(swemwbs_col)), "NA", as.character(!!sym(swemwbs_col)))) %>%
    group_by(score, time, trt) %>%
    summarise(Frequency = n(), .groups = 'drop')
  
  # Add score labels
  freq_data <- freq_data %>%
    mutate(score_label = case_when(
      score == "1" ~ "None of the time",
      score == "2" ~ "Rarely",
      score == "3" ~ "Some of the time",
      score == "4" ~ "Often",
      score == "5" ~ "All of the time",
      score == "NA" ~ "NA",
      TRUE ~ as.character(score)  # For any other values
    ))
  
  # Pivot the summarized data to wide format
  freq_wide <- freq_data %>%
    pivot_wider(names_from = time, values_from = Frequency, values_fill = list(Frequency = 0))
  
  # Get the question text for the title
  question_text <- get_swemwbs_question_text(swemwbs_col)
  
  # Create plots for each treatment group
  plots <- freq_data %>%
    split(.$trt) %>%
    map(~ ggplot(.x, aes(x = time, y = Frequency, color = score_label, group = score_label)) +
          geom_line() +
          geom_point() +
          labs(title = paste("Treatment Group:", unique(.x$trt)),
               x = "Time",
               y = "Frequency") +
          theme_minimal() +
          theme(
            plot.title = element_text(hjust = 0.5, size = 12),
            axis.title = element_text(size = 10),
            axis.text = element_text(size = 8),
            legend.position = "none"
          ) +
          scale_x_continuous(breaks = c(0, 2, 4, 8)))  # Specify the breaks on x-axis
  
  # Extract the legend from one plot
  legend <- get_legend(plots[[1]] + labs(color = "Score") + theme(legend.position = "bottom"))
  
  # Combine the plots and add the common title
  combined_plot <- arrangeGrob(grobs = lapply(plots, ggplotGrob), ncol = 2, top = textGrob(
    paste("Frequency of Scores for", swemwbs_col, ":", question_text),
    gp = gpar(fontsize = 14), hjust = 0.5
  ))
  
  # Print stargazer table
  cat("\n", swemwbs_col, " summary by time and treatment group\n")
  stargazer(freq_wide, type = "text", summary = FALSE, rownames = FALSE)
  
  # Combine the plots and legend
  final_plot <- arrangeGrob(combined_plot, legend, ncol = 1, heights = c(10, 1))
  
  # Print the final plot
  grid.newpage()
  grid.draw(final_plot)
}

# List of SWEMWBS columns (assuming SWEMWBS_1 to SWEMWBS_7)
swemwbs_columns <- paste0("SWEMWBS_", 1:7)

# Apply the function to each SWEMWBS column
for (swemwbs_col in swemwbs_columns) {
  create_summary_and_plot_swemwbs(rct, swemwbs_col)
}

## 
##  SWEMWBS_1  summary by time and treatment group
## 
## ==========================================
## score trt   score_label     0   2   4   8 
## ------------------------------------------
## 1      0  None of the time 110 98  133 143
## 1      1  None of the time 93  96  110 120
## 2      0       Rarely      111 119 130 102
## 2      1       Rarely      129 118 132 137
## 3      0  Some of the time 227 166 156 152
## 3      1  Some of the time 263 157 120 162
## 4      0       Often       145 131 132 118
## 4      1       Often       134 127 102 134
## 5      0  All of the time  198 160 158 176
## 5      1  All of the time  181 114 97  123
## NA     0         NA        59  176 141 159
## NA     1         NA        66  254 305 190
## ------------------------------------------

## 
##  SWEMWBS_2  summary by time and treatment group
## 
## ==========================================
## score trt   score_label     0   2   4   8 
## ------------------------------------------
## 1      0  None of the time 131 91  99  97 
## 1      1  None of the time 130 68  64  88 
## 2      0       Rarely      121 95  87  98 
## 2      1       Rarely      124 82  69  96 
## 3      0  Some of the time 196 174 154 152
## 3      1  Some of the time 217 167 112 148
## 4      0       Often       165 140 166 157
## 4      1       Often       168 146 135 139
## 5      0  All of the time  184 185 204 193
## 5      1  All of the time  184 155 180 213
## NA     0         NA        53  165 140 153
## NA     1         NA        43  248 306 182
## ------------------------------------------

## 
##  SWEMWBS_3  summary by time and treatment group
## 
## ==========================================
## score trt   score_label     0   2   4   8 
## ------------------------------------------
## 1      0  None of the time 138 104 93  89 
## 1      1  None of the time 139 71  56  111
## 2      0       Rarely      164 131 148 134
## 2      1       Rarely      195 131 111 111
## 3      0  Some of the time 258 210 187 189
## 3      1  Some of the time 235 174 157 159
## 4      0       Often       118 125 126 152
## 4      1       Often       142 123 119 136
## 5      0  All of the time  112 107 149 132
## 5      1  All of the time  99  118 120 158
## NA     0         NA        60  173 147 154
## NA     1         NA        56  249 303 191
## ------------------------------------------

## 
##  SWEMWBS_4  summary by time and treatment group
## 
## ==========================================
## score trt   score_label     0   2   4   8 
## ------------------------------------------
## 1      0  None of the time 130 77  79  94 
## 1      1  None of the time 116 56  57  88 
## 2      0       Rarely      158 112 110 106
## 2      1       Rarely      145 118 77  93 
## 3      0  Some of the time 252 191 191 167
## 3      1  Some of the time 235 142 127 172
## 4      0       Often       138 139 141 158
## 4      1       Often       182 143 141 122
## 5      0  All of the time  119 156 180 174
## 5      1  All of the time  129 154 157 199
## NA     0         NA        53  175 149 151
## NA     1         NA        59  253 307 192
## ------------------------------------------

## 
##  SWEMWBS_5  summary by time and treatment group
## 
## ==========================================
## score trt   score_label     0   2   4   8 
## ------------------------------------------
## 1      0  None of the time 64  35  44  59 
## 1      1  None of the time 63  40  26  59 
## 2      0       Rarely      109 82  78  79 
## 2      1       Rarely      112 69  68  73 
## 3      0  Some of the time 240 179 183 164
## 3      1  Some of the time 241 152 124 156
## 4      0       Often       153 151 144 166
## 4      1       Often       159 158 138 133
## 5      0  All of the time  233 233 248 225
## 5      1  All of the time  235 198 202 251
## NA     0         NA        51  170 153 157
## NA     1         NA        56  249 308 194
## ------------------------------------------

## 
##  SWEMWBS_6  summary by time and treatment group
## 
## ==========================================
## score trt   score_label     0   2   4   8 
## ------------------------------------------
## 1      0  None of the time 94  82  74  79 
## 1      1  None of the time 96  52  49  69 
## 2      0       Rarely      143 99  109 114
## 2      1       Rarely      134 107 79  104
## 3      0  Some of the time 217 165 163 176
## 3      1  Some of the time 228 148 121 154
## 4      0       Often       151 150 158 145
## 4      1       Often       168 128 130 152
## 5      0  All of the time  193 186 200 183
## 5      1  All of the time  192 182 180 203
## NA     0         NA        52  168 146 153
## NA     1         NA        48  249 307 184
## ------------------------------------------

## 
##  SWEMWBS_7  summary by time and treatment group
## 
## ==========================================
## score trt   score_label     0   2   4   8 
## ------------------------------------------
## 1      0  None of the time 61  39  44  53 
## 1      1  None of the time 64  34  36  48 
## 2      0       Rarely      58  71  64  57 
## 2      1       Rarely      70  53  49  49 
## 3      0  Some of the time 195 153 146 155
## 3      1  Some of the time 209 124 109 162
## 4      0       Often       192 137 154 153
## 4      1       Often       175 143 126 147
## 5      0  All of the time  300 288 301 281
## 5      1  All of the time  309 271 246 278
## NA     0         NA        44  162 141 151
## NA     1         NA        39  241 300 182
## ------------------------------------------

# Function to get the question text based on SES column name
get_ses_question_text <- function(ses_col) {
  question_texts <- c(
    SES_1 = "I follow the rules at school",
    SES_2 = "I get in trouble at school",
    SES_3 = "When I am in class, I just pretend as if I am working",
    SES_4 = "I pay attention in class",
    SES_5 = "I complete my work on time",
    SES_6 = "I like being at school",
    SES_7 = "I feel excited by my work at school",
    SES_8 = "My classroom is a fun place to be",
    SES_9 = "I am interested in the work at school",
    SES_10 = "I feel happy in school",
    SES_11 = "I feel bored in school",
    SES_12 = "I check my schoolwork for mistakes",
    SES_13 = "I study at home even when I don’t have a test",
    SES_14 = "I try to watch TV shows about things we do in school",
    SES_15 = "When I read a book, I ask myself questions to make sure I understand what it is about",
    SES_16 = "I read extra books to learn more about things we do in school",
    SES_17 = "If I don’t know what a word means when I am reading, I do something to figure it out",
    SES_18 = "If I don’t understand what I read, I go back and read it over again",
    SES_19 = "I talk with people outside of school about what I am learning in class"
  )
  return(question_texts[[ses_col]])
}

# Function to extract the legend from a ggplot
get_legend <- function(myggplot) {
  tmp <- ggplotGrob(myggplot)
  legend <- tmp$grobs[which(sapply(tmp$grobs, function(x) x$name) == "guide-box")]
  return(legend[[1]])
}

# Function to summarize an SES column by time and create a frequency plot
create_summary_and_plot_ses <- function(data, ses_col) {
  # Summarize the SES column by time and treatment group
  freq_data <- data %>%
    select(time, !!sym(ses_col), trt) %>%
    mutate(score = ifelse(is.na(!!sym(ses_col)), "NA", as.character(!!sym(ses_col)))) %>%
    group_by(score, time, trt) %>%
    summarise(Frequency = n(), .groups = 'drop')
  
  # Add score labels
  freq_data <- freq_data %>%
    mutate(score_label = case_when(
      score == "1" ~ "Not true",
      score == "2" ~ "(Blank)",
      score == "3" ~ "Somewhat true",
      score == "4" ~ "(Blank)",
      score == "5" ~ "Very true",
      score == "NA" ~ "NA",
      TRUE ~ as.character(score)  # For any other values
    ))
  
  # Pivot the summarized data to wide format
  freq_wide <- freq_data %>%
    pivot_wider(names_from = time, values_from = Frequency, values_fill = list(Frequency = 0))
  
  # Get the question text for the title
  question_text <- get_ses_question_text(ses_col)
  
  # Create plots for each treatment group
  plots <- freq_data %>%
    split(.$trt) %>%
    map(~ ggplot(.x, aes(x = time, y = Frequency, color = score_label, group = score_label)) +
          geom_line() +
          geom_point() +
          labs(title = paste("Treatment Group:", unique(.x$trt)),
               x = "Time",
               y = "Frequency") +
          theme_minimal() +
          theme(
            plot.title = element_text(hjust = 0.5, size = 12),
            axis.title = element_text(size = 10),
            axis.text = element_text(size = 8),
            legend.position = "none"
          ) +
          scale_x_continuous(breaks = c(0, 2, 4, 8)))  # Specify the breaks on x-axis
  
  # Extract the legend from one plot
  legend <- get_legend(plots[[1]] + labs(color = "Score") + theme(legend.position = "bottom"))
  
  # Combine the plots and add the common title
  combined_plot <- arrangeGrob(grobs = lapply(plots, ggplotGrob), ncol = 2, top = textGrob(
    paste("Frequency of Scores for", ses_col, ":", question_text),
    gp = gpar(fontsize = 14), hjust = 0.5
  ))
  
  # Print stargazer table
  cat("\n", ses_col, " summary by time and treatment group\n")
  stargazer(freq_wide, type = "text", summary = FALSE, rownames = FALSE)
  
  # Combine the plots and legend
  final_plot <- arrangeGrob(combined_plot, legend, ncol = 1, heights = c(10, 1))
  
  # Print the final plot
  grid.newpage()
  grid.draw(final_plot)
}

# List of SES columns (assuming SES_1 to SES_19)
ses_columns <- paste0("SES_", 1:19)

# Apply the function to each SES column
for (ses_col in ses_columns) {
  create_summary_and_plot_ses(rct, ses_col)
}

## 
##  SES_1  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    29  32  43  48 
## 1      1    Not true    40  18  21  42 
## 2      0     (Blank)     6   5  13  16 
## 2      1     (Blank)     6   6   5   7 
## 3      0  Somewhat true 168 145 149 155
## 3      1  Somewhat true 201 155 125 143
## 4      0     (Blank)    20  12  15  19 
## 4      1     (Blank)    17  14  17  22 
## 5      0    Very true   589 496 503 469
## 5      1    Very true   577 434 399 470
## NA     0       NA       38  160 127 143
## NA     1       NA       25  239 299 182
## ---------------------------------------

## 
##  SES_2  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    49  37  37  55 
## 1      1    Not true    57  36  31  49 
## 2      0     (Blank)    11   8  10  13 
## 2      1     (Blank)    10   2   7  11 
## 3      0  Somewhat true 123 128 132 141
## 3      1  Somewhat true 152 125 103 137
## 4      0     (Blank)    25  26  33  34 
## 4      1     (Blank)    20  24  13  30 
## 5      0    Very true   600 486 500 460
## 5      1    Very true   590 437 407 455
## NA     0       NA       42  165 138 147
## NA     1       NA       37  242 305 184
## ---------------------------------------

## 
##  SES_3  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    89  52  60  50 
## 1      1    Not true    74  45  51  68 
## 2      0     (Blank)     9  11  14  18 
## 2      1     (Blank)     8  11   6   4 
## 3      0  Somewhat true 125 107 136 123
## 3      1  Somewhat true 162 120 93  119
## 4      0     (Blank)    21  20  23  28 
## 4      1     (Blank)    15  17  14  24 
## 5      0    Very true   561 496 471 480
## 5      1    Very true   571 429 391 459
## NA     0       NA       45  164 146 151
## NA     1       NA       36  244 311 192
## ---------------------------------------

## 
##  SES_4  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    29  25  28  28 
## 1      1    Not true    45  23  17  39 
## 2      0     (Blank)    12   9  18  13 
## 2      1     (Blank)     6  11   9   9 
## 3      0  Somewhat true 237 188 162 188
## 3      1  Somewhat true 274 165 128 187
## 4      0     (Blank)    23  31  24  37 
## 4      1     (Blank)    24  17  22  13 
## 5      0    Very true   503 434 486 435
## 5      1    Very true   487 409 381 430
## NA     0       NA       46  163 132 149
## NA     1       NA       30  241 309 188
## ---------------------------------------

## 
##  SES_5  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    99  70  67  59 
## 1      1    Not true    120 57  51  62 
## 2      0     (Blank)    12  11  11  18 
## 2      1     (Blank)     8  12   8  18 
## 3      0  Somewhat true 272 232 217 212
## 3      1  Somewhat true 328 220 191 216
## 4      0     (Blank)    18  30  27  40 
## 4      1     (Blank)    21  30  18  24 
## 5      0    Very true   403 347 388 369
## 5      1    Very true   352 308 294 355
## NA     0       NA       46  160 140 152
## NA     1       NA       37  239 304 191
## ---------------------------------------

## 
##  SES_6  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    77  70  69  88 
## 1      1    Not true    89  75  52  81 
## 2      0     (Blank)    17  15  20  26 
## 2      1     (Blank)    14  15   8   8 
## 3      0  Somewhat true 148 145 136 136
## 3      1  Somewhat true 170 112 120 142
## 4      0     (Blank)    17  26  29  15 
## 4      1     (Blank)    18  28  16  19 
## 5      0    Very true   547 432 464 436
## 5      1    Very true   540 394 366 429
## NA     0       NA       44  162 132 149
## NA     1       NA       35  242 304 187
## ---------------------------------------

## 
##  SES_7  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    69  72  70  69 
## 1      1    Not true    97  66  58  72 
## 2      0     (Blank)    18  20  14  17 
## 2      1     (Blank)    17  19  10  14 
## 3      0  Somewhat true 204 165 176 172
## 3      1  Somewhat true 220 163 123 168
## 4      0     (Blank)    24  24  26  34 
## 4      1     (Blank)    22  21  19  23 
## 5      0    Very true   479 403 428 407
## 5      1    Very true   458 352 347 401
## NA     0       NA       56  166 136 151
## NA     1       NA       52  245 309 188
## ---------------------------------------

## 
##  SES_8  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    129 113 119 94 
## 1      1    Not true    142 90  85  96 
## 2      0     (Blank)    23  21  14  17 
## 2      1     (Blank)     6  12  10  12 
## 3      0  Somewhat true 160 148 144 164
## 3      1  Somewhat true 181 126 112 146
## 4      0     (Blank)    15  16  19  25 
## 4      1     (Blank)    26  28  14  25 
## 5      0    Very true   469 382 418 397
## 5      1    Very true   466 372 335 393
## NA     0       NA       54  170 136 153
## NA     1       NA       45  238 310 194
## ---------------------------------------

## 
##  SES_9  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    67  65  65  81 
## 1      1    Not true    85  67  50  72 
## 2      0     (Blank)    15  15  17  23 
## 2      1     (Blank)    11  11  11  16 
## 3      0  Somewhat true 211 181 175 154
## 3      1  Somewhat true 228 163 119 171
## 4      0     (Blank)    14  24  26  38 
## 4      1     (Blank)    23  32  21  31 
## 5      0    Very true   485 401 421 402
## 5      1    Very true   470 344 358 383
## NA     0       NA       58  164 146 152
## NA     1       NA       49  249 307 193
## ---------------------------------------

## 
##  SES_10  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    65  65  73  76 
## 1      1    Not true    73  71  40  82 
## 2      0     (Blank)     9  15  24  17 
## 2      1     (Blank)    10  11  11   9 
## 3      0  Somewhat true 184 150 128 146
## 3      1  Somewhat true 190 126 100 140
## 4      0     (Blank)    21  19  19  36 
## 4      1     (Blank)    20  16  15  25 
## 5      0    Very true   516 438 470 422
## 5      1    Very true   529 397 387 420
## NA     0       NA       55  163 136 153
## NA     1       NA       44  245 313 190
## ---------------------------------------

## 
##  SES_11  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    103 75  71  94 
## 1      1    Not true    85  81  71  110
## 2      0     (Blank)    17  11  18  28 
## 2      1     (Blank)    15   8  10  15 
## 3      0  Somewhat true 167 144 171 177
## 3      1  Somewhat true 174 114 120 132
## 4      0     (Blank)    22  29  26  33 
## 4      1     (Blank)    23  16  19  32 
## 5      0    Very true   480 420 423 364
## 5      1    Very true   522 400 340 382
## NA     0       NA       61  171 141 154
## NA     1       NA       47  247 306 195
## ---------------------------------------

## 
##  SES_12  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    212 171 158 152
## 1      1    Not true    259 173 152 174
## 2      0     (Blank)    30  32  25  22 
## 2      1     (Blank)    17  26  15  22 
## 3      0  Somewhat true 237 225 232 221
## 3      1  Somewhat true 277 193 164 191
## 4      0     (Blank)    22  21  21  39 
## 4      1     (Blank)    23  19  29  29 
## 5      0    Very true   272 238 271 259
## 5      1    Very true   229 212 197 256
## NA     0       NA       77  163 143 157
## NA     1       NA       61  243 309 194
## ---------------------------------------

## 
##  SES_13  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    166 127 114 112
## 1      1    Not true    215 118 97  120
## 2      0     (Blank)    22  17  28  19 
## 2      1     (Blank)    19  18  12  20 
## 3      0  Somewhat true 243 212 226 219
## 3      1  Somewhat true 256 179 184 207
## 4      0     (Blank)    13  20  27  32 
## 4      1     (Blank)    16  23  23  21 
## 5      0    Very true   351 296 310 315
## 5      1    Very true   303 278 238 311
## NA     0       NA       55  178 145 153
## NA     1       NA       57  250 312 187
## ---------------------------------------

## 
##  SES_14  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    280 212 186 183
## 1      1    Not true    343 207 155 174
## 2      0     (Blank)    20  21  28  27 
## 2      1     (Blank)    18  17  16  28 
## 3      0  Somewhat true 191 197 209 209
## 3      1  Somewhat true 197 153 148 184
## 4      0     (Blank)    18  24  22  35 
## 4      1     (Blank)    13  23  18  23 
## 5      0    Very true   272 223 254 243
## 5      1    Very true   251 224 219 267
## NA     0       NA       69  173 151 153
## NA     1       NA       44  242 310 190
## ---------------------------------------

## 
##  SES_15  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    90  71  79  70 
## 1      1    Not true    115 75  55  67 
## 2      0     (Blank)    22  24  16  16 
## 2      1     (Blank)    13  13  14  14 
## 3      0  Somewhat true 162 171 193 185
## 3      1  Somewhat true 211 168 158 169
## 4      0     (Blank)    30  20  24  36 
## 4      1     (Blank)    26  14  19  27 
## 5      0    Very true   493 387 393 389
## 5      1    Very true   444 349 307 395
## NA     0       NA       53  177 145 154
## NA     1       NA       57  247 313 194
## ---------------------------------------

## 
##  SES_16  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    135 102 110 86 
## 1      1    Not true    193 103 80  90 
## 2      0     (Blank)    25  25  20  23 
## 2      1     (Blank)    22  12  11  16 
## 3      0  Somewhat true 229 210 188 228
## 3      1  Somewhat true 232 172 169 194
## 4      0     (Blank)    30  36  40  35 
## 4      1     (Blank)    19  23  29  25 
## 5      0    Very true   370 311 350 330
## 5      1    Very true   352 307 269 355
## NA     0       NA       61  166 142 148
## NA     1       NA       48  249 308 186
## ---------------------------------------

## 
##  SES_17  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    112 68  94  99 
## 1      1    Not true    136 92  71  93 
## 2      0     (Blank)    12  24  17  14 
## 2      1     (Blank)     8  10  10  15 
## 3      0  Somewhat true 163 190 176 173
## 3      1  Somewhat true 197 156 152 183
## 4      0     (Blank)    23  20  32  33 
## 4      1     (Blank)    29  22  17  17 
## 5      0    Very true   471 378 388 379
## 5      1    Very true   430 326 304 374
## NA     0       NA       69  170 143 152
## NA     1       NA       66  260 312 184
## ---------------------------------------

## 
##  SES_18  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    66  58  63  63 
## 1      1    Not true    83  45  51  62 
## 2      0     (Blank)    13  16  17  16 
## 2      1     (Blank)     9  14   5  15 
## 3      0  Somewhat true 157 154 142 163
## 3      1  Somewhat true 179 144 119 175
## 4      0     (Blank)    18  24  32  35 
## 4      1     (Blank)    17  18  27  23 
## 5      0    Very true   541 432 450 423
## 5      1    Very true   535 400 352 410
## NA     0       NA       55  166 146 150
## NA     1       NA       43  245 312 181
## ---------------------------------------

## 
##  SES_19  summary by time and treatment group
## 
## =======================================
## score trt  score_label   0   2   4   8 
## ---------------------------------------
## 1      0    Not true    223 159 148 148
## 1      1    Not true    263 170 115 145
## 2      0     (Blank)    11  16  19  20 
## 2      1     (Blank)    14   6   9  16 
## 3      0  Somewhat true 194 218 205 198
## 3      1  Somewhat true 220 147 159 186
## 4      0     (Blank)    27  23  31  32 
## 4      1     (Blank)    21  25  24  21 
## 5      0    Very true   348 273 312 303
## 5      1    Very true   311 281 255 322
## NA     0       NA       47  161 135 149
## NA     1       NA       37  237 304 176
## ---------------------------------------

RCT_CodeBook

Ishan Gupta

2024-07-03

Introduction

PART 1: Checking for missingness

Part 2: Demographic Analysis**

Part 3: Frequency of responses to Mental health questions in each scale