As part of the RCT analysis of Shamiri’s 4-week intervention in 6-schools, we are also studing the impact of the treatment on the academic outcomes between control and treatment students.
The academic outcomes consist of term wise, and form wise grades for 5 out of the 6 schools. The academic outcomes data for the 6th school could not be retrived. Moreover, majority of the Form 4 data for the other 5 schools could not be retrieved due to the issues with schools not releasing the grades in time.
Accounting for the missingness we can see that we have about 87% of the academic outcomes data for the 1600+ students in the RCT.
The intervention was conducted in Term 2. Therefore, Term 1 marks serve as a baseline for the analysis. Term 2 marks were immediately after the intervention and can therefore serve as the endline. Term 3 marks were a few months after the intervention (about 4 months) and can therefore serve as a follow-up data point post-intervention.
Before the academic outcomes data could be cleaned for the analysis, we had to follow the steps mentioned below:
We included 3 additional columns in each of the form/term/school-wise sheets: Form, Term and School_Name. These columns tool values from the sheet name. For example, in the sheet named “F1_T1_Aquinas”, the form column took value F1 for each row, Term column took value T1 for each row and the School_Name column took value of Aquinas for each row. This was done in order to retain individual identification of form, term and school in the merged data frame which contained all academic outcomes.
# Installing essential libraries
library(stargazer)
library(readxl)
library(dplyr)
library(knitr)
library(tidyverse)
library(lmtest)
library(sandwich)
library(fixest)
library(ggplot2)
library(ggthemes)
library(corrplot)
library(reshape2)
library(gridExtra)
library(grid)
library(tableone)
library(writexl)
library(patchwork)
library(interactions)
library(kableExtra)
library(broom)
# Setting the working directory
setwd('/Users/ishangupta/Desktop/Books for Masters/R Practice')
# Path to your Excel file
file_path <- "/Users/ishangupta/Desktop/Books for Masters/R Practice/Mastersheet.xlsx"
# Get the sheet names
sheet_names <- excel_sheets(file_path)
# Function to check and convert "Admission_Number" to numeric
convert_to_numeric <- function(df, sheet_name) {
if (!"Admission_Number" %in% colnames(df)) {
warning(paste("Sheet", sheet_name, "does not have an 'Admission_Number' column."))
return(df)
}
if (!is.numeric(df$Admission_Number)) {
df$Admission_Number <- as.numeric(df$Admission_Number)
if (any(is.na(df$Admission_Number))) {
warning(paste("Sheet", sheet_name, "contains non-numeric 'Admission_Number' values that were converted to NA."))
}
}
df
}
# Load all sheets into a named list of dataframes, add the "Names" column, check and convert "Admission_Number"
dfs <- lapply(setNames(sheet_names, sheet_names), function(sheet) {
df <- read_excel(file_path, sheet = sheet)
df$Names <- sheet
df <- convert_to_numeric(df, sheet)
df
})
# Assign each dataframe to the global environment with its sheet name
list2env(dfs, .GlobalEnv)
## <environment: R_GlobalEnv>
Each of these sheets had uneven set of columns with different names. This was due to schools not offering certain subjects or not recording certain data points. For example, Computer Science or Home Science was not offered in all schools. We ensured that all the sheets have equal number of columns. We added missing columns in each sheet with NA values from the master set of column names.
# Function to extract form, term, and school name from sheet name
parse_sheet_name <- function(sheet_name) {
parts <- strsplit(sheet_name, "_")[[1]]
list(
Form = parts[1],
Term = parts[2],
School_Name = parts[3]
)
}
# Function to add Form, Term, and School Name columns
add_columns <- function(df, sheet_name) {
info <- parse_sheet_name(sheet_name)
df %>%
mutate(
Form = info$Form,
Term = info$Term,
School_Name = info$School_Name
)
}
# Load all sheets into a named list of dataframes, add columns
dfs <- lapply(setNames(sheet_names, sheet_names), function(sheet) {
df <- read_excel(file_path, sheet = sheet)
df <- add_columns(df, sheet)
df
})
# Assign each dataframe to the global environment with its sheet name
list2env(dfs, .GlobalEnv)
## <environment: R_GlobalEnv>
Since the some of the sheets were converted from pdf, their data sturcture was not consistent. To ensure that all the sheets have the same data type for merging purposes. We forced convert all the columns into the character data type.
# Load necessary library
library(dplyr)
# Define the full list of columns
full_columns <- c("Admission_Number", "No_of_Sub", "STR", "Name", "ENG", "KIS", "MAT", "BIO",
"PHY", "CHE", "HIS", "GEO", "CRE", "HSC", "AGR", "BST", "CS", "COM",
"MW", "DD", "FRE", "KSL", "UPI", "Avg_Marks", "Total_Marks",
"Total_Points", "DEV", "Grade", "VAP", "STR_POS", "OVR_POS",
"Mean_Points", "KCPE", "Form", "Term", "School_Name")
# Function to add missing columns and ensure consistent column data types
add_missing_columns_and_convert_types <- function(df, full_columns) {
# Determine which columns are missing
missing_cols <- setdiff(full_columns, names(df))
if (length(missing_cols) > 0) {
# Add missing columns with NA values
df <- df %>%
bind_cols(set_names(as.data.frame(matrix(NA, nrow(df), length(missing_cols))), missing_cols))
}
# Ensure columns have the same data type
for (col in full_columns) {
if (col %in% names(df)) {
# Determine the type of the column across all data frames
column_types <- sapply(sheet_list, function(x) if (col %in% names(x)) typeof(x[[col]]) else NA)
column_type <- na.omit(column_types)[1] # Choose the first non-NA type as the consistent type
# Convert column to the consistent type
df[[col]] <- switch(column_type,
"character" = as.character(df[[col]]),
"double" = as.numeric(df[[col]]),
"integer" = as.integer(df[[col]]),
df[[col]]) # No conversion if type is not recognized
}
}
return(df)
}
# List of data frames (sheets)
sheet_list <- list(F1_T1_Aquinas, F1_T1_Fatima, F1_T1_Kamkunji, F1_T1_Muthurwa, F1_T1_Ruiru,
F1_T2_Aquinas, F1_T2_Fatima, F1_T2_Kamkunji, F1_T2_Muthurwa, F1_T2_Ruiru,
F1_T3_Aquinas, F1_T3_Fatima, F1_T3_Kamkunji, F1_T3_Muthurwa, F1_T3_Ruiru,
F2_T1_Aquinas, F2_T1_Fatima, F2_T1_Kamkunji, F2_T1_Muthurwa, F2_T1_Ruiru,
F2_T2_Aquinas, F2_T2_Fatima, F2_T2_Kamkunji, F2_T2_Muthurwa, F2_T2_Ruiru,
F2_T3_Aquinas, F2_T3_Fatima, F2_T3_Kamkunji, F2_T3_Muthurwa, F2_T3_Ruiru,
F3_T1_Aquinas, F3_T1_Fatima, F3_T1_Kamkunji, F3_T1_Muthurwa, F3_T1_Ruiru,
F3_T2_Aquinas, F3_T2_Fatima, F3_T2_Kamkunji, F3_T2_Muthurwa, F3_T2_Ruiru,
F3_T3_Aquinas, F3_T3_Fatima, F3_T3_Kamkunji, F3_T3_Muthurwa, F3_T3_Ruiru,
F4_T1_Kamkunji, F4_T2_Kamkunji)
# Add missing columns and ensure consistent data types
sheet_list_updated <- lapply(sheet_list, function(df) {
add_missing_columns_and_convert_types(df, full_columns)
})
# Assign updated data frames back to original variable names
list2env(setNames(sheet_list_updated, c("F1_T1_Aquinas", "F1_T1_Fatima", "F1_T1_Kamkunji",
"F1_T1_Muthurwa", "F1_T1_Ruiru", "F1_T2_Aquinas",
"F1_T2_Fatima", "F1_T2_Kamkunji", "F1_T2_Muthurwa",
"F1_T2_Ruiru", "F1_T3_Aquinas", "F1_T3_Fatima",
"F1_T3_Kamkunji", "F1_T3_Muthurwa", "F1_T3_Ruiru",
"F2_T1_Aquinas", "F2_T1_Fatima", "F2_T1_Kamkunji",
"F2_T1_Muthurwa", "F2_T1_Ruiru", "F2_T2_Aquinas",
"F2_T2_Fatima", "F2_T2_Kamkunji", "F2_T2_Muthurwa",
"F2_T2_Ruiru", "F2_T3_Aquinas", "F2_T3_Fatima",
"F2_T3_Kamkunji", "F2_T3_Muthurwa", "F2_T3_Ruiru",
"F3_T1_Aquinas", "F3_T1_Fatima", "F3_T1_Kamkunji",
"F3_T1_Muthurwa", "F3_T1_Ruiru", "F3_T2_Aquinas",
"F3_T2_Fatima", "F3_T2_Kamkunji", "F3_T2_Muthurwa",
"F3_T2_Ruiru", "F3_T3_Aquinas", "F3_T3_Fatima",
"F3_T3_Kamkunji", "F3_T3_Muthurwa", "F3_T3_Ruiru",
"F4_T1_Kamkunji", "F4_T2_Kamkunji")), envir = .GlobalEnv)
## <environment: R_GlobalEnv>
We merged all the academic outcomes sheets into a single file.
# Define your sheet list with names
sheet_names <- c(
"F1_T1_Aquinas", "F1_T1_Fatima", "F1_T1_Kamkunji", "F1_T1_Muthurwa", "F1_T1_Ruiru",
"F1_T2_Aquinas", "F1_T2_Fatima", "F1_T2_Kamkunji", "F1_T2_Muthurwa", "F1_T2_Ruiru",
"F1_T3_Aquinas", "F1_T3_Fatima", "F1_T3_Kamkunji", "F1_T3_Muthurwa", "F1_T3_Ruiru",
"F2_T1_Aquinas", "F2_T1_Fatima", "F2_T1_Kamkunji", "F2_T1_Muthurwa", "F2_T1_Ruiru",
"F2_T2_Aquinas", "F2_T2_Fatima", "F2_T2_Kamkunji", "F2_T2_Muthurwa", "F2_T2_Ruiru",
"F2_T3_Aquinas", "F2_T3_Fatima", "F2_T3_Kamkunji", "F2_T3_Muthurwa", "F2_T3_Ruiru",
"F3_T1_Aquinas", "F3_T1_Fatima", "F3_T1_Kamkunji", "F3_T1_Muthurwa", "F3_T1_Ruiru",
"F3_T2_Aquinas", "F3_T2_Fatima", "F3_T2_Kamkunji", "F3_T2_Muthurwa", "F3_T2_Ruiru",
"F3_T3_Aquinas", "F3_T3_Fatima", "F3_T3_Kamkunji", "F3_T3_Muthurwa", "F3_T3_Ruiru",
"F4_T1_Kamkunji", "F4_T2_Kamkunji"
)
# Define your sheet list with actual data frames
sheet_list <- list(
F1_T1_Aquinas, F1_T1_Fatima, F1_T1_Kamkunji, F1_T1_Muthurwa, F1_T1_Ruiru,
F1_T2_Aquinas, F1_T2_Fatima, F1_T2_Kamkunji, F1_T2_Muthurwa, F1_T2_Ruiru,
F1_T3_Aquinas, F1_T3_Fatima, F1_T3_Kamkunji, F1_T3_Muthurwa, F1_T3_Ruiru,
F2_T1_Aquinas, F2_T1_Fatima, F2_T1_Kamkunji, F2_T1_Muthurwa, F2_T1_Ruiru,
F2_T2_Aquinas, F2_T2_Fatima, F2_T2_Kamkunji, F2_T2_Muthurwa, F2_T2_Ruiru,
F2_T3_Aquinas, F2_T3_Fatima, F2_T3_Kamkunji, F2_T3_Muthurwa, F2_T3_Ruiru,
F3_T1_Aquinas, F3_T1_Fatima, F3_T1_Kamkunji, F3_T1_Muthurwa, F3_T1_Ruiru,
F3_T2_Aquinas, F3_T2_Fatima, F3_T2_Kamkunji, F3_T2_Muthurwa, F3_T2_Ruiru,
F3_T3_Aquinas, F3_T3_Fatima, F3_T3_Kamkunji, F3_T3_Muthurwa, F3_T3_Ruiru,
F4_T1_Kamkunji, F4_T2_Kamkunji
)
# Define a function to get column names and their data types in a specific format
get_column_types_for_sheet <- function(df, sheet_name) {
tibble(
Sheet = sheet_name,
Column = names(df),
Type = sapply(df, class)
) %>%
pivot_wider(names_from = Column, values_from = Type)
}
# Apply the function to each sheet
column_types_list <- Map(get_column_types_for_sheet, sheet_list, sheet_names)
# Combine all data frames into a single data frame
combined_column_types <- bind_rows(column_types_list)
# Print the combined column types data frame
print(combined_column_types)
## # A tibble: 47 × 38
## Sheet Admission_Number UPI NAME STR ENG KIS MAT BIO PHY CHE
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 F1_T1… character char… char… char… char… char… char… char… char… char…
## 2 F1_T1… character char… char… char… char… char… char… char… char… char…
## 3 F1_T1… character char… char… char… char… char… char… char… char… char…
## 4 F1_T1… character char… <NA> char… char… char… char… char… char… char…
## 5 F1_T1… character char… char… char… char… char… char… char… char… char…
## 6 F1_T2… character char… char… char… char… char… char… char… char… char…
## 7 F1_T2… character char… char… char… char… char… char… char… char… char…
## 8 F1_T2… character char… char… char… char… char… char… char… char… char…
## 9 F1_T2… character char… <NA> char… char… char… char… char… char… char…
## 10 F1_T2… character char… char… char… char… char… char… char… char… char…
## # ℹ 37 more rows
## # ℹ 27 more variables: HIS <chr>, GEO <chr>, CRE <chr>, HSC <chr>, AGR <chr>,
## # MW <chr>, COM <chr>, FRE <chr>, BST <chr>, No_of_Sub <chr>, KCPE <chr>,
## # VAP <chr>, Avg_Marks <chr>, DEV <chr>, Grade <chr>, Total_Marks <chr>,
## # Total_Points <chr>, Mean_Points <chr>, STR_POS <chr>, OVR_POS <chr>,
## # Form <chr>, Term <chr>, School_Name <chr>, Name <chr>, CS <chr>, DD <chr>,
## # KSL <chr>
merged_df <- bind_rows(sheet_list)
merged_df <- merged_df %>%
mutate(across(everything(), ~ na_if(., "--")))
Since all of the subject wise grades contained both marks and the letter grade achieved in the respective subject, we had to use split columns to text method using space as a delimiter to retrieve only the numeric marks.
# Define the columns to process
columns_to_process <- c("ENG", "KIS", "MAT", "BIO", "PHY", "CHE", "HIS", "GEO", "CRE",
"HSC", "AGR", "MW", "FRE", "COM", "BST", "KCPE", "CS", "DD", "KSL")
# Function to keep only the part before the first space
clean_column <- function(column) {
# This function will split the column by space and take the first part
return(sapply(column, function(x) strsplit(as.character(x), " ")[[1]][1]))
}
# Apply the cleaning function to the specified columns
cleaned_df <- merged_df %>%
mutate(across(all_of(columns_to_process), ~ clean_column(.)))
cleaned_df <- cleaned_df %>%
select(-UPI,-NAME,-Name)
school_lookup <- data.frame(
School_ID = c("ANS23_School_47", "ANS23_School_25", "ANS23_School_18",
"ANS23_School_17", "ANS23_School_7", "ANS23_School_5"),
School_Name = c("Makongeni", "Muthurwa", "Aquinas",
"Kamkunji", "Ruiru", "Fatima"),
stringsAsFactors = FALSE
)
cleaned_df <- cleaned_df %>%
left_join(school_lookup, by = "School_Name")
We used the MasterSheet which had the link between Participant_IDs and Admission_Number and joined it with the pivoted dataset from the step above. The resulting data set included only those students who were part of the RCT conducted by Shamiri.
The current data frame was in the format of:
| Admission_Number | Form | Term | Sub1 | Sub2 |
|---|---|---|---|---|
| 12345 | F1 | T1 | 80 | 60 |
| 12345 | F1 | T2 | 90 | 70 |
| 12345 | F1 | T3 | 70 | 100 |
We had to pivot it by Term to create the following type of format:
| Admission_Number | Form | T1_Sub1 | T1_Sub2 | T2_Sub1 | T2_Sub2 | T3_Sub1 | T3_Sub2 |
|---|---|---|---|---|---|---|---|
| 12345 | F1 | 80 | 60 | 90 | 70 | 70 | 100 |
We did this as each Participant_ID needed to have a unique entry for the regressions analysis.
Now since we had the entire data purely numeric, we were able to convert all the character data types into numeric for the regression analysis.
Once the pivoted data set was ready, for all the subject scores, average marks, etc. we demeaned the data set. This was done to ensure that we remove any inconsistencies across the schools and individual effects which may impact the regression results.
# Select only Participant_ID and Admission_Number from Mastersheet
mastersheet_reduced <- MasterSheet %>%
select(Participant_ID, Admission_Number, School_ID)
final <- mastersheet_reduced %>%
left_join(cleaned_df, by = c("School_ID", "Admission_Number"), relationship = "many-to-many")
final$Data_Avail <- apply(final[, 3:33], 1, function(row) {
if (all(is.na(row))) {
return(0)
} else {
return(1)
}
})
# Define the columns to process
columns_to_process <- c("ENG", "KIS", "MAT", "BIO", "PHY", "CHE", "HIS", "GEO", "CRE",
"HSC", "AGR", "MW", "FRE", "COM", "BST", "KCPE", "CS", "DD", "KSL")
# Function to keep only the part before the first space
clean_column <- function(column) {
# Split each value by space and keep the first part
return(sapply(column, function(x) {
parts <- strsplit(as.character(x), " ")[[1]]
if (length(parts) > 0) {
return(parts[1])
} else {
return(NA)
}
}))
}
# Apply the cleaning function to the specified columns in the final dataframe
final <- final %>%
mutate(across(all_of(columns_to_process), ~ clean_column(.)))
# List of columns to pivot
columns_to_pivot <- c("No_of_Sub", "ENG", "KIS", "MAT", "BIO",
"PHY", "CHE", "HIS", "GEO", "CRE", "HSC",
"AGR", "BST", "CS", "COM", "MW", "DD",
"FRE", "KSL", "Avg_Marks", "Total_Marks",
"Total_Points", "DEV", "Grade", "VAP",
"STR_POS", "OVR_POS", "Mean_Points", "KCPE")
# Filter the data
filtered_final <- final %>%
filter(Data_Avail == 1)
# Define the columns to rename
columns_to_rename <- c("No_of_Sub", "ENG", "KIS", "MAT", "BIO", "PHY", "CHE",
"HIS", "GEO", "CRE", "HSC", "AGR", "BST", "CS", "COM",
"MW", "DD", "FRE", "KSL", "Avg_Marks", "Total_Marks",
"Total_Points", "DEV", "Grade", "VAP", "STR_POS",
"OVR_POS", "Mean_Points", "KCPE")
# Function to rename columns with term prefix and remove Term column
rename_and_remove_term <- function(df, term_prefix) {
df %>%
rename_with(~ paste0(term_prefix, "_", .), all_of(columns_to_rename)) %>%
select(-Term)
}
# Subset and rename for Term 1
filtered_final_T1 <- filtered_final %>%
filter(Term == "T1") %>%
rename_and_remove_term("T1")
# Subset and rename for Term 2
filtered_final_T2 <- filtered_final %>%
filter(Term == "T2") %>%
rename_and_remove_term("T2")
# Subset and rename for Term 3
filtered_final_T3 <- filtered_final %>%
filter(Term == "T3") %>%
rename_and_remove_term("T3")
# Combine all unique participant_IDs and Admission_Number
all_ids <- bind_rows(
filtered_final_T1 %>% select(Participant_ID, Admission_Number),
filtered_final_T2 %>% select(Participant_ID, Admission_Number),
filtered_final_T3 %>% select(Participant_ID, Admission_Number)
) %>% distinct()
# Merge the data frames based on participant_ID and Admission_Number
merged_final <- all_ids %>%
left_join(filtered_final_T1, by = c("Participant_ID", "Admission_Number")) %>%
left_join(filtered_final_T2, by = c("Participant_ID", "Admission_Number")) %>%
left_join(filtered_final_T3, by = c("Participant_ID", "Admission_Number"))
# List of columns to convert to numeric
cols_to_convert <- c("T1_No_of_Sub", "T1_ENG", "T1_KIS", "T1_MAT", "T1_BIO",
"T1_PHY", "T1_CHE", "T1_HIS", "T1_GEO", "T1_CRE", "T1_HSC",
"T1_AGR", "T1_BST", "T1_CS", "T1_COM", "T1_MW", "T1_DD",
"T1_FRE", "T1_KSL", "T1_Avg_Marks", "T1_Total_Marks",
"T1_Total_Points", "T1_DEV", "T1_VAP", "T1_STR_POS",
"T1_OVR_POS", "T1_Mean_Points", "T1_KCPE", "T2_No_of_Sub",
"T2_ENG", "T2_KIS", "T2_MAT", "T2_BIO", "T2_PHY", "T2_CHE",
"T2_HIS", "T2_GEO", "T2_CRE", "T2_HSC", "T2_AGR", "T2_BST",
"T2_CS", "T2_COM", "T2_MW", "T2_DD", "T2_FRE", "T2_KSL",
"T2_Avg_Marks", "T2_Total_Marks", "T2_Total_Points", "T2_DEV",
"T2_VAP", "T2_STR_POS", "T2_OVR_POS", "T2_Mean_Points",
"T2_KCPE", "T3_No_of_Sub", "T3_ENG", "T3_KIS", "T3_MAT",
"T3_BIO", "T3_PHY", "T3_CHE", "T3_HIS", "T3_GEO", "T3_CRE",
"T3_HSC", "T3_AGR", "T3_BST", "T3_COM", "T3_MW", "T3_DD",
"T3_FRE", "T3_KSL", "T3_Avg_Marks", "T3_Total_Marks",
"T3_Total_Points", "T3_DEV", "T3_VAP", "T3_STR_POS",
"T3_OVR_POS", "T3_Mean_Points", "T3_KCPE")
# Convert specified columns to numeric
merged_final <- merged_final %>%
mutate(across(all_of(cols_to_convert), as.numeric))
# List of columns to convert to numeric and demean
cols_to_demean <- c("T1_ENG", "T1_KIS", "T1_MAT", "T1_BIO", "T1_PHY", "T1_CHE", "T1_HIS", "T1_GEO",
"T1_CRE", "T1_HSC", "T1_AGR", "T1_BST", "T1_CS", "T1_COM", "T1_MW", "T1_DD",
"T1_FRE", "T1_KSL", "T1_Avg_Marks", "T1_Total_Marks", "T1_Total_Points",
"T1_Mean_Points", "T1_KCPE", "T2_ENG", "T2_KIS", "T2_MAT", "T2_BIO", "T2_PHY",
"T2_CHE", "T2_HIS", "T2_GEO", "T2_CRE", "T2_HSC", "T2_AGR", "T2_BST", "T2_CS",
"T2_COM", "T2_MW", "T2_DD", "T2_FRE", "T2_KSL", "T2_Avg_Marks", "T2_Total_Marks",
"T2_Total_Points", "T2_Mean_Points", "T2_KCPE", "T3_ENG", "T3_KIS", "T3_MAT",
"T3_BIO", "T3_PHY", "T3_CHE", "T3_HIS", "T3_GEO", "T3_CRE", "T3_HSC", "T3_AGR",
"T3_BST", "T3_COM", "T3_MW", "T3_DD", "T3_FRE", "T3_KSL", "T3_Avg_Marks",
"T3_Total_Marks", "T3_Total_Points", "T3_Mean_Points", "T3_KCPE")
library(dplyr)
# Step 1: Create `trt` column
merged_final <- merged_final %>%
mutate(trt = ifelse(startsWith(Participant_ID, "S"), 1,
ifelse(startsWith(Participant_ID, "C"), 0, NA)))
temp <- merged_final
# Step 2: Calculate means for control students
control_means <- merged_final %>%
filter(trt == 0) %>%
group_by(School_ID) %>%
summarize(across(all_of(cols_to_demean), ~ mean(as.numeric(.), na.rm = TRUE), .names = "mean_{.col}"))
# Step 3: Calculate standard deviations for control students
control_sds <- merged_final %>%
filter(trt == 0) %>%
group_by(School_ID) %>%
summarize(across(all_of(cols_to_demean), ~ sd(as.numeric(.), na.rm = TRUE), .names = "sd_{.col}"))
# Step 4: Merge the means and sds with the original dataframe
merged_final <- merged_final %>%
left_join(control_means, by = "School_ID") %>%
left_join(control_sds, by = "School_ID")
# Step 5: Normalize columns in `cols_to_demean` without changing column names
for (col in cols_to_demean) {
mean_col <- paste0("mean_", col) # Name of the mean column
sd_col <- paste0("sd_", col) # Name of the sd column
merged_final <- merged_final %>%
mutate(!!sym(col) := (as.numeric(!!sym(col)) - !!sym(mean_col)) / !!sym(sd_col))
}
# Step 6: Select relevant columns, removing the mean and sd columns
final_final <- merged_final %>%
select(-starts_with("mean_"), -starts_with("sd_"))
final_final <- final_final %>%
select(-School_Name.x,-School_ID.y,-School_Name.y, -School_Name,-STR.x,-STR.y,-STR,-Form.x,-Form.y,-Data_Avail.x,-Data_Avail.y, -trt)
We then loaded the RCT data and pivoted it by time frame (0, 2, 4, and 8 weeks) and used the Participant_IDs in this dataframe to merge with the academic scores data set.
# Setting the working directory
setwd('/Users/ishangupta/Desktop/Books for Masters/R Practice')
# Loading the data in R
treatment <- read_xlsx("Treatment.xlsx", sheet = "2023Studentlongtrialdata")
control <- read_xlsx("Control.xlsx", sheet = "2023NaiKiambuControl")
six_mfu <- read.csv("six_mfu.csv")
# Renaming the columns in the treatment group
treatment <- treatment %>%
rename(
Age = `student age`,
Form = form,
Gender = `student gender`,
School_ID = `school id`
) %>%
mutate(trt = 1)
# List of school IDs to be dropped
school_ids_to_drop <- c("ANS23_School_31", "ANS23_School_32", "ANS23_School_33",
"ANS23_School_34", "ANS23_School_35", "ANS23_School_36",
"ANS23_School_37")
# Assuming 'treatment' is your dataframe with the treatment group data
# Filtering out rows with specified school IDs
treatment <- treatment %>%
filter(!School_ID %in% school_ids_to_drop)
# Renaming the columns in the control group
control <- control %>%
rename(
School_ID = `school id`
) %>%
mutate(supervisor_id = "NA") %>%
mutate(hub = "NA") %>%
mutate(fellow_gender = "NA") %>%
mutate(fellow_age = "NA") %>%
mutate(trt = 0)
# Creating the RCT data set
rct <- rbind(treatment,control)
rct <- rct %>%
rename(
Participant_ID = participant_id
) %>%
mutate(student_female = case_when(
is.na(Gender) ~ NA_real_,
Gender == "Female" ~ 1,
TRUE ~ 0),
rural_smalltown = case_when(
is.na(Home) ~ NA_real_,
Home == 1 ~ 1,
Home == 2 ~ 1,
TRUE ~ 0),
bigtown_city = case_when(
is.na(Home) ~ NA_real_,
Home == 3 ~ 1,
Home == 4 ~ 1,
TRUE ~ 0)
)
# Converting the columns into numeric
cols_to_numeric <- c(1, 11, 12, 13, 103, 104, 17:101)
rct[cols_to_numeric] <- lapply(rct[cols_to_numeric], as.numeric)
rct <- rct %>%
select(-c(PHQ_Functioning, GAD_Functioning, GAD_Check))
rct <- rct %>%
select(-c(88:100))
phq_questions <- paste("PHQ", 1:8, sep = "_")
gad_questions <- paste("GAD", 1:7, sep = "_")
pcs_questions <- c("PCS_2", "PCS_4", "PCS_5", "PCS_6")
gq_questions <- c("GQ_2", "GQ_4")
ses_questions <- c("SES_2", "SES_3", "SES_11")
rct <- rct %>%
mutate(
across(all_of(phq_questions), ~ 3 - .x),
across(all_of(gad_questions), ~ 3 - .x),
across(all_of(pcs_questions), ~ 3 - .x),
across(all_of(gq_questions), ~ 8 - .x),
across(all_of(ses_questions), ~ 8 - .x)
)
# List of column prefixes for each scale
scales <- c("PCS_", "PHQ_", "GAD_", "MSPSS_", "GQ_", "SWEMWBS_", "SES_")
# Function to calculate total score for each scale
calculate_total_score <- function(df, prefix) {
scale_columns <- grep(paste0("^", prefix), names(df), value = TRUE)
df[paste0(prefix, "Total")] <- rowSums(df[, scale_columns], na.rm = TRUE)
return(df)
}
# Apply the function to calculate total scores
for (scale in scales) {
rct <- calculate_total_score(rct, scale)
}
rct_pivoted <- rct %>%
pivot_wider(names_from = time, values_from = c(starts_with("PHQ_"), starts_with("GAD_"),
starts_with("MSPSS_"), starts_with("GQ_"),
starts_with("SWEMWBS_"), starts_with("SES_"),
starts_with("PCS_")),
names_sep = "_")
six_mfu <- six_mfu %>%
select(participant_id, treatment, time, starts_with("pcs"), starts_with("swembs"),
starts_with("phq"), starts_with("gad"), starts_with("gq"),
starts_with("mspss"), starts_with("ses")) %>%
rename_with(~ gsub("^swembs", "swemwbs", .), starts_with("swembs"))
six_mfu_filtered <- six_mfu %>%
filter(time == 26) %>% # Filter rows where time is 26
select(-phq_functioning, -gad_functioning, -gad_check) %>% # Remove specific columns
rename_with(~ toupper(.), starts_with("pcs")) %>% # Convert pcs columns to uppercase
rename_with(~ toupper(.), starts_with("swemwbs")) %>%
rename_with(~ toupper(.), starts_with("phq")) %>%
rename_with(~ toupper(.), starts_with("gad")) %>%
rename_with(~ toupper(.), starts_with("gq")) %>%
rename_with(~ toupper(.), starts_with("mspss")) %>%
rename_with(~ toupper(.), starts_with("ses")) %>%
rename(Participant_ID = participant_id) %>% # Rename participant_id to Participant_ID
select(-treatment) # Remove the treatment column
six_mfu_filtered <- six_mfu_filtered %>%
mutate(
PHQ_Total = rowSums(select(., starts_with("PHQ")), na.rm = TRUE),
GAD_Total = rowSums(select(., starts_with("GAD")), na.rm = TRUE),
PCS_Total = rowSums(select(., starts_with("PCS")), na.rm = TRUE),
SWEMWBS_Total = rowSums(select(., starts_with("SWEMWBS")), na.rm = TRUE),
GQ_Total = rowSums(select(., starts_with("GQ")), na.rm = TRUE),
MSPSS_Total = rowSums(select(., starts_with("MSPSS")), na.rm = TRUE),
SES_Total = rowSums(select(., starts_with("SES")), na.rm = TRUE)
)
six_mfu_filtered <- six_mfu_filtered %>%
rename_with(~ paste0(., "_26"), starts_with("GAD")) %>% # Add _26 to GAD columns
rename_with(~ paste0(., "_26"), starts_with("PCS")) %>% # Add _26 to PCS columns
rename_with(~ paste0(., "_26"), starts_with("PHQ")) %>% # Add _26 to PHQ columns
rename_with(~ paste0(., "_26"), starts_with("SWEMWBS")) %>% # Add _26 to SWEMWBS columns
rename_with(~ paste0(., "_26"), starts_with("GQ")) %>% # Add _26 to GQ columns
rename_with(~ paste0(., "_26"), starts_with("MSPSS")) %>% # Add _26 to MSPSS columns
rename_with(~ paste0(., "_26"), starts_with("SES")) # Add _26 to SES columns
six_mfu_filtered <- six_mfu_filtered %>%
select(-time)
merged_results <- rct_pivoted %>%
left_join(six_mfu_filtered, by = "Participant_ID")
combined <- merged_results %>%
left_join(final_final, by = "Participant_ID")
combined <- combined %>%
select(-School_ID.x,-School_ID.y,-Form.y) %>%
rename (School_ID = School_ID.x.x, Form = Form.x)
combined <- combined %>%
mutate(clinical = ifelse(PHQ_Total_0 <=14 | GAD_Total_0 <=14, 1, 0))
short <- combined %>%
select(Participant_ID, School_ID, PHQ_Total_0, GAD_Total_0, clinical) %>%
group_by(clinical) %>% summarise (count = n())
Since T1 was right before the intervention, we can treat T1 scores as the baseline and therefore conduct balance test.
# Select columns that start with "T1" and the treatment variable
t1_columns <- grep("^T1", names(combined), value = TRUE)
selected_columns <- c("trt", t1_columns)
# Subset the dataframe to include only the selected columns
balance_df <- combined[, selected_columns]
balance_df <- balance_df %>%
select(-T1_Grade)
# Create a TableOne object
balance_table <- CreateTableOne(vars = t1_columns, strata = "trt", data = balance_df, test = TRUE)
# Convert the TableOne object to a data frame
balance_table_df <- as.data.frame(print(balance_table, smd = TRUE, printToggle = FALSE))
# Format the data frame for stargazer (adjust as needed)
balance_table_df$Variable <- rownames(balance_table_df)
rownames(balance_table_df) <- NULL
# Reorder the columns to fit stargazer input format
balance_table_df <- balance_table_df %>%
select(Variable, everything())
# Use stargazer to display the table
stargazer(balance_table_df, type = "text", summary = FALSE, rownames = FALSE,
title = "Table 1: Balance Table for Term 1 Academic Outcomes",
digits = 2)
##
## Table 1: Balance Table for Term 1 Academic Outcomes
## ==========================================================================
## Variable 0 1 p test SMD
## --------------------------------------------------------------------------
## n 850 866
## T1_ENG (mean (SD)) 0.00 (1.00) 0.02 (1.09) 0.710 0.020
## T1_KIS (mean (SD)) 0.00 (1.00) 0.11 (1.02) 0.052 0.106
## T1_MAT (mean (SD)) 0.00 (1.00) 0.06 (1.06) 0.302 0.056
## T1_BIO (mean (SD)) 0.00 (1.00) 0.14 (1.07) 0.021 0.131
## T1_PHY (mean (SD)) 0.00 (1.00) 0.10 (1.03) 0.095 0.100
## T1_CHE (mean (SD)) 0.00 (1.00) 0.08 (1.05) 0.164 0.076
## T1_HIS (mean (SD)) 0.00 (1.00) 0.08 (1.04) 0.159 0.080
## T1_GEO (mean (SD)) 0.00 (1.00) 0.11 (0.99) 0.080 0.108
## T1_CRE (mean (SD)) 0.00 (1.00) -0.01 (1.04) 0.850 0.011
## T1_HSC (mean (SD)) 0.00 (0.98) 0.13 (1.04) 0.258 0.127
## T1_AGR (mean (SD)) 0.00 (0.99) 0.19 (1.64) 0.167 0.141
## T1_MW (mean (SD)) 0.00 (1.00) 0.05 (0.57) 0.866 0.056
## T1_COM (mean (SD)) 0.00 (0.99) -0.01 (1.19) 0.899 0.012
## T1_FRE (mean (SD)) 0.00 (0.99) -0.03 (1.03) 0.873 0.033
## T1_BST (mean (SD)) 0.00 (0.99) 0.14 (1.06) 0.033 0.133
## T1_No_of_Sub (mean (SD)) 9.96 (1.92) 10.04 (1.75) 0.444 0.041
## T1_KCPE (mean (SD)) 0.00 (1.00) -0.04 (1.04) 0.457 0.044
## T1_VAP (mean (SD)) -11.20 (13.32) -9.76 (11.57) 0.050 0.115
## T1_Avg_Marks (mean (SD)) 0.00 (1.00) 0.10 (1.03) 0.055 0.104
## T1_DEV (mean (SD)) -0.41 (4.12) -0.30 (3.35) 0.609 0.031
## T1_Total_Marks (mean (SD)) 0.00 (1.00) 0.10 (1.05) 0.066 0.102
## T1_Total_Points (mean (SD)) 0.00 (1.00) 0.13 (1.15) 0.046 0.117
## T1_Mean_Points (mean (SD)) 0.00 (1.00) 0.04 (0.97) 0.444 0.046
## T1_STR_POS (mean (SD)) 25.58 (16.09) 26.02 (16.36) 0.616 0.027
## T1_OVR_POS (mean (SD)) 120.44 (82.41) 120.20 (83.98) 0.956 0.003
## T1_CS (mean (SD)) 0.00 (1.00) -0.20 (0.71) 0.427 0.226
## T1_DD (mean (SD)) 0.00 (1.00) -0.42 (1.54) 0.390 0.327
## T1_KSL (mean (SD)) 0.00 (1.00) 0.21 (0.78) 0.597 0.233
## --------------------------------------------------------------------------
school_stats <- merged_final %>%
group_by(School_ID) %>%
summarise(
Mean_T1 = mean(T1_Avg_Marks, na.rm = TRUE),
SD_T1 = sd(T1_Avg_Marks, na.rm = TRUE),
Mean_T2 = mean(T2_Avg_Marks, na.rm = TRUE),
SD_T2 = sd(T2_Avg_Marks, na.rm = TRUE),
Mean_T3 = mean(T3_Avg_Marks, na.rm = TRUE),
SD_T3 = sd(T3_Avg_Marks, na.rm = TRUE)
)
# Use stargazer to present the summary statistics by School_ID
stargazer(school_stats,
type = "text",
summary = FALSE,
title = "Table 2: School-wise Mean and Standard Deviation of Average Marks Across Terms",
digits = 2,
rownames = FALSE)
##
## Table 2: School-wise Mean and Standard Deviation of Average Marks Across Terms
## ===================================================================================================================================
## School_ID Mean_T1 SD_T1 Mean_T2 SD_T2 Mean_T3 SD_T3
## -----------------------------------------------------------------------------------------------------------------------------------
## ANS23_School_17 -0.00651313970601903 0.971935801389596 -0.0277842496996569 0.932567207786478 -0.0316167920505773 0.947113596803313
## ANS23_School_18 0.0907202805645688 0.91180438810084 0.0820121279239994 0.952576578182148 0.0542900329251336 0.953673279746632
## ANS23_School_25 0.0362225979087015 0.972663077088342 -0.00719284011220962 1.05279819743362 0.0301510452608446 1.00365493158475
## ANS23_School_5 -0.0207080807112431 1.00879878567722 -0.0465681393745451 0.993421073159136 -0.047542718891748 1.02143797066025
## ANS23_School_7 0.0813721641956309 1.1082482237558 0.0987877545908999 1.06652600870861 0.0852487666406166 1.03539466253287
## 0.497049467447658 1.29847414317251 0.386719861870872 1.29776183111221 NaN NA
## -----------------------------------------------------------------------------------------------------------------------------------
From the balance table above, we can note that majority of the academic outcomes for T1 are balanced (that is, there is no statistically significant difference between the treatment and control group). The only exceptions are BIO (Biology) and Total_Mean_Points.
Moreover, a small effect size (SMD) indicates that the differences in baseline characteristics between the control and the treatment group are minimal. This suggests that the two groups are fairly similar in terms of these characteristics at the baseline of the study.
# Reshape data to long format for easier plotting
combined_long <- combined %>%
select(School_ID, T1_Avg_Marks, T2_Avg_Marks, T3_Avg_Marks) %>%
pivot_longer(cols = starts_with("T"),
names_to = "Term",
values_to = "Avg_Marks")
# Density plot for T1, T2, and T3 average marks
ggplot(combined_long, aes(x = Avg_Marks, fill = Term, color = Term)) +
geom_density(alpha = 0.5) +
labs(title = "Density Plot of Average Marks Across Terms (Demeaned Scores)",
x = "Average Marks",
y = "Density") +
theme_minimal()
# Reshape data to long format for easier plotting
temp_long <- temp %>%
select(School_ID, T1_Avg_Marks, T2_Avg_Marks, T3_Avg_Marks) %>%
pivot_longer(cols = starts_with("T"),
names_to = "Term",
values_to = "Avg_Marks")
# Density plot for T1, T2, and T3 average marks in merged_final
ggplot(temp_long, aes(x = Avg_Marks, fill = Term, color = Term)) +
geom_density(alpha = 0.5) +
labs(title = "Density Plot of Average Marks Across Terms (Actual Scores)",
x = "Average Marks",
y = "Density") +
theme_minimal()
# Create the first plot: SES_Total_26 vs T3_Avg_Marks
plot_t2_t1 <- ggplot(temp, aes(x = T1_Avg_Marks, y = T2_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "T1 Avg Marks vs T2 Avg Marks",
x = "T1 Avg Marks",
y = "T2 Avg Marks") +
theme_minimal()
print(plot_t2_t1)
# Create the first plot: SES_Total_26 vs T3_Avg_Marks
plot_t2_t3 <- ggplot(temp, aes(x = T2_Avg_Marks, y = T3_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "T2 Avg Marks vs T3 Avg Marks",
x = "T2 Avg Marks",
y = "T3 Avg Marks") +
theme_minimal()
print(plot_t2_t3)
# Create the first plot: SES_Total_26 vs T3_Avg_Marks
plot_t1_t3 <- ggplot(temp, aes(x = T1_Avg_Marks, y = T3_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "T1 Avg Marks vs T3 Avg Marks",
x = "T1 Avg Marks",
y = "T3 Avg Marks") +
theme_minimal()
print(plot_t1_t3)
# Create the first plot: PHQ_Total_0 vs T1_Avg_Marks
plot1 <- ggplot(combined, aes(x = PHQ_Total_0, y = T1_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
labs(title = "PHQ_Total_0 vs T1_Avg_Marks",
x = "PHQ_Total_0",
y = "T1_Avg_Marks") +
theme_minimal()
# Create the second plot: GAD_Total_0 vs T1_Avg_Marks
plot2 <- ggplot(combined, aes(x = GAD_Total_0, y = T1_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "GAD_Total_0 vs T1_Avg_Marks",
x = "GAD_Total_0",
y = "T1_Avg_Marks") +
theme_minimal()
# Combine the plots side by side with the same y-axis scale
combined_plot <- plot1 + plot2 + plot_layout(guides = "collect")
# Print the combined plot
print(combined_plot)
# Define models for PHQ without controls or interaction term
model_phq_t1 <- lm(T1_Avg_Marks ~ PHQ_Total_0 + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_phq_t1_v1 <- lm(T1_Avg_Marks ~ trt + T1_No_of_Sub, data = combined, cluster = ~School_ID)
# Same for GAD
model_gad_t1 <- lm(T1_Avg_Marks ~ GAD_Total_0 + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_gad_t1_v1 <- lm(T1_Avg_Marks ~ trt + T1_No_of_Sub, data = combined, cluster = ~School_ID)
stargazer(model_phq_t1, model_phq_t1_v1 , model_gad_t1, model_gad_t1_v1, type = "text", title = "Table 3: Term 1 Average Scores Regressed on PHQ and GAD baseline scores", notes = "Includes School fixed effects")
##
## Table 3: Term 1 Average Scores Regressed on PHQ and GAD baseline scores
## =======================================================================
## Dependent variable:
## ---------------------------------------
## T1_Avg_Marks
## (1) (2) (3) (4)
## -----------------------------------------------------------------------
## PHQ_Total_0 0.021***
## (0.005)
##
## trt 0.097* 0.097*
## (0.052) (0.052)
##
## GAD_Total_0 0.014***
## (0.005)
##
## T1_No_of_Sub 0.171*** 0.171*** 0.170*** 0.171***
## (0.014) (0.015) (0.015) (0.015)
##
## Constant -1.977*** -1.710*** -1.831*** -1.710***
## (0.163) (0.150) (0.160) (0.150)
##
## -----------------------------------------------------------------------
## Observations 1,373 1,373 1,373 1,373
## R2 0.105 0.095 0.097 0.095
## Adjusted R2 0.104 0.093 0.096 0.093
## Residual Std. Error (df = 1370) 0.959 0.964 0.963 0.964
## F Statistic (df = 2; 1370) 80.485*** 71.632*** 73.717*** 71.632***
## =======================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
# Create scatter plots with regression lines for SWEMWBS, PCS, and GQ
plot_swemwbs <- ggplot(combined, aes(x = SWEMWBS_Total_0, y = T1_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "SWEMWBS_Total_0 vs T1_Avg_Marks",
x = "SWEMWBS_Total_0",
y = "T1_Avg_Marks") +
theme_minimal()
plot_pcs <- ggplot(combined, aes(x = PCS_Total_0, y = T1_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "purple") +
labs(title = "PCS_Total_0 vs T1_Avg_Marks",
x = "PCS_Total_0",
y = "T1_Avg_Marks") +
theme_minimal()
plot_gq <- ggplot(combined, aes(x = GQ_Total_0, y = T1_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "orange") +
labs(title = "GQ_Total_0 vs T1_Avg_Marks",
x = "GQ_Total_0",
y = "T1_Avg_Marks") +
theme_minimal()
# Combine the plots side by side
combined_plots <- plot_swemwbs + plot_pcs + plot_gq +
plot_layout(guides = "collect")
# Print the combined plots
print(combined_plots)
model_swemwbs_t1 <- lm(T1_Avg_Marks ~ SWEMWBS_Total_0 + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_swemwbs_t1_v1 <- lm(T1_Avg_Marks ~ trt + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_pcs_t1 <- lm(T1_Avg_Marks ~ PCS_Total_0 + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_pcs_t1_v1 <- lm(T1_Avg_Marks ~ trt + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_gq_t1 <- lm(T1_Avg_Marks ~ GQ_Total_0 + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_gq_t1_v1 <- lm(T1_Avg_Marks ~ trt + T1_No_of_Sub, data = combined, cluster = ~School_ID)
stargazer(model_swemwbs_t1, model_swemwbs_t1_v1 , model_pcs_t1, model_pcs_t1_v1, type = "text", title = "Table 4: Term 1 Average Scores Regressed on SWEMWBS and PCS baseline scores", notes = "Includes School fixed effects")
##
## Table 4: Term 1 Average Scores Regressed on SWEMWBS and PCS baseline scores
## =======================================================================
## Dependent variable:
## ---------------------------------------
## T1_Avg_Marks
## (1) (2) (3) (4)
## -----------------------------------------------------------------------
## SWEMWBS_Total_0 0.020***
## (0.004)
##
## trt 0.097* 0.097*
## (0.052) (0.052)
##
## PCS_Total_0 -0.062***
## (0.009)
##
## T1_No_of_Sub 0.170*** 0.171*** 0.169*** 0.171***
## (0.014) (0.015) (0.014) (0.015)
##
## Constant -2.091*** -1.710*** -1.210*** -1.710***
## (0.167) (0.150) (0.159) (0.150)
##
## -----------------------------------------------------------------------
## Observations 1,373 1,373 1,373 1,373
## R2 0.111 0.095 0.125 0.095
## Adjusted R2 0.110 0.093 0.124 0.093
## Residual Std. Error (df = 1370) 0.956 0.964 0.948 0.964
## F Statistic (df = 2; 1370) 85.469*** 71.632*** 97.745*** 71.632***
## =======================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
stargazer(model_gq_t1, model_gq_t1_v1, type = "text", title = "Table 5: Term 1 Average Scores Regressed on GQ baseline scores", notes = "Includes School fixed effects")
##
## Table 5: Term 1 Average Scores Regressed on GQ baseline scores
## ==============================================================
## Dependent variable:
## ------------------------------
## T1_Avg_Marks
## (1) (2)
## --------------------------------------------------------------
## GQ_Total_0 0.011***
## (0.004)
##
## trt 0.097*
## (0.052)
##
## T1_No_of_Sub 0.172*** 0.171***
## (0.014) (0.015)
##
## Constant -1.975*** -1.710***
## (0.180) (0.150)
##
## --------------------------------------------------------------
## Observations 1,373 1,373
## R2 0.098 0.095
## Adjusted R2 0.097 0.093
## Residual Std. Error (df = 1370) 0.962 0.964
## F Statistic (df = 2; 1370) 74.642*** 71.632***
## ==============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
# Create the first plot: SES_Total_0 vs T1_Avg_Marks
plot_ses <- ggplot(combined, aes(x = SES_Total_0, y = T1_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "SES_Total_0 vs T1_Avg_Marks",
x = "SES_Total_0",
y = "T1_Avg_Marks") +
theme_minimal()
# Create the second plot: MSPSS_Total_0 vs T1_Avg_Marks
plot_mpss <- ggplot(combined, aes(x = MSPSS_Total_0, y = T1_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "purple") +
labs(title = "MSPSS_Total_0 vs T1_Avg_Marks",
x = "MSPSS_Total_0",
y = "T1_Avg_Marks") +
theme_minimal()
# Combine the plots side by side with the same y-axis scale
combined_plot_ses_mpss <- plot_ses + plot_mpss + plot_layout(guides = "collect")
# Print the combined plot
print(combined_plot_ses_mpss)
model_mspss_t1 <- lm(T1_Avg_Marks ~ MSPSS_Total_0 + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_mspss_t1_v1 <- lm(T1_Avg_Marks ~ trt + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_ses_t1 <- lm(T1_Avg_Marks ~ SES_Total_0 + T1_No_of_Sub, data = combined, cluster = ~School_ID)
model_ses_t1_v1 <- lm(T1_Avg_Marks ~ trt + T1_No_of_Sub, data = combined, cluster = ~School_ID)
stargazer(model_mspss_t1, model_mspss_t1_v1 , model_ses_t1, model_ses_t1_v1, type = "text", title = "Table 6: Term 1 Average Scores Regressed on MSPSS and SES baseline scores", notes = "Includes School fixed effects")
##
## Table 6: Term 1 Average Scores Regressed on MSPSS and SES baseline scores
## =======================================================================
## Dependent variable:
## ---------------------------------------
## T1_Avg_Marks
## (1) (2) (3) (4)
## -----------------------------------------------------------------------
## MSPSS_Total_0 0.001
## (0.002)
##
## trt 0.097* 0.097*
## (0.052) (0.052)
##
## SES_Total_0 0.007***
## (0.002)
##
## T1_No_of_Sub 0.171*** 0.171*** 0.165*** 0.171***
## (0.015) (0.015) (0.014) (0.015)
##
## Constant -1.692*** -1.710*** -2.075*** -1.710***
## (0.167) (0.150) (0.171) (0.150)
##
## -----------------------------------------------------------------------
## Observations 1,373 1,373 1,373 1,373
## R2 0.092 0.095 0.107 0.095
## Adjusted R2 0.091 0.093 0.105 0.093
## Residual Std. Error (df = 1370) 0.966 0.964 0.958 0.964
## F Statistic (df = 2; 1370) 69.797*** 71.632*** 81.900*** 71.632***
## =======================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
# Define the base model and incremental models
model_fe_1 <- lm(T2_Avg_Marks ~ trt, data = combined, cluster = ~School_ID)
model_fe_2 <- lm(T2_Avg_Marks ~ trt + student_female, data = combined, cluster = ~School_ID)
model_fe_3 <- lm(T2_Avg_Marks ~ trt + student_female + rural_smalltown, data = combined, cluster = ~School_ID)
model_fe_4 <- lm(T2_Avg_Marks ~ trt + student_female + rural_smalltown + bigtown_city, data = combined, cluster = ~School_ID)
model_fe_5 <- lm(T2_Avg_Marks ~ trt + student_female + rural_smalltown + bigtown_city + as.factor(Form), data = combined, cluster = ~School_ID)
# Create the stargazer table
stargazer(model_fe_1, model_fe_2, model_fe_3, model_fe_4, model_fe_5,
type = "text",
title = "Table 7: Incremental Regression Results for Avg Marks in term 2 on Shamiri treatment",
covariate.labels = c("Treatment", "Female Student", "Rural/Small Town", "Big Town/City",
"Form 2 Dummy", "Form 3 Dummy", "Form 4 Dummy"),
dep.var.labels = "Average Marks (T2)",
omit.stat = c("f", "ser"), # Adjust to show/hide specific statistics
star.cutoffs = c(0.1, 0.05, 0.01), # Adjust significance levels for stars
no.space = TRUE, notes = "Includes School fixed effects")
##
## Table 7: Incremental Regression Results for Avg Marks in term 2 on Shamiri treatment
## ============================================================
## Dependent variable:
## -------------------------------------------
## Average Marks (T2)
## (1) (2) (3) (4) (5)
## ------------------------------------------------------------
## Treatment 0.056 0.055 0.048 0.048 -0.001
## (0.055) (0.055) (0.055) (0.055) (0.054)
## Female Student 0.022 0.001 -0.002 0.062
## (0.056) (0.056) (0.056) (0.056)
## Rural/Small Town 0.235*** 0.365*** 0.319*
## (0.056) (0.128) (0.165)
## Big Town/City 0.147 0.124
## (0.131) (0.167)
## Form 2 Dummy -0.485***
## (0.065)
## Form 3 Dummy -0.673***
## (0.070)
## Form 4 Dummy 0.056
## (0.176)
## Constant -0.000 -0.009 -0.133** -0.262** 0.120
## (0.040) (0.046) (0.054) (0.126) (0.171)
## ------------------------------------------------------------
## Observations 1,346 1,346 1,346 1,346 1,301
## R2 0.001 0.001 0.014 0.015 0.091
## Adjusted R2 0.00001 -0.001 0.012 0.012 0.086
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
# Create the first plot: GAD_Total_8 vs T2_Avg_Marks
plot1 <- ggplot(combined, aes(x = GAD_Total_8, y = T2_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
labs(title = "GAD_Total_8 vs T2_Avg_Marks",
x = "GAD_Total_8",
y = "T2_Avg_Marks") +
theme_minimal()
# Create the second plot: PHQ_Total_8 vs T2_Avg_Marks
plot2 <- ggplot(combined, aes(x = PHQ_Total_8, y = T2_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "PHQ_Total_8 vs T2_Avg_Marks",
x = "PHQ_Total_8",
y = "T2_Avg_Marks") +
theme_minimal()
# Combine the plots side by side with the same y-axis scale
combined_plot <- plot1 + plot2 + plot_layout(guides = "collect")
# Print the combined plot
print(combined_plot)
model_phq_t2 <- lm(T2_Avg_Marks ~ PHQ_Total_8 + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_phq_t2_v1 <- lm(T2_Avg_Marks ~ trt + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_gad_t2 <- lm(T2_Avg_Marks ~ GAD_Total_8 + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_gad_t2_v1 <- lm(T2_Avg_Marks ~ trt + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
stargazer(model_phq_t2, model_phq_t2_v1 , model_gad_t2, model_gad_t2_v1, type = "text", title = "Table 8: Term 1 Average Scores Regressed on GAD and PHQ endline scores", notes = "Includes School fixed effects")
##
## Table 8: Term 1 Average Scores Regressed on GAD and PHQ endline scores
## ===========================================================================
## Dependent variable:
## -------------------------------------------
## T2_Avg_Marks
## (1) (2) (3) (4)
## ---------------------------------------------------------------------------
## PHQ_Total_8 0.011***
## (0.002)
##
## trt -0.013 -0.013
## (0.032) (0.032)
##
## GAD_Total_8 0.010***
## (0.002)
##
## T2_No_of_Sub -0.009 -0.006 -0.008 -0.006
## (0.009) (0.009) (0.009) (0.009)
##
## T1_Avg_Marks 0.823*** 0.831*** 0.825*** 0.831***
## (0.017) (0.017) (0.017) (0.017)
##
## Constant -0.103 0.038 -0.074 0.038
## (0.092) (0.091) (0.092) (0.091)
##
## ---------------------------------------------------------------------------
## Observations 1,330 1,330 1,330 1,330
## R2 0.675 0.668 0.672 0.668
## Adjusted R2 0.674 0.667 0.671 0.667
## Residual Std. Error (df = 1326) 0.580 0.587 0.583 0.587
## F Statistic (df = 3; 1326) 917.726*** 888.114*** 905.462*** 888.114***
## ===========================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
# Create scatter plots with regression lines for SWEMWBS_8, PCS_8, and GQ_8
plot_swemwbs_8 <- ggplot(combined, aes(x = SWEMWBS_Total_8, y = T2_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "SWEMWBS_Total_8 vs T2_Avg_Marks",
x = "SWEMWBS_Total_8",
y = "T2_Avg_Marks") +
theme_minimal()
plot_pcs_8 <- ggplot(combined, aes(x = PCS_Total_8, y = T2_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "purple") +
labs(title = "PCS_Total_8 vs T2_Avg_Marks",
x = "PCS_Total_8",
y = "T2_Avg_Marks") +
theme_minimal()
plot_gq_8 <- ggplot(combined, aes(x = GQ_Total_8, y = T2_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "orange") +
labs(title = "GQ_Total_8 vs T2_Avg_Marks",
x = "GQ_Total_8",
y = "T2_Avg_Marks") +
theme_minimal()
# Combine the plots side by side
combined_plots_8 <- plot_swemwbs_8 + plot_pcs_8 + plot_gq_8 +
plot_layout(guides = "collect")
# Print the combined plots
print(combined_plots_8)
model_swemwbs_t2 <- lm(T2_Avg_Marks ~ SWEMWBS_Total_8 + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_swemwbs_t2_v1 <- lm(T2_Avg_Marks ~ trt + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_pcs_t2 <- lm(T2_Avg_Marks ~ PCS_Total_8 + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_pcs_t2_v1 <- lm(T2_Avg_Marks ~ trt + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_gq_t2 <- lm(T2_Avg_Marks ~ GQ_Total_8 + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_gq_t2_v1 <- lm(T2_Avg_Marks ~ trt + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
stargazer(model_swemwbs_t2, model_swemwbs_t2_v1 , model_pcs_t2, model_pcs_t2_v1, type = "text", title = "Table 9: Term 2 Average Scores Regressed on SWEMWBS and PCS endline scores", notes = "Includes School fixed effects")
##
## Table 9: Term 2 Average Scores Regressed on SWEMWBS and PCS endline scores
## ===========================================================================
## Dependent variable:
## -------------------------------------------
## T2_Avg_Marks
## (1) (2) (3) (4)
## ---------------------------------------------------------------------------
## SWEMWBS_Total_8 0.008***
## (0.002)
##
## trt -0.013 -0.013
## (0.032) (0.032)
##
## PCS_Total_8 0.007
## (0.005)
##
## T2_No_of_Sub -0.007 -0.006 -0.007 -0.006
## (0.009) (0.009) (0.009) (0.009)
##
## T1_Avg_Marks 0.824*** 0.831*** 0.832*** 0.831***
## (0.017) (0.017) (0.017) (0.017)
##
## Constant -0.118 0.038 -0.00004 0.038
## (0.093) (0.091) (0.092) (0.091)
##
## ---------------------------------------------------------------------------
## Observations 1,330 1,330 1,330 1,330
## R2 0.674 0.668 0.668 0.668
## Adjusted R2 0.673 0.667 0.667 0.667
## Residual Std. Error (df = 1326) 0.581 0.587 0.586 0.587
## F Statistic (df = 3; 1326) 914.357*** 888.114*** 890.075*** 888.114***
## ===========================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
stargazer(model_gq_t2, model_gq_t2_v1, type = "text", title = "Table 10: Term 2 Average Scores Regressed on GQ endline scores", notes = "Includes School fixed effects")
##
## Table 10: Term 2 Average Scores Regressed on GQ endline scores
## ==============================================================
## Dependent variable:
## ------------------------------
## T2_Avg_Marks
## (1) (2)
## --------------------------------------------------------------
## GQ_Total_8 0.006***
## (0.001)
##
## trt -0.013
## (0.032)
##
## T2_No_of_Sub -0.008 -0.006
## (0.009) (0.009)
##
## T1_Avg_Marks 0.828*** 0.831***
## (0.017) (0.017)
##
## Constant -0.097 0.038
## (0.093) (0.091)
##
## --------------------------------------------------------------
## Observations 1,330 1,330
## R2 0.672 0.668
## Adjusted R2 0.672 0.667
## Residual Std. Error (df = 1326) 0.582 0.587
## F Statistic (df = 3; 1326) 906.725*** 888.114***
## ==============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
# Create the first plot: SES_Total_8 vs T2_Avg_Marks
plot_ses_8 <- ggplot(combined, aes(x = SES_Total_8, y = T2_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "SES_Total_8 vs T2_Avg_Marks",
x = "SES_Total_8",
y = "T2_Avg_Marks") +
theme_minimal()
# Create the second plot: MSPSS_Total_8 vs T2_Avg_Marks
plot_mpss_8 <- ggplot(combined, aes(x = MSPSS_Total_8, y = T2_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "purple") +
labs(title = "MSPSS_Total_8 vs T2_Avg_Marks",
x = "MSPSS_Total_8",
y = "T2_Avg_Marks") +
theme_minimal()
# Combine the plots side by side with the same y-axis scale
combined_plot_ses_mpss_8 <- plot_ses_8 + plot_mpss_8 + plot_layout(guides = "collect")
# Print the combined plot
print(combined_plot_ses_mpss_8)
model_mspss_t2 <- lm(T2_Avg_Marks ~ MSPSS_Total_8 + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_mspss_t2_v1 <- lm(T2_Avg_Marks ~ trt + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_ses_t2 <- lm(T2_Avg_Marks ~ MSPSS_Total_8 + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
model_ses_t2_v1 <- lm(T2_Avg_Marks ~ trt + T2_No_of_Sub + T1_Avg_Marks, data = combined, cluster = ~School_ID)
stargazer(model_mspss_t2, model_mspss_t2_v1 , model_ses_t2, model_ses_t2_v1, type = "text", title = "Table 11: Term 2 Average Scores Regressed on MSPSS and SES endline scores", notes = "Includes School fixed effects")
##
## Table 11: Term 2 Average Scores Regressed on MSPSS and SES endline scores
## ===========================================================================
## Dependent variable:
## -------------------------------------------
## T2_Avg_Marks
## (1) (2) (3) (4)
## ---------------------------------------------------------------------------
## MSPSS_Total_8 0.004*** 0.004***
## (0.001) (0.001)
##
## trt -0.013 -0.013
## (0.032) (0.032)
##
## T2_No_of_Sub -0.008 -0.006 -0.008 -0.006
## (0.009) (0.009) (0.009) (0.009)
##
## T1_Avg_Marks 0.827*** 0.831*** 0.827*** 0.831***
## (0.017) (0.017) (0.017) (0.017)
##
## Constant -0.076 0.038 -0.076 0.038
## (0.093) (0.091) (0.093) (0.091)
##
## ---------------------------------------------------------------------------
## Observations 1,330 1,330 1,330 1,330
## R2 0.671 0.668 0.671 0.668
## Adjusted R2 0.671 0.667 0.671 0.667
## Residual Std. Error (df = 1326) 0.583 0.587 0.583 0.587
## F Statistic (df = 3; 1326) 902.941*** 888.114*** 902.941*** 888.114***
## ===========================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
# Create the first plot: GAD_Total_26 vs T3_Avg_Marks
plot1_t3 <- ggplot(combined, aes(x = GAD_Total_26, y = T3_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
labs(title = "GAD_Total_26 vs T3_Avg_Marks",
x = "GAD_Total_26",
y = "T3_Avg_Marks") +
theme_minimal()
# Create the second plot: PHQ_Total_26 vs T3_Avg_Marks
plot2_t3 <- ggplot(combined, aes(x = PHQ_Total_26, y = T3_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "PHQ_Total_26 vs T3_Avg_Marks",
x = "PHQ_Total_26",
y = "T3_Avg_Marks") +
theme_minimal()
# Combine the plots side by side with the same y-axis scale
combined_plot_t3 <- plot1_t3 + plot2_t3 + plot_layout(guides = "collect")
# Print the combined plot
print(combined_plot_t3)
model_phq_t3 <- lm(T3_Avg_Marks ~ PHQ_Total_26 + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_phq_t3_v1 <- lm(T3_Avg_Marks ~ trt + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_gad_t3 <- lm(T3_Avg_Marks ~ GAD_Total_26 + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_gad_t3_v1 <- lm(T3_Avg_Marks ~ trt + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
stargazer(model_phq_t3, model_phq_t3_v1 , model_gad_t3, model_gad_t3_v1, type = "text", title = "Table 12: Term 3 Average Scores Regressed on GAD and PHQ 6MFU scores", notes = "Includes School fixed effects")
##
## Table 12: Term 3 Average Scores Regressed on GAD and PHQ 6MFU scores
## =========================================================================================================================
## Dependent variable:
## -----------------------------------------------------------------------------------------------------
## T3_Avg_Marks
## (1) (2) (3) (4)
## -------------------------------------------------------------------------------------------------------------------------
## PHQ_Total_26 -0.002
## (0.004)
##
## trt -0.025 -0.025
## (0.027) (0.027)
##
## GAD_Total_26 0.004
## (0.004)
##
## T3_No_of_Sub 0.032*** 0.024** 0.032*** 0.024**
## (0.010) (0.009) (0.010) (0.009)
##
## T1_Avg_Marks 0.116*** 0.142*** 0.116*** 0.142***
## (0.030) (0.027) (0.030) (0.027)
##
## T2_Avg_Marks 0.770*** 0.729*** 0.770*** 0.729***
## (0.028) (0.024) (0.028) (0.024)
##
## Constant -0.283*** -0.183** -0.294*** -0.183**
## (0.094) (0.085) (0.094) (0.085)
##
## -------------------------------------------------------------------------------------------------------------------------
## Observations 981 1,266 981 1,266
## R2 0.785 0.759 0.785 0.759
## Adjusted R2 0.784 0.759 0.784 0.759
## Residual Std. Error 0.460 (df = 976) 0.484 (df = 1261) 0.459 (df = 976) 0.484 (df = 1261)
## F Statistic 891.065*** (df = 4; 976) 994.356*** (df = 4; 1261) 891.959*** (df = 4; 976) 994.356*** (df = 4; 1261)
## =========================================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
# Create scatter plots with regression lines for SWEMWBS_26, PCS_26, and GQ_26
plot_swemwbs_26 <- ggplot(combined, aes(x = SWEMWBS_Total_26, y = T3_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "SWEMWBS_Total_26 vs T3_Avg_Marks",
x = "SWEMWBS_Total_26",
y = "T3_Avg_Marks") +
theme_minimal()
plot_pcs_26 <- ggplot(combined, aes(x = PCS_Total_26, y = T3_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "purple") +
labs(title = "PCS_Total_26 vs T3_Avg_Marks",
x = "PCS_Total_26",
y = "T3_Avg_Marks") +
theme_minimal()
plot_gq_26 <- ggplot(combined, aes(x = GQ_Total_26, y = T3_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "orange") +
labs(title = "GQ_Total_26 vs T3_Avg_Marks",
x = "GQ_Total_26",
y = "T3_Avg_Marks") +
theme_minimal()
# Combine the plots side by side
combined_plots_26 <- plot_swemwbs_26 + plot_pcs_26 + plot_gq_26 +
plot_layout(guides = "collect")
# Print the combined plots
print(combined_plots_26)
model_swemwbs_t3 <- lm(T3_Avg_Marks ~ SWEMWBS_Total_26 + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_swemwbs_t3_v1 <- lm(T3_Avg_Marks ~ trt + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_pcs_t3 <- lm(T3_Avg_Marks ~ PCS_Total_26 + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_pcs_t3_v1 <- lm(T3_Avg_Marks ~ + trt + T3_No_of_Sub, data = combined, cluster = ~School_ID)
model_gq_t3 <- lm(T3_Avg_Marks ~ GQ_Total_26 + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_gq_t3_v1 <- lm(T3_Avg_Marks ~ trt + T3_No_of_Sub, data = combined, cluster = ~School_ID)
stargazer(model_swemwbs_t3, model_swemwbs_t3_v1 , model_pcs_t3, model_pcs_t3_v1, type = "text", title = "Table 13: Term 3 Average Scores Regressed on SWEMWBS and PCS 6MFU scores", notes = "Includes School fixed effects")
##
## Table 13: Term 3 Average Scores Regressed on SWEMWBS and PCS 6MFU scores
## ========================================================================================================================
## Dependent variable:
## ----------------------------------------------------------------------------------------------------
## T3_Avg_Marks
## (1) (2) (3) (4)
## ------------------------------------------------------------------------------------------------------------------------
## SWEMWBS_Total_26 0.003**
## (0.001)
##
## trt -0.025 -0.019
## (0.027) (0.051)
##
## PCS_Total_26 0.003
## (0.002)
##
## T3_No_of_Sub 0.033*** 0.024** 0.032*** 0.191***
## (0.010) (0.009) (0.010) (0.015)
##
## T1_Avg_Marks 0.117*** 0.142*** 0.116***
## (0.030) (0.027) (0.030)
##
## T2_Avg_Marks 0.764*** 0.729*** 0.766***
## (0.028) (0.024) (0.028)
##
## Constant -0.344*** -0.183** -0.323*** -1.685***
## (0.097) (0.085) (0.096) (0.135)
##
## ------------------------------------------------------------------------------------------------------------------------
## Observations 981 1,266 981 1,316
## R2 0.786 0.759 0.786 0.113
## Adjusted R2 0.785 0.759 0.785 0.112
## Residual Std. Error 0.459 (df = 976) 0.484 (df = 1261) 0.459 (df = 976) 0.932 (df = 1313)
## F Statistic 896.032*** (df = 4; 976) 994.356*** (df = 4; 1261) 893.767*** (df = 4; 976) 84.049*** (df = 2; 1313)
## ========================================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
stargazer(model_gq_t3, model_gq_t3_v1, type = "text", title = "Table 14: Term 3 Average Scores Regressed on GQ 6MFU scores", notes = "Includes School fixed effects")
##
## Table 14: Term 3 Average Scores Regressed on GQ 6MFU scores
## =====================================================================
## Dependent variable:
## -------------------------------------------------
## T3_Avg_Marks
## (1) (2)
## ---------------------------------------------------------------------
## GQ_Total_26 0.002**
## (0.001)
##
## trt -0.019
## (0.051)
##
## T3_No_of_Sub 0.033*** 0.191***
## (0.010) (0.015)
##
## T1_Avg_Marks 0.117***
## (0.030)
##
## T2_Avg_Marks 0.764***
## (0.028)
##
## Constant -0.355*** -1.685***
## (0.097) (0.135)
##
## ---------------------------------------------------------------------
## Observations 981 1,316
## R2 0.786 0.113
## Adjusted R2 0.785 0.112
## Residual Std. Error 0.458 (df = 976) 0.932 (df = 1313)
## F Statistic 898.160*** (df = 4; 976) 84.049*** (df = 2; 1313)
## =====================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
# Create the first plot: SES_Total_26 vs T3_Avg_Marks
plot_ses_26 <- ggplot(combined, aes(x = SES_Total_26, y = T3_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "SES_Total_26 vs T3_Avg_Marks",
x = "SES_Total_26",
y = "T3_Avg_Marks") +
theme_minimal()
# Create the second plot: MSPSS_Total_26 vs T3_Avg_Marks
plot_mpss_26 <- ggplot(combined, aes(x = MSPSS_Total_26, y = T3_Avg_Marks)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "purple") +
labs(title = "MSPSS_Total_26 vs T3_Avg_Marks",
x = "MSPSS_Total_26",
y = "T3_Avg_Marks") +
theme_minimal()
# Combine the plots side by side with the same y-axis scale
combined_plot_ses_mpss_26 <- plot_ses_26 + plot_mpss_26 + plot_layout(guides = "collect")
# Print the combined plot
print(combined_plot_ses_mpss_26)
model_mspss_t3 <- lm(T3_Avg_Marks ~ MSPSS_Total_26 + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_mspss_t3_v1 <- lm(T3_Avg_Marks ~ trt + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_ses_t3 <- lm(T3_Avg_Marks ~ SES_Total_26 + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
model_ses_t3_v1 <- lm(T3_Avg_Marks ~ trt + T3_No_of_Sub + T1_Avg_Marks + T2_Avg_Marks, data = combined, cluster = ~School_ID)
stargazer(model_mspss_t3, model_mspss_t3_v1 , model_ses_t3, model_ses_t3_v1, type = "text", title = "Table 15: Term 3 Average Scores Regressed on MSPSS and GQ 6MFU scores", notes = "Includes School fixed effects")
##
## Table 15: Term 3 Average Scores Regressed on MSPSS and GQ 6MFU scores
## =========================================================================================================================
## Dependent variable:
## -----------------------------------------------------------------------------------------------------
## T3_Avg_Marks
## (1) (2) (3) (4)
## -------------------------------------------------------------------------------------------------------------------------
## MSPSS_Total_26 0.002**
## (0.001)
##
## trt -0.025 -0.025
## (0.027) (0.027)
##
## SES_Total_26 0.001*
## (0.0004)
##
## T3_No_of_Sub 0.032*** 0.024** 0.032*** 0.024**
## (0.010) (0.009) (0.010) (0.009)
##
## T1_Avg_Marks 0.119*** 0.142*** 0.117*** 0.142***
## (0.030) (0.027) (0.030) (0.027)
##
## T2_Avg_Marks 0.764*** 0.729*** 0.765*** 0.729***
## (0.028) (0.024) (0.028) (0.024)
##
## Constant -0.334*** -0.183** -0.328*** -0.183**
## (0.096) (0.085) (0.096) (0.085)
##
## -------------------------------------------------------------------------------------------------------------------------
## Observations 981 1,266 981 1,266
## R2 0.786 0.759 0.786 0.759
## Adjusted R2 0.785 0.759 0.785 0.759
## Residual Std. Error 0.458 (df = 976) 0.484 (df = 1261) 0.459 (df = 976) 0.484 (df = 1261)
## F Statistic 896.070*** (df = 4; 976) 994.356*** (df = 4; 1261) 894.323*** (df = 4; 976) 994.356*** (df = 4; 1261)
## =========================================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Includes School fixed effects
clinical <- combined %>%
filter(clinical == 1)
nonclinical <- combined %>%
filter(clinical == 0)
# Fit the linear models
model_lm_clinical <- lm(T2_Avg_Marks ~ trt + T2_No_of_Sub,
data = clinical, cluster = ~School_ID)
model_lm_nonclinical <- lm(T2_Avg_Marks ~ trt + T2_No_of_Sub,
data = nonclinical, cluster = ~School_ID)
# Generate the stargazer output
stargazer(model_lm_clinical, model_lm_nonclinical,
type = "text",
title = "Table 16: Correlation of Treatment on Avg Marks of Term 2 for Clinical and Non-Clinical sample",
covariate.labels = c("Treatment", "Number of Subjects"),
dep.var.labels = "T2_Avg_Marks",
column.labels = c("Clinical", "Non-Clinical"),
no.space = TRUE,
align = TRUE,
digits = 3,
out = "Impact_Treatment_T2.txt")
##
## Table 16: Correlation of Treatment on Avg Marks of Term 2 for Clinical and Non-Clinical sample
## ===================================================================
## Dependent variable:
## -----------------------------------------------
## T2
## Clinical Non-Clinical
## (1) (2)
## -------------------------------------------------------------------
## Treatment 0.080 0.002
## (0.069) (0.085)
## Number of Subjects 0.138*** 0.129***
## (0.019) (0.023)
## Constant -1.400*** -1.096***
## (0.183) (0.232)
## -------------------------------------------------------------------
## Observations 790 556
## R2 0.066 0.053
## Adjusted R2 0.064 0.050
## Residual Std. Error 0.967 (df = 787) 1.003 (df = 553)
## F Statistic 27.882*** (df = 2; 787) 15.486*** (df = 2; 553)
## ===================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
combined <- combined %>%
mutate(phq_score_diff = PHQ_Total_8 - PHQ_Total_0,
gad_score_diff = GAD_Total_8 - GAD_Total_0,
avg_score_diff = T2_Avg_Marks - T1_Avg_Marks)
regress_change_phq <- lm(avg_score_diff ~ phq_score_diff, data = combined, cluster = ~School_ID)
regress_change_phq_trt <- lm(avg_score_diff ~ trt, data = combined, cluster = ~School_ID)
regress_change_gad <- lm(avg_score_diff ~ gad_score_diff, data = combined, cluster = ~School_ID)
regress_change_gad_trt <- lm(avg_score_diff ~ trt, data = combined, cluster = ~School_ID)
# Create stargazer table
stargazer(regress_change_phq, regress_change_phq_trt,
title = "Table 17: Difference in Avg Score Between T1 and T2 Regressed on PHQ Score Difference and Treatment",
type = "text",
dep.var.labels = "T1 and T2 Average Score Difference",
covariate.labels = c("PHQ Score diff (W0 to W8)",
"Treatment (Trt)"),
column.labels = c("Score diff ~ PHQ", "Score diff ~ trt"),
star.cutoffs = c(0.1, 0.05, 0.01), # Adjust significance levels for stars
no.space = TRUE,
out = "regression_results.txt")
##
## Table 17: Difference in Avg Score Between T1 and T2 Regressed on PHQ Score Difference and Treatment
## ===================================================================
## Dependent variable:
## -----------------------------------
## T1 and T2 Average Score Difference
## Score diff ~ PHQ Score diff ~ trt
## (1) (2)
## -------------------------------------------------------------------
## PHQ Score diff (W0 to W8) 0.007***
## (0.002)
## Treatment (Trt) -0.028
## (0.034)
## Constant -0.033** -0.025
## (0.017) (0.024)
## -------------------------------------------------------------------
## Observations 1,330 1,330
## R2 0.010 0.001
## Adjusted R2 0.010 -0.0002
## Residual Std. Error (df = 1328) 0.608 0.611
## F Statistic (df = 1; 1328) 13.984*** 0.720
## ===================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
stargazer(regress_change_gad, regress_change_gad_trt,
title = "Table 18: Difference in Avg Score Between T1 and T2 Regressed on GAD Score Difference and Treatment",
type = "text",
dep.var.labels = "T1 and T2 Average Score Difference",
covariate.labels = c("GAD Score diff (W0 to W8)",
"Treatment (Trt)"),
column.labels = c("Score diff ~ GAD", "Score diff ~ trt"),
star.cutoffs = c(0.1, 0.05, 0.01), # Adjust significance levels for stars
no.space = TRUE,
out = "regression_results.txt")
##
## Table 18: Difference in Avg Score Between T1 and T2 Regressed on GAD Score Difference and Treatment
## ===================================================================
## Dependent variable:
## -----------------------------------
## T1 and T2 Average Score Difference
## Score diff ~ GAD Score diff ~ trt
## (1) (2)
## -------------------------------------------------------------------
## GAD Score diff (W0 to W8) 0.006***
## (0.002)
## Treatment (Trt) -0.028
## (0.034)
## Constant -0.034** -0.025
## (0.017) (0.024)
## -------------------------------------------------------------------
## Observations 1,330 1,330
## R2 0.006 0.001
## Adjusted R2 0.005 -0.0002
## Residual Std. Error (df = 1328) 0.610 0.611
## F Statistic (df = 1; 1328) 8.089*** 0.720
## ===================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
combined <- combined %>%
mutate(phq_score_diff_1 = PHQ_Total_26 - PHQ_Total_0,
gad_score_diff_1 = GAD_Total_26 - GAD_Total_0,
avg_score_diff_1 = T3_Avg_Marks - T1_Avg_Marks)
regress_change_phq_1 <- lm(avg_score_diff_1 ~ phq_score_diff_1, data = combined, cluster = ~School_ID)
regress_change_phq_trt_1 <- lm(avg_score_diff_1 ~ trt, data = combined, cluster = ~School_ID)
regress_change_gad_1 <- lm(avg_score_diff_1 ~ gad_score_diff_1, data = combined, cluster = ~School_ID)
regress_change_gad_trt_1 <- lm(avg_score_diff_1 ~ trt, data = combined, cluster = ~School_ID)
# Create stargazer table
stargazer(regress_change_phq_1, regress_change_phq_trt_1,
title = "Table 19: Difference in Avg Score Between T1 and T3 Regressed on PHQ Score Difference and Treatment",
type = "text",
dep.var.labels = "T2 and T3 Average Score Difference",
covariate.labels = c("PHQ Score diff (W0 to W26)",
"Treatment (Trt)"),
column.labels = c("Score diff ~ PHQ", "Score diff ~ trt"),
star.cutoffs = c(0.1, 0.05, 0.01), # Adjust significance levels for stars
no.space = TRUE,
out = "regression_results.txt")
##
## Table 19: Difference in Avg Score Between T1 and T3 Regressed on PHQ Score Difference and Treatment
## ===================================================================
## Dependent variable:
## ----------------------------------------
## T2 and T3 Average Score Difference
## Score diff ~ PHQ Score diff ~ trt
## (1) (2)
## -------------------------------------------------------------------
## PHQ Score diff (W0 to W26) 0.001
## (0.003)
## Treatment (Trt) -0.043
## (0.038)
## Constant -0.007 0.0003
## (0.045) (0.027)
## -------------------------------------------------------------------
## Observations 991 1,300
## R2 0.0002 0.001
## Adjusted R2 -0.001 0.0002
## Residual Std. Error 0.657 (df = 989) 0.689 (df = 1298)
## F Statistic 0.212 (df = 1; 989) 1.255 (df = 1; 1298)
## ===================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
stargazer(regress_change_gad_1, regress_change_gad_trt_1,
title = "Table 20: Difference in Avg Score Between T1 and T3 Regressed on GAD Score Difference and Treatment",
type = "text",
dep.var.labels = "T2 and T3 Average Score Difference",
covariate.labels = c("GAD Score diff (W0 to W26)",
"Treatment (Trt)"),
column.labels = c("Score diff ~ GAD", "Score diff ~ trt"),
star.cutoffs = c(0.1, 0.05, 0.01), # Adjust significance levels for stars
no.space = TRUE,
out = "regression_results.txt")
##
## Table 20: Difference in Avg Score Between T1 and T3 Regressed on GAD Score Difference and Treatment
## ===================================================================
## Dependent variable:
## ----------------------------------------
## T2 and T3 Average Score Difference
## Score diff ~ GAD Score diff ~ trt
## (1) (2)
## -------------------------------------------------------------------
## GAD Score diff (W0 to W26) 0.001
## (0.003)
## Treatment (Trt) -0.043
## (0.038)
## Constant -0.007 0.0003
## (0.042) (0.027)
## -------------------------------------------------------------------
## Observations 991 1,300
## R2 0.0003 0.001
## Adjusted R2 -0.001 0.0002
## Residual Std. Error 0.657 (df = 989) 0.689 (df = 1298)
## F Statistic 0.250 (df = 1; 989) 1.255 (df = 1; 1298)
## ===================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
combined <- combined %>%
mutate(phq_score_diff_2 = PHQ_Total_26 - PHQ_Total_8,
gad_score_diff_2 = GAD_Total_26 - GAD_Total_8,
avg_score_diff_2 = T3_Avg_Marks - T2_Avg_Marks)
regress_change_phq_2 <- lm(avg_score_diff_2 ~ phq_score_diff_2, data = combined, cluster = ~School_ID)
regress_change_phq_trt_2 <- lm(avg_score_diff_2 ~ trt, data = combined, cluster = ~School_ID)
regress_change_gad_2 <- lm(avg_score_diff_2 ~ gad_score_diff_2, data = combined, cluster = ~School_ID)
regress_change_gad_trt_2 <- lm(avg_score_diff_2 ~ trt, data = combined, cluster = ~School_ID)
# Create stargazer table
stargazer(regress_change_phq_2, regress_change_phq_trt_2,
title = "Table 21: Difference in Avg Score Between T2 and T3 Regressed on PHQ Score Difference and Treatment",
type = "text",
dep.var.labels = "T2 and T3 Average Score Difference",
covariate.labels = c("PHQ Score diff (W8 to W26)",
"Treatment (Trt)"),
column.labels = c("Score diff ~ PHQ", "Score diff ~ trt"),
star.cutoffs = c(0.1, 0.05, 0.01), # Adjust significance levels for stars
no.space = TRUE,
out = "regression_results.txt")
##
## Table 21: Difference in Avg Score Between T2 and T3 Regressed on PHQ Score Difference and Treatment
## ===================================================================
## Dependent variable:
## ----------------------------------------
## T2 and T3 Average Score Difference
## Score diff ~ PHQ Score diff ~ trt
## (1) (2)
## -------------------------------------------------------------------
## PHQ Score diff (W8 to W26) -0.0003
## (0.002)
## Treatment (Trt) -0.013
## (0.029)
## Constant -0.004 0.030
## (0.027) (0.020)
## -------------------------------------------------------------------
## Observations 990 1,280
## R2 0.00003 0.0001
## Adjusted R2 -0.001 -0.001
## Residual Std. Error 0.487 (df = 988) 0.517 (df = 1278)
## F Statistic 0.026 (df = 1; 988) 0.189 (df = 1; 1278)
## ===================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
stargazer(regress_change_gad_2, regress_change_gad_trt_2,
title = "Table 22: Difference in Avg Score Between T2 and T3 Regressed on GAD Score Difference and Treatment",
type = "text",
dep.var.labels = "T2 and T3 Average Score Difference",
covariate.labels = c("GAD Score diff (W8 to W26)",
"Treatment (Trt)"),
column.labels = c("Score diff ~ GAD", "Score diff ~ trt"),
star.cutoffs = c(0.1, 0.05, 0.01), # Adjust significance levels for stars
no.space = TRUE,
out = "regression_results.txt")
##
## Table 22: Difference in Avg Score Between T2 and T3 Regressed on GAD Score Difference and Treatment
## ===================================================================
## Dependent variable:
## ----------------------------------------
## T2 and T3 Average Score Difference
## Score diff ~ GAD Score diff ~ trt
## (1) (2)
## -------------------------------------------------------------------
## GAD Score diff (W8 to W26) 0.0005
## (0.002)
## Treatment (Trt) -0.013
## (0.029)
## Constant 0.005 0.030
## (0.026) (0.020)
## -------------------------------------------------------------------
## Observations 990 1,280
## R2 0.0001 0.0001
## Adjusted R2 -0.001 -0.001
## Residual Std. Error 0.487 (df = 988) 0.517 (df = 1278)
## F Statistic 0.061 (df = 1; 988) 0.189 (df = 1; 1278)
## ===================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# List of columns to check for missing values
columns_to_check <- c("T1_ENG", "T1_KIS", "T1_MAT", "T1_BIO", "T1_PHY", "T1_CHE",
"T1_HIS", "T1_GEO", "T1_CRE", "T1_HSC", "T1_AGR", "T1_BST",
"T1_CS", "T1_COM", "T1_MW", "T1_DD", "T1_FRE", "T1_KSL",
"T2_ENG", "T2_KIS", "T2_MAT", "T2_BIO", "T2_PHY", "T2_CHE",
"T2_HIS", "T2_GEO", "T2_CRE", "T2_HSC", "T2_AGR", "T2_BST",
"T2_CS", "T2_COM", "T2_MW", "T2_DD", "T2_FRE", "T2_KSL",
"T3_ENG", "T3_KIS", "T3_MAT", "T3_BIO", "T3_PHY", "T3_CHE",
"T3_HIS", "T3_GEO", "T3_CRE", "T3_HSC", "T3_AGR", "T3_BST",
"T3_COM", "T3_MW", "T3_DD", "T3_FRE", "T3_KSL")
# Calculate the number of missing values for each column
missing_values <- sapply(columns_to_check, function(col) sum(is.na(combined[[col]])))
eng_reg <- lm(T2_ENG ~ PHQ_Total_8 + trt + T1_ENG, data = combined)
kis_reg <- lm(T2_KIS ~ PHQ_Total_8 + trt + T1_KIS, data = combined)
bio_reg <- lm(T2_BIO ~ PHQ_Total_8 + trt + T1_BIO, data = combined)
mat_reg <- lm(T2_MAT ~ PHQ_Total_8 + trt + T1_MAT, data = combined)
che_reg <- lm(T2_CHE ~ PHQ_Total_8 + trt + T1_CHE, data = combined)
cre_reg <- lm(T2_CRE ~ PHQ_Total_8 + trt + T1_CRE, data = combined)
phy_reg <- lm(T2_PHY ~ PHQ_Total_8 + trt + T1_PHY, data = combined)
# Calculate clustered standard errors for each model
eng_se <- sqrt(diag(vcovHC(eng_reg, type = "HC0", cluster = ~School_ID)))
kis_se <- sqrt(diag(vcovHC(kis_reg, type = "HC0", cluster = ~School_ID)))
bio_se <- sqrt(diag(vcovHC(bio_reg, type = "HC0", cluster = ~School_ID)))
mat_se <- sqrt(diag(vcovHC(mat_reg, type = "HC0", cluster = ~School_ID)))
che_se <- sqrt(diag(vcovHC(che_reg, type = "HC0", cluster = ~School_ID)))
phy_se <- sqrt(diag(vcovHC(phy_reg, type = "HC0", cluster = ~School_ID)))
stargazer(eng_reg, kis_reg, bio_reg, mat_reg, che_reg, phy_reg,
se = list(eng_se, kis_se, bio_se, mat_se, che_se, phy_se),
type = "text",
title = "Table 23: Subject wise analysis of correlation between term 2 scores and PHQ total scores in week 8",
no.space = TRUE, digits = 2,
omit.stat = c("f", "ser"))
##
## Table 23: Subject wise analysis of correlation between term 2 scores and PHQ total scores in week 8
## ===============================================================
## Dependent variable:
## --------------------------------------------------
## T2_ENG T2_KIS T2_BIO T2_MAT T2_CHE T2_PHY
## (1) (2) (3) (4) (5) (6)
## ---------------------------------------------------------------
## PHQ_Total_8 0.01** 0.01*** 0.02*** 0.01*** 0.01*** 0.003
## (0.003) (0.003) (0.003) (0.003) (0.003) (0.005)
## trt 0.04 0.08** 0.02 0.09* -0.01 0.06
## (0.05) (0.04) (0.05) (0.05) (0.04) (0.06)
## T1_ENG 0.41***
## (0.03)
## T1_KIS 0.64***
## (0.02)
## T1_BIO 0.67***
## (0.03)
## T1_MAT 0.73***
## (0.04)
## T1_CHE 0.68***
## (0.02)
## T1_PHY 0.73***
## (0.04)
## Constant -0.12** -0.14** -0.26*** -0.15*** -0.16*** -0.12
## (0.06) (0.05) (0.06) (0.06) (0.05) (0.08)
## ---------------------------------------------------------------
## Observations 1,285 1,298 1,112 1,296 1,300 888
## R2 0.18 0.42 0.43 0.41 0.47 0.43
## Adjusted R2 0.18 0.42 0.43 0.41 0.47 0.43
## ===============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01