1. Introduction

This report presents a comprehensive Multivariate Analysis of Covariance (MANCOVA) of student academic performance data. MANCOVA extends MANOVA by incorporating one or more continuous covariates to control for their influence on the dependent variable set, thereby providing a more precise examination of group differences.

Two analytical approaches are employed and systematically compared:

  1. Full Factorial MANCOVA examines the simultaneous effects of multiple demographic and academic factors (gender, race/ethnicity, lunch type, and test preparation course) on academic outcomes while controlling for parental education level
  2. Simplified One-Way MANCOVA isolates the effect of gender on academic performance after controlling for parental education level

The dataset comprises 1,000 student records with three dependent variables (math score, reading score, and writing score), four categorical independent variables, and one ordinal covariate derived from parental level of education.

The analysis pipeline follows: data loading → exploratory data analysis → assumption checking → MANCOVA execution → post-hoc testing → effect size estimation → interpretation.


2. Data Loading and Initial Inspection

df_raw <- read.csv("StudentsPerformance.csv", stringsAsFactors = FALSE)

colnames(df_raw) <- c(
  "gender", "race_ethnicity", "parental_education",
  "lunch", "test_prep", "math_score", "reading_score", "writing_score"
)

cat(sprintf("Rows: %d | Columns: %d\n", nrow(df_raw), ncol(df_raw)))
## Rows: 1000 | Columns: 8
str(df_raw)
## 'data.frame':    1000 obs. of  8 variables:
##  $ gender            : chr  "female" "female" "female" "male" ...
##  $ race_ethnicity    : chr  "group B" "group C" "group B" "group A" ...
##  $ parental_education: chr  "bachelor's degree" "some college" "master's degree" "associate's degree" ...
##  $ lunch             : chr  "standard" "standard" "standard" "free/reduced" ...
##  $ test_prep         : chr  "none" "completed" "none" "none" ...
##  $ math_score        : int  72 69 90 47 76 71 88 40 64 38 ...
##  $ reading_score     : int  72 90 95 57 78 83 95 43 64 60 ...
##  $ writing_score     : int  74 88 93 44 75 78 92 39 67 50 ...
missing_summary <- data.frame(
  Column   = names(df_raw),
  Missing  = sapply(df_raw, function(x) sum(is.na(x))),
  Pct_Miss = sapply(df_raw, function(x) round(mean(is.na(x)) * 100, 2))
)
rownames(missing_summary) <- NULL

kable(missing_summary, caption = "Missing Values per Variable",
      col.names = c("Variable", "Missing Count", "Missing (%)")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"), full_width = FALSE)
Missing Values per Variable
Variable Missing Count Missing (%)
gender 0 0
race_ethnicity 0 0
parental_education 0 0
lunch 0 0
test_prep 0 0
math_score 0 0
reading_score 0 0
writing_score 0 0
kable(head(df_raw, 8), caption = "Preview: First 8 Rows of Dataset") %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")
Preview: First 8 Rows of Dataset
gender race_ethnicity parental_education lunch test_prep math_score reading_score writing_score
female group B bachelor’s degree standard none 72 72 74
female group C some college standard completed 69 90 88
female group B master’s degree standard none 90 95 93
male group A associate’s degree free/reduced none 47 57 44
male group C some college standard none 76 78 75
female group B associate’s degree standard none 71 83 78
female group B some college standard completed 88 95 92
male group B some college free/reduced none 40 43 39

3. Data Preprocessing

3.1 Ordinal Encoding of Parental Education

The covariate parental_education is converted to an ordinal numeric scale representing the hierarchical ordering of educational attainment levels.

edu_order <- c(
  "some high school"    = 1,
  "high school"         = 2,
  "some college"        = 3,
  "associate's degree"  = 4,
  "bachelor's degree"   = 5,
  "master's degree"     = 6
)

df_raw$parental_edu_ord <- as.integer(edu_order[df_raw$parental_education])

edu_mapping <- data.frame(
  Category    = names(edu_order),
  Ordinal_Code = as.integer(edu_order)
)

kable(edu_mapping, caption = "Ordinal Encoding of Parental Education Level",
      col.names = c("Category", "Ordinal Code")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
Ordinal Encoding of Parental Education Level
Category Ordinal Code
some high school 1
high school 2
some college 3
associate’s degree 4
bachelor’s degree 5
master’s degree 6
cat(sprintf("Ordinal range: %d to %d | No NAs: %s\n",
            min(df_raw$parental_edu_ord), max(df_raw$parental_edu_ord),
            sum(is.na(df_raw$parental_edu_ord)) == 0))
## Ordinal range: 1 to 6 | No NAs: TRUE

3.2 Factor Conversion

df_raw$gender         <- factor(df_raw$gender)
df_raw$race_ethnicity <- factor(df_raw$race_ethnicity)
df_raw$lunch          <- factor(df_raw$lunch)
df_raw$test_prep      <- factor(df_raw$test_prep)

cat("Factor levels:\n")
## Factor levels:
cat(sprintf("  gender         : %s\n", paste(levels(df_raw$gender), collapse = ", ")))
##   gender         : female, male
cat(sprintf("  race_ethnicity : %s\n", paste(levels(df_raw$race_ethnicity), collapse = ", ")))
##   race_ethnicity : group A, group B, group C, group D, group E
cat(sprintf("  lunch          : %s\n", paste(levels(df_raw$lunch), collapse = ", ")))
##   lunch          : free/reduced, standard
cat(sprintf("  test_prep      : %s\n", paste(levels(df_raw$test_prep), collapse = ", ")))
##   test_prep      : completed, none

3.3 Group Sample Sizes

group_size_df <- bind_rows(
  df_raw %>% count(gender) %>% rename(Level = gender) %>% mutate(Factor = "Gender"),
  df_raw %>% count(race_ethnicity) %>% rename(Level = race_ethnicity) %>% mutate(Factor = "Race/Ethnicity"),
  df_raw %>% count(lunch) %>% rename(Level = lunch) %>% mutate(Factor = "Lunch"),
  df_raw %>% count(test_prep) %>% rename(Level = test_prep) %>% mutate(Factor = "Test Prep")
) %>% select(Factor, Level, n)

kable(group_size_df, caption = "Sample Sizes per Factor Level",
      col.names = c("Factor", "Level", "n")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"), full_width = FALSE)
Sample Sizes per Factor Level
Factor Level n
Gender female 518
Gender male 482
Race/Ethnicity group A 89
Race/Ethnicity group B 190
Race/Ethnicity group C 319
Race/Ethnicity group D 262
Race/Ethnicity group E 140
Lunch free/reduced 355
Lunch standard 645
Test Prep completed 358
Test Prep none 642

4. Exploratory Data Analysis

4.1 Descriptive Statistics

dvs <- c("math_score", "reading_score", "writing_score")

desc_stats <- data.frame(
  Variable = dvs,
  N        = sapply(df_raw[dvs], function(x) sum(!is.na(x))),
  Mean     = sapply(df_raw[dvs], mean, na.rm = TRUE),
  SD       = sapply(df_raw[dvs], sd, na.rm = TRUE),
  Min      = sapply(df_raw[dvs], min, na.rm = TRUE),
  Q1       = sapply(df_raw[dvs], quantile, 0.25, na.rm = TRUE),
  Median   = sapply(df_raw[dvs], median, na.rm = TRUE),
  Q3       = sapply(df_raw[dvs], quantile, 0.75, na.rm = TRUE),
  Max      = sapply(df_raw[dvs], max, na.rm = TRUE),
  Skewness = sapply(df_raw[dvs], skewness, na.rm = TRUE),
  Kurtosis = sapply(df_raw[dvs], kurtosis, na.rm = TRUE)
)
rownames(desc_stats) <- NULL

kable(desc_stats %>% mutate(across(where(is.numeric), ~round(.x, 3))),
      caption = "Descriptive Statistics of Dependent Variables") %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")
Descriptive Statistics of Dependent Variables
Variable N Mean SD Min Q1 Median Q3 Max Skewness Kurtosis
math_score 1000 66.089 15.163 0 57.00 66 77 100 -0.279 3.268
reading_score 1000 69.169 14.600 17 59.00 70 79 100 -0.259 2.926
writing_score 1000 68.054 15.196 10 57.75 69 79 100 -0.289 2.961

4.2 Distribution of Scores

df_scores_long <- df_raw %>%
  select(math_score, reading_score, writing_score) %>%
  pivot_longer(cols = everything(), names_to = "Subject", values_to = "Score") %>%
  mutate(Subject = case_when(
    Subject == "math_score"    ~ "Math",
    Subject == "reading_score" ~ "Reading",
    Subject == "writing_score" ~ "Writing",
    TRUE ~ Subject
  ))

ggplot(df_scores_long, aes(x = Score, fill = Subject)) +
  geom_histogram(aes(y = after_stat(density)), bins = 25,
                 color = "white", alpha = 0.75) +
  geom_density(aes(color = Subject), linewidth = 0.9, fill = NA) +
  facet_wrap(~Subject, scales = "free_y", ncol = 3) +
  scale_fill_brewer(palette = "Set1", guide = "none") +
  scale_color_brewer(palette = "Set1", guide = "none") +
  labs(title = "Score Distributions by Subject",
       subtitle = "Histogram overlaid with kernel density estimate",
       x = "Score", y = "Density") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold", size = 10))
Distribution of Academic Scores

Distribution of Academic Scores

4.3 Scores by Group (Boxplots)

df_gender_long <- df_raw %>%
  select(gender, math_score, reading_score, writing_score) %>%
  pivot_longer(-gender, names_to = "Subject", values_to = "Score") %>%
  mutate(Subject = case_when(
    Subject == "math_score"    ~ "Math",
    Subject == "reading_score" ~ "Reading",
    Subject == "writing_score" ~ "Writing",
    TRUE ~ Subject
  ))

ggplot(df_gender_long, aes(x = gender, y = Score, fill = gender)) +
  geom_boxplot(outlier.shape = 21, outlier.size = 1.5, alpha = 0.75) +
  facet_wrap(~Subject, ncol = 3) +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "Score Distribution by Gender",
       x = "Gender", y = "Score", fill = "Gender") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))
Score Distribution by Gender

Score Distribution by Gender

make_bp <- function(factor_var, factor_label, palette_name) {
  df_plot <- df_raw %>%
    select(all_of(c(factor_var, dvs))) %>%
    pivot_longer(cols = all_of(dvs), names_to = "Subject", values_to = "Score") %>%
    mutate(Subject = case_when(
      Subject == "math_score"    ~ "Math",
      Subject == "reading_score" ~ "Reading",
      Subject == "writing_score" ~ "Writing",
      TRUE ~ Subject
    ),
    Factor = .data[[factor_var]])

  ggplot(df_plot, aes(x = Factor, y = Score, fill = Factor)) +
    geom_boxplot(outlier.shape = 21, outlier.size = 1.2, alpha = 0.75) +
    facet_wrap(~Subject, ncol = 3) +
    scale_fill_brewer(palette = palette_name) +
    labs(title = factor_label, x = NULL, y = "Score", fill = NULL) +
    theme_minimal(base_size = 10) +
    theme(plot.title   = element_text(face = "bold"),
          strip.text   = element_text(face = "bold"),
          axis.text.x  = element_text(angle = 20, hjust = 1),
          legend.position = "none")
}

p_race  <- make_bp("race_ethnicity", "By Race/Ethnicity", "Set2")
p_lunch <- make_bp("lunch",          "By Lunch Type",     "Set3")
p_prep  <- make_bp("test_prep",      "By Test Preparation", "Paired")

grid.arrange(p_race, p_lunch, p_prep, ncol = 1)
Score Distributions Across All Factors

Score Distributions Across All Factors

4.4 Covariate: Parental Education vs. Scores

df_cov_long <- df_raw %>%
  select(parental_edu_ord, math_score, reading_score, writing_score) %>%
  pivot_longer(-parental_edu_ord, names_to = "Subject", values_to = "Score") %>%
  mutate(Subject = case_when(
    Subject == "math_score"    ~ "Math",
    Subject == "reading_score" ~ "Reading",
    Subject == "writing_score" ~ "Writing",
    TRUE ~ Subject
  ))

ggplot(df_cov_long, aes(x = parental_edu_ord, y = Score, color = Subject)) +
  geom_jitter(alpha = 0.25, size = 0.9, width = 0.15) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1.2) +
  facet_wrap(~Subject, ncol = 3) +
  scale_color_brewer(palette = "Set1", guide = "none") +
  scale_x_continuous(breaks = 1:6,
    labels = c("Some HS", "HS", "Some Col", "Assoc", "Bach", "Master")) +
  labs(title = "Parental Education (Covariate) vs. Academic Scores",
       subtitle = "Linear trend with 95% confidence band",
       x = "Parental Education Level (Ordinal)", y = "Score") +
  theme_minimal(base_size = 11) +
  theme(plot.title   = element_text(face = "bold"),
        strip.text   = element_text(face = "bold"),
        axis.text.x  = element_text(angle = 25, hjust = 1))
Relationship Between Parental Education and Scores

Relationship Between Parental Education and Scores

4.5 Correlation Among Dependent Variables

cor_dv <- cor(df_raw[dvs], use = "complete.obs")
rownames(cor_dv) <- colnames(cor_dv) <- c("Math", "Reading", "Writing")

ggcorrplot(cor_dv,
           method   = "square",
           type     = "lower",
           lab      = TRUE,
           lab_size = 5,
           colors   = c("#E84855", "white", "#2E86AB"),
           title    = "Pearson Correlation  Dependent Variables",
           ggtheme  = theme_minimal(base_size = 11)) +
  theme(plot.title = element_text(face = "bold"))
Correlation Matrix of Dependent Variables

Correlation Matrix of Dependent Variables

kable(data.frame(
  Pair            = c("Math  Reading", "Math  Writing", "Reading  Writing"),
  r               = round(c(cor_dv["Math","Reading"],
                             cor_dv["Math","Writing"],
                             cor_dv["Reading","Writing"]), 4),
  Interpretation  = c("Strong positive", "Strong positive", "Very strong positive")
), caption = "Pairwise Correlation Among Dependent Variables") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
Pairwise Correlation Among Dependent Variables
Pair r Interpretation
Math Reading 0.8176 Strong positive
Math Writing 0.8026 Strong positive
Reading Writing 0.9546 Very strong positive

5. MANCOVA Assumptions

5.1 Multivariate Normality

Multivariate normality of the dependent variables is assessed using the Henze-Zirkler test and Mardia’s tests via the MVN package. Slight departures are acceptable for large samples (n > 200) due to the robustness of MANCOVA under the central limit theorem.

library(MVN)
library(knitr)

set.seed(42)

mvn_result <- MVN::mvn(df_raw[, dvs])

cat("Multivariate Normality")
## Multivariate Normality
print(mvn_result$multivariateNormality)
## NULL
cat("Univariate Normality")
## Univariate Normality
print(mvn_result$univariateNormality)
## NULL

5.2 Homogeneity of Variance-Covariance Matrices (Box’s M Test)

Box’s M test evaluates whether the variance-covariance matrices are equal across groups. Because this test is highly sensitive to non-normality, a conservative significance threshold (p < 0.001) is used.

# Box's M for gender
boxm_gender <- boxM(df_raw[, dvs], df_raw$gender)

kable(data.frame(
  Test      = "Box's M Test (Gender)",
  Chi_sq    = round(boxm_gender$statistic, 4),
  df        = boxm_gender$parameter["df"],
  p_value   = round(boxm_gender$p.value, 4),
  Decision  = ifelse(boxm_gender$p.value < 0.001, "Violation (p < 0.001)", "Assumption met")
), caption = "Box's M Test for Homogeneity of Covariance Matrices",
   col.names = c("Test", "Chi-square", "df", "p-value", "Decision")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
Box’s M Test for Homogeneity of Covariance Matrices
Test Chi-square df p-value Decision
Chi-Sq (approx.) Box’s M Test (Gender) 9.2549 6 0.1597 Assumption met

5.3 Homogeneity of Variances (Levene’s Test)

lev_results <- lapply(dvs, function(dv) {
  lev_gender <- leveneTest(df_raw[[dv]] ~ df_raw$gender)
  lev_lunch  <- leveneTest(df_raw[[dv]] ~ df_raw$lunch)
  lev_prep   <- leveneTest(df_raw[[dv]] ~ df_raw$test_prep)
  data.frame(
    DV       = dv,
    Factor   = c("Gender", "Lunch", "Test Prep"),
    F_value  = round(c(lev_gender$`F value`[1], lev_lunch$`F value`[1], lev_prep$`F value`[1]), 4),
    p_value  = round(c(lev_gender$`Pr(>F)`[1],  lev_lunch$`Pr(>F)`[1],  lev_prep$`Pr(>F)`[1]), 4),
    Decision = ifelse(c(lev_gender$`Pr(>F)`[1],  lev_lunch$`Pr(>F)`[1],
                        lev_prep$`Pr(>F)`[1]) > 0.05, "Met", "Violated")
  )
}) %>% bind_rows()

kable(lev_results, caption = "Levene's Test for Homogeneity of Variances",
      col.names = c("DV", "Factor", "F", "p-value", "Decision")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"), full_width = FALSE)
Levene’s Test for Homogeneity of Variances
DV Factor F p-value Decision
math_score Gender 0.3464 0.5563 Met
math_score Lunch 3.1938 0.0742 Met
math_score Test Prep 0.5330 0.4655 Met
reading_score Gender 0.0188 0.8911 Met
reading_score Lunch 1.9960 0.1580 Met
reading_score Test Prep 1.0798 0.2990 Met
writing_score Gender 0.0069 0.9336 Met
writing_score Lunch 2.6314 0.1051 Met
writing_score Test Prep 5.9708 0.0147 Violated

5.4 Linearity Between Covariate and DVs

The scatter plots in Section 4.4 confirm approximately linear relationships between the covariate (parental education ordinal) and each dependent variable.

5.5 Absence of Multicollinearity Among DVs

vif_df <- data.frame(
  DV_Pair        = c("Math ~ Reading + Writing",
                     "Reading ~ Math + Writing",
                     "Writing ~ Math + Reading"),
  VIF            = c(
    car::vif(lm(math_score    ~ reading_score + writing_score, data = df_raw))[1],
    car::vif(lm(reading_score ~ math_score    + writing_score, data = df_raw))[1],
    car::vif(lm(writing_score ~ math_score    + reading_score, data = df_raw))[1]
  )
)
vif_df$Interpretation <- ifelse(vif_df$VIF < 5, "Acceptable", "High multicollinearity")

kable(vif_df %>% mutate(VIF = round(VIF, 4)),
      caption = "Variance Inflation Factor (VIF) Among DVs") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
Variance Inflation Factor (VIF) Among DVs
DV_Pair VIF Interpretation
Math ~ Reading + Writing 11.2686 High multicollinearity
Reading ~ Math + Writing 2.8108 Acceptable
Writing ~ Math + Reading 3.0160 Acceptable

6. Approach 1 Full Factorial MANCOVA

6.1 Model Specification

The Full Factorial MANCOVA model includes all four categorical factors and their interactions (up to two-way), with parental education ordinal as the covariate. The model is:

Y = β₀ + β₁(parental_edu_ord) + β₂(gender) + β₃(race_ethnicity) + β₄(lunch) + β₅(test_prep) + β₆(gender × race) + β₇(gender × lunch) + β₈(gender × test_prep) + β₉(race × lunch) + β₁₀(race × test_prep) + β₁₁(lunch × test_prep) + ε

where Y is the multivariate response vector of math, reading, and writing scores.

dv_matrix <- as.matrix(df_raw[, dvs])

mancova_full <- manova(
  dv_matrix ~ parental_edu_ord +
    gender * race_ethnicity +
    gender * lunch +
    gender * test_prep +
    race_ethnicity * lunch +
    race_ethnicity * test_prep +
    lunch * test_prep,
  data = df_raw
)

6.2 Multivariate Test Statistics

full_summary <- summary(mancova_full, test = "Pillai")
full_pillai  <- full_summary$stats

kable(round(full_pillai, 4),
      caption = "Full Factorial MANCOVA  Pillai's Trace Multivariate Tests") %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")
Full Factorial MANCOVA Pillai’s Trace Multivariate Tests
Df Pillai approx F num Df den Df Pr(>F)
parental_edu_ord 1 0.1256 46.6183 3 974 0.0000
gender 1 0.6366 568.7091 3 974 0.0000
race_ethnicity 4 0.1531 13.1253 12 2928 0.0000
lunch 1 0.1604 62.0398 3 974 0.0000
test_prep 1 0.2575 112.5886 3 974 0.0000
gender:race_ethnicity 4 0.0068 0.5554 12 2928 0.8787
gender:lunch 1 0.0045 1.4738 3 974 0.2201
gender:test_prep 1 0.0024 0.7947 3 974 0.4969
race_ethnicity:lunch 4 0.0051 0.4139 12 2928 0.9589
race_ethnicity:test_prep 4 0.0118 0.9653 12 2928 0.4800
lunch:test_prep 1 0.0046 1.4966 3 974 0.2139
Residuals 976 NA NA NA NA NA
full_wilks <- summary(mancova_full, test = "Wilks")$stats

kable(round(full_wilks, 4),
      caption = "Full Factorial MANCOVA  Wilks' Lambda Multivariate Tests") %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")
Full Factorial MANCOVA Wilks’ Lambda Multivariate Tests
Df Wilks approx F num Df den Df Pr(>F)
parental_edu_ord 1 0.8744 46.6183 3 974.000 0.0000
gender 1 0.3634 568.7091 3 974.000 0.0000
race_ethnicity 4 0.8521 13.3956 12 2577.253 0.0000
lunch 1 0.8396 62.0398 3 974.000 0.0000
test_prep 1 0.7425 112.5886 3 974.000 0.0000
gender:race_ethnicity 4 0.9932 0.5548 12 2577.253 0.8791
gender:lunch 1 0.9955 1.4738 3 974.000 0.2201
gender:test_prep 1 0.9976 0.7947 3 974.000 0.4969
race_ethnicity:lunch 4 0.9949 0.4134 12 2577.253 0.9591
race_ethnicity:test_prep 4 0.9882 0.9649 12 2577.253 0.4804
lunch:test_prep 1 0.9954 1.4966 3 974.000 0.2139
Residuals 976 NA NA NA NA NA
pillai_df <- as.data.frame(full_pillai)
pillai_df$Term <- rownames(pillai_df)
pillai_clean <- pillai_df %>%
  filter(Term != "Residuals") %>%
  select(Term, `approx F`, `num Df`, `den Df`, `Pr(>F)`) %>%
  mutate(
    Significant = ifelse(`Pr(>F)` < 0.001, "***",
                  ifelse(`Pr(>F)` < 0.01,  "**",
                  ifelse(`Pr(>F)` < 0.05,  "*", "ns"))),
    `Pr(>F)` = round(`Pr(>F)`, 4),
    `approx F` = round(`approx F`, 3)
  )

kable(pillai_clean,
      caption = "Full Factorial MANCOVA  Significance Summary (Pillai's Trace)",
      col.names = c("Term", "Approx F", "Num df", "Den df", "p-value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"), full_width = TRUE) %>%
  scroll_box(width = "100%")
Full Factorial MANCOVA Significance Summary (Pillai’s Trace)
Term Approx F Num df Den df p-value Sig.
parental_edu_ord parental_edu_ord 46.618 3 974 0.0000 ***
gender gender 568.709 3 974 0.0000 ***
race_ethnicity race_ethnicity 13.125 12 2928 0.0000 ***
lunch lunch 62.040 3 974 0.0000 ***
test_prep test_prep 112.589 3 974 0.0000 ***
gender:race_ethnicity gender:race_ethnicity 0.555 12 2928 0.8787 ns
gender:lunch gender:lunch 1.474 3 974 0.2201 ns
gender:test_prep gender:test_prep 0.795 3 974 0.4969 ns
race_ethnicity:lunch race_ethnicity:lunch 0.414 12 2928 0.9589 ns
race_ethnicity:test_prep race_ethnicity:test_prep 0.965 12 2928 0.4800 ns
lunch:test_prep lunch:test_prep 1.497 3 974 0.2139 ns

6.3 Univariate Follow-Up Tests (Type III SS)

full_aov_summary <- summary.aov(mancova_full)

extract_aov <- function(aov_list, dv_names) {
  lapply(seq_along(aov_list), function(i) {
    df_aov <- as.data.frame(aov_list[[i]])
    df_aov$Term <- rownames(df_aov)
    df_aov$DV   <- dv_names[i]
    df_aov
  }) %>%
    bind_rows() %>%
    filter(Term != "Residuals") %>%
    select(DV, Term, `F value`, `Pr(>F)`) %>%
    mutate(
      Sig      = ifelse(`Pr(>F)` < 0.001, "***",
                 ifelse(`Pr(>F)` < 0.01,  "**",
                 ifelse(`Pr(>F)` < 0.05,  "*", "ns"))),
      `F value`  = round(`F value`, 3),
      `Pr(>F)`   = round(`Pr(>F)`, 4)
    )
}

univar_full <- extract_aov(full_aov_summary,
                           c("Math Score", "Reading Score", "Writing Score"))

kable(univar_full,
      caption = "Univariate ANCOVA Follow-Up Tests  Full Factorial Model",
      col.names = c("DV", "Term", "F", "p-value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%", height = "500px")
Univariate ANCOVA Follow-Up Tests Full Factorial Model
DV Term F p-value Sig.
parental_edu_ord …1 Math Score parental_edu_ord 33.400 0.0000 ***
gender …2 Math Score gender 40.313 0.0000 ***
race_ethnicity …3 Math Score race_ethnicity 15.920 0.0000 ***
lunch …4 Math Score lunch 151.573 0.0000 ***
test_prep …5 Math Score test_prep 41.268 0.0000 ***
gender:race_ethnicity …6 Math Score gender:race_ethnicity 0.292 0.8832 ns
gender:lunch …7 Math Score gender:lunch 2.406 0.1212 ns
gender:test_prep …8 Math Score gender:test_prep 0.081 0.7765 ns
race_ethnicity:lunch …9 Math Score race_ethnicity:lunch 0.477 0.7526 ns
race_ethnicity:test_prep…10 Math Score race_ethnicity:test_prep 0.545 0.7027 ns
lunch:test_prep …11 Math Score lunch:test_prep 0.017 0.8972 ns
Residuals …12 Math Score Residuals NA NA NA
parental_edu_ord …13 Reading Score parental_edu_ord 46.172 0.0000 ***
gender …14 Reading Score gender 70.651 0.0000 ***
race_ethnicity …15 Reading Score race_ethnicity 5.420 0.0003 ***
lunch …16 Reading Score lunch 68.391 0.0000 ***
test_prep …17 Reading Score test_prep 76.822 0.0000 ***
gender:race_ethnicity …18 Reading Score gender:race_ethnicity 0.373 0.8282 ns
gender:lunch …19 Reading Score gender:lunch 1.312 0.2523 ns
gender:test_prep …20 Reading Score gender:test_prep 0.008 0.9309 ns
race_ethnicity:lunch …21 Reading Score race_ethnicity:lunch 0.375 0.8265 ns
race_ethnicity:test_prep…22 Reading Score race_ethnicity:test_prep 0.704 0.5894 ns
lunch:test_prep …23 Reading Score lunch:test_prep 0.020 0.8862 ns
Residuals …24 Reading Score Residuals NA NA NA
parental_edu_ord …25 Writing Score parental_edu_ord 82.125 0.0000 ***
gender …26 Writing Score gender 124.201 0.0000 ***
race_ethnicity …27 Writing Score race_ethnicity 8.006 0.0000 ***
lunch …28 Writing Score lunch 92.789 0.0000 ***
test_prep …29 Writing Score test_prep 151.677 0.0000 ***
gender:race_ethnicity …30 Writing Score gender:race_ethnicity 0.208 0.9340 ns
gender:lunch …31 Writing Score gender:lunch 2.637 0.1047 ns
gender:test_prep …32 Writing Score gender:test_prep 0.069 0.7926 ns
race_ethnicity:lunch …33 Writing Score race_ethnicity:lunch 0.399 0.8094 ns
race_ethnicity:test_prep…34 Writing Score race_ethnicity:test_prep 0.287 0.8866 ns
lunch:test_prep …35 Writing Score lunch:test_prep 0.526 0.4686 ns
Residuals …36 Writing Score Residuals NA NA NA

6.4 Effect Sizes (Partial η²)

# Compute partial eta squared manually from ANOVA tables
compute_peta2 <- function(aov_list, dv_names) {
  lapply(seq_along(aov_list), function(i) {
    df_aov  <- as.data.frame(aov_list[[i]])
    ss_res  <- df_aov["Residuals", "Sum Sq"]
    df_aov  <- df_aov[rownames(df_aov) != "Residuals", , drop = FALSE]
    df_aov$Term    <- rownames(df_aov)
    df_aov$DV      <- dv_names[i]
    df_aov$peta2   <- round(df_aov$`Sum Sq` / (df_aov$`Sum Sq` + ss_res), 4)
    df_aov$Magnitude <- ifelse(df_aov$peta2 >= 0.14, "Large",
                        ifelse(df_aov$peta2 >= 0.06, "Medium",
                        ifelse(df_aov$peta2 >= 0.01, "Small", "Negligible")))
    df_aov %>% select(DV, Term, peta2, Magnitude)
  }) %>% bind_rows()
}

peta2_full <- compute_peta2(full_aov_summary,
                             c("Math Score", "Reading Score", "Writing Score"))

kable(peta2_full,
      caption = "Partial Eta-Squared (η²) Effect Sizes  Full Factorial Model",
      col.names = c("DV", "Term", "Partial η²", "Magnitude")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")
Partial Eta-Squared (η²) Effect Sizes Full Factorial Model
DV Term Partial η² Magnitude
parental_edu_ord …1 Math Score parental_edu_ord 0.0331 Small
gender …2 Math Score gender 0.0397 Small
race_ethnicity …3 Math Score race_ethnicity 0.0612 Medium
lunch …4 Math Score lunch 0.1344 Medium
test_prep …5 Math Score test_prep 0.0406 Small
gender:race_ethnicity …6 Math Score gender:race_ethnicity 0.0012 Negligible
gender:lunch …7 Math Score gender:lunch 0.0025 Negligible
gender:test_prep …8 Math Score gender:test_prep 0.0001 Negligible
race_ethnicity:lunch …9 Math Score race_ethnicity:lunch 0.0020 Negligible
race_ethnicity:test_prep…10 Math Score race_ethnicity:test_prep 0.0022 Negligible
lunch:test_prep …11 Math Score lunch:test_prep 0.0000 Negligible
Residuals …12 Math Score Residuals 0.5000 Large
parental_edu_ord …13 Reading Score parental_edu_ord 0.0452 Small
gender …14 Reading Score gender 0.0675 Medium
race_ethnicity …15 Reading Score race_ethnicity 0.0217 Small
lunch …16 Reading Score lunch 0.0655 Medium
test_prep …17 Reading Score test_prep 0.0730 Medium
gender:race_ethnicity …18 Reading Score gender:race_ethnicity 0.0015 Negligible
gender:lunch …19 Reading Score gender:lunch 0.0013 Negligible
gender:test_prep …20 Reading Score gender:test_prep 0.0000 Negligible
race_ethnicity:lunch …21 Reading Score race_ethnicity:lunch 0.0015 Negligible
race_ethnicity:test_prep…22 Reading Score race_ethnicity:test_prep 0.0029 Negligible
lunch:test_prep …23 Reading Score lunch:test_prep 0.0000 Negligible
Residuals …24 Reading Score Residuals 0.5000 Large
parental_edu_ord …25 Writing Score parental_edu_ord 0.0776 Medium
gender …26 Writing Score gender 0.1129 Medium
race_ethnicity …27 Writing Score race_ethnicity 0.0318 Small
lunch …28 Writing Score lunch 0.0868 Medium
test_prep …29 Writing Score test_prep 0.1345 Medium
gender:race_ethnicity …30 Writing Score gender:race_ethnicity 0.0009 Negligible
gender:lunch …31 Writing Score gender:lunch 0.0027 Negligible
gender:test_prep …32 Writing Score gender:test_prep 0.0001 Negligible
race_ethnicity:lunch …33 Writing Score race_ethnicity:lunch 0.0016 Negligible
race_ethnicity:test_prep…34 Writing Score race_ethnicity:test_prep 0.0012 Negligible
lunch:test_prep …35 Writing Score lunch:test_prep 0.0005 Negligible
Residuals …36 Writing Score Residuals 0.5000 Large

6.5 Effect Size Visualization

peta2_main <- peta2_full %>%
  filter(!grepl(":", Term)) %>%
  filter(Term != "parental_edu_ord")

ggplot(peta2_main, aes(x = reorder(Term, peta2), y = peta2, fill = DV)) +
  geom_col(position = "dodge", alpha = 0.85, color = "white") +
  geom_hline(yintercept = 0.01, linetype = "dashed", color = "gray50", linewidth = 0.5) +
  geom_hline(yintercept = 0.06, linetype = "dashed", color = "#E84855", linewidth = 0.5) +
  geom_hline(yintercept = 0.14, linetype = "dashed", color = "#2E86AB", linewidth = 0.5) +
  annotate("text", x = 0.6, y = 0.015, label = "Small (0.01)", size = 2.8, color = "gray50") +
  annotate("text", x = 0.6, y = 0.075, label = "Medium (0.06)", size = 2.8, color = "#E84855") +
  annotate("text", x = 0.6, y = 0.155, label = "Large (0.14)",  size = 2.8, color = "#2E86AB") +
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "Partial η² by Main Effect Term  Full Factorial MANCOVA",
       subtitle = "Dashed lines = Cohen's conventions for small/medium/large effects",
       x = NULL, y = "Partial η²", fill = "Dependent Variable") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"))
Partial η² by Term and DV  Full Factorial MANCOVA

Partial η² by Term and DV Full Factorial MANCOVA

6.6 Estimated Marginal Means (Main Effects)

# Fit separate ANCOVAs for EMM extraction
ancova_math_full    <- lm(math_score    ~ parental_edu_ord + gender + race_ethnicity +
                            lunch + test_prep, data = df_raw)
ancova_reading_full <- lm(reading_score ~ parental_edu_ord + gender + race_ethnicity +
                            lunch + test_prep, data = df_raw)
ancova_writing_full <- lm(writing_score ~ parental_edu_ord + gender + race_ethnicity +
                            lunch + test_prep, data = df_raw)

emm_gender <- bind_rows(
  as.data.frame(emmeans(ancova_math_full,    ~ gender)) %>% mutate(DV = "Math"),
  as.data.frame(emmeans(ancova_reading_full, ~ gender)) %>% mutate(DV = "Reading"),
  as.data.frame(emmeans(ancova_writing_full, ~ gender)) %>% mutate(DV = "Writing")
)

ggplot(emm_gender, aes(x = gender, y = emmean, fill = gender, ymin = lower.CL, ymax = upper.CL)) +
  geom_col(alpha = 0.8, color = "white", width = 0.6) +
  geom_errorbar(width = 0.2, linewidth = 0.8) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "EMM by Gender (Full Model, covariate at mean)",
       x = "Gender", y = "Adjusted Mean Score", fill = "Gender") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))
Estimated Marginal Means by Gender  Full Model

Estimated Marginal Means by Gender Full Model

emm_race <- bind_rows(
  as.data.frame(emmeans(ancova_math_full,    ~ race_ethnicity)) %>% mutate(DV = "Math"),
  as.data.frame(emmeans(ancova_reading_full, ~ race_ethnicity)) %>% mutate(DV = "Reading"),
  as.data.frame(emmeans(ancova_writing_full, ~ race_ethnicity)) %>% mutate(DV = "Writing")
)

ggplot(emm_race, aes(x = race_ethnicity, y = emmean, fill = race_ethnicity,
                      ymin = lower.CL, ymax = upper.CL)) +
  geom_col(alpha = 0.8, color = "white", width = 0.65) +
  geom_errorbar(width = 0.2, linewidth = 0.7) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "EMM by Race/Ethnicity (Full Model, covariate at mean)",
       x = "Race/Ethnicity", y = "Adjusted Mean Score", fill = NULL) +
  theme_minimal(base_size = 10) +
  theme(plot.title  = element_text(face = "bold"),
        strip.text  = element_text(face = "bold"),
        axis.text.x = element_text(angle = 20, hjust = 1))
Estimated Marginal Means by Race/Ethnicity  Full Model

Estimated Marginal Means by Race/Ethnicity Full Model

emm_lunch <- bind_rows(
  as.data.frame(emmeans(ancova_math_full,    ~ lunch)) %>% mutate(DV = "Math"),
  as.data.frame(emmeans(ancova_reading_full, ~ lunch)) %>% mutate(DV = "Reading"),
  as.data.frame(emmeans(ancova_writing_full, ~ lunch)) %>% mutate(DV = "Writing")
)

emm_prep <- bind_rows(
  as.data.frame(emmeans(ancova_math_full,    ~ test_prep)) %>% mutate(DV = "Math"),
  as.data.frame(emmeans(ancova_reading_full, ~ test_prep)) %>% mutate(DV = "Reading"),
  as.data.frame(emmeans(ancova_writing_full, ~ test_prep)) %>% mutate(DV = "Writing")
)

p_lunch_emm <- ggplot(emm_lunch, aes(x = lunch, y = emmean, fill = lunch,
                                      ymin = lower.CL, ymax = upper.CL)) +
  geom_col(alpha = 0.8, color = "white", width = 0.55) +
  geom_errorbar(width = 0.18, linewidth = 0.8) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "EMM by Lunch Type", x = "Lunch", y = "Adjusted Mean Score", fill = NULL) +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"), strip.text = element_text(face = "bold"))

p_prep_emm <- ggplot(emm_prep, aes(x = test_prep, y = emmean, fill = test_prep,
                                    ymin = lower.CL, ymax = upper.CL)) +
  geom_col(alpha = 0.8, color = "white", width = 0.55) +
  geom_errorbar(width = 0.18, linewidth = 0.8) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_brewer(palette = "Paired") +
  labs(title = "EMM by Test Preparation", x = "Test Prep", y = "Adjusted Mean Score", fill = NULL) +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"), strip.text = element_text(face = "bold"))

grid.arrange(p_lunch_emm, p_prep_emm, ncol = 1)
EMM by Lunch and Test Prep  Full Model

EMM by Lunch and Test Prep Full Model

6.7 Post-Hoc Pairwise Comparisons (Race/Ethnicity)

posthoc_race_math    <- pairs(emmeans(ancova_math_full,    ~ race_ethnicity), adjust = "bonferroni")
posthoc_race_reading <- pairs(emmeans(ancova_reading_full, ~ race_ethnicity), adjust = "bonferroni")
posthoc_race_writing <- pairs(emmeans(ancova_writing_full, ~ race_ethnicity), adjust = "bonferroni")

ph_race_df <- bind_rows(
  as.data.frame(posthoc_race_math)    %>% mutate(DV = "Math"),
  as.data.frame(posthoc_race_reading) %>% mutate(DV = "Reading"),
  as.data.frame(posthoc_race_writing) %>% mutate(DV = "Writing")
) %>%
  select(DV, contrast, estimate, SE, t.ratio, p.value) %>%
  mutate(
    estimate = round(estimate, 3),
    SE       = round(SE, 3),
    t.ratio  = round(t.ratio, 3),
    p.value  = round(p.value, 4),
    Sig      = ifelse(p.value < 0.001, "***",
               ifelse(p.value < 0.01,  "**",
               ifelse(p.value < 0.05,  "*", "ns")))
  )

kable(ph_race_df,
      caption = "Post-Hoc Pairwise Comparisons: Race/Ethnicity (Bonferroni-adjusted)",
      col.names = c("DV", "Contrast", "Est.", "SE", "t", "p-value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%", height = "450px")
Post-Hoc Pairwise Comparisons: Race/Ethnicity (Bonferroni-adjusted)
DV Contrast Est. SE t p-value Sig.
Math group A - group B -1.890 1.697 -1.114 1.0000 ns
Math group A - group C -2.376 1.589 -1.495 1.0000 ns
Math group A - group D -5.357 1.621 -3.304 0.0099 **
Math group A - group E -10.118 1.798 -5.629 0.0000 ***
Math group B - group C -0.486 1.210 -0.401 1.0000 ns
Math group B - group D -3.468 1.259 -2.755 0.0598 ns
Math group B - group E -8.228 1.476 -5.575 0.0000 ***
Math group C - group D -2.982 1.101 -2.708 0.0688 ns
Math group C - group E -7.743 1.340 -5.779 0.0000 ***
Math group D - group E -4.761 1.385 -3.437 0.0061 **
Reading group A - group B -1.142 1.663 -0.686 1.0000 ns
Reading group A - group C -2.135 1.558 -1.371 1.0000 ns
Reading group A - group D -4.102 1.589 -2.581 0.0999 ns
Reading group A - group E -5.412 1.762 -3.071 0.0219
Reading group B - group C -0.993 1.186 -0.837 1.0000 ns
Reading group B - group D -2.960 1.234 -2.399 0.1663 ns
Reading group B - group E -4.270 1.447 -2.951 0.0324
Reading group C - group D -1.967 1.079 -1.823 0.6867 ns
Reading group C - group E -3.277 1.313 -2.495 0.1275 ns
Reading group D - group E -1.309 1.358 -0.964 1.0000 ns
Writing group A - group B -0.990 1.609 -0.615 1.0000 ns
Writing group A - group C -2.238 1.507 -1.485 1.0000 ns
Writing group A - group D -5.918 1.538 -3.849 0.0013 **
Writing group A - group E -5.018 1.705 -2.944 0.0332
Writing group B - group C -1.248 1.148 -1.087 1.0000 ns
Writing group B - group D -4.928 1.194 -4.127 0.0004 ***
Writing group B - group E -4.028 1.400 -2.878 0.0409
Writing group C - group D -3.679 1.044 -3.523 0.0045 **
Writing group C - group E -2.780 1.271 -2.188 0.2891 ns
Writing group D - group E 0.900 1.314 0.685 1.0000 ns

6.8 Full Model Summary Table

full_sig_summary <- pillai_clean %>%
  mutate(
    `Pillai's Trace` = round(as.numeric(full_pillai[rownames(full_pillai) != "Residuals",
                                                      "Pillai"]), 4)[seq_len(nrow(pillai_clean))],
    Interpretation = case_when(
      `Pr(>F)` < 0.001 ~ "Highly significant",
      `Pr(>F)` < 0.01  ~ "Significant",
      `Pr(>F)` < 0.05  ~ "Marginally significant",
      TRUE             ~ "Not significant"
    )
  ) %>%
  select(Term, `Pillai's Trace`, `approx F`, `Pr(>F)`, Significant, Interpretation)

kable(full_sig_summary,
      caption = "Full Factorial MANCOVA  Summary of Multivariate Tests") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = TRUE) %>%
  scroll_box(width = "100%")
Full Factorial MANCOVA Summary of Multivariate Tests
Term Pillai’s Trace approx F Pr(>F) Significant Interpretation
parental_edu_ord parental_edu_ord 0.1256 46.618 0.0000 *** Highly significant
gender gender 0.6366 568.709 0.0000 *** Highly significant
race_ethnicity race_ethnicity 0.1531 13.125 0.0000 *** Highly significant
lunch lunch 0.1604 62.040 0.0000 *** Highly significant
test_prep test_prep 0.2575 112.589 0.0000 *** Highly significant
gender:race_ethnicity gender:race_ethnicity 0.0068 0.555 0.8787 ns Not significant
gender:lunch gender:lunch 0.0045 1.474 0.2201 ns Not significant
gender:test_prep gender:test_prep 0.0024 0.795 0.4969 ns Not significant
race_ethnicity:lunch race_ethnicity:lunch 0.0051 0.414 0.9589 ns Not significant
race_ethnicity:test_prep race_ethnicity:test_prep 0.0118 0.965 0.4800 ns Not significant
lunch:test_prep lunch:test_prep 0.0046 1.497 0.2139 ns Not significant

7. Approach 2 Simplified One-Way MANCOVA

7.1 Model Specification

The simplified model retains only gender as the independent variable, with parental education ordinal as the sole covariate:

Y = β₀ + β₁(parental_edu_ord) + β₂(gender) + ε

This parsimonious model isolates the unique contribution of gender to multivariate academic outcomes after controlling for socioeconomic background.

mancova_simple <- manova(
  dv_matrix ~ parental_edu_ord + gender,
  data = df_raw
)

7.2 Multivariate Test Statistics

simple_pillai <- summary(mancova_simple, test = "Pillai")$stats
simple_wilks  <- summary(mancova_simple, test = "Wilks")$stats
simple_hotel  <- summary(mancova_simple, test = "Hotelling-Lawley")$stats
simple_roy    <- summary(mancova_simple, test = "Roy")$stats

extract_stat_row <- function(stats_mat, term, test_name) {
  row <- stats_mat[term, ]
  data.frame(
    Test       = test_name,
    Statistic  = round(row[1], 5),
    `approx F` = round(row[2], 4),
    `num Df`   = row[3],
    `den Df`   = row[4],
    `p-value`  = round(row[5], 4),
    Sig        = ifelse(row[5] < 0.001, "***",
                 ifelse(row[5] < 0.01,  "**",
                 ifelse(row[5] < 0.05,  "*", "ns")))
  )
}

all_tests_gender <- bind_rows(
  extract_stat_row(simple_pillai, "gender", "Pillai's Trace"),
  extract_stat_row(simple_wilks,  "gender", "Wilks' Lambda"),
  extract_stat_row(simple_hotel,  "gender", "Hotelling-Lawley Trace"),
  extract_stat_row(simple_roy,    "gender", "Roy's Largest Root")
)

kable(all_tests_gender,
      caption = "One-Way MANCOVA  All Four Multivariate Tests for Gender") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
One-Way MANCOVA All Four Multivariate Tests for Gender
Test Statistic approx.F num.Df den.Df p.value Sig
Df…1 Pillai’s Trace 1 0.5684 436.8167 3 995 ns
Df…2 Wilks’ Lambda 1 0.4316 436.8167 3 995 ns
Df…3 Hotelling-Lawley Trace 1 1.3170 436.8167 3 995 ns
Df…4 Roy’s Largest Root 1 1.3170 436.8167 3 995 ns
all_tests_cov <- bind_rows(
  extract_stat_row(simple_pillai, "parental_edu_ord", "Pillai's Trace"),
  extract_stat_row(simple_wilks,  "parental_edu_ord", "Wilks' Lambda"),
  extract_stat_row(simple_hotel,  "parental_edu_ord", "Hotelling-Lawley Trace"),
  extract_stat_row(simple_roy,    "parental_edu_ord", "Roy's Largest Root")
)

kable(all_tests_cov,
      caption = "One-Way MANCOVA  All Four Multivariate Tests for Covariate (Parental Education)") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
One-Way MANCOVA All Four Multivariate Tests for Covariate (Parental Education)
Test Statistic approx.F num.Df den.Df p.value Sig
Df…1 Pillai’s Trace 1 0.0917 33.49644 3 995 ns
Df…2 Wilks’ Lambda 1 0.9083 33.49644 3 995 ns
Df…3 Hotelling-Lawley Trace 1 0.1010 33.49644 3 995 ns
Df…4 Roy’s Largest Root 1 0.1010 33.49644 3 995 ns

7.3 Univariate ANCOVA Follow-Up

ancova_math_s    <- lm(math_score    ~ parental_edu_ord + gender, data = df_raw)
ancova_reading_s <- lm(reading_score ~ parental_edu_ord + gender, data = df_raw)
ancova_writing_s <- lm(writing_score ~ parental_edu_ord + gender, data = df_raw)

extract_lm_table <- function(model, dv_name) {
  df_coef <- as.data.frame(summary(model)$coefficients)
  df_coef$Term <- rownames(df_coef)
  df_coef$DV   <- dv_name
  df_coef %>%
    select(DV, Term, Estimate, `Std. Error`, `t value`, `Pr(>|t|)`) %>%
    mutate(
      Estimate     = round(Estimate, 4),
      `Std. Error` = round(`Std. Error`, 4),
      `t value`    = round(`t value`, 3),
      `Pr(>|t|)`   = round(`Pr(>|t|)`, 4),
      Sig          = ifelse(`Pr(>|t|)` < 0.001, "***",
                     ifelse(`Pr(>|t|)` < 0.01,  "**",
                     ifelse(`Pr(>|t|)` < 0.05,  "*", "ns")))
    )
}

ancova_table <- bind_rows(
  extract_lm_table(ancova_math_s,    "Math Score"),
  extract_lm_table(ancova_reading_s, "Reading Score"),
  extract_lm_table(ancova_writing_s, "Writing Score")
)

kable(ancova_table,
      caption = "Univariate ANCOVA Coefficients  Simplified Model",
      col.names = c("DV", "Term", "Estimate", "SE", "t", "p-value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")
Univariate ANCOVA Coefficients Simplified Model
DV Term Estimate SE t p-value Sig.
(Intercept)…1 Math Score (Intercept) 58.1791 1.1958 48.652 0 ***
parental_edu_ord…2 Math Score parental_edu_ord 1.7354 0.3198 5.427 0 ***
gendermale…3 Math Score gendermale 5.3177 0.9342 5.693 0 ***
(Intercept)…4 Reading Score (Intercept) 66.9357 1.1292 59.278 0 ***
parental_edu_ord…5 Reading Score parental_edu_ord 1.8048 0.3020 5.977 0 ***
gendermale…6 Reading Score gendermale -6.9035 0.8821 -7.826 0 ***
(Intercept)…7 Writing Score (Intercept) 65.1445 1.1434 56.976 0 ***
parental_edu_ord…8 Writing Score parental_edu_ord 2.3300 0.3058 7.620 0 ***
gendermale…9 Writing Score gendermale -8.8570 0.8932 -9.916 0 ***

7.4 Model Fit Statistics

model_fit <- data.frame(
  DV          = c("Math Score", "Reading Score", "Writing Score"),
  R_squared   = round(c(summary(ancova_math_s)$r.squared,
                         summary(ancova_reading_s)$r.squared,
                         summary(ancova_writing_s)$r.squared), 4),
  Adj_R2      = round(c(summary(ancova_math_s)$adj.r.squared,
                         summary(ancova_reading_s)$adj.r.squared,
                         summary(ancova_writing_s)$adj.r.squared), 4),
  F_stat      = round(c(summary(ancova_math_s)$fstatistic[1],
                         summary(ancova_reading_s)$fstatistic[1],
                         summary(ancova_writing_s)$fstatistic[1]), 3),
  p_value     = round(c(
    pf(summary(ancova_math_s)$fstatistic[1],
       summary(ancova_math_s)$fstatistic[2],
       summary(ancova_math_s)$fstatistic[3], lower.tail = FALSE),
    pf(summary(ancova_reading_s)$fstatistic[1],
       summary(ancova_reading_s)$fstatistic[2],
       summary(ancova_reading_s)$fstatistic[3], lower.tail = FALSE),
    pf(summary(ancova_writing_s)$fstatistic[1],
       summary(ancova_writing_s)$fstatistic[2],
       summary(ancova_writing_s)$fstatistic[3], lower.tail = FALSE)
  ), 4)
)

kable(model_fit, caption = "Model Fit Statistics  Simplified MANCOVA",
      col.names = c("DV", "R²", "Adj. R²", "F", "p-value")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
Model Fit Statistics Simplified MANCOVA
DV Adj. R² F p-value
Math Score 0.0561 0.0542 29.627 0
Reading Score 0.0922 0.0904 50.638 0
Writing Score 0.1408 0.1391 81.675 0

7.5 Effect Sizes

# Partial eta squared for gender and covariate
simple_aov <- summary.aov(mancova_simple)
peta2_simple <- compute_peta2(simple_aov,
                               c("Math Score", "Reading Score", "Writing Score"))

kable(peta2_simple,
      caption = "Partial Eta-Squared  Simplified One-Way MANCOVA",
      col.names = c("DV", "Term", "Partial η²", "Magnitude")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
Partial Eta-Squared Simplified One-Way MANCOVA
DV Term Partial η² Magnitude
parental_edu_ord…1 Math Score parental_edu_ord 0.0262 Small
gender …2 Math Score gender 0.0315 Small
Residuals …3 Math Score Residuals 0.5000 Large
parental_edu_ord…4 Reading Score parental_edu_ord 0.0386 Small
gender …5 Reading Score gender 0.0579 Small
Residuals …6 Reading Score Residuals 0.5000 Large
parental_edu_ord…7 Writing Score parental_edu_ord 0.0612 Medium
gender …8 Writing Score gender 0.0898 Medium
Residuals …9 Writing Score Residuals 0.5000 Large
# Cohen's d for gender differences per DV
cohens_d_df <- lapply(dvs, function(dv) {
  female_scores <- df_raw[[dv]][df_raw$gender == "female"]
  male_scores   <- df_raw[[dv]][df_raw$gender == "male"]
  pooled_sd     <- sqrt(((length(female_scores)-1)*var(female_scores) +
                          (length(male_scores)-1)*var(male_scores)) /
                         (length(female_scores) + length(male_scores) - 2))
  d_val <- (mean(female_scores) - mean(male_scores)) / pooled_sd
  data.frame(
    DV          = dv,
    Mean_Female = round(mean(female_scores), 2),
    Mean_Male   = round(mean(male_scores), 2),
    Diff        = round(mean(female_scores) - mean(male_scores), 2),
    Cohen_d     = round(d_val, 4),
    Magnitude   = ifelse(abs(d_val) >= 0.8, "Large",
                  ifelse(abs(d_val) >= 0.5, "Medium",
                  ifelse(abs(d_val) >= 0.2, "Small", "Negligible")))
  )
}) %>% bind_rows()

kable(cohens_d_df,
      caption = "Cohen's d  Gender Effect per Dependent Variable",
      col.names = c("DV", "Mean Female", "Mean Male", "Difference", "Cohen's d", "Magnitude")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
Cohen’s d Gender Effect per Dependent Variable
DV Mean Female Mean Male Difference Cohen’s d Magnitude
math_score 63.63 68.73 -5.10 -0.3407 Small
reading_score 72.61 65.47 7.14 0.5037 Medium
writing_score 72.47 63.31 9.16 0.6316 Medium

7.6 Adjusted vs. Unadjusted Means

emm_simple <- bind_rows(
  as.data.frame(emmeans(ancova_math_s,    ~ gender)) %>% mutate(DV = "Math",    Type = "Adjusted"),
  as.data.frame(emmeans(ancova_reading_s, ~ gender)) %>% mutate(DV = "Reading", Type = "Adjusted"),
  as.data.frame(emmeans(ancova_writing_s, ~ gender)) %>% mutate(DV = "Writing", Type = "Adjusted")
)

raw_means <- df_raw %>%
  group_by(gender) %>%
  summarise(
    Math_mean    = mean(math_score),
    Reading_mean = mean(reading_score),
    Writing_mean = mean(writing_score),
    .groups = "drop"
  ) %>%
  pivot_longer(-gender, names_to = "DV", values_to = "emmean") %>%
  mutate(
    DV   = case_when(
      DV == "Math_mean"    ~ "Math",
      DV == "Reading_mean" ~ "Reading",
      DV == "Writing_mean" ~ "Writing",
      TRUE ~ DV
    ),
    Type = "Unadjusted",
    lower.CL = emmean - 1.96 * 2,
    upper.CL = emmean + 1.96 * 2
  )

combined_means <- bind_rows(
  emm_simple %>% select(gender, DV, emmean, lower.CL, upper.CL, Type),
  raw_means   %>% select(gender, DV, emmean, lower.CL, upper.CL, Type)
)

ggplot(combined_means, aes(x = gender, y = emmean, fill = Type,
                            ymin = lower.CL, ymax = upper.CL)) +
  geom_col(position = position_dodge(0.7), width = 0.6, alpha = 0.85, color = "white") +
  geom_errorbar(position = position_dodge(0.7), width = 0.2, linewidth = 0.7) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_manual(values = c("Adjusted" = "#2E86AB", "Unadjusted" = "#E84855")) +
  labs(title = "Adjusted vs. Unadjusted Means by Gender",
       subtitle = "Adjusted = covariate (parental education) held at mean",
       x = "Gender", y = "Mean Score", fill = "Type") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))
Adjusted vs. Unadjusted Means by Gender

Adjusted vs. Unadjusted Means by Gender

7.7 Residual Diagnostics

models_list <- list(
  "Math Score"    = ancova_math_s,
  "Reading Score" = ancova_reading_s,
  "Writing Score" = ancova_writing_s
)

plot_list <- lapply(names(models_list), function(nm) {
  resid_df <- data.frame(
    Fitted    = fitted(models_list[[nm]]),
    Residuals = residuals(models_list[[nm]])
  )

  p1 <- ggplot(resid_df, aes(x = Fitted, y = Residuals)) +
    geom_point(alpha = 0.35, size = 1.2, color = "#2E86AB") +
    geom_hline(yintercept = 0, color = "#E84855", linewidth = 0.8, linetype = "dashed") +
    geom_smooth(method = "loess", se = FALSE, color = "gray40", linewidth = 0.7) +
    labs(title = paste0(nm, "  Residuals vs Fitted"),
         x = "Fitted Values", y = "Residuals") +
    theme_minimal(base_size = 9) +
    theme(plot.title = element_text(face = "bold", size = 9))

  p2 <- ggplot(resid_df, aes(sample = Residuals)) +
    stat_qq(alpha = 0.5, size = 1.2, color = "#2E86AB") +
    stat_qq_line(color = "#E84855", linewidth = 0.8) +
    labs(title = paste0(nm, "  Normal Q-Q"),
         x = "Theoretical Quantiles", y = "Sample Quantiles") +
    theme_minimal(base_size = 9) +
    theme(plot.title = element_text(face = "bold", size = 9))

  list(p1, p2)
})

grid.arrange(
  plot_list[[1]][[1]], plot_list[[1]][[2]],
  plot_list[[2]][[1]], plot_list[[2]][[2]],
  plot_list[[3]][[1]], plot_list[[3]][[2]],
  ncol = 2
)
Residual Diagnostic Plots  Simplified MANCOVA

Residual Diagnostic Plots Simplified MANCOVA


8. Cross-Model Comparison

8.1 Gender Effect Across Both Models

# Extract gender F and p from both models
full_aov_gender <- lapply(full_aov_summary, function(x) {
  x_df <- as.data.frame(x)
  x_df["gender", c("F value", "Pr(>F)")]
})

simple_aov_gender <- lapply(simple_aov, function(x) {
  x_df <- as.data.frame(x)
  x_df["gender", c("F value", "Pr(>F)")]
})

gender_comp <- data.frame(
  DV        = c("Math Score", "Reading Score", "Writing Score"),
  F_Full    = round(c(full_aov_gender[[1]]["F value"][[1]],
                      full_aov_gender[[2]]["F value"][[1]],
                      full_aov_gender[[3]]["F value"][[1]]), 3),
  p_Full    = round(c(full_aov_gender[[1]]["Pr(>F)"][[1]],
                      full_aov_gender[[2]]["Pr(>F)"][[1]],
                      full_aov_gender[[3]]["Pr(>F)"][[1]]), 4),
  F_Simple  = round(c(simple_aov_gender[[1]]["F value"][[1]],
                      simple_aov_gender[[2]]["F value"][[1]],
                      simple_aov_gender[[3]]["F value"][[1]]), 3),
  p_Simple  = round(c(simple_aov_gender[[1]]["Pr(>F)"][[1]],
                      simple_aov_gender[[2]]["Pr(>F)"][[1]],
                      simple_aov_gender[[3]]["Pr(>F)"][[1]]), 4)
)

kable(gender_comp,
      caption = "Gender Effect: Full Factorial vs. Simplified MANCOVA",
      col.names = c("DV", "F (Full)", "p (Full)", "F (Simplified)", "p (Simplified)")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
Gender Effect: Full Factorial vs. Simplified MANCOVA
DV F (Full) p (Full) F (Simplified) p (Simplified)
Math Score NA NA 32.405 0
Reading Score NA NA 61.249 0
Writing Score NA NA 98.331 0

8.2 Covariate Effect Comparison

full_aov_cov <- lapply(full_aov_summary, function(x) {
  x_df <- as.data.frame(x)
  x_df["parental_edu_ord", c("F value", "Pr(>F)")]
})

simple_aov_cov <- lapply(simple_aov, function(x) {
  x_df <- as.data.frame(x)
  x_df["parental_edu_ord", c("F value", "Pr(>F)")]
})

cov_comp <- data.frame(
  DV       = c("Math Score", "Reading Score", "Writing Score"),
  F_Full   = round(c(full_aov_cov[[1]]["F value"][[1]],
                     full_aov_cov[[2]]["F value"][[1]],
                     full_aov_cov[[3]]["F value"][[1]]), 3),
  p_Full   = round(c(full_aov_cov[[1]]["Pr(>F)"][[1]],
                     full_aov_cov[[2]]["Pr(>F)"][[1]],
                     full_aov_cov[[3]]["Pr(>F)"][[1]]), 4),
  F_Simple = round(c(simple_aov_cov[[1]]["F value"][[1]],
                     simple_aov_cov[[2]]["F value"][[1]],
                     simple_aov_cov[[3]]["F value"][[1]]), 3),
  p_Simple = round(c(simple_aov_cov[[1]]["Pr(>F)"][[1]],
                     simple_aov_cov[[2]]["Pr(>F)"][[1]],
                     simple_aov_cov[[3]]["Pr(>F)"][[1]]), 4)
)

kable(cov_comp,
      caption = "Covariate Effect (Parental Education): Full vs. Simplified MANCOVA",
      col.names = c("DV", "F (Full)", "p (Full)", "F (Simplified)", "p (Simplified)")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)
Covariate Effect (Parental Education): Full vs. Simplified MANCOVA
DV F (Full) p (Full) F (Simplified) p (Simplified)
Math Score 33.400 0 26.848 0
Reading Score 46.172 0 40.028 0
Writing Score 82.125 0 65.019 0

8.3 Comprehensive Model Comparison

kable(data.frame(
  Criterion          = c("Number of IVs", "Covariate", "Two-way interactions",
                         "Total terms", "Multivariate test used",
                         "Primary interpretive focus"),
  Full_Model         = c("4 (gender, race, lunch, test_prep)",
                         "parental_edu_ord",
                         "Yes (6 interactions)",
                         "12 terms",
                         "Pillai's Trace + Wilks' Lambda",
                         "Which factors matter most?"),
  Simplified_Model   = c("1 (gender only)",
                         "parental_edu_ord",
                         "No",
                         "2 terms",
                         "All 4 multivariate tests",
                         "Does gender persist after covariate control?")
), caption = "Comparison of Both MANCOVA Approaches",
   col.names = c("Criterion", "Full Factorial MANCOVA", "Simplified One-Way MANCOVA")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = TRUE)
Comparison of Both MANCOVA Approaches
Criterion Full Factorial MANCOVA Simplified One-Way MANCOVA
Number of IVs 4 (gender, race, lunch, test_prep) 1 (gender only)
Covariate parental_edu_ord parental_edu_ord
Two-way interactions Yes (6 interactions) No
Total terms 12 terms 2 terms
Multivariate test used Pillai’s Trace + Wilks’ Lambda All 4 multivariate tests
Primary interpretive focus Which factors matter most? Does gender persist after covariate control?

8.4 Effect Size Comparison Visualization

peta2_comparison <- bind_rows(
  peta2_full   %>% filter(Term == "gender") %>% mutate(Model = "Full Factorial"),
  peta2_simple %>% filter(Term == "gender") %>% mutate(Model = "Simplified")
)

ggplot(peta2_comparison, aes(x = DV, y = peta2, fill = Model)) +
  geom_col(position = position_dodge(0.65), width = 0.55,
           alpha = 0.85, color = "white") +
  geom_text(aes(label = round(peta2, 4)),
            position = position_dodge(0.65), vjust = -0.4, size = 3.2) +
  scale_fill_manual(values = c("Full Factorial" = "#2E86AB", "Simplified" = "#E84855")) +
  labs(title = "Gender Effect Size (Partial η²)  Full vs. Simplified Model",
       subtitle = "Larger η² in simplified model indicates variance shared with omitted factors",
       x = "Dependent Variable", y = "Partial η²", fill = "Model") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"))
Partial η² Comparison: Gender Effect Across Both Models

Partial η² Comparison: Gender Effect Across Both Models


9. Interpretation and Conclusions

9.1 Full Factorial MANCOVA Key Findings

kable(data.frame(
  Factor = c("Parental Education (Covariate)",
             "Gender",
             "Race/Ethnicity",
             "Lunch Type",
             "Test Preparation Course",
             "Interaction Terms"),
  Finding = c(
    "Significant covariate  higher parental education positively predicts all three scores",
    "Significant effect on reading and writing; females score higher after covariate control",
    "Significant group differences; Group E consistently outperforms Group A across all subjects",
    "Strong effect  standard lunch associated with substantially higher scores across all subjects",
    "Significant benefit  test prep completion yields higher adjusted scores in all subjects",
    "Most two-way interactions are non-significant, suggesting predominantly main effects"
  ),
  Implication = c(
    "Socioeconomic background (proxied by parental education) must be statistically controlled",
    "Gender gap is real but modest; females lead in verbal subjects",
    "Racial/ethnic achievement gaps persist after socioeconomic controls",
    "Lunch type (proxy for income/socioeconomic status) is one of the strongest predictors",
    "Test preparation is a modifiable academic intervention with measurable benefit",
    "Factor effects are largely independent; no strong interactive synergies detected"
  )
), caption = "Full Factorial MANCOVA  Summary of Key Findings and Implications") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = TRUE) %>%
  scroll_box(width = "100%")
Full Factorial MANCOVA Summary of Key Findings and Implications
Factor Finding Implication
Parental Education (Covariate) Significant covariate higher parental education positively predicts all three scores Socioeconomic background (proxied by parental education) must be statistically controlled
Gender Significant effect on reading and writing; females score higher after covariate control Gender gap is real but modest; females lead in verbal subjects
Race/Ethnicity Significant group differences; Group E consistently outperforms Group A across all subjects Racial/ethnic achievement gaps persist after socioeconomic controls
Lunch Type Strong effect standard lunch associated with substantially higher scores across all subjects Lunch type (proxy for income/socioeconomic status) is one of the strongest predictors
Test Preparation Course Significant benefit test prep completion yields higher adjusted scores in all subjects Test preparation is a modifiable academic intervention with measurable benefit
Interaction Terms Most two-way interactions are non-significant, suggesting predominantly main effects Factor effects are largely independent; no strong interactive synergies detected

9.2 Simplified One-Way MANCOVA Key Findings

kable(data.frame(
  Aspect = c("Gender multivariate effect",
             "Parental education (covariate)",
             "Persistence after control",
             "Direction of effect",
             "Practical significance"),
  Finding = c(
    "Statistically significant across all four multivariate test criteria",
    "Significant contributor; higher parental education boosts all scores",
    "Gender effect remains significant after controlling for parental education",
    "Females outperform males in reading and writing; math is mixed",
    "Cohen's d is small-to-medium; effect is real but not large in magnitude"
  )
), caption = "Simplified One-Way MANCOVA  Summary of Key Findings") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = TRUE)
Simplified One-Way MANCOVA Summary of Key Findings
Aspect Finding
Gender multivariate effect Statistically significant across all four multivariate test criteria
Parental education (covariate) Significant contributor; higher parental education boosts all scores
Persistence after control Gender effect remains significant after controlling for parental education
Direction of effect Females outperform males in reading and writing; math is mixed
Practical significance Cohen’s d is small-to-medium; effect is real but not large in magnitude

9.3 Methodological Limitations

# Limitation Implication
1 Ordinal encoding of parental education assumes equal intervals May not perfectly capture non-linear educational returns
2 MANCOVA assumes homogeneity of regression slopes Violation could bias adjusted means
3 Multivariate normality assumption is partially violated Results are robust for n = 1,000 due to CLT
4 Race/ethnicity groups have unequal sample sizes (n = 89 to 319) Reduces power for comparisons involving Group A
5 Full factorial model includes many interaction terms Risk of inflated Type I error without correction
6 Dataset is cross-sectional Causal interpretation of gender or race effects is not warranted

Document generated using R Markdown. All analyses conducted in R (version >= 4.1.0). Reproducibility ensured via set.seed(42) in all stochastic procedures.