1. Introduction

This report presents a comprehensive Multivariate Analysis of Covariance (MANCOVA) of student academic performance data. MANCOVA extends MANOVA by incorporating one or more continuous covariates to control for their influence on the dependent variable set, thereby providing a more precise examination of group differences.

Two analytical approaches are employed and systematically compared:

Full Factorial MANCOVA examines the simultaneous effects of multiple demographic and academic factors (gender, race/ethnicity, lunch type, and test preparation course) on academic outcomes while controlling for parental education level
Simplified One-Way MANCOVA isolates the effect of gender on academic performance after controlling for parental education level

The dataset comprises 1,000 student records with three dependent variables (math score, reading score, and writing score), four categorical independent variables, and one ordinal covariate derived from parental level of education.

The analysis pipeline follows: data loading → exploratory data analysis → assumption checking → MANCOVA execution → post-hoc testing → effect size estimation → interpretation.

2. Data Loading and Initial Inspection

df_raw <- read.csv("StudentsPerformance.csv", stringsAsFactors = FALSE)

colnames(df_raw) <- c(
  "gender", "race_ethnicity", "parental_education",
  "lunch", "test_prep", "math_score", "reading_score", "writing_score"
)

cat(sprintf("Rows: %d | Columns: %d\n", nrow(df_raw), ncol(df_raw)))

## Rows: 1000 | Columns: 8

str(df_raw)

## 'data.frame':    1000 obs. of  8 variables:
##  $ gender            : chr  "female" "female" "female" "male" ...
##  $ race_ethnicity    : chr  "group B" "group C" "group B" "group A" ...
##  $ parental_education: chr  "bachelor's degree" "some college" "master's degree" "associate's degree" ...
##  $ lunch             : chr  "standard" "standard" "standard" "free/reduced" ...
##  $ test_prep         : chr  "none" "completed" "none" "none" ...
##  $ math_score        : int  72 69 90 47 76 71 88 40 64 38 ...
##  $ reading_score     : int  72 90 95 57 78 83 95 43 64 60 ...
##  $ writing_score     : int  74 88 93 44 75 78 92 39 67 50 ...

missing_summary <- data.frame(
  Column   = names(df_raw),
  Missing  = sapply(df_raw, function(x) sum(is.na(x))),
  Pct_Miss = sapply(df_raw, function(x) round(mean(is.na(x)) * 100, 2))
)
rownames(missing_summary) <- NULL

kable(missing_summary, caption = "Missing Values per Variable",
      col.names = c("Variable", "Missing Count", "Missing (%)")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"), full_width = FALSE)

Missing Values per Variable
Variable	Missing Count	Missing (%)
gender	0	0
race_ethnicity	0	0
parental_education	0	0
lunch	0	0
test_prep	0	0
math_score	0	0
reading_score	0	0
writing_score	0	0

kable(head(df_raw, 8), caption = "Preview: First 8 Rows of Dataset") %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")

Preview: First 8 Rows of Dataset
gender	race_ethnicity	parental_education	lunch	test_prep	math_score	reading_score	writing_score
female	group B	bachelor’s degree	standard	none	72	72	74
female	group C	some college	standard	completed	69	90	88
female	group B	master’s degree	standard	none	90	95	93
male	group A	associate’s degree	free/reduced	none	47	57	44
male	group C	some college	standard	none	76	78	75
female	group B	associate’s degree	standard	none	71	83	78
female	group B	some college	standard	completed	88	95	92
male	group B	some college	free/reduced	none	40	43	39

3. Data Preprocessing

3.1 Ordinal Encoding of Parental Education

The covariate parental_education is converted to an ordinal numeric scale representing the hierarchical ordering of educational attainment levels.

edu_order <- c(
  "some high school"    = 1,
  "high school"         = 2,
  "some college"        = 3,
  "associate's degree"  = 4,
  "bachelor's degree"   = 5,
  "master's degree"     = 6
)

df_raw$parental_edu_ord <- as.integer(edu_order[df_raw$parental_education])

edu_mapping <- data.frame(
  Category    = names(edu_order),
  Ordinal_Code = as.integer(edu_order)
)

kable(edu_mapping, caption = "Ordinal Encoding of Parental Education Level",
      col.names = c("Category", "Ordinal Code")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

Ordinal Encoding of Parental Education Level
Category	Ordinal Code
some high school	1
high school	2
some college	3
associate’s degree	4
bachelor’s degree	5
master’s degree	6

cat(sprintf("Ordinal range: %d to %d | No NAs: %s\n",
            min(df_raw$parental_edu_ord), max(df_raw$parental_edu_ord),
            sum(is.na(df_raw$parental_edu_ord)) == 0))

## Ordinal range: 1 to 6 | No NAs: TRUE

3.2 Factor Conversion

df_raw$gender         <- factor(df_raw$gender)
df_raw$race_ethnicity <- factor(df_raw$race_ethnicity)
df_raw$lunch          <- factor(df_raw$lunch)
df_raw$test_prep      <- factor(df_raw$test_prep)

cat("Factor levels:\n")

## Factor levels:

cat(sprintf("  gender         : %s\n", paste(levels(df_raw$gender), collapse = ", ")))

##   gender         : female, male

cat(sprintf("  race_ethnicity : %s\n", paste(levels(df_raw$race_ethnicity), collapse = ", ")))

##   race_ethnicity : group A, group B, group C, group D, group E

cat(sprintf("  lunch          : %s\n", paste(levels(df_raw$lunch), collapse = ", ")))

##   lunch          : free/reduced, standard

cat(sprintf("  test_prep      : %s\n", paste(levels(df_raw$test_prep), collapse = ", ")))

##   test_prep      : completed, none

3.3 Group Sample Sizes

group_size_df <- bind_rows(
  df_raw %>% count(gender) %>% rename(Level = gender) %>% mutate(Factor = "Gender"),
  df_raw %>% count(race_ethnicity) %>% rename(Level = race_ethnicity) %>% mutate(Factor = "Race/Ethnicity"),
  df_raw %>% count(lunch) %>% rename(Level = lunch) %>% mutate(Factor = "Lunch"),
  df_raw %>% count(test_prep) %>% rename(Level = test_prep) %>% mutate(Factor = "Test Prep")
) %>% select(Factor, Level, n)

kable(group_size_df, caption = "Sample Sizes per Factor Level",
      col.names = c("Factor", "Level", "n")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"), full_width = FALSE)

Sample Sizes per Factor Level
Factor	Level	n
Gender	female	518
Gender	male	482
Race/Ethnicity	group A	89
Race/Ethnicity	group B	190
Race/Ethnicity	group C	319
Race/Ethnicity	group D	262
Race/Ethnicity	group E	140
Lunch	free/reduced	355
Lunch	standard	645
Test Prep	completed	358
Test Prep	none	642

4. Exploratory Data Analysis

4.1 Descriptive Statistics

dvs <- c("math_score", "reading_score", "writing_score")

desc_stats <- data.frame(
  Variable = dvs,
  N        = sapply(df_raw[dvs], function(x) sum(!is.na(x))),
  Mean     = sapply(df_raw[dvs], mean, na.rm = TRUE),
  SD       = sapply(df_raw[dvs], sd, na.rm = TRUE),
  Min      = sapply(df_raw[dvs], min, na.rm = TRUE),
  Q1       = sapply(df_raw[dvs], quantile, 0.25, na.rm = TRUE),
  Median   = sapply(df_raw[dvs], median, na.rm = TRUE),
  Q3       = sapply(df_raw[dvs], quantile, 0.75, na.rm = TRUE),
  Max      = sapply(df_raw[dvs], max, na.rm = TRUE),
  Skewness = sapply(df_raw[dvs], skewness, na.rm = TRUE),
  Kurtosis = sapply(df_raw[dvs], kurtosis, na.rm = TRUE)
)
rownames(desc_stats) <- NULL

kable(desc_stats %>% mutate(across(where(is.numeric), ~round(.x, 3))),
      caption = "Descriptive Statistics of Dependent Variables") %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")

Descriptive Statistics of Dependent Variables
Variable	N	Mean	SD	Min	Q1	Median	Q3	Max	Skewness	Kurtosis
math_score	1000	66.089	15.163	0	57.00	66	77	100	-0.279	3.268
reading_score	1000	69.169	14.600	17	59.00	70	79	100	-0.259	2.926
writing_score	1000	68.054	15.196	10	57.75	69	79	100	-0.289	2.961

4.2 Distribution of Scores

df_scores_long <- df_raw %>%
  select(math_score, reading_score, writing_score) %>%
  pivot_longer(cols = everything(), names_to = "Subject", values_to = "Score") %>%
  mutate(Subject = case_when(
    Subject == "math_score"    ~ "Math",
    Subject == "reading_score" ~ "Reading",
    Subject == "writing_score" ~ "Writing",
    TRUE ~ Subject
  ))

ggplot(df_scores_long, aes(x = Score, fill = Subject)) +
  geom_histogram(aes(y = after_stat(density)), bins = 25,
                 color = "white", alpha = 0.75) +
  geom_density(aes(color = Subject), linewidth = 0.9, fill = NA) +
  facet_wrap(~Subject, scales = "free_y", ncol = 3) +
  scale_fill_brewer(palette = "Set1", guide = "none") +
  scale_color_brewer(palette = "Set1", guide = "none") +
  labs(title = "Score Distributions by Subject",
       subtitle = "Histogram overlaid with kernel density estimate",
       x = "Score", y = "Density") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold", size = 10))

Distribution of Academic Scores

4.3 Scores by Group (Boxplots)

df_gender_long <- df_raw %>%
  select(gender, math_score, reading_score, writing_score) %>%
  pivot_longer(-gender, names_to = "Subject", values_to = "Score") %>%
  mutate(Subject = case_when(
    Subject == "math_score"    ~ "Math",
    Subject == "reading_score" ~ "Reading",
    Subject == "writing_score" ~ "Writing",
    TRUE ~ Subject
  ))

ggplot(df_gender_long, aes(x = gender, y = Score, fill = gender)) +
  geom_boxplot(outlier.shape = 21, outlier.size = 1.5, alpha = 0.75) +
  facet_wrap(~Subject, ncol = 3) +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "Score Distribution by Gender",
       x = "Gender", y = "Score", fill = "Gender") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))

Score Distribution by Gender

make_bp <- function(factor_var, factor_label, palette_name) {
  df_plot <- df_raw %>%
    select(all_of(c(factor_var, dvs))) %>%
    pivot_longer(cols = all_of(dvs), names_to = "Subject", values_to = "Score") %>%
    mutate(Subject = case_when(
      Subject == "math_score"    ~ "Math",
      Subject == "reading_score" ~ "Reading",
      Subject == "writing_score" ~ "Writing",
      TRUE ~ Subject
    ),
    Factor = .data[[factor_var]])

  ggplot(df_plot, aes(x = Factor, y = Score, fill = Factor)) +
    geom_boxplot(outlier.shape = 21, outlier.size = 1.2, alpha = 0.75) +
    facet_wrap(~Subject, ncol = 3) +
    scale_fill_brewer(palette = palette_name) +
    labs(title = factor_label, x = NULL, y = "Score", fill = NULL) +
    theme_minimal(base_size = 10) +
    theme(plot.title   = element_text(face = "bold"),
          strip.text   = element_text(face = "bold"),
          axis.text.x  = element_text(angle = 20, hjust = 1),
          legend.position = "none")
}

p_race  <- make_bp("race_ethnicity", "By Race/Ethnicity", "Set2")
p_lunch <- make_bp("lunch",          "By Lunch Type",     "Set3")
p_prep  <- make_bp("test_prep",      "By Test Preparation", "Paired")

grid.arrange(p_race, p_lunch, p_prep, ncol = 1)

Score Distributions Across All Factors

4.4 Covariate: Parental Education vs. Scores

df_cov_long <- df_raw %>%
  select(parental_edu_ord, math_score, reading_score, writing_score) %>%
  pivot_longer(-parental_edu_ord, names_to = "Subject", values_to = "Score") %>%
  mutate(Subject = case_when(
    Subject == "math_score"    ~ "Math",
    Subject == "reading_score" ~ "Reading",
    Subject == "writing_score" ~ "Writing",
    TRUE ~ Subject
  ))

ggplot(df_cov_long, aes(x = parental_edu_ord, y = Score, color = Subject)) +
  geom_jitter(alpha = 0.25, size = 0.9, width = 0.15) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1.2) +
  facet_wrap(~Subject, ncol = 3) +
  scale_color_brewer(palette = "Set1", guide = "none") +
  scale_x_continuous(breaks = 1:6,
    labels = c("Some HS", "HS", "Some Col", "Assoc", "Bach", "Master")) +
  labs(title = "Parental Education (Covariate) vs. Academic Scores",
       subtitle = "Linear trend with 95% confidence band",
       x = "Parental Education Level (Ordinal)", y = "Score") +
  theme_minimal(base_size = 11) +
  theme(plot.title   = element_text(face = "bold"),
        strip.text   = element_text(face = "bold"),
        axis.text.x  = element_text(angle = 25, hjust = 1))

Relationship Between Parental Education and Scores

4.5 Correlation Among Dependent Variables

cor_dv <- cor(df_raw[dvs], use = "complete.obs")
rownames(cor_dv) <- colnames(cor_dv) <- c("Math", "Reading", "Writing")

ggcorrplot(cor_dv,
           method   = "square",
           type     = "lower",
           lab      = TRUE,
           lab_size = 5,
           colors   = c("#E84855", "white", "#2E86AB"),
           title    = "Pearson Correlation  Dependent Variables",
           ggtheme  = theme_minimal(base_size = 11)) +
  theme(plot.title = element_text(face = "bold"))

Correlation Matrix of Dependent Variables

kable(data.frame(
  Pair            = c("Math  Reading", "Math  Writing", "Reading  Writing"),
  r               = round(c(cor_dv["Math","Reading"],
                             cor_dv["Math","Writing"],
                             cor_dv["Reading","Writing"]), 4),
  Interpretation  = c("Strong positive", "Strong positive", "Very strong positive")
), caption = "Pairwise Correlation Among Dependent Variables") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

Pairwise Correlation Among Dependent Variables
Pair	r	Interpretation
Math Reading	0.8176	Strong positive
Math Writing	0.8026	Strong positive
Reading Writing	0.9546	Very strong positive

5. MANCOVA Assumptions

5.1 Multivariate Normality

Multivariate normality of the dependent variables is assessed using the Henze-Zirkler test and Mardia’s tests via the MVN package. Slight departures are acceptable for large samples (n > 200) due to the robustness of MANCOVA under the central limit theorem.

library(MVN)
library(knitr)

set.seed(42)

mvn_result <- MVN::mvn(df_raw[, dvs])

cat("Multivariate Normality")

## Multivariate Normality

print(mvn_result$multivariateNormality)

## NULL

cat("Univariate Normality")

## Univariate Normality

print(mvn_result$univariateNormality)

## NULL

5.2 Homogeneity of Variance-Covariance Matrices (Box’s M Test)

Box’s M test evaluates whether the variance-covariance matrices are equal across groups. Because this test is highly sensitive to non-normality, a conservative significance threshold (p < 0.001) is used.

# Box's M for gender
boxm_gender <- boxM(df_raw[, dvs], df_raw$gender)

kable(data.frame(
  Test      = "Box's M Test (Gender)",
  Chi_sq    = round(boxm_gender$statistic, 4),
  df        = boxm_gender$parameter["df"],
  p_value   = round(boxm_gender$p.value, 4),
  Decision  = ifelse(boxm_gender$p.value < 0.001, "Violation (p < 0.001)", "Assumption met")
), caption = "Box's M Test for Homogeneity of Covariance Matrices",
   col.names = c("Test", "Chi-square", "df", "p-value", "Decision")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

Box’s M Test for Homogeneity of Covariance Matrices
	Test	Chi-square	df	p-value	Decision
Chi-Sq (approx.)	Box’s M Test (Gender)	9.2549	6	0.1597	Assumption met

5.3 Homogeneity of Variances (Levene’s Test)

lev_results <- lapply(dvs, function(dv) {
  lev_gender <- leveneTest(df_raw[[dv]] ~ df_raw$gender)
  lev_lunch  <- leveneTest(df_raw[[dv]] ~ df_raw$lunch)
  lev_prep   <- leveneTest(df_raw[[dv]] ~ df_raw$test_prep)
  data.frame(
    DV       = dv,
    Factor   = c("Gender", "Lunch", "Test Prep"),
    F_value  = round(c(lev_gender$`F value`[1], lev_lunch$`F value`[1], lev_prep$`F value`[1]), 4),
    p_value  = round(c(lev_gender$`Pr(>F)`[1],  lev_lunch$`Pr(>F)`[1],  lev_prep$`Pr(>F)`[1]), 4),
    Decision = ifelse(c(lev_gender$`Pr(>F)`[1],  lev_lunch$`Pr(>F)`[1],
                        lev_prep$`Pr(>F)`[1]) > 0.05, "Met", "Violated")
  )
}) %>% bind_rows()

kable(lev_results, caption = "Levene's Test for Homogeneity of Variances",
      col.names = c("DV", "Factor", "F", "p-value", "Decision")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"), full_width = FALSE)

Levene’s Test for Homogeneity of Variances
DV	Factor	F	p-value	Decision
math_score	Gender	0.3464	0.5563	Met
math_score	Lunch	3.1938	0.0742	Met
math_score	Test Prep	0.5330	0.4655	Met
reading_score	Gender	0.0188	0.8911	Met
reading_score	Lunch	1.9960	0.1580	Met
reading_score	Test Prep	1.0798	0.2990	Met
writing_score	Gender	0.0069	0.9336	Met
writing_score	Lunch	2.6314	0.1051	Met
writing_score	Test Prep	5.9708	0.0147	Violated

5.4 Linearity Between Covariate and DVs

The scatter plots in Section 4.4 confirm approximately linear relationships between the covariate (parental education ordinal) and each dependent variable.

5.5 Absence of Multicollinearity Among DVs

vif_df <- data.frame(
  DV_Pair        = c("Math ~ Reading + Writing",
                     "Reading ~ Math + Writing",
                     "Writing ~ Math + Reading"),
  VIF            = c(
    car::vif(lm(math_score    ~ reading_score + writing_score, data = df_raw))[1],
    car::vif(lm(reading_score ~ math_score    + writing_score, data = df_raw))[1],
    car::vif(lm(writing_score ~ math_score    + reading_score, data = df_raw))[1]
  )
)
vif_df$Interpretation <- ifelse(vif_df$VIF < 5, "Acceptable", "High multicollinearity")

kable(vif_df %>% mutate(VIF = round(VIF, 4)),
      caption = "Variance Inflation Factor (VIF) Among DVs") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

Variance Inflation Factor (VIF) Among DVs
DV_Pair	VIF	Interpretation
Math ~ Reading + Writing	11.2686	High multicollinearity
Reading ~ Math + Writing	2.8108	Acceptable
Writing ~ Math + Reading	3.0160	Acceptable

6. Approach 1 Full Factorial MANCOVA

6.1 Model Specification

The Full Factorial MANCOVA model includes all four categorical factors and their interactions (up to two-way), with parental education ordinal as the covariate. The model is:

Y = β₀ + β₁(parental_edu_ord) + β₂(gender) + β₃(race_ethnicity) + β₄(lunch) + β₅(test_prep) + β₆(gender × race) + β₇(gender × lunch) + β₈(gender × test_prep) + β₉(race × lunch) + β₁₀(race × test_prep) + β₁₁(lunch × test_prep) + ε

where Y is the multivariate response vector of math, reading, and writing scores.

dv_matrix <- as.matrix(df_raw[, dvs])

mancova_full <- manova(
  dv_matrix ~ parental_edu_ord +
    gender * race_ethnicity +
    gender * lunch +
    gender * test_prep +
    race_ethnicity * lunch +
    race_ethnicity * test_prep +
    lunch * test_prep,
  data = df_raw
)

6.2 Multivariate Test Statistics

full_summary <- summary(mancova_full, test = "Pillai")
full_pillai  <- full_summary$stats

kable(round(full_pillai, 4),
      caption = "Full Factorial MANCOVA  Pillai's Trace Multivariate Tests") %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")

Full Factorial MANCOVA Pillai’s Trace Multivariate Tests
	Df	Pillai	approx F	num Df	den Df	Pr(>F)
parental_edu_ord	1	0.1256	46.6183	3	974	0.0000
gender	1	0.6366	568.7091	3	974	0.0000
race_ethnicity	4	0.1531	13.1253	12	2928	0.0000
lunch	1	0.1604	62.0398	3	974	0.0000
test_prep	1	0.2575	112.5886	3	974	0.0000
gender:race_ethnicity	4	0.0068	0.5554	12	2928	0.8787
gender:lunch	1	0.0045	1.4738	3	974	0.2201
gender:test_prep	1	0.0024	0.7947	3	974	0.4969
race_ethnicity:lunch	4	0.0051	0.4139	12	2928	0.9589
race_ethnicity:test_prep	4	0.0118	0.9653	12	2928	0.4800
lunch:test_prep	1	0.0046	1.4966	3	974	0.2139
Residuals	976	NA	NA	NA	NA	NA

full_wilks <- summary(mancova_full, test = "Wilks")$stats

kable(round(full_wilks, 4),
      caption = "Full Factorial MANCOVA  Wilks' Lambda Multivariate Tests") %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")

Full Factorial MANCOVA Wilks’ Lambda Multivariate Tests
	Df	Wilks	approx F	num Df	den Df	Pr(>F)
parental_edu_ord	1	0.8744	46.6183	3	974.000	0.0000
gender	1	0.3634	568.7091	3	974.000	0.0000
race_ethnicity	4	0.8521	13.3956	12	2577.253	0.0000
lunch	1	0.8396	62.0398	3	974.000	0.0000
test_prep	1	0.7425	112.5886	3	974.000	0.0000
gender:race_ethnicity	4	0.9932	0.5548	12	2577.253	0.8791
gender:lunch	1	0.9955	1.4738	3	974.000	0.2201
gender:test_prep	1	0.9976	0.7947	3	974.000	0.4969
race_ethnicity:lunch	4	0.9949	0.4134	12	2577.253	0.9591
race_ethnicity:test_prep	4	0.9882	0.9649	12	2577.253	0.4804
lunch:test_prep	1	0.9954	1.4966	3	974.000	0.2139
Residuals	976	NA	NA	NA	NA	NA

pillai_df <- as.data.frame(full_pillai)
pillai_df$Term <- rownames(pillai_df)
pillai_clean <- pillai_df %>%
  filter(Term != "Residuals") %>%
  select(Term, `approx F`, `num Df`, `den Df`, `Pr(>F)`) %>%
  mutate(
    Significant = ifelse(`Pr(>F)` < 0.001, "***",
                  ifelse(`Pr(>F)` < 0.01,  "**",
                  ifelse(`Pr(>F)` < 0.05,  "*", "ns"))),
    `Pr(>F)` = round(`Pr(>F)`, 4),
    `approx F` = round(`approx F`, 3)
  )

kable(pillai_clean,
      caption = "Full Factorial MANCOVA  Significance Summary (Pillai's Trace)",
      col.names = c("Term", "Approx F", "Num df", "Den df", "p-value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"), full_width = TRUE) %>%
  scroll_box(width = "100%")

Full Factorial MANCOVA Significance Summary (Pillai’s Trace)
	Term	Approx F	Num df	Den df	p-value	Sig.
parental_edu_ord	parental_edu_ord	46.618	3	974	0.0000	***
gender	gender	568.709	3	974	0.0000	***
race_ethnicity	race_ethnicity	13.125	12	2928	0.0000	***
lunch	lunch	62.040	3	974	0.0000	***
test_prep	test_prep	112.589	3	974	0.0000	***
gender:race_ethnicity	gender:race_ethnicity	0.555	12	2928	0.8787	ns
gender:lunch	gender:lunch	1.474	3	974	0.2201	ns
gender:test_prep	gender:test_prep	0.795	3	974	0.4969	ns
race_ethnicity:lunch	race_ethnicity:lunch	0.414	12	2928	0.9589	ns
race_ethnicity:test_prep	race_ethnicity:test_prep	0.965	12	2928	0.4800	ns
lunch:test_prep	lunch:test_prep	1.497	3	974	0.2139	ns

6.3 Univariate Follow-Up Tests (Type III SS)

full_aov_summary <- summary.aov(mancova_full)

extract_aov <- function(aov_list, dv_names) {
  lapply(seq_along(aov_list), function(i) {
    df_aov <- as.data.frame(aov_list[[i]])
    df_aov$Term <- rownames(df_aov)
    df_aov$DV   <- dv_names[i]
    df_aov
  }) %>%
    bind_rows() %>%
    filter(Term != "Residuals") %>%
    select(DV, Term, `F value`, `Pr(>F)`) %>%
    mutate(
      Sig      = ifelse(`Pr(>F)` < 0.001, "***",
                 ifelse(`Pr(>F)` < 0.01,  "**",
                 ifelse(`Pr(>F)` < 0.05,  "*", "ns"))),
      `F value`  = round(`F value`, 3),
      `Pr(>F)`   = round(`Pr(>F)`, 4)
    )
}

univar_full <- extract_aov(full_aov_summary,
                           c("Math Score", "Reading Score", "Writing Score"))

kable(univar_full,
      caption = "Univariate ANCOVA Follow-Up Tests  Full Factorial Model",
      col.names = c("DV", "Term", "F", "p-value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%", height = "500px")

Univariate ANCOVA Follow-Up Tests Full Factorial Model
	DV	Term	F	p-value	Sig.
parental_edu_ord …1	Math Score	parental_edu_ord	33.400	0.0000	***
gender …2	Math Score	gender	40.313	0.0000	***
race_ethnicity …3	Math Score	race_ethnicity	15.920	0.0000	***
lunch …4	Math Score	lunch	151.573	0.0000	***
test_prep …5	Math Score	test_prep	41.268	0.0000	***
gender:race_ethnicity …6	Math Score	gender:race_ethnicity	0.292	0.8832	ns
gender:lunch …7	Math Score	gender:lunch	2.406	0.1212	ns
gender:test_prep …8	Math Score	gender:test_prep	0.081	0.7765	ns
race_ethnicity:lunch …9	Math Score	race_ethnicity:lunch	0.477	0.7526	ns
race_ethnicity:test_prep…10	Math Score	race_ethnicity:test_prep	0.545	0.7027	ns
lunch:test_prep …11	Math Score	lunch:test_prep	0.017	0.8972	ns
Residuals …12	Math Score	Residuals	NA	NA	NA
parental_edu_ord …13	Reading Score	parental_edu_ord	46.172	0.0000	***
gender …14	Reading Score	gender	70.651	0.0000	***
race_ethnicity …15	Reading Score	race_ethnicity	5.420	0.0003	***
lunch …16	Reading Score	lunch	68.391	0.0000	***
test_prep …17	Reading Score	test_prep	76.822	0.0000	***
gender:race_ethnicity …18	Reading Score	gender:race_ethnicity	0.373	0.8282	ns
gender:lunch …19	Reading Score	gender:lunch	1.312	0.2523	ns
gender:test_prep …20	Reading Score	gender:test_prep	0.008	0.9309	ns
race_ethnicity:lunch …21	Reading Score	race_ethnicity:lunch	0.375	0.8265	ns
race_ethnicity:test_prep…22	Reading Score	race_ethnicity:test_prep	0.704	0.5894	ns
lunch:test_prep …23	Reading Score	lunch:test_prep	0.020	0.8862	ns
Residuals …24	Reading Score	Residuals	NA	NA	NA
parental_edu_ord …25	Writing Score	parental_edu_ord	82.125	0.0000	***
gender …26	Writing Score	gender	124.201	0.0000	***
race_ethnicity …27	Writing Score	race_ethnicity	8.006	0.0000	***
lunch …28	Writing Score	lunch	92.789	0.0000	***
test_prep …29	Writing Score	test_prep	151.677	0.0000	***
gender:race_ethnicity …30	Writing Score	gender:race_ethnicity	0.208	0.9340	ns
gender:lunch …31	Writing Score	gender:lunch	2.637	0.1047	ns
gender:test_prep …32	Writing Score	gender:test_prep	0.069	0.7926	ns
race_ethnicity:lunch …33	Writing Score	race_ethnicity:lunch	0.399	0.8094	ns
race_ethnicity:test_prep…34	Writing Score	race_ethnicity:test_prep	0.287	0.8866	ns
lunch:test_prep …35	Writing Score	lunch:test_prep	0.526	0.4686	ns
Residuals …36	Writing Score	Residuals	NA	NA	NA

6.4 Effect Sizes (Partial η²)

# Compute partial eta squared manually from ANOVA tables
compute_peta2 <- function(aov_list, dv_names) {
  lapply(seq_along(aov_list), function(i) {
    df_aov  <- as.data.frame(aov_list[[i]])
    ss_res  <- df_aov["Residuals", "Sum Sq"]
    df_aov  <- df_aov[rownames(df_aov) != "Residuals", , drop = FALSE]
    df_aov$Term    <- rownames(df_aov)
    df_aov$DV      <- dv_names[i]
    df_aov$peta2   <- round(df_aov$`Sum Sq` / (df_aov$`Sum Sq` + ss_res), 4)
    df_aov$Magnitude <- ifelse(df_aov$peta2 >= 0.14, "Large",
                        ifelse(df_aov$peta2 >= 0.06, "Medium",
                        ifelse(df_aov$peta2 >= 0.01, "Small", "Negligible")))
    df_aov %>% select(DV, Term, peta2, Magnitude)
  }) %>% bind_rows()
}

peta2_full <- compute_peta2(full_aov_summary,
                             c("Math Score", "Reading Score", "Writing Score"))

kable(peta2_full,
      caption = "Partial Eta-Squared (η²) Effect Sizes  Full Factorial Model",
      col.names = c("DV", "Term", "Partial η²", "Magnitude")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")

Partial Eta-Squared (η²) Effect Sizes Full Factorial Model
	DV	Term	Partial η²	Magnitude
parental_edu_ord …1	Math Score	parental_edu_ord	0.0331	Small
gender …2	Math Score	gender	0.0397	Small
race_ethnicity …3	Math Score	race_ethnicity	0.0612	Medium
lunch …4	Math Score	lunch	0.1344	Medium
test_prep …5	Math Score	test_prep	0.0406	Small
gender:race_ethnicity …6	Math Score	gender:race_ethnicity	0.0012	Negligible
gender:lunch …7	Math Score	gender:lunch	0.0025	Negligible
gender:test_prep …8	Math Score	gender:test_prep	0.0001	Negligible
race_ethnicity:lunch …9	Math Score	race_ethnicity:lunch	0.0020	Negligible
race_ethnicity:test_prep…10	Math Score	race_ethnicity:test_prep	0.0022	Negligible
lunch:test_prep …11	Math Score	lunch:test_prep	0.0000	Negligible
Residuals …12	Math Score	Residuals	0.5000	Large
parental_edu_ord …13	Reading Score	parental_edu_ord	0.0452	Small
gender …14	Reading Score	gender	0.0675	Medium
race_ethnicity …15	Reading Score	race_ethnicity	0.0217	Small
lunch …16	Reading Score	lunch	0.0655	Medium
test_prep …17	Reading Score	test_prep	0.0730	Medium
gender:race_ethnicity …18	Reading Score	gender:race_ethnicity	0.0015	Negligible
gender:lunch …19	Reading Score	gender:lunch	0.0013	Negligible
gender:test_prep …20	Reading Score	gender:test_prep	0.0000	Negligible
race_ethnicity:lunch …21	Reading Score	race_ethnicity:lunch	0.0015	Negligible
race_ethnicity:test_prep…22	Reading Score	race_ethnicity:test_prep	0.0029	Negligible
lunch:test_prep …23	Reading Score	lunch:test_prep	0.0000	Negligible
Residuals …24	Reading Score	Residuals	0.5000	Large
parental_edu_ord …25	Writing Score	parental_edu_ord	0.0776	Medium
gender …26	Writing Score	gender	0.1129	Medium
race_ethnicity …27	Writing Score	race_ethnicity	0.0318	Small
lunch …28	Writing Score	lunch	0.0868	Medium
test_prep …29	Writing Score	test_prep	0.1345	Medium
gender:race_ethnicity …30	Writing Score	gender:race_ethnicity	0.0009	Negligible
gender:lunch …31	Writing Score	gender:lunch	0.0027	Negligible
gender:test_prep …32	Writing Score	gender:test_prep	0.0001	Negligible
race_ethnicity:lunch …33	Writing Score	race_ethnicity:lunch	0.0016	Negligible
race_ethnicity:test_prep…34	Writing Score	race_ethnicity:test_prep	0.0012	Negligible
lunch:test_prep …35	Writing Score	lunch:test_prep	0.0005	Negligible
Residuals …36	Writing Score	Residuals	0.5000	Large

6.5 Effect Size Visualization

peta2_main <- peta2_full %>%
  filter(!grepl(":", Term)) %>%
  filter(Term != "parental_edu_ord")

ggplot(peta2_main, aes(x = reorder(Term, peta2), y = peta2, fill = DV)) +
  geom_col(position = "dodge", alpha = 0.85, color = "white") +
  geom_hline(yintercept = 0.01, linetype = "dashed", color = "gray50", linewidth = 0.5) +
  geom_hline(yintercept = 0.06, linetype = "dashed", color = "#E84855", linewidth = 0.5) +
  geom_hline(yintercept = 0.14, linetype = "dashed", color = "#2E86AB", linewidth = 0.5) +
  annotate("text", x = 0.6, y = 0.015, label = "Small (0.01)", size = 2.8, color = "gray50") +
  annotate("text", x = 0.6, y = 0.075, label = "Medium (0.06)", size = 2.8, color = "#E84855") +
  annotate("text", x = 0.6, y = 0.155, label = "Large (0.14)",  size = 2.8, color = "#2E86AB") +
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "Partial η² by Main Effect Term  Full Factorial MANCOVA",
       subtitle = "Dashed lines = Cohen's conventions for small/medium/large effects",
       x = NULL, y = "Partial η²", fill = "Dependent Variable") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"))

Partial η² by Term and DV Full Factorial MANCOVA

6.6 Estimated Marginal Means (Main Effects)

# Fit separate ANCOVAs for EMM extraction
ancova_math_full    <- lm(math_score    ~ parental_edu_ord + gender + race_ethnicity +
                            lunch + test_prep, data = df_raw)
ancova_reading_full <- lm(reading_score ~ parental_edu_ord + gender + race_ethnicity +
                            lunch + test_prep, data = df_raw)
ancova_writing_full <- lm(writing_score ~ parental_edu_ord + gender + race_ethnicity +
                            lunch + test_prep, data = df_raw)

emm_gender <- bind_rows(
  as.data.frame(emmeans(ancova_math_full,    ~ gender)) %>% mutate(DV = "Math"),
  as.data.frame(emmeans(ancova_reading_full, ~ gender)) %>% mutate(DV = "Reading"),
  as.data.frame(emmeans(ancova_writing_full, ~ gender)) %>% mutate(DV = "Writing")
)

ggplot(emm_gender, aes(x = gender, y = emmean, fill = gender, ymin = lower.CL, ymax = upper.CL)) +
  geom_col(alpha = 0.8, color = "white", width = 0.6) +
  geom_errorbar(width = 0.2, linewidth = 0.8) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "EMM by Gender (Full Model, covariate at mean)",
       x = "Gender", y = "Adjusted Mean Score", fill = "Gender") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))

Estimated Marginal Means by Gender Full Model

emm_race <- bind_rows(
  as.data.frame(emmeans(ancova_math_full,    ~ race_ethnicity)) %>% mutate(DV = "Math"),
  as.data.frame(emmeans(ancova_reading_full, ~ race_ethnicity)) %>% mutate(DV = "Reading"),
  as.data.frame(emmeans(ancova_writing_full, ~ race_ethnicity)) %>% mutate(DV = "Writing")
)

ggplot(emm_race, aes(x = race_ethnicity, y = emmean, fill = race_ethnicity,
                      ymin = lower.CL, ymax = upper.CL)) +
  geom_col(alpha = 0.8, color = "white", width = 0.65) +
  geom_errorbar(width = 0.2, linewidth = 0.7) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "EMM by Race/Ethnicity (Full Model, covariate at mean)",
       x = "Race/Ethnicity", y = "Adjusted Mean Score", fill = NULL) +
  theme_minimal(base_size = 10) +
  theme(plot.title  = element_text(face = "bold"),
        strip.text  = element_text(face = "bold"),
        axis.text.x = element_text(angle = 20, hjust = 1))

Estimated Marginal Means by Race/Ethnicity Full Model

emm_lunch <- bind_rows(
  as.data.frame(emmeans(ancova_math_full,    ~ lunch)) %>% mutate(DV = "Math"),
  as.data.frame(emmeans(ancova_reading_full, ~ lunch)) %>% mutate(DV = "Reading"),
  as.data.frame(emmeans(ancova_writing_full, ~ lunch)) %>% mutate(DV = "Writing")
)

emm_prep <- bind_rows(
  as.data.frame(emmeans(ancova_math_full,    ~ test_prep)) %>% mutate(DV = "Math"),
  as.data.frame(emmeans(ancova_reading_full, ~ test_prep)) %>% mutate(DV = "Reading"),
  as.data.frame(emmeans(ancova_writing_full, ~ test_prep)) %>% mutate(DV = "Writing")
)

p_lunch_emm <- ggplot(emm_lunch, aes(x = lunch, y = emmean, fill = lunch,
                                      ymin = lower.CL, ymax = upper.CL)) +
  geom_col(alpha = 0.8, color = "white", width = 0.55) +
  geom_errorbar(width = 0.18, linewidth = 0.8) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "EMM by Lunch Type", x = "Lunch", y = "Adjusted Mean Score", fill = NULL) +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"), strip.text = element_text(face = "bold"))

p_prep_emm <- ggplot(emm_prep, aes(x = test_prep, y = emmean, fill = test_prep,
                                    ymin = lower.CL, ymax = upper.CL)) +
  geom_col(alpha = 0.8, color = "white", width = 0.55) +
  geom_errorbar(width = 0.18, linewidth = 0.8) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_brewer(palette = "Paired") +
  labs(title = "EMM by Test Preparation", x = "Test Prep", y = "Adjusted Mean Score", fill = NULL) +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"), strip.text = element_text(face = "bold"))

grid.arrange(p_lunch_emm, p_prep_emm, ncol = 1)

EMM by Lunch and Test Prep Full Model

6.7 Post-Hoc Pairwise Comparisons (Race/Ethnicity)

posthoc_race_math    <- pairs(emmeans(ancova_math_full,    ~ race_ethnicity), adjust = "bonferroni")
posthoc_race_reading <- pairs(emmeans(ancova_reading_full, ~ race_ethnicity), adjust = "bonferroni")
posthoc_race_writing <- pairs(emmeans(ancova_writing_full, ~ race_ethnicity), adjust = "bonferroni")

ph_race_df <- bind_rows(
  as.data.frame(posthoc_race_math)    %>% mutate(DV = "Math"),
  as.data.frame(posthoc_race_reading) %>% mutate(DV = "Reading"),
  as.data.frame(posthoc_race_writing) %>% mutate(DV = "Writing")
) %>%
  select(DV, contrast, estimate, SE, t.ratio, p.value) %>%
  mutate(
    estimate = round(estimate, 3),
    SE       = round(SE, 3),
    t.ratio  = round(t.ratio, 3),
    p.value  = round(p.value, 4),
    Sig      = ifelse(p.value < 0.001, "***",
               ifelse(p.value < 0.01,  "**",
               ifelse(p.value < 0.05,  "*", "ns")))
  )

kable(ph_race_df,
      caption = "Post-Hoc Pairwise Comparisons: Race/Ethnicity (Bonferroni-adjusted)",
      col.names = c("DV", "Contrast", "Est.", "SE", "t", "p-value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%", height = "450px")

Post-Hoc Pairwise Comparisons: Race/Ethnicity (Bonferroni-adjusted)
DV	Contrast	Est.	SE	t	p-value	Sig.
Math	group A - group B	-1.890	1.697	-1.114	1.0000	ns
Math	group A - group C	-2.376	1.589	-1.495	1.0000	ns
Math	group A - group D	-5.357	1.621	-3.304	0.0099	**
Math	group A - group E	-10.118	1.798	-5.629	0.0000	***
Math	group B - group C	-0.486	1.210	-0.401	1.0000	ns
Math	group B - group D	-3.468	1.259	-2.755	0.0598	ns
Math	group B - group E	-8.228	1.476	-5.575	0.0000	***
Math	group C - group D	-2.982	1.101	-2.708	0.0688	ns
Math	group C - group E	-7.743	1.340	-5.779	0.0000	***
Math	group D - group E	-4.761	1.385	-3.437	0.0061	**
Reading	group A - group B	-1.142	1.663	-0.686	1.0000	ns
Reading	group A - group C	-2.135	1.558	-1.371	1.0000	ns
Reading	group A - group D	-4.102	1.589	-2.581	0.0999	ns
Reading	group A - group E	-5.412	1.762	-3.071	0.0219
Reading	group B - group C	-0.993	1.186	-0.837	1.0000	ns
Reading	group B - group D	-2.960	1.234	-2.399	0.1663	ns
Reading	group B - group E	-4.270	1.447	-2.951	0.0324
Reading	group C - group D	-1.967	1.079	-1.823	0.6867	ns
Reading	group C - group E	-3.277	1.313	-2.495	0.1275	ns
Reading	group D - group E	-1.309	1.358	-0.964	1.0000	ns
Writing	group A - group B	-0.990	1.609	-0.615	1.0000	ns
Writing	group A - group C	-2.238	1.507	-1.485	1.0000	ns
Writing	group A - group D	-5.918	1.538	-3.849	0.0013	**
Writing	group A - group E	-5.018	1.705	-2.944	0.0332
Writing	group B - group C	-1.248	1.148	-1.087	1.0000	ns
Writing	group B - group D	-4.928	1.194	-4.127	0.0004	***
Writing	group B - group E	-4.028	1.400	-2.878	0.0409
Writing	group C - group D	-3.679	1.044	-3.523	0.0045	**
Writing	group C - group E	-2.780	1.271	-2.188	0.2891	ns
Writing	group D - group E	0.900	1.314	0.685	1.0000	ns

6.8 Full Model Summary Table

full_sig_summary <- pillai_clean %>%
  mutate(
    `Pillai's Trace` = round(as.numeric(full_pillai[rownames(full_pillai) != "Residuals",
                                                      "Pillai"]), 4)[seq_len(nrow(pillai_clean))],
    Interpretation = case_when(
      `Pr(>F)` < 0.001 ~ "Highly significant",
      `Pr(>F)` < 0.01  ~ "Significant",
      `Pr(>F)` < 0.05  ~ "Marginally significant",
      TRUE             ~ "Not significant"
    )
  ) %>%
  select(Term, `Pillai's Trace`, `approx F`, `Pr(>F)`, Significant, Interpretation)

kable(full_sig_summary,
      caption = "Full Factorial MANCOVA  Summary of Multivariate Tests") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = TRUE) %>%
  scroll_box(width = "100%")

Full Factorial MANCOVA Summary of Multivariate Tests
	Term	Pillai’s Trace	approx F	Pr(>F)	Significant	Interpretation
parental_edu_ord	parental_edu_ord	0.1256	46.618	0.0000	***	Highly significant
gender	gender	0.6366	568.709	0.0000	***	Highly significant
race_ethnicity	race_ethnicity	0.1531	13.125	0.0000	***	Highly significant
lunch	lunch	0.1604	62.040	0.0000	***	Highly significant
test_prep	test_prep	0.2575	112.589	0.0000	***	Highly significant
gender:race_ethnicity	gender:race_ethnicity	0.0068	0.555	0.8787	ns	Not significant
gender:lunch	gender:lunch	0.0045	1.474	0.2201	ns	Not significant
gender:test_prep	gender:test_prep	0.0024	0.795	0.4969	ns	Not significant
race_ethnicity:lunch	race_ethnicity:lunch	0.0051	0.414	0.9589	ns	Not significant
race_ethnicity:test_prep	race_ethnicity:test_prep	0.0118	0.965	0.4800	ns	Not significant
lunch:test_prep	lunch:test_prep	0.0046	1.497	0.2139	ns	Not significant

7. Approach 2 Simplified One-Way MANCOVA

7.1 Model Specification

The simplified model retains only gender as the independent variable, with parental education ordinal as the sole covariate:

Y = β₀ + β₁(parental_edu_ord) + β₂(gender) + ε

This parsimonious model isolates the unique contribution of gender to multivariate academic outcomes after controlling for socioeconomic background.

mancova_simple <- manova(
  dv_matrix ~ parental_edu_ord + gender,
  data = df_raw
)

7.2 Multivariate Test Statistics

simple_pillai <- summary(mancova_simple, test = "Pillai")$stats
simple_wilks  <- summary(mancova_simple, test = "Wilks")$stats
simple_hotel  <- summary(mancova_simple, test = "Hotelling-Lawley")$stats
simple_roy    <- summary(mancova_simple, test = "Roy")$stats

extract_stat_row <- function(stats_mat, term, test_name) {
  row <- stats_mat[term, ]
  data.frame(
    Test       = test_name,
    Statistic  = round(row[1], 5),
    `approx F` = round(row[2], 4),
    `num Df`   = row[3],
    `den Df`   = row[4],
    `p-value`  = round(row[5], 4),
    Sig        = ifelse(row[5] < 0.001, "***",
                 ifelse(row[5] < 0.01,  "**",
                 ifelse(row[5] < 0.05,  "*", "ns")))
  )
}

all_tests_gender <- bind_rows(
  extract_stat_row(simple_pillai, "gender", "Pillai's Trace"),
  extract_stat_row(simple_wilks,  "gender", "Wilks' Lambda"),
  extract_stat_row(simple_hotel,  "gender", "Hotelling-Lawley Trace"),
  extract_stat_row(simple_roy,    "gender", "Roy's Largest Root")
)

kable(all_tests_gender,
      caption = "One-Way MANCOVA  All Four Multivariate Tests for Gender") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

One-Way MANCOVA All Four Multivariate Tests for Gender
	Test	Statistic	approx.F	num.Df	den.Df	p.value	Sig
Df…1	Pillai’s Trace	1	0.5684	436.8167	3	995	ns
Df…2	Wilks’ Lambda	1	0.4316	436.8167	3	995	ns
Df…3	Hotelling-Lawley Trace	1	1.3170	436.8167	3	995	ns
Df…4	Roy’s Largest Root	1	1.3170	436.8167	3	995	ns

all_tests_cov <- bind_rows(
  extract_stat_row(simple_pillai, "parental_edu_ord", "Pillai's Trace"),
  extract_stat_row(simple_wilks,  "parental_edu_ord", "Wilks' Lambda"),
  extract_stat_row(simple_hotel,  "parental_edu_ord", "Hotelling-Lawley Trace"),
  extract_stat_row(simple_roy,    "parental_edu_ord", "Roy's Largest Root")
)

kable(all_tests_cov,
      caption = "One-Way MANCOVA  All Four Multivariate Tests for Covariate (Parental Education)") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

One-Way MANCOVA All Four Multivariate Tests for Covariate (Parental Education)
	Test	Statistic	approx.F	num.Df	den.Df	p.value	Sig
Df…1	Pillai’s Trace	1	0.0917	33.49644	3	995	ns
Df…2	Wilks’ Lambda	1	0.9083	33.49644	3	995	ns
Df…3	Hotelling-Lawley Trace	1	0.1010	33.49644	3	995	ns
Df…4	Roy’s Largest Root	1	0.1010	33.49644	3	995	ns

7.3 Univariate ANCOVA Follow-Up

ancova_math_s    <- lm(math_score    ~ parental_edu_ord + gender, data = df_raw)
ancova_reading_s <- lm(reading_score ~ parental_edu_ord + gender, data = df_raw)
ancova_writing_s <- lm(writing_score ~ parental_edu_ord + gender, data = df_raw)

extract_lm_table <- function(model, dv_name) {
  df_coef <- as.data.frame(summary(model)$coefficients)
  df_coef$Term <- rownames(df_coef)
  df_coef$DV   <- dv_name
  df_coef %>%
    select(DV, Term, Estimate, `Std. Error`, `t value`, `Pr(>|t|)`) %>%
    mutate(
      Estimate     = round(Estimate, 4),
      `Std. Error` = round(`Std. Error`, 4),
      `t value`    = round(`t value`, 3),
      `Pr(>|t|)`   = round(`Pr(>|t|)`, 4),
      Sig          = ifelse(`Pr(>|t|)` < 0.001, "***",
                     ifelse(`Pr(>|t|)` < 0.01,  "**",
                     ifelse(`Pr(>|t|)` < 0.05,  "*", "ns")))
    )
}

ancova_table <- bind_rows(
  extract_lm_table(ancova_math_s,    "Math Score"),
  extract_lm_table(ancova_reading_s, "Reading Score"),
  extract_lm_table(ancova_writing_s, "Writing Score")
)

kable(ancova_table,
      caption = "Univariate ANCOVA Coefficients  Simplified Model",
      col.names = c("DV", "Term", "Estimate", "SE", "t", "p-value", "Sig.")) %>%
  kable_styling(bootstrap_options = c("hover", "condensed"),
                full_width = TRUE, font_size = 11) %>%
  scroll_box(width = "100%")

Univariate ANCOVA Coefficients Simplified Model
	DV	Term	Estimate	SE	t	Sig.
(Intercept)…1	Math Score	(Intercept)	58.1791	1.1958	48.652	***
parental_edu_ord…2	Math Score	parental_edu_ord	1.7354	0.3198	5.427	***
gendermale…3	Math Score	gendermale	5.3177	0.9342	5.693	***
(Intercept)…4	Reading Score	(Intercept)	66.9357	1.1292	59.278	***
parental_edu_ord…5	Reading Score	parental_edu_ord	1.8048	0.3020	5.977	***
gendermale…6	Reading Score	gendermale	-6.9035	0.8821	-7.826	***
(Intercept)…7	Writing Score	(Intercept)	65.1445	1.1434	56.976	***
parental_edu_ord…8	Writing Score	parental_edu_ord	2.3300	0.3058	7.620	***
gendermale…9	Writing Score	gendermale	-8.8570	0.8932	-9.916	***

7.4 Model Fit Statistics

model_fit <- data.frame(
  DV          = c("Math Score", "Reading Score", "Writing Score"),
  R_squared   = round(c(summary(ancova_math_s)$r.squared,
                         summary(ancova_reading_s)$r.squared,
                         summary(ancova_writing_s)$r.squared), 4),
  Adj_R2      = round(c(summary(ancova_math_s)$adj.r.squared,
                         summary(ancova_reading_s)$adj.r.squared,
                         summary(ancova_writing_s)$adj.r.squared), 4),
  F_stat      = round(c(summary(ancova_math_s)$fstatistic[1],
                         summary(ancova_reading_s)$fstatistic[1],
                         summary(ancova_writing_s)$fstatistic[1]), 3),
  p_value     = round(c(
    pf(summary(ancova_math_s)$fstatistic[1],
       summary(ancova_math_s)$fstatistic[2],
       summary(ancova_math_s)$fstatistic[3], lower.tail = FALSE),
    pf(summary(ancova_reading_s)$fstatistic[1],
       summary(ancova_reading_s)$fstatistic[2],
       summary(ancova_reading_s)$fstatistic[3], lower.tail = FALSE),
    pf(summary(ancova_writing_s)$fstatistic[1],
       summary(ancova_writing_s)$fstatistic[2],
       summary(ancova_writing_s)$fstatistic[3], lower.tail = FALSE)
  ), 4)
)

kable(model_fit, caption = "Model Fit Statistics  Simplified MANCOVA",
      col.names = c("DV", "R²", "Adj. R²", "F", "p-value")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

Model Fit Statistics Simplified MANCOVA
DV	R²	Adj. R²	F
Math Score	0.0561	0.0542	29.627
Reading Score	0.0922	0.0904	50.638
Writing Score	0.1408	0.1391	81.675

7.5 Effect Sizes

# Partial eta squared for gender and covariate
simple_aov <- summary.aov(mancova_simple)
peta2_simple <- compute_peta2(simple_aov,
                               c("Math Score", "Reading Score", "Writing Score"))

kable(peta2_simple,
      caption = "Partial Eta-Squared  Simplified One-Way MANCOVA",
      col.names = c("DV", "Term", "Partial η²", "Magnitude")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

Partial Eta-Squared Simplified One-Way MANCOVA
	DV	Term	Partial η²	Magnitude
parental_edu_ord…1	Math Score	parental_edu_ord	0.0262	Small
gender …2	Math Score	gender	0.0315	Small
Residuals …3	Math Score	Residuals	0.5000	Large
parental_edu_ord…4	Reading Score	parental_edu_ord	0.0386	Small
gender …5	Reading Score	gender	0.0579	Small
Residuals …6	Reading Score	Residuals	0.5000	Large
parental_edu_ord…7	Writing Score	parental_edu_ord	0.0612	Medium
gender …8	Writing Score	gender	0.0898	Medium
Residuals …9	Writing Score	Residuals	0.5000	Large

# Cohen's d for gender differences per DV
cohens_d_df <- lapply(dvs, function(dv) {
  female_scores <- df_raw[[dv]][df_raw$gender == "female"]
  male_scores   <- df_raw[[dv]][df_raw$gender == "male"]
  pooled_sd     <- sqrt(((length(female_scores)-1)*var(female_scores) +
                          (length(male_scores)-1)*var(male_scores)) /
                         (length(female_scores) + length(male_scores) - 2))
  d_val <- (mean(female_scores) - mean(male_scores)) / pooled_sd
  data.frame(
    DV          = dv,
    Mean_Female = round(mean(female_scores), 2),
    Mean_Male   = round(mean(male_scores), 2),
    Diff        = round(mean(female_scores) - mean(male_scores), 2),
    Cohen_d     = round(d_val, 4),
    Magnitude   = ifelse(abs(d_val) >= 0.8, "Large",
                  ifelse(abs(d_val) >= 0.5, "Medium",
                  ifelse(abs(d_val) >= 0.2, "Small", "Negligible")))
  )
}) %>% bind_rows()

kable(cohens_d_df,
      caption = "Cohen's d  Gender Effect per Dependent Variable",
      col.names = c("DV", "Mean Female", "Mean Male", "Difference", "Cohen's d", "Magnitude")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

Cohen’s d Gender Effect per Dependent Variable
DV	Mean Female	Mean Male	Difference	Cohen’s d	Magnitude
math_score	63.63	68.73	-5.10	-0.3407	Small
reading_score	72.61	65.47	7.14	0.5037	Medium
writing_score	72.47	63.31	9.16	0.6316	Medium

7.6 Adjusted vs. Unadjusted Means

emm_simple <- bind_rows(
  as.data.frame(emmeans(ancova_math_s,    ~ gender)) %>% mutate(DV = "Math",    Type = "Adjusted"),
  as.data.frame(emmeans(ancova_reading_s, ~ gender)) %>% mutate(DV = "Reading", Type = "Adjusted"),
  as.data.frame(emmeans(ancova_writing_s, ~ gender)) %>% mutate(DV = "Writing", Type = "Adjusted")
)

raw_means <- df_raw %>%
  group_by(gender) %>%
  summarise(
    Math_mean    = mean(math_score),
    Reading_mean = mean(reading_score),
    Writing_mean = mean(writing_score),
    .groups = "drop"
  ) %>%
  pivot_longer(-gender, names_to = "DV", values_to = "emmean") %>%
  mutate(
    DV   = case_when(
      DV == "Math_mean"    ~ "Math",
      DV == "Reading_mean" ~ "Reading",
      DV == "Writing_mean" ~ "Writing",
      TRUE ~ DV
    ),
    Type = "Unadjusted",
    lower.CL = emmean - 1.96 * 2,
    upper.CL = emmean + 1.96 * 2
  )

combined_means <- bind_rows(
  emm_simple %>% select(gender, DV, emmean, lower.CL, upper.CL, Type),
  raw_means   %>% select(gender, DV, emmean, lower.CL, upper.CL, Type)
)

ggplot(combined_means, aes(x = gender, y = emmean, fill = Type,
                            ymin = lower.CL, ymax = upper.CL)) +
  geom_col(position = position_dodge(0.7), width = 0.6, alpha = 0.85, color = "white") +
  geom_errorbar(position = position_dodge(0.7), width = 0.2, linewidth = 0.7) +
  facet_wrap(~DV, ncol = 3) +
  scale_fill_manual(values = c("Adjusted" = "#2E86AB", "Unadjusted" = "#E84855")) +
  labs(title = "Adjusted vs. Unadjusted Means by Gender",
       subtitle = "Adjusted = covariate (parental education) held at mean",
       x = "Gender", y = "Mean Score", fill = "Type") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))

Adjusted vs. Unadjusted Means by Gender

7.7 Residual Diagnostics

models_list <- list(
  "Math Score"    = ancova_math_s,
  "Reading Score" = ancova_reading_s,
  "Writing Score" = ancova_writing_s
)

plot_list <- lapply(names(models_list), function(nm) {
  resid_df <- data.frame(
    Fitted    = fitted(models_list[[nm]]),
    Residuals = residuals(models_list[[nm]])
  )

  p1 <- ggplot(resid_df, aes(x = Fitted, y = Residuals)) +
    geom_point(alpha = 0.35, size = 1.2, color = "#2E86AB") +
    geom_hline(yintercept = 0, color = "#E84855", linewidth = 0.8, linetype = "dashed") +
    geom_smooth(method = "loess", se = FALSE, color = "gray40", linewidth = 0.7) +
    labs(title = paste0(nm, "  Residuals vs Fitted"),
         x = "Fitted Values", y = "Residuals") +
    theme_minimal(base_size = 9) +
    theme(plot.title = element_text(face = "bold", size = 9))

  p2 <- ggplot(resid_df, aes(sample = Residuals)) +
    stat_qq(alpha = 0.5, size = 1.2, color = "#2E86AB") +
    stat_qq_line(color = "#E84855", linewidth = 0.8) +
    labs(title = paste0(nm, "  Normal Q-Q"),
         x = "Theoretical Quantiles", y = "Sample Quantiles") +
    theme_minimal(base_size = 9) +
    theme(plot.title = element_text(face = "bold", size = 9))

  list(p1, p2)
})

grid.arrange(
  plot_list[[1]][[1]], plot_list[[1]][[2]],
  plot_list[[2]][[1]], plot_list[[2]][[2]],
  plot_list[[3]][[1]], plot_list[[3]][[2]],
  ncol = 2
)

Residual Diagnostic Plots Simplified MANCOVA

8. Cross-Model Comparison

8.1 Gender Effect Across Both Models

# Extract gender F and p from both models
full_aov_gender <- lapply(full_aov_summary, function(x) {
  x_df <- as.data.frame(x)
  x_df["gender", c("F value", "Pr(>F)")]
})

simple_aov_gender <- lapply(simple_aov, function(x) {
  x_df <- as.data.frame(x)
  x_df["gender", c("F value", "Pr(>F)")]
})

gender_comp <- data.frame(
  DV        = c("Math Score", "Reading Score", "Writing Score"),
  F_Full    = round(c(full_aov_gender[[1]]["F value"][[1]],
                      full_aov_gender[[2]]["F value"][[1]],
                      full_aov_gender[[3]]["F value"][[1]]), 3),
  p_Full    = round(c(full_aov_gender[[1]]["Pr(>F)"][[1]],
                      full_aov_gender[[2]]["Pr(>F)"][[1]],
                      full_aov_gender[[3]]["Pr(>F)"][[1]]), 4),
  F_Simple  = round(c(simple_aov_gender[[1]]["F value"][[1]],
                      simple_aov_gender[[2]]["F value"][[1]],
                      simple_aov_gender[[3]]["F value"][[1]]), 3),
  p_Simple  = round(c(simple_aov_gender[[1]]["Pr(>F)"][[1]],
                      simple_aov_gender[[2]]["Pr(>F)"][[1]],
                      simple_aov_gender[[3]]["Pr(>F)"][[1]]), 4)
)

kable(gender_comp,
      caption = "Gender Effect: Full Factorial vs. Simplified MANCOVA",
      col.names = c("DV", "F (Full)", "p (Full)", "F (Simplified)", "p (Simplified)")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

Gender Effect: Full Factorial vs. Simplified MANCOVA
DV	F (Full)	p (Full)	F (Simplified)
Math Score	NA	NA	32.405
Reading Score	NA	NA	61.249
Writing Score	NA	NA	98.331

8.2 Covariate Effect Comparison

full_aov_cov <- lapply(full_aov_summary, function(x) {
  x_df <- as.data.frame(x)
  x_df["parental_edu_ord", c("F value", "Pr(>F)")]
})

simple_aov_cov <- lapply(simple_aov, function(x) {
  x_df <- as.data.frame(x)
  x_df["parental_edu_ord", c("F value", "Pr(>F)")]
})

cov_comp <- data.frame(
  DV       = c("Math Score", "Reading Score", "Writing Score"),
  F_Full   = round(c(full_aov_cov[[1]]["F value"][[1]],
                     full_aov_cov[[2]]["F value"][[1]],
                     full_aov_cov[[3]]["F value"][[1]]), 3),
  p_Full   = round(c(full_aov_cov[[1]]["Pr(>F)"][[1]],
                     full_aov_cov[[2]]["Pr(>F)"][[1]],
                     full_aov_cov[[3]]["Pr(>F)"][[1]]), 4),
  F_Simple = round(c(simple_aov_cov[[1]]["F value"][[1]],
                     simple_aov_cov[[2]]["F value"][[1]],
                     simple_aov_cov[[3]]["F value"][[1]]), 3),
  p_Simple = round(c(simple_aov_cov[[1]]["Pr(>F)"][[1]],
                     simple_aov_cov[[2]]["Pr(>F)"][[1]],
                     simple_aov_cov[[3]]["Pr(>F)"][[1]]), 4)
)

kable(cov_comp,
      caption = "Covariate Effect (Parental Education): Full vs. Simplified MANCOVA",
      col.names = c("DV", "F (Full)", "p (Full)", "F (Simplified)", "p (Simplified)")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = FALSE)

Covariate Effect (Parental Education): Full vs. Simplified MANCOVA
DV	F (Full)	F (Simplified)
Math Score	33.400	26.848
Reading Score	46.172	40.028
Writing Score	82.125	65.019

8.3 Comprehensive Model Comparison

kable(data.frame(
  Criterion          = c("Number of IVs", "Covariate", "Two-way interactions",
                         "Total terms", "Multivariate test used",
                         "Primary interpretive focus"),
  Full_Model         = c("4 (gender, race, lunch, test_prep)",
                         "parental_edu_ord",
                         "Yes (6 interactions)",
                         "12 terms",
                         "Pillai's Trace + Wilks' Lambda",
                         "Which factors matter most?"),
  Simplified_Model   = c("1 (gender only)",
                         "parental_edu_ord",
                         "No",
                         "2 terms",
                         "All 4 multivariate tests",
                         "Does gender persist after covariate control?")
), caption = "Comparison of Both MANCOVA Approaches",
   col.names = c("Criterion", "Full Factorial MANCOVA", "Simplified One-Way MANCOVA")) %>%
  kable_styling(bootstrap_options = c("hover"), full_width = TRUE)

Comparison of Both MANCOVA Approaches
Criterion	Full Factorial MANCOVA	Simplified One-Way MANCOVA
Number of IVs	4 (gender, race, lunch, test_prep)	1 (gender only)
Covariate	parental_edu_ord	parental_edu_ord
Two-way interactions	Yes (6 interactions)	No
Total terms	12 terms	2 terms
Multivariate test used	Pillai’s Trace + Wilks’ Lambda	All 4 multivariate tests
Primary interpretive focus	Which factors matter most?	Does gender persist after covariate control?

8.4 Effect Size Comparison Visualization

peta2_comparison <- bind_rows(
  peta2_full   %>% filter(Term == "gender") %>% mutate(Model = "Full Factorial"),
  peta2_simple %>% filter(Term == "gender") %>% mutate(Model = "Simplified")
)

ggplot(peta2_comparison, aes(x = DV, y = peta2, fill = Model)) +
  geom_col(position = position_dodge(0.65), width = 0.55,
           alpha = 0.85, color = "white") +
  geom_text(aes(label = round(peta2, 4)),
            position = position_dodge(0.65), vjust = -0.4, size = 3.2) +
  scale_fill_manual(values = c("Full Factorial" = "#2E86AB", "Simplified" = "#E84855")) +
  labs(title = "Gender Effect Size (Partial η²)  Full vs. Simplified Model",
       subtitle = "Larger η² in simplified model indicates variance shared with omitted factors",
       x = "Dependent Variable", y = "Partial η²", fill = "Model") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"))

Partial η² Comparison: Gender Effect Across Both Models

9. Interpretation and Conclusions

9.1 Full Factorial MANCOVA Key Findings

kable(data.frame(
  Factor = c("Parental Education (Covariate)",
             "Gender",
             "Race/Ethnicity",
             "Lunch Type",
             "Test Preparation Course",
             "Interaction Terms"),
  Finding = c(
    "Significant covariate  higher parental education positively predicts all three scores",
    "Significant effect on reading and writing; females score higher after covariate control",
    "Significant group differences; Group E consistently outperforms Group A across all subjects",
    "Strong effect  standard lunch associated with substantially higher scores across all subjects",
    "Significant benefit  test prep completion yields higher adjusted scores in all subjects",
    "Most two-way interactions are non-significant, suggesting predominantly main effects"
  ),
  Implication = c(
    "Socioeconomic background (proxied by parental education) must be statistically controlled",
    "Gender gap is real but modest; females lead in verbal subjects",
    "Racial/ethnic achievement gaps persist after socioeconomic controls",
    "Lunch type (proxy for income/socioeconomic status) is one of the strongest predictors",
    "Test preparation is a modifiable academic intervention with measurable benefit",
    "Factor effects are largely independent; no strong interactive synergies detected"
  )
), caption = "Full Factorial MANCOVA  Summary of Key Findings and Implications") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = TRUE) %>%
  scroll_box(width = "100%")

Full Factorial MANCOVA Summary of Key Findings and Implications
Factor	Finding	Implication
Parental Education (Covariate)	Significant covariate higher parental education positively predicts all three scores	Socioeconomic background (proxied by parental education) must be statistically controlled
Gender	Significant effect on reading and writing; females score higher after covariate control	Gender gap is real but modest; females lead in verbal subjects
Race/Ethnicity	Significant group differences; Group E consistently outperforms Group A across all subjects	Racial/ethnic achievement gaps persist after socioeconomic controls
Lunch Type	Strong effect standard lunch associated with substantially higher scores across all subjects	Lunch type (proxy for income/socioeconomic status) is one of the strongest predictors
Test Preparation Course	Significant benefit test prep completion yields higher adjusted scores in all subjects	Test preparation is a modifiable academic intervention with measurable benefit
Interaction Terms	Most two-way interactions are non-significant, suggesting predominantly main effects	Factor effects are largely independent; no strong interactive synergies detected

9.2 Simplified One-Way MANCOVA Key Findings

kable(data.frame(
  Aspect = c("Gender multivariate effect",
             "Parental education (covariate)",
             "Persistence after control",
             "Direction of effect",
             "Practical significance"),
  Finding = c(
    "Statistically significant across all four multivariate test criteria",
    "Significant contributor; higher parental education boosts all scores",
    "Gender effect remains significant after controlling for parental education",
    "Females outperform males in reading and writing; math is mixed",
    "Cohen's d is small-to-medium; effect is real but not large in magnitude"
  )
), caption = "Simplified One-Way MANCOVA  Summary of Key Findings") %>%
  kable_styling(bootstrap_options = c("hover"), full_width = TRUE)

Simplified One-Way MANCOVA Summary of Key Findings
Aspect	Finding
Gender multivariate effect	Statistically significant across all four multivariate test criteria
Parental education (covariate)	Significant contributor; higher parental education boosts all scores
Persistence after control	Gender effect remains significant after controlling for parental education
Direction of effect	Females outperform males in reading and writing; math is mixed
Practical significance	Cohen’s d is small-to-medium; effect is real but not large in magnitude

9.3 Methodological Limitations

#	Limitation	Implication
1	Ordinal encoding of parental education assumes equal intervals	May not perfectly capture non-linear educational returns
2	MANCOVA assumes homogeneity of regression slopes	Violation could bias adjusted means
3	Multivariate normality assumption is partially violated	Results are robust for n = 1,000 due to CLT
4	Race/ethnicity groups have unequal sample sizes (n = 89 to 319)	Reduces power for comparisons involving Group A
5	Full factorial model includes many interaction terms	Risk of inflated Type I error without correction
6	Dataset is cross-sectional	Causal interpretation of gender or race effects is not warranted

Document generated using R Markdown. All analyses conducted in R (version >= 4.1.0). Reproducibility ensured via set.seed(42) in all stochastic procedures.

MANCOVA Analysis of Student Academic Performance

Nazril Ravi Pratama

2026-04-17