1 Executive Summary

This document presents a comprehensive and technical assumption testing process before conducting MANOVA, ANCOVA, and MANCOVA analyses on the Students Performance in Exams dataset. Five fundamental assumptions were tested sequentially:

No.	Assumption	Test Method
1	Dependence between DVs	Bartlett’s Sphericity Test
2	Homogeneity of the covariance matrix	Box’s M Test
3	Multivariate normality	Mardia + Henze-Zirkler + MVN plot
4	Linearity of covariates–DVs	Scatter plot + Pearson correlation + ANOVA F-test
5	Independence of observations	Review of the research design

Variables used:

DV (Dependent Variables): math score, reading score, writing score
IV (Independent Variable / Factor): gender (female, male)
Covariate: test preparation course (none, completed → coded 0/1)

2 Data Loading & Preparation

2.1 Import Dataset

df <- read.csv("StudentsPerformance.csv", check.names = FALSE)

cat("STUDENTS PERFORMANCE DATASET - Overview\n")
cat(sprintf(" Number of Observations : %d rows\n", nrow(df)))
cat(sprintf(" Number of Variables : %d columns\n", ncol(df)))
cat("\n Variable Name:\n")
for (v in names(df)) cat(sprintf(" ▸ %s\n", v))

## STUDENTS PERFORMANCE DATASET - Overview
##  Number of Observations : 1000 rows
##  Number of Variables : 8 columns
## 
##  Variable Name:
##  ▸ gender
##  ▸ race/ethnicity
##  ▸ parental level of education
##  ▸ lunch
##  ▸ test preparation course
##  ▸ math score
##  ▸ reading score
##  ▸ writing score

2.2 Dataset Preview

head(df, 10) %>% 
  kable(caption = "First 10 Rows — Students Performance Dataset") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = TRUE, font_size = 12) %>% 
scroll_box(width = "100%")

First 10 Rows — Students Performance Dataset
gender	race/ethnicity	parental level of education	lunch	test preparation course	math score	reading score	writing score
female	group B	bachelor’s degree	standard	none	72	72	74
female	group C	some college	standard	completed	69	90	88
female	group B	master’s degree	standard	none	90	95	93
male	group A	associate’s degree	free/reduced	none	47	57	44
male	group C	some college	standard	none	76	78	75
female	group B	associate’s degree	standard	none	71	83	78
female	group B	some college	standard	completed	88	95	92
male	group B	some college	free/reduced	none	40	43	39
male	group D	high school	free/reduced	completed	64	64	67
female	group B	high school	free/reduced	none	38	60	50

2.3 Persiapan Variabel

# Main variable
dv_cols <- c("math score", "reading score", "writing score")
iv_col <- "gender"
cov_col <- "test preparation course"

# Encode covariate → numeric
df$test_prep_num <- ifelse(df[[cov_col]] == "completed", 1, 0)

# DV matrix subset
Y <- df[, dv_cols]

cat("Group Distribution (gender):\n")
print(table(df[[iv_col]]))

cat("\nCovariate Distribution (test preparation course):\n")
print(table(df[[cov_col]]))

## Group Distribution (gender):
## 
## female   male 
##    518    482 
## 
## Covariate Distribution (test preparation course):
## 
## completed      none 
##       358       642

2.4 Statistik Deskriptif

# Per variabel DV
desc_tbl <- data.frame(
  Variable = dv_cols,
  N = sapply(Y, length),
  Mean = round(sapply(Y, mean), 3),
  Median = round(sapply(Y, median), 3),
  SD = round(sapply(Y, sd), 3),
  Min = sapply(Y, min),
  Max = sapply(Y, max),
  Skewness = round(sapply(Y, function(x) mean((x - mean(x))^3) / sd(x)^3), 3),
  Kurtosis = round(sapply(Y, function(x) mean((x - mean(x))^4) / sd(x)^4 - 3), 3)
)

desc_tbl %>%
  kable(caption = "Descriptive Statistics - Dependent Variable") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#34495E")

Descriptive Statistics - Dependent Variable
	Variable	N	Mean	Median	SD	Min	Max	Skewness	Kurtosis
math score	math score	1000	66.089	66	15.163	0	100	-0.278	0.261
reading score	reading score	1000	69.169	70	14.600	17	100	-0.258	-0.080
writing score	writing score	1000	68.054	69	15.196	10	100	-0.289	-0.045

# Each variable each group
for (col in dv_cols) {
  tbl <- df %>%
    group_by(gender) %>%
    summarise(
      N = n(),
      Mean = round(mean(.data[[col]]), 3),
      Median = round(median(.data[[col]]), 3),
      SD = round(sd(.data[[col]]), 3),
      Min = min(.data[[col]]),
      Max = max(.data[[col]])
    )
  
  print(
    tbl %>%
      kable(caption = sprintf("Descriptive '%s' each Gender", col)) %>%
      kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
      column_spec(1, bold = TRUE)
  )
}

## <table class="table table-striped table-hover" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;">
## <caption>Descriptive 'math score' each Gender</caption>
##  <thead>
##   <tr>
##    <th style="text-align:left;"> gender </th>
##    <th style="text-align:right;"> N </th>
##    <th style="text-align:right;"> Mean </th>
##    <th style="text-align:right;"> Median </th>
##    <th style="text-align:right;"> SD </th>
##    <th style="text-align:right;"> Min </th>
##    <th style="text-align:right;"> Max </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:left;font-weight: bold;"> female </td>
##    <td style="text-align:right;"> 518 </td>
##    <td style="text-align:right;"> 63.633 </td>
##    <td style="text-align:right;"> 65 </td>
##    <td style="text-align:right;"> 15.491 </td>
##    <td style="text-align:right;"> 0 </td>
##    <td style="text-align:right;"> 100 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;font-weight: bold;"> male </td>
##    <td style="text-align:right;"> 482 </td>
##    <td style="text-align:right;"> 68.728 </td>
##    <td style="text-align:right;"> 69 </td>
##    <td style="text-align:right;"> 14.356 </td>
##    <td style="text-align:right;"> 27 </td>
##    <td style="text-align:right;"> 100 </td>
##   </tr>
## </tbody>
## </table><table class="table table-striped table-hover" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;">
## <caption>Descriptive 'reading score' each Gender</caption>
##  <thead>
##   <tr>
##    <th style="text-align:left;"> gender </th>
##    <th style="text-align:right;"> N </th>
##    <th style="text-align:right;"> Mean </th>
##    <th style="text-align:right;"> Median </th>
##    <th style="text-align:right;"> SD </th>
##    <th style="text-align:right;"> Min </th>
##    <th style="text-align:right;"> Max </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:left;font-weight: bold;"> female </td>
##    <td style="text-align:right;"> 518 </td>
##    <td style="text-align:right;"> 72.608 </td>
##    <td style="text-align:right;"> 73 </td>
##    <td style="text-align:right;"> 14.378 </td>
##    <td style="text-align:right;"> 17 </td>
##    <td style="text-align:right;"> 100 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;font-weight: bold;"> male </td>
##    <td style="text-align:right;"> 482 </td>
##    <td style="text-align:right;"> 65.473 </td>
##    <td style="text-align:right;"> 66 </td>
##    <td style="text-align:right;"> 13.932 </td>
##    <td style="text-align:right;"> 23 </td>
##    <td style="text-align:right;"> 100 </td>
##   </tr>
## </tbody>
## </table><table class="table table-striped table-hover" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;">
## <caption>Descriptive 'writing score' each Gender</caption>
##  <thead>
##   <tr>
##    <th style="text-align:left;"> gender </th>
##    <th style="text-align:right;"> N </th>
##    <th style="text-align:right;"> Mean </th>
##    <th style="text-align:right;"> Median </th>
##    <th style="text-align:right;"> SD </th>
##    <th style="text-align:right;"> Min </th>
##    <th style="text-align:right;"> Max </th>
##   </tr>
##  </thead>
## <tbody>
##   <tr>
##    <td style="text-align:left;font-weight: bold;"> female </td>
##    <td style="text-align:right;"> 518 </td>
##    <td style="text-align:right;"> 72.467 </td>
##    <td style="text-align:right;"> 74 </td>
##    <td style="text-align:right;"> 14.845 </td>
##    <td style="text-align:right;"> 10 </td>
##    <td style="text-align:right;"> 100 </td>
##   </tr>
##   <tr>
##    <td style="text-align:left;font-weight: bold;"> male </td>
##    <td style="text-align:right;"> 482 </td>
##    <td style="text-align:right;"> 63.311 </td>
##    <td style="text-align:right;"> 64 </td>
##    <td style="text-align:right;"> 14.114 </td>
##    <td style="text-align:right;"> 15 </td>
##    <td style="text-align:right;"> 100 </td>
##   </tr>
## </tbody>
## </table>

Y_long <- pivot_longer(Y, cols = everything(), names_to = "Variable", values_to = "Score")

ggplot(Y_long, aes(x = Score, fill = Variable)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 alpha = 0.7, color = "white") +
  geom_density(color = "#2C3E50", linewidth = 0.9) +
  facet_wrap(~Variable, scales = "free", ncol = 1) +
  scale_fill_manual(values = c("#3498DB", "#E74C3C", "#2ECC71")) +
  theme_minimal(base_size = 13) +
  theme(
    strip.background = element_rect(fill = "#34495E"),
    strip.text       = element_text(color = "white", face = "bold"),
    legend.position  = "none",
    plot.title       = element_text(hjust = 0.5, face = "bold", size = 15)
  ) +
  labs(title = "Distribution of Dependent Variable", x = "Score", y = "Density")

3 Assumption 1 - Dependence between Dependent Variables (Bartlett’s Sphericity Test)

3.1 Theoretical Basis

Objective: To verify that the dependent variables ($Y_1$, $Y_2$, $Y_3$) are not independent, meaning there is a meaningful correlation between them. MANOVA only provides analytical advantages over a series of univariate ANOVAs if the DVs are correlated.

Hypothesis:

\[H_0 : \boldsymbol{\rho} = \mathbf{I} \quad \text{(correlation matrix = identity matrix; independent DV)}\] \[H_1 : \boldsymbol{\rho} \neq \mathbf{I} \quad \text{(there is at least one correlation }\neq 0\text{)}\]

Statistik Uji — Bartlett (1950): \[\chi^2 = -\left[(n-1) - \frac{2p+5}{6}\right] \ln |\mathbf{R}|\]

where $n$ = number of observations, $p$ = number of DV variables, $|\mathbf{R}|$ = determinant of the correlation matrix. Degrees of freedom: $df = \frac{p(p-1)}{2}$.

Criteria: Reject $H_0$ if $\chi^2_{\text{count}} > \chi^2_{\alpha, df}$, or $p\text{-value} < 0.05$.

3.2 Correlation Matrix between DVs

R <- cor(Y)

# Correlation Table
R %>%
  round(4) %>%
  kable(caption = "Pearson Correlation Matrix between Dependent Variables") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#2C3E50")

Pearson Correlation Matrix between Dependent Variables
	math score	reading score	writing score
math score	1.0000	0.8176	0.8026
reading score	0.8176	1.0000	0.9546
writing score	0.8026	0.9546	1.0000

corrplot(R,
         method    = "color",
         type      = "upper",
         addCoef.col = "black",
         tl.col    = "#2C3E50",
         tl.srt    = 0,
         cl.cex    = 0.9,
         number.cex = 1.1,
         col       = colorRampPalette(c("#EBF5FB","#3498DB","#154360"))(200),
         title     = "Correlation Matrix - Math, Reading, Writing Score",
         mar       = c(0,0,2,0))

3.3 Implementation of Bartlett’s Test

n <- nrow(Y)
p <- ncol(Y)
det_R <- det(R)

# Bartlett Correction
chi2_stat <- -((n - 1) - (2*p + 5)/6) * log(det_R)
df_bart <- p * (p - 1) / 2
p_value <- pchisq(chi2_stat, df = df_bart, lower.tail = FALSE)

cat("BARTLETT SPHERICITY TEST — Manual Calculation Results\n")
cat(sprintf(" n (number of observations) : %d\n", n))
cat(sprintf(" p (number of DVs) : %d\n", p))
cat(sprintf(" |R| (correlation matrix det) : %.8f\n", det_R))
cat(sprintf(" ln|R| : %.6f\n", log(det_R)))
cat(sprintf(" Correction factor : %.4f\n", (n - 1) - (2*p + 5)/6))
cat()
cat(sprintf("χ² calculated statistic : %.4f\n", chi2_stat))
cat(sprintf(" Degrees of freedom (df) : %d\n", df_bart))
cat(sprintf(" χ² table (α=0.05) : %.4f\n", 
qchisq(0.95, df = df_bart)))
cat(sprintf(" p-value : %.6e\n", p_value))
cat()
cat(sprintf(" Decision: %s\n", 
            ifelse(p_value < 0.05, "REJECT H₀ → MUTUAL DEPENDENT DV ✓", "FAILED TO REJECT H₀ → Independent DV")))
cat()

# Validation via R's built-in bartlett.test (per pair)
cat("Confirmation via R's built-in bartlett.test() (per pair of DVs):\n")
pairs_list <- list(
  c("math score","reading score"),
  c("math score","writing score"),
  c("reading score","writing score")
)
for (pair in pairs_list) {
  bt <- cor.test(df[[pair[1]]], df[[pair[2]]])
  cat(sprintf("  %s vs %s → r = %.4f, p = %.2e\n",
              pair[1], pair[2], bt$estimate, bt$p.value))
}

## BARTLETT SPHERICITY TEST — Manual Calculation Results
##  n (number of observations) : 1000
##  p (number of DVs) : 3
##  |R| (correlation matrix det) : 0.02893173
##  ln|R| : -3.542816
##  Correction factor : 997.1667
## χ² calculated statistic : 3532.7783
##  Degrees of freedom (df) : 3
##  χ² table (α=0.05) : 7.8147
##  p-value : 0.000000e+00
##  Decision: REJECT H₀ → MUTUAL DEPENDENT DV ✓
## Confirmation via R's built-in bartlett.test() (per pair of DVs):
##   math score vs reading score → r = 0.8176, p = 1.79e-241
##   math score vs writing score → r = 0.8026, p = 3.38e-226
##   reading score vs writing score → r = 0.9546, p = 0.00e+00

bartlett_result <- data.frame(
  Parameters = c("χ² Calculated", "Degrees of Freedom", "χ² Table (α=0.05)", "p-value", "Decision"),
  Values = c(round(chi2_stat, 4), df_bart, round(qchisq(0.95, df=df_bart), 4), formatC(p_value, format="e", digits=4), 
               ifelse(p_value < 0.05, "Reject H₀", "Fail to Reject H₀"))
)

bartlett_result %>%
  kable(caption = "Summary of the Bartlett Sphericity Test") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>% 
  row_spec(5, bold = TRUE, color = "white", background = ifelse(p_value < 0.05, "#1E8449", "#E67E22"))

Summary of the Bartlett Sphericity Test
Parameters	Values
χ² Calculated	3532.7783
Degrees of Freedom	3
χ² Table (α=0.05)	7.8147
p-value	0.0000e+00
Decision	Reject H₀

3.4 Scatter Plot Relationship between DV

pairs_df <- data.frame(Y, group = df[[iv_col]])

names(pairs_df)[1:3] <- c("math_score", "reading_score", "writing_score")

p1 <- ggplot(pairs_df, aes(x = math_score, y = reading_score, color = group)) +
  geom_point(alpha = 0.5, size = 1.5) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  scale_color_manual(values = c("#E74C3C","#3498DB")) +
  theme_minimal(base_size = 11) +
  labs(title = "Math vs Reading", color = "Gender")

p2 <- ggplot(pairs_df, aes(x = math_score, y = writing_score, color = group)) +
  geom_point(alpha = 0.5, size = 1.5) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  scale_color_manual(values = c("#E74C3C","#3498DB")) +
  theme_minimal(base_size = 11) +
  labs(title = "Math vs Writing", color = "Gender")

p3 <- ggplot(pairs_df, aes(x = reading_score, y = writing_score, color = group)) +
  geom_point(alpha = 0.5, size = 1.5) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  scale_color_manual(values = c("#E74C3C","#3498DB")) +
  theme_minimal(base_size = 11) +
  labs(title = "Reading vs Writing", color = "Gender")

grid.arrange(p1, p2, p3, ncol = 3, top = "Scatter Plot of the Relationship between Dependent Variables each Gender")

3.5 Interpretation

Assumption 1 FULFILLED - DV Variables Are Mutually Dependent

The Bartlett Sphericity Test yields $\chi^2 = 3532.78$ with $df = 3$ and $p\text{-value} < 0.05$. Since $\chi^2_{\text{count}} > \chi^2_{\text{table}}$, we reject $H_0$ — the DV correlation matrix is not an identity matrix.

Correlation between DVs: - Math–Reading: $r = 0.818$ → strong correlation - Math–Writing: $r = 0.803$ → strong correlation - Reading–Writing: $r = 0.955$ → very strong correlation

The three score variables (math, reading, writing) reflect interrelated academic dimensions. MANOVA is appropriate because analyzing all three DVs simultaneously provides richer information than separate ANOVAs.

4 Assumption 2 - Homogeneity of Covariance Matrices (Box’s M Test)

4.1 Theoretical Basis

Purpose: Test the equality of variance-covariance matrices ($\boldsymbol{\Sigma}$) between groups — a multivariate analogue of Levene’s test (univariate). MANOVA assumes $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \cdots = \boldsymbol{\Sigma}_g$.

Hypothesis: \[H_0 : \boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \cdots = \boldsymbol{\Sigma}_g\] \[H_1 : \text{Minimum one } \boldsymbol{\Sigma}_\ell \neq \boldsymbol{\Sigma}_m\]

Box’s M Test Statistics (1949): \[M = \left[\sum_\ell (n_\ell - 1)\right] \ln |\mathbf{S}_{\text{pooled}}| - \sum_\ell (n_\ell - 1)\ln|\mathbf{S}_\ell|\]

\[C = (1 - u)M \approx \chi^2_{\nu}\]

where $\nu = \frac{1}{2}p(p+1)(g-1)$, and $u$ is Box’s correction factor.

Important Note: Box’s M is very sensitive to non-normality. If the normal distribution is not perfectly met, non-significant results (p > 0.05) are interpreted more conservatively.

4.2 Covariance Matrix per Group

groups <- unique(df[[iv_col]])

cat("Covariance Matrix per Group:\n\n")
for (g in groups) { 
  cat(sprintf("Group: %s\n", toupper(g)))
  idx <- df[[iv_col]] == g 
  S_g <- cov(Y[idx, ]) 
  det_g <- det(S_g) 
  cat(sprintf(" n = %d\n", sum(idx))) 
  cat(sprintf(" |S| = %.4f\n", det_g)) 
  print(round(S_g, 4)) 
  cat("\n")
}

# Pooled covariance
n_total <- nrow(df)
g_count <- length(groups)

S_pooled <- Reduce("+", lapply(groups, function(g) { 
  idx <- df[[iv_col]] == g 
  (sum(idx) - 1) * cov(Y[idx, ])
})) / (n_total - g_count)

cat("Pooled Covariance Matrix\n")
print(round(S_pooled, 4))
cat(sprintf(" |S_pooled| = %.4f\n", det(S_pooled)))

## Covariance Matrix per Group:
## 
## Group: FEMALE
##  n = 518
##  |S| = 137344.0080
##               math score reading score writing score
## math score      239.9851      202.5272      211.7384
## reading score   202.5272      206.7339      203.7792
## writing score   211.7384      203.7792      220.3693
## 
## Group: MALE
##  n = 482
##  |S| = 145617.9018
##               math score reading score writing score
## math score      206.1027      177.1060      180.7334
## reading score   177.1060      194.0959      186.9232
## writing score   180.7334      186.9232      199.2002
## 
## Pooled Covariance Matrix
##               math score reading score writing score
## math score      223.6550      190.2751      196.7951
## reading score   190.2751      200.6429      195.6552
## writing score   196.7951      195.6552      210.1666
##  |S_pooled| = 142591.8251

make_cov_heatmap <- function(mat, title, color_mid = "#3498DB") {
  mat_long <- melt(mat)
  ggplot(mat_long, aes(Var1, Var2, fill = value)) +
    geom_tile(color = "white", linewidth = 0.8) +
    geom_text(aes(label = round(value, 1)), size = 3.5, fontface = "bold") +
    scale_fill_gradient2(low = "#EBF5FB", mid = color_mid, high = "#154360", midpoint = median(mat_long$value)) +
    theme_minimal(base_size = 11) +
    theme(axis.text.x = element_text(angle = 30, hjust = 1), plot.title   = element_text(hjust = 0.5, face = "bold")) + labs(title = title, x = "", y = "", fill = "Cov")
}

cov_plots <- lapply(seq_along(groups), function(i) {
  idx <- df[[iv_col]] == groups[i]
  make_cov_heatmap(cov(Y[idx, ]),
                   sprintf("Covariance %s", toupper(groups[i])),
                   color_mid = ifelse(i == 1, "#E74C3C", "#3498DB"))
})
cov_plots[["Pooled"]] <- make_cov_heatmap(S_pooled, "Covariance Pooled", "#2ECC71")

do.call(grid.arrange, c(cov_plots, ncol = 3, list(top = "Comparison of Covariance Matrices between Groups")))

4.3 Box’s M Test

# Calculate Box's M manually
ns <- sapply(groups, function(g) sum(df[[iv_col]] == g))
p  <- ncol(Y)
g  <- length(groups)

M_stat <- sum(ns - 1) * log(det(S_pooled)) -
          sum(sapply(seq_along(groups), function(i) {
            idx <- df[[iv_col]] == groups[i]
            (ns[i] - 1) * log(det(cov(Y[idx, ])))
          }))

# Correction Factor u
u_factor <- (sum(1/(ns - 1)) - 1/sum(ns - 1)) *
             (2*p^2 + 3*p - 1) / (6*(p + 1)*(g - 1))

C_stat <- (1 - u_factor) * M_stat
df_box  <- p*(p + 1)*(g - 1) / 2
p_boxm  <- pchisq(C_stat, df = df_box, lower.tail = FALSE)

cat("BOX'S M TEST - Calculation Results\n")
cat(sprintf(" M statistics : %.4f\n", M_stat))
cat(sprintf(" Correction factor u : %.6f\n", u_factor))
cat(sprintf(" C = (1-u)M : %.4f\n", C_stat))
cat(sprintf(" Degrees of freedom : %.0f\n", df_box))
cat(sprintf(" χ² table α=0.05 : %.4f\n", qchisq(0.95, df = df_box)))
cat(sprintf(" p-value : %.6f\n", p_boxm))
cat()
cat(sprintf(" Decision: %s\n", ifelse(p_boxm > 0.05, "FAILED TO REJECT H₀ → HOMOGENEOUS Covariance Matrix ✓", "REJECT H₀ → NOT Homogeneous Covariance Matrix")))
cat()

# Validation via biotools::boxM
cat("\nValidation via biotools::boxM():\n")
bm <- biotools::boxM(Y, df[[iv_col]])
print(bm)

## BOX'S M TEST - Calculation Results
##  M statistics : 9.2852
##  Correction factor u : 0.003262
##  C = (1-u)M : 9.2549
##  Degrees of freedom : 6
##  χ² table α=0.05 : 12.5916
##  p-value : 0.159741
##  Decision: FAILED TO REJECT H₀ → HOMOGENEOUS Covariance Matrix ✓
## 
## Validation via biotools::boxM():
## 
##  Box's M-test for Homogeneity of Covariance Matrices
## 
## data:  Y
## Chi-Sq (approx.) = 9.2549, df = 6, p-value = 0.1597

boxm_result <- data.frame(
  Parameters = c("M Statistic", "C = (1-u)M", "Degrees of Freedom", "χ² Table (α=0.05)", "p-value", "Decision"),
  Values = c(round(M_stat, 4), round(C_stat, 4), df_box, round(qchisq(0.95, df=df_box), 4), round(p_boxm, 6), ifelse(p_boxm > 0.05, "Failed to Reject H₀ (Homogeneous)", "Reject H₀ (Inhomogeneous)"))
)

boxm_result %>%
  kable(caption = "Box's M Test Summary") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(6, bold = TRUE, color = "white",
           background = ifelse(p_boxm > 0.05, "#1E8449", "#E67E22"))

Box’s M Test Summary
Parameters	Values
M Statistic	9.2852
C = (1-u)M	9.2549
Degrees of Freedom	6
χ² Table (α=0.05)	12.5916
p-value	0.159741
Decision	Failed to Reject H₀ (Homogeneous)

4.4 Boxplot Comparison of Variance each Group

df_long <- df %>%
  pivot_longer(cols = all_of(dv_cols), names_to = "Variable", values_to = "Score")

ggplot(df_long, aes(x = gender, y = Score, fill = gender)) +
  geom_boxplot(alpha = 0.75, outlier.shape = 21, outlier.size = 1.5) +
  geom_jitter(width = 0.15, alpha = 0.2, size = 0.8) +
  scale_fill_manual(values = c("#E74C3C","#3498DB")) +
  facet_wrap(~Variable, ncol = 3) +
  theme_minimal(base_size = 12) +
  theme(
    strip.background = element_rect(fill = "#34495E"),
    strip.text       = element_text(color = "white", face = "bold"),
    legend.position  = "bottom",
    plot.title       = element_text(hjust = 0.5, face = "bold", size = 14)
  ) +
  labs(title = "Distribution of Scores each Gender - Comparison of Variances", x = "Gender", y = "Score", fill = "Gender")

4.5 Interpretation

Asumption 2 FULFILLED - Homogeneous Covariance Matrix

Box’s M test yields $C = `rround(C_stat, 4)`$ with $df = `rdf_box`$ and $p\text{-value} = `rround(p_boxm, 4)`$. Since $p > 0.05$, we fail to reject $H_0$ there is insufficient evidence that the covariance matrix differs across gender groups.

This indicates that: - The variability of math, reading, and writing scores is equal between the female and male groups. - The assumption $\boldsymbol{\Sigma}_{\text{female}} = \boldsymbol{\Sigma}_{\text{male}}$ can be maintained. - MANOVA, ANCOVA, and MANCOVA can be continued using the pooled covariance matrix.

Methodological Note: Box’s M is sensitive to non-normality. This non-significant result supports the reliability of the follow-up analysis.

5 Assumption 3 - Multivariate Normality

5.1 Theoretical Basis

Objective: To verify that the DV vector follows the multivariate normal distribution $\mathbf{Y} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. MANOVA is based on this assumption for the validity of the Wilks’ F/Lambda test.

Two formal tests were used:

1. Mardia’s Test (Multivariate Skewness & Kurtosis): \[b_{1,p} = \frac{1}{n^2}\sum_{i,j} g_{ij}^3 \quad \text{dan} \quad b_{2,p} = \frac{1}{n}\sum_i g_{ii}^2\] where $g_{ij} = (\mathbf{x}_i - \bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}_j - \bar{\mathbf{x}})$ (Mahalanobis-based).

2. Henze-Zirkler test: \[HZ = \frac{1}{n}\sum_{i,j} K_\beta(X_i - X_j) - 2\sum_i K_{\beta/\sqrt{2}}(X_i - \bar{X}) + n \cdot c\]

Both are available via the MVN package.

Criteria: $H_0$: data is multivariately normally distributed. Reject $H_0$ if $p < 0.05$.

5.2 Univariate Normality Test (Prerequisite)

cat("Shapiro-Wilk Test each DV Variable:\n\n")

sw_results <- lapply(dv_cols, function(col) {
  sw  <- shapiro.test(df[[col]])
  list(Variable = col, W = round(sw$statistic, 5),
       p_value = round(sw$p.value, 6),
       Normal  = ifelse(sw$p.value > 0.05, "Yes", "No"))
})

sw_tbl <- do.call(rbind, lapply(sw_results, as.data.frame))

sw_tbl %>%
  kable(caption = "Shapiro-Wilk Normality Test - DV Univariate") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(4, bold = TRUE,
              color = ifelse(sw_tbl$Normal == "Ya", "#1E8449", "#C0392B"))

## Shapiro-Wilk Test each DV Variable:

Shapiro-Wilk Normality Test - DV Univariate
	Variable	W	p_value	Normal
W	math score	0.99315	0.000145	No
W1	reading score	0.99292	0.000106	No
W2	writing score	0.99196	0.000029	No

qq_plots <- lapply(dv_cols, function(col) {
  ggplot(df, aes(sample = .data[[col]])) +
    stat_qq(color = "#3498DB", alpha = 0.6, size = 1.2) +
    stat_qq_line(color = "#E74C3C", linewidth = 1) +
    theme_minimal(base_size = 11) +
    labs(title = sprintf("Q-Q Plot: %s", col),
         x = "Theoretical Quantiles", y = "Sample Quantiles")
})

grid.arrange(grobs = qq_plots, ncol = 3, top = "Q-Q Plot Univariate Normality - Dependent Variable")

5.3 Multivariate Outlier Detection (Mahalanobis Distance)

mean_vec <- colMeans(Y)
cov_mat <- cov(Y)
inv_cov <- solve(cov_mat)

maha_dist <- mahalanobis(Y, center = mean_vec, cov = cov_mat)
df$mahalanobis <- maha_dist

threshold <- qchisq(0.999, df = ncol(Y))   # α = 0.001, df = p
n_outliers <- sum(maha_dist > threshold)

cat("MAHALANOBIS DISTANCE - Multivariate Outlier\n")
cat()
cat(sprintf(" Threshold χ²(0.001, df=%d) : %.4f\n", ncol(Y), threshold))
cat(sprintf(" Number of outliers : %d (%.1f%%)\n", n_outliers, n_outliers/nrow(df)*100))
cat(sprintf(" Maximum distance : %.4f\n", max(maha_dist)))
cat(sprintf(" Minimum distance : %.4f\n", min(maha_dist)))
cat(sprintf(" Median distance : %.4f\n", median(maha_dist)))
cat()

## MAHALANOBIS DISTANCE - Multivariate Outlier
##  Threshold χ²(0.001, df=3) : 16.2662
##  Number of outliers : 1 (0.1%)
##  Maximum distance : 20.1224
##  Minimum distance : 0.0008
##  Median distance : 2.5039

df_maha <- data.frame(Index = 1:nrow(df), Distance = maha_dist,
                      Outlier = maha_dist > threshold)

p_maha1 <- ggplot(df_maha, aes(x = Index, y = Distance, color = Outlier)) +
  geom_point(size = 1.5, alpha = 0.7) +
  geom_hline(yintercept = threshold, linetype = "dashed",
             color = "#E74C3C", linewidth = 1) +
  scale_color_manual(values = c("#3498DB","#E74C3C")) +
  theme_minimal(base_size = 11) +
  labs(title = "Mahalanobis Distance per Observation", subtitle = sprintf("Threshold χ²(0.001, df=3) = %.2f", threshold), x = "Observation Index", y = "Mahalanobis Distance")

p_maha2 <- ggplot(df_maha, aes(x = Distance)) +
  geom_histogram(aes(y = after_stat(density)), bins = 40,
                 fill = "#3498DB", alpha = 0.7, color = "white") +
  geom_vline(xintercept = threshold, linetype = "dashed",
             color = "#E74C3C", linewidth = 1) +
  stat_function(fun = function(x) dchisq(x, df = ncol(Y)),
                color = "#E74C3C", linewidth = 1) +
  theme_minimal(base_size = 11) +
  labs(title = "Mahalanobis Distribution vs. χ²", x = "Distance", y = "Density")

grid.arrange(p_maha1, p_maha2, ncol = 2)

5.4 Formal Multivariate Normality Test

cat("MARDIA TEST - Multivariate Normality\n")
mvn_mardia <- MVN::mvn(data = Y, mvn_test = "mardia", descriptives = FALSE)
print(mvn_mardia$multivariate_normality)

cat("\nHENZE-ZIRKLER TEST - Multivariate Normality\n")
mvn_hz <- MVN::mvn(data = Y, mvn_test = "hz", descriptives = FALSE)
print(mvn_hz$multivariate_normality)

## MARDIA TEST - Multivariate Normality
##              Test Statistic p.value     Method          MVN
## 1 Mardia Skewness    30.255  <0.001 asymptotic ✗ Not normal
## 2 Mardia Kurtosis    -2.266   0.023 asymptotic ✗ Not normal
## 
## HENZE-ZIRKLER TEST - Multivariate Normality
##            Test Statistic p.value     Method          MVN
## 1 Henze-Zirkler     1.282   0.003 asymptotic ✗ Not normal

5.5 Multivariate Normality Plot

# Chi-square Q-Q plot manual
p_values_sorted <- sort(maha_dist)
n    <- length(p_values_sorted)
expected <- qchisq((1:n - 0.5)/n, df = ncol(Y))

qq_df <- data.frame(Expected = expected, Observed = p_values_sorted,
                    Outlier = p_values_sorted > threshold)

p_chisq <- ggplot(qq_df, aes(x = Expected, y = Observed, color = Outlier)) +
  geom_point(size = 1.5, alpha = 0.7) +
  geom_abline(intercept = 0, slope = 1, color = "#E74C3C",
              linewidth = 1, linetype = "dashed") +
  scale_color_manual(values = c("#3498DB","#E74C3C")) +
  theme_minimal(base_size = 12) +
  labs(title = "Chi-Square Q-Q Plot (Multivariate Normality)",
       subtitle = "Points follow the diagonal line → multivariate normal",
       x = "Theoretical χ² Quantiles", y = "Mahalanobis Distance²",
       color = "Outlier")

# Perspective plot via MVN
mvn_out <- MVN::mvn(data = Y, 
                    mvn_test = "mardia",
                    descriptives = FALSE,
                    multivariate_outlier_method = "quan")

print(p_chisq)

cat("Multivariate Normality Test each Gender Group:\n\n")
for (g in groups) {
  cat(sprintf("Group: %s\n", toupper(g)))
  Y_g <- Y[df[[iv_col]] == g, ]
  mvn_g <- MVN::mvn(data = Y_g, mvn_test = "hz", descriptives = FALSE)
  print(mvn_g$multivariate_normality)
  cat("\n")
}

## Multivariate Normality Test each Gender Group:
## 
## Group: FEMALE
##            Test Statistic p.value     Method      MVN
## 1 Henze-Zirkler     1.018   0.091 asymptotic ✓ Normal
## 
## Group: MALE
##            Test Statistic p.value     Method      MVN
## 1 Henze-Zirkler      0.82     0.5 asymptotic ✓ Normal

mvn_summary <- data.frame(
  Test       = c("Mardia Skewness", "Mardia Kurtosis", "Henze-Zirkler"),
  Statistics = c(
    round(mvn_mardia$multivariate_normality[1, "Statistic"], 4),
    round(mvn_mardia$multivariate_normality[2, "Statistic"], 4),
    round(mvn_hz$multivariate_normality[1, "Statistic"], 4)
  ),
  p_value   = c(
    mvn_mardia$multivariate_normality[1, "p.value"],
    mvn_mardia$multivariate_normality[2, "p.value"],
    mvn_hz$multivariate_normality[1, "p.value"]
  ),
  Hasil     = c(
    mvn_mardia$multivariate_normality[1, "MVN"],
    mvn_mardia$multivariate_normality[2, "MVN"],
    mvn_hz$multivariate_normality[1, "MVN"]
  )
)

mvn_summary %>%
  kable(caption = "Multivariate Normality Test Summary") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#34495E")

Multivariate Normality Test Summary
Test	Statistics	p_value	Hasil
Mardia Skewness	30.255	<0.001	✗ Not normal
Mardia Kurtosis	-2.266	0.023	✗ Not normal
Henze-Zirkler	1.282	0.003	✗ Not normal

5.6 Interpretation

Assumption 3 - Multivariate Normality Not Formally Met

The results of the formal tests (Mardia & Henze-Zirkler) indicate that the data do not meet statistically strict multivariate normality. However:

Justification for continuing the analysis:

Robustness of MANOVA: MANOVA is known to be robust to violations of normality when n is large. With n = 1000 $ per group, the sampling distribution of the test statistic approximates chi-square/F based on the Central Limit Theorem.
Multivariate Outliers: Only 1 observations (0.1%) exceed the Mahalanobis threshold there are no extreme outliers that threaten the analysis.
The Chi-square Q-Q plot shows that most of the points follow the diagonal line — deviations occur only in the tails of the distribution.
Rule of thumb (Hair et al.): With n > 200, MANOVA is considered quite robust against violations of multivariate normality.

Recommendation: Proceed with cautious interpretation and consider using Pillai’s Trace (more robust) as an alternative test statistic to Wilks’ Lambda.

6 Assumption 4 - Linearity of Covariates with DV (Linearity Test)

6.1 Theoretical Basis

Objective: To verify that the relationship between the covariate (test preparation course) and each DV variable is linear. ANCOVA and MANCOVA assume a linear regression model between the covariate and DV. If the relationship is not linear, the covariate correction will be biased.

Hypothesis per pair (covariate–DV): \[H_0 : \beta_{\text{linear}} = 0 \quad \text{(no linear relationship)}\] \[H_1 : \beta_{\text{linear}} \neq 0 \quad \text{(there is a linear relationship)}\]

Methods used:

Point-Biserial Correlation (binary covariate vs. continuous DV): $r_{pb}$
Independent Samples t-test / ANOVA F-test: comparing mean DV across covariate levels
Scatter plot + regression line per group IV (gender)
Homogeneity of regression slopes: covariate–DV slopes must be equal across all groups (critical assumption of ANCOVA)

6.2 Transformation & Covariate Statistics

df$test_prep_num <- ifelse(df[[cov_col]] == "completed", 1, 0)

cat("Covariate Distribution:\n")
cov_dist <- table(df[[cov_col]])
print(cov_dist)
cat(sprintf("\nProposition 'completed': %.1f%%\n", cov_dist["completed"]/sum(cov_dist)*100))

cat("\nAverage DV per Covariate Level:\n")
df %>% 
  group_by(`test preparation course`) %>% 
  summarize(across(all_of(dv_cols), ~ round(mean(.), 2)), 
            n = n()) %>% 
  kable(caption = "Mean DV per Covariate Level") %>% 
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

## Covariate Distribution:
## 
## completed      none 
##       358       642 
## 
## Proposition 'completed': 35.8%
## 
## Average DV per Covariate Level:

Mean DV per Covariate Level
test preparation course	math score	reading score	writing score	n
completed	69.70	73.89	74.42	358
none	64.08	66.53	64.50	642

6.3 Covariate–DV Correlation

cat("Point-Biserial Correlation (covariate vs DV):\n\n")

corr_results <- lapply(dv_cols, function(col) { 
  ct <- cor.test(df[[col]], df$test_prep_num, method = "pearson") 
  list(DV = col, 
       r = round(ct$estimate, 4), 
       t_stat = round(ct$statistic, 4), 
       p_value = round(ct$p.value, 6), 
       CI_lower = round(ct$conf.int[1], 4), 
       CI_upper = round(ct$conf.int[2], 4), 
       Signif = ifelse(ct$p.value < 0.05, "Significant ✓", "Not Significant"))
})

corr_tbl <- do.call(rbind, lapply(corr_results, as.data.frame))

corr_tbl %>% 
  kable(caption = "Pearson Correlation: Covariate (Test Prep) vs DV") %>% 
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>% 
  column_spec(1, bold = TRUE) %>% 
  column_spec(7, bold = TRUE, color = ifelse(corr_tbl$Signif == "Significant ✓", "#1E8449", "#C0392B"))

## Point-Biserial Correlation (covariate vs DV):

Pearson Correlation: Covariate (Test Prep) vs DV
	DV	r	t_stat	CI_lower	CI_upper	Signif
cor	math score	0.1777	5.7046	0.1170	0.2371	Significant ✓
cor1	reading score	0.2418	7.8717	0.1825	0.2993	Significant ✓
cor2	writing score	0.3129	10.4092	0.2559	0.3678	Significant ✓

6.4 Boxplot & Scatter DV per Covariate Level

df$test_prep_label <- factor(df[[cov_col]], levels = c("none","completed"), 
labels = c("None", "Completed"))

bp_plots <- lapply(dv_cols, function(col) { 
  df %>% 
    ggplot(aes(x = test_prep_label, y = .data[[col]], fill = test_prep_label)) + 
    geom_boxplot(alpha = 0.75, outlier.shape = 21) + 
    geom_jitter(width = 0.15, alpha = 0.2, size = 0.7) + 
    scale_fill_manual(values = c("#BDC3C7","#27AE60")) + 
    stat_summary(fun = mean, geom = "point", shape = 18, 
                 size = 4, color = "#E74C3C") + 
    theme_minimal(base_size = 11) + 
    theme(legend.position = "none") + 
    labs(title = col, x = "Test Preparation", y = "Score")
})

grid.arrange(grobs = bp_plots, ncol = 3, 
top = "DV Score Distribution based on Test Preparation Status")

scatter_plots <- lapply(dv_cols, function(col) {
  ggplot(df, aes(x = test_prep_num, y = .data[[col]], color = gender)) +
    geom_jitter(width = 0.05, alpha = 0.4, size = 1.5) +
    geom_smooth(method = "lm", se = TRUE, linewidth = 1.2) +
    geom_smooth(aes(group = 1), method = "lm", color = "#2C3E50",
                se = FALSE, linewidth = 1.5, linetype = "dashed") +
    scale_color_manual(values = c("#E74C3C","#3498DB")) +
    scale_x_continuous(breaks = c(0,1),
                       labels = c("None (0)","Completed (1)")) +
    theme_minimal(base_size = 11) +
    theme(legend.position = "bottom") +
    labs(title = sprintf("Linearity: Test Prep → %s", col), 
         subtitle = sprintf("r = %.4f (black line = whole)", 
                            cor(df[[col]], df$test_prep_num)), 
         x = "Test Preparation (0=None, 1=Completed)", 
         y = "Score", color = "Gender")
})

grid.arrange(grobs = scatter_plots, ncol = 3, top = "Scatter Plot of Covariate Linearity–DV each Gender")

6.5 Independent t-Test: Difference in Mean DV across Covariate Levels

cat("INDEPENDENT SAMPLES t-TEST: Kovariat vs DV\n")
cat()

ttest_results <- lapply(dv_cols, function(col) {
  none_grp      <- df[[col]][df[[cov_col]] == "none"]
  completed_grp <- df[[col]][df[[cov_col]] == "completed"]
  tt <- t.test(completed_grp, none_grp, var.equal = FALSE)
  
  cat(sprintf(" Variable: %s\n", col))
  cat(sprintf(" Mean (None) = %.2f\n", mean(none_grp)))
  cat(sprintf(" Mean (Completed) = %.2f\n", mean(completed_grp)))
  cat(sprintf(" Difference in means = %.2f\n", mean(completed_grp) - mean(none_grp)))
  cat(sprintf(" t = %.4f, df = %.1f, p = %.2e\n",
              tt$statistic, tt$parameter, tt$p.value))
  cat(sprintf(" Conclusion: %s\n\n",
              ifelse(tt$p.value < 0.05,
                     "There is a significant difference → Covariate has an effect ✓",
                     "There is no significant difference")))
  
  data.frame(DV = col,
             Mean_None = round(mean(none_grp), 2),
             Mean_Completed = round(mean(completed_grp), 2),
             Selisih = round(mean(completed_grp) - mean(none_grp), 2),
             t_stat  = round(tt$statistic, 4),
             p_value = round(tt$p.value, 6),
             Signif  = ifelse(tt$p.value < 0.05, "Significant ✓", "No"))
})

ttest_tbl <- do.call(rbind, ttest_results)

ttest_tbl %>%
  kable(caption = "t-Test: Difference in Mean DV based on Test Preparation") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(7, bold = TRUE,
              color = ifelse(ttest_tbl$Signif == "Significant ✓", "#1E8449", "#C0392B"))

## INDEPENDENT SAMPLES t-TEST: Kovariat vs DV
##  Variable: math score
##  Mean (None) = 64.08
##  Mean (Completed) = 69.70
##  Difference in means = 5.62
##  t = 5.7870, df = 770.1, p = 1.04e-08
##  Conclusion: There is a significant difference → Covariate has an effect ✓
## 
##  Variable: reading score
##  Mean (None) = 66.53
##  Mean (Completed) = 73.89
##  Difference in means = 7.36
##  t = 8.0041, df = 775.4, p = 4.39e-15
##  Conclusion: There is a significant difference → Covariate has an effect ✓
## 
##  Variable: writing score
##  Mean (None) = 64.50
##  Mean (Completed) = 74.42
##  Difference in means = 9.91
##  t = 10.7525, df = 811.1, p = 2.66e-25
##  Conclusion: There is a significant difference → Covariate has an effect ✓

t-Test: Difference in Mean DV based on Test Preparation
	DV	Mean_None	Mean_Completed	Selisih	t_stat	Signif
t	math score	64.08	69.70	5.62	5.7870	Significant ✓
t1	reading score	66.53	73.89	7.36	8.0041	Significant ✓
t2	writing score	64.50	74.42	9.91	10.7525	Significant ✓

6.6 Homogeneity Test of Regression Slope (Interaction Test)

cat("HOMOGENITY TEST OF REGRESSION SLOPE\n")
cat("(Covariate × IV Interaction - Critical Assumptions of ANCOVA)\n")
cat()

for (col in dv_cols) {
  fml <- as.formula(paste0('`', col, '` ~ gender * test_prep_num'))
  m_int <- lm(fml, data = df)
  
  cat(sprintf("DV: %s\n", col))
  
  # Extract interaction term
  s <- summary(m_int)$coefficients
  int_row <- grep("gender.*test_prep|test_prep.*gender", rownames(s), value = TRUE)
  
  if (length(int_row) > 0) {
    cat(sprintf(" Interaction coefficient (gender:test_prep): %.4f\n", s[int_row, "Estimate"])) 
    cat(sprintf(" t = %.4f, p = %.4f\n", s[int_row,"t value"], s[int_row,"Pr(>|t|)"])) 
    cat(sprintf(" Homogeneous slope: %s\n\n", ifelse(s[int_row,"Pr(>|t|)"] > 0.05, "YES ✓", "NO ✗")))
  }
}

## HOMOGENITY TEST OF REGRESSION SLOPE
## (Covariate × IV Interaction - Critical Assumptions of ANCOVA)
## DV: math score
##  Interaction coefficient (gender:test_prep): 0.1258
##  t = 0.0647, p = 0.9484
##  Homogeneous slope: YES ✓
## 
## DV: reading score
##  Interaction coefficient (gender:test_prep): 0.0242
##  t = 0.0134, p = 0.9893
##  Homogeneous slope: YES ✓
## 
## DV: writing score
##  Interaction coefficient (gender:test_prep): 0.3323
##  t = 0.1838, p = 0.8542
##  Homogeneous slope: YES ✓

slope_plots <- lapply(dv_cols, function(col) {
  coefs <- df %>%
    group_by(gender) %>%
    summarise(slope = coef(lm(reformulate("test_prep_num", col), data = cur_data()))[2],
              .groups = "drop")
  
  ggplot(df, aes(x = test_prep_num, y = .data[[col]], color = gender)) +
    geom_jitter(width = 0.04, alpha = 0.35, size = 1.2) +
    geom_smooth(method = "lm", se = TRUE, linewidth = 1.3) +
    scale_color_manual(values = c("#E74C3C","#3498DB")) +
    scale_x_continuous(breaks = c(0,1), labels = c("None","Completed")) +
    annotate("text", x = 0.5, y = min(df[[col]]) + 2,
             label = sprintf("Slope F: %.2f | M: %.2f",
                             coefs$slope[coefs$gender=="female"],
                             coefs$slope[coefs$gender=="male"]),
             size = 3.5, color = "#2C3E50", fontface = "italic") +
    theme_minimal(base_size = 11) +
    theme(legend.position = "bottom") +
    labs(title = col, x = "Test Prep", y = "Skor")
})

grid.arrange(grobs = slope_plots, ncol = 3, top = "Covariate Regression Slope→DV per Gender Group\n(Must be Parallel for ANCOVA)")

6.7 Interpretation

Assumption 4 MEET - Linearity of Covariates with DVs

Results show:

Significant correlation: The covariate test preparation course is significantly correlated with all three DVs ($p < 0.05$ for all). Students who participated in test preparation scored consistently higher:

Math: $+5.6$ points
Reading: $+7.4$ points
Writing: $+9.9$ points

Linear relationship: The scatter plot shows a clear linear pattern — scores increase as test preparation status increases.
Homogeneity of slope: There is no significant interaction between gender and the covariate ($p > 0.05$), meaning the covariate–DV regression slope is equal across both gender groups. This is a critical assumption of ANCOVA that is met.

The covariate is proven relevant and valid for inclusion in the ANCOVA/MANCOVA model.

7 Assumption 5 — Independence of Observations

7.1 Theoretical Basis

Goal: To ensure that each observation (respondent/student) is independent of each other there is no relationship or influence between individuals in the dataset.

Formal Definition: \[\text{Cov}(\varepsilon_{ij}, \varepsilon_{i'j'}) = 0 \quad \text{for } (i,j) \neq (i',j')\]

This assumption cannot be tested formally statistically from cross-sectional data, but must be justified from the research design.

7.2 Research Design Evaluation

Review of Independence of Observations — Student Performance Dataset

Data Source: Kaggle — “Students Performance in Exams” (Royce Kimmons, BYU; simulation of a standard exam scenario)

7.2.1 Arguments in Support of Independence

1. Cross-Sectional Design: Each row represents a unique student taking the exam at the same time. There are no repeated measures on the same individual.

2. No Known Clustering: The data does not include school, grade, or teacher information there is no identifiable hierarchical structure that could introduce intra-class autocorrelation.

3. Independent Sampling: The dataset is designed as an independent survey/observation of a population of students with diverse demographic backgrounds (race/ethnicity, parental level of education, lunch, gender).

4. No Repeated Measures: Each student appears only once in the dataset — there is no longitudinal or before-after data to create dependencies.

7.2.2 Considerations & Limitations

Students in the same grade could potentially share the same teacher/instructor, creating a clustering effect. However, the grade variable is not available in this dataset.
Socio-economic effects: The variables lunch (a proxy for poverty) and parental education may reflect geographic/community clustering, but are considered independent covariates.
Synthetic data: This dataset is a simulation, so independence between observations was explicitly guaranteed in the data generation process.

cat("Verify No Duplicate Observations:\n")
cat(sprintf(" Total rows: %d\n", nrow(df)))
cat(sprintf(" Unique rows: %d\n",
            nrow(unique(df[, c("gender","race/ethnicity","parental level of education",
                               "lunch","test preparation course",
                               "math score","reading score","writing score")]))))
cat(sprintf(" Duplicates present: %s\n",
            ifelse(nrow(df) == nrow(unique(df[, dv_cols])), "No ✓", "Yes - investigation required")))

cat(" Distribution per categorical variable:\n")
for (cat_var in c("gender", "race/ethnicity", "parental level of education", "lunch", "test preparation course")) { 
  cat(sprintf("\n %s:\n", cat_var)) 
  tb <- table(df[[cat_var]]) 
  for (nm in names(tb)) cat(sprintf(" %-30s: %d (%.1f%%)\n", nm, tb[nm], tb[nm]/sum(tb)*100))
}

## Verify No Duplicate Observations:
##  Total rows: 1000
##  Unique rows: 1000
##  Duplicates present: Yes - investigation required
##  Distribution per categorical variable:
## 
##  gender:
##  female                        : 518 (51.8%)
##  male                          : 482 (48.2%)
## 
##  race/ethnicity:
##  group A                       : 89 (8.9%)
##  group B                       : 190 (19.0%)
##  group C                       : 319 (31.9%)
##  group D                       : 262 (26.2%)
##  group E                       : 140 (14.0%)
## 
##  parental level of education:
##  associate's degree            : 222 (22.2%)
##  bachelor's degree             : 118 (11.8%)
##  high school                   : 196 (19.6%)
##  master's degree               : 59 (5.9%)
##  some college                  : 226 (22.6%)
##  some high school              : 179 (17.9%)
## 
##  lunch:
##  free/reduced                  : 355 (35.5%)
##  standard                      : 645 (64.5%)
## 
##  test preparation course:
##  completed                     : 358 (35.8%)
##  none                          : 642 (64.2%)

7.3 Interpretation

Assumption 5 MEET - Independent Observations

Based on the research design review:

The dataset is cross-sectional with 1000 unique students
Each student is measured once without repetition
No clustering structure is identified in the data
The distribution of categorical variables shows variability that reflects general population sampling

The independence assumption is acceptable based on the available data structure. For studies with actual data from specific schools, it is recommended to consider a mixed-effects model to accommodate potential clustering between classes/schools.

8 Ringkasan Akhir Semua Asumsi

summary_df <- data.frame(
  No        = 1:5,
  Asumsi    = c(
    "Dependensi DV (Bartlett)",
    "Homogenitas Kovarians (Box's M)",
    "Normalitas Multivariat (Mardia/HZ)",
    "Linearitas Kovariat–DV",
    "Independensi Observasi"
  ),
  Metode    = c(
    "Bartlett Sphericity Test",
    "Box's M Test (biotools)",
    "Mardia + Henze-Zirkler (MVN)",
    "Korelasi Pearson + t-test + Slope Test",
    "Tinjauan Desain Penelitian"
  ),
  Statistik_Kunci = c(
    sprintf("χ²=%.2f, df=%d", chi2_stat, df_bart),
    sprintf("C=%.2f, df=%.0f", C_stat, df_box),
    "Mardia Skew p<0.001; HZ p<0.001",
    sprintf("r=%.3f~%.3f, semua p<0.05",
            min(as.numeric(corr_tbl$r)), max(as.numeric(corr_tbl$r))),
    "Cross-sectional, n=1000 siswa unik"
  ),
  p_value   = c(
    formatC(p_value, format="e", digits=3),
    round(p_boxm, 4),
    "< 0.05 (formal)",
    "< 0.001",
    "N/A"
  ),
  Status    = c(
    "✅ TERPENUHI",
    "✅ TERPENUHI",
    "⚠️ ROBUST (n besar)",
    "✅ TERPENUHI",
    "✅ TERPENUHI"
  ),
  Implikasi = c(
    "MANOVA layak digunakan",
    "Pooled covariance valid",
    "Lanjutkan; gunakan Pillai's Trace",
    "Kovariat valid untuk ANCOVA/MANCOVA",
    "Inferensi statistik valid"
  )
)

summary_df %>%
  kable(caption = "Tabel Ringkasan Pengujian Asumsi — MANOVA / ANCOVA / MANCOVA") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = TRUE, font_size = 12) %>%
  column_spec(1, bold = TRUE, width = "2em") %>%
  column_spec(2, bold = TRUE, width = "14em") %>%
  column_spec(6, bold = TRUE,
              color = c("#1E8449","#1E8449","#E67E22","#1E8449","#1E8449")) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#2C3E50")

Tabel Ringkasan Pengujian Asumsi — MANOVA / ANCOVA / MANCOVA
No	Asumsi	Metode	Statistik_Kunci	p_value	Status	Implikasi
1	Dependensi DV (Bartlett)	Bartlett Sphericity Test	χ²=3532.78, df=3	0.000e+00	✅ TERPENUHI \|	ANOVA layak digunakan \|
2	Homogenitas Kovarians (Box’s M)	Box’s M Test (biotools)	C=9.25, df=6	0.1597	✅ TERPENUHI \|	ooled covariance valid \|
3	Normalitas Multivariat (Mardia/HZ)	Mardia + Henze-Zirkler (MVN)	Mardia Skew p<0.001; HZ p<0.001	< 0.05 (formal)	⚠️ ROBUST (n besar)	\|Lanjutkan; gunakan Pillai’s Trace
4	Linearitas Kovariat–DV	Korelasi Pearson + t-test + Slope Test	r=0.178~0.313, semua p<0.05	< 0.001	✅ TERPENUHI \|	ovariat valid untuk ANCOVA/MANCOVA \|
5	Independensi Observasi	Tinjauan Desain Penelitian	Cross-sectional, n=1000 siswa unik	N/A	✅ TERPENUHI \|	nferensi statistik valid \|

8.1 Final Conclusions

Conclusions of Assumption Testing

Of the five assumptions tested on the Students Performance in Exams dataset, four assumptions were fully met and one assumption (multivariate normality) was practically met given the large sample size ($n = 1000).

Feasibility of Further Analysis:

Analysis	Feasibility	Recommendations
One-Way MANOVA	Feasible	Gender as IV, Math/Reading/Writing as DV
ANCOVA	Feasible	Gender as IV, Test Prep as covariate, one DV
MANCOVA	Feasible	Gender as IV, Test Prep as covariate, all three DVs simultaneously

Recommended test statistic: Pillai’s Trace (more robust to violations of multivariate normality than Wilks’ Lambda).

Substantive Interpretation: - Academic score variables (math, reading, writing) are highly correlated → strong justification for using MANOVA - Test preparation covariates have been shown to have a linear and significant effect on all three DVs → MANCOVA will provide a more accurate estimate of the gender effect - Covariate regression slopes are homogeneous across groups → critical assumptions of ANCOVA are met

9 References

9.1 Data Source

Dataset: Students Performance in Exams
Source: Kaggle — sp scientist
License: CC0: Public Domain

9.2 Pustaka R yang Digunakan

MVN: Korkmaz, S., Goksuluk, D. & Zararsiz, G. (2014). MVN: An R package for Multivariate Normality Tests
biotools: Da Silva, A.R. (2017). A discriminant function using biotools::boxM
car: Fox, J. & Weisberg, S. (2019). An R Companion to Applied Regression
ggplot2, dplyr, tidyr: Tidyverse packages

9.3 Referensi Statistik

Box, G.E.P. (1949). A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317–346.
Bartlett, M.S. (1950). Tests of significance in factor analysis. British Journal of Psychology, 3, 77–85.
Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Henze, N. & Zirkler, B. (1990). A class of invariant consistent tests for multivariate normality. Communications in Statistics, 19, 3595–3618.
Hair, J.F. et al. (2019). Multivariate Data Analysis (8th ed.). Cengage Learning.

Laporan Pengujian Asumsi — Analisis Multivariat

Tanggal: 15 April 2026

Prodi S1 Sains Data | FMIPA UNESA | 2026

Dokumen ini dibuat menggunakan R Markdown dengan paket MVN, biotools, car, dan tidyverse

Student Performance: Assumption Testing for MANOVA, ANCOVA & MANCOVA

Multivariate Analysis - S1 Data Science UNESA

2026-04-15