Install Packages

# install.packages("datarium")
# install.packages("MVN")
# install.packages("heplots")
# install.packages("car")
# install.packages("biotools")
# install.packages("psych")
# install.packages("emmeans")
# install.packages("knitr")

Data Preparation and Loading

library(datarium)
## Warning: package 'datarium' was built under R version 4.5.3
library(MVN)
## Warning: package 'MVN' was built under R version 4.5.3
## Registered S3 method overwritten by 'lme4':
##   method           from
##   na.action.merMod car
library(heplots)
## Warning: package 'heplots' was built under R version 4.5.3
## Loading required package: broom
library(car)
## Warning: package 'car' was built under R version 4.5.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.5.3
library(biotools)
## Warning: package 'biotools' was built under R version 4.5.3
## Loading required package: MASS
## ---
## biotools version 4.3
## 
## Attaching package: 'biotools'
## The following object is masked from 'package:heplots':
## 
##     boxM
library(psych)
## Warning: package 'psych' was built under R version 4.5.3
## 
## Attaching package: 'psych'
## The following object is masked from 'package:car':
## 
##     logit
## The following object is masked from 'package:MVN':
## 
##     mardia
library(emmeans)
## Warning: package 'emmeans' was built under R version 4.5.3
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
library(knitr)
## Warning: package 'knitr' was built under R version 4.5.3
data("anxiety", package = "datarium")
str(anxiety)
## tibble [45 × 5] (S3: tbl_df/tbl/data.frame)
##  $ id   : Factor w/ 45 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ group: Factor w/ 3 levels "grp1","grp2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ t1   : num [1:45] 14.1 14.5 15.7 16 16.5 16.9 17 17 17.3 17.3 ...
##  $ t2   : num [1:45] 14.4 14.6 15.2 15.5 15.8 16.5 16.8 17.1 16.9 17.1 ...
##  $ t3   : num [1:45] 14.1 14.3 14.9 15.3 15.7 16.2 16.5 16.6 16.5 16.7 ...
head(anxiety)
## # A tibble: 6 × 5
##   id    group    t1    t2    t3
##   <fct> <fct> <dbl> <dbl> <dbl>
## 1 1     grp1   14.1  14.4  14.1
## 2 2     grp1   14.5  14.6  14.3
## 3 3     grp1   15.7  15.2  14.9
## 4 4     grp1   16    15.5  15.3
## 5 5     grp1   16.5  15.8  15.7
## 6 6     grp1   16.9  16.5  16.2

A. Data Characteristics

Bagian ini menyajikan ringkasan statistika deskriptif dan visualisasi data sebelum analisis utama dilakukan.

# Descriptive statistics per group and variable
desc_t1 <- tapply(anxiety$t1, anxiety$group, function(x)
  c(n = length(x), mean = round(mean(x), 3), sd = round(sd(x), 3),
    min = min(x), max = max(x)))

desc_t2 <- tapply(anxiety$t2, anxiety$group, function(x)
  c(n = length(x), mean = round(mean(x), 3), sd = round(sd(x), 3),
    min = min(x), max = max(x)))

desc_t3 <- tapply(anxiety$t3, anxiety$group, function(x)
  c(n = length(x), mean = round(mean(x), 3), sd = round(sd(x), 3),
    min = min(x), max = max(x)))

desc_table <- data.frame(
  Variable = c("t1 (Covariate)", "", "",
               "t2 (DV1)", "", "",
               "t3 (DV2)", "", ""),
  Group    = rep(c("grp1", "grp2", "grp3"), 3),
  N        = c(sapply(desc_t1, `[[`, "n"),
               sapply(desc_t2, `[[`, "n"),
               sapply(desc_t3, `[[`, "n")),
  Mean     = c(sapply(desc_t1, `[[`, "mean"),
               sapply(desc_t2, `[[`, "mean"),
               sapply(desc_t3, `[[`, "mean")),
  SD       = c(sapply(desc_t1, `[[`, "sd"),
               sapply(desc_t2, `[[`, "sd"),
               sapply(desc_t3, `[[`, "sd")),
  Min      = c(sapply(desc_t1, `[[`, "min"),
               sapply(desc_t2, `[[`, "min"),
               sapply(desc_t3, `[[`, "min")),
  Max      = c(sapply(desc_t1, `[[`, "max"),
               sapply(desc_t2, `[[`, "max"),
               sapply(desc_t3, `[[`, "max"))
)

kable(desc_table)
Variable Group N Mean SD Min Max
t1 (Covariate) grp1 15 17.087 1.629 14.1 19.8
grp2 15 16.647 1.566 13.7 19.3
grp3 15 17.013 1.321 14.6 19.0
t2 (DV1) grp1 15 16.927 1.696 14.4 20.0
grp2 15 16.467 1.695 13.4 18.9
grp3 15 15.013 1.389 12.7 17.2
t3 (DV2) grp1 15 16.507 1.556 14.1 19.4
grp2 15 15.527 1.703 12.7 17.7
grp3 15 13.560 1.423 11.0 15.5
# Boxplots for all 3 variables by group
par(mfrow = c(1, 3))
boxplot(t1 ~ group, data = anxiety, main = "t1 (Covariate) by Group",
        xlab = "Group", ylab = "Score", col = c("lightblue","lightgreen","lightyellow"))
boxplot(t2 ~ group, data = anxiety, main = "t2 (DV1) by Group",
        xlab = "Group", ylab = "Score", col = c("lightblue","lightgreen","lightyellow"))
boxplot(t3 ~ group, data = anxiety, main = "t3 (DV2) by Group",
        xlab = "Group", ylab = "Score", col = c("lightblue","lightgreen","lightyellow"))

par(mfrow = c(1, 1))

Interpretation: The anxiety dataset consists of 45 participants divided equally into three groups (grp1, grp2, grp3), with 15 participants each. The variables are: t1 (pretest anxiety score, used as covariate), t2 and t3 (post-test scores at two time points, used as dependent variables), and group (independent variable/factor). Scores range approximately from 11 to 20 across all time points, with no missing values detected from the summary output.

Based onthe table, the covariate t1 shows relatively similar means across groups (grp1: 17.087, grp2: 16.647, grp3: 17.013), indicating that the groups were comparable at baseline. In contrast, the DVs show a clear downward pattern from grp1 to grp3: mean t2 decreases from 16.927 (grp1) to 16.467 (grp2) to 15.013 (grp3), and mean t3 drops more sharply from 16.507 (grp1) to 15.527 (grp2) to 13.560 (grp3). The boxplots confirm this trend, with grp3 displaying notably lower post-test anxiety scores on both DVs relative to grp1 and grp2, while the covariate t1 distributions overlap substantially across groups.

B. Assumptions

B.1 Assumption 1: Dependence Between DVs

# Assumption 1: DVs should be significantly correlated (dependent)
dv_matrix <- anxiety[, c("t2", "t3")]

cor(dv_matrix)
##           t2        t3
## t2 1.0000000 0.9597594
## t3 0.9597594 1.0000000
cortest.bartlett(cor(dv_matrix), n = nrow(dv_matrix))
## $chisq
## [1] 107.9524
## 
## $p.value
## [1] 2.753051e-25
## 
## $df
## [1] 1

Result: The Pearson correlation r = 0.9598, and Bartlett’s test yields Chi-square = 107.95, df = 1, p = 2.75e-25. The p-value is essentially zero, providing extremely strong evidence that t2 and t3 are not independent, the identity matrix null hypothesis is firmly rejected. This assumption is met. The DVs are sufficiently correlated to justify analyzing them jointly using MANOVA rather than running separate univariate ANOVAs. This assumption is met (Assumption 1: Satisfied).

B.2 Assumption 2: Homogeneity of Covariance

# Assumption 2: Homogeneity of covariance matrices across groups
boxM(data = anxiety[, c("t2", "t3")],
     grouping = anxiety$group)
## 
##  Box's M-test for Homogeneity of Covariance Matrices
## 
## data:  anxiety[, c("t2", "t3")]
## Chi-Sq (approx.) = 2.8435, df = 6, p-value = 0.8282

Result: The output shows: Box’s M Chi-Sq (approx.) = 2.8435, df = 6, p-value = 0.8282. With p = 0.828 far exceeding even the lenient 0.05 threshold, there is no evidence of covariance heterogeneity across the three groups. This assumption is met. The homogeneity of covariance structure is well preserved across groups, which supports the validity of MANOVA and MANCOVA analyses. This assumption is met (Assumption 2: Satisfied).

B.3 Assumption 3: Multivariate Normality

# Assumption 3: Multivariate Normality
dv_data <- anxiety[, c("t2", "t3")]

# Mardia's Test (multivariate skewness and kurtosis)
mardia(dv_data)

## Call: mardia(x = dv_data)
## 
## Mardia tests of multivariate skew and kurtosis
## Use describe(x) the to get univariate tests
## n.obs = 45   num.vars =  2 
## b1p =  0.21   skew =  1.58  with probability  <=  0.81
##  small sample skew =  1.76  with probability <=  0.78
## b2p =  6.28   kurtosis =  -1.44  with probability <=  0.15
# Shapiro-Wilk per DV (univariate normality check)
shapiro.test(anxiety$t2)
## 
##  Shapiro-Wilk normality test
## 
## data:  anxiety$t2
## W = 0.98313, p-value = 0.7471
shapiro.test(anxiety$t3)
## 
##  Shapiro-Wilk normality test
## 
## data:  anxiety$t3
## W = 0.98244, p-value = 0.7193
# Shapiro-Wilk per group per DV
for (grp in levels(anxiety$group)) {
  cat("\nGroup:", grp, "\n")
  subset_data <- anxiety[anxiety$group == grp, c("t2", "t3")]
  cat("Shapiro-Wilk t2: p =", shapiro.test(subset_data$t2)$p.value, "\n")
  cat("Shapiro-Wilk t3: p =", shapiro.test(subset_data$t3)$p.value, "\n")
}
## 
## Group: grp1 
## Shapiro-Wilk t2: p = 0.6241596 
## Shapiro-Wilk t3: p = 0.5062152 
## 
## Group: grp2 
## Shapiro-Wilk t2: p = 0.3283178 
## Shapiro-Wilk t3: p = 0.1311525 
## 
## Group: grp3 
## Shapiro-Wilk t2: p = 0.5583945 
## Shapiro-Wilk t3: p = 0.2324214
# Chi-Square Q-Q Plot of Mahalanobis Distances
dv_means <- colMeans(dv_data)
dv_cov   <- cov(dv_data)
mah      <- mahalanobis(dv_data, dv_means, dv_cov)
n        <- nrow(dv_data)
p        <- ncol(dv_data)

qqplot(qchisq(ppoints(n), df = p), sort(mah),
       main = "Chi-Square Q-Q Plot - Multivariate Normality",
       xlab = "Theoretical Chi-Square Quantiles (df = 2)",
       ylab = "Observed Mahalanobis Distances",
       pch  = 19, col = "steelblue")
abline(0, 1, col = "red", lwd = 2)

Result: Mardia’s test assesses whether the joint distribution of t2 and t3 satisfies multivariate normality by examining both skewness and kurtosis. The skewness test yields p = 0.81 and kurtosis yields p = 0.15, both are well above 0.05, meaning we fail to reject H0 for both. The data shows no significant multivariate skewness or kurtosis. Additionally, all Shapiro-Wilk tests at the univariate level (both overall and per group) yield p-values well above 0.05, ranging from 0.131 to 0.747, confirming that each DV is normally distributed individually within every group. The Chi-Square Q-Q plot of Mahalanobis distances should show points closely following the red diagonal line, which is consistent with these results. This assumption is met (Assumption 3: Satisfied).

B.4 Assumption 4: Linearity of Covariate-DV

# Assumption 4: Linear relationship between covariate (t1) and each DV
par(mfrow = c(1, 2))

plot(anxiety$t1, anxiety$t2,
     col = anxiety$group,
     pch = 19,
     main = "Covariate (t1) vs DV1 (t2)",
     xlab = "t1 (Covariate)", ylab = "t2")
abline(lm(t2 ~ t1, data = anxiety), col = "black", lwd = 2)
legend("topright", legend = levels(anxiety$group),
       col = 1:nlevels(anxiety$group), pch = 19, cex = 0.8)

plot(anxiety$t1, anxiety$t3,
     col = anxiety$group,
     pch = 19,
     main = "Covariate (t1) vs DV2 (t3)",
     xlab = "t1 (Covariate)", ylab = "t3")
abline(lm(t3 ~ t1, data = anxiety), col = "black", lwd = 2)

par(mfrow = c(1, 1))

# Statistical linearity: Pearson correlation + significance
cor.test(anxiety$t1, anxiety$t2, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  anxiety$t1 and anxiety$t2
## t = 10.385, df = 43, p-value = 2.705e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7342640 0.9125856
## sample estimates:
##       cor 
## 0.8455544
cor.test(anxiety$t1, anxiety$t3, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  anxiety$t1 and anxiety$t3
## t = 7.0252, df = 43, p-value = 1.183e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5570174 0.8435502
## sample estimates:
##      cor 
## 0.731026
# Formal test: check for non-linearity via polynomial term
summary(lm(t2 ~ t1 + I(t1^2), data = anxiety))
## 
## Call:
## lm(formula = t2 ~ t1 + I(t1^2), data = anxiety)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9647 -0.9672  0.3707  0.7306  1.2626 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.06650   15.48472   1.360    0.181
## t1          -1.63312    1.85596  -0.880    0.384
## I(t1^2)      0.07872    0.05534   1.423    0.162
## 
## Residual standard error: 0.9433 on 42 degrees of freedom
## Multiple R-squared:  0.7281, Adjusted R-squared:  0.7151 
## F-statistic: 56.22 on 2 and 42 DF,  p-value: 1.33e-12
summary(lm(t3 ~ t1 + I(t1^2), data = anxiety))
## 
## Call:
## lm(formula = t3 ~ t1 + I(t1^2), data = anxiety)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7886 -1.2667  0.3356  1.1589  1.7743 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.00288   22.38518   0.760    0.452
## t1          -1.21810    2.68303  -0.454    0.652
## I(t1^2)      0.06521    0.07999   0.815    0.420
## 
## Residual standard error: 1.364 on 42 degrees of freedom
## Multiple R-squared:  0.5417, Adjusted R-squared:  0.5198 
## F-statistic: 24.82 on 2 and 42 DF,  p-value: 7.676e-08

Result: ANCOVA and MANCOVA require that the covariate (t1) has a linear relationship with each DV. The Pearson correlation between t1 and t2 is r = 0.846 (p < 0.001), and between t1 and t3 is r = 0.731 (p < 0.001). Both are strong, positive, and highly significant, confirming that meaningful linear relationships exist. To rule out curvilinearity, polynomial regression was used, if the quadratic term (t1²) were significant, it would indicate that a curved rather than straight-line relationship exists. For both t2 and t3, the quadratic term is non-significant (p = 0.162 and p = 0.420 respectively), meaning there is no evidence of curvature. The scatterplots with fitted regression lines visually support this. This assumption is met. The covariate t1 has a strong and genuinely linear relationship with both DVs (Assumption 4: Satisfied).

B.5 Assumption 5: Independence of Observations

# Assumption 5: Independence of observations
# No formal statistical test - justified through study design

table(anxiety$group)
## 
## grp1 grp2 grp3 
##   15   15   15
nrow(anxiety)
## [1] 45
cat("Total participants:", nrow(anxiety), "\n")
## Total participants: 45
cat("Group distribution:\n")
## Group distribution:
print(table(anxiety$group))
## 
## grp1 grp2 grp3 
##   15   15   15

Result: Independence of observations cannot be tested statistically, it is evaluated based on how the data was collected. In the anxiety dataset, each of the 45 rows represents a unique individual participant. Participants were assigned to one of three mutually exclusive groups (grp1, grp2, grp3), with no participant appearing in more than one group. There is no clustering structure (e.g., classrooms, families, or therapists) documented in the dataset that would create dependency between observations. Therefore, this assumption is met by design. Each observation is independent of all others. This assumption is met. Each observation is independent of all others (Assumption 5: Satisfied).

B.6 Assumptions Conclusion

Assumption Test Used Result Status
1. Dependence between DVs Pearson correlation + Bartlett’s sphericity r = 0.960, strongly correlated Satisfied
2. Homogeneity of covariance Box’s M Test Chi-Sq = 2.84, p = 0.828 Satisfied
3. Multivariate normality Mardia’s Test + Shapiro-Wilk All p-value > 0.05 across all groups Satisfied
4. Linearity of covariate-DV Pearson correlation + polynomial regression r > 0.73 significant; no curvature detected Satisfied
5. Independence of observations Study design inspection 45 unique participants, mutually exclusive groups Satisfied

All 5 assumptions are fully satisfied. The assumption violations are not merely marginal passes but clean results across the board. The homogeneity of covariance (p = 0.828) is well within bounds, multivariate normality is convincingly normal (skewness p = 0.81, kurtosis p = 0.15), and the covariate shows a strong linear relationship with both DVs. The dataset is satisfied condition to proceed with MANOVA, ANCOVA, and MANCOVA analyses without the need for any corrective adjustments or robust alternatives.

C. MANOVA

The anxiety dataset has two post-test scores (t2 and t3) as dependent variables (DVs) and group as the categorical independent variable with three levels (grp1, grp2, grp3). MANOVA simultaneously tests whether the group centroids - vectors of means across both DVs - differ significantly in the multivariate space defined by t2 and t3 jointly. Analyzing both DVs together exploits their high intercorrelation (r = 0.960) to increase sensitivity and controls the Type I error inflation that would result from running two separate ANOVAs. Since there is only one categorical factor (group), this is a One-Way MANOVA.

C.1 Verify Assumptions (Already Done in Section B)

MANOVA requires that the DVs be correlated, that covariance matrices are homogeneous across groups, and that the data follow multivariate normality. All three were verified in Section B:

Assumption PPT Name Test Result
1 Dependence between DVs Pearson + Bartlett’s sphericity r = 0.960, strongly correlated - Satisfied
2 Homogeneity of covariance matrices Box’s M Test p = 0.828 - Satisfied
3 Multivariate normality Mardia’s Test + Shapiro-Wilk All p > 0.05 - Satisfied
5 Independence of observations Study design inspection 45 unique participants, mutually exclusive groups - Satisfied

All assumptions required for One-Way MANOVA are fully satisfied. Assumption 4 (linearity of covariate-DV) is specific to ANCOVA and MANCOVA and does not apply here. The data is ready to proceed without corrective adjustments.

C.2 Compute SSCP Components

The next step is to compute the Sums of Squares and Cross-Products (SSCP) for both DVs (t2 and t3), decomposed into within-groups and between-groups components. Unlike univariate ANOVA where we compute only SS(Y) for a single DV, MANOVA requires SS for each DV and the Sum of Products (SP) that captures how the two DVs co-vary together. These six values, SS_wg(t2), SS_wg(t3), SP_wg, SS_bg(t2), SS_bg(t3), and SP_bg, are the raw building blocks for constructing the SSCP matrices in the next step.

y1  <- anxiety$t2
y2  <- anxiety$t3
grp <- anxiety$group

k <- nlevels(grp)
N <- nrow(anxiety)
n <- N / k

cat("k (number of groups):", k, "\n")
## k (number of groups): 3
cat("N (total sample size):", N, "\n")
## N (total sample size): 45
cat("n (per group):", n, "\n\n")
## n (per group): 15
sum_y1   <- tapply(y1, grp, sum)
sum_y2   <- tapply(y2, grp, sum)
sum_y1sq <- tapply(y1^2, grp, sum)
sum_y2sq <- tapply(y2^2, grp, sum)
sum_y1y2 <- tapply(y1 * y2, grp, sum)

total_y1   <- sum(y1)
total_y2   <- sum(y2)
total_y1sq <- sum(y1^2)
total_y2sq <- sum(y2^2)
total_y1y2 <- sum(y1 * y2)

SS_wg_y1 <- total_y1sq - sum(sum_y1^2 / n)
SS_wg_y2 <- total_y2sq - sum(sum_y2^2 / n)
SP_wg_mv <- total_y1y2 - sum(sum_y1 * sum_y2 / n)

SS_bg_y1 <- sum(sum_y1^2 / n) - total_y1^2 / N
SS_bg_y2 <- sum(sum_y2^2 / n) - total_y2^2 / N
SP_bg_mv <- sum(sum_y1 * sum_y2 / n) - (total_y1 * total_y2) / N

cat("Within Groups (E elements)\n")
## Within Groups (E elements)
cat("SS_wg(t2):", round(SS_wg_y1, 3), "\n")
## SS_wg(t2): 107.5
cat("SS_wg(t3):", round(SS_wg_y2, 3), "\n")
## SS_wg(t3): 102.835
cat("SP_wg(t2,t3):", round(SP_wg_mv, 3), "\n\n")
## SP_wg(t2,t3): 102.129
cat("Between Groups (H elements)\n")
## Between Groups (H elements)
cat("SS_bg(t2):", round(SS_bg_y1, 3), "\n")
## SS_bg(t2): 29.923
cat("SS_bg(t3):", round(SS_bg_y2, 3), "\n")
## SS_bg(t3): 67.555
cat("SP_bg(t2,t3):", round(SP_bg_mv, 3), "\n")
## SP_bg(t2,t3): 44.735

Interpretation: The SSCP components partition the total multivariate variability into within-group and between-group parts. The SS_wg values represent unexplained variance in each DV (analogous to SSE in univariate ANOVA), while the SS_bg values represent variance attributable to group membership (analogous to SSA). The SP values capture how t2 and t3 co-vary together within and between groups, this cross-product structure is what distinguishes MANOVA from running two separate ANOVAs, because it accounts for the joint distribution of the DVs rather than treating each in isolation.

The output shows within-group SS: SS_wg(t2) = 107.500, SS_wg(t3) = 102.835, SP_wg = 102.129. Between-group SS: SS_bg(t2) = 29.923, SS_bg(t3) = 67.555, SP_bg = 44.735. The between-group SS for t3 (67.555) is notably larger than for t2 (29.923), suggesting that group separation is more pronounced on t3 than on t2.

C.3 Build SSCP Matrices (H and E)

The six computed values are assembled into two 2x2 symmetric matrices: the E matrix (Error/Within-groups SSCP) and the H matrix (Hypothesis/Between-groups SSCP). The diagonal elements of each matrix contain the SS values for each DV individually, while the off-diagonal elements contain the SP values that encode the joint covariance structure between t2 and t3.

E <- matrix(c(SS_wg_y1, SP_wg_mv, SP_wg_mv, SS_wg_y2), nrow = 2,
            dimnames = list(c("t2", "t3"), c("t2", "t3")))

H <- matrix(c(SS_bg_y1, SP_bg_mv, SP_bg_mv, SS_bg_y2), nrow = 2,
            dimnames = list(c("t2", "t3"), c("t2", "t3")))

cat("E Matrix (Within-Groups SSCP):\n")
## E Matrix (Within-Groups SSCP):
print(round(E, 3))
##         t2      t3
## t2 107.500 102.129
## t3 102.129 102.835
cat("\nH Matrix (Between-Groups SSCP):\n")
## 
## H Matrix (Between-Groups SSCP):
print(round(H, 3))
##        t2     t3
## t2 29.923 44.735
## t3 44.735 67.555
cat("\nH + E Matrix (Total SSCP):\n")
## 
## H + E Matrix (Total SSCP):
print(round(H + E, 3))
##         t2      t3
## t2 137.423 146.864
## t3 146.864 170.390

Interpretation: The E matrix is the multivariate equivalent of the within-group sum of squares, it measures how much variability in the combined DV space is unexplained by group membership. The H matrix is the multivariate equivalent of the between-group sum of squares, it measures how far the group centroids deviate from the grand centroid across both DVs simultaneously. The H + E matrix captures total variability. In the next step, Wilks’ Lambda uses the ratio of the determinants of E and (H + E) to quantify how much of the total multivariate variance is unexplained by the group effect.

The E matrix has diagonal elements 107.500 (t2) and 102.835 (t3) with off-diagonal 102.129, confirming strong within-group co-variation. The H matrix has diagonal elements 29.923 (t2) and 67.555 (t3) with off-diagonal 44.735. The total SSCP (H + E) has diagonals 137.423 and 170.390, consistent with the overall variance in the dataset.

C.4 Compute Wilks’ Lambda (L)

Wilks’ Lambda is the primary test statistic for MANOVA. It is defined as the ratio of the determinant of the E matrix to the determinant of the combined H + E matrix. The determinant of a matrix captures its “generalized variance”, the total multivariate spread of observations in the DV space. A Lambda of 1 indicates that the group centroids are identical (no group effect); a Lambda close to 0 indicates maximal separation between groups relative to within-group variability.

det_E  <- det(E)
det_HE <- det(H + E)
Lambda <- det_E / det_HE

cat("det(E):", round(det_E, 4), "\n")
## det(E): 624.4621
cat("det(H + E):", round(det_HE, 4), "\n")
## det(H + E): 1846.589
cat("\nWilks' Lambda = det(E) / det(H + E)\n")
## 
## Wilks' Lambda = det(E) / det(H + E)
cat("             =", round(det_E, 4), "/", round(det_HE, 4), "\n")
##              = 624.4621 / 1846.589
cat("             =", round(Lambda, 6), "\n")
##              = 0.338171

Interpretation: Wilks’ Lambda (L) expresses the fraction of total multivariate variance that is unexplained by group membership. A small Lambda (close to 0) indicates that most of the multivariate variance is explained by between-group differences, suggesting a strong group effect. A Lambda near 1 would indicate no meaningful group separation in the combined DV space. Because Wilks’ Lambda does not itself follow an F distribution, the next two steps convert it to an approximate F statistic using Rao’s approximation in order to carry out the significance test.

The output yields det(E) = 624.4621, det(H + E) = 1846.589, and Wilks’ Lambda = 0.338171. This value of 0.338 indicates that only about 33.8% of total multivariate variance is unexplained by group membership, meaning the group effect accounts for the majority of variance in the combined DV space, which is a substantial separation between groups.

C.5 Compute Degrees of Freedom

p_mv <- 2
g    <- k

df_H <- g - 1
df_E <- N - g

cat("p (number of DVs):", p_mv, "\n")
## p (number of DVs): 2
cat("g (number of groups):", g, "\n")
## g (number of groups): 3
cat("df_H (between groups) = g - 1 =", df_H, "\n")
## df_H (between groups) = g - 1 = 2
cat("df_E (within groups) = N - g =", df_E, "\n")
## df_E (within groups) = N - g = 42

Interpretation: The between-groups (hypothesis) degrees of freedom df_H = k - 1 = 2, corresponding to the number of groups minus one. The within-groups (error) degrees of freedom df_E = N - k = 42, representing the residual degrees of freedom after estimating the three group centroids. Unlike ANCOVA, MANOVA does not consume an additional degree of freedom for a covariate. These values feed directly into Rao’s F approximation formula in the next step.

The output confirms: p = 2 DVs, g = 3 groups, df_H = 2, df_E = 42. These are as expected from the design (45 participants, 3 groups, no covariate in MANOVA).

C.6 Compute F Approximation (Rao’s Approximation)

Because Wilks’ Lambda does not follow a standard F distribution directly, Rao’s approximation transforms it into an approximate F statistic with calculable degrees of freedom. The parameter w is a scaled version of the error degrees of freedom, and s is a shape parameter derived from the dimensions of the problem (number of DVs p_mv and hypothesis df df_H). The resulting F test has numerator df = p x df_H and an approximate denominator df computed from these parameters.

w <- N - 1 - (p_mv + g) / 2
s <- sqrt((p_mv^2 * df_H^2 - 4) / (p_mv^2 + df_H^2 - 5))
df1 <- p_mv * df_H
df2 <- s * w - (p_mv * df_H - 2) / 2

F_stat_mv <- ((1 - Lambda^(1/s)) / Lambda^(1/s)) * (df2 / df1)
p_val_mv  <- pf(F_stat_mv, df1, df2, lower.tail = FALSE)
F_crit_mv <- qf(0.95, df1, df2)

cat("MANOVA TABLE (Wilks' Lambda / Rao's F Approximation)\n\n")
## MANOVA TABLE (Wilks' Lambda / Rao's F Approximation)
cat(sprintf("%-20s %6s %8s %10s %10s %8s\n",
            "Source", "df_H", "df_E", "Lambda", "F", "p-value"))
## Source                 df_H     df_E     Lambda          F  p-value
cat(rep("=", 66), "\n", sep = "")
## ==================================================================
cat(sprintf("%-20s %6d %8.1f %10.6f %10.3f %8.4f\n",
            "Between Groups", df_H, df2, Lambda, F_stat_mv, p_val_mv))
## Between Groups            2     82.0   0.338171     14.752   0.0000
cat(rep("=", 66), "\n", sep = "")
## ==================================================================
cat(sprintf("\nF critical at alpha = 0.05, df(%d, %.1f): %.3f\n", df1, df2, F_crit_mv))
## 
## F critical at alpha = 0.05, df(4, 82.0): 2.483
cat(sprintf("F observed: %.3f\n", F_stat_mv))
## F observed: 14.752
if (F_stat_mv > F_crit_mv) {
  cat("Decision: REJECT H0 (F_obs > F_crit)\n")
} else {
  cat("Decision: FAIL TO REJECT H0 (F_obs <= F_crit)\n")
}
## Decision: REJECT H0 (F_obs > F_crit)

Interpretation: The MANOVA table presents the result of the multivariate F test derived from Wilks’ Lambda via Rao’s approximation. The numerator df = p x df_H = 4 and the denominator df = 82 are determined by the dimensions of the matrices (p = 2 DVs, g = 3 groups, N = 45). We compare F_obs against the critical F value at a = 0.05. If F_obs > F_crit, we reject H0 and conclude that at least one pair of groups differs significantly in their combined DV centroid - that is, the multivariate profile of anxiety post-test scores (t2 and t3 together) differs across groups.

The output shows F_obs = 14.752, F_crit (df = 4, 82) = 2.483, p = 0.0000. Since F_obs = 14.752 far exceeds F_crit = 2.483, H0 is rejected. The group effect on the combined DV space is highly significant.

C.7 Verify with R’s Built-in MANOVA

manova_model <- manova(cbind(t2, t3) ~ group, data = anxiety)
summary(manova_model, test = "Wilks")
##           Df   Wilks approx F num Df den Df   Pr(>F)    
## group      2 0.33817   14.752      4     82 4.04e-09 ***
## Residuals 42                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(manova_model, test = "Pillai")
##           Df  Pillai approx F num Df den Df    Pr(>F)    
## group      2 0.67279   10.646      4     84 5.011e-07 ***
## Residuals 42                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(manova_model, test = "Hotelling-Lawley")
##           Df Hotelling-Lawley approx F num Df den Df    Pr(>F)    
## group      2           1.9247   19.247      4     80 4.012e-11 ***
## Residuals 42                                                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(manova_model, test = "Roy")
##           Df    Roy approx F num Df den Df    Pr(>F)    
## group      2 1.9077   40.061      2     42 1.843e-10 ***
## Residuals 42                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation: R’s manova() function produces all four multivariate test statistics for completeness. Wilks’ Lambda matches our manual computation above and is the most widely reported statistic in published research. Pillai’s Trace is often preferred when assumption violations are suspected, as it is the most robust of the four criteria. Hotelling-Lawley Trace and Roy’s Largest Root are less commonly reported but test the same null hypothesis. For well-behaved data like ours, where all assumptions are strongly satisfied, all 4 statistics should yield consistent and significant results, confirming that group membership has a significant multivariate effect on the combined DV space of t2 and t3.

All 4 criteria yield highly significant results: Wilks’ Lambda = 0.33817, F(4, 82) = 14.752, p = 4.04e-09; Pillai’s Trace = 0.67279, F(4, 84) = 10.646, p = 5.01e-07; Hotelling-Lawley = 1.9247, F(4, 80) = 19.247, p = 4.01e-11; Roy’s Largest Root = 1.9077, F(2, 42) = 40.061, p = 1.84e-10. The unanimous significance across all four multivariate criteria strongly confirms the group effect.

C.8 Effect Size: Multivariate Eta Squared (η²)

This tells us what proportion of the total multivariate variance in the combined DVs (t2 and t3 jointly) is attributable to the group factor.

eta_sq_mv <- 1 - Lambda^(1/s)

cat(sprintf("Multivariate Eta Squared (eta2) = 1 - Lambda^(1/s)\n"))
## Multivariate Eta Squared (eta2) = 1 - Lambda^(1/s)
cat(sprintf("                             = 1 - %.6f^(1/%.4f)\n", Lambda, s))
##                              = 1 - 0.338171^(1/2.0000)
cat(sprintf("                             = %.4f\n", eta_sq_mv))
##                              = 0.4185
cat(sprintf("                             = %.1f%%\n\n", eta_sq_mv * 100))
##                              = 41.8%
if (eta_sq_mv >= 0.14) {
  cat("Effect size: LARGE (eta2 >= 0.14)\n")
} else if (eta_sq_mv >= 0.06) {
  cat("Effect size: MEDIUM (0.06 <= eta2 < 0.14)\n")
} else {
  cat("Effect size: SMALL (eta2 < 0.06)\n")
}
## Effect size: LARGE (eta2 >= 0.14)

Interpretation: The multivariate eta squared (η²) = 1 - L^(1/s) quantifies the proportion of total multivariate variance in the combined DV space (t2 and t3 simultaneously) that is explained by group membership. It is the multivariate counterpart of partial eta squared in univariate ANOVA and ANCOVA. Following Cohen’s guidelines: η² >= 0.14 = large effect, 0.06 <= η² < 0.14 = medium effect, η² < 0.06 = small effect. A large η² here would indicate that group membership accounts for a substantial share of the total multivariate variability across both post-test anxiety scores, meaning the three groups are well-separated in the combined DV space.

The output yields η² = 0.4185, or 41.8% of total multivariate variance explained by group membership. By Cohen’s criteria this is a LARGE effect. This means that group differences in the combined anxiety post-test scores (t2 and t3 jointly) are not only statistically significant but also practically substantial.

C.9 Post-Hoc Tests (If H0 is Rejected)

If the overall MANOVA F-test is significant (group effect p < 0.05), we conduct follow-up analyses to identify which individual DVs drive the multivariate effect and which specific group pairs differ from each other. The first step is follow-up univariate ANOVAs on each DV separately; if significant, Bonferroni-adjusted pairwise comparisons are then used to pinpoint the differing group pairs.

cat("Follow-up Univariate ANOVAs\n\n")
## Follow-up Univariate ANOVAs
summary.aov(manova_model)
##  Response t2 :
##             Df  Sum Sq Mean Sq F value   Pr(>F)   
## group        2  29.923 14.9616  5.8454 0.005759 **
## Residuals   42 107.500  2.5595                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response t3 :
##             Df  Sum Sq Mean Sq F value    Pr(>F)    
## group        2  67.555  33.778  13.796 2.481e-05 ***
## Residuals   42 102.835   2.448                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cat("\nPairwise Comparisons for t2 (Bonferroni)\n")
## 
## Pairwise Comparisons for t2 (Bonferroni)
pairwise.t.test(anxiety$t2, anxiety$group, p.adj = "bonferroni")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  anxiety$t2 and anxiety$group 
## 
##      grp1   grp2  
## grp2 1.0000 -     
## grp3 0.0064 0.0507
## 
## P value adjustment method: bonferroni
cat("\nPairwise Comparisons for t3 (Bonferroni)\n")
## 
## Pairwise Comparisons for t3 (Bonferroni)
pairwise.t.test(anxiety$t3, anxiety$group, p.adj = "bonferroni")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  anxiety$t3 and anxiety$group 
## 
##      grp1    grp2 
## grp2 0.281   -    
## grp3 1.9e-05 0.004
## 
## P value adjustment method: bonferroni

Interpretation: The follow-up univariate ANOVAs decompose the overall multivariate significance into its per-DV components. If both t2 and t3 are individually significant, group differences manifest across the full DV space. If only one DV is significant, the multivariate effect is primarily driven by that variable. The Bonferroni-adjusted pairwise comparisons then identify which specific group pairs (grp1 vs grp2, grp1 vs grp3, grp2 vs grp3) account for the overall group effect. A p-value < 0.05 after Bonferroni correction indicates that the two groups differ significantly in their mean scores on that DV.

Both DVs are individually significant: for t2, F(2, 42) = 5.845, p = 0.006; for t3, F(2, 42) = 13.796, p = 2.48e-05. Pairwise comparisons for t2 show grp1 vs grp3 (p = 0.006) and grp2 vs grp3 (p = 0.051, marginally not significant) as the primary sources of group separation, while grp1 vs grp2 did not differ (p = 1.000). For t3, both grp1 vs grp3 (p = 1.9e-05) and grp2 vs grp3 (p = 0.004) are significant, while grp1 vs grp2 remains non-significant (p = 0.281). grp3 is thus the primary driver of the group effect across both DVs.

C.10 Conclusion

cat("MANOVA FINAL SUMMARY\n\n")
## MANOVA FINAL SUMMARY
cat(sprintf("Wilks' Lambda: %.6f\n", Lambda))
## Wilks' Lambda: 0.338171
cat(sprintf("F(%d, %.1f) = %.3f, p = %.4f  %s\n",
            df1, df2, F_stat_mv, p_val_mv,
            ifelse(p_val_mv < 0.05, "SIGNIFICANT", "not significant")))
## F(4, 82.0) = 14.752, p = 0.0000  SIGNIFICANT
cat(sprintf("Multivariate eta2: %.3f (%.1f%% of variance explained)\n",
            eta_sq_mv, eta_sq_mv * 100))
## Multivariate eta2: 0.418 (41.8% of variance explained)
cat("\nConclusion:\n")
## 
## Conclusion:
if (p_val_mv < 0.05) {
  cat("We REJECT H0. There is a statistically significant multivariate\n")
  cat("difference between at least two groups on the combined\n")
  cat("dependent variables (t2 and t3) simultaneously.\n")
  cat("Group membership has a significant effect on the combined DV space.\n")
} else {
  cat("We FAIL TO REJECT H0. There is no statistically significant\n")
  cat("multivariate difference between groups on the combined\n")
  cat("dependent variables (t2 and t3).\n")
}
## We REJECT H0. There is a statistically significant multivariate
## difference between at least two groups on the combined
## dependent variables (t2 and t3) simultaneously.
## Group membership has a significant effect on the combined DV space.

D. ANCOVA

The anxiety dataset has one categorical independent variable (group with 3 levels: grp1, grp2, grp3), one continuous covariate (t1, pretest anxiety score), and we will analyze one dependent variable at a time in ANCOVA (for ANCOVA we take t2 as the primary DV, since ANCOVA is univariate, the multivariate extension is MANCOVA).

Because there is only one factor (group), this is a One-Way ANCOVA. A Two-Way ANCOVA would require two categorical independent variables (e.g., group x gender), which is not present in this dataset.

D.1 Verify the Assumptions (Already Done in Section B)

Before running ANCOVA, ANCOVA required validation of 5 assumptions. All were verified in Section B:

Assumption PPT Name Test Result
1 Linearity of regression Pearson + polynomial Linear relationship confirmed between t1 and t2
2 Homogeneity of error variances Box’s M p = 0.828, equal variances across groups
3 Independence of error terms Design inspection 45 unique participants, mutually exclusive groups
4 Normality of error terms Shapiro-Wilk, Mardia All p > 0.05
5 Homogeneity of regression slopes Interaction test (done below) To be verified in Step 2

Assumption 5 (homogeneity of regression slopes) is the one assumption specific to ANCOVA that requires checking the interaction between the factor and covariate.

D.2 Assumption 5: Homogeneity of Regression Slopes

This assumption states that the slope of the regression line between the covariate (t1) and the DV (t2) must be the same (parallel) across all groups. “The slopes of the different regression lines should be equivalent, i.e., regression lines should be parallel among groups.”

We test this by fitting a model that includes the interaction term group x t1. If the interaction is significant, the slopes differ across groups and the assumption is violated.

# Test for homogeneity of regression slopes
# If group:t1 interaction is significant - slopes differ - assumption violated
model_interaction <- lm(t2 ~ t1 * group, data = anxiety)
Anova(model_interaction, type = "III")
## Anova Table (Type III tests)
## 
## Response: t2
##             Sum Sq Df  F value Pr(>F)    
## (Intercept)  0.041  1   0.2912 0.5925    
## t1          38.955  1 273.7682 <2e-16 ***
## group        0.117  2   0.4128 0.6647    
## t1:group     0.039  2   0.1370 0.8724    
## Residuals    5.549 39                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation: From the Type III ANOVA table, the interaction term t1:group yields F(2, 39) = 0.137, p = 0.872. Since p = 0.872 is far above the 0.05 threshold, we fail to reject H0. There is no statistically significant interaction between the covariate and the group factor. This means the regression lines relating t1 to t2 are parallel across grp1, grp2, and grp3 - exactly as required by ANCOVA. Note that the covariate t1 itself is highly significant (F(1, 39) = 273.768, p < 0.001), confirming that t1 is a powerful and relevant covariate that meaningfully explains variance in t2. Assumption 5 is satisfied.

The output confirms: t1:group interaction F(2, 39) = 0.137, p = 0.872 — non-significant. The covariate t1 alone is highly significant with F(1, 39) = 273.768, p < 2e-16. Homogeneity of regression slopes is fully met.

D.3 Compute Sums of Squares and Cross-Products (SS and SP)

The next step is compute the SS for Y (DV = t2), SS for X (covariate = t1), and SP (sum of products of XY), both within groups (wg) and between groups (bg).

These are the building blocks for the adjusted SS values used in the F-test.

dv  <- anxiety$t2
cov <- anxiety$t1
grp <- anxiety$group

k <- nlevels(grp)
N <- nrow(anxiety)
n <- N / k

cat("k (number of groups):", k, "\n")
## k (number of groups): 3
cat("N (total sample size):", N, "\n")
## N (total sample size): 45
cat("n (per group):", n, "\n\n")
## n (per group): 15
group_sumY  <- tapply(dv,       grp, sum)
group_sumX  <- tapply(cov,      grp, sum)
group_sumY2 <- tapply(dv^2,     grp, sum)
group_sumX2 <- tapply(cov^2,    grp, sum)
group_sumXY <- tapply(dv * cov, grp, sum)

cat("Sum of Y per group:\n"); print(group_sumY)
## Sum of Y per group:
##  grp1  grp2  grp3 
## 253.9 247.0 225.2
cat("Sum of X per group:\n"); print(group_sumX)
## Sum of X per group:
##  grp1  grp2  grp3 
## 256.3 249.7 255.2
totalY  <- sum(dv)
totalX  <- sum(cov)
totalY2 <- sum(dv^2)
totalX2 <- sum(cov^2)
totalXY <- sum(dv * cov)

cat("\nTotal sum of Y:", totalY)
## 
## Total sum of Y: 726.1
cat("\nTotal sum of X:", totalX, "\n")
## 
## Total sum of X: 761.2
# WITHIN-GROUPS (wg)
SS_wg_Y <- totalY2 - sum(group_sumY^2 / n)
SS_wg_X <- totalX2 - sum(group_sumX^2 / n)
SP_wg   <- totalXY - sum(group_sumY * group_sumX / n)

cat("\nWithin Groups\n")
## 
## Within Groups
cat("SS_wg(Y):", round(SS_wg_Y, 3), "\n")
## SS_wg(Y): 107.5
cat("SS_wg(X):", round(SS_wg_X, 3), "\n")
## SS_wg(X): 95.892
cat("SP_wg   :", round(SP_wg,    3), "\n")
## SP_wg   : 98.856
# BETWEEN-GROUPS (bg)
SS_bg_Y <- sum(group_sumY^2 / n) - totalY^2 / N
SS_bg_X <- sum(group_sumX^2 / n) - totalX^2 / N
SP_bg   <- sum(group_sumY * group_sumX / n) - (totalY * totalX) / N

cat("\nBetween Groups\n")
## 
## Between Groups
cat("SS_bg(Y):", round(SS_bg_Y, 3), "\n")
## SS_bg(Y): 29.923
cat("SS_bg(X):", round(SS_bg_X, 3), "\n")
## SS_bg(X): 1.667
cat("SP_bg   :", round(SP_bg,    3), "\n")
## SP_bg   : -0.951
# TOTAL
SS_total_Y <- SS_bg_Y + SS_wg_Y
SS_total_X <- SS_bg_X + SS_wg_X
SP_total   <- SP_bg   + SP_wg

cat("\nTotal\n")
## 
## Total
cat("SS_total(Y):", round(SS_total_Y, 3), "\n")
## SS_total(Y): 137.423
cat("SS_total(X):", round(SS_total_X, 3), "\n")
## SS_total(X): 97.559
cat("SP_total   :", round(SP_total,    3), "\n")
## SP_total   : 97.905

Interpretation: These raw (unadjusted) sums of squares and cross-products partition the total variability in the DV (t2) and the covariate (t1) into components that are between groups and within groups. The SP values capture how Y and X covary together in each partition, this is what allows ANCOVA to statistically remove the covariate’s influence before testing the group effect.

The output shows within-group SS_wg(Y) = 107.500, SS_wg(X) = 95.892, SP_wg = 98.856; between-group SS_bg(Y) = 29.923, SS_bg(X) = 1.667, SP_bg = -0.951. Notably, SS_bg(X) = 1.667 is very small, indicating that the 3 groups had nearly identical pretest means (consistent with what was observed in Tabel 1), confirming that group differences in SS_bg(Y) are due to treatment and not pre-existing covariate differences.

D.4 Compute Adjusted SS (SS’)

These are what get entered into the ANCOVA F-test.

# Adjusted within-group SS (covariate effect removed from error)
SS_prime_wg <- SS_wg_Y - (SP_wg^2 / SS_wg_X)

# Adjusted between-group SS (covariate effect removed from treatment)
SS_prime_bg <- SS_bg_Y - (
  ((SP_bg + SP_wg)^2 / (SS_bg_X + SS_wg_X)) -
  (SP_wg^2 / SS_wg_X)
)

cat("SS'_wg (adjusted within-group SS) :", round(SS_prime_wg, 3), "\n")
## SS'_wg (adjusted within-group SS) : 5.588
cat("SS'_bg (adjusted between-group SS):", round(SS_prime_bg, 3), "\n")
## SS'_bg (adjusted between-group SS): 33.582

Interpretation: The adjusted SS values account for the linear relationship between the covariate (t1) and the DV (t2). SS’_wg represents the residual (error) variance after removing the covariate’s contribution. These adjusted SS values are taken from the final ANCOVA model after removing the non-significant interaction term. SS’_bg represents the variance attributable to the group factor after statistically controlling for the covariate. These are the “Adj SS” values.

The output shows SS’_wg = 5.588 and SS’_bg = 33.582. Compared to the unadjusted values (SS_wg(Y) = 107.500, SS_bg(Y) = 29.923), adjusting for the covariate dramatically reduced the within-group error (from 107.5 to 5.588), while the between-group SS actually increased (from 29.923 to 33.582). This indicates the power of ANCOVA: removing the strong covariate effect substantially increases the signal-to-noise ratio.

D.5 Compute Degrees of Freedom

c_cov <- 1  # number of covariates

df_bg <- k - 1
df_wg <- N - k - c_cov

cat("df_bg (between groups) = k - 1        =", df_bg, "\n")
## df_bg (between groups) = k - 1        = 2
cat("df_wg (within groups)  = N - k - c    =", df_wg, "\n")
## df_wg (within groups)  = N - k - c    = 41

Interpretation: The between-groups df = 2 (because we have 3 groups, so 3 - 1 = 2). The within-groups (error) df = 45 - 3 - 1 = 41. The final ANCOVA model (without interaction) yields df_E = 41. Note that compared to standard one-way ANOVA, ANCOVA loses one degree of freedom from the error term (for estimating the regression slope B), which reduces error df but also reduces error SS, generally increasing power.

The output confirms df_bg = 2 and df_wg = 41, consistent with the formula k - 1 and N - k - c, where c = 1 covariate. The loss of one df for the covariate is a small price for the substantial reduction in error SS achieved.

D.6 Build the ANCOVA Table and Compute F

MS_bg  <- SS_prime_bg / df_bg
MS_wg  <- SS_prime_wg / df_wg
F_stat <- MS_bg / MS_wg

cat("ANCOVA TABLE\n\n")
## ANCOVA TABLE
cat(sprintf("%-20s %6s %10s %10s %8s\n",
            "Source", "df", "Adj SS", "Adj MS", "F"))
## Source                   df     Adj SS     Adj MS        F
cat(rep("=", 58), "\n", sep = "")
## ==========================================================
cat(sprintf("%-20s %6d %10.3f %10.3f %8.3f\n",
            "Between Groups", df_bg, SS_prime_bg, MS_bg, F_stat))
## Between Groups            2     33.582     16.791  123.191
cat(sprintf("%-20s %6d %10.3f %10.3f\n",
            "Within Groups", df_wg, SS_prime_wg, MS_wg))
## Within Groups            41      5.588      0.136
cat(rep("=", 58), "\n", sep = "")
## ==========================================================
F_crit <- qf(0.95, df1 = df_bg, df2 = df_wg)
cat(sprintf("\nF critical at alpha=0.05, df(%d, %d): %.3f\n", df_bg, df_wg, F_crit))
## 
## F critical at alpha=0.05, df(2, 41): 3.226
cat(sprintf("F observed: %.3f\n", F_stat))
## F observed: 123.191
if (F_stat > F_crit) {
  cat("Decision: REJECT H0 (F_obs > F_crit)\n")
} else {
  cat("Decision: FAIL TO REJECT H0 (F_obs <= F_crit)\n")
}
## Decision: REJECT H0 (F_obs > F_crit)

Interpretation: The ANCOVA table above mirrors the structure shown in the PPT (Table 6.3 style). The F statistic is computed as MS_bg / MS_wg. We compare F_obs against the critical value F_(a, df_bg, df_wg). If F_obs > F_crit, we reject H0 and conclude that group membership has a significant effect on anxiety scores (t2) even after controlling for the pretest scores (t1).

The output shows: MS_bg = 16.791, MS_wg = 0.136, F_obs = 123.191, F_crit (df = 2, 41) = 3.226. Since F_obs = 123.191 far exceeds F_crit = 3.226, H0 is rejected. The group effect on t2 is highly significant after controlling for t1.

D.7 Verify with R’s Built-in ANCOVA

# Fit the ANCOVA model: DV ~ covariate + factor
ancova_model <- aov(t2 ~ t1 + group, data = anxiety)

Anova(ancova_model, type = "III")
## Anova Table (Type III tests)
## 
## Response: t2
##              Sum Sq Df  F value Pr(>F)    
## (Intercept)   0.152  1   1.1167 0.2968    
## t1          101.912  1 747.6896 <2e-16 ***
## group        33.582  2 123.1911 <2e-16 ***
## Residuals     5.588 41                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(ancova_model)
##             Df Sum Sq Mean Sq F value Pr(>F)    
## t1           1  98.25   98.25   720.8 <2e-16 ***
## group        2  33.58   16.79   123.2 <2e-16 ***
## Residuals   41   5.59    0.14                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation: The output from Anova(..., type = "III") provides the official ANCOVA table with adjusted sums of squares — this should closely match our manually computed values above. We look at two rows:

  • t1 (covariate): A significant p-value (< 0.05) confirms that the pretest anxiety score is a meaningful covariate — it explains a portion of the variance in t2, thereby reducing error and increasing power.
  • group (factor): This is the main test of interest. If p < 0.05, we conclude that the groups differ significantly in their t2 scores after adjusting for t1.

The R output confirms: t1 covariate F(1, 41) = 747.690, p < 2e-16 (highly significant); group factor F(2, 41) = 123.191, p< 2e-16 (highly significant). The Adj SS for group = 33.582 exactly matches our manual computation, confirming the manual calculations are correct.

D.8 Effect Size: Partial Eta Squared (η²)

This tells us what proportion of the adjusted variance in the DV is attributable to the group factor.

eta_sq <- SS_prime_bg / (SS_prime_bg + SS_prime_wg)

cat(sprintf("Partial Eta Squared (eta2) = SS'_bg / (SS'_bg + SS'_wg)\n"))
## Partial Eta Squared (eta2) = SS'_bg / (SS'_bg + SS'_wg)
cat(sprintf("                        = %.3f / (%.3f + %.3f)\n",
            SS_prime_bg, SS_prime_bg, SS_prime_wg))
##                         = 33.582 / (33.582 + 5.588)
cat(sprintf("                        = %.4f\n", eta_sq))
##                         = 0.8573
cat(sprintf("                        = %.1f%%\n\n", eta_sq * 100))
##                         = 85.7%
if (eta_sq >= 0.14) {
  cat("Effect size: LARGE (eta2 >= 0.14)\n")
} else if (eta_sq >= 0.06) {
  cat("Effect size: MEDIUM (0.06 <= eta2 < 0.14)\n")
} else {
  cat("Effect size: SMALL (eta2 < 0.06)\n")
}
## Effect size: LARGE (eta2 >= 0.14)

Interpretation: The partial eta squared value tells us what percentage of the variance in t2 (after controlling for the covariate t1) is explained by group membership. Following Cohen’s guidelines: η² >= 0.14 = large, η² >= 0.06 = medium, η² < 0.06 = small. A large η² would indicate that group membership accounts for a substantial share of the adjusted variance in anxiety post-scores.

The output yields η² = 0.8573, or 85.7% of the adjusted variance in t2 is explained by group membership. This is an exceptionally large effect size, confirming that after removing the variance attributable to pretest anxiety, the three groups are highly differentiated in their post-test anxiety scores.

D.9 Post-Hoc Tests (If H0 is Rejected)

If the overall ANCOVA F-test is significant (group effect p < 0.05), we perform post-hoc pairwise comparisons to identify which specific groups differ from each other, while adjusting for the covariate.

# Estimated marginal means (adjusted means at the covariate mean)
emm <- emmeans(ancova_model, ~ group)
cat("Estimated Marginal Means (Adjusted for t1)\n")
## Estimated Marginal Means (Adjusted for t1)
print(emm)
##  group emmean     SE df lower.CL upper.CL
##  grp1    16.8 0.0955 41     16.6     16.9
##  grp2    16.7 0.0959 41     16.6     16.9
##  grp3    14.9 0.0954 41     14.7     15.1
## 
## Confidence level used: 0.95
cat("\nPairwise Comparisons (Tukey Adjustment)\n")
## 
## Pairwise Comparisons (Tukey Adjustment)
pairs(emm, adjust = "tukey")
##  contrast    estimate    SE df t.ratio p.value
##  grp1 - grp2   0.0064 0.136 41   0.047  0.9988
##  grp1 - grp3   1.8377 0.135 41  13.629 <0.0001
##  grp2 - grp3   1.8313 0.136 41  13.514 <0.0001
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

Interpretation: It computes the estimated marginal means, which is the predicted mean of t2 for each group at the grand mean of the covariate t1. These adjusted means are what we compare, not the raw group means. The pairwise Tukey-adjusted comparisons show which pairs of groups (grp1 vs grp2, grp1 vs grp3, grp2 vs grp3) are significantly different from each other after covariate adjustment. A p-value < 0.05 in a pairwise comparison indicates that those two groups differ significantly in their adjusted anxiety post-scores.

The output shows adjusted means: grp1 = 16.8, grp2 = 16.7, grp3 = 14.9. Pairwise Tukey comparisons reveal: grp1 vs grp2 (estimate = 0.006, p = 0.9988, which is not significant), grp1 vs grp3 (estimate = 1.838, t = 13.629, p < 0.0001, which is highly significant), grp2 vs grp3 (estimate = 1.831, t = 13.514, p < 0.0001, which is highly significant). grp3 has a substantially lower adjusted anxiety post-score than both grp1 and grp2, while grp1 and grp2 are essentially equivalent after covariate adjustment.

D.10 Conclusion

p_val_group <- Anova(ancova_model, type = "III")["group", "Pr(>F)"]
p_val_cov   <- Anova(ancova_model, type = "III")["t1",    "Pr(>F)"]

cat("ANCOVA FINAL SUMMARY\n\n")
## ANCOVA FINAL SUMMARY
cat(sprintf("Covariate (t1):  F(%d,%d) p = %.4f  %s\n",
            1, df_wg, p_val_cov,
            ifelse(p_val_cov < 0.05, "SIGNIFICANT", "not significant")))
## Covariate (t1):  F(1,41) p = 0.0000  SIGNIFICANT
cat(sprintf("Group factor:    F(%d,%d) p = %.4f  %s\n",
            df_bg, df_wg, p_val_group,
            ifelse(p_val_group < 0.05, "SIGNIFICANT", "not significant")))
## Group factor:    F(2,41) p = 0.0000  SIGNIFICANT
cat(sprintf("Partial eta2:    %.3f (%.1f%% of adjusted variance)\n",
            eta_sq, eta_sq * 100))
## Partial eta2:    0.857 (85.7% of adjusted variance)
cat("\nConclusion:\n")
## 
## Conclusion:
if (p_val_group < 0.05) {
  cat("We REJECT H0. After controlling for pretest anxiety scores (t1),\n")
  cat("there is a statistically significant difference between at least\n")
  cat("two groups in post-test anxiety scores (t2).\n")
  cat("The effect of group on t2 is significant even when the influence\n")
  cat("of t1 is statistically removed.\n")
} else {
  cat("We FAIL TO REJECT H0. After controlling for pretest anxiety scores (t1),\n")
  cat("there is no statistically significant difference between groups\n")
  cat("in post-test anxiety scores (t2).\n")
}
## We REJECT H0. After controlling for pretest anxiety scores (t1),
## there is a statistically significant difference between at least
## two groups in post-test anxiety scores (t2).
## The effect of group on t2 is significant even when the influence
## of t1 is statistically removed.

E. MANCOVA

MANCOVA (Multivariate Analysis of Covariance) adalah perluasan dari ANCOVA ke kasus multivariat. In this analysis, we have:

Since there is only one factor (group), this is a One-Way MANCOVA.

E.1 Assumption: Homogeneity of Regression Slopes (Multivariate)

Before running MANCOVA, all key assumptions were verified in Section B. One additional assumption specific to MANCOVA is the Homogeneity of Regression Slopes (multivariate version); this assumption states that the relationship between the covariate (t1) and each DV (t2 and t3) must have the same slope across all groups, meaning the regression lines must be parallel across groups for both DVs simultaneously.

The test is conducted by including the group x t1 interaction in the model for each DV. If the interaction is not significant, the assumption is met.

# Homogeneity of regression slopes - checked per DV
model_int_t2 <- lm(t2 ~ t1 * group, data = anxiety)
cat("Homogeneity of Slopes Test for t2\n")
## Homogeneity of Slopes Test for t2
print(Anova(model_int_t2, type = "III"))
## Anova Table (Type III tests)
## 
## Response: t2
##             Sum Sq Df  F value Pr(>F)    
## (Intercept)  0.041  1   0.2912 0.5925    
## t1          38.955  1 273.7682 <2e-16 ***
## group        0.117  2   0.4128 0.6647    
## t1:group     0.039  2   0.1370 0.8724    
## Residuals    5.549 39                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_int_t3 <- lm(t3 ~ t1 * group, data = anxiety)
cat("\nHomogeneity of Slopes Test for t3\n")
## 
## Homogeneity of Slopes Test for t3
print(Anova(model_int_t3, type = "III"))
## Anova Table (Type III tests)
## 
## Response: t3
##              Sum Sq Df  F value   Pr(>F)    
## (Intercept)  0.1287  1   0.5545  0.46095    
## t1          30.5469  1 131.6210 4.55e-14 ***
## group        1.1968  2   2.5784  0.08878 .  
## t1:group     0.4173  2   0.8989  0.41527    
## Residuals    9.0512 39                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation: If the t1:group interaction is not significant (p > 0.05) in both models, then the assumption of homogeneity of regression slopes is met for both dependent variables. This means that the effect of the covariate t1 on both t2 and t3 is consistent across all groups, so the MANCOVA can proceed validly.

The output shows: for t2, t1:group interaction F(2, 39) = 0.137, p = 0.872 (non-significant); for t3, t1:group interaction F(2, 39) = 0.899, p = 0.415 (non-significant). Both interactions are non-significant at p > 0.05, confirming that the homogeneity of regression slopes assumption is met for both DVs simultaneously. The covariate t1 is highly significant for both t2 (F = 273.77, p < 2e-16) and t3 (F = 131.62, p = 4.55e-14), confirming its importance in the MANCOVA model.

E.2 Compute SSCP Matrices (Manual)

MANCOVA uses the SSCP (Sum of Squares and Cross-Products) matrix. We calculate the E matrix (Error/Within-groups) and the H matrix (Hypothesis/Between-groups) for the vector combination where x is the covariate (t1) and y is the DV vector (t2, t3).

E.2.1 Setup

X   <- anxiety$t1
Y1  <- anxiety$t2
Y2  <- anxiety$t3
grp <- anxiety$group

k <- nlevels(grp)
N <- nrow(anxiety)
n <- N / k

cat("k (groups):", k, "\n")
## k (groups): 3
cat("N (total) :", N, "\n")
## N (total) : 45
cat("n (per group):", n, "\n\n")
## n (per group): 15
grp_mean_X  <- tapply(X,  grp, mean)
grp_mean_Y1 <- tapply(Y1, grp, mean)
grp_mean_Y2 <- tapply(Y2, grp, mean)

grand_mean_X  <- mean(X)
grand_mean_Y1 <- mean(Y1)
grand_mean_Y2 <- mean(Y2)

cat("Grand mean X (t1):", round(grand_mean_X, 4), "\n")
## Grand mean X (t1): 16.9156
cat("Grand mean Y1 (t2):", round(grand_mean_Y1, 4), "\n")
## Grand mean Y1 (t2): 16.1356
cat("Grand mean Y2 (t3):", round(grand_mean_Y2, 4), "\n\n")
## Grand mean Y2 (t3): 15.1978
cat("Group means X (t1):\n"); print(round(grp_mean_X, 4))
## Group means X (t1):
##    grp1    grp2    grp3 
## 17.0867 16.6467 17.0133
cat("Group means Y1 (t2):\n"); print(round(grp_mean_Y1, 4))
## Group means Y1 (t2):
##    grp1    grp2    grp3 
## 16.9267 16.4667 15.0133
cat("Group means Y2 (t3):\n"); print(round(grp_mean_Y2, 4))
## Group means Y2 (t3):
##    grp1    grp2    grp3 
## 16.5067 15.5267 13.5600

Interpretation: The grand means and group means serve as reference points for computing the SSCP matrices. Grand means are: t1 = 16.9156, t2 = 16.1356, t3 = 15.1978. Group means for t1 are closely clustered (grp1: 17.087, grp2: 16.647, grp3: 17.013), confirming baseline comparability. Group means for t2 decrease progressively (grp1: 16.927, grp2: 16.467, grp3: 15.013), and the decline is even more pronounced for t3 (grp1: 16.507, grp2: 15.527, grp3: 13.560). This pattern suggests that grp3 has substantially lower post-test anxiety on both DVs, which will be reflected in the H matrix.

E.2.2 SSCP Total Matrix (T)

The T matrix is the matrix of the sum of squares and the total cross-products for the combination vector where x is the covariate and y is the DV vector:

\[T = \begin{bmatrix} T_{xx} & T_{xy} \\ T_{yx} & T_{yy} \end{bmatrix}\]

with df = N - 1

# SSCP Total Matrix T
T_xx   <- sum((X  - grand_mean_X)^2)
T_y1y1 <- sum((Y1 - grand_mean_Y1)^2)
T_y2y2 <- sum((Y2 - grand_mean_Y2)^2)
T_xy1  <- sum((X  - grand_mean_X) * (Y1 - grand_mean_Y1))
T_xy2  <- sum((X  - grand_mean_X) * (Y2 - grand_mean_Y2))
T_y1y2 <- sum((Y1 - grand_mean_Y1) * (Y2 - grand_mean_Y2))

df_T <- N - 1

cat("SSCP Total Matrix T\n")
## SSCP Total Matrix T
cat("df_T = N - 1 =", df_T, "\n\n")
## df_T = N - 1 = 44
cat(sprintf("T_xx    = %.4f\n", T_xx))
## T_xx    = 97.5591
cat(sprintf("T_y1y1  = %.4f\n", T_y1y1))
## T_y1y1  = 137.4231
cat(sprintf("T_y2y2  = %.4f\n", T_y2y2))
## T_y2y2  = 170.3898
cat(sprintf("T_xy1   = %.4f\n", T_xy1))
## T_xy1   = 97.9051
cat(sprintf("T_xy2   = %.4f\n", T_xy2))
## T_xy2   = 94.2516
cat(sprintf("T_y1y2  = %.4f\n\n", T_y1y2))
## T_y1y2  = 146.8636
T_yy    <- matrix(c(T_y1y1, T_y1y2, T_y1y2, T_y2y2), nrow = 2,
                  dimnames = list(c("t2","t3"), c("t2","t3")))
T_xx_mat <- matrix(T_xx, nrow = 1, dimnames = list("t1","t1"))
T_xy_mat <- matrix(c(T_xy1, T_xy2), nrow = 1,
                   dimnames = list("t1", c("t2","t3")))
T_yx_mat <- t(T_xy_mat)

cat("T_yy (SS for DVs):\n"); print(round(T_yy, 4))
## T_yy (SS for DVs):
##          t2       t3
## t2 137.4231 146.8636
## t3 146.8636 170.3898
cat("\nT_xx (SS for covariate):\n"); print(round(T_xx_mat, 4))
## 
## T_xx (SS for covariate):
##         t1
## t1 97.5591
cat("\nT_xy (cross-product covariate x DVs):\n"); print(round(T_xy_mat, 4))
## 
## T_xy (cross-product covariate x DVs):
##         t2      t3
## t1 97.9051 94.2516

Interpretation: The T matrix captures total variability across all observations before partitioning into within- and between-group components. T_xx = 97.559 (total SS for covariate t1), T_y1y1 = 137.423 (total SS for t2), T_y2y2 = 170.390 (total SS for t3). The cross-products T_xy1 = 97.905 and T_xy2 = 94.252 are large and positive, reflecting the strong linear relationships between t1 and both DVs that were established in the linearity assumption check. The T_yy off-diagonal T_y1y2 = 146.864 also reflects the strong co-variation between t2 and t3 (r = 0.960).

E.2.3 SSCP Error Matrix (E)

Matrix E is the matrix of the sums of squares and cross-products of the within-group errors:

\[E = \begin{bmatrix} E_{xx} & E_{xy} \\ E_{yx} & E_{yy} \end{bmatrix}\]

with df = N - k

# SSCP Error Matrix E (Within-groups)
E_xx   <- sum((X  - grp_mean_X[grp])^2)
E_y1y1 <- sum((Y1 - grp_mean_Y1[grp])^2)
E_y2y2 <- sum((Y2 - grp_mean_Y2[grp])^2)
E_xy1  <- sum((X  - grp_mean_X[grp]) * (Y1 - grp_mean_Y1[grp]))
E_xy2  <- sum((X  - grp_mean_X[grp]) * (Y2 - grp_mean_Y2[grp]))
E_y1y2 <- sum((Y1 - grp_mean_Y1[grp]) * (Y2 - grp_mean_Y2[grp]))

df_E <- N - k

cat("SSCP Error Matrix E (Within-Groups)\n")
## SSCP Error Matrix E (Within-Groups)
cat("df_E = N - k =", df_E, "\n\n")
## df_E = N - k = 42
cat(sprintf("E_xx    = %.4f\n", E_xx))
## E_xx    = 95.8920
cat(sprintf("E_y1y1  = %.4f\n", E_y1y1))
## E_y1y1  = 107.5000
cat(sprintf("E_y2y2  = %.4f\n", E_y2y2))
## E_y2y2  = 102.8347
cat(sprintf("E_xy1   = %.4f\n", E_xy1))
## E_xy1   = 98.8560
cat(sprintf("E_xy2   = %.4f\n", E_xy2))
## E_xy2   = 94.6207
cat(sprintf("E_y1y2  = %.4f\n\n", E_y1y2))
## E_y1y2  = 102.1287
E_yy    <- matrix(c(E_y1y1, E_y1y2, E_y1y2, E_y2y2), nrow = 2,
                  dimnames = list(c("t2","t3"), c("t2","t3")))
E_xx_mat <- matrix(E_xx, nrow = 1, dimnames = list("t1","t1"))
E_xy_mat <- matrix(c(E_xy1, E_xy2), nrow = 1,
                   dimnames = list("t1", c("t2","t3")))
E_yx_mat <- t(E_xy_mat)

cat("E_yy (within-group SS matrix for DVs):\n"); print(round(E_yy, 4))
## E_yy (within-group SS matrix for DVs):
##          t2       t3
## t2 107.5000 102.1287
## t3 102.1287 102.8347
cat("\nE_xx (within-group SS for covariate):\n"); print(round(E_xx_mat, 4))
## 
## E_xx (within-group SS for covariate):
##        t1
## t1 95.892
cat("\nE_yx (within-group cross-product DVs x covariate):\n"); print(round(E_yx_mat, 4))
## 
## E_yx (within-group cross-product DVs x covariate):
##         t1
## t2 98.8560
## t3 94.6207

Interpretation: The E matrix captures within-group variability — the portion of variance unexplained by group membership. E_xx = 95.892 (within-group SS for covariate t1), E_y1y1 = 107.500 (within-group SS for t2), E_y2y2 = 102.835 (within-group SS for t3). The large cross-products E_xy1 = 98.856 and E_xy2 = 94.621 indicate strong within-group relationships between the covariate and each DV, which is what MANCOVA exploits to reduce within-group error. E_y1y2 = 102.129 confirms strong within-group co-variation between t2 and t3.

E.2.4 SSCP Hypothesis Matrix (H)

The H matrix is a matrix of the sums of squares and cross-products of treatment effects (between-groups):

\[H = \begin{bmatrix} H_{xx} & H_{xy} \\ H_{yx} & H_{yy} \end{bmatrix}\]

with df = k - 1

# SSCP Hypothesis Matrix H (Between-groups)
H_xx   <- n * sum((grp_mean_X  - grand_mean_X)^2)
H_y1y1 <- n * sum((grp_mean_Y1 - grand_mean_Y1)^2)
H_y2y2 <- n * sum((grp_mean_Y2 - grand_mean_Y2)^2)
H_xy1  <- n * sum((grp_mean_X  - grand_mean_X) * (grp_mean_Y1 - grand_mean_Y1))
H_xy2  <- n * sum((grp_mean_X  - grand_mean_X) * (grp_mean_Y2 - grand_mean_Y2))
H_y1y2 <- n * sum((grp_mean_Y1 - grand_mean_Y1) * (grp_mean_Y2 - grand_mean_Y2))

df_H <- k - 1

cat("SSCP Hypothesis Matrix H (Between-Groups)\n")
## SSCP Hypothesis Matrix H (Between-Groups)
cat("df_H = k - 1 =", df_H, "\n\n")
## df_H = k - 1 = 2
cat(sprintf("H_xx    = %.4f\n", H_xx))
## H_xx    = 1.6671
cat(sprintf("H_y1y1  = %.4f\n", H_y1y1))
## H_y1y1  = 29.9231
cat(sprintf("H_y2y2  = %.4f\n", H_y2y2))
## H_y2y2  = 67.5551
cat(sprintf("H_xy1   = %.4f\n", H_xy1))
## H_xy1   = -0.9509
cat(sprintf("H_xy2   = %.4f\n", H_xy2))
## H_xy2   = -0.3691
cat(sprintf("H_y1y2  = %.4f\n\n", H_y1y2))
## H_y1y2  = 44.7349
H_yy    <- matrix(c(H_y1y1, H_y1y2, H_y1y2, H_y2y2), nrow = 2,
                  dimnames = list(c("t2","t3"), c("t2","t3")))
H_xx_mat <- matrix(H_xx, nrow = 1, dimnames = list("t1","t1"))
H_xy_mat <- matrix(c(H_xy1, H_xy2), nrow = 1,
                   dimnames = list("t1", c("t2","t3")))
H_yx_mat <- t(H_xy_mat)

cat("H_yy (between-group SS matrix for DVs):\n"); print(round(H_yy, 4))
## H_yy (between-group SS matrix for DVs):
##         t2      t3
## t2 29.9231 44.7349
## t3 44.7349 67.5551
cat("\nH_xx (between-group SS for covariate):\n"); print(round(H_xx_mat, 4))
## 
## H_xx (between-group SS for covariate):
##        t1
## t1 1.6671
cat("\nH_yx (between-group cross-product DVs x covariate):\n"); print(round(H_yx_mat, 4))
## 
## H_yx (between-group cross-product DVs x covariate):
##         t1
## t2 -0.9509
## t3 -0.3691
# Verify: T = H + E (should hold)
cat("\nVerification: T_yy = H_yy + E_yy\n")
## 
## Verification: T_yy = H_yy + E_yy
print(round(H_yy + E_yy, 4))
##          t2       t3
## t2 137.4231 146.8636
## t3 146.8636 170.3898
cat("T_yy actual:\n"); print(round(T_yy, 4))
## T_yy actual:
##          t2       t3
## t2 137.4231 146.8636
## t3 146.8636 170.3898

Interpretation: The H matrix captures between-group variability — the portion of variance attributable to group membership. H_xx = 1.667 is very small (the groups had nearly identical covariate means), while H_y1y1 = 29.923 and H_y2y2 = 67.555 are larger, indicating meaningful group separation on the DVs. The cross-products H_xy1 = -0.951 and H_xy2 = -0.369 are near zero, reflecting the minimal between-group covariate variation. The verification confirms T_yy = H_yy + E_yy exactly, validating the matrix decomposition.

E.3 Compute Adjusted SSCP Matrices (Corrected for Covariate)

The covariate-adjusted SSCP matrix is calculated as follows:

\[E_{Y \cdot X} = E_{yy} - E_{yx} E_{xx}^{-1} E_{xy}\]

\[H_{A,Y \cdot X} = H_{yy} - (H_{yx} + E_{yx})(H_{xx} + E_{xx})^{-1}(H_{xy} + E_{xy}) + E_{yx} E_{xx}^{-1} E_{xy}\]

df for \(E_{Y \cdot X}\) = N - k - 1

# Adjusted Error Matrix E_Y.X
E_yx_vec <- matrix(c(E_xy1, E_xy2), ncol = 1)
E_xy_vec <- t(E_yx_vec)

E_YX    <- E_yy - (E_yx_vec %*% (1/E_xx) %*% E_xy_vec)
df_E_YX <- N - k - 1

cat("Adjusted Error Matrix E_Y.X\n")
## Adjusted Error Matrix E_Y.X
cat("df = N - k - 1 =", df_E_YX, "\n\n")
## df = N - k - 1 = 41
print(round(E_YX, 4))
##        t2     t3
## t2 5.5884 4.5833
## t3 4.5833 9.4685
# Adjusted Hypothesis Matrix H_A,Y.X
H_yx_vec <- matrix(c(H_xy1, H_xy2), ncol = 1)
H_xy_vec <- t(H_yx_vec)

H_plus_E_xx  <- H_xx + E_xx
H_plus_E_yx  <- H_yx_vec + E_yx_vec
H_plus_E_xy  <- H_xy_vec + E_xy_vec

H_AYX    <- H_yy - (H_plus_E_yx %*% (1/H_plus_E_xx) %*% H_plus_E_xy) +
            (E_yx_vec %*% (1/E_xx) %*% E_xy_vec)
df_H_AYX <- k - 1

cat("\nAdjusted Hypothesis Matrix H_A,Y.X (Treatment)\n")
## 
## Adjusted Hypothesis Matrix H_A,Y.X (Treatment)
cat("df = k - 1 =", df_H_AYX, "\n\n")
## df = k - 1 = 2
print(round(H_AYX, 4))
##         t2      t3
## t2 33.5824 47.6944
## t3 47.6944 69.8652

Interpretation: After adjusting for the covariate, the error matrix E_Y.X is dramatically reduced compared to the unadjusted E_yy. The adjusted error matrix diagonal elements are E_Y.X[t2,t2] = 5.5884 and E_Y.X[t3,t3] = 9.4685, compared to the unadjusted E_yy diagonals of 107.500 and 102.835, a reduction of approximately 95% in error for t2 and 91% for t3. The adjusted hypothesis matrix H_A,Y.X has diagonal elements 33.582 (t2) and 69.865 (t3), which are larger than the unadjusted H_yy diagonals (29.923 and 67.555), indicating that adjusting for the covariate actually revealed a slightly stronger group effect on both DVs.

E.4 Compute Wilks’ Lambda (L)

The test statistic used is Wilks’ Lambda (L):

\[\Lambda_A = \frac{|E_{Y \cdot X}|}{|E_{Y \cdot X} + H_{A,Y \cdot X}|}\]

det_E_YX  <- det(E_YX)
det_EH_YX <- det(E_YX + H_AYX)

Lambda_A <- det_E_YX / det_EH_YX

cat("Wilks' Lambda\n")
## Wilks' Lambda
cat(sprintf("|E_Y.X|           = %.6f\n", det_E_YX))
## |E_Y.X|           = 31.906878
cat(sprintf("|E_Y.X + H_A,Y.X| = %.6f\n", det_EH_YX))
## |E_Y.X + H_A,Y.X| = 374.598947
cat(sprintf("Lambda (L)        = %.6f\n\n", Lambda_A))
## Lambda (L)        = 0.085176

Interpretation: The adjusted Wilks’ Lambda for MANCOVA is L_A = 0.085176. This is substantially smaller than the unadjusted Wilks’ Lambda from MANOVA (0.338171), which demonstrates the power of MANCOVA: after removing the covariate’s effect, only 8.5% of the adjusted multivariate variance remains unexplained by group membership — compared to 33.8% in the unadjusted MANOVA. The ratio |E_Y.X| = 31.907 to |E_Y.X + H_A,Y.X| = 374.599 confirms the strong group effect after covariate adjustment.

E.5 Transform Wilks’ Lambda to F-Statistic

According to the transformation table in the lecture materials (Transformation from L to F distribution), where: p = number of observed response variables = 2 (t2 and t3), v_H = degrees of freedom for the treatment = k - 1 = 2, v_E = degrees of freedom for the error = N - k - 1 = 41.

Since p = 2 and v_H >= 1, the formula in the second row of the table is used (p = 2, v_H >= 1):

\[F = \left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right)\left(\frac{v_E - 1}{v_H}\right)\]

with degree of freedom: \((2v_H,\ 2(v_E - 1))\)

p   <- 2
v_H <- df_H_AYX
v_E <- df_E_YX

# F transformation for p=2, v_H >= 1
F_stat_mancova <- ((1 - sqrt(Lambda_A)) / sqrt(Lambda_A)) * ((v_E - 1) / v_H)

df1 <- 2 * v_H
df2 <- 2 * (v_E - 1)

F_crit_mancova <- qf(0.95, df1 = df1, df2 = df2)
p_val_mancova  <- pf(F_stat_mancova, df1 = df1, df2 = df2, lower.tail = FALSE)

cat("Wilks' Lambda to F Transformation\n")
## Wilks' Lambda to F Transformation
cat(sprintf("p     = %d  (total DV)\n", p))
## p     = 2  (total DV)
cat(sprintf("v_H   = %d  (df treatment = k - 1)\n", v_H))
## v_H   = 2  (df treatment = k - 1)
cat(sprintf("v_E   = %d  (df error = N - k - 1)\n", v_E))
## v_E   = 41  (df error = N - k - 1)
cat(sprintf("\nFormula: F = ((1 - sqrt(L)) / sqrt(L)) x ((v_E - 1) / v_H)\n"))
## 
## Formula: F = ((1 - sqrt(L)) / sqrt(L)) x ((v_E - 1) / v_H)
cat(sprintf("L           = %.6f\n", Lambda_A))
## L           = 0.085176
cat(sprintf("sqrt(L)     = %.6f\n", sqrt(Lambda_A)))
## sqrt(L)     = 0.291849
cat(sprintf("\nF_observed  = %.4f\n", F_stat_mancova))
## 
## F_observed  = 48.5285
cat(sprintf("df1         = 2 x v_H = 2 x %d = %d\n", v_H, df1))
## df1         = 2 x v_H = 2 x 2 = 4
cat(sprintf("df2         = 2 x (v_E - 1) = 2 x %d = %d\n", v_E - 1, df2))
## df2         = 2 x (v_E - 1) = 2 x 40 = 80
cat(sprintf("F_critical  (alpha=0.05, df=%d,%d) = %.4f\n", df1, df2, F_crit_mancova))
## F_critical  (alpha=0.05, df=4,80) = 2.4859
cat(sprintf("p-value     = %.6f\n", p_val_mancova))
## p-value     = 0.000000
cat("\nDecision\n")
## 
## Decision
if (F_stat_mancova > F_crit_mancova) {
  cat(sprintf("F_obs (%.4f) > F_crit (%.4f) - REJECT H0\n",
              F_stat_mancova, F_crit_mancova))
} else {
  cat(sprintf("F_obs (%.4f) <= F_crit (%.4f) - FAIL TO REJECT H0\n",
              F_stat_mancova, F_crit_mancova))
}
## F_obs (48.5285) > F_crit (2.4859) - REJECT H0

Interpretation: The F transformation yields F_obs = 48.5285 with df = (4, 80), and F_crit = 2.4859 at a = 0.05. Since F_obs = 48.529 far exceeds F_crit = 2.486, H0 is decisively rejected (p = 0.000000). This F statistic is more than three times larger than the unadjusted MANOVA F (14.752), clearly demonstrating that after controlling for the covariate t1, the group effect on the combined DVs is considerably stronger and more precisely estimated.

E.6 Verify with R’s Built-in MANCOVA

mancova_model <- manova(cbind(t2, t3) ~ t1 + group, data = anxiety)

cat("MANCOVA Summary (Wilks' Lambda)\n")
## MANCOVA Summary (Wilks' Lambda)
print(summary(mancova_model, test = "Wilks"))
##           Df    Wilks approx F num Df den Df    Pr(>F)    
## t1         1 0.052823   358.62      2     40 < 2.2e-16 ***
## group      2 0.085176    48.53      4     80 < 2.2e-16 ***
## Residuals 41                                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cat("\nUnivariate Tests per DV\n")
## 
## Univariate Tests per DV
print(summary.aov(mancova_model))
##  Response t2 :
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## t1           1 98.252  98.252  720.84 < 2.2e-16 ***
## group        2 33.582  16.791  123.19 < 2.2e-16 ***
## Residuals   41  5.588   0.136                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response t3 :
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## t1           1 91.056  91.056  394.29 < 2.2e-16 ***
## group        2 69.865  34.933  151.26 < 2.2e-16 ***
## Residuals   41  9.468   0.231                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation: The output of summary(mancova_model, test = "Wilks") provides the official Wilks’ Lambda value from R. From the output: the t1 row (covariate) shows Wilks = 0.052823, F(2, 40) = 358.62, p < 2.2e-16 — the covariate has a massively significant multivariate effect on the linear combination of t2 and t3, confirming the critical importance of including t1. The group row shows Wilks = 0.085176, F(4, 80) = 48.53, p < 2.2e-16, at least one group differs significantly from the others on the optimal linear combination of t2 and t3 after controlling for t1. The R-computed Wilks’ Lambda of 0.085176 exactly matches our manual calculation, confirming the accuracy of the manual procedure. The univariate follow-up tests show group effects are significant for both t2 (F(2,41) = 123.19, p < 2.2e-16) and t3 (F(2,41) = 151.26, p < 2.2e-16), with the covariate t1 also significant for both DVs individually.

E.7 Conclusion

mancova_sum  <- summary(mancova_model, test = "Wilks")
lambda_group <- mancova_sum$stats["group", "approx F"]
p_group      <- mancova_sum$stats["group", "Pr(>F)"]
lambda_t1    <- mancova_sum$stats["t1", "approx F"]
p_t1         <- mancova_sum$stats["t1", "Pr(>F)"]

cat("MANCOVA FINAL SUMMARY\n")
## MANCOVA FINAL SUMMARY
cat("Model: cbind(t2, t3) ~ t1 + group\n\n")
## Model: cbind(t2, t3) ~ t1 + group
cat(sprintf("Covariate (t1) : F = %.4f, p = %.6f  - %s\n",
            lambda_t1, p_t1,
            ifelse(p_t1 < 0.05, "SIGNIFICANT", "not significant")))
## Covariate (t1) : F = 358.6229, p = 0.000000  - SIGNIFICANT
cat(sprintf("Group factor   : Wilks' L = %.6f\n", Lambda_A))
## Group factor   : Wilks' L = 0.085176
cat(sprintf("               : F_manual = %.4f, df = (%d, %d)\n",
            F_stat_mancova, df1, df2))
##                : F_manual = 48.5285, df = (4, 80)
cat(sprintf("               : p-value  = %.6f  - %s\n",
            p_val_mancova,
            ifelse(p_val_mancova < 0.05, "SIGNIFICANT", "not significant")))
##                : p-value  = 0.000000  - SIGNIFICANT
cat("\nConclusion\n")
## 
## Conclusion
if (p_val_mancova < 0.05) {
  cat("Reject H0. After controlling for pretest anxiety scores (t1),\n")
  cat("there is a significant multivariate difference between\n")
  cat("at least two groups in the optimal linear combination of\n")
  cat("post-test anxiety scores (t2 and t3).\n\n")
  cat("This means that the group effect on t2 and t3 together\n")
  cat("is significant after the influence of the covariate t1 is statistically controlled.\n")
} else {
  cat("Fail to reject H0. After controlling for pretest anxiety scores (t1),\n")
  cat("there was no significant multivariate difference\n")
  cat("between groups in the linear combination of t2 and t3.\n")
}
## Reject H0. After controlling for pretest anxiety scores (t1),
## there is a significant multivariate difference between
## at least two groups in the optimal linear combination of
## post-test anxiety scores (t2 and t3).
## 
## This means that the group effect on t2 and t3 together
## is significant after the influence of the covariate t1 is statistically controlled.

F. Conclusion

This analysis applied three multivariate statistical methods — MANOVA, ANCOVA, and MANCOVA, to the anxiety dataset from the datarium library, which includes 45 participants distributed equally across three groups (grp1, grp2, grp3), with pretest anxiety scores (t1) as a covariate and post-test scores at two time points (t2 and t3) as dependent variables.

Assumptions: All assumptions were satisfied. The DVs (t2, t3) are strongly correlated (r = 0.960, Bartlett’s p < 0.001), covariance matrices are homogeneous (Box’s M p = 0.828), and multivariate normality holds (Mardia’s p > 0.05; Shapiro-Wilk p > 0.13). The covariate t1 shows strong linear relationships with both DVs (r = 0.846; r = 0.731) with no curvature. Homogeneity of regression slopes is met (all interaction p > 0.05). Observations are independent.

MANOVA: A significant multivariate group effect was found, Wilks’ Λ = 0.338, F(4,82) = 14.75, p < 0.001, η² = 0.418. Both t2 and t3 are individually significant, with grp3 driving the group differences.

ANCOVA: After controlling for t1, group differences in t2 remain highly significant, F(2,41) = 123.19, p < 0.001, η² = 0.857. The covariate is also significant (p < 0.001). Post-hoc tests show grp3 differs from grp1 and grp2, while grp1 and grp2 do not differ.

MANCOVA: Controlling for t1, the multivariate group effect is stronger, Wilks’ Λ = 0.085, F(4,80) = 48.53, p < 0.001. The covariate has a large multivariate effect, and both DVs remain significant.

Final Conslusion: Across all three analyses, Group membership significantly affects post-test anxiety. grp3 consistently shows the lowest scores, while grp1 and grp2 are similar. Including t1 substantially reduces error and strengthens the detected effects.