\[ \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\E}{\mathbb{E}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\R}{\mathbb{R}} \newcommand{\Tr}{\mathrm{Tr}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\braket}[1]{\langle #1 \rangle} \]

Assignment 1

You may discuss homework problems with other students, but you have to prepare the written assignments yourself.

Please combine all your answers, the computer code and the figures into one PDF file, and submit a copy to your folder on Gradescope.

Grading scheme: 10 points per question, total of 50.

Due date: ~~11:59 PM Monday April 15, 2025 (Tuesday evening)~~ 11:59 PM Thursday April 17, 2025.

Download RMarkdown

Part A: From Mardia, Kent & Biiby

Problem 3.4.2

Plug $A=I_m, B=b, C=I_m, D=d$ into Theorem 3.3, we get $Xb \perp Xd$, which implies $b^T Mb = (Xb)^T Xb \perp (Xd)^T (Xd) = d^T M d$.
Let $b=e_i$, $d=e_j$, we get $M_{ii}\perp M_{jj}$ if $b^T\Sigma d=\Sigma_{ij}=0$. When $\Sigma=I_p$, $\text{Tr}(M)=\text{Tr}(X^TX)=\sum_{i=1}^m \sum_{\alpha=1}^p X_{i\alpha}^2 \sim \chi^2_{mp}$ since $X_{i \alpha}$’s are independent standard Gaussian.
Suppose $M=X^T X \in\mathbb{R}^{p\times p}$, where $X\sim N(0,I_m\otimes \Sigma_p)\in\mathbb{R}^{m\times p}$. Then $b^T M b = (Xb)^T(Xb)$, $d^T M d = (Xd)^T(Xd)$. Note that the joint distribution of $(Xb,Xd)$ is Gaussian, because $X$ is Gaussian and the mapping is linear. To prove $b^T M b \perp d^T M d$, it suffices to prove $Xb \perp Xd$, which in turn holds if $\Cov(Xb,Xd)=0$. For any two vectors $u,v\in\R^m$, \[ \begin{align} \Cov(u^TXb,v^TXd) &=\langle u\otimes b,\Cov(X) (v\otimes d) \rangle \\ &=u^T v \cdot b^T \Sigma d \end{align} \] Hence if $b^T\Sigma d=0$, then $\Cov(u^TXb,v^TXd)=0$, $b^T M b \perp d^T M d$.

Problem 3.4.9

The symmetric matrix $C$ has a spectral decomposition $C=U E U^T$ where $U\in SO(p)$ and $E$ is diagonal. Note $XU$ is distributed the same as $X$, so WLOG we assume $U=I$.
If $C$ is idempotent, then $E^2=E$, the elements of $E$ can only be $0$ or $1$. The matrix $XE$ selects $\Tr(E)$ columns from $X$, hence $XE$ has $p-\Tr(E)$ zero columns and the remaining $n\times r$ submatrix follows $N(0,I_{n}\otimes I_r)$, it follows $XCX^T=((XE)^T)^T(XE)^T \sim W_n(I_n,r)$.
Conversely, if $X C X^T \sim W_n(I_n,r)$, denote $E=\diag(\lambda_1,...,\lambda_p)$, from the previous proof we know $X E_{ii} X^T \sim W_n(I_n,1)$. Hence $X C X^T \sim \sum_{i=1}^n \lambda_i W_i$ where $W_i\sim W_n(I_n,1)$ i.i.d. We directly cite Theorem 3.4.4, which in turn cites Amderson, 1958, to claim that $C$ must be idempotent, so $\lambda_i=0,1$. Hence $X C X^T \sim W_n(I_n,\#\{\lambda_i=1\})$, $\Tr(C)=\sum_{i=1}^p \lambda_i=r$.
Apply Theorem 3.4.5 on $X^T$, we directly get $XCX^T\perp XDX^T$ iff $CD=0$.
If $B=0$, clearly they are independent. If $AC=0$, WLOG assume $C$ is diagonal, and $C=\diag(\lambda_i,...,\lambda_r,0,...,0)$, $\lambda_i\neq 0$. Then $AC$ implies the first $r$ columns of $A$ are all zero. Hence $AX^T B$ depends only on the last $p-r$ columns of $X$, $XCX^T$ depends only on the first $r$ columns of $X$, so they are independent.
Suppose for now that $XCX^T \perp AX^TB$. Note that orthonormal transformations on both left and right sides of $X$ won’t change the distribution of $X$. Moreover, we can apply orthonormal transformations on both sides of $AX^TB$ without changing its independency of $XCX^T$. These two claims together implies that WLOG we can apply any orthonormal transformations on $A,B$. Hence, WLOG, by applying SVD decompositions on $A,B$, we assume $A,B$ are diagonal.
Lets assume the first $a,b$ diagonal elements of $A,B$ are nonzero. Then $AX^TB$ contains all elements in $X_{1:b,1:a}$. Note that $(XCX^T)_{ij}=\sum_{kl}X_{ik}C_{kl}X_{jl}$. To fulfill the desired independency, it has to be that both arguments $(i\leq b, k\leq a \implies C_{kl}=0)$ and $(j\leq b, l\leq a \implies C_{kl}=0)$ hold. If $b=0$, this implies $B=0$. If $b\neq 0$, there exists some $i,j$ that triggers the above arguments. We must have $C_{1:a,:}=0$ and $C_{:,1:a}=0$ (by recalling that $C$ symmetric), which further implies $AC=0$.

Problem 3.4.20

Let $z_i\sim N_p(0,\Sigma)$ and $x_i=\mu_i+x_i$. Then $Z$, whose rows are $z_i$’s, satisfies the assumption of Theorem 3.3.3. Denote $X=Z+M$ where $M=\mu \mathbf{1}_p^T$ is a constant matrix. We have $Y=AXB=AZB+AMB$, $Z=CXD=CZD+CMD$. We conclude the proof by noting that shifting $Y,Z$ by constant matrices do not change their independency.
Follows the noration from the book, for any symmetric $C$ we write a decomposition $C=\sum_{i=1}^n \lambda_i \gamma_i\gamma_i^T$. Write $y_i=Z^T\gamma_i$, we have $X^TCX=Z^TCZ+Z^TCM+M^TCZ+M^TCM=\sum \lambda_i (y_i y_i^T +y_i \gamma_i^T M+M^T\gamma_i y_i^T+M^T\gamma_i\gamma_i^T M)$ is still a function of the random vectors $y_i$’s. The remaining steps are all the same as the proof of Theorem 3.4.5 (by first prove that for $k=2$, $C_1C_2=0$ is equivalent to $\lambda_i\psi_j\gamma_i^T\delta_j=0$, which is further equivalent to the $y_i$’s and $z_j$’s involved in $X^TC_1X,X^TC_2X$ are independent, which in turn if and only if $X^TC_1X,X^TC_2X$ are independent; then for $k>2$, apply that Gaussian vectors are pairwise independent are also jointly independent).
$\E[CX]=CM = 0$. Then \[ \begin{align} X^TCX=Z^TCZ+Z^TCM+M^TCZ+M^TCM = Z^TC Z \end{align} \] We apply Theorem 3.4.4 on $Z$ to end the proof.

Problem 3.6.1
Schur’s complement gives us \[ \begin{align} \Sigma^{-1} = \begin{bmatrix} S^{-1} & -S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \\ -\Sigma_{22}^{-1} \Sigma_{21} S^{-1} & \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \end{bmatrix}, \quad \text{where } S = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}. \end{align} \] Then: \[ \begin{align} \mu^T \Sigma^{-1} \mu &=\mu_1^T S^{-1}\mu_1- 2\mu_1^T S^{-1} \Sigma_{12} \Sigma_{22}^{-1}\mu_2 +\mu_2^T \left( \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) \mu_2 \\ &= \mu_1^T S^{-1} \mu_1 + (\mu_2 - \Sigma_{21} S^{-1} \mu_1)^T \left( \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} S^{-1} \Sigma_{12} \Sigma_{22}^{-1} \right) (\mu_2 - \Sigma_{21} S^{-1} \mu_1), \end{align} \] Plugging in $(\Sigma^{-1})_{11}=S^{-1}$ ends the proof.
Problem 5.5.1 $Caption text$

Part B

We prove the second question first

If $X$ is a random matrix with covariance given by the Kronecker product $\Sigma_R \otimes \Sigma_C$, show that

\[ \text{Cov}\left(\text{Tr}(A^TX), \text{Tr}(B^TX)\right) = \text{Tr}(A^T\Sigma_R B \Sigma_C) \]

Apply the definition in class, \[\begin{align} \text{Cov}\left(\text{Tr}(A^TX), \text{Tr}(B^TX)\right) &= \text{Cov}\left(\sum A_{ij} X_{ij}, \sum B_{\alpha\beta} X_{\alpha\beta} \right) \\ &= \sum A_{ij} B_{\alpha\beta} \text{Cov}\left( e_i^T X e_j , e_\alpha^T X e_\beta \right) \\ &= \sum A_{ij} B_{\alpha\beta} e_i^T \Sigma_R e_\alpha e_j^T \Sigma_C e_\beta \\ &= \sum A_{ij} B_{\alpha\beta} (\Sigma_R)_{i\alpha} (\Sigma_C)_{j\beta} \\ &= \sum (A^T)_{ji} (\Sigma_R)_{i\alpha} B_{\alpha\beta} (\Sigma_C)_{\beta j} \\ &= \text{Tr}(A^T\Sigma_R B \Sigma_C) \end{align}\]

Let $\mathbb{R}^{n \times p} \ni W \sim \text{Wishart}(n, \Sigma)$. Show that the covariance matrix of $W$ is, for symmetric matrices $\Delta, \tilde{\Delta}$

\[ \text{Cov}\left(\text{Tr}(\Delta W), \text{Tr}(\tilde{\Delta}W) \right) = 2 n \text{Tr}(\Sigma \Delta \Sigma \tilde{\Delta}). \]

(Hint: use the fact that the cumulant generating function (log moment generating function) of $-W/2$ is $-n/2 \log \det \Theta$ with $\Theta=\Sigma^{-1}$.)

Conclude that a Wishart has covariance given by $2 n \Sigma \otimes \Sigma$.

We have \[\begin{align} K(\Theta) = \log \E e^{\langle \Theta , W \rangle} = -\frac{n}{2} \log \det (I-2\Theta \Sigma) \end{align}\]

Part C

Consider a bivariate normal distribution $X \sim N(\mu, \Sigma)$ with \[ X \sim N \left(\mu = \begin{pmatrix} 2 \\ -1 \end{pmatrix}, \Sigma = \begin{pmatrix} 1 & -0.4 \\ -0.4 & 1.3 \end{pmatrix}\right) \]

Using the language of your choice, plot an ellipsoid that has 95% probability under this distribution.

We have $(X-\mu)^T \Sigma^{-1} (X-\mu) \sim \chi_2^2$, so let $E=\{x\in\R^2:(x-\mu)^T \Sigma^{-1} (x-\mu)\leq (\chi_2^2)_{0.95}\}$

suppressPackageStartupMessages(library(ellipse))

mu <- c(2, -1)
Sigma <- matrix(c(1, -0.4, -0.4, 1.3), nrow = 2)
ellipse_data <- ellipse(Sigma, centre = mu, alpha = 0.05)

plot(ellipse_data,
     type = "l",         # line
     col = "blue",       
     lwd = 2,            
     asp = 1,            
     main = "95% Probability Ellipse for Bivariate Normal",
     xlab = expression(X[1]),
     ylab = expression(X[2])
)

points(mu[1], mu[2], col = "red", pch = 19, cex = 1.5)
grid()

det(Sigma)

## [1] 1.14

In the following I will work on paper.

Compute the law of $X_2|X_1=x_1$ as an explicit function of $x_1$ for $\mu$ and $\Sigma$ as above. What is the support of the law?
Compute the law of $X|X_1=x_1$ as an explicit function of $x_1$ for $\mu$ and $\Sigma$ as above. What is the support of the law?
Suppose $\mu$ was unknown. Show that $N=X - \Sigma[,1] \frac{X_1}{\Sigma[1,1]}$ is independent of $X_1$. Show that the law of $X | N$ depends only on $\mu_1$ and $\Sigma$ but not $\mu_2$. What is the support of $X|N=n$?
Suppose you had observed a value $X=(2.32,-0.57)$. Compute $n$ the observed value of $N$ and describe the law of $X|N=n$. Describe a set that contains 95% probability for this law.
Generalize 4. to a case of higher than 2: suppose $\mathbb{R}^{n_1+n_2} \ni X = (X_1, X_2)$ with mean $(\mu_1, \mu_2)$ and a similarly partitioned covariance matrix. Construct $N$ such that $X|N$ depends only on $\mu_1$. What is the support of $X|N=n$? Does $\Sigma$ need to be non-degenerate? Explain.

Part D

The two-sample Hotelling $T^2$ test (with large sample size so we can assume $\Sigma$ is roughly known) looks for differences in all possible directions $a \in \mathbb{R}^p$ by maximizing a (suitably standardized) test of $H_{0,a}:\mu_1^Ta=\mu_2^Ta$ over all directions $a \in \mathbb{R}^p$. An alternative test would be to only maximize over coordinate directions. Call this the maxT test. For covariance, use the sample covariance of the Sardinia.inland and Sardinia.coast for the region in the oliveoil data ignoring the eicosenoic measurement.

library(pdfCluster)

## pdfCluster 1.0-4

data(oliveoil)

Construct distributions $F_1 = N(\mu_1, \Sigma)$ and $F_2 = N(\mu_2, \Sigma)$, for which the $T^2$ test has noticably more power than the maxT test. Provide plots as a function of sample size (say with fixed $\mu_1-\mu_2$) illustrating this difference in power. (How did you calibrate the maxT test?) (If this turns out to be difficult, you can choose your own covariance matrix instead of that from the oliveoil).

# Step 1: Subset Sardinia.inland and Sardinia.coast samples
sardinia_data <- subset(oliveoil, region %in% c("Sardinia.inland", "Sardinia.coast"))

# Step 2: Drop the 'eicosenoic' column (it’s one of the fatty acid variables)
sardinia_data <- sardinia_data[, !names(sardinia_data) %in% "eicosenoic"]

# Step 3: Drop the 'region' column (since it's categorical and not numeric)
sardinia_numeric <- sardinia_data[, sapply(sardinia_data, is.numeric)]

# Step 4: Compute sample covariance matrix
Sigma <- cov(sardinia_numeric)
Sigma

##                palmitic  palmitoleic     stearic        oleic     linoleic
## palmitic     1633.05365  -14.7868714   200.34799  -3548.12056   1816.33979
## palmitoleic   -14.78687  191.8620871    78.33600   -480.42773    321.05428
## stearic       200.34799   78.3359983   311.03808  -1829.64296   1291.15927
## oleic       -3548.12056 -480.4277299 -1829.64296  20129.44288 -14358.85630
## linoleic     1816.33979  321.0542815  1291.15927 -14358.85630  11499.03514
## linolenic     -24.53734    0.3020198   -50.31601    352.85378   -290.21418
## arachidic      10.18662   15.0034715   -31.00126     27.25416    -38.47444
##                linolenic arachidic
## palmitic     -24.5373448  10.18662
## palmitoleic    0.3020198  15.00347
## stearic      -50.3160109 -31.00126
## oleic        352.8537766  27.25416
## linoleic    -290.2141805 -38.47444
## linolenic     28.9914791  23.83958
## arachidic     23.8395750 141.19640

R <- cov2cor(Sigma)
library(ggcorrplot)

## Loading required package: ggplot2

library(corrplot)

## corrplot 0.95 loaded

corrplot(Sigma, is.corr = FALSE)
title("Covariance Matrix")

ggcorrplot(R, hc.order = TRUE, lab = TRUE)

pairs(sardinia_numeric)

Now we construct the model, $\mu_1$, $\mu_2$. In the first two questions we will assume $X_1$, $X_2$ are of the same size $n$. We will let the difference of $\mu_1,\mu_2$ spreads among all axes to supress the power of maxT. Specifically, \[ \mu_2-\mu_1 = c \sqrt{\text{diag}(\Sigma)} \frac{\mathbf{1}}{\sqrt{p}}. \]

We run both the Hotelling test and the maxT test. Since the distribution of the maximum of t-statistics is not well-known, we calibrate the maxT test by simulation under $H_0$.

Note that under $H_0$ the distribution of maxT depends on $\Sigma$. The question assumes that $\Sigma$ is known, so in the simulation we will sample $X_1,X_2$ from $N(0,\Sigma)$.

The test level is set to $\alpha=0.05$.

library(expm)

## Loading required package: Matrix

## 
## Attaching package: 'expm'

## The following object is masked from 'package:Matrix':
## 
##     expm

library(MASS) 
library(Hotelling)

## Loading required package: corpcor

library(multtest)

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, saveRDS, setdiff,
##     table, tapply, union, unique, unsplit, which.max, which.min

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

p = ncol(Sigma)
mu1 = rep(0, p)
#mu2 = sqrtm(Sigma) %*%  rep(1/sqrt(p), p) * 0.1
mu2 = sqrt(diag(Sigma)) *  rep(1/sqrt(p), p) * 0.1
alpha = 0.05

hotelling = function(X1, X2) {
  T = hotelling.test(X1, X2, progBar=FALSE, B=1000)
  return(T$pval)
}

# simulate maxT H0
compute_maxT <- function(X1, X2) {
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  
  xbar1 <- colMeans(X1)
  xbar2 <- colMeans(X2)
  
  # Standard errors (pooled variance)
  s1 <- apply(X1, 2, sd)
  s2 <- apply(X2, 2, sd)
  sp = ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2)

  # t-statistics
  t_stats <- (xbar1 - xbar2)^2 / (sp*(1/n1+1/n2))
  max_t <- max(t_stats)
  
  return(max_t)
}



maxT <- function(X1, X2, Sigma) {
  max_t <- compute_maxT(X1,X2)
  p = ncol(X1)
  n1 = nrow(X1)
  n2 = nrow(X2)
  # simulate H0
  B = 100
  maxT_vals <- numeric(B)
  for (b in 1:B){
    X1 <- mvrnorm(n1, numeric(p), Sigma)
    X2 <- mvrnorm(n2, numeric(p), Sigma)
    maxT_vals[b] <- compute_maxT(X1,X2)
  }
  return(mean(maxT_vals >= max_t))
}

# n_vals <- seq(10, 3000, by = 500)
n_vals <- c(10, 50, 100, 500, 1000, 2000, 3000, 4000, 6000)
power_hotelling <- numeric(length(n_vals))
power_maxT <- numeric(length(n_vals))

B = 400
for (i in seq_along(n_vals)) {
  n <- n_vals[i] 
  cat("Sample size =", n, "\n")
  rej_hotelling <- rej_maxT <- 0
  
  for (b in 1:B) {
    X1 <- mvrnorm(n, mu1, Sigma)
    X2 <- mvrnorm(n, mu2, Sigma)
    
    pval_h = hotelling(X1, X2)
    if (pval_h < alpha) 
      rej_hotelling <- rej_hotelling + 1
    
    pval_m = maxT(X1, X2, Sigma)
    if (pval_m < alpha) 
      rej_maxT <- rej_maxT + 1
  }
  power_hotelling[i] <- rej_hotelling / B
  power_maxT[i] <- rej_maxT / B
}

## Sample size = 10 
## Sample size = 50 
## Sample size = 100 
## Sample size = 500 
## Sample size = 1000 
## Sample size = 2000 
## Sample size = 3000 
## Sample size = 4000 
## Sample size = 6000

# === Plot Power Curves
plot(n_vals, power_hotelling, type = "o", pch = 16, col = "blue",  ylim = c(0, 1), xlab = "Sample size n1=n2", ylab = "Power", main = "Power vs Sample Size")
lines(n_vals, power_maxT, type = "o", pch = 17, col = "red", lwd = 2)
abline(h = alpha, col = "gray", lty = 2)
legend("bottomright", legend = c("Hotelling T²", "maxT"), col = c("blue", "red"), pch = c(16, 17))

Repeat 1. but switching $T^2$ and maxT. (If this turns out to be difficult, you can choose your own covariance matrix instead of that from the oliveoil).

We use $\mu_2-\mu_1=c \cdot e_i$. We expect the gap of powers increase with $p$. The given dataset has $p=7$, which is not large enough to clearly see a gap.

mu1 = rep(0, p)
mu2 = rep(0, p)
i = 7
mu2[i] = mu2[i] + sqrt(Sigma[i,i]) * 0.3

alpha = 0.05

#n_vals <- seq(100, 1000, by = 300)
n_vals <- c(50, 100, 300, 500, 700)
power_hotelling <- numeric(length(n_vals))
power_maxT <- numeric(length(n_vals))

# === Simulation loop
B = 100
for (i in seq_along(n_vals)) {
  n <- n_vals[i] 
  cat("Sample size =", n, "\n")
  rej_hotelling <- rej_maxT <- 0
  
  for (b in 1:B) {
    X1 <- mvrnorm(n, mu1, Sigma)
    X2 <- mvrnorm(n, mu2, Sigma)
    
    pval_h = hotelling(X1, X2)
    if (pval_h < alpha) 
      rej_hotelling <- rej_hotelling + 1
    
    pval_m = maxT(X1, X2, Sigma)
    if (pval_m < alpha) 
      rej_maxT <- rej_maxT + 1
  }
  power_hotelling[i] <- rej_hotelling / B
  power_maxT[i] <- rej_maxT / B
}

## Sample size = 50 
## Sample size = 100 
## Sample size = 300 
## Sample size = 500 
## Sample size = 700

# === Plot Power Curves
plot(n_vals, power_hotelling, type = "o", pch = 16, col = "blue",  ylim = c(0, 1), xlab = "Sample size", ylab = "Power", main = "Power vs Sample Size")
lines(n_vals, power_maxT, type = "o", pch = 17, col = "red", lwd = 2)
abline(h = alpha, col = "gray", lty = 2)
legend("bottomright", legend = c("Hotelling T²", "maxT"), col = c("blue", "red"), pch = c(16, 17))

For a more obvious comparison, we use $p=100$, $\Sigma=I_p$.

p_new = 100
Sigma_new = diag(p_new)
mu1 = rep(0, p_new)
mu2 = rep(0, p_new)
mu2[1] = mu2[1] + 0.5

alpha = 0.05

#n_vals <- seq(100, 1000, by = 300)
n_vals <- c(60, 80, 100, 150, 200, 300, 400, 600, 800)
power_hotelling <- numeric(length(n_vals))
power_maxT <- numeric(length(n_vals))

# === Simulation loop
B = 100
for (i in seq_along(n_vals)) {
  n <- n_vals[i] 
  cat("Sample size =", n, "\n")
  rej_hotelling <- rej_maxT <- 0
  
  for (b in 1:B) {
    X1 <- mvrnorm(n, mu1, Sigma_new)
    X2 <- mvrnorm(n, mu2, Sigma_new)
    
    pval_h = hotelling(X1, X2)
    if (pval_h < alpha) 
      rej_hotelling <- rej_hotelling + 1
    
    pval_m = maxT(X1, X2, Sigma_new)
    if (pval_m < alpha) 
      rej_maxT <- rej_maxT + 1
  }
  power_hotelling[i] <- rej_hotelling / B
  power_maxT[i] <- rej_maxT / B
}

## Sample size = 60 
## Sample size = 80 
## Sample size = 100 
## Sample size = 150 
## Sample size = 200 
## Sample size = 300 
## Sample size = 400 
## Sample size = 600 
## Sample size = 800

# === Plot Power Curves
plot(n_vals, power_hotelling, type = "o", pch = 16, col = "blue",  ylim = c(0, 1), xlab = "Sample size", ylab = "Power", main = "Power vs Sample Size")
lines(n_vals, power_maxT, type = "o", pch = 17, col = "red", lwd = 2)
abline(h = alpha, col = "gray", lty = 2)
legend("bottomright", legend = c("Hotelling T²", "maxT"), col = c("blue", "red"), pch = c(16, 17))

Carry out the Hotelling $T^2$ and maxT tests, comparing the two regions of Sardinia.

X1 <- subset(oliveoil, region == "Sardinia.inland")[, -which(names(oliveoil) == "eicosenoic")]
X1 <- X1[, sapply(X1, is.numeric)]
X2 <- subset(oliveoil, region == "Sardinia.coast")[, -which(names(oliveoil) == "eicosenoic")]
X2 <- X2[, sapply(X2, is.numeric)]

pval_h = hotelling(X1, X2)
pval_m = maxT(X1, X2, Sigma)

cat("Hotelling T² p-value:", pval_h, "\n")

## Hotelling T² p-value: 0

cat("maxT p-value:", pval_m, "\n")

## maxT p-value: 0

We see both tests give nearly zero p-values. We reject the null hypothesis that $\mu_1=\mu_2$ with fairly high confidence.

As an calibrated test, we also run a permutation test in case the hypothesized model does not fit well, see question 4.

T = hotelling.test(X1,X2,perm=TRUE,progBar=FALSE)
T

## Test stat:  127.54 
## Numerator df:  7 
## Denominator df:  90 
## Permutation P-value:  0 
## Number of permutations : 10000

Still, we would reject the null, say for any $\alpha\gtrsim 0.01$.

Do you think this data fits the hypothesized model reasonably well?

We made several assumptions under $H_0$:

Both $X_1,X_2$ are sampled from Gaussian distributions.
$X_1,X_2$ have the same covariance $\Sigma_1=\Sigma_2=\Sigma$.
$n_1,n_2$ are large enough.

nrow(X1)

## [1] 65

nrow(X2)

## [1] 33

X <- rbind(X1, X2)
group <- c(rep(1, nrow(X1)), rep(2, nrow(X2))) 
pairs(X, col = group)

We assess each of them.

From the pair plots, the marginal distribution of one or two variables are approximately Gaussian.
The covariance of pairs (palmitic,oleic) and (oleic,linolenic) does not appear the same for $X_1,X_2$.
$n_1=65, n_2=33$ are not very large, but they are within a reasonable range to assume a known covariance matrix $\Sigma$.