Recall the univariate theory for determining whether a specific value \(\mu_0\) is a plausible value for the population mean \(\mu\). From the point of view of hypothesis testing, this problem can be formulated as a test of:
\(H_0:\mu=\mu_0\)
\(H_1:\mu\not= \mu_0\)
where \(H_0\) is the null hypothesis and \(H_1\) is the alternative hypothesis.
Let \(X_1,\dots,X_n\) be a random sample from a normal population; the appropriate test statistic has a Student's \(t\) distribution with \(n-1\) degrees of freedom:
\[t=\frac{(\overline{X}-\mu_0)}{\frac{s}{\sqrt{n}}}\] where:
\[ \overline{X}=\frac{1}{n}\sum_{j=1}^{n} X_j \text{ and } s^2=\frac{1}{n-1}\sum_{j=1}^n (X_j-\overline{X})^2 \]
Rejecting \(H_0\) when \(\mid t\mid\) is large is equivalent to rejecting \(H_0\) if \(t^2\) is large.
Note that \(t^2\) is the squared distance from the sample mean \(\overline{X}\) to the test value \(\mu_0\), measured relative to the estimated variance of \(\overline{X}\):
\[ t^2=\frac{(\overline{X}-\mu_0)^2}{\frac{s^2}{n}}=n(\overline{X}-\mu_0)^2(s^2)^{-1} \]
Given the observed values of \(\overline{X}\) and \(s^2\), the test becomes: reject \(H_0\) in favor of \(H_1\) at significance level \(\alpha\) if
\[ n(\overline{X}-\mu_0)^2(s^2)^{-1} > t^2_{n-1}(\alpha/2) \]
where \(t_{n-1}(\alpha/2)\) denotes the upper 100\((\alpha/2)\)th percentile of the \(t\)-distribution with \(n-1\) degrees of freedom.
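As a quick check, the univariate test is easy to reproduce in R. A minimal sketch, with made-up sample values and an arbitrary test value \(\mu_0=50\) (both are mine, for illustration only), comparing the hand computation with the built-in t.test():

x = c(48.2, 51.6, 49.9, 52.3, 47.8, 50.4, 53.1, 49.0)  # illustrative data
n = length(x)
mu0 = 50
tval = (mean(x) - mu0)/(sd(x)/sqrt(n))   # t = (xbar - mu0)/(s/sqrt(n))
tval^2                                   # squared-distance form
2*pt(-abs(tval), df = n - 1)             # two-sided p-value
t.test(x, mu = mu0)                      # built-in check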
A natural multivariate generalization of this squared distance from the sample mean \(\overline{X}\) to the test value \(\mu_0\) is:
\[ T^2=(\overline{X}-\mu_0)'(\frac{1}{n}S)^{-1}(\overline{X}-\mu_0)=n(\overline{X}-\mu_0)'S^{-1}(\overline{X}-\mu_0) \]
where:
\[ \overline{X}_{(p\times 1)} = \frac{1}{n}\sum_{j=1}^n X_j, S_{(p\times p)}=\frac{1}{n-1}\sum_{j=1}^n(X_j-\overline{X})(X_j-\overline{X})' \text{ and } \mu_0=\begin{bmatrix} \mu_{10} \\ \mu_{20} \\ \vdots \\ \mu_{p0} \end{bmatrix} \]
The hypothesis \(H_0:\mu=\mu_0\) is rejected if the observed statistical distance \(T^2\) is too large (i.e., if \(\overline{x}\) is too far from \(\mu_0\)).
It turns out that special tables of \(T^2\) percentage points are not required for formal tests of hypotheses. This is true because:
\[ T^2 \text{ is distributed as } \frac{(n-1)p}{n-p}F_{p,n-p} \]
where \(F_{p,n-p}\) denotes a random variable with an \(F\)-distribution with \(p\) and \(n-p\) degrees of freedom.
Let \(X_1,X_2,…,X_n\) be a random sample from an \(N_p(\mu,\Sigma)\) population. Then with \(\overline{X}=\frac{1}{n}\sum_{j=1}^nX_j\) and \(S=\frac{1}{n-1}\sum_{j=1}^n(X_j-\overline{X})(X_j-\overline{X})'\),
\[\alpha=P[T^2>\frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)]\]
\[ =P[n(\overline{X}-\mu)'S^{-1}(\overline{X}-\mu)>\frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)] \]
whatever the true \(\mu\) and \(\Sigma\). Here \(F_{p,n-p}(\alpha)\) is the upper (100\(\alpha\))th percentile of the \(F_{p,n-p}\) distribution.
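These quantities are easy to compute directly in R. A minimal sketch of the one-sample \(T^2\) test (the function name T2.test1 and its arguments are mine, not from the book), assuming X is an \(n \times p\) data matrix and mu0 a hypothesized mean vector:

T2.test1 = function(X, mu0, alpha = 0.05) {
  X = as.matrix(X)
  n = nrow(X); p = ncol(X)
  xbar = colMeans(X)
  S = cov(X)                                          # sample covariance matrix
  T2 = drop(n * t(xbar - mu0) %*% solve(S) %*% (xbar - mu0))
  Fstat = (n - p)/((n - 1)*p) * T2                    # distributed F(p, n-p) under H0
  crit = (n - 1)*p/(n - p) * qf(1 - alpha, p, n - p)  # reject H0 if T2 > crit
  c(T2 = T2, F = Fstat, p.value = 1 - pf(Fstat, p, n - p), crit.T2 = crit)
}

Applied to the examples that follow, this reproduces the values reported by ICSNP's HotellingsT2() (up to the \(F\) transformation).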
The following has been taken from Randall E. Schumacker, Using R with Multivariate Statistics, Chapter 3: Hotelling's \(T^2\): A Two-Group Multivariate Analysis.
The example has two dependent variables, \(Y1\) and \(Y2\).
The first dependent variable (\(Y1\)) records the number of points subtracted on a pop quiz.
The second dependent variable (\(Y2\)) records the number of points awarded on a homework assignment.
The teacher wants to test whether the joint mean of these two dependent variables is statistically significantly different from zero for her 10 students.
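In the notation above, this is the one-sample multivariate test with \(p=2\) and \(n=10\):
\[ H_0: \mu = \mu_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{versus} \quad H_1: \mu \neq \mu_0 \]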
install.packages("ICSNP")
install.packages("mvtnorm")
rm(list=ls(all=TRUE))
library(ICSNP)
## Loading required package: mvtnorm
## Loading required package: ICS
library(mvtnorm)
\(Y12\) is a data frame that combines the two vectors of dependent variables.
The names() function assigns variable names to the dependent variables.
The attach() function makes it possible to refer to the variables in the data frame by name.
Print out the data in the \(Y12\) data frame.
Y1 =c(-2,-4,-6,-3,-7,-2,-1,-8,-6,-9)
Y2 =c(3,4,9,3,5,4,2,4,2,8)
Y12 =data.frame(Y1,Y2)
names (Y12) = c("Y1","Y2")
attach(Y12)
## The following objects are masked _by_ .GlobalEnv:
##
## Y1, Y2
Y12
## Y1 Y2
## 1 -2 3
## 2 -4 4
## 3 -6 9
## 4 -3 3
## 5 -7 5
## 6 -2 4
## 7 -1 2
## 8 -8 4
## 9 -6 2
## 10 -9 8
Print out correlation (cor) of dependent variables
Print out means (mean) and standard deviations (sd) of dependent variables
cor(Y12)
## Y1 Y2
## Y1 1.0000000 -0.5875697
## Y2 -0.5875697 1.0000000
mean(Y1); sd(Y1)
## [1] -4.8
## [1] 2.780887
mean(Y2); sd(Y2)
## [1] 4.4
## [1] 2.366432
install.packages("psych")
library(psych)
error.bars(Y12,bar=FALSE,ylab="Group Means",xlab="Dependent Variables",ylim=c(-10,10),eyes=FALSE)
muH0 = c(0, 0)
HotellingsT2(Y12, mu=muH0)
##
## Hotelling's one sample T2-test
##
## data: Y12
## T.2 = 18.09, df1 = 2, df2 = 8, p-value = 0.001075
## alternative hypothesis: true location is not equal to c(0,0)
The results of the single-sample multivariate \(t\) test indicate that the two dependent variable means, taken jointly, are statistically significantly different from zero.
The correlation matrix (Step 3) shows that the two dependent variables were correlated, \(r = -0.587\).
The Hotelling test was statistically significant: the statistic printed as T.2 was 18.09 with 2 and 8 degrees of freedom and \(p\)-value = 0.001. The null hypothesis is rejected and the alternative is accepted.
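A quick hand computation confirms this and clarifies the output: with the default test = "f", HotellingsT2() prints the \(F\)-transformed statistic as T.2, so the 18.09 above is the \(F\) value. A sketch, assuming Y12 and muH0 from the steps above:

n = nrow(Y12); p = ncol(Y12)
xbar = colMeans(Y12)
S = cov(Y12)
T2 = drop(n * t(xbar - muH0) %*% solve(S) %*% (xbar - muH0))  # raw T2, about 40.7
(n - p)/((n - 1)*p) * T2                                      # F value, matches 18.09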
Let the data matrix for a random sample of size \(n = 3\) from a bivariate normal population be
\[ X = \begin{bmatrix} 6 & 9 \\ 10 & 6 \\ 8 & 3 \end{bmatrix} \]
Evaluate the observed \(T^2\) for \(\mu'_0=[9,5]\)
rm(list=ls(all=TRUE))
Y1 =c(6,10,8)
Y2 =c(9,6,3)
Y12 =data.frame(Y1,Y2)
names (Y12) = c("Y1","Y2")
attach(Y12)
## The following objects are masked _by_ .GlobalEnv:
##
## Y1, Y2
## The following objects are masked from Y12 (pos = 4):
##
## Y1, Y2
Y12
## Y1 Y2
## 1 6 9
## 2 10 6
## 3 8 3
muH0 = c(9, 5)
HotellingsT2(Y12, mu=muH0)
##
## Hotelling's one sample T2-test
##
## data: Y12
## T.2 = 0.19444, df1 = 2, df2 = 1, p-value = 0.8485
## alternative hypothesis: true location is not equal to c(9,5)
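Working the exercise by hand confirms the output. Here \(\overline{x} = [8, 6]'\) and
\[ S = \begin{bmatrix} 4 & -3 \\ -3 & 9 \end{bmatrix}, \qquad S^{-1} = \frac{1}{27}\begin{bmatrix} 9 & 3 \\ 3 & 4 \end{bmatrix}, \]
so
\[ T^2 = 3\begin{bmatrix} -1 & 1 \end{bmatrix}\frac{1}{27}\begin{bmatrix} 9 & 3 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \frac{7}{9} \approx 0.778, \]
and the value printed as T.2 above is its \(F\) transformation, \(\frac{n-p}{(n-1)p}T^2 = \frac{1}{4}\cdot\frac{7}{9} \approx 0.194\).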
The two-independent-group multivariate \(t\) test is used when you hypothesize that a set of dependent variable group means differs between two independent groups.
Rogerian and Adlerian counselors: three Rogerian counselors and six Adlerian counselors were rated by their clients on two dependent variables. The first measure was counseling effectiveness and the second was counseling satisfaction, each on a 10-point numerical scale.
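In symbols, with \(\mu_1\) the Rogerian mean vector and \(\mu_2\) the Adlerian mean vector on (effectiveness, satisfaction):
\[ H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_1: \mu_1 \neq \mu_2 \]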
Step 1: Clean Dashboard, Install Packages and Load Library Functions
install.packages("ICSNP")
install.packages("mvtnorm")
rm(list=ls(all=TRUE))
library(ICSNP)
library(mvtnorm)
Use the data set from James Stevens' book (2009, 5th edition), p. 148; the independent variable is counselor type, Rogerian vs. Adlerian.
Assign the data to two matrices with different numbers of subjects.
Assign the group membership variable to the matrix grp.
Print out the matrices.
roger = matrix(c(1,3,2,3,7,2),3,2)
adler = matrix(c(4,6,6,5,5,4,6,8,8,10,10,6),6,2)
grp = matrix(c(1,1,1,2,2,2,2,2,2),9,1)
roger
## [,1] [,2]
## [1,] 1 3
## [2,] 3 7
## [3,] 2 2
adler
## [,1] [,2]
## [1,] 4 6
## [2,] 6 8
## [3,] 6 8
## [4,] 5 10
## [5,] 5 10
## [6,] 4 6
grp
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 2
## [6,] 2
## [7,] 2
## [8,] 2
## [9,] 2
Combine the two dependent variable matrices
Add variable names with names() function
Use attach() function so variable names can be used
Use factor() function to declare group variable as categorical
Y=data.frame(rbind(roger,adler))
names(Y)=c("effect","satis")
attach(Y)
factor(grp)
## [1] 1 1 1 2 2 2 2 2 2
## Levels: 1 2
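Note that factor(grp) only prints the converted values; it does not change grp itself. To actually store the group variable as a factor, you would assign the result, for example:

grp = factor(grp)   # store grp as a categorical (factor) variable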
options(scipen = 999)
Y
## effect satis
## 1 1 3
## 2 3 7
## 3 2 2
## 4 4 6
## 5 6 8
## 6 6 8
## 7 5 10
## 8 5 10
## 9 4 6
Print out correlation between effect and satis
Print out means and standard deviations for two dependent variables
cor(effect,satis)
## [1] 0.8295614
mean(effect);sd(effect)
## [1] 4
## [1] 1.732051
mean(satis);sd(satis)
## [1] 6.666667
## [1] 2.783882
Install and load R biotools package
Declare grp as factor variable
Use Y data set and grp variable
Compute Box M test
install.packages("biotools")
library(biotools)
## Loading required package: MASS
## ---
## biotools version 4.2
factor(grp)
## [1] 1 1 1 2 2 2 2 2 2
## Levels: 1 2
boxM(Y,grp)
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: Y
## Chi-Sq (approx.) = 0.34276, df = 3, p-value = 0.9518
When group sizes are 20 or more and the number of dependent variables is 5 or more, the chi-square approximation is preferred; otherwise the \(F\) approximation is more accurate.
HotellingsT2(roger,adler)
##
## Hotelling's two sample T2-test
##
## data: roger and adler
## T.2 = 9, df1 = 2, df2 = 6, p-value = 0.01562
## alternative hypothesis: true location difference is not equal to c(0,0)
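As in the one-sample case, the statistic can be reproduced by hand; a sketch assuming the roger and adler matrices defined above (and again, the T.2 = 9 printed by ICSNP is on the \(F\) scale; the raw \(T^2\) is 21):

n1 = nrow(roger); n2 = nrow(adler); p = ncol(roger)
d = colMeans(roger) - colMeans(adler)
Sp = ((n1 - 1)*cov(roger) + (n2 - 1)*cov(adler))/(n1 + n2 - 2)  # pooled covariance
T2 = drop((n1*n2/(n1 + n2)) * t(d) %*% solve(Sp) %*% d)         # T2 = 21
(n1 + n2 - p - 1)/((n1 + n2 - 2)*p) * T2                        # F = 9, as printed above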
Enter the sample sizes, the number of dependent variables, and the Hotelling trace value \(T = T^2/(n_1+n_2-2) = 21/7 = 3\)
Compute the degrees of freedom for the numerator and denominator
Compute the \(F\) value and \(p\)-value
Print out \(T\), \(F\), df1, df2 and the \(p\)-value
n1 = 3                  # Rogerian sample size
n2 = 6                  # Adlerian sample size
p = 2                   # number of dependent variables
T = 3                   # Hotelling trace: T^2/(n1 + n2 - 2) = 21/7
df1 = p                 # numerator degrees of freedom
df2 = n1 + n2-p-1       # denominator degrees of freedom
Fval = (df2/df1) * T
pval = round(1-pf(Fval,df1,df2),digits=3)
cat("T = ",T,"F-value =",Fval,"df1 =",df1,"df2 =",df2,"p-value=",pval,fill=FALSE,"\n")
## T = 3 F-value = 9 df1 = 2 df2 = 6 p-value= 0.016
The results show that the two dependent variables were positively correlated, \(r=0.829\) (Step 3)
The first dependent variable had mean = 4 and standard deviation = 1.73, and the second dependent variable had mean = 6.67 and standard deviation = 2.78.
The Box M test indicated that the covariance matrices were not statistically different, so we assumed them to be equal and proceeded with the multivariate \(t\) test.
The results indicated a statistic of 9 (the value printed as T.2, on the \(F\) scale; the raw \(T^2\) is 21) with 2 and 6 degrees of freedom and \(p = 0.016\).
The null hypothesis of no group mean difference is rejected.
The alternative hypothesis is accepted, indicating that the two groups, Rogerian and Adlerian, had a statistically significant joint mean difference in counseling effectiveness and counseling satisfaction as rated by clients.