Inference about Mean Vector: Hotelling’s \(T^2\) Test

The Plausibility of \(\mu_0\) as a Value for a Normal Population Mean

Recall the univariate theory for determining whether a specific value \(\mu_0\) is a plausible value for the population mean \(\mu\).

  • From the hypothesis-testing point of view, the problem can be formulated as:

    • \(H_0:\mu=\mu_0\)

    • \(H_1:\mu\not= \mu_0\)

where \(H_0\) is the null hypothesis and \(H_1\) is the alternative hypothesis.

  • Let \(X_1,\ldots,X_n\) be a random sample from a normal population. The appropriate test statistic has a Student’s \(t\) distribution with \(n-1\) degrees of freedom:

    \[t=\frac{(\overline{X}-\mu_0)}{\frac{s}{\sqrt{n}}}\] where:

\[ \overline{X}=\frac{1}{n}\sum_{j=1}^{n} X_j \text{ and } s^2=\frac{1}{n-1}\sum_{j=1}^n (X_j-\overline{X})^2 \]

  • Rejecting \(H_0\) when \(|t|\) is large is equivalent to rejecting \(H_0\) if \(t^2\) is large.

  • Let \(t^2\) be the squared distance from the sample mean \(\overline{X}\) to the test value \(\mu_0\):

    \[ t^2=\frac{(\overline{X}-\mu_0)^2}{\frac{s^2}{n}}=n(\overline{X}-\mu_0)^2(s^2)^{-1} \]

    In terms of the observed \(\overline{X}\) and \(s^2\), the test becomes: reject \(H_0\) in favor of \(H_1\) at significance level \(\alpha\) if

\[ n(\overline{X}-\mu_0)^2(s^2)^{-1}>t^2_{n-1}(\alpha/2) \]

where \(t_{n-1}(\alpha/2)\) denotes the upper 100\((\alpha/2)\)th percentile of the \(t\)-distribution with \(n-1\) degrees of freedom.

  • If \(H_0\) is not rejected, we conclude that \(\mu_0\) is a plausible value for the normal population mean.
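
To make the recipe concrete, here is a minimal sketch of the univariate test in R (the data vector x and the value mu0 are hypothetical):

x = c(4.8, 5.3, 5.1, 4.6, 5.0, 5.4, 4.9, 5.2)   # hypothetical sample
mu0 = 5                                          # hypothesized mean under H0
n = length(x)
t2 = n * (mean(x) - mu0)^2 / var(x)              # squared t statistic
alpha = 0.05
t2 > qt(1 - alpha/2, df = n - 1)^2               # TRUE means reject H0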

Hotelling’s \(T^2\) Statistic

A generalization of the squared distance from the sample mean \(\overline{X}\) to the test value \(\mu_0\) is:

\[ T^2=(\overline{X}-\mu_0)'(\frac{1}{n}S)^{-1}(\overline{X}-\mu_0)=n(\overline{X}-\mu_0)'S^{-1}(\overline{X}-\mu_0) \]

where:

\[ \overline{X}_{(p\times 1)} = \frac{1}{n}\sum_{j=1}^n X_j, S_{(p\times p)}=\frac{1}{n-1}\sum_{j=1}^n(X_j-\overline{X})(X_j-\overline{X})' \text{ and } \mu_0=\begin{bmatrix} \mu_{10} \\ \mu_{20} \\ \vdots \\ \mu_{p0} \end{bmatrix} \]

The hypothesis \(H_0:\mu=\mu_0\) is rejected if the observed statistical distance \(T^2\) is too large (i.e., if \(\overline{x}\) is too far from \(\mu_0\)).

It turns out that special tables of \(T^2\) percentage points are not required for formal tests of hypotheses. This is because:

\[T^2 \text{ is distributed as } \frac{(n-1)p}{n-p}F_{p,n-p}\] where \(F_{p,n-p}\) denotes a random variable with an \(F\)-distribution with \(p\) and \(n-p\) degrees of freedom.

Summary

Let \(X_1,X_2,…,X_n\) be a random sample from an \(N_p(\mu,\Sigma)\) population. Then with \(\overline{X}=\frac{1}{n}\sum_{j=1}^nX_j\) and \(S=\frac{1}{n-1}\sum_{j=1}^n(X_j-\overline{X})(X_j-\overline{X})'\),

\[\alpha=P[T^2>\frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)]\]

\[ =P[n(\overline{X}-\mu)'S^{-1}(\overline{X}-\mu)>\frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)] \]

whatever the true \(\mu\) and \(\Sigma\). Here \(F_{p,n-p}(\alpha)\) is the upper (100\(\alpha\))th percentile of the \(F_{p,n-p}\) distribution.
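
In R, the rejection threshold is just a scaled \(F\) quantile. A short sketch, with hypothetical values of n, p and alpha:

n = 20; p = 3; alpha = 0.05                       # hypothetical sample size, dimension, level
crit = (n - 1) * p / (n - p) * qf(1 - alpha, p, n - p)
crit                                              # reject H0 when the observed T^2 exceeds this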

Applications in R

The following is taken from Randall E. Schumacker’s book Using R with Multivariate Statistics, Chapter 3: Hotelling’s \(T^2\): A Two-Group Multivariate Analysis.

Single-Sample Multivariate \(t\) Test (Example 1)

The following example has two dependent variables, \(Y1\) and \(Y2\).

The teacher wants to test whether the joint mean of these two dependent variables is statistically significantly different from zero for her 10 students.

Step 1: Clear the Workspace, Install Packages and Load Library Functions

  • install.packages("ICSNP")

  • install.packages("mvtnorm")

rm(list=ls(all=TRUE))
library(ICSNP)
## Loading required package: mvtnorm
## Loading required package: ICS
library(mvtnorm) 

Step 2: Enter Data for Two Dependent Variables in Two Separate Matrix Vectors \(Y1\) and \(Y2\)

  • \(Y12\) is a data frame that combines the two matrix vectors of dependent variables

  • The names() function assigns variable names to the dependent variables

  • The attach() function makes it possible to refer to variables in the data frame by their names

  • Print out data in the \(Y12\) data frame

Y1 =c(-2,-4,-6,-3,-7,-2,-1,-8,-6,-9) 
Y2 =c(3,4,9,3,5,4,2,4,2,8)
Y12 =data.frame(Y1,Y2)
names(Y12) = c("Y1","Y2")
attach(Y12)
## The following objects are masked _by_ .GlobalEnv:
## 
##     Y1, Y2
Y12
##    Y1 Y2
## 1  -2  3
## 2  -4  4
## 3  -6  9
## 4  -3  3
## 5  -7  5
## 6  -2  4
## 7  -1  2
## 8  -8  4
## 9  -6  2
## 10 -9  8

Step 3: Print Correlation, Mean and Standard Deviations

  • Print out correlation (cor) of dependent variables

  • Print out means (mean) and standard deviations (sd) of dependent variables

cor(Y12) 
##            Y1         Y2
## Y1  1.0000000 -0.5875697
## Y2 -0.5875697  1.0000000
mean(Y1); sd(Y1) 
## [1] -4.8
## [1] 2.780887
mean(Y2); sd(Y2) 
## [1] 4.4
## [1] 2.366432
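
Step 3 reports the correlation and standard deviations separately; the \(T^2\) statistic itself uses the sample covariance matrix \(S\), which can be printed directly:

cov(Y12)    # sample covariance matrix S used by the T^2 statistic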

Step 4: Install and Load the R psych Package. Graph the Dependent Variable Means

install.packages("psych")

library(psych) 
error.bars(Y12,bar=FALSE,ylab="Group Means",xlab="Dependent Variables",ylim=c(-10,10),eyes=FALSE) 

Step 5: Conduct a Hotelling \(T^2\) Test of the Null Hypothesis that the Dependent Variable Means Are Equal to Zero

muH0 = c(0, 0) 
HotellingsT2(Y12, mu=muH0) 
## 
##  Hotelling's one sample T2-test
## 
## data:  Y12
## T.2 = 18.09, df1 = 2, df2 = 8, p-value = 0.001075
## alternative hypothesis: true location is not equal to c(0,0)
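
Note that HotellingsT2() with its default argument test = "f" reports the \(F\)-transformed statistic, which is why the output shows the \(F\) degrees of freedom df1 = 2 and df2 = 8. The reported \(p\)-value can be checked directly, and the raw \(T^2\) recovered:

1 - pf(18.09, 2, 8)                # approximately 0.001075, the reported p-value
(10 - 1) * 2 / (10 - 2) * 18.09    # raw T^2 = (n-1)p/(n-p) * F, about 40.7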

Step 6: Results Interpretation

  • The results for the single-sample multivariate \(t\) test indicated that the two dependent variable means together are statistically significantly different from zero

  • The correlation matrix indicated that the two dependent variables were negatively correlated (Step 3), \(r=-0.587\)

  • The Hotelling \(T^2\) value was statistically significant.

    • T.2 = 18.09 with df1 = 2 and df2 = 8 (the \(F\)-scaled value reported by ICSNP; see the note after Step 5)

    • \(p\)-value=0.001. The null hypothesis is rejected and the alternative is accepted.

Single-Sample Multivariate \(t\) Test (Example 2)

Let the data matrix for a random sample of size \(n = 3\) from a bivariate normal population be

\[ \begin{bmatrix} 6 & 9 \\ 10 & 6 \\ 8 & 3 \end{bmatrix} \]

Evaluate the observed \(T^2\) for \(\mu'_0=[9,5]\).

rm(list=ls(all=TRUE))
Y1 =c(6,10,8) 
Y2 =c(9,6,3)
Y12 =data.frame(Y1,Y2)
names(Y12) = c("Y1","Y2")
attach(Y12)
## The following objects are masked _by_ .GlobalEnv:
## 
##     Y1, Y2
## The following objects are masked from Y12 (pos = 4):
## 
##     Y1, Y2
Y12
##   Y1 Y2
## 1  6  9
## 2 10  6
## 3  8  3
muH0 = c(9, 5) 
HotellingsT2(Y12, mu=muH0) 
## 
##  Hotelling's one sample T2-test
## 
## data:  Y12
## T.2 = 0.19444, df1 = 2, df2 = 1, p-value = 0.8485
## alternative hypothesis: true location is not equal to c(9,5)
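
The output can be verified by computing the statistic from its definition. Again, the T.2 value reported by ICSNP is the \(F\)-scaled version of the raw \(T^2\):

n = nrow(Y12); p = ncol(Y12)
xbar = colMeans(Y12)                                     # (8, 6)
S = cov(Y12)                                             # sample covariance matrix
T2 = n * t(xbar - muH0) %*% solve(S) %*% (xbar - muH0)   # raw T^2 = 7/9, about 0.778
(n - p) / ((n - 1) * p) * T2                             # 7/36, about 0.19444 = reported T.2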

Two Independent Group Mean Difference

The two independent group multivariate \(t\) test is used when you hypothesize that a set of dependent variable group means differs between two independent groups.

Rogerian and Adlerian counselors: There are three Rogerian counselors and six Adlerian counselors measured on two dependent variables by their clients. The first measure was counseling effectiveness and the second measure was counseling satisfaction based on a 10-point numerical scale.

Step 1: Clear the Workspace, Install Packages and Load Library Functions

  • install.packages("ICSNP")

  • install.packages("mvtnorm")

rm(list=ls(all=TRUE))
library(ICSNP)
library(mvtnorm) 

Step 2: Enter Data Set

  • Use the data set from the James Stevens book (2009, 5th edition), p. 148; the independent variable is group membership: Rogerian vs. Adlerian

  • Assign data to two matrices with different numbers of subjects

  • Assign data to a matrix for the group membership variable, grp

  • Print out the matrices

roger = matrix(c(1,3,2,3,7,2),3,2)
adler = matrix(c(4,6,6,5,5,4,6,8,8,10,10,6),6,2)
grp = matrix(c(1,1,1,2,2,2,2,2,2),9,1)
roger 
##      [,1] [,2]
## [1,]    1    3
## [2,]    3    7
## [3,]    2    2
adler
##      [,1] [,2]
## [1,]    4    6
## [2,]    6    8
## [3,]    6    8
## [4,]    5   10
## [5,]    5   10
## [6,]    4    6
grp
##       [,1]
##  [1,]    1
##  [2,]    1
##  [3,]    1
##  [4,]    2
##  [5,]    2
##  [6,]    2
##  [7,]    2
##  [8,]    2
##  [9,]    2

  • Combine the two dependent variable matrices

  • Add variable names with names() function

  • Use attach() function so variable names can be used

  • Use the factor() function to declare the group variable as categorical (note that factor(grp) below only prints the factor; it is not assigned back to grp)

Y=data.frame(rbind(roger,adler))
names(Y)=c("effect","satis")
attach(Y)
factor(grp)
## [1] 1 1 1 2 2 2 2 2 2
## Levels: 1 2
options(scipen = 999)
Y
##   effect satis
## 1      1     3
## 2      3     7
## 3      2     2
## 4      4     6
## 5      6     8
## 6      6     8
## 7      5    10
## 8      5    10
## 9      4     6

Step 3: Correlation, Mean and Standard Deviations

  • Print out correlation between effect and satis

  • Print out means and standard deviations for two dependent variables

cor(effect,satis)
## [1] 0.8295614
mean(effect);sd(effect)
## [1] 4
## [1] 1.732051
mean(satis);sd(satis)
## [1] 6.666667
## [1] 2.783882
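
The statistics above pool both groups; the group-wise means are also worth inspecting (a quick sketch using the grp vector entered in Step 2):

aggregate(Y, by = list(group = factor(grp)), FUN = mean)
# group 1 (Rogerian): effect 2, satis 4; group 2 (Adlerian): effect 5, satis 8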

Step 4: Compute Box M Test of Equal Covariance Matrices

  • Install and load R biotools package

  • Declare grp as factor variable

  • Use Y data set and grp variable

  • Compute Box M test

install.packages("biotools")

library(biotools)
## Loading required package: MASS
## ---
## biotools version 4.2
factor(grp)
## [1] 1 1 1 2 2 2 2 2 2
## Levels: 1 2
boxM(Y,grp)
## 
##  Box's M-test for Homogeneity of Covariance Matrices
## 
## data:  Y
## Chi-Sq (approx.) = 0.34276, df = 3, p-value = 0.9518

When group sizes are 20 or more and the number of dependent variables is 5 or more, the chi-square approximation is preferred; otherwise the \(F\) approximation is more accurate.

Step 5: Conduct a Hotelling \(T^2\) Test Based on Two Sample Data Matrices

HotellingsT2(roger,adler)
## 
##  Hotelling's two sample T2-test
## 
## data:  roger and adler
## T.2 = 9, df1 = 2, df2 = 6, p-value = 0.01562
## alternative hypothesis: true location difference is not equal to c(0,0)
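
As a check on this output (and on the values used in Step 6 below), the two-sample statistic can be computed from the pooled covariance matrix. The T.2 = 9 reported by ICSNP is again the \(F\) value, and the book’s T = 3 in Step 6 equals \(T^2/(n_1+n_2-2) = 21/7 = 3\):

n1 = nrow(roger); n2 = nrow(adler); p = ncol(roger)
d = colMeans(roger) - colMeans(adler)                                  # mean difference (-3, -4)
Sp = ((n1 - 1) * cov(roger) + (n2 - 1) * cov(adler)) / (n1 + n2 - 2)   # pooled covariance
T2 = (n1 * n2 / (n1 + n2)) * t(d) %*% solve(Sp) %*% d                  # raw T^2 = 21
(n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2                           # 9 = reported T.2 (the F value)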

Step 6: Compute F Test for Hotelling \(T^2\) Value

  • Enter sample sizes, number of dependent variables and Hotelling \(T^2\) value

  • Compute degrees of freedom for numerator and denominator

  • Compute \(F\) value and \(p\)-value

  • Print out \(T,F\), df1, df2 and \(p\)-value

n1 = 3    # Rogerian group size
n2 = 6    # Adlerian group size
p = 2     # number of dependent variables
T = 3     # scaled Hotelling value; equals T^2/(n1 + n2 - 2) = 21/7 (see the note after Step 5)

df1 = p                  # numerator degrees of freedom
df2 = n1 + n2 - p - 1    # denominator degrees of freedom

Fval = (df2/df1) * T     # convert to an F value
pval = round(1 - pf(Fval, df1, df2), digits = 3)

cat("T = ",T,"F-value =",Fval,"df1 =",df1,"df2 =",df2,"p-value=",pval,fill=FALSE,"\n")
## T =  3 F-value = 9 df1 = 2 df2 = 6 p-value= 0.016

Step 7: Results Interpretation

  • The results show that the two dependent variables were positively correlated, \(r=0.829\) (Step 3)

  • The first dependent variable had mean = 4 and standard deviation = 1.73, and the second dependent variable had mean = 6.67 and standard deviation = 2.78.

  • The Box M test indicated that the covariance matrices were not statistically different, so we assumed them to be equal and proceeded with the multivariate \(t\) test.

  • The results indicated that T.2 = 9 (the \(F\) value; see the note after Step 5), with 2 and 6 degrees of freedom and \(p\) = 0.016

  • The null hypothesis of no group mean difference is rejected.

  • The alternative hypothesis is accepted, which indicates that the two groups, Rogerian and Adlerian, had a statistically significant joint mean difference for counseling effectiveness and counseling satisfaction as rated by clients.