Recall the univariate theory for determining whether a specific value \(\mu_0\) is a plausible value for the population mean \(\mu\). From the point of view of hypothesis testing, this problem can be formulated as a test of:
\(H_0:\mu=\mu_0\)
\(H_1:\mu\not= \mu_0\)
where \(H_0\) is the null hypothesis and \(H_1\) is the alternative hypothesis.
Let \(X_1,\dots,X_n\) be a random sample from a normal population; the appropriate test statistic has a Student's \(t\) distribution with \(n-1\) degrees of freedom:
\[t=\frac{(\overline{X}-\mu_0)}{\frac{s}{\sqrt{n}}}\] where:
\[ \overline{X}=\frac{1}{n}\sum_{j=1}^{n} X_j \text{ and } s^2=\frac{1}{n-1}\sum_{j=1}^n (X_j-\overline{X})^2 \]
Rejecting \(H_0\) when \(\mid t\mid\) is large is equivalent to rejecting \(H_0\) if \(t^2\) is large.
Note that \(t^2\) is the squared distance from the sample mean \(\overline{X}\) to the test value \(\mu_0\), measured relative to the estimated variance of \(\overline{X}\):
\[ t^2=\frac{(\overline{X}-\mu_0)^2}{\frac{s^2}{n}}=n(\overline{X}-\mu_0)^2(s^2)^{-1} \]
Given the observed values of \(\overline{X}\) and \(s^2\), the test becomes: reject \(H_0\) in favor of \(H_1\) at significance level \(\alpha\) if
\[ n(\overline{X}-\mu_0)^2(s^2)^{-1} > t^2_{n-1}(\alpha/2) \]
where \(t_{n-1}(\alpha/2)\) denotes the upper 100\((\alpha/2)\)th percentile of the \(t\)-distribution with \(n-1\) degrees of freedom.
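As a quick check, the univariate test is easy to reproduce in R. A minimal sketch, with made-up sample values and an arbitrary test value \(\mu_0=50\) (both are mine, for illustration only), comparing the hand computation with the built-in t.test():

x = c(48.2, 51.6, 49.9, 52.3, 47.8, 50.4, 53.1, 49.0)  # illustrative data
n = length(x)
mu0 = 50
tval = (mean(x) - mu0)/(sd(x)/sqrt(n))   # t = (xbar - mu0)/(s/sqrt(n))
tval^2                                   # squared-distance form
2*pt(-abs(tval), df = n - 1)             # two-sided p-value
t.test(x, mu = mu0)                      # built-in check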
A natural multivariate generalization of this squared distance from the sample mean \(\overline{X}\) to the test value \(\mu_0\) is:
\[ T^2=(\overline{X}-\mu_0)'(\frac{1}{n}S)^{-1}(\overline{X}-\mu_0)=n(\overline{X}-\mu_0)'S^{-1}(\overline{X}-\mu_0) \]
where:
\[ \overline{X}_{(p\times 1)} = \frac{1}{n}\sum_{j=1}^n X_j, S_{(p\times p)}=\frac{1}{n-1}\sum_{j=1}^n(X_j-\overline{X})(X_j-\overline{X})' \text{ and } \mu_0=\begin{bmatrix} \mu_{10} \\ \mu_{20} \\ \vdots \\ \mu_{p0} \end{bmatrix} \]
The hypothesis \(H_0:\mu=\mu_0\) is rejected if the observed statistical distance \(T^2\) is too large (i.e., if \(\overline{x}\) is too far from \(\mu_0\)).
It turns out that special tables of \(T^2\) percentage points are not required for formal tests of hypotheses. This is true because:
\[ T^2 \text{ is distributed as } \frac{(n-1)p}{n-p}F_{p,n-p} \]
where \(F_{p,n-p}\) denotes a random variable with an \(F\)-distribution with \(p\) and \(n-p\) degrees of freedom.
Let \(X_1,X_2,…,X_n\) be a random sample from an \(N_p(\mu,\Sigma)\) population. Then with \(\overline{X}=\frac{1}{n}\sum_{j=1}^nX_j\) and \(S=\frac{1}{n-1}\sum_{j=1}^n(X_j-\overline{X})(X_j-\overline{X})'\),
\[\alpha=P[T^2>\frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)]\]
\[ =P[n(\overline{X}-\mu)'S^{-1}(\overline{X}-\mu)>\frac{(n-1)p}{n-p}F_{p,n-p}(\alpha)] \]
whatever the true \(\mu\) and \(\Sigma\). Here \(F_{p,n-p}(\alpha)\) is the upper (100\(\alpha\))th percentile of the \(F_{p,n-p}\) distribution.
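These quantities are easy to compute directly in R. A minimal sketch of the one-sample \(T^2\) test (the function name T2.test1 and its arguments are mine, not from the book), assuming X is an \(n \times p\) data matrix and mu0 a hypothesized mean vector:

T2.test1 = function(X, mu0, alpha = 0.05) {
  X = as.matrix(X)
  n = nrow(X); p = ncol(X)
  xbar = colMeans(X)
  S = cov(X)                                          # sample covariance matrix
  T2 = drop(n * t(xbar - mu0) %*% solve(S) %*% (xbar - mu0))
  Fstat = (n - p)/((n - 1)*p) * T2                    # distributed F(p, n-p) under H0
  crit = (n - 1)*p/(n - p) * qf(1 - alpha, p, n - p)  # reject H0 if T2 > crit
  c(T2 = T2, F = Fstat, p.value = 1 - pf(Fstat, p, n - p), crit.T2 = crit)
}

Applied to the examples that follow, this reproduces the values reported by ICSNP's HotellingsT2() (up to the \(F\) transformation).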
The following has been taken from Randall E. Schumacker, Using R with Multivariate Statistics, Chapter 3: Hotelling's \(T^2\): A Two-Group Multivariate Analysis.
The example has two dependent variables, \(Y1\) and \(Y2\).
The first dependent variable (\(Y1\)) records the number of points subtracted on a pop quiz.
The second dependent variable (\(Y2\)) records the number of points awarded on a homework assignment.
The teacher wants to test whether the joint mean of these two dependent variables is statistically significantly different from zero for her 10 students.
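In the notation above, this is the one-sample multivariate test with \(p=2\) and \(n=10\):
\[ H_0: \mu = \mu_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{versus} \quad H_1: \mu \neq \mu_0 \]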
install.packages("ICSNP")
install.packages("mvtnorm")
rm(list=ls(all=TRUE))
library(ICSNP)
## Loading required package: mvtnorm
## Loading required package: ICS
library(mvtnorm)
\(Y12\) is a data frame that combines the two vectors of dependent variables.
The names() function assigns variable names to the dependent variables.
The attach() function makes it possible to refer to the variables in the data frame by name.
Print out the data in the \(Y12\) data frame.
Y1 =c(-2,-4,-6,-3,-7,-2,-1,-8,-6,-9)
Y2 =c(3,4,9,3,5,4,2,4,2,8)
Y12 =data.frame(Y1,Y2)
names (Y12) = c("Y1","Y2")
attach(Y12)
## The following objects are masked _by_ .GlobalEnv:
##
## Y1, Y2
Y12
## Y1 Y2
## 1 -2 3
## 2 -4 4
## 3 -6 9
## 4 -3 3
## 5 -7 5
## 6 -2 4
## 7 -1 2
## 8 -8 4
## 9 -6 2
## 10 -9 8
Print out correlation (cor) of dependent variables
Print out means (mean) and standard deviations (sd) of dependent variables
cor(Y12)
## Y1 Y2
## Y1 1.0000000 -0.5875697
## Y2 -0.5875697 1.0000000
mean(Y1); sd(Y1)
## [1] -4.8
## [1] 2.780887
mean(Y2); sd(Y2)
## [1] 4.4
## [1] 2.366432
install.packages("psych")
library(psych)
error.bars(Y12,bar=FALSE,ylab="Group Means",xlab="Dependent Variables",ylim=c(-10,10),eyes=FALSE)
muH0 = c(0, 0)
HotellingsT2(Y12, mu=muH0)
##
## Hotelling's one sample T2-test
##
## data: Y12
## T.2 = 18.09, df1 = 2, df2 = 8, p-value = 0.001075
## alternative hypothesis: true location is not equal to c(0,0)
The results of the single-sample multivariate \(t\) test indicate that the two dependent variable means, taken jointly, are statistically significantly different from zero.
The correlation matrix (Step 3) shows that the two dependent variables were correlated, \(r = -0.587\).
The Hotelling test was statistically significant: the statistic printed as T.2 was 18.09 with 2 and 8 degrees of freedom and \(p\)-value = 0.001. The null hypothesis is rejected and the alternative is accepted.
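A quick hand computation confirms this and clarifies the output: with the default test = "f", HotellingsT2() prints the \(F\)-transformed statistic as T.2, so the 18.09 above is the \(F\) value. A sketch, assuming Y12 and muH0 from the steps above:

n = nrow(Y12); p = ncol(Y12)
xbar = colMeans(Y12)
S = cov(Y12)
T2 = drop(n * t(xbar - muH0) %*% solve(S) %*% (xbar - muH0))  # raw T2, about 40.7
(n - p)/((n - 1)*p) * T2                                      # F value, matches 18.09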
Let the data matrix for a random sample of size \(n = 3\) from a bivariate normal population be
\[ X = \begin{bmatrix} 6 & 9 \\ 10 & 6 \\ 8 & 3 \end{bmatrix} \]
Evaluate the observed \(T^2\) for \(\mu'_0=[9,5]\)
rm(list=ls(all=TRUE))
Y1 =c(6,10,8)
Y2 =c(9,6,3)
Y12 =data.frame(Y1,Y2)
names (Y12) = c("Y1","Y2")
attach(Y12)
## The following objects are masked _by_ .GlobalEnv:
##
## Y1, Y2
## The following objects are masked from Y12 (pos = 4):
##
## Y1, Y2
Y12
## Y1 Y2
## 1 6 9
## 2 10 6
## 3 8 3
muH0 = c(9, 5)
HotellingsT2(Y12, mu=muH0)
##
## Hotelling's one sample T2-test
##
## data: Y12
## T.2 = 0.19444, df1 = 2, df2 = 1, p-value = 0.8485
## alternative hypothesis: true location is not equal to c(9,5)
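Working the exercise by hand confirms the output. Here \(\overline{x} = [8, 6]'\) and
\[ S = \begin{bmatrix} 4 & -3 \\ -3 & 9 \end{bmatrix}, \qquad S^{-1} = \frac{1}{27}\begin{bmatrix} 9 & 3 \\ 3 & 4 \end{bmatrix}, \]
so
\[ T^2 = 3\begin{bmatrix} -1 & 1 \end{bmatrix}\frac{1}{27}\begin{bmatrix} 9 & 3 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \frac{7}{9} \approx 0.778, \]
and the value printed as T.2 above is its \(F\) transformation, \(\frac{n-p}{(n-1)p}T^2 = \frac{1}{4}\cdot\frac{7}{9} \approx 0.194\).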
The two-independent-group multivariate \(t\) test is used when you hypothesize that a set of dependent variable group means differs between two independent groups.
Rogerian and Adlerian counselors: three Rogerian counselors and six Adlerian counselors were rated by their clients on two dependent variables. The first measure was counseling effectiveness and the second was counseling satisfaction, each on a 10-point numerical scale.
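In symbols, with \(\mu_1\) the Rogerian mean vector and \(\mu_2\) the Adlerian mean vector on (effectiveness, satisfaction):
\[ H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_1: \mu_1 \neq \mu_2 \]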
Step 1: Clean Dashboard, Install Packages and Load Library Functions
install.packages("ICSNP")
install.packages("mvtnorm")
rm(list=ls(all=TRUE))
library(ICSNP)
library(mvtnorm)
Use the data set from James Stevens' book (2009, 5th edition), p. 148; the independent variable is counselor type, Rogerian vs. Adlerian.
Assign the data to two matrices with different numbers of subjects.
Assign the group membership variable to the matrix grp.
Print out the matrices.
roger = matrix(c(1,3,2,3,7,2),3,2)
adler = matrix(c(4,6,6,5,5,4,6,8,8,10,10,6),6,2)
grp = matrix(c(1,1,1,2,2,2,2,2,2),9,1)
roger
## [,1] [,2]
## [1,] 1 3
## [2,] 3 7
## [3,] 2 2
adler
## [,1] [,2]
## [1,] 4 6
## [2,] 6 8
## [3,] 6 8
## [4,] 5 10
## [5,] 5 10
## [6,] 4 6
grp
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 2
## [6,] 2
## [7,] 2
## [8,] 2
## [9,] 2
Combine the two dependent variable matrices
Add variable names with names() function
Use attach() function so variable names can be used
Use factor() function to declare group variable as categorical
Y=data.frame(rbind(roger,adler))
names(Y)=c("effect","satis")
attach(Y)
factor(grp)
## [1] 1 1 1 2 2 2 2 2 2
## Levels: 1 2
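Note that factor(grp) only prints the converted values; it does not change grp itself. To actually store the group variable as a factor, you would assign the result, for example:

grp = factor(grp)   # store grp as a categorical (factor) variable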
options(scipen = 999)
Y
## effect satis
## 1 1 3
## 2 3 7
## 3 2 2
## 4 4 6
## 5 6 8
## 6 6 8
## 7 5 10
## 8 5 10
## 9 4 6
Print out correlation between effect and satis
Print out means and standard deviations for two dependent variables
cor(effect,satis)
## [1] 0.8295614
mean(effect);sd(effect)
## [1] 4
## [1] 1.732051
mean(satis);sd(satis)
## [1] 6.666667
## [1] 2.783882
Install and load R biotools package
Declare grp as factor variable
Use Y data set and grp variable
Compute Box M test
install.packages("biotools")
library(biotools)
## Loading required package: MASS
## ---
## biotools version 4.2
factor(grp)
## [1] 1 1 1 2 2 2 2 2 2
## Levels: 1 2
boxM(Y,grp)
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: Y
## Chi-Sq (approx.) = 0.34276, df = 3, p-value = 0.9518
When group sizes are 20 or more and the number of dependent variables is 5 or more, the chi-square approximation is preferred; otherwise the \(F\) approximation is more accurate.
HotellingsT2(roger,adler)
##
## Hotelling's two sample T2-test
##
## data: roger and adler
## T.2 = 9, df1 = 2, df2 = 6, p-value = 0.01562
## alternative hypothesis: true location difference is not equal to c(0,0)
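As in the one-sample case, the statistic can be reproduced by hand; a sketch assuming the roger and adler matrices defined above (and again, the T.2 = 9 printed by ICSNP is on the \(F\) scale; the raw \(T^2\) is 21):

n1 = nrow(roger); n2 = nrow(adler); p = ncol(roger)
d = colMeans(roger) - colMeans(adler)
Sp = ((n1 - 1)*cov(roger) + (n2 - 1)*cov(adler))/(n1 + n2 - 2)  # pooled covariance
T2 = drop((n1*n2/(n1 + n2)) * t(d) %*% solve(Sp) %*% d)         # T2 = 21
(n1 + n2 - p - 1)/((n1 + n2 - 2)*p) * T2                        # F = 9, as printed above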
Enter the sample sizes, the number of dependent variables, and the Hotelling trace value \(T = T^2/(n_1+n_2-2) = 21/7 = 3\)
Compute the degrees of freedom for the numerator and denominator
Compute the \(F\) value and \(p\)-value
Print out \(T\), \(F\), df1, df2 and the \(p\)-value
n1 = 3                  # Rogerian sample size
n2 = 6                  # Adlerian sample size
p = 2                   # number of dependent variables
T = 3                   # Hotelling trace: T^2/(n1 + n2 - 2) = 21/7
df1 = p                 # numerator degrees of freedom
df2 = n1 + n2-p-1       # denominator degrees of freedom
Fval = (df2/df1) * T
pval = round(1-pf(Fval,df1,df2),digits=3)
cat("T = ",T,"F-value =",Fval,"df1 =",df1,"df2 =",df2,"p-value=",pval,fill=FALSE,"\n")
## T = 3 F-value = 9 df1 = 2 df2 = 6 p-value= 0.016
The results show that the two dependent variables were positively correlated, \(r=0.829\) (Step 3)
The first dependent variable had mean = 4 and standard deviation = 1.73, and the second dependent variable had mean = 6.67 and standard deviation = 2.78.
The Box M test indicated that the covariance matrices were not statistically different, so we assumed them to be equal and proceeded with the multivariate \(t\) test.
The results indicated a statistic of 9 (the value printed as T.2, on the \(F\) scale; the raw \(T^2\) is 21) with 2 and 6 degrees of freedom and \(p = 0.016\).
The null hypothesis of no group mean difference is rejected.
The alternative hypothesis is accepted, indicating that the two groups, Rogerian and Adlerian, had a statistically significant joint mean difference in counseling effectiveness and counseling satisfaction as rated by clients.