Chapter 6 – Part Two (Larger sample inferences for two samples case and MANOVA)

Question 1
Question 2

Question 1

The tail length in millimeters (x1) and wing length in millimeters (x2) for 45 male hook-billed kites and 45 female kites are given data file named KiteData.csv. In data file, for male, Gender = 1, and female, Gender = 2. (use large sample approximation – Chi-square approximation)

(q1bt2 = t2_two_pop(male, female))

##          T^2      P-Value 
## 2.500501e+01 3.717322e-06

(q1b_astar = solve(spooled(male, female))%*%matrix((colMeans(male)-colMeans(female)), ncol = 1))

##                  [,1]
## Tail_Length -3.490238
## Wing_Length  2.079550

colnames(q1b_astar) = c("$a*$")
(q1ct2 = t2_two_pop(male_a, female_a))

##          T^2      P-Value 
## 2.566253e+01 2.675791e-06

(q1c_ints = two_pop_t2_CI(male, female, .05))

##                  lower     upper
## Tail_Length -11.764358 -1.161905
## Wing_Length  -5.985685  8.339220

q1d_table = q1c_ints %>%
  kbl(digits = 3, caption = "95% simultaneous confidence intervals") %>%
  kable_minimal(full_width = F)

A. Plot the male kite data as a scatter diagram, and visually check for outliers. (Note that observation 31 with x1 = 284).

B. Eliminate observation 31 and test for equality of mean vectors for populations of male and female kites at 5% significance level. If \(H_0: \mu_1, \mu_2\) = 0 is rejected, find the linear combination most responsible for the rejection of \(H_0\).

A \(T^2\) test returns a value of 25.005. This corresponds to a \(p\)-value of 0. \(A*\) is calculated to be -3.49, 2.08 corresponding to tail length and wing length, respectively; given that the magnitude of the wing length vector larger, we can conlcude it is most responsible for rejectin \(H_0\).

C. Alternatively, interpret x1 = 284 for observation 31 as a misprint and conduct the test with x1 = 184 for this observation. Does it make any difference in your inference results?

\(T^2\) test returns a value of 25.005, corresponding to a \(p\)-value of 0. This does not change our conclusion.

D. Find 95% simultaneous confidence intervals for the components of \(\mu_1, \mu_2\).

95% simultaneous confidence intervals
	lower	upper
Tail_Length	-11.764	-1.162
Wing_Length	-5.986	8.339

E. Are male or female birds generally larger? Justify based on your confidence intervals.

Male and female kites typically have similar wing lengths, but female kites generally have longer tails.

Question 2

Observations on two responses are collected for three treatments. The observation vectors are \(\begin{bmatrix}x_1\\x_2\end{bmatrix}\) are:

Treatment 1: \(\begin{bmatrix}6\\7\end{bmatrix}\begin{bmatrix}5\\9\end{bmatrix}\begin{bmatrix}8\\6\end{bmatrix}\begin{bmatrix}4\\9\end{bmatrix}\begin{bmatrix}7\\9\end{bmatrix}\)
Treatment 2: \(\begin{bmatrix}3\\3\end{bmatrix}\begin{bmatrix}1\\6\end{bmatrix}\begin{bmatrix}2\\3\end{bmatrix}\)
Treatment 3: \(\begin{bmatrix}2\\3\end{bmatrix}\begin{bmatrix}5\\1\end{bmatrix}\begin{bmatrix}3\\1\end{bmatrix}\begin{bmatrix}2\\3\end{bmatrix}\)

X = tribble(
  ~x1, ~x2, ~Treatment,
  6, 7, "1",
  5, 9, "1",
  8,5, "1",
  4, 9, "1",
  7, 9, "1",
  3,3,"2",
  1,6,"2",
  2,3,"2",
  2,3,"3",
  5,1,"3",
  3,1,"3",
  2,3,"3"
)

A. Find the sample treatment mean vectors and overall sample mean vector.

mu = X %>%
  group_by(Treatment) %>%
  summarise(x1 = mean(x1), x2 = mean(x2), count = n()) %>%
  bind_rows(., X%>%summarize(x1 = mean(x1), x2 = mean(x2), count = n())) %>%
  replace_na(., list(Treatment = "All Treatments", x1 = 0.0, x2 = 0.0, count = 1))

## `summarise()` ungrouping output (override with `.groups` argument)

mu_table = mu %>%
  kbl(caption = "Mean values", digits = 3) %>%
  kable_minimal(full_width = F)

Mean values
Treatment	x1	x2	count
1	6	7.800	5
2	2	4.000	3
3	3	2.000	4
All Treatments	4	4.917	12

B. Using the information in part a) construct the MANOVA table.

n = mu$count
x_bar = mu %>%
  filter(Treatment == "All Treatments") %>%
  select(x1, x2) %>%
  as.matrix(.)
x_bar_T = mu %>%
  filter(Treatment != "All Treatments") %>%
  select(x1, x2) %>%
  as.matrix(.)

p = dim(x_bar)[2]
q = dim(x_bar_T)[1]
diff = ((x_bar_T - matrix(x_bar, nrow = q, ncol = p, byrow = T)))
a1 = diag(1, nrow = p, ncol = p)
a2 = diag(1, nrow = q, ncol = q)
A1 = cbind(cbind(a1, a1), a1)
A2 = cbind(a2, a2)

A3 = matrix(0, nrow = p*q, ncol = p*q)
for(i in 1:(p*q)){
  j = 1
  if(i %% 2 == 0){
    j = i - 1
  }else if (i %% 2 == 1){
    j = i
  }
  A3[[i,j]] = 1
}
B_diff = t(A1)%*%t(diff)%*%A2*A3
B_n = matrix(1, nrow = p*q)%*%(n[1:3]%*%matrix(c(1, 1, 0, 0,0,0,0,0,1,1,0,0,0,0,0,0,1,1), ncol = p*q, byrow = T))

A4 = cbind(cbind(diag(1, nrow = p, ncol = p), diag(1, nrow = p, ncol = p)), diag(1, nrow = p, ncol = p))
B_diff_N = (B_diff%*%t(B_diff))*t(B_n)
B = A4%*%B_diff_N%*%t(A4)

S1 = X %>%
  filter(Treatment == 1) %>%
  select(-Treatment)%>%
  cov(as.matrix(.))
S2 = X %>%
  filter(Treatment == 2) %>%
  select(-Treatment)%>%
  cov(as.matrix(.))
S3 = X %>%
  filter(Treatment == 3) %>%
  select(-Treatment)%>%
  cov(as.matrix(.))
W = (n[1]-1)*S1 + (n[2]-1)*S2 + (n[3]-1)*S3

Source of Variation	Matrix of sum of squares and cross products (SSP)	Degrees of freedom (d.f)
Treatment	\(\begin{bmatrix}39&44.917 \\44.917&70.45 \\\end{bmatrix}\)	2
Residual (Error)	\(\begin{bmatrix}18&-15 \\-15&22.8 \\\end{bmatrix}\)	3
Total	\(\begin{bmatrix}57&29.917 \\29.917&93.25 \\\end{bmatrix}\)	9

C. Compute the Wilk’s lambda (\(\Lambda*\)), and use the table given in slide 41 of class slides to test the treatment effects using \(\alpha= 0.01\).

w_lam = det(W)/det(B+W)
w_lam_dist = ((sum(n[1:3]) - p -2)/p)*((1 - sqrt(w_lam))/sqrt(w_lam))
q2c_p_value = 1 - pf(w_lam_dist, 2*p, 2*(sum(sum(n[1:3]) - p -2)))

With a Wilk’s lambda of 0.042, a test statistic of 15.531 we find a p-value of 0, and as it is below .05, we can reject \(H_0\) and conclude that there is a treatment effect.

D. Repeat the test using the Chi-square approximation with Bartlett’s correction given in slides 42-43. Compare your conclusions with part c).

bart  = -(n[4]-1-((p+q)/2)*log(w_lam))
crit = qchisq(1 -.05, p*(q-1))
crit

## [1] 9.487729

Bartlett’s correction produces a value of -18.929, which is less than critical number 9.488 which corresponds to \(\alpha = .05\). This does not support our findings above, but as Bartlett’s correction applies to samples with large \(n\) we shouldn’t expect it to work on this sample, with \(n = 12\).