Homework 2

\(\textbf{4.23}\)
Consider the annual rates of return (including dividends) on the Dow-Jones industrial average for the years 1996-2005. These data, multiplied by 100, are
-0.6 3.1 25.3 -16.8 -7.1 -6.2 16.1 25.2 22.6 26
Use these 10 observations to complete the following.
\(\textbf{(a)}\) Construct a \(Q-Q\) plot. Do the data seem to be normally distributed? Explain.

dowj<-c(-.6, 3.1, 25.3, -16.8, -7.1, -6.2, 16.1, 25.2, 22.6, 26)
qqp<-qqnorm(dowj)
qqline(dowj)

The data seem to play close to our Theoretical Normal Quantiles here. Based on a small sample size of only 10, no egregious violations of normality are present.

\(\textbf{(b)}\) Carry out a test of normality based on the correlation coefficient \(r_Q\) Let the significance level be \(\alpha\)=.10.

cor(qqp$x, qqp$y)

## [1] 0.9500625

Using Table 4.2 in the text, we would reject our assumption of normality if \(r\) falls below .9351 based on \(n=10\) and \(\alpha=.10\). With our correlation coefficient of .95, we do NOT reject our assumption that the data is normal.

\(\textbf{4.29}\) Given the air pollution data in Table 1.5, examine the pairs \(X_5 = NO_2\) and \(X_6 = O_3\) for bivariate normality.
\(\textbf{(a)}\) Calculate statistical distances \((X_j = \bar{x})\prime S^{-1} (X_j = \bar{x}), j=1, 2, ..., 42\) where \(X_j\prime = [x_{j5}, x_{j6}]\).

no2<-c(12, 9, 5, 8, 8, 12, 12, 21, 11, 13, 10, 12, 18, 11, 8, 9, 7, 16, 13, 
       9, 14, 7, 13, 5, 10, 7, 11, 7, 9, 7, 10, 12, 8, 10, 6, 9, 6, 13, 9, 8, 11, 6)
o3<-c(8, 5, 6, 15, 10, 12, 15, 14, 11, 9, 3, 7, 10, 7, 10, 10, 7, 4, 2, 5, 
      4, 6, 11, 2, 23, 6, 11, 10, 8, 2, 7, 8, 4, 24, 9, 10, 12, 18, 25, 6, 14, 5)

x<-cbind(no2, o3)

#mean
one<-matrix(rep(1, 42), 42, 1)
xbar<-1/42*t(x)%*%one

#var-covar matrix
varcov<-1/41*t(x)%*%(diag(42)-(1/42)*one%*%t(one))%*%x

sqdist<-mahalanobis(x, xbar, varcov)
sqsortdist<-sort(sqdist)
sqsortdist

##  [1]  0.1224973  0.1224973  0.1379719  0.1388339  0.1388339  0.1901188
##  [7]  0.3159498  0.4135364  0.4135364  0.4606524  0.4606524  0.4760726
## [13]  0.6228096  0.6370218  0.6592206  0.6592206  0.7032485  0.7874152
## [19]  0.8162468  0.8856041  0.8987982  1.0360061  1.0360061  1.1471939
## [25]  1.1848895  1.3566301  1.4584229  1.6282902  1.8013611  1.8984708
## [31]  2.2488867  2.3770610  2.7741416  2.7782596  3.0089122  3.4437748
## [37]  4.7646873  5.6494392  6.1488606  7.0857237  8.4730649 10.6391792

\(\textbf{(b)}\) Determine the proportion of observations \(X_j\prime = [x_{j5}, x_{j6}], j=1, 2, ..., 42\) falling within the approximate 50% probability contour of a bivariate normal distribution.

###Formal Test
qcp<-qchisq(.5, 2)
t<-NULL
for(i in 1:length(sqsortdist)){
if (sqsortdist[i]<=qcp){
  t[i]<-1
} else {
  t[i]<-0
 }
}
prop.normality.test<-mean(t)
prop.normality.test

## [1] 0.6190476

Almost 62% of squared distances fall within the approximate 50% probability contour of a bivariate normal distribution. We fail to reject the assumption of bivariate normality here.

\(\textbf{(c)}\) Construct a chi-square plot of the ordered distances in Part a.

n<-42
prob<-(c(1:n)-0.5)/n
q1<-qchisq(prob,2) 
par(bg="gray99")
plot(q1,sqsortdist,xlab='Quantile of chi-square',ylab='d^2', 
     main = "Chi-Square Plot \n for Bivariate Normality\nAir Pollution Data", col = "steelblue", pch = 19)
abline(a=0, b=1, col="indianred")

\(\textbf{5.5}\)
The quantities \(\bar{x}\), \(S\), and \(S^{-1}\) are given in Example 5.3 for the transformed microwave-radiation data. Conduct a test of the null hypothesis \(H_0: \mu\prime = [.55, .60]\) at the \(\alpha=.05\) level of significance. Is your result consistent with the 95% confidence ellipse for \(\mu\) pictured in Figure 5.1? Explain.
We reject H_0 when: \[ T^2=n(\bar{x}-\mu_0)\prime S^{-1} (\bar{x}-\mu_0) > \frac{(n-1)p}{n-p}F_{p, n-p}(\alpha) \] \[T^2=42[.564-.55, .603-.6] \left[ \begin{array} {r r} 199.0457 & -159.5092 -159.5092 & 196.3190 \end{array} \right] \left[ \begin{array} {r} .014 \\ .003 \end{array} \right]\ \] \[T^2= \left[ \begin{array} {r r} 96.9407 & -69.05521 \end{array} \right]\left[ \begin{array} {r} .014 \\ .003 \end{array} \right]\ \] \[T^2= 1.15004 \]

Our critical value is given by:

p<-2
n<-42
crit.val<-sqrt((p*(n-1))/(n-p)*qf(.05, p, n-p, lower.tail = F))
crit.val

## [1] 2.573915

Since \(T^2=1.15004\) is less than our critical value, we fail to reject the null hypothesis \(H_0: \mu\prime = [.55, .60]\)

\(\textbf{5.10}\)
\(\textbf{(a)}\) Obtain the 95% \(T^2\) simultaneous confidence intervals for the four population means for length.

The sample data for the four mean lengths for the bears at Years 2, 3, 4, and 5 are:

l2<-c(141, 140, 145, 146, 150, 142, 139)
l3<-c(157, 168, 162, 159, 158, 140, 171)
l4<-c(168, 174, 172, 176, 168, 178, 176)
l5<-c(183, 170, 177, 171, 175, 189, 175)
bear.length<-cbind(l2, l3, l4, l5)

We can use the following function to obtain these simultaneous c.i.’s:

simult.ci<-function(x, n, p){
  crit.value<-sqrt(((p*(n-1))/(n-p)*qf(.05, p, n-p, lower.tail = F)))
  paste("(",mean(x)-crit.value*sqrt(var(x)/n),",", mean(x)+crit.value*sqrt(var(x)/n),")")
}
n<-7
p<-4
simult.ci(l2, n, p) #Length 2 T-SQUARE CI

## [1] "( 130.685102442155 , 155.886326129274 )"

simult.ci(l3, n, p) #LENGTH 3 T-SQUARE CI

## [1] "( 127.021626824303 , 191.549801747126 )"

simult.ci(l4, n, p) #LENGTH 4 T-SQUARE CI

## [1] "( 160.308158196709 , 185.977556089005 )"

simult.ci(l5, n, p) #LENGTH 5 T-SQUARE CI

## [1] "( 155.37486709837 , 198.910847187344 )"

\(\textbf{(b)}\) Refer to Part a. Obtain the 95% \(T^2\) simultaneous confidence intervals for the three successive yearly increases in mean length.

Since \[\bar{x}_i - \bar{x}_k \pm \sqrt{\frac{p(n-1)}{(n-p)}F_{p, n-p}(\alpha)} \sqrt{\frac{s_{ii}-2s_{ik}+s_{kk}}{n}} \] We may use a short R-function:

successive.diff<-function(x, i, j, n, p){
  mean.dif<-mean(x[,j]-x[,i])
  crit.value<-sqrt(((p*(n-1))/(n-p)*qf(.05, p, n-p, lower.tail = F)))
  var<-var(x)
  lb<-mean.dif-crit.value*sqrt((var[i,i]+var[j,j]-2*var[i,j])/n)
  ub<-mean.dif+crit.value*sqrt((var[i,i]+var[j,j]-2*var[i,j])/n)
  paste("(",lb, ",", ub, ")")
  
}


successive.diff(bear.length, 1, 2, 7, 4) #Length 3- Length 2 T-SQUARE CI

## [1] "( -21.2264919444937 , 53.2264919444937 )"

successive.diff(bear.length, 2, 3, 7, 4) #Length 4- Length 3 T-SQUARE CI

## [1] "( -22.7307683981843 , 50.44505411247 )"

successive.diff(bear.length, 3, 4, 7, 4) #Length 5- Length 4 T-SQUARE CI

## [1] "( -20.6538465602516 , 28.6538465602516 )"

\(\textbf{(c)}\) Obtain the 95% \(T^2\) confidence ellipse for the mean increase in length from 2 to 3 years and the mean increase in length from 4 to 5 years.

Using our bear.length data set we can get the paired differences for year3-year2 and year5-year4 and the variance-covariance matrices for the differences:

a.mat<-matrix(c(l3-l2,l5-l4), 7, 2)
a.mat

##      [,1] [,2]
## [1,]   16   15
## [2,]   28   -4
## [3,]   17    5
## [4,]   13   -5
## [5,]    8    7
## [6,]   -2   11
## [7,]   32   -1

var(a.mat)

##           [,1]      [,2]
## [1,] 133.00000 -49.66667
## [2,] -49.66667  58.33333

Then we can plot our confidence ellipse, notice we have the same x and y bounds from our simultaneous confidence intervals for the successive differences.

library(ellipse)
dbar<-matrix(c(16, 4), 2, 1)
n<-7
p<-4
S<-var(a.mat)
plot(ellipse(S,centre=dbar,t=sqrt(((n-1)*p/(n*(n-p)))*qf(0.95,p,n-p))),type="l",xlim=c(-40,60),ylim=c(-40,40), 
     main="95% Confidence Ellipse for \nSuccessive Yearly Length Increases \nYear3-Year2 and Year5-Year4", xlab = "Year3-Year2",
     ylab = "Year5-Year4")
points(dbar[1,],dbar[2,])
lines(x=c(-21.2264919444937 , 53.2264919444937), y=c(-20.6538465602516,-20.6538465602516), lty=2, lwd=1.5)
lines(x=c(-21.2264919444937 , 53.2264919444937), y=c(28.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(-21.2264919444937, -21.2264919444937), y=c(-20.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(53.2264919444937, 53.2264919444937), y=c(-20.6538465602516, 28.6538465602516), lty=2, lwd=1.5)

\(\textbf{(d)}\) Refer to Parts a and b. Construct the 95% Bonferroni confidence intervals for the set consisting of four mean lengths and three successive yearly increases in mean length.
Since we are making 7 comparisons here and we want a familywise confidence level of 95% for all 7 comparisons, our critical value becomes:

critical_value<-qt(.05/14, 6, lower.tail = F)
critical_value

## [1] 3.997061

The Bonferroni 95% CI’s for four mean lengths are given by: \(\bar{x} \pm t_{\alpha/2m, n-1} \sqrt(\frac{s_ii}{n})\)

And the formula for our comparison of successive yearly increases is given by:

\[\bar{x}_i - \bar{x}_k \pm \t_{n-1, \alpha/(2m)}\sqrt{\frac{s_{ii}-2s_{ik}+s_{kk}}{n}}\]

We can use a short r-function to calculate the four mean length CI’s:

bonferonni.cis<-function(m, x, n){
critical_value<-qt(.05/(2*m), n-1, lower.tail = F)
paste("(",mean(x)-critical_value*sqrt(var(x)/n),",", mean(x)+critical_value*sqrt(var(x)/n),")")
}

bonferonni.cis(7, l2, 7)#CI FOR LENGTH 2

## [1] "( 137.388361957808 , 149.18306661362 )"

bonferonni.cis(7, l3, 7)#CI FOR LENGTH 3

## [1] "( 144.185440294212 , 174.385988277217 )"

bonferonni.cis(7, l4, 7)#CI FOR LENGTH 4

## [1] "( 167.135947109617 , 179.149767176097 )"

bonferonni.cis(7, l5, 7)#CI FOR LENGTH 5

## [1] "( 166.9549783036 , 187.330735982114 )"

And the formula for our comparison of successive yearly increases is given by:

Using our successive.diff function from before but changing to the Bonferroni t-Critical Value:

successive.diff.bon<-function(x, i, j, n, m){
  mean.dif<-mean(x[,j]-x[,i])
  critical_value<-qt(.05/(2*m), n-1, lower.tail = F)
  var<-var(x)
  lb<-mean.dif-critical_value*sqrt((var[i,i]+var[j,j]-2*var[i,j])/n)
  ub<-mean.dif+critical_value*sqrt((var[i,i]+var[j,j]-2*var[i,j])/n)
  paste("(",lb, ",", ub, ")")
  
}
successive.diff.bon(bear.length, 1, 2, 7, 7) # Year 3 - Year 2

## [1] "( -1.42278404051086 , 33.4227840405109 )"

successive.diff.bon(bear.length, 2, 3, 7, 7) # Year 4 - Year 3

## [1] "( -3.26677194109882 , 30.9810576553845 )"

successive.diff.bon(bear.length, 3, 4, 7, 7) # Year 5 - Year 4

## [1] "( -7.53852060590654 , 15.5385206059065 )"

\(\textbf{(e)}\) Refer to Parts c and d. Compare the 95% Bonferroni confidence rectangle for the mean increase in length from 2 to 3 years and the mean increase in length from 4 to 5 years with the confidence ellipse produced by the \(T^2\)-procedure.

Using our \(T^2\) ellipse from before and plotting the Bonferroni CI’s on top, we have:

plot(ellipse(S,centre=dbar,t=sqrt(((n-1)*p/(n*(n-p)))*qf(0.95,p,n-p))),type="l",xlim=c(-40,60),ylim=c(-40,40), 
     main="95% Confidence Ellipses for \nSuccessive Yearly Length Increases \nYear3-Year2 and Year5-Year4", xlab = "Year3-Year2",
     ylab = "Year5-Year4")
points(dbar[1,],dbar[2,])
lines(x=c(-21.2264919444937 , 53.2264919444937), y=c(-20.6538465602516,-20.6538465602516), lty=2, lwd=1.5)
lines(x=c(-21.2264919444937 , 53.2264919444937), y=c(28.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(-21.2264919444937, -21.2264919444937), y=c(-20.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(53.2264919444937, 53.2264919444937), y=c(-20.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(-1.42278404051086 , 33.42278404051097), y=c(-7.53852060590654, -7.53852060590654), lty=2, lwd=1.5)
lines(x=c(-1.42278404051086 , 33.42278404051097), y=c(15.5385206059065, 15.5385206059065), lty=2, lwd=1.5)
lines(x=c(-1.42278404051086, -1.42278404051086), y=c( -7.53852060590654 , 15.5385206059065), lty=2, lwd=1.5)
lines(x=c(33.42278404051097, 33.42278404051097), y=c( -7.53852060590654 , 15.5385206059065), lty=2, lwd=1.5)
text(x=-1, y=-7.5, "Bonferroni")
text(x=-21, y=-21, "T^2")

In this case we can clearly see that we get much more precise intervals from the Bonferroni method, while maintaining a familywise confidence coefficient for all 7 pairwise comparisons of 95%.

Homework 2

Dan McGuire

October 1, 2015