\(\textbf{4.23}\)
Consider the annual rates of return (including dividends) on the Dow-Jones industrial average for the years 1996-2005. These data, multiplied by 100, are
-0.6 3.1 25.3 -16.8 -7.1 -6.2 16.1 25.2 22.6 26
Use these 10 observations to complete the following.
\(\textbf{(a)}\) Construct a \(Q-Q\) plot. Do the data seem to be normally distributed? Explain.
dowj<-c(-.6, 3.1, 25.3, -16.8, -7.1, -6.2, 16.1, 25.2, 22.6, 26)
qqp<-qqnorm(dowj)
qqline(dowj)
The data seem to play close to our Theoretical Normal Quantiles here. Based on a small sample size of only 10, no egregious violations of normality are present.
\(\textbf{(b)}\) Carry out a test of normality based on the correlation coefficient \(r_Q\) Let the significance level be \(\alpha\)=.10.
cor(qqp$x, qqp$y)
## [1] 0.9500625
Using Table 4.2 in the text, we would reject our assumption of normality if \(r\) falls below .9351 based on \(n=10\) and \(\alpha=.10\). With our correlation coefficient of .95, we do NOT reject our assumption that the data is normal.
\(\textbf{4.29}\) Given the air pollution data in Table 1.5, examine the pairs \(X_5 = NO_2\) and \(X_6 = O_3\) for bivariate normality.
\(\textbf{(a)}\) Calculate statistical distances \((X_j = \bar{x})\prime S^{-1} (X_j = \bar{x}), j=1, 2, ..., 42\) where \(X_j\prime = [x_{j5}, x_{j6}]\).
no2<-c(12, 9, 5, 8, 8, 12, 12, 21, 11, 13, 10, 12, 18, 11, 8, 9, 7, 16, 13,
9, 14, 7, 13, 5, 10, 7, 11, 7, 9, 7, 10, 12, 8, 10, 6, 9, 6, 13, 9, 8, 11, 6)
o3<-c(8, 5, 6, 15, 10, 12, 15, 14, 11, 9, 3, 7, 10, 7, 10, 10, 7, 4, 2, 5,
4, 6, 11, 2, 23, 6, 11, 10, 8, 2, 7, 8, 4, 24, 9, 10, 12, 18, 25, 6, 14, 5)
x<-cbind(no2, o3)
#mean
one<-matrix(rep(1, 42), 42, 1)
xbar<-1/42*t(x)%*%one
#var-covar matrix
varcov<-1/41*t(x)%*%(diag(42)-(1/42)*one%*%t(one))%*%x
sqdist<-mahalanobis(x, xbar, varcov)
sqsortdist<-sort(sqdist)
sqsortdist
## [1] 0.1224973 0.1224973 0.1379719 0.1388339 0.1388339 0.1901188
## [7] 0.3159498 0.4135364 0.4135364 0.4606524 0.4606524 0.4760726
## [13] 0.6228096 0.6370218 0.6592206 0.6592206 0.7032485 0.7874152
## [19] 0.8162468 0.8856041 0.8987982 1.0360061 1.0360061 1.1471939
## [25] 1.1848895 1.3566301 1.4584229 1.6282902 1.8013611 1.8984708
## [31] 2.2488867 2.3770610 2.7741416 2.7782596 3.0089122 3.4437748
## [37] 4.7646873 5.6494392 6.1488606 7.0857237 8.4730649 10.6391792
\(\textbf{(b)}\) Determine the proportion of observations \(X_j\prime = [x_{j5}, x_{j6}], j=1, 2, ..., 42\) falling within the approximate 50% probability contour of a bivariate normal distribution.
###Formal Test
qcp<-qchisq(.5, 2)
t<-NULL
for(i in 1:length(sqsortdist)){
if (sqsortdist[i]<=qcp){
t[i]<-1
} else {
t[i]<-0
}
}
prop.normality.test<-mean(t)
prop.normality.test
## [1] 0.6190476
Almost 62% of squared distances fall within the approximate 50% probability contour of a bivariate normal distribution. We fail to reject the assumption of bivariate normality here.
\(\textbf{(c)}\) Construct a chi-square plot of the ordered distances in Part a.
n<-42
prob<-(c(1:n)-0.5)/n
q1<-qchisq(prob,2)
par(bg="gray99")
plot(q1,sqsortdist,xlab='Quantile of chi-square',ylab='d^2',
main = "Chi-Square Plot \n for Bivariate Normality\nAir Pollution Data", col = "steelblue", pch = 19)
abline(a=0, b=1, col="indianred")
\(\textbf{5.5}\)
The quantities \(\bar{x}\), \(S\), and \(S^{-1}\) are given in Example 5.3 for the transformed microwave-radiation data. Conduct a test of the null hypothesis \(H_0: \mu\prime = [.55, .60]\) at the \(\alpha=.05\) level of significance. Is your result consistent with the 95% confidence ellipse for \(\mu\) pictured in Figure 5.1? Explain.
We reject H_0 when: \[ T^2=n(\bar{x}-\mu_0)\prime S^{-1} (\bar{x}-\mu_0) > \frac{(n-1)p}{n-p}F_{p, n-p}(\alpha)
\] \[T^2=42[.564-.55, .603-.6]
\left[ \begin{array} {r r}
199.0457 & -159.5092
-159.5092 & 196.3190
\end{array}
\right]
\left[ \begin{array} {r}
.014 \\
.003
\end{array}
\right]\
\] \[T^2= \left[ \begin{array} {r r}
96.9407 & -69.05521
\end{array}
\right]\left[ \begin{array} {r}
.014 \\
.003
\end{array}
\right]\
\] \[T^2=
1.15004
\]
Our critical value is given by:
p<-2
n<-42
crit.val<-sqrt((p*(n-1))/(n-p)*qf(.05, p, n-p, lower.tail = F))
crit.val
## [1] 2.573915
Since \(T^2=1.15004\) is less than our critical value, we fail to reject the null hypothesis \(H_0: \mu\prime = [.55, .60]\)
\(\textbf{5.10}\)
\(\textbf{(a)}\) Obtain the 95% \(T^2\) simultaneous confidence intervals for the four population means for length.
The sample data for the four mean lengths for the bears at Years 2, 3, 4, and 5 are:
l2<-c(141, 140, 145, 146, 150, 142, 139)
l3<-c(157, 168, 162, 159, 158, 140, 171)
l4<-c(168, 174, 172, 176, 168, 178, 176)
l5<-c(183, 170, 177, 171, 175, 189, 175)
bear.length<-cbind(l2, l3, l4, l5)
We can use the following function to obtain these simultaneous c.i.’s:
simult.ci<-function(x, n, p){
crit.value<-sqrt(((p*(n-1))/(n-p)*qf(.05, p, n-p, lower.tail = F)))
paste("(",mean(x)-crit.value*sqrt(var(x)/n),",", mean(x)+crit.value*sqrt(var(x)/n),")")
}
n<-7
p<-4
simult.ci(l2, n, p) #Length 2 T-SQUARE CI
## [1] "( 130.685102442155 , 155.886326129274 )"
simult.ci(l3, n, p) #LENGTH 3 T-SQUARE CI
## [1] "( 127.021626824303 , 191.549801747126 )"
simult.ci(l4, n, p) #LENGTH 4 T-SQUARE CI
## [1] "( 160.308158196709 , 185.977556089005 )"
simult.ci(l5, n, p) #LENGTH 5 T-SQUARE CI
## [1] "( 155.37486709837 , 198.910847187344 )"
\(\textbf{(b)}\) Refer to Part a. Obtain the 95% \(T^2\) simultaneous confidence intervals for the three successive yearly increases in mean length.
Since \[\bar{x}_i - \bar{x}_k \pm \sqrt{\frac{p(n-1)}{(n-p)}F_{p, n-p}(\alpha)} \sqrt{\frac{s_{ii}-2s_{ik}+s_{kk}}{n}}
\] We may use a short R-function:
successive.diff<-function(x, i, j, n, p){
mean.dif<-mean(x[,j]-x[,i])
crit.value<-sqrt(((p*(n-1))/(n-p)*qf(.05, p, n-p, lower.tail = F)))
var<-var(x)
lb<-mean.dif-crit.value*sqrt((var[i,i]+var[j,j]-2*var[i,j])/n)
ub<-mean.dif+crit.value*sqrt((var[i,i]+var[j,j]-2*var[i,j])/n)
paste("(",lb, ",", ub, ")")
}
successive.diff(bear.length, 1, 2, 7, 4) #Length 3- Length 2 T-SQUARE CI
## [1] "( -21.2264919444937 , 53.2264919444937 )"
successive.diff(bear.length, 2, 3, 7, 4) #Length 4- Length 3 T-SQUARE CI
## [1] "( -22.7307683981843 , 50.44505411247 )"
successive.diff(bear.length, 3, 4, 7, 4) #Length 5- Length 4 T-SQUARE CI
## [1] "( -20.6538465602516 , 28.6538465602516 )"
\(\textbf{(c)}\) Obtain the 95% \(T^2\) confidence ellipse for the mean increase in length from 2 to 3 years and the mean increase in length from 4 to 5 years.
Using our bear.length data set we can get the paired differences for year3-year2 and year5-year4 and the variance-covariance matrices for the differences:
a.mat<-matrix(c(l3-l2,l5-l4), 7, 2)
a.mat
## [,1] [,2]
## [1,] 16 15
## [2,] 28 -4
## [3,] 17 5
## [4,] 13 -5
## [5,] 8 7
## [6,] -2 11
## [7,] 32 -1
var(a.mat)
## [,1] [,2]
## [1,] 133.00000 -49.66667
## [2,] -49.66667 58.33333
Then we can plot our confidence ellipse, notice we have the same x and y bounds from our simultaneous confidence intervals for the successive differences.
library(ellipse)
dbar<-matrix(c(16, 4), 2, 1)
n<-7
p<-4
S<-var(a.mat)
plot(ellipse(S,centre=dbar,t=sqrt(((n-1)*p/(n*(n-p)))*qf(0.95,p,n-p))),type="l",xlim=c(-40,60),ylim=c(-40,40),
main="95% Confidence Ellipse for \nSuccessive Yearly Length Increases \nYear3-Year2 and Year5-Year4", xlab = "Year3-Year2",
ylab = "Year5-Year4")
points(dbar[1,],dbar[2,])
lines(x=c(-21.2264919444937 , 53.2264919444937), y=c(-20.6538465602516,-20.6538465602516), lty=2, lwd=1.5)
lines(x=c(-21.2264919444937 , 53.2264919444937), y=c(28.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(-21.2264919444937, -21.2264919444937), y=c(-20.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(53.2264919444937, 53.2264919444937), y=c(-20.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
\(\textbf{(d)}\) Refer to Parts a and b. Construct the 95% Bonferroni confidence intervals for the set consisting of four mean lengths and three successive yearly increases in mean length.
Since we are making 7 comparisons here and we want a familywise confidence level of 95% for all 7 comparisons, our critical value becomes:
critical_value<-qt(.05/14, 6, lower.tail = F)
critical_value
## [1] 3.997061
The Bonferroni 95% CI’s for four mean lengths are given by: \(\bar{x} \pm t_{\alpha/2m, n-1} \sqrt(\frac{s_ii}{n})\)
And the formula for our comparison of successive yearly increases is given by:
\[\bar{x}_i - \bar{x}_k \pm \t_{n-1, \alpha/(2m)}\sqrt{\frac{s_{ii}-2s_{ik}+s_{kk}}{n}}\]
We can use a short r-function to calculate the four mean length CI’s:
bonferonni.cis<-function(m, x, n){
critical_value<-qt(.05/(2*m), n-1, lower.tail = F)
paste("(",mean(x)-critical_value*sqrt(var(x)/n),",", mean(x)+critical_value*sqrt(var(x)/n),")")
}
bonferonni.cis(7, l2, 7)#CI FOR LENGTH 2
## [1] "( 137.388361957808 , 149.18306661362 )"
bonferonni.cis(7, l3, 7)#CI FOR LENGTH 3
## [1] "( 144.185440294212 , 174.385988277217 )"
bonferonni.cis(7, l4, 7)#CI FOR LENGTH 4
## [1] "( 167.135947109617 , 179.149767176097 )"
bonferonni.cis(7, l5, 7)#CI FOR LENGTH 5
## [1] "( 166.9549783036 , 187.330735982114 )"
And the formula for our comparison of successive yearly increases is given by:
Using our successive.diff function from before but changing to the Bonferroni t-Critical Value:
successive.diff.bon<-function(x, i, j, n, m){
mean.dif<-mean(x[,j]-x[,i])
critical_value<-qt(.05/(2*m), n-1, lower.tail = F)
var<-var(x)
lb<-mean.dif-critical_value*sqrt((var[i,i]+var[j,j]-2*var[i,j])/n)
ub<-mean.dif+critical_value*sqrt((var[i,i]+var[j,j]-2*var[i,j])/n)
paste("(",lb, ",", ub, ")")
}
successive.diff.bon(bear.length, 1, 2, 7, 7) # Year 3 - Year 2
## [1] "( -1.42278404051086 , 33.4227840405109 )"
successive.diff.bon(bear.length, 2, 3, 7, 7) # Year 4 - Year 3
## [1] "( -3.26677194109882 , 30.9810576553845 )"
successive.diff.bon(bear.length, 3, 4, 7, 7) # Year 5 - Year 4
## [1] "( -7.53852060590654 , 15.5385206059065 )"
\(\textbf{(e)}\) Refer to Parts c and d. Compare the 95% Bonferroni confidence rectangle for the mean increase in length from 2 to 3 years and the mean increase in length from 4 to 5 years with the confidence ellipse produced by the \(T^2\)-procedure.
Using our \(T^2\) ellipse from before and plotting the Bonferroni CI’s on top, we have:
plot(ellipse(S,centre=dbar,t=sqrt(((n-1)*p/(n*(n-p)))*qf(0.95,p,n-p))),type="l",xlim=c(-40,60),ylim=c(-40,40),
main="95% Confidence Ellipses for \nSuccessive Yearly Length Increases \nYear3-Year2 and Year5-Year4", xlab = "Year3-Year2",
ylab = "Year5-Year4")
points(dbar[1,],dbar[2,])
lines(x=c(-21.2264919444937 , 53.2264919444937), y=c(-20.6538465602516,-20.6538465602516), lty=2, lwd=1.5)
lines(x=c(-21.2264919444937 , 53.2264919444937), y=c(28.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(-21.2264919444937, -21.2264919444937), y=c(-20.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(53.2264919444937, 53.2264919444937), y=c(-20.6538465602516, 28.6538465602516), lty=2, lwd=1.5)
lines(x=c(-1.42278404051086 , 33.42278404051097), y=c(-7.53852060590654, -7.53852060590654), lty=2, lwd=1.5)
lines(x=c(-1.42278404051086 , 33.42278404051097), y=c(15.5385206059065, 15.5385206059065), lty=2, lwd=1.5)
lines(x=c(-1.42278404051086, -1.42278404051086), y=c( -7.53852060590654 , 15.5385206059065), lty=2, lwd=1.5)
lines(x=c(33.42278404051097, 33.42278404051097), y=c( -7.53852060590654 , 15.5385206059065), lty=2, lwd=1.5)
text(x=-1, y=-7.5, "Bonferroni")
text(x=-21, y=-21, "T^2")
In this case we can clearly see that we get much more precise intervals from the Bonferroni method, while maintaining a familywise confidence coefficient for all 7 pairwise comparisons of 95%.