Here observe that,
\[ s_{ij} = \frac{1}{n-1} \sum_{k=1}^n (y_{ik} - \bar{y_{i0}})(y_{jk} - \bar{y_{j0}}) \]
So, for \(i=j\), \(s_{ii}\) is the sample variance \(Y_i,\ \forall i=1,2,3\).
Again, observe that, for \(i \neq j\), \(s_{ij}\) is the sample covariance \((Y_i,Y_j),\ \forall i,j=1,2,3\).
Now, \[ \frac{1}{n-1} \sum_{i=1}^n (y_{i} - \bar{y})(y_{i} - \bar{y})'\\ \] \[= \frac{1}{n-1} \sum_{i=1}^n ( \begin{pmatrix} y_{1i}\\ y_{2i}\\ y_{3i} \end{pmatrix} - \begin{pmatrix} \bar{y_1}\\ \bar{y_2}\\ \bar{y_3} \end{pmatrix} ) ( \begin{pmatrix} y_{1i}\\ y_{2i}\\ y_{3i} \end{pmatrix} - \begin{pmatrix} \bar{y_1}\\ \bar{y_2}\\ \bar{y_3} \end{pmatrix} )'\\\] \[=\frac{1}{n-1} \sum_{i=1}^n \begin{pmatrix} y_{1i} - \bar{y_1}\\ y_{2i} - \bar{y_2}\\ y_{3i} - \bar{y_3} \end{pmatrix}\begin{pmatrix} y_{1i} - \bar{y_1}\\ y_{2i} - \bar{y_2}\\ y_{3i} - \bar{y_3} \end{pmatrix}' \\\] \[= \frac{1}{n-1} \sum_{i=1}^n \begin{pmatrix} (y_{1i} - \bar{y_1})(y_{1i} - \bar{y_1}) & (y_{1i} - \bar{y_1})(y_{2i} - \bar{y_2}) & (y_{1i} - \bar{y_1})(y_{3i} - \bar{y_3})\\ (y_{2i} - \bar{y_2})(y_{1i} - \bar{y_1}) & (y_{2i} - \bar{y_2})(y_{2i} - \bar{y_2}) & (y_{2i} - \bar{y_2})(y_{3i} - \bar{y_3})\\ (y_{3i} - \bar{y_3})(y_{1i} - \bar{y_1}) & (y_{3i} - \bar{y_3})(y_{2i} - \bar{y_2}) & (y_{3i} - \bar{y_3})(y_{3i} - \bar{y_3}) \end{pmatrix} \\\] \[= \begin{pmatrix} \frac{1}{n-1} \sum_{i=1}^n (y_{1i} - \bar{y_1})(y_{1i} - \bar{y_1}) & \frac{1}{n-1} \sum_{i=1}^n (y_{1i} - \bar{y_1})(y_{2i} - \bar{y_2}) & \frac{1}{n-1} \sum_{i=1}^n (y_{1i} - \bar{y_1})(y_{3i} - \bar{y_3})\\ \frac{1}{n-1} \sum_{i=1}^n (y_{2i} - \bar{y_2})(y_{1i} - \bar{y_1}) & \frac{1}{n-1} \sum_{i=1}^n (y_{2i} - \bar{y_2})(y_{2i} - \bar{y_2}) & \frac{1}{n-1} \sum_{i=1}^n (y_{2i} - \bar{y_2})(y_{3i} - \bar{y_3})\\ \frac{1}{n-1} \sum_{i=1}^n (y_{3i} - \bar{y_3})(y_{1i} - \bar{y_1}) & \frac{1}{n-1} \sum_{i=1}^n (y_{3i} - \bar{y_3})(y_{2i} - \bar{y_2}) & \frac{1}{n-1} \sum_{i=1}^n (y_{3i} - \bar{y_3})(y_{3i} - \bar{y_3}) \end{pmatrix} \]
Hence, \[ \frac{1}{n-1} \sum_{i=1}^n (y_{i} - \bar{y})(y_{i} - \bar{y})'\] gives the sample variance-covariance matrix.
Let us first read the “T3_6_BONE.DAT” data as follows,
Q2 = read.table("C:/Users/Lenovo/OneDrive/Desktop/instadata/R Programming_Kim_15th Feb/T3_6_BONE.DAT")
colnames(Q2) = c("Serial No.", "y1", "y2", "y3", "y4")
head(Q2)
The scatterplot matrix for pairs of variables is as follows,
plot(Q2[,2:5])
Now, \(\bar{y}, S, R\) for \(y_i's\) are as follows,
ybar = c(0,0,0,0)
for(i in 1:4){
ybar[i] = mean(Q2[,i+1])
}
ybar
## [1] 48.655 49.625 50.570 51.445
S = cov(Q2[,2:4])
S
## y1 y2 y3
## y1 6.329974 6.189079 5.777000
## y2 6.189079 6.449342 6.153421
## y3 5.777000 6.153421 6.918000
R = cor(Q2[,2:4])
R
## y1 y2 y3
## y1 1.0000000 0.9686511 0.8729938
## y2 0.9686511 1.0000000 0.9212312
## y3 0.8729938 0.9212312 1.0000000
Let us first read the “T3_7_SONS.DAT” data as follows,
Q3 = read.table("C:/Users/Lenovo/OneDrive/Desktop/instadata/R Programming_Kim_15th Feb/T3_7_SoNS.DAT")
colnames(Q3) = c("y1", "y2", "x1", "x2")
head(Q3)
Now, \(\bar{y_1}, \bar{y_2}, \bar{x_1}, \bar{x_2},\) are as follows,
ybar = c(0,0,0,0)
for(i in 1:4){
ybar[i] = mean(Q3[,i])
}
ybar = data.frame( cbind(c("y1bar", "y2bar", "x1bar", "x2bar"),ybar))
ybar
Now, the covariance matrix is as follows,
S3 = cov(Q3)
S3
## y1 y2 x1 x2
## y1 95.29333 52.86833 69.66167 46.11167
## y2 52.86833 54.36000 51.31167 35.05333
## x1 69.66167 51.31167 100.80667 56.54000
## x2 46.11167 35.05333 56.54000 45.02333
Syy = S3[1:2,1:2]
Syy
## y1 y2
## y1 95.29333 52.86833
## y2 52.86833 54.36000
Sxx = S3[3:4,3:4]
Sxx
## x1 x2
## x1 100.8067 56.54000
## x2 56.5400 45.02333
Syx = S3[1:2,3:4]
Syx
## x1 x2
## y1 69.66167 46.11167
## y2 51.31167 35.05333
Sxy = S3[3:4,1:2]
Sxy
## y1 y2
## x1 69.66167 51.31167
## x2 46.11167 35.05333
The given \(\mu\) and \(\Sigma\) is as follows,
mu = c(3,1,4)
mu
## [1] 3 1 4
sig = matrix(c(6,1,-2,1,13,4,-2,4,4), ncol = 3, byrow = TRUE)
sig
## [,1] [,2] [,3]
## [1,] 6 1 -2
## [2,] 1 13 4
## [3,] -2 4 4
Now, the distribution of \[z = 2y_1 - y_2 +3y_3\] follows univariate normal with mean \(\mu_z\) and variance \(\sigma_z^2\) as follows,
mu_z = t(c(2,-1,3))%*%mu
mu_z
## [,1]
## [1,] 17
sig_z = t(c(2,-1,3))%*%sig%*%c(2,-1,3)
sig_z
## [,1]
## [1,] 21
Hence, \(z \sim N(17,21)\).
The joint distribution of \(z_1 = y_1 + y_2 + y_3\ \text{and}\ z_2 = y_1 - y_2 + 2y_3\) is a bivariate normal with mean vector,
l = matrix(c(1,1,1,1,-1,2), ncol = 3, byrow = TRUE)
l%*%mu
## [,1]
## [1,] 8
## [2,] 10
and variance-covariance matrix,
l%*%sig%*%t(l)
## [,1] [,2]
## [1,] 29 -1
## [2,] -1 9
Here observe that, \(\Sigma_{2,2} = 13 = \sigma^2_{y_2}\) and \(\mu_{y_2} = 1\), so \(y_2 \sim N(1,13)\), a univariate normal.
The joint distribution of \(y_1\) and \(y_3\) is a bivariate normal with mean,
mu_13 = mu[c(1,3)]
mu_13
## [1] 3 4
and variance-covariance matrix,
sig_13 = sig[c(1,3),c(1,3)]
sig_13
## [,1] [,2]
## [1,] 6 -2
## [2,] -2 4
The joint distribution of \(y_1, y_3, \frac{1}{2}(y_1+y_3)\) is a multivariate normal with mean vector,
l2 = matrix(c(1,0,0,1,0.5,0.5), ncol = 2, byrow = TRUE)
l2%*%mu_13
## [,1]
## [1,] 3.0
## [2,] 4.0
## [3,] 3.5
and variance-covariance matrix,
l2%*%sig_13%*%t(l2)
## [,1] [,2] [,3]
## [1,] 6 -2 2.0
## [2,] -2 4 1.0
## [3,] 2 1 1.5
The \(z = (\Sigma^{\frac{1}{2}})^{-1} (y - \mu)\) vector which follows \(N_3(0,I)\), the \(\Sigma^{\frac{1}{2}}\) is,
diag(sqrt(svd(sig)$d))%*%svd(sig)$u
## [,1] [,2] [,3]
## [1,] -0.1051331 3.4287894 1.6611224
## [2,] -2.4894215 0.3446727 -0.8690093
## [3,] -0.4149634 -0.4937462 0.9928982
and \(\mu\) is,
mu
## [1] 3 1 4
The distribution of \((y- \mu)' \Sigma^{-1} (y- \mu) \sim \chi^2_3\).
Let us first read the “T4_3_HEMATOL.DAT” data as follows,
Q6 = read.table("C:/Users/Lenovo/OneDrive/Desktop/instadata/R Programming_Kim_15th Feb/T4_3_HEMATOL.DAT")
head(Q6)
For checking the univariate normality, we will perform normal QQ-plot as follows,
par(mfrow=c(2,3))
qqnorm(Q6[,1], main = "Normal Q-Q plot for v1")
qqline(Q6[,1])
qqnorm(Q6[,2], main = "Normal Q-Q plot for v2")
qqline(Q6[,2])
qqnorm(Q6[,3], main = "Normal Q-Q plot for v3")
qqline(Q6[,3])
qqnorm(Q6[,4], main = "Normal Q-Q plot for v4")
qqline(Q6[,4])
qqnorm(Q6[,5], main = "Normal Q-Q plot for v5")
qqline(Q6[,5])
qqnorm(Q6[,6], main = "Normal Q-Q plot for v6")
qqline(Q6[,6])
Here observe that, for \(V_1 - V_6\) except \(V_3, V_4, C_6\), the points are near to the theoretical line, so all variables follows univariate normal. But \(V_3, V_4, C_6\) doesn’t follow univariate normal.
For testing the pairwise bivariate normal, we will perform Mardia’s normality test as follows,
library(MVN)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
## sROC 0.1-2 loaded
for(i in 1:6){
for( j in (i+1):6){
if(i != j && i<7 && j<7){
print(c("Bivariate normality test for pairs"))
print(c(i, j))
print(mvn(Q6[,c(i,j)],mvnTest = "mardia")$multivariateNormality)
}
}
}
## [1] "Bivariate normality test for pairs"
## [1] 1 2
## Test Statistic p value Result
## 1 Mardia Skewness 3.17299135228571 0.529304320552425 YES
## 2 Mardia Kurtosis 0.105409659283307 0.91605075176653 YES
## 3 MVN <NA> <NA> YES
## [1] "Bivariate normality test for pairs"
## [1] 1 3
## Test Statistic p value Result
## 1 Mardia Skewness 7.80365008980945 0.0990413884570596 YES
## 2 Mardia Kurtosis 0.00486915626774317 0.996114990741246 YES
## 3 MVN <NA> <NA> YES
## [1] "Bivariate normality test for pairs"
## [1] 1 4
## Test Statistic p value Result
## 1 Mardia Skewness 22.6943249173763 0.000145732736977252 NO
## 2 Mardia Kurtosis 3.22716118548049 0.00125025009686275 NO
## 3 MVN <NA> <NA> NO
## [1] "Bivariate normality test for pairs"
## [1] 1 5
## Test Statistic p value Result
## 1 Mardia Skewness 5.51323487225013 0.238568570730039 YES
## 2 Mardia Kurtosis -0.291720099476145 0.770500640182101 YES
## 3 MVN <NA> <NA> YES
## [1] "Bivariate normality test for pairs"
## [1] 1 6
## Test Statistic p value Result
## 1 Mardia Skewness 13.6415851854692 0.00853135832924726 NO
## 2 Mardia Kurtosis 1.03563103855275 0.300374305897019 YES
## 3 MVN <NA> <NA> NO
## [1] "Bivariate normality test for pairs"
## [1] 2 3
## Test Statistic p value Result
## 1 Mardia Skewness 5.61491635985261 0.229811388196661 YES
## 2 Mardia Kurtosis 0.152641942140793 0.878680647504501 YES
## 3 MVN <NA> <NA> YES
## [1] "Bivariate normality test for pairs"
## [1] 2 4
## Test Statistic p value Result
## 1 Mardia Skewness 21.3051407790445 0.000275466757370697 NO
## 2 Mardia Kurtosis 3.17768566924064 0.0014845560243717 NO
## 3 MVN <NA> <NA> NO
## [1] "Bivariate normality test for pairs"
## [1] 2 5
## Test Statistic p value Result
## 1 Mardia Skewness 3.12790583281286 0.536653448095711 YES
## 2 Mardia Kurtosis -0.0124458720775812 0.990069887186201 YES
## 3 MVN <NA> <NA> YES
## [1] "Bivariate normality test for pairs"
## [1] 2 6
## Test Statistic p value Result
## 1 Mardia Skewness 11.0529357963604 0.0259754866971697 NO
## 2 Mardia Kurtosis 1.28613594935628 0.198395619785664 YES
## 3 MVN <NA> <NA> NO
## [1] "Bivariate normality test for pairs"
## [1] 3 4
## Test Statistic p value Result
## 1 Mardia Skewness 24.1173585198822 7.5662623095289e-05 NO
## 2 Mardia Kurtosis 3.79685461457691 0.000146543675640487 NO
## 3 MVN <NA> <NA> NO
## [1] "Bivariate normality test for pairs"
## [1] 3 5
## Test Statistic p value Result
## 1 Mardia Skewness 16.1932291046616 0.00277054220726324 NO
## 2 Mardia Kurtosis 1.03606601036006 0.300171347953391 YES
## 3 MVN <NA> <NA> NO
## [1] "Bivariate normality test for pairs"
## [1] 3 6
## Test Statistic p value Result
## 1 Mardia Skewness 14.96634638818 0.00477152834872363 NO
## 2 Mardia Kurtosis 2.55916531203404 0.0104923833641331 NO
## 3 MVN <NA> <NA> NO
## [1] "Bivariate normality test for pairs"
## [1] 4 5
## Test Statistic p value Result
## 1 Mardia Skewness 22.6608888123325 0.000147988935524286 NO
## 2 Mardia Kurtosis 2.98125944715184 0.00287065475914439 NO
## 3 MVN <NA> <NA> NO
## [1] "Bivariate normality test for pairs"
## [1] 4 6
## Test Statistic p value Result
## 1 Mardia Skewness 32.4870666460666 1.52107059888688e-06 NO
## 2 Mardia Kurtosis 5.38066231210181 7.42122912011922e-08 NO
## 3 MVN <NA> <NA> NO
## [1] "Bivariate normality test for pairs"
## [1] 5 6
## Test Statistic p value Result
## 1 Mardia Skewness 14.0478949111802 0.00714375340802133 NO
## 2 Mardia Kurtosis 1.96296737623216 0.0496499649112825 NO
## 3 MVN <NA> <NA> NO
Here observe that, all pairs except the pairs \((1,4), (1,6), (2,4), (2,6), (3,4), (3,5), (3,6), (4,5), (4,6), (5,6)\) follows bivariate normal.