Applied Multivariate Data Analysis

Problem 1
Problem 2
Problem 3
Problem 4
Problem 5
Problem 6

Problem 1

Here observe that,

\[ s_{ij} = \frac{1}{n-1} \sum_{k=1}^n (y_{ik} - \bar{y_{i0}})(y_{jk} - \bar{y_{j0}}) \]

So, for \(i=j\), \(s_{ii}\) is the sample variance \(Y_i,\ \forall i=1,2,3\).

Again, observe that, for \(i \neq j\), \(s_{ij}\) is the sample covariance \((Y_i,Y_j),\ \forall i,j=1,2,3\).

Now, \[ \frac{1}{n-1} \sum_{i=1}^n (y_{i} - \bar{y})(y_{i} - \bar{y})'\\ \] \[= \frac{1}{n-1} \sum_{i=1}^n ( \begin{pmatrix} y_{1i}\\ y_{2i}\\ y_{3i} \end{pmatrix} - \begin{pmatrix} \bar{y_1}\\ \bar{y_2}\\ \bar{y_3} \end{pmatrix} ) ( \begin{pmatrix} y_{1i}\\ y_{2i}\\ y_{3i} \end{pmatrix} - \begin{pmatrix} \bar{y_1}\\ \bar{y_2}\\ \bar{y_3} \end{pmatrix} )'\\\] \[=\frac{1}{n-1} \sum_{i=1}^n \begin{pmatrix} y_{1i} - \bar{y_1}\\ y_{2i} - \bar{y_2}\\ y_{3i} - \bar{y_3} \end{pmatrix}\begin{pmatrix} y_{1i} - \bar{y_1}\\ y_{2i} - \bar{y_2}\\ y_{3i} - \bar{y_3} \end{pmatrix}' \\\] \[= \frac{1}{n-1} \sum_{i=1}^n \begin{pmatrix} (y_{1i} - \bar{y_1})(y_{1i} - \bar{y_1}) & (y_{1i} - \bar{y_1})(y_{2i} - \bar{y_2}) & (y_{1i} - \bar{y_1})(y_{3i} - \bar{y_3})\\ (y_{2i} - \bar{y_2})(y_{1i} - \bar{y_1}) & (y_{2i} - \bar{y_2})(y_{2i} - \bar{y_2}) & (y_{2i} - \bar{y_2})(y_{3i} - \bar{y_3})\\ (y_{3i} - \bar{y_3})(y_{1i} - \bar{y_1}) & (y_{3i} - \bar{y_3})(y_{2i} - \bar{y_2}) & (y_{3i} - \bar{y_3})(y_{3i} - \bar{y_3}) \end{pmatrix} \\\] \[= \begin{pmatrix} \frac{1}{n-1} \sum_{i=1}^n (y_{1i} - \bar{y_1})(y_{1i} - \bar{y_1}) & \frac{1}{n-1} \sum_{i=1}^n (y_{1i} - \bar{y_1})(y_{2i} - \bar{y_2}) & \frac{1}{n-1} \sum_{i=1}^n (y_{1i} - \bar{y_1})(y_{3i} - \bar{y_3})\\ \frac{1}{n-1} \sum_{i=1}^n (y_{2i} - \bar{y_2})(y_{1i} - \bar{y_1}) & \frac{1}{n-1} \sum_{i=1}^n (y_{2i} - \bar{y_2})(y_{2i} - \bar{y_2}) & \frac{1}{n-1} \sum_{i=1}^n (y_{2i} - \bar{y_2})(y_{3i} - \bar{y_3})\\ \frac{1}{n-1} \sum_{i=1}^n (y_{3i} - \bar{y_3})(y_{1i} - \bar{y_1}) & \frac{1}{n-1} \sum_{i=1}^n (y_{3i} - \bar{y_3})(y_{2i} - \bar{y_2}) & \frac{1}{n-1} \sum_{i=1}^n (y_{3i} - \bar{y_3})(y_{3i} - \bar{y_3}) \end{pmatrix} \]

Hence, \[ \frac{1}{n-1} \sum_{i=1}^n (y_{i} - \bar{y})(y_{i} - \bar{y})'\] gives the sample variance-covariance matrix.

Problem 2

Let us first read the “T3_6_BONE.DAT” data as follows,

Q2 = read.table("C:/Users/Lenovo/OneDrive/Desktop/instadata/R Programming_Kim_15th Feb/T3_6_BONE.DAT")
colnames(Q2) = c("Serial No.", "y1", "y2", "y3", "y4")
head(Q2)

The scatterplot matrix for pairs of variables is as follows,

plot(Q2[,2:5])

Now, \(\bar{y}, S, R\) for \(y_i's\) are as follows,

ybar = c(0,0,0,0)
for(i in 1:4){
  ybar[i] = mean(Q2[,i+1])
}
ybar

## [1] 48.655 49.625 50.570 51.445

S = cov(Q2[,2:4])
S

##          y1       y2       y3
## y1 6.329974 6.189079 5.777000
## y2 6.189079 6.449342 6.153421
## y3 5.777000 6.153421 6.918000

R = cor(Q2[,2:4])
R

##           y1        y2        y3
## y1 1.0000000 0.9686511 0.8729938
## y2 0.9686511 1.0000000 0.9212312
## y3 0.8729938 0.9212312 1.0000000

Problem 3

Let us first read the “T3_7_SONS.DAT” data as follows,

Q3 = read.table("C:/Users/Lenovo/OneDrive/Desktop/instadata/R Programming_Kim_15th Feb/T3_7_SoNS.DAT")
colnames(Q3) = c("y1", "y2", "x1", "x2")
head(Q3)

Now, \(\bar{y_1}, \bar{y_2}, \bar{x_1}, \bar{x_2},\) are as follows,

ybar = c(0,0,0,0)
for(i in 1:4){
  ybar[i] = mean(Q3[,i])
}
ybar = data.frame( cbind(c("y1bar", "y2bar", "x1bar", "x2bar"),ybar))
ybar

Now, the covariance matrix is as follows,

S3 = cov(Q3)
S3

##          y1       y2        x1       x2
## y1 95.29333 52.86833  69.66167 46.11167
## y2 52.86833 54.36000  51.31167 35.05333
## x1 69.66167 51.31167 100.80667 56.54000
## x2 46.11167 35.05333  56.54000 45.02333

Syy = S3[1:2,1:2]
Syy

##          y1       y2
## y1 95.29333 52.86833
## y2 52.86833 54.36000

Sxx = S3[3:4,3:4]
Sxx

##          x1       x2
## x1 100.8067 56.54000
## x2  56.5400 45.02333

Syx = S3[1:2,3:4]
Syx

##          x1       x2
## y1 69.66167 46.11167
## y2 51.31167 35.05333

Sxy = S3[3:4,1:2]
Sxy

##          y1       y2
## x1 69.66167 51.31167
## x2 46.11167 35.05333

Problem 4

The given \(\mu\) and \(\Sigma\) is as follows,

mu = c(3,1,4)
mu

## [1] 3 1 4

sig = matrix(c(6,1,-2,1,13,4,-2,4,4), ncol = 3, byrow = TRUE)
sig

##      [,1] [,2] [,3]
## [1,]    6    1   -2
## [2,]    1   13    4
## [3,]   -2    4    4

Now, the distribution of \[z = 2y_1 - y_2 +3y_3\] follows univariate normal with mean \(\mu_z\) and variance \(\sigma_z^2\) as follows,

mu_z = t(c(2,-1,3))%*%mu
mu_z

##      [,1]
## [1,]   17

sig_z = t(c(2,-1,3))%*%sig%*%c(2,-1,3)
sig_z

##      [,1]
## [1,]   21

Hence, \(z \sim N(17,21)\).

The joint distribution of \(z_1 = y_1 + y_2 + y_3\ \text{and}\ z_2 = y_1 - y_2 + 2y_3\) is a bivariate normal with mean vector,

l = matrix(c(1,1,1,1,-1,2), ncol = 3, byrow = TRUE)
l%*%mu

##      [,1]
## [1,]    8
## [2,]   10

and variance-covariance matrix,

l%*%sig%*%t(l)

##      [,1] [,2]
## [1,]   29   -1
## [2,]   -1    9

Here observe that, \(\Sigma_{2,2} = 13 = \sigma^2_{y_2}\) and \(\mu_{y_2} = 1\), so \(y_2 \sim N(1,13)\), a univariate normal.

The joint distribution of \(y_1\) and \(y_3\) is a bivariate normal with mean,

mu_13 = mu[c(1,3)]
mu_13

## [1] 3 4

and variance-covariance matrix,

sig_13 = sig[c(1,3),c(1,3)]
sig_13

##      [,1] [,2]
## [1,]    6   -2
## [2,]   -2    4

The joint distribution of \(y_1, y_3, \frac{1}{2}(y_1+y_3)\) is a multivariate normal with mean vector,

l2 = matrix(c(1,0,0,1,0.5,0.5), ncol = 2, byrow = TRUE)
l2%*%mu_13

##      [,1]
## [1,]  3.0
## [2,]  4.0
## [3,]  3.5

and variance-covariance matrix,

l2%*%sig_13%*%t(l2)

##      [,1] [,2] [,3]
## [1,]    6   -2  2.0
## [2,]   -2    4  1.0
## [3,]    2    1  1.5

Problem 5

The \(z = (\Sigma^{\frac{1}{2}})^{-1} (y - \mu)\) vector which follows \(N_3(0,I)\), the \(\Sigma^{\frac{1}{2}}\) is,

diag(sqrt(svd(sig)$d))%*%svd(sig)$u

##            [,1]       [,2]       [,3]
## [1,] -0.1051331  3.4287894  1.6611224
## [2,] -2.4894215  0.3446727 -0.8690093
## [3,] -0.4149634 -0.4937462  0.9928982

and \(\mu\) is,

mu

## [1] 3 1 4

The distribution of \((y- \mu)' \Sigma^{-1} (y- \mu) \sim \chi^2_3\).

Problem 6

Let us first read the “T4_3_HEMATOL.DAT” data as follows,

Q6 = read.table("C:/Users/Lenovo/OneDrive/Desktop/instadata/R Programming_Kim_15th Feb/T4_3_HEMATOL.DAT")
head(Q6)

For checking the univariate normality, we will perform normal QQ-plot as follows,

par(mfrow=c(2,3))

qqnorm(Q6[,1], main = "Normal Q-Q plot for v1")
qqline(Q6[,1])

qqnorm(Q6[,2], main = "Normal Q-Q plot for v2")
qqline(Q6[,2])

qqnorm(Q6[,3], main = "Normal Q-Q plot for v3")
qqline(Q6[,3])

qqnorm(Q6[,4], main = "Normal Q-Q plot for v4")
qqline(Q6[,4])

qqnorm(Q6[,5], main = "Normal Q-Q plot for v5")
qqline(Q6[,5])

qqnorm(Q6[,6], main = "Normal Q-Q plot for v6")
qqline(Q6[,6])

Here observe that, for \(V_1 - V_6\) except \(V_3, V_4, C_6\), the points are near to the theoretical line, so all variables follows univariate normal. But \(V_3, V_4, C_6\) doesn’t follow univariate normal.

For testing the pairwise bivariate normal, we will perform Mardia’s normality test as follows,

library(MVN)

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

## sROC 0.1-2 loaded

for(i in 1:6){
  for( j in (i+1):6){
    if(i != j && i<7 && j<7){
      print(c("Bivariate normality test for pairs"))
      print(c(i, j))
      print(mvn(Q6[,c(i,j)],mvnTest = "mardia")$multivariateNormality)
    }
  }
}

## [1] "Bivariate normality test for pairs"
## [1] 1 2
##              Test         Statistic           p value Result
## 1 Mardia Skewness  3.17299135228571 0.529304320552425    YES
## 2 Mardia Kurtosis 0.105409659283307  0.91605075176653    YES
## 3             MVN              <NA>              <NA>    YES
## [1] "Bivariate normality test for pairs"
## [1] 1 3
##              Test           Statistic            p value Result
## 1 Mardia Skewness    7.80365008980945 0.0990413884570596    YES
## 2 Mardia Kurtosis 0.00486915626774317  0.996114990741246    YES
## 3             MVN                <NA>               <NA>    YES
## [1] "Bivariate normality test for pairs"
## [1] 1 4
##              Test        Statistic              p value Result
## 1 Mardia Skewness 22.6943249173763 0.000145732736977252     NO
## 2 Mardia Kurtosis 3.22716118548049  0.00125025009686275     NO
## 3             MVN             <NA>                 <NA>     NO
## [1] "Bivariate normality test for pairs"
## [1] 1 5
##              Test          Statistic           p value Result
## 1 Mardia Skewness   5.51323487225013 0.238568570730039    YES
## 2 Mardia Kurtosis -0.291720099476145 0.770500640182101    YES
## 3             MVN               <NA>              <NA>    YES
## [1] "Bivariate normality test for pairs"
## [1] 1 6
##              Test        Statistic             p value Result
## 1 Mardia Skewness 13.6415851854692 0.00853135832924726     NO
## 2 Mardia Kurtosis 1.03563103855275   0.300374305897019    YES
## 3             MVN             <NA>                <NA>     NO
## [1] "Bivariate normality test for pairs"
## [1] 2 3
##              Test         Statistic           p value Result
## 1 Mardia Skewness  5.61491635985261 0.229811388196661    YES
## 2 Mardia Kurtosis 0.152641942140793 0.878680647504501    YES
## 3             MVN              <NA>              <NA>    YES
## [1] "Bivariate normality test for pairs"
## [1] 2 4
##              Test        Statistic              p value Result
## 1 Mardia Skewness 21.3051407790445 0.000275466757370697     NO
## 2 Mardia Kurtosis 3.17768566924064   0.0014845560243717     NO
## 3             MVN             <NA>                 <NA>     NO
## [1] "Bivariate normality test for pairs"
## [1] 2 5
##              Test           Statistic           p value Result
## 1 Mardia Skewness    3.12790583281286 0.536653448095711    YES
## 2 Mardia Kurtosis -0.0124458720775812 0.990069887186201    YES
## 3             MVN                <NA>              <NA>    YES
## [1] "Bivariate normality test for pairs"
## [1] 2 6
##              Test        Statistic            p value Result
## 1 Mardia Skewness 11.0529357963604 0.0259754866971697     NO
## 2 Mardia Kurtosis 1.28613594935628  0.198395619785664    YES
## 3             MVN             <NA>               <NA>     NO
## [1] "Bivariate normality test for pairs"
## [1] 3 4
##              Test        Statistic              p value Result
## 1 Mardia Skewness 24.1173585198822  7.5662623095289e-05     NO
## 2 Mardia Kurtosis 3.79685461457691 0.000146543675640487     NO
## 3             MVN             <NA>                 <NA>     NO
## [1] "Bivariate normality test for pairs"
## [1] 3 5
##              Test        Statistic             p value Result
## 1 Mardia Skewness 16.1932291046616 0.00277054220726324     NO
## 2 Mardia Kurtosis 1.03606601036006   0.300171347953391    YES
## 3             MVN             <NA>                <NA>     NO
## [1] "Bivariate normality test for pairs"
## [1] 3 6
##              Test        Statistic             p value Result
## 1 Mardia Skewness   14.96634638818 0.00477152834872363     NO
## 2 Mardia Kurtosis 2.55916531203404  0.0104923833641331     NO
## 3             MVN             <NA>                <NA>     NO
## [1] "Bivariate normality test for pairs"
## [1] 4 5
##              Test        Statistic              p value Result
## 1 Mardia Skewness 22.6608888123325 0.000147988935524286     NO
## 2 Mardia Kurtosis 2.98125944715184  0.00287065475914439     NO
## 3             MVN             <NA>                 <NA>     NO
## [1] "Bivariate normality test for pairs"
## [1] 4 6
##              Test        Statistic              p value Result
## 1 Mardia Skewness 32.4870666460666 1.52107059888688e-06     NO
## 2 Mardia Kurtosis 5.38066231210181 7.42122912011922e-08     NO
## 3             MVN             <NA>                 <NA>     NO
## [1] "Bivariate normality test for pairs"
## [1] 5 6
##              Test        Statistic             p value Result
## 1 Mardia Skewness 14.0478949111802 0.00714375340802133     NO
## 2 Mardia Kurtosis 1.96296737623216  0.0496499649112825     NO
## 3             MVN             <NA>                <NA>     NO

Here observe that, all pairs except the pairs \((1,4), (1,6), (2,4), (2,6), (3,4), (3,5), (3,6), (4,5), (4,6), (5,6)\) follows bivariate normal.

Applied Multivariate Data Analysis

17 February 2021

Problem 1

Problem 2

Problem 3

Problem 4

Problem 5

Problem 6