Consider five measurements on normal patients and diabetics in the “Diabetes” data set from the package “heplots”. The variables are
y1: relative weight
y2: fasting plasma glucose
x1: glucose intolerance
x2: insulin response to oral glucose
x3: insulin resistance
The original data are from Reaven and Miller (1979, Diabetologia). Focus on the x variables, which are of main interest.
Answer the following questions:
Are the three-dimensional x data multivariate normally distributed? Why?
Compute the sample mean and covariance matrix of the x variables.
Compute the eigenvalues and eigenvectors of the sample covariance matrix.
If a \(p \times 1\) random vector \(X\) is distributed according to a multivariate normal distribution with population mean vector \(\mu\) and population variance-covariance matrix \(\Sigma\), then \(X\) has the joint density function shown below:
\[\phi(\textbf{x})=\left(\frac{1}{2\pi}\right)^{p/2}|\Sigma|^{-1/2}\exp\{-\frac{1}{2}(\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu})\}\]
Here \(|\Sigma|\) denotes the determinant of the variance-covariance matrix \(\Sigma\), and \(\Sigma^{-1}\) is its inverse. The density attains its maximum when \(\textbf{x}\) equals the mean vector \(\mathbf{\mu}\) and decreases as \(\textbf{x}\) moves away from \(\mathbf{\mu}\).
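To make the formula concrete, here is a minimal base-R sketch that evaluates \(\phi(\textbf{x})\) term by term. The function name `dmvnorm_manual` is hypothetical. It is not part of heplots or of any package loaded in this report, and packages such as mvtnorm provide tested implementations (`mvtnorm::dmvnorm`). The checks rely on two facts. First, for \(p = 1\) the formula reduces to the univariate normal density `dnorm()`. Second, the density is largest at \(\mathbf{\mu}\).

```r
# Hypothetical helper: multivariate normal density written directly from the
# formula (1/(2*pi))^(p/2) * |Sigma|^(-1/2) * exp(-Q/2),
# where Q = (x - mu)' Sigma^{-1} (x - mu).
dmvnorm_manual <- function(x, mu, Sigma) {
  p <- length(mu)
  dev <- x - mu
  # solve(Sigma, dev) computes Sigma^{-1} (x - mu) without forming the inverse
  Q <- drop(t(dev) %*% solve(Sigma, dev))
  (1 / (2 * pi))^(p / 2) * det(Sigma)^(-1 / 2) * exp(-Q / 2)
}

# p = 1, mu = 0, Sigma = 1: should match the standard normal density at 0
dmvnorm_manual(0, 0, matrix(1))
dnorm(0)

# The density at x = mu is larger than at a point away from mu
mu <- c(0, 0)
Sigma <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)
dmvnorm_manual(mu, mu, Sigma) > dmvnorm_manual(c(1, 1), mu, Sigma)
```

Using `solve(Sigma, dev)` instead of `solve(Sigma) %*% dev` is a common numerical choice. It avoids forming \(\Sigma^{-1}\) explicitly.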
The shorthand notation, analogous to the univariate \(N(\mu, \sigma^2)\), is \[\mathbf{X} \sim N(\mathbf{\mu},\Sigma)\]
If the components of \(X\) are uncorrelated, the variance-covariance matrix is diagonal: \[\Sigma = \left(\begin{array}{cccc}\sigma^2_1 & 0 & \dots & 0\\ 0 & \sigma^2_2 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \sigma^2_p \end{array}\right)\]
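Substituting this diagonal \(\Sigma\) into the density above shows why this case matters. The determinant is the product of the variances, the inverse is diagonal, and the quadratic form becomes a sum of squared standardized deviations:
\[|\Sigma| = \prod_{i=1}^{p}\sigma^2_i, \qquad \Sigma^{-1} = \mathrm{diag}\left(\frac{1}{\sigma^2_1},\dots,\frac{1}{\sigma^2_p}\right), \qquad (\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu}) = \sum_{i=1}^{p}\frac{(x_i-\mu_i)^2}{\sigma^2_i}\]
The joint density therefore factors into a product of univariate normal densities:
\[\phi(\textbf{x}) = \prod_{i=1}^{p}\left(\frac{1}{2\pi\sigma^2_i}\right)^{1/2}\exp\left\{-\frac{(x_i-\mu_i)^2}{2\sigma^2_i}\right\}\]
It follows that for a multivariate normal vector, uncorrelated components are also independent.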
library("heplots")
## Loading required package: car
## Loading required package: carData
library(ggpubr)
## Loading required package: ggplot2
library(ggplot2)
data(Diabetes,package = "heplots")
head(Diabetes) # The actual dataset from which we will extract columns
## relwt glufast glutest instest sspg group
## 1 0.81 80 356 124 55 Normal
## 2 0.95 97 289 117 76 Normal
## 3 0.94 105 319 143 105 Normal
## 4 1.04 90 356 199 108 Normal
## 5 1.00 90 323 240 143 Normal
## 6 0.76 86 381 157 165 Normal
str(Diabetes)
## 'data.frame': 145 obs. of 6 variables:
## $ relwt : num 0.81 0.95 0.94 1.04 1 0.76 0.91 1.1 0.99 0.78 ...
## $ glufast: int 80 97 105 90 90 86 100 85 97 97 ...
## $ glutest: int 356 289 319 356 323 381 350 301 379 296 ...
## $ instest: int 124 117 143 199 240 157 221 186 142 131 ...
## $ sspg : int 55 76 105 108 143 165 119 105 98 94 ...
## $ group : Factor w/ 3 levels "Normal","Chemical_Diabetic",..: 1 1 1 1 1 1 1 1 1 1 ...
Y <- Diabetes[, c("relwt", "glufast")]            # y1, y2
X <- Diabetes[, c("glutest", "instest", "sspg")]  # x1, x2, x3
The variables are:
relwt: relative weight, expressed as the ratio of actual weight to expected weight, given the person’s height
glufast: fasting plasma glucose level
glutest: test plasma glucose level, a measure of glucose intolerance
instest: plasma insulin during test, a measure of insulin response to oral glucose
sspg: steady state plasma glucose, a measure of insulin resistance
group: diagnostic group
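A <- ggplot(X, aes(sample = glutest)) +
  geom_qq(colour = "salmon") +       # constant colour, so no spurious legend
  geom_qq_line(colour = "green") +   # reference line through the quartiles
  ggtitle("Glucose intolerance")
A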
B <- ggplot(X, aes(sample = instest)) +
  geom_qq(colour = "salmon") +
  geom_qq_line(colour = "green") +
  ggtitle("Insulin response to oral glucose")
B
C <- ggplot(X, aes(sample = sspg)) +
  geom_qq(colour = "salmon") +
  geom_qq_line(colour = "green") +
  ggtitle("Insulin resistance")
C
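The Q-Q plots above check each x variable on its own. Marginal normality is necessary but not sufficient for joint normality. One common supplementary check uses the exponent of the density, and it is offered here as a sketch rather than as part of the original analysis. If \(X \sim N(\mu, \Sigma)\), the squared Mahalanobis distance \((\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu})\) follows a chi-square distribution with \(p = 3\) degrees of freedom. With \(\mu\) and \(\Sigma\) replaced by their sample estimates, this holds only approximately. The code computes the distances with base R's `mahalanobis()` and plots them against chi-square quantiles. It also reports a Shapiro-Wilk p-value for each variable.

```r
library(heplots)  # provides the Diabetes data
data(Diabetes, package = "heplots")
X <- Diabetes[, c("glutest", "instest", "sspg")]
n <- nrow(X)
p <- ncol(X)

# Marginal check: Shapiro-Wilk p-value for each x variable
sapply(X, function(v) shapiro.test(v)$p.value)

# Joint check: squared Mahalanobis distances from the sample mean,
# using the sample covariance matrix in place of Sigma
d2 <- mahalanobis(X, center = colMeans(X), cov = var(X))

# Chi-square Q-Q plot: ordered d2 against chi-square(p) quantiles;
# under multivariate normality the points should lie near the 45-degree line
chi_q <- qchisq(ppoints(n), df = p)
plot(chi_q, sort(d2),
     xlab = "Chi-square quantiles (df = 3)",
     ylab = "Ordered squared Mahalanobis distances",
     main = "Chi-square Q-Q plot of the x variables")
abline(a = 0, b = 1, col = "green")
```

Points that bend away from the reference line, especially in the upper tail, suggest a departure from multivariate normality. Small Shapiro-Wilk p-values point to non-normal margins.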
colMeans(X) #sample mean
## glutest instest sspg
## 543.6138 186.1172 184.2069
var(X) #cov matrix
## glutest instest sspg
## glutest 100457.85 -12918.1627 25908.4902
## instest -12918.16 14625.3125 101.4825
## sspg 25908.49 101.4825 11242.3319
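As a cross-check of what `colMeans()` and `var()` compute, the sketch below applies the textbook formulas \(\bar{\textbf{x}} = \frac{1}{n}\sum_{i=1}^n \textbf{x}_i\) and \(S = \frac{1}{n-1}\sum_{i=1}^n (\textbf{x}_i-\bar{\textbf{x}})(\textbf{x}_i-\bar{\textbf{x}})'\) directly. Note that `var()` uses the divisor \(n - 1\). The maximum-likelihood estimate of \(\Sigma\) would divide by \(n\) instead.

```r
library(heplots)  # provides the Diabetes data
data(Diabetes, package = "heplots")
X <- as.matrix(Diabetes[, c("glutest", "instest", "sspg")])
n <- nrow(X)

# Sample mean vector: xbar = (1/n) * sum of the observation vectors x_i
xbar <- colSums(X) / n

# Centre every column at its mean, then
# S = (1/(n-1)) * sum_i (x_i - xbar)(x_i - xbar)' = Xc' Xc / (n - 1)
Xc <- sweep(X, 2, xbar)
S <- crossprod(Xc) / (n - 1)

# Both comparisons should return TRUE
all.equal(xbar, colMeans(X))
all.equal(S, var(X))
```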
eigen(var(X)) # eigenvalues and eigenvectors
## eigen() decomposition
## $values
## [1] 109078.28 14170.86 3076.36
##
## $vectors
## [,1] [,2] [,3]
## [1,] 0.9584067 0.03568944 -0.2831658
## [2,] -0.1308070 0.93673946 -0.3246671
## [3,] 0.2536654 0.34820318 0.9024458
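The output can be checked against the defining properties of the eigendecomposition of a symmetric matrix. This base-R sketch checks five properties. Each eigenvector satisfies \(S\textbf{v} = \lambda\textbf{v}\). The eigenvectors are orthonormal. The decomposition reconstructs \(S = V\Lambda V'\). The eigenvalues sum to the trace of \(S\), which is the total variance. They also multiply to \(|S|\).

```r
library(heplots)  # provides the Diabetes data
data(Diabetes, package = "heplots")
S <- var(Diabetes[, c("glutest", "instest", "sspg")])
e <- eigen(S)   # S is symmetric, so eigen() returns real values in decreasing order
V <- e$vectors
lambda <- e$values

# S v = lambda v for the first eigenvector (difference is zero up to rounding)
drop(S %*% V[, 1]) - lambda[1] * V[, 1]

# Orthonormal eigenvectors: V'V is the 3 x 3 identity matrix
round(crossprod(V), 10)

# Spectral decomposition S = V diag(lambda) V'
all.equal(unname(S), V %*% diag(lambda) %*% t(V))

# Sum of eigenvalues = trace (total variance); product = determinant
all.equal(sum(lambda), sum(diag(S)))
all.equal(prod(lambda), det(S))

# Proportion of total variance along each eigenvector
lambda / sum(lambda)
```

From the values printed above, the first eigenvalue accounts for about 86% of the total variance (109078.28 out of 126325.5). The first eigenvector puts most of its weight on glutest, whose variance is roughly seven to nine times that of instest or sspg. Each eigenvector is determined only up to sign.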
sessioninfo::platform_info()
## setting value
## version R version 3.6.1 (2019-07-05)
## os Ubuntu 19.10
## system x86_64, linux-gnu
## ui X11
## language en_IN:en
## collate en_IN.UTF-8
## ctype en_IN.UTF-8
## tz Asia/Kolkata
## date 2021-01-03
Friendly, M. (2020). Diabetes data: heplots and candisc examples. Retrieved 20 September 2020, from https://cran.r-project.org/web/packages/candisc/vignettes/diabetes.html
Lesson 4: Multivariate Normal Distribution. (2020). Retrieved 20 September 2020, from https://online.stat.psu.edu/stat505/book/export/html/636