In the univariate paradigm we have seen central tendency explained by mean and dispersion or spread of the data explained by variance. In the multivariate paradigm as we deal with many variables at the same time, we will see mean vector, covariance and correlation.

We will calculate the mean vector, covariance and correlation.

Lets take mtcars data as an example.

Mean vector

Formula:

\[ x^{bar} = 1/n.X^{T}.X\]

x <- as.matrix(mtcars[,1:5]) # take first columns of mtcars data

one <- as.matrix(rep(1, dim(x)[1])) # create identity matrix with all elements being 1

n <- dim(x)[1] # number of observations

xbar <- 1/n*t(x)%*%one #  calculate means of all columns

print(xbar)
##            [,1]
## mpg   20.090625
## cyl    6.187500
## disp 230.721875
## hp   146.687500
## drat   3.596563

We can also achieve this by using one of the apply family functions

apply(mtcars[,1:5], 2, mean)
##        mpg        cyl       disp         hp       drat 
##  20.090625   6.187500 230.721875 146.687500   3.596563

Covariance matrix

Lets build the covariance matrix, whose diagonal elements are the variances.

Formula:

\[ S = 1/(n-1).X^{*T}.X^{*}\]

First we will build a mean matrix, the elements of vectors of the mean matrix are the mean of their repective vectors.

# calculate covariance matrix

mean_matrix <- matrix(data = 1, nrow = n)%*%cbind(xbar[[1]], xbar[[2]], xbar[[3]], xbar[[4]], xbar[[5]])

head(mean_matrix, n = 5)
##          [,1]   [,2]     [,3]     [,4]     [,5]
## [1,] 20.09062 6.1875 230.7219 146.6875 3.596563
## [2,] 20.09062 6.1875 230.7219 146.6875 3.596563
## [3,] 20.09062 6.1875 230.7219 146.6875 3.596563
## [4,] 20.09062 6.1875 230.7219 146.6875 3.596563
## [5,] 20.09062 6.1875 230.7219 146.6875 3.596563

Second subtract the x matrix from mean matrix. This is the difference between ith element of the jth vector and the mean of the jth vector.

x_star <- x - mean_matrix 

head(x_star, n = 5)
##                         mpg     cyl       disp       hp       drat
## Mazda RX4          0.909375 -0.1875  -70.72188 -36.6875  0.3034375
## Mazda RX4 Wag      0.909375 -0.1875  -70.72188 -36.6875  0.3034375
## Datsun 710         2.709375 -2.1875 -122.72188 -53.6875  0.2534375
## Hornet 4 Drive     1.309375 -0.1875   27.27812 -36.6875 -0.5165625
## Hornet Sportabout -1.390625  1.8125  129.27812  28.3125 -0.4465625
covar_matrix <- 1/(n-1)*t(x_star)%*%x_star # Covariance matrix

print(covar_matrix)
##              mpg         cyl        disp         hp        drat
## mpg    36.324103  -9.1723790  -633.09721 -320.73206   2.1950635
## cyl    -9.172379   3.1895161   199.66028  101.93145  -0.6683669
## disp -633.097208 199.6602823 15360.79983 6721.15867 -47.0640192
## hp   -320.732056 101.9314516  6721.15867 4700.86694 -16.4511089
## drat    2.195064  -0.6683669   -47.06402  -16.45111   0.2858814

cov() function will return same result.

Correlation matrix

In the correlation matrix all the diagonal elements are 1.

Formula:

\[ R = D^{-1/2}.S.D^{-1/2}\]

Where D^(-1/2) is the square root of the diagonal elements of the covariance matrix and S is the covariance matrix.

D <- diag(diag(covar_matrix)^(-1/2)) # extract the standard deviations (which is the square root of the diagonal elements of covariance matrix)

corr_matrix <- D%*%covar_matrix%*%D # correlation matrix

print(corr_matrix)
##            [,1]       [,2]       [,3]       [,4]       [,5]
## [1,]  1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719
## [2,] -0.8521620  1.0000000  0.9020329  0.8324475 -0.6999381
## [3,] -0.8475514  0.9020329  1.0000000  0.7909486 -0.7102139
## [4,] -0.7761684  0.8324475  0.7909486  1.0000000 -0.4487591
## [5,]  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.0000000

cor() function will also return the same result.

Building covariance matrix from correlation matrix.

Formula:

\[ S = D^{1/2}.R.D^{1/2}\]

D1 <- diag(diag(covar_matrix)^(1/2))

S <- D1%*%corr_matrix%*%D1

print(S)
##             [,1]        [,2]        [,3]       [,4]        [,5]
## [1,]   36.324103  -9.1723790  -633.09721 -320.73206   2.1950635
## [2,]   -9.172379   3.1895161   199.66028  101.93145  -0.6683669
## [3,] -633.097208 199.6602823 15360.79983 6721.15867 -47.0640192
## [4,] -320.732056 101.9314516  6721.15867 4700.86694 -16.4511089
## [5,]    2.195064  -0.6683669   -47.06402  -16.45111   0.2858814