In the univariate paradigm we have seen central tendency explained by mean and dispersion or spread of the data explained by variance. In the multivariate paradigm as we deal with many variables at the same time, we will see mean vector, covariance and correlation.
We will calculate the mean vector, covariance and correlation.
Lets take mtcars data as an example.
Formula:
\[ x^{bar} = 1/n.X^{T}.X\]
x <- as.matrix(mtcars[,1:5]) # take first columns of mtcars data
one <- as.matrix(rep(1, dim(x)[1])) # create identity matrix with all elements being 1
n <- dim(x)[1] # number of observations
xbar <- 1/n*t(x)%*%one # calculate means of all columns
print(xbar)
## [,1]
## mpg 20.090625
## cyl 6.187500
## disp 230.721875
## hp 146.687500
## drat 3.596563
We can also achieve this by using one of the apply family functions
apply(mtcars[,1:5], 2, mean)
## mpg cyl disp hp drat
## 20.090625 6.187500 230.721875 146.687500 3.596563
Lets build the covariance matrix, whose diagonal elements are the variances.
Formula:
\[ S = 1/(n-1).X^{*T}.X^{*}\]
First we will build a mean matrix, the elements of vectors of the mean matrix are the mean of their repective vectors.
# calculate covariance matrix
mean_matrix <- matrix(data = 1, nrow = n)%*%cbind(xbar[[1]], xbar[[2]], xbar[[3]], xbar[[4]], xbar[[5]])
head(mean_matrix, n = 5)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 20.09062 6.1875 230.7219 146.6875 3.596563
## [2,] 20.09062 6.1875 230.7219 146.6875 3.596563
## [3,] 20.09062 6.1875 230.7219 146.6875 3.596563
## [4,] 20.09062 6.1875 230.7219 146.6875 3.596563
## [5,] 20.09062 6.1875 230.7219 146.6875 3.596563
Second subtract the x matrix from mean matrix. This is the difference between ith element of the jth vector and the mean of the jth vector.
x_star <- x - mean_matrix
head(x_star, n = 5)
## mpg cyl disp hp drat
## Mazda RX4 0.909375 -0.1875 -70.72188 -36.6875 0.3034375
## Mazda RX4 Wag 0.909375 -0.1875 -70.72188 -36.6875 0.3034375
## Datsun 710 2.709375 -2.1875 -122.72188 -53.6875 0.2534375
## Hornet 4 Drive 1.309375 -0.1875 27.27812 -36.6875 -0.5165625
## Hornet Sportabout -1.390625 1.8125 129.27812 28.3125 -0.4465625
covar_matrix <- 1/(n-1)*t(x_star)%*%x_star # Covariance matrix
print(covar_matrix)
## mpg cyl disp hp drat
## mpg 36.324103 -9.1723790 -633.09721 -320.73206 2.1950635
## cyl -9.172379 3.1895161 199.66028 101.93145 -0.6683669
## disp -633.097208 199.6602823 15360.79983 6721.15867 -47.0640192
## hp -320.732056 101.9314516 6721.15867 4700.86694 -16.4511089
## drat 2.195064 -0.6683669 -47.06402 -16.45111 0.2858814
cov() function will return same result.
In the correlation matrix all the diagonal elements are 1.
Formula:
\[ R = D^{-1/2}.S.D^{-1/2}\]
Where D^(-1/2) is the square root of the diagonal elements of the covariance matrix and S is the covariance matrix.
D <- diag(diag(covar_matrix)^(-1/2)) # extract the standard deviations (which is the square root of the diagonal elements of covariance matrix)
corr_matrix <- D%*%covar_matrix%*%D # correlation matrix
print(corr_matrix)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.6811719
## [2,] -0.8521620 1.0000000 0.9020329 0.8324475 -0.6999381
## [3,] -0.8475514 0.9020329 1.0000000 0.7909486 -0.7102139
## [4,] -0.7761684 0.8324475 0.7909486 1.0000000 -0.4487591
## [5,] 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.0000000
cor() function will also return the same result.
Formula:
\[ S = D^{1/2}.R.D^{1/2}\]
D1 <- diag(diag(covar_matrix)^(1/2))
S <- D1%*%corr_matrix%*%D1
print(S)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 36.324103 -9.1723790 -633.09721 -320.73206 2.1950635
## [2,] -9.172379 3.1895161 199.66028 101.93145 -0.6683669
## [3,] -633.097208 199.6602823 15360.79983 6721.15867 -47.0640192
## [4,] -320.732056 101.9314516 6721.15867 4700.86694 -16.4511089
## [5,] 2.195064 -0.6683669 -47.06402 -16.45111 0.2858814