Correlation Coefficient
The correlation coefficient of two variables in a data sample is their covariance divided by the product of
their individual standard deviations.
It is a normalized measurement of how the two are linearly related.
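As a quick check of this definition, the sketch below computes the coefficient by hand and compares it with R's built-in cor(). The vectors x and y are made-up illustrative values, not data from this post.

```r
# Illustrative values only (assumed for this example)
x <- c(2.1, 3.4, 1.9, 5.6, 4.2)
y <- c(1.0, 2.7, 1.2, 4.9, 3.1)

# Covariance divided by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in Pearson correlation for comparison
r_builtin <- cor(x, y)

# TRUE when the two agree up to floating-point tolerance
all.equal(r_manual, r_builtin)
```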
In R, one can use the cor() function to produce correlations and the cov() function to produce covariances.
A simplified format is cor(x, use=, method= ) where
Option   Description
______   ____________________________________________________________________________
x        Matrix or data frame
use      Specifies the handling of missing data. Options are all.obs (assumes no missing
         data - missing data will produce an error), complete.obs (listwise deletion),
         and pairwise.complete.obs (pairwise deletion)
method   Specifies the type of correlation. Options are pearson, spearman or kendall.
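The effect of the use option is easiest to see on data with missing values. The small data frame df below is invented purely for illustration (it is not part of this post's data); the sketch shows that all.obs stops with an error, while listwise and pairwise deletion can give different results for the same pair of columns.

```r
# Toy data with missing values in column c (assumed for this example)
df <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(1, NA, 2, 5, 4, NA)
)

# all.obs: any missing value produces an error, captured here as a message
res_all <- tryCatch(cor(df, use = "all.obs"),
                    error = function(e) conditionMessage(e))

# complete.obs: rows containing any NA are dropped before computing
res_listwise <- cor(df, use = "complete.obs")

# pairwise.complete.obs: each pair of columns uses all rows complete for that pair
res_pairwise <- cor(df, use = "pairwise.complete.obs")

# The a-b correlation uses 4 rows under listwise deletion, 6 under pairwise
res_listwise["a", "b"]
res_pairwise["a", "b"]
```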
For this post, we will use R's built-in dataset "mtcars"
(for more information, please visit:
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html)
A data frame with 32 observations on 11 variables.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
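Before computing correlations, it can help to confirm the shape of the data and check for missing values. This is only a minimal sketch using base R functions on the built-in data frame.

```r
# Number of observations (rows) and variables (columns)
dim(mtcars)

# All columns should be numeric for cor() and cov() to work on the whole frame
all(sapply(mtcars, is.numeric))

# TRUE would mean missing values are present, making the `use` option matter
anyNA(mtcars)

# Compact overview of each column
str(mtcars)
```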
R Code:
# Compute the correlations among the (numeric) variables in R's built-in dataset "mtcars".
# We use listwise deletion of missing data and Kendall's tau as the correlation method.
cor(mtcars, use = "complete.obs", method = "kendall")
## mpg cyl disp hp drat wt
## mpg 1.0000000 -0.7953134 -0.7681311 -0.7428125 0.46454879 -0.7278321
## cyl -0.7953134 1.0000000 0.8144263 0.7851865 -0.55131785 0.7282611
## disp -0.7681311 0.8144263 1.0000000 0.6659987 -0.49898277 0.7433824
## hp -0.7428125 0.7851865 0.6659987 1.0000000 -0.38262689 0.6113081
## drat 0.4645488 -0.5513178 -0.4989828 -0.3826269 1.00000000 -0.5471495
## wt -0.7278321 0.7282611 0.7433824 0.6113081 -0.54714953 1.0000000
## qsec 0.3153652 -0.4489698 -0.3008155 -0.4729061 0.03272155 -0.1419881
## vs 0.5896790 -0.7710007 -0.6033059 -0.6305926 0.37510111 -0.4884787
## am 0.4690128 -0.4946212 -0.5202739 -0.3039956 0.57554849 -0.6138790
## gear 0.4331509 -0.5125435 -0.4759795 -0.2794458 0.58392476 -0.5435956
## carb -0.5043945 0.4654299 0.4137360 0.5959842 -0.09535193 0.3713741
## qsec vs am gear carb
## mpg 0.31536522 0.5896790 0.46901280 0.43315089 -0.50439455
## cyl -0.44896982 -0.7710007 -0.49462115 -0.51254349 0.46542994
## disp -0.30081549 -0.6033059 -0.52027392 -0.47597955 0.41373600
## hp -0.47290613 -0.6305926 -0.30399557 -0.27944584 0.59598416
## drat 0.03272155 0.3751011 0.57554849 0.58392476 -0.09535193
## wt -0.14198812 -0.4884787 -0.61387896 -0.54359562 0.37137413
## qsec 1.00000000 0.6575431 -0.16890405 -0.09126069 -0.50643945
## vs 0.65754312 1.0000000 0.16834512 0.26974788 -0.57692729
## am -0.16890405 0.1683451 1.00000000 0.77078758 -0.05859929
## gear -0.09126069 0.2697479 0.77078758 1.00000000 0.09801487
## carb -0.50643945 -0.5769273 -0.05859929 0.09801487 1.00000000
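To see how the choice of method changes the result, the following sketch computes the correlation for a single pair of variables (mpg and wt) under all three methods. The names methods and mpg_wt are chosen here for illustration.

```r
# The three correlation types accepted by cor()
methods <- c("pearson", "spearman", "kendall")

# Correlation between mpg and wt under each method
mpg_wt <- sapply(methods, function(m) cor(mtcars$mpg, mtcars$wt, method = m))

# Named vector with one value per method
round(mpg_wt, 4)
```

All three coefficients are negative, but Kendall's tau is typically smaller in magnitude than the other two because it is defined on a different scale (concordant versus discordant pairs).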
# And covariance among the variables:
cov(mtcars, use="complete.obs")
## mpg cyl disp hp drat
## mpg 36.324103 -9.1723790 -633.09721 -320.732056 2.19506351
## cyl -9.172379 3.1895161 199.66028 101.931452 -0.66836694
## disp -633.097208 199.6602823 15360.79983 6721.158669 -47.06401915
## hp -320.732056 101.9314516 6721.15867 4700.866935 -16.45110887
## drat 2.195064 -0.6683669 -47.06402 -16.451109 0.28588135
## wt -5.116685 1.3673710 107.68420 44.192661 -0.37272073
## qsec 4.509149 -1.8868548 -96.05168 -86.770081 0.08714073
## vs 2.017137 -0.7298387 -44.37762 -24.987903 0.11864919
## am 1.803931 -0.4657258 -36.56401 -8.320565 0.19015121
## gear 2.135685 -0.6491935 -50.80262 -6.358871 0.27598790
## carb -5.363105 1.5201613 79.06875 83.036290 -0.07840726
## wt qsec vs am gear
## mpg -5.1166847 4.50914919 2.01713710 1.80393145 2.1356855
## cyl 1.3673710 -1.88685484 -0.72983871 -0.46572581 -0.6491935
## disp 107.6842040 -96.05168145 -44.37762097 -36.56401210 -50.8026210
## hp 44.1926613 -86.77008065 -24.98790323 -8.32056452 -6.3588710
## drat -0.3727207 0.08714073 0.11864919 0.19015121 0.2759879
## wt 0.9573790 -0.30548161 -0.27366129 -0.33810484 -0.4210806
## qsec -0.3054816 3.19316613 0.67056452 -0.20495968 -0.2804032
## vs -0.2736613 0.67056452 0.25403226 0.04233871 0.0766129
## am -0.3381048 -0.20495968 0.04233871 0.24899194 0.2923387
## gear -0.4210806 -0.28040323 0.07661290 0.29233871 0.5443548
## carb 0.6757903 -1.89411290 -0.46370968 0.04637097 0.3266129
## carb
## mpg -5.36310484
## cyl 1.52016129
## disp 79.06875000
## hp 83.03629032
## drat -0.07840726
## wt 0.67579032
## qsec -1.89411290
## vs -0.46370968
## am 0.04637097
## gear 0.32661290
## carb 2.60887097
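Because a correlation is a covariance rescaled by standard deviations, the covariance matrix above can be converted to Pearson correlations. A minimal sketch using the base function cov2cor():

```r
# Covariance matrix of all variables
cov_mat <- cov(mtcars, use = "complete.obs")

# Rescale to correlations: each entry is divided by the two standard deviations
cor_from_cov <- cov2cor(cov_mat)

# Should match the Pearson correlation matrix computed directly
all.equal(cor_from_cov, cor(mtcars, method = "pearson"))
```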
Unfortunately, neither cor() nor cov() produces tests of significance. In R, one can use the cor.test() function to test a single correlation coefficient.
cor.test(mtcars$mpg, mtcars$cyl, method = "kendall", alternative = "greater",
exact = FALSE) # using large sample approximation
##
## Kendall's rank correlation tau
##
## data: mtcars$mpg and mtcars$cyl
## z = -5.5913, p-value = 1
## alternative hypothesis: true tau is greater than 0
## sample estimates:
## tau
## -0.7953134
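The p-value of 1 above follows from the choice of alternative: the test asks whether tau is greater than 0, while the estimated tau is negative. As a sketch, the same test can be run with the two-sided and "less" alternatives; the object names res_two and res_less are chosen here for illustration.

```r
# Two-sided test: is tau different from 0?
res_two <- cor.test(mtcars$mpg, mtcars$cyl, method = "kendall",
                    alternative = "two.sided", exact = FALSE)

# One-sided test in the direction of the observed estimate: is tau less than 0?
res_less <- cor.test(mtcars$mpg, mtcars$cyl, method = "kendall",
                     alternative = "less", exact = FALSE)

# Extract the p-values and the estimate from the returned objects
c(two_sided = res_two$p.value, less = res_less$p.value)
res_less$estimate
```

Both p-values are very small here, which is consistent with the strong negative association between mpg and cyl.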
Alternative method
An alternative is the rcorr() function in the Hmisc package, which produces correlations together with their significance levels for Pearson and Spearman correlations.
library(Hmisc)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
##
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
# Correlations with significance levels
# Method type is pearson
## rcorr(x, type="pearson")
# Method type is spearman
## rcorr(x, type="spearman")
# using mtcars data
rcorr(as.matrix(mtcars), type="pearson")
## mpg cyl disp hp drat wt qsec vs am gear carb
## mpg 1.00 -0.85 -0.85 -0.78 0.68 -0.87 0.42 0.66 0.60 0.48 -0.55
## cyl -0.85 1.00 0.90 0.83 -0.70 0.78 -0.59 -0.81 -0.52 -0.49 0.53
## disp -0.85 0.90 1.00 0.79 -0.71 0.89 -0.43 -0.71 -0.59 -0.56 0.39
## hp -0.78 0.83 0.79 1.00 -0.45 0.66 -0.71 -0.72 -0.24 -0.13 0.75
## drat 0.68 -0.70 -0.71 -0.45 1.00 -0.71 0.09 0.44 0.71 0.70 -0.09
## wt -0.87 0.78 0.89 0.66 -0.71 1.00 -0.17 -0.55 -0.69 -0.58 0.43
## qsec 0.42 -0.59 -0.43 -0.71 0.09 -0.17 1.00 0.74 -0.23 -0.21 -0.66
## vs 0.66 -0.81 -0.71 -0.72 0.44 -0.55 0.74 1.00 0.17 0.21 -0.57
## am 0.60 -0.52 -0.59 -0.24 0.71 -0.69 -0.23 0.17 1.00 0.79 0.06
## gear 0.48 -0.49 -0.56 -0.13 0.70 -0.58 -0.21 0.21 0.79 1.00 0.27
## carb -0.55 0.53 0.39 0.75 -0.09 0.43 -0.66 -0.57 0.06 0.27 1.00
##
## n= 32
##
##
## P
##      mpg    cyl    disp   hp     drat   wt     qsec   vs     am     gear
## mpg         0.0000 0.0000 0.0000 0.0000 0.0000 0.0171 0.0000 0.0003 0.0054
## cyl  0.0000        0.0000 0.0000 0.0000 0.0000 0.0004 0.0000 0.0022 0.0042
## disp 0.0000 0.0000        0.0000 0.0000 0.0000 0.0131 0.0000 0.0004 0.0010
## hp   0.0000 0.0000 0.0000        0.0100 0.0000 0.0000 0.0000 0.1798 0.4930
## drat 0.0000 0.0000 0.0000 0.0100        0.0000 0.6196 0.0117 0.0000 0.0000
## wt   0.0000 0.0000 0.0000 0.0000 0.0000        0.3389 0.0010 0.0000 0.0005
## qsec 0.0171 0.0004 0.0131 0.0000 0.6196 0.3389        0.0000 0.2057 0.2425
## vs   0.0000 0.0000 0.0000 0.0000 0.0117 0.0010 0.0000        0.3570 0.2579
## am   0.0003 0.0022 0.0004 0.1798 0.0000 0.0000 0.2057 0.3570        0.0000
## gear 0.0054 0.0042 0.0010 0.4930 0.0000 0.0005 0.2425 0.2579 0.0000
## carb 0.0011 0.0019 0.0253 0.0000 0.6212 0.0146 0.0000 0.0007 0.7545 0.1290
##      carb
## mpg  0.0011
## cyl  0.0019
## disp 0.0253
## hp   0.0000
## drat 0.6212
## wt   0.0146
## qsec 0.0000
## vs   0.0007
## am   0.7545
## gear 0.1290
## carb
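For readers without Hmisc, a similar matrix of Pearson p-values can be assembled in base R by calling cor.test() on each pair of columns. This is only a sketch of the idea, not a description of how rcorr() is implemented; vars and p_mat are names chosen here for illustration.

```r
vars <- names(mtcars)

# Empty matrix for the p-values; the diagonal stays NA
p_mat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
                dimnames = list(vars, vars))

# Test every off-diagonal pair with a two-sided Pearson correlation test
for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i != j) {
      p_mat[i, j] <- cor.test(mtcars[[i]], mtcars[[j]],
                              method = "pearson")$p.value
    }
  }
}

# Rounded to four decimals for easier reading
round(p_mat, 4)
```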
Interpretations
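In the Pearson correlation matrix above, the strongest positive correlation is between
cyl (number of cylinders) and disp (displacement), at 0.90, and the strongest negative
correlation is between mpg and wt (weight), at -0.87. The variables am (transmission type)
and gear (number of forward gears) are also strongly positively correlated, at 0.79.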
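Rather than scanning the matrix by eye, the strongest pairs can be located programmatically. The sketch below uses base cor() instead of rcorr() so that it runs without extra packages; the helper names are chosen here for illustration.

```r
# Pearson correlation matrix
r <- cor(mtcars, method = "pearson")

# Keep only the lower triangle so each pair appears once and the diagonal is ignored
r[upper.tri(r, diag = TRUE)] <- NA

# Positions of the largest and smallest remaining correlations
idx_pos <- which(r == max(r, na.rm = TRUE), arr.ind = TRUE)
idx_neg <- which(r == min(r, na.rm = TRUE), arr.ind = TRUE)

# Variable names for each pair
strongest_pos <- c(rownames(r)[idx_pos[1, "row"]], colnames(r)[idx_pos[1, "col"]])
strongest_neg <- c(rownames(r)[idx_neg[1, "row"]], colnames(r)[idx_neg[1, "col"]])

strongest_pos
strongest_neg
```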
# using mtcars data
rcorr(as.matrix(mtcars), type="spearman")
## mpg cyl disp hp drat wt qsec vs am gear carb
## mpg 1.00 -0.91 -0.91 -0.89 0.65 -0.89 0.47 0.71 0.56 0.54 -0.66
## cyl -0.91 1.00 0.93 0.90 -0.68 0.86 -0.57 -0.81 -0.52 -0.56 0.58
## disp -0.91 0.93 1.00 0.85 -0.68 0.90 -0.46 -0.72 -0.62 -0.59 0.54
## hp -0.89 0.90 0.85 1.00 -0.52 0.77 -0.67 -0.75 -0.36 -0.33 0.73
## drat 0.65 -0.68 -0.68 -0.52 1.00 -0.75 0.09 0.45 0.69 0.74 -0.13
## wt -0.89 0.86 0.90 0.77 -0.75 1.00 -0.23 -0.59 -0.74 -0.68 0.50
## qsec 0.47 -0.57 -0.46 -0.67 0.09 -0.23 1.00 0.79 -0.20 -0.15 -0.66
## vs 0.71 -0.81 -0.72 -0.75 0.45 -0.59 0.79 1.00 0.17 0.28 -0.63
## am 0.56 -0.52 -0.62 -0.36 0.69 -0.74 -0.20 0.17 1.00 0.81 -0.06
## gear 0.54 -0.56 -0.59 -0.33 0.74 -0.68 -0.15 0.28 0.81 1.00 0.11
## carb -0.66 0.58 0.54 0.73 -0.13 0.50 -0.66 -0.63 -0.06 0.11 1.00
##
## n= 32
##
##
## P
##      mpg    cyl    disp   hp     drat   wt     qsec   vs     am     gear
## mpg         0.0000 0.0000 0.0000 0.0000 0.0000 0.0071 0.0000 0.0008 0.0013
## cyl  0.0000        0.0000 0.0000 0.0000 0.0000 0.0006 0.0000 0.0022 0.0008
## disp 0.0000 0.0000        0.0000 0.0000 0.0000 0.0081 0.0000 0.0001 0.0003
## hp   0.0000 0.0000 0.0000        0.0023 0.0000 0.0000 0.0000 0.0416 0.0639
## drat 0.0000 0.0000 0.0000 0.0023        0.0000 0.6170 0.0102 0.0000 0.0000
## wt   0.0000 0.0000 0.0000 0.0000 0.0000        0.2148 0.0004 0.0000 0.0000
## qsec 0.0071 0.0006 0.0081 0.0000 0.6170 0.2148        0.0000 0.2644 0.4182
## vs   0.0000 0.0000 0.0000 0.0000 0.0102 0.0004 0.0000        0.3570 0.1170
## am   0.0008 0.0022 0.0001 0.0416 0.0000 0.0000 0.2644 0.3570        0.0000
## gear 0.0013 0.0008 0.0003 0.0639 0.0000 0.0000 0.4182 0.1170 0.0000
## carb 0.0000 0.0005 0.0014 0.0000 0.4947 0.0036 0.0000 0.0000 0.7264 0.5312
##      carb
## mpg  0.0000
## cyl  0.0005
## disp 0.0014
## hp   0.0000
## drat 0.4947
## wt   0.0036
## qsec 0.0000
## vs   0.0000
## am   0.7264
## gear 0.5312
## carb
Correlograms
Correlograms help us visualize the data in correlation matrices.
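# Correlogram: shaded cells in the lower panel, pie charts in the upper panel,
# with variables reordered (order=TRUE)
library(corrgram)
corrgram(mtcars, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie,
         text.panel=panel.txt, main="Car Mileage Data in PC2/PC1 Order")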
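If the corrgram package is not available, a rough substitute can be drawn with base graphics. The sketch below is a plain heat map of the Pearson correlation matrix; it is an assumed alternative, not part of corrgram, and it keeps the variables in their original order rather than reordering them.

```r
# Pearson correlation matrix and its size
r <- cor(mtcars)
n <- ncol(r)

# Diverging palette: blue for negative, white near zero, red for positive
pal <- colorRampPalette(c("blue", "white", "red"))(21)

# Columns are reversed so the diagonal runs from top-left to bottom-right
image(1:n, 1:n, r[, n:1], zlim = c(-1, 1), col = pal,
      axes = FALSE, xlab = "", ylab = "",
      main = "mtcars correlation heat map")

# Variable names on both axes
axis(1, at = 1:n, labels = rownames(r), las = 2)
axis(2, at = 1:n, labels = rev(colnames(r)), las = 2)
```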
