
Correlation Coefficient

The (Pearson) correlation coefficient of two variables in a data sample is their covariance divided by the product of
their individual standard deviations.

It is a normalized measure of how strongly the two are linearly related, and it always lies between -1 and 1.
In R, the cor() function computes correlations and the cov() function computes covariances.
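As a quick check of the definition above, here is a minimal sketch that computes the Pearson correlation of mpg and wt (an arbitrary pair from the mtcars data introduced below) by hand and compares it with cor(). Both cov() and sd() use the n - 1 denominator, so it cancels in the ratio.

```r
# Two numeric columns from the built-in mtcars data (arbitrary choice for illustration)
x <- mtcars$mpg
y <- mtcars$wt

# Pearson correlation by definition: covariance over the product of standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in Pearson correlation (method = "pearson" is the default)
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))  # both about -0.868
all.equal(r_manual, r_builtin)
```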

A simplified form of the call is cor(x, use = , method = ), where:

Option      Description
______      ____________________________________________________________________________

x           Matrix or data frame

use         Specifies the handling of missing data. Options include everything (the default;
            missing values propagate into the result), all.obs (assumes no missing data;
            missing data will produce an error), complete.obs (listwise deletion), and
            pairwise.complete.obs (pairwise deletion)

method      Specifies the type of correlation. Options are pearson (the default), spearman
            or kendall.
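To see how the use options differ, here is a minimal sketch on a small, made-up data frame (df and its columns a, b, c are hypothetical, for illustration only) containing two missing values. Listwise deletion drops every row with any NA before computing all pairs, while pairwise deletion keeps, for each pair, every row where both of its columns are present.

```r
# A small, made-up data frame with two missing values (hypothetical data)
df <- data.frame(a = c(1, 2, 3, 4, NA),
                 b = c(2, 4, 5, 4, 5),
                 c = c(NA, 1, 2, 3, 4))

# Listwise deletion: only rows 2-4 (no NA in any column) are used for every pair
cc <- cor(df, use = "complete.obs")

# Pairwise deletion: the a-b pair can also use row 1, since only c is missing there
pc <- cor(df, use = "pairwise.complete.obs")

cc["a", "b"]  # about 0: computed from rows 2-4 only
pc["a", "b"]  # about 0.72: computed from rows 1-4

# all.obs assumes there is no missing data, so here it stops with an error
res <- tryCatch(cor(df, use = "all.obs"), error = function(e) conditionMessage(e))
res
```

The same data can therefore give quite different coefficients depending on the deletion rule, which is worth keeping in mind whenever a data set has gaps.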

For the purposes of this post, we will use R's built-in dataset "mtcars"
(for more info, please visit:
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html).

A data frame with 32 observations on 11 variables:
[, 1]   mpg     Miles/(US) gallon
[, 2]   cyl     Number of cylinders
[, 3]   disp    Displacement (cu.in.)
[, 4]   hp      Gross horsepower
[, 5]   drat    Rear axle ratio
[, 6]   wt      Weight (1000 lbs)
[, 7]   qsec    1/4 mile time
[, 8]   vs      Engine (0 = V-shaped, 1 = straight)
[, 9]   am      Transmission (0 = automatic, 1 = manual)
[,10]   gear    Number of forward gears
[,11]   carb    Number of carburetors
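Before computing correlations, it can help to confirm the size of the data and whether it has any missing values, since that decides whether the use argument matters at all. A minimal base-R sketch:

```r
# mtcars ships with base R (datasets package), so no download is needed
dim(mtcars)     # 32 11: rows (cars) and columns (variables)
anyNA(mtcars)   # FALSE: no missing values

# With no missing values, listwise deletion changes nothing
k_all      <- cor(mtcars, method = "kendall")
k_complete <- cor(mtcars, method = "kendall", use = "complete.obs")
all.equal(k_all, k_complete)
```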

R Code:

# Compute the correlations among the (numeric) variables of R's built-in mtcars data,
# using listwise deletion for any missing data and Kendall's tau as the method.

cor(mtcars, use = "complete.obs", method = "kendall")
##             mpg        cyl       disp         hp        drat         wt
## mpg   1.0000000 -0.7953134 -0.7681311 -0.7428125  0.46454879 -0.7278321
## cyl  -0.7953134  1.0000000  0.8144263  0.7851865 -0.55131785  0.7282611
## disp -0.7681311  0.8144263  1.0000000  0.6659987 -0.49898277  0.7433824
## hp   -0.7428125  0.7851865  0.6659987  1.0000000 -0.38262689  0.6113081
## drat  0.4645488 -0.5513178 -0.4989828 -0.3826269  1.00000000 -0.5471495
## wt   -0.7278321  0.7282611  0.7433824  0.6113081 -0.54714953  1.0000000
## qsec  0.3153652 -0.4489698 -0.3008155 -0.4729061  0.03272155 -0.1419881
## vs    0.5896790 -0.7710007 -0.6033059 -0.6305926  0.37510111 -0.4884787
## am    0.4690128 -0.4946212 -0.5202739 -0.3039956  0.57554849 -0.6138790
## gear  0.4331509 -0.5125435 -0.4759795 -0.2794458  0.58392476 -0.5435956
## carb -0.5043945  0.4654299  0.4137360  0.5959842 -0.09535193  0.3713741
##             qsec         vs          am        gear        carb
## mpg   0.31536522  0.5896790  0.46901280  0.43315089 -0.50439455
## cyl  -0.44896982 -0.7710007 -0.49462115 -0.51254349  0.46542994
## disp -0.30081549 -0.6033059 -0.52027392 -0.47597955  0.41373600
## hp   -0.47290613 -0.6305926 -0.30399557 -0.27944584  0.59598416
## drat  0.03272155  0.3751011  0.57554849  0.58392476 -0.09535193
## wt   -0.14198812 -0.4884787 -0.61387896 -0.54359562  0.37137413
## qsec  1.00000000  0.6575431 -0.16890405 -0.09126069 -0.50643945
## vs    0.65754312  1.0000000  0.16834512  0.26974788 -0.57692729
## am   -0.16890405  0.1683451  1.00000000  0.77078758 -0.05859929
## gear -0.09126069  0.2697479  0.77078758  1.00000000  0.09801487
## carb -0.50643945 -0.5769273 -0.05859929  0.09801487  1.00000000
# And covariance among the variables:
cov(mtcars, use="complete.obs") 
##              mpg         cyl        disp          hp         drat
## mpg    36.324103  -9.1723790  -633.09721 -320.732056   2.19506351
## cyl    -9.172379   3.1895161   199.66028  101.931452  -0.66836694
## disp -633.097208 199.6602823 15360.79983 6721.158669 -47.06401915
## hp   -320.732056 101.9314516  6721.15867 4700.866935 -16.45110887
## drat    2.195064  -0.6683669   -47.06402  -16.451109   0.28588135
## wt     -5.116685   1.3673710   107.68420   44.192661  -0.37272073
## qsec    4.509149  -1.8868548   -96.05168  -86.770081   0.08714073
## vs      2.017137  -0.7298387   -44.37762  -24.987903   0.11864919
## am      1.803931  -0.4657258   -36.56401   -8.320565   0.19015121
## gear    2.135685  -0.6491935   -50.80262   -6.358871   0.27598790
## carb   -5.363105   1.5201613    79.06875   83.036290  -0.07840726
##               wt         qsec           vs           am        gear
## mpg   -5.1166847   4.50914919   2.01713710   1.80393145   2.1356855
## cyl    1.3673710  -1.88685484  -0.72983871  -0.46572581  -0.6491935
## disp 107.6842040 -96.05168145 -44.37762097 -36.56401210 -50.8026210
## hp    44.1926613 -86.77008065 -24.98790323  -8.32056452  -6.3588710
## drat  -0.3727207   0.08714073   0.11864919   0.19015121   0.2759879
## wt     0.9573790  -0.30548161  -0.27366129  -0.33810484  -0.4210806
## qsec  -0.3054816   3.19316613   0.67056452  -0.20495968  -0.2804032
## vs    -0.2736613   0.67056452   0.25403226   0.04233871   0.0766129
## am    -0.3381048  -0.20495968   0.04233871   0.24899194   0.2923387
## gear  -0.4210806  -0.28040323   0.07661290   0.29233871   0.5443548
## carb   0.6757903  -1.89411290  -0.46370968   0.04637097   0.3266129
##             carb
## mpg  -5.36310484
## cyl   1.52016129
## disp 79.06875000
## hp   83.03629032
## drat -0.07840726
## wt    0.67579032
## qsec -1.89411290
## vs   -0.46370968
## am    0.04637097
## gear  0.32661290
## carb  2.60887097
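The covariance matrix and the Pearson correlation matrix carry the same information up to scaling: the diagonal of the covariance matrix holds each variable's variance (36.32 for mpg in the output above), and dividing every entry by the product of the two standard deviations gives the correlations. A minimal sketch using base R's cov2cor(); the names S and R are arbitrary:

```r
S <- cov(mtcars)

# The diagonal of a covariance matrix is the vector of variances
all.equal(unname(diag(S)), unname(sapply(mtcars, var)))

# cov2cor() divides each covariance by the product of the two standard deviations
R <- cov2cor(S)
all.equal(R, cor(mtcars))   # the Pearson correlations that cor() returns
```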
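Unfortunately, neither cor() nor cov() produces tests of significance. To test a single correlation coefficient, use the cor.test() function.

# Test a single correlation coefficient (Kendall's tau between mpg and cyl).
# alternative = "greater" tests H1: tau > 0; because mpg and cyl are negatively
# related, this one-sided test returns p-value = 1 in the output below.
cor.test(mtcars$mpg, mtcars$cyl, method = "kendall", alternative = "greater",
         exact = FALSE) # use the large-sample (normal) approximation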
## 
##  Kendall's rank correlation tau
## 
## data:  mtcars$mpg and mtcars$cyl
## z = -5.5913, p-value = 1
## alternative hypothesis: true tau is greater than 0
## sample estimates:
##        tau 
## -0.7953134
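Since the estimated tau is negative, the one-sided "greater" alternative above points the opposite way to the data, which is why its p-value is 1. Here is a minimal sketch re-running the same test with alternative = "less" and with the default "two.sided" (the object names are arbitrary). In practice, the direction of a one-sided alternative should come from the research question, not from the observed estimate.

```r
# Same Kendall test, with H1: tau < 0
less <- cor.test(mtcars$mpg, mtcars$cyl, method = "kendall",
                 alternative = "less", exact = FALSE)

# Two-sided version, with H1: tau != 0
two <- cor.test(mtcars$mpg, mtcars$cyl, method = "kendall",
                alternative = "two.sided", exact = FALSE)

# Both p-values are far below 0.05; the two-sided one is twice the one-sided one
c(less = less$p.value, two.sided = two$p.value)
```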

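Alternative method

An alternative is the rcorr() function from the Hmisc package, which computes correlations together with their significance levels (P values) for Pearson and Spearman correlations. It expects a numeric matrix as input and handles missing values by pairwise deletion.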
library(Hmisc)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
# Correlations with significance levels; rcorr() needs a numeric matrix x
# Pearson:
# rcorr(x, type = "pearson")

# Spearman:
# rcorr(x, type = "spearman")

# using mtcars data
rcorr(as.matrix(mtcars), type="pearson") 
##        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
## mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
## cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
## disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
## hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
## drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
## wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43
## qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00  0.74 -0.23 -0.21 -0.66
## vs    0.66 -0.81 -0.71 -0.72  0.44 -0.55  0.74  1.00  0.17  0.21 -0.57
## am    0.60 -0.52 -0.59 -0.24  0.71 -0.69 -0.23  0.17  1.00  0.79  0.06
## gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  0.21  0.79  1.00  0.27
## carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66 -0.57  0.06  0.27  1.00
## 
## n= 32 
## 
## 
## P
##      mpg    cyl    disp   hp     drat   wt     qsec   vs     am     gear  
## mpg         0.0000 0.0000 0.0000 0.0000 0.0000 0.0171 0.0000 0.0003 0.0054
## cyl  0.0000        0.0000 0.0000 0.0000 0.0000 0.0004 0.0000 0.0022 0.0042
## disp 0.0000 0.0000        0.0000 0.0000 0.0000 0.0131 0.0000 0.0004 0.0010
## hp   0.0000 0.0000 0.0000        0.0100 0.0000 0.0000 0.0000 0.1798 0.4930
## drat 0.0000 0.0000 0.0000 0.0100        0.0000 0.6196 0.0117 0.0000 0.0000
## wt   0.0000 0.0000 0.0000 0.0000 0.0000        0.3389 0.0010 0.0000 0.0005
## qsec 0.0171 0.0004 0.0131 0.0000 0.6196 0.3389        0.0000 0.2057 0.2425
## vs   0.0000 0.0000 0.0000 0.0000 0.0117 0.0010 0.0000        0.3570 0.2579
## am   0.0003 0.0022 0.0004 0.1798 0.0000 0.0000 0.2057 0.3570        0.0000
## gear 0.0054 0.0042 0.0010 0.4930 0.0000 0.0005 0.2425 0.2579 0.0000       
## carb 0.0011 0.0019 0.0253 0.0000 0.6212 0.0146 0.0000 0.0007 0.7545 0.1290
##      carb  
## mpg  0.0011
## cyl  0.0019
## disp 0.0253
## hp   0.0000
## drat 0.6212
## wt   0.0146
## qsec 0.0000
## vs   0.0007
## am   0.7545
## gear 0.1290
## carb
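For a Pearson correlation, the standard significance test uses t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. A minimal sketch for the mpg and qsec pair; it assumes rcorr() uses this same t approximation for Pearson P values, and for this pair the result agrees with the 0.0171 printed above.

```r
x <- mtcars$mpg
y <- mtcars$qsec
n <- length(x)
r <- cor(x, y)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided P value

round(c(r = r, t = t_stat, p = p_manual), 4)

# cor.test() runs the same two-sided test by default
all.equal(p_manual, cor.test(x, y)$p.value)
```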

Interpretations

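In the Pearson correlation matrix above, the strongest correlation (and the strongest positive one) is between cyl (number of cylinders)
and disp (displacement), at 0.90. The strongest negative correlation is between mpg (miles per gallon) and wt (weight), at -0.87.
Transmission type (am) and number of forward gears (gear) are also strongly positively correlated, at 0.79. All three have P values printed as 0.0000.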
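Reading the strongest pairs off a printed matrix is error-prone, so here is a minimal base-R sketch that locates them programmatically. It uses cor() rather than rcorr() so that it runs without Hmisc; the Pearson coefficients agree with the rounded values printed above. The names r, pos, neg and strongest are arbitrary.

```r
# Pearson correlation matrix of all mtcars variables
r <- cor(mtcars)

# Blank out the diagonal and lower triangle so each pair is considered once
r[lower.tri(r, diag = TRUE)] <- NA

# Row/column positions of the largest and smallest remaining coefficients
pos <- which(r == max(r, na.rm = TRUE), arr.ind = TRUE)
neg <- which(r == min(r, na.rm = TRUE), arr.ind = TRUE)

strongest <- data.frame(
  var1 = rownames(r)[c(pos[1, "row"], neg[1, "row"])],
  var2 = colnames(r)[c(pos[1, "col"], neg[1, "col"])],
  r    = round(c(r[pos[1, "row"], pos[1, "col"]],
                 r[neg[1, "row"], neg[1, "col"]]), 2),
  row.names = c("strongest positive", "strongest negative")
)
strongest
```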

# Spearman rank correlations, using mtcars data
rcorr(as.matrix(mtcars), type="spearman") 
##        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
## mpg   1.00 -0.91 -0.91 -0.89  0.65 -0.89  0.47  0.71  0.56  0.54 -0.66
## cyl  -0.91  1.00  0.93  0.90 -0.68  0.86 -0.57 -0.81 -0.52 -0.56  0.58
## disp -0.91  0.93  1.00  0.85 -0.68  0.90 -0.46 -0.72 -0.62 -0.59  0.54
## hp   -0.89  0.90  0.85  1.00 -0.52  0.77 -0.67 -0.75 -0.36 -0.33  0.73
## drat  0.65 -0.68 -0.68 -0.52  1.00 -0.75  0.09  0.45  0.69  0.74 -0.13
## wt   -0.89  0.86  0.90  0.77 -0.75  1.00 -0.23 -0.59 -0.74 -0.68  0.50
## qsec  0.47 -0.57 -0.46 -0.67  0.09 -0.23  1.00  0.79 -0.20 -0.15 -0.66
## vs    0.71 -0.81 -0.72 -0.75  0.45 -0.59  0.79  1.00  0.17  0.28 -0.63
## am    0.56 -0.52 -0.62 -0.36  0.69 -0.74 -0.20  0.17  1.00  0.81 -0.06
## gear  0.54 -0.56 -0.59 -0.33  0.74 -0.68 -0.15  0.28  0.81  1.00  0.11
## carb -0.66  0.58  0.54  0.73 -0.13  0.50 -0.66 -0.63 -0.06  0.11  1.00
## 
## n= 32 
## 
## 
## P
##      mpg    cyl    disp   hp     drat   wt     qsec   vs     am     gear  
## mpg         0.0000 0.0000 0.0000 0.0000 0.0000 0.0071 0.0000 0.0008 0.0013
## cyl  0.0000        0.0000 0.0000 0.0000 0.0000 0.0006 0.0000 0.0022 0.0008
## disp 0.0000 0.0000        0.0000 0.0000 0.0000 0.0081 0.0000 0.0001 0.0003
## hp   0.0000 0.0000 0.0000        0.0023 0.0000 0.0000 0.0000 0.0416 0.0639
## drat 0.0000 0.0000 0.0000 0.0023        0.0000 0.6170 0.0102 0.0000 0.0000
## wt   0.0000 0.0000 0.0000 0.0000 0.0000        0.2148 0.0004 0.0000 0.0000
## qsec 0.0071 0.0006 0.0081 0.0000 0.6170 0.2148        0.0000 0.2644 0.4182
## vs   0.0000 0.0000 0.0000 0.0000 0.0102 0.0004 0.0000        0.3570 0.1170
## am   0.0008 0.0022 0.0001 0.0416 0.0000 0.0000 0.2644 0.3570        0.0000
## gear 0.0013 0.0008 0.0003 0.0639 0.0000 0.0000 0.4182 0.1170 0.0000       
## carb 0.0000 0.0005 0.0014 0.0000 0.4947 0.0036 0.0000 0.0000 0.7264 0.5312
##      carb  
## mpg  0.0000
## cyl  0.0005
## disp 0.0014
## hp   0.0000
## drat 0.4947
## wt   0.0036
## qsec 0.0000
## vs   0.0000
## am   0.7264
## gear 0.5312
## carb
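Spearman's rho is simply the Pearson correlation of the ranks, which is why it captures monotonic rather than strictly linear relationships. Here is a minimal sketch for mpg and cyl (an arbitrary pair; the table above shows -0.91). rank() assigns average ranks to ties by default, which matters here because cyl takes only three distinct values.

```r
x <- mtcars$mpg
y <- mtcars$cyl

# Pearson correlation of the (average-tie) ranks
rho_ranks <- cor(rank(x), rank(y))

# Built-in Spearman correlation
rho_builtin <- cor(x, y, method = "spearman")

round(c(ranks = rho_ranks, builtin = rho_builtin), 2)
all.equal(rho_ranks, rho_builtin)
```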

Correlograms

Correlograms help us visualize the data in correlation matrices.
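# Correlogram (requires the corrgram package: install.packages("corrgram"))
library(corrgram)

# order = TRUE reorders the variables using principal components of the correlation
# matrix; lower panels are shaded by correlation and upper panels show pie charts
corrgram(mtcars, order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie,
         text.panel = panel.txt, main = "Car Mileage Data in PC2/PC1 Order")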