Correlations in the mtcars dataset

Sameer Mathur

Correlations (1)

In a bivariate distribution, we may want to know whether there is any correlation (covariation) between the two variables under study. If a change in one variable is accompanied by a change in the other, the variables are said to be correlated.

Correlations (2)

  • If the two variables deviate in the same direction, i.e. an increase (or decrease) in one is accompanied by a corresponding increase (or decrease) in the other, the correlation is said to be direct or positive.

Correlations (3)

  • If they consistently deviate in opposite directions, i.e. an increase (or decrease) in one is accompanied by a corresponding decrease (or increase) in the other, the correlation is said to be inverse or negative.

Correlations (4)

For example, the correlation between the heights and weights of a group of persons is positive, and the correlation between the price of a commodity and the demand for it is typically negative.
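To make the sign of a correlation concrete, here is a minimal sketch that computes the Pearson coefficient by hand as the covariance divided by the product of the standard deviations, and compares it with cor(). The choice of wt and mpg from mtcars is only an illustration; the variable names r_manual and r_builtin are made up for this example.

```r
# Illustrative sketch: Pearson correlation from its definition.
x <- mtcars$wt    # weight (1000 lbs)
y <- mtcars$mpg   # miles per gallon

# r = cov(x, y) / (sd(x) * sd(y))
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

round(c(manual = r_manual, builtin = r_builtin), 2)
# Both values are about -0.87: heavier cars tend to have lower mpg,
# so this is a negative (inverse) correlation.
```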

3 Types of Correlations

Pearson, Spearman and Kendall

  • The Pearson product moment correlation assesses the degree of linear relationship between two quantitative variables.
  • Spearman's rank-order correlation coefficient assesses the degree of relationship between two rank-ordered variables.
  • Kendall's tau is a nonparametric measure of rank correlation.
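As a rough illustration of the three coefficients listed above, the sketch below computes each of them for one pair of variables through the method argument of cor(). It also checks that Spearman's coefficient is the Pearson coefficient applied to ranks. The pair mpg and wt is an arbitrary choice for the example.

```r
# Illustrative sketch: the three coefficients for one pair of variables.
x <- mtcars$mpg
y <- mtcars$wt

r_pearson  <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")
r_kendall  <- cor(x, y, method = "kendall")

# Spearman's rho is Pearson's r computed on the ranks
# (ties receive average ranks).
r_on_ranks <- cor(rank(x), rank(y))

round(c(pearson = r_pearson, spearman = r_spearman,
        kendall = r_kendall, pearson_on_ranks = r_on_ranks), 2)
```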

mtcars dataset

We will use the mtcars dataset available in the base R installation.

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

It has 32 observations on 11 variables. See the mtcars data description (?mtcars).

Summary of the mtcars dataset

attach(mtcars)   # attach the data
library(psych)
describe(mtcars)[, c(1:5, 7:9)]  # selected columns
     vars  n   mean     sd median    mad   min    max
mpg     1 32  20.09   6.03  19.20   5.41 10.40  33.90
cyl     2 32   6.19   1.79   6.00   2.97  4.00   8.00
disp    3 32 230.72 123.94 196.30 140.48 71.10 472.00
hp      4 32 146.69  68.56 123.00  77.10 52.00 335.00
drat    5 32   3.60   0.53   3.70   0.70  2.76   4.93
wt      6 32   3.22   0.98   3.33   0.77  1.51   5.42
qsec    7 32  17.85   1.79  17.71   1.42 14.50  22.90
vs      8 32   0.44   0.50   0.00   0.00  0.00   1.00
am      9 32   0.41   0.50   0.00   0.00  0.00   1.00
gear   10 32   3.69   0.74   4.00   1.48  3.00   5.00
carb   11 32   2.81   1.62   2.00   1.48  1.00   8.00

Variance-Covariance Matrix

# variance-covariance matrix
covmt <- cov(mtcars[,c(1:7)])
round(covmt,2)
         mpg    cyl     disp      hp   drat     wt   qsec
mpg    36.32  -9.17  -633.10 -320.73   2.20  -5.12   4.51
cyl    -9.17   3.19   199.66  101.93  -0.67   1.37  -1.89
disp -633.10 199.66 15360.80 6721.16 -47.06 107.68 -96.05
hp   -320.73 101.93  6721.16 4700.87 -16.45  44.19 -86.77
drat    2.20  -0.67   -47.06  -16.45   0.29  -0.37   0.09
wt     -5.12   1.37   107.68   44.19  -0.37   0.96  -0.31
qsec    4.51  -1.89   -96.05  -86.77   0.09  -0.31   3.19
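The covariance matrix and the correlation matrix carry the same information up to scaling: dividing each covariance by the two corresponding standard deviations gives the correlation. A minimal sketch of that link, using the base R function cov2cor() on the same seven columns (the names cov7 and cor_from_cov are made up for this example):

```r
# Illustrative sketch: rescale a covariance matrix into a correlation matrix.
cov7 <- cov(mtcars[, 1:7])

# cov2cor() divides entry (i, j) by sd_i * sd_j
cor_from_cov <- cov2cor(cov7)

# Same result as calling cor() directly on the data
cor_direct <- cor(mtcars[, 1:7])

round(cor_from_cov, 2)
```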

Correlation Matrix - selected columns

# mpg, cyl, and disp as rows 
x <- mtcars[1:3]
# hp, drat, and wt as columns
y <- mtcars[4:6]
z <- cor(x, y)
round(z,2)
        hp  drat    wt
mpg  -0.78  0.68 -0.87
cyl   0.83 -0.70  0.78
disp  0.79 -0.71  0.89

Correlation Matrix from mtcars

Pearson product-moment correlation coefficients

# Pearson - correlation matrix - all rows and columns 
cormt <- cor(mtcars)
round(cormt,2)
       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43
qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00  0.74 -0.23 -0.21 -0.66
vs    0.66 -0.81 -0.71 -0.72  0.44 -0.55  0.74  1.00  0.17  0.21 -0.57
am    0.60 -0.52 -0.59 -0.24  0.71 -0.69 -0.23  0.17  1.00  0.79  0.06
gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  0.21  0.79  1.00  0.27
carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66 -0.57  0.06  0.27  1.00

Correlation Matrix from mtcars

Spearman rank-order correlation coefficients.

# Spearman - correlation matrix
s <- cor(mtcars, method="spearman")
round(s,2)
       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
mpg   1.00 -0.91 -0.91 -0.89  0.65 -0.89  0.47  0.71  0.56  0.54 -0.66
cyl  -0.91  1.00  0.93  0.90 -0.68  0.86 -0.57 -0.81 -0.52 -0.56  0.58
disp -0.91  0.93  1.00  0.85 -0.68  0.90 -0.46 -0.72 -0.62 -0.59  0.54
hp   -0.89  0.90  0.85  1.00 -0.52  0.77 -0.67 -0.75 -0.36 -0.33  0.73
drat  0.65 -0.68 -0.68 -0.52  1.00 -0.75  0.09  0.45  0.69  0.74 -0.13
wt   -0.89  0.86  0.90  0.77 -0.75  1.00 -0.23 -0.59 -0.74 -0.68  0.50
qsec  0.47 -0.57 -0.46 -0.67  0.09 -0.23  1.00  0.79 -0.20 -0.15 -0.66
vs    0.71 -0.81 -0.72 -0.75  0.45 -0.59  0.79  1.00  0.17  0.28 -0.63
am    0.56 -0.52 -0.62 -0.36  0.69 -0.74 -0.20  0.17  1.00  0.81 -0.06
gear  0.54 -0.56 -0.59 -0.33  0.74 -0.68 -0.15  0.28  0.81  1.00  0.11
carb -0.66  0.58  0.54  0.73 -0.13  0.50 -0.66 -0.63 -0.06  0.11  1.00

Non-Square Correlation Matrices

# four variables in x (these become the rows of the result)
x <- mtcars[,c("mpg", "disp", "hp", "drat")]
# two variables in y (these become the columns of the result)
y <- mtcars[,c("vs", "carb")]
z <- cor(x,y)
round(z,2)
        vs  carb
mpg   0.66 -0.55
disp -0.71  0.39
hp   -0.72  0.75
drat  0.44 -0.09

This is useful when we are interested in the relationship between one set of variables and another.
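A non-square matrix from cor(x, y) is simply a block of the full correlation matrix. The sketch below checks this by indexing cor(mtcars) with the same row and column names; the names rows, cols, block and direct are made up for this example.

```r
# Illustrative sketch: cor(x, y) is a sub-block of the full matrix.
rows <- c("mpg", "disp", "hp", "drat")
cols <- c("vs", "carb")

# Extract the block from the full 11 x 11 correlation matrix
block  <- cor(mtcars)[rows, cols]

# Compute the same block directly
direct <- cor(mtcars[, rows], mtcars[, cols])

round(block, 2)
```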

Testing a correlation coefficient for significance (1/2)

# correlation test between mpg and wt
cor.test(mpg, wt)

    Pearson's product-moment correlation

data:  mpg and wt
t = -9.559, df = 30, p-value = 1.294e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.9338264 -0.7440872
sample estimates:
       cor 
-0.8676594 

Testing a correlation coefficient for significance (2/2)

# correlation test between disp (3rd column) and drat (5th column)
cor.test(mtcars[,3], mtcars[,5])

    Pearson's product-moment correlation

data:  mtcars[, 3] and mtcars[, 5]
t = -5.5257, df = 30, p-value = 5.282e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8487237 -0.4805193
sample estimates:
       cor 
-0.7102139 

It tests the null hypothesis that the Pearson correlation between disp and drat is 0.
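The printed output of cor.test() can also be read programmatically, because the function returns a list. A minimal sketch, assuming we want the estimate, p-value and confidence interval for mpg and wt; it refers to the columns through mtcars$ so it does not depend on attach(). The second call switches to Spearman's coefficient; exact = FALSE is used here because the data contain ties.

```r
# Illustrative sketch: pull individual results out of a cor.test() object.
res <- cor.test(mtcars$mpg, mtcars$wt)

res$estimate    # sample correlation
res$statistic   # t statistic
res$parameter   # degrees of freedom
res$p.value     # two-sided p-value
res$conf.int    # 95 percent confidence interval

# Same test using Spearman's rank correlation
rho <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)
rho$estimate
```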

Tests of significance via corr.test() (1/2)

library(psych)
corr.test(mtcars[,c(1,6:9)], use="complete")

cor.test() tests only one correlation at a time, whereas the corr.test() function provided in the psych package tests several correlations at once.

Tests of significance via corr.test() (2/2)

Call:corr.test(x = mtcars[, c(1, 6:9)], use = "complete")
Correlation matrix 
       mpg    wt  qsec    vs    am
mpg   1.00 -0.87  0.42  0.66  0.60
wt   -0.87  1.00 -0.17 -0.55 -0.69
qsec  0.42 -0.17  1.00  0.74 -0.23
vs    0.66 -0.55  0.74  1.00  0.17
am    0.60 -0.69 -0.23  0.17  1.00
Sample Size 
[1] 32
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
      mpg   wt qsec   vs   am
mpg  0.00 0.00 0.07 0.00 0.00
wt   0.00 0.00 0.68 0.00 0.00
qsec 0.02 0.34 0.00 0.00 0.62
vs   0.00 0.00 0.00 0.00 0.68
am   0.00 0.00 0.21 0.36 0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
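The adjusted values above the diagonal can be approximated with base R alone. The sketch below runs cor.test() on each of the 10 pairs and adjusts the raw p-values with p.adjust(). It assumes Holm's method, which is consistent with the numbers printed above (for example 0.02 unadjusted and 0.07 adjusted for mpg and qsec); the helper names vars, pairs, raw_p and adj_p are made up for this example.

```r
# Illustrative sketch: raw vs. Holm-adjusted p-values for all pairs.
vars  <- c("mpg", "wt", "qsec", "vs", "am")
pairs <- combn(vars, 2)   # 2 x 10 matrix, one column per pair

# Raw two-sided p-value for each pair
raw_p <- apply(pairs, 2, function(v) {
  cor.test(mtcars[[v[1]]], mtcars[[v[2]]])$p.value
})
names(raw_p) <- apply(pairs, 2, paste, collapse = "-")

# Holm adjustment across the 10 tests (assumed adjustment method)
adj_p <- p.adjust(raw_p, method = "holm")

round(cbind(raw = raw_p, holm = adj_p), 2)
```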

Correlations with significance levels

The rcorr() function in the Hmisc package produces correlations and their significance levels (p-values) for Pearson and Spearman correlations.

library(Hmisc)
rcorr(as.matrix(mtcars[,c(1,6:9)]), type="pearson")   # type can be pearson or spearman

Correlations with significance levels

       mpg    wt  qsec    vs    am
mpg   1.00 -0.87  0.42  0.66  0.60
wt   -0.87  1.00 -0.17 -0.55 -0.69
qsec  0.42 -0.17  1.00  0.74 -0.23
vs    0.66 -0.55  0.74  1.00  0.17
am    0.60 -0.69 -0.23  0.17  1.00

n= 32 


P
     mpg    wt     qsec   vs     am    
mpg         0.0000 0.0171 0.0000 0.0003
wt   0.0000        0.3389 0.0010 0.0000
qsec 0.0171 0.3389        0.0000 0.2057
vs   0.0000 0.0010 0.0000        0.3570
am   0.0003 0.0000 0.2057 0.3570       

Corrgrams (1/2)

library(corrgram)
corrgram(mtcars[,c(1,6:9)], order=FALSE, lower.panel=panel.conf,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram - mtcars")

Corrgrams (2/2)

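[Figure: corrgram of mpg, wt, qsec, vs and am, titled "Corrgram - mtcars", with pie charts in the upper panel and correlations with confidence intervals in the lower panel]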

Summary of Correlations on mtcars

  • Correlations
  • Types of Correlations
  • The mtcars dataset
  • Variance-Covariance Matrices using cov()
  • Correlations between selected columns, using cor()
  • Non-Square Correlation matrices
  • cor.test()
  • psych::corr.test()
  • Hmisc::rcorr()
  • corrgram::corrgram()