Pearson correlation (r) measures the linear dependence between two variables (x and y). It is also known as a parametric correlation test because it depends on the distribution of the data. Its significance test is appropriate only when x and y are approximately normally distributed.
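As a minimal sketch of what r measures, the coefficient can be computed directly from its definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations) and compared with R's built-in `cor()`. The names `x`, `y`, `r_manual`, and `r_builtin` are illustrative and do not come from the original analysis.

```r
# Pearson's r from its definition, using the built-in mtcars data
x <- mtcars$wt
y <- mtcars$mpg

# numerator: sum of cross-products of deviations from the means
# denominator: square root of the product of the sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# the built-in function should agree up to floating-point error
r_builtin <- cor(x, y, method = "pearson")
all.equal(r_manual, r_builtin)
```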
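# correlation coefficient between mpg and wt in the dataframe mtcars
# (mpg and wt are referenced directly, which assumes attach(mtcars) was called earlier)
cor(mpg, wt, method = "pearson")
## [1] -0.8676594
# correlation significance test between mpg and wt
cor.test(mpg, wt, method = "pearson")
##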
## Pearson's product-moment correlation
##
## data: mpg and wt
## t = -9.559, df = 30, p-value = 1.294e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9338264 -0.7440872
## sample estimates:
## cor
## -0.8676594
Here, the p-value of the test is 1.294e-10, which is less than the significance level alpha = 0.05. We can conclude that mpg and wt are significantly correlated, with a correlation coefficient of -0.8676594 and a p-value of 1.294e-10. The negative sign of the coefficient indicates a negative correlation: as wt increases, mpg tends to decrease.
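The test statistic and confidence interval reported above can be reproduced by hand, which may help in reading the output. The sketch below assumes the standard formulas: t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and a confidence interval built on the Fisher z-transformation. The variable names are illustrative.

```r
r <- cor(mtcars$mpg, mtcars$wt)   # Pearson correlation
n <- nrow(mtcars)                 # number of observations (32)

# t statistic with n - 2 degrees of freedom, and its two-sided p-value
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 3)   # compare with t = -9.559 in the output above
round(ci, 7)       # compare with the 95 percent confidence interval above
```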
Spearman is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function.
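One way to see what "rank correlation" means: Spearman's rho equals Pearson's r computed on the ranks of the data. The sketch below checks this on mtcars and adds a small made-up example (`x_demo`, `y_demo`, purely illustrative) where the relationship is monotonic but not linear.

```r
# Spearman's rho is Pearson's r applied to ranks (ties receive average ranks)
rho_builtin <- cor(mtcars$mpg, mtcars$wt, method = "spearman")
rho_ranks   <- cor(rank(mtcars$mpg), rank(mtcars$wt), method = "pearson")
all.equal(rho_builtin, rho_ranks)

# a perfectly monotonic but non-linear relationship (illustrative data)
x_demo <- 1:10
y_demo <- exp(x_demo)
r_demo   <- cor(x_demo, y_demo, method = "pearson")   # below 1: not linear
rho_demo <- cor(x_demo, y_demo, method = "spearman")  # equals 1: perfectly monotonic
```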
# Shapiro-Wilk test: are mpg and wt normally distributed?
shapiro.test(mtcars$mpg)
shapiro.test(mtcars$wt)
##
## Shapiro-Wilk normality test
##
## data: mtcars$mpg
## W = 0.94756, p-value = 0.1229
##
## Shapiro-Wilk normality test
##
## data: mtcars$wt
## W = 0.94326, p-value = 0.09265
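Both p-values (0.1229 for mpg and 0.09265 for wt) are greater than 0.05, so we fail to reject the null hypothesis of normality and can treat mpg and wt as approximately normally distributed.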
# correlation coefficient between mpg and wt in the dataframe mtcars
cor(mpg, wt, method = "spearman")
## [1] -0.886422
# correlation significance test between mpg and wt
cor.test(mpg, wt, method = "spearman")
## Warning in cor.test.default(mpg, wt, method = "spearman"): Cannot compute exact
## p-value with ties
##
## Spearman's rank correlation rho
##
## data: mpg and wt
## S = 10292, p-value = 1.488e-11
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.886422
Here, the p-value of the test is 1.488e-11, which is less than the significance level alpha = 0.05. We can conclude that mpg and wt are significantly correlated, with a rank correlation coefficient (rho) of -0.886422 and a p-value of 1.488e-11. The negative sign of the coefficient indicates a negative correlation.
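The "Cannot compute exact p-value with ties" warning appears because several cars share the same mpg or wt value, so R falls back to an approximate p-value. A minimal sketch, assuming the approximation is acceptable for the analysis: pass `exact = FALSE` to `cor.test()` to request it explicitly, which avoids the warning. The name `res` is illustrative.

```r
# request the asymptotic approximation explicitly so no ties warning is raised
res <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)

res$estimate  # Spearman's rho
res$p.value   # approximate two-sided p-value
```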
# subsetting data only with continuous variables
mtcarsSubset <- mtcars[,c('mpg','wt','hp','drat')]
#mtcarsSubset2 <- subset(mtcars, select=c(mpg,wt,hp,drat))
# correlation matrix on new dataframe mtcarsSubset
corMat <- cor(mtcarsSubset)
# round to 3 decimal places
round(corMat, 3)
##         mpg     wt     hp   drat
## mpg   1.000 -0.868 -0.776  0.681
## wt   -0.868  1.000  0.659 -0.712
## hp   -0.776  0.659  1.000 -0.449
## drat  0.681 -0.712 -0.449  1.000
# correlation matrix with p-values (n = number of observations, P = p-values)
##        mpg    wt    hp  drat
## mpg   1.00 -0.87 -0.78  0.68
## wt   -0.87  1.00  0.66 -0.71
## hp   -0.78  0.66  1.00 -0.45
## drat  0.68 -0.71 -0.45  1.00
##
## n= 32
##
##
## P
##      mpg  wt   hp   drat
## mpg       0.00 0.00 0.00
## wt   0.00      0.00 0.00
## hp   0.00 0.00      0.01
## drat 0.00 0.00 0.01
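The call that produced the table with p-values is not shown in the text. As a hedged sketch, not necessarily the author's method, a comparable matrix of pairwise p-values can be built in base R by looping `cor.test()` over every pair of columns. The names `vars`, `dat`, and `p_mat` are hypothetical.

```r
# pairwise Pearson p-values for the four continuous variables
vars <- c("mpg", "wt", "hp", "drat")
dat  <- mtcars[, vars]

# start with an empty matrix; the diagonal stays NA (a variable vs itself)
p_mat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
                dimnames = list(vars, vars))

for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i != j) {
      p_mat[i, j] <- cor.test(dat[[i]], dat[[j]], method = "pearson")$p.value
    }
  }
}

round(p_mat, 2)
```

Rounded to two decimals, most entries show as 0, which only means the p-value is below 0.005, not that it is exactly zero.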
# visualization using scatter plot(between "mpg" and "wt")
plot(mtcars$wt, mtcars$mpg,
main = "Scatterplot",
xlab ="Car Weight ",
ylab ="Miles Per Gallon ",
pch = 19)
# regression line (y~x)
abline(lm(mpg ~ wt, data = mtcars), col = "red")
# basic Scatterplot Matrix
pairs(~ mpg + disp + drat + wt,
data = mtcars,
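main = "Simple Scatterplot Matrix")
library(car)
## Loading required package: carData
# scatter plot matrix using scatterplotMatrix() from the car package
# (append "| cyl" to the formula to color points by the three cylinder options)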
scatterplotMatrix(~ mpg
+ disp
+ drat
+ wt,
data = mtcars,
main = "Simple Scatterplot Matrix")
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.0.3
## corrplot 0.84 loaded
library(corrgram)
## Warning: package 'corrgram' was built under R version 4.0.5
##
## Attaching package: 'corrgram'
## The following object is masked from 'package:lattice':
##
## panel.fill
# corrgram
corrgram(mtcarsSubset,
lower.panel = panel.shade,
upper.panel = panel.conf, text.panel = panel.txt,
main = "Corrgram")
# subsetting data only with continuous variables
mtcarsSubset <- mtcars[,c('mpg','wt','hp','drat')]
# correlogram
library(corrplot)
corrplot(cor(mtcarsSubset), method = "circle")
# correlogram using ellipses (use method = "number" to print the correlation coefficients instead)
corrplot(cor(mtcarsSubset), method = "ellipse")