March 1 Correlation

R Markdown

Use the built-in data set “iris” for the following experiment.

Find the names of the variables in the data set.

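d <- iris
View(d)   # opens the data viewer (interactive sessions only)
head(d)   # first six rows; the column headers are the variable names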

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
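The column headers printed by head() are the variable names. As a small sketch, names() and str() return them directly without printing any data rows:

```r
d <- iris
names(d)   # character vector with the five column names
str(d)     # also shows each column's type: four numeric, one factor
```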

Plot the data set.

plot(d)

Find the correlation between the variables using the function cor(). Which correlation method does it use by default?

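# Two equivalent ways to drop the non-numeric Species column (column 5)
d <- iris[, -5]    # exclude column 5
d <- iris[, 1:4]   # keep columns 1 to 4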
cor(d)

##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

By default, cor() uses the Pearson correlation method.
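To see what the Pearson coefficient measures, the following sketch computes it by hand for Sepal.Length and Sepal.Width and compares the result with cor(). The variable names x, y and r_manual are only illustrative.

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width

# Pearson's r: sum of cross-deviations from the means, divided by the
# square root of the product of the two sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_manual
all.equal(r_manual, cor(x, y, method = "pearson"))
```

The hand-computed value should agree with the Sepal.Length / Sepal.Width entry of the matrix above.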

Now use Kendall’s and Spearman’s correlation methods.

cor(d, method = "kendall")

##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086
## Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257
## Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907
## Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000

cor(d, method = "spearman")

##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1667777    0.8818981   0.8342888
## Sepal.Width    -0.1667777   1.0000000   -0.3096351  -0.2890317
## Petal.Length    0.8818981  -0.3096351    1.0000000   0.9376668
## Petal.Width     0.8342888  -0.2890317    0.9376668   1.0000000
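Kendall’s and Spearman’s coefficients are based on ranks rather than on the raw values. As a brief illustration, Spearman’s coefficient is the Pearson correlation of the ranked data; the sketch below checks this for one pair of variables and then lists all three coefficients side by side. The names used here (rho_ranks, method_names) are illustrative.

```r
x <- iris$Petal.Length
y <- iris$Petal.Width

# Spearman: Pearson correlation of the ranks (ties receive average ranks)
rho_ranks   <- cor(rank(x), rank(y), method = "pearson")
rho_builtin <- cor(x, y, method = "spearman")
all.equal(rho_ranks, rho_builtin)

# The three methods for the same pair of variables
method_names <- c("pearson", "kendall", "spearman")
sapply(method_names, function(m) cor(x, y, method = m))
```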

Conduct the correlation test between Sepal.Length and Sepal.Width.

cor.test(d$Sepal.Length, d$Sepal.Width)

## 
##  Pearson's product-moment correlation
## 
## data:  d$Sepal.Length and d$Sepal.Width
## t = -1.4403, df = 148, p-value = 0.1519
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.27269325  0.04351158
## sample estimates:
##        cor 
## -0.1175698
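The p-value of 0.1519 is above 0.05 and the confidence interval contains 0, so this test does not reject the hypothesis of zero correlation. The sketch below pulls the individual pieces out of the object returned by cor.test() and reproduces the t statistic from r and the sample size; the names ct, r, n and t_manual are illustrative.

```r
ct <- cor.test(iris$Sepal.Length, iris$Sepal.Width)

ct$estimate   # sample correlation
ct$p.value    # two-sided p-value
ct$conf.int   # 95 percent confidence interval

# Reproduce the t statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2)
r <- unname(ct$estimate)
n <- nrow(iris)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
all.equal(t_manual, unname(ct$statistic))

# Decision at the 5 percent level
ct$p.value < 0.05
```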

Let us store the result of cor(d) in an object called “cr”. You can use any name, though.

cr<-cor(d) 
cr

##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
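Once the matrix is stored, it can be queried like any other matrix. As one possible sketch, the code below finds the pair of variables with the strongest correlation; off_diag and idx are illustrative helper names.

```r
cr <- cor(iris[, 1:4])

# Ignore the diagonal (every variable correlates perfectly with itself)
off_diag <- cr
diag(off_diag) <- 0

# Row/column position of the largest absolute correlation
idx <- which(abs(off_diag) == max(abs(off_diag)), arr.ind = TRUE)[1, ]
c(rownames(cr)[idx["row"]], colnames(cr)[idx["col"]])
cr[idx["row"], idx["col"]]
```

For iris this picks out Petal.Length and Petal.Width, matching the largest off-diagonal entry in the printed matrix.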

Install the package “corrplot” and visualize the correlation matrix with the function corrplot().

library(corrplot)

## corrplot 0.84 loaded
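If library() reports that a package is not found, it has to be installed first. A minimal sketch, assuming the packages come from CRAN, that checks which of the two packages used in this worksheet are missing without installing anything automatically (pkgs and not_installed are illustrative names):

```r
pkgs <- c("corrplot", "psych")

# requireNamespace() returns FALSE instead of stopping when a package is absent
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!available]

if (length(not_installed) > 0) {
  message("Install with install.packages(): ",
          paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```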

corrplot(cr)

Visualize the same correlation matrix with different methods, for example “pie”, “color” and “number”.

corrplot(cr, method = "pie")

corrplot(cr, method = "color")

corrplot(cr, method = "number")

What do you observe in these plots? Is any information repeated? Can you draw only the upper triangular or the lower triangular part of the plot?

corrplot(cr, type = "lower")

corrplot(cr, type = "upper")
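The full plots repeat information because a correlation matrix is symmetric: the value for (x, y) equals the value for (y, x). A short base R sketch of the same idea, using the illustrative name lower_only:

```r
cr <- cor(iris[, 1:4])

isSymmetric(cr)   # the upper triangle mirrors the lower triangle

# Keep only the diagonal and lower triangle; blank out the redundant half
lower_only <- cr
lower_only[upper.tri(lower_only)] <- NA
round(lower_only, 2)
```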

Try a different function to draw scatter plots for every pair of variables in columns 1 to 4 of “iris”. Use pairs().

pairs(iris[,1:4], pch = 19)

pairs(iris[,1:4], pch = 19, lower.panel = NULL)

Visualize the histograms, the scatter plots with fitted curves and the correlation coefficients in the same matrix graph. You may need to install the package psych and then use pairs.panels().

library(psych)
pairs.panels(iris[, -5],
             hist.col = "#00AFBB",   # fill color of the histograms
             density = TRUE,         # overlay density curves
             ellipses = TRUE)        # draw correlation ellipses

Change the color of the histogram using a color code from this website: https://www.rapidtables.com/web/color/RGB_Color.html
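A hex code such as "#00AFBB" is just three RGB components written in hexadecimal. As a small sketch, base R can convert in both directions, so any RGB triple from a color table can be turned into a value for hist.col (hist_color is an illustrative name):

```r
# Build a hex color from RGB components on the 0-255 scale
hist_color <- rgb(0, 175, 187, maxColorValue = 255)
hist_color          # "#00AFBB", the color used in pairs.panels()

# Go the other way: hex code back to its RGB components
col2rgb("#00AFBB")

# Another example: orange (255, 165, 0)
rgb(255, 165, 0, maxColorValue = 255)   # "#FFA500"
```

Either the resulting string or a named R color such as "orange" can then be passed as hist.col.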

View(mtcars)

Cade Corcoran

March 1, 2019
