R Markdown

Use the built-in data set “iris” for the following experiment.

  1. Find the names of the variables in the data set.
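d <- iris   # work on a copy of the built-in data set
View(d)     # opens the spreadsheet-style viewer (interactive sessions only)
head(d)     # the column headers are the variable names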
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
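
The variable names can also be read directly instead of from the header of head(). The following is a minimal sketch using only base R; the object name `d` follows the worksheet, and the extra calls to sapply() and dim() are additions for illustration.

```r
# Copy the built-in data set, as in the worksheet
d <- iris

# Column (variable) names only
names(d)
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

# Class of each variable: four numeric columns and one factor
sapply(d, class)

# Number of rows and columns
dim(d)
```

This makes it clear that Species is a factor, which is why it is dropped before calling cor() later on.
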
  1. Plot the data set.
plot(d)

  1. Find the correlation between the variables using the function cor(). What is this correlation method?
d <- iris[, -5]   # drop the non-numeric Species column
d <- iris[1:4]    # equivalent: keep columns 1 to 4
cor(d)
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

By default, cor() uses the Pearson product-moment correlation (method = "pearson").
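
To see what the Pearson method computes, the coefficient can be rebuilt from the covariance and the standard deviations. This is a small sketch for one pair of variables; the names `x`, `y`, `r_manual` and `r_builtin` are chosen here for illustration.

```r
# One pair of numeric variables from iris
x <- iris$Sepal.Length
y <- iris$Petal.Length

# Pearson's r is the covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in result with the default method
r_builtin <- cor(x, y)

# The two values agree up to floating-point tolerance
all.equal(r_manual, r_builtin)
```
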

  1. Use “Kendall’s” and “Spearman’s” correlation methods.
cor(d, method = "kendall") 
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086
## Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257
## Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907
## Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000
cor(d, method = "spearman")
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1667777    0.8818981   0.8342888
## Sepal.Width    -0.1667777   1.0000000   -0.3096351  -0.2890317
## Petal.Length    0.8818981  -0.3096351    1.0000000   0.9376668
## Petal.Width     0.8342888  -0.2890317    0.9376668   1.0000000
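
Spearman's coefficient is the Pearson coefficient computed on ranks instead of raw values, which is why it is less sensitive to outliers and to monotone but non-linear relationships. The sketch below checks this for one pair of variables; the names `rho_builtin` and `rho_ranks` are illustrative.

```r
x <- iris$Sepal.Length
y <- iris$Petal.Length

# Spearman's rho from cor()
rho_builtin <- cor(x, y, method = "spearman")

# Pearson correlation of the ranks (ties receive average ranks by default)
rho_ranks <- cor(rank(x), rank(y))

# Both routes give the same value
all.equal(rho_builtin, rho_ranks)
```
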
  1. Conduct the correlation test between Sepal.Length and Sepal.Width.
cor.test(d$Sepal.Length, d$Sepal.Width)
## 
##  Pearson's product-moment correlation
## 
## data:  d$Sepal.Length and d$Sepal.Width
## t = -1.4403, df = 148, p-value = 0.1519
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.27269325  0.04351158
## sample estimates:
##        cor 
## -0.1175698
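
The object returned by cor.test() can be stored and queried instead of being read off the printed output. In this sketch the name `res`, the flag `reject_null` and the 0.05 significance level are assumptions made for illustration; the worksheet does not state a level.

```r
# Store the test result (an object of class "htest")
res <- cor.test(iris$Sepal.Length, iris$Sepal.Width)

res$estimate   # sample correlation
res$p.value    # p-value of the two-sided test
res$conf.int   # 95 percent confidence interval

# Assumed significance level of 0.05
reject_null <- res$p.value < 0.05
reject_null
```

With the p-value of 0.1519 shown above, the null hypothesis of zero correlation is not rejected at that level, which matches the confidence interval containing 0.
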
  1. Store the result of cor(d) in an object called “cr”. Any other name works as well.
cr <- cor(d)
cr
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
  1. Install the package “corrplot” and visualize the correlation matrix with the function corrplot().
library(corrplot)
## corrplot 0.84 loaded
corrplot(cr)

  1. Visualize the same data set with different methods, for example “pie”, “color” and “number”.
corrplot(cr, method = "pie") 

corrplot(cr, method = "color") 

corrplot(cr, method = "number")

  1. What do you observe from all these plots? Are they repetitive? Can you make the graphs upper triangular or lower triangular?
corrplot(cr, type = "lower") 

corrplot(cr, type = "upper")
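
The plots are repetitive because a correlation matrix is symmetric, so each pair of variables appears twice. The sketch below confirms this with base R and lists each pair once; the names `idx` and `unique_pairs` are illustrative and not part of the worksheet.

```r
cr <- cor(iris[1:4])

# A correlation matrix equals its transpose
isSymmetric(cr)

# Row and column positions of the cells above the diagonal
idx <- which(upper.tri(cr), arr.ind = TRUE)

# One row per distinct pair of variables
unique_pairs <- data.frame(
  var1 = rownames(cr)[idx[, "row"]],
  var2 = colnames(cr)[idx[, "col"]],
  r    = cr[upper.tri(cr)]
)
unique_pairs
```

For four variables this leaves six distinct pairs, which is exactly what the type = "upper" and type = "lower" plots display.
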

  1. Try a different function to draw scatter plots for any two variables from columns 1 to 4 in “iris”. Use pairs().
pairs(iris[,1:4], pch = 19) 

pairs(iris[,1:4], pch = 19, lower.panel = NULL)

  1. Visualize histograms, scatter plots with fitted curves, and correlation coefficients in the same matrix graph. You may need to install the package psych and then use pairs.panels().
library(psych)
pairs.panels(iris[,-5],
 hist.col = "#00AFBB",   
 density = TRUE, 
 ellipses = TRUE) 

  1. Change the color of the histogram using a color code from this website: https://www.rapidtables.com/web/color/RGB_Color.html
View(mtcars)
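
For the histogram color step, one possible approach is to convert an RGB triple from the color table into a hex code with rgb() and pass it to hist.col. This is only a sketch: the triple (255, 99, 71) and the name `hist_col` are arbitrary choices for illustration, and the plot is drawn only if the psych package is installed.

```r
# Convert an RGB triple (0 to 255 scale) into a hex color string
hist_col <- rgb(255, 99, 71, maxColorValue = 255)
hist_col  # "#FF6347"

# Redraw the panel plot with the new histogram color, if psych is available
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(iris[, -5],
                      hist.col = hist_col,
                      density = TRUE,
                      ellipses = TRUE)
}
```

A hex code copied directly from the website can be passed to hist.col in the same way, without going through rgb().
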