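Scatter plots: a graphical representation of the relationship between two variables
How strong is the relationship between two variables?
The correlation coefficient ranges over [-1, 1]
The sign indicates the direction of the relationship
The absolute value indicates its strength
Near-perfect correlation:
Strong correlation:
Weak correlation:
No correlation:
Negative correlation:
Nonlinear relationship: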
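The patterns listed above can be reproduced with simulated data. The sketch below is an illustration only: the sample size, noise levels, and variable names are assumptions chosen for the demonstration, not the data behind the original figures.

```r
# Simulated examples of the correlation patterns listed above.
# Sample size and noise levels are illustrative assumptions.
set.seed(42)
n <- 200
x <- rnorm(n)

patterns <- list(
  near_perfect = x + rnorm(n, sd = 0.1),   # almost a straight line
  strong       = x + rnorm(n, sd = 0.5),
  weak         = x + rnorm(n, sd = 3),
  none         = rnorm(n),                 # generated independently of x
  negative     = -x + rnorm(n, sd = 0.5),
  nonlinear    = x^2 + rnorm(n, sd = 0.1)  # clear dependence, but not linear
)

# Pearson correlation of each simulated y with x
r <- sapply(patterns, function(y) cor(x, y))
round(r, 2)

# One scatter plot per pattern
op <- par(mfrow = c(2, 3))
for (name in names(patterns)) {
  plot(x, patterns[[name]], main = name, xlab = "x", ylab = "y", pch = 19)
}
par(op)
```

In the nonlinear case y depends strongly on x, yet Pearson's r can be close to zero, because r only measures linear association.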
\[\text{correlation} = r(x, y)=\frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}\]
\[\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y})}{n}\]
\[\sigma_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i-\bar{x})^2}{n}}\]
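As a quick check of these formulas, the sketch below computes r by hand for two mtcars columns and compares it with R's built-in cor(). The choice of wt and mpg and the variable names are assumptions made for illustration. R's cor() divides by n - 1 rather than n, but that factor cancels between the covariance and the two standard deviations, so both approaches should agree.

```r
# Compute Pearson's r directly from the formulas (dividing by n)
# and compare with the built-in cor().
x <- mtcars$wt
y <- mtcars$mpg
n <- length(x)

cov_xy  <- sum((x - mean(x)) * (y - mean(y))) / n   # covariance
sigma_x <- sqrt(sum((x - mean(x))^2) / n)           # standard deviation of x
sigma_y <- sqrt(sum((y - mean(y))^2) / n)           # standard deviation of y

r_manual <- cov_xy / (sigma_x * sigma_y)

r_manual
cor(x, y)
all.equal(r_manual, cor(x, y))
```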
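Correlation does not imply causation (for example, dropping out of college and accumulating wealth)
Be especially wary of data that are linked only through time; their correlation may be meaningless
Are Nicolas Cage films related to the number of people who drown in swimming pools?
Is rock music related to US oil production?
Is the highway fatality rate related to fresh lemon imports?
Remember to install the package on your local machine first: install.packages("corrplot")
Inspect the structure and contents of the data: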
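The effect of a shared time trend can be sketched with simulated data. Everything below is hypothetical: the two series, their slopes, and their noise levels are assumptions, and the series are generated independently of each other.

```r
# Two independently generated series that both happen to grow over time.
# Slopes and noise levels are illustrative assumptions.
set.seed(1)
year <- 1:50
a <- 2.0 * year + rnorm(50, sd = 5)   # one quantity that rises over time
b <- 0.5 * year + rnorm(50, sd = 3)   # an unrelated quantity that also rises

# Correlation of the raw levels is dominated by the common trend
cor_levels <- cor(a, b)

# Correlation of year-over-year changes, with the linear trend removed
cor_changes <- cor(diff(a), diff(b))

c(levels = cor_levels, changes = cor_changes)
```

The raw levels look strongly correlated only because both series trend upward; once the trend is removed by differencing, little association should remain.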
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
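head() shows only the first rows. A minimal sketch of a few extra base R checks that are useful before computing correlations:

```r
# Check dimensions and column types before computing correlations
dim(mtcars)                                    # 32 rows, 11 columns
str(mtcars)                                    # type and first values of each column
all_numeric <- all(sapply(mtcars, is.numeric)) # cor() requires numeric columns
has_na      <- anyNA(mtcars)                   # missing values would need cor(..., use = )
all_numeric
has_na
```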
Correlation coefficients between all pairs of variables:
cor(mtcars)
## mpg cyl disp hp drat wt
## mpg 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.68117191 -0.8676594
## cyl -0.8521620 1.0000000 0.9020329 0.8324475 -0.69993811 0.7824958
## disp -0.8475514 0.9020329 1.0000000 0.7909486 -0.71021393 0.8879799
## hp -0.7761684 0.8324475 0.7909486 1.0000000 -0.44875912 0.6587479
## drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.00000000 -0.7124406
## wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065 1.0000000
## qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476 -0.1747159
## vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846 -0.5549157
## am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113 -0.6924953
## gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013 -0.5832870
## carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980 0.4276059
## qsec vs am gear carb
## mpg 0.41868403 0.6640389 0.59983243 0.4802848 -0.55092507
## cyl -0.59124207 -0.8108118 -0.52260705 -0.4926866 0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692 0.39497686
## hp -0.70822339 -0.7230967 -0.24320426 -0.1257043 0.74981247
## drat 0.09120476 0.4402785 0.71271113 0.6996101 -0.09078980
## wt -0.17471588 -0.5549157 -0.69249526 -0.5832870 0.42760594
## qsec 1.00000000 0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs 0.74453544 1.0000000 0.16834512 0.2060233 -0.56960714
## am -0.22986086 0.1683451 1.00000000 0.7940588 0.05753435
## gear -0.21268223 0.2060233 0.79405876 1.0000000 0.27407284
## carb -0.65624923 -0.5696071 0.05753435 0.2740728 1.00000000
Save all the correlation coefficients as M and round them to two decimal places:
library(corrplot)
M<-cor(mtcars)
round(M,2)
## mpg cyl disp hp drat wt qsec vs am gear carb
## mpg 1.00 -0.85 -0.85 -0.78 0.68 -0.87 0.42 0.66 0.60 0.48 -0.55
## cyl -0.85 1.00 0.90 0.83 -0.70 0.78 -0.59 -0.81 -0.52 -0.49 0.53
## disp -0.85 0.90 1.00 0.79 -0.71 0.89 -0.43 -0.71 -0.59 -0.56 0.39
## hp -0.78 0.83 0.79 1.00 -0.45 0.66 -0.71 -0.72 -0.24 -0.13 0.75
## drat 0.68 -0.70 -0.71 -0.45 1.00 -0.71 0.09 0.44 0.71 0.70 -0.09
## wt -0.87 0.78 0.89 0.66 -0.71 1.00 -0.17 -0.55 -0.69 -0.58 0.43
## qsec 0.42 -0.59 -0.43 -0.71 0.09 -0.17 1.00 0.74 -0.23 -0.21 -0.66
## vs 0.66 -0.81 -0.71 -0.72 0.44 -0.55 0.74 1.00 0.17 0.21 -0.57
## am 0.60 -0.52 -0.59 -0.24 0.71 -0.69 -0.23 0.17 1.00 0.79 0.06
## gear 0.48 -0.49 -0.56 -0.13 0.70 -0.58 -0.21 0.21 0.79 1.00 0.27
## carb -0.55 0.53 0.39 0.75 -0.09 0.43 -0.66 -0.57 0.06 0.27 1.00
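An 11 x 11 matrix is hard to scan by eye. One possible way to list the strongest pairs, sketched with base R only (the name pairs_df is an assumption introduced for illustration):

```r
M <- cor(mtcars)

# The matrix is symmetric, so the upper triangle holds each pair exactly once
idx <- which(upper.tri(M), arr.ind = TRUE)
pairs_df <- data.frame(
  var1 = rownames(M)[idx[, "row"]],
  var2 = colnames(M)[idx[, "col"]],
  r    = M[idx]
)

# Sort by absolute correlation, strongest first
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
head(pairs_df, 5)
```

The first row should be the cyl/disp pair, whose coefficient of about 0.90 is the largest off-diagonal value in the table above.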
corrplot offers several methods for displaying correlations, including "circle", "pie", "color", and "number". The general form is:
corrplot(corr, method="circle")
library(corrplot)
M<-cor(mtcars)
corrplot(M, method="circle")
The display method can be changed as needed:
library(corrplot)
M<-cor(mtcars)
corrplot(M, method="pie")
library(corrplot)
M<-cor(mtcars)
corrplot(M, method="color")
library(corrplot)
M<-cor(mtcars)
corrplot(M, method="number")
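Two display methods can also be combined in a single figure. The sketch below assumes the installed corrplot version provides corrplot.mixed() with lower and upper arguments; check ?corrplot.mixed if the call fails.

```r
M <- cor(mtcars)

# Assumes the corrplot package is installed and exports corrplot.mixed()
if (requireNamespace("corrplot", quietly = TRUE)) {
  # Numbers below the diagonal, circles above it
  corrplot::corrplot.mixed(M, lower = "number", upper = "circle")
}
```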
Show only half of the plot, since the other half is its mirror image:
library(corrplot)
M<-cor(mtcars)
corrplot(M, type="upper")
library(corrplot)
M<-cor(mtcars)
corrplot(M, type="lower")
Change the color and rotation of the text labels:
library(corrplot)
M<-cor(mtcars)
corrplot(M, type="upper", tl.col="black", tl.srt=45)
Change the background color:
library(corrplot)
M<-cor(mtcars)
# Change background color to lightblue
corrplot(M, type="upper", col=c("black", "white"),
bg="lightblue")
Use the function below to compute a p-value for each correlation and check whether it is statistically significant:
# mat : is a matrix of data
# ... : further arguments to pass to the native R cor.test function
cor.mtest <- function(mat, ...) {
  mat <- as.matrix(mat)
  n <- ncol(mat)
  p.mat <- matrix(NA, n, n)
  diag(p.mat) <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      tmp <- cor.test(mat[, i], mat[, j], ...)
      p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
    }
  }
  colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
  p.mat
}
# matrix of the p-value of the correlation
p.mat <- cor.mtest(mtcars)
head(p.mat[, 1:5])
## mpg cyl disp hp drat
## mpg 0.000000e+00 6.112687e-10 9.380327e-10 1.787835e-07 1.776240e-05
## cyl 6.112687e-10 0.000000e+00 1.802838e-12 3.477861e-09 8.244636e-06
## disp 9.380327e-10 1.802838e-12 0.000000e+00 7.142679e-08 5.282022e-06
## hp 1.787835e-07 3.477861e-09 7.142679e-08 0.000000e+00 9.988772e-03
## drat 1.776240e-05 8.244636e-06 5.282022e-06 9.988772e-03 0.000000e+00
## wt 1.293959e-10 1.217567e-07 1.222320e-11 4.145827e-05 4.784260e-06
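Any single entry of this matrix can be checked by calling cor.test() directly on the two columns. A minimal sketch for the mpg/cyl pair (the name res is an assumption introduced for illustration):

```r
# Test one pair directly with base R's cor.test()
res <- cor.test(mtcars$mpg, mtcars$cyl)

res$estimate   # Pearson correlation, about -0.85
res$p.value    # about 6.1e-10, matching the mpg/cyl entry of p.mat
res$conf.int   # 95% confidence interval for the correlation
```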
library(corrplot)
M<-cor(mtcars)
# Mark coefficients that are not significant at the chosen significance level
corrplot(M, type="upper", order="hclust",
p.mat = p.mat, sig.level = 0.01)
Hide the correlations that are not significant:
library(corrplot)
M<-cor(mtcars)
# Leave cells with non-significant coefficients blank
corrplot(M, type="upper",
p.mat = p.mat, sig.level = 0.01, insig = "blank")
Other parameters can be adjusted as well to get the result you want:
library(corrplot)
M<-cor(mtcars)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(M, method="color", col=col(200),
         type="upper",
         addCoef.col = "black", # Add coefficient of correlation
         tl.col="black", tl.srt=45, # Text label color and rotation
         # Combine with significance
         p.mat = p.mat, sig.level = 0.01, insig = "blank",
         # Hide correlation coefficients on the principal diagonal
         diag=FALSE
)
That is how to draw an eye-catching correlation matrix with the corrplot() function. Did you get it?
Here is one more trick to finish things off:
install.packages("PerformanceAnalytics")
library("PerformanceAnalytics")
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
my_data <- mtcars[, c(1,3,4,5,6,7)]
chart.Correlation(my_data, histogram=TRUE, pch=19)