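This week covered discriminant analysis: distance discrimination, Bayes discrimination and Fisher discrimination. Because that is how they will be used later, all exercises below use the multi-population versions of the programs.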
X1 <- c(-1.9, -6.9, 5.2, 5, 7.3, 6.8, 0.9, -12.5, 1.5, 3.8, 0.2, -0.1, 0.4,
2.7, 2.1, -4.6, -1.7, -2.6, 2.6, -2.8)
X2 <- c(3.2, 10.4, 2, 2.5, 0, 12.7, -15.4, -2.5, 1.3, 6.8, 0.2, 7.5, 14.6, 8.3,
0.8, 4.3, 10.9, 13.1, 12.8, 10)
RainDf <- data.frame(X1, X2)
RainFactor <- gl(2, 10)  # group 1 = rain (first 10 days), group 2 = no rain (last 10)
RainTst <- matrix(c(8.1, 2), nrow = 1)  # the new observation to classify
1. Distance discrimination:
source("distinguish.distance.R")
distinguish.distance(RainDf, RainFactor, var.equal = T)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## belong 1 2 1 1 1 2 1 1 1 1 1 2 2 2 1 2 2 2 2 2
distinguish.distance(RainDf, RainFactor, var.equal = F)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## belong 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 2 1 2
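The results show that assuming unequal variances gives more misclassifications overall (5 of 20, versus 4 with equal variances), but it classifies all 10 rain days correctly, whereas the equal-variance model misses 2 of them.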
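To make that comparison concrete, the two back-substitution vectors printed above can be checked against the true labels. The vectors below are copied from the output; the only other assumption is the labelling (days 1-10 rain, days 11-20 no rain).

```r
# True labels: days 1-10 are group 1 (rain), days 11-20 are group 2 (no rain)
truth <- rep(1:2, each = 10)

# Back-substitution results copied from the printed output
pred_equal   <- c(1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2)
pred_unequal <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 2)

# Total misclassifications under each covariance assumption
err_equal   <- sum(pred_equal != truth)    # 4
err_unequal <- sum(pred_unequal != truth)  # 5

# Rain days (true group 1) that were misclassified
rain_err_equal   <- sum(pred_equal[truth == 1] != 1)    # 2
rain_err_unequal <- sum(pred_unequal[truth == 1] != 1)  # 0

# Predicted vs. true labels for the unequal-variance run
tab <- table(predicted = pred_unequal, true = truth)
print(tab)
```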
distinguish.distance(RainDf, RainFactor, RainTst, var.equal = T)
## 1
## belong 1
distinguish.distance(RainDf, RainFactor, RainTst, var.equal = F)
## 1
## belong 1
Both models predict rain (group 1).
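For reference, the distance rule itself is short. The function below, `dist_discrim`, is a hypothetical helper and not the textbook's `distinguish.distance`; it assumes the textbook rule assigns each point to the group with the smallest squared Mahalanobis distance, using the pooled within-group covariance when `var_equal = TRUE` and each group's own covariance otherwise.

```r
# Hypothetical helper: Mahalanobis-distance discrimination (a sketch, not the textbook code)
dist_discrim <- function(train, groups, newdata = train, var_equal = TRUE) {
  train   <- as.matrix(train)
  newdata <- as.matrix(newdata)
  groups  <- factor(groups)
  lev     <- levels(groups)
  # Per-group mean vectors and covariance matrices
  mu  <- lapply(lev, function(k) colMeans(train[groups == k, , drop = FALSE]))
  sig <- lapply(lev, function(k) cov(train[groups == k, , drop = FALSE]))
  if (var_equal) {
    # Pooled within-group covariance: sum_k (n_k - 1) S_k / (n - K)
    n_k    <- as.vector(table(groups))
    pooled <- Reduce(`+`, Map(function(s, n) (n - 1) * s, sig, n_k)) /
      (nrow(train) - length(lev))
    sig <- rep(list(pooled), length(lev))
  }
  # Squared Mahalanobis distance from each new point to each group mean
  d2 <- sapply(seq_along(lev), function(k) mahalanobis(newdata, mu[[k]], sig[[k]]))
  d2 <- matrix(d2, nrow = nrow(newdata))
  # Assign each point to the nearest group
  lev[apply(d2, 1, which.min)]
}

# Usage on the rain data (days 1-10 rain, 11-20 no rain)
X1 <- c(-1.9, -6.9, 5.2, 5, 7.3, 6.8, 0.9, -12.5, 1.5, 3.8,
        0.2, -0.1, 0.4, 2.7, 2.1, -4.6, -1.7, -2.6, 2.6, -2.8)
X2 <- c(3.2, 10.4, 2, 2.5, 0, 12.7, -15.4, -2.5, 1.3, 6.8,
        0.2, 7.5, 14.6, 8.3, 0.8, 4.3, 10.9, 13.1, 12.8, 10)
rain <- data.frame(X1, X2)
g <- gl(2, 10)
dist_discrim(rain, g, matrix(c(8.1, 2), nrow = 1), var_equal = TRUE)
dist_discrim(rain, g, matrix(c(8.1, 2), nrow = 1), var_equal = FALSE)
```

If its back-substitution output differs from `distinguish.distance`, the assumption about how the textbook pools the covariance is the first thing to check.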
2. Bayes discrimination
source("distinguish.bayes-1.R")
prob <- c(1, 1)
distinguish.bayes(RainDf, RainFactor, p = prob, var.equal = T)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## blong 1 2 1 1 1 2 1 1 1 1 1 2 2 2 1 2 2 2 2 2
distinguish.bayes(RainDf, RainFactor, p = prob, var.equal = F)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## blong 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
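With unequal variances the Bayes method fails completely: every observation is assigned to group 1. I suspect the multi-population Bayes program in the textbook has a design flaw. The book's Iris example also shows a classification failure with var.equal=T, and the book gives no explanation, which looks like the issue is being sidestepped. The two-population program behaves normally, and since it reaches the same conclusion it is not tested again here. See http://f.dataguru.cn/thread-207837-1-1.html. The equal-variance case is used for prediction below.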
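One way to check whether the all-group-1 result comes from the program rather than the method is to compute the normal-theory Bayes rule directly. The function `bayes_discrim` below is a hypothetical helper, not the textbook's `distinguish.bayes`. It assumes multivariate normal groups and assigns each point to the group with the largest log p_k - (1/2) log|S_k| - (1/2) D_k^2(x), where D_k^2 is the squared Mahalanobis distance; with `var_equal = TRUE` all S_k are the pooled covariance, so the log-determinant term is common to every group. Priors need not sum to 1, because rescaling them shifts every score by the same constant, so `c(1, 1)` behaves like equal priors.

```r
# Hypothetical helper: Gaussian Bayes discrimination (a sketch, not the textbook code)
bayes_discrim <- function(train, groups, newdata = train, prior = NULL, var_equal = TRUE) {
  train   <- as.matrix(train)
  newdata <- as.matrix(newdata)
  groups  <- factor(groups)
  lev     <- levels(groups)
  K       <- length(lev)
  if (is.null(prior)) prior <- rep(1 / K, K)  # equal priors by default
  mu  <- lapply(lev, function(k) colMeans(train[groups == k, , drop = FALSE]))
  sig <- lapply(lev, function(k) cov(train[groups == k, , drop = FALSE]))
  if (var_equal) {
    # Pooled within-group covariance
    n_k    <- as.vector(table(groups))
    pooled <- Reduce(`+`, Map(function(s, n) (n - 1) * s, sig, n_k)) / (nrow(train) - K)
    sig    <- rep(list(pooled), K)
  }
  # Log posterior up to a common constant
  score <- sapply(seq_len(K), function(k) {
    log(prior[k]) - 0.5 * log(det(sig[[k]])) -
      0.5 * mahalanobis(newdata, mu[[k]], sig[[k]])
  })
  score <- matrix(score, nrow = nrow(newdata))
  lev[apply(score, 1, which.max)]
}

# Usage on the rain data with equal priors, as in the exercise
X1 <- c(-1.9, -6.9, 5.2, 5, 7.3, 6.8, 0.9, -12.5, 1.5, 3.8,
        0.2, -0.1, 0.4, 2.7, 2.1, -4.6, -1.7, -2.6, 2.6, -2.8)
X2 <- c(3.2, 10.4, 2, 2.5, 0, 12.7, -15.4, -2.5, 1.3, 6.8,
        0.2, 7.5, 14.6, 8.3, 0.8, 4.3, 10.9, 13.1, 12.8, 10)
rain <- data.frame(X1, X2)
g <- gl(2, 10)
bayes_discrim(rain, g, prior = c(1, 1), var_equal = TRUE)
bayes_discrim(rain, g, prior = c(1, 1), var_equal = FALSE)
```

If this sketch gives a mixed assignment with `var_equal = FALSE`, that would support the suspicion that the textbook program, not the method, is at fault.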
distinguish.bayes(RainDf, RainFactor, TstX = RainTst, var.equal = T)
## 1
## blong 1
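The prediction is rain.
3. Fisher discrimination. Instead of the program from Xue Yi's textbook, the lda function from MASS is called directly.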
library(MASS)
RainTst <- data.frame(X1 = 8.1, X2 = 2)  # predict() needs a data frame with the model's variable names
rt <- lda(RainFactor ~ X1 + X2, data = RainDf)
predict(rt, RainTst)$class
## [1] 1
## Levels: 1 2
This also predicts rain.
Conclusion: all three discriminant methods agree, so tomorrow's forecast should be rain.
Exercise 8.2
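For two groups, Fisher's rule can be written out by hand: project onto w = S_w^{-1}(m_1 - m_2), where S_w is the pooled within-group covariance, and assign a point to group 1 when its projection exceeds the midpoint of the projected group means. The function `fisher_two_group` is a hypothetical illustration that assumes equal priors; under that assumption this rule coincides with the linear rule that `lda` applies, and the rain data has 10 observations per group, so `lda`'s default proportional priors are equal too.

```r
# Hypothetical helper: two-group Fisher discriminant with a midpoint cutoff (equal priors)
fisher_two_group <- function(train, groups, newdata = train) {
  train   <- as.matrix(train)
  newdata <- as.matrix(newdata)
  groups  <- factor(groups)
  lev     <- levels(groups)
  stopifnot(length(lev) == 2)
  A  <- train[groups == lev[1], , drop = FALSE]
  B  <- train[groups == lev[2], , drop = FALSE]
  m1 <- colMeans(A)
  m2 <- colMeans(B)
  # Pooled within-group covariance
  Sw <- ((nrow(A) - 1) * cov(A) + (nrow(B) - 1) * cov(B)) / (nrow(train) - 2)
  w      <- solve(Sw, m1 - m2)         # discriminant direction
  cutoff <- sum(w * (m1 + m2) / 2)     # midpoint of the projected means
  y      <- as.vector(newdata %*% w)   # projected scores
  ifelse(y > cutoff, lev[1], lev[2])
}

# Usage on the rain data and the new observation
X1 <- c(-1.9, -6.9, 5.2, 5, 7.3, 6.8, 0.9, -12.5, 1.5, 3.8,
        0.2, -0.1, 0.4, 2.7, 2.1, -4.6, -1.7, -2.6, 2.6, -2.8)
X2 <- c(3.2, 10.4, 2, 2.5, 0, 12.7, -15.4, -2.5, 1.3, 6.8,
        0.2, 7.5, 14.6, 8.3, 0.8, 4.3, 10.9, 13.1, 12.8, 10)
rain <- data.frame(X1, X2)
g <- gl(2, 10)
fisher_two_group(rain, g, data.frame(X1 = 8.1, X2 = 2))
```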
X1 = c(8.11, 9.36, 9.85, 2.55, 6.01, 9.64, 4.11, 8.9, 7.71, 7.51, 8.06, 6.8,
8.68, 5.67, 8.1, 3.71, 5.37, 9.89, 5.22, 4.71, 4.71, 3.36, 8.27)
X2 = c(261.01, 185.39, 249.58, 137.13, 231.34, 231.38, 260.25, 259.91, 273.84,
303.59, 231.03, 308.9, 258.69, 355.54, 476.69, 316.32, 274.57, 409.42, 330.34,
331.47, 352.5, 347.31, 189.56)
X3 = c(13.23, 9.02, 15.61, 9.21, 14.27, 13.03, 14.72, 14.16, 16.01, 19.14, 14.41,
15.11, 14.02, 15.13, 7.38, 17.12, 16.75, 19.47, 18.19, 21.26, 20.79, 17.9,
12.74)
X4 = c(7.36, 5.99, 6.11, 4.35, 8.79, 8.53, 10.02, 9.79, 8.79, 8.53, 6.15, 8.49,
7.16, 9.43, 11.32, 8.17, 9.67, 10.49, 9.61, 13.72, 11, 11.19, 6.94)
Heath.frame <- data.frame(X1, X2, X3, X4)
Heath.factor <- as.factor(rep(1:3, times = c(11, 7, 5)))
prob <- c(11/23, 7/23, 5/23)
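The priors in `prob` are the sample proportions of the three groups. A minimal sketch of computing them from the factor instead of typing them by hand (the name `prior_from_sizes` is illustrative). MASS's `lda` also uses the training-set class proportions as priors when `prior` is not supplied, so the Fisher fit in this exercise uses the same priors.

```r
# Group labels for the 23 observations: 11, 7 and 5 in groups 1-3
Heath.factor <- as.factor(rep(1:3, times = c(11, 7, 5)))

# Sample proportions as priors; same values as c(11/23, 7/23, 5/23)
prior_from_sizes <- as.vector(prop.table(table(Heath.factor)))
print(prior_from_sizes)
```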
1. Distance discrimination
source("distinguish.distance.R")
htt <- distinguish.distance(Heath.frame, Heath.factor, var.equal = T)  # 5 misclassified
length(htt[htt == Heath.factor]) * 100/23
## [1] 78.26
htf <- distinguish.distance(Heath.frame, Heath.factor, var.equal = F)  # 5 misclassified
length(htf[htf == Heath.factor]) * 100/23
## [1] 78.26
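Distance discrimination gives a back-substitution accuracy of 78.26% under both covariance assumptions.
2. Bayes discrimination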
source("distinguish.bayes-1.R")
htt <- distinguish.bayes(Heath.frame, Heath.factor, p = prob, var.equal = T)  # 4 misclassified
length(htt[htt == Heath.factor]) * 100/23
## [1] 82.61
htf <- distinguish.bayes(Heath.frame, Heath.factor, p = prob, var.equal = F)  # 8 misclassified (65.22% = 15/23 correct)
length(htf[htf == Heath.factor]) * 100/23
## [1] 65.22
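Bayes discrimination gives a back-substitution accuracy of 82.61% with equal covariance matrices and 65.22% with unequal ones.
3. Fisher discrimination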
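Each percentage must be a whole number of correct classifications out of 23, so the underlying counts can be recovered. The counts below are inferred from the printed percentages, not taken from separate output.

```r
n <- 23
# Correct back-substitution counts implied by the printed percentages
correct <- c(distance_equal = 18, distance_unequal = 18,
             bayes_equal = 19, bayes_unequal = 15)

round(correct / n * 100, 2)  # 78.26 78.26 82.61 65.22
n - correct                  # misclassified: 5 5 4 8
```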
library(MASS)
H <- lda(Heath.factor ~ X1 + X2 + X3 + X4, data = Heath.frame)
ht <- predict(H)$class
length(ht[ht == Heath.factor])/23 * 100
## [1] 78.26
Fisher discrimination gives a back-substitution accuracy of 78.26%.
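A single accuracy figure hides which groups get confused with each other. A cross-tabulation of predicted against actual labels (an addition, not part of the original exercise) shows this; it repeats the `lda` fit above so that it runs on its own, and since it refits the same model on the same data, the diagonal should sum to 18 (78.26% of 23).

```r
library(MASS)

# Data from exercise 8.2 (groups of 11, 7 and 5 observations)
X1 <- c(8.11, 9.36, 9.85, 2.55, 6.01, 9.64, 4.11, 8.9, 7.71, 7.51, 8.06, 6.8,
        8.68, 5.67, 8.1, 3.71, 5.37, 9.89, 5.22, 4.71, 4.71, 3.36, 8.27)
X2 <- c(261.01, 185.39, 249.58, 137.13, 231.34, 231.38, 260.25, 259.91, 273.84,
        303.59, 231.03, 308.9, 258.69, 355.54, 476.69, 316.32, 274.57, 409.42,
        330.34, 331.47, 352.5, 347.31, 189.56)
X3 <- c(13.23, 9.02, 15.61, 9.21, 14.27, 13.03, 14.72, 14.16, 16.01, 19.14, 14.41,
        15.11, 14.02, 15.13, 7.38, 17.12, 16.75, 19.47, 18.19, 21.26, 20.79, 17.9,
        12.74)
X4 <- c(7.36, 5.99, 6.11, 4.35, 8.79, 8.53, 10.02, 9.79, 8.79, 8.53, 6.15, 8.49,
        7.16, 9.43, 11.32, 8.17, 9.67, 10.49, 9.61, 13.72, 11, 11.19, 6.94)
Heath.frame  <- data.frame(X1, X2, X3, X4)
Heath.factor <- as.factor(rep(1:3, times = c(11, 7, 5)))

# Same fit as above: class-proportion priors by default
H  <- lda(Heath.factor ~ X1 + X2 + X3 + X4, data = Heath.frame)
ht <- predict(H)$class

# Rows: predicted group; columns: actual group
conf_mat <- table(predicted = ht, actual = Heath.factor)
print(conf_mat)
sum(diag(conf_mat)) / sum(conf_mat) * 100
```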