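Linear discriminant analysis (LDA) looks for a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination can serve as a linear classifier or, more commonly, as a dimensionality reduction step before later classification.
In R, the MASS package provides the lda() function.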
# Install MASS only if it is not already available, then load it
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}
library("MASS")
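The example uses R's built-in iris data in its array form, iris3, whose dimensions are [50, 4, 3] (50 rows x 4 columns x 3 species). Stacking the three slices gives 150 observations of four feature variables: Sepal.L., Sepal.W., Petal.L., and Petal.W. Together with the grouping variable Sp this makes five variables. Sp takes three values: Setosa (s), Versicolor (c), and Virginica (v).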
Iris = data.frame(
rbind(iris3[,,1],iris3[,,2],iris3[,,3])
)
Sp = rep(c("s","c","v"), rep(50,3))
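# lda prototype (formula interface; a signature sketch, not runnable code)
# |- formula: a formula such as groups ~ x1 + x2 + ...
# |- data: the data frame that holds the variables
# |- prior: prior probability of each group; if omitted, the class proportions of the training data are used
# |- subset: indices of the rows to use as training data
# lda(formula, data, prior = probabilities, subset = indices, ...)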
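# Randomly pick 75 of the 150 rows for training; no seed is set, so the split
# (and the numbers printed below) will differ from run to run
train = sample(1:150, 75)
# Sp ~ . models the class label (Sp) on all four features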
z = lda(Sp~., Iris, prior=c(1,1,1)/3, subset=train)
z
## Call:
## lda(Sp ~ ., data = Iris, prior = c(1, 1, 1)/3, subset = train)
##
## Prior probabilities of groups:
## c s v
## 0.3333333 0.3333333 0.3333333
##
## Group means:
## Sepal.L. Sepal.W. Petal.L. Petal.W.
## c 5.917391 2.760870 4.291304 1.3521739
## s 5.022727 3.381818 1.459091 0.2272727
## v 6.476667 2.966667 5.420000 2.0033333
##
## Coefficients of linear discriminants:
## LD1 LD2
## Sepal.L. 0.8139983 -0.1686539
## Sepal.W. 1.7255985 2.3260218
## Petal.L. -2.1514551 -0.6573666
## Petal.W. -2.8972464 2.3237338
##
## Proportion of trace:
## LD1 LD2
## 0.9924 0.0076
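The coefficients printed above are the weights each feature receives in LD1 and LD2. As a minimal sketch of how they turn observations into discriminant scores, the code below centers the features and multiplies by `z_all$scaling`, then compares the result with the scores from `predict()`. It assumes the centering point is the prior-weighted mean of the group means, which is my reading of how MASS computes scores and is not stated in this text. The model is fit on all 150 rows rather than the random training half so the result is deterministic; `z_all`, `center`, and `manual` are illustrative names.

```r
library(MASS)

# Same data as in the text, with the class label stored as a column
Iris <- data.frame(
  rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3]),
  Sp = rep(c("s", "c", "v"), rep(50, 3))
)

# Fit on all rows so the example is deterministic (the text uses a random half)
z_all <- lda(Sp ~ ., Iris, prior = c(1, 1, 1) / 3)

# Assumed centering point: prior-weighted mean of the group means
center <- colSums(z_all$prior * z_all$means)

# Scores = centered features times the discriminant coefficients
X <- as.matrix(Iris[, 1:4])
manual <- scale(X, center = center, scale = FALSE) %*% z_all$scaling

# Compare with the scores that predict() returns in its x component
scores <- predict(z_all, Iris)$x
same <- isTRUE(all.equal(manual, scores, check.attributes = FALSE))
same
```

The two score columns (LD1, LD2) are the reduced representation that can be passed to a later classifier or plotted.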
# predict prototype (a signature sketch, not runnable code)
# |- model: a fitted LDA model
# |- testing_data: the data to predict on
# predict(model, testing_data)
# true class
Sp[-train]
## [1] "s" "s" "s" "s" "s" "s" "s" "s" "s" "s" "s" "s" "s" "s" "s" "s" "s"
## [18] "s" "s" "s" "s" "s" "s" "s" "s" "s" "s" "s" "c" "c" "c" "c" "c" "c"
## [35] "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"
## [52] "c" "c" "c" "c" "v" "v" "v" "v" "v" "v" "v" "v" "v" "v" "v" "v" "v"
## [69] "v" "v" "v" "v" "v" "v" "v"
# predicted class
predict(z, Iris[-train,])$class
## [1] s s s s s s s s s s s s s s s s s s s s s s s s s s s s c c c c c c c
## [36] c c c c c c c c c c c c c c c c c c c c v v v v v v v v v v v v v v v
## [71] v v v v v
## Levels: c s v
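Comparing the two printouts by eye gets tedious, so a confusion matrix is a convenient summary. The sketch below rebuilds the example end to end and tabulates true against predicted classes. The seed value is an arbitrary assumption added for reproducibility, so its split will not match the output shown above, and `confusion` and `accuracy` are illustrative names.

```r
library(MASS)

# Same data as in the text, with the class label stored as a column
Iris <- data.frame(
  rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3]),
  Sp = rep(c("s", "c", "v"), rep(50, 3))
)

set.seed(1)  # arbitrary seed, assumed here only to make the split repeatable
train <- sample(1:150, 75)

z <- lda(Sp ~ ., Iris, prior = c(1, 1, 1) / 3, subset = train)
pred <- predict(z, Iris[-train, ])

# Use the same level order for both factors so the diagonal lines up
true_class <- factor(Iris$Sp[-train], levels = levels(pred$class))
confusion <- table(true = true_class, predicted = pred$class)

# Share of held-out rows that were classified correctly
accuracy <- sum(diag(confusion)) / sum(confusion)

confusion
accuracy
```

Off-diagonal cells count the misclassified rows; for iris these typically fall between Versicolor and Virginica.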
# update prototype (a signature sketch, not runnable code)
# |- update_features: . ~ . refits with all variables unchanged; . ~ . - Petal.W. refits with every feature except Petal.W.
# update(Origin_model, update_features)
(z1 = update(z,.~.))
## Call:
## lda(Sp ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W., data = Iris,
## prior = c(1, 1, 1)/3, subset = train)
##
## Prior probabilities of groups:
## c s v
## 0.3333333 0.3333333 0.3333333
##
## Group means:
## Sepal.L. Sepal.W. Petal.L. Petal.W.
## c 5.917391 2.760870 4.291304 1.3521739
## s 5.022727 3.381818 1.459091 0.2272727
## v 6.476667 2.966667 5.420000 2.0033333
##
## Coefficients of linear discriminants:
## LD1 LD2
## Sepal.L. 0.8139983 -0.1686539
## Sepal.W. 1.7255985 2.3260218
## Petal.L. -2.1514551 -0.6573666
## Petal.W. -2.8972464 2.3237338
##
## Proportion of trace:
## LD1 LD2
## 0.9924 0.0076
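Because `. ~ .` keeps every term, the refit above reproduces the original model exactly. A more informative use of `update()` is to drop a feature. The sketch below removes Petal.W. and refits; the seed is an arbitrary assumption for reproducibility and `z_reduced` is an illustrative name.

```r
library(MASS)

# Same data as in the text, with the class label stored as a column
Iris <- data.frame(
  rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3]),
  Sp = rep(c("s", "c", "v"), rep(50, 3))
)

set.seed(1)  # arbitrary seed, assumed here only to make the split repeatable
train <- sample(1:150, 75)

z <- lda(Sp ~ ., Iris, prior = c(1, 1, 1) / 3, subset = train)

# Refit the same model without Petal.W.
z_reduced <- update(z, . ~ . - Petal.W.)

# Only three features remain in the coefficient matrix
rownames(z_reduced$scaling)
```

Comparing the held-out predictions of `z` and `z_reduced` is one way to judge how much the dropped feature contributes.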
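LD1 and LD2 are the linear discriminants, and the coefficients listed under them are the weights of each feature. The "Proportion of trace" row gives the share of the between-group variance that each discriminant accounts for. In the output above LD1 accounts for 99.24% (0.9924), so the first discriminant alone is enough to separate the groups.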
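The "Proportion of trace" values can be recomputed from the fitted object. The sketch below squares the singular values stored in the `svd` component and normalizes them to sum to one. It fits on all 150 rows so the result is deterministic, which means the numbers will be close to, but not identical with, the random-split output above; `z_all` and `prop_trace` are illustrative names.

```r
library(MASS)

# Same data as in the text, with the class label stored as a column
Iris <- data.frame(
  rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3]),
  Sp = rep(c("s", "c", "v"), rep(50, 3))
)

# Fit on all rows so the example is deterministic (the text uses a random half)
z_all <- lda(Sp ~ ., Iris, prior = c(1, 1, 1) / 3)

# Squared singular values, normalized to sum to 1
prop_trace <- z_all$svd^2 / sum(z_all$svd^2)
names(prop_trace) <- colnames(z_all$scaling)

round(prop_trace, 4)
```

When the first value is close to 1, as it is for iris, keeping only LD1 loses very little of the between-group separation.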