外れ値の検定

データの作成

意図的に外れ値を挿入する

set.seed(1)
n <- 30 ; mn <- 50 ; sd <- 5
v1 <- c(0.01, rnorm(n-1, mean = mn, sd = sd))
v2 <- c(100, rnorm(n-1, mean = mn, sd = sd))
v3 <- c(rnorm(n, mean = mn, sd = sd))
boxplot(cbind(v1,v2,v3))

スミルノフ・グラブス検定

grubbs.test
- type=10 : Grubbs test for one outlier (デフォルト)
  - “alternative hypothesis: highest value x is an outlier”
- type=11 : Grubbs test for two opposite outliers
  - “alternative hypothesis: x and y are outliers”
- type=20 : Grubbs test for two outliers
  - サンプルサイズが30 を超えるデータには適合しないらしい。
  - “alternative hypothesis: highest values x , y are outliers”

# grubbsテスト
grb.v1 <- outliers::grubbs.test(v1)
grb.v2 <- outliers::grubbs.test(v2)
grb.v3 <- outliers::grubbs.test(v3)
grb.v1 ; grb.v2 ; grb.v3

## 
##  Grubbs test for one outlier
## 
## data:  v1
## G = 4.73281, U = 0.20097, p-value = 4.361e-10
## alternative hypothesis: lowest value 0.01 is an outlier
## 
## 
##  Grubbs test for one outlier
## 
## data:  v2
## G = 4.84723, U = 0.16187, p-value = 2.062e-11
## alternative hypothesis: highest value 100 is an outlier
## 
## 
##  Grubbs test for one outlier
## 
## data:  v3
## G = 2.38506, U = 0.79708, p-value = 0.1873
## alternative hypothesis: highest value 62.0080888025239 is an outlier

`grubbs.test`の結果を取り出す

結果はリストで返る
外れ値と判定された値は文字列中に含まれるので、numericに変換
grubbs.test(type=11)で実行した場合外れ値は2つ出力される

#データフレームを作成
dat <- data.frame(
        v1 =round(v1, digits = 5),
        v2 =round(v2, digits = 5),
        v3 =round(v3, digits = 5)
        )

# 全ての列に対してgrubbs検定を実行. 結果をres.grbに格納
res.grb <- lapply(dat, outliers::grubbs.test)
#res.grb <- lapply(dat, function(x){outliers::grubbs.test(x, type=11)})
#res.grb <- lapply(dat, function(x){outliers::grubbs.test(x, type=20)})

# grubbs検定の結果から値を取り出し、データフレームに格納
## サンプルインデックス
idx <- names(dat) 
## p-value
pv <- sapply(res.grb, function(x){x$p.value})
## 外れ値(typeオプションの値で条件分岐)
out <- sapply(res.grb, function(x){
  str <- x$alternative
  if(grepl("and", str)){
    round(as.numeric(unlist(strsplit(str," "))[c(1,3)]), digits = 5)
  } else if (grepl(",", str)) {
    round(as.numeric(unlist(strsplit(str," "))[c(3,5)]),digits=5)
  } else {
    round(as.numeric(unlist(strsplit(str," "))[3]),digits=5)
  } 
  })
out <-if(is.matrix(out)){setNames(data.frame(t(out)),c("out1","out2"))}else{out}
## 有意差の有無で外れ値のカラーコードを付ける
pvth <- 0.01
vcol <- ifelse(pv < pvth, "red", "black")
## データフレームに格納
grb_dat <- data.frame(idx, pv, out, vcol, row.names = NULL)
head(grb_dat)

##   idx           pv       out  vcol
## 1  v1 4.360845e-10   0.01000   red
## 2  v2 2.062350e-11 100.00000   red
## 3  v3 1.873270e-01  62.00809 black

外れ値をboxplot上に図示する

boxplot()で箱ひげ図を作成し、
外れ値を除外したデータを使用してstripchart()を上書き
points()で外れ値のみを上書き
プロットエリア外にレジェンドを追記

# 行持ちデータに変換
gg_dat <- tidyr::gather(dat, key="k", value="v") 

# 外れ値を除外したデータ
strip_dat <- Reduce(rbind, 
              lapply(seq_along(idx), function(i){
                subd <- gg_dat[gg_dat$k %in% idx[i],]
                vout <- grb_dat[i,grep("out",names(grb_dat))]
                subd[!subd$v %in% vout,]
                })
              )

# 箱ひげ図
boxplot(v~k, gg_dat, outline = FALSE, ylim = range(gg_dat$v), xlab = NA, ylab=NA)

# ストリップチャート
stripchart(v~k, data=strip_dat, method="jitter", pch=1, vertical=T, add = TRUE)
# 外れ値を図示(type=11かtype=10かで分岐)
if(sum(grepl("out", names(grb_dat)))>1){
  vout <- unlist(t(grb_dat[grep("out", names(grb_dat))]))
  vcol <- rep(grb_dat$vcol, each=2)
  points(x=rep(1:3, each=2), vout, pch=16, col=vcol)
} else {
  points(x=c(1:3), grb_dat$out, pch=16, col=grb_dat$vcol)
}

# legend追記
add_legend <- function(...) {
  opar <- par(fig=c(0, 1, 0, 1), oma=c(0, 0, 0, 0), 
    mar=c(0, 0, 0, 0), new=TRUE)
  on.exit(par(opar))
  plot(0, 0, type='n', bty='n', xaxt='n', yaxt='n')
  legend(...)
}


labs <-  c(paste("outlier: p < ", pvth), paste("outlier: p >= ", pvth))
add_legend("topright", legend=labs, pch=16, 
   col=c("red", "black"),
   horiz=TRUE, bty='n', cex=0.8)

外れ値の検定

S. Konishi

2024-03-13

データの作成

スミルノフ・グラブス検定

`grubbs.test`の結果を取り出す

外れ値をboxplot上に図示する

参考

外れ値の検定

S. Konishi

2024-03-13

データの作成

スミルノフ・グラブス検定

grubbs.testの結果を取り出す

外れ値をboxplot上に図示する

参考

`grubbs.test`の結果を取り出す