Grubbs’ Test For Outliers

Outlier Detection

One of the first steps towards obtaining a coherent analysis is the detection of outlying observations. Although outliers are often treated as error or noise, they may carry important information (see Mandelbrot/Taleb).

Detected outliers are candidates for aberrant data that, if left unexamined, can lead to model misspecification, biased parameter estimates and incorrect results. It is therefore important to identify them before modelling and analysis.

Applications of Outlier Detection

Outlier detection methods have been suggested for numerous applications, such as credit card fraud detection, clinical trials, voting irregularity analysis, data cleansing, network intrusion, severe weather prediction, geographic information systems and athlete performance analysis.

Grubbs’ Test

Grubbs’ test is a formal hypothesis test for assessing whether or not a data set contains an outlier.
The test assumes that the data are univariate and approximately normally distributed.
Grubbs’ test is designed to assess a single outlier only. If more outliers are suspected, alternative tests, such as the Tietjen-Moore test, are recommended.
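
Because the test relies on approximate normality, it can help to look at the bulk of the data before running it. The following is a minimal sketch of one informal check, not a prescribed procedure: it sets aside the single most extreme point and inspects the rest with a normal Q-Q plot and a Shapiro-Wilk test. The simulated sample `X` and the helper name `bulk` are assumptions made for illustration.

```r
# Illustrative sample: 99 normal draws plus one planted extreme value
set.seed(1234)
X <- c(rnorm(99, 15, 1), 20)

# Set aside the single point furthest from the mean (the suspected outlier)
bulk <- X[-which.max(abs(X - mean(X)))]

# Visual check: points should lie close to the reference line
qqnorm(bulk)
qqline(bulk)

# Formal check: a small p-value would cast doubt on normality of the bulk
sw <- shapiro.test(bulk)
sw$p.value
```

If the bulk of the data is clearly non-normal, the p-value from Grubbs’ test should be read with caution.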

Hypotheses

Grubbs’ test is defined for the following hypotheses:

[Ho] : There are no outliers in the data set
[Ha] : There is exactly one outlier in the data set
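
For reference, the standard form of the test statistic for these hypotheses can be written as follows. The notation is an assumption introduced here: N is the sample size, Y_i are the observations, \bar{Y} is the sample mean, s is the sample standard deviation, and t_{p, N-2} is the upper critical value of the t distribution with N-2 degrees of freedom.

```latex
% Two-sided Grubbs statistic: largest absolute deviation from the mean, in units of s
G = \frac{\max_{i=1,\dots,N} \lvert Y_i - \bar{Y} \rvert}{s}

% One-sided version when only the maximum is suspected
G = \frac{Y_{\max} - \bar{Y}}{s}

% Two-sided test at significance level \alpha: reject H_0 if
G > \frac{N-1}{\sqrt{N}}
    \sqrt{\frac{t^{2}_{\alpha/(2N),\,N-2}}{N-2+t^{2}_{\alpha/(2N),\,N-2}}}
% For a one-sided test, replace \alpha/(2N) with \alpha/N.
```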


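# Install the package once, then load it in each session
install.packages("outliers")
library(outliers)
# Package author: Lukasz Komsta (UMLUB, Poland)

# General usage: myData stands for any numeric vector
grubbs.test(myData)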

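library(outliers)
set.seed(1234)   # fix the seed so the example is reproducible
# 99 draws from a normal distribution (mean 15, sd 1) plus one planted value of 20
X <- c(rnorm(99, 15, 1), 20)
grubbs.test(X)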

## 
##  Grubbs test for one outlier
## 
## data:  X
## G = 4.63470, U = 0.78083, p-value = 4.517e-05
## alternative hypothesis: highest value 20 is an outlier
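
To see where the numbers in this output come from, the statistics can be recomputed with base R alone. This is a sketch under assumptions rather than the package's own code: `G` is taken as the one-sided statistic for the maximum, `U` as the ratio of the sum of squares without and with the suspected point, and the p-value as a Bonferroni-type bound from the t distribution. The variable names are chosen here for illustration.

```r
# Same simulated data as in the example above
set.seed(1234)
X <- c(rnorm(99, 15, 1), 20)
n <- length(X)

# One-sided Grubbs statistic: distance of the maximum from the mean, in sd units
G <- (max(X) - mean(X)) / sd(X)

# Share of the total sum of squares that remains after removing the maximum
U <- 1 - n * G^2 / (n - 1)^2

# Convert G to a t statistic with n - 2 degrees of freedom,
# then apply a Bonferroni-type bound over the n observations
t_stat  <- sqrt(n * (n - 2) * G^2 / ((n - 1)^2 - n * G^2))
p_value <- min(1, n * pt(t_stat, df = n - 2, lower.tail = FALSE))

c(G = G, U = U, p.value = p_value)
```

The resulting values should be close to those reported by `grubbs.test(X)`; a small p-value leads to rejecting [Ho] in favour of [Ha].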

Outliers on Boxplots

Boxplots can be used to identify potential outliers. However, a boxplot classifies outliers by a different rule (by default in R, points lying more than 1.5 times the interquartile range beyond the box), so the two approaches may not always agree on particular cases.

    boxplot(X, col = "lightblue", pch = 16, horizontal = TRUE)
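
The points that the boxplot rule flags can also be listed without drawing the plot. A minimal sketch, assuming the same simulated `X` as above and using base R's `boxplot.stats()`; the name `flagged` is introduced here for illustration.

```r
# Same simulated data as in the Grubbs example
set.seed(1234)
X <- c(rnorm(99, 15, 1), 20)

# boxplot.stats() returns the points lying beyond the whiskers in $out
# (default coef = 1.5, the same rule boxplot() uses)
flagged <- boxplot.stats(X)$out
flagged

# Is the planted value among the flagged points?
20 %in% flagged
```

The boxplot rule may flag several points, whereas Grubbs’ test assesses only the single most extreme value, which is one reason the two can disagree.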