One of the first steps towards obtaining a coherent analysis is the detection of outlying observations. Although outliers are often regarded as errors or noise, they may carry important information (see Mandelbrot/Taleb).
Detected outliers are candidates for aberrant data that, if left unexamined, may lead to model misspecification, biased parameter estimates and incorrect results. It is therefore important to identify them prior to modelling and analysis.
Outlier detection methods have been suggested for numerous applications, such as credit card fraud detection, clinical trials, voting irregularity analysis, data cleansing, network intrusion, severe weather prediction, geographic information systems and athlete performance analysis.
Grubbs’ test, which assumes approximately normally distributed data, is defined for the following hypotheses:
[H0] : There are no outliers in the data set
[Ha] : There is exactly one outlier in the data set
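For reference, the usual textbook form of the two-sided Grubbs statistic and its rejection rule can be written as below. The symbols are notation introduced here, not taken from the text above: N is the sample size, \bar{x} the sample mean, s the sample standard deviation, \alpha the significance level, and t the upper critical value of a t distribution with N - 2 degrees of freedom. Software may report a one-sided variant by default, so treat this as a sketch of the idea rather than a description of any particular implementation.

```latex
G = \frac{\max_{i=1,\dots,N} \lvert x_i - \bar{x} \rvert}{s},
\qquad
\text{reject } H_0 \text{ if } \;
G > \frac{N-1}{\sqrt{N}}
    \sqrt{\frac{t_{\alpha/(2N),\,N-2}^{2}}{N - 2 + t_{\alpha/(2N),\,N-2}^{2}}}
```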
install.packages("outliers")  # one-time installation from CRAN
library(outliers)
#Package Author : Lukasz Komsta (UMLUB, Poland)
grubbs.test(myData)           # myData is a placeholder for your own numeric vector
library(outliers)
set.seed(1234)
X <- c(rnorm(99,15,1),20)
grubbs.test(X)
##
## Grubbs test for one outlier
##
## data: X
## G = 4.63470, U = 0.78083, p-value = 4.517e-05
## alternative hypothesis: highest value 20 is an outlier
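To see where the reported statistics come from, the following sketch recomputes them by hand using base R only. It assumes that G is the absolute distance of the most extreme value from the mean in units of the standard deviation, and that U is the ratio of the sum of squared deviations without and with that value; the variable names (`suspect`, `U_direct`) are illustrative and not part of the outliers package.

```r
# Recreate the example data: 99 draws from N(15, 1) plus one planted value of 20
set.seed(1234)
X <- c(rnorm(99, 15, 1), 20)
n <- length(X)

# The suspect is the observation farthest from the sample mean
idx     <- which.max(abs(X - mean(X)))
suspect <- X[idx]

# G: distance of the suspect from the mean, in standard deviations
G <- abs(suspect - mean(X)) / sd(X)

# U: ratio of squared-deviation sums without and with the suspect
U_direct <- (var(X[-idx]) / var(X)) * (n - 2) / (n - 1)

# Algebraically, U is fully determined by G and n
U_from_G <- 1 - n * G^2 / (n - 1)^2

print(c(G = G, U = U_direct))
```

If the assumptions above hold, the printed values should agree with the G and U shown in the `grubbs.test(X)` output. Because U is a function of G and n alone, the two statistics carry the same information.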
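Boxplots can also be used to identify potential outliers. However, a boxplot uses a different criterion for classifying outliers (by default, R flags points lying more than 1.5 times the interquartile range beyond the box), so the two approaches may not always agree on particular cases.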
boxplot(X, col="lightblue",pch=16,horizontal = TRUE)
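The points a boxplot draws individually can also be listed numerically. The sketch below uses base R's `boxplot.stats()` and then reproduces its rule by hand from the hinges; the names `hinges`, `fences` and `flagged` are illustrative, and the data are regenerated so the snippet runs on its own.

```r
# Same example data as above
set.seed(1234)
X <- c(rnorm(99, 15, 1), 20)

# boxplot.stats() returns the five-number summary and the flagged points
bs <- boxplot.stats(X, coef = 1.5)

# Reproduce the rule by hand: the fences sit 1.5 * IQR beyond the hinges
hinges  <- bs$stats[c(2, 4)]                  # lower and upper hinge
fences  <- hinges + c(-1.5, 1.5) * diff(hinges)
flagged <- X[X < fences[1] | X > fences[2]]

print(fences)
print(flagged)
```

Unlike Grubbs' test, which assesses a single most extreme value, this rule can flag several observations at once, including ordinary tail values from a normal sample. That is one reason the two methods may disagree.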