This is the R Markdown file for the topic on outlier detection using Rosner’s Test for Outliers, presented in Module 6 Unit 2. Inspect the parts of this file, particularly how the scripts and text are written.
\(~\)
To start with, let us first load the necessary packages.
library(readr)
library(outliers)
library(EnvStats)
##
## Attaching package: 'EnvStats'
## The following objects are masked from 'package:stats':
##
## predict, predict.lm
## The following object is masked from 'package:base':
##
## print.default
\(~\)
The next step is to import the “outlier.csv” data into RStudio and assign it to an object named “data”.
data <- read.csv("outlier.csv")
\(~\)
We now generate a boxplot to initially assess the presence of outliers in our data set.
boxplot(data, outcol= "red", cex=1.5)
The boxplot shows three outliers.
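As a rough illustration of what the boxplot is flagging, the sketch below applies the same rule with `boxplot.stats()` from base R: a point is drawn as an outlier when it lies more than 1.5 times the box length (the distance between the hinges) beyond a hinge. The vector `x` is hypothetical and is not the “outlier.csv” data, and the variable names are illustrative only.

```r
# Hypothetical weights, NOT the values in outlier.csv
x <- c(14, 15, 15, 16, 16, 16, 17, 17, 18, 24)

# boxplot.stats() returns the five-number summary ($stats) and the points
# lying beyond the whiskers ($out), using the same rule as boxplot()
bs <- boxplot.stats(x)

lower_hinge <- bs$stats[2]
upper_hinge <- bs$stats[4]

# Upper fence: upper hinge + 1.5 * (upper hinge - lower hinge)
upper_fence <- upper_hinge + 1.5 * (upper_hinge - lower_hinge)

flagged     <- bs$out                 # values drawn as individual points
flagged_pos <- which(x > upper_fence) # their positions in x

flagged
flagged_pos
```

A boxplot flag is only a screening rule, which is why the flagged values are next checked with a formal test.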
\(~\)
From the boxplot, we see three values beyond the upper whisker, which are considered outliers based on the boxplot. We now use this information to set the value of k, the maximum number of outliers to test for, in Rosner’s test, which determines the specific outliers in our data. We present two code chunks for Rosner’s Test. Compare the outputs and try to see the difference between the two code chunks in the R Markdown file.
rosnerTest(data$weights, k=4, warn=TRUE)
##
## Results of Outlier Test
## -------------------------
##
## Test Method: Rosner's Test for Outliers
##
## Hypothesized Distribution: Normal
##
## Data: data$weights
##
## Sample Size: 50
##
## Test Statistics: R.1 = 3.196505
## R.2 = 2.702503
## R.3 = 2.826365
## R.4 = 2.305651
##
## Test Statistic Parameter: k = 4
##
## Alternative Hypothesis: Up to 4 observations are not
## from the same Distribution.
##
## Type I Error: 5%
##
## Number of Outliers Detected: 1
##
##   i   Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1 0 16.30000 2.502733  24.3      50 3.196505   3.128247    TRUE
## 2 1 16.13673 2.243574  22.2      49 2.702503   3.120128   FALSE
## 3 2 16.01042 2.083802  21.9      48 2.826365   3.111796   FALSE
## 4 3 15.88511 1.914814  20.3      47 2.305651   3.103243   FALSE
rosnerTest(data$weights, k=4, warn=TRUE)
Results of Outlier Test
-------------------------
Test Method: Rosner's Test for Outliers
Hypothesized Distribution: Normal
Data: data$weights
Sample Size: 50
Test Statistics: R.1 = 3.196505
R.2 = 2.702503
R.3 = 2.826365
R.4 = 2.305651
Test Statistic Parameter: k = 4
Alternative Hypothesis: Up to 4 observations are not
from the same Distribution.
Type I Error: 5%
Number of Outliers Detected: 1
  i   Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
1 0 16.30000 2.502733  24.3      50 3.196505   3.128247    TRUE
2 1 16.13673 2.243574  22.2      49 2.702503   3.120128   FALSE
3 2 16.01042 2.083802  21.9      48 2.826365   3.111796   FALSE
4 3 15.88511 1.914814  20.3      47 2.305651   3.103243   FALSE
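One setting that produces this kind of difference in printed output is the knitr `comment` option, which controls the prefix placed in front of every output line. Whether the second chunk in this file actually uses it is an assumption, since the chunk headers are not visible in the rendered document; the lines below are only a sketch of how the option behaves.

```r
# Assumption: the file is rendered with knitr (the usual R Markdown engine)

# Default behaviour: each line of printed output starts with "##"
knitr::opts_chunk$set(comment = "##")

# Setting comment to NA drops the prefix from printed output
knitr::opts_chunk$set(comment = NA)
```

The same option can also be written in the header of a single chunk, so that only that chunk is affected.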
Rosner’s Test shows that observation number 50, with a value of 24.3, is an outlier.
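To make the decision concrete, the following sketch recomputes the first row of the table from the printed values. The test statistic is the distance of the most extreme value from the mean in standard deviation units, and it is compared with the critical value lambda. The critical value formula is the one published for Rosner's generalized ESD procedure; that `rosnerTest()` evaluates it in exactly this form internally is an assumption, and the variable names are illustrative.

```r
# Values copied from row 1 of the Rosner output
n      <- 50        # sample size
alpha  <- 0.05      # Type I error
i      <- 0         # number of observations already set aside
mean_i <- 16.30000
sd_i   <- 2.502733
value  <- 24.3

# Test statistic: R.i+1 = |value - mean| / SD
R1 <- abs(value - mean_i) / sd_i

# Critical value lambda.i+1 from the Student t distribution
p       <- 1 - (alpha / 2) / (n - i)
t_crit  <- qt(p, df = n - i - 2)
lambda1 <- t_crit * (n - i - 1) / sqrt((n - i - 2 + t_crit^2) * (n - i))

# Compare statistic and critical value for this row
is_outlier <- R1 > lambda1

round(c(R1 = R1, lambda1 = lambda1), 6)
is_outlier
```

In the full procedure, the number of outliers is the largest i for which R.i exceeds lambda.i. Here only the first row does, so one outlier is reported.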
\(~\)
To illustrate how to trim our data, that is, remove the outlier from the data set, we run the following script. (Again, note that in actual practice, removing outliers from a data set is not always appropriate. Read about this in the module.) Note the observation number of the outlier: Obs.Num 50.
(Note: to remove two or more outliers from the original data set, say observations 49 and 50, the expression would be: data$weights[c(-49, -50)].)
data1 <- data$weights[-50]
boxplot(data1, outcol="red", cex=1.5, main="After removing outlier")
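The negative indexing used here can be tried on a small vector first. The sketch below uses a hypothetical vector `w` standing in for `data$weights`; it also shows why several observations are best removed in a single step, since positions shift after each removal.

```r
# Hypothetical vector standing in for data$weights
w <- c(15.2, 16.1, 14.8, 22.2, 24.3)

# Drop one observation by position
w_minus_last <- w[-5]

# Drop several observations in one step; positions refer to the ORIGINAL vector
# (w[c(-4, -5)] is an equivalent way to write this)
w_minus_two <- w[-c(4, 5)]

# Removing one at a time shifts later positions, so the second call
# must use the updated index
step1 <- w[-4]       # 24.3 is now at position 4, not 5
step2 <- step1[-4]   # same result as w[-c(4, 5)]

w_minus_last
w_minus_two
```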
\(~\)
After removing the outlier and generating a boxplot for the trimmed data set, we can see that some observations are still considered outliers based on the boxplot. We check whether these observations are indeed outliers by performing Rosner’s Test again.
rosnerTest(data1, k=3, warn=TRUE)
Results of Outlier Test
-------------------------
Test Method: Rosner's Test for Outliers
Hypothesized Distribution: Normal
Data: data1
Sample Size: 49
Test Statistics: R.1 = 2.702503
R.2 = 2.826365
R.3 = 2.305651
Test Statistic Parameter: k = 3
Alternative Hypothesis: Up to 3 observations are not
from the same Distribution.
Type I Error: 5%
Number of Outliers Detected: 0
  i   Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
1 0 16.13673 2.243574  22.2      49 2.702503   3.120128   FALSE
2 1 16.01042 2.083802  21.9      48 2.826365   3.111796   FALSE
3 2 15.88511 1.914814  20.3      47 2.305651   3.103243   FALSE
The Rosner’s test results show that there are no significant outliers in our trimmed data set. We can therefore proceed with the subsequent statistical analysis using the trimmed data.
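The three rows of this second table repeat rows 2 to 4 of the first Rosner output. As a sketch of why, the mean and standard deviation after dropping observation 50 can be recovered from the full-sample values printed earlier. The update formulas are standard algebra for removing one value from a sample, and the variable names are illustrative.

```r
# Values printed in row 1 of the first Rosner output (full sample)
n        <- 50
mean_all <- 16.30000
sd_all   <- 2.502733
removed  <- 24.3      # observation 50

# Mean of the remaining n - 1 observations
mean_trim <- (n * mean_all - removed) / (n - 1)

# The sum of squared deviations shrinks by n / (n - 1) * (removed - mean_all)^2
ss_all  <- (n - 1) * sd_all^2
ss_trim <- ss_all - n / (n - 1) * (removed - mean_all)^2
sd_trim <- sqrt(ss_trim / (n - 2))

round(c(mean_trim = mean_trim, sd_trim = sd_trim), 5)
```

Up to rounding, these agree with the Mean.i and SD.i values in row 1 of the trimmed-data table, which is why rerunning the test on the trimmed data gives the same statistics as the later steps of the first run.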
\(~\)