Sample R Markdown document.

Here is the R Markdown file for the topic on outlier detection, specifically using Rosner’s Test for Outliers, presented in Module 6 Unit 2. Inspect the parts of this file, particularly how the scripts and text are written.

$~$

To start, let us load the necessary packages.

library(readr)
library(outliers)
library(EnvStats)

## 
## Attaching package: 'EnvStats'

## The following objects are masked from 'package:stats':
## 
##     predict, predict.lm

## The following object is masked from 'package:base':
## 
##     print.default

$~$

The next step is to import the “outlier.csv” data into R and assign it to an object named “data”.

data <- read.csv("outlier.csv")
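Before going further, it can help to confirm that the import produced what we expect. The sketch below is only an illustration: it writes a small temporary file with made-up values as a stand-in for “outlier.csv” and assumes a single numeric column named weights (the column name used later in this document).

```r
# Stand-in for "outlier.csv": a temporary file with made-up values (assumption).
tmp <- tempfile(fileext = ".csv")
writeLines(c("weights", "15.2", "16.8", "24.3"), tmp)

demo <- read.csv(tmp)

str(demo)                               # shows column names and types
n_missing <- sum(is.na(demo$weights))   # count of missing values in the column
n_missing
```

With the real file, the same checks would be str(data) and sum(is.na(data$weights)).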

$~$

We now generate a boxplot as an initial check for outliers in our data set.

boxplot(data, outcol= "red", cex=1.5)

The boxplot shows three outliers.
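To see which values a boxplot flags without reading them off the figure, base R provides boxplot.stats(), which applies the same rule the plot uses (points more than 1.5 times the box length beyond the hinges). The following is a minimal sketch on made-up values, not on the “outlier.csv” data.

```r
# Made-up values for illustration; 30 lies far above the rest.
x <- c(10, 12, 13, 14, 15, 16, 17, 18, 30)

stats <- boxplot.stats(x, coef = 1.5)   # coef = 1.5 is the default whisker rule

flagged <- stats$out                    # values drawn as individual points
positions <- which(x %in% flagged)      # their positions in the vector

flagged
positions
```

Points flagged this way are only candidates. Whether they are statistically significant outliers is what Rosner’s test assesses.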

$~$

The boxplot shows three values beyond the upper whisker, which the boxplot rule treats as potential outliers. We use this information to set k, the maximum number of suspected outliers, for Rosner’s test, which determines which of these values are significant outliers; the calls below use k = 4, one more than the three flagged values. We present two code chunks for Rosner’s test. Compare their outputs and try to identify the difference between the two code chunks in the R Markdown file.

rosnerTest(data$weights, k=4, warn=TRUE)

## 
## Results of Outlier Test
## -------------------------
## 
## Test Method:                     Rosner's Test for Outliers
## 
## Hypothesized Distribution:       Normal
## 
## Data:                            data$weights
## 
## Sample Size:                     50
## 
## Test Statistics:                 R.1 = 3.196505
##                                  R.2 = 2.702503
##                                  R.3 = 2.826365
##                                  R.4 = 2.305651
## 
## Test Statistic Parameter:        k = 4
## 
## Alternative Hypothesis:          Up to 4 observations are not
##                                  from the same Distribution.
## 
## Type I Error:                    5%
## 
## Number of Outliers Detected:     1
## 
##   i   Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
## 1 0 16.30000 2.502733  24.3      50 3.196505   3.128247    TRUE
## 2 1 16.13673 2.243574  22.2      49 2.702503   3.120128   FALSE
## 3 2 16.01042 2.083802  21.9      48 2.826365   3.111796   FALSE
## 4 3 15.88511 1.914814  20.3      47 2.305651   3.103243   FALSE

rosnerTest(data$weights, k=4, warn=TRUE)

  
  Results of Outlier Test
  -------------------------
  
  Test Method:                     Rosner's Test for Outliers
  
  Hypothesized Distribution:       Normal
  
  Data:                            data$weights
  
  Sample Size:                     50
  
  Test Statistics:                 R.1 = 3.196505
                                   R.2 = 2.702503
                                   R.3 = 2.826365
                                   R.4 = 2.305651
  
  Test Statistic Parameter:        k = 4
  
  Alternative Hypothesis:          Up to 4 observations are not
                                   from the same Distribution.
  
  Type I Error:                    5%
  
  Number of Outliers Detected:     1
  
    i   Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
  1 0 16.30000 2.502733  24.3      50 3.196505   3.128247    TRUE
  2 1 16.13673 2.243574  22.2      49 2.702503   3.120128   FALSE
  3 2 16.01042 2.083802  21.9      48 2.826365   3.111796   FALSE
  4 3 15.88511 1.914814  20.3      47 2.305651   3.103243   FALSE

Rosner’s test shows that observation number 50, with a value of 24.3, is the only significant outlier.
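To make the R.i+1 and lambda.i+1 columns less of a black box, the first row of the table can be recomputed by hand. The sketch below copies the values from the output above and assumes the critical-value formula given in the EnvStats documentation for rosnerTest; the variable names are our own.

```r
# Values copied from the first row (i = 0) of the output above
n      <- 50         # sample size
value  <- 24.3       # most extreme observation
mean_0 <- 16.30000   # Mean.i
sd_0   <- 2.502733   # SD.i
alpha  <- 0.05       # Type I error

# Test statistic: distance of the most extreme value from the mean, in SD units
r_1 <- abs(value - mean_0) / sd_0

# Critical value (formula assumed from the EnvStats documentation)
i <- 0
p <- 1 - (alpha / 2) / (n - i)
t_crit <- qt(p, df = n - i - 2)
lambda_1 <- t_crit * (n - i - 1) / sqrt((n - i - 2 + t_crit^2) * (n - i))

c(R.1 = r_1, lambda.1 = lambda_1)
r_1 > lambda_1   # TRUE means the observation is declared an outlier
```

Up to rounding, the two numbers should agree with R.i+1 and lambda.i+1 in the first row. An observation is declared an outlier when its test statistic exceeds the critical value.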

$~$

To illustrate how to trim the data, that is, remove the outlier from the data set, we run the following script. (Note again that in actual practice, outliers are not always removed from a data set; read about this in the module.) Take note of the observation number of the outlier: it is Obs.Num 50.

(Note: to remove two or more outliers from the original data set, say observations 49 and 50, the expression would be: data$weights[c(-49, -50)].)

data1 <- data$weights[-50]   # drop observation 50, the detected outlier
boxplot(data1, outcol="red", cex=1.5, main="After removing outlier")
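The negative-index notation used here can be tried on a small vector first. This is a minimal sketch with made-up values; the object names are for illustration only.

```r
weights <- c(15.2, 16.8, 14.9, 22.2, 24.3)   # made-up values

one_removed <- weights[-5]          # drop the 5th observation
two_removed <- weights[c(-4, -5)]   # drop the 4th and 5th observations
same_result <- weights[-c(4, 5)]    # equivalent, slightly shorter form

identical(two_removed, same_result)
length(weights)                     # the original vector is left unchanged
```

Because the result is assigned to a new object, the original data stay available for comparison.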

$~$

After removing the outlier and generating a boxplot of the trimmed data, we can see that the boxplot still flags some observations as outliers. We check whether these observations are indeed outliers by performing Rosner’s test again.

rosnerTest(data1, k=3, warn=TRUE)

  
  Results of Outlier Test
  -------------------------
  
  Test Method:                     Rosner's Test for Outliers
  
  Hypothesized Distribution:       Normal
  
  Data:                            data1
  
  Sample Size:                     49
  
  Test Statistics:                 R.1 = 2.702503
                                   R.2 = 2.826365
                                   R.3 = 2.305651
  
  Test Statistic Parameter:        k = 3
  
  Alternative Hypothesis:          Up to 3 observations are not
                                   from the same Distribution.
  
  Type I Error:                    5%
  
  Number of Outliers Detected:     0
  
    i   Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
  1 0 16.13673 2.243574  22.2      49 2.702503   3.120128   FALSE
  2 1 16.01042 2.083802  21.9      48 2.826365   3.111796   FALSE
  3 2 15.88511 1.914814  20.3      47 2.305651   3.103243   FALSE

The Rosner’s test result shows that there are no significant outliers in the trimmed data set. We can therefore proceed with the subsequent statistical analysis using the trimmed data.
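Notice that the three rows of this second table repeat rows 2 to 4 of the first one. This happens because the test statistics are computed sequentially: the most extreme value is removed, then the mean and standard deviation are recomputed before the next step. The sketch below illustrates that behaviour in base R on made-up values. The helper rosner_stats() is hypothetical (it is not part of EnvStats) and computes only the test statistics, not the critical values.

```r
# Hypothetical helper: returns R.1, ..., R.k for a numeric vector x.
rosner_stats <- function(x, k) {
  r <- numeric(k)
  for (i in seq_len(k)) {
    dev <- abs(x - mean(x))      # absolute deviations from the current mean
    worst <- which.max(dev)      # most extreme remaining observation
    r[i] <- dev[worst] / sd(x)   # statistic for the current subset
    x <- x[-worst]               # drop it before the next round
  }
  r
}

# Made-up values for illustration
weights <- c(14.2, 15.1, 15.8, 16.0, 16.3, 16.9, 17.4, 18.0, 21.5, 26.0)

full <- rosner_stats(weights, k = 3)

# Remove the most extreme value by hand, then rerun with one fewer step
most_extreme <- which.max(abs(weights - mean(weights)))
trimmed <- rosner_stats(weights[-most_extreme], k = 2)

all.equal(full[2:3], trimmed)
```

The statistics after the first step match those of the trimmed data, mirroring what the two rosnerTest outputs in this document show.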

$~$

Outlier

AE311
