Outliers

Create the data.

set.seed(321123)
data <- rchisq(100, df=10)
head(data)
## [1]  8.077467  6.208754  8.679944 12.300834  5.539333 14.603254

View the data.

hist(data,
     col="orangered")

  1. Describe the shape of this data's distribution (unimodal, bimodal, or multimodal; symmetric, right-skewed, or left-skewed).

Calculate a summary and the IQR of the data.

summary(data)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.533   6.855   8.646   9.837  12.380  35.490
IQR(data)
## [1] 5.527023
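
As a quick check on what `IQR()` reports, the sketch below recomputes it as the distance between the third and first quartiles. The small vector `x` is made up for illustration and is not the worksheet's data.

```r
# Hypothetical toy data, chosen only to keep the arithmetic easy to follow
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# First and third quartiles (R's default quantile method, type 7)
q <- quantile(x, c(0.25, 0.75), names = FALSE)

# The IQR is the third quartile minus the first
manual_iqr <- q[2] - q[1]
manual_iqr

# IQR() computes the same quantity
IQR(x)
all.equal(manual_iqr, IQR(x))
```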

Make a boxplot.

boxplot(data,
        horizontal=TRUE, col="yellow")

Identification of outliers

Assign values to the symbols \(q1\), \(m\), \(q3\), and \(iqr\).

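# as.numeric() drops the "25%"-style names that quantile() attaches
q1 <- as.numeric(quantile(data, 0.25))  # first quartile
m <- as.numeric(quantile(data, 0.50))   # median
q3 <- as.numeric(quantile(data, 0.75))  # third quartile
iqr <- q3 - q1                          # interquartile range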

  1. Calculate the lower and upper boundaries beyond which any data value would be an outlier.

Use the \(1.5 \times iqr\) rule: the lower boundary is \(q1 - 1.5 \times iqr\) and the upper boundary is \(q3 + 1.5 \times iqr\).

lower.boundary <- q1 - 1.5 * iqr
lower.boundary
## [1] -1.435212
upper.boundary <- q3 + 1.5 * iqr
upper.boundary
## [1] 20.67288
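
The boundary calculation can also be wrapped in a small function so it is easy to reuse on other data. This is only a sketch: `iqr_fences` and its argument `k` are hypothetical names introduced here, not part of base R or of the worksheet.

```r
# Hypothetical helper: returns the lower and upper outlier boundaries
# for a numeric vector x, using k times the IQR (k = 1.5 by convention)
iqr_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  spread <- q[2] - q[1]
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

# Toy example (made-up values, not the worksheet's data)
iqr_fences(c(1, 2, 3, 4, 100))
```

Because it uses the same quartiles and the same multiplier, calling `iqr_fences(data)` on the worksheet's data should reproduce `lower.boundary` and `upper.boundary`.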

Here are the 10 smallest and 10 largest values of the data. Are any of these numbers outliers?

  1. Identify the outliers in these lists.

head(sort(data), 10)
##  [1] 1.532913 3.298883 4.005185 4.577083 4.718302 4.784650 4.849587
##  [8] 5.167080 5.395535 5.539333
tail(sort(data), 10)
##  [1] 15.38270 15.41619 15.85357 15.85960 16.02138 16.21278 16.68438
##  [8] 16.85810 20.15049 35.48660
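
Once the lists have been checked by eye, the comparison against the boundaries can be confirmed in code. The sketch below is one possible way to do it; `find_outliers` is a hypothetical helper, and the vector `toy` is made up so the example runs on its own.

```r
# Hypothetical helper: return the values of x that fall outside the
# 1.5 * IQR boundaries
find_outliers <- function(x, k = 1.5) {
  q1 <- as.numeric(quantile(x, 0.25))
  q3 <- as.numeric(quantile(x, 0.75))
  iqr <- q3 - q1
  lower <- q1 - k * iqr
  upper <- q3 + k * iqr
  # Keep only the values below the lower or above the upper boundary
  x[x < lower | x > upper]
}

toy <- c(1, 2, 3, 4, 100)
find_outliers(toy)

# boxplot.stats() reports the points a boxplot would draw individually
boxplot.stats(toy)$out
```

Running `find_outliers(data)` applies the same rule to the worksheet's data. Note that `boxplot.stats()` computes its quartiles as hinges rather than with `quantile()`, so on some data sets its boundaries can differ slightly from the ones calculated by hand.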