A sample of the height-weight dataset is used to demonstrate outlier detection using the Mahalanobis distance. App link: https://steffenruefer.shinyapps.io/mahalanobis_outliers/
18 December 2016
After loading the data, the Mahalanobis distance of each observation is calculated, and a threshold on that distance is used to create a new feature that classifies each observation as an outlier (Yes or No).
df <- read.csv("data.csv")
# Calculate Mahalanobis distance (mahalanobis() returns the squared distance) and add it to the data frame
df$m_dist <- round(mahalanobis(df, colMeans(df), cov(df)),1)
df$Outlier <- "No"
df$Outlier[df$m_dist > 10] <- "Yes"
df$Outlier <- as.factor(df$Outlier)
head(df, 4)
##   height weight m_dist Outlier
## 1 185.09  45.19   21.2     Yes
## 2 181.65  61.83    3.3      No
## 3 176.27  69.32    4.7      No
## 4 173.27  64.48    1.8      No
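As a rough check on the calculation above, note that R's `mahalanobis()` returns the squared distance, so the cutoff of 10 applies to the squared value. The sketch below uses simulated height and weight values (an assumption, since `data.csv` is not reproduced here) to show that the built-in result agrees with the textbook formula, and to relate the cutoff to a chi-squared quantile. That last step only holds if the data are roughly bivariate normal.

```r
# Simulated stand-in for data.csv (assumption: two numeric columns)
set.seed(1)
sim <- data.frame(height = rnorm(200, mean = 175, sd = 7),
                  weight = rnorm(200, mean = 70, sd = 9))

center <- colMeans(sim)
sigma  <- cov(sim)

# Built-in: returns the *squared* Mahalanobis distance
d2_builtin <- mahalanobis(sim, center, sigma)

# Manual: (x - mu)' Sigma^{-1} (x - mu), computed for every row
centered  <- sweep(as.matrix(sim), 2, center)
d2_manual <- rowSums((centered %*% solve(sigma)) * centered)

all.equal(unname(d2_builtin), unname(d2_manual))

# Under approximate bivariate normality, the squared distance follows
# a chi-squared distribution with 2 degrees of freedom
pchisq(10, df = 2)    # share of regular observations below a cutoff of 10
qchisq(0.99, df = 2)  # cutoff that would keep 99% of regular observations
```

With two variables, a cutoff of 10 corresponds to roughly the 99.3% quantile of that reference distribution, which is one way to read the default threshold.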
On the server, the data for the plot and the output table are recalculated every time the threshold is changed. Part of the server code is shown below:
library(shiny)
library(dplyr)

shinyServer(function(input, output) {
  # Table Output - Create table of outliers
  output$table <- renderTable({
    # Re-classify every observation against the current threshold
    df$Outlier <- "No"
    df$Outlier[df$m_dist > input$threshold] <- "Yes"
    df$Outlier <- as.factor(df$Outlier)
    # Table Output: outliers only, largest distance first
    df %>%
      filter(Outlier == "Yes") %>%
      select(height, weight, m_dist) %>%
      arrange(desc(m_dist))
  })
})
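The filtering step inside `renderTable()` can also be written as a plain function, which makes it easy to try different thresholds outside the app. This is only a sketch in base R: the helper name `outlier_table` and the data frame `example_df` are made up for illustration and are not part of the original app.

```r
# Hypothetical helper: rows whose m_dist exceeds the threshold,
# sorted by decreasing distance (mirrors the dplyr pipeline in the server code)
outlier_table <- function(data, threshold) {
  out <- data[data$m_dist > threshold, c("height", "weight", "m_dist")]
  out[order(out$m_dist, decreasing = TRUE), , drop = FALSE]
}

# The four observations printed by head(df, 4) earlier
example_df <- data.frame(
  height = c(185.09, 181.65, 176.27, 173.27),
  weight = c(45.19, 61.83, 69.32, 64.48),
  m_dist = c(21.2, 3.3, 4.7, 1.8)
)

outlier_table(example_df, threshold = 10)  # only the first observation
outlier_table(example_df, threshold = 4)   # first and third observations
```

Inside the server function, such a helper could be called as `renderTable(outlier_table(df, input$threshold))`, which keeps the reactive part down to a single line.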
The Shiny application outputs a plot with the outlier data, as shown below: