18 December 2016

Introduction

Data Exploration

After the data is loaded, the Mahalanobis distance of each observation from the centre of the data is calculated. A threshold (10 in the code below) is then applied to create a new feature, Outlier, which labels each observation as Yes or No.

df <- read.csv("data.csv")
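# Calculate the Mahalanobis distance of each row from the column means and
# add it to the data frame (note: mahalanobis() returns the squared distance)
df$m_dist <- round(mahalanobis(df, colMeans(df), cov(df)), 1)
# Flag observations whose distance exceeds the threshold of 10
df$Outlier <- as.factor(ifelse(df$m_dist > 10, "Yes", "No"))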
head(df, 4)
##   height weight m_dist Outlier
## 1 185.09  45.19   21.2     Yes
## 2 181.65  61.83    3.3      No
## 3 176.27  69.32    4.7      No
## 4 173.27  64.48    1.8      No
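
To make the calculation concrete, the sketch below computes the squared Mahalanobis distance by hand and checks it against mahalanobis(). The demo data frame is synthetic (an assumption standing in for data.csv, which is not reproduced here), and the chi-square comparison assumes the data are roughly bivariate normal.

```r
# Synthetic stand-in for data.csv (hypothetical values, not the original data)
set.seed(1)
demo <- data.frame(height = rnorm(200, mean = 175, sd = 7),
                   weight = rnorm(200, mean = 70, sd = 10))

center    <- colMeans(demo)
sigma_inv <- solve(cov(demo))   # inverse of the covariance matrix

# Squared distance by hand: (x - mu)' S^-1 (x - mu), evaluated for every row
centered  <- sweep(as.matrix(demo), 2, center)
d2_manual <- rowSums((centered %*% sigma_inv) * centered)

# Built-in version; it returns the same squared distances
d2_builtin <- mahalanobis(demo, center, cov(demo))
all.equal(unname(d2_manual), unname(d2_builtin))

# Under approximate normality the squared distance follows a chi-square
# distribution with one degree of freedom per variable
cutoff_99 <- qchisq(0.99, df = ncol(demo))
cutoff_99
```

For two variables, a threshold of 10 on the squared distance sits slightly above the 99% chi-square quantile (about 9.21), so under that assumption roughly 0.7% of ordinary observations would be flagged.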

Server Calculations

In the server, the data for the plot and the output table are recalculated every time the threshold changes. Part of the server code is shown below:

library(shiny)
library(dplyr)

shinyServer(function(input, output) {
      # Table Output - Create table of outliers
      output$table <- renderTable({
            # Re-flag outliers using the threshold chosen in the UI
            df$Outlier <- "No"
            df$Outlier[df$m_dist > input$threshold] <- "Yes"
            df$Outlier <- as.factor(df$Outlier)
            # Return only the outliers, largest distance first
            df %>%
                  filter(Outlier == "Yes") %>%
                  select(height, weight, m_dist) %>%
                  arrange(desc(m_dist))
      })
})
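
Because both the plot and the table depend on the same flagged data, one possible arrangement is to do the flagging once in a reactive expression and let each output read from it. The sketch below is an illustration of that idea, not the application's actual code: flag_outliers() and outlier_table() are hypothetical helpers, and the server assumes that df (with its m_dist column) is already loaded and that the UI defines input$threshold.

```r
# Hypothetical helper: label each row by comparing m_dist with the threshold
flag_outliers <- function(data, threshold) {
  data$Outlier <- factor(ifelse(data$m_dist > threshold, "Yes", "No"),
                         levels = c("No", "Yes"))
  data
}

# Hypothetical helper: keep only the outliers, largest distance first
outlier_table <- function(flagged) {
  out <- flagged[flagged$Outlier == "Yes", c("height", "weight", "m_dist")]
  out[order(out$m_dist, decreasing = TRUE), ]
}

# Sketch of a server function; shiny:: prefixes avoid a library() call here.
# Assumes `df` exists and the UI provides input$threshold.
server <- function(input, output) {
  # Recomputed only when input$threshold changes
  flagged <- shiny::reactive(flag_outliers(df, input$threshold))
  output$table <- shiny::renderTable(outlier_table(flagged()))
}

# The helpers can be tried outside Shiny with the four rows shown earlier
sample_rows <- data.frame(height = c(185.09, 181.65, 176.27, 173.27),
                          weight = c(45.19, 61.83, 69.32, 64.48),
                          m_dist = c(21.2, 3.3, 4.7, 1.8))
result <- outlier_table(flag_outliers(sample_rows, threshold = 4))
result
```

A plot output could call flagged() in the same way, so the flagging logic lives in one place.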

Resulting Plot

The Shiny application outputs a plot of the outlier data, as shown below:
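
A plot of this kind could be drawn along the following lines. This is a base R sketch and not the application's own plotting code: the colours, the legend and the fixed threshold of 10 are assumptions, and sample_rows holds only the four rows printed in the data exploration step.

```r
# The four rows printed earlier, used here as a tiny example data set
sample_rows <- data.frame(height = c(185.09, 181.65, 176.27, 173.27),
                          weight = c(45.19, 61.83, 69.32, 64.48),
                          m_dist = c(21.2, 3.3, 4.7, 1.8))

threshold  <- 10   # assumed fixed here; in the app it would come from the UI
is_outlier <- sample_rows$m_dist > threshold

# One colour per point: outliers in red, everything else in blue
point_cols <- ifelse(is_outlier, "red", "steelblue")

plot(sample_rows$height, sample_rows$weight,
     col = point_cols, pch = 19,
     xlab = "height", ylab = "weight",
     main = paste("Outliers at threshold", threshold))
legend("topright", legend = c("No", "Yes"),
       col = c("steelblue", "red"), pch = 19, title = "Outlier")
```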