Introducing the custom binning App

Jurriaan Nagelkerke
23-8-2015

Value of binning for predictive modeling

Binning is very useful in predictive modeling:

  • To identify how variables are related
  • To visualize correlations
  • To get rid of outliers and missing values
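The last point can be illustrated with a minimal sketch. The toy vector and the break values below are made up for illustration; they do not come from the app. Open-ended outer bins absorb extreme values, and a missing value becomes a category of its own.

```r
# Hypothetical toy vector with one extreme value and one missing value
x <- c(2, 3, 5, 8, 13, 1000, NA)

# Open-ended outer breaks so extreme values fall into the first or last bin
bins <- cut(x, breaks = c(-Inf, 4, 10, Inf), labels = c("low", "mid", "high"))

# addNA() turns NA into an explicit factor level instead of dropping it
bins <- addNA(bins)

table(bins)
```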

The example below shows how easy it is to spot the relationship between sepal length and the species “setosa” in the Iris dataset once sepal length has been binned.

library(datasets)
data(iris)

# Copy the data and add an indicator: TRUE when the species is setosa
iris_ext<-iris
iris_ext$ind_setosa<-iris$Species=="setosa"

# Quintile boundaries (20%, 40%, 60%, 80%, 100%) of sepal length
quantiles<-quantile(iris_ext$Sepal.Length,probs = (1:5)/5,na.rm = TRUE)

# Start everything in the top bin, then overwrite from the highest cutoff down
iris_ext$Sepal.Length_binned <- 5
iris_ext$Sepal.Length_binned[iris_ext$Sepal.Length<=quantiles[4]]<-4
iris_ext$Sepal.Length_binned[iris_ext$Sepal.Length<=quantiles[3]]<-3
iris_ext$Sepal.Length_binned[iris_ext$Sepal.Length<=quantiles[2]]<-2
iris_ext$Sepal.Length_binned[iris_ext$Sepal.Length<=quantiles[1]]<-1

# Stacked Bar Plot with Colors and Legend
counts <- table(iris_ext$ind_setosa, iris_ext$Sepal.Length_binned)
barplot(counts, main="Species Setosa (Y/N) per Sepal length bin",
        xlab="Sepal length (binned)", col=c("red","green"),
        legend = rownames(counts))

[Figure: stacked bar plot "Species Setosa (Y/N) per Sepal length bin"]
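As a possible alternative to assigning the bins one cutoff at a time, the same quintile bins can be produced with a single call to cut(). This is a sketch, not the code the app itself uses; the names breaks, sepal_bin and setosa_share are chosen here for illustration, and it assumes the quantile boundaries are all distinct (cut() fails on duplicate breaks).

```r
library(datasets)
data(iris)

# Quintile boundaries, including the minimum (probs = 0) as the lowest break
breaks <- quantile(iris$Sepal.Length, probs = (0:5) / 5, na.rm = TRUE)

# include.lowest = TRUE keeps the minimum value in bin 1;
# labels = FALSE returns integer bin codes instead of interval labels
sepal_bin <- cut(iris$Sepal.Length, breaks = breaks,
                 include.lowest = TRUE, labels = FALSE)

# Share of setosa flowers per bin
setosa_share <- tapply(iris$Species == "setosa", sepal_bin, mean)
round(setosa_share, 2)
```

Reading the share per bin as numbers complements the bar plot: it shows directly in which sepal length bins setosa is concentrated.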

Features of the app

The following features make the app very useful:

  • You can select a data source (in the current version: MLBattend, an example dataset from the UsingR package; see http://www.inside-r.org/packages/cran/UsingR/docs/MLBattend for details)
  • You can select any numeric field within the dataset
  • You can select the number of equal-sized bins you would like to have
  • You can evaluate the cutoff points visually, since the cutoffs for the bins are drawn in the histogram of the original variable
  • You can adjust the exact cutoff values to get more meaningful cutoffs. For example: 1,000,000 as a cutoff instead of 995289.29
  • You can copy and paste the resulting code to create the binned variable. In case you've adjusted the cutoffs, these adjusted values are used; otherwise the calculated cutoffs are used.
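The workflow described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: it uses Sepal.Length from the built-in iris data as a stand-in for a field selected in the app, the variable names are made up, and the code the app actually generates may look different.

```r
library(datasets)
data(iris)

x <- iris$Sepal.Length   # stand-in for a numeric field selected in the app
n_bins <- 5              # number of equal-sized bins

# Calculated cutoffs: the interior quantiles that split x into n_bins groups
cutoffs <- quantile(x, probs = (1:(n_bins - 1)) / n_bins, na.rm = TRUE)

# Visual check: histogram of the original variable with the cutoffs overlaid
hist(x, main = "Sepal length with bin cutoffs", xlab = "Sepal length")
abline(v = cutoffs, col = "red", lty = 2)

# Adjusted cutoffs: round to one decimal for more meaningful boundaries
adjusted <- unique(round(cutoffs, 1))

# Create the binned variable from the adjusted cutoffs
x_binned <- cut(x, breaks = c(-Inf, adjusted, Inf), labels = FALSE)
table(x_binned)
```

Note that after adjusting the cutoffs the bins are no longer guaranteed to be exactly equal-sized; that is the trade-off for more readable boundaries.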

How to get to the app

The app can be found at the following URL: https://jurrr.shinyapps.io/course-project

If you would like to know more about the source code (server.R and ui.R), see: https://github.com/jurrr/DevelopingDataProducts_CourseProject

Happy binning!!

Thanks for trying out this app! I hope it helps you transform your analysis data!

Bye-bye!