Jurriaan Nagelkerke
23-8-2015
Binning is very useful in predictive modeling
Below is an example that shows how easy it is to spot a correlation between sepal length and the species “setosa” in the Iris dataset once sepal length is binned.
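library(datasets)
data(iris)
# Copy the data and add a TRUE/FALSE indicator for the species setosa
iris_ext <- iris
iris_ext$ind_setosa <- iris$Species == "setosa"
# Quintile boundaries (20%, 40%, 60%, 80%, 100%) of sepal length
quantiles <- quantile(iris_ext$Sepal.Length, probs = (1:5)/5, na.rm = TRUE)
# Put every flower in bin 5 first, then overwrite from the highest boundary
# down, so each flower ends up in the lowest bin whose boundary it does not exceed
iris_ext$Sepal.Length_binned <- 5
iris_ext$Sepal.Length_binned[iris_ext$Sepal.Length <= quantiles[4]] <- 4
iris_ext$Sepal.Length_binned[iris_ext$Sepal.Length <= quantiles[3]] <- 3
iris_ext$Sepal.Length_binned[iris_ext$Sepal.Length <= quantiles[2]] <- 2
iris_ext$Sepal.Length_binned[iris_ext$Sepal.Length <= quantiles[1]] <- 1
# Stacked bar plot with colors and legend: rows are FALSE/TRUE (setosa), columns are bins
counts <- table(iris_ext$ind_setosa, iris_ext$Sepal.Length_binned)
barplot(counts, main = "Species Setosa (Y/N) per Sepal length bin",
        xlab = "Sepal length (binned)", col = c("red", "green"),
        legend = rownames(counts))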
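For comparison, the same five equal-frequency bins can also be built in one call with base R's `cut()`, and the share of setosa per bin can be read off as numbers instead of from the plot. This is a minimal sketch of an alternative, not taken from the original project; the names `breaks`, `binned`, `is_setosa` and `setosa_share` are illustrative only. It assumes the quintile boundaries of sepal length are all distinct, which `cut()` requires (this holds for the standard iris data).

```r
library(datasets)
data(iris)

# Sketch: alternative to the step-by-step bin assignment (names are illustrative).
# Include the 0% quantile (the minimum) so cut() has a lower edge for bin 1.
breaks <- quantile(iris$Sepal.Length, probs = (0:5) / 5, na.rm = TRUE)

# cut() uses right-closed intervals (b[i], b[i+1]] by default, matching the
# "<= quantile" comparisons; include.lowest = TRUE keeps the minimum in bin 1.
binned <- cut(iris$Sepal.Length, breaks = breaks, labels = 1:5,
              include.lowest = TRUE)

# Share of setosa flowers within each bin (each column sums to 1)
is_setosa <- iris$Species == "setosa"
setosa_share <- prop.table(table(is_setosa, binned), margin = 2)
print(round(setosa_share["TRUE", ], 2))
```

Because the intervals are right-closed, every flower should land in the same bin as with the manual `<=` assignments, while the printed proportions make the drop in setosa share across the higher bins explicit.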
The following features make the app very useful:
The app can be found at the following URL: https://jurrr.shinyapps.io/course-project
If you would like to know more about the source code (server.R and ui.R), see: https://github.com/jurrr/DevelopingDataProducts_CourseProject
Thanks for trying out this app! I hope it helps you in transforming your analysis data!