Developing Data Products: Coursera Project

Alexander Schniertshauer
2016-03-30

What does the app want to achieve?

Classification trees are an important, frequently used method in machine learning.

Changing the parameters used by a classification algorithm can strongly affect the resulting tree.

This Shiny app, developed for Coursera's course 'Developing Data Products', demonstrates how changing selected parameters affects the classification trees learned with the rpart package.

How does the app work?

To build the classification tree, the iris data set is split into a training set (80% of the data), which is used to learn the tree, and a test set (20% of the data), which is used to predict the species. The tree is built with R's rpart package.
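
The split-and-fit step described above could look roughly like the following. This is a minimal sketch, not the app's actual code: the seed, the simple random split, and the variable names (train_idx, train, test, fit, pred) are assumptions made for illustration, and the app may split differently (for example, stratified by species).

```r
library(rpart)  # recommended package, shipped with standard R installations

set.seed(42)  # assumed seed, only to make the sketch reproducible

# Draw 80% of the row indices for training; the remaining 20% form the test set.
n <- nrow(iris)
train_idx <- sample(n, size = round(0.8 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Learn a classification tree for Species from the four flower measurements.
fit <- rpart(Species ~ ., data = train, method = "class")

# Predict the species of the held-out flowers.
pred <- predict(fit, newdata = test, type = "class")
```

With the 150 rows of iris, this yields 120 training rows and 30 test rows.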

The user can vary two of the parameters using a slider:

  • Minimum number of observations per split node
  • Minimum number of observations per terminal (leaf) node
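
In rpart, these two settings most likely correspond to the minsplit and minbucket arguments of rpart.control(); that mapping is an assumption, since the app's source is not shown here. The sketch below uses a hypothetical helper, count_leaves(), to compare a permissive and a restrictive setting on the full iris data. The parameter values are chosen only for illustration.

```r
library(rpart)

# Hypothetical helper: fit a tree on iris with the given limits
# and return the number of terminal (leaf) nodes.
count_leaves <- function(minsplit, minbucket) {
  fit <- rpart(Species ~ ., data = iris, method = "class",
               control = rpart.control(minsplit = minsplit,
                                       minbucket = minbucket))
  # Leaf rows in the fitted tree's frame are marked "<leaf>".
  sum(fit$frame$var == "<leaf>")
}

leaves_loose  <- count_leaves(minsplit = 2,  minbucket = 1)   # permissive
leaves_strict <- count_leaves(minsplit = 60, minbucket = 30)  # restrictive

print(c(loose = leaves_loose, strict = leaves_strict))
```

Smaller minimums allow the tree to keep splitting small nodes, so the permissive setting should give a tree with at least as many leaves as the restrictive one.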

What will the user see?

The user will see the fitted tree and the confusion matrix of the predictions (like the one shown below).

            Reference
Prediction   setosa versicolor virginica
  setosa         10          0         0
  versicolor      0         10         1
  virginica       0          0         9
 Accuracy 
0.9666667 
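
The accuracy is the share of test observations on the diagonal of the confusion matrix, here 29 of 30. As a small worked example, the matrix above can be rebuilt by hand and the figure recomputed; the object names cm and accuracy are illustrative only.

```r
# Rebuild the confusion matrix shown above (rows: prediction, columns: reference).
species <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10,  0, 0,
                0, 10, 1,
                0,  0, 9),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = species, Reference = species))

# Accuracy = correctly classified observations / all observations.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)  # 0.9666667
```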

Where to find the app?