The ML Studio is an interactive platform for data visualization, statistical modeling and machine learning applications. Built on the Shiny and shinydashboard interface, with Plotly interactive data visualization, DT HTML tables and H2O machine learning and deep learning algorithms, the ML Studio provides a set of tools for the data science pipeline workflow.
The package can be installed with the devtools package (if the devtools package is not installed, please use install.packages("devtools") to install it).
# Install the MLstudio
devtools::install_github("RamiKrispin/MLstudio")
Please note: the H2O package may require additional Java add-ins (if not installed) and is therefore listed under the “Suggests” packages list of the MLstudio package (and not under the Imports or Depends list), so it won’t be installed automatically during the installation of the MLstudio package. More information about the installation of H2O can be found in the H2O documentation (under the “INSTALL IN R” tab).
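Because H2O is only a suggested dependency, it can help to check which packages are already installed before launching the app. The following is a minimal sketch, assuming the CRAN package names `devtools` and `h2o`; the helper `missing_packages` is a hypothetical name introduced here for illustration and is not part of MLstudio.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("devtools", "h2o"))
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # Uncomment to install from CRAN (h2o also needs a Java runtime on the machine)
  # install.packages(to_install)
}
```

The install call is left commented out so that the check itself never changes the local library.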
The app is called from R and opens in the default web browser (it runs best on Google Chrome). To open the app please use:
# Launch the MLstudio
library(MLstudio)
runML()
The ML Studio lets the user load (or remove), modify, visualize and analyze multiple datasets at the same time.
Under the “Data” tab there are two sub-tabs:
Load – a set of tools to load data into the platform (from the R environment, R datasets and/or a CSV file)
Prep – data prep tools:
There are three methods to load a dataset into the platform:
Loading a dataset from the R environment
Loading the diamonds dataset from ggplot2’s datasets
Loading Kaggle’s Titanic training set from a CSV file
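For orientation, the three sources correspond roughly to the following plain R operations. This is only a sketch of the equivalent steps outside the app, not the app's internal code; the object names and the file path `train.csv` are placeholders, and ggplot2 is assumed to be installed for the second step.

```r
# 1. A data frame that already lives in the R environment
my_cars <- datasets::mtcars

# 2. A dataset shipped with an installed package (here ggplot2's diamonds)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  diamonds <- ggplot2::diamonds
}

# 3. A CSV file; "train.csv" is a placeholder path for the Kaggle Titanic training file
csv_path <- "train.csv"
if (file.exists(csv_path)) {
  titanic <- read.csv(csv_path, stringsAsFactors = FALSE)
  str(titanic)
}
```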
The variable attributes can be seen in the middle table of the “Prep” tab, and a more in-depth summary is available in the variable summary box. The variable attributes option makes it possible to modify the variable attributes if needed. Below them is an interactive table whose fields can be sorted; a search option is also available.
Variable summary and attribute changes
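Outside the app, inspecting and changing a variable's attribute corresponds to checking and converting column classes in base R. A minimal sketch, using the built-in mtcars data as a stand-in for a loaded dataset:

```r
df <- datasets::mtcars

# Current class of every column (all numeric in mtcars)
sapply(df, class)

# Treat the number of cylinders as a categorical variable instead of a number
df$cyl <- as.factor(df$cyl)

# The summary now reports counts per level rather than quantiles
summary(df$cyl)
```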
A data summary function is available on the “Prep” tab under the “Select Option” dropdown menu. This is a dplyr-based function that summarizes the data by a specific group. Currently the summary categories are: count, mean, sd, max and mean.
diamonds dataset - price summary by cut
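A grouped summary of this kind can be reproduced directly with dplyr. The sketch below is an approximation of what the summary option computes, not the package's own function; it assumes dplyr and ggplot2 are installed, and the output column names are chosen here for illustration.

```r
library(dplyr)

# Summarize diamond prices within each level of `cut`
price_by_cut <- ggplot2::diamonds %>%
  group_by(cut) %>%
  summarise(
    count = n(),
    mean  = mean(price),
    sd    = sd(price),
    max   = max(price)
  )

price_by_cut
```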
Utilizing Plotly interactive data visualization tools along with the Shiny engine, the ML Studio provides the user with an effective tool for data exploration. The “Visualization” tab provides key functionality:
Visualization of the diamonds and iris datasets
Visualization of the AirPassengers dataset
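Comparable interactive charts can be built directly with the plotly package. The following sketch assumes plotly is installed and uses the built-in iris and AirPassengers datasets; the chart types and variable choices are illustrative and not necessarily the ones the app selects.

```r
library(plotly)

# Scatter plot of two numeric variables, colored by a categorical one
iris_plot <- plot_ly(
  data = iris, x = ~Sepal.Length, y = ~Sepal.Width, color = ~Species,
  type = "scatter", mode = "markers"
)

# Line plot of a time series; time() gives the fractional year of each observation
air_plot <- plot_ly(
  x = as.numeric(time(AirPassengers)), y = as.numeric(AirPassengers),
  type = "scatter", mode = "lines"
)

iris_plot
air_plot
```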
The model applications of the ML Studio are still under development; currently four classification models from the H2O package are available (Deep Learning, GBM, GLM and Random Forest).
Classification model for the GermanCredit dataset
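For readers who want to see what such a model looks like in code, here is a rough sketch of training one of the four model types (GBM) with the h2o package. It assumes h2o (with a working Java runtime) and caret are installed, that the GermanCredit data comes from caret with the target column `Class`, and that the split ratio, seed and `ntrees` values are arbitrary choices; the function name `fit_credit_gbm` is hypothetical, and the app's own settings may differ.

```r
# Sketch only: trains an H2O GBM on caret's GermanCredit data
# and returns the AUC on a held-out split.
fit_credit_gbm <- function(seed = 1234) {
  # Load the data into this function's environment
  data("GermanCredit", package = "caret", envir = environment())

  h2o::h2o.init()                        # start or connect to a local H2O cluster
  credit <- h2o::as.h2o(GermanCredit)    # factor columns become H2O enums

  # 80/20 train/test split
  splits <- h2o::h2o.splitFrame(credit, ratios = 0.8, seed = seed)

  model <- h2o::h2o.gbm(
    x = setdiff(names(GermanCredit), "Class"),
    y = "Class",
    training_frame = splits[[1]],
    ntrees = 50,
    seed = seed
  )

  perf <- h2o::h2o.performance(model, newdata = splits[[2]])
  h2o::h2o.auc(perf)
}

# Uncomment to run (starts a local H2O cluster):
# auc <- fit_credit_gbm()
```

Swapping `h2o.gbm` for `h2o.randomForest`, `h2o.glm` (with `family = "binomial"`) or `h2o.deeplearning` gives the other three model types named above.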
Features under development: