Article Summary

Yufeng Guo’s The 7 Steps of Machine Learning, published on August 31th, 2017, highlights the key steps to be taken to complete a successful project in Machine Learning. In order to discuss these steps with a practical application, Yufeng pretends that he has been asked to create a classification system that predicts if a drink is either wine or beer.

He also covers the material in this article in the linked YouTube video

wine_beer

The seven key steps are listed below:

1. Gathering Data

First, extract and/or find a data set to run a Machine Learning model on. The training data to be used includes three fields, the color of the drink, the Alcohol by Volume %, and the labeling. An example is displayed below:

Color ABV Drink.Type
610 5 Beer
599 13 Wine
693 14 Wine

2. Data Preparation

With the training set acquired, we need to prepare it for training. This includes randomizing the order, conducting exploratory data analysis, and splitting the data into a train and test set.

3. Model Selection

There are many Machine Learning models to chose from, yet there are typically a few that are best suited for your data set. Some are best for image or sound data, while others are best for numerical data. Yufeng chose to use a simple linear model.

4. Model Training

This is the biggest step of Machine Learning, in which we use our data to continually improve the model’s ability to classify a drink as wine or beer. Behind the scenes, the model will incrementally adjust weights and biases that are associated with the specific inputs, until it settles on a combination that best predicts the drink type.

5. Evaluation

With the model trained, we can then use the test set to evaluate the data set on data it has yet to see. Without splitting the data into these two sets, we risk the model over-fitting or not being able to generalize.

6. Hyperparameter Tuning

As the final step before prediction, we attempt to incrementally adjust the parameters of our model in order to improve its performance. In this example, there are not many different parameters… yet examples include the number of times we run the model through the train set, the learning rate, etc.

7. Prediction

The final step will be to use the simple linear regression model to predict whether a drink is wine or beer

prediction

Article Viewing Statistics

Article.Claps Comments
2800 6

Author Information

Yufeng is a frequent author on Towards Data Science with dozens of data science focused articles. He currently has 7.4k followers on the website. His bio highlights that he is a developer and advocate of Google Cloud, and he enjoys running, music, and cooking.

What’s My Take?

Overall, I think the author did a good job giving a high-level overview of the Machine Learning process. Machine Learning is a very important tool for the future and this serves as great introduction

That being said, I think the reader could benefit from seeing the process implemented with code alongside the general descriptions. Furthermore, I think some of the steps may have been a little oversimplified and could be misleading.

Random Plots

Iris Sepal Length x Width

plot(iris$Sepal.Length,iris$Sepal.Width)

mtcars MPG x Horsepower

mtcars Data Set