Yufeng Guo’s The 7 Steps of Machine Learning, published on August 31th, 2017, highlights the key steps to be taken to complete a successful project in Machine Learning. In order to discuss these steps with a practical application, Yufeng pretends that he has been asked to create a classification system that predicts if a drink is either wine or beer.
He also covers the material in this article in the linked YouTube video
The seven key steps are listed below:
First, extract and/or find a data set to run a Machine Learning model on. The training data to be used includes three fields, the color of the drink, the Alcohol by Volume %, and the labeling. An example is displayed below:
| Color | ABV | Drink.Type |
|---|---|---|
| 610 | 5 | Beer |
| 599 | 13 | Wine |
| 693 | 14 | Wine |
With the training set acquired, we need to prepare it for training. This includes randomizing the order, conducting exploratory data analysis, and splitting the data into a train and test set.
There are many Machine Learning models to chose from, yet there are typically a few that are best suited for your data set. Some are best for image or sound data, while others are best for numerical data. Yufeng chose to use a simple linear model.
This is the biggest step of Machine Learning, in which we use our data to continually improve the model’s ability to classify a drink as wine or beer. Behind the scenes, the model will incrementally adjust weights and biases that are associated with the specific inputs, until it settles on a combination that best predicts the drink type.
With the model trained, we can then use the test set to evaluate the data set on data it has yet to see. Without splitting the data into these two sets, we risk the model over-fitting or not being able to generalize.
As the final step before prediction, we attempt to incrementally adjust the parameters of our model in order to improve its performance. In this example, there are not many different parameters… yet examples include the number of times we run the model through the train set, the learning rate, etc.
The final step will be to use the simple linear regression model to predict whether a drink is wine or beer
| Article.Claps | Comments |
|---|---|
| 2800 | 6 |
Overall, I think the author did a good job giving a high-level overview of the Machine Learning process. Machine Learning is a very important tool for the future and this serves as great introduction
That being said, I think the reader could benefit from seeing the process implemented with code alongside the general descriptions. Furthermore, I think some of the steps may have been a little oversimplified and could be misleading.
plot(iris$Sepal.Length,iris$Sepal.Width)
Social Media Links