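We will learn machine learning from the basics. To begin with, what is machine learning?
The following explanation describes it as a set of generic algorithms that can be applied without being tied to any particular data or outcome.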
Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data. (Geitgey 2014)
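The same source gives an easy-to-follow example.
Suppose there is a generic machine learning algorithm that reads handwritten digits. That same algorithm can also sort email into spam and non-spam. The algorithm itself does not change; only the training data differs, and that is why it produces a different classification.
The two diagrams below illustrate this.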
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
# Global node styles; individual nodes can override them
node [shape = rectangle, style = filled, fillcolor = Linen]
data1 [label = 'Images of \n Hand-written Numbers', shape = folder, fillcolor = Beige]
process [label = 'Generic \n Machine Learning \n Algorithm']
statistical1 [label = '1']
statistical2 [label = '2']
statistical3 [label = '3']
statistical4 [label = 'etc...']
# edge definitions with the node IDs
{data1} -> process -> statistical1
process -> statistical2
process -> statistical3
process -> statistical4
}")
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
# Global node styles; individual nodes can override them
node [shape = rectangle, style = filled, fillcolor = Linen]
data1 [label = 'Email', shape = folder, fillcolor = Beige]
process [label = 'Generic \n Machine Learning \n Algorithm']
statistical1 [label = 'Spam']
statistical2 [label = 'Not Spam']
# edge definitions with the node IDs
{data1} -> process -> statistical1
process -> statistical2
}")
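As a rough illustration of the idea in the diagrams above, the sketch below trains one generic routine on two unrelated toy datasets. The nearest-centroid classifier, the function names `train_generic` and `predict_generic`, and every feature value are assumptions made for illustration only; they are not taken from Geitgey (2014), and real digit or spam classifiers would use far richer features.

```r
# A generic nearest-centroid classifier (an illustrative choice, not the
# source's method). Nothing in these two functions knows about digits or email.
train_generic <- function(features, labels) {
  groups <- split(as.data.frame(features), as.character(labels))
  # One centroid (mean feature vector) per class, one row per class
  list(centroids = do.call(rbind, lapply(groups, colMeans)))
}

predict_generic <- function(model, features) {
  features <- as.matrix(features)
  unname(apply(features, 1, function(x) {
    # Squared Euclidean distance from x to every class centroid
    d <- rowSums(sweep(model$centroids, 2, x)^2)
    names(which.min(d))
  }))
}

# Toy training set 1 (hypothetical features: number of closed loops,
# fraction of ink in the upper half of the image)
digit_x <- rbind(c(0, 0.9), c(0, 0.8), c(1, 0.5), c(1, 0.6), c(2, 0.5), c(2, 0.4))
digit_y <- c("1", "1", "0", "0", "8", "8")

# Toy training set 2 (hypothetical features: count of the word "free",
# number of links in the message)
email_x <- rbind(c(5, 8), c(4, 6), c(0, 1), c(1, 0))
email_y <- c("spam", "spam", "not spam", "not spam")

# Same algorithm, different training data, different classifier
digit_model <- train_generic(digit_x, digit_y)
email_model <- train_generic(email_x, email_y)

digit_pred <- predict_generic(digit_model, rbind(c(2, 0.5), c(0, 0.7)))
email_pred <- predict_generic(email_model, rbind(c(6, 7), c(0, 2)))

print(digit_pred)  # "8" "1"
print(email_pred)  # "spam" "not spam"
```

The only difference between `digit_model` and `email_model` is the data passed to `train_generic`; the code path is identical, which is the point the two diagrams make.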