In this article a semi-supervised classification algorithm implementation will be described using Markov Chains and Random Walks. We have the following 2D circles dataset (with 1000 points) with only 2 points labeled (as shown in the figure, colored red and blue respectively, for all others the labels are unknown, indicated by the color black).
Now the task is to predict the labels of the other (unlabeled) points. From each of the unlabeled points (states) a random walk with Markov transition matrix (computed from the row-stochastic kernelized distance matrix) will be started that will end in one labelled state, which will be an absorbing state in the Markov Chain.
This problem was discussed as an application of Markov Chain in a lecture from the edX course ColumbiaX: CSMM.102x Machine Learning. The following theory is taken straightway from the slides of the same course:
Since the Markov chain does not satisfy the connectedness condition of the Perron-Frobenius theorem, it will not have a single stationary distribution, instead the random walk will end in exactly one of the terminal (absorbing states) and we can directly compute (using the above formula, or use power-iteration method for convergence, which will be used in this implementation) the probability on which absorbing state it’s more likely to terminate starting from any unlabelled point and accordingly label the point with the class of the most probable absorbing state in the Markov Chain.
The following animation shows how the algorithm converges to an absorbing state for each unlabeled point and gets labeled correctly. The bandwidth for the Gaussian Kernel used was taken to be 0.01. Off course the result varies with bandwidth. For each datapoint 500 iterations were used to obtain the convergence to one of the absorbing states, as shown. Each time a new unlabeled point is selected and a random walk is started with the Markov transition matrix and power-iteration is continued until it terminates to one of the absorbing states with high probabiliy. Since two absorbing states are there, finally the point will be labeled with the label of the absorbing state where the random walk is likely to terminate with higher probability.