JHU-Data Science Capstone: NGram Model for word prediction

LIN, WEI-YU
2019-08-04

Background

This is the capstone project of the Data Science Specialization, offered by Johns Hopkins University on Coursera: https://www.coursera.org/specializations/jhu-data-science

The goal of this project is to build an N-gram model on the corpus provided by SwiftKey that predicts the next word from the previous n words.

1-gram illustration

[Plot: 1gram]

Predicts the next word from a single input word.

2-gram illustration

[Plot: 2gram]

Predicts the next word from a two-word input.

Algorithm

Back-off method: when a sentence is entered, its last two words are extracted and used as the input to the 2-gram prediction. If that yields no suitable result, the model backs off to the last word alone and runs the 1-gram prediction.
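
The back-off lookup described above can be sketched as follows. This is a minimal illustration in Python, not the project's actual code: the function names `build_tables` and `predict_next`, the whitespace tokenizer, the toy corpus, and the "pick the most frequent follower" rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_tables(sentences):
    """Count which word follows each one-word and two-word context."""
    one_word = defaultdict(Counter)  # (w1,)    -> counts of the next word
    two_word = defaultdict(Counter)  # (w1, w2) -> counts of the next word
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumed: naive whitespace tokenizer
        for i in range(1, len(tokens)):
            one_word[(tokens[i - 1],)][tokens[i]] += 1
            if i >= 2:
                two_word[(tokens[i - 2], tokens[i - 1])][tokens[i]] += 1
    return one_word, two_word


def predict_next(text, one_word, two_word):
    """Try the last two words first, then back off to the last word."""
    tokens = text.lower().split()
    if len(tokens) >= 2:
        candidates = two_word.get(tuple(tokens[-2:]))
        if candidates:
            return candidates.most_common(1)[0][0]
    if tokens:
        candidates = one_word.get((tokens[-1],))
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # no context matched at either level


# Hypothetical toy corpus, standing in for the SwiftKey data.
corpus = [
    "i love data science",
    "i love data analysis",
    "i love data science too",
    "we study data mining",
]
one_word, two_word = build_tables(corpus)

print(predict_next("i love data", one_word, two_word))  # two-word context matches
print(predict_next("big love", one_word, two_word))     # backs off to "love"
```

In this toy corpus the first call is answered by the two-word table, while the second finds no entry for "big love" and falls back to the one-word table. A real model would also need proper tokenization and a strategy for inputs that match neither table.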

Appendix