Capstone Project - Word Predictor

Verence
01 February 2018

Predicts the next word you will type

Situation

The future depends strongly on the past.
This also holds for natural text: the next word written depends on the words written before it.

Idea: create a model that predicts the next word.
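Written as a formula, this idea is the standard N-gram (Markov) approximation: with a look-back of up to three words, the probability of the next word is approximated from 4-gram counts C(·), for example by maximum likelihood. The presentation does not state which estimator or smoothing was used, so the right-hand side below is an illustrative assumption.

```latex
P(w_n \mid w_1, \dots, w_{n-1}) \approx P(w_n \mid w_{n-3}, w_{n-2}, w_{n-1})
  \approx \frac{C(w_{n-3}\, w_{n-2}\, w_{n-1}\, w_n)}{C(w_{n-3}\, w_{n-2}\, w_{n-1})}
```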

The Model

  • Looks back up to three words into the “past” to predict the next word.

  • To this end, the model was trained on publicly available text from Twitter, blogs and news.

  • After cleaning the training data, the most common uni-, bi-, tri- and four-grams were stored in dictionaries organized separately by N-gram order.

  • To save storage space, only the hash value of each N-gram was stored in the dictionary, not the N-gram itself.
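A minimal Python sketch of how such hashed dictionaries could be built. It assumes a very simple cleaning step (lower-casing, keeping only letters and apostrophes), CRC32 (`zlib.crc32`) as the hash function, and one table per N-gram order that maps the hash of the preceding words to the most frequent next words. The presentation does not say which hash, cleaning rules or table layout were actually used; `ngram_hash`, `clean` and `build_tables` are hypothetical names.

```python
import re
import zlib
from collections import Counter, defaultdict


def ngram_hash(words):
    """Stable 32-bit hash of a word sequence (CRC32 of the space-joined words).

    Python's built-in hash() is salted per process, so it is unsuitable for a
    dictionary that is saved and reloaded later.
    """
    return zlib.crc32(" ".join(words).encode("utf-8"))


def clean(text):
    """Simplified cleaning: lower-case and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_tables(texts, max_order=4, top_n=3):
    """Build one table per N-gram order (assumed layout).

    Each table maps hash(context) -> the top_n most frequent next words, where
    the context is the preceding (order - 1) words. Order 1 has an empty context
    and simply holds the most frequent words overall.
    """
    counts = [defaultdict(Counter) for _ in range(max_order + 1)]
    for text in texts:
        tokens = clean(text)
        for i, word in enumerate(tokens):
            for order in range(1, max_order + 1):
                if i - order + 1 < 0:
                    break  # not enough preceding words for this order
                context = tokens[i - order + 1:i]
                counts[order][ngram_hash(context)][word] += 1
    # Keep only the hash of each context plus its most frequent followers.
    return {
        order: {
            key: [w for w, _ in counter.most_common(top_n)]
            for key, counter in counts[order].items()
        }
        for order in range(1, max_order + 1)
    }
```

Only the context is hashed here, so a 32-bit key replaces up to three stored words; the price is a small chance of hash collisions between different contexts.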

Demo application

The number of suggested words and the N-gram orders used are configurable.
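The two settings can be sketched as parameters of a simple back-off lookup: `k` (number of suggestions) and `max_order` (longest N-gram order used). The sketch tries the longest available context first and fills any remaining slots from shorter contexts. It assumes the hypothetical layout "hash of context words → most frequent next words"; the presentation does not name its back-off scheme, so this is an illustration only, and `predict` and the toy tables are made up.

```python
import zlib


def ngram_hash(words):
    """Stable 32-bit hash of a word sequence (CRC32 of the space-joined words)."""
    return zlib.crc32(" ".join(words).encode("utf-8"))


def predict(tables, history, k=3, max_order=4):
    """Suggest up to k next words, backing off from max_order down to unigrams."""
    if k <= 0:
        return []
    words = history.lower().split()
    suggestions = []
    for order in range(max_order, 0, -1):
        # The context for an order-n lookup is the last (n - 1) words.
        context = words[-(order - 1):] if order > 1 else []
        if len(context) < order - 1:
            continue  # not enough history for this order
        for word in tables.get(order, {}).get(ngram_hash(context), []):
            if word not in suggestions:
                suggestions.append(word)
                if len(suggestions) == k:
                    return suggestions
    return suggestions


# Toy hand-written tables for illustration.
tables = {
    1: {ngram_hash([]): ["the", "to", "and"]},
    2: {ngram_hash(["the"]): ["cat", "dog"]},
    3: {ngram_hash(["feed", "the"]): ["cat"]},
}
```

With these tables, `predict(tables, "feed the", k=3)` returns `["cat", "dog", "the"]`, while `max_order=1` restricts the suggestions to the overall most frequent words.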

Conclusion

Model characteristics

  • Accuracy of 23% that the next word is among the first 10 suggested words
  • Model contains 118,917 N-grams stored in 573 kB
  • 1.08 ms response time per prediction
  • It is not perfect, but an appropriate basis.
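A sketch of how the accuracy and response-time figures could be measured. It assumes a hit is counted whenever the true next word appears among the first `k` suggestions, that the predictor sees at most the last three words, and that `predict_fn` is any callable taking `(history, k)`. The test set and protocol behind the 23% and 1.08 ms figures are not described, and `evaluate` is a hypothetical helper.

```python
import time
from typing import Callable, Iterable, List, Tuple


def evaluate(
    predict_fn: Callable[[str, int], List[str]],
    sentences: Iterable[str],
    k: int = 10,
    history: int = 3,
) -> Tuple[float, float]:
    """Return (top-k accuracy, mean milliseconds per prediction)."""
    hits = 0
    total = 0
    elapsed = 0.0
    for sentence in sentences:
        words = sentence.lower().split()
        # Predict every word after the first from up to `history` previous words.
        for i in range(1, len(words)):
            context = " ".join(words[max(0, i - history):i])
            start = time.perf_counter()
            suggestions = predict_fn(context, k)
            elapsed += time.perf_counter() - start
            hits += words[i] in suggestions[:k]
            total += 1
    if total == 0:
        return 0.0, 0.0
    return hits / total, 1000.0 * elapsed / total
```

For example, a dummy predictor that always suggests `["the", "a"]` scores 0.25 on the sentences "I saw the cat" and "the dog": it hits only "the" out of four predicted positions.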

Test it!

The Shiny demo application is available here.