Capstone Project - Word Predictor

Verence
01. February 2018

Predict the next word you will type

Situation

The future depends strongly on the past.
That also holds for natural text: the next written word depends on the previously written words.

Idea: Create a model to predict the next word.

The Model

  • Looks at up to three words in the “past” to predict the next word.
  • For this, the model was trained on publicly available text posted on Twitter, blogs and news, with a spelling correctness of at least 95%.
  • The most common uni/two/three/four-Grams (frequency greater than \( 2 \times 10^{-6} \)) were put in four dictionaries.
  • To save storage space, only the hash value of each N-Gram was stored in the dictionaries, not the N-Gram itself.

  • Prediction: Look first in the four-Gram, then the three-Gram, then the two-Gram and finally the uni-Gram dictionary until the 10 most likely words are found.
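
The lookup order described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code behind the demo application: the hash function (CRC32), the choice to hash the preceding words and keep the candidate next words in plain text, and the names `key`, `train` and `predict` are all hypothetical. The frequency threshold is left out to keep the sketch short.

```python
import zlib
from collections import Counter, defaultdict

MAX_ORDER = 4  # four-Grams: up to three preceding words as context


def key(words):
    """Hash a list of words to a 32-bit integer (CRC32 is an assumption)."""
    return zlib.crc32(" ".join(words).encode("utf-8"))


def train(sentences):
    """Build one dictionary per N-Gram order: hashed context -> next-word counts."""
    tables = {n: defaultdict(Counter) for n in range(1, MAX_ORDER + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, MAX_ORDER + 1):
            for i in range(len(tokens) - n + 1):
                # The last word of the N-Gram is the prediction target;
                # for uni-Grams the context is empty.
                *context, word = tokens[i:i + n]
                tables[n][key(context)][word] += 1
    return tables


def predict(tables, text, k=10):
    """Back off from four-Grams to uni-Grams until k suggestions are found."""
    tokens = text.lower().split()
    suggestions = []
    for n in range(MAX_ORDER, 0, -1):
        if len(tokens) < n - 1:
            continue  # not enough typed words for this order
        context = tokens[len(tokens) - (n - 1):] if n > 1 else []
        candidates = tables[n].get(key(context), Counter())
        for word, _count in candidates.most_common():
            if word not in suggestions:
                suggestions.append(word)
            if len(suggestions) == k:
                return suggestions
    return suggestions


tables = train([
    "i love new york",
    "i love new shoes",
    "i love new york city",
    "we love old books",
])
print(predict(tables, "i love new", k=2))
```

Higher-order dictionaries are consulted first because a longer matching context is usually more specific; the uni-Gram dictionary acts as a fallback that always has suggestions.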

Demo application

The number of suggested words and the N-Grams used are configurable.

Conclusion

Model characteristics

  • Accuracy of 23%: the next word is among the first 10 suggested words.
  • Model contains 118,917 N-Grams stored in 573 kB.
  • 1.08 ms response time per prediction.
  • It is not perfect but an appropriate basis.
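
The accuracy figure can be read as a top-10 hit rate: the share of test cases in which the true next word appears among the first 10 suggestions. The sketch below shows one way to compute such a number. The evaluation procedure actually used is not described here, so the function name `top_k_accuracy`, the stand-in predictor and the sample data are all hypothetical.

```python
def top_k_accuracy(predict_fn, samples, k=10):
    """Share of samples whose true next word is among the first k suggestions."""
    hits = sum(1 for context, target in samples if target in predict_fn(context)[:k])
    return hits / len(samples)


def toy_predict(context):
    # Hypothetical stand-in for the real model: always suggests the same words.
    return ["the", "to", "a"]


# Each sample pairs the typed text with the word that actually followed it.
samples = [("i want", "to"), ("this is", "a"), ("see you", "soon"), ("on", "the")]
accuracy = top_k_accuracy(toy_predict, samples, k=10)
print(f"{accuracy:.0%}")
```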

Test it!

The Shiny demo application is available here.