Krzysztof Piotr Malinowski
2019 JAN 13
The application was written as a Capstone project for the Data Science Specialization on Coursera.
After the user types a sentence, the application predicts the next word using an n-gram model.
The models were trained on English-language text from Twitter, blogs and news: unigrams, bigrams, trigrams and fourgrams were built from this data.
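The document does not show how the n-grams were built or in which language the application was written. As a hypothetical illustration only, the Python sketch below counts unigrams through fourgrams in a toy corpus. The names `tokenize` and `build_ngrams` are made up for this example, and splitting on whitespace after lowercasing is an assumed, deliberately simple preprocessing step.

```python
from collections import Counter

def tokenize(text):
    # Assumed minimal preprocessing: lowercase and split on whitespace
    return text.lower().split()

def build_ngrams(sentences, n):
    """Count every n-gram of order n (stored as a tuple of words)."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        # Slide a window of n tokens across the sentence
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy corpus standing in for the Twitter, blog and news data
corpus = ["I love new york", "I love data science", "new york is big"]
tables = {n: build_ngrams(corpus, n) for n in range(1, 5)}  # 1 = unigrams ... 4 = fourgrams
```

Here `tables[2][("new", "york")]` is 2, because the bigram occurs in two of the three sentences.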
The algorithm takes the sentence provided by the user and returns the word most likely to come next. Whether unigrams, bigrams, trigrams or fourgrams are used is selected automatically, based on the number of words in the user's sentence.
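One way to choose the n-gram order from the sentence length is sketched below in Python. It is a hypothetical sketch, not the author's code. It assumes that the highest usable order is one more than the number of input words (capped at fourgrams), and it assumes a simple backoff to lower orders when the context was never seen. Neither detail is stated in the original. `count_ngrams` and `predict_next` are illustrative names.

```python
from collections import Counter

def count_ngrams(sentences, n):
    """Count n-grams of order n (word tuples) in a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def predict_next(sentence, tables, max_order=4):
    """Predict the next word, picking the n-gram order from the sentence length.

    Starts at the highest order the input allows (at most fourgrams) and
    backs off to lower orders when the context is unseen (assumed strategy).
    """
    words = sentence.lower().split()
    start = min(len(words) + 1, max_order)
    for n in range(start, 1, -1):
        context = tuple(words[-(n - 1):])  # last n-1 words of the input
        candidates = {gram[-1]: count for gram, count in tables[n].items()
                      if gram[:-1] == context}
        if candidates:
            return max(candidates, key=candidates.get)
    # No context matched at any order: fall back to the most frequent unigram
    return tables[1].most_common(1)[0][0][0]

corpus = ["i love new york", "i love new york city",
          "i love data science", "new york is big"]
tables = {n: count_ngrams(corpus, n) for n in range(1, 5)}
```

With this toy corpus, `predict_next("I love", tables)` uses trigrams and returns `"new"`. `predict_next("we visited new", tables)` finds no matching fourgram or trigram and backs off to bigrams, returning `"york"`.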
Although every effort was made to build the best application possible, a few improvements could be proposed.
More algorithms could be implemented, and a radio button could be added to the GUI so that the user can choose the model.
The algorithm could return the three most likely next words instead of only one.
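Returning several candidates instead of one is a small change on top of n-gram counts. The Python sketch below is hypothetical: `top_k_next` is an illustrative name, and it assumes the counts are stored as a `Counter` keyed by word tuples. It ranks every word that follows a given context and keeps the top `k` (three by default).

```python
from collections import Counter

def top_k_next(context_words, ngram_counts, k=3):
    """Return up to k most frequent words that follow the context tuple."""
    n = len(context_words) + 1  # order of the n-grams that extend this context
    followers = Counter()
    for gram, count in ngram_counts.items():
        if len(gram) == n and gram[:-1] == context_words:
            followers[gram[-1]] += count
    return [word for word, _ in followers.most_common(k)]

# Hypothetical trigram counts for illustration
trigrams = Counter({("i", "love", "new"): 5, ("i", "love", "data"): 3,
                    ("i", "love", "you"): 7, ("i", "love", "it"): 1})
```

For example, `top_k_next(("i", "love"), trigrams)` returns `["you", "new", "data"]`, and an unseen context returns an empty list. The GUI could then show all three.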
In this application, the n-grams were generated from a small subset of the complete dataset because of high memory usage. A production application should be based on the entire available dataset.