WordPredict

Krzysztof Piotr Malinowski
2019 JAN 13

Introduction

The application was written as the Capstone project for the Data Science Specialization on Coursera.

After the user types a sentence, the application predicts the next word using an n-gram model.

Models were trained on data from Twitter, blogs and news written in English.

Model

Unigrams, bigrams, trigrams and fourgrams were created based on data from Twitter, blogs and news written in English.
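As an illustration of how such n-gram tables can be built, here is a minimal sketch in Python. It is not the application's actual code: the whitespace tokenizer, the function names `tokenize` and `count_ngrams`, and the two-sentence corpus are assumptions made only for this example.

```python
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    # The original cleaning steps are not described in this document.
    return text.lower().split()


def count_ngrams(sentences, n):
    """Count every run of n consecutive words in the given sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy corpus standing in for the Twitter, blog and news data.
corpus = ["I love data science", "I love data"]

# One table per n-gram order: unigrams (1) through fourgrams (4).
tables = {n: count_ngrams(corpus, n) for n in range(1, 5)}
```

Each table maps a tuple of words to the number of times it occurs, so `tables[2][("i", "love")]` is the count of the bigram "i love" in the toy corpus.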

The algorithm takes the sentence provided by the user and returns the word that is most likely to come next. It automatically selects among unigrams, bigrams, trigrams and fourgrams based on the number of words the user provided in the sentence.
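The selection step can be sketched as follows. This is a hedged illustration in Python, not the application's implementation: the names `build_model` and `predict_next` and the toy corpus are hypothetical, and falling back to a shorter context when the longer one was never seen is an assumption, since the text only says that the n-gram order depends on the number of words typed.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=4):
    # model[k] maps a k-word context (a tuple) to a Counter of next words.
    # k = 0 is the unigram table, k = 3 is the fourgram table.
    model = {k: defaultdict(Counter) for k in range(max_n)}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for k in range(min(i, max_n - 1) + 1):
                context = tuple(tokens[i - k:i])
                model[k][context][word] += 1
    return model


def predict_next(model, sentence, max_n=4):
    tokens = sentence.lower().split()
    # Use at most the last (max_n - 1) words the user typed.
    longest = min(len(tokens), max_n - 1)
    # Assumption: if the longest context is unseen, try shorter ones.
    for k in range(longest, -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        followers = model[k].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # Only reached when the model is empty.


# Toy corpus standing in for the real training data.
corpus = [
    "i love data science",
    "i love data analysis",
    "i love data science",
    "you love coffee",
]
model = build_model(corpus)
prediction = predict_next(model, "I love data")
```

With three typed words the fourgram table is consulted first; with one typed word only the bigram and unigram tables can apply.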

How to use

Go to https://krzysini.shinyapps.io/WordPredict/
Enter a sentence
Click Submit
Wait a few seconds for the prediction to appear

Potential improvements

Although every effort was made to build the best application possible, a few improvements could be proposed.

More algorithms could be implemented, and a radio button could be added to the GUI so the user can choose the model.
The algorithm could return the three most likely words instead of one.
In this application, n-grams were generated from a small subset of the complete dataset because of high memory usage. A production application should be based on the entire available dataset.
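The suggestion of returning three candidate words could look roughly like the sketch below. It assumes the counts of words that follow a given context are already available as a Python `Counter`; the helper name `top_k_next_words` and the example counts are hypothetical.

```python
from collections import Counter


def top_k_next_words(followers, k=3):
    """Return up to k candidate next words, most frequent first."""
    return [word for word, _ in followers.most_common(k)]


# Made-up counts of words observed after some context.
followers = Counter({"science": 5, "analysis": 3, "sets": 2, "mining": 1})
suggestions = top_k_next_words(followers)
```

If fewer than three words were observed after the context, the list is simply shorter.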