Using natural language processing to predict the next word in a phrase or sentence
17 July 2016
The goal of this app is to synthesize the information in a database of sample texts (i.e. messages from Twitter, blog posts, and news posts) and come up with a suggestion for the next suitable word a user could type in. Some limitations can be pointed out:
The data used for this app is supplied by SwiftKey and can be found here: Dataset SwiftKey. It consists of three different sources of data, which provide a wider variety of word combinations. Some 'cleaning' and adjusting of the data has been performed, i.e.
The basic idea is the Markov assumption. It states that the next word is mostly determined by the previous one, two, or three (or more) words, rather than by the whole sentence. So this app simply looks for matches, first in the fourgrams; if it does not find any, it looks at the trigrams, and so on down to the unigrams, at which point the prediction is poor because it no longer uses any context.
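The lookup order described above (fourgrams, then trigrams, then bigrams, then unigrams) can be illustrated with a minimal Python sketch. This is not the app's actual code: the function names `build_ngram_tables` and `predict_next`, the whitespace tokenization, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_ngram_tables(tokens, max_order=4):
    """For each order n, map a context (the n-1 preceding words) to next-word counts."""
    tables = {n: defaultdict(Counter) for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])  # empty tuple for unigrams
            next_word = tokens[i + n - 1]
            tables[n][context][next_word] += 1
    return tables


def predict_next(tables, phrase, max_order=4):
    """Return (word, order): the most frequent continuation and the n-gram order that matched."""
    words = phrase.lower().split()  # hypothetical, very simple tokenization
    for n in range(max_order, 0, -1):  # try fourgrams first, then back off
        context_len = n - 1
        if context_len > len(words):
            continue  # the phrase is too short for this order
        context = tuple(words[len(words) - context_len:])
        candidates = tables[n].get(context)
        if candidates:
            return candidates.most_common(1)[0][0], n
    return None, 0  # nothing matched (only possible with an empty corpus)


# Toy corpus, assumed for illustration only
corpus = "i want to eat i want to sleep i want to eat".split()
tables = build_ngram_tables(corpus)
print(predict_next(tables, "I want to"))   # ('eat', 4): matched a fourgram
print(predict_next(tables, "we want to"))  # ('eat', 3): backed off to a trigram
```

The second value in the result shows how far the lookup had to back off. A result of order 1 means no context matched at all, so the suggestion is just the most frequent word in the corpus.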
The app can be found at this link. Some special features:
The code for the app can be found here.
Some useful links to the absolutely amazing materials of
the website of Prof. Dan Jurafsky
Specifically, the slides of the course Natural Language Processing can be found here.