Yuliang Wang
Next word prediction is widely used on mobile devices and in search engines, and has made our lives a lot easier.
This presentation will describe an application for predicting the next word:
https://yuliangwang.shinyapps.io/shiny/.
The objective of this project is to build a Shiny application to predict the next word given user input phrases.
The project is divided into several sub-tasks, including exploratory analysis, model building and refinement, and Shiny app development.
HC Corpora is the basis of all n-gram calculations. Key R packages used include quanteda, data.table and tm.
Due to hardware limitations, a randomly sampled 50% of the HC Corpora data was used. The dfm function in quanteda was used to convert text to lower case, remove punctuation and numbers, and perform other cleanup.
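As an illustration of the cleanup steps listed above, here is a minimal sketch in Python. It is not the quanteda code used in the project: the function name clean_tokens and the choice to keep only ASCII letters and apostrophes are assumptions made for this example.

```python
import re

def clean_tokens(text):
    """Lower-case the text, drop digits and punctuation, split on whitespace.

    Hypothetical helper: keeps only a-z and apostrophes, which is an
    assumption of this sketch, not necessarily what quanteda does.
    """
    text = text.lower()
    text = re.sub(r"[0-9]+", " ", text)     # remove numbers
    text = re.sub(r"[^a-z'\s]", " ", text)  # remove punctuation and other symbols
    return text.split()

print(clean_tokens("I am 25, and GOOD at learning!"))
```

The resulting token list is the input from which n-grams are counted.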
According to Wikipedia: “In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.”
For example, “I am” is a bigram, “good at learning” is a trigram, etc.
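The definition can be made concrete with a short Python sketch that slides a window of n tokens over a sentence. The helper names ngrams and count_ngrams are hypothetical and chosen for this example only.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(tokens, n):
    """Count how often each n-gram occurs in the token list."""
    return Counter(ngrams(tokens, n))

tokens = "i am good at learning".split()
print(ngrams(tokens, 2))  # bigrams, starting with ('i', 'am')
print(ngrams(tokens, 3))  # trigrams, ending with ('good', 'at', 'learning')
```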
The modified interpolated Kneser-Ney method, a popular and accurate smoothing method, is used to calculate probabilities for all 2-, 3- and 4-grams.
The Kneser-Ney method provides better probability estimates for lower-order n-grams by introducing continuation probabilities, i.e., how likely a unigram is to complete a bigram, a bigram is to complete a trigram, etc.
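To show what a continuation probability measures, the Python sketch below computes it for unigrams: the number of distinct words a word follows, divided by the number of distinct bigram types. This is only the continuation term, not the full modified interpolated Kneser-Ney calculation, and the toy counts are invented for illustration.

```python
from collections import Counter

def continuation_probability(bigram_counts):
    """Map each word to (# distinct words it follows) / (# distinct bigram types).

    bigram_counts is a dict keyed by (previous_word, word) tuples; the raw
    frequencies are deliberately ignored, only distinct types matter.
    """
    total_types = len(bigram_counts)
    distinct_left_contexts = Counter(word for (_, word) in bigram_counts)
    return {w: k / total_types for w, k in distinct_left_contexts.items()}

# Toy counts (invented): "francisco" is frequent but only ever follows "san".
toy = {
    ("san", "francisco"): 10,
    ("reading", "glasses"): 1,
    ("my", "glasses"): 1,
    ("new", "glasses"): 1,
}
print(continuation_probability(toy))
```

In this toy data "francisco" occurs more often than "glasses", yet "glasses" gets the higher continuation probability because it completes more distinct bigrams.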
The application first checks whether any 4-gram begins with the last 3 input words; if so, it returns the 3 most probable next words. If not, it backs off recursively to 3-grams and then 2-grams.
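The lookup order described above can be sketched in Python as follows. The model layout (a dict from a context tuple to a dict of next-word probabilities) and the function name predict_next are assumptions for this example; the app's actual data structures may differ.

```python
def predict_next(words, model, k=3):
    """Return up to k most probable next words, backing off to shorter contexts.

    model is assumed to map a context tuple of 1 to 3 words to a
    {next_word: probability} dict.
    """
    context = tuple(words[-3:])      # a 4-gram model uses at most 3 context words
    while context:
        candidates = model.get(context)
        if candidates:
            ranked = sorted(candidates, key=candidates.get, reverse=True)
            return ranked[:k]
        context = context[1:]        # drop the oldest word and try a lower order
    return []                        # no n-gram matched at any order

# Toy model with invented probabilities.
toy_model = {
    ("good", "at"): {"learning": 0.5, "it": 0.3, "all": 0.1, "math": 0.05},
    ("at",): {"the": 0.4},
}
print(predict_next(["i", "am", "good", "at"], toy_model))
```

Here no 4-gram starts with "am good at", so the lookup falls back to the 3-grams starting with "good at".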
The user interface is very simple: the user enters a phrase, hits the “GO” button, and the 3 most probable next words are returned.
Since the pre-calculated model handles up to 4-grams, only the last 3 words of the input are used to predict the next word.
The first trial will take 10 seconds to load, as the app needs to load the pre-computed models and required packages, but subsequent trials will be instantaneous.
Please try it out at https://yuliangwang.shinyapps.io/shiny/.