A text prediction application to ease input on web pages, prepared as the Capstone project of Johns Hopkins' Data Science Specialization offered on Coursera.
Oscar de Leon
The prepared application, n-gram predictor, takes plain text input in English and uses some simple models to predict the next word.
It outputs the prediction to a red button so the user can pick it, and also provides two additional suggestions (grey buttons). Some details on the likelihood of each word are presented.
The following image shows the general appearance of the application:
The prediction is performed by searching for the last (up to) 3 recognized words of the text input in an n-gram table to get the absolute frequency of each n-gram containing the last (n-1) words and each candidate next word.
The retrieved information is used to compute the likelihood of each prediction option given the number of times its “root” (the previous n-1 words) appears, and a linear interpolation is performed to select the best prediction across all the n-gram orders.
The model always performs back-off, to use information on all the n-gram orders.
The n-gram table contains information for up to 4-grams derived from the datasets provided by the course instructors.
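The lookup, likelihood, and interpolation steps described above can be sketched in a few lines. This is a minimal Python illustration, not the application's own R code: the function names, the toy corpus, and the interpolation weights are all assumptions made for the example, and the slides do not state which weights the application uses.

```python
from collections import Counter

def build_counts(tokens, max_n=4):
    """Count every n-gram (stored as a tuple of words) for n = 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def predict(text, counts, vocab, weights=(0.1, 0.2, 0.3, 0.4), top=3):
    """Rank next-word candidates by linearly interpolated relative frequency.

    weights[k] is a hypothetical weight for the estimate that uses k context
    words; only the first len(context) + 1 weights are used.
    """
    # Keep only recognized words, then take the last (up to) 3 as context.
    words = [w for w in text.lower().split() if w in vocab]
    context = tuple(words[-3:])
    total = sum(counts[(w,)] for w in vocab)  # token count, the unigram "root"
    scores = {}
    for cand in vocab:
        score = 0.0
        # Always visit every order, from 0 context words up to the full context.
        for k in range(len(context) + 1):
            root = context[len(context) - k:]
            root_count = counts[root] if k else total
            if root_count:
                # Likelihood of cand given its root: count(root + cand) / count(root).
                score += weights[k] * counts[root + (cand,)] / root_count
        scores[cand] = score
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

tokens = "i like green eggs and i like green ham and i like you".split()
counts = build_counts(tokens)
best_three = predict("I like", counts, set(tokens))
```

In this toy corpus the first entry of `best_three` plays the role of the red button and the other two the grey buttons; the scores are the kind of likelihood detail the application displays next to each word.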
You can access the application through the link provided on the evaluation page of the course site. Some panels with instructions and additional information are provided under the “Information” tab.
To use the application, write some English text in the text box found under the “App” tab. To request a prediction you can either:
The application is built on a shiny backend as provided by RStudio.
The text box and the buttons are observed for user interaction in a reactive environment, and the application performs actions based on user input.
The application gets its accuracy from using a large lookup table. The application gets its speed from the following design choices:
n-grams are encoded as short integers to reduce storage size
The table is stored with the ff package, so it is read directly from disk instead of being loaded into memory
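The two design choices above can be illustrated with a small Python sketch. It is only an analogue of the idea, not the application's implementation: the application uses R's ff package, while this example assumes Python's standard `array` and `mmap` modules, a hypothetical word-to-integer dictionary, and a hypothetical file name.

```python
import mmap
import struct
from array import array

def encode_ngrams(ngrams, vocab_ids):
    """Flatten word n-grams into unsigned short integer codes."""
    codes = array("H")  # "H" = C unsigned short, typically 2 bytes per word
    for ngram in ngrams:
        codes.extend(vocab_ids[word] for word in ngram)
    return codes

def read_row(path, row, n=4):
    """Read one n-gram row from a memory-mapped file, not the whole table."""
    item = struct.calcsize("H")  # bytes per encoded word on this platform
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as table:
            # Only the pages holding this row need to be fetched from disk.
            return struct.unpack_from(f"{n}H", table, row * n * item)
```

A possible use, with a hypothetical four-word vocabulary:

```python
vocab_ids = {"i": 0, "like": 1, "green": 2, "eggs": 3}
codes = encode_ngrams([("i", "like", "green", "eggs"),
                       ("like", "green", "eggs", "i")], vocab_ids)
with open("ngrams.bin", "wb") as fh:
    codes.tofile(fh)
read_row("ngrams.bin", 1)  # (1, 2, 3, 0)
```

With 2-byte codes each 4-gram row takes 8 bytes regardless of word length, at the cost of capping the vocabulary at 65,536 distinct words in this sketch.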