Word Suggestion Application

Vicente Cano
December 30, 2016

The goal of the Data Science Capstone final project is to create a product that showcases a word-prediction algorithm for typed text.

The application described in this presentation suggests words after a user has entered one or more words. It is available at https://vecano.shinyapps.io/word-suggestions/

Application Description

A screenshot of the Word Suggestion Application is shown below. It consists of an input field for entering text and a series of suggestions that appear once the user starts typing. When a suggestion is clicked, it is appended to the existing text.

screenshot of application

Using n-grams for Suggestions

The first project (http://rpubs.com/vecano/n-gram-milestone-report) for this Data Science Capstone course consisted of extracting n-grams from a large corpus of English text that included Twitter posts, news articles, and blog posts.

To extract the unigrams, bigrams, and trigrams from the corpus, I used the quanteda library. This library tokenizes text into words and finds n-grams, which are sequences of n consecutive words; the ones that occur most often in the corpus are the most useful for suggestions.
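The idea of tokenizing text and counting n-grams can be illustrated with a minimal sketch. It is written in plain Python rather than with quanteda, so it is not the code used in the project; the function names (tokenize, ngrams, count_ngrams) and the three-sentence corpus are made up for illustration, and the tokenizer is deliberately simplistic.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(documents, n):
    """Count how often each n-gram occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(tokenize(doc), n))
    return counts

# Hypothetical toy corpus standing in for the tweets, news, and blog posts.
corpus = [
    "Take another look at the report.",
    "Please take another look before you leave.",
    "Take another one if you like.",
]

trigram_counts = count_ngrams(corpus, 3)
# List the trigrams from most to least frequent.
for trigram, count in trigram_counts.most_common():
    print(" ".join(trigram), count)
```

N-grams are counted per document here, so a trigram never spans the end of one post and the start of the next.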

The trigrams (sequences of three consecutive words) found in that project are used within the “Word Suggestion” Shiny application to suggest words.

Instructions to Use the Application

Using the application is simple. Start typing in the input field, and the application automatically displays suggested next words. You can choose one of these suggestions by clicking its button, or you can continue typing. Below is an example of suggested words for a phrase starting with “take another”:

example of suggestion

The word chosen will be appended to the end of the existing sentence. Once a selection has been made, the buttons will show new suggestions.
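The append step can be sketched in a few lines. This is a Python illustration only, not the app's Shiny code, and the function name append_suggestion is hypothetical; it assumes the chosen word should be separated from the existing text by exactly one space.

```python
def append_suggestion(text, word):
    """Append the chosen word to the text, separated by a single space."""
    text = text.rstrip()  # drop any trailing space the user may have typed
    return f"{text} {word}" if text else word

print(append_suggestion("take another", "look"))
```

After the text is updated, the suggestions would be looked up again from the new last words, which is why the buttons change after each selection.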

How the Application Works

The trigrams extracted in the first project were stored in a data frame, which is used to find appropriate suggestions for the words typed in the input field. The first two columns contain the first and second words of popular trigrams. The third column contains a list of the top three words that most often follow that particular pair of starting words.

For example, for the word “leave”, the key1 column contains the word itself, key2 contains the most common words that follow it, and key3 is a list of words that complete the trigrams:

trigrams for the word 'first'
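The lookup described above can be approximated with a dictionary keyed on the first two words of each trigram. The sketch below is in Python rather than R, so it is not the application's actual code: the names build_lookup and suggest are hypothetical, the trigram counts are invented for illustration, and it assumes suggestions are found by matching the last two words typed, returning nothing when fewer than two words are available.

```python
from collections import Counter, defaultdict

def build_lookup(trigram_counts, top_n=3):
    """Map each (first, second) word pair to its top_n most frequent third words."""
    by_prefix = defaultdict(Counter)
    for (w1, w2, w3), count in trigram_counts.items():
        by_prefix[(w1, w2)][w3] += count
    return {
        prefix: [word for word, _ in followers.most_common(top_n)]
        for prefix, followers in by_prefix.items()
    }

def suggest(lookup, text):
    """Return suggestions for the last two words of the text, or an empty list."""
    words = text.lower().split()
    if len(words) < 2:
        return []  # simplification: this sketch needs two words to match a trigram
    return lookup.get((words[-2], words[-1]), [])

# Invented counts, for illustration only.
trigram_counts = {
    ("take", "another", "look"): 5,
    ("take", "another", "step"): 3,
    ("take", "another", "one"): 2,
    ("take", "another", "day"): 1,
    ("leave", "a", "comment"): 4,
}

lookup = build_lookup(trigram_counts)
print(suggest(lookup, "I will take another"))  # ['look', 'step', 'one']
```

Precomputing the top three completions per word pair keeps each lookup to a single dictionary access, which matters for an interface that refreshes suggestions as the user types.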