Text Prediction Shiny Application

May 23, 2023

css: slides.css output: ioslides_presentation: widescreen: true smaller: true transition: 0.1 self_contained: true logo: image/tandon_long_color.png —

```

Summary

This presentation features the Next Word Predict Shiny Application including an introduction to the application user interface and details about the text prediction algorithm. The application will generate up to three different suggested words or phrases to complete any given character input in the text entry box.The benefit of this application is to speed writing and expected work(s) selection for techinal writers and bloggers.

The Next Word Predict Application is located at:

Next Word Predict Shiny Application

The Code is available at:

Github Repository

Shiny Application

The Next Word Prediction is a Shiny application that uses a text prediction algorithm to predict the next word(s) based on a text string entered by a user (nGram).

The application will suggest the next word in a string of text using an n-gram algorithm. An n-gram is a contiguous sequence of n words from a given sequence of text.

The text used to build the predictive text model came from a large corpus of blogs, news and twitter data from Swiftkey. N-grams were extracted from the corpus and then used to build the predictive text model.

Data was obtained from:

Swiftkey

The Predictive Text Model

The predictive text model was built from a sample of a large corpus of blogs, news, and twitter data from Swiftkey.

I used an n-gram Language Model to predict the next word. As part of the cleaning process the data was converted to lowercase letters and all non-ascii characters, like URLs, email addresses, Twitter handles, hash tags, ordinal numbers, profane words, punctuation and whitespace, were removed. The data was split into tokens (n-grams).

Tasks accomplished were: (1) Tokenization - Writing a function that takes a file as input and returns a tokenized version of it. (2) Profanity filtering - removing profanity and other bad words.

Application User Interface

As text is entered by the user, the algorithm iterates from longest n-gram (4-gram) to shortest (2-gram) to detect a match. The predicted next word is considered using the longest, most frequent matching n-gram.

The predicted next word will be shown when the application detects that you have finished typing one or more words. When entering text, please allow a few seconds for the output to appear. The top prediction will be shown first followed by the second and third likely next words.