6/13/2020

Overview

The goal of this project is to design a word-prediction application that uses a predictive language model to suggest the next word while the user is writing text.

Given a word or phrase as an input, the application will try to suggest the next word.

The predictive language model for this app is trained on a corpus: a collection of text documents to which text mining or natural language processing routines are applied to derive inferences.

This project uses the US English data sets of news, blogs, and Twitter text.

Creating the Word Prediction Model

The following steps are taken in creating the prediction model used in the app:

  • Downloading the raw text files for model training

  • Cleaning/filtering the data, splitting it into 2-, 3-, 4-, 5-, and 6-word n-grams, and sorting the n-grams by frequency

  • Saving the processed data as .rds files, which are used for model training

  • Building an n-gram function that uses a back-off type prediction model

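The model tries to predict the next word from the longest available context: it looks for the most frequent 6-gram matching the last 5 words and, if none is found, backs off to the last 4, 3, 2, and finally 1 word, matching against the 5-, 4-, 3-, and 2-grams respectively.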
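
The back-off idea above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the project's own code (the .rds files suggest the project itself is built in R). The function names build_tables and predict_next, the whitespace tokenizer, and the toy corpus are all assumptions made for this example; the real cleaning and filtering step is not shown.

```python
from collections import Counter, defaultdict

MAX_N = 6  # longest n-gram order, matching the 2- to 6-word n-grams above


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the real cleaning step)."""
    return text.lower().split()


def build_tables(sentences, max_n=MAX_N):
    """Map each context (a tuple of 1 to max_n - 1 words) to next-word counts."""
    tables = defaultdict(Counter)
    for sentence in sentences:
        words = tokenize(sentence)
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                tables[context][words[i + n - 1]] += 1
    return tables


def predict_next(phrase, tables, max_n=MAX_N):
    """Try the longest context first, then back off to shorter contexts."""
    words = tokenize(phrase)
    for k in range(min(max_n - 1, len(words)), 0, -1):
        context = tuple(words[-k:])
        if context in tables:
            # Most frequent word seen after this context.
            return tables[context].most_common(1)[0][0]
    return None  # no context of any length was seen in the corpus


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat sat on the mat today",
    "a dog sat on the rug",
]
tables = build_tables(corpus)
print(predict_next("the cat sat on the", tables))  # mat
print(predict_next("purple dog", tables))          # sat (backs off to "dog")
```

In this sketch an unseen phrase such as "purple dog" has no 3-gram match, so the lookup falls back to the single word "dog". A fuller model might also weight the shorter matches instead of taking the first hit.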

Word Prediction App

The app provides a very simple user interface to the next-word prediction model. It takes a word or phrase as input and displays the suggested next word.

Documentation