The Next Word Prediction Application

Anita Hassan
23 April 2016

This presentation describes the application at https://equinenas.shinyapps.io/My_capstone/, an R Shiny application that explores the topic of text prediction.

The application was designed to satisfy the requirement of predicting the next word given an input, as well as to explore different prediction algorithms.

The Objective

  • The main goal of this capstone project is to build a Shiny application that is able to predict the next word.

  • The assignment was divided into 7 data activities, including data cleansing, exploratory analysis and the creation of a predictive model.

  • All text data used to create the frequency dictionaries, and thus to predict the next word, comes from a corpus.

  • All text mining and natural language processing was done with a variety of well-known R packages, such as sqldf.

The Applied Methods & Models

  • The training data from https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip was cleaned (numbers and punctuation removed), and n-grams were formed for use in prediction.

  • This data sample was then tokenized into so-called n-grams. Using these n-gram frequencies, The Next Word can take a user-submitted sentence and quickly calculate the most likely next word. The aggregated uni-, bi- and trigram term frequency matrices were transferred into frequency dictionaries.

  • The resulting data.frames are used to predict the next word by combining the text entered by a user of the application with the frequencies in the underlying n-gram tables.
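
The steps above (cleaning, n-gram counting, frequency lookup) can be illustrated with a minimal sketch. The application itself is written in R and stores its tables as data.frames; the Python below is only an illustration, and the function names `clean`, `build_tables` and `predict_next` are hypothetical. The fallback from trigram to bigram to unigram is an assumption, since the slides do not state how the tables are consulted.

```python
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase, replace anything that is not a-z or whitespace
    (digits, punctuation) with a space, then split into words."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()


def build_tables(lines, max_n=3):
    """For n = 1..max_n, map each (n-1)-word prefix to a Counter of
    the words that followed it. Stand-in for the frequency dictionaries."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for line in lines:
        tokens = clean(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                tables[n][tuple(gram[:-1])][gram[-1]] += 1
    return tables


def predict_next(tables, phrase, max_n=3):
    """Return the most frequent continuation of the phrase.
    Assumption: try the longest n-gram first and fall back to shorter ones."""
    tokens = clean(phrase)
    for n in range(max_n, 0, -1):
        prefix = tuple(tokens[-(n - 1):]) if n > 1 else ()
        candidates = tables[n].get(prefix)
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # only reached when the tables are empty


# Toy corpus, invented for illustration (not the SwiftKey data).
sample = [
    "I love New York 2016!",
    "I love New Orleans.",
    "I love New York pizza",
    "We love old movies",
]
tables = build_tables(sample)
print(predict_next(tables, "I love"))     # trigram table is used
print(predict_next(tables, "brand new"))  # falls back to the bigram table
```

Keying each table by prefix keeps a prediction to a single dictionary lookup per n-gram order, which is the kind of quick lookup the frequency dictionaries are meant to provide.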

The Usage Of The Application

The main layout element is the content panel, with tabs for:

  • Instructions
  • Input the sentence

The application allows the user to enter a phrase and then displays the predicted next word.

Application Screenshot

Additional Information