Natural Language Processing: Predictive Text

Arushi Gulati
April 26, 2015

Introduction

This project takes a sequence of words as input and predicts the word most likely to come next. Such algorithms can be used for predictive texting in mobile applications, among other uses. The aim is to minimize the running time of the algorithm while maximizing its accuracy. This project focuses on:

  • Getting and Cleaning Data
  • Outlining the process in a report
  • Creating a prediction algorithm
  • Uploading the application on ShinyApps
  • Creating an R Presentation to highlight the tasks performed

Prediction Algorithm

The prediction algorithm is as follows:

  • The algorithm takes three data sets containing millions of rows and draws a subset of 50,000 rows from each.
  • The three subsets are then combined into the final data set used for prediction.
  • The final data set is converted to a corpus for further processing.
  • The corpus is cleaned (punctuation and extra white space removed).
  • N-grams are calculated for n = 1, 2, 3 and 4 to determine the likelihood of n words occurring together.
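
The preparation steps above can be sketched in a few lines. This is a minimal illustration in Python, not the code behind the app (which runs on ShinyApps and so was presumably written in R). The function names `sample_lines`, `clean` and `ngram_counts` are hypothetical, and lowercasing the text is an added assumption; the slides mention only punctuation and white-space removal.

```python
import random
import re
from collections import Counter

def sample_lines(lines, k, seed=0):
    """Draw k lines at random, or return all lines if there are k or fewer."""
    lines = list(lines)
    if len(lines) <= k:
        return lines
    return random.Random(seed).sample(lines, k)

def clean(text):
    """Remove punctuation and collapse extra white space.
    Lowercasing is an assumption, not something the slides state."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def ngram_counts(lines, n):
    """Count every run of n consecutive words across the cleaned lines."""
    counts = Counter()
    for line in lines:
        words = clean(line).split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

# Toy stand-ins for the three data sets (hypothetical content).
set_a = ["I love cats.", "I love dogs!"]
set_b = ["Cats   love milk."]
set_c = ["i love cats"]

# Subset each data set, combine the subsets, then count n-grams for n = 1..4.
corpus = sample_lines(set_a, 50000) + sample_lines(set_b, 50000) + sample_lines(set_c, 50000)
tables = {n: ngram_counts(corpus, n) for n in (1, 2, 3, 4)}
print(tables[2][("i", "love")])  # 3
```

Keeping one count table per value of n means the prediction step only has to look up the table that matches the length of the input.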

Prediction Algorithm

  • The entered input is split into words, using a space as the separator.
  • The number of words in the input determines which n-gram model makes the final prediction.
  • If the input consists of 1 word, the bigram model is used to predict the word most likely to occur after that word.
  • If the input consists of 3 words, the fourgram model is used to predict the word most likely to occur after those three words.
  • If the input consists of 2 words or >= 4 words, the trigram model is used to predict the next word which is most likely to occur.
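
The selection rule above can be illustrated with a short sketch. Again this is Python rather than the app's own code, and `build_model` and `predict_next` are hypothetical names. Two details are assumptions: the trigram case looks only at the last two words of the input, since the slides do not say which words are used when four or more are entered, and the input is lowercased to match lowercased training text.

```python
from collections import Counter, defaultdict

def build_model(sentences, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            model[context][words[i + n - 1]] += 1
    return model

def predict_next(text, models):
    """Pick the n-gram model from the input length and return the most
    frequent follower, or None when no prediction is available."""
    words = text.lower().split()  # assumes the models use lowercased text
    if not words:
        return None
    if len(words) == 1:
        n = 2   # one word: bigram
    elif len(words) == 3:
        n = 4   # three words: fourgram
    else:
        n = 3   # two words, or four and more: trigram
    context = tuple(words[-(n - 1):])  # assumption: use the trailing n-1 words
    followers = models[n].get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus, already cleaned (hypothetical content).
toy = ["i love cats", "i love cats a lot", "i love dogs", "cats love milk"]
models = {n: build_model(toy, n) for n in (2, 3, 4)}

print(predict_next("love", models))         # cats
print(predict_next("i love", models))       # cats
print(predict_next("i love cats", models))  # a
print(predict_next("zebra", models))        # None
```

Returning `None` when the context was never seen corresponds to the case where the app reports that no prediction is available.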

How To Use The App

  • Enter any number of words as input in the left panel.
  • Click Submit.
  • The predicted word should appear on the right.
  • If no prediction is available for the entered words, a message on the right will say so. In that case, please start over with a new sequence of words.

NOTE: Please allow a few seconds for the app to load initially.