Single Word Prediction Algorithm

Wassim Moukhaiber
10/12/2018

First Slide

Main activities summary for overall algorithm

- Conditional (“if”) download and extraction of the Coursera-SwiftKey dataset (version: 15 Jan 2018)
- Efficient loading of the English-language text files with the “fread” function
- Custom, efficient text data cleaning: 7-10 (of 14 analyzed) cleaning operations applied in proper sequence, using faster functions and parallelism via clusters
- Text data exploration to better understand the dataset and to improve the activities above; properties explored: files, n-grams, and words
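
The conditional download and the line-wise loading described on this slide could look roughly like the sketch below. It is not the project's actual code: the function names `fetch_dataset` and `read_lines_fast`, and all paths and the URL in the usage comment, are hypothetical placeholders. The sketch assumes that “fread” refers to `data.table::fread` and falls back to base `readLines` when that package is not installed.

```r
# Download and extract the dataset only when it is not already on disk.
# `dataset_url`, `zip_path`, and `data_dir` are hypothetical arguments.
fetch_dataset <- function(dataset_url, zip_path, data_dir) {
  # Skip the download if the archive already exists.
  if (!file.exists(zip_path)) {
    download.file(dataset_url, destfile = zip_path, mode = "wb")
  }
  # Skip the extraction if the target directory already exists.
  if (!dir.exists(data_dir)) {
    unzip(zip_path, exdir = data_dir)
  }
  invisible(data_dir)
}

# Read a text file as a character vector, one element per line.
read_lines_fast <- function(path) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    # sep = "\n" makes fread treat every line as a single field;
    # quote = "" stops quote characters in the text from being interpreted.
    data.table::fread(path, sep = "\n", header = FALSE, quote = "",
                      encoding = "UTF-8", col.names = "text")$text
  } else {
    # Base R fallback when data.table is unavailable.
    readLines(path, encoding = "UTF-8", warn = FALSE)
  }
}

# Usage (placeholder URL and paths, not the real ones):
# fetch_dataset("https://example.org/dataset.zip", "dataset.zip", "data")
# news <- read_lines_fast("data/some_news_file.txt")
```

Checking for the archive and the extracted directory separately means an interrupted run can resume from the extraction step without downloading again.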

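As an illustration of cleaning operations applied in a fixed sequence, with optional parallelism via a cluster, here is a minimal base R sketch. The slides do not list the individual operations, so the five steps below (lowercasing, URL removal, digit removal, punctuation removal, whitespace collapsing) are assumptions chosen for illustration, and `clean_text` and `clean_text_parallel` are hypothetical names.

```r
# Apply the cleaning steps in a fixed order; order matters, for example
# URLs must be removed before punctuation is stripped.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("(https?://|www\\.)\\S+", " ", x, perl = TRUE)  # remove URLs
  x <- gsub("[0-9]+", " ", x)                               # remove digits
  x <- gsub("[^a-z' ]", " ", x)   # keep only letters, apostrophes, spaces
  x <- gsub("\\s+", " ", x)       # collapse repeated whitespace
  trimws(x)
}

# Optionally spread the work over a cluster of worker processes.
clean_text_parallel <- function(x, n_cores = 1L) {
  if (n_cores <= 1L || length(x) < 2L) {
    return(clean_text(x))
  }
  cl <- parallel::makeCluster(n_cores)
  on.exit(parallel::stopCluster(cl))
  # Split the input into contiguous chunks, one per worker at most.
  chunks <- split(x, cut(seq_along(x), n_cores, labels = FALSE))
  unlist(parallel::parLapply(cl, chunks, clean_text), use.names = FALSE)
}

clean_text("Visit https://example.com NOW, it's 100% free!")
# returns "visit now it's free"
```

For small inputs the cost of starting the cluster outweighs the gain, which is why the sketch defaults to a single process.
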
Slide Two

Main activities summary for overall algorithm - cont.

- Optimized efficiency, memory use, and R code reuse during n-gram model generation
- Generation of n-gram frequencies in the format required by the word prediction function, using the fast “ngram_asweka” and “data.table” functions inside a new “generateNFDF” function
- Next word prediction using an efficient, modified “predict_backoff” function
- Backoff algorithm applied to the first four reduced (frequency > 1) n-gram models
- Input text phrase cleaned in a separate “custom_input_text_clean” function
- Debug mode used during testing and improvements
- Memory optimization in places with higher memory usage: functions that consume less memory were chosen, and “lineprof” and “gc” were applied
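
To make the n-gram frequency step concrete, the following base R sketch builds a frequency table for a given n and optionally drops rare n-grams. It is a stand-in written for illustration, not the “generateNFDF” function itself, which according to the slide relies on “ngram_asweka” and “data.table”. The names `make_ngrams` and `ngram_freq` and the column names `ngram` and `freq` are assumptions.

```r
# Split one cleaned line into its n-grams (words joined by single spaces).
make_ngrams <- function(line, n) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) {
    return(character(0))
  }
  starts <- seq_len(length(words) - n + 1L)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1L)], collapse = " "),
         character(1))
}

# Count n-grams over all lines and keep those seen at least `min_freq` times.
ngram_freq <- function(lines, n, min_freq = 1L) {
  grams <- unlist(lapply(lines, make_ngrams, n = n), use.names = FALSE)
  if (length(grams) == 0L) {
    return(data.frame(ngram = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- table(grams)
  out <- data.frame(ngram = names(counts), freq = as.integer(counts),
                    stringsAsFactors = FALSE)
  out <- out[out$freq >= min_freq, , drop = FALSE]
  # Most frequent first, ties broken alphabetically.
  out <- out[order(-out$freq, out$ngram), , drop = FALSE]
  rownames(out) <- NULL
  out
}

lines <- c("i like green tea", "i like black tea")
ngram_freq(lines, n = 2, min_freq = 2L)
# one row: ngram "i like", freq 2
```

Setting `min_freq = 2L` corresponds to the reduction to frequencies greater than 1 mentioned above.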

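The back-off idea can be sketched as follows: look up the longest available context in the highest-order model, and fall back to shorter contexts when nothing matches. This is a simple back-off without discounting or scoring, written for illustration; it is not the modified “predict_backoff” function from the project. The function name `predict_next_word`, the layout of `models` (a list where element n holds the n-gram table), and the toy frequencies are all assumptions.

```r
# `models[[n]]` is a data frame with columns `ngram` and `freq` for n-grams
# of order n (1 = unigrams, ..., 4 = 4-grams).
predict_next_word <- function(phrase, models) {
  words <- strsplit(trimws(phrase), "\\s+")[[1]]
  words <- words[nzchar(words)]
  # Highest order usable with the number of words typed.
  n <- min(length(models), length(words) + 1L)
  while (n >= 2L) {
    context <- paste(tail(words, n - 1L), collapse = " ")
    model <- models[[n]]
    hits <- model[startsWith(model$ngram, paste0(context, " ")), , drop = FALSE]
    if (nrow(hits) > 0L) {
      best <- hits$ngram[which.max(hits$freq)]
      return(sub(".* ", "", best))  # last word of the best n-gram
    }
    n <- n - 1L  # back off to a shorter context
  }
  # No context matched: return the most frequent single word.
  unigrams <- models[[1]]
  unigrams$ngram[which.max(unigrams$freq)]
}

# Toy models with made-up frequencies.
models <- list(
  data.frame(ngram = c("the", "tea", "like"), freq = c(5L, 3L, 2L),
             stringsAsFactors = FALSE),
  data.frame(ngram = c("like tea", "like the", "green tea"),
             freq = c(2L, 3L, 4L), stringsAsFactors = FALSE),
  data.frame(ngram = c("i like tea", "i like the"), freq = c(3L, 2L),
             stringsAsFactors = FALSE),
  data.frame(ngram = "think i like coffee", freq = 2L,
             stringsAsFactors = FALSE)
)

predict_next_word("i think i like", models)  # returns "coffee" (4-gram match)
predict_next_word("we like", models)         # returns "the" (backs off to bigrams)
```
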
Slide Three

Features and manual for SINGLE WORD PREDICTION application

- The application predicts a single word based on an n-gram model for the “News articles” or “Twitter” text data files
- It contains a 4-gram model for each of the two text data files
- A progress bar shows the loading of the required n-grams
- The text data type can be selected: “News articles” or “Twitter”
- A resized input text box accepts the text phrase without its last word
- The cleaned input text phrase and the predicted word are shown directly after pressing the “Clean & Predict” button

Slide Four

Features and manual for SINGLE WORD PREDICTION application - cont.

- The selected text data type or the entered text phrase can be changed to immediately see the new cleaned text phrase and the new predicted next word
- Alternatively, the “Clear” button clears the entered and cleaned text phrases and the predicted next word
- A usage example and other parts of the help text are directly visible in the relevant places
- The application is available on the shinyapps.io server and on the GitHub development platform