Text Prediction App

Marco Lunardi

Aim of the App

The goal of the App is not an easy one:

Given a sequence of words forming a sentence, the App has to return the word that makes the most sense as the final word of that sentence.

This is easy for our brain, which has been trained for years to fill in words in sentences. It is much harder for an algorithm, which has neither the reasoning power of a human brain nor the time to acquire the training a human being develops over a lifetime.

Where I started from...

The starting point was a collection of blog and Twitter posts, plus sentences from news websites.

It was a large dataset to analyze: more than 550 megabytes and more than 4.2 million lines.

There were two main hurdles to overcome: making the text readable by a computer, and keeping memory usage low.

This means facing a significant trade-off: using as much of the available data as possible, while keeping the App light enough to run on any device.

Easier said than done.

... Going through a Step-by-Step Process...

These are the steps I took to develop the prediction algorithm for the App (with great patience and a lot of fine-tuning):

Reading sentences and posts from the collection of text files into R (forming the so-called “corpora”).
Cleaning the files and filtering punctuation out of the text.
Making choices: filter out stop words (the, and, to, …)? Better not. Filter out profane words before or after training the algorithm? Probably after. How large does the data sample need to be for the algorithm to be sufficiently trained?
N-gram frequencies and algorithm training: assigning probabilities to all possible word combinations in a sentence, and “smoothing” them so that unseen combinations are also taken into account.
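The cleaning and n-gram steps above can be sketched roughly as follows. The original work was done in R; this is only a minimal Python re-expression, and the names `clean`, `ngram_counts` and `laplace_prob` are hypothetical. The document does not say which smoothing method was used, so add-one (Laplace) smoothing is assumed here purely as the simplest example: P(word | prev) = (count(prev, word) + 1) / (count(prev) + V), where V is the vocabulary size, so an unseen pair still gets a small nonzero probability.

```python
import re
from collections import Counter

def clean(text):
    # Lowercase and replace anything that is not a letter, apostrophe or space
    # with a space; stop words are deliberately kept.
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()

def ngram_counts(lines, n):
    # Count every run of n consecutive words, line by line, so that
    # n-grams never span two different posts or sentences.
    counts = Counter()
    for line in lines:
        tokens = clean(line)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def laplace_prob(bigrams, unigrams, prev, word):
    # Add-one (Laplace) smoothing: every bigram count, seen or unseen, is
    # increased by one, and the denominator grows by the vocabulary size.
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[(prev,)] + vocab_size)

# Tiny illustrative corpus (hypothetical data).
lines = ["I like green tea.", "I like black tea!", "You like green apples."]
unigrams = ngram_counts(lines, 1)
bigrams = ngram_counts(lines, 2)
print(laplace_prob(bigrams, unigrams, "like", "green"))  # 0.3 (seen twice)
print(laplace_prob(bigrams, unigrams, "like", "tea"))    # 0.1 (never seen)
```

Here "like" occurs 3 times and the vocabulary has 7 words, so the seen pair "like green" scores (2 + 1) / (3 + 7) while the unseen pair "like tea" still gets (0 + 1) / (3 + 7) instead of zero.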

... To finally reach the outcome

Once the text has been transformed into a computer-readable format and each word combination has been assigned a frequency, the algorithm can be trained and is then able to make its predictions.

Just type a sentence into the App (preferably two or more words), and it will return your sentence along with its most “probable” last word.
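The train-then-predict flow described above can also be sketched in Python. This is an illustration under assumptions, not the App's code: the document does not say how the App chooses a word when a combination was never seen, so a simple backoff is assumed here. It picks the most frequent follower of the longest known context (up to trigrams, `MAX_N = 3`) and falls back to shorter contexts, ending with plain word frequencies. `tokenize`, `train`, `predict` and `MAX_N` are hypothetical names.

```python
import re
from collections import Counter, defaultdict

MAX_N = 3  # hypothetical: contexts of up to two preceding words

def tokenize(text):
    # Same cleaning idea as before: lowercase, keep letters and apostrophes.
    return re.sub(r"[^a-z' ]+", " ", text.lower()).split()

def train(lines, max_n=MAX_N):
    # Map each context (tuple of 0 to max_n - 1 preceding words)
    # to a Counter of the words that follow it.
    table = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                table[context][tokens[i + n - 1]] += 1
    return table

def predict(table, sentence, max_n=MAX_N):
    # Try the longest available context first, then back off to shorter
    # ones; the empty context () holds plain word counts as a last resort.
    tokens = tokenize(sentence)
    for k in range(min(max_n - 1, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        followers = table.get(context)  # .get avoids creating empty entries
        if followers:
            return followers.most_common(1)[0][0]
    return None  # nothing was learned at all

lines = ["I like green tea", "I like green apples",
         "You like black coffee", "We drink green tea"]
table = train(lines)
print(predict(table, "I like"))      # green
print(predict(table, "They drink"))  # green (backs off to "drink")
```

Returning the single most frequent continuation keeps the table a plain map from context to counts; probability-weighted schemes such as Katz backoff or Kneser-Ney smoothing rank candidates more carefully, at the cost of extra bookkeeping.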

Please note that the App uses a “reduced”, lower-performing version of my original algorithm, so that it runs well on the shinyapps.io website.
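The document does not describe how the reduced version was built. One common way to shrink an n-gram model, shown here only as an illustration, is to drop n-grams seen fewer than a minimum number of times: in natural-language text a large share of distinct n-grams typically occur only once, so pruning them saves memory at some cost in accuracy. `prune` and `min_count` are hypothetical names.

```python
from collections import Counter

def prune(counts, min_count=2):
    # Return a new Counter keeping only n-grams seen at least min_count times.
    return Counter({gram: c for gram, c in counts.items() if c >= min_count})

# Hypothetical bigram counts.
bigrams = Counter({("i", "like"): 2, ("like", "green"): 2,
                   ("green", "tea"): 1, ("you", "like"): 1})
small = prune(bigrams)
print(len(bigrams), "->", len(small))  # 4 -> 2
```

Raising `min_count` trades more accuracy for a smaller table, which mirrors the memory versus data trade-off mentioned at the start.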

You can find my App at the link below. Have fun!

http://marcolunardi.shinyapps.io/Text_Prediction/