NLP App Presentation

Angus Macdonald
28th February 2020

Background

The goal of this project is to build a Shiny app that takes a string of characters and predicts what the next word should be.

The following slides will describe:

– The algorithm that is used in this study.
– The App and how it works, outlining specifically how the user can use it.
– The limitations to the project.

Project Data & Algorithm

The first step was to create a text corpus on which to base the predictions. Data from several different sources was combined and split into n-gram tables (bigrams, trigrams and quadgrams) for the app to search.
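As a rough illustration of what building n-gram tables involves, here is a minimal Python sketch. It is not the project's own code (the app itself is built with Shiny), and the function names and sample sentences are hypothetical. It assumes simple lowercase, whitespace-based tokenization, which is cruder than a real cleaning step would be.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words on whitespace."""
    return text.lower().split()


def build_ngram_counts(lines, n):
    """Count every run of n consecutive words across all lines."""
    counts = Counter()
    for line in lines:
        words = tokenize(line)
        # Lines shorter than n words contribute nothing.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


# Toy stand-in for the combined corpus (hypothetical sentences).
sample_lines = [
    "thanks for the follow",
    "thanks for the support",
    "thanks for the follow back",
]

bigrams = build_ngram_counts(sample_lines, 2)
trigrams = build_ngram_counts(sample_lines, 3)
quadgrams = build_ngram_counts(sample_lines, 4)

print(quadgrams.most_common(1))
```

Every extra n-gram order adds another table of this kind, so memory use grows quickly with the size of the corpus.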

This project used the back-off algorithm commonly used in NLP (Natural Language Processing). This algorithm does the following:

– Takes the input and passes it into the function as a character (not a very fun one).
– Searches the “quadgram” table for the input and, if it finds a match, returns the most likely next word.
– If the input has fewer words than the “quadgram” requires, the input is compared to the “trigram” instead to find the most likely word.
– This continues until the input is a single word, at which point the “bigram” is used to find the most likely next word.
– The result is the predicted next word for the given input.
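The steps above can be sketched roughly as follows. This is an illustrative Python version under stated assumptions, not the app's actual code: the names `build_tables` and `predict_next` and the toy corpus are hypothetical, and the sketch also backs off when a longer context has no match at all, which is the usual back-off behaviour but goes slightly beyond the word-count rule described above.

```python
from collections import Counter, defaultdict


def build_tables(lines, max_n=4):
    """Map each context of 1 to max_n - 1 words to counts of the word that follows it."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        words = line.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                tables[n][context][words[i + n - 1]] += 1
    return tables


def predict_next(text, tables, max_n=4):
    """Try the longest usable context first, then back off to shorter ones."""
    words = text.lower().split()
    # With three or more words start at the quadgram, otherwise start lower.
    start_n = min(max_n, len(words) + 1)
    for n in range(start_n, 1, -1):
        context = tuple(words[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # assumption: no fallback guess when nothing matches


# Toy corpus (hypothetical sentences).
corpus = [
    "thanks for the follow",
    "thanks for the support",
    "thanks for the follow back",
    "happy new year",
]
tables = build_tables(corpus)

print(predict_next("thanks for the", tables))    # quadgram match: follow
print(predict_next("really happy new", tables))  # backs off to trigram: year
print(predict_next("new", tables))               # bigram match: year
```

Returning `None` when nothing matches is a design choice for the sketch; a real app might fall back to the most frequent word in the corpus instead.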

The App

The App is set up in two panels:

– The guide panel
– The App

The guide panel explains where the user's input is required and where the output appears.

In principle, the app would predict even better with a larger corpus. I encourage anyone with a PC powerful enough for that amount of processing to give it a shot and see what they can find!

Limitations to the App

The main issue was the processing power required to create the corpus. Machine learning and NLP require large amounts of RAM and processing power, both to create the corpus and to run the prediction algorithm.

As such, the corpus used in this project is smaller than most, given the limited RAM and processing power of the laptop the project was performed on.

Efficiency is at the core of machine learning and of programming in general, and this little prediction engine shows that, despite the underwhelming performance of the machine, prediction and NLP are still possible!

Useful Links

– Data: HC Corpora

– Milestone Report

– App: Shiny app