Word Prediction Using NLP Algorithms

Zhenning Xu

11.03

1 Introduction

The goal of this project is to build a dashboard (app) that provides an interface for predicting the next word with natural language processing (NLP) algorithms. This slide deck pitches the algorithm and the app.

For this iteration of the project, the data come from a real-world app, SwiftKey (http://swiftkey.com/en/). The purpose of this project is to understand and build predictive text models like those used by SwiftKey.

2 Why It's Cool

My app reads the words typed and predicts the most likely next word(s) within seconds. Companies such as Microsoft (SwiftKey's current owner) have introduced popular apps that use NLP algorithms to predict the words we will type and offer sentence completion suggestions accordingly. See the following screenshot (https://www.microsoft.com/en-us/swiftkey?rtc=1&activetab=pivot_1:primaryr2):

[Screenshot: SwiftKey app]

3 Brief Overview

The app includes the following feature:

  • A Shiny app that takes a phrase (multiple words) typed into a text box and outputs a prediction of the next word.

The text data were tokenized three times with RWeka, once each for 1-grams, 2-grams, and 3-grams.
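RWeka's tokenizer depends on Java, so the sketch below swaps in base R to show what the 1- to 3-gram tokenization produces: one frequency table per n-gram order. The helper name `ngram_counts`, the cleaning rule (lowercase, keep only letters, apostrophes, and spaces), and the three-sentence toy corpus are assumptions for illustration, not the project's actual preprocessing.

```r
# Hypothetical helper: build an n-gram frequency table from a character vector.
# Base R stand-in for the RWeka tokenization; the cleaning rule is an assumption.
ngram_counts <- function(text, n) {
  # Normalize: lowercase, replace anything but letters, apostrophes, spaces
  clean <- gsub("[^a-z' ]", " ", tolower(text))
  grams <- unlist(lapply(strsplit(clean, "\\s+"), function(words) {
    words <- words[words != ""]            # drop empty tokens from extra spaces
    if (length(words) < n) return(character(0))
    # Slide a window of n consecutive words across the sentence
    vapply(seq_len(length(words) - n + 1),
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)    # most frequent n-grams first
}

# Toy corpus (hypothetical), standing in for the SwiftKey text
corpus <- c("I love natural language processing",
            "I love data science",
            "I want to learn data science")

unigrams <- ngram_counts(corpus, 1)
bigrams  <- ngram_counts(corpus, 2)
trigrams <- ngram_counts(corpus, 3)
```

On this corpus, "i" is the most frequent 1-gram (3 occurrences), "i love" and "data science" each appear twice among the 2-grams, and all nine 3-grams appear once.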

The algorithm predicts the next word from the last words the user entered. It searches the 3-gram table first; if no next word is predicted, it backs off to the 2-gram table, then the 1-gram table. If nothing is found, it falls back to a “default”: the most frequently seen word.
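A minimal sketch of this back-off lookup, assuming the n-gram tables are named count vectors and that the 3-gram step matches the last two words typed (the usual trigram setup). The tables, their counts, and the function names `lookup` and `predict_next` are hypothetical; the sketch also treats the 1-gram step and the “default” as the same thing, namely the most frequent single word.

```r
# Hypothetical toy frequency tables; in the app these would come from the
# 1- to 3-gram tokenization of the SwiftKey data.
trigrams <- c("i love data" = 3, "i love natural" = 1, "want to learn" = 2)
bigrams  <- c("love data" = 4, "data science" = 5, "want to" = 2)
unigrams <- c("the" = 10, "data" = 6, "science" = 5)

# Return the most frequent word that follows `context` in an n-gram table,
# or NA if no n-gram starts with that context.
lookup <- function(counts, context) {
  prefix <- paste0(context, " ")
  hits <- counts[startsWith(names(counts), prefix)]
  if (length(hits) == 0) return(NA_character_)
  best <- names(hits)[which.max(hits)]
  substring(best, nchar(prefix) + 1)   # keep only the predicted next word
}

predict_next <- function(phrase) {
  words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
  words <- words[words != ""]
  n <- length(words)
  # 1) 3-gram table: match the last two words typed
  if (n >= 2) {
    guess <- lookup(trigrams, paste(words[n - 1], words[n]))
    if (!is.na(guess)) return(guess)
  }
  # 2) Back off to the 2-gram table: match the last word only
  if (n >= 1) {
    guess <- lookup(bigrams, words[n])
    if (!is.na(guess)) return(guess)
  }
  # 3) Default: the most frequent single word
  names(unigrams)[which.max(unigrams)]
}
```

For example, `predict_next("I love")` returns "data" from the 3-gram table, `predict_next("I want")` finds no matching 3-gram and backs off to the 2-gram table to return "to", and an unseen phrase such as `predict_next("hello world")` falls back to "the". Trying the longest context first and stopping at the first match is what makes this a back-off scheme.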

You can try the Shiny app here: https://utjimmyx.shinyapps.io/shinynlp/.

4 Dashboard Screenshot

A screenshot of the dashboard is shown below (refresh the page if it does not load).

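library(imager)  # image loading and plotting (install.packages("imager"))
# Local path to the dashboard screenshot; point this at your own copy of nlp.png
myimg <- load.image("C:/Users/zxu3/Documents/R/shiny/nlp.png")
plot(myimg)  # display the screenshot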

[Screenshot: Shiny dashboard]