Final Project Submission - Word Prediction -

Word prediction: A Shiny app

17.11.2017

Goal and data basis of the app

Goal:

A Shiny app that takes a phrase (one or more words) typed into a text box as input and, after a short delay, outputs a prediction of the next word.
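A minimal sketch of how such an app could be wired up with the shiny package. It is an illustration under assumptions, not the submitted app: `predict_next()` and its tiny lookup vector are hypothetical stand-ins for the real corpus-based predictor, and `debounce()` is one possible way to implement the "suitable delay". With it, the prediction is only recomputed once the user has stopped typing for 500 ms, which is an arbitrary choice.

```r
library(shiny)

# Hypothetical stand-in for the real predictor: maps the last word to a next word
predict_next <- function(phrase) {
  toy <- c(of = "the", i = "am", thank = "you")   # illustrative data only
  words <- strsplit(trimws(tolower(phrase)), "\\s+")[[1]]
  words <- words[words != ""]
  if (length(words) == 0) return("")
  nxt <- toy[words[length(words)]]
  if (is.na(nxt)) "No suggestion" else unname(nxt)
}

ui <- fluidPage(
  titlePanel("Word prediction"),
  textInput("phrase", "Enter a phrase:"),
  textOutput("prediction")
)

server <- function(input, output, session) {
  # Only react once typing has paused for 500 ms
  phrase <- debounce(reactive(input$phrase), 500)
  output$prediction <- renderText(predict_next(phrase()))
}

app <- shinyApp(ui, server)
# runApp(app)  # launches the app in a browser
```

Debouncing keeps the app responsive because the lookup runs once per pause rather than once per keystroke.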

Data basis:

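The data basis consists of the US English blog, news, and Twitter texts of the Coursera-SwiftKey dataset, downloaded from: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip. Only the first 10,000 lines of each file are used.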

Code - Preprocessing -

The following code shows how the data for the app is loaded, preprocessed, and turned into a table of 2-gram frequencies.

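library(ngram)

# Read the first 10,000 lines of each corpus file; encoding marks the text as UTF-8,
# skipNul = TRUE skips embedded NUL characters instead of cutting the line short at them
blogs   <- readLines("C:/Coursera/ngram/final/en_US/en_US.blogs.txt",   n = 10000, encoding = "UTF-8", skipNul = TRUE)
news    <- readLines("C:/Coursera/ngram/final/en_US/en_US.news.txt",    n = 10000, encoding = "UTF-8", skipNul = TRUE)
twitter <- readLines("C:/Coursera/ngram/final/en_US/en_US.twitter.txt", n = 10000, encoding = "UTF-8", skipNul = TRUE)

# Join all lines into one string and normalise it (lowercase, fixed spacing);
# named txt so that base R's str() is not masked
txt <- preprocess(concatenate(blogs, news, twitter))

# Build all word 2-grams and their frequency table
ng    <- ngram(txt, n = 2)
pt_ng <- get.phrasetable(ng)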
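To make the phrase table concrete, here is a small base-R sketch of the same idea on a made-up text: split it into words, pair each word with its successor, and count the pairs. `count_bigrams()` and the sample sentences are hypothetical, and this is not how the ngram package is implemented internally; it only shows what the resulting table represents.

```r
# Count word 2-grams in a character vector, most frequent first
count_bigrams <- function(text) {
  words <- strsplit(tolower(paste(text, collapse = " ")), "\\s+")[[1]]
  words <- words[words != ""]
  if (length(words) < 2) {
    return(data.frame(ngrams = character(0), freq = integer(0)))
  }
  # Pair each word with the word that follows it
  bigrams <- paste(head(words, -1), tail(words, -1))
  counts  <- sort(table(bigrams), decreasing = TRUE)
  data.frame(ngrams = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

pt <- count_bigrams(c("Thank you for the help", "thank you for coming"))
pt
```

Here "thank you" and "you for" each occur twice. Because the lines are pasted into one string, the pair "help thank" also appears across the sentence boundary, much as joining the three sources into one string does in the preprocessing code.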

Code - Core Function -

The app works with word 2-grams (pairs of consecutive words) built from the data basis.

The core of the app is a function that takes the last word of the text input and searches for the most frequent 2-gram that starts with this word. The second word of that 2-gram is then displayed as the prediction.
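Written as a formula, with C(u v) the number of times the 2-gram "u v" occurs in the corpus and w_last the last word typed, this rule picks the maximum-likelihood next word. The denominator is the same for every candidate w, so comparing raw frequencies is enough:

```latex
\hat{w} = \arg\max_{w} \hat{P}(w \mid w_{\text{last}})
        = \arg\max_{w} \frac{C(w_{\text{last}}\, w)}{\sum_{v} C(w_{\text{last}}\, v)}
        = \arg\max_{w} C(w_{\text{last}}\, w)
```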

Code of the core function:

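library(stringr)  # word(), str_trim()

find_word <- function(w) {
      last <- tolower(word(str_trim(w), -1))  # last input word, lowercased like the corpus
      ngrams <- as.character(pt_ng$ngrams)    # 2-grams, most frequent first
      # keep 2-grams whose first word is exactly `last` ("the " must not match "then ")
      x <- ngrams[startsWith(ngrams, paste0(last, " "))][1]
      if (is.na(x)) "No suggestion" else word(x, 2)
}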
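Because the function needs the full phrase table, a self-contained sketch helps to check the lookup logic in isolation. It uses a tiny hand-made table (hypothetical data, sorted by decreasing frequency, which is the order the core function relies on) and base R instead of stringr; `next_word()` and `pt_toy` are hypothetical names, not part of the app.

```r
# Hypothetical toy phrase table, sorted by decreasing frequency
pt_toy <- data.frame(
  ngrams = c("of the", "in the", "thank you", "of course", "then we"),
  freq   = c(5, 4, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Second word of the most frequent 2-gram that starts with the last input word
next_word <- function(phrase, pt) {
  words <- strsplit(trimws(tolower(phrase)), "\\s+")[[1]]
  words <- words[words != ""]
  if (length(words) == 0) return("No suggestion")
  # Exact first-word match: the prefix "the " does not match "then we"
  hits <- pt$ngrams[startsWith(pt$ngrams, paste0(words[length(words)], " "))]
  if (length(hits) == 0) return("No suggestion")
  strsplit(hits[1], " ")[[1]][2]
}

next_word("I am tired of", pt_toy)   # "the"
next_word("I want the", pt_toy)      # "No suggestion"
```

The second call shows why the trailing space in the search prefix matters: a plain `^the` pattern would have matched "then we" and suggested "we".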

Result

The result is a simple and fast app that meets the goal of predicting the next word: each prediction only requires a search of the precomputed 2-gram frequency table, not any processing of the raw corpus.