Natural Language Processing Application

KFollmer
Oct. 15th, 2016

Request: Build a text prediction web application

Requirements

R Packages

library(shiny)
library(quanteda)
library(stringr)
library(data.table)
library(dplyr)

1. Create a 4-gram dictionary from the sample data source referenced in the previous slide.
2. Store the last two words of the user's input to use for predicting the next word.
3. Use the dictionary in conjunction with the kwic() function of the quanteda package by Ken Benoit to search for occurrences of the bigram stored in step 2.
4. Create a frequency table of all the candidate next words found in the dictionary.
5. Select the most frequently occurring 'next word' as the answer.
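
The steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the application's actual code: the toy corpus and the helper names (make_ngrams, last_two_words, predict_next) are hypothetical, and a plain scan over the 4-grams stands in for quanteda's kwic() search so the example runs without any packages installed.

```r
# Toy corpus standing in for the sample data (assumption: not the real data source).
corpus <- c(
  "thanks for the follow",
  "thanks for the help today",
  "thanks for the help again",
  "looking for the best deal"
)

# Step 1: build a 4-gram dictionary (hypothetical helper).
make_ngrams <- function(text, n = 4) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}
dictionary <- unlist(lapply(corpus, make_ngrams))

# Step 2: keep only the last two words of the user's input.
last_two_words <- function(input) {
  words <- strsplit(tolower(trimws(input)), "\\s+")[[1]]
  paste(tail(words, 2), collapse = " ")
}

# Steps 3-5: find the bigram in the dictionary, tabulate the words that
# follow it, and return the most frequent one.
# Note: a base-R loop replaces the kwic() lookup described in the text.
predict_next <- function(input, dictionary) {
  bigram <- last_two_words(input)
  candidates <- character(0)
  for (gram in dictionary) {
    w <- strsplit(gram, " ")[[1]]
    for (i in seq_len(length(w) - 2)) {
      if (paste(w[i], w[i + 1]) == bigram) {
        candidates <- c(candidates, w[i + 2])
      }
    }
  }
  if (length(candidates) == 0) return(NA_character_)  # bigram never seen
  freq <- sort(table(candidates), decreasing = TRUE)
  names(freq)[1]
}

predict_next("Thanks for the", dictionary)  # "help"
```

Because overlapping 4-grams from the same sentence are all scanned, a single occurrence in the corpus can be counted more than once. That does not change the winner in this toy example, but a real implementation would likely want to count each occurrence only once.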

There is a tradeoff between performance and accuracy with this tool. Performance was prioritized over formatting.

Improvements for Next Version