12/31/2022

1.- Introduction

The goal of this exercise is to create a product to highlight the prediction algorithm and to provide an interface that can be accessed by others.

The Shiny app takes a phrase typed into a text box as input and outputs a prediction of the next word.

2.- App link

3.- Prediction algorithm

The data were downloaded from Coursera-SwiftKey.zip. The blog, news and Twitter datasets from the English-language files were read and combined into a text corpus (a collection of written texts) using VCorpus. The corpus was then processed with tm_map to remove punctuation, numbers, extra whitespace and stopwords, to convert the text to lower case, and to stem words with stemDocument.
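The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: it assumes the tm and SnowballC packages are installed, the sample lines and the clean_corpus helper are hypothetical, and the order of the steps (lower-casing before stopword removal, so that capitalized stopwords such as "The" are also matched) is an assumption rather than something the text specifies.

```r
library(tm)  # stemDocument() also needs the SnowballC package installed

# Hypothetical sample lines standing in for the blog, news and Twitter files.
sample_lines <- c(
  "The 3 quick brown foxes are JUMPING over the lazy dogs!",
  "Running in the park, 2 runners passed me."
)

# Hypothetical helper: build a VCorpus and apply the cleaning steps with tm_map.
clean_corpus <- function(lines) {
  corpus <- VCorpus(VectorSource(lines))
  corpus <- tm_map(corpus, content_transformer(tolower))       # lower case
  corpus <- tm_map(corpus, removePunctuation)                  # punctuation
  corpus <- tm_map(corpus, removeNumbers)                      # digits
  corpus <- tm_map(corpus, removeWords, stopwords("english"))  # stopwords
  corpus <- tm_map(corpus, stemDocument)                       # stemming
  corpus <- tm_map(corpus, stripWhitespace)                    # collapse repeated spaces
  corpus
}

cleaned <- clean_corpus(sample_lines)

# Pull the cleaned text back out as a plain character vector.
cleaned_text <- trimws(unname(sapply(cleaned, as.character)))
print(cleaned_text)
```

Wrapping tolower in content_transformer keeps each document a valid tm text document, which a bare tolower call would not.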

Next we apply tokenization, which is the splitting of a phrase, sentence, paragraph or entire text document into smaller units, such as individual words or terms. The processed corpus was tokenized into n-gram frequency tables, namely 2-grams, 3-grams and 4-grams, each stored with its frequency of occurrence n.
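To make the n-gram frequency tables concrete, here is a small sketch in base R. It is only an illustration under stated assumptions: the helpers make_ngrams and ngram_frequencies and the sample lines are hypothetical, and the real app may have used a dedicated tokenizer package instead.

```r
# Split one cleaned line into words and return all of its n-grams of size n.
make_ngrams <- function(line, n) {
  words <- strsplit(trimws(line), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count n-grams across all lines; returns a data frame sorted by frequency n.
ngram_frequencies <- function(lines, n) {
  grams <- unlist(lapply(lines, make_ngrams, n = n))
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), n = integer(0)))
  }
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(ngram = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Hypothetical cleaned lines, used only for illustration.
cleaned_lines <- c("thank much help",
                   "thank much support",
                   "thank much help today")

bigrams   <- ngram_frequencies(cleaned_lines, 2)
trigrams  <- ngram_frequencies(cleaned_lines, 3)
fourgrams <- ngram_frequencies(cleaned_lines, 4)
print(bigrams)
```

Each table has one row per distinct n-gram and a column n holding its count, so the most frequent continuations of a phrase sit at the top.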

Thanks