Data Science Capstone Final Project: Prediction Model on SwiftKey Data

Omid Jazi

May 2, 2021

Introduction

In this project, we create a data product that showcases the prediction algorithm we have built and provides an interface that others can access. Accordingly, we build:

A Shiny application that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.

Description

The data was cleaned and processed with the tm and stringr packages, which provide built-in functions for removing punctuation, stop words, numbers, Twitter handles, etc. The cleaned data was then combined for further analysis.
A one percent sample of the data was used for the project. The n-grams were created by tokenization. The model uses the Stupid Backoff strategy for word prediction.
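
The cleaning and tokenization steps described above can be sketched as follows. The project itself relies on the R packages tm and stringr; this is only a language-neutral illustration in Python, and the names `STOP_WORDS`, `clean_text`, and `ngram_counts` as well as the sample sentence are assumptions made for the example, not part of the original code.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only;
# real stop-word lists are much longer.
STOP_WORDS = {"a", "an", "the", "of", "to", "and"}

def clean_text(text):
    """Lowercase, then strip Twitter handles, numbers, punctuation and stop words."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)        # Twitter handles
    text = re.sub(r"\d+", " ", text)         # numbers
    text = re.sub(r"[^a-z\s']", " ", text)   # punctuation and other symbols
    tokens = [t.strip("'") for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)

tokens = clean_text("Thanks @omid, see you at 10 in the morning!")
bigrams = ngram_counts(tokens, 2)
```

Here `tokens` ends up as `["thanks", "see", "you", "at", "in", "morning"]`, and `bigrams` holds five pairs, each seen once. Calling `ngram_counts` with `n` set to 3 or 4 would give the trigram and quadgram tables in the same way.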

Predictive Model

We trained the model on a one percent sample of the SwiftKey data from blogs, news, and Twitter.
The model uses a set of n-grams. An n-gram is a contiguous sequence of \(n\) items from a given sample of text or speech, and it is used here to predict the next word.
The prediction algorithm looks for a match in the order quadgram, trigram, then bigram. If no result is found, it returns the word “the” as the predicted word.
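
The lookup order described above can be sketched as follows. This is a minimal illustration in Python rather than the project's R code: the three tables and their entries are made up for the example, `predict_next` is a hypothetical helper name, and the sketch simply returns the first match instead of computing backoff scores.

```python
# Hypothetical lookup tables: context tuple -> most frequent next word.
# A real model would build these from the n-gram counts.
QUADGRAMS = {("new", "york", "city"): "council"}
TRIGRAMS = {("york", "city"): "hall"}
BIGRAMS = {("city",): "center"}

DEFAULT_WORD = "the"  # fallback named in the text

def predict_next(phrase):
    """Try the longest context first, then back off to shorter ones."""
    tokens = phrase.lower().split()
    for table, k in ((QUADGRAMS, 3), (TRIGRAMS, 2), (BIGRAMS, 1)):
        context = tuple(tokens[-k:])
        # Skip this table when the phrase has fewer than k words.
        if len(context) == k and context in table:
            return table[context]
    return DEFAULT_WORD
```

With these toy tables, `predict_next("New York City")` returns `"council"` from the quadgram table, `predict_next("my city")` backs off to the bigram table and returns `"center"`, and `predict_next("hello world")` finds no match and returns `"the"`.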

Shiny Application Link & Instruction

The link for the Shiny application is here.
On the left side, there is a text box for entering a phrase. On the right side, the output initially shows a “NULL” value.
Enter a phrase and the algorithm will predict the next word. If no result is found, it returns the word “the” as the predicted word.

Reference

Shiny App: Link
GitHub Repository: Link