Data Science Specialization SwiftKey Capstone: WordPredict App

Divya Subramanian

Introduction

AIM: To create a Shiny app that predicts the next word, given a phrase or a sentence as input.

The capstone dataset was obtained from Coursera-SwiftKey. Only the English blogs, news, and Twitter files are used in this project.

Summary of Datasets

Summary of train dataset

About the data

  • Because the datasets summarized above are too large to process in full, a random sample of 3,000 lines is selected from each file.
  • The sample is cleaned: punctuation, numbers, and extra whitespace are removed, text is converted to lower case, and non-English words are removed.
  • 4-grams, 3-grams, 2-grams, and 1-grams are extracted and stored as separate data tables.
  • A back-off model is used for word prediction.
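The preparation steps above (sampling, cleaning, and building n-gram tables) can be sketched as follows. This is a minimal illustration in Python, not the app's own code: the function names are hypothetical, Python `Counter` dictionaries stand in for the data tables, and "non-English words" are approximated crudely by dropping any token that is not made of ASCII letters (the source does not say how non-English words were detected).

```python
import random
import re
from collections import Counter

def sample_lines(lines, n=3000, seed=42):
    """Draw up to n random lines without replacement; the seed makes it reproducible."""
    rng = random.Random(seed)
    return rng.sample(lines, min(n, len(lines)))

def clean(line):
    """Lower-case, turn punctuation into spaces, split on whitespace,
    then keep only tokens made of ASCII letters (optionally one apostrophe).
    This drops numbers and, as a crude assumption, non-English words."""
    line = line.lower()
    line = re.sub(r"[^\w\s']", " ", line)
    tokens = line.split()  # split() also collapses repeated whitespace
    return [t for t in tokens if re.fullmatch(r"[a-z]+(?:'[a-z]+)?", t)]

def build_ngram_tables(lines, max_n=4):
    """Count every 1-gram .. max_n-gram; tables[n] maps an n-word tuple to its count."""
    tables = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = clean(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    return tables

corpus = ["I love data science!", "I love R and data", "Data science is fun 2day"]
tables = build_ngram_tables(sample_lines(corpus))
print(tables[2][("i", "love")])  # 2
```

Keeping one table per n mirrors the separate tables described above and lets the predictor query each order independently.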

Prediction Algorithm Flow

Flowchart
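The back-off idea behind the flowchart can be sketched as: look up the last three words in the 4-gram table; if nothing matches, back off to the last two words in the 3-gram table, then the last word in the 2-gram table, and finally return the most frequent single words. This Python sketch is an assumption, not the app's implementation. It uses the simplest "highest-order match wins" back-off with no discounting or Katz weights (the source does not say which variant is used), `predict_next` is a hypothetical name, and the small count tables are made up for illustration.

```python
from collections import Counter

def predict_next(phrase, tables, k=3):
    """Return up to k next-word candidates using a simple back-off.

    tables maps n -> Counter of n-word tuples (n = 1..max_n).
    """
    words = phrase.lower().split()
    max_n = max(tables)
    # Longest usable context first: the last (n - 1) words, for n = max_n down to 2
    for n in range(min(max_n, len(words) + 1), 1, -1):
        context = tuple(words[-(n - 1):])
        candidates = Counter()
        # Linear scan for clarity; a real app would index tables by context
        for gram, count in tables[n].items():
            if gram[:-1] == context:
                candidates[gram[-1]] += count
        if candidates:
            return [w for w, _ in candidates.most_common(k)]
    # No context matched: fall back to the most frequent single words
    return [gram[0] for gram, _ in tables[1].most_common(k)]

# Hypothetical toy tables, for illustration only
example_tables = {
    1: Counter({("the",): 5, ("data",): 3, ("science",): 2}),
    2: Counter({("data", "science"): 2, ("data", "table"): 1}),
    3: Counter({("love", "data", "science"): 1}),
    4: Counter(),
}
print(predict_next("I love data", example_tables))  # ['science']
print(predict_next("big data", example_tables))     # ['science', 'table']
```

For "I love data" the 4-gram table has no match, so the sketch backs off to the trigram context ("love", "data"); for an unseen word such as "hello" it returns the most frequent unigrams.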

About the Shiny App

The Shiny app can be found at WordPredict App

The app lets you enter a phrase or sentence and displays the predicted next words in the tab named "The next words". The tab named "How it works" gives a description of the algorithm.

screenshot