6/6/2020

Introduction

The goal of this project is to build a product that showcases the prediction algorithm I have developed and gives others an interface for using it.

Click here to use the app.

Data Cleaning

  • The text data comes from https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip.
  • I remove URLs, special characters, punctuation, numbers, excess whitespace, and stopwords, and convert the text to lowercase.
  • The cleaned text is tokenized into bigrams, trigrams, and quadgrams.
  • To speed up deployment of the app, I upload these three cleaned datasets to the server.
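
The cleaning and tokenizing steps above can be sketched roughly as follows. This is a minimal Python illustration, not the code behind the app: the function names `clean_text` and `ngrams`, the regular expressions, and the tiny `STOPWORDS` set are all assumptions made for the example, since the original stopword list and tooling are not given here.

```python
import re

# Tiny illustrative stopword list (assumption; the real list is not specified).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

def clean_text(text):
    """Lowercase the text, strip URLs and non-letters, and drop stopwords."""
    text = text.lower()
    # Remove URLs before punctuation is stripped, so they disappear as a unit.
    text = re.sub(r"(https?://|www\.)\S+", " ", text)
    # Replace digits, punctuation, and other special characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # split() with no argument also collapses excess whitespace.
    return [word for word in text.split() if word not in STOPWORDS]

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = clean_text("Visit https://example.com NOW, the 3 cats sat in a hat!")
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
quadgrams = ngrams(tokens, 4)
```

One consequence of this simple approach is that contractions are split at the apostrophe ("don't" becomes "don" and "t"); a fuller pipeline would likely handle them separately.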

Prediction Algorithm

  • The algorithm matches the given words against the stored N-grams.
  • If nothing matches, the algorithm uses fewer of the given words and tries to match the N-grams again.
  • If even a single given word produces no match, the algorithm simply returns the word “the”.
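
The back-off steps above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the app's actual implementation: `build_model` and `predict` are hypothetical names, the model is a plain in-memory dictionary of counts, and picking the most frequent continuation is an assumed ranking rule that the slides do not spell out.

```python
from collections import Counter, defaultdict

def build_model(token_lists, max_n=4):
    """Map each context of 1 to max_n - 1 words to counts of the word that follows."""
    model = defaultdict(Counter)
    for tokens in token_lists:
        for n in range(2, max_n + 1):  # bigrams, trigrams, quadgrams
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    return model

def predict(model, words, max_n=4, default="the"):
    """Try the longest context first, then back off by dropping the oldest word."""
    context = tuple(words[-(max_n - 1):])
    while context:
        if context in model:
            # Assumed ranking rule: return the most frequent continuation.
            return model[context].most_common(1)[0][0]
        context = context[1:]
    # No match, even for the single last word: fall back to the default.
    return default

# Toy corpus, made up for illustration only.
corpus = [
    ["happy", "new", "year", "everyone"],
    ["happy", "new", "year", "everyone"],
    ["brand", "new", "car"],
]
model = build_model(corpus)
print(predict(model, ["happy", "new", "year"]))  # matched with three words of context
print(predict(model, ["very", "new"]))           # backs off to the single word "new"
print(predict(model, ["unknown"]))               # nothing matches, falls back to "the"
```

Keeping all three N-gram orders in one lookup table keeps the back-off loop short; separate tables per order would work equally well.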

App Instructions

  • Type your text in the “Enter here” box.
  • Read the predicted results in the area on the right.

Thank you!