Vedant Mane
July 27, 2020
This presentation outlines a text prediction app designed by Vedant Mane.
The GitHub repository is available at https://github.com/vedantmane/SwiftKey-PredictiveTextAnalyis
The objective of this capstone project is to build a text prediction application in Shiny that predicts the next word given a phrase. The application aims to balance prediction accuracy against the computational time needed to produce a suggestion.
This application is built on data obtained from US blogs, news, and Twitter. The dataset is summarized below:
Statistic               Blogs          News           Twitter
Size                    200 MB         200 MB         200 MB
Number of lines         899,288        77,259         2,360,148
Number of characters    206,824,505    15,639,408     162,096,241
Max characters/line     40,833         5,760          140
The dataset is available here
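The statistics in the table can be reproduced with a short script. The sketch below is written in Python for illustration (the app itself is a Shiny application) and assumes the corpus files are UTF-8 text. The file names `en_US.blogs.txt`, `en_US.news.txt` and `en_US.twitter.txt` are hypothetical placeholders for wherever the data was saved. Characters are counted per line without line terminators, so the figures may differ slightly from the table depending on how the original counts were made.

```python
from pathlib import Path
from typing import Iterable, Tuple


def summarize_lines(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (line count, total characters, longest line length), ignoring line terminators."""
    n_lines = n_chars = max_chars = 0
    for line in lines:
        length = len(line.rstrip("\r\n"))
        n_lines += 1
        n_chars += length
        max_chars = max(max_chars, length)
    return n_lines, n_chars, max_chars


def corpus_summary(path: str) -> Tuple[float, int, int, int]:
    """File size in MB followed by the line statistics of a UTF-8 text file."""
    size_mb = Path(path).stat().st_size / 2**20
    with open(path, encoding="utf-8", errors="replace") as f:
        return (size_mb, *summarize_lines(f))


if __name__ == "__main__":
    # Hypothetical file names; adjust to the local copy of the corpus.
    for name in ["en_US.blogs.txt", "en_US.news.txt", "en_US.twitter.txt"]:
        if Path(name).exists():
            size_mb, n_lines, n_chars, max_chars = corpus_summary(name)
            print(f"{name}: {size_mb:.0f} MB, {n_lines} lines, {n_chars} chars, max {max_chars}")
```

Reading the file line by line keeps memory use flat, which matters for files of this size.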
Algorithm
N-gram Prediction Model
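The algorithm predicts the next word with an n-gram model (5-grams, 4-grams, 3-grams and 2-grams), which relies on the Markov assumption that the next word depends only on the few words immediately preceding it. When an n-gram has not been observed in the training data, the model uses Stupid Backoff, falling back to progressively shorter n-grams to predict the next word.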
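A minimal sketch of n-gram next-word prediction with Stupid Backoff, written in Python for illustration and not taken from the app's repository. It assumes lowercase whitespace tokenization, a back-off penalty of 0.4 (the value suggested in the original Stupid Backoff paper), and unigram counts as the final fallback; the class and method names are hypothetical. Each candidate word is scored by its relative frequency after the longest context in which it was observed, multiplied by the penalty once for every level the model had to back off.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Context = Tuple[str, ...]


class StupidBackoffModel:
    """Hypothetical next-word predictor: n-gram counts (n = 1..max_n) scored with Stupid Backoff."""

    def __init__(self, max_n: int = 5, alpha: float = 0.4):
        self.max_n = max_n  # 5 covers the 5-, 4-, 3- and 2-gram tables, plus unigrams
        self.alpha = alpha  # assumed back-off penalty applied once per level
        # next_counts[context][word]: how often `word` followed the tuple `context`
        self.next_counts: Dict[Context, Counter] = defaultdict(Counter)

    def fit(self, sentences: List[str]) -> "StupidBackoffModel":
        for sentence in sentences:
            tokens = sentence.lower().split()  # assumed tokenizer: lowercase + whitespace split
            for i, word in enumerate(tokens):
                # Record `word` after every preceding context of 0..max_n-1 words.
                for k in range(min(i, self.max_n - 1) + 1):
                    self.next_counts[tuple(tokens[i - k:i])][word] += 1
        return self

    def predict(self, phrase: str, top_k: int = 3) -> List[Tuple[str, float]]:
        history = tuple(phrase.lower().split())
        scores: Dict[str, float] = {}
        penalty = 1.0
        # Start from the longest usable context (at most max_n - 1 words) and back off.
        for k in range(min(len(history), self.max_n - 1), -1, -1):
            context = history[len(history) - k:]
            followers = self.next_counts.get(context)  # .get avoids inserting empty keys
            if followers:
                # Total continuations seen after this context (approximates the context count).
                total = sum(followers.values())
                for word, count in followers.items():
                    # Keep the score from the longest context in which the word was seen.
                    if word not in scores:
                        scores[word] = penalty * count / total
            penalty *= self.alpha
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    model = StupidBackoffModel().fit([
        "i love new york", "i love new york city", "i love new music", "we love pizza",
    ])
    # "york" followed "i love new" twice and "music" once, so "york" ranks first.
    print(model.predict("i love new", top_k=2))
    # "we love new" was never seen, so the model backs off to "love new" (scores x 0.4).
    print(model.predict("we love new", top_k=2))
```

Stupid Backoff scores are not normalized probabilities, so the model needs only raw counts and a few dictionary lookups per query. That keeps prediction cheap, in line with the stated trade-off between accuracy and computational time.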
The application is deployed at https://vedantmane.shinyapps.io/swift/