Slide Deck: Your Next Word Predictor

Gian Atmaja
June 9, 2020

Introduction: Your Next Word Predictor

What is it?
It's a web application where you input a word or a set of words and get a prediction of the most probable next word. This data product is part of the capstone project in the Johns Hopkins University Data Science Specialization.

How it's achieved, in a nutshell

The data used for this prediction model consists of 3 text files. They contain English words and phrases extracted from 3 main sources: blogs, news, and Twitter.
The method used is n-gram tokenization: we split the texts into sequences of 1, 2, 3, …, n consecutive words.
We then rank the n-grams by how frequently they appear, and match them against the input received from the app user.
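The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the app's actual code: the function names (`tokenize`, `build_ngram_counts`, `predict_next`), the toy corpus, the cap of 3 words per n-gram, and the rule of falling back to a shorter context when the longer one is unseen are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_counts(lines, max_n=3):
    """Count n-grams of length 2..max_n, grouped by their leading words.

    counts[context][word] is how often `word` followed `context`.
    """
    counts = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                counts[context][tokens[i + n - 1]] += 1
    return counts


def predict_next(counts, phrase, max_n=3):
    """Return the most frequent word seen after the end of `phrase`.

    Tries the longest usable context first, then shorter ones
    (an assumed fallback rule); returns None if nothing matches.
    """
    tokens = tokenize(phrase)
    for k in range(min(max_n - 1, len(tokens)), 0, -1):
        context = tuple(tokens[-k:])
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None


# Toy corpus standing in for the blogs, news, and Twitter files.
corpus = [
    "thank you for the follow",
    "thank you for coming",
    "thank you so much",
    "see you soon",
]
counts = build_ngram_counts(corpus)
print(predict_next(counts, "thank you"))
```

Grouping the counts by context makes the lookup cheap: the user's last one or two words select a table of candidate next words, and the prediction is the most frequent entry in that table.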

Some notes

The data used are only a small fraction of the whole training set, due to memory constraints.
If an error occurs, kindly download the code.
GitHub Repo

Thank you