Gian Atmaja
June 9, 2020
What is it?
It's a web application where you enter a word or a set of words and get a prediction
of the most probable next word. This data product is part of the capstone project
in the Johns Hopkins University Data Science Specialization.
The data used for this prediction model consist of three text files. They contain English
words and phrases extracted from three main sources: blogs, news, and Twitter.
The method used is n-gram tokenization: we split the texts into groups of 1, 2,
3,…,n consecutive words.
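As a rough illustration of the splitting step, here is a minimal Python sketch. It is not the app's actual code; the helper names `tokenize` and `ngrams`, the lowercasing, and the simple letters-and-apostrophes token pattern are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and pull out word tokens (assumed cleaning rule)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("I love data science")
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Asking for groups longer than the text simply yields an empty list, so the same helper can be called for every n without special cases.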
We then rank them by how frequently they appear and match them against the input received
from the app user.
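The ranking and matching step could look something like the Python sketch below. It is only an illustration under stated assumptions: the names `build_model` and `predict_next` are hypothetical, the three-sentence corpus is made up, and falling back to a shorter context when the longer one has never been seen is an assumed strategy rather than something the slides confirm.

```python
from collections import Counter, defaultdict

def build_model(sentences, max_n=3):
    """Map each (n-1)-word prefix to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                *prefix, last = tokens[i:i + n]
                model[tuple(prefix)][last] += 1
    return model

def predict_next(model, text, max_n=3):
    """Return the most frequent continuation, trying the longest prefix first."""
    tokens = text.lower().split()
    # Assumed fallback: shorten the context until a known prefix is found.
    for k in range(min(max_n - 1, len(tokens)), -1, -1):
        prefix = tuple(tokens[len(tokens) - k:])
        if prefix in model:
            return model[prefix].most_common(1)[0][0]
    return None

# Made-up toy corpus, for illustration only.
corpus = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow back",
]
model = build_model(corpus)
suggestion = predict_next(model, "Thanks for the")
```

With this toy corpus, "follow" appears after "for the" twice and "help" once, so the more frequent word is suggested. An input whose last two words were never seen together falls back to the last word alone.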
Only a small fraction of the whole training set is used,
because of memory constraints.
Please download the code if an error occurs.
GitHub Repo