Anang Hudaya Muhamad Amin
9 September 2018
This presentation has been prepared as part of the Johns Hopkins University - Coursera Data Science Capstone Project submission. It briefly describes the Word Prediction Application developed for this course.
The aim of this project is to develop a web-based application that predicts the word a user is most likely to enter next, using a chosen prediction algorithm. The application currently handles English text only; future development may add other languages.
The prediction algorithm applies maximum likelihood estimation (MLE) to an n-gram language model. A sample of 1,000 words is taken from each data file in the dataset (en_US.blogs.txt, en_US.news.txt, and en_US.twitter.txt).
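For reference, the usual count-based form of the MLE for an n-gram model is sketched below. It assumes the standard textbook definition, where C(.) is the number of times a word sequence occurs in the sample; the notation is introduced here for illustration and is not taken from the application's code.

```latex
P_{\mathrm{MLE}}(w_i \mid w_{i-n+1}, \dots, w_{i-1})
  = \frac{C(w_{i-n+1} \dots w_{i-1}\, w_i)}{C(w_{i-n+1} \dots w_{i-1})}
```

For a 2-gram model this reduces to C(w_{i-1} w_i) / C(w_{i-1}): the share of occurrences of the previous word that are followed by the candidate word.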
The NLP and RWeka libraries are used to tokenize the text into 1-, 2-, 3-, and 4-gram models. Each model is saved as an object list in its own Rds file, giving four files in total.
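The tokenization step can be illustrated with a minimal sketch. The application itself uses the R libraries named above; the Python below is only an illustration of the idea, and the function names (tokenize, build_ngram_counts), the whitespace tokenizer, and the sample sentences are all assumptions made for this example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplification, not RWeka's tokenizer)."""
    return text.lower().split()


def build_ngram_counts(lines, max_n=4):
    """Return {n: Counter of n-word tuples} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            # Slide a window of width n over the tokens of this line.
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical sample text standing in for the blog, news, and Twitter files.
sample = ["the cat sat on the mat", "the cat ate"]
models = build_ngram_counts(sample)
```

Here models[1] to models[4] play the role of the four stored n-gram models, each mapping a word tuple to its frequency in the sample.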
The MLE function finds the closest match between the user's input phrase and the four n-gram models, choosing the model according to the number of words entered, and from that match determines the likely next word.
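The lookup can be sketched as follows. This is a hedged illustration in Python rather than the application's R code: the helper names (build_next_word_tables, predict_next) and the sample sentences are hypothetical, and falling back to a shorter context when the longer one has no match is an assumption about what "closest match" means here.

```python
from collections import Counter, defaultdict


def build_next_word_tables(lines, max_n=4):
    """tables[k][context] counts the words seen after each k-word context."""
    tables = {k: defaultdict(Counter) for k in range(max_n)}
    for line in lines:
        tokens = line.lower().split()
        for k in range(max_n):
            for i in range(len(tokens) - k):
                context = tuple(tokens[i:i + k])
                tables[k][context][tokens[i + k]] += 1
    return tables


def predict_next(phrase, tables, max_n=4):
    """Return (word, MLE probability) using the longest context that matches."""
    tokens = phrase.lower().split()
    # Start with up to max_n - 1 trailing words, then shorten (assumed fallback).
    for k in range(min(len(tokens), max_n - 1), -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        followers = tables[k].get(context)
        if followers:
            word, count = followers.most_common(1)[0]
            # MLE: count(context + word) / count(context followed by any word)
            return word, count / sum(followers.values())
    return None, 0.0


# Hypothetical sample text standing in for the real dataset.
sample = ["the cat sat on the mat", "the cat sat down", "the cat ate"]
tables = build_next_word_tables(sample)
word, prob = predict_next("the cat", tables)
```

With an empty context (k = 0) the function simply returns the most frequent single word, which gives a default answer when no part of the input has been seen before.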
This is a simple prototype application for next-word prediction. It uses the dataset obtained from SwiftKey.
There are certain limitations with this app:
Completing this project has been a challenging task.
The Word Prediction App can be accessed at the following URL: https://ananghudaya.shinyapps.io/submission/