Rinnette Ramdhanie
9 May 2020
This app makes it easier to type text by automatically predicting the next word based on what was already typed.
Data Science Capstone presentation
Johns Hopkins University
Source: The data for this project was obtained from Twitter, blogs and news websites. A sample was taken from over 4.2 million lines of text containing more than 105 million words.
Cleaning: This included removal of profanity, punctuation and extra spaces.
Processing: The data was tokenized: unigrams, bigrams, trigrams and quadgrams were obtained with their associated frequencies. Phrases with low frequencies were not used in the model, in order to decrease the size of the files.
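The cleaning and processing steps above can be sketched roughly as follows. This is only an illustration in Python, not the code behind the app: the function names, the two-word PROFANITY placeholder list, the lowercasing step and the min_count threshold are all assumptions, since the presentation does not give those details.

```python
import re
from collections import Counter

# Hypothetical placeholder; the actual profanity list is not given in the source.
PROFANITY = {"darn", "heck"}

def clean_line(line):
    """Lowercase (an assumption), strip punctuation, drop listed words, collapse spaces."""
    line = line.lower()
    # Replace anything that is not a letter, digit, apostrophe or whitespace with a space.
    line = re.sub(r"[^a-z0-9'\s]", " ", line)
    words = [w for w in line.split() if w not in PROFANITY]
    return " ".join(words)

def ngram_counts(lines, n, min_count=1):
    """Count n-grams (as tuples of words) and drop those seen fewer than min_count times."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

cleaned = [clean_line(s) for s in ["I love you!", "I   love cake."]]
bigrams = ngram_counts(cleaned, 2)               # every bigram
frequent = ngram_counts(cleaned, 2, min_count=2)  # only bigrams seen at least twice
```

Raising min_count shrinks the tables at the cost of losing rare phrases, which is the trade-off described above.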
Laplace smoothing was used to calculate the probability for each n-gram.
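As a rough illustration, the standard add-one (Laplace) estimate for a word following a given context is (count of the full n-gram + 1) / (count of the context + vocabulary size). The sketch below assumes that textbook form; the exact formula and vocabulary size used in the app are not stated, and the function and parameter names are hypothetical.

```python
def laplace_prob(ngram_count, context_count, vocab_size):
    """Add-one smoothed probability of a word given its preceding context.

    ngram_count:   times the full n-gram was seen
    context_count: times the preceding (n-1) words were seen
    vocab_size:    number of distinct words (an assumed input)
    """
    return (ngram_count + 1) / (context_count + vocab_size)

seen = laplace_prob(2, 3, 5)    # n-gram observed twice
unseen = laplace_prob(0, 3, 5)  # n-gram never observed, still gets non-zero probability
```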
The algorithm uses a simple backoff method.
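One way a simple backoff could work is sketched below: try the longest available context first and fall back to shorter n-grams when no match is found. This is an assumption about the general technique, not the app's actual code. The tables layout, the function name and the final fallback to the most frequent single word are all hypothetical. Candidates are ranked by raw count here, which gives the same ordering as add-one smoothed probabilities within a single context.

```python
def predict_next(text, tables, max_order=4):
    """Return the most likely next word, or None if the tables are empty.

    tables maps n -> {n-gram tuple: count}, e.g. tables[2][("love", "you")] = 3.
    """
    tokens = text.lower().split()
    # Try quadgrams first, then trigrams, then bigrams.
    for n in range(max_order, 1, -1):
        context = tuple(tokens[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough typed words for this order
        candidates = {
            gram[-1]: count
            for gram, count in tables.get(n, {}).items()
            if gram[:-1] == context
        }
        if candidates:
            return max(candidates, key=candidates.get)
    # Last resort (assumed): the most frequent single word.
    unigrams = tables.get(1, {})
    if unigrams:
        return max(unigrams, key=unigrams.get)[0]
    return None

tables = {
    1: {("the",): 5, ("cake",): 2},
    2: {("love", "you"): 3, ("love", "cake"): 1},
    3: {("i", "love", "cake"): 2},
}
```

With these toy tables, "i love" is matched at the trigram level, "we love" backs off to bigrams, and "hello" falls through to the unigram table.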
Instructions for use:
The application is available at the following link: https://niala.shinyapps.io/predictWordApp/