Next-word prediction helps users complete sentences on mobile phones and tablets. This project explores how to predict the most likely next words in real time using a Shiny app in R. We use a data set provided by SwiftKey.
For our model, we used n-gram tokenization, specifically with N=2 and N=3. After sampling 10% of the given data set and some cleaning, we calculated the frequency and probability of each n-gram, removing [...]. The resulting indexed data.tables are stored for real-time prediction.
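As a rough illustration of the frequency and probability step, here is a minimal sketch using the data.table package. The toy corpus, the `tokenize` and `build_ngrams` helpers, and the column names (`prefix`, `nextword`, `freq`, `prob`) are all assumptions made for the example, not the project's actual code. The probability is assumed to be the relative frequency of a next word given its preceding words.

```r
library(data.table)

# Toy corpus standing in for the 10% sample of the data set (assumption).
corpus <- c("I love data science", "I love data tables", "I love R")

# Hypothetical cleaning step: lower-case, drop non-letters, split into words.
tokenize <- function(line) {
  words <- strsplit(gsub("[^a-z' ]", "", tolower(line)), " +")[[1]]
  words[nzchar(words)]
}

# Build a keyed table of n-grams with frequency and conditional probability.
build_ngrams <- function(lines, n) {
  grams <- rbindlist(lapply(lines, function(line) {
    w <- tokenize(line)
    if (length(w) < n) return(NULL)  # line too short to contain an n-gram
    starts <- seq_len(length(w) - n + 1)
    data.table(
      # The first n-1 words form the lookup prefix.
      prefix = vapply(
        starts,
        function(i) paste(w[i:(i + n - 2)], collapse = " "),
        character(1)
      ),
      # The n-th word is the candidate prediction.
      nextword = w[starts + n - 1]
    )
  }))
  counts <- grams[, .(freq = .N), by = .(prefix, nextword)]
  # Assumed estimate: P(nextword | prefix) = freq / total freq of the prefix.
  counts[, prob := freq / sum(freq), by = prefix]
  setkey(counts, prefix)  # index by prefix for fast lookup
  counts
}

bigrams  <- build_ngrams(corpus, 2)
trigrams <- build_ngrams(corpus, 3)
```

Keying each table on `prefix` lets a lookup such as `trigrams[.("i love")]` use a binary search instead of a full scan, which is one plausible way to keep predictions fast.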
We use a backoff algorithm for predictions: we first look for a match among the 3-grams and, if none is found, fall back to the 2-grams.
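The backoff lookup could be sketched as follows. This is a simplified illustration, not the app's actual implementation: the two small tables, their column names, and the `predict_next` function with its `k` parameter are hypothetical, and no discounting is applied when backing off.

```r
library(data.table)

# Hypothetical pre-computed tables (column names are assumptions).
trigrams <- data.table(
  prefix   = c("i love", "i love", "love data"),
  nextword = c("data", "r", "science"),
  prob     = c(2/3, 1/3, 1)
)
bigrams <- data.table(
  prefix   = c("love", "love", "data"),
  nextword = c("data", "r", "science"),
  prob     = c(2/3, 1/3, 1)
)
setkey(trigrams, prefix)
setkey(bigrams, prefix)

# Return up to k candidate next words, most probable first.
predict_next <- function(text, k = 3) {
  w <- strsplit(tolower(trimws(text)), " +")[[1]]
  w <- w[nzchar(w)]

  # Longest context first: the last two words against the 3-gram table.
  if (length(w) >= 2) {
    hits <- trigrams[.(paste(tail(w, 2), collapse = " ")), nomatch = NULL]
    if (nrow(hits) > 0) return(head(hits[order(-prob), nextword], k))
  }

  # Back off: the last word against the 2-gram table.
  if (length(w) >= 1) {
    hits <- bigrams[.(tail(w, 1)), nomatch = NULL]
    if (nrow(hits) > 0) return(head(hits[order(-prob), nextword], k))
  }

  character(0)  # no match at any level
}

predict_next("I love")   # matched in the 3-gram table
predict_next("we love")  # no 3-gram match, falls back to 2-grams
```

In this sketch an input with no match at either level returns an empty character vector; a real app might instead fall back to the most frequent single words.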