2026-05-22

Objective of the App.

Next-word prediction helps complete sentences on mobile phones and tablets. This project explores how to predict the next likely words in real time using a Shiny app in R. We use a data set provided by SwiftKey.

  • For our model, we used N-gram tokenization, specifically with N=2 and N=3. After sampling 10% of the given data set and some cleaning, we calculated the frequency and probability of each n-gram, removing [...]. The resulting indexed data.tables are stored for real-time prediction.

  • We use a backoff algorithm for predictions: we start with 3-grams and fall back to 2-grams when no matching 3-gram is found.
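The two steps above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the app's actual code: the toy corpus, the function names (`tokenize`, `build_ngrams`, `predict_next`), the column names, and the use of plain data frames instead of indexed data.tables are all hypothetical. It also assumes that "probability" means the conditional probability of the next word given the preceding words, and that backoff is a plain fallback with no discounting.

```r
# Toy corpus standing in for the sampled, cleaned text (hypothetical data).
corpus <- c("i love to read books",
            "i love to read",
            "i want to eat pizza",
            "you want to eat pasta",
            "they want to eat")

# Split one line into lowercase word tokens.
tokenize <- function(line) {
  words <- strsplit(tolower(line), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Build an n-gram table: prefix (first n-1 words), next word, frequency,
# and the assumed conditional probability P(next | prefix).
build_ngrams <- function(lines, n) {
  rows <- lapply(lines, function(line) {
    w <- tokenize(line)
    if (length(w) < n) return(NULL)
    starts <- seq_len(length(w) - n + 1)
    data.frame(
      prefix = vapply(starts,
                      function(i) paste(w[i:(i + n - 2)], collapse = " "),
                      character(1)),
      nxt    = w[starts + n - 1],
      freq   = 1L,
      stringsAsFactors = FALSE
    )
  })
  grams  <- do.call(rbind, Filter(Negate(is.null), rows))
  counts <- aggregate(freq ~ prefix + nxt, data = grams, FUN = sum)
  counts$prob <- counts$freq / ave(counts$freq, counts$prefix, FUN = sum)
  # Most probable continuation first; ties broken alphabetically.
  counts[order(counts$prefix, -counts$prob, counts$nxt), ]
}

# Backoff lookup: try the 3-gram table on the last two words, then fall
# back to the 2-gram table on the last word. Returns up to k candidates.
predict_next <- function(text, bigrams, trigrams, k = 3) {
  w <- tokenize(text)
  if (length(w) >= 2) {
    hit <- trigrams[trigrams$prefix == paste(tail(w, 2), collapse = " "), ]
    if (nrow(hit) > 0) return(head(hit$nxt, k))
  }
  if (length(w) >= 1) {
    hit <- bigrams[bigrams$prefix == tail(w, 1), ]
    if (nrow(hit) > 0) return(head(hit$nxt, k))
  }
  character(0)
}

bigrams  <- build_ngrams(corpus, 2)
trigrams <- build_ngrams(corpus, 3)

predict_next("I love to", bigrams, trigrams)   # 3-gram prefix "love to" is known
predict_next("we need to", bigrams, trigrams)  # unseen prefix, backs off to "to"
```

In this toy corpus the first call is answered from the 3-gram table, while the second has no matching 3-gram prefix and is answered from the 2-gram table, so the two calls can return different words even though both inputs end in "to".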

Relevant Links to the Project.

Snapshots of the App.

The First Page.

[Screenshot of the first page]

The Second Page.

[Screenshot of the second page]

Performance and Possible Improvements.

We tested the app with various sample phrases from three data sets, and it predicted the next word with 73% accuracy. The main challenge we faced in this project was memory usage. We believe that the accuracy of the N-gram prediction model, which employs backoff probability selection, could be improved by including N-grams with larger values of N in the selection process. Unfortunately, as N increases, more memory is required to store the resulting N-grams, and our computer was unable to handle values larger than N=3.
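One way such an accuracy figure could be measured is sketched below in base R. The source does not say how the 73% was computed, so everything here is an assumption: the last word of each phrase is held out as the target, a prediction counts as correct only if the single top guess matches, and `predict_top1` is a hypothetical stand-in for the app's predictor backed by a tiny made-up lookup table.

```r
# Hypothetical stand-in for the app's predictor: one guess based on the
# last word of the context (NA when that word is unknown).
lookup <- c(to = "eat", the = "cat", on = "the")
predict_top1 <- function(context) {
  w <- strsplit(tolower(context), " +")[[1]]
  unname(lookup[tail(w, 1)])
}

# Hold out the last word of each phrase and check whether it is predicted.
evaluate <- function(phrases, predictor) {
  hits <- vapply(phrases, function(p) {
    w <- strsplit(tolower(p), " +")[[1]]
    context <- paste(head(w, -1), collapse = " ")
    target  <- tail(w, 1)
    isTRUE(predictor(context) == target)
  }, logical(1))
  mean(hits)  # share of phrases whose held-out word was predicted
}

# Made-up test phrases, for illustration only.
test_phrases <- c("I want to eat", "look at the cat",
                  "sit on the mat", "go to sleep")
accuracy <- evaluate(test_phrases, predict_top1)
```

With this stand-in, two of the four phrases are predicted correctly, so `accuracy` is 0.5. Counting a hit when the target appears anywhere among the top few suggestions would be an equally plausible definition and would give a higher figure.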