The aim of this project is to create a Shiny application that predicts the next word in a sequence of text, mimicking the functionality seen in predictive text systems, such as those used in smartphone keyboards.
The project follows several steps:
Understanding the problem and preparing the dataset
Conducting exploratory data analysis (EDA)
Tokenizing the input text and building a prediction model
Compiling a milestone report
Developing and deploying the Shiny app for next-word prediction
The data used for this project comes from a public source. After preprocessing, the text was tokenized into n-grams (sequences of n consecutive words), which form the basis for predicting the next word from user input.
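The tokenization step can be illustrated with a minimal sketch. The app's own code is not reproduced in this report, so the example is written in Python purely for illustration. The function names `tokenize` and `ngram_counts`, the cleaning rule (lowercase, keep letters and apostrophes), and the sample sentence are all assumptions, not the project's actual preprocessing.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as words.

    Assumed cleaning rule for illustration; the real preprocessing may differ.
    """
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens.

    Sentence boundaries are ignored here, so an n-gram may span two sentences.
    """
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical sample text standing in for the real corpus.
tokens = tokenize("The cat sat on the mat. The cat ran.")
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

Storing counts per n-gram keeps the later prediction step simple: the most frequent continuation of a given context can be read directly from the table.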
The Shiny application lets users enter a phrase and predicts a likely next word in real time.
Functionality:
Users enter a sentence or phrase in the text box.
The app matches the most recent words of the input against the n-gram models to predict the next word.
The prediction is displayed as the user types, and the user can choose to accept the suggested word.
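One way the lookup described in these steps could work is sketched below in Python, for illustration only. It assumes a simple fallback strategy: try the longest available context first (three words for a quadgram model), then shorter ones. The report does not state how the app combines its n-gram models, so the fallback order, the helper names `build_tables` and `predict_next`, and the toy corpus are all hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())

def build_tables(tokens, max_n=4):
    """Map each context of 1..max_n-1 words to a Counter of the words that followed it."""
    tables = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            tables[context][tokens[i + n - 1]] += 1
    return tables

def predict_next(phrase, tables, max_n=4):
    """Return the most frequent follower of the longest matching context, or None."""
    words = tokenize(phrase)
    for k in range(max_n - 1, 0, -1):      # longest context first (assumed strategy)
        if len(words) < k:
            continue                       # not enough words typed for this context
        context = tuple(words[-k:])
        if context in tables:
            return tables[context].most_common(1)[0][0]
    return None                            # no context of any length was seen in training

# Hypothetical toy corpus standing in for the real training data.
corpus = "the cat sat on the mat the cat sat down the dog ran"
tables = build_tables(tokenize(corpus))

print(predict_next("I saw the cat", tables))
print(predict_next("zebra", tables))
```

In a Shiny app, a function like `predict_next` would typically be called from a reactive expression bound to the text input, so the suggestion is recomputed whenever the text changes.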
The application is designed with a responsive user interface, ensuring that predictions refresh dynamically as more text is entered.
The performance of the n-gram model was tested against a separate dataset to assess its predictive accuracy. The evaluation involved comparing the suggested next word with the actual next word in the text.
Accuracy was compared across bigram, trigram, and quadgram predictions so that the app could offer users the most accurate of these options.
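The comparison can be sketched as a top-1 accuracy measurement on held-out text. The Python example below is illustrative only: the helper names, the longest-context-first fallback, and the tiny training and test sentences are assumptions, and the scores it prints say nothing about the project's real results.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_n):
    """Map each context of 1..max_n-1 words to a Counter of the words that followed it."""
    tables = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            tables[tuple(tokens[i:i + n - 1])][tokens[i + n - 1]] += 1
    return tables

def predict_next(words, tables, max_n):
    """Most frequent follower of the longest known context (assumed fallback order)."""
    for k in range(max_n - 1, 0, -1):
        if len(words) >= k and tuple(words[-k:]) in tables:
            return tables[tuple(words[-k:])].most_common(1)[0][0]
    return None

def accuracy(test_tokens, tables, max_n):
    """Fraction of positions where the suggested word equals the actual next word."""
    total = len(test_tokens) - 1
    if total <= 0:
        return 0.0
    hits = sum(
        predict_next(test_tokens[:i], tables, max_n) == test_tokens[i]
        for i in range(1, len(test_tokens))
    )
    return hits / total

# Hypothetical toy data; the held-out sentence does not appear in the training text.
train = "we like green tea they like black coffee we like green tea".split()
held_out = "they like black tea".split()

results = {}
for n in (2, 3, 4):                        # bigram, trigram, quadgram
    results[n] = accuracy(held_out, build_tables(train, n), n)
    print(f"max order {n}: accuracy {results[n]:.2f}")
```

On this toy data the trigram and quadgram settings score higher than the bigram setting, because the extra word of context resolves an ambiguous continuation. Real corpora need far more held-out text before such a comparison is meaningful.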
The following screenshot shows the user interface of the Shiny app:
This Shiny app provides an easy-to-use interface for predicting the next word in a sentence, using n-gram models to capture language patterns.
Future work could include improving the prediction accuracy by expanding the dataset or incorporating more advanced machine learning techniques, such as Recurrent Neural Networks (RNNs) or Transformer models. Enhancing the real-time performance of the application is also a consideration for future development.