2024-09-05

Project Specifications

The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:

A Shiny app that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.

A slide deck consisting of no more than 5 slides created with RStudio Presenter (https://support.rstudio.com/hc/en-us/articles/200486468-Authoring-R-Presentations) pitching your algorithm and app as if you were presenting to your boss or an investor.

Project Overview

Next Word Prediction Shiny App

Objective: Develop an interactive web application that predicts the next word in a phrase based on a preprocessed corpus of English text.

Key Features:

  • Utilizes n-gram models (unigrams, bigrams, trigrams) for word prediction.

  • Interactive UI with real-time predictions.

  • Includes a progress bar for enhanced user experience.

  • Customizable with company branding through logos.

Algorithm Description

Predictive Algorithm:

1. Data Preparation:

  • Unigrams: Single words with their frequency.

  • Bigrams: Pairs of words with their frequency.

  • Trigrams: Triplets of words with their frequency.

2. Prediction Strategy:

  • Trigrams: Check whether the last two words start a known trigram; if so, predict its most frequent third word.

  • Bigrams: If no trigram match, use the last word to find the most likely next word from bigrams.

  • Fallback: If no bigram match, predict the most common unigram.

3. Efficiency:

  • Utilizes data.table for fast querying and processing.
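
The back-off strategy above can be sketched in a few lines of R. This is a minimal illustration, not the app's actual code: the toy frequency tables, the column names (w1, w2, word, freq) and the function name predict_next are all assumptions made for the example.

```r
library(data.table)

# Toy n-gram frequency tables (hypothetical data; the real app would
# load tables built from its preprocessed corpus).
unigrams <- data.table(word = c("the", "of", "and"), freq = c(50L, 30L, 20L))
bigrams <- data.table(
  w1   = c("of", "of", "thank"),
  word = c("course", "a", "you"),
  freq = c(12L, 5L, 9L)
)
trigrams <- data.table(
  w1   = c("one", "one", "a"),
  w2   = c("of", "of", "lot"),
  word = c("them", "my", "of"),
  freq = c(7L, 2L, 4L)
)

# Keys let data.table find rows by binary search instead of a full scan.
setkey(bigrams, w1)
setkey(trigrams, w1, w2)

# words: character vector of the tokens typed so far.
predict_next <- function(words) {
  n <- length(words)
  # 1. Trigram: look up the last two words.
  if (n >= 2) {
    hit <- trigrams[.(words[n - 1], words[n]), nomatch = NULL]
    if (nrow(hit) > 0) return(hit[which.max(freq), word])
  }
  # 2. Bigram: fall back to the last word only.
  if (n >= 1) {
    hit <- bigrams[.(words[n]), nomatch = NULL]
    if (nrow(hit) > 0) return(hit[which.max(freq), word])
  }
  # 3. Fallback: the most common unigram.
  unigrams[which.max(freq), word]
}
```

With these toy tables, predict_next(c("one", "of")) is answered by the trigram table, predict_next(c("piece", "of")) falls back to the bigram table, and an unseen word such as "zebra" falls through to the most common unigram.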

App Description (1)

App Overview:

User Interface:

  • Text Input: Enter a phrase or sentence.

  • Submit Button: Triggers prediction process.

  • Progress Bar: Indicates ongoing prediction process.

  • Predicted Word Display: Shows the most likely next word.

Features:

  • Real-time updates as users type.

  • Background color customization and company logos for branding.
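
As a rough sketch of how the UI elements listed above could be wired together in Shiny: the input and output IDs ("phrase", "submit", "prediction") and the placeholder predict_next function are assumptions for illustration, and this version predicts on button click rather than on every keystroke. Branding (logos, background color) is left out.

```r
library(shiny)

# Placeholder predictor (hypothetical); the real app would query its
# n-gram tables here.
predict_next <- function(phrase) "the"

ui <- fluidPage(
  titlePanel("Next Word Prediction"),
  textInput("phrase", "Enter a phrase:"),
  actionButton("submit", "Predict"),
  h4("Predicted next word:"),
  textOutput("prediction")
)

server <- function(input, output, session) {
  # Recompute only when the submit button is clicked.
  prediction <- eventReactive(input$submit, {
    # withProgress shows a progress bar while the expression runs.
    withProgress(message = "Predicting...", value = 0, {
      incProgress(0.5)
      result <- predict_next(input$phrase)
      incProgress(0.5)
      result
    })
  })
  output$prediction <- renderText(prediction())
}

# shinyApp(ui, server)  # uncomment to launch the app locally
```

To get updates as the user types instead, one option would be to replace eventReactive(input$submit, ...) with a plain reactive() that reads input$phrase directly.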

App Description (2)

Technical Stack:

  • Backend: R with data.table and dplyr for data manipulation.

  • Frontend: Shiny for interactive web interface.

Click Here to access the app

Please wait while the server loads the data!

Functionality and Benefits (1)

App Functionality:

1. Text Processing:

  • Converts input to lowercase and splits into words. Matches against n-grams to predict the next word.

2. Efficiency:

  • Fast lookups using data.table. Simulates processing delay to mimic real-time computation.

3. User Experience:

  • Visual feedback through progress bar. Brand visibility with company logos.
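
The text-processing step can be illustrated with base R alone. This is a sketch under assumptions: the function names tokenize_input and last_words are hypothetical, and stripping punctuation and digits is an extra cleaning step added here that the description above does not mention.

```r
# Lowercase the input, replace anything that is not a letter, apostrophe
# or space with a space (an assumed cleaning step), then split on whitespace.
tokenize_input <- function(text) {
  cleaned <- gsub("[^a-z' ]", " ", tolower(text))
  words <- strsplit(trimws(cleaned), "\\s+")[[1]]
  words[nzchar(words)]
}

# Keep only the last n words, which is all a trigram model needs as context.
last_words <- function(words, n = 2) tail(words, n)

context <- last_words(tokenize_input("Thank you, very MUCH!"))
```

Here context holds the two words that would be matched against the trigram table; shorter inputs simply yield fewer words, so the lookup falls back to bigrams or unigrams.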

Functionality and Benefits (2)

4. Benefits:

  • Enhances user engagement with real-time predictions.

  • Can be customized and scaled for different applications or datasets.

Next Steps:

  • Test with real-world data.

  • Explore additional features or integrations based on user feedback.

Conclusion

The project demonstrates how a simple natural language processing (NLP) technique, n-gram language modelling, can be used to predict the next word in a given input text.

The project employs n-gram models (unigrams, bigrams, trigrams) to predict the next word. This approach captures different levels of context within the text, which helps prediction accuracy. Using data.table to store and query the n-grams speeds up data manipulation and lookups, which keeps the app responsive.

The Shiny app provides a user-friendly interface with visual elements such as progress bars and logos. This enhances the overall user experience, making the tool more engaging and intuitive.