November 14, 2024

Introduction

Objective:

The goal is to build a Shiny application in R that showcases the prediction algorithm and gives others a convenient, user-friendly interface to it.

How was it created?

  • Preprocessed the data (removed expressions not needed for prediction)
  • Stemmed the text corpus
  • Created N-grams and Document Frequency Matrices from the corpus
  • Tokenized the text into individual words for more accurate word counts
  • Created functions to predict the next word using n-gram and backoff modeling
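
The steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the application's actual code: the toy corpus and the names count_ngrams and predict_next are hypothetical, stemming is omitted, and the backoff is a simple frequency-ranked fallback (trigram, then bigram, then unigram) with no discounting.

```r
# Toy corpus (hypothetical); a real application would use a much larger sample.
corpus <- c("i love new york", "i love new ideas", "new york is big", "i love you")

# Tokenize: lowercase each line and split it into words.
tokens <- strsplit(tolower(corpus), "[^a-z']+")

# Count every n-gram of size n, most frequent first.
count_ngrams <- function(tokens, n) {
  grams <- unlist(lapply(tokens, function(w) {
    if (length(w) < n) return(character(0))
    vapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)
}

uni <- count_ngrams(tokens, 1)
bi  <- count_ngrams(tokens, 2)
tri <- count_ngrams(tokens, 3)

# Predict up to k next words, backing off from trigrams to bigrams to unigrams.
predict_next <- function(phrase, trigrams, bigrams, unigrams, k = 3) {
  words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
  words <- tail(words, 2)  # a trigram model can only condition on two words

  # Return the final word of every n-gram that starts with the given prefix.
  lookup <- function(counts, prefix) {
    hits <- names(counts)[startsWith(names(counts), paste0(prefix, " "))]
    vapply(strsplit(hits, " "), function(g) g[length(g)], character(1))
  }

  if (length(words) == 2) {
    out <- lookup(trigrams, paste(words, collapse = " "))
    if (length(out) > 0) return(head(out, k))
  }
  if (length(words) >= 1) {
    out <- lookup(bigrams, words[length(words)])  # back off to the last word
    if (length(out) > 0) return(head(out, k))
  }
  head(names(unigrams), k)  # final fallback: most frequent single words
}

predict_next("i love", tri, bi, uni)             # "new" "you"
predict_next("they say new york", tri, bi, uni)  # "is"
predict_next("hello new", tri, bi, uni)          # "york" "ideas" (bigram backoff)
```

In the third call no trigram starts with "hello new", so the function falls back to bigrams that start with "new".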

Limitations:

  • If the word or phrase entered in the text box is beyond the scope of the algorithm, every prediction button displays “NA”.
  • For phrases longer than three words, the next word is predicted only from the last two words of the phrase or sentence. This is a constraint of the n-gram model.
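
Both limitations can be reproduced in base R. The snippet below is a sketch only: last_two and button_labels are hypothetical helpers, and the assumption that the app fills three buttons by indexing its candidate vector is ours, not something stated above. In R, indexing a vector past its length yields NA, which is one plausible way "NA" labels could arise.

```r
# Keep only the last two words of the input (hypothetical helper).
last_two <- function(phrase) {
  words <- strsplit(trimws(phrase), "\\s+")[[1]]
  tail(words, 2)
}

last_two("the quick brown fox jumps")  # "fox" "jumps"

# Fill a fixed number of button labels from a candidate vector (assumed: 3).
button_labels <- function(candidates, n_buttons = 3) {
  candidates[seq_len(n_buttons)]  # positions past the end become NA
}

button_labels(c("york"))      # "york" NA NA
button_labels(character(0))   # NA NA NA
```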

Sources