Predicting the Next Word Given a Sequence

2024-05-15

Introduction

This app is my submission for the Data Science Capstone project in JHU’s Data Science Specialization.
It uses a trigram Markov chain to predict the next word from the previous two words.
Models of this kind can make typing easier, especially on touch-screen devices.

Trigram Markov Chain

The app employs a trigram Markov chain model for text prediction.
This statistical model estimates the probability of a word given the previous two words.
The estimates come from how often each three-word sequence occurs in the training dataset.
Trigram models are a standard technique in natural language processing.
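
The idea above can be written out as a formula. This is a sketch of the usual maximum-likelihood formulation, not necessarily the exact estimator the app uses; the count function c(·) and the hat notation are introduced here for illustration.

```latex
% Markov assumption: only the previous two words matter
\[
P(w_n \mid w_1, \dots, w_{n-1}) \approx P(w_n \mid w_{n-2}, w_{n-1})
\]

% Maximum-likelihood estimate from trigram counts c(.)
\[
\hat{P}(w_n \mid w_{n-2}, w_{n-1})
  = \frac{c(w_{n-2}\, w_{n-1}\, w_n)}{\sum_{w} c(w_{n-2}\, w_{n-1}\, w)}
\]
```

For example, if the pair "is some" is followed once by "more" and once by "sample" in the training text, each of those two words gets an estimated probability of 1/2 after "is some".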

Implementation

library(ngram)

We process the corpus and compute the trigrams using the ngram package.

# Example text corpus
corpus <- c(
  "This is some sample text for a demo",
  "This is some more text here"
)
# Tokenize the corpus into trigrams
trigrams_text <- ngram(corpus, n = 3, sep = " ")
# Phrase table: each trigram with its count (freq) and relative frequency (prop)
trigram_probabilities <- get.phrasetable(trigrams_text)
# The most likely next word is computed from these counts
print(trigram_probabilities)

##              ngrams freq prop
## 1     This is some     2  0.2
## 2   more text here     1  0.1
## 3       for a demo     1  0.1
## 4   some more text     1  0.1
## 5     is some more     1  0.1
## 6       text for a     1  0.1
## 7   is some sample     1  0.1
## 8  sample text for     1  0.1
## 9 some sample text     1  0.1
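
One way to turn a phrase table like the one printed above into a prediction is sketched below. The helper `predict_next` and the hand-built data frame `pt` are hypothetical and not taken from the app's source; `pt` simply copies the `ngrams` and `freq` columns shown above so that the example runs in base R without the ngram package.

```r
# Hypothetical helper (an illustration, not the app's actual code):
# return the most likely third word given the previous two words.
predict_next <- function(phrasetable, w1, w2) {
  # Split each trigram into its three words; trimws() guards against
  # stray leading or trailing whitespace in the ngram strings.
  parts <- strsplit(trimws(phrasetable$ngrams), " ", fixed = TRUE)
  words <- do.call(rbind, parts)

  # Keep only the trigrams whose first two words match the context.
  hit <- words[, 1] == w1 & words[, 2] == w2
  if (!any(hit)) return(NA_character_)  # unseen context: no prediction

  # Sum counts per candidate word, then normalise to conditional probabilities.
  counts <- tapply(phrasetable$freq[hit], words[hit, 3], sum)
  probs <- counts / sum(counts)

  # Ties are broken by taking the first candidate in sorted order.
  names(probs)[which.max(probs)]
}

# Phrase table copied by hand from the printed output (assumed columns).
pt <- data.frame(
  ngrams = c("This is some", "more text here", "for a demo",
             "some more text", "is some more", "text for a",
             "is some sample", "sample text for", "some sample text"),
  freq = c(2, 1, 1, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

predict_next(pt, "This", "is")   # returns "some"
predict_next(pt, "not", "seen")  # returns NA: context never observed
```

Returning `NA` for an unseen context keeps the sketch short; a fuller model would typically fall back to bigram or unigram counts in that case.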

App

Input Text: Users enter text in the provided text box.
Context-aware Predictions: Each prediction is conditioned on the last two words of the input, so suggestions follow what the user has typed.
Displaying Most Probable Prediction: The app displays the most probable next word, so users can quickly identify the suggestion.
User-friendly Interface: The interface is simple and intuitive, so users of any experience level can use the app.