2024-05-15

Introduction

  • This app is my submission for the Data Science Capstone project in JHU’s Data Science Specialization.
  • The app uses a trigram Markov chain to predict the next word from the previous two words.
  • The model in this app can make typing easier, especially on touch-screen devices.

Trigram Markov Chain

  • The app employs a trigram Markov chain model for text prediction.
  • This statistical model estimates the probability of a word given the previous two words, which makes it a second-order Markov chain over words.
  • The probabilities are estimated from how often each three-word sequence occurs in the training corpus.
  • Trigram models are a standard, widely used technique in natural language processing.
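
The probability described above can be written out as a formula. This is a sketch that assumes a plain maximum-likelihood estimate with no smoothing or backoff; the slides do not say whether the app applies either. Here C(·) is a hypothetical notation for the number of times a word sequence occurs in the corpus.

```latex
P(w_i \mid w_{i-2}, w_{i-1}) = \frac{C(w_{i-2}\, w_{i-1}\, w_i)}{C(w_{i-2}\, w_{i-1})},
\qquad
\hat{w}_i = \arg\max_{w} P(w \mid w_{i-2}, w_{i-1})
```

The predicted word is the one that maximizes this conditional probability for the two words typed most recently.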

Implementation

  • We process the corpus and compute the trigrams using the ngram package.
library(ngram)
# Example text corpus
corpus <- c(
  "This is some sample text for a demo",
  "This is some more text here"
)
# Tokenize the corpus into trigrams
trigrams_text <- ngram(corpus, n = 3, sep = " ")
# Tabulate each trigram with its frequency (freq) and proportion (prop)
trigram_probabilities <- get.phrasetable(trigrams_text)
print(trigram_probabilities) # These counts are used to pick the most likely next word.
##              ngrams freq prop
## 1     This is some     2  0.2
## 2   more text here     1  0.1
## 3       for a demo     1  0.1
## 4   some more text     1  0.1
## 5     is some more     1  0.1
## 6       text for a     1  0.1
## 7   is some sample     1  0.1
## 8  sample text for     1  0.1
## 9 some sample text     1  0.1
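
One way the most likely next word could be picked from such a phrase table is sketched below. This is an illustration under assumptions, not necessarily the app's actual code: `predict_next` is a hypothetical helper, the phrase table is retyped by hand from the output above so the snippet runs without the ngram package, and ties are broken by taking the first match.

```r
# Phrase table copied from the printed output above (ngrams and freq only).
phrasetable <- data.frame(
  ngrams = c("This is some", "more text here", "for a demo",
             "some more text", "is some more", "text for a",
             "is some sample", "sample text for", "some sample text"),
  freq = c(2, 1, 1, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the most probable third word after `w1 w2`,
# or NA if that two-word context never appears in the table.
predict_next <- function(phrasetable, w1, w2) {
  # Split each trigram into its words (trimws guards against trailing spaces)
  words <- strsplit(trimws(phrasetable$ngrams), " +")
  context <- vapply(words, function(w) paste(w[1], w[2]), character(1))
  next_word <- vapply(words, function(w) w[3], character(1))

  hits <- context == paste(w1, w2)
  if (!any(hits)) return(NA_character_)

  # Conditional probability: trigram count divided by the context total
  probs <- phrasetable$freq[hits] / sum(phrasetable$freq[hits])
  next_word[hits][which.max(probs)]
}

print(predict_next(phrasetable, "This", "is"))
print(predict_next(phrasetable, "for", "the"))
```

With this toy corpus most contexts have a single continuation, so a realistic model would need a much larger corpus and some fallback for unseen contexts.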

App

  1. Input Text: Users type text into the text box provided by the app.
  2. Context-aware Predictions: The app uses the last two words of the input as context, so each suggestion depends on what has just been typed.
  3. Displaying Most Probable Prediction: The most probable next word is displayed, so users can see the suggestion at a glance.
  4. User-friendly Interface: The interface is simple and intuitive, so no prior experience is needed to use the app.
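
Steps 1 and 2 imply that the app reduces the typed text to its last two words before looking up a prediction. A minimal sketch of that step, assuming whitespace-separated words and no further cleaning (the slides do not describe the app's preprocessing), with `last_two_words` as a hypothetical helper name:

```r
# Hypothetical helper: extract the last two words from free-form input.
# Returns NULL when fewer than two words have been typed.
last_two_words <- function(text) {
  # Split on runs of whitespace and drop any empty pieces
  words <- strsplit(trimws(text), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(NULL)
  tail(words, 2)
}

print(last_two_words("This is some"))
print(last_two_words("Hello"))
```

The returned pair would then serve as the context for the trigram lookup; returning NULL for short input lets the caller decide what to show before two words are available.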