`r Sys.Date()`

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

## Project Overview

This project presents a Shiny application that predicts the next word in a phrase using a trigram model. It demonstrates how basic natural language processing (NLP) techniques can be implemented in R to create an interactive and reproducible tool.

## Objective

  • Build a functional app that predicts the next word based on user input.
  • Use a custom corpus and trigram frequency analysis.
  • Deploy the app publicly via shinyapps.io.

## Methodology

  • Corpus: A manually curated set of English sentences.
  • Tokenization: Using unnest_tokens() from tidytext to extract trigrams.
  • Frequency Table: Built with count() to identify common word sequences.
  • Prediction Logic:
    • Extract the last two words from the user's input.
    • Match them against the first two words of each trigram in the table.
    • Return up to three of the most frequent continuations.
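
To make the tokenization and matching steps above concrete, here is a small illustration in base R. It is not the app's tidytext code: the variable names are chosen for this example only, and the sliding window stands in for what `unnest_tokens()` does with `token = "ngrams"`.

```r
# Illustration only: base R instead of tidytext::unnest_tokens().
sentence <- "She posted a photo of her birthday cake"
words <- strsplit(tolower(sentence), " ")[[1]]
n_words <- length(words)

# Slide a three-word window across the sentence.
trigrams <- paste(words[1:(n_words - 2)],
                  words[2:(n_words - 1)],
                  words[3:n_words])
print(trigrams)

# The lookup key for a prediction is the last two words of the input.
input_words <- strsplit(tolower("of her birthday"), " ")[[1]]
key <- paste(tail(input_words, 2), collapse = " ")

# A trigram matches when it starts with the key followed by a space.
matches <- trigrams[startsWith(trigrams, paste0(key, " "))]
print(matches)  # "her birthday cake"
```

An eight-word sentence yields six trigrams, and only the last one begins with the key "her birthday", so its third word is the candidate continuation.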

## Corpus Example

```r
corpus <- c(
  "The government announced a new economic policy",
  "She posted a photo of her birthday cake",
  "The hurricane is expected to make landfall on the coast",
  "He was arrested after being caught with illegal drugs",
  "The company plans to expand its operations in Asia"
)
```

## Trigram Construction

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Assumed setup: text_df holds the corpus as a one-column tibble
text_df <- tibble(text = corpus)

ngrams <- text_df %>%
  unnest_tokens(ngram, text, token = "ngrams", n = 3) %>%
  separate(ngram, into = c("word1", "word2", "word3"), sep = " ") %>%
  count(word1, word2, word3, sort = TRUE)
```

## Prediction Function

```r
library(dplyr)
library(stringr)

predict_next_words <- function(input_text) {
  # Lowercase and collapse extra whitespace so empty strings are not counted as words
  input_text <- str_squish(tolower(input_text))
  words <- str_split(input_text, " ")[[1]]
  if (length(words) < 2) return("Please enter at least two words.")
  last_two <- tail(words, 2)
  predictions <- ngrams %>%
    filter(word1 == last_two[1], word2 == last_two[2]) %>%
    # with_ties = FALSE caps the result at three rows, most frequent first
    slice_max(n, n = 3, with_ties = FALSE) %>%
    pull(word3)
  if (length(predictions) == 0) return("No prediction found.")
  paste(predictions, collapse = ", ")
}
```

## App Interface

  • Text input for the phrase
  • Predict button
  • Output area for the predicted words

Screenshot: (Insert image of your app here)
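
The three interface elements listed above could be wired together roughly as follows. This is a minimal sketch, not the deployed app's code: the input and output IDs (`phrase`, `predict`, `prediction`) and the page title are assumptions, and `predict_next_words()` is replaced by a placeholder so the block runs on its own.

```r
library(shiny)

# Placeholder standing in for the real prediction function (hypothetical body).
predict_next_words <- function(input_text) {
  paste("prediction for:", input_text)
}

ui <- fluidPage(
  titlePanel("Next Word Predictor"),          # assumed title
  textInput("phrase", "Enter a phrase:"),     # text input for the phrase
  actionButton("predict", "Predict"),         # predict button
  textOutput("prediction")                    # output area for predicted words
)

server <- function(input, output, session) {
  # Recompute only when the button is clicked, not on every keystroke.
  result <- eventReactive(input$predict, {
    predict_next_words(input$phrase)
  })
  output$prediction <- renderText(result())
}

app <- shinyApp(ui, server)

# Launch only in an interactive session so the script can also be sourced.
if (interactive()) runApp(app)
```

Using `eventReactive()` ties the prediction to the button, which matches an interface with an explicit Predict button; a purely reactive version would update as the user types.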

## Deployment

## Example Results

  • Input: The hurricane is expected to make landfall on the → Prediction: coast
  • Input: She posted a photo of her birthday → Prediction: cake
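
The two results above can be checked without installing any packages. The sketch below rebuilds the pipeline in base R under simplifying assumptions: the helper names (`tokenize`, `sentence_trigrams`, `predict_next`) are hypothetical, and the normalization is cruder than tidytext's tokenizer, so it is a cross-check rather than the app's implementation.

```r
# Base-R cross-check of the trigram pipeline; helper names are hypothetical.
corpus <- c(
  "The government announced a new economic policy",
  "She posted a photo of her birthday cake",
  "The hurricane is expected to make landfall on the coast",
  "He was arrested after being caught with illegal drugs",
  "The company plans to expand its operations in Asia"
)

# Lowercase, replace anything but letters/apostrophes with spaces, then split.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z' ]+", " ", tolower(text))
  tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]
  tokens[nzchar(tokens)]
}

# Every run of three consecutive tokens in one sentence, one row per trigram.
sentence_trigrams <- function(text) {
  tokens <- tokenize(text)
  n <- length(tokens)
  if (n < 3) return(NULL)
  data.frame(
    word1 = tokens[1:(n - 2)],
    word2 = tokens[2:(n - 1)],
    word3 = tokens[3:n],
    stringsAsFactors = FALSE
  )
}

trigrams <- do.call(rbind, lapply(corpus, sentence_trigrams))

# Frequency table: one row per distinct trigram with its count n.
trigram_counts <- aggregate(n ~ word1 + word2 + word3,
                            data = cbind(trigrams, n = 1L), FUN = sum)

# Return up to top_n continuations of the last two input words.
predict_next <- function(input_text, counts = trigram_counts, top_n = 3) {
  tokens <- tokenize(input_text)
  if (length(tokens) < 2) return(character(0))
  last_two <- tail(tokens, 2)
  hits <- counts[counts$word1 == last_two[1] & counts$word2 == last_two[2], ]
  hits <- hits[order(-hits$n), ]
  head(hits$word3, top_n)
}

predict_next("The hurricane is expected to make landfall on the")  # "coast"
predict_next("She posted a photo of her birthday")                 # "cake"
```

With only five sentences, each matching context has a single continuation, which is why both inputs return exactly one word.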

## Future Improvements

  • Expand corpus with thematic texts (e.g., sustainability, economics)
  • Add smoothing for unseen phrases
  • Visualize trigram network
  • Translate app for Spanish corpora
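
As one possible direction for handling unseen phrases, the sketch below falls back to bigram counts when no trigram matches. Strictly speaking this is backoff rather than smoothing, since it does not reassign probability mass. Everything here is hypothetical: the function names, the two-sentence sample taken from the corpus, and the choice to back off by a single level.

```r
# Hypothetical backoff sketch: try trigrams first, then bigrams.
sentences <- c(
  "the hurricane is expected to make landfall on the coast",
  "the company plans to expand its operations in asia"
)

# All runs of k consecutive words in a sentence, as space-joined strings.
ngrams_of <- function(sentence, k) {
  w <- strsplit(sentence, " ")[[1]]
  if (length(w) < k) return(character(0))
  vapply(seq_len(length(w) - k + 1),
         function(i) paste(w[i:(i + k - 1)], collapse = " "),
         character(1))
}

# Named integer vector: n-gram string -> count.
count_ngrams <- function(k) {
  tab <- table(unlist(lapply(sentences, ngrams_of, k = k)))
  setNames(as.integer(tab), names(tab))
}

tri_counts <- count_ngrams(3)
bi_counts  <- count_ngrams(2)

# Words that follow a context, ranked by how often the n-gram occurs.
continuations <- function(counts, context, top_n = 3) {
  prefix <- paste0(context, " ")
  hits <- counts[startsWith(names(counts), prefix)]
  hits <- sort(hits, decreasing = TRUE)
  sub(prefix, "", head(names(hits), top_n), fixed = TRUE)
}

predict_with_backoff <- function(input_text) {
  w <- strsplit(tolower(trimws(input_text)), "\\s+")[[1]]
  w <- w[nzchar(w)]
  if (length(w) >= 2) {
    out <- continuations(tri_counts, paste(tail(w, 2), collapse = " "))
    if (length(out) > 0) return(out)
  }
  # No trigram match (or only one word given): use the last word alone.
  if (length(w) >= 1) return(continuations(bi_counts, tail(w, 1)))
  character(0)
}

predict_with_backoff("landfall on the")  # trigram match: "coast"
predict_with_backoff("we visited the")   # no trigram match, falls back to bigrams
```

The second call has no trigram starting with "visited the", so it returns words that follow "the" in the sample sentences instead of reporting that no prediction was found.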

## Conclusion

This app demonstrates how a simple trigram model can be used to build an interactive NLP tool in R. It is reproducible, adaptable, and useful for teaching, experimentation, and further development.