Data Science Capstone

Humberto Renteria

2024-02-07

Data Science Capstone Project - Humberto Hernandez Renteria

In this project, we are tasked with developing a word prediction system using n-grams. We build unigrams (monograms), bigrams, and trigrams from the text, and the n-gram used for a prediction depends on how many words of the input are available.
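As a minimal sketch of how text can be split into unigrams, bigrams, and trigrams, the hypothetical base R helper `make_ngrams()` below (not the project's own code) lowercases each line, splits it into words, and counts every window of `n` consecutive words. Each element of the input is treated as a separate line, so n-grams never span two lines.

```r
# Hypothetical helper: build and count the n-grams of a character vector
make_ngrams <- function(lines, n) {
  grams <- unlist(lapply(lines, function(line) {
    # Lowercase, then split on anything that is not a letter, digit or apostrophe
    words <- strsplit(tolower(line), "[^a-z0-9']+")[[1]]
    words <- words[words != ""]
    if (length(words) < n) return(character(0))
    # Every window of n consecutive words, joined by a single space
    vapply(seq_len(length(words) - n + 1),
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), count = integer(0)))
  }
  # Count each distinct n-gram, most frequent first (ties in alphabetical order)
  counts <- table(grams)
  out <- data.frame(ngram = names(counts), count = as.integer(counts),
                    stringsAsFactors = FALSE)
  out <- out[order(-out$count, out$ngram), , drop = FALSE]
  rownames(out) <- NULL
  out
}

news <- c("I love data science", "I love R")
make_ngrams(news, 2)$ngram  # "i love" "data science" "love data" "love r"
```

Calling the same function with `n = 1` and `n = 3` gives the unigram and trigram tables.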

Main Working Code

This section shows the most critical part of the code. The algorithm extracts the last word of the input sentence, finds a bigram whose first word matches it, and uses that bigram's second word as the prediction.

Below is the main code:

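    library(dplyr)
    library(stringr)

    # Text typed by the user in the Shiny app, lowercased to match the bigram table
    word_to_start_with <- tolower(input$name)

    # Last word of the sentence, ignoring trailing spaces or punctuation
    last_word <- str_extract(word_to_start_with, "\\w+(?=\\W*$)")

    result <- newsBigrams %>%
      # Keep bigrams whose first word is the last word typed
      dplyr::filter(str_detect(bigram, paste0("^", last_word, "\\b"))) %>%
      # The second word of the bigram is the predicted word
      mutate(second_word = str_extract(bigram, "\\w+$")) %>%
      # Take the earliest matching bigram in the corpus
      arrange(line) %>%
      slice(1) %>%
      pull(second_word)

    return(result)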
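The main code above keeps the earliest matching bigram in the corpus (ordered by `line`). A common alternative, sketched here as an assumption rather than the project's method, ranks candidates by frequency and backs off from trigrams to bigrams depending on how many words the input contains. The function `predict_next()` and the toy frequency table are hypothetical, and the counts are made up for illustration.

```r
# Hypothetical sketch: frequency-ranked next-word prediction with back-off
# from trigrams (two words of context) to bigrams (one word of context).
predict_next <- function(sentence, ngrams) {
  # Lowercase the input and split it into words
  words <- strsplit(tolower(sentence), "[^a-z0-9']+")[[1]]
  words <- words[words != ""]
  if (length(words) == 0) return(NA_character_)

  # Number of words in each n-gram of the table
  sizes <- lengths(strsplit(ngrams$ngram, " "))

  # Longest context first: 2 words (trigrams), then 1 word (bigrams)
  for (k in rev(seq_len(min(2, length(words))))) {
    prefix <- paste0(paste(tail(words, k), collapse = " "), " ")
    # Candidates: n-grams of k + 1 words that start with the context
    hits <- ngrams[startsWith(ngrams$ngram, prefix) & sizes == k + 1, ]
    if (nrow(hits) > 0) {
      best <- hits$ngram[which.max(hits$count)]
      return(sub(".* ", "", best))  # keep only the last word
    }
  }
  NA_character_  # no matching n-gram at any order
}

# Toy frequency table (made-up counts, for illustration only)
ngrams <- data.frame(
  ngram = c("i love", "love data", "love r", "data science",
            "i love r", "i love data"),
  count = c(6, 5, 4, 3, 2, 1),
  stringsAsFactors = FALSE
)

predict_next("Do I love", ngrams)  # trigram "i love r" is used: "r"
predict_next("You love", ngrams)   # backs off to bigram "love data": "data"
```

Ranking by `count` returns the most common continuation instead of the first one seen in the corpus; the final `NA` could be replaced by a unigram fallback such as the most frequent word overall.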

Program Deployment

The program is deployed on shinyapps.io and can be tried at this link:

Click Here

Conclusion

N-gram models of this kind have long been used in the word predictors we encounter daily, on phones, computers, and the web.

Machine learning tools are becoming increasingly available and widespread.