2023-08-14

Background

This presentation briefly describes a project that predicts the next word of a sentence fragment or phrase.

The application is the capstone project for the Coursera Data Science Specialization, offered by Johns Hopkins University with support from SwiftKey.

Goals and objectives of the project

  • The main goal was to develop a predictive text algorithm in R and make it available through a Shiny web application.

  • The application was developed using a sample of English-language Twitter posts provided by SwiftKey.

Algorithm

Once the English data was loaded, the algorithm counted the lines, removed profanity, and tokenized the text; the resulting tokens were then organized into n-gram sequences.
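The cleaning steps above can be sketched roughly as follows. This is an illustration in Python, not the project's R code: the word list `PROFANITY` and the helpers `tokenize`, `clean_tokens`, and `ngrams` are hypothetical, and the tokenizer simply keeps runs of lowercase letters and apostrophes.

```python
import re

# Hypothetical placeholder list; the project's actual profanity list is not given.
PROFANITY = {"darn", "heck"}

def tokenize(line):
    """Lowercase a line and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", line.lower())

def clean_tokens(line, banned=PROFANITY):
    """Tokenize a line and drop any token that appears in the banned-word set."""
    return [tok for tok in tokenize(line) if tok not in banned]

def ngrams(tokens, n):
    """Return the consecutive n-word tuples of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = clean_tokens("Heck, I love the new phone!")
print(tokens)             # ['i', 'love', 'the', 'new', 'phone']
print(ngrams(tokens, 2))  # bigrams such as ('i', 'love') and ('love', 'the')
```

Dropping a filtered word this way joins its neighbours into n-grams that never appeared in the original text; splitting the line at each removed word is one alternative.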

This produced bigram, trigram, and quadrigram models, which were then converted into frequency dictionaries sorted by count.
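A minimal sketch of turning tokenized lines into frequency dictionaries sorted by count, assuming Python's `collections.Counter` in place of whatever R structures the project used; `build_models` and its `orders` parameter are hypothetical names.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the consecutive n-word tuples of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_models(tokenized_lines, orders=(2, 3, 4)):
    """Count bigrams, trigrams, and quadrigrams; return one dict per order,
    ordered from the most to the least frequent n-gram."""
    models = {}
    for n in orders:
        counts = Counter()
        for tokens in tokenized_lines:
            counts.update(ngrams(tokens, n))
        # most_common() yields (ngram, count) pairs sorted by count, highest first
        models[n] = dict(counts.most_common())
    return models

corpus = [["i", "love", "the", "beach"],
          ["i", "love", "the", "phone"],
          ["i", "love", "you"]]
models = build_models(corpus)
print(models[2][("i", "love")])  # 3
```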

Application

Below is the link to the application: https://nyandele.shinyapps.io/shiny/

When the application first loads, it checks for input and displays a message asking the user to enter a word or phrase.

The user then enters a word or phrase and clicks Submit. The application then displays two items:

  • The phrase
  • The next word

The application first looks for a match in the quadrigram model and, if it finds none, works its way down to the trigram and then the bigram model until it finds a predicted word.
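This lookup order is a simple back-off: use the longest available context first, then fall back to shorter ones. A minimal Python sketch, assuming the hypothetical helpers `build_counts` and `predict_next` and a toy corpus; it returns the most frequent continuation at the first order that matches and applies no discounting (such as Katz back-off or Stupid Backoff scoring), since none is mentioned for the original app.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return the consecutive n-word tuples of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_counts(lines, orders=(2, 3, 4)):
    """Tokenize each line and count its n-grams for every requested order."""
    token_lists = [re.findall(r"[a-z']+", line.lower()) for line in lines]
    return {n: Counter(g for toks in token_lists for g in ngrams(toks, n))
            for n in orders}

def predict_next(phrase, counts):
    """Try the quadrigram table first, then back off to trigrams and bigrams."""
    tokens = re.findall(r"[a-z']+", phrase.lower())
    for n in sorted(counts, reverse=True):  # 4, then 3, then 2
        if len(tokens) < n - 1:
            continue  # phrase is too short to supply this order's context
        context = tuple(tokens[len(tokens) - (n - 1):])
        # Map each possible next word to the count of the n-gram it completes
        candidates = {g[-1]: c for g, c in counts[n].items() if g[:-1] == context}
        if candidates:
            return max(candidates, key=candidates.get)
    return None  # no n-gram of any order matched

lines = ["I love the beach", "I love the beach today",
         "I love the phone", "thanks for the help"]
counts = build_counts(lines)
print(predict_next("Do you think I love the", counts))  # beach (quadrigram match)
print(predict_next("for the", counts))  # help (two words, so trigram lookup)
```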