Coursera Data Science Capstone Project

Shubham Nagle
July 31 2020

  • This project predicts text using Natural Language Processing techniques.
  • This project is sponsored by
    • Coursera and
    • SwiftKey

OVERVIEW

  • The objective of this project is to develop an application that predicts the next word in a phrase entered by the user. The application should be:

    • Fast, to respond quickly to user input.
    • Lightweight, to preserve device resources.
    • Interactive, to make the application easy to use.
  • The data came from HC Corpora, in three files (blogs, news, and Twitter).

  • The data was cleaned, processed, and tokenized, and n-grams were created from it. The algorithm uses these n-grams to predict the next word based on the text entered by the user.

A PREDICTIVE TEXT MODEL

  • The data was loaded into R.
  • A sample was drawn, cleaned (by lower-casing and by removing links, Twitter handles, punctuation, numbers, extra whitespace, etc.), and prepared for use as a text corpus.
  • The sample text was “tokenized” into so-called n-grams to construct the predictive models.
  • The n-gram files or data.frames (unigram, bigram, trigram, and quadgram) are frequency tables of word sequences; the algorithm uses them to predict the next word based on the text entered by the user.
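
The cleaning and tokenizing steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's R code: the function names `clean_text` and `ngram_counts`, the regular expressions, and the two sample lines are all assumptions made for the example.

```python
import re
from collections import Counter

def clean_text(text):
    """Lower-case the text and strip links, Twitter handles,
    punctuation, numbers, and extra whitespace (hypothetical helper)."""
    text = text.lower()
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # links
    text = re.sub(r"@\w+", " ", text)                  # Twitter handles
    text = re.sub(r"[^a-z\s]", " ", text)              # punctuation and numbers
    return re.sub(r"\s+", " ", text).strip()           # extra whitespace

def ngram_counts(lines, n):
    """Count every run of n consecutive words in the cleaned lines."""
    counts = Counter()
    for line in lines:
        tokens = clean_text(line).split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Made-up sample lines, standing in for the sampled corpus.
sample = [
    "Thanks for the follow @user!  See http://example.com",
    "Thanks for the 2 great tips.",
]
bigrams = ngram_counts(sample, 2)
quadgrams = ngram_counts(sample, 4)
```

Note that this simple pattern also splits contractions ("don't" becomes "don t"); a real pipeline would need to decide how to treat apostrophes.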

THE SHINY APPLICATION

  • The Shiny application predicts the next possible word based on the text entered by the user.
  • The application provides a text input box where the user types a word or phrase, detects the words typed, and returns the most probable next word in the output box.
  • The predicted word is obtained from the n-gram tables by comparing the input against the frequencies of the tokenized 2-, 3-, and 4-gram sequences. In essence, the prediction is based on the longest, most frequent matching n-gram.
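
The "longest, most frequent matching n-gram" rule can be sketched as a simple back-off lookup. This is an illustrative Python sketch under stated assumptions, not the application's actual R code: the names `build_tables` and `predict_next`, the toy corpus, and the absence of any smoothing or probability weighting are choices made for the example.

```python
from collections import Counter, defaultdict

def build_tables(sentences, max_n=4):
    """For n = 2..max_n, map each (n-1)-word context to a Counter
    of the words observed to follow it (hypothetical helper)."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict_next(text, tables, max_n=4):
    """Try the longest context first (4-gram), then back off to
    shorter ones; return the most frequent follower, or None."""
    tokens = text.lower().split()
    for n in range(max_n, 1, -1):
        if len(tokens) < n - 1:
            continue  # not enough typed words for this n-gram order
        context = tuple(tokens[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None

# Toy corpus, made up for illustration only.
corpus = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow back",
    "see you at the park",
]
tables = build_tables(corpus)
```

With this toy corpus, `predict_next("Thanks for the", tables)` matches a 4-gram directly, while `predict_next("walk to the", tables)` finds no 4-gram or 3-gram match and falls back to the bigram table for "the".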

THE APP USER INTERFACE

The user simply enters a word or phrase in the text box, and suggested next words will appear below it. Instructions are provided in the left sidebar to ensure a smooth user experience.

Application Screenshot

APP AND RESOURCES

  • The final report is available at the link: Milestone Report.

  • The next word prediction app is hosted on shinyapps.io: Shiny app

  • Thank You