Johns Hopkins Data Science Specialization Capstone Project

Shashwat Khare
15-May-2019

Word Prediction App Description

The app predicts the next word using an n-gram model built in R, and is hosted as a Shiny app.

  • The user can type any word or sentence in the input box
  • The app uses data from three corpora (blogs, Twitter, and news)

Link to the Shiny app: https://shashwatkhare03.shinyapps.io/Word_Prediction/

Data Gleaning

  • Merged data from the three sources (blogs, Twitter, and news) into one data file
  • Cleaned the data, including converting to lower case and removing special characters
  • Created bigrams, trigrams, and quadgrams
  • Created data frames from those n-grams to count the most frequent terms
  • Found the most frequently occurring word in the corpus
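The cleaning and counting steps above can be sketched roughly as follows. The app itself is written in R; this Python version only illustrates the logic, and the sample text, the `clean` and `ngram_counts` helpers, and the exact character filter are assumptions rather than the project's actual code.

```python
import re
from collections import Counter


def clean(text):
    """Lower-case the text, replace special characters with spaces, and tokenize."""
    text = text.lower()
    # Assumption: keep only letters, apostrophes, and spaces.
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical stand-in for the merged blogs, Twitter, and news text.
sample = "Thanks for the follow! Thanks for the great day, and thanks for the help."
tokens = clean(sample)

unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
quadgrams = ngram_counts(tokens, 4)

# Most frequent trigram and its count, analogous to the top row of a frequency data frame.
top_trigram = trigrams.most_common(1)[0]
```

Each `Counter` plays the role of one of the frequency data frames: its keys are the n-grams and its values are how often they occur.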

Data Exploration

  • Used the data frames to create word clouds and bar plots of the most frequently occurring words

You can view an image of one such word cloud here

How Word Prediction Works

  • The algorithm first checks the highest-order n-gram model (n=4)
  • If no match is found for n=4, it checks the next lower-order model (n=3)
  • If no match is found for n=3, it checks the bigram model (n=2)
  • If no match is found for n=2, the app returns “No Match Found”
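The back-off sequence above might look like the sketch below. It is written in Python for illustration only, since the app itself is in R. The `predict_next` function, the layout of `tables`, the toy counts, and the choice to return the most frequent continuation are all assumptions, not details confirmed by the project.

```python
def predict_next(text, tables):
    """Back off from the quadgram table down to the bigram table.

    `tables` maps n (4, 3, 2) to {context tuple: {next word: count}},
    where the context is the n-1 words preceding the predicted word.
    """
    words = text.lower().split()
    for n in (4, 3, 2):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough typed words for this order
        candidates = tables.get(n, {}).get(context)
        if candidates:
            # Assumption: pick the continuation with the highest count.
            return max(candidates, key=candidates.get)
    return "No Match Found"


# Hypothetical toy tables; a real app would build these from the corpus.
tables = {
    4: {("thanks", "for", "the"): {"follow": 5, "help": 2}},
    3: {("for", "the"): {"first": 4}},
    2: {("the",): {"same": 7}},
}

quad_hit = predict_next("Thanks for the", tables)   # matched at n=4
tri_hit = predict_next("wait for the", tables)      # backs off to n=3
bi_hit = predict_next("see the", tables)            # backs off to n=2
no_hit = predict_next("hello", tables)              # no order matches
```

Checking the longest context first means a more specific match always wins over a more general one, and a shorter context is used only when the longer one has never been seen.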

Enjoy the app!