Presentation for pitch

Arjyahi Bhattacharya
18/07/2021

Word Prediction App

Use this app to predict the next word.

  • Users start typing a sentence, and the app predicts the next word
  • App uses a subset of data from the three data sources (blogs, twitter & news)
  • App is modeled on predictive keyboards such as SwiftKey, the industry partner for the capstone

Project Overview

This presentation was created as the final step in the Capstone project for the Data Science Specialization offered by Johns Hopkins University through Coursera.

The project goal was to build a predictive model of English text. The skills needed to complete this task include natural language processing and text mining. The app was built with Shiny in RStudio.

Link to my app: https://arjyahi.shinyapps.io/WordPredictor_for_DataScienceCapstone/

Data Gathering & Cleansing

  • Merged data from the 3 Data Sources into one data file (Blogs, Twitter & News)
  • Cleansed the data: converted to lowercase, stripped extra white space, and removed punctuation & numbers
  • Created bigrams, trigrams and quadgrams (n-grams with n = 2, 3 and 4)
  • Extracted term-count tables from the n-grams
  • Sorted in descending order based on frequency
  • Saved n-gram objects
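
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the code behind the app (which was built in R); the names clean_text and ngram_counts are hypothetical, and the regular expression is a simplifying assumption that drops everything except a-z letters and white space.

```python
import re
from collections import Counter


def clean_text(text):
    """Lowercase, remove punctuation and numbers, collapse white space."""
    text = text.lower()
    # Assumption: keep only a-z and white space; this also drops non-ASCII letters.
    text = re.sub(r"[^a-z\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def ngram_counts(lines, n):
    """Return (n-gram, count) pairs sorted by descending frequency."""
    counts = Counter()
    for line in lines:
        tokens = clean_text(line).split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts.most_common()


# Tiny made-up sample standing in for the merged blogs/twitter/news data.
sample = ["The cat sat.", "The cat ran 2 miles!"]
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
```

In this sketch n-grams never span two input lines, which is one possible design choice; the original pipeline may have handled line boundaries differently.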

Word Prediction Algorithm & Summary

  • Algorithm first checks the highest-order n-gram model (n=4) for a match on the last three words typed
  • If no match is found for n=4, it checks the next lower-order model (n=3)
  • If no match is found for n=3, it checks the lowest-order model (n=2)
  • If no match is found for n=2, the app returns “No Match Found”
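
The back-off lookup described above might look something like the sketch below. It is written in Python for illustration only and is not the app's actual R code; build_lookup, predict_next and the tiny frequency tables are hypothetical, and the tables are assumed to be already sorted by descending count.

```python
def build_lookup(sorted_counts):
    """Map each (n-1)-word prefix to its most frequent next word.

    Assumes sorted_counts is a list of (n-gram tuple, count) pairs in
    descending order of count, so the first entry seen per prefix wins.
    """
    lookup = {}
    for gram, _count in sorted_counts:
        lookup.setdefault(gram[:-1], gram[-1])
    return lookup


def predict_next(text, lookups):
    """Try the 4-gram table first, then back off to 3-grams and 2-grams."""
    # Simplification: only lowercases and splits; a real app would apply
    # the same cleansing that was used when the tables were built.
    tokens = text.lower().split()
    for n in (4, 3, 2):
        context = tuple(tokens[-(n - 1):])
        if len(context) == n - 1 and context in lookups[n]:
            return lookups[n][context]
    return "No Match Found"


# Made-up frequency tables, for illustration only.
lookups = {
    4: build_lookup([(("thanks", "for", "the", "help"), 3),
                     (("thanks", "for", "the", "follow"), 1)]),
    3: build_lookup([(("one", "of", "the"), 5)]),
    2: build_lookup([(("the", "same"), 4), (("of", "the"), 2)]),
}

print(predict_next("Thanks for the", lookups))
print(predict_next("one of", lookups))
print(predict_next("zebra", lookups))
```

With these toy tables, the first call is answered by the 4-gram table, the second backs off to the 3-gram table, and the third finds no match at any order.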