Coursera Data Science Capstone Project

Monika

This application is the Capstone Project of the Johns Hopkins University Data Science Specialization, offered in cooperation with SwiftKey.

Introduction

The main goal of this presentation is to showcase the text prediction algorithm and the Shiny app that serves as the interface of this next-word prediction application.

Smart keyboards such as SwiftKey are designed to make typing on mobile devices easier. This application is a foundation for that kind of smart keyboard.

Approach

The first step toward building a text prediction application is to understand the relationships between words. An exploratory data analysis is performed on the data to analyze the nature of the text.

A language model that predicts the next word is then implemented using an n-gram language model combined with a back-off model.

Back-off is a generative n-gram language model that estimates the conditional probability of a word given its history in the n-gram. When a history has not been seen, it falls back to a shorter one.

Algorithm

  • Calculate unigram, bigram, trigram and quadgram word-frequency lists
  • Take the last 3 words of the input string and match them against the quadgrams
  • If a match is found, output the most frequent following word
  • If not, take the last 2 words of the input string and match them one level down, i.e. against the trigrams
  • Continue backing off to lower-order n-grams until a match is found
  • For an unseen word, output the most frequent word in the unigram list
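
The steps above can be sketched in a few lines. This is a minimal illustration in Python, not the app's actual code: the function names `build_tables` and `predict_next`, the toy corpus, and the lower-casing and whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter, defaultdict

MAX_ORDER = 4  # quadgrams are the highest order used in the list above


def build_tables(tokens, max_order=MAX_ORDER):
    """Map each order n to {history tuple: Counter of following words}."""
    tables = {n: defaultdict(Counter) for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            *history, word = tokens[i:i + n]
            tables[n][tuple(history)][word] += 1
    return tables


def predict_next(text, tables, max_order=MAX_ORDER):
    """Try the longest history first, then back off to shorter ones."""
    words = text.lower().split()  # assumed tokenization: lower-case, split on spaces
    for n in range(max_order, 1, -1):
        if len(words) < n - 1:
            continue  # not enough input words for this order
        history = tuple(words[-(n - 1):])
        followers = tables[n].get(history)
        if followers:
            return followers.most_common(1)[0][0]
    # No history matched: fall back to the most frequent unigram.
    return tables[1][()].most_common(1)[0][0]


# Toy corpus, for illustration only.
corpus = "the cat sat on the mat and the cat sat on the sofa".split()
tables = build_tables(corpus)
print(predict_next("cat sat on", tables))  # quadgram match: the
print(predict_next("a dog sat", tables))   # backs off to the bigram level: on
print(predict_next("zzz", tables))         # unseen word, unigram fallback: the
```

Storing the counts per history makes each lookup a single dictionary access, so backing off costs at most three lookups per prediction.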

Shiny App

Description - The Shiny app is a user interface that lets users enter text and get a predicted next word. The app's sidebar has two items:

  • Predict Word

    • Enter the input text -> click Submit -> the output appears in the right panel
  • Word Bubble

    • Choose an n-gram radio button -> the corresponding bubble chart appears (showing the top 100 words)

Shiny app link: https://msharma.shinyapps.io/predictWord/