WordPredict App

Cristian Santa
January 2016

Introduction

This application was built for the Data Science Capstone, the final course of the Johns Hopkins University Data Science Specialization on Coursera, run in partnership with SwiftKey.

Abstract

The predictive model is based on Katz's back-off model. Essentially, this means that if an n-gram has been seen more than k times in training, the conditional probability of a word given its history is proportional to the maximum likelihood estimate of that n-gram. Otherwise, the conditional probability is equal to the back-off conditional probability of the (n-1)-gram.
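
For reference, the rule described above can be written in the standard textbook form of Katz's back-off. The symbols are assumptions taken from the usual formulation and are not named in this presentation: C(.) is a training count, d is a discount factor (typically derived from Good-Turing estimation), and alpha is the back-off weight that makes the probabilities sum to one.

```latex
P_{bo}(w_i \mid w_{i-n+1}^{i-1}) =
\begin{cases}
d \, \dfrac{C(w_{i-n+1}^{i})}{C(w_{i-n+1}^{i-1})} & \text{if } C(w_{i-n+1}^{i}) > k \\[6pt]
\alpha(w_{i-n+1}^{i-1}) \, P_{bo}(w_i \mid w_{i-n+2}^{i-1}) & \text{otherwise}
\end{cases}
```

The first case is the discounted maximum likelihood estimate of the full n-gram; the second case recurses on the history shortened by one word.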

Model

The App

The goal of this application is to predict the next word in a sentence that the user types in a text box. The dataset used for this app is the English portion of a collection of corpora called HC Corpora, divided into three sources: news, blogs and Twitter.
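
The prediction step can be illustrated with a minimal sketch. This is not the app's actual code: it is a simplified back-off lookup that only ranks candidates by raw count and leaves out the discounting and back-off weights of the full Katz model. The function names, the toy corpus and the threshold k are all hypothetical.

```python
from collections import Counter, defaultdict


def build_ngram_counts(tokens, max_n=3):
    """Count next words for every context of length 0..max_n-1 (hypothetical helper)."""
    counts = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])   # the n-1 preceding words
            next_word = tokens[i + n - 1]
            counts[context][next_word] += 1
    return counts


def predict_next(counts, text, max_n=3, k=0):
    """Return the most frequent next word, backing off to shorter contexts.

    Simplification: no discounting and no back-off weights, so this gives
    a ranking only, not a proper probability distribution.
    """
    words = text.lower().split()
    for n in range(max_n, 0, -1):
        context = tuple(words[-(n - 1):]) if n > 1 else ()
        if len(context) < n - 1:
            continue  # not enough typed words for this n-gram order
        candidates = counts.get(context)
        if candidates:
            word, count = candidates.most_common(1)[0]
            if count > k:  # only trust n-grams seen more than k times
                return word
    return None


# Toy corpus standing in for the news, blogs and Twitter data.
toy_tokens = "the cat sat on the mat the cat ate the fish".split()
toy_counts = build_ngram_counts(toy_tokens)
```

With this toy corpus, `predict_next(toy_counts, "on the")` finds the trigram context directly and returns `"mat"`, while `predict_next(toy_counts, "dog the")` has no matching trigram and backs off to the bigram context `("the",)`, returning `"cat"`.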

Example

The App

The corpora were collected from numerous different webpages, with the aim of getting a varied and comprehensive corpus of the current use of the respective language.

Enjoy it!