Capstone Project

Leandro Jimenez

This presentation explains an application that predicts the next word of a phrase.

The application is the capstone project for the Coursera Data Science Specialization.

Goal

The main objective of this project is to build an application that predicts the next word after the user types a phrase and displays the result.

This exercise was divided into seven weeks, guiding us through data cleaning, exploratory analysis and, above all, the creation of a predictive model, putting into practice the knowledge I acquired during this amazing specialization.

The text data comes from a corpus called HC Corpora and is processed with well-known R packages into frequency dictionaries, from which the predictions are made.

About Methods & Models

After building data sets from the three sources of the HC Corpora (blogs, news and Twitter), the data was cleaned by removing:

  • punctuation

  • links

  • extra white space

  • numbers

  • other unwanted characters
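The cleaning steps above can be sketched as follows. This is a minimal Python illustration, not the author's R code: the regular expressions, the lowercasing step and the `clean_text` name are assumptions chosen for the example.

```python
import re

# Hypothetical patterns for illustration; the original project used R packages,
# so these are not the author's exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUMBER_RE = re.compile(r"\d+")
PUNCT_RE = re.compile(r"[^\w\s']|_")   # anything that is not a letter, digit, space or apostrophe
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase a line and strip links, numbers, punctuation and extra spaces."""
    text = text.lower()                     # assumption: case is normalized too
    text = URL_RE.sub(" ", text)            # links first, before punctuation splits them
    text = NUMBER_RE.sub(" ", text)         # digits
    text = PUNCT_RE.sub(" ", text)          # punctuation (apostrophes kept for contractions)
    return SPACE_RE.sub(" ", text).strip()  # collapse runs of white space


if __name__ == "__main__":
    print(clean_text("Check https://example.com NOW!!  I've got 3 cats."))
    # prints: check now i've got cats
```

Links are removed before punctuation so that a URL is dropped as a whole instead of leaving fragments such as "example com" behind.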

The cleaned sample was then tokenized into n-grams (sequences of n consecutive words).

For each data set, frequency matrices were created and then converted into frequency dictionaries of bigrams, trigrams and quadgrams.
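Tokenizing into n-grams and counting them can be sketched in a few lines of Python. Here a `collections.Counter` per order stands in for the R frequency matrices; the function names `ngrams` and `build_frequency_dicts` are hypothetical, and the input is assumed to be already-cleaned text.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_frequency_dicts(lines, orders=(2, 3, 4)):
    """Count bigrams, trigrams and quadgrams over already-cleaned lines.

    Returns a dict mapping n to a Counter of n-gram strings and their frequencies.
    """
    freqs = {n: Counter() for n in orders}
    for line in lines:
        tokens = line.split()
        for n in orders:
            freqs[n].update(ngrams(tokens, n))  # lines shorter than n add nothing
    return freqs


if __name__ == "__main__":
    freqs = build_frequency_dicts(["i love you so much", "i love you"])
    print(freqs[3].most_common(1))
    # prints: [('i love you', 2)]
```

`Counter.most_common()` already returns entries sorted by frequency, which is the ordering the prediction step relies on.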

The prediction model uses the n-gram dictionaries to make predictions. A backoff model first compares the last three words of the phrase against the first three words of each quadgram and predicts the quadgram's last word. If no match is found, it backs off to the last two words (trigrams) and finally to the last word alone (bigrams). Matching n-grams are sorted by frequency, and the most frequent one gives the best prediction.
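The backoff lookup can be sketched as below. This is a simple Python illustration under stated assumptions, not the app's R implementation: `predict_next` is a hypothetical name, the frequency dictionaries are assumed to map n to a `Counter` of space-joined n-grams, and no smoothing or discounting (as in Katz backoff) is applied; the first order with a match simply wins.

```python
from collections import Counter


def predict_next(phrase, freqs, max_order=4):
    """Predict the next word by backing off from quadgrams to bigrams.

    freqs maps n -> Counter of space-joined n-grams, e.g. {2: ..., 3: ..., 4: ...}.
    Returns None when no n-gram of any order matches.
    """
    words = phrase.lower().split()
    for n in range(max_order, 1, -1):
        context = words[-(n - 1):]
        if len(context) < n - 1:
            continue  # phrase too short for this order
        prefix = " ".join(context) + " "
        # Candidates are n-grams whose first n-1 words equal the context.
        candidates = [(count, gram) for gram, count in freqs.get(n, {}).items()
                      if gram.startswith(prefix)]
        if candidates:
            _, best = max(candidates)  # highest frequency; ties go to the larger string
            return best.split()[-1]
    return None


if __name__ == "__main__":
    freqs = {
        2: Counter({"of the": 5, "of course": 2}),
        3: Counter({"one of the": 4, "one of us": 1}),
        4: Counter({"at the end of": 3}),
    }
    print(predict_next("at the end", freqs))    # prints: of   (quadgram match)
    print(predict_next("he is one of", freqs))  # prints: the  (backs off to trigrams)
    print(predict_next("because of", freqs))    # prints: the  (backs off to bigrams)
```

Scanning every n-gram on each request keeps the sketch short; a deployed app would more likely index the dictionaries by their first n-1 words so each lookup is a single key access.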

How to use

Open the app at https://jleandroj1.shinyapps.io/capstonedatascience/ and then:

  • enter a phrase

  • wait a moment

  • see the predicted word

  • see the complete phrase that you typed

This app only works with English.

More...

Contact: jleandroj@gmail.com

A short message

To my teachers: thank you for everything; you changed my life.

To everybody: you should take this specialization; it is really amazing.