Coursera DS Capstone Project

Leandro Meili
2017-05-27

Word Prediction App

App Overview

This word prediction app suggests the next word of a phrase typed by the user.

It uses the Stupid Backoff algorithm, which checks whether matching n-grams (orders 1 to 5) exist and scores each candidate word based on the number of times its n-gram was seen.

The app returns the five words with the highest scores.
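The scoring idea can be illustrated with a minimal Python sketch. It is not the app's own code: the function names, the toy corpus, and the backoff factor of 0.4 (the value suggested in the Stupid Backoff paper) are assumptions made for illustration.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff factor, not confirmed by the app


def score(counts, total, context, word):
    """Stupid Backoff score of `word` following the tuple `context`."""
    penalty = 1.0
    while context:
        seen = counts.get(context + (word,), 0)
        if seen > 0:
            # Relative frequency of the longest n-gram that was seen.
            return penalty * seen / counts[context]
        context = context[1:]  # drop the oldest word and back off
        penalty *= ALPHA
    # No context left: fall back to the unigram frequency.
    return penalty * counts.get((word,), 0) / total


def predict(counts, total, vocab, text, max_n=5, top_k=5):
    """Return the top_k (word, score) pairs for the next word after `text`."""
    context = tuple(text.lower().split()[-(max_n - 1):])
    scored = [(score(counts, total, context, w), w) for w in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [(w, s) for s, w in scored[:top_k]]


# Toy corpus standing in for the real n-gram tables.
tokens = "the cat sat on the mat the cat ate".split()
counts = Counter(
    tuple(tokens[i:i + n])
    for n in range(1, 6)
    for i in range(len(tokens) - n + 1)
)
top5 = predict(counts, len(tokens), set(tokens), "the")
```

In this toy corpus "cat" ranks first after "the", because "the cat" accounts for two of the three occurrences of "the". Words never seen after "the" only receive a discounted unigram score.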

Stupid Backoff Algorithm app Reference: link

Development

The main steps to develop this app were:

  1. Load and clean the data

  2. Extract the n-grams (1 to 5)

  3. Write the Stupid Backoff algorithm

  4. Design and deploy the app on Shiny
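Steps 1 and 2 can be sketched as follows. The cleaning rules (lowercasing, keeping only letters and apostrophes) and the names `clean` and `ngram_tables` are assumptions for illustration. The cleaning actually used for the app is not described here.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and keep only letters and apostrophes (assumed rules)."""
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()


def ngram_tables(lines, max_n=5):
    """Build one frequency table per n-gram order, from 1 to max_n."""
    tables = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = clean(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    return tables


tables = ngram_tables(["Hello, World! It's 2017.", "Hello world again"])
```

Keeping one table per order makes the backoff lookup simple: the algorithm starts in the highest-order table and moves to the next lower one when the n-gram is missing.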

App Interface

This is the app interface.

The user inputs the text and the app automatically computes the scores for the next word.

The plot shows the top 5 predictions with their respective scores.

Lessons Learned

Several issues came up during this capstone project:

  • The first model used only 1-, 2- and 3-grams, and its precision was not very good. 4- and 5-grams were added to improve the model.
  • Counting n-grams demands a lot of computing power, so to simplify, the n-gram function was set to read 3,000 observations at a time. Reading about 30% of all files took 9 hours, and the prediction model was built with this 30%.
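Reading in blocks of 3,000 observations might look like the Python sketch below. The helper names and the whitespace tokenization are assumptions. Only the batch size comes from the project.

```python
from collections import Counter
from itertools import islice


def batches(lines, size=3000):
    """Yield successive lists of at most `size` lines from any iterable."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def count_in_batches(lines, max_n=5, size=3000):
    """Count n-grams batch by batch, merging each partial table into the total."""
    total = Counter()
    for batch in batches(lines, size):
        partial = Counter()
        for line in batch:
            tokens = line.lower().split()
            for n in range(1, max_n + 1):
                for i in range(len(tokens) - n + 1):
                    partial[" ".join(tokens[i:i + n])] += 1
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


demo = count_in_batches(["a b c"] * 7, max_n=2, size=3)
```

Because `lines` can be any iterable, an open file handle can be passed directly, so only one batch of raw text needs to be held in memory at a time.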