Capstone Project - Data Science Specialization

Andre Morato
06/14/2017

The application

This presentation explains the app created to predict the next word in a sentence provided by the user.

Using the app:

Just write a sentence in the text box and wait for a table with the most promising next words.
The elapsed time is less than 2 seconds.

Link to the application:

https://amdmorato.shinyapps.io/Nextwordprediction/

Link to the GitHub repository containing all the code:

https://github.com/andmorato/Capstone_Project

How does it work?

N-gram theory is applied to obtain the words suggested to the user.
The main principle is to match the last n-1 words of the user's sentence against the first n-1 words of the n-grams in the database. The suggested word is the last word of the matching n-gram with the highest probability.

Example (n=4):

User sentence: I would like to
Last n-1 words: “would like to”

  term.1 term.2 term.3 term.4   Pkn
1  would   like     to    see 0.130
2  would   like     to   know 0.083
3  would   like     to  think 0.057
4  would   like     to     be 0.051
5  would   like     to   have 0.045

Based on the database search shown above, the suggested word is “see”.
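The lookup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the app's actual code: the table holds just the five rows shown above, and the names `FOURGRAMS` and `suggest` are hypothetical.

```python
# Hypothetical 4-gram table: the rows are copied from the example above.
# Each key holds the first three terms; each value lists (term.4, Pkn) pairs.
FOURGRAMS = {
    ("would", "like", "to"): [
        ("see", 0.130),
        ("know", 0.083),
        ("think", 0.057),
        ("be", 0.051),
        ("have", 0.045),
    ],
}


def suggest(sentence, table=FOURGRAMS, n=4, top=5):
    """Return up to `top` (word, Pkn) candidates for the last n-1 words."""
    words = sentence.lower().split()
    context = tuple(words[-(n - 1):])
    candidates = table.get(context, [])
    # Sort so that the highest probability comes first.
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:top]


print(suggest("I would like to")[0][0])  # see
```

A context that is not in the table returns an empty list, which is the case discussed next.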

What if the sentence contains a word that is not in the app dictionary?

Example (n=4):

User sentence: I'm a huge pokemon fan and blastoise is my
Last n-1 words: “blastoise is my”

Problem:

“Blastoise” is not in the database, so there is no match between the user sentence and the database.

Solution:

Use a lower-order n-gram. In this case, n=3.

Example (n=3):

User sentence: I'm a huge pokemon fan and blastoise is my
Last n-1 words: “is my” [There are matches, and the suggested word is “favorite”]
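The fallback to a lower-order n-gram can be sketched as follows. This is a minimal Python illustration, not the app's implementation: the names `TABLES` and `suggest_with_backoff` are hypothetical, the tables hold only a few rows, and the probability attached to “favorite” is a placeholder because the slides do not report it.

```python
# Hypothetical n-gram tables keyed by order n.
# The 4-gram rows come from the first example. The 3-gram probability for
# "favorite" is a placeholder value, not a figure from the real model.
TABLES = {
    4: {("would", "like", "to"): [("see", 0.130), ("know", 0.083)]},
    3: {("is", "my"): [("favorite", 0.5)]},
}


def suggest_with_backoff(sentence, tables=TABLES, max_n=4):
    """Try the highest order first, then fall back to lower orders."""
    words = sentence.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # not enough words to build this context
        context = tuple(words[-(n - 1):])
        candidates = tables.get(n, {}).get(context)
        if candidates:
            best = max(candidates, key=lambda c: c[1])
            return n, best[0]
    return None, None  # no match at any order


print(suggest_with_backoff("I'm a huge pokemon fan and blastoise is my"))
```

The function returns the order that produced the match together with the suggested word, so the caller can tell when a lower order was used.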

The final model?

The selection of model parameters was driven by two constraints:

1) Time elapsed between user entry and app response.

The maximum waiting time was set to 2 seconds for a good user experience.

2) Accuracy.

Final model parameters:

For more detailed information, see the Documentation tab in the application.