Peer-graded Assignment: Final Project Submission

Vicente Castro
18 jun 2017

COURSERA - Final Project Submission

Following the instructions, I used the Twitter, blogs, and news data.

  • Because the files are very large (>200 MB), I sampled 10% of the data.
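
A minimal sketch of this kind of 10% sampling, written in Python for illustration only (the project itself was done in R). The function name `sample_lines`, the fixed seed, and the line-by-line sampling strategy are assumptions, not the code actually used.

```python
import random


def sample_lines(lines, fraction=0.10, seed=42):
    """Keep each line independently with probability `fraction`.

    Hypothetical helper: the seed makes the sample reproducible.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


if __name__ == "__main__":
    # Stand-in corpus; in practice the lines would come from the text files.
    corpus = [f"line {i}" for i in range(10_000)]
    sample = sample_lines(corpus)
    print(len(sample), "of", len(corpus), "lines kept")
```

Sampling line by line avoids loading only the beginning of a file, so the sample keeps roughly the same mix of text as the full data.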

The next slides explain the algorithm used and how the app was built.

Description of the algorithm used

When I started the Capstone Project, I tried the approach that most Coursera students use: n-grams. However, I wanted to try something different, and while looking for alternatives I found the Word2Vec algorithm.

Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words.
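
To make "linguistic contexts" concrete, here is a small Python sketch (for illustration only, not the project's R code) of how (word, context word) training pairs can be built with a sliding window, as in the skip-gram variant of Word2vec. The function name `skipgram_pairs` and the window size are assumptions chosen for the example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window`
    positions of each center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    for pair in skipgram_pairs("the quick brown fox".split(), window=1):
        print(pair)
```

The network is then trained so that words appearing in similar contexts end up with similar vectors.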

Word2vec was created by a team of researchers led by Tomas Mikolov at Google. The algorithm has subsequently been analysed and explained by other researchers [2][3]. You can read more about the Word2Vec algorithm on Google or here.

Word2Vec implementation in R

I used the wordVectors package to apply the Word2Vec algorithm, thanks to Benjamin Schmidt. wordVectors is an R package for building and exploring word embedding models. See GitHub.

Description

This package does three major things to make it easier to work with word2vec and other vectorspace models of language.

  • Prepare the sample data.
  • Obtain the model.
  • Predict.

Description of the app, some instructions, and how it functions

You can open the app here. The app predicts the next word using the Word2Vec model.

As you type, the app suggests three candidates for the next word, according to the Word2Vec model.

The prediction model uses the last seven (7) words of the sentence. This means that sentences with seven or fewer words are used in full to predict the next word.
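
The sketch below illustrates one way a seven-word context and three suggestions could fit together. It is written in Python for illustration and is not the app's code. Everything in it is an assumption: the toy vectors in `TOY_VECTORS`, the helper names, and in particular the strategy of averaging the context vectors and ranking the vocabulary by cosine similarity, which the slides do not describe.

```python
import math

CONTEXT_SIZE = 7    # last seven words, as stated in the text
N_SUGGESTIONS = 3   # the app suggests three words

# Hypothetical toy embeddings; a real model would be trained on the sample.
TOY_VECTORS = {
    "good": [1.0, 0.0],
    "morning": [0.9, 0.1],
    "day": [0.8, 0.2],
    "night": [0.7, 0.3],
    "car": [0.0, 1.0],
}


def last_words(sentence, n=CONTEXT_SIZE):
    """Return up to the last `n` lowercased words of the sentence."""
    return sentence.lower().split()[-n:]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def suggest(sentence, vectors, k=N_SUGGESTIONS):
    """Assumed strategy: average the known context vectors, then return
    the `k` vocabulary words closest to that average."""
    context = [w for w in last_words(sentence) if w in vectors]
    if not context:
        return []
    dim = len(next(iter(vectors.values())))
    mean = [sum(vectors[w][d] for w in context) / len(context) for d in range(dim)]
    candidates = [w for w in vectors if w not in context]
    ranked = sorted(candidates, key=lambda w: cosine(vectors[w], mean), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(last_words("one two three four five six seven eight nine"))
    print(suggest("good", TOY_VECTORS))
```

Words that are not in the vocabulary are simply skipped, so a sentence made only of unknown words yields no suggestions in this sketch.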

The first screen you see is the input box.


You don't need to submit the sentence.
