Ljiljana
Apr 23, 2015
In this Capstone project we worked on understanding and building predictive text models like those used by SwiftKey.
For example, when a user types:
I went to the
the application presents three options for what the next word might be.
The text data comes from HC Corpora. The training data, which contains US blogs, news and tweets, can be downloaded here.
The project can be roughly divided into the following steps:
To save memory and speed up the computation, we subsampled 50,000 lines from the full data set. Then we performed various transformations on the raw text, including:
For this prototype application we decided to use only bi-gram and tri-gram models when making suggestions to the user.
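The subsampling and n-gram suggestion steps described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names (`subsample`, `tokenize`, `build_models`, `suggest`) are hypothetical, the cleaning is reduced to lowercasing and whitespace splitting, and the rule of trying tri-gram matches first and then filling the remaining slots from bi-gram matches is an assumption about how the two models might be combined.

```python
import random
from collections import Counter, defaultdict


def subsample(lines, k, seed=0):
    """Pick k lines at random without replacement (all lines if there are fewer than k)."""
    lines = list(lines)
    if len(lines) <= k:
        return lines
    return random.Random(seed).sample(lines, k)


def tokenize(line):
    # Assumed, simplified cleaning: lowercase and split on whitespace.
    return line.lower().split()


def build_models(lines):
    """Count the words that follow each one-word and each two-word context."""
    bigrams = defaultdict(Counter)   # previous word -> Counter of next words
    trigrams = defaultdict(Counter)  # (word1, word2) -> Counter of next words
    for line in lines:
        words = tokenize(line)
        for i in range(len(words) - 1):
            bigrams[words[i]][words[i + 1]] += 1
        for i in range(len(words) - 2):
            trigrams[(words[i], words[i + 1])][words[i + 2]] += 1
    return bigrams, trigrams


def suggest(text, bigrams, trigrams, n=3):
    """Return up to n candidate next words for the typed text.

    Assumed strategy: take the most frequent tri-gram continuations first,
    then fill any remaining slots from the bi-gram continuations.
    """
    words = tokenize(text)
    suggestions = []
    if len(words) >= 2:
        context = (words[-2], words[-1])
        for word, _ in trigrams.get(context, Counter()).most_common(n):
            suggestions.append(word)
    if words:
        for word, _ in bigrams.get(words[-1], Counter()).most_common():
            if len(suggestions) >= n:
                break
            if word not in suggestions:
                suggestions.append(word)
    return suggestions[:n]


if __name__ == "__main__":
    # Toy corpus for illustration only; the real data is far larger.
    corpus = [
        "I went to the store",
        "I went to the store again",
        "they drove to the store",
        "she walked to the park",
        "welcome to the park",
        "we went to the gym",
    ]
    sample = subsample(corpus, 50000)
    bigrams, trigrams = build_models(sample)
    print(suggest("I went to the", bigrams, trigrams))
```

Storing the counts in dictionaries keyed by context keeps each lookup cheap, which matters when suggestions have to appear while the user is typing.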