Marek Kluczynski
19/06/2017
The following slide deck relates to the Capstone Project for the Data Science Capstone from Johns Hopkins University on Coursera.
The problem this activity and app try to solve is that of predicting text based on input. For instance, when someone types “I went to the”, the application should predict options for what the next word might be.
To solve this problem I opted to build an n-gram model, that is, a model that, given the preceding one or two words, predicts the most likely next word.
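To illustrate the idea, here is a minimal sketch of an n-gram predictor in Python. It is not the code used for the project: the function names, the toy corpus, the whitespace tokenisation and the fallback from a two-word context to a one-word context are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_ngram_tables(sentences):
    """Count next-word frequencies for one-word and two-word contexts."""
    bigrams = defaultdict(Counter)   # previous word -> counts of next word
    trigrams = defaultdict(Counter)  # previous two words -> counts of next word
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumed: naive whitespace tokeniser
        for i in range(len(tokens) - 1):
            bigrams[tokens[i]][tokens[i + 1]] += 1
            if i + 2 < len(tokens):
                trigrams[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return bigrams, trigrams


def predict_next(text, bigrams, trigrams, k=3):
    """Return up to k candidate next words, most frequent first."""
    tokens = text.lower().split()
    # Prefer the two-word context; fall back to one word (assumed strategy).
    if len(tokens) >= 2 and tuple(tokens[-2:]) in trigrams:
        counts = trigrams[tuple(tokens[-2:])]
    elif tokens and tokens[-1] in bigrams:
        counts = bigrams[tokens[-1]]
    else:
        return []  # context never seen in the corpus
    return [word for word, _ in counts.most_common(k)]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "i went to the shop",
    "i went to the park",
    "she went to the shop",
    "we walked to the beach",
]
bigrams, trigrams = build_ngram_tables(corpus)
print(predict_next("I went to the", bigrams, trigrams))
```

With this toy corpus the top suggestion for “I went to the” is “shop”, since it follows “to the” more often than any other word. A real model would be trained on a far larger corpus and would need proper cleaning and tokenisation.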
The technology I used to solve this problem was as follows:
The steps for processing the data were as follows:
The output was a set of data frames for the data product.
Using the data frames a data product was put together (http://bit.ly/2rwxU1S) which takes input text and then does the following:
The data product could be described as a minimum viable product and does have some issues, namely:
If I were to approach this problem again from scratch, the data cleansing method would likely remain the same; however, I might use a predictive model such as a neural net, which may give better performance. It should be noted that the method used above gave good results on the course's test quizzes, with a high degree of matches.