Tarun Kaushik
April 23, 2015
Part of the Capstone Project of the Data Science Specialization by Johns Hopkins University on Coursera
The objective of this project is to predict the word the user is about to type. The user types a few words, and a prediction for the next word is made from them.
The three broad steps involved in the capstone are as follows:
Getting and Cleaning data
Prediction
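The cleaning and tokenization step can be illustrated with a minimal sketch. It assumes a simple cleaning rule: lowercase the text, treat anything other than letters and apostrophes as a separator, then count contiguous word sequences (n-grams). The helper names `tokenize` and `count_ngrams` are hypothetical, and the project's actual cleaning rules may differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumed cleaning rule: only letters and apostrophes are kept;
    every other character acts as a separator.
    """
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(tokens, n):
    """Count every contiguous sequence of n tokens, keyed by a tuple."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
```

For example, `count_ngrams(tokenize("I love data science. I love R!"), 2)` counts the bigram `("i", "love")` twice. Running the same function with n = 1 to 4 yields unigram through 4-gram tables.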
After tokenization, the final dataset was created, which contained
A function that takes a string as input used the data from the four datasets above and arranged the predictions in decreasing order of probability, dataset by dataset in the order given above.
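The slides do not name the four datasets. A common choice is 4-gram, 3-gram, 2-gram and 1-gram tables, searched from the longest context down, so the sketch below assumes that layout. Each table is a hypothetical dict mapping a context tuple to `{next_word: count}`, and the probability within a table is the plain maximum-likelihood estimate (count divided by the total count for that context). This is a simple ordered lookup, not a smoothed scheme such as Katz or Stupid Backoff, and `predict` is a hypothetical name.

```python
import re


def tokenize(text):
    # Same assumed cleaning rule as used for the training data.
    return re.findall(r"[a-z']+", text.lower())


def predict(text, tables):
    """Return candidate next words, ranked table by table.

    `tables` is a hypothetical list of dicts, highest order first. A table of
    order n maps a tuple of the last n-1 words to {next_word: count}; the
    unigram table uses the empty tuple () as its only context.
    """
    words = tokenize(text)
    ranked, seen = [], set()
    for table in tables:
        if not table:
            continue
        k = len(next(iter(table)))  # context length of this table
        if len(words) < k:
            continue  # not enough typed words for this order
        context = tuple(words[-k:]) if k > 0 else ()
        followers = table.get(context, {})
        total = sum(followers.values())
        # Maximum-likelihood P(word | context) = count / total, highest first.
        scored = sorted(followers.items(),
                        key=lambda item: item[1] / total, reverse=True)
        for word, _count in scored:
            if word not in seen:  # keep the first (highest-order) occurrence
                seen.add(word)
                ranked.append(word)
    return ranked
```

Because each table's candidates are appended only after every candidate from the higher-order tables, a word seen after a long matching context always outranks one that only the unigram table suggests.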
At most 10 predictions from this list are displayed; the exact number depends on the user's input.
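Limiting the ranked list can be sketched as follows, assuming the user's choice arrives as an integer. The cap of 10 comes from the text; the function name `top_predictions` is hypothetical.

```python
MAX_PREDICTIONS = 10  # upper limit stated in the presentation


def top_predictions(ranked, requested):
    """Return the first `requested` predictions, never more than MAX_PREDICTIONS.

    Negative requests are treated as zero; a short list is returned as-is.
    """
    n = max(0, min(requested, MAX_PREDICTIONS))
    return ranked[:n]
```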
To evaluate the performance of the app, the leftover data (95% of the total data obtained from the course website) was used. Two parameters were checked.
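The two parameters are not named here, so the following is only an assumed example of how held-out text can be scored. It measures top-1 accuracy (the actual next word is the first prediction) and top-k accuracy (the actual word appears anywhere in the first k predictions). The names `make_cases` and `evaluate` and the 3-word context length are hypothetical.

```python
def make_cases(tokens, context_len=3):
    """Build (context_string, actual_next_word) pairs from held-out tokens."""
    return [(" ".join(tokens[i - context_len:i]), tokens[i])
            for i in range(context_len, len(tokens))]


def evaluate(cases, predict_fn, k=10):
    """Return (top1, topk) accuracy of predict_fn over the cases.

    Assumed metrics, not necessarily the ones used in the original project.
    """
    if not cases:
        return 0.0, 0.0
    top1 = topk = 0
    for context, actual in cases:
        preds = predict_fn(context)[:k]
        if preds and preds[0] == actual:
            top1 += 1
        if actual in preds:
            topk += 1
    return top1 / len(cases), topk / len(cases)
```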
Following are the results:
Another version of the model, with 70% of the whole data used for development and 30% for validation, gave the following results:
However, the app itself uses only 5% of the data for development because of memory and run-time constraints.
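Both splits (5%/95% for the app, 70%/30% for the alternative model) can be produced by one shuffled split, sketched below under the assumption that the corpus is a list of lines. The seed value and the name `split_lines` are arbitrary choices for illustration.

```python
import random


def split_lines(lines, dev_fraction, seed=42):
    """Shuffle the lines and split them into (development, validation).

    dev_fraction=0.05 mirrors the app's split; 0.70 mirrors the 70/30 variant.
    A fixed seed makes the split reproducible.
    """
    if not 0.0 <= dev_fraction <= 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before cutting avoids taking the development set from a single region of the source files, which could bias it toward one topic or writing style.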