michael.coursera@eipsoftware.com
28 December 2017
The predictive model used is the Katz back-off model.
The model looks up a precomputed list of word sequences called n-grams. The n-grams consist of 2, 3, or 4 words. An example of an n-gram of size 3 is “I love you”.
If a user enters “I love”, the model predicts the next word based on how frequently each matching phrase appeared in the source material. For example, it may suggest “you”, “cars”, or “food”. The model then ranks the suggestions by frequency and presents the top suggestions to the user.
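The lookup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's code: the toy corpus and the function names build_ngram_counts and predict_next are hypothetical, and the sketch uses raw frequency counts with a simple fall-back to shorter contexts, omitting the discounting step that the full Katz back-off model applies.

```python
from collections import Counter, defaultdict


def build_ngram_counts(sentences, max_n=4):
    """Map each context (a tuple of 1 to max_n-1 words) to a Counter of next words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(2, max_n + 1):          # n-gram sizes 2, 3, 4
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                next_word = words[i + n - 1]
                counts[context][next_word] += 1
    return counts


def predict_next(counts, phrase, max_n=4):
    """Return candidate next words, most frequent first.

    Tries the longest available context first and backs off to shorter
    contexts when the longer one was never seen in the corpus.
    """
    words = phrase.lower().split()
    for k in range(min(max_n - 1, len(words)), 0, -1):
        context = tuple(words[-k:])
        if context in counts:
            return [word for word, _ in counts[context].most_common()]
    return []


# Hypothetical toy corpus, for illustration only.
corpus = ["i love you", "i love you", "i love cars", "we love food"]
counts = build_ngram_counts(corpus)

print(predict_next(counts, "I love"))      # context ("i", "love") was seen
print(predict_next(counts, "they love"))   # backs off to context ("love",)
```

In the second call the two-word context never appears in the corpus, so the sketch falls back to the one-word context, which is the essence of backing off.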
The application is made up of three sections.
Section 1: User Input
The user enters a phrase, with words separated by spaces.
Section 2: Number of Results
The user can choose how many results to return, from 1 to 10. If fewer results are available than requested, all available results are returned.
Section 3: Results
Predicted results are displayed in the right column, ranked from most to least likely, with the probability of each shown.
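The ranking and result limit described in Sections 2 and 3 might look like the sketch below. The function name top_predictions and the example counts are hypothetical, and the probability shown here is a plain relative frequency, which is an assumption; the application's Katz back-off model would report discounted probabilities instead.

```python
from collections import Counter


def top_predictions(next_word_counts, n_results=3):
    """Return up to n_results (word, probability) pairs, most likely first."""
    n_results = max(1, min(10, n_results))     # the UI allows 1 to 10 results
    total = sum(next_word_counts.values())
    if total == 0:
        return []
    # most_common returns fewer items when fewer candidates exist
    ranked = next_word_counts.most_common(n_results)
    return [(word, count / total) for word, count in ranked]


# Hypothetical counts of words observed after some phrase.
observed = Counter({"you": 6, "cars": 3, "food": 1})

for word, prob in top_predictions(observed, n_results=10):
    print(f"{word}: {prob:.2f}")
```

Although 10 results are requested, only three candidates exist, so three rows are printed.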
Application Performance:
- Startup: ~2 seconds
- Query runtime: best case < 0.1 ms, worst case < 0.5 ms
- Results returned and displayed to user: average < 0.8 ms
- Prediction accuracy of the model: ~20%
Accuracy is low because of constraints on database size and limitations of the Katz back-off model.
Assumption: the user has a high-speed (> 20 Mb/s) database connection.