A. Zuoza
16 August 2015
The goal of the project was to create an algorithm and an application that predict the next word the user wants to type.
This is a brief presentation of the algorithm and the application.
All this work is part of the Coursera Data Science Specialization, offered by Johns Hopkins University.
All calculations, the analysis and the application were done with R and RStudio.
A frequency dictionary was built from the given data.
The dictionary consists of 88,158 trigrams, each of which occurs at least 5 times in the given data. A small piece of the dictionary is presented below.
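        X1        X2        Y         Frequency
1       concerns  about     a         5
89      brothers  and       a         5
405     of        becoming  a         61
3162    and       the       ability   127
Here X1 and X2 are the first two words of a trigram, Y is its third word, and Frequency is the number of times the trigram occurs; the leftmost column is the row number in the dictionary.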
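As a rough illustration of how such a dictionary could be built, here is a minimal Python sketch (the original work was done in R). Only the threshold of 5 occurrences comes from the text; the tokenizer, the sort order and the names `tokenize` and `build_trigram_dictionary` are hypothetical choices for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed normalization: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_trigram_dictionary(texts, min_count=5):
    """Count trigrams over all texts and keep those seen at least min_count times.

    Returns (X1, X2, Y, Frequency) rows, most frequent first.
    """
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        # zip over three shifted copies of the word list yields consecutive triples;
        # counting per text keeps trigrams from spanning two documents
        counts.update(zip(words, words[1:], words[2:]))
    rows = [(x1, x2, y, n) for (x1, x2, y), n in counts.items() if n >= min_count]
    # sort by descending frequency, then alphabetically for a deterministic order
    rows.sort(key=lambda r: (-r[3], r[0], r[1], r[2]))
    return rows
```

Dropping rare trigrams keeps the dictionary small enough to search quickly inside a web app, at the cost of never predicting word combinations seen fewer than `min_count` times.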
Prediction, done with a simple search, works for the third word only, i.e. if the user has entered two words, the algorithm tries to predict the third. If the user enters more than two words, the prediction is based on the last two words.
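The input handling can be sketched in Python as follows. `last_two_words` is a hypothetical helper that assumes a simple lowercase, letters-and-apostrophes tokenization; returning `None` for inputs shorter than two words is also an assumption, since the text does not say what happens in that case.

```python
import re


def last_two_words(user_input):
    """Return the last two words of the input, or None if it has fewer than two."""
    # Assumed tokenization: lowercase, runs of letters and apostrophes count as words.
    words = re.findall(r"[a-z']+", user_input.lower())
    if len(words) < 2:
        return None  # assumption: the text does not cover inputs shorter than two words
    return words[-2], words[-1]


print(last_two_words("I would like to see the"))  # ('see', 'the')
```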
There are three possible scenarios:
1. The user input is found in the dictionary. The algorithm then returns the possible third words, ordered by frequency.
2. Only the last word is found in the dictionary. The algorithm then returns the possible third words, ordered by frequency.
3. The user input is not found in the dictionary. The algorithm then returns an NA value.
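A minimal Python sketch of this three-step lookup, assuming the dictionary is held as a list of `(X1, X2, Y, Frequency)` rows like the sample table. The text does not say which column the last word is matched against in the second scenario; this sketch assumes X2, so that the returned candidates are still third words. The names `predict_next` and `_ranked_candidates`, the `limit` parameter and the removal of duplicate candidates are hypothetical, and `None` stands in for R's NA.

```python
# Sample rows taken from the dictionary excerpt: (X1, X2, Y, Frequency).
SAMPLE_DICTIONARY = [
    ("concerns", "about", "a", 5),
    ("brothers", "and", "a", 5),
    ("of", "becoming", "a", 61),
    ("and", "the", "ability", 127),
]


def _ranked_candidates(rows, limit=None):
    """Return the distinct Y values of rows, highest frequency first."""
    seen = set()
    result = []
    for _x1, _x2, y, _freq in sorted(rows, key=lambda r: r[3], reverse=True):
        if y not in seen:  # keep each candidate once, at its highest frequency
            seen.add(y)
            result.append(y)
    return result[:limit]  # a limit of None keeps the whole list


def predict_next(dictionary, w1, w2, limit=None):
    """Predict the third word from the last two input words w1 and w2."""
    # Scenario 1: both words match X1 and X2 of at least one trigram.
    full_matches = [r for r in dictionary if r[0] == w1 and r[1] == w2]
    if full_matches:
        return _ranked_candidates(full_matches, limit)
    # Scenario 2 (assumed matching rule): only the last word matches, here against X2.
    last_word_matches = [r for r in dictionary if r[1] == w2]
    if last_word_matches:
        return _ranked_candidates(last_word_matches, limit)
    # Scenario 3: nothing matches; None plays the role of R's NA.
    return None


print(predict_next(SAMPLE_DICTIONARY, "of", "becoming"))  # ['a']
```

A linear scan keeps the example short; for a dictionary of about 88,000 rows, indexing the rows by `(X1, X2)` and by `X2` in Python dicts would avoid scanning the whole list on every prediction.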
My “next 6 words” prediction app is hosted on shinyapps.io: https://azuoza.shinyapps.io/Capstone_project
The application code, reports and scripts can be found on GitHub: https://github.com/azuoza/
More about the Data Science Specialization on Coursera can be found at: https://www.coursera.org/