TextPredictor
Padma Panchapakesan
28th February 2020
Capstone Project for Coursera Data Science Course
Introduction
- TextPredictor is a word-prediction app
- The prediction model is wrapped in a Shiny app, which provides a front end for user input
- The user enters text in the text box and presses the Submit button
- The predicted next word appears in the main panel
- The links for the Shiny App and the source code are given below
Data Set
- The data used for the prediction model was obtained from blogs, tweets and news
- This data set was provided as part of the Coursera capstone project
- The data was preprocessed to remove extra whitespace, convert all letters to lower case, and remove punctuation and numbers
- Trigrams, bigrams and unigrams were generated from the preprocessed data
- The n-grams were sorted by frequency of occurrence (highest to lowest) and stored in RData files
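The preprocessing and n-gram counting steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the app's own code (the app is built with Shiny, and its n-grams are stored in RData files). The function names `preprocess`, `ngram_counts` and `sorted_ngrams` and the three-line sample corpus are assumptions made for the example.

```python
import re
from collections import Counter

def preprocess(text):
    """Lowercase the text, remove numbers and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[0-9]+", "", text)        # remove numbers
    text = re.sub(r"[^\w\s]|_", "", text)     # remove punctuation (and underscores)
    return re.sub(r"\s+", " ", text).strip()  # remove extra whitespace

def ngram_counts(lines, n):
    """Count every run of n consecutive words in the preprocessed lines."""
    counts = Counter()
    for line in lines:
        words = preprocess(line).split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

def sorted_ngrams(lines, n):
    """Return (ngram, count) pairs ordered from most to least frequent."""
    return ngram_counts(lines, n).most_common()

# Hypothetical sample corpus standing in for the blogs, tweets and news data.
corpus = ["I love data science.", "I love data!", "I love R 2 much"]
trigrams = sorted_ngrams(corpus, 3)
bigrams = sorted_ngrams(corpus, 2)
unigrams = sorted_ngrams(corpus, 1)
```

Sorting once, ahead of time, means a lookup can stop at the first match, since the first match is also the most frequent one.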
Algorithm
- The user input is first searched for in the sorted trigrams
- If a match is found, the last word of the most frequent matching trigram is predicted as the next word
- Otherwise, the user input is searched for in the sorted bigrams, and the last word of the most frequent matching bigram is predicted
- If no match is found in either the trigrams or the bigrams, the most frequent unigram is predicted as the next word
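The lookup order above can be sketched as a simple back-off function. This is a minimal Python illustration rather than the app's actual implementation. It assumes that the last two words of the input are matched against the start of each trigram and the last word against the start of each bigram, which the slides do not spell out. The name `predict_next` and the small frequency tables are hypothetical.

```python
def predict_next(text, trigrams, bigrams, unigrams):
    """Predict the next word by backing off from trigrams to bigrams to unigrams.

    Each table is a list of (ngram, count) pairs sorted from most to least
    frequent, so the first match found is the most frequent one.
    """
    words = text.lower().split()

    # Step 1: look for a trigram that starts with the last two input words.
    if len(words) >= 2:
        prefix = " ".join(words[-2:]) + " "
        for gram, _count in trigrams:
            if gram.startswith(prefix):
                return gram.split()[-1]

    # Step 2: fall back to a bigram that starts with the last input word.
    if len(words) >= 1:
        prefix = words[-1] + " "
        for gram, _count in bigrams:
            if gram.startswith(prefix):
                return gram.split()[-1]

    # Step 3: nothing matched, so return the most frequent unigram.
    return unigrams[0][0]

# Hypothetical frequency tables, already sorted highest to lowest.
trigrams = [("i love data", 2), ("i love r", 1), ("love data science", 1)]
bigrams = [("i love", 3), ("love data", 2), ("data science", 1)]
unigrams = [("i", 3), ("love", 3), ("data", 2)]

print(predict_next("I love", trigrams, bigrams, unigrams))       # trigram match
print(predict_next("we love", trigrams, bigrams, unigrams))      # bigram fallback
print(predict_next("hello world", trigrams, bigrams, unigrams))  # unigram fallback
```

The trailing space in each prefix keeps a partial word from matching, so an input ending in "lov" does not match the bigram "love data".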