TextPredictor

Padma Panchapakesan
28th February 2020

Capstone Project for Coursera Data Science Course

TextPredictor is a word-prediction app
The prediction model is built into a Shiny app, which provides a front end for user input
The user enters text in the text box and presses the Submit button
The predicted next word appears in the main panel
Links to the Shiny app and the source code are given below

The data used for the prediction model was obtained from blogs, tweets and news
This data set was provided as part of the Coursera capstone project
The data was preprocessed to remove extra whitespace, convert all letters to lower case, remove punctuation and remove numbers
Trigrams, bigrams and unigrams were generated from the preprocessed data
The n-grams were sorted by frequency of occurrence (highest to lowest) and stored in RData files
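
The preprocessing and n-gram steps above can be sketched roughly as follows. This is only an illustration in Python, not the app's own R code: the function names (`preprocess`, `ngrams`, `sorted_ngram_counts`) and the three sample lines are hypothetical, and the exact cleaning rules of the original (for example, how apostrophes are handled) are assumed.

```python
import re
from collections import Counter


def preprocess(text):
    """Lower-case the text, drop numbers and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[0-9]+", "", text)       # remove numbers
    text = re.sub(r"[^\w\s]|_", "", text)    # remove punctuation (assumed rule)
    return re.sub(r"\s+", " ", text).strip() # remove extra whitespace


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sorted_ngram_counts(lines, n):
    """Count n-grams over all lines and sort them from most to least frequent."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(preprocess(line).split(), n))
    return counts.most_common()


# Hypothetical sample data standing in for the blogs, tweets and news text
sample = ["I like tea.", "I like coffee!", "I like tea 2 much"]
trigrams = sorted_ngram_counts(sample, 3)
bigrams = sorted_ngram_counts(sample, 2)
unigrams = sorted_ngram_counts(sample, 1)
```

Each resulting table is a list of (n-gram, count) pairs with the most frequent entry first, which is the ordering the lookup step relies on.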

The user input is first searched for in the sorted trigrams
If a match is found, the last word of the most frequent matching trigram is predicted as the next word
Otherwise, the user input is searched for in the sorted bigrams
If it is found in neither the trigrams nor the bigrams, the most frequent unigram is predicted as the next word
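
The lookup order described above might look like the sketch below. It is written in Python for illustration and is not the app's R implementation. Two details are assumptions: that the last two words of the input are matched against the first two words of a trigram, and that the last word is matched against the first word of a bigram. The function name `predict_next` and the toy tables are hypothetical.

```python
def predict_next(text, trigrams, bigrams, unigrams):
    """Predict the next word, falling back from trigrams to bigrams to unigrams.

    Each table is a list of (ngram_tuple, count) pairs sorted from most to
    least frequent, so the first match is also the most frequent one.
    """
    words = text.lower().split()

    # Assumed matching rule: last two input words vs. first two trigram words
    if len(words) >= 2:
        for gram, _count in trigrams:
            if gram[:2] == tuple(words[-2:]):
                return gram[2]

    # Assumed matching rule: last input word vs. first bigram word
    if words:
        for gram, _count in bigrams:
            if gram[0] == words[-1]:
                return gram[1]

    # Nothing matched: return the most frequent unigram
    return unigrams[0][0][0]


# Hypothetical toy tables, already sorted by frequency
TRIGRAMS = [(("i", "like", "tea"), 2), (("i", "like", "coffee"), 1)]
BIGRAMS = [(("i", "like"), 3), (("like", "tea"), 2), (("like", "coffee"), 1)]
UNIGRAMS = [(("the",), 5), (("i",), 3)]

from_trigram = predict_next("I like", TRIGRAMS, BIGRAMS, UNIGRAMS)
from_bigram = predict_next("we like", TRIGRAMS, BIGRAMS, UNIGRAMS)
from_unigram = predict_next("hello", TRIGRAMS, BIGRAMS, UNIGRAMS)
```

Because the tables are pre-sorted, a linear scan that stops at the first match is enough for a sketch; a real table of this size would more likely be indexed by prefix.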

Shiny App: https://padmapanchapakesan.shinyapps.io/TextPredictor/

GitHub: https://github.com/PadmaPanchapakesan/DataScienceCapstone