Calvin Chin [calvin3663@hotmail.com]
April 2016
Project Overview
People are spending an increasing amount of time on their mobile devices. What if an application could make typing on those devices easier and faster by predicting the next word from the sentence entered so far? This simple Shiny application demonstrates the power of a text prediction model.
The project begins by mining a large corpus of text sentences obtained from news, blogs and Twitter data. The objective is to discover how words are put together, which forms the basis for the Term Frequency database used by the text prediction application.
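To make the idea of a term-frequency table concrete, here is a minimal sketch in base R that counts word n-grams in a handful of sentences. The function name `count_ngrams`, the cleaning rules, and the sample sentences are assumptions made for illustration; the project's actual mining code is not shown in this document and may differ.

```r
# Hypothetical helper (not from the original project): count the n-grams
# found in a character vector of sentences and return them by frequency.
count_ngrams <- function(sentences, n = 2) {
  grams <- unlist(lapply(sentences, function(s) {
    # Lowercase, then replace anything that is not a letter,
    # apostrophe or space with a space (an assumed cleaning rule)
    clean <- gsub("[^a-z' ]", " ", tolower(s))
    words <- strsplit(trimws(clean), "\\s+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    # Slide a window of n words across the sentence
    starts <- seq_len(length(words) - n + 1)
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) {
    return(data.frame(term = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  freq <- sort(table(grams), decreasing = TRUE)
  data.frame(term = names(freq), freq = as.integer(freq),
             stringsAsFactors = FALSE)
}

# Made-up sample sentences, for illustration only
sample_sentences <- c("The cat sat", "The cat ran", "A dog sat")
bigrams <- count_ngrams(sample_sentences, n = 2)
print(bigrams)
```

On a real corpus the same counting would normally be done with a text-mining package and with rare terms pruned to keep the table small, but the output has the same shape: one row per term with its frequency.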
Exploratory Analysis Report: http://rpubs.com/calvin3663/DataScienceCapstoneExploratoryAnalysis/
The graphs below show an excerpt of the initial data analysis of the Top Term Frequencies.
The Text Prediction Model is essentially a reusable R function (FuncTextPrediction.R) that uses the Term Frequency Database created during the Data Mining and Text Analysis phase summarized in the previous section. The function takes a text sentence as input, extracts its terms, and applies a text matching algorithm to obtain a list of possible next words. Five possible suggestions are then returned to the calling application.
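The lookup described above might be sketched as follows. This is not the contents of FuncTextPrediction.R, which is not reproduced here. It is a simplified illustration that assumes a term-frequency table with the hypothetical columns `context`, `next_word` and `freq`, and it uses a longest-context-first match (try the last two words, then fall back to the last word), which is one common way to implement this kind of text matching.

```r
# Hypothetical sketch of a next-word lookup; the function name, the table
# layout and the fallback strategy are assumptions, not the original code.
predict_next <- function(input, ngram_db, max_suggestions = 5) {
  # Extract lowercase terms from the input sentence
  clean <- gsub("[^a-z' ]", " ", tolower(input))
  words <- strsplit(trimws(clean), "\\s+")[[1]]
  words <- words[nzchar(words)]

  suggestions <- character(0)
  # Use at most the last two words as context (an assumed limit)
  max_context <- min(length(words), 2)
  # Try the longest context first, then fall back to shorter ones
  for (k in rev(seq_len(max_context))) {
    context <- paste(tail(words, k), collapse = " ")
    hits <- ngram_db[ngram_db$context == context, , drop = FALSE]
    hits <- hits[order(-hits$freq), , drop = FALSE]
    suggestions <- unique(c(suggestions, hits$next_word))
    if (length(suggestions) >= max_suggestions) break
  }
  head(suggestions, max_suggestions)
}

# Toy term-frequency table with made-up counts, for illustration only
db <- data.frame(
  context   = c("thank you", "thank you", "you", "you", "you"),
  next_word = c("for", "so", "are", "for", "can"),
  freq      = c(50, 20, 90, 60, 40),
  stringsAsFactors = FALSE
)

print(predict_next("Thank you", db))
```

Matches on the longer context are ranked ahead of matches on the shorter one, so a word that follows the full phrase is suggested before a word that merely follows the last term. If nothing matches, the function returns an empty character vector and the calling application decides what to show.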
The Text Prediction Function and the accompanying Term Frequency Database can easily be ported to different environments and integrated into mobile devices to extend the capabilities of existing mobile applications.
This Smart Text Prediction Algorithm has tremendous potential based on the following factors:
To run the application, use your browser to open the link below:
https://calvin3663.shinyapps.io/DataScienceCapstone_NgramTextPredictor/