'Word App' - Word Predictor

Ching Yin Goh
Mon Jul 11 20:42:58 2016

Executive Summary

'Word App' takes a word or phrase as input and predicts the most likely next words

The algorithm uses the R package 'quanteda' to build a corpus from these data sources:

  • U.S. news
  • U.S. blogs
  • U.S. tweets (Twitter)

'n-grams' generated from the corpus are used to make the prediction
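
As a rough illustration of what "generating n-grams" means, here is a minimal Python sketch. It is not the app's R/quanteda code. The helper names `tokenize` and `ngram_counts` are hypothetical, and the regex-based cleaning is only an assumption standing in for the app's own preprocessing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters/apostrophes as words.

    Assumption: a simple stand-in for the app's cleaning step.
    Sentence boundaries are ignored in this sketch.
    """
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tokens = tokenize("I love New York. I love new songs.")
bigrams = ngram_counts(tokens, 2)
print(bigrams.most_common(2))  # [(('i', 'love'), 2), (('love', 'new'), 2)]
```

The same counting with n = 3 or n = 4 gives trigram and 4-gram tables. Larger n captures more context but needs more data to produce reliable counts.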

Data source: SwiftKey, 2016

High-level Algorithm

  • Download data
  • Clean data
  • Sample data from each source
  • Create corpus
  • Tokenize words
  • Detect n-gram collocations in the corpus
  • Return the collocations and their association-measure scores
  • Parse the input phrase
  • Predict next word
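
The last two steps (parse the input, predict the next word) can be sketched as a simple longest-match backoff. This Python sketch is not the author's implementation. It ranks followers by raw n-gram counts, whereas the app scores collocations with a quanteda association measure. The names `build_tables` and `predict` and the toy corpus are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lower-case and keep letters/apostrophes, standing in for the cleaning step.
    return re.findall(r"[a-z']+", text.lower())


def build_tables(corpus, max_n=3):
    """For each n in 2..max_n, map an (n-1)-word prefix to a Counter of following words."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for doc in corpus:
        tokens = tokenize(doc)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                prefix = tuple(tokens[i:i + n - 1])
                tables[n][prefix][tokens[i + n - 1]] += 1
    return tables


def predict(phrase, tables, k=3):
    """Try the longest n-gram first, then back off to shorter ones until a prefix matches."""
    words = tokenize(phrase)
    for n in sorted(tables, reverse=True):
        if len(words) < n - 1:
            continue  # input too short for this n-gram order
        prefix = tuple(words[-(n - 1):])
        followers = tables[n].get(prefix)  # .get() does not insert into the defaultdict
        if followers:
            return [word for word, _ in followers.most_common(k)]
    return []  # no match at any order


corpus = ["I want to go home", "I want to eat", "you want to go out", "we need to go"]
tables = build_tables(corpus)
print(predict("they want to", tables))  # ['go', 'eat']  (trigram match)
print(predict("let's go", tables))      # ['home', 'out'] (backs off to bigrams)
```

Backing off to shorter prefixes means the app can still suggest something for unseen phrases. An association score such as the ones quanteda reports could replace the raw counts in `most_common` without changing the control flow.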

How to use 'Word App'

  • Prediction
    1) Enter a word or phrase in the text box
    2) A list of potential next words is displayed
    3) The search box lets the user look for words containing specific letters

  • Visualization
    1) An additional feature of 'Word App' is the word cloud
    2) Use the slider to select the number of words to display
    3) The word cloud shows that number of predicted next words
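
The search box and the word-cloud slider both narrow down the candidate list. This minimal Python sketch assumes the search keeps candidates that contain the typed letters anywhere (case-insensitive) and the slider keeps the N highest-scoring words. The function names are hypothetical and the matching rule is an assumption, not taken from the app's shiny code.

```python
def filter_candidates(candidates, letters):
    """Keep candidates containing the typed letters (case-insensitive), preserving rank order."""
    letters = letters.lower()
    return [word for word in candidates if letters in word.lower()]


def top_n_for_cloud(scores, n):
    """Return the n highest-scoring (word, score) pairs, e.g. to size words in a cloud."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


print(filter_candidates(["going", "eat", "get", "great"], "ea"))  # ['eat', 'great']
print(top_n_for_cloud({"go": 5, "eat": 2, "get": 3}, 2))          # [('go', 5), ('get', 3)]
```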

'Word App' could be used to help people learn a language

Documentation and Source Code

  • global.R - Functions to restore a single R object
  • getData.R - Functions to load and preprocess data
  • predict.R - Functions to predict word
  • ui.R - User interface for shiny app
  • server.R - Server for shiny app

Source code: https://github.com/cygoh888/myApp2