Capstone Word Prediction Presentation

Nikolaos Perdikis
October 2019

Speed and Accuracy in Text Prediction

  • Model for the English Language
  • Non-proprietary platform, with the code on GitHub for transparency
  • Exploratory Data Analysis to visually inspect the data
  • R language: a free software environment for statistical computing, data analysis and graphics

Data and Exploratory Analysis

  • Over 550 MB of text from blogs, news and Twitter feeds:
    nearly 70 million words in 3 million lines of text
  • Identify trends in the data: the most common words and combinations of words
  • Natural Language Processing (NLP) algorithms
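
The step of finding the most common words and word combinations can be illustrated with a short sketch. It is written in Python purely for illustration (the project itself uses R), and the tokenizer, the function names and the three sample lines are all hypothetical stand-ins, not the project's code or data.

```python
from collections import Counter
import re


def tokenize(line):
    """Lowercase a line and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", line.lower())


def ngram_counts(lines, n):
    """Count every run of n consecutive words, never crossing a line boundary."""
    counts = Counter()
    for line in lines:
        words = tokenize(line)
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


# Hypothetical sample standing in for the blogs, news and Twitter corpus.
sample_lines = [
    "Thanks for the follow",
    "Thanks for the great day",
    "I can't wait for the weekend",
]

unigrams = ngram_counts(sample_lines, 1)  # single words
bigrams = ngram_counts(sample_lines, 2)   # pairs of adjacent words
top_bigrams = bigrams.most_common(3)      # most frequent pairs first
```

On a corpus of the size described above, the same counting would normally be done in chunks or with a dedicated text-mining library rather than in a single in-memory pass.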

One Shiny Application

  • The application will attempt to predict the next word in a given sentence

  • When the user enters text in the input box, it is echoed in the output pane and the algorithm chooses the most probable next word

  • Depending on the length of the sentence provided, the algorithm uses up to the last three words, if they exist. It returns a prediction even from a single word

  • There is no need to press or click anything. As text is typed, the prediction appears in the output pane
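
One way to read the "up to the last three words" behaviour is as a simple back-off: try the longest available context first and shorten it until a match is found. The slides do not state the exact algorithm, so the sketch below is an assumption, written in Python for illustration only (the application itself is built with R and Shiny). `build_model`, `predict_next` and the training sentences are hypothetical.

```python
from collections import Counter, defaultdict

MAX_CONTEXT = 3  # the slides say up to the last three words are used


def build_model(sentences, max_context=MAX_CONTEXT):
    """Map each context of 1..max_context words to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words)):
            for k in range(1, max_context + 1):
                if i - k >= 0:
                    model[tuple(words[i - k:i])][words[i]] += 1
    return model


def predict_next(model, text, max_context=MAX_CONTEXT):
    """Return the most frequent follower of the longest known context, or None."""
    words = text.lower().split()
    # Start with the longest context the input allows, then back off word by word.
    for k in range(min(max_context, len(words)), 0, -1):
        context = tuple(words[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # nothing in the input has been seen in training


# Hypothetical training sentences, not taken from the project's corpus.
training = [
    "i want to go home",
    "i want to go out",
    "i want to go home now",
    "we have to leave",
]
model = build_model(training)

three_word_guess = predict_next(model, "I want to go")  # uses "want to go"
backoff_guess = predict_next(model, "they have to")     # backs off to "have to"
one_word_guess = predict_next(model, "to")              # single-word context
```

Choosing the raw most frequent follower keeps the lookup fast, which suits prediction as the user types; a production model would usually also weight or smooth the scores across context lengths.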

Benefits!

  • Easy acquisition of texts and training of models
  • Any language with available text can be supported
  • Non-proprietary, non-legacy software and hardware platform
  • Web interface for computers and mobile devices
