Capstone Word Prediction Presentation
Nikolaos Perdikis
October 2019
Speed and Accuracy in Text Prediction
- Model for the English Language
- Non-proprietary platform, with the code published on GitHub for transparency
- Exploratory Data Analysis to visually inspect the data
- R language: a free software environment for statistical computing, data analysis and graphics

Data and Exploratory Analysis
- Over 550 MB of text from blogs, news and Twitter feeds: nearly 70 million words in 3 million lines of text
- Identify trends in the data: the most common words and word combinations
- Natural Language Processing (NLP) algorithms
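
Finding the most common words and word combinations amounts to counting n-grams. The following is a minimal sketch of that step, written in Python for illustration and not taken from the project's R code. The tokenizer, the helper names (`tokenize`, `ngram_counts`) and the three sample lines are assumptions made for the example.

```python
from collections import Counter
import re


def tokenize(line):
    # Assumed simplification: lowercase the line and keep runs of
    # letters and apostrophes as tokens.
    return re.findall(r"[a-z']+", line.lower())


def ngram_counts(lines, n):
    # Count every run of n consecutive tokens within each line.
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical sample lines standing in for the blog, news and Twitter text.
sample = [
    "Thanks for the follow",
    "Thanks for the help",
    "Looking forward to the weekend",
]

unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

On a corpus of this size, the counting would normally run on a sample of the lines or in chunks, since the full tables of longer n-grams can grow large.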

One Shiny Application
The application attempts to predict the next word of a given sentence.
When the user types text in the input box, it is echoed in the output pane and the algorithm selects the most probable next word.
Depending on the length of the input, the algorithm uses up to the last three words, if they exist. It returns a prediction even when only one word is entered.
There is no need to press or click anything: the prediction appears in the relevant window as the text is typed.
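
The idea of using up to the last three words, and still answering with fewer, can be sketched as a simple count-based backoff: try the longest available context first and shorten it until a match is found. The Python sketch below is an illustration under assumptions, not the application's actual R code. The names `train` and `predict`, the raw-count scoring without smoothing, the fallback to the most frequent word overall, and the sample lines are all hypothetical.

```python
from collections import Counter, defaultdict
import re

MAX_CONTEXT = 3  # the slides state that up to the last three words are used


def tokenize(text):
    # Assumed simplification: lowercase, keep letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def train(lines, max_context=MAX_CONTEXT):
    # table[context] is a Counter of the words seen right after that context.
    # The empty context () holds plain word frequencies.
    table = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for i, word in enumerate(tokens):
            for k in range(max_context + 1):
                if i - k < 0:
                    break
                table[tuple(tokens[i - k:i])][word] += 1
    return table


def predict(table, text, max_context=MAX_CONTEXT):
    tokens = tokenize(text)
    # Start with the longest context available and back off word by word.
    for k in range(min(max_context, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        followers = table.get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # only reached when the table is empty


# Hypothetical training lines.
lines = [
    "thanks for the follow",
    "thanks for the help today",
    "thanks for the follow back",
    "see you at the party",
]
table = train(lines)

print(predict(table, "Many thanks for the"))
print(predict(table, "meet me at"))
```

Picking the raw most frequent follower is the simplest possible scoring rule. Weighted schemes, which discount predictions made from shorter contexts, are a common refinement and may or may not match what the application does.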
Benefits!
- Easy acquisition of texts and training of models
- Any language for which text is available can be supported
- Non-proprietary, non-legacy software and hardware platform
- Web interface for computers and mobile devices
