Capstone Project: Word Predictor

Jonah Winninghoff

10/31/2020

THE CAPSTONE PROJECT

The goal of this final capstone project is to build a machine-learning algorithm that predicts the next word, similar to the suggestions offered by messaging apps on the iPhone and other smartphones. The project is part of the Johns Hopkins Data Science Specialization on Coursera, in partnership with SwiftKey.

This work is done entirely in the R programming language, using Natural Language Processing (NLP) and several other techniques. More details can be found in my work. Before you view it, please go through this presentation to understand it better.

WHAT DOES THIS DATASET LOOK LIKE?

Originally, the data resembles a book containing a lot of gibberish and profanity. It is turned into a dataset in which each word is paired with the word that follows it. These pairs are called bigrams, and together they can be drawn as a word network. What you see below is just a demonstration. After this presentation, you will see the real, more complex networks.

image1
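
To make the idea of a bigram concrete, here is a minimal sketch of counting word pairs. It is written in Python purely for illustration (the project itself is written in R), and the function name `bigram_counts` and the sample sentence are made up for this example rather than taken from the project.

```python
from collections import Counter
import re

def bigram_counts(text):
    """Count adjacent word pairs (bigrams) in lower-cased text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))

# Made-up sample text, not the project's data.
sample = "the cat sat on the mat and the cat slept"
counts = bigram_counts(sample)

# Each (word, next_word) pair is one edge in the word network,
# weighted by how often the pair occurs.
for (word, next_word), n in counts.most_common(3):
    print(word, "->", next_word, n)
```

Each counted pair can be read as an arrow from one word to the next, which is what the network picture shows.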

HOW WELL DOES THIS PERFORM?

The algorithm eliminates the majority of profane and gibberish words. It can predict the next word from word sequences of various lengths, and the number of word suggestions it returns can be changed. It is a stepping stone toward a user-friendly interface. The speed of the algorithm is considered to be at an optimal level.

image3
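
The sketch below shows one way the behavior described above could work: filter unwanted words, then return an adjustable number of suggestions. It is a simplified illustration in Python, not the project's R code. It looks only at the last word typed, whereas the real algorithm uses longer word sequences. The names `BLOCKLIST`, `build_model`, and `suggest`, as well as the sample text, are assumptions made for this example.

```python
from collections import Counter, defaultdict

BLOCKLIST = {"darn"}  # hypothetical stand-in for a profanity list

def build_model(words):
    """Map each word to a Counter of the words that follow it,
    skipping any pair that contains a blocklisted word."""
    model = defaultdict(Counter)
    for word, next_word in zip(words, words[1:]):
        if word not in BLOCKLIST and next_word not in BLOCKLIST:
            model[word][next_word] += 1
    return model

def suggest(model, phrase, n=3):
    """Return up to n of the most frequent followers of the last word."""
    tokens = phrase.lower().split()
    if not tokens:
        return []
    followers = model.get(tokens[-1], Counter())
    return [word for word, _ in followers.most_common(n)]

# Made-up sample text, not the project's data.
words = "i like tea i like coffee i like tea i darn love tea".split()
model = build_model(words)

print(suggest(model, "I like", n=1))
print(suggest(model, "I like", n=2))
```

Changing `n` changes how many suggestions come back, and the blocklisted word never appears as a suggestion.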

DRAWBACKS AND FUTURE IMPROVEMENTS

This algorithm gives no guarantee that the predicted next word is grammatically correct. It can also sometimes get trapped in the same prediction cycle.
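
The prediction cycle mentioned above can be reproduced with a toy bigram model. The sketch below is in Python for illustration only (the project is written in R), and it assumes the user always accepts the top suggestion. The helper names and the sample text are hypothetical.

```python
from collections import Counter, defaultdict

def build_model(words):
    """Map each word to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for word, next_word in zip(words, words[1:]):
        model[word][next_word] += 1
    return model

def greedy_chain(model, start, steps):
    """Repeatedly accept the top suggestion, starting from one word."""
    chain = [start]
    for _ in range(steps):
        followers = model.get(chain[-1])
        if not followers:
            break  # no known follower, so the chain ends here
        chain.append(followers.most_common(1)[0][0])
    return chain

# Made-up sample text in which "of the end" repeats.
words = "of the end of the end of the day".split()
model = build_model(words)
chain = greedy_chain(model, "of", 5)
print(" ".join(chain))
```

Because the most frequent follower of "end" leads back to "of", the chain keeps repeating the same three words and never reaches "day".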

In the future, this algorithm should be given a grammatical foundation. It should also be able to base its predictions on datasets for specific demographic groups rather than the general population, without raising ethical issues.

VIEW MY WORK

The application can be seen here. If you want to see how the code works, my GitHub page is here.

Image