Data Science Capstone Project

Pedro Carvalho Brom - 02/11/2017

This presentation briefly pitches an application that predicts the next word as you type. The application is the capstone project of the Coursera Data Science Specialization, taught by professors from Johns Hopkins University in cooperation with SwiftKey.

The objective

The main goal of this project is to build an application capable of predicting the next word given the previous word.

All text processing was based on Natural Language Processing concepts: the corpus was used to build a dictionary of the words that appear most frequently in phrases and popular expressions.
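As an illustration of the frequency dictionary described above, here is a minimal Python sketch. It is not the project's code: the function name `build_frequency_dictionary`, the lowercase/alphabetic tokenization rule and the sample sentences are hypothetical, chosen only to show how word frequencies can be counted from a corpus.

```python
import re
from collections import Counter


def build_frequency_dictionary(corpus, top_n=10):
    """Count how often each word appears and keep the top_n most frequent.

    Hypothetical helper: lowercases each line and keeps only alphabetic
    tokens (apostrophes allowed), which is an assumed tokenization rule.
    """
    counts = Counter()
    for line in corpus:
        tokens = re.findall(r"[a-z']+", line.lower())
        counts.update(tokens)
    return counts.most_common(top_n)


sample = [
    "Thanks for the follow!",
    "Thanks for the help, see you at the game.",
    "See you at the office tomorrow.",
]
print(build_frequency_dictionary(sample, top_n=3))
```

In a real corpus such a table quickly becomes very large, which is why keeping only the most frequent words (the `top_n` cut) is a common way to keep the dictionary small enough for an interactive app.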

After this processing, a neural network prediction model was trained, reaching an accuracy of approximately 74%.

How does the program work?

A neural network is typically defined by three components:

  1. The input information, which is fed directly into the interconnections between the neurons;
  2. The weights of the interconnections, which are updated in the learning process;
  3. The activation function that converts the weighted input of a neuron to its output activation.
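To make the three components concrete, here is a minimal sketch of a single artificial neuron in Python. It assumes a sigmoid activation, a squared-error loss and plain gradient descent; these choices, the learning rate and the toy input are illustrative assumptions, not the architecture used in the app.

```python
import math


def sigmoid(z):
    """Component 3, the activation function: maps the weighted input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, weights, bias):
    """Component 1: the inputs flow through the weighted interconnections,
    then the activation function produces the neuron's output."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def train_step(inputs, target, weights, bias, lr=0.5):
    """Component 2: one gradient-descent update of the weights.

    Assumes the loss 0.5 * (out - target)**2 with a sigmoid output,
    so d(loss)/dz = (out - target) * out * (1 - out).
    """
    out = forward(inputs, weights, bias)
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


weights, bias = [0.1, -0.2], 0.0
x, y = [1.0, 0.5], 1.0
before = forward(x, weights, bias)
for _ in range(100):
    weights, bias = train_step(x, y, weights, bias)
after = forward(x, weights, bias)
print(f"output before training: {before:.3f}, after: {after:.3f}")
```

Repeating the update moves the output toward the target; a full network stacks many such neurons in layers, but each one performs these same three operations.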

In this application, you enter a word and the model evaluates the most likely next word, taking into account your writing habits (your natural language).
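The app itself uses a neural network, but the idea of "enter a word, get the most likely next word" can be illustrated with a much simpler, swapped-in technique: a bigram frequency lookup. The Python sketch below is hypothetical (the function names `build_bigram_model` and `predict_next` and the toy corpus are illustrative) and does not reproduce the app's model or its accuracy.

```python
import re
from collections import Counter, defaultdict


def build_bigram_model(corpus):
    """Map each word to a Counter of the words that follow it in the corpus."""
    following = defaultdict(Counter)
    for line in corpus:
        tokens = re.findall(r"[a-z']+", line.lower())
        # Pair every word with the word right after it
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following


def predict_next(model, word, k=3):
    """Return up to k candidate next words, most frequent first."""
    # .get() does not insert missing keys into the defaultdict
    candidates = model.get(word.lower())
    if not candidates:
        return []  # unseen word: no prediction
    return [w for w, _ in candidates.most_common(k)]


corpus = [
    "I love you so much",
    "I love this song",
    "I love you too",
    "I want to go home",
]
model = build_bigram_model(corpus)
print(predict_next(model, "love"))  # ['you', 'this']
```

A lookup like this only sees the previous word and fails on words absent from the corpus, which is one motivation for using a learned model such as a neural network instead.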

The application is simple to use: just start typing. =D

Some additional information

The next word prediction app is hosted on shinyapps.io: https://supermetrica.shinyapps.io/nextword/

The code for this application, together with the milestone report, related scripts and this presentation, can be found in this GitHub repo: https://github.com/pcbrom/CPSK

Learn more about the Coursera Data Science Specialization: https://www.coursera.org/specialization/jhudatascience

If you would like to add me to your network: https://www.facebook.com/pedraodeexatas