SwiftKey Prediction App

Andres Camilo Zuñiga Gonzalez

24/8/2020

App Overview

  • This capstone project is focused on word prediction of text datasets from the SwiftKey Company.
  • This app is developed with n-gram algorithm and deployed via R Shiny.
  • Instead of only showing the most probable next word, this app creates a word cloud that displays the words with size as and indicator of probability of occurrence.
  • See the app here.

Approach Explanation

  • The data comes from three different sources: Blogs, News and Twitter in English.
  • An exploratory data analysis was developed for the whole files and can be seen here.
  • For simplicity of the app, only 10% of each file was used in the development.
  • In the development of the Shiny App the packages used were dplyr, tidyr, wordcloud2, tidytext.
  • Results can be filtered if the user wants to avoid stop words, which are words like: of, in, the, among others that in some cases do not provide meaningful information.
  • Only 2-gram and 3-gram are used in the development of the app.

Usage

  • Type up to two (2) words in the text box
  • Select a type of file you would like to search the word you typed.
  • Check the box if you want to exclude stop words in the predicted words.
  • Hover over the words in the wordcloud plot to see how many times they appear in the datasets.
  • IMPORTANT NOTE: The app updates itself and might take up to 10 seconds to plot the wordcloud.

Usage (Fun Stuff)

  • If no word is typed, the app will display the message TYPE SOME WORDS.
  • If the words could not be found, the app will display the message WORDS NOT FOUND.
  • If the user types more than two words, the app will display these in the wordcloud.
  • The shape of the wordcloud changes randomly.
  • The background color changes according to the type of text the user wants to look up into (i.e., twitter blue when twitter is selected).

App Demo

See the app here.