Marievee's Final Presentation

Marievee Santana
January 23, 2022

Background

The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:

A Shiny app that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.

A slide deck consisting of no more than 5 slides created with RStudio Presenter, pitching your algorithm and app as if you were presenting to your boss or an investor.

Full background on the Johns Hopkins Data Science Capstone class can be found at https://www.coursera.org/learn/data-science-project/home/welcome

The Analysis

The algorithm I created is a very simple regression algorithm that looks at the probabilities of certain words being used together. It draws on the Blogs, Tweets, and News data provided in the class materials. For simplicity, I used only a portion of that data in my corpus and looked at n-grams of n = 2, 3, and 4 (bigrams, trigrams, and quadgrams). I explored larger n-grams but ultimately decided to include only those three sizes.
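The idea of predicting from words that appear together can be illustrated with a minimal sketch in base R. This is not the code behind the app: the toy corpus, the function names (tokenize, count_ngrams, predict_next), and the choice to try the longest n-gram first and fall back to shorter ones are all assumptions made for illustration. Raw counts stand in for probabilities here.

```r
# Hypothetical toy corpus standing in for the sampled blogs, tweets, and news text.
corpus <- c(
  "thanks for the follow",
  "thanks for the help",
  "thanks for the follow back",
  "see you at the end of the day"
)

# Split one line into lowercase word tokens.
tokenize <- function(line) {
  words <- strsplit(tolower(line), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Count every n-gram of size n; names are the n-grams, values are the counts.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(lines, function(line) {
    words <- tokenize(line)
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  table(grams)
}

# One frequency table per n-gram size (2, 3, and 4), keyed by "2", "3", "4".
tables <- lapply(c(`2` = 2, `3` = 3, `4` = 4),
                 function(n) count_ngrams(corpus, n))

# Predict the next word: try the longest context first (quadgrams), then fall
# back to trigrams and bigrams. Returns "?" when nothing matches.
predict_next <- function(phrase, tables) {
  words <- tokenize(phrase)
  for (n in c(4, 3, 2)) {
    if (length(words) < n - 1) next
    counts <- tables[[as.character(n)]]
    if (length(counts) == 0) next
    prefix <- paste0(paste(tail(words, n - 1), collapse = " "), " ")
    hits <- counts[startsWith(names(counts), prefix)]
    if (length(hits) > 0) {
      best <- names(hits)[which.max(hits)]
      return(tail(strsplit(best, " ")[[1]], 1))
    }
  }
  "?"
}

predict_next("thanks for the", tables)  # "follow"
predict_next("zzz", tables)             # "?"
```

In this sketch "follow" wins because the quadgram "thanks for the follow" occurs twice in the toy corpus while "thanks for the help" occurs once. A real model would normalize counts into probabilities and would need a far larger corpus.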

The Shiny App

ui <- fluidPage(

        # Application title
        titlePanel("Word Predictor"),
        p("You enter a phrase and the app will predict the next word."),

        # Sidebar with usage instructions
        sidebarLayout(
                sidebarPanel(
                        h2("HOW to USE"),
                        h5("1. Enter a word or phrase in the box."),
                        h5("2. The algorithm will predict the next word."),
                        h5("3. If you get a question mark it means no prediction. Check spelling or try again."),
                        h5("4. Additional tabs show plots of the top ngrams in the dataset."),
                        br()
                ),

                # Main panel: prediction tab plus tabs for the top n-gram plots
                mainPanel(
                        tabsetPanel(
                                tabPanel("Predict",
                                         textInput("user_input", h3("Your Input:"),
                                                   value = "Your words"),
                                         h3("Predicted Next Word:"),
                                         h4(em(span(textOutput("ngram_output"), style="color:green")))),

                                tabPanel("Top Bigrams",
                                         br(),
                                         img(src = "biplotcloud.png", height = 500, width = 700)),
                                tabPanel("Top Trigrams",
                                         br(),
                                         img(src = "trigramcloud.png", height = 500, width = 700)),
                                tabPanel("Top Quadgrams",
                                         br(),
                                         img(src = "quadgrams.png", height = 500, width = 700))
                        )
                )
        )
)
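The ui above only defines the layout; a Shiny app also needs a server function that fills in the ngram_output text from the user_input box. The sketch below shows one way that wiring might look. The predict_next_word function and its tiny lookup table are hypothetical stand-ins, since the real prediction code is not shown in this presentation; only the two IDs and the "?" fallback come from the ui itself.

```r
# Hypothetical stand-in for the real prediction function, which is not shown
# in this presentation. It returns "?" when it has no prediction.
predict_next_word <- function(phrase) {
  lookup <- c("thanks for the" = "follow", "happy new" = "year")
  key <- trimws(tolower(phrase))
  if (key %in% names(lookup)) unname(lookup[key]) else "?"
}

# Server logic: recompute the prediction whenever the text box changes.
# `user_input` and `ngram_output` match the IDs used in the ui definition.
server <- function(input, output) {
  output$ngram_output <- shiny::renderText({
    predict_next_word(input$user_input)
  })
}

# To launch the app (requires the shiny package and the ui object above):
# shiny::shinyApp(ui = ui, server = server)
```

Because renderText is reactive, the predicted word updates as the user types, with no submit button needed.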

Additional Information

In my app, I included a few extra tabs with word clouds and bar plots of the most common words found in the bigrams, trigrams, and quadgrams of the predictive model I used. Here are a couple of examples.

knitr::include_graphics("./www/biplotcloud.png")
