Marievee Santana
January 23, 2022
The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:
A Shiny app that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.
A slide deck consisting of no more than 5 slides created with R Studio Presenter pitching your algorithm and app as if you were presenting to your boss or an investor.
Full background on the Johns Hopkins Data Science Capstone class can be found at https://www.coursera.org/learn/data-science-project/home/welcome
The algorithm I created is a very simple n-gram algorithm that looks at the probabilities of certain words being used together. It draws on the blog, tweet, and news data provided in the class materials. For simplicity, I used only a portion of the data in my corpus and looked at n-grams of n = 2, 3, and 4. I explored larger n-grams but ultimately decided to include only those three sizes.
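To make the idea concrete, here is a minimal base R sketch of next-word prediction from n-gram counts. It is not the code behind the app: the toy corpus, the function names (`tokenize`, `build_ngrams`, `predict_next`), and the rule of trying the longest n-gram first and falling back to shorter ones are all assumptions made for illustration.

```r
# Hypothetical toy corpus; the real app used samples of blogs, tweets, and news.
corpus <- c(
  "thank you for the follow",
  "thank you for the help",
  "thank you for the follow today",
  "i look forward to the weekend"
)

# Split text into lowercase word tokens, dropping empty strings.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Count n-grams of size n as (prefix of n - 1 words, next word) pairs,
# sorted so the most frequent pair comes first.
build_ngrams <- function(sentences, n) {
  rows <- lapply(sentences, function(s) {
    w <- tokenize(s)
    if (length(w) < n) return(NULL)
    starts <- seq_len(length(w) - n + 1)
    data.frame(
      prefix = vapply(starts, function(i) {
        paste(w[i:(i + n - 2)], collapse = " ")
      }, character(1)),
      nextword = w[starts + n - 1],
      stringsAsFactors = FALSE
    )
  })
  grams <- do.call(rbind, rows)
  grams$count <- 1
  counts <- aggregate(count ~ prefix + nextword, data = grams, FUN = sum)
  counts[order(-counts$count), ]
}

# One count table per n-gram size, named "2", "3", and "4".
models <- setNames(lapply(2:4, function(n) build_ngrams(corpus, n)), 2:4)

# Try the quadgram table first, then trigrams, then bigrams.
# Returns "?" when no table contains the typed prefix.
predict_next <- function(phrase, models) {
  w <- tokenize(phrase)
  for (n in c(4, 3, 2)) {
    if (length(w) < n - 1) next
    prefix <- paste(tail(w, n - 1), collapse = " ")
    table_n <- models[[as.character(n)]]
    hit <- table_n[table_n$prefix == prefix, ]
    if (nrow(hit) > 0) return(as.character(hit$nextword[1]))
  }
  "?"
}

predict_next("thank you for the", models)
```

With this toy corpus, the quadgram prefix "you for the" is followed by "follow" twice and "help" once, so the more frequent word wins. A phrase with no matching prefix at any size returns "?", which mirrors the question mark the app shows when it has no prediction.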
library(shiny)
ui <- fluidPage(
  # Application title
  titlePanel("Word Predictor"),
  p("You enter a phrase and the app will predict the next word."),
  # Sidebar with usage instructions
  sidebarLayout(
    sidebarPanel(
      h2("HOW to USE"),
      h5("1. Enter a word or phrase in the box."),
      h5("2. The algorithm will predict the next word."),
      h5("3. If you get a question mark it means no prediction. Check spelling or try again."),
      h5("4. Additional tabs show plots of the top ngrams in the dataset."),
      br()
    ),
    # Main panel with the prediction tab and the n-gram plot tabs
    mainPanel(
      tabsetPanel(
        tabPanel("Predict",
                 textInput("user_input", h3("Your Input:"),
                           value = "Your words"),
                 h3("Predicted Next Word:"),
                 h4(em(span(textOutput("ngram_output"), style = "color:green")))),
        tabPanel("Top Bigrams",
                 br(),
                 img(src = "biplotcloud.png", height = 500, width = 700)),
        tabPanel("Top Trigrams",
                 br(),
                 img(src = "trigramcloud.png", height = 500, width = 700)),
        tabPanel("Top Quadgrams",
                 br(),
                 img(src = "quadgrams.png", height = 500, width = 700))
      )
    )
  )
)
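The UI above reads the phrase from `user_input` and displays text in `ngram_output`, so it needs a server function that connects the two. The sketch below shows one way that wiring might look. The lookup table `toy_bigrams` and the helper `lookup_next` are hypothetical stand-ins for the real n-gram model, which is not shown here.

```r
# Hypothetical lookup table: last typed word -> most likely next word.
toy_bigrams <- c(thank = "you", you = "for", of = "the")

# Return the predicted word, or "?" when the last word is not in the table.
lookup_next <- function(phrase) {
  words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
  if (length(words) == 0) return("?")
  guess <- toy_bigrams[tail(words, 1)]
  if (is.na(guess)) "?" else unname(guess)
}

# Server logic: recompute the prediction whenever the text box changes.
# shiny:: is spelled out so the sketch does not depend on library(shiny).
server <- function(input, output) {
  output$ngram_output <- shiny::renderText({
    lookup_next(input$user_input)
  })
}

# To launch the app together with the ui object:
# shiny::shinyApp(ui = ui, server = server)
```

Because `renderText` is reactive, the predicted word updates as the user types, with no submit button required.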
In my app, I included a few tabs with word clouds and bar plots of the most common bigrams, trigrams, and quadgrams in the predictive model I used. Here is an example.
knitr::include_graphics("./www/biplotcloud.png")
For additional information you can contact me at