Data Science Capstone Final Assignment

Jose Javier Saravia Mata
24/06/2020

Project Description

This Shiny app was built as part of the Data Science Specialization Capstone Final Assignment. The steps followed were:

  • Read the .txt files
  • Take a sample of those files
  • Use a corpus to clean the data
  • Use ngram to extract word combinations
  • Export the dataset to an RDS file
  • Use a function to read the RDS file and predict the next word
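
The steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: it uses base R string functions in place of the corpus and ngram tooling named in the list, it builds bigrams only, and the input text, the temporary file paths and the helper name bigrams_of are made up for the example.

```r
# Hypothetical stand-in for the .txt input: a few lines written to a temp file.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Thanks for the help!",
             "Thanks for the ride.",
             "You can come with the dog.",
             "A gift from the team and the coach."), txt_path)

# 1. Read the .txt file
lines_raw <- readLines(txt_path, encoding = "UTF-8")

# 2. Take a sample of the lines (all of them here, because the toy file is tiny)
set.seed(123)
lines_sample <- sample(lines_raw, size = length(lines_raw))

# 3. Clean: lower-case, then replace anything that is not a letter or a space
clean <- tolower(lines_sample)
clean <- gsub("[^a-z ]", " ", clean)

# 4. Extract bigrams (key word + following word) line by line
bigrams_of <- function(line) {
  w <- strsplit(trimws(line), " +")[[1]]
  if (length(w) < 2) return(NULL)
  data.frame(keys = head(w, -1), prediction = tail(w, -1),
             stringsAsFactors = FALSE)
}
pairs <- do.call(rbind, lapply(clean, bigrams_of))

# Count how often each distinct bigram occurs
model <- aggregate(freq ~ keys + prediction,
                   data = cbind(pairs, freq = 1L), FUN = sum)
model$values <- paste(model$keys, model$prediction)
model <- model[order(-model$freq), c("keys", "values", "freq", "prediction")]

# 5. Export the dataset to an RDS file
rds_path <- tempfile(fileext = ".rds")
saveRDS(model, rds_path)
```

The resulting data frame has the same four columns (keys, values, freq, prediction) as the dataset the app reads back from the RDS file.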

Initial code is available at: https://drive.google.com/file/d/1CtbhG-E8bKtOc_DD5rSuAZa2CdNraQbd/view?usp=sharing

Function to predict

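predecir <- function(model, palabra) {
  library(data.table)  # provides the %like% operator

  # Keep the n-grams whose key matches the typed word
  I1 <- model[model$keys %like% palabra, ]

  # Sum the frequencies of each candidate next word
  I2 <- aggregate(I1$freq, by = list(I1$prediction), FUN = sum)
  names(I2) <- c("word", "freq")

  # Convert frequencies to relative probabilities, rounded to two decimals
  I2$prob <- round(I2$freq / sum(I2$freq), 2)

  # Return up to four candidates, most probable first
  I3 <- I2[order(-I2$prob), ]
  head(I3[, c("word", "prob")], 4)
}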
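
To make the arithmetic of the prediction function concrete, here is a small worked sketch. It assumes a made-up model with invented frequencies, and it swaps in base R grepl() for data.table's %like% so that it runs without extra packages; the name predecir_base is hypothetical and is not part of the app.

```r
# Toy model (hypothetical frequencies, same columns as the stored dataset)
toy_model <- data.frame(
  keys       = c("thanks", "thanks", "thanks", "you"),
  values     = c("thanks for", "thanks to", "thanks a", "you can"),
  freq       = c(6, 3, 1, 5),
  prediction = c("for", "to", "a", "can"),
  stringsAsFactors = FALSE
)

predecir_base <- function(model, palabra) {
  # Rows whose key matches the typed word (regular-expression match)
  hits <- model[grepl(palabra, model$keys), ]
  # Total frequency per candidate next word
  agg <- aggregate(hits$freq, by = list(hits$prediction), FUN = sum)
  names(agg) <- c("word", "freq")
  # Share of each candidate among all matching n-grams
  agg$prob <- round(agg$freq / sum(agg$freq), 2)
  agg <- agg[order(-agg$prob), ]
  head(agg[, c("word", "prob")], 4)
}

result <- predecir_base(toy_model, "thanks")
print(result)
```

For the key "thanks" the matching frequencies are 6, 3 and 1, so the candidates come back as "for" (6/10 = 0.6), "to" (0.3) and "a" (0.1). Because the match is a regular expression, a key such as "your" would also match the input "you"; anchoring the pattern (for example "^you$") is one way to force exact matches.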

Dataset stored into RDS

This is a sample of the dataset used by the prediction function…

    keys     values freq prediction
1    for    for the  261        the
2    and    and the  178        the
3   with   with the  122        the
4   from   from the   96        the
5 thanks thanks for   93        for
6    you    you can   78        can

Word prediction app running

App available at: https://licjaviersaravia.shinyapps.io/CapstoneFinalAssigment/

Instructions: type a 2-3 word sentence to analyze, then click “go”.