Zhouyi Wu
October 2016
This presentation introduces a Shiny app built as the project for the Coursera Data Science Capstone.
The goal of the app is to predict the next word for a given phrase.
For more details about the course and project: https://www.coursera.org/learn/data-science-project/.
The final app: https://theoneeno.shinyapps.io/predict_word/
The dataset is from HC Corpora: https://www.corpora.heliohost.org
We use three files from this dataset:
[1] "en_US.blogs.txt" "en_US.news.txt" "en_US.twitter.txt"
Because my PC has limited memory, I randomly selected 2000 samples from each file to build the model.
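The sampling step can be sketched as follows. This is a minimal Python illustration, not the app's own code: the function name `sample_lines`, the fixed seed, and the assumption that one "sample" means one line of text are all hypothetical.

```python
import random

def sample_lines(lines, k=2000, seed=2016):
    """Draw k lines at random without replacement.

    If the input has k lines or fewer, return all of them unchanged.
    The seed is an assumption, used only to make the draw repeatable.
    """
    lines = list(lines)
    if len(lines) <= k:
        return lines
    rng = random.Random(seed)
    return rng.sample(lines, k)

# Stand-in for the contents of one corpus file (hypothetical data).
corpus = [f"line {i}" for i in range(10000)]
subset = sample_lines(corpus)
```

Sampling without replacement keeps every selected line distinct, and fixing the seed means the same subset is drawn on every run.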
The data analysis includes these steps:
We use only unigrams and bigrams to build the prediction model. Because the sample we selected is small, trigram counts would not be statistically representative.
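Counting unigrams and bigrams might look like the sketch below. It is written in Python for illustration and is not the app's implementation: the tokenizer (lowercase, letters and apostrophes only) and the names `tokenize` and `count_ngrams` are assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (an assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())

def count_ngrams(lines):
    """Return (unigram counts, bigram counts); bigrams never span two lines."""
    unigrams = Counter()
    bigrams = Counter()
    for line in lines:
        tokens = tokenize(line)
        unigrams.update(tokens)
        # Pair each token with the one that follows it on the same line.
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

# Toy input standing in for the sampled text (hypothetical data).
sample = ["I love data science", "I love shiny apps", "data science is fun"]
unigrams, bigrams = count_ngrams(sample)
```

With a small sample, most three-word sequences would appear only once or not at all, which is why the counts stop at pairs.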
To use the app, simply type in the phrase and then click “submit”.
The app will give you the predicted next word.
If your text is not in the bigram model, the app will return the most frequent word in the unigram model.
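The lookup with its fallback can be sketched as below. This is a hedged Python illustration under stated assumptions, not the app's code: the names `train` and `predict_next`, the tokenizer, and the choice to key the bigram lookup on the last word of the phrase are all hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (an assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())

def train(lines):
    """Build unigram counts and, for each word, counts of the words that follow it."""
    unigrams = Counter()
    followers = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        unigrams.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return unigrams, followers

def predict_next(phrase, unigrams, followers):
    """Predict from the bigram model; fall back to the most frequent unigram."""
    tokens = tokenize(phrase)
    if tokens and tokens[-1] in followers:
        return followers[tokens[-1]].most_common(1)[0][0]
    # The last word was never seen with a follower (or the phrase is empty).
    return unigrams.most_common(1)[0][0]

# Toy training text (hypothetical data).
toy = ["i love data science", "i love data", "data science is fun"]
unigrams, followers = train(toy)
```

In this toy model, `predict_next("I love", unigrams, followers)` returns `"data"` from the bigram counts, while `predict_next("hello world", unigrams, followers)` falls back to `"data"` because it is the most frequent single word.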
Thank you for testing the app.