## Warning in readLines("en_US/en_US.news.txt", skipNul = TRUE): incomplete final
## line found on 'en_US/en_US.news.txt'
| f_names | f_lines | n_char | n_words | pct_chars | pct_lines | pct_words |
|---|---|---|---|---|---|---|
| blogs | 899288 | 208361438 | 37334131 | 0.54 | 0.27 | 0.53 |
| news | 77259 | 15683765 | 2643969 | 0.04 | 0.02 | 0.04 |
| twitter | 2360148 | 162385035 | 30373583 | 0.42 | 0.71 | 0.43 |
I will use the bi-gram table as the basis for prediction. The user inputs a word, and the model finds the bi-gram with the greatest relative frequency among all bi-grams whose first word matches that input. The second word of that bi-gram is the model's prediction for the next word, given the user's input word.
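
The lookup described above can be sketched in a few lines of base R. This is only an illustration on toy counts, not the table built in this report: the data frame `bigrams`, its columns `word1`, `word2` and `n`, and the helper `predict_next()` are all hypothetical names chosen for the example.

```r
# Toy bi-gram counts; the column names word1, word2 and n are assumptions,
# not the names used in the actual bi-gram table.
bigrams <- data.frame(
  word1 = c("of", "of", "in", "in", "thank"),
  word2 = c("the", "a", "the", "a", "you"),
  n     = c(50, 10, 30, 20, 5),
  stringsAsFactors = FALSE
)

# Relative frequency of each bi-gram among all bi-grams sharing its first word.
bigrams$rel_freq <- bigrams$n / ave(bigrams$n, bigrams$word1, FUN = sum)

# Return the second word of the highest-frequency bi-gram that starts with
# `input`, or NA when the input never appears as a first word.
predict_next <- function(input, table = bigrams) {
  candidates <- table[table$word1 == tolower(input), ]
  if (nrow(candidates) == 0) return(NA_character_)
  candidates$word2[which.max(candidates$rel_freq)]
}

predict_next("of")     # most frequent continuation of "of" in the toy data
predict_next("zebra")  # unseen word, so no prediction is available
```

Returning `NA` for an unseen word is one possible choice; a fuller model might instead fall back to the most common single word.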