Final Project-Capstone Next Word Prediction
Next Word Prediction
Ken Peters
date: 9/7/2020
Course Instructors:
Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain. SwiftKey, our corporate partner in this capstone, builds a smart keyboard that makes it easier for people to type on their mobile devices. One cornerstone of their smart keyboard is predictive text models.
The corpus was collected from publicly available sources by a web crawler and consists of three files containing text from blogs, news and Twitter, all provided by SwiftKey.
install.packages("kableExtra")
install.packages("stringi")
| | Size_in_Mb | Number_of_Lines | Number_of_Words |
|---|---|---|---|
| Blogs | 200 | 899288 | 38154238 |
| Twitter | 159 | 2360148 | 30218125 |
| News | 196 | 77259 | 2693898 |
| All | 555 | 3336695 | 71066261 |
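Figures like those in the table can be computed along the following lines. This is a minimal sketch, not the author's code: `summarise_corpus` is a hypothetical helper, it counts words by splitting on whitespace, and it reports the in-memory size of the text rather than the file size on disk, so its numbers would differ somewhat from the table. `stringi::stri_count_words` is a word-boundary-aware alternative for the word count.

```r
# Hypothetical helper: summarise a character vector holding one text line per element.
summarise_corpus <- function(lines) {
  # Whitespace-based word count (an assumption; the original method is not shown).
  words_per_line <- lengths(strsplit(trimws(lines), "\\s+"))
  data.frame(
    Size_in_Mb      = as.numeric(object.size(lines)) / 1024^2,  # in-memory size
    Number_of_Lines = length(lines),
    Number_of_Words = sum(words_per_line)
  )
}

# Toy input standing in for one of the three files.
toy_lines <- c("The quick brown fox", "jumps over the lazy dog")
toy_stats <- summarise_corpus(toy_lines)
print(toy_stats)
```

On the real data, each file would first be read with `readLines()` and the three resulting rows combined with `rbind()`.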
The data set is too large to process in full, so we use a sample of only 1% of it.
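One way to draw such a sample is sketched below. The slides do not say how the 1% was selected, so uniform random sampling of lines without replacement is an assumption here, as are the helper name `sample_lines` and the seed.

```r
set.seed(2020)  # assumed seed, only so the sketch is reproducible

# Hypothetical helper: keep a random fraction of the lines (default 1%).
sample_lines <- function(lines, fraction = 0.01) {
  if (length(lines) == 0) return(lines)
  n <- max(1L, as.integer(round(length(lines) * fraction)))
  lines[sample.int(length(lines), n)]  # sample without replacement
}

# Toy input: 1000 lines, so a 1% sample keeps 10 of them.
toy_lines  <- paste("line", 1:1000)
toy_sample <- sample_lines(toy_lines)
print(length(toy_sample))
```

Sampling each of the three files separately keeps every source represented in the subset.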
Next, we clean and pre-process the data.
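The slides do not list the individual cleaning steps, so the following is only an illustrative sketch of a typical pass for English text: lower-casing, dropping URLs, removing everything except letters and apostrophes, and collapsing whitespace. The function name `clean_text` and the choice of steps are assumptions.

```r
# Hypothetical cleaning function; the steps are assumed, not taken from the slides.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("(https?://|www\\.)\\S+", " ", x, perl = TRUE)  # drop URLs
  x <- gsub("[^a-z' ]", " ", x, perl = TRUE)  # keep letters, apostrophes, spaces
  x <- gsub("\\s+", " ", x, perl = TRUE)      # collapse runs of whitespace
  trimws(x)
}

cleaned <- clean_text("Check THIS out: http://example.com 123!!  It's great.")
print(cleaned)
```

Keeping apostrophes preserves contractions such as "it's", which matter for predicting the next word.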
Link to Next Word prediction
A description of the algorithm is on the next slide