Carlos Rios
11/20/2020
1- Description of the Project.
The goal of this exercise is to create a product that highlights the prediction algorithm that has been built and provides an interface that can be accessed by others.
2- Cleaning the Data.
First of all, we need libraries to tokenize the text (omitting stopwords). Different tokenizers can be used depending on the source; for Twitter text, for example, we could use the function tokenize_tweets(), as sketched below.
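As a minimal sketch of this step (the tokenizers and stopwords packages, the sample tweets, and the argument values are illustrative assumptions, not taken from the project itself, and assume a tokenizers version that still provides tokenize_tweets()):

library(tokenizers)   # provides tokenize_tweets()
library(stopwords)    # provides English stopword lists

tweets <- c("Learning #rstats for the @coursera capstone!",
            "Building a word prediction app :)")

# Tokenize tweet text (keeping hashtags and handles intact), lowercase it,
# strip punctuation, and drop English stopwords.
tokens <- tokenize_tweets(tweets,
                          lowercase   = TRUE,
                          strip_punct = TRUE,
                          stopwords   = stopwords("en"))
tokens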
The data file. In order to build a function that can provide word prediction, a predictive model is needed. Such models use known content to predict unknown content. For this package, that content comes from the HC Corpora collection, which is "a collection of corpora for various languages freely available to download."
The version used was obtained from an archive maintained at Coursera. The file included three text document collections (blogs, news feeds, and tweets) in four languages (German, English, Finnish, and Russian), of which only the English collections were used.
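As an illustration, the English collections could be read in as shown here; the local directory below is a hypothetical download location, not a path defined by the project.

# Assumes the Coursera archive has been downloaded and unzipped locally;
# data/final/en_US/ is an example path.
blogs   <- readLines("data/final/en_US/en_US.blogs.txt",   encoding = "UTF-8", skipNul = TRUE)
news    <- readLines("data/final/en_US/en_US.news.txt",    encoding = "UTF-8", skipNul = TRUE)
twitter <- readLines("data/final/en_US/en_US.twitter.txt", encoding = "UTF-8", skipNul = TRUE)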
3- Prediction Model.
An n-gram is "a contiguous sequence of n items from a given sequence of text or speech." This package takes a key word or phrase, matches that key to the most frequent (n-1)-word prefix found in a term-document matrix (TDM) of n-word terms, and returns the nth word of that term.
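To make the lookup concrete, here is a minimal sketch; the predict_word() function, the toy ngrams table, and its column names are illustrative assumptions rather than the package's actual code.

# `ngrams` stands in for a pre-computed table of n-word terms and their counts,
# e.g. built from the cleaned corpus with tokenizers::tokenize_ngrams().
ngrams <- data.frame(
  term  = c("thanks for the follow", "thanks for the rt", "at the end of"),
  count = c(120, 85, 60),
  stringsAsFactors = FALSE
)

predict_word <- function(key, ngrams) {
  # Keep the n-grams whose first n-1 words match the key phrase...
  hits <- ngrams[startsWith(ngrams$term, paste0(tolower(key), " ")), ]
  if (nrow(hits) == 0) return(NA_character_)
  # ...take the most frequent one and return its last (nth) word.
  best  <- hits$term[which.max(hits$count)]
  words <- strsplit(best, " ")[[1]]
  words[length(words)]
}

predict_word("thanks for the", ngrams)  # returns "follow" for this toy table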
You can find the app at the link below. :)