11/2/2021

  1. Loading and Cleaning the Data

  • I downloaded the dataset from here

  • Replace all non-alphanumeric characters with spaces;

  • Remove excess spaces (a sketch of these two steps follows the code block below);

library(tm)                                                     # VCorpus, tm_map and the cleaning transformations
dataset = sent_detect(dataset, language = "en", model = NULL)   # split the raw text into sentences
body = VCorpus(VectorSource(dataset))                           # build a corpus, one document per sentence
body = tm_map(body, removeNumbers)                              # removing numbers
body = tm_map(body, stripWhitespace)                            # removing extra whitespace
body = tm_map(body, content_transformer(tolower))               # lowercasing all contents
body = tm_map(body, removePunctuation)                          # removing punctuation / special characters
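
The replacement of non-alphanumeric characters and the removal of excess spaces mentioned in the bullets are not shown in the block above; a minimal sketch of that step, assuming the same tm corpus body and a small custom transformer, could look like this:

clean_chars = content_transformer(function(x) gsub("[^[:alnum:] ]", " ", x))  # replace non-alphanumeric characters with a space
body = tm_map(body, clean_chars)
body = tm_map(body, stripWhitespace)                            # collapse the extra spaces introduced above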

  2. N-grams

  • Before building the n-grams, first clean the text with regular expressions, deleting URLs, handles beginning with @, and similar tokens;

  • Split the text on spaces to obtain the n-grams;

  • To reduce the size of the n-gram tables, first calculate the frequency of each n-gram (see the sketch after this list);

library(RWeka)                                                     # NGramTokenizer and Weka_control
body_2 <- gsub("http\\w+", "", sapply(body, as.character))         # drop URLs from the cleaned corpus text
n <- 2                                                             # n-gram order, e.g. 2 for bigrams
token_n <- NGramTokenizer(body_2, Weka_control(min = n, max = n))
  • In total there are around 165,000 distinct 2-grams.
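
To reduce the table as described above, the frequency of every n-gram can be counted first; a minimal sketch, assuming the token_n vector produced by NGramTokenizer above, could be:

freq_n <- sort(table(token_n), decreasing = TRUE)                          # count each n-gram, most frequent first
freq_df <- data.frame(ngram = names(freq_n), count = as.integer(freq_n))   # table of n-grams and their counts
head(freq_df, 10)                                                          # inspect the ten most frequent n-grams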

Information

  • You can also find more information about where the dataset comes from

  • You can download the dataset I used as well as the Shiny app code

Thank you