# Examine the text data
text_df
## person
## 1 Nick
## 2 Jonathan
## 3 Martijn
## 4 Nicole
## 5 Nick
## 6 Jonathan
## 7 Martijn
## 8 Nicole
## text
## 1 DataCamp courses are the best
## 2 I like talking to students
## 3 Other online data science curricula are boring.
## 4 What is for lunch?
## 5 DataCamp has lots of great content!
## 6 Students are passionate and are excited to learn
## 7 Other data science curriculum is hard to learn and difficult to understand
## 8 I think the food here is good.
# Calculate the overall polarity score
text_df %$% polarity(text)
## all total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
## 1 all 8 54 0.179 0.452 0.396
# Calculate the polarity score by person
(datacamp_conversation <- text_df %$% polarity(text, person))
## person total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
## 1 Jonathan 2 13 0.577 0.184 3.141
## 2 Martijn 2 19 -0.478 0.141 -3.388
## 3 Nick 2 11 0.428 0.028 15.524
## 4 Nicole 2 11 0.189 0.267 0.707
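The `stan.mean.polarity` column appears to be the average polarity divided by its standard deviation. This reading is an assumption based on the printed numbers and on how qdap describes the column. A worked check against the overall row, using the rounded values shown there:

```latex
\[
\text{stan.mean.polarity}
  = \frac{\text{ave.polarity}}{\text{sd.polarity}}
  = \frac{0.179}{0.452}
  \approx 0.396
\]
```

The by-person rows are presumably computed from unrounded values, so dividing the printed figures can differ in the last digits. For Jonathan, 0.577 / 0.184 is roughly 3.136 against the reported 3.141.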
# Counts table from datacamp_conversation
counts(datacamp_conversation)
##   person   wc polarity pos.words           neg.words       text.var
## 1 Nick      5    0.447 best                -               DataCamp courses are the best
## 2 Jonathan  5    0.447 like                -               I like talking to students
## 3 Martijn   7   -0.378 -                   boring          Other online data science curricula are boring.
## 4 Nicole    4    0.000 -                   -               What is for lunch?
## 5 Nick      6    0.408 great               -               DataCamp has lots of great content!
## 6 Jonathan  8    0.707 passionate, excited -               Students are passionate and are excited to learn
## 7 Martijn  12   -0.577 -                   hard, difficult Other data science curriculum is hard to learn and difficult to understand
## 8 Nicole    7    0.378 good                -               I think the food here is good.
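The sentence-level scores in the counts table can be reproduced by hand, which helps show where the summary numbers come from. The sketch below uses base R only and assumes a simplified rule: each positive word counts +1, each negative word counts -1, and the sum is divided by the square root of the word count. qdap's `polarity()` also weights negators and amplifiers, but none seem to affect these eight sentences. The vectors `wc`, `pos`, and `neg` are typed in from the table, not extracted from `datacamp_conversation`.

```r
# Word counts and positive/negative word counts, copied by hand from counts()
person <- c("Nick", "Jonathan", "Martijn", "Nicole",
            "Nick", "Jonathan", "Martijn", "Nicole")
wc  <- c(5, 5, 7, 4, 6, 8, 12, 7)
pos <- c(1, 1, 0, 0, 1, 2, 0, 1)
neg <- c(0, 0, 1, 0, 0, 0, 2, 0)

# Assumed simplification: each polarized word counts +1 or -1,
# and the sum is divided by the square root of the word count
sentence_polarity <- (pos - neg) / sqrt(wc)
round(sentence_polarity, 3)

# Overall summary: mean, standard deviation, and their ratio
overall <- c(ave       = mean(sentence_polarity),
             sd        = sd(sentence_polarity),
             stan.mean = mean(sentence_polarity) / sd(sentence_polarity))
round(overall, 3)

# The same mean, grouped by person (names come back in alphabetical order)
by_person <- tapply(sentence_polarity, person, mean)
round(by_person, 3)
```

Under that assumption, the rounded results should line up with the `polarity`, `ave.polarity`, `sd.polarity`, and `stan.mean.polarity` values printed earlier.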
# Plot the conversation polarity
plot(datacamp_conversation)
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.
## Warning: Ignoring unknown aesthetics: x
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.
# clean_corpus(), tm_define are pre-defined
clean_corpus
## function(corpus){
## corpus <- tm_map(corpus, content_transformer(replace_abbreviation))
## corpus <- tm_map(corpus, removePunctuation)
## corpus <- tm_map(corpus, removeNumbers)
## corpus <- tm_map(corpus, removeWords, c(stopwords("en"), "coffee"))
## corpus <- tm_map(corpus, content_transformer(tolower))
## corpus <- tm_map(corpus, stripWhitespace)
## return(corpus)
## }
tm_define
## x
## 1 Text mining is the process of distilling actionable insights from text.
## 2 Sentiment analysis represents the set of tools to extract an author's feelings towards a subject.
# Create a VectorSource
tm_vector <- VectorSource(tm_define)
# Apply VCorpus
tm_corpus <- VCorpus(tm_vector)
# Examine the first document's contents
content(tm_corpus[[1]])
## [1] "Text mining is the process of distilling actionable insights from text."
## [2] "Sentiment analysis represents the set of tools to extract an author's feelings towards a subject."
# Clean the text
tm_clean <- clean_corpus(tm_corpus)
# Re-examine the contents of the first document
content(tm_clean[[1]])
## [1] "text mining process distilling actionable insights text"
## [2] "sentiment analysis represents set tools extract authors feelings towards subject"
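One detail in `clean_corpus()` is worth a closer look: stop words are removed before the text is lowercased. Stop word lists are typically lowercase and the matching is case-sensitive, so a capitalized stop word such as "The" at the start of a sentence could survive that order. The base R sketch below only imitates the idea with `gsub()`. The stop list `stops`, the helper `squish()`, and the sample sentence are hypothetical and are not part of tm or qdap.

```r
# Hypothetical short stop list; a real list is much longer but also lowercase
stops   <- c("the", "is", "of", "from")
pattern <- sprintf("\\b(%s)\\b", paste(stops, collapse = "|"))

# Hypothetical helper: collapse runs of whitespace and trim the ends
squish <- function(x) trimws(gsub("\\s+", " ", x))

sentence <- "The process of mining is the point"

# Order used in clean_corpus(): remove stop words, then lowercase
remove_then_lower <- squish(tolower(gsub(pattern, "", sentence, perl = TRUE)))

# Alternative order: lowercase first, then remove stop words
lower_then_remove <- squish(gsub(pattern, "", tolower(sentence), perl = TRUE))

remove_then_lower  # "the process mining point"
lower_then_remove  # "process mining point"
```

The two example sentences in `tm_define` do not start with a stop word, so the printed output above is unaffected. The ordering only matters for text where a capitalized stop word appears.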
Michael is a hybrid thinker and doer—a byproduct of being a CliftonStrengths “Learner” over time. With 20+ years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with problem-solving for brands—ranging from Fortune 500 companies to early-stage startups to not-for-profit organizations.
Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate to receive his MS in Applied Analytics from Columbia University.
LinkedIn | Twitter | www.michaelmallari.com/data | www.columbia.edu/~mm5470