In each step, you will process your data to fix common text data issues. Complete each step in both R and Python separately, building a clean version of the text in each language so you can compare them at the end. Update the saved clean text at each step; do not just print it out.

Libraries / R Setup

##r chunk
##python chunk

Import data to work with

##r chunk
##python chunk 

Lower case

##r chunk
##python chunk
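
A minimal Python sketch of this step, assuming the text is held in a single string. The sample string and the variable names `raw_text` and `clean_text` are made up for illustration; your own imported data goes in their place.

```python
# Hypothetical sample text standing in for the imported data.
raw_text = "The QUICK brown Fox didn't jump."

# Save the lower-cased version so later steps build on it.
clean_text = raw_text.lower()

print(clean_text)
```

Note that the result is assigned back to `clean_text` rather than only printed, so the next step starts from the lower-cased text.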

Removing symbols

##r chunk
##python chunk
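
One possible way to strip symbols in Python is a regular expression, sketched below. The helper name `remove_symbols` and the sample text are assumptions, and the pattern assumes the text has already been lower-cased. Apostrophes are kept on purpose so that contractions can still be recognized afterwards.

```python
import re

def remove_symbols(text):
    """Replace anything that is not a-z, 0-9, an apostrophe, or whitespace."""
    # Swap each unwanted character for a space so words do not get glued together.
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    # Collapse the runs of whitespace left behind and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical, already lower-cased sample text.
clean_text = "the quick brown fox didn't jump!!! @home #nlp (50% off)"
clean_text = remove_symbols(clean_text)

print(clean_text)
```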

Contractions

##r chunk
##python chunk
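
A rough sketch of expanding contractions with a lookup table. The `CONTRACTIONS` mapping is a deliberately tiny, hand-written example and `expand_contractions` is a hypothetical helper; a real run would need a much fuller list or a dedicated package.

```python
import re

# Hypothetical mini-mapping for illustration only.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "didn't": "did not",
    "it's": "it is",
    "i'm": "i am",
}

# One pattern that matches any key in the mapping as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b"
)

def expand_contractions(text):
    """Replace each known contraction with its expanded form."""
    return _PATTERN.sub(lambda match: CONTRACTIONS[match.group(1)], text)

# Hypothetical, already lower-cased sample text.
clean_text = "i'm sure it's fine but the fox didn't jump"
clean_text = expand_contractions(clean_text)

print(clean_text)
```

Some contractions are ambiguous ("it's" can also mean "it has"), so a fixed table like this is only an approximation.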

Spelling

##r chunk
##python chunk
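
Spelling correction normally relies on a full dictionary. As a small stand-in, the sketch below uses the standard library's `difflib.get_close_matches` against a made-up vocabulary. `VOCAB`, `correct_word`, `correct_spelling`, and the `cutoff` value of 0.7 are all assumptions chosen for the example, not a recommended configuration.

```python
import difflib

# Hypothetical mini-vocabulary; a real checker would use a full word list.
VOCAB = ["the", "quick", "brown", "fox", "did", "not", "jump", "over", "lazy", "dog"]

def correct_word(word, vocab=VOCAB, cutoff=0.7):
    """Return the closest vocabulary word, or the word itself if none is close."""
    if word in vocab:
        return word
    matches = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_spelling(text):
    """Correct each whitespace-separated word independently."""
    return " ".join(correct_word(word) for word in text.split())

# Hypothetical sample with two transposition typos.
clean_text = "the quikc brown fox did not jmup"
clean_text = correct_spelling(clean_text)

print(clean_text)
```

Words with no close match are left unchanged, which avoids replacing names or rare terms with unrelated vocabulary entries.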

Lemmatization

##r chunk
##python chunk
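
Lemmatization is usually done with a trained tool rather than by hand. Purely to show the shape of the step, the sketch below maps words to lemmas with a toy dictionary. `LEMMAS` and `lemmatize` are hypothetical names, and the table covers only the sample sentence.

```python
# Toy lookup table for illustration; not a substitute for a real lemmatizer.
LEMMAS = {
    "foxes": "fox",
    "jumped": "jump",
    "dogs": "dog",
    "was": "be",
    "running": "run",
}

def lemmatize(text, lemmas=LEMMAS):
    """Replace each word with its lemma when one is known, else keep the word."""
    return " ".join(lemmas.get(word, word) for word in text.split())

# Hypothetical sample text.
clean_text = "the foxes jumped over the dogs"
clean_text = lemmatize(clean_text)

print(clean_text)
```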

Stopwords

##r chunk
##python chunk
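
A short sketch of stopword removal with a hand-picked set. `STOPWORDS` here is a small assumed list for the example only; standard lists are much longer, and the right list depends on your data.

```python
# Hypothetical, abbreviated stopword set for illustration.
STOPWORDS = {"the", "a", "an", "and", "over", "of", "to", "is", "it"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Keep only the words that are not in the stopword set."""
    return " ".join(word for word in text.split() if word not in stopwords)

# Hypothetical sample text.
clean_text = "the quick fox jump over the lazy dog"
clean_text = remove_stopwords(clean_text)

print(clean_text)
```

Negations such as "not" are left out of this set on purpose, since removing them can change the meaning of a sentence.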

Tokenization

##r chunk
##python chunk
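
Once the text is clean, tokenization can be as simple as pulling out runs of word characters. The sketch below assumes that is enough for your data; `tokenize` is a hypothetical helper and the sample string is made up.

```python
import re

def tokenize(text):
    """Split text into tokens made of letters, digits, or underscores."""
    return re.findall(r"\w+", text)

# Hypothetical sample text.
clean_text = "quick fox jump lazy dog"
tokens = tokenize(clean_text)

print(tokens)
```

Because `\w+` stops at apostrophes and hyphens, this approach works best after symbols and contractions have already been handled.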

Check out the results

##r chunk
##python chunk

Note: here you can print out, summarize, or otherwise view your text in any way you want.
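
For instance, one way to summarize the result in Python is a word-frequency count. This is only a sketch: the `tokens` list below is a made-up stand-in for the output of your own tokenization step.

```python
from collections import Counter

# Hypothetical tokens standing in for your final cleaned output.
tokens = ["quick", "fox", "jump", "lazy", "dog", "fox"]

# Count how often each token appears.
counts = Counter(tokens)

print("Number of tokens:", len(tokens))
print("Unique tokens:", len(counts))
print("Most common:", counts.most_common(3))
```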