stylo R package

stylo is a flexible R package for computational text analysis and stylometry.
Stylometry (computational stylistics) is concerned with the quantitative study of writing style, and is particularly useful in the exploratory statistical analysis of texts with respect to authorial writing style.
stylo's Processing Procedures

Preprocessing → Feature extraction → Statistical Analysis → Visualization
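To make the four stages concrete, here is a minimal base-R sketch of the same pipeline on three invented toy sentences. It is a simplified illustration and not stylo's internal code: the texts, the choice of five most frequent words, the Manhattan distance on raw relative frequencies, and Ward linkage are all assumptions made for the example.

```r
# Toy texts standing in for a real corpus (invented for illustration)
texts <- c(
  A = "The cat sat on the mat, and the cat slept.",
  B = "The dog sat on the log, and the dog barked.",
  C = "A bird sang in a tree while a breeze blew."
)

# 1. Preprocessing: lower-case each text and split it into word tokens
tokens <- lapply(texts, function(x) {
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]
  words[nzchar(words)]
})

# 2. Feature extraction: relative frequencies of the most frequent words
all_counts <- sort(table(unlist(tokens)), decreasing = TRUE)
mfw <- names(all_counts)[1:5]
freqs <- t(sapply(tokens, function(w) {
  as.numeric(table(factor(w, levels = mfw))) / length(w)
}))
colnames(freqs) <- mfw

# 3. Statistical analysis: pairwise distances, then hierarchical clustering
distances <- dist(freqs, method = "manhattan")
clusters <- hclust(distances, method = "ward.D2")

# 4. Visualization: draw the resulting dendrogram
plot(clusters, main = "Toy cluster analysis")
```

With a real corpus, stylo() carries out comparable steps internally, so none of this needs to be written by hand.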
This example is an unsupervised analysis of:
| Gospels | | | |
|---|---|---|---|
| Matthew | Mark | Luke | John |
stylo can compare them.

Corpus

Create a folder named “corpus” where the text files can be stored and used in the analysis.
stylo() then needs each Gospel saved as its own .txt file.
This code creates one text file for each Gospel and saves it inside the corresponding corpus folder.
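A minimal sketch of that step, assuming the Gospel texts are already held in a named list. The `gospel_texts` object and its placeholder strings are hypothetical; in practice each element would hold the full text of one Gospel from whatever source you use.

```r
# Hypothetical placeholder texts; replace each string with the full
# King James Version text of the corresponding Gospel.
gospel_texts <- list(
  Matthew = "Placeholder text for Matthew.",
  Mark    = "Placeholder text for Mark.",
  Luke    = "Placeholder text for Luke.",
  John    = "Placeholder text for John."
)

# Create the corpus folder if it does not exist yet
corpus_dir <- "corpus"
dir.create(corpus_dir, showWarnings = FALSE)

# Write one .txt file per Gospel, e.g. corpus/Matthew_KJV.txt
for (gospel in names(gospel_texts)) {
  file_path <- file.path(corpus_dir, paste0(gospel, "_KJV.txt"))
  writeLines(gospel_texts[[gospel]], file_path)
}

# List the files that were written
created_files <- list.files(corpus_dir, pattern = "_KJV\\.txt$")
```

Note that running the sketch as written overwrites any existing files with the same names in the corpus folder.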
Output for this example:
| Gospel | File created |
|---|---|
| Matthew | corpus/Matthew_KJV.txt |
| Mark | corpus/Mark_KJV.txt |
| Luke | corpus/Luke_KJV.txt |
| John | corpus/John_KJV.txt |
stylo function

| Function | What it does |
|---|---|
| stylo() | Primary function used to compare writing style across texts |
“It is quite a long story what this function does. Basically, it is an all-in-one tool for a variety of experiments in computational stylistics”
stylo() Sub-Arguments

| Sub-Argument | What it does |
|---|---|
| analysis.type | Decides what stylometric analysis we want to run |
| mfw.min / mfw.max / mfw.number | Set how many “most frequent words” are included in the analysis |
| mfw.incr | Controls the step size when testing a range of most frequent words |
| corpus.dir | Tells stylo where the text files are stored (typically in a corpus folder) |
| gui | Controls whether stylo() opens an interactive menu or runs directly from written code |
stylo() function

selected_mfw <- 300 sets the feature for comparison (the 300 most frequent words).
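As a rough sketch of how the arguments in the table above might fit together, the wrapper below passes a single most-frequent-word setting to stylo() and runs a cluster analysis without the GUI. The function name `run_ca_sketch` is hypothetical and is not part of stylo, nor is it the helper used elsewhere in this tutorial; the argument values are assumptions chosen for illustration.

```r
# Number of most frequent words used as features
selected_mfw <- 300

# Hypothetical wrapper (not part of stylo): runs a cluster analysis ("CA")
# on the texts in `corpus_dir` with one most-frequent-word setting.
run_ca_sketch <- function(corpus_dir = "corpus", mfw = selected_mfw) {
  if (!requireNamespace("stylo", quietly = TRUE)) {
    stop("The stylo package is required: install.packages('stylo')")
  }
  stylo::stylo(
    gui = FALSE,              # run from code instead of the interactive menu
    corpus.dir = corpus_dir,  # folder holding one .txt file per text
    analysis.type = "CA",     # cluster analysis
    mfw.min = mfw,            # lower bound of the most-frequent-word range
    mfw.max = mfw,            # upper bound; equal to mfw.min, so one setting
    mfw.incr = 100            # step size; has no effect when min equals max
  )
}
```

Once the corpus folder contains the real Gospel texts, the analysis could be started with `stylo_ca <- run_ca_sketch()`.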
Now, the R par function can be used to set up a plotting environment for visualizing the results of the stylo algorithm.
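A minimal sketch of that setup, using a stand-in dendrogram built from a built-in dataset in place of the stylo output. The margin and title-size values are illustrative assumptions, and how far stylo's own plotting code respects settings made with par() is not verified here.

```r
# Save the current graphics settings so they can be restored afterwards
old_par <- par(no.readonly = TRUE)

# Widen the right margin for long labels and enlarge the main title slightly
par(mar = c(5, 4, 4, 8), cex.main = 1.1)

# Stand-in plot: four rows of a built-in dataset instead of the Gospels
stand_in <- hclust(dist(scale(USArrests[1:4, ])))
plot(as.dendrogram(stand_in), horiz = TRUE)

# title() adds a heading to the plot that was just drawn
title("Stand-in dendrogram with a custom title")

# Restore the original graphics settings
par(old_par)
```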
stylo plotting arguments

stylo_ca <- run_stylo(...
  # removes stylo's automatic titles
  titles.on.graphs = FALSE,
  # uses stylo's automatic colors for labels
  colors.on.graphs = "colors",
  # adjusts the plot size, font size, and line thickness
  plot.custom.width = 8,
  plot.custom.height = 5,
  plot.font.size = 11,
  plot.line.thickness = 2,
  # displays the dendrogram horizontally
  dendrogram.layout.horizontal = TRUE
)
# adds a custom title to the plot
title("CA Plot of the Gospels Using 300 Most Frequent Words")