Project Description

Synopsis

This document asks whether stylometric analysis of Plato’s dialogues remains effective when applied to small samples (500-1000 words), which are likely to be affected by random noise (Eder 2015). The sample size needed for the correct classification of Plato’s texts in a supervised machine-learning setup is tested using the R package stylo (Eder, Rybicki, and Kestemont 2016). Burrows’ Delta is adopted throughout as a method that has proved highly effective in attribution experiments (Burrows 2002).
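
For readers unfamiliar with the measure, here is a minimal sketch of the idea behind Burrows’ Delta in base R. It is not the stylo implementation: the word labels and frequencies are invented for illustration, and the z-scores are computed over the three toy texts only.

```r
# Toy relative frequencies (rows = texts, columns = frequent words).
# The labels and numbers are invented for illustration only.
freqs <- rbind(
  train_A = c(kai = 5.1, de = 3.0, men = 1.2),
  train_B = c(kai = 4.0, de = 3.9, men = 0.6),
  test_X  = c(kai = 5.0, de = 3.1, men = 1.1)
)

# Step 1: z-score each word (column) across the texts being compared.
z <- scale(freqs)

# Step 2: Delta is the mean absolute difference of z-scores between two texts.
burrows_delta <- function(z, a, b) mean(abs(z[a, ] - z[b, ]))

deltas <- c(
  A = burrows_delta(z, "test_X", "train_A"),
  B = burrows_delta(z, "test_X", "train_B")
)

# Step 3: attribute the test text to the candidate with the smallest Delta.
predicted <- names(which.min(deltas))
print(deltas)
print(predicted)
```

With fewer words per sample, each relative frequency is estimated less reliably, so the z-scores and the resulting distances become noisier. The tests below probe this effect.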

Methodology

A word on “correct classification” is in order here. The task is to see whether the classifier can discriminate between the different styles of Plato, labeled in these tests as Platos 1-4. However, if a text by “Plato 1” is “misattributed” to “Plato 3,” this may only mean that we started from a wrong assumption about its stylistic neighbors. “Misattribution” is therefore a signal that our Platos do not look very different to the machine and need to be reshuffled. It is also possible that some of the dialogues were revised and thus contain several stylistic layers (Howland 1991), and I do not assume that all texts in this study are stylistically homogeneous. However, we may reasonably expect stylistic variation within one text to be signaled by a high percentage of misattributions. At any rate, we need several Platos only for the sake of later comparison with a test dialogue that we suspect to have been revised. As long as they represent, on average, a certain stylistic tendency, and as long as this tendency is visible to the classifier, they can serve as suitable comparanda.

Corpus Preparation

For these tests, I used the Diorisis Ancient Greek Corpus (Vatri and McGillivray 2018). On the accuracy of its lemmatization, see Vatri and McGillivray (2020). The code I used for extracting the lemmata is accessible via the links: Parsing Plato’s Republic (Separate Books), Parsing Plato’s Laws (Separate Books), Corpus_Platonicum: Lemmata Extraction. I start with the list of files produced by this code in my working directory.

Packages

library(stylo)

Testing Sample Size

Test 1: Plato 1 (Prt., Grg.) vs. Plato 2 (Lg. 5, Plt.)

I start by positing only two authors whom I believe to be stylistically remote, and the output confirms this. Except for the test with 35 mfw and 500-w blocks of the Statesman, the accuracy is above 90%. Note specifically that 500-w blocks, if used with 70 or 100 mfw, give a sufficiently good result. There is no need to print confusion matrices for this test.

dir.create("corpus1")
file.copy("Protagoras.txt", "corpus1")
file.copy("Gorgias.txt", "corpus1")
file.copy("Laws5.txt", "corpus1")
file.copy("Statesman.txt", "corpus1")
setwd("corpus1")
# list.files() returns names in alphabetical order; new.file.names must follow the same order
file.names <- list.files()
new.file.names <- c("Pl1_Gorgias.txt", "Pl2_Laws5.txt", "Pl1_Protagoras.txt", "Pl2_Statesman.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp1 <- size.penalize(corpus.dir = "corpus1", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp1$accuracy.scores 
## $Pl1_Gorgias
##          500 1000 1500
## mfw_35  0.99 0.99    1
## mfw_70  1.00 1.00    1
## mfw_100 0.99 1.00    1
## 
## $Pl1_Protagoras
##          500 1000 1500
## mfw_35  0.97 1.00 0.99
## mfw_70  0.98 0.99 1.00
## mfw_100 0.99 1.00 1.00
## 
## $Pl2_Laws5
##          500 1000 1500
## mfw_35  0.97 0.93 0.96
## mfw_70  1.00 1.00 1.00
## mfw_100 1.00 1.00 1.00
## 
## $Pl2_Statesman
##          500 1000 1500
## mfw_35  0.85 0.93 0.93
## mfw_70  0.94 1.00 0.99
## mfw_100 0.94 0.99 1.00
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
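
To check a statement such as “the accuracy is above 90%” without reading every table, the accuracy list can be flattened and filtered. The sketch below assumes each element of the list behaves like a numeric matrix; `flatten_scores` is a hypothetical helper, and the two tables are copied from the output above so that the example runs without the corpus.

```r
# Accuracy tables copied from the Test 1 output for two of the texts;
# with a real run, `acc <- sp1$accuracy.scores` would take their place.
dims <- list(c("mfw_35", "mfw_70", "mfw_100"), c("500", "1000", "1500"))
acc <- list(
  Pl2_Laws5 = matrix(c(0.97, 0.93, 0.96,
                       1.00, 1.00, 1.00,
                       1.00, 1.00, 1.00),
                     nrow = 3, byrow = TRUE, dimnames = dims),
  Pl2_Statesman = matrix(c(0.85, 0.93, 0.93,
                           0.94, 1.00, 0.99,
                           0.94, 0.99, 1.00),
                         nrow = 3, byrow = TRUE, dimnames = dims)
)

# Hypothetical helper: one row per text / mfw / sample-size cell.
flatten_scores <- function(acc) {
  do.call(rbind, lapply(names(acc), function(txt) {
    m <- as.matrix(acc[[txt]])
    data.frame(
      text = txt,
      mfw = rep(rownames(m), times = ncol(m)),        # as.vector() is column-major
      sample_size = rep(colnames(m), each = nrow(m)),
      accuracy = as.vector(m),
      stringsAsFactors = FALSE
    )
  }))
}

scores <- flatten_scores(acc)

# Keep only the cells below a chosen threshold (0.90 here).
below <- scores[scores$accuracy < 0.90, ]
print(below)
```

For the two tables shown, the only cell below the threshold is the Statesman with 35 mfw and 500-w blocks.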

Test 2: Plato 1 (Prt., Grg.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 8 and 9)

We now complicate the picture a bit by adding a third author, but the classification is again successful.

dir.create("corpus2")
file.copy("Protagoras.txt", "corpus2")
file.copy("Gorgias.txt", "corpus2")
file.copy("Laws8.txt", "corpus2")
file.copy("Laws9.txt", "corpus2")
file.copy("Republic8.txt", "corpus2")
file.copy("Republic9.txt", "corpus2")
setwd("corpus2")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Gorgias.txt", "Pl3_Laws8.txt", "Pl3_Laws9.txt", "Pl1_Protagoras.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp2 <- size.penalize(corpus.dir = "corpus2", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp2$accuracy.scores 
## $Pl1_Gorgias
##          500 1000 1500
## mfw_35  1.00    1    1
## mfw_70  1.00    1    1
## mfw_100 0.99    1    1
## 
## $Pl1_Protagoras
##          500 1000 1500
## mfw_35  0.85 0.96 0.99
## mfw_70  0.86 0.97 0.99
## mfw_100 0.95 1.00 1.00
## 
## $Pl2_Republic8
##          500 1000 1500
## mfw_35  0.94 0.98    1
## mfw_70  1.00 1.00    1
## mfw_100 0.98 1.00    1
## 
## $Pl2_Republic9
##          500 1000 1500
## mfw_35  0.92 0.97    1
## mfw_70  0.98 1.00    1
## mfw_100 0.99 1.00    1
## 
## $Pl3_Laws8
##          500 1000 1500
## mfw_35  0.93 0.98 0.98
## mfw_70  0.98 1.00 0.99
## mfw_100 1.00 1.00 1.00
## 
## $Pl3_Laws9
##         500 1000 1500
## mfw_35    1    1    1
## mfw_70    1    1    1
## mfw_100   1    1    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
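
The copy-then-rename steps used in these tests depend on `list.files()` returning names in alphabetical order, so the vector of new names has to be typed in exactly that order. One possible alternative, sketched here under the assumption that every source file is named `<Title>.txt`, is to state the class of each dialogue explicitly. `build_corpus` and the `classes` mapping are hypothetical names, and the demonstration uses empty placeholder files in a temporary folder rather than the real texts.

```r
# Hypothetical helper: copy each source file into a corpus folder under a
# class-prefixed name, using an explicit title-to-class mapping.
build_corpus <- function(classes, corpus_dir, source_dir = ".") {
  dir.create(corpus_dir, showWarnings = FALSE)
  from <- file.path(source_dir, paste0(names(classes), ".txt"))
  to <- file.path(corpus_dir, paste0(classes, "_", names(classes), ".txt"))
  stopifnot(all(file.exists(from)))  # fail early if a source file is missing
  ok <- file.copy(from, to)
  invisible(to[ok])
}

# Demonstration with empty placeholder files in a temporary folder.
src <- file.path(tempdir(), "plato_src")
dir.create(src, showWarnings = FALSE)
classes <- c(Protagoras = "Pl1", Gorgias = "Pl1",
             Republic8 = "Pl2", Republic9 = "Pl2",
             Laws8 = "Pl3", Laws9 = "Pl3")
invisible(file.create(file.path(src, paste0(names(classes), ".txt"))))

corpus_path <- file.path(tempdir(), "corpus2_demo")
build_corpus(classes, corpus_dir = corpus_path, source_dir = src)
labelled <- sort(list.files(corpus_path))
print(labelled)
```

Because the mapping is keyed by title, adding or swapping a dialogue cannot silently shift the labels of the other files.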

Test 3: Plato 1 (Prt., Grg.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Sph., Plt.)

This case differs from the previous one in that we replace the Laws with the Sophist and the Statesman. Even though the accuracy is at least 96% in all cases with 100 mfw and 1000-w blocks, the Statesman again (cf. Test 1) behaves suspiciously. For this dialogue, the accuracy drops drastically with 35 mfw: as the confusion matrices printed below show, it is classified into Plato 2 in more than 50% of cases, regardless of the length of the sample. Given its unstable attribution, I consider it best to exclude the Statesman from further tests.

dir.create("corpus3")
file.copy("Protagoras.txt", "corpus3")
file.copy("Gorgias.txt", "corpus3")
file.copy("Sophist.txt", "corpus3")
file.copy("Statesman.txt", "corpus3")
file.copy("Republic8.txt", "corpus3")
file.copy("Republic9.txt", "corpus3")
setwd("corpus3")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Gorgias.txt", "Pl1_Protagoras.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt", "Pl3_Sophist.txt", "Pl3_Statesman.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp3 <- size.penalize(corpus.dir = "corpus3", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp3$accuracy.scores 
## $Pl1_Gorgias
##          500 1000 1500
## mfw_35  0.97    1    1
## mfw_70  0.99    1    1
## mfw_100 0.98    1    1
## 
## $Pl1_Protagoras
##          500 1000 1500
## mfw_35  0.70 0.90 0.93
## mfw_70  0.91 0.98 1.00
## mfw_100 0.86 0.96 0.97
## 
## $Pl2_Republic8
##          500 1000 1500
## mfw_35  0.99 0.99    1
## mfw_70  1.00 1.00    1
## mfw_100 1.00 1.00    1
## 
## $Pl2_Republic9
##         500 1000 1500
## mfw_35  0.9 0.99    1
## mfw_70  1.0 1.00    1
## mfw_100 1.0 1.00    1
## 
## $Pl3_Sophist
##          500 1000 1500
## mfw_35  0.82  0.9 0.91
## mfw_70  1.00  1.0 1.00
## mfw_100 1.00  1.0 1.00
## 
## $Pl3_Statesman
##          500 1000 1500
## mfw_35  0.31 0.42 0.36
## mfw_70  0.85 0.90 0.95
## mfw_100 0.86 0.96 1.00
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp3$confusion.matrices
## $Pl1_Gorgias
## $Pl1_Gorgias$mfw_35
##     500 1000 1500
## Pl1  97  100  100
## Pl2   0    0    0
## Pl3   3    0    0
## 
## $Pl1_Gorgias$mfw_70
##     500 1000 1500
## Pl1  99  100  100
## Pl2   1    0    0
## Pl3   0    0    0
## 
## $Pl1_Gorgias$mfw_100
##     500 1000 1500
## Pl1  98  100  100
## Pl2   2    0    0
## Pl3   0    0    0
## 
## 
## $Pl1_Protagoras
## $Pl1_Protagoras$mfw_35
##     500 1000 1500
## Pl1  70   90   93
## Pl2  30   10    7
## Pl3   0    0    0
## 
## $Pl1_Protagoras$mfw_70
##     500 1000 1500
## Pl1  91   98  100
## Pl2   7    2    0
## Pl3   2    0    0
## 
## $Pl1_Protagoras$mfw_100
##     500 1000 1500
## Pl1  86   96   97
## Pl2  14    4    3
## Pl3   0    0    0
## 
## 
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2  99   99  100
## Pl3   1    1    0
## 
## $Pl2_Republic8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2 100  100  100
## Pl3   0    0    0
## 
## $Pl2_Republic8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2 100  100  100
## Pl3   0    0    0
## 
## 
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
##     500 1000 1500
## Pl1   4    1    0
## Pl2  90   99  100
## Pl3   6    0    0
## 
## $Pl2_Republic9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2 100  100  100
## Pl3   0    0    0
## 
## $Pl2_Republic9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2 100  100  100
## Pl3   0    0    0
## 
## 
## $Pl3_Sophist
## $Pl3_Sophist$mfw_35
##     500 1000 1500
## Pl1  17    9    9
## Pl2   1    1    0
## Pl3  82   90   91
## 
## $Pl3_Sophist$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Sophist$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## 
## $Pl3_Statesman
## $Pl3_Statesman$mfw_35
##     500 1000 1500
## Pl1   2    0    0
## Pl2  67   58   64
## Pl3  31   42   36
## 
## $Pl3_Statesman$mfw_70
##     500 1000 1500
## Pl1   1    0    0
## Pl2  14   10    5
## Pl3  85   90   95
## 
## $Pl3_Statesman$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2  14    4    0
## Pl3  86   96  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"
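
The reading of a confusion table can be made explicit with a few lines of base R. The sketch below uses the Statesman table at 35 mfw, copied from the output above. Each column of the printed tables sums to 100, so the counts are treated here as the number of random samples assigned to each class.

```r
# Confusion table copied from the Test 3 output for the Statesman at 35 mfw;
# with a real run, use sp3$confusion.matrices$Pl3_Statesman$mfw_35 instead.
cm <- matrix(c( 2,  0,  0,
               67, 58, 64,
               31, 42, 36),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("Pl1", "Pl2", "Pl3"), c("500", "1000", "1500")))

true_class <- "Pl3"

# Share of samples assigned to each class, per sample size (columns sum to 1).
shares <- prop.table(cm, margin = 2)

# Class that receives the most samples at each sample size.
winner <- rownames(cm)[apply(cm, 2, which.max)]
names(winner) <- colnames(cm)

# Proportion of samples not assigned to the expected class.
misattributed <- 1 - shares[true_class, ]

print(winner)
print(round(misattributed, 2))
```

For this table the winning class is Plato 2 at every sample size, and the share of misattributed samples stays between 58% and 69%.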

Test 4: Plato 1 (Prt., Chrm.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 8 and 9)

We now return to the combination presented in Test 2, but add a little tweak by replacing the Gorgias with the Charmides. Now the accuracy for the Protagoras collapses, falling to 49-64% with 70 mfw, where 36-50% of its samples go to Plato 2. Stylistically, it apparently has as much to do with the Republic as with the Charmides.

dir.create("corpus4")
file.copy("Protagoras.txt", "corpus4")
file.copy("Charmides.txt", "corpus4")
file.copy("Laws8.txt", "corpus4")
file.copy("Laws9.txt", "corpus4")
file.copy("Republic8.txt", "corpus4")
file.copy("Republic9.txt", "corpus4")
setwd("corpus4")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl3_Laws8.txt", "Pl3_Laws9.txt", "Pl1_Protagoras.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp4 <- size.penalize(corpus.dir = "corpus4", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp4$accuracy.scores 
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.98 0.99    1
## mfw_70  0.99 1.00    1
## mfw_100 1.00 1.00    1
## 
## $Pl1_Protagoras
##          500 1000 1500
## mfw_35  0.74 0.87 0.89
## mfw_70  0.49 0.64 0.54
## mfw_100 0.74 0.83 0.78
## 
## $Pl2_Republic8
##          500 1000 1500
## mfw_35  0.93 1.00    1
## mfw_70  0.96 0.99    1
## mfw_100 0.93 1.00    1
## 
## $Pl2_Republic9
##          500 1000 1500
## mfw_35  0.92 0.97 0.99
## mfw_70  0.96 1.00 1.00
## mfw_100 0.98 1.00 1.00
## 
## $Pl3_Laws8
##          500 1000 1500
## mfw_35  0.88 0.99 0.99
## mfw_70  0.97 1.00 1.00
## mfw_100 0.99 1.00 1.00
## 
## $Pl3_Laws9
##         500 1000 1500
## mfw_35    1    1    1
## mfw_70    1    1    1
## mfw_100   1    1    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp4$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
##     500 1000 1500
## Pl1  98   99  100
## Pl2   2    1    0
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_70
##     500 1000 1500
## Pl1  99  100  100
## Pl2   1    0    0
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_100
##     500 1000 1500
## Pl1 100  100  100
## Pl2   0    0    0
## Pl3   0    0    0
## 
## 
## $Pl1_Protagoras
## $Pl1_Protagoras$mfw_35
##     500 1000 1500
## Pl1  74   87   89
## Pl2  25   13   11
## Pl3   1    0    0
## 
## $Pl1_Protagoras$mfw_70
##     500 1000 1500
## Pl1  49   64   54
## Pl2  50   36   46
## Pl3   1    0    0
## 
## $Pl1_Protagoras$mfw_100
##     500 1000 1500
## Pl1  74   83   78
## Pl2  24   17   21
## Pl3   2    0    1
## 
## 
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2  93  100  100
## Pl3   7    0    0
## 
## $Pl2_Republic8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2  96   99  100
## Pl3   4    1    0
## 
## $Pl2_Republic8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2  93  100  100
## Pl3   7    0    0
## 
## 
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
##     500 1000 1500
## Pl1   2    2    0
## Pl2  92   97   99
## Pl3   6    1    1
## 
## $Pl2_Republic9$mfw_70
##     500 1000 1500
## Pl1   3    0    0
## Pl2  96  100  100
## Pl3   1    0    0
## 
## $Pl2_Republic9$mfw_100
##     500 1000 1500
## Pl1   2    0    0
## Pl2  98  100  100
## Pl3   0    0    0
## 
## 
## $Pl3_Laws8
## $Pl3_Laws8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2  12    1    1
## Pl3  88   99   99
## 
## $Pl3_Laws8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   3    0    0
## Pl3  97  100  100
## 
## $Pl3_Laws8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   1    0    0
## Pl3  99  100  100
## 
## 
## $Pl3_Laws9
## $Pl3_Laws9$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Laws9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Laws9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"

Test 5: Plato 1 (Grg., Chrm.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Sph., Lg. 9)

We now modify the combination presented in Test 3. The unstable Statesman is replaced with Laws 9, and I also try combining the Charmides with the Gorgias in group 1. The result is a disaster: the accuracy rates for Laws 9 and for the Sophist are just 0-22%. Laws 9 is mainly classified into Plato 2 (see the confusion matrices), and the Sophist into either Plato 1 or Plato 2. The combination clearly does not work.

dir.create("corpus5")
file.copy("Gorgias.txt", "corpus5")
file.copy("Charmides.txt", "corpus5")
file.copy("Sophist.txt", "corpus5")
file.copy("Laws9.txt", "corpus5")
file.copy("Republic8.txt", "corpus5")
file.copy("Republic9.txt", "corpus5")
setwd("corpus5")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Gorgias.txt", "Pl3_Laws9.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt", "Pl3_Sophist.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp5 <- size.penalize(corpus.dir = "corpus5", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp5$accuracy.scores 
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.92 0.98    1
## mfw_70  0.98 1.00    1
## mfw_100 0.94 1.00    1
## 
## $Pl1_Gorgias
##          500 1000 1500
## mfw_35  0.81 0.94 0.98
## mfw_70  0.91 0.96 0.98
## mfw_100 0.83 0.97 0.97
## 
## $Pl2_Republic8
##          500 1000 1500
## mfw_35  1.00    1    1
## mfw_70  0.98    1    1
## mfw_100 0.98    1    1
## 
## $Pl2_Republic9
##          500 1000 1500
## mfw_35  0.95    1    1
## mfw_70  0.98    1    1
## mfw_100 1.00    1    1
## 
## $Pl3_Laws9
##          500 1000 1500
## mfw_35  0.01 0.00 0.00
## mfw_70  0.05 0.02 0.01
## mfw_100 0.02 0.00 0.00
## 
## $Pl3_Sophist
##          500 1000 1500
## mfw_35  0.08 0.01 0.01
## mfw_70  0.22 0.09 0.03
## mfw_100 0.18 0.10 0.06
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp5$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
##     500 1000 1500
## Pl1  92   98  100
## Pl2   8    2    0
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_70
##     500 1000 1500
## Pl1  98  100  100
## Pl2   2    0    0
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_100
##     500 1000 1500
## Pl1  94  100  100
## Pl2   6    0    0
## Pl3   0    0    0
## 
## 
## $Pl1_Gorgias
## $Pl1_Gorgias$mfw_35
##     500 1000 1500
## Pl1  81   94   98
## Pl2   3    2    1
## Pl3  16    4    1
## 
## $Pl1_Gorgias$mfw_70
##     500 1000 1500
## Pl1  91   96   98
## Pl2   3    2    2
## Pl3   6    2    0
## 
## $Pl1_Gorgias$mfw_100
##     500 1000 1500
## Pl1  83   97   97
## Pl2  14    3    3
## Pl3   3    0    0
## 
## 
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2 100  100  100
## Pl3   0    0    0
## 
## $Pl2_Republic8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2  98  100  100
## Pl3   2    0    0
## 
## $Pl2_Republic8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2  98  100  100
## Pl3   2    0    0
## 
## 
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
##     500 1000 1500
## Pl1   3    0    0
## Pl2  95  100  100
## Pl3   2    0    0
## 
## $Pl2_Republic9$mfw_70
##     500 1000 1500
## Pl1   1    0    0
## Pl2  98  100  100
## Pl3   1    0    0
## 
## $Pl2_Republic9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2 100  100  100
## Pl3   0    0    0
## 
## 
## $Pl3_Laws9
## $Pl3_Laws9$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2  99  100  100
## Pl3   1    0    0
## 
## $Pl3_Laws9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2  95   98   99
## Pl3   5    2    1
## 
## $Pl3_Laws9$mfw_100
##     500 1000 1500
## Pl1   1    0    0
## Pl2  97  100  100
## Pl3   2    0    0
## 
## 
## $Pl3_Sophist
## $Pl3_Sophist$mfw_35
##     500 1000 1500
## Pl1  63   83   81
## Pl2  29   16   18
## Pl3   8    1    1
## 
## $Pl3_Sophist$mfw_70
##     500 1000 1500
## Pl1  30   35   44
## Pl2  48   56   53
## Pl3  22    9    3
## 
## $Pl3_Sophist$mfw_100
##     500 1000 1500
## Pl1  11   12    6
## Pl2  71   78   88
## Pl3  18   10    6
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"

Test 6: Plato 1 (Ly., Chrm.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 5 and 9)

Test 6, by contrast, is a success. The texts are, for the most part, neatly assigned to their authors. The main exception is Laws 5, which groups with Plato 2 in 20-31% of cases when 35 mfw are used (31% with 500-w blocks), but not with 70 or 100 mfw.

dir.create("corpus6")
file.copy("Lysis.txt", "corpus6")
file.copy("Charmides.txt", "corpus6")
file.copy("Laws5.txt", "corpus6")
file.copy("Laws9.txt", "corpus6")
file.copy("Republic8.txt", "corpus6")
file.copy("Republic9.txt", "corpus6")
setwd("corpus6")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl3_Laws5.txt", "Pl3_Laws9.txt", "Pl1_Lysis.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp6 <- size.penalize(corpus.dir = "corpus6", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp6$accuracy.scores 
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.97 1.00    1
## mfw_70  0.96 1.00    1
## mfw_100 0.82 0.98    1
## 
## $Pl1_Lysis
##          500 1000 1500
## mfw_35  0.84 0.94 0.98
## mfw_70  0.95 1.00 1.00
## mfw_100 0.81 0.95 0.99
## 
## $Pl2_Republic8
##          500 1000 1500
## mfw_35  0.88 0.94 0.97
## mfw_70  0.97 0.99 1.00
## mfw_100 0.97 1.00 1.00
## 
## $Pl2_Republic9
##          500 1000 1500
## mfw_35  0.84 0.98 0.98
## mfw_70  0.96 1.00 1.00
## mfw_100 0.99 1.00 1.00
## 
## $Pl3_Laws5
##          500 1000 1500
## mfw_35  0.69 0.79  0.8
## mfw_70  0.99 0.98  1.0
## mfw_100 1.00 1.00  1.0
## 
## $Pl3_Laws9
##          500 1000 1500
## mfw_35  0.97 0.99    1
## mfw_70  1.00 1.00    1
## mfw_100 0.96 1.00    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp6$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
##     500 1000 1500
## Pl1  97  100  100
## Pl2   3    0    0
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_70
##     500 1000 1500
## Pl1  96  100  100
## Pl2   4    0    0
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_100
##     500 1000 1500
## Pl1  82   98  100
## Pl2  18    2    0
## Pl3   0    0    0
## 
## 
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
##     500 1000 1500
## Pl1  84   94   98
## Pl2  16    6    2
## Pl3   0    0    0
## 
## $Pl1_Lysis$mfw_70
##     500 1000 1500
## Pl1  95  100  100
## Pl2   5    0    0
## Pl3   0    0    0
## 
## $Pl1_Lysis$mfw_100
##     500 1000 1500
## Pl1  81   95   99
## Pl2  19    5    1
## Pl3   0    0    0
## 
## 
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2  88   94   97
## Pl3  12    6    3
## 
## $Pl2_Republic8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2  97   99  100
## Pl3   3    1    0
## 
## $Pl2_Republic8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2  97  100  100
## Pl3   3    0    0
## 
## 
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
##     500 1000 1500
## Pl1  11    1    1
## Pl2  84   98   98
## Pl3   5    1    1
## 
## $Pl2_Republic9$mfw_70
##     500 1000 1500
## Pl1   2    0    0
## Pl2  96  100  100
## Pl3   2    0    0
## 
## $Pl2_Republic9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2  99  100  100
## Pl3   1    0    0
## 
## 
## $Pl3_Laws5
## $Pl3_Laws5$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2  31   21   20
## Pl3  69   79   80
## 
## $Pl3_Laws5$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   1    2    0
## Pl3  99   98  100
## 
## $Pl3_Laws5$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## 
## $Pl3_Laws9
## $Pl3_Laws9$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   3    1    0
## Pl3  97   99  100
## 
## $Pl3_Laws9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Laws9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   4    0    0
## Pl3  96  100  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"

Test 7: Plato 1 (Ly., Chrm.) vs. Plato 2 (Prt., Grg.) vs. Plato 3 (R. 8 and 9) vs. Plato 4 (Lg. 5 and 9)

In this test, I try adding more Platos. The result is not as devastating as one might have expected: although the accuracy falls for Plato 1 and Plato 2, the confusion matrices suggest that the confusion is almost entirely between these two groups (both supposedly “early”). Note again the relatively low scores for Laws 5 (attributed to Plato 3 in 35% of cases with 500-w blocks and 35 mfw, falling to 11% with 70 mfw and to zero with 100 mfw).

dir.create("corpus7")
file.copy("Lysis.txt", "corpus7")
file.copy("Charmides.txt", "corpus7")
file.copy("Laws5.txt", "corpus7")
file.copy("Laws9.txt", "corpus7")
file.copy("Republic8.txt", "corpus7")
file.copy("Republic9.txt", "corpus7")
file.copy("Protagoras.txt", "corpus7")
file.copy("Gorgias.txt", "corpus7")

setwd("corpus7")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl2_Gorgias.txt", "Pl4_Laws5.txt", "Pl4_Laws9.txt", "Pl1_Lysis.txt", "Pl2_Protagoras.txt", "Pl3_Republic8.txt", "Pl3_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp7 <- size.penalize(corpus.dir = "corpus7", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp7$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.41 0.63 0.51
## mfw_70  0.34 0.41 0.46
## mfw_100 0.58 0.63 0.72
## 
## $Pl1_Lysis
##          500 1000 1500
## mfw_35  0.52 0.56 0.70
## mfw_70  0.36 0.39 0.42
## mfw_100 0.66 0.73 0.82
## 
## $Pl2_Gorgias
##          500 1000 1500
## mfw_35  0.81 0.81 0.92
## mfw_70  0.90 0.96 1.00
## mfw_100 0.88 0.96 0.98
## 
## $Pl2_Protagoras
##          500 1000 1500
## mfw_35  0.48 0.57 0.60
## mfw_70  0.81 0.96 0.99
## mfw_100 0.77 0.95 1.00
## 
## $Pl3_Republic8
##          500 1000 1500
## mfw_35  0.93 0.98    1
## mfw_70  0.98 1.00    1
## mfw_100 0.98 1.00    1
## 
## $Pl3_Republic9
##          500 1000 1500
## mfw_35  0.92 0.99    1
## mfw_70  0.98 1.00    1
## mfw_100 0.97 1.00    1
## 
## $Pl4_Laws5
##          500 1000 1500
## mfw_35  0.65 0.76 0.79
## mfw_70  0.89 0.93 1.00
## mfw_100 1.00 1.00 1.00
## 
## $Pl4_Laws9
##          500 1000 1500
## mfw_35  0.97 1.00    1
## mfw_70  0.98 0.99    1
## mfw_100 1.00 1.00    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp7$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
##     500 1000 1500
## Pl1  41   63   51
## Pl2  59   37   49
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl1_Charmides$mfw_70
##     500 1000 1500
## Pl1  34   41   46
## Pl2  66   59   54
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl1_Charmides$mfw_100
##     500 1000 1500
## Pl1  58   63   72
## Pl2  42   37   28
## Pl3   0    0    0
## Pl4   0    0    0
## 
## 
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
##     500 1000 1500
## Pl1  52   56   70
## Pl2  45   44   30
## Pl3   2    0    0
## Pl4   1    0    0
## 
## $Pl1_Lysis$mfw_70
##     500 1000 1500
## Pl1  36   39   42
## Pl2  63   61   58
## Pl3   1    0    0
## Pl4   0    0    0
## 
## $Pl1_Lysis$mfw_100
##     500 1000 1500
## Pl1  66   73   82
## Pl2  30   27   18
## Pl3   4    0    0
## Pl4   0    0    0
## 
## 
## $Pl2_Gorgias
## $Pl2_Gorgias$mfw_35
##     500 1000 1500
## Pl1  19   19    8
## Pl2  81   81   92
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl2_Gorgias$mfw_70
##     500 1000 1500
## Pl1   9    4    0
## Pl2  90   96  100
## Pl3   1    0    0
## Pl4   0    0    0
## 
## $Pl2_Gorgias$mfw_100
##     500 1000 1500
## Pl1  11    4    2
## Pl2  88   96   98
## Pl3   1    0    0
## Pl4   0    0    0
## 
## 
## $Pl2_Protagoras
## $Pl2_Protagoras$mfw_35
##     500 1000 1500
## Pl1  45   40   39
## Pl2  48   57   60
## Pl3   6    3    1
## Pl4   1    0    0
## 
## $Pl2_Protagoras$mfw_70
##     500 1000 1500
## Pl1  10    3    0
## Pl2  81   96   99
## Pl3   9    1    1
## Pl4   0    0    0
## 
## $Pl2_Protagoras$mfw_100
##     500 1000 1500
## Pl1  20    5    0
## Pl2  77   95  100
## Pl3   3    0    0
## Pl4   0    0    0
## 
## 
## $Pl3_Republic8
## $Pl3_Republic8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  93   98  100
## Pl4   7    2    0
## 
## $Pl3_Republic8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  98  100  100
## Pl4   2    0    0
## 
## $Pl3_Republic8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  98  100  100
## Pl4   2    0    0
## 
## 
## $Pl3_Republic9
## $Pl3_Republic9$mfw_35
##     500 1000 1500
## Pl1   1    0    0
## Pl2   1    0    0
## Pl3  92   99  100
## Pl4   6    1    0
## 
## $Pl3_Republic9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  98  100  100
## Pl4   2    0    0
## 
## $Pl3_Republic9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  97  100  100
## Pl4   3    0    0
## 
## 
## $Pl4_Laws5
## $Pl4_Laws5$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  35   24   21
## Pl4  65   76   79
## 
## $Pl4_Laws5$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  11    7    0
## Pl4  89   93  100
## 
## $Pl4_Laws5$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## 
## $Pl4_Laws9
## $Pl4_Laws9$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   3    0    0
## Pl4  97  100  100
## 
## $Pl4_Laws9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   2    1    0
## Pl4  98   99  100
## 
## $Pl4_Laws9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"
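
One way to check that the confusion stays within the two supposedly “early” groups is to merge their rows and recompute the accuracy. The sketch below does this for the Charmides table at 70 mfw, copied from the output above. The `groups` mapping is a hypothetical coarser labelling introduced only for this check.

```r
# Confusion table copied from the Test 7 output for the Charmides at 70 mfw.
cm <- matrix(c(34, 41, 46,
               66, 59, 54,
                0,  0,  0,
                0,  0,  0),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("Pl", 1:4), c("500", "1000", "1500")))

# Hypothetical coarser grouping: Plato 1 and Plato 2 merged into "early".
groups <- c(Pl1 = "early", Pl2 = "early", Pl3 = "Pl3", Pl4 = "Pl4")

# Sum the rows that belong to the same merged group.
merged <- rowsum(cm, group = groups[rownames(cm)])

# Accuracy under the fine-grained and the merged labelling.
fine_accuracy <- cm["Pl1", ] / colSums(cm)
merged_accuracy <- merged["early", ] / colSums(merged)

print(merged)
print(rbind(fine = fine_accuracy, merged = merged_accuracy))
```

For this table, every sample that misses Plato 1 lands in Plato 2, so the merged accuracy is 100% at all three sample sizes even though the fine-grained accuracy is only 34-46%.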

Test 8: Plato 1 (Prt., Grg.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Tht. and Sph.) vs. Plato 4 (Lg. 5 and 9)

This is again a failure: the Theaetetus is dramatically misattributed (to Plato 1 and 2), and so, less dramatically, is Laws 5.

dir.create("corpus8")
file.copy("Theaetetus.txt", "corpus8")
file.copy("Sophist.txt", "corpus8")
file.copy("Laws5.txt", "corpus8")
file.copy("Laws9.txt", "corpus8")
file.copy("Republic8.txt", "corpus8")
file.copy("Republic9.txt", "corpus8")
file.copy("Protagoras.txt", "corpus8")
file.copy("Gorgias.txt", "corpus8")

setwd("corpus8")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Gorgias.txt", "Pl4_Laws5.txt", "Pl4_Laws9.txt", "Pl1_Protagoras.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt", "Pl3_Sophist.txt", "Pl3_Theaetetus.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp8 <- size.penalize(corpus.dir = "corpus8", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp8$accuracy.scores
## $Pl1_Gorgias
##          500 1000 1500
## mfw_35  0.85 0.90 0.97
## mfw_70  0.94 0.97 1.00
## mfw_100 0.96 0.99 0.99
## 
## $Pl1_Protagoras
##          500 1000 1500
## mfw_35  0.80  0.9 0.97
## mfw_70  0.98  1.0 1.00
## mfw_100 0.89  1.0 1.00
## 
## $Pl2_Republic8
##          500 1000 1500
## mfw_35  0.89 0.99 0.99
## mfw_70  0.97 0.99 1.00
## mfw_100 0.99 1.00 1.00
## 
## $Pl2_Republic9
##          500 1000 1500
## mfw_35  0.91 0.96    1
## mfw_70  1.00 0.99    1
## mfw_100 0.99 1.00    1
## 
## $Pl3_Sophist
##          500 1000 1500
## mfw_35  0.70 0.91 0.95
## mfw_70  0.82 0.98 0.99
## mfw_100 0.95 1.00 1.00
## 
## $Pl3_Theaetetus
##          500 1000 1500
## mfw_35  0.31 0.25 0.29
## mfw_70  0.21 0.13 0.12
## mfw_100 0.41 0.30 0.42
## 
## $Pl4_Laws5
##          500 1000 1500
## mfw_35  0.69 0.76 0.76
## mfw_70  0.89 0.95 0.97
## mfw_100 1.00 1.00 1.00
## 
## $Pl4_Laws9
##          500 1000 1500
## mfw_35  1.00    1    1
## mfw_70  0.98    1    1
## mfw_100 0.98    1    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp8$confusion.matrices
## $Pl1_Gorgias
## $Pl1_Gorgias$mfw_35
##     500 1000 1500
## Pl1  85   90   97
## Pl2   0    0    0
## Pl3  15   10    3
## Pl4   0    0    0
## 
## $Pl1_Gorgias$mfw_70
##     500 1000 1500
## Pl1  94   97  100
## Pl2   2    0    0
## Pl3   4    3    0
## Pl4   0    0    0
## 
## $Pl1_Gorgias$mfw_100
##     500 1000 1500
## Pl1  96   99   99
## Pl2   0    0    0
## Pl3   4    1    1
## Pl4   0    0    0
## 
## 
## $Pl1_Protagoras
## $Pl1_Protagoras$mfw_35
##     500 1000 1500
## Pl1  80   90   97
## Pl2  12    5    1
## Pl3   8    5    2
## Pl4   0    0    0
## 
## $Pl1_Protagoras$mfw_70
##     500 1000 1500
## Pl1  98  100  100
## Pl2   1    0    0
## Pl3   0    0    0
## Pl4   1    0    0
## 
## $Pl1_Protagoras$mfw_100
##     500 1000 1500
## Pl1  89  100  100
## Pl2   3    0    0
## Pl3   4    0    0
## Pl4   4    0    0
## 
## 
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
##     500 1000 1500
## Pl1   1    0    0
## Pl2  89   99   99
## Pl3   0    0    0
## Pl4  10    1    1
## 
## $Pl2_Republic8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2  97   99  100
## Pl3   0    0    0
## Pl4   3    1    0
## 
## $Pl2_Republic8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2  99  100  100
## Pl3   0    0    0
## Pl4   1    0    0
## 
## 
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
##     500 1000 1500
## Pl1   4    4    0
## Pl2  91   96  100
## Pl3   1    0    0
## Pl4   4    0    0
## 
## $Pl2_Republic9$mfw_70
##     500 1000 1500
## Pl1   0    1    0
## Pl2 100   99  100
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl2_Republic9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2  99  100  100
## Pl3   1    0    0
## Pl4   0    0    0
## 
## 
## $Pl3_Sophist
## $Pl3_Sophist$mfw_35
##     500 1000 1500
## Pl1  18    9    4
## Pl2   6    0    0
## Pl3  70   91   95
## Pl4   6    0    1
## 
## $Pl3_Sophist$mfw_70
##     500 1000 1500
## Pl1   7    0    0
## Pl2   6    1    0
## Pl3  82   98   99
## Pl4   5    1    1
## 
## $Pl3_Sophist$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   1    0    0
## Pl3  95  100  100
## Pl4   4    0    0
## 
## 
## $Pl3_Theaetetus
## $Pl3_Theaetetus$mfw_35
##     500 1000 1500
## Pl1  54   68   67
## Pl2  14    7    4
## Pl3  31   25   29
## Pl4   1    0    0
## 
## $Pl3_Theaetetus$mfw_70
##     500 1000 1500
## Pl1  54   74   82
## Pl2  25   13    6
## Pl3  21   13   12
## Pl4   0    0    0
## 
## $Pl3_Theaetetus$mfw_100
##     500 1000 1500
## Pl1  26   52   48
## Pl2  33   18   10
## Pl3  41   30   42
## Pl4   0    0    0
## 
## 
## $Pl4_Laws5
## $Pl4_Laws5$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2  29   24   24
## Pl3   2    0    0
## Pl4  69   76   76
## 
## $Pl4_Laws5$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2  11    4    3
## Pl3   0    1    0
## Pl4  89   95   97
## 
## $Pl4_Laws5$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## 
## $Pl4_Laws9
## $Pl4_Laws9$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## $Pl4_Laws9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   2    0    0
## Pl3   0    0    0
## Pl4  98  100  100
## 
## $Pl4_Laws9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   2    0    0
## Pl3   0    0    0
## Pl4  98  100  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"

Test 9: Plato 1 (Ly., Chrm.) vs. Plato 2 (Prt., Grg.) vs. Plato 3 (R. 8 and 9) vs. Plato 4 (Lg. 8 and 9)

This test is a modification of Test 7. I replace Laws 5 with Laws 8, which leads to a more stable result for Plato 3 and Plato 4.

dir.create("corpus9")
file.copy("Lysis.txt", "corpus9")
file.copy("Charmides.txt", "corpus9")
file.copy("Laws8.txt", "corpus9")
file.copy("Laws9.txt", "corpus9")
file.copy("Republic8.txt", "corpus9")
file.copy("Republic9.txt", "corpus9")
file.copy("Protagoras.txt", "corpus9")
file.copy("Gorgias.txt", "corpus9")

setwd("corpus9")
file.names <- list.files()
file.names
# list.files() returns names in alphabetical order; the new names below must follow that same order
new.file.names <- c("Pl1_Charmides.txt", "Pl2_Gorgias.txt", "Pl4_Laws8.txt", "Pl4_Laws9.txt", "Pl1_Lysis.txt", "Pl2_Protagoras.txt", "Pl3_Republic8.txt", "Pl3_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp9 <- size.penalize(corpus.dir = "corpus9", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp9$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.50 0.63 0.74
## mfw_70  0.37 0.47 0.45
## mfw_100 0.36 0.48 0.47
## 
## $Pl1_Lysis
##          500 1000 1500
## mfw_35  0.58 0.74 0.76
## mfw_70  0.49 0.51 0.60
## mfw_100 0.57 0.77 0.82
## 
## $Pl2_Gorgias
##          500 1000 1500
## mfw_35  0.74 0.63 0.81
## mfw_70  0.84 0.89 0.97
## mfw_100 0.89 0.94 0.99
## 
## $Pl2_Protagoras
##          500 1000 1500
## mfw_35  0.48 0.49 0.59
## mfw_70  0.85 0.89 0.99
## mfw_100 0.75 0.89 0.96
## 
## $Pl3_Republic8
##          500 1000 1500
## mfw_35  0.94    1    1
## mfw_70  0.99    1    1
## mfw_100 0.97    1    1
## 
## $Pl3_Republic9
##          500 1000 1500
## mfw_35  0.85 0.99    1
## mfw_70  0.97 1.00    1
## mfw_100 0.99 1.00    1
## 
## $Pl4_Laws8
##          500 1000 1500
## mfw_35  0.87 0.96    1
## mfw_70  0.99 1.00    1
## mfw_100 1.00 1.00    1
## 
## $Pl4_Laws9
##          500 1000 1500
## mfw_35  0.99    1    1
## mfw_70  1.00    1    1
## mfw_100 1.00    1    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp9$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
##     500 1000 1500
## Pl1  50   63   74
## Pl2  50   37   26
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl1_Charmides$mfw_70
##     500 1000 1500
## Pl1  37   47   45
## Pl2  63   53   55
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl1_Charmides$mfw_100
##     500 1000 1500
## Pl1  36   48   47
## Pl2  64   52   53
## Pl3   0    0    0
## Pl4   0    0    0
## 
## 
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
##     500 1000 1500
## Pl1  58   74   76
## Pl2  40   25   24
## Pl3   2    1    0
## Pl4   0    0    0
## 
## $Pl1_Lysis$mfw_70
##     500 1000 1500
## Pl1  49   51   60
## Pl2  51   49   40
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl1_Lysis$mfw_100
##     500 1000 1500
## Pl1  57   77   82
## Pl2  43   22   17
## Pl3   0    1    1
## Pl4   0    0    0
## 
## 
## $Pl2_Gorgias
## $Pl2_Gorgias$mfw_35
##     500 1000 1500
## Pl1  26   37   19
## Pl2  74   63   81
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl2_Gorgias$mfw_70
##     500 1000 1500
## Pl1  16   11    3
## Pl2  84   89   97
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl2_Gorgias$mfw_100
##     500 1000 1500
## Pl1  10    6    1
## Pl2  89   94   99
## Pl3   1    0    0
## Pl4   0    0    0
## 
## 
## $Pl2_Protagoras
## $Pl2_Protagoras$mfw_35
##     500 1000 1500
## Pl1  50   50   40
## Pl2  48   49   59
## Pl3   2    1    1
## Pl4   0    0    0
## 
## $Pl2_Protagoras$mfw_70
##     500 1000 1500
## Pl1   9    5    1
## Pl2  85   89   99
## Pl3   6    6    0
## Pl4   0    0    0
## 
## $Pl2_Protagoras$mfw_100
##     500 1000 1500
## Pl1  19   10    4
## Pl2  75   89   96
## Pl3   6    1    0
## Pl4   0    0    0
## 
## 
## $Pl3_Republic8
## $Pl3_Republic8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  94  100  100
## Pl4   6    0    0
## 
## $Pl3_Republic8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  99  100  100
## Pl4   1    0    0
## 
## $Pl3_Republic8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  97  100  100
## Pl4   3    0    0
## 
## 
## $Pl3_Republic9
## $Pl3_Republic9$mfw_35
##     500 1000 1500
## Pl1   6    0    0
## Pl2   0    1    0
## Pl3  85   99  100
## Pl4   9    0    0
## 
## $Pl3_Republic9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   1    0    0
## Pl3  97  100  100
## Pl4   2    0    0
## 
## $Pl3_Republic9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  99  100  100
## Pl4   1    0    0
## 
## 
## $Pl4_Laws8
## $Pl4_Laws8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  13    4    0
## Pl4  87   96  100
## 
## $Pl4_Laws8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   1    0    0
## Pl4  99  100  100
## 
## $Pl4_Laws8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## 
## $Pl4_Laws9
## $Pl4_Laws9$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   1    0    0
## Pl4  99  100  100
## 
## $Pl4_Laws9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## $Pl4_Laws9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"
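
The two outputs above appear to be linked: each accuracy score is the share of samples assigned to the class named in the file prefix, and the columns of every raw table sum to 100. A minimal sketch of that relation, with the Charmides table at 35 mfw retyped by hand from the output; the helper name `accuracy_from_confusion` is hypothetical and not part of Stylo.

```r
# Raw table for Pl1_Charmides at 35 mfw, copied by hand from the output above.
charmides_35 <- matrix(
  c(50, 63, 74,
    50, 37, 26,
     0,  0,  0,
     0,  0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Pl1", "Pl2", "Pl3", "Pl4"), c("500", "1000", "1500"))
)

# Hypothetical helper (not a Stylo function): the expected class is the
# prefix before "_" in the text name; accuracy is that row divided by the
# column totals.
accuracy_from_confusion <- function(text_name, confusion) {
  expected <- sub("_.*$", "", text_name)
  confusion[expected, ] / colSums(confusion)
}

acc <- accuracy_from_confusion("Pl1_Charmides", charmides_35)
print(acc)
```

The result should agree with the mfw_35 row printed for Pl1_Charmides under `sp9$accuracy.scores`.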

Test 10: Plato 1 (Ly., Chrm.) vs. Plato 2 (Smp., Phd.) vs. Plato 3 (R. 8 and 9) vs. Plato 4 (Lg. 8 and 9)

Can we get fewer misattributions between groups 1 and 2 (and more stylistic variation) if we now consider the Phaedo and the Symposium? Apparently not. Both the Symposium and the Phaedo are far too often attributed to Plato 1 or 3.

dir.create("corpus10")
file.copy("Lysis.txt", "corpus10")
file.copy("Charmides.txt", "corpus10")
file.copy("Laws8.txt", "corpus10")
file.copy("Laws9.txt", "corpus10")
file.copy("Republic8.txt", "corpus10")
file.copy("Republic9.txt", "corpus10")
file.copy("Symposium.txt", "corpus10")
file.copy("Phaedo.txt", "corpus10")

setwd("corpus10")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl4_Laws8.txt", "Pl4_Laws9.txt", "Pl1_Lysis.txt", "Pl2_Phaedo.txt", "Pl3_Republic8.txt", "Pl3_Republic9.txt", "Pl2_Symposium.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp10 <- size.penalize(corpus.dir = "corpus10", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp10$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.79 0.83 0.93
## mfw_70  0.85 0.97 1.00
## mfw_100 0.83 0.95 0.97
## 
## $Pl1_Lysis
##          500 1000 1500
## mfw_35  0.76 0.86 0.97
## mfw_70  0.93 0.98 1.00
## mfw_100 0.94 1.00 1.00
## 
## $Pl2_Phaedo
##          500 1000 1500
## mfw_35  0.56 0.68 0.80
## mfw_70  0.59 0.65 0.82
## mfw_100 0.37 0.50 0.67
## 
## $Pl2_Symposium
##          500 1000 1500
## mfw_35  0.68 0.81 0.89
## mfw_70  0.66 0.77 0.87
## mfw_100 0.74 0.89 0.97
## 
## $Pl3_Republic8
##          500 1000 1500
## mfw_35  0.92    1    1
## mfw_70  0.93    1    1
## mfw_100 0.97    1    1
## 
## $Pl3_Republic9
##          500 1000 1500
## mfw_35  0.80 0.92 0.99
## mfw_70  0.92 0.98 1.00
## mfw_100 0.93 0.99 1.00
## 
## $Pl4_Laws8
##          500 1000 1500
## mfw_35  0.88 0.97 0.97
## mfw_70  0.99 1.00 1.00
## mfw_100 0.99 1.00 1.00
## 
## $Pl4_Laws9
##         500 1000 1500
## mfw_35    1    1    1
## mfw_70    1    1    1
## mfw_100   1    1    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp10$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
##     500 1000 1500
## Pl1  79   83   93
## Pl2  21   17    7
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl1_Charmides$mfw_70
##     500 1000 1500
## Pl1  85   97  100
## Pl2  15    3    0
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl1_Charmides$mfw_100
##     500 1000 1500
## Pl1  83   95   97
## Pl2  17    5    3
## Pl3   0    0    0
## Pl4   0    0    0
## 
## 
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
##     500 1000 1500
## Pl1  76   86   97
## Pl2  20   14    3
## Pl3   4    0    0
## Pl4   0    0    0
## 
## $Pl1_Lysis$mfw_70
##     500 1000 1500
## Pl1  93   98  100
## Pl2   7    2    0
## Pl3   0    0    0
## Pl4   0    0    0
## 
## $Pl1_Lysis$mfw_100
##     500 1000 1500
## Pl1  94  100  100
## Pl2   4    0    0
## Pl3   2    0    0
## Pl4   0    0    0
## 
## 
## $Pl2_Phaedo
## $Pl2_Phaedo$mfw_35
##     500 1000 1500
## Pl1  26   26   14
## Pl2  56   68   80
## Pl3  16    6    6
## Pl4   2    0    0
## 
## $Pl2_Phaedo$mfw_70
##     500 1000 1500
## Pl1   9    3    0
## Pl2  59   65   82
## Pl3  32   32   18
## Pl4   0    0    0
## 
## $Pl2_Phaedo$mfw_100
##     500 1000 1500
## Pl1  22    8    4
## Pl2  37   50   67
## Pl3  41   42   29
## Pl4   0    0    0
## 
## 
## $Pl2_Symposium
## $Pl2_Symposium$mfw_35
##     500 1000 1500
## Pl1  14    8    3
## Pl2  68   81   89
## Pl3  16   10    8
## Pl4   2    1    0
## 
## $Pl2_Symposium$mfw_70
##     500 1000 1500
## Pl1  13    8    0
## Pl2  66   77   87
## Pl3  21   15   13
## Pl4   0    0    0
## 
## $Pl2_Symposium$mfw_100
##     500 1000 1500
## Pl1   7    3    0
## Pl2  74   89   97
## Pl3  13    7    3
## Pl4   6    1    0
## 
## 
## $Pl3_Republic8
## $Pl3_Republic8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   1    0    0
## Pl3  92  100  100
## Pl4   7    0    0
## 
## $Pl3_Republic8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   1    0    0
## Pl3  93  100  100
## Pl4   6    0    0
## 
## $Pl3_Republic8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  97  100  100
## Pl4   3    0    0
## 
## 
## $Pl3_Republic9
## $Pl3_Republic9$mfw_35
##     500 1000 1500
## Pl1   3    1    0
## Pl2  10    7    1
## Pl3  80   92   99
## Pl4   7    0    0
## 
## $Pl3_Republic9$mfw_70
##     500 1000 1500
## Pl1   1    0    0
## Pl2   5    1    0
## Pl3  92   98  100
## Pl4   2    1    0
## 
## $Pl3_Republic9$mfw_100
##     500 1000 1500
## Pl1   2    0    0
## Pl2   4    1    0
## Pl3  93   99  100
## Pl4   1    0    0
## 
## 
## $Pl4_Laws8
## $Pl4_Laws8$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3  12    3    3
## Pl4  88   97   97
## 
## $Pl4_Laws8$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   1    0    0
## Pl3   0    0    0
## Pl4  99  100  100
## 
## $Pl4_Laws8$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   1    0    0
## Pl3   0    0    0
## Pl4  99  100  100
## 
## 
## $Pl4_Laws9
## $Pl4_Laws9$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## $Pl4_Laws9$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## $Pl4_Laws9$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3   0    0    0
## Pl4 100  100  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"
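
To see where the misattributed samples of a text go, one can look for the largest off-target row of its raw table. The sketch below does this for the Phaedo at 100 mfw, with the figures retyped by hand from the output above; `top_confusion` is a hypothetical helper, not a Stylo function, and ties are resolved in favor of the first class listed.

```r
# Raw table for Pl2_Phaedo at 100 mfw, copied by hand from the output above.
phaedo_100 <- matrix(
  c(22,  8,  4,
    37, 50, 67,
    41, 42, 29,
     0,  0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Pl1", "Pl2", "Pl3", "Pl4"), c("500", "1000", "1500"))
)

# Hypothetical helper: for each sample size, report the wrong class that
# attracts the most samples and its share of all samples.
top_confusion <- function(text_name, confusion) {
  expected <- sub("_.*$", "", text_name)
  wrong <- confusion[rownames(confusion) != expected, , drop = FALSE]
  data.frame(
    sample_size = colnames(wrong),
    rival = rownames(wrong)[apply(wrong, 2, which.max)],
    share = as.numeric(apply(wrong, 2, max) / colSums(confusion)),
    row.names = NULL
  )
}

res <- top_confusion("Pl2_Phaedo", phaedo_100)
print(res)
```

For this table the main rival is Plato 3 at every sample size, which is the pattern described at the start of this test.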

Preliminary conclusions

  1. Of the combinations presented here, those in Tests 2 and 6 are most clearly recognized by the classifier; these are:

    • Set 1: Plato 1 (Prt., Grg.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 8 and 9)

    • Set 2: Plato 1 (Ly., Chrm.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 5 and 9)

    Test 1 is also successful, but less informative than Test 2, since it has only two authors in the set.

  2. The confusion is higher with more “Platos” to discriminate between (as tests 7, 9 and 10 demonstrate).

Additional Tests

As Sets 1 and 2 differ only with respect to Plato 1, it is desirable to give the classifier some more variation. At the very least, we can experiment with different books of the Republic and the Laws.
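
Since each additional test repeats the same copy-and-rename steps, the setup could be wrapped in one function that attaches the class label at copy time, so that labels no longer depend on the order returned by `list.files()`. This is only a sketch: `build_corpus` and its arguments are hypothetical names, and the demonstration uses empty placeholder files in a temporary directory rather than the real lemmatized texts.

```r
# Hypothetical helper: copy each source file into corpus_dir under the name
# "<class>_<original name>", pairing files and classes explicitly.
build_corpus <- function(files, classes, corpus_dir, source_dir = ".") {
  stopifnot(length(files) == length(classes))
  dir.create(corpus_dir, showWarnings = FALSE, recursive = TRUE)
  targets <- file.path(corpus_dir, paste0(classes, "_", files))
  ok <- file.copy(file.path(source_dir, files), targets, overwrite = TRUE)
  if (!all(ok)) stop("could not copy: ", paste(files[!ok], collapse = ", "))
  invisible(targets)
}

# Demonstration with empty placeholder files (assumption: the real texts
# would sit in the working directory instead).
src <- file.path(tempdir(), "plato_src")
dir.create(src, showWarnings = FALSE)
files <- c("Lysis.txt", "Charmides.txt", "Republic2.txt",
           "Republic3.txt", "Laws1.txt", "Laws2.txt")
file.create(file.path(src, files))
classes <- c("Pl1", "Pl1", "Pl2", "Pl2", "Pl3", "Pl3")

corpus <- file.path(tempdir(), "corpus_demo")
build_corpus(files, classes, corpus, source_dir = src)
built <- sort(list.files(corpus))
print(built)
```

The resulting directory could then be passed as `corpus.dir` to `size.penalize()` in the same way as the hand-built corpora in this document.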

Additional Test 1: Plato 1 (Ly., Chrm.) vs. Plato 2 (R. 2 and 3) vs. Plato 3 (Lg. 1 and 2)

There is some notable confusion between the Republic and the first books of the Laws, especially with smaller samples. A relatively high percentage of misclassifications for the Charmides also signals that our texts might be too close stylistically.

dir.create("corpus11")
file.copy("Lysis.txt", "corpus11")
file.copy("Charmides.txt", "corpus11")
file.copy("Laws1.txt", "corpus11")
file.copy("Laws2.txt", "corpus11")
file.copy("Republic2.txt", "corpus11")
file.copy("Republic3.txt", "corpus11")


setwd("corpus11")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl3_Laws1.txt", "Pl3_Laws2.txt", "Pl1_Lysis.txt", "Pl2_Republic2.txt", "Pl2_Republic3.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp11 <- size.penalize(corpus.dir = "corpus11", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp11$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.96 0.98 1.00
## mfw_70  0.87 0.94 1.00
## mfw_100 0.71 0.88 0.96
## 
## $Pl1_Lysis
##          500 1000 1500
## mfw_35  0.89 0.90 0.98
## mfw_70  0.91 0.95 0.97
## mfw_100 0.75 0.87 0.90
## 
## $Pl2_Republic2
##          500 1000 1500
## mfw_35  0.76 0.93 0.98
## mfw_70  0.99 0.99 1.00
## mfw_100 0.96 0.98 1.00
## 
## $Pl2_Republic3
##          500 1000 1500
## mfw_35  0.74 0.88 0.93
## mfw_70  0.79 0.99 0.97
## mfw_100 0.82 0.97 0.99
## 
## $Pl3_Laws1
##          500 1000 1500
## mfw_35  0.92 0.98 0.99
## mfw_70  0.93 0.98 1.00
## mfw_100 0.98 1.00 1.00
## 
## $Pl3_Laws2
##          500 1000 1500
## mfw_35  0.84 0.91 0.97
## mfw_70  0.97 0.99 1.00
## mfw_100 0.98 1.00 1.00
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"

Additional Test 2: Plato 1 (Ly., Chrm.) vs. Plato 2 (R. 4 and 5) vs. Plato 3 (Tim., Criti.)

The accuracy is higher for this combination (though it decreases for the Charmides with 100 mfw). Let’s try replacing this text in the next iteration.

dir.create("corpus12")
file.copy("Lysis.txt", "corpus12")
file.copy("Charmides.txt", "corpus12")
file.copy("Timaeus.txt", "corpus12")
file.copy("Critias.txt", "corpus12")
file.copy("Republic4.txt", "corpus12")
file.copy("Republic5.txt", "corpus12")


setwd("corpus12")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl3_Critias.txt", "Pl1_Lysis.txt", "Pl2_Republic4.txt", "Pl2_Republic5.txt", "Pl3_Timaeus.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp12 <- size.penalize(corpus.dir = "corpus12", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp12$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.85 0.91 1.00
## mfw_70  0.93 0.97 0.98
## mfw_100 0.66 0.78 0.88
## 
## $Pl1_Lysis
##          500 1000 1500
## mfw_35  0.85 0.95 0.96
## mfw_70  0.99 1.00 1.00
## mfw_100 0.96 1.00 1.00
## 
## $Pl2_Republic4
##          500 1000 1500
## mfw_35  0.87 0.95 0.98
## mfw_70  0.94 0.99 1.00
## mfw_100 0.95 0.99 1.00
## 
## $Pl2_Republic5
##          500 1000 1500
## mfw_35  0.76 0.89 0.96
## mfw_70  0.88 0.98 0.99
## mfw_100 0.98 0.99 1.00
## 
## $Pl3_Critias
##         500 1000 1500
## mfw_35    1    1    1
## mfw_70    1    1    1
## mfw_100   1    1    1
## 
## $Pl3_Timaeus
##         500 1000 1500
## mfw_35    1    1    1
## mfw_70    1    1    1
## mfw_100   1    1    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"

Additional Test 3: Plato 1 (Ly., La.) vs. Plato 2 (R. 4 and 5) vs. Plato 3 (Tim., Criti.)

It’s no good if we replace the Charmides with the Laches: now both the Lysis and the Laches are often classified as Plato 2! The combination in Additional Test 2 was definitely more fortunate.

dir.create("corpus13")
file.copy("Lysis.txt", "corpus13")
file.copy("Laches.txt", "corpus13")
file.copy("Timaeus.txt", "corpus13")
file.copy("Critias.txt", "corpus13")
file.copy("Republic4.txt", "corpus13")
file.copy("Republic5.txt", "corpus13")

setwd("corpus13")
file.names <- list.files()
file.names
new.file.names <- c("Pl3_Critias.txt", "Pl1_Laches.txt", "Pl1_Lysis.txt", "Pl2_Republic4.txt", "Pl2_Republic5.txt", "Pl3_Timaeus.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp13 <- size.penalize(corpus.dir = "corpus13", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp13$accuracy.scores
## $Pl1_Laches
##          500 1000 1500
## mfw_35  0.51 0.55 0.68
## mfw_70  0.75 0.80 0.89
## mfw_100 0.58 0.51 0.55
## 
## $Pl1_Lysis
##          500 1000 1500
## mfw_35  0.34 0.42 0.33
## mfw_70  0.77 0.74 0.78
## mfw_100 0.65 0.69 0.67
## 
## $Pl2_Republic4
##          500 1000 1500
## mfw_35  0.80 0.91 0.97
## mfw_70  0.97 1.00 1.00
## mfw_100 0.99 1.00 1.00
## 
## $Pl2_Republic5
##          500 1000 1500
## mfw_35  0.85 0.92 0.98
## mfw_70  0.93 0.98 1.00
## mfw_100 0.95 0.99 1.00
## 
## $Pl3_Critias
##         500 1000 1500
## mfw_35    1    1    1
## mfw_70    1    1    1
## mfw_100   1    1    1
## 
## $Pl3_Timaeus
##         500 1000 1500
## mfw_35    1    1    1
## mfw_70    1    1    1
## mfw_100   1    1    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp13$confusion.matrices
## $Pl1_Laches
## $Pl1_Laches$mfw_35
##     500 1000 1500
## Pl1  51   55   68
## Pl2  49   45   32
## Pl3   0    0    0
## 
## $Pl1_Laches$mfw_70
##     500 1000 1500
## Pl1  75   80   89
## Pl2  25   20   11
## Pl3   0    0    0
## 
## $Pl1_Laches$mfw_100
##     500 1000 1500
## Pl1  58   51   55
## Pl2  42   49   45
## Pl3   0    0    0
## 
## 
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
##     500 1000 1500
## Pl1  34   42   33
## Pl2  66   58   67
## Pl3   0    0    0
## 
## $Pl1_Lysis$mfw_70
##     500 1000 1500
## Pl1  77   74   78
## Pl2  23   26   22
## Pl3   0    0    0
## 
## $Pl1_Lysis$mfw_100
##     500 1000 1500
## Pl1  65   69   67
## Pl2  35   31   33
## Pl3   0    0    0
## 
## 
## $Pl2_Republic4
## $Pl2_Republic4$mfw_35
##     500 1000 1500
## Pl1  20    9    3
## Pl2  80   91   97
## Pl3   0    0    0
## 
## $Pl2_Republic4$mfw_70
##     500 1000 1500
## Pl1   3    0    0
## Pl2  97  100  100
## Pl3   0    0    0
## 
## $Pl2_Republic4$mfw_100
##     500 1000 1500
## Pl1   1    0    0
## Pl2  99  100  100
## Pl3   0    0    0
## 
## 
## $Pl2_Republic5
## $Pl2_Republic5$mfw_35
##     500 1000 1500
## Pl1  15    8    2
## Pl2  85   92   98
## Pl3   0    0    0
## 
## $Pl2_Republic5$mfw_70
##     500 1000 1500
## Pl1   6    2    0
## Pl2  93   98  100
## Pl3   1    0    0
## 
## $Pl2_Republic5$mfw_100
##     500 1000 1500
## Pl1   5    1    0
## Pl2  95   99  100
## Pl3   0    0    0
## 
## 
## $Pl3_Critias
## $Pl3_Critias$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Critias$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Critias$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## 
## $Pl3_Timaeus
## $Pl3_Timaeus$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Timaeus$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Timaeus$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"

Additional Test 4: Plato 1 (La., Chrm.) vs. Plato 2 (R. 6 and 7) vs. Plato 3 (Soph., Plt.)

Here’s another combination to try. The Statesman is again suspiciously unstable, and I now remember having decided to exclude it from the comparison. So shall it be.

dir.create("corpus14")
file.copy("Laches.txt", "corpus14")
file.copy("Charmides.txt", "corpus14")
file.copy("Sophist.txt", "corpus14")
file.copy("Statesman.txt", "corpus14")
file.copy("Republic6.txt", "corpus14")
file.copy("Republic7.txt", "corpus14")

setwd("corpus14")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt", "Pl3_Sophist.txt", "Pl3_Statesman.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp14 <- size.penalize(corpus.dir = "corpus14", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp14$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.23 0.34 0.42
## mfw_70  0.90 0.96 0.98
## mfw_100 0.93 0.98 1.00
## 
## $Pl1_Laches
##          500 1000 1500
## mfw_35  0.70 0.83 0.90
## mfw_70  0.80 0.83 0.91
## mfw_100 0.92 0.96 0.98
## 
## $Pl2_Republic6
##          500 1000 1500
## mfw_35  0.74 0.90 0.99
## mfw_70  0.87 0.99 1.00
## mfw_100 0.93 1.00 1.00
## 
## $Pl2_Republic7
##          500 1000 1500
## mfw_35  0.87 0.97    1
## mfw_70  0.95 0.98    1
## mfw_100 0.92 0.97    1
## 
## $Pl3_Sophist
##          500 1000 1500
## mfw_35  0.78 0.92 0.98
## mfw_70  0.93 0.99 1.00
## mfw_100 0.98 1.00 1.00
## 
## $Pl3_Statesman
##          500 1000 1500
## mfw_35  0.58 0.75 0.83
## mfw_70  0.63 0.88 0.88
## mfw_100 0.68 0.89 0.96
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"

Additional Test 5: Plato 1 (La., Chrm.) vs. Plato 2 (R. 6 and 7) vs. Plato 3 (Soph., Prm.)

It was not a good idea to take the Parmenides. It’s again a disaster: the Sophist is almost never classified correctly (accuracy of 0.04 at best).

dir.create("corpus15")
file.copy("Laches.txt", "corpus15")
file.copy("Charmides.txt", "corpus15")
file.copy("Sophist.txt", "corpus15")
file.copy("Parmenides.txt", "corpus15")
file.copy("Republic6.txt", "corpus15")
file.copy("Republic7.txt", "corpus15")

setwd("corpus15")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl3_Parmenides.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt", "Pl3_Sophist.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp15 <- size.penalize(corpus.dir = "corpus15", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp15$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.39 0.37 0.44
## mfw_70  0.84 0.88 0.98
## mfw_100 0.86 0.90 0.99
## 
## $Pl1_Laches
##          500 1000 1500
## mfw_35  0.63 0.84 0.92
## mfw_70  0.89 0.95 0.98
## mfw_100 0.94 1.00 1.00
## 
## $Pl2_Republic6
##          500 1000 1500
## mfw_35  0.80 0.90 0.96
## mfw_70  0.89 1.00 1.00
## mfw_100 0.92 0.99 1.00
## 
## $Pl2_Republic7
##          500 1000 1500
## mfw_35  0.92 0.98    1
## mfw_70  0.94 1.00    1
## mfw_100 0.96 1.00    1
## 
## $Pl3_Parmenides
##          500 1000 1500
## mfw_35  0.33 0.16 0.11
## mfw_70  0.45 0.41 0.36
## mfw_100 0.80 0.95 0.98
## 
## $Pl3_Sophist
##          500 1000 1500
## mfw_35  0.04 0.01    0
## mfw_70  0.01 0.00    0
## mfw_100 0.04 0.01    0
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"

Additional Test 6: Plato 1 (La., Chrm.) vs. Plato 2 (R. 6 and 7) vs. Plato 3 (Lg. 10 and 11)

Again, a failure. Laws 10 is mostly classified as Plato 2!

dir.create("corpus16")
file.copy("Laches.txt", "corpus16")
file.copy("Charmides.txt", "corpus16")
file.copy("Laws10.txt", "corpus16")
file.copy("Laws11.txt", "corpus16")
file.copy("Republic6.txt", "corpus16")
file.copy("Republic7.txt", "corpus16")

setwd("corpus16")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl3_Laws10.txt", "Pl3_Laws11.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp16 <- size.penalize(corpus.dir = "corpus16", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp16$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.31 0.23 0.23
## mfw_70  0.94 0.97 1.00
## mfw_100 0.79 0.87 0.96
## 
## $Pl1_Laches
##          500 1000 1500
## mfw_35  0.66 0.77 0.85
## mfw_70  0.76 0.73 0.85
## mfw_100 0.83 0.88 0.95
## 
## $Pl2_Republic6
##          500 1000 1500
## mfw_35  0.80 0.93 0.94
## mfw_70  0.85 0.97 1.00
## mfw_100 0.91 1.00 1.00
## 
## $Pl2_Republic7
##          500 1000 1500
## mfw_35  0.92 0.92 0.98
## mfw_70  0.88 0.99 0.99
## mfw_100 0.91 0.99 0.99
## 
## $Pl3_Laws10
##          500 1000 1500
## mfw_35  0.12 0.04 0.00
## mfw_70  0.27 0.10 0.04
## mfw_100 0.48 0.30 0.28
## 
## $Pl3_Laws11
##          500 1000 1500
## mfw_35  0.94 1.00    1
## mfw_70  0.93 0.98    1
## mfw_100 1.00 1.00    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp16$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
##     500 1000 1500
## Pl1  31   23   23
## Pl2  69   77   77
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_70
##     500 1000 1500
## Pl1  94   97  100
## Pl2   6    3    0
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_100
##     500 1000 1500
## Pl1  79   87   96
## Pl2  21   13    4
## Pl3   0    0    0
## 
## 
## $Pl1_Laches
## $Pl1_Laches$mfw_35
##     500 1000 1500
## Pl1  66   77   85
## Pl2  32   23   15
## Pl3   2    0    0
## 
## $Pl1_Laches$mfw_70
##     500 1000 1500
## Pl1  76   73   85
## Pl2  24   27   15
## Pl3   0    0    0
## 
## $Pl1_Laches$mfw_100
##     500 1000 1500
## Pl1  83   88   95
## Pl2  17   12    5
## Pl3   0    0    0
## 
## 
## $Pl2_Republic6
## $Pl2_Republic6$mfw_35
##     500 1000 1500
## Pl1   7    2    1
## Pl2  80   93   94
## Pl3  13    5    5
## 
## $Pl2_Republic6$mfw_70
##     500 1000 1500
## Pl1   8    3    0
## Pl2  85   97  100
## Pl3   7    0    0
## 
## $Pl2_Republic6$mfw_100
##     500 1000 1500
## Pl1   3    0    0
## Pl2  91  100  100
## Pl3   6    0    0
## 
## 
## $Pl2_Republic7
## $Pl2_Republic7$mfw_35
##     500 1000 1500
## Pl1   2    1    0
## Pl2  92   92   98
## Pl3   6    7    2
## 
## $Pl2_Republic7$mfw_70
##     500 1000 1500
## Pl1   3    0    0
## Pl2  88   99   99
## Pl3   9    1    1
## 
## $Pl2_Republic7$mfw_100
##     500 1000 1500
## Pl1   2    0    0
## Pl2  91   99   99
## Pl3   7    1    1
## 
## 
## $Pl3_Laws10
## $Pl3_Laws10$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2  88   96  100
## Pl3  12    4    0
## 
## $Pl3_Laws10$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2  73   90   96
## Pl3  27   10    4
## 
## $Pl3_Laws10$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2  52   70   72
## Pl3  48   30   28
## 
## 
## $Pl3_Laws11
## $Pl3_Laws11$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   6    0    0
## Pl3  94  100  100
## 
## $Pl3_Laws11$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   7    2    0
## Pl3  93   98  100
## 
## $Pl3_Laws11$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"

Additional Test 7: Plato 1 (La., Chrm.) vs. Plato 2 (R. 6 and 7) vs. Plato 3 (Lg. 2 and 3)

It’s now better, but we should use 1000-word samples and 70-100 mfw.

dir.create("corpus17")
file.copy("Laches.txt", "corpus17")
file.copy("Charmides.txt", "corpus17")
file.copy("Laws2.txt", "corpus17")
file.copy("Laws3.txt", "corpus17")
file.copy("Republic6.txt", "corpus17")
file.copy("Republic7.txt", "corpus17")

setwd("corpus17")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl3_Laws2.txt", "Pl3_Laws3.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp17 <- size.penalize(corpus.dir = "corpus17", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp17$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.41 0.46 0.38
## mfw_70  0.92 0.96 0.99
## mfw_100 0.84 0.88 0.89
## 
## $Pl1_Laches
##          500 1000 1500
## mfw_35  0.83 0.84 0.94
## mfw_70  0.79 0.87 0.89
## mfw_100 0.83 0.94 0.99
## 
## $Pl2_Republic6
##          500 1000 1500
## mfw_35  0.67 0.82 0.83
## mfw_70  0.83 0.92 0.99
## mfw_100 0.87 0.95 0.99
## 
## $Pl2_Republic7
##          500 1000 1500
## mfw_35  0.84 0.96 0.98
## mfw_70  0.80 0.95 0.98
## mfw_100 0.87 0.98 0.98
## 
## $Pl3_Laws2
##          500 1000 1500
## mfw_35  0.76 0.90 0.93
## mfw_70  0.82 0.96 0.96
## mfw_100 0.91 0.98 1.00
## 
## $Pl3_Laws3
##          500 1000 1500
## mfw_35  0.91 0.98 0.99
## mfw_70  0.81 0.92 0.98
## mfw_100 0.87 0.98 1.00
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp17$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
##     500 1000 1500
## Pl1  41   46   38
## Pl2  59   54   62
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_70
##     500 1000 1500
## Pl1  92   96   99
## Pl2   8    4    1
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_100
##     500 1000 1500
## Pl1  84   88   89
## Pl2  16   12   11
## Pl3   0    0    0
## 
## 
## $Pl1_Laches
## $Pl1_Laches$mfw_35
##     500 1000 1500
## Pl1  83   84   94
## Pl2  13   11    6
## Pl3   4    5    0
## 
## $Pl1_Laches$mfw_70
##     500 1000 1500
## Pl1  79   87   89
## Pl2  20   12   11
## Pl3   1    1    0
## 
## $Pl1_Laches$mfw_100
##     500 1000 1500
## Pl1  83   94   99
## Pl2  14    2    1
## Pl3   3    4    0
## 
## 
## $Pl2_Republic6
## $Pl2_Republic6$mfw_35
##     500 1000 1500
## Pl1  10    0    4
## Pl2  67   82   83
## Pl3  23   18   13
## 
## $Pl2_Republic6$mfw_70
##     500 1000 1500
## Pl1   4    0    0
## Pl2  83   92   99
## Pl3  13    8    1
## 
## $Pl2_Republic6$mfw_100
##     500 1000 1500
## Pl1   2    0    0
## Pl2  87   95   99
## Pl3  11    5    1
## 
## 
## $Pl2_Republic7
## $Pl2_Republic7$mfw_35
##     500 1000 1500
## Pl1   2    0    0
## Pl2  84   96   98
## Pl3  14    4    2
## 
## $Pl2_Republic7$mfw_70
##     500 1000 1500
## Pl1   1    1    0
## Pl2  80   95   98
## Pl3  19    4    2
## 
## $Pl2_Republic7$mfw_100
##     500 1000 1500
## Pl1   2    0    0
## Pl2  87   98   98
## Pl3  11    2    2
## 
## 
## $Pl3_Laws2
## $Pl3_Laws2$mfw_35
##     500 1000 1500
## Pl1   1    0    0
## Pl2  23   10    7
## Pl3  76   90   93
## 
## $Pl3_Laws2$mfw_70
##     500 1000 1500
## Pl1   1    0    0
## Pl2  17    4    4
## Pl3  82   96   96
## 
## $Pl3_Laws2$mfw_100
##     500 1000 1500
## Pl1   1    0    0
## Pl2   8    2    0
## Pl3  91   98  100
## 
## 
## $Pl3_Laws3
## $Pl3_Laws3$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   9    2    1
## Pl3  91   98   99
## 
## $Pl3_Laws3$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2  19    8    2
## Pl3  81   92   98
## 
## $Pl3_Laws3$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2  13    2    0
## Pl3  87   98  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"
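
The accuracy scores printed earlier appear to be the share of samples assigned to a text's own class. A minimal sketch of that relationship follows, using the Pl3_Laws2 table for 35 MFW copied from the output above. The helper name `accuracy_from_confusion` is hypothetical and not part of Stylo, and the sketch assumes a table with class labels as row names and sample sizes as column names.

```r
# Hypothetical helper (not part of Stylo): share of samples assigned to the
# text's own class, computed per sample size from one raw confusion table.
accuracy_from_confusion <- function(tab, true_class) {
  tab[true_class, ] / colSums(tab)
}

# Values copied from $Pl3_Laws2$mfw_35 above (rows = assigned class).
laws2_35 <- matrix(c( 1,  0,  0,
                     23, 10,  7,
                     76, 90, 93),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("Pl1", "Pl2", "Pl3"),
                                   c("500", "1000", "1500")))

acc <- accuracy_from_confusion(laws2_35, true_class = "Pl3")
acc
```

For this table the result agrees with the mfw_35 row reported for Pl3_Laws2 (0.76, 0.90, 0.93).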

Additional Test 8: Plato 1 (La., Chrm.) vs. Plato 2 (R. 6 and 7) vs. Plato 3 (Lg. 11 and 12)

# Assemble the corpus for this test in its own directory.
dir.create("corpus18")
file.copy("Laches.txt", "corpus18")
file.copy("Charmides.txt", "corpus18")
file.copy("Laws12.txt", "corpus18")
file.copy("Laws11.txt", "corpus18")
file.copy("Republic6.txt", "corpus18")
file.copy("Republic7.txt", "corpus18")

# Prefix each file with its class label (Pl1, Pl2, Pl3). The new names must
# follow the alphabetical order in which list.files() returns the originals.
setwd("corpus18")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl3_Laws11.txt",
                    "Pl3_Laws12.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")

# Test three MFW settings against three sample sizes with Burrows' Delta.
sp18 <- size.penalize(corpus.dir = "corpus18", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp18$accuracy.scores
## $Pl1_Charmides
##          500 1000 1500
## mfw_35  0.36 0.37 0.32
## mfw_70  0.88 0.99 1.00
## mfw_100 0.71 0.84 0.89
## 
## $Pl1_Laches
##          500 1000 1500
## mfw_35  0.63 0.71 0.74
## mfw_70  0.71 0.83 0.87
## mfw_100 0.82 0.90 0.99
## 
## $Pl2_Republic6
##          500 1000 1500
## mfw_35  0.74 0.92 0.97
## mfw_70  0.91 1.00 1.00
## mfw_100 1.00 1.00 1.00
## 
## $Pl2_Republic7
##          500 1000 1500
## mfw_35  0.87 0.99    1
## mfw_70  0.87 0.97    1
## mfw_100 0.90 1.00    1
## 
## $Pl3_Laws11
##         500 1000 1500
## mfw_35    1    1    1
## mfw_70    1    1    1
## mfw_100   1    1    1
## 
## $Pl3_Laws12
##          500 1000 1500
## mfw_35  0.96 0.99    1
## mfw_70  0.99 0.99    1
## mfw_100 1.00 1.00    1
## 
## attr(,"description")
## [1] "accuracy scores for the tested texts"
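
To compare settings across texts, the per-text tables can be collapsed into one table per corpus. The sketch below is only an illustration: the helper names `mean_accuracy` and `worst_accuracy` are hypothetical, and it assumes that every element of the list is a numeric matrix of the same shape, as the printed output suggests. Two tables are copied from the output above so that the example runs without the corpus.

```r
# Hypothetical helpers (not part of Stylo). Both expect a list of numeric
# matrices with identical dimensions (rows = MFW settings, cols = sizes).
mean_accuracy <- function(scores) {
  # Element-wise sum of all matrices, divided by the number of texts.
  Reduce(`+`, scores) / length(scores)
}
worst_accuracy <- function(scores) {
  # Element-wise minimum: the weakest text for each setting.
  Reduce(pmin, scores)
}

sizes <- c("500", "1000", "1500")
mfws  <- c("mfw_35", "mfw_70", "mfw_100")

# Two tables copied from the output above, for illustration only.
toy_scores <- list(
  Pl1_Charmides = matrix(c(0.36, 0.37, 0.32,
                           0.88, 0.99, 1.00,
                           0.71, 0.84, 0.89),
                         nrow = 3, byrow = TRUE,
                         dimnames = list(mfws, sizes)),
  Pl3_Laws12    = matrix(c(0.96, 0.99, 1.00,
                           0.99, 0.99, 1.00,
                           1.00, 1.00, 1.00),
                         nrow = 3, byrow = TRUE,
                         dimnames = list(mfws, sizes))
)

avg   <- mean_accuracy(toy_scores)
worst <- worst_accuracy(toy_scores)
round(avg, 2)
worst
```

With the full list one would presumably pass `sp18$accuracy.scores` instead of `toy_scores`. The element-wise minimum is the more cautious of the two summaries, since a single weak text (here Charmides at 35 MFW) is hidden by the mean.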
sp18$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
##     500 1000 1500
## Pl1  36   37   32
## Pl2  64   63   68
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_70
##     500 1000 1500
## Pl1  88   99  100
## Pl2  12    1    0
## Pl3   0    0    0
## 
## $Pl1_Charmides$mfw_100
##     500 1000 1500
## Pl1  71   84   89
## Pl2  29   16   11
## Pl3   0    0    0
## 
## 
## $Pl1_Laches
## $Pl1_Laches$mfw_35
##     500 1000 1500
## Pl1  63   71   74
## Pl2  37   29   26
## Pl3   0    0    0
## 
## $Pl1_Laches$mfw_70
##     500 1000 1500
## Pl1  71   83   87
## Pl2  29   17   13
## Pl3   0    0    0
## 
## $Pl1_Laches$mfw_100
##     500 1000 1500
## Pl1  82   90   99
## Pl2  18   10    1
## Pl3   0    0    0
## 
## 
## $Pl2_Republic6
## $Pl2_Republic6$mfw_35
##     500 1000 1500
## Pl1  19    8    3
## Pl2  74   92   97
## Pl3   7    0    0
## 
## $Pl2_Republic6$mfw_70
##     500 1000 1500
## Pl1   5    0    0
## Pl2  91  100  100
## Pl3   4    0    0
## 
## $Pl2_Republic6$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2 100  100  100
## Pl3   0    0    0
## 
## 
## $Pl2_Republic7
## $Pl2_Republic7$mfw_35
##     500 1000 1500
## Pl1   4    0    0
## Pl2  87   99  100
## Pl3   9    1    0
## 
## $Pl2_Republic7$mfw_70
##     500 1000 1500
## Pl1   3    0    0
## Pl2  87   97  100
## Pl3  10    3    0
## 
## $Pl2_Republic7$mfw_100
##     500 1000 1500
## Pl1   3    0    0
## Pl2  90  100  100
## Pl3   7    0    0
## 
## 
## $Pl3_Laws11
## $Pl3_Laws11$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Laws11$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## $Pl3_Laws11$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## 
## $Pl3_Laws12
## $Pl3_Laws12$mfw_35
##     500 1000 1500
## Pl1   0    0    0
## Pl2   4    1    0
## Pl3  96   99  100
## 
## $Pl3_Laws12$mfw_70
##     500 1000 1500
## Pl1   0    0    0
## Pl2   1    1    0
## Pl3  99   99  100
## 
## $Pl3_Laws12$mfw_100
##     500 1000 1500
## Pl1   0    0    0
## Pl2   0    0    0
## Pl3 100  100  100
## 
## 
## attr(,"description")
## [1] "all classification scores (raw tables)"
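
Since a "misattribution" is read here as a hint about stylistic neighbors, it can help to see which class absorbs most of the wrongly assigned samples. The following is a minimal sketch under stated assumptions: the helper `main_confusion` is hypothetical, and the input is assumed to be a table with class labels as row names, like those printed above. The example table is copied from $Pl1_Charmides$mfw_35.

```r
# Hypothetical helper (not part of Stylo): for each sample size, name the
# class that receives most of the misattributed samples, or NA if none.
main_confusion <- function(tab, true_class) {
  # Keep only the rows of the "wrong" classes.
  wrong <- tab[rownames(tab) != true_class, , drop = FALSE]
  apply(wrong, 2, function(col) {
    if (sum(col) == 0) NA_character_ else rownames(wrong)[which.max(col)]
  })
}

# Values copied from $Pl1_Charmides$mfw_35 above (rows = assigned class).
charmides_35 <- matrix(c(36, 37, 32,
                         64, 63, 68,
                          0,  0,  0),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c("Pl1", "Pl2", "Pl3"),
                                       c("500", "1000", "1500")))

neighbor <- main_confusion(charmides_35, true_class = "Pl1")
neighbor
```

For this table every misattributed sample goes to Pl2, so the result is "Pl2" at all three sample sizes.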

Best combinations

The following combinations of Plato’s works seem most suitable as comparanda for our test dialogue:

Works cited

Burrows, John. 2002. “‘Delta’: A Measure of Stylistic Difference and a Guide to Likely Authorship.” Literary and Linguistic Computing 17 (3): 267–87.
Eder, Maciej. 2015. “Does Size Matter? Authorship Attribution, Small Samples, Big Problem.” Digital Scholarship in the Humanities 30 (2): 167–82.
Eder, Maciej, Jan Rybicki, and Mike Kestemont. 2016. “Stylometry with R: A Package for Computational Text Analysis.” The R Journal 8 (1).
Howland, Jacob. 1991. “Re-Reading Plato: The Problem of Platonic Chronology.” Phoenix 45 (3): 189–214. https://doi.org/10.2307/1088791.
Vatri, Alessandro, and Barbara McGillivray. 2018. “The Diorisis Ancient Greek Corpus: Linguistics and Literature.” Research Data Journal for the Humanities and Social Sciences 3 (1): 55–65.
———. 2020. “Lemmatization for Ancient Greek: An Experimental Assessment of the State of the Art.” Journal of Greek Linguistics 20 (2): 179–96. https://doi.org/10.1163/15699846-02002001.