This document asks whether stylometric analysis of Plato’s dialogues remains effective when applied to small samples (500–1000 words), which are likely to be affected by random noise (Eder 2015). The sample size needed to classify Plato’s texts correctly in a supervised machine-learning setup is tested using the R package Stylo (Eder, Rybicki, and Kestemont 2016). Burrows’ Delta is adopted throughout, as a method that has proved very effective in attribution experiments (Burrows 2002).
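For readers unfamiliar with the measure, here is a minimal base-R sketch of Burrows' Delta as it is usually described: the relative frequencies of the most frequent words are z-scored across the corpus, and the distance between two texts is the mean absolute difference of their z-scores. The toy frequency table (with invented values) and the function name `burrows_delta` are assumptions made for illustration; they are not part of Stylo and are not the code used in the tests below.

```r
# Illustrative only: a toy table of relative frequencies
# (rows = texts, columns = most frequent words). All values are invented.
freqs <- rbind(
  text_a = c(kai = 0.050, de = 0.030, ho = 0.080),
  text_b = c(kai = 0.052, de = 0.029, ho = 0.078),
  text_c = c(kai = 0.040, de = 0.045, ho = 0.060)
)

# Burrows' Delta: z-score each word across the corpus, then take the mean
# absolute difference between the z-score profiles of two texts.
burrows_delta <- function(freqs, x, y) {
  z <- scale(freqs)  # column-wise (value - mean) / sd
  mean(abs(z[x, ] - z[y, ]))
}

delta_ab <- burrows_delta(freqs, "text_a", "text_b")
delta_ac <- burrows_delta(freqs, "text_a", "text_c")
delta_ab < delta_ac  # text_a is closer to text_b than to text_c
```

A nearest-neighbour classifier built on this distance assigns a test sample to the class of the training text with the smallest Delta.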
A word on “correct classification” is in order here. Our task is to see whether the classifier can discriminate between the different styles of Plato, labeled in these tests as Platos 1-4. However, if a text by “Plato 1” is “misattributed” to “Plato 3,” this may only mean that we started with a wrong assumption concerning its stylistic neighbors. “Misattribution” is therefore a signal that our Platos do not look very different to the machine and thus need to be reshuffled. There is also a possibility that some of the dialogues were revised and thus contain various stylistic layers (Howland 1991), and I do not assume that all texts in this study are stylistically homogeneous. However, we may reasonably expect that stylistic variation within one text will be signaled by a high percentage of misattributions. At any rate, we need several Platos only for the sake of further comparison with a test dialogue that we suspect to have been revised. As long as they, on average, represent a certain stylistic tendency, and as long as this tendency is visible to the classifier, they can serve as suitable comparanda.
For these tests, I used the Diorisis Ancient Greek Corpus (Vatri and McGillivray 2018). On the accuracy of lemmatization, see Vatri and McGillivray (2020). The code I used for extracting the lemmata is accessible via the links: Parsing Plato’s Republic (Separate Books), Parsing Plato’s Laws (Separate Books), Corpus_Platonicum: Lemmata Extraction. I start with the list of files produced by this code in my working directory.
library(stylo)
I start by positing only two authors whom I believe to be stylistically remote, and this is confirmed by the output. Except for the test with 35 mfw and 500-w blocks of the Statesman (0.85), the accuracy is above 90%. Note specifically that 500-w blocks, if used with 70 or 100 mfw, give a sufficiently good result (0.94 or higher). There is no need to print the confusion matrices.
dir.create("corpus1")
file.copy("Protagoras.txt", "corpus1")
file.copy("Gorgias.txt", "corpus1")
file.copy("Laws5.txt", "corpus1")
file.copy("Statesman.txt", "corpus1")
setwd("corpus1")
file.names <- list.files()
new.file.names <- c("Pl1_Gorgias.txt", "Pl2_Laws5.txt", "Pl1_Protagoras.txt", "Pl2_Statesman.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp1 <- size.penalize(corpus.dir = "corpus1", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp1$accuracy.scores
## $Pl1_Gorgias
## 500 1000 1500
## mfw_35 0.99 0.99 1
## mfw_70 1.00 1.00 1
## mfw_100 0.99 1.00 1
##
## $Pl1_Protagoras
## 500 1000 1500
## mfw_35 0.97 1.00 0.99
## mfw_70 0.98 0.99 1.00
## mfw_100 0.99 1.00 1.00
##
## $Pl2_Laws5
## 500 1000 1500
## mfw_35 0.97 0.93 0.96
## mfw_70 1.00 1.00 1.00
## mfw_100 1.00 1.00 1.00
##
## $Pl2_Statesman
## 500 1000 1500
## mfw_35 0.85 0.93 0.93
## mfw_70 0.94 1.00 0.99
## mfw_100 0.94 0.99 1.00
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
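The copy-and-rename steps above depend on `list.files()` returning the files in exactly the order assumed by `new.file.names`. As a hedged alternative, the sketch below assigns class prefixes through an explicit name-to-class mapping. The helper `build_corpus()` and the temporary demo directories are hypothetical and are not part of Stylo or of my original workflow.

```r
# Hypothetical helper: copy each source file into a corpus directory under a
# class-prefixed name, using an explicit mapping instead of positional renaming.
build_corpus <- function(corpus.dir, classes, source.dir = ".") {
  dir.create(corpus.dir, showWarnings = FALSE)
  from <- file.path(source.dir, paste0(names(classes), ".txt"))
  to <- file.path(corpus.dir, paste0(classes, "_", names(classes), ".txt"))
  stopifnot(all(file.exists(from)))  # fail early on a missing or misspelled file
  ok <- file.copy(from, to, overwrite = TRUE)
  stopifnot(all(ok))
  invisible(to)
}

# Demonstration in a temporary directory with empty placeholder files.
src <- file.path(tempdir(), "plato_src")
dir.create(src, showWarnings = FALSE)
file.create(file.path(src, c("Gorgias.txt", "Laws5.txt",
                             "Protagoras.txt", "Statesman.txt")))

corpus1.demo <- file.path(tempdir(), "corpus1_demo")
build_corpus(
  corpus1.demo,
  classes = c(Protagoras = "Pl1", Gorgias = "Pl1", Laws5 = "Pl2", Statesman = "Pl2"),
  source.dir = src
)
created <- list.files(corpus1.demo)
created
```

Because each file is paired with its class by name, adding or removing a dialogue cannot silently shift the labels of the others, and no `setwd()` calls are needed.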
We now complicate the picture a bit by adding a third author, but the classification is again successful: the lowest score (0.85) is for 500-w blocks of the Protagoras with 35 mfw.
dir.create("corpus2")
file.copy("Protagoras.txt", "corpus2")
file.copy("Gorgias.txt", "corpus2")
file.copy("Laws8.txt", "corpus2")
file.copy("Laws9.txt", "corpus2")
file.copy("Republic8.txt", "corpus2")
file.copy("Republic9.txt", "corpus2")
setwd("corpus2")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Gorgias.txt", "Pl3_Laws8.txt", "Pl3_Laws9.txt", "Pl1_Protagoras.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp2 <- size.penalize(corpus.dir = "corpus2", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp2$accuracy.scores
## $Pl1_Gorgias
## 500 1000 1500
## mfw_35 1.00 1 1
## mfw_70 1.00 1 1
## mfw_100 0.99 1 1
##
## $Pl1_Protagoras
## 500 1000 1500
## mfw_35 0.85 0.96 0.99
## mfw_70 0.86 0.97 0.99
## mfw_100 0.95 1.00 1.00
##
## $Pl2_Republic8
## 500 1000 1500
## mfw_35 0.94 0.98 1
## mfw_70 1.00 1.00 1
## mfw_100 0.98 1.00 1
##
## $Pl2_Republic9
## 500 1000 1500
## mfw_35 0.92 0.97 1
## mfw_70 0.98 1.00 1
## mfw_100 0.99 1.00 1
##
## $Pl3_Laws8
## 500 1000 1500
## mfw_35 0.93 0.98 0.98
## mfw_70 0.98 1.00 0.99
## mfw_100 1.00 1.00 1.00
##
## $Pl3_Laws9
## 500 1000 1500
## mfw_35 1 1 1
## mfw_70 1 1 1
## mfw_100 1 1 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
This case differs from the previous one in that we replace the Laws with the Sophist and the Statesman. Even though with 100 mfw and 1000-w blocks the accuracy is at least 96% in all cases, the Statesman, again (cf. Test 1), behaves suspiciously. For this dialogue, the accuracy drops drastically with 35 mfw, and (as the confusion tables printed below show) it is classified into Plato 2 in more than 50% of cases, regardless of the length of the sample. Given its unstable attribution, I consider it best to exclude the Statesman from further tests.
dir.create("corpus3")
file.copy("Protagoras.txt", "corpus3")
file.copy("Gorgias.txt", "corpus3")
file.copy("Sophist.txt", "corpus3")
file.copy("Statesman.txt", "corpus3")
file.copy("Republic8.txt", "corpus3")
file.copy("Republic9.txt", "corpus3")
setwd("corpus3")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Gorgias.txt", "Pl1_Protagoras.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt", "Pl3_Sophist.txt", "Pl3_Statesman.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp3 <- size.penalize(corpus.dir = "corpus3", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp3$accuracy.scores
## $Pl1_Gorgias
## 500 1000 1500
## mfw_35 0.97 1 1
## mfw_70 0.99 1 1
## mfw_100 0.98 1 1
##
## $Pl1_Protagoras
## 500 1000 1500
## mfw_35 0.70 0.90 0.93
## mfw_70 0.91 0.98 1.00
## mfw_100 0.86 0.96 0.97
##
## $Pl2_Republic8
## 500 1000 1500
## mfw_35 0.99 0.99 1
## mfw_70 1.00 1.00 1
## mfw_100 1.00 1.00 1
##
## $Pl2_Republic9
## 500 1000 1500
## mfw_35 0.9 0.99 1
## mfw_70 1.0 1.00 1
## mfw_100 1.0 1.00 1
##
## $Pl3_Sophist
## 500 1000 1500
## mfw_35 0.82 0.9 0.91
## mfw_70 1.00 1.0 1.00
## mfw_100 1.00 1.0 1.00
##
## $Pl3_Statesman
## 500 1000 1500
## mfw_35 0.31 0.42 0.36
## mfw_70 0.85 0.90 0.95
## mfw_100 0.86 0.96 1.00
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp3$confusion.matrices
## $Pl1_Gorgias
## $Pl1_Gorgias$mfw_35
## 500 1000 1500
## Pl1 97 100 100
## Pl2 0 0 0
## Pl3 3 0 0
##
## $Pl1_Gorgias$mfw_70
## 500 1000 1500
## Pl1 99 100 100
## Pl2 1 0 0
## Pl3 0 0 0
##
## $Pl1_Gorgias$mfw_100
## 500 1000 1500
## Pl1 98 100 100
## Pl2 2 0 0
## Pl3 0 0 0
##
##
## $Pl1_Protagoras
## $Pl1_Protagoras$mfw_35
## 500 1000 1500
## Pl1 70 90 93
## Pl2 30 10 7
## Pl3 0 0 0
##
## $Pl1_Protagoras$mfw_70
## 500 1000 1500
## Pl1 91 98 100
## Pl2 7 2 0
## Pl3 2 0 0
##
## $Pl1_Protagoras$mfw_100
## 500 1000 1500
## Pl1 86 96 97
## Pl2 14 4 3
## Pl3 0 0 0
##
##
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 99 99 100
## Pl3 1 1 0
##
## $Pl2_Republic8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 100 100 100
## Pl3 0 0 0
##
## $Pl2_Republic8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 100 100 100
## Pl3 0 0 0
##
##
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
## 500 1000 1500
## Pl1 4 1 0
## Pl2 90 99 100
## Pl3 6 0 0
##
## $Pl2_Republic9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 100 100 100
## Pl3 0 0 0
##
## $Pl2_Republic9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 100 100 100
## Pl3 0 0 0
##
##
## $Pl3_Sophist
## $Pl3_Sophist$mfw_35
## 500 1000 1500
## Pl1 17 9 9
## Pl2 1 1 0
## Pl3 82 90 91
##
## $Pl3_Sophist$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Sophist$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
##
## $Pl3_Statesman
## $Pl3_Statesman$mfw_35
## 500 1000 1500
## Pl1 2 0 0
## Pl2 67 58 64
## Pl3 31 42 36
##
## $Pl3_Statesman$mfw_70
## 500 1000 1500
## Pl1 1 0 0
## Pl2 14 10 5
## Pl3 85 90 95
##
## $Pl3_Statesman$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 14 4 0
## Pl3 86 96 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
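To read such tables more quickly, one can convert the raw counts into shares per sample size. The sketch below does this for the 35-mfw table of the Statesman, with the counts copied from the output above. The helper `class_shares()` is hypothetical, and I assume that each confusion table is a matrix (or can be coerced to one) with classes in rows and sample sizes in columns, as the printout suggests.

```r
# The 35-mfw confusion table for the Statesman, copied from the output above
# (rows = assigned class, columns = sample size in words).
statesman.mfw35 <- matrix(
  c( 2,  0,  0,
    67, 58, 64,
    31, 42, 36),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Pl1", "Pl2", "Pl3"), c("500", "1000", "1500"))
)

# Hypothetical helper: share of samples assigned to each class, per sample size.
class_shares <- function(confusion) {
  confusion <- as.matrix(confusion)
  sweep(confusion, 2, colSums(confusion), "/")
}

shares <- class_shares(statesman.mfw35)
shares["Pl2", ]  # share of samples misattributed to Plato 2

# Most frequently assigned class for each sample size.
top.class <- rownames(shares)[apply(shares, 2, which.max)]
top.class
```

Under the same assumption about the object's structure, the real table could be passed in directly as `class_shares(sp3$confusion.matrices$Pl3_Statesman$mfw_35)`.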
We now return to the combination presented in Test 2, but add a little tweak by replacing the Gorgias with the Charmides. Now the accuracy for the Protagoras collapses: with 70 mfw it is split almost evenly between Plato 1 and Plato 2, so it apparently has as much to do (stylistically) with the Republic as with the Charmides.
dir.create("corpus4")
file.copy("Protagoras.txt", "corpus4")
file.copy("Charmides.txt", "corpus4")
file.copy("Laws8.txt", "corpus4")
file.copy("Laws9.txt", "corpus4")
file.copy("Republic8.txt", "corpus4")
file.copy("Republic9.txt", "corpus4")
setwd("corpus4")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl3_Laws8.txt", "Pl3_Laws9.txt", "Pl1_Protagoras.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp4 <- size.penalize(corpus.dir = "corpus4", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp4$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.98 0.99 1
## mfw_70 0.99 1.00 1
## mfw_100 1.00 1.00 1
##
## $Pl1_Protagoras
## 500 1000 1500
## mfw_35 0.74 0.87 0.89
## mfw_70 0.49 0.64 0.54
## mfw_100 0.74 0.83 0.78
##
## $Pl2_Republic8
## 500 1000 1500
## mfw_35 0.93 1.00 1
## mfw_70 0.96 0.99 1
## mfw_100 0.93 1.00 1
##
## $Pl2_Republic9
## 500 1000 1500
## mfw_35 0.92 0.97 0.99
## mfw_70 0.96 1.00 1.00
## mfw_100 0.98 1.00 1.00
##
## $Pl3_Laws8
## 500 1000 1500
## mfw_35 0.88 0.99 0.99
## mfw_70 0.97 1.00 1.00
## mfw_100 0.99 1.00 1.00
##
## $Pl3_Laws9
## 500 1000 1500
## mfw_35 1 1 1
## mfw_70 1 1 1
## mfw_100 1 1 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp4$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
## 500 1000 1500
## Pl1 98 99 100
## Pl2 2 1 0
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_70
## 500 1000 1500
## Pl1 99 100 100
## Pl2 1 0 0
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_100
## 500 1000 1500
## Pl1 100 100 100
## Pl2 0 0 0
## Pl3 0 0 0
##
##
## $Pl1_Protagoras
## $Pl1_Protagoras$mfw_35
## 500 1000 1500
## Pl1 74 87 89
## Pl2 25 13 11
## Pl3 1 0 0
##
## $Pl1_Protagoras$mfw_70
## 500 1000 1500
## Pl1 49 64 54
## Pl2 50 36 46
## Pl3 1 0 0
##
## $Pl1_Protagoras$mfw_100
## 500 1000 1500
## Pl1 74 83 78
## Pl2 24 17 21
## Pl3 2 0 1
##
##
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 93 100 100
## Pl3 7 0 0
##
## $Pl2_Republic8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 96 99 100
## Pl3 4 1 0
##
## $Pl2_Republic8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 93 100 100
## Pl3 7 0 0
##
##
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
## 500 1000 1500
## Pl1 2 2 0
## Pl2 92 97 99
## Pl3 6 1 1
##
## $Pl2_Republic9$mfw_70
## 500 1000 1500
## Pl1 3 0 0
## Pl2 96 100 100
## Pl3 1 0 0
##
## $Pl2_Republic9$mfw_100
## 500 1000 1500
## Pl1 2 0 0
## Pl2 98 100 100
## Pl3 0 0 0
##
##
## $Pl3_Laws8
## $Pl3_Laws8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 12 1 1
## Pl3 88 99 99
##
## $Pl3_Laws8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 3 0 0
## Pl3 97 100 100
##
## $Pl3_Laws8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 1 0 0
## Pl3 99 100 100
##
##
## $Pl3_Laws9
## $Pl3_Laws9$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Laws9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Laws9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
We now modify the combination presented in Test 3. The unstable Statesman is replaced with Laws 9, and I also try to combine the Charmides with the Gorgias in group 1. This results in a disaster: the accuracy never exceeds 5% for Laws 9 or 22% for the Sophist. Laws 9 is almost always classified into Plato 2 (see the confusion matrices), and the Sophist into either Plato 1 or Plato 2, depending on the number of mfw. The combination clearly does not work.
dir.create("corpus5")
file.copy("Gorgias.txt", "corpus5")
file.copy("Charmides.txt", "corpus5")
file.copy("Sophist.txt", "corpus5")
file.copy("Laws9.txt", "corpus5")
file.copy("Republic8.txt", "corpus5")
file.copy("Republic9.txt", "corpus5")
setwd("corpus5")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Gorgias.txt", "Pl3_Laws9.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt", "Pl3_Sophist.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp5 <- size.penalize(corpus.dir = "corpus5", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp5$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.92 0.98 1
## mfw_70 0.98 1.00 1
## mfw_100 0.94 1.00 1
##
## $Pl1_Gorgias
## 500 1000 1500
## mfw_35 0.81 0.94 0.98
## mfw_70 0.91 0.96 0.98
## mfw_100 0.83 0.97 0.97
##
## $Pl2_Republic8
## 500 1000 1500
## mfw_35 1.00 1 1
## mfw_70 0.98 1 1
## mfw_100 0.98 1 1
##
## $Pl2_Republic9
## 500 1000 1500
## mfw_35 0.95 1 1
## mfw_70 0.98 1 1
## mfw_100 1.00 1 1
##
## $Pl3_Laws9
## 500 1000 1500
## mfw_35 0.01 0.00 0.00
## mfw_70 0.05 0.02 0.01
## mfw_100 0.02 0.00 0.00
##
## $Pl3_Sophist
## 500 1000 1500
## mfw_35 0.08 0.01 0.01
## mfw_70 0.22 0.09 0.03
## mfw_100 0.18 0.10 0.06
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp5$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
## 500 1000 1500
## Pl1 92 98 100
## Pl2 8 2 0
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_70
## 500 1000 1500
## Pl1 98 100 100
## Pl2 2 0 0
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_100
## 500 1000 1500
## Pl1 94 100 100
## Pl2 6 0 0
## Pl3 0 0 0
##
##
## $Pl1_Gorgias
## $Pl1_Gorgias$mfw_35
## 500 1000 1500
## Pl1 81 94 98
## Pl2 3 2 1
## Pl3 16 4 1
##
## $Pl1_Gorgias$mfw_70
## 500 1000 1500
## Pl1 91 96 98
## Pl2 3 2 2
## Pl3 6 2 0
##
## $Pl1_Gorgias$mfw_100
## 500 1000 1500
## Pl1 83 97 97
## Pl2 14 3 3
## Pl3 3 0 0
##
##
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 100 100 100
## Pl3 0 0 0
##
## $Pl2_Republic8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 98 100 100
## Pl3 2 0 0
##
## $Pl2_Republic8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 98 100 100
## Pl3 2 0 0
##
##
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
## 500 1000 1500
## Pl1 3 0 0
## Pl2 95 100 100
## Pl3 2 0 0
##
## $Pl2_Republic9$mfw_70
## 500 1000 1500
## Pl1 1 0 0
## Pl2 98 100 100
## Pl3 1 0 0
##
## $Pl2_Republic9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 100 100 100
## Pl3 0 0 0
##
##
## $Pl3_Laws9
## $Pl3_Laws9$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 99 100 100
## Pl3 1 0 0
##
## $Pl3_Laws9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 95 98 99
## Pl3 5 2 1
##
## $Pl3_Laws9$mfw_100
## 500 1000 1500
## Pl1 1 0 0
## Pl2 97 100 100
## Pl3 2 0 0
##
##
## $Pl3_Sophist
## $Pl3_Sophist$mfw_35
## 500 1000 1500
## Pl1 63 83 81
## Pl2 29 16 18
## Pl3 8 1 1
##
## $Pl3_Sophist$mfw_70
## 500 1000 1500
## Pl1 30 35 44
## Pl2 48 56 53
## Pl3 22 9 3
##
## $Pl3_Sophist$mfw_100
## 500 1000 1500
## Pl1 11 12 6
## Pl2 71 78 88
## Pl3 18 10 6
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
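A failure of this kind can also be detected programmatically by summarizing each text's accuracy grid. The following sketch uses three of the accuracy tables printed above (values copied from the Test 5 output). The helper names `acc_matrix()` and the 0.5 cut-off are arbitrary assumptions made for illustration.

```r
# Build a 3 x 3 accuracy table (rows = mfw, columns = sample size).
acc_matrix <- function(values) {
  matrix(values, nrow = 3, byrow = TRUE,
         dimnames = list(c("mfw_35", "mfw_70", "mfw_100"),
                         c("500", "1000", "1500")))
}

# Values copied from the Test 5 accuracy scores above.
scores <- list(
  Pl1_Charmides = acc_matrix(c(0.92, 0.98, 1, 0.98, 1, 1, 0.94, 1, 1)),
  Pl3_Laws9     = acc_matrix(c(0.01, 0, 0, 0.05, 0.02, 0.01, 0.02, 0, 0)),
  Pl3_Sophist   = acc_matrix(c(0.08, 0.01, 0.01, 0.22, 0.09, 0.03, 0.18, 0.10, 0.06))
)

# Lowest and highest accuracy per text across the whole grid.
rng <- t(vapply(scores, range, numeric(2)))
colnames(rng) <- c("min", "max")
rng

# Texts that never reach the (arbitrary) 0.5 threshold in any setting.
flagged <- rownames(rng)[rng[, "max"] < 0.5]
flagged
```

With the real object, `scores` could presumably be replaced by `sp5$accuracy.scores`, provided its elements are numeric tables of the shape shown in the printout.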
Test 6, on the contrary, is a success. Almost all texts are assigned to their authors. The main exception is Laws 5, which groups with Plato 2 in 20-31% of cases, but only if 35 mfw are used.
dir.create("corpus6")
file.copy("Lysis.txt", "corpus6")
file.copy("Charmides.txt", "corpus6")
file.copy("Laws5.txt", "corpus6")
file.copy("Laws9.txt", "corpus6")
file.copy("Republic8.txt", "corpus6")
file.copy("Republic9.txt", "corpus6")
setwd("corpus6")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl3_Laws5.txt", "Pl3_Laws9.txt", "Pl1_Lysis.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp6 <- size.penalize(corpus.dir = "corpus6", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp6$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.97 1.00 1
## mfw_70 0.96 1.00 1
## mfw_100 0.82 0.98 1
##
## $Pl1_Lysis
## 500 1000 1500
## mfw_35 0.84 0.94 0.98
## mfw_70 0.95 1.00 1.00
## mfw_100 0.81 0.95 0.99
##
## $Pl2_Republic8
## 500 1000 1500
## mfw_35 0.88 0.94 0.97
## mfw_70 0.97 0.99 1.00
## mfw_100 0.97 1.00 1.00
##
## $Pl2_Republic9
## 500 1000 1500
## mfw_35 0.84 0.98 0.98
## mfw_70 0.96 1.00 1.00
## mfw_100 0.99 1.00 1.00
##
## $Pl3_Laws5
## 500 1000 1500
## mfw_35 0.69 0.79 0.8
## mfw_70 0.99 0.98 1.0
## mfw_100 1.00 1.00 1.0
##
## $Pl3_Laws9
## 500 1000 1500
## mfw_35 0.97 0.99 1
## mfw_70 1.00 1.00 1
## mfw_100 0.96 1.00 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp6$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
## 500 1000 1500
## Pl1 97 100 100
## Pl2 3 0 0
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_70
## 500 1000 1500
## Pl1 96 100 100
## Pl2 4 0 0
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_100
## 500 1000 1500
## Pl1 82 98 100
## Pl2 18 2 0
## Pl3 0 0 0
##
##
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
## 500 1000 1500
## Pl1 84 94 98
## Pl2 16 6 2
## Pl3 0 0 0
##
## $Pl1_Lysis$mfw_70
## 500 1000 1500
## Pl1 95 100 100
## Pl2 5 0 0
## Pl3 0 0 0
##
## $Pl1_Lysis$mfw_100
## 500 1000 1500
## Pl1 81 95 99
## Pl2 19 5 1
## Pl3 0 0 0
##
##
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 88 94 97
## Pl3 12 6 3
##
## $Pl2_Republic8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 97 99 100
## Pl3 3 1 0
##
## $Pl2_Republic8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 97 100 100
## Pl3 3 0 0
##
##
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
## 500 1000 1500
## Pl1 11 1 1
## Pl2 84 98 98
## Pl3 5 1 1
##
## $Pl2_Republic9$mfw_70
## 500 1000 1500
## Pl1 2 0 0
## Pl2 96 100 100
## Pl3 2 0 0
##
## $Pl2_Republic9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 99 100 100
## Pl3 1 0 0
##
##
## $Pl3_Laws5
## $Pl3_Laws5$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 31 21 20
## Pl3 69 79 80
##
## $Pl3_Laws5$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 1 2 0
## Pl3 99 98 100
##
## $Pl3_Laws5$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
##
## $Pl3_Laws9
## $Pl3_Laws9$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 3 1 0
## Pl3 97 99 100
##
## $Pl3_Laws9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Laws9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 4 0 0
## Pl3 96 100 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
In this test, I try adding more Platos. The result is not as devastating as one might have expected: although the accuracy drops for Plato 1 and Plato 2, the confusion matrices suggest that the confusion is almost entirely within these two groups (both supposedly “early”). Again, note the relatively low scores for Laws 5 (attributed to Plato 3 in 35% of cases with 500-w blocks and 35 mfw; the share falls to 11% with 70 mfw and to zero with 100 mfw).
dir.create("corpus7")
file.copy("Lysis.txt", "corpus7")
file.copy("Charmides.txt", "corpus7")
file.copy("Laws5.txt", "corpus7")
file.copy("Laws9.txt", "corpus7")
file.copy("Republic8.txt", "corpus7")
file.copy("Republic9.txt", "corpus7")
file.copy("Protagoras.txt", "corpus7")
file.copy("Gorgias.txt", "corpus7")
setwd("corpus7")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl2_Gorgias.txt", "Pl4_Laws5.txt" , "Pl4_Laws9.txt", "Pl1_Lysis.txt", "Pl2_Protagoras.txt", "Pl3_Republic8.txt", "Pl3_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp7 <- size.penalize(corpus.dir = "corpus7", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp7$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.41 0.63 0.51
## mfw_70 0.34 0.41 0.46
## mfw_100 0.58 0.63 0.72
##
## $Pl1_Lysis
## 500 1000 1500
## mfw_35 0.52 0.56 0.70
## mfw_70 0.36 0.39 0.42
## mfw_100 0.66 0.73 0.82
##
## $Pl2_Gorgias
## 500 1000 1500
## mfw_35 0.81 0.81 0.92
## mfw_70 0.90 0.96 1.00
## mfw_100 0.88 0.96 0.98
##
## $Pl2_Protagoras
## 500 1000 1500
## mfw_35 0.48 0.57 0.60
## mfw_70 0.81 0.96 0.99
## mfw_100 0.77 0.95 1.00
##
## $Pl3_Republic8
## 500 1000 1500
## mfw_35 0.93 0.98 1
## mfw_70 0.98 1.00 1
## mfw_100 0.98 1.00 1
##
## $Pl3_Republic9
## 500 1000 1500
## mfw_35 0.92 0.99 1
## mfw_70 0.98 1.00 1
## mfw_100 0.97 1.00 1
##
## $Pl4_Laws5
## 500 1000 1500
## mfw_35 0.65 0.76 0.79
## mfw_70 0.89 0.93 1.00
## mfw_100 1.00 1.00 1.00
##
## $Pl4_Laws9
## 500 1000 1500
## mfw_35 0.97 1.00 1
## mfw_70 0.98 0.99 1
## mfw_100 1.00 1.00 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp7$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
## 500 1000 1500
## Pl1 41 63 51
## Pl2 59 37 49
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl1_Charmides$mfw_70
## 500 1000 1500
## Pl1 34 41 46
## Pl2 66 59 54
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl1_Charmides$mfw_100
## 500 1000 1500
## Pl1 58 63 72
## Pl2 42 37 28
## Pl3 0 0 0
## Pl4 0 0 0
##
##
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
## 500 1000 1500
## Pl1 52 56 70
## Pl2 45 44 30
## Pl3 2 0 0
## Pl4 1 0 0
##
## $Pl1_Lysis$mfw_70
## 500 1000 1500
## Pl1 36 39 42
## Pl2 63 61 58
## Pl3 1 0 0
## Pl4 0 0 0
##
## $Pl1_Lysis$mfw_100
## 500 1000 1500
## Pl1 66 73 82
## Pl2 30 27 18
## Pl3 4 0 0
## Pl4 0 0 0
##
##
## $Pl2_Gorgias
## $Pl2_Gorgias$mfw_35
## 500 1000 1500
## Pl1 19 19 8
## Pl2 81 81 92
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl2_Gorgias$mfw_70
## 500 1000 1500
## Pl1 9 4 0
## Pl2 90 96 100
## Pl3 1 0 0
## Pl4 0 0 0
##
## $Pl2_Gorgias$mfw_100
## 500 1000 1500
## Pl1 11 4 2
## Pl2 88 96 98
## Pl3 1 0 0
## Pl4 0 0 0
##
##
## $Pl2_Protagoras
## $Pl2_Protagoras$mfw_35
## 500 1000 1500
## Pl1 45 40 39
## Pl2 48 57 60
## Pl3 6 3 1
## Pl4 1 0 0
##
## $Pl2_Protagoras$mfw_70
## 500 1000 1500
## Pl1 10 3 0
## Pl2 81 96 99
## Pl3 9 1 1
## Pl4 0 0 0
##
## $Pl2_Protagoras$mfw_100
## 500 1000 1500
## Pl1 20 5 0
## Pl2 77 95 100
## Pl3 3 0 0
## Pl4 0 0 0
##
##
## $Pl3_Republic8
## $Pl3_Republic8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 93 98 100
## Pl4 7 2 0
##
## $Pl3_Republic8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 98 100 100
## Pl4 2 0 0
##
## $Pl3_Republic8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 98 100 100
## Pl4 2 0 0
##
##
## $Pl3_Republic9
## $Pl3_Republic9$mfw_35
## 500 1000 1500
## Pl1 1 0 0
## Pl2 1 0 0
## Pl3 92 99 100
## Pl4 6 1 0
##
## $Pl3_Republic9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 98 100 100
## Pl4 2 0 0
##
## $Pl3_Republic9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 97 100 100
## Pl4 3 0 0
##
##
## $Pl4_Laws5
## $Pl4_Laws5$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 35 24 21
## Pl4 65 76 79
##
## $Pl4_Laws5$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 11 7 0
## Pl4 89 93 100
##
## $Pl4_Laws5$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
##
## $Pl4_Laws9
## $Pl4_Laws9$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 3 0 0
## Pl4 97 100 100
##
## $Pl4_Laws9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 2 1 0
## Pl4 98 99 100
##
## $Pl4_Laws9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
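The claim that the confusion stays within the two “early” groups can be checked by merging the rows for Plato 1 and Plato 2 and recomputing the accuracy. The sketch below does this for the 35-mfw table of the Charmides, with the counts copied from the output above. The helper `merge_classes()` and the label "early" are assumptions made for illustration.

```r
# The 35-mfw confusion table for the Charmides, copied from the output above.
charmides.mfw35 <- matrix(
  c(41, 63, 51,
    59, 37, 49,
     0,  0,  0,
     0,  0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Pl1", "Pl2", "Pl3", "Pl4"), c("500", "1000", "1500"))
)

# Hypothetical helper: collapse several classes into a single row.
merge_classes <- function(confusion, classes, label) {
  confusion <- as.matrix(confusion)
  merged <- colSums(confusion[classes, , drop = FALSE])
  rest <- confusion[setdiff(rownames(confusion), classes), , drop = FALSE]
  out <- rbind(merged, rest)
  rownames(out)[1] <- label
  out
}

merged <- merge_classes(charmides.mfw35, c("Pl1", "Pl2"), "early")
merged

# Accuracy when Plato 1 and Plato 2 are treated as one group.
early.accuracy <- merged["early", ] / colSums(merged)
early.accuracy
```

For this table, every sample falls into one of the two early groups, so the merged accuracy is 1 for all three sample sizes.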
This is again a failure: the Theaetetus is dramatically misattributed (to Plato 1 and 2), and so is (less dramatically) Laws 5.
dir.create("corpus8")
file.copy("Theaetetus.txt", "corpus8")
file.copy("Sophist.txt", "corpus8")
file.copy("Laws5.txt", "corpus8")
file.copy("Laws9.txt", "corpus8")
file.copy("Republic8.txt", "corpus8")
file.copy("Republic9.txt", "corpus8")
file.copy("Protagoras.txt", "corpus8")
file.copy("Gorgias.txt", "corpus8")
setwd("corpus8")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Gorgias.txt", "Pl4_Laws5.txt", "Pl4_Laws9.txt", "Pl1_Protagoras.txt", "Pl2_Republic8.txt", "Pl2_Republic9.txt", "Pl3_Sophist.txt", "Pl3_Theaetetus.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp8 <- size.penalize(corpus.dir = "corpus8", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp8$accuracy.scores
## $Pl1_Gorgias
## 500 1000 1500
## mfw_35 0.85 0.90 0.97
## mfw_70 0.94 0.97 1.00
## mfw_100 0.96 0.99 0.99
##
## $Pl1_Protagoras
## 500 1000 1500
## mfw_35 0.80 0.9 0.97
## mfw_70 0.98 1.0 1.00
## mfw_100 0.89 1.0 1.00
##
## $Pl2_Republic8
## 500 1000 1500
## mfw_35 0.89 0.99 0.99
## mfw_70 0.97 0.99 1.00
## mfw_100 0.99 1.00 1.00
##
## $Pl2_Republic9
## 500 1000 1500
## mfw_35 0.91 0.96 1
## mfw_70 1.00 0.99 1
## mfw_100 0.99 1.00 1
##
## $Pl3_Sophist
## 500 1000 1500
## mfw_35 0.70 0.91 0.95
## mfw_70 0.82 0.98 0.99
## mfw_100 0.95 1.00 1.00
##
## $Pl3_Theaetetus
## 500 1000 1500
## mfw_35 0.31 0.25 0.29
## mfw_70 0.21 0.13 0.12
## mfw_100 0.41 0.30 0.42
##
## $Pl4_Laws5
## 500 1000 1500
## mfw_35 0.69 0.76 0.76
## mfw_70 0.89 0.95 0.97
## mfw_100 1.00 1.00 1.00
##
## $Pl4_Laws9
## 500 1000 1500
## mfw_35 1.00 1 1
## mfw_70 0.98 1 1
## mfw_100 0.98 1 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp8$confusion.matrices
## $Pl1_Gorgias
## $Pl1_Gorgias$mfw_35
## 500 1000 1500
## Pl1 85 90 97
## Pl2 0 0 0
## Pl3 15 10 3
## Pl4 0 0 0
##
## $Pl1_Gorgias$mfw_70
## 500 1000 1500
## Pl1 94 97 100
## Pl2 2 0 0
## Pl3 4 3 0
## Pl4 0 0 0
##
## $Pl1_Gorgias$mfw_100
## 500 1000 1500
## Pl1 96 99 99
## Pl2 0 0 0
## Pl3 4 1 1
## Pl4 0 0 0
##
##
## $Pl1_Protagoras
## $Pl1_Protagoras$mfw_35
## 500 1000 1500
## Pl1 80 90 97
## Pl2 12 5 1
## Pl3 8 5 2
## Pl4 0 0 0
##
## $Pl1_Protagoras$mfw_70
## 500 1000 1500
## Pl1 98 100 100
## Pl2 1 0 0
## Pl3 0 0 0
## Pl4 1 0 0
##
## $Pl1_Protagoras$mfw_100
## 500 1000 1500
## Pl1 89 100 100
## Pl2 3 0 0
## Pl3 4 0 0
## Pl4 4 0 0
##
##
## $Pl2_Republic8
## $Pl2_Republic8$mfw_35
## 500 1000 1500
## Pl1 1 0 0
## Pl2 89 99 99
## Pl3 0 0 0
## Pl4 10 1 1
##
## $Pl2_Republic8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 97 99 100
## Pl3 0 0 0
## Pl4 3 1 0
##
## $Pl2_Republic8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 99 100 100
## Pl3 0 0 0
## Pl4 1 0 0
##
##
## $Pl2_Republic9
## $Pl2_Republic9$mfw_35
## 500 1000 1500
## Pl1 4 4 0
## Pl2 91 96 100
## Pl3 1 0 0
## Pl4 4 0 0
##
## $Pl2_Republic9$mfw_70
## 500 1000 1500
## Pl1 0 1 0
## Pl2 100 99 100
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl2_Republic9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 99 100 100
## Pl3 1 0 0
## Pl4 0 0 0
##
##
## $Pl3_Sophist
## $Pl3_Sophist$mfw_35
## 500 1000 1500
## Pl1 18 9 4
## Pl2 6 0 0
## Pl3 70 91 95
## Pl4 6 0 1
##
## $Pl3_Sophist$mfw_70
## 500 1000 1500
## Pl1 7 0 0
## Pl2 6 1 0
## Pl3 82 98 99
## Pl4 5 1 1
##
## $Pl3_Sophist$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 1 0 0
## Pl3 95 100 100
## Pl4 4 0 0
##
##
## $Pl3_Theaetetus
## $Pl3_Theaetetus$mfw_35
## 500 1000 1500
## Pl1 54 68 67
## Pl2 14 7 4
## Pl3 31 25 29
## Pl4 1 0 0
##
## $Pl3_Theaetetus$mfw_70
## 500 1000 1500
## Pl1 54 74 82
## Pl2 25 13 6
## Pl3 21 13 12
## Pl4 0 0 0
##
## $Pl3_Theaetetus$mfw_100
## 500 1000 1500
## Pl1 26 52 48
## Pl2 33 18 10
## Pl3 41 30 42
## Pl4 0 0 0
##
##
## $Pl4_Laws5
## $Pl4_Laws5$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 29 24 24
## Pl3 2 0 0
## Pl4 69 76 76
##
## $Pl4_Laws5$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 11 4 3
## Pl3 0 1 0
## Pl4 89 95 97
##
## $Pl4_Laws5$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
##
## $Pl4_Laws9
## $Pl4_Laws9$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
## $Pl4_Laws9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 2 0 0
## Pl3 0 0 0
## Pl4 98 100 100
##
## $Pl4_Laws9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 2 0 0
## Pl3 0 0 0
## Pl4 98 100 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
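Since the purpose of these tests is to find a workable sample size, it may help to extract, for each mfw setting, the smallest tested sample size at which a text reaches a chosen accuracy. The sketch below uses the Sophist's scores from the Test 8 output above. The helper `min_sample_size()` and the 0.95 threshold are assumptions made for illustration.

```r
# Accuracy scores for the Sophist, copied from the Test 8 output above.
sophist <- matrix(
  c(0.70, 0.91, 0.95,
    0.82, 0.98, 0.99,
    0.95, 1.00, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("mfw_35", "mfw_70", "mfw_100"), c("500", "1000", "1500"))
)

# Hypothetical helper: for each mfw setting, the smallest tested sample size
# whose accuracy meets the threshold (NA if no tested size does).
min_sample_size <- function(acc, threshold) {
  sizes <- as.numeric(colnames(acc))
  apply(acc, 1, function(row) {
    hit <- which(row >= threshold)
    if (length(hit) == 0) NA_real_ else min(sizes[hit])
  })
}

needed <- min_sample_size(sophist, threshold = 0.95)
needed
```

Note that this only considers the three tested sizes and does not check that accuracy keeps rising with sample size, which, as the tables above show, is not always the case.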
This one is a modification of Test 7. I replace Laws 5 with Laws 8, which leads to a more stable result for Plato 3 and 4.
dir.create("corpus9")
file.copy("Lysis.txt", "corpus9")
file.copy("Charmides.txt", "corpus9")
file.copy("Laws8.txt", "corpus9")
file.copy("Laws9.txt", "corpus9")
file.copy("Republic8.txt", "corpus9")
file.copy("Republic9.txt", "corpus9")
file.copy("Protagoras.txt", "corpus9")
file.copy("Gorgias.txt", "corpus9")
setwd("corpus9")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl2_Gorgias.txt", "Pl4_Laws8.txt" , "Pl4_Laws9.txt", "Pl1_Lysis.txt", "Pl2_Protagoras.txt", "Pl3_Republic8.txt", "Pl3_Republic9.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp9 <- size.penalize(corpus.dir = "corpus9", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp9$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.50 0.63 0.74
## mfw_70 0.37 0.47 0.45
## mfw_100 0.36 0.48 0.47
##
## $Pl1_Lysis
## 500 1000 1500
## mfw_35 0.58 0.74 0.76
## mfw_70 0.49 0.51 0.60
## mfw_100 0.57 0.77 0.82
##
## $Pl2_Gorgias
## 500 1000 1500
## mfw_35 0.74 0.63 0.81
## mfw_70 0.84 0.89 0.97
## mfw_100 0.89 0.94 0.99
##
## $Pl2_Protagoras
## 500 1000 1500
## mfw_35 0.48 0.49 0.59
## mfw_70 0.85 0.89 0.99
## mfw_100 0.75 0.89 0.96
##
## $Pl3_Republic8
## 500 1000 1500
## mfw_35 0.94 1 1
## mfw_70 0.99 1 1
## mfw_100 0.97 1 1
##
## $Pl3_Republic9
## 500 1000 1500
## mfw_35 0.85 0.99 1
## mfw_70 0.97 1.00 1
## mfw_100 0.99 1.00 1
##
## $Pl4_Laws8
## 500 1000 1500
## mfw_35 0.87 0.96 1
## mfw_70 0.99 1.00 1
## mfw_100 1.00 1.00 1
##
## $Pl4_Laws9
## 500 1000 1500
## mfw_35 0.99 1 1
## mfw_70 1.00 1 1
## mfw_100 1.00 1 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp9$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
## 500 1000 1500
## Pl1 50 63 74
## Pl2 50 37 26
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl1_Charmides$mfw_70
## 500 1000 1500
## Pl1 37 47 45
## Pl2 63 53 55
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl1_Charmides$mfw_100
## 500 1000 1500
## Pl1 36 48 47
## Pl2 64 52 53
## Pl3 0 0 0
## Pl4 0 0 0
##
##
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
## 500 1000 1500
## Pl1 58 74 76
## Pl2 40 25 24
## Pl3 2 1 0
## Pl4 0 0 0
##
## $Pl1_Lysis$mfw_70
## 500 1000 1500
## Pl1 49 51 60
## Pl2 51 49 40
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl1_Lysis$mfw_100
## 500 1000 1500
## Pl1 57 77 82
## Pl2 43 22 17
## Pl3 0 1 1
## Pl4 0 0 0
##
##
## $Pl2_Gorgias
## $Pl2_Gorgias$mfw_35
## 500 1000 1500
## Pl1 26 37 19
## Pl2 74 63 81
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl2_Gorgias$mfw_70
## 500 1000 1500
## Pl1 16 11 3
## Pl2 84 89 97
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl2_Gorgias$mfw_100
## 500 1000 1500
## Pl1 10 6 1
## Pl2 89 94 99
## Pl3 1 0 0
## Pl4 0 0 0
##
##
## $Pl2_Protagoras
## $Pl2_Protagoras$mfw_35
## 500 1000 1500
## Pl1 50 50 40
## Pl2 48 49 59
## Pl3 2 1 1
## Pl4 0 0 0
##
## $Pl2_Protagoras$mfw_70
## 500 1000 1500
## Pl1 9 5 1
## Pl2 85 89 99
## Pl3 6 6 0
## Pl4 0 0 0
##
## $Pl2_Protagoras$mfw_100
## 500 1000 1500
## Pl1 19 10 4
## Pl2 75 89 96
## Pl3 6 1 0
## Pl4 0 0 0
##
##
## $Pl3_Republic8
## $Pl3_Republic8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 94 100 100
## Pl4 6 0 0
##
## $Pl3_Republic8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 99 100 100
## Pl4 1 0 0
##
## $Pl3_Republic8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 97 100 100
## Pl4 3 0 0
##
##
## $Pl3_Republic9
## $Pl3_Republic9$mfw_35
## 500 1000 1500
## Pl1 6 0 0
## Pl2 0 1 0
## Pl3 85 99 100
## Pl4 9 0 0
##
## $Pl3_Republic9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 1 0 0
## Pl3 97 100 100
## Pl4 2 0 0
##
## $Pl3_Republic9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 99 100 100
## Pl4 1 0 0
##
##
## $Pl4_Laws8
## $Pl4_Laws8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 13 4 0
## Pl4 87 96 100
##
## $Pl4_Laws8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 1 0 0
## Pl4 99 100 100
##
## $Pl4_Laws8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
##
## $Pl4_Laws9
## $Pl4_Laws9$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 1 0 0
## Pl4 99 100 100
##
## $Pl4_Laws9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
## $Pl4_Laws9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
Can we get fewer misattributions between groups 1 and 2 (and more stylistic variation) if we now consider the Phaedo and the Symposium? Apparently not. Both the Symposium and the Phaedo are far too often attributed to Plato 1 or 3.
dir.create("corpus10")
file.copy("Lysis.txt", "corpus10")
file.copy("Charmides.txt", "corpus10")
file.copy("Laws8.txt", "corpus10")
file.copy("Laws9.txt", "corpus10")
file.copy("Republic8.txt", "corpus10")
file.copy("Republic9.txt", "corpus10")
file.copy("Symposium.txt", "corpus10")
file.copy("Phaedo.txt", "corpus10")
setwd("corpus10")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl4_Laws8.txt", "Pl4_Laws9.txt", "Pl1_Lysis.txt", "Pl2_Phaedo.txt", "Pl3_Republic8.txt", "Pl3_Republic9.txt", "Pl2_Symposium.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp10 <- size.penalize(corpus.dir = "corpus10", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp10$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.79 0.83 0.93
## mfw_70 0.85 0.97 1.00
## mfw_100 0.83 0.95 0.97
##
## $Pl1_Lysis
## 500 1000 1500
## mfw_35 0.76 0.86 0.97
## mfw_70 0.93 0.98 1.00
## mfw_100 0.94 1.00 1.00
##
## $Pl2_Phaedo
## 500 1000 1500
## mfw_35 0.56 0.68 0.80
## mfw_70 0.59 0.65 0.82
## mfw_100 0.37 0.50 0.67
##
## $Pl2_Symposium
## 500 1000 1500
## mfw_35 0.68 0.81 0.89
## mfw_70 0.66 0.77 0.87
## mfw_100 0.74 0.89 0.97
##
## $Pl3_Republic8
## 500 1000 1500
## mfw_35 0.92 1 1
## mfw_70 0.93 1 1
## mfw_100 0.97 1 1
##
## $Pl3_Republic9
## 500 1000 1500
## mfw_35 0.80 0.92 0.99
## mfw_70 0.92 0.98 1.00
## mfw_100 0.93 0.99 1.00
##
## $Pl4_Laws8
## 500 1000 1500
## mfw_35 0.88 0.97 0.97
## mfw_70 0.99 1.00 1.00
## mfw_100 0.99 1.00 1.00
##
## $Pl4_Laws9
## 500 1000 1500
## mfw_35 1 1 1
## mfw_70 1 1 1
## mfw_100 1 1 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp10$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
## 500 1000 1500
## Pl1 79 83 93
## Pl2 21 17 7
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl1_Charmides$mfw_70
## 500 1000 1500
## Pl1 85 97 100
## Pl2 15 3 0
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl1_Charmides$mfw_100
## 500 1000 1500
## Pl1 83 95 97
## Pl2 17 5 3
## Pl3 0 0 0
## Pl4 0 0 0
##
##
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
## 500 1000 1500
## Pl1 76 86 97
## Pl2 20 14 3
## Pl3 4 0 0
## Pl4 0 0 0
##
## $Pl1_Lysis$mfw_70
## 500 1000 1500
## Pl1 93 98 100
## Pl2 7 2 0
## Pl3 0 0 0
## Pl4 0 0 0
##
## $Pl1_Lysis$mfw_100
## 500 1000 1500
## Pl1 94 100 100
## Pl2 4 0 0
## Pl3 2 0 0
## Pl4 0 0 0
##
##
## $Pl2_Phaedo
## $Pl2_Phaedo$mfw_35
## 500 1000 1500
## Pl1 26 26 14
## Pl2 56 68 80
## Pl3 16 6 6
## Pl4 2 0 0
##
## $Pl2_Phaedo$mfw_70
## 500 1000 1500
## Pl1 9 3 0
## Pl2 59 65 82
## Pl3 32 32 18
## Pl4 0 0 0
##
## $Pl2_Phaedo$mfw_100
## 500 1000 1500
## Pl1 22 8 4
## Pl2 37 50 67
## Pl3 41 42 29
## Pl4 0 0 0
##
##
## $Pl2_Symposium
## $Pl2_Symposium$mfw_35
## 500 1000 1500
## Pl1 14 8 3
## Pl2 68 81 89
## Pl3 16 10 8
## Pl4 2 1 0
##
## $Pl2_Symposium$mfw_70
## 500 1000 1500
## Pl1 13 8 0
## Pl2 66 77 87
## Pl3 21 15 13
## Pl4 0 0 0
##
## $Pl2_Symposium$mfw_100
## 500 1000 1500
## Pl1 7 3 0
## Pl2 74 89 97
## Pl3 13 7 3
## Pl4 6 1 0
##
##
## $Pl3_Republic8
## $Pl3_Republic8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 1 0 0
## Pl3 92 100 100
## Pl4 7 0 0
##
## $Pl3_Republic8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 1 0 0
## Pl3 93 100 100
## Pl4 6 0 0
##
## $Pl3_Republic8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 97 100 100
## Pl4 3 0 0
##
##
## $Pl3_Republic9
## $Pl3_Republic9$mfw_35
## 500 1000 1500
## Pl1 3 1 0
## Pl2 10 7 1
## Pl3 80 92 99
## Pl4 7 0 0
##
## $Pl3_Republic9$mfw_70
## 500 1000 1500
## Pl1 1 0 0
## Pl2 5 1 0
## Pl3 92 98 100
## Pl4 2 1 0
##
## $Pl3_Republic9$mfw_100
## 500 1000 1500
## Pl1 2 0 0
## Pl2 4 1 0
## Pl3 93 99 100
## Pl4 1 0 0
##
##
## $Pl4_Laws8
## $Pl4_Laws8$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 12 3 3
## Pl4 88 97 97
##
## $Pl4_Laws8$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 1 0 0
## Pl3 0 0 0
## Pl4 99 100 100
##
## $Pl4_Laws8$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 1 0 0
## Pl3 0 0 0
## Pl4 99 100 100
##
##
## $Pl4_Laws9
## $Pl4_Laws9$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
## $Pl4_Laws9$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
## $Pl4_Laws9$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 0 0 0
## Pl4 100 100 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
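To see where the misattributed samples of a text go, a raw confusion table can be condensed into the share of correct attributions and the strongest rival class. The following is a minimal sketch, assuming the table has classes in rows and sample sizes in columns, as in the printed output; `main_confusion` is a hypothetical helper, not a Stylo function, and the figures for the Phaedo at 100 mfw are typed in by hand from the output above.

```r
# Confusion table for the Phaedo at 100 MFW, typed in from the output above.
# Rows are the classes assigned by the classifier, columns the sample sizes.
phaedo_mfw100 <- matrix(
  c(22,  8,  4,
    37, 50, 67,
    41, 42, 29,
     0,  0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Pl1", "Pl2", "Pl3", "Pl4"), c("500", "1000", "1500"))
)

# Hypothetical helper: for each sample size, report the share of correct
# attributions and the wrong class that attracts the most samples.
main_confusion <- function(conf, true_class) {
  wrong <- conf[rownames(conf) != true_class, , drop = FALSE]
  totals <- colSums(conf)
  data.frame(
    sample_size = colnames(conf),
    correct = unname(conf[true_class, ] / totals),
    main_rival = rownames(wrong)[apply(wrong, 2, which.max)],
    rival_share = unname(apply(wrong, 2, max) / totals),
    stringsAsFactors = FALSE
  )
}

phaedo_summary <- main_confusion(phaedo_mfw100, true_class = "Pl2")
print(phaedo_summary)
```

For this table the strongest rival is Plato 3 at every sample size, which agrees with the raw counts printed above.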
Of the combinations presented here, those in Tests 2 and 6 are most clearly recognized by the classifier; these are:
Set 1: Plato 1 (Prt., Grg.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 8 and 9)
Set 2: Plato 1 (Ly., Chrm.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 5 and 9)
Test 1 is also successful, but less informative than Test 2, insofar as it has only two authors in its set.
The confusion is higher with more “Platos” to discriminate between (as tests 7, 9 and 10 demonstrate).
As Sets 1 and 2 differ only with respect to Plato 1, it is desirable to give the classifier some more variation. At the least, we could experiment with different books of the Republic and the Laws.
There is some notable confusion between the Republic and the first books of the Laws, especially with smaller samples. A relatively high percentage of misclassifications for the Charmides also signals that our texts might be too close stylistically.
dir.create("corpus11")
file.copy("Lysis.txt", "corpus11")
file.copy("Charmides.txt", "corpus11")
file.copy("Laws1.txt", "corpus11")
file.copy("Laws2.txt", "corpus11")
file.copy("Republic2.txt", "corpus11")
file.copy("Republic3.txt", "corpus11")
setwd("corpus11")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl3_Laws1.txt", "Pl3_Laws2.txt", "Pl1_Lysis.txt", "Pl2_Republic2.txt", "Pl2_Republic3.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp11 <- size.penalize(corpus.dir = "corpus11", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp11$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.96 0.98 1.00
## mfw_70 0.87 0.94 1.00
## mfw_100 0.71 0.88 0.96
##
## $Pl1_Lysis
## 500 1000 1500
## mfw_35 0.89 0.90 0.98
## mfw_70 0.91 0.95 0.97
## mfw_100 0.75 0.87 0.90
##
## $Pl2_Republic2
## 500 1000 1500
## mfw_35 0.76 0.93 0.98
## mfw_70 0.99 0.99 1.00
## mfw_100 0.96 0.98 1.00
##
## $Pl2_Republic3
## 500 1000 1500
## mfw_35 0.74 0.88 0.93
## mfw_70 0.79 0.99 0.97
## mfw_100 0.82 0.97 0.99
##
## $Pl3_Laws1
## 500 1000 1500
## mfw_35 0.92 0.98 0.99
## mfw_70 0.93 0.98 1.00
## mfw_100 0.98 1.00 1.00
##
## $Pl3_Laws2
## 500 1000 1500
## mfw_35 0.84 0.91 0.97
## mfw_70 0.97 0.99 1.00
## mfw_100 0.98 1.00 1.00
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
The accuracy is higher for this combination (though it slightly decreases for the Charmides with 100 mfw). Let’s try replacing this text in the next iteration.
dir.create("corpus12")
file.copy("Lysis.txt", "corpus12")
file.copy("Charmides.txt", "corpus12")
file.copy("Timaeus.txt", "corpus12")
file.copy("Critias.txt", "corpus12")
file.copy("Republic4.txt", "corpus12")
file.copy("Republic5.txt", "corpus12")
setwd("corpus12")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl3_Critias.txt", "Pl1_Lysis.txt", "Pl2_Republic4.txt", "Pl2_Republic5.txt", "Pl3_Timaeus.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp12 <- size.penalize(corpus.dir = "corpus12", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp12$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.85 0.91 1.00
## mfw_70 0.93 0.97 0.98
## mfw_100 0.66 0.78 0.88
##
## $Pl1_Lysis
## 500 1000 1500
## mfw_35 0.85 0.95 0.96
## mfw_70 0.99 1.00 1.00
## mfw_100 0.96 1.00 1.00
##
## $Pl2_Republic4
## 500 1000 1500
## mfw_35 0.87 0.95 0.98
## mfw_70 0.94 0.99 1.00
## mfw_100 0.95 0.99 1.00
##
## $Pl2_Republic5
## 500 1000 1500
## mfw_35 0.76 0.89 0.96
## mfw_70 0.88 0.98 0.99
## mfw_100 0.98 0.99 1.00
##
## $Pl3_Critias
## 500 1000 1500
## mfw_35 1 1 1
## mfw_70 1 1 1
## mfw_100 1 1 1
##
## $Pl3_Timaeus
## 500 1000 1500
## mfw_35 1 1 1
## mfw_70 1 1 1
## mfw_100 1 1 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
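The copy-and-rename steps are repeated for every corpus, and they work only as long as list.files() returns the files in the same order as new.file.names. A minimal sketch of a helper that pairs each file with its class label explicitly is given below. `build_corpus` and its arguments are hypothetical names, not part of Stylo, and the demonstration uses dummy files in a temporary directory rather than the actual lemmatized texts.

```r
# Hypothetical helper (not part of stylo): copy source texts into a corpus
# directory under class-prefixed names, pairing each file with its label
# explicitly instead of relying on the order returned by list.files().
build_corpus <- function(labels, corpus_dir, source_dir = ".") {
  # labels: named character vector, e.g. c(Lysis = "Pl1", Phaedo = "Pl2")
  dir.create(corpus_dir, showWarnings = FALSE, recursive = TRUE)
  from <- file.path(source_dir, paste0(names(labels), ".txt"))
  to   <- file.path(corpus_dir, paste0(labels, "_", names(labels), ".txt"))
  missing <- from[!file.exists(from)]
  if (length(missing) > 0) {
    stop("Missing source files: ", paste(missing, collapse = ", "))
  }
  ok <- file.copy(from, to, overwrite = TRUE)
  stopifnot(all(ok))
  invisible(to)
}

# Self-contained demonstration with dummy files in a temporary directory.
src <- file.path(tempdir(), "plato_src")
dir.create(src, showWarnings = FALSE)
for (f in c("Lysis", "Charmides", "Phaedo", "Symposium")) {
  writeLines("placeholder text", file.path(src, paste0(f, ".txt")))
}
corpus <- file.path(tempdir(), "corpus_demo")
build_corpus(
  c(Lysis = "Pl1", Charmides = "Pl1", Phaedo = "Pl2", Symposium = "Pl2"),
  corpus_dir = corpus, source_dir = src
)
demo_files <- sort(list.files(corpus))
print(demo_files)
```

A directory built this way could be passed to size.penalize() through corpus.dir as before, without changing the working directory in between.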
It’s no good if we replace the Charmides with the Laches: now both the Lysis and the Laches are heavily misclassified as Plato 2! The combination in Add. Test 2 was definitely more successful.
dir.create("corpus13")
file.copy("Lysis.txt", "corpus13")
file.copy("Laches.txt", "corpus13")
file.copy("Timaeus.txt", "corpus13")
file.copy("Critias.txt", "corpus13")
file.copy("Republic4.txt", "corpus13")
file.copy("Republic5.txt", "corpus13")
setwd("corpus13")
file.names <- list.files()
file.names
new.file.names <- c("Pl3_Critias.txt", "Pl1_Laches.txt", "Pl1_Lysis.txt", "Pl2_Republic4.txt", "Pl2_Republic5.txt", "Pl3_Timaeus.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp13 <- size.penalize(corpus.dir = "corpus13", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp13$accuracy.scores
## $Pl1_Laches
## 500 1000 1500
## mfw_35 0.51 0.55 0.68
## mfw_70 0.75 0.80 0.89
## mfw_100 0.58 0.51 0.55
##
## $Pl1_Lysis
## 500 1000 1500
## mfw_35 0.34 0.42 0.33
## mfw_70 0.77 0.74 0.78
## mfw_100 0.65 0.69 0.67
##
## $Pl2_Republic4
## 500 1000 1500
## mfw_35 0.80 0.91 0.97
## mfw_70 0.97 1.00 1.00
## mfw_100 0.99 1.00 1.00
##
## $Pl2_Republic5
## 500 1000 1500
## mfw_35 0.85 0.92 0.98
## mfw_70 0.93 0.98 1.00
## mfw_100 0.95 0.99 1.00
##
## $Pl3_Critias
## 500 1000 1500
## mfw_35 1 1 1
## mfw_70 1 1 1
## mfw_100 1 1 1
##
## $Pl3_Timaeus
## 500 1000 1500
## mfw_35 1 1 1
## mfw_70 1 1 1
## mfw_100 1 1 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp13$confusion.matrices
## $Pl1_Laches
## $Pl1_Laches$mfw_35
## 500 1000 1500
## Pl1 51 55 68
## Pl2 49 45 32
## Pl3 0 0 0
##
## $Pl1_Laches$mfw_70
## 500 1000 1500
## Pl1 75 80 89
## Pl2 25 20 11
## Pl3 0 0 0
##
## $Pl1_Laches$mfw_100
## 500 1000 1500
## Pl1 58 51 55
## Pl2 42 49 45
## Pl3 0 0 0
##
##
## $Pl1_Lysis
## $Pl1_Lysis$mfw_35
## 500 1000 1500
## Pl1 34 42 33
## Pl2 66 58 67
## Pl3 0 0 0
##
## $Pl1_Lysis$mfw_70
## 500 1000 1500
## Pl1 77 74 78
## Pl2 23 26 22
## Pl3 0 0 0
##
## $Pl1_Lysis$mfw_100
## 500 1000 1500
## Pl1 65 69 67
## Pl2 35 31 33
## Pl3 0 0 0
##
##
## $Pl2_Republic4
## $Pl2_Republic4$mfw_35
## 500 1000 1500
## Pl1 20 9 3
## Pl2 80 91 97
## Pl3 0 0 0
##
## $Pl2_Republic4$mfw_70
## 500 1000 1500
## Pl1 3 0 0
## Pl2 97 100 100
## Pl3 0 0 0
##
## $Pl2_Republic4$mfw_100
## 500 1000 1500
## Pl1 1 0 0
## Pl2 99 100 100
## Pl3 0 0 0
##
##
## $Pl2_Republic5
## $Pl2_Republic5$mfw_35
## 500 1000 1500
## Pl1 15 8 2
## Pl2 85 92 98
## Pl3 0 0 0
##
## $Pl2_Republic5$mfw_70
## 500 1000 1500
## Pl1 6 2 0
## Pl2 93 98 100
## Pl3 1 0 0
##
## $Pl2_Republic5$mfw_100
## 500 1000 1500
## Pl1 5 1 0
## Pl2 95 99 100
## Pl3 0 0 0
##
##
## $Pl3_Critias
## $Pl3_Critias$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Critias$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Critias$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
##
## $Pl3_Timaeus
## $Pl3_Timaeus$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Timaeus$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Timaeus$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
Here’s another combination to try. The Statesman is again suspiciously unstable, and I now remember having decided to exclude it from the comparison. So shall it be.
dir.create("corpus14")
file.copy("Laches.txt", "corpus14")
file.copy("Charmides.txt", "corpus14")
file.copy("Sophist.txt", "corpus14")
file.copy("Statesman.txt", "corpus14")
file.copy("Republic6.txt", "corpus14")
file.copy("Republic7.txt", "corpus14")
setwd("corpus14")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt", "Pl3_Sophist.txt", "Pl3_Statesman.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp14 <- size.penalize(corpus.dir = "corpus14", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp14$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.23 0.34 0.42
## mfw_70 0.90 0.96 0.98
## mfw_100 0.93 0.98 1.00
##
## $Pl1_Laches
## 500 1000 1500
## mfw_35 0.70 0.83 0.90
## mfw_70 0.80 0.83 0.91
## mfw_100 0.92 0.96 0.98
##
## $Pl2_Republic6
## 500 1000 1500
## mfw_35 0.74 0.90 0.99
## mfw_70 0.87 0.99 1.00
## mfw_100 0.93 1.00 1.00
##
## $Pl2_Republic7
## 500 1000 1500
## mfw_35 0.87 0.97 1
## mfw_70 0.95 0.98 1
## mfw_100 0.92 0.97 1
##
## $Pl3_Sophist
## 500 1000 1500
## mfw_35 0.78 0.92 0.98
## mfw_70 0.93 0.99 1.00
## mfw_100 0.98 1.00 1.00
##
## $Pl3_Statesman
## 500 1000 1500
## mfw_35 0.58 0.75 0.83
## mfw_70 0.63 0.88 0.88
## mfw_100 0.68 0.89 0.96
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
It was not a good idea to include the Parmenides. It’s again a disaster (see the scores for the Sophist).
dir.create("corpus15")
file.copy("Laches.txt", "corpus15")
file.copy("Charmides.txt", "corpus15")
file.copy("Sophist.txt", "corpus15")
file.copy("Parmenides.txt", "corpus15")
file.copy("Republic6.txt", "corpus15")
file.copy("Republic7.txt", "corpus15")
setwd("corpus15")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl3_Parmenides.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt", "Pl3_Sophist.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp15 <- size.penalize(corpus.dir = "corpus15", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp15$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.39 0.37 0.44
## mfw_70 0.84 0.88 0.98
## mfw_100 0.86 0.90 0.99
##
## $Pl1_Laches
## 500 1000 1500
## mfw_35 0.63 0.84 0.92
## mfw_70 0.89 0.95 0.98
## mfw_100 0.94 1.00 1.00
##
## $Pl2_Republic6
## 500 1000 1500
## mfw_35 0.80 0.90 0.96
## mfw_70 0.89 1.00 1.00
## mfw_100 0.92 0.99 1.00
##
## $Pl2_Republic7
## 500 1000 1500
## mfw_35 0.92 0.98 1
## mfw_70 0.94 1.00 1
## mfw_100 0.96 1.00 1
##
## $Pl3_Parmenides
## 500 1000 1500
## mfw_35 0.33 0.16 0.11
## mfw_70 0.45 0.41 0.36
## mfw_100 0.80 0.95 0.98
##
## $Pl3_Sophist
## 500 1000 1500
## mfw_35 0.04 0.01 0
## mfw_70 0.01 0.00 0
## mfw_100 0.04 0.01 0
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
Again, a failure. Laws 10 is classified as Plato 2!
dir.create("corpus16")
file.copy("Laches.txt", "corpus16")
file.copy("Charmides.txt", "corpus16")
file.copy("Laws10.txt", "corpus16")
file.copy("Laws11.txt", "corpus16")
file.copy("Republic6.txt", "corpus16")
file.copy("Republic7.txt", "corpus16")
setwd("corpus16")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl3_Laws10.txt", "Pl3_Laws11.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp16 <- size.penalize(corpus.dir = "corpus16", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp16$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.31 0.23 0.23
## mfw_70 0.94 0.97 1.00
## mfw_100 0.79 0.87 0.96
##
## $Pl1_Laches
## 500 1000 1500
## mfw_35 0.66 0.77 0.85
## mfw_70 0.76 0.73 0.85
## mfw_100 0.83 0.88 0.95
##
## $Pl2_Republic6
## 500 1000 1500
## mfw_35 0.80 0.93 0.94
## mfw_70 0.85 0.97 1.00
## mfw_100 0.91 1.00 1.00
##
## $Pl2_Republic7
## 500 1000 1500
## mfw_35 0.92 0.92 0.98
## mfw_70 0.88 0.99 0.99
## mfw_100 0.91 0.99 0.99
##
## $Pl3_Laws10
## 500 1000 1500
## mfw_35 0.12 0.04 0.00
## mfw_70 0.27 0.10 0.04
## mfw_100 0.48 0.30 0.28
##
## $Pl3_Laws11
## 500 1000 1500
## mfw_35 0.94 1.00 1
## mfw_70 0.93 0.98 1
## mfw_100 1.00 1.00 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp16$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
## 500 1000 1500
## Pl1 31 23 23
## Pl2 69 77 77
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_70
## 500 1000 1500
## Pl1 94 97 100
## Pl2 6 3 0
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_100
## 500 1000 1500
## Pl1 79 87 96
## Pl2 21 13 4
## Pl3 0 0 0
##
##
## $Pl1_Laches
## $Pl1_Laches$mfw_35
## 500 1000 1500
## Pl1 66 77 85
## Pl2 32 23 15
## Pl3 2 0 0
##
## $Pl1_Laches$mfw_70
## 500 1000 1500
## Pl1 76 73 85
## Pl2 24 27 15
## Pl3 0 0 0
##
## $Pl1_Laches$mfw_100
## 500 1000 1500
## Pl1 83 88 95
## Pl2 17 12 5
## Pl3 0 0 0
##
##
## $Pl2_Republic6
## $Pl2_Republic6$mfw_35
## 500 1000 1500
## Pl1 7 2 1
## Pl2 80 93 94
## Pl3 13 5 5
##
## $Pl2_Republic6$mfw_70
## 500 1000 1500
## Pl1 8 3 0
## Pl2 85 97 100
## Pl3 7 0 0
##
## $Pl2_Republic6$mfw_100
## 500 1000 1500
## Pl1 3 0 0
## Pl2 91 100 100
## Pl3 6 0 0
##
##
## $Pl2_Republic7
## $Pl2_Republic7$mfw_35
## 500 1000 1500
## Pl1 2 1 0
## Pl2 92 92 98
## Pl3 6 7 2
##
## $Pl2_Republic7$mfw_70
## 500 1000 1500
## Pl1 3 0 0
## Pl2 88 99 99
## Pl3 9 1 1
##
## $Pl2_Republic7$mfw_100
## 500 1000 1500
## Pl1 2 0 0
## Pl2 91 99 99
## Pl3 7 1 1
##
##
## $Pl3_Laws10
## $Pl3_Laws10$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 88 96 100
## Pl3 12 4 0
##
## $Pl3_Laws10$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 73 90 96
## Pl3 27 10 4
##
## $Pl3_Laws10$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 52 70 72
## Pl3 48 30 28
##
##
## $Pl3_Laws11
## $Pl3_Laws11$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 6 0 0
## Pl3 94 100 100
##
## $Pl3_Laws11$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 7 2 0
## Pl3 93 98 100
##
## $Pl3_Laws11$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
It’s now better, but we should use 1000-word samples and 70-100 mfw.
dir.create("corpus17")
file.copy("Laches.txt", "corpus17")
file.copy("Charmides.txt", "corpus17")
file.copy("Laws2.txt", "corpus17")
file.copy("Laws3.txt", "corpus17")
file.copy("Republic6.txt", "corpus17")
file.copy("Republic7.txt", "corpus17")
setwd("corpus17")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl3_Laws2.txt", "Pl3_Laws3.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp17 <- size.penalize(corpus.dir = "corpus17", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp17$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.41 0.46 0.38
## mfw_70 0.92 0.96 0.99
## mfw_100 0.84 0.88 0.89
##
## $Pl1_Laches
## 500 1000 1500
## mfw_35 0.83 0.84 0.94
## mfw_70 0.79 0.87 0.89
## mfw_100 0.83 0.94 0.99
##
## $Pl2_Republic6
## 500 1000 1500
## mfw_35 0.67 0.82 0.83
## mfw_70 0.83 0.92 0.99
## mfw_100 0.87 0.95 0.99
##
## $Pl2_Republic7
## 500 1000 1500
## mfw_35 0.84 0.96 0.98
## mfw_70 0.80 0.95 0.98
## mfw_100 0.87 0.98 0.98
##
## $Pl3_Laws2
## 500 1000 1500
## mfw_35 0.76 0.90 0.93
## mfw_70 0.82 0.96 0.96
## mfw_100 0.91 0.98 1.00
##
## $Pl3_Laws3
## 500 1000 1500
## mfw_35 0.91 0.98 0.99
## mfw_70 0.81 0.92 0.98
## mfw_100 0.87 0.98 1.00
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp17$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
## 500 1000 1500
## Pl1 41 46 38
## Pl2 59 54 62
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_70
## 500 1000 1500
## Pl1 92 96 99
## Pl2 8 4 1
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_100
## 500 1000 1500
## Pl1 84 88 89
## Pl2 16 12 11
## Pl3 0 0 0
##
##
## $Pl1_Laches
## $Pl1_Laches$mfw_35
## 500 1000 1500
## Pl1 83 84 94
## Pl2 13 11 6
## Pl3 4 5 0
##
## $Pl1_Laches$mfw_70
## 500 1000 1500
## Pl1 79 87 89
## Pl2 20 12 11
## Pl3 1 1 0
##
## $Pl1_Laches$mfw_100
## 500 1000 1500
## Pl1 83 94 99
## Pl2 14 2 1
## Pl3 3 4 0
##
##
## $Pl2_Republic6
## $Pl2_Republic6$mfw_35
## 500 1000 1500
## Pl1 10 0 4
## Pl2 67 82 83
## Pl3 23 18 13
##
## $Pl2_Republic6$mfw_70
## 500 1000 1500
## Pl1 4 0 0
## Pl2 83 92 99
## Pl3 13 8 1
##
## $Pl2_Republic6$mfw_100
## 500 1000 1500
## Pl1 2 0 0
## Pl2 87 95 99
## Pl3 11 5 1
##
##
## $Pl2_Republic7
## $Pl2_Republic7$mfw_35
## 500 1000 1500
## Pl1 2 0 0
## Pl2 84 96 98
## Pl3 14 4 2
##
## $Pl2_Republic7$mfw_70
## 500 1000 1500
## Pl1 1 1 0
## Pl2 80 95 98
## Pl3 19 4 2
##
## $Pl2_Republic7$mfw_100
## 500 1000 1500
## Pl1 2 0 0
## Pl2 87 98 98
## Pl3 11 2 2
##
##
## $Pl3_Laws2
## $Pl3_Laws2$mfw_35
## 500 1000 1500
## Pl1 1 0 0
## Pl2 23 10 7
## Pl3 76 90 93
##
## $Pl3_Laws2$mfw_70
## 500 1000 1500
## Pl1 1 0 0
## Pl2 17 4 4
## Pl3 82 96 96
##
## $Pl3_Laws2$mfw_100
## 500 1000 1500
## Pl1 1 0 0
## Pl2 8 2 0
## Pl3 91 98 100
##
##
## $Pl3_Laws3
## $Pl3_Laws3$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 9 2 1
## Pl3 91 98 99
##
## $Pl3_Laws3$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 19 8 2
## Pl3 81 92 98
##
## $Pl3_Laws3$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 13 2 0
## Pl3 87 98 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
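One way to check a choice of settings such as 1000-word samples with 70-100 mfw is to tabulate the worst-case accuracy over all texts for each combination of mfw and sample size. The sketch below does this for the corpus17 accuracy scores, which are typed in by hand from the output above. `acc_table` is a hypothetical helper, and the 0.85 cut-off is an arbitrary threshold chosen for illustration, not a value used elsewhere in this study.

```r
# Accuracy tables for corpus17, typed in from the output above
# (rows: number of MFW; columns: sample size in words).
acc_table <- function(values) {
  matrix(values, nrow = 3, byrow = TRUE,
         dimnames = list(c("mfw_35", "mfw_70", "mfw_100"),
                         c("500", "1000", "1500")))
}
acc17 <- list(
  Pl1_Charmides = acc_table(c(0.41, 0.46, 0.38, 0.92, 0.96, 0.99, 0.84, 0.88, 0.89)),
  Pl1_Laches    = acc_table(c(0.83, 0.84, 0.94, 0.79, 0.87, 0.89, 0.83, 0.94, 0.99)),
  Pl2_Republic6 = acc_table(c(0.67, 0.82, 0.83, 0.83, 0.92, 0.99, 0.87, 0.95, 0.99)),
  Pl2_Republic7 = acc_table(c(0.84, 0.96, 0.98, 0.80, 0.95, 0.98, 0.87, 0.98, 0.98)),
  Pl3_Laws2     = acc_table(c(0.76, 0.90, 0.93, 0.82, 0.96, 0.96, 0.91, 0.98, 1.00)),
  Pl3_Laws3     = acc_table(c(0.91, 0.98, 0.99, 0.81, 0.92, 0.98, 0.87, 0.98, 1.00))
)

# Worst-case accuracy over all texts for each MFW / sample-size setting.
worst_case <- Reduce(pmin, acc17)
print(worst_case)

# Settings at which every text reaches the illustrative 0.85 threshold.
threshold <- 0.85  # assumption: arbitrary cut-off for this example
ok <- worst_case >= threshold
print(ok)
```

With this threshold, only the settings with 70 or 100 mfw and at least 1000 words pass for every text; at 35 mfw the Charmides keeps the worst case below 0.5 at all three sample sizes.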
dir.create("corpus18")
file.copy("Laches.txt", "corpus18")
file.copy("Charmides.txt", "corpus18")
file.copy("Laws12.txt", "corpus18")
file.copy("Laws11.txt", "corpus18")
file.copy("Republic6.txt", "corpus18")
file.copy("Republic7.txt", "corpus18")
setwd("corpus18")
file.names <- list.files()
file.names
new.file.names <- c("Pl1_Charmides.txt", "Pl1_Laches.txt", "Pl3_Laws11.txt", "Pl3_Laws12.txt", "Pl2_Republic6.txt", "Pl2_Republic7.txt")
file.rename(from = file.names, to = new.file.names)
setwd("~/R_Workflow/Plato_Testing_Sample_Size")
sp18 <- size.penalize(corpus.dir = "corpus18", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp18$accuracy.scores
## $Pl1_Charmides
## 500 1000 1500
## mfw_35 0.36 0.37 0.32
## mfw_70 0.88 0.99 1.00
## mfw_100 0.71 0.84 0.89
##
## $Pl1_Laches
## 500 1000 1500
## mfw_35 0.63 0.71 0.74
## mfw_70 0.71 0.83 0.87
## mfw_100 0.82 0.90 0.99
##
## $Pl2_Republic6
## 500 1000 1500
## mfw_35 0.74 0.92 0.97
## mfw_70 0.91 1.00 1.00
## mfw_100 1.00 1.00 1.00
##
## $Pl2_Republic7
## 500 1000 1500
## mfw_35 0.87 0.99 1
## mfw_70 0.87 0.97 1
## mfw_100 0.90 1.00 1
##
## $Pl3_Laws11
## 500 1000 1500
## mfw_35 1 1 1
## mfw_70 1 1 1
## mfw_100 1 1 1
##
## $Pl3_Laws12
## 500 1000 1500
## mfw_35 0.96 0.99 1
## mfw_70 0.99 0.99 1
## mfw_100 1.00 1.00 1
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp18$confusion.matrices
## $Pl1_Charmides
## $Pl1_Charmides$mfw_35
## 500 1000 1500
## Pl1 36 37 32
## Pl2 64 63 68
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_70
## 500 1000 1500
## Pl1 88 99 100
## Pl2 12 1 0
## Pl3 0 0 0
##
## $Pl1_Charmides$mfw_100
## 500 1000 1500
## Pl1 71 84 89
## Pl2 29 16 11
## Pl3 0 0 0
##
##
## $Pl1_Laches
## $Pl1_Laches$mfw_35
## 500 1000 1500
## Pl1 63 71 74
## Pl2 37 29 26
## Pl3 0 0 0
##
## $Pl1_Laches$mfw_70
## 500 1000 1500
## Pl1 71 83 87
## Pl2 29 17 13
## Pl3 0 0 0
##
## $Pl1_Laches$mfw_100
## 500 1000 1500
## Pl1 82 90 99
## Pl2 18 10 1
## Pl3 0 0 0
##
##
## $Pl2_Republic6
## $Pl2_Republic6$mfw_35
## 500 1000 1500
## Pl1 19 8 3
## Pl2 74 92 97
## Pl3 7 0 0
##
## $Pl2_Republic6$mfw_70
## 500 1000 1500
## Pl1 5 0 0
## Pl2 91 100 100
## Pl3 4 0 0
##
## $Pl2_Republic6$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 100 100 100
## Pl3 0 0 0
##
##
## $Pl2_Republic7
## $Pl2_Republic7$mfw_35
## 500 1000 1500
## Pl1 4 0 0
## Pl2 87 99 100
## Pl3 9 1 0
##
## $Pl2_Republic7$mfw_70
## 500 1000 1500
## Pl1 3 0 0
## Pl2 87 97 100
## Pl3 10 3 0
##
## $Pl2_Republic7$mfw_100
## 500 1000 1500
## Pl1 3 0 0
## Pl2 90 100 100
## Pl3 7 0 0
##
##
## $Pl3_Laws11
## $Pl3_Laws11$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Laws11$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
## $Pl3_Laws11$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
##
## $Pl3_Laws12
## $Pl3_Laws12$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 4 1 0
## Pl3 96 99 100
##
## $Pl3_Laws12$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 1 1 0
## Pl3 99 99 100
##
## $Pl3_Laws12$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 0 0 0
## Pl3 100 100 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
The following combinations of Plato’s works seem most suitable for comparison with our test dialogue:
Plato 1 (Ly., Chrm.) vs. Plato 2 (R. 2 and 3) vs. Plato 3 (Lg. 1 and 2) OR Plato 3 (Tim., Criti.)
Plato 1 (Prt., Grg.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 8 and 9)