In our previous series of experiments, we chose the following combinations of three “Platos” for a machine-learning classifier to discriminate.
Set 1: Plato 1 (Ly., Chrm.) vs. Plato 2 (R. 2 and 3) vs. Plato 3 (Lg. 1 and 2)
Set 2: Plato 1 (Prt., Grg.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 8 and 9)
Only two dialogues were assigned to each Plato, and with 1000-word blocks and the 70 to 100 most frequent words (mfw) the classifier (Delta) was able to assign a sample from a text to its corresponding “author” correctly. Although all the texts were in fact written by one person (presumably with some chronological gap), the texts in each group “look different” statistically to the classifier, and we intend to use our Platos for further comparison with a test dialogue which we suspect to have been revised. Before we do so, however, it is advisable to take a closer look at our reference Platos; we are interested here in the specific Delta distances between them. In sum, we conclude that Set 1 needs to be redefined, because one of its groups currently combines texts with Delta > 1.
For these experiments, I used the Diorisis Ancient Greek Corpus [@vatri2018]. On the accuracy of lemmatization, see [@vatri2020]. The code I used for extracting the lemmata is accessible via links: Parsing Plato’s Republic (Separate Books), Parsing Plato’s Laws (Separate Books), Corpus_Platonicum: Lemmata Extraction. I start with the files produced by this code in my working directory.
library(stylo)
library(dendextend)
dir.create("corpus1")
file.copy(c("Protagoras.txt", "Gorgias.txt", "Republic2.txt", "Republic3.txt", "Republic8.txt", "Republic9.txt", "Laws1.txt", "Laws2.txt","Laws8.txt","Laws9.txt", "Lysis.txt","Charmides.txt"), "corpus1")
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
my_corpus1 <- load.corpus.and.parse(files = "all", corpus.dir = "corpus1", markup.type= "plain", corpus.lang = "Other", sampling = "no.sampling", preserve.case = FALSE, encoding = "UTF-8")
my_freq <- make.frequency.list(my_corpus1, value = TRUE)
dim(my_freq)
## [1] 5498
head(my_freq)
## data
## ὁ καί εἰμί δέ οὗτος ἐγώ
## 9.137414 5.673112 2.735435 2.676744 1.869359 1.780531
For our analysis, we want to choose the 100 mfw from the corpus. my_freq presents frequencies in descending order, so we simply need to subset the first 100 elements.
mfw100 <- my_freq[1:100]
names(mfw100)
## [1] "ὁ" "καί" "εἰμί" "δέ" "οὗτος" "ἐγώ"
## [7] "οὐ" "τε" "αὐτός" "ἄν" "μέν" "ἠέ"
## [13] "ὅς" "τις" "λέγω" "φημί" "γάρ" "ἐν"
## [19] "σύ" "ἀλλά" "γε" "ἄλλος" "ὅστις" "μή"
## [25] "δή" "ὡς" "τίς" "οὖν" "γίγνομαι" "εἰ"
## [31] "πᾶς" "περί" "ὦ" "ἔχω" "τοιοῦτος" "πρός"
## [37] "ἀγαθός" "λόγος" "ἑαυτοῦ" "οὕτως" "πολύς" "δοκέω"
## [43] "ποιέω" "ἐάν" "κακός" "πόλις" "εἰς" "οὐδείς"
## [49] "κατά" "εἶπον" "οἴομαι" "ἐκ" "καλός" "ἄνθρωπος"
## [55] "διά" "ἐπί" "οὐδέ" "ὑπό" "οὔτε" "νῦν"
## [61] "δέομαι" "μέγας" "βούλομαι" "οὐκοῦν" "ἀίω" "οἷος"
## [67] "ἕ" "ἀνήρ" "ὀρθός" "πάνυ" "ἀληθής" "φίλος"
## [73] "ἐκεῖνος" "πῶς" "πρότερος" "οἶδα" "αὖ" "νόμος"
## [79] "ὥσπερ" "ἄρα" "ψυχή" "σωκράτης" "πράσσω" "ἆρα"
## [85] "δίκαιος" "ἦ" "φαίνω" "παρά" "μετά" "ἀδικέω"
## [91] "ἡδονά" "ὅσος" "ἕκαστος" "ἔοικα" "σῶμα" "μήτε"
## [97] "ὅδε" "ἔτι" "μᾶλλον" "μόνος"
One lemma in this list will be excluded from the analysis: σωκράτης, a proper name absent from some later dialogues. It stands at position 82 in the list above.
mfw99 <- mfw100[-82]
names <- names(mfw99)
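Removing the lemma by position works here, but it silently breaks if the corpus (and hence the ranking) changes. A minimal sketch of dropping it by label instead, on a toy vector with invented frequencies; `mfw_toy`, `excluded` and `features` are illustrative names, not objects from the analysis above:

```r
# Toy stand-in for mfw100: a named vector of relative frequencies (values invented).
mfw_toy <- c("ψυχή" = 0.21, "σωκράτης" = 0.20, "πράσσω" = 0.19)

# Drop the proper name by label, so the result does not depend on its rank.
excluded <- c("σωκράτης")
mfw_kept <- mfw_toy[!(names(mfw_toy) %in% excluded)]

# Storing the labels under a name other than `names` avoids masking base::names().
features <- names(mfw_kept)
print(features)
```

The same pattern would extend to several proper names by adding them to `excluded`.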
To compute Delta, we first need to make frequency lists for each dialogue in our corpus (the texts to be compared in rows, the variables in columns). Dialogues are stored as character vectors within my_corpus1.
Chrm_freq <- make.frequency.list(my_corpus1$Charmides, value = TRUE)
Grg_freq <- make.frequency.list(my_corpus1$Gorgias, value = TRUE)
Ly_freq <- make.frequency.list(my_corpus1$Lysis, value = TRUE)
Prt_freq <- make.frequency.list(my_corpus1$Protagoras, value = TRUE)
R2_freq <- make.frequency.list(my_corpus1$Republic2, value = TRUE)
R3_freq <- make.frequency.list(my_corpus1$Republic3, value = TRUE)
R8_freq <- make.frequency.list(my_corpus1$Republic8, value = TRUE)
R9_freq <- make.frequency.list(my_corpus1$Republic9, value = TRUE)
L1_freq <- make.frequency.list(my_corpus1$Laws1, value = TRUE)
L2_freq <- make.frequency.list(my_corpus1$Laws2, value = TRUE)
L8_freq <- make.frequency.list(my_corpus1$Laws8, value = TRUE)
L9_freq <- make.frequency.list(my_corpus1$Laws9, value = TRUE)
As we don’t need all the frequencies for the analysis, we subset using the list of features we have just created.
Chrm <- Chrm_freq[names]
Grg <- Grg_freq[names]
Ly <- Ly_freq[names]
Prt <- Prt_freq[names]
R2 <- R2_freq[names]
R3 <- R3_freq[names]
R8 <- R8_freq[names]
R9 <- R9_freq[names]
L1 <- L1_freq[names]
L2 <- L2_freq[names]
L8 <- L8_freq[names]
L9 <- L9_freq[names]
dataset1 <- rbind(Grg, Prt, R8, R9, L8, L9)
dataset2 <- rbind(Ly, Chrm, R2, R3, L1, L2)
dataset1[,1:5]
## ὁ καί εἰμί δέ οὗτος
## Grg 9.105061 5.649846 3.174242 1.936439 2.251585
## Prt 8.491149 5.743186 3.276201 2.489463 2.197246
## R8 8.883466 6.516168 2.330689 3.014033 1.683954
## R9 9.346748 6.561680 2.857976 2.828813 1.531059
## L8 11.234434 5.522469 1.759610 3.126692 2.111532
## L9 11.570589 4.589048 1.773744 3.434052 1.691245
dataset2[,1:5]
## ὁ καί εἰμί δέ οὗτος
## Ly 8.553009 4.828080 3.868195 3.008596 1.418338
## Chrm 8.338347 5.101672 4.151125 2.791481 1.672482
## R2 8.217489 5.672646 2.399103 3.094170 1.199552
## R3 7.609965 6.374076 2.423122 3.123784 1.440249
## L1 9.070878 5.471906 1.628106 2.521728 1.995348
## L2 10.405028 6.103352 1.815642 2.402235 2.150838
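The one-line-per-dialogue approach above is easy to follow but repetitive. Below is a hedged sketch of how such a table could be assembled in one step from a named list of tokenized texts, using base R only. `feature_freqs` and `build_table` are hypothetical helpers, the toy texts are invented, and the percent scale is my assumption about what `make.frequency.list(..., value = TRUE)` returns. A lemma that does not occur in a text gets 0 here, whereas indexing a frequency list by a missing name would presumably yield `NA`.

```r
# Hypothetical helper: relative frequencies (in %) of the selected features
# in one tokenized text; features absent from the text get 0 rather than NA.
feature_freqs <- function(tokens, features) {
  counts <- table(factor(tokens, levels = features))
  100 * as.numeric(counts) / length(tokens)
}

# Hypothetical helper: one row per text, one column per feature.
build_table <- function(texts, features) {
  rows <- t(sapply(texts, feature_freqs, features = features))
  colnames(rows) <- features
  rows
}

# Invented toy "corpus": a named list of token vectors, as in a parsed stylo corpus.
toy_texts <- list(
  A = c("ὁ", "καί", "ὁ", "δέ"),
  B = c("καί", "καί", "ὁ", "εἰμί")
)
toy_table <- build_table(toy_texts, c("ὁ", "καί", "δέ"))
print(toy_table)
```

With the real data one would pass something like `my_corpus1` and the feature list in place of the toy objects; I have not verified that the result matches the tables above digit for digit.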
As we initially assumed, each text is closest to the other member of its own group (Plato 1, 2, or 3).
dist.delta(dataset1)
## Grg Prt R8 R9 L8
## Prt 0.8064134
## R8 1.3021532 1.1781170
## R9 1.2147812 1.1051432 0.7033716
## L8 1.4715845 1.3392882 1.1485845 1.2703215
## L9 1.5107271 1.4307145 1.1535585 1.2979888 0.8315440
dist.delta(dataset2)
## Ly Chrm R2 R3 L1
## Chrm 1.0158182
## R2 1.2185163 1.2087662
## R3 1.2521667 1.1980437 0.8092858
## L1 1.5086234 1.4211472 1.1068653 1.1158083
## L2 1.4596924 1.4520931 1.0707005 1.0143156 0.6966609
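For readers who want to see what these numbers amount to: classic (Burrows) Delta z-scores every feature across the texts in the table and then averages the absolute differences between two texts. The sketch below does this by hand in base R on an invented three-text table; as far as I understand, this is what `dist.delta()` does with its default settings, but I treat that as an assumption rather than a statement about the package internals.

```r
# Invented toy table: three texts (rows), three features (columns).
toy <- rbind(
  T1 = c(9.1, 5.6, 3.2),
  T2 = c(8.5, 5.7, 3.3),
  T3 = c(11.2, 5.5, 1.8)
)
colnames(toy) <- c("w1", "w2", "w3")

# Step 1: z-score each column across the texts: (x - mean) / sd.
z <- scale(toy)

# Step 2: Manhattan distance between rows, divided by the number of features,
# i.e. the mean absolute difference of z-scores.
delta_manual <- dist(z, method = "manhattan") / ncol(toy)
print(round(delta_manual, 3))
```

In this toy table T1 and T2 end up much closer to each other than either is to T3, which mirrors the within-group versus between-group pattern in the real tables.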
The same data can be presented as a matrix (values rounded to two decimal places):
round((as.matrix(dist.delta(dataset1))), 2)
## Grg Prt R8 R9 L8 L9
## Grg 0.00 0.81 1.30 1.21 1.47 1.51
## Prt 0.81 0.00 1.18 1.11 1.34 1.43
## R8 1.30 1.18 0.00 0.70 1.15 1.15
## R9 1.21 1.11 0.70 0.00 1.27 1.30
## L8 1.47 1.34 1.15 1.27 0.00 0.83
## L9 1.51 1.43 1.15 1.30 0.83 0.00
round((as.matrix(dist.delta(dataset2))), 2)
## Ly Chrm R2 R3 L1 L2
## Ly 0.00 1.02 1.22 1.25 1.51 1.46
## Chrm 1.02 0.00 1.21 1.20 1.42 1.45
## R2 1.22 1.21 0.00 0.81 1.11 1.07
## R3 1.25 1.20 0.81 0.00 1.12 1.01
## L1 1.51 1.42 1.11 1.12 0.00 0.70
## L2 1.46 1.45 1.07 1.01 0.70 0.00
hc1 <- hclust(dist.delta(dataset1))
hcd1 <- as.dendrogram(hc1)
plot(hcd1, type = "rectangle", xlab = "Distance", horiz = TRUE)
hc2 <- hclust(dist.delta(dataset2))
hcd2 <- as.dendrogram(hc2)
plot(hcd2, type = "rectangle", xlab = "Distance", horiz = TRUE)
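The dendrograms themselves are not reproduced here, so the following sketch shows how the grouping can be checked numerically. It feeds the rounded distance matrix for dataset1 (copied from the output above) into `hclust()`, which uses complete linkage by default, and cuts the tree into three clusters. Working from rounded values is a simplification on my part.

```r
labs <- c("Grg", "Prt", "R8", "R9", "L8", "L9")

# Rounded Delta distances for dataset1, as printed above.
m <- matrix(c(
  0.00, 0.81, 1.30, 1.21, 1.47, 1.51,
  0.81, 0.00, 1.18, 1.11, 1.34, 1.43,
  1.30, 1.18, 0.00, 0.70, 1.15, 1.15,
  1.21, 1.11, 0.70, 0.00, 1.27, 1.30,
  1.47, 1.34, 1.15, 1.27, 0.00, 0.83,
  1.51, 1.43, 1.15, 1.30, 0.83, 0.00
), nrow = 6, byrow = TRUE, dimnames = list(labs, labs))

hc <- hclust(as.dist(m))     # default method is "complete" linkage
groups <- cutree(hc, k = 3)  # cluster membership for a three-cluster cut
print(groups)
```

With three clusters, each pair of dialogues or books lands in its own cluster, which is the structure Set 2 relies on.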
While for the groups Grg.-Prt., R. 2-3, R. 8-9, L. 1-2, and L. 8-9 the Delta is below 1, in the Ly.-Chrm. group it is slightly above 1, which is comparable to the distance between L. 2 and R. 3, texts attributed to different Platos. It is worth recalling that in our previous experiments there was some confusion between Ly. and R.: with 100 mfw and 500-word samples, 25% of the samples were classified as Plato 2; with 1000-word samples, the misclassification rate was about 15%. The computations above suggest that we need to redefine the profile of Plato 1 in Set 1.
dir.create("corpus2")
file.copy(c("Apology.txt", "Cleitophon.txt", "Cratylus.txt", "Crito.txt", "Euthydemus.txt", "Euthyphro.txt", "HippiasMajor.txt", "HippiasMinor.txt","Laches.txt"), "corpus2")
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
my_corpus2 <- load.corpus.and.parse(files = "all", corpus.dir = "corpus2", markup.type= "plain", corpus.lang = "Other", sampling = "no.sampling", preserve.case = FALSE, encoding = "UTF-8")
To keep things simple, I shall reuse the list of features we made earlier.
Apol_freq <- make.frequency.list(my_corpus2$Apology, value = TRUE)
Cleit_freq <- make.frequency.list(my_corpus2$Cleitophon, value = TRUE)
Crat_freq <- make.frequency.list(my_corpus2$Cratylus, value = TRUE)
Crit_freq <- make.frequency.list(my_corpus2$Crito, value = TRUE)
Euthd_freq <- make.frequency.list(my_corpus2$Euthydemus, value = TRUE)
Euthph_freq <- make.frequency.list(my_corpus2$Euthyphro, value = TRUE)
HiMa_freq <- make.frequency.list(my_corpus2$HippiasMajor, value = TRUE)
HiMi_freq <- make.frequency.list(my_corpus2$HippiasMinor, value = TRUE)
Lch_freq <- make.frequency.list(my_corpus2$Laches, value = TRUE)
Apol <- Apol_freq[names]
Cleit <- Cleit_freq[names]
Crat <- Crat_freq[names]
Crit <- Crit_freq[names]
Euthd <- Euthd_freq[names]
Euthph <- Euthph_freq[names]
HiMa <- HiMa_freq[names]
HiMi <- HiMi_freq[names]
Lch <- Lch_freq[names]
dataset3 <- rbind(Ly, Chrm, Grg, Prt, Apol, Cleit, Crat, Crit, Euthd, Euthph, HiMa, HiMi, Lch)
round(as.matrix(dist.delta(dataset3)), 2)
## Ly Chrm Grg Prt Apol Cleit Crat Crit Euthd Euthph HiMa HiMi Lch
## Ly 0.00 0.98 1.11 1.13 1.28 1.63 1.16 1.37 0.91 1.18 1.03 1.17 1.19
## Chrm 0.98 0.00 1.00 1.01 1.15 1.51 1.11 1.27 0.81 1.22 0.94 1.28 0.99
## Grg 1.11 1.00 0.00 0.80 0.92 1.15 0.91 1.01 0.77 1.01 0.88 0.92 0.90
## Prt 1.13 1.01 0.80 0.00 0.87 1.19 0.98 1.08 0.77 1.16 0.99 1.12 0.93
## Apol 1.28 1.15 0.92 0.87 0.00 1.36 1.25 1.07 1.05 1.26 1.12 1.34 1.02
## Cleit 1.63 1.51 1.15 1.19 1.36 0.00 1.28 1.53 1.32 1.35 1.45 1.55 1.44
## Crat 1.16 1.11 0.91 0.98 1.25 1.28 0.00 1.34 0.96 1.10 0.98 1.19 1.08
## Crit 1.37 1.27 1.01 1.08 1.07 1.53 1.34 0.00 1.10 1.40 1.13 1.27 1.21
## Euthd 0.91 0.81 0.77 0.77 1.05 1.32 0.96 1.10 0.00 1.14 0.84 1.13 0.88
## Euthph 1.18 1.22 1.01 1.16 1.26 1.35 1.10 1.40 1.14 0.00 0.86 1.21 1.07
## HiMa 1.03 0.94 0.88 0.99 1.12 1.45 0.98 1.13 0.84 0.86 0.00 1.10 1.01
## HiMi 1.17 1.28 0.92 1.12 1.34 1.55 1.19 1.27 1.13 1.21 1.10 0.00 1.14
## Lch 1.19 0.99 0.90 0.93 1.02 1.44 1.08 1.21 0.88 1.07 1.01 1.14 0.00
hc3 <- hclust(dist.delta(dataset3))
hcd3 <- as.dendrogram(hc3)
plot(hcd3, type = "rectangle", xlab = "Distance", horiz = TRUE)
abline(v=1,col="red",lty=2)
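Note that the Ly.-Chrm. distance in this table (0.98) differs from the value obtained for the same pair in dataset2 (1.02). A plausible explanation, assuming Delta z-scores each feature across whatever texts are in the table, is that the distance between two texts depends on the company they keep. The toy sketch below (invented numbers, hypothetical `delta` helper written in base R) shows the effect: adding a fourth text changes the distance between the first two.

```r
# Hypothetical helper: mean absolute difference of column-wise z-scores.
delta <- function(x) dist(scale(x), method = "manhattan") / ncol(x)

# Invented toy table with three texts.
small <- rbind(
  A = c(8.6, 4.8, 3.9),
  B = c(8.3, 5.1, 4.2),
  C = c(9.1, 5.5, 1.6)
)

# The same three texts plus one more.
large <- rbind(small, D = c(9.1, 5.6, 3.2))

d_small <- as.matrix(delta(small))["A", "B"]
d_large <- as.matrix(delta(large))["A", "B"]
print(c(small = d_small, large = d_large))
```

For this reason, a fixed threshold such as Delta = 1 is best read relative to a given table rather than as an absolute constant.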
While this time the Delta for Chrm. and Ly. is slightly below 1, the dendrogram suggests that HiMa and Euthph are stylistically closer to each other. Let us see whether we can plug this pair into Set 1 instead.
dir.create("corpus3")
file.copy(c("HippiasMajor.txt", "Euthyphro.txt","Republic2.txt","Republic3.txt", "Laws1.txt","Laws2.txt"), "corpus3")
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
setwd("corpus3")
file.names <- list.files()
new.file.names <- c("Pl1_Euthyphro.txt", "Pl1_HippiasMajor.txt", "Pl3_Laws1.txt", "Pl3_Laws2.txt", "Pl2_Republic2.txt", "Pl2_Republic3.txt")
file.rename(from = file.names, to = new.file.names)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
setwd("~/R_Workflow/2_Three_Platos_Distances")
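The renaming above relies on `list.files()` returning the files in the same (alphabetical) order as the hand-written vector of new names, and on switching the working directory back and forth. A hedged alternative is sketched below: it maps each file to its class prefix by name and uses full paths, so neither the order nor `setwd()` matters. The directory and the `prefix` lookup are illustrative; the demo runs in a temporary folder with empty files.

```r
# Demo directory with empty placeholder files (not the real corpus).
corpus_dir <- file.path(tempdir(), "corpus_demo")
unlink(corpus_dir, recursive = TRUE)
dir.create(corpus_dir)
invisible(file.create(file.path(corpus_dir,
                                c("Euthyphro.txt", "Laws1.txt", "Republic2.txt"))))

# Hypothetical lookup table: file name -> class prefix.
prefix <- c("Euthyphro.txt" = "Pl1", "Laws1.txt" = "Pl3", "Republic2.txt" = "Pl2")

old <- list.files(corpus_dir)
stopifnot(all(old %in% names(prefix)))  # fail early on an unexpected file

# Build the new names from the lookup, then rename using full paths.
new <- paste0(prefix[old], "_", old)
ok <- file.rename(file.path(corpus_dir, old), file.path(corpus_dir, new))
print(list.files(corpus_dir))
```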
sp3 <- size.penalize(corpus.dir = "corpus3", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp3$accuracy.scores
## $Pl1_Euthyphro
## 500 1000 1500
## mfw_35 0.95 0.98 1
## mfw_70 0.98 0.99 1
## mfw_100 0.96 1.00 1
##
## $Pl1_HippiasMajor
## 500 1000 1500
## mfw_35 0.97 0.99 0.99
## mfw_70 0.98 1.00 1.00
## mfw_100 0.88 0.99 0.98
##
## $Pl2_Republic2
## 500 1000 1500
## mfw_35 0.85 0.96 0.99
## mfw_70 0.93 1.00 1.00
## mfw_100 0.94 1.00 1.00
##
## $Pl2_Republic3
## 500 1000 1500
## mfw_35 0.75 0.87 0.90
## mfw_70 0.69 0.80 0.86
## mfw_100 0.83 0.90 0.98
##
## $Pl3_Laws1
## 500 1000 1500
## mfw_35 0.91 0.99 1.00
## mfw_70 0.92 0.98 0.97
## mfw_100 0.97 0.99 1.00
##
## $Pl3_Laws2
## 500 1000 1500
## mfw_35 0.87 0.98 0.99
## mfw_70 0.92 0.97 1.00
## mfw_100 0.91 0.99 1.00
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
sp3$confusion.matrices
## $Pl1_Euthyphro
## $Pl1_Euthyphro$mfw_35
## 500 1000 1500
## Pl1 95 98 100
## Pl2 1 0 0
## Pl3 4 2 0
##
## $Pl1_Euthyphro$mfw_70
## 500 1000 1500
## Pl1 98 99 100
## Pl2 2 1 0
## Pl3 0 0 0
##
## $Pl1_Euthyphro$mfw_100
## 500 1000 1500
## Pl1 96 100 100
## Pl2 4 0 0
## Pl3 0 0 0
##
##
## $Pl1_HippiasMajor
## $Pl1_HippiasMajor$mfw_35
## 500 1000 1500
## Pl1 97 99 99
## Pl2 3 1 1
## Pl3 0 0 0
##
## $Pl1_HippiasMajor$mfw_70
## 500 1000 1500
## Pl1 98 100 100
## Pl2 1 0 0
## Pl3 1 0 0
##
## $Pl1_HippiasMajor$mfw_100
## 500 1000 1500
## Pl1 88 99 98
## Pl2 12 1 2
## Pl3 0 0 0
##
##
## $Pl2_Republic2
## $Pl2_Republic2$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 85 96 99
## Pl3 15 4 1
##
## $Pl2_Republic2$mfw_70
## 500 1000 1500
## Pl1 1 0 0
## Pl2 93 100 100
## Pl3 6 0 0
##
## $Pl2_Republic2$mfw_100
## 500 1000 1500
## Pl1 2 0 0
## Pl2 94 100 100
## Pl3 4 0 0
##
##
## $Pl2_Republic3
## $Pl2_Republic3$mfw_35
## 500 1000 1500
## Pl1 1 0 0
## Pl2 75 87 90
## Pl3 24 13 10
##
## $Pl2_Republic3$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 69 80 86
## Pl3 31 20 14
##
## $Pl2_Republic3$mfw_100
## 500 1000 1500
## Pl1 1 0 0
## Pl2 83 90 98
## Pl3 16 10 2
##
##
## $Pl3_Laws1
## $Pl3_Laws1$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 9 1 0
## Pl3 91 99 100
##
## $Pl3_Laws1$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 8 2 3
## Pl3 92 98 97
##
## $Pl3_Laws1$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 3 1 0
## Pl3 97 99 100
##
##
## $Pl3_Laws2
## $Pl3_Laws2$mfw_35
## 500 1000 1500
## Pl1 0 0 0
## Pl2 13 2 1
## Pl3 87 98 99
##
## $Pl3_Laws2$mfw_70
## 500 1000 1500
## Pl1 0 0 0
## Pl2 8 3 0
## Pl3 92 97 100
##
## $Pl3_Laws2$mfw_100
## 500 1000 1500
## Pl1 0 0 0
## Pl2 9 1 0
## Pl3 91 99 100
##
##
## attr(,"description")
## [1] "all classification scores (raw tables)"
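The accuracy scores and the confusion tables carry the same information: each column of a confusion table lists how many of the samples of a given size were assigned to each class. As a small check, the sketch below recomputes the accuracy for Republic 3 at 70 mfw from the counts printed above (only this one table is copied; the object names are mine).

```r
# Counts for Pl2_Republic3 at 70 mfw, copied from the output above.
cm <- rbind(
  Pl1 = c(0, 0, 0),
  Pl2 = c(69, 80, 86),
  Pl3 = c(31, 20, 14)
)
colnames(cm) <- c("500", "1000", "1500")

# Republic 3 belongs to Pl2, so accuracy is the share of samples assigned to Pl2.
accuracy <- cm["Pl2", ] / colSums(cm)
print(accuracy)
```

All misclassified samples of Republic 3 go to Plato 3, none to Plato 1, which is why the next step looks at the distances between the books of the Republic and the Laws.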
Now HiMa-Euthph is a clearly distinguished group, but some confusion remains between the first books of the Laws and the Republic. Let us build another dendrogram to see which books of R. and L. are stylistically most remote from each other.
We already have frequencies for R. 2-3, 8-9, and L. 1-2, 8-9. Let us make frequency lists for the remaining books.
dir.create("corpus4")
file.copy(c("Republic1.txt", "Republic4.txt","Republic5.txt", "Republic6.txt","Republic7.txt", "Republic10.txt", "Laws3.txt","Laws4.txt", "Laws5.txt", "Laws6.txt", "Laws7.txt", "Laws10.txt", "Laws11.txt", "Laws12.txt"), "corpus4")
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
my_corpus4 <- load.corpus.and.parse(files = "all", corpus.dir = "corpus4", markup.type= "plain", corpus.lang = "Other", sampling = "no.sampling", preserve.case = FALSE, encoding = "UTF-8")
R1_freq <- make.frequency.list(my_corpus4$Republic1, value = TRUE)
R4_freq <- make.frequency.list(my_corpus4$Republic4, value = TRUE)
R5_freq <- make.frequency.list(my_corpus4$Republic5, value = TRUE)
R6_freq <- make.frequency.list(my_corpus4$Republic6, value = TRUE)
R7_freq <- make.frequency.list(my_corpus4$Republic7, value = TRUE)
R10_freq <- make.frequency.list(my_corpus4$Republic10, value = TRUE)
L3_freq <- make.frequency.list(my_corpus4$Laws3, value = TRUE)
L4_freq <- make.frequency.list(my_corpus4$Laws4, value = TRUE)
L5_freq <- make.frequency.list(my_corpus4$Laws5, value = TRUE)
L6_freq <- make.frequency.list(my_corpus4$Laws6, value = TRUE)
L7_freq <- make.frequency.list(my_corpus4$Laws7, value = TRUE)
L10_freq <- make.frequency.list(my_corpus4$Laws10, value = TRUE)
L11_freq <- make.frequency.list(my_corpus4$Laws11, value = TRUE)
L12_freq <- make.frequency.list(my_corpus4$Laws12, value = TRUE)
To subset frequencies for mfw:
R1 <- R1_freq[names]
R4 <- R4_freq[names]
R5 <- R5_freq[names]
R6 <- R6_freq[names]
R7 <- R7_freq[names]
R10 <- R10_freq[names]
L3 <- L3_freq[names]
L4 <- L4_freq[names]
L5 <- L5_freq[names]
L6 <- L6_freq[names]
L7 <- L7_freq[names]
L10 <- L10_freq[names]
L11 <- L11_freq[names]
L12 <- L12_freq[names]
dataset4 <- rbind(R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, L1, L2, L3, L4, L5, L6, L7, L8, L9, L10, L11, L12)
hc4 <- hclust(dist.delta(dataset4))
hcd4 <- as.dendrogram(hc4)
hcd4 %>% set("labels_col", value = c("navy", "magenta"), k=4) %>%
plot(horiz = TRUE)
abline(v=1,col="purple",lty=2)
R. 2-3 and L. 1-2 already fall into different clusters. Still, we can pick the books of the Laws that lie farthest from R. 2-3 (L. 11 is the most distant from R. 2, L. 6 from R. 3) and use them to redefine Plato 3 in Set 1.
m4 <- as.matrix(dist.delta(dataset4))
m4part <- m4[,2:3]
m4part
## R2 R3
## R1 1.0079545 1.0182222
## R2 0.0000000 0.7931216
## R3 0.7931216 0.0000000
## R4 0.9444780 0.7595450
## R5 0.8203685 0.6679126
## R6 0.8354431 0.8099925
## R7 1.0144477 0.8327298
## R8 1.0336879 0.8688327
## R9 1.0643546 0.8086371
## R10 0.8554122 0.8093287
## L1 1.2031633 1.1370590
## L2 1.1273998 1.0313713
## L3 1.1429056 1.1564221
## L4 1.1631368 1.1576791
## L5 1.1994261 1.2028352
## L6 1.3995590 1.3405549
## L7 1.1867852 1.0370860
## L8 1.4223065 1.2616897
## L9 1.4156080 1.2222859
## L10 1.1526823 1.0032480
## L11 1.5041857 1.3228621
## L12 1.2993280 1.2222116
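Rather than reading the table by eye, the books of the Laws can be ranked by their average distance from R. 2 and R. 3. The sketch below does so with the values copied from the output above; averaging the two columns is my own choice of summary, not necessarily the criterion used in the original analysis.

```r
# Distances of the Laws books from R2 and R3, copied from the table above.
laws <- rbind(
  L1  = c(1.2031633, 1.1370590),
  L2  = c(1.1273998, 1.0313713),
  L3  = c(1.1429056, 1.1564221),
  L4  = c(1.1631368, 1.1576791),
  L5  = c(1.1994261, 1.2028352),
  L6  = c(1.3995590, 1.3405549),
  L7  = c(1.1867852, 1.0370860),
  L8  = c(1.4223065, 1.2616897),
  L9  = c(1.4156080, 1.2222859),
  L10 = c(1.1526823, 1.0032480),
  L11 = c(1.5041857, 1.3228621),
  L12 = c(1.2993280, 1.2222116)
)
colnames(laws) <- c("R2", "R3")

# Rank the books by mean distance from R2 and R3, most distant first.
ranking <- sort(rowMeans(laws), decreasing = TRUE)
print(round(head(ranking, 4), 3))
```

On this measure L. 11 and L. 6 come first, followed by L. 8 and L. 9, which are already used for Plato 3 in Set 2.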
dir.create("corpus5")
file.copy(c("HippiasMajor.txt","Euthyphro.txt","Laws6.txt","Laws11.txt", "Republic2.txt","Republic3.txt"), "corpus5")
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
setwd("corpus5")
file.names <- list.files()
file.names
## [1] "Euthyphro.txt" "HippiasMajor.txt" "Laws11.txt" "Laws6.txt"
## [5] "Republic2.txt" "Republic3.txt"
new.file.names <- c("Pl1_Euthyphro.txt", "Pl1_HippiasMajor.txt", "Pl3_Laws11.txt", "Pl3_Laws6.txt", "Pl2_Republic2.txt", "Pl2_Republic3.txt")
file.rename(from = file.names, to = new.file.names)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
setwd("~/R_Workflow/2_Three_Platos_Distances")
sp5 <- size.penalize(corpus.dir = "corpus5", mfw = c(35, 70, 100), sample.size.coverage = c(500, 1000, 1500), classification.method = "delta")
sp5$accuracy.scores
## $Pl1_Euthyphro
## 500 1000 1500
## mfw_35 0.97 0.99 1
## mfw_70 0.99 1.00 1
## mfw_100 0.95 0.99 1
##
## $Pl1_HippiasMajor
## 500 1000 1500
## mfw_35 0.95 0.98 0.99
## mfw_70 1.00 0.99 1.00
## mfw_100 0.84 0.88 0.98
##
## $Pl2_Republic2
## 500 1000 1500
## mfw_35 0.98 1 1
## mfw_70 1.00 1 1
## mfw_100 1.00 1 1
##
## $Pl2_Republic3
## 500 1000 1500
## mfw_35 0.96 1.00 1
## mfw_70 0.95 0.99 1
## mfw_100 0.96 1.00 1
##
## $Pl3_Laws11
## 500 1000 1500
## mfw_35 1 1 1
## mfw_70 1 1 1
## mfw_100 1 1 1
##
## $Pl3_Laws6
## 500 1000 1500
## mfw_35 0.94 0.98 0.99
## mfw_70 1.00 1.00 1.00
## mfw_100 1.00 1.00 1.00
##
## attr(,"description")
## [1] "accuracy scores for the tested texts"
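To make the improvement concrete, the sketch below compares the accuracy scores for Republic 3 before (corpus3, with L. 1-2 as Plato 3) and after (corpus5, with L. 6 and 11), using the values printed above. The gain is positive in every cell. The picture is less uniform for Hippias Major, whose score at 100 mfw and 1000-word samples drops from 0.99 to 0.88, so the redefinition should probably be judged per configuration.

```r
sizes <- c("500", "1000", "1500")
mfw   <- c("mfw_35", "mfw_70", "mfw_100")

# Accuracy for Pl2_Republic3 with Plato 3 = Laws 1-2 (corpus3).
before <- matrix(c(0.75, 0.87, 0.90,
                   0.69, 0.80, 0.86,
                   0.83, 0.90, 0.98),
                 nrow = 3, byrow = TRUE, dimnames = list(mfw, sizes))

# Accuracy for Pl2_Republic3 with Plato 3 = Laws 6 and 11 (corpus5).
after <- matrix(c(0.96, 1.00, 1.00,
                  0.95, 0.99, 1.00,
                  0.96, 1.00, 1.00),
                nrow = 3, byrow = TRUE, dimnames = list(mfw, sizes))

gain <- after - before
print(gain)
```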
As a result of these computations, we can redefine Set 1 to achieve higher accuracy scores, while Set 2 remains unchanged:
Set 1: Plato 1 (HiMa, Euthph) vs. Plato 2 (R. 2 and 3) vs. Plato 3 (Lg. 6 and 11)
Set 2: Plato 1 (Prt., Grg.) vs. Plato 2 (R. 8 and 9) vs. Plato 3 (Lg. 8 and 9)