
Introduction

Iconicity refers to the resemblance relation that holds between the form and the meaning of a sign. In language, it can be manifested both in speech and in writing. In spoken language, in particular, interlocutors tend to indicate spatial or temporal extent by varying prosodic features of their speech, such as the fundamental frequency (F0), the amplitude or the duration, a phenomenon commonly termed iconic prosodic modulation (Fuchs et al. 2019: 1). An analogous case occurs in written language, when language users reduplicate letters in text messages as a means of conveying additional meaning.

The following sections present the process followed to analyse the extracted data as well as the results of the analysis. The entire analysis was performed in R (version 4.1.3). The packages dplyr, tidyr and stringr were used for the data manipulation, and the graphs displayed in the final two sections of this appendix were created with the package ggplot2.

Data wrangling

First, the data set and the packages required for the data manipulation and plotting are loaded in R:

library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)

du_data <- read.csv("Data set.csv")
head(du_data)
##   Frequency N.gram Adjective Reduplication Reduplication_size Un.inflected
## 1  11659013   lang      lang             0                  0            U
## 2   3310352  lange      lang             0                  0            I
## 3    563227   Lang      lang             0                  0            U
## 4    338996  Lange      lang             0                  0            I
## 5     94753   LANG      lang             0                  0            U
## 6     58386  langg      lang             1                  1            U
##   Vowel__Consonant Capitalisation
## 1             <NA>              0
## 2             <NA>              0
## 3             <NA>              0
## 4             <NA>              0
## 5             <NA>              1
## 6                C              0

A smaller dataframe displaying the total counts of non-reduplicated and reduplicated adjectival tokens is created:

re_du_data <- du_data %>%
    group_by(Reduplication, Adjective) %>%
    summarise(Total = sum(Frequency))
re_du_data
## # A tibble: 28 × 3
## # Groups:   Reduplication [2]
##    Reduplication Adjective    Total
##            <int> <chr>        <int>
##  1             0 dik        6618096
##  2             0 dun         417405
##  3             0 hard      10020081
##  4             0 hoog       2311384
##  5             0 kort       4007828
##  6             0 laag       1433535
##  7             0 lang      15985090
##  8             0 langzaam   1154772
##  9             0 licht      2191823
## 10             0 luid        104626
## # … with 18 more rows

The resultant tibble is in long format: the two values of the column Reduplication, namely 0 and 1, are stored as row entries, so that each adjective occupies two rows. The tibble can be rearranged so that these two values form two distinct columns, leaving one row per adjective. To this end, the pivot_wider() function is employed. For the sake of convenience, the columns 0 and 1 are then renamed NoRedupl and Redupl respectively:

# Reshaping the tibble into a wider format:

percent <- re_du_data %>%
    pivot_wider(names_from = Reduplication, values_from = Total)

# Renaming columns '0' and '1' as 'NoRedupl' and 'Redupl' respectively:

percent <- percent %>%
    rename(NoRedupl = "0", Redupl = "1")
percent
## # A tibble: 14 × 3
##    Adjective NoRedupl Redupl
##    <chr>        <int>  <int>
##  1 dik        6618096  60043
##  2 dun         417405  23387
##  3 hard      10020081 168236
##  4 hoog       2311384  30782
##  5 kort       4007828  15570
##  6 laag       1433535 114362
##  7 lang      15985090 290977
##  8 langzaam   1154772  26161
##  9 licht      2191823   8237
## 10 luid        104626   4365
## 11 snel      17822917 216802
## 12 stil       3232898 230789
## 13 traag       448440  23973
## 14 zwaar      3586643 158512
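
The reshaping and renaming steps above could also be collapsed into a single pipeline by labelling the two reduplication values before pivoting. The following is only a minimal sketch on a two-adjective excerpt of the totals shown earlier; the names toy_long, toy_wide and Form are illustrative and not part of the original analysis:

```r
library(dplyr)
library(tidyr)

# Two-adjective excerpt of the long-format totals (values copied from the output above)
toy_long <- tibble(
    Reduplication = c(0, 0, 1, 1),
    Adjective     = c("dik", "dun", "dik", "dun"),
    Total         = c(6618096, 417405, 60043, 23387)
)

# Label the two reduplication values first, so that the widened columns already
# carry their final names and no separate rename() step is needed
toy_wide <- toy_long %>%
    mutate(Form = if_else(Reduplication == 0, "NoRedupl", "Redupl")) %>%
    select(-Reduplication) %>%
    pivot_wider(names_from = Form, values_from = Total)
toy_wide
```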

As a next step, the relative frequency of the reduplicated forms is calculated for each adjective by dividing the number of its reduplicated forms by the sum of its non-reduplicated and reduplicated forms. The quotient is assigned to the variable rel_fr and then appended to the tibble percent as a separate column under the name Rel_frequency:

rel_fr <- percent$Redupl/(percent$NoRedupl + percent$Redupl)
percent$Rel_frequency <- rel_fr
percent
## # A tibble: 14 × 4
##    Adjective NoRedupl Redupl Rel_frequency
##    <chr>        <int>  <int>         <dbl>
##  1 dik        6618096  60043       0.00899
##  2 dun         417405  23387       0.0531 
##  3 hard      10020081 168236       0.0165 
##  4 hoog       2311384  30782       0.0131 
##  5 kort       4007828  15570       0.00387
##  6 laag       1433535 114362       0.0739 
##  7 lang      15985090 290977       0.0179 
##  8 langzaam   1154772  26161       0.0222 
##  9 licht      2191823   8237       0.00374
## 10 luid        104626   4365       0.0400 
## 11 snel      17822917 216802       0.0120 
## 12 stil       3232898 230789       0.0666 
## 13 traag       448440  23973       0.0507 
## 14 zwaar      3586643 158512       0.0423
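
As a quick check of the calculation, the first row can be reproduced by hand: for dik, 60043 / (6618096 + 60043) is approximately 0.00899. The same column could alternatively be added inside a dplyr pipeline with mutate(). The sketch below assumes a hypothetical two-row excerpt named toy_counts rather than the full tibble:

```r
library(dplyr)

# Hypothetical two-row excerpt of 'percent' (counts copied from the output above)
toy_counts <- tibble(
    Adjective = c("dik", "dun"),
    NoRedupl  = c(6618096, 417405),
    Redupl    = c(60043, 23387)
)

# Relative frequency = reduplicated / (non-reduplicated + reduplicated)
toy_counts <- toy_counts %>%
    mutate(Rel_frequency = Redupl / (NoRedupl + Redupl))
toy_counts
```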

For ease of computation, percent is transformed into a data matrix with one row per adjective and two columns, one holding the counts of the non-reduplicated and the other those of the reduplicated adjectival forms:

adjectives <- data.matrix(percent, rownames.force = NA)
rownames(adjectives) <- c("dik", "dun", "hard", "hoog", "kort", "laag", "lang", "langzaam",
    "licht", "luid", "snel", "stil", "traag", "zwaar")
Adjectives <- adjectives[, -c(1, 4)]
Adjectives
##          NoRedupl Redupl
## dik       6618096  60043
## dun        417405  23387
## hard     10020081 168236
## hoog      2311384  30782
## kort      4007828  15570
## laag      1433535 114362
## lang     15985090 290977
## langzaam  1154772  26161
## licht     2191823   8237
## luid       104626   4365
## snel     17822917 216802
## stil      3232898 230789
## traag      448440  23973
## zwaar     3586643 158512
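
Typing the row names by hand works, but the labels depend on the row order staying unchanged. A possible alternative, sketched here on a hypothetical three-row excerpt named toy_percent, takes the row names directly from the Adjective column and selects the count columns by name:

```r
# Hypothetical excerpt of 'percent' as a plain data frame
toy_percent <- data.frame(
    Adjective     = c("dik", "dun", "hard"),
    NoRedupl      = c(6618096, 417405, 10020081),
    Redupl        = c(60043, 23387, 168236),
    Rel_frequency = c(0.00899, 0.0531, 0.0165)
)

# Keep only the two count columns and derive the row names from 'Adjective',
# so the labels cannot drift out of step with the data
toy_matrix <- as.matrix(toy_percent[, c("NoRedupl", "Redupl")])
rownames(toy_matrix) <- toy_percent$Adjective
toy_matrix
```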

Data analysis

In order to identify whether the presence of reduplication depends on the dimension expressed by each adjective, i.e., large or small, a chi-squared test of independence is performed along with a Fisher’s exact test for each antonymic adjectival pair. Where an adjective has multiple antonyms, an additional comparison is performed by grouping those antonyms and then comparing the group to the relevant adjective. Both types of comparison are described in the subsequent sections.
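
To make the two tests more concrete, the Pearson statistic and the sample odds ratio can be computed by hand for a single pair. The sketch below does so for dik and dun, with the counts copied from the matrix Adjectives; the object names dik_dun, expected, x_squared and odds_ratio are illustrative:

```r
# Counts for dik and dun, copied from the matrix 'Adjectives' above
dik_dun <- matrix(c(6618096, 60043,
                     417405, 23387),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("dik", "dun"), c("NoRedupl", "Redupl")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(dik_dun), colSums(dik_dun)) / sum(dik_dun)

# Pearson statistic without continuity correction
x_squared <- sum((dik_dun - expected)^2 / expected)

# Sample (cross-product) odds ratio of the 2 x 2 table
odds_ratio <- (dik_dun[1, 1] * dik_dun[2, 2]) / (dik_dun[1, 2] * dik_dun[2, 1])

x_squared
odds_ratio
```

With the rows ordered dik, dun and the columns ordered NoRedupl, Redupl, an odds ratio above 1 means that the odds of a non-reduplicated form are higher for the first adjective, i.e., that reduplication is relatively more frequent for the second. Note that fisher.test() reports the conditional maximum likelihood estimate of the odds ratio, which can differ slightly from the cross-product value computed here.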

Comparing adjectives pairwise

The project examined the following six antonymic adjectival pairs:

  1. dik :: dun
  2. hard, luid :: stil
  3. laag :: hoog
  4. lang :: kort
  5. langzaam, traag :: snel
  6. zwaar :: licht

The adjectives on the left correspond to the large dimension, whereas those on the right correspond to the small dimension. As the list above indicates, the adjectives stil and snel each have two antonyms. For the purposes of the pairwise comparison, stil and snel are paired with each of their antonyms in turn, thus resulting in a total of eight antonymic adjectival pairs:

  1. dik :: dun
  2. hard :: stil
  3. luid :: stil
  4. laag :: hoog
  5. lang :: kort
  6. langzaam :: snel
  7. traag :: snel
  8. zwaar :: licht
# Antonymic pair 1: dik vs. dun:

ant_p_1 <- Adjectives[c(1, 2), ]

# Chi squared test for ant_p_1:

chisq.test(ant_p_1, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  ant_p_1
## X-squared = 69325, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_1:

fisher.test(ant_p_1)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  ant_p_1
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  6.080589 6.272458
## sample estimates:
## odds ratio 
##   6.175725
# Antonymic pair 2: hard vs. stil:

ant_p_2 <- Adjectives[c(3, 12), ]

# Chi squared test for ant_p_2:

chisq.test(ant_p_2, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  ant_p_2
## X-squared = 228833, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_2:

fisher.test(ant_p_2)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  ant_p_2
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  4.224368 4.279579
## sample estimates:
## odds ratio 
##   4.251441
# Antonymic pair 3: luid vs. stil:

ant_p_3 <- Adjectives[c(10, 12), ]

# Chi squared test for ant_p_3:

chisq.test(ant_p_3, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  ant_p_3
## X-squared = 1214.3, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_3:

fisher.test(ant_p_3)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  ant_p_3
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.659573 1.764607
## sample estimates:
## odds ratio 
##   1.711064
# Antonymic pair 4: laag vs. hoog:

ant_p_4 <- Adjectives[c(6, 4), ]

# Chi squared test for ant_p_4:

chisq.test(ant_p_4, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  ant_p_4
## X-squared = 95724, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_4:

fisher.test(ant_p_4)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  ant_p_4
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.1648064 0.1691000
## sample estimates:
## odds ratio 
##   0.166937
# Antonymic pair 5: lang vs. kort:

ant_p_5 <- Adjectives[c(7, 5), ]

# Chi squared test for ant_p_5:

chisq.test(ant_p_5, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  ant_p_5
## X-squared = 42559, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_5:

fisher.test(ant_p_5)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  ant_p_5
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.2099866 0.2168883
## sample estimates:
## odds ratio 
##  0.2134191
# Antonymic pair 6: langzaam vs. snel:

ant_p_6 <- Adjectives[c(8, 11), ]

# Chi squared test for ant_p_6:

chisq.test(ant_p_6, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  ant_p_6
## X-squared = 9121.6, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_6:

fisher.test(ant_p_6)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  ant_p_6
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.5300272 0.5439636
## sample estimates:
## odds ratio 
##  0.5369125
# Antonymic pair 7: traag vs. snel:

ant_p_7 <- Adjectives[c(13, 11), ]

# Chi squared test for ant_p_7:

chisq.test(ant_p_7, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  ant_p_7
## X-squared = 53786, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_7:

fisher.test(ant_p_7)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  ant_p_7
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.2244416 0.2306809
## sample estimates:
## odds ratio 
##  0.2275198
# Antonymic pair 8: zwaar vs. licht:

ant_p_8 <- Adjectives[c(14, 9), ]

# Chi squared test for ant_p_8:

chisq.test(ant_p_8, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  ant_p_8
## X-squared = 75672, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_8:

fisher.test(ant_p_8)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  ant_p_8
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.08316735 0.08695111
## sample estimates:
## odds ratio 
## 0.08502935
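
The eight pairwise tests above repeat the same three steps, so they could also be run in a loop that selects rows by name rather than by position. The following is a sketch under that assumption, not the procedure actually used; adj_counts, antonym_pairs and results are hypothetical names, and the counts are copied from the matrix Adjectives:

```r
# Counts copied from the matrix 'Adjectives' shown earlier
adj_counts <- matrix(
    c(6618096, 60043, 417405, 23387, 10020081, 168236, 2311384, 30782,
      4007828, 15570, 1433535, 114362, 15985090, 290977, 1154772, 26161,
      2191823, 8237, 104626, 4365, 17822917, 216802, 3232898, 230789,
      448440, 23973, 3586643, 158512),
    ncol = 2, byrow = TRUE,
    dimnames = list(c("dik", "dun", "hard", "hoog", "kort", "laag", "lang",
                      "langzaam", "licht", "luid", "snel", "stil", "traag", "zwaar"),
                    c("NoRedupl", "Redupl")))

# The eight antonymic pairs, large dimension first
antonym_pairs <- list(c("dik", "dun"), c("hard", "stil"), c("luid", "stil"),
                      c("laag", "hoog"), c("lang", "kort"), c("langzaam", "snel"),
                      c("traag", "snel"), c("zwaar", "licht"))

# Run both tests on each 2 x 2 sub-table and collect the key figures
results <- do.call(rbind, lapply(antonym_pairs, function(p) {
    tab <- adj_counts[p, ]
    data.frame(pair       = paste(p, collapse = " vs. "),
               x_squared  = unname(chisq.test(tab, correct = FALSE)$statistic),
               odds_ratio = unname(fisher.test(tab)$estimate))
}))
results
```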

Forming the antonyms of stil and snel into distinct groups

In addition to the pairwise comparison, for those adjectives with multiple antonyms, namely stil and snel, a supplementary comparison is performed in which the multiple antonyms of each of those adjectives are formed into separate groups. The grouped antonyms are subsequently compared to the corresponding adjective. Thus, the adjectival pairs to be compared are the following:

  1. hard & luid :: stil
  2. langzaam & traag :: snel

As the code chunk below shows, the non-reduplicated frequencies of the adjectives comprising each antonymic dyad are first added up, providing the total count of non-reduplicated forms for the entire dyad. The total count of reduplicated forms per dyad is then calculated in the same way. To obtain the relative frequency of each dyad’s reduplicated forms, its total count of reduplicated forms is divided by the sum of its non-reduplicated and reduplicated forms. The resulting counts are stored in separate dataframes, which are finally combined into the dataframe new_adj:

# Summed non-reduplicated frequencies of hard & luid:

ant_luid_no <- percent[3, 2] + percent[10, 2]

# Summed reduplicated frequencies of hard & luid:

ant_luid_r <- percent[3, 3] + percent[10, 3]

# Calculating the relative frequency for the reduplicated forms of hard & luid:

rf_ant_luid <- ant_luid_r/(ant_luid_no + ant_luid_r)
rf_ant_luid <- rf_ant_luid %>%
    rename(Rel_frequency = "Redupl")

# Creating the dataframe 'h_l' with the relevant counts for hard & luid:

h_l <- cbind(ant_luid_no, ant_luid_r, rf_ant_luid)
adj <- "hard_and_luid"
h_l$Adjective <- adj
h_l
##   NoRedupl Redupl Rel_frequency     Adjective
## 1 10124707 172601    0.01676176 hard_and_luid
# Summed non-reduplicated frequencies of langzaam & traag:

ant_traag_no <- percent[8, 2] + percent[13, 2]

# Summed reduplicated frequencies of langzaam & traag:

ant_traag_r <- percent[8, 3] + percent[13, 3]

# Calculating the relative frequency for the reduplicated forms of langzaam &
# traag:

rf_ant_traag <- ant_traag_r/(ant_traag_no + ant_traag_r)
rf_ant_traag <- rf_ant_traag %>%
    rename(Rel_frequency = "Redupl")

# Creating the dataframe 'l_t' with the relevant counts for langzaam & traag:

l_t <- cbind(ant_traag_no, ant_traag_r, rf_ant_traag)
adj <- "langzaam_and_traag"
l_t$Adjective <- adj
l_t
##   NoRedupl Redupl Rel_frequency          Adjective
## 1  1603212  50134    0.03032275 langzaam_and_traag
# Combining the two dataframes into 'new_adj' and rearranging the order of its
# columns:

new_adj <- rbind(h_l, l_t)
new_adj <- new_adj[, c(4, 1, 2, 3)]
new_adj
##            Adjective NoRedupl Redupl Rel_frequency
## 1      hard_and_luid 10124707 172601    0.01676176
## 2 langzaam_and_traag  1603212  50134    0.03032275
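
The grouped counts above are built by addressing rows and columns by position. An equivalent result could plausibly be obtained by relabelling the adjectives and summarising by group. The sketch below assumes a hypothetical four-row excerpt named toy_percent and is meant only as an illustration:

```r
library(dplyr)

# Hypothetical excerpt of 'percent' with the four adjectives to be grouped
toy_percent <- tibble(
    Adjective = c("hard", "luid", "langzaam", "traag"),
    NoRedupl  = c(10020081, 104626, 1154772, 448440),
    Redupl    = c(168236, 4365, 26161, 23973)
)

# Relabel the members of each group, sum the counts per label, and then
# recompute the relative frequency from the summed counts
toy_grouped <- toy_percent %>%
    mutate(Adjective = case_when(
        Adjective %in% c("hard", "luid")      ~ "hard_and_luid",
        Adjective %in% c("langzaam", "traag") ~ "langzaam_and_traag",
        TRUE                                  ~ Adjective
    )) %>%
    group_by(Adjective) %>%
    summarise(NoRedupl = sum(NoRedupl), Redupl = sum(Redupl)) %>%
    mutate(Rel_frequency = Redupl / (NoRedupl + Redupl))
toy_grouped
```

Because the relabelling is done by name, the sketch does not depend on the row order of the input.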

As a further step, the rows of the individual adjectives hard, luid, langzaam and traag are removed from the tibble percent, and the summed and relative frequencies of the grouped antonyms in new_adj are appended in their place, yielding the tibble new_percent. Then, the column Rel_frequency is dropped from new_percent. The resultant tibble is assigned to the variable Adjectives_grouped and converted to a data matrix. Finally, as in the case of the pairwise comparison, a chi-squared test of independence and a Fisher’s exact test are performed for each grouped antonymic pair:

# Dropping the four individual adjectives from 'percent' and appending the
# grouped rows of 'new_adj' to form the tibble 'new_percent':

percent_2 <- percent[-c(3, 8, 10, 13), ]
new_percent <- rbind(percent_2, new_adj)
new_percent
## # A tibble: 12 × 4
##    Adjective          NoRedupl Redupl Rel_frequency
##    <chr>                 <int>  <int>         <dbl>
##  1 dik                 6618096  60043       0.00899
##  2 dun                  417405  23387       0.0531 
##  3 hoog                2311384  30782       0.0131 
##  4 kort                4007828  15570       0.00387
##  5 laag                1433535 114362       0.0739 
##  6 lang               15985090 290977       0.0179 
##  7 licht               2191823   8237       0.00374
##  8 snel               17822917 216802       0.0120 
##  9 stil                3232898 230789       0.0666 
## 10 zwaar               3586643 158512       0.0423 
## 11 hard_and_luid      10124707 172601       0.0168 
## 12 langzaam_and_traag  1603212  50134       0.0303
# Creating the data matrix 'Adjectives_grouped' containing the grouped
# antonymic pairs:

Adjectives_grouped <- new_percent[, -c(4)]
Adjectives_grouped <- data.matrix(Adjectives_grouped, rownames.force = NA)
rownames(Adjectives_grouped) <- c("dik", "dun", "hoog", "kort", "laag", "lang", "licht",
    "snel", "stil", "zwaar", "hard_and_luid", "langzaam_and_traag")

Adjectives_grouped <- Adjectives_grouped[, -c(1)]
Adjectives_grouped
##                    NoRedupl Redupl
## dik                 6618096  60043
## dun                  417405  23387
## hoog                2311384  30782
## kort                4007828  15570
## laag                1433535 114362
## lang               15985090 290977
## licht               2191823   8237
## snel               17822917 216802
## stil                3232898 230789
## zwaar               3586643 158512
## hard_and_luid      10124707 172601
## langzaam_and_traag  1603212  50134
# Grouped antonymic pair 1: hard & luid vs. stil:

g.ant_p_1 <- Adjectives_grouped[c(11, 9), ]

# Chi squared test for 'g.ant_p_1':

chisq.test(g.ant_p_1, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  g.ant_p_1
## X-squared = 226529, df = 1, p-value < 2.2e-16
# Fisher's exact test for 'g.ant_p_1':

fisher.test(g.ant_p_1)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  g.ant_p_1
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  4.161144 4.214733
## sample estimates:
## odds ratio 
##   4.187187
# Grouped antonymic pair 2: langzaam & traag vs. snel:

g.ant_p_2 <- Adjectives_grouped[c(12, 8), ]

# Chi squared test for 'g.ant_p_2':

chisq.test(g.ant_p_2, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  g.ant_p_2
## X-squared = 37952, df = 1, p-value < 2.2e-16
# Fisher's exact test for 'g.ant_p_2':

fisher.test(g.ant_p_2)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  g.ant_p_2
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.3851732 0.3928268
## sample estimates:
## odds ratio 
##  0.3890051

Plotting the letter reduplications

In order to obtain a better understanding of the reduplicated letters in the recorded data, two plotting processes are followed. The first plotting process involves the grouping of the adjectives under examination into the corresponding adjectival dimension, i.e., the large vs. the small dimension, and produces a graph that compares letter reduplications across the two adjectival dimensions. In the second plotting process adjectives are examined in pairs. Moreover, they are classified into six categories, namely elevation, height, loudness, speed, weight, and width, each of which corresponds to the property expressed by each of the examined adjectival pairs. The resultant plot displays the mean letter reduplications recorded for both the large and the small adjectival dimension across all six categories. The following two sections offer a detailed account of each plotting process.

Per adjectival dimension

Some of the recorded adjectival tokens displayed both vowel and consonant reduplications. In such cases, letter reduplications were recorded as fractions in which the numerator indicated the count for the letter that was reduplicated first and the denominator the count for the letter that was reduplicated second. For instance, the letter reduplications for the token laaaannge were represented as 3/1, where, relative to the orthographic norm of the word, 3 corresponded to the three additional instances of the letter a, while 1 represented the single additional instance of the letter n. For the purposes of both plotting processes, however, the letter reduplications per adjectival token need to be summed. The sumslash function collapses the separate counts into a single integer corresponding to the total number of reduplicated letters per adjectival token:

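# Expects a single record, e.g. "3/1" or "2"
sumslash <- function(strg) {
    if (grepl("/", strg)) {
        # The slash branch reads only the first and last character, so it
        # assumes single-digit counts on both sides of the slash
        as.numeric(substring(strg, 1, 1)) + as.numeric(str_sub(strg, -1, -1))
    } else {
        # Plain records are converted to an integer directly
        strtoi(strg)
    }
}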
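
The function above reads only the first and the last character of a slash-separated record, so it relies on both counts being single digits. Should records with larger counts ever occur, a variant that splits on the slash and sums all parts might be safer. The following sketch uses the hypothetical name sumslash_general and is not part of the original analysis:

```r
# Hypothetical generalisation of 'sumslash': split each record on "/" and
# sum all of its parts, whatever their number of digits
sumslash_general <- function(strg) {
    parts <- strsplit(as.character(strg), "/", fixed = TRUE)
    vapply(parts, function(p) sum(as.integer(p)), integer(1))
}

sumslash_general(c("0", "1", "3/1", "12/3"))
```

For the four example records this yields 0, 1, 4 and 15. Being vectorised, the function can be applied to a whole column without sapply().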

Having resolved this issue, the initial du_data dataframe is copied to du_data_mod, so that the original data remain unchanged. The column redsize, containing the total number of reduplicated letters per adjectival token, is then appended to the copy:

# Copying the 'du_data' dataframe to 'du_data_mod':

du_data_mod <- du_data

# Appending the column 'redsize' to the copied dataframe and applying the
# 'sumslash' function:

du_data_mod$redsize <- du_data_mod$Reduplication_size
du_data_mod$redsize <- sapply(du_data_mod$redsize, sumslash)
head(du_data_mod)
##   Frequency N.gram Adjective Reduplication Reduplication_size Un.inflected
## 1  11659013   lang      lang             0                  0            U
## 2   3310352  lange      lang             0                  0            I
## 3    563227   Lang      lang             0                  0            U
## 4    338996  Lange      lang             0                  0            I
## 5     94753   LANG      lang             0                  0            U
## 6     58386  langg      lang             1                  1            U
##   Vowel__Consonant Capitalisation redsize
## 1             <NA>              0       0
## 2             <NA>              0       0
## 3             <NA>              0       0
## 4             <NA>              0       0
## 5             <NA>              1       0
## 6                C              0       1

The following code chunk adds the column dimension, which labels each adjectival token as belonging to either the small (dimsmall) or the large (dimlarge) adjectival dimension:

du_data_mod$dimension <- sapply(du_data_mod$Adjective, function(a) ifelse(a %in%
    c("dun", "stil", "hoog", "kort", "snel", "licht"), "dimsmall", "dimlarge"))
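
Since %in% and ifelse() are both vectorised, the same labelling could be written without sapply(). A minimal sketch, in which small_adjectives, toy_adjectives and toy_dimension are illustrative names:

```r
# Adjectives treated as expressing the small dimension (as in the chunk above)
small_adjectives <- c("dun", "stil", "hoog", "kort", "snel", "licht")

# A few example adjectives standing in for the 'Adjective' column
toy_adjectives <- c("lang", "kort", "stil", "zwaar")

# Label every element in one vectorised call
toy_dimension <- ifelse(toy_adjectives %in% small_adjectives, "dimsmall", "dimlarge")
toy_dimension
```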

In order to plot the letter reduplications across the two adjectival dimensions, the smaller dataframe forana is created. This dataframe comprises the reduplicated adjectival tokens, which are now dimensionally defined, along with the number of their letter reduplications:

# Creating the smaller dataframe 'forana':

forana <- du_data_mod %>%
    select(dimension, redsize) %>%
    filter(redsize > 0)
head(forana)
##   dimension redsize
## 1  dimlarge       1
## 2  dimlarge       3
## 3  dimlarge       2
## 4  dimlarge       3
## 5  dimlarge       2
## 6  dimlarge       2
# Identifying the large- and small-dimension adjectives in the 'forana'
# dataframe:

bigadjectives <- subset(forana, dimension == "dimlarge")$redsize
smalladjectives <- subset(forana, dimension == "dimsmall")$redsize

As the code chunk below demonstrates, the graph of this plotting process is generated by passing the forana dataframe to the function ggplot:

ggplot(forana, aes(x = dimension, y = redsize)) + geom_boxplot() + theme_classic()

As a next step, an F-test is performed to compare the variances of the two samples, i.e., of bigadjectives and smalladjectives:

# F-test:

var.test(bigadjectives, smalladjectives)
## 
##  F test to compare two variances
## 
## data:  bigadjectives and smalladjectives
## F = 1.9794, num df = 499, denom df = 225, p-value = 1.171e-08
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  1.576149 2.462196
## sample estimates:
## ratio of variances 
##           1.979384

The F-test returns a p-value much lower than the conventional significance level of 0.05 (p-value = 1.171e-08, \(df_n = 499\), \(df_d = 225\)), which indicates that the two tested samples are heteroscedastic, i.e., their variances differ significantly. For this reason, a Welch t-test is performed. The Welch t-test assumes that the tested samples’ variances are unequal, and it is conducted in order to identify whether the difference between the means of the two tested samples is significant:
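
The F statistic reported by var.test() is simply the ratio of the two sample variances, with each degrees-of-freedom value equal to the corresponding sample size minus one. The sketch below illustrates this on simulated data; the vectors x and y are randomly generated stand-ins with the same sample sizes, not the study data:

```r
set.seed(1)

# Simulated stand-ins for the two samples (NOT the study data)
x <- rnorm(500, mean = 4.4, sd = 3)
y <- rnorm(226, mean = 3.6, sd = 2)

# F statistic by hand: ratio of the two sample variances
f_manual <- var(x) / var(y)

# The same statistic as returned by var.test()
f_test <- var.test(x, y)

c(manual = f_manual, var.test = unname(f_test$statistic))
c(num_df = length(x) - 1, denom_df = length(y) - 1)
```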

# Welch t-test:

test <- t.test(bigadjectives, smalladjectives, var.equal = FALSE)
test
## 
##  Welch Two Sample t-test
## 
## data:  bigadjectives and smalladjectives
## t = 3.7443, df = 593.5, p-value = 0.0001985
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.4073823 1.3061752
## sample estimates:
## mean of x mean of y 
##  4.432000  3.575221

The resultant p-value is lower than 0.05 (p-value = 0.0001985, df = 593.5), which suggests that the two samples differ significantly with respect to their means.
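
The non-integer degrees of freedom stem from the Welch-Satterthwaite approximation, which weights each sample’s variance by its size. The sketch below reproduces the test statistic and the degrees of freedom by hand on simulated data; x and y are again randomly generated stand-ins rather than the study data, and all object names are illustrative:

```r
set.seed(1)

# Simulated stand-ins for the two samples (NOT the study data)
x <- rnorm(500, mean = 4.4, sd = 3)
y <- rnorm(226, mean = 3.6, sd = 2)

# Squared standard errors of the two sample means
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (se2_x + se2_y)^2 /
    (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

welch <- t.test(x, y, var.equal = FALSE)
c(t_manual = t_manual, t_test = unname(welch$statistic))
c(df_manual = df_manual, df_test = unname(welch$parameter))
```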

Per adjectival category

As displayed in the code chunk below, the mean and standard deviation of the reduplicated letters per adjectival type are calculated. These summary statistics are assigned to the existing dataframe forana and thus overwrite its previous content. Furthermore, two more columns are added to the new forana tibble, namely the column dimension, which matches each adjectival type with the relevant dimension, i.e., the large or the small dimension, and the column kind, which specifies the category to which each adjectival type belongs, i.e., elevation, height, loudness, speed, weight or width:

# Overwriting the existing 'forana' dataframe with the summarised
# data:

forana <- du_data_mod %>%
    filter(redsize > 0) %>%
    group_by(Adjective) %>%
    summarise(mean = mean(redsize), sd = sd(redsize))

# Identifying the dimension of each adjectival type and adding the resultant
# observations to the dataframe 'forana' under the column 'dimension':

forana$dimension <- sapply(forana$Adjective, function(a) ifelse(a %in% c("dun", "stil",
    "hoog", "kort", "snel", "licht"), "dimsmall", "dimlarge"))

# Assigning the relevant category to each adjectival type and adding the
# resultant observations to the dataframe 'forana' under the column 'kind':

forana$kind <- c("width", "width", "loudness", "elevation", "height", "elevation",
    "height", "speed", "weight", "loudness", "speed", "loudness", "speed", "weight")
forana
## # A tibble: 14 × 5
##    Adjective  mean    sd dimension kind     
##    <chr>     <dbl> <dbl> <chr>     <chr>    
##  1 dik        3.04  1.58 dimlarge  width    
##  2 dun        2.45  1.41 dimsmall  width    
##  3 hard       3.74  2.72 dimlarge  loudness 
##  4 hoog       2.71  1.81 dimsmall  elevation
##  5 kort       2.25  1.26 dimsmall  height   
##  6 laag       3.56  2.66 dimlarge  elevation
##  7 lang       5.60  4.42 dimlarge  height   
##  8 langzaam   4.11  2.98 dimlarge  speed    
##  9 licht      1.77  1.36 dimsmall  weight   
## 10 luid       1.25  0.5  dimlarge  loudness 
## 11 snel       4.5   2.75 dimsmall  speed    
## 12 stil       4.26  2.66 dimsmall  loudness 
## 13 traag      4.80  3.70 dimlarge  speed    
## 14 zwaar      4.67  3.36 dimlarge  weight
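
The column kind is filled in by position, which works as long as the rows keep the alphabetical order shown above. A named lookup vector would make the assignment independent of the row order. The sketch below uses the hypothetical names kind_lookup, toy_adjectives and toy_kind, with the categories copied from the tibble above:

```r
# Category of each adjective, keyed by name (copied from the tibble above)
kind_lookup <- c(dik = "width", dun = "width",
                 hard = "loudness", luid = "loudness", stil = "loudness",
                 hoog = "elevation", laag = "elevation",
                 lang = "height", kort = "height",
                 langzaam = "speed", traag = "speed", snel = "speed",
                 zwaar = "weight", licht = "weight")

# Looking the categories up by name works for any ordering of the adjectives
toy_adjectives <- c("snel", "dik", "licht")
toy_kind <- unname(kind_lookup[toy_adjectives])
toy_kind
```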

The graph of this plotting process is created by passing the redefined forana tibble to the function ggplot as indicated below:

ggplot(forana, aes(x = kind, y = mean, group = Adjective)) + geom_bar(aes(fill = dimension),
    stat = "identity", position = "dodge", colour = "black") + geom_text(aes(label = Adjective),
    position = position_dodge(width = 0.9), vjust = -0.25, size = 3) + theme_classic() +
    scale_fill_manual("dimension", values = c(dimlarge = "steelblue", dimsmall = "goldenrod"))

ggsave("Category_comparison.pdf", width = 6, height = 4)

References

Fuchs, Susanne, Egor Savin, Stephanie Solt, Cornelia Ebert, and Manfred Krifka. 2019. “Antonym Adjective Pairs and Prosodic Iconicity: Evidence from Letter Replications in an English Blogger Corpus.” Linguistics Vanguard 5 (1).