Iconicity refers to the resemblance relation that holds between the form and the meaning of a sign. In language, it can be manifested in both speech and writing. In spoken language, in particular, interlocutors tend to indicate spatial or temporal extent by varying prosodic features of their speech, such as the fundamental frequency (F0), the amplitude or the duration of their voices, a phenomenon commonly termed iconic prosodic modulation (Fuchs et al. 2019: 1). A homologous case occurs in written language when language users reduplicate letters in text messages as a means of conveying additional meaning.
The following sections present the process followed to analyse the extracted data as well as the results of the analysis. The entire data analysis was performed in R (version 4.1.3); the packages dplyr, tidyr and stringr were used for data manipulation. Finally, the graphs displayed in the appendix’s final two sections were created using the package ggplot2.
First, the packages required for data manipulation and plotting are loaded in R, and the data set is read in:
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
du_data <- read.csv("Data set.csv")
head(du_data)
## Frequency N.gram Adjective Reduplication Reduplication_size Un.inflected
## 1 11659013 lang lang 0 0 U
## 2 3310352 lange lang 0 0 I
## 3 563227 Lang lang 0 0 U
## 4 338996 Lange lang 0 0 I
## 5 94753 LANG lang 0 0 U
## 6 58386 langg lang 1 1 U
## Vowel__Consonant Capitalisation
## 1 <NA> 0
## 2 <NA> 0
## 3 <NA> 0
## 4 <NA> 0
## 5 <NA> 1
## 6 C 0
A smaller dataframe displaying the total counts of non-reduplicated and reduplicated adjectival tokens is created:
re_du_data <- du_data %>%
group_by(Reduplication, Adjective) %>%
summarise(Total = sum(Frequency))
re_du_data
## # A tibble: 28 × 3
## # Groups: Reduplication [2]
## Reduplication Adjective Total
## <int> <chr> <int>
## 1 0 dik 6618096
## 2 0 dun 417405
## 3 0 hard 10020081
## 4 0 hoog 2311384
## 5 0 kort 4007828
## 6 0 laag 1433535
## 7 0 lang 15985090
## 8 0 langzaam 1154772
## 9 0 licht 2191823
## 10 0 luid 104626
## # … with 18 more rows
The resultant tibble is in long format: the values 0 and 1 of the column
Reduplication appear in separate rows for each adjective rather than as
columns of their own. The tibble can therefore be rearranged so that the
two values of Reduplication, namely 0 and 1, form two distinct columns of
a new tibble. To this end, the pivot_wider() function is employed. For
the sake of convenience, the columns 0 and 1 are then renamed NoRedupl
and Redupl respectively:
# Reshaping the tibble into a wider format:
percent <- re_du_data %>%
pivot_wider(names_from = Reduplication, values_from = Total)
# Renaming columns '0' and '1' as 'NoRedupl' and 'Redupl' respectively:
percent <- percent %>%
rename(NoRedupl = "0", Redupl = "1")
percent
## # A tibble: 14 × 3
## Adjective NoRedupl Redupl
## <chr> <int> <int>
## 1 dik 6618096 60043
## 2 dun 417405 23387
## 3 hard 10020081 168236
## 4 hoog 2311384 30782
## 5 kort 4007828 15570
## 6 laag 1433535 114362
## 7 lang 15985090 290977
## 8 langzaam 1154772 26161
## 9 licht 2191823 8237
## 10 luid 104626 4365
## 11 snel 17822917 216802
## 12 stil 3232898 230789
## 13 traag 448440 23973
## 14 zwaar 3586643 158512
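The long-to-wide step can be cross-checked without tidyr. The following is a minimal sketch in base R, not the method used in this appendix: it cross-tabulates a four-row excerpt of the long-format counts (values copied from the tables above) with xtabs(). The object names long_demo and wide_demo are hypothetical.

```r
# Long-format excerpt: two adjectives, counts copied from the tables above
long_demo <- data.frame(
  Reduplication = c(0, 0, 1, 1),
  Adjective     = c("dik", "dun", "dik", "dun"),
  Total         = c(6618096, 417405, 60043, 23387)
)

# Cross-tabulate: one row per adjective, one column per Reduplication value
wide_demo <- xtabs(Total ~ Adjective + Reduplication, data = long_demo)

# Rename the columns '0' and '1', mirroring the renaming step above
colnames(wide_demo) <- c("NoRedupl", "Redupl")
wide_demo
```

The resulting table should agree with the first two rows of the tibble percent.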
As a next step, the relative frequency of the reduplicated forms of each
adjective is calculated by dividing the number of the adjective’s
reduplicated forms by the sum of its non-reduplicated and reduplicated
forms. The quotient is assigned to the variable rel_fr and then appended
as a separate column to the tibble percent under the name Rel_frequency:
rel_fr <- percent$Redupl/(percent$NoRedupl + percent$Redupl)
percent$Rel_frequency <- rel_fr
percent
## # A tibble: 14 × 4
## Adjective NoRedupl Redupl Rel_frequency
## <chr> <int> <int> <dbl>
## 1 dik 6618096 60043 0.00899
## 2 dun 417405 23387 0.0531
## 3 hard 10020081 168236 0.0165
## 4 hoog 2311384 30782 0.0131
## 5 kort 4007828 15570 0.00387
## 6 laag 1433535 114362 0.0739
## 7 lang 15985090 290977 0.0179
## 8 langzaam 1154772 26161 0.0222
## 9 licht 2191823 8237 0.00374
## 10 luid 104626 4365 0.0400
## 11 snel 17822917 216802 0.0120
## 12 stil 3232898 230789 0.0666
## 13 traag 448440 23973 0.0507
## 14 zwaar 3586643 158512 0.0423
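As a worked check of the division described above, the sketch below recomputes the relative frequency for three adjectives from the counts printed in the tibble. The data frame counts_demo is a hypothetical excerpt created only for this illustration.

```r
# Excerpt of the counts printed above (three adjectives only)
counts_demo <- data.frame(
  Adjective = c("dik", "dun", "kort"),
  NoRedupl  = c(6618096, 417405, 4007828),
  Redupl    = c(60043, 23387, 15570)
)

# Share of reduplicated tokens among all tokens of each adjective
counts_demo$Rel_frequency <-
  counts_demo$Redupl / (counts_demo$NoRedupl + counts_demo$Redupl)

# Rounded to three significant digits, as in the tibble printout
signif(counts_demo$Rel_frequency, 3)
# returns 0.00899 0.05310 0.00387
```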
For ease of computation, percent is transformed into a
data matrix with each row containing an adjective, and two columns, one
corresponding to the non-reduplicated and the other to the reduplicated
adjectival forms:
adjectives <- data.matrix(percent, rownames.force = NA)
rownames(adjectives) <- c("dik", "dun", "hard", "hoog", "kort", "laag", "lang", "langzaam",
"licht", "luid", "snel", "stil", "traag", "zwaar")
Adjectives <- adjectives[, -c(1, 4)]
Adjectives
## NoRedupl Redupl
## dik 6618096 60043
## dun 417405 23387
## hard 10020081 168236
## hoog 2311384 30782
## kort 4007828 15570
## laag 1433535 114362
## lang 15985090 290977
## langzaam 1154772 26161
## licht 2191823 8237
## luid 104626 4365
## snel 17822917 216802
## stil 3232898 230789
## traag 448440 23973
## zwaar 3586643 158512
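The row names above are typed in by hand, which relies on the order of the rows staying fixed. As a possible alternative (an assumption about what is convenient, not the procedure used here), the names can be taken directly from the Adjective column. The sketch uses a hypothetical three-row excerpt named percent_demo.

```r
# Hypothetical excerpt of 'percent' as a plain data frame
percent_demo <- data.frame(
  Adjective = c("dik", "dun", "hard"),
  NoRedupl  = c(6618096, 417405, 10020081),
  Redupl    = c(60043, 23387, 168236),
  stringsAsFactors = FALSE
)

# Keep only the two count columns and convert them to a numeric matrix
adj_mat_demo <- as.matrix(percent_demo[, c("NoRedupl", "Redupl")])

# Label the rows with the adjectives instead of typing the names by hand
rownames(adj_mat_demo) <- percent_demo$Adjective
adj_mat_demo
```

Selecting rows by name, e.g. adj_mat_demo[c("dik", "dun"), ], then avoids hard-coded row indices.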
In order to identify whether the presence of reduplication depends on the dimension, i.e., large or small, expressed by each adjective, a chi-squared test of independence is performed along with a Fisher’s exact test for each antonymic adjectival pair. In those cases where multiple antonyms correspond to an adjective, an additional comparison is performed by grouping the multiple antonyms and then comparing the group to the relevant adjective. Both types of comparison are described in the subsequent sections.
The project examined the following six antonymic adjectival pairs:
dik vs. dun
hard / luid vs. stil
laag vs. hoog
lang vs. kort
langzaam / traag vs. snel
zwaar vs. licht
The adjectives on the left correspond to the large dimension, whereas those on the right correspond to the small dimension. As the list above indicates, the adjectives stil and snel each have two antonyms. For the purposes of the pairwise comparison, stil and snel are paired with each of their antonyms in turn, resulting in a total of eight antonymic adjectival pairs:
# Antonymic pair 1: dik vs. dun:
ant_p_1 <- Adjectives[c(1, 2), ]
# Chi squared test for ant_p_1:
chisq.test(ant_p_1, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: ant_p_1
## X-squared = 69325, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_1:
fisher.test(ant_p_1)
##
## Fisher's Exact Test for Count Data
##
## data: ant_p_1
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 6.080589 6.272458
## sample estimates:
## odds ratio
## 6.175725
# Antonymic pair 2: hard vs. stil:
ant_p_2 <- Adjectives[c(3, 12), ]
# Chi squared test for ant_p_2:
chisq.test(ant_p_2, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: ant_p_2
## X-squared = 228833, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_2:
fisher.test(ant_p_2)
##
## Fisher's Exact Test for Count Data
##
## data: ant_p_2
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 4.224368 4.279579
## sample estimates:
## odds ratio
## 4.251441
# Antonymic pair 3: luid vs. stil:
ant_p_3 <- Adjectives[c(10, 12), ]
# Chi squared test for ant_p_3:
chisq.test(ant_p_3, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: ant_p_3
## X-squared = 1214.3, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_3:
fisher.test(ant_p_3)
##
## Fisher's Exact Test for Count Data
##
## data: ant_p_3
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.659573 1.764607
## sample estimates:
## odds ratio
## 1.711064
# Antonymic pair 4: laag vs. hoog:
ant_p_4 <- Adjectives[c(6, 4), ]
# Chi squared test for ant_p_4:
chisq.test(ant_p_4, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: ant_p_4
## X-squared = 95724, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_4:
fisher.test(ant_p_4)
##
## Fisher's Exact Test for Count Data
##
## data: ant_p_4
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.1648064 0.1691000
## sample estimates:
## odds ratio
## 0.166937
# Antonymic pair 5: lang vs. kort:
ant_p_5 <- Adjectives[c(7, 5), ]
# Chi squared test for ant_p_5:
chisq.test(ant_p_5, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: ant_p_5
## X-squared = 42559, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_5:
fisher.test(ant_p_5)
##
## Fisher's Exact Test for Count Data
##
## data: ant_p_5
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.2099866 0.2168883
## sample estimates:
## odds ratio
## 0.2134191
# Antonymic pair 6: langzaam vs. snel:
ant_p_6 <- Adjectives[c(8, 11), ]
# Chi squared test for ant_p_6:
chisq.test(ant_p_6, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: ant_p_6
## X-squared = 9121.6, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_6:
fisher.test(ant_p_6)
##
## Fisher's Exact Test for Count Data
##
## data: ant_p_6
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.5300272 0.5439636
## sample estimates:
## odds ratio
## 0.5369125
# Antonymic pair 7: traag vs. snel:
ant_p_7 <- Adjectives[c(13, 11), ]
# Chi squared test for ant_p_7:
chisq.test(ant_p_7, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: ant_p_7
## X-squared = 53786, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_7:
fisher.test(ant_p_7)
##
## Fisher's Exact Test for Count Data
##
## data: ant_p_7
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.2244416 0.2306809
## sample estimates:
## odds ratio
## 0.2275198
# Antonymic pair 8: zwaar vs. licht:
ant_p_8 <- Adjectives[c(14, 9), ]
# Chi squared test for ant_p_8:
chisq.test(ant_p_8, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: ant_p_8
## X-squared = 75672, df = 1, p-value < 2.2e-16
# Fisher's exact test for ant_p_8:
fisher.test(ant_p_8)
##
## Fisher's Exact Test for Count Data
##
## data: ant_p_8
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.08316735 0.08695111
## sample estimates:
## odds ratio
## 0.08502935
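The eight pairwise comparisons repeat the same two calls. A minimal sketch of how they could be wrapped in a loop is given below, using four rows copied from the Adjectives matrix. The helper test_pair and the objects Adjectives_demo and pairs_demo are hypothetical names introduced for this illustration. The helper also computes the sample (cross-product) odds ratio, which is not identical to the conditional maximum-likelihood estimate reported by fisher.test() but is close to it for counts of this size.

```r
# Four rows copied from the 'Adjectives' matrix printed above
Adjectives_demo <- matrix(
  c(6618096,  60043,
    417405,   23387,
    15985090, 290977,
    4007828,  15570),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("dik", "dun", "lang", "kort"), c("NoRedupl", "Redupl"))
)

# Each element names one antonymic pair, large dimension first
pairs_demo <- list(c("dik", "dun"), c("lang", "kort"))

# Hypothetical helper: runs both tests on the 2 x 2 sub-table of one pair
test_pair <- function(pair, counts) {
  tab <- counts[pair, ]
  list(
    pair      = paste(pair, collapse = " vs. "),
    chisq     = chisq.test(tab, correct = FALSE),
    fisher    = fisher.test(tab),
    # Sample odds ratio: (a * d) / (b * c) for the table [a b; c d]
    sample_or = (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
  )
}

results_demo <- lapply(pairs_demo, test_pair, counts = Adjectives_demo)

# Sample odds ratios per pair, for comparison with the fisher.test() output
sapply(results_demo, function(r) round(r$sample_or, 4))
```

An odds ratio above 1 in this layout means that the second adjective of the pair is reduplicated relatively more often than the first; a value below 1 means the opposite.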
In addition to the pairwise comparison, for those adjectives with multiple antonyms, namely stil and snel, a supplementary comparison is performed in which the multiple antonyms of each of those adjectives are combined into a single group. Each group of antonyms is subsequently compared to the corresponding adjective. Thus, the adjectival pairs to be compared are the following:
hard & luid vs. stil
langzaam & traag vs. snel
As the code chunk below shows, the non-reduplicated frequencies of the
two adjectives comprising each antonymic dyad are first added up,
providing the total count of non-reduplicated forms for the entire dyad.
The total count of reduplicated forms per antonymic dyad is calculated
in the same way. In order to obtain the relative frequency of each
dyad’s reduplicated forms, the total count of reduplicated forms of the
dyad is divided by the sum of the corresponding non-reduplicated and
reduplicated forms. The resulting counts are stored in separate
dataframes, which are finally combined into the dataframe new_adj:
# Summed non-reduplicated frequencies of hard & luid:
ant_luid_no <- percent[3, 2] + percent[10, 2]
# Summed reduplicated frequencies of hard & luid:
ant_luid_r <- percent[3, 3] + percent[10, 3]
# Calculating the relative frequency for the reduplicated forms of hard & luid:
rf_ant_luid <- ant_luid_r/(ant_luid_no + ant_luid_r)
rf_ant_luid <- rf_ant_luid %>%
rename(Rel_frequency = "Redupl")
# Creating the dataframe 'h_l' with the relevant counts for hard & luid:
h_l <- cbind(ant_luid_no, ant_luid_r, rf_ant_luid)
adj <- "hard_and_luid"
h_l$Adjective <- adj
h_l
## NoRedupl Redupl Rel_frequency Adjective
## 1 10124707 172601 0.01676176 hard_and_luid
# Summed non-reduplicated frequencies of langzaam & traag:
ant_stil_no <- percent[8, 2] + percent[13, 2]
# Summed reduplicated frequencies of langzaam & traag:
ant_stil_r <- percent[8, 3] + percent[13, 3]
# Calculating the relative frequency for the reduplicated forms of langzaam &
# traag:
rf_ant_stil <- ant_stil_r/(ant_stil_no + ant_stil_r)
rf_ant_stil <- rf_ant_stil %>%
rename(Rel_frequency = "Redupl")
# Creating the dataframe 'l_t' with the relevant counts for langzaam & traag:
l_t <- cbind(ant_stil_no, ant_stil_r, rf_ant_stil)
adj <- "langzaam_and_traag"
l_t$Adjective <- adj
l_t
## NoRedupl Redupl Rel_frequency Adjective
## 1 1603212 50134 0.03032275 langzaam_and_traag
# Combining the two dataframes into 'new_adj' and rearranging the order of its
# columns:
new_adj <- rbind(h_l, l_t)
new_adj <- new_adj[, c(4, 1, 2, 3)]
new_adj
## Adjective NoRedupl Redupl Rel_frequency
## 1 hard_and_luid 10124707 172601 0.01676176
## 2 langzaam_and_traag 1603212 50134 0.03032275
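The same group totals can be obtained in a single step by tagging each adjective with its group and summing within groups. The sketch below uses base R's aggregate() on the four relevant rows of percent. The objects counts_demo and grouped_demo and the column Group are hypothetical, and this is an alternative formulation rather than the code used above.

```r
# The four adjectives that are merged into groups, counts copied from 'percent'
counts_demo <- data.frame(
  Adjective = c("hard", "luid", "langzaam", "traag"),
  NoRedupl  = c(10020081, 104626, 1154772, 448440),
  Redupl    = c(168236, 4365, 26161, 23973)
)

# Hypothetical grouping column mapping each adjective to its antonym group
counts_demo$Group <- ifelse(counts_demo$Adjective %in% c("hard", "luid"),
                            "hard_and_luid", "langzaam_and_traag")

# Sum both count columns within each group
grouped_demo <- aggregate(cbind(NoRedupl, Redupl) ~ Group,
                          data = counts_demo, FUN = sum)

# Relative frequency of reduplicated forms per group
grouped_demo$Rel_frequency <-
  grouped_demo$Redupl / (grouped_demo$NoRedupl + grouped_demo$Redupl)
grouped_demo
```

The totals and relative frequencies should match the two rows of new_adj.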
As a further step, the rows of the four grouped adjectives (hard, luid,
langzaam and traag) are removed from percent, and the summed and
relative frequencies of the two antonym groups are appended in their
place, yielding the tibble new_percent. Then, the column Rel_frequency
is removed from new_percent. The resultant tibble is assigned to the
variable Adjectives_grouped and converted to a data matrix. Finally, as
in the pairwise comparison, a chi-squared test of independence and a
Fisher’s exact test are performed for each grouped antonymic pair:
# Adding the summed and relative frequencies of the grouped antonymic pairs to
# the tibble 'new_percent':
percent_2 <- percent[-c(3, 8, 10, 13), ]
new_percent <- rbind(percent_2, new_adj)
new_percent
## # A tibble: 12 × 4
## Adjective NoRedupl Redupl Rel_frequency
## <chr> <int> <int> <dbl>
## 1 dik 6618096 60043 0.00899
## 2 dun 417405 23387 0.0531
## 3 hoog 2311384 30782 0.0131
## 4 kort 4007828 15570 0.00387
## 5 laag 1433535 114362 0.0739
## 6 lang 15985090 290977 0.0179
## 7 licht 2191823 8237 0.00374
## 8 snel 17822917 216802 0.0120
## 9 stil 3232898 230789 0.0666
## 10 zwaar 3586643 158512 0.0423
## 11 hard_and_luid 10124707 172601 0.0168
## 12 langzaam_and_traag 1603212 50134 0.0303
# Creating the data matrix 'Adjectives_grouped' containing the grouped
# antonymic pairs:
Adjectives_grouped <- new_percent[, -c(4)]
Adjectives_grouped <- data.matrix(Adjectives_grouped, rownames.force = NA)
rownames(Adjectives_grouped) <- c("dik", "dun", "hoog", "kort", "laag", "lang", "licht",
"snel", "stil", "zwaar", "hard_and_luid", "langzaam_and_traag")
Adjectives_grouped <- Adjectives_grouped[, -c(1)]
Adjectives_grouped
## NoRedupl Redupl
## dik 6618096 60043
## dun 417405 23387
## hoog 2311384 30782
## kort 4007828 15570
## laag 1433535 114362
## lang 15985090 290977
## licht 2191823 8237
## snel 17822917 216802
## stil 3232898 230789
## zwaar 3586643 158512
## hard_and_luid 10124707 172601
## langzaam_and_traag 1603212 50134
# Grouped antonymic pair 1: hard & luid vs. stil:
g.ant_p_1 <- Adjectives_grouped[c(11, 9), ]
# Chi squared test for 'g.ant_p_1':
chisq.test(g.ant_p_1, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: g.ant_p_1
## X-squared = 226529, df = 1, p-value < 2.2e-16
# Fisher's exact test for 'g.ant_p_1':
fisher.test(g.ant_p_1)
##
## Fisher's Exact Test for Count Data
##
## data: g.ant_p_1
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 4.161144 4.214733
## sample estimates:
## odds ratio
## 4.187187
# Grouped antonymic pair 2: langzaam & traag vs. snel:
g.ant_p_2 <- Adjectives_grouped[c(12, 8), ]
# Chi squared test for 'g.ant_p_2':
chisq.test(g.ant_p_2, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: g.ant_p_2
## X-squared = 37952, df = 1, p-value < 2.2e-16
# Fisher's exact test for 'g.ant_p_2':
fisher.test(g.ant_p_2)
##
## Fisher's Exact Test for Count Data
##
## data: g.ant_p_2
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.3851732 0.3928268
## sample estimates:
## odds ratio
## 0.3890051
In order to obtain a better understanding of the reduplicated letters in the recorded data, two plotting processes are followed. The first plotting process involves the grouping of the adjectives under examination into the corresponding adjectival dimension, i.e., the large vs. the small dimension, and produces a graph that compares letter reduplications across the two adjectival dimensions. In the second plotting process adjectives are examined in pairs. Moreover, they are classified into six categories, namely elevation, height, loudness, speed, weight, and width, each of which corresponds to the property expressed by each of the examined adjectival pairs. The resultant plot displays the mean letter reduplications recorded for both the large and the small adjectival dimension across all six categories. The following two sections offer a detailed account of each plotting process.
Some of the recorded adjectival tokens displayed both vowel and
consonant reduplications. In such cases, letter reduplications were
recorded as fractions in which the numerator indicated the letter that
was reduplicated first and the denominator the one that was reduplicated
second. For instance, the letter reduplications for the token laaaannge
were represented as 3/1, where, relative to the orthographic norm of the
word, 3 corresponded to the three additional instances of the letter a,
and 1 to the single additional instance of the letter n. For the
purposes of both plotting processes, however, the letter reduplications
per adjectival token need to be treated as sums. The sumslash function
collapses the separate counts into a single integer corresponding to the
total number of reduplicated letters per adjectival token:
sumslash <- function(strg) {
  # Split the entry on "/" and add up the parts, so that "3/1" yields 4 and
  # "2" yields 2; counts with more than one digit (e.g. "10/2") are also handled
  sum(as.numeric(str_split(as.character(strg), fixed("/"))[[1]]))
}
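To illustrate the intended behaviour on a few entries, the following sketch implements the same summing idea in base R and applies it to a whole vector at once. The function name sum_redup_parts is hypothetical and is not part of the analysis code.

```r
# Hypothetical vectorised variant of the summing step (base R only)
sum_redup_parts <- function(x) {
  # Split every entry on "/" into its numeric parts
  parts <- strsplit(as.character(x), "/", fixed = TRUE)
  # Add up the parts of each entry
  vapply(parts, function(p) sum(as.numeric(p)), numeric(1))
}

sum_redup_parts(c("0", "1", "3/1"))
# returns 0 1 4
```

The entry "3/1", recorded for the token laaaannge, is thus counted as four reduplicated letters.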
Having resolved this issue, the initial du_data dataframe is copied
under the new name du_data_mod. The column redsize, which contains the
total number of reduplicated letters per adjectival token, is then
appended to the new dataframe:
# Copying the 'du_data' dataframe to 'du_data_mod':
du_data_mod <- du_data
# Appending the column 'redsize' to the new dataframe and applying the
# 'sumslash' function:
du_data_mod$redsize <- du_data_mod$Reduplication_size
du_data_mod$redsize <- sapply(du_data_mod$redsize, sumslash)
head(du_data_mod)
## Frequency N.gram Adjective Reduplication Reduplication_size Un.inflected
## 1 11659013 lang lang 0 0 U
## 2 3310352 lange lang 0 0 I
## 3 563227 Lang lang 0 0 U
## 4 338996 Lange lang 0 0 I
## 5 94753 LANG lang 0 0 U
## 6 58386 langg lang 1 1 U
## Vowel__Consonant Capitalisation redsize
## 1 <NA> 0 0
## 2 <NA> 0 0
## 3 <NA> 0 0
## 4 <NA> 0 0
## 5 <NA> 1 0
## 6 C 0 1
The following code chunk adds the column dimension, which labels each adjectival token as belonging to the large or the small adjectival dimension:
du_data_mod$dimension <- sapply(du_data_mod$Adjective, function(a) ifelse(a %in%
c("dun", "stil", "hoog", "kort", "snel", "licht"), "dimsmall", "dimlarge"))
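Because ifelse() and %in% both operate on whole vectors, the same labels can be produced without the sapply() wrapper. A minimal sketch on a hypothetical four-element vector:

```r
# Hypothetical example vector of adjectives
adjective_demo <- c("lang", "kort", "stil", "zwaar")

# Adjectives assigned to the small dimension in this appendix
small_adjectives <- c("dun", "stil", "hoog", "kort", "snel", "licht")

# Vectorised labelling: every adjective not listed above counts as large
dimension_demo <- ifelse(adjective_demo %in% small_adjectives,
                         "dimsmall", "dimlarge")
dimension_demo
# returns "dimlarge" "dimsmall" "dimsmall" "dimlarge"
```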
In order to plot the letter reduplications across the two adjectival
dimensions, the smaller dataframe forana is created. This
dataframe comprises the reduplicated adjectival tokens, which are now
dimensionally defined, along with the number of their letter
reduplications:
# Creating the smaller dataframe 'forana':
forana <- du_data_mod %>%
select(dimension, redsize) %>%
filter(redsize > 0)
head(forana)
## dimension redsize
## 1 dimlarge 1
## 2 dimlarge 3
## 3 dimlarge 2
## 4 dimlarge 3
## 5 dimlarge 2
## 6 dimlarge 2
# Identifying the large- and small-dimension adjectives in the 'forana'
# dataframe:
bigadjectives <- subset(forana, dimension == "dimlarge")$redsize
smalladjectives <- subset(forana, dimension == "dimsmall")$redsize
As the code chunk below demonstrates, the graph of this plotting
process is generated by passing the forana dataframe to the
function ggplot:
ggplot(forana, aes(x = dimension, y = redsize)) + geom_boxplot() + theme_classic()
As a next step, an F-test is performed to compare the variances
between the two samples, i.e., between bigadjectives and
smalladjectives:
# F-test:
var.test(bigadjectives, smalladjectives)
##
## F test to compare two variances
##
## data: bigadjectives and smalladjectives
## F = 1.9794, num df = 499, denom df = 225, p-value = 1.171e-08
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 1.576149 2.462196
## sample estimates:
## ratio of variances
## 1.979384
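The F statistic reported by var.test() is the ratio of the two sample variances, and each degrees-of-freedom value is the corresponding sample size minus one (so 499 and 225 above correspond to 500 and 226 observations). The sketch below verifies this relation on simulated data. The vectors x_demo and y_demo are random draws, not the study's reduplication counts.

```r
set.seed(1)  # simulated data, for illustration only
x_demo <- rnorm(50, mean = 4, sd = 3)
y_demo <- rnorm(30, mean = 4, sd = 2)

ft_demo <- var.test(x_demo, y_demo)

# The F statistic is the ratio of the two sample variances
f_manual <- var(x_demo) / var(y_demo)

# Degrees of freedom are the sample sizes minus one
c(F = unname(ft_demo$statistic), manual = f_manual,
  num_df = length(x_demo) - 1, denom_df = length(y_demo) - 1)
```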
The F-test returns a p-value much lower than the conventional significance level of 0.05 (p-value = 1.171e-08, \(df_n = 499\), \(df_d = 225\)), which indicates that the two tested samples are heteroscedastic, i.e., their variances differ significantly. For this reason, a Welch t-test is performed. The Welch t-test assumes that the tested samples’ variances are unequal, and it is conducted in order to identify whether the difference between the means of the two tested samples is significant:
# Welch t-test:
test <- t.test(bigadjectives, smalladjectives, var.equal = FALSE)
test
##
## Welch Two Sample t-test
##
## data: bigadjectives and smalladjectives
## t = 3.7443, df = 593.5, p-value = 0.0001985
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.4073823 1.3061752
## sample estimates:
## mean of x mean of y
## 4.432000 3.575221
The resultant p-value is lower than 0.05 (p-value = 0.0001985, df = 593.5), which suggests that the two samples differ significantly with respect to their means.
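The non-integer degrees of freedom arise from the Welch-Satterthwaite approximation. The sketch below recomputes the t statistic and the degrees of freedom by hand and compares them with the output of t.test(). It uses simulated data (x_demo and y_demo are random draws, not the study's samples).

```r
set.seed(2)  # simulated data, for illustration only
x_demo <- rnorm(50, mean = 4.4, sd = 3)
y_demo <- rnorm(30, mean = 3.6, sd = 2)

wt_demo <- t.test(x_demo, y_demo, var.equal = FALSE)

# Squared standard error of each sample mean
vx <- var(x_demo) / length(x_demo)
vy <- var(y_demo) / length(y_demo)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(x_demo) - mean(y_demo)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (vx + vy)^2 /
  (vx^2 / (length(x_demo) - 1) + vy^2 / (length(y_demo) - 1))

c(t = unname(wt_demo$statistic), t_manual = t_manual,
  df = unname(wt_demo$parameter), df_manual = df_manual)
```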
As displayed in the code chunk below, the mean and standard deviation
of the number of reduplicated letters are calculated per adjectival
type. These summary statistics are assigned to the existing name forana
and thus overwrite its previous content. Furthermore, two columns are
added to the new forana tibble: the column dimension, which matches each
adjectival type with the relevant dimension, i.e., the large or the
small dimension, and the column kind, which specifies the category to
which each adjectival type belongs, i.e., elevation, height, loudness,
speed, weight or width:
# Overwriting the existing 'forana' dataframe with the summarised data:
forana <- du_data_mod %>%
filter(redsize > 0) %>%
group_by(Adjective) %>%
summarise(mean = mean(redsize), sd = sd(redsize))
# Identifying the dimension of each adjectival type and adding the resultant
# observations to the dataframe 'forana' under the column 'dimension':
forana$dimension <- sapply(forana$Adjective, function(a) ifelse(a %in% c("dun", "stil",
"hoog", "kort", "snel", "licht"), "dimsmall", "dimlarge"))
# Assigning the relevant category to each adjectival type and adding the
# resultant observations to the dataframe 'forana' under the column 'kind':
forana$kind <- c("width", "width", "loudness", "elevation", "height", "elevation",
"height", "speed", "weight", "loudness", "speed", "loudness", "speed", "weight")
forana
## # A tibble: 14 × 5
## Adjective mean sd dimension kind
## <chr> <dbl> <dbl> <chr> <chr>
## 1 dik 3.04 1.58 dimlarge width
## 2 dun 2.45 1.41 dimsmall width
## 3 hard 3.74 2.72 dimlarge loudness
## 4 hoog 2.71 1.81 dimsmall elevation
## 5 kort 2.25 1.26 dimsmall height
## 6 laag 3.56 2.66 dimlarge elevation
## 7 lang 5.60 4.42 dimlarge height
## 8 langzaam 4.11 2.98 dimlarge speed
## 9 licht 1.77 1.36 dimsmall weight
## 10 luid 1.25 0.5 dimlarge loudness
## 11 snel 4.5 2.75 dimsmall speed
## 12 stil 4.26 2.66 dimsmall loudness
## 13 traag 4.80 3.70 dimlarge speed
## 14 zwaar 4.67 3.36 dimlarge weight
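The kind column above is filled by position, so it depends on the alphabetical order of the adjectives produced by summarise(). A named lookup vector is one way to make the assignment independent of row order. The sketch below is an assumption about a convenient alternative, with the mapping copied from the table above and the names kind_lookup and adjective_demo being hypothetical.

```r
# Category of each adjective, copied from the 'forana' table above
kind_lookup <- c(
  dik = "width", dun = "width",
  hard = "loudness", luid = "loudness", stil = "loudness",
  hoog = "elevation", laag = "elevation",
  kort = "height", lang = "height",
  langzaam = "speed", snel = "speed", traag = "speed",
  licht = "weight", zwaar = "weight"
)

# Hypothetical adjectives in arbitrary order
adjective_demo <- c("zwaar", "dik", "snel")

# Look up the category by name rather than by row position
kind_demo <- unname(kind_lookup[adjective_demo])
kind_demo
# returns "weight" "width" "speed"
```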
The graph of this plotting process is created by passing the
redefined forana tibble to the function ggplot
as indicated below:
ggplot(forana, aes(x = kind, y = mean, group = Adjective)) +
  geom_bar(aes(fill = dimension), stat = "identity", position = "dodge", colour = "black") +
  geom_text(aes(label = Adjective), position = position_dodge(width = 0.9), vjust = -0.25, size = 3) +
  theme_classic() +
  scale_fill_manual("dimension", values = c(dimlarge = "steelblue", dimsmall = "goldenrod"))
ggsave("Category_comparison.pdf", width = 6, height = 4)