For this experiment, we recruited 15 native speakers of Xhosa and 15 native speakers of Afrikaans. Each participant described 25 sung tone pairs to a confederate matcher.
##
## Afrikaans Xhosa
## 556 190
In the figure below, we find that ‘height’ is dominant in both languages. ‘Size’ is only used in Xhosa, though less than expected on the basis of our pilot data.
In the table and figure below, we see that Xhosa speakers gestured more when using metaphors in speech. Within the two groups, the by-metaphor gesture rates were comparable.
Gestures were coded for dimension (in terms of movement and location) and handshape (i.e. flat hand, “grip”). Speech-gesture pairs were then coded as “yes” (convergent), “no” (divergent, e.g. ‘size’ in speech with vertical gestures), “mixed” (both ‘height’ and ‘size’ mappings expressed in gesture) or “n/a” (gestures not clearly expressing spatial mappings).
For this experiment, we recruited 30 native speakers of Xhosa and 30 native speakers of Afrikaans.
Participants performed an RT task targeting implicit space-pitch associations by pairing circles differing in vertical position (high/low, height condition) or size (small, big) with a high/low pitched voice. Participants were asked to indicate whether the sound in each trial was high or low-pitched with button presses.
There were 16 blocks (8 for each condition), with 20 trials in each (320 trials per participant). The order of blocks was randomised.
Stimulus pairs were presented for 200 msec and participants used button presses to indicate whether the sound was high- or low-pitched.
Half of the stimulus pairs were “incongruent” in terms of the space-pitch mapping.
In experiment one, Afrikaans speakers described pitch in terms of ‘height’, whereas Xhosa speakers used ‘size’ in addition to ‘height’. Based on these findings and previous work, we would expect the following:
We thus expected a three-way interaction effect between language, condition and congruence.
RT data are generally difficult to handle for a number of reasons, and the literature proposes a number of procedures to trim and filter the data prior to statistical analyses. In the next sections, we go through each step discussed in the relevant literature.
Ideally, we would want to remove data from participants performing at chance level by pressing buttons at random. To my knowledge, there isn’t a specific threshold for accuracy that’s widely agreed upon for this or similar tasks. However, inspecting the distribution of overall accuracies for each participant, we see that a few participants (n=5) clearly stand out from the rest. Data from these participants are left out in the later analyses.
Another method, used by Abutalebi et al., is to compute confidence intervals and set the threshold at the lower CI for each language group. I tried this, and found that this would have further excluded data from four participants.
In the distribution of the response times below, we see a very long right tail. The slowest response is 28 seconds!
Setting an upper threshold will affect RT estimates, but leaving these extreme values in would have a major influence of its own and make the estimates unreliable.
I therefore propose “mild” initial trimming of the data, excluding responses slower than five seconds, as indicated in the figures below.
I’ve also checked for responses that are faster than 100 msec (which are generally considered to be errors), but found none. The fastest recorded RT is 160 msec.
There appears to be wide agreement that individual thresholds should be set based on the overall mean and standard deviation for each participant, but authors advocate different levels. Baayen & Milin argue that the frequently used limit at 2 (perhaps also 2.5) standard deviations is too aggressive and propose an upper limit of 3 standard deviations above the mean coupled with minimal trimming based on residuals after fitting a model. This is essentially what they call performing “model criticism”.
Following this suggestion, we set the threshold for individual RTs at three SDs above individual means.
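The two trimming steps described above can be sketched as follows. This is a minimal illustration in Python, not the code used for this report: the function name `trim_rts` and the input layout (a dict mapping participant IDs to lists of RTs in msec) are assumptions made for the example, as is the choice to compute the individual 3 SD threshold after the absolute cutoffs have been applied.

```python
from statistics import mean, stdev

def trim_rts(rts_by_participant, lower=100, upper=5000, n_sd=3):
    """Apply absolute cutoffs, then a per-participant mean + n_sd * SD cutoff."""
    trimmed = {}
    for pid, rts in rts_by_participant.items():
        # Step 1: drop anticipations (< lower) and extreme values (> upper).
        kept = [rt for rt in rts if lower <= rt <= upper]
        # Step 2: drop responses slower than the participant's own threshold.
        if len(kept) > 1:
            threshold = mean(kept) + n_sd * stdev(kept)
            kept = [rt for rt in kept if rt <= threshold]
        trimmed[pid] = kept
    return trimmed

# Hypothetical participant: 20 ordinary responses, one slow response
# and one 28-second response.
example = {"p01": [500 + 10 * i for i in range(20)] + [2400, 28000]}
result = trim_rts(example)
print(len(example["p01"]), "->", len(result["p01"]))
```

Note that a single outlier can only exceed a mean + 3 SD threshold when a participant contributes enough trials, since the outlier itself inflates both the mean and the SD.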
The table below indicates the amount of data removed in each step. The amount of discarded data seems reasonable and in line with what is generally considered acceptable. Note that the majority of the discarded data is due to poor performance.
The classical central tendency approach to analysing RTs is using ANOVAs on the by-participant aggregated means. However, this technique assumes normally distributed data, in which case participant means would offer a reliable summary of the data. The problem is that RTs rarely follow a normal distribution, but rather a positively skewed distribution resembling an ex-Gaussian distribution characterized by a long right tail. As the below plot shows, this is also the case here.
To deal with this issue, we’ll try out the following approaches:
In this approach, we compute the three parameters, \(\mu\) (mu), \(\sigma\) (sigma) and \(\tau\) (tau), that describe an ex-Gaussian distribution, which itself is a convolution of a normal and an exponential distribution. The idea is that potentially interesting effects may hide in the long right tail (\(\tau\) component), which is ignored in standard central tendency tests. \(\mu\) and \(\sigma\) reflect the mean and standard deviation of the Gaussian component, whereas \(\tau\) reflects both the mean and the standard deviation of the exponential component (the two coincide for an exponential distribution).
This procedure has been used e.g. by Abutalebi et al. and Calabria et al.
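To make the role of the three parameters concrete, here is a minimal sketch of one way to estimate them, using the method of moments. This is an assumption for illustration only: the report does not state which estimation routine was used, and maximum-likelihood fitting is a common alternative. The function name `exgauss_moments` and the simulated values (600, 50, 150) are hypothetical. The sketch relies on the ex-Gaussian identities mean = mu + tau, variance = sigma^2 + tau^2 and skewness = 2 tau^3 / (sigma^2 + tau^2)^(3/2).

```python
import random
from statistics import fmean

def exgauss_moments(rts):
    """Method-of-moments estimates (mu, sigma, tau) for an ex-Gaussian."""
    m = fmean(rts)
    var = fmean([(x - m) ** 2 for x in rts])
    s = var ** 0.5
    # Sample skewness (third standardized moment).
    skew = fmean([((x - m) / s) ** 3 for x in rts])
    if skew <= 0:
        raise ValueError("sample is not positively skewed")
    tau = s * (skew / 2) ** (1 / 3)
    mu = m - tau
    # Guard against a slightly negative value caused by sampling noise.
    sigma = max(var - tau ** 2, 0.0) ** 0.5
    return mu, sigma, tau

# Simulated RTs: Gaussian component (mu=600, sigma=50) plus an
# exponential component with mean tau=150.
rng = random.Random(1)
sample = [rng.gauss(600, 50) + rng.expovariate(1 / 150) for _ in range(200000)]
mu, sigma, tau = exgauss_moments(sample)
print(round(mu), round(sigma), round(tau))
```

With few trials per participant and cell, moment-based estimates of \(\sigma\) and \(\tau\) can be unstable, which is one reason to treat the parameter distributions with some caution.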
The figure below shows the distributions of the three parameters.
We already know that the grouped distributions appear to follow ex-Gaussian distributions, but we can go further and inspect distributions for each participant in the figure below, ordered by mean RT.
From this, it is perhaps less clear that the ex-Gaussian distribution provides the best fit for our data.
We can quantify this by creating simulated data based on the aggregated normal and ex-Gaussian parameters for each participant and perform Kolmogorov-Smirnov tests to see whether the real and simulated distributions are significantly different.
The p-values in the below figure tell us how many participants follow an ex-Gaussian vs. a normal distribution. There are more values above .05 in the ex-Gaussian panel. We therefore conclude that this distribution provides a better fit for the majority of our data.
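The comparison of real and simulated distributions can be sketched as below. This is a hedged, standard-library illustration of a two-sample Kolmogorov-Smirnov test, not the routine used for the report: the function name `ks_two_sample` is hypothetical, the p-value uses the usual asymptotic approximation, and the "observed" data are simulated here because the raw RTs are not part of this document.

```python
import math
import random
from statistics import fmean

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through both sorted samples and track the largest ECDF gap.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2 * series, 0.0), 1.0)

rng = random.Random(3)
# Stand-in for one participant's RTs (ex-Gaussian by construction).
observed = [rng.gauss(600, 50) + rng.expovariate(1 / 150) for _ in range(2000)]
m = fmean(observed)
sd = fmean([(x - m) ** 2 for x in observed]) ** 0.5
# Data simulated from a normal distribution with the same mean and SD.
simulated_normal = [rng.gauss(m, sd) for _ in range(2000)]
d_stat, p_value = ks_two_sample(observed, simulated_normal)
print(round(d_stat, 3), p_value < 0.05)
```

A p-value above .05 means the test found no significant difference between the real and the simulated distribution, which is how the panels are read in the text; it does not by itself prove that the simulated distribution is the correct one.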
Before fitting models to our data, we can inspect the following plots which give us a better idea of possible interactions between factors within each parameter. There’s no clear evidence for interactions in the estimated parameters, but there might be main effects of language.
Fitting a full model with all predictors and interactions reveals only a marginally significant effect of language (p = .059), with the Xhosa group showing smaller \(\mu\) values, i.e. shorter RTs in the Gaussian component. No effects reached significance at the .05 level.
##
## Call:
## lm(formula = mu ~ language * condition * congruent, data = params)
##
## Residuals:
## Min 1Q Median 3Q Max
## -131.82 -50.05 -13.16 51.48 253.78
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 592.737 13.468 44.010
## languageXhosa -37.173 19.588 -1.898
## conditionsize -13.829 19.047 -0.726
## congruentTRUE -21.541 19.047 -1.131
## languageXhosa:conditionsize 3.375 27.702 0.122
## languageXhosa:congruentTRUE 18.597 27.702 0.671
## conditionsize:congruentTRUE 27.784 26.936 1.031
## languageXhosa:conditionsize:congruentTRUE -41.386 39.177 -1.056
## Pr(>|t|)
## (Intercept) <2e-16 ***
## languageXhosa 0.0591 .
## conditionsize 0.4686
## congruentTRUE 0.2593
## languageXhosa:conditionsize 0.9031
## languageXhosa:congruentTRUE 0.5027
## conditionsize:congruentTRUE 0.3035
## languageXhosa:conditionsize:congruentTRUE 0.2920
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 72.53 on 212 degrees of freedom
## Multiple R-squared: 0.07613, Adjusted R-squared: 0.04562
## F-statistic: 2.496 on 7 and 212 DF, p-value: 0.01751
As with the \(\mu\) parameter, fitting a full model with all predictors and interactions reveals only a marginally significant effect of language (p = .095).
##
## Call:
## lm(formula = tau ~ language * condition * congruent, data = params)
##
## Residuals:
## Min 1Q Median 3Q Max
## -170.61 -72.48 -20.88 52.21 468.38
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 147.355 19.775 7.451
## languageXhosa 48.309 28.762 1.680
## conditionsize 8.576 27.966 0.307
## congruentTRUE 7.628 27.966 0.273
## languageXhosa:conditionsize 7.118 40.675 0.175
## languageXhosa:congruentTRUE -8.017 40.675 -0.197
## conditionsize:congruentTRUE -11.452 39.550 -0.290
## languageXhosa:conditionsize:congruentTRUE 15.138 57.524 0.263
## Pr(>|t|)
## (Intercept) 2.31e-12 ***
## languageXhosa 0.0945 .
## conditionsize 0.7594
## congruentTRUE 0.7853
## languageXhosa:conditionsize 0.8613
## languageXhosa:congruentTRUE 0.8439
## conditionsize:congruentTRUE 0.7724
## languageXhosa:conditionsize:congruentTRUE 0.7927
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 106.5 on 212 degrees of freedom
## Multiple R-squared: 0.06082, Adjusted R-squared: 0.02981
## F-statistic: 1.961 on 7 and 212 DF, p-value: 0.06172
In sum, we found marginally significant tendencies for Xhosa speakers to produce smaller \(\mu\) and larger \(\tau\). Otherwise, splitting the Gaussian and exponential components in the RTs did not allow us to identify effects that might have been hiding in the right tails.
Calabria et al. also ran correlation analyses on the \(\mu\) and \(\tau\) parameters of their groups. I’m not completely sure if this is truly interesting or relevant, but doing so yields a small, but significant negative correlation of \(\mu\) and \(\tau\) for Afrikaans speakers and no correlation for Xhosa speakers.
We can also think of the dependent variable as the mean difference in RTs in response to congruent and incongruent trials, what Calabria et al. call the conflict effect.
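As a minimal sketch, the conflict effect for one participant in one condition could be computed as follows. The direction of the subtraction (incongruent minus congruent, so that positive values mean slower responses on incongruent trials) is an assumption, as are the function name and the toy data.

```python
from statistics import fmean

def conflict_effect(trials):
    """Mean incongruent RT minus mean congruent RT.

    `trials` is a list of (rt_msec, is_congruent) pairs for one
    participant in one condition (hypothetical layout).
    """
    congruent = [rt for rt, is_congruent in trials if is_congruent]
    incongruent = [rt for rt, is_congruent in trials if not is_congruent]
    return fmean(incongruent) - fmean(congruent)

trials = [(540, True), (560, True), (600, False), (620, False)]
effect = conflict_effect(trials)
print(effect)
```

Because each participant contributes a single value per condition, this measure halves the number of data points relative to the by-cell parameter analyses.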
Below we’ll plot the conflict effect and run the analysis. Note that the congruence variable is contained in the dependent variable, so we fit a model with language and condition as the only factors.
##
## Call:
## lm(formula = conflict ~ language * condition, data = conflict_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -108.086 -22.213 1.279 16.963 144.360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.913 7.616 1.827 0.0706 .
## languageXhosa -10.580 11.078 -0.955 0.3417
## conditionsize -16.332 10.771 -1.516 0.1324
## languageXhosa:conditionsize 26.248 15.666 1.675 0.0968 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 41.02 on 106 degrees of freedom
## Multiple R-squared: 0.02899, Adjusted R-squared: 0.001504
## F-statistic: 1.055 on 3 and 106 DF, p-value: 0.3716
There seems to be too much variability to detect any significant effects, though the interaction lines suggest opposite trends, which is reflected in the marginally significant interaction effect between language and condition (p = .097).
It might be interesting to explore whether there are patterns in the variability of response latencies. We can compute the coefficient of variability for each participant by dividing the individual SDs by the means.
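The computation is simple enough to show in a few lines. This is an illustrative sketch; the function name and the toy values are hypothetical.

```python
from statistics import fmean, stdev

def coefficient_of_variation(rts):
    """Sample SD divided by the mean; unit-free measure of relative spread."""
    return stdev(rts) / fmean(rts)

rts = [500, 600, 700]
cv = coefficient_of_variation(rts)
# Rescaling all RTs (e.g. msec to seconds) leaves the value unchanged.
cv_seconds = coefficient_of_variation([rt / 1000 for rt in rts])
print(round(cv, 4), round(cv_seconds, 4))
```

Dividing by the mean is what makes the measure comparable across participants or groups with different overall response speeds.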
We plot the data and fit a model in the same way as with the conflict effect.
##
## Call:
## lm(formula = lm(coef_var ~ language * condition * congruent,
## data = params))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.18572 -0.06823 -0.00593 0.04981 0.35430
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 0.2329422 0.0178153 13.075
## languageXhosa 0.0418901 0.0259112 1.617
## conditionsize 0.0014727 0.0251946 0.058
## congruentTRUE 0.0020427 0.0251946 0.081
## languageXhosa:conditionsize 0.0141439 0.0366440 0.386
## languageXhosa:congruentTRUE 0.0037418 0.0366440 0.102
## conditionsize:congruentTRUE -0.0067914 0.0356306 -0.191
## languageXhosa:conditionsize:congruentTRUE 0.0007974 0.0518224 0.015
## Pr(>|t|)
## (Intercept) <2e-16 ***
## languageXhosa 0.107
## conditionsize 0.953
## congruentTRUE 0.935
## languageXhosa:conditionsize 0.700
## languageXhosa:congruentTRUE 0.919
## conditionsize:congruentTRUE 0.849
## languageXhosa:conditionsize:congruentTRUE 0.988
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09594 on 212 degrees of freedom
## Multiple R-squared: 0.07045, Adjusted R-squared: 0.03975
## F-statistic: 2.295 on 7 and 212 DF, p-value: 0.02833
From the regression output, we see that no effects reached significance.
The downside of the previous analyses is that they require the data to be aggregated. For this analysis, we’ll follow Baayen & Milin’s suggestions and do the following:
We’ll compare untransformed RTs, a log transformation and an inverse Gaussian transformation to determine which to use.
The values for skewness and kurtosis in the table below suggest that all options result in skewed and kurtotic distributions, but with considerable improvements with transformations.
However, these measures are known to be unreliable with larger samples (n > 200).
Instead, we’ll inspect quantile-quantile plots for the goodness of fit of theoretical distributions. Also shown are the correlation coefficients of the observed and theoretical distributions.
Based on the output, we proceed with the inverse Gaussian transformation. Later model criticism will further improve the goodness of fit.
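The correlation coefficient that accompanies a QQ plot can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the figures: the inverse transformation is taken to be 1000/RT (the report does not spell out its exact form), the plotting positions (i - 0.5)/n are one common convention among several, and the RTs are simulated.

```python
import math
import random
from statistics import NormalDist, fmean

def qq_correlation(values):
    """Pearson correlation between sorted values and normal quantiles."""
    n = len(values)
    observed = sorted(values)
    # Plotting positions (i - 0.5) / n avoid infinite quantiles at 0 and 1.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mo, mt = fmean(observed), fmean(theoretical)
    cov = sum((o - mo) * (t - mt) for o, t in zip(observed, theoretical))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_t = sum((t - mt) ** 2 for t in theoretical)
    return cov / math.sqrt(var_o * var_t)

rng = random.Random(7)
# Simulated, positively skewed RTs in msec (hypothetical parameters).
rts = [rng.gauss(600, 50) + rng.expovariate(1 / 150) for _ in range(5000)]
candidates = {
    "raw": rts,
    "log": [math.log(rt) for rt in rts],
    "inverse": [1000 / rt for rt in rts],  # assumed form of the transformation
}
fit = {name: qq_correlation(values) for name, values in candidates.items()}
for name, r in fit.items():
    print(name, round(r, 4))
```

The closer the coefficient is to 1, the closer the points lie to the straight line in the corresponding QQ plot.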
Below we see QQ-plots for individual participants. It is clear that both groups have a few participants deviating from the expected pattern causing later points to rise above the line in the center panel above.
Before fitting regression models, we’ll plot the data grouped by our independent variables.
We first fit a random intercept model. From the output we can observe a significant three-way interaction between language, condition and congruence (p = .02).
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: RTinv ~ language * condition * congruent + (1 | participant) +
## (1 | item)
## Data: df
##
## REML criterion at convergence: 10686.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2278 -0.5713 0.0162 0.5620 14.7918
##
## Random effects:
## Groups Name Variance Std.Dev.
## participant (Intercept) 4.128e-02 0.203168
## item (Intercept) 5.032e-05 0.007093
## Residual 1.103e-01 0.332078
## Number of obs: 16392, groups: participant, 55; item, 8
##
## Fixed effects:
## Estimate Std. Error df
## (Intercept) 1.457e+00 3.872e-02 5.675e+01
## languageXhosa 7.514e-03 5.587e-02 5.593e+01
## conditionsize 1.229e-02 1.227e-02 8.310e+00
## congruentTRUE 2.619e-02 1.233e-02 8.450e+00
## languageXhosa:conditionsize -1.786e-02 1.482e-02 1.633e+04
## languageXhosa:congruentTRUE -1.666e-02 1.482e-02 1.633e+04
## conditionsize:congruentTRUE -3.397e-02 1.739e-02 8.361e+00
## languageXhosa:conditionsize:congruentTRUE 4.828e-02 2.085e-02 1.633e+04
## t value Pr(>|t|)
## (Intercept) 37.633 <2e-16 ***
## languageXhosa 0.134 0.8935
## conditionsize 1.001 0.3451
## congruentTRUE 2.125 0.0645 .
## languageXhosa:conditionsize -1.205 0.2282
## languageXhosa:congruentTRUE -1.125 0.2607
## conditionsize:congruentTRUE -1.954 0.0849 .
## languageXhosa:conditionsize:congruentTRUE 2.316 0.0206 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) lnggXh cndtns cnTRUE lnggX: lX:TRU c:TRUE
## languageXhs -0.681
## conditionsz -0.159 0.074
## congrntTRUE -0.159 0.074 0.502
## lnggXhs:cnd 0.088 -0.133 -0.552 -0.278
## lnggXh:TRUE 0.088 -0.133 -0.279 -0.556 0.502
## cndtns:TRUE 0.113 -0.052 -0.707 -0.709 0.390 0.395
## lnggX::TRUE -0.063 0.095 0.393 0.396 -0.712 -0.711 -0.556
We then calculate \(R^2\) indicating how well the model fits our data.
## [1] 0.2670519
We then apply “model criticism” following Baayen & Milin’s recommendations. This means minimal trimming: data points with standardized residuals above 2.5 are removed.
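The trimming step can be sketched as below. This is a hedged illustration: the residuals would come from the fitted mixed model, which is not reproduced here, and the use of the absolute standardized residual (removing extreme points in both directions) is an assumption about how the 2.5 cutoff is applied. The function name is hypothetical.

```python
from statistics import fmean

def trim_by_residuals(rows, residuals, cutoff=2.5):
    """Keep rows whose standardized residual lies within +/- cutoff."""
    m = fmean(residuals)
    sd = fmean([(r - m) ** 2 for r in residuals]) ** 0.5
    return [row for row, r in zip(rows, residuals)
            if abs((r - m) / sd) <= cutoff]

# Toy example: 20 well-fitted observations and one badly fitted one.
rows = list(range(21))
residuals = [0.1, -0.1] * 10 + [5.0]
kept = trim_by_residuals(rows, residuals)
print(len(rows), "->", len(kept))
```

The model is then refitted to the retained rows only, so the second set of estimates is based on slightly fewer observations than the first.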
We then refit the model and see that the interaction is still significant.
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: RTinv ~ language * condition * congruent + (1 | participant) +
## (1 | item)
## Data: df2
##
## REML criterion at convergence: 6830.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.87973 -0.61361 0.03524 0.63555 2.98165
##
## Random effects:
## Groups Name Variance Std.Dev.
## participant (Intercept) 4.243e-02 0.20599
## item (Intercept) 8.227e-05 0.00907
## Residual 8.754e-02 0.29587
## Number of obs: 16173, groups: participant, 55; item, 8
##
## Fixed effects:
## Estimate Std. Error df
## (Intercept) 1.448e+00 3.930e-02 5.688e+01
## languageXhosa 9.256e-03 5.642e-02 5.529e+01
## conditionsize 1.180e-02 1.275e-02 6.666e+00
## congruentTRUE 2.993e-02 1.279e-02 6.741e+00
## languageXhosa:conditionsize -1.645e-02 1.329e-02 1.611e+04
## languageXhosa:congruentTRUE -2.742e-02 1.328e-02 1.611e+04
## conditionsize:congruentTRUE -3.451e-02 1.806e-02 6.695e+00
## languageXhosa:conditionsize:congruentTRUE 5.753e-02 1.871e-02 1.611e+04
## t value Pr(>|t|)
## (Intercept) 36.836 < 2e-16 ***
## languageXhosa 0.164 0.87030
## conditionsize 0.925 0.38712
## congruentTRUE 2.340 0.05319 .
## languageXhosa:conditionsize -1.238 0.21588
## languageXhosa:congruentTRUE -2.064 0.03906 *
## conditionsize:congruentTRUE -1.911 0.09947 .
## languageXhosa:conditionsize:congruentTRUE 3.075 0.00211 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) lnggXh cndtns cnTRUE lnggX: lX:TRU c:TRUE
## languageXhs -0.678
## conditionsz -0.163 0.056
## congrntTRUE -0.162 0.056 0.501
## lnggXhs:cnd 0.077 -0.118 -0.474 -0.239
## lnggXh:TRUE 0.078 -0.118 -0.240 -0.478 0.501
## cndtns:TRUE 0.115 -0.040 -0.707 -0.709 0.336 0.339
## lnggX::TRUE -0.055 0.084 0.338 0.340 -0.712 -0.710 -0.478
The \(R^2\) value now indicates a considerably better fit.
## [1] 0.3208119
We’ll also try fitting a model with “maximal random structure” including both random intercepts and slopes as is allowed by the design. We see that the interaction effect is retained in the model.
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## RTinv ~ language * condition * congruent + (1 + condition + congruent |
## participant) + (1 | item)
## Data: df
##
## REML criterion at convergence: 10603.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2198 -0.5668 0.0171 0.5619 14.6442
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## participant (Intercept) 4.357e-02 0.208723
## conditionsize 3.872e-03 0.062228 -0.21
## congruentTRUE 1.397e-03 0.037370 -0.10 -0.09
## item (Intercept) 5.222e-05 0.007226
## Residual 1.090e-01 0.330184
## Number of obs: 16392, groups: participant, 55; item, 8
##
## Fixed effects:
## Estimate Std. Error df
## (Intercept) 1.456e+00 3.973e-02 5.474e+01
## languageXhosa 7.739e-03 5.733e-02 5.387e+01
## conditionsize 1.324e-02 1.689e-02 2.359e+01
## congruentTRUE 2.662e-02 1.419e-02 1.354e+01
## languageXhosa:conditionsize -1.772e-02 2.236e-02 8.650e+01
## languageXhosa:congruentTRUE -1.545e-02 1.788e-02 1.211e+02
## conditionsize:congruentTRUE -3.440e-02 1.744e-02 8.261e+00
## languageXhosa:conditionsize:congruentTRUE 4.615e-02 2.076e-02 1.626e+04
## t value Pr(>|t|)
## (Intercept) 36.649 <2e-16 ***
## languageXhosa 0.135 0.8931
## conditionsize 0.784 0.4408
## congruentTRUE 1.876 0.0823 .
## languageXhosa:conditionsize -0.792 0.4303
## languageXhosa:congruentTRUE -0.864 0.3891
## conditionsize:congruentTRUE -1.972 0.0829 .
## languageXhosa:conditionsize:congruentTRUE 2.223 0.0262 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) lnggXh cndtns cnTRUE lnggX: lX:TRU c:TRUE
## languageXhs -0.682
## conditionsz -0.253 0.148
## congrntTRUE -0.184 0.095 0.289
## lnggXhs:cnd 0.162 -0.239 -0.617 -0.136
## lnggXh:TRUE 0.109 -0.163 -0.143 -0.588 0.235
## cndtns:TRUE 0.110 -0.050 -0.516 -0.619 0.255 0.323
## lnggX::TRUE -0.061 0.092 0.284 0.342 -0.470 -0.586 -0.552
\(R^2\):
## [1] 0.2783126
We apply model criticism and see that the interaction is still significant.
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## RTinv ~ language * condition * congruent + (1 + condition + congruent |
## participant) + (1 | item)
## Data: df3
##
## REML criterion at convergence: 6807.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.81499 -0.60413 0.03325 0.63064 2.97502
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## participant (Intercept) 4.457e-02 0.211109
## conditionsize 3.611e-03 0.060095 -0.23
## congruentTRUE 8.312e-04 0.028831 -0.12 0.22
## item (Intercept) 9.442e-05 0.009717
## Residual 8.685e-02 0.294711
## Number of obs: 16180, groups: participant, 55; item, 8
##
## Fixed effects:
## Estimate Std. Error df
## (Intercept) 1.446e+00 4.030e-02 5.544e+01
## languageXhosa 1.064e-02 5.778e-02 5.369e+01
## conditionsize 1.499e-02 1.729e-02 1.695e+01
## congruentTRUE 3.141e-02 1.428e-02 8.598e+00
## languageXhosa:conditionsize -2.008e-02 2.096e-02 8.247e+01
## languageXhosa:congruentTRUE -2.633e-02 1.537e-02 1.354e+02
## conditionsize:congruentTRUE -3.750e-02 1.869e-02 6.429e+00
## languageXhosa:conditionsize:congruentTRUE 6.175e-02 1.866e-02 1.605e+04
## t value Pr(>|t|)
## (Intercept) 35.868 < 2e-16 ***
## languageXhosa 0.184 0.854607
## conditionsize 0.867 0.398196
## congruentTRUE 2.199 0.056791 .
## languageXhosa:conditionsize -0.958 0.340960
## languageXhosa:congruentTRUE -1.713 0.089030 .
## conditionsize:congruentTRUE -2.006 0.088469 .
## languageXhosa:conditionsize:congruentTRUE 3.309 0.000937 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) lnggXh cndtns cnTRUE lnggX: lX:TRU c:TRUE
## languageXhs -0.677
## conditionsz -0.268 0.140
## congrntTRUE -0.197 0.080 0.408
## lnggXhs:cnd 0.165 -0.246 -0.564 -0.179
## lnggXh:TRUE 0.107 -0.161 -0.202 -0.499 0.360
## cndtns:TRUE 0.116 -0.037 -0.540 -0.657 0.205 0.282
## lnggX::TRUE -0.054 0.082 0.248 0.304 -0.450 -0.612 -0.460
The new \(R^2\) indicates a much better fit.
## [1] 0.3297167
In the plots below, we see that there are no clear patterns in RTs over trials or over the duration of the experiments.
Still, in this very fast-paced experiment with many trials, it might be the case that response latencies are dependent on RTs in previous trials, particularly at lag\(_{t-1}\).
We therefore need to check for and possibly control for the RT variable being correlated with itself.
However, the following autocorrelation plots with a subsample of thirty participants suggest that there’s no evidence for significant autocorrelation between RTs for most of the participants. We can therefore leave RTs at t-1 out of the model.
## Quantiles to be plotted:
## 0% 3.448276% 6.896552% 10.34483% 13.7931%
## -0.586671730 -0.397174279 -0.321054119 -0.277102594 -0.237523948
## 17.24138% 20.68966% 24.13793% 27.58621% 31.03448%
## -0.207245033 -0.182002424 -0.158051947 -0.130596415 -0.113514347
## 34.48276% 37.93103% 41.37931% 44.82759% 48.27586%
## -0.097340640 -0.065118054 -0.041524692 -0.022997554 -0.002101595
## 51.72414% 55.17241% 58.62069% 62.06897% 65.51724%
## 0.023375358 0.045085477 0.062544878 0.083833946 0.105303793
## 68.96552% 72.41379% 75.86207% 79.31034% 82.75862%
## 0.131932183 0.153693915 0.178259327 0.202648323 0.238750555
## 86.2069% 89.65517% 93.10345% 96.55172% 100%
## 0.271158576 0.311439933 0.369299480 0.432781245 0.705182612
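For reference, the quantity behind such autocorrelation plots can be sketched as follows. This is a minimal standard-library illustration, not the plotting routine that produced the output above; the function names are hypothetical, and the ±1.96/√n band is the usual approximate 95% bound for a series without autocorrelation.

```python
import math
from statistics import fmean

def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    m = fmean(series)
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / denom

def white_noise_bound(n):
    """Approximate 95% bound for the autocorrelation of an uncorrelated series."""
    return 1.96 / math.sqrt(n)

# Toy series: a strictly alternating pattern is strongly negatively
# autocorrelated at lag 1; a steady drift is strongly positive.
alternating = [1, -1] * 50
drifting = list(range(100))
r_alt = autocorrelation(alternating)
r_drift = autocorrelation(drifting)
print(round(r_alt, 2), round(r_drift, 2), round(white_noise_bound(100), 3))
```

If a participant's lag-1 value fell clearly outside the band, the RT on the preceding trial could be added to the model as a covariate.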
Methods:
The three-way interaction suggests that:
Xhosa speakers appear to have slightly slower responses overall, but this may be due to greater variability, since, as we saw in the ex-Gaussian analysis, they actually have smaller \(\mu\), but larger \(\tau\) values compared with the Afrikaans group.
Whereas experiment 2 was designed to test implicit associations between space and pitch, experiment 3 targets explicit associations. Participants chose how to match high/low pitch with high vs. low position and with small vs. big size and, in a mixed condition, chose between the height and size mappings.
We recruited 30 native speakers of Xhosa and 30 native speakers of Afrikaans for experiment 3.
Participants performed a two-alternative forced-choice task targeting explicit space-pitch associations by pairing pitch with circles differing in vertical position (high/low, height condition) or size (small, big). In a third condition (or trial type), participants had to choose between mapping pitch to either a high/low or a small/big circle.
There were 40 trials per participant. The order of trial types was randomised.
We also recorded RTs in each trial.
We again use mixed-effects regression to analyse the data.
The plot and output below indicate that Xhosa speakers, when given the possibility of pairing either visual height or size with pitch, are less likely to select size than Afrikaans speakers.
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: choiceDimension ~ voice * language + (1 | participant)
## Data: df_m
##
## AIC BIC logLik deviance df.resid
## 587.4 609.4 -288.7 577.4 592
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.0581 -0.5260 0.2646 0.4567 3.4811
##
## Random effects:
## Groups Name Variance Std.Dev.
## participant (Intercept) 2.32 1.523
## Number of obs: 597, groups: participant, 61
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.3472 0.3720 3.622 0.000293 ***
## voicelow 1.0147 0.3516 2.886 0.003899 **
## languageXhosa -2.1798 0.5143 -4.238 2.25e-05 ***
## voicelow:languageXhosa 1.4295 0.4846 2.950 0.003180 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) voiclw lnggXh
## voicelow -0.358
## languageXhs -0.731 0.255
## vclw:lnggXh 0.294 -0.709 -0.418
We find an interaction effect between language and voice frequency.
Contrary to our expectations, Afrikaans speakers were more likely to pair small circles with high-pitched voices rather than circles with a high position.
The opposite pattern was found for Xhosa speakers, who preferred the ‘height’ mapping.
For low-pitched voices, both groups consistently favoured the ‘size’ mapping.
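To read the fixed effects of the mixed-condition model on the probability scale, the log-odds can be converted with the inverse logit. The sketch below copies the four estimates from the model output above. It assumes that the modelled outcome is choosing ‘size’ (the output does not state which level of `choiceDimension` is coded as 1), and the resulting probabilities describe a typical participant, i.e. one whose random intercept is zero.

```python
import math

def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-x))

# Fixed-effect estimates from the mixed-condition model output.
coef = {
    "intercept": 1.3472,
    "voicelow": 1.0147,
    "languageXhosa": -2.1798,
    "voicelow:languageXhosa": 1.4295,
}

def predicted_probability(low_voice, xhosa):
    """Probability of the outcome coded as 1 (assumed: choosing 'size')."""
    eta = coef["intercept"]
    if low_voice:
        eta += coef["voicelow"]
    if xhosa:
        eta += coef["languageXhosa"]
    if low_voice and xhosa:
        eta += coef["voicelow:languageXhosa"]
    return inv_logit(eta)

for xhosa in (False, True):
    for low_voice in (False, True):
        label = ("Xhosa" if xhosa else "Afrikaans",
                 "low" if low_voice else "high")
        print(label, round(predicted_probability(low_voice, xhosa), 2))
```

Under these assumptions, only the Xhosa group with high-pitched voices falls below 0.5, which matches the pattern described in the text.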
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: choiceName ~ voice * language + (1 + voice | participant)
## Data: df_h
##
## AIC BIC logLik deviance df.resid
## 735.6 766.4 -360.8 721.6 600
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2522 -0.6824 0.2859 0.5909 2.2376
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## participant (Intercept) 4.828 2.197
## voicelow 8.511 2.917 -0.94
## Number of obs: 607, groups: participant, 61
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8408 0.4730 -1.778 0.07547 .
## voicelow 1.6184 0.6233 2.596 0.00942 **
## languageXhosa 0.7860 0.6648 1.182 0.23710
## voicelow:languageXhosa -1.3244 0.8702 -1.522 0.12804
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) voiclw lnggXh
## voicelow -0.892
## languageXhs -0.714 0.637
## vclw:lnggXh 0.641 -0.717 -0.897
In the ‘height’ condition, we find a significant effect for voice frequency. The trend further suggests an interaction such that Afrikaans speakers are more consistent in mapping high/low voices to high/low circles.
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: choiceName ~ voice * language + (1 | participant)
## Data: df_s
##
## AIC BIC logLik deviance df.resid
## 577.1 599.1 -283.5 567.1 603
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0099 -0.4562 -0.2429 0.3899 3.8936
##
## Random effects:
## Groups Name Variance Std.Dev.
## participant (Intercept) 0.9321 0.9654
## Number of obs: 608, groups: participant, 61
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.8822 0.2976 6.324 2.55e-10 ***
## voicelow -4.2257 0.4014 -10.527 < 2e-16 ***
## languageXhosa -2.1984 0.3913 -5.619 1.92e-08 ***
## voicelow:languageXhosa 2.5268 0.4982 5.072 3.93e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) voiclw lnggXh
## voicelow -0.546
## languageXhs -0.774 0.436
## vclw:lnggXh 0.417 -0.769 -0.482
In the ‘size’ condition, we see a clear interaction between language and voice frequency.
The two language groups showed high agreement in pairing low pitch with big, rather than small circles.
Interestingly, Xhosa speakers also showed a slight preference for pairing high pitch with big circles, whereas Afrikaans speakers more consistently paired high pitch with small circles.
Finally, as large RTs might be indicative of uncertainty, we’ll examine whether there are significant differences in RTs related to the independent variables.
The data has been trimmed to only include responses faster than 20 seconds.
Not shown here are QQ-plots and correlation coefficients indicating that the log transformation provides the best fit for our data. We will also apply model criticism and only examine the final model.
As the condition variable has more than two levels for comparison, we’ll base our analysis on an anova table summarizing the regression output.
The interaction plots and anova table reveal significant main effects of language, voice frequency and condition with no interaction effects. Xhosa speakers generally took longer in selecting visual stimuli, which was to be expected on the basis of their lower consistency in all conditions.
Interestingly, our expectation that the mixed condition would give rise to the longest RTs is not supported by the results. Instead the ‘height’ condition consistently gave rise to slower decisions despite being the more dominant mapping in speech.
This experiment yielded some very interesting and surprising results.
Overall, Xhosa speakers were less consistent in how they mapped space to pitch, both when choosing visual stimuli of opposite spatial polarities, and when choosing between height and size.
Surprisingly, Afrikaans speakers consistently preferred the size mapping for both high and low-pitched voices, whereas Xhosa speakers showed a preference for ‘height’ in cases with high pitch, and ‘size’ in cases with low pitch.
Another striking finding was that, in the size condition, Xhosa speakers showed a preference for mapping big circles to both low pitch and high pitch, though to a lesser extent.
There appears to be more individual variation for Xhosa speakers in this task, both in choosing between mappings, as well as choosing polarity correspondences within particular mappings.
In this experiment, linguistic metaphors proved to be poor predictors of non-linguistic choices with regard to spatial mappings.
In the series of experiments, we examined spatial metaphors for pitch from three angles: language production, performance in a nonverbal implicit association task and nonverbal judgements in an explicit association task.
Our language production findings show that ‘height’ is used in Afrikaans, whereas Xhosa speakers also used ‘size’. Interestingly, and in line with our findings from a previous study, vertical gestures frequently accompany ‘height’ but also ‘size’ metaphors. The speech material does not allow us to determine whether gestural indications of height might refer to the physical size/height of a person likely to produce the heard sounds.
In the RT task, we found a three-way interaction that followed our predictions. However, methods requiring data aggregation failed to detect this effect. I would suggest reporting the statistically more sophisticated mixed-effects regression model with maximal random structure, but perhaps also noting that an ex-Gaussian approach failed to detect any significant effects. As the last interaction plot indicates, the effect is far from spectacular (as opposed to previous findings in the literature).
The results from the third experiment seem more puzzling. We did not expect Afrikaans speakers to consistently map pitch to ‘size’. Nor did we expect Xhosa speakers to choose ‘height’/‘size’ depending on the pitch of the voice.
In the height and size conditions, Afrikaans speakers paired pitch with high/low/big/small as expected, whereas Xhosa speakers were much less consistent, with the exception that they had a clear preference for pairing “low” pitch with big circles as opposed to small circles. Further studies with other types of stimuli might shed light on the latter findings.
As I see it, our findings point in different directions and only support the very general idea that the conceptualisation of pitch is, to some extent, spatial, but flexible. The factorial designs demonstrated different effects of manipulating visual size and height for the two groups, but only in the case of mixed-effects regression. The role of language in shaping the conceptualisation of pitch is unclear, but far from deterministic. This is particularly evident from the contradictory findings in experiment 3.
Abutalebi, J., Guidi, L., Borsa, V., Canini, M., Della Rosa, P. A., Parris, B. A., & Weekes, B. S. (2015). Bilingualism provides a neural reserve for aging populations. Neuropsychologia, 69, 201–210. https://doi.org/10.1016/j.neuropsychologia.2015.01.040
Baayen, H. R., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12. https://doi.org/10.21500/20112084.807
Calabria, M., Hernandez, M., Martin, C. D., & Costa, A. (2011). When the Tail Counts: The Advantage of Bilingualism Through the Ex-Gaussian Distribution Analysis. Frontiers in Psychology, 2. https://doi.org/10.3389/fpsyg.2011.00250
Henriquez-Henriquez, M. P., Billeke, P., Henriquez, H., Zamorano, F. J., Rothhammer, F., & Aboitiz, F. (2015). Intra-Individual Response Variability Assessed by Ex-Gaussian Analysis may be a New Endophenotype for Attention-Deficit/Hyperactivity Disorder. Frontiers in Psychiatry, 5. https://doi.org/10.3389/fpsyt.2014.00197
Lachaud, C. M., & Renaud, O. (2011). A tutorial for analyzing human reaction times: How to filter data, manage missing values, and choose a statistical model. Applied Psycholinguistics, 32(02), 389–416. https://doi.org/10.1017/S0142716410000457
Marsden, E., Thompson, S., & Plonsky, L. (2018). A methodological synthesis of self-paced reading in second language research. Applied Psycholinguistics, 39(05), 861–904. https://doi.org/10.1017/S0142716418000036
Ratcliff, R. (n.d.). Methods for Dealing With Reaction Time Outliers, 23.
Whelan, R. (2010). Effective analysis of reaction time data. The Psychological Record, 58(3). Retrieved from https://opensiuc.lib.siu.edu/tpr/vol58/iss3/9