Experiment 1: Director-Matcher Task

Methods

For this experiment, we recruited 15 native speakers of Xhosa and 15 native speakers of Afrikaans. Each participant described 25 sung tone pairs to a confederate matcher.

Hypotheses

Afrikaans speakers will mainly use the ‘height’ metaphor in speech.
Xhosa speakers will use both ‘height’ and ‘size’ flexibly.
Based on previous findings, we expect gestures accompanying ‘height’ to consistently converge with the spatial mappings invoked by expressions like “high” and “low”.
Conversely, we expect gestures accompanying ‘size’ in speech to indicate ‘size’ to a lesser extent, and sometimes reveal spatial mappings consistent with ‘height’ metaphors.

Speech results

Number of metaphorical expressions used in speech

## 
## Afrikaans     Xhosa 
##       556       190

Weighted mean distributions of metaphors in speech

In the figure below, we find that ‘height’ is dominant in both languages. ‘Size’ is only used in Xhosa, though less than expected on the basis of our pilot data.

Gesture frequency

In the table and figure below, we see that Xhosa speakers gestured more when using metaphors in speech. Within the two groups, the by-metaphor gesture rates were comparable.

Proportion of metaphorical expressions with co-speech gestures

Gesture frequency by metaphor

Speech-gesture convergence

Here, we only consider the spatial metaphors ‘height’ and ‘size’.

Gestures were coded for dimension (in terms of movement and location) and handshape (i.e. flat hand, “grip”). Speech-gesture pairs were then coded as “yes” (convergent), “no” (divergent, e.g. ‘size’ in speech with vertical gestures), “mixed” (both ‘height’ and ‘size’ mappings expressed in gesture) or “n/a” (gestures not clearly expressing spatial mappings).

Summary

We see that in both languages, the ‘height’ metaphor is consistently accompanied by gestures indicating a vertical space-pitch mapping.
As predicted, this was also the case with ‘size’ metaphors in Xhosa, which were mostly accompanied by similar gestures expressing verticality.

Experiment 2: Implicit Associations Task

Methods

For this experiment, we recruited 30 native speakers of Xhosa and 30 native speakers of Afrikaans.

Participants performed an RT task targeting implicit space-pitch associations by pairing circles differing in vertical position (high/low, height condition) or size (small/big, size condition) with a high- or low-pitched voice. Participants were asked to indicate with button presses whether the sound in each trial was high- or low-pitched.

There were 16 blocks (8 for each condition), with 20 trials in each (320 trials per participant). The order of blocks was randomised.

Stimulus pairs were presented for 200 msec and participants used button presses to indicate whether the sound was high- or low-pitched.

Half of the stimulus pairs were “incongruent” in terms of the space-pitch mapping.

Hypotheses

In experiment one, Afrikaans speakers described pitch in terms of ‘height’, whereas Xhosa speakers used ‘size’ in addition to ‘height’. Based on these findings and previous work, we would expect the following:

Afrikaans speakers’ RTs would be slower (in response to incongruent stimuli) in the ‘height’ condition than in the ‘size’ condition.
Xhosa speakers’ RTs would be slower in response to incongruent stimuli in both conditions.

We thus expected a three-way interaction effect between language, condition and congruence.

Trimming and filtering the data

RT data are generally difficult to handle for a number of reasons, and the literature proposes a number of procedures to trim and filter the data prior to statistical analyses. In the next sections, we go through each step discussed in the relevant literature.

Accuracy

Ideally, we would want to remove data from participants performing at chance level by pressing buttons at random. To my knowledge, there isn’t a specific threshold for accuracy that’s widely agreed upon for this or similar tasks. However, inspecting the distribution of overall accuracies for each participant, we see that a few participants (n=5) clearly stand out from the rest. Data from these participants are left out in the later analyses.

Another method, used by Abutalebi et al., is to compute confidence intervals and set the threshold at the lower bound of the CI for each language group. I tried this and found that it would have excluded data from four further participants.
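To make the confidence-interval idea concrete, here is a minimal sketch in base R. It runs on simulated data, and everything in it is an assumption for illustration: the data frame `trials`, its columns, the participant labels, and the reading of “lower CI” as the lower bound of a 95% t-based interval around each group’s mean accuracy. It is not the code used for the report.

```r
# Hypothetical trial-level data: one row per trial, with a 0/1 accuracy column.
set.seed(1)
trials <- data.frame(
  participant = rep(sprintf("p%02d", 1:12), each = 320),
  language    = rep(c("Afrikaans", "Xhosa"), each = 6 * 320)
)
# Ten attentive participants (about 95% correct) and two near chance (about 50%).
p_correct <- ifelse(trials$participant %in% c("p03", "p09"), 0.50, 0.95)
trials$correct <- rbinom(nrow(trials), size = 1, prob = p_correct)

# Overall accuracy per participant.
acc <- aggregate(correct ~ participant + language, data = trials, FUN = mean)

# Lower bound of a 95% t-based confidence interval for a group's mean accuracy.
lower_ci <- function(x) {
  mean(x) - qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
}

# Apply the threshold within each language group and flag participants below it.
acc$threshold <- ave(acc$correct, acc$language, FUN = lower_ci)
acc$exclude   <- acc$correct < acc$threshold
acc
```

Because the threshold depends on the spread within each group, a single very poor performer widens the interval and lowers the cut-off, so this rule is worth comparing against a visual inspection of the accuracy distribution.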

Setting an upper RT threshold

In the distribution of the response times below, we see a very long right tail. The slowest response is 28 seconds!

Setting an upper threshold will affect RT estimates, but such extreme values would themselves exert a strong influence and make the estimates unreliable.

I therefore propose “mild” initial trimming of the data, excluding responses slower than five seconds, as indicated in the figures below.

I’ve also checked for responses that are faster than 100 msec (which are generally considered to be errors), but found none. The fastest recorded RT is 160 msec.

Individual thresholds

There appears to be wide agreement that individual thresholds should be set based on the overall mean and standard deviation for each participant, but authors advocate different cut-offs. Baayen & Milin argue that the frequently used limit of 2 (perhaps also 2.5) standard deviations is too aggressive and propose an upper limit of 3 standard deviations above the mean, coupled with minimal trimming based on residuals after fitting a model. The latter is essentially what they call “model criticism”.

Following this suggestion, we set the threshold for individual RTs at three SDs above individual means.

The table below indicates the amount of data removed in each step. The amount of discarded data seems reasonable and in line with what is generally considered acceptable. Note that the majority of the discarded data is due to poor performance.
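The two trimming steps can be sketched as follows. This is an illustration on simulated data, not the original code: the data frame `rt`, its columns and the injected slow responses are all hypothetical.

```r
set.seed(2)
# Hypothetical RTs (in ms) for three participants.
rt <- data.frame(
  participant = rep(c("p01", "p02", "p03"), each = 200),
  RT = c(rnorm(200, 600, 80), rnorm(200, 750, 100), rnorm(200, 650, 90))
)
rt$RT[c(10, 250)] <- c(28000, 6200)  # implausibly slow responses
rt$RT[c(50, 450)] <- c(1500, 1700)   # slow relative to the participant's own mean

# Step 1: absolute limits (100 ms lower bound, 5 s upper bound).
step1 <- rt[rt$RT >= 100 & rt$RT <= 5000, ]

# Step 2: per-participant upper limit at mean + 3 SD, computed after step 1.
upper <- ave(step1$RT, step1$participant,
             FUN = function(x) mean(x) + 3 * sd(x))
step2 <- step1[step1$RT <= upper, ]

# Percentage of the original data removed in each step.
removed <- c(absolute   = nrow(rt) - nrow(step1),
             individual = nrow(step1) - nrow(step2))
round(100 * removed / nrow(rt), 2)
```

Applying the absolute cut-off first matters: a single 28-second response would otherwise inflate that participant’s mean and SD so much that the 3 SD rule would remove almost nothing else.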

Statistical approaches

The classical central tendency approach to analysing RTs is using ANOVAs on the by-participant aggregated means. However, this technique assumes normally distributed data, in which case participant means would offer a reliable summary of the data. The problem is that RTs rarely follow a normal distribution, but rather a positively skewed distribution resembling an ex-Gaussian distribution characterized by a long right tail. As the below plot shows, this is also the case here.

To deal with this issue, we’ll try out the following approaches:

The Ex-Gaussian approach with separate analyses for the \(\mu\) and \(\tau\) parameters in RT distributions
Linear regression/ANOVA on the conflict effect and coefficient of variability
Mixed-effects linear regression with model criticism, checking for autocorrelated RT lags

The ex-Gaussian approach

In this approach, we compute the three parameters, \(\mu\) (mu), \(\sigma\) (sigma) and \(\tau\) (tau), that describe an ex-Gaussian distribution, which itself is a convolution of a normal and an exponential distribution. The idea is that potentially interesting effects may hide in the long right tail (\(\tau\) component), which is ignored in standard central tendency tests. \(\mu\) and \(\sigma\) reflect the mean and standard deviation of the Gaussian component, whereas \(\tau\) reflects both the mean and the standard deviation of the exponential component.

This procedure has been used e.g. by Abutalebi et al. and Calabria et al.
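As an illustration of how the three parameters can be estimated, the sketch below fits an ex-Gaussian by maximum likelihood with base R’s `optim()`. The function names `exgauss_nll` and `fit_exgauss`, the rescaling step and the starting values are assumptions made for this example. The report’s own estimates may well have been obtained with a dedicated package.

```r
# Negative log-likelihood of the ex-Gaussian; par = c(mu, log(sigma), log(tau)).
# Density: f(x) = (1/tau) * exp((mu - x)/tau + sigma^2 / (2 tau^2)) *
#                 pnorm((x - mu)/sigma - sigma/tau)
exgauss_nll <- function(par, x) {
  mu <- par[1]; sigma <- exp(par[2]); tau <- exp(par[3])
  ll <- -log(tau) + (mu - x) / tau + sigma^2 / (2 * tau^2) +
    pnorm((x - mu) / sigma - sigma / tau, log.p = TRUE)
  -sum(ll)
}

fit_exgauss <- function(x) {
  s <- sd(x)
  z <- x / s  # rescale to unit SD so that all parameters are of order 1
  # Crude starting values satisfying sigma^2 + tau^2 = 1 on the rescaled data.
  start <- c(mean(z) - 0.8, log(0.6), log(0.8))
  opt <- optim(start, exgauss_nll, x = z, control = list(maxit = 5000))
  # Transform back to the original scale (ms).
  c(mu = opt$par[1] * s, sigma = exp(opt$par[2]) * s, tau = exp(opt$par[3]) * s)
}

# Simulated RTs: normal(500, 50) plus exponential with mean 150.
set.seed(3)
x <- rnorm(5000, mean = 500, sd = 50) + rexp(5000, rate = 1 / 150)
est <- fit_exgauss(x)
round(est, 1)
```

Optimising over `log(sigma)` and `log(tau)` keeps both parameters positive without needing constrained optimisation. With only a few dozen trials per cell, estimates of \(\tau\) in particular can be unstable, which is worth keeping in mind when parameters are computed per participant and condition.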

The figure below shows the distributions of the three parameters.

We already know that the grouped distributions appear to follow ex-Gaussian distributions, but we can go further and inspect distributions for each participant in the figure below, ordered by mean RT.

From this, it is perhaps less clear that the ex-Gaussian distribution provides the best fit for our data.

We can quantify this by creating simulated data based on the aggregated normal and ex-Gaussian parameters for each participant and perform Kolmogorov-Smirnov tests to see whether the real and simulated distributions are significantly different.

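The p-values in the figure below indicate, for each participant, whether the observed RTs differ significantly from the simulated ex-Gaussian and normal data. There are more values above .05 in the ex-Gaussian panel, i.e. fewer participants whose data differ significantly from the simulated ex-Gaussian distribution. We therefore conclude that this distribution provides a better fit for the majority of our data.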
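The comparison for a single participant might look like the sketch below, which uses the two-sample Kolmogorov-Smirnov test from base R. The RTs are simulated, and for simplicity the ex-Gaussian comparison sample uses the known generating parameters rather than fitted ones. Both choices are assumptions for the example.

```r
set.seed(4)
# Hypothetical RTs for one participant, generated from an ex-Gaussian.
mu <- 450; sigma <- 40; tau <- 200
rt <- rnorm(2000, mu, sigma) + rexp(2000, rate = 1 / tau)

# Simulated comparison samples: a normal with the observed mean and SD,
# and an ex-Gaussian with (here, the known) mu, sigma and tau.
sim_normal  <- rnorm(2000, mean(rt), sd(rt))
sim_exgauss <- rnorm(2000, mu, sigma) + rexp(2000, rate = 1 / tau)

# Two-sample KS tests: a small p-value means the observed and simulated
# distributions differ.
ks_normal  <- ks.test(rt, sim_normal)
ks_exgauss <- ks.test(rt, sim_exgauss)

c(D_normal  = unname(ks_normal$statistic),  p_normal  = ks_normal$p.value,
  D_exgauss = unname(ks_exgauss$statistic), p_exgauss = ks_exgauss$p.value)
```

Note that a non-significant KS test does not demonstrate that a distribution fits. It only fails to show a difference, and with few trials per participant the test has little power.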

Before fitting models to our data, we can inspect the following plots which give us a better idea of possible interactions between factors within each parameter. There’s no clear evidence for interactions in the estimated parameters, but there might be main effects of language.

The \(\mu\) parameter

Fitting a full model with all predictors and interactions reveals a marginally significant effect of language (p = .059), i.e. the Xhosa group tended to have shorter RTs in the Gaussian component. No effect reached significance at the .05 level.

## 
## Call:
## lm(formula = mu ~ language * condition * congruent, data = params)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -131.82  -50.05  -13.16   51.48  253.78 
## 
## Coefficients:
##                                           Estimate Std. Error t value
## (Intercept)                                592.737     13.468  44.010
## languageXhosa                              -37.173     19.588  -1.898
## conditionsize                              -13.829     19.047  -0.726
## congruentTRUE                              -21.541     19.047  -1.131
## languageXhosa:conditionsize                  3.375     27.702   0.122
## languageXhosa:congruentTRUE                 18.597     27.702   0.671
## conditionsize:congruentTRUE                 27.784     26.936   1.031
## languageXhosa:conditionsize:congruentTRUE  -41.386     39.177  -1.056
##                                           Pr(>|t|)    
## (Intercept)                                 <2e-16 ***
## languageXhosa                               0.0591 .  
## conditionsize                               0.4686    
## congruentTRUE                               0.2593    
## languageXhosa:conditionsize                 0.9031    
## languageXhosa:congruentTRUE                 0.5027    
## conditionsize:congruentTRUE                 0.3035    
## languageXhosa:conditionsize:congruentTRUE   0.2920    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 72.53 on 212 degrees of freedom
## Multiple R-squared:  0.07613,    Adjusted R-squared:  0.04562 
## F-statistic: 2.496 on 7 and 212 DF,  p-value: 0.01751

The \(\tau\) parameter

As with the \(\mu\) parameter, fitting a full model with all predictors and interactions reveals only a marginally significant effect of language (p = .095).

## 
## Call:
## lm(formula = tau ~ language * condition * congruent, data = params)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -170.61  -72.48  -20.88   52.21  468.38 
## 
## Coefficients:
##                                           Estimate Std. Error t value
## (Intercept)                                147.355     19.775   7.451
## languageXhosa                               48.309     28.762   1.680
## conditionsize                                8.576     27.966   0.307
## congruentTRUE                                7.628     27.966   0.273
## languageXhosa:conditionsize                  7.118     40.675   0.175
## languageXhosa:congruentTRUE                 -8.017     40.675  -0.197
## conditionsize:congruentTRUE                -11.452     39.550  -0.290
## languageXhosa:conditionsize:congruentTRUE   15.138     57.524   0.263
##                                           Pr(>|t|)    
## (Intercept)                               2.31e-12 ***
## languageXhosa                               0.0945 .  
## conditionsize                               0.7594    
## congruentTRUE                               0.7853    
## languageXhosa:conditionsize                 0.8613    
## languageXhosa:congruentTRUE                 0.8439    
## conditionsize:congruentTRUE                 0.7724    
## languageXhosa:conditionsize:congruentTRUE   0.7927    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 106.5 on 212 degrees of freedom
## Multiple R-squared:  0.06082,    Adjusted R-squared:  0.02981 
## F-statistic: 1.961 on 7 and 212 DF,  p-value: 0.06172

In sum, we found that Xhosa speakers tended to produce smaller \(\mu\) and larger \(\tau\) values, although both language effects were only marginally significant. Otherwise, splitting the Gaussian and exponential components in the RTs did not allow us to identify effects that might have been hiding in the right tails.

Calabria et al. also ran correlation analyses on the \(\mu\) and \(\tau\) parameters of their groups. I’m not completely sure if this is truly interesting or relevant, but doing so yields a small, but significant negative correlation of \(\mu\) and \(\tau\) for Afrikaans speakers and no correlation for Xhosa speakers.

Conflict effect

We can also think of the dependent variable as the mean difference in RTs in response to congruent and incongruent trials, what Calabria et al. call the conflict effect.
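A minimal sketch of how such a difference score can be computed and modelled is given below. It runs on simulated data and assumes the sign convention incongruent minus congruent, which the text does not state explicitly. The data frame `d`, the participant labels and the 20 ms effect are hypothetical.

```r
set.seed(5)
# Hypothetical trial-level data: 8 participants x 2 conditions x 2 congruence levels.
d <- expand.grid(trial = 1:40, congruent = c(TRUE, FALSE),
                 condition = c("height", "size"),
                 participant = sprintf("p%02d", 1:8),
                 stringsAsFactors = FALSE)
d$language <- ifelse(d$participant %in% sprintf("p%02d", 1:4), "Afrikaans", "Xhosa")
# Incongruent trials are made 20 ms slower on average in this toy example.
d$RT <- rnorm(nrow(d), mean = ifelse(d$congruent, 600, 620), sd = 60)

# Mean RT per participant, condition and congruence level (a 3-d array).
cell_means <- with(d, tapply(RT, list(participant = participant,
                                      condition   = condition,
                                      congruent   = congruent), mean))

# Conflict effect: mean incongruent RT minus mean congruent RT.
conflict <- cell_means[, , "FALSE"] - cell_means[, , "TRUE"]
conflict_df <- as.data.frame(as.table(conflict), responseName = "conflict")
conflict_df$language <- ifelse(conflict_df$participant %in% sprintf("p%02d", 1:4),
                               "Afrikaans", "Xhosa")

# Congruence is now part of the dependent variable, so it is not a predictor.
fit <- lm(conflict ~ language * condition, data = conflict_df)
coef(fit)
```

Each participant contributes one value per condition, so the model ignores both the number of trials behind each mean and the repeated measures within participants.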

Below we’ll plot the conflict effect and run the analysis. Note that the congruence variable is contained in the dependent variable, so we fit a model with language and condition as the only factors.

## 
## Call:
## lm(formula = conflict ~ language * condition, data = conflict_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -108.086  -22.213    1.279   16.963  144.360 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                   13.913      7.616   1.827   0.0706 .
## languageXhosa                -10.580     11.078  -0.955   0.3417  
## conditionsize                -16.332     10.771  -1.516   0.1324  
## languageXhosa:conditionsize   26.248     15.666   1.675   0.0968 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 41.02 on 106 degrees of freedom
## Multiple R-squared:  0.02899,    Adjusted R-squared:  0.001504 
## F-statistic: 1.055 on 3 and 106 DF,  p-value: 0.3716

There seems to be too much variability to detect any significant effects, though the interaction lines suggest opposite trends, which is reflected in the marginally significant interaction effect between language and condition (p = .097).

Coefficient of variability

It might be interesting to explore whether there are patterns in the variability of response latencies. We can compute the coefficient of variability for each participant by dividing the individual SDs by the means.
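As a small illustration, the coefficient of variability can be computed with `tapply()`. The data here are simulated and the column names are hypothetical. In the analysis itself the measure would be computed per participant within each condition and congruence level.

```r
set.seed(6)
# Two hypothetical participants with the same mean RT but different spread.
rt <- data.frame(participant = rep(c("p01", "p02"), each = 100),
                 RT = c(rnorm(100, 600, 60), rnorm(100, 600, 150)))

# Coefficient of variability = SD / mean, per participant.
cv <- with(rt, tapply(RT, participant, sd) / tapply(RT, participant, mean))
round(cv, 3)
```

Because the SD is divided by the mean, the measure is unitless and lets us compare variability between participants who differ in overall speed.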

We plot the data and fit a model as before, this time with language, condition and congruence as predictors.

## 
## Call:
## lm(formula = lm(coef_var ~ language * condition * congruent, 
##     data = params))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.18572 -0.06823 -0.00593  0.04981  0.35430 
## 
## Coefficients:
##                                             Estimate Std. Error t value
## (Intercept)                                0.2329422  0.0178153  13.075
## languageXhosa                              0.0418901  0.0259112   1.617
## conditionsize                              0.0014727  0.0251946   0.058
## congruentTRUE                              0.0020427  0.0251946   0.081
## languageXhosa:conditionsize                0.0141439  0.0366440   0.386
## languageXhosa:congruentTRUE                0.0037418  0.0366440   0.102
## conditionsize:congruentTRUE               -0.0067914  0.0356306  -0.191
## languageXhosa:conditionsize:congruentTRUE  0.0007974  0.0518224   0.015
##                                           Pr(>|t|)    
## (Intercept)                                 <2e-16 ***
## languageXhosa                                0.107    
## conditionsize                                0.953    
## congruentTRUE                                0.935    
## languageXhosa:conditionsize                  0.700    
## languageXhosa:congruentTRUE                  0.919    
## conditionsize:congruentTRUE                  0.849    
## languageXhosa:conditionsize:congruentTRUE    0.988    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09594 on 212 degrees of freedom
## Multiple R-squared:  0.07045,    Adjusted R-squared:  0.03975 
## F-statistic: 2.295 on 7 and 212 DF,  p-value: 0.02833

From the regression output, we see that no effects reached significance.

Mixed-effects regression

The downside of the previous analyses is that they require the data to be aggregated. For this analysis, we’ll follow Baayen & Milin’s suggestions and do the following:

Compare distributions and transformations of the RTs
Fit models with different random structures
Apply model criticism and refit models
Account for autocorrelation in RTs

Distributions

We’ll compare and determine whether to use untransformed RTs, a log or an inverse Gaussian transformation.

The values for skewness and kurtosis in the table below suggest that all options result in skewed and kurtotic distributions, but with considerable improvements with transformations.

However, these measures are known to be unreliable with larger samples (n > 200).

Instead, we’ll inspect quantile-quantile plots for the goodness of fit of theoretical distributions. Also shown are the correlation coefficients of the observed and theoretical distributions.
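The correlation between observed and theoretical quantiles can be obtained directly from `qqnorm()`. The sketch below uses simulated, log-normally distributed RTs and writes the inverse transformation as `1000 / RT`. Both the data and that exact form of the transformation are assumptions for illustration.

```r
set.seed(7)
# Hypothetical right-skewed RTs in ms.
rt <- rlnorm(3000, meanlog = log(650), sdlog = 0.3)

# Correlation between theoretical normal quantiles and observed quantiles;
# values closer to 1 indicate a distribution closer to normal.
qq_cor <- function(x) {
  q <- qqnorm(x, plot.it = FALSE)
  cor(q$x, q$y)
}

r <- sapply(list(raw = rt, log = log(rt), inverse = 1000 / rt), qq_cor)
round(r, 4)
```

In this toy example the log transformation wins by construction, since the data were generated as log-normal. With real RTs the ranking has to be checked empirically, as done in the plots.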

Based on the output, we proceed with the inverse Gaussian transformation. Later model criticism will further improve the goodness of fit.

Below we see QQ-plots for individual participants. It is clear that both groups have a few participants deviating from the expected pattern causing later points to rise above the line in the center panel above.

Before fitting regression models, we’ll plot the data grouped by our independent variables.

Random intercept model

We first fit a random intercept model. The output shows a significant three-way interaction between language, condition and congruence (p = .02).

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: RTinv ~ language * condition * congruent + (1 | participant) +  
##     (1 | item)
##    Data: df
## 
## REML criterion at convergence: 10686.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2278 -0.5713  0.0162  0.5620 14.7918 
## 
## Random effects:
##  Groups      Name        Variance  Std.Dev.
##  participant (Intercept) 4.128e-02 0.203168
##  item        (Intercept) 5.032e-05 0.007093
##  Residual                1.103e-01 0.332078
## Number of obs: 16392, groups:  participant, 55; item, 8
## 
## Fixed effects:
##                                             Estimate Std. Error         df
## (Intercept)                                1.457e+00  3.872e-02  5.675e+01
## languageXhosa                              7.514e-03  5.587e-02  5.593e+01
## conditionsize                              1.229e-02  1.227e-02  8.310e+00
## congruentTRUE                              2.619e-02  1.233e-02  8.450e+00
## languageXhosa:conditionsize               -1.786e-02  1.482e-02  1.633e+04
## languageXhosa:congruentTRUE               -1.666e-02  1.482e-02  1.633e+04
## conditionsize:congruentTRUE               -3.397e-02  1.739e-02  8.361e+00
## languageXhosa:conditionsize:congruentTRUE  4.828e-02  2.085e-02  1.633e+04
##                                           t value Pr(>|t|)    
## (Intercept)                                37.633   <2e-16 ***
## languageXhosa                               0.134   0.8935    
## conditionsize                               1.001   0.3451    
## congruentTRUE                               2.125   0.0645 .  
## languageXhosa:conditionsize                -1.205   0.2282    
## languageXhosa:congruentTRUE                -1.125   0.2607    
## conditionsize:congruentTRUE                -1.954   0.0849 .  
## languageXhosa:conditionsize:congruentTRUE   2.316   0.0206 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) lnggXh cndtns cnTRUE lnggX: lX:TRU c:TRUE
## languageXhs -0.681                                          
## conditionsz -0.159  0.074                                   
## congrntTRUE -0.159  0.074  0.502                            
## lnggXhs:cnd  0.088 -0.133 -0.552 -0.278                     
## lnggXh:TRUE  0.088 -0.133 -0.279 -0.556  0.502              
## cndtns:TRUE  0.113 -0.052 -0.707 -0.709  0.390  0.395       
## lnggX::TRUE -0.063  0.095  0.393  0.396 -0.712 -0.711 -0.556

We then calculate \(R^2\) indicating how well the model fits our data.

## [1] 0.2670519

We then apply “model criticism” following Baayen & Milin’s recommendations. This means minimal trimming: data points with standardized residuals above 2.5 are removed.
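The trim-and-refit logic can be sketched as follows. To keep the example self-contained in base R, an ordinary `lm()` stands in for the mixed model, the data are simulated, and residuals are trimmed on their absolute value. All three are assumptions of this sketch rather than a description of the actual analysis. The same two steps apply to a fitted `lmer` model via `resid()`.

```r
set.seed(8)
n <- 400
# Hypothetical inverse-transformed RTs with a small congruence effect.
d <- data.frame(congruent = rep(c(TRUE, FALSE), n / 2))
d$RTinv <- 1.45 + ifelse(d$congruent, 0.03, 0) + rnorm(n, sd = 0.3)
d$RTinv[1:5] <- d$RTinv[1:5] + 3  # a few extreme observations

# Step 1: fit the model to all data.
m1 <- lm(RTinv ~ congruent, data = d)

# Step 2: drop observations whose standardized residual exceeds 2.5
# in absolute value, then refit the same model to the reduced data.
keep <- abs(as.vector(scale(resid(m1)))) < 2.5
d2   <- d[keep, ]
m2   <- lm(RTinv ~ congruent, data = d2)

c(removed = sum(!keep), sigma_before = sigma(m1), sigma_after = sigma(m2))
```

Since the trimmed points are by definition those the first model fits worst, some improvement in fit after refitting is expected. The more informative check is whether the effects of interest survive the trimming.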

We then refit the model and see that the interaction is still significant.

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: RTinv ~ language * condition * congruent + (1 | participant) +  
##     (1 | item)
##    Data: df2
## 
## REML criterion at convergence: 6830.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.87973 -0.61361  0.03524  0.63555  2.98165 
## 
## Random effects:
##  Groups      Name        Variance  Std.Dev.
##  participant (Intercept) 4.243e-02 0.20599 
##  item        (Intercept) 8.227e-05 0.00907 
##  Residual                8.754e-02 0.29587 
## Number of obs: 16173, groups:  participant, 55; item, 8
## 
## Fixed effects:
##                                             Estimate Std. Error         df
## (Intercept)                                1.448e+00  3.930e-02  5.688e+01
## languageXhosa                              9.256e-03  5.642e-02  5.529e+01
## conditionsize                              1.180e-02  1.275e-02  6.666e+00
## congruentTRUE                              2.993e-02  1.279e-02  6.741e+00
## languageXhosa:conditionsize               -1.645e-02  1.329e-02  1.611e+04
## languageXhosa:congruentTRUE               -2.742e-02  1.328e-02  1.611e+04
## conditionsize:congruentTRUE               -3.451e-02  1.806e-02  6.695e+00
## languageXhosa:conditionsize:congruentTRUE  5.753e-02  1.871e-02  1.611e+04
##                                           t value Pr(>|t|)    
## (Intercept)                                36.836  < 2e-16 ***
## languageXhosa                               0.164  0.87030    
## conditionsize                               0.925  0.38712    
## congruentTRUE                               2.340  0.05319 .  
## languageXhosa:conditionsize                -1.238  0.21588    
## languageXhosa:congruentTRUE                -2.064  0.03906 *  
## conditionsize:congruentTRUE                -1.911  0.09947 .  
## languageXhosa:conditionsize:congruentTRUE   3.075  0.00211 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) lnggXh cndtns cnTRUE lnggX: lX:TRU c:TRUE
## languageXhs -0.678                                          
## conditionsz -0.163  0.056                                   
## congrntTRUE -0.162  0.056  0.501                            
## lnggXhs:cnd  0.077 -0.118 -0.474 -0.239                     
## lnggXh:TRUE  0.078 -0.118 -0.240 -0.478  0.501              
## cndtns:TRUE  0.115 -0.040 -0.707 -0.709  0.336  0.339       
## lnggX::TRUE -0.055  0.084  0.338  0.340 -0.712 -0.710 -0.478

The \(R^2\) value now indicates a considerably better fit.

## [1] 0.3208119

Random intercepts and slopes

We’ll also try fitting a model with “maximal random structure” including both random intercepts and slopes as is allowed by the design. We see that the interaction effect is retained in the model.

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## RTinv ~ language * condition * congruent + (1 + condition + congruent |  
##     participant) + (1 | item)
##    Data: df
## 
## REML criterion at convergence: 10603.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2198 -0.5668  0.0171  0.5619 14.6442 
## 
## Random effects:
##  Groups      Name          Variance  Std.Dev. Corr       
##  participant (Intercept)   4.357e-02 0.208723            
##              conditionsize 3.872e-03 0.062228 -0.21      
##              congruentTRUE 1.397e-03 0.037370 -0.10 -0.09
##  item        (Intercept)   5.222e-05 0.007226            
##  Residual                  1.090e-01 0.330184            
## Number of obs: 16392, groups:  participant, 55; item, 8
## 
## Fixed effects:
##                                             Estimate Std. Error         df
## (Intercept)                                1.456e+00  3.973e-02  5.474e+01
## languageXhosa                              7.739e-03  5.733e-02  5.387e+01
## conditionsize                              1.324e-02  1.689e-02  2.359e+01
## congruentTRUE                              2.662e-02  1.419e-02  1.354e+01
## languageXhosa:conditionsize               -1.772e-02  2.236e-02  8.650e+01
## languageXhosa:congruentTRUE               -1.545e-02  1.788e-02  1.211e+02
## conditionsize:congruentTRUE               -3.440e-02  1.744e-02  8.261e+00
## languageXhosa:conditionsize:congruentTRUE  4.615e-02  2.076e-02  1.626e+04
##                                           t value Pr(>|t|)    
## (Intercept)                                36.649   <2e-16 ***
## languageXhosa                               0.135   0.8931    
## conditionsize                               0.784   0.4408    
## congruentTRUE                               1.876   0.0823 .  
## languageXhosa:conditionsize                -0.792   0.4303    
## languageXhosa:congruentTRUE                -0.864   0.3891    
## conditionsize:congruentTRUE                -1.972   0.0829 .  
## languageXhosa:conditionsize:congruentTRUE   2.223   0.0262 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) lnggXh cndtns cnTRUE lnggX: lX:TRU c:TRUE
## languageXhs -0.682                                          
## conditionsz -0.253  0.148                                   
## congrntTRUE -0.184  0.095  0.289                            
## lnggXhs:cnd  0.162 -0.239 -0.617 -0.136                     
## lnggXh:TRUE  0.109 -0.163 -0.143 -0.588  0.235              
## cndtns:TRUE  0.110 -0.050 -0.516 -0.619  0.255  0.323       
## lnggX::TRUE -0.061  0.092  0.284  0.342 -0.470 -0.586 -0.552

\(R^2\):

## [1] 0.2783126

We apply model criticism and see that the interaction is still significant.

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## RTinv ~ language * condition * congruent + (1 + condition + congruent |  
##     participant) + (1 | item)
##    Data: df3
## 
## REML criterion at convergence: 6807.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.81499 -0.60413  0.03325  0.63064  2.97502 
## 
## Random effects:
##  Groups      Name          Variance  Std.Dev. Corr       
##  participant (Intercept)   4.457e-02 0.211109            
##              conditionsize 3.611e-03 0.060095 -0.23      
##              congruentTRUE 8.312e-04 0.028831 -0.12  0.22
##  item        (Intercept)   9.442e-05 0.009717            
##  Residual                  8.685e-02 0.294711            
## Number of obs: 16180, groups:  participant, 55; item, 8
## 
## Fixed effects:
##                                             Estimate Std. Error         df
## (Intercept)                                1.446e+00  4.030e-02  5.544e+01
## languageXhosa                              1.064e-02  5.778e-02  5.369e+01
## conditionsize                              1.499e-02  1.729e-02  1.695e+01
## congruentTRUE                              3.141e-02  1.428e-02  8.598e+00
## languageXhosa:conditionsize               -2.008e-02  2.096e-02  8.247e+01
## languageXhosa:congruentTRUE               -2.633e-02  1.537e-02  1.354e+02
## conditionsize:congruentTRUE               -3.750e-02  1.869e-02  6.429e+00
## languageXhosa:conditionsize:congruentTRUE  6.175e-02  1.866e-02  1.605e+04
##                                           t value Pr(>|t|)    
## (Intercept)                                35.868  < 2e-16 ***
## languageXhosa                               0.184 0.854607    
## conditionsize                               0.867 0.398196    
## congruentTRUE                               2.199 0.056791 .  
## languageXhosa:conditionsize                -0.958 0.340960    
## languageXhosa:congruentTRUE                -1.713 0.089030 .  
## conditionsize:congruentTRUE                -2.006 0.088469 .  
## languageXhosa:conditionsize:congruentTRUE   3.309 0.000937 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) lnggXh cndtns cnTRUE lnggX: lX:TRU c:TRUE
## languageXhs -0.677                                          
## conditionsz -0.268  0.140                                   
## congrntTRUE -0.197  0.080  0.408                            
## lnggXhs:cnd  0.165 -0.246 -0.564 -0.179                     
## lnggXh:TRUE  0.107 -0.161 -0.202 -0.499  0.360              
## cndtns:TRUE  0.116 -0.037 -0.540 -0.657  0.205  0.282       
## lnggX::TRUE -0.054  0.082  0.248  0.304 -0.450 -0.612 -0.460

The new \(R^2\) indicates a much better fit.

## [1] 0.3297167

Autocorrelation of RTs

In the plots below, we see that there are no clear patterns in RTs over trials or over the duration of the experiments.

Still, in this very fast-paced experiment with many trials, it might be the case that response latencies are dependent on RTs in previous trials, particularly at lag\(_{t-1}\).

We therefore need to check for and possibly control for the RT variable being correlated with itself.

However, the following autocorrelation plots with a subsample of thirty participants suggest that there’s no evidence for significant autocorrelation between RTs for most of the participants. We can therefore leave RTs at t-1 out of the model.

## Quantiles to be plotted:
##           0%    3.448276%    6.896552%    10.34483%     13.7931% 
## -0.586671730 -0.397174279 -0.321054119 -0.277102594 -0.237523948 
##    17.24138%    20.68966%    24.13793%    27.58621%    31.03448% 
## -0.207245033 -0.182002424 -0.158051947 -0.130596415 -0.113514347 
##    34.48276%    37.93103%    41.37931%    44.82759%    48.27586% 
## -0.097340640 -0.065118054 -0.041524692 -0.022997554 -0.002101595 
##    51.72414%    55.17241%    58.62069%    62.06897%    65.51724% 
##  0.023375358  0.045085477  0.062544878  0.083833946  0.105303793 
##    68.96552%    72.41379%    75.86207%    79.31034%    82.75862% 
##  0.131932183  0.153693915  0.178259327  0.202648323  0.238750555 
##     86.2069%    89.65517%    93.10345%    96.55172%         100% 
##  0.271158576  0.311439933  0.369299480  0.432781245  0.705182612
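For a numerical check alongside the plots, the lag-1 autocorrelation can be computed per participant with base R’s `acf()` and compared with the usual approximate 95% bound. The two series below are simulated and the names are hypothetical.

```r
set.seed(9)
n <- 300
# Two hypothetical participants: one with serially dependent RTs
# (AR(1) process, phi = 0.6) and one with independent RTs.
rts <- list(
  dependent   = 600 + 50 * as.vector(arima.sim(list(ar = 0.6), n = n)),
  independent = rnorm(n, 600, 60)
)

# Lag-1 autocorrelation: element 1 of $acf is lag 0, element 2 is lag 1.
lag1 <- sapply(rts, function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2])

# Approximate 95% bound under the hypothesis of no autocorrelation.
bound <- qnorm(0.975) / sqrt(n)

data.frame(lag1 = round(lag1, 3), exceeds_bound = abs(lag1) > bound)
```

If many participants exceeded the bound, one option in line with Baayen & Milin would be to add the RT of the preceding trial as a covariate in the mixed model.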

Summary

Methods:

The ex-Gaussian approach failed to detect significant effects in the \(\mu\) and \(\tau\) parameters. However, more aggressive trimming might yield significant results in some of these analyses.
This is an interesting approach, but also quite limited in that it requires aggregating the data.
The same may be true for the conflict effect and the coefficient of variability.
A three-way interaction was significant in all mixed-effects regression models varying in their random structures before and after applying model criticism.

The three-way interaction suggests that:

Afrikaans speakers’ RTs were more affected by congruence in the ‘height’ condition
Xhosa speakers were more affected by congruence in the ‘size’ condition

Xhosa speakers appear to have slightly slower responses overall, but this may be due to greater variability, since, as we saw in the ex-Gaussian analysis, they actually have smaller \(\mu\), but larger \(\tau\) values compared with the Afrikaans group.

Experiment 3: Two-alternative forced-choice task

Whereas experiment 2 was designed to test implicit associations between space and pitch, experiment 3 targets explicit associations. Participants decide how to match high/low pitch with high vs. low position and with small vs. big size and, in a mixed condition, choose between the height and size mappings.

Methods

We recruited 30 native speakers of Xhosa and 30 native speakers of Afrikaans for experiment 3.

Participants performed a two-alternative forced-choice task targeting explicit space-pitch associations by pairing pitch with circles differing in vertical position (high/low; ‘height’ condition) or size (small/big; ‘size’ condition). In a third condition (or trial type), the mixed condition, participants had to choose between mapping pitch to either a high/low or a small/big circle.

There were 40 trials per participant. The order of trial types was randomised.

We also recorded RTs in each trial.

We again use mixed-effects regression to analyse the data.

Hypotheses

Xhosa speakers will be more likely to pair pitch with ‘size’ in the mixed condition compared with Afrikaans speakers.
There will be no group differences in stimulus pairing in the ‘height’ and ‘size’ conditions, e.g. high pitch will consistently be paired with high and small circles.
RTs will be slower in the mixed condition for both groups.

Mixed condition

The plot and output below indicate that Xhosa speakers, when given the possibility of pairing either visual height or size with pitch, are less likely to select size than Afrikaans speakers.

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: choiceDimension ~ voice * language + (1 | participant)
##    Data: df_m
## 
##      AIC      BIC   logLik deviance df.resid 
##    587.4    609.4   -288.7    577.4      592 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.0581 -0.5260  0.2646  0.4567  3.4811 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  participant (Intercept) 2.32     1.523   
## Number of obs: 597, groups:  participant, 61
## 
## Fixed effects:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              1.3472     0.3720   3.622 0.000293 ***
## voicelow                 1.0147     0.3516   2.886 0.003899 ** 
## languageXhosa           -2.1798     0.5143  -4.238 2.25e-05 ***
## voicelow:languageXhosa   1.4295     0.4846   2.950 0.003180 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) voiclw lnggXh
## voicelow    -0.358              
## languageXhs -0.731  0.255       
## vclw:lnggXh  0.294 -0.709 -0.418

We find an interaction effect between language and voice frequency.

Contrary to our expectations, Afrikaans speakers were more likely to pair high-pitched voices with small circles than with circles in a high position.

The opposite pattern was found for Xhosa speakers, who preferred the ‘height’ mapping.

For low-pitched voices, both groups consistently favoured the ‘size’ mapping.
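To make the pattern easier to read, the fixed-effect estimates can be converted from log-odds into predicted probabilities for each combination of language and voice. The Python sketch below does this arithmetic with the coefficients from the output above. It assumes that the modelled outcome is choosing ‘size’ over ‘height’, which is inferred from the interpretation in the text, not stated in the output. The probabilities are for a participant with a random intercept of zero, so they are not population averages.

```python
import math

def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Fixed-effect estimates copied from the mixed-condition output above.
coef = {
    "intercept": 1.3472,          # Afrikaans, high voice (reference cell)
    "voicelow": 1.0147,
    "languageXhosa": -2.1798,
    "voicelow:languageXhosa": 1.4295,
}

def log_odds(low_voice, xhosa):
    """Linear predictor for one design cell under treatment coding."""
    return (coef["intercept"]
            + coef["voicelow"] * low_voice
            + coef["languageXhosa"] * xhosa
            + coef["voicelow:languageXhosa"] * low_voice * xhosa)

# ASSUMPTION: the modelled outcome is choosing 'size' over 'height'.
p_size = {
    (lang, voice): inv_logit(log_odds(voice == "low", lang == "Xhosa"))
    for lang in ("Afrikaans", "Xhosa")
    for voice in ("high", "low")
}
for cell, p in p_size.items():
    print(cell, round(p, 2))
```

Under that assumption, only the Xhosa group with high-pitched voices falls below 0.5, i.e. prefers ‘height’, while the other three cells lie well above 0.5.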

Height condition

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: choiceName ~ voice * language + (1 + voice | participant)
##    Data: df_h
## 
##      AIC      BIC   logLik deviance df.resid 
##    735.6    766.4   -360.8    721.6      600 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2522 -0.6824  0.2859  0.5909  2.2376 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr 
##  participant (Intercept) 4.828    2.197         
##              voicelow    8.511    2.917    -0.94
## Number of obs: 607, groups:  participant, 61
## 
## Fixed effects:
##                        Estimate Std. Error z value Pr(>|z|)   
## (Intercept)             -0.8408     0.4730  -1.778  0.07547 . 
## voicelow                 1.6184     0.6233   2.596  0.00942 **
## languageXhosa            0.7860     0.6648   1.182  0.23710   
## voicelow:languageXhosa  -1.3244     0.8702  -1.522  0.12804   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) voiclw lnggXh
## voicelow    -0.892              
## languageXhs -0.714  0.637       
## vclw:lnggXh  0.641 -0.717 -0.897

In the ‘height’ condition, we find a significant effect of voice frequency. The language by voice interaction is not significant (p = 0.13), but the trend suggests that Afrikaans speakers are more consistent in mapping high/low voices to high/low circles.
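The z values and p-values in the table are Wald tests: each estimate is divided by its standard error and compared against a standard normal distribution. The Python sketch below reproduces this for the three non-intercept terms, using the estimates and standard errors from the output above. The helper name `wald_test` is made up for this illustration.

```python
import math

def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # P(|Z| > |z|) for a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Estimates and standard errors copied from the height-condition output above.
terms = {
    "voicelow": (1.6184, 0.6233),
    "languageXhosa": (0.7860, 0.6648),
    "voicelow:languageXhosa": (-1.3244, 0.8702),
}
for name, (est, se) in terms.items():
    z, p = wald_test(est, se)
    print(f"{name}: z = {z:.3f}, p = {p:.3f}")
```

The interaction estimate is about 1.5 standard errors from zero, which is why it should be read as a trend only.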

Size condition

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: choiceName ~ voice * language + (1 | participant)
##    Data: df_s
## 
##      AIC      BIC   logLik deviance df.resid 
##    577.1    599.1   -283.5    567.1      603 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0099 -0.4562 -0.2429  0.3899  3.8936 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  participant (Intercept) 0.9321   0.9654  
## Number of obs: 608, groups:  participant, 61
## 
## Fixed effects:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              1.8822     0.2976   6.324 2.55e-10 ***
## voicelow                -4.2257     0.4014 -10.527  < 2e-16 ***
## languageXhosa           -2.1984     0.3913  -5.619 1.92e-08 ***
## voicelow:languageXhosa   2.5268     0.4982   5.072 3.93e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) voiclw lnggXh
## voicelow    -0.546              
## languageXhs -0.774  0.436       
## vclw:lnggXh  0.417 -0.769 -0.482

In the ‘size’ condition, we see a clear interaction between language and voice frequency.

The two language groups showed high agreement in pairing low pitch with big rather than small circles.

Interestingly, Xhosa speakers also showed a slight preference for pairing high pitch with big circles, whereas Afrikaans speakers more consistently paired high pitch with small circles.
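Because the model uses treatment coding, the `voicelow` coefficient is the effect of voice within the Afrikaans group only. The corresponding effect within the Xhosa group is the sum of `voicelow` and the interaction term. The Python sketch below computes that sum and an approximate standard error from the printed "Correlation of Fixed Effects". As the correlation is rounded in the output, the result is approximate. The direction of the effect assumes that the modelled outcome is choosing the small circle, which is inferred from the text.

```python
import math

# Values copied from the size-condition output above.
b_voice, se_voice = -4.2257, 0.4014   # voicelow (effect within Afrikaans)
b_inter, se_inter = 2.5268, 0.4982    # voicelow:languageXhosa
corr_voice_inter = -0.769             # from "Correlation of Fixed Effects"

# Simple effect of low vs. high voice within the Xhosa group.
b_xhosa = b_voice + b_inter

# Variance of a sum: var(a) + var(b) + 2 * cov(a, b).
cov = corr_voice_inter * se_voice * se_inter
se_xhosa = math.sqrt(se_voice ** 2 + se_inter ** 2 + 2 * cov)
z_xhosa = b_xhosa / se_xhosa

print(f"Afrikaans: b = {b_voice:.3f}, odds ratio = {math.exp(b_voice):.3f}")
print(f"Xhosa:     b = {b_xhosa:.3f}, odds ratio = {math.exp(b_xhosa):.3f}, "
      f"z = {z_xhosa:.2f}")
```

On this reading, both groups shift away from the small circle for low-pitched voices, but the shift is considerably weaker in the Xhosa group.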

RT analyses

Finally, as large RTs might be indicative of uncertainty, we’ll examine whether there are significant differences in RTs related to the independent variables.

The data has been trimmed to only include responses faster than 20 seconds.

Not shown here are QQ-plots and correlation coefficients indicating that the log-normal transformation provides the best fit for our data. We will also apply model criticism and only examine the final model.
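The kind of check referred to here can be sketched as follows: sort the RTs, pair them with the matching quantiles of a standard normal distribution, and compare the correlation for raw and log-transformed values. This Python sketch uses made-up RTs, not the experiment's data, and is not necessarily the procedure used for the report. The helper names are hypothetical.

```python
import math
from statistics import NormalDist, mean

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def qq_correlation(values):
    """Correlate sorted values with the matching standard-normal quantiles."""
    n = len(values)
    # Plotting positions (i - 0.5) / n avoid the infinite 0% and 100% quantiles.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return pearson(sorted(values), theoretical)

# Hypothetical right-skewed RTs in seconds (not the report's data).
rts = [0.8, 0.9, 1.0, 1.1, 1.2, 1.4, 1.6, 1.9, 2.3, 3.0, 4.2, 6.5, 11.0, 24.0]

# Trimming step described above: keep responses faster than 20 seconds.
trimmed = [rt for rt in rts if rt < 20.0]

raw_r = qq_correlation(trimmed)
log_r = qq_correlation([math.log(rt) for rt in trimmed])
print("raw:", round(raw_r, 3))
print("log:", round(log_r, 3))
```

A correlation closer to 1 indicates that the points in the QQ-plot lie closer to a straight line, i.e. that the transformed RTs are closer to normally distributed.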

As the condition variable has more than two levels for comparison, we’ll base our analysis on an anova table summarizing the regression output.

The interaction plots and anova table reveal significant main effects of language, voice frequency and condition with no interaction effects. Xhosa speakers generally took longer in selecting visual stimuli, which was to be expected on the basis of their lower consistency in all conditions.

Interestingly, our expectation that the mixed condition would give rise to the longest RTs is not supported by the results. Instead, the ‘height’ condition consistently gave rise to slower decisions, despite ‘height’ being the dominant mapping in speech.

Summary

This experiment yielded some very interesting and surprising results.

Overall, Xhosa speakers were less consistent in how they mapped space to pitch, both when choosing visual stimuli of opposite spatial polarities, and when choosing between height and size.

Surprisingly, Afrikaans speakers consistently preferred the size mapping for both high and low-pitched voices, whereas Xhosa speakers showed a preference for ‘height’ in cases with high pitch, and ‘size’ in cases with low pitch.

Another striking finding was that, in the size condition, Xhosa speakers showed a preference for mapping big circles to both low and high pitch, though to a lesser extent for high pitch.

There appears to be more individual variation among Xhosa speakers in this task, both in choosing between mappings and in choosing polarity correspondences within particular mappings.

In this experiment, linguistic metaphors proved to be poor predictors of non-linguistic choices with regard to spatial mappings.

Conclusion

In the series of experiments, we examined spatial metaphors for pitch from three angles: language production, performance in a nonverbal implicit association task and nonverbal judgements in an explicit association task.

Our language production findings show that ‘height’ is used in Afrikaans, whereas Xhosa speakers also used ‘size’. Interestingly, and in line with our findings from a previous study, vertical gestures frequently accompany ‘height’ but also ‘size’ metaphors. The speech material does not allow us to determine whether gestural indications of height might refer to the physical size/height of a person likely to produce the heard sounds.

In the RT task, we found a three-way interaction that followed our predictions. However, methods requiring data aggregation failed to detect this effect. I would suggest reporting the statistically more sophisticated mixed-effects regression model with maximal random structure, but perhaps also noting that an ex-Gaussian approach failed to detect any significant effects. As the last interaction plot indicates, the effect is far from spectacular (as opposed to previous findings in the literature).

The results from the third experiment seem more puzzling. We did not expect Afrikaans speakers to consistently map pitch to ‘size’. Nor did we expect Xhosa speakers to choose ‘height’/‘size’ depending on the pitch of the voice.

In the height and size conditions, Afrikaans speakers paired pitch with high/low/big/small as expected, whereas Xhosa speakers were much less consistent, with the exception that they had a clear preference for pairing low pitch with big circles as opposed to small circles. Further studies with other types of stimuli might shed light on the latter findings.

As I see it, our findings point in different directions and only support the very general idea that the conceptualisation of pitch is, to some extent, spatial, but flexible. The factorial designs demonstrated different effects of manipulating visual size and height for the two groups, but only in the case of mixed-effects regression. The role of language in shaping the conceptualisation of pitch is unclear, but far from deterministic. This is particularly evident from the contradictory findings in experiment 3.

References and readings

Abutalebi, J., Guidi, L., Borsa, V., Canini, M., Della Rosa, P. A., Parris, B. A., & Weekes, B. S. (2015). Bilingualism provides a neural reserve for aging populations. Neuropsychologia, 69, 201–210. https://doi.org/10.1016/j.neuropsychologia.2015.01.040

Baayen, H. R., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12. https://doi.org/10.21500/20112084.807

Calabria, M., Hernandez, M., Martin, C. D., & Costa, A. (2011). When the Tail Counts: The Advantage of Bilingualism Through the Ex-Gaussian Distribution Analysis. Frontiers in Psychology, 2. https://doi.org/10.3389/fpsyg.2011.00250

Henriquez-Henriquez, M. P., Billeke, P., Henriquez, H., Zamorano, F. J., Rothhammer, F., & Aboitiz, F. (2015). Intra-Individual Response Variability Assessed by Ex-Gaussian Analysis may be a New Endophenotype for Attention-Deficit/Hyperactivity Disorder. Frontiers in Psychiatry, 5. https://doi.org/10.3389/fpsyt.2014.00197

Lachaud, C. M., & Renaud, O. (2011). A tutorial for analyzing human reaction times: How to filter data, manage missing values, and choose a statistical model. Applied Psycholinguistics, 32(02), 389–416. https://doi.org/10.1017/S0142716410000457

Marsden, E., Thompson, S., & Plonsky, L. (2018). A methodological synthesis of self-paced reading in second language research. Applied Psycholinguistics, 39(05), 861–904. https://doi.org/10.1017/S0142716418000036

Ratcliff, R. (n.d.). Methods for Dealing With Reaction Time Outliers, 23.

Whelan, R. (2010). Effective analysis of reaction time data. The Psychological Record, 58(3). Retrieved from https://opensiuc.lib.siu.edu/tpr/vol58/iss3/9
