Criminal behaviour and trait judgements of accents in the UK

Analysis: Jens Roeser

Compiled Jun 05 2023

1 Analysis

Data were analysed using Bayesian mixed-effects models (Gelman et al., 2014; McElreath, 2016), fitted with the R (R Core Team, 2020) package brms (Bürkner, 2017, 2018). Models were fitted with weakly informative priors (see McElreath, 2016) and run with 4,000 iterations on each of 4 chains, with a warm-up of 2,000 iterations and no thinning. Model convergence was confirmed by the Gelman-Rubin statistic (\(\hat{R}\) = 1) (Gelman & Rubin, 1992) and inspection of the Markov chain Monte Carlo chains.
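To make the convergence criterion concrete, the following is a minimal sketch of the classic (non-split) Gelman-Rubin statistic in base R. The function name `rhat_basic` and the simulated draws are illustrative assumptions, not part of the original analysis; brms and Stan report a refined variant of this statistic, so values will not match exactly.

```r
# Classic (non-split) Gelman-Rubin statistic for a single parameter.
# `draws` is an iterations x chains matrix of post-warm-up samples.
rhat_basic <- function(draws) {
  n <- nrow(draws)                       # iterations per chain
  B <- n * var(colMeans(draws))          # between-chain variance
  W <- mean(apply(draws, 2, var))        # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(1)
# Simulated stand-in for 4 chains with 2,000 post-warm-up iterations each
good <- matrix(rnorm(2000 * 4), nrow = 2000, ncol = 4)

# Same draws, but with one chain shifted to mimic a chain that has not converged
bad <- good
bad[, 1] <- bad[, 1] + 3

rhat_good <- rhat_basic(good)  # close to 1 when chains agree
rhat_bad  <- rhat_basic(bad)   # clearly above 1 when chains disagree
```

Values close to 1 indicate that between-chain and within-chain variability agree, which is the pattern reported above.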

2 Data processing

Get file names

# File names (all CSV files in the data directory)
(files <- list.files(path = "../data", pattern = "\\.csv$", full.names = TRUE))
[1] "../data/behaviouralresults.csv" "../data/socialresults.csv"     

Load and process both data files

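# Packages: tidyverse (read_csv, map_dfr, dplyr verbs, stringr), janitor (clean_names)
library(tidyverse)
library(janitor)

# Load and process data
data <- map_dfr(files, read_csv) %>%    # read both files and stack them row-wise
  select(-1) %>%                        # drop the first column
  clean_names() %>%                     # convert column names to snake_case
  # drop columns that are not used in the analysis
  select(-screen_name, -display, 
         -response_num, -age_group,
         -starts_with("other"), -question_type,
         -starts_with("trial"), -uk_area2, -mono,
         -response_scale, -qnum, -accentq, -listener_accent) %>% 
  rename(age = age_years,
         judgements = spreadsheet,
         id = participant_private_id) %>% 
  mutate(across(id, ~as.numeric(factor(.))),                  # recode ids as consecutive integers
         across(c(judgements, question), str_to_lower),       # lower-case labels
         across(judgements, ~str_remove(., " judgements")))   # "behavioural" / "social"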
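
As a small illustration of the two recoding steps in the last `mutate()` call, the sketch below applies the same logic to toy vectors using base R only. The id values and labels are hypothetical and chosen purely for illustration.

```r
# Hypothetical raw participant ids (not taken from the data)
raw_id <- c(8841207, 8841207, 8839915, 8850032)

# factor() sorts the unique values; as.numeric() returns their level index,
# so ids become consecutive integers starting at 1
id <- as.numeric(factor(raw_id))

# Hypothetical spreadsheet labels: lower-case them, then strip " judgements"
spreadsheet <- c("Behavioural judgements", "Social judgements")
judgements <- sub(" judgements", "", tolower(spreadsheet), fixed = TRUE)
```

The recoded ids preserve which rows belong to the same participant while discarding the original identifier.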

The data frame used for analysis had the following structure:

glimpse(data)
Rows: 18,000
Columns: 10
$ id              <dbl> 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 50, 50, 50, 50…
$ judgements      <chr> "behavioural", "behavioural", "behavioural", "behaviou…
$ stimulus_accent <chr> "Bristol", "Cardiff", "Glasgow", "Bradford", "Belfast"…
$ question        <chr> "physically assault someone", "report a relative to th…
$ response        <dbl> 2, 2, 3, 4, 3, 2, 5, 2, 5, 2, 1, 2, 7, 5, 5, 5, 2, 6, …
$ familiarity     <dbl> 3, 2, 2, 7, 1, 7, 2, 6, 3, 5, 5, 6, 7, 3, 4, 5, 4, 5, …
$ age             <dbl> 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23…
$ gender          <chr> "Woman", "Woman", "Woman", "Woman", "Woman", "Woman", …
$ ethnicity       <chr> "White", "White", "White", "White", "White", "White", …
$ uk_area         <chr> "South East England", "South East England", "South Eas…

3 Descriptive overview

A descriptive overview of the ratings for stimulus accents across questions (social and behavioural attributes) can be found in Figure 3.1.

For some attributes, ratings vary more across stimulus accents than for others; in other words, some attributes are more strongly tied to accent stereotypes. For example, the attributes aggressive and kind received relatively uniform ratings across accents: most accents were rated as not aggressive (ratings of 1 or 2) and as moderately kind. These attributes therefore do not distinguish between the accents presented here.

Stronger variability between accents can be observed, for example, for the attributes educated and intelligent: Standard Southern British English (SSBE) is rated as more educated and intelligent than all other accents. SSBE is also less likely to be associated with shoplifting, vandalising, physical assault, and being working class, and more likely to be associated with being rich. There was also a small tendency for SSBE to be associated with returning a lost wallet.

Figure 3.1: Descriptive overview of ratings by stimulus accent and question (social and behavioural attributes) shown in the panel strips.

Below we will test whether these observations hold after statistically accounting for individual differences between our raters.

4 Model definitions

The outcome variable was the agreement rating, ranging from 1 (strongly disagree) to 7 (strongly agree). We modelled the ratings using a cumulative normal distribution for ordinal data (see Bürkner & Vuorre, 2019). Cumulative models assume that the observed ordinal variable \(Y\), the agreement rating, originates from the categorization of a latent (not observable) continuous variable \(\tilde{Y}\), i.e. the latent tendency to agree with a statement. In other words, we map the discrete observed data onto a continuous underlying scale.

The model assumes that there are \(K\) thresholds \(\tau_\text{k}\) which separate \(\tilde{Y}\) into \(K+1\) observable, ordered categories of \(Y\). As the rating variable has 7 levels, there are 6 thresholds. These thresholds \(\tau_\text{k}\) replace the intercept. Thus, we model the probability of \(Y\) being equal to category \(k\) given a linear predictor \(\eta\). For simplicity, we assume the latent variable \(\tilde{Y}\) to be normally distributed and call the corresponding cumulative normal distribution function \(\Phi\). The probability of a rating \(Y\) being equal to category \(k\) is given by equation (4.1), with the conventions \(\tau_\text{0} = -\infty\) and \(\tau_\text{K+1} = \infty\) for the lowest and highest categories.

\[ \begin{align}\tag{4.1} \text{P}(Y=k\mid\eta) = \Phi(\tau_\text{k}-\eta) - \Phi(\tau_{\text{k}-1}-\eta) \end{align} \]
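
Equation (4.1) can be evaluated directly with `pnorm()`. The sketch below is illustrative only: the threshold values in `tau` and the helper name `category_probs` are assumptions made for the example, not estimates from the fitted models.

```r
# Hypothetical thresholds for a 7-point scale (6 cut points on the latent scale)
tau <- c(-1.5, -0.8, -0.2, 0.4, 1.0, 1.7)

# P(Y = k | eta) = Phi(tau_k - eta) - Phi(tau_{k-1} - eta),
# with tau_0 = -Inf and tau_{K+1} = Inf closing the outer categories
category_probs <- function(eta, tau) {
  cdf <- pnorm(c(-Inf, tau, Inf) - eta)
  diff(cdf)
}

p_low  <- category_probs(eta = 0, tau = tau)  # baseline tendency to agree
p_high <- category_probs(eta = 1, tau = tau)  # stronger tendency to agree

# Expected rating on the 1-7 scale implied by each set of probabilities
mean_low  <- sum(1:7 * p_low)
mean_high <- sum(1:7 * p_high)
```

Increasing \(\eta\) moves probability mass towards the higher rating categories while the thresholds stay fixed, which is how the predictors act on the observed ratings.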

The linear regression for the tendency to agree with a statement \(\tilde{Y}\) can be formulated as equation (4.2).

\[ \begin{align}\tag{4.2} \tilde{Y}_\text{ijm} = \eta_\text{ijm} + \text{U}_\text{stimulus accent[m], question[j], participant[i]} + \epsilon_\text{ijm} \end{align} \]

\(\tilde{Y}_\text{ijm}\) is the latent rating of participant \(i\) for \(i \in \{1,\dots, I\}\), where \(I\) is the total number of participants, for question \(j\) for \(j \in \{1,\dots, J\}\), where \(J\) is the total number of questions, and stimulus accent \(m\) for \(m \in \{1,\dots, M\}\), where \(M\) is the total number of stimulus accents presented.

\(\tilde{Y}\), then, consists of three parts: \(\eta\) represents the variation explained by fixed effects, \(\text{U}\) represents the random effects, and \(\epsilon\) represents the unexplained variance. The random effects comprised two components. Random intercepts for participants address the assumption that each participant has an individual bias to agree or disagree with the presented questions. By-participant random slope adjustments for stimulus accent as a function of question capture the assumption that people’s attitudes towards accents differ depending on individual-difference factors that are not covered by the predictors included below.

The linear models for \(\eta\) are defined in the following equations. Model parameters were added incrementally.

We started with a model that included only random effects and no fixed effects except for the threshold parameters \(\tau_\text{k}\); this corresponds to equation (4.2) without any predictors in \(\eta\).

The second model introduces, in addition to the random effects described before, the two (simple) main effects of interest, namely question (attribute) and stimulus accent, as fixed effects as well as predictors related to individual differences, i.e. predictors that are known to affect attitude towards accents but that are not of primary interest to our inference. The model is summarised in equation (4.3).

\[ \begin{align}\tag{4.3} \eta_\text{ijm} = &\text{ }\beta_\text{1} \cdot \text{question}_\text{ij} + \beta_\text{2} \cdot \text{stimulus accent}_\text{im} +\\ & \beta_\text{3} \cdot \text{uk area}_\text{i} + \beta_\text{4} \cdot \text{ethnicity}_\text{i} +\\ & \beta_\text{5} \cdot \text{familiarity}_\text{i} + \beta_\text{6} \cdot \text{age}_\text{i} +\\ & \beta_\text{7} \cdot \text{gender}_\text{i} \end{align} \]

The next model extends equation (4.3) by allowing for interactions between the two fixed effects of interest, namely question and stimulus accent, and the individual differences predictors introduced above. The relevant interactions were those between the factor question (attribute) and familiarity, age, gender, and ethnicity, as well as those between the factor stimulus accent and all individual differences predictors (age, gender, ethnicity, familiarity, UK area). The model is summarised in equation (4.4).

\[ \begin{align}\tag{4.4} \eta_\text{ijm} = &\text{ }\beta_\text{1} \cdot \text{question}_\text{ij} + \beta_\text{2} \cdot \text{stimulus accent}_\text{im} +\\ & \beta_\text{3} \cdot \text{uk area}_\text{i} + \beta_\text{4} \cdot \text{ethnicity}_\text{i} +\\ & \beta_\text{5} \cdot \text{familiarity}_\text{i} + \beta_\text{6} \cdot \text{age}_\text{i} +\\ & \beta_\text{7} \cdot \text{gender}_\text{i} + \beta_\text{8} \cdot \text{question:familiarity}_\text{ij} +\\ & \beta_\text{9} \cdot \text{question:age}_\text{ij} + \beta_\text{10} \cdot \text{question:gender}_\text{ij} +\\ & \beta_\text{11} \cdot \text{question:ethnicity}_\text{ij} + \beta_\text{12} \cdot \text{stimulus accent:uk area}_\text{im} +\\ & \beta_\text{13} \cdot \text{stimulus accent:ethnicity}_\text{im} + \beta_\text{14} \cdot \text{stimulus accent:familiarity}_\text{im} +\\ & \beta_\text{15} \cdot \text{stimulus accent:age}_\text{im} + \beta_\text{16} \cdot \text{stimulus accent:gender}_\text{im} \end{align} \]

The final model makes the additional assumption that ratings for particular questions vary depending on the stimulus accent (and vice versa), rather than question and stimulus accent contributing independently to the response outcomes. The model is summarised in equation (4.5).

\[ \begin{align}\tag{4.5} \eta_\text{ijm} = &\text{ }\beta_\text{1} \cdot \text{question}_\text{ij} + \beta_\text{2} \cdot \text{stimulus accent}_\text{im} +\\ & \beta_\text{3} \cdot \text{question:stimulus accent}_\text{ijm} + \beta_\text{4} \cdot \text{uk area}_\text{i} +\\ & \beta_\text{5} \cdot \text{ethnicity}_\text{i} + \beta_\text{6} \cdot \text{familiarity}_\text{i} +\\ & \beta_\text{7} \cdot \text{age}_\text{i} + \beta_\text{8} \cdot \text{gender}_\text{i} +\\ & \beta_\text{9} \cdot \text{question:familiarity}_\text{ij} + \beta_\text{10} \cdot \text{question:age}_\text{ij} +\\ & \beta_\text{11} \cdot \text{question:gender}_\text{ij} + \beta_\text{12} \cdot \text{question:ethnicity}_\text{ij} +\\ & \beta_\text{13} \cdot \text{stimulus accent:uk area}_\text{im} + \beta_\text{14} \cdot \text{stimulus accent:ethnicity}_\text{im} +\\ & \beta_\text{15} \cdot \text{stimulus accent:familiarity}_\text{im} + \beta_\text{16} \cdot \text{stimulus accent:age}_\text{im} +\\ & \beta_\text{17} \cdot \text{stimulus accent:gender}_\text{im} \end{align} \]
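
The fixed-effects parts of equations (4.3) to (4.5) can be written as R model formulas, as sketched below. This is an illustration under assumptions rather than the code used for the analysis: the object names are hypothetical, the column names are taken from the `glimpse()` output above, and the random-effects term and the call that fits the model are left out because the source does not show them.

```r
# Eq. (4.3): simple main effects only
f_main <- response ~ question + stimulus_accent +
  uk_area + ethnicity + familiarity + age + gender

# Eq. (4.4): adds interactions of question and stimulus accent
# with the individual differences predictors
f_indiv <- update(f_main, . ~ . +
  question:(familiarity + age + gender + ethnicity) +
  stimulus_accent:(uk_area + ethnicity + familiarity + age + gender))

# Eq. (4.5): adds the question by stimulus accent interaction
f_full <- update(f_indiv, . ~ . + question:stimulus_accent)

# Number of model terms in each formula
n_terms <- function(f) length(attr(terms(f), "term.labels"))
term_counts <- sapply(list(eq4.3 = f_main, eq4.4 = f_indiv, eq4.5 = f_full), n_terms)

# A formula like these, extended with a by-participant random-effects term,
# could be passed to brms::brm() with family = cumulative("probit"),
# chains = 4, iter = 4000, warmup = 2000 (an assumption about the setup).
```

The term counts (7, 16 and 17) match the number of \(\beta\) coefficients in the three equations. Note that each \(\beta\) stands for a model term; categorical predictors such as question contribute several coefficients per term once the model is fitted.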

5 Out-of-sample cross-validation

To establish evidence for the effects of interest, we applied cross-validation-based model comparisons, namely out-of-sample predictions estimated using Pareto smoothed importance-sampling leave-one-out cross-validation (Vehtari et al., 2015, 2017). Predictive performance was estimated as the expected log pointwise predictive density (\(\widehat{elpd}\)), and models were compared using the difference \(\Delta\widehat{elpd}\) between models. The advantage of using leave-one-out cross-validation is that it guards against overfitting: additional parameters only improve \(\widehat{elpd}\) if they improve predictions for held-out observations.

Results are shown in Table 5.1. Models are ordered from the best model (the model with the highest predictive performance) to the weakest model (the model with the lowest predictive performance). The comparisons show strong evidence that adding the interaction of stimulus accent and question (attribute) increases the predictive performance of the model. In contrast, adding question, stimulus accent and the individual differences predictors as simple main effects (equation 4.3) lowers the predictive performance relative to the model without fixed effects, and adding the interactions between the individual differences predictors and question as well as stimulus accent (equation 4.4) lowers it further.

Table 5.1: Model comparisons. The top row shows the model with the highest predictive performance. Standard errors are shown in parentheses.
Models              \(\Delta\widehat{elpd}\)    \(\widehat{elpd}\)
Eq. (4.5)           –                           -24,160 (121)
No fixed effects    -171 (31)                   -24,331 (119)
Eq. (4.3)           -221 (31)                   -24,380 (119)
Eq. (4.4)           -249 (29)                   -24,409 (121)
Note: \(\widehat{elpd}\) = predictive performance indicated as expected log pointwise predictive density; \(\Delta\widehat{elpd}\) = difference in predictive performance relative to the model with the highest predictive performance in the top row.
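
The arithmetic behind Table 5.1 can be retraced in a few lines of R. The values below are copied from the table; the data frame and column names are hypothetical, and the sketch assumes that the values in parentheses next to \(\Delta\widehat{elpd}\) are standard errors of the differences.

```r
# Values copied from Table 5.1 (rounded as reported)
loo_tab <- data.frame(
  model     = c("Eq. (4.5)", "No fixed effects", "Eq. (4.3)", "Eq. (4.4)"),
  elpd      = c(-24160, -24331, -24380, -24409),
  elpd_diff = c(0, -171, -221, -249),   # 0 for the reference model in the top row
  se_diff   = c(NA, 31, 31, 29)         # no standard error for the reference model
)

# Differences recomputed from the rounded elpd values; these can deviate
# from the reported differences by one unit because of rounding
loo_tab$diff_check <- loo_tab$elpd - max(loo_tab$elpd)

# Size of each difference in units of its standard error
loo_tab$ratio <- loo_tab$elpd_diff / loo_tab$se_diff

loo_tab
```

Each reported difference is more than five standard errors away from zero, which is why the advantage of the model in equation (4.5) is described as strong evidence.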

6 Posterior summary

The posterior of the model is summarised in Figure 6.1. The summary statistics show the most probable posterior (i.e. inferred) parameter value as well as the posterior probability interval (henceforth, PI), i.e. the interval that contains the true parameter value with 95% probability (Kruschke et al., 2012; Nicenboim & Vasishth, 2016; Sorensen et al., 2016). These are derived from the model with the highest predictive performance above (equation 4.5) and therefore account for individual differences.

Figure 6.1: Summary of posterior. Shown are the posterior median response ratings with 95% probability intervals.
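
The summaries shown in Figure 6.1 are of the kind computed in the sketch below: a posterior median with a 95% PI taken from the posterior draws of a quantity. The draws here are simulated stand-ins (not model output), and the variable names are hypothetical; 8,000 draws corresponds to 4 chains with 2,000 post-warm-up iterations each.

```r
set.seed(2)
# Simulated stand-in for the posterior draws of one predicted rating
draws <- rnorm(8000, mean = 4.2, sd = 0.3)

# Posterior median and the bounds of the central 95% probability interval
post_median <- median(draws)
pi_bounds   <- quantile(draws, probs = c(0.025, 0.975))

# Share of draws that fall inside the interval (about 0.95 by construction)
coverage <- mean(draws >= pi_bounds[1] & draws <= pi_bounds[2])
```

Two accents can then be compared informally by checking how far the PI of one lies from the median of the other.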

Noticeable are the following patterns:

  • Ratings for social attributes appear generally higher than for behavioural attributes, suggesting that the former are more strongly associated with accents.
  • Overall, social judgements elicit stronger differences between accents than behavioural judgements and are therefore more important for stereotyping.
  • Some attributes are more strongly associated with accents than others (both panels in Figure 6.1 are ordered from the lowest to the highest rating). For behavioural attributes, for example, “standing up for someone” is more strongly associated with accents and “report a relative to the police” is less strongly associated with accents. For social attributes, “aggressive” showed the lowest ratings, and the highest ratings were observed for trustworthy, kind, honest, confident and friendly. The lack of variability across accents for attributes like “aggressive” also suggests that accents are generally not associated with different levels of aggressiveness.
  • SSBE stands out for a few social and behavioural attributes. For behavioural attributes, the SSBE accent is associated with being more likely than other accents to “report a relative to the police” and “return a lost wallet”, and less likely to vandalise, shoplift, and physically assault someone. Note, though, that differences for all other attributes are small or absent. For social attributes, the differences for SSBE are large. Compared to other accents, SSBE is associated with being richer, more intelligent and educated, somewhat more confident, and less likely to be working class. However, no clear difference was observed for aggressive, trustworthy, kind, honest, confident, and friendly.
  • For accents other than SSBE, we observe two groups of accents for social judgements, with some accents (Liverpool, Newcastle, London) being rated lower than others for the attributes “rich,” “intelligent,” “educated,” “kind,” “trustworthy,” and “friendly.”
  • Newcastle stands out as less confident.

References

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. https://doi.org/10.32614/RJ-2018-017
Bürkner, P.-C., & Vuorre, M. (2019). Ordinal regression models in psychology: A tutorial. Advances in Methods and Practices in Psychological Science, 2(1), 77–101.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman & Hall/CRC.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.
Kruschke, J. K., Aguinis, H., & Joo, H. (2012). The time has come: Bayesian methods for data analysis in the organizational sciences. Organizational Research Methods, 15(4), 722–752.
McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.
Nicenboim, B., & Vasishth, S. (2016). Statistical methods for linguistic research: Foundational Ideas – Part II. Language and Linguistics Compass, 10(11), 591–613.
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Sorensen, T., Hohenstein, S., & Vasishth, S. (2016). Bayesian linear mixed models using Stan: A tutorial for psychologists, linguists, and cognitive scientists. Quantitative Methods for Psychology, 12(3), 175–200.
Vehtari, A., Gelman, A., & Gabry, J. (2015). Pareto smoothed importance sampling. arXiv Preprint arXiv:1507.02646.
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.