LexSIC: A vocabulary task for Sicilian Dialect
Data exploration
Participants
N tested participants: 100
- 63 Sicilian speakers living in Sicily
- 22 HS of Sicilian in Germany
- 5 South Italians
- 10 North Italians
Exclusion of participants:
- 7 participants excluded for guessing behavior during IRT analysis for LexSIC: 156, 250, 541, 543, 566, 574, 539
- 3 participants excluded for guessing behavior in DIALANG ITA: 510, 537, 461
- Total participants after exclusion: N = 90
  - 56 Sicilian speakers living in Sicily
  - 19 HS of Sicilian in Germany
  - 5 South Italians
  - 10 North Italians
| Group | n | Age Mean | sd | range |
|---|---|---|---|---|
| Sicilian Heritage Speakers (HS) | 19 | 28.21 | 10.04 | 14-53 |
| Northern Italians | 10 | 31.30 | 10.89 | 24-62 |
| 2L1 Sicilian Speakers | 56 | 35.59 | 11.41 | 18-69 |
| Southern Italians | 5 | 38.00 | 10.99 | 23-52 |
Level of education
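Raw counts by code (number of data rows, not participants):

| Code | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Rows | 825 | 600 | 1875 | 825 | 1500 | 225 | 825 | 75 |
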
| Group | Education Level | n |
|---|---|---|
| Sicilian HS | High school | 8 |
| Sicilian HS | Middle school | 5 |
| Sicilian HS | University | 6 |
| Northern Italians | High school | 1 |
| Northern Italians | University | 9 |
| Sicilian 2L1 | High school | 19 |
| Sicilian 2L1 | I don't know | 1 |
| Sicilian 2L1 | Middle school | 3 |
| Sicilian 2L1 | University | 33 |
| Southern Italians | High school | 2 |
| Southern Italians | Middle school | 1 |
| Southern Italians | University | 2 |
| Group | Mother's Education Level | n |
|---|---|---|
| Sicilian HS | High school | 1 |
| Sicilian HS | I don't know | 2 |
| Sicilian HS | Middle school | 12 |
| Sicilian HS | No degree | 2 |
| Sicilian HS | University | 2 |
| Northern Italians | High school | 4 |
| Northern Italians | Middle school | 4 |
| Northern Italians | No degree | 1 |
| Northern Italians | University | 1 |
| Sicilian 2L1 | High school | 17 |
| Sicilian 2L1 | I don't know | 8 |
| Sicilian 2L1 | Middle school | 19 |
| Sicilian 2L1 | No degree | 2 |
| Sicilian 2L1 | University | 10 |
| Southern Italians | I don't know | 1 |
| Southern Italians | Middle school | 4 |
| Group | Father's Education Level | n |
|---|---|---|
| Sicilian HS | High school | 3 |
| Sicilian HS | I don't know | 2 |
| Sicilian HS | Middle school | 9 |
| Sicilian HS | No degree | 4 |
| Sicilian HS | University | 1 |
| Northern Italians | High school | 2 |
| Northern Italians | Middle school | 4 |
| Northern Italians | No degree | 2 |
| Northern Italians | University | 2 |
| Sicilian 2L1 | High school | 15 |
| Sicilian 2L1 | I don't know | 8 |
| Sicilian 2L1 | Middle school | 20 |
| Sicilian 2L1 | No degree | 2 |
| Sicilian 2L1 | University | 11 |
| Southern Italians | I don't know | 1 |
| Southern Italians | Middle school | 3 |
| Southern Italians | University | 1 |
Self-reported Sicilian Proficiency (active vs. passive proficiency in the L1 and HS group)
| Group | Proficiency type | M | sd |
|---|---|---|---|
| Sicilian HS | Productive skills | 3.05 | 1.99 |
| Sicilian HS | Receptive skills | 4.05 | 1.70 |
| Sicilian 2L1 | Productive skills | 4.39 | 1.60 |
| Sicilian 2L1 | Receptive skills | 5.00 | 1.15 |
Use of Sicilian
- Score: 1-4
LexSIC
    # Formula = N(yes)[WORDS] - 2 * N(yes)[PSEUDO]
    library(dplyr)

    d <- d %>%
      group_by(participant) %>%
      mutate(
        tot_real   = sum(answ[cond == "REAL"]),    # "yes" responses to real words
        tot_unreal = sum(answ[cond == "UNREAL"]),  # "yes" responses to pseudowords
        tot_lex    = tot_real - 2 * tot_unreal     # LexSIC score
      ) %>%
      ungroup()

| Group | Mean LexSIC score | sd | range |
|---|---|---|---|
| Sicilian HS | 31.47 | 10.17 | 12-45 |
| Northern Italians | 19.50 | 8.14 | 3-29 |
| Sicilian 2L1 | 38.41 | 7.55 | 17-50 |
| Southern Italians | 33.80 | 4.96 | 26-39 |
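
The scoring formula given above (yes-responses to words minus twice the yes-responses to pseudowords) can be checked by hand on a small example. The sketch below uses a hypothetical toy data set, not the study data, and assumes that `answ` is coded 1 for "yes" and 0 for "no". Unlike the `mutate()` call in the report, it uses `summarise()` so that the result has one row per participant.

```r
library(dplyr)

# Hypothetical toy data: 2 participants, 3 real words and 2 pseudowords each.
# answ is assumed to be 1 for a "yes" response and 0 for a "no" response.
toy <- tibble(
  participant = rep(c("p1", "p2"), each = 5),
  cond        = rep(c("REAL", "REAL", "REAL", "UNREAL", "UNREAL"), times = 2),
  answ        = c(1, 1, 1, 0, 0,   # p1: 3 "yes" to words, 0 to pseudowords
                  1, 1, 0, 1, 0)   # p2: 2 "yes" to words, 1 to pseudowords
)

# One row per participant with the LexSIC-style score.
scores <- toy %>%
  group_by(participant) %>%
  summarise(
    tot_real   = sum(answ[cond == "REAL"]),
    tot_unreal = sum(answ[cond == "UNREAL"]),
    tot_lex    = tot_real - 2 * tot_unreal,
    .groups    = "drop"
  )

print(scores)
```

In this toy example p1 scores 3 - 2 * 0 = 3 and p2 scores 2 - 2 * 1 = 0, which shows how a single false alarm on a pseudoword cancels two correctly accepted words.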
DIALANG (Ita) for the Sicilian HS and Sicilian L1 groups
| Group | Mean DIALANG_ITA score | sd | range |
|---|---|---|---|
| Sicilian HS | 29.89 | 10.08 | 7-44 |
| Sicilian 2L1 | 41.62 | 4.27 | 28-49 |
Distribution of the measures
Correlation Analysis
Following Salmela et al. (2021), we run the following correlation analyses:
- Correlation between LexSIC score and age for the L1 and HS groups (Spearman correlation)
- Correlation between LexSIC score and self-reported proficiency for the L1 and HS groups (Spearman correlation)
- Correlation between LexSIC score and exposure for the L1 and HS groups (Spearman correlation)
- Correlation between LexSIC score and DIALANG (ITA) score for the L1 and HS groups (Spearman correlation)
Prior to the correlation analyses, we test:
(a) whether the L1 and HS groups differ in their LexSIC and DIALANG ITA scores (Wilcoxon rank-sum test)
(b) whether LexSIC scores differ by education level within the L1 and HS groups (Kruskal-Wallis test and Dunn’s multiple comparison test)
Wilcoxon rank-sum test LexSIC and DIALANG
Dependent variable: tot lex SIC

| Predictors | Incidence Rate Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 29.51 | 26.10 – 33.37 | <0.001 |
| group [sicilian] | 1.27 | 1.10 – 1.47 | 0.001 |
| Random Effects | | | |
| σ² | 0.03 | | |
| τ00 participant | 0.07 | | |
| ICC | 0.73 | | |
| N participant | 75 | | |
| Observations | 5625 | | |
| Marginal R² / Conditional R² | 0.097 / 0.753 | | |

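
The model table above reports incidence rate ratios, which suggests a count model with a log link. Under that assumption, the intercept is the expected score in the reference group (HS) and the group coefficient is a multiplicative factor for the Sicilian group. The sketch below illustrates this with a plain Poisson `glm()` on hypothetical per-participant scores. It is a simplification, not the report's model: it has no random intercept for participant, and the Poisson family is an assumption.

```r
# Hypothetical per-participant scores (not the study data).
toy <- data.frame(
  group   = factor(rep(c("HS", "sicilian"), each = 4),
                   levels = c("HS", "sicilian")),
  tot_lex = c(28, 30, 32, 30,   # HS: mean 30
              36, 38, 40, 38)   # sicilian: mean 38
)

# Poisson regression with a log link (assumed family, no random effects).
fit <- glm(tot_lex ~ group, family = poisson(link = "log"), data = toy)

# Exponentiated coefficients are incidence rate ratios:
# exp(intercept) is the HS mean, exp(group coefficient) is the ratio of means.
irr <- exp(coef(fit))
print(round(irr, 3))

# Applying the same reading to the reported table:
# expected score for the Sicilian group = intercept * IRR.
predicted_sicilian <- 29.51 * 1.27
print(round(predicted_sicilian, 2))
```

Read this way, the reported intercept of 29.51 and ratio of 1.27 imply an expected score of about 37.48 for the Sicilian group. In a mixed model this holds for a participant whose random intercept is zero, so it need not match the raw group mean exactly.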
The Wilcoxon rank-sum test shows a significant difference in LexSIC scores between groups (p < .0001), effect size r = .30 (moderate):
- HS: Median = 33, IQR = 18
- L1: Median = 41, IQR = 12

(The interquartile range, IQR, is the difference between the 3rd and 1st quartiles.)

The Wilcoxon rank-sum test also shows a significant difference in DIALANG scores between groups (p < .0001), effect size r = .57 (large):
- HS: Median = 33, IQR = 18
- L1: Median = 42, IQR = 4.25
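
A group comparison of this kind can be sketched in base R as follows. The scores are hypothetical, with one value per participant. The effect size uses one common definition, r = |Z| / sqrt(N), with Z recovered from the two-sided p-value of the normal approximation. Whether the report computed r in exactly this way is an assumption.

```r
# Hypothetical per-participant LexSIC scores (not the study data).
hs <- c(12, 20, 25, 31, 33, 36, 40, 45)
l1 <- c(17, 30, 35, 38, 41, 43, 46, 50)

# Wilcoxon rank-sum test using the normal approximation without
# continuity correction, so that Z can be recovered from the p-value.
res <- wilcox.test(hs, l1, exact = FALSE, correct = FALSE)

# Effect size r = |Z| / sqrt(N), where N is the total sample size.
z <- qnorm(res$p.value / 2)
r <- abs(z) / sqrt(length(hs) + length(l1))

cat("p =", res$p.value, " r =", r, "\n")

# Descriptives reported alongside the test: median and interquartile range.
cat("HS: median =", median(hs), " IQR =", IQR(hs), "\n")
cat("L1: median =", median(l1), " IQR =", IQR(l1), "\n")
```

Running the test on one score per participant keeps the sample size equal to the number of participants. On long-format data, where each participant's total is repeated on every item row, the p-value would be far too small.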
Difference within groups as a function of educational level
| Group | Education level | Mean DIALANG_ITA score | sd |
|---|---|---|---|
| HS | High school | 30.625 | 11.831 |
| HS | Middle school | 28.200 | 8.267 |
| HS | University | 30.333 | 8.683 |
| sicilian | High school | 41.421 | 4.477 |
| sicilian | Middle school | 40.000 | 3.273 |
| sicilian | University | 41.939 | 4.236 |
| Group | Education level | Mean LexSIC score | sd |
|---|---|---|---|
| HS | High school | 31.125 | 10.796 |
| HS | Middle school | 35.600 | 5.859 |
| HS | University | 28.500 | 10.978 |
| sicilian | High school | 36.737 | 8.139 |
| sicilian | Middle school | 38.667 | 5.324 |
| sicilian | University | 39.333 | 7.324 |
Kruskal-Wallis rank sum test
Within the L1 group, the difference in DIALANG scores by education level is significant (p < .001), and the pairwise comparisons show that all levels differ significantly from one another (p < .001).
Within the L1 group, the difference in LexSIC scores by education level is significant (p < .001), and the pairwise comparisons show significant differences between University and High school (p < .001) and between University and Middle school (p < .01).
Within the HS group, the difference in DIALANG scores by education level is significant (p < .001), and the pairwise comparisons show that all levels differ significantly from one another (p < .001).
Within the HS group, the difference in LexSIC scores by education level is significant (p < .001), and the pairwise comparisons show that all levels differ significantly from one another (p < .001).
The generalized linear model shows no significant difference between education levels within either the L1 group (p = 0.5) or the HS group (p = 0.4).
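
A Kruskal-Wallis test of scores by education level can be sketched in base R as below. The data are hypothetical, with one row per participant. Dunn's test for the pairwise follow-up is not part of base R. It is available in add-on packages (for example `FSA::dunnTest()` or `rstatix::dunn_test()`) and is left out of this sketch.

```r
# Hypothetical data: one LexSIC score per participant (not the study data).
toy <- data.frame(
  education = factor(rep(c("Middle school", "High school", "University"),
                         each = 5)),
  tot_lex   = c(30, 34, 36, 38, 41,   # Middle school
                29, 33, 37, 39, 42,   # High school
                31, 35, 40, 43, 44)   # University
)

# Kruskal-Wallis rank sum test: do the score distributions differ
# across the three education levels?
kw <- kruskal.test(tot_lex ~ education, data = toy)
print(kw)
```

With one score per participant, the degrees of freedom equal the number of education levels minus one, and the p-value reflects the number of participants rather than the number of item responses.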
Correlation between LexSIC and age
L1 group: Spearman’s rank correlation = .12 (p < .001); no significant effect of age (p = .94)
HS group: Spearman’s rank correlation = .38 (p < .001); no significant effect of age (p = .18)
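
A Spearman rank correlation between age and LexSIC score can be sketched in base R with `cor.test()`. The values are hypothetical and the sketch assumes one row per participant.

```r
# Hypothetical per-participant values (not the study data).
age     <- c(18, 22, 25, 31, 36, 40, 47, 53)
tot_lex <- c(20, 31, 28, 35, 33, 41, 39, 45)

# Spearman's rank correlation with its significance test.
res <- cor.test(age, tot_lex, method = "spearman")

cat("rho =", unname(res$estimate), " p =", res$p.value, "\n")
```

The same call applies to the other correlations in this report (self-reported proficiency, use of Sicilian, DIALANG ITA) by swapping the first argument.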
Correlations between active and receptive skills & LexSIC scores
Dependent variable: tot lex SIC

| Predictors | Incidence Rate Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 29.51 | 26.22 – 33.21 | <0.001 |
| receptive skills | 1.08 | 1.02 – 1.15 | 0.008 |
| group [sicilian] | 1.27 | 1.11 – 1.46 | 0.001 |
| Random Effects | | | |
| σ² | 0.03 | | |
| τ00 participant | 0.07 | | |
| ICC | 0.71 | | |
| N participant | 74 | | |
| Observations | 5550 | | |
| Marginal R² / Conditional R² | 0.155 / 0.755 | | |

Within the 2L1 group:
- Active skills:
  - Correlation (Spearman): .05 (p < .001)
  - glmer: no effect
- Receptive skills:
  - Correlation (Spearman): -.03 (p = .04)
  - glmer: no effect
- Composite measure:
  - Correlation (Spearman): .02 (not significant)
  - glmer: no effect

Within the HS group:
- Active skills:
  - Correlation (Spearman): .78 (p < .001)
  - glmer: no effect
- Receptive skills:
  - Correlation (Spearman): .78 (p < .001)
  - glmer: significant effect (p < .001)
- Composite measure:
  - Correlation (Spearman): .79 (p < .001)
  - glmer: significant effect (p < .001)

Within the whole sample:
- Active skills:
  - Correlation (Spearman): .32 (p < .001)
  - glmer: no effect
- Receptive skills:
  - Correlation (Spearman): .24 (p < .001)
  - glmer: significant effect (p < .05) [see plot below]
- Composite measure:
  - Correlation (Spearman): .22 (p < .001)
  - glmer: significant effect (p < .05)
Correlation between Sicilian Use and LexSIC
Dependent variable: tot lex SIC

| Predictors | Incidence Rate Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 27.03 | 23.31 – 31.35 | <0.001 |
| composite use family | 1.01 | 1.00 – 1.02 | 0.046 |
| group [sicilian] | 1.25 | 1.09 – 1.44 | 0.002 |
| Random Effects | | | |
| σ² | 0.03 | | |
| τ00 participant | 0.07 | | |
| ICC | 0.72 | | |
| N participant | 74 | | |
| Observations | 5550 | | |
| Marginal R² / Conditional R² | 0.130 / 0.755 | | |

- Within the 2L1 group:
  - cor = .13 (p < .001)
  - glmer not significant
- Within the HS group:
  - cor = .73 (p < .001)
  - glmer significant (p < .01)
- Within the whole sample:
  - cor = .30 (p < .001)
  - glmer not significant
Correlation between LexSIC and DIALANG ITA
Dependent variable: tot lex SIC

| Predictors | Incidence Rate Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 29.51 | 26.34 – 33.06 | <0.001 |
| DIALANG | 1.12 | 1.05 – 1.18 | <0.001 |
| group [sicilian] | 1.27 | 1.11 – 1.45 | <0.001 |
| Random Effects | | | |
| σ² | 0.03 | | |
| τ00 participant | 0.06 | | |
| ICC | 0.69 | | |
| N participant | 74 | | |
| Observations | 5550 | | |
| Marginal R² / Conditional R² | 0.200 / 0.755 | | |

- Within the L1 group:
  - the correlation between DIALANG ITA and LexSIC is .34 (p < .001, Spearman)
- Within the HS group:
  - the correlation between DIALANG ITA and LexSIC is .57 (p < .001, Spearman)
- Within the whole sample:
  - the correlation between DIALANG ITA and LexSIC is .50 (p < .001, Spearman)