LexSIC: A vocabulary task for Sicilian Dialect

Authors

Arona, Besler, Cruschina, Ferin, Gyllstad, Venagli, Kupisch

Data exploration


Participants

  • Number of participants tested: 100

    • 63 Sicilian speakers living in Sicily
    • 22 heritage speakers (HS) of Sicilian in Germany
    • 5 Southern Italians
    • 10 Northern Italians

Exclusion of participants:

  • 7 participants excluded for guessing behavior in the IRT analysis of LexSIC: 156, 250, 541, 543, 566, 574, 539
  • 3 participants excluded for guessing behavior in DIALANG ITA: 510, 537, 461
  • Final sample: N = 90
    • 56 Sicilian speakers living in Sicily
    • 19 HS of Sicilian in Germany
    • 5 Southern Italians
    • 10 Northern Italians
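The exclusion arithmetic can be checked with a short base-R tally. This is a sketch. The per-group split of the exclusions is inferred from the before/after group sizes (63 to 56 and 22 to 19) and is not stated explicitly above: all seven LexSIC exclusions come from the Sicily group, and all three DIALANG exclusions come from the HS group.

```r
# Group sizes before exclusion (from the participant list above)
tested <- c(sicily = 63, hs_germany = 22, south_it = 5, north_it = 10)

# Excluded participant IDs, by reason
excl_lexsic  <- c(156, 250, 541, 543, 566, 574, 539)  # guessing in LexSIC (IRT analysis)
excl_dialang <- c(510, 537, 461)                       # guessing in DIALANG ITA

# Assumed split of exclusions by group (inferred, not stated in the source)
excluded <- c(sicily = length(excl_lexsic), hs_germany = length(excl_dialang),
              south_it = 0, north_it = 0)

final <- tested - excluded
final        # per-group N after exclusion: 56, 19, 5, 10
sum(final)   # total N after exclusion: 90
```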

Tab. 1. Sample description
Group                             n    Age: mean   sd      range
Sicilian Heritage Speakers (HS)   19   28.21       10.04   14-53
Northern Italians                 10   31.30       10.89   24-62
2L1 Sicilian Speakers             56   35.59       11.41   18-69
Southern Italians                 5    38.00       10.99   23-52

Level of education


Frequency table of education-level codes 0 to 7 (raw R output):
   0    1    2    3    4    5    6    7 
 825  600 1875  825 1500  225  825   75 
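All of the counts above are multiples of 75, and together they sum to 6750 = 90 × 75. Suppose the table was computed on the long-format data, in which each of the 90 participants contributes 75 rows (one per LexSIC item). That assumption fits the 5625 observations for 75 participants reported in the models below. Under it, dividing each count by 75 gives the number of participants per code. The meaning of codes 0 to 7 is not given here, so this sketch leaves them as codes.

```r
# Row counts per education-level code, copied from the output above
rows_per_code <- c(`0` = 825, `1` = 600, `2` = 1875, `3` = 825,
                   `4` = 1500, `5` = 225, `6` = 825, `7` = 75)

items_per_participant <- 75   # assumption: one row per participant and LexSIC item

stopifnot(all(rows_per_code %% items_per_participant == 0))  # every count divides evenly
participants_per_code <- rows_per_code / items_per_participant
participants_per_code        # 11 8 25 11 20 3 11 1
sum(participants_per_code)   # 90
```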
Tab. 2. Participants' education levels
Group               Education level   n
Sicilian HS         High school       8
Sicilian HS         Middle school     5
Sicilian HS         University        6
Northern Italians   High school       1
Northern Italians   University        9
Sicilian 2L1        High school       19
Sicilian 2L1        I don't know      1
Sicilian 2L1        Middle school     3
Sicilian 2L1        University        33
Southern Italians   High school       2
Southern Italians   Middle school     1
Southern Italians   University        2
Tab. 3. Participants' mothers' education levels
Group               Mothers' education level   n
Sicilian HS         High school                1
Sicilian HS         I don't know               2
Sicilian HS         Middle school              12
Sicilian HS         No degree                  2
Sicilian HS         University                 2
Northern Italians   High school                4
Northern Italians   Middle school              4
Northern Italians   No degree                  1
Northern Italians   University                 1
Sicilian 2L1        High school                17
Sicilian 2L1        I don't know               8
Sicilian 2L1        Middle school              19
Sicilian 2L1        No degree                  2
Sicilian 2L1        University                 10
Southern Italians   I don't know               1
Southern Italians   Middle school              4
Tab. 4. Participants' fathers' education levels
Group               Fathers' education level   n
Sicilian HS         High school                3
Sicilian HS         I don't know               2
Sicilian HS         Middle school              9
Sicilian HS         No degree                  4
Sicilian HS         University                 1
Northern Italians   High school                2
Northern Italians   Middle school              4
Northern Italians   No degree                  2
Northern Italians   University                 2
Sicilian 2L1        High school                15
Sicilian 2L1        I don't know               8
Sicilian 2L1        Middle school              20
Sicilian 2L1        No degree                  2
Sicilian 2L1        University                 11
Southern Italians   I don't know               1
Southern Italians   Middle school              3
Southern Italians   University                 1

Self-reported Sicilian proficiency (active vs. passive proficiency in the L1 and HS groups)

Tab. 5. Active and passive Sicilian proficiency
Group          Proficiency type    M      sd
Sicilian HS    Productive skills   3.05   1.99
Sicilian HS    Receptive skills    4.05   1.70
Sicilian 2L1   Productive skills   4.39   1.60
Sicilian 2L1   Receptive skills    5.00   1.15

Use of Sicilian

  • Score: 1-4

LexSIC

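library(dplyr)

# LexSIC score = N(yes)[real words] - 2 * N(yes)[pseudowords]
# answ: 1 = "yes", 0 = "no"; cond: "REAL" (real word) or "UNREAL" (pseudoword)
d <- d %>%
  group_by(participant) %>%
  mutate(tot_real   = sum(answ[cond == "REAL"]),    # "yes" to real words
         tot_unreal = sum(answ[cond == "UNREAL"]),  # "yes" to pseudowords
         tot_lex    = tot_real - 2 * tot_unreal) %>%
  ungroup()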
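The following minimal base-R version applies the scoring rule to one toy participant. It assumes, as the `sum()` calls above imply, that `answ` is coded 1 for a "yes" response and 0 otherwise. The function name `lexsic_score` and the toy responses are hypothetical.

```r
# Hypothetical helper: LexSIC score for one participant
# answ: 1 = "yes" (word judged as known), 0 = "no"; cond: "REAL" or "UNREAL"
lexsic_score <- function(answ, cond) {
  hits         <- sum(answ[cond == "REAL"])    # "yes" to real Sicilian words
  false_alarms <- sum(answ[cond == "UNREAL"])  # "yes" to pseudowords
  hits - 2 * false_alarms                      # each false alarm costs two points
}

# Toy participant: 3 of 4 real words accepted, 1 of 2 pseudowords accepted
answ <- c(1, 1, 1, 0, 1, 0)
cond <- c("REAL", "REAL", "REAL", "REAL", "UNREAL", "UNREAL")
lexsic_score(answ, cond)  # 3 - 2 * 1 = 1
```

Each accepted pseudoword subtracts two points, so indiscriminate "yes" answers lower the score. A participant with many false alarms can even end up with a negative score.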
Tab. 7. LexSIC score by group
Group               Mean LexSIC score   sd      range
Sicilian HS         31.47               10.17   12-45
Northern Italians   19.50               8.14    3-29
Sicilian 2L1        38.41               7.55    17-50
Southern Italians   33.80               4.96    26-39

DIALANG (Ita) for the Sicilian HS and Sicilian L1 groups

Tab. 8. DIALANG (Ita) score by group
Group          Mean DIALANG_ITA score   sd      range
Sicilian HS    29.89                    10.08   7-44
Sicilian 2L1   41.62                    4.27    28-49

Distribution of the measures

Correlation Analysis

Following Salmela et al. (2021), we run the following correlation analyses:

  1. Correlation between LexSIC score and age for the L1 and HS groups (Spearman correlation)
  2. Correlation between LexSIC score and self-reported proficiency for the L1 and HS groups (Spearman correlation)
  3. Correlation between LexSIC score and exposure for the L1 and HS groups (Spearman correlation)
  4. Correlation between LexSIC and DIALANG (ITA) scores for the L1 and HS groups (Spearman correlation)
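Spearman's rank correlation is Pearson's correlation computed on ranks. This makes it suitable for skewed or ordinal measures such as self-ratings. Here is a minimal base-R sketch with made-up scores (not the study data):

```r
# Toy data: LexSIC scores and self-rated proficiency for eight participants
lexsic <- c(12, 25, 31, 33, 38, 41, 44, 50)
rating <- c(1, 3, 2, 4, 4, 5, 6, 6)

# Spearman's rho with a test of H0: rho = 0
# (exact = FALSE because ties in `rating` rule out the exact p-value)
ct <- cor.test(lexsic, rating, method = "spearman", exact = FALSE)
ct$estimate   # rho
ct$p.value

# Same coefficient as Pearson's r on (mid-)ranks
cor(rank(lexsic), rank(rating))
```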

Prior to the correlation analyses, we test:

(a) whether the L1 and HS groups differ in LexSIC and DIALANG ITA scores (Wilcoxon rank-sum test)

(b) whether LexSIC scores differ by education level within the L1 and HS groups (Kruskal-Wallis test and Dunn's multiple comparison test)

Wilcoxon rank-sum tests: LexSIC and DIALANG

Dependent variable: tot lex SIC
Predictors                     Incidence Rate Ratios   CI              p
(Intercept)                    29.51                   26.10 – 33.37   <0.001
group [sicilian]               1.27                    1.10 – 1.47     0.001
Random Effects
σ²                             0.03
τ₀₀ participant                0.07
ICC                            0.73
N participant                  75
Observations                   5625
Marginal R² / Conditional R²   0.097 / 0.753
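The "Incidence Rate Ratios" in this table are exponentiated coefficients of a log-link count model, so they multiply rather than add. The contrast is labelled group [sicilian], so the intercept is the expected score for the reference group (HS). The implied expected score for the 2L1 group is therefore the product of the two IRRs. The arithmetic sketch below uses only the values reported above:

```r
# Values from the model table above (HS is the reference level)
irr_intercept <- 29.51   # expected LexSIC score, HS group
irr_sicilian  <- 1.27    # multiplicative change for group [sicilian]

expected_hs  <- irr_intercept
expected_2l1 <- irr_intercept * irr_sicilian   # 29.51 * 1.27 = 37.4777

# On the log scale the same model is additive: log(mu) = b0 + b1 * sicilian
b <- log(c(b0 = irr_intercept, b1 = irr_sicilian))
exp(b[["b0"]] + b[["b1"]])                    # equals expected_2l1
```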
  • Wilcoxon rank-sum test, LexSIC: the groups differ significantly (p < .0001), with a moderate effect size (r = .30)

    • HS: Median = 33, IQR = 18 (the interquartile range is the difference between the third and first quartiles)

    • L1: Median = 41, IQR = 12

  • Wilcoxon rank-sum test, DIALANG ITA: the groups differ significantly (p < .0001), with a large effect size (r = .57)

    • HS: Median = 33, IQR = 18

    • L1: Median = 42, IQR = 4.25
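The medians, IQRs and effect sizes above can be computed in base R along the lines shown below. This is a sketch on toy data. It assumes the common definition r = |Z|/√N, with Z taken from the normal approximation of the rank-sum test. The package the authors used is not stated. N is the number of independent observations entering the test.

```r
# Toy LexSIC scores (not the study data)
hs <- c(12, 25, 33, 38, 45)
l1 <- c(30, 38, 41, 44, 50)

# Median and IQR per group; IQR() returns Q3 - Q1 (default type-7 quantiles)
summ <- function(x) c(median = median(x), IQR = IQR(x))
rbind(HS = summ(hs), L1 = summ(l1))

# Wilcoxon rank-sum test with the normal approximation (the value 38 is tied)
wt <- wilcox.test(hs, l1, exact = FALSE, correct = FALSE)

# Recover |Z| from the two-sided p-value, then r = |Z| / sqrt(N)
z <- qnorm(wt$p.value / 2)
r <- abs(z) / sqrt(length(hs) + length(l1))
r
```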

Difference within groups as a function of educational level

Tab. 9. DIALANG (Ita) score by education level
Group          Education level   Mean DIALANG_ITA score   sd
Sicilian HS    High school       30.625                   11.831
Sicilian HS    Middle school     28.200                   8.267
Sicilian HS    University        30.333                   8.683
Sicilian 2L1   High school       41.421                   4.477
Sicilian 2L1   Middle school     40.000                   3.273
Sicilian 2L1   University        41.939                   4.236

Tab. 10. LexSIC score by education level
Group          Education level   Mean LexSIC score   sd
Sicilian HS    High school       31.125              10.796
Sicilian HS    Middle school     35.600              5.859
Sicilian HS    University        28.500              10.978
Sicilian 2L1   High school       36.737              8.139
Sicilian 2L1   Middle school     38.667              5.324
Sicilian 2L1   University        39.333              7.324

Kruskal-Wallis rank sum test

  • L1 group, DIALANG: scores differ significantly by education level (p < .001), and pairwise comparisons (Dunn's test) show that all levels differ significantly from one another (p < .001)

  • L1 group, LexSIC: scores differ significantly by education level (p < .001), and pairwise comparisons show that University differs significantly from both High school (p < .001) and Middle school (p < .01)

  • HS group, DIALANG: scores differ significantly by education level (p < .001), and pairwise comparisons show that all levels differ significantly from one another (p < .001)

  • HS group, LexSIC: scores differ significantly by education level (p < .001), and pairwise comparisons show that all levels differ significantly from one another (p < .001)
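The Kruskal-Wallis test is available in base R, but Dunn's test is not. The sketch below implements Dunn's (1964) z-test on mean ranks directly. It includes the usual correction for ties, and it is run on toy data. The implementation and p-value adjustment the authors used are not stated. The Holm adjustment and the helper name `dunn_sketch` are assumptions for illustration.

```r
# Toy data (not the study data): scores for three education levels
score <- c(28, 35, 31, 40,   22, 38, 41, 36,   44, 47, 39, 45)
educ  <- factor(rep(c("Middle school", "High school", "University"), each = 4))

# Omnibus test: do the rank distributions differ across education levels?
kw <- kruskal.test(score ~ educ)
kw$p.value

# Hypothetical helper: Dunn's pairwise z-tests on mean ranks, with tie
# correction and a p-value adjustment (Holm by default)
dunn_sketch <- function(x, g, adjust = "holm") {
  g  <- droplevels(as.factor(g))
  N  <- length(x)
  rk <- rank(x)                                   # pooled mid-ranks
  tie_sizes <- as.vector(table(rk))               # sizes of tied groups
  tie_corr  <- sum(tie_sizes^3 - tie_sizes) / (12 * (N - 1))
  mean_rk   <- sapply(split(rk, g), mean)         # mean rank per level
  n_g       <- sapply(split(rk, g), length)       # group sizes
  pairs     <- combn(levels(g), 2)                # all pairs of levels
  z <- apply(pairs, 2, function(p) {
    se <- sqrt((N * (N + 1) / 12 - tie_corr) * (1 / n_g[[p[1]]] + 1 / n_g[[p[2]]]))
    (mean_rk[[p[1]]] - mean_rk[[p[2]]]) / se
  })
  p_raw <- 2 * pnorm(-abs(z))                     # two-sided p-values
  data.frame(comparison = paste(pairs[1, ], "vs", pairs[2, ]),
             z = z, p = p_raw, p_adj = p.adjust(p_raw, method = adjust))
}

dunn_sketch(score, educ)
```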

However, the generalized linear model shows no significant effect of education level within either the L1 group (p = .5) or the HS group (p = .4).

Correlation between LexSIC and age

  • L1 group: Spearman's ρ = .12 (p < .001), but no significant effect of age (p = .94)

  • HS group: LexSIC scores correlate with age (Spearman's ρ = .38, p < .001), but there is no significant effect of age (p = .18)

Correlations between active and receptive skills and LexSIC scores

Dependent variable: tot lex SIC
Predictors                     Incidence Rate Ratios   CI              p
(Intercept)                    29.51                   26.22 – 33.21   <0.001
receptive skills               1.08                    1.02 – 1.15     0.008
group [sicilian]               1.27                    1.11 – 1.46     0.001
Random Effects
σ²                             0.03
τ₀₀ participant                0.07
ICC                            0.71
N participant                  74
Observations                   5550
Marginal R² / Conditional R²   0.155 / 0.755
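For a continuous predictor such as receptive skills, the IRR is the multiplicative change in the expected score per one-unit increase. Effects therefore compound over several units. The arithmetic sketch below uses the reported IRR of 1.08 and the bounds of its CI. It assumes the predictor entered the model on its original rating scale. If the predictor was standardized, "one unit" means one standard deviation instead.

```r
irr_receptive <- c(lower = 1.02, estimate = 1.08, upper = 1.15)  # from the table above

# Expected multiplicative change in LexSIC score for a k-unit increase
k <- 2
irr_receptive^k                        # 1.08^2 = 1.1664 for the point estimate
round(100 * (irr_receptive^k - 1), 1)  # the same change expressed as a percentage
```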
  • Within the 2L1 group:
    • Active skills:
      • Correlation (Spearman): .05 (p < .001)
      • glmer: no significant effect
    • Receptive skills:
      • Correlation (Spearman): -.03 (p = .04)
      • glmer: no significant effect
    • Composite measure:
      • Correlation (Spearman): .02 (not significant)
      • glmer: no significant effect
  • Within the HS group:
    • Active skills:
      • Correlation (Spearman): .78 (p < .001)
      • glmer: no significant effect
    • Receptive skills:
      • Correlation (Spearman): .78 (p < .001)
      • glmer: significant effect (p < .001)
    • Composite measure:
      • Correlation (Spearman): .79 (p < .001)
      • glmer: significant effect (p < .001)
  • Within the whole sample:
    • Active skills:
      • Correlation (Spearman): .32 (p < .001)
      • glmer: no significant effect
    • Receptive skills:
      • Correlation (Spearman): .24 (p < .001)
      • glmer: significant effect (p < .05) [see plot below]
    • Composite measure:
      • Correlation (Spearman): .22 (p < .001)
      • glmer: significant effect (p < .05)

Correlation between Sicilian Use and LexSIC

Dependent variable: tot lex SIC
Predictors                     Incidence Rate Ratios   CI              p
(Intercept)                    27.03                   23.31 – 31.35   <0.001
composite use family           1.01                    1.00 – 1.02     0.046
group [sicilian]               1.25                    1.09 – 1.44     0.002
Random Effects
σ²                             0.03
τ₀₀ participant                0.07
ICC                            0.72
N participant                  74
Observations                   5550
Marginal R² / Conditional R²   0.130 / 0.755
  • Within the 2L1 group:
    • Spearman's ρ = .13 (p < .001)
    • glmer: not significant
  • Within the HS group:
    • Spearman's ρ = .73 (p < .001)
    • glmer: significant (p < .01)
  • Within the whole sample:
    • Spearman's ρ = .30 (p < .001)
    • glmer: not significant

Correlation between LexSIC and DIALANG ITA

Dependent variable: tot lex SIC
Predictors                     Incidence Rate Ratios   CI              p
(Intercept)                    29.51                   26.34 – 33.06   <0.001
DIALANG                        1.12                    1.05 – 1.18     <0.001
group [sicilian]               1.27                    1.11 – 1.45     <0.001
Random Effects
σ²                             0.03
τ₀₀ participant                0.06
ICC                            0.69
N participant                  74
Observations                   5550
Marginal R² / Conditional R²   0.200 / 0.755
  • Within the L1 group:
    • correlation between DIALANG ITA and LexSIC: .34 (Spearman, p < .001)
  • Within the HS group:
    • correlation between DIALANG ITA and LexSIC: .57 (Spearman, p < .001)
  • Within the whole sample:
    • correlation between DIALANG ITA and LexSIC: .50 (Spearman, p < .001)
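The ICC values in the four model tables of this section can be cross-checked against the reported R² values. This sketch assumes Nakagawa's marginal and conditional R², in which the fixed-effect variance, the participant random-intercept variance τ₀₀ and the residual variance σ² share one denominator. Under that definition, ICC = τ₀₀/(τ₀₀ + σ²) = (R²c − R²m)/(1 − R²m). Computing the ICC from the displayed σ² and τ₀₀ would not reproduce the reported values, because those are rounded to two decimals. The R²-based calculation does reproduce them:

```r
# Marginal and conditional R2 and the ICC, as reported in the four model tables
tabs <- data.frame(
  model          = c("group", "receptive skills + group",
                     "composite use family + group", "DIALANG + group"),
  r2_marginal    = c(0.097, 0.155, 0.130, 0.200),
  r2_conditional = c(0.753, 0.755, 0.755, 0.755),
  icc_reported   = c(0.73, 0.71, 0.72, 0.69)
)

# With V = var(fixed) + tau00 + sigma2:
#   R2m = var(fixed) / V,  R2c = (var(fixed) + tau00) / V
# so (R2c - R2m) / (1 - R2m) = tau00 / (tau00 + sigma2) = ICC
tabs$icc_implied <- with(tabs, (r2_conditional - r2_marginal) / (1 - r2_marginal))

round(tabs$icc_implied, 2)   # 0.73 0.71 0.72 0.69, matching icc_reported

# For comparison: the rounded variance components of the first table give 0.70
0.07 / (0.07 + 0.03)
```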

Random Forest Analysis [to be continued]