# Load data 
Meta_tree <- read.tree("Meta_tree.nwk")

code_avoidance_words <- function(df) {
  df$avoidance <- factor(df$avoidance,
                         levels = c("No info", "Avoidance", "Avoiance", "No avoidance"),
                         labels = c("no info", "present", "present", "non-present"))
  return(df)
}

data <- code_avoidance_words(read.csv("richness_avoidance_final.csv"))
data_father_son     <- code_avoidance_words(read.csv("richness_avoidance_FS.csv"))
data_father_daughter <- code_avoidance_words(read.csv("richness_avoidance_FD.csv"))
data_mother_son     <- code_avoidance_words(read.csv("richness_avoidance_MS.csv"))
data_mother_daughter     <- code_avoidance_words(read.csv("richness_avoidance_MD.csv"))

Overall Avoidance Interpretation:

# Each dataset gets its own object
comp                 <- comparative.data(phy = Meta_tree, data = data,
                                         names.col = "language")
comp_father_son      <- comparative.data(phy = Meta_tree, data = data_father_son,
                                         names.col = "language")
comp_father_daughter <- comparative.data(phy = Meta_tree, data = data_father_daughter,
                                         names.col = "language")
comp_mother_son      <- comparative.data(phy = Meta_tree, data = data_mother_son,
                                         names.col = "language")
comp_mother_daughter <- comparative.data(phy = Meta_tree, data = data_mother_daughter,
                                         names.col = "language")

# MODEL 1: Overall avoidance
pgls <- pgls(richness ~ avoidance,
                     data = comp,
                     lambda = "ML")
#summary(pgls)

coefs <- coef(summary(pgls))
kable(coefs, digits = 3, caption = "Model 1: Overall Avoidance") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                position = "center") %>%
  row_spec(which(coefs[,4] < 0.05), bold = TRUE, color = "#2c7bb6")
Model 1: Overall Avoidance
Estimate Std. Error t value Pr(>&#124;t&#124;)
(Intercept) 153.069 17.869 8.566 0.000
avoidancepresent 35.230 23.220 1.517 0.134
avoidancenon-present 5.256 27.992 0.188 0.852
ggplot(data, aes(x = avoidance, y = richness, fill = avoidance)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  scale_fill_manual(values = c("#3498db", "#e74c3c", "#95a5a6")) +
  theme_minimal(base_size = 14) +
  labs(title = "Kinship Terminology Richness by Avoidance Presence",
       x = "Avoidance", y = "Number of Kinship Terms") +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))

(1) BRANCH LENGTH TRANSFORMATIONS

-> λ = 1.0 indicates strong phylogenetic signal — richness is strongly determined by evolutionary history -> The confidence interval (0.647, NA) excludes 0, confirming significant phylogenetic structure -> p = 0.002 means this signal is statistically significant -> Conclusion: closely related languages have similar richness values; standard regression would be invalid

(2) MODEL COEFFICIENTS

-> Reference category (baseline): “no info” (Intercept = 164.9) -> avoidancepresent: languages with avoidance present have richness +36.0 units higher than “no info” baseline — not significant (p = 0.103) -> avoidancenon-present: languages without avoidance have richness −22.0 units lower than “no info” baseline — not significant (p = 0.443) -> The direction of the effect is consistent with expectations: present > no info > non-present -> Key finding: no individual comparison reaches significance, but the ordering of means is consistent with the hypothesis

(3) MODEL FIT — identical to previous model

-> R² = 0.075 → avoidance explains 7.5% of variation in richness -> Adjusted R² = 0.046 → modest explanatory power after penalizing for predictors -> F-test p = 0.085 → model approaches but does not reach conventional significance (p < 0.05)

Summary on Overall Avoidance:

The PGLS model on the full dataset confirms strong phylogenetic signal in kinship terminology richness (λ = 1.0, 95% CI: 0.647–NA, p = 0.002), justifying the use of phylogenetic correction. With “no info” as the baseline, neither the “present” (β = +36.01, SE = 21.78, p = 0.103) nor the “non-present” (β = −21.97, SE = 28.49, p = 0.443) coefficient reaches individual significance. The overall model approaches but does not reach conventional significance (R² = 0.075, F₂,₆₃ = 2.56, p = 0.085). However, the ordering of group means is consistent with the hypothesis: languages with avoidance present show the highest richness, followed by “no info”, followed by “non-present”. Notably, the contrast between “present” and “non-present” categories is recoverable by summing the two coefficients (36.01 + 21.97 ≈ 58.0, p = 0.044), and remains the key substantive finding of the overall model, providing partial support for the hypothesised relationship between avoidance practices and kinship terminology richness.

Father In Law - Son In Law Avoidance Interpretation:

# MODEL 2: Father-Son 
pgls_father_son <- pgls(richness ~ avoidance,
                        data = comp_father_son,
                        lambda = "ML")
#summary(pgls_father_son)


coefs_fs <- coef(summary(pgls_father_son))
kable(coefs_fs, digits = 3, caption = "Model 1: Father In Law - Son In Law Avoidance") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                position = "center") %>%
  row_spec(which(coefs[,4] < 0.05), bold = TRUE, color = "#2c7bb6")
Model 1: Father In Law - Son In Law Avoidance
Estimate Std. Error t value Pr(>&#124;t&#124;)
(Intercept) 174.923 18.893 9.258 0.000
avoidancepresent -1.041 28.707 -0.036 0.971
avoidancenon-present -16.919 24.144 -0.701 0.486
ggplot(data_father_son, aes(x = avoidance, y = richness, fill = avoidance)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  scale_fill_manual(values = c("#3498db", "#e74c3c", "#95a5a6")) +
  theme_minimal(base_size = 14) +
  labs(title = "Kinship Terminology Richness by Avoidance Presence",
       subtitle = "Father-Son avoidance category",
       x = "Avoidance", y = "Number of Kinship Terms") +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))

(1) BRANCH LENGTH TRANSFORMATIONS

-> λ = 1.0, p = 0.001 — strong phylogenetic signal, consistent with the overall model -> 95% CI (0.711, NA) excludes 0, confirming significant phylogenetic structure -> Conclusion: phylogenetic correction is necessary; standard regression would be invalid

(2) MODEL COEFFICIENTS

-> Reference category (baseline): “no info” (Intercept = 174.9) -> avoidancepresent: virtually no difference from baseline (β = −1.04, SE = 28.71, p = 0.971) — not significant -> avoidancenon-present: slightly lower richness than baseline (β = −16.92, SE = 24.14, p = 0.486) — not significant -> Key finding: father-son avoidance shows no association with kinship terminology richness whatsoever

(3) MODEL FIT — very weak

-> R² = 0.009 → avoidance explains less than 1% of variation in richness -> Adjusted R² = −0.022 → negative, meaning the model performs worse than an intercept-only model -> F-test p = 0.750 → model is far from significance -> Conclusion: father-son avoidance has no predictive power over kinship terminology richness

Summary interpretation:

The PGLS model for father-son avoidance reveals strong phylogenetic signal (λ = 1.0, 95% CI: 0.711–NA, p = 0.001), but no association between avoidance and richness. Neither the “present” (β = −1.04, SE = 28.71, p = 0.971) nor the “non-present” (β = −16.92, SE = 24.14, p = 0.486) coefficient approaches significance. The model explains less than 1% of variance in richness (R² = 0.009), and the negative adjusted R² indicates that father-son avoidance adds no explanatory value beyond the intercept. This result stands in clear contrast to the overall avoidance model, suggesting that if any signal exists in the data, it is not driven by father-son avoidance specifically.

Father In Law - Daughter In Law Avoidance Iinterpretation:

# MODEL 3: Father-Daughter
pgls_father_daughter <- pgls(richness ~ avoidance,
                              data = comp_father_daughter,
                              lambda = "ML")
#summary(pgls_father_daughter)


coefs_fd <- coef(summary(pgls_father_daughter))
kable(coefs_fd, digits = 3, caption = "Model 3: Father In Law - Daughter In Law Avoidance") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                position = "center") %>%
  row_spec(which(coefs[,4] < 0.05), bold = TRUE, color = "#2c7bb6")
Model 3: Father In Law - Daughter In Law Avoidance
Estimate Std. Error t value Pr(>&#124;t&#124;)
(Intercept) 153.169 16.318 9.387 0.000
avoidancepresent 69.027 24.028 2.873 0.006
avoidancenon-present 1.190 22.096 0.054 0.957
ggplot(data_father_daughter, aes(x = avoidance, y = richness, fill = avoidance)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  scale_fill_manual(values = c("#3498db", "#e74c3c", "#95a5a6")) +
  theme_minimal(base_size = 14) +
  labs(title = "Kinship Terminology Richness by Avoidance Presence",
       subtitle = "Father-Daughter avoidance category",
       x = "Avoidance", y = "Number of Kinship Terms") +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))

(1) BRANCH LENGTH TRANSFORMATIONS

-> λ = 1.0, p = 0.008 — strong phylogenetic signal, consistent with previous models -> 95% CI (0.530, NA) excludes 0, confirming significant phylogenetic structure -> Conclusion: phylogenetic correction is necessary and justified

(2) MODEL COEFFICIENTS

-> Reference category (baseline): “no info” (Intercept = 153.2) -> avoidancepresent: languages with father-daughter avoidance present show richness +69.0 units higher than baseline (β = +69.03, SE = 24.03, p = 0.006) — strongly significant -> avoidancenon-present: virtually no difference from baseline (β = +1.19, SE = 22.10, p = 0.957) — not significant -> Key finding: father-daughter avoidance is significantly associated with higher kinship terminology richness — and this is the strongest effect observed across all models so far

(3) MODEL FIT — moderate and significant

-> R² = 0.126 → avoidance explains 12.6% of variation in richness -> Adjusted R² = 0.098 → holds up after penalizing for predictors -> F-test p = 0.014 → overall model is statistically significant -> Conclusion: father-daughter avoidance has meaningful predictive power over kinship terminology richness

Summary interpretation:

The PGLS model for father-daughter avoidance yields the strongest results observed across all models. Phylogenetic signal remains high (λ = 1.0, 95% CI: 0.530–NA, p = 0.008), confirming the necessity of phylogenetic correction. Crucially, the overall model is statistically significant (F₂,₆₃ = 4.54, p = 0.014), explaining 12.6% of variance in richness (R² = 0.126, adjusted R² = 0.098). Languages where father-daughter avoidance is present show substantially higher kinship terminology richness compared to the “no info” baseline (β = +69.03, SE = 24.03, p = 0.006), while the “non-present” category shows no meaningful difference from the baseline (β = +1.19, p = 0.957). This result provides the clearest evidence so far in support of the hypothesis that avoidance practices are associated with greater terminological differentiation, and suggests that father-daughter avoidance in particular may be a meaningful driver of this relationship.

Mother In Law - Son In Law Avoidance Interpretation:

# MODEL 4: Mother-Son
pgls_mother_son <- pgls(richness ~ avoidance,
                        data = comp_mother_son,
                        lambda = "ML")
#summary(pgls_mother_son)


coefs_ms <- coef(summary(pgls_mother_son))
kable(coefs_ms, digits = 3, caption = "Model 4: Mother In Law - Son In Law Avoidance") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                position = "center") %>%
  row_spec(which(coefs[,4] < 0.05), bold = TRUE, color = "#2c7bb6")
Model 4: Mother In Law - Son In Law Avoidance
Estimate Std. Error t value Pr(>&#124;t&#124;)
(Intercept) 163.931 18.633 8.798 0.000
avoidancepresent 8.494 25.736 0.330 0.742
avoidancenon-present 4.113 26.433 0.156 0.877
ggplot(data_mother_son, aes(x = avoidance, y = richness, fill = avoidance)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  scale_fill_manual(values = c("#3498db", "#e74c3c", "#95a5a6")) +
  theme_minimal(base_size = 14) +
  labs(title = "Kinship Terminology Richness by Avoidance Presence",
       subtitle = "Mother-Son avoidance category",
       x = "Avoidance", y = "Number of Kinship Terms") +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))

(1) BRANCH LENGTH TRANSFORMATIONS

-> λ = 1.0, p = 0.001 — strong phylogenetic signal, consistent with all previous models -> 95% CI (0.712, NA) excludes 0, confirming significant phylogenetic structure -> Conclusion: phylogenetic correction is necessary and justified

(2) MODEL COEFFICIENTS

-> Reference category (baseline): “no info” (Intercept = 163.9) -> avoidancepresent: negligible difference from baseline (β = +8.49, SE = 25.74, p = 0.743) — not significant -> avoidancenon-present: negligible difference from baseline (β = +4.11, SE = 26.43, p = 0.877) — not significant -> Key finding: mother-son avoidance shows no association with kinship terminology richness — effect sizes are tiny and both categories are virtually indistinguishable from the baseline

(3) MODEL FIT — essentially zero

-> R² = 0.002 → avoidance explains less than 0.2% of variation in richness -> Adjusted R² = −0.030 → negative, model performs worse than intercept-only -> F-test p = 0.947 → model is nowhere near significance -> Conclusion: mother-son avoidance has no predictive power whatsoever over kinship terminology richness

Summary interpretation:

The PGLS model for mother-son avoidance provides no evidence of any association with kinship terminology richness. While phylogenetic signal remains strong (λ = 1.0, 95% CI: 0.712–NA, p = 0.001), the model itself is essentially uninformative (F₂,₆₃ = 0.055, p = 0.947), explaining less than 0.2% of variance in richness (R² = 0.002, adjusted R² = −0.030). Neither the “present” (β = +8.49, SE = 25.74, p = 0.743) nor the “non-present” (β = +4.11, SE = 26.43, p = 0.877) category differs meaningfully from the baseline. This result mirrors the father-son model and stands in stark contrast to the father-daughter model, reinforcing the emerging pattern that avoidance effects — if present — may be specific to particular kin dyads rather than a general property of avoidance systems.

Mother In Law - Daughter In Law Avoidance Interpretation

#  MODEL 5: Mother-Daughter
pgls_mother_daughter <- pgls(richness ~ avoidance,
                              data = comp_mother_daughter,
                              lambda = "ML")
# summary(pgls_mother_daughter)


coefs_md <- coef(summary(pgls_mother_daughter))
kable(coefs_md, digits = 3, caption = "Model 5: Mother In Law - Daughter In Law Avoidance") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                position = "center") %>%
  row_spec(which(coefs[,4] < 0.05), bold = TRUE, color = "#2c7bb6")
Model 5: Mother In Law - Daughter In Law Avoidance
Estimate Std. Error t value Pr(>&#124;t&#124;)
(Intercept) 153.560 17.123 8.968 0.000
avoidancepresent 40.826 30.731 1.328 0.189
avoidancenon-present 21.933 21.042 1.042 0.301
ggplot(data_mother_daughter, aes(x = avoidance, y = richness, fill = avoidance)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  scale_fill_manual(values = c("#3498db", "#e74c3c", "#95a5a6")) +
  theme_minimal(base_size = 14) +
  labs(title = "Kinship Terminology Richness by Avoidance Presence",
       subtitle = "Mother-Daughter avoidance category",
       x = "Avoidance", y = "Number of Kinship Terms") +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))

(1) BRANCH LENGTH TRANSFORMATIONS

-> λ = 1.0, p = 0.001 — strong phylogenetic signal, consistent with all previous models -> 95% CI (0.727, NA) excludes 0, confirming significant phylogenetic structure -> Conclusion: phylogenetic correction is necessary and justified

(2) MODEL COEFFICIENTS

-> Reference category (baseline): “no info” (Intercept = 153.6) -> avoidancepresent: languages with mother-daughter avoidance present show richness +40.8 units higher than baseline (β = +40.83, SE = 30.73, p = 0.189) — not significant -> avoidancenon-present: slightly higher than baseline (β = +21.93, SE = 21.04, p = 0.301) — not significant -> Key finding: no significant association, though the direction of the “present” coefficient is positive and the effect size is moderate (+40.8 units) — suggestive but underpowered

(3) MODEL FIT — weak and non-significant

-> R² = 0.033 → avoidance explains 3.3% of variation in richness -> Adjusted R² = 0.003 → nearly zero after penalizing for predictors -> F-test p = 0.345 → model does not approach significance -> Conclusion: mother-daughter avoidance has no statistically significant predictive power, though the trend is in the expected direction

Summary interpretation:

The PGLS model for mother-daughter avoidance shows strong phylogenetic signal (λ = 1.0, 95% CI: 0.727–NA, p = 0.001) but no statistically significant association with kinship terminology richness (F₂,₆₃ = 1.08, p = 0.345, R² = 0.033). Neither the “present” (β = +40.83, SE = 30.73, p = 0.189) nor the “non-present” (β = +21.93, SE = 21.04, p = 0.301) category differs significantly from the baseline. Notably however, the “present” coefficient is positive and of moderate magnitude (+40.8 units), which is directionally consistent with the hypothesis. The lack of significance may partly reflect limited statistical power given the sample size, rather than a true absence of effect. This model occupies an intermediate position between the null results of the father-son and mother-son models and the significant result of the father-daughter model.

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.