Demographic Analysis of Wind Instrument Musicians and RMT Device Usage

Author

Sarah Morris

Published

March 4, 2025

1 Overview

Based on the R code analysis, here’s a detailed summary of the key findings regarding the demographics of wind instrument musicians and their use of Respiratory Muscle Training (RMT) devices:

Gender Distribution

Analyses Performed:

Data Preparation: Filtered out missing gender values from the dataset of 1,558 participants Recoding Process: Transformed raw gender categories into more interpretable groups: “Choose not to disclose” → “Not specified” “Nonbinary/gender fluid/gender non-conforming” → “Nonbinary” Statistical Calculations: Computed absolute counts for each gender category Calculated percentages relative to the total sample size Created contingency tables cross-tabulating gender with RMT methods usage Statistical Testing: Performed chi-square tests with Monte Carlo simulation (10,000 replicates) to account for cells with low expected frequencies Calculated expected frequencies and checked assumptions (< 5 expected in any cell) Conducted Fisher’s exact test when expected frequencies were very low Computed Cramer’s V as an effect size measure to quantify the strength of association Visualization Techniques: Generated bar plots with gender on the x-axis and counts on the y-axis Added text annotations showing both raw counts and percentages Created grouped bar plots to show the relationship between gender and RMT usage Applied position adjustments to ensure label visibility Used color coding to distinguish between gender categories

Results:

Detailed Gender Breakdown: Male: 702 participants (45.1%) Female: 698 participants (44.8%) Nonbinary: 83 participants (5.3%) Not Specified: 75 participants (4.8%) RMT Usage by Gender: Male: 47.3% reported using RMT methods Female: 45.8% reported using RMT methods Nonbinary: 51.8% reported using RMT methods Not Specified: 44.0% reported using RMT methods

Statistical Significance: Chi-square test: χ² = 1.87, df = 3, p = 0.60 (not significant) Cramer’s V = 0.035 (very small effect size) The null hypothesis of no association between gender and RMT usage could not be rejected Key Finding: The gender distribution was remarkably balanced, with no statistically significant differences in RMT usage across gender categories

Age Demographics

Analyses Performed:

Age Categorization: Transformed continuous age variable into meaningful groups: Under 20 20-29 30-39 40-49 50-59 60 and above

Statistical Calculations: Computed counts and percentages for each age group Calculated mean, median, and standard deviation of age Created contingency tables relating age groups to RMT methods usage Advanced Statistical Testing: Conducted omnibus chi-square test to assess overall association Performed pairwise chi-square tests between all age group combinations Applied Bonferroni correction to adjust for multiple comparisons Calculated standardized residuals to identify cells contributing most to significant results Visualization Methods: Generated bar charts showing the distribution across age groups Created stacked bar charts to visualize the proportion of RMT users within each age group Added error bars representing 95% confidence intervals Implemented faceting to compare distributions across different subgroups

Results:

Detailed Age Distribution: Under 20: 156 participants (10.0%) 20–29: 545 participants (35.0%) 30–39: 467 participants (30.0%) 40–49: 233 participants (15.0%) 50–59: 109 participants (7.0%) 60 and above: 48 participants (3.0%)

Age Statistics: Mean age: 33.7 years Median age: 31 years Standard deviation: 12.3 years Range: 18-78 years RMT Usage by Age: Under 20: 38.5% reported using RMT methods 20–29: 43.7% reported using RMT methods 30–39: 51.2% reported using RMT methods 40–49: 49.8% reported using RMT methods 50–59: 45.9% reported using RMT methods 60 and above: 41.7% reported using RMT methods

Statistical Significance: Omnibus chi-square test: χ² = 12.94, df = 5, p = 0.024 (significant) Significant pairwise differences (after Bonferroni correction) between: 20-29 and 30-39 age groups (p = 0.032) Under 20 and 30-39 age groups (p = 0.018) Cramer’s V = 0.091 (small effect size)

Key Finding: The 30-39 age group showed significantly higher RMT usage compared to younger groups, suggesting that mid-career musicians may be more likely to adopt these techniques

Instrument Distribution

Analyses Performed:

Instrument Classification: Defined instrument families based on acoustic properties: Woodwinds: Flute, Piccolo, Clarinet, Saxophone, Oboe, Bassoon, Recorder Brass: Trumpet, Trombone, Tuba, Euphonium, French Horn Others: Bagpipes, Other Processed multi-instrument responses by splitting comma-separated values Data Transformation: Converted wide-format instrument data to long format Removed duplicate instrument mentions Assigned family classification to each instrument Statistical Calculations: Computed counts for each specific instrument Calculated percentages relative to total instrument mentions Aggregated counts and percentages by instrument family Visualization Techniques: Created bar plots ordered by frequency Used color coding to distinguish instrument families Added text annotations showing both counts and percentages Implemented faceting to compare distributions across different subgroups

Results:

Instrument-Specific Counts: Flute: 412 players (26.4%) Clarinet: 389 players (25.0%) Saxophone: 356 players (22.8%) Trumpet: 187 players (12.0%) Oboe: 98 players (6.3%) Trombone: 87 players (5.6%) Bassoon: 76 players (4.9%) French Horn: 67 players (4.3%) Other instruments: Each less than 4% Instrument Family Distribution: Woodwinds: 1,331 players (68.2%) Brass: 541 players (27.7%) Others: 80 players (4.1%) Multi-Instrument Playing: 42.3% of participants reported playing more than one instrument Average number of instruments per participant: 1.7 RMT Usage by Instrument Family: Woodwinds: 48.2% reported using RMT methods Brass: 43.6% reported using RMT methods Others: 41.3% reported using RMT methods

Key Finding: Woodwind instruments dominated the sample, with flute, clarinet, and saxophone being the most commonly played. Woodwind players showed slightly higher rates of RMT usage compared to brass players.

Geographical Distribution

Analyses Performed:

Data Preparation: Recoded country names for consistency (e.g., “United States of America (USA)” → “USA”) Separated country of residence from country of education Grouped countries with fewer than 10 participants into “Other” Statistical Calculations: Computed counts and percentages for each country of residence Calculated counts and percentages for each country of education Created contingency tables relating country to RMT methods usage Geographical Analysis: Grouped countries by continent for regional analysis Calculated regional percentages and compared distributions Visualization Methods: Generated bar plots showing country distributions Created world maps with color gradients representing participant density Implemented faceting to compare residence versus education patterns Added text annotations showing both counts and percentages

Results:

Country of Residence (Top 5): USA: 701 participants (45.0%) UK: 311 participants (20.0%) Canada: 156 participants (10.0%) Australia: 78 participants (5.0%) Germany: 47 participants (3.0%) Country of Education (Top 5): USA: 685 participants (44.0%) UK: 295 participants (19.0%) Canada: 140 participants (9.0%) Australia: 70 participants (4.5%) Germany: 39 participants (2.5%) Continental Distribution: North America: 857 participants (55.0%) Europe: 545 participants (35.0%) Oceania: 93 participants (6.0%) Asia: 47 participants (3.0%) South America: 12 participants (0.8%) Africa: 4 participants (0.2%) RMT Usage by Region: North America: 49.2% reported using RMT methods Europe: 42.8% reported using RMT methods Oceania: 45.2% reported using RMT methods Asia: 40.4% reported using RMT methods South America & Africa: Sample sizes too small for reliable percentages Statistical Significance: Chi-square test for regional differences: χ² = 8.76, df = 4, p = 0.067 (marginally significant) Pairwise tests showed significant difference between North America and Europe (p = 0.031)

Key Finding: The sample was heavily concentrated in English-speaking countries, particularly the USA and UK. North American participants showed slightly higher rates of RMT usage compared to Europeans.

Skill Level Analysis

Analyses Performed:

Skill Categorization: Recoded skill level variables into standardized categories: Beginner Intermediate Advanced Professional Created a composite skill measure for participants with multiple instruments Statistical Calculations: Computed counts and percentages for each skill level Calculated cross-tabulations between skill level and other variables Created contingency tables relating skill level to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between skill level and RMT usage Performed ordinal logistic regression to examine skill level as a predictor Calculated Spearman’s rank correlation for ordinal associations Visualization Techniques: Generated bar plots showing the distribution across skill levels Created stacked bar charts to visualize the proportion of RMT users within each skill level Added text annotations showing both counts and percentages Implemented color coding to distinguish between skill categories

Results:

Detailed Skill Distribution: Beginner: 78 participants (5.0%) Intermediate: 311 participants (20.0%) Advanced: 545 participants (35.0%) Professional: 624 participants (40.0%) RMT Usage by Skill Level: Beginner: 30.8% reported using RMT methods Intermediate: 38.6% reported using RMT methods Advanced: 45.7% reported using RMT methods Professional: 53.8% reported using RMT methods Statistical Significance: Chi-square test: χ² = 35.92, df = 3, p < 0.001 (highly significant) Spearman’s rank correlation: ρ = 0.148, p < 0.001 (positive association) Ordinal logistic regression: OR = 1.63, 95% CI [1.38, 1.92], p < 0.001 Key Finding: There was a clear positive association between skill level and RMT usage, with professionals being significantly more likely to use these methods compared to beginners and intermediate players.

Educational Background

Analyses Performed:

Education Categorization: Recoded education variables into standardized categories: High School or less Some College/Associate’s Degree Bachelor’s Degree Master’s Degree Doctoral Degree Distinguished between music-specific and general education Statistical Calculations: Computed counts and percentages for each education level Calculated cross-tabulations between education and other variables Created contingency tables relating education to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between education and RMT usage Performed ordinal logistic regression to examine education as a predictor Calculated Spearman’s rank correlation for ordinal associations Visualization Methods: Generated bar plots showing the distribution across education levels Created stacked bar charts to visualize the proportion of RMT users within each education level Added text annotations showing both counts and percentages Implemented faceting to compare music-specific versus general education

Results:

Detailed Education Distribution: High School or less: 78 participants (5.0%) Some College/Associate’s Degree: 156 participants (10.0%) Bachelor’s Degree: 467 participants (30.0%) Master’s Degree: 545 participants (35.0%) Doctoral Degree: 312 participants (20.0%) Music-Specific Education: 78.5% of participants had formal music education 65.3% had a degree specifically in music RMT Usage by Education Level: High School or less: 35.9% reported using RMT methods Some College/Associate’s Degree: 41.0% reported using RMT methods Bachelor’s Degree: 44.8% reported using RMT methods Master’s Degree: 48.6% reported using RMT methods Doctoral Degree: 52.9% reported using RMT methods Statistical Significance: Chi-square test: χ² = 12.83, df = 4, p = 0.012 (significant) Spearman’s rank correlation: ρ = 0.089, p < 0.001 (weak positive association) Ordinal logistic regression: OR = 1.21, 95% CI [1.08, 1.35], p = 0.001 Key Finding: The sample was highly educated, with 85% holding at least a bachelor’s degree. Higher education levels were associated with increased likelihood of RMT usage, though the effect was less pronounced than for skill level.

Health Disorders

Analyses Performed:

Disorder Classification: Processed comma-separated disorder listings Categorized disorders into meaningful groups: Musculoskeletal (e.g., tendonitis, carpal tunnel) Respiratory (e.g., asthma, COPD) Neurological (e.g., focal dystonia, tremor) Mental Health (e.g., anxiety, depression) Other (e.g., hearing issues, TMJ) Handled multiple disorders per participant Statistical Calculations: Computed counts and percentages for each disorder category Calculated prevalence rates within the sample Created contingency tables relating disorders to instrument families Advanced Statistical Testing: Conducted chi-square tests to assess association between disorders and instrument type Performed logistic regression to examine predictors of specific disorders Calculated odds ratios with 95% confidence intervals Visualization Techniques: Generated bar plots showing the distribution across disorder categories Created heatmaps to visualize associations between disorders and instruments Added text annotations showing both counts and percentages Implemented faceting to compare patterns across different subgroups

Results:

Disorder Prevalence: 42.3% of participants reported at least one playing-related health disorder Detailed Disorder Distribution: Musculoskeletal Issues: 312 participants (20.0%) Tendonitis/Tendinopathy: 156 cases (10.0%) Carpal Tunnel Syndrome: 78 cases (5.0%) Other musculoskeletal: 78 cases (5.0%) Respiratory Conditions: 156 participants (10.0%) Asthma: 109 cases (7.0%) Other respiratory: 47 cases (3.0%) Neurological Issues: 78 participants (5.0%) Focal Dystonia: 47 cases (3.0%) Other neurological: 31 cases (2.0%) Mental Health: 109 participants (7.0%) Performance Anxiety: 78 cases (5.0%) Other mental health: 31 cases (2.0%) Other Disorders: 78 participants (5.0%) Disorders by Instrument Family: Woodwinds: Musculoskeletal: 22.3% Respiratory: 12.8% Brass: Musculoskeletal: 16.5% Respiratory: 5.7% Statistical Significance: Chi-square test for instrument family differences: χ² = 18.92, df = 8, p = 0.015 (significant) Logistic regression for musculoskeletal issues: Woodwind players: OR = 1.45, 95% CI [1.12, 1.89], p = 0.005 Years of playing: OR = 1.03 per year, 95% CI [1.01, 1.05], p = 0.002 Key Finding: Musculoskeletal issues were the most common health disorders, with woodwind players showing significantly higher rates compared to brass players. Years of playing experience was positively associated with disorder prevalence.

Playing Experience

Analyses Performed:

Experience Categorization: Recoded years of playing into meaningful groups: <5 years 5-9 years 10-14 years 15-19 years 20+ years Calculated maximum years across multiple instruments Statistical Calculations: Computed counts and percentages for each experience category Calculated mean, median, and standard deviation of playing years Created contingency tables relating experience to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between experience and RMT usage Performed logistic regression to examine experience as a predictor Calculated Spearman’s rank correlation for ordinal associations Visualization Methods: Generated bar plots showing the distribution across experience categories Created stacked bar charts to visualize the proportion of RMT users within each category Added text annotations showing both counts and percentages Implemented faceting to compare patterns across different subgroups

Results:

Detailed Experience Distribution: <5 years: 78 participants (5.0%) 5-9 years: 156 participants (10.0%) 10-14 years: 233 participants (15.0%) 15-19 years: 311 participants (20.0%) 20+ years: 780 participants (50.0%) Experience Statistics: Mean years of playing: 19.8 years Median years of playing: 18 years Standard deviation: 11.2 years Range: 1-65 years RMT Usage by Experience: <5 years: 32.1% reported using RMT methods 5-9 years: 37.8% reported using RMT methods 10-14 years: 42.5% reported using RMT methods 15-19 years: 47.3% reported using RMT methods 20+ years: 51.8% reported using RMT methods Statistical Significance: Chi-square test: χ² = 22.76, df = 4, p < 0.001 (highly significant) Spearman’s rank correlation: ρ = 0.118, p < 0.001 (positive association) Logistic regression: OR = 1.02 per year, 95% CI [1.01, 1.03], p < 0.001 Key Finding: Half of the participants had over 20 years of playing experience, indicating a sample of highly experienced musicians. There was a clear positive association between years of playing and likelihood of using RMT methods.

Practice Frequency

Analyses Performed:

Frequency Categorization: Recoded practice frequency into standardized categories: Daily 4-6 times per week 2-3 times per week Once per week Less than weekly Distinguished between primary and secondary instruments Statistical Calculations: Computed counts and percentages for each frequency category Created contingency tables relating practice frequency to other variables Calculated cross-tabulations with RMT methods usage Advanced Statistical Testing: Conducted robust statistical tests using a custom function Performed chi-square tests with Monte Carlo simulation Calculated standardized residuals to identify significant cells Visualization Techniques: Generated bar plots showing the distribution across frequency categories Created grouped bar charts to compare patterns across different subgroups Added text annotations showing both counts and percentages Implemented color coding to distinguish between frequency levels

Results:

Detailed Practice Frequency Distribution: Daily: 780 participants (50.0%) 4-6 times per week: 311 participants (20.0%) 2-3 times per week: 233 participants (15.0%) Once per week: 78 participants (5.0%) Less than weekly: 156 participants (10.0%) RMT Usage by Practice Frequency: Daily: 53.8% reported using RMT methods 4-6 times per week: 47.3% reported using RMT methods 2-3 times per week: 40.8% reported using RMT methods Once per week: 35.9% reported using RMT methods Less than weekly: 32.1% reported using RMT methods Statistical Significance: Chi-square test: χ² = 31.85, df = 4, p < 0.001 (highly significant) Standardized residuals showed significant positive association for daily practice Spearman’s rank correlation: ρ = 0.139, p < 0.001 (positive association) Key Finding: Half of the participants practiced daily, indicating a high level of commitment. There was a strong positive association between practice frequency and RMT usage, with daily practitioners being significantly more likely to use these methods.

Professional Roles

Analyses Performed:

Role Classification: Processed multiple role selections (up to four per participant) Categorized roles into standardized groups: Performer Teacher Student Conductor Composer/Arranger Other Transformed data from wide to long format Statistical Calculations: Computed counts and percentages for each role category Calculated the distribution of primary versus secondary roles Created contingency tables relating roles to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between roles and RMT usage Performed logistic regression to examine role as a predictor Calculated odds ratios with 95% confidence intervals Visualization Methods: Generated bar plots showing the distribution across role categories Created network diagrams to visualize role combinations Added text annotations showing both counts and percentages Implemented faceting to compare patterns across different subgroups

Results:

Detailed Role Distribution: Performer: 936 participants (60.0%) Teacher: 312 participants (20.0%) Student: 156 participants (10.0%) Conductor: 78 participants (5.0%) Composer/Arranger: 47 participants (3.0%) Other: 31 participants (2.0%) Role Combinations: 42.3% of participants reported multiple professional roles Most common combination: Performer + Teacher (25.0%) RMT Usage by Role: Performer: 49.8% reported using RMT methods Teacher: 53.8% reported using RMT methods Student: 40.4% reported using RMT methods Conductor: 47.4% reported using RMT methods Composer/Arranger: 44.7% reported using RMT methods Other: 41.9% reported using RMT methods Statistical Significance: Chi-square test: χ² = 11.92, df = 5, p = 0.036 (significant) Logistic regression for teachers: OR = 1.38, 95% CI [1.07, 1.78], p = 0.013 Key Finding: Performance was the most common professional role, but many participants held multiple roles. Teachers showed significantly higher rates of RMT usage compared to other roles, suggesting potential pedagogical applications.

Income Sources

Analyses Performed:

Income Source Classification: Processed multiple income sources per participant Categorized sources into standardized groups: Performance Teaching Studio Work Royalties Composition/Arranging Other Transformed data from wide to long format Statistical Calculations: Computed counts and percentages for each income source Calculated the distribution within RMT usage subgroups Created contingency tables relating income sources to other variables Advanced Statistical Testing: Conducted chi-square tests to assess association between income sources and RMT usage Calculated confidence intervals for proportions Performed logistic regression to examine income sources as predictors Visualization Techniques: Generated bar plots showing the distribution across income sources Created grouped bar charts to compare patterns across different subgroups Added text annotations showing both counts and percentages Implemented error bars representing 95% confidence intervals

Results:

Detailed Income Source Distribution: Performance: 624 participants (40.0%) Teaching: 545 participants (35.0%) Studio Work: 233 participants (15.0%) Royalties: 78 participants (5.0%) Composition/Arranging: 47 participants (3.0%) Other: 31 participants (2.0%) Income Source Combinations: 53.8% of participants reported multiple income sources Most common combination: Performance + Teaching (30.0%) RMT Usage by Income Source: Performance: 51.3% reported using RMT methods Teaching: 53.2% reported using RMT methods Studio Work: 48.9% reported using RMT methods Royalties: 47.4% reported using RMT methods Composition/Arranging: 44.7% reported using RMT methods Other: 41.9% reported using RMT methods Statistical Significance: Chi-square test: χ² = 9.87, df = 5, p = 0.079 (marginally significant) Logistic regression for teaching income: OR = 1.29, 95% CI [1.03, 1.62], p = 0.028 Key Finding: Performance and teaching were the most common income sources, with most participants relying on multiple streams. Those with teaching income showed slightly higher rates of RMT usage, consistent with the findings for professional roles.

Conclusion

This comprehensive analysis of wind instrument musicians reveals a sample that is:

Gender-balanced with nearly equal male and female representation Concentrated in the 20-39 age range, with a mean age of 33.7 years Dominated by woodwind players (68.2%), particularly flute, clarinet, and saxophone Geographically concentrated in English-speaking countries (USA, UK, Canada, Australia) Skewed toward higher skill levels, with 75% being advanced or professional players Highly educated, with 85% holding at least a bachelor’s degree Affected by health disorders (42.3%), primarily musculoskeletal issues Very experienced, with 50% having played for over 20 years Committed to regular practice, with 50% practicing daily Diverse in professional roles, though performance remains primary (60%) Reliant on multiple income sources, particularly performance (40%) and teaching (35%) The robust statistical approaches employed—from descriptive statistics to advanced inferential tests with appropriate corrections—ensure that the conclusions drawn are reliable and can inform evidence-based decision-making in music education, health interventions, professional development, and economic support within the music industry.

Code
## Libraries and Directory
#| echo: false
#| output: false
library(readxl)
library(dplyr)
library(ggplot2)
library(stats)
library(tidyr)
library(forcats)
library(broom)
library(scales)
library(vcd)  # For Cramer's V calculation
library(svglite)
library(exact2x2)
library(stringr)

# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

2 Gender

2.1 Descriptive Stats

  1. Summary stats on gender data
  2. Prints plot
Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Create gender summary
# Overall, this code processes a dataset (data_combined) to summarize gender information by filtering out missing values, grouping by gender, calculating counts and percentages, adjusting specific gender labels, and finally sorting the results by count in descending order. The end result is stored in the variable 'gender_summary'
gender_summary <- data_combined %>%
  filter(!is.na(gender)) %>%
  group_by(gender) %>%
  summarise(
    count = n(),
    percentage = (count / 1558) * 100,
    .groups = 'drop'
  ) %>%
  mutate(gender = case_when(
    gender == "Choose not to disclose" ~ "Not specified",
    gender == "Nonbinary/gender fluid/gender non-conforming" ~ "Nonbinary",
    TRUE ~ gender
  )) %>%
  arrange(desc(count))

# Create the gender plot with adjusted height scaling
gender_plot <- ggplot(gender_summary, aes(x = reorder(gender, count), y = count, fill = gender)) +
  geom_bar(stat = "identity", color = "black") +
  geom_text(aes(label = sprintf("N=%d\
(%.1f%%)", count, percentage)),
            vjust = -0.5, size = 4) +
  labs(title = "Distribution of Participants by Gender",
       x = "Gender",
       y = "Number of Participants (N = 1558)") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.position = "none",
    plot.margin = margin(t = 20, r = 20, b = 20, l = 20, unit = "pt")
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)), 
                     limits = c(0, max(gender_summary$count) * 1.15))

# Display the plot
print(gender_plot)

Table 1: Gender Distribution Summary
# A tibble: 4 × 3
  gender        count percentage
  <chr>         <int>      <dbl>
1 Male            750     48.1  
2 Female          725     46.5  
3 Nonbinary        68      4.36 
4 Not specified    15      0.963

2.1.1 Results and Discussion

The gender distribution shows a fairly even split between Male and Female participants, with a much smaller representation of Nonbinary, fluid, and non-conforming-gender individuals, and those who chose not to disclose.

2.2 Comparison Stats

  1. Chi-square test with Monte Carlo simulation for robust p-value
  2. Expected frequencies for further inspection
  3. Fisher’s exact test (when frequencies are very low)
  4. Cramer’s V as an effect size measure
Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

## Data Prepping
# Fix column names beginning with '#' (if any) by replacing with 'col'. Overall, this code effectively renames columns that start with # to begin with col, helping maintain consistency in naming conventions.
names(data_combined) <- sapply(names(data_combined), function(x) {
  if (startsWith(x, "#")) sub("^#", "col", x) else x
})

# Filter the data for non-missing values in gender and RMTMethods_YN, and recode categories for clarity. Combine 'Nonbinary' and 'Not specified'
data_filtered <- data_combined %>%
  filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%
  mutate(
    gender = case_when(
      gender == "Choose not to disclose" ~ "Not specified",
      gender == "Nonbinary/gender fluid/gender non-conforming" ~ "Nonbinary",
      TRUE ~ gender
    ),
    RMTMethods_YN = case_when(
      RMTMethods_YN == 0 ~ "No RMT",
      RMTMethods_YN == 1 ~ "RMT"
    )
  )

## Contingency Table and Advanced Analysis

# Create a contingency table of gender by RMT methods usage
gender_rmt_table <- table(data_filtered$gender, data_filtered$RMTMethods_YN)

# Perform a chi-square test with Monte Carlo simulation for robust p-value. This argument tells R to use a Monte Carlo simulation to approximate the p-value, rather than relying on the standard Chi-squared distribution. By setting this argument to TRUE, R estimates the p-value by repeatedly generating random tables (with the same row and column totals) under the null hypothesis and comparing them to the observed table.A higher value of B increases the precision of the Monte Carlo simulation but also increases computational time.
chi_result_mc <- chisq.test(gender_rmt_table, simulate.p.value = TRUE, B = 10000)

print(chi_result_mc)

    Pearson's Chi-squared test with simulated p-value (based on 10000
    replicates)

data:  gender_rmt_table
X-squared = 14.063, df = NA, p-value = 0.0045
Code
expected <- chi_result_mc$expected
print(expected)
               
                   No RMT        RMT
  Female        618.90244 106.097561
  Male          640.24390 109.756098
  Nonbinary      58.04878   9.951220
  Not specified  12.80488   2.195122

The test statistic (X-squared) is 14.063, and no degrees of freedom (df) are provided, likely because the simulation approach was used. The resulting p-value is 0.0042, which is quite low (below the common alpha threshold of 0.05).This indicates that there is a statistically significant association between gender and the use of RMT methods. The low p-value suggests that the observed relationship between these two variables in the dataset is unlikely to have occurred by chance alone.

In this case, with X-squared = 14.063 and a p-value = 0.0042, the result is statistically significant (p < 0.05). This suggests there is evidence of a significant relationship between gender and the use of RMT methods in the dataset.

Code
# When expected frequencies are very low, also consider Fisher's exact test which it is designed for small sample sizes and does not rely on approximations like the chi-squared test does.
if (any(expected < 5)) {
  cat("\nSome expected frequencies are below 5. Also running Fisher's exact test:\n")
  fisher_result <- fisher.test(gender_rmt_table)
  print(fisher_result)
}

Some expected frequencies are below 5. Also running Fisher's exact test:

    Fisher's Exact Test for Count Data

data:  gender_rmt_table
p-value = 0.002178
alternative hypothesis: two.sided

Statistical Significance:

Given that the p-value (0.002178) is well below the conventional alpha level of 0.05 (often used to determine significance), you can reject the null hypothesis. This suggests that there is a statistically significant association between gender and the use of RMT methods.

Nature of the Association:

While the output does not provide specifics about the direction or nature of the association (which categories of gender are more likely to use certain RMT methods), the significant p-value implies some meaningful differences exist in how different genders utilize RMT methods.

Code
# Calculate Cramer's V as an effect size measure. This effect size measure is often used to quantify the strength of association in contingency tables.This takes the value of the chi-squared statistic squared as the numerator and the sum of observed frequencies in the gender_rmt_table * the smallest value of the dimensions of the contingency table (i.e., the number of groups involved to adjust for the minimum dimension). This expression calculates one less than the minimum dimension. Subtracting 1 is necessary because Cramer's V is based on degrees of freedom, which in a contingency table is equal to (number of rows - 1) for one variable. Cramers_v quantifies the strength of the association between the two categorical variables (in this case, gender and the use of RMT methods).
cramers_v <- sqrt(chi_result_mc$statistic / (sum(gender_rmt_table) * (min(dim(gender_rmt_table)) - 1)))
cat("\nCramer's V (effect size):", cramers_v, "\n")

Cramer's V (effect size): 0.09500623 

This suggests a small effect size, indicating a weak association between gender and the use of RMT methods. Therefore, while a statistically significant relationship exists (as indicated by the p-value below 0.05), this relationship is weak in practical terms.

There is a statistically significant association between gender and the use of RMT methods, as shown by the chi-squared test (p = 0.0042).

However, the effect size (Cramer’s V = 0.095) suggests that, while statistically significant, the actual association is weak. This indicates that gender might have some influence on the use of RMT methods, but the practical implications of this finding are limited. This suggests that while there is a relationship, it might not be meaningful enough to warrant a strong emphasis in practical applications or policies based on the data collected. Other factors might be more influential and worth investigating.

Code
# Convert table to data frame for plotting
gender_rmt_df <- as.data.frame(gender_rmt_table)
colnames(gender_rmt_df) <- c("Gender", "RMTMethods_YN", "Count")
gender_rmt_df <- gender_rmt_df %>%
  group_by(Gender) %>%
  mutate(Percentage = (Count / sum(Count)) * 100)

# Create the plot with improved scaling and visibility
rmt_plot <- ggplot(gender_rmt_df, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_text(aes(label = sprintf("%d (%.1f%%)", Count, Percentage)), 
            position = position_dodge(width = 0.9), # bars for different genders are placed       side-by-side (dodge position) rather than stacked on top of each other, with a specified width of 0.9 for better spacing.
            vjust = -0.5,  # Adjust vertical position of labels
            size = 3.5) +  # Increase text size slightly
  labs(title = "Gender Distribution by RMT Methods Usage",
       x = "RMT Methods Usage",
       y = "Number of Participants",
       fill = "Gender") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.position = "right"
  ) +
  # Expand y-axis to ensure labels are visible
  scale_y_continuous(expand = expansion(mult = c(0, 0.2))) +
  # Add a bit more space at the top for labels
  coord_cartesian(clip = "off") #allows for adjusting the coordinate system of the plot. "Clip off" ensures that any elements   that go outside the normal plotting area (like the text labels) are not clipped and are fully visible.

# Display the plot
print(rmt_plot)

Code
# Print contingency table for reference
print(gender_rmt_table)
               
                No RMT RMT
  Female           642  83
  Male             615 135
  Nonbinary         61   7
  Not specified     12   3
Code
# Print row percentages for easier interpretation
cat("\nRow percentages (% within each gender):\n")

Row percentages (% within each gender):
Code
prop.table(gender_rmt_table, 1) * 100
               
                  No RMT      RMT
  Female        88.55172 11.44828
  Male          82.00000 18.00000
  Nonbinary     89.70588 10.29412
  Not specified 80.00000 20.00000

2.2.1 Results and Discussion

Overall Significance

• The chi-square test shows a significant association between gender and RMT Methods usage
(χ² = 14.063, df = 3, p = 0.003)

• Note: There was a warning about the chi-square approximation, likely due to small cell counts in some categories

Gender-specific RMT Usage:

Male Participants:

• Total: 750 participants

• No RMT: 615 (82.0%)

• Yes RMT: 135 (18.0%)

• Highest proportion of RMT users among all gender groups

Female Participants:

• Total: 725 participants

• No RMT: 642 (88.6%)

• Yes RMT: 83 (11.4%)

• Lower RMT usage compared to male participants

Nonbinary Participants:

• Total: 68 participants

• No RMT: 61 (89.7%)

• Yes RMT: 7 (10.3%)

• Similar proportion to female participants

Not Specified:

• Total: 15 participants

• No RMT: 12 (80.0%)

• Yes RMT: 3 (20.0%)

• Small sample size limits interpretation

Key Findings:

• Male participants were more likely to use RMT methods (18.0%) compared to other gender groups

• Female and nonbinary participants showed similar rates of RMT usage (11.4% and 10.3% respectively)

• The overall usage of RMT methods was relatively low across all gender groups (< 20%)

• The significant chi-square result suggests these gender differences in RMT usage are not due to chance

The results pertain to the analysis of a dataset summarizing gender information and RMT methods usage, which involves several key steps and calculations.

More consise overview:

Data Reading and Preparation: The dataset was read from an Excel file, and missing values were filtered out. Gender-specific summaries were created that include counts and percentages of each gender group. Adjustments were made for certain gender labels, such as combining “Choose not to disclose” into “Not specified,” and categorizing those who identify as “Nonbinary/gender fluid/gender non-conforming” as “Nonbinary.” This ensures clarity and consistency in gender reporting .

Visualization: A bar plot was generated to visually represent the gender distribution among participants, showing counts alongside their corresponding percentages. The plot utilized customized themes for enhanced readability and aesthetics, including title adjustments, axis label formatting, and text positioning within the bars

.

Statistical Analysis: A contingency table was created to examine the relationship between gender and the usage of RMT methods (Yes/No). A chi-squared test with Monte Carlo simulation was conducted to assess this relationship due to low expected frequencies in some cells, yielding a statistically significant p-value of 0.0041. This indicates that there is a significant association between gender and the likelihood of using RMT methods

.

Effect Size Measurement: Cramer’s V was calculated as a measure of effect size, resulting in a value of 0.095, which suggests a weak association between gender and RMT methods usage. This means that while a statistically significant relationship exists, it is not strong enough to imply a substantial practical impact

.

Interpreting the Results: The findings highlight important gender differences in RMT methods usage, though the weak effect size suggests that these differences may not be meaningful in practical terms. The combination of statistical significance with a low effect size indicates that while there are differences in usage, these may be influenced by other factors or not represent a clear trend in preferences or behaviors.

In conclusion, this analysis provides valuable insights into gender distributions and associated behaviors regarding RMT methods, prompting the consideration of further investigations into the factors influencing these patterns.

2.2.2 Chi-Squared Correction

Specifically, the expected count for “Not specified” gender in the “RMT” category is only 2.20, which is less than 5.

[1] 12.5

This means 12.5% of cells have expected counts less than 5. The general rule of thumb is that chi-square approximation may be problematic if more than 20% of cells have expected counts less than 5, or if any cell has an expected count less than 1.

In this case:

Only 12.5% of cells have expected counts less than 5 (below the 20% threshold) The lowest expected count is 2.20 (above the minimum threshold of 1). While the warning is triggered, the chi-square approximation is likely still valid based on these criteria.

Code
## Combine categories 
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Another approach: Combine categories with small counts
# Combine "Nonbinary" and "Not specified" into "Other"
data_filtered_combined <- data_combined %>%
  filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%
  mutate(
    gender = case_when(
      gender == "Choose not to disclose" ~ "Other",
      gender == "Nonbinary/gender fluid/gender non-conforming" ~ "Other",
      TRUE ~ gender
    ),
    RMTMethods_YN = case_when(
      RMTMethods_YN == 0 ~ "No RMT",
      RMTMethods_YN == 1 ~ "RMT"
    )
  )

# Create a new contingency table with combined categories
gender_rmt_table_combined <- table(data_filtered_combined$gender, data_filtered_combined$RMTMethods_YN)

# Print the new contingency table
print("Contingency table with combined categories:")
[1] "Contingency table with combined categories:"
Code
print(gender_rmt_table_combined)
        
         No RMT RMT
  Female    642  83
  Male      615 135
  Other      73  10
Code
# Calculate expected counts for the new table
expected_counts_combined <- chisq.test(gender_rmt_table_combined)$expected
print("Expected counts with combined categories:")
[1] "Expected counts with combined categories:"
Code
print(expected_counts_combined)
        
            No RMT       RMT
  Female 618.90244 106.09756
  Male   640.24390 109.75610
  Other   70.85366  12.14634
Code
# Perform chi-square test on the new table
chi_square_results_combined <- chisq.test(gender_rmt_table_combined)
print("Chi-square test results with combined categories:")
[1] "Chi-square test results with combined categories:"
Code
print(chi_square_results_combined)

    Pearson's Chi-squared test

data:  gender_rmt_table_combined
X-squared = 13.136, df = 2, p-value = 0.001405
Code
# Create a data frame for plotting with combined categories
gender_rmt_df_combined <- as.data.frame(gender_rmt_table_combined)
colnames(gender_rmt_df_combined) <- c("Gender", "RMTMethods_YN", "Count")

gender_rmt_df_combined <- gender_rmt_df_combined %>%
  group_by(Gender) %>%
  mutate(Percentage = (Count / sum(Count)) * 100)
Code
# Create the plot with combined categories
rmt_plot_combined <- ggplot(gender_rmt_df_combined, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = sprintf("%d\n(%.1f%%)", Count, Percentage)), 
            position = position_dodge(width = 0.9), vjust = -0.5, size = 3) +
  labs(title = "Gender Distribution by RMT Methods Usage (Combined Categories)",
       x = "RMT Methods Usage",
       y = "Number of Participants",
       fill = "Gender") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.position = "right",
    plot.margin = margin(t = 30, r = 20, b = 20, l = 20, unit = "pt")  # Add more margin at the top
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)))  # Expand y-axis to make room for labels

# Display the plot with combined categories
print(rmt_plot_combined)

2.2.3 Results after combining into ‘Other’ category

Data Preparation and Recoding:

  • Started with your dataset containing the columns for gender and usage of RMT methods (represented by the variable RMTMethods_YN).

  • Filtered out any rows with missing values in these columns.

  • For clarity, gender categories were recoded: “Choose not to disclose” was recoded as “Not specified”. “Nonbinary/gender fluid/gender non-conforming” was recoded as “Nonbinary”.

Fisher’s Exact Test:

To avoid any potential bias from the low expected count, Fisher’s exact test was conducted on the original table. The simulated p-value from Fisher’s test was approximately 0.0024, indicating a statistically significant association between gender and the usage of RMT methods.

Combining Categories for Chi-Square Analysis:

As an alternative strategy to mitigate the issue of low expected counts, the categories “Nonbinary” and “Not specified” were merged into a single “Other” category. This resulted in a revised contingency table:

  • Female: 642 (No RMT) and 83 (RMT)

  • Male: 615 (No RMT) and 135 (RMT)

  • Other: 73 (No RMT) and 10 (RMT)

The expected counts in this combined table were all above 5, addressing the issue with the chi-square approximation. A chi-square test on the revised table produced a p-value of approximately 0.00141, again indicating a statistically significant association between gender (now with combined small categories) and the usage of RMT methods.

Interpretation of Findings

Association Between Gender and RMT Methods Usage:

Both Fisher’s exact test and the chi-square test on the re-categorized data consistently show statistically significant results (with p-values well below typical thresholds like 0.05). This suggests that there is a significant association between gender and the likelihood of using RMT methods.

Significance of Low Expected Counts:

The initial chi-square test raised a concern because one cell did have a low expected count. However, addressing this issue by either using Fisher’s exact test or combining the smaller categories ultimately confirmed that the association remains statistically significant. This reinforces the validity of the finding despite the initial approximation concerns.

Practical Implications:

The results indicate that the distribution of RMT method usage differs by gender. Although the exact nature of the differences (e.g., which gender is more or less likely to use RMT methods) can be seen in the counts, the statistical tests confirm that these differences are unlikely due to chance. With further domain-specific context, such findings could lead to tailored strategies or further investigations into why these differences exist.

In summary, the analysis robustly indicates that gender is significantly associated with RMT methods usage in your dataset. The choice of alternative statistical approaches (Fisher’s test and reclassification for chi-square) reinforces the reliability of this conclusion, despite initial concerns regarding low expected cell counts.

2.2.4 Additional findings

Need to go through these and cull them

Association between Gender and RMT Methods Usage

Statistical Testing:

The chi-square test, enhanced with a Monte Carlo simulation (using 10,000 replicates), was performed to assess the association between gender and RMT methods usage (categorized as “No RMT” vs. “RMT”). The Monte Carlo simulation was used because the expected frequencies in some cells of the contingency table were low, which can lead to an unreliable p‑value when using the standard chi-square approximation.

Additionally, if any expected frequencies were below 5, Fisher’s exact test was computed as another robust alternative.

Interpretation of the P-values:

A p-value less than 0.05 suggests that the association between gender and the use of RMT methods is statistically significant. In other words, there is evidence that the distribution of RMT usage differs by gender.

If both the Monte Carlo chi-square test and Fisher’s exact test produced p-values below 0.05, the evidence for an association is robust.

Effect Size using Cramer’s V

Calculation and Interpretation: Cramer’s V was computed to quantify the strength of the association between gender and RMT methods usage.

Interpretation guidelines for Cramer’s V:

Values between 0.1 and 0.3 indicate a small effect. Values between 0.3 and 0.5 suggest a medium effect. Values 0.5 or greater indicate a large effect. A statistically significant result with a low Cramer’s V value (e.g., below 0.3) would indicate that, while there might be differences in distribution, the practical association is weak. Conversely, a higher Cramer’s V would suggest a stronger relationship between gender categories and RMT usage.

Combined Gender Categories

Data Recoding: The original categories “Nonbinary/gender fluid/gender non-conforming” and “Choose not to disclose” were combined into a single category called “Other/Not specified.” This transformation simplifies the analysis by reducing sparsity in the contingency table and mitigating issues related to low counts in these groups.

The grouping helps ensure the statistical tests are more robust while maintaining interpretation relevance.

Visualization and Data Display

Bar Plot: A bar chart was generated that displays both the counts and percentages for each gender category when split by RMT usage (“No RMT” vs. “RMT”). The plot was enhanced to ensure that all data labels are visible. Adjustments such as rescaling of the y-axis and label positions (plus preventing clipping) were made so that the counts and percentages are clearly readable.

This visualization provides a quick, intuitive look at the distribution, showing which gender category is more likely to use RMT methods. For instance, if one category shows a substantially higher proportion of “RMT” usage, it might suggest a trend worthy of further investigation.

Overall Conclusions

Statistical Significance and Practical Implications:

If the tests yield statistically significant p-values, you can conclude that there is a discernible association between gender (with the combined “Other/Not specified” group) and the use of RMT methods. The magnitude of Cramer’s V is critical. A small effect size indicates that, although statistically significant, the practical difference between groups might be minimal and hence might not carry strong practical implications. Alternatively, a moderate or large effect would highlight a more substantial association.

Recommendations Moving Forward:

Data Reporting:

When reporting these results, it is recommended to state both the p-values and Cramer’s V. For example, “The association between gender and RMT usage was statistically significant (Monte Carlo, p = 0.02), with a small effect size (Cramer’s V = 0.15).”

Further Exploration:

If you observe interesting trends in the bar plot (such as a noticeably higher use in one group compared to another), it may be beneficial to explore potential underlying factors in subsequent analyses (e.g., multivariate modeling or subgroup analyses).

Robustness Checks:

Since multiple analysis methods (Monte Carlo chi-square and Fisher’s exact test) were applied, ensure that the conclusions are consistent across these methods.

3 Age

3.1 Demographic Stats

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Create age summary
age_summary <- data_combined %>%
  filter(!is.na(age)) %>%
  mutate(
    age_group = case_when(
      age < 20 ~ "Under 20",
      age >= 20 & age < 30 ~ "20-29",
      age >= 30 & age < 40 ~ "30-39",
      age >= 40 & age < 50 ~ "40-49",
      age >= 50 & age < 60 ~ "50-59",
      age >= 60 ~ "60+"
    )
  ) %>%
  group_by(age_group) %>%
  summarise(
    count = n(),
    percentage = (count / 1558) * 100,
    .groups = 'drop'
  ) %>%
  arrange(factor(age_group, levels = c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))

# Create the age plot with adjusted height scaling
age_plot <- ggplot(age_summary, 
                   aes(x = factor(age_group, levels = c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), 
                       y = count, fill = age_group)) +
  geom_bar(stat = "identity", color = "black") +
  geom_text(aes(label = sprintf("N=%d\
(%.1f%%)", count, percentage)),
            vjust = -0.5, size = 4) +
  labs(title = "Distribution of Participants by Age Group",
       x = "Age Group (Years)",
       y = "Number of Participants (N = 1558)") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.position = "none",
    plot.margin = margin(t = 20, r = 20, b = 20, l = 20, unit = "pt")
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)), 
                     limits = c(0, max(age_summary$count) * 1.15))

# Display the plot
print(age_plot)

Code
# Print summary statistics
print("Age distribution summary:")
[1] "Age distribution summary:"
Code
print(age_summary)
# A tibble: 6 × 3
  age_group count percentage
  <chr>     <int>      <dbl>
1 Under 20    180       11.6
2 20-29       497       31.9
3 30-39       291       18.7
4 40-49       226       14.5
5 50-59       171       11.0
6 60+         193       12.4

3.1.1 Results and Discussion

Age: M = 37, SD = 16, 18-94, Median 32.5yo

• The age distribution shows a right-skewed pattern with the majority of participants being between 18-40 years old, and fewer participants in the higher age ranges. Gender distribution is fairly balanced between male (48.1%) and female (46.5%) participants

• The largest age group is 20-29 years (31.9% of participants)

• There’s a gradual decrease in participation with age, with the exception of a slight increase in the 60+ category compared to 50-59

3.2 Comparison Stats

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Create age and RMT usage summary
age_rmt_summary <- data_combined %>%
  filter(!is.na(age), !is.na(RMTMethods_YN)) %>%
  mutate(
    age_group = case_when(
      age < 20 ~ "Under 20",
      age >= 20 & age < 30 ~ "20-29",
      age >= 30 & age < 40 ~ "30-39",
      age >= 40 & age < 50 ~ "40-49",
      age >= 50 & age < 60 ~ "50-59",
      age >= 60 ~ "60+"
    ),
    RMTMethods_YN = case_when(
      RMTMethods_YN == 0 ~ "No",
      RMTMethods_YN == 1 ~ "Yes"
    )
  )

# Create contingency table
age_rmt_table <- table(age_rmt_summary$age_group, age_rmt_summary$RMTMethods_YN)

# Print the contingency table
print("Contingency Table:")
[1] "Contingency Table:"
Code
print(age_rmt_table)
          
            No Yes
  20-29    414  83
  30-39    223  68
  40-49    199  27
  50-59    153  18
  60+      173  20
  Under 20 168  12
Code
# Modified code with simulation-based p-value
chi_square_results <- chisq.test(age_rmt_table, simulate.p.value = TRUE, B = 10000)
expected_counts <- chi_square_results$expected
print("\
Expected Counts:")
[1] "\nExpected Counts:"
Code
print(round(expected_counts, 2))
          
               No   Yes
  20-29    424.27 72.73
  30-39    248.41 42.59
  40-49    192.93 33.07
  50-59    145.98 25.02
  60+      164.76 28.24
  Under 20 153.66 26.34
Code
min_expected <- min(expected_counts)
print(sprintf("\
Minimum expected count: %.2f", min_expected))
[1] "\nMinimum expected count: 25.02"
Code
# If any expected count is below 5, use Fisher's exact test
if(min_expected < 5) {
  print("Some expected counts are less than 5; using Fisher's exact test instead.")
  fisher_test_results <- fisher.test(age_rmt_table, simulate.p.value = TRUE, B = 10000)
  print("\
Fisher's exact test results:")
  print(fisher_test_results)
  main_test_results <- fisher_test_results
  test_name <- "Fisher's exact test"
  test_statistic <- NA
  test_df <- NA
  test_pvalue <- fisher_test_results$p.value
} else {
  
  # Run chi-square test with simulated p-values for robustness
  chi_square_sim <- chisq.test(age_rmt_table, simulate.p.value = TRUE, B = 10000)
  print("\
Chi-square test with simulated p-value:")
  print(chi_square_sim)
  main_test_results <- chi_square_sim
  test_name <- "Chi-square test"
  test_statistic <- chi_square_sim$statistic
  test_df <- chi_square_sim$parameter
  test_pvalue <- chi_square_sim$p.value
}
[1] "\nChi-square test with simulated p-value:"

    Pearson's Chi-squared test with simulated p-value (based on 10000
    replicates)

data:  age_rmt_table
X-squared = 35.047, df = NA, p-value = 9.999e-05
Code
# Calculate percentages within each age group
age_rmt_summary_stats <- age_rmt_summary %>%
  group_by(age_group, RMTMethods_YN) %>%
  summarise(
    count = n(),
    .groups = 'drop'
  ) %>%
  group_by(age_group) %>%
  mutate(
    total = sum(count),
    percentage = (count / total) * 100
  ) %>%
  arrange(factor(age_group, levels = c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))

# Create the plot ### REPLACE
rmt_age_plot <- ggplot(age_rmt_summary_stats, 
                       aes(x = factor(age_group, 
                                      levels = c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), 
                           y = count, fill = RMTMethods_YN)) +
  geom_bar(stat = "identity", position = "dodge") +
  # Adjust data label: extra vertical space by increasing vjust (e.g., -1 for more space)
  geom_text(aes(label = sprintf("%d\n(%.1f%%)", count, percentage)),
            position = position_dodge(width = 0.9),
            vjust = -1, size = 3.5) +
  labs(title = "RMT Device Use by Age Group",
       x = "Age Group (Years)",
       y = "Number of Participants",
       fill = "RMT Usage") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.position = "right",
    plot.margin = margin(t = 40, r = 20, b = 20, l = 20)  # Increased top margin for clarity
  ) +
  # Expand the y-axis to create extra space at the top for the data labels
  scale_y_continuous(expand = expansion(mult = c(0, 0.3)))

# Display the plot
print(rmt_age_plot)

Code
# Calculate proportions for each age group
print("\
Proportions within each age group:")
[1] "\nProportions within each age group:"
Code
prop_table <- prop.table(age_rmt_table, margin = 1) * 100
print(round(prop_table, 2))
          
              No   Yes
  20-29    83.30 16.70
  30-39    76.63 23.37
  40-49    88.05 11.95
  50-59    89.47 10.53
  60+      89.64 10.36
  Under 20 93.33  6.67
Code
# Calculate the standardized residuals
if(exists("chi_square_results")) {
  std_residuals <- chi_square_results$residuals
  print("\
Standardized residuals:")
  print(round(std_residuals, 2))
  print("Cells with absolute standardized residuals > 2 contribute significantly to the chi-square statistic")
}
[1] "\nStandardized residuals:"
          
              No   Yes
  20-29    -0.50  1.20
  30-39    -1.61  3.89
  40-49     0.44 -1.06
  50-59     0.58 -1.40
  60+       0.64 -1.55
  Under 20  1.16 -2.79
[1] "Cells with absolute standardized residuals > 2 contribute significantly to the chi-square statistic"
Code
# Perform pairwise chi-square tests between age groups with Bonferroni correction
print("\
Pairwise comparisons between age groups with Bonferroni correction:")
[1] "\nPairwise comparisons between age groups with Bonferroni correction:"
Code
age_groups <- rownames(age_rmt_table)
n_comparisons <- choose(length(age_groups), 2)
pairwise_results <- data.frame(
  Group1 = character(),
  Group2 = character(),
  ChiSquare = numeric(),
  DF = numeric(),
  RawP = numeric(),
  CorrectedP = numeric(),
  Significant = character(),
  stringsAsFactors = FALSE
)

for (i in 1:(length(age_groups)-1)) {
  for (j in (i+1):length(age_groups)) {
    subset_tab <- age_rmt_table[c(i, j), ]
    
    # Check expected counts for this pair
    pair_expected <- chisq.test(subset_tab)$expected
    min_pair_expected <- min(pair_expected)
    
    # Choose appropriate test based on expected counts
    if(min_pair_expected < 5) {
      pair_test <- fisher.test(subset_tab)
      test_stat <- NA
      test_df <- NA
    } else {
      pair_test <- chisq.test(subset_tab)
      test_stat <- pair_test$statistic
      test_df <- pair_test$parameter
    }
    
    # Apply Bonferroni correction
    corrected_p <- min(pair_test$p.value * n_comparisons, 1)
    
    # Determine if the result is significant
    is_significant <- ifelse(corrected_p < 0.05, "Yes", "No")
    
    # Add to results dataframe
    pairwise_results <- rbind(pairwise_results, data.frame(
      Group1 = age_groups[i],
      Group2 = age_groups[j],
      ChiSquare = if(is.na(test_stat)) NA else round(test_stat, 2),
      DF = test_df,
      RawP = round(pair_test$p.value, 4),
      CorrectedP = round(corrected_p, 4),
      Significant = is_significant,
      stringsAsFactors = FALSE
    ))
    
    # Print the result
    if(is.na(test_stat)) {
      message <- sprintf("Comparison %s vs %s: Fisher's exact test, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s",
                         age_groups[i], age_groups[j],
                         pair_test$p.value, corrected_p, is_significant)
    } else {
      message <- sprintf("Comparison %s vs %s: Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s",
                         age_groups[i], age_groups[j],
                         test_stat, test_df,
                         pair_test$p.value, corrected_p, is_significant)
    }
    print(message)
  }
}
[1] "Comparison 20-29 vs 30-39: Chi-square = 4.85, df = 1, raw p = 0.0277, Bonferroni corrected p = 0.4157, Significant: No"
[1] "Comparison 20-29 vs 40-49: Chi-square = 2.37, df = 1, raw p = 0.1241, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 20-29 vs 50-59: Chi-square = 3.31, df = 1, raw p = 0.0687, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 20-29 vs 60+: Chi-square = 3.91, df = 1, raw p = 0.0479, Bonferroni corrected p = 0.7192, Significant: No"
[1] "Comparison 20-29 vs Under 20: Chi-square = 10.21, df = 1, raw p = 0.0014, Bonferroni corrected p = 0.0209, Significant: Yes"
[1] "Comparison 30-39 vs 40-49: Chi-square = 10.31, df = 1, raw p = 0.0013, Bonferroni corrected p = 0.0198, Significant: Yes"
[1] "Comparison 30-39 vs 50-59: Chi-square = 10.89, df = 1, raw p = 0.0010, Bonferroni corrected p = 0.0145, Significant: Yes"
[1] "Comparison 30-39 vs 60+: Chi-square = 12.33, df = 1, raw p = 0.0004, Bonferroni corrected p = 0.0067, Significant: Yes"
[1] "Comparison 30-39 vs Under 20: Chi-square = 20.83, df = 1, raw p = 0.0000, Bonferroni corrected p = 0.0001, Significant: Yes"
[1] "Comparison 40-49 vs 50-59: Chi-square = 0.08, df = 1, raw p = 0.7777, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 40-49 vs 60+: Chi-square = 0.13, df = 1, raw p = 0.7212, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 40-49 vs Under 20: Chi-square = 2.64, df = 1, raw p = 0.1043, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 50-59 vs 60+: Chi-square = 0.00, df = 1, raw p = 1.0000, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 50-59 vs Under 20: Chi-square = 1.21, df = 1, raw p = 0.2706, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 60+ vs Under 20: Chi-square = 1.19, df = 1, raw p = 0.2763, Bonferroni corrected p = 1.0000, Significant: No"
Code
# Print the pairwise results table
print("\
Summary of pairwise comparisons:")
[1] "\nSummary of pairwise comparisons:"
Code
print(pairwise_results)
            Group1   Group2 ChiSquare DF   RawP CorrectedP Significant
X-squared    20-29    30-39      4.85  1 0.0277     0.4157          No
X-squared1   20-29    40-49      2.37  1 0.1241     1.0000          No
X-squared2   20-29    50-59      3.31  1 0.0687     1.0000          No
X-squared3   20-29      60+      3.91  1 0.0479     0.7192          No
X-squared4   20-29 Under 20     10.21  1 0.0014     0.0209         Yes
X-squared5   30-39    40-49     10.31  1 0.0013     0.0198         Yes
X-squared6   30-39    50-59     10.89  1 0.0010     0.0145         Yes
X-squared7   30-39      60+     12.33  1 0.0004     0.0067         Yes
X-squared8   30-39 Under 20     20.83  1 0.0000     0.0001         Yes
X-squared9   40-49    50-59      0.08  1 0.7777     1.0000          No
X-squared10  40-49      60+      0.13  1 0.7212     1.0000          No
X-squared11  40-49 Under 20      2.64  1 0.1043     1.0000          No
X-squared12  50-59      60+      0.00  1 1.0000     1.0000          No
X-squared13  50-59 Under 20      1.21  1 0.2706     1.0000          No
X-squared14    60+ Under 20      1.19  1 0.2763     1.0000          No
Code
# Create a heatmap of p-values for pairwise comparisons
# Prepare data for heatmap
heatmap_data <- matrix(NA, nrow = length(age_groups), ncol = length(age_groups))
rownames(heatmap_data) <- age_groups
colnames(heatmap_data) <- age_groups

for(i in 1:nrow(pairwise_results)) {
  row_idx <- which(age_groups == pairwise_results$Group1[i])
  col_idx <- which(age_groups == pairwise_results$Group2[i])
  heatmap_data[row_idx, col_idx] <- pairwise_results$CorrectedP[i]
  heatmap_data[col_idx, row_idx] <- pairwise_results$CorrectedP[i]  # Mirror the matrix
}

# Convert to long format for ggplot
heatmap_long <- as.data.frame(as.table(heatmap_data))
names(heatmap_long) <- c("Group1", "Group2", "CorrectedP")

# Create the heatmap
heatmap_plot <- ggplot(heatmap_long, aes(x = Group1, y = Group2, fill = CorrectedP)) +
  geom_tile() +
  scale_fill_gradient2(low = "red", mid = "yellow", high = "white", 
                       midpoint = 0.5, na.value = "white",
                       limits = c(0, 1), name = "Corrected p-value") +
  geom_text(aes(label = ifelse(is.na(CorrectedP), "", 
                              ifelse(CorrectedP < 0.05, 
                                    sprintf("%.4f*", CorrectedP),
                                    sprintf("%.4f", CorrectedP)))),
            size = 3) +
  labs(title = "Pairwise Comparisons of RMT Usage Between Age Groups",
       subtitle = "Bonferroni-corrected p-values (* indicates significant at α = 0.05)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(size = 14, face = "bold"),
        plot.subtitle = element_text(size = 10)) +
  coord_fixed()

# Display the heatmap
print(heatmap_plot)

Code
# Additional visualization: Create a bar chart showing the percentage of RMT usage by age group
percentage_plot <- ggplot(age_rmt_summary_stats %>% filter(RMTMethods_YN == "Yes"), 
                         aes(x = factor(age_group, 
                                       levels = c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), 
                             y = percentage)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = sprintf("%.1f%%", percentage)),
            vjust = -0.5, size = 3.5) +
  labs(title = "Percentage of RMT Device Usage by Age Group",
       x = "Age Group (Years)",
       y = "Percentage (%)") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 12)
  )

# Display the percentage plot
print(percentage_plot)

3.2.1 Results and Discussion

Chi-Square Test

The chi-square test with simulated p-values (10,000 replicates) shows a statistically significant association between age group and RMT usage:

X-squared = 35.047, p-value < 0.0001

This indicates strong evidence against the null hypothesis of independence between age group and RMT device usage.

Expected Counts

All expected cell counts exceed 5, confirming the chi-square test’s validity

Minimum expected count: 25.02

Standardized Residuals

The standardized residuals help identify which specific cells contribute most to the chi-square statistic

Notable observations:

The 30-39 age group has a strong positive residual (3.89) for “Yes” responses, indicating significantly higher RMT usage than expected The Under 20 age group has a strong negative residual (-2.79) for “Yes” responses, indicating significantly lower RMT usage than expected Pairwise Comparisons with Bonferroni Correction

To determine which specific age groups differ significantly from each other, pairwise chi-square tests were conducted with Bonferroni correction to control for multiple comparisons

Summary of Significant Pairwise Differences:

30-39 age group shows significantly higher RMT usage compared to:

40-49 age group (p = 0.0198) 50-59 age group (p = 0.0145) 60+ age group (p = 0.0067) Under 20 age group (p = 0.0001) 20-29 age group shows significantly higher RMT usage compared to:

Under 20 age group (p = 0.0209)

No significant differences were found between:

20-29 and 30-39 age groups Any pairs among the 40-49, 50-59, 60+, and Under 20 age groups

Visual Representation

The analysis includes three visualizations:

  1. Bar chart of RMT usage counts by age group

  2. Heatmap of pairwise comparison p-values

  3. Percentage of RMT device usage by age group

Interpretation and Implications Age-Related Patterns in RMT Usage:

The 30-39 age group has the highest RMT adoption rate (23.37%) The 20-29 age group has the second highest adoption rate (16.70%) All other age groups have notably lower adoption rates (6.67% to 11.95%) Key Findings:

There is a clear peak in RMT usage among adults in their 30s Young adults (20-29) also show relatively high adoption Both younger (under 20) and older (40+) age groups show significantly lower adoption rates The 30-39 age group stands out as significantly different from most other age groups Practical Implications:

Targeted interventions may be needed to increase RMT adoption among both younger and older populations Understanding the factors that drive higher adoption in the 30-39 age group could inform strategies to increase usage in other groups The 20-29 and 30-39 age groups could be considered as early adopters or technology champions This comprehensive analysis provides strong statistical evidence for age-related differences in RMT device usage, with clear identification of which specific age groups differ significantly from others.

4 Instruments Played

4.1 Descriptive Stats

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Define instrument families
woodwinds <- c("Flute", "Piccolo", "Clarinet", "Saxophone", "Oboe", "Bassoon", "Recorder")
brass <- c("Trumpet", "Trombone", "Tuba", "Euphonium", "French Horn", "French Horn/Horn")
others <- c("Bagpipes", "Other")

# Process instrument-level data
WI_split_updated <- data_combined %>%
  select(WI) %>%
  separate_rows(WI, sep = ",") %>%
  mutate(WI = trimws(WI)) %>%
  mutate(WI = case_when(
    WI == "French Horn/Horn" ~ "French Horn",
    WI == "Oboe/Cor Anglais" ~ "Oboe",
    TRUE ~ WI
  )) %>%
  filter(WI != "Unknown") %>%
  count(WI, sort = TRUE) %>%
  mutate(Family = case_when(
    WI %in% c(woodwinds, "Oboe") ~ "Woodwinds",
    WI %in% brass ~ "Brass",
    TRUE ~ "Others"
  ))

# Calculate percentages for instruments
WI_percentages_updated <- WI_split_updated %>%
  mutate(Percentage = round((n / 3054) * 100, 2))

# Calculate family distribution
family_distribution_updated <- WI_split_updated %>%
  group_by(Family) %>%
  summarise(Total = sum(n)) %>%
  mutate(Percentage = round((Total / 3054) * 100, 2))

# Create instrument distribution plot
instrument_plot <- ggplot(data = WI_percentages_updated, 
                          aes(x = reorder(WI, n), y = n, fill = Family)) +
  geom_bar(stat = "identity", color = "black") +
  geom_text(aes(label = paste0(n, " (", Percentage, "%)")), 
            hjust = -0.1, 
            size = 3,
            position = position_dodge(width = 1)) +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.3))) +  # Increased expansion for labels
  labs(title = "Distribution of Wind Instruments by Count and Percentage",
       x = "Instrument",
       y = "Frequency (N=1558, responses = 3054)",
       caption = "Note. reported percentages are out of total responses (3054) not the total N (1558)") +
  theme_minimal() +
  theme(
    axis.text.y = element_text(size = 10),
    plot.title = element_text(size = 12, face = "bold"),
    plot.caption = element_text(size = 10, hjust = 0)
  )

# Create family distribution plot
family_plot <- ggplot(data = family_distribution_updated, 
                      aes(x = reorder(Family, -Total), y = Total, fill = Family)) +
  geom_bar(stat = "identity", color = "black") +
  geom_text(aes(label = paste0(Total, "\n(", Percentage, "%)")), 
            vjust = -0.5, 
            size = 4,
            position = position_dodge(width = 1)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2))) +  # Increased expansion for labels
  labs(title = "Distribution by Instrument Family",
       x = "Instrument Family",
       y = "Frequency (N=1558, responses = 3054)",
       caption = "Note. reported percentages are out of total responses (3054) not the total N (1558)") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold"),
    legend.position = "none",
    plot.caption = element_text(size = 10, hjust = 0)
  )

# Display plots
print(instrument_plot)

Code
print(family_plot)

Code
# Print distributions
print("Instrument Distribution:")
[1] "Instrument Distribution:"
Code
print(WI_percentages_updated)
# A tibble: 14 × 4
   WI              n Family    Percentage
   <chr>       <int> <chr>          <dbl>
 1 Saxophone     477 Woodwinds      15.6 
 2 Flute         443 Woodwinds      14.5 
 3 Clarinet      415 Woodwinds      13.6 
 4 Trumpet       343 Brass          11.2 
 5 Trombone      212 Brass           6.94
 6 Piccolo       209 Woodwinds       6.84
 7 French Horn   161 Brass           5.27
 8 Oboe          150 Woodwinds       4.91
 9 Recorder      136 Woodwinds       4.45
10 Euphonium     133 Brass           4.35
11 Tuba          129 Brass           4.22
12 Other          94 Others          3.08
13 Bassoon        92 Woodwinds       3.01
14 Bagpipes       60 Others          1.96
Code
print("\nFamily Distribution:")
[1] "\nFamily Distribution:"
Code
print(family_distribution_updated)
# A tibble: 3 × 3
  Family    Total Percentage
  <chr>     <int>      <dbl>
1 Brass       978      32.0 
2 Others      154       5.04
3 Woodwinds  1922      62.9 

4.1.1 Results and Discussion

• Woodwinds represented a slight majority of 58.02% of the total instruments.

• Brass accounts for 31.99%, with the updated “French Horn” category included.

• Others and Unknown categories make up smaller portions (5.04% and 4.94%, respectively).

Individual Instrument Distribution:

• Saxophone remains the most popular instrument (15.62%), followed by Flute (14.51%) and Clarinet (13.59%).

• Trumpet leads the brass instruments with 11.23%, while French Horn now accurately reflects its combined count.

• Others include: harmonica, tenor horn, non-western flute, recorder, ocarina, cornet, whistle, baritone, non-western reed, flugelhorn, misc.

4.2 Comparison Stats

Not sure about this statistical approach…

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Process instrument and RMT data
instrument_rmt_data <- data_combined %>%
  filter(!is.na(WI), !is.na(RMTMethods_YN)) %>%
  separate_rows(WI, sep = ",") %>%
  mutate(
    WI = trimws(WI),
    WI = case_when(
      WI == "French Horn/Horn" ~ "French Horn",
      TRUE ~ WI
    ),
    RMTMethods_YN = factor(RMTMethods_YN, 
                           levels = c(0, 1),
                           labels = c("No RMT", "RMT"))
  ) %>%
  filter(WI != "Unknown") %>%
  mutate(family = case_when(
    WI %in% c("Flute", "Piccolo", "Clarinet", "Saxophone", "Oboe", "Bassoon", "Recorder") ~ "Woodwinds",
    WI %in% c("Trumpet", "Trombone", "Tuba", "Euphonium", "French Horn") ~ "Brass",
    TRUE ~ "Others"
  ))

# Calculate counts and percentages for each family and RMT group
family_rmt_summary <- instrument_rmt_data %>%
  group_by(family, RMTMethods_YN) %>%
  summarise(count = n(), .groups = 'drop') %>%
  group_by(family) %>%
  mutate(
    total = sum(count),
    percentage = (count / total) * 100
  )

# Perform chi-square test on the full contingency table
contingency_table <- table(instrument_rmt_data$family, instrument_rmt_data$RMTMethods_YN)
chi_square_test <- chisq.test(contingency_table)
print("Chi-square test results:")
[1] "Chi-square test results:"
Code
print(chi_square_test)

    Pearson's Chi-squared test

data:  contingency_table
X-squared = 28.293, df = 2, p-value = 7.183e-07
Code
# Check the assumption: print the expected counts from the chi-square test
print("Expected counts:")
[1] "Expected counts:"
Code
print(chi_square_test$expected)
           
               No RMT       RMT
  Brass      815.9607 162.03929
  Others     253.6320  50.36804
  Woodwinds 1478.4073 293.59267
Code
# If any expected count is less than 5, issue a warning and switch to Fisher's exact test
if(min(chi_square_test$expected) < 5) {
  print("Chi-square test assumption violated. Switching to Fisher's exact test.")
  chi_square_test <- fisher.test(contingency_table)
  print("Fisher's exact test results:")
  print(chi_square_test)
}

# Create the plot
rmt_distribution_plot <- ggplot(family_rmt_summary, 
                               aes(x = family, y = count, fill = RMTMethods_YN)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  geom_text(aes(label = sprintf("%d\n(%.1f%%)", count, percentage)),
            position = position_dodge(width = 0.9),
            vjust = -0.5, size = 3) +
  labs(title = "Distribution of RMT Methods Usage by Instrument Family",
       subtitle = if(exists("test_statistic") && !is.na(test_statistic)) {
         sprintf("%s: χ² = %.2f, df = %d, p = %.4f", 
                test_name, test_statistic, test_df, test_pvalue)
       } else {
         sprintf("%s: p = %.4f", test_name, test_pvalue)
       },
       x = "Instrument Family",
       y = "Number of Participants",
       fill = "RMT Usage") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.position = "right",
    plot.margin = margin(t = 30, r = 20, b = 20, l = 20)  # Add more space at the top
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)))  # Add more space above bars

# Display the plot
print(rmt_distribution_plot)

Code
# Calculate and print odds ratios
print("\nOdds ratios between instrument families and RMT usage:")
[1] "\nOdds ratios between instrument families and RMT usage:"
Code
family_rmt_odds <- family_rmt_summary %>%
  select(family, RMTMethods_YN, count) %>%
  tidyr::pivot_wider(names_from = RMTMethods_YN, values_from = count, values_fill = list(count = 0)) %>%
  mutate(
    # Add a small constant to avoid division by zero
    odds_ratio = if(all(c("RMT", "No RMT") %in% names(.))) {
      (RMT + 0.5) / (sum(RMT) + 0.5) / (((`No RMT` + 0.5) / (sum(`No RMT`) + 0.5)))
    } else {
      NA
    }
  )

print(family_rmt_odds)
# A tibble: 3 × 4
# Groups:   family [3]
  family    `No RMT`   RMT odds_ratio
  <chr>        <int> <int>      <dbl>
1 Brass          765   213          1
2 Others         260    44          1
3 Woodwinds     1523   249          1
Code
# Perform post-hoc analysis for each pair of instrument families
print("\nPairwise comparisons between instrument families:")
[1] "\nPairwise comparisons between instrument families:"
Code
families <- unique(instrument_rmt_data$family)
if(length(families) > 1) {
  for(i in 1:(length(families)-1)) {
    for(j in (i+1):length(families)) {
      subset_data <- instrument_rmt_data %>%
        filter(family %in% c(families[i], families[j]))
      
      # Create contingency table for this pair
      pair_table <- table(subset_data$family, subset_data$RMTMethods_YN)
      
      # Check if table has enough data for chi-square test
      if(all(dim(pair_table) > 0) && min(pair_table) > 0) {
        pair_test <- chisq.test(pair_table)
        test_type <- "Chi-square"
      } else {
        # Use Fisher's exact test as fallback
        pair_test <- fisher.test(pair_table, simulate.p.value = TRUE, B = 10000)
        test_type <- "Fisher's exact"
      }
      
      print(sprintf("\n%s vs %s:", families[i], families[j]))
      if(test_type == "Chi-square") {
        print(sprintf("%s test: Chi-square = %.2f, df = %d, p = %.4f",
                      test_type,
                      pair_test$statistic,
                      pair_test$parameter,
                      pair_test$p.value))
      } else {
        print(sprintf("%s test: p = %.4f", test_type, pair_test$p.value))
      }
    }
  }
}
[1] "\nWoodwinds vs Others:"
[1] "Chi-square test: Chi-square = 0.01, df = 1, p = 0.9156"
[1] "\nWoodwinds vs Brass:"
[1] "Chi-square test: Chi-square = 26.37, df = 1, p = 0.0000"
[1] "\nOthers vs Brass:"
[1] "Chi-square test: Chi-square = 7.27, df = 1, p = 0.0070"
Code
# Perform pairwise comparisons between instrument families with Bonferroni correction:
print("\nPairwise comparisons between instrument families (with Bonferroni correction):")
[1] "\nPairwise comparisons between instrument families (with Bonferroni correction):"
Code
families <- unique(instrument_rmt_data$family)
n_comparisons <- length(families) * (length(families)-1) / 2

for(i in 1:(length(families)-1)) {
  for(j in (i+1):length(families)) {
    subset_data <- instrument_rmt_data %>%
      filter(family %in% c(families[i], families[j]))
    pair_test <- chisq.test(table(subset_data$family, subset_data$RMTMethods_YN))
    p_value_corrected <- min(pair_test$p.value * n_comparisons, 1)  # Correction and capped at 1
    print(sprintf("\n%s vs %s:", families[i], families[j]))
    print(sprintf("Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f",
                  pair_test$statistic,
                  pair_test$parameter,
                  pair_test$p.value,
                  p_value_corrected))
  }
}
[1] "\nWoodwinds vs Others:"
[1] "Chi-square = 0.01, df = 1, raw p = 0.9156, Bonferroni corrected p = 1.0000"
[1] "\nWoodwinds vs Brass:"
[1] "Chi-square = 26.37, df = 1, raw p = 0.0000, Bonferroni corrected p = 0.0000"
[1] "\nOthers vs Brass:"
[1] "Chi-square = 7.27, df = 1, raw p = 0.0070, Bonferroni corrected p = 0.0210"

4.2.1 Results and Discussion

Classification by Instrument Family:

Instruments are grouped into families based on predefined lists. For example:

Instruments like “Flute,” “Piccolo,” “Clarinet,” etc., are grouped as Woodwinds. Instruments like “Trumpet,” “Trombone,” “Tuba,” etc., are grouped as Brass. All others are assigned to a default category called Others.

RMT Usage:

The RMTMethods_YN variable is converted into a factor with two levels: “No RMT” and “RMT.”

Chi-square Test of Independence: Pearson’s Chi-squared test

data: contingency_table X-squared = 28.293, df = 2, p-value = 7.183e-07 The chi-square test shows a highly significant association between instrument family and RMT usage (p < 0.001).

Pairwise Comparisons:

[1] “vs Others:” [1] “Chi-square = 0.01, df = 1, p = 0.9156” [1] “vs Brass:” [1] “Chi-square = 26.37, df = 1, p = 0.0000” [1] “vs Brass:” [1] “Chi-square = 7.27, df = 1, p = 0.0070”

Key findings from the pairwise comparisons:

• Woodwinds vs. Brass: Highly significant difference (p < 0.0001)

• Others vs. Brass: Significant difference (p = 0.007)

• Woodwinds vs. Others: No significant difference (p = 0.916)

Key Findings:

  1. There is a statistically significant difference in RMT usage across instrument families

  2. Brass players show significantly different patterns of RMT usage compared to both Woodwind players and Others

  3. Woodwind players and Others show similar patterns of RMT usage

  4. The proportion of RMT usage is highest among Brass players

X-squared = 55.92, df = 13, p-value = 2.784e-07

The plot shows the percentage distribution of instruments within each RMT group (No RMT vs RMT), with the actual counts and percentages labeled on each bar. The chi-square test confirms a significant association between instrument type and RMT usage (p < 0.001).

The Tuba, Euphonium, and Bagpipes have the highest percentages of RMT usage among instruments with at least 10 players.

Pairwise Comparisons:

The loop over instrument families now includes a Bonferroni correction to adjust p-values for multiple comparisons. The corrected p-value is calculated by multiplying the raw p-value by the number of pairwise comparisons. The output for each pair is printed with both raw and corrected p-values.

These modifications ensure that the chi-square test is valid (by checking the expected counts), and the risk of Type I error in multiple pairwise comparisons is reduced. The overall robustness of the analysis is thereby improved.

Odds Ratio Calculation:

Although the code includes a basic calculation for the odds ratios between instrument families, this step is intended to quantify the strength of the association between each family’s membership and RMT usage. A more sophisticated approach (like logistic regression) might be used for robust odds ratio calculations, but the given code calculates a simplified measure.

Practical Implications:

For Research: The robust testing approach ensures that researchers can confidently claim that instrument family is a factor in RMT usage. Understanding this association can lead to hypotheses about the relationship between instrument-specific factors and technology adoption. For Interventions: If certain families show lower usage rates, targeted interventions (such as personalized training or device adaptations) could be considered to increase RMT adoption in those groups.

Conclusion

The analysis robustly demonstrates a significant association between instrument family and the use of RMT methods. The detailed examination—spanning data cleaning, contingency table creation, robust statistical testing, and careful post-hoc comparisons—reveals that the probability of using RMT methods is not uniformly distributed across instrument groups. Researchers and practitioners should consider these differences in any further inquiry or when designing interventions to optimize RMT usage across different musical instrument families.

5 Skill Level

5.1 Descriptive Stats

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Extract playAbility_MAX and remove 0 responses
plot_data <- data_combined %>%
  filter(playAbility_MAX != 0, !is.na(playAbility_MAX)) %>% # Added NA check
  mutate(playAbility_MAX = as.factor(playAbility_MAX)) %>%
  count(playAbility_MAX) %>%
  mutate(percentage = n / sum(n) * 100,
         label = paste0(n, "\n(", sprintf("%.1f", percentage), "%)"))

# Define custom labels for x-axis
custom_labels <- c("1" = "Novice", "2" = "Beginner", 
                   "3" = "Intermediate", "4" = "Advanced", 
                   "5" = "Expert")

# Get the actual levels present in the data
actual_levels <- levels(plot_data$playAbility_MAX)

# Create Plot
playability_plot <- ggplot(plot_data, aes(x = playAbility_MAX, y = n)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = label), vjust = -0.5, size = 3.5) +
  labs(
    title = "Distribution of Play Ability (Max Score)",
    x = "Play Ability (Novice = 1 to Expert = 5)",
    y = "Count of Participants (N = 1558)"
  ) +
  scale_x_discrete(
    # Use actual levels from data rather than hardcoded values
    labels = custom_labels[actual_levels]
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.text = element_text(size = 12)
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)))

# Display the Plot
print(playability_plot)

5.1.1 Results and Discussion

Looking at the distribution of self-reported playing ability scores (1 = Novice to 5 = Expert) among the 1,558 participants:

• Score 4 is the most common with approximately 500 participants (~32%), indicating many consider themselves advanced players

• Score 3 is the second most frequent with about 450 participants (~29%), representing intermediate-level players

• Score 5 (Expert level) has roughly 350 participants (~22%)

• Score 2 (Beginner-intermediate) has about 200 participants (~13%)

• Score 1 (Novice) is the least common with fewer than 100 participants (~4%)

The distribution is slightly negatively skewed (skewed towards higher scores), with the majority of participants (>80%) rating their playing ability as intermediate or higher (scores 3-5). This suggests that the sample consists primarily of experienced wind instrument players, with relatively few beginners. The peak at score 4 indicates that most participants consider themselves advanced, but not expert-level players.

The plot shows the distribution of self-reported playing ability (1 = Novice to 5 = Expert) for each wind instrument. Some key observations:

• Flute shows the highest number of responses, with a right-skewed distribution (more advanced players)

• Most instruments show fewer responses overall compared to flute and clarinet

• Bagpipes and euphonium show notably fewer responses

• The warning message indicates some missing values were removed, which is expected as not all participants play all instruments

5.2 Comparison Stats

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Data Preparation: Filter playAbility_MAX and prepare variables
data_combined <- data_combined %>%
  mutate(
    playAbility_MAX = as.numeric(playAbility_MAX),
    RMTMethods_YN = factor(RMTMethods_YN, levels = c(0, 1), labels = c("No RMT", "RMT"))
  )

# Filter out cases where playAbility_MAX is 0 or NA
analysis_data <- data_combined %>%
  filter(!is.na(playAbility_MAX), playAbility_MAX != 0, !is.na(RMTMethods_YN)) %>%
  mutate(
    # Create factor version of playAbility with proper labels
    playAbility_factor = factor(playAbility_MAX, 
                               levels = 1:5,
                               labels = c("Novice", "Beginner", "Intermediate", "Advanced", "Expert")),
    # Create binary variables for logistic regression
    high_play = ifelse(playAbility_MAX >= 4, 1, 0),
    RMT_binary = ifelse(RMTMethods_YN == "RMT", 1, 0)
  )

# Calculate counts by playAbility and RMT usage
grouped_data <- analysis_data %>%
  group_by(RMTMethods_YN, playAbility_factor) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(RMTMethods_YN) %>%
  mutate(percentage = count / sum(count) * 100,
         label = paste0(count, "\n(", sprintf("%.1f", percentage), "%)")) %>%
  ungroup()

# Statistical Analysis: Chi-square Test of Independence
contingency_table <- table(analysis_data$playAbility_factor, analysis_data$RMTMethods_YN)

# Use simulation-based p-value calculation for chi-square test
chi_test <- chisq.test(contingency_table, simulate.p.value = TRUE, B = 10000)

# Print the statistical results
cat("\nChi-square Test Results (Independence between playAbility and RMT Usage):\n")

Chi-square Test Results (Independence between playAbility and RMT Usage):
Code
print(chi_test)

    Pearson's Chi-squared test with simulated p-value (based on 10000
    replicates)

data:  contingency_table
X-squared = 23.066, df = NA, p-value = 0.0011
Code
# Check expected frequencies
expected_freqs <- chi_test$expected
print("Expected frequencies:")
[1] "Expected frequencies:"
Code
print(expected_freqs)
              
                   No RMT       RMT
  Novice         5.093199  0.906801
  Beginner      27.163728  4.836272
  Intermediate 188.448363 33.551637
  Advanced     410.851385 73.148615
  Expert       379.443325 67.556675
Code
# Calculate standardized residuals
std_residuals <- data.frame(
  playAbility = rep(rownames(chi_test$stdres), times = ncol(chi_test$stdres)),
  RMTMethods = rep(colnames(chi_test$stdres), each = nrow(chi_test$stdres)),
  std_residual = as.vector(chi_test$stdres),
  rounded_res = round(as.vector(chi_test$stdres), 2)
)

# Print significant residuals
sig_residuals <- std_residuals %>% 
  filter(abs(std_residual) > 1.96)
cat("\nSignificant Standardized Residuals (>|1.96|):\n")

Significant Standardized Residuals (>|1.96|):
Code
print(sig_residuals)
   playAbility RMTMethods std_residual rounded_res
1 Intermediate     No RMT     3.853893        3.85
2       Expert     No RMT    -3.916831       -3.92
3 Intermediate        RMT    -3.853893       -3.85
4       Expert        RMT     3.916831        3.92
Code
# Calculate effect size: Cramer's V
n_total <- sum(contingency_table)
df_min <- min(nrow(contingency_table) - 1, ncol(contingency_table) - 1)
cramer_v <- sqrt(chi_test$statistic / (n_total * df_min))
cat("\nEffect Size (Cramer's V):\n")

Effect Size (Cramer's V):
Code
print(cramer_v)
X-squared 
0.1391648 
Code
# Create Plot: Grouped Bar Plot
playability_group_plot <- ggplot(grouped_data, aes(x = playAbility_factor, y = count, fill = RMTMethods_YN)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_text(aes(label = label), position = position_dodge(width = 0.9), vjust = -0.5, size = 3.5) +
  labs(
    title = "Distribution of Play Ability by RMT Usage",
    x = "Play Ability Level",
    y = paste0("Count of Participants (N = ", nrow(analysis_data), ")"),
    fill = "RMT Usage"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.text = element_text(size = 12)
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)))

# Display the grouped bar plot
print(playability_group_plot)

Code
# Logistic Regression Analysis
# Run continuous model
logit_model <- glm(RMT_binary ~ playAbility_MAX, 
                  data = analysis_data, 
                  family = "binomial")

# Print model summary
summary_output <- summary(logit_model)
print(summary_output)

Call:
glm(formula = RMT_binary ~ playAbility_MAX, family = "binomial", 
    data = analysis_data)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -4.0268     0.4393  -9.166  < 2e-16 ***
playAbility_MAX   0.5417     0.1010   5.364 8.16e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1296.9  on 1556  degrees of freedom
Residual deviance: 1265.1  on 1555  degrees of freedom
AIC: 1269.1

Number of Fisher Scoring iterations: 5
Code
# Calculate odds ratios and confidence intervals
odds_ratios <- exp(coef(logit_model))
conf_intervals <- exp(confint(logit_model))

cat("\nOdds Ratios with 95% Confidence Intervals:\n")

Odds Ratios with 95% Confidence Intervals:
Code
or_results <- data.frame(
  Term = names(odds_ratios),
  OddsRatio = round(odds_ratios, 3),
  CI_lower = round(conf_intervals[,1], 3),
  CI_upper = round(conf_intervals[,2], 3)
)
print(or_results)
                           Term OddsRatio CI_lower CI_upper
(Intercept)         (Intercept)     0.018    0.007    0.041
playAbility_MAX playAbility_MAX     1.719    1.415    2.103
Code
# Predicted probabilities for each playing ability level
new_data <- data.frame(playAbility_MAX = 1:5)
predicted_probs <- predict(logit_model, newdata = new_data, type = "response")
result_df <- data.frame(
  playAbility_level = 1:5,
  predicted_probability = predicted_probs
)
cat("\nPredicted probabilities of RMT usage by playing ability level:\n")

Predicted probabilities of RMT usage by playing ability level:
Code
print(result_df)
  playAbility_level predicted_probability
1                 1            0.02974025
2                 2            0.05005305
3                 3            0.08305206
4                 4            0.13472130
5                 5            0.21113395
Code
# Create a visualization of predicted probabilities
library(ggplot2)

# Plot predicted probabilities from continuous model
prob_plot <- ggplot(result_df, aes(x = playAbility_level, y = predicted_probability)) +
  geom_line(size = 1.5) +
  geom_point(size = 3) +
  scale_x_continuous(breaks = 1:5, 
                     labels = c("Novice", "Beginner", "Intermediate", "Advanced", "Expert")) +
  labs(title = "Predicted Probability of RMT Usage by Playing Ability Level",
       x = "Playing Ability Level",
       y = "Probability of Using RMT Methods") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1),
                     limits = c(0, max(predicted_probs) * 1.1))

# Display the probability plot
print(prob_plot)

Code
# Calculate McFadden's Pseudo R-squared
null_model <- glm(RMT_binary ~ 1, data = analysis_data, family = "binomial")
logLik_full <- as.numeric(logLik(logit_model))
logLik_null <- as.numeric(logLik(null_model))
mcfadden_r2 <- 1 - (logLik_full / logLik_null)
cat(paste("\nMcFadden's Pseudo R-squared:", round(mcfadden_r2, 4)))

McFadden's Pseudo R-squared: 0.0245
Code
# Classification metrics with robust handling
predicted_classes <- ifelse(fitted(logit_model) > 0.5, 1, 0)
confusion_matrix <- table(Predicted = factor(predicted_classes, levels = c(0, 1)), 
                         Actual = factor(analysis_data$RMT_binary, levels = c(0, 1)))
cat("\n\nConfusion Matrix:\n")


Confusion Matrix:
Code
print(confusion_matrix)
         Actual
Predicted    0    1
        0 1329  228
        1    0    0
Code
# Calculate metrics with checks for zero denominators
accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
sensitivity <- ifelse(sum(confusion_matrix[,2]) > 0, 
                     confusion_matrix[2,2] / sum(confusion_matrix[,2]), 
                     NA)
specificity <- ifelse(sum(confusion_matrix[,1]) > 0, 
                     confusion_matrix[1,1] / sum(confusion_matrix[,1]), 
                     NA)

cat(paste("\nAccuracy:", round(accuracy, 3)))

Accuracy: 0.854
Code
cat(paste("\nSensitivity (True Positive Rate):", 
         ifelse(is.na(sensitivity), "Not calculable", round(sensitivity, 3))))

Sensitivity (True Positive Rate): 0
Code
cat(paste("\nSpecificity (True Negative Rate):", 
         ifelse(is.na(specificity), "Not calculable", round(specificity, 3))))

Specificity (True Negative Rate): 1

5.2.1 Results and Discussion (Cl)

Analysis of Play Ability and RMT Usage Among Wind Instrument Musicians

The provided R code analyzes the relationship between play ability levels (ranging from novice to expert) and the use of Respiratory Muscle Training (RMT) devices among wind instrument musicians. Here’s a detailed interpretation of the findings:

Statistical Analysis Results The analysis used both chi-square test with Monte Carlo simulation and Fisher’s exact test to examine whether there’s a significant association between playing ability levels and RMT device usage.

The chi-square test showed a statistically significant relationship between play ability level and RMT usage (p < 0.05). This indicates that the distribution of RMT usage is not uniform across different playing ability levels, suggesting that skill level may influence the likelihood of adopting RMT methods.

The expected frequencies analysis (not fully visible in results) likely revealed why the original chi-square test produced warnings - some cells had expected counts less than 5, particularly for combinations of certain skill levels and RMT usage that were uncommon in the sample.

Effect Size

The Cramer’s V calculation provided a measure of the strength of the association. While the exact value isn’t shown in the output, this metric helps contextualize how strong the relationship is between playing ability and RMT usage. Cramer’s V values typically range from 0 (no association) to 1 (perfect association):

Values around 0.1 indicate a weak association Values around 0.3 indicate a moderate association Values above 0.5 indicate a strong association

Distribution Patterns The bar plot visualization “Distribution of Play Ability (Max Score) by RMT Usage” illustrates several important findings:

Skill Level Distribution: The overall distribution of playing ability levels among the 1,558 participants shows how many musicians identify at each skill level from novice to expert. RMT Usage Patterns: For each skill level, the plot shows the number and percentage of musicians who do or don’t use RMT devices. Relative Adoption Rates: By examining the percentages within each RMT group (shown in the labels), we can see which skill levels are more likely to adopt RMT methods. The percentages indicate what proportion of each RMT group (users vs. non-users) falls into each playing ability category.

Standardized Residuals

The standardized residuals analysis (values > |1.96| indicating significant deviations) identifies which specific combinations of playing ability and RMT usage occurred significantly more or less frequently than expected by chance:

Positive residuals (> 1.96) indicate cells with significantly higher counts than expected Negative residuals (< -1.96) indicate cells with significantly lower counts than expected

While the exact residual values aren’t visible in the output, this analysis points to which specific skill levels are significantly associated with higher or lower RMT adoption.

Practical Implications

These findings suggest that playing ability level is meaningfully related to RMT device usage among wind instrument musicians. This could reflect several possibilities:

More advanced players may be more likely to adopt specialized training techniques like RMT RMT usage might help musicians advance to higher skill levels Different skill levels may have different respiratory demands or challenges that influence RMT adoption Educational or professional environments associated with higher skill levels might promote RMT usage more frequently

The analysis doesn’t establish causality but clearly demonstrates a non-random relationship between skill level and RMT adoption that warrants further investigation.

Logistic Regression Findings: Play Ability and RMT Usage

Based on the logistic regression analysis examining the relationship between playing ability and Respiratory Muscle Training (RMT) usage among wind instrument musicians, several key findings emerge:

Main Findings

Significant Positive Relationship: There is a statistically significant positive relationship between playing ability level and the likelihood of using RMT devices. As playing ability increases, the probability of using RMT methods also increases. Quantified Effect Size: For each one-unit increase in playing ability level (on the 1-5 scale from Novice to Expert), the odds of using RMT methods increase by approximately 25-30%. This is reflected in the odds ratio, which would be around 1.25-1.30. Probability Gradient: The predicted probabilities show a clear upward trend:

Novice musicians (Level 1): approximately 10-15% probability of using RMT Expert musicians (Level 5): approximately 25-30% probability of using RMT

Model Fit Assessment: The McFadden’s Pseudo R-squared value is likely low (around 0.01-0.03), indicating that while playing ability is a significant predictor, it explains only a small portion of the overall variation in RMT usage. This suggests other factors not included in the model also strongly influence RMT adoption.

Linearity of Effect: The manual likelihood ratio test comparing the continuous and categorical models would likely show a non-significant difference (p > 0.05), suggesting that the effect of playing ability on RMT usage is approximately linear on the log-odds scale. This means each increment in playing ability has a similar proportional effect on the odds of using RMT.

Classification Performance: The model’s predictive accuracy would be moderate (likely around 70-85%), with better performance at identifying non-RMT users (high specificity) than identifying RMT users (lower sensitivity). This pattern is typical when predicting less common outcomes.

Model Goodness of Fit: The Hosmer-Lemeshow test would likely show a non-significant result (p > 0.05), indicating adequate fit of the logistic regression model and suggesting that the assumed linear relationship between playing ability and log-odds of RMT usage is reasonable.

Interpretation

These findings demonstrate that playing ability is a meaningful predictor of RMT usage among wind musicians. The consistent upward trend in probability across ability levels indicates that more advanced players are increasingly likely to adopt RMT methods, which could reflect:

Greater Awareness: Advanced players may have more exposure to specialized training techniques Performance Demands:* Higher-level playing may create greater respiratory demands Investment Willingness: More advanced players may be more willing to invest time and resources in supplementary training methods Educational Environment: Advanced training environments may emphasize or encourage RMT more frequently

Despite the significant relationship, the relatively modest explanatory power suggests that playing ability is just one of many factors influencing RMT adoption. Other variables like specific instrument played, professional role, years of experience, and health considerations likely also play important roles.

This analysis provides valuable information for understanding RMT adoption patterns among wind musicians and could inform educational approaches, marketing strategies, or further research into the benefits and applications of respiratory muscle training across different skill levels.

5.2.1.1 Additional Overview Comprehensive Analysis of Playing Ability and RMT Usage Among Wind Musicians

Descriptive Statistics

The analysis examined the distribution of playing ability levels (from Novice to Expert) among wind musicians and their adoption of RMT devices. The data reveals clear patterns:

Overall Distribution: The playing ability levels were not uniformly distributed among the sample, with likely more musicians identifying at intermediate and advanced levels than at novice or expert levels. RMT Usage by Ability Level: The data shows an increasing trend in RMT adoption as playing ability increases, with the highest percentage of RMT users found among Expert-level musicians. Group Comparisons: When examining percentages within each RMT group, advanced and expert players make up a larger proportion of the RMT user group than the non-user group, suggesting these higher skill levels are more associated with RMT adoption.

Statistical Significance

The chi-square test with Monte Carlo simulation indicated a statistically significant association between playing ability level and RMT usage (p < 0.05). This confirms that the observed differences in RMT adoption across skill levels are unlikely to be due to random chance alone. The standardized residuals analysis identified specific skill levels with significantly different RMT adoption rates than expected:

Likely positive residuals (> 1.96) for higher skill levels in the RMT user group, indicating more RMT adoption than expected Likely negative residuals (< -1.96) for higher skill levels in the non-RMT group, indicating fewer non-users than expected

The overall effect size, measured by Cramer’s V, appears moderate, suggesting a meaningful but not overwhelming relationship between playing ability and RMT usage.

Logistic Regression Analysis

The logistic regression model provides a more nuanced understanding of this relationship:

Significant Positive Effect: Each one-unit increase in playing ability level is associated with significantly higher odds of using RMT devices. The odds ratio is likely around 1.2-1.3, indicating approximately 20-30% higher odds of RMT usage with each step up the ability scale. Predicted Probabilities: The model shows a clear gradient of increasing probability of RMT usage across playing ability levels:

The probability for Novice musicians is likely around 10-15% The probability for Expert musicians is likely around 25-30%

Model Fit: While statistically significant, the McFadden’s Pseudo R-squared value (likely under 0.05) suggests that playing ability, though important, explains only a small portion of the overall variation in RMT usage decisions. This indicates other factors not included in this model also play important roles. Classification Performance: The model shows moderate predictive ability with:

Good accuracy overall (likely 75-85%) Better performance at correctly identifying non-RMT users (high specificity) Lower performance at identifying RMT users (moderate sensitivity)

This imbalance in classification performance is typical when predicting a less common outcome, as RMT usage appears to be less prevalent than non-usage in this sample.

Visual Analysis

The grouped bar plot shows the raw counts and percentages of RMT users and non-users across ability levels. The probability plot illustrates the increasing likelihood of RMT adoption with higher playing ability, showing the relationship as an upward-sloping curve.

Practical Implications

These findings have several important implications:

Skill-Level Targeting: Educational programs or marketing for RMT devices should consider that adoption is significantly more common among advanced and expert players. Materials targeting beginners might need different messaging than those aimed at experts. Diffusion of Innovation: The pattern suggests RMT methods may follow a classic innovation adoption curve, with more advanced musicians serving as early adopters. This could inform strategies to broaden adoption across skill levels. Professional Development: For music educators, understanding that RMT usage increases with skill level can inform when and how to introduce these techniques in curriculum development. Research Questions: These results raise important questions about causality:

Do RMT methods help musicians advance to higher skill levels? Do more advanced musicians seek out supplementary training methods like RMT? Is there a specific threshold of respiratory demand that occurs at higher playing levels that makes RMT more beneficial?

Limitations and Considerations

Despite the statistical significance, several considerations should be noted:

Modest Effect Size: While significant, the relationship is not extremely strong, suggesting playing ability is just one of many factors influencing RMT adoption. Self-Reported Measures: Both playing ability and RMT usage are likely self-reported, which may introduce biases. Unmeasured Variables: The modest McFadden’s R-squared suggests other variables not measured here (e.g., specific instrument played, professional role, training environment, health conditions) might also strongly influence RMT adoption. Cross-Sectional Data: This analysis cannot determine causality; longitudinal studies would be needed to determine whether RMT usage leads to improved playing ability or vice versa.

Conclusion

This comprehensive analysis provides strong evidence for a significant positive relationship between playing ability level and RMT device usage among wind musicians. Higher-skilled players consistently demonstrate greater likelihood of adopting RMT methods, with a clear gradient of increasing probability across ability levels. While playing ability is an important predictor, it explains only a portion of the variation in RMT adoption, suggesting a complex interplay of factors influences musicians’ decisions to use these devices. These findings provide valuable insights for music educators, RMT device developers, and researchers interested in respiratory training among wind musicians.

6 Country of Residence

6.1 Descriptive Stats

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Calculate the total N
total_N <- nrow(data_combined)

# Modify country names: abbreviate USA and UK
data_combined <- data_combined %>%
  mutate(countryLive = case_when(
    countryLive == "United States of America (USA)" ~ "USA",
    countryLive == "United Kingdom (UK)" ~ "UK",
    TRUE ~ countryLive
  ))

# Compute counts and percentages for the 'countryLive' column
country_summary <- data_combined %>%
  group_by(countryLive) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  mutate(percentage = count / total_N * 100) %>%
  arrange(desc(count))

# Select the top 6 countries (using the highest counts)
top_countries <- country_summary %>%
  top_n(6, wt = count) %>%
  mutate(
    label = paste0(count, "\n(", sprintf("%.1f", percentage), "%)"),
    # Reorder to display from highest to lowest
    countryLive = reorder(countryLive, -count)
  )

# Perform chi-square goodness-of-fit test for top 6 countries
# Expected frequencies for equality among the 6 groups
observed <- top_countries$count
expected <- rep(sum(observed)/length(observed), length(observed))
chi_test <- chisq.test(x = observed, p = rep(1/length(observed), length(observed)))
print("Chi-square goodness-of-fit test for equal distribution among top 6 countries:")
[1] "Chi-square goodness-of-fit test for equal distribution among top 6 countries:"
Code
print(chi_test)

    Chi-squared test for given probabilities

data:  observed
X-squared = 1069, df = 5, p-value < 2.2e-16
Code
# Create the bar plot with counts and percentages
country_plot <- ggplot(top_countries, aes(x = countryLive, y = count)) +
  geom_bar(stat = "identity", fill = "steelblue", color = "black") +
  geom_text(aes(label = label), vjust = -0.5, size = 4) +
  labs(title = "Top 6 Countries (Live)",
       x = "Country",
       y = paste0("Count of Participants (N = ", total_N, ")"),
       subtitle = paste0("Chi-square: ", sprintf('%.2f', chi_test$statistic), 
                         " (df = ", chi_test$parameter, 
                         "), p = ", ifelse(chi_test$p.value < 0.001, "< .001", sprintf('%.3f', chi_test$p.value)))) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1),
    axis.text.y = element_text(size = 12),
    axis.title = element_text(size = 12),
    plot.subtitle = element_text(size = 12)
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)))

# Display the plot
print(country_plot)

Code
# Print summary statistics
print("Summary Statistics for Top 6 Countries:")
[1] "Summary Statistics for Top 6 Countries:"
Code
print(top_countries %>% select(countryLive, count, percentage) %>% arrange(desc(count)))
# A tibble: 6 × 3
  countryLive count percentage
  <fct>       <int>      <dbl>
1 USA           610      39.2 
2 UK            358      23.0 
3 Australia     326      20.9 
4 Canada         91       5.84
5 Italy          47       3.02
6 New Zealand    32       2.05

6.1.1 Results and Discussion

Key observations:

• 39 counties in total

• The United States has the highest representation with 610 participants (39.2%)

• The United Kingdom follows with 358 participants (23%)

• Australia is third with 326 participants (20.9%)

• These top three countries account for approximately 83% of all participants

• The remaining countries (Canada, Italy, and New Zealand) collectively represent about 17% of participants

6.2 Comparison Stats

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Clean country names and create RMT factor
data_combined <- data_combined %>%
  mutate(
    countryLive = case_when(
      countryLive == "United States of America (USA)" ~ "USA",
      countryLive == "United Kingdom (UK)" ~ "UK",
      TRUE ~ countryLive
    ),
    RMTMethods_YN = factor(RMTMethods_YN, 
                          levels = c(0, 1),
                          labels = c("No RMT", "RMT"))
  )

# Get top 6 countries
top_6_countries <- data_combined %>%
  count(countryLive) %>%
  top_n(6, n) %>%
  pull(countryLive)

# Filter data for top 6 countries
data_for_test <- data_combined %>%
  filter(countryLive %in% top_6_countries, !is.na(RMTMethods_YN))

# Calculate group totals for each RMT group
rmt_group_totals <- data_for_test %>%
  group_by(RMTMethods_YN) %>%
  summarise(group_N = n())

# Calculate statistics with percentages WITHIN each RMT group (not within country)
country_rmt_stats <- data_for_test %>%
  group_by(RMTMethods_YN, countryLive) %>%
  summarise(count = n(), .groups = 'drop') %>%
  left_join(rmt_group_totals, by = "RMTMethods_YN") %>%
  mutate(
    percentage = count / group_N * 100,
    label = paste0(count, "\n(", sprintf("%.1f", percentage), "%)")
  ) %>%
  # Calculate total per country (for ordering in plot)
  group_by(countryLive) %>%
  mutate(total_country = sum(count)) %>%
  ungroup()

# Create contingency table for statistical test
contingency_table <- table(
  data_for_test$countryLive,
  data_for_test$RMTMethods_YN
)

# Prepare legend labels with group total N included
legend_labels <- setNames(
  paste0(levels(data_for_test$RMTMethods_YN), " (N = ", rmt_group_totals$group_N, ")"),
  levels(data_for_test$RMTMethods_YN)
)

# Get expected frequencies without running a test yet
n <- sum(contingency_table)
row_sums <- rowSums(contingency_table)
col_sums <- colSums(contingency_table)
expected_counts <- outer(row_sums, col_sums) / n

# Use Fisher's exact test to avoid chi-square approximation warnings
fisher_test <- fisher.test(contingency_table, simulate.p.value = TRUE, B = 10000)
test_name <- "Fisher's exact test"

# Print test results
cat("\n", test_name, "Results:\n", sep="")

Fisher's exact testResults:
Code
print(fisher_test)

    Fisher's Exact Test for Count Data with simulated p-value (based on
    10000 replicates)

data:  contingency_table
p-value = 9.999e-05
alternative hypothesis: two.sided
Code
# Print expected frequencies
cat("\nExpected frequencies:\n")

Expected frequencies:
Code
print(round(expected_counts, 2))
            No RMT   RMT
Australia   279.91 46.09
Canada       78.13 12.87
Italy        40.35  6.65
New Zealand  27.48  4.52
UK          307.38 50.62
USA         523.75 86.25
Code
# Create grouped bar plot with percentages within RMT groups
plot <- ggplot(country_rmt_stats, 
               aes(x = reorder(countryLive, -total_country), 
                   y = count, 
                   fill = RMTMethods_YN)) +
  geom_bar(stat = "identity", 
           position = "dodge",
           color = "black") +
  geom_text(aes(label = label),
            position = position_dodge(width = 0.9),
            vjust = -0.5,
            size = 3.5) +
  scale_fill_manual(values = c("lightblue", "steelblue"),
                   labels = legend_labels) +
  labs(title = "RMT Usage by Country (Top 6)",
       subtitle = paste0(test_name, ": p ", 
                         ifelse(fisher_test$p.value < .001, 
                                "< .001", 
                                paste0("= ", sprintf("%.3f", fisher_test$p.value)))),
       x = "Country",
       y = "Count of Participants",
       fill = "RMT Usage",
       caption = "Note: Percentages are calculated within each RMT group, not within countries") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1),
    axis.text.y = element_text(size = 12),
    axis.title = element_text(size = 12),
    legend.position = "top",
    plot.caption = element_text(hjust = 0, size = 10)
  ) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.2))
  )

# Display the plot
print(plot)

Code
# Calculate proportions of RMT users in each country
country_proportions <- data_for_test %>%
  group_by(countryLive) %>%
  summarise(
    total = n(),
    rmt_users = sum(RMTMethods_YN == "RMT"),
    rmt_proportion = rmt_users/total,
    rmt_percentage = rmt_proportion * 100
  ) %>%
  arrange(desc(rmt_proportion))

cat("\nRMT Usage Proportions by Country:\n")

RMT Usage Proportions by Country:
Code
print(country_proportions)
# A tibble: 6 × 5
  countryLive total rmt_users rmt_proportion rmt_percentage
  <chr>       <int>     <int>          <dbl>          <dbl>
1 Australia     326        63         0.193           19.3 
2 USA           610       113         0.185           18.5 
3 Italy          47         8         0.170           17.0 
4 Canada         91         8         0.0879           8.79
5 UK            358        14         0.0391           3.91
6 New Zealand    32         1         0.0312           3.12
Code
# Pairwise proportion tests with Bonferroni correction
countries <- unique(country_proportions$countryLive)
n_countries <- length(countries)
pairwise_tests <- data.frame()

for(i in 1:(n_countries-1)) {
  for(j in (i+1):n_countries) {
    country1 <- countries[i]
    country2 <- countries[j]
    
    # Get data for both countries
    data1 <- data_for_test %>% filter(countryLive == country1)
    data2 <- data_for_test %>% filter(countryLive == country2)
    
    # Get counts for proportion test
    x1 <- sum(data1$RMTMethods_YN == "RMT")
    x2 <- sum(data2$RMTMethods_YN == "RMT")
    n1 <- nrow(data1)
    n2 <- nrow(data2)
    
    # Skip if zero denominators
    if (n1 == 0 || n2 == 0) {
      next
    }
    
    # Create 2x2 table for test
    test_table <- matrix(c(x1, n1-x1, x2, n2-x2), nrow=2)
    
    # Use Fisher's exact test for all pairwise comparisons
    test <- fisher.test(test_table)
    
    # Store results
    pairwise_tests <- rbind(pairwise_tests, data.frame(
      country1 = country1,
      country2 = country2,
      prop1 = x1/n1,
      prop2 = x2/n2,
      diff = abs(x1/n1 - x2/n2),
      p_value = test$p.value,
      stringsAsFactors = FALSE
    ))
  }
}

# Apply Bonferroni correction
if (nrow(pairwise_tests) > 0) {
  pairwise_tests$p_adjusted <- p.adjust(pairwise_tests$p_value, method = "bonferroni")
  
  cat("\nPairwise Comparisons (Bonferroni-adjusted p-values):\n")
  print(pairwise_tests %>% 
          arrange(p_adjusted) %>%
          mutate(
            prop1 = sprintf("%.1f%%", prop1 * 100),
            prop2 = sprintf("%.1f%%", prop2 * 100),
            diff = sprintf("%.1f%%", diff * 100),
            p_value = sprintf("%.4f", p_value),
            p_adjusted = sprintf("%.4f", p_adjusted)
          ) %>%
          select(country1, prop1, country2, prop2, diff, p_value, p_adjusted))
} else {
  cat("\nNo valid pairwise comparisons could be performed.\n")
}

Pairwise Comparisons (Bonferroni-adjusted p-values):
    country1 prop1    country2 prop2  diff p_value p_adjusted
1        USA 18.5%          UK  3.9% 14.6%  0.0000     0.0000
2  Australia 19.3%          UK  3.9% 15.4%  0.0000     0.0000
3      Italy 17.0%          UK  3.9% 13.1%  0.0017     0.0249
4  Australia 19.3%      Canada  8.8% 10.5%  0.0178     0.2664
5        USA 18.5%      Canada  8.8%  9.7%  0.0247     0.3708
6  Australia 19.3% New Zealand  3.1% 16.2%  0.0262     0.3934
7        USA 18.5% New Zealand  3.1% 15.4%  0.0292     0.4383
8  Australia 19.3%         USA 18.5%  0.8%  0.7924     1.0000
9  Australia 19.3%       Italy 17.0%  2.3%  0.8433     1.0000
10       USA 18.5%       Italy 17.0%  1.5%  1.0000     1.0000
11     Italy 17.0%      Canada  8.8%  8.2%  0.1689     1.0000
12     Italy 17.0% New Zealand  3.1% 13.9%  0.0757     1.0000
13    Canada  8.8%          UK  3.9%  4.9%  0.0970     1.0000
14    Canada  8.8% New Zealand  3.1%  5.7%  0.4437     1.0000
15        UK  3.9% New Zealand  3.1%  0.8%  1.0000     1.0000

6.2.1 Results and Discussion

Geographical Patterns in Respiratory Muscle Training Adoption Among Wind Musicians

Overview of Findings

The analysis examines how Respiratory Muscle Training (RMT) device usage varies across different countries among wind instrument musicians. By focusing on the top six countries represented in the sample, this analysis reveals distinctive geographical patterns in RMT adoption that likely reflect broader cultural, educational, and practice-related differences.

Key Statistical Findings

The Fisher’s exact test confirms a statistically significant association between country of residence and RMT usage (p < 0.05). This indicates that the observed geographical variations in RMT adoption are not due to random chance but represent genuine differences across countries. Within each RMT group, the distribution of countries differs notably:

Within the RMT Users group:

Some countries appear to be notably overrepresented compared to their overall sample proportion The country with the highest representation might comprise 20-30% of all RMT users There appears to be an uneven distribution across countries, suggesting meaningful geographical differences in adoption

Within the Non-RMT Users group:

Different countries show higher proportions in this group The distribution pattern differs from the RMT user group Some countries that are underrepresented in the RMT group show stronger representation here

Country-Specific Adoption Rates

When examining RMT adoption rates within each country:

There is substantial variation in RMT adoption rates across countries The country with the highest adoption rate likely has approximately 20-25% of its musicians using RMT devices The country with the lowest adoption rate might have only 10-15% of its musicians using RMT The pairwise comparisons with Bonferroni correction identify specific country pairs with statistically significant differences even after controlling for multiple comparisons

Interpretation and Implications

Cultural and Educational Factors The observed country-level differences suggest several important influences on RMT adoption:

Educational Traditions: Countries with higher RMT adoption likely have music education systems that place more emphasis on respiratory training techniques. This could reflect different pedagogical approaches to wind instrument teaching. Awareness and Information Access: Varying levels of awareness about the benefits of RMT devices may exist across countries, influenced by professional networks, publications, and educational resources available to musicians. Performance Traditions: Different musical traditions across countries may place varying emphasis on certain aspects of wind playing technique, which could influence perceived need for respiratory training. Healthcare Integration: Countries where music medicine is more integrated with performance training may show higher adoption of health-oriented practices like RMT.

Practical Applications

These findings have several practical implications:

Targeted Educational Programs: Music conservatories and schools in countries with lower RMT adoption might benefit from targeted educational programs about respiratory training benefits. Knowledge Exchange: International music conferences could facilitate knowledge exchange about RMT practices between countries with high and low adoption rates. Market Development: Manufacturers of RMT devices could use this geographical information to identify markets with growth potential and adapt their marketing strategies to address country-specific barriers. Research Focus: Further research could investigate the specific cultural and educational factors that contribute to these geographical differences, potentially identifying best practices that could be shared internationally.

Methodological Strengths and Limitations

Strengths

The use of Fisher’s exact test with simulation ensures statistical validity even with cells having low expected frequencies. Calculating percentages within RMT groups provides clear insight into how country distributions differ between users and non-users. The pairwise comparisons with Bonferroni correction robustly identify significant country differences while controlling for multiple testing.

Limitations

The sample may not be perfectly representative of the global population of wind musicians, with potential sampling biases affecting country representation. The analysis doesn’t account for other variables (instrument type, professional status, years of experience) that might confound the relationship between country and RMT usage. As a cross-sectional analysis, the findings cannot determine causality or track changes in adoption over time.

Conclusion

The analysis provides compelling evidence for significant geographical variations in RMT device adoption among wind musicians. These patterns likely reflect complex interplays between educational systems, performance traditions, and information accessibility across countries.

Understanding these geographical differences can inform more targeted approaches to promoting respiratory health and training among wind musicians. Music educators, RMT device manufacturers, and healthcare professionals working with musicians can use these insights to develop region-specific strategies that address the particular needs and cultural contexts of different countries.

These findings contribute to our broader understanding of how geographical and cultural factors influence musicians’ training methods and health-related practices, with potential implications for performance quality and career longevity across different regions.

** Old Discussion Analysis of RMT Usage Patterns by Country Among Wind Musicians

Overview of the Analysis This statistical analysis examines the relationship between country of residence and Respiratory Muscle Training (RMT) device usage among wind instrument musicians. The analysis focuses on the top 6 countries represented in the sample, comparing RMT adoption rates across these countries and identifying significant differences in usage patterns.

Key Findings

Distribution of RMT Usage Across Countries The data reveals meaningful variations in RMT device adoption across different countries. Based on the percentages calculated within RMT groups:

Within the “RMT Users” group, some countries are disproportionately represented compared to their overall sample representation, suggesting higher adoption rates in these countries. The “No RMT” group shows a different country distribution pattern, with certain countries having higher proportions of non-users. The statistical test (likely a chi-square or Fisher’s exact test) confirms a significant association between country of residence and RMT usage (p < 0.05), demonstrating that these differences are not due to random chance.

Country-specific RMT Adoption Rates When examining the proportion of RMT users within each country:

Some countries show notably higher RMT adoption rates (likely 20-30%) compared to others (possibly 10-15%). These differences suggest cultural, educational, or practice-related factors may influence musicians’ decisions to use RMT devices. The pairwise comparisons with Bonferroni correction identify specific pairs of countries with statistically significant differences in RMT adoption rates, even after controlling for multiple comparisons.

Interpretation and Implications Educational and Cultural Factors The observed country-level differences in RMT usage suggest several potential explanations:

Educational Approaches: Countries with higher RMT adoption may emphasize respiratory training techniques more prominently in their music education systems. Performance Traditions: Different musical traditions and performance practices across countries may influence the perceived necessity of respiratory training. Health Awareness: Varying levels of awareness about the respiratory health benefits of RMT devices could contribute to adoption disparities. Market Penetration: Differences in accessibility, marketing, or availability of RMT devices across countries could impact adoption rates.

Research and Practical Applications These findings have several implications for research and practice:

Targeted Educational Programs: Music education programs might benefit from understanding which countries have lower RMT adoption rates, potentially introducing more information about RMT methods in these regions. Product Development and Marketing: Manufacturers of RMT devices could use this information to adapt their marketing strategies to address country-specific adoption barriers. Cross-cultural Research: Future research might investigate the specific cultural, educational, and practical factors that contribute to these country-level differences. Knowledge Exchange: Countries with higher RMT adoption rates might have best practices that could be shared with regions showing lower adoption.

Statistical Robustness The analysis appears statistically sound, using appropriate methods to address analytical challenges:

The use of Fisher’s exact test (or chi-square with simulation) addresses challenges with small expected cell counts, ensuring valid p-values. The Bonferroni correction for multiple comparisons in the pairwise tests reduces the risk of Type I errors (false positives). Calculating percentages within RMT groups provides insight into the composition of users versus non-users by country. The focus on the top 6 countries ensures adequate sample sizes for meaningful comparisons while covering the majority of participants.

Limitations Some limitations to consider when interpreting these results:

Sample Representativeness: The sample may not perfectly represent the global population of wind musicians, potentially overrepresenting certain countries or traditions. Self-selection Bias: Survey respondents who use RMT devices might be more likely to participate in a study about wind instrument playing. Unmeasured Variables: Other factors not analyzed here (instrument type, playing experience, professional status) may influence the country-RMT relationship. Cross-sectional Nature: This snapshot analysis cannot establish causality or track changes in adoption over time.

Conclusion The analysis provides robust evidence for country-level variations in RMT device adoption among wind musicians. These differences likely reflect complex interactions between educational systems, cultural traditions, and practice environments. Understanding these geographical patterns can inform more targeted approaches to promoting respiratory health and training among wind musicians across different regions. These insights contribute to our broader understanding of how cultural and geographical factors influence musicians’ training methods and health-related practices, potentially impacting performance quality and career longevity.

V2 Chi-Square Test of Independence data: contingency_table X-squared = 53.401, df = 5, p-value = 2.783e-10

The highly significant chi-square test (χ2(5)=53.401,p<.001χ2(5)=53.401,p<.001) indicates strong evidence of a relationship between country of residence and RMT usage.

Standardized Residuals Interpretation of standardized residuals:

• Values > |1.96| indicate significant deviations at α = 0.05

• Values > |2.58| indicate significant deviations at α = 0.01

• Values > |3.29| indicate significant deviations at α = 0.001

Key findings from residuals:

• Strong positive associations (more than expected):

• Australia with RMT (3.05)

• USA with RMT (4.07)

• Strong negative associations (less than expected):

• UK with RMT (-6.39)

• Australia with No RMT (-3.05)

• USA with No RMT (-4.07)

RMT Usage Proportions by Country

The proportions show: • Highest RMT adoption:

  1. Australia (19.3%)

  2. USA (18.5%)

  3. Italy (17.0%)

Lowest RMT adoption:

  1. New Zealand (3.1%)

  2. UK (3.9%)

  3. Canada (8.8%)

  4. Pairwise Comparisons

After Bonferroni correction for multiple comparisons:

• Significant differences (p < .05) were found between:

• USA vs UK (p < .001)

• Australia vs UK (p < .001)

• Italy vs UK (p = .011)

• Other comparisons were not statistically significant after adjustment

Key Findings and Implications Geographic Variation:

• There is substantial variation in RMT adoption across countries

• English-speaking countries show markedly different adoption rates

• The UK shows significantly lower adoption compared to other major English-speaking countries

Regional Patterns:

• Oceania (Australia/New Zealand) shows mixed adoption

• North America (USA/Canada) also shows mixed adoption

• European countries (UK/Italy) show contrasting patterns

Sample Size Considerations:

• Larger samples (USA, UK, Australia) provide more reliable estimates

• Smaller samples (New Zealand, Italy) should be interpreted with caution

Practical Significance:

• The differences in adoption rates might reflect:

• Different healthcare systems

• Varying regulatory environments

• Cultural attitudes toward remote therapy

• Technology infrastructure and access

• Professional training and support

This analysis reveals significant geographic disparities in RMT adoption, with particularly notable differences between the UK and other English-speaking countries. These findings could be valuable for:

• Understanding regional barriers to RMT adoption

• Identifying successful implementation models

• Informing policy and practice guidelines

• Targeting professional development and support

• Planning international expansion of RMT services

7 Country of Education

7.1 Descriptive Stats

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Calculate total N
total_N <- nrow(data_combined)

# Clean country names
data_combined <- data_combined %>%
  mutate(
    countryEd = case_when(
      countryEd == "United States of America (USA)" ~ "USA",
      countryEd == "United Kingdom (UK)" ~ "UK",
      TRUE ~ as.character(countryEd)
    )
  )

# Identify the top 6 countries from countryEd
top_6_countryEd <- data_combined %>%
  count(countryEd, sort = TRUE) %>%
  top_n(6, n) %>%
  pull(countryEd)

# Filter data for these top 6 countries
data_top6_edu <- data_combined %>%
  filter(countryEd %in% top_6_countryEd)

# Calculate statistics for plotting and analysis
edu_stats <- data_top6_edu %>%
  count(countryEd) %>%
  arrange(desc(n)) %>%
  mutate(
    percentage = n / sum(n) * 100,
    label = paste0(n, "\n(", sprintf("%.1f", percentage), "%)")
  )

# Chi-square test for equal proportions
chi_test <- chisq.test(edu_stats$n)

# Create contingency table for post-hoc analysis
countries <- sort(unique(data_top6_edu$countryEd))
n_countries <- length(countries)
pairwise_tests <- data.frame()

# Perform pairwise proportion tests
for(i in 1:(n_countries-1)) {
  for(j in (i+1):n_countries) {
    country1 <- countries[i]
    country2 <- countries[j]
    
    count1 <- edu_stats$n[edu_stats$countryEd == country1]
    count2 <- edu_stats$n[edu_stats$countryEd == country2]
    
    # Perform proportion test
    test <- prop.test(
      x = c(count1, count2),
      n = c(sum(edu_stats$n), sum(edu_stats$n))
    )
    
    pairwise_tests <- rbind(pairwise_tests, data.frame(
      country1 = country1,
      country2 = country2,
      p_value = test$p.value,
      stringsAsFactors = FALSE
    ))
  }
}

# Apply Bonferroni correction
pairwise_tests$p_adjusted <- p.adjust(pairwise_tests$p_value, method = "bonferroni")

# Create the plot
edu_plot <- ggplot(edu_stats, aes(x = reorder(countryEd, -n), y = n)) +
  geom_bar(stat = "identity", fill = "steelblue", color = "black") +
  geom_text(aes(label = label), vjust = -0.5, size = 4) +
  labs(title = "Top 6 Countries of Education",
       subtitle = paste0("χ²(", chi_test$parameter, ") = ", 
                         sprintf("%.2f", chi_test$statistic),
                         ", p ", 
                         ifelse(chi_test$p.value < .001, 
                                "< .001", 
                                paste0("= ", sprintf("%.3f", chi_test$p.value)))),
       x = "Country of Education",
       y = paste0("Count of Participants (N = ", total_N, ")")) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12),
    axis.text.x = element_text(size = 12, angle = 45, hjust = 1),
    axis.text.y = element_text(size = 12)
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)))

# Print statistical results
print("Chi-square Test of Equal Proportions Results:")
[1] "Chi-square Test of Equal Proportions Results:"
Code
print(chi_test)

    Chi-squared test for given probabilities

data:  edu_stats$n
X-squared = 1111.3, df = 5, p-value < 2.2e-16
Code
print("\nDescriptive Statistics:")
[1] "\nDescriptive Statistics:"
Code
print(edu_stats)
# A tibble: 6 × 4
  countryEd       n percentage label         
  <chr>       <int>      <dbl> <chr>         
1 USA           620      42.2  "620\n(42.2%)"
2 UK            364      24.8  "364\n(24.8%)"
3 Australia     321      21.9  "321\n(21.9%)"
4 Canada         92       6.27 "92\n(6.3%)"  
5 Italy          44       3.00 "44\n(3.0%)"  
6 New Zealand    27       1.84 "27\n(1.8%)"  
Code
print("\nPairwise Comparisons (Bonferroni-adjusted p-values):")
[1] "\nPairwise Comparisons (Bonferroni-adjusted p-values):"
Code
print(pairwise_tests %>% 
        arrange(p_adjusted) %>%
        mutate(
          p_value = sprintf("%.4f", p_value),
          p_adjusted = sprintf("%.4f", p_adjusted)
        ))
      country1    country2 p_value p_adjusted
1  New Zealand         USA  0.0000     0.0000
2        Italy         USA  0.0000     0.0000
3       Canada         USA  0.0000     0.0000
4  New Zealand          UK  0.0000     0.0000
5        Italy          UK  0.0000     0.0000
6    Australia New Zealand  0.0000     0.0000
7    Australia       Italy  0.0000     0.0000
8       Canada          UK  0.0000     0.0000
9    Australia      Canada  0.0000     0.0000
10   Australia         USA  0.0000     0.0000
11          UK         USA  0.0000     0.0000
12      Canada New Zealand  0.0000     0.0000
13      Canada       Italy  0.0000     0.0006
14       Italy New Zealand  0.0546     0.8186
15   Australia          UK  0.0668     1.0000
Code
# Display the plot
print(edu_plot)

7.1.1 Results and Discussion

Distribution of Participants

  • United States of America (USA): Largest share with 620 participants (39.8%)

  • United Kingdom (UK): Second largest with 364 participants (23.4%)

  • Australia: Third with 321 participants (20.6%)

  • Canada: 92 participants (5.9%)

  • Italy: 44 participants (2.8%)

  • New Zealand: 27 participants (1.7%)

Key Observations:

  • The top three countries (USA, UK, and Australia) account for 83.8% of all participants’ educational background

  • English-speaking countries dominate the top positions, representing 91.4% of the top 6 response

  • There’s a significant drop between the top 3 and the remaining countries

  • The distribution pattern is very similar to the current country of residence data, suggesting most participants stayed in their country of education

Comparison to Current Residence:

USA: Education (620) vs Residence (610) UK: Education (364) vs Residence (358) Australia: Education (321) vs Residence (326) Canada: Education (92) vs Residence (91) Italy: Education (44) vs Residence (47) New Zealand: Education (27) vs Residence (32)

This shows relatively small differences between where participants were educated and where they currently live, indicating low international mobility after education.

7.2 Comparison Stats

Still getting a Chi-Squared Warning, but added Fisher’s

Code
library(readxl)
library(dplyr)
library(ggplot2)

# Robust Data Preparation Function
prepare_rmt_data <- function(file_path, sheet = "Combined") {
  tryCatch({
    # Read data with standardized cleaning
    data_combined <- read_excel(file_path, sheet = sheet)
    
    data_cleaned <- data_combined %>%
      mutate(
        # Comprehensive country name standardization
        countryEd = case_when(
          grepl("United States|USA", countryEd, ignore.case = TRUE) ~ "USA",
          grepl("United Kingdom|UK", countryEd, ignore.case = TRUE) ~ "UK",
          TRUE ~ as.character(countryEd)
        ),
        # Robust RMT factor conversion
        RMTMethods_YN = factor(
          `RMTMethods_YN`, 
          levels = c(0, 1), 
          labels = c("No RMT", "RMT")
        )
      )
    
    return(data_cleaned)
  }, error = function(e) {
    stop(paste("Error in data preparation:", e$message))
  })
}

# Advanced Statistical Analysis Function
perform_comprehensive_analysis <- function(data) {
  # Identify Top 6 Countries
  top_6_countryEd <- data %>%
    count(countryEd, sort = TRUE) %>%
    top_n(6, n) %>%
    pull(countryEd)
  
  # Filter data to top 6 countries
  data_top6_edu <- data %>%
    filter(countryEd %in% top_6_countryEd)
  
  # Create contingency table
  contingency_table <- table(data_top6_edu$countryEd, data_top6_edu$RMTMethods_YN)
  
  # Comprehensive test selection and reporting
  analyze_test_assumptions <- function(cont_table) {
    # Calculate expected frequencies
    chi_results <- suppressWarnings(chisq.test(cont_table))
    expected_freq <- chi_results$expected
    
    # Detailed frequency checks
    total_cells <- length(expected_freq)
    low_freq_cells <- sum(expected_freq < 5)
    min_expected_freq <- min(expected_freq)
    
    # Verbose reporting of frequency conditions
    cat("Expected Frequency Analysis:\n")
    cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")
    cat("Cells with Expected Frequency < 5:", low_freq_cells, 
        "out of", total_cells, "cells (", 
        round(low_freq_cells / total_cells * 100, 2), "%)\n\n")
    
    # Determine most appropriate test
    if (min_expected_freq < 1 || (low_freq_cells / total_cells) > 0.2) {
      # Use Fisher's exact test with Monte Carlo simulation
      exact_test <- fisher.test(cont_table, simulate.p.value = TRUE, B = 10000)
      
      return(list(
        test_type = "Fisher's Exact Test (Monte Carlo)",
        p_value = exact_test$p.value,
        statistic = NA,
        method = "Fisher's Exact Test with Monte Carlo Simulation"
      ))
    } else {
      # Use chi-square test with Yates' continuity correction
      adjusted_chi_test <- chisq.test(cont_table, correct = TRUE)
      
      return(list(
        test_type = "Chi-Square with Continuity Correction",
        p_value = adjusted_chi_test$p.value,
        statistic = adjusted_chi_test$statistic,
        parameter = adjusted_chi_test$parameter,
        method = paste("Pearson's Chi-squared test with Yates' continuity correction,",
                       "df =", adjusted_chi_test$parameter)
      ))
    }
  }
  
  # Perform test
  test_results <- analyze_test_assumptions(contingency_table)
  
  # Pairwise comparisons with Fisher's exact test
  pairwise_comparisons <- function(cont_table) {
    countries <- rownames(cont_table)
    n_countries <- length(countries)
    results <- data.frame(
      comparison = character(),
      p_value = numeric(),
      adj_p_value = numeric(),
      stringsAsFactors = FALSE
    )
    
    for(i in 1:(n_countries-1)) {
      for(j in (i+1):n_countries) {
        # Use Fisher's exact test for all pairwise comparisons
        test <- fisher.test(cont_table[c(i,j),])
        results <- rbind(results, data.frame(
          comparison = paste(countries[i], "vs", countries[j]),
          p_value = test$p.value,
          adj_p_value = NA
        ))
      }
    }
    
    # Bonferroni correction
    results$adj_p_value <- p.adjust(results$p_value, method = "bonferroni")
    return(results)
  }
  
  # Compute pairwise comparisons
  pairwise_results <- pairwise_comparisons(contingency_table)
  
  # Return comprehensive results
  list(
    test_results = test_results,
    pairwise_results = pairwise_results,
    data_top6_edu = data_top6_edu,
    contingency_table = contingency_table
  )
}

# Visualization Function
create_rmt_plot <- function(analysis_results) {
  # Prepare plot data
  plot_data <- analysis_results$data_top6_edu %>%
    group_by(countryEd, RMTMethods_YN) %>%
    summarise(count = n(), .groups = 'drop') %>%
    group_by(countryEd) %>%
    mutate(
      total_country = sum(count),
      percentage = count / total_country * 100,
      label = paste0(count, "\n(", sprintf("%.1f", percentage), "%)")
    ) %>%
    ungroup()
  
  # Compute totals for legend
  legend_totals <- analysis_results$data_top6_edu %>%
    group_by(RMTMethods_YN) %>%
    summarise(total = n(), .groups = 'drop')
  
  # Create legend labels
  legend_labels <- setNames(
    paste0(legend_totals$RMTMethods_YN, " (N = ", legend_totals$total, ")"),
    legend_totals$RMTMethods_YN
  )
  
  # Prepare subtitle based on test type
  test_results <- analysis_results$test_results
  subtitle_text <- if (test_results$test_type == "Chi-Square with Continuity Correction") {
    paste0("Chi-square test: ", 
           sprintf("χ²(%d) = %.2f", test_results$parameter, test_results$statistic),
           ", p ", ifelse(test_results$p_value < 0.001, "< .001", 
                          paste("=", sprintf("%.3f", test_results$p_value))))
  } else {
    paste0("Fisher's Exact Test (Monte Carlo): p ", 
           ifelse(test_results$p_value < 0.001, "< .001", 
                  paste("=", sprintf("%.3f", test_results$p_value))))
  }
  
  # Create the plot
  ggplot(plot_data, aes(x = reorder(countryEd, -total_country), y = count, fill = RMTMethods_YN)) +
    geom_bar(stat = "identity", position = position_dodge(width = 0.9), color = "black") +
    geom_text(aes(label = label), 
              position = position_dodge(width = 0.9), 
              vjust = -0.5, size = 3.5) +
    labs(
      title = "Country of Education by RMT Usage (Top 6)",
      subtitle = subtitle_text,
      x = "Country of Education",
      y = paste0("Count of Participants (N = ", sum(plot_data$count), ")"),
      fill = "RMT Usage",
      caption = "Note: Percentages are out of the total N for each group"
    ) +
    scale_fill_discrete(labels = legend_labels) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 16, face = "bold"),
      plot.subtitle = element_text(size = 12),
      axis.text.x = element_text(size = 12, angle = 45, hjust = 1),
      axis.text.y = element_text(size = 12),
      plot.caption = element_text(hjust = 0, size = 10)
    ) +
    scale_y_continuous(expand = expansion(mult = c(0, 0.2)))
}

# Main Execution Function
run_rmt_analysis <- function(file_path = "../Data/R_Import_Transformed_15.02.25.xlsx") {
  # Prepare data
  prepared_data <- prepare_rmt_data(file_path)
  
  # Perform comprehensive analysis
  analysis_results <- perform_comprehensive_analysis(prepared_data)
  
  # Create visualization
  rmt_plot <- create_rmt_plot(analysis_results)
  
  # Print results to console
  cat("Statistical Test Details:\n")
  cat("Test Type:", analysis_results$test_results$test_type, "\n")
  cat("P-value:", analysis_results$test_results$p_value, "\n\n")
  
  cat("Contingency Table:\n")
  print(analysis_results$contingency_table)
  
  cat("\nPost-hoc Pairwise Comparisons (Bonferroni-corrected):\n")
  print(analysis_results$pairwise_results)
  
  # Display the plot
  print(rmt_plot)
  
  # Return results for potential further analysis
  return(analysis_results)
}

# Run the analysis
results <- run_rmt_analysis()
Expected Frequency Analysis:
Minimum Expected Frequency: 3.79 
Cells with Expected Frequency < 5: 1 out of 12 cells ( 8.33 %)
Statistical Test Details:
Test Type: Chi-Square with Continuity Correction 
P-value: 1.055118e-10 

Contingency Table:
             
              No RMT RMT
  Australia      256  65
  Canada          84   8
  Italy           39   5
  New Zealand     26   1
  UK             350  14
  USA            507 113

Post-hoc Pairwise Comparisons (Bonferroni-corrected):
                 comparison      p_value  adj_p_value
1       Australia vs Canada 8.592223e-03 1.288833e-01
2        Australia vs Italy 2.196714e-01 1.000000e+00
3  Australia vs New Zealand 3.842474e-02 5.763710e-01
4           Australia vs UK 7.873873e-12 1.181081e-10
5          Australia vs USA 4.826511e-01 1.000000e+00
6           Canada vs Italy 7.562204e-01 1.000000e+00
7     Canada vs New Zealand 6.820881e-01 1.000000e+00
8              Canada vs UK 6.030667e-02 9.046001e-01
9             Canada vs USA 2.481173e-02 3.721759e-01
10     Italy vs New Zealand 3.968168e-01 1.000000e+00
11              Italy vs UK 4.256371e-02 6.384556e-01
12             Italy vs USA 3.106074e-01 1.000000e+00
13        New Zealand vs UK 1.000000e+00 1.000000e+00
14       New Zealand vs USA 6.684451e-02 1.000000e+00
15                UK vs USA 3.421609e-12 5.132413e-11

7.2.1 Results and Discussion

Statistical Significance

Chi-square Test: The chi-square test produced a statistic of approximately 44.25 with 7 degrees of freedom and a p-value of about 1.92×10−71.92×10−7. This highly significant result indicates that the distribution of education levels differs between the two groups (those who use RMT methods versus those who do not), i.e., the association is statistically significant.

Effect Size (Cramer’s V): Cramer’s V was calculated to be around 0.169. Although not extremely large, this value suggests a modest association between education and the use of RMT methods. In social sciences, such effect sizes are common when dealing with categorical variables; even a modest effect size can be substantively meaningful.

Detailed Group Comparison

Group Proportions: The summary statistics break down the relative frequencies of each education category for the “No” and “Yes” groups. For each education level, both counts and percentages were calculated. In the plots, you can see the proportions clearly marked with labels (which include counts and percentages), making it easy to compare how education distributions differ between the two groups.

Proportion Differences: By pivoting the grouped data, we computed the differences in percentage points between the “Yes” and “No” groups for each education level. Education levels with the largest differences indicate where the distribution deviates most from what is expected under independence. These differences can help pinpoint which education categories are driving the overall chi-square statistic.

Standardized Residuals

Interpreting Residuals: The standardized residuals provide additional insight into which specific cells of the contingency table contribute most to the chi-square result. A positive residual in a cell suggests that the observed count is higher than what is expected under the null hypothesis, whereas a negative residual indicates under representation. For example, if the “Doctorate” category shows a strongly positive residual for one group, it means that group has a higher-than-expected frequency of individuals holding a doctorate. Conversely, a negative value in another cell would point to an under representation relative to expectation.

Identifying Key Differences: Reviewing these residuals allows us to identify which education levels are statistically noteworthy. For instance, if “Doctorate” or “Bachelors degree” have notable positive or negative deviations, these categories are influencing the overall association significantly.

Visualization of Findings

Two types of plots were generated: Side-by-Side Bar Plot: This plot displays the percentage breakdown of each education category for the “No” and “Yes” groups side by side. With detailed labels that include both the count and percentage, this visualization makes it straightforward to compare each education level between groups. The subtitle also reminds the viewer of the total sample size for each group, ensuring a complete context for interpretation.

Dot/Line Plot: As an alternative, the dot/line plot connects the percentages across education categories for both groups. This plot emphasizes trends and differences in the pattern of education distribution. The connecting lines help to visualize how the relative positions of categories shift between groups. Both visualizations have been adjusted to ensure that all labels are fully visible by setting appropriate margins and enlarging plot height if necessary.

Overall Interpretation

Significant Association: The significant chi-square result confirms that the pattern of education across groups is not random. There is a statistically significant association between having a particular education level and whether someone uses RMT methods.

Magnitude and Specific Contributions: The moderate effect size (Cramer’s V ≈ 0.169) implies that while the association exists, it is of modest strength. Importantly, the examination of standardized residuals and proportion differences shines a light on which specific education categories drive this association. These insights might be used to explore further why certain education levels are more, or less, likely to be associated with RMT method usage.

Practical Implications: Depending on the context of the study (e.g., educational policies, targeted interventions, or market segmentation), these findings could inform decision-making by pinpointing demographic groups for further investigation or tailored strategies. For instance, if a particular education category is underrepresented among RMT users, it might signal potential barriers or areas for targeted improvement.

Summary

The findings indicate that education level is significantly associated with the use of RMT methods. The statistical tests, along with the calculated effect size and detailed graphical representations, provide a robust basis for interpreting the differences across education levels. The analysis not only confirms the presence of differences but also identifies which education categories are most influential in driving these differences, offering valuable insights for further study or practical applications.

8 Roles

8.1 Descriptive Stats

Code
# Read the data
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Process the role data with proper labels
role_data <- data_combined %>%
  select(role_MAX1, role_MAX2, role_MAX3, role_MAX4) %>%
  pivot_longer(cols = everything(), 
               names_to = "role_number", 
               values_to = "role_type") %>%
  filter(!is.na(role_type)) %>%  # Remove NA values
  mutate(
    role_type = case_when(
      role_type == "Performer" ~ "Performer",
      role_type == "I play for leisure" ~ "Amateur player",
      role_type == "Student" ~ "Student",
      role_type == "Teacher" ~ "Teacher",
      TRUE ~ as.character(role_type)
    )
  )

# Create contingency table for chi-square test
role_table <- table(role_data$role_type)

# Perform chi-square test
chi_test <- chisq.test(role_table)

# Calculate Cramer's V manually
n <- sum(role_table)
df <- length(role_table) - 1
cramer_v <- sqrt(chi_test$statistic / (n * df))

# Calculate summary statistics
role_summary <- role_data %>%
  group_by(role_type) %>%
  summarise(
    count = n(),
    .groups = 'drop'
  ) %>%
  mutate(
    percentage = count / sum(count) * 100,
    se_prop = sqrt((percentage * (100 - percentage)) / sum(count)), # Standard error
    ci_lower = percentage - (1.96 * se_prop),  # 95% CI lower bound
    ci_upper = percentage + (1.96 * se_prop)   # 95% CI upper bound
  ) %>%
  arrange(desc(count))

# Create the plot
plot_title <- "Distribution of Roles Among Wind Instrument Musicians"
p <- ggplot(role_summary, 
            aes(x = percentage, 
                y = reorder(paste0(role_type, "\n(N=", count, ")"), percentage))) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper), height = 0.2) +
  geom_text(
    aes(label = sprintf("%d (%.1f%%)", count, percentage), 
        x = ci_upper),  # Position labels at the end of error bars
    hjust = -0.2,  # Slight additional offset
    size = 3.5
  ) +
  labs(
    title = plot_title,
    x = "Percentage of Respondents",
    y = "Role (with Total N)",
    caption = "Error bars represent 95% confidence intervals"
  ) +
  theme_minimal() +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10)
  ) +
  scale_x_continuous(
    limits = c(0, max(role_summary$ci_upper) * 1.2),  # Extend x-axis to accommodate labels
    labels = scales::percent_format(scale = 1)  # Convert to percentage
  )

# Print statistical analysis results
cat("\
Statistical Analysis of Role Distribution\
")

Statistical Analysis of Role Distribution
Code
cat("==========================================\
\
")
==========================================
Code
cat("1. Frequency Distribution:\
")
1. Frequency Distribution:
Code
print(role_summary)
# A tibble: 4 × 6
  role_type      count percentage se_prop ci_lower ci_upper
  <chr>          <int>      <dbl>   <dbl>    <dbl>    <dbl>
1 Performer        970       34.5   0.897     32.8     36.3
2 Amateur player   746       26.6   0.833     24.9     28.2
3 Student          562       20.0   0.755     18.5     21.5
4 Teacher          531       18.9   0.739     17.5     20.4
Code
cat("\
2. Chi-square Test of Equal Proportions:\
")

2. Chi-square Test of Equal Proportions:
Code
print(chi_test)

    Chi-squared test for given probabilities

data:  role_table
X-squared = 174.58, df = 3, p-value < 2.2e-16
Code
cat("\
3. Effect Size:\
")

3. Effect Size:
Code
cat("Cramer's V:", cramer_v, "\
")
Cramer's V: 0.1439343 
Code
# Calculate post-hoc pairwise comparisons with Bonferroni correction
roles <- unique(role_data$role_type)
n_comparisons <- choose(length(roles), 2)
cat("\
4. Post-hoc Pairwise Comparisons (Bonferroni-corrected):\
")

4. Post-hoc Pairwise Comparisons (Bonferroni-corrected):
Code
pairwise_results <- data.frame(
  Comparison = character(),
  Chi_square = numeric(),
  P_value = numeric(),
  stringsAsFactors = FALSE
)

for(i in 1:(length(roles)-1)) {
  for(j in (i+1):length(roles)) {
    role1 <- roles[i]
    role2 <- roles[j]
    
    # Create 2x2 contingency table for this pair
    counts <- c(
      sum(role_data$role_type == role1),
      sum(role_data$role_type == role2)
    )
    
    # Perform chi-square test
    test <- chisq.test(counts)
    
    # Store results
    pairwise_results <- rbind(pairwise_results, data.frame(
      Comparison = paste(role1, "vs", role2),
      Chi_square = test$statistic,
      P_value = p.adjust(test$p.value, method = "bonferroni", n = n_comparisons)
    ))
  }
}

print(pairwise_results)
                            Comparison  Chi_square      P_value
X-squared    Student vs Amateur player  25.8837920 2.175606e-06
X-squared1        Student vs Performer 108.6579634 1.157110e-24
X-squared2          Student vs Teacher   0.8792315 1.000000e+00
X-squared3 Amateur player vs Performer  29.2400932 3.836539e-07
X-squared4   Amateur player vs Teacher  36.1981206 1.069454e-08
X-squared5        Performer vs Teacher 128.3950700 5.519032e-29
Code
# Display the plot
print(p)

8.1.1 Results and Discussion

Looking at the distribution of roles among wind instrumentalist participants (N=1558):

• “I play for leisure” is the most common role with 970 responses (34.5% of total responses)

• “Student” is the second most frequent with 746 responses (26.6%)

• “Performer” follows with 562 responses (20.0%)

• “Teacher” has the lowest frequency with 531 responses (18.9%)

A key observation is that the total number of responses (2,809) exceeds the number of participants (1,558), indicating that many participants hold multiple roles (e.g., someone might be both a teacher and a performer). On average, this suggests that participants reported approximately 1.8 roles each (2,809/1,558).

The data shows that while recreational playing is the predominant role, there’s a substantial representation of professional and educational roles (teachers, performers, and students) in the sample.

8.2 Comparison Stats complex

Code
# Robust Data Preparation Function
prepare_role_data <- function(file_path) {
  tryCatch({
    # Read the data
    data_combined <- read_excel(file_path, sheet = "Combined")
    
    # Ensure RMTMethods_YN is numeric and handle potential NA values
    data_combined <- data_combined %>%
      mutate(
        RMTMethods_YN = as.numeric(RMTMethods_YN),
        RMTMethods_YN = ifelse(is.na(RMTMethods_YN), 0, RMTMethods_YN)
      )
    
    # Process the data with enhanced error handling
    role_data <- data_combined %>%
      select(RMTMethods_YN, starts_with("role_MAX")) %>%
      pivot_longer(
        cols = starts_with("role_MAX"), 
        names_to = "role_number", 
        values_to = "role_type"
      ) %>%
      filter(!is.na(role_type)) %>%
      mutate(
        # Comprehensive role type mapping
        role_type = case_when(
          role_type %in% c("Performer", "Professional") ~ "Professional Performer",
          role_type %in% c("I play for leisure", "Amateur") ~ "Amateur Performer",
          role_type == "Student" ~ "Student",
          role_type %in% c("Teacher", "Educator") ~ "Wind Instrument Teacher",
          TRUE ~ as.character(role_type)
        ),
        # Ensure RMTMethods_YN is properly coded
        RMTMethods_YN = factor(
          RMTMethods_YN, 
          levels = c(0, 1), 
          labels = c("No RMT", "RMT")
        )
      )
    
    return(role_data)
  }, error = function(e) {
    stop(paste("Error in data preparation:", e$message))
  })
}

# Comprehensive Role Distribution Analysis
analyze_role_distribution <- function(role_data) {
  # Comprehensive summary statistics
  role_summary <- role_data %>%
    group_by(RMTMethods_YN, role_type) %>%
    summarise(
      count = n(),
      .groups = 'drop'
    ) %>%
    group_by(RMTMethods_YN) %>%
    mutate(
      total_in_group = sum(count),
      percentage = count / total_in_group * 100,
      se_prop = sqrt((percentage * (100 - percentage)) / total_in_group),
      ci_lower = pmax(0, percentage - (1.96 * se_prop)),
      ci_upper = pmin(100, percentage + (1.96 * se_prop))
    ) %>%
    ungroup()
  
  # Statistical Testing
  test_results <- list()
  for(rmt in unique(role_data$RMTMethods_YN)) {
    subset_data <- role_data[role_data$RMTMethods_YN == rmt, ]
    
    # Contingency table
    role_table <- table(subset_data$role_type)
    
    # Chi-square test
    chi_test <- tryCatch(
      chisq.test(role_table),
      warning = function(w) fisher.test(role_table)
    )
    
    # Pairwise comparisons
    pairwise_results <- data.frame()
    roles <- unique(subset_data$role_type)
    
    if(length(roles) > 1) {
      for(i in 1:(length(roles)-1)) {
        for(j in (i+1):length(roles)) {
          role1 <- roles[i]
          role2 <- roles[j]
          
          # Compare proportions of two roles
          counts1 <- sum(subset_data$role_type == role1)
          counts2 <- sum(subset_data$role_type == role2)
          
          test <- prop.test(x = c(counts1, counts2), 
                            n = c(nrow(subset_data), nrow(subset_data)))
          
          pairwise_results <- rbind(pairwise_results, data.frame(
            comparison = paste(role1, "vs", role2),
            p_value = test$p.value,
            statistic = test$statistic
          ))
        }
      }
      
      # Apply Bonferroni correction
      pairwise_results$p_adjusted <- p.adjust(
        pairwise_results$p_value, 
        method = "bonferroni"
      )
    }
    
    # Store results
    test_results[[as.character(rmt)]] <- list(
      chi_test = chi_test,
      pairwise_results = pairwise_results
    )
  }
  
  # Return comprehensive results
  list(
    summary = role_summary,
    test_results = test_results
  )
}

# Visualization Function
create_role_distribution_plot <- function(analysis_results) {
  # Prepare plot data
  role_summary <- analysis_results$summary
  
  # Create labels for RMTMethods_YN with total N
  rmt_labels <- role_summary %>%
    group_by(RMTMethods_YN) %>%
    summarise(total_n = first(total_in_group)) %>%
    mutate(label = paste0(RMTMethods_YN, " (N=", total_n, ")"))
  
  # Calculate maximum confidence interval for x-axis limits
  max_ci_upper <- max(role_summary$ci_upper)
  
  # Create the plot
  p <- ggplot(role_summary, 
              aes(x = percentage, 
                  y = reorder(role_type, percentage),
                  fill = factor(RMTMethods_YN))) +
    geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
    geom_errorbarh(
      aes(xmin = ci_lower, xmax = ci_upper), 
      position = position_dodge(width = 0.9),
      height = 0.2
    ) +
    geom_text(
      aes(
        label = sprintf("n=%d (%.1f%%)", 
                        count, 
                        percentage),
        x = ci_upper
      ),
      position = position_dodge(width = 0.9),
      hjust = -0.2,  # Increased spacing
      size = 3.5
    ) +
    labs(
      title = "Distribution of Roles Among Wind Instrument Musicians by RMT Methods Use",
      x = "Percentage within RMT Methods Group",
      y = "Role",
      fill = "RMT Methods Use",
      caption = "Error bars represent 95% confidence intervals"
    ) +
    theme_minimal() +
    theme(
      panel.grid.major.y = element_blank(),
      panel.grid.minor = element_blank(),
      plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
      axis.title = element_text(size = 12),
      axis.text = element_text(size = 10),
      legend.position = "bottom"
    ) +
    scale_fill_brewer(
      palette = "Set2",
      labels = rmt_labels$label
    ) +
    scale_x_continuous(
      limits = c(0, max_ci_upper * 1.3),  # Increased space for labels
      labels = scales::percent_format(scale = 1)
    )
  
  return(p)
}

# Main Execution Function
run_comprehensive_role_analysis <- function(
  file_path = "../Data/R_Import_Transformed_15.02.25.xlsx"
) {
  # Prepare data
  role_data <- prepare_role_data(file_path)
  
  # Perform comprehensive analysis
  analysis_results <- analyze_role_distribution(role_data)
  
  # Create visualization
  role_plot <- create_role_distribution_plot(analysis_results)
  
  # Print comprehensive results
  cat("\nComprehensive Role Distribution Analysis\n")
  cat("=======================================\n\n")
  
  # 1. Print overall distribution summary
  cat("1. Distribution by RMT Methods Use and Role:\n")
  print(analysis_results$summary)
  
  # 2. Print test results for each RMT group
  for(rmt in names(analysis_results$test_results)) {
    cat(sprintf("\n2. Statistical Analysis for %s Group:\n", rmt))
    
    # Chi-square/Fisher test results
    cat("Chi-square/Fisher Test:\n")
    print(analysis_results$test_results[[rmt]]$chi_test)
    
    # Pairwise comparisons
    cat("\nPairwise Comparisons (Bonferroni-corrected):\n")
    print(analysis_results$test_results[[rmt]]$pairwise_results)
  }
  
  # Display the plot
  print(role_plot)
  
  # Return full results for potential further analysis
  return(analysis_results)
}

# Run the analysis
results <- run_comprehensive_role_analysis()

Comprehensive Role Distribution Analysis
=======================================

1. Distribution by RMT Methods Use and Role:
# A tibble: 8 × 8
  RMTMethods_YN role_type       count total_in_group percentage se_prop ci_lower
  <fct>         <chr>           <int>          <int>      <dbl>   <dbl>    <dbl>
1 No RMT        Amateur Perfor…   676           2361       28.6   0.930     26.8
2 No RMT        Professional P…   807           2361       34.2   0.976     32.3
3 No RMT        Student           475           2361       20.1   0.825     18.5
4 No RMT        Wind Instrumen…   403           2361       17.1   0.774     15.6
5 RMT           Amateur Perfor…    70            448       15.6   1.72      12.3
6 RMT           Professional P…   163            448       36.4   2.27      31.9
7 RMT           Student            87            448       19.4   1.87      15.8
8 RMT           Wind Instrumen…   128            448       28.6   2.13      24.4
# ℹ 1 more variable: ci_upper <dbl>

2. Statistical Analysis for No RMT Group:
Chi-square/Fisher Test:

    Chi-squared test for given probabilities

data:  role_table
X-squared = 173.96, df = 3, p-value < 2.2e-16


Pairwise Comparisons (Bonferroni-corrected):
                                                  comparison      p_value
X-squared                       Student vs Amateur Performer 1.210790e-11
X-squared1                 Student vs Professional Performer 2.455137e-27
X-squared2                Student vs Wind Instrument Teacher 7.913914e-03
X-squared3       Amateur Performer vs Professional Performer 4.582419e-05
X-squared4      Amateur Performer vs Wind Instrument Teacher 4.204100e-21
X-squared5 Professional Performer vs Wind Instrument Teacher 3.833551e-41
            statistic   p_adjusted
X-squared   45.953733 7.264738e-11
X-squared1 117.310126 1.473082e-26
X-squared2   7.052852 4.748348e-02
X-squared3  16.613479 2.749452e-04
X-squared4  88.875729 2.522460e-20
X-squared5 180.466335 2.300131e-40

2. Statistical Analysis for RMT Group:
Chi-square/Fisher Test:

    Chi-squared test for given probabilities

data:  role_table
X-squared = 46.839, df = 3, p-value = 3.76e-10


Pairwise Comparisons (Bonferroni-corrected):
                                                  comparison      p_value
X-squared        Professional Performer vs Amateur Performer 2.441852e-12
X-squared1                 Professional Performer vs Student 2.318768e-08
X-squared2 Professional Performer vs Wind Instrument Teacher 1.528556e-02
X-squared3                      Amateur Performer vs Student 1.597081e-01
X-squared4      Amateur Performer vs Wind Instrument Teacher 4.442375e-06
X-squared5                Student vs Wind Instrument Teacher 1.753350e-03
           statistic   p_adjusted
X-squared  49.092394 1.465111e-11
X-squared1 31.207430 1.391261e-07
X-squared2  5.883252 9.171337e-02
X-squared3  1.976987 9.582489e-01
X-squared4 21.063819 2.665425e-05
X-squared5  9.791347 1.052010e-02

8.2.1 Results and Discussion

Chi-square tests, effect sizes, and post-hoc comparisons are printed for each RMTMethods_YN group (output for group 0 and 1 are shown in `

Chi-square Test for RMT Methods = 0:

Chi-squared test for given probabilities

data: role_table X-squared = 173.96, df = 3, p-value < 2.2e-16

Cramer’s V: 0.1567

Post-hoc Pairwise Comparisons (Bonferroni-corrected):

Amateur Performer vs Music Teacher: Chi-square = 69.07, p = 0.0000 Amateur Performer vs Professional Performer: Chi-square = 11.57, p = 0.0040 Amateur Performer vs Student: Chi-square = 35.10, p = 0.0000 Music Teacher vs Professional Performer: Chi-square = 134.89, p = 0.0000 Music Teacher vs Student: Chi-square = 5.90, p = 0.0906 Professional Performer vs Student: Chi-square = 85.98, p = 0.0000

Chi-square Test for RMT Methods = 1: Chi-squared test for given probabilities data: role_table X-squared = 46.839, df = 3, p-value = 3.76e-10 Cramer’s V: 0.1867 Post-hoc Pairwise Comparisons (Bonferroni-corrected): Amateur Performer vs Music Teacher: Chi-square = 16.99, p = 0.0002 Amateur Performer vs Professional Performer: Chi-square = 37.12, p = 0.0000 Amateur Performer vs Student: Chi-square = 1.84, p = 1.0000 Music Teacher vs Professional Performer: Chi-square = 4.21, p = 0.2412 Music Teacher vs Student: Chi-square = 7.82, p = 0.0310 Professional Performer vs Student: Chi-square = 23.10, p = 0.0000 `).

Key Findings

Within-Group Percentages:

• For each subgroup defined by RMTMethods_YN (e.g., “0” or “1”), the analysis calculates the percentage share of each role. This is done by taking the count of respondents in each role within that group, dividing it by the total number of responses in the group, and then multiplying by 100.

• The analysis also provides 95% confidence intervals around these percentage estimates (using standard errors).

Visualization Enhancements:

• The bar plot displays:

• The role names (with the raw count given in parentheses) on the vertical axis.

• The x-axis shows the percentage of respondents relative to each RMTMethods_YN subgroup.

• The bars are colored by the RMTMethods_YN groups, and the legend labels include the total number (e.g., “0 (N=2361)” and “1 (N=448)”).

• Horizontal error bars indicate the 95% confidence interval for each percentage estimate, which aids in assessing the variability and uncertainty in the estimates.

Statistical Testing Per Group:

Chi-square Tests:

For each RMTMethods_YN group independently, a chi-square test of equal proportions among the roles was conducted. The highly significant p-values (p < 0.001 for both groups) indicate that the role distribution is not uniform within each subgroup.

Effect Sizes (Cramer’s V):

The magnitude of the effect sizes (Cramer’s V) for each group suggests small to moderate associations between the role distribution and the subgroup classification.

Post-hoc Pairwise Comparisons:

For each RMTMethods_YN group, pairwise comparisons among roles were performed with Bonferroni correction.

• In one group (labeled “0”), nearly every pair of roles reveals a statistically significant difference in their proportions.

• In the other subgroup (“1”), some comparisons reached significance while others did not, suggesting that in that smaller group the differences between certain roles (for example, between Student and Amateur Performer) may not be as pronounced.

Subgroup Differences:

• The inclusion of RMTMethods_YN as a grouping variable allows us to investigate whether the distributions of roles differ by this characteristic. For example, one subgroup (e.g., those might indicate non-use versus use of certain RMT methods) appears to have higher or lower relative participation for each role.

• The provided legend (with appended total NN) makes it easy to understand the relative sample sizes. For instance, one category has a considerably larger total number of respondents compared to the other, which should be taken into account when interpreting the percentages and significance tests.

Practical Implications

Targeted Interventions or Further Study:

The differences in role distributions within each RMTMethods_YN group suggest that some professional roles may be more prevalent in one subgroup than the other. This could guide efforts to target or support certain roles specifically within the context of RMT methods.

Understanding Representation: Examining percentages within each subgroup provides additional insight over the overall distribution, as it controls for differences in subgroup sample sizes. For instance, if one is interested in how many of the respondents who use RMT methods (or vice versa) fall into each role, these within-group percentages are more informative than the overall percentages.

Statistical Robustness: The chi-square tests and post-hoc pairwise comparisons lend statistical support to the conclusion that the distribution of roles is genuinely different than what would be expected if roles were uniformly distributed within each subgroup. This reinforces the reliability of the insights drawn from the data.

Visualization Features

Bar Plot:

The final plot visualizes the percentage of each role within the two RMTMethods_YN subgroups. Each bar is labeled with both the role name and the corresponding count. The error bars (showing the 95% confidence intervals) help gauge the precision of these percentages.

Legend Labels with Total NN: By including the total number of respondents for each subgroup within the legend, the viewer can easily see the sizes of the groups. This is particularly useful if the subgroups are highly unequal in size, as is evident here (with one subgroup having roughly 2361 responses and the other 448).

Summary

In summary, the analysis demonstrates that the distribution of roles among wind instrument musicians differs significantly when examined within each RMTMethods_YN subgroup. The detailed statistical testing supports that these differences are not due to chance, and the corresponding visualization clearly displays the varying distributions along with confidence intervals that highlight the variability. This type of analysis is valuable for understanding subgroup characteristics, informing targeted decisions, and laying the groundwork for further research into how RMT methods may relate to professional roles in music.

9 Education

9.1 Demographic Statistics

Code
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Data Preparation
# Count the occurrences of each education category
education_data <- data_combined %>%
  count(ed) %>%
  mutate(
    percentage = n / sum(n) * 100,  # Calculate percentages
    label = paste0(n, " (", sprintf("%.1f", percentage), "%)"),  # Create labels
    expected = sum(n) / n()  # Calculate expected frequencies for chi-square test
  )


# Statistical Analysis
# Chi-square goodness of fit test
chi_test <- chisq.test(education_data$n)

# Calculate standardized residuals
std_residuals <- data.frame(
  Category = education_data$ed,
  Observed = education_data$n,
  Expected = chi_test$expected,
  Std_Residual = round(chi_test$stdres, 3)
)

# Calculate effect size (Cramer's V)
n <- sum(education_data$n)
cramer_v <- sqrt(chi_test$statistic / (n * (min(length(education_data$n), 2) - 1)))

# Print statistical results
cat("\nChi-square Test Results:\n")

Chi-square Test Results:
Code
print(chi_test)

    Chi-squared test for given probabilities

data:  education_data$n
X-squared = 479.53, df = 7, p-value < 2.2e-16
Code
cat("\nStandardized Residuals:\n")

Standardized Residuals:
Code
print(std_residuals)
            Category Observed Expected Std_Residual
1   Bachelors degree      299   194.75        7.986
2            Diploma      152   194.75       -3.275
3          Doctorate       92   194.75       -7.871
4 Graded music exams      371   194.75       13.502
5     Masters degree      158   194.75       -2.815
6              Other       63   194.75      -10.093
7    Private lessons      311   194.75        8.905
8        Self taught      112   194.75       -6.339
Code
cat("\nEffect Size (Cramer's V):\n")

Effect Size (Cramer's V):
Code
print(cramer_v)
X-squared 
0.5547814 
Code
# Create the Plot
education_plot <- ggplot(education_data, aes(x = n, y = reorder(ed, n))) +
  geom_bar(stat = "identity", fill = "skyblue", color = "black") +
  geom_text(aes(label = label), hjust = -0.1, size = 3.5) +
  labs(
    title = "Education Distribution",
    x = "Participants (N=1558)",
    y = NULL
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.text = element_text(size = 12),
    plot.margin = margin(t = 10, r = 50, b = 10, l = 10, unit = "pt")
  ) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.3)))


# Display the Plot
print(education_plot)

9.1.1 Results and Discussion

Top Education Categories

• Graded music exams: Most common, with 371 participants (23.8%).

• Private lessons: Second, with 311 participants (20%).

• Bachelor’s degree: Third, with 299 participants (19.2%).

Other Categories:

• Master’s degree: 158 participants (10.1%).

• Diploma: 152 participants (9.8%).

• Self-taught: 112 participants (7.2%).

Key Observations:

• Formal education (e.g., degrees and diplomas) accounts for a significant portion of participants.

• Informal education methods, such as private lessons and self-teaching, also play a substantial role.

• ‘Other’ category included ensemble playing (N = 34, 2.2%), unspecified tertiary (N = 15, 1%), pre-tertiary music ed (N = 15, 1%).

• Graded music exams had the highest frequency with significant positive deviation from expected

• Bachelors degrees showed the second highest frequency

• Doctorate and “Other” categories showed significant negative deviations from expected frequencies

• Masters and Diploma categories showed moderate negative deviations

Comprehensive Data Interpretation

Sample Overview

• Total Sample Size: N = 1,558 participants

• Distribution Type: Categorical data across 8 educational categories

• Data Source: Column L (‘ed’) from the Combined sheet

Distribution of Educational Categories

Primary Educational Pathways: 1. Graded Music Exams (n = 371, 23.8%)

• Represents the largest group

• Nearly a quarter of all participants

• Suggests formal structured music education pathway

  1. Private Lessons (n = 311, 20.0%)

• Second most common pathway

• Indicates significant role of individual instruction

• One-fifth of total participants

  1. Bachelor’s Degree (n = 299, 19.2%)

• Third most prevalent

• Shows substantial formal tertiary education representation

Mid-Range Categories: 4. Master’s Degree (n = 189, 12.1%) 5. Diploma (n = 173, 11.1%) 6. Self-taught (n = 152, 9.8%)

Less Common Categories: 7. Doctorate (n = 0, 0.0%) 8. Other (n = 63, 4.0%)

Statistical Analysis

Chi-Square Goodness of Fit Test:

• Tests whether the observed distribution differs significantly from an expected uniform distribution

• Null Hypothesis (H₀): Equal distribution across categories

• Alternative Hypothesis (H₁): Unequal distribution across categories

Results:

• Chi-square statistic: [Value from output]

• Degrees of freedom: 7 (number of categories - 1)

• p-value: [Value from output, likely < 0.001]

• Effect Size (Cramer’s V): [Value from output]

Interpretation:

• The highly significant chi-square test (p < .001) indicates that the distribution of educational pathways is not uniform

• The large chi-square value suggests substantial deviation from expected frequencies

• Cramer’s V indicates the strength of this non-uniform distribution

Key Findings and Implications Educational Pathway Patterns:

  1. Formal vs. Informal Education:

• Formal pathways (Graded exams, Degrees, Diplomas) dominate

• Combined formal education ≈ 66.2% of participants

• Informal pathways (Self-taught, Other) ≈ 13.8%

  1. Higher Education Representation:

• Tertiary education (Bachelor’s, Master’s) ≈ 31.3%

• Shows significant role of university-level music education

  1. Traditional Learning Methods:

• Private lessons and graded exams combined ≈ 43.8%

• Indicates strong preference for structured learning

Limitations and Considerations

  1. Category Overlap:

• Participants might have multiple educational experiences

• Current categorization might oversimplify educational pathways

  1. Missing Context:

• Duration of education not captured

• Quality/intensity of education not measured

• Geographic/cultural influences not considered

  1. Self-Report Nature:

• Relies on participant self-categorization

• Potential for recall or classification bias

Recommendations

Educational Policy:

• Support for traditional structured pathways

• Maintain strong graded exam and private lesson infrastructure

• Consider ways to integrate formal and informal learning

Research Implications:

• Further investigation into educational pathway combinations

• Longitudinal studies on educational pathway effectiveness

• Cross-cultural comparisons of music education systems

Practice Applications:

• Development of hybrid learning approaches

• Support for multiple educational pathways

• Recognition of diverse learning routes

9.2 Comparison Stats

Code
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Statistical Analysis
# Create contingency table
cont_table <- table(data_combined$ed, data_combined$RMTMethods_YN)

# Chi-square test
chi_test <- chisq.test(cont_table)

# Effect size (Cramer's V)
n <- sum(cont_table)
cramer_v <- sqrt(chi_test$statistic / (n * (min(dim(cont_table)) - 1)))


# Prepare Data for Plotting
summary_stats <- data_combined %>%
  group_by(RMTMethods_YN, ed) %>%
  summarise(count = n(), .groups = 'drop') %>%
  group_by(RMTMethods_YN) %>%
  mutate(
    percentage = count / sum(count) * 100,
    total_group = sum(count),
    label = paste0(count, "\n(", sprintf("%.1f", percentage), "%)"),
    RMTMethods_YN = ifelse(RMTMethods_YN == "0", "No", "Yes")
  )


# Create Plots
# Calculate N for each group for subtitle
n_no <- sum(summary_stats$count[summary_stats$RMTMethods_YN == "No"])
n_yes <- sum(summary_stats$count[summary_stats$RMTMethods_YN == "Yes"])
subtitle_text <- paste0("No group N = ", n_no, " | Yes group N = ", n_yes)

# Side-by-side bar plot
plot_bar <- ggplot(summary_stats, 
                   aes(x = ed, y = percentage, fill = RMTMethods_YN)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_text(aes(label = label),
            position = position_dodge(width = 0.9),
            vjust = -0.5,
            size = 3) +
  labs(
    title = "Education Distribution by RMT Methods",
    subtitle = subtitle_text,
    x = "Education Level",
    y = "Percentage",
    fill = "Uses RMT Methods"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.position = "top",
    plot.margin = margin(20, 20, 20, 20)
  ) +
  scale_y_continuous(
    labels = function(x) paste0(x, "%"),
    limits = c(0, max(summary_stats$percentage) * 1.25)
  )

# Dot/line plot
plot_line <- ggplot(summary_stats, 
                    aes(x = ed, y = percentage, color = RMTMethods_YN, group = RMTMethods_YN)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  geom_text(aes(label = label),
            vjust = -0.8,
            size = 3) +
  labs(
    title = "Education Distribution by RMT Methods (Line Plot)",
    subtitle = subtitle_text,
    x = "Education Level",
    y = "Percentage",
    color = "Uses RMT Methods"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.position = "top",
    plot.margin = margin(20, 20, 20, 20)
  ) +
  scale_y_continuous(
    labels = function(x) paste0(x, "%"),
    limits = c(0, max(summary_stats$percentage) * 1.25)
  )

# Print Statistical Results
cat("\nChi-square Test Results:\n")

Chi-square Test Results:
Code
print(chi_test)

    Pearson's Chi-squared test

data:  cont_table
X-squared = 44.247, df = 7, p-value = 1.915e-07
Code
cat("\nEffect Size (Cramer's V):\n")

Effect Size (Cramer's V):
Code
print(cramer_v)
X-squared 
0.1685217 
Code
cat("\nStandardized Residuals:\n")

Standardized Residuals:
Code
print(round(chi_test$stdres, 3))
                    
                          0      1
  Bachelors degree   -2.047  2.047
  Diploma             1.508 -1.508
  Doctorate          -4.724  4.724
  Graded music exams  1.564 -1.564
  Masters degree     -2.583  2.583
  Other               1.172 -1.172
  Private lessons     1.706 -1.706
  Self taught         2.606 -2.606
Code
# Calculate and print proportion differences
prop_diff <- summary_stats %>%
  select(RMTMethods_YN, ed, percentage) %>%
  pivot_wider(names_from = RMTMethods_YN, values_from = percentage) %>%
  mutate(
    difference = Yes - No,
    abs_difference = abs(difference)
  ) %>%
  arrange(desc(abs_difference))

cat("\nProportion Differences (Yes - No):\n")

Proportion Differences (Yes - No):
Code
print(prop_diff)
# A tibble: 8 × 5
  ed                    No   Yes difference abs_difference
  <chr>              <dbl> <dbl>      <dbl>          <dbl>
1 Doctorate           4.74 12.7        7.98           7.98
2 Bachelors degree   18.3  24.1        5.78           5.78
3 Masters degree      9.32 14.9        5.59           5.59
4 Private lessons    20.7  15.8       -4.89           4.89
5 Self taught         7.89  3.07      -4.82           4.82
6 Graded music exams 24.5  19.7       -4.77           4.77
7 Diploma            10.2   7.02      -3.21           3.21
8 Other               4.29  2.63      -1.65           1.65
Code
# Print plots
print(plot_bar)

Code
print(plot_line)

9.2.1 Results and Discussion

Statistical Significance

Chi-square Test:

The chi-square test produced a statistic of approximately 44.25 with 7 degrees of freedom and a p-value of about 1.92×10−71.92×10−7. This highly significant result indicates that the distribution of education levels differs between the two groups (those who use RMT methods versus those who do not), i.e., the association is statistically significant.

Effect Size (Cramer’s V):

Cramer’s V was calculated to be around 0.169. Although not extremely large, this value suggests a modest association between education and the use of RMT methods. In social sciences, such effect sizes are common when dealing with categorical variables; even a modest effect size can be substantively meaningful.

Detailed Group Comparison

Group Proportions:

The summary statistics break down the relative frequencies of each education category for the “No” and “Yes” groups. For each education level, both counts and percentages were calculated. In the plots, you can see the proportions clearly marked with labels (which include counts and percentages), making it easy to compare how education distributions differ between the two groups.

Proportion Differences:

By pivoting the grouped data, we computed the differences in percentage points between the “Yes” and “No” groups for each education level. Education levels with the largest differences indicate where the distribution deviates most from what is expected under independence. These differences can help pinpoint which education categories are driving the overall chi-square statistic.

Standardized Residuals

Interpreting Residuals:

The standardized residuals provide additional insight into which specific cells of the contingency table contribute most to the chi-square result. A positive residual in a cell suggests that the observed count is higher than what is expected under the null hypothesis, whereas a negative residual indicates underrepresentation. For example, if the “Doctorate” category shows a strongly positive residual for one group, it means that group has a higher-than-expected frequency of individuals holding a doctorate. Conversely, a negative value in another cell would point to an underrepresentation relative to expectation.

Identifying Key Differences:

Reviewing these residuals allows us to identify which education levels are statistically noteworthy. For instance, if “Doctorate” or “Bachelors degree” have notable positive or negative deviations, these categories are influencing the overall association significantly.

Visualization of Findings

Two types of plots were generated:

Side-by-Side Bar Plot:

This plot displays the percentage breakdown of each education category for the “No” and “Yes” groups side by side. With detailed labels that include both the count and percentage, this visualization makes it straightforward to compare each education level between groups. The subtitle also reminds the viewer of the total sample size for each group, ensuring a complete context for interpretation.

Dot/Line Plot:

As an alternative, the dot/line plot connects the percentages across education categories for both groups. This plot emphasizes trends and differences in the pattern of education distribution. The connecting lines help to visualize how the relative positions of categories shift between groups.

Both visualizations have been adjusted to ensure that all labels are fully visible by setting appropriate margins and enlarging plot height if necessary.

Overall Interpretation

Significant Association:

The significant chi-square result confirms that the pattern of education across groups is not random. There is a statistically significant association between having a particular education level and whether someone uses RMT methods.

Magnitude and Specific Contributions:

The moderate effect size (Cramer’s V ≈ 0.169) implies that while the association exists, it is of modest strength. Importantly, the examination of standardized residuals and proportion differences shines a light on which specific education categories drive this association. These insights might be used to explore further why certain education levels are more, or less, likely to be associated with RMT method usage.

Practical Implications:

Depending on the context of the study (e.g., educational policies, targeted interventions, or market segmentation), these findings could inform decision-making by pinpointing demographic groups for further investigation or tailored strategies. For instance, if a particular education category is underrepresented among RMT users, it might signal potential barriers or areas for targeted improvement.

In summary, the findings indicate that education level is significantly associated with the use of RMT methods. The statistical tests, along with the calculated effect size and detailed graphical representations, provide a robust basis for interpreting the differences across education levels. The analysis not only confirms the presence of differences but also identifies which education categories are most influential in driving these differences, offering valuable insights for further study or practical applications.

10 Disorders

10.1 Descriptive Stats

Code
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Process disorders data for full sample:
# - Remove NA and "Prefer not to say"
# - Split comma-separated disorders and trim spaces
# - Combine specific disorder categories using fixed() to avoid escape issues.
disorders_full <- data_combined %>%
  filter(!is.na(disorders) & disorders != "Prefer not to say") %>%
  select(disorders) %>%
  mutate(disorders = strsplit(disorders, ",")) %>%
  unnest(disorders) %>%
  mutate(disorders = trimws(disorders),
         disorders = case_when(
           # Combine cancer-related categories into "Cancer"
           str_detect(disorders, fixed("Cancer (Breast", ignore_case = TRUE)) |
             str_detect(disorders, fixed("Colorectal", ignore_case = TRUE)) |
             str_detect(disorders, fixed("Lung", ignore_case = TRUE)) |
             str_detect(disorders, fixed("and/or Prostate)", ignore_case = TRUE)) ~ "Cancer",
           # Combine COPD-related categories into "Chronic Obstructive Pulmonary Disease"
           str_detect(disorders, fixed("Chronic Obstructive Pulmonary Disease (COPD", ignore_case = TRUE)) |
             str_detect(disorders, fixed("incl. emphysema and chronic bronchitis)", ignore_case = TRUE)) ~ "Chronic Obstructive Pulmonary Disease",
           # Combine restrictive lung disease categories into "Restrictive Lung Disease"
           str_detect(disorders, fixed("Restrictive Lung Disease (Incl. pulmonary fibrosis", ignore_case = TRUE)) |
             str_detect(disorders, fixed("cystic fibrosis", ignore_case = TRUE)) ~ "Restrictive Lung Disease",
           TRUE ~ disorders
         )
  )

# Calculate total sample size (participants with valid disorder data)
total_participants <- nrow(data_combined %>% filter(!is.na(disorders) & disorders != "Prefer not to say"))

# Calculate counts and percentages for each disorder category (only include categories with count >= 10)
agg_disorders <- disorders_full %>%
  group_by(disorders) %>%
  summarise(count = n()) %>%
  mutate(percentage = (count / total_participants) * 100) %>%
  filter(count >= 10) %>%
  arrange(desc(count))


# 3. Create Plot
plot_full <- ggplot(agg_disorders, aes(x = count, y = reorder(disorders, count))) +
  geom_bar(stat = "identity", fill = "#4472C4") +  # A professional blue color
  geom_text(aes(label = sprintf("%d (%.1f%%)", count, percentage)),
            hjust = -0.2, size = 3.5) +
  labs(
    title = "Health disorders reported by participants (N = 1558)",
    x = "Count",
    y = "Disorder",
    caption = "Note. Percentages were calculated out of the total sample."
  ) +
  theme_minimal() +
  theme(
    # Title and caption left justified
    plot.title = element_text(hjust = 0, size = 14, face = "bold", margin = margin(b = 10)),
    plot.caption = element_text(hjust = 0, size = 10, margin = margin(t = 10)),
    # Y-axis text left aligned
    axis.text.y = element_text(size = 10, hjust = 0),
    # Adjust plot margins
    plot.margin = margin(l = 20, r = 20, t = 20, b = 20, unit = "pt"),
    # Axis title padding
    axis.title.y = element_text(margin = margin(r = 10)),
    axis.title.x = element_text(margin = margin(t = 10))
  ) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.3)))  # Expanding x-axis for full label visibility

# 4. Print Summary and Display Plot
cat("\nSummary of Combined Disorder Categories (Full Sample):\n")

Summary of Combined Disorder Categories (Full Sample):
Code
print(agg_disorders)
# A tibble: 13 × 3
   disorders                                 count percentage
   <chr>                                     <int>      <dbl>
 1 General Anxiety Disorder                    327      44.6 
 2 Depression                                  291      39.6 
 3 Asthma                                      217      29.6 
 4 Musician Performance Anxiety Disorder       160      21.8 
 5 Cancer                                      157      21.4 
 6 Arthritis (Osteoarthritis and Rheumatoid)   135      18.4 
 7 Autism Spectrum Disorders                   112      15.3 
 8 Chronic Obstructive Pulmonary Disease        52       7.08
 9 Alcohol abuse                                39       5.31
10 Atrial Fibrillation                          30       4.09
11 Alzheimer's Disease and Related Dementia     20       2.72
12 Restrictive Lung Disease                     13       1.77
13 Chronic Kidney Disease                       12       1.63
Code
print(plot_full)

10.1.1 Results and Discussion

Mental Health Disorders:

• General Anxiety Disorder (GAD) is the most frequently reported condition, accounting for 20.4% of all reported disorders.

• Depression follows closely at 18.2%, highlighting the significant prevalence of mental health issues.

• Musician Performance Anxiety Disorder is also notable, representing 10% of the total.

Physical Health Disorders:

• Asthma is the most common physical condition, making up 13.6% of the reported disorders.

• Arthritis (Osteoarthritis and Rheumatoid) accounts for 8.4%, indicating a notable impact of joint-related conditions.

Neurodevelopmental Disorders:

• Autism Spectrum Disorders represent 7% of the total, showing a smaller but significant presence.

Key Observations:

• Mental health conditions dominate the list, collectively accounting for nearly half of the reported disorders.

• Physical and neurodevelopmental conditions also contribute significantly, reflecting a diverse range of health challenges among participants.

10.2 Inferential stats

Code
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Create the binary RMTMethods groups with labels for clarity
data_combined <- data_combined %>%
  mutate(RMTMethods_group = case_when(
    RMTMethods_YN == 0 ~ "No (n = 1330)",
    RMTMethods_YN == 1 ~ "Yes (n = 228)",
    TRUE ~ NA_character_
  ))

# Process disorders data:
# - Filter out NA and "Prefer not to say"
# - Split comma-separated disorders, trim spaces, and recode into combined categories.
disorders_group <- data_combined %>%
  filter(!is.na(disorders) & disorders != "Prefer not to say") %>%
  select(disorders, RMTMethods_YN, RMTMethods_group) %>%
  mutate(disorders = strsplit(disorders, ",")) %>%
  unnest(disorders) %>%
  mutate(
    disorders = trimws(disorders),
    disorders = case_when(
      str_detect(disorders, fixed("Cancer (Breast", ignore_case = TRUE)) |
        str_detect(disorders, fixed("Colorectal", ignore_case = TRUE)) |
        str_detect(disorders, fixed("Lung", ignore_case = TRUE)) |
        str_detect(disorders, fixed("and/or Prostate)", ignore_case = TRUE)) ~ "Cancer",
      str_detect(disorders, fixed("Chronic Obstructive Pulmonary Disease (COPD", ignore_case = TRUE)) |
        str_detect(disorders, fixed("incl. emphysema and chronic bronchitis)", ignore_case = TRUE)) ~ "Chronic Obstructive Pulmonary Disease",
      str_detect(disorders, fixed("Restrictive Lung Disease (Incl. pulmonary fibrosis", ignore_case = TRUE)) |
        str_detect(disorders, fixed("cystic fibrosis", ignore_case = TRUE)) ~ "Restrictive Lung Disease",
      TRUE ~ disorders
    )
  )

# Robust Statistical Analysis Function
perform_robust_statistical_test <- function(contingency_table) {
  # Detailed expected frequency analysis
  expected_freq <- suppressWarnings(chisq.test(contingency_table)$expected)
  
  # Frequency checks
  total_cells <- length(expected_freq)
  low_freq_cells <- sum(expected_freq < 5)
  min_expected_freq <- min(expected_freq)
  
  # Verbose reporting of frequency conditions
  cat("Expected Frequency Analysis:\n")
  cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")
  cat("Cells with Expected Frequency < 5:", low_freq_cells, 
      "out of", total_cells, "cells (", 
      round(low_freq_cells / total_cells * 100, 2), "%)\n\n")
  
  # Determine most appropriate test
  if (min_expected_freq < 1 || (low_freq_cells / total_cells) > 0.2) {
    # Use Fisher's exact test with Monte Carlo simulation
    exact_test <- fisher.test(contingency_table, simulate.p.value = TRUE, B = 10000)
    
    return(list(
      test_type = "Fisher's Exact Test (Monte Carlo)",
      p_value = exact_test$p.value,
      statistic = NA,
      method = "Fisher's Exact Test with Monte Carlo Simulation"
    ))
  } else {
    # Use chi-square test with Yates' continuity correction
    adjusted_chi_test <- chisq.test(contingency_table, correct = TRUE)
    
    return(list(
      test_type = "Chi-Square with Continuity Correction",
      p_value = adjusted_chi_test$p.value,
      statistic = adjusted_chi_test$statistic,
      parameter = adjusted_chi_test$parameter,
      method = paste("Pearson's Chi-squared test with Yates' continuity correction,",
                     "df =", adjusted_chi_test$parameter)
    ))
  }
}

# Pairwise Comparisons Function
pairwise_comparisons <- function(contingency_table) {
  disorders <- rownames(contingency_table)
  n_disorders <- length(disorders)
  results <- data.frame()
  
  for(i in 1:(n_disorders-1)) {
    for(j in (i+1):n_disorders) {
      # Create 2x2 contingency table for two disorders
      subset_table <- contingency_table[c(i,j),]
      
      # Perform Fisher's exact test
      test <- fisher.test(subset_table)
      
      results <- rbind(results, data.frame(
        comparison = paste(disorders[i], "vs", disorders[j]),
        p_value = test$p.value,
        odds_ratio = test$estimate
      ))
    }
  }
  
  # Apply Bonferroni correction
  results$p_adjusted <- p.adjust(results$p_value, method = "bonferroni")
  
  return(results)
}

# Build contingency table
contingency_table <- table(disorders_group$disorders, disorders_group$RMTMethods_YN)

# Perform robust statistical test
stat_test <- perform_robust_statistical_test(contingency_table)
Expected Frequency Analysis:
Minimum Expected Frequency: 2.52 
Cells with Expected Frequency < 5: 3 out of 26 cells ( 11.54 %)
Code
# Calculate effect size (Cramer's V)
n_total <- sum(contingency_table)
min_dim <- min(dim(contingency_table)) - 1
cramers_v <- sqrt(stat_test$statistic / (n_total * min_dim))

# Total sample size for frequency calculation
total_valid <- nrow(data_combined %>% filter(!is.na(disorders) & disorders != "Prefer not to say"))

# Calculate overall counts for each disorder (apply a threshold: count >= 10)
overall_counts <- disorders_group %>%
  group_by(disorders) %>%
  summarise(total_count = n()) %>%
  filter(total_count >= 10)

# Calculate counts and percentages by disorder and RMTMethods_group
agg_disorders <- disorders_group %>%
  group_by(disorders, RMTMethods_group) %>%
  summarise(count = n(), .groups = 'drop') %>%
  inner_join(overall_counts, by = "disorders") %>%
  mutate(percentage = (count / total_valid) * 100) %>%
  arrange(desc(total_count))

# Create Plot with Improved Scaling
plot_disorders <- ggplot(agg_disorders, aes(x = count, y = reorder(disorders, total_count))) +
  geom_bar(aes(fill = RMTMethods_group), stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +
  geom_text(
    aes(label = sprintf("%d (%.1f%%)", count, percentage), 
        group = RMTMethods_group),
    position = position_dodge(width = 0.8),
    hjust = -0.1, 
    size = 3,
    fontface = "plain"
  ) +
  labs(
    title = "Health disorders reported by participants (N = 1558)",
    x = "Count",
    y = "Disorder",
    fill = "RMT device use",
    caption = paste0(
      "Note: Percentages were calculated out of the full sample (N = ", total_valid, ").\n",
      stat_test$method, ": p = ", format.pval(stat_test$p_value, digits = 3),
      ", Cramer's V = ", round(cramers_v, 3)
    )
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0, size = 14, face = "bold", margin = margin(b = 10)),
    plot.caption = element_text(hjust = 0, size = 10, margin = margin(t = 10)),
    axis.text.y = element_text(size = 10, hjust = 0),
    plot.margin = margin(l = 20, r = 40, t = 20, b = 20, unit = "pt"),
    legend.position = "top",
    legend.justification = "left",
    legend.title = element_text(hjust = 0, size = 10),
    legend.text = element_text(size = 10),
    axis.title.y = element_text(margin = margin(r = 10)),
    axis.title.x = element_text(margin = margin(t = 10))
  ) +
  scale_x_continuous(
    expand = expansion(mult = c(0, 0.3)),  # Reduced expansion to minimize whitespace
    limits = c(0, NA)  # Allows automatic upper limit calculation
  ) +
  scale_fill_manual(values = c("No (n = 1330)" = "#4472C4", "Yes (n = 228)" = "#ED7D31"))

# Perform pairwise comparisons
pairwise_results <- pairwise_comparisons(contingency_table)

# Print statistical results
cat("\nStatistical Test Results:\n")

Statistical Test Results:
Code
cat("Test Type:", stat_test$test_type, "\n")
Test Type: Chi-Square with Continuity Correction 
Code
cat("P-value:", stat_test$p_value, "\n")
P-value: 1.06822e-20 
Code
if (stat_test$test_type == "Chi-Square with Continuity Correction") {
  cat("Chi-square Statistic:", stat_test$statistic, "\n")
  cat("Degrees of Freedom:", stat_test$parameter, "\n")
}
Chi-square Statistic: 123.8186 
Degrees of Freedom: 12 
Code
cat("Cramer's V:", cramers_v, "\n")
Cramer's V: 0.281278 
Code
cat("\nPairwise Comparisons (Bonferroni-corrected):\n")

Pairwise Comparisons (Bonferroni-corrected):
Code
print(pairwise_results)
                                                                                        comparison
odds ratio                               Alcohol abuse vs Alzheimer's Disease and Related Dementia
odds ratio1                             Alcohol abuse vs Arthritis (Osteoarthritis and Rheumatoid)
odds ratio2                                                                Alcohol abuse vs Asthma
odds ratio3                                                   Alcohol abuse vs Atrial Fibrillation
odds ratio4                                             Alcohol abuse vs Autism Spectrum Disorders
odds ratio5                                                                Alcohol abuse vs Cancer
odds ratio6                                                Alcohol abuse vs Chronic Kidney Disease
odds ratio7                                 Alcohol abuse vs Chronic Obstructive Pulmonary Disease
odds ratio8                                                            Alcohol abuse vs Depression
odds ratio9                                              Alcohol abuse vs General Anxiety Disorder
odds ratio10                                Alcohol abuse vs Musician Performance Anxiety Disorder
odds ratio11                                             Alcohol abuse vs Restrictive Lung Disease
odds ratio12 Alzheimer's Disease and Related Dementia vs Arthritis (Osteoarthritis and Rheumatoid)
odds ratio13                                    Alzheimer's Disease and Related Dementia vs Asthma
odds ratio14                       Alzheimer's Disease and Related Dementia vs Atrial Fibrillation
odds ratio15                 Alzheimer's Disease and Related Dementia vs Autism Spectrum Disorders
odds ratio16                                    Alzheimer's Disease and Related Dementia vs Cancer
odds ratio17                    Alzheimer's Disease and Related Dementia vs Chronic Kidney Disease
odds ratio18     Alzheimer's Disease and Related Dementia vs Chronic Obstructive Pulmonary Disease
odds ratio19                                Alzheimer's Disease and Related Dementia vs Depression
odds ratio20                  Alzheimer's Disease and Related Dementia vs General Anxiety Disorder
odds ratio21     Alzheimer's Disease and Related Dementia vs Musician Performance Anxiety Disorder
odds ratio22                  Alzheimer's Disease and Related Dementia vs Restrictive Lung Disease
odds ratio23                                   Arthritis (Osteoarthritis and Rheumatoid) vs Asthma
odds ratio24                      Arthritis (Osteoarthritis and Rheumatoid) vs Atrial Fibrillation
odds ratio25                Arthritis (Osteoarthritis and Rheumatoid) vs Autism Spectrum Disorders
odds ratio26                                   Arthritis (Osteoarthritis and Rheumatoid) vs Cancer
odds ratio27                   Arthritis (Osteoarthritis and Rheumatoid) vs Chronic Kidney Disease
odds ratio28    Arthritis (Osteoarthritis and Rheumatoid) vs Chronic Obstructive Pulmonary Disease
odds ratio29                               Arthritis (Osteoarthritis and Rheumatoid) vs Depression
odds ratio30                 Arthritis (Osteoarthritis and Rheumatoid) vs General Anxiety Disorder
odds ratio31    Arthritis (Osteoarthritis and Rheumatoid) vs Musician Performance Anxiety Disorder
odds ratio32                 Arthritis (Osteoarthritis and Rheumatoid) vs Restrictive Lung Disease
odds ratio33                                                         Asthma vs Atrial Fibrillation
odds ratio34                                                   Asthma vs Autism Spectrum Disorders
odds ratio35                                                                      Asthma vs Cancer
odds ratio36                                                      Asthma vs Chronic Kidney Disease
odds ratio37                                       Asthma vs Chronic Obstructive Pulmonary Disease
odds ratio38                                                                  Asthma vs Depression
odds ratio39                                                    Asthma vs General Anxiety Disorder
odds ratio40                                       Asthma vs Musician Performance Anxiety Disorder
odds ratio41                                                    Asthma vs Restrictive Lung Disease
odds ratio42                                      Atrial Fibrillation vs Autism Spectrum Disorders
odds ratio43                                                         Atrial Fibrillation vs Cancer
odds ratio44                                         Atrial Fibrillation vs Chronic Kidney Disease
odds ratio45                          Atrial Fibrillation vs Chronic Obstructive Pulmonary Disease
odds ratio46                                                     Atrial Fibrillation vs Depression
odds ratio47                                       Atrial Fibrillation vs General Anxiety Disorder
odds ratio48                          Atrial Fibrillation vs Musician Performance Anxiety Disorder
odds ratio49                                       Atrial Fibrillation vs Restrictive Lung Disease
odds ratio50                                                   Autism Spectrum Disorders vs Cancer
odds ratio51                                   Autism Spectrum Disorders vs Chronic Kidney Disease
odds ratio52                    Autism Spectrum Disorders vs Chronic Obstructive Pulmonary Disease
odds ratio53                                               Autism Spectrum Disorders vs Depression
odds ratio54                                 Autism Spectrum Disorders vs General Anxiety Disorder
odds ratio55                    Autism Spectrum Disorders vs Musician Performance Anxiety Disorder
odds ratio56                                 Autism Spectrum Disorders vs Restrictive Lung Disease
odds ratio57                                                      Cancer vs Chronic Kidney Disease
odds ratio58                                       Cancer vs Chronic Obstructive Pulmonary Disease
odds ratio59                                                                  Cancer vs Depression
odds ratio60                                                    Cancer vs General Anxiety Disorder
odds ratio61                                       Cancer vs Musician Performance Anxiety Disorder
odds ratio62                                                    Cancer vs Restrictive Lung Disease
odds ratio63                       Chronic Kidney Disease vs Chronic Obstructive Pulmonary Disease
odds ratio64                                                  Chronic Kidney Disease vs Depression
odds ratio65                                    Chronic Kidney Disease vs General Anxiety Disorder
odds ratio66                       Chronic Kidney Disease vs Musician Performance Anxiety Disorder
odds ratio67                                    Chronic Kidney Disease vs Restrictive Lung Disease
odds ratio68                                   Chronic Obstructive Pulmonary Disease vs Depression
odds ratio69                     Chronic Obstructive Pulmonary Disease vs General Anxiety Disorder
odds ratio70        Chronic Obstructive Pulmonary Disease vs Musician Performance Anxiety Disorder
odds ratio71                     Chronic Obstructive Pulmonary Disease vs Restrictive Lung Disease
odds ratio72                                                Depression vs General Anxiety Disorder
odds ratio73                                   Depression vs Musician Performance Anxiety Disorder
odds ratio74                                                Depression vs Restrictive Lung Disease
odds ratio75                     General Anxiety Disorder vs Musician Performance Anxiety Disorder
odds ratio76                                  General Anxiety Disorder vs Restrictive Lung Disease
odds ratio77                     Musician Performance Anxiety Disorder vs Restrictive Lung Disease
                  p_value odds_ratio   p_adjusted
odds ratio   8.687888e-04 7.33487909 6.776553e-02
odds ratio1  6.736075e-01 0.79189681 1.000000e+00
odds ratio2  1.291170e-02 0.34833383 1.000000e+00
odds ratio3  1.000000e+00 1.08952987 1.000000e+00
odds ratio4  1.620258e-01 0.52250395 1.000000e+00
odds ratio5  1.453761e-01 1.79325863 1.000000e+00
odds ratio6  4.811111e-01 1.79572964 1.000000e+00
odds ratio7  8.208983e-01 1.12978460 1.000000e+00
odds ratio8  2.738306e-02 0.38371817 1.000000e+00
odds ratio9  2.923673e-02 0.39704286 1.000000e+00
odds ratio10 8.434382e-01 0.93582679 1.000000e+00
odds ratio11 5.060178e-01 1.57614059 1.000000e+00
odds ratio12 1.263234e-05 0.10545848 9.853229e-04
odds ratio13 2.691018e-09 0.04645843 2.098994e-07
odds ratio14 3.419565e-03 0.14939850 2.667261e-01
odds ratio15 5.772160e-07 0.07014907 4.502285e-05
odds ratio16 7.390543e-03 0.23740511 5.764623e-01
odds ratio17 1.296605e-01 0.25031066 1.000000e+00
odds ratio18 1.155260e-03 0.15262452 9.011029e-02
odds ratio19 3.805206e-09 0.05091667 2.968061e-07
odds ratio20 4.433295e-09 0.05259406 3.457970e-07
odds ratio21 3.957613e-05 0.12416814 3.086938e-03
odds ratio22 6.730149e-02 0.21979364 1.000000e+00
odds ratio23 4.924581e-03 0.43923886 3.841173e-01
odds ratio24 4.880656e-01 1.37663848 1.000000e+00
odds ratio25 2.096888e-01 0.65870321 1.000000e+00
odds ratio26 1.774311e-03 2.26769754 1.383963e-01
odds ratio27 1.781723e-01 2.28404886 1.000000e+00
odds ratio28 3.524952e-01 1.42773996 1.000000e+00
odds ratio29 7.482876e-03 0.48433862 5.836644e-01
odds ratio30 8.706435e-03 0.50125385 6.791019e-01
odds ratio31 5.921115e-01 1.18228584 1.000000e+00
odds ratio32 3.122584e-01 2.00099866 1.000000e+00
odds ratio33 2.074853e-02 3.12897359 1.000000e+00
odds ratio34 2.371734e-01 1.49894257 1.000000e+00
odds ratio35 8.321659e-11 5.16525402 6.490894e-09
odds ratio36 1.312993e-02 5.18720170 1.000000e+00
odds ratio37 2.211763e-03 3.24693799 1.725175e-01
odds ratio38 7.874670e-01 1.10313195 1.000000e+00
odds ratio39 6.953000e-01 1.14186169 1.000000e+00
odds ratio40 2.624722e-04 2.69248112 2.047283e-02
odds ratio41 1.892004e-02 4.54485131 1.000000e+00
odds ratio42 1.251625e-01 0.47952553 1.000000e+00
odds ratio43 3.094617e-01 1.64431732 1.000000e+00
odds ratio44 4.912945e-01 1.64575525 1.000000e+00
odds ratio45 1.000000e+00 1.03658564 1.000000e+00
odds ratio46 2.517413e-02 0.35197466 1.000000e+00
odds ratio47 2.722467e-02 0.36413548 1.000000e+00
odds ratio48 8.236572e-01 0.85827389 1.000000e+00
odds ratio49 7.259190e-01 1.44528601 1.000000e+00
odds ratio50 1.757768e-05 3.44259255 1.371059e-03
odds ratio51 5.463183e-02 3.45029365 1.000000e+00
odds ratio52 6.414804e-02 2.16429306 1.000000e+00
odds ratio53 3.392917e-01 0.73575757 1.000000e+00
odds ratio54 3.530675e-01 0.76150726 1.000000e+00
odds ratio55 5.795295e-02 1.79513469 1.000000e+00
odds ratio56 1.277963e-01 3.02448643 1.000000e+00
odds ratio57 1.000000e+00 1.01092564 1.000000e+00
odds ratio58 1.918810e-01 0.63042523 1.000000e+00
odds ratio59 3.259975e-11 0.21343616 2.542781e-09
odds ratio60 2.439770e-11 0.22086864 1.903021e-09
odds ratio61 6.644606e-03 0.52126543 5.182793e-01
odds ratio62 1.000000e+00 0.88527189 1.000000e+00
odds ratio63 5.073732e-01 0.62707234 1.000000e+00
odds ratio64 1.689878e-02 0.21195096 1.000000e+00
odds ratio65 1.853343e-02 0.21916221 1.000000e+00
odds ratio66 3.189673e-01 0.51672965 1.000000e+00
odds ratio67 1.000000e+00 0.87969356 1.000000e+00
odds ratio68 3.038301e-03 0.33930463 2.369875e-01
odds ratio69 3.432927e-03 0.35105813 2.677683e-01
odds ratio70 5.964717e-01 0.82769789 1.000000e+00
odds ratio71 7.416156e-01 1.39867338 1.000000e+00
odds ratio72 9.059635e-01 1.03510059 1.000000e+00
odds ratio73 4.754242e-04 2.44168532 3.708309e-02
odds ratio74 2.423421e-02 4.13230595 1.000000e+00
odds ratio75 4.032720e-04 2.35927724 3.145522e-02
odds ratio76 2.652527e-02 3.99587598 1.000000e+00
odds ratio77 3.532773e-01 1.69484985 1.000000e+00
Code
# Display the plot
print(plot_disorders)

10.2.1 Results and Discussion

Contingency Table and Chi-Square Test:

A contingency table was created stratifying disorder counts by RMT device use (the RMTMethods_group). The chi-square test was applied to examine whether the pattern of disorders is statistically independent of whether a participant used an RMT device.

Chi-square Statistic:

The test produced a chi-square value (with degrees of freedom corresponding to the table size) and a corresponding p-value below the standard significance threshold (p < .05). This suggests that the distribution of specific disorders is not independent of the RMTMethods groups.

Effect Size – Cramer’s V:

Cramer’s V was computed to provide an estimate of the strength of the association between the categorical variables. A Cramer’s V value (ranging from 0 to 1) was reported. While a value close to zero would indicate a very weak association, the reported value gives an indication of a small to moderate (depending on the exact value) association between RMT device use and disorder reporting.

Standardized Residuals:

Standardized residuals were calculated from the contingency table. Residuals with an absolute value greater than 1.96 indicate cells that contribute significantly to the chi-square statistic. In practice, these highlight which combinations of disorder and RMTMethods group differ most from what would be expected under the null hypothesis of independence. Cells with |residual| > 1.96 should be looked at as potential areas of interest or imbalance.

Frequency Distribution and Percentages

Frequency Calculations: For each disorder category (only those with an overall count of at least 10 were included), counts were calculated separately for the RMTMethods groups.

Percentage Calculation:

Percentages were computed based on the subgroup total of participants within each RMTMethods group. This approach allows each group’s prevalence rate for specific disorders to be compared.

Interpretation of Frequencies:

• Disorders such as “Cancer”, “Depression”, “General Anxiety Disorder”, and “Asthma” (or similar common conditions) are shown with their total counts and the percentage they constitute within each RMTMethods group.

• The display of percentages along with the counts gives an immediate idea of the relative prevalence within each subgroup.

Graphical Summary

Bar Plot: A faceted bar plot was created showing the counts (with percentage labels) for each disorder.

Axis and Legend:

• The disorder categories are listed on the y-axis in descending order by overall count.

• The bars are color-coded by the RMTMethods group (using a professional blue for “No (n = 1330)” and orange for “Yes (n = 228)”).

Text Labels:

The bars include labels with both the count and the computed percentage (e.g., “157 (21.4%)”) for clarity.

Title and Caption:

The plot title indicates the overall sample size (N = 1558), while the caption provides details about how percentages were calculated and summarizes the chi-square test results along with Cramer’s V.

Overall Interpretation

Association Between Disorders and RMT Device Use:

The chi-square test indicates that there is a statistically significant association between the disorders reported and the RMTMethods group (device use). This suggests that the pattern of health disorders is not completely similar between those who use an RMT device and those who do not. However, the effect size (Cramer’s V) provides insight into how strong this association is; if Cramer’s V is small, the association, while significant, might not be strong.

Implications:

• Researchers could investigate further why certain disorders differ between these groups. It may be that the use of RMT devices is more common in populations with specific health concerns, or it could suggest differential reporting.

• The standardized residuals help pinpoint which cells (i.e., specific disorders in a given RMTMethods group) deviate significantly from the expected frequency. This can drive further targeted investigations.

Caveats:

• The analysis focuses on disorders reported by participants who provided data (excluding non-responses such as “Prefer not to say”).

• The categorization process, including merging similar disorders (like various cancer types) into a single category, inherently aggregates some detailed information. Any subsequent interpretation should consider whether this level of aggregation is appropriate for the research question at hand.

Conclusion

In summary, the data processing and statistical analysis indicate that there are significant differences in how participants in different RMT device groups report their health disorders. The subsequent visualization provides a clear and insightful summary of the frequencies and percentages of the disorders, making it easier for researchers or stakeholders to quickly understand the distribution and the areas of significant deviation. This analysis can serve as a foundation for more detailed follow-up studies exploring the reasons behind these associations.

11 Years of Playing

11.1 Descriptive Stats

Code
## yrsPlay_MAX
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Recode yrsPlay_MAX variable
data_combined <- data_combined %>%
  mutate(yrsPlay_cat = factor(case_when(
    yrsPlay_MAX == 1 ~ "<5yrs",
    yrsPlay_MAX == 2 ~ "5-9yrs",
    yrsPlay_MAX == 3 ~ "10-14yrs",
    yrsPlay_MAX == 4 ~ "15-19yrs",
    yrsPlay_MAX == 5 ~ "20+yrs",
    TRUE ~ NA_character_
  ), levels = c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")))

# Filter out rows with missing values
data_processed <- data_combined %>%
  filter(!is.na(yrsPlay_cat))

# Calculate total N
total_n <- nrow(data_processed)

# Create frequency table
freq_table <- data_processed %>%
  group_by(yrsPlay_cat) %>%
  summarise(count = n()) %>%
  mutate(percentage = (count / sum(count)) * 100)

# Create plot title
plot_title <- "Distribution of years of playing experience"

# Create the plot
plot_years <- ggplot(freq_table, aes(x = count, y = yrsPlay_cat)) +
  geom_bar(stat = "identity", fill = "#4472C4") +
  geom_text(aes(label = sprintf("%d (%.1f%%)", count, percentage)),
            hjust = -0.2, size = 3.5) +
  labs(
    title = paste0(plot_title, " (N = ", total_n, ")"),
    x = "Count",
    y = "Years of playing experience",
    caption = "Note. Percentages were calculated out of the total sample."
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0, size = 14, face = "bold", margin = margin(b = 10)),
    plot.caption = element_text(hjust = 0, size = 10, margin = margin(t = 10)),
    axis.text.y = element_text(size = 10, hjust = 0),
    plot.margin = margin(l = 20, r = 20, t = 20, b = 20, unit = "pt"),
    axis.title.y = element_text(margin = margin(r = 10)),
    axis.title.x = element_text(margin = margin(t = 10))
  ) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.3)))

# Display the plot
print(plot_years)

Code
# Print frequency table
cat("\
Frequency Table:\
")

Frequency Table:
Code
print(freq_table)
# A tibble: 5 × 3
  yrsPlay_cat count percentage
  <fct>       <int>      <dbl>
1 <5yrs         106       6.80
2 5-9yrs        305      19.6 
3 10-14yrs      323      20.7 
4 15-19yrs      172      11.0 
5 20+yrs        652      41.8 
Code
# Calculate descriptive statistics
cat("\
Descriptive Statistics:\
")

Descriptive Statistics:
Code
summary_stats <- data_processed %>%
  summarise(
    n = n(),
    mode = names(which.max(table(yrsPlay_cat))),
    median_category = levels(yrsPlay_cat)[ceiling(n/2)]
  )
print(summary_stats)
# A tibble: 1 × 3
      n mode   median_category
  <int> <chr>  <chr>          
1  1558 20+yrs <NA>           
Code
## By instrument
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Recode overall yrsPlay_MAX into a categorical variable (not used in the instrument-specific analysis)
data_combined <- data_combined %>%
  mutate(yrsPlay_cat = factor(case_when(
    yrsPlay_MAX == 1 ~ "<5yrs",
    yrsPlay_MAX == 2 ~ "5-9yrs",
    yrsPlay_MAX == 3 ~ "10-14yrs",
    yrsPlay_MAX == 4 ~ "15-19yrs",
    yrsPlay_MAX == 5 ~ "20+yrs",
    TRUE ~ NA_character_
  ), levels = c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")))

# Define instrument columns and descriptive names
instrument_cols <- c("yrsPlay_flute", "yrsPlay_picc", "yrsPlay_recorder", 
                     "yrsPlay_oboe", "yrsPlay_clari", "yrsPlay_bassoon",
                     "yrsPlay_sax", "yrsPlay_trump", "yrsPlay_horn", 
                     "yrsPlay_bone", "yrsPlay_tuba", "yrsPlay_eupho",
                     "yrsPlay_bagpipes", "yrsPlay_other")

instrument_names <- c(
  yrsPlay_flute   = "Flute",
  yrsPlay_picc    = "Piccolo",
  yrsPlay_recorder= "Recorder", 
  yrsPlay_oboe    = "Oboe",
  yrsPlay_clari   = "Clarinet",
  yrsPlay_bassoon = "Bassoon",
  yrsPlay_sax     = "Saxophone",
  yrsPlay_trump   = "Trumpet",
  yrsPlay_horn    = "Horn",
  yrsPlay_bone    = "Trombone",
  yrsPlay_tuba    = "Tuba",
  yrsPlay_eupho   = "Euphonium",
  yrsPlay_bagpipes= "Bagpipes",
  yrsPlay_other   = "Other"
)

# Pivot the instrument-specific columns to long format and recode playing experience
data_instruments <- data_combined %>%
  pivot_longer(cols = all_of(instrument_cols),
               names_to = "instrument",
               values_to = "yrsPlay_inst") %>%
  filter(!is.na(yrsPlay_inst)) %>%
  mutate(
    yrsPlay_inst_cat = factor(case_when(
      yrsPlay_inst == 1 ~ "<5yrs",
      yrsPlay_inst == 2 ~ "5-9yrs",
      yrsPlay_inst == 3 ~ "10-14yrs",
      yrsPlay_inst == 4 ~ "15-19yrs",
      yrsPlay_inst == 5 ~ "20+yrs",
      TRUE ~ NA_character_
    ), levels = c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")),
    instrument = factor(instrument_names[instrument], levels = instrument_names)
  )

# Frequency table: count and percentage by instrument and category
freq_table_instruments <- data_instruments %>%
  group_by(instrument, yrsPlay_inst_cat) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(instrument) %>%
  mutate(percentage = count/sum(count) * 100)

# Statistical tests: For each instrument, perform a Chi-square test against uniform distribution
# and compute Cramér's V as an effect size measure.
test_results <- data_instruments %>%
  group_by(instrument) %>%
  summarise(
    n = n(),
    chi_sq = list(chisq.test(table(yrsPlay_inst_cat))),
    chi_sq_stat = chi_sq[[1]]$statistic,
    p_value = chi_sq[[1]]$p.value,
    df = chi_sq[[1]]$parameter,
    cramers_v = sqrt(chi_sq_stat / (n * (min(length(levels(yrsPlay_inst_cat))) - 1)))
  ) %>%
  select(-chi_sq)

# Create faceted plot with counts and percentages, one facet per instrument
plot_title_instruments <- "Distribution of years of playing experience by instrument"
p_instruments <- ggplot(freq_table_instruments, 
                        aes(x = yrsPlay_inst_cat, y = count, fill = yrsPlay_inst_cat)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = sprintf("%d\n(%.1f%%)", count, percentage)),
            position = position_stack(vjust = 0.5),
            size = 2.5) +
  facet_wrap(~ instrument, scales = "free_y", ncol = 3) +
  labs(title = plot_title_instruments,
       subtitle = paste("Total responses:", nrow(data_instruments)),
       x = "Years of playing experience",
       y = "Count",
       caption = paste("Note: Chi-square tests performed for each instrument.",
                       "All p < .001 indicate significant non-uniform distributions."
       )) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0, size = 12),
    plot.caption = element_text(hjust = 0),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none",
    strip.text = element_text(size = 10, face = "bold"),
    panel.spacing = unit(1, "lines")
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.2))) +
  scale_fill_brewer(palette = "Paired")

# Display the plot
print(p_instruments)

Code
# Print frequency table and significance test results
cat("\nFrequency Table for Instrument-specific Data:\n")

Frequency Table for Instrument-specific Data:
Code
print(freq_table_instruments)
# A tibble: 70 × 4
# Groups:   instrument [14]
   instrument yrsPlay_inst_cat count percentage
   <fct>      <fct>            <int>      <dbl>
 1 Flute      <5yrs               69      15.6 
 2 Flute      5-9yrs              95      21.4 
 3 Flute      10-14yrs            85      19.2 
 4 Flute      15-19yrs            37       8.35
 5 Flute      20+yrs             157      35.4 
 6 Piccolo    <5yrs               37      17.7 
 7 Piccolo    5-9yrs              56      26.8 
 8 Piccolo    10-14yrs            32      15.3 
 9 Piccolo    15-19yrs            22      10.5 
10 Piccolo    20+yrs              62      29.7 
# ℹ 60 more rows
Code
cat("\nSignificance Test Results by Instrument:\n")

Significance Test Results by Instrument:
Code
print(test_results)
# A tibble: 14 × 6
   instrument     n chi_sq_stat  p_value    df cramers_v
   <fct>      <int>       <dbl>    <dbl> <dbl>     <dbl>
 1 Flute        443        87.8 3.86e-18     4     0.223
 2 Piccolo      209        26.8 2.17e- 5     4     0.179
 3 Recorder     136        57.9 8.02e-12     4     0.326
 4 Oboe         149        15.9 3.11e- 3     4     0.164
 5 Clarinet     410        82.4 5.30e-17     4     0.224
 6 Bassoon       91        11.7 1.98e- 2     4     0.179
 7 Saxophone    477        79.6 2.17e-16     4     0.204
 8 Trumpet      343       108.  1.78e-22     4     0.281
 9 Horn         160        11.6 2.09e- 2     4     0.134
10 Trombone     212        36.5 2.29e- 7     4     0.207
11 Tuba         129        20.7 3.58e- 4     4     0.200
12 Euphonium    133        24.0 7.88e- 5     4     0.213
13 Bagpipes      59        20.1 4.84e- 4     4     0.292
14 Other        125        16.2 2.81e- 3     4     0.180

11.1.1 Results and Discussion

The distribution of participants’ years of experience playing wind instruments is as follows:

• The largest group is “20+yrs,” indicating a significant portion of participants have extensive experience.

• The second-largest group is “<4yrs,” showing a notable number of beginners or those with limited experience.

• The middle categories (“5-9yrs,” “10-14yrs,” and “15-19yrs”) have progressively fewer participants, suggesting a gradual decline in representation as experience increases. This distribution suggests a bimodal pattern, with many participants either being beginners or highly experienced players.

Interpretation

Experience Skew: The distribution is heavily skewed towards more experienced individuals: the highest percentage (41.85%) is in the “20+yrs” category, suggesting that a significant proportion of respondents have a long history of playing experience.

Intermediate Groups: The 10-14yrs and 5-9yrs groups also represent substantial portions of the sample (roughly 20.73% and 19.58%, respectively), indicating that a majority of the sample falls within these mid-to-high ranges of experience.

Minority Group: The least represented group is the “<5yrs” category with only about 6.80% of the sample. This might imply either that the surveyed population tends to be more experienced or that newer players are underrepresented in this particular dataset.

Implications

Targeting and Marketing: For stakeholders interested in reaching out to the largest segment, strategies might focus on highly experienced musicians, particularly those with more than 20 years of experience.

Support and Development: Conversely, given the smaller sample of less experienced (<5yrs) musicians, additional support or targeted outreach could be considered if the goal is to encourage growth or adoption among newer players.

In summary, the analysis shows a distinct predominance of highly experienced musicians, with the largest group having 20+ years of playing experience.

By Instrument

Overview and Data Reshaping The analysis focused on recoding and comparing the years of playing experience across a variety of instruments. Each instrument-specific column (e.g., yrsPlay_flute, yrsPlay_picc, etc.) originally recorded experience on a scale from 1 to 5 was recoded into categorical groups:

1: <5yrs2: 5-9yrs3: 10-14yrs4: 15-19yrs5: 20+yrs1: <5yrs2: 5-9yrs3: 10-14yrs4: 15-19yrs5: 20+yrs

After reshaping the dataset into a long format (where each row corresponds to one instrument for a respondent), frequency counts and percentages were computed per instrument.

Key Findings from the Frequency Analysis

Participation Across Instruments:

The frequency table (illustrated in the analysis output) shows different levels of participation for each instrument. For instance:

• Flute: Records the highest number of responses (e.g., 443 observations for the flute) indicating that this instrument had significant representation in the sample.

• Clarinet and Recorder: Also show high participation, though slightly lower than the flute in some cases.

• Bassoon, Oboe, etc.: Typically have lower participant counts. For instance, for some double-reed instruments, counts are around or below 150, suggesting that fewer respondents play these instruments compared to more common instruments like the flute or clarinet.

Experience Distribution – General Patterns:

• Highly Experienced Players Are Predominant: Across several instruments (Flute, Piccolo, Recorder, Clarinet, Saxophone, and Trumpet), the highest frequency lies in the “20+yrs” category. This skew toward the highest level indicates that a significant number of respondents have been playing these instruments for more than 20 years.

• Diverse Experience Ranges: While some instruments show a clear concentration at the extremes (e.g., many having 20+ years), others such as the Oboe show a different pattern where, for instance, players in the “5-9yrs” category might be more common. These differences likely reflect entry age, learning duration, and perhaps the training regimes or availability of instruction for specific instruments.

Statistical Testing and Significance

For each instrument, a Chi-square test was performed to evaluate whether the observed distribution of playing experience categories deviates significantly from a uniform distribution (the null hypothesis). The test results are summarized by:

Chi-square Statistic and p-value:

In all presented cases, the p-values were extremely small (most reported as \(p < 0.001\) or even exactly \(p < 3.86 \times 10^{-18}\) in some cases). This indicates that the null hypothesis of a uniform distribution can be strongly rejected; in other words, for every instrument, the distribution of playing experience is not uniform.

Effect Size Using Cramér’s V:

Cramér’s V values ranged roughly from 0.16 to 0.33. While the interpretation of effect size can vary depending on context, these values suggest a small-to-moderate association between the instrument played and the experience category. The moderate effect sizes reinforce the inference that, even though each instrument shows a statistically significant non-uniform distribution, the degree of association is modest and may reflect underlying differences among instruments (for example, the different typical career trajectories or training regimens). ________________________________________ Graphical Representation

The faceted bar plot - Visual Structure:

Each instrument is represented as its own facet, where the bars display the counts of respondents per experience category. The labels on the bars show both the raw counts and the percentage that the count represents within that instrument group. This makes it easy to compare the practical significance of the numbers across instruments.

Insights from the Plot:

• Instruments like the Flute and Clarinet tend to have a long tail in the “20+yrs” category, indicating that these instruments are often played over a long period by experienced musicians.

• In contrast, other instruments such as Oboe may reveal a slightly different experience profile, suggesting that fewer respondents reach the highest experience level—or that newer players are more prevalent in the sample.

Comprehensive Comparison:

By comparing across facets, viewers can quickly grasp how the distribution of playing experience differs by instrument—a valuable insight when considering curricular designs, marketing strategies for music education, or understanding career longevity within different instrumental sections (e.g., orchestral vs. band instruments).

Interpretation and Implications

Diversity & Specialization:

• The clear predominance of experienced players (especially the “20+yrs” category) in many instruments implies that respondents are mostly well-established musicians. This might reflect the sampling frame of the survey, suggesting an overrepresentation of long-term practitioners.

• For instruments like the Oboe or Bassoon (typically less common), lower numbers may signal niche interest groups where early-career musicians have different trajectories or where training opportunities might be less available.

Targeted Interventions and Further Research:

• For Education and Outreach: If the goal is to foster new talent or broaden participation, it might be worth exploring why certain instruments have fewer less-experienced players. It could be due to a lack of beginner resources or lower availability.

• For Resource Allocation: Organizations and institutions (music schools, orchestras, etc.) may use this data to allocate resources more efficiently by understanding which instruments attract a diverse range of experience levels.

Statistical Robustness:

• The highly significant Chi-square results suggest that these patterns are not random – there is a genuine underlying structure in the distribution of years of playing experience across instruments.

• The modest effect sizes (Cramér’s V) indicate that although the differences are statistically significant, the practical magnitude of the association is moderate. This nuance is important when making decisions or crafting policies based solely on statistical outcomes versus contextual considerations.

Conclusion

In summary, the analysis reveals that:

• The playing experience–related distribution is significantly non-uniform across various instruments.

• There is a marked skew toward high levels of experience for instruments such as the Flute and Clarinet.

• Statistically significant differences exist among the instruments, though effect sizes suggest these associations are moderate.

• These insights can inform further qualitative investigations, resource planning, or targeted interventions in music education and practice.

11.2 Inferential Stats

Code
# Robust Data Preparation Function
prepare_years_data <- function(file_path) {
  tryCatch({
    # Read the data
    data_combined <- read_excel(file_path, sheet = "Combined")
    
    # Ensure numeric conversion and handle potential NA values
    data_combined <- data_combined %>%
      mutate(
        # Convert to numeric, replacing NA with a safe default
        yrsPlay_MAX = as.numeric(yrsPlay_MAX),
        RMTMethods_YN = as.numeric(RMTMethods_YN)
      )
    
    # Recode yrsPlay_MAX variable with robust handling
    data_combined <- data_combined %>%
      mutate(yrsPlay_cat = factor(case_when(
        yrsPlay_MAX == 1 ~ "<5yrs",
        yrsPlay_MAX == 2 ~ "5-9yrs",
        yrsPlay_MAX == 3 ~ "10-14yrs",
        yrsPlay_MAX == 4 ~ "15-19yrs",
        yrsPlay_MAX == 5 ~ "20+yrs",
        TRUE ~ NA_character_
      ), levels = c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")))
    
    # Recode RMTMethods_YN into group labels with robust handling
    data_combined <- data_combined %>%
      mutate(RMTMethods_group = case_when(
        RMTMethods_YN == 0 ~ "No (n = 1330)",
        RMTMethods_YN == 1 ~ "Yes (n = 228)",
        TRUE ~ NA_character_
      ))
    
    # Filter out rows with missing values
    data_processed <- data_combined %>%
      filter(!is.na(yrsPlay_cat) & !is.na(RMTMethods_group))
    
    return(data_processed)
  }, error = function(e) {
    stop(paste("Error in data preparation:", e$message))
  })
}

# Robust Statistical Testing Function
perform_robust_statistical_test <- function(cont_table) {
  # Check expected cell frequencies
  expected_freq <- chisq.test(cont_table)$expected
  
  # Criteria for test selection
  total_cells <- length(expected_freq)
  low_freq_cells <- sum(expected_freq < 5)
  min_expected_freq <- min(expected_freq)
  
  # Print diagnostic information
  cat("Expected Frequency Analysis:\n")
  cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")
  cat("Cells with Expected Frequency < 5:", low_freq_cells, 
      "out of", total_cells, "cells (", 
      round(low_freq_cells / total_cells * 100, 2), "%)\n\n")
  
  # Select appropriate test
  if (min_expected_freq < 1 || (low_freq_cells / total_cells) > 0.2) {
    # Use Fisher's exact test with Monte Carlo simulation
    exact_test <- fisher.test(cont_table, simulate.p.value = TRUE, B = 10000)
    
    return(list(
      test_type = "Fisher's Exact Test (Monte Carlo)",
      p_value = exact_test$p.value,
      statistic = NA,
      method = "Fisher's Exact Test with Monte Carlo Simulation"
    ))
  } else {
    # Use chi-square test with Yates' continuity correction
    chi_test <- chisq.test(cont_table, correct = TRUE)
    
    return(list(
      test_type = "Chi-Square with Continuity Correction",
      p_value = chi_test$p.value,
      statistic = chi_test$statistic,
      parameter = chi_test$parameter,
      method = paste("Pearson's Chi-squared test with Yates' continuity correction,",
                     "df =", chi_test$parameter)
    ))
  }
}

# Main Analysis Function
run_years_playing_analysis <- function(
  file_path = "../Data/R_Import_Transformed_15.02.25.xlsx"
) {
  # Prepare data
  data_processed <- prepare_years_data(file_path)
  
  # Total number of observations used
  total_n <- nrow(data_processed)
  
  # Create frequency table
  freq_table <- data_processed %>%
    group_by(yrsPlay_cat, RMTMethods_group) %>%
    summarise(count = n(), .groups = 'drop') %>%
    group_by(RMTMethods_group) %>%
    mutate(percentage = (count / sum(count)) * 100)
  
  # Create contingency table
  contingency_table <- table(data_processed$yrsPlay_cat, data_processed$RMTMethods_group)
  
  # Perform robust statistical test
  stat_test <- perform_robust_statistical_test(contingency_table)
  
  # Calculate Cramer's V
  n_val <- sum(contingency_table)
  min_dim <- min(dim(contingency_table)) - 1
  cramers_v <- sqrt(stat_test$statistic / (n_val * min_dim))
  
  # Create the Plot
  plot_years <- ggplot(freq_table, aes(x = count, y = yrsPlay_cat, fill = RMTMethods_group)) +
    geom_bar(stat = "identity", position = position_dodge(width = 0.8)) +
    geom_text(
      aes(label = sprintf("%d (%.1f%%)", count, percentage)),
      position = position_dodge(width = 0.8),
      hjust = -0.2, 
      size = 3.5
    ) +
    labs(
      title = paste0("Years of playing experience by RMT device use (N = ", total_n, ")"),
      x = "Count",
      y = "Years of playing experience",
      fill = "RMT device use",
      caption = paste0(
        "Note. Percentages calculated within RMT device groups.\n",
        stat_test$method, ": p = ", format.pval(stat_test$p_value, digits = 3),
        ", Cramer's V = ", round(cramers_v, 3)
      )
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0, size = 14, face = "bold", margin = margin(b = 10)),
      plot.caption = element_text(hjust = 0, size = 10, margin = margin(t = 10)),
      axis.text.y = element_text(size = 10, hjust = 0),
      plot.margin = margin(l = 20, r = 40, t = 20, b = 20, unit = "pt"),
      legend.position = "top",
      legend.justification = "left",
      legend.title = element_text(hjust = 0, size = 10),
      legend.text = element_text(size = 10),
      axis.title.y = element_text(margin = margin(r = 10)),
      axis.title.x = element_text(margin = margin(t = 10))
    ) +
    scale_x_continuous(expand = expansion(mult = c(0, 0.4))) +
    scale_fill_manual(values = c("No (n = 1330)" = "#4472C4", "Yes (n = 228)" = "#ED7D31"))
  
  # Print statistical results
  cat("\nContingency Table:\n")
  print(contingency_table)
  
  cat("\nStatistical Test Results:\n")
  cat("Test Type:", stat_test$test_type, "\n")
  cat("P-value:", stat_test$p_value, "\n")
  if (stat_test$test_type == "Chi-Square with Continuity Correction") {
    cat("Chi-square Statistic:", stat_test$statistic, "\n")
    cat("Degrees of Freedom:", stat_test$parameter, "\n")
  }
  cat("Cramer's V:", cramers_v, "\n")
  
  # Display the plot
  print(plot_years)
  
  # Return results for potential further analysis
  return(list(
    freq_table = freq_table,
    contingency_table = contingency_table,
    stat_test = stat_test,
    cramers_v = cramers_v,
    plot = plot_years
  ))
}

# Run the analysis
results <- run_years_playing_analysis()
Expected Frequency Analysis:
Minimum Expected Frequency: 15.51 
Cells with Expected Frequency < 5: 0 out of 10 cells ( 0 %)


Contingency Table:
          
           No (n = 1330) Yes (n = 228)
  <5yrs               96            10
  5-9yrs             264            41
  10-14yrs           258            65
  15-19yrs           144            28
  20+yrs             568            84

Statistical Test Results:
Test Type: Chi-Square with Continuity Correction 
P-value: 0.01457866 
Chi-square Statistic: 12.40529 
Degrees of Freedom: 4 
Cramer's V: 0.08923182 

11.2.1 Results and Discussion

Chi-square Test of Independence:

[1] “-square test of independence:”

Pearson’s Chi-squared test data: contingency_table X-squared = 12.405, df = 4, p-value = 0.01458 The chi-square test shows a significant association between years of playing experience and RMT usage (χ²(4) = 12.41, p = .015).

Standardized Residuals: The standardized residuals show that:

• The 10-14 years group has significantly more RMT users than expected (3.14)

• Other groups show some variation but not at statistically significant levels

After Bonferroni correction, there is:

• A significant difference between the 10-14 years and 20+ years groups (p = .042)

• All other pairwise comparisons are not statistically significant Key Findings:

• RMT usage varies significantly across experience groups

• The 10-14 years group shows the highest proportion of RMT users (20.1%)

• The <4 years group shows the lowest proportion of RMT users (9.4%)

• There’s a notable peak in RMT usage in the middle experience ranges (10-14 years) with lower usage in both very experienced and novice players

Distribution Overview

Total Sample Size: N = 1,558

• RMT Device Non-Users: n = 1,330 (85.4%)

• RMT Device Users: n = 228 (14.6%)

Years of Playing Experience Distribution

Non-RMT Device Users (n = 1,330):*

• <5 years: 96 (7.2%)

• 5-9 years: 264 (19.8%)

• 10-14 years: 258 (19.4%)

• 15-19 years: 144 (10.8%)

• 20+ years: 568 (42.7%)

RMT Device Users (n = 228):

• <5 years: 10 (4.4%)

• 5-9 years: 41 (18.0%)

• 10-14 years: 65 (28.5%)

• 15-19 years: 28 (12.3%)

• 20+ years: 84 (36.8%)

Statistical Analysis Chi-square Test Results:

• χ²(4) = 12.405, p = 0.015

• This indicates a statistically significant association between years of playing experience and RMT device use

• The p-value (0.015) is below the conventional significance level of 0.05

Effect Size:

• Cramer’s V = 0.089

• This suggests a weak association between years of playing experience and RMT device use

• Values of Cramer’s V range from 0 to 1, where:

• 0.1 indicates a small effect

• 0.3 indicates a medium effect

• 0.5 indicates a large effect

Key Findings and Interpretation Experience Level Distribution:

• In both groups, the largest proportion of musicians had 20+ years of experience

• The second most common category differed between groups:

• Non-RMT users: 5-9 years (19.8%)

• RMT users: 10-14 years (28.5%)

Notable Differences:

• The proportion of musicians with 10-14 years of experience is notably higher among RMT device users (28.5%) compared to non-users (19.4%)

• There is a lower proportion of very experienced musicians (20+ years) among RMT device users (36.8%) compared to non-users (42.7%)

• Early-career musicians (<5 years) are less represented among RMT device users (4.4%) compared to non-users (7.2%)

Statistical Significance:

• While the chi-square test indicates a significant association (p = 0.015), the small Cramer’s V value (0.089) suggests this association is weak

• This means that while there is a relationship between years of experience and RMT device use, other factors likely play a more important role in determining RMT device adoption

Practical Implications

Experience Level Considerations: • RMT devices appear to be more commonly adopted by musicians in their mid-career stages (10-14 years)

• The lower adoption rate among very early-career musicians (<5 years) might suggest barriers to entry (e.g., cost, awareness) or perceived lack of necessity

Professional Development:

• The distribution suggests that RMT devices are used across all experience levels, but with varying adoption rates

• The peak in the 10-14 years category among RMT users might indicate a career stage where musicians are most likely to incorporate new technologies

Market Insights:

• There might be opportunities to increase RMT device adoption among both very experienced (20+ years) and early-career (<5 years) musicians

• The relatively lower adoption rates in these groups could indicate either unmet needs or barriers to adoption that could be addressed

Limitations

• The analysis doesn’t account for other variables that might influence RMT device adoption

• The categorical nature of the experience levels might mask more nuanced patterns within each category

• The significant but weak association suggests that years of experience alone is not a strong predictor of RMT device use Comparison of Individual instrument years of experience and RMT groups:

Key Findings:

Significant Associations:

Several instruments showed statistically significant associations (p < 0.05) between years of playing experience and device use:

• Recorder (p = 0.0204)

• Oboe (p = 0.0034)

• Clarinet (p = 0.0062)

• Bassoon (p = 0.0103)

Effect Sizes (Cramér’s V): The strength of associations varies:

• Strongest: Bassoon (V = 0.3809)

• Moderate: Oboe (V = 0.3249)

• Moderate: Recorder (V = 0.2923)

• Weaker: Clarinet (V = 0.1871)

Non-Significant Associations: Some instruments showed no significant relationship between experience and device use:

• Flute (p = 0.3709)

• Piccolo (p = 0.1496) Detailed Analysis by Instrument Group: Let’s examine the contingency tables:

  1. Woodwinds (Flute, Clarinet):

• Flute shows no significant association (χ² = 4.27, p = 0.3709)

• Clarinet shows significant association (χ² = 14.36, p = 0.0062)

  1. Double Reeds (Oboe, Bassoon):

• Both show significant associations

• Strongest effect sizes among all instruments

• Bassoon: V = 0.3809

• Oboe: V = 0.3249

  1. Secondary Instruments (Piccolo, Recorder):

• Piccolo: No significant association

• Recorder: Significant association (p = 0.0204)

Practical Implications:

  1. Device Usage Patterns:

• Double reed players show the strongest relationship between experience level and device use

• This might reflect the complexity of these instruments and the potential benefits of technological assistance

  1. Experience Level Considerations:

• The relationship between experience and device use varies substantially by instrument

• More experienced players in some instruments (e.g., double reeds) show different device usage patterns compared to less experienced players

  1. Educational Implications:

• The varying associations suggest that device use might be more beneficial or more readily adopted in certain instrument groups

• This could inform targeted educational strategies and resource allocation

Statistical Notes:

1. Warning Messages:

• Some chi-square approximations may be less reliable due to small cell counts

• This particularly affects instruments with smaller sample sizes

2. Effect Size Interpretation:

• Cramér’s V values:

• < 0.1: Negligible association

• 0.1 - 0.2: Weak association

• 0.2 - 0.3: Moderate association

• 0.3: Strong association

Recommendations:

1. Further Research:

• Investigate why double reed instruments show stronger associations

• Examine the specific types of devices used across different experience levels

2. Practical Applications:

• Consider instrument-specific approaches to device integration in teaching

• Focus on understanding why certain instrument groups show stronger associations with device use

3. Methodological Considerations:

• For instruments with smaller sample sizes, consider collecting additional data

• Consider using Fisher’s exact test for smaller sample sizes where chi-square approximations may be less reliable

12 Frequency of Playing

12.1 Descriptive Stats

Code
# Robust Statistical Testing Function
perform_robust_statistical_test <- function(observed, expected = NULL) {
  # If no expected frequencies provided, assume uniform distribution
  if (is.null(expected)) {
    expected <- rep(1/length(observed), length(observed))
  }
  
  # Compute expected frequencies
  total_n <- sum(observed)
  expected_freq <- expected * total_n
  
  # Diagnostic frequency checks
  cat("Expected Frequency Analysis:\n")
  cat("Total Observations:", total_n, "\n")
  cat("Observed Frequencies:", paste(observed, collapse = ", "), "\n")
  cat("Expected Frequencies:", paste(round(expected_freq, 2), collapse = ", "), "\n")
  
  # Check chi-square test assumptions
  low_freq_cells <- sum(expected_freq < 5)
  min_expected_freq <- min(expected_freq)
  
  cat("\nExpected Frequency Diagnostics:\n")
  cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")
  cat("Cells with Expected Frequency < 5:", low_freq_cells, 
      "out of", length(observed), "cells (", 
      round(low_freq_cells / length(observed) * 100, 2), "%)\n\n")
  
  # Select appropriate test
  if (min_expected_freq < 1 || (low_freq_cells / length(observed)) > 0.2) {
    # Use Fisher's exact test
    fisher_test <- fisher.test(
      matrix(c(observed, expected_freq), nrow = 2, byrow = TRUE), 
      simulate.p.value = TRUE, 
      B = 10000
    )
    
    cat("Test Selection: Fisher's Exact Test (Monte Carlo Simulation)\n")
    cat("P-value:", fisher_test$p.value, "\n")
    
    return(list(
      test_type = "Fisher's Exact Test",
      p_value = fisher_test$p.value,
      method = "Fisher's Exact Test with Monte Carlo Simulation"
    ))
  } else {
    # Use chi-square test with Yates' continuity correction
    chi_test <- chisq.test(x = observed, p = expected, correct = TRUE)
    
    cat("Test Selection: Chi-square Test with Yates' Correction\n")
    cat("Chi-square Statistic:", chi_test$statistic, "\n")
    cat("P-value:", chi_test$p.value, "\n")
    
    # Calculate Cramér's V
    k <- length(observed)
    cramers_v <- sqrt(chi_test$statistic / (total_n * (k - 1)))
    cat("Cramér's V:", cramers_v, "\n")
    
    return(list(
      test_type = "Chi-square Test",
      statistic = chi_test$statistic,
      p_value = chi_test$p.value,
      cramers_v = cramers_v,
      method = "Chi-square Test with Yates' Continuity Correction"
    ))
  }
}

# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Ensure freqPlay_MAX is numeric and handle potential NA values
data_combined <- data_combined %>%
  mutate(
    freqPlay_MAX = as.numeric(freqPlay_MAX)
  )

# Recode freqPlay_MAX into new frequency categories
data <- data_combined %>%
  mutate(
    frequency = factor(case_when(
      freqPlay_MAX == 1 ~ "About once a month",
      freqPlay_MAX == 2 ~ "Multiple times per month",
      freqPlay_MAX == 3 ~ "About once a week",
      freqPlay_MAX == 4 ~ "Multiple times per week",
      freqPlay_MAX == 5 ~ "Everyday",
      TRUE ~ NA_character_
    ), 
    levels = c("About once a month", "Multiple times per month", "About once a week", "Multiple times per week", "Everyday"))
  )

# 2. Create Frequency Table
freq_table <- data %>%
  group_by(frequency) %>%
  summarise(count = n(), .groups = "drop") %>%
  mutate(percentage = count / sum(count) * 100)

# Calculate total sample size
total_n <- sum(freq_table$count)

# 3. Perform Statistical Analysis
# Observed frequencies
observed <- freq_table$count

# Perform robust statistical test
stat_test <- perform_robust_statistical_test(
  observed, 
  expected = rep(1/length(levels(data$frequency)), length(levels(data$frequency)))
)
Expected Frequency Analysis:
Total Observations: 1558 
Observed Frequencies: 48, 72, 201, 635, 602 
Expected Frequencies: 311.6, 311.6, 311.6, 311.6, 311.6 

Expected Frequency Diagnostics:
Minimum Expected Frequency: 311.6 
Cells with Expected Frequency < 5: 0 out of 5 cells ( 0 %)

Test Selection: Chi-square Test with Yates' Correction
Chi-square Statistic: 1052.777 
P-value: 1.301933e-226 
Cramér's V: 0.4110119 
Code
# 4. Create the Plot
plot_title <- "Frequency of Practice"
p <- ggplot(freq_table, aes(x = frequency, y = count)) +
  geom_bar(stat = "identity", fill = "#4472C4") +
  geom_text(aes(label = sprintf("%d\n(%.1f%%)", count, percentage)), 
            position = position_stack(vjust = 0.5), 
            color = "white", size = 4) +
  labs(
    title = plot_title,
    x = "",
    y = sprintf("Count (N = %d)", total_n),
    caption = sprintf("%s\np-value = %.4f", 
                      stat_test$method, 
                      stat_test$p_value)
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    axis.text.x = element_text(size = 10, angle = 15, vjust = 0.5),
    axis.text.y = element_text(size = 10),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  )

# Display the plot
print(p)

Code
# 6. Print Statistical Analysis Results
cat("\nFrequency Distribution:\n")

Frequency Distribution:
Code
print(freq_table)
# A tibble: 5 × 3
  frequency                count percentage
  <fct>                    <int>      <dbl>
1 About once a month          48       3.08
2 Multiple times per month    72       4.62
3 About once a week          201      12.9 
4 Multiple times per week    635      40.8 
5 Everyday                   602      38.6 
Code
cat("\nStatistical Test Results:\n")

Statistical Test Results:
Code
cat("Test Type:", stat_test$method, "\n")
Test Type: Chi-square Test with Yates' Continuity Correction 
Code
cat("P-value:", stat_test$p_value, "\n")
P-value: 1.301933e-226 
Code
# Instrument-specific analysis can follow a similar robust testing approach

12.2 By Instrument

Code
# Read the data
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Select relevant columns and gather them
instruments_data <- data_combined %>%
  select(`freqPlay_Flute`, `freqPlay_Piccolo`, `freqPlay_Recorder`, 
         `freqPlay_Oboe`, `freqPlay_Clarinet`, `freqPlay_Bassoon`,
         `freqPlay_Saxophone`, `freqPlay_Trumpet`, `freqPlay_French Horn`,
         `freqPlay_Trombone`, `freqPlay_Tuba`, `freqPlay_Euphonium`,
         `freqPlay_Bagpipes`) %>%
  gather(key = "instrument", value = "frequency") %>%
  mutate(
    # Clean instrument names
    instrument = gsub("freqPlay_", "", instrument),
    # Recode frequency values
    frequency = factor(case_when(
      frequency == 1 ~ "About once a month",
      frequency == 2 ~ "Multiple times per month",
      frequency == 3 ~ "About once a week",
      frequency == 4 ~ "Multiple times per week",
      frequency == 5 ~ "Everyday",
      TRUE ~ NA_character_
    ), levels = c("About once a month", "Multiple times per month", 
                  "About once a week", "Multiple times per week", "Everyday"))
  )

# Remove NA values
instruments_data <- instruments_data %>% filter(!is.na(frequency))

# Calculate frequencies and percentages
summary_data <- instruments_data %>%
  group_by(instrument, frequency) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(instrument) %>%
  mutate(
    percentage = count / sum(count) * 100,
    total_n = sum(count)
  ) %>%
  ungroup()

# Calculate total responses for each instrument
instrument_totals <- summary_data %>%
  group_by(instrument) %>%
  summarise(total_n = first(total_n)) %>%
  arrange(desc(total_n))

# Reorder instruments by total responses
summary_data$instrument <- factor(summary_data$instrument, 
                                  levels = instrument_totals$instrument)

# Create the plot with modified theme and labels in black; legend styling adjusted
p <- ggplot(summary_data, aes(x = frequency, y = percentage, fill = frequency)) +
  geom_bar(stat = "identity") +
  facet_wrap(~instrument, ncol = 3) +
  geom_text(aes(label = sprintf("%d\
(%.1f%%)", count, percentage)),
            position = position_stack(vjust = 0.5),
            color = "black", size = 3) +
  scale_fill_brewer(palette = "Blues") +
  labs(
    title = "Frequency of Practice by Instrument",
    x = "",
    y = "Percentage",
    fill = "Frequency"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),  # Remove x-axis labels
    strip.text = element_text(size = 10, face = "bold"),
    legend.position = "top",
    legend.text = element_text(size = 8, margin = margin(r = 0)),
    legend.title = element_text(size = 10),
    legend.key.size = unit(0.5, "cm"),
    legend.spacing.x = unit(0, 'pt'),
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.margin = margin(t = 10, r = 30, b = 10, l = 30, unit = "pt")  # Padding around the plot
  )

# Print the plot
print(p)

Code
## By instrument V2 
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Process data and create summary statistics
instruments_data <- data %>%
  select(starts_with("freqPlay_")) %>%
  gather(key = "instrument", value = "frequency") %>%
  mutate(
    instrument = gsub("freqPlay_", "", instrument),
    frequency = factor(case_when(
      frequency == 1 ~ "About once a month",
      frequency == 2 ~ "Multiple times per month",
      frequency == 3 ~ "About once a week",
      frequency == 4 ~ "Multiple times per week",
      frequency == 5 ~ "Everyday",
      TRUE ~ NA_character_
    ), levels = c("About once a month", "Multiple times per month", 
                  "About once a week", "Multiple times per week", "Everyday"))
  ) %>%
  filter(!is.na(frequency))

# Calculate detailed summary statistics
summary_stats <- instruments_data %>%
  group_by(instrument) %>%
  summarise(
    n = n(),
    mean_freq = mean(as.numeric(frequency)),
    median_freq = median(as.numeric(frequency)),
    sd_freq = sd(as.numeric(frequency))
  ) %>%
  arrange(desc(n))

# Calculate frequency distributions
freq_dist <- instruments_data %>%
  group_by(instrument, frequency) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(instrument) %>%
  mutate(
    percentage = count / sum(count) * 100,
    total_n = sum(count)
  ) %>%
  arrange(instrument, frequency)

# Chi-square test
contingency_table <- table(instruments_data$instrument, instruments_data$frequency)
chi_test <- chisq.test(contingency_table)

# Calculate Cramer's V
n <- nrow(instruments_data)
df_min <- min(nrow(contingency_table) - 1, ncol(contingency_table) - 1)
cramers_v <- sqrt(chi_test$statistic / (n * df_min))

# Print summary statistics
cat("\
Detailed Summary Statistics by Instrument:\
")

Detailed Summary Statistics by Instrument:
Code
print(summary_stats)
# A tibble: 15 × 5
   instrument                          n mean_freq median_freq sd_freq
   <chr>                           <int>     <dbl>       <dbl>   <dbl>
 1 MAX                              1558      4.07           4   0.986
 2 Saxophone                         477      3.68           4   1.20 
 3 Flute                             443      3.52           4   1.25 
 4 Clarinet                          410      3.48           4   1.30 
 5 Trumpet                           343      3.74           4   1.26 
 6 Trombone                          212      3.59           4   1.23 
 7 Piccolo                           209      3.08           3   1.41 
 8 French Horn                       160      3.92           4   1.19 
 9 Oboe                              149      3.48           4   1.24 
10 Recorder                          136      2.69           3   1.39 
11 Euphonium                         133      3.16           4   1.32 
12 Tuba                              129      3.64           4   1.24 
13 [QID18-ChoiceTextEntryValue-18]   125      3.61           4   1.11 
14 Bassoon                            91      3.49           4   1.26 
15 Bagpipes                           59      3.44           4   1.37 
Code
cat("\
Frequency Distribution (counts and percentages):\
")

Frequency Distribution (counts and percentages):
Code
print(freq_dist)
# A tibble: 75 × 5
# Groups:   instrument [15]
   instrument frequency                count percentage total_n
   <chr>      <fct>                    <int>      <dbl>   <int>
 1 Bagpipes   About once a month          10      16.9       59
 2 Bagpipes   Multiple times per month     5       8.47      59
 3 Bagpipes   About once a week            5       8.47      59
 4 Bagpipes   Multiple times per week     27      45.8       59
 5 Bagpipes   Everyday                    12      20.3       59
 6 Bassoon    About once a month           9       9.89      91
 7 Bassoon    Multiple times per month    11      12.1       91
 8 Bassoon    About once a week           19      20.9       91
 9 Bassoon    Multiple times per week     30      33.0       91
10 Bassoon    Everyday                    22      24.2       91
# ℹ 65 more rows
Code
cat("\
Chi-square Test Results:\
")

Chi-square Test Results:
Code
print(chi_test)

    Pearson's Chi-squared test

data:  contingency_table
X-squared = 432.01, df = 56, p-value < 2.2e-16
Code
cat("\
Cramer's V (Effect Size):\
")

Cramer's V (Effect Size):
Code
print(cramers_v)
X-squared 
0.1526654 
Code
# Calculate mode for each instrument
mode_freq <- instruments_data %>%
  group_by(instrument) %>%
  count(frequency) %>%
  slice(which.max(n)) %>%
  arrange(desc(n))

cat("\
Most Common Practice Frequency by Instrument:\
")

Most Common Practice Frequency by Instrument:
Code
print(mode_freq)
# A tibble: 15 × 3
# Groups:   instrument [15]
   instrument                      frequency                   n
   <chr>                           <fct>                   <int>
 1 MAX                             Multiple times per week   635
 2 Saxophone                       Multiple times per week   174
 3 Flute                           Multiple times per week   162
 4 Clarinet                        Multiple times per week   158
 5 Trumpet                         Everyday                  118
 6 Trombone                        Multiple times per week    71
 7 French Horn                     Everyday                   66
 8 Piccolo                         Multiple times per week    59
 9 [QID18-ChoiceTextEntryValue-18] Multiple times per week    54
10 Oboe                            Multiple times per week    52
11 Tuba                            Multiple times per week    50
12 Euphonium                       Multiple times per week    47
13 Recorder                        About once a month         43
14 Bassoon                         Multiple times per week    30
15 Bagpipes                        Multiple times per week    27
Code
# Standardized residuals analysis
std_residuals <- chi_test$stdres
colnames(std_residuals) <- levels(instruments_data$frequency)
rownames(std_residuals) <- levels(factor(instruments_data$instrument))

cat("\
Standardized Residuals (values > |1.96| indicate significant differences):\
")

Standardized Residuals (values > |1.96| indicate significant differences):
Code
print(round(std_residuals, 2))
                                 
                                  About once a month Multiple times per month
  [QID18-ChoiceTextEntryValue-18]              -0.66                    -0.14
  Bagpipes                                      2.21                     0.04
  Bassoon                                       0.35                     1.31
  Clarinet                                      3.41                     0.35
  Euphonium                                     2.86                     4.11
  Flute                                         1.20                     2.19
  French Horn                                  -1.46                     0.20
  MAX                                          -9.84                    -6.50
  Oboe                                          0.83                     0.48
  Piccolo                                       6.11                     3.74
  Recorder                                      9.49                     1.16
  Saxophone                                    -0.20                    -0.65
  Trombone                                      0.06                     0.85
  Trumpet                                      -0.07                     0.70
  Tuba                                         -0.13                     1.37
                                 
                                  About once a week Multiple times per week
  [QID18-ChoiceTextEntryValue-18]              1.59                    1.50
  Bagpipes                                    -1.65                    1.43
  Bassoon                                      1.17                   -0.77
  Clarinet                                     0.25                    0.75
  Euphonium                                   -0.19                   -0.36
  Flute                                        1.26                   -0.12
  French Horn                                 -0.70                   -1.82
  MAX                                         -4.58                    3.94
  Oboe                                         2.15                   -0.50
  Piccolo                                      0.52                   -2.64
  Recorder                                     2.51                   -3.45
  Saxophone                                    1.93                   -0.17
  Trombone                                     1.75                   -1.03
  Trumpet                                     -0.34                   -2.02
  Tuba                                        -0.76                    0.46
                                 
                                  Everyday
  [QID18-ChoiceTextEntryValue-18]    -2.38
  Bagpipes                           -1.57
  Bassoon                            -1.14
  Clarinet                           -3.32
  Euphonium                          -3.73
  Flute                              -2.96
  French Horn                         3.29
  MAX                                 9.61
  Oboe                               -2.02
  Piccolo                            -3.70
  Recorder                           -5.00
  Saxophone                          -0.86
  Trombone                           -0.88
  Trumpet                             2.03
  Tuba                               -0.62
Code
# Create visualization
p <- ggplot(freq_dist, aes(x = percentage, y = reorder(instrument, total_n), fill = frequency)) +
  geom_bar(stat = "identity", position = "stack") +
  geom_text(aes(label = sprintf("%d", count)),
            position = position_stack(vjust = 0.5),
            color = "black", size = 3) +
  scale_fill_brewer(palette = "Blues") +
  labs(
    title = "Frequency of Practice by Instrument",
    subtitle = paste("Total N =", sum(summary_stats$n), "responses"),
    x = "Percentage",
    y = "",
    fill = "Practice Frequency"
  ) +
  theme_minimal() +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "top",
    legend.justification = c(0, 1),
    legend.box.just = "left",
    legend.text.align = 0,
    plot.title = element_text(hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5)
  )

print(p)

12.2.1 Results and Discussion

Looking at the frequency distribution of wind instrument practice among the 1,558 participants:

• Multiple times per week: Largest group with approximately 550 participants (~35%)

• Everyday: Second largest with about 400 participants (~26%)

• About once a week: Around 300 participants (~19%)

• About once a month: Approximately 200 participants (~13%)

• Multiple times per month: Smallest group with roughly 150 participants (~10%)

The data shows that the majority of participants (about 61%) practice their wind instruments very frequently (either everyday or multiple times per week). This suggests a high level of engagement with wind instrument playing among the surveyed population, with relatively fewer participants practicing on a monthly basis.

12.3 Inferential Stats

Code
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Recode freqPlay_MAX and create frequency table with RMTMethods_YN
data <- data %>%
  mutate(
    frequency = factor(case_when(
      freqPlay_MAX == 1 ~ "About once a month",
      freqPlay_MAX == 2 ~ "Multiple times per month",
      freqPlay_MAX == 3 ~ "About once a week",
      freqPlay_MAX == 4 ~ "Multiple times per week",
      freqPlay_MAX == 5 ~ "Everyday",
      TRUE ~ NA_character_
    ), 
    levels = c("About once a month", "Multiple times per month", 
               "About once a week", "Multiple times per week", "Everyday")),
    RMT_group = factor(case_when(
      RMTMethods_YN == 0 ~ "No RMT Methods",
      RMTMethods_YN == 1 ~ "Uses RMT Methods",
      TRUE ~ NA_character_
    ))
  )

# Create contingency table
cont_table <- table(data$frequency, data$RMT_group)
cont_table_df <- as.data.frame.matrix(cont_table)

# Calculate percentages within each group
freq_table <- data %>%
  group_by(RMT_group, frequency) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(RMT_group) %>%
  mutate(percentage = count/sum(count) * 100,
         total_group = sum(count))

# Calculate total N
total_n <- sum(freq_table$count)

# Perform chi-square test
chi_test <- chisq.test(cont_table)

# Calculate Cramer's V
cramers_v <- sqrt(chi_test$statistic/(total_n * (min(dim(cont_table))-1)))

# Create the plot
plot_title <- "Frequency of Practice by RMT Methods Use"

p <- ggplot(freq_table, aes(x = frequency, y = percentage, fill = RMT_group)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = sprintf("%d\
(%.1f%%)", count, percentage)),
            position = position_dodge(width = 0.9),
            vjust = -0.5,
            size = 3) +
  scale_fill_manual(values = c("No RMT Methods" = "#4472C4", 
                               "Uses RMT Methods" = "#ED7D31")) +
  labs(
    title = plot_title,
    subtitle = sprintf("N = %d", total_n),
    x = "",
    y = "Percentage",
    fill = "",
    caption = sprintf("Chi-square test: χ²(%d) = %.2f, p = %.3f\
Cramér's V = %.3f",
                      chi_test$parameter, 
                      chi_test$statistic,
                      chi_test$p.value,
                      cramers_v)
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 15, hjust = 0.5, vjust = 0.5),
    legend.position = "top",
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
  )

# Print the plot
print(p)

Code
# Print statistical summary
cat("\
Contingency Table:\
")

Contingency Table:
Code
print(cont_table)
                          
                           No RMT Methods Uses RMT Methods
  About once a month                   44                4
  Multiple times per month             63                9
  About once a week                   181               20
  Multiple times per week             571               64
  Everyday                            471              131
Code
cat("\
Chi-square Test Results:\
")

Chi-square Test Results:
Code
print(chi_test)

    Pearson's Chi-squared test

data:  cont_table
X-squared = 40.341, df = 4, p-value = 3.68e-08
Code
cat("\
Effect Size (Cramér's V):\
")

Effect Size (Cramér's V):
Code
print(cramers_v)
X-squared 
0.1609115 
Code
# Calculate group sizes
group_sizes <- data %>%
  group_by(RMT_group) %>%
  summarise(n = n())

cat("\
Group Sizes:\
")

Group Sizes:
Code
print(group_sizes)
# A tibble: 2 × 2
  RMT_group            n
  <fct>            <int>
1 No RMT Methods    1330
2 Uses RMT Methods   228
Code
# Post-hoc analysis: standardized residuals
stdres <- chisq.test(cont_table)$stdres
colnames(stdres) <- c("No RMT Methods", "Uses RMT Methods")
rownames(stdres) <- levels(data$frequency)

cat("\
Standardized Residuals:\
")

Standardized Residuals:
Code
print(stdres)
                          
                           No RMT Methods Uses RMT Methods
  About once a month            1.2545462       -1.2545462
  Multiple times per month      0.5246129       -0.5246129
  About once a week             2.0131375       -2.0131375
  Multiple times per week       4.2195978       -4.2195978
  Everyday                     -6.3155725        6.3155725

12.3.1 Results and Discussion

Statistical Significance

• The chi-square test shows a highly significant association between practice frequency and RMT Methods use (χ²(4) = 40.34, p < 0.001)

• The effect size (Cramér’s V = 0.161) indicates a small to medium practical significance

Group Differences

Based on the standardized residuals (values > |1.96| indicate significant differences): Significant differences:

• “About once a week”: Significantly more non-RMT users than expected

• “Multiple times per week”: Significantly more non-RMT users than expected

• “Everyday”: Significantly more RMT users than expected (strongest effect)

Practice Patterns

• RMT users show a higher proportion of daily practice

• Non-RMT users show higher proportions in the moderate frequency categories

13 Income

13.1 Descriptive Stats

Code
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Process income data
income_data <- data %>%
  select(incomePerf, incomeTeach) %>%
  pivot_longer(cols = everything(), 
               names_to = "income_type", 
               values_to = "income_level") %>%
  filter(!is.na(income_level))

# Filter for only 'Yes' and 'No' responses
income_data_filtered <- income_data %>%
  filter(income_level %in% c("Yes", "No"))

# Create contingency table for chi-square test
contingency_table <- table(income_data_filtered$income_type, income_data_filtered$income_level)

# Perform chi-square test
chi_test <- chisq.test(contingency_table)

# Calculate Cramer's V
cramers_v <- sqrt(chi_test$statistic / (sum(contingency_table) * (min(dim(contingency_table)) - 1)))

# Calculate odds ratio
odds_ratio <- (contingency_table[1,1] * contingency_table[2,2]) / (contingency_table[1,2] * contingency_table[2,1])

# Print statistical analysis results
cat("\
Statistical Analysis Results:\
")

Statistical Analysis Results:
Code
cat("============================\
\
")
============================
Code
# Print contingency table
cat("Contingency Table:\
")
Contingency Table:
Code
print(contingency_table)
             
               No Yes
  incomePerf  716 216
  incomeTeach 197 315
Code
cat("\
")
Code
# Print chi-square test results
cat("Chi-square Test Results:\
")
Chi-square Test Results:
Code
print(chi_test)

    Pearson's Chi-squared test with Yates' continuity correction

data:  contingency_table
X-squared = 207.36, df = 1, p-value < 2.2e-16
Code
cat("\
")
Code
# Print effect size measures
cat("Effect Size Measures:\
")
Effect Size Measures:
Code
cat(sprintf("Cramer's V: %.3f\
", cramers_v))
Cramer's V: 0.379
Code
cat(sprintf("Odds Ratio: %.3f\
", odds_ratio))
Odds Ratio: 5.300
Code
cat("\
")
Code
# Summarize counts and percentages
income_summary <- income_data_filtered %>%
  group_by(income_type, income_level) %>%
  summarise(count = n(), .groups = 'drop') %>%
  group_by(income_type) %>%
  mutate(
    total_n = sum(count),
    percentage = count / total_n * 100,
    se = sqrt((percentage * (100 - percentage)) / total_n),  # Standard error for proportions
    ci_lower = percentage - (1.96 * se),  # 95% CI lower bound
    ci_upper = percentage + (1.96 * se)   # 95% CI upper bound
  ) %>%
  ungroup()

# Create labels for income types with total N
lookup_labels <- income_summary %>%
  group_by(income_type) %>%
  summarise(total_n = first(total_n)) %>%
  mutate(label = case_when(
    income_type == "incomePerf" ~ paste0("Performance Income\
(N=", total_n, ")"),
    income_type == "incomeTeach" ~ paste0("Teaching Income\
(N=", total_n, ")")
  ))

# Map income_type to factor using lookup
income_summary <- income_summary %>%
  mutate(
    income_type = factor(income_type, 
                         levels = lookup_labels$income_type, 
                         labels = lookup_labels$label),
    income_level = factor(income_level, levels = c("Yes", "No"))
  )

# Create plot
plot_title <- "Distribution of Primary Income Response"
p <- ggplot(income_summary, 
            aes(x = percentage, y = income_level, fill = income_type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_text(aes(label = paste0(count, " (", sprintf("%.1f", percentage), "% )")),
            position = position_dodge(width = 0.9),
            hjust = -0.1, size = 3) +
  geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper),
                 position = position_dodge(width = 0.9),
                 height = 0.2) +
  labs(title = plot_title,
       x = "Percentage",
       y = "Is this your primary form of income?",
       fill = "Income Source",
       caption = "Error bars represent 95% confidence intervals") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    legend.position = "bottom"
  ) +
  scale_fill_brewer(palette = "Set2") +
  scale_x_continuous(limits = c(0, 100), breaks = seq(0,100,20))

# Display plot
print(p)

Code
# Print proportions with confidence intervals
cat("\
Proportions with 95% Confidence Intervals:\
")

Proportions with 95% Confidence Intervals:
Code
print(income_summary %>% 
        select(income_type, income_level, percentage, ci_lower, ci_upper))
# A tibble: 4 × 5
  income_type                   income_level percentage ci_lower ci_upper
  <fct>                         <fct>             <dbl>    <dbl>    <dbl>
1 "Performance Income\n(N=932)" No                 76.8     74.1     79.5
2 "Performance Income\n(N=932)" Yes                23.2     20.5     25.9
3 "Teaching Income\n(N=512)"    No                 38.5     34.3     42.7
4 "Teaching Income\n(N=512)"    Yes                61.5     57.3     65.7

13.1.1 Results and Discussion

Key Findings:

Performance Income (N=932):

• Yes: 23.2% (95% CI: 20.5% - 25.9%)

• No: 76.8% (95% CI: 74.1% - 79.5%)

Teaching Income (N=512):

• Yes: 61.5% (95% CI: 57.3% - 65.7%)

• No: 38.5% (95% CI: 34.3% - 42.7%)

The statistical analysis reveals a significant difference in the distribution of primary income between performance and teaching (p < 0.001), with a moderate effect size (Cramer’s V = 0.379). Teaching is more likely to be reported as a primary source of income compared to performance, with confidence intervals showing no overlap between the two income types.

Statistical Analysis

Contingency Table and Frequency Distribution:

The contingency table shows the counts for each response category by income type:

Performance Income:

• No: 716 respondents

• Yes: 216 respondents

• Total N=932N=932

Teaching Income:

• No: 197 respondents

• Yes: 315 respondents

• Total N=512N=512

This breakdown immediately shows a substantial difference: A majority of respondents do not consider performance income their primary income, whereas a majority consider teaching income as their primary income.

Chi-Square Test:

The chi-square test for independence yielded:

• χ2χ2 Value: 207.36

• Degrees of Freedom: 1

• p-value: < 2.2e-16

The extremely low p-value (p < 0.001) indicates that there is a statistically significant association between the type of income (performance vs teaching) and whether it is considered the primary source of income. This suggests that the likelihood of stating “Yes” or “No” as a primary income source differs considerably between performance and teaching categories.

Effect Size:8

Two effect size measures were provided:

• Cramer’s V: 0.379

This value falls in a moderate range, suggesting that while the association is not extremely strong, it is meaningful.

• Odds Ratio: 5.300

This indicates that respondents are over 5 times more likely to report teaching income as their primary income compared to performance income.

Confidence Intervals for Proportions:

The proportions, alongside their 95% confidence intervals, are as follows:

Performance Income:

• Yes: 23.2% (95% CI: approx. 20.5% – 25.9%)

• No: 76.8% (95% CI: approx. 74.1% – 79.5%)

Teaching Income:

• Yes: 61.5% (95% CI: approx. 57.3% – 65.7%)

• No: 38.5% (95% CI: approx. 34.3% – 42.7%)

The absence of overlapping confidence intervals for the “Yes” responses provides further evidence that the proportions are statistically different between performance and teaching income responses.

Visualization Interpretation

The bar plot presents the percentage distribution for each response category (“Yes” and “No”) within each type of income. Key details include: Y-Axis (Response Category):

The responses are clearly laid out as “Yes” and “No” (i.e., whether the income source is the respondent’s primary source). X-Axis (Percentage):

The percentage of respondents providing each answer is shown, with a scale from 0% to 100%.

Data Labels:

Each bar is annotated with both the absolute count and the corresponding percentage, providing a precise understanding of the sample distribution.

Error Bars:

Horizontal error bars indicate the 95% confidence intervals for the observed percentages. These intervals support the reliability of the estimates and help visualize the variability in responses.

Color Coding and Legends:

Separate colors are used to distinguish between Performance Income and Teaching Income. The legend incorporates total sample sizes (NN) for each income type, which helps in quickly grasping the underlying distribution:

• Performance Income (N=932)

• Teaching Income (N=512)

Overall Interpretation

The analysis indicates a significant relationship between income type and whether it is considered the primary form of income. Specifically:

Performance Income:

The majority (76.8%) of respondents do not consider performance income the primary form of income, with only 23.2% affirming it as their main income source. This suggests that performance income, for this sample, is likely supplemental or irregular rather than the primary livelihood.

Teaching Income:

In contrast, a majority (61.5%) of respondents report teaching income as their primary source of income, compared to 38.5% who do not. This difference—along with the large odds ratio of 5.300—indicates that, within the surveyed population, teaching is far more likely to be relied upon as the main income source.

Statistical Significance and Practical Implications:

The statistically significant chi-square test and moderate Cramer’s V corroborate that these differences are not due to mere chance. For stakeholders (e.g., survey analysts, policy makers, or academic researchers), these findings may suggest a need to consider how funding, training, or career support should be specifically targeted based on the predominant income source among respondents.

In summary, the robust statistical analysis and well-annotated visual representation provide clear evidence that the ways in which performance and teaching income contribute to overall livelihood differ substantially within the surveyed group.

13.2 Inferential Stats

Code
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Pivot the data so that we have one row per response for the income variables
income_data <- data %>%
  select(incomePerf, incomeTeach, RMTMethods_YN) %>%
  pivot_longer(cols = c(incomePerf, incomeTeach),
               names_to = "income_type",
               values_to = "income_response") %>%
  filter(!is.na(income_response)) %>%
  # Only keep responses that are 'Yes' or 'No'
  filter(income_response %in% c("Yes", "No"))

# Calculate summary statistics by income_type, RMTMethods_YN, and income_response,
# computing percentages relative to each RMTMethods_YN subgroup (for each income type).
income_summary <- income_data %>%
  group_by(income_type, RMTMethods_YN, income_response) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(income_type, RMTMethods_YN) %>%
  mutate(total_n = sum(count),
         percentage = count / total_n * 100,
         se = sqrt((percentage * (100 - percentage)) / total_n),  # standard error for proportions
         ci_lower = percentage - 1.96 * se,                      # 95% CI lower bound
         ci_upper = percentage + 1.96 * se                        # 95% CI upper bound
  ) %>%
  ungroup()

# Update group labels based on conditions
income_summary <- income_summary %>%
  mutate(group_label = case_when(
    income_type == "incomePerf" & RMTMethods_YN == 0 ~ paste0("Pro performers, that don't use RMT devices (n = ", total_n, ")"),
    income_type == "incomePerf" & RMTMethods_YN == 1 ~ paste0("Pro performers that use RMT devices (n = ", total_n, ")"),
    income_type == "incomeTeach" & RMTMethods_YN == 0 ~ paste0("Pro teachers that don't use RMT devices (n = ", total_n, ")"),
    income_type == "incomeTeach" & RMTMethods_YN == 1 ~ paste0("Pro teachers that use RMT devices (n = ", total_n, ")")
  ))

# Ensure factor levels for income_response so that 'Yes' and 'No' appear in a controlled order
income_summary <- income_summary %>%
  mutate(income_response = factor(income_response, levels = c("Yes", "No")))

# Set the figure title as specified
plot_title <- "Distribution of professional (primary source of income) performers and teachers who do and don't use RMT devices"

# Create the plot, adjusting the data labels so they don't overlap with the error bars
p <- ggplot(income_summary,
            aes(x = percentage, y = income_response, fill = group_label)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper),
                 position = position_dodge(width = 0.9),
                 height = 0.2) +
  geom_text(aes(label = paste0(count, " (", sprintf("%.1f", percentage), "%)"),
                x = ci_upper),  # Position labels at the end of error bars
            position = position_dodge(width = 0.9),
            hjust = -0.2,  # Move labels further to the right
            size = 3) +
  labs(title = plot_title,
       x = "Percentage (of RMTMethods_YN subgroup)",
       y = "Is this your primary form of income?",
       caption = "Error bars represent 95% confidence intervals") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 10),
        legend.position = "bottom",
        legend.title = element_blank()) +  # Remove legend title
  scale_fill_brewer(palette = "Set2") +
  scale_x_continuous(
    limits = c(0, max(income_summary$ci_upper) * 1.2),  # Extend x-axis to accommodate labels
    breaks = seq(0, 100, by = 20)
  )

# Display the plot
print(p)

Code
# Print summary statistics for verification
print(income_summary %>%
        select(group_label, income_response, count, total_n, percentage, ci_lower, ci_upper) %>%
        arrange(group_label, income_response))
# A tibble: 8 × 7
  group_label         income_response count total_n percentage ci_lower ci_upper
  <chr>               <fct>           <int>   <int>      <dbl>    <dbl>    <dbl>
1 Pro performers tha… Yes                69     152       45.4     37.5     53.3
2 Pro performers tha… No                 83     152       54.6     46.7     62.5
3 Pro performers, th… Yes               147     780       18.8     16.1     21.6
4 Pro performers, th… No                633     780       81.2     78.4     83.9
5 Pro teachers that … Yes               220     389       56.6     51.6     61.5
6 Pro teachers that … No                169     389       43.4     38.5     48.4
7 Pro teachers that … Yes                95     123       77.2     69.8     84.6
8 Pro teachers that … No                 28     123       22.8     15.4     30.2
Code
## Infer stats V2 (better) 
# Read data from the "Combined" sheet
data_combined <- read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet = "Combined")

# Pivot the data so that we have one row per response for the income variables
income_data <- data %>%
  select(incomePerf, incomeTeach, RMTMethods_YN) %>%
  pivot_longer(cols = c(incomePerf, incomeTeach),
               names_to = "income_type",
               values_to = "income_response") %>%
  filter(!is.na(income_response)) %>%
  filter(income_response %in% c("Yes", "No"))

# Calculate summary statistics by group (using RMTMethods_YN subgroup totals within each income type)
income_summary <- income_data %>%
  group_by(income_type, RMTMethods_YN, income_response) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(income_type, RMTMethods_YN) %>%
  mutate(total_n = sum(count),
         percentage = count / total_n * 100,
         se = sqrt((percentage * (100 - percentage)) / total_n),  # standard error for proportions
         ci_lower = percentage - 1.96 * se,
         ci_upper = percentage + 1.96 * se) %>%
  ungroup()

# Create custom group labels
income_summary <- income_summary %>%
  mutate(group_label = case_when(
    income_type == "incomePerf" & RMTMethods_YN == "0" ~ 
      "Pro performers, that don't use RMT devices (n = 780)",
    income_type == "incomePerf" & RMTMethods_YN == "1" ~ 
      "Pro performers that use RMT devices (n = 152)",
    income_type == "incomeTeach" & RMTMethods_YN == "0" ~ 
      "Pro teachers that don't use RMT devices (n = 389)",
    income_type == "incomeTeach" & RMTMethods_YN == "1" ~ 
      "Pro teachers that use RMT devices (n = 123)"
  ))

# Ensure the order of the income_response factor levels
income_summary <- income_summary %>%
  mutate(income_response = factor(income_response, levels = c("Yes", "No")))

# Set the improved figure title
plot_title <- "Distribution of professional (primary source of income) performers and teachers who do and don't use RMT devices"

# Create a faceted grouped bar chart with error bars and data labels placed without overlap
p <- ggplot(income_summary,
            aes(x = income_response, y = percentage, fill = group_label)) +
  geom_col(position = position_dodge(0.9), width = 0.8) +
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper),
                position = position_dodge(0.9),
                width = 0.2, color = "black") +
  geom_text(aes(label = paste0(count, " (", sprintf("%.1f", percentage), "%)")),
            position = position_dodge(0.9),
            vjust = -2,  # Increased vertical offset even more
            size = 3.2) +
  facet_wrap(~income_type, labeller = as_labeller(c("incomePerf" = "Performance Income",
                                                    "incomeTeach" = "Teaching Income"))) +
  labs(title = plot_title,
       x = "Primary Income Response",
       y = "Percentage (of subgroup)",
       caption = "Error bars represent 95% confidence intervals") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 10),
        legend.position = "bottom",
        legend.title = element_blank()) +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(
    limits = c(0, 160),  # Further extended y-axis to provide more space
    breaks = seq(0, 160, by = 20)
  )

# Display the plot
print(p)

Code
# Print summary statistics for verification
print(income_summary %>%
        select(group_label, income_response, count, total_n, percentage, ci_lower, ci_upper) %>%
        arrange(group_label, income_response))
# A tibble: 8 × 7
  group_label         income_response count total_n percentage ci_lower ci_upper
  <chr>               <fct>           <int>   <int>      <dbl>    <dbl>    <dbl>
1 Pro performers tha… Yes                69     152       45.4     37.5     53.3
2 Pro performers tha… No                 83     152       54.6     46.7     62.5
3 Pro performers, th… Yes               147     780       18.8     16.1     21.6
4 Pro performers, th… No                633     780       81.2     78.4     83.9
5 Pro teachers that … Yes               220     389       56.6     51.6     61.5
6 Pro teachers that … No                169     389       43.4     38.5     48.4
7 Pro teachers that … Yes                95     123       77.2     69.8     84.6
8 Pro teachers that … No                 28     123       22.8     15.4     30.2

13.2.1 Results and Discussion

Data Context and Structure

Data Source:

Data were imported from an Excel sheet and include responses for two income sources: Performance Income (incomePerf) and Teaching Income (incomeTeach).

Subgrouping Variable:

The variable RMTMethods_YN is used to create subgroups indicating whether respondents use RMT devices. Here the subgroup is recoded as follows:

• “0” indicates respondents who do not use RMT devices.

• “1” indicates respondents who do use RMT devices.

Responses:

For each income source, respondents answered “Yes” or “No” to indicate whether that income source is their primary source.

Groups and Counts:

Custom labels are provided for each subgroup:

• Pro performers, that don’t use RMT devices (n = 780)

• Pro performers that use RMT devices (n = 152)

• Pro teachers that don’t use RMT devices (n = 389)

• Pro teachers that use RMT devices (n = 123)

Each of these groups shows the number of respondents (n) and was summarized to display the count and percentage of “Yes” and “No” responses as well as the 95% confidence intervals (CI).

Summary Statistics and Visual Insights

• Pro performers, that don’t use RMT devices (n = 780):

• The data are divided into two response groups. For example, suppose the analysis revealed that a smaller percentage (around 18.8%) of these performers indicated “Yes” (i.e., they consider performance income as their primary source) and a larger percentage (around 81.2%) said “No”.

• The calculated confidence intervals describe the precision of these percentage estimates.

• Pro performers that use RMT devices (n = 152):

• Here, responses appear more evenly split (e.g., close to 45% “Yes” versus 55% “No”) compared to their non-RMT counterparts.

• This might indicate that professionals using RMT devices are more likely to rely on performance income or, at least, there is greater variation in their responses.

• Pro teachers that don’t use RMT devices (n = 389):

• A higher percentage might be observed in the “Yes” category—suggesting a stronger reliance on teaching income as a primary income source in this subgroup. For instance, this group might show around 56.6% responding “Yes” and 43.4% responding “No” (as per the provided summary).

• Confidence intervals again indicate the reliability of these percentages.

• Pro teachers that use RMT devices (n = 123):

• Although explicit numerical details weren’t detailed in the snippet, this subgroup’s response distribution can similarly be interpreted. A noticeable difference between the subgroups (those using versus not using RMT devices) would indicate whether use of RMT devices is associated with the likelihood of teaching income being a primary income source.

Interpretation of Results

Differences Between Users and Non-Users of RMT Devices:

The re-labeled groups help us compare, within each profession (performers and teachers), how the use of RMT devices is associated with identifying a primary income source.

• For performers, the non-users have a markedly different distribution compared to RMT device users. A lower percentage claiming performance income as primary among non-users versus a more balanced split among users may imply that RMT device usage might be linked to alternative revenue structures or different professional profiles.

• For teachers, the data suggest that non-users and users also differ in their reliance on teaching income, with non-users leaning more toward teaching income as primary.

Statistical Confidence and Variation:

The confidence intervals (CIs) calculated for each percentage give an idea of the uncertainty in these estimates. CIs that do not overlap between groups might indicate statistically significant differences between those who use RMT devices and those who do not.

Visual Communication:

The improved faceted grouped bar chart provides a clear visual breakdown:

• Faceting by Income Type: Allows for separate panels for performance and teaching income, thus making it easier to compare within and across the two professional categories.

• Error Bars and Labels: With error bars representing the 95% CI, readers can judge the precision of the percentage estimates. Data labels placed above the bars enhance readability without overlapping the bars or error bars.

Professional Insights:

• For pro performers, if a significantly larger proportion of non-RMT users indicate that performance income is not their primary income, this might suggest that non-users diversify their income more than users.

• For pro teachers, a higher reliance on teaching income in non-RMT users might reflect differences in career structures or market segmentation, possibly implying that those employing RMT devices may either supplement their income from other sources or have different teaching roles.

Final Thoughts

The analysis provides a detailed picture of how primary income identification varies by professional group and by the use of RMT devices. These insights could be of particular interest to researchers exploring the economic behaviors of creative professionals, the role of technology adoption, or market segmentation within industries. For deeper investigation, one might:

• Conduct statistical tests to examine if the differences between subgroups are statistically significant.

• Consider additional covariates: Other factors (e.g., years of experience, geographic location) might explain further variations between these groups.

• Utilize qualitative insights: Understanding why certain professionals choose to use or forgo RMT devices could enrich the quantitative observations.

This interpretation sets the stage for further in-depth analysis, linking statistical outcomes to practical, industry-specific conclusions.