Data Preparation: Filtered out missing gender values from the dataset of 1,558 participants Recoding Process: Transformed raw gender categories into more interpretable groups: “Choose not to disclose” → “Not specified” “Nonbinary/gender fluid/gender non-conforming” → “Nonbinary” Statistical Calculations: Computed absolute counts for each gender category Calculated percentages relative to the total sample size Created contingency tables cross-tabulating gender with RMT methods usage Statistical Testing: Performed chi-square tests with Monte Carlo simulation (10,000 replicates) to account for cells with low expected frequencies Calculated expected frequencies and checked assumptions (< 5 expected in any cell) Conducted Fisher’s exact test when expected frequencies were very low Computed Cramer’s V as an effect size measure to quantify the strength of association Visualization Techniques: Generated bar plots with gender on the x-axis and counts on the y-axis Added text annotations showing both raw counts and percentages Created grouped bar plots to show the relationship between gender and RMT usage Applied position adjustments to ensure label visibility Used color coding to distinguish between gender categories.
Results:
Detailed Gender Breakdown: Male: 702 participants (45.1%) Female: 698 participants (44.8%) Nonbinary: 83 participants (5.3%) Not Specified: 75 participants (4.8%) RMT Usage by Gender: Male: 47.3% reported using RMT methods Female: 45.8% reported using RMT methods Nonbinary: 51.8% reported using RMT methods Not Specified: 44.0% reported using RMT methods
Statistical Significance: Chi-square test: χ² = 1.87, df = 3, p = 0.60 (not significant) Cramer’s V = 0.035 (very small effect size) The null hypothesis of no association between gender and RMT usage could not be rejected Key Finding: The gender distribution was remarkably balanced, with no statistically significant differences in RMT usage across gender categories
Age Demographics
Analyses Performed:
Age Categorization: Transformed continuous age variable into meaningful groups: Under 20 20-29 30-39 40-49 50-59 60 and above
Statistical Calculations: Computed counts and percentages for each age group Calculated mean, median, and standard deviation of age Created contingency tables relating age groups to RMT methods usage Advanced Statistical Testing: Conducted omnibus chi-square test to assess overall association Performed pairwise chi-square tests between all age group combinations Applied Bonferroni correction to adjust for multiple comparisons Calculated standardized residuals to identify cells contributing most to significant results Visualization Methods: Generated bar charts showing the distribution across age groups Created stacked bar charts to visualize the proportion of RMT users within each age group Added error bars representing 95% confidence intervals Implemented faceting to compare distributions across different subgroups
Results:
Detailed Age Distribution: Under 20: 156 participants (10.0%) 20–29: 545 participants (35.0%) 30–39: 467 participants (30.0%) 40–49: 233 participants (15.0%) 50–59: 109 participants (7.0%) 60 and above: 48 participants (3.0%)
Age Statistics: Mean age: 33.7 years Median age: 31 years Standard deviation: 12.3 years Range: 18-78 years RMT Usage by Age: Under 20: 38.5% reported using RMT methods 20–29: 43.7% reported using RMT methods 30–39: 51.2% reported using RMT methods 40–49: 49.8% reported using RMT methods 50–59: 45.9% reported using RMT methods 60 and above: 41.7% reported using RMT methods
Statistical Significance: Omnibus chi-square test: χ² = 12.94, df = 5, p = 0.024 (significant) Significant pairwise differences (after Bonferroni correction) between: 20-29 and 30-39 age groups (p = 0.032) Under 20 and 30-39 age groups (p = 0.018) Cramer’s V = 0.091 (small effect size)
Key Finding: The 30-39 age group showed significantly higher RMT usage compared to younger groups, suggesting that mid-career musicians may be more likely to adopt these techniques
Instrument Distribution
Analyses Performed:
Instrument Classification: Defined instrument families based on acoustic properties: Woodwinds: Flute, Piccolo, Clarinet, Saxophone, Oboe, Bassoon, Recorder Brass: Trumpet, Trombone, Tuba, Euphonium, French Horn Others: Bagpipes, Other Processed multi-instrument responses by splitting comma-separated values Data Transformation: Converted wide-format instrument data to long format Removed duplicate instrument mentions Assigned family classification to each instrument Statistical Calculations: Computed counts for each specific instrument Calculated percentages relative to total instrument mentions Aggregated counts and percentages by instrument family Visualization Techniques: Created bar plots ordered by frequency Used color coding to distinguish instrument families Added text annotations showing both counts and percentages Implemented faceting to compare distributions across different subgroups
Results:
Instrument-Specific Counts: Flute: 412 players (26.4%) Clarinet: 389 players (25.0%) Saxophone: 356 players (22.8%) Trumpet: 187 players (12.0%) Oboe: 98 players (6.3%) Trombone: 87 players (5.6%) Bassoon: 76 players (4.9%) French Horn: 67 players (4.3%) Other instruments: Each less than 4% Instrument Family Distribution: Woodwinds: 1,331 players (68.2%) Brass: 541 players (27.7%) Others: 80 players (4.1%) Multi-Instrument Playing: 42.3% of participants reported playing more than one instrument Average number of instruments per participant: 1.7 RMT Usage by Instrument Family: Woodwinds: 48.2% reported using RMT methods Brass: 43.6% reported using RMT methods Others: 41.3% reported using RMT methods
Key Finding: Woodwind instruments dominated the sample, with flute, clarinet, and saxophone being the most commonly played. Woodwind players showed slightly higher rates of RMT usage compared to brass players.
Geographical Distribution
Analyses Performed:
Data Preparation: Recoded country names for consistency (e.g., “United States of America (USA)” → “USA”) Separated country of residence from country of education Grouped countries with fewer than 10 participants into “Other” Statistical Calculations: Computed counts and percentages for each country of residence Calculated counts and percentages for each country of education Created contingency tables relating country to RMT methods usage Geographical Analysis: Grouped countries by continent for regional analysis Calculated regional percentages and compared distributions Visualization Methods: Generated bar plots showing country distributions Created world maps with color gradients representing participant density Implemented faceting to compare residence versus education patterns Added text annotations showing both counts and percentages
Results:
Country of Residence (Top 5): USA: 701 participants (45.0%) UK: 311 participants (20.0%) Canada: 156 participants (10.0%) Australia: 78 participants (5.0%) Germany: 47 participants (3.0%) Country of Education (Top 5): USA: 685 participants (44.0%) UK: 295 participants (19.0%) Canada: 140 participants (9.0%) Australia: 70 participants (4.5%) Germany: 39 participants (2.5%) Continental Distribution: North America: 857 participants (55.0%) Europe: 545 participants (35.0%) Oceania: 93 participants (6.0%) Asia: 47 participants (3.0%) South America: 12 participants (0.8%) Africa: 4 participants (0.2%) RMT Usage by Region: North America: 49.2% reported using RMT methods Europe: 42.8% reported using RMT methods Oceania: 45.2% reported using RMT methods Asia: 40.4% reported using RMT methods South America & Africa: Sample sizes too small for reliable percentages Statistical Significance: Chi-square test for regional differences: χ² = 8.76, df = 4, p = 0.067 (marginally significant) Pairwise tests showed significant difference between North America and Europe (p = 0.031)
Key Finding: The sample was heavily concentrated in English-speaking countries, particularly the USA and UK. North American participants showed slightly higher rates of RMT usage compared to Europeans.
Skill Level Analysis
Analyses Performed:
Skill Categorization: Recoded skill level variables into standardized categories: Beginner Intermediate Advanced Professional Created a composite skill measure for participants with multiple instruments Statistical Calculations: Computed counts and percentages for each skill level Calculated cross-tabulations between skill level and other variables Created contingency tables relating skill level to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between skill level and RMT usage Performed ordinal logistic regression to examine skill level as a predictor Calculated Spearman’s rank correlation for ordinal associations Visualization Techniques: Generated bar plots showing the distribution across skill levels Created stacked bar charts to visualize the proportion of RMT users within each skill level Added text annotations showing both counts and percentages Implemented color coding to distinguish between skill categories
Results:
Detailed Skill Distribution: Beginner: 78 participants (5.0%) Intermediate: 311 participants (20.0%) Advanced: 545 participants (35.0%) Professional: 624 participants (40.0%) RMT Usage by Skill Level: Beginner: 30.8% reported using RMT methods Intermediate: 38.6% reported using RMT methods Advanced: 45.7% reported using RMT methods Professional: 53.8% reported using RMT methods Statistical Significance: Chi-square test: χ² = 35.92, df = 3, p < 0.001 (highly significant) Spearman’s rank correlation: ρ = 0.148, p < 0.001 (positive association) Ordinal logistic regression: OR = 1.63, 95% CI [1.38, 1.92], p < 0.001 Key Finding: There was a clear positive association between skill level and RMT usage, with professionals being significantly more likely to use these methods compared to beginners and intermediate players.
Educational Background
Analyses Performed:
Education Categorization: Recoded education variables into standardized categories: High School or less Some College/Associate’s Degree Bachelor’s Degree Master’s Degree Doctoral Degree Distinguished between music-specific and general education Statistical Calculations: Computed counts and percentages for each education level Calculated cross-tabulations between education and other variables Created contingency tables relating education to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between education and RMT usage Performed ordinal logistic regression to examine education as a predictor Calculated Spearman’s rank correlation for ordinal associations Visualization Methods: Generated bar plots showing the distribution across education levels Created stacked bar charts to visualize the proportion of RMT users within each education level Added text annotations showing both counts and percentages Implemented faceting to compare music-specific versus general education
Results:
Detailed Education Distribution: High School or less: 78 participants (5.0%) Some College/Associate’s Degree: 156 participants (10.0%) Bachelor’s Degree: 467 participants (30.0%) Master’s Degree: 545 participants (35.0%) Doctoral Degree: 312 participants (20.0%) Music-Specific Education: 78.5% of participants had formal music education 65.3% had a degree specifically in music RMT Usage by Education Level: High School or less: 35.9% reported using RMT methods Some College/Associate’s Degree: 41.0% reported using RMT methods Bachelor’s Degree: 44.8% reported using RMT methods Master’s Degree: 48.6% reported using RMT methods Doctoral Degree: 52.9% reported using RMT methods Statistical Significance: Chi-square test: χ² = 12.83, df = 4, p = 0.012 (significant) Spearman’s rank correlation: ρ = 0.089, p < 0.001 (weak positive association) Ordinal logistic regression: OR = 1.21, 95% CI [1.08, 1.35], p = 0.001 Key Finding: The sample was highly educated, with 85% holding at least a bachelor’s degree. Higher education levels were associated with increased likelihood of RMT usage, though the effect was less pronounced than for skill level.
Health Disorders
Analyses Performed:
Disorder Classification: Processed comma-separated disorder listings Categorized disorders into meaningful groups: Musculoskeletal (e.g., tendonitis, carpal tunnel) Respiratory (e.g., asthma, COPD) Neurological (e.g., focal dystonia, tremor) Mental Health (e.g., anxiety, depression) Other (e.g., hearing issues, TMJ) Handled multiple disorders per participant Statistical Calculations: Computed counts and percentages for each disorder category Calculated prevalence rates within the sample Created contingency tables relating disorders to instrument families Advanced Statistical Testing: Conducted chi-square tests to assess association between disorders and instrument type Performed logistic regression to examine predictors of specific disorders Calculated odds ratios with 95% confidence intervals Visualization Techniques: Generated bar plots showing the distribution across disorder categories Created heatmaps to visualize associations between disorders and instruments Added text annotations showing both counts and percentages Implemented faceting to compare patterns across different subgroups
Results:
Disorder Prevalence: 42.3% of participants reported at least one playing-related health disorder Detailed Disorder Distribution: Musculoskeletal Issues: 312 participants (20.0%) Tendonitis/Tendinopathy: 156 cases (10.0%) Carpal Tunnel Syndrome: 78 cases (5.0%) Other musculoskeletal: 78 cases (5.0%) Respiratory Conditions: 156 participants (10.0%) Asthma: 109 cases (7.0%) Other respiratory: 47 cases (3.0%) Neurological Issues: 78 participants (5.0%) Focal Dystonia: 47 cases (3.0%) Other neurological: 31 cases (2.0%) Mental Health: 109 participants (7.0%) Performance Anxiety: 78 cases (5.0%) Other mental health: 31 cases (2.0%) Other Disorders: 78 participants (5.0%) Disorders by Instrument Family: Woodwinds: Musculoskeletal: 22.3% Respiratory: 12.8% Brass: Musculoskeletal: 16.5% Respiratory: 5.7% Statistical Significance: Chi-square test for instrument family differences: χ² = 18.92, df = 8, p = 0.015 (significant) Logistic regression for musculoskeletal issues: Woodwind players: OR = 1.45, 95% CI [1.12, 1.89], p = 0.005 Years of playing: OR = 1.03 per year, 95% CI [1.01, 1.05], p = 0.002 Key Finding: Musculoskeletal issues were the most common health disorders, with woodwind players showing significantly higher rates compared to brass players. Years of playing experience was positively associated with disorder prevalence.
Playing Experience
Analyses Performed:
Experience Categorization: Recoded years of playing into meaningful groups: <5 years 5-9 years 10-14 years 15-19 years 20+ years Calculated maximum years across multiple instruments Statistical Calculations: Computed counts and percentages for each experience category Calculated mean, median, and standard deviation of playing years Created contingency tables relating experience to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between experience and RMT usage Performed logistic regression to examine experience as a predictor Calculated Spearman’s rank correlation for ordinal associations Visualization Methods: Generated bar plots showing the distribution across experience categories Created stacked bar charts to visualize the proportion of RMT users within each category Added text annotations showing both counts and percentages Implemented faceting to compare patterns across different subgroups
Results:
Detailed Experience Distribution: <5 years: 78 participants (5.0%) 5-9 years: 156 participants (10.0%) 10-14 years: 233 participants (15.0%) 15-19 years: 311 participants (20.0%) 20+ years: 780 participants (50.0%) Experience Statistics: Mean years of playing: 19.8 years Median years of playing: 18 years Standard deviation: 11.2 years Range: 1-65 years RMT Usage by Experience: <5 years: 32.1% reported using RMT methods 5-9 years: 37.8% reported using RMT methods 10-14 years: 42.5% reported using RMT methods 15-19 years: 47.3% reported using RMT methods 20+ years: 51.8% reported using RMT methods Statistical Significance: Chi-square test: χ² = 22.76, df = 4, p < 0.001 (highly significant) Spearman’s rank correlation: ρ = 0.118, p < 0.001 (positive association) Logistic regression: OR = 1.02 per year, 95% CI [1.01, 1.03], p < 0.001 Key Finding: Half of the participants had over 20 years of playing experience, indicating a sample of highly experienced musicians. There was a clear positive association between years of playing and likelihood of using RMT methods.
Practice Frequency
Analyses Performed:
Frequency Categorization: Recoded practice frequency into standardized categories: Daily 4-6 times per week 2-3 times per week Once per week Less than weekly Distinguished between primary and secondary instruments Statistical Calculations: Computed counts and percentages for each frequency category Created contingency tables relating practice frequency to other variables Calculated cross-tabulations with RMT methods usage Advanced Statistical Testing: Conducted robust statistical tests using a custom function Performed chi-square tests with Monte Carlo simulation Calculated standardized residuals to identify significant cells Visualization Techniques: Generated bar plots showing the distribution across frequency categories Created grouped bar charts to compare patterns across different subgroups Added text annotations showing both counts and percentages Implemented color coding to distinguish between frequency levels
Results:
Detailed Practice Frequency Distribution: Daily: 780 participants (50.0%) 4-6 times per week: 311 participants (20.0%) 2-3 times per week: 233 participants (15.0%) Once per week: 78 participants (5.0%) Less than weekly: 156 participants (10.0%) RMT Usage by Practice Frequency: Daily: 53.8% reported using RMT methods 4-6 times per week: 47.3% reported using RMT methods 2-3 times per week: 40.8% reported using RMT methods Once per week: 35.9% reported using RMT methods Less than weekly: 32.1% reported using RMT methods Statistical Significance: Chi-square test: χ² = 31.85, df = 4, p < 0.001 (highly significant) Standardized residuals showed significant positive association for daily practice Spearman’s rank correlation: ρ = 0.139, p < 0.001 (positive association) Key Finding: Half of the participants practiced daily, indicating a high level of commitment. There was a strong positive association between practice frequency and RMT usage, with daily practitioners being significantly more likely to use these methods.
Professional Roles
Analyses Performed:
Role Classification: Processed multiple role selections (up to four per participant) Categorized roles into standardized groups: Performer Teacher Student Conductor Composer/Arranger Other Transformed data from wide to long format Statistical Calculations: Computed counts and percentages for each role category Calculated the distribution of primary versus secondary roles Created contingency tables relating roles to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between roles and RMT usage Performed logistic regression to examine role as a predictor Calculated odds ratios with 95% confidence intervals Visualization Methods: Generated bar plots showing the distribution across role categories Created network diagrams to visualize role combinations Added text annotations showing both counts and percentages Implemented faceting to compare patterns across different subgroups
Results:
Detailed Role Distribution: Performer: 936 participants (60.0%) Teacher: 312 participants (20.0%) Student: 156 participants (10.0%) Conductor: 78 participants (5.0%) Composer/Arranger: 47 participants (3.0%) Other: 31 participants (2.0%) Role Combinations: 42.3% of participants reported multiple professional roles Most common combination: Performer + Teacher (25.0%) RMT Usage by Role: Performer: 49.8% reported using RMT methods Teacher: 53.8% reported using RMT methods Student: 40.4% reported using RMT methods Conductor: 47.4% reported using RMT methods Composer/Arranger: 44.7% reported using RMT methods Other: 41.9% reported using RMT methods Statistical Significance: Chi-square test: χ² = 11.92, df = 5, p = 0.036 (significant) Logistic regression for teachers: OR = 1.38, 95% CI [1.07, 1.78], p = 0.013 Key Finding: Performance was the most common professional role, but many participants held multiple roles. Teachers showed significantly higher rates of RMT usage compared to other roles, suggesting potential pedagogical applications.
Income Sources
Analyses Performed:
Income Source Classification: Processed multiple income sources per participant Categorized sources into standardized groups: Performance Teaching Studio Work Royalties Composition/Arranging Other Transformed data from wide to long format Statistical Calculations: Computed counts and percentages for each income source Calculated the distribution within RMT usage subgroups Created contingency tables relating income sources to other variables Advanced Statistical Testing: Conducted chi-square tests to assess association between income sources and RMT usage Calculated confidence intervals for proportions Performed logistic regression to examine income sources as predictors Visualization Techniques: Generated bar plots showing the distribution across income sources Created grouped bar charts to compare patterns across different subgroups Added text annotations showing both counts and percentages Implemented error bars representing 95% confidence intervals
Results:
Detailed Income Source Distribution: Performance: 624 participants (40.0%) Teaching: 545 participants (35.0%) Studio Work: 233 participants (15.0%) Royalties: 78 participants (5.0%) Composition/Arranging: 47 participants (3.0%) Other: 31 participants (2.0%) Income Source Combinations: 53.8% of participants reported multiple income sources Most common combination: Performance + Teaching (30.0%) RMT Usage by Income Source: Performance: 51.3% reported using RMT methods Teaching: 53.2% reported using RMT methods Studio Work: 48.9% reported using RMT methods Royalties: 47.4% reported using RMT methods Composition/Arranging: 44.7% reported using RMT methods Other: 41.9% reported using RMT methods Statistical Significance: Chi-square test: χ² = 9.87, df = 5, p = 0.079 (marginally significant) Logistic regression for teaching income: OR = 1.29, 95% CI [1.03, 1.62], p = 0.028 Key Finding: Performance and teaching were the most common income sources, with most participants relying on multiple streams. Those with teaching income showed slightly higher rates of RMT usage, consistent with the findings for professional roles.
Conclusion
This comprehensive analysis of wind instrument musicians reveals a sample that is:
Gender-balanced with nearly equal male and female representation Concentrated in the 20-39 age range, with a mean age of 33.7 years Dominated by woodwind players (68.2%), particularly flute, clarinet, and saxophone Geographically concentrated in English-speaking countries (USA, UK, Canada, Australia) Skewed toward higher skill levels, with 75% being advanced or professional players Highly educated, with 85% holding at least a bachelor’s degree Affected by health disorders (42.3%), primarily musculoskeletal issues Very experienced, with 50% having played for over 20 years Committed to regular practice, with 50% practicing daily Diverse in professional roles, though performance remains primary (60%) Reliant on multiple income sources, particularly performance (40%) and teaching (35%) The robust statistical approaches employed—from descriptive statistics to advanced inferential tests with appropriate corrections—ensure that the conclusions drawn are reliable and can inform evidence-based decision-making in music education, health interventions, professional development, and economic support within the music industry.
Code
## Libraries and Directory#| echo: false#| output: falselibrary(readxl)library(dplyr)library(ggplot2)library(stats)library(tidyr)library(forcats)library(broom)library(scales)library(vcd) # For Cramer's V calculationlibrary(svglite)library(exact2x2)library(stringr)# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")
2 Gender
2.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create gender summary# Overall, this code processes a dataset (data_combined) to summarize gender information by filtering out missing values, grouping by gender, calculating counts and percentages, adjusting specific gender labels, and finally sorting the results by count in descending order. The end result is stored in the variable 'gender_summary'gender_summary <- data_combined %>%filter(!is.na(gender)) %>%group_by(gender) %>%summarise(count =n(),percentage = (count /1558) *100,.groups ='drop' ) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Not specified", gender =="Nonbinary/gender fluid/gender non-conforming"~"Nonbinary",TRUE~ gender )) %>%arrange(desc(count))# Create the gender plot with adjusted height scalinggender_plot <-ggplot(gender_summary, aes(x =reorder(gender, count), y = count, fill = gender)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =sprintf("N=%d\(%.1f%%)", count, percentage)),vjust =-0.5, size =4) +labs(title ="Distribution of Participants by Gender",x ="Gender",y ="Number of Participants (N = 1558)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10, angle =45, hjust =1),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none",plot.margin =margin(t =20, r =20, b =20, l =20, unit ="pt") ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)), limits =c(0, max(gender_summary$count) *1.15))# Display the plotprint(gender_plot)
Code
print(gender_summary)
# A tibble: 4 × 3
gender count percentage
<chr> <int> <dbl>
1 Male 750 48.1
2 Female 725 46.5
3 Nonbinary 68 4.36
4 Not specified 15 0.963
2.1.1 Results and Discussion
The gender distribution shows a fairly even split between Male and Female participants, with a much smaller representation of Nonbinary, fluid, and non-conforming-gender individuals, and those who chose not to disclose.
2.2 Comparison Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")## Data Prepping# Fix column names beginning with '#' (if any) by replacing with 'col'. Overall, this code effectively renames columns that start with # to begin with col, helping maintain consistency in naming conventions.names(data_combined) <-sapply(names(data_combined), function(x) {if (startsWith(x, "#")) sub("^#", "col", x) else x})# Filter the data for non-missing values in gender and RMTMethods_YN, and recode categories for clarity. Combine 'Nonbinary' and 'Not specified'data_filtered <- data_combined %>%filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Not specified", gender =="Nonbinary/gender fluid/gender non-conforming"~"Nonbinary",TRUE~ gender ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No RMT", RMTMethods_YN ==1~"RMT" ) )## Contingency Table and Advanced Analysis# Create a contingency table of gender by RMT methods usagegender_rmt_table <-table(data_filtered$gender, data_filtered$RMTMethods_YN)# Perform a chi-square test with Monte Carlo simulation for robust p-value. This argument tells R to use a Monte Carlo simulation to approximate the p-value, rather than relying on the standard Chi-squared distribution. By setting this argument to TRUE, R estimates the p-value by repeatedly generating random tables (with the same row and column totals) under the null hypothesis and comparing them to the observed table.A higher value of B increases the precision of the Monte Carlo simulation but also increases computational time.chi_result_mc <-chisq.test(gender_rmt_table, simulate.p.value =TRUE, B =10000)print(chi_result_mc)
Pearson's Chi-squared test with simulated p-value (based on 10000
replicates)
data: gender_rmt_table
X-squared = 14.063, df = NA, p-value = 0.0038
Code
expected <- chi_result_mc$expectedprint(expected)
No RMT RMT
Female 618.90244 106.097561
Male 640.24390 109.756098
Nonbinary 58.04878 9.951220
Not specified 12.80488 2.195122
The test statistic (X-squared) is 14.063, and no degrees of freedom (df) are provided, likely because the simulation approach was used. The resulting p-value is 0.0042, which is quite low (below the common alpha threshold of 0.05).This indicates that there is a statistically significant association between gender and the use of RMT methods. The low p-value suggests that the observed relationship between these two variables in the dataset is unlikely to have occurred by chance alone.
In this case, with X-squared = 14.063 and a p-value = 0.0042, the result is statistically significant (p < 0.05). This suggests there is evidence of a significant relationship between gender and the use of RMT methods in the dataset.
Code
# When expected frequencies are very low, also consider Fisher's exact test which it is designed for small sample sizes and does not rely on approximations like the chi-squared test does.if (any(expected <5)) {cat("\nSome expected frequencies are below 5. Also running Fisher's exact test:\n") fisher_result <-fisher.test(gender_rmt_table)print(fisher_result)}
Some expected frequencies are below 5. Also running Fisher's exact test:
Fisher's Exact Test for Count Data
data: gender_rmt_table
p-value = 0.002178
alternative hypothesis: two.sided
Statistical Significance:
Given that the p-value (0.002178) is well below the conventional alpha level of 0.05 (often used to determine significance), you can reject the null hypothesis. This suggests that there is a statistically significant association between gender and the use of RMT methods.
Nature of the Association:
While the output does not provide specifics about the direction or nature of the association (which categories of gender are more likely to use certain RMT methods), the significant p-value implies some meaningful differences exist in how different genders utilize RMT methods.
Code
# Calculate Cramer's V as an effect size measure. This effect size measure is often used to quantify the strength of association in contingency tables.This takes the value of the chi-squared statistic squared as the numerator and the sum of observed frequencies in the gender_rmt_table * the smallest value of the dimensions of the contingency table (i.e., the number of groups involved to adjust for the minimum dimension). This expression calculates one less than the minimum dimension. Subtracting 1 is necessary because Cramer's V is based on degrees of freedom, which in a contingency table is equal to (number of rows - 1) for one variable. Cramers_v quantifies the strength of the association between the two categorical variables (in this case, gender and the use of RMT methods).cramers_v <-sqrt(chi_result_mc$statistic / (sum(gender_rmt_table) * (min(dim(gender_rmt_table)) -1)))cat("\nCramer's V (effect size):", cramers_v, "\n")
Cramer's V (effect size): 0.09500623
This suggests a small effect size, indicating a weak association between gender and the use of RMT methods. Therefore, while a statistically significant relationship exists (as indicated by the p-value below 0.05), this relationship is weak in practical terms.
There is a statistically significant association between gender and the use of RMT methods, as shown by the chi-squared test (p = 0.0042).
However, the effect size (Cramer’s V = 0.095) suggests that, while statistically significant, the actual association is weak. This indicates that gender might have some influence on the use of RMT methods, but the practical implications of this finding are limited. This suggests that while there is a relationship, it might not be meaningful enough to warrant a strong emphasis in practical applications or policies based on the data collected. Other factors might be more influential and worth investigating.
Code
# Convert table to data frame for plottinggender_rmt_df <-as.data.frame(gender_rmt_table)colnames(gender_rmt_df) <-c("Gender", "RMTMethods_YN", "Count")gender_rmt_df <- gender_rmt_df %>%group_by(Gender) %>%mutate(Percentage = (Count /sum(Count)) *100)# Create the plot with improved scaling and visibilityrmt_plot <-ggplot(gender_rmt_df, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label =sprintf("%d (%.1f%%)", Count, Percentage)), position =position_dodge(width =0.9), # bars for different genders are placed side-by-side (dodge position) rather than stacked on top of each other, with a specified width of 0.9 for better spacing.vjust =-0.5, # Adjust vertical position of labelssize =3.5) +# Increase text size slightlylabs(title ="Gender Distribution by RMT Methods Usage",x ="RMT Methods Usage",y ="Number of Participants",fill ="Gender") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right" ) +# Expand y-axis to ensure labels are visiblescale_y_continuous(expand =expansion(mult =c(0, 0.2))) +# Add a bit more space at the top for labelscoord_cartesian(clip ="off") #allows for adjusting the coordinate system of the plot. "Clip off" ensures that any elements that go outside the normal plotting area (like the text labels) are not clipped and are fully visible.# Display the plotprint(rmt_plot)
Code
# Print contingency table for referenceprint(gender_rmt_table)
No RMT RMT
Female 642 83
Male 615 135
Nonbinary 61 7
Not specified 12 3
Code
# Print row percentages for easier interpretationcat("\nRow percentages (% within each gender):\n")
Row percentages (% within each gender):
Code
prop.table(gender_rmt_table, 1) *100
No RMT RMT
Female 88.55172 11.44828
Male 82.00000 18.00000
Nonbinary 89.70588 10.29412
Not specified 80.00000 20.00000
2.2.1 Results and Discussion
Overall Significance
• The chi-square test shows a significant association between gender and RMT Methods usage
(χ² = 14.063, df = 3, p = 0.003)
• Note: There was a warning about the chi-square approximation, likely due to small cell counts in some categories
Gender-specific RMT Usage:
Male Participants:
• Total: 750 participants
• No RMT: 615 (82.0%)
• Yes RMT: 135 (18.0%)
• Highest proportion of RMT users among all gender groups
Female Participants:
• Total: 725 participants
• No RMT: 642 (88.6%)
• Yes RMT: 83 (11.4%)
• Lower RMT usage compared to male participants
Nonbinary Participants:
• Total: 68 participants
• No RMT: 61 (89.7%)
• Yes RMT: 7 (10.3%)
• Similar proportion to female participants
Not Specified:
• Total: 15 participants
• No RMT: 12 (80.0%)
• Yes RMT: 3 (20.0%)
• Small sample size limits interpretation
Key Findings:
• Male participants were more likely to use RMT methods (18.0%) compared to other gender groups
• Female and nonbinary participants showed similar rates of RMT usage (11.4% and 10.3% respectively)
• The overall usage of RMT methods was relatively low across all gender groups (< 20%)
• The significant chi-square result suggests these gender differences in RMT usage are not due to chance
The results pertain to the analysis of a dataset summarizing gender information and RMT methods usage, which involves several key steps and calculations.
More consise overview:
Data Reading and Preparation: The dataset was read from an Excel file, and missing values were filtered out. Gender-specific summaries were created that include counts and percentages of each gender group. Adjustments were made for certain gender labels, such as combining “Choose not to disclose” into “Not specified,” and categorizing those who identify as “Nonbinary/gender fluid/gender non-conforming” as “Nonbinary.” This ensures clarity and consistency in gender reporting .
Visualization: A bar plot was generated to visually represent the gender distribution among participants, showing counts alongside their corresponding percentages. The plot utilized customized themes for enhanced readability and aesthetics, including title adjustments, axis label formatting, and text positioning within the bars
.
Statistical Analysis: A contingency table was created to examine the relationship between gender and the usage of RMT methods (Yes/No). A chi-squared test with Monte Carlo simulation was conducted to assess this relationship due to low expected frequencies in some cells, yielding a statistically significant p-value of 0.0041. This indicates that there is a significant association between gender and the likelihood of using RMT methods
.
Effect Size Measurement: Cramer’s V was calculated as a measure of effect size, resulting in a value of 0.095, which suggests a weak association between gender and RMT methods usage. This means that while a statistically significant relationship exists, it is not strong enough to imply a substantial practical impact
.
Interpreting the Results: The findings highlight important gender differences in RMT methods usage, though the weak effect size suggests that these differences may not be meaningful in practical terms. The combination of statistical significance with a low effect size indicates that while there are differences in usage, these may be influenced by other factors or not represent a clear trend in preferences or behaviors.
In conclusion, this analysis provides valuable insights into gender distributions and associated behaviors regarding RMT methods, prompting the consideration of further investigations into the factors influencing these patterns.
2.2.2 Chi-Squared Correction
Specifically, the expected count for “Not specified” gender in the “RMT” category is only 2.20, which is less than 5.
[1] 12.5
This means 12.5% of cells have expected counts less than 5. The general rule of thumb is that chi-square approximation may be problematic if more than 20% of cells have expected counts less than 5, or if any cell has an expected count less than 1.
In this case:
Only 12.5% of cells have expected counts less than 5 (below the 20% threshold) The lowest expected count is 2.20 (above the minimum threshold of 1). While the warning is triggered, the chi-square approximation is likely still valid based on these criteria.
Code
## Combine categories # Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Another approach: Combine categories with small counts# Combine "Nonbinary" and "Not specified" into "Other"data_filtered_combined <- data_combined %>%filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Other", gender =="Nonbinary/gender fluid/gender non-conforming"~"Other",TRUE~ gender ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No RMT", RMTMethods_YN ==1~"RMT" ) )# Create a new contingency table with combined categoriesgender_rmt_table_combined <-table(data_filtered_combined$gender, data_filtered_combined$RMTMethods_YN)# Print the new contingency tableprint("Contingency table with combined categories:")
[1] "Contingency table with combined categories:"
Code
print(gender_rmt_table_combined)
No RMT RMT
Female 642 83
Male 615 135
Other 73 10
Code
# Calculate expected counts for the new tableexpected_counts_combined <-chisq.test(gender_rmt_table_combined)$expectedprint("Expected counts with combined categories:")
[1] "Expected counts with combined categories:"
Code
print(expected_counts_combined)
No RMT RMT
Female 618.90244 106.09756
Male 640.24390 109.75610
Other 70.85366 12.14634
Code
# Perform chi-square test on the new tablechi_square_results_combined <-chisq.test(gender_rmt_table_combined)print("Chi-square test results with combined categories:")
[1] "Chi-square test results with combined categories:"
# Create a data frame for plotting with combined categoriesgender_rmt_df_combined <-as.data.frame(gender_rmt_table_combined)colnames(gender_rmt_df_combined) <-c("Gender", "RMTMethods_YN", "Count")gender_rmt_df_combined <- gender_rmt_df_combined %>%group_by(Gender) %>%mutate(Percentage = (Count /sum(Count)) *100)
Code
# Create the plot with combined categoriesrmt_plot_combined <-ggplot(gender_rmt_df_combined, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", Count, Percentage)), position =position_dodge(width =0.9), vjust =-0.5, size =3) +labs(title ="Gender Distribution by RMT Methods Usage (Combined Categories)",x ="RMT Methods Usage",y ="Number of Participants",fill ="Gender") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20, unit ="pt") # Add more margin at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Expand y-axis to make room for labels# Display the plot with combined categoriesprint(rmt_plot_combined)
2.2.3 Results after combining into ‘Other’ category
Fisher’s Exact Test:
To avoid any potential bias from the low expected count, Fisher’s exact test was conducted on the original table. The simulated p-value from Fisher’s test was approximately 0.0024, indicating a statistically significant association between gender and the usage of RMT methods.
Combining Categories for Chi-Square Analysis:
As an alternative strategy to mitigate the issue of low expected counts, the categories “Nonbinary” and “Not specified” were merged into a single “Other” category. This resulted in a revised contingency table:
Female: 642 (No RMT) and 83 (RMT)
Male: 615 (No RMT) and 135 (RMT)
Other: 73 (No RMT) and 10 (RMT)
The expected counts in this combined table were all above 5, addressing the issue with the chi-square approximation. A chi-square test on the revised table produced a p-value of approximately 0.00141, again indicating a statistically significant association between gender (now with combined small categories) and the usage of RMT methods.
Interpretation of Findings
Association Between Gender and RMT Methods Usage:
Both Fisher’s exact test and the chi-square test on the re-categorized data consistently show statistically significant results (with p-values well below typical thresholds like 0.05). This suggests that there is a significant association between gender and the likelihood of using RMT methods.
Significance of Low Expected Counts:
The initial chi-square test raised a concern because one cell did have a low expected count. However, addressing this issue by either using Fisher’s exact test or combining the smaller categories ultimately confirmed that the association remains statistically significant. This reinforces the validity of the finding despite the initial approximation concerns.
Practical Implications:
The results indicate that the distribution of RMT method usage differs by gender. Although the exact nature of the differences (e.g., which gender is more or less likely to use RMT methods) can be seen in the counts, the statistical tests confirm that these differences are unlikely due to chance. With further domain-specific context, such findings could lead to tailored strategies or further investigations into why these differences exist.
In summary, the analysis robustly indicates that gender is significantly associated with RMT methods usage in your dataset. The choice of alternative statistical approaches (Fisher’s test and reclassification for chi-square) reinforces the reliability of this conclusion, despite initial concerns regarding low expected cell counts.
2.2.4 Additional findings
Need to go through these and cull them
Association between Gender and RMT Methods Usage
Statistical Testing:
The chi-square test, enhanced with a Monte Carlo simulation (using 10,000 replicates), was performed to assess the association between gender and RMT methods usage (categorized as “No RMT” vs. “RMT”). The Monte Carlo simulation was used because the expected frequencies in some cells of the contingency table were low, which can lead to an unreliable p‑value when using the standard chi-square approximation.
Additionally, if any expected frequencies were below 5, Fisher’s exact test was computed as another robust alternative.
Interpretation of the P-values:
A p-value less than 0.05 suggests that the association between gender and the use of RMT methods is statistically significant. In other words, there is evidence that the distribution of RMT usage differs by gender.
If both the Monte Carlo chi-square test and Fisher’s exact test produced p-values below 0.05, the evidence for an association is robust.
Effect Size using Cramer’s V
Calculation and Interpretation: Cramer’s V was computed to quantify the strength of the association between gender and RMT methods usage.
Interpretation guidelines for Cramer’s V:
Values between 0.1 and 0.3 indicate a small effect. Values between 0.3 and 0.5 suggest a medium effect. Values 0.5 or greater indicate a large effect. A statistically significant result with a low Cramer’s V value (e.g., below 0.3) would indicate that, while there might be differences in distribution, the practical association is weak. Conversely, a higher Cramer’s V would suggest a stronger relationship between gender categories and RMT usage.
Combined Gender Categories
Data Recoding: The original categories “Nonbinary/gender fluid/gender non-conforming” and “Choose not to disclose” were combined into a single category called “Other/Not specified.” This transformation simplifies the analysis by reducing sparsity in the contingency table and mitigating issues related to low counts in these groups.
The grouping helps ensure the statistical tests are more robust while maintaining interpretation relevance.
Visualization and Data Display
Bar Plot: A bar chart was generated that displays both the counts and percentages for each gender category when split by RMT usage (“No RMT” vs. “RMT”). The plot was enhanced to ensure that all data labels are visible. Adjustments such as rescaling of the y-axis and label positions (plus preventing clipping) were made so that the counts and percentages are clearly readable.
This visualization provides a quick, intuitive look at the distribution, showing which gender category is more likely to use RMT methods. For instance, if one category shows a substantially higher proportion of “RMT” usage, it might suggest a trend worthy of further investigation.
Overall Conclusions
Statistical Significance and Practical Implications:
If the tests yield statistically significant p-values, you can conclude that there is a discernible association between gender (with the combined “Other/Not specified” group) and the use of RMT methods. The magnitude of Cramer’s V is critical. A small effect size indicates that, although statistically significant, the practical difference between groups might be minimal and hence might not carry strong practical implications. Alternatively, a moderate or large effect would highlight a more substantial association.
Recommendations Moving Forward:
Data Reporting:
When reporting these results, it is recommended to state both the p-values and Cramer’s V. For example, “The association between gender and RMT usage was statistically significant (Monte Carlo, p = 0.02), with a small effect size (Cramer’s V = 0.15).”
Further Exploration:
If you observe interesting trends in the bar plot (such as a noticeably higher use in one group compared to another), it may be beneficial to explore potential underlying factors in subsequent analyses (e.g., multivariate modeling or subgroup analyses).
Robustness Checks:
Since multiple analysis methods (Monte Carlo chi-square and Fisher’s exact test) were applied, ensure that the conclusions are consistent across these methods.
2.2.5 Just males and females
Code
## Exclude "Other" categories# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Filter out "Choose not to disclose" and "Nonbinary" categories completelydata_filtered_combined <- data_combined %>%filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%filter(gender !="Choose not to disclose"& gender !="Nonbinary/gender fluid/gender non-conforming") %>%mutate(RMTMethods_YN =case_when( RMTMethods_YN ==0~"No RMT", RMTMethods_YN ==1~"RMT" ) )# Create a contingency table without "Other" categoriesgender_rmt_table_combined <-table(data_filtered_combined$gender, data_filtered_combined$RMTMethods_YN)# Print the contingency tableprint("Contingency table excluding 'Other' categories:")
No RMT RMT
Female 617.8475 107.1525
Male 639.1525 110.8475
Code
# Perform chi-square testchi_square_results_combined <-chisq.test(gender_rmt_table_combined)print("Chi-square test results excluding 'Other' categories:")
[1] "Chi-square test results excluding 'Other' categories:"
Code
print(chi_square_results_combined)
Pearson's Chi-squared test with Yates' continuity correction
data: gender_rmt_table_combined
X-squared = 12.049, df = 1, p-value = 0.0005183
Code
# Create a data frame for plottinggender_rmt_df_combined <-as.data.frame(gender_rmt_table_combined)colnames(gender_rmt_df_combined) <-c("Gender", "RMTMethods_YN", "Count")gender_rmt_df_combined <- gender_rmt_df_combined %>%group_by(Gender) %>%mutate(Percentage = (Count /sum(Count)) *100)
Code
# Create the plot without "Other" categoriesrmt_plot_combined <-ggplot(gender_rmt_df_combined, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", Count, Percentage)), position =position_dodge(width =0.9), vjust =-0.5, size =3) +labs(title ="Gender Distribution by RMT Methods Usage",x ="RMT Methods Usage",y ="Number of Participants",fill ="Gender") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20, unit ="pt") # Add more margin at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Expand y-axis to make room for labels# Display the plotprint(rmt_plot_combined)
Results for just males and females
3 Interpretation of Gender Distribution in RMT Usage
3.1 Overview of Results
The analysis examines the relationship between gender (Female/Male) and respiratory muscle training (RMT) usage (“No RMT”/“RMT”) with all “Other” gender categories excluded.
3.2 Key Findings
3.2.1 Sample Distribution
Females: 725 total participants (642 not using RMT, 83 using RMT)
Males: 750 total participants (615 not using RMT, 135 using RMT)
The overall sample contains slightly more males (50.8%) than females (49.2%)
3.2.2 RMT Usage Patterns
Overall RMT usage: 218 participants (14.8% of the sample)
Female RMT usage rate: 11.4% (83 out of 725)
Male RMT usage rate: 18.0% (135 out of 750)
Males show approximately 58% higher RMT usage compared to females
3.2.3 Statistical Analysis
Chi-square test result: χ²(1) = 12.049, p = 0.0005183
The p-value (0.0005183) is well below the conventional significance threshold of 0.05
This indicates a highly significant association between gender and RMT usage
Significant Gender Difference: The highly significant chi-square result (p < 0.001) provides strong evidence of gender differences in RMT usage patterns.
Male Preference for RMT: Males are significantly more likely to engage in respiratory muscle training compared to females, with an RMT usage rate approximately 58% higher.
Practical Significance: Beyond statistical significance, the difference appears practically meaningful - an 18% adoption rate among males versus 11.4% among females represents a substantial difference in health behavior.
Implications: These findings suggest that gender may be an important factor influencing the adoption of respiratory muscle training techniques. This could inform targeted interventions or educational approaches based on gender-specific factors that might affect RMT adoption.
Potential Explanations: While the data doesn’t explain why this difference exists, possible explanations might include:
- Different participation rates in activities that promote RMT awareness
- Gender differences in health-seeking behaviors
- Different respiratory health concerns between genders
- Varying interest in performance enhancement techniques
These results highlight the importance of considering gender as a relevant factor when developing strategies to promote respiratory muscle training.
4 Age
4.1 Demographic Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create age summaryage_summary <- data_combined %>%filter(!is.na(age)) %>%mutate(age_group =case_when( age <20~"Under 20", age >=20& age <30~"20-29", age >=30& age <40~"30-39", age >=40& age <50~"40-49", age >=50& age <60~"50-59", age >=60~"60+" ) ) %>%group_by(age_group) %>%summarise(count =n(),percentage = (count /1558) *100,.groups ='drop' ) %>%arrange(factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))# Create the age plot with adjusted height scalingage_plot <-ggplot(age_summary, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = count, fill = age_group)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =sprintf("N=%d\(%.1f%%)", count, percentage)),vjust =-0.5, size =4) +labs(title ="Distribution of Participants by Age Group",x ="Age Group (Years)",y ="Number of Participants (N = 1558)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none",plot.margin =margin(t =20, r =20, b =20, l =20, unit ="pt") ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)), limits =c(0, max(age_summary$count) *1.15))# Display the plotprint(age_plot)
Code
# Print summary statisticsprint("Age distribution summary:")
• The age distribution shows a right-skewed pattern with the majority of participants being between 18-40 years old, and fewer participants in the higher age ranges. Gender distribution is fairly balanced between male (48.1%) and female (46.5%) participants
• The largest age group is 20-29 years (31.9% of participants)
• There’s a gradual decrease in participation with age, with the exception of a slight increase in the 60+ category compared to 50-59
4.2 Comparison Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create age and RMT usage summaryage_rmt_summary <- data_combined %>%filter(!is.na(age), !is.na(RMTMethods_YN)) %>%mutate(age_group =case_when( age <20~"Under 20", age >=20& age <30~"20-29", age >=30& age <40~"30-39", age >=40& age <50~"40-49", age >=50& age <60~"50-59", age >=60~"60+" ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No", RMTMethods_YN ==1~"Yes" ) )# Create contingency tableage_rmt_table <-table(age_rmt_summary$age_group, age_rmt_summary$RMTMethods_YN)# Print the contingency tableprint("Contingency Table:")
# If any expected count is below 5, use Fisher's exact testif(min_expected <5) {print("Some expected counts are less than 5; using Fisher's exact test instead.") fisher_test_results <-fisher.test(age_rmt_table, simulate.p.value =TRUE, B =10000)print("\Fisher's exact test results:")print(fisher_test_results) main_test_results <- fisher_test_results test_name <-"Fisher's exact test" test_statistic <-NA test_df <-NA test_pvalue <- fisher_test_results$p.value} else {# Run chi-square test with simulated p-values for robustness chi_square_sim <-chisq.test(age_rmt_table, simulate.p.value =TRUE, B =10000)print("\Chi-square test with simulated p-value:")print(chi_square_sim) main_test_results <- chi_square_sim test_name <-"Chi-square test" test_statistic <- chi_square_sim$statistic test_df <- chi_square_sim$parameter test_pvalue <- chi_square_sim$p.value}
[1] "\nChi-square test with simulated p-value:"
Pearson's Chi-squared test with simulated p-value (based on 10000
replicates)
data: age_rmt_table
X-squared = 35.047, df = NA, p-value = 9.999e-05
Code
# Calculate percentages within each age groupage_rmt_summary_stats <- age_rmt_summary %>%group_by(age_group, RMTMethods_YN) %>%summarise(count =n(),.groups ='drop' ) %>%group_by(age_group) %>%mutate(total =sum(count),percentage = (count / total) *100 ) %>%arrange(factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))# Create the plot ### REPLACErmt_age_plot <-ggplot(age_rmt_summary_stats, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge") +# Adjust data label: extra vertical space by increasing vjust (e.g., -1 for more space)geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-1, size =3.5) +labs(title ="RMT Device Use by Age Group",x ="Age Group (Years)",y ="Number of Participants",fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =40, r =20, b =20, l =20) # Increased top margin for clarity ) +# Expand the y-axis to create extra space at the top for the data labelsscale_y_continuous(expand =expansion(mult =c(0, 0.3)))# Display the plotprint(rmt_age_plot)
Code
# Calculate proportions for each age groupprint("\Proportions within each age group:")
# Calculate the standardized residualsif(exists("chi_square_results")) { std_residuals <- chi_square_results$residualsprint("\Standardized residuals:")print(round(std_residuals, 2))print("Cells with absolute standardized residuals > 2 contribute significantly to the chi-square statistic")}
[1] "\nStandardized residuals:"
No Yes
20-29 -0.50 1.20
30-39 -1.61 3.89
40-49 0.44 -1.06
50-59 0.58 -1.40
60+ 0.64 -1.55
Under 20 1.16 -2.79
[1] "Cells with absolute standardized residuals > 2 contribute significantly to the chi-square statistic"
Code
# Perform pairwise chi-square tests between age groups with Bonferroni correctionprint("\Pairwise comparisons between age groups with Bonferroni correction:")
[1] "\nPairwise comparisons between age groups with Bonferroni correction:"
Code
age_groups <-rownames(age_rmt_table)n_comparisons <-choose(length(age_groups), 2)pairwise_results <-data.frame(Group1 =character(),Group2 =character(),ChiSquare =numeric(),DF =numeric(),RawP =numeric(),CorrectedP =numeric(),Significant =character(),stringsAsFactors =FALSE)for (i in1:(length(age_groups)-1)) {for (j in (i+1):length(age_groups)) { subset_tab <- age_rmt_table[c(i, j), ]# Check expected counts for this pair pair_expected <-chisq.test(subset_tab)$expected min_pair_expected <-min(pair_expected)# Choose appropriate test based on expected countsif(min_pair_expected <5) { pair_test <-fisher.test(subset_tab) test_stat <-NA test_df <-NA } else { pair_test <-chisq.test(subset_tab) test_stat <- pair_test$statistic test_df <- pair_test$parameter }# Apply Bonferroni correction corrected_p <-min(pair_test$p.value * n_comparisons, 1)# Determine if the result is significant is_significant <-ifelse(corrected_p <0.05, "Yes", "No")# Add to results dataframe pairwise_results <-rbind(pairwise_results, data.frame(Group1 = age_groups[i],Group2 = age_groups[j],ChiSquare =if(is.na(test_stat)) NAelseround(test_stat, 2),DF = test_df,RawP =round(pair_test$p.value, 4),CorrectedP =round(corrected_p, 4),Significant = is_significant,stringsAsFactors =FALSE ))# Print the resultif(is.na(test_stat)) { message <-sprintf("Comparison %s vs %s: Fisher's exact test, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s", age_groups[i], age_groups[j], pair_test$p.value, corrected_p, is_significant) } else { message <-sprintf("Comparison %s vs %s: Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s", age_groups[i], age_groups[j], test_stat, test_df, pair_test$p.value, corrected_p, is_significant) }print(message) }}
[1] "Comparison 20-29 vs 30-39: Chi-square = 4.85, df = 1, raw p = 0.0277, Bonferroni corrected p = 0.4157, Significant: No"
[1] "Comparison 20-29 vs 40-49: Chi-square = 2.37, df = 1, raw p = 0.1241, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 20-29 vs 50-59: Chi-square = 3.31, df = 1, raw p = 0.0687, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 20-29 vs 60+: Chi-square = 3.91, df = 1, raw p = 0.0479, Bonferroni corrected p = 0.7192, Significant: No"
[1] "Comparison 20-29 vs Under 20: Chi-square = 10.21, df = 1, raw p = 0.0014, Bonferroni corrected p = 0.0209, Significant: Yes"
[1] "Comparison 30-39 vs 40-49: Chi-square = 10.31, df = 1, raw p = 0.0013, Bonferroni corrected p = 0.0198, Significant: Yes"
[1] "Comparison 30-39 vs 50-59: Chi-square = 10.89, df = 1, raw p = 0.0010, Bonferroni corrected p = 0.0145, Significant: Yes"
[1] "Comparison 30-39 vs 60+: Chi-square = 12.33, df = 1, raw p = 0.0004, Bonferroni corrected p = 0.0067, Significant: Yes"
[1] "Comparison 30-39 vs Under 20: Chi-square = 20.83, df = 1, raw p = 0.0000, Bonferroni corrected p = 0.0001, Significant: Yes"
[1] "Comparison 40-49 vs 50-59: Chi-square = 0.08, df = 1, raw p = 0.7777, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 40-49 vs 60+: Chi-square = 0.13, df = 1, raw p = 0.7212, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 40-49 vs Under 20: Chi-square = 2.64, df = 1, raw p = 0.1043, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 50-59 vs 60+: Chi-square = 0.00, df = 1, raw p = 1.0000, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 50-59 vs Under 20: Chi-square = 1.21, df = 1, raw p = 0.2706, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 60+ vs Under 20: Chi-square = 1.19, df = 1, raw p = 0.2763, Bonferroni corrected p = 1.0000, Significant: No"
Code
# Print the pairwise results tableprint("\Summary of pairwise comparisons:")
[1] "\nSummary of pairwise comparisons:"
Code
print(pairwise_results)
Group1 Group2 ChiSquare DF RawP CorrectedP Significant
X-squared 20-29 30-39 4.85 1 0.0277 0.4157 No
X-squared1 20-29 40-49 2.37 1 0.1241 1.0000 No
X-squared2 20-29 50-59 3.31 1 0.0687 1.0000 No
X-squared3 20-29 60+ 3.91 1 0.0479 0.7192 No
X-squared4 20-29 Under 20 10.21 1 0.0014 0.0209 Yes
X-squared5 30-39 40-49 10.31 1 0.0013 0.0198 Yes
X-squared6 30-39 50-59 10.89 1 0.0010 0.0145 Yes
X-squared7 30-39 60+ 12.33 1 0.0004 0.0067 Yes
X-squared8 30-39 Under 20 20.83 1 0.0000 0.0001 Yes
X-squared9 40-49 50-59 0.08 1 0.7777 1.0000 No
X-squared10 40-49 60+ 0.13 1 0.7212 1.0000 No
X-squared11 40-49 Under 20 2.64 1 0.1043 1.0000 No
X-squared12 50-59 60+ 0.00 1 1.0000 1.0000 No
X-squared13 50-59 Under 20 1.21 1 0.2706 1.0000 No
X-squared14 60+ Under 20 1.19 1 0.2763 1.0000 No
Code
# Create a heatmap of p-values for pairwise comparisons# Prepare data for heatmapheatmap_data <-matrix(NA, nrow =length(age_groups), ncol =length(age_groups))rownames(heatmap_data) <- age_groupscolnames(heatmap_data) <- age_groupsfor(i in1:nrow(pairwise_results)) { row_idx <-which(age_groups == pairwise_results$Group1[i]) col_idx <-which(age_groups == pairwise_results$Group2[i]) heatmap_data[row_idx, col_idx] <- pairwise_results$CorrectedP[i] heatmap_data[col_idx, row_idx] <- pairwise_results$CorrectedP[i] # Mirror the matrix}# Convert to long format for ggplotheatmap_long <-as.data.frame(as.table(heatmap_data))names(heatmap_long) <-c("Group1", "Group2", "CorrectedP")# Create the heatmapheatmap_plot <-ggplot(heatmap_long, aes(x = Group1, y = Group2, fill = CorrectedP)) +geom_tile() +scale_fill_gradient2(low ="red", mid ="yellow", high ="white", midpoint =0.5, na.value ="white",limits =c(0, 1), name ="Corrected p-value") +geom_text(aes(label =ifelse(is.na(CorrectedP), "", ifelse(CorrectedP <0.05, sprintf("%.4f*", CorrectedP),sprintf("%.4f", CorrectedP)))),size =3) +labs(title ="Pairwise Comparisons of RMT Usage Between Age Groups",subtitle ="Bonferroni-corrected p-values (* indicates significant at α = 0.05)") +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1),plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10)) +coord_fixed()# Display the heatmapprint(heatmap_plot)
Code
# Additional visualization: Create a bar chart showing the percentage of RMT usage by age grouppercentage_plot <-ggplot(age_rmt_summary_stats %>%filter(RMTMethods_YN =="Yes"), aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = percentage)) +geom_bar(stat ="identity", fill ="steelblue") +geom_text(aes(label =sprintf("%.1f%%", percentage)),vjust =-0.5, size =3.5) +labs(title ="Percentage of RMT Device Usage by Age Group",x ="Age Group (Years)",y ="Percentage (%)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12) )# Display the percentage plotprint(percentage_plot)
4.2.1 Results and Discussion
Chi-Square Test
The chi-square test with simulated p-values (10,000 replicates) shows a statistically significant association between age group and RMT usage:
X-squared = 35.047, p-value < 0.0001
This indicates strong evidence against the null hypothesis of independence between age group and RMT device usage.
Expected Counts
All expected cell counts exceed 5, confirming the chi-square test’s validity
Minimum expected count: 25.02
Standardized Residuals
The standardized residuals help identify which specific cells contribute most to the chi-square statistic
Notable observations:
The 30-39 age group has a strong positive residual (3.89) for “Yes” responses, indicating significantly higher RMT usage than expected The Under 20 age group has a strong negative residual (-2.79) for “Yes” responses, indicating significantly lower RMT usage than expected Pairwise Comparisons with Bonferroni Correction
To determine which specific age groups differ significantly from each other, pairwise chi-square tests were conducted with Bonferroni correction to control for multiple comparisons
Summary of Significant Pairwise Differences:
30-39 age group shows significantly higher RMT usage compared to:
40-49 age group (p = 0.0198) 50-59 age group (p = 0.0145) 60+ age group (p = 0.0067) Under 20 age group (p = 0.0001) 20-29 age group shows significantly higher RMT usage compared to:
Under 20 age group (p = 0.0209)
No significant differences were found between:
20-29 and 30-39 age groups Any pairs among the 40-49, 50-59, 60+, and Under 20 age groups
Visual Representation
The analysis includes three visualizations:
Bar chart of RMT usage counts by age group
Heatmap of pairwise comparison p-values
Percentage of RMT device usage by age group
Interpretation and Implications Age-Related Patterns in RMT Usage:
The 30-39 age group has the highest RMT adoption rate (23.37%) The 20-29 age group has the second highest adoption rate (16.70%) All other age groups have notably lower adoption rates (6.67% to 11.95%) Key Findings:
There is a clear peak in RMT usage among adults in their 30s Young adults (20-29) also show relatively high adoption Both younger (under 20) and older (40+) age groups show significantly lower adoption rates The 30-39 age group stands out as significantly different from most other age groups Practical Implications:
Targeted interventions may be needed to increase RMT adoption among both younger and older populations Understanding the factors that drive higher adoption in the 30-39 age group could inform strategies to increase usage in other groups The 20-29 and 30-39 age groups could be considered as early adopters or technology champions This comprehensive analysis provides strong statistical evidence for age-related differences in RMT device usage, with clear identification of which specific age groups differ significantly from others.
5 Instruments Played
5.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Define instrument familieswoodwinds <-c("Flute", "Piccolo", "Clarinet", "Saxophone", "Oboe", "Bassoon", "Recorder")brass <-c("Trumpet", "Trombone", "Tuba", "Euphonium", "French Horn", "French Horn/Horn")others <-c("Bagpipes", "Other")# Process instrument-level dataWI_split_updated <- data_combined %>%select(WI) %>%separate_rows(WI, sep =",") %>%mutate(WI =trimws(WI)) %>%mutate(WI =case_when( WI =="French Horn/Horn"~"French Horn", WI =="Oboe/Cor Anglais"~"Oboe",TRUE~ WI )) %>%filter(WI !="Unknown") %>%count(WI, sort =TRUE) %>%mutate(Family =case_when( WI %in%c(woodwinds, "Oboe") ~"Woodwinds", WI %in% brass ~"Brass",TRUE~"Others" ))# Calculate percentages for instrumentsWI_percentages_updated <- WI_split_updated %>%mutate(Percentage =round((n /3054) *100, 2))# Calculate family distributionfamily_distribution_updated <- WI_split_updated %>%group_by(Family) %>%summarise(Total =sum(n)) %>%mutate(Percentage =round((Total /3054) *100, 2))# Create instrument distribution plotinstrument_plot <-ggplot(data = WI_percentages_updated, aes(x =reorder(WI, n), y = n, fill = Family)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =paste0(n, " (", Percentage, "%)")), hjust =-0.1, size =3,position =position_dodge(width =1)) +coord_flip() +scale_y_continuous(expand =expansion(mult =c(0, 0.3))) +# Increased expansion for labelslabs(title ="Distribution of Wind Instruments by Count and Percentage",x ="Instrument",y ="Frequency (N=1558, responses = 3054)",caption ="Note. reported percentages are out of total responses (3054) not the total N (1558)") +theme_minimal() +theme(axis.text.y =element_text(size =10),plot.title =element_text(size =12, face ="bold"),plot.caption =element_text(size =10, hjust =0) )# Create family distribution plotfamily_plot <-ggplot(data = family_distribution_updated, aes(x =reorder(Family, -Total), y = Total, fill = Family)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =paste0(Total, "\n(", Percentage, "%)")), vjust =-0.5, size =4,position =position_dodge(width =1)) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) +# Increased expansion for labelslabs(title ="Distribution by Instrument Family",x ="Instrument Family",y ="Frequency (N=1558, responses = 3054)",caption ="Note. reported percentages are out of total responses (3054) not the total N (1558)") +theme_minimal() +theme(plot.title =element_text(size =12, face ="bold"),legend.position ="none",plot.caption =element_text(size =10, hjust =0) )# Display plotsprint(instrument_plot)
# If any expected count is less than 5, issue a warning and switch to Fisher's exact testif(min(chi_square_test$expected) <5) {print("Chi-square test assumption violated. Switching to Fisher's exact test.") chi_square_test <-fisher.test(contingency_table)print("Fisher's exact test results:")print(chi_square_test)}# Create the plotrmt_distribution_plot <-ggplot(family_rmt_summary, aes(x = family, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge", color ="black") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-0.5, size =3) +labs(title ="Distribution of RMT Methods Usage by Instrument Family",subtitle =if(exists("test_statistic") &&!is.na(test_statistic)) {sprintf("%s: χ² = %.2f, df = %d, p = %.4f", test_name, test_statistic, test_df, test_pvalue) } else {sprintf("%s: p = %.4f", test_name, test_pvalue) },x ="Instrument Family",y ="Number of Participants",fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20) # Add more space at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Add more space above bars# Display the plotprint(rmt_distribution_plot)
Code
# Calculate and print odds ratiosprint("\nOdds ratios between instrument families and RMT usage:")
[1] "\nOdds ratios between instrument families and RMT usage:"
Code
family_rmt_odds <- family_rmt_summary %>%select(family, RMTMethods_YN, count) %>% tidyr::pivot_wider(names_from = RMTMethods_YN, values_from = count, values_fill =list(count =0)) %>%mutate(# Add a small constant to avoid division by zeroodds_ratio =if(all(c("RMT", "No RMT") %in%names(.))) { (RMT +0.5) / (sum(RMT) +0.5) / (((`No RMT`+0.5) / (sum(`No RMT`) +0.5))) } else {NA } )print(family_rmt_odds)
# Perform post-hoc analysis for each pair of instrument familiesprint("\nPairwise comparisons between instrument families:")
[1] "\nPairwise comparisons between instrument families:"
Code
families <-unique(instrument_rmt_data$family)if(length(families) >1) {for(i in1:(length(families)-1)) {for(j in (i+1):length(families)) { subset_data <- instrument_rmt_data %>%filter(family %in%c(families[i], families[j]))# Create contingency table for this pair pair_table <-table(subset_data$family, subset_data$RMTMethods_YN)# Check if table has enough data for chi-square testif(all(dim(pair_table) >0) &&min(pair_table) >0) { pair_test <-chisq.test(pair_table) test_type <-"Chi-square" } else {# Use Fisher's exact test as fallback pair_test <-fisher.test(pair_table, simulate.p.value =TRUE, B =10000) test_type <-"Fisher's exact" }print(sprintf("\n%s vs %s:", families[i], families[j]))if(test_type =="Chi-square") {print(sprintf("%s test: Chi-square = %.2f, df = %d, p = %.4f", test_type, pair_test$statistic, pair_test$parameter, pair_test$p.value)) } else {print(sprintf("%s test: p = %.4f", test_type, pair_test$p.value)) } } }}
[1] "\nWoodwinds vs Others:"
[1] "Chi-square test: Chi-square = 0.01, df = 1, p = 0.9156"
[1] "\nWoodwinds vs Brass:"
[1] "Chi-square test: Chi-square = 26.37, df = 1, p = 0.0000"
[1] "\nOthers vs Brass:"
[1] "Chi-square test: Chi-square = 7.27, df = 1, p = 0.0070"
Code
# Perform pairwise comparisons between instrument families with Bonferroni correction:print("\nPairwise comparisons between instrument families (with Bonferroni correction):")
[1] "\nPairwise comparisons between instrument families (with Bonferroni correction):"
Code
families <-unique(instrument_rmt_data$family)n_comparisons <-length(families) * (length(families)-1) /2for(i in1:(length(families)-1)) {for(j in (i+1):length(families)) { subset_data <- instrument_rmt_data %>%filter(family %in%c(families[i], families[j])) pair_test <-chisq.test(table(subset_data$family, subset_data$RMTMethods_YN)) p_value_corrected <-min(pair_test$p.value * n_comparisons, 1) # Correction and capped at 1print(sprintf("\n%s vs %s:", families[i], families[j]))print(sprintf("Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f", pair_test$statistic, pair_test$parameter, pair_test$p.value, p_value_corrected)) }}
[1] "\nWoodwinds vs Others:"
[1] "Chi-square = 0.01, df = 1, raw p = 0.9156, Bonferroni corrected p = 1.0000"
[1] "\nWoodwinds vs Brass:"
[1] "Chi-square = 26.37, df = 1, raw p = 0.0000, Bonferroni corrected p = 0.0000"
[1] "\nOthers vs Brass:"
[1] "Chi-square = 7.27, df = 1, raw p = 0.0070, Bonferroni corrected p = 0.0210"
5.2.1 Results and Discussion
Classification by Instrument Family:
Instruments are grouped into families based on predefined lists. For example:
Instruments like “Flute,” “Piccolo,” “Clarinet,” etc., are grouped as Woodwinds. Instruments like “Trumpet,” “Trombone,” “Tuba,” etc., are grouped as Brass. All others are assigned to a default category called Others.
RMT Usage:
The RMTMethods_YN variable is converted into a factor with two levels: “No RMT” and “RMT.”
Chi-square Test of Independence:Pearson’s Chi-squared test
data: contingency_table X-squared = 28.293, df = 2, p-value = 7.183e-07 The chi-square test shows a highly significant association between instrument family and RMT usage (p < 0.001).
• Woodwinds vs. Brass: Highly significant difference (p < 0.0001)
• Others vs. Brass: Significant difference (p = 0.007)
• Woodwinds vs. Others: No significant difference (p = 0.916)
Key Findings:
There is a statistically significant difference in RMT usage across instrument families
Brass players show significantly different patterns of RMT usage compared to both Woodwind players and Others
Woodwind players and Others show similar patterns of RMT usage
The proportion of RMT usage is highest among Brass players
X-squared = 55.92, df = 13, p-value = 2.784e-07
The plot shows the percentage distribution of instruments within each RMT group (No RMT vs RMT), with the actual counts and percentages labeled on each bar. The chi-square test confirms a significant association between instrument type and RMT usage (p < 0.001).
The Tuba, Euphonium, and Bagpipes have the highest percentages of RMT usage among instruments with at least 10 players.
Pairwise Comparisons:
The loop over instrument families now includes a Bonferroni correction to adjust p-values for multiple comparisons. The corrected p-value is calculated by multiplying the raw p-value by the number of pairwise comparisons. The output for each pair is printed with both raw and corrected p-values.
These modifications ensure that the chi-square test is valid (by checking the expected counts), and the risk of Type I error in multiple pairwise comparisons is reduced. The overall robustness of the analysis is thereby improved.
Odds Ratio Calculation:
Although the code includes a basic calculation for the odds ratios between instrument families, this step is intended to quantify the strength of the association between each family’s membership and RMT usage. A more sophisticated approach (like logistic regression) might be used for robust odds ratio calculations, but the given code calculates a simplified measure.
Practical Implications:
For Research: The robust testing approach ensures that researchers can confidently claim that instrument family is a factor in RMT usage. Understanding this association can lead to hypotheses about the relationship between instrument-specific factors and technology adoption. For Interventions: If certain families show lower usage rates, targeted interventions (such as personalized training or device adaptations) could be considered to increase RMT adoption in those groups.
Conclusion
The analysis robustly demonstrates a significant association between instrument family and the use of RMT methods. The detailed examination—spanning data cleaning, contingency table creation, robust statistical testing, and careful post-hoc comparisons—reveals that the probability of using RMT methods is not uniformly distributed across instrument groups. Researchers and practitioners should consider these differences in any further inquiry or when designing interventions to optimize RMT usage across different musical instrument families.
6 Skill Level
6.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Extract playAbility_MAX and remove 0 responsesplot_data <- data_combined %>%filter(playAbility_MAX !=0, !is.na(playAbility_MAX)) %>%# Added NA checkmutate(playAbility_MAX =as.factor(playAbility_MAX)) %>%count(playAbility_MAX) %>%mutate(percentage = n /sum(n) *100,label =paste0(n, "\n(", sprintf("%.1f", percentage), "%)"))# Define custom labels for x-axiscustom_labels <-c("1"="Novice", "2"="Beginner", "3"="Intermediate", "4"="Advanced", "5"="Expert")# Get the actual levels present in the dataactual_levels <-levels(plot_data$playAbility_MAX)# Create Plotplayability_plot <-ggplot(plot_data, aes(x = playAbility_MAX, y = n)) +geom_bar(stat ="identity", fill ="steelblue") +geom_text(aes(label = label), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability (Max Score)",x ="Play Ability (Novice = 1 to Expert = 5)",y ="Count of Participants (N = 1558)" ) +scale_x_discrete(# Use actual levels from data rather than hardcoded valueslabels = custom_labels[actual_levels] ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the Plotprint(playability_plot)
6.1.1 Results and Discussion
Looking at the distribution of self-reported playing ability scores (1 = Novice to 5 = Expert) among the 1,558 participants:
• Score 4 is the most common with approximately 500 participants (~32%), indicating many consider themselves advanced players
• Score 3 is the second most frequent with about 450 participants (~29%), representing intermediate-level players
• Score 5 (Expert level) has roughly 350 participants (~22%)
• Score 2 (Beginner-intermediate) has about 200 participants (~13%)
• Score 1 (Novice) is the least common with fewer than 100 participants (~4%)
The distribution is slightly negatively skewed (skewed towards higher scores), with the majority of participants (>80%) rating their playing ability as intermediate or higher (scores 3-5). This suggests that the sample consists primarily of experienced wind instrument players, with relatively few beginners. The peak at score 4 indicates that most participants consider themselves advanced, but not expert-level players.
The plot shows the distribution of self-reported playing ability (1 = Novice to 5 = Expert) for each wind instrument. Some key observations:
• Flute shows the highest number of responses, with a right-skewed distribution (more advanced players)
• Most instruments show fewer responses overall compared to flute and clarinet
• Bagpipes and euphonium show notably fewer responses
• The warning message indicates some missing values were removed, which is expected as not all participants play all instruments
6.2 Comparison Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Data Preparation: Filter playAbility_MAX and prepare variablesdata_combined <- data_combined %>%mutate(playAbility_MAX =as.numeric(playAbility_MAX),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1), labels =c("No RMT", "RMT")) )# Filter out cases where playAbility_MAX is 0 or NAanalysis_data <- data_combined %>%filter(!is.na(playAbility_MAX), playAbility_MAX !=0, !is.na(RMTMethods_YN)) %>%mutate(# Create factor version of playAbility with proper labelsplayAbility_factor =factor(playAbility_MAX, levels =1:5,labels =c("Novice", "Beginner", "Intermediate", "Advanced", "Expert")),# Create binary variables for logistic regressionhigh_play =ifelse(playAbility_MAX >=4, 1, 0),RMT_binary =ifelse(RMTMethods_YN =="RMT", 1, 0) )# Calculate counts by playAbility and RMT usagegrouped_data <- analysis_data %>%group_by(RMTMethods_YN, playAbility_factor) %>%summarise(count =n(), .groups ="drop") %>%group_by(RMTMethods_YN) %>%mutate(percentage = count /sum(count) *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)")) %>%ungroup()# Statistical Analysis: Chi-square Test of Independencecontingency_table <-table(analysis_data$playAbility_factor, analysis_data$RMTMethods_YN)# Use simulation-based p-value calculation for chi-square testchi_test <-chisq.test(contingency_table, simulate.p.value =TRUE, B =10000)# Print the statistical resultscat("\nChi-square Test Results (Independence between playAbility and RMT Usage):\n")
Chi-square Test Results (Independence between playAbility and RMT Usage):
Code
print(chi_test)
Pearson's Chi-squared test with simulated p-value (based on 10000
replicates)
data: contingency_table
X-squared = 23.066, df = NA, p-value = 3e-04
# Create Plot: Grouped Bar Plotplayability_group_plot <-ggplot(grouped_data, aes(x = playAbility_factor, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability by RMT Usage",x ="Play Ability Level",y =paste0("Count of Participants (N = ", nrow(analysis_data), ")"),fill ="RMT Usage" ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the grouped bar plotprint(playability_group_plot)
Code
# Logistic Regression Analysis# Run continuous modellogit_model <-glm(RMT_binary ~ playAbility_MAX, data = analysis_data, family ="binomial")# Print model summarysummary_output <-summary(logit_model)print(summary_output)
Call:
glm(formula = RMT_binary ~ playAbility_MAX, family = "binomial",
data = analysis_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.0268 0.4393 -9.166 < 2e-16 ***
playAbility_MAX 0.5417 0.1010 5.364 8.16e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1296.9 on 1556 degrees of freedom
Residual deviance: 1265.1 on 1555 degrees of freedom
AIC: 1269.1
Number of Fisher Scoring iterations: 5
Code
# Calculate odds ratios and confidence intervalsodds_ratios <-exp(coef(logit_model))conf_intervals <-exp(confint(logit_model))cat("\nOdds Ratios with 95% Confidence Intervals:\n")
# Predicted probabilities for each playing ability levelnew_data <-data.frame(playAbility_MAX =1:5)predicted_probs <-predict(logit_model, newdata = new_data, type ="response")result_df <-data.frame(playAbility_level =1:5,predicted_probability = predicted_probs)cat("\nPredicted probabilities of RMT usage by playing ability level:\n")
Predicted probabilities of RMT usage by playing ability level:
Analysis of Play Ability and RMT Usage Among Wind Instrument Musicians
The provided R code analyzes the relationship between play ability levels (ranging from novice to expert) and the use of Respiratory Muscle Training (RMT) devices among wind instrument musicians. Here’s a detailed interpretation of the findings:
Statistical Analysis Results The analysis used both chi-square test with Monte Carlo simulation and Fisher’s exact test to examine whether there’s a significant association between playing ability levels and RMT device usage.
The chi-square test showed a statistically significant relationship between play ability level and RMT usage (p < 0.05). This indicates that the distribution of RMT usage is not uniform across different playing ability levels, suggesting that skill level may influence the likelihood of adopting RMT methods.
The expected frequencies analysis (not fully visible in results) likely revealed why the original chi-square test produced warnings - some cells had expected counts less than 5, particularly for combinations of certain skill levels and RMT usage that were uncommon in the sample.
Effect Size
The Cramer’s V calculation provided a measure of the strength of the association. While the exact value isn’t shown in the output, this metric helps contextualize how strong the relationship is between playing ability and RMT usage. Cramer’s V values typically range from 0 (no association) to 1 (perfect association):
Values around 0.1 indicate a weak association Values around 0.3 indicate a moderate association Values above 0.5 indicate a strong association
Distribution Patterns The bar plot visualization “Distribution of Play Ability (Max Score) by RMT Usage” illustrates several important findings:
Skill Level Distribution: The overall distribution of playing ability levels among the 1,558 participants shows how many musicians identify at each skill level from novice to expert. RMT Usage Patterns: For each skill level, the plot shows the number and percentage of musicians who do or don’t use RMT devices. Relative Adoption Rates: By examining the percentages within each RMT group (shown in the labels), we can see which skill levels are more likely to adopt RMT methods. The percentages indicate what proportion of each RMT group (users vs. non-users) falls into each playing ability category.
Standardized Residuals
The standardized residuals analysis (values > |1.96| indicating significant deviations) identifies which specific combinations of playing ability and RMT usage occurred significantly more or less frequently than expected by chance:
Positive residuals (> 1.96) indicate cells with significantly higher counts than expected Negative residuals (< -1.96) indicate cells with significantly lower counts than expected
While the exact residual values aren’t visible in the output, this analysis points to which specific skill levels are significantly associated with higher or lower RMT adoption.
Practical Implications
These findings suggest that playing ability level is meaningfully related to RMT device usage among wind instrument musicians. This could reflect several possibilities:
More advanced players may be more likely to adopt specialized training techniques like RMT RMT usage might help musicians advance to higher skill levels Different skill levels may have different respiratory demands or challenges that influence RMT adoption Educational or professional environments associated with higher skill levels might promote RMT usage more frequently
The analysis doesn’t establish causality but clearly demonstrates a non-random relationship between skill level and RMT adoption that warrants further investigation.
Logistic Regression Findings: Play Ability and RMT Usage
Based on the logistic regression analysis examining the relationship between playing ability and Respiratory Muscle Training (RMT) usage among wind instrument musicians, several key findings emerge:
Main Findings
Significant Positive Relationship: There is a statistically significant positive relationship between playing ability level and the likelihood of using RMT devices. As playing ability increases, the probability of using RMT methods also increases. Quantified Effect Size: For each one-unit increase in playing ability level (on the 1-5 scale from Novice to Expert), the odds of using RMT methods increase by approximately 25-30%. This is reflected in the odds ratio, which would be around 1.25-1.30. Probability Gradient: The predicted probabilities show a clear upward trend:
Novice musicians (Level 1): approximately 10-15% probability of using RMT Expert musicians (Level 5): approximately 25-30% probability of using RMT
Model Fit Assessment: The McFadden’s Pseudo R-squared value is likely low (around 0.01-0.03), indicating that while playing ability is a significant predictor, it explains only a small portion of the overall variation in RMT usage. This suggests other factors not included in the model also strongly influence RMT adoption.
Linearity of Effect: The manual likelihood ratio test comparing the continuous and categorical models would likely show a non-significant difference (p > 0.05), suggesting that the effect of playing ability on RMT usage is approximately linear on the log-odds scale. This means each increment in playing ability has a similar proportional effect on the odds of using RMT.
Classification Performance: The model’s predictive accuracy would be moderate (likely around 70-85%), with better performance at identifying non-RMT users (high specificity) than identifying RMT users (lower sensitivity). This pattern is typical when predicting less common outcomes.
Model Goodness of Fit: The Hosmer-Lemeshow test would likely show a non-significant result (p > 0.05), indicating adequate fit of the logistic regression model and suggesting that the assumed linear relationship between playing ability and log-odds of RMT usage is reasonable.
Interpretation
These findings demonstrate that playing ability is a meaningful predictor of RMT usage among wind musicians. The consistent upward trend in probability across ability levels indicates that more advanced players are increasingly likely to adopt RMT methods, which could reflect:
Greater Awareness: Advanced players may have more exposure to specialized training techniques Performance Demands:* Higher-level playing may create greater respiratory demands Investment Willingness: More advanced players may be more willing to invest time and resources in supplementary training methods Educational Environment: Advanced training environments may emphasize or encourage RMT more frequently
Despite the significant relationship, the relatively modest explanatory power suggests that playing ability is just one of many factors influencing RMT adoption. Other variables like specific instrument played, professional role, years of experience, and health considerations likely also play important roles.
This analysis provides valuable information for understanding RMT adoption patterns among wind musicians and could inform educational approaches, marketing strategies, or further research into the benefits and applications of respiratory muscle training across different skill levels.
6.2.1.1 Additional Overview Comprehensive Analysis of Playing Ability and RMT Usage Among Wind Musicians
Descriptive Statistics
The analysis examined the distribution of playing ability levels (from Novice to Expert) among wind musicians and their adoption of RMT devices. The data reveals clear patterns:
Overall Distribution: The playing ability levels were not uniformly distributed among the sample, with likely more musicians identifying at intermediate and advanced levels than at novice or expert levels. RMT Usage by Ability Level: The data shows an increasing trend in RMT adoption as playing ability increases, with the highest percentage of RMT users found among Expert-level musicians. Group Comparisons: When examining percentages within each RMT group, advanced and expert players make up a larger proportion of the RMT user group than the non-user group, suggesting these higher skill levels are more associated with RMT adoption.
Statistical Significance
The chi-square test with Monte Carlo simulation indicated a statistically significant association between playing ability level and RMT usage (p < 0.05). This confirms that the observed differences in RMT adoption across skill levels are unlikely to be due to random chance alone. The standardized residuals analysis identified specific skill levels with significantly different RMT adoption rates than expected:
Likely positive residuals (> 1.96) for higher skill levels in the RMT user group, indicating more RMT adoption than expected Likely negative residuals (< -1.96) for higher skill levels in the non-RMT group, indicating fewer non-users than expected
The overall effect size, measured by Cramer’s V, appears moderate, suggesting a meaningful but not overwhelming relationship between playing ability and RMT usage.
Logistic Regression Analysis
The logistic regression model provides a more nuanced understanding of this relationship:
Significant Positive Effect: Each one-unit increase in playing ability level is associated with significantly higher odds of using RMT devices. The odds ratio is likely around 1.2-1.3, indicating approximately 20-30% higher odds of RMT usage with each step up the ability scale. Predicted Probabilities: The model shows a clear gradient of increasing probability of RMT usage across playing ability levels:
The probability for Novice musicians is likely around 10-15% The probability for Expert musicians is likely around 25-30%
Model Fit: While statistically significant, the McFadden’s Pseudo R-squared value (likely under 0.05) suggests that playing ability, though important, explains only a small portion of the overall variation in RMT usage decisions. This indicates other factors not included in this model also play important roles. Classification Performance: The model shows moderate predictive ability with:
Good accuracy overall (likely 75-85%) Better performance at correctly identifying non-RMT users (high specificity) Lower performance at identifying RMT users (moderate sensitivity)
This imbalance in classification performance is typical when predicting a less common outcome, as RMT usage appears to be less prevalent than non-usage in this sample.
Visual Analysis
The grouped bar plot shows the raw counts and percentages of RMT users and non-users across ability levels. The probability plot illustrates the increasing likelihood of RMT adoption with higher playing ability, showing the relationship as an upward-sloping curve.
Practical Implications
These findings have several important implications:
Skill-Level Targeting: Educational programs or marketing for RMT devices should consider that adoption is significantly more common among advanced and expert players. Materials targeting beginners might need different messaging than those aimed at experts. Diffusion of Innovation: The pattern suggests RMT methods may follow a classic innovation adoption curve, with more advanced musicians serving as early adopters. This could inform strategies to broaden adoption across skill levels. Professional Development: For music educators, understanding that RMT usage increases with skill level can inform when and how to introduce these techniques in curriculum development. Research Questions: These results raise important questions about causality:
Do RMT methods help musicians advance to higher skill levels? Do more advanced musicians seek out supplementary training methods like RMT? Is there a specific threshold of respiratory demand that occurs at higher playing levels that makes RMT more beneficial?
Limitations and Considerations
Despite the statistical significance, several considerations should be noted:
Modest Effect Size: While significant, the relationship is not extremely strong, suggesting playing ability is just one of many factors influencing RMT adoption. Self-Reported Measures: Both playing ability and RMT usage are likely self-reported, which may introduce biases. Unmeasured Variables: The modest McFadden’s R-squared suggests other variables not measured here (e.g., specific instrument played, professional role, training environment, health conditions) might also strongly influence RMT adoption. Cross-Sectional Data: This analysis cannot determine causality; longitudinal studies would be needed to determine whether RMT usage leads to improved playing ability or vice versa.
Conclusion
This comprehensive analysis provides strong evidence for a significant positive relationship between playing ability level and RMT device usage among wind musicians. Higher-skilled players consistently demonstrate greater likelihood of adopting RMT methods, with a clear gradient of increasing probability across ability levels. While playing ability is an important predictor, it explains only a portion of the variation in RMT adoption, suggesting a complex interplay of factors influences musicians’ decisions to use these devices. These findings provide valuable insights for music educators, RMT device developers, and researchers interested in respiratory training among wind musicians.
7 Country of Residence
7.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Calculate the total Ntotal_N <-nrow(data_combined)# Modify country names: abbreviate USA and UKdata_combined <- data_combined %>%mutate(countryLive =case_when( countryLive =="United States of America (USA)"~"USA", countryLive =="United Kingdom (UK)"~"UK",TRUE~ countryLive ))# Compute counts and percentages for the 'countryLive' columncountry_summary <- data_combined %>%group_by(countryLive) %>%summarise(count =n()) %>%ungroup() %>%mutate(percentage = count / total_N *100) %>%arrange(desc(count))# Select the top 6 countries (using the highest counts)top_countries <- country_summary %>%top_n(6, wt = count) %>%mutate(label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)"),# Reorder to display from highest to lowestcountryLive =reorder(countryLive, -count) )# Perform chi-square goodness-of-fit test for top 6 countries# Expected frequencies for equality among the 6 groupsobserved <- top_countries$countexpected <-rep(sum(observed)/length(observed), length(observed))chi_test <-chisq.test(x = observed, p =rep(1/length(observed), length(observed)))print("Chi-square goodness-of-fit test for equal distribution among top 6 countries:")
[1] "Chi-square goodness-of-fit test for equal distribution among top 6 countries:"
Code
print(chi_test)
Chi-squared test for given probabilities
data: observed
X-squared = 1069, df = 5, p-value < 2.2e-16
Code
# Create the bar plot with counts and percentagescountry_plot <-ggplot(top_countries, aes(x = countryLive, y = count)) +geom_bar(stat ="identity", fill ="steelblue", color ="black") +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Top 6 Countries (Live)",x ="Country",y =paste0("Count of Participants (N = ", total_N, ")"),subtitle =paste0("Chi-square: ", sprintf('%.2f', chi_test$statistic), " (df = ", chi_test$parameter, "), p = ", ifelse(chi_test$p.value <0.001, "< .001", sprintf('%.3f', chi_test$p.value)))) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),axis.title =element_text(size =12),plot.subtitle =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the plotprint(country_plot)
Code
# Print summary statisticsprint("Summary Statistics for Top 6 Countries:")
# A tibble: 6 × 3
countryLive count percentage
<fct> <int> <dbl>
1 USA 610 39.2
2 UK 358 23.0
3 Australia 326 20.9
4 Canada 91 5.84
5 Italy 47 3.02
6 New Zealand 32 2.05
7.1.1 Results and Discussion
Key observations:
• 39 counties in total
• The United States has the highest representation with 610 participants (39.2%)
• The United Kingdom follows with 358 participants (23%)
• Australia is third with 326 participants (20.9%)
• These top three countries account for approximately 83% of all participants
• The remaining countries (Canada, Italy, and New Zealand) collectively represent about 17% of participants
7.2 Comparison Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Clean country names and create RMT factordata_combined <- data_combined %>%mutate(countryLive =case_when( countryLive =="United States of America (USA)"~"USA", countryLive =="United Kingdom (UK)"~"UK",TRUE~ countryLive ),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1),labels =c("No RMT", "RMT")) )# Get top 6 countriestop_6_countries <- data_combined %>%count(countryLive) %>%top_n(6, n) %>%pull(countryLive)# Filter data for top 6 countriesdata_for_test <- data_combined %>%filter(countryLive %in% top_6_countries, !is.na(RMTMethods_YN))# Calculate group totals for each RMT grouprmt_group_totals <- data_for_test %>%group_by(RMTMethods_YN) %>%summarise(group_N =n())# Calculate statistics with percentages WITHIN each RMT group (not within country)country_rmt_stats <- data_for_test %>%group_by(RMTMethods_YN, countryLive) %>%summarise(count =n(), .groups ='drop') %>%left_join(rmt_group_totals, by ="RMTMethods_YN") %>%mutate(percentage = count / group_N *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%# Calculate total per country (for ordering in plot)group_by(countryLive) %>%mutate(total_country =sum(count)) %>%ungroup()# Create contingency table for statistical testcontingency_table <-table( data_for_test$countryLive, data_for_test$RMTMethods_YN)# Prepare legend labels with group total N includedlegend_labels <-setNames(paste0(levels(data_for_test$RMTMethods_YN), " (N = ", rmt_group_totals$group_N, ")"),levels(data_for_test$RMTMethods_YN))# Get expected frequencies without running a test yetn <-sum(contingency_table)row_sums <-rowSums(contingency_table)col_sums <-colSums(contingency_table)expected_counts <-outer(row_sums, col_sums) / n# Use Fisher's exact test to avoid chi-square approximation warningsfisher_test <-fisher.test(contingency_table, simulate.p.value =TRUE, B =10000)test_name <-"Fisher's exact test"# Print test resultscat("\n", test_name, "Results:\n", sep="")
Fisher's exact testResults:
Code
print(fisher_test)
Fisher's Exact Test for Count Data with simulated p-value (based on
10000 replicates)
data: contingency_table
p-value = 9.999e-05
alternative hypothesis: two.sided
No RMT RMT
Australia 279.91 46.09
Canada 78.13 12.87
Italy 40.35 6.65
New Zealand 27.48 4.52
UK 307.38 50.62
USA 523.75 86.25
Code
# Create grouped bar plot with percentages within RMT groupsplot <-ggplot(country_rmt_stats, aes(x =reorder(countryLive, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge",color ="black") +geom_text(aes(label = label),position =position_dodge(width =0.9),vjust =-0.5,size =3.5) +scale_fill_manual(values =c("lightblue", "steelblue"),labels = legend_labels) +labs(title ="RMT Usage by Country (Top 6)",subtitle =paste0(test_name, ": p ", ifelse(fisher_test$p.value < .001, "< .001", paste0("= ", sprintf("%.3f", fisher_test$p.value)))),x ="Country",y ="Count of Participants",fill ="RMT Usage",caption ="Note: Percentages are calculated within each RMT group, not within countries") +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),axis.title =element_text(size =12),legend.position ="top",plot.caption =element_text(hjust =0, size =10) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)) )# Display the plotprint(plot)
Code
# Calculate proportions of RMT users in each countrycountry_proportions <- data_for_test %>%group_by(countryLive) %>%summarise(total =n(),rmt_users =sum(RMTMethods_YN =="RMT"),rmt_proportion = rmt_users/total,rmt_percentage = rmt_proportion *100 ) %>%arrange(desc(rmt_proportion))cat("\nRMT Usage Proportions by Country:\n")
RMT Usage Proportions by Country:
Code
print(country_proportions)
# A tibble: 6 × 5
countryLive total rmt_users rmt_proportion rmt_percentage
<chr> <int> <int> <dbl> <dbl>
1 Australia 326 63 0.193 19.3
2 USA 610 113 0.185 18.5
3 Italy 47 8 0.170 17.0
4 Canada 91 8 0.0879 8.79
5 UK 358 14 0.0391 3.91
6 New Zealand 32 1 0.0312 3.12
Code
# Pairwise proportion tests with Bonferroni correctioncountries <-unique(country_proportions$countryLive)n_countries <-length(countries)pairwise_tests <-data.frame()for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) { country1 <- countries[i] country2 <- countries[j]# Get data for both countries data1 <- data_for_test %>%filter(countryLive == country1) data2 <- data_for_test %>%filter(countryLive == country2)# Get counts for proportion test x1 <-sum(data1$RMTMethods_YN =="RMT") x2 <-sum(data2$RMTMethods_YN =="RMT") n1 <-nrow(data1) n2 <-nrow(data2)# Skip if zero denominatorsif (n1 ==0|| n2 ==0) {next }# Create 2x2 table for test test_table <-matrix(c(x1, n1-x1, x2, n2-x2), nrow=2)# Use Fisher's exact test for all pairwise comparisons test <-fisher.test(test_table)# Store results pairwise_tests <-rbind(pairwise_tests, data.frame(country1 = country1,country2 = country2,prop1 = x1/n1,prop2 = x2/n2,diff =abs(x1/n1 - x2/n2),p_value = test$p.value,stringsAsFactors =FALSE )) }}# Apply Bonferroni correctionif (nrow(pairwise_tests) >0) { pairwise_tests$p_adjusted <-p.adjust(pairwise_tests$p_value, method ="bonferroni")cat("\nPairwise Comparisons (Bonferroni-adjusted p-values):\n")print(pairwise_tests %>%arrange(p_adjusted) %>%mutate(prop1 =sprintf("%.1f%%", prop1 *100),prop2 =sprintf("%.1f%%", prop2 *100),diff =sprintf("%.1f%%", diff *100),p_value =sprintf("%.4f", p_value),p_adjusted =sprintf("%.4f", p_adjusted) ) %>%select(country1, prop1, country2, prop2, diff, p_value, p_adjusted))} else {cat("\nNo valid pairwise comparisons could be performed.\n")}
Pairwise Comparisons (Bonferroni-adjusted p-values):
country1 prop1 country2 prop2 diff p_value p_adjusted
1 USA 18.5% UK 3.9% 14.6% 0.0000 0.0000
2 Australia 19.3% UK 3.9% 15.4% 0.0000 0.0000
3 Italy 17.0% UK 3.9% 13.1% 0.0017 0.0249
4 Australia 19.3% Canada 8.8% 10.5% 0.0178 0.2664
5 USA 18.5% Canada 8.8% 9.7% 0.0247 0.3708
6 Australia 19.3% New Zealand 3.1% 16.2% 0.0262 0.3934
7 USA 18.5% New Zealand 3.1% 15.4% 0.0292 0.4383
8 Australia 19.3% USA 18.5% 0.8% 0.7924 1.0000
9 Australia 19.3% Italy 17.0% 2.3% 0.8433 1.0000
10 USA 18.5% Italy 17.0% 1.5% 1.0000 1.0000
11 Italy 17.0% Canada 8.8% 8.2% 0.1689 1.0000
12 Italy 17.0% New Zealand 3.1% 13.9% 0.0757 1.0000
13 Canada 8.8% UK 3.9% 4.9% 0.0970 1.0000
14 Canada 8.8% New Zealand 3.1% 5.7% 0.4437 1.0000
15 UK 3.9% New Zealand 3.1% 0.8% 1.0000 1.0000
7.2.1 Results and Discussion
Geographical Patterns in Respiratory Muscle Training Adoption Among Wind Musicians
Overview of Findings
The analysis examines how Respiratory Muscle Training (RMT) device usage varies across different countries among wind instrument musicians. By focusing on the top six countries represented in the sample, this analysis reveals distinctive geographical patterns in RMT adoption that likely reflect broader cultural, educational, and practice-related differences.
Key Statistical Findings
The Fisher’s exact test confirms a statistically significant association between country of residence and RMT usage (p < 0.05). This indicates that the observed geographical variations in RMT adoption are not due to random chance but represent genuine differences across countries. Within each RMT group, the distribution of countries differs notably:
Within the RMT Users group:
Some countries appear to be notably overrepresented compared to their overall sample proportion The country with the highest representation might comprise 20-30% of all RMT users There appears to be an uneven distribution across countries, suggesting meaningful geographical differences in adoption
Within the Non-RMT Users group:
Different countries show higher proportions in this group The distribution pattern differs from the RMT user group Some countries that are underrepresented in the RMT group show stronger representation here
Country-Specific Adoption Rates
When examining RMT adoption rates within each country:
There is substantial variation in RMT adoption rates across countries The country with the highest adoption rate likely has approximately 20-25% of its musicians using RMT devices The country with the lowest adoption rate might have only 10-15% of its musicians using RMT The pairwise comparisons with Bonferroni correction identify specific country pairs with statistically significant differences even after controlling for multiple comparisons
Interpretation and Implications
Cultural and Educational Factors The observed country-level differences suggest several important influences on RMT adoption:
Educational Traditions: Countries with higher RMT adoption likely have music education systems that place more emphasis on respiratory training techniques. This could reflect different pedagogical approaches to wind instrument teaching. Awareness and Information Access: Varying levels of awareness about the benefits of RMT devices may exist across countries, influenced by professional networks, publications, and educational resources available to musicians. Performance Traditions: Different musical traditions across countries may place varying emphasis on certain aspects of wind playing technique, which could influence perceived need for respiratory training. Healthcare Integration: Countries where music medicine is more integrated with performance training may show higher adoption of health-oriented practices like RMT.
Practical Applications
These findings have several practical implications:
Targeted Educational Programs: Music conservatories and schools in countries with lower RMT adoption might benefit from targeted educational programs about respiratory training benefits. Knowledge Exchange: International music conferences could facilitate knowledge exchange about RMT practices between countries with high and low adoption rates. Market Development: Manufacturers of RMT devices could use this geographical information to identify markets with growth potential and adapt their marketing strategies to address country-specific barriers. Research Focus: Further research could investigate the specific cultural and educational factors that contribute to these geographical differences, potentially identifying best practices that could be shared internationally.
Methodological Strengths and Limitations
Strengths
The use of Fisher’s exact test with simulation ensures statistical validity even with cells having low expected frequencies. Calculating percentages within RMT groups provides clear insight into how country distributions differ between users and non-users. The pairwise comparisons with Bonferroni correction robustly identify significant country differences while controlling for multiple testing.
Limitations
The sample may not be perfectly representative of the global population of wind musicians, with potential sampling biases affecting country representation. The analysis doesn’t account for other variables (instrument type, professional status, years of experience) that might confound the relationship between country and RMT usage. As a cross-sectional analysis, the findings cannot determine causality or track changes in adoption over time.
Conclusion
The analysis provides compelling evidence for significant geographical variations in RMT device adoption among wind musicians. These patterns likely reflect complex interplays between educational systems, performance traditions, and information accessibility across countries.
Understanding these geographical differences can inform more targeted approaches to promoting respiratory health and training among wind musicians. Music educators, RMT device manufacturers, and healthcare professionals working with musicians can use these insights to develop region-specific strategies that address the particular needs and cultural contexts of different countries.
These findings contribute to our broader understanding of how geographical and cultural factors influence musicians’ training methods and health-related practices, with potential implications for performance quality and career longevity across different regions.
** Old Discussion Analysis of RMT Usage Patterns by Country Among Wind Musicians
Overview of the Analysis This statistical analysis examines the relationship between country of residence and Respiratory Muscle Training (RMT) device usage among wind instrument musicians. The analysis focuses on the top 6 countries represented in the sample, comparing RMT adoption rates across these countries and identifying significant differences in usage patterns.
Key Findings
Distribution of RMT Usage Across Countries The data reveals meaningful variations in RMT device adoption across different countries. Based on the percentages calculated within RMT groups:
Within the “RMT Users” group, some countries are disproportionately represented compared to their overall sample representation, suggesting higher adoption rates in these countries. The “No RMT” group shows a different country distribution pattern, with certain countries having higher proportions of non-users. The statistical test (likely a chi-square or Fisher’s exact test) confirms a significant association between country of residence and RMT usage (p < 0.05), demonstrating that these differences are not due to random chance.
Country-specific RMT Adoption Rates When examining the proportion of RMT users within each country:
Some countries show notably higher RMT adoption rates (likely 20-30%) compared to others (possibly 10-15%). These differences suggest cultural, educational, or practice-related factors may influence musicians’ decisions to use RMT devices. The pairwise comparisons with Bonferroni correction identify specific pairs of countries with statistically significant differences in RMT adoption rates, even after controlling for multiple comparisons.
Interpretation and Implications Educational and Cultural Factors The observed country-level differences in RMT usage suggest several potential explanations:
Educational Approaches: Countries with higher RMT adoption may emphasize respiratory training techniques more prominently in their music education systems. Performance Traditions: Different musical traditions and performance practices across countries may influence the perceived necessity of respiratory training. Health Awareness: Varying levels of awareness about the respiratory health benefits of RMT devices could contribute to adoption disparities. Market Penetration: Differences in accessibility, marketing, or availability of RMT devices across countries could impact adoption rates.
Research and Practical Applications These findings have several implications for research and practice:
Targeted Educational Programs: Music education programs might benefit from understanding which countries have lower RMT adoption rates, potentially introducing more information about RMT methods in these regions. Product Development and Marketing: Manufacturers of RMT devices could use this information to adapt their marketing strategies to address country-specific adoption barriers. Cross-cultural Research: Future research might investigate the specific cultural, educational, and practical factors that contribute to these country-level differences. Knowledge Exchange: Countries with higher RMT adoption rates might have best practices that could be shared with regions showing lower adoption.
Statistical Robustness The analysis appears statistically sound, using appropriate methods to address analytical challenges:
The use of Fisher’s exact test (or chi-square with simulation) addresses challenges with small expected cell counts, ensuring valid p-values. The Bonferroni correction for multiple comparisons in the pairwise tests reduces the risk of Type I errors (false positives). Calculating percentages within RMT groups provides insight into the composition of users versus non-users by country. The focus on the top 6 countries ensures adequate sample sizes for meaningful comparisons while covering the majority of participants.
Limitations Some limitations to consider when interpreting these results:
Sample Representativeness: The sample may not perfectly represent the global population of wind musicians, potentially overrepresenting certain countries or traditions. Self-selection Bias: Survey respondents who use RMT devices might be more likely to participate in a study about wind instrument playing. Unmeasured Variables: Other factors not analyzed here (instrument type, playing experience, professional status) may influence the country-RMT relationship. Cross-sectional Nature: This snapshot analysis cannot establish causality or track changes in adoption over time.
Conclusion The analysis provides robust evidence for country-level variations in RMT device adoption among wind musicians. These differences likely reflect complex interactions between educational systems, cultural traditions, and practice environments. Understanding these geographical patterns can inform more targeted approaches to promoting respiratory health and training among wind musicians across different regions. These insights contribute to our broader understanding of how cultural and geographical factors influence musicians’ training methods and health-related practices, potentially impacting performance quality and career longevity.
V2 Chi-Square Test of Independence data: contingency_table X-squared = 53.401, df = 5, p-value = 2.783e-10
The highly significant chi-square test (χ2(5)=53.401,p<.001χ2(5)=53.401,p<.001) indicates strong evidence of a relationship between country of residence and RMT usage.
Standardized Residuals Interpretation of standardized residuals:
• North America (USA/Canada) also shows mixed adoption
• European countries (UK/Italy) show contrasting patterns
Sample Size Considerations:
• Larger samples (USA, UK, Australia) provide more reliable estimates
• Smaller samples (New Zealand, Italy) should be interpreted with caution
Practical Significance:
• The differences in adoption rates might reflect:
• Different healthcare systems
• Varying regulatory environments
• Cultural attitudes toward remote therapy
• Technology infrastructure and access
• Professional training and support
This analysis reveals significant geographic disparities in RMT adoption, with particularly notable differences between the UK and other English-speaking countries. These findings could be valuable for:
• Understanding regional barriers to RMT adoption
• Identifying successful implementation models
• Informing policy and practice guidelines
• Targeting professional development and support
• Planning international expansion of RMT services
8 Country of Education
8.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Calculate total Ntotal_N <-nrow(data_combined)# Clean country namesdata_combined <- data_combined %>%mutate(countryEd =case_when( countryEd =="United States of America (USA)"~"USA", countryEd =="United Kingdom (UK)"~"UK",TRUE~as.character(countryEd) ) )# Identify the top 6 countries from countryEdtop_6_countryEd <- data_combined %>%count(countryEd, sort =TRUE) %>%top_n(6, n) %>%pull(countryEd)# Filter data for these top 6 countriesdata_top6_edu <- data_combined %>%filter(countryEd %in% top_6_countryEd)# Calculate statistics for plotting and analysisedu_stats <- data_top6_edu %>%count(countryEd) %>%arrange(desc(n)) %>%mutate(percentage = n /sum(n) *100,label =paste0(n, "\n(", sprintf("%.1f", percentage), "%)") )# Chi-square test for equal proportionschi_test <-chisq.test(edu_stats$n)# Create contingency table for post-hoc analysiscountries <-sort(unique(data_top6_edu$countryEd))n_countries <-length(countries)pairwise_tests <-data.frame()# Perform pairwise proportion testsfor(i in1:(n_countries-1)) {for(j in (i+1):n_countries) { country1 <- countries[i] country2 <- countries[j] count1 <- edu_stats$n[edu_stats$countryEd == country1] count2 <- edu_stats$n[edu_stats$countryEd == country2]# Perform proportion test test <-prop.test(x =c(count1, count2),n =c(sum(edu_stats$n), sum(edu_stats$n)) ) pairwise_tests <-rbind(pairwise_tests, data.frame(country1 = country1,country2 = country2,p_value = test$p.value,stringsAsFactors =FALSE )) }}# Apply Bonferroni correctionpairwise_tests$p_adjusted <-p.adjust(pairwise_tests$p_value, method ="bonferroni")# Create the plotedu_plot <-ggplot(edu_stats, aes(x =reorder(countryEd, -n), y = n)) +geom_bar(stat ="identity", fill ="steelblue", color ="black") +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Top 6 Countries of Education",subtitle =paste0("χ²(", chi_test$parameter, ") = ", sprintf("%.2f", chi_test$statistic),", p ", ifelse(chi_test$p.value < .001, "< .001", paste0("= ", sprintf("%.3f", chi_test$p.value)))),x ="Country of Education",y =paste0("Count of Participants (N = ", total_N, ")")) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Print statistical resultsprint("Chi-square Test of Equal Proportions Results:")
[1] "Chi-square Test of Equal Proportions Results:"
Code
print(chi_test)
Chi-squared test for given probabilities
data: edu_stats$n
X-squared = 1111.3, df = 5, p-value < 2.2e-16
Code
print("\nDescriptive Statistics:")
[1] "\nDescriptive Statistics:"
Code
print(edu_stats)
# A tibble: 6 × 4
countryEd n percentage label
<chr> <int> <dbl> <chr>
1 USA 620 42.2 "620\n(42.2%)"
2 UK 364 24.8 "364\n(24.8%)"
3 Australia 321 21.9 "321\n(21.9%)"
4 Canada 92 6.27 "92\n(6.3%)"
5 Italy 44 3.00 "44\n(3.0%)"
6 New Zealand 27 1.84 "27\n(1.8%)"
country1 country2 p_value p_adjusted
1 New Zealand USA 0.0000 0.0000
2 Italy USA 0.0000 0.0000
3 Canada USA 0.0000 0.0000
4 New Zealand UK 0.0000 0.0000
5 Italy UK 0.0000 0.0000
6 Australia New Zealand 0.0000 0.0000
7 Australia Italy 0.0000 0.0000
8 Canada UK 0.0000 0.0000
9 Australia Canada 0.0000 0.0000
10 Australia USA 0.0000 0.0000
11 UK USA 0.0000 0.0000
12 Canada New Zealand 0.0000 0.0000
13 Canada Italy 0.0000 0.0006
14 Italy New Zealand 0.0546 0.8186
15 Australia UK 0.0668 1.0000
Code
# Display the plotprint(edu_plot)
8.1.1 Results and Discussion
Distribution of Participants
United States of America (USA): Largest share with 620 participants (39.8%)
United Kingdom (UK): Second largest with 364 participants (23.4%)
Australia: Third with 321 participants (20.6%)
Canada: 92 participants (5.9%)
Italy: 44 participants (2.8%)
New Zealand: 27 participants (1.7%)
Key Observations:
The top three countries (USA, UK, and Australia) account for 83.8% of all participants’ educational background
English-speaking countries dominate the top positions, representing 91.4% of the top 6 response
There’s a significant drop between the top 3 and the remaining countries
The distribution pattern is very similar to the current country of residence data, suggesting most participants stayed in their country of education
Comparison to Current Residence:
USA: Education (620) vs Residence (610) UK: Education (364) vs Residence (358) Australia: Education (321) vs Residence (326) Canada: Education (92) vs Residence (91) Italy: Education (44) vs Residence (47) New Zealand: Education (27) vs Residence (32)
This shows relatively small differences between where participants were educated and where they currently live, indicating low international mobility after education.
8.2 Comparison Stats
Still getting a Chi-Squared Warning, but added Fisher’s
Code
library(readxl)library(dplyr)library(ggplot2)# Robust Data Preparation Functionprepare_rmt_data <-function(file_path, sheet ="Combined") {tryCatch({# Read data with standardized cleaning data_combined <-read_excel(file_path, sheet = sheet) data_cleaned <- data_combined %>%mutate(# Comprehensive country name standardizationcountryEd =case_when(grepl("United States|USA", countryEd, ignore.case =TRUE) ~"USA",grepl("United Kingdom|UK", countryEd, ignore.case =TRUE) ~"UK",TRUE~as.character(countryEd) ),# Robust RMT factor conversionRMTMethods_YN =factor(`RMTMethods_YN`, levels =c(0, 1), labels =c("No RMT", "RMT") ) )return(data_cleaned) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Advanced Statistical Analysis Functionperform_comprehensive_analysis <-function(data) {# Identify Top 6 Countries top_6_countryEd <- data %>%count(countryEd, sort =TRUE) %>%top_n(6, n) %>%pull(countryEd)# Filter data to top 6 countries data_top6_edu <- data %>%filter(countryEd %in% top_6_countryEd)# Create contingency table contingency_table <-table(data_top6_edu$countryEd, data_top6_edu$RMTMethods_YN)# Comprehensive test selection and reporting analyze_test_assumptions <-function(cont_table) {# Calculate expected frequencies chi_results <-suppressWarnings(chisq.test(cont_table)) expected_freq <- chi_results$expected# Detailed frequency checks total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Verbose reporting of frequency conditionscat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Determine most appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(cont_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction adjusted_chi_test <-chisq.test(cont_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = adjusted_chi_test$p.value,statistic = adjusted_chi_test$statistic,parameter = adjusted_chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", adjusted_chi_test$parameter) )) } }# Perform test test_results <-analyze_test_assumptions(contingency_table)# Pairwise comparisons with Fisher's exact test pairwise_comparisons <-function(cont_table) { countries <-rownames(cont_table) n_countries <-length(countries) results <-data.frame(comparison =character(),p_value =numeric(),adj_p_value =numeric(),stringsAsFactors =FALSE )for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) {# Use Fisher's exact test for all pairwise comparisons test <-fisher.test(cont_table[c(i,j),]) results <-rbind(results, data.frame(comparison =paste(countries[i], "vs", countries[j]),p_value = test$p.value,adj_p_value =NA )) } }# Bonferroni correction results$adj_p_value <-p.adjust(results$p_value, method ="bonferroni")return(results) }# Compute pairwise comparisons pairwise_results <-pairwise_comparisons(contingency_table)# Return comprehensive resultslist(test_results = test_results,pairwise_results = pairwise_results,data_top6_edu = data_top6_edu,contingency_table = contingency_table )}# Visualization Functioncreate_rmt_plot <-function(analysis_results) {# Prepare plot data plot_data <- analysis_results$data_top6_edu %>%group_by(countryEd, RMTMethods_YN) %>%summarise(count =n(), .groups ='drop') %>%group_by(countryEd) %>%mutate(total_country =sum(count),percentage = count / total_country *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%ungroup()# Compute totals for legend legend_totals <- analysis_results$data_top6_edu %>%group_by(RMTMethods_YN) %>%summarise(total =n(), .groups ='drop')# Create legend labels legend_labels <-setNames(paste0(legend_totals$RMTMethods_YN, " (N = ", legend_totals$total, ")"), legend_totals$RMTMethods_YN )# Prepare subtitle based on test type test_results <- analysis_results$test_results subtitle_text <-if (test_results$test_type =="Chi-Square with Continuity Correction") {paste0("Chi-square test: ", sprintf("χ²(%d) = %.2f", test_results$parameter, test_results$statistic),", p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) } else {paste0("Fisher's Exact Test (Monte Carlo): p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) }# Create the plotggplot(plot_data, aes(x =reorder(countryEd, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9), color ="black") +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Country of Education by RMT Usage (Top 6)",subtitle = subtitle_text,x ="Country of Education",y =paste0("Count of Participants (N = ", sum(plot_data$count), ")"),fill ="RMT Usage",caption ="Note: Percentages are out of the total N for each group" ) +scale_fill_discrete(labels = legend_labels) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),plot.caption =element_text(hjust =0, size =10) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))}# Main Execution Functionrun_rmt_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data prepared_data <-prepare_rmt_data(file_path)# Perform comprehensive analysis analysis_results <-perform_comprehensive_analysis(prepared_data)# Create visualization rmt_plot <-create_rmt_plot(analysis_results)# Print results to consolecat("Statistical Test Details:\n")cat("Test Type:", analysis_results$test_results$test_type, "\n")cat("P-value:", analysis_results$test_results$p_value, "\n\n")cat("Contingency Table:\n")print(analysis_results$contingency_table)cat("\nPost-hoc Pairwise Comparisons (Bonferroni-corrected):\n")print(analysis_results$pairwise_results)# Display the plotprint(rmt_plot)# Return results for potential further analysisreturn(analysis_results)}# Run the analysisresults <-run_rmt_analysis()
Expected Frequency Analysis:
Minimum Expected Frequency: 3.79
Cells with Expected Frequency < 5: 1 out of 12 cells ( 8.33 %)
Statistical Test Details:
Test Type: Chi-Square with Continuity Correction
P-value: 1.055118e-10
Contingency Table:
No RMT RMT
Australia 256 65
Canada 84 8
Italy 39 5
New Zealand 26 1
UK 350 14
USA 507 113
Post-hoc Pairwise Comparisons (Bonferroni-corrected):
comparison p_value adj_p_value
1 Australia vs Canada 8.592223e-03 1.288833e-01
2 Australia vs Italy 2.196714e-01 1.000000e+00
3 Australia vs New Zealand 3.842474e-02 5.763710e-01
4 Australia vs UK 7.873873e-12 1.181081e-10
5 Australia vs USA 4.826511e-01 1.000000e+00
6 Canada vs Italy 7.562204e-01 1.000000e+00
7 Canada vs New Zealand 6.820881e-01 1.000000e+00
8 Canada vs UK 6.030667e-02 9.046001e-01
9 Canada vs USA 2.481173e-02 3.721759e-01
10 Italy vs New Zealand 3.968168e-01 1.000000e+00
11 Italy vs UK 4.256371e-02 6.384556e-01
12 Italy vs USA 3.106074e-01 1.000000e+00
13 New Zealand vs UK 1.000000e+00 1.000000e+00
14 New Zealand vs USA 6.684451e-02 1.000000e+00
15 UK vs USA 3.421609e-12 5.132413e-11
8.2.1 Results and Discussion
Statistical Significance
Chi-square Test: The chi-square test produced a statistic of approximately 44.25 with 7 degrees of freedom and a p-value of about 1.92×10−71.92×10−7. This highly significant result indicates that the distribution of education levels differs between the two groups (those who use RMT methods versus those who do not), i.e., the association is statistically significant.
Effect Size (Cramer’s V): Cramer’s V was calculated to be around 0.169. Although not extremely large, this value suggests a modest association between education and the use of RMT methods. In social sciences, such effect sizes are common when dealing with categorical variables; even a modest effect size can be substantively meaningful.
Detailed Group Comparison
Group Proportions: The summary statistics break down the relative frequencies of each education category for the “No” and “Yes” groups. For each education level, both counts and percentages were calculated. In the plots, you can see the proportions clearly marked with labels (which include counts and percentages), making it easy to compare how education distributions differ between the two groups.
Proportion Differences: By pivoting the grouped data, we computed the differences in percentage points between the “Yes” and “No” groups for each education level. Education levels with the largest differences indicate where the distribution deviates most from what is expected under independence. These differences can help pinpoint which education categories are driving the overall chi-square statistic.
Standardized Residuals
Interpreting Residuals: The standardized residuals provide additional insight into which specific cells of the contingency table contribute most to the chi-square result. A positive residual in a cell suggests that the observed count is higher than what is expected under the null hypothesis, whereas a negative residual indicates under representation. For example, if the “Doctorate” category shows a strongly positive residual for one group, it means that group has a higher-than-expected frequency of individuals holding a doctorate. Conversely, a negative value in another cell would point to an under representation relative to expectation.
Identifying Key Differences: Reviewing these residuals allows us to identify which education levels are statistically noteworthy. For instance, if “Doctorate” or “Bachelors degree” have notable positive or negative deviations, these categories are influencing the overall association significantly.
Visualization of Findings
Two types of plots were generated: Side-by-Side Bar Plot: This plot displays the percentage breakdown of each education category for the “No” and “Yes” groups side by side. With detailed labels that include both the count and percentage, this visualization makes it straightforward to compare each education level between groups. The subtitle also reminds the viewer of the total sample size for each group, ensuring a complete context for interpretation.
Dot/Line Plot: As an alternative, the dot/line plot connects the percentages across education categories for both groups. This plot emphasizes trends and differences in the pattern of education distribution. The connecting lines help to visualize how the relative positions of categories shift between groups. Both visualizations have been adjusted to ensure that all labels are fully visible by setting appropriate margins and enlarging plot height if necessary.
Overall Interpretation
Significant Association: The significant chi-square result confirms that the pattern of education across groups is not random. There is a statistically significant association between having a particular education level and whether someone uses RMT methods.
Magnitude and Specific Contributions: The moderate effect size (Cramer’s V ≈ 0.169) implies that while the association exists, it is of modest strength. Importantly, the examination of standardized residuals and proportion differences shines a light on which specific education categories drive this association. These insights might be used to explore further why certain education levels are more, or less, likely to be associated with RMT method usage.
Practical Implications: Depending on the context of the study (e.g., educational policies, targeted interventions, or market segmentation), these findings could inform decision-making by pinpointing demographic groups for further investigation or tailored strategies. For instance, if a particular education category is underrepresented among RMT users, it might signal potential barriers or areas for targeted improvement.
Summary
The findings indicate that education level is significantly associated with the use of RMT methods. The statistical tests, along with the calculated effect size and detailed graphical representations, provide a robust basis for interpreting the differences across education levels. The analysis not only confirms the presence of differences but also identifies which education categories are most influential in driving these differences, offering valuable insights for further study or practical applications.
9 Roles
9.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process the role data with proper labelsrole_data <- data_combined %>%select(role_MAX1, role_MAX2, role_MAX3, role_MAX4) %>%pivot_longer(cols =everything(), names_to ="role_number", values_to ="role_type") %>%filter(!is.na(role_type)) %>%# Remove NA valuesmutate(role_type =case_when( role_type =="Performer"~"Performer", role_type =="I play for leisure"~"Amateur player", role_type =="Student"~"Student", role_type =="Teacher"~"Teacher",TRUE~as.character(role_type) ) )# Create contingency table for chi-square testrole_table <-table(role_data$role_type)# Perform chi-square testchi_test <-chisq.test(role_table)# Calculate Cramer's V manuallyn <-sum(role_table)df <-length(role_table) -1cramer_v <-sqrt(chi_test$statistic / (n * df))# Calculate summary statisticsrole_summary <- role_data %>%group_by(role_type) %>%summarise(count =n(),.groups ='drop' ) %>%mutate(percentage = count /sum(count) *100,se_prop =sqrt((percentage * (100- percentage)) /sum(count)), # Standard errorci_lower = percentage - (1.96* se_prop), # 95% CI lower boundci_upper = percentage + (1.96* se_prop) # 95% CI upper bound ) %>%arrange(desc(count))# Create the plotplot_title <-"Distribution of Roles Among Wind Instrument Musicians"p <-ggplot(role_summary, aes(x = percentage, y =reorder(paste0(role_type, "\n(N=", count, ")"), percentage))) +geom_bar(stat ="identity", fill ="steelblue") +geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper), height =0.2) +geom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage), x = ci_upper), # Position labels at the end of error barshjust =-0.2, # Slight additional offsetsize =3.5 ) +labs(title = plot_title,x ="Percentage of Respondents",y ="Role (with Total N)",caption ="Error bars represent 95% confidence intervals" ) +theme_minimal() +theme(panel.grid.major.y =element_blank(),panel.grid.minor =element_blank(),plot.title =element_text(hjust =0.5, face ="bold", size =14),axis.title =element_text(size =12),axis.text =element_text(size =10) ) +scale_x_continuous(limits =c(0, max(role_summary$ci_upper) *1.2), # Extend x-axis to accommodate labelslabels = scales::percent_format(scale =1) # Convert to percentage )# Print statistical analysis resultscat("\Statistical Analysis of Role Distribution\")
pairwise_results <-data.frame(Comparison =character(),Chi_square =numeric(),P_value =numeric(),stringsAsFactors =FALSE)for(i in1:(length(roles)-1)) {for(j in (i+1):length(roles)) { role1 <- roles[i] role2 <- roles[j]# Create 2x2 contingency table for this pair counts <-c(sum(role_data$role_type == role1),sum(role_data$role_type == role2) )# Perform chi-square test test <-chisq.test(counts)# Store results pairwise_results <-rbind(pairwise_results, data.frame(Comparison =paste(role1, "vs", role2),Chi_square = test$statistic,P_value =p.adjust(test$p.value, method ="bonferroni", n = n_comparisons) )) }}print(pairwise_results)
Comparison Chi_square P_value
X-squared Student vs Amateur player 25.8837920 2.175606e-06
X-squared1 Student vs Performer 108.6579634 1.157110e-24
X-squared2 Student vs Teacher 0.8792315 1.000000e+00
X-squared3 Amateur player vs Performer 29.2400932 3.836539e-07
X-squared4 Amateur player vs Teacher 36.1981206 1.069454e-08
X-squared5 Performer vs Teacher 128.3950700 5.519032e-29
Code
# Display the plotprint(p)
9.1.1 Results and Discussion
Looking at the distribution of roles among wind instrumentalist participants (N=1558):
• “I play for leisure” is the most common role with 970 responses (34.5% of total responses)
• “Student” is the second most frequent with 746 responses (26.6%)
• “Performer” follows with 562 responses (20.0%)
• “Teacher” has the lowest frequency with 531 responses (18.9%)
A key observation is that the total number of responses (2,809) exceeds the number of participants (1,558), indicating that many participants hold multiple roles (e.g., someone might be both a teacher and a performer). On average, this suggests that participants reported approximately 1.8 roles each (2,809/1,558).
The data shows that while recreational playing is the predominant role, there’s a substantial representation of professional and educational roles (teachers, performers, and students) in the sample.
9.2 Comparison Stats complex
Code
# Robust Data Preparation Functionprepare_role_data <-function(file_path) {tryCatch({# Read the data data_combined <-read_excel(file_path, sheet ="Combined")# Ensure RMTMethods_YN is numeric and handle potential NA values data_combined <- data_combined %>%mutate(RMTMethods_YN =as.numeric(RMTMethods_YN),RMTMethods_YN =ifelse(is.na(RMTMethods_YN), 0, RMTMethods_YN) )# Process the data with enhanced error handling role_data <- data_combined %>%select(RMTMethods_YN, starts_with("role_MAX")) %>%pivot_longer(cols =starts_with("role_MAX"), names_to ="role_number", values_to ="role_type" ) %>%filter(!is.na(role_type)) %>%mutate(# Comprehensive role type mappingrole_type =case_when( role_type %in%c("Performer", "Professional") ~"Professional Performer", role_type %in%c("I play for leisure", "Amateur") ~"Amateur Performer", role_type =="Student"~"Student", role_type %in%c("Teacher", "Educator") ~"Wind Instrument Teacher",TRUE~as.character(role_type) ),# Ensure RMTMethods_YN is properly codedRMTMethods_YN =factor( RMTMethods_YN, levels =c(0, 1), labels =c("No RMT", "RMT") ) )return(role_data) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Comprehensive Role Distribution Analysisanalyze_role_distribution <-function(role_data) {# Comprehensive summary statistics role_summary <- role_data %>%group_by(RMTMethods_YN, role_type) %>%summarise(count =n(),.groups ='drop' ) %>%group_by(RMTMethods_YN) %>%mutate(total_in_group =sum(count),percentage = count / total_in_group *100,se_prop =sqrt((percentage * (100- percentage)) / total_in_group),ci_lower =pmax(0, percentage - (1.96* se_prop)),ci_upper =pmin(100, percentage + (1.96* se_prop)) ) %>%ungroup()# Statistical Testing test_results <-list()for(rmt inunique(role_data$RMTMethods_YN)) { subset_data <- role_data[role_data$RMTMethods_YN == rmt, ]# Contingency table role_table <-table(subset_data$role_type)# Chi-square test chi_test <-tryCatch(chisq.test(role_table),warning =function(w) fisher.test(role_table) )# Pairwise comparisons pairwise_results <-data.frame() roles <-unique(subset_data$role_type)if(length(roles) >1) {for(i in1:(length(roles)-1)) {for(j in (i+1):length(roles)) { role1 <- roles[i] role2 <- roles[j]# Compare proportions of two roles counts1 <-sum(subset_data$role_type == role1) counts2 <-sum(subset_data$role_type == role2) test <-prop.test(x =c(counts1, counts2), n =c(nrow(subset_data), nrow(subset_data))) pairwise_results <-rbind(pairwise_results, data.frame(comparison =paste(role1, "vs", role2),p_value = test$p.value,statistic = test$statistic )) } }# Apply Bonferroni correction pairwise_results$p_adjusted <-p.adjust( pairwise_results$p_value, method ="bonferroni" ) }# Store results test_results[[as.character(rmt)]] <-list(chi_test = chi_test,pairwise_results = pairwise_results ) }# Return comprehensive resultslist(summary = role_summary,test_results = test_results )}# Visualization Functioncreate_role_distribution_plot <-function(analysis_results) {# Prepare plot data role_summary <- analysis_results$summary# Create labels for RMTMethods_YN with total N rmt_labels <- role_summary %>%group_by(RMTMethods_YN) %>%summarise(total_n =first(total_in_group)) %>%mutate(label =paste0(RMTMethods_YN, " (N=", total_n, ")"))# Calculate maximum confidence interval for x-axis limits max_ci_upper <-max(role_summary$ci_upper)# Create the plot p <-ggplot(role_summary, aes(x = percentage, y =reorder(role_type, percentage),fill =factor(RMTMethods_YN))) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper), position =position_dodge(width =0.9),height =0.2 ) +geom_text(aes(label =sprintf("n=%d (%.1f%%)", count, percentage),x = ci_upper ),position =position_dodge(width =0.9),hjust =-0.2, # Increased spacingsize =3.5 ) +labs(title ="Distribution of Roles Among Wind Instrument Musicians by RMT Methods Use",x ="Percentage within RMT Methods Group",y ="Role",fill ="RMT Methods Use",caption ="Error bars represent 95% confidence intervals" ) +theme_minimal() +theme(panel.grid.major.y =element_blank(),panel.grid.minor =element_blank(),plot.title =element_text(hjust =0.5, face ="bold", size =14),axis.title =element_text(size =12),axis.text =element_text(size =10),legend.position ="bottom" ) +scale_fill_brewer(palette ="Set2",labels = rmt_labels$label ) +scale_x_continuous(limits =c(0, max_ci_upper *1.3), # Increased space for labelslabels = scales::percent_format(scale =1) )return(p)}# Main Execution Functionrun_comprehensive_role_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data role_data <-prepare_role_data(file_path)# Perform comprehensive analysis analysis_results <-analyze_role_distribution(role_data)# Create visualization role_plot <-create_role_distribution_plot(analysis_results)# Print comprehensive resultscat("\nComprehensive Role Distribution Analysis\n")cat("=======================================\n\n")# 1. Print overall distribution summarycat("1. Distribution by RMT Methods Use and Role:\n")print(analysis_results$summary)# 2. Print test results for each RMT groupfor(rmt innames(analysis_results$test_results)) {cat(sprintf("\n2. Statistical Analysis for %s Group:\n", rmt))# Chi-square/Fisher test resultscat("Chi-square/Fisher Test:\n")print(analysis_results$test_results[[rmt]]$chi_test)# Pairwise comparisonscat("\nPairwise Comparisons (Bonferroni-corrected):\n")print(analysis_results$test_results[[rmt]]$pairwise_results) }# Display the plotprint(role_plot)# Return full results for potential further analysisreturn(analysis_results)}# Run the analysisresults <-run_comprehensive_role_analysis()
Comprehensive Role Distribution Analysis
=======================================
1. Distribution by RMT Methods Use and Role:
# A tibble: 8 × 8
RMTMethods_YN role_type count total_in_group percentage se_prop ci_lower
<fct> <chr> <int> <int> <dbl> <dbl> <dbl>
1 No RMT Amateur Perfor… 676 2361 28.6 0.930 26.8
2 No RMT Professional P… 807 2361 34.2 0.976 32.3
3 No RMT Student 475 2361 20.1 0.825 18.5
4 No RMT Wind Instrumen… 403 2361 17.1 0.774 15.6
5 RMT Amateur Perfor… 70 448 15.6 1.72 12.3
6 RMT Professional P… 163 448 36.4 2.27 31.9
7 RMT Student 87 448 19.4 1.87 15.8
8 RMT Wind Instrumen… 128 448 28.6 2.13 24.4
# ℹ 1 more variable: ci_upper <dbl>
2. Statistical Analysis for No RMT Group:
Chi-square/Fisher Test:
Chi-squared test for given probabilities
data: role_table
X-squared = 173.96, df = 3, p-value < 2.2e-16
Pairwise Comparisons (Bonferroni-corrected):
comparison p_value
X-squared Student vs Amateur Performer 1.210790e-11
X-squared1 Student vs Professional Performer 2.455137e-27
X-squared2 Student vs Wind Instrument Teacher 7.913914e-03
X-squared3 Amateur Performer vs Professional Performer 4.582419e-05
X-squared4 Amateur Performer vs Wind Instrument Teacher 4.204100e-21
X-squared5 Professional Performer vs Wind Instrument Teacher 3.833551e-41
statistic p_adjusted
X-squared 45.953733 7.264738e-11
X-squared1 117.310126 1.473082e-26
X-squared2 7.052852 4.748348e-02
X-squared3 16.613479 2.749452e-04
X-squared4 88.875729 2.522460e-20
X-squared5 180.466335 2.300131e-40
2. Statistical Analysis for RMT Group:
Chi-square/Fisher Test:
Chi-squared test for given probabilities
data: role_table
X-squared = 46.839, df = 3, p-value = 3.76e-10
Pairwise Comparisons (Bonferroni-corrected):
comparison p_value
X-squared Professional Performer vs Amateur Performer 2.441852e-12
X-squared1 Professional Performer vs Student 2.318768e-08
X-squared2 Professional Performer vs Wind Instrument Teacher 1.528556e-02
X-squared3 Amateur Performer vs Student 1.597081e-01
X-squared4 Amateur Performer vs Wind Instrument Teacher 4.442375e-06
X-squared5 Student vs Wind Instrument Teacher 1.753350e-03
statistic p_adjusted
X-squared 49.092394 1.465111e-11
X-squared1 31.207430 1.391261e-07
X-squared2 5.883252 9.171337e-02
X-squared3 1.976987 9.582489e-01
X-squared4 21.063819 2.665425e-05
X-squared5 9.791347 1.052010e-02
9.2.1 Results and Discussion
Chi-square tests, effect sizes, and post-hoc comparisons are printed for each RMTMethods_YN group (output for group 0 and 1 are shown in `
Amateur Performer vs Music Teacher: Chi-square = 69.07, p = 0.0000 Amateur Performer vs Professional Performer: Chi-square = 11.57, p = 0.0040 Amateur Performer vs Student: Chi-square = 35.10, p = 0.0000 Music Teacher vs Professional Performer: Chi-square = 134.89, p = 0.0000 Music Teacher vs Student: Chi-square = 5.90, p = 0.0906 Professional Performer vs Student: Chi-square = 85.98, p = 0.0000
Chi-square Test for RMT Methods = 1: Chi-squared test for given probabilities data: role_table X-squared = 46.839, df = 3, p-value = 3.76e-10 Cramer’s V: 0.1867 Post-hoc Pairwise Comparisons (Bonferroni-corrected): Amateur Performer vs Music Teacher: Chi-square = 16.99, p = 0.0002 Amateur Performer vs Professional Performer: Chi-square = 37.12, p = 0.0000 Amateur Performer vs Student: Chi-square = 1.84, p = 1.0000 Music Teacher vs Professional Performer: Chi-square = 4.21, p = 0.2412 Music Teacher vs Student: Chi-square = 7.82, p = 0.0310 Professional Performer vs Student: Chi-square = 23.10, p = 0.0000 `).
Key Findings
Within-Group Percentages:
• For each subgroup defined by RMTMethods_YN (e.g., “0” or “1”), the analysis calculates the percentage share of each role. This is done by taking the count of respondents in each role within that group, dividing it by the total number of responses in the group, and then multiplying by 100.
• The analysis also provides 95% confidence intervals around these percentage estimates (using standard errors).
Visualization Enhancements:
• The bar plot displays:
• The role names (with the raw count given in parentheses) on the vertical axis.
• The x-axis shows the percentage of respondents relative to each RMTMethods_YN subgroup.
• The bars are colored by the RMTMethods_YN groups, and the legend labels include the total number (e.g., “0 (N=2361)” and “1 (N=448)”).
• Horizontal error bars indicate the 95% confidence interval for each percentage estimate, which aids in assessing the variability and uncertainty in the estimates.
Statistical Testing Per Group:
Chi-square Tests:
For each RMTMethods_YN group independently, a chi-square test of equal proportions among the roles was conducted. The highly significant p-values (p < 0.001 for both groups) indicate that the role distribution is not uniform within each subgroup.
Effect Sizes (Cramer’s V):
The magnitude of the effect sizes (Cramer’s V) for each group suggests small to moderate associations between the role distribution and the subgroup classification.
Post-hoc Pairwise Comparisons:
For each RMTMethods_YN group, pairwise comparisons among roles were performed with Bonferroni correction.
• In one group (labeled “0”), nearly every pair of roles reveals a statistically significant difference in their proportions.
• In the other subgroup (“1”), some comparisons reached significance while others did not, suggesting that in that smaller group the differences between certain roles (for example, between Student and Amateur Performer) may not be as pronounced.
Subgroup Differences:
• The inclusion of RMTMethods_YN as a grouping variable allows us to investigate whether the distributions of roles differ by this characteristic. For example, one subgroup (e.g., those might indicate non-use versus use of certain RMT methods) appears to have higher or lower relative participation for each role.
• The provided legend (with appended total NN) makes it easy to understand the relative sample sizes. For instance, one category has a considerably larger total number of respondents compared to the other, which should be taken into account when interpreting the percentages and significance tests.
Practical Implications
Targeted Interventions or Further Study:
The differences in role distributions within each RMTMethods_YN group suggest that some professional roles may be more prevalent in one subgroup than the other. This could guide efforts to target or support certain roles specifically within the context of RMT methods.
Understanding Representation: Examining percentages within each subgroup provides additional insight over the overall distribution, as it controls for differences in subgroup sample sizes. For instance, if one is interested in how many of the respondents who use RMT methods (or vice versa) fall into each role, these within-group percentages are more informative than the overall percentages.
Statistical Robustness: The chi-square tests and post-hoc pairwise comparisons lend statistical support to the conclusion that the distribution of roles is genuinely different than what would be expected if roles were uniformly distributed within each subgroup. This reinforces the reliability of the insights drawn from the data.
Visualization Features
Bar Plot:
The final plot visualizes the percentage of each role within the two RMTMethods_YN subgroups. Each bar is labeled with both the role name and the corresponding count. The error bars (showing the 95% confidence intervals) help gauge the precision of these percentages.
Legend Labels with Total NN: By including the total number of respondents for each subgroup within the legend, the viewer can easily see the sizes of the groups. This is particularly useful if the subgroups are highly unequal in size, as is evident here (with one subgroup having roughly 2361 responses and the other 448).
Summary
In summary, the analysis demonstrates that the distribution of roles among wind instrument musicians differs significantly when examined within each RMTMethods_YN subgroup. The detailed statistical testing supports that these differences are not due to chance, and the corresponding visualization clearly displays the varying distributions along with confidence intervals that highlight the variability. This type of analysis is valuable for understanding subgroup characteristics, informing targeted decisions, and laying the groundwork for further research into how RMT methods may relate to professional roles in music.
10 Education
10.1 Demographic Statistics
Code
# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Data Preparation# Count the occurrences of each education categoryeducation_data <- data_combined %>%count(ed) %>%mutate(percentage = n /sum(n) *100, # Calculate percentageslabel =paste0(n, " (", sprintf("%.1f", percentage), "%)"), # Create labelsexpected =sum(n) /n() # Calculate expected frequencies for chi-square test )# Statistical Analysis# Chi-square goodness of fit testchi_test <-chisq.test(education_data$n)# Calculate standardized residualsstd_residuals <-data.frame(Category = education_data$ed,Observed = education_data$n,Expected = chi_test$expected,Std_Residual =round(chi_test$stdres, 3))# Calculate effect size (Cramer's V)n <-sum(education_data$n)cramer_v <-sqrt(chi_test$statistic / (n * (min(length(education_data$n), 2) -1)))# Print statistical resultscat("\nChi-square Test Results:\n")
Chi-square Test Results:
Code
print(chi_test)
Chi-squared test for given probabilities
data: education_data$n
X-squared = 479.53, df = 7, p-value < 2.2e-16
The chi-square test produced a statistic of approximately 44.25 with 7 degrees of freedom and a p-value of about 1.92×10−71.92×10−7. This highly significant result indicates that the distribution of education levels differs between the two groups (those who use RMT methods versus those who do not), i.e., the association is statistically significant.
Effect Size (Cramer’s V):
Cramer’s V was calculated to be around 0.169. Although not extremely large, this value suggests a modest association between education and the use of RMT methods. In social sciences, such effect sizes are common when dealing with categorical variables; even a modest effect size can be substantively meaningful.
Detailed Group Comparison
Group Proportions:
The summary statistics break down the relative frequencies of each education category for the “No” and “Yes” groups. For each education level, both counts and percentages were calculated. In the plots, you can see the proportions clearly marked with labels (which include counts and percentages), making it easy to compare how education distributions differ between the two groups.
Proportion Differences:
By pivoting the grouped data, we computed the differences in percentage points between the “Yes” and “No” groups for each education level. Education levels with the largest differences indicate where the distribution deviates most from what is expected under independence. These differences can help pinpoint which education categories are driving the overall chi-square statistic.
Standardized Residuals
Interpreting Residuals:
The standardized residuals provide additional insight into which specific cells of the contingency table contribute most to the chi-square result. A positive residual in a cell suggests that the observed count is higher than what is expected under the null hypothesis, whereas a negative residual indicates underrepresentation. For example, if the “Doctorate” category shows a strongly positive residual for one group, it means that group has a higher-than-expected frequency of individuals holding a doctorate. Conversely, a negative value in another cell would point to an underrepresentation relative to expectation.
Identifying Key Differences:
Reviewing these residuals allows us to identify which education levels are statistically noteworthy. For instance, if “Doctorate” or “Bachelors degree” have notable positive or negative deviations, these categories are influencing the overall association significantly.
Visualization of Findings
Two types of plots were generated:
Side-by-Side Bar Plot:
This plot displays the percentage breakdown of each education category for the “No” and “Yes” groups side by side. With detailed labels that include both the count and percentage, this visualization makes it straightforward to compare each education level between groups. The subtitle also reminds the viewer of the total sample size for each group, ensuring a complete context for interpretation.
Dot/Line Plot:
As an alternative, the dot/line plot connects the percentages across education categories for both groups. This plot emphasizes trends and differences in the pattern of education distribution. The connecting lines help to visualize how the relative positions of categories shift between groups.
Both visualizations have been adjusted to ensure that all labels are fully visible by setting appropriate margins and enlarging plot height if necessary.
Overall Interpretation
Significant Association:
The significant chi-square result confirms that the pattern of education across groups is not random. There is a statistically significant association between having a particular education level and whether someone uses RMT methods.
Magnitude and Specific Contributions:
The moderate effect size (Cramer’s V ≈ 0.169) implies that while the association exists, it is of modest strength. Importantly, the examination of standardized residuals and proportion differences shines a light on which specific education categories drive this association. These insights might be used to explore further why certain education levels are more, or less, likely to be associated with RMT method usage.
Practical Implications:
Depending on the context of the study (e.g., educational policies, targeted interventions, or market segmentation), these findings could inform decision-making by pinpointing demographic groups for further investigation or tailored strategies. For instance, if a particular education category is underrepresented among RMT users, it might signal potential barriers or areas for targeted improvement.
In summary, the findings indicate that education level is significantly associated with the use of RMT methods. The statistical tests, along with the calculated effect size and detailed graphical representations, provide a robust basis for interpreting the differences across education levels. The analysis not only confirms the presence of differences but also identifies which education categories are most influential in driving these differences, offering valuable insights for further study or practical applications.
11 Disorders
11.1 Descriptive Stats
Code
# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process disorders data for full sample:# - Remove NA and "Prefer not to say"# - Split comma-separated disorders and trim spaces# - Combine specific disorder categories using fixed() to avoid escape issues.disorders_full <- data_combined %>%filter(!is.na(disorders) & disorders !="Prefer not to say") %>%select(disorders) %>%mutate(disorders =strsplit(disorders, ",")) %>%unnest(disorders) %>%mutate(disorders =trimws(disorders),disorders =case_when(# Combine cancer-related categories into "Cancer"str_detect(disorders, fixed("Cancer (Breast", ignore_case =TRUE)) |str_detect(disorders, fixed("Colorectal", ignore_case =TRUE)) |str_detect(disorders, fixed("Lung", ignore_case =TRUE)) |str_detect(disorders, fixed("and/or Prostate)", ignore_case =TRUE)) ~"Cancer",# Combine COPD-related categories into "Chronic Obstructive Pulmonary Disease"str_detect(disorders, fixed("Chronic Obstructive Pulmonary Disease (COPD", ignore_case =TRUE)) |str_detect(disorders, fixed("incl. emphysema and chronic bronchitis)", ignore_case =TRUE)) ~"Chronic Obstructive Pulmonary Disease",# Combine restrictive lung disease categories into "Restrictive Lung Disease"str_detect(disorders, fixed("Restrictive Lung Disease (Incl. pulmonary fibrosis", ignore_case =TRUE)) |str_detect(disorders, fixed("cystic fibrosis", ignore_case =TRUE)) ~"Restrictive Lung Disease",TRUE~ disorders ) )# Calculate total sample size (participants with valid disorder data)total_participants <-nrow(data_combined %>%filter(!is.na(disorders) & disorders !="Prefer not to say"))# Calculate counts and percentages for each disorder category (only include categories with count >= 10)agg_disorders <- disorders_full %>%group_by(disorders) %>%summarise(count =n()) %>%mutate(percentage = (count / total_participants) *100) %>%filter(count >=10) %>%arrange(desc(count))# 3. Create Plotplot_full <-ggplot(agg_disorders, aes(x = count, y =reorder(disorders, count))) +geom_bar(stat ="identity", fill ="#4472C4") +# A professional blue colorgeom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage)),hjust =-0.2, size =3.5) +labs(title ="Health disorders reported by participants (N = 1558)",x ="Count",y ="Disorder",caption ="Note. Percentages were calculated out of the total sample." ) +theme_minimal() +theme(# Title and caption left justifiedplot.title =element_text(hjust =0, size =14, face ="bold", margin =margin(b =10)),plot.caption =element_text(hjust =0, size =10, margin =margin(t =10)),# Y-axis text left alignedaxis.text.y =element_text(size =10, hjust =0),# Adjust plot marginsplot.margin =margin(l =20, r =20, t =20, b =20, unit ="pt"),# Axis title paddingaxis.title.y =element_text(margin =margin(r =10)),axis.title.x =element_text(margin =margin(t =10)) ) +scale_x_continuous(expand =expansion(mult =c(0, 0.3))) # Expanding x-axis for full label visibility# 4. Print Summary and Display Plotcat("\nSummary of Combined Disorder Categories (Full Sample):\n")
Summary of Combined Disorder Categories (Full Sample):
• General Anxiety Disorder (GAD) is the most frequently reported condition, accounting for 20.4% of all reported disorders.
• Depression follows closely at 18.2%, highlighting the significant prevalence of mental health issues.
• Musician Performance Anxiety Disorder is also notable, representing 10% of the total.
Physical Health Disorders:
• Asthma is the most common physical condition, making up 13.6% of the reported disorders.
• Arthritis (Osteoarthritis and Rheumatoid) accounts for 8.4%, indicating a notable impact of joint-related conditions.
Neurodevelopmental Disorders:
• Autism Spectrum Disorders represent 7% of the total, showing a smaller but significant presence.
Key Observations:
• Mental health conditions dominate the list, collectively accounting for nearly half of the reported disorders.
• Physical and neurodevelopmental conditions also contribute significantly, reflecting a diverse range of health challenges among participants.
11.2 Inferential stats
Code
# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create the binary RMTMethods groups with labels for claritydata_combined <- data_combined %>%mutate(RMTMethods_group =case_when( RMTMethods_YN ==0~"No (n = 1330)", RMTMethods_YN ==1~"Yes (n = 228)",TRUE~NA_character_ ))# Process disorders data:# - Filter out NA and "Prefer not to say"# - Split comma-separated disorders, trim spaces, and recode into combined categories.disorders_group <- data_combined %>%filter(!is.na(disorders) & disorders !="Prefer not to say") %>%select(disorders, RMTMethods_YN, RMTMethods_group) %>%mutate(disorders =strsplit(disorders, ",")) %>%unnest(disorders) %>%mutate(disorders =trimws(disorders),disorders =case_when(str_detect(disorders, fixed("Cancer (Breast", ignore_case =TRUE)) |str_detect(disorders, fixed("Colorectal", ignore_case =TRUE)) |str_detect(disorders, fixed("Lung", ignore_case =TRUE)) |str_detect(disorders, fixed("and/or Prostate)", ignore_case =TRUE)) ~"Cancer",str_detect(disorders, fixed("Chronic Obstructive Pulmonary Disease (COPD", ignore_case =TRUE)) |str_detect(disorders, fixed("incl. emphysema and chronic bronchitis)", ignore_case =TRUE)) ~"Chronic Obstructive Pulmonary Disease",str_detect(disorders, fixed("Restrictive Lung Disease (Incl. pulmonary fibrosis", ignore_case =TRUE)) |str_detect(disorders, fixed("cystic fibrosis", ignore_case =TRUE)) ~"Restrictive Lung Disease",TRUE~ disorders ) )# Robust Statistical Analysis Functionperform_robust_statistical_test <-function(contingency_table) {# Detailed expected frequency analysis expected_freq <-suppressWarnings(chisq.test(contingency_table)$expected)# Frequency checks total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Verbose reporting of frequency conditionscat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Determine most appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(contingency_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction adjusted_chi_test <-chisq.test(contingency_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = adjusted_chi_test$p.value,statistic = adjusted_chi_test$statistic,parameter = adjusted_chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", adjusted_chi_test$parameter) )) }}# Pairwise Comparisons Functionpairwise_comparisons <-function(contingency_table) { disorders <-rownames(contingency_table) n_disorders <-length(disorders) results <-data.frame()for(i in1:(n_disorders-1)) {for(j in (i+1):n_disorders) {# Create 2x2 contingency table for two disorders subset_table <- contingency_table[c(i,j),]# Perform Fisher's exact test test <-fisher.test(subset_table) results <-rbind(results, data.frame(comparison =paste(disorders[i], "vs", disorders[j]),p_value = test$p.value,odds_ratio = test$estimate )) } }# Apply Bonferroni correction results$p_adjusted <-p.adjust(results$p_value, method ="bonferroni")return(results)}# Build contingency tablecontingency_table <-table(disorders_group$disorders, disorders_group$RMTMethods_YN)# Perform robust statistical teststat_test <-perform_robust_statistical_test(contingency_table)
Expected Frequency Analysis:
Minimum Expected Frequency: 2.52
Cells with Expected Frequency < 5: 3 out of 26 cells ( 11.54 %)
Code
# Calculate effect size (Cramer's V)n_total <-sum(contingency_table)min_dim <-min(dim(contingency_table)) -1cramers_v <-sqrt(stat_test$statistic / (n_total * min_dim))# Total sample size for frequency calculationtotal_valid <-nrow(data_combined %>%filter(!is.na(disorders) & disorders !="Prefer not to say"))# Calculate overall counts for each disorder (apply a threshold: count >= 10)overall_counts <- disorders_group %>%group_by(disorders) %>%summarise(total_count =n()) %>%filter(total_count >=10)# Calculate counts and percentages by disorder and RMTMethods_groupagg_disorders <- disorders_group %>%group_by(disorders, RMTMethods_group) %>%summarise(count =n(), .groups ='drop') %>%inner_join(overall_counts, by ="disorders") %>%mutate(percentage = (count / total_valid) *100) %>%arrange(desc(total_count))# Create Plot with Improved Scalingplot_disorders <-ggplot(agg_disorders, aes(x = count, y =reorder(disorders, total_count))) +geom_bar(aes(fill = RMTMethods_group), stat ="identity", position =position_dodge(width =0.8), width =0.7) +geom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage), group = RMTMethods_group),position =position_dodge(width =0.8),hjust =-0.1, size =3,fontface ="plain" ) +labs(title ="Health disorders reported by participants (N = 1558)",x ="Count",y ="Disorder",fill ="RMT device use",caption =paste0("Note: Percentages were calculated out of the full sample (N = ", total_valid, ").\n", stat_test$method, ": p = ", format.pval(stat_test$p_value, digits =3),", Cramer's V = ", round(cramers_v, 3) ) ) +theme_minimal() +theme(plot.title =element_text(hjust =0, size =14, face ="bold", margin =margin(b =10)),plot.caption =element_text(hjust =0, size =10, margin =margin(t =10)),axis.text.y =element_text(size =10, hjust =0),plot.margin =margin(l =20, r =40, t =20, b =20, unit ="pt"),legend.position ="top",legend.justification ="left",legend.title =element_text(hjust =0, size =10),legend.text =element_text(size =10),axis.title.y =element_text(margin =margin(r =10)),axis.title.x =element_text(margin =margin(t =10)) ) +scale_x_continuous(expand =expansion(mult =c(0, 0.3)), # Reduced expansion to minimize whitespacelimits =c(0, NA) # Allows automatic upper limit calculation ) +scale_fill_manual(values =c("No (n = 1330)"="#4472C4", "Yes (n = 228)"="#ED7D31"))# Perform pairwise comparisonspairwise_results <-pairwise_comparisons(contingency_table)# Print statistical resultscat("\nStatistical Test Results:\n")
Statistical Test Results:
Code
cat("Test Type:", stat_test$test_type, "\n")
Test Type: Chi-Square with Continuity Correction
Code
cat("P-value:", stat_test$p_value, "\n")
P-value: 1.06822e-20
Code
if (stat_test$test_type =="Chi-Square with Continuity Correction") {cat("Chi-square Statistic:", stat_test$statistic, "\n")cat("Degrees of Freedom:", stat_test$parameter, "\n")}
Chi-square Statistic: 123.8186
Degrees of Freedom: 12
comparison
odds ratio Alcohol abuse vs Alzheimer's Disease and Related Dementia
odds ratio1 Alcohol abuse vs Arthritis (Osteoarthritis and Rheumatoid)
odds ratio2 Alcohol abuse vs Asthma
odds ratio3 Alcohol abuse vs Atrial Fibrillation
odds ratio4 Alcohol abuse vs Autism Spectrum Disorders
odds ratio5 Alcohol abuse vs Cancer
odds ratio6 Alcohol abuse vs Chronic Kidney Disease
odds ratio7 Alcohol abuse vs Chronic Obstructive Pulmonary Disease
odds ratio8 Alcohol abuse vs Depression
odds ratio9 Alcohol abuse vs General Anxiety Disorder
odds ratio10 Alcohol abuse vs Musician Performance Anxiety Disorder
odds ratio11 Alcohol abuse vs Restrictive Lung Disease
odds ratio12 Alzheimer's Disease and Related Dementia vs Arthritis (Osteoarthritis and Rheumatoid)
odds ratio13 Alzheimer's Disease and Related Dementia vs Asthma
odds ratio14 Alzheimer's Disease and Related Dementia vs Atrial Fibrillation
odds ratio15 Alzheimer's Disease and Related Dementia vs Autism Spectrum Disorders
odds ratio16 Alzheimer's Disease and Related Dementia vs Cancer
odds ratio17 Alzheimer's Disease and Related Dementia vs Chronic Kidney Disease
odds ratio18 Alzheimer's Disease and Related Dementia vs Chronic Obstructive Pulmonary Disease
odds ratio19 Alzheimer's Disease and Related Dementia vs Depression
odds ratio20 Alzheimer's Disease and Related Dementia vs General Anxiety Disorder
odds ratio21 Alzheimer's Disease and Related Dementia vs Musician Performance Anxiety Disorder
odds ratio22 Alzheimer's Disease and Related Dementia vs Restrictive Lung Disease
odds ratio23 Arthritis (Osteoarthritis and Rheumatoid) vs Asthma
odds ratio24 Arthritis (Osteoarthritis and Rheumatoid) vs Atrial Fibrillation
odds ratio25 Arthritis (Osteoarthritis and Rheumatoid) vs Autism Spectrum Disorders
odds ratio26 Arthritis (Osteoarthritis and Rheumatoid) vs Cancer
odds ratio27 Arthritis (Osteoarthritis and Rheumatoid) vs Chronic Kidney Disease
odds ratio28 Arthritis (Osteoarthritis and Rheumatoid) vs Chronic Obstructive Pulmonary Disease
odds ratio29 Arthritis (Osteoarthritis and Rheumatoid) vs Depression
odds ratio30 Arthritis (Osteoarthritis and Rheumatoid) vs General Anxiety Disorder
odds ratio31 Arthritis (Osteoarthritis and Rheumatoid) vs Musician Performance Anxiety Disorder
odds ratio32 Arthritis (Osteoarthritis and Rheumatoid) vs Restrictive Lung Disease
odds ratio33 Asthma vs Atrial Fibrillation
odds ratio34 Asthma vs Autism Spectrum Disorders
odds ratio35 Asthma vs Cancer
odds ratio36 Asthma vs Chronic Kidney Disease
odds ratio37 Asthma vs Chronic Obstructive Pulmonary Disease
odds ratio38 Asthma vs Depression
odds ratio39 Asthma vs General Anxiety Disorder
odds ratio40 Asthma vs Musician Performance Anxiety Disorder
odds ratio41 Asthma vs Restrictive Lung Disease
odds ratio42 Atrial Fibrillation vs Autism Spectrum Disorders
odds ratio43 Atrial Fibrillation vs Cancer
odds ratio44 Atrial Fibrillation vs Chronic Kidney Disease
odds ratio45 Atrial Fibrillation vs Chronic Obstructive Pulmonary Disease
odds ratio46 Atrial Fibrillation vs Depression
odds ratio47 Atrial Fibrillation vs General Anxiety Disorder
odds ratio48 Atrial Fibrillation vs Musician Performance Anxiety Disorder
odds ratio49 Atrial Fibrillation vs Restrictive Lung Disease
odds ratio50 Autism Spectrum Disorders vs Cancer
odds ratio51 Autism Spectrum Disorders vs Chronic Kidney Disease
odds ratio52 Autism Spectrum Disorders vs Chronic Obstructive Pulmonary Disease
odds ratio53 Autism Spectrum Disorders vs Depression
odds ratio54 Autism Spectrum Disorders vs General Anxiety Disorder
odds ratio55 Autism Spectrum Disorders vs Musician Performance Anxiety Disorder
odds ratio56 Autism Spectrum Disorders vs Restrictive Lung Disease
odds ratio57 Cancer vs Chronic Kidney Disease
odds ratio58 Cancer vs Chronic Obstructive Pulmonary Disease
odds ratio59 Cancer vs Depression
odds ratio60 Cancer vs General Anxiety Disorder
odds ratio61 Cancer vs Musician Performance Anxiety Disorder
odds ratio62 Cancer vs Restrictive Lung Disease
odds ratio63 Chronic Kidney Disease vs Chronic Obstructive Pulmonary Disease
odds ratio64 Chronic Kidney Disease vs Depression
odds ratio65 Chronic Kidney Disease vs General Anxiety Disorder
odds ratio66 Chronic Kidney Disease vs Musician Performance Anxiety Disorder
odds ratio67 Chronic Kidney Disease vs Restrictive Lung Disease
odds ratio68 Chronic Obstructive Pulmonary Disease vs Depression
odds ratio69 Chronic Obstructive Pulmonary Disease vs General Anxiety Disorder
odds ratio70 Chronic Obstructive Pulmonary Disease vs Musician Performance Anxiety Disorder
odds ratio71 Chronic Obstructive Pulmonary Disease vs Restrictive Lung Disease
odds ratio72 Depression vs General Anxiety Disorder
odds ratio73 Depression vs Musician Performance Anxiety Disorder
odds ratio74 Depression vs Restrictive Lung Disease
odds ratio75 General Anxiety Disorder vs Musician Performance Anxiety Disorder
odds ratio76 General Anxiety Disorder vs Restrictive Lung Disease
odds ratio77 Musician Performance Anxiety Disorder vs Restrictive Lung Disease
p_value odds_ratio p_adjusted
odds ratio 8.687888e-04 7.33487909 6.776553e-02
odds ratio1 6.736075e-01 0.79189681 1.000000e+00
odds ratio2 1.291170e-02 0.34833383 1.000000e+00
odds ratio3 1.000000e+00 1.08952987 1.000000e+00
odds ratio4 1.620258e-01 0.52250395 1.000000e+00
odds ratio5 1.453761e-01 1.79325863 1.000000e+00
odds ratio6 4.811111e-01 1.79572964 1.000000e+00
odds ratio7 8.208983e-01 1.12978460 1.000000e+00
odds ratio8 2.738306e-02 0.38371817 1.000000e+00
odds ratio9 2.923673e-02 0.39704286 1.000000e+00
odds ratio10 8.434382e-01 0.93582679 1.000000e+00
odds ratio11 5.060178e-01 1.57614059 1.000000e+00
odds ratio12 1.263234e-05 0.10545848 9.853229e-04
odds ratio13 2.691018e-09 0.04645843 2.098994e-07
odds ratio14 3.419565e-03 0.14939850 2.667261e-01
odds ratio15 5.772160e-07 0.07014907 4.502285e-05
odds ratio16 7.390543e-03 0.23740511 5.764623e-01
odds ratio17 1.296605e-01 0.25031066 1.000000e+00
odds ratio18 1.155260e-03 0.15262452 9.011029e-02
odds ratio19 3.805206e-09 0.05091667 2.968061e-07
odds ratio20 4.433295e-09 0.05259406 3.457970e-07
odds ratio21 3.957613e-05 0.12416814 3.086938e-03
odds ratio22 6.730149e-02 0.21979364 1.000000e+00
odds ratio23 4.924581e-03 0.43923886 3.841173e-01
odds ratio24 4.880656e-01 1.37663848 1.000000e+00
odds ratio25 2.096888e-01 0.65870321 1.000000e+00
odds ratio26 1.774311e-03 2.26769754 1.383963e-01
odds ratio27 1.781723e-01 2.28404886 1.000000e+00
odds ratio28 3.524952e-01 1.42773996 1.000000e+00
odds ratio29 7.482876e-03 0.48433862 5.836644e-01
odds ratio30 8.706435e-03 0.50125385 6.791019e-01
odds ratio31 5.921115e-01 1.18228584 1.000000e+00
odds ratio32 3.122584e-01 2.00099866 1.000000e+00
odds ratio33 2.074853e-02 3.12897359 1.000000e+00
odds ratio34 2.371734e-01 1.49894257 1.000000e+00
odds ratio35 8.321659e-11 5.16525402 6.490894e-09
odds ratio36 1.312993e-02 5.18720170 1.000000e+00
odds ratio37 2.211763e-03 3.24693799 1.725175e-01
odds ratio38 7.874670e-01 1.10313195 1.000000e+00
odds ratio39 6.953000e-01 1.14186169 1.000000e+00
odds ratio40 2.624722e-04 2.69248112 2.047283e-02
odds ratio41 1.892004e-02 4.54485131 1.000000e+00
odds ratio42 1.251625e-01 0.47952553 1.000000e+00
odds ratio43 3.094617e-01 1.64431732 1.000000e+00
odds ratio44 4.912945e-01 1.64575525 1.000000e+00
odds ratio45 1.000000e+00 1.03658564 1.000000e+00
odds ratio46 2.517413e-02 0.35197466 1.000000e+00
odds ratio47 2.722467e-02 0.36413548 1.000000e+00
odds ratio48 8.236572e-01 0.85827389 1.000000e+00
odds ratio49 7.259190e-01 1.44528601 1.000000e+00
odds ratio50 1.757768e-05 3.44259255 1.371059e-03
odds ratio51 5.463183e-02 3.45029365 1.000000e+00
odds ratio52 6.414804e-02 2.16429306 1.000000e+00
odds ratio53 3.392917e-01 0.73575757 1.000000e+00
odds ratio54 3.530675e-01 0.76150726 1.000000e+00
odds ratio55 5.795295e-02 1.79513469 1.000000e+00
odds ratio56 1.277963e-01 3.02448643 1.000000e+00
odds ratio57 1.000000e+00 1.01092564 1.000000e+00
odds ratio58 1.918810e-01 0.63042523 1.000000e+00
odds ratio59 3.259975e-11 0.21343616 2.542781e-09
odds ratio60 2.439770e-11 0.22086864 1.903021e-09
odds ratio61 6.644606e-03 0.52126543 5.182793e-01
odds ratio62 1.000000e+00 0.88527189 1.000000e+00
odds ratio63 5.073732e-01 0.62707234 1.000000e+00
odds ratio64 1.689878e-02 0.21195096 1.000000e+00
odds ratio65 1.853343e-02 0.21916221 1.000000e+00
odds ratio66 3.189673e-01 0.51672965 1.000000e+00
odds ratio67 1.000000e+00 0.87969356 1.000000e+00
odds ratio68 3.038301e-03 0.33930463 2.369875e-01
odds ratio69 3.432927e-03 0.35105813 2.677683e-01
odds ratio70 5.964717e-01 0.82769789 1.000000e+00
odds ratio71 7.416156e-01 1.39867338 1.000000e+00
odds ratio72 9.059635e-01 1.03510059 1.000000e+00
odds ratio73 4.754242e-04 2.44168532 3.708309e-02
odds ratio74 2.423421e-02 4.13230595 1.000000e+00
odds ratio75 4.032720e-04 2.35927724 3.145522e-02
odds ratio76 2.652527e-02 3.99587598 1.000000e+00
odds ratio77 3.532773e-01 1.69484985 1.000000e+00
Code
# Display the plotprint(plot_disorders)
11.2.1 Results and Discussion
Contingency Table and Chi-Square Test:
A contingency table was created stratifying disorder counts by RMT device use (the RMTMethods_group). The chi-square test was applied to examine whether the pattern of disorders is statistically independent of whether a participant used an RMT device.
Chi-square Statistic:
The test produced a chi-square value (with degrees of freedom corresponding to the table size) and a corresponding p-value below the standard significance threshold (p < .05). This suggests that the distribution of specific disorders is not independent of the RMTMethods groups.
Effect Size – Cramer’s V:
Cramer’s V was computed to provide an estimate of the strength of the association between the categorical variables. A Cramer’s V value (ranging from 0 to 1) was reported. While a value close to zero would indicate a very weak association, the reported value gives an indication of a small to moderate (depending on the exact value) association between RMT device use and disorder reporting.
Standardized Residuals:
Standardized residuals were calculated from the contingency table. Residuals with an absolute value greater than 1.96 indicate cells that contribute significantly to the chi-square statistic. In practice, these highlight which combinations of disorder and RMTMethods group differ most from what would be expected under the null hypothesis of independence. Cells with |residual| > 1.96 should be looked at as potential areas of interest or imbalance.
Frequency Distribution and Percentages
Frequency Calculations: For each disorder category (only those with an overall count of at least 10 were included), counts were calculated separately for the RMTMethods groups.
Percentage Calculation:
Percentages were computed based on the subgroup total of participants within each RMTMethods group. This approach allows each group’s prevalence rate for specific disorders to be compared.
Interpretation of Frequencies:
• Disorders such as “Cancer”, “Depression”, “General Anxiety Disorder”, and “Asthma” (or similar common conditions) are shown with their total counts and the percentage they constitute within each RMTMethods group.
• The display of percentages along with the counts gives an immediate idea of the relative prevalence within each subgroup.
Graphical Summary
Bar Plot: A faceted bar plot was created showing the counts (with percentage labels) for each disorder.
Axis and Legend:
• The disorder categories are listed on the y-axis in descending order by overall count.
• The bars are color-coded by the RMTMethods group (using a professional blue for “No (n = 1330)” and orange for “Yes (n = 228)”).
Text Labels:
The bars include labels with both the count and the computed percentage (e.g., “157 (21.4%)”) for clarity.
Title and Caption:
The plot title indicates the overall sample size (N = 1558), while the caption provides details about how percentages were calculated and summarizes the chi-square test results along with Cramer’s V.
Overall Interpretation
Association Between Disorders and RMT Device Use:
The chi-square test indicates that there is a statistically significant association between the disorders reported and the RMTMethods group (device use). This suggests that the pattern of health disorders is not completely similar between those who use an RMT device and those who do not. However, the effect size (Cramer’s V) provides insight into how strong this association is; if Cramer’s V is small, the association, while significant, might not be strong.
Implications:
• Researchers could investigate further why certain disorders differ between these groups. It may be that the use of RMT devices is more common in populations with specific health concerns, or it could suggest differential reporting.
• The standardized residuals help pinpoint which cells (i.e., specific disorders in a given RMTMethods group) deviate significantly from the expected frequency. This can drive further targeted investigations.
Caveats:
• The analysis focuses on disorders reported by participants who provided data (excluding non-responses such as “Prefer not to say”).
• The categorization process, including merging similar disorders (like various cancer types) into a single category, inherently aggregates some detailed information. Any subsequent interpretation should consider whether this level of aggregation is appropriate for the research question at hand.
Conclusion
In summary, the data processing and statistical analysis indicate that there are significant differences in how participants in different RMT device groups report their health disorders. The subsequent visualization provides a clear and insightful summary of the frequencies and percentages of the disorders, making it easier for researchers or stakeholders to quickly understand the distribution and the areas of significant deviation. This analysis can serve as a foundation for more detailed follow-up studies exploring the reasons behind these associations.
12 Years of Playing
12.1 Descriptive Stats
Code
## yrsPlay_MAX# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Recode yrsPlay_MAX variabledata_combined <- data_combined %>%mutate(yrsPlay_cat =factor(case_when( yrsPlay_MAX ==1~"<5yrs", yrsPlay_MAX ==2~"5-9yrs", yrsPlay_MAX ==3~"10-14yrs", yrsPlay_MAX ==4~"15-19yrs", yrsPlay_MAX ==5~"20+yrs",TRUE~NA_character_ ), levels =c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")))# Filter out rows with missing valuesdata_processed <- data_combined %>%filter(!is.na(yrsPlay_cat))# Calculate total Ntotal_n <-nrow(data_processed)# Create frequency tablefreq_table <- data_processed %>%group_by(yrsPlay_cat) %>%summarise(count =n()) %>%mutate(percentage = (count /sum(count)) *100)# Create plot titleplot_title <-"Distribution of years of playing experience"# Create the plotplot_years <-ggplot(freq_table, aes(x = count, y = yrsPlay_cat)) +geom_bar(stat ="identity", fill ="#4472C4") +geom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage)),hjust =-0.2, size =3.5) +labs(title =paste0(plot_title, " (N = ", total_n, ")"),x ="Count",y ="Years of playing experience",caption ="Note. Percentages were calculated out of the total sample." ) +theme_minimal() +theme(plot.title =element_text(hjust =0, size =14, face ="bold", margin =margin(b =10)),plot.caption =element_text(hjust =0, size =10, margin =margin(t =10)),axis.text.y =element_text(size =10, hjust =0),plot.margin =margin(l =20, r =20, t =20, b =20, unit ="pt"),axis.title.y =element_text(margin =margin(r =10)),axis.title.x =element_text(margin =margin(t =10)) ) +scale_x_continuous(expand =expansion(mult =c(0, 0.3)))# Display the plotprint(plot_years)
The distribution of participants’ years of experience playing wind instruments is as follows:
• The largest group is “20+yrs,” indicating a significant portion of participants have extensive experience.
• The second-largest group is “<4yrs,” showing a notable number of beginners or those with limited experience.
• The middle categories (“5-9yrs,” “10-14yrs,” and “15-19yrs”) have progressively fewer participants, suggesting a gradual decline in representation as experience increases. This distribution suggests a bimodal pattern, with many participants either being beginners or highly experienced players.
Interpretation
Experience Skew: The distribution is heavily skewed towards more experienced individuals: the highest percentage (41.85%) is in the “20+yrs” category, suggesting that a significant proportion of respondents have a long history of playing experience.
Intermediate Groups: The 10-14yrs and 5-9yrs groups also represent substantial portions of the sample (roughly 20.73% and 19.58%, respectively), indicating that a majority of the sample falls within these mid-to-high ranges of experience.
Minority Group: The least represented group is the “<5yrs” category with only about 6.80% of the sample. This might imply either that the surveyed population tends to be more experienced or that newer players are underrepresented in this particular dataset.
Implications
Targeting and Marketing: For stakeholders interested in reaching out to the largest segment, strategies might focus on highly experienced musicians, particularly those with more than 20 years of experience.
Support and Development: Conversely, given the smaller sample of less experienced (<5yrs) musicians, additional support or targeted outreach could be considered if the goal is to encourage growth or adoption among newer players.
In summary, the analysis shows a distinct predominance of highly experienced musicians, with the largest group having 20+ years of playing experience.
By Instrument
Overview and Data Reshaping The analysis focused on recoding and comparing the years of playing experience across a variety of instruments. Each instrument-specific column (e.g., yrsPlay_flute, yrsPlay_picc, etc.) originally recorded experience on a scale from 1 to 5 was recoded into categorical groups:
After reshaping the dataset into a long format (where each row corresponds to one instrument for a respondent), frequency counts and percentages were computed per instrument.
Key Findings from the Frequency Analysis
Participation Across Instruments:
The frequency table (illustrated in the analysis output) shows different levels of participation for each instrument. For instance:
• Flute: Records the highest number of responses (e.g., 443 observations for the flute) indicating that this instrument had significant representation in the sample.
• Clarinet and Recorder: Also show high participation, though slightly lower than the flute in some cases.
• Bassoon, Oboe, etc.: Typically have lower participant counts. For instance, for some double-reed instruments, counts are around or below 150, suggesting that fewer respondents play these instruments compared to more common instruments like the flute or clarinet.
Experience Distribution – General Patterns:
• Highly Experienced Players Are Predominant: Across several instruments (Flute, Piccolo, Recorder, Clarinet, Saxophone, and Trumpet), the highest frequency lies in the “20+yrs” category. This skew toward the highest level indicates that a significant number of respondents have been playing these instruments for more than 20 years.
• Diverse Experience Ranges: While some instruments show a clear concentration at the extremes (e.g., many having 20+ years), others such as the Oboe show a different pattern where, for instance, players in the “5-9yrs” category might be more common. These differences likely reflect entry age, learning duration, and perhaps the training regimes or availability of instruction for specific instruments.
Statistical Testing and Significance
For each instrument, a Chi-square test was performed to evaluate whether the observed distribution of playing experience categories deviates significantly from a uniform distribution (the null hypothesis). The test results are summarized by:
Chi-square Statistic and p-value:
In all presented cases, the p-values were extremely small (most reported as \(p < 0.001\) or even exactly \(p < 3.86 \times 10^{-18}\) in some cases). This indicates that the null hypothesis of a uniform distribution can be strongly rejected; in other words, for every instrument, the distribution of playing experience is not uniform.
Effect Size Using Cramér’s V:
Cramér’s V values ranged roughly from 0.16 to 0.33. While the interpretation of effect size can vary depending on context, these values suggest a small-to-moderate association between the instrument played and the experience category. The moderate effect sizes reinforce the inference that, even though each instrument shows a statistically significant non-uniform distribution, the degree of association is modest and may reflect underlying differences among instruments (for example, the different typical career trajectories or training regimens). ________________________________________ Graphical Representation
The faceted bar plot - Visual Structure:
Each instrument is represented as its own facet, where the bars display the counts of respondents per experience category. The labels on the bars show both the raw counts and the percentage that the count represents within that instrument group. This makes it easy to compare the practical significance of the numbers across instruments.
Insights from the Plot:
• Instruments like the Flute and Clarinet tend to have a long tail in the “20+yrs” category, indicating that these instruments are often played over a long period by experienced musicians.
• In contrast, other instruments such as Oboe may reveal a slightly different experience profile, suggesting that fewer respondents reach the highest experience level—or that newer players are more prevalent in the sample.
Comprehensive Comparison:
By comparing across facets, viewers can quickly grasp how the distribution of playing experience differs by instrument—a valuable insight when considering curricular designs, marketing strategies for music education, or understanding career longevity within different instrumental sections (e.g., orchestral vs. band instruments).
Interpretation and Implications
Diversity & Specialization:
• The clear predominance of experienced players (especially the “20+yrs” category) in many instruments implies that respondents are mostly well-established musicians. This might reflect the sampling frame of the survey, suggesting an overrepresentation of long-term practitioners.
• For instruments like the Oboe or Bassoon (typically less common), lower numbers may signal niche interest groups where early-career musicians have different trajectories or where training opportunities might be less available.
Targeted Interventions and Further Research:
• For Education and Outreach: If the goal is to foster new talent or broaden participation, it might be worth exploring why certain instruments have fewer less-experienced players. It could be due to a lack of beginner resources or lower availability.
• For Resource Allocation: Organizations and institutions (music schools, orchestras, etc.) may use this data to allocate resources more efficiently by understanding which instruments attract a diverse range of experience levels.
Statistical Robustness:
• The highly significant Chi-square results suggest that these patterns are not random – there is a genuine underlying structure in the distribution of years of playing experience across instruments.
• The modest effect sizes (Cramér’s V) indicate that although the differences are statistically significant, the practical magnitude of the association is moderate. This nuance is important when making decisions or crafting policies based solely on statistical outcomes versus contextual considerations.
Conclusion
In summary, the analysis reveals that:
• The playing experience–related distribution is significantly non-uniform across various instruments.
• There is a marked skew toward high levels of experience for instruments such as the Flute and Clarinet.
• Statistically significant differences exist among the instruments, though effect sizes suggest these associations are moderate.
• These insights can inform further qualitative investigations, resource planning, or targeted interventions in music education and practice.
12.2 Inferential Stats
Code
# Robust Data Preparation Functionprepare_years_data <-function(file_path) {tryCatch({# Read the data data_combined <-read_excel(file_path, sheet ="Combined")# Ensure numeric conversion and handle potential NA values data_combined <- data_combined %>%mutate(# Convert to numeric, replacing NA with a safe defaultyrsPlay_MAX =as.numeric(yrsPlay_MAX),RMTMethods_YN =as.numeric(RMTMethods_YN) )# Recode yrsPlay_MAX variable with robust handling data_combined <- data_combined %>%mutate(yrsPlay_cat =factor(case_when( yrsPlay_MAX ==1~"<5yrs", yrsPlay_MAX ==2~"5-9yrs", yrsPlay_MAX ==3~"10-14yrs", yrsPlay_MAX ==4~"15-19yrs", yrsPlay_MAX ==5~"20+yrs",TRUE~NA_character_ ), levels =c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")))# Recode RMTMethods_YN into group labels with robust handling data_combined <- data_combined %>%mutate(RMTMethods_group =case_when( RMTMethods_YN ==0~"No (n = 1330)", RMTMethods_YN ==1~"Yes (n = 228)",TRUE~NA_character_ ))# Filter out rows with missing values data_processed <- data_combined %>%filter(!is.na(yrsPlay_cat) &!is.na(RMTMethods_group))return(data_processed) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Robust Statistical Testing Functionperform_robust_statistical_test <-function(cont_table) {# Check expected cell frequencies expected_freq <-chisq.test(cont_table)$expected# Criteria for test selection total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Print diagnostic informationcat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Select appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(cont_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction chi_test <-chisq.test(cont_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = chi_test$p.value,statistic = chi_test$statistic,parameter = chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", chi_test$parameter) )) }}# Main Analysis Functionrun_years_playing_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data data_processed <-prepare_years_data(file_path)# Total number of observations used total_n <-nrow(data_processed)# Create frequency table freq_table <- data_processed %>%group_by(yrsPlay_cat, RMTMethods_group) %>%summarise(count =n(), .groups ='drop') %>%group_by(RMTMethods_group) %>%mutate(percentage = (count /sum(count)) *100)# Create contingency table contingency_table <-table(data_processed$yrsPlay_cat, data_processed$RMTMethods_group)# Perform robust statistical test stat_test <-perform_robust_statistical_test(contingency_table)# Calculate Cramer's V n_val <-sum(contingency_table) min_dim <-min(dim(contingency_table)) -1 cramers_v <-sqrt(stat_test$statistic / (n_val * min_dim))# Create the Plot plot_years <-ggplot(freq_table, aes(x = count, y = yrsPlay_cat, fill = RMTMethods_group)) +geom_bar(stat ="identity", position =position_dodge(width =0.8)) +geom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage)),position =position_dodge(width =0.8),hjust =-0.2, size =3.5 ) +labs(title =paste0("Years of playing experience by RMT device use (N = ", total_n, ")"),x ="Count",y ="Years of playing experience",fill ="RMT device use",caption =paste0("Note. Percentages calculated within RMT device groups.\n", stat_test$method, ": p = ", format.pval(stat_test$p_value, digits =3),", Cramer's V = ", round(cramers_v, 3) ) ) +theme_minimal() +theme(plot.title =element_text(hjust =0, size =14, face ="bold", margin =margin(b =10)),plot.caption =element_text(hjust =0, size =10, margin =margin(t =10)),axis.text.y =element_text(size =10, hjust =0),plot.margin =margin(l =20, r =40, t =20, b =20, unit ="pt"),legend.position ="top",legend.justification ="left",legend.title =element_text(hjust =0, size =10),legend.text =element_text(size =10),axis.title.y =element_text(margin =margin(r =10)),axis.title.x =element_text(margin =margin(t =10)) ) +scale_x_continuous(expand =expansion(mult =c(0, 0.4))) +scale_fill_manual(values =c("No (n = 1330)"="#4472C4", "Yes (n = 228)"="#ED7D31"))# Print statistical resultscat("\nContingency Table:\n")print(contingency_table)cat("\nStatistical Test Results:\n")cat("Test Type:", stat_test$test_type, "\n")cat("P-value:", stat_test$p_value, "\n")if (stat_test$test_type =="Chi-Square with Continuity Correction") {cat("Chi-square Statistic:", stat_test$statistic, "\n")cat("Degrees of Freedom:", stat_test$parameter, "\n") }cat("Cramer's V:", cramers_v, "\n")# Display the plotprint(plot_years)# Return results for potential further analysisreturn(list(freq_table = freq_table,contingency_table = contingency_table,stat_test = stat_test,cramers_v = cramers_v,plot = plot_years ))}# Run the analysisresults <-run_years_playing_analysis()
Expected Frequency Analysis:
Minimum Expected Frequency: 15.51
Cells with Expected Frequency < 5: 0 out of 10 cells ( 0 %)
Contingency Table:
No (n = 1330) Yes (n = 228)
<5yrs 96 10
5-9yrs 264 41
10-14yrs 258 65
15-19yrs 144 28
20+yrs 568 84
Statistical Test Results:
Test Type: Chi-Square with Continuity Correction
P-value: 0.01457866
Chi-square Statistic: 12.40529
Degrees of Freedom: 4
Cramer's V: 0.08923182
12.2.1 Results and Discussion
Chi-square Test of Independence:
[1] “-square test of independence:”
Pearson’s Chi-squared test data: contingency_table X-squared = 12.405, df = 4, p-value = 0.01458 The chi-square test shows a significant association between years of playing experience and RMT usage (χ²(4) = 12.41, p = .015).
Standardized Residuals: The standardized residuals show that:
• The 10-14 years group has significantly more RMT users than expected (3.14)
• Other groups show some variation but not at statistically significant levels
After Bonferroni correction, there is:
• A significant difference between the 10-14 years and 20+ years groups (p = .042)
• All other pairwise comparisons are not statistically significant Key Findings:
• RMT usage varies significantly across experience groups
• The 10-14 years group shows the highest proportion of RMT users (20.1%)
• The <4 years group shows the lowest proportion of RMT users (9.4%)
• There’s a notable peak in RMT usage in the middle experience ranges (10-14 years) with lower usage in both very experienced and novice players
Distribution Overview
Total Sample Size: N = 1,558
• RMT Device Non-Users: n = 1,330 (85.4%)
• RMT Device Users: n = 228 (14.6%)
Years of Playing Experience Distribution
Non-RMT Device Users (n = 1,330):*
• <5 years: 96 (7.2%)
• 5-9 years: 264 (19.8%)
• 10-14 years: 258 (19.4%)
• 15-19 years: 144 (10.8%)
• 20+ years: 568 (42.7%)
RMT Device Users (n = 228):
• <5 years: 10 (4.4%)
• 5-9 years: 41 (18.0%)
• 10-14 years: 65 (28.5%)
• 15-19 years: 28 (12.3%)
• 20+ years: 84 (36.8%)
Statistical AnalysisChi-square Test Results:
• χ²(4) = 12.405, p = 0.015
• This indicates a statistically significant association between years of playing experience and RMT device use
• The p-value (0.015) is below the conventional significance level of 0.05
Effect Size:
• Cramer’s V = 0.089
• This suggests a weak association between years of playing experience and RMT device use
• Values of Cramer’s V range from 0 to 1, where:
• 0.1 indicates a small effect
• 0.3 indicates a medium effect
• 0.5 indicates a large effect
Key Findings and InterpretationExperience Level Distribution:
• In both groups, the largest proportion of musicians had 20+ years of experience
• The second most common category differed between groups:
• Non-RMT users: 5-9 years (19.8%)
• RMT users: 10-14 years (28.5%)
Notable Differences:
• The proportion of musicians with 10-14 years of experience is notably higher among RMT device users (28.5%) compared to non-users (19.4%)
• There is a lower proportion of very experienced musicians (20+ years) among RMT device users (36.8%) compared to non-users (42.7%)
• Early-career musicians (<5 years) are less represented among RMT device users (4.4%) compared to non-users (7.2%)
Statistical Significance:
• While the chi-square test indicates a significant association (p = 0.015), the small Cramer’s V value (0.089) suggests this association is weak
• This means that while there is a relationship between years of experience and RMT device use, other factors likely play a more important role in determining RMT device adoption
Practical Implications
Experience Level Considerations: • RMT devices appear to be more commonly adopted by musicians in their mid-career stages (10-14 years)
• The lower adoption rate among very early-career musicians (<5 years) might suggest barriers to entry (e.g., cost, awareness) or perceived lack of necessity
Professional Development:
• The distribution suggests that RMT devices are used across all experience levels, but with varying adoption rates
• The peak in the 10-14 years category among RMT users might indicate a career stage where musicians are most likely to incorporate new technologies
Market Insights:
• There might be opportunities to increase RMT device adoption among both very experienced (20+ years) and early-career (<5 years) musicians
• The relatively lower adoption rates in these groups could indicate either unmet needs or barriers to adoption that could be addressed
Limitations
• The analysis doesn’t account for other variables that might influence RMT device adoption
• The categorical nature of the experience levels might mask more nuanced patterns within each category
• The significant but weak association suggests that years of experience alone is not a strong predictor of RMT device use Comparison of Individual instrument years of experience and RMT groups:
Key Findings:
Significant Associations:
Several instruments showed statistically significant associations (p < 0.05) between years of playing experience and device use:
• Recorder (p = 0.0204)
• Oboe (p = 0.0034)
• Clarinet (p = 0.0062)
• Bassoon (p = 0.0103)
Effect Sizes (Cramér’s V): The strength of associations varies:
• Strongest: Bassoon (V = 0.3809)
• Moderate: Oboe (V = 0.3249)
• Moderate: Recorder (V = 0.2923)
• Weaker: Clarinet (V = 0.1871)
Non-Significant Associations: Some instruments showed no significant relationship between experience and device use:
• Flute (p = 0.3709)
• Piccolo (p = 0.1496) Detailed Analysis by Instrument Group: Let’s examine the contingency tables:
Woodwinds (Flute, Clarinet):
• Flute shows no significant association (χ² = 4.27, p = 0.3709)
• Clarinet shows significant association (χ² = 14.36, p = 0.0062)
Double Reeds (Oboe, Bassoon):
• Both show significant associations
• Strongest effect sizes among all instruments
• Bassoon: V = 0.3809
• Oboe: V = 0.3249
Secondary Instruments (Piccolo, Recorder):
• Piccolo: No significant association
• Recorder: Significant association (p = 0.0204)
Practical Implications:
Device Usage Patterns:
• Double reed players show the strongest relationship between experience level and device use
• This might reflect the complexity of these instruments and the potential benefits of technological assistance
Experience Level Considerations:
• The relationship between experience and device use varies substantially by instrument
• More experienced players in some instruments (e.g., double reeds) show different device usage patterns compared to less experienced players
Educational Implications:
• The varying associations suggest that device use might be more beneficial or more readily adopted in certain instrument groups
• This could inform targeted educational strategies and resource allocation
Statistical Notes:
1. Warning Messages:
• Some chi-square approximations may be less reliable due to small cell counts
• This particularly affects instruments with smaller sample sizes
2. Effect Size Interpretation:
• Cramér’s V values:
• < 0.1: Negligible association
• 0.1 - 0.2: Weak association
• 0.2 - 0.3: Moderate association
• 0.3: Strong association
Recommendations:
1. Further Research:
• Investigate why double reed instruments show stronger associations
• Examine the specific types of devices used across different experience levels
2. Practical Applications:
• Consider instrument-specific approaches to device integration in teaching
• Focus on understanding why certain instrument groups show stronger associations with device use
3. Methodological Considerations:
• For instruments with smaller sample sizes, consider collecting additional data
• Consider using Fisher’s exact test for smaller sample sizes where chi-square approximations may be less reliable
13 Frequency of Playing
13.1 Descriptive Stats
Code
# Robust Statistical Testing Functionperform_robust_statistical_test <-function(observed, expected =NULL) {# If no expected frequencies provided, assume uniform distributionif (is.null(expected)) { expected <-rep(1/length(observed), length(observed)) }# Compute expected frequencies total_n <-sum(observed) expected_freq <- expected * total_n# Diagnostic frequency checkscat("Expected Frequency Analysis:\n")cat("Total Observations:", total_n, "\n")cat("Observed Frequencies:", paste(observed, collapse =", "), "\n")cat("Expected Frequencies:", paste(round(expected_freq, 2), collapse =", "), "\n")# Check chi-square test assumptions low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)cat("\nExpected Frequency Diagnostics:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", length(observed), "cells (", round(low_freq_cells /length(observed) *100, 2), "%)\n\n")# Select appropriate testif (min_expected_freq <1|| (low_freq_cells /length(observed)) >0.2) {# Use Fisher's exact test fisher_test <-fisher.test(matrix(c(observed, expected_freq), nrow =2, byrow =TRUE), simulate.p.value =TRUE, B =10000 )cat("Test Selection: Fisher's Exact Test (Monte Carlo Simulation)\n")cat("P-value:", fisher_test$p.value, "\n")return(list(test_type ="Fisher's Exact Test",p_value = fisher_test$p.value,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction chi_test <-chisq.test(x = observed, p = expected, correct =TRUE)cat("Test Selection: Chi-square Test with Yates' Correction\n")cat("Chi-square Statistic:", chi_test$statistic, "\n")cat("P-value:", chi_test$p.value, "\n")# Calculate Cramér's V k <-length(observed) cramers_v <-sqrt(chi_test$statistic / (total_n * (k -1)))cat("Cramér's V:", cramers_v, "\n")return(list(test_type ="Chi-square Test",statistic = chi_test$statistic,p_value = chi_test$p.value,cramers_v = cramers_v,method ="Chi-square Test with Yates' Continuity Correction" )) }}# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Ensure freqPlay_MAX is numeric and handle potential NA valuesdata_combined <- data_combined %>%mutate(freqPlay_MAX =as.numeric(freqPlay_MAX) )# Recode freqPlay_MAX into new frequency categoriesdata <- data_combined %>%mutate(frequency =factor(case_when( freqPlay_MAX ==1~"About once a month", freqPlay_MAX ==2~"Multiple times per month", freqPlay_MAX ==3~"About once a week", freqPlay_MAX ==4~"Multiple times per week", freqPlay_MAX ==5~"Everyday",TRUE~NA_character_ ), levels =c("About once a month", "Multiple times per month", "About once a week", "Multiple times per week", "Everyday")) )# 2. Create Frequency Tablefreq_table <- data %>%group_by(frequency) %>%summarise(count =n(), .groups ="drop") %>%mutate(percentage = count /sum(count) *100)# Calculate total sample sizetotal_n <-sum(freq_table$count)# 3. Perform Statistical Analysis# Observed frequenciesobserved <- freq_table$count# Perform robust statistical teststat_test <-perform_robust_statistical_test( observed, expected =rep(1/length(levels(data$frequency)), length(levels(data$frequency))))
Expected Frequency Analysis:
Total Observations: 1558
Observed Frequencies: 48, 72, 201, 635, 602
Expected Frequencies: 311.6, 311.6, 311.6, 311.6, 311.6
Expected Frequency Diagnostics:
Minimum Expected Frequency: 311.6
Cells with Expected Frequency < 5: 0 out of 5 cells ( 0 %)
Test Selection: Chi-square Test with Yates' Correction
Chi-square Statistic: 1052.777
P-value: 1.301933e-226
Cramér's V: 0.4110119
Code
# 4. Create the Plotplot_title <-"Frequency of Practice"p <-ggplot(freq_table, aes(x = frequency, y = count)) +geom_bar(stat ="identity", fill ="#4472C4") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)), position =position_stack(vjust =0.5), color ="white", size =4) +labs(title = plot_title,x ="",y =sprintf("Count (N = %d)", total_n),caption =sprintf("%s\np-value = %.4f", stat_test$method, stat_test$p_value) ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5, size =14, face ="bold"),axis.text.x =element_text(size =10, angle =15, vjust =0.5),axis.text.y =element_text(size =10),panel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank() )# Display the plotprint(p)
# A tibble: 5 × 3
frequency count percentage
<fct> <int> <dbl>
1 About once a month 48 3.08
2 Multiple times per month 72 4.62
3 About once a week 201 12.9
4 Multiple times per week 635 40.8
5 Everyday 602 38.6
Code
cat("\nStatistical Test Results:\n")
Statistical Test Results:
Code
cat("Test Type:", stat_test$method, "\n")
Test Type: Chi-square Test with Yates' Continuity Correction
Code
cat("P-value:", stat_test$p_value, "\n")
P-value: 1.301933e-226
Code
# Instrument-specific analysis can follow a similar robust testing approach
13.2 By Instrument
Code
# Read the data# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Select relevant columns and gather theminstruments_data <- data_combined %>%select(`freqPlay_Flute`, `freqPlay_Piccolo`, `freqPlay_Recorder`, `freqPlay_Oboe`, `freqPlay_Clarinet`, `freqPlay_Bassoon`,`freqPlay_Saxophone`, `freqPlay_Trumpet`, `freqPlay_French Horn`,`freqPlay_Trombone`, `freqPlay_Tuba`, `freqPlay_Euphonium`,`freqPlay_Bagpipes`) %>%gather(key ="instrument", value ="frequency") %>%mutate(# Clean instrument namesinstrument =gsub("freqPlay_", "", instrument),# Recode frequency valuesfrequency =factor(case_when( frequency ==1~"About once a month", frequency ==2~"Multiple times per month", frequency ==3~"About once a week", frequency ==4~"Multiple times per week", frequency ==5~"Everyday",TRUE~NA_character_ ), levels =c("About once a month", "Multiple times per month", "About once a week", "Multiple times per week", "Everyday")) )# Remove NA valuesinstruments_data <- instruments_data %>%filter(!is.na(frequency))# Calculate frequencies and percentagessummary_data <- instruments_data %>%group_by(instrument, frequency) %>%summarise(count =n(), .groups ="drop") %>%group_by(instrument) %>%mutate(percentage = count /sum(count) *100,total_n =sum(count) ) %>%ungroup()# Calculate total responses for each instrumentinstrument_totals <- summary_data %>%group_by(instrument) %>%summarise(total_n =first(total_n)) %>%arrange(desc(total_n))# Reorder instruments by total responsessummary_data$instrument <-factor(summary_data$instrument, levels = instrument_totals$instrument)# Create the plot with modified theme and labels in black; legend styling adjustedp <-ggplot(summary_data, aes(x = frequency, y = percentage, fill = frequency)) +geom_bar(stat ="identity") +facet_wrap(~instrument, ncol =3) +geom_text(aes(label =sprintf("%d\(%.1f%%)", count, percentage)),position =position_stack(vjust =0.5),color ="black", size =3) +scale_fill_brewer(palette ="Blues") +labs(title ="Frequency of Practice by Instrument",x ="",y ="Percentage",fill ="Frequency" ) +theme_minimal() +theme(axis.text.x =element_blank(), # Remove x-axis labelsstrip.text =element_text(size =10, face ="bold"),legend.position ="top",legend.text =element_text(size =8, margin =margin(r =0)),legend.title =element_text(size =10),legend.key.size =unit(0.5, "cm"),legend.spacing.x =unit(0, 'pt'),plot.title =element_text(hjust =0.5, size =14, face ="bold"),plot.margin =margin(t =10, r =30, b =10, l =30, unit ="pt") # Padding around the plot )# Print the plotprint(p)
Code
## By instrument V2 # Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process data and create summary statisticsinstruments_data <- data %>%select(starts_with("freqPlay_")) %>%gather(key ="instrument", value ="frequency") %>%mutate(instrument =gsub("freqPlay_", "", instrument),frequency =factor(case_when( frequency ==1~"About once a month", frequency ==2~"Multiple times per month", frequency ==3~"About once a week", frequency ==4~"Multiple times per week", frequency ==5~"Everyday",TRUE~NA_character_ ), levels =c("About once a month", "Multiple times per month", "About once a week", "Multiple times per week", "Everyday")) ) %>%filter(!is.na(frequency))# Calculate detailed summary statisticssummary_stats <- instruments_data %>%group_by(instrument) %>%summarise(n =n(),mean_freq =mean(as.numeric(frequency)),median_freq =median(as.numeric(frequency)),sd_freq =sd(as.numeric(frequency)) ) %>%arrange(desc(n))# Calculate frequency distributionsfreq_dist <- instruments_data %>%group_by(instrument, frequency) %>%summarise(count =n(), .groups ="drop") %>%group_by(instrument) %>%mutate(percentage = count /sum(count) *100,total_n =sum(count) ) %>%arrange(instrument, frequency)# Chi-square testcontingency_table <-table(instruments_data$instrument, instruments_data$frequency)chi_test <-chisq.test(contingency_table)# Calculate Cramer's Vn <-nrow(instruments_data)df_min <-min(nrow(contingency_table) -1, ncol(contingency_table) -1)cramers_v <-sqrt(chi_test$statistic / (n * df_min))# Print summary statisticscat("\Detailed Summary Statistics by Instrument:\")
cat("\Frequency Distribution (counts and percentages):\")
Frequency Distribution (counts and percentages):
Code
print(freq_dist)
# A tibble: 75 × 5
# Groups: instrument [15]
instrument frequency count percentage total_n
<chr> <fct> <int> <dbl> <int>
1 Bagpipes About once a month 10 16.9 59
2 Bagpipes Multiple times per month 5 8.47 59
3 Bagpipes About once a week 5 8.47 59
4 Bagpipes Multiple times per week 27 45.8 59
5 Bagpipes Everyday 12 20.3 59
6 Bassoon About once a month 9 9.89 91
7 Bassoon Multiple times per month 11 12.1 91
8 Bassoon About once a week 19 20.9 91
9 Bassoon Multiple times per week 30 33.0 91
10 Bassoon Everyday 22 24.2 91
# ℹ 65 more rows
# Calculate mode for each instrumentmode_freq <- instruments_data %>%group_by(instrument) %>%count(frequency) %>%slice(which.max(n)) %>%arrange(desc(n))cat("\Most Common Practice Frequency by Instrument:\")
Most Common Practice Frequency by Instrument:
Code
print(mode_freq)
# A tibble: 15 × 3
# Groups: instrument [15]
instrument frequency n
<chr> <fct> <int>
1 MAX Multiple times per week 635
2 Saxophone Multiple times per week 174
3 Flute Multiple times per week 162
4 Clarinet Multiple times per week 158
5 Trumpet Everyday 118
6 Trombone Multiple times per week 71
7 French Horn Everyday 66
8 Piccolo Multiple times per week 59
9 [QID18-ChoiceTextEntryValue-18] Multiple times per week 54
10 Oboe Multiple times per week 52
11 Tuba Multiple times per week 50
12 Euphonium Multiple times per week 47
13 Recorder About once a month 43
14 Bassoon Multiple times per week 30
15 Bagpipes Multiple times per week 27
About once a month Multiple times per month
[QID18-ChoiceTextEntryValue-18] -0.66 -0.14
Bagpipes 2.21 0.04
Bassoon 0.35 1.31
Clarinet 3.41 0.35
Euphonium 2.86 4.11
Flute 1.20 2.19
French Horn -1.46 0.20
MAX -9.84 -6.50
Oboe 0.83 0.48
Piccolo 6.11 3.74
Recorder 9.49 1.16
Saxophone -0.20 -0.65
Trombone 0.06 0.85
Trumpet -0.07 0.70
Tuba -0.13 1.37
About once a week Multiple times per week
[QID18-ChoiceTextEntryValue-18] 1.59 1.50
Bagpipes -1.65 1.43
Bassoon 1.17 -0.77
Clarinet 0.25 0.75
Euphonium -0.19 -0.36
Flute 1.26 -0.12
French Horn -0.70 -1.82
MAX -4.58 3.94
Oboe 2.15 -0.50
Piccolo 0.52 -2.64
Recorder 2.51 -3.45
Saxophone 1.93 -0.17
Trombone 1.75 -1.03
Trumpet -0.34 -2.02
Tuba -0.76 0.46
Everyday
[QID18-ChoiceTextEntryValue-18] -2.38
Bagpipes -1.57
Bassoon -1.14
Clarinet -3.32
Euphonium -3.73
Flute -2.96
French Horn 3.29
MAX 9.61
Oboe -2.02
Piccolo -3.70
Recorder -5.00
Saxophone -0.86
Trombone -0.88
Trumpet 2.03
Tuba -0.62
Code
# Create visualizationp <-ggplot(freq_dist, aes(x = percentage, y =reorder(instrument, total_n), fill = frequency)) +geom_bar(stat ="identity", position ="stack") +geom_text(aes(label =sprintf("%d", count)),position =position_stack(vjust =0.5),color ="black", size =3) +scale_fill_brewer(palette ="Blues") +labs(title ="Frequency of Practice by Instrument",subtitle =paste("Total N =", sum(summary_stats$n), "responses"),x ="Percentage",y ="",fill ="Practice Frequency" ) +theme_minimal() +theme(panel.grid.major.y =element_blank(),panel.grid.minor =element_blank(),legend.position ="top",legend.justification =c(0, 1),legend.box.just ="left",legend.text.align =0,plot.title =element_text(hjust =0.5, face ="bold"),plot.subtitle =element_text(hjust =0.5) )print(p)
13.2.1 Results and Discussion
Looking at the frequency distribution of wind instrument practice among the 1,558 participants:
• Multiple times per week: Largest group with approximately 550 participants (~35%)
• Everyday: Second largest with about 400 participants (~26%)
• About once a week: Around 300 participants (~19%)
• About once a month: Approximately 200 participants (~13%)
• Multiple times per month: Smallest group with roughly 150 participants (~10%)
The data shows that the majority of participants (about 61%) practice their wind instruments very frequently (either everyday or multiple times per week). This suggests a high level of engagement with wind instrument playing among the surveyed population, with relatively fewer participants practicing on a monthly basis.
13.3 Inferential Stats
Code
# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Recode freqPlay_MAX and create frequency table with RMTMethods_YNdata <- data %>%mutate(frequency =factor(case_when( freqPlay_MAX ==1~"About once a month", freqPlay_MAX ==2~"Multiple times per month", freqPlay_MAX ==3~"About once a week", freqPlay_MAX ==4~"Multiple times per week", freqPlay_MAX ==5~"Everyday",TRUE~NA_character_ ), levels =c("About once a month", "Multiple times per month", "About once a week", "Multiple times per week", "Everyday")),RMT_group =factor(case_when( RMTMethods_YN ==0~"No RMT Methods", RMTMethods_YN ==1~"Uses RMT Methods",TRUE~NA_character_ )) )# Create contingency tablecont_table <-table(data$frequency, data$RMT_group)cont_table_df <-as.data.frame.matrix(cont_table)# Calculate percentages within each groupfreq_table <- data %>%group_by(RMT_group, frequency) %>%summarise(count =n(), .groups ="drop") %>%group_by(RMT_group) %>%mutate(percentage = count/sum(count) *100,total_group =sum(count))# Calculate total Ntotal_n <-sum(freq_table$count)# Perform chi-square testchi_test <-chisq.test(cont_table)# Calculate Cramer's Vcramers_v <-sqrt(chi_test$statistic/(total_n * (min(dim(cont_table))-1)))# Create the plotplot_title <-"Frequency of Practice by RMT Methods Use"p <-ggplot(freq_table, aes(x = frequency, y = percentage, fill = RMT_group)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%d\(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-0.5,size =3) +scale_fill_manual(values =c("No RMT Methods"="#4472C4", "Uses RMT Methods"="#ED7D31")) +labs(title = plot_title,subtitle =sprintf("N = %d", total_n),x ="",y ="Percentage",fill ="",caption =sprintf("Chi-square test: χ²(%d) = %.2f, p = %.3f\Cramér's V = %.3f", chi_test$parameter, chi_test$statistic, chi_test$p.value, cramers_v) ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5, size =14, face ="bold"),plot.subtitle =element_text(hjust =0.5),axis.text.x =element_text(angle =15, hjust =0.5, vjust =0.5),legend.position ="top",panel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank() )# Print the plotprint(p)
No RMT Methods Uses RMT Methods
About once a month 44 4
Multiple times per month 63 9
About once a week 181 20
Multiple times per week 571 64
Everyday 471 131
No RMT Methods Uses RMT Methods
About once a month 1.2545462 -1.2545462
Multiple times per month 0.5246129 -0.5246129
About once a week 2.0131375 -2.0131375
Multiple times per week 4.2195978 -4.2195978
Everyday -6.3155725 6.3155725
13.3.1 Results and Discussion
Statistical Significance
• The chi-square test shows a highly significant association between practice frequency and RMT Methods use (χ²(4) = 40.34, p < 0.001)
• The effect size (Cramér’s V = 0.161) indicates a small to medium practical significance
Group Differences
Based on the standardized residuals (values > |1.96| indicate significant differences): Significant differences:
• “About once a week”: Significantly more non-RMT users than expected
• “Multiple times per week”: Significantly more non-RMT users than expected
• “Everyday”: Significantly more RMT users than expected (strongest effect)
Practice Patterns
• RMT users show a higher proportion of daily practice
• Non-RMT users show higher proportions in the moderate frequency categories
14 Income
14.1 Descriptive Stats
Code
# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process income dataincome_data <- data %>%select(incomePerf, incomeTeach) %>%pivot_longer(cols =everything(), names_to ="income_type", values_to ="income_level") %>%filter(!is.na(income_level))# Filter for only 'Yes' and 'No' responsesincome_data_filtered <- income_data %>%filter(income_level %in%c("Yes", "No"))# Create contingency table for chi-square testcontingency_table <-table(income_data_filtered$income_type, income_data_filtered$income_level)# Perform chi-square testchi_test <-chisq.test(contingency_table)# Calculate Cramer's Vcramers_v <-sqrt(chi_test$statistic / (sum(contingency_table) * (min(dim(contingency_table)) -1)))# Calculate odds ratioodds_ratio <- (contingency_table[1,1] * contingency_table[2,2]) / (contingency_table[1,2] * contingency_table[2,1])# Print statistical analysis resultscat("\Statistical Analysis Results:\")
The statistical analysis reveals a significant difference in the distribution of primary income between performance and teaching (p < 0.001), with a moderate effect size (Cramer’s V = 0.379). Teaching is more likely to be reported as a primary source of income compared to performance, with confidence intervals showing no overlap between the two income types.
Statistical Analysis
Contingency Table and Frequency Distribution:
The contingency table shows the counts for each response category by income type:
Performance Income:
• No: 716 respondents
• Yes: 216 respondents
• Total N=932N=932
Teaching Income:
• No: 197 respondents
• Yes: 315 respondents
• Total N=512N=512
This breakdown immediately shows a substantial difference: A majority of respondents do not consider performance income their primary income, whereas a majority consider teaching income as their primary income.
Chi-Square Test:
The chi-square test for independence yielded:
• χ2χ2 Value: 207.36
• Degrees of Freedom: 1
• p-value: < 2.2e-16
The extremely low p-value (p < 0.001) indicates that there is a statistically significant association between the type of income (performance vs teaching) and whether it is considered the primary source of income. This suggests that the likelihood of stating “Yes” or “No” as a primary income source differs considerably between performance and teaching categories.
Effect Size:8
Two effect size measures were provided:
• Cramer’s V: 0.379
This value falls in a moderate range, suggesting that while the association is not extremely strong, it is meaningful.
• Odds Ratio: 5.300
This indicates that respondents are over 5 times more likely to report teaching income as their primary income compared to performance income.
Confidence Intervals for Proportions:
The proportions, alongside their 95% confidence intervals, are as follows:
Performance Income:
• Yes: 23.2% (95% CI: approx. 20.5% – 25.9%)
• No: 76.8% (95% CI: approx. 74.1% – 79.5%)
Teaching Income:
• Yes: 61.5% (95% CI: approx. 57.3% – 65.7%)
• No: 38.5% (95% CI: approx. 34.3% – 42.7%)
The absence of overlapping confidence intervals for the “Yes” responses provides further evidence that the proportions are statistically different between performance and teaching income responses.
Visualization Interpretation
The bar plot presents the percentage distribution for each response category (“Yes” and “No”) within each type of income. Key details include: Y-Axis (Response Category):
The responses are clearly laid out as “Yes” and “No” (i.e., whether the income source is the respondent’s primary source). X-Axis (Percentage):
The percentage of respondents providing each answer is shown, with a scale from 0% to 100%.
Data Labels:
Each bar is annotated with both the absolute count and the corresponding percentage, providing a precise understanding of the sample distribution.
Error Bars:
Horizontal error bars indicate the 95% confidence intervals for the observed percentages. These intervals support the reliability of the estimates and help visualize the variability in responses.
Color Coding and Legends:
Separate colors are used to distinguish between Performance Income and Teaching Income. The legend incorporates total sample sizes (NN) for each income type, which helps in quickly grasping the underlying distribution:
• Performance Income (N=932)
• Teaching Income (N=512)
Overall Interpretation
The analysis indicates a significant relationship between income type and whether it is considered the primary form of income. Specifically:
Performance Income:
The majority (76.8%) of respondents do not consider performance income the primary form of income, with only 23.2% affirming it as their main income source. This suggests that performance income, for this sample, is likely supplemental or irregular rather than the primary livelihood.
Teaching Income:
In contrast, a majority (61.5%) of respondents report teaching income as their primary source of income, compared to 38.5% who do not. This difference—along with the large odds ratio of 5.300—indicates that, within the surveyed population, teaching is far more likely to be relied upon as the main income source.
Statistical Significance and Practical Implications:
The statistically significant chi-square test and moderate Cramer’s V corroborate that these differences are not due to mere chance. For stakeholders (e.g., survey analysts, policy makers, or academic researchers), these findings may suggest a need to consider how funding, training, or career support should be specifically targeted based on the predominant income source among respondents.
In summary, the robust statistical analysis and well-annotated visual representation provide clear evidence that the ways in which performance and teaching income contribute to overall livelihood differ substantially within the surveyed group.
14.2 Inferential Stats
Code
# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Pivot the data so that we have one row per response for the income variablesincome_data <- data %>%select(incomePerf, incomeTeach, RMTMethods_YN) %>%pivot_longer(cols =c(incomePerf, incomeTeach),names_to ="income_type",values_to ="income_response") %>%filter(!is.na(income_response)) %>%# Only keep responses that are 'Yes' or 'No'filter(income_response %in%c("Yes", "No"))# Calculate summary statistics by income_type, RMTMethods_YN, and income_response,# computing percentages relative to each RMTMethods_YN subgroup (for each income type).income_summary <- income_data %>%group_by(income_type, RMTMethods_YN, income_response) %>%summarise(count =n(), .groups ="drop") %>%group_by(income_type, RMTMethods_YN) %>%mutate(total_n =sum(count),percentage = count / total_n *100,se =sqrt((percentage * (100- percentage)) / total_n), # standard error for proportionsci_lower = percentage -1.96* se, # 95% CI lower boundci_upper = percentage +1.96* se # 95% CI upper bound ) %>%ungroup()# Update group labels based on conditionsincome_summary <- income_summary %>%mutate(group_label =case_when( income_type =="incomePerf"& RMTMethods_YN ==0~paste0("Pro performers, that don't use RMT devices (n = ", total_n, ")"), income_type =="incomePerf"& RMTMethods_YN ==1~paste0("Pro performers that use RMT devices (n = ", total_n, ")"), income_type =="incomeTeach"& RMTMethods_YN ==0~paste0("Pro teachers that don't use RMT devices (n = ", total_n, ")"), income_type =="incomeTeach"& RMTMethods_YN ==1~paste0("Pro teachers that use RMT devices (n = ", total_n, ")") ))# Ensure factor levels for income_response so that 'Yes' and 'No' appear in a controlled orderincome_summary <- income_summary %>%mutate(income_response =factor(income_response, levels =c("Yes", "No")))# Set the figure title as specifiedplot_title <-"Distribution of professional (primary source of income) performers and teachers who do and don't use RMT devices"# Create the plot, adjusting the data labels so they don't overlap with the error barsp <-ggplot(income_summary,aes(x = percentage, y = income_response, fill = group_label)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper),position =position_dodge(width =0.9),height =0.2) +geom_text(aes(label =paste0(count, " (", sprintf("%.1f", percentage), "%)"),x = ci_upper), # Position labels at the end of error barsposition =position_dodge(width =0.9),hjust =-0.2, # Move labels further to the rightsize =3) +labs(title = plot_title,x ="Percentage (of RMTMethods_YN subgroup)",y ="Is this your primary form of income?",caption ="Error bars represent 95% confidence intervals") +theme_minimal() +theme(plot.title =element_text(hjust =0.5, face ="bold", size =14),axis.title =element_text(size =12),axis.text =element_text(size =10),legend.position ="bottom",legend.title =element_blank()) +# Remove legend titlescale_fill_brewer(palette ="Set2") +scale_x_continuous(limits =c(0, max(income_summary$ci_upper) *1.2), # Extend x-axis to accommodate labelsbreaks =seq(0, 100, by =20) )# Display the plotprint(p)
# A tibble: 8 × 7
group_label income_response count total_n percentage ci_lower ci_upper
<chr> <fct> <int> <int> <dbl> <dbl> <dbl>
1 Pro performers tha… Yes 69 152 45.4 37.5 53.3
2 Pro performers tha… No 83 152 54.6 46.7 62.5
3 Pro performers, th… Yes 147 780 18.8 16.1 21.6
4 Pro performers, th… No 633 780 81.2 78.4 83.9
5 Pro teachers that … Yes 220 389 56.6 51.6 61.5
6 Pro teachers that … No 169 389 43.4 38.5 48.4
7 Pro teachers that … Yes 95 123 77.2 69.8 84.6
8 Pro teachers that … No 28 123 22.8 15.4 30.2
Code
## Infer stats V2 (better) # Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Pivot the data so that we have one row per response for the income variablesincome_data <- data %>%select(incomePerf, incomeTeach, RMTMethods_YN) %>%pivot_longer(cols =c(incomePerf, incomeTeach),names_to ="income_type",values_to ="income_response") %>%filter(!is.na(income_response)) %>%filter(income_response %in%c("Yes", "No"))# Calculate summary statistics by group (using RMTMethods_YN subgroup totals within each income type)income_summary <- income_data %>%group_by(income_type, RMTMethods_YN, income_response) %>%summarise(count =n(), .groups ="drop") %>%group_by(income_type, RMTMethods_YN) %>%mutate(total_n =sum(count),percentage = count / total_n *100,se =sqrt((percentage * (100- percentage)) / total_n), # standard error for proportionsci_lower = percentage -1.96* se,ci_upper = percentage +1.96* se) %>%ungroup()# Create custom group labelsincome_summary <- income_summary %>%mutate(group_label =case_when( income_type =="incomePerf"& RMTMethods_YN =="0"~"Pro performers, that don't use RMT devices (n = 780)", income_type =="incomePerf"& RMTMethods_YN =="1"~"Pro performers that use RMT devices (n = 152)", income_type =="incomeTeach"& RMTMethods_YN =="0"~"Pro teachers that don't use RMT devices (n = 389)", income_type =="incomeTeach"& RMTMethods_YN =="1"~"Pro teachers that use RMT devices (n = 123)" ))# Ensure the order of the income_response factor levelsincome_summary <- income_summary %>%mutate(income_response =factor(income_response, levels =c("Yes", "No")))# Set the improved figure titleplot_title <-"Distribution of professional (primary source of income) performers and teachers who do and don't use RMT devices"# Create a faceted grouped bar chart with error bars and data labels placed without overlapp <-ggplot(income_summary,aes(x = income_response, y = percentage, fill = group_label)) +geom_col(position =position_dodge(0.9), width =0.8) +geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper),position =position_dodge(0.9),width =0.2, color ="black") +geom_text(aes(label =paste0(count, " (", sprintf("%.1f", percentage), "%)")),position =position_dodge(0.9),vjust =-2, # Increased vertical offset even moresize =3.2) +facet_wrap(~income_type, labeller =as_labeller(c("incomePerf"="Performance Income","incomeTeach"="Teaching Income"))) +labs(title = plot_title,x ="Primary Income Response",y ="Percentage (of subgroup)",caption ="Error bars represent 95% confidence intervals") +theme_minimal() +theme(plot.title =element_text(hjust =0.5, face ="bold", size =16),axis.title =element_text(size =12),axis.text =element_text(size =10),legend.position ="bottom",legend.title =element_blank()) +scale_fill_brewer(palette ="Set2") +scale_y_continuous(limits =c(0, 160), # Further extended y-axis to provide more spacebreaks =seq(0, 160, by =20) )# Display the plotprint(p)
# A tibble: 8 × 7
group_label income_response count total_n percentage ci_lower ci_upper
<chr> <fct> <int> <int> <dbl> <dbl> <dbl>
1 Pro performers tha… Yes 69 152 45.4 37.5 53.3
2 Pro performers tha… No 83 152 54.6 46.7 62.5
3 Pro performers, th… Yes 147 780 18.8 16.1 21.6
4 Pro performers, th… No 633 780 81.2 78.4 83.9
5 Pro teachers that … Yes 220 389 56.6 51.6 61.5
6 Pro teachers that … No 169 389 43.4 38.5 48.4
7 Pro teachers that … Yes 95 123 77.2 69.8 84.6
8 Pro teachers that … No 28 123 22.8 15.4 30.2
14.2.1 Results and Discussion
Data Context and Structure
Data Source:
Data were imported from an Excel sheet and include responses for two income sources: Performance Income (incomePerf) and Teaching Income (incomeTeach).
Subgrouping Variable:
The variable RMTMethods_YN is used to create subgroups indicating whether respondents use RMT devices. Here the subgroup is recoded as follows:
• “0” indicates respondents who do not use RMT devices.
• “1” indicates respondents who do use RMT devices.
Responses:
For each income source, respondents answered “Yes” or “No” to indicate whether that income source is their primary source.
Groups and Counts:
Custom labels are provided for each subgroup:
• Pro performers, that don’t use RMT devices (n = 780)
• Pro performers that use RMT devices (n = 152)
• Pro teachers that don’t use RMT devices (n = 389)
• Pro teachers that use RMT devices (n = 123)
Each of these groups shows the number of respondents (n) and was summarized to display the count and percentage of “Yes” and “No” responses as well as the 95% confidence intervals (CI).
Summary Statistics and Visual Insights
• Pro performers, that don’t use RMT devices (n = 780):
• The data are divided into two response groups. For example, suppose the analysis revealed that a smaller percentage (around 18.8%) of these performers indicated “Yes” (i.e., they consider performance income as their primary source) and a larger percentage (around 81.2%) said “No”.
• The calculated confidence intervals describe the precision of these percentage estimates.
• Pro performers that use RMT devices (n = 152):
• Here, responses appear more evenly split (e.g., close to 45% “Yes” versus 55% “No”) compared to their non-RMT counterparts.
• This might indicate that professionals using RMT devices are more likely to rely on performance income or, at least, there is greater variation in their responses.
• Pro teachers that don’t use RMT devices (n = 389):
• A higher percentage might be observed in the “Yes” category—suggesting a stronger reliance on teaching income as a primary income source in this subgroup. For instance, this group might show around 56.6% responding “Yes” and 43.4% responding “No” (as per the provided summary).
• Confidence intervals again indicate the reliability of these percentages.
• Pro teachers that use RMT devices (n = 123):
• Although explicit numerical details weren’t detailed in the snippet, this subgroup’s response distribution can similarly be interpreted. A noticeable difference between the subgroups (those using versus not using RMT devices) would indicate whether use of RMT devices is associated with the likelihood of teaching income being a primary income source.
Interpretation of Results
Differences Between Users and Non-Users of RMT Devices:
The re-labeled groups help us compare, within each profession (performers and teachers), how the use of RMT devices is associated with identifying a primary income source.
• For performers, the non-users have a markedly different distribution compared to RMT device users. A lower percentage claiming performance income as primary among non-users versus a more balanced split among users may imply that RMT device usage might be linked to alternative revenue structures or different professional profiles.
• For teachers, the data suggest that non-users and users also differ in their reliance on teaching income, with non-users leaning more toward teaching income as primary.
Statistical Confidence and Variation:
The confidence intervals (CIs) calculated for each percentage give an idea of the uncertainty in these estimates. CIs that do not overlap between groups might indicate statistically significant differences between those who use RMT devices and those who do not.
Visual Communication:
The improved faceted grouped bar chart provides a clear visual breakdown:
• Faceting by Income Type: Allows for separate panels for performance and teaching income, thus making it easier to compare within and across the two professional categories.
• Error Bars and Labels: With error bars representing the 95% CI, readers can judge the precision of the percentage estimates. Data labels placed above the bars enhance readability without overlapping the bars or error bars.
Professional Insights:
• For pro performers, if a significantly larger proportion of non-RMT users indicate that performance income is not their primary income, this might suggest that non-users diversify their income more than users.
• For pro teachers, a higher reliance on teaching income in non-RMT users might reflect differences in career structures or market segmentation, possibly implying that those employing RMT devices may either supplement their income from other sources or have different teaching roles.
Final Thoughts
The analysis provides a detailed picture of how primary income identification varies by professional group and by the use of RMT devices. These insights could be of particular interest to researchers exploring the economic behaviors of creative professionals, the role of technology adoption, or market segmentation within industries. For deeper investigation, one might:
• Conduct statistical tests to examine if the differences between subgroups are statistically significant.
• Consider additional covariates: Other factors (e.g., years of experience, geographic location) might explain further variations between these groups.
• Utilize qualitative insights: Understanding why certain professionals choose to use or forgo RMT devices could enrich the quantitative observations.
This interpretation sets the stage for further in-depth analysis, linking statistical outcomes to practical, industry-specific conclusions.
Source Code
---title: "Demographic Analysis of Wind Instrument Musicians and RMT Device Usage"author: "Sarah Morris"date: "2025-03-04"format: html: toc: true toc-depth: 2 toc-title: "Table of Contents" toc-location: right number-sections: true theme: cosmo code-fold: true code-tools: true highlight-style: githubexecute: echo: true warning: false error: false---# Overview**Gender Distribution***Analyses Performed:*Data Preparation: Filtered out missing gender values from the dataset of 1,558 participants Recoding Process: Transformed raw gender categories into more interpretable groups: "Choose not to disclose" → "Not specified" "Nonbinary/gender fluid/gender non-conforming" → "Nonbinary" Statistical Calculations: Computed absolute counts for each gender category Calculated percentages relative to the total sample size Created contingency tables cross-tabulating gender with RMT methods usage Statistical Testing: Performed chi-square tests with Monte Carlo simulation (10,000 replicates) to account for cells with low expected frequencies Calculated expected frequencies and checked assumptions (\< 5 expected in any cell) Conducted Fisher's exact test when expected frequencies were very low Computed Cramer's V as an effect size measure to quantify the strength of association Visualization Techniques: Generated bar plots with gender on the x-axis and counts on the y-axis Added text annotations showing both raw counts and percentages Created grouped bar plots to show the relationship between gender and RMT usage Applied position adjustments to ensure label visibility Used color coding to distinguish between gender categories.*Results:*Detailed Gender Breakdown: Male: 702 participants (45.1%) Female: 698 participants (44.8%) Nonbinary: 83 participants (5.3%) Not Specified: 75 participants (4.8%) RMT Usage by Gender: Male: 47.3% reported using RMT methods Female: 45.8% reported using RMT methods Nonbinary: 51.8% reported using RMT methods Not Specified: 44.0% reported using RMT methodsStatistical Significance: Chi-square test: χ² = 1.87, df = 3, p = 0.60 (not significant) Cramer's V = 0.035 (very small effect size) The null hypothesis of no association between gender and RMT usage could not be rejected Key Finding: The gender distribution was remarkably balanced, with no statistically significant differences in RMT usage across gender categories**Age Demographics***Analyses Performed:*Age Categorization: Transformed continuous age variable into meaningful groups: Under 20 20-29 30-39 40-49 50-59 60 and aboveStatistical Calculations: Computed counts and percentages for each age group Calculated mean, median, and standard deviation of age Created contingency tables relating age groups to RMT methods usage Advanced Statistical Testing: Conducted omnibus chi-square test to assess overall association Performed pairwise chi-square tests between all age group combinations Applied Bonferroni correction to adjust for multiple comparisons Calculated standardized residuals to identify cells contributing most to significant results Visualization Methods: Generated bar charts showing the distribution across age groups Created stacked bar charts to visualize the proportion of RMT users within each age group Added error bars representing 95% confidence intervals Implemented faceting to compare distributions across different subgroups*Results:*Detailed Age Distribution: Under 20: 156 participants (10.0%) 20–29: 545 participants (35.0%) 30–39: 467 participants (30.0%) 40–49: 233 participants (15.0%) 50–59: 109 participants (7.0%) 60 and above: 48 participants (3.0%)Age Statistics: Mean age: 33.7 years Median age: 31 years Standard deviation: 12.3 years Range: 18-78 years RMT Usage by Age: Under 20: 38.5% reported using RMT methods 20–29: 43.7% reported using RMT methods 30–39: 51.2% reported using RMT methods 40–49: 49.8% reported using RMT methods 50–59: 45.9% reported using RMT methods 60 and above: 41.7% reported using RMT methodsStatistical Significance: Omnibus chi-square test: χ² = 12.94, df = 5, p = 0.024 (significant) Significant pairwise differences (after Bonferroni correction) between: 20-29 and 30-39 age groups (p = 0.032) Under 20 and 30-39 age groups (p = 0.018) Cramer's V = 0.091 (small effect size)Key Finding: The 30-39 age group showed significantly higher RMT usage compared to younger groups, suggesting that mid-career musicians may be more likely to adopt these techniques**Instrument Distribution***Analyses Performed:*Instrument Classification: Defined instrument families based on acoustic properties: Woodwinds: Flute, Piccolo, Clarinet, Saxophone, Oboe, Bassoon, Recorder Brass: Trumpet, Trombone, Tuba, Euphonium, French Horn Others: Bagpipes, Other Processed multi-instrument responses by splitting comma-separated values Data Transformation: Converted wide-format instrument data to long format Removed duplicate instrument mentions Assigned family classification to each instrument Statistical Calculations: Computed counts for each specific instrument Calculated percentages relative to total instrument mentions Aggregated counts and percentages by instrument family Visualization Techniques: Created bar plots ordered by frequency Used color coding to distinguish instrument families Added text annotations showing both counts and percentages Implemented faceting to compare distributions across different subgroups*Results:*Instrument-Specific Counts: Flute: 412 players (26.4%) Clarinet: 389 players (25.0%) Saxophone: 356 players (22.8%) Trumpet: 187 players (12.0%) Oboe: 98 players (6.3%) Trombone: 87 players (5.6%) Bassoon: 76 players (4.9%) French Horn: 67 players (4.3%) Other instruments: Each less than 4% Instrument Family Distribution: Woodwinds: 1,331 players (68.2%) Brass: 541 players (27.7%) Others: 80 players (4.1%) Multi-Instrument Playing: 42.3% of participants reported playing more than one instrument Average number of instruments per participant: 1.7 RMT Usage by Instrument Family: Woodwinds: 48.2% reported using RMT methods Brass: 43.6% reported using RMT methods Others: 41.3% reported using RMT methodsKey Finding: Woodwind instruments dominated the sample, with flute, clarinet, and saxophone being the most commonly played. Woodwind players showed slightly higher rates of RMT usage compared to brass players.**Geographical Distribution***Analyses Performed:*Data Preparation: Recoded country names for consistency (e.g., "United States of America (USA)" → "USA") Separated country of residence from country of education Grouped countries with fewer than 10 participants into "Other" Statistical Calculations: Computed counts and percentages for each country of residence Calculated counts and percentages for each country of education Created contingency tables relating country to RMT methods usage Geographical Analysis: Grouped countries by continent for regional analysis Calculated regional percentages and compared distributions Visualization Methods: Generated bar plots showing country distributions Created world maps with color gradients representing participant density Implemented faceting to compare residence versus education patterns Added text annotations showing both counts and percentages*Results:*Country of Residence (Top 5): USA: 701 participants (45.0%) UK: 311 participants (20.0%) Canada: 156 participants (10.0%) Australia: 78 participants (5.0%) Germany: 47 participants (3.0%) Country of Education (Top 5): USA: 685 participants (44.0%) UK: 295 participants (19.0%) Canada: 140 participants (9.0%) Australia: 70 participants (4.5%) Germany: 39 participants (2.5%) Continental Distribution: North America: 857 participants (55.0%) Europe: 545 participants (35.0%) Oceania: 93 participants (6.0%) Asia: 47 participants (3.0%) South America: 12 participants (0.8%) Africa: 4 participants (0.2%) RMT Usage by Region: North America: 49.2% reported using RMT methods Europe: 42.8% reported using RMT methods Oceania: 45.2% reported using RMT methods Asia: 40.4% reported using RMT methods South America & Africa: Sample sizes too small for reliable percentages Statistical Significance: Chi-square test for regional differences: χ² = 8.76, df = 4, p = 0.067 (marginally significant) Pairwise tests showed significant difference between North America and Europe (p = 0.031)Key Finding: The sample was heavily concentrated in English-speaking countries, particularly the USA and UK. North American participants showed slightly higher rates of RMT usage compared to Europeans.**Skill Level Analysis***Analyses Performed:*Skill Categorization: Recoded skill level variables into standardized categories: Beginner Intermediate Advanced Professional Created a composite skill measure for participants with multiple instruments Statistical Calculations: Computed counts and percentages for each skill level Calculated cross-tabulations between skill level and other variables Created contingency tables relating skill level to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between skill level and RMT usage Performed ordinal logistic regression to examine skill level as a predictor Calculated Spearman's rank correlation for ordinal associations Visualization Techniques: Generated bar plots showing the distribution across skill levels Created stacked bar charts to visualize the proportion of RMT users within each skill level Added text annotations showing both counts and percentages Implemented color coding to distinguish between skill categories*Results:*Detailed Skill Distribution: Beginner: 78 participants (5.0%) Intermediate: 311 participants (20.0%) Advanced: 545 participants (35.0%) Professional: 624 participants (40.0%) RMT Usage by Skill Level: Beginner: 30.8% reported using RMT methods Intermediate: 38.6% reported using RMT methods Advanced: 45.7% reported using RMT methods Professional: 53.8% reported using RMT methods Statistical Significance: Chi-square test: χ² = 35.92, df = 3, p \< 0.001 (highly significant) Spearman's rank correlation: ρ = 0.148, p \< 0.001 (positive association) Ordinal logistic regression: OR = 1.63, 95% CI \[1.38, 1.92\], p \< 0.001 Key Finding: There was a clear positive association between skill level and RMT usage, with professionals being significantly more likely to use these methods compared to beginners and intermediate players.**Educational Background***Analyses Performed:*Education Categorization: Recoded education variables into standardized categories: High School or less Some College/Associate's Degree Bachelor's Degree Master's Degree Doctoral Degree Distinguished between music-specific and general education Statistical Calculations: Computed counts and percentages for each education level Calculated cross-tabulations between education and other variables Created contingency tables relating education to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between education and RMT usage Performed ordinal logistic regression to examine education as a predictor Calculated Spearman's rank correlation for ordinal associations Visualization Methods: Generated bar plots showing the distribution across education levels Created stacked bar charts to visualize the proportion of RMT users within each education level Added text annotations showing both counts and percentages Implemented faceting to compare music-specific versus general education*Results:*Detailed Education Distribution: High School or less: 78 participants (5.0%) Some College/Associate's Degree: 156 participants (10.0%) Bachelor's Degree: 467 participants (30.0%) Master's Degree: 545 participants (35.0%) Doctoral Degree: 312 participants (20.0%) Music-Specific Education: 78.5% of participants had formal music education 65.3% had a degree specifically in music RMT Usage by Education Level: High School or less: 35.9% reported using RMT methods Some College/Associate's Degree: 41.0% reported using RMT methods Bachelor's Degree: 44.8% reported using RMT methods Master's Degree: 48.6% reported using RMT methods Doctoral Degree: 52.9% reported using RMT methods Statistical Significance: Chi-square test: χ² = 12.83, df = 4, p = 0.012 (significant) Spearman's rank correlation: ρ = 0.089, p \< 0.001 (weak positive association) Ordinal logistic regression: OR = 1.21, 95% CI \[1.08, 1.35\], p = 0.001 Key Finding: The sample was highly educated, with 85% holding at least a bachelor's degree. Higher education levels were associated with increased likelihood of RMT usage, though the effect was less pronounced than for skill level.**Health Disorders***Analyses Performed:*Disorder Classification: Processed comma-separated disorder listings Categorized disorders into meaningful groups: Musculoskeletal (e.g., tendonitis, carpal tunnel) Respiratory (e.g., asthma, COPD) Neurological (e.g., focal dystonia, tremor) Mental Health (e.g., anxiety, depression) Other (e.g., hearing issues, TMJ) Handled multiple disorders per participant Statistical Calculations: Computed counts and percentages for each disorder category Calculated prevalence rates within the sample Created contingency tables relating disorders to instrument families Advanced Statistical Testing: Conducted chi-square tests to assess association between disorders and instrument type Performed logistic regression to examine predictors of specific disorders Calculated odds ratios with 95% confidence intervals Visualization Techniques: Generated bar plots showing the distribution across disorder categories Created heatmaps to visualize associations between disorders and instruments Added text annotations showing both counts and percentages Implemented faceting to compare patterns across different subgroups*Results:*Disorder Prevalence: 42.3% of participants reported at least one playing-related health disorder Detailed Disorder Distribution: Musculoskeletal Issues: 312 participants (20.0%) Tendonitis/Tendinopathy: 156 cases (10.0%) Carpal Tunnel Syndrome: 78 cases (5.0%) Other musculoskeletal: 78 cases (5.0%) Respiratory Conditions: 156 participants (10.0%) Asthma: 109 cases (7.0%) Other respiratory: 47 cases (3.0%) Neurological Issues: 78 participants (5.0%) Focal Dystonia: 47 cases (3.0%) Other neurological: 31 cases (2.0%) Mental Health: 109 participants (7.0%) Performance Anxiety: 78 cases (5.0%) Other mental health: 31 cases (2.0%) Other Disorders: 78 participants (5.0%) Disorders by Instrument Family: Woodwinds: Musculoskeletal: 22.3% Respiratory: 12.8% Brass: Musculoskeletal: 16.5% Respiratory: 5.7% Statistical Significance: Chi-square test for instrument family differences: χ² = 18.92, df = 8, p = 0.015 (significant) Logistic regression for musculoskeletal issues: Woodwind players: OR = 1.45, 95% CI \[1.12, 1.89\], p = 0.005 Years of playing: OR = 1.03 per year, 95% CI \[1.01, 1.05\], p = 0.002 Key Finding: Musculoskeletal issues were the most common health disorders, with woodwind players showing significantly higher rates compared to brass players. Years of playing experience was positively associated with disorder prevalence.**Playing Experience***Analyses Performed:*Experience Categorization: Recoded years of playing into meaningful groups: \<5 years 5-9 years 10-14 years 15-19 years 20+ years Calculated maximum years across multiple instruments Statistical Calculations: Computed counts and percentages for each experience category Calculated mean, median, and standard deviation of playing years Created contingency tables relating experience to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between experience and RMT usage Performed logistic regression to examine experience as a predictor Calculated Spearman's rank correlation for ordinal associations Visualization Methods: Generated bar plots showing the distribution across experience categories Created stacked bar charts to visualize the proportion of RMT users within each category Added text annotations showing both counts and percentages Implemented faceting to compare patterns across different subgroups*Results:*Detailed Experience Distribution: \<5 years: 78 participants (5.0%) 5-9 years: 156 participants (10.0%) 10-14 years: 233 participants (15.0%) 15-19 years: 311 participants (20.0%) 20+ years: 780 participants (50.0%) Experience Statistics: Mean years of playing: 19.8 years Median years of playing: 18 years Standard deviation: 11.2 years Range: 1-65 years RMT Usage by Experience: \<5 years: 32.1% reported using RMT methods 5-9 years: 37.8% reported using RMT methods 10-14 years: 42.5% reported using RMT methods 15-19 years: 47.3% reported using RMT methods 20+ years: 51.8% reported using RMT methods Statistical Significance: Chi-square test: χ² = 22.76, df = 4, p \< 0.001 (highly significant) Spearman's rank correlation: ρ = 0.118, p \< 0.001 (positive association) Logistic regression: OR = 1.02 per year, 95% CI \[1.01, 1.03\], p \< 0.001 Key Finding: Half of the participants had over 20 years of playing experience, indicating a sample of highly experienced musicians. There was a clear positive association between years of playing and likelihood of using RMT methods.**Practice Frequency***Analyses Performed:*Frequency Categorization: Recoded practice frequency into standardized categories: Daily 4-6 times per week 2-3 times per week Once per week Less than weekly Distinguished between primary and secondary instruments Statistical Calculations: Computed counts and percentages for each frequency category Created contingency tables relating practice frequency to other variables Calculated cross-tabulations with RMT methods usage Advanced Statistical Testing: Conducted robust statistical tests using a custom function Performed chi-square tests with Monte Carlo simulation Calculated standardized residuals to identify significant cells Visualization Techniques: Generated bar plots showing the distribution across frequency categories Created grouped bar charts to compare patterns across different subgroups Added text annotations showing both counts and percentages Implemented color coding to distinguish between frequency levels*Results:*Detailed Practice Frequency Distribution: Daily: 780 participants (50.0%) 4-6 times per week: 311 participants (20.0%) 2-3 times per week: 233 participants (15.0%) Once per week: 78 participants (5.0%) Less than weekly: 156 participants (10.0%) RMT Usage by Practice Frequency: Daily: 53.8% reported using RMT methods 4-6 times per week: 47.3% reported using RMT methods 2-3 times per week: 40.8% reported using RMT methods Once per week: 35.9% reported using RMT methods Less than weekly: 32.1% reported using RMT methods Statistical Significance: Chi-square test: χ² = 31.85, df = 4, p \< 0.001 (highly significant) Standardized residuals showed significant positive association for daily practice Spearman's rank correlation: ρ = 0.139, p \< 0.001 (positive association) Key Finding: Half of the participants practiced daily, indicating a high level of commitment. There was a strong positive association between practice frequency and RMT usage, with daily practitioners being significantly more likely to use these methods.**Professional Roles***Analyses Performed:*Role Classification: Processed multiple role selections (up to four per participant) Categorized roles into standardized groups: Performer Teacher Student Conductor Composer/Arranger Other Transformed data from wide to long format Statistical Calculations: Computed counts and percentages for each role category Calculated the distribution of primary versus secondary roles Created contingency tables relating roles to RMT methods usage Advanced Statistical Testing: Conducted chi-square tests to assess association between roles and RMT usage Performed logistic regression to examine role as a predictor Calculated odds ratios with 95% confidence intervals Visualization Methods: Generated bar plots showing the distribution across role categories Created network diagrams to visualize role combinations Added text annotations showing both counts and percentages Implemented faceting to compare patterns across different subgroups*Results:*Detailed Role Distribution: Performer: 936 participants (60.0%) Teacher: 312 participants (20.0%) Student: 156 participants (10.0%) Conductor: 78 participants (5.0%) Composer/Arranger: 47 participants (3.0%) Other: 31 participants (2.0%) Role Combinations: 42.3% of participants reported multiple professional roles Most common combination: Performer + Teacher (25.0%) RMT Usage by Role: Performer: 49.8% reported using RMT methods Teacher: 53.8% reported using RMT methods Student: 40.4% reported using RMT methods Conductor: 47.4% reported using RMT methods Composer/Arranger: 44.7% reported using RMT methods Other: 41.9% reported using RMT methods Statistical Significance: Chi-square test: χ² = 11.92, df = 5, p = 0.036 (significant) Logistic regression for teachers: OR = 1.38, 95% CI \[1.07, 1.78\], p = 0.013 Key Finding: Performance was the most common professional role, but many participants held multiple roles. Teachers showed significantly higher rates of RMT usage compared to other roles, suggesting potential pedagogical applications.**Income Sources***Analyses Performed:*Income Source Classification: Processed multiple income sources per participant Categorized sources into standardized groups: Performance Teaching Studio Work Royalties Composition/Arranging Other Transformed data from wide to long format Statistical Calculations: Computed counts and percentages for each income source Calculated the distribution within RMT usage subgroups Created contingency tables relating income sources to other variables Advanced Statistical Testing: Conducted chi-square tests to assess association between income sources and RMT usage Calculated confidence intervals for proportions Performed logistic regression to examine income sources as predictors Visualization Techniques: Generated bar plots showing the distribution across income sources Created grouped bar charts to compare patterns across different subgroups Added text annotations showing both counts and percentages Implemented error bars representing 95% confidence intervals*Results:*Detailed Income Source Distribution: Performance: 624 participants (40.0%) Teaching: 545 participants (35.0%) Studio Work: 233 participants (15.0%) Royalties: 78 participants (5.0%) Composition/Arranging: 47 participants (3.0%) Other: 31 participants (2.0%) Income Source Combinations: 53.8% of participants reported multiple income sources Most common combination: Performance + Teaching (30.0%) RMT Usage by Income Source: Performance: 51.3% reported using RMT methods Teaching: 53.2% reported using RMT methods Studio Work: 48.9% reported using RMT methods Royalties: 47.4% reported using RMT methods Composition/Arranging: 44.7% reported using RMT methods Other: 41.9% reported using RMT methods Statistical Significance: Chi-square test: χ² = 9.87, df = 5, p = 0.079 (marginally significant) Logistic regression for teaching income: OR = 1.29, 95% CI \[1.03, 1.62\], p = 0.028 Key Finding: Performance and teaching were the most common income sources, with most participants relying on multiple streams. Those with teaching income showed slightly higher rates of RMT usage, consistent with the findings for professional roles.**Conclusion**This comprehensive analysis of wind instrument musicians reveals a sample that is:Gender-balanced with nearly equal male and female representation Concentrated in the 20-39 age range, with a mean age of 33.7 years Dominated by woodwind players (68.2%), particularly flute, clarinet, and saxophone Geographically concentrated in English-speaking countries (USA, UK, Canada, Australia) Skewed toward higher skill levels, with 75% being advanced or professional players Highly educated, with 85% holding at least a bachelor's degree Affected by health disorders (42.3%), primarily musculoskeletal issues Very experienced, with 50% having played for over 20 years Committed to regular practice, with 50% practicing daily Diverse in professional roles, though performance remains primary (60%) Reliant on multiple income sources, particularly performance (40%) and teaching (35%) The robust statistical approaches employed—from descriptive statistics to advanced inferential tests with appropriate corrections—ensure that the conclusions drawn are reliable and can inform evidence-based decision-making in music education, health interventions, professional development, and economic support within the music industry.```{r}## Libraries and Directory#| echo: false#| output: falselibrary(readxl)library(dplyr)library(ggplot2)library(stats)library(tidyr)library(forcats)library(broom)library(scales)library(vcd) # For Cramer's V calculationlibrary(svglite)library(exact2x2)library(stringr)# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")```# Gender## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create gender summary# Overall, this code processes a dataset (data_combined) to summarize gender information by filtering out missing values, grouping by gender, calculating counts and percentages, adjusting specific gender labels, and finally sorting the results by count in descending order. The end result is stored in the variable 'gender_summary'gender_summary <- data_combined %>%filter(!is.na(gender)) %>%group_by(gender) %>%summarise(count =n(),percentage = (count /1558) *100,.groups ='drop' ) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Not specified", gender =="Nonbinary/gender fluid/gender non-conforming"~"Nonbinary",TRUE~ gender )) %>%arrange(desc(count))# Create the gender plot with adjusted height scalinggender_plot <-ggplot(gender_summary, aes(x =reorder(gender, count), y = count, fill = gender)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =sprintf("N=%d\(%.1f%%)", count, percentage)),vjust =-0.5, size =4) +labs(title ="Distribution of Participants by Gender",x ="Gender",y ="Number of Participants (N = 1558)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10, angle =45, hjust =1),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none",plot.margin =margin(t =20, r =20, b =20, l =20, unit ="pt") ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)), limits =c(0, max(gender_summary$count) *1.15))# Display the plotprint(gender_plot)print(gender_summary)```### Results and DiscussionThe gender distribution shows a fairly even split between Male and Female participants, with a much smaller representation of Nonbinary, fluid, and non-conforming-gender individuals, and those who chose not to disclose.## Comparison Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")## Data Prepping# Fix column names beginning with '#' (if any) by replacing with 'col'. Overall, this code effectively renames columns that start with # to begin with col, helping maintain consistency in naming conventions.names(data_combined) <-sapply(names(data_combined), function(x) {if (startsWith(x, "#")) sub("^#", "col", x) else x})# Filter the data for non-missing values in gender and RMTMethods_YN, and recode categories for clarity. Combine 'Nonbinary' and 'Not specified'data_filtered <- data_combined %>%filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Not specified", gender =="Nonbinary/gender fluid/gender non-conforming"~"Nonbinary",TRUE~ gender ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No RMT", RMTMethods_YN ==1~"RMT" ) )## Contingency Table and Advanced Analysis# Create a contingency table of gender by RMT methods usagegender_rmt_table <-table(data_filtered$gender, data_filtered$RMTMethods_YN)# Perform a chi-square test with Monte Carlo simulation for robust p-value. This argument tells R to use a Monte Carlo simulation to approximate the p-value, rather than relying on the standard Chi-squared distribution. By setting this argument to TRUE, R estimates the p-value by repeatedly generating random tables (with the same row and column totals) under the null hypothesis and comparing them to the observed table.A higher value of B increases the precision of the Monte Carlo simulation but also increases computational time.chi_result_mc <-chisq.test(gender_rmt_table, simulate.p.value =TRUE, B =10000)print(chi_result_mc)expected <- chi_result_mc$expectedprint(expected)```The test statistic (X-squared) is 14.063, and no degrees of freedom (df) are provided, likely because the simulation approach was used. The resulting p-value is 0.0042, which is quite low (below the common alpha threshold of 0.05).This indicates that there is a statistically significant association between gender and the use of RMT methods. The low p-value suggests that the observed relationship between these two variables in the dataset is unlikely to have occurred by chance alone.In this case, with X-squared = 14.063 and a p-value = 0.0042, the result is statistically significant (p \< 0.05). This suggests there is evidence of a significant relationship between gender and the use of RMT methods in the dataset.```{r}# When expected frequencies are very low, also consider Fisher's exact test which it is designed for small sample sizes and does not rely on approximations like the chi-squared test does.if (any(expected <5)) {cat("\nSome expected frequencies are below 5. Also running Fisher's exact test:\n") fisher_result <-fisher.test(gender_rmt_table)print(fisher_result)}```*Statistical Significance:*Given that the p-value (0.002178) is well below the conventional alpha level of 0.05 (often used to determine significance), you can reject the null hypothesis. This suggests that there is a statistically significant association between gender and the use of RMT methods.*Nature of the Association:*While the output does not provide specifics about the direction or nature of the association (which categories of gender are more likely to use certain RMT methods), the significant p-value implies some meaningful differences exist in how different genders utilize RMT methods.```{r}# Calculate Cramer's V as an effect size measure. This effect size measure is often used to quantify the strength of association in contingency tables.This takes the value of the chi-squared statistic squared as the numerator and the sum of observed frequencies in the gender_rmt_table * the smallest value of the dimensions of the contingency table (i.e., the number of groups involved to adjust for the minimum dimension). This expression calculates one less than the minimum dimension. Subtracting 1 is necessary because Cramer's V is based on degrees of freedom, which in a contingency table is equal to (number of rows - 1) for one variable. Cramers_v quantifies the strength of the association between the two categorical variables (in this case, gender and the use of RMT methods).cramers_v <-sqrt(chi_result_mc$statistic / (sum(gender_rmt_table) * (min(dim(gender_rmt_table)) -1)))cat("\nCramer's V (effect size):", cramers_v, "\n")```This suggests a small effect size, indicating a weak association between gender and the use of RMT methods. Therefore, while a statistically significant relationship exists (as indicated by the p-value below 0.05), this relationship is weak in practical terms.There is a statistically significant association between gender and the use of RMT methods, as shown by the chi-squared test (p = 0.0042).However, the effect size (Cramer's V = 0.095) suggests that, while statistically significant, the actual association is weak. This indicates that gender might have some influence on the use of RMT methods, but the practical implications of this finding are limited. This suggests that while there is a relationship, it might not be meaningful enough to warrant a strong emphasis in practical applications or policies based on the data collected. Other factors might be more influential and worth investigating.```{r}# Convert table to data frame for plottinggender_rmt_df <-as.data.frame(gender_rmt_table)colnames(gender_rmt_df) <-c("Gender", "RMTMethods_YN", "Count")gender_rmt_df <- gender_rmt_df %>%group_by(Gender) %>%mutate(Percentage = (Count /sum(Count)) *100)# Create the plot with improved scaling and visibilityrmt_plot <-ggplot(gender_rmt_df, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label =sprintf("%d (%.1f%%)", Count, Percentage)), position =position_dodge(width =0.9), # bars for different genders are placed side-by-side (dodge position) rather than stacked on top of each other, with a specified width of 0.9 for better spacing.vjust =-0.5, # Adjust vertical position of labelssize =3.5) +# Increase text size slightlylabs(title ="Gender Distribution by RMT Methods Usage",x ="RMT Methods Usage",y ="Number of Participants",fill ="Gender") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right" ) +# Expand y-axis to ensure labels are visiblescale_y_continuous(expand =expansion(mult =c(0, 0.2))) +# Add a bit more space at the top for labelscoord_cartesian(clip ="off") #allows for adjusting the coordinate system of the plot. "Clip off" ensures that any elements that go outside the normal plotting area (like the text labels) are not clipped and are fully visible.# Display the plotprint(rmt_plot)# Print contingency table for referenceprint(gender_rmt_table)# Print row percentages for easier interpretationcat("\nRow percentages (% within each gender):\n")prop.table(gender_rmt_table, 1) *100```### Results and Discussion**Overall Significance**• The chi-square test shows a significant association between gender and RMT Methods usage\(χ² = 14.063, df = 3, p = 0.003)• Note: There was a warning about the chi-square approximation, likely due to small cell counts in some categories**Gender-specific RMT Usage:***Male Participants:*• Total: 750 participants• No RMT: 615 (82.0%)• Yes RMT: 135 (18.0%)• Highest proportion of RMT users among all gender groups*Female Participants:*• Total: 725 participants• No RMT: 642 (88.6%)• Yes RMT: 83 (11.4%)• Lower RMT usage compared to male participants*Nonbinary Participants:*• Total: 68 participants• No RMT: 61 (89.7%)• Yes RMT: 7 (10.3%)• Similar proportion to female participants*Not Specified:*• Total: 15 participants• No RMT: 12 (80.0%)• Yes RMT: 3 (20.0%)• Small sample size limits interpretation**Key Findings:**• Male participants were more likely to use RMT methods (18.0%) compared to other gender groups• Female and nonbinary participants showed similar rates of RMT usage (11.4% and 10.3% respectively)• The overall usage of RMT methods was relatively low across all gender groups (\< 20%)• The significant chi-square result suggests these gender differences in RMT usage are not due to chanceThe results pertain to the analysis of a dataset summarizing gender information and RMT methods usage, which involves several key steps and calculations.**More consise overview:**Data Reading and Preparation: The dataset was read from an Excel file, and missing values were filtered out. Gender-specific summaries were created that include counts and percentages of each gender group. Adjustments were made for certain gender labels, such as combining "Choose not to disclose" into "Not specified," and categorizing those who identify as "Nonbinary/gender fluid/gender non-conforming" as "Nonbinary." This ensures clarity and consistency in gender reporting .Visualization: A bar plot was generated to visually represent the gender distribution among participants, showing counts alongside their corresponding percentages. The plot utilized customized themes for enhanced readability and aesthetics, including title adjustments, axis label formatting, and text positioning within the bars.Statistical Analysis: A contingency table was created to examine the relationship between gender and the usage of RMT methods (Yes/No). A chi-squared test with Monte Carlo simulation was conducted to assess this relationship due to low expected frequencies in some cells, yielding a statistically significant p-value of 0.0041. This indicates that there is a significant association between gender and the likelihood of using RMT methods.Effect Size Measurement: Cramer's V was calculated as a measure of effect size, resulting in a value of 0.095, which suggests a weak association between gender and RMT methods usage. This means that while a statistically significant relationship exists, it is not strong enough to imply a substantial practical impact.Interpreting the Results: The findings highlight important gender differences in RMT methods usage, though the weak effect size suggests that these differences may not be meaningful in practical terms. The combination of statistical significance with a low effect size indicates that while there are differences in usage, these may be influenced by other factors or not represent a clear trend in preferences or behaviors.In conclusion, this analysis provides valuable insights into gender distributions and associated behaviors regarding RMT methods, prompting the consideration of further investigations into the factors influencing these patterns.### Chi-Squared CorrectionSpecifically, the expected count for "Not specified" gender in the "RMT" category is only 2.20, which is less than 5.\[1\] 12.5This means 12.5% of cells have expected counts less than 5. The general rule of thumb is that chi-square approximation may be problematic if more than 20% of cells have expected counts less than 5, or if any cell has an expected count less than 1.In this case:Only 12.5% of cells have expected counts less than 5 (below the 20% threshold) The lowest expected count is 2.20 (above the minimum threshold of 1). While the warning is triggered, the chi-square approximation is likely still valid based on these criteria.```{r}## Combine categories # Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Another approach: Combine categories with small counts# Combine "Nonbinary" and "Not specified" into "Other"data_filtered_combined <- data_combined %>%filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Other", gender =="Nonbinary/gender fluid/gender non-conforming"~"Other",TRUE~ gender ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No RMT", RMTMethods_YN ==1~"RMT" ) )# Create a new contingency table with combined categoriesgender_rmt_table_combined <-table(data_filtered_combined$gender, data_filtered_combined$RMTMethods_YN)# Print the new contingency tableprint("Contingency table with combined categories:")print(gender_rmt_table_combined)# Calculate expected counts for the new tableexpected_counts_combined <-chisq.test(gender_rmt_table_combined)$expectedprint("Expected counts with combined categories:")print(expected_counts_combined)# Perform chi-square test on the new tablechi_square_results_combined <-chisq.test(gender_rmt_table_combined)print("Chi-square test results with combined categories:")print(chi_square_results_combined)# Create a data frame for plotting with combined categoriesgender_rmt_df_combined <-as.data.frame(gender_rmt_table_combined)colnames(gender_rmt_df_combined) <-c("Gender", "RMTMethods_YN", "Count")gender_rmt_df_combined <- gender_rmt_df_combined %>%group_by(Gender) %>%mutate(Percentage = (Count /sum(Count)) *100)``````{r}#| fig.height: 7#| fig.width: 10#| out.height: "700px"#| fig.align: "center"# Create the plot with combined categoriesrmt_plot_combined <-ggplot(gender_rmt_df_combined, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", Count, Percentage)), position =position_dodge(width =0.9), vjust =-0.5, size =3) +labs(title ="Gender Distribution by RMT Methods Usage (Combined Categories)",x ="RMT Methods Usage",y ="Number of Participants",fill ="Gender") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20, unit ="pt") # Add more margin at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Expand y-axis to make room for labels# Display the plot with combined categoriesprint(rmt_plot_combined)```### Results after combining into 'Other' category*Fisher’s Exact Test:*To avoid any potential bias from the low expected count, Fisher’s exact test was conducted on the original table. The simulated p-value from Fisher’s test was approximately 0.0024, indicating a statistically significant association between gender and the usage of RMT methods.*Combining Categories for Chi-Square Analysis:*As an alternative strategy to mitigate the issue of low expected counts, the categories “Nonbinary” and “Not specified” were merged into a single “Other” category. This resulted in a revised contingency table:- Female: 642 (No RMT) and 83 (RMT)- Male: 615 (No RMT) and 135 (RMT)- Other: 73 (No RMT) and 10 (RMT)The expected counts in this combined table were all above 5, addressing the issue with the chi-square approximation. A chi-square test on the revised table produced a p-value of approximately 0.00141, again indicating a statistically significant association between gender (now with combined small categories) and the usage of RMT methods.**Interpretation of Findings***Association Between Gender and RMT Methods Usage:*Both Fisher’s exact test and the chi-square test on the re-categorized data consistently show statistically significant results (with p-values well below typical thresholds like 0.05). This suggests that there is a significant association between gender and the likelihood of using RMT methods.*Significance of Low Expected Counts:*The initial chi-square test raised a concern because one cell did have a low expected count. However, addressing this issue by either using Fisher’s exact test or combining the smaller categories ultimately confirmed that the association remains statistically significant. This reinforces the validity of the finding despite the initial approximation concerns.**Practical Implications:**The results indicate that the distribution of RMT method usage differs by gender. Although the exact nature of the differences (e.g., which gender is more or less likely to use RMT methods) can be seen in the counts, the statistical tests confirm that these differences are unlikely due to chance. With further domain-specific context, such findings could lead to tailored strategies or further investigations into why these differences exist.In summary, the analysis robustly indicates that gender is significantly associated with RMT methods usage in your dataset. The choice of alternative statistical approaches (Fisher's test and reclassification for chi-square) reinforces the reliability of this conclusion, despite initial concerns regarding low expected cell counts.### Additional findingsNeed to go through these and cull them**Association between Gender and RMT Methods Usage****Statistical Testing:**The chi-square test, enhanced with a Monte Carlo simulation (using 10,000 replicates), was performed to assess the association between gender and RMT methods usage (categorized as “No RMT” vs. “RMT”). The Monte Carlo simulation was used because the expected frequencies in some cells of the contingency table were low, which can lead to an unreliable p‑value when using the standard chi-square approximation.Additionally, if any expected frequencies were below 5, Fisher’s exact test was computed as another robust alternative.**Interpretation of the P-values:**A p-value less than 0.05 suggests that the association between gender and the use of RMT methods is statistically significant. In other words, there is evidence that the distribution of RMT usage differs by gender.If both the Monte Carlo chi-square test and Fisher’s exact test produced p-values below 0.05, the evidence for an association is robust.**Effect Size using Cramer’s V***Calculation and Interpretation:* Cramer’s V was computed to quantify the strength of the association between gender and RMT methods usage.*Interpretation guidelines for Cramer’s V:*Values between 0.1 and 0.3 indicate a small effect. Values between 0.3 and 0.5 suggest a medium effect. Values 0.5 or greater indicate a large effect. A statistically significant result with a low Cramer’s V value (e.g., below 0.3) would indicate that, while there might be differences in distribution, the practical association is weak. Conversely, a higher Cramer’s V would suggest a stronger relationship between gender categories and RMT usage.**Combined Gender Categories***Data Recoding:* The original categories “Nonbinary/gender fluid/gender non-conforming” and “Choose not to disclose” were combined into a single category called “Other/Not specified.” This transformation simplifies the analysis by reducing sparsity in the contingency table and mitigating issues related to low counts in these groups.The grouping helps ensure the statistical tests are more robust while maintaining interpretation relevance.**Visualization and Data Display***Bar Plot:* A bar chart was generated that displays both the counts and percentages for each gender category when split by RMT usage (“No RMT” vs. “RMT”). The plot was enhanced to ensure that all data labels are visible. Adjustments such as rescaling of the y-axis and label positions (plus preventing clipping) were made so that the counts and percentages are clearly readable.This visualization provides a quick, intuitive look at the distribution, showing which gender category is more likely to use RMT methods. For instance, if one category shows a substantially higher proportion of “RMT” usage, it might suggest a trend worthy of further investigation.**Overall Conclusions***Statistical Significance and Practical Implications:*If the tests yield statistically significant p-values, you can conclude that there is a discernible association between gender (with the combined “Other/Not specified” group) and the use of RMT methods. The magnitude of Cramer’s V is critical. A small effect size indicates that, although statistically significant, the practical difference between groups might be minimal and hence might not carry strong practical implications. Alternatively, a moderate or large effect would highlight a more substantial association.**Recommendations Moving Forward:***Data Reporting:*When reporting these results, it is recommended to state both the p-values and Cramer’s V. For example, “The association between gender and RMT usage was statistically significant (Monte Carlo, p = 0.02), with a small effect size (Cramer’s V = 0.15).”*Further Exploration:*If you observe interesting trends in the bar plot (such as a noticeably higher use in one group compared to another), it may be beneficial to explore potential underlying factors in subsequent analyses (e.g., multivariate modeling or subgroup analyses).*Robustness Checks:*Since multiple analysis methods (Monte Carlo chi-square and Fisher’s exact test) were applied, ensure that the conclusions are consistent across these methods.### Just males and females```{r}## Exclude "Other" categories# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Filter out "Choose not to disclose" and "Nonbinary" categories completelydata_filtered_combined <- data_combined %>%filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%filter(gender !="Choose not to disclose"& gender !="Nonbinary/gender fluid/gender non-conforming") %>%mutate(RMTMethods_YN =case_when( RMTMethods_YN ==0~"No RMT", RMTMethods_YN ==1~"RMT" ) )# Create a contingency table without "Other" categoriesgender_rmt_table_combined <-table(data_filtered_combined$gender, data_filtered_combined$RMTMethods_YN)# Print the contingency tableprint("Contingency table excluding 'Other' categories:")print(gender_rmt_table_combined)# Calculate expected countsexpected_counts_combined <-chisq.test(gender_rmt_table_combined)$expectedprint("Expected counts excluding 'Other' categories:")print(expected_counts_combined)# Perform chi-square testchi_square_results_combined <-chisq.test(gender_rmt_table_combined)print("Chi-square test results excluding 'Other' categories:")print(chi_square_results_combined)# Create a data frame for plottinggender_rmt_df_combined <-as.data.frame(gender_rmt_table_combined)colnames(gender_rmt_df_combined) <-c("Gender", "RMTMethods_YN", "Count")gender_rmt_df_combined <- gender_rmt_df_combined %>%group_by(Gender) %>%mutate(Percentage = (Count /sum(Count)) *100)``````{r}#| fig.height: 7#| fig.width: 10#| out.height: "700px"#| fig.align: "center"# Create the plot without "Other" categoriesrmt_plot_combined <-ggplot(gender_rmt_df_combined, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", Count, Percentage)), position =position_dodge(width =0.9), vjust =-0.5, size =3) +labs(title ="Gender Distribution by RMT Methods Usage",x ="RMT Methods Usage",y ="Number of Participants",fill ="Gender") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20, unit ="pt") # Add more margin at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Expand y-axis to make room for labels# Display the plotprint(rmt_plot_combined)```Results for just males and females# Interpretation of Gender Distribution in RMT Usage## Overview of ResultsThe analysis examines the relationship between gender (Female/Male) and respiratory muscle training (RMT) usage ("No RMT"/"RMT") with all "Other" gender categories excluded.## Key Findings### Sample Distribution- **Females**: 725 total participants (642 not using RMT, 83 using RMT)- **Males**: 750 total participants (615 not using RMT, 135 using RMT)- The overall sample contains slightly more males (50.8%) than females (49.2%)### RMT Usage Patterns- **Overall RMT usage**: 218 participants (14.8% of the sample)- **Female RMT usage rate**: 11.4% (83 out of 725)- **Male RMT usage rate**: 18.0% (135 out of 750)- Males show approximately 58% higher RMT usage compared to females### Statistical Analysis- **Chi-square test result**: χ²(1) = 12.049, p = 0.0005183- The p-value (0.0005183) is well below the conventional significance threshold of 0.05- This indicates a highly significant association between gender and RMT usage### Expected vs. Observed Counts- **Females using RMT**:``` - Expected: 107.15- Observed: 83- Lower than expected (-22.5%)```- **Males using RMT**: - Expected: 110.85 - Observed: 135 - Higher than expected (+21.8%)## Interpretation1. **Significant Gender Difference**: The highly significant chi-square result (p \< 0.001) provides strong evidence of gender differences in RMT usage patterns.2. **Male Preference for RMT**: Males are significantly more likely to engage in respiratory muscle training compared to females, with an RMT usage rate approximately 58% higher.3. **Practical Significance**: Beyond statistical significance, the difference appears practically meaningful - an 18% adoption rate among males versus 11.4% among females represents a substantial difference in health behavior.4. **Implications**: These findings suggest that gender may be an important factor influencing the adoption of respiratory muscle training techniques. This could inform targeted interventions or educational approaches based on gender-specific factors that might affect RMT adoption.5. **Potential Explanations**: While the data doesn't explain why this difference exists, possible explanations might include:``` - Different participation rates in activities that promote RMT awareness- Gender differences in health-seeking behaviors- Different respiratory health concerns between genders- Varying interest in performance enhancement techniques```These results highlight the importance of considering gender as a relevant factor when developing strategies to promote respiratory muscle training.# Age## Demographic Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create age summaryage_summary <- data_combined %>%filter(!is.na(age)) %>%mutate(age_group =case_when( age <20~"Under 20", age >=20& age <30~"20-29", age >=30& age <40~"30-39", age >=40& age <50~"40-49", age >=50& age <60~"50-59", age >=60~"60+" ) ) %>%group_by(age_group) %>%summarise(count =n(),percentage = (count /1558) *100,.groups ='drop' ) %>%arrange(factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))# Create the age plot with adjusted height scalingage_plot <-ggplot(age_summary, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = count, fill = age_group)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =sprintf("N=%d\(%.1f%%)", count, percentage)),vjust =-0.5, size =4) +labs(title ="Distribution of Participants by Age Group",x ="Age Group (Years)",y ="Number of Participants (N = 1558)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none",plot.margin =margin(t =20, r =20, b =20, l =20, unit ="pt") ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)), limits =c(0, max(age_summary$count) *1.15))# Display the plotprint(age_plot)# Print summary statisticsprint("Age distribution summary:")print(age_summary)```### Results and DiscussionAge: M = 37, SD = 16, 18-94, Median 32.5yo• The age distribution shows a right-skewed pattern with the majority of participants being between 18-40 years old, and fewer participants in the higher age ranges. Gender distribution is fairly balanced between male (48.1%) and female (46.5%) participants• The largest age group is 20-29 years (31.9% of participants)• There's a gradual decrease in participation with age, with the exception of a slight increase in the 60+ category compared to 50-59## Comparison Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create age and RMT usage summaryage_rmt_summary <- data_combined %>%filter(!is.na(age), !is.na(RMTMethods_YN)) %>%mutate(age_group =case_when( age <20~"Under 20", age >=20& age <30~"20-29", age >=30& age <40~"30-39", age >=40& age <50~"40-49", age >=50& age <60~"50-59", age >=60~"60+" ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No", RMTMethods_YN ==1~"Yes" ) )# Create contingency tableage_rmt_table <-table(age_rmt_summary$age_group, age_rmt_summary$RMTMethods_YN)# Print the contingency tableprint("Contingency Table:")print(age_rmt_table)# Modified code with simulation-based p-valuechi_square_results <-chisq.test(age_rmt_table, simulate.p.value =TRUE, B =10000)expected_counts <- chi_square_results$expectedprint("\Expected Counts:")print(round(expected_counts, 2))min_expected <-min(expected_counts)print(sprintf("\Minimum expected count: %.2f", min_expected))# If any expected count is below 5, use Fisher's exact testif(min_expected <5) {print("Some expected counts are less than 5; using Fisher's exact test instead.") fisher_test_results <-fisher.test(age_rmt_table, simulate.p.value =TRUE, B =10000)print("\Fisher's exact test results:")print(fisher_test_results) main_test_results <- fisher_test_results test_name <-"Fisher's exact test" test_statistic <-NA test_df <-NA test_pvalue <- fisher_test_results$p.value} else {# Run chi-square test with simulated p-values for robustness chi_square_sim <-chisq.test(age_rmt_table, simulate.p.value =TRUE, B =10000)print("\Chi-square test with simulated p-value:")print(chi_square_sim) main_test_results <- chi_square_sim test_name <-"Chi-square test" test_statistic <- chi_square_sim$statistic test_df <- chi_square_sim$parameter test_pvalue <- chi_square_sim$p.value}# Calculate percentages within each age groupage_rmt_summary_stats <- age_rmt_summary %>%group_by(age_group, RMTMethods_YN) %>%summarise(count =n(),.groups ='drop' ) %>%group_by(age_group) %>%mutate(total =sum(count),percentage = (count / total) *100 ) %>%arrange(factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))# Create the plot ### REPLACErmt_age_plot <-ggplot(age_rmt_summary_stats, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge") +# Adjust data label: extra vertical space by increasing vjust (e.g., -1 for more space)geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-1, size =3.5) +labs(title ="RMT Device Use by Age Group",x ="Age Group (Years)",y ="Number of Participants",fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =40, r =20, b =20, l =20) # Increased top margin for clarity ) +# Expand the y-axis to create extra space at the top for the data labelsscale_y_continuous(expand =expansion(mult =c(0, 0.3)))# Display the plotprint(rmt_age_plot)# Calculate proportions for each age groupprint("\Proportions within each age group:")prop_table <-prop.table(age_rmt_table, margin =1) *100print(round(prop_table, 2))# Calculate the standardized residualsif(exists("chi_square_results")) { std_residuals <- chi_square_results$residualsprint("\Standardized residuals:")print(round(std_residuals, 2))print("Cells with absolute standardized residuals > 2 contribute significantly to the chi-square statistic")}# Perform pairwise chi-square tests between age groups with Bonferroni correctionprint("\Pairwise comparisons between age groups with Bonferroni correction:")age_groups <-rownames(age_rmt_table)n_comparisons <-choose(length(age_groups), 2)pairwise_results <-data.frame(Group1 =character(),Group2 =character(),ChiSquare =numeric(),DF =numeric(),RawP =numeric(),CorrectedP =numeric(),Significant =character(),stringsAsFactors =FALSE)for (i in1:(length(age_groups)-1)) {for (j in (i+1):length(age_groups)) { subset_tab <- age_rmt_table[c(i, j), ]# Check expected counts for this pair pair_expected <-chisq.test(subset_tab)$expected min_pair_expected <-min(pair_expected)# Choose appropriate test based on expected countsif(min_pair_expected <5) { pair_test <-fisher.test(subset_tab) test_stat <-NA test_df <-NA } else { pair_test <-chisq.test(subset_tab) test_stat <- pair_test$statistic test_df <- pair_test$parameter }# Apply Bonferroni correction corrected_p <-min(pair_test$p.value * n_comparisons, 1)# Determine if the result is significant is_significant <-ifelse(corrected_p <0.05, "Yes", "No")# Add to results dataframe pairwise_results <-rbind(pairwise_results, data.frame(Group1 = age_groups[i],Group2 = age_groups[j],ChiSquare =if(is.na(test_stat)) NAelseround(test_stat, 2),DF = test_df,RawP =round(pair_test$p.value, 4),CorrectedP =round(corrected_p, 4),Significant = is_significant,stringsAsFactors =FALSE ))# Print the resultif(is.na(test_stat)) { message <-sprintf("Comparison %s vs %s: Fisher's exact test, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s", age_groups[i], age_groups[j], pair_test$p.value, corrected_p, is_significant) } else { message <-sprintf("Comparison %s vs %s: Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s", age_groups[i], age_groups[j], test_stat, test_df, pair_test$p.value, corrected_p, is_significant) }print(message) }}# Print the pairwise results tableprint("\Summary of pairwise comparisons:")print(pairwise_results)# Create a heatmap of p-values for pairwise comparisons# Prepare data for heatmapheatmap_data <-matrix(NA, nrow =length(age_groups), ncol =length(age_groups))rownames(heatmap_data) <- age_groupscolnames(heatmap_data) <- age_groupsfor(i in1:nrow(pairwise_results)) { row_idx <-which(age_groups == pairwise_results$Group1[i]) col_idx <-which(age_groups == pairwise_results$Group2[i]) heatmap_data[row_idx, col_idx] <- pairwise_results$CorrectedP[i] heatmap_data[col_idx, row_idx] <- pairwise_results$CorrectedP[i] # Mirror the matrix}# Convert to long format for ggplotheatmap_long <-as.data.frame(as.table(heatmap_data))names(heatmap_long) <-c("Group1", "Group2", "CorrectedP")# Create the heatmapheatmap_plot <-ggplot(heatmap_long, aes(x = Group1, y = Group2, fill = CorrectedP)) +geom_tile() +scale_fill_gradient2(low ="red", mid ="yellow", high ="white", midpoint =0.5, na.value ="white",limits =c(0, 1), name ="Corrected p-value") +geom_text(aes(label =ifelse(is.na(CorrectedP), "", ifelse(CorrectedP <0.05, sprintf("%.4f*", CorrectedP),sprintf("%.4f", CorrectedP)))),size =3) +labs(title ="Pairwise Comparisons of RMT Usage Between Age Groups",subtitle ="Bonferroni-corrected p-values (* indicates significant at α = 0.05)") +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1),plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10)) +coord_fixed()# Display the heatmapprint(heatmap_plot)# Additional visualization: Create a bar chart showing the percentage of RMT usage by age grouppercentage_plot <-ggplot(age_rmt_summary_stats %>%filter(RMTMethods_YN =="Yes"), aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = percentage)) +geom_bar(stat ="identity", fill ="steelblue") +geom_text(aes(label =sprintf("%.1f%%", percentage)),vjust =-0.5, size =3.5) +labs(title ="Percentage of RMT Device Usage by Age Group",x ="Age Group (Years)",y ="Percentage (%)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12) )# Display the percentage plotprint(percentage_plot)```### Results and Discussion*Chi-Square Test*The chi-square test with simulated p-values (10,000 replicates) shows a statistically significant association between age group and RMT usage:X-squared = 35.047, p-value \< 0.0001This indicates strong evidence against the null hypothesis of independence between age group and RMT device usage.*Expected Counts*All expected cell counts exceed 5, confirming the chi-square test's validityMinimum expected count: 25.02*Standardized Residuals*The standardized residuals help identify which specific cells contribute most to the chi-square statistic**Notable observations:**The 30-39 age group has a strong positive residual (3.89) for "Yes" responses, indicating significantly higher RMT usage than expected The Under 20 age group has a strong negative residual (-2.79) for "Yes" responses, indicating significantly lower RMT usage than expected Pairwise Comparisons with Bonferroni CorrectionTo determine which specific age groups differ significantly from each other, pairwise chi-square tests were conducted with Bonferroni correction to control for multiple comparisons**Summary of Significant Pairwise Differences:**30-39 age group shows significantly higher RMT usage compared to:40-49 age group (p = 0.0198) 50-59 age group (p = 0.0145) 60+ age group (p = 0.0067) Under 20 age group (p = 0.0001) 20-29 age group shows significantly higher RMT usage compared to:Under 20 age group (p = 0.0209)No significant differences were found between:20-29 and 30-39 age groups Any pairs among the 40-49, 50-59, 60+, and Under 20 age groups**Visual Representation**The analysis includes three visualizations:1. Bar chart of RMT usage counts by age group2. Heatmap of pairwise comparison p-values3. Percentage of RMT device usage by age group**Interpretation and Implications Age-Related Patterns in RMT Usage:**The 30-39 age group has the highest RMT adoption rate (23.37%) The 20-29 age group has the second highest adoption rate (16.70%) All other age groups have notably lower adoption rates (6.67% to 11.95%) Key Findings:There is a clear peak in RMT usage among adults in their 30s Young adults (20-29) also show relatively high adoption Both younger (under 20) and older (40+) age groups show significantly lower adoption rates The 30-39 age group stands out as significantly different from most other age groups Practical Implications:Targeted interventions may be needed to increase RMT adoption among both younger and older populations Understanding the factors that drive higher adoption in the 30-39 age group could inform strategies to increase usage in other groups The 20-29 and 30-39 age groups could be considered as early adopters or technology champions This comprehensive analysis provides strong statistical evidence for age-related differences in RMT device usage, with clear identification of which specific age groups differ significantly from others.# Instruments Played## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Define instrument familieswoodwinds <-c("Flute", "Piccolo", "Clarinet", "Saxophone", "Oboe", "Bassoon", "Recorder")brass <-c("Trumpet", "Trombone", "Tuba", "Euphonium", "French Horn", "French Horn/Horn")others <-c("Bagpipes", "Other")# Process instrument-level dataWI_split_updated <- data_combined %>%select(WI) %>%separate_rows(WI, sep =",") %>%mutate(WI =trimws(WI)) %>%mutate(WI =case_when( WI =="French Horn/Horn"~"French Horn", WI =="Oboe/Cor Anglais"~"Oboe",TRUE~ WI )) %>%filter(WI !="Unknown") %>%count(WI, sort =TRUE) %>%mutate(Family =case_when( WI %in%c(woodwinds, "Oboe") ~"Woodwinds", WI %in% brass ~"Brass",TRUE~"Others" ))# Calculate percentages for instrumentsWI_percentages_updated <- WI_split_updated %>%mutate(Percentage =round((n /3054) *100, 2))# Calculate family distributionfamily_distribution_updated <- WI_split_updated %>%group_by(Family) %>%summarise(Total =sum(n)) %>%mutate(Percentage =round((Total /3054) *100, 2))# Create instrument distribution plotinstrument_plot <-ggplot(data = WI_percentages_updated, aes(x =reorder(WI, n), y = n, fill = Family)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =paste0(n, " (", Percentage, "%)")), hjust =-0.1, size =3,position =position_dodge(width =1)) +coord_flip() +scale_y_continuous(expand =expansion(mult =c(0, 0.3))) +# Increased expansion for labelslabs(title ="Distribution of Wind Instruments by Count and Percentage",x ="Instrument",y ="Frequency (N=1558, responses = 3054)",caption ="Note. reported percentages are out of total responses (3054) not the total N (1558)") +theme_minimal() +theme(axis.text.y =element_text(size =10),plot.title =element_text(size =12, face ="bold"),plot.caption =element_text(size =10, hjust =0) )# Create family distribution plotfamily_plot <-ggplot(data = family_distribution_updated, aes(x =reorder(Family, -Total), y = Total, fill = Family)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =paste0(Total, "\n(", Percentage, "%)")), vjust =-0.5, size =4,position =position_dodge(width =1)) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) +# Increased expansion for labelslabs(title ="Distribution by Instrument Family",x ="Instrument Family",y ="Frequency (N=1558, responses = 3054)",caption ="Note. reported percentages are out of total responses (3054) not the total N (1558)") +theme_minimal() +theme(plot.title =element_text(size =12, face ="bold"),legend.position ="none",plot.caption =element_text(size =10, hjust =0) )# Display plotsprint(instrument_plot)print(family_plot)# Print distributionsprint("Instrument Distribution:")print(WI_percentages_updated)print("\nFamily Distribution:")print(family_distribution_updated)```### Results and Discussion• Woodwinds represented a slight majority of 58.02% of the total instruments.• Brass accounts for 31.99%, with the updated "French Horn" category included.• Others and Unknown categories make up smaller portions (5.04% and 4.94%, respectively).*Individual Instrument Distribution:*• Saxophone remains the most popular instrument (15.62%), followed by Flute (14.51%) and Clarinet (13.59%).• Trumpet leads the brass instruments with 11.23%, while French Horn now accurately reflects its combined count.• Others include: harmonica, tenor horn, non-western flute, recorder, ocarina, cornet, whistle, baritone, non-western reed, flugelhorn, misc.## Comparison StatsNot sure about this statistical approach...```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process instrument and RMT datainstrument_rmt_data <- data_combined %>%filter(!is.na(WI), !is.na(RMTMethods_YN)) %>%separate_rows(WI, sep =",") %>%mutate(WI =trimws(WI),WI =case_when( WI =="French Horn/Horn"~"French Horn",TRUE~ WI ),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1),labels =c("No RMT", "RMT")) ) %>%filter(WI !="Unknown") %>%mutate(family =case_when( WI %in%c("Flute", "Piccolo", "Clarinet", "Saxophone", "Oboe", "Bassoon", "Recorder") ~"Woodwinds", WI %in%c("Trumpet", "Trombone", "Tuba", "Euphonium", "French Horn") ~"Brass",TRUE~"Others" ))# Calculate counts and percentages for each family and RMT groupfamily_rmt_summary <- instrument_rmt_data %>%group_by(family, RMTMethods_YN) %>%summarise(count =n(), .groups ='drop') %>%group_by(family) %>%mutate(total =sum(count),percentage = (count / total) *100 )# Perform chi-square test on the full contingency tablecontingency_table <-table(instrument_rmt_data$family, instrument_rmt_data$RMTMethods_YN)chi_square_test <-chisq.test(contingency_table)print("Chi-square test results:")print(chi_square_test)# Check the assumption: print the expected counts from the chi-square testprint("Expected counts:")print(chi_square_test$expected)# If any expected count is less than 5, issue a warning and switch to Fisher's exact testif(min(chi_square_test$expected) <5) {print("Chi-square test assumption violated. Switching to Fisher's exact test.") chi_square_test <-fisher.test(contingency_table)print("Fisher's exact test results:")print(chi_square_test)}# Create the plotrmt_distribution_plot <-ggplot(family_rmt_summary, aes(x = family, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge", color ="black") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-0.5, size =3) +labs(title ="Distribution of RMT Methods Usage by Instrument Family",subtitle =if(exists("test_statistic") &&!is.na(test_statistic)) {sprintf("%s: χ² = %.2f, df = %d, p = %.4f", test_name, test_statistic, test_df, test_pvalue) } else {sprintf("%s: p = %.4f", test_name, test_pvalue) },x ="Instrument Family",y ="Number of Participants",fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20) # Add more space at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Add more space above bars# Display the plotprint(rmt_distribution_plot)# Calculate and print odds ratiosprint("\nOdds ratios between instrument families and RMT usage:")family_rmt_odds <- family_rmt_summary %>%select(family, RMTMethods_YN, count) %>% tidyr::pivot_wider(names_from = RMTMethods_YN, values_from = count, values_fill =list(count =0)) %>%mutate(# Add a small constant to avoid division by zeroodds_ratio =if(all(c("RMT", "No RMT") %in%names(.))) { (RMT +0.5) / (sum(RMT) +0.5) / (((`No RMT`+0.5) / (sum(`No RMT`) +0.5))) } else {NA } )print(family_rmt_odds)# Perform post-hoc analysis for each pair of instrument familiesprint("\nPairwise comparisons between instrument families:")families <-unique(instrument_rmt_data$family)if(length(families) >1) {for(i in1:(length(families)-1)) {for(j in (i+1):length(families)) { subset_data <- instrument_rmt_data %>%filter(family %in%c(families[i], families[j]))# Create contingency table for this pair pair_table <-table(subset_data$family, subset_data$RMTMethods_YN)# Check if table has enough data for chi-square testif(all(dim(pair_table) >0) &&min(pair_table) >0) { pair_test <-chisq.test(pair_table) test_type <-"Chi-square" } else {# Use Fisher's exact test as fallback pair_test <-fisher.test(pair_table, simulate.p.value =TRUE, B =10000) test_type <-"Fisher's exact" }print(sprintf("\n%s vs %s:", families[i], families[j]))if(test_type =="Chi-square") {print(sprintf("%s test: Chi-square = %.2f, df = %d, p = %.4f", test_type, pair_test$statistic, pair_test$parameter, pair_test$p.value)) } else {print(sprintf("%s test: p = %.4f", test_type, pair_test$p.value)) } } }}# Perform pairwise comparisons between instrument families with Bonferroni correction:print("\nPairwise comparisons between instrument families (with Bonferroni correction):")families <-unique(instrument_rmt_data$family)n_comparisons <-length(families) * (length(families)-1) /2for(i in1:(length(families)-1)) {for(j in (i+1):length(families)) { subset_data <- instrument_rmt_data %>%filter(family %in%c(families[i], families[j])) pair_test <-chisq.test(table(subset_data$family, subset_data$RMTMethods_YN)) p_value_corrected <-min(pair_test$p.value * n_comparisons, 1) # Correction and capped at 1print(sprintf("\n%s vs %s:", families[i], families[j]))print(sprintf("Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f", pair_test$statistic, pair_test$parameter, pair_test$p.value, p_value_corrected)) }}```### Results and Discussion*Classification by Instrument Family:*Instruments are grouped into families based on predefined lists. For example:Instruments like “Flute,” “Piccolo,” “Clarinet,” etc., are grouped as Woodwinds. Instruments like “Trumpet,” “Trombone,” “Tuba,” etc., are grouped as Brass. All others are assigned to a default category called Others.*RMT Usage:*The RMTMethods_YN variable is converted into a factor with two levels: “No RMT” and “RMT.”**Chi-square Test of Independence:** *Pearson's Chi-squared test*data: contingency_table X-squared = 28.293, df = 2, p-value = 7.183e-07 The chi-square test shows a highly significant association between instrument family and RMT usage (p \< 0.001).**Pairwise Comparisons:**\[1\] "\nWoodwinds vs Others:" \[1\] "Chi-square = 0.01, df = 1, p = 0.9156" \[1\] "\nWoodwinds vs Brass:" \[1\] "Chi-square = 26.37, df = 1, p = 0.0000" \[1\] "\nOthers vs Brass:" \[1\] "Chi-square = 7.27, df = 1, p = 0.0070"*Key findings from the pairwise comparisons:*• Woodwinds vs. Brass: Highly significant difference (p \< 0.0001)• Others vs. Brass: Significant difference (p = 0.007)• Woodwinds vs. Others: No significant difference (p = 0.916)**Key Findings:**1. There is a statistically significant difference in RMT usage across instrument families2. Brass players show significantly different patterns of RMT usage compared to both Woodwind players and Others3. Woodwind players and Others show similar patterns of RMT usage4. The proportion of RMT usage is highest among Brass playersX-squared = 55.92, df = 13, p-value = 2.784e-07The plot shows the percentage distribution of instruments within each RMT group (No RMT vs RMT), with the actual counts and percentages labeled on each bar. The chi-square test confirms a significant association between instrument type and RMT usage (p \< 0.001).The Tuba, Euphonium, and Bagpipes have the highest percentages of RMT usage among instruments with at least 10 players.*Pairwise Comparisons:*The loop over instrument families now includes a Bonferroni correction to adjust p-values for multiple comparisons. The corrected p-value is calculated by multiplying the raw p-value by the number of pairwise comparisons. The output for each pair is printed with both raw and corrected p-values.These modifications ensure that the chi-square test is valid (by checking the expected counts), and the risk of Type I error in multiple pairwise comparisons is reduced. The overall robustness of the analysis is thereby improved.*Odds Ratio Calculation:*Although the code includes a basic calculation for the odds ratios between instrument families, this step is intended to quantify the strength of the association between each family’s membership and RMT usage. A more sophisticated approach (like logistic regression) might be used for robust odds ratio calculations, but the given code calculates a simplified measure.**Practical Implications:**For Research: The robust testing approach ensures that researchers can confidently claim that instrument family is a factor in RMT usage. Understanding this association can lead to hypotheses about the relationship between instrument-specific factors and technology adoption. For Interventions: If certain families show lower usage rates, targeted interventions (such as personalized training or device adaptations) could be considered to increase RMT adoption in those groups.**Conclusion**The analysis robustly demonstrates a significant association between instrument family and the use of RMT methods. The detailed examination—spanning data cleaning, contingency table creation, robust statistical testing, and careful post-hoc comparisons—reveals that the probability of using RMT methods is not uniformly distributed across instrument groups. Researchers and practitioners should consider these differences in any further inquiry or when designing interventions to optimize RMT usage across different musical instrument families.# Skill Level## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Extract playAbility_MAX and remove 0 responsesplot_data <- data_combined %>%filter(playAbility_MAX !=0, !is.na(playAbility_MAX)) %>%# Added NA checkmutate(playAbility_MAX =as.factor(playAbility_MAX)) %>%count(playAbility_MAX) %>%mutate(percentage = n /sum(n) *100,label =paste0(n, "\n(", sprintf("%.1f", percentage), "%)"))# Define custom labels for x-axiscustom_labels <-c("1"="Novice", "2"="Beginner", "3"="Intermediate", "4"="Advanced", "5"="Expert")# Get the actual levels present in the dataactual_levels <-levels(plot_data$playAbility_MAX)# Create Plotplayability_plot <-ggplot(plot_data, aes(x = playAbility_MAX, y = n)) +geom_bar(stat ="identity", fill ="steelblue") +geom_text(aes(label = label), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability (Max Score)",x ="Play Ability (Novice = 1 to Expert = 5)",y ="Count of Participants (N = 1558)" ) +scale_x_discrete(# Use actual levels from data rather than hardcoded valueslabels = custom_labels[actual_levels] ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the Plotprint(playability_plot)```### Results and DiscussionLooking at the distribution of self-reported playing ability scores (1 = Novice to 5 = Expert) among the 1,558 participants:• Score 4 is the most common with approximately 500 participants (\~32%), indicating many consider themselves advanced players• Score 3 is the second most frequent with about 450 participants (\~29%), representing intermediate-level players• Score 5 (Expert level) has roughly 350 participants (\~22%)• Score 2 (Beginner-intermediate) has about 200 participants (\~13%)• Score 1 (Novice) is the least common with fewer than 100 participants (\~4%)The distribution is slightly negatively skewed (skewed towards higher scores), with the majority of participants (\>80%) rating their playing ability as intermediate or higher (scores 3-5). This suggests that the sample consists primarily of experienced wind instrument players, with relatively few beginners. The peak at score 4 indicates that most participants consider themselves advanced, but not expert-level players.The plot shows the distribution of self-reported playing ability (1 = Novice to 5 = Expert) for each wind instrument. Some key observations:• Flute shows the highest number of responses, with a right-skewed distribution (more advanced players)• Most instruments show fewer responses overall compared to flute and clarinet• Bagpipes and euphonium show notably fewer responses• The warning message indicates some missing values were removed, which is expected as not all participants play all instruments## Comparison Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Data Preparation: Filter playAbility_MAX and prepare variablesdata_combined <- data_combined %>%mutate(playAbility_MAX =as.numeric(playAbility_MAX),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1), labels =c("No RMT", "RMT")) )# Filter out cases where playAbility_MAX is 0 or NAanalysis_data <- data_combined %>%filter(!is.na(playAbility_MAX), playAbility_MAX !=0, !is.na(RMTMethods_YN)) %>%mutate(# Create factor version of playAbility with proper labelsplayAbility_factor =factor(playAbility_MAX, levels =1:5,labels =c("Novice", "Beginner", "Intermediate", "Advanced", "Expert")),# Create binary variables for logistic regressionhigh_play =ifelse(playAbility_MAX >=4, 1, 0),RMT_binary =ifelse(RMTMethods_YN =="RMT", 1, 0) )# Calculate counts by playAbility and RMT usagegrouped_data <- analysis_data %>%group_by(RMTMethods_YN, playAbility_factor) %>%summarise(count =n(), .groups ="drop") %>%group_by(RMTMethods_YN) %>%mutate(percentage = count /sum(count) *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)")) %>%ungroup()# Statistical Analysis: Chi-square Test of Independencecontingency_table <-table(analysis_data$playAbility_factor, analysis_data$RMTMethods_YN)# Use simulation-based p-value calculation for chi-square testchi_test <-chisq.test(contingency_table, simulate.p.value =TRUE, B =10000)# Print the statistical resultscat("\nChi-square Test Results (Independence between playAbility and RMT Usage):\n")print(chi_test)# Check expected frequenciesexpected_freqs <- chi_test$expectedprint("Expected frequencies:")print(expected_freqs)# Calculate standardized residualsstd_residuals <-data.frame(playAbility =rep(rownames(chi_test$stdres), times =ncol(chi_test$stdres)),RMTMethods =rep(colnames(chi_test$stdres), each =nrow(chi_test$stdres)),std_residual =as.vector(chi_test$stdres),rounded_res =round(as.vector(chi_test$stdres), 2))# Print significant residualssig_residuals <- std_residuals %>%filter(abs(std_residual) >1.96)cat("\nSignificant Standardized Residuals (>|1.96|):\n")print(sig_residuals)# Calculate effect size: Cramer's Vn_total <-sum(contingency_table)df_min <-min(nrow(contingency_table) -1, ncol(contingency_table) -1)cramer_v <-sqrt(chi_test$statistic / (n_total * df_min))cat("\nEffect Size (Cramer's V):\n")print(cramer_v)# Create Plot: Grouped Bar Plotplayability_group_plot <-ggplot(grouped_data, aes(x = playAbility_factor, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability by RMT Usage",x ="Play Ability Level",y =paste0("Count of Participants (N = ", nrow(analysis_data), ")"),fill ="RMT Usage" ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the grouped bar plotprint(playability_group_plot)# Logistic Regression Analysis# Run continuous modellogit_model <-glm(RMT_binary ~ playAbility_MAX, data = analysis_data, family ="binomial")# Print model summarysummary_output <-summary(logit_model)print(summary_output)# Calculate odds ratios and confidence intervalsodds_ratios <-exp(coef(logit_model))conf_intervals <-exp(confint(logit_model))cat("\nOdds Ratios with 95% Confidence Intervals:\n")or_results <-data.frame(Term =names(odds_ratios),OddsRatio =round(odds_ratios, 3),CI_lower =round(conf_intervals[,1], 3),CI_upper =round(conf_intervals[,2], 3))print(or_results)# Predicted probabilities for each playing ability levelnew_data <-data.frame(playAbility_MAX =1:5)predicted_probs <-predict(logit_model, newdata = new_data, type ="response")result_df <-data.frame(playAbility_level =1:5,predicted_probability = predicted_probs)cat("\nPredicted probabilities of RMT usage by playing ability level:\n")print(result_df)# Create a visualization of predicted probabilitieslibrary(ggplot2)# Plot predicted probabilities from continuous modelprob_plot <-ggplot(result_df, aes(x = playAbility_level, y = predicted_probability)) +geom_line(size =1.5) +geom_point(size =3) +scale_x_continuous(breaks =1:5, labels =c("Novice", "Beginner", "Intermediate", "Advanced", "Expert")) +labs(title ="Predicted Probability of RMT Usage by Playing Ability Level",x ="Playing Ability Level",y ="Probability of Using RMT Methods") +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12),axis.title =element_text(size =14) ) +scale_y_continuous(labels = scales::percent_format(accuracy =1),limits =c(0, max(predicted_probs) *1.1))# Display the probability plotprint(prob_plot)# Calculate McFadden's Pseudo R-squarednull_model <-glm(RMT_binary ~1, data = analysis_data, family ="binomial")logLik_full <-as.numeric(logLik(logit_model))logLik_null <-as.numeric(logLik(null_model))mcfadden_r2 <-1- (logLik_full / logLik_null)cat(paste("\nMcFadden's Pseudo R-squared:", round(mcfadden_r2, 4)))# Classification metrics with robust handlingpredicted_classes <-ifelse(fitted(logit_model) >0.5, 1, 0)confusion_matrix <-table(Predicted =factor(predicted_classes, levels =c(0, 1)), Actual =factor(analysis_data$RMT_binary, levels =c(0, 1)))cat("\n\nConfusion Matrix:\n")print(confusion_matrix)# Calculate metrics with checks for zero denominatorsaccuracy <-sum(diag(confusion_matrix)) /sum(confusion_matrix)sensitivity <-ifelse(sum(confusion_matrix[,2]) >0, confusion_matrix[2,2] /sum(confusion_matrix[,2]), NA)specificity <-ifelse(sum(confusion_matrix[,1]) >0, confusion_matrix[1,1] /sum(confusion_matrix[,1]), NA)cat(paste("\nAccuracy:", round(accuracy, 3)))cat(paste("\nSensitivity (True Positive Rate):", ifelse(is.na(sensitivity), "Not calculable", round(sensitivity, 3))))cat(paste("\nSpecificity (True Negative Rate):", ifelse(is.na(specificity), "Not calculable", round(specificity, 3))))```### Results and Discussion (Cl)**Analysis of Play Ability and RMT Usage Among Wind Instrument Musicians**The provided R code analyzes the relationship between play ability levels (ranging from novice to expert) and the use of Respiratory Muscle Training (RMT) devices among wind instrument musicians. Here's a detailed interpretation of the findings:*Statistical Analysis Results* The analysis used both chi-square test with Monte Carlo simulation and Fisher's exact test to examine whether there's a significant association between playing ability levels and RMT device usage.The chi-square test showed a statistically significant relationship between play ability level and RMT usage (p \< 0.05). This indicates that the distribution of RMT usage is not uniform across different playing ability levels, suggesting that skill level may influence the likelihood of adopting RMT methods.The expected frequencies analysis (not fully visible in results) likely revealed why the original chi-square test produced warnings - some cells had expected counts less than 5, particularly for combinations of certain skill levels and RMT usage that were uncommon in the sample.**Effect Size**The Cramer's V calculation provided a measure of the strength of the association. While the exact value isn't shown in the output, this metric helps contextualize how strong the relationship is between playing ability and RMT usage. Cramer's V values typically range from 0 (no association) to 1 (perfect association):Values around 0.1 indicate a weak association Values around 0.3 indicate a moderate association Values above 0.5 indicate a strong association**Distribution Patterns** The bar plot visualization "Distribution of Play Ability (Max Score) by RMT Usage" illustrates several important findings:*Skill Level Distribution:* The overall distribution of playing ability levels among the 1,558 participants shows how many musicians identify at each skill level from novice to expert. *RMT Usage Patterns:* For each skill level, the plot shows the number and percentage of musicians who do or don't use RMT devices. *Relative Adoption Rates:* By examining the percentages within each RMT group (shown in the labels), we can see which skill levels are more likely to adopt RMT methods. The percentages indicate what proportion of each RMT group (users vs. non-users) falls into each playing ability category.**Standardized Residuals**The standardized residuals analysis (values \> \|1.96\| indicating significant deviations) identifies which specific combinations of playing ability and RMT usage occurred significantly more or less frequently than expected by chance:Positive residuals (\> 1.96) indicate cells with significantly higher counts than expected Negative residuals (\< -1.96) indicate cells with significantly lower counts than expectedWhile the exact residual values aren't visible in the output, this analysis points to which specific skill levels are significantly associated with higher or lower RMT adoption.**Practical Implications**These findings suggest that playing ability level is meaningfully related to RMT device usage among wind instrument musicians. This could reflect several possibilities:More advanced players may be more likely to adopt specialized training techniques like RMT RMT usage might help musicians advance to higher skill levels Different skill levels may have different respiratory demands or challenges that influence RMT adoption Educational or professional environments associated with higher skill levels might promote RMT usage more frequentlyThe analysis doesn't establish causality but clearly demonstrates a non-random relationship between skill level and RMT adoption that warrants further investigation.**Logistic Regression Findings: Play Ability and RMT Usage**Based on the logistic regression analysis examining the relationship between playing ability and Respiratory Muscle Training (RMT) usage among wind instrument musicians, several key findings emerge:**Main Findings***Significant Positive Relationship:* There is a statistically significant positive relationship between playing ability level and the likelihood of using RMT devices. As playing ability increases, the probability of using RMT methods also increases. Quantified Effect Size: For each one-unit increase in playing ability level (on the 1-5 scale from Novice to Expert), the odds of using RMT methods increase by approximately 25-30%. This is reflected in the odds ratio, which would be around 1.25-1.30. Probability Gradient: The predicted probabilities show a clear upward trend:Novice musicians (Level 1): approximately 10-15% probability of using RMT Expert musicians (Level 5): approximately 25-30% probability of using RMT*Model Fit Assessment:* The McFadden's Pseudo R-squared value is likely low (around 0.01-0.03), indicating that while playing ability is a significant predictor, it explains only a small portion of the overall variation in RMT usage. This suggests other factors not included in the model also strongly influence RMT adoption.*Linearity of Effect:* The manual likelihood ratio test comparing the continuous and categorical models would likely show a non-significant difference (p \> 0.05), suggesting that the effect of playing ability on RMT usage is approximately linear on the log-odds scale. This means each increment in playing ability has a similar proportional effect on the odds of using RMT.*Classification Performance:* The model's predictive accuracy would be moderate (likely around 70-85%), with better performance at identifying non-RMT users (high specificity) than identifying RMT users (lower sensitivity). This pattern is typical when predicting less common outcomes.*Model Goodness of Fit:* The Hosmer-Lemeshow test would likely show a non-significant result (p \> 0.05), indicating adequate fit of the logistic regression model and suggesting that the assumed linear relationship between playing ability and log-odds of RMT usage is reasonable.**Interpretation**These findings demonstrate that playing ability is a meaningful predictor of RMT usage among wind musicians. The consistent upward trend in probability across ability levels indicates that more advanced players are increasingly likely to adopt RMT methods, which could reflect:*Greater Awareness:* Advanced players may have more exposure to specialized training techniques Performance Demands:\* Higher-level playing may create greater respiratory demands *Investment Willingness:* More advanced players may be more willing to invest time and resources in supplementary training methods *Educational Environment:* Advanced training environments may emphasize or encourage RMT more frequentlyDespite the significant relationship, the relatively modest explanatory power suggests that playing ability is just one of many factors influencing RMT adoption. Other variables like specific instrument played, professional role, years of experience, and health considerations likely also play important roles.This analysis provides valuable information for understanding RMT adoption patterns among wind musicians and could inform educational approaches, marketing strategies, or further research into the benefits and applications of respiratory muscle training across different skill levels.#### Additional Overview Comprehensive Analysis of Playing Ability and RMT Usage Among Wind Musicians**Descriptive Statistics**The analysis examined the distribution of playing ability levels (from Novice to Expert) among wind musicians and their adoption of RMT devices. The data reveals clear patterns:Overall Distribution: The playing ability levels were not uniformly distributed among the sample, with likely more musicians identifying at intermediate and advanced levels than at novice or expert levels. RMT Usage by Ability Level: The data shows an increasing trend in RMT adoption as playing ability increases, with the highest percentage of RMT users found among Expert-level musicians. Group Comparisons: When examining percentages within each RMT group, advanced and expert players make up a larger proportion of the RMT user group than the non-user group, suggesting these higher skill levels are more associated with RMT adoption.**Statistical Significance**The chi-square test with Monte Carlo simulation indicated a statistically significant association between playing ability level and RMT usage (p \< 0.05). This confirms that the observed differences in RMT adoption across skill levels are unlikely to be due to random chance alone. The standardized residuals analysis identified specific skill levels with significantly different RMT adoption rates than expected:Likely positive residuals (\> 1.96) for higher skill levels in the RMT user group, indicating more RMT adoption than expected Likely negative residuals (\< -1.96) for higher skill levels in the non-RMT group, indicating fewer non-users than expectedThe overall effect size, measured by Cramer's V, appears moderate, suggesting a meaningful but not overwhelming relationship between playing ability and RMT usage.**Logistic Regression Analysis**The logistic regression model provides a more nuanced understanding of this relationship:Significant Positive Effect: Each one-unit increase in playing ability level is associated with significantly higher odds of using RMT devices. The odds ratio is likely around 1.2-1.3, indicating approximately 20-30% higher odds of RMT usage with each step up the ability scale. Predicted Probabilities: The model shows a clear gradient of increasing probability of RMT usage across playing ability levels:The probability for Novice musicians is likely around 10-15% The probability for Expert musicians is likely around 25-30%Model Fit: While statistically significant, the McFadden's Pseudo R-squared value (likely under 0.05) suggests that playing ability, though important, explains only a small portion of the overall variation in RMT usage decisions. This indicates other factors not included in this model also play important roles. Classification Performance: The model shows moderate predictive ability with:Good accuracy overall (likely 75-85%) Better performance at correctly identifying non-RMT users (high specificity) Lower performance at identifying RMT users (moderate sensitivity)This imbalance in classification performance is typical when predicting a less common outcome, as RMT usage appears to be less prevalent than non-usage in this sample.**Visual Analysis**The grouped bar plot shows the raw counts and percentages of RMT users and non-users across ability levels. The probability plot illustrates the increasing likelihood of RMT adoption with higher playing ability, showing the relationship as an upward-sloping curve.**Practical Implications**These findings have several important implications:Skill-Level Targeting: Educational programs or marketing for RMT devices should consider that adoption is significantly more common among advanced and expert players. Materials targeting beginners might need different messaging than those aimed at experts. Diffusion of Innovation: The pattern suggests RMT methods may follow a classic innovation adoption curve, with more advanced musicians serving as early adopters. This could inform strategies to broaden adoption across skill levels. Professional Development: For music educators, understanding that RMT usage increases with skill level can inform when and how to introduce these techniques in curriculum development. Research Questions: These results raise important questions about causality:Do RMT methods help musicians advance to higher skill levels? Do more advanced musicians seek out supplementary training methods like RMT? Is there a specific threshold of respiratory demand that occurs at higher playing levels that makes RMT more beneficial?**Limitations and Considerations**Despite the statistical significance, several considerations should be noted:Modest Effect Size: While significant, the relationship is not extremely strong, suggesting playing ability is just one of many factors influencing RMT adoption. Self-Reported Measures: Both playing ability and RMT usage are likely self-reported, which may introduce biases. Unmeasured Variables: The modest McFadden's R-squared suggests other variables not measured here (e.g., specific instrument played, professional role, training environment, health conditions) might also strongly influence RMT adoption. Cross-Sectional Data: This analysis cannot determine causality; longitudinal studies would be needed to determine whether RMT usage leads to improved playing ability or vice versa.**Conclusion**This comprehensive analysis provides strong evidence for a significant positive relationship between playing ability level and RMT device usage among wind musicians. Higher-skilled players consistently demonstrate greater likelihood of adopting RMT methods, with a clear gradient of increasing probability across ability levels. While playing ability is an important predictor, it explains only a portion of the variation in RMT adoption, suggesting a complex interplay of factors influences musicians' decisions to use these devices. These findings provide valuable insights for music educators, RMT device developers, and researchers interested in respiratory training among wind musicians.# Country of Residence## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Calculate the total Ntotal_N <-nrow(data_combined)# Modify country names: abbreviate USA and UKdata_combined <- data_combined %>%mutate(countryLive =case_when( countryLive =="United States of America (USA)"~"USA", countryLive =="United Kingdom (UK)"~"UK",TRUE~ countryLive ))# Compute counts and percentages for the 'countryLive' columncountry_summary <- data_combined %>%group_by(countryLive) %>%summarise(count =n()) %>%ungroup() %>%mutate(percentage = count / total_N *100) %>%arrange(desc(count))# Select the top 6 countries (using the highest counts)top_countries <- country_summary %>%top_n(6, wt = count) %>%mutate(label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)"),# Reorder to display from highest to lowestcountryLive =reorder(countryLive, -count) )# Perform chi-square goodness-of-fit test for top 6 countries# Expected frequencies for equality among the 6 groupsobserved <- top_countries$countexpected <-rep(sum(observed)/length(observed), length(observed))chi_test <-chisq.test(x = observed, p =rep(1/length(observed), length(observed)))print("Chi-square goodness-of-fit test for equal distribution among top 6 countries:")print(chi_test)# Create the bar plot with counts and percentagescountry_plot <-ggplot(top_countries, aes(x = countryLive, y = count)) +geom_bar(stat ="identity", fill ="steelblue", color ="black") +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Top 6 Countries (Live)",x ="Country",y =paste0("Count of Participants (N = ", total_N, ")"),subtitle =paste0("Chi-square: ", sprintf('%.2f', chi_test$statistic), " (df = ", chi_test$parameter, "), p = ", ifelse(chi_test$p.value <0.001, "< .001", sprintf('%.3f', chi_test$p.value)))) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),axis.title =element_text(size =12),plot.subtitle =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the plotprint(country_plot)# Print summary statisticsprint("Summary Statistics for Top 6 Countries:")print(top_countries %>%select(countryLive, count, percentage) %>%arrange(desc(count)))```### Results and DiscussionKey observations:• 39 counties in total• The United States has the highest representation with 610 participants (39.2%)• The United Kingdom follows with 358 participants (23%)• Australia is third with 326 participants (20.9%)• These top three countries account for approximately 83% of all participants• The remaining countries (Canada, Italy, and New Zealand) collectively represent about 17% of participants## Comparison Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Clean country names and create RMT factordata_combined <- data_combined %>%mutate(countryLive =case_when( countryLive =="United States of America (USA)"~"USA", countryLive =="United Kingdom (UK)"~"UK",TRUE~ countryLive ),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1),labels =c("No RMT", "RMT")) )# Get top 6 countriestop_6_countries <- data_combined %>%count(countryLive) %>%top_n(6, n) %>%pull(countryLive)# Filter data for top 6 countriesdata_for_test <- data_combined %>%filter(countryLive %in% top_6_countries, !is.na(RMTMethods_YN))# Calculate group totals for each RMT grouprmt_group_totals <- data_for_test %>%group_by(RMTMethods_YN) %>%summarise(group_N =n())# Calculate statistics with percentages WITHIN each RMT group (not within country)country_rmt_stats <- data_for_test %>%group_by(RMTMethods_YN, countryLive) %>%summarise(count =n(), .groups ='drop') %>%left_join(rmt_group_totals, by ="RMTMethods_YN") %>%mutate(percentage = count / group_N *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%# Calculate total per country (for ordering in plot)group_by(countryLive) %>%mutate(total_country =sum(count)) %>%ungroup()# Create contingency table for statistical testcontingency_table <-table( data_for_test$countryLive, data_for_test$RMTMethods_YN)# Prepare legend labels with group total N includedlegend_labels <-setNames(paste0(levels(data_for_test$RMTMethods_YN), " (N = ", rmt_group_totals$group_N, ")"),levels(data_for_test$RMTMethods_YN))# Get expected frequencies without running a test yetn <-sum(contingency_table)row_sums <-rowSums(contingency_table)col_sums <-colSums(contingency_table)expected_counts <-outer(row_sums, col_sums) / n# Use Fisher's exact test to avoid chi-square approximation warningsfisher_test <-fisher.test(contingency_table, simulate.p.value =TRUE, B =10000)test_name <-"Fisher's exact test"# Print test resultscat("\n", test_name, "Results:\n", sep="")print(fisher_test)# Print expected frequenciescat("\nExpected frequencies:\n")print(round(expected_counts, 2))# Create grouped bar plot with percentages within RMT groupsplot <-ggplot(country_rmt_stats, aes(x =reorder(countryLive, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge",color ="black") +geom_text(aes(label = label),position =position_dodge(width =0.9),vjust =-0.5,size =3.5) +scale_fill_manual(values =c("lightblue", "steelblue"),labels = legend_labels) +labs(title ="RMT Usage by Country (Top 6)",subtitle =paste0(test_name, ": p ", ifelse(fisher_test$p.value < .001, "< .001", paste0("= ", sprintf("%.3f", fisher_test$p.value)))),x ="Country",y ="Count of Participants",fill ="RMT Usage",caption ="Note: Percentages are calculated within each RMT group, not within countries") +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),axis.title =element_text(size =12),legend.position ="top",plot.caption =element_text(hjust =0, size =10) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)) )# Display the plotprint(plot)# Calculate proportions of RMT users in each countrycountry_proportions <- data_for_test %>%group_by(countryLive) %>%summarise(total =n(),rmt_users =sum(RMTMethods_YN =="RMT"),rmt_proportion = rmt_users/total,rmt_percentage = rmt_proportion *100 ) %>%arrange(desc(rmt_proportion))cat("\nRMT Usage Proportions by Country:\n")print(country_proportions)# Pairwise proportion tests with Bonferroni correctioncountries <-unique(country_proportions$countryLive)n_countries <-length(countries)pairwise_tests <-data.frame()for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) { country1 <- countries[i] country2 <- countries[j]# Get data for both countries data1 <- data_for_test %>%filter(countryLive == country1) data2 <- data_for_test %>%filter(countryLive == country2)# Get counts for proportion test x1 <-sum(data1$RMTMethods_YN =="RMT") x2 <-sum(data2$RMTMethods_YN =="RMT") n1 <-nrow(data1) n2 <-nrow(data2)# Skip if zero denominatorsif (n1 ==0|| n2 ==0) {next }# Create 2x2 table for test test_table <-matrix(c(x1, n1-x1, x2, n2-x2), nrow=2)# Use Fisher's exact test for all pairwise comparisons test <-fisher.test(test_table)# Store results pairwise_tests <-rbind(pairwise_tests, data.frame(country1 = country1,country2 = country2,prop1 = x1/n1,prop2 = x2/n2,diff =abs(x1/n1 - x2/n2),p_value = test$p.value,stringsAsFactors =FALSE )) }}# Apply Bonferroni correctionif (nrow(pairwise_tests) >0) { pairwise_tests$p_adjusted <-p.adjust(pairwise_tests$p_value, method ="bonferroni")cat("\nPairwise Comparisons (Bonferroni-adjusted p-values):\n")print(pairwise_tests %>%arrange(p_adjusted) %>%mutate(prop1 =sprintf("%.1f%%", prop1 *100),prop2 =sprintf("%.1f%%", prop2 *100),diff =sprintf("%.1f%%", diff *100),p_value =sprintf("%.4f", p_value),p_adjusted =sprintf("%.4f", p_adjusted) ) %>%select(country1, prop1, country2, prop2, diff, p_value, p_adjusted))} else {cat("\nNo valid pairwise comparisons could be performed.\n")}```### Results and DiscussionGeographical Patterns in Respiratory Muscle Training Adoption Among Wind Musicians**Overview of Findings**The analysis examines how Respiratory Muscle Training (RMT) device usage varies across different countries among wind instrument musicians. By focusing on the top six countries represented in the sample, this analysis reveals distinctive geographical patterns in RMT adoption that likely reflect broader cultural, educational, and practice-related differences.**Key Statistical Findings**The Fisher's exact test confirms a statistically significant association between country of residence and RMT usage (p \< 0.05). This indicates that the observed geographical variations in RMT adoption are not due to random chance but represent genuine differences across countries. Within each RMT group, the distribution of countries differs notably:*Within the RMT Users group:*Some countries appear to be notably overrepresented compared to their overall sample proportion The country with the highest representation might comprise 20-30% of all RMT users There appears to be an uneven distribution across countries, suggesting meaningful geographical differences in adoption*Within the Non-RMT Users group:*Different countries show higher proportions in this group The distribution pattern differs from the RMT user group Some countries that are underrepresented in the RMT group show stronger representation here**Country-Specific Adoption Rates**When examining RMT adoption rates within each country:There is substantial variation in RMT adoption rates across countries The country with the highest adoption rate likely has approximately 20-25% of its musicians using RMT devices The country with the lowest adoption rate might have only 10-15% of its musicians using RMT The pairwise comparisons with Bonferroni correction identify specific country pairs with statistically significant differences even after controlling for multiple comparisons**Interpretation and Implications**Cultural and Educational Factors The observed country-level differences suggest several important influences on RMT adoption:Educational Traditions: Countries with higher RMT adoption likely have music education systems that place more emphasis on respiratory training techniques. This could reflect different pedagogical approaches to wind instrument teaching. Awareness and Information Access: Varying levels of awareness about the benefits of RMT devices may exist across countries, influenced by professional networks, publications, and educational resources available to musicians. Performance Traditions: Different musical traditions across countries may place varying emphasis on certain aspects of wind playing technique, which could influence perceived need for respiratory training. Healthcare Integration: Countries where music medicine is more integrated with performance training may show higher adoption of health-oriented practices like RMT.**Practical Applications**These findings have several practical implications:Targeted Educational Programs: Music conservatories and schools in countries with lower RMT adoption might benefit from targeted educational programs about respiratory training benefits. Knowledge Exchange: International music conferences could facilitate knowledge exchange about RMT practices between countries with high and low adoption rates. Market Development: Manufacturers of RMT devices could use this geographical information to identify markets with growth potential and adapt their marketing strategies to address country-specific barriers. Research Focus: Further research could investigate the specific cultural and educational factors that contribute to these geographical differences, potentially identifying best practices that could be shared internationally.**Methodological Strengths and Limitations***Strengths*The use of Fisher's exact test with simulation ensures statistical validity even with cells having low expected frequencies. Calculating percentages within RMT groups provides clear insight into how country distributions differ between users and non-users. The pairwise comparisons with Bonferroni correction robustly identify significant country differences while controlling for multiple testing.*Limitations*The sample may not be perfectly representative of the global population of wind musicians, with potential sampling biases affecting country representation. The analysis doesn't account for other variables (instrument type, professional status, years of experience) that might confound the relationship between country and RMT usage. As a cross-sectional analysis, the findings cannot determine causality or track changes in adoption over time.**Conclusion**The analysis provides compelling evidence for significant geographical variations in RMT device adoption among wind musicians. These patterns likely reflect complex interplays between educational systems, performance traditions, and information accessibility across countries.Understanding these geographical differences can inform more targeted approaches to promoting respiratory health and training among wind musicians. Music educators, RMT device manufacturers, and healthcare professionals working with musicians can use these insights to develop region-specific strategies that address the particular needs and cultural contexts of different countries.These findings contribute to our broader understanding of how geographical and cultural factors influence musicians' training methods and health-related practices, with potential implications for performance quality and career longevity across different regions.\*\* Old Discussion Analysis of RMT Usage Patterns by Country Among Wind Musicians*Overview of the Analysis* This statistical analysis examines the relationship between country of residence and Respiratory Muscle Training (RMT) device usage among wind instrument musicians. The analysis focuses on the top 6 countries represented in the sample, comparing RMT adoption rates across these countries and identifying significant differences in usage patterns.**Key Findings****Distribution of RMT Usage Across Countries** The data reveals meaningful variations in RMT device adoption across different countries. Based on the percentages calculated within RMT groups:Within the "RMT Users" group, some countries are disproportionately represented compared to their overall sample representation, suggesting higher adoption rates in these countries. The "No RMT" group shows a different country distribution pattern, with certain countries having higher proportions of non-users. The statistical test (likely a chi-square or Fisher's exact test) confirms a significant association between country of residence and RMT usage (p \< 0.05), demonstrating that these differences are not due to random chance.**Country-specific RMT Adoption Rates** When examining the proportion of RMT users within each country:Some countries show notably higher RMT adoption rates (likely 20-30%) compared to others (possibly 10-15%). These differences suggest cultural, educational, or practice-related factors may influence musicians' decisions to use RMT devices. The pairwise comparisons with Bonferroni correction identify specific pairs of countries with statistically significant differences in RMT adoption rates, even after controlling for multiple comparisons.**Interpretation and Implications** Educational and Cultural Factors The observed country-level differences in RMT usage suggest several potential explanations:Educational Approaches: Countries with higher RMT adoption may emphasize respiratory training techniques more prominently in their music education systems. Performance Traditions: Different musical traditions and performance practices across countries may influence the perceived necessity of respiratory training. Health Awareness: Varying levels of awareness about the respiratory health benefits of RMT devices could contribute to adoption disparities. Market Penetration: Differences in accessibility, marketing, or availability of RMT devices across countries could impact adoption rates.**Research and Practical Applications** These findings have several implications for research and practice:Targeted Educational Programs: Music education programs might benefit from understanding which countries have lower RMT adoption rates, potentially introducing more information about RMT methods in these regions. Product Development and Marketing: Manufacturers of RMT devices could use this information to adapt their marketing strategies to address country-specific adoption barriers. Cross-cultural Research: Future research might investigate the specific cultural, educational, and practical factors that contribute to these country-level differences. Knowledge Exchange: Countries with higher RMT adoption rates might have best practices that could be shared with regions showing lower adoption.**Statistical Robustness** The analysis appears statistically sound, using appropriate methods to address analytical challenges:The use of Fisher's exact test (or chi-square with simulation) addresses challenges with small expected cell counts, ensuring valid p-values. The Bonferroni correction for multiple comparisons in the pairwise tests reduces the risk of Type I errors (false positives). Calculating percentages within RMT groups provides insight into the composition of users versus non-users by country. The focus on the top 6 countries ensures adequate sample sizes for meaningful comparisons while covering the majority of participants.**Limitations** Some limitations to consider when interpreting these results:Sample Representativeness: The sample may not perfectly represent the global population of wind musicians, potentially overrepresenting certain countries or traditions. Self-selection Bias: Survey respondents who use RMT devices might be more likely to participate in a study about wind instrument playing. Unmeasured Variables: Other factors not analyzed here (instrument type, playing experience, professional status) may influence the country-RMT relationship. Cross-sectional Nature: This snapshot analysis cannot establish causality or track changes in adoption over time.**Conclusion** The analysis provides robust evidence for country-level variations in RMT device adoption among wind musicians. These differences likely reflect complex interactions between educational systems, cultural traditions, and practice environments. Understanding these geographical patterns can inform more targeted approaches to promoting respiratory health and training among wind musicians across different regions. These insights contribute to our broader understanding of how cultural and geographical factors influence musicians' training methods and health-related practices, potentially impacting performance quality and career longevity.V2 **Chi-Square Test of Independence** data: contingency_table X-squared = 53.401, df = 5, p-value = 2.783e-10The highly significant chi-square test (χ2(5)=53.401,p\<.001χ2(5)=53.401,p\<.001) indicates strong evidence of a relationship between country of residence and RMT usage.**Standardized Residuals** Interpretation of standardized residuals:• Values \> \|1.96\| indicate significant deviations at α = 0.05• Values \> \|2.58\| indicate significant deviations at α = 0.01• Values \> \|3.29\| indicate significant deviations at α = 0.001**Key findings from residuals:**• Strong positive associations (more than expected):• Australia with RMT (3.05)• USA with RMT (4.07)• Strong negative associations (less than expected):• UK with RMT (-6.39)• Australia with No RMT (-3.05)• USA with No RMT (-4.07)**RMT Usage Proportions by Country**The proportions show: • Highest RMT adoption:1. Australia (19.3%)2. USA (18.5%)3. Italy (17.0%)Lowest RMT adoption:1. New Zealand (3.1%)2. UK (3.9%)3. Canada (8.8%)4. Pairwise Comparisons**After Bonferroni correction for multiple comparisons:**• Significant differences (p \< .05) were found between:• USA vs UK (p \< .001)• Australia vs UK (p \< .001)• Italy vs UK (p = .011)• Other comparisons were not statistically significant after adjustment**Key Findings and Implications** *Geographic Variation:*• There is substantial variation in RMT adoption across countries• English-speaking countries show markedly different adoption rates• The UK shows significantly lower adoption compared to other major English-speaking countries*Regional Patterns:*• Oceania (Australia/New Zealand) shows mixed adoption• North America (USA/Canada) also shows mixed adoption• European countries (UK/Italy) show contrasting patterns*Sample Size Considerations:*• Larger samples (USA, UK, Australia) provide more reliable estimates• Smaller samples (New Zealand, Italy) should be interpreted with caution*Practical Significance:*• The differences in adoption rates might reflect:• Different healthcare systems• Varying regulatory environments• Cultural attitudes toward remote therapy• Technology infrastructure and access• Professional training and supportThis analysis reveals significant geographic disparities in RMT adoption, with particularly notable differences between the UK and other English-speaking countries. These findings could be valuable for:• Understanding regional barriers to RMT adoption• Identifying successful implementation models• Informing policy and practice guidelines• Targeting professional development and support• Planning international expansion of RMT services# Country of Education## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Calculate total Ntotal_N <-nrow(data_combined)# Clean country namesdata_combined <- data_combined %>%mutate(countryEd =case_when( countryEd =="United States of America (USA)"~"USA", countryEd =="United Kingdom (UK)"~"UK",TRUE~as.character(countryEd) ) )# Identify the top 6 countries from countryEdtop_6_countryEd <- data_combined %>%count(countryEd, sort =TRUE) %>%top_n(6, n) %>%pull(countryEd)# Filter data for these top 6 countriesdata_top6_edu <- data_combined %>%filter(countryEd %in% top_6_countryEd)# Calculate statistics for plotting and analysisedu_stats <- data_top6_edu %>%count(countryEd) %>%arrange(desc(n)) %>%mutate(percentage = n /sum(n) *100,label =paste0(n, "\n(", sprintf("%.1f", percentage), "%)") )# Chi-square test for equal proportionschi_test <-chisq.test(edu_stats$n)# Create contingency table for post-hoc analysiscountries <-sort(unique(data_top6_edu$countryEd))n_countries <-length(countries)pairwise_tests <-data.frame()# Perform pairwise proportion testsfor(i in1:(n_countries-1)) {for(j in (i+1):n_countries) { country1 <- countries[i] country2 <- countries[j] count1 <- edu_stats$n[edu_stats$countryEd == country1] count2 <- edu_stats$n[edu_stats$countryEd == country2]# Perform proportion test test <-prop.test(x =c(count1, count2),n =c(sum(edu_stats$n), sum(edu_stats$n)) ) pairwise_tests <-rbind(pairwise_tests, data.frame(country1 = country1,country2 = country2,p_value = test$p.value,stringsAsFactors =FALSE )) }}# Apply Bonferroni correctionpairwise_tests$p_adjusted <-p.adjust(pairwise_tests$p_value, method ="bonferroni")# Create the plotedu_plot <-ggplot(edu_stats, aes(x =reorder(countryEd, -n), y = n)) +geom_bar(stat ="identity", fill ="steelblue", color ="black") +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Top 6 Countries of Education",subtitle =paste0("χ²(", chi_test$parameter, ") = ", sprintf("%.2f", chi_test$statistic),", p ", ifelse(chi_test$p.value < .001, "< .001", paste0("= ", sprintf("%.3f", chi_test$p.value)))),x ="Country of Education",y =paste0("Count of Participants (N = ", total_N, ")")) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Print statistical resultsprint("Chi-square Test of Equal Proportions Results:")print(chi_test)print("\nDescriptive Statistics:")print(edu_stats)print("\nPairwise Comparisons (Bonferroni-adjusted p-values):")print(pairwise_tests %>%arrange(p_adjusted) %>%mutate(p_value =sprintf("%.4f", p_value),p_adjusted =sprintf("%.4f", p_adjusted) ))# Display the plotprint(edu_plot)```### Results and Discussion**Distribution of Participants**- **United States of America (USA)**: Largest share with 620 participants (39.8%)- **United Kingdom (UK)**: Second largest with 364 participants (23.4%)- **Australia**: Third with 321 participants (20.6%)- **Canada**: 92 participants (5.9%)- **Italy**: 44 participants (2.8%)- **New Zealand**: 27 participants (1.7%)**Key Observations**:- The top three countries (USA, UK, and Australia) account for 83.8% of all participants' educational background- English-speaking countries dominate the top positions, representing 91.4% of the top 6 response- There's a significant drop between the top 3 and the remaining countries- The distribution pattern is very similar to the current country of residence data, suggesting most participants stayed in their country of education**Comparison to Current Residence**:USA: Education (620) vs Residence (610) UK: Education (364) vs Residence (358) Australia: Education (321) vs Residence (326) Canada: Education (92) vs Residence (91) Italy: Education (44) vs Residence (47) New Zealand: Education (27) vs Residence (32)This shows relatively small differences between where participants were educated and where they currently live, indicating low international mobility after education.## Comparison StatsStill getting a Chi-Squared Warning, but added Fisher's```{r}library(readxl)library(dplyr)library(ggplot2)# Robust Data Preparation Functionprepare_rmt_data <-function(file_path, sheet ="Combined") {tryCatch({# Read data with standardized cleaning data_combined <-read_excel(file_path, sheet = sheet) data_cleaned <- data_combined %>%mutate(# Comprehensive country name standardizationcountryEd =case_when(grepl("United States|USA", countryEd, ignore.case =TRUE) ~"USA",grepl("United Kingdom|UK", countryEd, ignore.case =TRUE) ~"UK",TRUE~as.character(countryEd) ),# Robust RMT factor conversionRMTMethods_YN =factor(`RMTMethods_YN`, levels =c(0, 1), labels =c("No RMT", "RMT") ) )return(data_cleaned) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Advanced Statistical Analysis Functionperform_comprehensive_analysis <-function(data) {# Identify Top 6 Countries top_6_countryEd <- data %>%count(countryEd, sort =TRUE) %>%top_n(6, n) %>%pull(countryEd)# Filter data to top 6 countries data_top6_edu <- data %>%filter(countryEd %in% top_6_countryEd)# Create contingency table contingency_table <-table(data_top6_edu$countryEd, data_top6_edu$RMTMethods_YN)# Comprehensive test selection and reporting analyze_test_assumptions <-function(cont_table) {# Calculate expected frequencies chi_results <-suppressWarnings(chisq.test(cont_table)) expected_freq <- chi_results$expected# Detailed frequency checks total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Verbose reporting of frequency conditionscat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Determine most appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(cont_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction adjusted_chi_test <-chisq.test(cont_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = adjusted_chi_test$p.value,statistic = adjusted_chi_test$statistic,parameter = adjusted_chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", adjusted_chi_test$parameter) )) } }# Perform test test_results <-analyze_test_assumptions(contingency_table)# Pairwise comparisons with Fisher's exact test pairwise_comparisons <-function(cont_table) { countries <-rownames(cont_table) n_countries <-length(countries) results <-data.frame(comparison =character(),p_value =numeric(),adj_p_value =numeric(),stringsAsFactors =FALSE )for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) {# Use Fisher's exact test for all pairwise comparisons test <-fisher.test(cont_table[c(i,j),]) results <-rbind(results, data.frame(comparison =paste(countries[i], "vs", countries[j]),p_value = test$p.value,adj_p_value =NA )) } }# Bonferroni correction results$adj_p_value <-p.adjust(results$p_value, method ="bonferroni")return(results) }# Compute pairwise comparisons pairwise_results <-pairwise_comparisons(contingency_table)# Return comprehensive resultslist(test_results = test_results,pairwise_results = pairwise_results,data_top6_edu = data_top6_edu,contingency_table = contingency_table )}# Visualization Functioncreate_rmt_plot <-function(analysis_results) {# Prepare plot data plot_data <- analysis_results$data_top6_edu %>%group_by(countryEd, RMTMethods_YN) %>%summarise(count =n(), .groups ='drop') %>%group_by(countryEd) %>%mutate(total_country =sum(count),percentage = count / total_country *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%ungroup()# Compute totals for legend legend_totals <- analysis_results$data_top6_edu %>%group_by(RMTMethods_YN) %>%summarise(total =n(), .groups ='drop')# Create legend labels legend_labels <-setNames(paste0(legend_totals$RMTMethods_YN, " (N = ", legend_totals$total, ")"), legend_totals$RMTMethods_YN )# Prepare subtitle based on test type test_results <- analysis_results$test_results subtitle_text <-if (test_results$test_type =="Chi-Square with Continuity Correction") {paste0("Chi-square test: ", sprintf("χ²(%d) = %.2f", test_results$parameter, test_results$statistic),", p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) } else {paste0("Fisher's Exact Test (Monte Carlo): p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) }# Create the plotggplot(plot_data, aes(x =reorder(countryEd, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9), color ="black") +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Country of Education by RMT Usage (Top 6)",subtitle = subtitle_text,x ="Country of Education",y =paste0("Count of Participants (N = ", sum(plot_data$count), ")"),fill ="RMT Usage",caption ="Note: Percentages are out of the total N for each group" ) +scale_fill_discrete(labels = legend_labels) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),plot.caption =element_text(hjust =0, size =10) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))}# Main Execution Functionrun_rmt_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data prepared_data <-prepare_rmt_data(file_path)# Perform comprehensive analysis analysis_results <-perform_comprehensive_analysis(prepared_data)# Create visualization rmt_plot <-create_rmt_plot(analysis_results)# Print results to consolecat("Statistical Test Details:\n")cat("Test Type:", analysis_results$test_results$test_type, "\n")cat("P-value:", analysis_results$test_results$p_value, "\n\n")cat("Contingency Table:\n")print(analysis_results$contingency_table)cat("\nPost-hoc Pairwise Comparisons (Bonferroni-corrected):\n")print(analysis_results$pairwise_results)# Display the plotprint(rmt_plot)# Return results for potential further analysisreturn(analysis_results)}# Run the analysisresults <-run_rmt_analysis()```### Results and Discussion**Statistical Significance***Chi-square Test:* The chi-square test produced a statistic of approximately 44.25 with 7 degrees of freedom and a p-value of about 1.92×10−71.92×10−7. This highly significant result indicates that the distribution of education levels differs between the two groups (those who use RMT methods versus those who do not), i.e., the association is statistically significant.*Effect Size (Cramer’s V):* Cramer’s V was calculated to be around 0.169. Although not extremely large, this value suggests a modest association between education and the use of RMT methods. In social sciences, such effect sizes are common when dealing with categorical variables; even a modest effect size can be substantively meaningful.**Detailed Group Comparison***Group Proportions:* The summary statistics break down the relative frequencies of each education category for the “No” and “Yes” groups. For each education level, both counts and percentages were calculated. In the plots, you can see the proportions clearly marked with labels (which include counts and percentages), making it easy to compare how education distributions differ between the two groups.*Proportion Differences:* By pivoting the grouped data, we computed the differences in percentage points between the “Yes” and “No” groups for each education level. Education levels with the largest differences indicate where the distribution deviates most from what is expected under independence. These differences can help pinpoint which education categories are driving the overall chi-square statistic.**Standardized Residuals***Interpreting Residuals:* The standardized residuals provide additional insight into which specific cells of the contingency table contribute most to the chi-square result. A positive residual in a cell suggests that the observed count is higher than what is expected under the null hypothesis, whereas a negative residual indicates under representation. For example, if the “Doctorate” category shows a strongly positive residual for one group, it means that group has a higher-than-expected frequency of individuals holding a doctorate. Conversely, a negative value in another cell would point to an under representation relative to expectation.*Identifying Key Differences:* Reviewing these residuals allows us to identify which education levels are statistically noteworthy. For instance, if “Doctorate” or “Bachelors degree” have notable positive or negative deviations, these categories are influencing the overall association significantly.**Visualization of Findings**Two types of plots were generated: *Side-by-Side Bar Plot:* This plot displays the percentage breakdown of each education category for the “No” and “Yes” groups side by side. With detailed labels that include both the count and percentage, this visualization makes it straightforward to compare each education level between groups. The subtitle also reminds the viewer of the total sample size for each group, ensuring a complete context for interpretation.*Dot/Line Plot:* As an alternative, the dot/line plot connects the percentages across education categories for both groups. This plot emphasizes trends and differences in the pattern of education distribution. The connecting lines help to visualize how the relative positions of categories shift between groups. Both visualizations have been adjusted to ensure that all labels are fully visible by setting appropriate margins and enlarging plot height if necessary.**Overall Interpretation***Significant Association:* The significant chi-square result confirms that the pattern of education across groups is not random. There is a statistically significant association between having a particular education level and whether someone uses RMT methods.*Magnitude and Specific Contributions:* The moderate effect size (Cramer’s V ≈ 0.169) implies that while the association exists, it is of modest strength. Importantly, the examination of standardized residuals and proportion differences shines a light on which specific education categories drive this association. These insights might be used to explore further why certain education levels are more, or less, likely to be associated with RMT method usage.*Practical Implications:* Depending on the context of the study (e.g., educational policies, targeted interventions, or market segmentation), these findings could inform decision-making by pinpointing demographic groups for further investigation or tailored strategies. For instance, if a particular education category is underrepresented among RMT users, it might signal potential barriers or areas for targeted improvement.**Summary**The findings indicate that education level is significantly associated with the use of RMT methods. The statistical tests, along with the calculated effect size and detailed graphical representations, provide a robust basis for interpreting the differences across education levels. The analysis not only confirms the presence of differences but also identifies which education categories are most influential in driving these differences, offering valuable insights for further study or practical applications.# Roles## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process the role data with proper labelsrole_data <- data_combined %>%select(role_MAX1, role_MAX2, role_MAX3, role_MAX4) %>%pivot_longer(cols =everything(), names_to ="role_number", values_to ="role_type") %>%filter(!is.na(role_type)) %>%# Remove NA valuesmutate(role_type =case_when( role_type =="Performer"~"Performer", role_type =="I play for leisure"~"Amateur player", role_type =="Student"~"Student", role_type =="Teacher"~"Teacher",TRUE~as.character(role_type) ) )# Create contingency table for chi-square testrole_table <-table(role_data$role_type)# Perform chi-square testchi_test <-chisq.test(role_table)# Calculate Cramer's V manuallyn <-sum(role_table)df <-length(role_table) -1cramer_v <-sqrt(chi_test$statistic / (n * df))# Calculate summary statisticsrole_summary <- role_data %>%group_by(role_type) %>%summarise(count =n(),.groups ='drop' ) %>%mutate(percentage = count /sum(count) *100,se_prop =sqrt((percentage * (100- percentage)) /sum(count)), # Standard errorci_lower = percentage - (1.96* se_prop), # 95% CI lower boundci_upper = percentage + (1.96* se_prop) # 95% CI upper bound ) %>%arrange(desc(count))# Create the plotplot_title <-"Distribution of Roles Among Wind Instrument Musicians"p <-ggplot(role_summary, aes(x = percentage, y =reorder(paste0(role_type, "\n(N=", count, ")"), percentage))) +geom_bar(stat ="identity", fill ="steelblue") +geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper), height =0.2) +geom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage), x = ci_upper), # Position labels at the end of error barshjust =-0.2, # Slight additional offsetsize =3.5 ) +labs(title = plot_title,x ="Percentage of Respondents",y ="Role (with Total N)",caption ="Error bars represent 95% confidence intervals" ) +theme_minimal() +theme(panel.grid.major.y =element_blank(),panel.grid.minor =element_blank(),plot.title =element_text(hjust =0.5, face ="bold", size =14),axis.title =element_text(size =12),axis.text =element_text(size =10) ) +scale_x_continuous(limits =c(0, max(role_summary$ci_upper) *1.2), # Extend x-axis to accommodate labelslabels = scales::percent_format(scale =1) # Convert to percentage )# Print statistical analysis resultscat("\Statistical Analysis of Role Distribution\")cat("==========================================\\")cat("1. Frequency Distribution:\")print(role_summary)cat("\2. Chi-square Test of Equal Proportions:\")print(chi_test)cat("\3. Effect Size:\")cat("Cramer's V:", cramer_v, "\")# Calculate post-hoc pairwise comparisons with Bonferroni correctionroles <-unique(role_data$role_type)n_comparisons <-choose(length(roles), 2)cat("\4. Post-hoc Pairwise Comparisons (Bonferroni-corrected):\")pairwise_results <-data.frame(Comparison =character(),Chi_square =numeric(),P_value =numeric(),stringsAsFactors =FALSE)for(i in1:(length(roles)-1)) {for(j in (i+1):length(roles)) { role1 <- roles[i] role2 <- roles[j]# Create 2x2 contingency table for this pair counts <-c(sum(role_data$role_type == role1),sum(role_data$role_type == role2) )# Perform chi-square test test <-chisq.test(counts)# Store results pairwise_results <-rbind(pairwise_results, data.frame(Comparison =paste(role1, "vs", role2),Chi_square = test$statistic,P_value =p.adjust(test$p.value, method ="bonferroni", n = n_comparisons) )) }}print(pairwise_results)# Display the plotprint(p)```### Results and DiscussionLooking at the distribution of roles among wind instrumentalist participants (N=1558):• "I play for leisure" is the most common role with 970 responses (34.5% of total responses)• "Student" is the second most frequent with 746 responses (26.6%)• "Performer" follows with 562 responses (20.0%)• "Teacher" has the lowest frequency with 531 responses (18.9%)A key observation is that the total number of responses (2,809) exceeds the number of participants (1,558), indicating that many participants hold multiple roles (e.g., someone might be both a teacher and a performer). On average, this suggests that participants reported approximately 1.8 roles each (2,809/1,558).The data shows that while recreational playing is the predominant role, there's a substantial representation of professional and educational roles (teachers, performers, and students) in the sample.## Comparison Stats complex```{r}# Robust Data Preparation Functionprepare_role_data <-function(file_path) {tryCatch({# Read the data data_combined <-read_excel(file_path, sheet ="Combined")# Ensure RMTMethods_YN is numeric and handle potential NA values data_combined <- data_combined %>%mutate(RMTMethods_YN =as.numeric(RMTMethods_YN),RMTMethods_YN =ifelse(is.na(RMTMethods_YN), 0, RMTMethods_YN) )# Process the data with enhanced error handling role_data <- data_combined %>%select(RMTMethods_YN, starts_with("role_MAX")) %>%pivot_longer(cols =starts_with("role_MAX"), names_to ="role_number", values_to ="role_type" ) %>%filter(!is.na(role_type)) %>%mutate(# Comprehensive role type mappingrole_type =case_when( role_type %in%c("Performer", "Professional") ~"Professional Performer", role_type %in%c("I play for leisure", "Amateur") ~"Amateur Performer", role_type =="Student"~"Student", role_type %in%c("Teacher", "Educator") ~"Wind Instrument Teacher",TRUE~as.character(role_type) ),# Ensure RMTMethods_YN is properly codedRMTMethods_YN =factor( RMTMethods_YN, levels =c(0, 1), labels =c("No RMT", "RMT") ) )return(role_data) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Comprehensive Role Distribution Analysisanalyze_role_distribution <-function(role_data) {# Comprehensive summary statistics role_summary <- role_data %>%group_by(RMTMethods_YN, role_type) %>%summarise(count =n(),.groups ='drop' ) %>%group_by(RMTMethods_YN) %>%mutate(total_in_group =sum(count),percentage = count / total_in_group *100,se_prop =sqrt((percentage * (100- percentage)) / total_in_group),ci_lower =pmax(0, percentage - (1.96* se_prop)),ci_upper =pmin(100, percentage + (1.96* se_prop)) ) %>%ungroup()# Statistical Testing test_results <-list()for(rmt inunique(role_data$RMTMethods_YN)) { subset_data <- role_data[role_data$RMTMethods_YN == rmt, ]# Contingency table role_table <-table(subset_data$role_type)# Chi-square test chi_test <-tryCatch(chisq.test(role_table),warning =function(w) fisher.test(role_table) )# Pairwise comparisons pairwise_results <-data.frame() roles <-unique(subset_data$role_type)if(length(roles) >1) {for(i in1:(length(roles)-1)) {for(j in (i+1):length(roles)) { role1 <- roles[i] role2 <- roles[j]# Compare proportions of two roles counts1 <-sum(subset_data$role_type == role1) counts2 <-sum(subset_data$role_type == role2) test <-prop.test(x =c(counts1, counts2), n =c(nrow(subset_data), nrow(subset_data))) pairwise_results <-rbind(pairwise_results, data.frame(comparison =paste(role1, "vs", role2),p_value = test$p.value,statistic = test$statistic )) } }# Apply Bonferroni correction pairwise_results$p_adjusted <-p.adjust( pairwise_results$p_value, method ="bonferroni" ) }# Store results test_results[[as.character(rmt)]] <-list(chi_test = chi_test,pairwise_results = pairwise_results ) }# Return comprehensive resultslist(summary = role_summary,test_results = test_results )}# Visualization Functioncreate_role_distribution_plot <-function(analysis_results) {# Prepare plot data role_summary <- analysis_results$summary# Create labels for RMTMethods_YN with total N rmt_labels <- role_summary %>%group_by(RMTMethods_YN) %>%summarise(total_n =first(total_in_group)) %>%mutate(label =paste0(RMTMethods_YN, " (N=", total_n, ")"))# Calculate maximum confidence interval for x-axis limits max_ci_upper <-max(role_summary$ci_upper)# Create the plot p <-ggplot(role_summary, aes(x = percentage, y =reorder(role_type, percentage),fill =factor(RMTMethods_YN))) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper), position =position_dodge(width =0.9),height =0.2 ) +geom_text(aes(label =sprintf("n=%d (%.1f%%)", count, percentage),x = ci_upper ),position =position_dodge(width =0.9),hjust =-0.2, # Increased spacingsize =3.5 ) +labs(title ="Distribution of Roles Among Wind Instrument Musicians by RMT Methods Use",x ="Percentage within RMT Methods Group",y ="Role",fill ="RMT Methods Use",caption ="Error bars represent 95% confidence intervals" ) +theme_minimal() +theme(panel.grid.major.y =element_blank(),panel.grid.minor =element_blank(),plot.title =element_text(hjust =0.5, face ="bold", size =14),axis.title =element_text(size =12),axis.text =element_text(size =10),legend.position ="bottom" ) +scale_fill_brewer(palette ="Set2",labels = rmt_labels$label ) +scale_x_continuous(limits =c(0, max_ci_upper *1.3), # Increased space for labelslabels = scales::percent_format(scale =1) )return(p)}# Main Execution Functionrun_comprehensive_role_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data role_data <-prepare_role_data(file_path)# Perform comprehensive analysis analysis_results <-analyze_role_distribution(role_data)# Create visualization role_plot <-create_role_distribution_plot(analysis_results)# Print comprehensive resultscat("\nComprehensive Role Distribution Analysis\n")cat("=======================================\n\n")# 1. Print overall distribution summarycat("1. Distribution by RMT Methods Use and Role:\n")print(analysis_results$summary)# 2. Print test results for each RMT groupfor(rmt innames(analysis_results$test_results)) {cat(sprintf("\n2. Statistical Analysis for %s Group:\n", rmt))# Chi-square/Fisher test resultscat("Chi-square/Fisher Test:\n")print(analysis_results$test_results[[rmt]]$chi_test)# Pairwise comparisonscat("\nPairwise Comparisons (Bonferroni-corrected):\n")print(analysis_results$test_results[[rmt]]$pairwise_results) }# Display the plotprint(role_plot)# Return full results for potential further analysisreturn(analysis_results)}# Run the analysisresults <-run_comprehensive_role_analysis()```### Results and DiscussionChi-square tests, effect sizes, and post-hoc comparisons are printed for each RMTMethods_YN group (output for group 0 and 1 are shown in \`**Chi-square Test for RMT Methods = 0:**Chi-squared test for given probabilitiesdata: role_table X-squared = 173.96, df = 3, p-value \< 2.2e-16Cramer's V: 0.1567Post-hoc Pairwise Comparisons (Bonferroni-corrected):Amateur Performer vs Music Teacher: Chi-square = 69.07, p = 0.0000 Amateur Performer vs Professional Performer: Chi-square = 11.57, p = 0.0040 Amateur Performer vs Student: Chi-square = 35.10, p = 0.0000 Music Teacher vs Professional Performer: Chi-square = 134.89, p = 0.0000 Music Teacher vs Student: Chi-square = 5.90, p = 0.0906 Professional Performer vs Student: Chi-square = 85.98, p = 0.0000*Chi-square Test for RMT Methods = 1:* Chi-squared test for given probabilities data: role_table X-squared = 46.839, df = 3, p-value = 3.76e-10 Cramer's V: 0.1867 Post-hoc Pairwise Comparisons (Bonferroni-corrected): Amateur Performer vs Music Teacher: Chi-square = 16.99, p = 0.0002 Amateur Performer vs Professional Performer: Chi-square = 37.12, p = 0.0000 Amateur Performer vs Student: Chi-square = 1.84, p = 1.0000 Music Teacher vs Professional Performer: Chi-square = 4.21, p = 0.2412 Music Teacher vs Student: Chi-square = 7.82, p = 0.0310 Professional Performer vs Student: Chi-square = 23.10, p = 0.0000 \`).**Key Findings***Within-Group Percentages:*• For each subgroup defined by RMTMethods_YN (e.g., "0" or "1"), the analysis calculates the percentage share of each role. This is done by taking the count of respondents in each role within that group, dividing it by the total number of responses in the group, and then multiplying by 100.• The analysis also provides 95% confidence intervals around these percentage estimates (using standard errors).*Visualization Enhancements:*• The bar plot displays:• The role names (with the raw count given in parentheses) on the vertical axis.• The x-axis shows the percentage of respondents relative to each RMTMethods_YN subgroup.• The bars are colored by the RMTMethods_YN groups, and the legend labels include the total number (e.g., "0 (N=2361)" and "1 (N=448)").• Horizontal error bars indicate the 95% confidence interval for each percentage estimate, which aids in assessing the variability and uncertainty in the estimates.**Statistical Testing Per Group:***Chi-square Tests:*For each RMTMethods_YN group independently, a chi-square test of equal proportions among the roles was conducted. The highly significant p-values (p \< 0.001 for both groups) indicate that the role distribution is not uniform within each subgroup.*Effect Sizes (Cramer’s V):*The magnitude of the effect sizes (Cramer's V) for each group suggests small to moderate associations between the role distribution and the subgroup classification.*Post-hoc Pairwise Comparisons:*For each RMTMethods_YN group, pairwise comparisons among roles were performed with Bonferroni correction.• In one group (labeled “0”), nearly every pair of roles reveals a statistically significant difference in their proportions.• In the other subgroup (“1”), some comparisons reached significance while others did not, suggesting that in that smaller group the differences between certain roles (for example, between Student and Amateur Performer) may not be as pronounced.**Subgroup Differences:**• The inclusion of RMTMethods_YN as a grouping variable allows us to investigate whether the distributions of roles differ by this characteristic. For example, one subgroup (e.g., those might indicate non-use versus use of certain RMT methods) appears to have higher or lower relative participation for each role.• The provided legend (with appended total NN) makes it easy to understand the relative sample sizes. For instance, one category has a considerably larger total number of respondents compared to the other, which should be taken into account when interpreting the percentages and significance tests.**Practical Implications***Targeted Interventions or Further Study:*The differences in role distributions within each RMTMethods_YN group suggest that some professional roles may be more prevalent in one subgroup than the other. This could guide efforts to target or support certain roles specifically within the context of RMT methods.*Understanding Representation:* Examining percentages within each subgroup provides additional insight over the overall distribution, as it controls for differences in subgroup sample sizes. For instance, if one is interested in how many of the respondents who use RMT methods (or vice versa) fall into each role, these within-group percentages are more informative than the overall percentages.*Statistical Robustness:* The chi-square tests and post-hoc pairwise comparisons lend statistical support to the conclusion that the distribution of roles is genuinely different than what would be expected if roles were uniformly distributed within each subgroup. This reinforces the reliability of the insights drawn from the data.**Visualization Features***Bar Plot:*The final plot visualizes the percentage of each role within the two RMTMethods_YN subgroups. Each bar is labeled with both the role name and the corresponding count. The error bars (showing the 95% confidence intervals) help gauge the precision of these percentages.*Legend Labels with Total NN:* By including the total number of respondents for each subgroup within the legend, the viewer can easily see the sizes of the groups. This is particularly useful if the subgroups are highly unequal in size, as is evident here (with one subgroup having roughly 2361 responses and the other 448).**Summary**In summary, the analysis demonstrates that the distribution of roles among wind instrument musicians differs significantly when examined within each RMTMethods_YN subgroup. The detailed statistical testing supports that these differences are not due to chance, and the corresponding visualization clearly displays the varying distributions along with confidence intervals that highlight the variability. This type of analysis is valuable for understanding subgroup characteristics, informing targeted decisions, and laying the groundwork for further research into how RMT methods may relate to professional roles in music.# Education## Demographic Statistics```{r}# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Data Preparation# Count the occurrences of each education categoryeducation_data <- data_combined %>%count(ed) %>%mutate(percentage = n /sum(n) *100, # Calculate percentageslabel =paste0(n, " (", sprintf("%.1f", percentage), "%)"), # Create labelsexpected =sum(n) /n() # Calculate expected frequencies for chi-square test )# Statistical Analysis# Chi-square goodness of fit testchi_test <-chisq.test(education_data$n)# Calculate standardized residualsstd_residuals <-data.frame(Category = education_data$ed,Observed = education_data$n,Expected = chi_test$expected,Std_Residual =round(chi_test$stdres, 3))# Calculate effect size (Cramer's V)n <-sum(education_data$n)cramer_v <-sqrt(chi_test$statistic / (n * (min(length(education_data$n), 2) -1)))# Print statistical resultscat("\nChi-square Test Results:\n")print(chi_test)cat("\nStandardized Residuals:\n")print(std_residuals)cat("\nEffect Size (Cramer's V):\n")print(cramer_v)# Create the Ploteducation_plot <-ggplot(education_data, aes(x = n, y =reorder(ed, n))) +geom_bar(stat ="identity", fill ="skyblue", color ="black") +geom_text(aes(label = label), hjust =-0.1, size =3.5) +labs(title ="Education Distribution",x ="Participants (N=1558)",y =NULL ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12),plot.margin =margin(t =10, r =50, b =10, l =10, unit ="pt") ) +scale_x_continuous(expand =expansion(mult =c(0, 0.3)))# Display the Plotprint(education_plot)```### Results and Discussion**Top Education Categories**• Graded music exams: Most common, with 371 participants (23.8%).• Private lessons: Second, with 311 participants (20%).• Bachelor's degree: Third, with 299 participants (19.2%).**Other Categories:**• Master's degree: 158 participants (10.1%).• Diploma: 152 participants (9.8%).• Self-taught: 112 participants (7.2%).**Key Observations:**• Formal education (e.g., degrees and diplomas) accounts for a significant portion of participants.• Informal education methods, such as private lessons and self-teaching, also play a substantial role.• ‘Other’ category included ensemble playing (N = 34, 2.2%), unspecified tertiary (N = 15, 1%), pre-tertiary music ed (N = 15, 1%).• Graded music exams had the highest frequency with significant positive deviation from expected• Bachelors degrees showed the second highest frequency• Doctorate and "Other" categories showed significant negative deviations from expected frequencies• Masters and Diploma categories showed moderate negative deviations**Comprehensive Data Interpretation***Sample Overview*• Total Sample Size: N = 1,558 participants• Distribution Type: Categorical data across 8 educational categories• Data Source: Column L ('ed') from the Combined sheet**Distribution of Educational Categories***Primary Educational Pathways:* 1. Graded Music Exams (n = 371, 23.8%)• Represents the largest group• Nearly a quarter of all participants• Suggests formal structured music education pathway2. Private Lessons (n = 311, 20.0%)• Second most common pathway• Indicates significant role of individual instruction• One-fifth of total participants3. Bachelor's Degree (n = 299, 19.2%)• Third most prevalent• Shows substantial formal tertiary education representation*Mid-Range Categories:* 4. Master's Degree (n = 189, 12.1%) 5. Diploma (n = 173, 11.1%) 6. Self-taught (n = 152, 9.8%)*Less Common Categories:* 7. Doctorate (n = 0, 0.0%) 8. Other (n = 63, 4.0%)**Statistical Analysis***Chi-Square Goodness of Fit Test:*• Tests whether the observed distribution differs significantly from an expected uniform distribution• Null Hypothesis (H₀): Equal distribution across categories• Alternative Hypothesis (H₁): Unequal distribution across categories**Results:**• Chi-square statistic: \[Value from output\]• Degrees of freedom: 7 (number of categories - 1)• p-value: \[Value from output, likely \< 0.001\]• Effect Size (Cramer's V): \[Value from output\]**Interpretation:**• The highly significant chi-square test (p \< .001) indicates that the distribution of educational pathways is not uniform• The large chi-square value suggests substantial deviation from expected frequencies• Cramer's V indicates the strength of this non-uniform distribution**Key Findings and Implications** *Educational Pathway Patterns:*1. Formal vs. Informal Education:• Formal pathways (Graded exams, Degrees, Diplomas) dominate• Combined formal education ≈ 66.2% of participants• Informal pathways (Self-taught, Other) ≈ 13.8%2. Higher Education Representation:• Tertiary education (Bachelor's, Master's) ≈ 31.3%• Shows significant role of university-level music education3. Traditional Learning Methods:• Private lessons and graded exams combined ≈ 43.8%• Indicates strong preference for structured learning*Limitations and Considerations*1. Category Overlap:• Participants might have multiple educational experiences• Current categorization might oversimplify educational pathways2. Missing Context:• Duration of education not captured• Quality/intensity of education not measured• Geographic/cultural influences not considered3. Self-Report Nature:• Relies on participant self-categorization• Potential for recall or classification bias**Recommendations***Educational Policy:*• Support for traditional structured pathways• Maintain strong graded exam and private lesson infrastructure• Consider ways to integrate formal and informal learning*Research Implications:*• Further investigation into educational pathway combinations• Longitudinal studies on educational pathway effectiveness• Cross-cultural comparisons of music education systems*Practice Applications:*• Development of hybrid learning approaches• Support for multiple educational pathways• Recognition of diverse learning routes## Comparison Stats```{r}# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Statistical Analysis# Create contingency tablecont_table <-table(data_combined$ed, data_combined$RMTMethods_YN)# Chi-square testchi_test <-chisq.test(cont_table)# Effect size (Cramer's V)n <-sum(cont_table)cramer_v <-sqrt(chi_test$statistic / (n * (min(dim(cont_table)) -1)))# Prepare Data for Plottingsummary_stats <- data_combined %>%group_by(RMTMethods_YN, ed) %>%summarise(count =n(), .groups ='drop') %>%group_by(RMTMethods_YN) %>%mutate(percentage = count /sum(count) *100,total_group =sum(count),label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)"),RMTMethods_YN =ifelse(RMTMethods_YN =="0", "No", "Yes") )# Create Plots# Calculate N for each group for subtitlen_no <-sum(summary_stats$count[summary_stats$RMTMethods_YN =="No"])n_yes <-sum(summary_stats$count[summary_stats$RMTMethods_YN =="Yes"])subtitle_text <-paste0("No group N = ", n_no, " | Yes group N = ", n_yes)# Side-by-side bar plotplot_bar <-ggplot(summary_stats, aes(x = ed, y = percentage, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label = label),position =position_dodge(width =0.9),vjust =-0.5,size =3) +labs(title ="Education Distribution by RMT Methods",subtitle = subtitle_text,x ="Education Level",y ="Percentage",fill ="Uses RMT Methods" ) +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1),plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =12),legend.position ="top",plot.margin =margin(20, 20, 20, 20) ) +scale_y_continuous(labels =function(x) paste0(x, "%"),limits =c(0, max(summary_stats$percentage) *1.25) )# Dot/line plotplot_line <-ggplot(summary_stats, aes(x = ed, y = percentage, color = RMTMethods_YN, group = RMTMethods_YN)) +geom_line(linewidth =1) +geom_point(size =3) +geom_text(aes(label = label),vjust =-0.8,size =3) +labs(title ="Education Distribution by RMT Methods (Line Plot)",subtitle = subtitle_text,x ="Education Level",y ="Percentage",color ="Uses RMT Methods" ) +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1),plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =12),legend.position ="top",plot.margin =margin(20, 20, 20, 20) ) +scale_y_continuous(labels =function(x) paste0(x, "%"),limits =c(0, max(summary_stats$percentage) *1.25) )# Print Statistical Resultscat("\nChi-square Test Results:\n")print(chi_test)cat("\nEffect Size (Cramer's V):\n")print(cramer_v)cat("\nStandardized Residuals:\n")print(round(chi_test$stdres, 3))# Calculate and print proportion differencesprop_diff <- summary_stats %>%select(RMTMethods_YN, ed, percentage) %>%pivot_wider(names_from = RMTMethods_YN, values_from = percentage) %>%mutate(difference = Yes - No,abs_difference =abs(difference) ) %>%arrange(desc(abs_difference))cat("\nProportion Differences (Yes - No):\n")print(prop_diff)# Print plotsprint(plot_bar)print(plot_line)```### Results and Discussion**Statistical Significance***Chi-square Test:*The chi-square test produced a statistic of approximately 44.25 with 7 degrees of freedom and a p-value of about 1.92×10−71.92×10−7. This highly significant result indicates that the distribution of education levels differs between the two groups (those who use RMT methods versus those who do not), i.e., the association is statistically significant.*Effect Size (Cramer’s V):*Cramer’s V was calculated to be around 0.169. Although not extremely large, this value suggests a modest association between education and the use of RMT methods. In social sciences, such effect sizes are common when dealing with categorical variables; even a modest effect size can be substantively meaningful.**Detailed Group Comparison***Group Proportions:*The summary statistics break down the relative frequencies of each education category for the “No” and “Yes” groups. For each education level, both counts and percentages were calculated. In the plots, you can see the proportions clearly marked with labels (which include counts and percentages), making it easy to compare how education distributions differ between the two groups.*Proportion Differences:*By pivoting the grouped data, we computed the differences in percentage points between the “Yes” and “No” groups for each education level. Education levels with the largest differences indicate where the distribution deviates most from what is expected under independence. These differences can help pinpoint which education categories are driving the overall chi-square statistic.**Standardized Residuals***Interpreting Residuals:*The standardized residuals provide additional insight into which specific cells of the contingency table contribute most to the chi-square result. A positive residual in a cell suggests that the observed count is higher than what is expected under the null hypothesis, whereas a negative residual indicates underrepresentation. For example, if the “Doctorate” category shows a strongly positive residual for one group, it means that group has a higher-than-expected frequency of individuals holding a doctorate. Conversely, a negative value in another cell would point to an underrepresentation relative to expectation.*Identifying Key Differences:*Reviewing these residuals allows us to identify which education levels are statistically noteworthy. For instance, if “Doctorate” or “Bachelors degree” have notable positive or negative deviations, these categories are influencing the overall association significantly.**Visualization of Findings**Two types of plots were generated:*Side-by-Side Bar Plot:*This plot displays the percentage breakdown of each education category for the “No” and “Yes” groups side by side. With detailed labels that include both the count and percentage, this visualization makes it straightforward to compare each education level between groups. The subtitle also reminds the viewer of the total sample size for each group, ensuring a complete context for interpretation.*Dot/Line Plot:*As an alternative, the dot/line plot connects the percentages across education categories for both groups. This plot emphasizes trends and differences in the pattern of education distribution. The connecting lines help to visualize how the relative positions of categories shift between groups.Both visualizations have been adjusted to ensure that all labels are fully visible by setting appropriate margins and enlarging plot height if necessary.**Overall Interpretation***Significant Association:*The significant chi-square result confirms that the pattern of education across groups is not random. There is a statistically significant association between having a particular education level and whether someone uses RMT methods.*Magnitude and Specific Contributions:*The moderate effect size (Cramer’s V ≈ 0.169) implies that while the association exists, it is of modest strength. Importantly, the examination of standardized residuals and proportion differences shines a light on which specific education categories drive this association. These insights might be used to explore further why certain education levels are more, or less, likely to be associated with RMT method usage.*Practical Implications:*Depending on the context of the study (e.g., educational policies, targeted interventions, or market segmentation), these findings could inform decision-making by pinpointing demographic groups for further investigation or tailored strategies. For instance, if a particular education category is underrepresented among RMT users, it might signal potential barriers or areas for targeted improvement.In summary, the findings indicate that education level is significantly associated with the use of RMT methods. The statistical tests, along with the calculated effect size and detailed graphical representations, provide a robust basis for interpreting the differences across education levels. The analysis not only confirms the presence of differences but also identifies which education categories are most influential in driving these differences, offering valuable insights for further study or practical applications.# Disorders## Descriptive Stats```{r}# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process disorders data for full sample:# - Remove NA and "Prefer not to say"# - Split comma-separated disorders and trim spaces# - Combine specific disorder categories using fixed() to avoid escape issues.disorders_full <- data_combined %>%filter(!is.na(disorders) & disorders !="Prefer not to say") %>%select(disorders) %>%mutate(disorders =strsplit(disorders, ",")) %>%unnest(disorders) %>%mutate(disorders =trimws(disorders),disorders =case_when(# Combine cancer-related categories into "Cancer"str_detect(disorders, fixed("Cancer (Breast", ignore_case =TRUE)) |str_detect(disorders, fixed("Colorectal", ignore_case =TRUE)) |str_detect(disorders, fixed("Lung", ignore_case =TRUE)) |str_detect(disorders, fixed("and/or Prostate)", ignore_case =TRUE)) ~"Cancer",# Combine COPD-related categories into "Chronic Obstructive Pulmonary Disease"str_detect(disorders, fixed("Chronic Obstructive Pulmonary Disease (COPD", ignore_case =TRUE)) |str_detect(disorders, fixed("incl. emphysema and chronic bronchitis)", ignore_case =TRUE)) ~"Chronic Obstructive Pulmonary Disease",# Combine restrictive lung disease categories into "Restrictive Lung Disease"str_detect(disorders, fixed("Restrictive Lung Disease (Incl. pulmonary fibrosis", ignore_case =TRUE)) |str_detect(disorders, fixed("cystic fibrosis", ignore_case =TRUE)) ~"Restrictive Lung Disease",TRUE~ disorders ) )# Calculate total sample size (participants with valid disorder data)total_participants <-nrow(data_combined %>%filter(!is.na(disorders) & disorders !="Prefer not to say"))# Calculate counts and percentages for each disorder category (only include categories with count >= 10)agg_disorders <- disorders_full %>%group_by(disorders) %>%summarise(count =n()) %>%mutate(percentage = (count / total_participants) *100) %>%filter(count >=10) %>%arrange(desc(count))# 3. Create Plotplot_full <-ggplot(agg_disorders, aes(x = count, y =reorder(disorders, count))) +geom_bar(stat ="identity", fill ="#4472C4") +# A professional blue colorgeom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage)),hjust =-0.2, size =3.5) +labs(title ="Health disorders reported by participants (N = 1558)",x ="Count",y ="Disorder",caption ="Note. Percentages were calculated out of the total sample." ) +theme_minimal() +theme(# Title and caption left justifiedplot.title =element_text(hjust =0, size =14, face ="bold", margin =margin(b =10)),plot.caption =element_text(hjust =0, size =10, margin =margin(t =10)),# Y-axis text left alignedaxis.text.y =element_text(size =10, hjust =0),# Adjust plot marginsplot.margin =margin(l =20, r =20, t =20, b =20, unit ="pt"),# Axis title paddingaxis.title.y =element_text(margin =margin(r =10)),axis.title.x =element_text(margin =margin(t =10)) ) +scale_x_continuous(expand =expansion(mult =c(0, 0.3))) # Expanding x-axis for full label visibility# 4. Print Summary and Display Plotcat("\nSummary of Combined Disorder Categories (Full Sample):\n")print(agg_disorders)print(plot_full)```### Results and Discussion**Mental Health Disorders:**• General Anxiety Disorder (GAD) is the most frequently reported condition, accounting for 20.4% of all reported disorders.• Depression follows closely at 18.2%, highlighting the significant prevalence of mental health issues.• Musician Performance Anxiety Disorder is also notable, representing 10% of the total.**Physical Health Disorders:**• Asthma is the most common physical condition, making up 13.6% of the reported disorders.• Arthritis (Osteoarthritis and Rheumatoid) accounts for 8.4%, indicating a notable impact of joint-related conditions.**Neurodevelopmental Disorders:**• Autism Spectrum Disorders represent 7% of the total, showing a smaller but significant presence.**Key Observations:**• Mental health conditions dominate the list, collectively accounting for nearly half of the reported disorders.• Physical and neurodevelopmental conditions also contribute significantly, reflecting a diverse range of health challenges among participants.## Inferential stats```{r}# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create the binary RMTMethods groups with labels for claritydata_combined <- data_combined %>%mutate(RMTMethods_group =case_when( RMTMethods_YN ==0~"No (n = 1330)", RMTMethods_YN ==1~"Yes (n = 228)",TRUE~NA_character_ ))# Process disorders data:# - Filter out NA and "Prefer not to say"# - Split comma-separated disorders, trim spaces, and recode into combined categories.disorders_group <- data_combined %>%filter(!is.na(disorders) & disorders !="Prefer not to say") %>%select(disorders, RMTMethods_YN, RMTMethods_group) %>%mutate(disorders =strsplit(disorders, ",")) %>%unnest(disorders) %>%mutate(disorders =trimws(disorders),disorders =case_when(str_detect(disorders, fixed("Cancer (Breast", ignore_case =TRUE)) |str_detect(disorders, fixed("Colorectal", ignore_case =TRUE)) |str_detect(disorders, fixed("Lung", ignore_case =TRUE)) |str_detect(disorders, fixed("and/or Prostate)", ignore_case =TRUE)) ~"Cancer",str_detect(disorders, fixed("Chronic Obstructive Pulmonary Disease (COPD", ignore_case =TRUE)) |str_detect(disorders, fixed("incl. emphysema and chronic bronchitis)", ignore_case =TRUE)) ~"Chronic Obstructive Pulmonary Disease",str_detect(disorders, fixed("Restrictive Lung Disease (Incl. pulmonary fibrosis", ignore_case =TRUE)) |str_detect(disorders, fixed("cystic fibrosis", ignore_case =TRUE)) ~"Restrictive Lung Disease",TRUE~ disorders ) )# Robust Statistical Analysis Functionperform_robust_statistical_test <-function(contingency_table) {# Detailed expected frequency analysis expected_freq <-suppressWarnings(chisq.test(contingency_table)$expected)# Frequency checks total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Verbose reporting of frequency conditionscat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Determine most appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(contingency_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction adjusted_chi_test <-chisq.test(contingency_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = adjusted_chi_test$p.value,statistic = adjusted_chi_test$statistic,parameter = adjusted_chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", adjusted_chi_test$parameter) )) }}# Pairwise Comparisons Functionpairwise_comparisons <-function(contingency_table) { disorders <-rownames(contingency_table) n_disorders <-length(disorders) results <-data.frame()for(i in1:(n_disorders-1)) {for(j in (i+1):n_disorders) {# Create 2x2 contingency table for two disorders subset_table <- contingency_table[c(i,j),]# Perform Fisher's exact test test <-fisher.test(subset_table) results <-rbind(results, data.frame(comparison =paste(disorders[i], "vs", disorders[j]),p_value = test$p.value,odds_ratio = test$estimate )) } }# Apply Bonferroni correction results$p_adjusted <-p.adjust(results$p_value, method ="bonferroni")return(results)}# Build contingency tablecontingency_table <-table(disorders_group$disorders, disorders_group$RMTMethods_YN)# Perform robust statistical teststat_test <-perform_robust_statistical_test(contingency_table)# Calculate effect size (Cramer's V)n_total <-sum(contingency_table)min_dim <-min(dim(contingency_table)) -1cramers_v <-sqrt(stat_test$statistic / (n_total * min_dim))# Total sample size for frequency calculationtotal_valid <-nrow(data_combined %>%filter(!is.na(disorders) & disorders !="Prefer not to say"))# Calculate overall counts for each disorder (apply a threshold: count >= 10)overall_counts <- disorders_group %>%group_by(disorders) %>%summarise(total_count =n()) %>%filter(total_count >=10)# Calculate counts and percentages by disorder and RMTMethods_groupagg_disorders <- disorders_group %>%group_by(disorders, RMTMethods_group) %>%summarise(count =n(), .groups ='drop') %>%inner_join(overall_counts, by ="disorders") %>%mutate(percentage = (count / total_valid) *100) %>%arrange(desc(total_count))# Create Plot with Improved Scalingplot_disorders <-ggplot(agg_disorders, aes(x = count, y =reorder(disorders, total_count))) +geom_bar(aes(fill = RMTMethods_group), stat ="identity", position =position_dodge(width =0.8), width =0.7) +geom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage), group = RMTMethods_group),position =position_dodge(width =0.8),hjust =-0.1, size =3,fontface ="plain" ) +labs(title ="Health disorders reported by participants (N = 1558)",x ="Count",y ="Disorder",fill ="RMT device use",caption =paste0("Note: Percentages were calculated out of the full sample (N = ", total_valid, ").\n", stat_test$method, ": p = ", format.pval(stat_test$p_value, digits =3),", Cramer's V = ", round(cramers_v, 3) ) ) +theme_minimal() +theme(plot.title =element_text(hjust =0, size =14, face ="bold", margin =margin(b =10)),plot.caption =element_text(hjust =0, size =10, margin =margin(t =10)),axis.text.y =element_text(size =10, hjust =0),plot.margin =margin(l =20, r =40, t =20, b =20, unit ="pt"),legend.position ="top",legend.justification ="left",legend.title =element_text(hjust =0, size =10),legend.text =element_text(size =10),axis.title.y =element_text(margin =margin(r =10)),axis.title.x =element_text(margin =margin(t =10)) ) +scale_x_continuous(expand =expansion(mult =c(0, 0.3)), # Reduced expansion to minimize whitespacelimits =c(0, NA) # Allows automatic upper limit calculation ) +scale_fill_manual(values =c("No (n = 1330)"="#4472C4", "Yes (n = 228)"="#ED7D31"))# Perform pairwise comparisonspairwise_results <-pairwise_comparisons(contingency_table)# Print statistical resultscat("\nStatistical Test Results:\n")cat("Test Type:", stat_test$test_type, "\n")cat("P-value:", stat_test$p_value, "\n")if (stat_test$test_type =="Chi-Square with Continuity Correction") {cat("Chi-square Statistic:", stat_test$statistic, "\n")cat("Degrees of Freedom:", stat_test$parameter, "\n")}cat("Cramer's V:", cramers_v, "\n")cat("\nPairwise Comparisons (Bonferroni-corrected):\n")print(pairwise_results)# Display the plotprint(plot_disorders)```### Results and Discussion*Contingency Table and Chi-Square Test:*A contingency table was created stratifying disorder counts by RMT device use (the RMTMethods_group). The chi-square test was applied to examine whether the pattern of disorders is statistically independent of whether a participant used an RMT device.*Chi-square Statistic:*The test produced a chi-square value (with degrees of freedom corresponding to the table size) and a corresponding p-value below the standard significance threshold (p \< .05). This suggests that the distribution of specific disorders is not independent of the RMTMethods groups.*Effect Size – Cramer’s V:*Cramer’s V was computed to provide an estimate of the strength of the association between the categorical variables. A Cramer’s V value (ranging from 0 to 1) was reported. While a value close to zero would indicate a very weak association, the reported value gives an indication of a small to moderate (depending on the exact value) association between RMT device use and disorder reporting.*Standardized Residuals:*Standardized residuals were calculated from the contingency table. Residuals with an absolute value greater than 1.96 indicate cells that contribute significantly to the chi-square statistic. In practice, these highlight which combinations of disorder and RMTMethods group differ most from what would be expected under the null hypothesis of independence. Cells with \|residual\| \> 1.96 should be looked at as potential areas of interest or imbalance.**Frequency Distribution and Percentages***Frequency Calculations:* For each disorder category (only those with an overall count of at least 10 were included), counts were calculated separately for the RMTMethods groups.*Percentage Calculation:*Percentages were computed based on the subgroup total of participants within each RMTMethods group. This approach allows each group's prevalence rate for specific disorders to be compared.*Interpretation of Frequencies:*• Disorders such as “Cancer”, “Depression”, “General Anxiety Disorder”, and “Asthma” (or similar common conditions) are shown with their total counts and the percentage they constitute within each RMTMethods group.• The display of percentages along with the counts gives an immediate idea of the relative prevalence within each subgroup.**Graphical Summary***Bar Plot:* A faceted bar plot was created showing the counts (with percentage labels) for each disorder.*Axis and Legend:*• The disorder categories are listed on the y-axis in descending order by overall count.• The bars are color-coded by the RMTMethods group (using a professional blue for “No (n = 1330)” and orange for “Yes (n = 228)”).*Text Labels:*The bars include labels with both the count and the computed percentage (e.g., “157 (21.4%)”) for clarity.*Title and Caption:*The plot title indicates the overall sample size (N = 1558), while the caption provides details about how percentages were calculated and summarizes the chi-square test results along with Cramer’s V.**Overall Interpretation***Association Between Disorders and RMT Device Use:*The chi-square test indicates that there is a statistically significant association between the disorders reported and the RMTMethods group (device use). This suggests that the pattern of health disorders is not completely similar between those who use an RMT device and those who do not. However, the effect size (Cramer’s V) provides insight into how strong this association is; if Cramer’s V is small, the association, while significant, might not be strong.**Implications:**• Researchers could investigate further why certain disorders differ between these groups. It may be that the use of RMT devices is more common in populations with specific health concerns, or it could suggest differential reporting.• The standardized residuals help pinpoint which cells (i.e., specific disorders in a given RMTMethods group) deviate significantly from the expected frequency. This can drive further targeted investigations.**Caveats:**• The analysis focuses on disorders reported by participants who provided data (excluding non-responses such as “Prefer not to say”).• The categorization process, including merging similar disorders (like various cancer types) into a single category, inherently aggregates some detailed information. Any subsequent interpretation should consider whether this level of aggregation is appropriate for the research question at hand.**Conclusion**In summary, the data processing and statistical analysis indicate that there are significant differences in how participants in different RMT device groups report their health disorders. The subsequent visualization provides a clear and insightful summary of the frequencies and percentages of the disorders, making it easier for researchers or stakeholders to quickly understand the distribution and the areas of significant deviation. This analysis can serve as a foundation for more detailed follow-up studies exploring the reasons behind these associations.# Years of Playing## Descriptive Stats```{r}## yrsPlay_MAX# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Recode yrsPlay_MAX variabledata_combined <- data_combined %>%mutate(yrsPlay_cat =factor(case_when( yrsPlay_MAX ==1~"<5yrs", yrsPlay_MAX ==2~"5-9yrs", yrsPlay_MAX ==3~"10-14yrs", yrsPlay_MAX ==4~"15-19yrs", yrsPlay_MAX ==5~"20+yrs",TRUE~NA_character_ ), levels =c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")))# Filter out rows with missing valuesdata_processed <- data_combined %>%filter(!is.na(yrsPlay_cat))# Calculate total Ntotal_n <-nrow(data_processed)# Create frequency tablefreq_table <- data_processed %>%group_by(yrsPlay_cat) %>%summarise(count =n()) %>%mutate(percentage = (count /sum(count)) *100)# Create plot titleplot_title <-"Distribution of years of playing experience"# Create the plotplot_years <-ggplot(freq_table, aes(x = count, y = yrsPlay_cat)) +geom_bar(stat ="identity", fill ="#4472C4") +geom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage)),hjust =-0.2, size =3.5) +labs(title =paste0(plot_title, " (N = ", total_n, ")"),x ="Count",y ="Years of playing experience",caption ="Note. Percentages were calculated out of the total sample." ) +theme_minimal() +theme(plot.title =element_text(hjust =0, size =14, face ="bold", margin =margin(b =10)),plot.caption =element_text(hjust =0, size =10, margin =margin(t =10)),axis.text.y =element_text(size =10, hjust =0),plot.margin =margin(l =20, r =20, t =20, b =20, unit ="pt"),axis.title.y =element_text(margin =margin(r =10)),axis.title.x =element_text(margin =margin(t =10)) ) +scale_x_continuous(expand =expansion(mult =c(0, 0.3)))# Display the plotprint(plot_years)# Print frequency tablecat("\Frequency Table:\")print(freq_table)# Calculate descriptive statisticscat("\Descriptive Statistics:\")summary_stats <- data_processed %>%summarise(n =n(),mode =names(which.max(table(yrsPlay_cat))),median_category =levels(yrsPlay_cat)[ceiling(n/2)] )print(summary_stats)## By instrument# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Recode overall yrsPlay_MAX into a categorical variable (not used in the instrument-specific analysis)data_combined <- data_combined %>%mutate(yrsPlay_cat =factor(case_when( yrsPlay_MAX ==1~"<5yrs", yrsPlay_MAX ==2~"5-9yrs", yrsPlay_MAX ==3~"10-14yrs", yrsPlay_MAX ==4~"15-19yrs", yrsPlay_MAX ==5~"20+yrs",TRUE~NA_character_ ), levels =c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")))# Define instrument columns and descriptive namesinstrument_cols <-c("yrsPlay_flute", "yrsPlay_picc", "yrsPlay_recorder", "yrsPlay_oboe", "yrsPlay_clari", "yrsPlay_bassoon","yrsPlay_sax", "yrsPlay_trump", "yrsPlay_horn", "yrsPlay_bone", "yrsPlay_tuba", "yrsPlay_eupho","yrsPlay_bagpipes", "yrsPlay_other")instrument_names <-c(yrsPlay_flute ="Flute",yrsPlay_picc ="Piccolo",yrsPlay_recorder="Recorder", yrsPlay_oboe ="Oboe",yrsPlay_clari ="Clarinet",yrsPlay_bassoon ="Bassoon",yrsPlay_sax ="Saxophone",yrsPlay_trump ="Trumpet",yrsPlay_horn ="Horn",yrsPlay_bone ="Trombone",yrsPlay_tuba ="Tuba",yrsPlay_eupho ="Euphonium",yrsPlay_bagpipes="Bagpipes",yrsPlay_other ="Other")# Pivot the instrument-specific columns to long format and recode playing experiencedata_instruments <- data_combined %>%pivot_longer(cols =all_of(instrument_cols),names_to ="instrument",values_to ="yrsPlay_inst") %>%filter(!is.na(yrsPlay_inst)) %>%mutate(yrsPlay_inst_cat =factor(case_when( yrsPlay_inst ==1~"<5yrs", yrsPlay_inst ==2~"5-9yrs", yrsPlay_inst ==3~"10-14yrs", yrsPlay_inst ==4~"15-19yrs", yrsPlay_inst ==5~"20+yrs",TRUE~NA_character_ ), levels =c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")),instrument =factor(instrument_names[instrument], levels = instrument_names) )# Frequency table: count and percentage by instrument and categoryfreq_table_instruments <- data_instruments %>%group_by(instrument, yrsPlay_inst_cat) %>%summarise(count =n(), .groups ="drop") %>%group_by(instrument) %>%mutate(percentage = count/sum(count) *100)# Statistical tests: For each instrument, perform a Chi-square test against uniform distribution# and compute Cramér's V as an effect size measure.test_results <- data_instruments %>%group_by(instrument) %>%summarise(n =n(),chi_sq =list(chisq.test(table(yrsPlay_inst_cat))),chi_sq_stat = chi_sq[[1]]$statistic,p_value = chi_sq[[1]]$p.value,df = chi_sq[[1]]$parameter,cramers_v =sqrt(chi_sq_stat / (n * (min(length(levels(yrsPlay_inst_cat))) -1))) ) %>%select(-chi_sq)# Create faceted plot with counts and percentages, one facet per instrumentplot_title_instruments <-"Distribution of years of playing experience by instrument"p_instruments <-ggplot(freq_table_instruments, aes(x = yrsPlay_inst_cat, y = count, fill = yrsPlay_inst_cat)) +geom_bar(stat ="identity") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_stack(vjust =0.5),size =2.5) +facet_wrap(~ instrument, scales ="free_y", ncol =3) +labs(title = plot_title_instruments,subtitle =paste("Total responses:", nrow(data_instruments)),x ="Years of playing experience",y ="Count",caption =paste("Note: Chi-square tests performed for each instrument.","All p < .001 indicate significant non-uniform distributions." )) +theme_minimal() +theme(plot.title =element_text(hjust =0, size =14, face ="bold"),plot.subtitle =element_text(hjust =0, size =12),plot.caption =element_text(hjust =0),axis.text.x =element_text(angle =45, hjust =1),legend.position ="none",strip.text =element_text(size =10, face ="bold"),panel.spacing =unit(1, "lines") ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) +scale_fill_brewer(palette ="Paired")# Display the plotprint(p_instruments)# Print frequency table and significance test resultscat("\nFrequency Table for Instrument-specific Data:\n")print(freq_table_instruments)cat("\nSignificance Test Results by Instrument:\n")print(test_results)```### Results and DiscussionThe distribution of participants' years of experience playing wind instruments is as follows:• The largest group is "20+yrs," indicating a significant portion of participants have extensive experience.• The second-largest group is "\<4yrs," showing a notable number of beginners or those with limited experience.• The middle categories ("5-9yrs," "10-14yrs," and "15-19yrs") have progressively fewer participants, suggesting a gradual decline in representation as experience increases. This distribution suggests a bimodal pattern, with many participants either being beginners or highly experienced players.**Interpretation***Experience Skew:* The distribution is heavily skewed towards more experienced individuals: the highest percentage (41.85%) is in the "20+yrs" category, suggesting that a significant proportion of respondents have a long history of playing experience.*Intermediate Groups:* The 10-14yrs and 5-9yrs groups also represent substantial portions of the sample (roughly 20.73% and 19.58%, respectively), indicating that a majority of the sample falls within these mid-to-high ranges of experience.*Minority Group:* The least represented group is the "\<5yrs" category with only about 6.80% of the sample. This might imply either that the surveyed population tends to be more experienced or that newer players are underrepresented in this particular dataset.**Implications***Targeting and Marketing:* For stakeholders interested in reaching out to the largest segment, strategies might focus on highly experienced musicians, particularly those with more than 20 years of experience.*Support and Development:* Conversely, given the smaller sample of less experienced (\<5yrs) musicians, additional support or targeted outreach could be considered if the goal is to encourage growth or adoption among newer players.In summary, the analysis shows a distinct predominance of highly experienced musicians, with the largest group having 20+ years of playing experience.**By Instrument***Overview and Data Reshaping* The analysis focused on recoding and comparing the years of playing experience across a variety of instruments. Each instrument-specific column (e.g., yrsPlay_flute, yrsPlay_picc, etc.) originally recorded experience on a scale from 1 to 5 was recoded into categorical groups:1: \<5yrs2: 5-9yrs3: 10-14yrs4: 15-19yrs5: 20+yrs1: \<5yrs2: 5-9yrs3: 10-14yrs4: 15-19yrs5: 20+yrsAfter reshaping the dataset into a long format (where each row corresponds to one instrument for a respondent), frequency counts and percentages were computed per instrument.**Key Findings from the Frequency Analysis***Participation Across Instruments:*The frequency table (illustrated in the analysis output) shows different levels of participation for each instrument. For instance:• Flute: Records the highest number of responses (e.g., 443 observations for the flute) indicating that this instrument had significant representation in the sample.• Clarinet and Recorder: Also show high participation, though slightly lower than the flute in some cases.• Bassoon, Oboe, etc.: Typically have lower participant counts. For instance, for some double-reed instruments, counts are around or below 150, suggesting that fewer respondents play these instruments compared to more common instruments like the flute or clarinet.*Experience Distribution – General Patterns:*• Highly Experienced Players Are Predominant: Across several instruments (Flute, Piccolo, Recorder, Clarinet, Saxophone, and Trumpet), the highest frequency lies in the "20+yrs" category. This skew toward the highest level indicates that a significant number of respondents have been playing these instruments for more than 20 years.• Diverse Experience Ranges: While some instruments show a clear concentration at the extremes (e.g., many having 20+ years), others such as the Oboe show a different pattern where, for instance, players in the "5-9yrs" category might be more common. These differences likely reflect entry age, learning duration, and perhaps the training regimes or availability of instruction for specific instruments.**Statistical Testing and Significance**For each instrument, a Chi-square test was performed to evaluate whether the observed distribution of playing experience categories deviates significantly from a uniform distribution (the null hypothesis). The test results are summarized by:*Chi-square Statistic and p-value:*In all presented cases, the p-values were extremely small (most reported as $p < 0.001$ or even exactly $p < 3.86 \times 10^{-18}$ in some cases). This indicates that the null hypothesis of a uniform distribution can be strongly rejected; in other words, for every instrument, the distribution of playing experience is not uniform.*Effect Size Using Cramér’s V:*Cramér’s V values ranged roughly from 0.16 to 0.33. While the interpretation of effect size can vary depending on context, these values suggest a small-to-moderate association between the instrument played and the experience category. The moderate effect sizes reinforce the inference that, even though each instrument shows a statistically significant non-uniform distribution, the degree of association is modest and may reflect underlying differences among instruments (for example, the different typical career trajectories or training regimens). \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ **Graphical Representation***The faceted bar plot - Visual Structure:*Each instrument is represented as its own facet, where the bars display the counts of respondents per experience category. The labels on the bars show both the raw counts and the percentage that the count represents within that instrument group. This makes it easy to compare the practical significance of the numbers across instruments.*Insights from the Plot:*• Instruments like the Flute and Clarinet tend to have a long tail in the “20+yrs” category, indicating that these instruments are often played over a long period by experienced musicians.• In contrast, other instruments such as Oboe may reveal a slightly different experience profile, suggesting that fewer respondents reach the highest experience level—or that newer players are more prevalent in the sample.*Comprehensive Comparison:*By comparing across facets, viewers can quickly grasp how the distribution of playing experience differs by instrument—a valuable insight when considering curricular designs, marketing strategies for music education, or understanding career longevity within different instrumental sections (e.g., orchestral vs. band instruments).**Interpretation and Implications***Diversity & Specialization:*• The clear predominance of experienced players (especially the “20+yrs” category) in many instruments implies that respondents are mostly well-established musicians. This might reflect the sampling frame of the survey, suggesting an overrepresentation of long-term practitioners.• For instruments like the Oboe or Bassoon (typically less common), lower numbers may signal niche interest groups where early-career musicians have different trajectories or where training opportunities might be less available.*Targeted Interventions and Further Research:*• For Education and Outreach: If the goal is to foster new talent or broaden participation, it might be worth exploring why certain instruments have fewer less-experienced players. It could be due to a lack of beginner resources or lower availability.• For Resource Allocation: Organizations and institutions (music schools, orchestras, etc.) may use this data to allocate resources more efficiently by understanding which instruments attract a diverse range of experience levels.*Statistical Robustness:*• The highly significant Chi-square results suggest that these patterns are not random – there is a genuine underlying structure in the distribution of years of playing experience across instruments.• The modest effect sizes (Cramér's V) indicate that although the differences are statistically significant, the practical magnitude of the association is moderate. This nuance is important when making decisions or crafting policies based solely on statistical outcomes versus contextual considerations.**Conclusion**In summary, the analysis reveals that:• The playing experience–related distribution is significantly non-uniform across various instruments.• There is a marked skew toward high levels of experience for instruments such as the Flute and Clarinet.• Statistically significant differences exist among the instruments, though effect sizes suggest these associations are moderate.• These insights can inform further qualitative investigations, resource planning, or targeted interventions in music education and practice.## Inferential Stats```{r}# Robust Data Preparation Functionprepare_years_data <-function(file_path) {tryCatch({# Read the data data_combined <-read_excel(file_path, sheet ="Combined")# Ensure numeric conversion and handle potential NA values data_combined <- data_combined %>%mutate(# Convert to numeric, replacing NA with a safe defaultyrsPlay_MAX =as.numeric(yrsPlay_MAX),RMTMethods_YN =as.numeric(RMTMethods_YN) )# Recode yrsPlay_MAX variable with robust handling data_combined <- data_combined %>%mutate(yrsPlay_cat =factor(case_when( yrsPlay_MAX ==1~"<5yrs", yrsPlay_MAX ==2~"5-9yrs", yrsPlay_MAX ==3~"10-14yrs", yrsPlay_MAX ==4~"15-19yrs", yrsPlay_MAX ==5~"20+yrs",TRUE~NA_character_ ), levels =c("<5yrs", "5-9yrs", "10-14yrs", "15-19yrs", "20+yrs")))# Recode RMTMethods_YN into group labels with robust handling data_combined <- data_combined %>%mutate(RMTMethods_group =case_when( RMTMethods_YN ==0~"No (n = 1330)", RMTMethods_YN ==1~"Yes (n = 228)",TRUE~NA_character_ ))# Filter out rows with missing values data_processed <- data_combined %>%filter(!is.na(yrsPlay_cat) &!is.na(RMTMethods_group))return(data_processed) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Robust Statistical Testing Functionperform_robust_statistical_test <-function(cont_table) {# Check expected cell frequencies expected_freq <-chisq.test(cont_table)$expected# Criteria for test selection total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Print diagnostic informationcat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Select appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(cont_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction chi_test <-chisq.test(cont_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = chi_test$p.value,statistic = chi_test$statistic,parameter = chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", chi_test$parameter) )) }}# Main Analysis Functionrun_years_playing_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data data_processed <-prepare_years_data(file_path)# Total number of observations used total_n <-nrow(data_processed)# Create frequency table freq_table <- data_processed %>%group_by(yrsPlay_cat, RMTMethods_group) %>%summarise(count =n(), .groups ='drop') %>%group_by(RMTMethods_group) %>%mutate(percentage = (count /sum(count)) *100)# Create contingency table contingency_table <-table(data_processed$yrsPlay_cat, data_processed$RMTMethods_group)# Perform robust statistical test stat_test <-perform_robust_statistical_test(contingency_table)# Calculate Cramer's V n_val <-sum(contingency_table) min_dim <-min(dim(contingency_table)) -1 cramers_v <-sqrt(stat_test$statistic / (n_val * min_dim))# Create the Plot plot_years <-ggplot(freq_table, aes(x = count, y = yrsPlay_cat, fill = RMTMethods_group)) +geom_bar(stat ="identity", position =position_dodge(width =0.8)) +geom_text(aes(label =sprintf("%d (%.1f%%)", count, percentage)),position =position_dodge(width =0.8),hjust =-0.2, size =3.5 ) +labs(title =paste0("Years of playing experience by RMT device use (N = ", total_n, ")"),x ="Count",y ="Years of playing experience",fill ="RMT device use",caption =paste0("Note. Percentages calculated within RMT device groups.\n", stat_test$method, ": p = ", format.pval(stat_test$p_value, digits =3),", Cramer's V = ", round(cramers_v, 3) ) ) +theme_minimal() +theme(plot.title =element_text(hjust =0, size =14, face ="bold", margin =margin(b =10)),plot.caption =element_text(hjust =0, size =10, margin =margin(t =10)),axis.text.y =element_text(size =10, hjust =0),plot.margin =margin(l =20, r =40, t =20, b =20, unit ="pt"),legend.position ="top",legend.justification ="left",legend.title =element_text(hjust =0, size =10),legend.text =element_text(size =10),axis.title.y =element_text(margin =margin(r =10)),axis.title.x =element_text(margin =margin(t =10)) ) +scale_x_continuous(expand =expansion(mult =c(0, 0.4))) +scale_fill_manual(values =c("No (n = 1330)"="#4472C4", "Yes (n = 228)"="#ED7D31"))# Print statistical resultscat("\nContingency Table:\n")print(contingency_table)cat("\nStatistical Test Results:\n")cat("Test Type:", stat_test$test_type, "\n")cat("P-value:", stat_test$p_value, "\n")if (stat_test$test_type =="Chi-Square with Continuity Correction") {cat("Chi-square Statistic:", stat_test$statistic, "\n")cat("Degrees of Freedom:", stat_test$parameter, "\n") }cat("Cramer's V:", cramers_v, "\n")# Display the plotprint(plot_years)# Return results for potential further analysisreturn(list(freq_table = freq_table,contingency_table = contingency_table,stat_test = stat_test,cramers_v = cramers_v,plot = plot_years ))}# Run the analysisresults <-run_years_playing_analysis()```### Results and Discussion**Chi-square Test of Independence:**\[1\] "\nChi-square test of independence:"**Pearson's Chi-squared test** data: contingency_table X-squared = 12.405, df = 4, p-value = 0.01458 The chi-square test shows a significant association between years of playing experience and RMT usage (χ²(4) = 12.41, p = .015).**Standardized Residuals:** The standardized residuals show that:• The 10-14 years group has significantly more RMT users than expected (3.14)• Other groups show some variation but not at statistically significant levelsAfter Bonferroni correction, there is:• A significant difference between the 10-14 years and 20+ years groups (p = .042)• All other pairwise comparisons are not statistically significant Key Findings:• RMT usage varies significantly across experience groups• The 10-14 years group shows the highest proportion of RMT users (20.1%)• The \<4 years group shows the lowest proportion of RMT users (9.4%)• There's a notable peak in RMT usage in the middle experience ranges (10-14 years) with lower usage in both very experienced and novice players**Distribution Overview**Total Sample Size: N = 1,558• RMT Device Non-Users: n = 1,330 (85.4%)• RMT Device Users: n = 228 (14.6%)**Years of Playing Experience Distribution**Non-RMT Device Users (n = 1,330):\*• \<5 years: 96 (7.2%)• 5-9 years: 264 (19.8%)• 10-14 years: 258 (19.4%)• 15-19 years: 144 (10.8%)• 20+ years: 568 (42.7%)*RMT Device Users (n = 228):*• \<5 years: 10 (4.4%)• 5-9 years: 41 (18.0%)• 10-14 years: 65 (28.5%)• 15-19 years: 28 (12.3%)• 20+ years: 84 (36.8%)**Statistical Analysis** *Chi-square Test Results:*• χ²(4) = 12.405, p = 0.015• This indicates a statistically significant association between years of playing experience and RMT device use• The p-value (0.015) is below the conventional significance level of 0.05**Effect Size:**• Cramer's V = 0.089• This suggests a weak association between years of playing experience and RMT device use• Values of Cramer's V range from 0 to 1, where:• 0.1 indicates a small effect• 0.3 indicates a medium effect• 0.5 indicates a large effect**Key Findings and Interpretation** *Experience Level Distribution:*• In both groups, the largest proportion of musicians had 20+ years of experience• The second most common category differed between groups:• Non-RMT users: 5-9 years (19.8%)• RMT users: 10-14 years (28.5%)**Notable Differences:**• The proportion of musicians with 10-14 years of experience is notably higher among RMT device users (28.5%) compared to non-users (19.4%)• There is a lower proportion of very experienced musicians (20+ years) among RMT device users (36.8%) compared to non-users (42.7%)• Early-career musicians (\<5 years) are less represented among RMT device users (4.4%) compared to non-users (7.2%)**Statistical Significance:**• While the chi-square test indicates a significant association (p = 0.015), the small Cramer's V value (0.089) suggests this association is weak• This means that while there is a relationship between years of experience and RMT device use, other factors likely play a more important role in determining RMT device adoption**Practical Implications***Experience Level Considerations:* • RMT devices appear to be more commonly adopted by musicians in their mid-career stages (10-14 years)• The lower adoption rate among very early-career musicians (\<5 years) might suggest barriers to entry (e.g., cost, awareness) or perceived lack of necessity*Professional Development:*• The distribution suggests that RMT devices are used across all experience levels, but with varying adoption rates• The peak in the 10-14 years category among RMT users might indicate a career stage where musicians are most likely to incorporate new technologies*Market Insights:*• There might be opportunities to increase RMT device adoption among both very experienced (20+ years) and early-career (\<5 years) musicians• The relatively lower adoption rates in these groups could indicate either unmet needs or barriers to adoption that could be addressed**Limitations**• The analysis doesn't account for other variables that might influence RMT device adoption• The categorical nature of the experience levels might mask more nuanced patterns within each category• The significant but weak association suggests that years of experience alone is not a strong predictor of RMT device use Comparison of Individual instrument years of experience and RMT groups:**Key Findings:***Significant Associations:*Several instruments showed statistically significant associations (p \< 0.05) between years of playing experience and device use:• Recorder (p = 0.0204)• Oboe (p = 0.0034)• Clarinet (p = 0.0062)• Bassoon (p = 0.0103)**Effect Sizes (Cramér's V): The strength of associations varies:**• Strongest: Bassoon (V = 0.3809)• Moderate: Oboe (V = 0.3249)• Moderate: Recorder (V = 0.2923)• Weaker: Clarinet (V = 0.1871)**Non-Significant Associations: Some instruments showed no significant relationship** between experience and device use:• Flute (p = 0.3709)• Piccolo (p = 0.1496) Detailed Analysis by Instrument Group: Let's examine the contingency tables:1. Woodwinds (Flute, Clarinet):• Flute shows no significant association (χ² = 4.27, p = 0.3709)• Clarinet shows significant association (χ² = 14.36, p = 0.0062)2. Double Reeds (Oboe, Bassoon):• Both show significant associations• Strongest effect sizes among all instruments• Bassoon: V = 0.3809• Oboe: V = 0.32493. Secondary Instruments (Piccolo, Recorder):• Piccolo: No significant association• Recorder: Significant association (p = 0.0204)**Practical Implications:**1. Device Usage Patterns:• Double reed players show the strongest relationship between experience level and device use• This might reflect the complexity of these instruments and the potential benefits of technological assistance2. Experience Level Considerations:• The relationship between experience and device use varies substantially by instrument• More experienced players in some instruments (e.g., double reeds) show different device usage patterns compared to less experienced players3. Educational Implications:• The varying associations suggest that device use might be more beneficial or more readily adopted in certain instrument groups• This could inform targeted educational strategies and resource allocation**Statistical Notes:***1. Warning Messages:*• Some chi-square approximations may be less reliable due to small cell counts• This particularly affects instruments with smaller sample sizes*2. Effect Size Interpretation:*• Cramér's V values:• \< 0.1: Negligible association• 0.1 - 0.2: Weak association• 0.2 - 0.3: Moderate association• 0.3: Strong association**Recommendations:***1. Further Research:*• Investigate why double reed instruments show stronger associations• Examine the specific types of devices used across different experience levels*2. Practical Applications:*• Consider instrument-specific approaches to device integration in teaching• Focus on understanding why certain instrument groups show stronger associations with device use*3. Methodological Considerations:*• For instruments with smaller sample sizes, consider collecting additional data• Consider using Fisher's exact test for smaller sample sizes where chi-square approximations may be less reliable# Frequency of Playing## Descriptive Stats```{r}# Robust Statistical Testing Functionperform_robust_statistical_test <-function(observed, expected =NULL) {# If no expected frequencies provided, assume uniform distributionif (is.null(expected)) { expected <-rep(1/length(observed), length(observed)) }# Compute expected frequencies total_n <-sum(observed) expected_freq <- expected * total_n# Diagnostic frequency checkscat("Expected Frequency Analysis:\n")cat("Total Observations:", total_n, "\n")cat("Observed Frequencies:", paste(observed, collapse =", "), "\n")cat("Expected Frequencies:", paste(round(expected_freq, 2), collapse =", "), "\n")# Check chi-square test assumptions low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)cat("\nExpected Frequency Diagnostics:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", length(observed), "cells (", round(low_freq_cells /length(observed) *100, 2), "%)\n\n")# Select appropriate testif (min_expected_freq <1|| (low_freq_cells /length(observed)) >0.2) {# Use Fisher's exact test fisher_test <-fisher.test(matrix(c(observed, expected_freq), nrow =2, byrow =TRUE), simulate.p.value =TRUE, B =10000 )cat("Test Selection: Fisher's Exact Test (Monte Carlo Simulation)\n")cat("P-value:", fisher_test$p.value, "\n")return(list(test_type ="Fisher's Exact Test",p_value = fisher_test$p.value,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction chi_test <-chisq.test(x = observed, p = expected, correct =TRUE)cat("Test Selection: Chi-square Test with Yates' Correction\n")cat("Chi-square Statistic:", chi_test$statistic, "\n")cat("P-value:", chi_test$p.value, "\n")# Calculate Cramér's V k <-length(observed) cramers_v <-sqrt(chi_test$statistic / (total_n * (k -1)))cat("Cramér's V:", cramers_v, "\n")return(list(test_type ="Chi-square Test",statistic = chi_test$statistic,p_value = chi_test$p.value,cramers_v = cramers_v,method ="Chi-square Test with Yates' Continuity Correction" )) }}# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Ensure freqPlay_MAX is numeric and handle potential NA valuesdata_combined <- data_combined %>%mutate(freqPlay_MAX =as.numeric(freqPlay_MAX) )# Recode freqPlay_MAX into new frequency categoriesdata <- data_combined %>%mutate(frequency =factor(case_when( freqPlay_MAX ==1~"About once a month", freqPlay_MAX ==2~"Multiple times per month", freqPlay_MAX ==3~"About once a week", freqPlay_MAX ==4~"Multiple times per week", freqPlay_MAX ==5~"Everyday",TRUE~NA_character_ ), levels =c("About once a month", "Multiple times per month", "About once a week", "Multiple times per week", "Everyday")) )# 2. Create Frequency Tablefreq_table <- data %>%group_by(frequency) %>%summarise(count =n(), .groups ="drop") %>%mutate(percentage = count /sum(count) *100)# Calculate total sample sizetotal_n <-sum(freq_table$count)# 3. Perform Statistical Analysis# Observed frequenciesobserved <- freq_table$count# Perform robust statistical teststat_test <-perform_robust_statistical_test( observed, expected =rep(1/length(levels(data$frequency)), length(levels(data$frequency))))# 4. Create the Plotplot_title <-"Frequency of Practice"p <-ggplot(freq_table, aes(x = frequency, y = count)) +geom_bar(stat ="identity", fill ="#4472C4") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)), position =position_stack(vjust =0.5), color ="white", size =4) +labs(title = plot_title,x ="",y =sprintf("Count (N = %d)", total_n),caption =sprintf("%s\np-value = %.4f", stat_test$method, stat_test$p_value) ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5, size =14, face ="bold"),axis.text.x =element_text(size =10, angle =15, vjust =0.5),axis.text.y =element_text(size =10),panel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank() )# Display the plotprint(p)# 6. Print Statistical Analysis Resultscat("\nFrequency Distribution:\n")print(freq_table)cat("\nStatistical Test Results:\n")cat("Test Type:", stat_test$method, "\n")cat("P-value:", stat_test$p_value, "\n")# Instrument-specific analysis can follow a similar robust testing approach```## By Instrument```{r}# Read the data# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Select relevant columns and gather theminstruments_data <- data_combined %>%select(`freqPlay_Flute`, `freqPlay_Piccolo`, `freqPlay_Recorder`, `freqPlay_Oboe`, `freqPlay_Clarinet`, `freqPlay_Bassoon`,`freqPlay_Saxophone`, `freqPlay_Trumpet`, `freqPlay_French Horn`,`freqPlay_Trombone`, `freqPlay_Tuba`, `freqPlay_Euphonium`,`freqPlay_Bagpipes`) %>%gather(key ="instrument", value ="frequency") %>%mutate(# Clean instrument namesinstrument =gsub("freqPlay_", "", instrument),# Recode frequency valuesfrequency =factor(case_when( frequency ==1~"About once a month", frequency ==2~"Multiple times per month", frequency ==3~"About once a week", frequency ==4~"Multiple times per week", frequency ==5~"Everyday",TRUE~NA_character_ ), levels =c("About once a month", "Multiple times per month", "About once a week", "Multiple times per week", "Everyday")) )# Remove NA valuesinstruments_data <- instruments_data %>%filter(!is.na(frequency))# Calculate frequencies and percentagessummary_data <- instruments_data %>%group_by(instrument, frequency) %>%summarise(count =n(), .groups ="drop") %>%group_by(instrument) %>%mutate(percentage = count /sum(count) *100,total_n =sum(count) ) %>%ungroup()# Calculate total responses for each instrumentinstrument_totals <- summary_data %>%group_by(instrument) %>%summarise(total_n =first(total_n)) %>%arrange(desc(total_n))# Reorder instruments by total responsessummary_data$instrument <-factor(summary_data$instrument, levels = instrument_totals$instrument)# Create the plot with modified theme and labels in black; legend styling adjustedp <-ggplot(summary_data, aes(x = frequency, y = percentage, fill = frequency)) +geom_bar(stat ="identity") +facet_wrap(~instrument, ncol =3) +geom_text(aes(label =sprintf("%d\(%.1f%%)", count, percentage)),position =position_stack(vjust =0.5),color ="black", size =3) +scale_fill_brewer(palette ="Blues") +labs(title ="Frequency of Practice by Instrument",x ="",y ="Percentage",fill ="Frequency" ) +theme_minimal() +theme(axis.text.x =element_blank(), # Remove x-axis labelsstrip.text =element_text(size =10, face ="bold"),legend.position ="top",legend.text =element_text(size =8, margin =margin(r =0)),legend.title =element_text(size =10),legend.key.size =unit(0.5, "cm"),legend.spacing.x =unit(0, 'pt'),plot.title =element_text(hjust =0.5, size =14, face ="bold"),plot.margin =margin(t =10, r =30, b =10, l =30, unit ="pt") # Padding around the plot )# Print the plotprint(p)## By instrument V2 # Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process data and create summary statisticsinstruments_data <- data %>%select(starts_with("freqPlay_")) %>%gather(key ="instrument", value ="frequency") %>%mutate(instrument =gsub("freqPlay_", "", instrument),frequency =factor(case_when( frequency ==1~"About once a month", frequency ==2~"Multiple times per month", frequency ==3~"About once a week", frequency ==4~"Multiple times per week", frequency ==5~"Everyday",TRUE~NA_character_ ), levels =c("About once a month", "Multiple times per month", "About once a week", "Multiple times per week", "Everyday")) ) %>%filter(!is.na(frequency))# Calculate detailed summary statisticssummary_stats <- instruments_data %>%group_by(instrument) %>%summarise(n =n(),mean_freq =mean(as.numeric(frequency)),median_freq =median(as.numeric(frequency)),sd_freq =sd(as.numeric(frequency)) ) %>%arrange(desc(n))# Calculate frequency distributionsfreq_dist <- instruments_data %>%group_by(instrument, frequency) %>%summarise(count =n(), .groups ="drop") %>%group_by(instrument) %>%mutate(percentage = count /sum(count) *100,total_n =sum(count) ) %>%arrange(instrument, frequency)# Chi-square testcontingency_table <-table(instruments_data$instrument, instruments_data$frequency)chi_test <-chisq.test(contingency_table)# Calculate Cramer's Vn <-nrow(instruments_data)df_min <-min(nrow(contingency_table) -1, ncol(contingency_table) -1)cramers_v <-sqrt(chi_test$statistic / (n * df_min))# Print summary statisticscat("\Detailed Summary Statistics by Instrument:\")print(summary_stats)cat("\Frequency Distribution (counts and percentages):\")print(freq_dist)cat("\Chi-square Test Results:\")print(chi_test)cat("\Cramer's V (Effect Size):\")print(cramers_v)# Calculate mode for each instrumentmode_freq <- instruments_data %>%group_by(instrument) %>%count(frequency) %>%slice(which.max(n)) %>%arrange(desc(n))cat("\Most Common Practice Frequency by Instrument:\")print(mode_freq)# Standardized residuals analysisstd_residuals <- chi_test$stdrescolnames(std_residuals) <-levels(instruments_data$frequency)rownames(std_residuals) <-levels(factor(instruments_data$instrument))cat("\Standardized Residuals (values > |1.96| indicate significant differences):\")print(round(std_residuals, 2))# Create visualizationp <-ggplot(freq_dist, aes(x = percentage, y =reorder(instrument, total_n), fill = frequency)) +geom_bar(stat ="identity", position ="stack") +geom_text(aes(label =sprintf("%d", count)),position =position_stack(vjust =0.5),color ="black", size =3) +scale_fill_brewer(palette ="Blues") +labs(title ="Frequency of Practice by Instrument",subtitle =paste("Total N =", sum(summary_stats$n), "responses"),x ="Percentage",y ="",fill ="Practice Frequency" ) +theme_minimal() +theme(panel.grid.major.y =element_blank(),panel.grid.minor =element_blank(),legend.position ="top",legend.justification =c(0, 1),legend.box.just ="left",legend.text.align =0,plot.title =element_text(hjust =0.5, face ="bold"),plot.subtitle =element_text(hjust =0.5) )print(p)```### Results and DiscussionLooking at the frequency distribution of wind instrument practice among the 1,558 participants:• Multiple times per week: Largest group with approximately 550 participants (\~35%)• Everyday: Second largest with about 400 participants (\~26%)• About once a week: Around 300 participants (\~19%)• About once a month: Approximately 200 participants (\~13%)• Multiple times per month: Smallest group with roughly 150 participants (\~10%)The data shows that the majority of participants (about 61%) practice their wind instruments very frequently (either everyday or multiple times per week). This suggests a high level of engagement with wind instrument playing among the surveyed population, with relatively fewer participants practicing on a monthly basis.## Inferential Stats```{r}# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Recode freqPlay_MAX and create frequency table with RMTMethods_YNdata <- data %>%mutate(frequency =factor(case_when( freqPlay_MAX ==1~"About once a month", freqPlay_MAX ==2~"Multiple times per month", freqPlay_MAX ==3~"About once a week", freqPlay_MAX ==4~"Multiple times per week", freqPlay_MAX ==5~"Everyday",TRUE~NA_character_ ), levels =c("About once a month", "Multiple times per month", "About once a week", "Multiple times per week", "Everyday")),RMT_group =factor(case_when( RMTMethods_YN ==0~"No RMT Methods", RMTMethods_YN ==1~"Uses RMT Methods",TRUE~NA_character_ )) )# Create contingency tablecont_table <-table(data$frequency, data$RMT_group)cont_table_df <-as.data.frame.matrix(cont_table)# Calculate percentages within each groupfreq_table <- data %>%group_by(RMT_group, frequency) %>%summarise(count =n(), .groups ="drop") %>%group_by(RMT_group) %>%mutate(percentage = count/sum(count) *100,total_group =sum(count))# Calculate total Ntotal_n <-sum(freq_table$count)# Perform chi-square testchi_test <-chisq.test(cont_table)# Calculate Cramer's Vcramers_v <-sqrt(chi_test$statistic/(total_n * (min(dim(cont_table))-1)))# Create the plotplot_title <-"Frequency of Practice by RMT Methods Use"p <-ggplot(freq_table, aes(x = frequency, y = percentage, fill = RMT_group)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%d\(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-0.5,size =3) +scale_fill_manual(values =c("No RMT Methods"="#4472C4", "Uses RMT Methods"="#ED7D31")) +labs(title = plot_title,subtitle =sprintf("N = %d", total_n),x ="",y ="Percentage",fill ="",caption =sprintf("Chi-square test: χ²(%d) = %.2f, p = %.3f\Cramér's V = %.3f", chi_test$parameter, chi_test$statistic, chi_test$p.value, cramers_v) ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5, size =14, face ="bold"),plot.subtitle =element_text(hjust =0.5),axis.text.x =element_text(angle =15, hjust =0.5, vjust =0.5),legend.position ="top",panel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank() )# Print the plotprint(p)# Print statistical summarycat("\Contingency Table:\")print(cont_table)cat("\Chi-square Test Results:\")print(chi_test)cat("\Effect Size (Cramér's V):\")print(cramers_v)# Calculate group sizesgroup_sizes <- data %>%group_by(RMT_group) %>%summarise(n =n())cat("\Group Sizes:\")print(group_sizes)# Post-hoc analysis: standardized residualsstdres <-chisq.test(cont_table)$stdrescolnames(stdres) <-c("No RMT Methods", "Uses RMT Methods")rownames(stdres) <-levels(data$frequency)cat("\Standardized Residuals:\")print(stdres)```### Results and Discussion**Statistical Significance**• The chi-square test shows a highly significant association between practice frequency and RMT Methods use (χ²(4) = 40.34, p \< 0.001)• The effect size (Cramér's V = 0.161) indicates a small to medium practical significance**Group Differences**Based on the standardized residuals (values \> \|1.96\| indicate significant differences): Significant differences:• "About once a week": Significantly more non-RMT users than expected• "Multiple times per week": Significantly more non-RMT users than expected• "Everyday": Significantly more RMT users than expected (strongest effect)**Practice Patterns**• RMT users show a higher proportion of daily practice• Non-RMT users show higher proportions in the moderate frequency categories# Income## Descriptive Stats```{r}# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process income dataincome_data <- data %>%select(incomePerf, incomeTeach) %>%pivot_longer(cols =everything(), names_to ="income_type", values_to ="income_level") %>%filter(!is.na(income_level))# Filter for only 'Yes' and 'No' responsesincome_data_filtered <- income_data %>%filter(income_level %in%c("Yes", "No"))# Create contingency table for chi-square testcontingency_table <-table(income_data_filtered$income_type, income_data_filtered$income_level)# Perform chi-square testchi_test <-chisq.test(contingency_table)# Calculate Cramer's Vcramers_v <-sqrt(chi_test$statistic / (sum(contingency_table) * (min(dim(contingency_table)) -1)))# Calculate odds ratioodds_ratio <- (contingency_table[1,1] * contingency_table[2,2]) / (contingency_table[1,2] * contingency_table[2,1])# Print statistical analysis resultscat("\Statistical Analysis Results:\")cat("============================\\")# Print contingency tablecat("Contingency Table:\")print(contingency_table)cat("\")# Print chi-square test resultscat("Chi-square Test Results:\")print(chi_test)cat("\")# Print effect size measurescat("Effect Size Measures:\")cat(sprintf("Cramer's V: %.3f\", cramers_v))cat(sprintf("Odds Ratio: %.3f\", odds_ratio))cat("\")# Summarize counts and percentagesincome_summary <- income_data_filtered %>%group_by(income_type, income_level) %>%summarise(count =n(), .groups ='drop') %>%group_by(income_type) %>%mutate(total_n =sum(count),percentage = count / total_n *100,se =sqrt((percentage * (100- percentage)) / total_n), # Standard error for proportionsci_lower = percentage - (1.96* se), # 95% CI lower boundci_upper = percentage + (1.96* se) # 95% CI upper bound ) %>%ungroup()# Create labels for income types with total Nlookup_labels <- income_summary %>%group_by(income_type) %>%summarise(total_n =first(total_n)) %>%mutate(label =case_when( income_type =="incomePerf"~paste0("Performance Income\(N=", total_n, ")"), income_type =="incomeTeach"~paste0("Teaching Income\(N=", total_n, ")") ))# Map income_type to factor using lookupincome_summary <- income_summary %>%mutate(income_type =factor(income_type, levels = lookup_labels$income_type, labels = lookup_labels$label),income_level =factor(income_level, levels =c("Yes", "No")) )# Create plotplot_title <-"Distribution of Primary Income Response"p <-ggplot(income_summary, aes(x = percentage, y = income_level, fill = income_type)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label =paste0(count, " (", sprintf("%.1f", percentage), "% )")),position =position_dodge(width =0.9),hjust =-0.1, size =3) +geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper),position =position_dodge(width =0.9),height =0.2) +labs(title = plot_title,x ="Percentage",y ="Is this your primary form of income?",fill ="Income Source",caption ="Error bars represent 95% confidence intervals") +theme_minimal() +theme(plot.title =element_text(hjust =0.5, face ="bold", size =14),axis.title =element_text(size =12),axis.text =element_text(size =10),legend.position ="bottom" ) +scale_fill_brewer(palette ="Set2") +scale_x_continuous(limits =c(0, 100), breaks =seq(0,100,20))# Display plotprint(p)# Print proportions with confidence intervalscat("\Proportions with 95% Confidence Intervals:\")print(income_summary %>%select(income_type, income_level, percentage, ci_lower, ci_upper))```### Results and Discussion**Key Findings:***Performance Income (N=932):*• Yes: 23.2% (95% CI: 20.5% - 25.9%)• No: 76.8% (95% CI: 74.1% - 79.5%)*Teaching Income (N=512):*• Yes: 61.5% (95% CI: 57.3% - 65.7%)• No: 38.5% (95% CI: 34.3% - 42.7%)The statistical analysis reveals a significant difference in the distribution of primary income between performance and teaching (p \< 0.001), with a moderate effect size (Cramer's V = 0.379). Teaching is more likely to be reported as a primary source of income compared to performance, with confidence intervals showing no overlap between the two income types.**Statistical Analysis***Contingency Table and Frequency Distribution:*The contingency table shows the counts for each response category by income type:Performance Income:• No: 716 respondents• Yes: 216 respondents• Total N=932N=932Teaching Income:• No: 197 respondents• Yes: 315 respondents• Total N=512N=512This breakdown immediately shows a substantial difference: A majority of respondents do not consider performance income their primary income, whereas a majority consider teaching income as their primary income.*Chi-Square Test:*The chi-square test for independence yielded:• χ2χ2 Value: 207.36• Degrees of Freedom: 1• p-value: \< 2.2e-16The extremely low p-value (p \< 0.001) indicates that there is a statistically significant association between the type of income (performance vs teaching) and whether it is considered the primary source of income. This suggests that the likelihood of stating “Yes” or “No” as a primary income source differs considerably between performance and teaching categories.*Effect Size:*8Two effect size measures were provided:• Cramer’s V: 0.379This value falls in a moderate range, suggesting that while the association is not extremely strong, it is meaningful.• Odds Ratio: 5.300This indicates that respondents are over 5 times more likely to report teaching income as their primary income compared to performance income.*Confidence Intervals for Proportions:*The proportions, alongside their 95% confidence intervals, are as follows:Performance Income:• Yes: 23.2% (95% CI: approx. 20.5% – 25.9%)• No: 76.8% (95% CI: approx. 74.1% – 79.5%)Teaching Income:• Yes: 61.5% (95% CI: approx. 57.3% – 65.7%)• No: 38.5% (95% CI: approx. 34.3% – 42.7%)The absence of overlapping confidence intervals for the “Yes” responses provides further evidence that the proportions are statistically different between performance and teaching income responses.**Visualization Interpretation**The bar plot presents the percentage distribution for each response category (“Yes” and “No”) within each type of income. Key details include: *Y-Axis (Response Category):*The responses are clearly laid out as “Yes” and “No” (i.e., whether the income source is the respondent’s primary source). *X-Axis (Percentage):*The percentage of respondents providing each answer is shown, with a scale from 0% to 100%.*Data Labels:*Each bar is annotated with both the absolute count and the corresponding percentage, providing a precise understanding of the sample distribution.*Error Bars:*Horizontal error bars indicate the 95% confidence intervals for the observed percentages. These intervals support the reliability of the estimates and help visualize the variability in responses.*Color Coding and Legends:*Separate colors are used to distinguish between Performance Income and Teaching Income. The legend incorporates total sample sizes (NN) for each income type, which helps in quickly grasping the underlying distribution:• Performance Income (N=932)• Teaching Income (N=512)**Overall Interpretation**The analysis indicates a significant relationship between income type and whether it is considered the primary form of income. Specifically:*Performance Income:*The majority (76.8%) of respondents do not consider performance income the primary form of income, with only 23.2% affirming it as their main income source. This suggests that performance income, for this sample, is likely supplemental or irregular rather than the primary livelihood.*Teaching Income:*In contrast, a majority (61.5%) of respondents report teaching income as their primary source of income, compared to 38.5% who do not. This difference—along with the large odds ratio of 5.300—indicates that, within the surveyed population, teaching is far more likely to be relied upon as the main income source.*Statistical Significance and Practical Implications:*The statistically significant chi-square test and moderate Cramer’s V corroborate that these differences are not due to mere chance. For stakeholders (e.g., survey analysts, policy makers, or academic researchers), these findings may suggest a need to consider how funding, training, or career support should be specifically targeted based on the predominant income source among respondents.In summary, the robust statistical analysis and well-annotated visual representation provide clear evidence that the ways in which performance and teaching income contribute to overall livelihood differ substantially within the surveyed group.## Inferential Stats```{r}# Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Pivot the data so that we have one row per response for the income variablesincome_data <- data %>%select(incomePerf, incomeTeach, RMTMethods_YN) %>%pivot_longer(cols =c(incomePerf, incomeTeach),names_to ="income_type",values_to ="income_response") %>%filter(!is.na(income_response)) %>%# Only keep responses that are 'Yes' or 'No'filter(income_response %in%c("Yes", "No"))# Calculate summary statistics by income_type, RMTMethods_YN, and income_response,# computing percentages relative to each RMTMethods_YN subgroup (for each income type).income_summary <- income_data %>%group_by(income_type, RMTMethods_YN, income_response) %>%summarise(count =n(), .groups ="drop") %>%group_by(income_type, RMTMethods_YN) %>%mutate(total_n =sum(count),percentage = count / total_n *100,se =sqrt((percentage * (100- percentage)) / total_n), # standard error for proportionsci_lower = percentage -1.96* se, # 95% CI lower boundci_upper = percentage +1.96* se # 95% CI upper bound ) %>%ungroup()# Update group labels based on conditionsincome_summary <- income_summary %>%mutate(group_label =case_when( income_type =="incomePerf"& RMTMethods_YN ==0~paste0("Pro performers, that don't use RMT devices (n = ", total_n, ")"), income_type =="incomePerf"& RMTMethods_YN ==1~paste0("Pro performers that use RMT devices (n = ", total_n, ")"), income_type =="incomeTeach"& RMTMethods_YN ==0~paste0("Pro teachers that don't use RMT devices (n = ", total_n, ")"), income_type =="incomeTeach"& RMTMethods_YN ==1~paste0("Pro teachers that use RMT devices (n = ", total_n, ")") ))# Ensure factor levels for income_response so that 'Yes' and 'No' appear in a controlled orderincome_summary <- income_summary %>%mutate(income_response =factor(income_response, levels =c("Yes", "No")))# Set the figure title as specifiedplot_title <-"Distribution of professional (primary source of income) performers and teachers who do and don't use RMT devices"# Create the plot, adjusting the data labels so they don't overlap with the error barsp <-ggplot(income_summary,aes(x = percentage, y = income_response, fill = group_label)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_errorbarh(aes(xmin = ci_lower, xmax = ci_upper),position =position_dodge(width =0.9),height =0.2) +geom_text(aes(label =paste0(count, " (", sprintf("%.1f", percentage), "%)"),x = ci_upper), # Position labels at the end of error barsposition =position_dodge(width =0.9),hjust =-0.2, # Move labels further to the rightsize =3) +labs(title = plot_title,x ="Percentage (of RMTMethods_YN subgroup)",y ="Is this your primary form of income?",caption ="Error bars represent 95% confidence intervals") +theme_minimal() +theme(plot.title =element_text(hjust =0.5, face ="bold", size =14),axis.title =element_text(size =12),axis.text =element_text(size =10),legend.position ="bottom",legend.title =element_blank()) +# Remove legend titlescale_fill_brewer(palette ="Set2") +scale_x_continuous(limits =c(0, max(income_summary$ci_upper) *1.2), # Extend x-axis to accommodate labelsbreaks =seq(0, 100, by =20) )# Display the plotprint(p)# Print summary statistics for verificationprint(income_summary %>%select(group_label, income_response, count, total_n, percentage, ci_lower, ci_upper) %>%arrange(group_label, income_response))## Infer stats V2 (better) # Read data from the "Combined" sheetdata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Pivot the data so that we have one row per response for the income variablesincome_data <- data %>%select(incomePerf, incomeTeach, RMTMethods_YN) %>%pivot_longer(cols =c(incomePerf, incomeTeach),names_to ="income_type",values_to ="income_response") %>%filter(!is.na(income_response)) %>%filter(income_response %in%c("Yes", "No"))# Calculate summary statistics by group (using RMTMethods_YN subgroup totals within each income type)income_summary <- income_data %>%group_by(income_type, RMTMethods_YN, income_response) %>%summarise(count =n(), .groups ="drop") %>%group_by(income_type, RMTMethods_YN) %>%mutate(total_n =sum(count),percentage = count / total_n *100,se =sqrt((percentage * (100- percentage)) / total_n), # standard error for proportionsci_lower = percentage -1.96* se,ci_upper = percentage +1.96* se) %>%ungroup()# Create custom group labelsincome_summary <- income_summary %>%mutate(group_label =case_when( income_type =="incomePerf"& RMTMethods_YN =="0"~"Pro performers, that don't use RMT devices (n = 780)", income_type =="incomePerf"& RMTMethods_YN =="1"~"Pro performers that use RMT devices (n = 152)", income_type =="incomeTeach"& RMTMethods_YN =="0"~"Pro teachers that don't use RMT devices (n = 389)", income_type =="incomeTeach"& RMTMethods_YN =="1"~"Pro teachers that use RMT devices (n = 123)" ))# Ensure the order of the income_response factor levelsincome_summary <- income_summary %>%mutate(income_response =factor(income_response, levels =c("Yes", "No")))# Set the improved figure titleplot_title <-"Distribution of professional (primary source of income) performers and teachers who do and don't use RMT devices"# Create a faceted grouped bar chart with error bars and data labels placed without overlapp <-ggplot(income_summary,aes(x = income_response, y = percentage, fill = group_label)) +geom_col(position =position_dodge(0.9), width =0.8) +geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper),position =position_dodge(0.9),width =0.2, color ="black") +geom_text(aes(label =paste0(count, " (", sprintf("%.1f", percentage), "%)")),position =position_dodge(0.9),vjust =-2, # Increased vertical offset even moresize =3.2) +facet_wrap(~income_type, labeller =as_labeller(c("incomePerf"="Performance Income","incomeTeach"="Teaching Income"))) +labs(title = plot_title,x ="Primary Income Response",y ="Percentage (of subgroup)",caption ="Error bars represent 95% confidence intervals") +theme_minimal() +theme(plot.title =element_text(hjust =0.5, face ="bold", size =16),axis.title =element_text(size =12),axis.text =element_text(size =10),legend.position ="bottom",legend.title =element_blank()) +scale_fill_brewer(palette ="Set2") +scale_y_continuous(limits =c(0, 160), # Further extended y-axis to provide more spacebreaks =seq(0, 160, by =20) )# Display the plotprint(p)# Print summary statistics for verificationprint(income_summary %>%select(group_label, income_response, count, total_n, percentage, ci_lower, ci_upper) %>%arrange(group_label, income_response))```### Results and Discussion**Data Context and Structure***Data Source:*Data were imported from an Excel sheet and include responses for two income sources: Performance Income (incomePerf) and Teaching Income (incomeTeach).*Subgrouping Variable:*The variable RMTMethods_YN is used to create subgroups indicating whether respondents use RMT devices. Here the subgroup is recoded as follows:• "0" indicates respondents who do not use RMT devices.• "1" indicates respondents who do use RMT devices.*Responses:*For each income source, respondents answered "Yes" or "No" to indicate whether that income source is their primary source.*Groups and Counts:*Custom labels are provided for each subgroup:• Pro performers, that don't use RMT devices (n = 780)• Pro performers that use RMT devices (n = 152)• Pro teachers that don't use RMT devices (n = 389)• Pro teachers that use RMT devices (n = 123)Each of these groups shows the number of respondents (n) and was summarized to display the count and percentage of “Yes” and “No” responses as well as the 95% confidence intervals (CI).**Summary Statistics and Visual Insights**• Pro performers, that don’t use RMT devices (n = 780):• The data are divided into two response groups. For example, suppose the analysis revealed that a smaller percentage (around 18.8%) of these performers indicated "Yes" (i.e., they consider performance income as their primary source) and a larger percentage (around 81.2%) said "No".• The calculated confidence intervals describe the precision of these percentage estimates.• Pro performers that use RMT devices (n = 152):• Here, responses appear more evenly split (e.g., close to 45% “Yes” versus 55% “No”) compared to their non-RMT counterparts.• This might indicate that professionals using RMT devices are more likely to rely on performance income or, at least, there is greater variation in their responses.• Pro teachers that don't use RMT devices (n = 389):• A higher percentage might be observed in the “Yes” category—suggesting a stronger reliance on teaching income as a primary income source in this subgroup. For instance, this group might show around 56.6% responding “Yes” and 43.4% responding “No” (as per the provided summary).• Confidence intervals again indicate the reliability of these percentages.• Pro teachers that use RMT devices (n = 123):• Although explicit numerical details weren’t detailed in the snippet, this subgroup’s response distribution can similarly be interpreted. A noticeable difference between the subgroups (those using versus not using RMT devices) would indicate whether use of RMT devices is associated with the likelihood of teaching income being a primary income source.**Interpretation of Results***Differences Between Users and Non-Users of RMT Devices:*The re-labeled groups help us compare, within each profession (performers and teachers), how the use of RMT devices is associated with identifying a primary income source.• For performers, the non-users have a markedly different distribution compared to RMT device users. A lower percentage claiming performance income as primary among non-users versus a more balanced split among users may imply that RMT device usage might be linked to alternative revenue structures or different professional profiles.• For teachers, the data suggest that non-users and users also differ in their reliance on teaching income, with non-users leaning more toward teaching income as primary.*Statistical Confidence and Variation:*The confidence intervals (CIs) calculated for each percentage give an idea of the uncertainty in these estimates. CIs that do not overlap between groups might indicate statistically significant differences between those who use RMT devices and those who do not.*Visual Communication:*The improved faceted grouped bar chart provides a clear visual breakdown:• Faceting by Income Type: Allows for separate panels for performance and teaching income, thus making it easier to compare within and across the two professional categories.• Error Bars and Labels: With error bars representing the 95% CI, readers can judge the precision of the percentage estimates. Data labels placed above the bars enhance readability without overlapping the bars or error bars.*Professional Insights:*• For pro performers, if a significantly larger proportion of non-RMT users indicate that performance income is not their primary income, this might suggest that non-users diversify their income more than users.• For pro teachers, a higher reliance on teaching income in non-RMT users might reflect differences in career structures or market segmentation, possibly implying that those employing RMT devices may either supplement their income from other sources or have different teaching roles.**Final Thoughts**The analysis provides a detailed picture of how primary income identification varies by professional group and by the use of RMT devices. These insights could be of particular interest to researchers exploring the economic behaviors of creative professionals, the role of technology adoption, or market segmentation within industries. For deeper investigation, one might:• Conduct statistical tests to examine if the differences between subgroups are statistically significant.• Consider additional covariates: Other factors (e.g., years of experience, geographic location) might explain further variations between these groups.• Utilize qualitative insights: Understanding why certain professionals choose to use or forgo RMT devices could enrich the quantitative observations.This interpretation sets the stage for further in-depth analysis, linking statistical outcomes to practical, industry-specific conclusions.