Provide a full description and interpretation from the results of the following code. Include a breakdown of the descriptive statistics, as well as any statistical tests included in the code.
Code
# Install all required packages (if not already installed)required_packages <-c("dplyr", "ggplot2", "stats", "tidyr", "forcats", "broom","scales", "vcd", "svglite", "exact2x2", "exactci","ssanv", "testthat")# Install any missing packagesinstall.packages(setdiff(required_packages, rownames(installed.packages())))## Libraries and Directory#| echo: false#| output: falselibrary(readxl)library(dplyr)library(ggplot2)library(stats)library(tidyr)library(forcats)library(broom)library(scales)library(vcd) # For Cramer's V calculationlibrary(svglite)library(exact2x2)library(stringr)# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")
1 Overview
2 Gender
2.1 Descriptive Stats
Summary stats on gender data
Prints plot
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create gender summary# Overall, this code processes a dataset (data_combined) to summarize gender information by filtering out missing values, grouping by gender, calculating counts and percentages, adjusting specific gender labels, and finally sorting the results by count in descending order. The end result is stored in the variable 'gender_summary'gender_summary <- data_combined %>%filter(!is.na(gender)) %>%group_by(gender) %>%summarise(count =n(),percentage = (count /1558) *100,.groups ='drop' ) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Not specified", gender =="Nonbinary/gender fluid/gender non-conforming"~"Nonbinary",TRUE~ gender )) %>%arrange(desc(count))# Create the gender plot with adjusted height scalinggender_plot <-ggplot(gender_summary, aes(x =reorder(gender, count), y = count, fill = gender)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =sprintf("N=%d\(%.1f%%)", count, percentage)),vjust =-0.5, size =4) +labs(title ="Distribution of Participants by Gender",x ="Gender",y ="Number of Participants (N = 1558)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10, angle =45, hjust =1),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none",plot.margin =margin(t =20, r =20, b =20, l =20, unit ="pt") ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)), limits =c(0, max(gender_summary$count) *1.15))# Display the plotprint(gender_plot)
Table 1: Gender Distribution Summary
# A tibble: 4 × 3
gender count percentage
<chr> <int> <dbl>
1 Male 750 48.1
2 Female 725 46.5
3 Nonbinary 68 4.36
4 Not specified 15 0.963
2.2 Results and Discussion
The gender distribution shows a fairly even split between Male and Female participants, with a much smaller representation of Nonbinary, fluid, and non-conforming-gender individuals, and those who chose not to disclose.
2.3 Comparison Stats
Code
## Combine categories # Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Another approach: Combine categories with small counts# Combine "Nonbinary" and "Not specified" into "Other"data_filtered_combined <- data_combined %>%filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Other", gender =="Nonbinary/gender fluid/gender non-conforming"~"Other",TRUE~ gender ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No RMT", RMTMethods_YN ==1~"RMT" ) )# Create a new contingency table with combined categoriesgender_rmt_table_combined <-table(data_filtered_combined$gender, data_filtered_combined$RMTMethods_YN)# Print the new contingency tableprint("Contingency table with combined categories:")
[1] "Contingency table with combined categories:"
Code
print(gender_rmt_table_combined)
No RMT RMT
Female 642 83
Male 615 135
Other 73 10
Code
# Calculate expected counts for the new tableexpected_counts_combined <-chisq.test(gender_rmt_table_combined)$expectedprint("Expected counts with combined categories:")
[1] "Expected counts with combined categories:"
Code
print(expected_counts_combined)
No RMT RMT
Female 618.90244 106.09756
Male 640.24390 109.75610
Other 70.85366 12.14634
Code
# Perform chi-square test on the new tablechi_square_results_combined <-chisq.test(gender_rmt_table_combined)print("Chi-square test results with combined categories:")
[1] "Chi-square test results with combined categories:"
# Create a data frame for plotting with combined categoriesgender_rmt_df_combined <-as.data.frame(gender_rmt_table_combined)colnames(gender_rmt_df_combined) <-c("Gender", "RMTMethods_YN", "Count")gender_rmt_df_combined <- gender_rmt_df_combined %>%group_by(Gender) %>%mutate(Percentage = (Count /sum(Count)) *100)# Create the plot with combined categories#| fig.height: 7#| fig.width: 10#| out.height: "700px"#| fig.align: "center"rmt_plot_combined <-ggplot(gender_rmt_df_combined, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", Count, Percentage)), position =position_dodge(width =0.9), vjust =-0.5, size =3) +labs(title ="Gender Distribution by RMT Methods Usage (Combined Categories)",x ="RMT Methods Usage",y ="Number of Participants (N = 1558)",fill ="Gender") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20, unit ="pt") # Add more margin at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Expand y-axis to make room for labels# Display the plot with combined categoriesprint(rmt_plot_combined)
2.4 Results after combining into ‘Other’ category
Overview of the Data
The analysis examines the relationship between gender and Remote Monitoring Technology (RMT) usage across three gender categories: Female, Male, and Other. The contingency table shows the observed counts of participants in each combination of gender and RMT usage status.
Contingency Table Analysis
Observed Frequencies
Female participants: 642 do not use RMT; 83 use RMT (total: 725)
Male participants: 615 do not use RMT; 135 use RMT (total: 750)
Other gender category: 73 do not use RMT; 10 use RMT (total: 83)
The majority of individuals across all genders did not use RMT (“No RMT”).
In absolute terms, females are the largest group in the “No RMT” category (642), while males are the largest in the “RMT” category (135).
The “Other” gender group has the fewest individuals in both “RMT” (10) and “No RMT” (73) categories.
Expected Frequencies
The expected counts represent what we would expect to see if there were no relationship between gender and RMT usage (i.e., if the null hypothesis were true). These are calculated based on the assumption that RMT usage and gender are independent:
Female participants: Expected 618.90 non-users; 106.10 user
Male participants: Expected 640.24 non-users; 109.76 users
Other gender category: Expected 70.85 non-users; 12.15 users
Differences between Observed and Expected
Female participants: More non-users than expected (+23.10); fewer users than expected (-23.10)
Male participants: Fewer non-users than expected (-25.24); more users than expected (+25.24)
Other gender category: More non-users than expected (+2.15); fewer users than expected (-2.15)
Key Observations:
Under independence, females were expected to have about 618.9 individuals in “No RMT” and 106.1 in “RMT.” However, the actual numbers are 642 and 83, respectively, suggesting fewer females participated in RMT than expected.
Conversely, males were expected to have 640.2 individuals in “No RMT” and 109.8 in “RMT,” but the actual numbers are 615 and 135, suggesting more males participated in RMT than expected.
The “Other” group shows relatively small deviations from the expected counts.
Statistical Test Results
The chi-squared test evaluates whether the observed differences between the gender categories in RMT usage contingency table and the expected counts are statistically significant or could be attributed to random chance. The test outputs the following results:
Chi-square statistic (X²): 13.136
Degrees of freedom (df): 2
P-value: 0.001405
Interpretation
Statistical Significance: The p-value (0.001405) is well below the conventional threshold of 0.05, indicating a statistically significant association between gender and RMT usage. This means we can reject the null hypothesis that gender and RMT usage are independent.
Effect Direction:
Males show notably higher RMT usage than expected under the null hypothesis
Females show notably lower RMT usage than expected
The “Other” gender category shows slightly lower RMT usage than expected
Practical Significance:
Male participants are approximately 1.6 times more likely to use RMT compared to female participants (18.0% vs 11.4%)
The “Other” gender category’s RMT usage (12.0%) is closer to females than males
Strength of Association: While statistically significant, the chi-square value (13.136) suggests a modest association between gender and RMT usage in the context of the sample size (1,558 participants).
In simpler terms, there is indeed a meaningful relationship between gender and the use of RMT methods.
Limitations and Considerations
Categorical Combination: The “Other” category appears to be a combination of non-binary individuals and those who chose not to disclose their gender. This grouping, while necessary for statistical analysis due to smaller sample sizes, may obscure potential differences between these distinct groups.
Sample Size Differences: The “Other” gender category (n=83) is much smaller than the Female (n=725) and Male (n=750) categories, which affects the precision of estimates for this group.
No Control Variables: This analysis does not account for other variables that might influence RMT usage (e.g., age, technology access, health status) and might confound the relationship between gender and RMT usage.
Conclusion
There is compelling statistical evidence for a gender difference in RMT usage patterns, with males showing significantly higher adoption rates compared to females and those in the “Other” gender category. This finding may have implications for RMT implementation strategies, suggesting potential benefits from gender-specific approaches to technology adoption and support.
3 Age
3.1 Demographic Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create age summaryage_summary <- data_combined %>%filter(!is.na(age)) %>%mutate(age_group =case_when( age <20~"Under 20", age >=20& age <30~"20-29", age >=30& age <40~"30-39", age >=40& age <50~"40-49", age >=50& age <60~"50-59", age >=60~"60+" ) ) %>%group_by(age_group) %>%summarise(count =n(),percentage = (count /1558) *100,.groups ='drop' ) %>%arrange(factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))# Create the age plot with adjusted height scalingage_plot <-ggplot(age_summary, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = count, fill = age_group)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =sprintf("N=%d\(%.1f%%)", count, percentage)),vjust =-0.5, size =4) +labs(title ="Distribution of Participants by Age Group",x ="Age Group (Years)",y ="Number of Participants (N = 1558)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none",plot.margin =margin(t =20, r =20, b =20, l =20, unit ="pt") ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)), limits =c(0, max(age_summary$count) *1.15))# Display the plotprint(age_plot)
Code
# Print summary statisticsprint("Age distribution summary:")
• The age distribution shows a right-skewed pattern with the majority of participants being between 18-40 years old, and fewer participants in the higher age ranges. Gender distribution is fairly balanced between male (48.1%) and female (46.5%) participants
• The largest age group is 20-29 years (31.9% of participants)
• There’s a gradual decrease in participation with age, with the exception of a slight increase in the 60+ category compared to 50-59
3.3 Comparison Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create age and RMT usage summaryage_rmt_summary <- data_combined %>%filter(!is.na(age), !is.na(RMTMethods_YN)) %>%mutate(age_group =case_when( age <20~"Under 20", age >=20& age <30~"20-29", age >=30& age <40~"30-39", age >=40& age <50~"40-49", age >=50& age <60~"50-59", age >=60~"60+" ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No", RMTMethods_YN ==1~"Yes" ) )# Calculate total N for the analysistotal_n <-nrow(age_rmt_summary)# Create contingency tableage_rmt_table <-table(age_rmt_summary$age_group, age_rmt_summary$RMTMethods_YN)# Print the contingency tableprint("Contingency Table:")
# If any expected count is below 5, use Fisher's exact testif(min_expected <5) {print("Some expected counts are less than 5; using Fisher's exact test instead.") fisher_test_results <-fisher.test(age_rmt_table, simulate.p.value =TRUE, B =10000)print("\Fisher's exact test results:")print(fisher_test_results) main_test_results <- fisher_test_results test_name <-"Fisher's exact test" test_statistic <-NA test_df <-NA test_pvalue <- fisher_test_results$p.value} else {# Run chi-square test with simulated p-values for robustness chi_square_sim <-chisq.test(age_rmt_table, simulate.p.value =TRUE, B =10000)print("\Chi-square test with simulated p-value:")print(chi_square_sim) main_test_results <- chi_square_sim test_name <-"Chi-square test" test_statistic <- chi_square_sim$statistic test_df <- chi_square_sim$parameter test_pvalue <- chi_square_sim$p.value}
[1] "\nChi-square test with simulated p-value:"
Pearson's Chi-squared test with simulated p-value (based on 10000
replicates)
data: age_rmt_table
X-squared = 35.047, df = NA, p-value = 9.999e-05
Code
# Calculate percentages within each age groupage_rmt_summary_stats <- age_rmt_summary %>%group_by(age_group, RMTMethods_YN) %>%summarise(count =n(),.groups ='drop' ) %>%group_by(RMTMethods_YN) %>%mutate(total =sum(count),percentage = (count / total) *100 ) %>%arrange(RMTMethods_YN, factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))# Create the plot ### REPLACErmt_age_plot <-ggplot(age_rmt_summary_stats, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge") +# Adjust data label: extra vertical space by increasing vjust (e.g., -1 for more space)geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-1, size =3.5) +labs(title ="RMT Device Use by Age Group",subtitle =paste("Percentages calculated within each RMT group (Yes/No)"),x ="Age Group (Years)",y =paste("Number of Participants (Total N =", total_n, ")"),fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =40, r =20, b =20, l =20) # Increased top margin for clarity ) +# Expand the y-axis to create extra space at the top for the data labelsscale_y_continuous(expand =expansion(mult =c(0, 0.3)))# Display the plotprint(rmt_age_plot)
Code
# Calculate proportions for each RMT groupprint("\nProportions within each RMT group:")
[1] "\nProportions within each RMT group:"
Code
prop_table <-prop.table(age_rmt_table, margin =2) *100# Changed margin from 1 to 2print(round(prop_table, 2))
# Calculate the standardized residualsif(exists("chi_square_results")) { std_residuals <- chi_square_results$residualsprint("\nStandardized residuals:")print(round(std_residuals, 2))print("Cells with absolute standardized residuals > 2 contribute significantly to the chi-square statistic")}
[1] "\nStandardized residuals:"
No Yes
20-29 -0.50 1.20
30-39 -1.61 3.89
40-49 0.44 -1.06
50-59 0.58 -1.40
60+ 0.64 -1.55
Under 20 1.16 -2.79
[1] "Cells with absolute standardized residuals > 2 contribute significantly to the chi-square statistic"
Code
# Additional visualization: Create a bar chart showing the percentage of each age group within RMT usage groupspercentage_plot <-ggplot(age_rmt_summary_stats, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = percentage, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%.1f%%", percentage)),position =position_dodge(width =0.9),vjust =-0.5, size =3.5) +labs(title ="Age Distribution Within RMT Usage Groups",subtitle ="Showing percentage of each age group within 'Yes' and 'No' RMT usage categories",x ="Age Group (Years)",y ="Percentage within RMT Group (%)",fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =11, face ="italic"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the percentage plotprint(percentage_plot)
Code
# Bonferroni Correction and Visualisation# Perform pairwise chi-square tests between age groups with Bonferroni correction. This determines how strict the Bonferroni correction will be.print("\Pairwise comparisons between age groups with Bonferroni correction:")
[1] "\nPairwise comparisons between age groups with Bonferroni correction:"
Code
age_groups <-rownames(age_rmt_table)n_comparisons <-choose(length(age_groups), 2) #calculates the total number of pairwise comparisonspairwise_results <-data.frame(Group1 =character(),Group2 =character(),ChiSquare =numeric(),DF =numeric(),RawP =numeric(),CorrectedP =numeric(),Significant =character(),stringsAsFactors =FALSE)# For each pair of age groups, the code: ## Creates a 2×2 contingency table containing only those two age groups## Checks if expected cell counts are below 5## Uses either Chi-square or Fisher's exact test accordingly## Applies Bonferroni correction to control for multiple comparisons## Determines significance based on the corrected p-valuefor (i in1:(length(age_groups)-1)) {for (j in (i+1):length(age_groups)) { subset_tab <- age_rmt_table[c(i, j), ]# Check expected counts for this pair pair_expected <-chisq.test(subset_tab)$expected min_pair_expected <-min(pair_expected)# Choose appropriate test based on expected counts## Chi-square test assumes expected cell counts ≥5## When this assumption is violated, Fisher's exact test is used instead## This adaptive approach ensures statistical validity for each comparisonif(min_pair_expected <5) { pair_test <-fisher.test(subset_tab) test_stat <-NA test_df <-NA } else { pair_test <-chisq.test(subset_tab) test_stat <- pair_test$statistic test_df <- pair_test$parameter }# Apply Bonferroni correction## The original p-value is multiplied by the total number of comparisons## This makes the significance threshold more stringent to avoid false positives## The correction controls the family-wise error rate when making multiple comparisons## The min() function ensures the corrected p-value doesn't exceed 1 corrected_p <-min(pair_test$p.value * n_comparisons, 1)# Determine if the result is significant is_significant <-ifelse(corrected_p <0.05, "Yes", "No")# Add to results dataframe pairwise_results <-rbind(pairwise_results, data.frame(Group1 = age_groups[i],Group2 = age_groups[j],ChiSquare =if(is.na(test_stat)) NAelseround(test_stat, 2),DF = test_df,RawP =round(pair_test$p.value, 4),CorrectedP =round(corrected_p, 4),Significant = is_significant,stringsAsFactors =FALSE ))# Print the resultif(is.na(test_stat)) { message <-sprintf("Comparison %s vs %s: Fisher's exact test, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s", age_groups[i], age_groups[j], pair_test$p.value, corrected_p, is_significant) } else { message <-sprintf("Comparison %s vs %s: Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s", age_groups[i], age_groups[j], test_stat, test_df, pair_test$p.value, corrected_p, is_significant) }print(message) }}
[1] "Comparison 20-29 vs 30-39: Chi-square = 4.85, df = 1, raw p = 0.0277, Bonferroni corrected p = 0.4157, Significant: No"
[1] "Comparison 20-29 vs 40-49: Chi-square = 2.37, df = 1, raw p = 0.1241, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 20-29 vs 50-59: Chi-square = 3.31, df = 1, raw p = 0.0687, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 20-29 vs 60+: Chi-square = 3.91, df = 1, raw p = 0.0479, Bonferroni corrected p = 0.7192, Significant: No"
[1] "Comparison 20-29 vs Under 20: Chi-square = 10.21, df = 1, raw p = 0.0014, Bonferroni corrected p = 0.0209, Significant: Yes"
[1] "Comparison 30-39 vs 40-49: Chi-square = 10.31, df = 1, raw p = 0.0013, Bonferroni corrected p = 0.0198, Significant: Yes"
[1] "Comparison 30-39 vs 50-59: Chi-square = 10.89, df = 1, raw p = 0.0010, Bonferroni corrected p = 0.0145, Significant: Yes"
[1] "Comparison 30-39 vs 60+: Chi-square = 12.33, df = 1, raw p = 0.0004, Bonferroni corrected p = 0.0067, Significant: Yes"
[1] "Comparison 30-39 vs Under 20: Chi-square = 20.83, df = 1, raw p = 0.0000, Bonferroni corrected p = 0.0001, Significant: Yes"
[1] "Comparison 40-49 vs 50-59: Chi-square = 0.08, df = 1, raw p = 0.7777, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 40-49 vs 60+: Chi-square = 0.13, df = 1, raw p = 0.7212, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 40-49 vs Under 20: Chi-square = 2.64, df = 1, raw p = 0.1043, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 50-59 vs 60+: Chi-square = 0.00, df = 1, raw p = 1.0000, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 50-59 vs Under 20: Chi-square = 1.21, df = 1, raw p = 0.2706, Bonferroni corrected p = 1.0000, Significant: No"
[1] "Comparison 60+ vs Under 20: Chi-square = 1.19, df = 1, raw p = 0.2763, Bonferroni corrected p = 1.0000, Significant: No"
Code
# Print the pairwise results tableprint("\Summary of pairwise comparisons:")
[1] "\nSummary of pairwise comparisons:"
Code
print(pairwise_results)
Group1 Group2 ChiSquare DF RawP CorrectedP Significant
X-squared 20-29 30-39 4.85 1 0.0277 0.4157 No
X-squared1 20-29 40-49 2.37 1 0.1241 1.0000 No
X-squared2 20-29 50-59 3.31 1 0.0687 1.0000 No
X-squared3 20-29 60+ 3.91 1 0.0479 0.7192 No
X-squared4 20-29 Under 20 10.21 1 0.0014 0.0209 Yes
X-squared5 30-39 40-49 10.31 1 0.0013 0.0198 Yes
X-squared6 30-39 50-59 10.89 1 0.0010 0.0145 Yes
X-squared7 30-39 60+ 12.33 1 0.0004 0.0067 Yes
X-squared8 30-39 Under 20 20.83 1 0.0000 0.0001 Yes
X-squared9 40-49 50-59 0.08 1 0.7777 1.0000 No
X-squared10 40-49 60+ 0.13 1 0.7212 1.0000 No
X-squared11 40-49 Under 20 2.64 1 0.1043 1.0000 No
X-squared12 50-59 60+ 0.00 1 1.0000 1.0000 No
X-squared13 50-59 Under 20 1.21 1 0.2706 1.0000 No
X-squared14 60+ Under 20 1.19 1 0.2763 1.0000 No
Code
# Create a heatmap of p-values for pairwise comparisons# Prepare data for heatmap## Red cells indicate p-values close to 0 (strong evidence of difference)## Yellow indicates p-values around 0.5 (moderate evidence)## White indicates p-values close to 1 (little evidence of difference)## Asterisks (*) mark statistically significant differences (p < 0.05)## The heatmap is symmetrical because the comparison of Group1 vs Group2 is the same as Group2 vs Group1heatmap_data <-matrix(NA, nrow =length(age_groups), ncol =length(age_groups))rownames(heatmap_data) <- age_groupscolnames(heatmap_data) <- age_groupsfor(i in1:nrow(pairwise_results)) { row_idx <-which(age_groups == pairwise_results$Group1[i]) col_idx <-which(age_groups == pairwise_results$Group2[i]) heatmap_data[row_idx, col_idx] <- pairwise_results$CorrectedP[i] heatmap_data[col_idx, row_idx] <- pairwise_results$CorrectedP[i] # Mirror the matrix}# Convert to long format for ggplotheatmap_long <-as.data.frame(as.table(heatmap_data))names(heatmap_long) <-c("Group1", "Group2", "CorrectedP")# Create the heatmapheatmap_plot <-ggplot(heatmap_long, aes(x = Group1, y = Group2, fill = CorrectedP)) +geom_tile() +scale_fill_gradient2(low ="red", mid ="yellow", high ="white", midpoint =0.5, na.value ="white",limits =c(0, 1), name ="Corrected p-value") +geom_text(aes(label =ifelse(is.na(CorrectedP), "", ifelse(CorrectedP <0.05, sprintf("%.4f*", CorrectedP),sprintf("%.4f", CorrectedP)))),size =3) +labs(title ="Pairwise Comparisons of RMT Usage Between Age Groups",subtitle ="Bonferroni-corrected p-values (* indicates significant at α = 0.05)") +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1),plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10)) +coord_fixed()# Display the heatmap## When analyzing the results from this code:## Look for cells with asterisks (*) in the heatmap or "Significant: Yes" in the printed results## These indicate which specific age groups differ significantly in RMT usage## The raw p-values show the strength of evidence before correction## The corrected p-values account for multiple testing and are more conservative## Pay attention to which test was used (Chi-square or Fisher's) for each comparison## This approach enables you to identify specifically which age groups differ from each other in their RMT usage patterns, rather than just knowing that "age is associated with RMT usage" in general.print(heatmap_plot)
Code
# Additional visualization: Create a bar chart showing the percentage of RMT usage by age grouppercentage_plot <-ggplot(age_rmt_summary_stats %>%filter(RMTMethods_YN =="Yes"), aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = percentage)) +geom_bar(stat ="identity", fill ="steelblue") +geom_text(aes(label =sprintf("%.1f%%", percentage)),vjust =-0.5, size =3.5) +labs(title ="Percentage of RMT Device Usage by Age Group",x ="Age Group (Years)",y ="Percentage (%)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12) )# Display the percentage plotprint(percentage_plot)
3.4 Results and Discussion
Chi-Square Test
The chi-square test with simulated p-values (10,000 replicates) shows a statistically significant association between age group and RMT usage:
X-squared = 35.047, p-value < 0.0001
This indicates strong evidence against the null hypothesis of independence between age group and RMT device usage.
Expected Counts
All expected cell counts exceed 5, confirming the chi-square test’s validity
Minimum expected count: 25.02
Standardized Residuals
The standardized residuals help identify which specific cells contribute most to the chi-square statistic
Notable observations:
The 30-39 age group has a strong positive residual (3.89) for “Yes” responses, indicating significantly higher RMT usage than expected. The Under 20 age group has a strong negative residual (-2.79) for “Yes” responses, indicating significantly lower RMT usage than expected.
Pairwise Comparisons with Bonferroni Correction
To determine which specific age groups differ significantly from each other, pairwise chi-square tests were conducted with Bonferroni correction to control for multiple comparisons
Summary of Significant Pairwise Differences:
30-39 age group shows significantly higher RMT usage compared to: 40-49 age group (p = 0.0198), 50-59 age group (p = 0.0145), 60+ age group (p = 0.0067), Under 20 age group (p = 0.0001).
20-29 age group shows significantly higher RMT usage compared to: Under 20 age group (p = 0.0209).
No significant differences were found between: 20-29 and 30-39 age groups Any pairs among the 40-49, 50-59, 60+, and Under 20 age groups.
Visual Representation
The analysis includes three visualizations:
Bar chart of RMT usage counts by age group
Heatmap of pairwise comparison p-values
Percentage of RMT device usage by age group
Interpretation and Implications Age-Related Patterns in RMT Usage:
The 30-39 age group has the highest RMT adoption rate (23.37%). The 20-29 age group has the second highest adoption rate (16.70%). All other age groups have notably lower adoption rates (6.67% to 11.95%).
Key Findings:
There is a clear peak in RMT usage among adults in their 30s. Young adults (20-29) also show relatively high adoption. Both younger (under 20) and older (40+) age groups show significantly lower adoption rates. The 30-39 age group stands out as significantly different from most other age groups.
PracticalImplications:
Targeted interventions may be needed to increase RMT adoption among both younger and older populations. Understanding the factors that drive higher adoption in the 30-39 age group could inform strategies to increase usage in other groups. The 20-29 and 30-39 age groups could be considered as early adopters or technology champions. This comprehensive analysis provides strong statistical evidence for age-related differences in RMT device usage, with clear identification of which specific age groups differ significantly from others.
4 Instruments Played
4.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Define instrument familieswoodwinds <-c("Flute", "Piccolo", "Clarinet", "Saxophone", "Oboe", "Bassoon", "Recorder")brass <-c("Trumpet", "Trombone", "Tuba", "Euphonium", "French Horn", "French Horn/Horn")others <-c("Bagpipes", "Other")# Process instrument-level dataWI_split_updated <- data_combined %>%select(WI) %>%separate_rows(WI, sep =",") %>%mutate(WI =trimws(WI)) %>%mutate(WI =case_when( WI =="French Horn/Horn"~"French Horn", WI =="Oboe/Cor Anglais"~"Oboe",TRUE~ WI )) %>%filter(WI !="Unknown") %>%count(WI, sort =TRUE) %>%mutate(Family =case_when( WI %in%c(woodwinds, "Oboe") ~"Woodwinds", WI %in% brass ~"Brass",TRUE~"Others" ))# Calculate percentages for instrumentsWI_percentages_updated <- WI_split_updated %>%mutate(Percentage =round((n /3054) *100, 2))# Calculate family distributionfamily_distribution_updated <- WI_split_updated %>%group_by(Family) %>%summarise(Total =sum(n)) %>%mutate(Percentage =round((Total /3054) *100, 2))# Create instrument distribution plotinstrument_plot <-ggplot(data = WI_percentages_updated, aes(x =reorder(WI, n), y = n, fill = Family)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =paste0(n, " (", Percentage, "%)")), hjust =-0.1, size =3,position =position_dodge(width =1)) +coord_flip() +scale_y_continuous(expand =expansion(mult =c(0, 0.3))) +# Increased expansion for labelslabs(title ="Distribution of Wind Instruments by Count and Percentage",x ="Instrument",y ="Frequency (N=1558, responses = 3054)",caption ="Note. reported percentages are out of total responses (3054) not the total N (1558)") +theme_minimal() +theme(axis.text.y =element_text(size =10),plot.title =element_text(size =12, face ="bold"),plot.caption =element_text(size =10, hjust =0) )# Create family distribution plotfamily_plot <-ggplot(data = family_distribution_updated, aes(x =reorder(Family, -Total), y = Total, fill = Family)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =paste0(Total, "\n(", Percentage, "%)")), vjust =-0.5, size =4,position =position_dodge(width =1)) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) +# Increased expansion for labelslabs(title ="Distribution by Instrument Family",x ="Instrument Family",y ="Frequency (N=1558, responses = 3054)",caption ="Note. reported percentages are out of total responses (3054) not the total N (1558)") +theme_minimal() +theme(plot.title =element_text(size =12, face ="bold"),legend.position ="none",plot.caption =element_text(size =10, hjust =0) )# Display plotsprint(instrument_plot)
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process instrument and RMT datainstrument_rmt_data <- data_combined %>%filter(!is.na(WI), !is.na(RMTMethods_YN)) %>%separate_rows(WI, sep =",") %>%mutate(WI =trimws(WI),WI =case_when( WI =="French Horn/Horn"~"French Horn",TRUE~ WI ),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1),labels =c("No RMT", "RMT")) ) %>%filter(WI !="Unknown") %>%mutate(family =case_when( WI %in%c("Flute", "Piccolo", "Clarinet", "Saxophone", "Oboe", "Bassoon", "Recorder") ~"Woodwinds", WI %in%c("Trumpet", "Trombone", "Tuba", "Euphonium", "French Horn") ~"Brass",TRUE~"Others" ))# Calculate counts and percentages for each RMT group (Changed grouping)family_rmt_summary <- instrument_rmt_data %>%group_by(family, RMTMethods_YN) %>%summarise(count =n(), .groups ='drop') %>%group_by(RMTMethods_YN) %>%# Changed from family to RMTMethods_YNmutate(total =sum(count),percentage = (count / total) *100 )# Calculate total N for adding to the plottotal_n <-nrow(instrument_rmt_data)# Perform chi-square test on the full contingency tablecontingency_table <-table(instrument_rmt_data$family, instrument_rmt_data$RMTMethods_YN)chi_square_test <-chisq.test(contingency_table)print("Chi-square test results:")
# If any expected count is less than 5, issue a warning and switch to Fisher's exact testif(min(chi_square_test$expected) <5) {print("Chi-square test assumption violated. Switching to Fisher's exact test.") chi_square_test <-fisher.test(contingency_table)print("Fisher's exact test results:")print(chi_square_test)}# Create the plotrmt_distribution_plot <-ggplot(family_rmt_summary, aes(x = family, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge", color ="black") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-0.5, size =3) +labs(title ="Distribution of Instrument Families within RMT Usage Groups",subtitle =sprintf("Percentages calculated within each RMT group (Chi-square: χ² = %.2f, df = %d, p = %.4f)", chi_square_test$statistic, chi_square_test$parameter, chi_square_test$p.value),x ="Instrument Family",y =paste("Number of Responses (Total N =", total_n, ")"),fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20) # Add more space at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Add more space above bars# Display the plotprint(rmt_distribution_plot)
Code
# Calculate and print odds ratiosprint("\nOdds ratios between instrument families and RMT usage:")
[1] "\nOdds ratios between instrument families and RMT usage:"
Code
family_rmt_odds <- family_rmt_summary %>%select(family, RMTMethods_YN, count) %>% tidyr::pivot_wider(names_from = RMTMethods_YN, values_from = count, values_fill =list(count =0)) %>%mutate(# Add a small constant to avoid division by zeroodds_ratio =if(all(c("RMT", "No RMT") %in%names(.))) { (RMT +0.5) / (sum(RMT) +0.5) / (((`No RMT`+0.5) / (sum(`No RMT`) +0.5))) } else {NA } )print(family_rmt_odds)
# Create a new data frame with percentages within RMT groups for easier interpretationrmt_percentages <- family_rmt_summary %>%select(family, RMTMethods_YN, count, percentage) %>%pivot_wider(names_from = RMTMethods_YN, values_from =c(count, percentage),names_glue ="{RMTMethods_YN}_{.value}")print("\nPercentage distribution within each RMT group:")
[1] "\nPercentage distribution within each RMT group:"
# Perform post-hoc analysis for each pair of instrument familiesprint("\nPairwise comparisons between instrument families:")
[1] "\nPairwise comparisons between instrument families:"
Code
families <-unique(instrument_rmt_data$family)if(length(families) >1) {for(i in1:(length(families)-1)) {for(j in (i+1):length(families)) { subset_data <- instrument_rmt_data %>%filter(family %in%c(families[i], families[j]))# Create contingency table for this pair pair_table <-table(subset_data$family, subset_data$RMTMethods_YN)# Check if table has enough data for chi-square testif(all(dim(pair_table) >0) &&min(pair_table) >0) { pair_test <-chisq.test(pair_table) test_type <-"Chi-square" } else {# Use Fisher's exact test as fallback pair_test <-fisher.test(pair_table, simulate.p.value =TRUE, B =10000) test_type <-"Fisher's exact" }print(sprintf("\n%s vs %s:", families[i], families[j]))if(test_type =="Chi-square") {print(sprintf("%s test: Chi-square = %.2f, df = %d, p = %.4f", test_type, pair_test$statistic, pair_test$parameter, pair_test$p.value)) } else {print(sprintf("%s test: p = %.4f", test_type, pair_test$p.value)) } } }}
[1] "\nWoodwinds vs Others:"
[1] "Chi-square test: Chi-square = 0.01, df = 1, p = 0.9156"
[1] "\nWoodwinds vs Brass:"
[1] "Chi-square test: Chi-square = 26.37, df = 1, p = 0.0000"
[1] "\nOthers vs Brass:"
[1] "Chi-square test: Chi-square = 7.27, df = 1, p = 0.0070"
Code
# Perform pairwise comparisons between instrument families with Bonferroni correction:print("\nPairwise comparisons between instrument families (with Bonferroni correction):")
[1] "\nPairwise comparisons between instrument families (with Bonferroni correction):"
Code
families <-unique(instrument_rmt_data$family)n_comparisons <-length(families) * (length(families)-1) /2for(i in1:(length(families)-1)) {for(j in (i+1):length(families)) { subset_data <- instrument_rmt_data %>%filter(family %in%c(families[i], families[j])) pair_test <-chisq.test(table(subset_data$family, subset_data$RMTMethods_YN)) p_value_corrected <-min(pair_test$p.value * n_comparisons, 1) # Correction and capped at 1print(sprintf("\n%s vs %s:", families[i], families[j]))print(sprintf("Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f", pair_test$statistic, pair_test$parameter, pair_test$p.value, p_value_corrected)) }}
[1] "\nWoodwinds vs Others:"
[1] "Chi-square = 0.01, df = 1, raw p = 0.9156, Bonferroni corrected p = 1.0000"
[1] "\nWoodwinds vs Brass:"
[1] "Chi-square = 26.37, df = 1, raw p = 0.0000, Bonferroni corrected p = 0.0000"
[1] "\nOthers vs Brass:"
[1] "Chi-square = 7.27, df = 1, raw p = 0.0070, Bonferroni corrected p = 0.0210"
Code
# Add new visualization specifically showing the distribution within RMT groupspercentage_plot <-ggplot(family_rmt_summary, aes(x = family, y = percentage, fill = family)) +geom_bar(stat ="identity") +geom_text(aes(label =sprintf("%.1f%%", percentage)),vjust =-0.5, size =3.5) +facet_wrap(~ RMTMethods_YN, scales ="free_y") +labs(title ="Distribution of Instrument Families within Each RMT Usage Group",subtitle ="Showing percentage of each instrument family within 'RMT' and 'No RMT' groups",x ="Instrument Family",y ="Percentage within RMT Group (%)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10, face ="italic"),axis.text.x =element_text(size =10, angle =45, hjust =1),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none" )# Display the percentage plotprint(percentage_plot)
4.3 Results and Discussion
Based on the provided statistical output, there are significant differences in Respiratory Muscle Training (RMT) usage across instrument families, with brass players showing notably higher adoption rates compared to woodwind players and other instrumentalists.
Classification by Instrument Family:
Instruments are grouped into families based on predefined lists. For example:
Instruments like “Flute,” “Piccolo,” “Clarinet,” etc., are grouped as Woodwinds. Instruments like “Trumpet,” “Trombone,” “Tuba,” etc., are grouped as Brass. All others are assigned to a default category called Others.
This highly significant p-value (<0.001) indicates strong evidence of an association between instrument family and RMT usage.
Pairwise Comparisons
Woodwinds vs. Others:
Chi-square = 0.01, df = 1, p = 0.9156
Bonferroni corrected p = 1.0000
Interpretation: No significant difference in RMT usage between woodwind players and other instrumentalists.
Woodwinds vs. Brass:
Chi-square = 26.37, df = 1, p < 0.0001
Bonferroni corrected p < 0.0001
Interpretation: Highly significant difference, with brass players using RMT at a significantly higher rate than woodwind players.
Others vs. Brass:
Chi-square = 7.27, df = 1, p = 0.0070
Bonferroni corrected p = 0.0210
Interpretation: Significant difference even after correction for multiple comparisons, with brass players using RMT at a higher rate than other instrumentalists.
Key findings from the pairwise comparisons:
• Woodwinds vs. Brass: Highly significant difference (p < 0.0001)
• Others vs. Brass: Significant difference (p = 0.007)
• Woodwinds vs. Others: No significant difference (p = 0.916)
There is a statistically significant difference in RMT usage across instrument families
Brass players show significantly different patterns of RMT usage compared to both Woodwind players and Others
Woodwind players and Others show similar patterns of RMT usage
The proportion of RMT usage is highest among Brass players
X-squared = 55.92, df = 13, p-value = 2.784e-07
The plot shows the percentage distribution of instruments within each RMT group (No RMT vs RMT), with the actual counts and percentages labeled on each bar. The chi-square test confirms a significant association between instrument type and RMT usage (p < 0.001).
The Tuba, Euphonium, and Bagpipes have the highest percentages of RMT usage among instruments with at least 10 players.
The loop over instrument families now includes a Bonferroni correction to adjust p-values for multiple comparisons. The corrected p-value is calculated by multiplying the raw p-value by the number of pairwise comparisons. The output for each pair is printed with both raw and corrected p-values.
These modifications ensure that the chi-square test is valid (by checking the expected counts), and the risk of Type I error in multiple pairwise comparisons is reduced. The overall robustness of the analysis is thereby improved.
Odds Ratio Calculation:
Although the code includes a basic calculation for the odds ratios between instrument families, this step is intended to quantify the strength of the association between each family’s membership and RMT usage. A more sophisticated approach (like logistic regression) might be used for robust odds ratio calculations, but the given code calculates a simplified measure.
Calculated from raw counts:
Brass vs. Woodwinds: Brass players have approximately 1.71 times higher odds of using RMT compared to woodwind players (21.8% vs 14.0%).
Brass vs. Others: Brass players have approximately 1.64 times higher odds of using RMT compared to other instrumentalists (21.8% vs 14.5%).
Woodwinds vs. Others: Nearly identical RMT usage rates (14.0% vs 14.5%), confirming the non-significant chi-square result.
Comprehensive Interpretation
Brass instrumentalists stand out: Players of brass instruments show significantly higher adoption of RMT methods compared to both woodwind players and other instrumentalists. This difference remains significant even after applying the conservative Bonferroni correction for multiple comparisons.
Woodwinds and Others are similar: There is no meaningful difference in RMT usage between woodwind players and other instrumentalists, as evidenced by the very low chi-square value (0.01) and high p-value (0.9156).
Potential explanations: Several factors might explain the higher RMT adoption among brass players:
Specific physical demands or challenges unique to brass playing
Different pedagogical traditions or technological integration in brass education
Demographic differences between instrument families (e.g., age, gender, education level)
Potential differences in injury patterns or physical concerns across instrument families
Conclusion
Brass instrumentalists demonstrate significantly higher adoption of RMT methods compared to other musician groups. This finding may have implications for how RMT technologies are marketed, designed, or implemented across different instruments. Further research could explore the underlying reasons for this difference, which could inform more targeted approaches to technology integration in music education and performance.
5 Skill Level
5.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Extract playAbility_MAX and remove 0 responsesplot_data <- data_combined %>%filter(playAbility_MAX !=0, !is.na(playAbility_MAX)) %>%# Added NA checkmutate(playAbility_MAX =as.factor(playAbility_MAX)) %>%count(playAbility_MAX) %>%mutate(percentage = n /sum(n) *100,label =paste0(n, "\n(", sprintf("%.1f", percentage), "%)"))# Define custom labels for x-axiscustom_labels <-c("1"="Novice", "2"="Beginner", "3"="Intermediate", "4"="Advanced", "5"="Expert")# Get the actual levels present in the dataactual_levels <-levels(plot_data$playAbility_MAX)# Create Plotplayability_plot <-ggplot(plot_data, aes(x = playAbility_MAX, y = n)) +geom_bar(stat ="identity", fill ="steelblue") +geom_text(aes(label = label), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability (Max Score)",x ="Play Ability (Novice = 1 to Expert = 5)",y ="Count of Participants (N = 1558)" ) +scale_x_discrete(# Use actual levels from data rather than hardcoded valueslabels = custom_labels[actual_levels] ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the Plotprint(playability_plot)
Code
# Combined Categories# Read the datalibrary(tidyverse)library(readxl)data_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process playAbility_MAX with the new category definitionsplot_data <- data_combined %>%filter(!is.na(playAbility_MAX)) %>%mutate(# Ensure playAbility_MAX is numericplayAbility_MAX =as.numeric(playAbility_MAX),# Create the new combined categories with half-point valuesskill_level =case_when( playAbility_MAX %in%c(0, 1, 1.5, 2) ~"Novice/Beginner", playAbility_MAX %in%c(2.5, 3, 3.5) ~"Intermediate", playAbility_MAX %in%c(4, 4.5, 5) ~"Advanced/Expert",TRUE~NA_character_ ),# Order factor levelsskill_level =factor(skill_level, levels =c("Novice/Beginner", "Intermediate", "Advanced/Expert")) ) %>%# Remove any NA values from our categoriesfilter(!is.na(skill_level)) %>%# Count by skill levelgroup_by(skill_level) %>%summarise(count =n()) %>%# Calculate percentagesmutate(total =sum(count),percentage = (count / total) *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") )# Calculate total valid responses for y-axis labeltotal_n <-sum(plot_data$count)# Create the plotggplot(plot_data, aes(x = skill_level, y = count, fill = skill_level)) +geom_bar(stat ="identity", width =0.7) +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Distribution of Play Ability (Combined Categories)",x ="Play Ability Level",y =paste0("Number of Participants (N = ", total_n, ")") ) +scale_fill_manual(values =c("Novice/Beginner"="#4682B4", "Intermediate"="#6495ED", "Advanced/Expert"="#1E90FF")) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold", hjust =0.5),axis.title =element_text(size =12),axis.text =element_text(size =11),legend.position ="none",panel.grid.major.x =element_blank() ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))
Code
# Additionally, print value counts to help diagnose any issuesvalue_counts <- data_combined %>%filter(!is.na(playAbility_MAX)) %>%count(playAbility_MAX) %>%arrange(playAbility_MAX)print("Distribution of values in playAbility_MAX:")
Looking at the distribution of self-reported playing ability scores (1 = Novice to 5 = Expert) among the 1,558 participants:
• Score 4 is the most common with approximately 500 participants (~32%), indicating many consider themselves advanced players
• Score 3 is the second most frequent with about 450 participants (~29%), representing intermediate-level players
• Score 5 (Expert level) has roughly 350 participants (~22%)
• Score 2 (Beginner-intermediate) has about 200 participants (~13%)
• Score 1 (Novice) is the least common with fewer than 100 participants (~4%)
The distribution is slightly negatively skewed (skewed towards higher scores), with the majority of participants (>80%) rating their playing ability as intermediate or higher (scores 3-5). This suggests that the sample consists primarily of experienced wind instrument players, with relatively few beginners. The peak at score 4 indicates that most participants consider themselves advanced, but not expert-level players.
The plot shows the distribution of self-reported playing ability (1 = Novice to 5 = Expert) for each wind instrument. Some key observations:
• Flute shows the highest number of responses, with a right-skewed distribution (more advanced players)
• Most instruments show fewer responses overall compared to flute and clarinet
• Bagpipes and euphonium show notably fewer responses
• The warning message indicates some missing values were removed, which is expected as not all participants play all instruments
5.2.1 Combined Groups
Key Observations:
Strong Positive Skew: The distribution is heavily skewed toward the higher end of the scale, with 71.1% of participants (1,104 out of 1,558) falling in the 4.0-5.0 range.
Highest Frequencies:
Level 4.0: 484 participants (31.1%)
Level 5.0: 447 participants (28.7%)
Level 3.0: 222 participants (14.2%)
Lowest Frequencies:
Level 0.0: 1 participant (0.1%)
Level 1.5: 3 participants (0.2%)
Level 1.0: 6 participants (0.4%)
Half-Point vs. Whole-Number Ratings: Half-point ratings (1.5, 2.5, 3.5, 4.5) consistently show lower frequencies than adjacent whole numbers, suggesting raters may prefer whole numbers when assessing ability but use half points for more nuanced judgments.
This group represents a very small portion of the sample, indicating either few beginners participated or the population being studied generally has higher ability levels.
Clearly the dominant group, comprising more than two-thirds of the sample.
Outlier (0.0): 1 participant (0.1%)
This likely represents an error, a special case, or potentially a “non-player” category.
Implications and Considerations
Potential Explanations for the Skewed Distribution:
Population Characteristics: This may accurately reflect a population primarily consisting of advanced players, such as professional or experienced musicians.
Self-Selection Bias: Advanced players might be more likely to participate in studies related to their expertise or complete relevant surveys.
Self-Assessment Inflation: If self-reported, participants may tend to overestimate their abilities.
Scale Calibration: The assessment scale might not optimally differentiate skill levels in this specific population, potentially causing a ceiling effect where many advanced players cluster at the top.
Recruitment Strategy: The participant recruitment process may have targeted more experienced players.
Research and Application Considerations
Representativeness: Consider whether this distribution accurately represents the target population or reflects selection biases.
Statistical Analysis: The heavy skew toward higher ratings might influence statistical analyses that assume normal distribution.
Group Comparisons: When comparing groups (e.g., by demographics), be aware that the small sample of beginners might limit certain analyses.
Scale Refinement: For future studies, consider a more differentiated scale at the upper end to better distinguish between advanced and expert performers.
Conclusion
The data reveals a population predominantly composed of advanced/expert level players, with relatively few beginners. This distribution is important context for any further analyses or conclusions drawn from this dataset, particularly when examining factors associated with play ability or when making generalizations about the broader population of players.
5.3 Comparison Stats
Code
# Original Analysis without Combined Groups# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Data Preparation: Filter playAbility_MAX and prepare variablesdata_combined <- data_combined %>%mutate(playAbility_MAX =as.numeric(playAbility_MAX),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1), labels =c("No RMT", "RMT")) )# Filter out cases where playAbility_MAX is 0 or NAanalysis_data <- data_combined %>%filter(!is.na(playAbility_MAX), playAbility_MAX !=0, !is.na(RMTMethods_YN)) %>%mutate(# Create factor version of playAbility with proper labelsplayAbility_factor =factor(playAbility_MAX, levels =1:5,labels =c("Novice", "Beginner", "Intermediate", "Advanced", "Expert")),# Create binary variables for logistic regressionhigh_play =ifelse(playAbility_MAX >=4, 1, 0),RMT_binary =ifelse(RMTMethods_YN =="RMT", 1, 0) )# Calculate counts by playAbility and RMT usagegrouped_data <- analysis_data %>%group_by(RMTMethods_YN, playAbility_factor) %>%summarise(count =n(), .groups ="drop") %>%group_by(RMTMethods_YN) %>%mutate(percentage = count /sum(count) *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)")) %>%ungroup()# Statistical Analysis: Chi-square Test of Independencecontingency_table <-table(analysis_data$playAbility_factor, analysis_data$RMTMethods_YN)# Use simulation-based p-value calculation for chi-square testchi_test <-chisq.test(contingency_table, simulate.p.value =TRUE, B =10000)# Print the statistical resultscat("\nChi-square Test Results (Independence between playAbility and RMT Usage):\n")
Chi-square Test Results (Independence between playAbility and RMT Usage):
Code
print(chi_test)
Pearson's Chi-squared test with simulated p-value (based on 10000
replicates)
data: contingency_table
X-squared = 23.066, df = NA, p-value = 0.0005999
# Create Plot: Grouped Bar Plotplayability_group_plot <-ggplot(grouped_data, aes(x = playAbility_factor, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability by RMT Usage",x ="Play Ability Level",y =paste0("Count of Participants (N = ", nrow(analysis_data), ")"),fill ="RMT Usage" ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the grouped bar plotprint(playability_group_plot)
Code
# Logistic Regression Analysis# Run continuous modellogit_model <-glm(RMT_binary ~ playAbility_MAX, data = analysis_data, family ="binomial")# Print model summarysummary_output <-summary(logit_model)print(summary_output)
Call:
glm(formula = RMT_binary ~ playAbility_MAX, family = "binomial",
data = analysis_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.0268 0.4393 -9.166 < 2e-16 ***
playAbility_MAX 0.5417 0.1010 5.364 8.16e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1296.9 on 1556 degrees of freedom
Residual deviance: 1265.1 on 1555 degrees of freedom
AIC: 1269.1
Number of Fisher Scoring iterations: 5
Code
# Calculate odds ratios and confidence intervalsodds_ratios <-exp(coef(logit_model))conf_intervals <-exp(confint(logit_model))cat("\nOdds Ratios with 95% Confidence Intervals:\n")
# Predicted probabilities for each playing ability levelnew_data <-data.frame(playAbility_MAX =1:5)predicted_probs <-predict(logit_model, newdata = new_data, type ="response")result_df <-data.frame(playAbility_level =1:5,predicted_probability = predicted_probs)cat("\nPredicted probabilities of RMT usage by playing ability level:\n")
Predicted probabilities of RMT usage by playing ability level:
# Create Plot: Grouped Bar Plotplayability_group_plot <-ggplot(grouped_data, aes(x = playAbility_combined, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability by RMT Usage",subtitle ="Using Combined Ability Categories (0 included in Novice/Beginner)",x ="Play Ability Level",y =paste0("Count of Participants (N = ", nrow(analysis_data), ")"),fill ="RMT Usage" ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12, face ="italic"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the grouped bar plotprint(playability_group_plot)
Code
# Logistic Regression Analysis# First create a numeric version of the combined categories for the continuous modelanalysis_data <- analysis_data %>%mutate(playAbility_combined_numeric =case_when( playAbility_combined =="Novice/Beginner"~1, playAbility_combined =="Intermediate"~2, playAbility_combined =="Advanced/Expert"~3,TRUE~NA_real_ ) )# Run continuous model with the combined categorieslogit_model <-glm(RMT_binary ~ playAbility_combined_numeric, data = analysis_data, family ="binomial")# Print model summarysummary_output <-summary(logit_model)print(summary_output)
Call:
glm(formula = RMT_binary ~ playAbility_combined_numeric, family = "binomial",
data = analysis_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.0341 0.5056 -7.979 1.47e-15 ***
playAbility_combined_numeric 0.8246 0.1776 4.643 3.43e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1297.2 on 1557 degrees of freedom
Residual deviance: 1271.1 on 1556 degrees of freedom
AIC: 1275.1
Number of Fisher Scoring iterations: 5
Code
# Calculate odds ratios and confidence intervalsodds_ratios <-exp(coef(logit_model))conf_intervals <-exp(confint(logit_model))cat("\nOdds Ratios with 95% Confidence Intervals:\n")
# Predicted probabilities for each combined playing ability levelnew_data <-data.frame(playAbility_combined_numeric =1:3)predicted_probs <-predict(logit_model, newdata = new_data, type ="response")result_df <-data.frame(playAbility_level =c("Novice/Beginner", "Intermediate", "Advanced/Expert"),combined_level_value =1:3,predicted_probability = predicted_probs)cat("\nPredicted probabilities of RMT usage by combined playing ability level:\n")
Predicted probabilities of RMT usage by combined playing ability level:
# Additional analysis: Also run categorical modellogit_model_categorical <-glm(RMT_binary ~ playAbility_combined, data = analysis_data, family ="binomial")# Print categorical model summarycat("\n\nCategorical Model Results (using combined categories as factors):\n")
Categorical Model Results (using combined categories as factors):
Code
print(summary(logit_model_categorical))
Call:
glm(formula = RMT_binary ~ playAbility_combined, family = "binomial",
data = analysis_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2513 0.5257 -4.283 1.85e-05 ***
playAbility_combinedIntermediate -0.2929 0.5588 -0.524 0.600
playAbility_combinedAdvanced/Expert 0.7057 0.5316 1.328 0.184
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1297.2 on 1557 degrees of freedom
Residual deviance: 1267.8 on 1555 degrees of freedom
AIC: 1273.8
Number of Fisher Scoring iterations: 5
Code
# Calculate odds ratios for categorical modelcat("\nOdds Ratios for Categorical Model:\n")
5.4 Results and Discussion (based on combined findings)
This analysis examines the relationship between musicians’ play ability levels (grouped into three categories) and their use of Remote Monitoring Technology (RMT). The results show a statistically significant association between play ability and RMT usage, with specific patterns of usage across different skill levels.
Chi-Square Test of Independence
Test Statistic: X² = 26.337
P-value: p < 0.0001 (based on 10,000 simulated replicates)
Interpretation: There is very strong evidence of an association between play ability and RMT usage. The null hypothesis that these variables are independent can be confidently rejected.
Effect Size
Cramer’s V: 0.13
Interpretation: This represents a small to moderate association between play ability and RMT usage. While statistically significant, the strength of the relationship is not overwhelming.
Standardized Residuals Analysis
The standardized residuals reveal exactly where the significant deviations from independence occur:
Intermediate players and RMT usage:
Strong negative residual (-4.92) for “RMT”
Strong positive residual (4.92) for “No RMT”
This indicates intermediate players are significantly less likely to use RMT than expected
Advanced/Expert players and RMT usage:
Strong positive residual (5.12) for “RMT”
Strong negative residual (-5.12) for “No RMT”
This indicates advanced/expert players are significantly more likely to use RMT than expected
Logistic Regression Models
Continuous Model
Coefficient for playAbility_combined_numeric: 0.8246 (p < 0.0001)
Odds Ratio: 2.281 (95% CI: 1.632-3.280)
Interpretation: For each step up in the combined ability category (e.g., from Novice/Beginner to Intermediate), the odds of using RMT increase by a factor of 2.28.
Categorical Model
Intermediate vs. Novice/Beginner:
Odds Ratio: 0.746 (non-significant, p = 0.600)
Intermediate players have slightly lower odds of using RMT than novice/beginners, but this difference is not statistically significant
Advanced/Expert vs. Novice/Beginner:
Odds Ratio: 2.025 (non-significant, p = 0.184)
Advanced/expert players have approximately double the odds of using RMT compared to novice/beginners, but this difference does not reach statistical significance.
Predicted Probabilities
The model predicts the following probabilities of RMT usage:
Novice/Beginner: 3.9%
Intermediate: 8.4%
Advanced/Expert: 17.4%
This shows a clear increasing trend in RMT usage with higher play ability levels, with advanced/expert players having more than four times the probability of using RMT compared to novice/beginners.
Model Performance
McFadden’s Pseudo R-squared: 0.0201
Accuracy: 85.4%
Sensitivity: 0 (model fails to predict any true RMT users)
Specificity: 1 (model correctly identifies all non-RMT users)
The model has modest explanatory power and defaults to predicting “No RMT” for all observations, which yields high accuracy due to the imbalanced dataset (far more non-users than users).
The sample is heavily weighted toward advanced/expert players, with relatively few novice/beginners.
Key Insights and Implications
Skill-Technology Relationship: There is a significant positive relationship between play ability and RMT usage, with more advanced players more likely to adopt this technology.
Intermediate Player Resistance: Intermediate players show a particularly strong pattern of avoiding RMT usage, which could indicate:
Transitional learning approaches at this skill level
Concerns about technology hindering development of fundamental skills
Different teaching methodologies that don’t emphasize technology
Advanced Player Adoption: Advanced/expert players embrace RMT at higher rates, possibly because:
They use technology to refine already-developed skills
They have greater technical understanding of their instruments
They may be more involved in teaching and use RMT as pedagogical tools
Practical Applications:
Technology developers should consider targeting advanced players as early adopters
Different approaches may be needed to increase RMT adoption among intermediate players
Educational approaches might need to be tailored by skill level.
Despite the statistical significance of the relationship, the modest effect size suggests that play ability is just one of many factors influencing RMT adoption among musicians.
6 Country of Residence
6.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Calculate the total Ntotal_N <-nrow(data_combined)# Modify country names: abbreviate USA and UKdata_combined <- data_combined %>%mutate(countryLive =case_when( countryLive =="United States of America (USA)"~"USA", countryLive =="United Kingdom (UK)"~"UK",TRUE~ countryLive ))# Compute counts and percentages for the 'countryLive' columncountry_summary <- data_combined %>%group_by(countryLive) %>%summarise(count =n()) %>%ungroup() %>%mutate(percentage = count / total_N *100) %>%arrange(desc(count))# Select the top 4 countries (using the highest counts)top_countries <- country_summary %>%top_n(4, wt = count) %>%mutate(label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)"),# Reorder to display from highest to lowestcountryLive =reorder(countryLive, -count) )# Perform chi-square goodness-of-fit test for top 4 countries# Expected frequencies for equality among the 4 groupsobserved <- top_countries$countexpected <-rep(sum(observed)/length(observed), length(observed))chi_test <-chisq.test(x = observed, p =rep(1/length(observed), length(observed)))print("Chi-square goodness-of-fit test for equal distribution among top 4 countries:")
[1] "Chi-square goodness-of-fit test for equal distribution among top 4 countries:"
Code
print(chi_test)
Chi-squared test for given probabilities
data: observed
X-squared = 390.66, df = 3, p-value < 2.2e-16
Code
# Create the bar plot with counts and percentages and add the notecountry_plot <-ggplot(top_countries, aes(x = countryLive, y = count)) +geom_bar(stat ="identity", fill ="steelblue", color ="black") +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Top 4 Countries (Live)",x ="Country",y =paste0("Count of Participants (N = ", total_N, ")"),subtitle =paste0("Chi-square: ", sprintf('%.2f', chi_test$statistic), " (df = ", chi_test$parameter, "), p = ", ifelse(chi_test$p.value <0.001, "< .001", sprintf('%.3f', chi_test$p.value))),caption ="Note: Countries with frequencies less than 5% were removed from the figure.") +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),axis.title =element_text(size =12),plot.subtitle =element_text(size =12),plot.caption =element_text(size =10, hjust =0, face ="italic", margin =margin(t =15)),plot.margin =margin(b =30, t =20, r =20, l =20) # Add extra bottom margin for the note ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the plotprint(country_plot)
Code
# Print summary statisticsprint("Summary Statistics for Top 4 Countries:")
# A tibble: 4 × 3
countryLive count percentage
<fct> <int> <dbl>
1 USA 610 39.2
2 UK 358 23.0
3 Australia 326 20.9
4 Canada 91 5.84
Code
# Combined 5% values with note# Read the datalibrary(tidyverse)library(readxl)library(ggplot2)library(stringr)data_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process the countryLive columncountry_analysis <- data_combined %>%# Remove missing valuesfilter(!is.na(countryLive)) %>%# Clean country names (trim whitespace, standardize case if needed)mutate(countryLive =trimws(countryLive)) %>%# Count occurrences of each countrycount(countryLive) %>%# Calculate percentagesmutate(total =sum(n),percentage = n / total *100,# Create grouped category (countries with <5% go to "Other")country_group =if_else(percentage >=5, countryLive, "Other") ) %>%# Sort by frequencyarrange(desc(n))# Identify countries that will be in the "Other" categoryother_countries <- country_analysis %>%filter(country_group =="Other") %>%arrange(desc(n))# Create a formatted string listing all "Other" countries WITHOUT percentagesother_countries_list <- other_countries %>%pull(countryLive) %>%paste(collapse =", ")# Create data for plottingplot_data <- country_analysis %>%group_by(country_group) %>%summarise(count =sum(n),percentage =sum(percentage) ) %>%# Ensure "Other" appears at the endmutate(country_group =factor(country_group),country_group =fct_relevel(fct_reorder(country_group, percentage, .desc =TRUE), "Other", after =Inf) )# Calculate total participantstotal_participants <-sum(country_analysis$n)# Create the base note textnote_text <-paste0("Note: 'Other' includes frequencies <5%: ", other_countries_list)# Manually wrap the note text to ensure it's fully visiblewrapped_note <-str_wrap(note_text, width =100)# Create the plot with adjusted bottom margin and caption stylingcountry_plot <-ggplot(plot_data, aes(x = country_group, y = count)) +geom_bar(stat ="identity", fill ="steelblue", width =0.7) +geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_stack(vjust =0.5),vjust =-0.5, size =3.5) +labs(title ="Distribution of Participants by Country of Residence",x ="Country",y =paste0("Number of Participants (N = ", total_participants, ")"),caption = wrapped_note ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold", hjust =0.5),axis.title =element_text(size =12),axis.text.x =element_text(size =10, angle =45, hjust =1),axis.text.y =element_text(size =10),# Style the caption to ensure it's readableplot.caption =element_text(size =9, hjust =0, vjust =1,lineheight =1.2,margin =margin(t =15) ),# Add extra bottom margin for the captionplot.margin =margin(b =100, t =20, r =20, l =20) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the plotprint(country_plot)
6.2 Results and Discussion
Key observations:
• 39 counties in total
• The United States has the highest representation with 610 participants (39.2%)
• The United Kingdom follows with 358 participants (23%)
• Australia is third with 326 participants (20.9%)
• These top three countries account for approximately 83% of all participants
• The remaining countries (Canada, Italy, and New Zealand) collectively represent about 17% of participants
6.3 Comparison Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Clean country names and create RMT factordata_combined <- data_combined %>%mutate(countryLive =case_when( countryLive =="United States of America (USA)"~"USA", countryLive =="United Kingdom (UK)"~"UK",TRUE~ countryLive ),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1),labels =c("No RMT", "RMT")) )# Get top 6 countriestop_6_countries <- data_combined %>%count(countryLive) %>%top_n(6, n) %>%pull(countryLive)# Filter data for top 6 countriesdata_for_test <- data_combined %>%filter(countryLive %in% top_6_countries, !is.na(RMTMethods_YN))# Calculate group totals for each RMT grouprmt_group_totals <- data_for_test %>%group_by(RMTMethods_YN) %>%summarise(group_N =n())# Calculate statistics with percentages WITHIN each RMT group (not within country)country_rmt_stats <- data_for_test %>%group_by(RMTMethods_YN, countryLive) %>%summarise(count =n(), .groups ='drop') %>%left_join(rmt_group_totals, by ="RMTMethods_YN") %>%mutate(percentage = count / group_N *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%# Calculate total per country (for ordering in plot)group_by(countryLive) %>%mutate(total_country =sum(count)) %>%ungroup()# Create contingency table for statistical testcontingency_table <-table( data_for_test$countryLive, data_for_test$RMTMethods_YN)# Prepare legend labels with group total N includedlegend_labels <-setNames(paste0(levels(data_for_test$RMTMethods_YN), " (N = ", rmt_group_totals$group_N, ")"),levels(data_for_test$RMTMethods_YN))# Get expected frequencies without running a test yetn <-sum(contingency_table)row_sums <-rowSums(contingency_table)col_sums <-colSums(contingency_table)expected_counts <-outer(row_sums, col_sums) / n# Use Fisher's exact test to avoid chi-square approximation warningsfisher_test <-fisher.test(contingency_table, simulate.p.value =TRUE, B =10000)test_name <-"Fisher's exact test"# Print test resultscat("\n", test_name, "Results:\n", sep="")
Fisher's exact testResults:
Code
print(fisher_test)
Fisher's Exact Test for Count Data with simulated p-value (based on
10000 replicates)
data: contingency_table
p-value = 9.999e-05
alternative hypothesis: two.sided
No RMT RMT
Australia 279.91 46.09
Canada 78.13 12.87
Italy 40.35 6.65
New Zealand 27.48 4.52
UK 307.38 50.62
USA 523.75 86.25
Code
# Create grouped bar plot with percentages within RMT groupsplot <-ggplot(country_rmt_stats, aes(x =reorder(countryLive, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge",color ="black") +geom_text(aes(label = label),position =position_dodge(width =0.9),vjust =-0.5,size =3.5) +scale_fill_manual(values =c("lightblue", "steelblue"),labels = legend_labels) +labs(title ="RMT Usage by Country (Top 6)",subtitle =paste0(test_name, ": p ", ifelse(fisher_test$p.value < .001, "< .001", paste0("= ", sprintf("%.3f", fisher_test$p.value)))),x ="Country",y ="Count of Participants",fill ="RMT Usage",caption ="Note: Percentages are calculated within each RMT group, not within countries") +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),axis.title =element_text(size =12),legend.position ="top",plot.caption =element_text(hjust =0, size =10) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)) )# Display the plotprint(plot)
Code
# Calculate proportions of RMT users in each countrycountry_proportions <- data_for_test %>%group_by(countryLive) %>%summarise(total =n(),rmt_users =sum(RMTMethods_YN =="RMT"),rmt_proportion = rmt_users/total,rmt_percentage = rmt_proportion *100 ) %>%arrange(desc(rmt_proportion))cat("\nRMT Usage Proportions by Country:\n")
RMT Usage Proportions by Country:
Code
print(country_proportions)
# A tibble: 6 × 5
countryLive total rmt_users rmt_proportion rmt_percentage
<chr> <int> <int> <dbl> <dbl>
1 Australia 326 63 0.193 19.3
2 USA 610 113 0.185 18.5
3 Italy 47 8 0.170 17.0
4 Canada 91 8 0.0879 8.79
5 UK 358 14 0.0391 3.91
6 New Zealand 32 1 0.0312 3.12
Code
# Pairwise proportion tests with Bonferroni correctioncountries <-unique(country_proportions$countryLive)n_countries <-length(countries)pairwise_tests <-data.frame()for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) { country1 <- countries[i] country2 <- countries[j]# Get data for both countries data1 <- data_for_test %>%filter(countryLive == country1) data2 <- data_for_test %>%filter(countryLive == country2)# Get counts for proportion test x1 <-sum(data1$RMTMethods_YN =="RMT") x2 <-sum(data2$RMTMethods_YN =="RMT") n1 <-nrow(data1) n2 <-nrow(data2)# Skip if zero denominatorsif (n1 ==0|| n2 ==0) {next }# Create 2x2 table for test test_table <-matrix(c(x1, n1-x1, x2, n2-x2), nrow=2)# Use Fisher's exact test for all pairwise comparisons test <-fisher.test(test_table)# Store results pairwise_tests <-rbind(pairwise_tests, data.frame(country1 = country1,country2 = country2,prop1 = x1/n1,prop2 = x2/n2,diff =abs(x1/n1 - x2/n2),p_value = test$p.value,stringsAsFactors =FALSE )) }}# Apply Bonferroni correctionif (nrow(pairwise_tests) >0) { pairwise_tests$p_adjusted <-p.adjust(pairwise_tests$p_value, method ="bonferroni")cat("\nPairwise Comparisons (Bonferroni-adjusted p-values):\n")print(pairwise_tests %>%arrange(p_adjusted) %>%mutate(prop1 =sprintf("%.1f%%", prop1 *100),prop2 =sprintf("%.1f%%", prop2 *100),diff =sprintf("%.1f%%", diff *100),p_value =sprintf("%.4f", p_value),p_adjusted =sprintf("%.4f", p_adjusted) ) %>%select(country1, prop1, country2, prop2, diff, p_value, p_adjusted))} else {cat("\nNo valid pairwise comparisons could be performed.\n")}
Pairwise Comparisons (Bonferroni-adjusted p-values):
country1 prop1 country2 prop2 diff p_value p_adjusted
1 USA 18.5% UK 3.9% 14.6% 0.0000 0.0000
2 Australia 19.3% UK 3.9% 15.4% 0.0000 0.0000
3 Italy 17.0% UK 3.9% 13.1% 0.0017 0.0249
4 Australia 19.3% Canada 8.8% 10.5% 0.0178 0.2664
5 USA 18.5% Canada 8.8% 9.7% 0.0247 0.3708
6 Australia 19.3% New Zealand 3.1% 16.2% 0.0262 0.3934
7 USA 18.5% New Zealand 3.1% 15.4% 0.0292 0.4383
8 Australia 19.3% USA 18.5% 0.8% 0.7924 1.0000
9 Australia 19.3% Italy 17.0% 2.3% 0.8433 1.0000
10 USA 18.5% Italy 17.0% 1.5% 1.0000 1.0000
11 Italy 17.0% Canada 8.8% 8.2% 0.1689 1.0000
12 Italy 17.0% New Zealand 3.1% 13.9% 0.0757 1.0000
13 Canada 8.8% UK 3.9% 4.9% 0.0970 1.0000
14 Canada 8.8% New Zealand 3.1% 5.7% 0.4437 1.0000
15 UK 3.9% New Zealand 3.1% 0.8% 1.0000 1.0000
6.4 Results and Discussion
Geographical Patterns in Respiratory Muscle Training Adoption Among Wind Musicians
Overview of Findings
The analysis examines how Respiratory Muscle Training (RMT) device usage varies across different countries among wind instrument musicians. By focusing on the top six countries represented in the sample, this analysis reveals distinctive geographical patterns in RMT adoption that likely reflect broader cultural, educational, and practice-related differences.
Key Statistical Findings
The Fisher’s exact test confirms a statistically significant association between country of residence and RMT usage (p < 0.05). This indicates that the observed geographical variations in RMT adoption are not due to random chance but represent genuine differences across countries. Within each RMT group, the distribution of countries differs notably:
Country-Specific RMT Usage Patterns
The data shows notable differences in RMT adoption rates across countries:
High Adoption Countries:
Australia: 19.3% (63 out of 326 participants)
USA: 18.5% (113 out of 610 participants)
Italy: 17.0% (8 out of 47 participants)
Low Adoption Countries:
Canada: 8.8% (8 out of 91 participants)
UK: 3.9% (14 out of 358 participants)
New Zealand: 3.1% (1 out of 32 participants)
This reveals a striking pattern where participants from the UK and New Zealand are approximately 5-6 times less likely to use RMT compared to those from Australia, USA, or Italy.
Pairwise Comparisons
The Bonferroni-adjusted pairwise comparisons identify which specific country differences are statistically significant:
Significant Differences (p < 0.05 after adjustment):
USA vs. UK: 14.6 percentage point difference (p = 0.0000)
Australia vs. UK: 15.4 percentage point difference (p = 0.0000)
Italy vs. UK: 13.1 percentage point difference (p = 0.0249)
Non-Significant Differences (p > 0.05 after adjustment):
All other country comparisons, including those between high-adoption countries (Australia, USA, Italy) and between low-adoption countries (Canada, UK, New Zealand)
The Bonferroni correction, which adjusts for multiple comparisons, confirms that the UK stands out as having significantly lower RMT usage compared to Australia, USA, and Italy.
Implications and Potential Explanations
Several factors might explain these international differences in RMT adoption:
Educational and Training Approaches: Different educational systems and training methodologies may emphasize technology integration to varying degrees.
Cultural Attitudes: Cultural differences in approaches to teaching, learning, and technology adoption may influence RMT usage.
Technological Infrastructure: Differences in access to technology, internet reliability, and technological support systems might affect adoption rates.
Market Penetration: Technology companies might have targeted certain markets more aggressively or entered them earlier.
Professional Norms: Music education and performance practices may have different established norms regarding technology usage across these countries.
The Anglo countries show an interesting split, with the USA and Australia having high adoption rates while the UK and New Zealand have notably low rates. This suggests that shared language and similar cultural backgrounds do not necessarily lead to similar technology adoption patterns.
Limitations
While these findings are statistically significant, some considerations are worth noting:
Sample Size Variations: The number of participants varies considerably across countries (from 32 in New Zealand to 610 in the USA), which affects the precision of the estimates.
Limited Coverage for Italy and New Zealand: These countries have relatively small sample sizes (47 and 32 respectively), making their estimates less reliable.
Potential Confounding Variables: Factors like participant age, professional status, or educational background might vary across countries and influence the results.
Conclusion
The analysis clearly demonstrates that RMT usage varies significantly by country of residence, with particularly strong differences between high-adoption countries (Australia, USA, Italy) and the UK. These findings suggest that geographical and potentially cultural factors play an important role in technology adoption within music education and performance settings. Further research could explore the specific mechanisms behind these differences and their implications for educational policy and technology development.
7 Country of Education
7.1 Descriptive Stats
Code
# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Calculate total Ntotal_N <-nrow(data_combined)# Clean country namesdata_combined <- data_combined %>%mutate(countryEd =case_when( countryEd =="United States of America (USA)"~"USA", countryEd =="United Kingdom (UK)"~"UK",TRUE~as.character(countryEd) ) )# Identify the top 6 countries from countryEdtop_6_countryEd <- data_combined %>%count(countryEd, sort =TRUE) %>%top_n(6, n) %>%pull(countryEd)# Filter data for these top 6 countriesdata_top6_edu <- data_combined %>%filter(countryEd %in% top_6_countryEd)# Calculate statistics for plotting and analysisedu_stats <- data_top6_edu %>%count(countryEd) %>%arrange(desc(n)) %>%mutate(percentage = n /sum(n) *100,label =paste0(n, "\n(", sprintf("%.1f", percentage), "%)") )# Chi-square test for equal proportionschi_test <-chisq.test(edu_stats$n)# Create contingency table for post-hoc analysiscountries <-sort(unique(data_top6_edu$countryEd))n_countries <-length(countries)pairwise_tests <-data.frame()# Perform pairwise proportion testsfor(i in1:(n_countries-1)) {for(j in (i+1):n_countries) { country1 <- countries[i] country2 <- countries[j] count1 <- edu_stats$n[edu_stats$countryEd == country1] count2 <- edu_stats$n[edu_stats$countryEd == country2]# Perform proportion test test <-prop.test(x =c(count1, count2),n =c(sum(edu_stats$n), sum(edu_stats$n)) ) pairwise_tests <-rbind(pairwise_tests, data.frame(country1 = country1,country2 = country2,p_value = test$p.value,stringsAsFactors =FALSE )) }}# Apply Bonferroni correctionpairwise_tests$p_adjusted <-p.adjust(pairwise_tests$p_value, method ="bonferroni")# Create the plotedu_plot <-ggplot(edu_stats, aes(x =reorder(countryEd, -n), y = n)) +geom_bar(stat ="identity", fill ="steelblue", color ="black") +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Top 6 Countries of Education",subtitle =paste0("χ²(", chi_test$parameter, ") = ", sprintf("%.2f", chi_test$statistic),", p ", ifelse(chi_test$p.value < .001, "< .001", paste0("= ", sprintf("%.3f", chi_test$p.value)))),x ="Country of Education",y =paste0("Count of Participants (N = ", total_N, ")")) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Print statistical resultsprint("Chi-square Test of Equal Proportions Results:")
[1] "Chi-square Test of Equal Proportions Results:"
Code
print(chi_test)
Chi-squared test for given probabilities
data: edu_stats$n
X-squared = 1111.3, df = 5, p-value < 2.2e-16
Code
print("\nDescriptive Statistics:")
[1] "\nDescriptive Statistics:"
Code
print(edu_stats)
# A tibble: 6 × 4
countryEd n percentage label
<chr> <int> <dbl> <chr>
1 USA 620 42.2 "620\n(42.2%)"
2 UK 364 24.8 "364\n(24.8%)"
3 Australia 321 21.9 "321\n(21.9%)"
4 Canada 92 6.27 "92\n(6.3%)"
5 Italy 44 3.00 "44\n(3.0%)"
6 New Zealand 27 1.84 "27\n(1.8%)"
country1 country2 p_value p_adjusted
1 New Zealand USA 0.0000 0.0000
2 Italy USA 0.0000 0.0000
3 Canada USA 0.0000 0.0000
4 New Zealand UK 0.0000 0.0000
5 Italy UK 0.0000 0.0000
6 Australia New Zealand 0.0000 0.0000
7 Australia Italy 0.0000 0.0000
8 Canada UK 0.0000 0.0000
9 Australia Canada 0.0000 0.0000
10 Australia USA 0.0000 0.0000
11 UK USA 0.0000 0.0000
12 Canada New Zealand 0.0000 0.0000
13 Canada Italy 0.0000 0.0006
14 Italy New Zealand 0.0546 0.8186
15 Australia UK 0.0668 1.0000
Code
# Display the plotprint(edu_plot)
7.2 Results and Discussion
Distribution of Participants
United States of America (USA): Largest share with 620 participants (39.8%)
United Kingdom (UK): Second largest with 364 participants (23.4%)
Australia: Third with 321 participants (20.6%)
Canada: 92 participants (5.9%)
Italy: 44 participants (2.8%)
New Zealand: 27 participants (1.7%)
Key Observations:
The top three countries (USA, UK, and Australia) account for 83.8% of all participants’ educational background
English-speaking countries dominate the top positions, representing 91.4% of the top 6 response
There’s a significant drop between the top 3 and the remaining countries
The distribution pattern is very similar to the current country of residence data, suggesting most participants stayed in their country of education
Comparison to Current Residence:
USA: Education (620) vs Residence (610) UK: Education (364) vs Residence (358) Australia: Education (321) vs Residence (326) Canada: Education (92) vs Residence (91) Italy: Education (44) vs Residence (47) New Zealand: Education (27) vs Residence (32)
This shows relatively small differences between where participants were educated and where they currently live, indicating low international mobility after education.
7.3 Comparison Stats
Still getting a Chi-Squared Warning, but added Fisher’s
Code
library(readxl)library(dplyr)library(ggplot2)# Robust Data Preparation Functionprepare_rmt_data <-function(file_path, sheet ="Combined") {tryCatch({# Read data with standardized cleaning data_combined <-read_excel(file_path, sheet = sheet) data_cleaned <- data_combined %>%mutate(# Comprehensive country name standardizationcountryEd =case_when(grepl("United States|USA", countryEd, ignore.case =TRUE) ~"USA",grepl("United Kingdom|UK", countryEd, ignore.case =TRUE) ~"UK",TRUE~as.character(countryEd) ),# Robust RMT factor conversionRMTMethods_YN =factor(`RMTMethods_YN`, levels =c(0, 1), labels =c("No RMT", "RMT") ) )return(data_cleaned) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Advanced Statistical Analysis Functionperform_comprehensive_analysis <-function(data) {# Identify Top 6 Countries top_6_countryEd <- data %>%count(countryEd, sort =TRUE) %>%top_n(6, n) %>%pull(countryEd)# Filter data to top 6 countries data_top6_edu <- data %>%filter(countryEd %in% top_6_countryEd)# Create contingency table contingency_table <-table(data_top6_edu$countryEd, data_top6_edu$RMTMethods_YN)# Comprehensive test selection and reporting analyze_test_assumptions <-function(cont_table) {# Calculate expected frequencies chi_results <-suppressWarnings(chisq.test(cont_table)) expected_freq <- chi_results$expected# Detailed frequency checks total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Verbose reporting of frequency conditionscat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Determine most appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(cont_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction adjusted_chi_test <-chisq.test(cont_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = adjusted_chi_test$p.value,statistic = adjusted_chi_test$statistic,parameter = adjusted_chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", adjusted_chi_test$parameter) )) } }# Perform test test_results <-analyze_test_assumptions(contingency_table)# Pairwise comparisons with Fisher's exact test pairwise_comparisons <-function(cont_table) { countries <-rownames(cont_table) n_countries <-length(countries) results <-data.frame(comparison =character(),p_value =numeric(),adj_p_value =numeric(),stringsAsFactors =FALSE )for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) {# Use Fisher's exact test for all pairwise comparisons test <-fisher.test(cont_table[c(i,j),]) results <-rbind(results, data.frame(comparison =paste(countries[i], "vs", countries[j]),p_value = test$p.value,adj_p_value =NA )) } }# Bonferroni correction results$adj_p_value <-p.adjust(results$p_value, method ="bonferroni")return(results) }# Compute pairwise comparisons pairwise_results <-pairwise_comparisons(contingency_table)# Return comprehensive resultslist(test_results = test_results,pairwise_results = pairwise_results,data_top6_edu = data_top6_edu,contingency_table = contingency_table )}# Visualization Functioncreate_rmt_plot <-function(analysis_results) {# Prepare plot data plot_data <- analysis_results$data_top6_edu %>%group_by(countryEd, RMTMethods_YN) %>%summarise(count =n(), .groups ='drop') %>%group_by(countryEd) %>%mutate(total_country =sum(count),percentage = count / total_country *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%ungroup()# Compute totals for legend legend_totals <- analysis_results$data_top6_edu %>%group_by(RMTMethods_YN) %>%summarise(total =n(), .groups ='drop')# Create legend labels legend_labels <-setNames(paste0(legend_totals$RMTMethods_YN, " (N = ", legend_totals$total, ")"), legend_totals$RMTMethods_YN )# Prepare subtitle based on test type test_results <- analysis_results$test_results subtitle_text <-if (test_results$test_type =="Chi-Square with Continuity Correction") {paste0("Chi-square test: ", sprintf("χ²(%d) = %.2f", test_results$parameter, test_results$statistic),", p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) } else {paste0("Fisher's Exact Test (Monte Carlo): p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) }# Create the plotggplot(plot_data, aes(x =reorder(countryEd, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9), color ="black") +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Country of Education by RMT Usage (Top 6)",subtitle = subtitle_text,x ="Country of Education",y =paste0("Count of Participants (N = ", sum(plot_data$count), ")"),fill ="RMT Usage",caption ="Note: Percentages are out of the total N for each group" ) +scale_fill_discrete(labels = legend_labels) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),plot.caption =element_text(hjust =0, size =10) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))}# Main Execution Functionrun_rmt_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data prepared_data <-prepare_rmt_data(file_path)# Perform comprehensive analysis analysis_results <-perform_comprehensive_analysis(prepared_data)# Create visualization rmt_plot <-create_rmt_plot(analysis_results)# Print results to consolecat("Statistical Test Details:\n")cat("Test Type:", analysis_results$test_results$test_type, "\n")cat("P-value:", analysis_results$test_results$p_value, "\n\n")cat("Contingency Table:\n")print(analysis_results$contingency_table)cat("\nPost-hoc Pairwise Comparisons (Bonferroni-corrected):\n")print(analysis_results$pairwise_results)# Display the plotprint(rmt_plot)# Return results for potential further analysisreturn(analysis_results)}# Run the analysisresults <-run_rmt_analysis()
Expected Frequency Analysis:
Minimum Expected Frequency: 3.79
Cells with Expected Frequency < 5: 1 out of 12 cells ( 8.33 %)
Statistical Test Details:
Test Type: Chi-Square with Continuity Correction
P-value: 1.055118e-10
Contingency Table:
No RMT RMT
Australia 256 65
Canada 84 8
Italy 39 5
New Zealand 26 1
UK 350 14
USA 507 113
Post-hoc Pairwise Comparisons (Bonferroni-corrected):
comparison p_value adj_p_value
1 Australia vs Canada 8.592223e-03 1.288833e-01
2 Australia vs Italy 2.196714e-01 1.000000e+00
3 Australia vs New Zealand 3.842474e-02 5.763710e-01
4 Australia vs UK 7.873873e-12 1.181081e-10
5 Australia vs USA 4.826511e-01 1.000000e+00
6 Canada vs Italy 7.562204e-01 1.000000e+00
7 Canada vs New Zealand 6.820881e-01 1.000000e+00
8 Canada vs UK 6.030667e-02 9.046001e-01
9 Canada vs USA 2.481173e-02 3.721759e-01
10 Italy vs New Zealand 3.968168e-01 1.000000e+00
11 Italy vs UK 4.256371e-02 6.384556e-01
12 Italy vs USA 3.106074e-01 1.000000e+00
13 New Zealand vs UK 1.000000e+00 1.000000e+00
14 New Zealand vs USA 6.684451e-02 1.000000e+00
15 UK vs USA 3.421609e-12 5.132413e-11
Code
# Only top 4 countries shown - Trying to increase the robustness of the analyses.library(readxl)library(dplyr)library(ggplot2)library(stringr)# Robust Data Preparation Functionprepare_rmt_data <-function(file_path, sheet ="Combined") {tryCatch({# Read data with standardized cleaning data_combined <-read_excel(file_path, sheet = sheet) data_cleaned <- data_combined %>%mutate(# Comprehensive country name standardizationcountryEd =case_when(grepl("United States|USA", countryEd, ignore.case =TRUE) ~"USA",grepl("United Kingdom|UK", countryEd, ignore.case =TRUE) ~"UK",TRUE~as.character(countryEd) ),# Robust RMT factor conversionRMTMethods_YN =factor(`RMTMethods_YN`, levels =c(0, 1), labels =c("No RMT", "RMT") ) )return(data_cleaned) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Advanced Statistical Analysis Functionperform_comprehensive_analysis <-function(data) {# Get all countries and their frequencies all_countries <- data %>%filter(!is.na(countryEd)) %>%count(countryEd, sort =TRUE) %>%mutate(percentage = n /sum(n) *100)# Identify Top 4 Countries top_4_countryEd <- all_countries %>%top_n(4, n) %>%pull(countryEd)# Get excluded countries for the note excluded_countries <- all_countries %>%filter(!(countryEd %in% top_4_countryEd)) %>%mutate(country_info = countryEd) %>%pull(country_info)# Create excluded countries list as a string excluded_countries_text <-paste(excluded_countries, collapse =", ")# Filter data to top 4 countries data_top4_edu <- data %>%filter(countryEd %in% top_4_countryEd)# Create contingency table contingency_table <-table(data_top4_edu$countryEd, data_top4_edu$RMTMethods_YN)# Comprehensive test selection and reporting analyze_test_assumptions <-function(cont_table) {# Calculate expected frequencies chi_results <-suppressWarnings(chisq.test(cont_table)) expected_freq <- chi_results$expected# Detailed frequency checks total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Verbose reporting of frequency conditionscat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Determine most appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(cont_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction adjusted_chi_test <-chisq.test(cont_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = adjusted_chi_test$p.value,statistic = adjusted_chi_test$statistic,parameter = adjusted_chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", adjusted_chi_test$parameter) )) } }# Perform test test_results <-analyze_test_assumptions(contingency_table)# Pairwise comparisons with Chi-square tests (if appropriate) pairwise_comparisons <-function(cont_table) { countries <-rownames(cont_table) n_countries <-length(countries) results <-data.frame(comparison =character(),p_value =numeric(),adj_p_value =numeric(),stringsAsFactors =FALSE )for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) {# Get the 2x2 subtable subtable <- cont_table[c(i,j),]# Check assumptions for chi-square expected <-chisq.test(subtable)$expected min_expected <-min(expected)# Use appropriate test based on expected frequenciesif (min_expected <5) { test <-fisher.test(subtable) } else { test <-chisq.test(subtable, correct =TRUE) } results <-rbind(results, data.frame(comparison =paste(countries[i], "vs", countries[j]),p_value = test$p.value,adj_p_value =NA )) } }# Bonferroni correction results$adj_p_value <-p.adjust(results$p_value, method ="bonferroni")return(results) }# Compute pairwise comparisons pairwise_results <-pairwise_comparisons(contingency_table)# Return comprehensive resultslist(test_results = test_results,pairwise_results = pairwise_results,data_top4_edu = data_top4_edu,contingency_table = contingency_table,excluded_countries_text = excluded_countries_text,all_countries = all_countries )}# Visualization Functioncreate_rmt_plot <-function(analysis_results) {# Prepare plot data plot_data <- analysis_results$data_top4_edu %>%group_by(countryEd, RMTMethods_YN) %>%summarise(count =n(), .groups ='drop') %>%group_by(countryEd) %>%mutate(total_country =sum(count),percentage = count / total_country *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%ungroup()# Compute totals for legend legend_totals <- analysis_results$data_top4_edu %>%group_by(RMTMethods_YN) %>%summarise(total =n(), .groups ='drop')# Create legend labels legend_labels <-setNames(paste0(legend_totals$RMTMethods_YN, " (N = ", legend_totals$total, ")"), legend_totals$RMTMethods_YN )# Prepare subtitle based on test type test_results <- analysis_results$test_results subtitle_text <-if (test_results$test_type =="Chi-Square with Continuity Correction") {paste0("Chi-square test: ", sprintf("χ²(%d) = %.2f", test_results$parameter, test_results$statistic),", p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) } else {paste0("Fisher's Exact Test (Monte Carlo): p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) }# Create the caption with excluded countries note_text <-paste0("Note. All countries with frequencies less than 5% were excluded from the Figure, including: ", analysis_results$excluded_countries_text)# Wrap the note text for better display wrapped_note <-str_wrap(note_text, width =100)# Count how many lines are in the wrapped note for margin adjustment line_count <-str_count(wrapped_note, "\n") +1# Create the plotggplot(plot_data, aes(x =reorder(countryEd, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9), color ="black") +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Country of Education by RMT Usage (Top 4)",subtitle = subtitle_text,x ="Country of Education",y =paste0("Count of Participants (N = ", sum(plot_data$count), ")"),fill ="RMT Usage",caption = wrapped_note ) +scale_fill_discrete(labels = legend_labels) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),plot.caption =element_text(hjust =0, size =9, lineheight =1.2),# Adjust bottom margin based on the number of lines in the noteplot.margin =margin(b =10+ (line_count *12), t =20, r =20, l =20) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))}# Main Execution Functionrun_rmt_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data prepared_data <-prepare_rmt_data(file_path)# Perform comprehensive analysis analysis_results <-perform_comprehensive_analysis(prepared_data)# Create visualization rmt_plot <-create_rmt_plot(analysis_results)# Print results to consolecat("Statistical Test Details:\n")cat("Test Type:", analysis_results$test_results$test_type, "\n")cat("P-value:", analysis_results$test_results$p_value, "\n\n")cat("Contingency Table:\n")print(analysis_results$contingency_table)cat("\nPost-hoc Pairwise Comparisons (Bonferroni-corrected):\n")print(analysis_results$pairwise_results)cat("\nExcluded Countries:\n")cat(analysis_results$excluded_countries_text, "\n")# Display the plotprint(rmt_plot)# Return results for potential further analysisreturn(analysis_results)}# Run the analysisresults <-run_rmt_analysis()
Expected Frequency Analysis:
Minimum Expected Frequency: 13.17
Cells with Expected Frequency < 5: 0 out of 8 cells ( 0 %)
Statistical Test Details:
Test Type: Chi-Square with Continuity Correction
P-value: 3.247836e-11
Contingency Table:
No RMT RMT
Australia 256 65
Canada 84 8
UK 350 14
USA 507 113
Post-hoc Pairwise Comparisons (Bonferroni-corrected):
comparison p_value adj_p_value
1 Australia vs Canada 1.612145e-02 9.672870e-02
2 Australia vs UK 4.485762e-11 2.691457e-10
3 Australia vs USA 5.069555e-01 1.000000e+00
4 Canada vs UK 6.030667e-02 3.618400e-01
5 Canada vs USA 3.380404e-02 2.028243e-01
6 UK vs USA 1.586250e-10 9.517502e-10
Excluded Countries:
Italy, New Zealand, Germany, South Africa, Hungary, Albania, Argentina, Afghanistan, Czechia, Austria, Sweden, Belarus, Netherlands, Angola, Antigua and Barbuda, Armenia, Azerbaijan, Bangladesh, Belgium, Switzerland, Andorra, Bahamas, Brazil, Colombia, Denmark, Finland, Greece, Ireland, Marshall Islands, Palestine, Solomon Islands, Turkey, Uzbekistan, Vietnam
7.4 Results and Discussion
Statistical Significance
Chi-square Test: The chi-square test produced a statistic of approximately 44.25 with 7 degrees of freedom and a p-value of about 1.92×10−71.92×10−7. This highly significant result indicates that the distribution of education levels differs between the two groups (those who use RMT methods versus those who do not), i.e., the association is statistically significant.
Effect Size (Cramer’s V): Cramer’s V was calculated to be around 0.169. Although not extremely large, this value suggests a modest association between education and the use of RMT methods. In social sciences, such effect sizes are common when dealing with categorical variables; even a modest effect size can be substantively meaningful.
Detailed Group Comparison
Group Proportions: The summary statistics break down the relative frequencies of each education category for the “No” and “Yes” groups. For each education level, both counts and percentages were calculated. In the plots, you can see the proportions clearly marked with labels (which include counts and percentages), making it easy to compare how education distributions differ between the two groups.
Proportion Differences: By pivoting the grouped data, we computed the differences in percentage points between the “Yes” and “No” groups for each education level. Education levels with the largest differences indicate where the distribution deviates most from what is expected under independence. These differences can help pinpoint which education categories are driving the overall chi-square statistic.
Standardized Residuals
Interpreting Residuals: The standardized residuals provide additional insight into which specific cells of the contingency table contribute most to the chi-square result. A positive residual in a cell suggests that the observed count is higher than what is expected under the null hypothesis, whereas a negative residual indicates under representation. For example, if the “Doctorate” category shows a strongly positive residual for one group, it means that group has a higher-than-expected frequency of individuals holding a doctorate. Conversely, a negative value in another cell would point to an under representation relative to expectation.
Identifying Key Differences: Reviewing these residuals allows us to identify which education levels are statistically noteworthy. For instance, if “Doctorate” or “Bachelors degree” have notable positive or negative deviations, these categories are influencing the overall association significantly.
Visualization of Findings
Two types of plots were generated: Side-by-Side Bar Plot: This plot displays the percentage breakdown of each education category for the “No” and “Yes” groups side by side. With detailed labels that include both the count and percentage, this visualization makes it straightforward to compare each education level between groups. The subtitle also reminds the viewer of the total sample size for each group, ensuring a complete context for interpretation.
Dot/Line Plot: As an alternative, the dot/line plot connects the percentages across education categories for both groups. This plot emphasizes trends and differences in the pattern of education distribution. The connecting lines help to visualize how the relative positions of categories shift between groups. Both visualizations have been adjusted to ensure that all labels are fully visible by setting appropriate margins and enlarging plot height if necessary.
Overall Interpretation
Significant Association: The significant chi-square result confirms that the pattern of education across groups is not random. There is a statistically significant association between having a particular education level and whether someone uses RMT methods.
Magnitude and Specific Contributions: The moderate effect size (Cramer’s V ≈ 0.169) implies that while the association exists, it is of modest strength. Importantly, the examination of standardized residuals and proportion differences shines a light on which specific education categories drive this association. These insights might be used to explore further why certain education levels are more, or less, likely to be associated with RMT method usage.
Practical Implications: Depending on the context of the study (e.g., educational policies, targeted interventions, or market segmentation), these findings could inform decision-making by pinpointing demographic groups for further investigation or tailored strategies. For instance, if a particular education category is underrepresented among RMT users, it might signal potential barriers or areas for targeted improvement.
Summary
The findings indicate that country of education is significantly associated with the use of RMT methods. The statistical tests, along with the calculated effect size and detailed graphical representations, provide a robust basis for interpreting the differences across education levels. The analysis not only confirms the presence of differences but also identifies which education categories are most influential in driving these differences, offering valuable insights for further study or practical applications.
Combined Groups
This analysis examines the relationship between country of education and Respiratory Muscle Training (RMT) usage among participants. The data covers four major English-speaking countries: Australia, Canada, the UK, and the USA, with 34 additional countries excluded due to smaller sample sizes.
Statistical Significance
The chi-square test with continuity correction yielded a highly significant result (p < 0.001), indicating a strong association between country of education and RMT usage. All expected cell frequencies were above the minimum threshold of 5 (minimum = 13.17), confirming that the chi-square test was appropriate for this analysis.
RMT Usage Patterns by Country
From the contingency table, we can observe distinct patterns in RMT adoption:
Australia: 65 out of 321 participants (20.2%) use RMT
USA: 113 out of 620 participants (18.2%) use RMT
Canada: 8 out of 92 participants (8.7%) use RMT
UK: 14 out of 364 participants (3.8%) use RMT
This reveals a striking pattern where participants educated in Australia and the USA have RMT adoption rates approximately 2-5 times higher than those educated in Canada and the UK.
Post-hoc Pairwise Comparisons
The Bonferroni-corrected pairwise comparisons identify which specific country differences are statistically significant:
Significant Differences (p < 0.05 after adjustment):
Australia vs. UK (p = 2.69 × 10^-10): Australian-educated participants are significantly more likely to use RMT than UK-educated participants
USA vs. UK (p = 9.52 × 10^-10): US-educated participants are significantly more likely to use RMT than UK-educated participants
Non-Significant Differences (p > 0.05 after adjustment):
Australia vs. Canada (p = 0.097): Despite the substantial percentage difference (20.2% vs 8.7%), this comparison falls just short of significance after Bonferroni correction
Australia vs. USA (p = 1.0): Very similar RMT usage rates between these countries
Canada vs. UK (p = 0.362): Both have lower RMT usage rates
Canada vs. USA (p = 0.203): Not statistically significant after correction
Implications and Potential Explanations
Several factors might explain these international differences in RMT adoption based on education location:
Educational Approaches: Australia and the USA may emphasize technology integration in music education more heavily than the UK and Canada. This could reflect differences in curriculum design, pedagogical approaches, or resource allocation within educational institutions.
Technological Infrastructure in Educational Settings: Educational institutions in Australia and the USA might have greater access to technology resources, better digital infrastructure, or more technology-oriented training programs.
Cultural Attitudes Toward Technology in Education: Different educational systems may have varying cultural perspectives on the role of technology in learning and performance. These cultural attitudes could shape how students are taught to view technological aids.
Educational Investment in Technology: Countries may differ in their funding priorities for music education, with some investing more heavily in technological resources and training.
Different Teaching Traditions: Musical education may follow different teaching traditions and philosophies across these countries, with some more receptive to technological innovation than others.
Broader Context
The pattern observed here—with Australia and the USA having higher RMT adoption rates than the UK and Canada—suggests a potential divide in approaches to music education and technology integration across these major English-speaking countries. This divide persists despite shared language and many cultural similarities, highlighting how educational practices can diverge significantly across national boundaries.
Limitations and Considerations
Sample Size Variations: The number of participants varies considerably across countries (from 92 in Canada to 620 in the USA), affecting the precision of estimates.
Excluded Countries: 34 countries were excluded from the analysis due to small sample sizes. This extensive list includes several European countries (Germany, Italy, Netherlands, etc.) and countries from various regions worldwide, limiting the global representativeness of the findings.
Historical Context: The data doesn’t indicate when participants were educated in these countries, so the findings may reflect historical differences in educational approaches across different time periods.
Education vs. Residence: This analysis focuses on country of education rather than current residence, which may provide insights into formative influences rather than current environmental factors affecting RMT usage.
Conclusion
The analysis provides strong evidence that country of education is significantly associated with RMT usage, with particularly pronounced differences between Australia/USA and the UK. These findings suggest that educational systems and approaches play an important role in shaping technology adoption practices among musicians. Understanding these patterns could help inform international educational policy, curriculum development, and technology integration strategies in music education.
Source Code
---title: "Demographic Analysis of Wind Instrument Musicians and RMT Device Usage"author: "Sarah Morris"date: "2025-03-04"format: html: toc: true toc-depth: 2 toc-title: "Table of Contents" toc-location: right number-sections: true theme: cosmo code-fold: true code-tools: true highlight-style: githubexecute: echo: true warning: false error: false---Provide a full description and interpretation from the results of the following code. Include a breakdown of the descriptive statistics, as well as any statistical tests included in the code.```{r}# Install all required packages (if not already installed)required_packages <-c("dplyr", "ggplot2", "stats", "tidyr", "forcats", "broom","scales", "vcd", "svglite", "exact2x2", "exactci","ssanv", "testthat")# Install any missing packagesinstall.packages(setdiff(required_packages, rownames(installed.packages())))## Libraries and Directory#| echo: false#| output: falselibrary(readxl)library(dplyr)library(ggplot2)library(stats)library(tidyr)library(forcats)library(broom)library(scales)library(vcd) # For Cramer's V calculationlibrary(svglite)library(exact2x2)library(stringr)# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")```# Overview# Gender## Descriptive Stats1. Summary stats on gender data2. Prints plot```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create gender summary# Overall, this code processes a dataset (data_combined) to summarize gender information by filtering out missing values, grouping by gender, calculating counts and percentages, adjusting specific gender labels, and finally sorting the results by count in descending order. The end result is stored in the variable 'gender_summary'gender_summary <- data_combined %>%filter(!is.na(gender)) %>%group_by(gender) %>%summarise(count =n(),percentage = (count /1558) *100,.groups ='drop' ) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Not specified", gender =="Nonbinary/gender fluid/gender non-conforming"~"Nonbinary",TRUE~ gender )) %>%arrange(desc(count))# Create the gender plot with adjusted height scalinggender_plot <-ggplot(gender_summary, aes(x =reorder(gender, count), y = count, fill = gender)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =sprintf("N=%d\(%.1f%%)", count, percentage)),vjust =-0.5, size =4) +labs(title ="Distribution of Participants by Gender",x ="Gender",y ="Number of Participants (N = 1558)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10, angle =45, hjust =1),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none",plot.margin =margin(t =20, r =20, b =20, l =20, unit ="pt") ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)), limits =c(0, max(gender_summary$count) *1.15))# Display the plotprint(gender_plot)``````{r}#| echo: false#| tbl-cap: "Gender Distribution Summary"#| label: tbl-gender-summaryprint(gender_summary)```## Results and DiscussionThe gender distribution shows a fairly even split between Male and Female participants, with a much smaller representation of Nonbinary, fluid, and non-conforming-gender individuals, and those who chose not to disclose.## Comparison Stats```{r}## Combine categories # Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Another approach: Combine categories with small counts# Combine "Nonbinary" and "Not specified" into "Other"data_filtered_combined <- data_combined %>%filter(!is.na(gender), !is.na(RMTMethods_YN)) %>%mutate(gender =case_when( gender =="Choose not to disclose"~"Other", gender =="Nonbinary/gender fluid/gender non-conforming"~"Other",TRUE~ gender ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No RMT", RMTMethods_YN ==1~"RMT" ) )# Create a new contingency table with combined categoriesgender_rmt_table_combined <-table(data_filtered_combined$gender, data_filtered_combined$RMTMethods_YN)# Print the new contingency tableprint("Contingency table with combined categories:")print(gender_rmt_table_combined)# Calculate expected counts for the new tableexpected_counts_combined <-chisq.test(gender_rmt_table_combined)$expectedprint("Expected counts with combined categories:")print(expected_counts_combined)# Perform chi-square test on the new tablechi_square_results_combined <-chisq.test(gender_rmt_table_combined)print("Chi-square test results with combined categories:")print(chi_square_results_combined)# Create a data frame for plotting with combined categoriesgender_rmt_df_combined <-as.data.frame(gender_rmt_table_combined)colnames(gender_rmt_df_combined) <-c("Gender", "RMTMethods_YN", "Count")gender_rmt_df_combined <- gender_rmt_df_combined %>%group_by(Gender) %>%mutate(Percentage = (Count /sum(Count)) *100)# Create the plot with combined categories#| fig.height: 7#| fig.width: 10#| out.height: "700px"#| fig.align: "center"rmt_plot_combined <-ggplot(gender_rmt_df_combined, aes(x = RMTMethods_YN, y = Count, fill = Gender)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", Count, Percentage)), position =position_dodge(width =0.9), vjust =-0.5, size =3) +labs(title ="Gender Distribution by RMT Methods Usage (Combined Categories)",x ="RMT Methods Usage",y ="Number of Participants (N = 1558)",fill ="Gender") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20, unit ="pt") # Add more margin at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Expand y-axis to make room for labels# Display the plot with combined categoriesprint(rmt_plot_combined)```## Results after combining into 'Other' category**Overview of the Data**The analysis examines the relationship between gender and Remote Monitoring Technology (RMT) usage across three gender categories: Female, Male, and Other. The contingency table shows the observed counts of participants in each combination of gender and RMT usage status.**Contingency Table Analysis***Observed Frequencies*- **Female participants**: 642 do not use RMT; 83 use RMT (total: 725)- **Male participants**: 615 do not use RMT; 135 use RMT (total: 750)- **Other gender category**: 73 do not use RMT; 10 use RMT (total: 83)- **Column totals**: 1,330 non-RMT users; 228 RMT users (overall total: 1,558)**Usage Rates by Gender**- Females: 11.4% use RMT (83 out of 725)- Males: 18.0% use RMT (135 out of 750)- Other: 12.0% use RMT (10 out of 83)**Key Observations**- The majority of individuals across all genders did not use RMT ("No RMT").- In absolute terms, females are the largest group in the "No RMT" category (642), while males are the largest in the "RMT" category (135).- The "Other" gender group has the fewest individuals in both "RMT" (10) and "No RMT" (73) categories.**Expected Frequencies**The expected counts represent what we would expect to see if there were no relationship between gender and RMT usage (i.e., if the null hypothesis were true). These are calculated based on the assumption that RMT usage and gender are independent:- **Female participants**: Expected 618.90 non-users; 106.10 user- **Male participants**: Expected 640.24 non-users; 109.76 users- **Other gender category**: Expected 70.85 non-users; 12.15 users**Differences between Observed and Expected**- **Female participants**: More non-users than expected (+23.10); fewer users than expected (-23.10)- **Male participants**: Fewer non-users than expected (-25.24); more users than expected (+25.24)- **Other gender category**: More non-users than expected (+2.15); fewer users than expected (-2.15)**Key Observations:**- Under independence, females were expected to have about 618.9 individuals in "No RMT" and 106.1 in "RMT." However, the actual numbers are 642 and 83, respectively, suggesting fewer females participated in RMT than expected.- Conversely, males were expected to have 640.2 individuals in "No RMT" and 109.8 in "RMT," but the actual numbers are 615 and 135, suggesting more males participated in RMT than expected.- The "Other" group shows relatively small deviations from the expected counts.**Statistical Test Results**The chi-squared test evaluates whether the observed differences between the gender categories in RMT usage contingency table and the expected counts are statistically significant or could be attributed to random chance. The test outputs the following results:- **Chi-square statistic (X²)**: 13.136- **Degrees of freedom (df)**: 2- **P-value**: 0.001405**Interpretation**1. **Statistical Significance**: The p-value (0.001405) is well below the conventional threshold of 0.05, indicating a statistically significant association between gender and RMT usage. This means we can reject the null hypothesis that gender and RMT usage are independent.2. **Effect Direction**: - Males show notably higher RMT usage than expected under the null hypothesis - Females show notably lower RMT usage than expected - The "Other" gender category shows slightly lower RMT usage than expected3. **Practical Significance**: - Male participants are approximately 1.6 times more likely to use RMT compared to female participants (18.0% vs 11.4%) - The "Other" gender category's RMT usage (12.0%) is closer to females than males4. **Strength of Association**: While statistically significant, the chi-square value (13.136) suggests a modest association between gender and RMT usage in the context of the sample size (1,558 participants).5. In simpler terms, there is indeed a meaningful relationship between gender and the use of RMT methods.**Limitations and Considerations**1. **Categorical Combination**: The "Other" category appears to be a combination of non-binary individuals and those who chose not to disclose their gender. This grouping, while necessary for statistical analysis due to smaller sample sizes, may obscure potential differences between these distinct groups.2. **Sample Size Differences**: The "Other" gender category (n=83) is much smaller than the Female (n=725) and Male (n=750) categories, which affects the precision of estimates for this group.3. **No Control Variables**: This analysis does not account for other variables that might influence RMT usage (e.g., age, technology access, health status) and might confound the relationship between gender and RMT usage.**Conclusion**There is compelling statistical evidence for a gender difference in RMT usage patterns, with males showing significantly higher adoption rates compared to females and those in the "Other" gender category. This finding may have implications for RMT implementation strategies, suggesting potential benefits from gender-specific approaches to technology adoption and support.# Age## Demographic Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create age summaryage_summary <- data_combined %>%filter(!is.na(age)) %>%mutate(age_group =case_when( age <20~"Under 20", age >=20& age <30~"20-29", age >=30& age <40~"30-39", age >=40& age <50~"40-49", age >=50& age <60~"50-59", age >=60~"60+" ) ) %>%group_by(age_group) %>%summarise(count =n(),percentage = (count /1558) *100,.groups ='drop' ) %>%arrange(factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))# Create the age plot with adjusted height scalingage_plot <-ggplot(age_summary, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = count, fill = age_group)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =sprintf("N=%d\(%.1f%%)", count, percentage)),vjust =-0.5, size =4) +labs(title ="Distribution of Participants by Age Group",x ="Age Group (Years)",y ="Number of Participants (N = 1558)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none",plot.margin =margin(t =20, r =20, b =20, l =20, unit ="pt") ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)), limits =c(0, max(age_summary$count) *1.15))# Display the plotprint(age_plot)# Print summary statisticsprint("Age distribution summary:")print(age_summary)```## Results and DiscussionAge: M = 37, SD = 16, 18-94, Median 32.5yo• The age distribution shows a right-skewed pattern with the majority of participants being between 18-40 years old, and fewer participants in the higher age ranges. Gender distribution is fairly balanced between male (48.1%) and female (46.5%) participants• The largest age group is 20-29 years (31.9% of participants)• There's a gradual decrease in participation with age, with the exception of a slight increase in the 60+ category compared to 50-59## Comparison Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Create age and RMT usage summaryage_rmt_summary <- data_combined %>%filter(!is.na(age), !is.na(RMTMethods_YN)) %>%mutate(age_group =case_when( age <20~"Under 20", age >=20& age <30~"20-29", age >=30& age <40~"30-39", age >=40& age <50~"40-49", age >=50& age <60~"50-59", age >=60~"60+" ),RMTMethods_YN =case_when( RMTMethods_YN ==0~"No", RMTMethods_YN ==1~"Yes" ) )# Calculate total N for the analysistotal_n <-nrow(age_rmt_summary)# Create contingency tableage_rmt_table <-table(age_rmt_summary$age_group, age_rmt_summary$RMTMethods_YN)# Print the contingency tableprint("Contingency Table:")print(age_rmt_table)# Modified code with simulation-based p-valuechi_square_results <-chisq.test(age_rmt_table, simulate.p.value =TRUE, B =10000)expected_counts <- chi_square_results$expectedprint("\Expected Counts:")print(round(expected_counts, 2))min_expected <-min(expected_counts)print(sprintf("\Minimum expected count: %.2f", min_expected))# If any expected count is below 5, use Fisher's exact testif(min_expected <5) {print("Some expected counts are less than 5; using Fisher's exact test instead.") fisher_test_results <-fisher.test(age_rmt_table, simulate.p.value =TRUE, B =10000)print("\Fisher's exact test results:")print(fisher_test_results) main_test_results <- fisher_test_results test_name <-"Fisher's exact test" test_statistic <-NA test_df <-NA test_pvalue <- fisher_test_results$p.value} else {# Run chi-square test with simulated p-values for robustness chi_square_sim <-chisq.test(age_rmt_table, simulate.p.value =TRUE, B =10000)print("\Chi-square test with simulated p-value:")print(chi_square_sim) main_test_results <- chi_square_sim test_name <-"Chi-square test" test_statistic <- chi_square_sim$statistic test_df <- chi_square_sim$parameter test_pvalue <- chi_square_sim$p.value}# Calculate percentages within each age groupage_rmt_summary_stats <- age_rmt_summary %>%group_by(age_group, RMTMethods_YN) %>%summarise(count =n(),.groups ='drop' ) %>%group_by(RMTMethods_YN) %>%mutate(total =sum(count),percentage = (count / total) *100 ) %>%arrange(RMTMethods_YN, factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")))# Create the plot ### REPLACErmt_age_plot <-ggplot(age_rmt_summary_stats, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge") +# Adjust data label: extra vertical space by increasing vjust (e.g., -1 for more space)geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-1, size =3.5) +labs(title ="RMT Device Use by Age Group",subtitle =paste("Percentages calculated within each RMT group (Yes/No)"),x ="Age Group (Years)",y =paste("Number of Participants (Total N =", total_n, ")"),fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =40, r =20, b =20, l =20) # Increased top margin for clarity ) +# Expand the y-axis to create extra space at the top for the data labelsscale_y_continuous(expand =expansion(mult =c(0, 0.3)))# Display the plotprint(rmt_age_plot)# Calculate proportions for each RMT groupprint("\nProportions within each RMT group:")prop_table <-prop.table(age_rmt_table, margin =2) *100# Changed margin from 1 to 2print(round(prop_table, 2))# Calculate the standardized residualsif(exists("chi_square_results")) { std_residuals <- chi_square_results$residualsprint("\nStandardized residuals:")print(round(std_residuals, 2))print("Cells with absolute standardized residuals > 2 contribute significantly to the chi-square statistic")}# Additional visualization: Create a bar chart showing the percentage of each age group within RMT usage groupspercentage_plot <-ggplot(age_rmt_summary_stats, aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = percentage, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge") +geom_text(aes(label =sprintf("%.1f%%", percentage)),position =position_dodge(width =0.9),vjust =-0.5, size =3.5) +labs(title ="Age Distribution Within RMT Usage Groups",subtitle ="Showing percentage of each age group within 'Yes' and 'No' RMT usage categories",x ="Age Group (Years)",y ="Percentage within RMT Group (%)",fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =11, face ="italic"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the percentage plotprint(percentage_plot)``````{r}# Bonferroni Correction and Visualisation# Perform pairwise chi-square tests between age groups with Bonferroni correction. This determines how strict the Bonferroni correction will be.print("\Pairwise comparisons between age groups with Bonferroni correction:")age_groups <-rownames(age_rmt_table)n_comparisons <-choose(length(age_groups), 2) #calculates the total number of pairwise comparisonspairwise_results <-data.frame(Group1 =character(),Group2 =character(),ChiSquare =numeric(),DF =numeric(),RawP =numeric(),CorrectedP =numeric(),Significant =character(),stringsAsFactors =FALSE)# For each pair of age groups, the code: ## Creates a 2×2 contingency table containing only those two age groups## Checks if expected cell counts are below 5## Uses either Chi-square or Fisher's exact test accordingly## Applies Bonferroni correction to control for multiple comparisons## Determines significance based on the corrected p-valuefor (i in1:(length(age_groups)-1)) {for (j in (i+1):length(age_groups)) { subset_tab <- age_rmt_table[c(i, j), ]# Check expected counts for this pair pair_expected <-chisq.test(subset_tab)$expected min_pair_expected <-min(pair_expected)# Choose appropriate test based on expected counts## Chi-square test assumes expected cell counts ≥5## When this assumption is violated, Fisher's exact test is used instead## This adaptive approach ensures statistical validity for each comparisonif(min_pair_expected <5) { pair_test <-fisher.test(subset_tab) test_stat <-NA test_df <-NA } else { pair_test <-chisq.test(subset_tab) test_stat <- pair_test$statistic test_df <- pair_test$parameter }# Apply Bonferroni correction## The original p-value is multiplied by the total number of comparisons## This makes the significance threshold more stringent to avoid false positives## The correction controls the family-wise error rate when making multiple comparisons## The min() function ensures the corrected p-value doesn't exceed 1 corrected_p <-min(pair_test$p.value * n_comparisons, 1)# Determine if the result is significant is_significant <-ifelse(corrected_p <0.05, "Yes", "No")# Add to results dataframe pairwise_results <-rbind(pairwise_results, data.frame(Group1 = age_groups[i],Group2 = age_groups[j],ChiSquare =if(is.na(test_stat)) NAelseround(test_stat, 2),DF = test_df,RawP =round(pair_test$p.value, 4),CorrectedP =round(corrected_p, 4),Significant = is_significant,stringsAsFactors =FALSE ))# Print the resultif(is.na(test_stat)) { message <-sprintf("Comparison %s vs %s: Fisher's exact test, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s", age_groups[i], age_groups[j], pair_test$p.value, corrected_p, is_significant) } else { message <-sprintf("Comparison %s vs %s: Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f, Significant: %s", age_groups[i], age_groups[j], test_stat, test_df, pair_test$p.value, corrected_p, is_significant) }print(message) }}# Print the pairwise results tableprint("\Summary of pairwise comparisons:")print(pairwise_results)# Create a heatmap of p-values for pairwise comparisons# Prepare data for heatmap## Red cells indicate p-values close to 0 (strong evidence of difference)## Yellow indicates p-values around 0.5 (moderate evidence)## White indicates p-values close to 1 (little evidence of difference)## Asterisks (*) mark statistically significant differences (p < 0.05)## The heatmap is symmetrical because the comparison of Group1 vs Group2 is the same as Group2 vs Group1heatmap_data <-matrix(NA, nrow =length(age_groups), ncol =length(age_groups))rownames(heatmap_data) <- age_groupscolnames(heatmap_data) <- age_groupsfor(i in1:nrow(pairwise_results)) { row_idx <-which(age_groups == pairwise_results$Group1[i]) col_idx <-which(age_groups == pairwise_results$Group2[i]) heatmap_data[row_idx, col_idx] <- pairwise_results$CorrectedP[i] heatmap_data[col_idx, row_idx] <- pairwise_results$CorrectedP[i] # Mirror the matrix}# Convert to long format for ggplotheatmap_long <-as.data.frame(as.table(heatmap_data))names(heatmap_long) <-c("Group1", "Group2", "CorrectedP")# Create the heatmapheatmap_plot <-ggplot(heatmap_long, aes(x = Group1, y = Group2, fill = CorrectedP)) +geom_tile() +scale_fill_gradient2(low ="red", mid ="yellow", high ="white", midpoint =0.5, na.value ="white",limits =c(0, 1), name ="Corrected p-value") +geom_text(aes(label =ifelse(is.na(CorrectedP), "", ifelse(CorrectedP <0.05, sprintf("%.4f*", CorrectedP),sprintf("%.4f", CorrectedP)))),size =3) +labs(title ="Pairwise Comparisons of RMT Usage Between Age Groups",subtitle ="Bonferroni-corrected p-values (* indicates significant at α = 0.05)") +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1),plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10)) +coord_fixed()# Display the heatmap## When analyzing the results from this code:## Look for cells with asterisks (*) in the heatmap or "Significant: Yes" in the printed results## These indicate which specific age groups differ significantly in RMT usage## The raw p-values show the strength of evidence before correction## The corrected p-values account for multiple testing and are more conservative## Pay attention to which test was used (Chi-square or Fisher's) for each comparison## This approach enables you to identify specifically which age groups differ from each other in their RMT usage patterns, rather than just knowing that "age is associated with RMT usage" in general.print(heatmap_plot)# Additional visualization: Create a bar chart showing the percentage of RMT usage by age grouppercentage_plot <-ggplot(age_rmt_summary_stats %>%filter(RMTMethods_YN =="Yes"), aes(x =factor(age_group, levels =c("Under 20", "20-29", "30-39", "40-49", "50-59", "60+")), y = percentage)) +geom_bar(stat ="identity", fill ="steelblue") +geom_text(aes(label =sprintf("%.1f%%", percentage)),vjust =-0.5, size =3.5) +labs(title ="Percentage of RMT Device Usage by Age Group",x ="Age Group (Years)",y ="Percentage (%)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12) )# Display the percentage plotprint(percentage_plot)```## Results and Discussion*Chi-Square Test*The chi-square test with simulated p-values (10,000 replicates) shows a statistically significant association between age group and RMT usage:X-squared = 35.047, p-value \< 0.0001This indicates strong evidence against the null hypothesis of independence between age group and RMT device usage.*Expected Counts*All expected cell counts exceed 5, confirming the chi-square test's validityMinimum expected count: 25.02*Standardized Residuals*The standardized residuals help identify which specific cells contribute most to the chi-square statistic**Notable observations:**The 30-39 age group has a strong positive residual (3.89) for "Yes" responses, indicating significantly higher RMT usage than expected. The Under 20 age group has a strong negative residual (-2.79) for "Yes" responses, indicating significantly lower RMT usage than expected.**Pairwise Comparisons with Bonferroni Correction**To determine which specific age groups differ significantly from each other, pairwise chi-square tests were conducted with Bonferroni correction to control for multiple comparisons**Summary of Significant Pairwise Differences:**30-39 age group shows significantly higher RMT usage compared to: 40-49 age group (p = 0.0198), 50-59 age group (p = 0.0145), 60+ age group (p = 0.0067), Under 20 age group (p = 0.0001).20-29 age group shows significantly higher RMT usage compared to: Under 20 age group (p = 0.0209).No significant differences were found between: 20-29 and 30-39 age groups Any pairs among the 40-49, 50-59, 60+, and Under 20 age groups.**Visual Representation**The analysis includes three visualizations:1. Bar chart of RMT usage counts by age group2. Heatmap of pairwise comparison p-values3. Percentage of RMT device usage by age group**Interpretation and Implications Age-Related Patterns in RMT Usage:**The 30-39 age group has the highest RMT adoption rate (23.37%). The 20-29 age group has the second highest adoption rate (16.70%). All other age groups have notably lower adoption rates (6.67% to 11.95%).**Key Findings:**There is a clear peak in RMT usage among adults in their 30s. Young adults (20-29) also show relatively high adoption. Both younger (under 20) and older (40+) age groups show significantly lower adoption rates. The 30-39 age group stands out as significantly different from most other age groups.**Practical** **Implications:**Targeted interventions may be needed to increase RMT adoption among both younger and older populations. Understanding the factors that drive higher adoption in the 30-39 age group could inform strategies to increase usage in other groups. The 20-29 and 30-39 age groups could be considered as early adopters or technology champions. This comprehensive analysis provides strong statistical evidence for age-related differences in RMT device usage, with clear identification of which specific age groups differ significantly from others.# Instruments Played## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Define instrument familieswoodwinds <-c("Flute", "Piccolo", "Clarinet", "Saxophone", "Oboe", "Bassoon", "Recorder")brass <-c("Trumpet", "Trombone", "Tuba", "Euphonium", "French Horn", "French Horn/Horn")others <-c("Bagpipes", "Other")# Process instrument-level dataWI_split_updated <- data_combined %>%select(WI) %>%separate_rows(WI, sep =",") %>%mutate(WI =trimws(WI)) %>%mutate(WI =case_when( WI =="French Horn/Horn"~"French Horn", WI =="Oboe/Cor Anglais"~"Oboe",TRUE~ WI )) %>%filter(WI !="Unknown") %>%count(WI, sort =TRUE) %>%mutate(Family =case_when( WI %in%c(woodwinds, "Oboe") ~"Woodwinds", WI %in% brass ~"Brass",TRUE~"Others" ))# Calculate percentages for instrumentsWI_percentages_updated <- WI_split_updated %>%mutate(Percentage =round((n /3054) *100, 2))# Calculate family distributionfamily_distribution_updated <- WI_split_updated %>%group_by(Family) %>%summarise(Total =sum(n)) %>%mutate(Percentage =round((Total /3054) *100, 2))# Create instrument distribution plotinstrument_plot <-ggplot(data = WI_percentages_updated, aes(x =reorder(WI, n), y = n, fill = Family)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =paste0(n, " (", Percentage, "%)")), hjust =-0.1, size =3,position =position_dodge(width =1)) +coord_flip() +scale_y_continuous(expand =expansion(mult =c(0, 0.3))) +# Increased expansion for labelslabs(title ="Distribution of Wind Instruments by Count and Percentage",x ="Instrument",y ="Frequency (N=1558, responses = 3054)",caption ="Note. reported percentages are out of total responses (3054) not the total N (1558)") +theme_minimal() +theme(axis.text.y =element_text(size =10),plot.title =element_text(size =12, face ="bold"),plot.caption =element_text(size =10, hjust =0) )# Create family distribution plotfamily_plot <-ggplot(data = family_distribution_updated, aes(x =reorder(Family, -Total), y = Total, fill = Family)) +geom_bar(stat ="identity", color ="black") +geom_text(aes(label =paste0(Total, "\n(", Percentage, "%)")), vjust =-0.5, size =4,position =position_dodge(width =1)) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) +# Increased expansion for labelslabs(title ="Distribution by Instrument Family",x ="Instrument Family",y ="Frequency (N=1558, responses = 3054)",caption ="Note. reported percentages are out of total responses (3054) not the total N (1558)") +theme_minimal() +theme(plot.title =element_text(size =12, face ="bold"),legend.position ="none",plot.caption =element_text(size =10, hjust =0) )# Display plotsprint(instrument_plot)print(family_plot)# Print distributionsprint("Instrument Distribution:")print(WI_percentages_updated)print("\nFamily Distribution:")print(family_distribution_updated)```### Results and Discussion• Woodwinds represented a slight majority of 58.02% of the total instruments.• Brass accounts for 31.99%, with the updated "French Horn" category included.• Others and Unknown categories make up smaller portions (5.04% and 4.94%, respectively).*Individual Instrument Distribution:*• Saxophone remains the most popular instrument (15.62%), followed by Flute (14.51%) and Clarinet (13.59%).• Trumpet leads the brass instruments with 11.23%, while French Horn now accurately reflects its combined count.• Others include: harmonica, tenor horn, non-western flute, recorder, ocarina, cornet, whistle, baritone, non-western reed, flugelhorn, misc.## Comparison Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process instrument and RMT datainstrument_rmt_data <- data_combined %>%filter(!is.na(WI), !is.na(RMTMethods_YN)) %>%separate_rows(WI, sep =",") %>%mutate(WI =trimws(WI),WI =case_when( WI =="French Horn/Horn"~"French Horn",TRUE~ WI ),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1),labels =c("No RMT", "RMT")) ) %>%filter(WI !="Unknown") %>%mutate(family =case_when( WI %in%c("Flute", "Piccolo", "Clarinet", "Saxophone", "Oboe", "Bassoon", "Recorder") ~"Woodwinds", WI %in%c("Trumpet", "Trombone", "Tuba", "Euphonium", "French Horn") ~"Brass",TRUE~"Others" ))# Calculate counts and percentages for each RMT group (Changed grouping)family_rmt_summary <- instrument_rmt_data %>%group_by(family, RMTMethods_YN) %>%summarise(count =n(), .groups ='drop') %>%group_by(RMTMethods_YN) %>%# Changed from family to RMTMethods_YNmutate(total =sum(count),percentage = (count / total) *100 )# Calculate total N for adding to the plottotal_n <-nrow(instrument_rmt_data)# Perform chi-square test on the full contingency tablecontingency_table <-table(instrument_rmt_data$family, instrument_rmt_data$RMTMethods_YN)chi_square_test <-chisq.test(contingency_table)print("Chi-square test results:")print(chi_square_test)# Check the assumption: print the expected counts from the chi-square testprint("Expected counts:")print(chi_square_test$expected)# If any expected count is less than 5, issue a warning and switch to Fisher's exact testif(min(chi_square_test$expected) <5) {print("Chi-square test assumption violated. Switching to Fisher's exact test.") chi_square_test <-fisher.test(contingency_table)print("Fisher's exact test results:")print(chi_square_test)}# Create the plotrmt_distribution_plot <-ggplot(family_rmt_summary, aes(x = family, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge", color ="black") +geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_dodge(width =0.9),vjust =-0.5, size =3) +labs(title ="Distribution of Instrument Families within RMT Usage Groups",subtitle =sprintf("Percentages calculated within each RMT group (Chi-square: χ² = %.2f, df = %d, p = %.4f)", chi_square_test$statistic, chi_square_test$parameter, chi_square_test$p.value),x ="Instrument Family",y =paste("Number of Responses (Total N =", total_n, ")"),fill ="RMT Usage") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10),axis.text.x =element_text(size =10),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="right",plot.margin =margin(t =30, r =20, b =20, l =20) # Add more space at the top ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2))) # Add more space above bars# Display the plotprint(rmt_distribution_plot)# Calculate and print odds ratiosprint("\nOdds ratios between instrument families and RMT usage:")family_rmt_odds <- family_rmt_summary %>%select(family, RMTMethods_YN, count) %>% tidyr::pivot_wider(names_from = RMTMethods_YN, values_from = count, values_fill =list(count =0)) %>%mutate(# Add a small constant to avoid division by zeroodds_ratio =if(all(c("RMT", "No RMT") %in%names(.))) { (RMT +0.5) / (sum(RMT) +0.5) / (((`No RMT`+0.5) / (sum(`No RMT`) +0.5))) } else {NA } )print(family_rmt_odds)# Create a new data frame with percentages within RMT groups for easier interpretationrmt_percentages <- family_rmt_summary %>%select(family, RMTMethods_YN, count, percentage) %>%pivot_wider(names_from = RMTMethods_YN, values_from =c(count, percentage),names_glue ="{RMTMethods_YN}_{.value}")print("\nPercentage distribution within each RMT group:")print(rmt_percentages)# Perform post-hoc analysis for each pair of instrument familiesprint("\nPairwise comparisons between instrument families:")families <-unique(instrument_rmt_data$family)if(length(families) >1) {for(i in1:(length(families)-1)) {for(j in (i+1):length(families)) { subset_data <- instrument_rmt_data %>%filter(family %in%c(families[i], families[j]))# Create contingency table for this pair pair_table <-table(subset_data$family, subset_data$RMTMethods_YN)# Check if table has enough data for chi-square testif(all(dim(pair_table) >0) &&min(pair_table) >0) { pair_test <-chisq.test(pair_table) test_type <-"Chi-square" } else {# Use Fisher's exact test as fallback pair_test <-fisher.test(pair_table, simulate.p.value =TRUE, B =10000) test_type <-"Fisher's exact" }print(sprintf("\n%s vs %s:", families[i], families[j]))if(test_type =="Chi-square") {print(sprintf("%s test: Chi-square = %.2f, df = %d, p = %.4f", test_type, pair_test$statistic, pair_test$parameter, pair_test$p.value)) } else {print(sprintf("%s test: p = %.4f", test_type, pair_test$p.value)) } } }}# Perform pairwise comparisons between instrument families with Bonferroni correction:print("\nPairwise comparisons between instrument families (with Bonferroni correction):")families <-unique(instrument_rmt_data$family)n_comparisons <-length(families) * (length(families)-1) /2for(i in1:(length(families)-1)) {for(j in (i+1):length(families)) { subset_data <- instrument_rmt_data %>%filter(family %in%c(families[i], families[j])) pair_test <-chisq.test(table(subset_data$family, subset_data$RMTMethods_YN)) p_value_corrected <-min(pair_test$p.value * n_comparisons, 1) # Correction and capped at 1print(sprintf("\n%s vs %s:", families[i], families[j]))print(sprintf("Chi-square = %.2f, df = %d, raw p = %.4f, Bonferroni corrected p = %.4f", pair_test$statistic, pair_test$parameter, pair_test$p.value, p_value_corrected)) }}# Add new visualization specifically showing the distribution within RMT groupspercentage_plot <-ggplot(family_rmt_summary, aes(x = family, y = percentage, fill = family)) +geom_bar(stat ="identity") +geom_text(aes(label =sprintf("%.1f%%", percentage)),vjust =-0.5, size =3.5) +facet_wrap(~ RMTMethods_YN, scales ="free_y") +labs(title ="Distribution of Instrument Families within Each RMT Usage Group",subtitle ="Showing percentage of each instrument family within 'RMT' and 'No RMT' groups",x ="Instrument Family",y ="Percentage within RMT Group (%)") +theme_minimal() +theme(plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =10, face ="italic"),axis.text.x =element_text(size =10, angle =45, hjust =1),axis.text.y =element_text(size =10),axis.title =element_text(size =12),legend.position ="none" )# Display the percentage plotprint(percentage_plot)```## Results and DiscussionBased on the provided statistical output, there are significant differences in Respiratory Muscle Training (RMT) usage across instrument families, with brass players showing notably higher adoption rates compared to woodwind players and other instrumentalists.*Classification by Instrument Family:*Instruments are grouped into families based on predefined lists. For example:Instruments like “Flute,” “Piccolo,” “Clarinet,” etc., are grouped as Woodwinds. Instruments like “Trumpet,” “Trombone,” “Tuba,” etc., are grouped as Brass. All others are assigned to a default category called Others.**Observed Frequencies Analysis***Raw Counts by Instrument Family:*- **Brass**: 765 non-users, 213 RMT users (21.8% usage rate)- **Woodwinds**: 1,523 non-users, 249 RMT users (14.0% usage rate)- **Others**: 260 non-users, 44 RMT users (14.5% usage rate)**Chi-square Test Results:**- Overall test: X² = 28.293, df = 2, p-value = 7.183e-07- This highly significant p-value (\<0.001) indicates strong evidence of an association between instrument family and RMT usage.**Pairwise Comparisons***Woodwinds vs. Others:*- Chi-square = 0.01, df = 1, p = 0.9156- Bonferroni corrected p = 1.0000- Interpretation: No significant difference in RMT usage between woodwind players and other instrumentalists.**Woodwinds vs. Brass:**- Chi-square = 26.37, df = 1, p \< 0.0001- Bonferroni corrected p \< 0.0001- Interpretation: Highly significant difference, with brass players using RMT at a significantly higher rate than woodwind players.**Others vs. Brass:**- Chi-square = 7.27, df = 1, p = 0.0070- Bonferroni corrected p = 0.0210- Interpretation: Significant difference even after correction for multiple comparisons, with brass players using RMT at a higher rate than other instrumentalists.*Key findings from the pairwise comparisons:*• Woodwinds vs. Brass: Highly significant difference (p \< 0.0001)• Others vs. Brass: Significant difference (p = 0.007)• Woodwinds vs. Others: No significant difference (p = 0.916)1. There is a statistically significant difference in RMT usage across instrument families2. Brass players show significantly different patterns of RMT usage compared to both Woodwind players and Others3. Woodwind players and Others show similar patterns of RMT usage4. The proportion of RMT usage is highest among Brass playersX-squared = 55.92, df = 13, p-value = 2.784e-07The plot shows the percentage distribution of instruments within each RMT group (No RMT vs RMT), with the actual counts and percentages labeled on each bar. The chi-square test confirms a significant association between instrument type and RMT usage (p \< 0.001).The Tuba, Euphonium, and Bagpipes have the highest percentages of RMT usage among instruments with at least 10 players.The loop over instrument families now includes a Bonferroni correction to adjust p-values for multiple comparisons. The corrected p-value is calculated by multiplying the raw p-value by the number of pairwise comparisons. The output for each pair is printed with both raw and corrected p-values.These modifications ensure that the chi-square test is valid (by checking the expected counts), and the risk of Type I error in multiple pairwise comparisons is reduced. The overall robustness of the analysis is thereby improved.**Odds Ratio Calculation:**Although the code includes a basic calculation for the odds ratios between instrument families, this step is intended to quantify the strength of the association between each family’s membership and RMT usage. A more sophisticated approach (like logistic regression) might be used for robust odds ratio calculations, but the given code calculates a simplified measure.Calculated from raw counts:- *Brass vs. Woodwinds*: Brass players have approximately 1.71 times higher odds of using RMT compared to woodwind players (21.8% vs 14.0%).- *Brass vs. Others*: Brass players have approximately 1.64 times higher odds of using RMT compared to other instrumentalists (21.8% vs 14.5%).- *Woodwinds vs. Others*: Nearly identical RMT usage rates (14.0% vs 14.5%), confirming the non-significant chi-square result.**Comprehensive Interpretation**1. *Brass instrumentalists stand out*: Players of brass instruments show significantly higher adoption of RMT methods compared to both woodwind players and other instrumentalists. This difference remains significant even after applying the conservative Bonferroni correction for multiple comparisons.2. *Woodwinds and Others are similar*: There is no meaningful difference in RMT usage between woodwind players and other instrumentalists, as evidenced by the very low chi-square value (0.01) and high p-value (0.9156).3. *Potential explanations*: Several factors might explain the higher RMT adoption among brass players:- Specific physical demands or challenges unique to brass playing- Different pedagogical traditions or technological integration in brass education- Demographic differences between instrument families (e.g., age, gender, education level)- Potential differences in injury patterns or physical concerns across instrument families**Conclusion**Brass instrumentalists demonstrate significantly higher adoption of RMT methods compared to other musician groups. This finding may have implications for how RMT technologies are marketed, designed, or implemented across different instruments. Further research could explore the underlying reasons for this difference, which could inform more targeted approaches to technology integration in music education and performance.# Skill Level## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Extract playAbility_MAX and remove 0 responsesplot_data <- data_combined %>%filter(playAbility_MAX !=0, !is.na(playAbility_MAX)) %>%# Added NA checkmutate(playAbility_MAX =as.factor(playAbility_MAX)) %>%count(playAbility_MAX) %>%mutate(percentage = n /sum(n) *100,label =paste0(n, "\n(", sprintf("%.1f", percentage), "%)"))# Define custom labels for x-axiscustom_labels <-c("1"="Novice", "2"="Beginner", "3"="Intermediate", "4"="Advanced", "5"="Expert")# Get the actual levels present in the dataactual_levels <-levels(plot_data$playAbility_MAX)# Create Plotplayability_plot <-ggplot(plot_data, aes(x = playAbility_MAX, y = n)) +geom_bar(stat ="identity", fill ="steelblue") +geom_text(aes(label = label), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability (Max Score)",x ="Play Ability (Novice = 1 to Expert = 5)",y ="Count of Participants (N = 1558)" ) +scale_x_discrete(# Use actual levels from data rather than hardcoded valueslabels = custom_labels[actual_levels] ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the Plotprint(playability_plot)``````{r}# Combined Categories# Read the datalibrary(tidyverse)library(readxl)data_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process playAbility_MAX with the new category definitionsplot_data <- data_combined %>%filter(!is.na(playAbility_MAX)) %>%mutate(# Ensure playAbility_MAX is numericplayAbility_MAX =as.numeric(playAbility_MAX),# Create the new combined categories with half-point valuesskill_level =case_when( playAbility_MAX %in%c(0, 1, 1.5, 2) ~"Novice/Beginner", playAbility_MAX %in%c(2.5, 3, 3.5) ~"Intermediate", playAbility_MAX %in%c(4, 4.5, 5) ~"Advanced/Expert",TRUE~NA_character_ ),# Order factor levelsskill_level =factor(skill_level, levels =c("Novice/Beginner", "Intermediate", "Advanced/Expert")) ) %>%# Remove any NA values from our categoriesfilter(!is.na(skill_level)) %>%# Count by skill levelgroup_by(skill_level) %>%summarise(count =n()) %>%# Calculate percentagesmutate(total =sum(count),percentage = (count / total) *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") )# Calculate total valid responses for y-axis labeltotal_n <-sum(plot_data$count)# Create the plotggplot(plot_data, aes(x = skill_level, y = count, fill = skill_level)) +geom_bar(stat ="identity", width =0.7) +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Distribution of Play Ability (Combined Categories)",x ="Play Ability Level",y =paste0("Number of Participants (N = ", total_n, ")") ) +scale_fill_manual(values =c("Novice/Beginner"="#4682B4", "Intermediate"="#6495ED", "Advanced/Expert"="#1E90FF")) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold", hjust =0.5),axis.title =element_text(size =12),axis.text =element_text(size =11),legend.position ="none",panel.grid.major.x =element_blank() ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Additionally, print value counts to help diagnose any issuesvalue_counts <- data_combined %>%filter(!is.na(playAbility_MAX)) %>%count(playAbility_MAX) %>%arrange(playAbility_MAX)print("Distribution of values in playAbility_MAX:")print(value_counts)```## Results and DiscussionLooking at the distribution of self-reported playing ability scores (1 = Novice to 5 = Expert) among the 1,558 participants:• Score 4 is the most common with approximately 500 participants (\~32%), indicating many consider themselves advanced players• Score 3 is the second most frequent with about 450 participants (\~29%), representing intermediate-level players• Score 5 (Expert level) has roughly 350 participants (\~22%)• Score 2 (Beginner-intermediate) has about 200 participants (\~13%)• Score 1 (Novice) is the least common with fewer than 100 participants (\~4%)The distribution is slightly negatively skewed (skewed towards higher scores), with the majority of participants (\>80%) rating their playing ability as intermediate or higher (scores 3-5). This suggests that the sample consists primarily of experienced wind instrument players, with relatively few beginners. The peak at score 4 indicates that most participants consider themselves advanced, but not expert-level players.The plot shows the distribution of self-reported playing ability (1 = Novice to 5 = Expert) for each wind instrument. Some key observations:• Flute shows the highest number of responses, with a right-skewed distribution (more advanced players)• Most instruments show fewer responses overall compared to flute and clarinet• Bagpipes and euphonium show notably fewer responses• The warning message indicates some missing values were removed, which is expected as not all participants play all instruments### Combined Groups*Key Observations:*1. **Strong Positive Skew**: The distribution is heavily skewed toward the higher end of the scale, with 71.1% of participants (1,104 out of 1,558) falling in the 4.0-5.0 range.2. **Highest Frequencies**:- Level 4.0: 484 participants (31.1%)- Level 5.0: 447 participants (28.7%)- Level 3.0: 222 participants (14.2%)3. **Lowest Frequencies**:- Level 0.0: 1 participant (0.1%)- Level 1.5: 3 participants (0.2%)- Level 1.0: 6 participants (0.4%)4. **Half-Point vs. Whole-Number Ratings**: Half-point ratings (1.5, 2.5, 3.5, 4.5) consistently show lower frequencies than adjacent whole numbers, suggesting raters may prefer whole numbers when assessing ability but use half points for more nuanced judgments.*Combined Category Analysis*When organized into three combined categories:1. **Novice/Beginner (1.0, 1.5, 2.0)**: 41 participants (2.6%)- This group represents a very small portion of the sample, indicating either few beginners participated or the population being studied generally has higher ability levels.2. **Intermediate (2.5, 3.0, 3.5)**: 412 participants (26.5%)- Representing about a quarter of participants with moderate skill levels.3. **Advanced/Expert (4.0, 4.5, 5.0)**: 1,104 participants (70.9%)- Clearly the dominant group, comprising more than two-thirds of the sample.4. **Outlier (0.0)**: 1 participant (0.1%)- This likely represents an error, a special case, or potentially a "non-player" category.**Implications and Considerations***Potential Explanations for the Skewed Distribution:*1. **Population Characteristics**: This may accurately reflect a population primarily consisting of advanced players, such as professional or experienced musicians.2. **Self-Selection Bias**: Advanced players might be more likely to participate in studies related to their expertise or complete relevant surveys.3. **Self-Assessment Inflation**: If self-reported, participants may tend to overestimate their abilities.4. **Scale Calibration**: The assessment scale might not optimally differentiate skill levels in this specific population, potentially causing a ceiling effect where many advanced players cluster at the top.5. **Recruitment Strategy**: The participant recruitment process may have targeted more experienced players.**Research and Application Considerations**1. **Representativeness**: Consider whether this distribution accurately represents the target population or reflects selection biases.2. **Statistical Analysis**: The heavy skew toward higher ratings might influence statistical analyses that assume normal distribution.3. **Group Comparisons**: When comparing groups (e.g., by demographics), be aware that the small sample of beginners might limit certain analyses.4. **Scale Refinement**: For future studies, consider a more differentiated scale at the upper end to better distinguish between advanced and expert performers.**Conclusion**The data reveals a population predominantly composed of advanced/expert level players, with relatively few beginners. This distribution is important context for any further analyses or conclusions drawn from this dataset, particularly when examining factors associated with play ability or when making generalizations about the broader population of players.## Comparison Stats```{r}# Original Analysis without Combined Groups# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Data Preparation: Filter playAbility_MAX and prepare variablesdata_combined <- data_combined %>%mutate(playAbility_MAX =as.numeric(playAbility_MAX),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1), labels =c("No RMT", "RMT")) )# Filter out cases where playAbility_MAX is 0 or NAanalysis_data <- data_combined %>%filter(!is.na(playAbility_MAX), playAbility_MAX !=0, !is.na(RMTMethods_YN)) %>%mutate(# Create factor version of playAbility with proper labelsplayAbility_factor =factor(playAbility_MAX, levels =1:5,labels =c("Novice", "Beginner", "Intermediate", "Advanced", "Expert")),# Create binary variables for logistic regressionhigh_play =ifelse(playAbility_MAX >=4, 1, 0),RMT_binary =ifelse(RMTMethods_YN =="RMT", 1, 0) )# Calculate counts by playAbility and RMT usagegrouped_data <- analysis_data %>%group_by(RMTMethods_YN, playAbility_factor) %>%summarise(count =n(), .groups ="drop") %>%group_by(RMTMethods_YN) %>%mutate(percentage = count /sum(count) *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)")) %>%ungroup()# Statistical Analysis: Chi-square Test of Independencecontingency_table <-table(analysis_data$playAbility_factor, analysis_data$RMTMethods_YN)# Use simulation-based p-value calculation for chi-square testchi_test <-chisq.test(contingency_table, simulate.p.value =TRUE, B =10000)# Print the statistical resultscat("\nChi-square Test Results (Independence between playAbility and RMT Usage):\n")print(chi_test)# Check expected frequenciesexpected_freqs <- chi_test$expectedprint("Expected frequencies:")print(expected_freqs)# Calculate standardized residualsstd_residuals <-data.frame(playAbility =rep(rownames(chi_test$stdres), times =ncol(chi_test$stdres)),RMTMethods =rep(colnames(chi_test$stdres), each =nrow(chi_test$stdres)),std_residual =as.vector(chi_test$stdres),rounded_res =round(as.vector(chi_test$stdres), 2))# Print significant residualssig_residuals <- std_residuals %>%filter(abs(std_residual) >1.96)cat("\nSignificant Standardized Residuals (>|1.96|):\n")print(sig_residuals)# Calculate effect size: Cramer's Vn_total <-sum(contingency_table)df_min <-min(nrow(contingency_table) -1, ncol(contingency_table) -1)cramer_v <-sqrt(chi_test$statistic / (n_total * df_min))cat("\nEffect Size (Cramer's V):\n")print(cramer_v)# Create Plot: Grouped Bar Plotplayability_group_plot <-ggplot(grouped_data, aes(x = playAbility_factor, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability by RMT Usage",x ="Play Ability Level",y =paste0("Count of Participants (N = ", nrow(analysis_data), ")"),fill ="RMT Usage" ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the grouped bar plotprint(playability_group_plot)# Logistic Regression Analysis# Run continuous modellogit_model <-glm(RMT_binary ~ playAbility_MAX, data = analysis_data, family ="binomial")# Print model summarysummary_output <-summary(logit_model)print(summary_output)# Calculate odds ratios and confidence intervalsodds_ratios <-exp(coef(logit_model))conf_intervals <-exp(confint(logit_model))cat("\nOdds Ratios with 95% Confidence Intervals:\n")or_results <-data.frame(Term =names(odds_ratios),OddsRatio =round(odds_ratios, 3),CI_lower =round(conf_intervals[,1], 3),CI_upper =round(conf_intervals[,2], 3))print(or_results)# Predicted probabilities for each playing ability levelnew_data <-data.frame(playAbility_MAX =1:5)predicted_probs <-predict(logit_model, newdata = new_data, type ="response")result_df <-data.frame(playAbility_level =1:5,predicted_probability = predicted_probs)cat("\nPredicted probabilities of RMT usage by playing ability level:\n")print(result_df)# Create a visualization of predicted probabilitieslibrary(ggplot2)# Plot predicted probabilities from continuous modelprob_plot <-ggplot(result_df, aes(x = playAbility_level, y = predicted_probability)) +geom_line(size =1.5) +geom_point(size =3) +scale_x_continuous(breaks =1:5, labels =c("Novice", "Beginner", "Intermediate", "Advanced", "Expert")) +labs(title ="Predicted Probability of RMT Usage by Playing Ability Level",x ="Playing Ability Level",y ="Probability of Using RMT Methods") +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text =element_text(size =12),axis.title =element_text(size =14) ) +scale_y_continuous(labels = scales::percent_format(accuracy =1),limits =c(0, max(predicted_probs) *1.1))# Display the probability plotprint(prob_plot)# Calculate McFadden's Pseudo R-squarednull_model <-glm(RMT_binary ~1, data = analysis_data, family ="binomial")logLik_full <-as.numeric(logLik(logit_model))logLik_null <-as.numeric(logLik(null_model))mcfadden_r2 <-1- (logLik_full / logLik_null)cat(paste("\nMcFadden's Pseudo R-squared:", round(mcfadden_r2, 4)))# Classification metrics with robust handlingpredicted_classes <-ifelse(fitted(logit_model) >0.5, 1, 0)confusion_matrix <-table(Predicted =factor(predicted_classes, levels =c(0, 1)), Actual =factor(analysis_data$RMT_binary, levels =c(0, 1)))cat("\n\nConfusion Matrix:\n")print(confusion_matrix)# Calculate metrics with checks for zero denominatorsaccuracy <-sum(diag(confusion_matrix)) /sum(confusion_matrix)sensitivity <-ifelse(sum(confusion_matrix[,2]) >0, confusion_matrix[2,2] /sum(confusion_matrix[,2]), NA)specificity <-ifelse(sum(confusion_matrix[,1]) >0, confusion_matrix[1,1] /sum(confusion_matrix[,1]), NA)cat(paste("\nAccuracy:", round(accuracy, 3)))cat(paste("\nSensitivity (True Positive Rate):", ifelse(is.na(sensitivity), "Not calculable", round(sensitivity, 3))))cat(paste("\nSpecificity (True Negative Rate):", ifelse(is.na(specificity), "Not calculable", round(specificity, 3))))``````{r}# Combined Groups# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Data Preparation: Filter playAbility_MAX and prepare variablesdata_combined <- data_combined %>%mutate(playAbility_MAX =as.numeric(playAbility_MAX),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1), labels =c("No RMT", "RMT")) )# Filter out cases where playAbility_MAX is NAanalysis_data <- data_combined %>%filter(!is.na(playAbility_MAX), !is.na(RMTMethods_YN)) %>%mutate(# Create new combined categories - note that 0 is now included in Novice/BeginnerplayAbility_combined =case_when( playAbility_MAX %in%c(0, 1, 1.5, 2) ~"Novice/Beginner", playAbility_MAX %in%c(2.5, 3, 3.5) ~"Intermediate", playAbility_MAX %in%c(4, 4.5, 5) ~"Advanced/Expert",TRUE~NA_character_ ),# Order the factor levels correctlyplayAbility_combined =factor(playAbility_combined, levels =c("Novice/Beginner", "Intermediate", "Advanced/Expert")),# Create binary variables for logistic regressionhigh_play =ifelse(playAbility_MAX >=4, 1, 0),RMT_binary =ifelse(RMTMethods_YN =="RMT", 1, 0) )# Calculate counts by combined playAbility and RMT usagegrouped_data <- analysis_data %>%group_by(RMTMethods_YN, playAbility_combined) %>%summarise(count =n(), .groups ="drop") %>%group_by(RMTMethods_YN) %>%mutate(percentage = count /sum(count) *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)")) %>%ungroup()# Statistical Analysis: Chi-square Test of Independencecontingency_table <-table(analysis_data$playAbility_combined, analysis_data$RMTMethods_YN)# Use simulation-based p-value calculation for chi-square testchi_test <-chisq.test(contingency_table, simulate.p.value =TRUE, B =10000)# Print the statistical resultscat("\nChi-square Test Results (Independence between playAbility and RMT Usage):\n")print(chi_test)# Check expected frequenciesexpected_freqs <- chi_test$expectedprint("Expected frequencies:")print(expected_freqs)# Calculate standardized residualsstd_residuals <-data.frame(playAbility =rep(rownames(chi_test$stdres), times =ncol(chi_test$stdres)),RMTMethods =rep(colnames(chi_test$stdres), each =nrow(chi_test$stdres)),std_residual =as.vector(chi_test$stdres),rounded_res =round(as.vector(chi_test$stdres), 2))# Print significant residualssig_residuals <- std_residuals %>%filter(abs(std_residual) >1.96)cat("\nSignificant Standardized Residuals (>|1.96|):\n")print(sig_residuals)# Calculate effect size: Cramer's Vn_total <-sum(contingency_table)df_min <-min(nrow(contingency_table) -1, ncol(contingency_table) -1)cramer_v <-sqrt(chi_test$statistic / (n_total * df_min))cat("\nEffect Size (Cramer's V):\n")print(cramer_v)# Create Plot: Grouped Bar Plotplayability_group_plot <-ggplot(grouped_data, aes(x = playAbility_combined, y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9)) +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Distribution of Play Ability by RMT Usage",subtitle ="Using Combined Ability Categories (0 included in Novice/Beginner)",x ="Play Ability Level",y =paste0("Count of Participants (N = ", nrow(analysis_data), ")"),fill ="RMT Usage" ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12, face ="italic"),axis.text =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the grouped bar plotprint(playability_group_plot)# Logistic Regression Analysis# First create a numeric version of the combined categories for the continuous modelanalysis_data <- analysis_data %>%mutate(playAbility_combined_numeric =case_when( playAbility_combined =="Novice/Beginner"~1, playAbility_combined =="Intermediate"~2, playAbility_combined =="Advanced/Expert"~3,TRUE~NA_real_ ) )# Run continuous model with the combined categorieslogit_model <-glm(RMT_binary ~ playAbility_combined_numeric, data = analysis_data, family ="binomial")# Print model summarysummary_output <-summary(logit_model)print(summary_output)# Calculate odds ratios and confidence intervalsodds_ratios <-exp(coef(logit_model))conf_intervals <-exp(confint(logit_model))cat("\nOdds Ratios with 95% Confidence Intervals:\n")or_results <-data.frame(Term =names(odds_ratios),OddsRatio =round(odds_ratios, 3),CI_lower =round(conf_intervals[,1], 3),CI_upper =round(conf_intervals[,2], 3))print(or_results)# Predicted probabilities for each combined playing ability levelnew_data <-data.frame(playAbility_combined_numeric =1:3)predicted_probs <-predict(logit_model, newdata = new_data, type ="response")result_df <-data.frame(playAbility_level =c("Novice/Beginner", "Intermediate", "Advanced/Expert"),combined_level_value =1:3,predicted_probability = predicted_probs)cat("\nPredicted probabilities of RMT usage by combined playing ability level:\n")print(result_df)# Create a visualization of predicted probabilitieslibrary(ggplot2)# Plot predicted probabilities from continuous modelprob_plot <-ggplot(result_df, aes(x = combined_level_value, y = predicted_probability)) +geom_line(size =1.5) +geom_point(size =3) +scale_x_continuous(breaks =1:3, labels =c("Novice/Beginner", "Intermediate", "Advanced/Expert")) +labs(title ="Predicted Probability of RMT Usage by Playing Ability Level",subtitle ="Based on Combined Ability Categories (0 included in Novice/Beginner)",x ="Playing Ability Level",y ="Probability of Using RMT Methods") +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12, face ="italic"),axis.text =element_text(size =12),axis.text.x =element_text(angle =45, hjust =1),axis.title =element_text(size =14) ) +scale_y_continuous(labels = scales::percent_format(accuracy =1),limits =c(0, max(predicted_probs) *1.1))# Display the probability plotprint(prob_plot)# Calculate McFadden's Pseudo R-squarednull_model <-glm(RMT_binary ~1, data = analysis_data, family ="binomial")logLik_full <-as.numeric(logLik(logit_model))logLik_null <-as.numeric(logLik(null_model))mcfadden_r2 <-1- (logLik_full / logLik_null)cat(paste("\nMcFadden's Pseudo R-squared:", round(mcfadden_r2, 4)))# Classification metrics with robust handlingpredicted_classes <-ifelse(fitted(logit_model) >0.5, 1, 0)confusion_matrix <-table(Predicted =factor(predicted_classes, levels =c(0, 1)), Actual =factor(analysis_data$RMT_binary, levels =c(0, 1)))cat("\n\nConfusion Matrix:\n")print(confusion_matrix)# Calculate metrics with checks for zero denominatorsaccuracy <-sum(diag(confusion_matrix)) /sum(confusion_matrix)sensitivity <-ifelse(sum(confusion_matrix[,2]) >0, confusion_matrix[2,2] /sum(confusion_matrix[,2]), NA)specificity <-ifelse(sum(confusion_matrix[,1]) >0, confusion_matrix[1,1] /sum(confusion_matrix[,1]), NA)cat(paste("\nAccuracy:", round(accuracy, 3)))cat(paste("\nSensitivity (True Positive Rate):", ifelse(is.na(sensitivity), "Not calculable", round(sensitivity, 3))))cat(paste("\nSpecificity (True Negative Rate):", ifelse(is.na(specificity), "Not calculable", round(specificity, 3))))# Additional analysis: Also run categorical modellogit_model_categorical <-glm(RMT_binary ~ playAbility_combined, data = analysis_data, family ="binomial")# Print categorical model summarycat("\n\nCategorical Model Results (using combined categories as factors):\n")print(summary(logit_model_categorical))# Calculate odds ratios for categorical modelcat("\nOdds Ratios for Categorical Model:\n")odds_ratios_cat <-exp(coef(logit_model_categorical))conf_intervals_cat <-exp(confint(logit_model_categorical))or_results_cat <-data.frame(Term =names(odds_ratios_cat),OddsRatio =round(odds_ratios_cat, 3),CI_lower =round(conf_intervals_cat[,1], 3),CI_upper =round(conf_intervals_cat[,2], 3))print(or_results_cat)# Print breakdown of original values in each combined categorycat("\nBreakdown of original playAbility_MAX values in each combined category:\n")value_counts <- analysis_data %>%count(playAbility_MAX, playAbility_combined) %>%arrange(playAbility_MAX)print(value_counts)```## Results and Discussion (based on combined findings)This analysis examines the relationship between musicians' play ability levels (grouped into three categories) and their use of Remote Monitoring Technology (RMT). The results show a statistically significant association between play ability and RMT usage, with specific patterns of usage across different skill levels.**Chi-Square Test of Independence**- **Test Statistic**: X² = 26.337- **P-value**: p \< 0.0001 (based on 10,000 simulated replicates)- **Interpretation**: There is very strong evidence of an association between play ability and RMT usage. The null hypothesis that these variables are independent can be confidently rejected.**Effect Size**- **Cramer's V**: 0.13- **Interpretation**: This represents a small to moderate association between play ability and RMT usage. While statistically significant, the strength of the relationship is not overwhelming.**Standardized Residuals Analysis**The standardized residuals reveal exactly where the significant deviations from independence occur:- **Intermediate players and RMT usage**:- Strong negative residual (-4.92) for "RMT"- Strong positive residual (4.92) for "No RMT"- This indicates intermediate players are significantly less likely to use RMT than expected- **Advanced/Expert players and RMT usage**:- Strong positive residual (5.12) for "RMT"- Strong negative residual (-5.12) for "No RMT"- This indicates advanced/expert players are significantly more likely to use RMT than expected**Logistic Regression Models***Continuous Model*- **Coefficient for playAbility_combined_numeric**: 0.8246 (p \< 0.0001)- **Odds Ratio**: 2.281 (95% CI: 1.632-3.280)- **Interpretation**: For each step up in the combined ability category (e.g., from Novice/Beginner to Intermediate), the odds of using RMT increase by a factor of 2.28.**Categorical Model**- **Intermediate vs. Novice/Beginner**:- Odds Ratio: 0.746 (non-significant, p = 0.600)- Intermediate players have slightly lower odds of using RMT than novice/beginners, but this difference is not statistically significant- **Advanced/Expert vs. Novice/Beginner**:- Odds Ratio: 2.025 (non-significant, p = 0.184)- Advanced/expert players have approximately double the odds of using RMT compared to novice/beginners, but this difference does not reach statistical significance.**Predicted Probabilities**The model predicts the following probabilities of RMT usage:- Novice/Beginner: 3.9%- Intermediate: 8.4%- Advanced/Expert: 17.4%This shows a clear increasing trend in RMT usage with higher play ability levels, with advanced/expert players having more than four times the probability of using RMT compared to novice/beginners.**Model Performance**- **McFadden's Pseudo R-squared**: 0.0201- **Accuracy**: 85.4%- **Sensitivity**: 0 (model fails to predict any true RMT users)- **Specificity**: 1 (model correctly identifies all non-RMT users)The model has modest explanatory power and defaults to predicting "No RMT" for all observations, which yields high accuracy due to the imbalanced dataset (far more non-users than users).**Distribution of Participants**- **Novice/Beginner** (values 0, 1, 1.5, 2): 42 participants- **Intermediate** (values 2.5, 3, 3.5): 412 participants- **Advanced/Expert** (values 4, 4.5, 5): 1,104 participantsThe sample is heavily weighted toward advanced/expert players, with relatively few novice/beginners.**Key Insights and Implications**1. **Skill-Technology Relationship**: There is a significant positive relationship between play ability and RMT usage, with more advanced players more likely to adopt this technology.2. **Intermediate Player Resistance**: Intermediate players show a particularly strong pattern of avoiding RMT usage, which could indicate:- Transitional learning approaches at this skill level- Concerns about technology hindering development of fundamental skills- Different teaching methodologies that don't emphasize technology3. **Advanced Player Adoption**: Advanced/expert players embrace RMT at higher rates, possibly because:- They use technology to refine already-developed skills- They have greater technical understanding of their instruments- They may be more involved in teaching and use RMT as pedagogical tools4. **Practical Applications**:- Technology developers should consider targeting advanced players as early adopters- Different approaches may be needed to increase RMT adoption among intermediate players- Educational approaches might need to be tailored by skill level.Despite the statistical significance of the relationship, the modest effect size suggests that play ability is just one of many factors influencing RMT adoption among musicians.# Country of Residence## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Calculate the total Ntotal_N <-nrow(data_combined)# Modify country names: abbreviate USA and UKdata_combined <- data_combined %>%mutate(countryLive =case_when( countryLive =="United States of America (USA)"~"USA", countryLive =="United Kingdom (UK)"~"UK",TRUE~ countryLive ))# Compute counts and percentages for the 'countryLive' columncountry_summary <- data_combined %>%group_by(countryLive) %>%summarise(count =n()) %>%ungroup() %>%mutate(percentage = count / total_N *100) %>%arrange(desc(count))# Select the top 4 countries (using the highest counts)top_countries <- country_summary %>%top_n(4, wt = count) %>%mutate(label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)"),# Reorder to display from highest to lowestcountryLive =reorder(countryLive, -count) )# Perform chi-square goodness-of-fit test for top 4 countries# Expected frequencies for equality among the 4 groupsobserved <- top_countries$countexpected <-rep(sum(observed)/length(observed), length(observed))chi_test <-chisq.test(x = observed, p =rep(1/length(observed), length(observed)))print("Chi-square goodness-of-fit test for equal distribution among top 4 countries:")print(chi_test)# Create the bar plot with counts and percentages and add the notecountry_plot <-ggplot(top_countries, aes(x = countryLive, y = count)) +geom_bar(stat ="identity", fill ="steelblue", color ="black") +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Top 4 Countries (Live)",x ="Country",y =paste0("Count of Participants (N = ", total_N, ")"),subtitle =paste0("Chi-square: ", sprintf('%.2f', chi_test$statistic), " (df = ", chi_test$parameter, "), p = ", ifelse(chi_test$p.value <0.001, "< .001", sprintf('%.3f', chi_test$p.value))),caption ="Note: Countries with frequencies less than 5% were removed from the figure.") +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),axis.title =element_text(size =12),plot.subtitle =element_text(size =12),plot.caption =element_text(size =10, hjust =0, face ="italic", margin =margin(t =15)),plot.margin =margin(b =30, t =20, r =20, l =20) # Add extra bottom margin for the note ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the plotprint(country_plot)# Print summary statisticsprint("Summary Statistics for Top 4 Countries:")print(top_countries %>%select(countryLive, count, percentage) %>%arrange(desc(count)))``````{r}# Combined 5% values with note# Read the datalibrary(tidyverse)library(readxl)library(ggplot2)library(stringr)data_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Process the countryLive columncountry_analysis <- data_combined %>%# Remove missing valuesfilter(!is.na(countryLive)) %>%# Clean country names (trim whitespace, standardize case if needed)mutate(countryLive =trimws(countryLive)) %>%# Count occurrences of each countrycount(countryLive) %>%# Calculate percentagesmutate(total =sum(n),percentage = n / total *100,# Create grouped category (countries with <5% go to "Other")country_group =if_else(percentage >=5, countryLive, "Other") ) %>%# Sort by frequencyarrange(desc(n))# Identify countries that will be in the "Other" categoryother_countries <- country_analysis %>%filter(country_group =="Other") %>%arrange(desc(n))# Create a formatted string listing all "Other" countries WITHOUT percentagesother_countries_list <- other_countries %>%pull(countryLive) %>%paste(collapse =", ")# Create data for plottingplot_data <- country_analysis %>%group_by(country_group) %>%summarise(count =sum(n),percentage =sum(percentage) ) %>%# Ensure "Other" appears at the endmutate(country_group =factor(country_group),country_group =fct_relevel(fct_reorder(country_group, percentage, .desc =TRUE), "Other", after =Inf) )# Calculate total participantstotal_participants <-sum(country_analysis$n)# Create the base note textnote_text <-paste0("Note: 'Other' includes frequencies <5%: ", other_countries_list)# Manually wrap the note text to ensure it's fully visiblewrapped_note <-str_wrap(note_text, width =100)# Create the plot with adjusted bottom margin and caption stylingcountry_plot <-ggplot(plot_data, aes(x = country_group, y = count)) +geom_bar(stat ="identity", fill ="steelblue", width =0.7) +geom_text(aes(label =sprintf("%d\n(%.1f%%)", count, percentage)),position =position_stack(vjust =0.5),vjust =-0.5, size =3.5) +labs(title ="Distribution of Participants by Country of Residence",x ="Country",y =paste0("Number of Participants (N = ", total_participants, ")"),caption = wrapped_note ) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold", hjust =0.5),axis.title =element_text(size =12),axis.text.x =element_text(size =10, angle =45, hjust =1),axis.text.y =element_text(size =10),# Style the caption to ensure it's readableplot.caption =element_text(size =9, hjust =0, vjust =1,lineheight =1.2,margin =margin(t =15) ),# Add extra bottom margin for the captionplot.margin =margin(b =100, t =20, r =20, l =20) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Display the plotprint(country_plot)```## Results and DiscussionKey observations:• 39 counties in total• The United States has the highest representation with 610 participants (39.2%)• The United Kingdom follows with 358 participants (23%)• Australia is third with 326 participants (20.9%)• These top three countries account for approximately 83% of all participants• The remaining countries (Canada, Italy, and New Zealand) collectively represent about 17% of participants## Comparison Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Clean country names and create RMT factordata_combined <- data_combined %>%mutate(countryLive =case_when( countryLive =="United States of America (USA)"~"USA", countryLive =="United Kingdom (UK)"~"UK",TRUE~ countryLive ),RMTMethods_YN =factor(RMTMethods_YN, levels =c(0, 1),labels =c("No RMT", "RMT")) )# Get top 6 countriestop_6_countries <- data_combined %>%count(countryLive) %>%top_n(6, n) %>%pull(countryLive)# Filter data for top 6 countriesdata_for_test <- data_combined %>%filter(countryLive %in% top_6_countries, !is.na(RMTMethods_YN))# Calculate group totals for each RMT grouprmt_group_totals <- data_for_test %>%group_by(RMTMethods_YN) %>%summarise(group_N =n())# Calculate statistics with percentages WITHIN each RMT group (not within country)country_rmt_stats <- data_for_test %>%group_by(RMTMethods_YN, countryLive) %>%summarise(count =n(), .groups ='drop') %>%left_join(rmt_group_totals, by ="RMTMethods_YN") %>%mutate(percentage = count / group_N *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%# Calculate total per country (for ordering in plot)group_by(countryLive) %>%mutate(total_country =sum(count)) %>%ungroup()# Create contingency table for statistical testcontingency_table <-table( data_for_test$countryLive, data_for_test$RMTMethods_YN)# Prepare legend labels with group total N includedlegend_labels <-setNames(paste0(levels(data_for_test$RMTMethods_YN), " (N = ", rmt_group_totals$group_N, ")"),levels(data_for_test$RMTMethods_YN))# Get expected frequencies without running a test yetn <-sum(contingency_table)row_sums <-rowSums(contingency_table)col_sums <-colSums(contingency_table)expected_counts <-outer(row_sums, col_sums) / n# Use Fisher's exact test to avoid chi-square approximation warningsfisher_test <-fisher.test(contingency_table, simulate.p.value =TRUE, B =10000)test_name <-"Fisher's exact test"# Print test resultscat("\n", test_name, "Results:\n", sep="")print(fisher_test)# Print expected frequenciescat("\nExpected frequencies:\n")print(round(expected_counts, 2))# Create grouped bar plot with percentages within RMT groupsplot <-ggplot(country_rmt_stats, aes(x =reorder(countryLive, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position ="dodge",color ="black") +geom_text(aes(label = label),position =position_dodge(width =0.9),vjust =-0.5,size =3.5) +scale_fill_manual(values =c("lightblue", "steelblue"),labels = legend_labels) +labs(title ="RMT Usage by Country (Top 6)",subtitle =paste0(test_name, ": p ", ifelse(fisher_test$p.value < .001, "< .001", paste0("= ", sprintf("%.3f", fisher_test$p.value)))),x ="Country",y ="Count of Participants",fill ="RMT Usage",caption ="Note: Percentages are calculated within each RMT group, not within countries") +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),axis.title =element_text(size =12),legend.position ="top",plot.caption =element_text(hjust =0, size =10) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)) )# Display the plotprint(plot)# Calculate proportions of RMT users in each countrycountry_proportions <- data_for_test %>%group_by(countryLive) %>%summarise(total =n(),rmt_users =sum(RMTMethods_YN =="RMT"),rmt_proportion = rmt_users/total,rmt_percentage = rmt_proportion *100 ) %>%arrange(desc(rmt_proportion))cat("\nRMT Usage Proportions by Country:\n")print(country_proportions)# Pairwise proportion tests with Bonferroni correctioncountries <-unique(country_proportions$countryLive)n_countries <-length(countries)pairwise_tests <-data.frame()for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) { country1 <- countries[i] country2 <- countries[j]# Get data for both countries data1 <- data_for_test %>%filter(countryLive == country1) data2 <- data_for_test %>%filter(countryLive == country2)# Get counts for proportion test x1 <-sum(data1$RMTMethods_YN =="RMT") x2 <-sum(data2$RMTMethods_YN =="RMT") n1 <-nrow(data1) n2 <-nrow(data2)# Skip if zero denominatorsif (n1 ==0|| n2 ==0) {next }# Create 2x2 table for test test_table <-matrix(c(x1, n1-x1, x2, n2-x2), nrow=2)# Use Fisher's exact test for all pairwise comparisons test <-fisher.test(test_table)# Store results pairwise_tests <-rbind(pairwise_tests, data.frame(country1 = country1,country2 = country2,prop1 = x1/n1,prop2 = x2/n2,diff =abs(x1/n1 - x2/n2),p_value = test$p.value,stringsAsFactors =FALSE )) }}# Apply Bonferroni correctionif (nrow(pairwise_tests) >0) { pairwise_tests$p_adjusted <-p.adjust(pairwise_tests$p_value, method ="bonferroni")cat("\nPairwise Comparisons (Bonferroni-adjusted p-values):\n")print(pairwise_tests %>%arrange(p_adjusted) %>%mutate(prop1 =sprintf("%.1f%%", prop1 *100),prop2 =sprintf("%.1f%%", prop2 *100),diff =sprintf("%.1f%%", diff *100),p_value =sprintf("%.4f", p_value),p_adjusted =sprintf("%.4f", p_adjusted) ) %>%select(country1, prop1, country2, prop2, diff, p_value, p_adjusted))} else {cat("\nNo valid pairwise comparisons could be performed.\n")}```## Results and DiscussionGeographical Patterns in Respiratory Muscle Training Adoption Among Wind Musicians**Overview of Findings**The analysis examines how Respiratory Muscle Training (RMT) device usage varies across different countries among wind instrument musicians. By focusing on the top six countries represented in the sample, this analysis reveals distinctive geographical patterns in RMT adoption that likely reflect broader cultural, educational, and practice-related differences.**Key Statistical Findings**The Fisher's exact test confirms a statistically significant association between country of residence and RMT usage (p \< 0.05). This indicates that the observed geographical variations in RMT adoption are not due to random chance but represent genuine differences across countries. Within each RMT group, the distribution of countries differs notably:**Country-Specific RMT Usage Patterns**The data shows notable differences in RMT adoption rates across countries:1. **High Adoption Countries**: - Australia: 19.3% (63 out of 326 participants) - USA: 18.5% (113 out of 610 participants) - Italy: 17.0% (8 out of 47 participants)2. **Low Adoption Countries**: - Canada: 8.8% (8 out of 91 participants) - UK: 3.9% (14 out of 358 participants) - New Zealand: 3.1% (1 out of 32 participants)This reveals a striking pattern where participants from the UK and New Zealand are approximately 5-6 times less likely to use RMT compared to those from Australia, USA, or Italy.**Pairwise Comparisons**The Bonferroni-adjusted pairwise comparisons identify which specific country differences are statistically significant:1. **Significant Differences** (p \< 0.05 after adjustment): - USA vs. UK: 14.6 percentage point difference (p = 0.0000) - Australia vs. UK: 15.4 percentage point difference (p = 0.0000) - Italy vs. UK: 13.1 percentage point difference (p = 0.0249)2. **Non-Significant Differences** (p \> 0.05 after adjustment): - All other country comparisons, including those between high-adoption countries (Australia, USA, Italy) and between low-adoption countries (Canada, UK, New Zealand)The Bonferroni correction, which adjusts for multiple comparisons, confirms that the UK stands out as having significantly lower RMT usage compared to Australia, USA, and Italy.**Implications and Potential Explanations**Several factors might explain these international differences in RMT adoption:1. **Educational and Training Approaches**: Different educational systems and training methodologies may emphasize technology integration to varying degrees.2. **Cultural Attitudes**: Cultural differences in approaches to teaching, learning, and technology adoption may influence RMT usage.3. **Technological Infrastructure**: Differences in access to technology, internet reliability, and technological support systems might affect adoption rates.4. **Market Penetration**: Technology companies might have targeted certain markets more aggressively or entered them earlier.5. **Professional Norms**: Music education and performance practices may have different established norms regarding technology usage across these countries.The Anglo countries show an interesting split, with the USA and Australia having high adoption rates while the UK and New Zealand have notably low rates. This suggests that shared language and similar cultural backgrounds do not necessarily lead to similar technology adoption patterns.**Limitations**While these findings are statistically significant, some considerations are worth noting:1. **Sample Size Variations**: The number of participants varies considerably across countries (from 32 in New Zealand to 610 in the USA), which affects the precision of the estimates.2. **Limited Coverage for Italy and New Zealand**: These countries have relatively small sample sizes (47 and 32 respectively), making their estimates less reliable.3. **Potential Confounding Variables**: Factors like participant age, professional status, or educational background might vary across countries and influence the results.**Conclusion**The analysis clearly demonstrates that RMT usage varies significantly by country of residence, with particularly strong differences between high-adoption countries (Australia, USA, Italy) and the UK. These findings suggest that geographical and potentially cultural factors play an important role in technology adoption within music education and performance settings. Further research could explore the specific mechanisms behind these differences and their implications for educational policy and technology development.# Country of Education## Descriptive Stats```{r}# Read the datadata_combined <-read_excel("../Data/R_Import_Transformed_15.02.25.xlsx", sheet ="Combined")# Calculate total Ntotal_N <-nrow(data_combined)# Clean country namesdata_combined <- data_combined %>%mutate(countryEd =case_when( countryEd =="United States of America (USA)"~"USA", countryEd =="United Kingdom (UK)"~"UK",TRUE~as.character(countryEd) ) )# Identify the top 6 countries from countryEdtop_6_countryEd <- data_combined %>%count(countryEd, sort =TRUE) %>%top_n(6, n) %>%pull(countryEd)# Filter data for these top 6 countriesdata_top6_edu <- data_combined %>%filter(countryEd %in% top_6_countryEd)# Calculate statistics for plotting and analysisedu_stats <- data_top6_edu %>%count(countryEd) %>%arrange(desc(n)) %>%mutate(percentage = n /sum(n) *100,label =paste0(n, "\n(", sprintf("%.1f", percentage), "%)") )# Chi-square test for equal proportionschi_test <-chisq.test(edu_stats$n)# Create contingency table for post-hoc analysiscountries <-sort(unique(data_top6_edu$countryEd))n_countries <-length(countries)pairwise_tests <-data.frame()# Perform pairwise proportion testsfor(i in1:(n_countries-1)) {for(j in (i+1):n_countries) { country1 <- countries[i] country2 <- countries[j] count1 <- edu_stats$n[edu_stats$countryEd == country1] count2 <- edu_stats$n[edu_stats$countryEd == country2]# Perform proportion test test <-prop.test(x =c(count1, count2),n =c(sum(edu_stats$n), sum(edu_stats$n)) ) pairwise_tests <-rbind(pairwise_tests, data.frame(country1 = country1,country2 = country2,p_value = test$p.value,stringsAsFactors =FALSE )) }}# Apply Bonferroni correctionpairwise_tests$p_adjusted <-p.adjust(pairwise_tests$p_value, method ="bonferroni")# Create the plotedu_plot <-ggplot(edu_stats, aes(x =reorder(countryEd, -n), y = n)) +geom_bar(stat ="identity", fill ="steelblue", color ="black") +geom_text(aes(label = label), vjust =-0.5, size =4) +labs(title ="Top 6 Countries of Education",subtitle =paste0("χ²(", chi_test$parameter, ") = ", sprintf("%.2f", chi_test$statistic),", p ", ifelse(chi_test$p.value < .001, "< .001", paste0("= ", sprintf("%.3f", chi_test$p.value)))),x ="Country of Education",y =paste0("Count of Participants (N = ", total_N, ")")) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))# Print statistical resultsprint("Chi-square Test of Equal Proportions Results:")print(chi_test)print("\nDescriptive Statistics:")print(edu_stats)print("\nPairwise Comparisons (Bonferroni-adjusted p-values):")print(pairwise_tests %>%arrange(p_adjusted) %>%mutate(p_value =sprintf("%.4f", p_value),p_adjusted =sprintf("%.4f", p_adjusted) ))# Display the plotprint(edu_plot)```## Results and Discussion**Distribution of Participants**- **United States of America (USA)**: Largest share with 620 participants (39.8%)- **United Kingdom (UK)**: Second largest with 364 participants (23.4%)- **Australia**: Third with 321 participants (20.6%)- **Canada**: 92 participants (5.9%)- **Italy**: 44 participants (2.8%)- **New Zealand**: 27 participants (1.7%)**Key Observations**:- The top three countries (USA, UK, and Australia) account for 83.8% of all participants' educational background- English-speaking countries dominate the top positions, representing 91.4% of the top 6 response- There's a significant drop between the top 3 and the remaining countries- The distribution pattern is very similar to the current country of residence data, suggesting most participants stayed in their country of education**Comparison to Current Residence**:USA: Education (620) vs Residence (610) UK: Education (364) vs Residence (358) Australia: Education (321) vs Residence (326) Canada: Education (92) vs Residence (91) Italy: Education (44) vs Residence (47) New Zealand: Education (27) vs Residence (32)This shows relatively small differences between where participants were educated and where they currently live, indicating low international mobility after education.## Comparison StatsStill getting a Chi-Squared Warning, but added Fisher's```{r}library(readxl)library(dplyr)library(ggplot2)# Robust Data Preparation Functionprepare_rmt_data <-function(file_path, sheet ="Combined") {tryCatch({# Read data with standardized cleaning data_combined <-read_excel(file_path, sheet = sheet) data_cleaned <- data_combined %>%mutate(# Comprehensive country name standardizationcountryEd =case_when(grepl("United States|USA", countryEd, ignore.case =TRUE) ~"USA",grepl("United Kingdom|UK", countryEd, ignore.case =TRUE) ~"UK",TRUE~as.character(countryEd) ),# Robust RMT factor conversionRMTMethods_YN =factor(`RMTMethods_YN`, levels =c(0, 1), labels =c("No RMT", "RMT") ) )return(data_cleaned) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Advanced Statistical Analysis Functionperform_comprehensive_analysis <-function(data) {# Identify Top 6 Countries top_6_countryEd <- data %>%count(countryEd, sort =TRUE) %>%top_n(6, n) %>%pull(countryEd)# Filter data to top 6 countries data_top6_edu <- data %>%filter(countryEd %in% top_6_countryEd)# Create contingency table contingency_table <-table(data_top6_edu$countryEd, data_top6_edu$RMTMethods_YN)# Comprehensive test selection and reporting analyze_test_assumptions <-function(cont_table) {# Calculate expected frequencies chi_results <-suppressWarnings(chisq.test(cont_table)) expected_freq <- chi_results$expected# Detailed frequency checks total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Verbose reporting of frequency conditionscat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Determine most appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(cont_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction adjusted_chi_test <-chisq.test(cont_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = adjusted_chi_test$p.value,statistic = adjusted_chi_test$statistic,parameter = adjusted_chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", adjusted_chi_test$parameter) )) } }# Perform test test_results <-analyze_test_assumptions(contingency_table)# Pairwise comparisons with Fisher's exact test pairwise_comparisons <-function(cont_table) { countries <-rownames(cont_table) n_countries <-length(countries) results <-data.frame(comparison =character(),p_value =numeric(),adj_p_value =numeric(),stringsAsFactors =FALSE )for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) {# Use Fisher's exact test for all pairwise comparisons test <-fisher.test(cont_table[c(i,j),]) results <-rbind(results, data.frame(comparison =paste(countries[i], "vs", countries[j]),p_value = test$p.value,adj_p_value =NA )) } }# Bonferroni correction results$adj_p_value <-p.adjust(results$p_value, method ="bonferroni")return(results) }# Compute pairwise comparisons pairwise_results <-pairwise_comparisons(contingency_table)# Return comprehensive resultslist(test_results = test_results,pairwise_results = pairwise_results,data_top6_edu = data_top6_edu,contingency_table = contingency_table )}# Visualization Functioncreate_rmt_plot <-function(analysis_results) {# Prepare plot data plot_data <- analysis_results$data_top6_edu %>%group_by(countryEd, RMTMethods_YN) %>%summarise(count =n(), .groups ='drop') %>%group_by(countryEd) %>%mutate(total_country =sum(count),percentage = count / total_country *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%ungroup()# Compute totals for legend legend_totals <- analysis_results$data_top6_edu %>%group_by(RMTMethods_YN) %>%summarise(total =n(), .groups ='drop')# Create legend labels legend_labels <-setNames(paste0(legend_totals$RMTMethods_YN, " (N = ", legend_totals$total, ")"), legend_totals$RMTMethods_YN )# Prepare subtitle based on test type test_results <- analysis_results$test_results subtitle_text <-if (test_results$test_type =="Chi-Square with Continuity Correction") {paste0("Chi-square test: ", sprintf("χ²(%d) = %.2f", test_results$parameter, test_results$statistic),", p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) } else {paste0("Fisher's Exact Test (Monte Carlo): p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) }# Create the plotggplot(plot_data, aes(x =reorder(countryEd, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9), color ="black") +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Country of Education by RMT Usage (Top 6)",subtitle = subtitle_text,x ="Country of Education",y =paste0("Count of Participants (N = ", sum(plot_data$count), ")"),fill ="RMT Usage",caption ="Note: Percentages are out of the total N for each group" ) +scale_fill_discrete(labels = legend_labels) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),plot.caption =element_text(hjust =0, size =10) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))}# Main Execution Functionrun_rmt_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data prepared_data <-prepare_rmt_data(file_path)# Perform comprehensive analysis analysis_results <-perform_comprehensive_analysis(prepared_data)# Create visualization rmt_plot <-create_rmt_plot(analysis_results)# Print results to consolecat("Statistical Test Details:\n")cat("Test Type:", analysis_results$test_results$test_type, "\n")cat("P-value:", analysis_results$test_results$p_value, "\n\n")cat("Contingency Table:\n")print(analysis_results$contingency_table)cat("\nPost-hoc Pairwise Comparisons (Bonferroni-corrected):\n")print(analysis_results$pairwise_results)# Display the plotprint(rmt_plot)# Return results for potential further analysisreturn(analysis_results)}# Run the analysisresults <-run_rmt_analysis()``````{r}# Only top 4 countries shown - Trying to increase the robustness of the analyses.library(readxl)library(dplyr)library(ggplot2)library(stringr)# Robust Data Preparation Functionprepare_rmt_data <-function(file_path, sheet ="Combined") {tryCatch({# Read data with standardized cleaning data_combined <-read_excel(file_path, sheet = sheet) data_cleaned <- data_combined %>%mutate(# Comprehensive country name standardizationcountryEd =case_when(grepl("United States|USA", countryEd, ignore.case =TRUE) ~"USA",grepl("United Kingdom|UK", countryEd, ignore.case =TRUE) ~"UK",TRUE~as.character(countryEd) ),# Robust RMT factor conversionRMTMethods_YN =factor(`RMTMethods_YN`, levels =c(0, 1), labels =c("No RMT", "RMT") ) )return(data_cleaned) }, error =function(e) {stop(paste("Error in data preparation:", e$message)) })}# Advanced Statistical Analysis Functionperform_comprehensive_analysis <-function(data) {# Get all countries and their frequencies all_countries <- data %>%filter(!is.na(countryEd)) %>%count(countryEd, sort =TRUE) %>%mutate(percentage = n /sum(n) *100)# Identify Top 4 Countries top_4_countryEd <- all_countries %>%top_n(4, n) %>%pull(countryEd)# Get excluded countries for the note excluded_countries <- all_countries %>%filter(!(countryEd %in% top_4_countryEd)) %>%mutate(country_info = countryEd) %>%pull(country_info)# Create excluded countries list as a string excluded_countries_text <-paste(excluded_countries, collapse =", ")# Filter data to top 4 countries data_top4_edu <- data %>%filter(countryEd %in% top_4_countryEd)# Create contingency table contingency_table <-table(data_top4_edu$countryEd, data_top4_edu$RMTMethods_YN)# Comprehensive test selection and reporting analyze_test_assumptions <-function(cont_table) {# Calculate expected frequencies chi_results <-suppressWarnings(chisq.test(cont_table)) expected_freq <- chi_results$expected# Detailed frequency checks total_cells <-length(expected_freq) low_freq_cells <-sum(expected_freq <5) min_expected_freq <-min(expected_freq)# Verbose reporting of frequency conditionscat("Expected Frequency Analysis:\n")cat("Minimum Expected Frequency:", round(min_expected_freq, 2), "\n")cat("Cells with Expected Frequency < 5:", low_freq_cells, "out of", total_cells, "cells (", round(low_freq_cells / total_cells *100, 2), "%)\n\n")# Determine most appropriate testif (min_expected_freq <1|| (low_freq_cells / total_cells) >0.2) {# Use Fisher's exact test with Monte Carlo simulation exact_test <-fisher.test(cont_table, simulate.p.value =TRUE, B =10000)return(list(test_type ="Fisher's Exact Test (Monte Carlo)",p_value = exact_test$p.value,statistic =NA,method ="Fisher's Exact Test with Monte Carlo Simulation" )) } else {# Use chi-square test with Yates' continuity correction adjusted_chi_test <-chisq.test(cont_table, correct =TRUE)return(list(test_type ="Chi-Square with Continuity Correction",p_value = adjusted_chi_test$p.value,statistic = adjusted_chi_test$statistic,parameter = adjusted_chi_test$parameter,method =paste("Pearson's Chi-squared test with Yates' continuity correction,","df =", adjusted_chi_test$parameter) )) } }# Perform test test_results <-analyze_test_assumptions(contingency_table)# Pairwise comparisons with Chi-square tests (if appropriate) pairwise_comparisons <-function(cont_table) { countries <-rownames(cont_table) n_countries <-length(countries) results <-data.frame(comparison =character(),p_value =numeric(),adj_p_value =numeric(),stringsAsFactors =FALSE )for(i in1:(n_countries-1)) {for(j in (i+1):n_countries) {# Get the 2x2 subtable subtable <- cont_table[c(i,j),]# Check assumptions for chi-square expected <-chisq.test(subtable)$expected min_expected <-min(expected)# Use appropriate test based on expected frequenciesif (min_expected <5) { test <-fisher.test(subtable) } else { test <-chisq.test(subtable, correct =TRUE) } results <-rbind(results, data.frame(comparison =paste(countries[i], "vs", countries[j]),p_value = test$p.value,adj_p_value =NA )) } }# Bonferroni correction results$adj_p_value <-p.adjust(results$p_value, method ="bonferroni")return(results) }# Compute pairwise comparisons pairwise_results <-pairwise_comparisons(contingency_table)# Return comprehensive resultslist(test_results = test_results,pairwise_results = pairwise_results,data_top4_edu = data_top4_edu,contingency_table = contingency_table,excluded_countries_text = excluded_countries_text,all_countries = all_countries )}# Visualization Functioncreate_rmt_plot <-function(analysis_results) {# Prepare plot data plot_data <- analysis_results$data_top4_edu %>%group_by(countryEd, RMTMethods_YN) %>%summarise(count =n(), .groups ='drop') %>%group_by(countryEd) %>%mutate(total_country =sum(count),percentage = count / total_country *100,label =paste0(count, "\n(", sprintf("%.1f", percentage), "%)") ) %>%ungroup()# Compute totals for legend legend_totals <- analysis_results$data_top4_edu %>%group_by(RMTMethods_YN) %>%summarise(total =n(), .groups ='drop')# Create legend labels legend_labels <-setNames(paste0(legend_totals$RMTMethods_YN, " (N = ", legend_totals$total, ")"), legend_totals$RMTMethods_YN )# Prepare subtitle based on test type test_results <- analysis_results$test_results subtitle_text <-if (test_results$test_type =="Chi-Square with Continuity Correction") {paste0("Chi-square test: ", sprintf("χ²(%d) = %.2f", test_results$parameter, test_results$statistic),", p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) } else {paste0("Fisher's Exact Test (Monte Carlo): p ", ifelse(test_results$p_value <0.001, "< .001", paste("=", sprintf("%.3f", test_results$p_value)))) }# Create the caption with excluded countries note_text <-paste0("Note. All countries with frequencies less than 5% were excluded from the Figure, including: ", analysis_results$excluded_countries_text)# Wrap the note text for better display wrapped_note <-str_wrap(note_text, width =100)# Count how many lines are in the wrapped note for margin adjustment line_count <-str_count(wrapped_note, "\n") +1# Create the plotggplot(plot_data, aes(x =reorder(countryEd, -total_country), y = count, fill = RMTMethods_YN)) +geom_bar(stat ="identity", position =position_dodge(width =0.9), color ="black") +geom_text(aes(label = label), position =position_dodge(width =0.9), vjust =-0.5, size =3.5) +labs(title ="Country of Education by RMT Usage (Top 4)",subtitle = subtitle_text,x ="Country of Education",y =paste0("Count of Participants (N = ", sum(plot_data$count), ")"),fill ="RMT Usage",caption = wrapped_note ) +scale_fill_discrete(labels = legend_labels) +theme_minimal() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text.x =element_text(size =12, angle =45, hjust =1),axis.text.y =element_text(size =12),plot.caption =element_text(hjust =0, size =9, lineheight =1.2),# Adjust bottom margin based on the number of lines in the noteplot.margin =margin(b =10+ (line_count *12), t =20, r =20, l =20) ) +scale_y_continuous(expand =expansion(mult =c(0, 0.2)))}# Main Execution Functionrun_rmt_analysis <-function(file_path ="../Data/R_Import_Transformed_15.02.25.xlsx") {# Prepare data prepared_data <-prepare_rmt_data(file_path)# Perform comprehensive analysis analysis_results <-perform_comprehensive_analysis(prepared_data)# Create visualization rmt_plot <-create_rmt_plot(analysis_results)# Print results to consolecat("Statistical Test Details:\n")cat("Test Type:", analysis_results$test_results$test_type, "\n")cat("P-value:", analysis_results$test_results$p_value, "\n\n")cat("Contingency Table:\n")print(analysis_results$contingency_table)cat("\nPost-hoc Pairwise Comparisons (Bonferroni-corrected):\n")print(analysis_results$pairwise_results)cat("\nExcluded Countries:\n")cat(analysis_results$excluded_countries_text, "\n")# Display the plotprint(rmt_plot)# Return results for potential further analysisreturn(analysis_results)}# Run the analysisresults <-run_rmt_analysis()```## Results and Discussion**Statistical Significance***Chi-square Test:* The chi-square test produced a statistic of approximately 44.25 with 7 degrees of freedom and a p-value of about 1.92×10−71.92×10−7. This highly significant result indicates that the distribution of education levels differs between the two groups (those who use RMT methods versus those who do not), i.e., the association is statistically significant.*Effect Size (Cramer’s V):* Cramer’s V was calculated to be around 0.169. Although not extremely large, this value suggests a modest association between education and the use of RMT methods. In social sciences, such effect sizes are common when dealing with categorical variables; even a modest effect size can be substantively meaningful.**Detailed Group Comparison***Group Proportions:* The summary statistics break down the relative frequencies of each education category for the “No” and “Yes” groups. For each education level, both counts and percentages were calculated. In the plots, you can see the proportions clearly marked with labels (which include counts and percentages), making it easy to compare how education distributions differ between the two groups.*Proportion Differences:* By pivoting the grouped data, we computed the differences in percentage points between the “Yes” and “No” groups for each education level. Education levels with the largest differences indicate where the distribution deviates most from what is expected under independence. These differences can help pinpoint which education categories are driving the overall chi-square statistic.**Standardized Residuals***Interpreting Residuals:* The standardized residuals provide additional insight into which specific cells of the contingency table contribute most to the chi-square result. A positive residual in a cell suggests that the observed count is higher than what is expected under the null hypothesis, whereas a negative residual indicates under representation. For example, if the “Doctorate” category shows a strongly positive residual for one group, it means that group has a higher-than-expected frequency of individuals holding a doctorate. Conversely, a negative value in another cell would point to an under representation relative to expectation.*Identifying Key Differences:* Reviewing these residuals allows us to identify which education levels are statistically noteworthy. For instance, if “Doctorate” or “Bachelors degree” have notable positive or negative deviations, these categories are influencing the overall association significantly.**Visualization of Findings**Two types of plots were generated: *Side-by-Side Bar Plot:* This plot displays the percentage breakdown of each education category for the “No” and “Yes” groups side by side. With detailed labels that include both the count and percentage, this visualization makes it straightforward to compare each education level between groups. The subtitle also reminds the viewer of the total sample size for each group, ensuring a complete context for interpretation.*Dot/Line Plot:* As an alternative, the dot/line plot connects the percentages across education categories for both groups. This plot emphasizes trends and differences in the pattern of education distribution. The connecting lines help to visualize how the relative positions of categories shift between groups. Both visualizations have been adjusted to ensure that all labels are fully visible by setting appropriate margins and enlarging plot height if necessary.**Overall Interpretation***Significant Association:* The significant chi-square result confirms that the pattern of education across groups is not random. There is a statistically significant association between having a particular education level and whether someone uses RMT methods.*Magnitude and Specific Contributions:* The moderate effect size (Cramer’s V ≈ 0.169) implies that while the association exists, it is of modest strength. Importantly, the examination of standardized residuals and proportion differences shines a light on which specific education categories drive this association. These insights might be used to explore further why certain education levels are more, or less, likely to be associated with RMT method usage.*Practical Implications:* Depending on the context of the study (e.g., educational policies, targeted interventions, or market segmentation), these findings could inform decision-making by pinpointing demographic groups for further investigation or tailored strategies. For instance, if a particular education category is underrepresented among RMT users, it might signal potential barriers or areas for targeted improvement.**Summary**The findings indicate that country of education is significantly associated with the use of RMT methods. The statistical tests, along with the calculated effect size and detailed graphical representations, provide a robust basis for interpreting the differences across education levels. The analysis not only confirms the presence of differences but also identifies which education categories are most influential in driving these differences, offering valuable insights for further study or practical applications.**Combined Groups**This analysis examines the relationship between country of education and Respiratory Muscle Training (RMT) usage among participants. The data covers four major English-speaking countries: Australia, Canada, the UK, and the USA, with 34 additional countries excluded due to smaller sample sizes.**Statistical Significance**The chi-square test with continuity correction yielded a highly significant result (p \< 0.001), indicating a strong association between country of education and RMT usage. All expected cell frequencies were above the minimum threshold of 5 (minimum = 13.17), confirming that the chi-square test was appropriate for this analysis.**RMT Usage Patterns by Country**From the contingency table, we can observe distinct patterns in RMT adoption:1. **Australia**: 65 out of 321 participants (20.2%) use RMT2. **USA**: 113 out of 620 participants (18.2%) use RMT3. **Canada**: 8 out of 92 participants (8.7%) use RMT4. **UK**: 14 out of 364 participants (3.8%) use RMTThis reveals a striking pattern where participants educated in Australia and the USA have RMT adoption rates approximately 2-5 times higher than those educated in Canada and the UK.**Post-hoc Pairwise Comparisons**The Bonferroni-corrected pairwise comparisons identify which specific country differences are statistically significant:1. **Significant Differences** (p \< 0.05 after adjustment):- Australia vs. UK (p = 2.69 × 10\^-10): Australian-educated participants are significantly more likely to use RMT than UK-educated participants- USA vs. UK (p = 9.52 × 10\^-10): US-educated participants are significantly more likely to use RMT than UK-educated participants2. **Non-Significant Differences** (p \> 0.05 after adjustment):- Australia vs. Canada (p = 0.097): Despite the substantial percentage difference (20.2% vs 8.7%), this comparison falls just short of significance after Bonferroni correction- Australia vs. USA (p = 1.0): Very similar RMT usage rates between these countries- Canada vs. UK (p = 0.362): Both have lower RMT usage rates- Canada vs. USA (p = 0.203): Not statistically significant after correction**Implications and Potential Explanations**Several factors might explain these international differences in RMT adoption based on education location:1. **Educational Approaches**: Australia and the USA may emphasize technology integration in music education more heavily than the UK and Canada. This could reflect differences in curriculum design, pedagogical approaches, or resource allocation within educational institutions.2. **Technological Infrastructure in Educational Settings**: Educational institutions in Australia and the USA might have greater access to technology resources, better digital infrastructure, or more technology-oriented training programs.3. **Cultural Attitudes Toward Technology in Education**: Different educational systems may have varying cultural perspectives on the role of technology in learning and performance. These cultural attitudes could shape how students are taught to view technological aids.4. **Educational Investment in Technology**: Countries may differ in their funding priorities for music education, with some investing more heavily in technological resources and training.5. **Different Teaching Traditions**: Musical education may follow different teaching traditions and philosophies across these countries, with some more receptive to technological innovation than others.**Broader Context**The pattern observed here—with Australia and the USA having higher RMT adoption rates than the UK and Canada—suggests a potential divide in approaches to music education and technology integration across these major English-speaking countries. This divide persists despite shared language and many cultural similarities, highlighting how educational practices can diverge significantly across national boundaries.**Limitations and Considerations**1. **Sample Size Variations**: The number of participants varies considerably across countries (from 92 in Canada to 620 in the USA), affecting the precision of estimates.2. **Excluded Countries**: 34 countries were excluded from the analysis due to small sample sizes. This extensive list includes several European countries (Germany, Italy, Netherlands, etc.) and countries from various regions worldwide, limiting the global representativeness of the findings.3. **Historical Context**: The data doesn't indicate when participants were educated in these countries, so the findings may reflect historical differences in educational approaches across different time periods.4. **Education vs. Residence**: This analysis focuses on country of education rather than current residence, which may provide insights into formative influences rather than current environmental factors affecting RMT usage.**Conclusion**The analysis provides strong evidence that country of education is significantly associated with RMT usage, with particularly pronounced differences between Australia/USA and the UK. These findings suggest that educational systems and approaches play an important role in shaping technology adoption practices among musicians. Understanding these patterns could help inform international educational policy, curriculum development, and technology integration strategies in music education.