New DATA1901 Project

Author

SID: 540904066, 550064633, 560328048, 560609084, 560633584, 560653928

Code
# Loading packages while hiding warnings and startup messages
suppressWarnings(suppressPackageStartupMessages({
  library(tidyverse)
  library(rafalib)
  library(ggplot2)
  library(readxl)
  library(plotly)
  library(dplyr)
  library(rmarkdown)
  library(data.table)
}))

Executive Summary

This report investigates the relationship between gender and symptom reporting, measured by baseline GASE scores and total symptom change. Females exhibit higher symptom severity, greater variability, and greater susceptibility to nocebo effects than males. Although minor, these differences are consistent and indicate a moderate relationship between gender and symptom reporting.

Exploratory Data Analysis

Code
# Setting up necessary packages
library(tidyverse)
library(ggplot2)
library(dplyr)
library(plotly)
library(rafalib)
library(data.table)
library(rmarkdown)
library(readxl)

#the excel data was edited to include a "Count" heading for numbered rows, to remove read_excel outputs regarding absent variable names
data = readxl::read_excel("~/Documents/Uni Stuff/DATA1901/DATA1901_group_project.xlsx")

#removing "other" gender category due to limited sample size
cleaned_data = filter(data, gender != "other") 

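#adding Side Effect Modelling flag (SEM); a missing social_mod, stored as NA or as the text "NA", means no side-effect modelling
cleaned_data = mutate(cleaned_data, SEM = ifelse(is.na(social_mod) | social_mod == "NA", "No SEM", "SEM"))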

#adding gender filtered data sets to focus on gendered-correlations/summaries
male_filtered = filter(cleaned_data, gender == "Male")
female_filtered = filter(cleaned_data, gender == "Female")

#Adding an Intervention column that allocates participants into Intervention groups
cleaned_data <- mutate(cleaned_data, Intervention = case_when(
  gid %in% c("INSMDN", "ISMDN", "INSMWT", "ISMWT") ~ "Intervention",
  #Natural history group watched the No Intervention Video
  gid %in% c("NISMDN", "NINSMDN", "NISMWT", "NINSMWT", "NHDN", "NHWT") ~ "No Intervention"
))

The data was sourced from a research article¹ and comprises 161 participants and 80 variables. Our investigation focuses on gender (nominal qualitative) and two interrelated variables that capture symptom reporting:

  1. Baseline side effect symptom scores (baseline_gase),² discrete quantitative

  2. Side effect reporting difference scores (tot), discrete quantitative, calculated by subtracting pre-placebo symptom scores from post-placebo scores

Additionally, we used intervention_bin,³ which records whether participants experienced positive social modelling, and created SEM, which separates participants according to whether they experienced side-effect modelling.⁴
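How the SEM flag behaves depends on how missing social_mod entries are stored in the spreadsheet. The following is a minimal sketch in base R, assuming a made-up social_mod vector (the level names "model_a" and "model_b" are hypothetical) that contains both the text "NA" and a true missing value:

```r
# Hypothetical social_mod values; the level names are illustrative only.
# One entry is the text "NA" and one is a true missing value.
social_mod <- c("model_a", "NA", NA, "model_b")

# Comparing against the string "NA" leaves true missing values as NA
naive_flag <- ifelse(social_mod != "NA", "SEM", "No SEM")

# Checking is.na() first assigns both kinds of missing entry to "No SEM"
robust_flag <- ifelse(is.na(social_mod) | social_mod == "NA", "No SEM", "SEM")

print(naive_flag)
print(robust_flag)
```

If the column only ever holds the text "NA", both versions agree; the second form simply guards against the case where the reader converts those cells to true missing values.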

Limitations

The research article deliberately screened participants, excluding those with extreme baseline symptoms.⁵ This introduces potential selection bias, as individuals reporting severe symptoms are underrepresented. Because our analysis compares cumulative scores across gender and does not analyse individual symptoms, any gender disparity in symptom reporting may differ by symptom and by severity.⁶

Assumptions

We assumed that cumulative symptom reporting was similar to individual symptom reporting and reflective of general trends. We also assumed that the “other” gender category was too small (n = 4) for meaningful trends to appear, so it was omitted. We assumed that participants understood the survey questions and reported symptoms truthfully. Contrary to the research article, we retained the outlier it excluded, assuming the score was valid regardless of severity.⁷
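One way to check an assumption like the retained outlier is to compare spread by gender with and without the flagged row. This is only a sketch on invented scores, not the study data; the choice of which row counts as the outlier is an assumption made for illustration.

```r
# Hypothetical baseline scores; not the study data
scores <- data.frame(
  gender = rep(c("Male", "Female"), each = 4),
  baseline_gase = c(10, 11, 12, 16, 10, 12, 14, 18)
)

# Assumed outlier flag: here the single Male score of 16
is_outlier <- scores$gender == "Male" & scores$baseline_gase == 16

# Standard deviation of baseline_gase for each gender
sd_by_gender <- function(d) sapply(split(d$baseline_gase, d$gender), sd)

sd_with <- sd_by_gender(scores)
sd_without <- sd_by_gender(scores[!is_outlier, ])

print(round(rbind(with_outlier = sd_with, without_outlier = sd_without), 2))
```

If the ordering of the two genders is the same in both rows of the printed table, the conclusion about relative spread does not hinge on the flagged observation.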

Research Question

Is there a significant relationship between gender and symptom reporting, in both overall rate and severity, with and without potential nocebo effects?

Analysis

The analysis revealed that females report more severe symptoms than males, with greater variability, and are more sensitive to potential nocebo effects. The disparities between genders, while marginal, remain consistent after accounting for potential confounders, suggesting a moderate relationship between gender and symptom reporting.

Baseline GASE

Figures 1 and 2

Code
#Horizontal boxplot of Baseline GASE for males
# You can play with these plots! Try clicking on the small coloured squares on the right of the graphs to toggle between graphs.
boxplot_bgase <- plot_ly(x = male_filtered$baseline_gase, type = "box", name = "Male")
boxplot_bgase <- boxplot_bgase %>% 
  #Adding second boxplot of females for comparison
  add_trace(x = female_filtered$baseline_gase, type = "box", name = "Female") %>%
  layout(title = "Male and Female Baseline GASE", 
         xaxis = list(title = "Baseline GASE"),
         yaxis = list(title = "Gender"))

#Histogram of Baseline GASE for males and females
#histnorm = "probability" plots the proportion of each gender falling in a bin, not raw counts
gender_histogram_baseline <- plot_ly(histnorm = "probability") %>%
  add_histogram(x = male_filtered$baseline_gase, name = "Male", nbinsx = 15, opacity = 1.0, marker = list(color = '#2E91E5')) %>% # Making colours match as close as possible for aesthetics
  add_histogram(x = female_filtered$baseline_gase, name = "Female", nbinsx = 15, opacity = 1.0, marker = list(color = 'darkorange')) %>%
  layout(title = "Male and Female Baseline GASE", 
         xaxis = list(title = "Baseline GASE"),
         yaxis = list(title = "Proportion"))

# putting two relevant graphs on top of each other
stackem <- subplot(boxplot_bgase, gender_histogram_baseline, 
                   # giving separate labels to y axis of graphs
                   titleY = TRUE,
                   nrows = 2) %>%
  layout(title = list(text = "Male and Female Baseline GASE"),
         # giving the same label to the x axis
         xaxis2 = list(title = "Baseline GASE Score"))

stackem

Figure 1 (boxplot) shows similar average baseline scores across genders: the male median is slightly higher than the female median, but the male mean is lower (11.62 vs. 11.84), and both genders share the same IQR (2.00). However, the longer upper tail and greater range of female scores suggest that females report higher symptom scores at the upper end. Figure 2 corroborates this; the female standard deviation is notably higher than that of the more clustered male scores⁸ (3.30 vs. 2.45). One possible explanation is that social norms around masculinity encourage men to downplay symptoms. Barsky et al. (2001)⁹ find similar results and attribute the greater spread in females partly to biological factors, such as fluctuations across menstrual cycles¹⁰ that influence symptom experience.
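Summary figures of this kind (mean, median, IQR, standard deviation per gender) can be produced in a few lines. The sketch below uses base R on invented scores, so its output will not match the values quoted above; the helper name summarise_scores is hypothetical.

```r
# Hypothetical baseline_gase values; not the study data
scores <- data.frame(
  gender = rep(c("Male", "Female"), each = 5),
  baseline_gase = c(10, 11, 11, 12, 13, 10, 10, 11, 13, 18)
)

# Centre and spread measures for one vector of scores
summarise_scores <- function(x) {
  c(mean = mean(x), median = median(x), iqr = IQR(x), sd = sd(x))
}

# One row per gender, one column per measure
summary_table <- t(sapply(split(scores$baseline_gase, scores$gender), summarise_scores))

print(round(summary_table, 2))
```

In this toy data the medians are equal while the female mean and standard deviation are higher, which is the same pattern of "similar centre, wider spread" described for Figures 1 and 2.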

Figure 3

Code
# Male and Female Baseline GASE depending on intervention status
# Creating boxplots into Intervention groups
intervention_boxplot <- plot_ly(cleaned_data, x= ~Intervention, y = ~baseline_gase, color = ~gender, type = "box") %>%
  #Further separation into grouped boxplots
  layout(boxmode = "group",
         title = "Male and Female Baseline GASE",
         yaxis = list(title = "Baseline GASE Score"),
         xaxis = list(title = "Intervention Status"))


intervention_boxplot

While the experiment’s intervention and non-intervention groups¹¹ could be confounders, Figure 3 shows that the spread of female baseline_gase scores is consistently greater in both groups, aligning with previous results. Still, the gap between female and male spread is larger in the intervention group (upper tail: 3.75 vs. 0.75) than in the non-intervention group (3 vs. 2). While intervention is a minor confounder, males are still consistently associated with lower and less varied symptom reporting.
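The "upper tail" comparison can be reproduced numerically once a definition is fixed. The sketch below assumes the upper tail means the length of the upper boxplot whisker (whisker end minus upper hinge) as computed by boxplot.stats(); that definition, the helper name upper_tail, and the scores are assumptions for illustration, and plotly may compute quartiles slightly differently.

```r
# Hypothetical scores; not the study data
scores <- data.frame(
  gender = rep(c("Male", "Female"), each = 10),
  Intervention = rep(rep(c("Intervention", "No Intervention"), each = 5), times = 2),
  baseline_gase = c(10, 11, 11, 12, 13,  10, 11, 12, 13, 14,
                    10, 11, 12, 14, 17,  10, 11, 12, 13, 15)
)

# Assumed definition of "upper tail": upper whisker end minus upper hinge.
# boxplot.stats()$stats holds (lower whisker, lower hinge, median,
# upper hinge, upper whisker)
upper_tail <- function(x) {
  s <- grDevices::boxplot.stats(x)$stats
  s[5] - s[4]
}

# Upper whisker length within each gender x Intervention group
tails <- aggregate(baseline_gase ~ gender + Intervention, data = scores, FUN = upper_tail)
names(tails)[3] <- "upper_tail"

print(tails)
```

Comparing the Female and Male rows within each Intervention level gives the same kind of paired comparison as the figures in parentheses above.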

Barsky et al. (2001) concur when analysing symptoms of varying severity,¹² supporting the association between gender and symptom reporting despite possible selection bias.

Symptom Change

Figure 4

Code
# Density Histogram of Symptom Change in Males and Females
p = ggplot(cleaned_data, aes(x = tot, fill = gender)) +
  geom_histogram(aes(y = after_stat(density)), 
                 # Preventing overlap between male and female data
                 position = "dodge", bins = 25)+
  labs(y = "Density", x = "Symptom Change", title = "Symptom Changes in Male vs Female", fill = "Gender")
ggplotly(p)

Figure 4 displays a roughly symmetrical distribution about 0, skewing slightly left, indicating little symptom change overall. Males experienced a smaller mean symptom increase (≈ 0.47) than females (≈ 0.58). Since baseline female scores were already higher, this indicates that male symptom reporting also remained lower in the post-placebo surveys.

Figure 5

Code
#Comparative boxplot of gender and symptom change
box1 = ggplot(cleaned_data, aes(x = gender, y = tot)) +
  geom_boxplot() +
  labs(x = "Gender", y = "Symptom Change", title = "Symptom Changes in Male vs Female")
ggplotly(box1)

Increasing GASE scores are attributed to the nocebo effect.¹³ Females exhibited a higher standard deviation in symptom change scores than males (2.72 vs. 1.87); Figure 5 corroborates this with a greater IQR (2 vs. 1.75) and overall range (23 vs. 11). The greater spread of symptom change in females may suggest a stronger nocebo effect for females. A systematic review of the nocebo effect concurs, suggesting “nocebo responses were more frequent in females” (Vambheim and Flaten, 2017). Nocebo responses may also be enhanced by anxiety, which females could experience at higher rates.
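The spread measures compared here all derive from the difference score. As a rough sketch in base R, assuming hypothetical column names pre_gase and post_gase and invented values, the difference score and its standard deviation, IQR, and range per gender could be computed as follows:

```r
# Hypothetical pre- and post-placebo totals; not the study data.
# The column names pre_gase and post_gase are assumptions.
symptoms <- data.frame(
  gender = rep(c("Male", "Female"), each = 5),
  pre_gase  = c(10, 11, 12, 10, 13, 10, 12, 11, 14, 10),
  post_gase = c(10, 12, 11, 11, 14,  9, 16, 11, 13, 15)
)

# Difference score: post-placebo minus pre-placebo
symptoms$tot <- symptoms$post_gase - symptoms$pre_gase

# Spread of the difference score: standard deviation, IQR, and range width
spread <- function(x) c(sd = sd(x), iqr = IQR(x), range = diff(range(x)))
spread_table <- t(sapply(split(symptoms$tot, symptoms$gender), spread))

print(round(spread_table, 2))
```

A positive tot means more symptoms were reported after the placebo than before, so a wider spread in tot reflects more varied responses to the placebo rather than higher baseline scores.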

Figure 6

Code
#Several comparative boxplots along SEM, Intervention, Symptom Change, and Gender variables
ggplot(cleaned_data, aes(x = tot, y = intervention_bin, colour = gender)) +
  geom_boxplot() +
  #Facet Wrap Splits into two columns based on variable of SEM
  facet_wrap(~ SEM) +
  labs(x = "Symptom Change", y = "Positive Social Modelling", colour = "Gender", title = "PSM & SEM Between Gender")

The experiment split participants into groups by positive social modelling and by side-effect modelling, both of which could act as confounders. Figure 6 shows that within each group, females have greater variability in symptom change: their average standard deviation across groups is greater than that of males (2.59 vs. 1.67), which is consistent with previous findings.
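An "average standard deviation" across confounder groups can be read as the mean of the standard deviations computed within each SEM by intervention_bin cell. The sketch below shows that reading on invented data; the "Yes"/"No" coding of intervention_bin and the unweighted average are assumptions, not necessarily the calculation used for the figures above.

```r
# Hypothetical data with three participants per cell; not the study data
trial <- data.frame(
  gender = rep(c("Male", "Female"), each = 12),
  SEM = rep(rep(c("SEM", "No SEM"), each = 6), times = 2),
  intervention_bin = rep(rep(c("Yes", "No"), each = 3), times = 4),
  tot = c(-1, 0, 1,   0, 1, 2,  -1, 0, 1,   0, 0, 0,
          -2, 0, 2,  -1, 1, 3,  -3, 0, 3,  -1, 0, 1)
)

# Standard deviation of tot within each gender x SEM x intervention_bin cell
cell_sd <- aggregate(tot ~ gender + SEM + intervention_bin, data = trial, FUN = sd)

# Unweighted mean of the cell standard deviations for each gender
avg_sd <- sapply(split(cell_sd$tot, cell_sd$gender), mean)

print(cell_sd)
print(avg_sd)
```

An unweighted mean treats every cell equally regardless of how many participants it holds; with unequal cell sizes, a pooled standard deviation would weight larger cells more heavily.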

Acknowledgements

Group Meetings

  • 23/3/2026, 8:00pm - 8:45pm, Present: Caleb McCubben, David Lu, Franco Maiolo, Harry Liu, Sunmo An

  • 26/3/2026, 12:00pm - 1:00pm, Present: Abhishek Kumar, Caleb McCubben, David Lu, Franco Maiolo, Harry Liu, Sunmo An

  • 30/3/2026, 10:45am - 12:15pm, Present: Caleb McCubben, David Lu

  • 1/4/2026, 10:00am - 1:00pm, Present: Abhishek Kumar, Caleb McCubben, David Lu, Franco Maiolo, Harry Liu

  • 6/4/2026, 6:00pm - 7:00pm, Present: Abhishek Kumar, Harry Liu

  • 9/4/2026, 12:00pm - 11:00pm, Present: Abhishek Kumar, Caleb McCubben, David Lu, Franco Maiolo, Harry Liu, Sunmo An

  • 10/4/2026, 11:00am - 6:00pm, Present: Abhishek Kumar, Caleb McCubben, David Lu, Franco Maiolo, Harry Liu, Sunmo An

Resources and References

  1. American Psychological Association. (2019). In-Text Citations. https://apastyle.apa.org.

    https://apastyle.apa.org/style-grammar-guidelines/citations 

  2. Barsky, A. J., Peekna, H. M., & Borus, J. F. (2001). Somatic symptom reporting in women and men. Journal of General Internal Medicine, 16(4).

    https://doi.org/10.1046/j.1525-1497.2001.016004266.x

  3. Beutel, M. E., Wiltink, J., Ghaemi Kerahrodi, J., Tibubos, A. N., Brähler, E., Schulz, A., Wild, P., Münzel, T., Lackner, K., König, J., Pfeiffer, N., Michal, M., & Henning, M. (2019). Somatic symptom load in men and women from middle to high age in the Gutenberg Health Study — association with psychosocial and somatic factors. Scientific Reports, 9, Article 4610.

    https://doi.org/10.1038/s41598-019-40709-0

  4. Bryan, S. (n.d.). Library Guides: APA (7th Edition) Referencing Guide: In-Text Citations. Libguides.jcu.edu.au. 

    https://libguides.jcu.edu.au/apa/in-text 

  5. Colors in R. (n.d.). R CHARTS | a Collection of Charts and Graphs Made with the R Programming Language. 

    https://r-charts.com/colors/ 

  6. Discrete. (2026). Plotly.com.

    https://plotly.com/python/discrete-color/#color-sequences-in-plotly-express 

  7. Ed Discussion (2026). Ask the Researcher.

    https://edstem.org/au/courses/30417/discussion/3126896

  8. GeeksforGeeks. (2024, January 29). How to do Conditional Mutate in R. GeeksforGeeks.

    https://www.geeksforgeeks.org/r-language/how-to-do-conditional-mutate-in-r/ 

  9. GeeksforGeeks. (2025, July 23). How to plot multiple histograms in R?

    https://www.geeksforgeeks.org/r-language/how-to-plot-multiple-histograms-in-r/

  10. Harrell, F. (2016, December 25). Getting separate axis labels on R plotly subplots. Stack Overflow.

    https://stackoverflow.com/questions/41324934/getting-separate-axis-labels-on-r-plotly-subplots 

  11. Learn R. (n.d.). R read CSV file.

    https://learn-r.org/r-tutorial/read-csv.php

  12. Plotly Technologies Inc. (n.d.). Subplots in R. Plotly. 

    https://plotly.com/r/subplots/

  13. Quarto. (n.d.). HTML theming.

    https://quarto.org/docs/output-formats/html-themes.html

  14. Quarto. (n.d.). Markdown basics.

    https://quarto.org/docs/authoring/markdown-basics.html#footnotes

  15. Saunders, C., Tan, W., Ng, D., Burchett, A., McNair, N., & Colagiuri, B. (2025). Positive social modeling attenuates nocebo side effects. Annals of Behavioral Medicine, 59(1), kaaf048.

    https://doi.org/10.1093/abm/kaaf048

  16. Świder, K., & Bąbel, P. (2013). The effect of the sex of a model on nocebo hyperalgesia induced by social observational learning. Pain, 154(8), 1312–1317.

    https://doi.org/10.1016/j.pain.2013.04.001

  17. Vambheim, S., & Flaten, M. A. (2017). A systematic review of sex differences in the placebo and the nocebo effect. Journal of Pain Research, 10, 1831–1839.

    https://doi.org/10.2147/jpr.s134745

Note: The specific references for articles used are 2 and 17

Contributions of Members

  • Abhishek Kumar: Wrote analysis on symptom change findings, wrote acknowledgements section

  • Caleb McCubben: Exploratory Data Analysis, Visual Analytics, Quarto formatting

  • David Lu: Exploratory Data Analysis, Visual Analytics, Quarto formatting

  • Franco Maiolo: Presentation, Data Wrangling, Executive summary

  • Harry Liu: Wrote analysis on BGASE observations, wrote acknowledgements section

  • Sunmo An: Presentation, Data Wrangling, Initial draft for EDA

AI Usage Statement

No AI was used in the creation of the report or code.

Compliance with Ethical and Professional Standards

We demonstrated our commitment to ‘professionalism’ by understanding the levels of competence and responsibility required in our research. We paid close attention to the accurate use of references and citations, and we prioritised the informed judgement of our readers through the consistent use of clarifying footnotes that give additional context where necessary. These steps were taken to ensure a high standard was kept throughout our research and analysis.

To adhere to the principle ‘maintaining confidence in statistics’, each output and code string was clearly described and well documented to help readers, who may or may not be familiar with R and its coding formats, understand the process of data creation. Limitations and assumptions of our analysis were clearly laid out to avoid drawing or potentially implying invalid conclusions. This was done to preserve and maintain public and readers’ trust in our work, so that they could be confident and assured in the proper use of statistics.

Footnotes

  1. Saunders, C., Tan, W., Ng, D., Burchett, A., McNair, N., & Colagiuri, B. (2025). Positive social modeling attenuates nocebo side effects. Annals of Behavioral Medicine : A Publication of the Society of Behavioral Medicine, 59(1), kaaf048.

    https://doi.org/10.1093/abm/kaaf048

    ↩︎

  2. GASE: General Assessment of Side Effects. The baseline_gase scores are the cumulative totals of survey scores that measured the perceived severity of 10 particular symptoms (headache, dizziness, and others) on a 7-point scale (1 = “Not present” to 7 = “Severe”). With 10 items each scored from 1 to 7, these totals range from 10 to 70. The baseline_gase scores were taken before individuals took a supposed cognitive enhancer (except for the natural history control group) and completed a cognitive performance exam.↩︎

  3. Participants watched one of two videos in which an individual either reported a positive experience with medication (intervention) or was not present (non-intervention). The natural history group watched the non-intervention video.↩︎

  4. Participants (not in the Natural History group) either observed a live social model experiencing negative side effects or did not. Both intervention_bin and SEM were used to analyse potential confounders.↩︎

  5. “To ensure participants were not experiencing significant symptoms at the time of testing, they were excluded from analysis if they exceeded pre-registered thresholds of physical symptoms pre-treatment[…] In accordance with pre-registration, those with extreme baseline side effects (6 or more on any single item, or a mean greater than 4) were excluded from analyses.”

    Saunders, C., Tan, W., Ng, D., Burchett, A., McNair, N., & Colagiuri, B. (2025). Positive social modeling attenuates nocebo side effects. Annals of Behavioral Medicine : A Publication of the Society of Behavioral Medicine, 59(1), kaaf048. https://doi.org/10.1093/abm/kaaf048↩︎

  6. Men and women could, for example, report severe chest pains at equal rates, but not mild pains, nor runny noses.↩︎

  7. Symptom reporting rates were considered valid regardless of severity. Furthermore, this particular data point was one male result out of 47. The trends of higher female symptom reporting and variability discussed in analysis below remained very similar regardless of this male data point.↩︎

  8. While measures of center are largely similar, due to generally low rates of illness and high clusters around 10/11, the larger spread of female scores indicates greater proportions of females reporting symptoms, and severity, compared to males.↩︎

  9. This study is a cumulative analysis of several medical databases, where reports of symptoms are greater in females than males.↩︎

  10. This reasoning similarly applies here, where the ages of all females in our analysis are greater than or equal to 17 years.↩︎

  11. Discussed in the Exploratory Data Analysis above, at footnote 3.↩︎

  12. “Women generally report more bodily distress and more numerous, more intense, and more frequent somatic symptoms than men. These differences emerge regardless of the time period inquired about, the response format used, and whether symptoms are recorded prospectively or retrospectively[…] When symptoms due to demonstrable disease were omitted from the analysis and only medically unexplained complaints studied, the gender differences persisted.”

    This implies that these trends appeared regardless of disease, severity, and symptom.

    Barsky, A. J., Peekna, H. M., & Borus, J. F. (2001). Somatic symptom reporting in women and men. Journal of General Internal Medicine, 16(4).

    https://doi.org/10.1046/j.1525-1497.2001.016004266.x↩︎

  13. “A significant nocebo effect was observed”

    Saunders, C., Tan, W., Ng, D., Burchett, A., McNair, N., & Colagiuri, B. (2025). Positive social modeling attenuates nocebo side effects. Annals of Behavioral Medicine : A Publication of the Society of Behavioral Medicine, 59(1), kaaf048.

    https://doi.org/10.1093/abm/kaaf048↩︎