New DATA1901 Project

Author

SID: 560653928, 560633584, 540904066, 560609084

Code
# Load each package once, hiding start-up banners and load-time warnings
suppressWarnings(suppressPackageStartupMessages({
  library(tidyverse)
  library(rafalib)
  library(ggplot2)
  library(readxl)
  library(plotly)
  library(dplyr)
  library(rmarkdown)
  library(data.table)
}))

Executive Summary

This report investigates the relationship between gender and symptom reporting, measured by baseline and active GASE scores. Findings reveal females exhibit consistently higher symptom severity, variability, and susceptibility to nocebo effects than males. Despite marginal differences, these consistent trends indicate a moderate relationship between gender and symptom reporting.

Exploratory Data Analysis

Code
library(tidyverse)
library(ggplot2)
library(dplyr)
library(plotly)
library(rafalib)
library(data.table)
library(rmarkdown)
library(readxl)

#the excel data was edited to include a "Count" heading for numbered rows, to remove read_excel outputs regarding absent variable names
data = readxl::read_excel("~/Documents/Uni Stuff/DATA1901/DATA1901_group_project.xlsx")

data = filter(data, gender != "other") #limited sample size for "other" gender category

#adding agase_total (post-placebo total) to pair with the baseline total "baseline_gase"; the symptom change between them is the existing variable "tot"
cleaned_data = mutate(data, agase_total = agase_1 + agase_2 + agase_3 + agase_4 + agase_5 + agase_6 + agase_7 + agase_8 + agase_9 + agase_10)
#adding Side Effect Modelling flag (SEM); both true missing values and the text "NA" count as "No SEM"
cleaned_data = mutate(cleaned_data, SEM = ifelse(is.na(social_mod) | social_mod == "NA", "No SEM", "SEM"))

#adding filtered data sets to focus on gendered-correlations/summaries
male_filtered = filter(cleaned_data, gender == "Male")
female_filtered = filter(cleaned_data, gender == "Female")

The data were sourced from a research article¹ that recorded 80 variables for 161 participants², including numerical measures of age and symptom severity. Our investigation focuses on gender (nominal-qualitative) and two interrelated variables that capture symptom reporting:

  1. Discrete-quantitative baseline side effect symptom scores (baseline_gase)³

  2. Discrete-quantitative side effect reporting difference scores, subtracting pre-placebo symptom scores from post-placebo scores (tot)

Additionally, we used the variable intervention_bin,⁴ which records whether participants experienced social modelling. Similarly, a derived variable (SEM) separates participants according to whether they experienced side-effect modelling.⁵
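
As a rough illustration of how the two score variables relate, the sketch below builds a post-placebo total from item columns and subtracts the baseline total. The toy data frame, the use of only three agase_ items, and the column name change_check are all hypothetical. It is also an assumption that the dataset's tot variable is computed this way, so the result should be compared against tot rather than treated as its definition.

```r
library(dplyr)

# Toy stand-in for the spreadsheet: three made-up participants and only
# three agase_ items (the real data has agase_1 to agase_10).
toy <- data.frame(
  gender        = c("Female", "Male", "Female"),
  baseline_gase = c(4, 3, 5),
  agase_1       = c(2, 1, 1),
  agase_2       = c(1, 1, 3),
  agase_3       = c(2, 1, 1)
)

toy_scored <- toy %>%
  mutate(
    # Sum the numbered item columns only, so a rerun never picks up agase_total itself
    agase_total = rowSums(across(matches("^agase_[0-9]+$"))),
    # Assumed definition of symptom change: post-placebo minus pre-placebo
    change_check = agase_total - baseline_gase
  )

print(toy_scored)
```

Selecting the item columns by pattern avoids typing all ten names by hand and makes a missed item less likely.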

Limitations

The source study deliberately screened out participants with severe symptoms,⁶ potentially hiding symptom reporting and nocebo trends among these individuals. Because our analysis compares cumulative scores across gender and does not analyse individual symptoms, any disparities in symptom reporting may differ by symptom and by severity.⁷

Assumptions

We assumed that cumulative symptom scores reflect the same general trends as individual symptom scores. We also assumed that the sample size of the “other” gender category (n = 4) was too small for any meaningful trend to appear, so it was omitted. Moreover, we assumed that participants understood the survey questions and reported their symptoms truthfully. Unlike the research article,⁸ we retained the outlier that it excluded, because the participant’s baseline symptom data were validly reported.⁹

Research Question

Are there significant relationships between gender and overall rates of symptom reporting, both with and without potential nocebo effects?

Analysis

The analysis revealed that females report more symptoms, at higher severity, than males, with greater variability and greater sensitivity to change under a potential nocebo effect. The disparities between genders are marginal but consistent across subgroups that address different confounders, indicating a moderate relationship between gender and symptom reporting.

Baseline GASE

Figures 1 and 2

Code
boxplot_bgase <- plot_ly(x = male_filtered$baseline_gase, type = "box", name = "Male")
boxplot_bgase <- boxplot_bgase %>% add_trace(x = female_filtered$baseline_gase, type = "box", name = "Female") %>%
    layout(title = "Male and Female Baseline GASE", 
         xaxis = list(title = "Baseline GASE"),
         yaxis = list(title = "Gender"))

gender_histogram_baseline <- plot_ly(histnorm = "probability") %>%
  add_histogram(x = male_filtered$baseline_gase, name = "Male", nbinsx = 15, opacity = 1.0, marker = list(color = 'blue')) %>%
  add_histogram(x = female_filtered$baseline_gase, name = "Female", nbinsx = 15, opacity = 1.0, marker = list(color = 'orange')) %>%
  layout(title = "Male and Female Baseline GASE", 
         xaxis = list(title = "Baseline GASE"),
         yaxis = list(title = "Frequency"))

# putting two relevant graphs on top of each other
stackem <- subplot(boxplot_bgase, gender_histogram_baseline, 
                   # giving separate labels to y axis of graphs
                   titleY = TRUE,
                   nrows = 2) %>%
  layout(title = list(text = "Male and Female Baseline GASE"),
         # giving the same label to the x axis
         xaxis2 = list(title = "Baseline GASE Score"))

stackem

Figure 1 (boxplot) shows that both genders have very similar average baseline GASE scores: the male median is slightly higher than the female median (11 vs. 10), the male mean is slightly lower (11.62 vs. 11.84), and the IQR is the same. However, the female distribution is more spread out, with a longer upper tail and a noticeably greater range. Figure 2 corroborates this greater spread, with the female standard deviation higher than the male (3.30 vs. 2.45). While the differences are small, females are consistently associated with higher and more varied symptom reporting. Social conditioning and norms may encourage men to consciously or subconsciously downplay symptoms. Barsky et al. (2001)¹⁰ report similar results and point to biological factors, such as fluctuations across the menstrual cycle influencing symptom experience, which could account for the greater spread and mean in females.¹¹
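
The centre and spread figures quoted above can be produced with a grouped summary. The sketch below is one possible way to do it, not necessarily how the reported numbers were obtained. It runs on a small made-up data frame so that it is self-contained. The values are hypothetical and do not reproduce the statistics in the report.

```r
library(dplyr)

# Made-up scores, three per gender, standing in for cleaned_data
toy <- data.frame(
  gender        = c("Male", "Male", "Male", "Female", "Female", "Female"),
  baseline_gase = c(10, 11, 12, 10, 11, 15)
)

baseline_summary <- toy %>%
  group_by(gender) %>%
  summarise(
    n      = n(),
    median = median(baseline_gase),
    mean   = mean(baseline_gase),
    sd     = sd(baseline_gase),
    iqr    = IQR(baseline_gase),
    range  = max(baseline_gase) - min(baseline_gase)  # width of the range, not its endpoints
  )

print(baseline_summary)
```

Swapping toy for cleaned_data should give the same table for the real data, provided the gender and baseline_gase columns are named as in the code above.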

Figure 3

Code
intervention_data <-mutate(cleaned_data, Intervention = case_when(
  gid %in% c("INSMDN", "ISMDN", "INSMWT", "ISMWT") ~ "Intervention",
  #Natural history group watched the No Intervention Video
  gid %in% c("NISMDN", "NINSMDN", "NISMWT", "NINSMWT", "NHDN", "NHWT") ~ "No Intervention"
))
intervention_boxplot <- plot_ly(intervention_data, x= ~Intervention, y = ~baseline_gase, color = ~gender, type = "box") %>%
  layout(boxmode = "group",
         title = "Male and Female Baseline GASE",
         yaxis = list(title = "Baseline_GASE"),
         xaxis = list(title = "Intervention Status"))


intervention_boxplot

The experimental design presents a possible confounder for baseline symptom reporting, because participants watched one of two different videos. Gender-based differences in how the two videos were perceived¹² are unlikely to explain our results: Figure 3 shows females with consistently greater spread than males in baseline_gase regardless of video group, in line with the earlier figures. The gap in spread is, however, larger in the intervention group (female vs. male upper tail of 3.75 vs. 0.75) than in the non-intervention group (3 vs. 2). This suggests that, while intervention is a minor confounder, females are still associated with higher and more varied symptom reporting, albeit to a lesser degree.
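
One way to put a number on the "upper tail" of each box is the distance from the upper quartile to the end of the upper whisker. The sketch below assumes that definition and uses base R's boxplot.stats() on made-up data. The helper upper_tail() is hypothetical, and because plotly may compute quartiles slightly differently from boxplot.stats(), values read off the interactive figure could differ a little.

```r
library(dplyr)

# Made-up baseline scores: five participants per video group and gender
toy <- data.frame(
  Intervention  = rep(c("Intervention", "No Intervention"), each = 10),
  gender        = rep(rep(c("Female", "Male"), each = 5), times = 2),
  baseline_gase = c(10, 11, 12, 13, 15,   # Intervention, Female
                    10, 11, 11, 12, 12,   # Intervention, Male
                    10, 10, 11, 12, 14,   # No Intervention, Female
                    10, 11, 12, 12, 13)   # No Intervention, Male
)

# Hypothetical helper: upper whisker end minus upper hinge
upper_tail <- function(x) {
  s <- boxplot.stats(x)$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
  s[5] - s[4]
}

tails <- toy %>%
  group_by(Intervention, gender) %>%
  summarise(upper_tail = upper_tail(baseline_gase), .groups = "drop")

print(tails)
```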

Symptom Change

Figure 4

Code
p = ggplot(cleaned_data, aes(x = tot, fill = gender)) +
  geom_histogram(aes(y = after_stat(density)), position = "dodge", bins = 25) +
  labs(y = "Density", x = "Symptom Change", title = "Symptom Changes in Male vs Female", fill = "Gender")
ggplotly(p)

Figure 4 displays a roughly symmetrical distribution centred near 0 with a slight left skew, indicating near-neutral symptom change for both genders: overall, participants experienced little to no symptom change. Comparing GASE scores before and after the intervention, male participants showed slightly less symptom change (mean ≈ 0.47) than female participants (mean ≈ 0.58). Since female baseline scores were already higher, this supports a higher rate of symptom reporting in females. In addition, females experience lower socio-economic outcomes than males, including in Australia,¹³ where this experiment occurred, which may contribute to worse health outcomes.¹⁴

The overall increase in GASE scores could be attributed to the nocebo effect.¹⁵ Female participants also exhibited a higher standard deviation in symptom change (s.d. ≈ 2.72) than males (s.d. ≈ 1.87). This difference in spread is confirmed in Figure 5, where females have a greater IQR and overall range (IQR = 2, range = 23) than males (IQR = 1.75, range = 11). The greater spread of symptom change in females suggests that they may be more susceptible to the nocebo effect. Meta-studies on the nocebo effect concur (Bombaclart et al.), suggesting that “conditioned nocebo responses were more frequent in females than in males.” It is likely that nocebo responses are enhanced by anxiety, which females often experience at higher rates than males.

Figure 5

Code
box1 = ggplot(cleaned_data, aes(x = gender, y = tot)) +
  geom_boxplot() +
  labs(x = "Gender", y = "Symptom Change", title = "Symptom Changes in Male vs Female")
ggplotly(box1)

Figure 6

Code
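# factor() makes intervention_bin discrete so each level gets its own box, even if it is stored as 0/1
ggplot(cleaned_data, aes(x = tot, y = factor(intervention_bin), colour = gender)) +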
  geom_boxplot() +
  facet_wrap(~ SEM) +
  labs(x = "Symptom Change", y = "Positive Social Modelling", colour = "Gender", title = "PSM & SEM Between Gender")

Under the experimental design, participants were ultimately split into four groups according to their assignment to positive social modelling (PSM) and side-effect modelling (SEM), which lets us address these confounders. Figure 6 shows that within each group, females have greater variability in symptom change: the standard deviation averaged across groups is ≈ 2.59 for females and ≈ 1.67 for males, consistent with the previous findings.
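
The averaged standard deviations can be obtained in two steps: compute the standard deviation of tot within each PSM by SEM by gender cell, then average those cell values for each gender. The sketch below shows this on made-up data. The group labels and values are hypothetical, and it is an assumption that the reported figures were produced this way.

```r
library(dplyr)

# Made-up symptom-change scores: two participants in each of the
# four PSM x SEM groups, for each gender
toy <- data.frame(
  gender           = rep(c("Female", "Male"), each = 8),
  intervention_bin = rep(rep(c("PSM", "No PSM"), each = 4), times = 2),
  SEM              = rep(rep(c("SEM", "No SEM"), each = 2), times = 4),
  tot              = c(-2, 2, -1, 3, 0, 4, -3, 1,   # Female
                       -1, 1,  0, 2, 0, 2, -1, 1)   # Male
)

# Step 1: spread of symptom change inside every group, per gender
cell_sd <- toy %>%
  group_by(gender, intervention_bin, SEM) %>%
  summarise(sd_tot = sd(tot), .groups = "drop")

# Step 2: average the four group-level standard deviations for each gender
avg_sd <- cell_sd %>%
  group_by(gender) %>%
  summarise(mean_sd = mean(sd_tot))

print(avg_sd)
```

Averaging group-level standard deviations weights every group equally regardless of its size, which is a different quantity from the pooled standard deviation over all participants of one gender.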

Acknowledgements

Group Meetings

  • 23/3/2026, 8:00pm - 8:45pm, Present: Caleb McCubben, David Lu, Franco Maiolo, Harry Liu, Sunmo An

  • 26/3/2026, 12:00pm - 1:00pm, Present: Abhishek Kumar, Caleb McCubben, David Lu, Franco Maiolo, Harry Liu, Sunmo An

  • 30/3/2026, 10:45am - 12:15pm, Present: Caleb McCubben, David Lu

  • 1/4/2026, 10:00am - 1:00pm, Present: Abhishek Kumar, Caleb McCubben, David Lu, Franco Maiolo, Harry Liu

  • 6/4/2026, 6:00pm - 7:00pm, Present: Abhishek Kumar, Harry Liu

  • 9/4/2026, 12:00pm - 11:00pm, Present: Abhishek Kumar, Caleb McCubben, David Lu, Franco Maiolo, Harry Liu, Sunmo An

Resources and References

  1. Saunders, C., Tan, W., Ng, D., Burchett, A., McNair, N., & Colagiuri, B. (2025). Positive social modeling attenuates nocebo side effects. Annals of Behavioral Medicine, 59(1), kaaf048.

    https://doi.org/10.1093/abm/kaaf048

  2. Barsky, A. J., Peekna, H. M., & Borus, J. F. (2001). Somatic symptom reporting in women and men. Journal of General Internal Medicine, 16(4).

    https://doi.org/10.1046/j.1525-1497.2001.016004266.x

  3. Beutel, M. E., Wiltink, J., Ghaemi Kerahrodi, J., Tibubos, A. N., Brähler, E., Schulz, A., Wild, P., Münzel, T., Lackner, K., König, J., Pfeiffer, N., Michal, M., & Henning, M. (2019). Somatic symptom load in men and women from middle to high age in the Gutenberg Health Study — association with psychosocial and somatic factors. Scientific Reports, 9, Article 4610.

    https://doi.org/10.1038/s41598-019-40709-0

  4. Plotly Technologies Inc. (n.d.). Subplots in R. Plotly. 

    https://plotly.com/r/subplots/

  5. GeeksforGeeks. (2025, July 23). How to plot multiple histograms in R.

    https://www.geeksforgeeks.org/r-language/how-to-plot-multiple-histograms-in-r/

  6. Quarto. (n.d.). HTML theming.

    https://quarto.org/docs/output-formats/html-themes.html

  7. Quarto. (n.d.). Markdown basics.

    https://quarto.org/docs/authoring/markdown-basics.html#footnotes

  8. Ed Discussion. (2026). Edstem.org.

    https://edstem.org/au/courses/30417/discussion/3126896

  9. Świder, K., & Bąbel, P. (2013). The effect of the sex of a model on nocebo hyperalgesia induced by social observational learning. Pain, 154(8), 1312–1317.

    https://doi.org/10.1016/j.pain.2013.04.001

  10. Vambheim, S., & Flaten, M. A. (2017). A systematic review of sex differences in the placebo and the nocebo effect. Journal of Pain Research, Volume 10, 1831–1839.

    https://doi.org/10.2147/jpr.s134745

    Note: The specific references for articles used are 2 and 10

AI Usage Statement

No AI was used in the creation of the report or code.

Compliance with Ethical and Professional Standards

Footnotes

  1. Cosette give link:

  2. Check this number please

  3. GASE: General Assessment of Side Effects. The baseline_gase and active_gase_total scores are the cumulative totals of survey scores that measured the perceived severity of particular symptoms (headache, dizziness, and others) on a 7-point scale (1 = “Not present” to 7 = “Severe”), before and after, respectively, taking a supposed cognitive enhancer (except for the natural history control group) and a cognitive performance exam.

  4. Participants watched one of two videos, in which an individual either reported a positive experience with the medication (intervention) or did not appear (non-intervention). The natural history group watched the non-intervention video.

  5. Tell me what side effect modelling is

  6. Maybe quote from the article?

  7. Men and women could, for example, report severe chest pains at equal rates, but not mild pains, nor runny noses.

  8. Give detail from article

  9. Explain why

  10. This study is a cumulative analysis of several medical databases.

  11. This explanation is plausible for our data because all females in our analysis are aged 17 or older.

  12. Each of the non-intervention and intervention videos also came in two versions, featuring two different researchers. We did not expect the choice of researcher to produce meaningful differences.

    Also, the natural history group watched the non-intervention video.

  13. Females earn around $6000 less per year than males in this age demographic.

  14. It is also reasonable to suggest that, since the median ages of females and males in this study are 19 and 20 respectively, and a greater proportion of females than males attend university, more males may have entered full-time work, including trades and apprenticeships, and therefore earn more. This could explain the difference in socio-economic, and therefore health, status.

  15. Cosette’s study