New DATA1901 Project

Test

Code

# Load each package once, silencing both startup messages and load-time warnings.
suppressPackageStartupMessages(suppressWarnings(library(tidyverse)))
suppressPackageStartupMessages(suppressWarnings(library(rafalib)))
suppressPackageStartupMessages(suppressWarnings(library(ggplot2)))
suppressPackageStartupMessages(suppressWarnings(library(readxl)))
suppressPackageStartupMessages(suppressWarnings(library(plotly)))
suppressPackageStartupMessages(suppressWarnings(library(dplyr)))
suppressPackageStartupMessages(suppressWarnings(library(rmarkdown)))
suppressPackageStartupMessages(suppressWarnings(library(data.table)))

Executive Summary

this is an absolute slog

Exploratory Data Analysis

The data was sourced from a research article¹ that recorded 80 different variables for 161² participants. These variables include numerical measures of symptom severity, group allocations, and more. Having separated the data into gender (nominal qualitative) categories, we examined three interrelated variables from the article in our research question:

Discrete-quantitative baseline side effect symptom scores (baseline_gase)³
Discrete-quantitative active side effect symptom scores, added together to give a cumulative score (active_gase_total)
Discrete-quantitative side effect reporting difference scores, subtracting baseline_gase from active_gase_total (tot)

Code

library(tidyverse)
library(ggplot2)
library(dplyr)
library(plotly)
library(rafalib)
library(rmarkdown)
data <- read.csv("1901_data.csv")

data_excluded <- filter(data, gid_pid != "nismwt_28")

data_1 <- mutate(data, active_gase_total = agase_1 + agase_2 + agase_3 + agase_4 + agase_5 + agase_6 + agase_7 + agase_8 + agase_9 + agase_10)
# symptom_change duplicates the dataset's own tot variable, so it is kept only as a cross-check
data_2 <- mutate(data_1, symptom_change = active_gase_total - baseline_gase)
# TODO: check whether the participant with the high baseline score needs to be excluded

data_excluded_1 <- mutate(data_excluded, agase_total = agase_1 + agase_2 + agase_3 + agase_4 + agase_5 + agase_6 + agase_7 + agase_8 + agase_9 + agase_10)
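
As a cross-check on the derived columns, the sketch below rebuilds the cumulative active score with rowSums() and subtracts the baseline. The toy data frame and its values are hypothetical stand-ins for 1901_data.csv; only the column names come from the report.

```r
# Hypothetical toy data: three participants, ten active symptom items each.
toy <- data.frame(baseline_gase = c(10, 12, 15))
agase_cols <- paste0("agase_", 1:10)
for (col in agase_cols) {
  toy[[col]] <- c(1, 2, 1)  # every item gets the same score per participant
}

# Sum the ten item columns row by row instead of writing out agase_1 + ... + agase_10.
toy$active_gase_total <- unname(rowSums(toy[, agase_cols]))

# Symptom change = cumulative active score minus baseline score.
toy$symptom_change <- toy$active_gase_total - toy$baseline_gase

print(toy[, c("baseline_gase", "active_gase_total", "symptom_change")])
```

On the real data, a call such as all.equal(data_2$symptom_change, data_2$tot) would show whether the supplied tot column matches this derivation.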

Limitations and Assumptions

The research article from which our data is sourced deliberately screened for participants who did not demonstrate severe symptoms,⁴ so this pool of data is not a random sample. Because our analysis compares cumulative scores across gender and does not analyse individual symptoms, any gender disparities in symptom reporting may differ by symptom and by severity.⁵ We assume that trends in cumulative, mild symptom reporting reflect general trends.

Omissions and Changes

We considered the sample size of the “Other” gender category too small (n=6) to show meaningful trends, so it was omitted. We also separated the remaining data into two sets, male and female.
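
The omission step can be sketched in base R as follows. The toy rows and the min_n threshold are hypothetical and chosen only to illustrate the idea; the report itself simply filters on the "Male" and "Female" labels.

```r
# Hypothetical toy data with one very small gender category.
toy <- data.frame(
  gender = c("Male", "Female", "Female", "Other", "Male", "Female"),
  baseline_gase = c(10, 11, 14, 12, 10, 10),
  stringsAsFactors = FALSE
)

# Count participants per category before deciding what to drop.
group_sizes <- table(toy$gender)
print(group_sizes)

# Keep only categories with at least min_n participants (threshold is an assumption).
min_n <- 2
keep <- names(group_sizes)[group_sizes >= min_n]
toy_kept <- toy[toy$gender %in% keep, ]

# Split the remaining rows into one data frame per gender.
by_gender <- split(toy_kept, toy_kept$gender)
```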

Code

data_2_tot_barplot <- data_2

male_filtered <- filter(data_2, gender == "Male") 
male_filtered_excluded <- filter(data_excluded_1, gender == "Male")

female_filtered <- filter(data_2, gender == "Female") 
female_filtered_excluded <- filter(data_excluded_1, gender == "Female") #the data from the excluded shown below does not affect our results much at all

Code

library(tidyverse)
library(ggplot2)
library(dplyr)
library(plotly)
library(rafalib)
library(data.table)
#see how to get rid of the package warning
#may need to remove the outlier (find out which participant it is first)
#what other graphs? --> positive experience video vs. no sight of it (as that happens before)
#intervention vs. no intervention should only be relevant for baseline GASE
#outlier: greater than 6, or mean of 4, as per the study
#note that removing the outlier, who is male, does not affect the analysis - female symptom reporting is already higher
#analysis should consider whether this is a social cause, a physiological cause, or both
#in fact men may actually be sicker than women, but not report it ("man flu")
#do we care about the absolute value of the symptom change?

Research Question and Analysis

Are there significant relationships between gender and overall rates of symptom reporting, both with and without potential nocebo effects?

Analysis

this is a wrong analysis

Our analysis revealed that females reported symptoms at very similar rates to males. This trend was present both for initial (baseline_gase) measurements and for symptom change (tot) measurements.

Here is a footnote reference,⁶ and another.⁷

This paragraph won’t be part of the note, because it isn’t indented.

Here is a footnote reference,⁸ and another.⁹

This paragraph won’t be part of the note, because it isn’t indented.

Baseline GASE

Figure 1

Code

data_2_bgase_barplot <- data_2


boxplot_bgase <- plot_ly(x = male_filtered$baseline_gase, type = "box", name = "Male")
boxplot_bgase <- boxplot_bgase %>% add_trace(x = female_filtered$baseline_gase, type = "box", name = "Female") %>%
    layout(title = "Male and Female Baseline GASE", 
         xaxis = list(title = "Baseline GASE"),
         yaxis = list(title = "Gender"))

boxplot_bgase

Figure 2

Figure 1 shows that both genders have very similar average Baseline GASE scores: the male median is slightly higher than the female median (11 vs. 10), the male mean is slightly lower (11.62 vs. 11.84), and the IQR is the same. However, the female distribution is more spread out, with a longer upper tail and a considerably greater range (maximum of 31 vs. 21). Figure 2 corroborates this greater spread, and the female standard deviation is noticeably higher than the male (3.30 vs. 2.45). While the difference is small, females are consistently associated with higher, more varied rates of symptom reporting. Social conditioning and masculine norms may encourage men to consciously and subconsciously downplay symptoms. Barsky et al. (2001)¹⁰ find similar results and discuss biological factors, where fluctuations across menstrual cycles influence symptom experience, possibly accounting for the greater spread and higher mean in females.¹¹
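
The centre and spread measures compared above can be collected in one table per gender. This is a minimal base R sketch on hypothetical toy scores; spread_summary is an illustrative helper, not a function from the report. The toy values are chosen so that, as in the real data, one group has the higher median while the other has the higher mean and spread.

```r
# Hypothetical toy baseline scores, five per gender.
toy <- data.frame(
  gender = rep(c("Male", "Female"), each = 5),
  baseline_gase = c(10, 10, 11, 12, 13,
                    10, 10, 10, 12, 20),
  stringsAsFactors = FALSE
)

# Illustrative helper: centre and spread measures for one numeric vector.
spread_summary <- function(x) {
  c(median = median(x),
    mean   = mean(x),
    iqr    = IQR(x),
    sd     = sd(x),
    range  = diff(range(x)))
}

# One column per gender, one row per statistic.
stats_by_gender <- sapply(split(toy$baseline_gase, toy$gender), spread_summary)
print(round(stats_by_gender, 2))
```

A single long upper tail is enough to pull the mean and standard deviation up while leaving the median unchanged, which is why the median and mean can point in opposite directions.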

Figure 2

Code

gender_histogram_baseline <- plot_ly(histnorm = "probability") %>%
  add_histogram(x = male_filtered$baseline_gase, name = "Male", nbinsx = 15, opacity = 1.0, marker = list(color = 'blue')) %>%
  add_histogram(x = female_filtered$baseline_gase, name = "Female", nbinsx = 15, opacity = 1.0, marker = list(color = 'orange')) %>%
  layout(title = "Male and Female Baseline GASE", 
         xaxis = list(title = "Baseline GASE"),
         yaxis = list(title = "Frequency"))
gender_histogram_baseline

Code

stackem <- subplot(boxplot_bgase, gender_histogram_baseline, nrows = 2) %>%
  layout(title = list(text = "Male and Female Baseline GASE"))

stackem

However, the experimental design presents a possible confounder for baseline symptom reporting. Participants watched one of two videos: one in which an individual reported a positive experience with the medication (intervention), and one in which no such individual was present (non-intervention). Gender-based differences in the perception of the two videos¹² are unlikely to explain our results, as Figure 3 below shows females with consistently greater spread in baseline_gase scores than males regardless of video group, in line with our previous findings. However, the female spread exceeds the male spread by more in the intervention group (upper tail of 3.75 vs. 0.75) than in the non-intervention group (3 vs. 2). This could mean that, while intervention is a minor confounder, females are still associated with higher and more varied symptom reporting regardless, albeit to a lesser degree.
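
One way to put a number on the "upper tail" of each box is the distance from the upper quartile to the end of the upper whisker. The sketch below assumes that definition and uses hypothetical toy scores; upper_tail is an illustrative helper. It relies on boxplot.stats(), whose hinges can differ slightly from the quartiles plotly draws, so values read off an interactive plot may not match exactly.

```r
# Hypothetical toy data: five scores per gender and video group.
toy <- data.frame(
  gender = rep(c("Male", "Female"), each = 10),
  Intervention = rep(rep(c("Intervention", "No Intervention"), each = 5), times = 2),
  baseline_gase = c(10, 10, 11, 12, 13,   # Male, Intervention
                    10, 11, 11, 12, 13,   # Male, No Intervention
                    10, 10, 11, 12, 14,   # Female, Intervention
                    10, 10, 12, 13, 14),  # Female, No Intervention
  stringsAsFactors = FALSE
)

# Illustrative helper: upper whisker end minus upper hinge.
upper_tail <- function(x) {
  s <- boxplot.stats(x)$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
  s[5] - s[4]
}

# Rows are genders, columns are video groups.
tails <- tapply(toy$baseline_gase, list(toy$gender, toy$Intervention), upper_tail)
print(tails)
```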

Figure 3

Code

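# Keep only the two genders analysed, using the same "Male"/"Female" labels as the filters above
data_3 <- filter(data_2, gender %in% c("Male", "Female"))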


data_3 <-mutate(data_3, Intervention = case_when(
  gid %in% c("INSMDN", "ISMDN", "INSMWT", "ISMWT") ~ "Intervention",
  gid %in% c("NISMDN", "NINSMDN", "NISMWT", "NINSMWT", "NHDN", "NHWT") ~ "No Intervention"
))
intervention_boxplot <- plot_ly(data_3, x= ~Intervention, y = ~baseline_gase, color = ~gender, type = "box") %>%
  layout(boxmode = "group",
         title = "Male and Female Baseline GASE")


intervention_boxplot

Results

Plots

This tab shows the box plot of baseline GASE by gender.

Code

boxplot_bgase

Tables

This tab shows the histogram of baseline GASE by gender.

Code

gender_histogram_baseline

Symptom Change

Code

gender_histogram_change <- plot_ly(histnorm = "probability") %>%
  add_histogram(x = male_filtered$tot, name = "Men", nbinsx = 25, opacity = 1.0) %>%
  add_histogram(x = female_filtered$tot, name = "Women", nbinsx = 25, opacity = 1.0) %>%
  layout(title = "Male and Female Symptom Changes", 
         xaxis = list(title = "Symptom Change"),
         yaxis = list(title = "Frequency"))
  

 
gender_histogram_change

Code

gender_histogram_change_outlier_excluded <- plot_ly(histnorm = "probability") %>%
  add_histogram(x = male_filtered_excluded$tot, name = "Men", nbinsx = 25, opacity = 1.0) %>%
  add_histogram(x = female_filtered_excluded$tot, name = "Women", nbinsx = 25, opacity = 1.0) %>%
  layout(title = "Male and Female Symptom Changes - Outlier Removed", 
         xaxis = list(title = "Symptom Change"),
         yaxis = list(title = "Frequency"))
     
gender_histogram_change_outlier_excluded

Code

sd(female_filtered$tot)

[1] 2.72394

Code

sd(male_filtered$tot)

[1] 1.874904

Code

sd(female_filtered_excluded$tot)

[1] 2.72394

Code

sd(male_filtered_excluded$tot)

[1] 1.882669

Code

# Mean absolute change: how much symptoms changed, regardless of direction
male_filtered_tot_mean <- mean(abs(male_filtered$tot))
female_filtered_tot_mean <- mean(abs(female_filtered$tot))
data_2_tot_barplot <- data.frame(
  Gender = c("Male", "Female"),
  Average_Symptom = c(male_filtered_tot_mean, female_filtered_tot_mean)
)
ggplot(data_2_tot_barplot, aes(x = Gender, y = Average_Symptom)) +
  geom_bar(stat = "identity") +
  labs(title = "Mean Symptom Change by Gender", x = "Gender", y = "Mean Symptom Change")

Code

male_filtered_tot_mean

[1] 1.234043

Code

female_filtered_tot_mean

[1] 1.6
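
The means above are taken over absolute values, so they measure the size of the change and ignore its direction. The short sketch below, on hypothetical change scores, shows how the signed mean and the mean absolute change can differ when some participants improve and others worsen.

```r
# Hypothetical symptom change scores (negative = fewer symptoms than at baseline).
tot_toy <- c(-2, -1, 0, 1, 2, 3)

# Signed mean: net direction of change; increases and decreases cancel out.
signed_mean <- mean(tot_toy)

# Mean absolute change: average size of change, whatever its direction.
abs_mean <- mean(abs(tot_toy))

print(c(signed = signed_mean, absolute = abs_mean))
```

If the question is whether one gender reports more symptoms after the active phase, the signed mean is the relevant figure; the absolute mean answers how volatile the reporting is.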

Active GASE

Code

data_2_agase_barplot <- data_2

gender_histogram_agase <- plot_ly(histnorm = "probability") %>%
  add_histogram(x = male_filtered$active_gase_total, name = "Men", nbinsx = 15, opacity = 1.0) %>%
  add_histogram(x = female_filtered$active_gase_total, name = "Women", nbinsx = 15, opacity = 1.0) %>%
  layout(title = "Male and Female Active GASE", 
         xaxis = list(title = "Active GASE"),
         yaxis = list(title = "Frequency"))
     
gender_histogram_agase

Code

# The summed column in data_2 is named active_gase_total (agase_total exists only in data_excluded_1)
sd(female_filtered$active_gase_total)

Code

sd(male_filtered$active_gase_total)

Code

summary(female_filtered$active_gase_total)

Code

summary(male_filtered$active_gase_total)

Code

# Active GASE totals are sums of 1-to-7 scores, so they are never negative and abs() is not needed
male_filtered_agase_mean <- mean(male_filtered$active_gase_total)
female_filtered_agase_mean <- mean(female_filtered$active_gase_total)
data_2_agase_barplot <- data.frame(
  Gender = c("Male", "Female"),
  Active_Gase = c(male_filtered_agase_mean, female_filtered_agase_mean)
)
ggplot(data_2_agase_barplot, aes(x = Gender, y = Active_Gase)) +
  geom_bar(stat = "identity") +
  labs(title = "Mean Active GASE by Gender", x = "Gender", y = "Mean Active GASE")

Code

male_filtered_agase_mean

[1] 12.08511

Code

female_filtered_agase_mean

[1] 12.41818

Baseline GASE Summary Statistics

Code

data_2_bgase_barplot <- data_2

gender_histogram_baseline <- plot_ly(histnorm = "probability") %>%
  add_histogram(x = male_filtered$baseline_gase, name = "Men", nbinsx = 15, opacity = 1.0) %>%
  add_histogram(x = female_filtered$baseline_gase, name = "Women", nbinsx = 15, opacity = 1.0) %>%
  layout(title = "Male and Female Baseline GASE", 
         xaxis = list(title = "Baseline GASE"),
         yaxis = list(title = "Frequency"))
     
gender_histogram_baseline

Code

sd(female_filtered$baseline_gase)

[1] 3.302841

Code

sd(male_filtered$baseline_gase)

[1] 2.454394

Code

summary(female_filtered$baseline_gase)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.00   10.00   10.00   11.84   12.00   31.00

Code

summary(male_filtered$baseline_gase)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.00   10.00   11.00   11.62   12.00   21.00

Code

male_filtered_bgase_mean <- mean(male_filtered$baseline_gase)
female_filtered_bgase_mean <- mean(female_filtered$baseline_gase)
data_2_bgase_barplot <- data.frame(
  Gender = c("Male", "Female"),
  Baseline_Gase = c(male_filtered_bgase_mean, female_filtered_bgase_mean)
)
ggplot(data_2_bgase_barplot, aes(x = Gender, y = Baseline_Gase)) +
  geom_bar(stat = "identity") +
  labs(title = "Mean Baseline GASE by Gender", x = "Gender", y = "Mean Baseline GASE")

Code

male_filtered_bgase_mean

[1] 11.61702

Code

female_filtered_bgase_mean

[1] 11.83636

Footnotes

Cosette give link:↩︎
Check this number please↩︎
GASE: General Assessment of Side Effects. The baseline_gase and active_gase_total scores are cumulative totals of survey scores measuring the perceived severity of particular symptoms (headache, dizziness, and others) on a 7-point scale (1 = “Not present” to 7 = “Severe”). They were recorded, respectively, before and after participants took a supposed cognitive enhancer (except for the natural history control group) and sat a cognitive performance exam.↩︎
maybe quote from the article?↩︎
Men and women could, for example, report severe chest pains at equal rates, but not mild pains, nor runny noses.↩︎
Here is the footnote.↩︎
Here’s one with multiple blocks.

Subsequent paragraphs are indented to show that they belong to the previous footnote.
```
{ some.code }
```
The whole paragraph can be indented, or just the first line. In this way, multi-paragraph footnotes work like multi-paragraph list items.↩︎
third one motherfucker↩︎
Here’s one with multiple blocks.

Subsequent paragraphs are indented to show that they belong to the previous footnote.
```
{ some.code }
```
yeah son↩︎
This study is a cumulative analysis of several medical databases.↩︎
This is a valid analysis, as all females in our analysis are aged 17 years or older.↩︎
The non-intervention and intervention videos were each also recorded with two different researchers, who appear separately in the videos. We did not expect the different researchers to produce significant differences.

Also, the Natural History group watched the non-intervention video.↩︎