---
title: "New DATA1901 Project"
crossref:
  lof-title: "List of Figures"
format:
  html:
    code-fold: true
    code-tools: true
    toc: true
    html-math-method: katex
    css: styles.css
    theme: cerulean
    fontsize: 1.2em
    linestretch: 1.9
    title-block-banner-color: Black
editor: visual
---
# Test
```{r}
# Load each package once, silencing start-up messages and load-time warnings.
# tidyverse already attaches ggplot2 and dplyr, so they are not loaded separately.
suppressWarnings(suppressPackageStartupMessages({
  library(tidyverse)
  library(rafalib)
  library(readxl)
  library(plotly)
  library(rmarkdown)
  library(data.table)
}))
```
# Executive Summary
this is an absolute slog
# Exploratory Data Analysis
The data was sourced from a research article[^1] that collected 80 different variables from 161[^2] participants. These variables include numerical measures of symptom severity, group allocations, and more. After separating the data by gender (a nominal qualitative variable), we examined three interrelated variables from the article in our research question:
[^1]: Cosette give link:
[^2]: Check this number please
1. Discrete-quantitative baseline side effect symptom scores (baseline_gase)[^3]
2. Discrete-quantitative active side effect symptom scores, added together to give a cumulative score (active_gase_total)
3. Discrete-quantitative side effect reporting difference scores, subtracting baseline_gase from active_gase_total (tot)
[^3]: GASE: General Assessment of Side Effects. The baseline_gase and active_gase_total scores are cumulative totals of survey items, each rating the perceived severity of a particular symptom (headache, dizziness, and others) on a 7-point scale (1 = "Not present" to 7 = "Severe"). They were recorded, respectively, before and after participants took a supposed cognitive enhancer (except for the natural history control group) and completed a cognitive performance exam.
```{r}
# Packages are loaded in the setup chunk at the top of the document.
data <- read.csv("1901_data.csv")

# Copy of the data without participant nismwt_28, used to check whether the
# high-baseline participant needs to be excluded.
data_excluded <- filter(data, gid_pid != "nismwt_28")

# Cumulative active GASE score: the sum of the ten agase items.
data_1 <- mutate(data, active_gase_total = agase_1 + agase_2 + agase_3 + agase_4 + agase_5 +
                   agase_6 + agase_7 + agase_8 + agase_9 + agase_10)

# We do not actually need symptom_change: it is already given by the variable tot.
data_2 <- mutate(data_1, symptom_change = active_gase_total - baseline_gase)

# Same cumulative score for the outlier-excluded copy (stored here as agase_total).
data_excluded_1 <- mutate(data_excluded, agase_total = agase_1 + agase_2 + agase_3 + agase_4 + agase_5 +
                            agase_6 + agase_7 + agase_8 + agase_9 + agase_10)
```
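
The comment in the chunk above notes that `tot` already holds the symptom change. A minimal sketch of how that could be checked, using a small made-up data frame: `toy_scores` and its values are hypothetical, and it has only three `agase_` items rather than ten.

```{r}
# Hypothetical toy data: three participants, three active GASE items.
toy_scores <- data.frame(
  baseline_gase = c(10, 12, 11),
  agase_1 = c(4, 5, 3),
  agase_2 = c(3, 4, 4),
  agase_3 = c(5, 3, 4),
  tot = c(2, 0, 0)
)

# Sum the item columns row by row to get the cumulative active score.
item_cols <- grep("^agase_[0-9]+$", names(toy_scores), value = TRUE)
toy_scores$active_gase_total <- rowSums(toy_scores[item_cols])

# Recompute the change score and compare it with the supplied column.
toy_scores$symptom_change <- toy_scores$active_gase_total - toy_scores$baseline_gase
all(toy_scores$symptom_change == toy_scores$tot)
```

If the same comparison on the real data returned `FALSE`, the two columns would need to be reconciled before using `tot` in the analysis.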
## Limitations and Assumptions
The research article from which our data is sourced deliberately screened for participants not demonstrating severe symptoms,[^4] so the sample is not random. Our analysis compares cumulative scores across gender and does not analyse individual symptoms, yet disparities in symptom reporting may differ according to each symptom and its severity.[^5] We therefore assume that trends in cumulative, mild symptom reporting reflect general trends.
[^4]: maybe quote from the article?
[^5]: Men and women could, for example, report severe chest pains at equal rates, but not mild pains, nor runny noses.
## Omissions and Changes
We considered the sample size of the "Other" gender category too small (n = 6) to show meaningful trends, so it was omitted. We then separated the remaining data into male and female sets.
```{r}
# Male and female subsets of the full data.
male_filtered <- filter(data_2, gender == "Male")
female_filtered <- filter(data_2, gender == "Female")

# The same subsets with the outlier excluded.
# Excluding the outlier does not affect our results much at all.
male_filtered_excluded <- filter(data_excluded_1, gender == "Male")
female_filtered_excluded <- filter(data_excluded_1, gender == "Female")
```
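
The omission described above starts from the group sizes. A minimal sketch of that step in base R, on made-up data: `toy_participants`, its values, and `toy_by_gender` are hypothetical and only illustrate counting each category before keeping the two analysed groups.

```{r}
# Hypothetical toy data, not taken from the study.
toy_participants <- data.frame(
  gender = c("Female", "Female", "Female", "Female",
             "Male", "Male", "Male", "Other"),
  baseline_gase = c(10, 11, 13, 12, 11, 10, 12, 11),
  stringsAsFactors = FALSE
)

# Count participants in each gender category.
table(toy_participants$gender)

# Keep only the two groups that are analysed, then split into one set per gender.
toy_binary <- subset(toy_participants, gender %in% c("Male", "Female"))
toy_by_gender <- split(toy_binary, toy_binary$gender)
sapply(toy_by_gender, nrow)
```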
```{r}
# Working notes (packages are loaded in the setup chunk at the top of the document):
# - see how to get rid of the warning
# - may need to remove the outlier (find out which participant is the outlier first)
# - what other graphs? --> positive experience, vs. no site of that (as that happens before)
# - intervention vs. no intervention should only be relevant for baseline GASE
# - outlier: greater than 6, or mean of 4, as per the study
# - removing the outlier, who is male, does not affect the analysis:
#   female symptom reporting is already innately higher
# - analysis should consider whether this is a social cause, a physiological cause, or both
# - men may in fact be sicker than women, but not report it ("man flu")
# - do we care about the absolute value of the symptom change?
```
# Research Question and Analysis
[*Are there significant relationships between gender and overall rates of symptom reporting, both with and without potential nocebo effects?*]{.smallcaps}
# Analysis
this is a wrong analysis
\|
\|
v
Our analysis revealed that females reported symptoms at very similar rates to males. This trend was present for both initial (baseline_gase) measurements and symptom change (tot) measurements.
## Baseline GASE
### Figure 1
```{r}
# Horizontal box plots of baseline GASE, one trace per gender.
boxplot_bgase <- plot_ly(x = male_filtered$baseline_gase, type = "box", name = "Male") %>%
  add_trace(x = female_filtered$baseline_gase, type = "box", name = "Female") %>%
  layout(title = "Male and Female Baseline GASE",
         xaxis = list(title = "Baseline GASE"),
         yaxis = list(title = "Gender"))
boxplot_bgase
```
Figure 1 shows that both genders have very similar average Baseline GASE scores: the male median is slightly higher than the female median (11 vs. 10), the male mean is slightly lower (11.62 vs. 11.84), and the IQR is the same. However, the female distribution is more spread out, with a longer upper tail and a considerably greater range. Figure 2 corroborates this greater spread, and the female standard deviation is noticeably higher than the male one (3.30 vs. 2.45). While the difference is minimal, females are consistently associated with higher and more varied rates of symptom reporting. Social conditioning and masculine norms may encourage men to downplay symptoms, consciously or subconsciously. Barsky et al. (2001)[^10] find similar results and discuss biological factors, such as fluctuations across the menstrual cycle that influence symptom experience, which may account for the greater spread and average in females.[^11]
[^10]: This study is a cumulative analysis of several medical databases.
[^11]: This is a valid analysis, as all females in our analysis are aged 17 years or older.
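
The summary statistics quoted above (median, mean, IQR, standard deviation, range) can be computed per gender in one pass. A minimal sketch in base R, using made-up scores: `toy_baseline` and `summarise_scores` are hypothetical names, and the numbers are not from the study.

```{r}
# Hypothetical toy scores, six per gender.
toy_baseline <- data.frame(
  gender = rep(c("Male", "Female"), each = 6),
  baseline_gase = c(10, 11, 11, 12, 12, 13,   # made-up male scores
                    10, 10, 10, 11, 13, 18),  # made-up female scores
  stringsAsFactors = FALSE
)

# Centre and spread measures for one group of scores.
summarise_scores <- function(x) {
  c(median = median(x), mean = mean(x), iqr = IQR(x),
    sd = sd(x), range = diff(range(x)))
}

# One column per gender, one row per statistic.
toy_summary <- sapply(split(toy_baseline$baseline_gase, toy_baseline$gender),
                      summarise_scores)
round(toy_summary, 2)
```

In this toy example the male median is higher while the female mean is higher, which is the pattern a long upper tail in one group can produce.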
### Figure 2
```{r}
gender_histogram_baseline <- plot_ly(histnorm = "probability") %>%
add_histogram(x = male_filtered$baseline_gase, name = "Male", nbinsx = 15, opacity = 1.0, marker = list(color = 'blue')) %>%
add_histogram(x = female_filtered$baseline_gase, name = "Female", nbinsx = 15, opacity = 1.0, marker = list(color = 'orange')) %>%
layout(title = "Male and Female Baseline GASE",
xaxis = list(title = "Baseline GASE"),
yaxis = list(title = "Frequency"))
gender_histogram_baseline
stackem <- subplot(boxplot_bgase, gender_histogram_baseline, nrows = 2) %>%
layout(title = list(text = "Male and Female Baseline GASE"))
stackem
```
However, the experimental design presents a possible confounder for baseline symptom reporting. Participants watched one of two videos, in which an individual either reported a positive experience with the medication (intervention) or was not present (non-intervention). Gender-based differences in the perception of the two videos[^12] are unlikely to explain our results: Figure 3 below shows that females have a consistently greater spread in baseline_gase score than males regardless of video group, in line with our previous results. However, the female spread exceeds the male spread by more in the intervention group (upper tail of 3.75 vs. 0.75) than in the non-intervention group (3 vs. 2). This suggests that, while intervention is a minor confounder, females are still associated with higher and more varied symptom reporting in both groups, albeit to a small degree.
[^12]: The non-intervention and intervention videos were each recorded with two different researchers, who appear separately in the videos. We did not expect the choice of researcher to produce significant differences. The Natural History group also watched the non-intervention video.
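
The upper-tail comparison above can be reproduced numerically rather than read off the plot. A minimal sketch, assuming "upper tail" means the distance from the upper hinge to the upper whisker as computed by `boxplot.stats()` (plotly may place its quartiles slightly differently). `toy_video`, its values, and `upper_tail` are hypothetical.

```{r}
# Hypothetical toy data: five scores per gender and video group.
toy_video <- data.frame(
  gender = rep(c("Male", "Female", "Male", "Female"), each = 5),
  video = rep(c("Intervention", "No Intervention"), each = 10),
  baseline_gase = c(10, 11, 11, 12, 12,   # Male, Intervention
                    9, 10, 11, 12, 14,    # Female, Intervention
                    10, 10, 11, 12, 13,   # Male, No Intervention
                    9, 10, 11, 12, 13),   # Female, No Intervention
  stringsAsFactors = FALSE
)

# Length of the upper whisker: upper whisker end minus upper hinge.
upper_tail <- function(x) {
  s <- boxplot.stats(x)$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
  s[5] - s[4]
}

# Rows are genders, columns are video groups.
toy_tails <- tapply(toy_video$baseline_gase,
                    list(toy_video$gender, toy_video$video),
                    upper_tail)
toy_tails
```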
### Figure 3
```{r}
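# Keep only the two analysed genders, matching the Male/Female subsets used above.
data_3 <- filter(data_2, gender %in% c("Male", "Female"))
# Label each experimental group by the video its participants watched.
data_3 <- mutate(data_3, Intervention = case_when(
  gid %in% c("INSMDN", "ISMDN", "INSMWT", "ISMWT") ~ "Intervention",
  gid %in% c("NISMDN", "NINSMDN", "NISMWT", "NINSMWT", "NHDN", "NHWT") ~ "No Intervention"
))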
intervention_boxplot <- plot_ly(data_3, x= ~Intervention, y = ~baseline_gase, color = ~gender, type = "box") %>%
layout(boxmode = "group",
title = "Male and Female Baseline GASE")
intervention_boxplot
```
## Results {.tabset}
### Plots
This tab shows the baseline GASE box plot.
```{r, echo = TRUE}
boxplot_bgase
```
### Tables
This tab shows the baseline GASE histogram.
```{r, echo = TRUE}
gender_histogram_baseline
```
# Symptom Change
```{r}
gender_histogram_change <- plot_ly(histnorm = "probability") %>%
add_histogram(x = male_filtered$tot, name = "Men", nbinsx = 25, opacity = 1.0) %>%
add_histogram(x = female_filtered$tot, name = "Women", nbinsx = 25, opacity = 1.0) %>%
layout(title = "Male and Female Symptom Changes",
xaxis = list(title = "Symptom Change"),
yaxis = list(title = "Frequency"))
gender_histogram_change
gender_histogram_change_outlier_excluded <- plot_ly(histnorm = "probability") %>%
add_histogram(x = male_filtered_excluded$tot, name = "Men", nbinsx = 25, opacity = 1.0) %>%
add_histogram(x = female_filtered_excluded$tot, name = "Women", nbinsx = 25, opacity = 1.0) %>%
layout(title = "Male and Female Symptom Changes - Outlier Removed",
xaxis = list(title = "Symptom Change"),
yaxis = list(title = "Frequency"))
gender_histogram_change_outlier_excluded
sd(female_filtered$tot)
sd(male_filtered$tot)
sd(female_filtered_excluded$tot)
sd(male_filtered_excluded$tot)
# Using absolute values here to see how much symptoms change in either direction (is this relevant?)
male_filtered_tot_mean <- mean(abs(male_filtered$tot))
female_filtered_tot_mean <- mean(abs(female_filtered$tot))
data_2_tot_barplot <- data.frame(
Gender = c("Male", "Female"),
Average_Symptom = c(male_filtered_tot_mean, female_filtered_tot_mean)
)
ggplot(data_2_tot_barplot, aes(x = Gender, y = Average_Symptom)) +
geom_bar(stat = "identity") +
labs(title = "Mean Symptom Change by Gender", x = "Gender", y = "Mean Symptom Change")
male_filtered_tot_mean
female_filtered_tot_mean
```
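
The chunk above averages absolute changes, and its comment asks whether that is relevant. A minimal sketch of the difference between the two choices, on a made-up vector (`toy_change` is hypothetical): the signed mean measures net change, while the mean of absolute values measures how much scores move in either direction.

```{r}
# Hypothetical change scores: two participants improve, two worsen by the same amount.
toy_change <- c(-3, 3, -1, 1)

# Signed mean: increases and decreases cancel out.
mean_signed <- mean(toy_change)

# Mean of absolute values: size of the change, ignoring direction.
mean_absolute <- mean(abs(toy_change))

c(signed = mean_signed, absolute = mean_absolute)
```

Which of the two suits the research question depends on whether the direction of the change matters or only its size.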
@fig-gender_histogram_change is present mate
# Graph test two
```{r}
data_2_agase_barplot <- data_2
gender_histogram_agase <- plot_ly(histnorm = "probability") %>%
add_histogram(x = male_filtered$active_gase_total, name = "Men", nbinsx = 15, opacity = 1.0) %>%
add_histogram(x = female_filtered$active_gase_total, name = "Women", nbinsx = 15, opacity = 1.0) %>%
layout(title = "Male and Female Active GASE",
xaxis = list(title = "Active GASE"),
yaxis = list(title = "Frequency"))
gender_histogram_agase
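# The cumulative active score is stored as active_gase_total in these subsets.
sd(female_filtered$active_gase_total)
sd(male_filtered$active_gase_total)
summary(female_filtered$active_gase_total)
summary(male_filtered$active_gase_total)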
male_filtered_agase_mean = mean(abs(male_filtered$active_gase_total))
female_filtered_agase_mean = mean(abs(female_filtered$active_gase_total))
data_2_agase_barplot <- data.frame(
Gender = c("Male", "Female"),
Active_Gase = c(male_filtered_agase_mean, female_filtered_agase_mean)
)
ggplot(data_2_agase_barplot, aes(x = Gender, y = Active_Gase)) +
geom_bar(stat = "identity") +
labs(title = "Mean Active GASE by Gender", x = "Gender", y = "Mean Active GASE")
male_filtered_agase_mean
female_filtered_agase_mean
```
## Sub heading
```{r}
data_2_bgase_barplot <- data_2
gender_histogram_baseline <- plot_ly(histnorm = "probability") %>%
add_histogram(x = male_filtered$baseline_gase, name = "Men", nbinsx = 15, opacity = 1.0) %>%
add_histogram(x = female_filtered$baseline_gase, name = "Women", nbinsx = 15, opacity = 1.0) %>%
layout(title = "Male and Female Baseline GASE",
xaxis = list(title = "Baseline GASE"),
yaxis = list(title = "Frequency"))
gender_histogram_baseline
sd(female_filtered$baseline_gase)
sd(male_filtered$baseline_gase)
summary(female_filtered$baseline_gase)
summary(male_filtered$baseline_gase)
male_filtered_bgase_mean <- mean(male_filtered$baseline_gase)
female_filtered_bgase_mean <- mean(female_filtered$baseline_gase)
data_2_bgase_barplot <- data.frame(
Gender = c("Male", "Female"),
Baseline_Gase = c(male_filtered_bgase_mean, female_filtered_bgase_mean)
)
ggplot(data_2_bgase_barplot, aes(x = Gender, y = Baseline_Gase)) +
geom_bar(stat = "identity") +
labs(title = "Mean Baseline GASE by Gender", x = "Gender", y = "Mean Baseline GASE")
male_filtered_bgase_mean
female_filtered_bgase_mean
```