Breast Cancer Research - Project 2

Author

SID - 550695253

Client Bio

Client: Breast Cancer Trials

Australian Breast Cancer Research URL Link

Bio

Breast Cancer Trials (BCT) is a world-leading group of breast cancer doctors and researchers dedicated to finding better treatments and prevention strategies for all people affected by breast cancer. Through clinical trials and major research conducted around the world, the group pursues its central aim: “We can and we will find new and better treatments and prevention strategies for every person affected by breast cancer that saves lives today, tomorrow and forever.” (Breast Cancer Trials, 2024).

Recommendation

Mean survival months differ between the racial groups in the study (White, Black, Other), and Black participants have significantly lower survival months than participants of the other two groups, so this disparity must be recognised and studied further. People of Black ethnicity experience poorer health outcomes partly because of “higher rates of unemployment and under-representation in good-paying jobs that include health insurance as part of the benefit package” (Williams and Rucker, 2000), which is consistent with the pattern seen in the breast cancer data. Drawing on the statistical analysis of these disparities, I recommend that the Breast Cancer Trials organisation implement low-cost educational programs that are easily accessible online through social media platforms, so that marginalised groups can detect breast cancer earlier and improve their chances of survival.

Code

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code

library(readr)
breast_cancer_2_ <- read_csv("~/Desktop/breast_cancer (2).csv")

Rows: 4024 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (11): Race, Marital Status, T Stage, N Stage, 6th Stage, differentiate, ...
dbl  (5): Age, Tumor Size, Regional Node Examined, Reginol Node Positive, Su...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Statistic Data Analysis

Code

# Pie chart showing the racial composition of the breast cancer data set
library(ggplot2)
library(dplyr)

race_counts <- breast_cancer_2_ %>%
  group_by(Race) %>%
  summarise(Count = n()) %>%
  mutate(Percent = Count / sum(Count) * 100,
         Label = paste0(Race, " (", round(Percent, 1), "%)"))

ggplot(race_counts, aes(x = "", y = Percent, fill = Race)) +
  geom_col(width = 1, color = "white") +
  coord_polar("y") + 
  labs(title = "Composition of Race") +
  geom_text(aes(label = Label), position = position_stack(vjust = 0.3), size = 4) +
  theme_void() +
  scale_fill_brewer(palette = "Set2")

Code

# Bar chart comparing mean survival months by race
library(dplyr)
library(ggplot2)

# Mean survival by race
summary_data <- breast_cancer_2_ %>%
  group_by(Race) %>%
  summarise(Mean_Survival = mean(`Survival Months`, na.rm = TRUE))

ggplot(summary_data, aes(x = Race, y = Mean_Survival, fill = Race)) +
  geom_col(width = 0.6) +
  labs(
    title = "Mean Survival Months by Race",
    x = "Race",
    y = "Mean Survival Months"
  ) +
  theme_minimal() +
  theme(axis.text = element_text(size = 12)) +
  scale_fill_brewer(palette = "Dark2")

The pie chart shows that a high percentage of the participants in the data set are White.
The bar chart shows differences in mean survival months between races, with Black participants having the lowest mean survival months of the three groups.
This raises the question of why these differences occur; the evidence and main findings are presented below.

Evidence

Code

# Box plot of survival months by race (median, IQR and outliers)
library(ggplot2)

ggplot(breast_cancer_2_, aes(x = Race, y = `Survival Months`, fill = Race)) +
  geom_boxplot() +
  labs(
    title = "Survival Months by Race",
    x = "Race",
    y = "Survival Months"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")

Code

# One-way ANOVA: test whether mean survival months differ by race (F statistic and p-value)
anova_model <- aov(`Survival Months` ~ Race, data = breast_cancer_2_)
summary(anova_model)

              Df  Sum Sq Mean Sq F value   Pr(>F)    
Race           2    7739    3870   7.388 0.000627 ***
Residuals   4021 2105913     524                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Code

# Tukey's HSD post-hoc test: adjusted pairwise comparisons of mean survival months between races
anova_model <- aov(`Survival Months` ~ Race, data = breast_cancer_2_)
TukeyHSD(anova_model)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = `Survival Months` ~ Race, data = breast_cancer_2_)

$Race
                 diff       lwr       upr     p adj
Other-Black  6.610191  2.263932 10.956450 0.0010686
White-Black  4.905456  1.628752  8.182161 0.0013162
White-Other -1.704735 -4.841647  1.432177 0.4099001
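The `p adj` column above comes from Tukey's studentised range distribution. As an illustration of that calculation (a sketch, not part of the original analysis), the code below recomputes one adjusted p-value by hand and compares it with `TukeyHSD()`. Assumptions: R's built-in `PlantGrowth` data (three groups, `weight ~ group`) stands in for `breast_cancer_2_`, which is not bundled here; the formula follows the computation inside `stats::TukeyHSD()`, where the absolute difference in means divided by sqrt(MSE/2 × (1/n_i + 1/n_j)) is referred to `ptukey()` with the number of groups and the residual degrees of freedom.

```r
# Sketch: reproduce one Tukey HSD adjusted p-value by hand.
# Assumption: PlantGrowth (built into R) stands in for breast_cancer_2_;
# for the report, use `Survival Months` ~ Race and the Race levels instead.
fit <- aov(weight ~ group, data = PlantGrowth)

# Residual mean square (MSE) from the ANOVA fit
mse <- sum(residuals(fit)^2) / fit$df.residual

# Group means and group sizes
means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
sizes <- tapply(PlantGrowth$weight, PlantGrowth$group, length)

# Studentised range statistic for the trt2 - trt1 comparison
d  <- as.numeric(means["trt2"] - means["trt1"])
se <- as.numeric(sqrt(mse / 2 * (1 / sizes["trt2"] + 1 / sizes["trt1"])))
q  <- abs(d) / se

# Upper-tail probability of the studentised range distribution
p_manual <- ptukey(q, nmeans = length(means), df = fit$df.residual,
                   lower.tail = FALSE)

# Compare with the value reported by TukeyHSD()
p_tukey <- TukeyHSD(fit)$group["trt2-trt1", "p adj"]
print(c(manual = p_manual, TukeyHSD = p_tukey))
```

The same adjustment explains why the White-Other comparison is not significant: its mean difference is small relative to the pooled standard error, so the studentised range statistic falls in the bulk of the distribution.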

Code

# Summary table of Survival Months by Race (IQR, Mean, Median)
library(dplyr)
summary_table <- breast_cancer_2_ %>%
  group_by(Race) %>%
  summarise(
    Mean_Survival = round(mean(`Survival Months`, na.rm = TRUE), 1),
    Median_Survival = round(median(`Survival Months`, na.rm = TRUE), 1),
    IQR_Survival = round(IQR(`Survival Months`, na.rm = TRUE), 1)
  )
print(summary_table)

# A tibble: 3 × 4
  Race  Mean_Survival Median_Survival IQR_Survival
  <chr>         <dbl>           <dbl>        <dbl>
1 Black          66.6              67         34.5
2 Other          73.2              77         33.2
3 White          71.5              73         34

To compare breast cancer survival months across the races in the data set, the most suitable approach is hypothesis testing: a one-way ANOVA tests whether mean survival months differ between races, followed by Tukey's HSD post-hoc test to identify which pairs of races differ.

Hypothesis

H(0) - The mean survival months are equal for all racial groups in the study (White, Black, Other).

H(1) - The mean survival months differ for at least one pair of racial groups (White - Other, White - Black, Black - Other).

Assumptions

Each participant’s survival time should be independent of every other participant’s.
The residuals within each race group (White, Black, Other) should be approximately normally distributed so that the test statistic and p-value are valid.
The variance in survival months should be roughly equal across the race groups (homogeneity of variance) to ensure accurate results.
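The three assumptions above can be checked in R before relying on the F test. The sketch below is a minimal illustration, not the analysis run for this report: it uses R's built-in `PlantGrowth` data (three groups, `weight ~ group`) as a stand-in because the breast cancer CSV is not bundled, and the helper name `check_anova_assumptions()` is hypothetical. Independence cannot be tested from the data; it depends on how participants were sampled.

```r
# Hypothetical helper: p-values for the normality and equal-variance assumptions.
# For the report data the call would be:
#   check_anova_assumptions(`Survival Months` ~ Race, breast_cancer_2_)
check_anova_assumptions <- function(formula, data) {
  fit <- aov(formula, data = data)
  res <- residuals(fit)

  # Normality of residuals: Shapiro-Wilk (requires 3 to 5000 observations)
  normality <- shapiro.test(res)

  # Equal variances across groups: Bartlett (assumes normality) and
  # Fligner-Killeen (rank-based, more robust to non-normality)
  bartlett_res <- bartlett.test(formula, data = data)
  fligner_res  <- fligner.test(formula, data = data)

  list(
    shapiro_p  = normality$p.value,
    bartlett_p = bartlett_res$p.value,
    fligner_p  = fligner_res$p.value
  )
}

# Assumption: PlantGrowth stands in for breast_cancer_2_
checks <- check_anova_assumptions(weight ~ group, PlantGrowth)
print(checks)
```

A p-value below 0.05 from any of these tests suggests the corresponding assumption is doubtful. With about 4,000 observations, Shapiro-Wilk flags even small departures from normality, so a Q-Q plot of the residuals (`qqnorm()`) is a useful visual complement.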

Test Statistic

Test statistic calculated in R (one-way ANOVA):

F = 7.388 (df = 2 and 4021)
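The test statistic reported above is the one-way ANOVA F statistic: the between-race mean square divided by the residual (within-race) mean square. As a check, the sketch below recomputes it and its p-value from the sums of squares printed in the ANOVA table. Those inputs are rounded to whole numbers, so the result matches R's output only to about three decimal places.

```r
# Values copied from summary(anova_model) above (Sum Sq is printed rounded)
ss_race  <- 7739;    df_race  <- 2
ss_resid <- 2105913; df_resid <- 4021

ms_race  <- ss_race / df_race     # between-race mean square (about 3870)
ms_resid <- ss_resid / df_resid   # within-race (residual) mean square (about 524)
f_stat   <- ms_race / ms_resid    # F = MS_race / MS_residual

# Upper-tail probability of the F(2, 4021) distribution
p_value <- pf(f_stat, df1 = df_race, df2 = df_resid, lower.tail = FALSE)
print(round(c(F = f_stat, p = p_value), 6))
```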

P-Value

P = 0.000627 < 0.05

Because the p-value of 0.000627 is below 0.05, the null hypothesis is rejected. To determine which races (White, Black, Other) differ, Tukey's HSD post-hoc test was used to obtain an adjusted p-value for each pairwise comparison, identifying where the most significant differences in survival months occur.

Tukey HSD adjusted p-values for each pairwise race comparison of survival months
Comparison	P-Value	Statistical significance (Yes/No)
Black - Other	P = 0.00107 < 0.05	Yes
Black - White	P = 0.00132 < 0.05	Yes
White - Other	P = 0.40990 > 0.05	No
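The table above can also be generated directly from the `TukeyHSD()` output rather than copied from the console. The sketch below assumes R's built-in `PlantGrowth` data as a stand-in for `breast_cancer_2_`; for the report, `TukeyHSD(anova_model)$Race` would replace `TukeyHSD(fit)$group`. The 0.05 threshold matches the significance level used throughout this report.

```r
# Sketch: build a pairwise significance table from TukeyHSD().
# Assumption: PlantGrowth stands in for the breast cancer data.
fit   <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit)$group   # matrix with columns diff, lwr, upr, p adj

alpha <- 0.05
pairwise_table <- data.frame(
  Comparison  = rownames(tukey),
  P_Value     = signif(tukey[, "p adj"], 3),
  Significant = ifelse(tukey[, "p adj"] < alpha, "Yes", "No"),
  row.names   = NULL
)
print(pairwise_table)
```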

Conclusion

The p-value of 0.000627 is well below 0.05, so we reject the null hypothesis for this test.
There is significant evidence that mean survival months for breast cancer patients differ between the racial groups (White, Black, Other).

Insights From Testing

The hypothesis testing shows a significant difference in mean survival months between races for breast cancer. The most notable finding is the lower survival months of Black participants compared with both White and Other participants, with Tukey-adjusted p-values of P = 0.00132 < 0.05 (White - Black) and P = 0.00107 < 0.05 (Black - Other). The summary statistics show clear differences in both means and medians: the median survival is 73 months for White, 77 months for Other and 67 months for Black participants. There is therefore strong evidence that race is associated with survival months for breast cancer, and investigating the reasons for this is crucial to improving survival and wellbeing for Black breast cancer patients.

Studies show that Black people have “higher rates of morbidity and mortality than white persons for most indicators of physical health” (Williams and Rucker, 2000), commonly because higher unemployment rates limit their access to the health services needed for “preventive care, early intervention and the appropriate management of chronic disease” (Williams and Rucker, 2000). The data show a clear disparity between Black participants and White/Other participants: the summary table gives a mean survival of 66.6 months for Black participants compared with 73.2 months (Other) and 71.5 months (White). Breast Cancer Trials can reduce this gap through initiatives that educate Black communities and increase their access to healthcare, reducing the impact of breast cancer.

This is especially important given that “Young African American women are disproportionately impacted by early-onset breast cancer compared to women of other races.” (Huq et al., 2022). Publishing educational videos on the early signs of breast cancer on social media platforms such as Instagram, Facebook, Twitter and TikTok would be valuable, as “Approximately 84% of young adults use at least one social media platform” (Matsuzaka, Avery and Stanton, 2023). Spreading vital information on early detection and the signs of breast cancer could therefore help increase survival among marginalised groups in society.

Ethics Statement

This report adheres to the International Statistical Institute's Declaration on Professional Ethics (https://isi-web.org/declaration-professional-ethics) by keeping the methods transparent to minimise bias, so that the analysis is accurate and useful for the Breast Cancer Trials organisation. The shared value “We discuss issues objectively and strive to contribute to the resolution of problems” was upheld by examining the main issue of racial differences in the data. The ethical principle of “Exposing and Reviewing Methods and Findings” was upheld by using all accessible data, avoiding bias from selective reporting. Throughout, upholding the interests of the public was a central concern of this report.

AI usage statement

ChatGPT was the only AI tool used for this assignment, and it only assisted with parts of the R code in RStudio.

ChatGPT - Pie chart

Assisted specifically with scaling the pie chart correctly and placing the percentage labels in the correct positions on the chart.

ChatGPT - Table showing the overall p-value and the p-values between each pair of races

Bar chart of mean survival months (final part)

Assisted with the final part of the code for the bar chart.

Statistical summary

Printed a summary table of the mean, median and IQR of survival months for each race (White, Black, Other).

Acknowledgements

Breast Cancer Trials. (2024). Who We Are | Breast Cancer Trials. [online] Available at: https://www.breastcancertrials.org.au
Huq, M.R., Woodard, N., Okwara, L., McCarthy, S. and Knott, C.L. (2022). Recommendations for breast cancer education for African American women below screening age. Health Education Research, [online] 36(5), pp.530–540. doi:https://doi.org/10.1093/her/cyab033.
International Statistical Institute (2023). Declaration on Professional Ethics | ISI. [online] Isi-web.org. Available at: https://isi-web.org/declaration-professional-ethics.
Matsuzaka, S., Avery, L.R. and Stanton, A.G. (2023). Black Women’s Social Media Use Integration and Social Media Addiction. Social Media + Society, 9(1), p.205630512211489. doi:https://doi.org/10.1177/20563051221148977.
Williams, D.R. and Rucker, T.D. (2000). Understanding and Addressing Racial Disparities in Health Care. Health Care Financing Review, [online] 21(4), p.75. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC4194634/.