Data1001 Project 2 (BreastScreen)

Author

Mohamed Bougrassa

1. Client Bio

BreastScreen Australia is the national breast cancer screening program. It offers free mammograms to women aged 50–74 to improve early detection and reduce breast cancer mortality. More information: https://www.cancerscreening.gov.au/breastscreen

Code

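library(tidyverse)

# Load the breast cancer dataset
breast_data <- read.csv("breast_cancer.csv")

# Drop rows with any missing values
breast_data <- na.omit(breast_data)
# Keep tumor sizes in the range analysed (greater than 0 and at most 100 mm)
breast_data <- subset(breast_data, Tumor.Size > 0 & Tumor.Size <= 100)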

2. Recommendation

Based on this analysis, we recommend prioritising early detection, so that tumors are found while still small, and routine estrogen receptor (ER) testing. In this dataset, patients with smaller tumors and ER-positive status show better survival outcomes. These findings support increased funding for early detection programs and hormone-based treatment approaches, to improve patient prognosis and make treatment decisions more efficient.

3. Evidence

The breast cancer dataset was examined through three key variables: Patient Status (Alive vs Deceased), Tumor Size, and Estrogen Receptor Status. These variables are important indicators for understanding treatment outcomes and, in turn, for guiding therapeutic choices.

3.1 Patient Outcome Distribution

Code

# Number of patients in each outcome group
ggplot(breast_data, aes(x = Status)) +
  geom_bar(fill = "darkblue") +
  labs(title = "Patient Status Distribution", x = "Status", y = "Count")

The bar plot of ‘Status’ shows that more patients in the dataset survived than died. This seems encouraging at first, but it also means the two outcome classes are imbalanced. Because survivors are over-represented, raw counts in later plots are dominated by the Alive group. Comparisons between groups are therefore better read as proportions within each group than as absolute counts.
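You can make the imbalance explicit by reporting the share of patients in each outcome group, not just the bar heights. The sketch below uses base R only. The helper name `status_proportions` and the toy data frame are hypothetical and were not part of the original analysis. The toy values are made up purely to show the mechanics. On the project data, you would call it as `status_proportions(breast_data)`.

```r
# Hypothetical helper: share of patients in each Status group,
# rounded to three decimal places
status_proportions <- function(df) {
  counts <- table(df$Status)
  round(prop.table(counts), 3)
}

# Hypothetical toy data for illustration only; not the project dataset
toy <- data.frame(Status = c("Alive", "Alive", "Alive", "Dead"))
status_proportions(toy)
```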

3.2 Tumor Size by Status

RQ1: How does tumor size vary between patients who survive and those who do not following a breast cancer diagnosis?

Code

# Distribution of tumor size within each outcome group
ggplot(breast_data, aes(x = Status, y = Tumor.Size)) +
  geom_boxplot(fill = "darkred") +
  labs(title = "Tumor Size by Patient Status", x = "Status", y = "Tumor Size (mm)")

The comparative boxplot of ‘Tumor Size’ by ‘Status’ shows a clear difference in tumor size between the two outcome groups. Patients who died had a larger median tumor size, and their distribution extends to higher values. This matches established clinical knowledge that larger tumors are associated with poorer prognosis. The outliers in the deceased group also indicate some very large tumors, which are more typical of advanced disease. A large pathological study with long-term follow-up by Elston & Ellis (1991) concluded that tumor size is one of the most significant independent prognostic variables in breast cancer, affecting both survival and recurrence. This underlines the need for early detection and treatment programs.
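The boxplot comparison is visual only. A formal test could check whether the difference in tumor size is larger than chance alone would explain. The sketch below is a minimal example. It assumes a data frame with the `Status` and `Tumor.Size` columns used above. It computes the median tumor size per group and runs a Wilcoxon rank-sum test with base R's `wilcox.test`. This test is swapped in here because it does not assume normally distributed sizes, which suits data with outliers like those in the boxplot. The original analysis did not include this test. The helper `compare_tumor_size` and the toy values are hypothetical.

```r
# Hypothetical helper: per-group median tumor size plus a Wilcoxon
# rank-sum test (normal approximation, exact = FALSE)
compare_tumor_size <- function(df) {
  medians <- tapply(df$Tumor.Size, df$Status, median)
  test <- wilcox.test(Tumor.Size ~ Status, data = df, exact = FALSE)
  list(medians = medians, p_value = test$p.value)
}

# Hypothetical toy data for illustration only; not the project dataset
toy <- data.frame(
  Status = rep(c("Alive", "Dead"), each = 5),
  Tumor.Size = c(10, 15, 20, 22, 25, 30, 35, 40, 45, 60)
)
compare_tumor_size(toy)
```

On the cleaned data, you would call it as `compare_tumor_size(breast_data)`. A small p-value would support the visual impression, but it would not remove the class-imbalance caveat.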

3.3 Estrogen Receptor Status by Outcome

RQ2: Is there an association between estrogen receptor (ER) status and breast cancer survival outcomes?

Code

# Count of each outcome within each ER status group, bars side by side
ggplot(breast_data, aes(x = Estrogen.Status, fill = Status)) +
  geom_bar(position = "dodge") +
  labs(title = "Estrogen Status by Patient Outcome", x = "Estrogen Status", y = "Count")

The double bar plot comparing ‘Estrogen Status’ and ‘Status’ shows that patients with positive estrogen receptor (ER) status have a higher chance of survival. This finding is in line with existing medical research showing that ER-positive patients typically respond well to hormonal therapies. A large patient-level meta-analysis by the Early Breast Cancer Trialists' Collaborative Group (EBCTCG, 2011) found that adjuvant tamoxifen greatly improves long-term survival and reduces recurrence in ER-positive patients. These results show the practical importance of ER status testing for predicting outcomes and tailoring treatment to each patient.
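The ER-positive and ER-negative groups may differ in size, so survival rates within each group are easier to compare than raw bar heights. The sketch below is illustrative only, not the original method. It computes the proportion of each `Status` within each `Estrogen.Status` level. It then runs a chi-squared test of independence with base R's `chisq.test`. The helper `er_survival` and the toy counts are hypothetical. On the real data, you would call it as `er_survival(breast_data)`.

```r
# Hypothetical helper: outcome proportions within each ER group
# (each row sums to 1) and a chi-squared test of independence
er_survival <- function(df) {
  tab <- table(df$Estrogen.Status, df$Status)
  list(
    row_proportions = round(prop.table(tab, margin = 1), 3),
    p_value = chisq.test(tab)$p.value
  )
}

# Hypothetical toy data for illustration only; not the project dataset
toy <- data.frame(
  Estrogen.Status = rep(c("Negative", "Positive"), each = 40),
  Status = c(rep(c("Alive", "Dead"), c(20, 20)),
             rep(c("Alive", "Dead"), c(36, 4)))
)
er_survival(toy)
```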

3.4 Limitations

The main limitation is the imbalance in the ‘Status’ variable, noted in Section 3.1. Because surviving patients are over-represented, the observed relationships may be biased, which could overstate the apparent associations of tumor size and estrogen status with survival.
The dataset does not represent diverse populations across geographic locations, age groups, and genetic backgrounds.
Tumor size and hormone receptor status are only part of a broader set of clinical and lifestyle factors, which this study could not examine because the dataset does not include them.

4. Ethics Statement

The analysis demonstrates the Shared Value of Accountability through transparent, replicable methods and clear communication of limitations. It follows the Ethical Principle of Avoiding Harm by interpreting medical data responsibly to guide decisions, without drawing exaggerated conclusions or enabling misuse.

5. AI Usage Statement

No artificial intelligence was used in creating the final version of this project, including its research, analysis, and coding.

6. Acknowledgements

6.1 External Evidence

(Comparative boxplot) Elston, C. W., & Ellis, I. O. (1991). Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology, 19(5), 403–410.
Link: https://doi.org/10.1111/j.1365-2559.1991.tb00229.x

(Double bar plot) Early Breast Cancer Trialists' Collaborative Group (EBCTCG). (2011). Relevance of breast cancer hormone receptors and other factors to the efficacy of adjuvant tamoxifen: patient-level meta-analysis of randomised trials. The Lancet, 378(9793), 771–784.
Link: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60993-8/fulltext

6.2 Ed Discussion

Hi, do we need intext citations for research articles?: https://edstem.org/au/courses/19992/discussion/2676788?comment=5955837

Graphical Outputs: https://edstem.org/au/courses/19992/discussion/2676886

Error message?: https://edstem.org/au/courses/19992/discussion/2677945