library(ggplot2)
breast_data = read.csv("breast_cancer.csv")
breast_data = na.omit(breast_data)
breast_data = subset(breast_data, Tumor.Size > 0 & Tumor.Size <= 100)
‘BreastScreen’ Australia is the national breast cancer screening program offering free mammograms to women aged 50–74 to improve early detection and reduce breast cancer mortality. More information: https://www.cancerscreening.gov.au/breastscreen
Based on our analysis of the data provided, we recommend prioritising early tumor size screening and estrogen receptor (ER) testing. In this dataset, patients with smaller tumors and ER-positive status show noticeably better survival outcomes. These findings support increased investment in early detection programs and hormone-based treatment strategies to improve prognosis and resource efficiency.
We examined three key variables in the breast cancer dataset: Patient Status (Alive vs Deceased), Tumor Size, and Estrogen Receptor Status. These variables are important indicators of treatment outcomes and help guide therapeutic decisions.
ggplot(breast_data, aes(x = Status)) +
  geom_bar(fill = "darkblue") +
  labs(title = "Patient Status Distribution", x = "Status", y = "Count")
The bar plot of ‘Status’ shows that more patients in the dataset survived than died. This seems encouraging at first, but it means the outcome variable is imbalanced. Because survivors are over-represented, apparent relationships between ‘Status’ and the other variables should be interpreted with caution.
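The imbalance described above can be put into numbers. The following is a minimal sketch, not part of the original analysis: `status_share` is a hypothetical helper, and the toy data frame contains made-up values purely to show the call. In practice it would be applied to `breast_data`.

```r
# Hypothetical helper (an assumption, not from the original report):
# percentage of patients in each Status level.
status_share <- function(data) {
  counts <- table(data$Status)
  round(100 * prop.table(counts), 1)
}

# Toy data for illustration only; these values are made up
# and are NOT taken from breast_cancer.csv.
toy_status <- data.frame(Status = c("Alive", "Alive", "Alive", "Deceased"))
status_share(toy_status)
```

Calling `status_share(breast_data)` would give the share of survivors and non-survivors, which makes the degree of imbalance explicit rather than read off the bar heights.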
RQ1: How does tumor size vary between patients who survive and those who do not following a breast cancer diagnosis?
ggplot(breast_data, aes(x = Status, y = Tumor.Size)) +
  geom_boxplot(fill = "darkred") +
  labs(title = "Tumor Size by Patient Status", x = "Status", y = "Tumor Size (mm)")
The comparative boxplot of ‘Tumor Size’ by ‘Status’ shows a clear difference in tumor size between the two outcome groups. Patients who died tended to have larger tumors, with a distribution extending to higher values. This is consistent with established clinical knowledge that larger tumors are associated with poorer prognosis. The outliers in the deceased group also point to the severity of late-stage tumor progression. Elston & Ellis (1991) identified tumor size as one of the most significant independent prognostic variables in breast cancer, affecting both survival and recurrence, which further highlights the need for early detection and treatment programs.
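A boxplot alone does not establish statistical significance, so the visual difference could be backed by group medians and a formal test. The sketch below is one possible approach, assuming a Wilcoxon rank-sum test is appropriate for skewed tumor sizes; `compare_tumor_size` is a hypothetical helper and the toy values are made up, not drawn from the dataset.

```r
# Hypothetical helper (an assumption, not from the original report):
# median tumor size per Status group plus a Wilcoxon rank-sum test.
compare_tumor_size <- function(data) {
  medians <- tapply(data$Tumor.Size, data$Status, median)
  # exact = FALSE uses the normal approximation, which also copes with ties
  test <- wilcox.test(Tumor.Size ~ Status, data = data, exact = FALSE)
  list(medians = medians, p_value = test$p.value)
}

# Toy data for illustration only; these values are made up
# and are NOT taken from breast_cancer.csv.
toy_size <- data.frame(
  Status = rep(c("Alive", "Deceased"), each = 5),
  Tumor.Size = c(10, 12, 15, 18, 20, 25, 30, 35, 40, 60)
)
size_result <- compare_tumor_size(toy_size)
size_result$medians
size_result$p_value
```

Running `compare_tumor_size(breast_data)` would report the two medians alongside a p-value, which is a safer basis for describing the difference as significant than the plot alone.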
RQ2: Is there an association between estrogen receptor (ER) status and breast cancer survival outcomes?
ggplot(breast_data, aes(x = Estrogen.Status, fill = Status)) +
  geom_bar(position = "dodge") +
  labs(title = "Estrogen Status by Patient Outcome", x = "Estrogen Status", y = "Count")
The double bar plot comparing ‘Estrogen Status’ and ‘Status’ suggests that patients with positive estrogen receptor (ER) status have a higher chance of survival. This finding is in line with existing medical research showing that ER-positive patients typically respond well to hormonal therapies. An extensive meta-analysis by the Early Breast Cancer Trialists’ Collaborative Group (2011) found that hormone treatments such as tamoxifen greatly improve long-term survival and reduce recurrence in ER-positive patients. These insights highlight the practical importance of ER status testing in predicting outcomes and tailoring appropriate treatments for patients.
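Because the dodged bars show raw counts, and the ER groups may differ in size, survival proportions within each ER group give a fairer comparison. The sketch below assumes Fisher's exact test as the association test (`chisq.test` would be a common alternative for large samples); `er_survival` is a hypothetical helper and the toy values are made up.

```r
# Hypothetical helper (an assumption, not from the original report):
# survival share within each ER group plus a test of association.
er_survival <- function(data) {
  counts <- table(data$Estrogen.Status, data$Status)
  list(
    row_share = prop.table(counts, margin = 1),  # each row sums to 1
    p_value = fisher.test(counts)$p.value
  )
}

# Toy data for illustration only; these values are made up
# and are NOT taken from breast_cancer.csv.
toy_er <- data.frame(
  Estrogen.Status = rep(c("Positive", "Negative"), times = c(10, 10)),
  Status = c(rep("Alive", 9), "Deceased", rep("Alive", 4), rep("Deceased", 6))
)
er_result <- er_survival(toy_er)
er_result$row_share
er_result$p_value
```

Applied to `breast_data`, the `row_share` table would show the proportion of survivors among ER-positive and ER-negative patients separately, which is the quantity the interpretation above relies on.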
Nonetheless, several limitations must be acknowledged. Chief among them is the imbalance in the ‘Status’ variable, an issue raised in the initial analysis. The over-representation of surviving patients could bias observed relationships, potentially overstating the strength of predictors like tumor size or estrogen status. Additionally, the dataset may not be representative of broader populations across different regions, age groups, or genetic backgrounds. Finally, although tumor size and hormone receptor status are important indicators, they are part of a larger set of clinical and lifestyle variables that were not explored in this analysis.
The analysis demonstrates the Shared Value of Accountability through its transparent and replicable methods and its clear communication of limitations. It follows the Ethical Principle of Avoiding Harm by interpreting medical data responsibly, guiding decisions without drawing exaggerated conclusions or inviting misuse.
There was no AI used to produce the final version of this project. No content of the final submission of this project contains assistance from AI, including research, analysis and coding.
(Comparative Boxplot): Elston, C. W., & Ellis, I. O. (1991). Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology, 19(5), 403–410.
Link: https://doi.org/10.1111/j.1365-2559.1991.tb00229.x
(Double Barplot): Early Breast Cancer Trialists’ Collaborative Group (EBCTCG) (2011). Relevance of breast cancer hormone receptors and other factors to the efficacy of adjuvant tamoxifen: patient-level meta-analysis of randomised trials. The Lancet, 378(9793), 771–784.
Link: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60993-8/fulltext
Hi, do we need intext citations for research articles?: https://edstem.org/au/courses/19992/discussion/2676788?comment=5955837
Graphical Outputs: https://edstem.org/au/courses/19992/discussion/2676886