This report analyzes breast cancer patient data from the SEER database, obtained from IEEE DataPort. We explore how tumor characteristics, hormone receptor status, and demographic factors are associated with survival outcomes.
Client: National Cancer Advisory Board (NCAB)
Goal: To provide data-driven recommendations to improve breast cancer prognosis and early intervention strategies.
Data Summary
library(readr)
library(dplyr)
Interpretation of Hormone Receptor Status and Survival
Table 1 presents the average survival months for each combination of estrogen and progesterone receptor status. Patients whose tumors were positive for both estrogen and progesterone receptors had an average survival of 72.30 months, among the most favorable outcomes in the table. This aligns with clinical findings that hormone receptor-positive tumors tend to respond to endocrine therapy, which can prolong survival.
Patients with estrogen-negative but progesterone-positive tumors showed the highest average survival (72.59 months). This result is unexpected: ER-negative/PR-positive tumors are uncommon, so this average may rest on relatively few patients, and it may also reflect interacting factors not captured in this summary. Nevertheless, both of these receptor-positive groups had notably better outcomes than the double-negative group, which had the lowest average survival (58.93 months).
The findings highlight the prognostic importance of hormone receptor status in breast cancer. Patients lacking both receptors cannot benefit from endocrine therapy and may require more aggressive or alternative treatment strategies. These results are consistent with previous studies such as Anderson et al. (2001) and support tailoring therapy to each patient's receptor profile.
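Table 1 itself does not appear in this section. As a minimal sketch of how such a table could be produced, the code below computes mean survival and patient counts for each receptor combination in base R (so it runs without extra packages), then collapses the groups into hormone-receptor-positive versus double-negative. It assumes the receptor columns are named `Estrogen Status` and `Progesterone Status` with values "Positive"/"Negative"; the toy data frame, the `receptor_table()` helper, and the `HR_Status` column are hypothetical stand-ins, not the report's actual code.

```r
# Hypothetical toy data standing in for the SEER extract (column names assumed)
toy_receptors <- data.frame(
  `Estrogen Status`     = c("Positive", "Positive", "Negative", "Negative", "Positive", "Negative"),
  `Progesterone Status` = c("Positive", "Negative", "Positive", "Negative", "Positive", "Negative"),
  `Survival Months`     = c(80, 70, 75, 50, 64, 60),
  check.names = FALSE
)

# Mean survival months and patient count for each ER/PR combination (analogous to Table 1)
receptor_table <- function(df) {
  f <- `Survival Months` ~ `Estrogen Status` + `Progesterone Status`
  means  <- aggregate(f, data = df, FUN = mean)    # rows with missing values are dropped
  counts <- aggregate(f, data = df, FUN = length)  # same grouping, same row order
  names(means)[3] <- "Average_Survival"
  means$Count <- counts[[3]]
  means
}

rt <- receptor_table(toy_receptors)
print(rt)

# Collapse to HR-positive (either receptor positive) vs HR-negative (double-negative)
toy_receptors$HR_Status <- ifelse(
  toy_receptors$`Estrogen Status` == "Positive" | toy_receptors$`Progesterone Status` == "Positive",
  "HR-positive", "HR-negative"
)
hr_means <- tapply(toy_receptors$`Survival Months`, toy_receptors$HR_Status, mean)
print(hr_means)
```

On the real data, `receptor_table(data)` would replace the toy input, assuming the column names match.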
Table 2: Tumor Grade and Survival
table2 <- data %>%
  group_by(Grade) %>%
  summarise(Average_Survival = mean(`Survival Months`, na.rm = TRUE),  # mean months per grade
            Count = n(),                                               # patients per grade
            .groups = "drop")
knitr::kable(table2, caption = "Survival Months by Tumor Grade")
Survival Months by Tumor Grade
| Grade | Average_Survival (months) | Count |
|---|---|---|
| 1 | 72.93738 | 543 |
| 2 | 72.17907 | 2351 |
| 3 | 68.74977 | 1111 |
| anaplastic; Grade IV | 64.42105 | 19 |
Interpretation of Tumor Grade and Survival Duration
This table summarizes average survival months for breast cancer patients by tumor grade. Tumor grade indicates how closely the cancer cells resemble normal breast cells. Grade 1 (well-differentiated) tumors are the least aggressive, and patients in this category had the highest average survival, approximately 72.94 months.
As tumor grade increases, reflecting more abnormal and aggressive cell characteristics, average survival decreases. Grade 2 (moderately differentiated) patients had a slightly lower average survival of 72.18 months, followed by Grade 3 (poorly differentiated) at 68.75 months.
The lowest average survival was found among patients with anaplastic (Grade IV) tumors, at 64.42 months. This group had only 19 patients, so its average is imprecise; still, the drop in survival time is consistent with the understanding that anaplastic tumors grow and spread more rapidly, leading to a poorer prognosis.
These findings reinforce the clinical importance of tumor grading in treatment planning: higher-grade tumors may call for more aggressive management and closer monitoring. This pattern is consistent with the oncology literature, notably Elston & Ellis (1991), which supports histological grade as a robust prognostic factor in breast cancer care.
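The arithmetic behind the grade comparison can be checked directly from the Table 2 values. This sketch copies those reported means and counts (it does not recompute them from the raw data), confirms that average survival falls at every step up in grade, expresses each grade relative to Grade 1, and takes the count-weighted mean, which should match the overall average survival assuming no missing `Survival Months` values. The `grade_summary` object and its derived columns are illustrative names, not part of the original analysis.

```r
# Average survival and counts by grade, copied from Table 2
grade_summary <- data.frame(
  Grade = c("1", "2", "3", "anaplastic; Grade IV"),
  Average_Survival = c(72.93738, 72.17907, 68.74977, 64.42105),
  Count = c(543L, 2351L, 1111L, 19L)
)

# Difference from Grade 1 in months (negative = shorter average survival)
grade_summary$Diff_vs_Grade1 <-
  grade_summary$Average_Survival - grade_summary$Average_Survival[1]

# TRUE if average survival falls at every step up in grade
monotone_decline <- all(diff(grade_summary$Average_Survival) < 0)

# Count-weighted mean across grades (overall average, assuming no missing survival values)
overall_mean <- weighted.mean(grade_summary$Average_Survival, grade_summary$Count)

print(grade_summary)
print(monotone_decline)
print(round(overall_mean, 2))
```

The Grade IV gap of roughly 8.5 months relative to Grade 1 rests on only 19 patients, so it carries far more uncertainty than the comparisons among Grades 1 to 3.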
Graph 1: Boxplot of Tumor Size by Survival Status
library(ggplot2)
ggplot(data, aes(x = Status, y = `Tumor Size`, fill = Status)) +
  geom_boxplot() +
  labs(title = "Tumor Size by Survival Status", y = "Tumor Size", x = "Survival Status")
Interpretation of Tumor Size by Survival Status
The boxplot above compares tumor sizes between two survival groups: patients who were Alive and those who were Dead at the time of observation. The central observation here is that tumor size tends to be larger in patients who died compared to those who survived.
The median tumor size (represented by the thick line in the box) is noticeably higher for the “Dead” group than for the “Alive” group.
The interquartile range (IQR), shown by the height of the box, is also wider in the “Dead” group, indicating more variability in tumor size among deceased patients.
Outliers (points beyond the whiskers) appear in both groups but are more extreme in the “Dead” group, suggesting some patients had substantially larger tumors.
The maximum tumor size recorded among those who died exceeds that of those who survived, reinforcing the relationship between tumor burden and survival outcome.
From a clinical standpoint, this visual provides supporting evidence for the notion that larger tumors are associated with poorer prognosis, a concept well-documented in breast cancer research literature. This pattern may be due to larger tumors being more likely to have metastasized or progressed to a more advanced stage.
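The visual reading of medians, IQRs, and outliers can be made numeric. The sketch below computes the quantities a box plot encodes for each survival group, using the same defaults as `ggplot2::geom_boxplot()`: quartiles from `quantile()` and whiskers reaching at most 1.5 times the IQR beyond the box, with points past that limit drawn as outliers. The toy tumor sizes and the `box_summary()` helper are hypothetical; on the real data, the `Tumor Size` column would be split by `Status` instead.

```r
# Quantities a box plot encodes, using ggplot2's default rule:
# quartiles from quantile() (type 7), outliers beyond 1.5 * IQR from the box
box_summary <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[3] + 1.5 * iqr
  c(median = q[2],
    iqr = iqr,
    n_outliers = sum(x < lower_fence | x > upper_fence),
    max = max(x))
}

# Hypothetical toy data; column names mirror the report's Status and `Tumor Size`
toy_size <- data.frame(
  Status = c(rep("Alive", 5), rep("Dead", 5)),
  `Tumor Size` = c(10, 15, 20, 25, 30, 20, 30, 40, 50, 140),
  check.names = FALSE
)

# One column per survival group, one row per statistic
box_table <- sapply(split(toy_size$`Tumor Size`, toy_size$Status), box_summary)
print(box_table)
```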
Graph 2: Histogram of Survival Months
ggplot(data, aes(x = `Survival Months`)) +
  geom_histogram(binwidth = 10, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Survival Months", x = "Survival Months", y = "Frequency")
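The histogram shows that most patients have survival times between about 45 and 105 months, where the tallest bars are concentrated. The distribution is left-skewed: the bulk of observations sits at the upper end, with a thinner tail toward short survival times, suggesting that long-term survival is common in this cohort. Because Survival Months also records follow-up time for patients who were still alive at last contact, part of this pattern may reflect the length of follow-up rather than survival alone.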
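The direction of skew can be checked numerically rather than by eye. In a left-skewed distribution the longer tail points toward short survival times, so the mean sits below the median and the moment-based skewness is negative. The sketch below computes both on a hypothetical vector of survival months; the `sample_skewness()` helper is illustrative, and on the real data it would be applied to the `Survival Months` column.

```r
# Moment-based sample skewness: negative values indicate a left (negative) skew
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Hypothetical survival months, bunched at the high end with a tail toward zero
toy_months <- c(5, 30, 60, 80, 90, 95, 100, 100, 105)

skew <- sample_skewness(toy_months)
mean_below_median <- mean(toy_months) < median(toy_months)
cat("skewness:", round(skew, 2), " mean < median:", mean_below_median, "\n")
```

Both checks should agree for a clearly skewed sample; when they disagree, the skew is usually too weak to describe confidently.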
Recommendations
Patients with positive estrogen and/or progesterone receptors showed the highest average survival (up to 72.59 months), consistent with the benefit of hormone-targeted therapies reported in the literature.
Conversely, patients lacking both receptors had poorer outcomes (average 58.93 months), suggesting a need for more intensive treatment strategies.
Additionally, lower tumor grades were linked to longer survival (72.94 months for Grade 1 versus 64.42 months for Grade IV, a group of only 19 patients). We recommend prioritizing early detection and receptor profiling to guide personalized treatment, supported by findings from the SEER dataset and the literature (Anderson et al., 2001).
References
Anderson, W.F., et al. (2001). Estrogen Receptor Status and Breast Cancer Prognosis. *Journal of the National Cancer Institute*, 93(2), 113–120.
Early Breast Cancer Trialists’ Collaborative Group (EBCTCG). (2011). Relevance of breast cancer hormone receptors and other factors to the efficacy of adjuvant tamoxifen: patient-level meta-analysis of randomised trials. *Lancet*, 378(9793), 771–784.
National Cancer Institute. (n.d.). Breast Cancer Treatment (PDQ®)–Health Professional Version. https://www.cancer.gov
National Cancer Institute. (n.d.). Hormone Receptors and Breast Cancer.
American Cancer Society. (2023). Breast Cancer Facts & Figures.
Elston, C.W., & Ellis, I.O. (1991). Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. *Histopathology*, 19(5), 403–410.
National Cancer Institute. (n.d.). Breast Cancer Prognostic Factors. https://www.cancer.gov
American Cancer Society. (2023). Understanding a Breast Cancer Diagnosis. https://www.cancer.org