DATA1001 PROJECT 2 EDA

Author

560646469

Visualisations

library(tidyverse)
library(plotly)

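# Load the melanoma dataset
melanomadata <- read_csv("melanoma.csv")
# Rename column 18 by position; this relies on the column order in melanoma.csv staying fixed
colnames(melanomadata)[18] <- "MELANOMA_DEATH"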

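# Keep the four mutation subtypes of interest and drop rows where age is recorded as "-"
melanomadata_clean <- melanomadata %>%
  filter(
    MUTATIONSUBTYPES %in% c("BRAF_Hotspot_Mutants", "RAS_Hotspot_Mutants",
                             "NF1_Any_Mutants", "Triple_WT"),
    CURATED_AGE_AT_INITIAL_PATHOLOGIC_DIAGNOSIS != "-"
  ) %>%
  mutate(
    # Convert age from text to numeric and shorten the subtype labels for plotting
    Age = as.numeric(CURATED_AGE_AT_INITIAL_PATHOLOGIC_DIAGNOSIS),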
    Mutation = recode(MUTATIONSUBTYPES,
      "BRAF_Hotspot_Mutants" = "BRAF",
      "RAS_Hotspot_Mutants"  = "RAS",
      "NF1_Any_Mutants"      = "NF1",
      "Triple_WT"            = "Triple WT"
    ),
    Mutation = factor(Mutation, levels = c("BRAF", "RAS", "NF1", "Triple WT"))
  )

p1 <- ggplot(melanomadata_clean %>% filter(!is.na(Age)), aes(x = Mutation, y = Age, fill = Mutation)) +
  geom_boxplot(outlier.shape = 21, outlier.size = 2, width = 0.5, alpha = 0.8) +
  scale_fill_manual(values = c(
    "BRAF"      = "blue",
    "RAS"       = "orange",
    "NF1"       = "green",
    "Triple WT" = "purple"
  )) +
  labs(
    title = "Figure 1: Age at Diagnosis by Mutation Subtype",
    x = "Mutation Subtype",
    y = "Age at Diagnosis"
  ) +
  theme_bw() +
  theme(legend.position = "none")
ggplotly(p1)
p2 <- ggplot(melanomadata_clean %>% filter(!is.na(Age)), aes(x = Mutation, y = Age, colour = Mutation)) +
  geom_jitter(width = 0.2, alpha = 0.4, size = 1.8) +
  stat_summary(fun = mean, geom = "crossbar", width = 0.4,
               colour = "black", linewidth = 0.6) +
  scale_colour_manual(values = c(
    "BRAF"      = "blue",
    "RAS"       = "orange",
    "NF1"       = "green",
    "Triple WT" = "purple"
  )) +
  labs(
    title = "Figure 2: Age at Diagnosis by Mutation Subtype (Individual Patients)",
    x = "Mutation Subtype",
    y = "Age at Diagnosis"
  ) +
  theme_bw() +
  theme(legend.position = "none")
ggplotly(p2)

Discussion

The data used in this report come from The Cancer Genome Atlas (TCGA), a large cancer genomics programme run jointly by the National Cancer Institute and the National Human Genome Research Institute in the United States. The dataset covers 333 patients diagnosed with skin melanoma and includes information such as their age at diagnosis and the genetic mutation subtype of their tumour. Figure 1 shows that patients with BRAF mutations tended to be diagnosed at a younger age than the other groups, with most cases appearing below 60. In contrast, NF1 and Triple WT patients were generally older at diagnosis, and RAS patients fell in between. Figure 2 shows the same data as individual patient points, with a black bar marking each group's mean, which makes the spread of ages within each group easier to see. BRAF patients are noticeably clustered at younger ages, while NF1 and Triple WT patients are more scattered across older ages. This suggests that tumours of different mutation subtypes may tend to be diagnosed at different stages of life.
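
The visual comparison above could be backed with a numeric summary per subtype. The following is a minimal sketch, not part of the original analysis: the helper name `summarise_age`, its `cutoff` argument, and the toy data frame are all hypothetical, and the toy values are made up purely to show the shape of the output. It assumes a data frame with `Mutation` and `Age` columns like `melanomadata_clean`.

```r
library(dplyr)

# Hypothetical helper: summarise age at diagnosis within each mutation subtype.
# `cutoff` is the age threshold used for the "diagnosed below" proportion.
summarise_age <- function(data, cutoff = 60) {
  data %>%
    filter(!is.na(Age)) %>%          # drop patients with no usable age
    group_by(Mutation) %>%
    summarise(
      n                 = n(),               # patients in the subtype
      median_age        = median(Age),       # centre line of the boxplot
      iqr_age           = IQR(Age),          # height of the box
      prop_below_cutoff = mean(Age < cutoff),
      .groups           = "drop"
    )
}

# Made-up toy data (NOT the TCGA values), used only to demonstrate the helper.
toy <- data.frame(
  Mutation = factor(c("BRAF", "BRAF", "BRAF", "NF1", "NF1", "NF1"),
                    levels = c("BRAF", "RAS", "NF1", "Triple WT")),
  Age      = c(40, 50, 70, 60, 70, NA)
)

toy_summary <- summarise_age(toy)
print(toy_summary)
```

Calling `summarise_age(melanomadata_clean)` would give one row per subtype present in the report's data, so a statement such as "most cases appearing below 60" can be checked against `prop_below_cutoff` rather than read off the plot.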

Acknowledgements

AI Usage

Tool: Claude (Sonnet 4.5)
Publisher: Anthropic
URL: https://claude.ai
Context of Use: Used to assist with debugging R code for data visualisations.