This report analyzes the 488T>C change in MT-RNR2 (m.2158T>C; MT-SHLP2 c.11A>G, p.Lys4Arg) across mitochondrial haplogroups, with particular attention to haplogroups J, U and A, using data from mutect2.haplogroup1.tab and mutect2.03.merge.vcf.
First, import the data. Then extract the m.2158T>C genotype for every sample and merge it with the haplogroup assignments.
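# Load packages used in this report
library(readr)   # read_tsv()
library(vcfR)    # read.vcfR(), extract.gt()
library(dplyr)   # %>%, mutate(), left_join(), filter(), group_by(), summarise(), arrange()
library(stringr) # str_split_i() (requires stringr >= 1.5.0)
# Load clean haplogroup data
haplo_clean <- read_tsv("data/CRPS_summary/mutect2.haplogroup1.tab")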
# Load VCF data
vcf <- read.vcfR("data/CRPS_summary/mutect2.03.merge.vcf")
## Scanning file to determine attributes.
## File attributes:
## meta lines: 47
## header_line: 48
## variant count: 3645
## column count: 2506
# Clean sample IDs: keep the part of the run name before the first underscore
haplo_clean <- haplo_clean %>%
  mutate(IID_clean = str_split_i(Run, "_", 1))
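# Extract the 488T>C variant in MT-RNR2 (more commonly referred to as m.2158T>C)
m488_pos <- "2158"  # vcfR stores POS as character, so compare as a string
m488_index <- which(vcf@fix[, "POS"] == m488_pos)
stopifnot(length(m488_index) == 1)  # expect exactly one VCF record at this position
gt <- extract.gt(vcf, element = "GT")
m488_gt <- gt[m488_index, ]
# Code any genotype string containing "1" as a carrier (1), otherwise 0.
# Note: missing calls (NA or "./.") are coded 0 here, not NA.
mutation_data <- data.frame(
  IID_vcf = names(m488_gt),
  m488_mutation = ifelse(grepl("1", m488_gt), 1, 0)
) %>%
  mutate(IID_clean = str_split_i(IID_vcf, "_", 1))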
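# Merge data: keep haplogroup rows that have a matching VCF sample
# (left join plus NA filter, equivalent here to an inner join)
final_data <- haplo_clean %>%
  left_join(mutation_data, by = "IID_clean") %>%
  filter(!is.na(m488_mutation))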
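The `grepl("1", ...)` rule above counts any genotype string containing the character "1" as a carrier, so a missing call becomes 0 and a multi-digit allele index such as `0/10` becomes 1. A minimal sketch of a stricter coder follows, assuming GT strings in the usual VCF form (alleles separated by `/` or `|`, `.` for missing). `carries_allele()` and `has_duplicates()` are hypothetical helpers written for this illustration, not vcfR functions, and the example genotypes and IDs are made up.

```r
# Hypothetical helper (not part of vcfR): return 1 if the genotype string
# contains the given allele index, 0 if it does not, and NA for missing calls.
carries_allele <- function(gt, allele = "1") {
  calls <- strsplit(as.character(gt), "[/|]")
  vapply(calls, function(alleles) {
    # NA input, "." or "./." are treated as missing
    if (length(alleles) == 0 || anyNA(alleles) || all(alleles == ".")) {
      return(NA_real_)
    }
    as.numeric(allele %in% alleles)
  }, numeric(1))
}

# Illustrative genotype strings (not taken from the VCF)
example_gt <- c("0/0", "0/1", "1|1", "0/10", "./.", NA)
strict <- carries_allele(example_gt)            # 0 1 1 0 NA NA
naive  <- ifelse(grepl("1", example_gt), 1, 0)  # 0 1 1 1 0 0

# Hypothetical check: cleaned IDs should be unique on both sides of the join,
# otherwise left_join() silently duplicates rows.
has_duplicates <- function(ids) anyDuplicated(ids) > 0
has_duplicates(c("S001", "S002", "S001"))       # TRUE
```

If adopted, `m488_mutation = carries_allele(m488_gt)` would replace the `ifelse()` call. Missing calls would then be NA and dropped by the existing `filter(!is.na(m488_mutation))` step, so the reported totals could change.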
Summarise the merged data:
total_samples <- nrow(final_data)
mutated_samples <- sum(final_data$m488_mutation)
mutation_rate <- mutated_samples / total_samples * 100
cat("**Total samples analyzed:**", total_samples, "\n\n")
## **Total samples analyzed:** 2495
cat("**Samples with m.2158T>C:**", mutated_samples, "\n\n")
## **Samples with m.2158T>C:** 36
cat("**Overall mutation frequency:**", round(mutation_rate, 2), "%\n")
## **Overall mutation frequency:** 1.44 %
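The overall frequency is a simple proportion, 36 / 2495 × 100 ≈ 1.44%. As a sketch, an exact (Clopper-Pearson) 95% confidence interval from base R's `binom.test()` can be attached to it; the interval is an addition for illustration and is not part of the original report.

```r
# Counts reported above
total_samples   <- 2495
mutated_samples <- 36

# Overall carrier frequency in percent
mutation_rate <- mutated_samples / total_samples * 100
round(mutation_rate, 2)  # 1.44

# Exact (Clopper-Pearson) 95% confidence interval, converted to percent
ci <- binom.test(mutated_samples, total_samples)$conf.int * 100
round(ci, 2)
```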
Calculate mutation frequencies across haplogroups:
haplo_summary <- final_data %>%
  group_by(haplogroup) %>%
  summarise(
    n_samples = n(),
    n_mutated = sum(m488_mutation),
    mutation_freq = n_mutated / n_samples * 100
  ) %>%
  arrange(desc(mutation_freq))
# Haplogroups with at least one carrier (kept for reference; not printed here)
haplo_summary_mutated <- haplo_summary %>%
  filter(n_mutated > 0)
# Display the full per-haplogroup summary
haplo_summary %>%
  knitr::kable(
    col.names = c("Haplogroup", "Total Samples", "Mutated Samples", "Mutation Frequency (%)"),
    digits = 2
  )
| Haplogroup | Total Samples | Mutated Samples | Mutation Frequency (%) |
|---|---|---|---|
| J | 261 | 36 | 13.79 |
| B | 1 | 0 | 0.00 |
| C | 1 | 0 | 0.00 |
| D | 1 | 0 | 0.00 |
| F | 1 | 0 | 0.00 |
| H | 1097 | 0 | 0.00 |
| HV | 57 | 0 | 0.00 |
| I | 87 | 0 | 0.00 |
| K | 202 | 0 | 0.00 |
| L1 | 3 | 0 | 0.00 |
| L2 | 2 | 0 | 0.00 |
| L3 | 4 | 0 | 0.00 |
| M | 3 | 0 | 0.00 |
| N | 6 | 0 | 0.00 |
| R | 9 | 0 | 0.00 |
| T | 267 | 0 | 0.00 |
| U | 347 | 0 | 0.00 |
| V | 77 | 0 | 0.00 |
| W | 42 | 0 | 0.00 |
| X | 27 | 0 | 0.00 |
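All 36 carriers of m.2158T>C belong to haplogroup J, where the variant occurs in 36 of 261 samples (13.79%). No carriers were observed in any other haplogroup, including U (0 of 347), and haplogroup A does not appear among the merged samples.
This points to strong enrichment of the variant in haplogroup J relative to the other haplogroups. The report does not include a formal significance test, however, and several haplogroups are represented by fewer than ten samples, so their 0% frequencies carry little information.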
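The enrichment can be quantified from the counts in the table. The following is a minimal sketch, assuming all non-J haplogroups are pooled into a single comparison group and using Fisher's exact test on the resulting 2×2 table. This is one reasonable choice, not the report's own method. The counts are copied from the table, and the 10-sample cut-off for "small" haplogroups is an arbitrary illustrative threshold.

```r
# Per-haplogroup counts copied from the summary table
counts <- data.frame(
  haplogroup = c("J", "B", "C", "D", "F", "H", "HV", "I", "K", "L1", "L2",
                 "L3", "M", "N", "R", "T", "U", "V", "W", "X"),
  n_samples  = c(261, 1, 1, 1, 1, 1097, 57, 87, 202, 3, 2,
                 4, 3, 6, 9, 267, 347, 77, 42, 27),
  n_mutated  = c(36, rep(0, 19))
)
counts$mutation_freq <- counts$n_mutated / counts$n_samples * 100

# Haplogroups with fewer than 10 samples (illustrative threshold)
small_groups <- counts$haplogroup[counts$n_samples < 10]

# 2x2 table: carrier status in J versus all other haplogroups pooled
is_J <- counts$haplogroup == "J"
tab <- matrix(
  c(sum(counts$n_mutated[is_J]),
    sum(counts$n_samples[is_J] - counts$n_mutated[is_J]),
    sum(counts$n_mutated[!is_J]),
    sum(counts$n_samples[!is_J] - counts$n_mutated[!is_J])),
  nrow = 2,
  dimnames = list(status = c("carrier", "non-carrier"), group = c("J", "other"))
)

# Fisher's exact test; with zero carriers outside J the conditional
# odds-ratio estimate is infinite, so the p-value is the informative output
ft <- fisher.test(tab)
ft$p.value
```

Pooling treats the non-J haplogroups as one population. A per-haplogroup comparison, or a test restricted to well-sampled haplogroups, would be alternative designs.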