This report analyzes the 488T>C variant in MT-RNR2/M-SHLP2 (c.11A>G, p.Lys4Arg; m.2158T>C in mtDNA coordinates) in haplogroups J, U and A, using data from mutect2.haplogroup1.tab and mutect2.03.merge.vcf.

UKBiobank

Data processing

First import the data, then merge the haplogroup information with the genetic variant data. Extract the m.2158T>C (488T>C) mutation status for all samples.

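# Load packages (readr, dplyr and stringr are attached by the tidyverse)
library(tidyverse)
library(vcfR)

# Load clean haplogroup data
haplo_clean <- read_tsv("data/CRPS_summary/mutect2.haplogroup1.tab")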

# Load VCF data
vcf <- read.vcfR("data/CRPS_summary/mutect2.03.merge.vcf")
## Scanning file to determine attributes.
## File attributes:
##   meta lines: 47
##   header_line: 48
##   variant count: 3645
##   column count: 2506
## Meta line 47 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
##   Character matrix gt rows: 3645
##   Character matrix gt cols: 2506
##   skip: 0
##   nrows: 3645
##   row_num: 0
## Processed variant 1000
## Processed variant 2000
## Processed variant 3000
## Processed variant: 3645
## All variants processed

# Clean sample IDs
haplo_clean <- haplo_clean %>%
  mutate(IID_clean = str_split_i(Run, "_", 1))

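# Extract 488T>C mutation (more frequently referred to as m.2158T>C)
m488_pos <- "2158"
m488_index <- which(vcf@fix[, "POS"] == m488_pos)
# Expect exactly one VCF record at this position; otherwise gt[m488_index, ]
# would not be a single named genotype vector
stopifnot(length(m488_index) == 1)
gt <- extract.gt(vcf, element = "GT")
m488_gt <- gt[m488_index, ]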

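# Code a sample as 1 if its GT string contains allele 1, else 0.
# Note: grepl() returns FALSE for NA, so missing calls are also coded 0.
mutation_data <- data.frame(
  IID_vcf = names(m488_gt),
  m488_mutation = ifelse(grepl("1", m488_gt), 1, 0)
) %>%
  mutate(IID_clean = str_split_i(IID_vcf, "_", 1))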

# Merge data
final_data <- haplo_clean %>%
  left_join(mutation_data, by = "IID_clean") %>%
  filter(!is.na(m488_mutation))
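
The genotype coding above relies on `grepl("1", ...)`, which returns FALSE for missing calls, so a sample without a genotype at this position is counted as a non-carrier. The following is a minimal sketch of an alternative coding that keeps missing calls as NA. The helper `code_carrier` and the toy genotypes are hypothetical and not part of the original pipeline, and the strings treated as missing (`.`, `./.`, `.|.`) are assumptions about how the merged VCF encodes them.

```r
# Hypothetical helper (not from the original report): code GT strings as
# 1 = carries allele 1, 0 = called without allele 1, NA = missing call.
code_carrier <- function(gt) {
  nm <- names(gt)
  gt <- as.character(gt)
  # Assumed encodings of a missing genotype
  is_missing <- is.na(gt) | gt %in% c(".", "./.", ".|.")
  # Split on the VCF separators "/" and "|" and look for an exact "1" allele,
  # so that a higher allele index such as 10 is not mistaken for allele 1
  has_alt1 <- vapply(
    strsplit(gt, "[/|]"),
    function(alleles) any(alleles == "1"),
    logical(1)
  )
  out <- ifelse(is_missing, NA_integer_, as.integer(has_alt1))
  names(out) <- nm
  out
}

# Toy genotypes invented for illustration only
toy_gt <- c(S1 = "0/1", S2 = "0/0", S3 = NA, S4 = "1",
            S5 = "0|1", S6 = "./.", S7 = "0/2")
toy_status <- code_carrier(toy_gt)
print(toy_status)
```

With this coding, the later `filter(!is.na(m488_mutation))` step would drop samples with missing calls as well as samples absent from the VCF, which changes the denominator of every frequency. Whether that is preferable depends on how missing calls should be interpreted in this dataset.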

Summarize the merged data:

total_samples <- nrow(final_data)
mutated_samples <- sum(final_data$m488_mutation)
mutation_rate <- mutated_samples / total_samples * 100

cat("**Total samples analyzed:**", total_samples, "\n\n")
## **Total samples analyzed:** 2495
cat("**Samples with m.488 mutation:**", mutated_samples, "\n\n") 
## **Samples with m.488 mutation:** 36
cat("**Overall mutation frequency:**", round(mutation_rate, 2), "%\n")
## **Overall mutation frequency:** 1.44 %
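
A frequency based on 36 carriers is easier to interpret with an interval around it. As a sketch that is not part of the original analysis, an exact (Clopper-Pearson) 95% confidence interval can be computed from the counts reported above with base R's `binom.test`. The counts are typed in directly so the example runs without the input files.

```r
# Counts taken from the summary output above
mutated_samples <- 36
total_samples <- 2495

# binom.test() returns the exact binomial (Clopper-Pearson) interval in $conf.int
bt <- binom.test(mutated_samples, total_samples)
ci_percent <- bt$conf.int * 100

cat("Frequency:", round(mutated_samples / total_samples * 100, 2), "%\n")
cat("95% CI:", round(ci_percent[1], 2), "to", round(ci_percent[2], 2), "%\n")
```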

Calculate mutation frequencies across haplogroups:

haplo_summary <- final_data %>%
  group_by(haplogroup) %>%
  summarise(
    n_samples = n(),
    n_mutated = sum(m488_mutation),
    mutation_freq = n_mutated / n_samples * 100
  ) %>%
  arrange(desc(mutation_freq))

# Keep only haplogroups with at least one mutated sample
haplo_summary_mutated <- haplo_summary %>%
  filter(n_mutated > 0)

# Display the full haplogroup summary
haplo_summary %>%
  knitr::kable(
    col.names = c("Haplogroup", "Total Samples", "Mutated Samples", "Mutation Frequency (%)"),
    digits = 2
  )
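| Haplogroup | Total Samples | Mutated Samples | Mutation Frequency (%) |
|:-----------|--------------:|----------------:|-----------------------:|
| J          |           261 |              36 |                  13.79 |
| B          |             1 |               0 |                   0.00 |
| C          |             1 |               0 |                   0.00 |
| D          |             1 |               0 |                   0.00 |
| F          |             1 |               0 |                   0.00 |
| H          |          1097 |               0 |                   0.00 |
| HV         |            57 |               0 |                   0.00 |
| I          |            87 |               0 |                   0.00 |
| K          |           202 |               0 |                   0.00 |
| L1         |             3 |               0 |                   0.00 |
| L2         |             2 |               0 |                   0.00 |
| L3         |             4 |               0 |                   0.00 |
| M          |             3 |               0 |                   0.00 |
| N          |             6 |               0 |                   0.00 |
| R          |             9 |               0 |                   0.00 |
| T          |           267 |               0 |                   0.00 |
| U          |           347 |               0 |                   0.00 |
| V          |            77 |               0 |                   0.00 |
| W          |            42 |               0 |                   0.00 |
| X          |            27 |               0 |                   0.00 |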

Specific Enrichment in J Haplogroup

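The m.488 mutation (m.2158T>C) shows a high frequency (13.79%, 36 of 261 samples) within the J haplogroup.

This represents a strong enrichment compared to other haplogroups: all 36 carriers belong to J, and none of the 2,234 samples in the remaining haplogroups carry the mutation.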
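
The enrichment can also be checked formally. The sketch below is not part of the original analysis: it applies Fisher's exact test to a 2x2 table of haplogroup (J vs. all others) by carrier status, using the counts from the summary table above, and it assumes the samples can be treated as independent observations.

```r
# 2x2 table from the haplogroup summary:
# rows = haplogroup J vs. all other haplogroups,
# columns = carriers vs. non-carriers of the mutation
enrich_tab <- matrix(
  c(36, 261 - 36,    # J: 36 carriers, 225 non-carriers
     0, 2495 - 261), # non-J: 0 carriers, 2234 non-carriers
  nrow = 2, byrow = TRUE,
  dimnames = list(
    haplogroup = c("J", "non-J"),
    status = c("carrier", "non_carrier")
  )
)

# Fisher's exact test is used because one cell count is zero
ft <- fisher.test(enrich_tab)

print(enrich_tab)
print(ft$p.value)
```

Because no carriers are observed outside J, the estimated odds ratio is unbounded, so the p-value and the raw counts are more informative here than the point estimate.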