Introduction

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a widely used database of biological pathways. This document shows how to extract gene names and their descriptions from a KEGG pathway programmatically, using R and the KEGGREST package.

Prerequisites

Install the required package from Bioconductor if you haven’t already.
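
Bioconductor packages are usually installed through BiocManager. A minimal sketch, assuming a recent R installation with internet access (the check for BiocManager is a convenience added here, not part of the original document):

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install KEGGREST from Bioconductor
BiocManager::install("KEGGREST")
```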

Load the Library

library(KEGGREST)

Define the Extraction Function

The function below fetches the pathway information using a KEGG pathway ID and extracts gene names with their full descriptions.

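extract_gene_info <- function(kegg_id) {
  # Fetch the pathway record; keggGet() returns a list with one entry per ID
  pathway_data <- keggGet(kegg_id)
  # GENE alternates between gene IDs and "SYMBOL; description" strings
  gene_info <- pathway_data[[1]]$GENE
  if (is.null(gene_info)) {
    stop("No GENE field found for ", kegg_id)
  }
  # Keep the even positions, which hold the "SYMBOL; description" strings
  gene_names <- gene_info[seq(2, length(gene_info), by = 2)]
  # Split each entry into the gene symbol and its description
  split_gene_names <- strsplit(gene_names, "; ")
  gene_table <- data.frame(
    GeneName = sapply(split_gene_names, `[`, 1),
    FullDescription = sapply(split_gene_names, `[`, 2),
    stringsAsFactors = FALSE
  )
  return(gene_table)
}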
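
To see what the indexing and splitting steps do without a network call, the sketch below applies the same logic to a small hand-written vector. `mock_gene` is a hypothetical stand-in for the GENE field. It assumes the field alternates between gene identifiers and "SYMBOL; description" strings, as the function implies. The IDs are placeholders, and the symbols and descriptions are copied from the example output in this document.

```r
# Hypothetical stand-in for pathway_data[[1]]$GENE (IDs are placeholders)
mock_gene <- c(
  "id_1", "POLD3; DNA polymerase delta 3, accessory subunit [KO:K03504]",
  "id_2", "MLH3; mutL homolog 3 [KO:K08739]"
)

# Even positions hold the "SYMBOL; description" strings
entries <- mock_gene[seq(2, length(mock_gene), by = 2)]

# Split each entry on "; " into symbol and description
parts <- strsplit(entries, "; ")

mock_table <- data.frame(
  GeneName = sapply(parts, `[`, 1),
  FullDescription = sapply(parts, `[`, 2),
  stringsAsFactors = FALSE
)
print(mock_table)
```

Note that only the second piece of each split is kept, so a description that itself contained "; " would be truncated at that point.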

Example: Extracting Genes from a Human Pathway

Let’s extract gene information from the Mismatch Repair pathway in humans (hsa03430):

gene_table <- extract_gene_info("hsa03430")
head(gene_table, 10)
##    GeneName                                              FullDescription
## 1     POLD3        DNA polymerase delta 3, accessory subunit [KO:K03504]
## 2      MLH3                                   mutL homolog 3 [KO:K08739]
## 3      MSH6                                   mutS homolog 6 [KO:K08737]
## 4      RPA4                           replication protein A4 [KO:K10741]
## 5      LIG1        DNA ligase 1 [KO:K10747] [EC:6.5.1.1 6.5.1.6 6.5.1.7]
## 6      MLH1                                   mutL homolog 1 [KO:K08734]
## 7      MSH2                                   mutS homolog 2 [KO:K08735]
## 8      MSH3                                   mutS homolog 3 [KO:K08736]
## 9      PCNA               proliferating cell nuclear antigen [KO:K04802]
## 10     PMS2 PMS1 homolog 2, mismatch repair system component [KO:K10858]

This approach enables you to fetch and organize gene information from KEGG pathways in a tidy format, suitable for reporting and further analyses.
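
As one example of further analysis, the KO identifiers embedded in the descriptions can be pulled into their own column. The following is a minimal base R sketch, assuming every description carries a tag of the form `[KO:K#####]` as in the output above. The `demo` data frame is a hand-copied subset of that output so the snippet runs offline, and the column name `KO` is an illustrative choice.

```r
# Small offline subset of the table shown above
demo <- data.frame(
  GeneName = c("POLD3", "LIG1"),
  FullDescription = c(
    "DNA polymerase delta 3, accessory subunit [KO:K03504]",
    "DNA ligase 1 [KO:K10747] [EC:6.5.1.1 6.5.1.6 6.5.1.7]"
  ),
  stringsAsFactors = FALSE
)

# Find the first "KO:K" followed by five digits in each description
ko_match <- regmatches(
  demo$FullDescription,
  regexpr("KO:K[0-9]{5}", demo$FullDescription)
)

# Drop the "KO:" prefix to keep only the identifier
demo$KO <- sub("^KO:", "", ko_match)
print(demo[, c("GeneName", "KO")])
```

Because `regmatches()` silently drops entries without a match, this assignment would fail on a table where some descriptions lack a KO tag; such rows would need to be handled separately.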

Feel free to adapt this document for your own analyses!