Context Specific Metabolic Modeling and Analysis

Biostatistics Project

Karthik M Dani

Namyatha N Mulbagal

Building Context Specific Metabolic Models


Importing packages

run_model_reconstruction from gsmm.csm.build_csm: Runs the model reconstruction pipeline.
cobra: A package for constraint-based reconstruction and analysis (COBRApy).
pandas: Used to load and inspect the expression data.

from gsmm.csm.build_csm import run_model_reconstruction

import cobra
import pandas as pd

Load expression data

emt_expression_data_path: Path to the CSV file containing the EMT expression data.
emt_expression_data: DataFrame holding the loaded EMT expression data.

emt_expression_data_path = "../../Data/InputData/EMT_FINAL_DATA.csv"
emt_expression_data = pd.read_csv(emt_expression_data_path)
emt_expression_data.head()
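Before reconstruction, it can help to confirm that the expression table has the columns the later steps rely on: `Gene_ID`, `Epithelial`, and `Mesenchymal`. The following is a minimal sketch using only the standard library. The helper name `missing_columns` and the two-row sample table are hypothetical and only stand in for `EMT_FINAL_DATA.csv`.

```python
import csv
import io

# Columns used later in this notebook (gene_id_column and the two scores_column values).
REQUIRED_COLUMNS = ("Gene_ID", "Epithelial", "Mesenchymal")


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical sample standing in for the real expression file.
sample = io.StringIO(
    "Gene_ID,Epithelial,Mesenchymal\n"
    "gene_a,0.8,0.1\n"
    "gene_b,0.2,0.9\n"
)
print(missing_columns(sample))  # an empty list means every required column is present
```

With the real data, the same check could be run on `open(emt_expression_data_path, newline="")`.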

Load Recon3D model from the web

recon_model: The Recon3D model, loaded from a web source using COBRApy.
This step loads Recon3D, a comprehensive genome-scale reconstruction of human metabolism. It is a generic human model rather than a tissue-specific one, which is why it is widely used as a starting point for studying human metabolism.

Structure of a COBRApy SBML Model

What does each object contain in COBRApy?

In COBRApy, an SBML model is structured as follows:

Model: The core object representing the metabolic model. It contains several attributes and methods to manipulate and analyze the model.
- Attributes:
  - reactions: A list of Reaction objects representing the biochemical reactions in the model.
  - metabolites: A list of Metabolite objects representing the metabolites in the model.
  - genes: A list of Gene objects representing the genes associated with the reactions.
  - objective: The objective function of the model, typically used in flux balance analysis (FBA).
  - id: A unique identifier for the model.
  - name: A descriptive name for the model.
- Methods:
  - optimize(): Performs flux balance analysis to optimize the objective function.
  - summary(): Provides a summary of the model's objective and its uptake and secretion (exchange) fluxes.
Reaction: Represents a biochemical reaction in the model.
- Attributes:
  - id: A unique identifier for the reaction.
  - name: A descriptive name for the reaction.
  - metabolites: A dictionary of Metabolite objects and their stoichiometric coefficients in the reaction.
  - lower_bound: The lower bound of the reaction flux.
  - upper_bound: The upper bound of the reaction flux.
- Methods:
  - add_metabolites(metabolites): Adds metabolites to the reaction, given a dictionary of metabolites and their stoichiometric coefficients.
  - subtract_metabolites(metabolites): Subtracts metabolites from the reaction.
Metabolite: Represents a metabolite in the model.
- Attributes:
  - id: A unique identifier for the metabolite.
  - name: A descriptive name for the metabolite.
  - formula: The chemical formula of the metabolite.
  - compartment: The compartment where the metabolite is located.
Gene: Represents a gene in the model.
- Attributes:
  - id: A unique identifier for the gene.
  - name: A descriptive name for the gene.
  - reactions: The set of Reaction objects associated with the gene.
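
To make the relationship between these objects concrete, the sketch below mirrors the `Reaction.metabolites` mapping (metabolite to stoichiometric coefficient) with plain dictionaries and checks the steady-state mass balance that FBA imposes. It deliberately does not use COBRApy. The three-reaction toy network, its identifiers, and the helper `net_production` are hypothetical.

```python
# Each reaction maps metabolite ids to stoichiometric coefficients
# (negative = consumed, positive = produced), like Reaction.metabolites in COBRApy.
toy_reactions = {
    "EX_A": {"A_c": 1.0},                 # uptake: -> A
    "A_to_B": {"A_c": -1.0, "B_c": 1.0},  # conversion: A -> B
    "SINK_B": {"B_c": -1.0},              # drain: B ->
}


def net_production(reactions, fluxes):
    """Return the net production rate of every metabolite for the given fluxes."""
    balance = {}
    for reaction_id, stoichiometry in reactions.items():
        flux = fluxes.get(reaction_id, 0.0)  # reactions without a flux carry none
        for metabolite_id, coefficient in stoichiometry.items():
            balance[metabolite_id] = balance.get(metabolite_id, 0.0) + coefficient * flux
    return balance


# A flux distribution is at steady state when every net rate is zero (S v = 0).
steady = net_production(toy_reactions, {"EX_A": 2.0, "A_to_B": 2.0, "SINK_B": 2.0})
print(steady)
```

FBA searches, among all flux distributions that satisfy this balance and the reaction bounds, for one that optimizes the objective.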

Load `Recon3D` model as the parent model

recon_model = cobra.io.web.load_model(model_id="Recon3D")
recon_model

Save `Recon3D` as an `.xml` (SBML) file.

cobra.io.write_sbml_model(recon_model, "recon_model.xml")

The reconstruction algorithm in gsmm needs:

parent_model_path: Path to the parent model, Recon3D in this case.
base_model_path: Path to the base model, that is, the parent model with all unnecessary genes, reactions, and metabolites removed. In this example the same file, recon_model.xml, is used for both paths.
gene_id_column: Name of the column holding the gene IDs. Our expression data stores them in the column called Gene_ID, so we pass that name to the algorithm.

parent_model_path = "recon_model.xml"
base_model_path = "recon_model.xml"
gene_id_column = "Gene_ID"

Running Model Reconstruction to get Context-Specific Model

run_model_reconstruction(..) is the entry point to the main pipeline, which reconstructs an optimized metabolic model from the provided expression data. It handles exceptions and prints an error message if reconstruction fails, returning None in that case.

epithelial_csm = run_model_reconstruction(model_path=parent_model_path,
                                          base_model_path=base_model_path,
                                          data_path=emt_expression_data_path,
                                          gene_id_column=gene_id_column,
                                          scores_column="Epithelial",
                                          )
epithelial_csm

Similarly, we reconstruct the mesenchymal context-specific metabolic model.

mesenchymal_csm = run_model_reconstruction(model_path=parent_model_path,
                                           base_model_path=base_model_path,
                                           data_path=emt_expression_data_path,
                                           gene_id_column=gene_id_column,
                                           scores_column="Mesenchymal",
                                           )
mesenchymal_csm

Save the CSMs for further analysis and plots

cobra.io.write_sbml_model(epithelial_csm, "epithelial_csm.xml")
cobra.io.write_sbml_model(mesenchymal_csm, "mesenchymal_csm.xml")
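
Since `run_model_reconstruction` returns `None` when reconstruction fails, writing the result directly would fail later with a less informative error. A small guard such as the one below could be placed in front of the save step. `require_model` is a hypothetical helper, not part of gsmm or COBRApy.

```python
def require_model(model, label):
    """Return the model unchanged, or raise a clear error if reconstruction returned None."""
    if model is None:
        raise ValueError(f"Reconstruction of '{label}' failed; nothing to save.")
    return model


# Usage sketch (commented out because it needs the objects created above):
# cobra.io.write_sbml_model(require_model(epithelial_csm, "epithelial_csm"),
#                           "epithelial_csm.xml")
```

Failing early keeps the error next to the reconstruction call that caused it.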

Analysis of Context Specific Models


Define the model names with their associated paths in a `dict` called `model_paths`

model_paths = {
    "epithelial_csm": "epithelial_csm.xml",
    "mesenchymal_csm": "mesenchymal_csm.xml"
}

Import analysis module from `gsmm`

analyse_and_save_fluxes(..) runs the flux analysis for every model in model_paths and saves the resulting data so that the visualisation module can use it.

from gsmm.csm.analyse_csm import analyse_and_save_fluxes

analyse_and_save_fluxes(model_paths)

Get Flux related plots across Context Specific Models

from gsmm.csm.visualisation import plot_fluxes

plot_fluxes('flux_data.pkl',
            'sink_flux_data.pkl',
            True)

The call produces three plots:
Clustermap of all reactions, compared across the context-specific models specified in model_paths above.
Correlation coefficients between each pair of models when all common reactions are considered.
Correlation coefficients between each pair of models for the common sink reactions.

Clustermap of Reaction rates (Fluxes) in different Context Specific Models

Pearson Correlation coefficients of All the Reaction Fluxes compared across the models

Pearson Correlation coefficients of Sink Reaction fluxes; in this case the correlation is not significant

Similarly, two or more models can be compared to obtain the corresponding plots and look for significant observations.
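
For reference, a Pearson correlation restricted to the reactions shared by two models can be sketched as follows. This is an illustration under stated assumptions, not necessarily how gsmm computes the plotted values: fluxes are assumed to be available as dictionaries keyed by reaction id, and the function name `pearson_on_common` and all flux values are hypothetical.

```python
import math


def pearson_on_common(fluxes_a, fluxes_b):
    """Pearson correlation of two flux dicts over the reaction ids they share."""
    common = sorted(set(fluxes_a) & set(fluxes_b))
    if len(common) < 2:
        raise ValueError("Need at least two common reactions.")
    xs = [fluxes_a[r] for r in common]
    ys = [fluxes_b[r] for r in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("Correlation is undefined for constant fluxes.")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical flux values; R4 and R5 are ignored because they are not shared.
epithelial = {"R1": 1.0, "R2": 2.0, "R3": 3.0, "R4": 10.0}
mesenchymal = {"R1": 2.0, "R2": 4.0, "R3": 6.0, "R5": -5.0}
print(pearson_on_common(epithelial, mesenchymal))
```

Restricting the comparison to common reactions matters because each context-specific model keeps a different subset of the parent model's reactions.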