1. Extract and summarize MoA (mechanism of action): Use an LLM to summarize the MoA of each drug into a concise phrase.
2. Classify MoA: Group the summarized MoAs into categories based on their similarities.
3. Calculate similarities: Generate a similarity matrix or scores between drugs based on their MoA descriptions using embeddings from an LLM.
Summarize MoA with an LLM
# Example drug descriptions
drug_descriptions = {
    "Acetazolamide": """
The anticonvulsant activity of Acetazolamide may depend on a direct inhibition of carbonic anhydrase in the CNS, which decreases carbon dioxide tension in the pulmonary alveoli, thus increasing arterial oxygen tension. The diuretic effect depends on the inhibition of carbonic anhydrase, causing a reduction in the availability of hydrogen ions for active transport in the renal tubule lumen. This leads to alkaline urine and an increase in the excretion of bicarbonate, sodium, potassium, and water.
""",
    "Acarbose": """
Alpha-glucosidase enzymes are located in the brush-border of the intestinal mucosa and serve to metabolize oligo-, tri-, and disaccharides (e.g. sucrose) into smaller monosaccharides (e.g. glucose, fructose) which are more readily absorbed.4 These work in conjunction with pancreatic alpha-amylase, an enzyme found in the intestinal lumen that hydrolyzes complex starches to oligosaccharides.7
Acarbose is a complex oligosaccharide that competitively and reversibly inhibits both pancreatic alpha-amylase and membrane-bound alpha-glucosidases - of the alpha-glucosidases, inhibitory potency appears to follow a rank order of glucoamylase > sucrase > maltase > isomaltase.7 By preventing the metabolism and subsequent absorption of dietary carbohydrates, acarbose reduces postprandial blood glucose and insulin levels.
"""
}
# Function to summarize MoA using an LLM served by a local Ollama instance
import requests

def summarize_moa(drug_name, description):
    ollama_url = "http://localhost:11434/api/generate"
    prompt = f"Summarize the mechanism of action of the drug {drug_name} in a single phrase:\n\n{description}\n\nSummary:"
    payload = {
        "model": "llama2",
        "prompt": prompt,
        "stream": False  # return one JSON object instead of a token stream
    }
    response = requests.post(ollama_url, json=payload)
    response.raise_for_status()  # fail loudly if the Ollama server returns an error
    summary = response.json()['response'].strip()
    return summary
summarized_moas = {drug: summarize_moa(drug, desc) for drug, desc in drug_descriptions.items()}
print(summarized_moas)
{'Acetazolamide': "Acetazolamide's mechanism of action involves inhibiting carbonic anhydrase in the CNS to decrease carbon dioxide tension in the lungs, leading to increased arterial oxygen tension, as well as inhibiting carbonic anhydrase in the kidneys to reduce hydrogen ion availability for active transport, resulting in alkaline urine and increased excretion of bicarbonate, sodium, potassium, and water.", 'Acarbose': 'Acarbose inhibits both pancreatic alpha-amylase and membrane-bound alpha-glucosidases, leading to reduced absorption of dietary carbohydrates and decreased postprandial blood glucose and insulin levels.'}
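The output above shows that the model returned full sentences rather than the short phrase the prompt asked for. One possible way to tighten this, sketched here under assumptions, is to state a word limit in the prompt and check the reply against it. The helper names `build_prompt` and `is_concise` and the 10-word limit are hypothetical choices for illustration, not part of the original workflow.

```python
def build_prompt(drug_name, description, max_words=10):
    """Build a summarization prompt with an explicit word limit (hypothetical helper)."""
    return (
        f"Summarize the mechanism of action of the drug {drug_name} "
        f"in a single phrase of at most {max_words} words. "
        "Reply with the phrase only.\n\n"
        f"{description.strip()}\n\nSummary:"
    )


def is_concise(summary, max_words=10):
    """Return True if the summary has at most max_words whitespace-separated words."""
    return len(summary.split()) <= max_words


# The Acarbose summary printed above, used as a check input
acarbose_output = (
    "Acarbose inhibits both pancreatic alpha-amylase and membrane-bound "
    "alpha-glucosidases, leading to reduced absorption of dietary carbohydrates "
    "and decreased postprandial blood glucose and insulin levels."
)

print(is_concise(acarbose_output))                # False: 23 words
print(is_concise("Inhibits carbonic anhydrase"))  # True: 3 words
```

A summary that fails the check could be sent back to the model with the stricter prompt. Whether llama2 respects the limit is not guaranteed.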
Calculate similarities
Use embeddings from an LLM to calculate similarities between the MoA descriptions.

mxbai-embed-large

As of March 2024, this model achieves SOTA performance for BERT-large sized models on the MTEB. It outperforms commercial models like OpenAI's text-embedding-3-large model and matches the performance of models 20x its size. mxbai-embed-large was trained with no overlap of the MTEB data, which indicates that the model generalizes well across several domains, tasks and text lengths.

Usage (from https://ollama.com/library/mxbai-embed-large):

REST API:
curl http://localhost:11434/api/embeddings -d '{
  "model": "mxbai-embed-large",
  "prompt": "Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering"
}'

Python library:
ollama.embeddings(model='mxbai-embed-large', prompt='Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering')
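The similarity scores in this section are cosine similarities, computed further down with scikit-learn's `cosine_similarity`. As a rough sketch of what that score measures, the snippet below computes it by hand for two toy vectors. The vectors are made-up illustrations, not real embeddings, and the function name `cosine` is only for this example.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


u = [1.0, 0.0]  # toy vector
v = [1.0, 1.0]  # toy vector at 45 degrees to u

print(round(cosine(u, v), 4))  # 0.7071
print(cosine(u, u))            # 1.0
```

A score of 1 means the two vectors point in the same direction, and a score near 0 means they are unrelated. Vector length does not affect the result.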
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
# use mxbai-embed-large model
import ollama
# test_embed = ollama.embeddings(model='mxbai-embed-large', prompt='Represent this sentence for searching relevant passages: The sky is blue because of Rayleigh scattering')
# test_embed['embedding']
def get_embedding(text):
    # Embed with the retrieval prefix shown in the model's usage example
    response = ollama.embeddings(model='mxbai-embed-large', prompt=f'Represent this sentence for searching relevant passages: {text}')
    return response['embedding']

def get_embedding_clear(text):
    # Embed the raw text with no prefix
    response = ollama.embeddings(model='mxbai-embed-large', prompt=text)
    return response['embedding']

# Quick check: embed the same text with and without the retrieval prefix
a = get_embedding("a good idea")
b = get_embedding_clear("a good idea")
embeddings = {drug: get_embedding_clear(summary) for drug, summary in summarized_moas.items()}
# Create similarity matrix
drug_names = list(embeddings.keys())
embedding_matrix = np.array([embeddings[drug] for drug in drug_names])
similarity_matrix = cosine_similarity(embedding_matrix)
# Convert the similarity matrix into a more readable format
import pandas as pd
similarity_df = pd.DataFrame(similarity_matrix, index=drug_names, columns=drug_names)
print("Similarity Matrix:")
print(similarity_df)
Similarity Matrix:
               Acetazolamide  Acarbose
Acetazolamide       1.000000  0.633546
Acarbose            0.633546  1.000000
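The overview lists a classification step that groups drugs with similar MoAs, which the cells above do not show. One simple way it might be sketched, assuming a plain threshold rule, is to link any two drugs whose similarity meets a cutoff and read off the connected groups. The function name `group_by_similarity` and the cutoffs 0.8 and 0.6 are illustrative assumptions. The matrix values are the ones printed above.

```python
def group_by_similarity(names, sim, threshold):
    """Group names into connected components of pairs with sim >= threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the group representative, compressing the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sim[i][j] >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


names = ["Acetazolamide", "Acarbose"]
sim = [[1.0, 0.633546], [0.633546, 1.0]]  # values from the matrix above

print(group_by_similarity(names, sim, threshold=0.8))  # [['Acetazolamide'], ['Acarbose']]
print(group_by_similarity(names, sim, threshold=0.6))  # [['Acetazolamide', 'Acarbose']]
```

The result depends entirely on the cutoff, so a sensible value would need to be chosen by inspecting scores for drug pairs with known shared or distinct mechanisms.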