Identification of Differentially Expressed Genes using limma

Required Packages

limma
dplyr

Step 1: Read Expression Data

# Read expression data
df = read.delim("raw_expression_data.txt", row.names = 1, header = TRUE)
# Read sample information
target = read.delim("sample_info.txt")

Expression Data: Load the gene expression data from a file, setting the first column as row names (gene identifiers) and including a header row.
Sample Information: Load the sample metadata, which contains details like sample IDs and experimental conditions.

Step 2: Match Expression Data with Sample Information

# Ensure expression data and target are in the same order
df = df[, target$ID]

This step aligns the columns of the expression data (df) with the sample order specified in the target file. It ensures that the expression data corresponds correctly to the sample metadata.

Step 4: Data Quality Check

# Check for any NA, 0, or negative values
max(df)
min(df)

# Convert NA values to 0
df[is.na(df)] = 0

Check for NAs, 0s, or negative values: It’s crucial to identify and handle missing or problematic values in the dataset.
Handling NAs: Replace any missing values (NAs) with 0, which might be appropriate depending on the context of the data.

Step 5: Normalization

df = normalizeBetweenArrays(df, method = "quantile")

Quantile Normalization: This step normalizes the expression data across arrays (samples) using the quantile normalization method. It ensures that the distribution of expression values is the same for each array, making the data comparable across different samples. This method is particularly useful in removing technical biases and ensuring that the measurements are on the same scale.

Step 6: Log Transformation

# Convert to log scale
exp = log2(df)

Log Transformation: Transform the expression data to a log scale, typically log2. This transformation helps normalize the data and reduce skewness, making it more suitable for linear modeling.

Step 7: Create Design Matrix

# Create design matrix based on the group
conditions = paste(target$Group)
conditions = factor(conditions, levels=unique(conditions))
design = model.matrix(~0 + conditions)
colnames(design) = levels(conditions)
unique(conditions)

Factor Levels: Convert the group information into a factor with levels corresponding to the experimental groups.
Design Matrix: Create a design matrix using model.matrix. The ~0 + syntax excludes the intercept, making each column of the design matrix represent a group.

Step 8: Fit Linear Model

# Fit the design to the expression data
fit = lmFit(exp, design)
# Make contrast matrix
contrast.matrix <- makeContrasts(deg1 = Group2 - Group1, deg2 = Group3 - Group1, levels = design)

Fit Linear Model: Use lmFit to fit a linear model to the expression data, considering the design matrix.
Contrast Matrix: Define contrasts to compare specific groups, specifying the comparisons of interest (e.g., Group2 vs Group1, Group3 vs Group1).

Step 9: Apply Contrasts

# Fit contrast matrix
fit.cont = contrasts.fit(fit, contrast.matrix)

Apply Contrasts: Adjust the fitted model using the contrast matrix to focus on the specific comparisons of interest.

Step 10: Empirical Bayes Moderation

# Fit eBayes
fit.cont = eBayes(fit.cont)

Empirical Bayes Moderation: Use the eBayes function to apply empirical Bayes moderation, which shrinks the estimated variances of the coefficients towards a common value. This step improves the reliability of the statistical tests, especially for small sample sizes.

Step 11: Extract Differentially Expressed Genes (DEGs)

# Generate DEGs by comparison
res1 = topTable(fit.cont, coef = "degs1", adjust.method = "BH", number = nrow(exp), sort.by = 'none')
res2 = topTable(fit.cont, coef = "degs2", adjust.method = "BH", number = nrow(exp), sort.by = 'none')

Extract DEGs: Use topTable to extract the top differentially expressed genes for the specified contrast (coef = "deg1"). The adjust.method = "BH" parameter applies the Benjamini-Hochberg correction for multiple testing, and number = nrow(exp) returns results for all genes.
View Results: Display the top differentially expressed genes, showing the most significant changes in expression. Similar for (coef = "deg2")

This limma workflow allows for the identification of differentially expressed genes across different conditions, accounting for multiple testing and ensuring robust statistical inference.

Identification of Differentially Expressed Genes using limma

Snijesh V P

2024-07-31

Required Packages

Step 1: Read Expression Data

Step 2: Match Expression Data with Sample Information

Step 4: Data Quality Check

Step 5: Normalization

Step 6: Log Transformation

Step 7: Create Design Matrix

Step 8: Fit Linear Model

Step 9: Apply Contrasts

Step 10: Empirical Bayes Moderation

Step 11: Extract Differentially Expressed Genes (DEGs)