No id variables; using all as measure variables

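library(reshape2)  # provides melt()
# Reshape to long format: one row per (sample, gene) log2 count
df = melt(logCount, variable.name = "Sample", value.name = "log2Counts")
# Condition label is taken from the first four characters of the sample name
df = data.frame(df, Condition = substr(df$Sample, 1, 4))
plotDensityPlot(df)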


Using rlog
In this section we use rlog-transformed (regularized log2) counts, which are normalized with respect to library size, to check for outliers.
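To make "normalized with respect to library size" concrete, here is a minimal base-R sketch of median-of-ratios size factors followed by a plain log2 transform. It is a simplified illustration, not rlog itself (rlog additionally shrinks the values of low-count genes); the toy matrix `counts` and all variable names are assumptions made for this example.

```r
# Hypothetical toy counts: sample B was sequenced twice as deeply as A, C half as deeply
a <- c(10, 100, 50, 8)
counts <- cbind(A = a, B = 2 * a, C = 0.5 * a)
rownames(counts) <- paste0("gene", 1:4)

# Per-gene log geometric mean across samples (a pseudo-reference sample)
log_geo_mean <- rowMeans(log(counts))

# Size factor per sample: median ratio of its counts to the reference.
# Real data need genes with zero counts excluded before taking logs.
size_factors <- apply(counts, 2, function(x) exp(median(log(x) - log_geo_mean)))

# Divide each column by its size factor, then take log2 with a pseudocount of 1
norm_log2 <- log2(t(t(counts) / size_factors) + 1)
```

After normalization the three toy samples have identical profiles, which is the behaviour one wants before comparing densities across samples.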

Density plots

plotDensityPlot(df) + facet_wrap(~ Condition)

PCA plot
The conditions separate only along the second PC, which explains just 16% of the variance.
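The share of variance attributed to each PC can be read off a PCA fit directly. The sketch below uses base R's `prcomp` on simulated data; the matrix `x` is a hypothetical stand-in for the transformed counts (samples in rows), so the percentages it produces are not those of this dataset.

```r
set.seed(1)
# Hypothetical data: 6 samples x 50 genes standing in for transformed counts
x <- matrix(rnorm(6 * 50), nrow = 6,
            dimnames = list(paste0("S", 1:6), NULL))

# PCA on centered data; samples are rows, genes are columns
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Fraction of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * var_explained, 1)
```

The second entry of `var_explained` is the quantity quoted for the second PC in the text.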

MDS plot

Heat map clusters (CTRLx,KDx)


Cook’s Distance
Cook’s distance measures how much a single sample influences the fitted coefficients for a gene. A large Cook’s distance is intended to flag an outlier count.
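As a rough illustration of the idea, the sketch below computes Cook's distance for a single simulated gene with an ordinary linear model in base R. This is an analogy only: the analysis here fits a count model per gene, whereas the example uses `lm` on made-up expression values, and the sample layout (5 control, 5 knockdown) is an assumption for the example.

```r
set.seed(42)
# Hypothetical design: 5 control and 5 knockdown samples for one gene
condition <- factor(rep(c("control", "knockdown"), each = 5))
expr <- c(rnorm(5, mean = 8, sd = 0.2), rnorm(5, mean = 9, sd = 0.2))
expr[3] <- 14  # plant an outlier in the third sample

# Fit the per-gene model and compute each sample's influence on the coefficients
fit <- lm(expr ~ condition)
cd <- cooks.distance(fit)

# The sample with the largest Cook's distance is the planted outlier
which.max(cd)
```

Removing the flagged sample and refitting would shift the control mean noticeably, which is exactly the kind of influence the statistic summarizes.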

P-value histogram

The p-values are close to uniformly distributed. Only 4 genes are called DE.
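For reference, a uniform p-value histogram is what one expects when no gene is truly differentially expressed. The simulation below is a hedged sketch of that null case using base R; the number of genes, the three replicates per group, and the use of a two-sample t-test are all assumptions made for illustration and do not reproduce the test used in this analysis.

```r
set.seed(7)
n_genes <- 2000  # hypothetical number of genes

# Both groups are drawn from the same distribution, so every null is true
pvals <- replicate(n_genes,
                   t.test(rnorm(3), rnorm(3), var.equal = TRUE)$p.value)

# Bin the p-values into 20 equal-width bins without drawing the plot
bin_counts <- hist(pvals, breaks = seq(0, 1, by = 0.05), plot = FALSE)$counts

# Under the null, roughly 5% of p-values fall below 0.05
frac_below <- mean(pvals < 0.05)
```

A real signal would show up as an excess in the leftmost bin on top of this flat background.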
Batch-effects correction
I try two strategies:
- Model the batch as a covariate in the design matrix
- Surrogate variable analysis (SVA), with the number of surrogate variables set to the number of batches (n = 3)
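The first strategy amounts to adding a batch term to the model formula. The sketch below builds such a design matrix in base R; the sample table (three batches, one control and one knockdown sample per batch) is a hypothetical layout chosen for illustration, not the actual sample sheet.

```r
# Hypothetical sample sheet: 3 batches, each with one control and one knockdown
samples <- data.frame(
  condition = factor(rep(c("control", "knockdown"), times = 3),
                     levels = c("control", "knockdown")),
  batch = factor(rep(1:3, each = 2))
)

# Batch enters as a covariate; condition comes last so its coefficient
# is the knockdown vs control effect adjusted for batch
design <- model.matrix(~ batch + condition, data = samples)
colnames(design)
```

With DESeq2, the same formula (`~ batch + condition`) would typically be supplied as the design of the dataset object, so that the knockdown vs control contrast is estimated within batches.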
Batch as a covariate
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
PCA after removing batch-effects

MDS post correction

Heatmap post correction


P-value distribution post correction

Post batch-effect removal DE genes
log2 fold change (MAP): condition knockdown vs control
Wald test p-value: condition knockdown vs control
DataFrame with 13 rows and 6 columns
                  baseMean log2FoldChange      lfcSE       stat       pvalue         padj
                 <numeric>      <numeric>  <numeric>  <numeric>    <numeric>    <numeric>
ENSG00000066044  1817.5801     -1.9609248  0.1148421 -17.074959 2.279873e-65 1.589755e-61
ENSG00000112118  7134.5231     -0.5752556  0.0968824  -5.937669 2.891032e-09 1.007958e-05
ENSG00000149591  1339.1282      0.6779845  0.1184221   5.725153 1.033406e-08 2.401981e-05
ENSG00000131016  9669.6350      0.6474830  0.1165783   5.554060 2.791091e-08 4.865569e-05
ENSG00000109685   823.6079     -0.5608532  0.1112324  -5.042176 4.602675e-07 6.418890e-04
...                    ...            ...        ...        ...          ...          ...
ENSG00000198734  3086.3205     -0.4585079 0.10363208  -4.424382 9.671892e-06  0.007493567
ENSG00000055950  1138.7960     -0.4578538 0.10604262  -4.317640 1.577065e-05  0.010996873
ENSG00000006327  2649.3016      0.4457607 0.10487255   4.250499 2.132947e-05  0.013520946
ENSG00000143632   521.2588      0.4913019 0.12127315   4.051201 5.095549e-05  0.029609386
ENSG00000171345 13738.2914      0.3671011 0.09290258   3.951463 7.767492e-05  0.041663634
Surrogate variable analysis
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Number of significant surrogate variables is: 3
Iteration (out of 5 ):1 2 3 4 5




The surrogate variables are not very helpful here. If they captured the batch structure of the samples, the plots above would separate the different batches, but they do not.
using pre-existing size factors
estimating dispersions
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Post SVA DE genes
log2 fold change (MAP): condition knockdown vs control
Wald test p-value: condition knockdown vs control
DataFrame with 31 rows and 6 columns
                 baseMean log2FoldChange      lfcSE      stat       pvalue         padj
                <numeric>      <numeric>  <numeric> <numeric>    <numeric>    <numeric>
ENSG00000131016  9669.635      0.7648242 0.08088850  9.455290 3.221275e-21 1.293664e-17
ENSG00000112118  7134.523     -0.5807088 0.08363440 -6.943421 3.827170e-12 7.107443e-09
ENSG00000185130  6960.769     -0.5720435 0.08294031 -6.897050 5.309345e-12 7.107443e-09
ENSG00000149591  1339.128      0.6087591 0.09780138  6.224442 4.832715e-10 4.852046e-07
ENSG00000118785 24833.763     -0.4771785 0.07875791 -6.058801 1.371396e-09 1.101506e-06
...                   ...            ...        ...       ...          ...          ...
ENSG00000184260  3863.060     -0.3278997 0.08739693 -3.751844 0.0001755385   0.02610972
ENSG00000132031  4133.702      0.3170201 0.08615926  3.679466 0.0002337225   0.03352249
ENSG00000103187  1104.136     -0.3556372 0.09707397 -3.663569 0.0002487248   0.03444410
ENSG00000131781  1366.976      0.3593298 0.09851680  3.647396 0.0002649116   0.03546284
ENSG00000124145  3137.096      0.3206582 0.08821679  3.634888 0.0002781013   0.03602757
