No id variables; using all as measure variables

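library(reshape2)  # melt()
# Long format: one row per sample/value pair
df <- melt(logCount, variable.name = "Sample", value.name = "log2Counts")
# Condition label = first four characters of the sample name
df <- data.frame(df, Condition = substr(df$Sample, 1, 4))
plotDensityPlot(df)  # plotting helper, not shown in this section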

Using rlog

In this section we use log2-scale counts normalized for library size (via the rlog transformation) to check for outliers.
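A minimal base-R sketch of library-size normalization on simulated counts, assuming DESeq2-style median-of-ratios size factors. It takes a plain log2 of the normalized counts with a pseudocount of 1. That makes it a simplified stand-in for rlog, not rlog itself: rlog additionally shrinks the between-sample spread of low-count genes. The sample names and the `counts` and `sizeFactors` objects are hypothetical.

```r
set.seed(1)
# Hypothetical counts: 100 genes x 6 samples (simulated, not this report's data)
samples <- c("CTRL1", "CTRL2", "CTRL3", "KD1", "KD2", "KD3")
counts <- matrix(rnbinom(600, mu = 100, size = 10), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), samples))
counts[, "KD3"] <- counts[, "KD3"] * 2   # pretend KD3 was sequenced twice as deeply

# Median-of-ratios size factors: each sample vs. the per-gene geometric mean
logGeoMeans <- rowMeans(log(counts))
keep <- is.finite(logGeoMeans)           # skip genes with a zero in any sample
sizeFactors <- apply(counts[keep, ], 2,
                     function(x) exp(median(log(x) - logGeoMeans[keep])))

# Normalize for library size, then log2 with a pseudocount of 1
normCounts <- sweep(counts, 2, sizeFactors, "/")
logCount <- log2(normCounts + 1)
round(sizeFactors, 2)
```

The deeper sample gets a size factor about twice that of the others, so dividing by it puts all samples on a comparable scale before the log.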

Density plots

plotDensityPlot(df) + facet_wrap(~ Condition)  # ggplot2: one panel per condition

PCA plot

The samples separate by condition only along the second principal component (PC2), which explains just 16% of the variance.
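This sketch shows how a "percent variance explained" figure like the 16% above is computed, using base R's `prcomp()` on simulated log2 data. DESeq2's `plotPCA()` works the same way: it keeps the 500 most variable genes by default and runs `prcomp()` on them. The data, sample names and the +1 knockdown shift are hypothetical.

```r
set.seed(2)
samples <- c("CTRL1", "CTRL2", "CTRL3", "KD1", "KD2", "KD3")
condition <- factor(rep(c("CTRL", "KD"), each = 3))
# Hypothetical log2 expression: 500 genes x 6 samples
logCount <- matrix(rnorm(3000, mean = 8), nrow = 500, dimnames = list(NULL, samples))
# Add a modest knockdown effect to 50 genes
logCount[1:50, condition == "KD"] <- logCount[1:50, condition == "KD"] + 1

# Keep the most variable genes (up to 500)
rv <- apply(logCount, 1, var)
top <- order(rv, decreasing = TRUE)[1:min(500, length(rv))]

# prcomp() expects samples in rows, so transpose
pca <- prcomp(t(logCount[top, ]), center = TRUE, scale. = FALSE)
percentVar <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(percentVar[1:2], 1)

# Which PC separates the conditions? Compare mean scores per group
scores <- data.frame(pca$x[, 1:2], condition)
aggregate(cbind(PC1, PC2) ~ condition, data = scores, FUN = mean)
```

If the group means differ mainly on PC2, the condition signal is smaller than whatever drives PC1.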

MDS plot

Heatmap clusters (CTRLx, KDx)
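A common way to produce both the MDS plot and the heat map clustering starts from Euclidean distances between samples on the transformed counts. The distances are then embedded with classical MDS (`cmdscale()`) and clustered hierarchically (`hclust()`), which is what a heat map's dendrogram shows. This pipeline is an assumption, not taken from this report, and the data below are simulated.

```r
set.seed(3)
samples <- c("CTRL1", "CTRL2", "CTRL3", "KD1", "KD2", "KD3")
# Hypothetical log2 data: 1000 genes x 6 samples, strong KD shift on 200 genes
logCount <- matrix(rnorm(6000, mean = 8), nrow = 1000, dimnames = list(NULL, samples))
logCount[1:200, 4:6] <- logCount[1:200, 4:6] + 3

# Euclidean distances between samples (samples in rows after t())
sampleDists <- dist(t(logCount))

# Classical MDS: 2-D coordinates whose distances approximate sampleDists
mds <- cmdscale(sampleDists, k = 2)

# Hierarchical clustering of the same distances (the heat map dendrogram)
hc <- hclust(sampleDists, method = "complete")
clusters <- cutree(hc, k = 2)
clusters

# heatmap(as.matrix(sampleDists), symm = TRUE) would draw the distance heat map
```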

Cook’s Distance

Cook’s distance measures how much a single sample is influencing the fitted coefficients for a gene. A large value of Cook’s distance is intended to indicate an outlier count.
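DESeq2 computes a Cook's distance for every gene and sample from its negative binomial GLM fit. With DESeq2 loaded, the per-sample values are available as `assays(dds)[["cooks"]]`. As a simplified stand-in, the sketch below fits an ordinary linear model to hypothetical log2 counts of a single gene and applies base R's `cooks.distance()`. The one aberrant control sample receives by far the largest value.

```r
# Hypothetical log2 counts for one gene: 4 CTRL and 4 KD samples,
# with the 4th CTRL sample deliberately aberrant
condition <- factor(rep(c("CTRL", "KD"), each = 4))
y <- c(8.0, 8.1, 7.9, 11.0, 8.6, 8.5, 8.7, 8.6)

# Linear model with one coefficient per condition
fit <- lm(y ~ condition)

# Influence of each sample on the fitted coefficients
cd <- cooks.distance(fit)
round(cd, 3)
which.max(cd)   # the aberrant sample
```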

P-value histogram

The p-value distribution is close to uniform, and only 4 genes are called differentially expressed.
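Under the null hypothesis, p-values are uniform on [0, 1]. A flat histogram without a spike near zero therefore means few genes carry a detectable signal. The sketch below simulates such a situation (the 9,950/50 split is hypothetical) and bins the p-values as the histogram does. It then applies the Benjamini-Hochberg adjustment, which DESeq2 uses by default for `padj`. The 0.1 cutoff mirrors DESeq2's default `alpha` but is an assumption here.

```r
set.seed(5)
# Hypothetical p-values: 9,950 null genes (uniform) plus 50 genes with real signal
pvals <- c(runif(9950), rbeta(50, 0.1, 10))

# Histogram counts in 20 equal-width bins; drop plot = FALSE to draw it
h <- hist(pvals, breaks = seq(0, 1, by = 0.05), plot = FALSE)
h$counts

# Benjamini-Hochberg adjusted p-values
padj <- p.adjust(pvals, method = "BH")
sum(padj < 0.1)   # genes called at a 10% false discovery rate
```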

Batch-effect correction

I try two strategies:

  1. Model the batch as a covariate in the design matrix.
  2. Surrogate variable analysis (SVA), using the number of batches (n = 3) as the number of surrogate variables.
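For strategy 1, the DESeq2 design would be something like `~ batch + condition`; the exact formula used in this report is not shown. The sketch below uses an ordinary linear model on one hypothetical gene to illustrate why this helps. The design is balanced, with each batch holding one control and one knockdown sample. Including batch removes the between-batch variation from the residuals, which shrinks the standard error of the knockdown coefficient.

```r
set.seed(6)
# Hypothetical balanced design: 3 batches, each with one CTRL and one KD sample
batch <- factor(rep(c("b1", "b2", "b3"), times = 2))
condition <- factor(rep(c("CTRL", "KD"), each = 3))

# One gene: true KD effect of +0.5 log2 units, batch shifts of 0, +1.5, -1.5
batchShift <- c(b1 = 0, b2 = 1.5, b3 = -1.5)
y <- 8 + 0.5 * (condition == "KD") + batchShift[as.character(batch)] + rnorm(6, sd = 0.1)

fitNaive <- lm(y ~ condition)           # ignores batch
fitBatch <- lm(y ~ batch + condition)   # batch as a covariate

# Standard error of the KD coefficient in each model
seNaive <- summary(fitNaive)$coefficients["conditionKD", "Std. Error"]
seBatch <- summary(fitBatch)$coefficients["conditionKD", "Std. Error"]
c(naive = seNaive, withBatch = seBatch)
```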

Batch as a covariate

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

PCA after removing batch effects
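Adding batch to the design does not change the transformed counts themselves. Plots "after removing batch effects" are therefore typically drawn from a matrix with the batch effect subtracted. One common tool for this is limma's `removeBatchEffect()`, though its use here is an assumption. The base-R sketch below mimics its core idea on hypothetical data: fit condition plus sum-to-zero coded batch terms per gene, then subtract only the batch part.

```r
set.seed(7)
batch <- factor(rep(c("b1", "b2", "b3"), times = 2))
condition <- factor(rep(c("CTRL", "KD"), each = 3))
# Hypothetical log2 data: 100 genes x 6 samples, plus batch offsets of 0, +2, -2
logCount <- matrix(rnorm(600, mean = 8, sd = 0.2), nrow = 100)
shift <- c(b1 = 0, b2 = 2, b3 = -2)
logCount <- sweep(logCount, 2, shift[as.character(batch)], "+")

# Condition terms (kept) and batch terms in sum-to-zero coding (removed)
X <- model.matrix(~ condition)
B <- model.matrix(~ batch, contrasts.arg = list(batch = "contr.sum"))[, -1, drop = FALSE]

# Least-squares fit for all genes at once: rows = coefficients, columns = genes
beta <- qr.coef(qr(cbind(X, B)), t(logCount))
batchCoef <- beta[ncol(X) + seq_len(ncol(B)), , drop = FALSE]

# Subtract only the fitted batch contribution
corrected <- logCount - t(B %*% batchCoef)

# Mean expression per batch before and after correction
before <- tapply(colMeans(logCount), batch, mean)
after <- tapply(colMeans(corrected), batch, mean)
rbind(before, after)
```

The corrected matrix can then go to the same PCA, MDS and heat map code as the uncorrected one. Testing for differential expression should still use the raw counts with batch in the design.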

MDS post correction

Heatmap post correction

P-value distribution post correction

Post batch-effect removal DE genes

log2 fold change (MAP): condition knockdown vs control 
Wald test p-value: condition knockdown vs control 
DataFrame with 13 rows and 6 columns
                  baseMean log2FoldChange      lfcSE       stat       pvalue         padj
                 <numeric>      <numeric>  <numeric>  <numeric>    <numeric>    <numeric>
ENSG00000066044  1817.5801     -1.9609248  0.1148421 -17.074959 2.279873e-65 1.589755e-61
ENSG00000112118  7134.5231     -0.5752556  0.0968824  -5.937669 2.891032e-09 1.007958e-05
ENSG00000149591  1339.1282      0.6779845  0.1184221   5.725153 1.033406e-08 2.401981e-05
ENSG00000131016  9669.6350      0.6474830  0.1165783   5.554060 2.791091e-08 4.865569e-05
ENSG00000109685   823.6079     -0.5608532  0.1112324  -5.042176 4.602675e-07 6.418890e-04
...                    ...            ...        ...        ...          ...          ...
ENSG00000198734  3086.3205     -0.4585079 0.10363208  -4.424382 9.671892e-06  0.007493567
ENSG00000055950  1138.7960     -0.4578538 0.10604262  -4.317640 1.577065e-05  0.010996873
ENSG00000006327  2649.3016      0.4457607 0.10487255   4.250499 2.132947e-05  0.013520946
ENSG00000143632   521.2588      0.4913019 0.12127315   4.051201 5.095549e-05  0.029609386
ENSG00000171345 13738.2914      0.3671011 0.09290258   3.951463 7.767492e-05  0.041663634
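The sketch below checks the table's internal consistency using its first five rows. The Wald statistic equals `log2FoldChange / lfcSE`, the p-value is two-sided under a standard normal, and `2^log2FoldChange` gives the linear fold change (knockdown over control). For the top gene, ENSG00000066044, that is 2^-1.96 ≈ 0.26, roughly a 74% reduction in knockdown.

```r
# First five rows copied from the table above
top <- data.frame(
  gene = c("ENSG00000066044", "ENSG00000112118", "ENSG00000149591",
           "ENSG00000131016", "ENSG00000109685"),
  log2FoldChange = c(-1.9609248, -0.5752556, 0.6779845, 0.6474830, -0.5608532),
  lfcSE = c(0.1148421, 0.0968824, 0.1184221, 0.1165783, 0.1112324),
  stat = c(-17.074959, -5.937669, 5.725153, 5.554060, -5.042176),
  pvalue = c(2.279873e-65, 2.891032e-09, 1.033406e-08, 2.791091e-08, 4.602675e-07)
)

# Wald statistic: estimate divided by its standard error
waldStat <- top$log2FoldChange / top$lfcSE
# Two-sided p-value under the standard normal distribution
waldP <- 2 * pnorm(-abs(waldStat))
# Linear fold change, knockdown relative to control
foldChange <- 2^top$log2FoldChange

data.frame(gene = top$gene, waldStat = round(waldStat, 3),
           stat = top$stat, foldChange = round(foldChange, 3))
```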

Surrogate variable analysis

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Number of significant surrogate variables is:  3 
Iteration (out of 5 ):1  2  3  4  5  

The surrogate variables are not very helpful here: if they captured the batch structure of the samples, the plots above would separate the samples by batch, but they do not.
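The visual check described here can be made quantitative: regress each surrogate variable on the known batch labels and look at the R². The sketch uses a simplified stand-in for SVA: the top principal components of the residuals left after removing the condition effect. This is not the iteratively reweighted algorithm in the sva package. The data are simulated with a built-in batch effect, so the R² values come out high here. A low R² on real data would match the observation above.

```r
set.seed(9)
condition <- factor(rep(c("CTRL", "KD"), each = 6))
batch <- factor(rep(c("b1", "b2", "b3"), times = 4))   # balanced with condition
# Hypothetical log2 data: 1000 genes x 12 samples
logCount <- matrix(rnorm(1000 * 12, mean = 8, sd = 0.3), nrow = 1000)
# Gene-specific batch effects on the first 300 genes (gene x batch matrix)
batchEffect <- matrix(rnorm(300 * 3, sd = 1), nrow = 300)
logCount[1:300, ] <- logCount[1:300, ] + batchEffect[, as.integer(batch)]

# Remove the condition effect from every gene; keep residuals (samples x genes)
X <- model.matrix(~ condition)
resid <- t(logCount) - X %*% qr.coef(qr(X), t(logCount))

# Surrogate-like variables: top 3 principal components of the residuals
sv <- prcomp(resid, center = TRUE)$x[, 1:3]

# Share of each variable's variance explained by the known batch labels
r2 <- apply(sv, 2, function(s) summary(lm(s ~ batch))$r.squared)
round(r2, 2)
```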

using pre-existing size factors
estimating dispersions
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

Post SVA DE genes

log2 fold change (MAP): condition knockdown vs control 
Wald test p-value: condition knockdown vs control 
DataFrame with 31 rows and 6 columns
                 baseMean log2FoldChange      lfcSE      stat       pvalue         padj
                <numeric>      <numeric>  <numeric> <numeric>    <numeric>    <numeric>
ENSG00000131016  9669.635      0.7648242 0.08088850  9.455290 3.221275e-21 1.293664e-17
ENSG00000112118  7134.523     -0.5807088 0.08363440 -6.943421 3.827170e-12 7.107443e-09
ENSG00000185130  6960.769     -0.5720435 0.08294031 -6.897050 5.309345e-12 7.107443e-09
ENSG00000149591  1339.128      0.6087591 0.09780138  6.224442 4.832715e-10 4.852046e-07
ENSG00000118785 24833.763     -0.4771785 0.07875791 -6.058801 1.371396e-09 1.101506e-06
...                   ...            ...        ...       ...          ...          ...
ENSG00000184260  3863.060     -0.3278997 0.08739693 -3.751844 0.0001755385   0.02610972
ENSG00000132031  4133.702      0.3170201 0.08615926  3.679466 0.0002337225   0.03352249
ENSG00000103187  1104.136     -0.3556372 0.09707397 -3.663569 0.0002487248   0.03444410
ENSG00000131781  1366.976      0.3593298 0.09851680  3.647396 0.0002649116   0.03546284
ENSG00000124145  3137.096      0.3206582 0.08821679  3.634888 0.0002781013   0.03602757
