The study design used 4 different types of treatment:
1. DSMO without siRNA
2. DSMO with siRNA (named TCEAL1)
3. docetaxel 2mM without siRNA
4. docetaxel 2mM with siRNA (named TCEAL1)
Each experimental group had 4 replicates, leading to a total of 16 replicates.
#Load required packages
library(ArrayExpress)
library(knitr)
#Download data from BioStudies
dir.create("data_exercise", showWarnings = FALSE)
files <- getAE("E-MTAB-9484", path = "data_exercise", type = "processed", extract = TRUE)
#Import data into a count table and remove potential duplicates
countTable <- read.table("data_exercise/20190328_LR_R08_raw_counts.tsv",
header=TRUE, as.is=TRUE, row.names=1, sep="\t")
countTable <- countTable[, !duplicated(colnames(countTable))]
#get the sample names from the header of the count table
sampleNames <- colnames(countTable)
#since the sample names contain all the info about the treatment factors etc,
#we need to split this up, so we create a new table with the following code:
sampleInfo <- do.call(rbind, strsplit(sampleNames, "_"))
#sampleInfo now shows the very long names with all the information as separate columns in the new table
#we see that we need column 5, as this shows whether the replicate had siRNA or control as a factor
#we also see we need column 7, as this shows whether the replicate had DMSO or docetaxel as a factor
#on the BioStudies website we can see that the dose for docetaxel was always 2mM
#therefore we add the dose factor separately as this is not available in sampleInfo
#we make a data frama of the 3 factors we need and get the replicate names, all from sampleInfo
sampleTable <- data.frame(
RNA_interference = factor(sampleInfo[, 5]),
compound = factor(sampleInfo[, 7]),
dose = factor(ifelse(sampleInfo[, 7] == "DMSO",
"0 mM",
"2 mM")))
rownames(sampleTable) <- sampleInfo[, 3]
knitr::kable(sampleTable)| RNA_interference | compound | dose | |
|---|---|---|---|
| LR1 | Con | DMSO | 0 mM |
| LR2 | Con | Doc | 2 mM |
| LR3 | TCEAL1 | DMSO | 0 mM |
| LR4 | TCEAL1 | Doc | 2 mM |
| LR5 | Con | DMSO | 0 mM |
| LR6 | Con | Doc | 2 mM |
| LR7 | TCEAL1 | DMSO | 0 mM |
| LR8 | TCEAL1 | Doc | 2 mM |
| LR9 | Con | DMSO | 0 mM |
| LR10 | Con | Doc | 2 mM |
| LR11 | TCEAL1 | DMSO | 0 mM |
| LR12 | TCEAL1 | Doc | 2 mM |
| LR13 | Con | DMSO | 0 mM |
| LR14 | Con | Doc | 2 mM |
| LR15 | TCEAL1 | DMSO | 0 mM |
| LR16 | TCEAL1 | Doc | 2 mM |
# create miodin project
library(miodin)
mp <- MiodinProject(
name = "Rassignment",
author = "Steffi Billiau",
path = ".")
mshow(mp)## Object of S4 class MiodinProject
## Project Rassignment
## Author: Steffi Billiau
## Data created: Fri Apr 24 17:12:36 2026
## Data changed: Fri Apr 24 17:12:36 2026
## Project hierarchy
## Studies
##
## Workflows
##
## Datasets
##
## Analysis results
##
# my data set has cell line A with 9 replicates and B with 8 replicates
# import data from txt file
countTable <- read.table(
"/Users/steffibilliau/Desktop/Bioinformatics R/b25stebi_rnaseq.txt",
header = TRUE,
row.names = 1,
sep = "\t")
# declare study design: case-control sample & assay table
sampleTable <- data.frame(
SampleName = colnames(countTable),
SamplingPoint = "sp1",
CellLine = factor(c(rep("A", 9), rep("B", 8))))
assayTable <- data.frame(
SampleName = colnames(countTable),
DataFile = "/Users/steffibilliau/Desktop/Bioinformatics R/b25stebi_rnaseq.txt",
DataColumn = colnames(countTable))
# declare case-control design: call the study function
ms <- studyDesignCaseControl(
studyName = "CellLine_rnaseq",
factorName = "CellLine",
caseName = "A",
controlName = "B",
contrastName = "A_vs_B",
numCase = 9,
numControl = 8,
sampleTable = sampleTable,
assayTable = assayTable,
assayTableName = "RNA"
)
insert(ms, mp)
mshow(ms)## Object of S4 class MiodinStudy
## Number of samples: 17
## Sampling points: sp1
## Study factors:
## CellLine
## Study groups:
## A
## B
## Study contrasts:
## A_vs_B
## Assay tables:
## RNA
# make workflow
mw <- MiodinWorkflow(name = "Rassignment")
mw <- mw +
importProcessedData(
name = "RNA importer",
experiment = "sequencing",
dataType = "rna",
studyName = "CellLine_rnaseq",
assayName = "RNA",
datasetName = "CellLine_rnaseq",
contrastName = "A_vs_B") %>%
processSequencingData(
name = "RNA processor",
contrastName = "A_vs_B",
filterLowCount = FALSE)
mw <- insert(mw, mp)
mshow(mw)## Object of S4 class MiodinWorkflow
## Number of workflow modules: 2
## Instantiated modules
##
## Module Name InputData MissingInput Execute
## ProcessedDataImporter RNA importer FALSE TRUE
## SequencingDataProcessor RNA processor RNA importer FALSE TRUE
The PCA plot shows that the cells from cell line A cluster together, as do the cells from line B. This means there is a biological difference between the two cell lines, 19% of the variation can be explained by this biological difference, as shown by PC1 on the x-axis. The y-axis displays PC2, the second most important variance at 13%. We see that both line A and line B have one dot pretty far up the axis. This means that line A and line B each have one sample that is an outlier when compared to the other data points in A and B respectively.
In the heatmap we see this same pattern of samples from the two cell lines clustering together, as is expected if the data is good. Samples of line A are most similar to other samples from line A and the same goes for B. This can be seen from two things: 1. the colors in the heatmap, with blue being least similar and orange most similar and 2. the dendrogram on each side of the heatmap, showing clustering of line A with line A and line B with line B. One important thing we notice here is that LineA_4 does not follow this pattern, it’s also not similar to any other sample. It would therefore be prudent to remove this sample, as this is most likely an outlier.
knitr::include_graphics("Rassignment/exports/datasets/CellLine_rnaseq/qualityReports/RNA/Count data/CellLine_rnaseq RNA Count data PCA.pdf")Figure 1: PCA plot
knitr::include_graphics("Rassignment/exports/datasets/CellLine_rnaseq/qualityReports/RNA/Count data/CellLine_rnaseq RNA Count data Sample heatmap.pdf")Figure 2: heatmap