Assignment 1

Steffi Billiau

24/04/2026

Question 1

The study design used 4 different types of treatment:

1. DSMO without siRNA

2. DSMO with siRNA (named TCEAL1)

3. docetaxel 2mM without siRNA

4. docetaxel 2mM with siRNA (named TCEAL1)

Each experimental group had 4 replicates, leading to a total of 16 replicates.

#Load required packages
library(ArrayExpress)
library(knitr)

#Download  data from BioStudies
dir.create("data_exercise", showWarnings = FALSE)
files <- getAE("E-MTAB-9484", path = "data_exercise", type = "processed", extract = TRUE)

#Import data into a count table and remove potential duplicates
countTable <- read.table("data_exercise/20190328_LR_R08_raw_counts.tsv",
                         header=TRUE, as.is=TRUE, row.names=1, sep="\t")
countTable <- countTable[, !duplicated(colnames(countTable))]

#get the sample names from the header of the count table
sampleNames <- colnames(countTable)

#since the sample names contain all the info about the treatment factors etc,
#we need to split this up, so we create a new table with the following code:
sampleInfo <- do.call(rbind, strsplit(sampleNames, "_"))

#sampleInfo now shows the very long names with all the information as separate columns in the new table
#we see that we need column 5, as this shows whether the replicate had siRNA or control as a factor
#we also see we need column 7, as this shows whether the replicate had DMSO or docetaxel as a factor
#on the BioStudies website we can see that the dose for docetaxel was always 2mM
#therefore we add the dose factor separately as this is not available in sampleInfo
#we make a data frama of the 3 factors we need and get the replicate names, all from sampleInfo
sampleTable <- data.frame(
  RNA_interference = factor(sampleInfo[, 5]),
  compound = factor(sampleInfo[, 7]),
  dose = factor(ifelse(sampleInfo[, 7] == "DMSO",
                       "0 mM",
                       "2 mM")))
rownames(sampleTable) <- sampleInfo[, 3]
knitr::kable(sampleTable)

	RNA_interference	compound	dose
LR1	Con	DMSO	0 mM
LR2	Con	Doc	2 mM
LR3	TCEAL1	DMSO	0 mM
LR4	TCEAL1	Doc	2 mM
LR5	Con	DMSO	0 mM
LR6	Con	Doc	2 mM
LR7	TCEAL1	DMSO	0 mM
LR8	TCEAL1	Doc	2 mM
LR9	Con	DMSO	0 mM
LR10	Con	Doc	2 mM
LR11	TCEAL1	DMSO	0 mM
LR12	TCEAL1	Doc	2 mM
LR13	Con	DMSO	0 mM
LR14	Con	Doc	2 mM
LR15	TCEAL1	DMSO	0 mM
LR16	TCEAL1	Doc	2 mM

Question 2

# create miodin project 
library(miodin)
mp <- MiodinProject(
  name = "Rassignment",
  author = "Steffi Billiau",
  path = ".")
mshow(mp)

## Object of S4 class MiodinProject
## Project Rassignment
##   Author: Steffi Billiau
##   Data created: Fri Apr 24 17:12:36 2026
##   Data changed: Fri Apr 24 17:12:36 2026
## Project hierarchy
##   Studies
##     
##   Workflows
##     
##   Datasets
##     
##   Analysis results
##

# my data set has cell line A with 9 replicates and B with 8 replicates
# import data from txt file
countTable <- read.table(
  "/Users/steffibilliau/Desktop/Bioinformatics R/b25stebi_rnaseq.txt",
  header = TRUE,
  row.names = 1,
  sep = "\t")

# declare study design: case-control sample & assay table
sampleTable <- data.frame(
  SampleName = colnames(countTable),
  SamplingPoint = "sp1",
  CellLine = factor(c(rep("A", 9), rep("B", 8))))

assayTable <- data.frame(
  SampleName = colnames(countTable),
  DataFile = "/Users/steffibilliau/Desktop/Bioinformatics R/b25stebi_rnaseq.txt",
  DataColumn = colnames(countTable))

# declare case-control design: call the study function 
ms <- studyDesignCaseControl(
  studyName = "CellLine_rnaseq",
  factorName = "CellLine",
  caseName = "A",
  controlName = "B",
  contrastName = "A_vs_B",
  numCase = 9,
  numControl = 8,
  sampleTable = sampleTable,
  assayTable = assayTable,
  assayTableName = "RNA"
)
insert(ms, mp)
mshow(ms)

## Object of S4 class MiodinStudy
## Number of samples: 17
## Sampling points: sp1
## Study factors:
##   CellLine
## Study groups:
##   A
##   B
## Study contrasts:
##   A_vs_B
## Assay tables:
##   RNA

# make workflow
mw <- MiodinWorkflow(name = "Rassignment")
mw <- mw +
  importProcessedData(
    name = "RNA importer",
    experiment = "sequencing",
    dataType = "rna",
    studyName = "CellLine_rnaseq",
    assayName = "RNA",
    datasetName = "CellLine_rnaseq",
    contrastName = "A_vs_B") %>%
  processSequencingData(
    name = "RNA processor",
    contrastName = "A_vs_B",
    filterLowCount = FALSE)
mw <- insert(mw, mp)
mshow(mw)

## Object of S4 class MiodinWorkflow
## Number of workflow modules: 2
## Instantiated modules
## 
##                   Module          Name    InputData MissingInput Execute
##    ProcessedDataImporter  RNA importer                     FALSE    TRUE
##  SequencingDataProcessor RNA processor RNA importer        FALSE    TRUE

# now we need to execute the workflow
mw <- execute(mw)

# save and export dataset
saveDataFile(mp)
export(mp, "dataset", "CellLine_rnaseq")

Question 3

The PCA plot shows that the cells from cell line A cluster together, as do the cells from line B. This means there is a biological difference between the two cell lines, 19% of the variation can be explained by this biological difference, as shown by PC1 on the x-axis. The y-axis displays PC2, the second most important variance at 13%. We see that both line A and line B have one dot pretty far up the axis. This means that line A and line B each have one sample that is an outlier when compared to the other data points in A and B respectively.

In the heatmap we see this same pattern of samples from the two cell lines clustering together, as is expected if the data is good. Samples of line A are most similar to other samples from line A and the same goes for B. This can be seen from two things: 1. the colors in the heatmap, with blue being least similar and orange most similar and 2. the dendrogram on each side of the heatmap, showing clustering of line A with line A and line B with line B. One important thing we notice here is that LineA_4 does not follow this pattern, it’s also not similar to any other sample. It would therefore be prudent to remove this sample, as this is most likely an outlier.

knitr::include_graphics("Rassignment/exports/datasets/CellLine_rnaseq/qualityReports/RNA/Count data/CellLine_rnaseq RNA Count data PCA.pdf")

Figure 1: PCA plot

knitr::include_graphics("Rassignment/exports/datasets/CellLine_rnaseq/qualityReports/RNA/Count data/CellLine_rnaseq RNA Count data Sample heatmap.pdf")

Figure 2: heatmap