Exercise 1

Create a list of 10 vectors, each composed of 20 integers.

The \( i^{th} \) vector contains the value \( j \cdot i \) at its \( j^{th} \) position:

v <- 1:20 
l <- list()
for (i in c(1:10)) l[[i]] <- (v*i)

Convert this list to a matrix, where each vector (element of the list) will be a row of the matrix:

M <- matrix(unlist(l), 10, 20, byrow = TRUE)

Type M or print(M) to check that the conversion worked.
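As a cross-check, the same matrix can be built without unlist(). The following is a minimal sketch, not the only way to do it; M2 and M3 are names introduced here for illustration:

```r
# Rebuild the list from the exercise
v <- 1:20
l <- list()
for (i in 1:10) l[[i]] <- v * i

M <- matrix(unlist(l), 10, 20, byrow = TRUE)

# Alternative 1: bind the list elements together as rows
M2 <- do.call(rbind, l)

# Alternative 2: outer product of the row index i and the column index j
M3 <- outer(1:10, 1:20)

dim(M)          # 10 20
all(M == M2)    # TRUE
all(M == M3)    # TRUE
```

All three give a 10×20 matrix whose element in row i and column j is i * j.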

Exercise 2

Create three vectors x, y, z of integers, each with 3 elements:

x <- rnorm(3, 2) ; x
x <- as.integer(x) ; is.vector(x) ; x

is.vector(x) returns TRUE

y and z are set the same way, as vectors of 3 integers:

y <- as.integer(rnorm(3, 5)) ; y 
z <- as.integer(rnorm(3, 5)) ; z 

We also printed them to check visually that they have indeed been converted to integers.
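It may help to see what as.integer() does to the random values: it drops the fractional part rather than rounding. The sketch below also shows sample() as one possible alternative that draws integers directly; the range 1:10, the seed and the name x2 are arbitrary choices made for this illustration.

```r
# as.integer() drops the fractional part (truncates toward zero)
as.integer(c(2.9, -2.9, 5.5))    # 2 -2 5

# sample() draws integers directly; set.seed() makes the draw repeatable
set.seed(1)
x2 <- sample(1:10, 3)
is.integer(x2)                   # TRUE
length(x2)                       # 3
```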

Combine the three vectors to become a 3×3 matrix A where each column represents a vector:

A <- matrix(c(x, y, z), 3) ; A

We get the same result with the function cbind(), the difference being that the columns keep the names x, y, z:

A <- cbind(x, y, z) ; A

Change the row names to a, b, c :

rownames(A) <- c("a", "b", "c") ; print(A)

We also printed A to see the change in the names.

Exercise 3

Create a vector with 12 integers:

v <- as.integer(c(0:10, 48))

A way to check that it is a vector and that it contains integers:

is.vector(v) && is.integer(v)

Convert the vector to a 4×3 matrix B using matrix():

B <- matrix(v, 4)

Please change the column names to x, y, z and row names to a, b, c, d :

colnames(B) <- c("x", "y", "z") ; rownames(B) <- c("a", "b", "c", "d") ; print(B)

The argument byrow in matrix() is FALSE by default, so the matrix is filled column by column.

Please change it to TRUE and print B to see the difference (the row and column names are dropped when the matrix is rebuilt):

B <- matrix(B, 4, byrow = TRUE) ; print(B)
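The effect of byrow is easier to see with both versions side by side. This is a small sketch using the same vector v; the names B_col, B_row and B_named are introduced here only for illustration.

```r
v <- as.integer(c(0:10, 48))

B_col <- matrix(v, 4)                  # default byrow = FALSE: filled column by column
B_row <- matrix(v, 4, byrow = TRUE)    # filled row by row

B_col[, 1]    # 0 1 2 3
B_row[1, ]    # 0 1 2

# Filling by row gives the transpose of a 3-row matrix filled by column
all(B_row == t(matrix(v, 3)))          # TRUE

# matrix() discards the dimnames of its input, so supply them again if needed
B_named <- matrix(v, 4, byrow = TRUE,
                  dimnames = list(c("a", "b", "c", "d"), c("x", "y", "z")))
```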

Exercise 4

Please obtain the transpose matrix of B named tB :

tB <- t(B) ; tB

We also printed it on screen by typing its name, tB.

Exercise 5

Now tB is a 3×4 matrix:

dim(tB)

By the rule of matrix multiplication in algebra, can we perform tB*tB in R?

(Is a 3×4 matrix multiplied by a 3×4 allowed?) What result would we get?

By the rule of matrix multiplication in algebra, we cannot multiply tB by tB.

If we wish to multiply two matrices, e.g. A(m,n) by B(u,v), where m, n and u, v are the numbers of rows and columns respectively, the product is defined only if n == u.

In other words, we can multiply two matrices only if the number of columns of the first operand equals the number of rows of the second operand.

However, the operation tB*tB is allowed in R: it computes the element-wise product (for matrices of the same size, each element is multiplied by the element in the same position), so we expect an output with the same dimensions as tB.

We perform tB*tB in R and see what result we get:

tB*tB

To check if dimensions are the same:

dim(tB) == dim(tB*tB)

Or, if we want a single TRUE as output, we can do:

all(dim(tB) == dim(tB*tB))

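In R, tB %*% tB is matrix multiplication in the algebraic sense:

tB %*% tB

It returns the error "Error in tB %*% tB : non-conformable arguments", because the number of columns of the first operand (4) differs from the number of rows of the second (3).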
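For comparison, here is a sketch of products that are conformable, obtained by transposing one of the operands. tB is rebuilt from the same values so the snippet runs on its own; P1, P2 and msg are names introduced for illustration.

```r
tB <- t(matrix(as.integer(c(0:10, 48)), 4, byrow = TRUE))   # 3 x 4

# (3 x 4) %*% (4 x 3): inner dimensions match, result is 3 x 3
P1 <- tB %*% t(tB)
dim(P1)                       # 3 3

# (4 x 3) %*% (3 x 4): result is 4 x 4
P2 <- t(tB) %*% tB
dim(P2)                       # 4 4

# crossprod(X) is a shortcut for t(X) %*% X
all(P2 == crossprod(tB))      # TRUE

# Catch the error from the non-conformable product instead of stopping
msg <- tryCatch(tB %*% tB, error = function(e) conditionMessage(e))
msg                           # "non-conformable arguments"
```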

Exercise 6

Download the dataset GDS3309 from the NCBI:

In terminal:

wget ftp://ftp.ncbi.nlm.nih.gov/geo/datasets/GDS3nnn/GDS3309/soft/GDS3309.soft.gz
gunzip GDS3309.soft.gz

Clean it (remove the lines starting with !, ^ or #):

In terminal:

grep '^[^^!#]' GDS3309.soft > GDS3309.clean
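The same filter can be sketched inside R, which also shows what the pattern does. The lines below are made up to imitate the layout of a SOFT file (they are not taken from GDS3309), and kept is a name introduced for illustration.

```r
# A few made-up lines imitating the layout of a SOFT file
lines <- c("^DATASET = GDS0000",
           "!dataset_title = example",
           "#ID_REF = Platform reference identifier",
           "",
           "ID_REF\tIDENTIFIER\tGSM1\tGSM2",
           "1007_s_at\tGENE1\t10.5\t11.2")

# Keep lines whose first character is not ^, ! or #
# (empty lines have no first character, so they are dropped too)
kept <- lines[grepl("^[^^!#]", lines)]
kept
length(kept)    # 2
```

For a real file, the vector of lines could come from readLines("GDS3309.soft") and the result be written back with writeLines().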

Check whether it is normalized:

Read the cleaned data as a data.frame and check its dimensions:

raw.data <- read.table("GDS3309.clean", sep = "\t", header = TRUE)
dim(raw.data)

The first two columns don’t contain numeric values, so we have to remove them:

data <- raw.data[,-c(1,2)]
dim(data)

Take a quick look at the kind of data we have and its values (both first and last rows):

head(data, n = 4) ; tail(data, n = 4)

A boxplot will show us if it’s normalized:

boxplot(data)

The data doesn't appear normalized: there are many outliers, the vast majority of the values are concentrated at the low end, and the distributions are not symmetric.

If we wanted to normalize the data, we could apply a log transformation, for instance log2 or log10, e.g. new.data <- log2(data).

We would then have to use new.data instead of data in all further analyses.
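A small sketch of what the log transformation does to skewed values. The numbers are made up for illustration (not taken from GDS3309), and toy and log.toy are names introduced here.

```r
# Made-up, right-skewed intensities
toy <- data.frame(s1 = c(1, 2, 4, 8, 1024),
                  s2 = c(2, 4, 8, 16, 4096))

log.toy <- log2(toy)
log.toy$s1                              # 0 1 2 3 10

# The largest value no longer dominates the scale
max(toy$s1) / median(toy$s1)            # 256
max(log.toy$s1) / median(log.toy$s1)    # 5

# log2(0) is -Inf, so a small offset is often added when zeros occur
log2(0)                                 # -Inf
log2(0 + 1)                             # 0
```

Running boxplot(log.toy) next to boxplot(toy) shows the difference visually.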

From every column find the mean:

mean_by_column <- as.matrix(apply(data, 2, mean))
mean_by_column

The apply() function produces a vector with the mean values of every column, and we also converted the result to matrix type for better display on screen (vertical appearance).

We could also produce the means with the function colMeans() :

as.matrix(colMeans(data))
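Both approaches return NA for a column that contains missing values. Whether GDS3309 has any is not checked here, so the sketch below uses a made-up data frame (toy) to show the behaviour and the na.rm argument.

```r
toy <- data.frame(s1 = c(1, 2, 3, 6), s2 = c(10, 20, 30, NA))

apply(toy, 2, mean)            # s1 = 3, s2 = NA
colMeans(toy)                  # same result
colMeans(toy, na.rm = TRUE)    # s1 = 3, s2 = 20
```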

From every column find the standard deviation:

sd_by_column <- as.matrix(apply(data, 2, sd))
print(sd_by_column)

Create a new matrix where each column is standardized, that is, subtract the mean of the column from its elements and divide by the standard deviation of the column:

We create a matrix of the same size as our data, initialized to zeros:

standardized.data <- matrix(0, nrow = dim(data)[1], ncol = dim(data)[2])

We subtract the mean of the column from its elements, divide the result by the standard deviation of the column, and store the quotients in the standardized matrix:

for (i in seq_len(ncol(data)))
{
  # center column i on its mean, then scale by its standard deviation
  standardized.data[, i] <- (data[, i] - mean(data[, i])) / sd(data[, i])
}
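Base R also provides scale(), which by default centers each column on its mean and divides it by its standard deviation. A minimal sketch on a made-up data frame, comparing it with a loop; toy, z.loop and z.scale are names introduced for illustration.

```r
toy <- data.frame(s1 = c(1, 2, 3, 6), s2 = c(10, 20, 30, 60))

# Loop version of the standardization
z.loop <- matrix(0, nrow = nrow(toy), ncol = ncol(toy))
for (i in seq_len(ncol(toy))) {
  z.loop[, i] <- (toy[, i] - mean(toy[, i])) / sd(toy[, i])
}

# scale() does the centering and scaling in one call
z.scale <- scale(toy)

all.equal(z.loop, z.scale, check.attributes = FALSE)   # TRUE
round(colMeans(z.scale), 10)                           # 0 0
apply(z.scale, 2, sd)                                  # 1 1
```

After standardization every column has mean 0 and standard deviation 1, which is a quick way to verify the result.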

For each column find the gene with the maximum value:

max_value_by_column <- as.matrix(apply(data, 2, max))
print(max_value_by_column)

For the standardized data this is:

max_standardized_value_by_column <- as.matrix(apply(standardized.data, 2, max))
print(max_standardized_value_by_column)

For each column find the gene with the minimum value:

min_value_by_column <- as.matrix(apply(data, 2, min))
print(min_value_by_column)

For the standardized data this is:

min_standardized_value_by_column <- as.matrix(apply(standardized.data, 2, min))
print(min_standardized_value_by_column)