v <- 1:20
l <- list()
for (i in 1:10) l[[i]] <- v * i   # element i of the list holds v multiplied by i
M <- matrix(unlist(l), 10, 20, byrow = TRUE)
Typing M (or print(M)) displays the matrix, so we can check that the conversion worked.
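Since row i of M holds v*i, element M[i, j] equals i*j. A quick cross-check, assuming the goal is simply this 10 x 20 multiplication table, builds the same matrix directly with outer():

```r
# Rebuild M as in the text: row i is (1:20) * i
v <- 1:20
l <- list()
for (i in 1:10) l[[i]] <- v * i
M <- matrix(unlist(l), 10, 20, byrow = TRUE)

# outer() forms every product i * j (rows 1:10, columns 1:20) in one call
M2 <- outer(1:10, 1:20)

all(M == M2)   # TRUE if both constructions agree
```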
x <- rnorm(3, 2) ; x
x <- as.integer(x) ; is.vector(x) ; x
is.vector(x) returns TRUE, so x is still a vector after the conversion.
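Note that as.integer() truncates toward zero rather than rounding, so a value such as 2.7 becomes 2 and -2.7 becomes -2. If rounding to the nearest integer is what is intended (an assumption; the text only asks for integers), round() can be applied first. The values below are hypothetical stand-ins for rnorm() draws:

```r
# Hypothetical values standing in for rnorm() draws
vals <- c(2.7, -2.7, 1.2)

as.integer(vals)          # truncation toward zero: 2 -2 1
as.integer(round(vals))   # rounding first: 3 -3 1
is.vector(as.integer(vals))
```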
In the same way, we set y and z as vectors of 3 integers:
y <- as.integer(rnorm(3, 5)) ; y
z <- as.integer(rnorm(3, 5)) ; z
We also printed them to check visually that they have indeed been converted to integers.
A <- matrix(c(x, y, z), 3) ; A
We get the same result with the function cbind(), except that the columns keep the names x, y and z:
A <- cbind(x, y, z) ; A
rownames(A) <- c("a", "b", "c") ; print(A)
We also printed A to see the new row names.
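To confirm that matrix(c(x, y, z), 3) and cbind(x, y, z) hold the same values in the same positions, the two can be compared element by element; the comparison ignores the column names that only cbind() adds. The fixed vectors below are hypothetical stand-ins for the random x, y and z:

```r
# Hypothetical fixed values in place of the random draws
x <- c(2L, 1L, 3L); y <- c(5L, 4L, 6L); z <- c(4L, 5L, 5L)

A1 <- matrix(c(x, y, z), 3)   # no column names
A2 <- cbind(x, y, z)          # column names "x", "y", "z"

colnames(A2)                  # "x" "y" "z"
all(A1 == A2)                 # TRUE: same values in the same positions
```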
v <- as.integer(c(0:10, 48))
A way to check that v contains integers and is also a vector (both conditions must be TRUE):
is.integer(v) && is.vector(v)
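A check written as (is.integer(v) == is.vector(v)) == TRUE only tests whether the two answers agree, so it also returns TRUE when both are FALSE. A matrix of doubles shows the problem (is.vector() is FALSE for a matrix because of its dim attribute), while && requires both conditions to hold. The matrix m is an illustrative example, not from the text:

```r
v <- as.integer(c(0:10, 48))
m <- matrix(c(1.5, 2.5))   # illustrative double matrix: neither integer nor a plain vector

(is.integer(m) == is.vector(m)) == TRUE   # TRUE: FALSE == FALSE, a false positive
is.integer(m) && is.vector(m)             # FALSE, as it should be
is.integer(v) && is.vector(v)             # TRUE for the integer vector v
```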
B <- matrix(v, 4)
colnames(B) <- c("x", "y", "z") ; rownames(B) <- c("a", "b", "c", "d") ; print(B)
B <- matrix(B, 4, byrow = TRUE) ; print(B)
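Calling matrix() again on B with byrow = TRUE refills the same 12 values row by row, so the elements move to new positions and the row and column names set just before are dropped. A small sketch of the difference, rebuilding B as in the text:

```r
v <- as.integer(c(0:10, 48))
B <- matrix(v, 4)                     # filled column by column
colnames(B) <- c("x", "y", "z")
rownames(B) <- c("a", "b", "c", "d")

B2 <- matrix(B, 4, byrow = TRUE)      # same values, now filled row by row
B[, "x"]      # first column of B: 0 1 2 3 (named a to d)
B2[1, ]       # first row of B2:   0 1 2
dimnames(B2)  # NULL: the names are not carried over
```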
tB <- t(B) ; tB
We also printed it on screen by typing its name, tB.
dim(tB)
By the rule of matrix multiplication in algebra, tB cannot be multiplied by itself, since tB is a 3 x 4 matrix (dim(tB) returns 3 4).
In general, a matrix A of size m x n (m rows, n columns) can be multiplied by a matrix B of size u x v only if n = u, and the product AB is then of size m x v.
In other words, two matrices can be multiplied only if the number of columns of the first operand equals the number of rows of the second operand.
However, R's * operator multiplies element by element (each element is multiplied by the element in the same position of a matrix of the same size), so tB*tB is allowed and we expect a result with the same dimensions as tB.
We perform tB*tB in R and look at the result:
tB*tB
To check if dimensions are the same:
dim(tB) == dim(tB*tB)
Or, to get a single TRUE/FALSE answer, we could do:
all(dim(tB) == dim(tB*tB))
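The difference between * and %*% is easiest to see on a small square matrix, where both are defined. With the illustrative 2 x 2 matrix m below (not from the text), * squares each element while %*% forms the row-by-column sums of products:

```r
m <- matrix(1:4, 2)   # illustrative matrix with columns (1, 2) and (3, 4)

m * m     # element-wise: matrix(c(1, 4, 9, 16), 2)
m %*% m   # algebraic:    matrix(c(7, 10, 15, 22), 2)

# all() collapses the per-dimension comparison into one TRUE/FALSE
all(dim(m) == dim(m * m))
```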
In R, %*% is the algebraic matrix product:
tB%*%tB
It returns the error "Error in tB %*% tB : non-conformable arguments", as expected, since tB has 4 columns but only 3 rows.
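Since tB is 3 x 4, it can be multiplied algebraically by its transpose B (4 x 3), in either order. The sketch below rebuilds tB as in the text and uses tryCatch() to capture the error message instead of stopping:

```r
v <- as.integer(c(0:10, 48))
B <- matrix(matrix(v, 4), 4, byrow = TRUE)   # as in the text (names dropped)
tB <- t(B)                                    # 3 x 4

# Capture the non-conformable error as a string
msg <- tryCatch(tB %*% tB, error = function(e) conditionMessage(e))
msg

dim(tB %*% B)   # (3 x 4) %*% (4 x 3) gives 3 x 3
dim(B %*% tB)   # (4 x 3) %*% (3 x 4) gives 4 x 4
```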
In terminal:
wget ftp://ftp.ncbi.nlm.nih.gov/geo/datasets/GDS3nnn/GDS3309/soft/GDS3309.soft.gz
gunzip GDS3309.soft.gz
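Then, still in the terminal, we keep only the data lines, dropping every line that starts with ^, ! or # (the SOFT metadata):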
grep '^[^^!#]' GDS3309.soft > GDS3309.clean
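The grep filter keeps only lines whose first character is not ^, ! or # (in the SOFT format these prefixes mark entity, attribute and column-description lines); empty lines are dropped too, since they have no first character to match. For readers without a Unix shell, a rough R equivalent is sketched below on a few hypothetical SOFT-style lines; for the real file one would read it with readLines("GDS3309.soft") and save the result with writeLines(clean, "GDS3309.clean"):

```r
# Hypothetical sample of SOFT-style lines (the real file is much longer)
lines <- c("^DATASET = GDS3309",
           "!dataset_title = example",
           "#ID_REF = Platform reference identifier",
           "ID_REF\tIDENTIFIER\tGSM1",
           "1007_s_at\tDDR1\t123.4",
           "")

# Keep non-empty lines whose first character is not ^, ! or #
first <- substr(lines, 1, 1)
clean <- lines[nzchar(lines) & !(first %in% c("^", "!", "#"))]
clean
```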
We read the raw data as a data.frame and check its dimensions:
raw.data <- read.table("GDS3309.clean", sep = "\t", header = TRUE)
dim(raw.data)
The first two columns don’t contain numeric values, so we have to remove them:
data <- raw.data[,-c(1,2)]
dim(data)
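Before computing statistics it is worth confirming that every remaining column is numeric. A minimal check, shown on a small hypothetical data frame shaped like raw.data (two identifier columns followed by sample columns):

```r
# Hypothetical stand-in for raw.data: two text columns, then numeric samples
raw.data <- data.frame(ID_REF = c("p1", "p2", "p3"),
                       IDENTIFIER = c("g1", "g2", "g3"),
                       GSM1 = c(10.5, 200.1, 3.2),
                       GSM2 = c(12.0, 180.7, 2.9))

data <- raw.data[, -c(1, 2)]
sapply(data, is.numeric)   # one TRUE per remaining column
dim(data)                  # 3 rows, 2 columns
```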
We take a quick look at what kind of data we have and its values (both the first and the last lines):
head(data, n = 4) ; tail(data, n = 4)
A boxplot shows whether the data appears normalized:
boxplot(data)
The data does not appear normalized: there are many outliers, the vast majority of values are concentrated at the low end of the scale, and the distributions are far from symmetric.
If we wanted to normalize the data, we could apply a log transformation, for instance log2 or log10, e.g. new.data <- log2(data).
We would then use new.data instead of data in all further analyses.
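One caveat: log2() returns -Inf for zeros and NaN for negative values, which some expression tables contain. A sketch on hypothetical values follows; adding a pseudo-count of 1 before the log is one common workaround, an assumption rather than something the text prescribes:

```r
# Hypothetical expression values, including a zero
data <- data.frame(GSM1 = c(0, 1, 7, 255), GSM2 = c(3, 15, 63, 1023))

log2(data)                   # the zero in GSM1 becomes -Inf
new.data <- log2(data + 1)   # pseudo-count of 1 keeps every value finite
new.data                     # GSM1: 0 1 3 8, GSM2: 2 4 6 10
```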
mean_by_column <- as.matrix(apply(data, 2, mean))
mean_by_column
apply() returns a vector with the mean of every column; we converted it to a matrix so that it prints vertically, one row per column.
We could also compute the means with the function colMeans():
as.matrix(colMeans(data))
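apply(data, 2, mean) and colMeans(data) should agree up to floating-point rounding; colMeans() is the dedicated (and faster) function. A quick check on a hypothetical data frame:

```r
# Hypothetical data frame with two sample columns
data <- data.frame(GSM1 = c(1, 2, 3, 4), GSM2 = c(10, 20, 30, 60))

m1 <- apply(data, 2, mean)
m2 <- colMeans(data)
all.equal(m1, m2)   # TRUE
as.matrix(m2)       # one row per column: 2.5 and 30
```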
sd_by_column <- as.matrix(apply(data, 2, sd))
print(sd_by_column)
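The standard deviations come from apply(data, 2, sd). R's sd() is the sample standard deviation, with denominator n - 1, which a manual computation confirms; the data frame here is hypothetical:

```r
# Hypothetical data frame with two sample columns
data <- data.frame(GSM1 = c(2, 4, 4, 4, 5, 5, 7, 9), GSM2 = c(1, 2, 3, 4, 5, 6, 7, 8))

sd_by_column <- as.matrix(apply(data, 2, sd))

# Sample standard deviation by hand: sqrt(sum((x - mean)^2) / (n - 1))
manual <- sapply(data, function(x) sqrt(sum((x - mean(x))^2) / (length(x) - 1)))
all.equal(as.vector(sd_by_column), unname(manual))   # TRUE
```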
We create a matrix of the same size as our data, initialized to zeros:
standardized.data <- matrix(0, nrow = nrow(data), ncol = ncol(data))
For each column, we subtract the column mean from its elements, divide the result by the column standard deviation, and store the result in the standardized matrix:
for (i in 1:ncol(data))
{
  # z-score: (value - column mean) / column standard deviation
  standardized.data[,i] <- (data[,i] - mean_by_column[i])/sd_by_column[i]
}
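The loop can also be written without explicit iteration: sweep() subtracts the column means and then divides by the column standard deviations, and base R's scale() does both in one call (it also uses the n - 1 standard deviation, and attaches the centers and scales as attributes). A sketch comparing the three on hypothetical data:

```r
# Hypothetical data frame with two sample columns
data <- data.frame(GSM1 = c(1, 3, 5, 7), GSM2 = c(10, 40, 20, 50))
mean_by_column <- colMeans(data)
sd_by_column <- apply(data, 2, sd)

# Loop version, as in the text
standardized.data <- matrix(0, nrow = nrow(data), ncol = ncol(data))
for (i in seq_len(ncol(data))) {
  standardized.data[, i] <- (data[, i] - mean_by_column[i]) / sd_by_column[i]
}

# Vectorized versions
s1 <- sweep(sweep(as.matrix(data), 2, mean_by_column, "-"), 2, sd_by_column, "/")
s2 <- scale(data)

# Compare values only (s1 and s2 carry extra names/attributes)
all.equal(standardized.data, s1, check.attributes = FALSE)   # TRUE
all.equal(standardized.data, s2, check.attributes = FALSE)   # TRUE
```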
max_value_by_column <- as.matrix(apply(data, 2, max))
print(max_value_by_column)
For the standardized data:
max_standardized_value_by_column <- as.matrix(apply(standardized.data, 2, max))
print(max_standardized_value_by_column)
min_value_by_column <- as.matrix(apply(data, 2, min))
print(min_value_by_column)
For the standardized data:
min_standardized_value_by_column <- as.matrix(apply(standardized.data, 2, min))
print(min_standardized_value_by_column)
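Two properties make a useful sanity check on the result: every standardized column should have mean 0 and standard deviation 1, and because dividing by a positive sd preserves order, each standardized maximum (or minimum) is the original one passed through the same formula, (max - mean) / sd. A sketch on hypothetical data, standardized here with scale(), which centres by the column means and divides by the column standard deviations:

```r
# Hypothetical data frame with two sample columns
data <- data.frame(GSM1 = c(1, 3, 5, 7), GSM2 = c(10, 40, 20, 50))
standardized.data <- scale(data)

colMeans(standardized.data)       # both approximately 0 (up to rounding)
apply(standardized.data, 2, sd)   # both 1

max_value_by_column <- apply(data, 2, max)
expected_max <- (max_value_by_column - colMeans(data)) / apply(data, 2, sd)
all.equal(apply(standardized.data, 2, max), expected_max)   # TRUE
```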