The following code reproduces Prof. Peng's examples from the Exploratory Data Analysis course on Coursera. The goal is to experiment with different settings of the heatmap and image functions.

No pattern

Let’s create a matrix of random normal values.

set.seed(12345)
par(mar = rep(0.2, 4))
dataMatrix <- matrix(rnorm(400), nrow = 40)
dim(dataMatrix)
## [1] 40 10
dataMatrix[1:5,]
##         [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]
## [1,]  0.5855  1.1285  0.6454  1.5449 -0.4876 -1.4361 -0.7001 -1.5139
## [2,]  0.7095 -2.3804  1.0431  1.3215  0.3032 -0.6293 -0.5674  0.1643
## [3,] -0.1093 -1.0603 -0.3044  0.3222 -0.2420  0.2435 -0.2614 -0.8709
## [4,] -0.4535  0.9371  2.4771  1.5310 -0.4817  1.0584 -1.0639  1.5933
## [5,]  0.6059  0.8545  0.9712 -0.4212 -0.9918  0.8313 -0.1064  0.6466
##         [,9]    [,10]
## [1,]  0.3803 -0.37582
## [2,]  0.6051 -1.81283
## [3,]  1.0197  0.28860
## [4,]  0.4749 -0.18962
## [5,] -2.1859  0.01786

We now have a 40 x 10 matrix of independent standard normal values, so no pattern is present.

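The image function draws the values of z as coloured rectangles on the grid defined by x and y. Here the matrix is transposed and its rows are reversed so that row 1 appears at the top of the plot, matching the printed layout.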

image(1:10, 1:40, t(dataMatrix)[, nrow(dataMatrix):1])

[Figure: image of the random 40 x 10 matrix]
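To see why the matrix is transposed and its rows reversed, here is a small sketch on a hypothetical 2 x 3 toy matrix (the names `m` and `z` are illustrative only). It checks that the first row of `m` ends up in the top row of the image.

```r
# Toy matrix: row 1 is (1, 3, 5), row 2 is (2, 4, 6)
m <- matrix(1:6, nrow = 2)
# image(x, y, z) draws z[i, j] at (x[i], y[j]), so z needs one row per
# column of m (x axis) and one column per row of m (y axis)
z <- t(m)[, nrow(m):1]
dim(z)   # 3 rows (x positions) by 2 columns (y positions)
z[, 2]   # the first row of m, drawn at the top (y = 2)
image(1:ncol(m), 1:nrow(m), z)
```

Reversing the columns of `t(m)` is what flips the default bottom-up y axis into the top-down order used when the matrix is printed.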

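Cluster the data with heatmap, which runs hierarchical clustering on both the rows and the columns and reorders them accordingly: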

par(mar = rep(0.2, 4))
heatmap(dataMatrix)

[Figure: heatmap of the random matrix with row and column dendrograms]
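heatmap also returns, invisibly, the row and column permutations it applied, which is handy for seeing how the clustering reordered the data. A minimal sketch, where `randomMatrix` and `hm` are illustrative names for a copy of the random data and the returned list. Note that heatmap rescales each row by default (`scale = "row"`), so a call with `scale = "none"` is shown for comparison.

```r
# Same random data as above, under an illustrative name
set.seed(12345)
randomMatrix <- matrix(rnorm(400), nrow = 40)
# heatmap() invisibly returns a list that includes rowInd and colInd
hm <- heatmap(randomMatrix)
hm$rowInd   # permutation of the 40 rows used in the plot
hm$colInd   # permutation of the 10 columns used in the plot
# Plot the raw values instead of row-standardised ones
heatmap(randomMatrix, scale = "none")
```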

Adding a pattern

For each row we flip a fair coin; on heads, we add 3 to the last five columns of that row. On average, half of the rows receive this pattern.

set.seed(678910)
for (i in 1:40) {
  # flip a coin
  coinFlip <- rbinom(1, size = 1, prob = 0.5)
  # if the coin is heads, add a common pattern to that row
  if (coinFlip) {
    dataMatrix[i, ] <- dataMatrix[i, ] + rep(c(0, 3), each = 5)
  }
}
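The same kind of pattern can be added without a loop. This is a sketch of a vectorised alternative, not the course's code: `base`, `flips`, `pattern` and `patterned` are hypothetical names, and the random draws are not guaranteed to match those of the row-by-row loop.

```r
set.seed(12345)
base <- matrix(rnorm(400), nrow = 40)
set.seed(678910)
# One fair coin flip per row (1 = heads)
flips <- rbinom(40, size = 1, prob = 0.5)
# 0 for columns 1-5, 3 for columns 6-10
pattern <- rep(c(0, 3), each = 5)
# outer(flips, pattern) is a 40 x 10 matrix whose row i equals pattern
# when flips[i] == 1 and is all zeros otherwise
patterned <- base + outer(flips, pattern)
table(flips)
```

Using `outer()` builds the whole shift matrix in one step, which keeps the record of which rows were modified in `flips`.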

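The pattern adds 3 to columns 6 to 10 only, since rep(c(0, 3), each = 5) is five 0s followed by five 3s. Because image keeps the columns in their original order, we expect the right half of the plot (columns 6 to 10) to show higher values in roughly half of the rows.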

par(mar = rep(0.2, 4))
image(1:10, 1:40, t(dataMatrix)[, nrow(dataMatrix):1])

[Figure: image of the matrix after adding the pattern]
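To check this expectation numerically, the sketch below rebuilds the same matrix (same seeds, same loop) while storing each coin flip in a hypothetical `flips` vector, then compares the column means of the two halves. The names `simMatrix`, `flips`, `cm` and `gap` are illustrative.

```r
# Rebuild the patterned matrix, recording the coin flips
set.seed(12345)
simMatrix <- matrix(rnorm(400), nrow = 40)
set.seed(678910)
flips <- integer(40)
for (i in 1:40) {
  flips[i] <- rbinom(1, size = 1, prob = 0.5)
  if (flips[i] == 1) {
    simMatrix[i, ] <- simMatrix[i, ] + rep(c(0, 3), each = 5)
  }
}
# Column means: near 0 for columns 1-5, raised for columns 6-10
cm <- colMeans(simMatrix)
round(cm, 2)
# Gap between the halves; should be close to 3 * (share of patterned rows)
gap <- mean(cm[6:10]) - mean(cm[1:5])
c(gap = gap, expected = 3 * mean(flips))
```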

Patterns in rows and columns

Let’s reorder the rows by similarity, using hierarchical clustering.

hh <- hclust(dist(dataMatrix))
dataMatrixOrdered <- dataMatrix[hh$order, ]

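hh$order gives the order of the leaves in the dendrogram, so rows that are close to each other (by Euclidean distance, the default of dist) end up next to each other in dataMatrixOrdered. Rows that received the pattern should therefore tend to be grouped together.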
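A tiny example makes the role of the leaf order concrete. This sketch uses a hypothetical 4 x 2 matrix `toy` in which rows 1 and 3 are nearly identical, as are rows 2 and 4 (`toy`, `hc` and `groups` are illustrative names). After reordering by `hc$order`, each similar pair sits side by side, and `cutree()` with `k = 2` puts each pair in the same cluster.

```r
# Rows 1 and 3 are close to each other, and so are rows 2 and 4
toy <- rbind(c(0, 0), c(10, 10), c(0.1, 0), c(10, 10.2))
# Euclidean distance and complete linkage (the defaults)
hc <- hclust(dist(toy))
hc$order            # leaf order of the dendrogram, a permutation of 1:4
toy[hc$order, ]     # similar rows are now adjacent
# Cut the tree into two clusters: rows 1 and 3 share one label,
# rows 2 and 4 the other
groups <- cutree(hc, k = 2)
groups
```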

par(mfrow = c(1, 3))
# left: ordered data; middle: row means; right: column means
image(t(dataMatrixOrdered)[, nrow(dataMatrixOrdered):1])
plot(rowMeans(dataMatrixOrdered), 40:1, xlab = "Row Mean", ylab = "Row", pch = 19)
plot(colMeans(dataMatrixOrdered), xlab = "Column", ylab = "Column Mean", pch = 19)

[Figure: ordered matrix (left), row means (middle), column means (right)]