library(bigMap)
# source aux. stuff (graph plot function)
source('../graphs.R')

Run pt-SNE on Sierpinski-3d with perplexity 1948 (95% of the data set size) and a decreasing range of thread-ratio. We show:

Load data

load('../s3d.RData')

Run ptSNE (perplexity = 1948, decreasing thread-ratio)

threads <- c(1, 3, 4, 5, 6, 8)
# compute bandwiths ($\beta_{i}$) for $ppx=1948$,
g.0 <- bdm.init(s3d$data, is.distance = T, ppx = 1948, threads = 4)
# run pt-SNE
g.list <- lapply(threads, function(z)
{
  g <- bdm.ptsne(s3d$data, g.0, theta = 0.0, threads = z, layers = ifelse(z > 1, 2, 1))  
  g <- bdm.hlCorr(s3d$data, g, threads = 4)
  g <- bdm.knp(s3d$data, g, threads = 4)
  g
})
save(g.list, file = './s3d_ppx_1948.RData')

Embedding cost/size function

nulL <- lapply(g.list, function(g) bdm.cost(g))

Output

Note that the global structure is more or less preserved (we are using a very high perplexity) but the local structure gets worst as the thread-ratio is decreased.

nulL <- lapply(g.list, function(g) graph.plot(g, s3d$edges))

hl-Correlation

In terms of HL-correlation, the loss in structure (on average) is minimal.

hlTable <- sapply(g.list, function(g) mean(g$hlC))
hlTable <- matrix(round(hlTable, 4), nrow = 1)
threadRatio <- sapply(g.list, function(g) g$ptsne$layers /g$ptsne$threads)
colnames(hlTable) <- round(threadRatio, 2)
rownames(hlTable) <- c('<hlC>')
knitr::kable(hlTable, caption = 'hl-Correlation by thread-ratio') %>%
  kable_styling(full_width = F)
hl-Correlation by thread-ratio
1 0.67 0.5 0.4 0.33 0.25
<hlC> 0.7451 0.7468 0.747 0.7526 0.7486 0.7509

k-ary neighborhood preservation

In terms of kNP, the global structure (linAUC) is notably preserved (with the exception of g5 with thread-ratio = 0.33, blue line) but the loss in local structre (logAUC) is clearly perceptible.

bdm.knp.plot(g.list)

Running Times

Note the correlation between decreasing thread-ratio and decreasing running time.

rTimes <- sapply(g.list, function(g) c(g$ppx$t[3], g$t$epoch, g$t$ptsne[3], sum(c(g$ppx$t[3], g$t$ptsne[3]))))
rTimes <- round(rTimes, 2)
threadRatio <- sapply(g.list, function(g) g$ptsne$layers /g$ptsne$threads)
colnames(rTimes) <- round(threadRatio, 2)
rownames(rTimes) <- c('betas', 'epoch', 'ptSNE', 'total')
knitr::kable(rTimes, caption = 'Computation times (s) by thread-ratio') %>%
  kable_styling(full_width = F)
Computation times (s) by thread-ratio
1 0.67 0.5 0.4 0.33 0.25
betas 0.74 0.74 0.74 0.74 0.74 0.74
epoch 2.18 0.97 0.62 0.54 0.42 0.32
ptSNE 66.04 29.68 19.04 16.60 13.00 9.96
total 66.78 30.42 19.78 17.34 13.74 10.69

Run on: Intel(R) Xeon(R) CPU E31225 @ 3.10GHz, 4 cores, 16GB RAM.