Introduction to the R package TDA

1. Introduction

What is Topological Data Analysis?

http://www.kdnuggets.com/2015/11/crazy-deep-learning-topological-data-analysis.html.

https://www.kaggle.com/triskelion/digit-recognizer/mapping-digits-with-a-t-sne-lens/notebook.

This is an exploratory homework by following

https://cran.r-project.org/web/packages/TDA/vignettes/article.pdf.

They present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis.

In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as

1. the distance function,

2. the distance to a measure,

3. the kNN density estimator,

4. the kernel density estimator,

5. and the kernel distance.

The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology

As a first toy example to using the TDA package, we show how to compute distance functions and density estimators over a grid of points. The setting is the typical one in TDA: a set of points X = {x1, . . . , xn} ⊂ R d has been sampled from some distribution P and we are interested in recovering the topological features of the underlying space by studying some functions of the data. The following code generates a sample of 400 points from the unit circle and constructs a grid of points over which we will evaluate the functions.

2. Distance Functions and Density Estimators

library("TDA")
## Warning: package 'TDA' was built under R version 3.3.3
X <- circleUnif(400)
Xlim <- c(-1.6, 1.6); Ylim <- c(-1.7, 1.7); by <- 0.065
Xseq <- seq(Xlim[1], Xlim[2], by = by)
Yseq <- seq(Ylim[1], Ylim[2], by = by)
Grid <- expand.grid(Xseq, Yseq)

plot(X,main="Simple X")

### 1. the distance function

distance <- distFct(X = X, Grid = Grid)

### 2. the distance to a measure,

m0 <- 0.1
DTM <- dtm(X = X, Grid = Grid, m0 = m0)

### 3. the kNN density estimator,

k <- 60
kNN <- knnDE(X = X, Grid = Grid, k = k)

### 4. the kernel density estimator, 

h <- 0.3
KDE <- kde(X = X, Grid = Grid, h = h)

### 5. and the kernel distance. 

h <- 0.3
Kdist <- kernelDist(X = X, Grid = Grid, h = h)

persp(Xseq, Yseq,matrix(distance, ncol = length(Yseq), nrow = length(Xseq)), xlab = "",ylab = "", zlab = "", theta = -20, phi = 35, ltheta = 50,col = 2, border = NA, main = "distance function", d = 0.5, scale = FALSE,expand = 3, shade = 0.9)

persp(Xseq, Yseq,matrix(DTM, ncol = length(Yseq), nrow = length(Xseq)), xlab = "",ylab = "", zlab = "", theta = -20, phi = 35, ltheta = 50,col = 2, border = NA, main = "DTM", d = 0.5, scale = FALSE,expand = 3, shade = 0.9)

persp(Xseq, Yseq,matrix(KDE, ncol = length(Yseq), nrow = length(Xseq)), xlab = "",ylab = "", zlab = "", theta = -20, phi = 35, ltheta = 50,col = 2, border = NA, main = "KDE", d = 0.5, scale = FALSE,expand = 3, shade = 0.9)

persp(Xseq, Yseq,matrix(kNN, ncol = length(Yseq), nrow = length(Xseq)), xlab = "",ylab = "", zlab = "", theta = -20, phi = 35, ltheta = 50,col = 2, border = NA, main = "KNN", d = 0.5, scale = FALSE,expand = 3, shade = 0.9)

persp(Xseq, Yseq,matrix(Kdist, ncol = length(Yseq), nrow = length(Xseq)), xlab = "",ylab = "", zlab = "", theta = -20, phi = 35, ltheta = 50,col = 2, border = NA, main = "Kdist", d = 0.5, scale = FALSE,expand = 3, shade = 0.9)

2.1. Bootstrap Confidence Bands

band <- bootstrapBand(X = X, FUN = kde, Grid = Grid, B = 100,parallel = FALSE, alpha = 0.1, h = h)

3. Persistent Homology

3.1. Persistent Homology Over a Grid

Diag <- gridDiag(X = X, FUN = kde, h = 0.3, lim = cbind(Xlim, Ylim), by = by, sublevel = FALSE, library = "Dionysus",printProgress = FALSE)


plot(Diag[["diagram"]], band = 2 * band[["width"]],main = "KDE Diagram")

par(mfrow = c(1, 2), mai = c(0.8, 0.8, 0.3, 0.1))
plot(Diag[["diagram"]], rotated = TRUE, band = band[["width"]], main = "Rotated Diagram")
plot(Diag[["diagram"]], barcode = TRUE, main = "Barcode")

3.2. Rips Diagrams

Circle1 <- circleUnif(60)
Circle2 <- circleUnif(60, r = 2) + 3
Circles <- rbind(Circle1, Circle2)


maxscale <- 5 # limit of the filtration
maxdimension <- 1 # components and loops

Diag <- ripsDiag(X = Circles, maxdimension, maxscale,library = "GUDHI", printProgress = FALSE)


par(mfrow = c(1, 2), mai=c(0.8, 0.8, 0.3, 0.3))
plot(Circles, pch = 16, xlab = "",ylab = "")
plot(Diag[["diagram"]])

3.3. Alpha Complex Persistence Diagram

X <- circleUnif(n = 30)
DiagAlphaComplex <- alphaComplexDiag(X = X, printProgress = TRUE)
## # Generated complex of size: 115 
## # Persistence timer: Elapsed time [ 0.000000 ] seconds
# Generated complex of size: 115
# Persistence timer: Elapsed time [ 0.000000 ] seconds
plot(DiagAlphaComplex[["diagram"]])