Several clustering methods are available: DBSCAN, hierarchical clustering, k-means, and PAM. Whichever method you choose, you must supply all of its required parameters in method_params.
# Time the full Mapper run
time_taken <- system.time({
  Mapper <- MapperAlgo(
    filter_values = data[,1:4],
    # filter_values = circle_data[,2:3],
    percent_overlap = 30,
    methods = "dbscan",
    method_params = list(eps = 1, minPts = 1),
    # Alternative methods: enable one methods/method_params pair at a time
    # methods = "hierarchical",
    # method_params = list(num_bins_when_clustering = 10, method = 'ward.D2'),
    # methods = "kmeans",
    # method_params = list(max_kmeans_clusters = 2),
    # methods = "pam",
    # method_params = list(num_clusters = 2),
    cover_type = 'stride',
    # intervals = 4,
    interval_width = 1,
    num_cores = 12
  )
})
time_taken
##    user  system elapsed
##   0.078   0.035   1.225
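Because each method expects different entries in method_params, it can help to keep the method/parameter pairs in one place. The sketch below is only an illustration: `default_method_params` is a hypothetical helper, not part of MapperAlgo, and the parameter names and values are simply the ones that appear in the call above.

```r
# Hypothetical helper (not part of MapperAlgo): returns the method_params
# list used in this document for a given clustering method.
default_method_params <- function(method) {
  params <- list(
    dbscan       = list(eps = 1, minPts = 1),
    hierarchical = list(num_bins_when_clustering = 10, method = "ward.D2"),
    kmeans       = list(max_kmeans_clusters = 2),
    pam          = list(num_clusters = 2)
  )
  # Fail early with a readable message if the method name is not recognised
  if (!method %in% names(params)) {
    stop("Unknown method: ", method,
         ". Expected one of: ", paste(names(params), collapse = ", "))
  }
  params[[method]]
}

# Inspect the parameters that would accompany methods = "kmeans"
str(default_method_params("kmeans"))
```

The returned list could then be passed as method_params alongside the matching methods value, so the two arguments cannot drift apart.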
Built-in plotting functions visualize the Mapper graph. Set type to “ggraph” or “forceNetwork” to pick the plot style, and use avg and use_embedding to color nodes either by feature values averaged over each node or by conditional probability embeddings.
source('../R/Plotter.R')
MapperPlotter(Mapper, label=data$Species, data=data, type="forceNetwork", avg=FALSE, use_embedding=FALSE)
source('../R/CPEmbedding.R')
data$PW_group <- ifelse(data$Petal.Width > 1.5, "wide", "narrow")  # split on petal width (PW)
embedded <- CPEmbedding(Mapper, data, columns = list("PW_group", "Species"), a_level = "wide", b_level = "versicolor")
MapperPlotter(Mapper, label=embedded, data=data, type="forceNetwork", avg=TRUE, use_embedding=TRUE)
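To make the idea of a conditional probability embedding more concrete, here is a minimal sketch of the quantity involved: within one node, the share of "wide" points that are also "versicolor". It illustrates the concept only and is not the internals of CPEmbedding. The helper `cond_prob`, the node memberships in `node_rows`, the use of the built-in iris data, and the split on Petal.Width are all assumptions made for this example.

```r
data(iris)
df <- iris
# Assumed grouping for illustration: split on petal width
df$PW_group <- ifelse(df$Petal.Width > 1.5, "wide", "narrow")

# Hypothetical helper: P(b_col == b_level | a_col == a_level) among the rows
# of one node. Returns NA when no row in the node satisfies the condition.
cond_prob <- function(rows, df, a_col, a_level, b_col, b_level) {
  sub  <- df[rows, , drop = FALSE]
  in_a <- sub[[a_col]] == a_level
  if (!any(in_a)) return(NA_real_)
  mean(sub[[b_col]][in_a] == b_level)
}

# Hypothetical node memberships (row indices); a real Mapper run defines these
node_rows <- list(node1 = 1:50, node2 = 51:110, node3 = 101:150)

# One value per node, which could then be used to color that node
sapply(node_rows, cond_prob, df = df,
       a_col = "PW_group", a_level = "wide",
       b_col = "Species",  b_level = "versicolor")
```

A node with no "wide" points yields NA here, which is one reasonable way to flag that the conditional probability is undefined for that node.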
Sometimes it is useful to find out whether the Mapper graph structure correlates with certain features in the data. For example, you can check whether Sepal.Length densities are similar to Sepal.Width densities across the nodes of the Mapper graph.
source('../R/MapperCorrelation.R')
MapperCorrelation(Mapper, data = data, labels = list(data$Sepal.Length, data$Sepal.Width))
## `geom_smooth()` using formula = 'y ~ x'
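As a rough illustration of comparing two features across nodes, the sketch below summarizes each feature per node and correlates the summaries. It is not how MapperCorrelation is implemented: it uses per-node means rather than densities, the node memberships in `node_rows` are hypothetical, and the built-in iris data stands in for data.

```r
data(iris)

# Hypothetical node memberships (row indices); overlapping, as Mapper nodes can be
node_rows <- list(n1 = 1:40, n2 = 30:80, n3 = 70:120, n4 = 110:150)

# Mean of a feature over the rows belonging to one node
node_mean <- function(rows, values) mean(values[rows])

len_means <- sapply(node_rows, node_mean, values = iris$Sepal.Length)
wid_means <- sapply(node_rows, node_mean, values = iris$Sepal.Width)

# Pearson correlation of the two per-node summaries
node_cor <- cor(len_means, wid_means)
node_cor
```

A value near 1 or -1 would suggest the two features vary together across the graph, while a value near 0 would suggest they do not.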
If you want to find optimal cover parameters for your data, you can use the GridSearch function. Observing the convergence of certain graph metrics as cover parameters change can help you select appropriate values.
source('../R/GridSearch.R')
cpe_params <- list("PW_group", "Species", "wide", "versicolor")
data$PW_group <- ifelse(data$Petal.Width > 1.5, "wide", "narrow")  # split on petal width (PW)
library(dplyr)  # provides %>% and select()
labels <- data %>% select(PW_group, Species)
GridSearch(
  filter_values = data[,1:4],
  label = labels,
  column = "Species",
  cover_type = "stride",
  width_vec = c(1),
  overlap_vec = c(30),
  num_cores = 12,
  out_dir = "../mapper_grid_outputs",
  avg = TRUE,
  use_embedding = cpe_params
)
## Cover=stride, Width=1.00, Overlap=30%
## Saved: ../mapper_grid_outputs/mapper_stride_w1.00_ov30.png , Elapsed: 1.472 sec
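With more than one value in width_vec and overlap_vec, the search covers every width/overlap combination. The sketch below enumerates such a grid in base R so you can see how many runs to expect. The extra values 0.5 and 20 are hypothetical, and the file name pattern is inferred from the single "Saved:" line above, so treat it as an assumption rather than documented GridSearch behavior.

```r
# Hypothetical candidate values; the run above used only width 1 and overlap 30
width_vec   <- c(0.5, 1)
overlap_vec <- c(20, 30)

# One row per width/overlap combination
grid <- expand.grid(width = width_vec, overlap = overlap_vec)

# Assumed naming pattern, inferred from "mapper_stride_w1.00_ov30.png"
grid$file <- sprintf("mapper_%s_w%.2f_ov%d.png",
                     "stride", grid$width, as.integer(grid$overlap))

print(grid)
nrow(grid)  # number of Mapper runs the grid implies
```

Since the number of runs is the product of the two vector lengths, it is worth starting with a coarse grid and refining around the region where the graph metrics stabilize.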