Problem statement: To what extent does the reliability of metric space data affect the accuracy of scientific observations and modeling?
Challenges
- Very limited data
- Indirect measurements
- High noise data
- False positives
2026-06-07
Challenges
New strategies for validating data are continually being developed, such as:
\(d^{2} = (x_{2} - x_{1})^{2} + (y_{2} - y_{1})^{2}\)
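As a quick check of the squared-distance formula, here is a minimal R sketch. The two points (1, 2) and (4, 6) are hypothetical values chosen only for illustration. The distance is computed term by term and then compared with base R's `dist()`, which uses the Euclidean method by default.

```r
# Two hypothetical 2D points (illustrative values, not from the slides)
p1 <- c(x = 1, y = 2)
p2 <- c(x = 4, y = 6)

# Squared distance, term by term as in the formula
d2 <- (p2[["x"]] - p1[["x"]])^2 + (p2[["y"]] - p1[["y"]])^2
d <- sqrt(d2)

# Same distance from base R: dist() defaults to method = "euclidean"
d_builtin <- as.numeric(dist(rbind(p1, p2)))

print(d2)  # 25
print(d)   # 5
```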
Separation/segregation power - SP
\(SP = {MD_{inter} \over MD_{intra}}\)
\(MD_{inter}\): mean inter-cluster distance
\(MD_{intra}\): mean intra-cluster distance
\(SP\): separation power
Sum of distances, intra: \((SD)_{intra} = d(2,3) + d(1,4,5)\)
Sum of distances, inter: \((SD)_{inter} = d(2,1) + d(2,5) + d(2,4) + d(3,1) + d(3,4) + d(3,5)\)
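The sums and the separation power can be sketched in R as follows. This is an illustration under several assumptions, not the author's implementation. The coordinates are hypothetical. The partition {2, 3} and {1, 4, 5} is read off the inter-cluster pairs listed in the slide. The term d(1,4,5) is taken to mean the sum of the pairwise distances within {1, 4, 5}. Each mean distance is taken as the sum of distances divided by the number of pairs.

```r
# Hypothetical 2D coordinates for points 1..5 (illustrative only)
coords <- rbind(c(0, 0), c(5, 5), c(6, 5), c(1, 0), c(0, 1))
# Partition assumed from the listed pairs: {2, 3} is "A", {1, 4, 5} is "B"
cluster <- c("B", "A", "A", "B", "B")

D <- as.matrix(dist(coords))        # full pairwise Euclidean distance matrix
pairs <- t(combn(nrow(coords), 2))  # every unordered pair (i, j) with i < j
d_pair <- D[pairs]                  # distance for each pair
same <- cluster[pairs[, 1]] == cluster[pairs[, 2]]

sd_intra <- sum(d_pair[same])    # d(2,3) + d(1,4) + d(1,5) + d(4,5)
sd_inter <- sum(d_pair[!same])   # the six cross-cluster distances
md_intra <- mean(d_pair[same])   # assumed: sum divided by number of pairs
md_inter <- mean(d_pair[!same])
sp <- md_inter / md_intra        # separation power

print(c(SD_intra = sd_intra, SD_inter = sd_inter, SP = sp))
```

With these coordinates the two clusters lie far apart, so SP comes out well above 1. A value near 1 would suggest that points are about as close to other clusters as to their own.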
Code for a plotly 3D scatter plot of partitioned metric space data, using randomly generated coordinates
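library(plotly)  # provides plot_ly()

set.seed(123)  # fix the seed so the random coordinates are reproducible
# Five random integer coordinates per axis, drawn from 1..10 without replacement
xDist <- sample(1:10, 5)
yDist <- sample(1:10, 5)
zDist <- sample(1:10, 5)
# Cluster membership and label for each of the five points
clusters <- c("cluster1", "cluster1", "cluster2",
              "cluster2", "cluster2")
points <- c("a", "b", "c", "d", "e")
# 3D scatter plot, colored by cluster and labeled with the point names
fig1 <- plot_ly(x = xDist, y = yDist, z = zDist,
                type = "scatter3d", mode = "markers+text",
                color = clusters,
                text = points)
fig1 ## plotly not printing for ioslides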
Normal distribution calculations
\(f(d)\): probability density function (PDF)
\(f(d) = {1 \over {\sigma}\sqrt{\pi}} e^{-\left({d \over 2{\sigma}}\right)^{2}}\) \((d \geq 0)\)
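A short R sketch can sanity-check this density. The function name `f_d` and the value `sigma = 1.5` are hypothetical, chosen only for illustration. The first check integrates \(f(d)\) numerically over \(d \geq 0\) and should give a value close to 1. The second check rests on an assumption the slides do not state: that \(d\) is the absolute difference between two independent normal values with the same standard deviation \(\sigma\). Under that assumption the mean of \(d\) would be \(2\sigma / \sqrt{\pi}\).

```r
# Density from the slide, defined for d >= 0
f_d <- function(d, sigma) {
  exp(-(d / (2 * sigma))^2) / (sigma * sqrt(pi))
}

sigma <- 1.5  # hypothetical value, chosen only for illustration

# Numerical check that the density integrates to 1 over [0, Inf)
area <- integrate(f_d, lower = 0, upper = Inf, sigma = sigma)$value

# Assumption (not stated in the slides): d = |a - b| for independent
# a, b ~ N(mu, sigma^2); mu cancels in the difference, so mu = 0 is used here
set.seed(123)
d_sim <- abs(rnorm(1e5, sd = sigma) - rnorm(1e5, sd = sigma))

# Mean implied by f(d): integral of d * f(d) over d >= 0
mean_theory <- 2 * sigma / sqrt(pi)

print(area)
print(c(simulated = mean(d_sim), theory = mean_theory))
```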