This section explains the concept of functional depth, a fundamental tool in functional data analysis that extends the notion of statistical depth to infinite-dimensional spaces, and presents several popular functional depth measures and their properties.
Functional depth is a generalization of multivariate statistical depth to functional data, providing a center-outward ordering of curves in a functional space \(\mathcal{F}\). Given a sample of curves \(\{X_i(t)\}_{i=1}^n\), \(t \in \mathcal{T}\), a depth function \(D: \mathcal{F} \to \mathbb{R}^+\) assigns higher values to more central curves.
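A good functional depth should satisfy functional analogues of the desirable properties of multivariate depth: invariance under suitable (for example, affine) transformations; maximality at the center of symmetry, when one exists; monotonicity, meaning depth decreases as a curve moves away from the deepest curve along a ray; and vanishing at infinity, meaning depth tends to zero as the norm of the curve grows.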
Integrated Depth
The integrated depth combines pointwise univariate depths by integrating them over \(\mathcal{T}\):
\[ D_I(X) = \int_{\mathcal{T}} D_t(X(t)) \, dt \]
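where \(D_t\) is a univariate depth of the value \(X(t)\) with respect to the sample values \(X_1(t), \dots, X_n(t)\), and \(\mathcal{T}\) is taken to have unit length.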
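Below is a minimal base-R sketch of the integrated depth. It assumes the curves are stored column-wise in a p-by-n matrix (the layout `fbplot` uses) and observed on an equally spaced grid, with \(\mathcal{T}\) rescaled to \([0, 1]\), so the integral becomes a grid average. The pointwise depth \(D_t(x) = 1 - |1/2 - F_{n,t}(x)|\) is one common choice, where \(F_{n,t}\) is the empirical distribution function of the sample at \(t\). The function name `integrated_depth` is ours.

```r
# Integrated depth on a common grid: average a pointwise univariate depth over t.
# Assumption: T is rescaled to [0, 1] and the grid is equally spaced, so the
# integral is approximated by the mean over grid points.
# fit: p-by-n matrix with one curve per column (the layout fbplot uses).
integrated_depth <- function(fit) {
  n <- ncol(fit)
  # Empirical CDF value F_{n,t}(X_k(t)) = rank / n at every grid point
  # (apply over rows returns an n-by-p matrix, hence the transpose).
  Fmat <- t(apply(fit, 1, rank)) / n
  Dt <- 1 - abs(0.5 - Fmat)   # pointwise univariate depth D_t(X_k(t))
  colMeans(Dt)                # one depth value per curve
}

set.seed(1)
p <- 30; n <- 20
tt <- seq(0, 1, length.out = p)
fit <- sapply(1:n, function(i) 4 * tt + rnorm(p))  # 20 noisy curves around 4t
fit[, 1] <- fit[, 1] + 10                          # curve 1: shifted far upward
d <- integrated_depth(fit)
order(d, decreasing = TRUE)   # center-outward ordering; first entry is the deepest curve
which.min(d)                  # curve 1 is the least deep
```

`order(d, decreasing = TRUE)` lists the curves from the center outward, so its first entry acts as a functional median. The shifted curve is the pointwise maximum everywhere, so it receives 0.5, the smallest value this pointwise depth allows.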
Modified Band Depth (MBD)
The MBD averages, over all bands formed by pairs of sample curves, the proportion of \(\mathcal{T}\) on which a curve lies within the band:
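\[ \text{MBD}(X) = \binom{n}{2}^{-1} \sum_{1 \leq i < j \leq n} \frac{\lambda\{t \in \mathcal{T} : \min(X_i(t), X_j(t)) \leq X(t) \leq \max(X_i(t), X_j(t))\}}{\lambda(\mathcal{T})} \]
where \(\lambda\) is the Lebesgue measure on \(\mathcal{T}\), so each term is the proportion of \(\mathcal{T}\) on which \(X\) lies inside the band delimited by \(X_i\) and \(X_j\).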
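Below is a direct, deliberately slow translation of the MBD formula into base R. It makes two assumptions:
- The curves are the columns of a p-by-n matrix observed on an equally spaced grid, so the proportion of \(\mathcal{T}\) spent inside a band is approximated by the fraction of grid points.
- \(X\) is one of the sample curves, so pairs that include \(X\) itself are counted, as in the formula's sum over all \(1 \le i < j \le n\).

The name `mbd_naive` is ours.

```r
# Modified band depth by direct enumeration of all choose(n, 2) bands: O(n^2 p).
# fit: p-by-n matrix, one curve per column, observed on an equally spaced grid,
# so lambda{...} / lambda(T) is approximated by the fraction of grid points.
mbd_naive <- function(fit) {
  n <- ncol(fit)
  pairs <- combn(n, 2)                        # 2-by-choose(n, 2) matrix of pairs i < j
  sapply(seq_len(n), function(k) {
    x <- fit[, k]
    frac <- apply(pairs, 2, function(ij) {
      lo <- pmin(fit[, ij[1]], fit[, ij[2]])  # lower edge of the band
      hi <- pmax(fit[, ij[1]], fit[, ij[2]])  # upper edge of the band
      mean(lo <= x & x <= hi)                 # proportion of T inside the band
    })
    mean(frac)                                # average over all bands
  })
}

set.seed(2)
p <- 25; n <- 15
tt <- seq(0, 1, length.out = p)
# Curves = common shape + random level + small noise; column 16 is shifted by +3.
fit <- sapply(1:n, function(i) sin(2 * pi * tt) + rnorm(1, sd = 0.5) + rnorm(p, sd = 0.1))
fit <- cbind(fit, sin(2 * pi * tt) + 3)
d <- mbd_naive(fit)
d[16]                      # 2 / 16: only the 15 bands that use curve 16 contain it
d2 <- mbd_naive(2 * fit + tt)
all.equal(d, d2)           # TRUE: MBD only depends on the pointwise ordering
```

MBD depends only on the pointwise ordering of the curves. Applying the same strictly increasing affine map to every curve at each \(t\) (here \(x \mapsto 2x + t\)) therefore leaves all depths unchanged.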
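Functional Halfspace (Tukey) Depth
An extension of Tukey's halfspace depth to functional data:
\[ D_H(X) = \inf_{v \in V} P\big(v(Y) \leq v(X)\big) \]
where \(Y\) is a random curve drawn from the distribution that generated the sample (in practice, the empirical distribution of \(X_1, \dots, X_n\)). \(V\) is a set of directions in the functional space, that is, continuous linear functionals, closed under \(v \mapsto -v\) so that both half-spaces bounded by each hyperplane are taken into account.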
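The infimum over \(V\) cannot be evaluated exactly, so in practice it is approximated. This sketch makes three choices of ours, not a reference implementation:
- \(V\) is replaced by a finite set of random Gaussian directions on the grid, each also used with its sign flipped.
- \(v(X)\) is computed as a Riemann sum.
- The p-by-n grid layout and the name `halfspace_depth_rp` are illustrative.

The depth of a curve is the smallest fraction of sample curves lying on one side of it along any sampled direction.

```r
# Functional halfspace (Tukey) depth with V approximated by random directions.
# Assumptions (illustrative, not a reference implementation): curves are the
# columns of a p-by-n matrix on an equally spaced grid over [0, 1]; each
# direction v is a vector of i.i.d. N(0, 1) values on the grid; and v(X) is
# the Riemann sum of v(t) * X(t) * dt.
halfspace_depth_rp <- function(fit, n_dir = 200, seed = 1) {
  set.seed(seed)                                  # reproducible directions
  p <- nrow(fit); n <- ncol(fit)
  dt <- 1 / (p - 1)
  depth <- rep(1, n)
  for (m in seq_len(n_dir)) {
    v <- rnorm(p)
    proj <- as.vector(crossprod(v, fit)) * dt     # v(X_i) for every sample curve
    # Empirical P(v(Y) <= v(X_k)) for v and for -v; keep the running infimum.
    below <- vapply(proj, function(u) mean(proj <= u), numeric(1))
    above <- vapply(proj, function(u) mean(proj >= u), numeric(1))
    depth <- pmin(depth, below, above)
  }
  depth
}

set.seed(2)
p <- 25; n <- 15
tt <- seq(0, 1, length.out = p)
fit <- sapply(1:n, function(i) sin(2 * pi * tt) + rnorm(1, sd = 0.5) + rnorm(p, sd = 0.1))
fit <- cbind(fit, sin(2 * pi * tt) + 3)           # column 16: shifted curve
d <- halfspace_depth_rp(fit)
d[16]                                             # 1 / 16: extreme along some direction
```

Only finitely many directions are tried, so the estimate can only decrease as `n_dir` grows (for a fixed seed, the first directions are reused). The shifted curve attains the minimum possible value, \(1/n\).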
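Random Projection Depth
Random projection depth projects the functional data onto random directions \(v\), computes a (univariate or multivariate) depth of the projection \(v(X)\) with respect to the projected sample, and averages over the directions:
\[ D_R(X) = \mathbb{E}_v[D_{\text{multivariate}}(v(X))] \]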
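Below is a Monte Carlo sketch of random projection depth. Its assumptions, all illustrative choices of ours:
- curves in a p-by-n matrix on an equally spaced grid over \([0, 1]\);
- Gaussian random directions and Riemann-sum projections;
- as the inner depth, the univariate Tukey depth \(\min\{F_n(u), 1 - F_n(u^-)\}\) of each projected value \(u\).

The name `random_projection_depth` is also ours.

```r
# Random projection depth: Monte Carlo average over random directions of the
# univariate Tukey depth of the projected values.
# Assumptions (illustrative): p-by-n matrix on an equally spaced grid over [0, 1],
# Gaussian random directions on the grid, v(X) computed as a Riemann sum.
random_projection_depth <- function(fit, n_dir = 200, seed = 1) {
  set.seed(seed)
  p <- nrow(fit); n <- ncol(fit)
  dt <- 1 / (p - 1)
  total <- numeric(n)
  for (m in seq_len(n_dir)) {
    v <- rnorm(p)
    proj <- as.vector(crossprod(v, fit)) * dt
    below <- vapply(proj, function(u) mean(proj <= u), numeric(1))  # F_n(u)
    above <- vapply(proj, function(u) mean(proj >= u), numeric(1))  # 1 - F_n(u-)
    total <- total + pmin(below, above)           # univariate Tukey depth of v(X_k)
  }
  total / n_dir                                   # estimate of E_v[D(v(X))]
}

set.seed(2)
p <- 25; n <- 15
tt <- seq(0, 1, length.out = p)
fit <- sapply(1:n, function(i) sin(2 * pi * tt) + rnorm(1, sd = 0.5) + rnorm(p, sd = 0.1))
fit <- cbind(fit, sin(2 * pi * tt) + 3)           # column 16: shifted curve
d <- random_projection_depth(fit)
round(d, 3)
```

Averaging over directions, rather than taking the infimum as in the halfspace depth, gives a smoother depth with fewer ties.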
Functional depth provides a powerful framework for analyzing functional data. It offers robust and interpretable tools for tasks such as defining a functional median, building functional boxplots, and flagging outlying curves. The choice of depth measure depends on the application and on the characteristics of the data.
For functional data, the band depth (BD) or the modified band depth (MBD) orders a sample of curves from the center outward. This provides a way to define functional quantiles and to measure the centrality or outlyingness of an observation. A smaller rank is associated with a more central position with respect to the sample curves. BD usually produces many ties (several curves share the same depth value), whereas MBD rarely does. “BD2” uses two curves to determine a band. The method “Both” uses “BD2” first and then uses “MBD” to break ties. The computation is carried out by the fast algorithm proposed by Sun et al. (2012).
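A rank-based computation in the spirit of Sun et al. (2012) rests on a counting identity. Suppose curve \(X_k\) has rank \(r\) among the \(n\) sample values at time \(t\), with no ties. Then exactly \((r-1)(n-r)\) bands formed by two other curves contain it, and all \(n-1\) bands that use \(X_k\) itself contain it trivially. Summing over the grid avoids enumerating pairs, reducing the cost from \(O(n^2 p)\) to \(O(np \log n)\).

The sketch below checks this identity against direct enumeration. It also computes a naive BD2 and a lexicographic "Both" ordering. The function names are ours, and fda's internal code may differ in details such as tie handling.

```r
# Rank-based MBD (assumes no ties among the n values at any grid point).
# At time t a curve with rank r lies in (r - 1) * (n - r) bands formed by two
# other curves, plus the n - 1 bands that use the curve itself.
mbd_fast <- function(fit) {
  n <- ncol(fit)
  r <- t(apply(fit, 1, rank))                   # p-by-n matrix of pointwise ranks
  colMeans((r - 1) * (n - r) + (n - 1)) / choose(n, 2)
}

# Direct enumeration of all choose(n, 2) bands, for comparison: O(n^2 p).
band_share <- function(fit, k, whole_curve) {
  x <- fit[, k]
  pairs <- combn(ncol(fit), 2)
  mean(apply(pairs, 2, function(ij) {
    inside <- pmin(fit[, ij[1]], fit[, ij[2]]) <= x & x <= pmax(fit[, ij[1]], fit[, ij[2]])
    if (whole_curve) all(inside) else mean(inside)  # BD2 uses all(), MBD uses mean()
  }))
}
mbd_naive <- function(fit) sapply(seq_len(ncol(fit)), band_share, fit = fit, whole_curve = FALSE)
bd2_naive <- function(fit) sapply(seq_len(ncol(fit)), band_share, fit = fit, whole_curve = TRUE)

set.seed(3)
fit <- matrix(rnorm(20 * 12), 20, 12)           # 12 rough curves on 20 grid points
mbd <- mbd_fast(fit)
bd2 <- bd2_naive(fit)
all.equal(mbd, mbd_naive(fit))                  # TRUE: the identity matches enumeration
length(unique(bd2))                             # few distinct BD2 values: many ties
# "Both": rank by BD2 first and break ties with MBD (deepest first).
both_order <- order(bd2, mbd, decreasing = TRUE)
```

BD2 can never exceed MBD. A band that contains the whole curve contributes 1 to BD2 and also 1 to MBD, while a band that contains the curve only on part of \(\mathcal{T}\) contributes to MBD alone.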
Arguments
fit
a p-by-n functional data matrix, where n is the number of curves and p is defined below.
x
For fbplot, x gives the x coordinates of the curves. Defaults to 1:p, where p is the number of x coordinates.
For boxplot.fd, boxplot.fdPar and boxplot.fdSmooth, x is an object of class fd, fdPar or fdSmooth, respectively.
z
The coordinates of the curves, labeled x for fbplot. For boxplot.fd,
boxplot.fdPar and boxplot.fdSmooth, this argument cannot be named x, because that would
clash with the generic boxplot(x, …) standard.
method
the method used to compute band depth: one of “BD2”, “MBD”
or “Both”, with a default of “MBD”. See also Details.
depth
a vector giving the band depths of the curves. If missing, the band depths
are computed.
plot
logical. If TRUE (the default) then a functional boxplot is produced. If
not, band depth and outliers are returned.
prob
a vector giving the probabilities of central regions in decreasing
order. When several probabilities are given, an enhanced functional boxplot is produced.
Defaults to 0.5, which gives a standard functional boxplot.
color
a vector giving the colors of central regions from light to dark for an
enhanced functional boxplot. Defaults to magenta for a functional
boxplot.
outliercol
color of outlying curves. Defaults to red.
barcol
color of bars in a functional boxplot. Defaults to blue.
fullout
logical for plotting outlying curves. If FALSE (the default), only the part outside the box is plotted. If TRUE, the complete outlying curves are plotted.
factor
the constant factor used to inflate the middle box and determine the fences for
outliers. Defaults to 1.5, as in a classical boxplot.
xlim
x-axis limits
ylim
y-axis limits
…
For fbplot, optional arguments for plot.
For boxplot.fd, boxplot.fdPar, or boxplot.fdSmooth, optional arguments for fbplot.
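The roles of `prob` and `factor` can be illustrated with a short base-R sketch of the outlier rule of Sun and Genton (2011):
1. The central region is the pointwise envelope of the \(\lceil n \cdot \text{prob} \rceil\) deepest curves.
2. The envelope's pointwise range, multiplied by `factor`, is added below and above it to form the fences.
3. Any curve that crosses a fence at some grid point is flagged.

The helper `fb_outliers` is hypothetical, and depths are computed with the rank form of MBD. Details such as strict versus non-strict inequalities at the fences may differ from `fbplot`.

```r
# Rank-based MBD (assumes no ties at any grid point).
mbd_fast <- function(fit) {
  n <- ncol(fit)
  r <- t(apply(fit, 1, rank))
  colMeans((r - 1) * (n - r) + (n - 1)) / choose(n, 2)
}

# Hypothetical helper: central region, fences, and outliers of a functional boxplot.
fb_outliers <- function(fit, depth = mbd_fast(fit), prob = 0.5, factor = 1.5) {
  n <- ncol(fit)
  m <- ceiling(n * prob)                                # size of the central region
  central <- fit[, order(depth, decreasing = TRUE)[1:m], drop = FALSE]
  inf <- apply(central, 1, min)                         # lower envelope
  sup <- apply(central, 1, max)                         # upper envelope
  lower <- inf - factor * (sup - inf)                   # fences: envelope widened by
  upper <- sup + factor * (sup - inf)                   # factor * pointwise range
  flagged <- colSums(fit < lower | fit > upper) > 0     # crosses a fence somewhere
  list(medcurve = which.max(depth), outpoint = which(flagged),
       lower = lower, upper = upper)
}

# Data in the spirit of the R example that follows: mean 4t plus a Gaussian
# process with covariance exp(-|s - t|), with two curves shifted by +/-10.
set.seed(4)
p <- 30; n <- 50
tt <- seq(0, 1, length.out = p)
L <- chol(exp(-abs(outer(tt, tt, "-"))))                # upper-triangular factor
fit <- 4 * tt + t(L) %*% matrix(rnorm(n * p), p, n)
fit[, 1] <- fit[, 1] + 10
fit[, 2] <- fit[, 2] - 10
res <- fb_outliers(fit)
res$outpoint                                            # includes curves 1 and 2
```

With `prob = 0.5` and `factor = 1.5`, this mirrors the classical boxplot, where the fences lie 1.5 interquartile ranges beyond the box.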
##################################
# #
# FUNCTIONAL BOXPLOT #
# #
##################################
# Recall what a classical boxplot looks like
x<-rnorm(100,4,3)
boxplot(x)
library(fda)
## Warning: package 'fda' was built under R version 4.4.3
## Loading required package: splines
## Loading required package: fds
## Loading required package: rainbow
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 4.4.3
## Loading required package: pcaPP
## Loading required package: RCurl
## Loading required package: deSolve
## Warning: package 'deSolve' was built under R version 4.4.3
##
## Attaching package: 'fda'
## The following object is masked from 'package:graphics':
##
## matplot
?fbplot
##
## 1. generate 50 random curves with some covariance structure
## model 1 without outliers
##
cov.fun=function(d,k,c,mu){
k*exp(-c*d^mu)
}
n=50
p=30
t=seq(0,1,len=p)
d=dist(t,upper=TRUE,diag=TRUE)
d.matrix=as.matrix(d)
#covariance function in time
t.cov=cov.fun(d.matrix,1,1,1)
# Cholesky Decomposition
L=chol(t.cov) # upper-triangular factor: t(L) %*% L equals t.cov
mu=4*t
e=matrix(rnorm(n*p),p,n)
ydata = mu+t(L)%*%e
#functional boxplot
fbplot(ydata,method='MBD',ylim=c(-11,15))
## $depth
## [1] 0.36179592 0.20457143 0.27297959 0.41703401 0.47515646 0.20380952
## [7] 0.40876190 0.27455782 0.37572789 0.26057143 0.21436735 0.09442177
## [13] 0.45583673 0.41213605 0.43896599 0.47172789 0.49039456 0.38356463
## [19] 0.48571429 0.38481633 0.50878912 0.43336054 0.42394558 0.39825850
## [25] 0.30127891 0.35967347 0.06176871 0.27619048 0.47346939 0.24712925
## [31] 0.47085714 0.48261224 0.43580952 0.26138776 0.49877551 0.46563265
## [37] 0.21980952 0.42672109 0.39281633 0.41654422 0.24386395 0.20114286
## [43] 0.42982313 0.33246259 0.36489796 0.37050340 0.30868027 0.23406803
## [49] 0.47629932 0.39651701
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 21
# The same using boxplot.fd
boxplot.fd(ydata, method='MBD', ylim=c(-11, 15))
## $depth
## [1] 0.36179592 0.20457143 0.27297959 0.41703401 0.47515646 0.20380952
## [7] 0.40876190 0.27455782 0.37572789 0.26057143 0.21436735 0.09442177
## [13] 0.45583673 0.41213605 0.43896599 0.47172789 0.49039456 0.38356463
## [19] 0.48571429 0.38481633 0.50878912 0.43336054 0.42394558 0.39825850
## [25] 0.30127891 0.35967347 0.06176871 0.27619048 0.47346939 0.24712925
## [31] 0.47085714 0.48261224 0.43580952 0.26138776 0.49877551 0.46563265
## [37] 0.21980952 0.42672109 0.39281633 0.41654422 0.24386395 0.20114286
## [43] 0.42982313 0.33246259 0.36489796 0.37050340 0.30868027 0.23406803
## [49] 0.47629932 0.39651701
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 21
# same with default ylim
boxplot.fd(ydata)
## $depth
## [1] 0.36179592 0.20457143 0.27297959 0.41703401 0.47515646 0.20380952
## [7] 0.40876190 0.27455782 0.37572789 0.26057143 0.21436735 0.09442177
## [13] 0.45583673 0.41213605 0.43896599 0.47172789 0.49039456 0.38356463
## [19] 0.48571429 0.38481633 0.50878912 0.43336054 0.42394558 0.39825850
## [25] 0.30127891 0.35967347 0.06176871 0.27619048 0.47346939 0.24712925
## [31] 0.47085714 0.48261224 0.43580952 0.26138776 0.49877551 0.46563265
## [37] 0.21980952 0.42672109 0.39281633 0.41654422 0.24386395 0.20114286
## [43] 0.42982313 0.33246259 0.36489796 0.37050340 0.30868027 0.23406803
## [49] 0.47629932 0.39651701
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 21
##
## 2. as an fd object
##
T = dim(ydata)[1]
time = seq(0,T,len=T)
ybasis = create.bspline.basis(c(0,T), 23)
Yfd = smooth.basis(time, ydata, ybasis)$fd
boxplot(Yfd)
## $depth
## [1] 0.36182663 0.20171348 0.27316226 0.41565569 0.47721156 0.19508588
## [7] 0.41224490 0.27540917 0.37771671 0.26145888 0.20805011 0.09507375
## [13] 0.45828248 0.41279450 0.44004849 0.47533643 0.49219640 0.38678521
## [19] 0.48560113 0.38846636 0.51186906 0.42794100 0.42364114 0.39687210
## [25] 0.30233987 0.36219842 0.06288947 0.27355021 0.47037381 0.24804203
## [31] 0.47292786 0.48453425 0.43806021 0.25863003 0.49738533 0.46461911
## [37] 0.22094969 0.43448778 0.38604162 0.41827440 0.24290160 0.19463326
## [43] 0.42878157 0.32681350 0.36473631 0.37223682 0.30596080 0.23658113
## [49] 0.47876339 0.39884421
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 21
##
## 3. as an fdPar object
##
Ypar <- fdPar(Yfd)
boxplot(Ypar)
## $depth
## [1] 0.36182663 0.20171348 0.27316226 0.41565569 0.47721156 0.19508588
## [7] 0.41224490 0.27540917 0.37771671 0.26145888 0.20805011 0.09507375
## [13] 0.45828248 0.41279450 0.44004849 0.47533643 0.49219640 0.38678521
## [19] 0.48560113 0.38846636 0.51186906 0.42794100 0.42364114 0.39687210
## [25] 0.30233987 0.36219842 0.06288947 0.27355021 0.47037381 0.24804203
## [31] 0.47292786 0.48453425 0.43806021 0.25863003 0.49738533 0.46461911
## [37] 0.22094969 0.43448778 0.38604162 0.41827440 0.24290160 0.19463326
## [43] 0.42878157 0.32681350 0.36473631 0.37223682 0.30596080 0.23658113
## [49] 0.47876339 0.39884421
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 21
##
## 4. Smoothed version
##
Ysmooth <- smooth.fdPar(Yfd)
boxplot(Ysmooth)
## $depth
## rep1 rep2 rep3 rep4 rep5 rep6 rep7
## 0.36182663 0.20171348 0.27332390 0.41565569 0.47721156 0.19508588 0.41224490
## rep8 rep9 rep10 rep11 rep12 rep13 rep14
## 0.27540917 0.37771671 0.26145888 0.20792079 0.09507375 0.45828248 0.41276217
## rep15 rep16 rep17 rep18 rep19 rep20 rep21
## 0.44004849 0.47533643 0.49222873 0.38678521 0.48576278 0.38825621 0.51186906
## rep22 rep23 rep24 rep25 rep26 rep27 rep28
## 0.42818347 0.42364114 0.39693675 0.30227521 0.36219842 0.06288947 0.27355021
## rep29 rep30 rep31 rep32 rep33 rep34 rep35
## 0.47037381 0.24788038 0.47292786 0.48469590 0.43806021 0.25863003 0.49738533
## rep36 rep37 rep38 rep39 rep40 rep41 rep42
## 0.46445747 0.22094969 0.43448778 0.38604162 0.41840372 0.24290160 0.19484340
## rep43 rep44 rep45 rep46 rep47 rep48 rep49
## 0.42878157 0.32681350 0.36473631 0.37207517 0.30571833 0.23658113 0.47876339
## rep50
## 0.39884421
##
## $outpoint
## integer(0)
##
## $medcurve
## rep21
## 21
##
## 5. model 2 with outliers
##
#magnitude
k=6
#randomly introduce outliers
C=rbinom(n,1,0.1)
s=2*rbinom(n,1,0.5)-1
cs.m=matrix(C*s,p,n,byrow=TRUE)
e=matrix(rnorm(n*p),p,n)
y=mu+t(L)%*%e+k*cs.m
#functional boxplot
fbplot(y,method='MBD',ylim=c(-11,15))
## $depth
## [1] 0.46389116 0.27281633 0.04522449 0.45093878 0.36255782 0.26595918
## [7] 0.47983673 0.48010884 0.06481633 0.07395918 0.31216327 0.36331973
## [13] 0.42329252 0.21447619 0.46987755 0.39374150 0.45175510 0.33893878
## [19] 0.34051701 0.41344218 0.29627211 0.45627211 0.31058503 0.47521088
## [25] 0.05436735 0.45159184 0.27042177 0.45605442 0.36522449 0.47945578
## [31] 0.16859864 0.48310204 0.19183673 0.41017687 0.44745578 0.36348299
## [37] 0.43482993 0.43700680 0.16936054 0.40772789 0.45855782 0.47695238
## [43] 0.39755102 0.35488435 0.26971429 0.49703401 0.44625850 0.34160544
## [49] 0.49327891 0.45349660
##
## $outpoint
## [1] 3 9 10 25
##
## $medcurve
## [1] 46
##
## 1. generate 50 random curves with some covariance structure
## model 1 without outliers
##
cov.fun=function(d,k,c,mu){
k*exp(-c*d^mu)
}
n=50
p=30
t=seq(0,1,len=p)
d=dist(t,upper=TRUE,diag=TRUE)
d.matrix=as.matrix(d)
#covariance function in time
t.cov=cov.fun(d.matrix,1,1,1)
# Cholesky Decomposition
L=chol(t.cov)
mu=4*t
e=matrix(rnorm(n*p),p,n)
ydata = mu+t(L)%*%e
#functional boxplot
oldpar <- par(no.readonly=TRUE)
fbplot(ydata,method='MBD',ylim=c(-11,15))
## $depth
## [1] 0.41915646 0.44152381 0.40065306 0.37921088 0.38922449 0.41523810
## [7] 0.49028571 0.46383673 0.19390476 0.36087075 0.18443537 0.20691156
## [13] 0.41001361 0.47613605 0.42634014 0.37736054 0.51069388 0.18829932
## [19] 0.32435374 0.40691156 0.25224490 0.44038095 0.45474830 0.44571429
## [25] 0.43673469 0.39869388 0.18323810 0.29453061 0.24043537 0.39472109
## [31] 0.46770068 0.44136054 0.47858503 0.29529252 0.35352381 0.38525170
## [37] 0.47755102 0.34095238 0.09268027 0.47341497 0.20995918 0.28576871
## [43] 0.47640816 0.48277551 0.20810884 0.41665306 0.32772789 0.40990476
## [49] 0.30628571 0.06329252
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 17
# The same using boxplot.fd
boxplot.fd(ydata, method='MBD', ylim=c(-11, 15))
## $depth
## [1] 0.41915646 0.44152381 0.40065306 0.37921088 0.38922449 0.41523810
## [7] 0.49028571 0.46383673 0.19390476 0.36087075 0.18443537 0.20691156
## [13] 0.41001361 0.47613605 0.42634014 0.37736054 0.51069388 0.18829932
## [19] 0.32435374 0.40691156 0.25224490 0.44038095 0.45474830 0.44571429
## [25] 0.43673469 0.39869388 0.18323810 0.29453061 0.24043537 0.39472109
## [31] 0.46770068 0.44136054 0.47858503 0.29529252 0.35352381 0.38525170
## [37] 0.47755102 0.34095238 0.09268027 0.47341497 0.20995918 0.28576871
## [43] 0.47640816 0.48277551 0.20810884 0.41665306 0.32772789 0.40990476
## [49] 0.30628571 0.06329252
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 17
# same with default ylim
boxplot.fd(ydata)
## $depth
## [1] 0.41915646 0.44152381 0.40065306 0.37921088 0.38922449 0.41523810
## [7] 0.49028571 0.46383673 0.19390476 0.36087075 0.18443537 0.20691156
## [13] 0.41001361 0.47613605 0.42634014 0.37736054 0.51069388 0.18829932
## [19] 0.32435374 0.40691156 0.25224490 0.44038095 0.45474830 0.44571429
## [25] 0.43673469 0.39869388 0.18323810 0.29453061 0.24043537 0.39472109
## [31] 0.46770068 0.44136054 0.47858503 0.29529252 0.35352381 0.38525170
## [37] 0.47755102 0.34095238 0.09268027 0.47341497 0.20995918 0.28576871
## [43] 0.47640816 0.48277551 0.20810884 0.41665306 0.32772789 0.40990476
## [49] 0.30628571 0.06329252
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 17
##
## 2. as an fd object
##
T = dim(ydata)[1]
time = seq(0,T,len=T)
ybasis = create.bspline.basis(c(0,T), 23)
Yfd = smooth.basis(time, ydata, ybasis)$fd
boxplot(Yfd)
## $depth
## [1] 0.4203920 0.4393211 0.3991998 0.3791231 0.3938978 0.4131663 0.4913073
## [8] 0.4666559 0.1900909 0.3642190 0.1798262 0.2121722 0.4041948 0.4823682
## [15] 0.4300748 0.3784765 0.5113356 0.1896545 0.3198464 0.4043403 0.2492221
## [22] 0.4416003 0.4536432 0.4464013 0.4411800 0.3948353 0.1728592 0.2879046
## [29] 0.2418509 0.3966943 0.4723783 0.4461103 0.4790867 0.2934169 0.3478440
## [36] 0.3850717 0.4810103 0.3401819 0.0868135 0.4792322 0.2131905 0.2836694
## [43] 0.4740594 0.4812043 0.2108305 0.4137644 0.3302243 0.4175308 0.3070762
## [50] 0.0614508
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 17
##
## 3. as an fdPar object
##
Ypar <- fdPar(Yfd)
boxplot(Ypar)
## $depth
## [1] 0.4203920 0.4393211 0.3991998 0.3791231 0.3938978 0.4131663 0.4913073
## [8] 0.4666559 0.1900909 0.3642190 0.1798262 0.2121722 0.4041948 0.4823682
## [15] 0.4300748 0.3784765 0.5113356 0.1896545 0.3198464 0.4043403 0.2492221
## [22] 0.4416003 0.4536432 0.4464013 0.4411800 0.3948353 0.1728592 0.2879046
## [29] 0.2418509 0.3966943 0.4723783 0.4461103 0.4790867 0.2934169 0.3478440
## [36] 0.3850717 0.4810103 0.3401819 0.0868135 0.4792322 0.2131905 0.2836694
## [43] 0.4740594 0.4812043 0.2108305 0.4137644 0.3302243 0.4175308 0.3070762
## [50] 0.0614508
##
## $outpoint
## integer(0)
##
## $medcurve
## [1] 17
##
## 4. Smoothed version
##
Ysmooth <- smooth.fdPar(Yfd)
boxplot(Ysmooth)
## $depth
## rep1 rep2 rep3 rep4 rep5 rep6 rep7 rep8
## 0.4206830 0.4393211 0.3991998 0.3792524 0.3936068 0.4131663 0.4912265 0.4666559
## rep9 rep10 rep11 rep12 rep13 rep14 rep15 rep16
## 0.1900909 0.3643807 0.1798262 0.2121722 0.4044696 0.4821095 0.4303819 0.3784765
## rep17 rep18 rep19 rep20 rep21 rep22 rep23 rep24
## 0.5113033 0.1900263 0.3198464 0.4043403 0.2490604 0.4416811 0.4536432 0.4462720
## rep25 rep26 rep27 rep28 rep29 rep30 rep31 rep32
## 0.4410346 0.3948353 0.1725359 0.2879046 0.2414791 0.3966943 0.4729440 0.4461103
## rep33 rep34 rep35 rep36 rep37 rep38 rep39 rep40
## 0.4790867 0.2931420 0.3478440 0.3852980 0.4810911 0.3402142 0.0868135 0.4792807
## rep41 rep42 rep43 rep44 rep45 rep46 rep47 rep48
## 0.2131905 0.2836209 0.4740594 0.4812528 0.2108305 0.4139745 0.3301596 0.4171752
## rep49 rep50
## 0.3067852 0.0614508
##
## $outpoint
## integer(0)
##
## $medcurve
## rep17
## 17
##
## 5. model 2 with outliers
##
#magnitude
k=6
#randomly introduce outliers
C=rbinom(n,1,0.1)
s=2*rbinom(n,1,0.5)-1
cs.m=matrix(C*s,p,n,byrow=TRUE)
e=matrix(rnorm(n*p),p,n)
y=mu+t(L)%*%e+k*cs.m
#functional boxplot
fbplot(y,method='MBD',ylim=c(-11,15))
## $depth
## [1] 0.34677551 0.39945578 0.42367347 0.46220408 0.06873469 0.37398639
## [7] 0.40277551 0.08658503 0.42606803 0.34944218 0.41670748 0.47363265
## [13] 0.47314286 0.22579592 0.41186395 0.21888435 0.43423129 0.43444898
## [19] 0.48092517 0.48440816 0.48076190 0.48565986 0.05044898 0.19613605
## [25] 0.26889796 0.43580952 0.39009524 0.23891156 0.27657143 0.16250340
## [31] 0.34323810 0.44342857 0.49420408 0.45148299 0.46579592 0.48985034
## [37] 0.43967347 0.19063946 0.37502041 0.44571429 0.50133333 0.33246259
## [43] 0.34965986 0.37229932 0.04000000 0.32979592 0.35640816 0.46563265
## [49] 0.39989116 0.33393197
##
## $outpoint
## [1] 5 23 45
##
## $medcurve
## [1] 41
par(oldpar)
References
Sun, Y., Genton, M. G. and Nychka, D. (2012), “Exact fast computation of band depth for large functional datasets: How quickly can one million curves be ranked?” Stat, 1, 68-74.
Sun, Y. and Genton, M. G. (2011), “Functional Boxplots,” Journal of Computational and Graphical Statistics, 20, 316-334.
López-Pintado, S. and Romo, J. (2009), “On the concept of depth for functional data,” Journal of the American Statistical Association, 104, 718-734.
Ramsay, J. O., Hooker, G. and Graves, S. (2009), Functional Data Analysis with R and MATLAB, Springer, New York.
Ramsay, J. O. and Silverman, B. W. (2005), Functional Data Analysis, 2nd ed., Springer, New York.
Ramsay, J. O. and Silverman, B. W. (2002), Applied Functional Data Analysis, Springer, New York.