FUNCTIONAL BOXPLOT

Functional Depths for Functional Data Analysis

This document explains the concept of functional depth, a fundamental tool in functional data analysis that extends the notion of statistical depth to infinite-dimensional spaces. We present several popular functional depth measures, their properties, and the functional boxplot built on them in the R package fda.

Introduction to Functional Depth

Functional depth is a generalization of multivariate statistical depth to functional data, providing a center-outward ordering of curves in a functional space \(\mathcal{F}\). Given a sample of curves \(\{X_i(t)\}_{i=1}^n\), \(t \in \mathcal{T}\), a depth function \(D: \mathcal{F} \to \mathbb{R}^+\) assigns higher values to more central curves.

Properties of Functional Depth

A good functional depth should satisfy:

  • Invariance to transformations such as translation and scaling
  • Monotonicity: depth decreases as a curve moves away from the deepest function
  • Maximality at the center of symmetry for symmetric distributions
  • Vanishing: \(D(X) \to 0\) as \(\|X\| \to \infty\)

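Integrated Depth

The integrated depth combines pointwise univariate depths:

\[ D_I(X) = \int_{\mathcal{T}} D_t(X(t)) \, dt \]

where \(D_t\) is a univariate depth computed from the sample values \(X_1(t), \ldots, X_n(t)\) at time \(t\), and \(\mathcal{T}\) is rescaled to have unit length so that \(D_I\) stays on the same scale as \(D_t\).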
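As a minimal sketch, assume the curves are stored as the columns of a p-by-n matrix observed on a common equispaced grid of \(\mathcal{T} = [0, 1]\), so that the integral can be approximated by the average over grid points. The univariate depth used here, \(D_t(x) = 1 - |1/2 - F_{n,t}(x)|\) with \(F_{n,t}\) the empirical distribution function of the sample at time \(t\), is one common choice (the one proposed by Fraiman and Muniz); the function name `integrated_depth` is hypothetical.

```r
# Integrated depth for curves stored as columns of a p-by-n matrix Y,
# observed on a common equispaced grid of T = [0, 1] (an assumption).
integrated_depth <- function(Y) {
  # Empirical cdf of the sample at each time point, evaluated at every curve:
  # apply() returns an n-by-p matrix, so transpose back to p-by-n.
  Fn <- t(apply(Y, 1, function(y) ecdf(y)(y)))
  # Pointwise univariate depth D_t(X(t)) = 1 - |1/2 - F_n,t(X(t))|.
  Dt <- 1 - abs(0.5 - Fn)
  # Averaging over the equispaced grid approximates the integral over [0, 1].
  colMeans(Dt)
}

# Usage: three parallel horizontal lines at levels 0, 1 and 2.
Y <- matrix(c(0, 0, 1, 1, 2, 2), nrow = 2)
integrated_depth(Y)   # 0.8333333 0.8333333 0.5000000
```

Because \(F_{n,t}\) is right-continuous, the lowest curve receives a higher pointwise depth than the highest one in small samples; a symmetric univariate depth would avoid this.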

Modified Band Depth (MBD)

The MBD measures, averaged over all bands formed by pairs of sample curves, the proportion of time that a curve lies within the band:

\[ \text{MBD}(X) = \binom{n}{2}^{-1} \sum_{1 \leq i < j \leq n} \lambda\{t \in \mathcal{T} : \min(X_i(t), X_j(t)) \leq X(t) \leq \max(X_i(t), X_j(t))\} \]

where \(\lambda\) is the Lebesgue measure on \(\mathcal{T}\), normalized so that \(\lambda(\mathcal{T}) = 1\); this makes each term a proportion of time.
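The definition can be transcribed directly. The sketch below assumes the curves are the columns of a p-by-n matrix observed on a common equispaced grid, so the normalized Lebesgue measure of the set in braces becomes the fraction of grid points at which the curve lies inside the band. Every sample curve plays the role of \(X\) in turn, and `mbd_naive` is a hypothetical name. It costs \(O(n^3 p)\) operations and is only meant to make the formula concrete.

```r
# Modified band depth computed directly from its definition.
# Y: p-by-n matrix, one curve per column, observed on a common equispaced grid.
mbd_naive <- function(Y) {
  n <- ncol(Y)
  pairs <- combn(n, 2)                  # 2 x choose(n, 2): all pairs i < j
  depth <- numeric(n)
  for (k in seq_len(n)) {
    # For each band (i, j): fraction of grid points where curve k lies inside,
    # i.e. the normalized Lebesgue measure of the set in the definition.
    inside <- apply(pairs, 2, function(ij) {
      lo <- pmin(Y[, ij[1]], Y[, ij[2]])
      hi <- pmax(Y[, ij[1]], Y[, ij[2]])
      mean(Y[, k] >= lo & Y[, k] <= hi)
    })
    depth[k] <- mean(inside)            # average over the choose(n, 2) bands
  }
  depth
}

# Usage: the middle of three parallel lines lies inside every band.
Y <- matrix(c(0, 0, 1, 1, 2, 2), nrow = 2)
mbd_naive(Y)   # 0.6666667 1.0000000 0.6666667
```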

Functional Halfspace Depth

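An extension of Tukey’s halfspace depth: the depth of a curve \(X\) is the smallest fraction of sample curves lying on one side of \(X\) when all curves are projected onto a direction \(v\):

\[ D_H(X) = \inf_{v \in V} \min\left\{ \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{v(X_i) \leq v(X)\},\ \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{v(X_i) \geq v(X)\} \right\} \]

where \(V\) is a set of directions (continuous linear functionals) in the functional space and \(X_1, \ldots, X_n\) is the sample, as in the MBD definition.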
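The infimum over all directions cannot be computed exactly, so a common approximation takes the minimum over a finite set of random directions, which only gives an upper bound on the depth. The sketch below rests on assumptions that are not part of the text: curves are columns of a p-by-n matrix on a common equispaced grid of \([0, 1]\), a direction \(v\) is a vector of Gaussian weights, and \(v(X)\) is approximated by the Riemann sum of \(v(t)X(t)\). The name `halfspace_depth_approx` and the default number of directions are hypothetical.

```r
# Approximate functional halfspace (Tukey) depth: minimum over a finite set of
# random directions instead of the infimum over all of V (an upper bound).
# Y: p-by-n matrix, one curve per column, on an equispaced grid of [0, 1].
halfspace_depth_approx <- function(Y, n_dir = 500) {
  p <- nrow(Y)
  n <- ncol(Y)
  V <- matrix(rnorm(p * n_dir), p, n_dir)   # random directions v, one per column
  proj <- crossprod(V, Y) / p               # n_dir x n: Riemann sums for v(X_i)
  depth <- numeric(n)
  for (k in seq_len(n)) {
    # Fraction of sample curves on each side of curve k, for every direction.
    below <- rowMeans(proj <= proj[, k])
    above <- rowMeans(proj >= proj[, k])
    depth[k] <- min(pmin(below, above))     # worst direction: smallest halfspace mass
  }
  depth
}

set.seed(1)
Y <- matrix(c(0, 0, 1, 1, 2, 2), nrow = 2)
halfspace_depth_approx(Y)   # 0.3333333 0.6666667 0.3333333
```

For these parallel lines every direction orders the projections the same way (or reversed), so the result does not depend on the random draws.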

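Random Projection Depth

This depth projects the curves onto random directions \(v\) and averages a finite-dimensional depth of the projection of \(X\), computed with respect to the projected sample:

\[ D_R(X) = \mathbb{E}_v\big[D\big(v(X);\, v(X_1), \ldots, v(X_n)\big)\big] \]

where \(D\) is a univariate depth when each \(v\) is a single direction, or a multivariate depth when several directions are used at once.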
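A minimal sketch under stated assumptions: each random direction is one-dimensional (so the inner depth is univariate), directions are Gaussian weight vectors, \(v(X)\) is approximated by a Riemann sum on an equispaced grid of \([0, 1]\), the expectation over \(v\) is replaced by an average over `n_dir` draws, and the univariate depth of a projected value \(z\) is \(\min\{F_n(z), 1 - F_n(z^-)\}\) computed from the projected sample. These are one choice among several; `rp_depth` is a hypothetical name.

```r
# Random projection depth: average, over random directions v, of a univariate
# depth of v(X) computed with respect to the projected sample v(X_1), ..., v(X_n).
# Y: p-by-n matrix, one curve per column, on an equispaced grid of [0, 1].
rp_depth <- function(Y, n_dir = 50) {
  p <- nrow(Y)
  depth <- numeric(ncol(Y))
  for (d in seq_len(n_dir)) {
    v <- rnorm(p)                           # one random direction
    z <- drop(crossprod(v, Y)) / p          # projected sample (Riemann sums)
    # Univariate halfspace depth of each projected value within the sample.
    dz <- vapply(z, function(zk) min(mean(z <= zk), mean(z >= zk)), numeric(1))
    depth <- depth + dz
  }
  depth / n_dir                             # Monte Carlo estimate of E_v[...]
}

set.seed(2)
Y <- matrix(c(0, 0, 1, 1, 2, 2), nrow = 2)
rp_depth(Y)   # approximately 0.333 0.667 0.333
```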

Functional depth provides a robust framework for analyzing functional data: it yields a center-outward ranking of curves, a functional median (the deepest curve), central regions, and outlier detection rules such as the functional boxplot. The choice of depth measure depends on the application and on the characteristics of the data.

Functional Boxplots in R: fbplot

For functional data, the band depth (BD) or modified band depth (MBD) orders a sample of curves from the center outwards and thus provides a way to define functional quantiles and to measure the centrality or outlyingness of an observation. A smaller rank corresponds to a more central position with respect to the sample curves; the deepest curve (the functional median) has rank 1. BD usually produces many ties (curves with the same depth value), whereas MBD rarely does. “BD2” uses two curves to determine a band. The method “Both” uses “BD2” first and then uses “MBD” to break ties. The computation is carried out by the fast algorithm proposed by Sun et al. (2012).
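For MBD, ranks make the computation fast: at a time point where curve \(k\) has rank \(r\) among \(n\) curves (with no ties), exactly \((r-1)(n-r)\) bands formed by two other curves contain it, plus the \(n-1\) bands that use curve \(k\) itself. Averaging over time points and dividing by \(\binom{n}{2}\) gives the MBD in \(O(np \log n)\) time. The sketch below illustrates this rank idea together with a naive BD2 and the “Both” ordering. It is not the code of fbplot or of Sun et al. (2012), it assumes no ties and a common equispaced grid, and the function names are hypothetical.

```r
# Rank-based MBD (assumes no ties at any time point).
# Y: p-by-n matrix, one curve per column, on a common equispaced grid.
mbd_rank <- function(Y) {
  n <- ncol(Y)
  R <- t(apply(Y, 1, rank))            # p x n: rank of each curve at each time
  # Bands of two other curves containing curve k at time t: (r - 1)(n - r);
  # the n - 1 bands that include curve k itself always contain it.
  (colMeans((R - 1) * (n - R)) + (n - 1)) / choose(n, 2)
}

# BD2 from its definition: proportion of bands containing the whole curve.
bd2_naive <- function(Y) {
  n <- ncol(Y)
  pairs <- combn(n, 2)
  vapply(seq_len(n), function(k) {
    mean(apply(pairs, 2, function(ij) {
      all(Y[, k] >= pmin(Y[, ij[1]], Y[, ij[2]]) &
          Y[, k] <= pmax(Y[, ij[1]], Y[, ij[2]]))
    }))
  }, numeric(1))
}

# "Both": order curves from the center outwards by BD2, breaking ties with MBD.
center_outward_both <- function(Y) {
  order(-bd2_naive(Y), -mbd_rank(Y))
}

# Usage: BD2 ties all three curves; MBD separates them.
Y <- cbind(c(0, 0, 0), c(2, 2, 2), c(1, 1, 3))
bd2_naive(Y)             # 0.6666667 0.6666667 0.6666667
mbd_rank(Y)              # 0.6666667 0.7777778 0.8888889
center_outward_both(Y)   # 3 2 1
```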

Arguments

fit
a p-by-n functional data matrix, where n is the number of curves and p is defined below.

x
For fbplot, x gives the x coordinates of the curves. Defaults to 1:p, where p is the number of x coordinates.

For boxplot.fd, boxplot.fdPar and boxplot.fdSmooth, x is an object of class fd, fdPar or fdSmooth, respectively.

z
The coordinates of the curves, i.e. the argument called x in fbplot. For boxplot.fd, boxplot.fdPar and boxplot.fdSmooth it cannot be named x, because that would clash with the generic boxplot(x, …) signature.

method
the method used to compute band depth. One of “BD2”, “MBD” or “Both”; defaults to “MBD”. See the description above.

depth
a vector giving the band depths of the curves. If missing, the band depths are computed.

plot
logical. If TRUE (the default), a functional boxplot is produced. If FALSE, only the band depths and outliers are returned, without a plot.

prob
a vector giving the probabilities of the central regions, in decreasing order. If it has more than one element, an enhanced functional boxplot is produced. Defaults to 0.5, which gives the standard functional boxplot.

color
a vector giving the colors of the central regions, from light to dark, for an enhanced functional boxplot. Defaults to magenta for a functional boxplot.

outliercol
color of outlying curves. Defaults to red.

barcol
color of bars in a functional boxplot. Defaults to blue.

fullout
logical, controlling how outlying curves are plotted. If FALSE (the default), only the parts outside the box are plotted. If TRUE, complete outlying curves are plotted.

factor
the constant factor used to inflate the middle box and determine the fences for outliers. Defaults to 1.5, as in a classical boxplot.

xlim
x-axis limits

ylim
y-axis limits

…
For fbplot, optional arguments passed to plot.

For boxplot.fd, boxplot.fdPar or boxplot.fdSmooth, optional arguments passed to fbplot.
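The roles of prob, factor and depth can be sketched in base R, following the construction described by Sun and Genton (2011): the central region is the envelope of the deepest \(\lceil \text{prob} \cdot n \rceil\) curves, the fences are that envelope inflated by factor times its pointwise range, and a curve is flagged as an outlier if it crosses a fence anywhere. This is an illustration under those assumptions, not fbplot's source; it takes a precomputed depth vector (as the depth argument does), and the name `fb_fences` is hypothetical.

```r
# Central region, fences and outliers of a functional boxplot, given depths.
# Y: p-by-n matrix, one curve per column; depth: one value per curve.
fb_fences <- function(Y, depth, prob = 0.5, factor = 1.5) {
  m <- ceiling(prob * ncol(Y))
  central <- order(depth, decreasing = TRUE)[seq_len(m)]  # deepest m curves
  inf <- apply(Y[, central, drop = FALSE], 1, min)         # lower envelope
  sup <- apply(Y[, central, drop = FALSE], 1, max)         # upper envelope
  spread <- factor * (sup - inf)                           # inflation at each t
  lower <- inf - spread
  upper <- sup + spread
  # A curve is an outlier if it leaves the fences at any time point.
  outpoint <- which(colSums(Y < lower | Y > upper) > 0)
  list(central = central, lower = lower, upper = upper, outpoint = outpoint)
}

# Usage: five flat curves, the last one far away; depths supplied by hand.
Y <- sapply(c(0, 1, 2, 3, 100), function(level) rep(level, 3))
res <- fb_fences(Y, depth = c(0.5, 0.8, 1, 0.8, 0.1))
res$outpoint   # 5
```

Here the central region spans levels 1 to 3, so the fences sit at -2 and 6 and only the curve at level 100 is flagged.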

##################################
#                                #
#       FUNCTIONAL BOXPLOT       #
#                                #
##################################




# Recall what a classical boxplot looks like
x<-rnorm(100,4,3)
boxplot(x)

library(fda)
?fbplot
##
## 1.  generate 50 random curves with some covariance structure
##     model 1 without outliers
##
cov.fun=function(d,k,c,mu){
  k*exp(-c*d^mu)
}
n=50
p=30
t=seq(0,1,len=p)
d=dist(t,upper=TRUE,diag=TRUE)
d.matrix=as.matrix(d)
#covariance function in time
t.cov=cov.fun(d.matrix,1,1,1)
# Cholesky Decomposition
L=chol(t.cov)  # upper-triangular factor: t(L) %*% L equals t.cov
mu=4*t
e=matrix(rnorm(n*p),p,n)
ydata  = mu+t(L)%*%e

#functional boxplot
fbplot(ydata,method='MBD',ylim=c(-11,15))

## $depth
##  [1] 0.36179592 0.20457143 0.27297959 0.41703401 0.47515646 0.20380952
##  [7] 0.40876190 0.27455782 0.37572789 0.26057143 0.21436735 0.09442177
## [13] 0.45583673 0.41213605 0.43896599 0.47172789 0.49039456 0.38356463
## [19] 0.48571429 0.38481633 0.50878912 0.43336054 0.42394558 0.39825850
## [25] 0.30127891 0.35967347 0.06176871 0.27619048 0.47346939 0.24712925
## [31] 0.47085714 0.48261224 0.43580952 0.26138776 0.49877551 0.46563265
## [37] 0.21980952 0.42672109 0.39281633 0.41654422 0.24386395 0.20114286
## [43] 0.42982313 0.33246259 0.36489796 0.37050340 0.30868027 0.23406803
## [49] 0.47629932 0.39651701
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 21
# The same using boxplot.fd
boxplot.fd(ydata, method='MBD', ylim=c(-11, 15))
## $depth
##  [1] 0.36179592 0.20457143 0.27297959 0.41703401 0.47515646 0.20380952
##  [7] 0.40876190 0.27455782 0.37572789 0.26057143 0.21436735 0.09442177
## [13] 0.45583673 0.41213605 0.43896599 0.47172789 0.49039456 0.38356463
## [19] 0.48571429 0.38481633 0.50878912 0.43336054 0.42394558 0.39825850
## [25] 0.30127891 0.35967347 0.06176871 0.27619048 0.47346939 0.24712925
## [31] 0.47085714 0.48261224 0.43580952 0.26138776 0.49877551 0.46563265
## [37] 0.21980952 0.42672109 0.39281633 0.41654422 0.24386395 0.20114286
## [43] 0.42982313 0.33246259 0.36489796 0.37050340 0.30868027 0.23406803
## [49] 0.47629932 0.39651701
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 21
# same with default ylim
boxplot.fd(ydata)

## $depth
##  [1] 0.36179592 0.20457143 0.27297959 0.41703401 0.47515646 0.20380952
##  [7] 0.40876190 0.27455782 0.37572789 0.26057143 0.21436735 0.09442177
## [13] 0.45583673 0.41213605 0.43896599 0.47172789 0.49039456 0.38356463
## [19] 0.48571429 0.38481633 0.50878912 0.43336054 0.42394558 0.39825850
## [25] 0.30127891 0.35967347 0.06176871 0.27619048 0.47346939 0.24712925
## [31] 0.47085714 0.48261224 0.43580952 0.26138776 0.49877551 0.46563265
## [37] 0.21980952 0.42672109 0.39281633 0.41654422 0.24386395 0.20114286
## [43] 0.42982313 0.33246259 0.36489796 0.37050340 0.30868027 0.23406803
## [49] 0.47629932 0.39651701
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 21
##
## 2.  as an fd object
##
T      = dim(ydata)[1]
time   = seq(0,T,len=T)
ybasis = create.bspline.basis(c(0,T), 23)
Yfd    = smooth.basis(time, ydata, ybasis)$fd
boxplot(Yfd)

## $depth
##  [1] 0.36182663 0.20171348 0.27316226 0.41565569 0.47721156 0.19508588
##  [7] 0.41224490 0.27540917 0.37771671 0.26145888 0.20805011 0.09507375
## [13] 0.45828248 0.41279450 0.44004849 0.47533643 0.49219640 0.38678521
## [19] 0.48560113 0.38846636 0.51186906 0.42794100 0.42364114 0.39687210
## [25] 0.30233987 0.36219842 0.06288947 0.27355021 0.47037381 0.24804203
## [31] 0.47292786 0.48453425 0.43806021 0.25863003 0.49738533 0.46461911
## [37] 0.22094969 0.43448778 0.38604162 0.41827440 0.24290160 0.19463326
## [43] 0.42878157 0.32681350 0.36473631 0.37223682 0.30596080 0.23658113
## [49] 0.47876339 0.39884421
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 21
##
## 3.  as an fdPar object
##
Ypar <- fdPar(Yfd)
boxplot(Ypar)
## $depth
##  [1] 0.36182663 0.20171348 0.27316226 0.41565569 0.47721156 0.19508588
##  [7] 0.41224490 0.27540917 0.37771671 0.26145888 0.20805011 0.09507375
## [13] 0.45828248 0.41279450 0.44004849 0.47533643 0.49219640 0.38678521
## [19] 0.48560113 0.38846636 0.51186906 0.42794100 0.42364114 0.39687210
## [25] 0.30233987 0.36219842 0.06288947 0.27355021 0.47037381 0.24804203
## [31] 0.47292786 0.48453425 0.43806021 0.25863003 0.49738533 0.46461911
## [37] 0.22094969 0.43448778 0.38604162 0.41827440 0.24290160 0.19463326
## [43] 0.42878157 0.32681350 0.36473631 0.37223682 0.30596080 0.23658113
## [49] 0.47876339 0.39884421
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 21
##
## 4.  Smoothed version
##
Ysmooth <- smooth.fdPar(Yfd)
boxplot(Ysmooth)

## $depth
##       rep1       rep2       rep3       rep4       rep5       rep6       rep7 
## 0.36182663 0.20171348 0.27332390 0.41565569 0.47721156 0.19508588 0.41224490 
##       rep8       rep9      rep10      rep11      rep12      rep13      rep14 
## 0.27540917 0.37771671 0.26145888 0.20792079 0.09507375 0.45828248 0.41276217 
##      rep15      rep16      rep17      rep18      rep19      rep20      rep21 
## 0.44004849 0.47533643 0.49222873 0.38678521 0.48576278 0.38825621 0.51186906 
##      rep22      rep23      rep24      rep25      rep26      rep27      rep28 
## 0.42818347 0.42364114 0.39693675 0.30227521 0.36219842 0.06288947 0.27355021 
##      rep29      rep30      rep31      rep32      rep33      rep34      rep35 
## 0.47037381 0.24788038 0.47292786 0.48469590 0.43806021 0.25863003 0.49738533 
##      rep36      rep37      rep38      rep39      rep40      rep41      rep42 
## 0.46445747 0.22094969 0.43448778 0.38604162 0.41840372 0.24290160 0.19484340 
##      rep43      rep44      rep45      rep46      rep47      rep48      rep49 
## 0.42878157 0.32681350 0.36473631 0.37207517 0.30571833 0.23658113 0.47876339 
##      rep50 
## 0.39884421 
## 
## $outpoint
## integer(0)
## 
## $medcurve
## rep21 
##    21
##
## 5.  model 2 with outliers
##
#magnitude
k=6
#randomly introduce outliers
C=rbinom(n,1,0.1)
s=2*rbinom(n,1,0.5)-1
cs.m=matrix(C*s,p,n,byrow=TRUE)

e=matrix(rnorm(n*p),p,n)
y=mu+t(L)%*%e+k*cs.m

#functional boxplot
fbplot(y,method='MBD',ylim=c(-11,15))

## $depth
##  [1] 0.46389116 0.27281633 0.04522449 0.45093878 0.36255782 0.26595918
##  [7] 0.47983673 0.48010884 0.06481633 0.07395918 0.31216327 0.36331973
## [13] 0.42329252 0.21447619 0.46987755 0.39374150 0.45175510 0.33893878
## [19] 0.34051701 0.41344218 0.29627211 0.45627211 0.31058503 0.47521088
## [25] 0.05436735 0.45159184 0.27042177 0.45605442 0.36522449 0.47945578
## [31] 0.16859864 0.48310204 0.19183673 0.41017687 0.44745578 0.36348299
## [37] 0.43482993 0.43700680 0.16936054 0.40772789 0.45855782 0.47695238
## [43] 0.39755102 0.35488435 0.26971429 0.49703401 0.44625850 0.34160544
## [49] 0.49327891 0.45349660
## 
## $outpoint
## [1]  3  9 10 25
## 
## $medcurve
## [1] 46
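##
## Second run of the same example code. The curves are regenerated
## without a fixed seed, so the depths, median curve and outliers
## differ from the first run; this run also saves and restores the
## graphics parameters with par().
##
## 1.  generate 50 random curves with some covariance structure
##     model 1 without outliers
##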
cov.fun=function(d,k,c,mu){
        k*exp(-c*d^mu)
}
n=50
p=30
t=seq(0,1,len=p)
d=dist(t,upper=TRUE,diag=TRUE)
d.matrix=as.matrix(d)
#covariance function in time
t.cov=cov.fun(d.matrix,1,1,1)
# Cholesky Decomposition
L=chol(t.cov)
mu=4*t
e=matrix(rnorm(n*p),p,n)
ydata  = mu+t(L)%*%e

#functional boxplot
oldpar <- par(no.readonly=TRUE)
fbplot(ydata,method='MBD',ylim=c(-11,15))

## $depth
##  [1] 0.41915646 0.44152381 0.40065306 0.37921088 0.38922449 0.41523810
##  [7] 0.49028571 0.46383673 0.19390476 0.36087075 0.18443537 0.20691156
## [13] 0.41001361 0.47613605 0.42634014 0.37736054 0.51069388 0.18829932
## [19] 0.32435374 0.40691156 0.25224490 0.44038095 0.45474830 0.44571429
## [25] 0.43673469 0.39869388 0.18323810 0.29453061 0.24043537 0.39472109
## [31] 0.46770068 0.44136054 0.47858503 0.29529252 0.35352381 0.38525170
## [37] 0.47755102 0.34095238 0.09268027 0.47341497 0.20995918 0.28576871
## [43] 0.47640816 0.48277551 0.20810884 0.41665306 0.32772789 0.40990476
## [49] 0.30628571 0.06329252
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 17
# The same using boxplot.fd
boxplot.fd(ydata, method='MBD', ylim=c(-11, 15))
## $depth
##  [1] 0.41915646 0.44152381 0.40065306 0.37921088 0.38922449 0.41523810
##  [7] 0.49028571 0.46383673 0.19390476 0.36087075 0.18443537 0.20691156
## [13] 0.41001361 0.47613605 0.42634014 0.37736054 0.51069388 0.18829932
## [19] 0.32435374 0.40691156 0.25224490 0.44038095 0.45474830 0.44571429
## [25] 0.43673469 0.39869388 0.18323810 0.29453061 0.24043537 0.39472109
## [31] 0.46770068 0.44136054 0.47858503 0.29529252 0.35352381 0.38525170
## [37] 0.47755102 0.34095238 0.09268027 0.47341497 0.20995918 0.28576871
## [43] 0.47640816 0.48277551 0.20810884 0.41665306 0.32772789 0.40990476
## [49] 0.30628571 0.06329252
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 17
# same with default ylim
boxplot.fd(ydata)

## $depth
##  [1] 0.41915646 0.44152381 0.40065306 0.37921088 0.38922449 0.41523810
##  [7] 0.49028571 0.46383673 0.19390476 0.36087075 0.18443537 0.20691156
## [13] 0.41001361 0.47613605 0.42634014 0.37736054 0.51069388 0.18829932
## [19] 0.32435374 0.40691156 0.25224490 0.44038095 0.45474830 0.44571429
## [25] 0.43673469 0.39869388 0.18323810 0.29453061 0.24043537 0.39472109
## [31] 0.46770068 0.44136054 0.47858503 0.29529252 0.35352381 0.38525170
## [37] 0.47755102 0.34095238 0.09268027 0.47341497 0.20995918 0.28576871
## [43] 0.47640816 0.48277551 0.20810884 0.41665306 0.32772789 0.40990476
## [49] 0.30628571 0.06329252
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 17
##
## 2.  as an fd object
##
T      = dim(ydata)[1]
time   = seq(0,T,len=T)
ybasis = create.bspline.basis(c(0,T), 23)
Yfd    = smooth.basis(time, ydata, ybasis)$fd
boxplot(Yfd)

## $depth
##  [1] 0.4203920 0.4393211 0.3991998 0.3791231 0.3938978 0.4131663 0.4913073
##  [8] 0.4666559 0.1900909 0.3642190 0.1798262 0.2121722 0.4041948 0.4823682
## [15] 0.4300748 0.3784765 0.5113356 0.1896545 0.3198464 0.4043403 0.2492221
## [22] 0.4416003 0.4536432 0.4464013 0.4411800 0.3948353 0.1728592 0.2879046
## [29] 0.2418509 0.3966943 0.4723783 0.4461103 0.4790867 0.2934169 0.3478440
## [36] 0.3850717 0.4810103 0.3401819 0.0868135 0.4792322 0.2131905 0.2836694
## [43] 0.4740594 0.4812043 0.2108305 0.4137644 0.3302243 0.4175308 0.3070762
## [50] 0.0614508
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 17
##
## 3.  as an fdPar object
##
Ypar <- fdPar(Yfd)
boxplot(Ypar)
## $depth
##  [1] 0.4203920 0.4393211 0.3991998 0.3791231 0.3938978 0.4131663 0.4913073
##  [8] 0.4666559 0.1900909 0.3642190 0.1798262 0.2121722 0.4041948 0.4823682
## [15] 0.4300748 0.3784765 0.5113356 0.1896545 0.3198464 0.4043403 0.2492221
## [22] 0.4416003 0.4536432 0.4464013 0.4411800 0.3948353 0.1728592 0.2879046
## [29] 0.2418509 0.3966943 0.4723783 0.4461103 0.4790867 0.2934169 0.3478440
## [36] 0.3850717 0.4810103 0.3401819 0.0868135 0.4792322 0.2131905 0.2836694
## [43] 0.4740594 0.4812043 0.2108305 0.4137644 0.3302243 0.4175308 0.3070762
## [50] 0.0614508
## 
## $outpoint
## integer(0)
## 
## $medcurve
## [1] 17
##
## 4.  Smoothed version
##
Ysmooth <- smooth.fdPar(Yfd)
boxplot(Ysmooth)

## $depth
##      rep1      rep2      rep3      rep4      rep5      rep6      rep7      rep8 
## 0.4206830 0.4393211 0.3991998 0.3792524 0.3936068 0.4131663 0.4912265 0.4666559 
##      rep9     rep10     rep11     rep12     rep13     rep14     rep15     rep16 
## 0.1900909 0.3643807 0.1798262 0.2121722 0.4044696 0.4821095 0.4303819 0.3784765 
##     rep17     rep18     rep19     rep20     rep21     rep22     rep23     rep24 
## 0.5113033 0.1900263 0.3198464 0.4043403 0.2490604 0.4416811 0.4536432 0.4462720 
##     rep25     rep26     rep27     rep28     rep29     rep30     rep31     rep32 
## 0.4410346 0.3948353 0.1725359 0.2879046 0.2414791 0.3966943 0.4729440 0.4461103 
##     rep33     rep34     rep35     rep36     rep37     rep38     rep39     rep40 
## 0.4790867 0.2931420 0.3478440 0.3852980 0.4810911 0.3402142 0.0868135 0.4792807 
##     rep41     rep42     rep43     rep44     rep45     rep46     rep47     rep48 
## 0.2131905 0.2836209 0.4740594 0.4812528 0.2108305 0.4139745 0.3301596 0.4171752 
##     rep49     rep50 
## 0.3067852 0.0614508 
## 
## $outpoint
## integer(0)
## 
## $medcurve
## rep17 
##    17
##
## 5.  model 2 with outliers
##
#magnitude
k=6
#randomly introduce outliers
C=rbinom(n,1,0.1)
s=2*rbinom(n,1,0.5)-1
cs.m=matrix(C*s,p,n,byrow=TRUE)

e=matrix(rnorm(n*p),p,n)
y=mu+t(L)%*%e+k*cs.m

#functional boxplot
fbplot(y,method='MBD',ylim=c(-11,15))

## $depth
##  [1] 0.34677551 0.39945578 0.42367347 0.46220408 0.06873469 0.37398639
##  [7] 0.40277551 0.08658503 0.42606803 0.34944218 0.41670748 0.47363265
## [13] 0.47314286 0.22579592 0.41186395 0.21888435 0.43423129 0.43444898
## [19] 0.48092517 0.48440816 0.48076190 0.48565986 0.05044898 0.19613605
## [25] 0.26889796 0.43580952 0.39009524 0.23891156 0.27657143 0.16250340
## [31] 0.34323810 0.44342857 0.49420408 0.45148299 0.46579592 0.48985034
## [37] 0.43967347 0.19063946 0.37502041 0.44571429 0.50133333 0.33246259
## [43] 0.34965986 0.37229932 0.04000000 0.32979592 0.35640816 0.46563265
## [49] 0.39989116 0.33393197
## 
## $outpoint
## [1]  5 23 45
## 
## $medcurve
## [1] 41
par(oldpar)

References

Sun, Y., Genton, M. G. and Nychka, D. (2012), “Exact fast computation of band depth for large functional datasets: How quickly can one million curves be ranked?” Stat, 1, 68-74.

Sun, Y. and Genton, M. G. (2011), “Functional Boxplots,” Journal of Computational and Graphical Statistics, 20, 316-334.

Lopez-Pintado, S. and Romo, J. (2009), “On the concept of depth for functional data,” Journal of the American Statistical Association, 104, 718-734.

Ramsay, J. O., Hooker, G. and Graves, S. (2009), Functional Data Analysis with R and MATLAB, Springer, New York.

Ramsay, J. O. and Silverman, B. W. (2005), Functional Data Analysis, 2nd ed., Springer, New York.

Ramsay, J. O. and Silverman, B. W. (2002), Applied Functional Data Analysis, Springer, New York.