Manuel Oviedo de la Fuente (University of Santiago de Compostela)

Web page: http://eio.usc.es/pub/moviedo/index.php/software

E-mail:

Slides: https://rpubs.com/moviedo/

1 Introduction

2 Functional Data Representation

Part I: Installation and Descriptive Statistics

Introduction

This vignette describes the usage of fda.usc in R. The fda.usc package carries out exploratory and descriptive analysis of functional data, exploring its most important features, such as:

  • Functional Data Representation
  • Functional Regression
  • Functional Classification
  • Functional ANOVA

The co-author of fda.usc is Manuel Febrero-Bande; the contributors are Pedro Galeano, Alicia Nieto and Eduardo Garcia-Portugues.

State of the art in R: CRAN Task View

The R task view devoted to FDA is at https://cran.r-project.org/web/views/FunctionalData.html, where fda.usc is included in the set of the 7 recommended core packages.

Dependencies:

  • Imports/Depends: MASS, fda, mgcv, nlme, rpart (deprecated)
  • Reverse depends: GPFDA, ILS, MFHD, NITPicker, qcr
  • Reverse imports: classiFunc, PCRedux, mlr
  • Versions (by year): 5 (2012), 6 (2013), 2 (2014), 1 (2015), 2 (2016), 1 (2018), 1 (2019)

fda.usc package downloads by month:

State of the art in Functional Data Analysis in R

  1. fda: the basic reference package for working with functional data in R, see Ramsay and Silverman (2005), http://ego.psych.mcgill.ca/misc/fda/

  2. Ferraty and Vieu (2006) treat functional data from a nonparametric point of view (normed or semi-normed functional spaces). These authors are part of the French group STAPH, maintaining the page http://www.lsp.ups-tlse.fr/staph/

  3. Core packages:

  • FDboost: Boosting Functional Regression Models
  • fds: Functional Data Sets
  • ftsa: Functional Time Series Analysis
  • fdasrvf: Elastic Functional Data Analysis
  • refund: Regression with Functional Data
  • fdapace: Functional Data Analysis and Empirical Dynamics

  4. Interactive tools:

  • StatFda: exploratory analysis and functional regression models, http://www.statfda.com
  • refund.shiny: Interactive Plotting for Functional Data Analyses
  • tidyfun: makes data wrangling and exploratory analysis of functional data easier, https://fabian-s.github.io/tidyfun

  5. Other packages:
  • GPFDA: uses functional regression as the mean structure and a Gaussian process as the covariance structure.
  • MFHD: Multivariate Functional Halfspace Depth
  • rainbow: Bagplots, Boxplots and Rainbow Plots for Functional Data
  • NITPicker: Finds the Best Subset of Points to Sample
  • fdatest: Interval Testing Procedure for Functional Data
  • fdakma: Functional Data Analysis: K-Mean Alignment
  • fdaMixed: Functional data analysis in a mixed model framework
  • geofd: Spatial Prediction for Function Value Data
  • goffda: Goodness-of-Fit Tests for Functional Data
  • mlr: Machine Learning in R

Installation

Like many other R packages, the simplest way to obtain fda.usc is to install it directly from CRAN. Type the following command in the R console:

install.packages("fda.usc", repos = "http://cran.us.r-project.org")

Users may change the repos option depending on their location and preferences. Other options, such as the directory where the package is installed, can also be set in the command. For more details, see help(install.packages).

Here the R package has been downloaded and installed to the default directories.

Alternatively, users can download the package source at http://cran.r-project.org/web/packages/fda.usc/index.html and install it from the command line to the desired location.
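A minimal sketch of a source installation from within R, assuming the tarball has been downloaded to the working directory (the file name below is hypothetical; use the version actually downloaded):

# file name hypothetical; replace with the downloaded version
install.packages("fda.usc_2.0.2.tar.gz", repos = NULL, type = "source")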

Back to Top

Quick Start

The purpose of this section is to give users a general sense of the package, including its components, what they do and some basic usage. We will briefly go over the main functions, see the basic operations and have a look at the outputs. After this section, users should have a better idea of what functions are available, which one to choose, or at least where to seek help. More details are given in later sections.

First, we load the fda.usc package:

library(fda.usc)

Back to Top

Functional Data: Definition, Representation, Description and Manipulation

Some definitions of Functional Data

Functional data analysis is a branch of statistics concerned with analysing data in the form of functions, i.e., data providing information about curves, surfaces or anything else varying over a continuum. The continuum is often time, but may also be spatial location, wavelength, probability, etc.

  • Definition 1: A random variable \(\mathcal{X}\) is called a functional variable if it takes values in a functional space \(\mathcal{E}\), a complete normed (or seminormed) space (Ferraty and Vieu 2006).

  • Definition 2: A functional dataset \(\{\mathcal{X}_1,\ldots,\mathcal{X}_n\}\) is the observation of \(n\) functional variables \(\mathcal{X}_1,\ldots,\mathcal{X}_n\) identically distributed as \(\mathcal{X}\) (Ferraty and Vieu 2006).

In fda.usc: "The data are curves"

\(X_i(t)\) represents the mean temperature (averaged over the years 1980-2009) at the \(i\)-th weather station in Spain at time \(t\) during the year. In R:

But what code has been used to obtain the graph?

data(aemet)  # Spanish weather stations (AEMET)
par(mfrow=c(1,2))
CanaryIslands <- ifelse(aemet$df$latitude<31,2,4)  # colour 2 (red): Canary Islands; 4 (blue): Peninsula
plot(aemet$df[,c("longitude","latitude")],col=CanaryIslands,lwd=2)
plot(aemet$temp,col=CanaryIslands,lwd=2)

Definition of the fdata class in R

An fdata object is a list with the following components:

  • data: typically a matrix of (n x m) dimension which contains a set of n curves discretized in m points or argvals.
  • argvals: locations of the discretization points, by default \(\{t_1=1,\ldots,t_m=m\}\).
  • rangeval: range of the discretization points.
  • names: (optional) list with three components: main, an overall title, xlab, a title for the x axis and ylab, a title for the y axis.
lapply(aemet,class)
names(aemet$df)
lapply(aemet$temp,class)
temp <- aemet$temp
names(temp)
dim(temp)
length(argvals(temp))
rangeval(temp)
temp$names
is.fdata(temp)
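A minimal sketch building an fdata object from scratch (all names and values below are chosen for illustration):

tt <- seq(0, 1, len = 51)                        # argvals
X  <- outer(1:5, tt, function(i, t) sin(i * t))  # 5 curves discretized at 51 points
fX <- fdata(X, argvals = tt, rangeval = c(0, 1),
            names = list(main = "Toy curves", xlab = "t", ylab = "X(t)"))
plot(fX)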

Back to Top

Some utilities of fda.usc package

  1. Basic operations for fdata class objects:
  • Group Math: abs, sqrt, floor, ceiling, trunc, round, signif, exp, log, cos, sin, tan.
  • Other operations: [], is.fdata(), c(), dim(), ncol(), nrow().
is.fdata(aemet$temp)
is.fdata(aemet$df)
dim(aemet$df)
dim(aemet$temp)
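A short illustration of these operations (a sketch; temp is the aemet temperature object defined above):

temp10 <- temp[1:10]       # "[" extracts a subset of curves
range(exp(temp10)$data)    # Group Math functions apply pointwise to $data
dim(c(temp10, temp10))     # c() concatenates fdata objects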
  2. Convert the class
  • The fdata class only uses the evaluations at the discretization points.
  • fdata2fd() converts an fdata object to an fd object (using the basis representation).
  • Inversely, fdata() converts objects of class fd, fds, fts, sfts, vector, matrix or data.frame to an object of class fdata.
temp.fd=fdata2fd(temp,type.basis="fourier",nbasis=15)
temp.fdata=fdata(temp.fd) #back to fdata
class(temp.fd)
## [1] "fd"
class(temp.fdata)
## [1] "fdata"
  • split.fdata() and unlist(): wrappers for the split and unlist functions for fdata objects.
# Canary Islands in red vs Iberian Peninsula in blue
l1<-split.fdata(temp,CanaryIslands)
par(mfrow=c(1,2))
plot(l1[[1]],col=2,ylim=c(0,30))
plot(l1[[2]],col=4,ylim=c(0,30))

dim(l1[[1]]);dim(l1[[2]])
## [1]   9 365
## [1]  64 365
# internal function plot.lfdata
plot.lfdata<-fda.usc:::plot.lfdata
plot.lfdata(l1)

  • Group Ops: +, -, *, /, ^, %%, %/%, &, |, !, ==, !=, <, <=, >=, >.
l1[[1]]==l1[[2]]
## [1] FALSE
  • order.fdata(): a wrapper for the order function. The functional data are ordered w.r.t. the sample order of the values of a vector.
temp2<-order.fdata(order(CanaryIslands),temp)
  • Generating random processes of class fdata.

The rproc2fdata() function generates functional data from an Ornstein-Uhlenbeck process, a Brownian process, a Gaussian process or an exponential variogram process.

par(mfrow=c(1,2))
lent<-30
tt<-seq(0,1,len=lent)
mu<-fdata(sin(2*pi*tt),tt)
plot(rproc2fdata(200,t=tt,sigma="OU",par.list=list("scale"=1)))
plot(rproc2fdata(200,mu=mu,sigma="OU",par.list=list("scale"=1)))

Other functions: gridfdata() generates n curves as linear combinations of the original curves fdataobj plus a functional trend mu.

rcombfdata() generates n random linear combinations of the fdataobj curves plus a functional trend mu. The coefficients of the combinations follow a normal distribution with zero mean and standard deviation sdarg.
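A brief sketch of rcombfdata() (argument names taken from its help page; treat the exact call as an assumption):

tt <- seq(0, 1, len = 51)
mu <- fdata(2*sin(2*pi*tt), tt)                                  # functional trend
X  <- rproc2fdata(5, t = tt, sigma = "OU", par.list = list("scale" = 1))
plot(rcombfdata(n = 20, fdataobj = X, mu = mu, sdarg = rep(0.5, 5)))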

Back to Top

Summarizing by smoothing

Suppose that our functional data \(Y(t)\) are observed through the model \(Y(t_i)=X(t_i)+\varepsilon(t_i)\), where the residuals \(\varepsilon(t)\) are independent of \(X(t)\).

We can recover the original signal \(X(t)\) using a linear smoother: \[\hat{X}(t_i)=\sum_{j=1}^{m} s_{i}(t_j)Y(t_j) \Rightarrow \mathbf{\hat{X}}=\mathbf{S}\mathbf{Y} \] where \(s_{i}(t_j)\) is the weight that the point \(t_j\) gives to the point \(t_i\).

The following methods are used to estimate the smoothing matrix \(S\).

  1. Finite representation in a fixed basis (Ramsay and Silverman 2005):

Let \(X(t)\in \mathcal{L}_2\), \[ X(t)= \sum_{k\in\mathbb{N}}{c_k\phi_k(t)}\approx\sum_{k=1}^K c_k\phi_k(t)=c^{\top}\Phi\]

The smoothing matrix is given by: \(\mathbf{S}=\Phi(\Phi^{\top}W\Phi+\lambda R)^{-1}\Phi^{\top}W\) where \(\lambda\) is the penalty parameter.

Type of basis:

  • Fourier: designed to represent periodic functions. Orthonormal.

  • B-splines: sets of polynomials (of order \(m\)) defined on subintervals, constructed in such a way that at the border of the subintervals the polynomials coincide (up to the \(m-2\) derivative). Banded.

  • Wavelets: powerful, especially when the grid is a power of 2. Orthonormal.

  • Polynomials: adjust a polynomial to the whole curve.

bsp11<-create.bspline.basis(temp$rangeval,nbasis=11)
bsp111<-create.bspline.basis(temp$rangeval,nbasis=111)
S.bsp11  <-  S.basis(temp$argvals, bsp11)
S.bsp111 <- S.basis(temp$argvals, bsp111)
temp.bsp11<-temp.bsp111<-temp
temp.bsp11$data <- temp$data%*%S.bsp11
temp.bsp111$data <- temp$data%*%S.bsp111 
par(mfrow=c(1,3))
plot(temp)
plot(temp.bsp111,main="111 Bspline basis elements")
plot(temp.bsp11,main="11 Bspline basis elements")

# Another way 
out1<-optim.basis(temp,type.CV=CV.S)
names(out1)
##  [1] "gcv"          "numbasis"     "lambda"       "fdataobj"    
##  [5] "fdata.est"    "gcv.opt"      "numbasis.opt" "lambda.opt"  
##  [9] "S.opt"        "base.opt"
out1$numbasis.opt
## [1] 76
  2. Kernel Smoothing (Ferraty and Vieu 2006)

The problem is to estimate the smoothing parameter or bandwidth \(\nu=h\) that best represents the functional data using kernel smoothing. Now, the nonparametric smoothing of functional data is given by the smoothing matrix \(S\) with entries: \[s_{ij}=\frac{1}{h}K\left(\frac{t_i-t_j}{h}\right)\]

\[S(h)=(s_j(t_i))=\frac{K\left(\frac{t_i-t_j}{h}\right)}{\sum_{k=1}^{m}K\left(\frac{t_k-t_j}{h}\right)}\] where \(h\) is the bandwidth and \(K(\cdot)\) the kernel function.

Different types of kernels \(K()\) are defined in the package, see Kernel function.

S<-S.NW(temp$argvals,h=10)  # bandwidth h=10, matching the plot label below
temp<-aemet$temp
temp.hat<-temp
temp.hat$data<-temp$data%*%S
args(optim.np)
## function (fdataobj, h = NULL, W = NULL, Ker = Ker.norm, type.CV = GCV.S, 
##     type.S = S.NW, par.CV = list(trim = 0, draw = FALSE), par.S = list(), 
##     correl = TRUE, verbose = FALSE, ...) 
## NULL
# Another way 
temp.est<-optim.np(temp,h=1)$fdata.est
par(mfrow=c(1,3))
plot(temp.est,main="Kernel smooth - h=1")
temp.est==temp.hat
## [1] FALSE
plot(temp.hat,main="Kernel smooth - h=10")
plot(optim.np(temp)$fdata.est,main="Kernel smooth- h-GCV")

  3. Finite representation in a data-driven basis (Cardot, Ferraty, and Sarda 1999)
  • Functional Principal Components (FPC): Functional Principal Components Analysis (FPCA) and Functional PLS (FPLS) allow displaying the functional data in a few components.

The functional data can be rewritten as a decomposition in an orthonormal PC basis: maximizing the variance of \(X(t)\):

\[ \hat{X}_{i}(t)=\sum_{k=1}^{K}f_{ik}\xi_{k}(t) \]

where \(f_{ik}\) is the score of the \(i\)-th curve on the \(k\)-th principal component \(\xi_k\).

y<-rowSums(aemet$logprec$data)
par(mfrow=c(1,3))
plot(aemet$temp)
plot((pc<-create.pc.basis(aemet$temp,l=1:2))[[1]],main="Temperature, 2 FPC")
pc2<-fdata2pc(aemet$temp)
plot(pc2$rotation,main="2 FPC basis",xlab="day")

summary(pc2)
## 
##      - SUMMARY:   fdata2pc  object   -
## 
## -With 2  components are explained  98.78 %
##  of the variability of explicative variables.
##  
## -Variability for each component (%):
##   PC1   PC2 
## 85.57 13.21
  • Functional Partial Least Squares (FPLS) (Krämer and Sugiyama 2011): the PLS components seek to maximize the covariance between \(X(t)\) and \(y\).
args(create.pls.basis)
args(fdata2pls)
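A minimal usage sketch for the data-driven PLS basis (the scalar response below, total log-precipitation, is an illustrative choice; treat the exact arguments as assumptions):

y <- rowSums(aemet$logprec$data)        # scalar response guiding the PLS directions
pls <- fdata2pls(aemet$temp, y, ncomp = 2)
plot(pls$rotation, main = "2 FPLS basis", xlab = "day")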

Example: Exploratory analysis for spectrometric curves

Tecator dataset: 215 spectrometric curves of meat samples, together with Fat, Water and Protein contents obtained by analytic procedures.

Tecator Goal: Explain the fat content through spectrometric curves.

## [1] "absorp.fdata" "y"
## [1] "Fat"     "Water"   "Protein"

As shown in the tecator figure, the plot of \(X(t)\) against \(t\) is not necessarily the most informative, and other semi-metrics can extract more information from the functional variables.

The information about fat seems to be contained in the shape of the curves, so the semi-metric of derivatives could be preferred to the \(\mathcal{L}_2\) one. It is difficult to find the best plot for a particular functional dataset because the shape of the graphics depends strongly on the proximity measure.
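A sketch of this exploratory plot (the colouring threshold of 20% fat is an illustrative choice):

data(tecator)
absorp <- tecator$absorp
Fat20 <- ifelse(tecator$y$Fat < 20, 2, 4)   # illustrative threshold at 20% fat
par(mfrow=c(1,2))
plot(absorp, col = Fat20, main = "Tecator: absorbances")
plot(fdata.deriv(absorp), col = Fat20, main = "First derivative")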

Derivatives

To compute derivatives there are several options (see the fdata.deriv function and the sketch after this list):

  • Raw differentiation: only interesting when the grid of discretization points is dense.
  • Basis representation: differentiate a finite representation in a basis.
  • Spline interpolation: perform a spline interpolation of the given data points and use this interpolation for computing derivatives.
  • Nonparametric estimation: use a local polynomial estimator of the functional data.
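A minimal sketch comparing two of these options (method names as in the fdata.deriv help page; treat the exact choices as assumptions):

data(tecator)
absorp <- tecator$absorp
d1.bsp <- fdata.deriv(absorp, nderiv = 1, method = "bspline")  # basis representation
d1.raw <- fdata.deriv(absorp, nderiv = 1, method = "diff")     # raw differentiation
par(mfrow=c(1,2))
plot(d1.bsp, main = "B-spline derivative")
plot(d1.raw, main = "Raw differences")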

Back to Top

Computing distances, norms and inner products

Utilities for computing distances, norms and inner products are included in the package. They are described below but, of course, many others can be built, with the only restriction that the first two arguments correspond to the class fdata. In addition, the procedures of the fda.usc package that contain the argument metric allow the use of metric or semi-metric functions, as shown in the following sections.

  • Distance between functional elements, metric.lp: \(d(X(t),Y(t))=\left\|X(t)-Y(t)\right\|\) with a norm \(\left\| \cdot \right\|\)
  • Norm norm.fdata: \(\left\|X(t)\right\|=\sqrt{\left\langle X(t),X(t)\right\rangle}\)
  • Inner product inprod.fdata: \(\left\langle x,y \right\rangle=(1/4)(\left\|x+y\right\|^2-\left\|x-y\right\|^2)\).

The fda.usc package collects several metric and semi-metric functions which allow extracting as much information as possible from the functional variable.

Option 1: If the data are in a Hilbert space, represent the data in a basis and compute everything accordingly. From a practical point of view, the representation of a functional datum must be approximated by a finite number of terms.

  • semimetric.basis()

Option 2: Approximate all quantities using numerical approximations (valid for Hilbert and non-Hilbert spaces). This usually involves numerical approximations of integrals, derivatives and so on. A collection of semi-metrics proposed by Ferraty and Vieu (2006) is also included in the package.

  • semimetric.pca(), based on the Principal Components.
  • semimetric.mplsr(), based on the Partial Least Squares.
  • semimetric.deriv(), based on B-spline representation.
  • semimetric.hshift(), measures the horizontal shift effect.
  • semimetric.fourier(), based on the Fourier representation.

Example of computing norms

Consider \(X(t) = t^2,\ t\in[0, 1]\). In this case \(\|X\|_{2} =\sqrt{1/5}=0.4472136\) and \(\|X\|_{1} =1/3\).

t = seq(0, 1, len = 101)
x2 = t^2
x2 = fdata(x2, t)
(c(norm.fdata(x2), norm.fdata(x2, lp = 1))) # L2, L1 norm
## [1] 0.4472509 0.3333500
f1 = fdata(rep(1, length(t)), t)
f2 = fdata(sin(2 * pi * t), t) * sqrt(2)
f3 = fdata(cos(2 * pi * t), t) * sqrt(2)
f4 = fdata(sin(4 * pi * t), t) * sqrt(2)
f5 = fdata(cos(4 * pi * t), t) * sqrt(2)
fou5 = c(f1, f2, f3, f4, f5) # Fourier basis is Orthonormal
round(inprod.fdata(fou5), 5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    0    0    0    0
## [2,]    0    1    0    0    0
## [3,]    0    0    1    0    0
## [4,]    0    0    0    1    0
## [5,]    0    0    0    0    1

Example of computing distances

set.seed(1:4)
p<-1001
r<-rnorm(p,sd=.1)
x<-seq(0,2*pi,length=p)
fx<-fdata((sin(x)+r)/sqrt(pi),x)
fx0<-fdata(r,x)
plot(c(fx,fx0))

metric.lp(fx,fx0)[1,1]
## [1] 1.003144
semimetric.basis(fx,fx0,nbasis1=5,nbasis2=5)[1,1]
## [1] 0.9927386
semimetric.basis(fx,fx0,nbasis1=11,nbasis2=111)[1,1]
## [1] 0.9992571
semimetric.fourier(fx,fx0,nbasis=5)[1,1]
## [1] 0.9972634
semimetric.fourier(fx,fx0,nbasis=11)[1,1]
## [1] 0.9992104
semimetric.deriv(fx,fx0,nderiv=0,nknot=5)[1,1]
## [1] 0.9934345
semimetric.deriv(fx,fx0,nderiv=0,nknot=11)[1,1]
## [1] 0.9984524
integrate(function(x){(sin(x)/sqrt(pi))^2},0,2*pi)
## 1 with absolute error < 7.3e-10

Example of semi-metric as classification rule

data(tecator)
set.seed(4:1)
#ind<-sample(215,30)
#xx<-tecator$absorp[ind]

yy<-ifelse(tecator$y$Fat<20,0,1)
ind<-c(sample(which(yy==0),10),sample(which(yy==1),10))
xx<-fdata.deriv(tecator$absorp[ind])
yy<-yy[ind]
par(mfrow=c(1,1))
d1<-as.dist(metric.lp(xx),T,T)
c1<-hclust(d1)
c1$labels=yy
plot(c1,main="Raw data -- metric.lp",xlab="Class of each leaf")

Try with the derivative of the curves:
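The chunk producing d2 (used below) is not shown; a minimal sketch, assuming d2 is built exactly like d1 but after differentiating xx once more:

xx.d <- fdata.deriv(xx)                    # differentiate the curves once more
d2 <- as.dist(metric.lp(xx.d), TRUE, TRUE)
c2 <- hclust(d2)
c2$labels <- yy
plot(c2, main = "Derivatives -- metric.lp", xlab = "Class of each leaf")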

Extension: Combining the information of two components using a \(p\)-dimensional metric such as, for example, the Euclidean:

\[m\left(\left(x_0^1,\ldots,x_0^p\right),\left(x_i^1,\ldots,x_i^p\right)\right):=\sqrt{m_1\left(x_0^1,x_i^1\right)^2+\cdots+m_p\left(x_0^p,x_i^p\right)^2}\] where \(m_{i}\) denotes the metric in the \(i\)-component of the product space, see fda.usc:::metric.ldata code.

It is important here to ensure that the different metrics of the spaces have similar scales, to avoid a single component dominating the overall distance.

range(d1)
## [1] 0.005531143 0.091797359
range(d2)
## [1] 0.0003143458 0.0105466510
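A sketch of combining the two distance matrices with the Euclidean product metric above, after rescaling (the rescaling by the maximum is an illustrative choice):

d1s <- as.matrix(d1)/max(d1)   # rescale so neither component dominates
d2s <- as.matrix(d2)/max(d2)
d12 <- sqrt(d1s^2 + d2s^2)     # Euclidean combination of the two metrics
c12 <- hclust(as.dist(d12))
c12$labels <- yy
plot(c12, main = "Combined metric", xlab = "Class of each leaf")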

NOTE: Dependent data in tecator dataset?

ar(tecator$y$Fat)
## 
## Call:
## ar(x = tecator$y$Fat)
## 
## Coefficients:
##      1       2  
## 0.6552  0.1161  
## 
## Order selected 2  sigma^2 estimated as  72.82
plot(tecator$y,col=1:3)

par(mfrow=c(1,1))
ts.plot(tecator$y$Fat)

#barplot(t(tecator$y))

Distance Correlation (Székely, Rizzo, and Bakirov 2007)

The distance correlation characterizes independence between random vectors of arbitrary finite dimensions. Recently, in Székely and Rizzo (2013), a bias-corrected version was considered and a test of independence developed.

data(tecator)
xx<-tecator$absorp
xx.d2<-fdata.deriv(xx,nderiv=2)
dcor.xy(xx,xx.d2)
## 
##  dcor t-test of independence
## 
## data:  D1 and D2
## T = 35.933, df = 22789, p-value < 2.2e-16
## sample estimates:
## Bias corrected dcor 
##           0.2315581
d1<-metric.lp(xx)
d2<-metric.lp(xx.d2)
bcdcor.dist(d1,d2)
## [1] 0.2315581
Fat <-tecator$y[,"Fat",drop=FALSE]
d3<-metric.dist(Fat)
bcdcor.dist(d1,d3)
## [1] 0.1963254
bcdcor.dist(d2,d3)
## [1] 0.9147076
# Functional / Factor
Fat.cut<-as.matrix(cut(Fat[,1],labels=1:4,breaks=4))
dcor.xy(Fat.cut,xx.d2)
## 
##  dcor t-test of independence
## 
## data:  D1 and D2
## T = 252.57, df = 22789, p-value < 2.2e-16
## sample estimates:
## Bias corrected dcor 
##           0.8583661

Back to Top

Depth for functional data

Depth (in the univariate or multivariate context) is a statistical tool that provides a center-outward ordering of data points. This ordering can be employed to define location measures (and, by opposition, outliers). Also, some measures can be defined as maximizers of a particular depth.

  • Mahalanobis Depth (Mean) mdepth.MhD
  • Halfspace Depth, also known as Tukey Depth (Median) mdepth.HS
  • Convex Hull Peeling Depth (Mode)
  • Oja Depth
  • Simplicial Depth mdepth.SD
  • Likelihood Depth (Mode) mdepth.LD
y12<-tecator$y[,1:2]
dep<-mdepth.LD(x=y12,scale=T)$dep
plot(y12,col=grey(1-dep))
points(y12[which.max(dep),],col=4,lwd=3)
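A short sketch comparing some of the multivariate depths listed above on the same bivariate data (assuming mdepth.MhD and mdepth.HS accept the same single-argument call as mdepth.LD):

depMh <- mdepth.MhD(y12)$dep   # Mahalanobis depth
depHS <- mdepth.HS(y12)$dep    # halfspace (Tukey) depth
cor(cbind(LD = dep, MhD = depMh, HS = depHS))  # how much the orderings agree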

Many of the preceding depth functions defined in a multivariate context cannot be applied directly to functional data. In any case, given a depth for functional data:

  • The deepest point is a location measure and, based on the depth function, can have a different interpretation: Mean, Median, Mode …

  • Computing the depths of the whole sample and ordering them (in decreasing order) gives a rank from the most central point (\(x_{[1]}\)) to the most outlying one (\(x_{[n]}\)). So, those points with larger rank are the candidates to be outliers w.r.t. the data cloud (in the sense of the depth function).

  • Summarizing the \((1-\alpha)\) deepest points could also lead to a robust location measure.

  • Fraiman-Muniz Depth (Fraiman and Muniz 2001) depth.FM

  • Modal Depth (Cuevas, Febrero, and Fraiman 2007) depth.mode

  • Random Projection Depth (Cuevas, Febrero, and Fraiman 2007) depth.RP

  • Random Tukey Depth (Cuesta-Albertos and Nieto-Reyes 2008) depth.RT

  • Band Depth (López-Pintado and Romo 2009) depth.MBD (private function)

x<-tecator$absorp
depth.mode(x,draw=TRUE)

Depth (and distances) for multivariate functional data

Modify the procedure to incorporate the extended information.

  • Fraiman–Muniz: compute a multivariate depth marginally: depth.FMp
  • Modal depth: use a new distance between data (for derivatives, for example, the Sobolev metric): depth.modep
  • Random Projection: consider a multivariate depth applied to the different projections (the random projection method could also be applied twice): depth.RPp
x.d1<-fdata.deriv(x)
depth.modep(list(x=x,x.d1=x.d1),draw=T)

Example poblenou dataset

Hourly levels of nitrogen oxides in Poblenou (Barcelona). This dataset has 127 daily records (2005/01/06-2005/06/26).

Objective: Explain the differences in NOx levels as a function of the type of day.

data(poblenou)
dayw = ifelse(poblenou$df$day.week == 7 | poblenou$df$day.festive == 1, 3,
              ifelse(poblenou$df$day.week == 6, 2, 1))  # 1: labour day, 2: Saturday, 3: Sunday/festive
plot(poblenou$nox,col=dayw)

The \(\mathcal{L}_2\) space (where distance is the area between curves) seems appropriate, although other possibilities could be taken into account (\(\mathcal{L}_1\), for example).

fmd = depth.FM(poblenou$nox,draw=T)

md =  depth.mode(poblenou$nox)
rpd = depth.RP(poblenou$nox, nproj = 50)
rtd = depth.RT(poblenou$nox)
print(cur <- c(fmd$lmed, md$lmed, rpd$lmed, rtd$lmed))
## 2005-05-06 2005-05-06 2005-05-06 2005-03-04 
##         63         63         63         10
plot(poblenou$nox,col="grey")
lines(poblenou$nox[cur], lwd = 2, lty = 1:4, col = 1:4)
legend("topleft", c("FMD", "MD", "RPD", "RTD"), lwd = 2, lty = 1:4,col = 1:4)

Back to Top

Outliers detection

There is no generally accepted definition of outliers in functional data, so we define an outlier as a datum generated by a different process than the rest of the sample, with the following characteristic: its number in the sample is unknown but probably low.

An outlier will have low depth and it will be an outlier in the sense of the depth used.

md1 = depth.mode(lab <- poblenou$nox[dayw == 1]) #Labour days
md2 = depth.mode(sat <- poblenou$nox[dayw == 2]) #Saturdays
md3 = depth.mode(sun <- poblenou$nox[dayw == 3]) #Sundays/Festive
rbind(poblenou$df[dayw == 1, ][which.min(md1$dep), ],
      poblenou$df[dayw == 2, ][which.min(md2$dep), ],
      poblenou$df[dayw == 3, ][which.min(md3$dep), ])
##          date day.week day.festive
## 22 2005-03-18        5           0
## 57 2005-04-30        6           0
## 58 2005-05-01        7           0
plot(poblenou$nox,col="grey")
lines(poblenou$nox[c(22,57,58),], lwd = 2, lty = 2:4, col = 2:4)

Two procedures for detecting outliers are implemented in the package (Febrero-Bande, Galeano, and González-Manteiga 2008):

  • Detecting outliers based on trimming: outliers.depth.trim()

  • Detecting outliers based on weighting: outliers.depth.pond()

# Method for detecting outliers, see Febrero-Bande, 2008
#out1<-outliers.depth.trim(poblenou$nox[dayw == 1],nb=100)$outliers
#out2<-outliers.depth.pond(poblenou$nox[dayw == 1],nb=100)$outliers
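A runnable sketch of the trimming-based procedure (nb kept small here just to shorten the run time; the commented calls above use nb=100):

out1 <- outliers.depth.trim(poblenou$nox[dayw == 1], nb = 20)
out1$outliers   # dates flagged as outliers among labour days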

Back to Top

Documentation

RPubs documents

  1. fda.usc vignette: Installation and Descriptive Statistics

  2. fda.usc vignette: Functional Regression

  3. fda.usc vignette: Functional Classification and ANOVA

  4. fda.usc vignette: Variable Selection in Functional Regression (and Classification)

  5. fda.usc reference card (Rcard)

See the reference card (PDF document) at: https://cran.r-project.org/web/packages/fda.usc/fda.usc.pdf

Links:

  1. fda.usc R package

  2. JSS paper

  3. fda.usc R Manual: reference help document (221 pages) at https://cran.r-project.org/web/packages/fda.usc/fda.usc.pdf

  4. Manuel Oviedo's website

Back to Top

References

Cardot, Hervé, Frédéric Ferraty, and Pascal Sarda. 1999. “Functional Linear Model.” Statist. Probab. Lett. 45 (1): 11–22.

Cuesta-Albertos, Juan, and Alicia Nieto-Reyes. 2008. “The Random Tukey Depth.” Computational Statistics and Data Analysis 52 (11): 4979–88.

Cuevas, Antonio, Manuel Febrero, and Ricardo Fraiman. 2007. “Robust Estimation and Classification for Functional Data via Projection-Based Depth Notions.” Comput. Statist. 22 (3): 481–96. http://link.springer.com/article/10.1007/s00180-007-0053-0.

Febrero-Bande, Manuel, Pedro Galeano, and Wenceslao González-Manteiga. 2008. “Outlier Detection in Functional Data by Depth Measures, with Application to Identify Abnormal \({\rm NO}_x\) Levels.” Environmetrics 19 (4): 331–45.

Ferraty, Frédéric, and Philippe Vieu. 2006. Nonparametric Functional Data Analysis. Springer Series in Statistics. New York: Springer-Verlag.

Fraiman, Ricardo, and Graciela Muniz. 2001. “Trimmed Means for Functional Data.” Test 10 (2): 419–40.

Krämer, Nicole, and Masashi Sugiyama. 2011. “The Degrees of Freedom of Partial Least Squares Regression.” Journal of the American Statistical Association 106 (494): 697–705.

López-Pintado, Sara, and Juan Romo. 2009. “On the Concept of Depth for Functional Data.” Journal of the American Statistical Association 104: 718–34.

Ramsay, J. O., and B. W. Silverman. 2005. Functional Data Analysis. Springer.

Székely, Gábor J, and Maria L Rizzo. 2013. “The Distance Correlation \(t\)-Test of Independence in High Dimension.” J. Multivariate Anal. 117: 193–213.

Székely, Gábor J, Maria L Rizzo, and Nail K Bakirov. 2007. “Measuring and Testing Dependence by Correlation of Distances.” Ann. Statist. 35 (6): 2769–94. http://projecteuclid.org/euclid.aos/1201012979.