Tensor Factor Models in R

Here, we provide a general overview of tensor factor models and how they can be estimated in R.

In general, we have two decompositions, the CP Decomposition and the Tucker decomposition, which is a generalization of the CP decomposition. First, we review the CP decomposition and how to calculate, then the Tucker and how to calculate, and finally how they can fit together in a factor model.

Canonical Polyadic Decomposition (PARAFAC)

We follow Kolda and Bader (2009) for factorization of a tensor. In general, a rank-one tensor can be written as \[\mathcal{X} = \mathbf{a} \mathbf{b} \mathbf{c} = \mathbf{a} \circ \mathbf{b} \circ \mathbf{c}\] Where $\circ$ is defined as the outer product of two tensors and is similar to how we construct a covariance matrix given two vectors. In other words, we have \[\mathbf{x} \in \mathbb{R}^{3}, \quad \mathbf{x}^\top\mathbf{x} = \mathbf{x} \circ \mathbf{x} \in \mathbb{R}^{3 \times 3}, \quad \mathbf{x} \circ \mathbf{x} \circ \mathbf{x} \in \mathbb{R}^{3 \times 3 \times 3}\]

This can be shown in the figure below

We then extend this and say that all tensors can be estimated by a sum of rank-one tensors. In essence, this is equivalent to decomposing a matrix into the sum of low rank vectors and we introduce new notation \[\mathcal{X} = \hat{\mathcal{X}} + \mathcal{E} = \sum_{l = 1}^r \mathbf{a}_l \circ \mathbf{b}_l \circ \mathbf{c}_l = [[\mathbf{A}, \mathbf{B}, \mathbf{C}]]\] where our $\mathbf{A},\mathbf{B},\mathbf{C}$ are now factor matrices and $\circ$ denotes the outer product. This is shown as follows:

Drawbacks

Determining r (the rank of a tensor) exactly is considered np-hard and may not exist even for small tensors
Best rank-1 may not be part of best rank-2, rank-3, etc.
$\mathbf{A},\mathbf{B},\mathbf{C}$ are not orthogonal and may even have linearly dependent columns
CP is unique up to a permutation and scaling, can always scale these matrices, can be more interprettable
Extending the last point, in PCA you can rotate factors and still get equivalent solutions, here you can not

Computation

The workhorse method of approximating this model is the Alternating Least Squares method. Our objective function is as follows: \[\min_{\mathbf{A},\mathbf{B},\mathbf{C}} ||\mathcal{X} - \hat{\mathcal{X}}||^2 \quad s.t.\] \[\hat{\mathcal{X}} = [[\mathbf{A},\mathbf{B},\mathbf{C}]]\] Substituting, we have \[\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \sum_{ijk}\left(x_{ijk} - \sum_l a_{il} b_{jl} c_{kl}\right)^2\] In order to solve this, we hold fixed b and c to solve A, then B, then C. \[\min_{\color{red}{\mathbf{A}}} \sum_{ijk} \left(x_{\color{red}{i}jk} - \sum_l \color{red}{a_{il}} b_{jl} c_{kl}\right)^2\] \[\min_{\color{red}{\mathbf{B}}} \sum_{ijk} \left(x_{i\color{red}{j}k} - \sum_l a_{il} \color{red}{b_{jl}} c_{kl}\right)^2\] \[\min_{\color{red}{\mathbf{C}}} \sum_{ijk} \left(x_{ij\color{red}{k}} - \sum_l a_{il} b_{jl} \color{red}{c_{kl}}\right)^2\] We repeat this until convergence. This is a nonconvex problem with convex subproblems. Unfortunately, for tensors with more than 5 dimensions, this can get slow quickly. There are adjustments to this algorithm that exist that makes this computation much quicker (Battaglino et. al 2017, Babii et. al 2022)

We then obtain a CP decomposition of the form \[\hat{\mathcal{X}} = \sum_{r=1}^R \lambda_r \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r\] where the scalar is introduced to normalize the factors to length one and absorb weights. This is similar to our familiar SVD of \[\mathbf{B} = \sum_{r=1}^k \sigma_r \mathbf{u}_r \circ \mathbf{v}_r\] where $\sigma_r$ is a diagonal matrix, or in higher orders, diagonal tensor.

CP in Practice

We now look to perform a decomposition of a tensor. We use data from the textbook “Applied Multiway Data Analysis”, given in the OECD report by D’Ambra (1985). The data consists of specialization indices of electronics industries of 23 European countries for the years 1978-1984 as comparative statistics. The specialization index is defined as the proportion of the monetary value of an electronic industry compared to the total export value of manufactured goods of a country compared to the similar proportion for the world as a whole (see D’Ambra, 1985, p. 249 and Kroonenberg, 2008, p.282). The first 2 time periods can be seen as follows:

elind[,,1:2]

## , , 78
## 
##      INFO   RADI   TELE   STRU   ELET   COMP
## CA 1.4680 0.5426 1.3344 0.5077 0.6455 0.5936
## US 1.5247 0.2611 0.8925 0.6393 1.3847 1.2761
## JP 0.6615 2.3867 1.0219 0.6695 0.2497 1.0108
## AS 1.7889 0.3694 0.9587 0.7341 0.5481 0.3136
## NZ 0.0374 1.0345 0.7815 3.1177 0.7320 0.1374
## BL 0.4301 1.8073 1.2979 1.0612 1.2767 0.7047
## DA 0.7635 0.8077 1.6915 0.9869 1.9347 0.0986
## FR 1.0532 0.2432 0.8174 1.5022 0.9270 1.2875
## RF 0.8315 0.8856 0.8343 1.4975 1.4082 0.8857
## GR 0.0135 0.0615 1.7737 2.6966 0.0455 0.3504
## IR 2.2047 0.3149 0.3706 0.5598 0.1346 0.9805
## IT 1.1459 0.4009 1.0486 1.1332 0.6744 1.1098
## PB 0.8839 0.9504 1.1880 0.5786 2.0219 1.4675
## RU 1.4038 0.2889 1.0723 0.9572 0.8151 0.7539
## AU 0.2200 2.0134 0.5728 1.5901 0.6846 1.8229
## FI 0.2019 3.1341 0.6360 1.5769 1.1391 0.1132
## NO 0.8190 0.6867 1.8318 1.1179 0.4544 0.0341
## PO 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## SP 1.2315 0.6912 1.0132 1.2964 0.6783 0.2278
## SV 0.8717 0.5723 1.9181 0.8496 1.2241 0.1061
## CH 0.5784 0.0909 0.8040 2.7930 0.7844 0.3904
## TU 0.0012 4.7947 0.4004 0.7746 0.0000 1.0067
## YU 0.2649 0.6927 1.2741 2.0883 0.9737 0.7857
## 
## , , 79
## 
##      INFO   RADI   TELE   STRU   ELET   COMP
## CA 1.3775 0.4948 1.4927 0.5202 0.5018 0.5303
## US 1.4694 0.2296 0.8640 0.6487 1.4319 1.2239
## JP 0.6491 2.4714 1.0960 0.6373 0.2782 1.1785
## AS 1.5090 0.1379 1.0360 1.1128 0.6644 0.2670
## NZ 0.0347 1.6092 0.9789 2.8328 0.3075 0.0489
## BL 0.5788 2.8338 0.0179 1.5766 1.7461 0.8558
## DA 0.7920 0.8349 1.7199 0.9442 2.1640 0.0742
## FR 0.9704 0.2269 0.8909 1.5604 0.8391 1.1506
## RF 0.8031 1.0069 0.8460 1.4759 1.3647 0.8624
## GR 0.0132 0.1218 2.2961 2.3131 0.0332 0.1936
## IR 1.8560 0.3726 0.4950 0.6480 0.2516 1.0659
## IT 1.1936 0.4084 1.0355 1.1047 0.6069 0.9409
## PB 0.8943 0.9757 1.2363 0.5764 2.0072 1.3140
## RU 1.4489 0.2849 1.0332 0.8528 0.6298 0.7881
## AU 0.2354 1.8855 0.5795 1.8668 0.7844 1.5825
## FI 0.2657 3.2814 0.6713 1.6138 1.0953 0.1628
## NO 0.9261 0.1060 2.0207 1.1660 0.4368 0.0604
## PO 0.4779 2.6278 0.4519 1.1041 0.0036 2.0078
## SP 1.1486 0.5373 1.1365 1.3533 0.6826 0.2444
## SV 0.8759 0.5293 2.0648 0.7799 1.2363 0.1062
## CH 0.4876 0.1154 0.9784 2.7340 0.6778 0.3889
## TU 0.0000 3.3700 2.0587 0.4243 0.0000 0.7157
## YU 0.2844 0.9407 1.1821 2.0449 1.3135 0.7584

The specialization index is defined as the proportion of the monetary value of an electronic industry compared to the total export value of manufactured goods of a country compared to the similar proportion of the world as a whole. The CP decomposition with $r = 2$ components can then be found as follows:

res <- Parafac(elind, ncomp = 2, center = FALSE, scale = FALSE)
res

## Call:
## Parafac(X = elind, ncomp = 2, center = FALSE, scale = FALSE)
## 
## 
## PARAFAC analysis with  2  components.
## Fit value: 405.6672 
## Fit percentage: 75.22 %
##

To test, we get results robust (see Engelen et. al for a review) to outliers by doing

robust_res <- Parafac(elind, ncomp = 2, robust = TRUE, center = FALSE, scale = FALSE)
robust_res

## Call:
## Parafac(X = elind, ncomp = 2, center = FALSE, scale = FALSE, 
##     robust = TRUE)
## 
## 
## PARAFAC analysis with  2  components.
## Fit value: 170.8458 
## Fit percentage: 89.56 %
## Robust

Extracting each component and scaling term is then

robust_res$A

##           F1          F2
## CA  6.112409   1.0944118
## US  7.238711   2.1748070
## JP  2.554800  -5.9056305
## AS  7.552682   2.2842555
## NZ  7.035897   0.5460618
## BL  3.811393  -5.1183410
## DA  8.691411   0.1332259
## FR  7.426601   1.5533014
## RF  6.962769  -0.4995925
## GR  8.559444   1.7025995
## IR  5.376614   1.9784642
## IT  6.396468   0.8289045
## PB  7.770711   1.2331546
## RU  6.470796   1.6295433
## AU  3.899038  -5.9482228
## FI  3.366855  -7.3366515
## NO  7.362402   2.2395779
## PO -0.757043 -10.1260040
## SP  6.203030   0.6527305
## SV  7.546543   1.4506753
## CH  8.759134   1.8720126
## TU  5.637381  -2.9564821
## YU  7.021317  -1.5914762

robust_res$B

##              F1         F2
## INFO -0.3742757 0.03028604
## RADI -0.3201128 0.89754891
## TELE -0.4640224 0.16784491
## STRU -0.5505910 0.31126951
## ELET -0.3799499 0.13207825
## COMP -0.3075976 0.22579510

robust_res$C

##            F1         F2
## 78 -0.3626485 -0.2256045
## 79 -0.3682896 -0.3684405
## 80 -0.3752955 -0.3578719
## 82 -0.3736965 -0.3896537
## 83 -0.3847080 -0.4243388
## 84 -0.3917610 -0.4209556
## 85 -0.3884277 -0.4197439

robust_res$GA

##       F1       F2 
## 31.24778 17.34768

The function also allows automatic reconstruction to the estimated tensor $\hat{\mathcal{X}}$.

If the PARAFAC decomposition is done on the coefficient tensor, this has no interpretation. The CP has some interpretation, but this is much more difficult than the Tucker decomposition as we will see below - Looking at robust_res$A, we see factors corresponding to each country, industry, and time period.

The centering and scaling factor simply changes the scaling value and brings it closer to 0.

Tucker Decomposition

The tucker decomposition can be seen as a generalization of the CP decomposition. When we perform the CP decomposition, we have to retroactively decide on the number of components $R$. In the Tucker decomposition, we still decide on the number of components, but in this case, we select $n$ components which correspond to each $n$ dimensions of the tensor. Intuitively, if we have a three dimensional tensor, we can decompose this into a smaller core tensor and various “factor matrices” . The dimensions of the core tensor are predetermined at $P \times Q \times R$ which are typically of lower dimension to the original tensor. Our formula for estimating our data tensor is then \[\hat{\mathcal{X}} = \mathcal{G} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C} = \sum_{p=1}^P \sum_{q=1}^Q \sum_{r=1}^R g_{pqr} \mathbf{a}_p \circ \mathbf{b}_q \circ \mathbf{c}_r = [[\mathcal{G}; \mathbf{A}, \mathbf{B}, \mathbf{C}]]\] This can be seen in the $n=3$ dimensional case as

So we have essentially obtained a way of compressing our data tensor into a smaller core tensor and $n$ factor matrices.

Example

We make use of the application given in Lettau (2022), which describes a three-dimensional data set consisting of $C = 25$ characteristics of $M = 1342$ mutual funds observed over $T = 34$ quarters. In other words, we have \[\hat{\mathcal{X}}(K_1,K_2,K_3) = \mathcal{G} \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \times_3 \mathbf{V}^{(3)}\] \[\mathcal{G} \in \mathbb{R}^{K_1 \times K_2 \times K_3}, \quad \mathbf{V}^{(1)} \in \mathbb{R}^{T \times K_1}, \quad \mathbf{V}^{(2)} \in \mathbb{R}^{M \times K_2}, \quad \mathbf{V}^{(3)} \in \mathbb{R}^{C \times K_3}\] Slice interpretation:

Each slice represents a two dimensional factor model
$T$ factor models of horizontal $M \times C$ slices, $M$ factor models of $T \times C$ slices, and $C$ factor models of $T \times M$ slices for a total of $T+M+C$ two dimensional factor models.
The choice of $K_1, K_2, K_3$ matters significantly (how many factors are chosen), as higher values get richer patterns across fibers (T,M,C) but have less correlation among one another.

Mode interpretation:

Elements of $\mathcal{G}$ are constructed from the data tensor to summarize the information in $\mathcal{X}$ in a more compact form with dimensions $K_1 \times K_2 \times K_3$
Multiplying $\mathcal{G}$ along the first mode (time) yields a tensor with dimensions $T \times K_2 \times K_3$
Each of the $K_2 \times K_3$ vertical fibers is a $T \times 1$ vector which is a time series of the “representative” core mutual fund $k_2$ and “representative” core characteristic $k_3$
Multiplying this along $\mathbf{V}^{(3)}$ we obtain $T \times K_2 \times C$. Each fiber is a time series characteristic of the representative core mutual funds

We can get more intuition on this last point through the following

We can then obtain factor structures for each mode independently, and each factor model will be related to one another because they contain the core tensor. This is explained in more detail later.

Moving to our example, we have a model \[\hat{\mathcal{X}}(K_1 = 10,K_2 = 25,K_3 = 9) = \mathcal{G} \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \times_3 \mathbf{V}^{(3)}\] \[\mathcal{G} \in \mathbb{R}^{10 \times 25 \times 9}, \quad \mathbf{V}^{(1)} \in \mathbb{R}^{10 \times 34}, \quad \mathbf{V}^{(2)} \in \mathbb{R}^{25 \times 1342}, \mathbf{V}^{(3)} \in \mathbb{R}^{9 \times 25}\] In the paper, $K_1, K_2, K_3$ is chosen based on a ratio of fit/parsimony. There are other methods which include estimating multiple Tucker models, deviance analysis, and even scree plots that have been extended to higher dimensions (see Kroonenberg 2008 for a review). The authors note that high values typically are within the first time dimension, indicating that the core tensor is dominated by elements from the first time index, suggesting that the first time dimension plays a particularly important role. (Models with $K_1 = 1$ did have a good fit)

Critiques:

Selection of Dimensionality reduction parameters should be more formal
There could be better metrics than in sample RMSE
How do we (fairly) compare these models to other such models?

Interpreting the Loading Matrices

Each $\mathbf{V}^{(i)}$ matrix can then be interpreted as the $\mathbf{U}$ and $\mathbf{V}$ components of the normal factor model. Additionally, a three dimensional tucker decomposition implies three two-dimensional factor representations in which $\mathbf{V}^{(1)}, \mathbf{V}^{(2)}, \mathbf{V}^{(3)}$ are loading matrices for appropriately defined factors.

For a specific matrix, take $\mathbf{V}_1$, which is the matrix corresponding to the time dimension. Rows represent all quarters and columns correspond to the 10 mode-1 components of the tucker model. The first row is non-volatile, suggesting that the this is a mean or “level” factor. The second row progressively gets positive, thus can be a “slope” factor.

Representative Funds, Characteristics, and Time Periods

We start with a $10 \times 25 \times 9$ representative tensor. We then obtain the time series by multiplying this tensor with the time dimension \[\mathcal{G} \times_1 \mathbf{V}^{(1)} \in \mathbb{R}^{34 \times 25 \times 9}\] We can then follow each of these representative firms with specific representative characteristics over all time points and graph it. The following figure shows time-series of eight representative fund/characteristic combinations. The first column shows the first four “diagonal” combinations and the second column shows the first four “off-diagonal” combinations. Diagonal elements are “level” factors with positive values over the entire sample, while the off diagonal combinations flip signs and are thus “long/short” factors.

Tensor Factor Models

We then construct a factor model based on these results. Our typical factor model with $k$ factors selected can be represented as \[\hat{\mathbf{X}}_k = \mathbf{F}_k \mathbf{B}_k^\top = \sum_{t=1}^K \sum_{k=1}^K s_{tn} u_t v_n^\top = \sum_{k=1}^K s_{kk} u_k v_k^\top\] with $u_k$ as rows and $v_k$ as columns and $s_{kk}$ as weights. This can be done in tensor notation as \[\mathbf{F}_k\mathbf{B}_k^\top = \mathbf{F}_k \times_2 \mathbf{B}_k\] \[= \mathbf{S}_k \times_1 \mathbf{U}_k \times_2 \mathbf{V}_k\] \[= \sum_{k=1}^K s_{kk} \mathbf{u}_k \circ\mathbf{v}_k\] Using the tucker decomposition, we construct a factor model as \[\mathcal{Y} = \hat{\mathcal{X}} + \mathcal{E} \] \[\hat{\mathcal{X}}(K_1,K_2,K_3) = \mathcal{G} \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \times_3 \mathbf{V}^{(3)}\] \[\mathcal{G} \in \mathbb{R}^{K_1 \times K_2 \times K_3}, \quad \mathbf{V}^{(1)} \in \mathbb{R}^{T \times K_1}, \quad \mathbf{V}^{(2)} \in \mathbb{R}^{M \times K_2}, \quad \mathbf{V}^{(3)} \in \mathbb{R}^{C \times K_3}\]

Relation to two dimensional factor models

Here, we note that the three dimensional tensor decomposition implies one “stacked” two-dimensional factor model for each of the three modes. We can show this as follows for the $C$ mode from above

Unfold the $(T \times M \times C)$ tensor $\hat{\mathcal{X}}$ along the third mode $\to (C \times TM)$ matrix $\mathbf{X}_{(3)}$
Multiply $\mathcal{G}$ by $\mathbf{V}^{(1)}, \mathbf{V}^{(2)}$ along modes one and two $\to (T \times M \times K_3)$ to obtain tensor $\mathcal{S}_{(12)} = \mathcal{G} \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)}$
Unfold $\mathcal{S}_{(12)}$ along mode-3 $\to (K_3 \times TM)$ to obtain a flattened version $\mathbf{S}_{(12)}$
Write the implied two dimensional $(TM \times C)$ factor model with $K_3$ factors as \[\underbrace{\mathbf{X}_{(3)}^\top}_{TM \times C} = \underbrace{\mathbf{S}_{(12)}^\top}_{TM \times K_3} \underbrace{\mathbf{V}^{(3)\top}}_{K_3 \times C} + \underbrace{\mathbf{E}_3}_{TM \times C} = \hat{\mathbf{X}}_{(3)}^\top + \mathbf{E}_3\]

This form has similar properties to the two-dimensional factor model and can be interpretted as such. Since they all stem from the same three dimensional Tucker model, they are internally consistent with one another.

References

Babii, A., Ghysels, E., & Pan, J. (2022). Tensor Principal Component Analysis. arXiv preprint arXiv:2212.12981.

Battaglino, C., Ballard, G., & Kolda, T. G. (2018). A practical randomized CP tensor decomposition. SIAM Journal on Matrix Analysis and Applications, 39(2), 876-901.

D’Ambra, L. (1985). Alcune estensione dell’analysi in componenti principali per lo studio dei sistemi evolutivi. Uno studio sul commercio internazionale dell’elettronica. In: Ricerche Economiche. 2. del Dipartimento di Scienze Economiche Ca’ Foscari, Venezia.

Engelen, S., Frosch, S., & Jørgensen, B. M. (2009). A fully robust PARAFAC method for analyzing fluorescence data. Journal of Chemometrics: A Journal of the Chemometrics Society, 23(3), 124-131.

Kroonenberg, P. M. (2008). Applied multiway data analysis. John Wiley & Sons.

Lettau, M. (2022). High dimensional factor models with an application to mutual fund characteristics (No. w29833). National Bureau of Economic Research.

Factor Models

Ivan

1/28/2023