Here, we provide a general overview of tensor factor models and how they can be estimated in R.
In general, we have two decompositions, the CP Decomposition and the Tucker decomposition, which is a generalization of the CP decomposition. First, we review the CP decomposition and how to calculate, then the Tucker and how to calculate, and finally how they can fit together in a factor model.
We follow Kolda and Bader (2009) for factorization of a tensor. In general, a rank-one tensor can be written as \[\mathcal{X} = \mathbf{a} \mathbf{b} \mathbf{c} = \mathbf{a} \circ \mathbf{b} \circ \mathbf{c}\] Where \(\circ\) is defined as the outer product of two tensors and is similar to how we construct a covariance matrix given two vectors. In other words, we have \[\mathbf{x} \in \mathbb{R}^{3}, \quad \mathbf{x}^\top\mathbf{x} = \mathbf{x} \circ \mathbf{x} \in \mathbb{R}^{3 \times 3}, \quad \mathbf{x} \circ \mathbf{x} \circ \mathbf{x} \in \mathbb{R}^{3 \times 3 \times 3}\]
This can be shown in the figure below
We then extend this and say that all tensors can be estimated by a sum of rank-one tensors. In essence, this is equivalent to decomposing a matrix into the sum of low rank vectors and we introduce new notation \[\mathcal{X} = \hat{\mathcal{X}} + \mathcal{E} = \sum_{l = 1}^r \mathbf{a}_l \circ \mathbf{b}_l \circ \mathbf{c}_l = [[\mathbf{A}, \mathbf{B}, \mathbf{C}]]\] where our \(\mathbf{A},\mathbf{B},\mathbf{C}\) are now factor matrices and \(\circ\) denotes the outer product. This is shown as follows:
The workhorse method of approximating this model is the Alternating Least Squares method. Our objective function is as follows: \[\min_{\mathbf{A},\mathbf{B},\mathbf{C}} ||\mathcal{X} - \hat{\mathcal{X}}||^2 \quad s.t.\] \[\hat{\mathcal{X}} = [[\mathbf{A},\mathbf{B},\mathbf{C}]]\] Substituting, we have \[\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \sum_{ijk}\left(x_{ijk} - \sum_l a_{il} b_{jl} c_{kl}\right)^2\] In order to solve this, we hold fixed b and c to solve A, then B, then C. \[\min_{\color{red}{\mathbf{A}}} \sum_{ijk} \left(x_{\color{red}{i}jk} - \sum_l \color{red}{a_{il}} b_{jl} c_{kl}\right)^2\] \[\min_{\color{red}{\mathbf{B}}} \sum_{ijk} \left(x_{i\color{red}{j}k} - \sum_l a_{il} \color{red}{b_{jl}} c_{kl}\right)^2\] \[\min_{\color{red}{\mathbf{C}}} \sum_{ijk} \left(x_{ij\color{red}{k}} - \sum_l a_{il} b_{jl} \color{red}{c_{kl}}\right)^2\] We repeat this until convergence. This is a nonconvex problem with convex subproblems. Unfortunately, for tensors with more than 5 dimensions, this can get slow quickly. There are adjustments to this algorithm that exist that makes this computation much quicker (Battaglino et. al 2017, Babii et. al 2022)
We then obtain a CP decomposition of the form \[\hat{\mathcal{X}} = \sum_{r=1}^R \lambda_r \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r\] where the scalar is introduced to normalize the factors to length one and absorb weights. This is similar to our familiar SVD of \[\mathbf{B} = \sum_{r=1}^k \sigma_r \mathbf{u}_r \circ \mathbf{v}_r\] where \(\sigma_r\) is a diagonal matrix, or in higher orders, diagonal tensor.
We now look to perform a decomposition of a tensor. We use data from the textbook “Applied Multiway Data Analysis”, given in the OECD report by D’Ambra (1985). The data consists of specialization indices of electronics industries of 23 European countries for the years 1978-1984 as comparative statistics. The specialization index is defined as the proportion of the monetary value of an electronic industry compared to the total export value of manufactured goods of a country compared to the similar proportion for the world as a whole (see D’Ambra, 1985, p. 249 and Kroonenberg, 2008, p.282). The first 2 time periods can be seen as follows:
elind[,,1:2]
## , , 78
##
## INFO RADI TELE STRU ELET COMP
## CA 1.4680 0.5426 1.3344 0.5077 0.6455 0.5936
## US 1.5247 0.2611 0.8925 0.6393 1.3847 1.2761
## JP 0.6615 2.3867 1.0219 0.6695 0.2497 1.0108
## AS 1.7889 0.3694 0.9587 0.7341 0.5481 0.3136
## NZ 0.0374 1.0345 0.7815 3.1177 0.7320 0.1374
## BL 0.4301 1.8073 1.2979 1.0612 1.2767 0.7047
## DA 0.7635 0.8077 1.6915 0.9869 1.9347 0.0986
## FR 1.0532 0.2432 0.8174 1.5022 0.9270 1.2875
## RF 0.8315 0.8856 0.8343 1.4975 1.4082 0.8857
## GR 0.0135 0.0615 1.7737 2.6966 0.0455 0.3504
## IR 2.2047 0.3149 0.3706 0.5598 0.1346 0.9805
## IT 1.1459 0.4009 1.0486 1.1332 0.6744 1.1098
## PB 0.8839 0.9504 1.1880 0.5786 2.0219 1.4675
## RU 1.4038 0.2889 1.0723 0.9572 0.8151 0.7539
## AU 0.2200 2.0134 0.5728 1.5901 0.6846 1.8229
## FI 0.2019 3.1341 0.6360 1.5769 1.1391 0.1132
## NO 0.8190 0.6867 1.8318 1.1179 0.4544 0.0341
## PO 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## SP 1.2315 0.6912 1.0132 1.2964 0.6783 0.2278
## SV 0.8717 0.5723 1.9181 0.8496 1.2241 0.1061
## CH 0.5784 0.0909 0.8040 2.7930 0.7844 0.3904
## TU 0.0012 4.7947 0.4004 0.7746 0.0000 1.0067
## YU 0.2649 0.6927 1.2741 2.0883 0.9737 0.7857
##
## , , 79
##
## INFO RADI TELE STRU ELET COMP
## CA 1.3775 0.4948 1.4927 0.5202 0.5018 0.5303
## US 1.4694 0.2296 0.8640 0.6487 1.4319 1.2239
## JP 0.6491 2.4714 1.0960 0.6373 0.2782 1.1785
## AS 1.5090 0.1379 1.0360 1.1128 0.6644 0.2670
## NZ 0.0347 1.6092 0.9789 2.8328 0.3075 0.0489
## BL 0.5788 2.8338 0.0179 1.5766 1.7461 0.8558
## DA 0.7920 0.8349 1.7199 0.9442 2.1640 0.0742
## FR 0.9704 0.2269 0.8909 1.5604 0.8391 1.1506
## RF 0.8031 1.0069 0.8460 1.4759 1.3647 0.8624
## GR 0.0132 0.1218 2.2961 2.3131 0.0332 0.1936
## IR 1.8560 0.3726 0.4950 0.6480 0.2516 1.0659
## IT 1.1936 0.4084 1.0355 1.1047 0.6069 0.9409
## PB 0.8943 0.9757 1.2363 0.5764 2.0072 1.3140
## RU 1.4489 0.2849 1.0332 0.8528 0.6298 0.7881
## AU 0.2354 1.8855 0.5795 1.8668 0.7844 1.5825
## FI 0.2657 3.2814 0.6713 1.6138 1.0953 0.1628
## NO 0.9261 0.1060 2.0207 1.1660 0.4368 0.0604
## PO 0.4779 2.6278 0.4519 1.1041 0.0036 2.0078
## SP 1.1486 0.5373 1.1365 1.3533 0.6826 0.2444
## SV 0.8759 0.5293 2.0648 0.7799 1.2363 0.1062
## CH 0.4876 0.1154 0.9784 2.7340 0.6778 0.3889
## TU 0.0000 3.3700 2.0587 0.4243 0.0000 0.7157
## YU 0.2844 0.9407 1.1821 2.0449 1.3135 0.7584
The specialization index is defined as the proportion of the monetary value of an electronic industry compared to the total export value of manufactured goods of a country compared to the similar proportion of the world as a whole. The CP decomposition with \(r = 2\) components can then be found as follows:
res <- Parafac(elind, ncomp = 2, center = FALSE, scale = FALSE)
res
## Call:
## Parafac(X = elind, ncomp = 2, center = FALSE, scale = FALSE)
##
##
## PARAFAC analysis with 2 components.
## Fit value: 405.6672
## Fit percentage: 75.22 %
##
To test, we get results robust (see Engelen et. al for a review) to outliers by doing
robust_res <- Parafac(elind, ncomp = 2, robust = TRUE, center = FALSE, scale = FALSE)
robust_res
## Call:
## Parafac(X = elind, ncomp = 2, center = FALSE, scale = FALSE,
## robust = TRUE)
##
##
## PARAFAC analysis with 2 components.
## Fit value: 170.8458
## Fit percentage: 89.56 %
## Robust
Extracting each component and scaling term is then
robust_res$A
## F1 F2
## CA 6.112409 1.0944118
## US 7.238711 2.1748070
## JP 2.554800 -5.9056305
## AS 7.552682 2.2842555
## NZ 7.035897 0.5460618
## BL 3.811393 -5.1183410
## DA 8.691411 0.1332259
## FR 7.426601 1.5533014
## RF 6.962769 -0.4995925
## GR 8.559444 1.7025995
## IR 5.376614 1.9784642
## IT 6.396468 0.8289045
## PB 7.770711 1.2331546
## RU 6.470796 1.6295433
## AU 3.899038 -5.9482228
## FI 3.366855 -7.3366515
## NO 7.362402 2.2395779
## PO -0.757043 -10.1260040
## SP 6.203030 0.6527305
## SV 7.546543 1.4506753
## CH 8.759134 1.8720126
## TU 5.637381 -2.9564821
## YU 7.021317 -1.5914762
robust_res$B
## F1 F2
## INFO -0.3742757 0.03028604
## RADI -0.3201128 0.89754891
## TELE -0.4640224 0.16784491
## STRU -0.5505910 0.31126951
## ELET -0.3799499 0.13207825
## COMP -0.3075976 0.22579510
robust_res$C
## F1 F2
## 78 -0.3626485 -0.2256045
## 79 -0.3682896 -0.3684405
## 80 -0.3752955 -0.3578719
## 82 -0.3736965 -0.3896537
## 83 -0.3847080 -0.4243388
## 84 -0.3917610 -0.4209556
## 85 -0.3884277 -0.4197439
robust_res$GA
## F1 F2
## 31.24778 17.34768
The function also allows automatic reconstruction to the estimated tensor \(\hat{\mathcal{X}}\).
If the PARAFAC decomposition is done on the coefficient tensor, this
has no interpretation. The CP has some interpretation, but this is much
more difficult than the Tucker decomposition as we will see below -
Looking at robust_res$A, we see factors corresponding to
each country, industry, and time period.
The centering and scaling factor simply changes the scaling value and brings it closer to 0.
The tucker decomposition can be seen as a generalization of the CP decomposition. When we perform the CP decomposition, we have to retroactively decide on the number of components \(R\). In the Tucker decomposition, we still decide on the number of components, but in this case, we select \(n\) components which correspond to each \(n\) dimensions of the tensor. Intuitively, if we have a three dimensional tensor, we can decompose this into a smaller core tensor and various “factor matrices” . The dimensions of the core tensor are predetermined at \(P \times Q \times R\) which are typically of lower dimension to the original tensor. Our formula for estimating our data tensor is then \[\hat{\mathcal{X}} = \mathcal{G} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C} = \sum_{p=1}^P \sum_{q=1}^Q \sum_{r=1}^R g_{pqr} \mathbf{a}_p \circ \mathbf{b}_q \circ \mathbf{c}_r = [[\mathcal{G}; \mathbf{A}, \mathbf{B}, \mathbf{C}]]\] This can be seen in the \(n=3\) dimensional case as
So we have essentially obtained a way of compressing our data tensor into a smaller core tensor and \(n\) factor matrices.
We make use of the application given in Lettau (2022), which describes a three-dimensional data set consisting of \(C = 25\) characteristics of \(M = 1342\) mutual funds observed over \(T = 34\) quarters. In other words, we have \[\hat{\mathcal{X}}(K_1,K_2,K_3) = \mathcal{G} \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \times_3 \mathbf{V}^{(3)}\] \[\mathcal{G} \in \mathbb{R}^{K_1 \times K_2 \times K_3}, \quad \mathbf{V}^{(1)} \in \mathbb{R}^{T \times K_1}, \quad \mathbf{V}^{(2)} \in \mathbb{R}^{M \times K_2}, \quad \mathbf{V}^{(3)} \in \mathbb{R}^{C \times K_3}\] Slice interpretation:
Mode interpretation:
We can get more intuition on this last point through the following
We can then obtain factor structures for each mode independently, and each factor model will be related to one another because they contain the core tensor. This is explained in more detail later.
Moving to our example, we have a model \[\hat{\mathcal{X}}(K_1 = 10,K_2 = 25,K_3 = 9) = \mathcal{G} \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \times_3 \mathbf{V}^{(3)}\] \[\mathcal{G} \in \mathbb{R}^{10 \times 25 \times 9}, \quad \mathbf{V}^{(1)} \in \mathbb{R}^{10 \times 34}, \quad \mathbf{V}^{(2)} \in \mathbb{R}^{25 \times 1342}, \mathbf{V}^{(3)} \in \mathbb{R}^{9 \times 25}\] In the paper, \(K_1, K_2, K_3\) is chosen based on a ratio of fit/parsimony. There are other methods which include estimating multiple Tucker models, deviance analysis, and even scree plots that have been extended to higher dimensions (see Kroonenberg 2008 for a review). The authors note that high values typically are within the first time dimension, indicating that the core tensor is dominated by elements from the first time index, suggesting that the first time dimension plays a particularly important role. (Models with \(K_1 = 1\) did have a good fit)
Each \(\mathbf{V}^{(i)}\) matrix can then be interpreted as the \(\mathbf{U}\) and \(\mathbf{V}\) components of the normal factor model. Additionally, a three dimensional tucker decomposition implies three two-dimensional factor representations in which \(\mathbf{V}^{(1)}, \mathbf{V}^{(2)}, \mathbf{V}^{(3)}\) are loading matrices for appropriately defined factors.
For a specific matrix, take \(\mathbf{V}_1\), which is the matrix corresponding to the time dimension. Rows represent all quarters and columns correspond to the 10 mode-1 components of the tucker model. The first row is non-volatile, suggesting that the this is a mean or “level” factor. The second row progressively gets positive, thus can be a “slope” factor.
We start with a \(10 \times 25 \times 9\) representative tensor. We then obtain the time series by multiplying this tensor with the time dimension \[\mathcal{G} \times_1 \mathbf{V}^{(1)} \in \mathbb{R}^{34 \times 25 \times 9}\] We can then follow each of these representative firms with specific representative characteristics over all time points and graph it. The following figure shows time-series of eight representative fund/characteristic combinations. The first column shows the first four “diagonal” combinations and the second column shows the first four “off-diagonal” combinations. Diagonal elements are “level” factors with positive values over the entire sample, while the off diagonal combinations flip signs and are thus “long/short” factors.
We then construct a factor model based on these results. Our typical factor model with \(k\) factors selected can be represented as \[\hat{\mathbf{X}}_k = \mathbf{F}_k \mathbf{B}_k^\top = \sum_{t=1}^K \sum_{k=1}^K s_{tn} u_t v_n^\top = \sum_{k=1}^K s_{kk} u_k v_k^\top\] with \(u_k\) as rows and \(v_k\) as columns and \(s_{kk}\) as weights. This can be done in tensor notation as \[\mathbf{F}_k\mathbf{B}_k^\top = \mathbf{F}_k \times_2 \mathbf{B}_k\] \[= \mathbf{S}_k \times_1 \mathbf{U}_k \times_2 \mathbf{V}_k\] \[= \sum_{k=1}^K s_{kk} \mathbf{u}_k \circ\mathbf{v}_k\] Using the tucker decomposition, we construct a factor model as \[\mathcal{Y} = \hat{\mathcal{X}} + \mathcal{E} \] \[\hat{\mathcal{X}}(K_1,K_2,K_3) = \mathcal{G} \times_1 \mathbf{V}^{(1)} \times_2 \mathbf{V}^{(2)} \times_3 \mathbf{V}^{(3)}\] \[\mathcal{G} \in \mathbb{R}^{K_1 \times K_2 \times K_3}, \quad \mathbf{V}^{(1)} \in \mathbb{R}^{T \times K_1}, \quad \mathbf{V}^{(2)} \in \mathbb{R}^{M \times K_2}, \quad \mathbf{V}^{(3)} \in \mathbb{R}^{C \times K_3}\]
Here, we note that the three dimensional tensor decomposition implies one “stacked” two-dimensional factor model for each of the three modes. We can show this as follows for the \(C\) mode from above
This form has similar properties to the two-dimensional factor model and can be interpretted as such. Since they all stem from the same three dimensional Tucker model, they are internally consistent with one another.
Babii, A., Ghysels, E., & Pan, J. (2022). Tensor Principal Component Analysis. arXiv preprint arXiv:2212.12981.
Battaglino, C., Ballard, G., & Kolda, T. G. (2018). A practical randomized CP tensor decomposition. SIAM Journal on Matrix Analysis and Applications, 39(2), 876-901.
D’Ambra, L. (1985). Alcune estensione dell’analysi in componenti principali per lo studio dei sistemi evolutivi. Uno studio sul commercio internazionale dell’elettronica. In: Ricerche Economiche. 2. del Dipartimento di Scienze Economiche Ca’ Foscari, Venezia.
Engelen, S., Frosch, S., & Jørgensen, B. M. (2009). A fully robust PARAFAC method for analyzing fluorescence data. Journal of Chemometrics: A Journal of the Chemometrics Society, 23(3), 124-131.
Kroonenberg, P. M. (2008). Applied multiway data analysis. John Wiley & Sons.
Lettau, M. (2022). High dimensional factor models with an application to mutual fund characteristics (No. w29833). National Bureau of Economic Research.