We visualized historical data on the Japanese economy for 1997-2010 by means of topological data analysis. We found that periods of growth and periods of economic or financial crisis display very different topological characteristics.
Specifically, one of the most distinctive features associated with a period of economic growth is a cavity in a certain simplicial complex constructed from the time series of Tokyo Stock Exchange indices. Such a cavity is not present in the simplicial complexes associated with crisis periods.
First and foremost, I would like to extend my gratitude to Dr. Fedor DUZHIN for his continuous guidance and excellent supervision in this project. I would also like to thank Yim Woei Shyr, whose original work, a minimal spanning tree analysis of the relationships between Japanese economic sectors, is the basis for this project.
Chapter 1: Introduction
Chapter 2: Literature Review
Chapter 3: Data and Methods
3.1: Data
3.2: Correlation Coefficient Matrix and Visualization
3.3: Persistent Homology
3.3.1 Simplicial Complex
A. Explanation
B. Input and Example
3.3.2 Barcode
A. Explanation
B. Input and Example
Chapter 4: Topological Data Analysis
4.1: Method 1
A. Overview
B. Existence of Cavities
C. Conclusion for method 1
4.2: Method 2
A. Overview
B. Experiment
C. Conclusion for method 2
4.3: Prediction
Chapter 5: Conclusion and Future Improvement
In the last 20 years, Japan has gone through five different macroeconomic periods: the Asian Financial Crisis (1997-1999), the Dot-Com Bubble Crisis (2000-2002), the Growth Period (2003-2006), the Subprime Crisis (2007-2009) and the Lehman Crisis (2008-2010). The growth of the Japanese economy gradually slowed over the past 20 years [1], and the economy was hobbled by the crippling aftermath of the burst of the bubble. The Asian Financial Crisis began in Japan in November 1997 and resulted in the collapse of large banks and securities companies: "Defaulting financial institutions had over the years failed to improve their balance sheets that had been impaired due to excessive investment in real estate or stocks during the bubble boom period" (Naohisa, 2014). The Subprime Crisis began in the summer of 2007; spillover from the recession in the United States and the ensuing turmoil in global financial markets pushed Japan's annual GDP growth rate in 2009 down to -5.5 percent [2].
Figure 1 shows the Nikkei Stock Average over the last ten years and illustrates the great impact that financial crises had on the Japanese economy. This motivates us to study the topology of economic data.
figure 1
In this paper we use a topological data analysis technique, persistent homology, to visualize the data over different time intervals. The choice of time intervals follows the previous project by Yim Woei Shyr, which considered four financial crisis periods and one non-financial crisis period [3]. In our project, we add one more non-financial crisis period, preceding the Asian Financial Crisis.
In this project, we performed a correlation coefficient matrix analysis and a topological data analysis of 32 Japanese Nikkei economic sector indices over the period from 1 January 1997 to 31 December 2010.
In Chapter 2, we review the final year project by Yim Woei Shyr, "Causal Links between Japanese Economic Sectors". We briefly analyze the tool used by Yim Woei Shyr and identify what topological tools can improve. In Chapter 3, we introduce the relevant topological background and the data collection and processing methods. In Chapter 4, we use two topological methods to analyze the economic indices and try to predict a financial crisis from the existence of cavities.
Our project is based on the previous project "Causal Links between Japanese Economic Sectors" by Yim Woei Shyr, which analyzed the Japanese economy using minimum spanning trees. We aim to improve his study by using topological tools.
To represent the relationships between economic sectors, we use the same method, the correlation coefficient matrix, which is explained in Chapter 3.
In Yim's minimum spanning tree analysis, he found that the structure of the minimum spanning tree displays different characteristics in different periods. During the Asian Financial Crisis the minimum spanning tree is extremely star-like, and it becomes more chain-like in other crisis periods [3]. For example, Figure 2 shows the structure during the Asian Financial Crisis (1997-1999), and Figure 3 shows the structure during the Growth Period (2003-2006).
figure 2
figure 3
Yim noted that the minimum spanning tree has different structures during different economic periods, which suggests that different periods may also have different topological structures.
We improved on this project in two ways. First, we visualized the relationships between economic sectors by plotting time series and simplicial complexes; simplicial complexes give a clearer picture of how economic sectors are correlated. Second, Yim's project only showed that there are different structures in different macroeconomic periods, but did not address whether it is possible to predict a financial crisis. In our project, by using topological tools such as barcodes, we conclude that it is possible to predict a financial crisis by looking at the existence of cavities.
Tick-by-tick data for 32 Nikkei Japanese economic sector indices from 1 January 1997 to 31 December 2010 was downloaded from Bloomberg [4]. These datasets were then processed into daily time series \[X_i= \{X_{i,1}, X_{i,2}, X_{i,3},\dots, X_{i,N}\}\], where the index i = 1, ..., 32 runs over the 32 Japanese economic sectors and N is the total number of trading days in the study period.
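The text does not say how the ticks were collapsed into one value per day. A minimal Python sketch, assuming (hypothetically) that each day's value is the last traded price of that day, could look like this; the function name and the toy ticks are ours, not from the project's R code.

```python
from datetime import datetime


def ticks_to_daily(ticks):
    """Collapse (timestamp, price) ticks into one value per calendar day.

    Hypothetical rule: keep the price of the last tick of each day.
    Returns (date, price) pairs sorted by date.
    """
    last = {}  # date -> (timestamp, price) of the latest tick seen so far
    for ts, price in ticks:
        day = ts.date()
        if day not in last or ts >= last[day][0]:
            last[day] = (ts, price)
    return [(day, last[day][1]) for day in sorted(last)]


# Made-up ticks for two days, deliberately out of order.
ticks = [
    (datetime(1997, 1, 6, 14, 59), 101.0),
    (datetime(1997, 1, 6, 9, 0), 100.0),
    (datetime(1997, 1, 7, 9, 0), 100.5),
    (datetime(1997, 1, 7, 15, 0), 99.5),
]
daily = ticks_to_daily(ticks)
prices = [p for _, p in daily]  # the series X_{i,1}, X_{i,2}, ... for one sector
```

Days without any tick simply do not appear, so N counts only days on which trading took place.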
Table 1
i | Sector | Abbreviation | i | Sector | Abbreviation |
---|---|---|---|---|---|
1 | BANKS | BNK | 17 | OIL&COAL PROD | OIL |
2 | AIR TRANS | AIR | 18 | WHOLESALE&TRD | WASL |
3 | REAL ESTATE | REAL | 19 | SERVICS | SERV |
4 | PHARMACEU | PHRM | 20 | OTHER FINC BUS | FINC |
5 | ELECTRIC APPL | ELMH | 21 | CHEMICALS | CHM |
6 | RETAIL | RETL | 22 | PREC INTRUMENT | PREC |
7 | FOODS | FOOD | 23 | MARINE TRANS | MART |
8 | TRANSPORT | TRAN | 24 | LAND TRANS | LAND |
9 | MACHINERY | MACH | 25 | PULP&PAPER | PAPR |
10 | IRON&STEEL | IRON | 26 | METAL PRODUCTS | METL |
11 | ELE POWER&GAS | ELEC | 27 | RUBBER PRODUCT | RUBB |
12 | CONSTRUCTION | CONT | 28 | OTHER PRODUCTS | PROD |
13 | INFO&COMM | COMM | 29 | NONFER METAL | NMET |
14 | FISH&ARG&FRST | FISH | 30 | GLSS&CRMC PRD | GLAS |
15 | MINING | MINN | 31 | WARE | WARE |
16 | SEC&CMDTY FUTR | SECR | 32 | TXTL&APPR | TEXT |
The raw datasets were processed into the index values of each sector from the first trade (1 January 1997, 8 am) to the last trade (31 December 2010, 9 pm). Table 2 shows part of the processed data.
Table 2
Day | REAL | PHRM | ELMH | RETL | FOOD |
---|---|---|---|---|---|
1 | 835.30 | 1093.2 | 1704.85 | 901.3 | 869.25 |
2 | 835.30 | 1093.2 | 1704.85 | 901.3 | 869.25 |
3 | 835.30 | 1093.2 | 1704.85 | 901.3 | 869.25 |
4 | 833.22 | 1098.6 | 1728.10 | 904.3 | 873.42 |
5 | 827.92 | 1077.8 | 1712.61 | 897.4 | 856.62 |
6 | 817.92 | 1070.9 | 1703.17 | 885.2 | 846.85 |
7 | 763.56 | 1051.9 | 1672.58 | 865.4 | 821.61 |
8 | 765.31 | 1026.1 | 1631.15 | 852.9 | 788.20 |
9 | 796.50 | 1029.8 | 1646.94 | 865.0 | 819.48 |
10 | 819.00 | 1041.7 | 1647.54 | 870.7 | 815.41 |
The correlation coefficient between sectors i and j over a time interval from day d1 to day d2 estimates the degree to which the two sectors move together:
\[\operatorname{corr}(X_i, X_j)= \frac{\sum_{t=d_1}^{d_2}(X_{i,t} - \bar{X}_i)(X_{j,t} - \bar{X}_j)}{\sqrt{\sum_{t=d_1}^{d_2}(X_{i,t} - \bar{X}_i)^2} \sqrt{\sum_{t=d_1}^{d_2}(X_{j,t} - \bar{X}_j)^2}}\]
where \[\bar{X}_i\] and \[\bar{X}_j\] denote the mean values of sectors i and j over days d1 to d2.
The correlation coefficient matrix of the 32 Nikkei economic sectors used in this research is a 32 by 32 symmetric matrix. Every entry \[(X_i,X_j)\] represents the correlation coefficient between the ith economic sector and jth economic sector.
Correlation coefficient matrix has following properties:
Each entry lies in the range \[[-1, 1]\].
The larger the value, the stronger the positive correlation between the two sectors.
\[(X_i, X_j)=(X_j, X_i)\]
\[(X_i, X_i)=1\].
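The formula and the properties above can be checked with a minimal Python sketch of a windowed correlation matrix. This is an illustration only (the project computed its matrices in R); the helper names and the toy series are ours, and days are 0-based here whereas the text counts from day 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if a series is constant


def correlation_matrix(series, d1, d2):
    """Correlation matrix of all sectors over days d1..d2 (inclusive).

    series[i][t] is the value of sector i on day t (0-based days).
    """
    window = [s[d1:d2 + 1] for s in series]
    k = len(window)
    return [[1.0 if i == j else pearson(window[i], window[j]) for j in range(k)]
            for i in range(k)]


# Toy series for three sectors (not real data).
series = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1]]
X = correlation_matrix(series, 0, 4)
# X[0][1] is (up to rounding) 1 and X[0][2] is -1.
```

A window in which some index never moves (for example, days 1 to 3 of Table 2) has zero variance, so its correlation is undefined; such windows would need special handling.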
figure 4
As an example, Figure 4 shows time series plots of COMM, MINN and RETL from day 1 to day 800. Over this period, corr(RETL, COMM) = 0.94, corr(RETL, MINN) = -0.05 and corr(COMM, MINN) = -0.21.
A time interval must be chosen before a correlation coefficient matrix can be computed, and the correlations change over time.
The correlation coefficient matrix X was then converted, entry by entry, into a distance matrix A = 1 - X for plotting simplicial complexes. A gives the distance between each pair of sectors: the more strongly two sectors are correlated, the closer they are. As an example, Figure 5 shows part of the distance matrix for the interval (430, 460). Since every diagonal entry of X is 1, entry (i, i) of A is 0, and A is also a symmetric matrix.
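The conversion is a single entry-wise operation. The Python sketch below applies it to the three correlations quoted for Figure 4 (RETL, COMM, MINN over days 1 to 800); the helper name is ours. Because correlations lie in [-1, 1], every distance lies in [0, 2].

```python
def distance_matrix(corr):
    """Entry-wise conversion A = 1 - X of a correlation matrix X into distances."""
    return [[1.0 - x for x in row] for row in corr]


# Correlations quoted for Figure 4 (days 1 to 800), sectors ordered RETL, COMM, MINN.
X = [
    [1.0, 0.94, -0.05],
    [0.94, 1.0, -0.21],
    [-0.05, -0.21, 1.0],
]
A = distance_matrix(X)
# Strongly correlated RETL and COMM end up close (about 0.06 apart);
# weakly or negatively correlated pairs end up more than 1 apart.
```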
figure 5
Sector | BNK | AIR | REAL | PHRM |
---|---|---|---|---|
BNK | 0.0000000 | 0.1030363 | 0.10716170 | 0.5644072 |
AIR | 0.1030363 | 0.0000000 | 0.26165774 | 0.5933658 |
REAL | 0.1071617 | 0.2616577 | 0.00000000 | 0.6044045 |
PHRM | 0.5644072 | 0.5933658 | 0.60440449 | 0.0000000 |
ELMH | 0.2643780 | 0.4799163 | 0.07347386 | 0.6490273 |
Persistent homology is an algebraic method for discerning topological features of data. Topological features can be components, holes, graph structure, etc. [5] The data can be a set of discrete points equipped with a metric. Once we have the correlation coefficient matrix from the previous step, we use simplicial complexes and barcodes to visualize the structure of the data.
We use the point cloud in Figure 6 to explain what topological features such data can exhibit.
figure 6
These points appear to have been sampled roughly from an annulus. The difficulty in detecting the annulus from the points alone is that a set of discrete points has trivial topology. One idea to overcome this is to connect nearby points. We first choose a distance, or scale parameter, d, and draw a ball of diameter d around each point. Two balls intersect exactly when the two points are no further apart than distance d, in which case we connect the two points with an edge, as shown in Figure 7 [6].
figure 7
This creates a graph whose vertices are the original points. The graph shows us that the points form a single cluster at scale parameter d, but it doesn't tell us about higher-order features such as holes. For example, this graph has many cycles, but the graph doesn't help us identify the central hole in the data [7].
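The construction above can be sketched in a few lines of Python: join two points when their distance is at most d, then count clusters with a union-find structure. The eight points on a unit circle are a toy stand-in for Figure 6, and the helper names are ours.

```python
import math
from itertools import combinations


def neighborhood_graph(points, d):
    """Edges between points at distance <= d, i.e. whose balls of diameter d meet."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= d]


def count_clusters(n, edges):
    """Number of connected components of a graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    components = n
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components


# Eight points evenly spaced on a unit circle (toy data).
points = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
small = neighborhood_graph(points, 0.5)   # no edges: eight separate clusters
medium = neighborhood_graph(points, 0.8)  # neighbours joined: one cluster, one cycle
```

At d = 0.8 the graph is a single cluster containing a cycle, but nothing in the graph itself says whether that cycle surrounds a hole; that is the gap the simplicial complex fills.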
A simplicial complex is an object built from points, edges, triangular faces, and so on. A point is a zero-dimensional simplex, an edge between two points is a one-dimensional simplex, a triangular face is a two-dimensional simplex, a solid tetrahedron is a three-dimensional simplex, and so on for higher-dimensional simplices. If we glue many simplices together in such a way that the intersection between any two simplices is also a simplex, we obtain a simplicial complex. Homology counts components, holes, and voids of a simplicial complex [7].
We go back to the point cloud and fill in complete simplices. That is, any three points that are pairwise connected get filled in with a two-dimensional face (a triangle), any four points that are all pairwise connected get filled in with a three-simplex, and so on. The resulting simplicial complex is the Rips complex, shown in Figure 8. We then apply homology to this complex, which reveals the presence of the central hole.
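A minimal Python sketch of this step, assuming Z/2 coefficients: build the Rips complex at scale d from a distance matrix, then compute Betti numbers (counts of components, loops, ...) from the ranks of the boundary matrices. The helpers `rips_complex`, `gf2_rank` and `betti_numbers` are our own illustrations, not functions of the R TDA package used in the project.

```python
import math
from itertools import combinations


def rips_complex(dist, d, max_dim):
    """Rips complex at scale d: every vertex set that is pairwise within d.

    Returns {k: list of k-simplices}, each simplex a sorted tuple of vertices.
    """
    n = len(dist)
    simplices = {0: [(v,) for v in range(n)]}
    for k in range(1, max_dim + 1):
        simplices[k] = [s for s in combinations(range(n), k + 1)
                        if all(dist[a][b] <= d for a, b in combinations(s, 2))]
    return simplices


def gf2_rank(rows):
    """Rank over Z/2 of a 0/1 matrix whose rows are given as integer bitmasks."""
    rows = [r for r in rows if r]
    rank = 0
    while rows:
        pivot = rows.pop()
        rank += 1
        top = pivot.bit_length() - 1  # highest set bit is the pivot column
        rows = [r ^ pivot if (r >> top) & 1 else r for r in rows]
        rows = [r for r in rows if r]
    return rank


def betti_numbers(simplices, top):
    """Z/2 Betti numbers b_0..b_top; simplices must go up to dimension top + 1."""
    index = {k: {s: i for i, s in enumerate(ss)} for k, ss in simplices.items()}

    def boundary_rank(k):
        if k == 0 or k not in simplices:
            return 0
        rows = []
        for s in simplices[k]:
            mask = 0
            for face in combinations(s, k):  # the k + 1 faces of dimension k - 1
                mask |= 1 << index[k - 1][face]
            rows.append(mask)
        return gf2_rank(rows)

    return [len(simplices[k]) - boundary_rank(k) - boundary_rank(k + 1)
            for k in range(top + 1)]


# Toy point cloud: eight points on a unit circle, with Euclidean distances.
points = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
dist = [[math.dist(p, q) for q in points] for p in points]
```

At scale 0.8 the complex is a single cycle with no triangles, so b_1 = 1 detects the hole; at scale 2.5 every triangle is filled in and the hole disappears (b_1 = 0).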
figure 8
Every time we construct a simplicial complex, there are two inputs.
1: distance matrix
2: threshold
When plotting a simplicial complex, we set a threshold and connect every pair of vertices whose distance is smaller than the threshold. When two vertices are connected, the correlation coefficient between these two economic sectors is larger than 1 - threshold. The more vertices are connected, the stronger the relationships in the whole economy; and the larger the threshold, the more edges appear.
We use the above distance matrix from figure 5 as an example, which was produced from interval (430, 460). In figure 9, threshold is equal to 0.08, and in figure 10, threshold is 0.1.
figure 9
figure 10
We can see that as the threshold increases, more edges appear.
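The thresholding rule can be tried on the five sectors whose distances are printed in Figure 5. Only the ELMH row is printed there, so in this Python sketch the ELMH column is filled in by symmetry; the helper name is ours. Because only five of the 32 sectors are used, the edge sets are much smaller than in Figures 9 and 10.

```python
from itertools import combinations

# Distances among the five sectors printed in Figure 5 (interval (430, 460)).
SECTORS = ["BNK", "AIR", "REAL", "PHRM", "ELMH"]
A = [
    [0.0000000, 0.1030363, 0.10716170, 0.5644072, 0.2643780],
    [0.1030363, 0.0000000, 0.26165774, 0.5933658, 0.4799163],
    [0.1071617, 0.2616577, 0.00000000, 0.6044045, 0.07347386],
    [0.5644072, 0.5933658, 0.60440449, 0.0000000, 0.6490273],
    [0.2643780, 0.4799163, 0.07347386, 0.6490273, 0.0000000],
]


def edges_below(dist, names, threshold):
    """Sector pairs joined at this threshold: distance < threshold, which is the
    same as correlation 1 - distance > 1 - threshold."""
    return [(names[i], names[j])
            for i, j in combinations(range(len(names)), 2)
            if dist[i][j] < threshold]


for t in (0.08, 0.1, 0.11):
    print(t, edges_below(A, SECTORS, t))
```

Among these five sectors only REAL and ELMH are joined at thresholds 0.08 and 0.1; raising the threshold to 0.11 adds the BNK-AIR and BNK-REAL edges.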
The problem is how to choose the value of d. If d is too small, we might see multiple connected components and small holes that are artifacts of the sampling; in short, we detect noise. On the other hand, if d is too large, any two points get connected and we get a giant simplex, which has trivial homology. Rather than choosing a single distance d, we therefore consider all distances d, which lets us pick out the significant topological features.
Observe that each hole appears at some particular value of d and disappears at another value of d. For example, consider the four points in Figure 11. There is a smallest distance d1 that is just large enough for these four edges to appear, creating a hole in the middle [8].
figure 11
For the same configuration of points, there is another distance d2, shown in Figure 12, that is just large enough for an edge to appear between opposite points. This completes two triangles that get filled in, and the hole disappears.
figure 12
So the hole appears at distance d1 and disappears at distance d2. We can represent the persistence of this hole as a pair (d1, d2). We can also visualize this pair as an interval, or bar, from d1 to d2. This bar is a visual representation of the persistence of the hole. A collection of such bars is called a barcode [8].
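The pairs (d1, d2) can be computed with the standard boundary-matrix reduction over Z/2. The Python sketch below is the textbook algorithm for a Vietoris-Rips filtration, written for small examples; it is not the code of the R TDA package used in this project [9], and `rips_barcode` is our own name. For the four corners of a unit square it reports one loop with d1 = 1 (the sides appear) and d2 = sqrt(2) (the diagonals appear and the triangles fill in).

```python
import math
from itertools import combinations


def rips_barcode(dist, max_dim=1):
    """Barcode of the Vietoris-Rips filtration, computed with Z/2 coefficients.

    Returns sorted (dimension, birth, death) triples for dimensions 0..max_dim;
    bars that never die get death = math.inf.
    """
    n = len(dist)
    # Every simplex up to dimension max_dim + 1 enters the filtration at the
    # largest pairwise distance among its vertices (0 for a single vertex).
    simplices = []
    for k in range(max_dim + 2):
        for s in combinations(range(n), k + 1):
            value = max((dist[a][b] for a, b in combinations(s, 2)), default=0.0)
            simplices.append((value, k, s))
    simplices.sort()  # by value, then dimension, so faces come before cofaces
    index = {s: i for i, (_, _, s) in enumerate(simplices)}
    # Column j of the boundary matrix, stored as the set of its nonzero rows.
    columns = [{index[f] for f in combinations(s, k)} if k > 0 else set()
               for _, k, s in simplices]
    owner = {}  # pivot row -> column whose lowest nonzero entry is that row
    bars = []
    for j, col in enumerate(columns):
        while col and max(col) in owner:
            col ^= columns[owner[max(col)]]  # add an earlier column (mod 2)
        if col:
            i = max(col)  # simplex i creates a class that simplex j destroys
            owner[i] = j
            if simplices[j][0] > simplices[i][0]:  # skip zero-length bars
                bars.append((simplices[i][1], simplices[i][0], simplices[j][0]))
    paired = set(owner) | set(owner.values())
    for i, (value, k, _) in enumerate(simplices):
        if i not in paired and k <= max_dim:
            bars.append((k, value, math.inf))
    return sorted(bars)


# Four points at the corners of a unit square, as in Figures 11 and 12.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in square] for p in square]
bars = rips_barcode(dist, max_dim=1)
loops = [b for b in bars if b[0] == 1]
```

Besides the loop, the barcode has three dimension 0 bars ending at 1 (the four corners merging) and one bar that never ends (the single remaining component).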
Returning to the point cloud example, we grow balls around each point and record the barcode. As the balls start to grow, edges begin to appear, and the complex gradually becomes connected. A few small holes appear but are quickly filled in. The large central hole appears and persists as more and more 2-simplices appear around the outside. Eventually, as the balls get large, edges appear across the center of the hole, and it gets filled in as well.
figure 13
The small holes that appear first are due to sampling irregularities, or noise, and are represented by short bars in the barcode. The large hole, which we regard as a significant feature of the data, is represented by a long bar.
The only input needed to plot a barcode is the distance matrix.
In our experiments in Chapter 4, we plotted barcodes of dimension 0, 1 and 2 features over different intervals to see whether topological features correlate with the Japanese economy. Figure 14 is an example of such a barcode, for the interval (850, 900) [9].
figure 14
Black bars represent connected components, red bars represent loops, and blue bars represent cavities.
Table 3 lists the persistence intervals of the topological features, that is, the range of thresholds over which each feature exists. For example, the first loop, in row 33, has the persistence interval (0.360, 0.436).
Table 3
Row | Dimension | Birth | Death |
---|---|---|---|
1 | 0 | 0.0000000 | 1.00000000 |
2 | 0 | 0.0000000 | 0.37542646 |
3 | 0 | 0.0000000 | 0.31013004 |
4 | 0 | 0.0000000 | 0.32084927 |
5 | 0 | 0.0000000 | 0.26558646 |
6 | 0 | 0.0000000 | 0.23525305 |
7 | 0 | 0.0000000 | 0.29954271 |
8 | 0 | 0.0000000 | 0.19143382 |
9 | 0 | 0.0000000 | 0.22327043 |
10 | 0 | 0.0000000 | 0.20921920 |
11 | 0 | 0.0000000 | 0.23871938 |
12 | 0 | 0.0000000 | 0.13927135 |
13 | 0 | 0.0000000 | 0.16699354 |
14 | 0 | 0.0000000 | 0.20330759 |
15 | 0 | 0.0000000 | 0.12452925 |
16 | 0 | 0.0000000 | 0.06070879 |
17 | 0 | 0.0000000 | 0.14799402 |
18 | 0 | 0.0000000 | 0.13085084 |
19 | 0 | 0.0000000 | 0.04652944 |
20 | 0 | 0.0000000 | 0.13321008 |
21 | 0 | 0.0000000 | 0.24384672 |
22 | 0 | 0.0000000 | 0.07538702 |
23 | 0 | 0.0000000 | 0.23653031 |
24 | 0 | 0.0000000 | 0.32576165 |
25 | 0 | 0.0000000 | 0.15132801 |
26 | 0 | 0.0000000 | 0.14405215 |
27 | 0 | 0.0000000 | 0.19146367 |
28 | 0 | 0.0000000 | 0.12618715 |
29 | 0 | 0.0000000 | 0.19190272 |
30 | 0 | 0.0000000 | 0.16037861 |
31 | 0 | 0.0000000 | 0.24137663 |
32 | 0 | 0.0000000 | 0.31373692 |
33 | 1 | 0.3605777 | 0.43643234 |
34 | 1 | 0.3924642 | 0.40075633 |
35 | 1 | 0.4287738 | 0.54295479 |
36 | 1 | 0.4679884 | 0.47844026 |
37 | 1 | 0.4801219 | 0.52406086 |
38 | 1 | 0.5050415 | 0.59696257 |
39 | 1 | 0.5119870 | 0.61572265 |
40 | 1 | 0.5955283 | 0.62092111 |
41 | 2 | 0.6210176 | 0.68007557 |
42 | 2 | 0.6539932 | 0.66710799 |
The following simplicial complexes are plotted according to Table 3.
Example 1
In Example 1, the threshold is 0.2, below the birth of the first loop (0.360), so only dimension 0 features exist.
Example 2
In example 2, the threshold is equal to 0.4. This simplicial complex contains loops.
Example 3
In Example 3, the threshold is 0.66, and the simplicial complex contains cavities.
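These three examples show how the simplicial complex at a given threshold corresponds to the persistence intervals: a feature is present in the complex exactly when the threshold lies within its persistence interval.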
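The correspondence can be checked directly against Table 3. This Python sketch counts, per dimension, the features whose persistence interval contains a given threshold, using the dimension 1 and 2 rows (33 to 42) of Table 3. The convention birth <= t < death is our assumption about how the endpoints are treated, and `alive_at` is our own helper name.

```python
# Dimension 1 and 2 rows of Table 3 (interval (850, 900)): (dimension, birth, death).
TABLE3_HIGHER = [
    (1, 0.3605777, 0.43643234),
    (1, 0.3924642, 0.40075633),
    (1, 0.4287738, 0.54295479),
    (1, 0.4679884, 0.47844026),
    (1, 0.4801219, 0.52406086),
    (1, 0.5050415, 0.59696257),
    (1, 0.5119870, 0.61572265),
    (1, 0.5955283, 0.62092111),
    (2, 0.6210176, 0.68007557),
    (2, 0.6539932, 0.66710799),
]


def alive_at(diagram, threshold):
    """Count features per dimension whose interval contains the threshold.

    Assumed convention: a feature exists for birth <= threshold < death.
    """
    counts = {}
    for dim, birth, death in diagram:
        if birth <= threshold < death:
            counts[dim] = counts.get(dim, 0) + 1
    return counts


for t in (0.2, 0.4, 0.66):  # the thresholds of Examples 1, 2 and 3
    print(t, alive_at(TABLE3_HIGHER, t))
```

At 0.2 no loop or cavity exists yet, at 0.4 two loops are alive (rows 33 and 34), and at 0.66 all loops have died while both cavities (rows 41 and 42) are alive.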
In this chapter, we use two methods to analyze the Japanese economy. The first method examines how dimension 0, 1 and 2 features behave over different intervals, to check whether any pattern exists.
The second method is, for every barcode, to count how many dimension 0 features exist when the first loop appears, that is, the number of connected components before the first loop appears. Over different intervals, we check whether the number of connected components correlates with the Japanese economy.
We collected data from 1997 to 2010, which covers six macroeconomic periods: before the Asian Financial Crisis (Jan 1997-Jul 1997), the Asian Financial Crisis (1997-1999), the Dot-com Bubble Crisis (2000-2002), the Growth Period (2003-2006), the Subprime Crisis (2007-2008) and Lehman's Crisis (2008-2010). We first plotted the barcodes of these six macroeconomic periods.
Figure 15. Barcode of before Asian Financial Crisis (Jan 1997-Jul 1997)
The barcode before the Asian Financial Crisis contains 3 loops and 1 cavity.
Figure 16. Barcode of the Asian Financial Crisis (1997-1999)
The barcode during the Asian Financial Crisis contains 3 loops and does not contain any cavity.
Figure 17. Barcode during Dot-com bubble Crisis (2000-2002)
In figure 17, the barcode during the Dot-com Bubble Crisis contains 5 loops and does not contain any cavity.
Figure 18. Barcode during the Growth Period (2003-2006)
The barcode during the Growth Period contains 5 loops and 1 cavity.
Figure 19. Barcode during the Subprime Crisis (2007-2008)
The barcode during the Subprime Crisis contains 7 loops and does not contain any cavity.
Figure 20. Barcode during Lehman's Crisis (2008-2010)
The barcode during Lehman's Crisis contains 4 loops and does not contain any cavity.
In the above six barcode plots, cavities appear only during the non-financial crisis periods (before the Asian Financial Crisis and the Growth Period); the remaining four financial crisis periods do not contain any cavity. We conclude that a cavity is a significant feature that is not present in the simplicial complexes associated with crisis periods.
In the next step, we plotted barcodes over short time intervals of 100, 30 and 10 days, and checked whether it is possible to predict a financial crisis from the existence of cavities.
First, we plotted barcodes for the intervals (0, 100), (100, 200), ..., (3400, 3500), each of length 100 days. In Figure 21, whenever a cavity exists in the barcode for the interval (n, n+100), we plot the point (n, 1). We denote the period before the Asian Financial Crisis by P1, the Asian Financial Crisis by P2, the Dot-com Bubble Crisis by P3, the Growth Period by P4, the Subprime Crisis by P5 and Lehman's Crisis by P6.
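The bookkeeping for these experiments can be sketched in Python: generate consecutive non-overlapping intervals, then mark each interval whose barcode contains a dimension 2 bar. The per-interval barcodes would come from whatever persistent homology routine is used; the helper names and the two toy barcodes are ours.

```python
def windows(start, end, length):
    """Consecutive non-overlapping intervals (n, n + length) for n = start, start + length, ... < end."""
    return [(n, n + length) for n in range(start, end, length)]


def cavity_points(diagrams):
    """Points (n, 1) for every interval (n, m) whose barcode contains a dimension 2 bar.

    diagrams maps an interval (n, m) to its (dimension, birth, death) triples.
    """
    return [(n, 1) for (n, _), bars in sorted(diagrams.items())
            if any(dim == 2 for dim, _, _ in bars)]


# Toy barcodes for two intervals (not real results).
toy = {
    (0, 100): [(0, 0.0, 1.0), (2, 0.62, 0.68)],
    (100, 200): [(0, 0.0, 1.0), (1, 0.36, 0.44)],
}
print(cavity_points(toy))
```

With lengths 100, 30 and 10 this produces 35, 117 and 350 intervals, ending at (3400, 3500), (3480, 3510) and (3490, 3500) respectively, which matches the intervals used in this chapter.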
figure 21
In Figure 21 the intervals are long, so there are very few points and we cannot draw a firm conclusion from the plot. It only shows that cavities always exist in non-financial crisis periods and may disappear in financial crisis periods.
In Figure 22, the interval length was shortened to 30 days, giving the intervals (0, 30), (30, 60), ..., (3480, 3510).
figure 22
Figure 22 shows a much clearer pattern than Figure 21. Several properties can be concluded from Figure 22:
Cavities appear in the non-financial crisis periods P1 and P4.
Cavities begin to disappear at the end of a non-financial crisis period. For example: the end of P3, the end of P5, and the end of P6.
Cavities begin to appear at the end of a financial crisis period. For example: the end of P2 and the end of P3.
The frequency of cavities has a downward trend. Figure 22 shows that at the end of P2 and the beginning of P3, and at the end of P3 and the beginning of P4, the frequency of cavities increases greatly. In P3, after the first few days, the frequency of cavities decreases. In non-financial crisis periods, the existence of cavities is relatively stable.
In the next experiment, we set the interval length to an even shorter period, 10 days, and plotted the barcodes for the intervals (0, 10), (10, 20), (20, 30), ..., (3490, 3500). Figure 23 shows the existence of cavities for intervals of length 10 days.
figure 23
Figure 23 shows the same four properties as Figure 22.
The existence of cavities in the simplicial complexes constructed from the distance matrix is correlated with financial crises. Cavities are present much more frequently during non-financial crisis periods than during financial crisis periods. Cavities begin to appear at the end of a financial crisis period and the beginning of a non-financial crisis period, and begin to disappear at the end of a non-financial crisis period and the beginning of a financial crisis period. The frequency of cavities has a downward trend. Table 4 summarizes how cavities behave over time.
Table 4
Cavities | Non-financial crisis period | Financial crisis period |
---|---|---|
Beginning | Appear | Disappear |
End | Disappear | Appear |
In method 2, we checked how the number of connected components correlates with time. The following example explains how the number of connected components is calculated.
Table 5
Row | Dimension | Birth | Death |
---|---|---|---|
1 | 0 | 0.00000000 | 1.00000000 |
2 | 0 | 0.00000000 | 0.22581776 |
3 | 0 | 0.00000000 | 0.02751141 |
4 | 0 | 0.00000000 | 0.04023113 |
5 | 0 | 0.00000000 | 0.03757904 |
6 | 0 | 0.00000000 | 0.04417213 |
7 | 0 | 0.00000000 | 0.03861217 |
8 | 0 | 0.00000000 | 0.02422360 |
9 | 0 | 0.00000000 | 0.01389908 |
10 | 0 | 0.00000000 | 0.03624461 |
11 | 0 | 0.00000000 | 0.20560669 |
12 | 0 | 0.00000000 | 0.06335140 |
13 | 0 | 0.00000000 | 0.03716806 |
14 | 0 | 0.00000000 | 0.07713222 |
15 | 0 | 0.00000000 | 0.04386357 |
16 | 0 | 0.00000000 | 0.02311983 |
17 | 0 | 0.00000000 | 0.08889636 |
18 | 0 | 0.00000000 | 0.01842369 |
19 | 0 | 0.00000000 | 0.03443356 |
20 | 0 | 0.00000000 | 0.01786935 |
21 | 0 | 0.00000000 | 0.01296671 |
22 | 0 | 0.00000000 | 0.02673606 |
23 | 0 | 0.00000000 | 0.02636075 |
24 | 0 | 0.00000000 | 0.03665232 |
25 | 0 | 0.00000000 | 0.24835265 |
26 | 0 | 0.00000000 | 0.06711453 |
27 | 0 | 0.00000000 | 0.13092646 |
28 | 0 | 0.00000000 | 0.02114192 |
29 | 0 | 0.00000000 | 0.02479764 |
30 | 0 | 0.00000000 | 0.01238545 |
31 | 0 | 0.00000000 | 0.03840975 |
32 | 0 | 0.00000000 | 0.02053932 |
33 | 1 | 0.04218670 | 0.05245362 |
34 | 1 | 0.05536789 | 0.05775787 |
35 | 1 | 0.05636216 | 0.06984020 |
36 | 1 | 0.24288710 | 0.28976306 |
Table 5 shows the persistence intervals of the topological features for the interval (2871, 3500). The first loop appears at threshold 0.04218670. To count the number of connected components before the first loop appears, we look at the death time of every dimension 0 feature: if the death time of a feature is less than the threshold 0.04218670, it is counted. In this example, 21 dimension 0 features have a death time less than this threshold, so the number of connected components is 21.
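This count is easy to automate. The Python sketch below applies it to the values in Table 5; `count_before_first_loop` is our own name, not a function from the project's R code. Under the usual reading of a barcode, a dimension 0 bar that has already ended corresponds to a component that has merged into another, so the number of components still separate at that threshold is 32 - 21 = 11. Since one quantity is 32 minus the other, the variances in Table 6 would be the same under either reading; only the means would shift.

```python
# Dimension 0 death times (rows 1-32) and the loops (rows 33-36) of Table 5,
# interval (2871, 3500).
DIM0_DEATHS = [1.00000000, 0.22581776, 0.02751141, 0.04023113, 0.03757904,
               0.04417213, 0.03861217, 0.02422360, 0.01389908, 0.03624461,
               0.20560669, 0.06335140, 0.03716806, 0.07713222, 0.04386357,
               0.02311983, 0.08889636, 0.01842369, 0.03443356, 0.01786935,
               0.01296671, 0.02673606, 0.02636075, 0.03665232, 0.24835265,
               0.06711453, 0.13092646, 0.02114192, 0.02479764, 0.01238545,
               0.03840975, 0.02053932]
LOOPS = [(1, 0.04218670, 0.05245362), (1, 0.05536789, 0.05775787),
         (1, 0.05636216, 0.06984020), (1, 0.24288710, 0.28976306)]
diagram = [(0, 0.0, d) for d in DIM0_DEATHS] + LOOPS


def count_before_first_loop(diagram):
    """Number of dimension 0 features that die before the first loop is born."""
    first_loop = min(birth for dim, birth, _ in diagram if dim == 1)
    return sum(1 for dim, _, death in diagram if dim == 0 and death < first_loop)


count = count_before_first_loop(diagram)                        # 21, as in the text
still_separate = sum(1 for d in DIM0_DEATHS if d > 0.04218670)  # 11 = 32 - 21
```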
In the first experiment, we set the interval length to 30 days, giving the intervals (0, 30), (30, 60), ..., (3480, 3510). Figure 24 shows the time series of the 117 computed values, where the x-axis represents time and the y-axis represents the number of connected components.
Figure 24-30 days
Figure 24 shows no clear pattern, and we cannot extract any useful information from it; the numbers of connected components appear to be randomly distributed. Figure 25 shows the corresponding time series for intervals of length 10 days.
Figure 25-10 days
Once again, figure 25 shows the points are randomly distributed.
Means and variances were then calculated for each macroeconomic period.
Table 6
Statistic | P1 | P2 | P3 | P4 | P5 | P6 |
---|---|---|---|---|---|---|
Mean | 13.46154 | 13.75758 | 15.50633 | 13.57143 | 14.14085 | 13.70312 |
Variance | 25.10 | 25.48 | 30.97 | 19.29 | 28.04 | 23.86 |
Table 6 shows that the variance is smallest in the Growth Period (P4), and that the variance before the Asian Financial Crisis (P1) is smaller than in P2, P3 and P5, though slightly larger than in P6. This suggests that the number of connected components tends to be more stable when the economy is stable, whereas the mean is not clearly related to the economic periods.
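The per-period statistics amount to grouping the per-interval values by period label. A minimal Python sketch follows; the helper name and toy values are ours. The text does not say whether Table 6 reports sample or population variance, so this assumes sample variance (`statistics.variance`); `statistics.pvariance` would give the population version.

```python
from statistics import mean, variance


def period_stats(values, labels):
    """Mean and sample variance of per-interval values, grouped by period label.

    Each period needs at least two values for the sample variance to exist.
    """
    groups = {}
    for v, p in zip(values, labels):
        groups.setdefault(p, []).append(v)
    return {p: (mean(vs), variance(vs)) for p, vs in groups.items()}


# Toy values for two periods (not real results).
stats = period_stats([10, 12, 14, 13, 13, 13], ["P1"] * 3 + ["P4"] * 3)
print(stats)
```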
The number of connected components is not significantly correlated with the macroeconomic periods; its values over different intervals look like randomly distributed points. However, the variance may be correlated with the economy: during financial crisis periods the number of connected components generally varies more than it does during non-financial crisis periods.
We not only want to see whether topological tools can distinguish non-financial crisis periods from financial crisis periods; we also want to see whether they can predict a financial crisis.
The properties concluded from method 1 can be used to predict a financial crisis. If the frequency of cavities decreases sharply, it may indicate the beginning of a financial crisis. When cavities appear stably over a long period, the economy is likely to be in a stable growth period.
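One way to turn "the frequency of cavities decreases sharply" into a concrete rule is sketched below in Python. This is a hypothetical rule, not the procedure used in this study: it compares the share of intervals with a cavity over the last k intervals with the share over the k intervals before them, and flags a drop of at least `drop`. Both k and `drop` are hypothetical tuning parameters, and the indicator sequence is toy data.

```python
def rolling_frequency(indicators, k):
    """Share of intervals with a cavity in each run of k consecutive intervals.

    indicators[t] is 1 if interval t has a cavity and 0 otherwise.
    """
    return [sum(indicators[j:j + k]) / k for j in range(len(indicators) - k + 1)]


def drop_alerts(indicators, k, drop):
    """Indices of intervals at which the cavity frequency over the last k intervals
    is at least `drop` lower than over the k intervals before them.

    k and drop are hypothetical tuning parameters, not values from this study.
    """
    freq = rolling_frequency(indicators, k)
    return [j + k - 1 for j in range(k, len(freq)) if freq[j - k] - freq[j] >= drop]


# Toy indicator sequence: ten intervals with cavities, then five without.
indicators = [1] * 10 + [0] * 5
alerts = drop_alerts(indicators, k=5, drop=0.5)
print(alerts)
```

In the toy sequence the first alert fires at interval 12, three intervals after cavities stop appearing; a smaller `drop` reacts earlier but would also fire more often on noise.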
To understand the topology of the Japanese economy, we conducted a persistent homology study of 32 Nikkei Japanese economic sector indices from 1 January 1997 to 31 December 2010. Correlation coefficient matrices were used to represent the relationships between economic sectors. To see how the correlation coefficient matrix reflects the Japanese economy, various plotting tools were used for a basic visualization of the data.
To better visualize the topology of the Japanese economy, the correlation coefficient matrix was then converted into a distance matrix for plotting simplicial complexes, which visualize the relationships between economic sectors clearly. The distance matrix gives the distance between each pair of economic sectors, and a smaller distance represents a stronger relationship. When constructing a simplicial complex, we add an edge between every pair of vertices whose distance is smaller than a chosen threshold. To study the topological features of the constructed simplicial complexes, a barcode was plotted for the correlation coefficient matrix of each interval, showing the existence of different topological features.
In this project, we used two main methods for the topological analysis. The first method analyzed the existence of cavities over different time intervals. Four properties were found:
Cavities appear stably in non-financial crisis periods.
Cavities begin to disappear at the end of a non-financial crisis period.
Cavities begin to appear at the end of a financial crisis period.
The frequency of cavities has a downward trend.
According to these properties, it is possible to predict a financial crisis: a sharp decrease in the frequency of cavities may indicate the beginning of a financial crisis.
In the second method, we studied the number of connected components before the first loop appears. We conclude that the number of connected components behaves like randomly distributed points, independent of time. However, its variance is generally smaller in non-financial crisis periods than in financial crisis periods, so non-financial crisis periods have a more stable distribution of connected components.
In future studies, several things can be improved:
More accurate data (hourly data) should be collected.
The economies of other countries, not only Japan, should be studied to check whether the four properties above also apply.
Besides barcodes, there are more advanced topological data analysis methods to be explored, for example cluster analysis, which divides a data set into disjoint groups that share some distinct defining properties or conceptual coherence, and persistence landscapes, whose main technical advantage is that a landscape is a function, so the vector space structure of the underlying function space can be used to provide insight into the "shape" of data [10].
[1] Shirakawa, M. (2012). Demographic changes and macroeconomic performance: Japanese experiences. Opening Remark at.
[2] Hirakata, N., Sudo, N., Takei, I., & Ueda, K. (2014). Japan's financial crises and lost decades (No. 220). Federal Reserve Bank of Dallas.
[3] Yim Woei Shyr (2010). Causal Links between Japanese Economic Sectors, MAS491 Honours Project Thesis. pp 12
[4] Bloomberg Terminal
[5] Edelsbrunner, H., & Harer, J. (2010). Computational topology: an introduction. American Mathematical Society.
[6] Wright, M. (2015). Computing Persistent Homology.
[7] Lesnick, M., & Wright, M. (2015). Interactive Visualization of 2-D Persistence Modules. arXiv preprint arXiv:1512.00180.
[8] Ghrist, R. (2008). Barcodes: the persistent topology of data. Bulletin of the American Mathematical Society, 45(1), 61-75.
[9] Fasy, B. T., Kim, J., Lecci, F., & Maria, C. (2014). Introduction to the R package TDA. arXiv preprint arXiv:1411.1830.
[10] Bubenik, P. (2012). Statistical topological data analysis using persistence landscapes. arXiv preprint arXiv:1207.6437.