EDA - Bond Yields

Introduction

This is an investigation of government bond data between 1970 and 2020 - from https://www.kaggle.com/datasets/everget/government-bonds - completed as part of an interview process for the International Capital Market Association in April 2022.

The starting point questions/points to consider are:

Which countries have the most volatility in yield and which the least? How has this evolved over time?
Which countries are most closely correlated in terms of movements in yields, and how has this evolved over time?
How does the maturity of a bond impact the volatility of its yield?
Which countries are least correlated in terms of yield?
Are there any observable correlations within regions?

..as well as additional/optional points:

Are there any other observations or insights from the data?
What additional data could be useful to build out the analysis?
Why does observing bond prices (as opposed to yields) tell us very little?

I’ll try to answer these broadly in order, exploring any tangents or interesting bits I find as I go. Write-ups in this kind of ‘conversational’ format are my preferred way of working for more ‘exploratory’ pieces of work, as they provide documentation, process notes, future avenues of exploration, and notes about failed approaches or similar ‘blind alleys’ all in one. These give me a decently thorough ‘refresher’ if I need to come back to a piece of work weeks or months later, so I am not left wondering what I was trying to accomplish with a piece of code, or why I approached something in a certain way: there is at least a modicum of context.

‘Other Insights and Observations’ is covered more-or-less throughout, and will not have a separate section. For those with little interest in the data reformatting (which will be, I assume, most people), I would recommend skipping straight to the start of my discussion on Volatility.

Data Reformatting

Most of the ‘hard part’ in this exercise is going to be restructuring the data. Currently, there are two tables - prices and yields - each with one row per day (apparently excluding weekends and at least some bank holidays, presumably in whatever jurisdiction this was collated in/for), and one column per bond type per nation, leading to c.13000 rows and 216 columns, as below:

knitr::kable(head(yields))

time	AU01	AU02	AU03	AU05	AU10	AU15	BE01	BE02	BE03	BE05	BE10	BE15	BE20	CA03M	CA06M	CA01	CA02	CA03	CA05	CA10	DK02	DK03	DK05	DK10	FR01	FR02	FR03	FR05	FR10	FR15	FR20	FR30	DE01	DE02	DE03	DE05	DE10	DE20	DE30	GR05	GR10	GR15	GR20	HK10	IN05	IN10	IE01	IE02	IE05	IE10	IT02	IT03	IT05	IT10	IT30	KR01	KR02	KR03	KR05	NL02	NL03	NL05	NL10	NL15	NL30	NO05	NO10	PT02	PT05	PT10	SG02	SG05	SG10	ZA05	ZA10	ZA20	ES02	ES03	ES05	ES10	ES30	GB01	GB02	GB03	GB05	GB10	GB15	GB20	GB30	US03M	US06M	US02	US03	US05	US10	US30
2000-05-05	6.280	6.34	6.460	6.52	6.56	6.56	2.86	4.931	5.114	5.419	5.728	5.838	5.657	5.58	5.89	6.19	6.251	6.395	6.375	6.285	5.442	5.619	5.697	5.848	4.592	4.805	5.119	5.289	5.566	5.571	5.719	5.818	4.571	4.770	5.006	5.223	5.436	5.533	5.698	6.238	6.299	6.350	6.391	7.645	9.699	10.432	4.433	4.664	5.460	5.700	5.044	5.227	5.529	5.762	6.078	8.28	8.720000	8.86	9.229999	4.927	5.142	5.373	5.603	4.259	5.808	6.49	6.34	5.237	5.538	5.774	3.089	3.88	4.388	13.85	14.41	14.33	4.545	5.151	5.413	5.717	5.933	6.156	6.267	6.248	5.846	5.373	5.012	4.804	4.593	5.899	5.910	6.784	4.557	6.692	6.511	6.157
2000-05-08	6.260	6.34	6.415	6.52	6.56	6.49	2.86	4.946	5.115	5.419	5.716	5.830	5.657	5.58	5.89	6.19	6.251	6.395	6.375	6.285	5.431	5.574	5.699	5.798	4.602	4.819	5.088	5.299	5.533	5.542	5.682	5.793	4.630	4.788	4.999	5.186	5.429	5.543	5.671	6.184	6.261	6.347	6.375	7.730	9.807	10.505	4.500	4.664	5.441	5.656	5.035	5.219	5.506	5.719	6.046	8.28	8.720000	8.85	9.210000	4.892	5.119	5.334	5.572	4.259	5.780	6.45	6.30	5.200	5.505	5.736	3.170	3.92	4.515	13.85	14.41	14.33	4.545	5.136	5.414	5.691	5.913	6.166	6.301	6.257	5.839	5.340	4.974	4.771	4.551	5.899	5.910	6.784	4.557	6.692	6.568	6.157
2000-05-09	6.340	6.38	6.500	6.54	6.55	6.55	2.86	4.937	5.104	5.386	5.679	5.800	5.657	5.58	5.88	6.20	6.316	6.481	6.457	6.363	5.393	5.526	5.658	5.751	4.569	4.801	5.060	5.267	5.496	5.494	5.631	5.732	4.545	4.753	4.966	5.139	5.382	5.491	5.617	6.184	6.248	6.355	6.386	7.761	9.786	10.514	4.480	4.664	5.382	5.617	5.027	5.220	5.494	5.693	5.983	8.28	8.720000	8.87	9.210000	4.870	5.099	5.307	5.537	4.259	5.725	6.50	6.30	5.140	5.461	5.696	3.159	3.99	4.519	13.85	14.41	14.33	4.545	5.118	5.387	5.657	5.880	6.140	6.321	6.226	5.801	5.299	4.931	4.722	4.491	5.984	6.031	6.872	4.557	6.827	6.524	6.248
2000-05-10	6.240	6.34	6.390	6.51	6.54	6.50	2.86	4.885	5.040	5.338	5.644	5.748	5.657	5.67	5.90	6.23	6.306	6.462	6.430	6.323	5.393	5.519	5.625	5.732	4.546	4.757	5.011	5.233	5.472	5.472	5.611	5.730	4.521	4.736	4.927	5.128	5.353	5.491	5.595	6.172	6.221	6.328	6.381	7.750	9.819	10.535	4.485	4.664	5.387	5.620	4.977	5.175	5.455	5.667	5.970	8.29	8.729999	8.89	9.220000	4.848	5.062	5.286	5.499	4.259	5.703	6.49	6.21	5.148	5.450	5.675	3.154	3.91	4.465	13.85	14.41	14.33	4.545	5.107	5.359	5.619	5.823	6.132	6.291	6.214	5.818	5.291	4.915	4.705	4.469	6.115	6.140	6.851	4.557	6.788	6.457	6.216
2000-05-11	6.220	6.26	6.374	6.42	6.41	6.42	2.86	4.915	5.094	5.376	5.723	5.838	5.657	5.66	5.87	6.20	6.272	6.386	6.342	6.230	5.512	5.638	5.740	5.825	4.600	4.842	5.100	5.201	5.532	5.543	5.710	5.799	4.717	4.754	4.976	5.238	5.395	5.554	5.693	6.148	6.230	6.342	6.381	7.750	9.809	10.527	4.509	4.664	5.475	5.686	5.044	5.214	5.491	5.758	6.075	8.29	8.729999	8.89	9.220000	4.969	5.171	5.389	5.585	4.259	5.795	6.51	6.27	5.258	5.556	5.767	3.198	3.96	4.487	13.85	14.41	14.33	4.545	5.130	5.402	5.668	5.885	6.213	6.289	6.307	5.922	5.393	5.027	4.791	4.527	6.083	6.104	6.804	4.557	6.668	6.420	6.154
2000-05-12	6.243	6.26	6.374	6.42	6.45	6.43	2.86	5.022	5.155	5.457	5.714	5.822	5.657	5.65	5.89	6.22	6.304	6.406	6.370	6.227	5.529	5.639	5.731	5.805	4.631	4.910	5.141	5.352	5.550	5.550	5.669	5.763	4.790	4.863	5.065	5.256	5.419	5.532	5.641	6.214	6.268	6.361	6.395	7.729	9.809	10.520	4.534	4.664	5.474	5.672	5.105	5.295	5.551	5.739	6.006	8.30	8.750000	8.91	9.229999	5.003	5.210	5.397	5.570	4.259	5.759	6.49	6.27	5.300	5.560	5.750	3.156	3.89	4.487	13.85	14.41	14.33	4.545	5.191	5.444	5.664	5.862	6.244	6.378	6.346	5.933	5.390	5.035	4.802	4.545	6.017	6.019	6.835	4.557	6.668	6.524	6.143

…and so forth. It is worth mentioning at this point that I made the date into something human readable during import, converting from the UNIX epoch millisecond time stamp to an actual date.
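For reference, a minimal sketch of that kind of conversion in base R, assuming the raw `time` column holds milliseconds since the UNIX epoch in UTC. The `raw_ms` values are illustrative and are not read from the file:

```r
# Hypothetical raw timestamps: milliseconds since 1970-01-01 00:00:00 UTC
raw_ms <- c(957484800000, 957744000000)

# Divide by 1000 to get seconds, convert to a date-time in UTC,
# then drop the time-of-day component
dates <- as.Date(
  as.POSIXct(raw_ms / 1000, origin = "1970-01-01", tz = "UTC"),
  tz = "UTC"
)

dates
```

Pinning the time zone to UTC in both steps avoids a date shifting by a day on machines set to a different local time zone.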

Fundamentally, this format will be incredibly difficult to work with: if I want to pull a single nation, I have to find every bond type included for that nation to manually include them, and ditto for the prices table. First step for me will, therefore, be to break this down into a ‘tidy’ or ‘long’ data format with one observation per row, and far fewer columns. My target is to resemble:

Date	Nation	Region	Bond Maturity	Price	Yield
2000-05-05	AU	Pacific	01	nnn	6.28
2000-05-05	AU	Pacific	02	nnn	6.34

…and so forth. In this way I can effortlessly filter or compare by nation, bond types, regions, or any combination thereof.

To accomplish this, I will make one fairly ugly loop which will step through each column, split the title into nation and bond type, and generate one row per day in the dataset for that nation and bond type. The same loop, with very minor tweaks, will do the same job on the prices dataset. I’ll then join them on date, nation, and maturity to marry them into a single 3 million-ish row dataset, and then do a bit of cleanup work - I know there are some packages that do easy conversion from ISO2C nation identifiers to ISO3C and/or full names, as well as including region identifiers, so I will use one of those to make the data a bit easier to work with when it comes to visualisations, since I’ll want nicer looking names at that point.

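#Packages used throughout this write-up
library(tidyverse)   #tibble, dplyr, stringr, ggplot2
library(lubridate)   #year()
library(countrycode) #country names and regions
library(gridExtra)   #grid.arrange()

#Create an empty tibble in the correct format to receive the reformatted data

TidyPrices <- tibble("Date"=as.Date("2022-01-01"),"Nation"="","Maturity"="","Price"=0.0)[-1,]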

for(i in 2:ncol(prices)){
  
    #The first two characters of the column name are taken as the nation identifier, and the 3..nth as the bond maturity
    nation <- str_sub(names(prices)[i],1,2)
    bond <- str_sub(names(prices)[i],3)
    #all_of() makes it explicit that i is a column position held in an external variable
    PTemp <- prices %>% select(time,"Price"=all_of(i))
    
    #Generate a tibble and insert the first column data
    TempPrices <- tibble("Date"=prices$time, "Nation"=nation,"Maturity"=as.character(bond)) %>% 
      left_join(PTemp, by=c("Date"="time"))
    
    #Append the data to the 'master' variable
    TidyPrices <- union(TidyPrices,TempPrices)
  }

#The same again, but for Yields
TidyYields <- tibble("Date"=as.Date("2022-01-01"),"Nation"="","Maturity"="","Yield"=0.0)[-1,]

for(i in 2:ncol(yields)){
    nation <- str_sub(names(yields)[i],1,2)
    bond <- str_sub(names(yields)[i],3)
    YTemp <- yields %>% select(time,"Yield"=all_of(i))
    TempYields <- tibble("Date"=yields$time, "Nation"=nation,"Maturity"=as.character(bond)) %>% 
      left_join(YTemp, by=c("Date"="time"))
    TidyYields <- union(TidyYields,TempYields)
}

TidyData <- TidyPrices %>% 
  full_join(TidyYields, by=c("Date"="Date","Nation"="Nation","Maturity"="Maturity"))

rm(TidyPrices,TidyYields,i,nation,bond,YTemp,TempYields,PTemp,TempPrices)

R isn’t massively thrilled about those loops: it takes about two minutes to run through, and I suspect there is a ‘nicer’ way to accomplish the same. That being said, a few minutes for a pair of 3 million entry tables isn’t too unreasonable.
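One candidate for the ‘nicer’ way is `pivot_longer()` from tidyr, which can split the column names and stack the values in one step. This is only a sketch on a made-up table (`yields_toy` is hypothetical), assuming, as in the loops above, that the first two characters of each column name are the nation and the remainder is the maturity:

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the wide yields table (values are made up)
yields_toy <- tibble(
  time  = as.Date(c("2000-05-05", "2000-05-08")),
  AU01  = c(6.28, 6.26),
  AU10  = c(6.56, 6.56),
  CA03M = c(5.58, 5.58)
)

tidy_toy <- yields_toy %>%
  pivot_longer(
    cols          = -time,                  # every column except the date
    names_to      = c("Nation", "Maturity"),
    names_pattern = "^(.{2})(.*)$",         # first two characters, then the rest
    values_to     = "Yield"
  ) %>%
  rename(Date = time)

tidy_toy
```

The same call with `values_to = "Price"` should handle the prices table, after which the two long tables could be joined on date, nation, and maturity as before.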

So. I am going to use the countrycode package to append the country names (technically the ISO English short name, so, for example, China rather than the People’s Republic of China) as well as regions. The regions are a bit awkward: the Kaggle page says ‘6 world regions’, but these six match neither the UN Geoscheme regions nor the World Bank regions. That inconsistency doesn’t matter much as long as I take it into account going forward. In this case, I am happiest with simple continent groupings rather than going with any of the regions, though this is fairly arbitrary. These will mainly be important for the ‘correlation within regions’ question, but most of that will come down to ‘it depends how you define the regions’ anyway. For the sake of clarity, I’ll also rename the column ‘Nation’ to ‘CountryCode’, and reorder the columns to be in line with what I would expect, even though this makes absolutely no difference to how I will handle them going forward.

TidyData <- TidyData %>% 
  mutate("Region"=countrycode(Nation,"iso2c","continent"), "Country"=countrycode(Nation,"iso2c","iso.name.en")) %>% 
  rename("CountryCode"=Nation) %>% 
  #Reorder the columns by name rather than by position
  select(Date,Country,CountryCode,Maturity,Price,Yield,Region)

This leaves me with a dataset that is easy to use, and easy to further transform based on whatever needs I work out going forward. The table, filtered for 10 year instruments arbitrarily (..and to exclude price=0, just so the result below looks ‘complete’), looks like:

knitr::kable(head(TidyData %>% 
                    filter(Maturity=="10", !Price==0) %>% 
                    arrange(desc(Date)),10))

Date	Country	CountryCode	Maturity	Price	Yield	Region
2020-07-24	Australia	AU	10	115.2130	0.8784	Oceania
2020-07-24	Belgium	BE	10	102.6260	-0.1629	Europe
2020-07-24	Canada	CA	10	107.1910	0.5000	Americas
2020-07-24	China	CN	10	98.2916	2.8808	Asia
2020-07-24	Denmark	DK	10	107.6710	-0.3116	Europe
2020-07-24	France	FR	10	101.5058	-0.1446	Europe
2020-07-24	Germany	DE	10	104.5834	-0.4450	Europe
2020-07-24	Greece	GR	10	103.9000	1.0820	Europe
2020-07-24	Hong Kong	HK	10	116.4500	0.3952	Asia
2020-07-24	India	IN	10	99.5400	5.8510	Asia

That is all for the starting point data transformation, but there is a fairly substantial ‘missingness’ problem with regards to the data. There are quite a few entries with 0 yield and 0 price:

knitr::kable(TidyData %>% 
  filter(Yield==0, Price==0) %>% 
  summarise("Rows with missing Yield and Price"=n()))

Rows with missing Yield and Price
1755644

I am going to make the bold assumption that no yield and no price probably means the instrument wasn’t available in that nation at that time. I think this is an interesting point to explore in some cases, such as if some bond types disappear for a few days or weeks around certain events, but it is unclear whether the longer periods of missingness are at all interesting.

Perhaps more interesting are the rows where Price is 0 but Yield is present:

knitr::kable(TidyData %>% 
  filter(!Yield==0, Price==0) %>% 
  summarise("Rows with Yield but 0 Price"=n()))

Rows with Yield but 0 Price
667324

This means that about 2.4 million of the 2.75 million rows contain some degree of missingness, which isn’t fantastic. A huge portion of this should simply correspond to unavailable instruments, so it may not impact analysis too badly.
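The two counts above can also be produced in a single pass by labelling each row with its missingness pattern. A sketch on made-up data (`toy` and the `Status` labels are hypothetical), treating 0 as ‘missing’ in the same way as the filters above:

```r
library(dplyr)

# Made-up rows covering each combination of zero and non-zero values
toy <- tibble(
  Price = c(0, 0, 101.2, 99.5, 0),
  Yield = c(0, 5.1, 4.9, 0, 0)
)

missing_summary <- toy %>%
  mutate(Status = case_when(
    Price == 0 & Yield == 0 ~ "No price, no yield",
    Price == 0              ~ "Yield only",
    Yield == 0              ~ "Price only",
    TRUE                    ~ "Complete"
  )) %>%
  count(Status)

missing_summary
```

One caveat with this labelling: a yield of exactly 0 alongside a real price could be a genuine observation rather than a gap, so the ‘Price only’ group would need a closer look before being treated as missing.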

After a little bit of googling about why prices might be 0 with listed yields, my best guess is that these were benchmark figures with no instruments actually being sold, assuming the data is correct. The more likely option is that the data is simply incomplete for one reason or another. The yield figures still seem in line with reality, so I think I am safe to leave these in when considering volatility and other questions, but in a ‘real life’ situation I would definitely be asking for guidance from a conveniently available subject matter expert.

Realistically, though, I can save myself a little bit of (processing) time by creating a filtered dataset omitting entries with 0 price and 0 yield.

TFData <- TidyData %>% 
  filter(!(Yield==0 & Price==0))

…leaving me with just under 1,000,000 rows to play with.

Volatility

Slightly out of order, but I’ll handle the two volatility questions together.

My starting point with volatility is simply: how do you define it?

First, it seems a bit more realistic to give period volatility (e.g. over two weeks, a month, a year, etc.) than to give volatility over a 50 year sample, so I will almost certainly break it down somewhat.

Second, and more importantly, what is it? I mean, I know the ‘lay’ definition, but I am not entirely certain in context. Min vs Max values over a period might be interesting or useful, and the simple delta gives an easily compared metric, but it seems overly simplistic. Amount of change over a period is an option: take the absolute of the delta of the day-to-day values and provide the mean, as a sort of proto-standard deviation.

After some reading (https://www.investopedia.com/terms/v/volatility.asp and https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118267059.app1) it seems like the ‘basic’ volatility measure is simple standard deviation, which is easy enough to accomplish. \(\beta\), which approximates the degree of movement of one security’s values versus a baseline, would be interesting and within my capabilities, but I am uncertain how to define a baseline appropriately - the mean of the sample, I suppose, but the more guesswork I apply in this way the less justifiable the outcome seems to me. For this purpose, I am going to stick with simple standard deviation as a measure of volatility.
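To make the candidate measures concrete, here is a small base R sketch on made-up numbers. The `yield` and `baseline` series are invented for illustration, and the choice of baseline is exactly the assumption that is hard to justify in practice:

```r
# Made-up daily yields for one instrument, plus a made-up baseline series
yield    <- c(5.00, 5.10, 5.05, 5.20, 5.15, 5.30)
baseline <- c(4.00, 4.05, 4.02, 4.10, 4.08, 4.15)

# Standard deviation of the yield levels (the measure used in this write-up)
sd_levels <- sd(yield)

# Mean absolute day-to-day change (the 'proto-standard deviation' idea)
mean_abs_change <- mean(abs(diff(yield)))

# Standard deviation of the day-to-day changes
sd_changes <- sd(diff(yield))

# Beta of the instrument's changes against the baseline's changes:
# covariance with the baseline divided by the baseline's variance
beta <- cov(diff(yield), diff(baseline)) / var(diff(baseline))

c(sd_levels = sd_levels, mean_abs_change = mean_abs_change,
  sd_changes = sd_changes, beta = beta)
```

The SD of levels picks up long-run drift in yields as well as day-to-day movement, whereas the two change-based measures only reflect the latter, which is worth keeping in mind when comparing countries over long periods.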

Yield Volatility by Nation

So, the ‘by nation’ portion of this question needs no thought or explaining, but I need to think about time frames at least a little bit. The first part of the first question is ‘Which countries have the most volatility in yield and which the least?’ - I think this can be reasonably answered with simple SD over the sample, but I’m not really thrilled with this: it feels simplistic, and I question whether anything meaningful could be garnered from SD over 50 years. However, that appears to be the request, so:

knitr::kable(TFData %>% 
               group_by(Country) %>% 
               summarise("Volatility"=sd(Yield)) %>% 
               arrange(desc(Volatility)))

Country	Volatility
Greece	11.5164471
Spain	10.3033781
Turkey	4.4130973
Australia	3.9365278
Italy	3.6655040
Portugal	3.5232880
Korea (the Republic of)	3.0438087
United Kingdom	3.0123540
Canada	2.9580026
France	2.8854052
South Africa	2.8824648
United States of America (the)	2.8006201
Germany	2.7477103
Ireland	2.5935859
Denmark	2.3436969
Belgium	2.2738368
Indonesia	2.2597620
Netherlands (the)	2.2506727
Norway	2.1616012
Hong Kong	1.9665356
India	1.4901352
Thailand	1.1865921
Singapore	0.9779651
Japan	0.8148027
New Zealand	0.8145194
Poland	0.7929027
China	0.7883345
Sweden	0.7659881
Malaysia	0.5977965
Taiwan (Province of China)	0.4360028

This is, of course, after removing the zero price/zero yield entries, but retaining the zero price/listed yield entries. To me, this illustrates two things. First, that SD over fifty years can give a relatively meaningful result: Greece and Spain with their semi-regular economic crises are extremely volatile, most ‘western’ or EU nations are clustered, and a group of nations that could be seen as being less economically free are grouped at the ‘low yield volatility’ end of the spectrum, which feels relatively consistent to me.

There are, of course, outliers: Sweden is the most ‘Western’ nation at the bottom end of the volatility spectrum, and I don’t have an instant explanation as to why. Poland’s yields are incredibly stable, which is at odds with my understanding of their post-Soviet economic history, with near-hyperinflationary conditions in the early 90s. I’ll have at least a quick look at both of these shortly. Frankly, I suspect that the ‘high outlier’ nature of Spain and Greece may be exacerbated by the years in which data is available for them, which would be the near inverse of the situation for Sweden and Poland.

I will highlight that in a ‘real world’ situation, I would be exploring these data quality issues, coming up with a few quick charts or visualisations to illustrate the issue as I saw it, and then bringing that to the involved stakeholders to discuss options. In this context, I am going to answer the question ‘as written’ since direction cannot be obtained and to do anything else would simply be guessing, but caveat the answer and highlight the data issues. I will take this same approach when presenting.

To finish answering the first part of this question I can do a quick visualisation showing rates of volatility, giving an ‘at a glance’ comparison of the nations.

vol_full <- TFData %>% 
  group_by(Country) %>% 
  summarise("Volatility"=sd(Yield)) %>% 
  arrange(desc(Volatility)) %>%
  ggplot(aes(x=Volatility,y=reorder(Country,Volatility)))+
    geom_bar(stat="identity",fill="cornflowerblue")+
    xlab("Volatility")+
    ylab("")+
    ggtitle("Yield Volatility 1970-2020")+
    theme_minimal()

vol_full

‘Over Time’ is tied in with the data quality question, so I’ll take my little detour before moving on.

Sweden, Poland, and Data Quality

The biggest ‘problem’ with this dataset is one of missingness. This contains a nominal 50 years of data, but how many nations actually have 50 years (or anywhere close to that) of data other than rows of zeros?

TFData %>% 
  group_by(Country,Date) %>% 
  summarise(Present=TRUE,.groups="keep") %>%
  ggplot(aes(x=Date,y=reorder(Country,desc(Country)), fill=Present))+
    scale_fill_manual(values=c("cornflowerblue"))+
    geom_tile()+
    theme_minimal()+
    ylab("")+
    ggtitle("Periods with Data Present")+
    theme(legend.position="none")

Stripiness due to trading days could be resolved, but takes an unreasonable amount of time to find a ‘good’ solution contextually, so I will leave this as a bit of an eyesore. It is, at least, illustrative of the problem: some nations which look EXTREMELY stable have only very recent data. To give a more complete view:
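A possible way to reduce the stripiness, not applied to the chart above, would be to collapse the data to one row per country per month before plotting, so that weekends and holidays no longer leave gaps. A sketch of the data preparation only, on made-up dates (`toy` and `presence` are hypothetical names):

```r
library(dplyr)

# Made-up trading days for two countries
toy <- tibble(
  Country = c("A", "A", "A", "B"),
  Date    = as.Date(c("2020-01-02", "2020-01-03", "2020-02-03", "2020-02-04"))
)

# Replace each date with the first day of its month,
# then keep one row per country and month
presence <- toy %>%
  mutate(Month = as.Date(format(Date, "%Y-%m-01"))) %>%
  distinct(Country, Month)

presence
```

Plotting `Month` rather than `Date` on the x axis of the tile chart should then give one tile per month of data. Tile widths may still need adjusting, since months differ in length.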

knitr::kable(TFData %>%
  group_by(Country) %>%
  summarise("Start Date"=min(Date),.groups="keep") %>%
  arrange(`Start Date`))

Country	Start Date
Canada	1970-01-05
Germany	1979-12-24
United Kingdom	1979-12-24
Australia	1979-12-27
United States of America (the)	1980-01-02
France	1986-01-02
Italy	1988-09-01
Netherlands (the)	1993-01-04
Belgium	1993-12-06
Spain	1994-02-01
Norway	1994-02-02
Denmark	1994-02-14
Portugal	1994-05-04
Korea (the Republic of)	1995-05-02
South Africa	1995-05-19
Singapore	1997-04-18
Hong Kong	1998-01-09
India	1998-02-02
Ireland	1998-02-02
Greece	1998-12-22
China	2000-09-21
Thailand	2001-01-16
Malaysia	2001-10-15
Indonesia	2003-05-14
Japan	2006-02-06
Turkey	2006-07-03
Sweden	2012-08-14
Poland	2016-03-09
New Zealand	2016-04-26
Taiwan (Province of China)	2016-04-26

Sweden, Taiwan, New Zealand and Poland being especially notable, in that they all miss out on the 2008 financial crisis and aftermath completely. At the other extreme, Greece and Spain show up just in time to demonstrate their modern reputation of economic uncertainty. Most notably, this dataset starts at 1970 but has only Canadian data until 1979. The ‘overall’ figures should, therefore, be taken with a few tonnes of salt.
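The date from which every nation has at least some data is simply the latest of the per-country start dates. A sketch of that calculation on made-up data (`toy` is hypothetical, with dates chosen to echo the table above):

```r
library(dplyr)

# Made-up observation dates for three countries
toy <- tibble(
  Country = c("A", "A", "A", "B", "B", "C"),
  Date    = as.Date(c("1970-01-05", "1980-01-02", "2016-04-26",
                      "1994-02-01", "2016-04-26", "2016-04-26"))
)

# Earliest observation per country
start_dates <- toy %>%
  group_by(Country) %>%
  summarise(Start = min(Date), .groups = "drop")

# The first date on which every country has started reporting
common_start <- max(start_dates$Start)

common_start
```

This only says when every country has started; it does not guarantee that each country reports every maturity, or every trading day, from that point onward.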

As an exercise, I will redo the ‘overall’ volatility measure using only the data since 2016-04-26, when the dataset becomes ‘complete’ for the first time:

knitr::kable(TFData %>% 
               filter(Date >= "2016-04-26") %>%
               group_by(Country) %>% 
               summarise("Volatility"=sd(Yield),.groups="keep") %>% 
               arrange(desc(Volatility)))

Country	Volatility
Turkey	4.7266250
Greece	2.5102988
Italy	1.3493113
South Africa	1.1198848
Portugal	1.1098210
Spain	0.9891931
Indonesia	0.8801135
Australia	0.8401523
United States of America (the)	0.8400791
Netherlands (the)	0.8373112
France	0.8331294
New Zealand	0.8145194
Poland	0.7967633
Belgium	0.7569377
India	0.7293758
Malaysia	0.6603802
China	0.6505065
Thailand	0.6467702
Germany	0.6238885
Ireland	0.6233773
Canada	0.6141509
United Kingdom	0.5887705
Singapore	0.5660003
Hong Kong	0.5557842
Korea (the Republic of)	0.4440892
Taiwan (Province of China)	0.4360028
Sweden	0.4199198
Norway	0.4189074
Denmark	0.3712102
Japan	0.3614450

vol_filt <- TFData %>% 
  filter(Date >= "2016-04-26") %>%
  group_by(Country) %>% 
  summarise("Volatility"=sd(Yield),.groups="keep") %>% 
  arrange(desc(Volatility)) %>%
  ggplot(aes(x=Volatility,y=reorder(Country,Volatility)))+
    geom_bar(stat="identity",fill="cornflowerblue",alpha=0.9)+
    xlab("Volatility")+
    ylab("")+
    ggtitle("Yield Volatility Since 2016")+
    theme_minimal()

vol_filt

..or with matching axes for a better side-by-side comparison (if one can excuse the horizontal compression):

grid.arrange(vol_full+xlim(0,12), vol_filt+xlim(0,12),ncol=2)

Comparing like-for-like in terms of data completeness, most outliers disappear. Turkey’s yield volatility has barely changed with the date shift (from 2006 previously, now from 2016), implying relatively consistent market conditions (or consistently inconsistent market conditions). Greece is less of an outlier, but still volatile, especially when compared to other EU/EEA nations. The remaining high volatility nations are about as one would expect: Italy, South Africa, Portugal, and Spain sit slightly higher than nations which would be viewed as relatively stable (UK, Canada, and similar), with Scandinavia and Japan rounding out the ‘stable’ end of the spectrum.

Yield Volatility Over Time

As illustrated above, especially with regard to the data start dates, change over the whole period is relatively meaningless. Prior to 1980, there is only really data from Canada, which goes up to 7 nations by 1990. For the sake of the exercise, I’m going to try a few graphs starting in 1980 to illustrate yield changes over time. It is worth noting here that there is what appears to be some garbage data in this dataset with regards to Spain in March 1994, where some days show in excess of 800% bond yields, throwing the volatility entirely out of line with reality, especially when broken down by month. Presumably, this is simple garbage data rather than having a real explanation, so I will filter it out.

VolAnum <- TFData %>%
  #Drop the implausible yields (anything at or above 100%)
  filter(Date >= "1980-01-01", Yield < 100) %>%
  group_by(Country, Region, Year=year(Date)) %>%
  summarise(Volatility=sd(Yield),.groups="keep")

VolMonthly <- TFData %>%
  filter(Date >= "1980-01-01", Yield < 100) %>%
  #Collapse each date to the first day of its month
  group_by(Country, Region, Date=floor_date(Date,"month")) %>%
  summarise(Volatility=sd(Yield),.groups="keep")

VolAnum %>%
  ggplot(aes(x=Year,y=Volatility,colour=Country))+
    geom_line()+
    theme_minimal()+
    scale_y_log10()+
    ylab("Volatility (log10)")+
    xlab("")+
    theme(legend.position="none")+
    ggtitle("Yearly Volatility by Nation")

VolMonthly %>%
  ggplot(aes(x=Date,y=Volatility,colour=Country))+
    geom_line()+
    theme_minimal()+
    scale_y_log10()+
    ylab("Volatility (log10)")+
    xlab("")+
    theme(legend.position="none")+
    ggtitle("Monthly Volatility by Nation")

The inclusion of Greece alone makes this far less readable than it has any business being, though there are too many nations included for this to be particularly informative. I have deliberately omitted the legend to make the lines fractionally more readable, on the basis that the graph is so unreadable with so much colour overlap that even with a legend no sense could be made of it.

This being done, I don’t think lines or a candlestick graph will ever be massively informative here unless trying to look at one (or very few) nations, though with the annual breakdown one can start to see some fairly extreme correlations, both temporally and between nations.

I think a heatmap might display the ‘over time’ trends better:

VolAnum %>%
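  #Cap Volatility at 2 so the most extreme periods do not swamp the colour scale (Outlier flags values of 3 or more, but is not used in the plot)
  mutate(Outlier=ifelse(Volatility>=3,TRUE,FALSE), Volatility=ifelse(Volatility>=2,2,Volatility)) %>%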
  ggplot(aes(x=Year,y=reorder(Country,desc(Country)), fill=Volatility))+
    geom_tile()+
    theme_minimal()+
    ylab("")+
    theme(legend.position="none")+
    ggtitle("Yield Volatility Since 1980")

This obviously compresses the level of volatility in no small way - without some compression the more volatile periods completely overwhelm everything else, and handling them more gracefully would require a decent amount of time expended on one small visual. To answer the question, though: yield volatility evolves massively over time - you can more or less pick out major global and regional events even on a simplistic graphic like this one.

For instance, looking at Europe:

VolAnum %>%
  filter(Region=="Europe") %>%
  #Same cap at 2 as in the previous heatmap
  mutate(Outlier=ifelse(Volatility>=3,TRUE,FALSE), Volatility=ifelse(Volatility>=2,2,Volatility)) %>%
  ggplot(aes(x=Year,y=reorder(Country,desc(Country)), fill=Volatility))+
    geom_tile()+
    theme_minimal()+
    ylab("")+
    theme(legend.position="none")+
    ggtitle("European Yield Volatility Since 1980")

…one can easily pick out the 2008 financial crash, the Eurozone debt crisis, and (at least in the case of Italy and Greece) what probably amounts to the beginnings of COVID-19 related volatility. Without looking at summary statistics or mathematically derived correlation coefficients, one can intuit with a reasonable degree of accuracy how linked or unlinked nations are economically: the EU nations are near carbon copies of one another, while the Scandinavian nations are fairly strongly correlated with each other and remain clearly separated from the rest, though not entirely unaffected.

Looking at the United States only:

VolMonthly %>%
  filter(Country=="United States of America (the)") %>%
  ggplot(aes(x=Date,y=Volatility))+
    geom_line()+    
    theme_minimal()+
    ylab("Volatility")+
    #colour is set outside aes() so the lines are drawn in literal red rather than mapped to a colour scale
    geom_vline(aes(xintercept=as.Date("1990-08-01")),colour="red")+
    geom_vline(aes(xintercept=as.Date("1994-01-01")),colour="red")+
    geom_vline(aes(xintercept=as.Date("2001-09-01")),colour="red")+
    geom_vline(aes(xintercept=as.Date("2003-03-01")),colour="red")+
    geom_vline(aes(xintercept=as.Date("2007-09-01")),colour="red")+
    theme(legend.position="none")+
    ggtitle("USA Yield Volatility Since 1980")

..we can, even without the assistance of lines marking dates, pick out the Gulf War, start of NAFTA and localised shocks caused by the Mexican ‘Peso Crisis’, September 11 2001, followed closely by the second Iraq War, as well as the end of Q2 2007 leading into the sub-prime mortgage crisis. The chart ends, of course, with the beginnings of the COVID uncertainty.

Without the annotations, we can look at two quite close trade partners, France and Germany:

VolMonthly %>%
  filter(Country %in% c("France","Germany"), Date >= "1990-01-01") %>%
  ggplot(aes(x=Date,y=Volatility,colour=Country))+
    geom_line()+    
    theme_minimal()+
    ylab("")+
    ggtitle("German and French Yield Volatility Since 1990")

..we can see that they track incredibly closely, especially between 2000 and 2010, when they are almost indistinguishable. Since 2010, the shapes remain consistent, despite the divergence.

Maturity and Volatility

I don’t really have much of a ‘gut feeling’ of what the answer here will be, or why, but it is easy enough to look at:

knitr::kable(TFData %>% 
  group_by(Maturity) %>%
  summarise(Volatility=sd(Yield), Sample=n(),.groups="keep") %>% 
  arrange(desc(Volatility)))

Maturity	Volatility	Sample
03	6.4294127	122351
20	5.0716333	74775
05	4.9837343	171897
25	3.8471081	12829
10	3.5408829	175350
02	3.4206638	138432
01	3.2596165	75990
03M	3.0163429	16149
15	2.8719753	68090
30	2.5092943	74187
06M	1.9384928	19422
07	1.6609227	28069
01M	1.4703415	4676
50	1.1811947	7698
02M	0.8622539	445
40	0.7026141	3201

So, the reaction looking at the entire dataset would be ‘it doesn’t’. However, we have tens or hundreds of thousands of examples of some maturities, and hundreds or low thousands of others. This may also change substantially by nation - if Greece is ‘responsible’ for the majority of the 3 or 20 year bonds, they would be a high outlier because of the association with Greece, not because of the Maturity.

Because of the data quality issues discussed before, I’m not entirely certain how to approach this properly. This is a case where I would definitely go back to a stakeholder to talk through the data issues and work out what they were trying to learn, so I would know the most appropriate way to slice the data.

I guess I can consider which nations use which maturities, but then there is the question of ‘when’. I think the best approach is to use the period in which all nations are present (2016 onward) to get the most consistent picture. I’m also going to exclude maturities with fewer than 5,000 samples, as they are likely to be issued by only one or two nations, and therefore do not contribute to an overall picture of the effects of Maturity on Volatility.

knitr::kable(TFData %>%
  filter(Date >= "2016-04-26") %>% 
  group_by(Maturity) %>%
  summarise(Volatility=sd(Yield), Sample=n(),.groups="keep") %>% 
  filter(Sample >= 5000) %>%
  arrange(desc(Volatility)))

Maturity	Volatility	Sample
01	4.022928	18190
03	3.614053	22470
05	3.239926	32100
02	3.058203	27820
25	3.011663	7490
10	3.011514	32100
20	2.845421	19170
15	2.705239	17307
30	2.478833	19137
07	1.882220	18190

I would say that, broadly speaking, the volatility of yield is inversely correlated with Maturity - the longer the Maturity, the lower the volatility. However, this is at odds with what a bit of reading about bonds tells me: it says that longer-term bonds should be MORE volatile than shorter-term bonds. I am going to assume, therefore, that there is something weird with the data: the sample of nations and distribution of maturities means that something ‘funny’ is going on.
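One plausible source of the ‘funny’ behaviour is that a standard deviation pooled across nations mixes differences in yield level between countries with movement within each country. A sketch on made-up numbers (`toy`, `LowYield`, and `HighYield` are hypothetical), comparing the pooled figure with the median of the per-country figures:

```r
library(dplyr)

# Made-up yields: two countries at very different yield levels,
# each moving by only 0.1 from day to day
toy <- tibble(
  Country  = rep(c("LowYield", "HighYield"), each = 4),
  Maturity = "01",
  Yield    = c(0.1, 0.2, 0.1, 0.2, 8.1, 8.2, 8.1, 8.2)
)

# Pooled: one SD per maturity across all countries at once
pooled <- toy %>%
  group_by(Maturity) %>%
  summarise(Volatility = sd(Yield), .groups = "drop")

# Within-country: SD per country and maturity, then the median across countries
within <- toy %>%
  group_by(Country, Maturity) %>%
  summarise(Volatility = sd(Yield), .groups = "drop") %>%
  group_by(Maturity) %>%
  summarise(MedianVolatility = median(Volatility), .groups = "drop")

pooled
within
```

In this toy case the pooled figure is driven almost entirely by the gap between the two countries’ yield levels, while the within-country figure stays small. Whether the same effect explains the table above is an assumption that the per-nation breakdown can help check.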

I’ll try to demonstrate this by breaking the above down by nation:

knitr::kable(TFData %>%
  filter(Date >= "2016-04-26") %>% 
  group_by(Country,Maturity) %>%
  summarise(Volatility=sd(Yield), Sample=n(),.groups="keep") %>% 
  arrange(Country))

Country	Maturity	Volatility	Sample
Australia	01	0.5232951	1070
Australia	02	0.5740956	1070
Australia	03	0.6187630	1070
Australia	05	0.6845195	1070
Australia	07	0.7161627	1070
Australia	10	0.7023857	1070
Australia	15	0.7401314	1070
Australia	20	0.1154084	187
Australia	30	1.1069387	947
Belgium	01	0.0683349	1070
Belgium	02	0.0741974	1070
Belgium	03	0.1060574	1070
Belgium	05	0.2100796	1070
Belgium	07	0.2787462	1070
Belgium	10	0.3681651	1070
Belgium	15	0.4282925	1070
Belgium	20	0.4138677	1070
Belgium	30	0.4396111	1070
Canada	01	0.5814184	1070
Canada	02	0.5917615	1070
Canada	03	0.5871970	1070
Canada	03M	0.5302934	1070
Canada	05	0.5742670	1070
Canada	06M	0.5468774	1070
Canada	10	0.5126812	1070
Canada	30	0.4113583	1070
China	01	0.9756378	1070
China	02	0.5121565	1070
China	03	0.7504183	1070
China	05	0.4601611	1070
China	07	0.4035091	1070
China	10	0.3705094	1070
China	15	0.3397111	1070
China	20	0.3460935	1070
China	30	0.3249406	1070
Denmark	02	0.1223675	1070
Denmark	03	0.1488948	1070
Denmark	05	0.2253679	1070
Denmark	10	0.3707640	1070
Denmark	20	0.1412321	187
France	01	0.0603588	1070
France	02	0.0918562	1070
France	03	0.1491822	1070
France	05	0.2398681	1070
France	07	0.2875683	1070
France	10	0.3936286	1070
France	15	0.4359402	1070
France	20	0.4613718	1070
France	25	0.4502617	1070
France	30	0.4601811	1070
France	50	0.4795901	1070
Germany	01	0.0805831	1070
Germany	02	0.0872463	1070
Germany	03	0.1209972	1070
Germany	05	0.2086514	1070
Germany	07	0.2843686	1070
Germany	10	0.3711332	1070
Germany	15	0.4005665	1070
Germany	20	0.4845173	1070
Germany	25	0.4652682	1070
Germany	30	0.5348418	1070
Greece	05	3.3252171	1070
Greece	10	2.2246409	1070
Greece	15	2.3196618	1070
Greece	20	2.2389929	1070
Greece	25	2.1852659	1070
Hong Kong	01	0.5940577	1070
Hong Kong	02	0.5504825	1070
Hong Kong	03	0.5303794	1070
Hong Kong	05	0.5267569	1070
Hong Kong	07	0.5130899	1070
Hong Kong	10	0.4944564	1070
India	01	0.5957291	1070
India	02	0.8343062	1070
India	05	0.6576624	1070
India	07	0.5894984	1070
India	10	0.6195345	1070
India	20	0.4576384	1070
India	25	0.4500240	1070
Indonesia	01	0.8343626	1070
Indonesia	03	0.6450666	1070
Indonesia	05	0.6605422	1070
Indonesia	10	0.5589509	1070
Indonesia	15	0.4970779	1070
Indonesia	20	0.4578735	1070
Indonesia	25	0.4825076	1070
Indonesia	30	0.5026452	1070
Ireland	01	0.3905691	1070
Ireland	02	0.0783087	1070
Ireland	03	0.1024767	1070
Ireland	05	0.1749922	1070
Ireland	10	0.3783510	1070
Ireland	15	0.4190097	1070
Italy	01	0.3232781	1070
Italy	02	0.4644664	1070
Italy	03	0.5513359	1070
Italy	05	0.6174200	1070
Italy	07	0.6618042	1070
Italy	10	0.6441796	1070
Italy	15	0.9913044	1070
Italy	20	1.2828464	1070
Italy	25	0.9922118	1070
Italy	30	0.7849119	1070
Japan	01	0.0833918	1070
Japan	02	0.0565517	1070
Japan	03	0.0606268	1070
Japan	05	0.0720416	1070
Japan	07	0.0995204	1070
Japan	10	0.0945444	1070
Japan	15	0.1185438	1070
Japan	20	0.1632577	1070
Japan	30	0.2133646	1070
Japan	40	0.2601987	1070
Korea (the Republic of)	01	0.3242430	1070
Korea (the Republic of)	02	0.3572878	1070
Korea (the Republic of)	03	0.3772311	1070
Korea (the Republic of)	05	0.4164929	1070
Korea (the Republic of)	10	0.4380287	1070
Korea (the Republic of)	20	0.4175923	1070
Korea (the Republic of)	30	0.4112942	1070
Malaysia	03	0.6610088	1070
Malaysia	05	0.6618528	1070
Malaysia	07	0.3963499	1070
Malaysia	10	0.3923085	1070
Malaysia	15	0.4533075	1070
Malaysia	20	0.4502009	1070
Malaysia	30	0.3943682	1070
Netherlands (the)	01	0.0705049	1070
Netherlands (the)	02	0.0764310	1070
Netherlands (the)	03	0.1016692	1070
Netherlands (the)	05	0.1700871	1070
Netherlands (the)	07	0.2741386	1070
Netherlands (the)	10	0.3624083	1070
Netherlands (the)	15	1.4593897	1070
Netherlands (the)	20	0.3974531	1070
Netherlands (the)	30	0.4714278	1070
New Zealand	02	0.5654251	1070
New Zealand	05	0.6693993	1070
New Zealand	10	0.7591092	1070
New Zealand	15	1.3508619	187
New Zealand	20	1.1156280	187
Norway	02	0.3618590	1070
Norway	05	0.3345781	1070
Norway	07	0.3331710	1070
Norway	10	0.3558900	1070
Poland	02	0.5858118	1070
Poland	05	0.5512618	1070
Poland	10	0.6375984	1070
Portugal	02	0.3382581	1070
Portugal	03	0.4745268	1070
Portugal	05	0.7657995	1070
Portugal	07	1.1072724	1070
Portugal	10	1.1658532	1070
Portugal	20	0.2513097	187
Singapore	02	0.4721856	1070
Singapore	05	0.4319671	1070
Singapore	10	0.4181775	1070
Singapore	15	0.4416675	1070
Singapore	20	0.4220559	1070
Singapore	30	0.4289101	1070
South Africa	03	0.7521665	1070
South Africa	05	0.5390450	1070
South Africa	10	0.5152318	1070
South Africa	15	0.5424790	1070
South Africa	20	0.5799395	1070
South Africa	30	0.5535330	1070
Spain	02	0.1250253	1070
Spain	03	0.1657500	1070
Spain	05	0.2426607	1070
Spain	07	0.3539316	1070
Spain	10	0.4852516	1070
Spain	20	0.6372948	1070
Spain	30	0.6295438	1070
Sweden	02	0.1515283	1070
Sweden	05	0.2272169	1070
Sweden	07	0.2759229	1070
Sweden	10	0.3418934	1070
Sweden	20	0.1029414	187
Taiwan (Province of China)	02	0.0775297	1070
Taiwan (Province of China)	05	0.1280148	1070
Taiwan (Province of China)	10	0.2036157	1070
Taiwan (Province of China)	20	0.3571261	1070
Taiwan (Province of China)	30	0.3740729	1070
Thailand	02	0.3643583	1070
Thailand	05	0.4276580	1070
Thailand	10	0.5281419	1070
Thailand	15	0.6254620	1070
Turkey	01	5.8353690	1070
Turkey	02	5.3098306	1070
Turkey	03	4.8250379	1070
Turkey	05	3.9614126	1070
Turkey	10	3.0810486	1070
United Kingdom	01	0.2690984	1070
United Kingdom	02	0.2791825	1070
United Kingdom	03	0.2717020	1070
United Kingdom	05	0.3163380	1070
United Kingdom	07	0.3499090	1070
United Kingdom	10	0.3860596	1070
United Kingdom	15	0.4133474	1070
United Kingdom	20	0.4161874	1070
United Kingdom	25	0.4076955	1070
United Kingdom	30	0.3353617	1070
United Kingdom	50	0.3936867	1070
United States of America (the)	01	0.7812878	1070
United States of America (the)	01M	0.7965811	1070
United States of America (the)	02	0.7636595	1070
United States of America (the)	02M	0.8622539	445
United States of America (the)	03	0.7461904	1070
United States of America (the)	03M	0.7902415	1070
United States of America (the)	05	0.7133855	1070
United States of America (the)	06M	0.7859200	1070
United States of America (the)	07	0.6812258	1070
United States of America (the)	10	0.6597364	1070
United States of America (the)	20	0.0989086	45
United States of America (the)	30	0.5342911	1070

TFData %>% 
  filter(Date >= "2016-04-26", !Price==0) %>%
  mutate(Maturity=as.numeric(Maturity)) %>%
  na.omit() %>%
  group_by(Country,Maturity,Region) %>%
  summarise(Volatility=sd(Yield), Sample=n(),.groups="keep") %>% 
  filter(Sample >=1000) %>%
  arrange(Country) %>%
  ggplot(aes(x=Maturity,y=Volatility,colour=Country))+
    geom_point()+
    geom_line(stat="smooth",method = "lm", formula = y ~ x, se = FALSE, size=0.5, alpha=0.2)+
    theme_minimal()+
    ggtitle("Maturity vs Volatility")

So, it appears that this is a lovely example of Simpson’s Paradox, one of my favourite statistical fallacies. It refers to a phenomenon where trends exist in groups of data, but disappear or reverse when the groups are combined. In this case, breaking this down into nations provides the ‘correct’ answer: that higher maturity bonds have higher yield volatility… but not by much, and not universally. In aggregate, the few outlier volatility nations with ‘incorrect’ best fit slopes (Greece in dark green and Turkey in pink) overwhelm the others. It may be fair to say that the less economic certainty there is in a nation, the higher the variance in Yield, and that makes a kind of intuitive sense to me, but this is where my lack of domain knowledge comes to the fore.
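
To convince myself of the mechanism, here is a minimal sketch using made-up numbers rather than the bond data: two hypothetical countries (`Stable` and `Volatile`, both invented for illustration) each have volatility rising with maturity, but the pooled fit slopes the other way because the volatile country only appears at short maturities.

```r
# Toy data, invented for illustration only: not taken from TFData.
toy <- data.frame(
  Country    = rep(c("Stable", "Volatile"), each = 5),
  Maturity   = c(5, 10, 15, 20, 30,    1, 2, 3, 5, 10),
  Volatility = c(0.2, 0.3, 0.4, 0.5, 0.7,    3.0, 3.1, 3.2, 3.4, 3.9)
)

# Slope of Volatility on Maturity within each country (both positive)
within_slopes <- sapply(split(toy, toy$Country), function(d) {
  coef(lm(Volatility ~ Maturity, data = d))[["Maturity"]]
})

# Slope when the two countries are pooled together (negative)
pooled_slope <- coef(lm(Volatility ~ Maturity, data = toy))[["Maturity"]]

print(within_slopes)
print(pooled_slope)
```

The real data is messier than this (some nations genuinely slope downward), but the pooling effect is the same in kind.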

Edit 01/05/22: It is worth noting that there is a consistent drop in volatility at the longest maturity instrument available in most jurisdictions. This is far more prominent in those where the longest maturity corresponds with a substantial drop in sample size. I do not know if this implies a lack of interest, or that the instrument is not consistently available as a primary market offering, but either of those could realistically cause a lack of apparent volatility among those products. It is unclear whether these should be omitted or included, but it is worth noting regardless.

National Yield Correlations

Note 02/05/22: I think this section (and the bit above on volatility) has been a bit clumsily written, implying that the relationship between national yields and/or yield volatilities is causative rather than correlative. While it is very likely that it is somewhat causative, this should be read as saying (e.g. in the case of France and Germany having near 98% correlation) that an event which affects one of these economies is likely to have a near-identical effect on the other.

This has been covered a little bit above with regard to yield volatility, but not with strict yield. I would expect the result to be similar: nations with close economic ties are impacted by fluctuations in each other’s economies, as well as by national, regional, and international events at least somewhat similarly. In this way, I would expect Japan to have a stronger correlation with the USA than with China, and Australia to be more closely linked to its Commonwealth peers than its near neighbours.

As a starting point, I’ll look at France and Germany again, as their yield volatility, at least, is closely correlated:

TFData %>%
  filter(Country %in% c("France","Germany"), Date >= "1990-01-01") %>%
  group_by(Country, Date) %>%
  summarise(Yield=mean(Yield),.groups="keep") %>%
  ggplot(aes(x=Date,y=Yield,colour=Country))+
    geom_line()+    
    theme_minimal()+
    ylab("")+
    ggtitle("German and French Yields Since 1990")

I mean, I can do this mathematically as well, but those are nearly identically shaped even when they deviate. Correlation plots and matrices, especially with time series data, are not something I’ve spent a lot of time playing with, so we’ll see if I get to something useful.

What it seems like I need is to get the data from my current ‘tidy’ format into a slightly wider format, like:

Date	DE_Yield	FR_Yield

..which is easy enough, actually. I think from there I can use correlate() from the corrr package (which I haven’t actually used before..) to generate a correlation matrix. This should be replicable for the complete dataset easily enough.

FYield <- TFData %>%
  filter(Country %in% "France", Date >= "1990-01-01") %>%
  group_by(Date) %>%
  summarise(FR_Yield=mean(Yield),.groups="keep")

GYield <- TFData %>%
  filter(Country %in% "Germany", Date >= "1990-01-01") %>%
  group_by(Date) %>%
  summarise(DE_Yield=mean(Yield),.groups="keep")

GFYield <- full_join(FYield,GYield,by="Date")

knitr::kable(GFYield %>% 
  ungroup() %>%
  select(FR_Yield,DE_Yield) %>%
  correlate(quiet=TRUE))

term	FR_Yield	DE_Yield
FR_Yield	NA	0.9771743
DE_Yield	0.9771743	NA

Okay, cool. So, the Pearson’s Correlation Coefficient between the German and French yields is 0.977 - an absurdly closely correlated relationship. Strictly speaking, Pearson’s Correlation Coefficient measures how tightly the two series follow a straight-line relationship, not the slope of that line: the two only coincide when both series have the same standard deviation. In this case, a rise in the yield of one of these nations is almost always accompanied by a rise in the other, and squaring the coefficient suggests that roughly 95% of the variance in one is shared with the other.
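
As a sanity check on how a correlation coefficient relates to a regression slope, here is a small sketch on simulated data (the series `x` and `y` are invented, not bond yields). For a simple least-squares fit the slope equals the correlation scaled by the ratio of standard deviations, so a high correlation does not by itself say how many points one series moves per point of the other.

```r
set.seed(1)

# Simulated series: y moves about half a point per point of x, plus a little noise
x <- rnorm(500, mean = 2, sd = 1.5)
y <- 0.5 * x + rnorm(500, sd = 0.2)

r     <- cor(x, y)               # Pearson correlation
slope <- coef(lm(y ~ x))[["x"]]  # least-squares slope of y on x

# Identity for simple linear regression: slope = r * sd(y) / sd(x)
slope_from_r <- r * sd(y) / sd(x)

print(c(r = r, slope = slope, slope_from_r = slope_from_r))
```

Here the correlation comes out close to 1 while the slope stays near 0.5, which is the distinction I want to keep in mind when reading the matrices below.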

I think I can use a reshaping package to do this in one go for the full dataset instead of doing this for 30 nations manually, and then create a little visualisation without too much trouble. Correlation functions handle null values very badly, so I am going to use the 2016 date where all of the nations are present in the data as a starting point. Realistically, I probably could have used melt() and cast() to do the initial cleanup more gracefully rather than use a loop, but oh well.
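
For reference, this is the kind of behaviour I mean, sketched with base R’s `cor()` on two short invented vectors: by default a single missing value makes the whole result `NA`, and the `use` argument controls which rows are dropped instead.

```r
# Two invented series, each with one missing value in a different position
a <- c(1.0, 1.2, NA, 1.5, 1.7, 2.0)
b <- c(0.9, 1.1, 1.3, NA, 1.8, 2.1)

# Default (use = "everything"): any NA propagates to the result
r_default <- cor(a, b)

# Only rows where both values are present are used
r_complete <- cor(a, b, use = "complete.obs")

print(r_default)
print(r_complete)
```

Dropping incomplete rows works, but it means different pairs of nations could end up being compared over different periods, which is why I would rather restrict the data to a window where everything is present.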

WideYield <- TFData %>%
  filter(Date >= "2016-04-26") %>%
  group_by(Date,CountryCode) %>%
  summarise(Yield=mean(Yield),.groups="keep") %>%
  pivot_wider(id_cols=Date,names_from=CountryCode,values_from=Yield)

YieldCorrs <- WideYield %>%
  ungroup()  %>%
  select(-Date) %>%
  correlate(quiet=TRUE) %>%
  column_to_rownames(var="term")

knitr::kable(YieldCorrs)

	AU	BE	CA	CN	DE	DK	ES	FR	GB	GR	HK	ID	IE	IN	IT	JP	KR	MY	NL	NO	NZ	PL	PT	SE	SG	TH	TR	TW	US	ZA
AU	NA	0.8870312	0.3911642	0.6417824	0.9429857	0.7549636	0.8811956	0.9326234	0.8304877	0.6461212	0.3717444	0.1351675	0.7671742	0.6845899	0.3763824	0.7142192	0.8523321	0.8762465	0.8259843	0.4476247	0.9482942	0.7841900	0.5818301	0.6669701	0.7600473	0.8816584	0.0335420	0.9044513	0.4935790	-0.4153050
BE	0.8870312	NA	0.4711414	0.6059199	0.9436634	0.8200378	0.8801707	0.9676223	0.8372466	0.4500394	0.4594577	0.2918290	0.8257587	0.7295274	0.6116291	0.7871695	0.8642044	0.8310146	0.7181063	0.4992508	0.7766363	0.6178124	0.4170695	0.8080461	0.7369825	0.8326243	0.2241310	0.7611191	0.5843807	-0.2780308
CA	0.3911642	0.4711414	NA	0.6154976	0.4864123	0.1233518	0.1328027	0.3552329	0.6614010	-0.2704249	0.9460893	0.1917836	0.0927301	0.6379730	0.4889613	0.4009744	0.6735412	0.5176527	0.0602138	0.8986183	0.2136858	0.3128744	-0.4080698	0.3463706	0.7831740	0.4961926	0.6872376	0.2257962	0.9520739	-0.2915587
CN	0.6417824	0.6059199	0.6154976	NA	0.6981484	0.4093182	0.4248279	0.6084594	0.6140995	0.0849612	0.5260864	-0.2095985	0.2500815	0.4743627	0.2503735	0.6490446	0.8113094	0.6242709	0.3045604	0.4932474	0.5462258	0.5482670	-0.0419342	0.5162376	0.6198703	0.5703326	0.2280510	0.5868611	0.6137073	-0.4331410
DE	0.9429857	0.9436634	0.4864123	0.6981484	NA	0.8406511	0.8547517	0.9467046	0.8789453	0.5216559	0.4359689	0.0951142	0.7784952	0.7327440	0.4141212	0.7597975	0.9036164	0.8443313	0.8111676	0.5083664	0.8627683	0.7128413	0.4601613	0.7716760	0.7656342	0.8350607	0.0919476	0.8258343	0.5642985	-0.4116513
DK	0.7549636	0.8200378	0.1233518	0.4093182	0.8406511	NA	0.7951999	0.8224070	0.6372000	0.4882066	0.1099711	0.0236656	0.8182273	0.5684379	0.2827398	0.6423529	0.6863332	0.5904540	0.8183082	0.1635314	0.7155954	0.4547212	0.5477329	0.8121138	0.4313770	0.5511314	-0.1665369	0.6116830	0.2033908	-0.2408408
ES	0.8811956	0.8801707	0.1328027	0.4248279	0.8547517	0.7951999	NA	0.9297748	0.7093239	0.7362352	0.1338097	0.2885410	0.9013996	0.6017404	0.4482302	0.6549443	0.6724485	0.7701143	0.8479878	0.1976253	0.8548744	0.6541161	0.7302855	0.6757845	0.5505193	0.7897732	-0.0568263	0.8249878	0.2564142	-0.1417553
FR	0.9326234	0.9676223	0.3552329	0.6084594	0.9467046	0.8224070	0.9297748	NA	0.8021916	0.5795105	0.3549778	0.2091545	0.8381453	0.6470834	0.4907413	0.7874904	0.8369840	0.8398088	0.7710074	0.4117826	0.8667711	0.7231295	0.5602838	0.7890622	0.6915880	0.8492330	0.0622053	0.8675629	0.4697444	-0.3072362
GB	0.8304877	0.8372466	0.6614010	0.6140995	0.8789453	0.6372000	0.7093239	0.8021916	NA	0.3939936	0.6401657	0.2812246	0.7055754	0.8355429	0.4439865	0.5098200	0.8372463	0.8470582	0.7157414	0.7272158	0.7449323	0.6843325	0.3000069	0.6103911	0.9102522	0.8335089	0.3197094	0.7058767	0.7358572	-0.3578252
GR	0.6461212	0.4500394	-0.2704249	0.0849612	0.5216559	0.4882066	0.7362352	0.5795105	0.3939936	NA	-0.2706130	0.1031037	0.7025122	0.3194746	-0.0302860	0.1965682	0.2356404	0.5014677	0.8049060	-0.1032516	0.7588498	0.6021429	0.9278676	0.2033026	0.2730120	0.5752330	-0.3920566	0.7058846	-0.1441591	-0.2386630
HK	0.3717444	0.4594577	0.9460893	0.5260864	0.4359689	0.1099711	0.1338097	0.3549778	0.6401657	-0.2706130	NA	0.3402767	0.1179660	0.6442931	0.5314668	0.3794933	0.6382622	0.5319335	0.0433025	0.9199561	0.2044794	0.3247795	-0.3578746	0.3381692	0.8018852	0.5195183	0.7161885	0.2273320	0.9318048	-0.1802882
ID	0.1351675	0.2918290	0.1917836	-0.2095985	0.0951142	0.0236656	0.2885410	0.2091545	0.2812246	0.1031037	0.3402767	NA	0.3608218	0.3886371	0.6463048	0.0723546	0.0340262	0.3248384	0.1396119	0.3124090	0.0354513	0.0333344	0.1324365	0.0506745	0.3472447	0.4109520	0.5932707	0.0714957	0.3210029	0.3859559
IE	0.7671742	0.8257587	0.0927301	0.2500815	0.7784952	0.8182273	0.9013996	0.8381453	0.7055754	0.7025122	0.1179660	0.3608218	NA	0.6752144	0.3837843	0.4356998	0.5605316	0.6964962	0.8598236	0.1953743	0.7618993	0.5465415	0.7204500	0.6249735	0.5383200	0.7004328	-0.0438032	0.6858203	0.2280601	-0.1614131
IN	0.6845899	0.7295274	0.6379730	0.4743627	0.7327440	0.5684379	0.6017404	0.6470834	0.8355429	0.3194746	0.6442931	0.3886371	0.6752144	NA	0.5155226	0.3793774	0.7182927	0.7230827	0.6288284	0.6549026	0.5741039	0.4664353	0.2249105	0.4567032	0.8314620	0.7438017	0.4650625	0.4614727	0.7209303	-0.2719622
IT	0.3763824	0.6116291	0.4889613	0.2503735	0.4141212	0.2827398	0.4482302	0.4907413	0.4439865	-0.0302860	0.5314668	0.6463048	0.3837843	0.5155226	NA	0.5493995	0.4556152	0.4588377	0.1548424	0.4444365	0.1842722	0.1213302	-0.0554801	0.4196564	0.4792528	0.5238000	0.6670654	0.1988683	0.5829604	0.1477253
JP	0.7142192	0.7871695	0.4009744	0.6490446	0.7597975	0.6423529	0.6549443	0.7874904	0.5098200	0.1965682	0.3794933	0.0723546	0.4356998	0.3793774	0.5493995	NA	0.7629559	0.5951764	0.4212975	0.3337002	0.5809748	0.4204909	0.1890006	0.7853377	0.4632148	0.5947643	0.1577777	0.5953644	0.4363745	-0.1489002
KR	0.8523321	0.8642044	0.6735412	0.8113094	0.9036164	0.6863332	0.6724485	0.8369840	0.8372463	0.2356404	0.6382622	0.0340262	0.5605316	0.7182927	0.4556152	0.7629559	NA	0.8136326	0.5686665	0.6445228	0.7226666	0.6523179	0.1604115	0.7154380	0.8199423	0.7627421	0.2632737	0.7204986	0.7253244	-0.3995203
MY	0.8762465	0.8310146	0.5176527	0.6242709	0.8443313	0.5904540	0.7701143	0.8398088	0.8470582	0.5014677	0.5319335	0.3248384	0.6964962	0.7230827	0.4588377	0.5951764	0.8136326	NA	0.6856801	0.5895543	0.8032857	0.7432153	0.4173800	0.5256810	0.8492449	0.9087671	0.2628979	0.8153338	0.6378713	-0.3110147
NL	0.8259843	0.7181063	0.0602138	0.3045604	0.8111676	0.8183082	0.8479878	0.7710074	0.7157414	0.8049060	0.0433025	0.1396119	0.8598236	0.6288284	0.1548424	0.4212975	0.5686665	0.6856801	NA	0.1881831	0.8495014	0.6248745	0.7939812	0.5331859	0.5363621	0.7027485	-0.2101456	0.7373591	0.1646139	-0.3022398
NO	0.4476247	0.4992508	0.8986183	0.4932474	0.5083664	0.1635314	0.1976253	0.4117826	0.7272158	-0.1032516	0.9199561	0.3124090	0.1953743	0.6549026	0.4444365	0.3337002	0.6445228	0.5895543	0.1881831	NA	0.3248484	0.4708639	-0.1927064	0.3140852	0.8721695	0.6139877	0.6164304	0.3592907	0.9257208	-0.3246456
NZ	0.9482942	0.7766363	0.2136858	0.5462258	0.8627683	0.7155954	0.8548744	0.8667711	0.7449323	0.7588498	0.2044794	0.0354513	0.7618993	0.5741039	0.1842722	0.5809748	0.7226666	0.8032857	0.8495014	0.3248484	NA	0.8140374	0.7097068	0.5798993	0.6573695	0.8061906	-0.1727084	0.9114451	0.3069331	-0.3847664
PL	0.7841900	0.6178124	0.3128744	0.5482670	0.7128413	0.4547212	0.6541161	0.7231295	0.6843325	0.6021429	0.3247795	0.0333344	0.5465415	0.4664353	0.1213302	0.4204909	0.6523179	0.7432153	0.6248745	0.4708639	0.8140374	NA	0.5424482	0.3345324	0.6975497	0.7908699	-0.0390739	0.8760888	0.4110531	-0.4538014
PT	0.5818301	0.4170695	-0.4080698	-0.0419342	0.4601613	0.5477329	0.7302855	0.5602838	0.3000069	0.9278676	-0.3578746	0.1324365	0.7204500	0.2249105	-0.0554801	0.1890006	0.1604115	0.4173800	0.7939812	-0.1927064	0.7097068	0.5424482	NA	0.2636413	0.1649462	0.4864557	-0.4981141	0.6678145	-0.2776004	-0.1426094
SE	0.6669701	0.8080461	0.3463706	0.5162376	0.7716760	0.8121138	0.6757845	0.7890622	0.6103911	0.2033026	0.3381692	0.0506745	0.6249735	0.4567032	0.4196564	0.7853377	0.7154380	0.5256810	0.5331859	0.3140852	0.5798993	0.3345324	0.2636413	NA	0.4400672	0.4590945	-0.0154838	0.5278960	0.3511126	-0.1526391
SG	0.7600473	0.7369825	0.7831740	0.6198703	0.7656342	0.4313770	0.5505193	0.6915880	0.9102522	0.2730120	0.8018852	0.3472447	0.5383200	0.8314620	0.4792528	0.4632148	0.8199423	0.8492449	0.5363621	0.8721695	0.6573695	0.6975497	0.1649462	0.4400672	NA	0.8647628	0.4816576	0.6601702	0.8688741	-0.4000656
TH	0.8816584	0.8326243	0.4961926	0.5703326	0.8350607	0.5511314	0.7897732	0.8492330	0.8335089	0.5752330	0.5195183	0.4109520	0.7004328	0.7438017	0.5238000	0.5947643	0.7627421	0.9087671	0.7027485	0.6139877	0.8061906	0.7908699	0.4864557	0.4590945	0.8647628	NA	0.3263941	0.8333181	0.6561389	-0.3545035
TR	0.0335420	0.2241310	0.6872376	0.2280510	0.0919476	-0.1665369	-0.0568263	0.0622053	0.3197094	-0.3920566	0.7161885	0.5932707	-0.0438032	0.4650625	0.6670654	0.1577777	0.2632737	0.2628979	-0.2101456	0.6164304	-0.1727084	-0.0390739	-0.4981141	-0.0154838	0.4816576	0.3263941	NA	-0.1320428	0.7274695	0.0973620
TW	0.9044513	0.7611191	0.2257962	0.5868611	0.8258343	0.6116830	0.8249878	0.8675629	0.7058767	0.7058846	0.2273320	0.0714957	0.6858203	0.4614727	0.1988683	0.5953644	0.7204986	0.8153338	0.7373591	0.3592907	0.9114451	0.8760888	0.6678145	0.5278960	0.6601702	0.8333181	-0.1320428	NA	0.3479677	-0.4432692
US	0.4935790	0.5843807	0.9520739	0.6137073	0.5642985	0.2033908	0.2564142	0.4697444	0.7358572	-0.1441591	0.9318048	0.3210029	0.2280601	0.7209303	0.5829604	0.4363745	0.7253244	0.6378713	0.1646139	0.9257208	0.3069331	0.4110531	-0.2776004	0.3511126	0.8688741	0.6561389	0.7274695	0.3479677	NA	-0.3504164
ZA	-0.4153050	-0.2780308	-0.2915587	-0.4331410	-0.4116513	-0.2408408	-0.1417553	-0.3072362	-0.3578252	-0.2386630	-0.1802882	0.3859559	-0.1614131	-0.2719622	0.1477253	-0.1489002	-0.3995203	-0.3110147	-0.3022398	-0.3246456	-0.3847664	-0.4538014	-0.1426094	-0.1526391	-0.4000656	-0.3545035	0.0973620	-0.4432692	-0.3504164	NA

corrplot(as.matrix(YieldCorrs),method="circle",order="hclust",title="Yield Correlations",cl.pos="n",mar=c(0,0,2,0))

Just about understandable, and considering the number of variables, I’m happy enough given the context of this activity.

This is pretty much as expected, though: the closer the trading partner, the more closely correlated the yields. ‘The West’ can be broadly construed as a cluster, with Australia more closely related to Europe than to most of its near neighbours (New Zealand aside). South Africa is off on its own, with no other African nations in the sample. Surprisingly, to me, China is more heavily correlated with Korea and Japan than with its larger trading partners, and even more closely correlated with Korea than with Hong Kong. This is made up for, I guess, by Hong Kong having a closer correlation to the USA, Canada, and slightly strangely, Norway than to China, Korea, and Japan. I suppose this makes sense due to the international financial market centricity of Hong Kong, but it still surprises me.

Least correlated are, as expected, geographically distant nations with no real trading relationship. Greece and Portugal’s yields have little or no relationship with the Norwegian, Chinese, Hong Kong, Canadian, or American yields (coefficients between roughly -0.41 and 0.08), although both correlate fairly well with Taiwan (around 0.7). South Africa has no real relationship with most of the other sampled nations, even having apparent negative correlations with most of them.
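
Rather than reading pairs off the matrix by eye, they can be ranked. This is a minimal sketch on a four-nation subset that I have copied by hand from the table above and rounded to three decimal places; the object names (`m`, `pairs`) are just for illustration.

```r
# Hand-copied subset of the correlation matrix above, rounded to 3 d.p.
codes <- c("GR", "PT", "CA", "TW")
m <- matrix(c(
      NA,  0.928, -0.270, 0.706,
   0.928,     NA, -0.408, 0.668,
  -0.270, -0.408,     NA, 0.226,
   0.706,  0.668,  0.226,    NA
), nrow = 4, byrow = TRUE, dimnames = list(codes, codes))

# Keep each pair once (upper triangle only), then sort by absolute correlation
idx   <- which(upper.tri(m), arr.ind = TRUE)
pairs <- data.frame(
  A = rownames(m)[idx[, "row"]],
  B = colnames(m)[idx[, "col"]],
  r = m[idx]
)
pairs <- pairs[order(abs(pairs$r)), ]

print(pairs)
```

The same steps should work on `as.matrix(YieldCorrs)` to rank all of the pairs, with the weakest relationships at the top and the strongest at the bottom.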

Bond Prices vs Yields

This is a quick answer: it is because Yield already accounts for the relationship between coupon rate and the trading price of the instrument. The price, in and of itself, tells us nothing about the yield or the coupon rate, only the trading price versus the par value of the bond.
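
A quick illustration of the point, using the simplest possible measure (current yield, i.e. annual coupon divided by price) rather than whatever yield definition the dataset actually uses, which I do not know. The function name and the example bonds are invented.

```r
# Current yield: annual coupon payment divided by trading price.
# This is a simplification and ignores time to maturity.
current_yield <- function(coupon_rate, price, par = 100) {
  (coupon_rate * par) / price
}

# Two hypothetical bonds trading at the same price of 95
same_price <- c(
  low_coupon  = current_yield(0.02, 95),
  high_coupon = current_yield(0.06, 95)
)

# One hypothetical 5% bond at three different prices
same_coupon <- current_yield(0.05, c(90, 100, 110))

print(same_price)
print(same_coupon)
```

Two bonds at an identical price can carry very different yields, so the price column alone cannot be compared across instruments or nations, whereas the yield can.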

Additional Useful Data

The list is near endless.

Coupon Rate might be interesting, and could add value in places. This could almost certainly be worked out as a solid estimate from Yield and Price (probably with a little bit of rounding to smooth it out), but having it in the data would be useful.

The dataset being complete would be, of course, useful, but seems to be a bit of a pipe dream. Incomplete data limits what I can do with this, and severely limits the value of this dataset - unless one were interested only in Britain, France, Canada, Germany, and the USA, this dataset is far more limited than it appears on its face.

National economic information would provide huge amounts of context and potential avenues of exploration: inflation (and inflation estimates), as well as interest rates (and the dates when rate rises are announced) would both add value to this dataset and analysis.

Dates of major national, regional, and international events would be valuable as well: the aforementioned rate rise announcements, economic policy/budget announcements, conflicts such as demonstrated on the United States Yield Variance graph above, election dates (especially contentious ones), smaller scale conflicts, disasters, and anything else that can cause economic upheaval.

The last one there is a bit of a catch-all, but it really could go on for absolutely ages.

RPI <- fread("RPI.csv") %>% 
  mutate(Date=ym(Year)) %>% 
  select(-Year) %>% 
  group_by(Date=as.Date(paste(format(Date,'%Y-%m'),"-1",sep=""),"%Y-%m-%d")) %>%
  summarise(RPI=mean(RPI),.groups="keep") %>%
  ungroup()

Interest <- fread("BOEInterestRate.csv") %>%
  mutate(Date=dmy(`Date Changed`)) %>% 
  select(-`Date Changed`) %>%
  group_by(Date=as.Date(paste(format(Date,'%Y-%m'),"-1",sep=""),"%Y-%m-%d")) %>%
  summarise(Rate=mean(Rate),.groups="keep") %>%
  ungroup()

UKExpanded <- TFData %>% 
  filter(CountryCode=="GB") %>% 
  group_by(Country, Region, Date=as.Date(paste(format(Date,'%Y-%m'),"-1",sep=""),"%Y-%m-%d")) %>%
  summarise(Volatility=sd(Yield),.groups="keep",Yield=mean(Yield)) %>%
  left_join(Interest, by="Date") %>% 
  left_join(RPI, by="Date") %>% 
  arrange(Date)


for(i in 2:nrow(UKExpanded)){
  UKExpanded$Rate[i]=ifelse(is.na(UKExpanded$Rate[i]),UKExpanded$Rate[i-1],UKExpanded$Rate[i])
  UKExpanded$RPI[i]=ifelse(is.na(UKExpanded$RPI[i]),UKExpanded$RPI[i-1],UKExpanded$RPI[i])
}
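
The loop above carries the last known value forward into the gaps. A tidier alternative might be `fill()` from tidyr, sketched here on a small invented frame standing in for `UKExpanded` (the values are made up).

```r
library(tidyr)

# Invented monthly frame with gaps, standing in for UKExpanded
toy <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01")),
  Rate = c(0.75, NA, 0.10, NA),
  RPI  = c(2.7, NA, NA, 1.5)
)

# Carry the last non-missing value downward in each column
filled <- fill(toy, Rate, RPI, .direction = "down")

print(filled)
```

One caveat if this were applied to the real frame: `fill()` works within groups, and `UKExpanded` is still grouped by Date after `summarise(.groups="keep")`, so it would need an `ungroup()` first.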

Graphing the above:

colours <- c("Yield"="plum","Rate"="cornflowerblue","RPI"="gray70")
UKExpanded %>% 
  filter(!Rate==0 & !RPI==0) %>%
  ggplot(aes(x=Date))+ 
    theme_minimal()+
    geom_ribbon(aes(y=Yield,ymin=Yield-Volatility,ymax=Yield+Volatility, fill="Yield"),alpha=.15)+
    geom_line(aes(y=Yield,colour="Yield"),size=.75)+
    geom_line(aes(y=RPI,colour="RPI"),size=.75)+
    geom_line(aes(y=Rate,colour="Rate"),size=.75)+
    scale_colour_manual(values=colours)+  
    ylab("")+
    xlab("")+
    scale_y_continuous(labels = scales::percent_format(scale = 1))+
    ggtitle("UK Bond Yield vs Interest Rates and Reported RPI")+
    guides(fill="none",colour=guide_legend(""))

TFData %>% 
  group_by(Country,Date=as.Date(paste(format(Date,'%Y'),"-1-1",sep=""),"%Y-%m-%d")) %>%
  summarise(Present=TRUE,.groups="keep") %>%
  ggplot(aes(x=Date,y=reorder(Country,desc(Country)), fill=Present,alpha=0.9))+
    scale_fill_manual(values=c("cornflowerblue"))+
    geom_tile()+
    theme_minimal()+
    ylab("")+
    xlab("")+
    ggtitle("Periods with Data Present")+
    theme(legend.position="none")

EuroYield <- TFData %>%
  filter(Date >= "2016-04-26",Region=="Europe") %>%
  group_by(Date,CountryCode) %>%
  summarise(Yield=mean(Yield),.groups="keep") %>%
  pivot_wider(id_cols=Date,names_from=CountryCode,values_from=Yield) %>%
  ungroup() %>%
  select(-Date) %>%
  correlate(quiet=TRUE) %>%
  column_to_rownames(var="term")

corrplot(as.matrix(EuroYield),method="circle",order="hclust",title="European Yield Correlations", cl.pos="n", mar=c(0,0,2,0))

AsiaYield <- TFData %>%
  filter(Date >= "2016-04-26",Region=="Asia") %>%
  group_by(Date,CountryCode) %>%
  summarise(Yield=mean(Yield),.groups="keep") %>%
  pivot_wider(id_cols=Date,names_from=CountryCode,values_from=Yield) %>%
  ungroup() %>%
  select(-Date) %>%
  correlate(quiet=TRUE) %>%
  column_to_rownames(var="term")

corrplot(as.matrix(AsiaYield),method="circle",order="hclust",title="Asian Yield Correlations", cl.pos="n", mar=c(0,0,2,0))

NATOYield <- TFData %>%
  # NATO members present in the data, using the same codes as the CountryCode column
  filter(Date >= "2016-04-26",CountryCode %in% c("CA","BE","DK","FR","DE","GR","US","IT","NL","PL","PT","NO","GB","TR","ES")) %>%
  group_by(Date,CountryCode) %>%
  summarise(Yield=mean(Yield),.groups="keep") %>%
  pivot_wider(id_cols=Date,names_from=CountryCode,values_from=Yield) %>%
  ungroup() %>%
  select(-Date) %>%
  correlate(quiet=TRUE) %>%
  column_to_rownames(var="term")

corrplot(as.matrix(NATOYield),method="circle",order="hclust",title="NATO Member Yield Correlations",cl.pos="n",mar=c(0,0,2,0))

EuropeYields <- TFData %>% 
  filter(Region=="Europe") %>% 
  group_by(Country, Region, Date=as.Date(paste(format(Date,'%Y-%m'),"-1",sep=""),"%Y-%m-%d")) %>%
  summarise(Volatility=sd(Yield),.groups="keep",Yield=mean(Yield),Price=mean(Price))

EuropeYields %>%
  filter(Date >= "1997-01-01", Date <= "2010-06-01", !Country %in% c("Norway","Poland","Portugal","Netherlands (the)","Ireland")) %>%
  ggplot(aes(x=Date,y=Yield,colour=Country))+
    geom_line(size=0.75,alpha=0.75)+
    geom_vline(xintercept=as.Date("1999-01-01"),alpha=0.5,size=.75,colour="grey70")+
    geom_vline(xintercept=as.Date("2008-09-01"),alpha=0.5,size=.75,colour="grey70")+
    theme_minimal()+
    ylab("")+
    xlab("")+
    scale_y_continuous(labels = scales::percent_format(scale = 1))+
    guides(fill="none",colour=guide_legend(""))+
    ggtitle("Select European Bond Yields")

EuropeYields %>%
  filter(Date >= "1997-01-01", Date <= "2010-06-01", !Country %in% c("Norway","Poland","Portugal","Netherlands (the)","Ireland")) %>%
  ggplot(aes(x=Date,y=Volatility,colour=Country))+
    geom_line(size=0.75,alpha=0.75)+
    geom_vline(xintercept=as.Date("1999-01-01"),alpha=0.5,size=.75,colour="grey70")+
    geom_vline(xintercept=as.Date("2008-09-01"),alpha=0.5,size=.75,colour="grey70")+
    theme_minimal()+
    ylab("")+
    xlab("")+
    guides(fill="none",colour=guide_legend(""))+
    ggtitle("Select European Bond Volatilities")

AsianYields <- TFData %>% 
  filter(Region %in% c("Asia")) %>% 
  group_by(Country, Region, Date=as.Date(paste(format(Date,'%Y-%m'),"-1",sep=""),"%Y-%m-%d")) %>%
  summarise(Volatility=sd(Yield),.groups="keep",Yield=mean(Yield))

AsianYields %>%
  filter(Date >= "1997-01-01", Date <= "2010-01-01", !Country %in% c("Turkey","Indonesia")) %>%
  ggplot(aes(x=Date,y=Yield,colour=Country))+
    geom_line(size=0.75,alpha=0.75)+
    theme_minimal()+
    ylab("")+
    xlab("")+
    scale_y_continuous(labels = scales::percent_format(scale = 1))+
    guides(fill="none",colour=guide_legend(""))+
    ggtitle("Select Asian Bond Yields")

TFData %>%
  filter(Date >= "2016-01-01", !Price==0) %>%
  mutate(Maturity=as.numeric(Maturity)) %>%
  na.omit() %>%
  group_by(Country,Maturity,Region) %>%
  summarise(Volatility=sd(Yield), Sample=n(),.groups="keep") %>% 
  filter(Sample >= 1000, !Country=="Turkey") %>%
  arrange(Country) %>%
  ggplot(aes(x=Maturity,y=Volatility))+
    geom_point(aes(colour=Country),alpha=.25)+
    geom_line(stat="smooth",method = "lm", formula = y ~ x, se = FALSE, size=0.5,alpha=.5,aes(colour=Country))+
    geom_line(stat="smooth",method = "lm", formula = y ~ x, se = FALSE, size=1, alpha=.5)+
    theme_minimal()+
    ggtitle("Maturity vs Volatility")+
    facet_wrap(vars(Region),scales="free")+
    theme(legend.position="none")