Word vectors rely on a deep-learning-based word-correlation analysis of texts containing billions of words, and they yield interesting linear structures in the word-vector space that convey relationships between words. This analysis explores some of the “canonical” examples of word-vector relations - mostly as an exercise to see if I can reproduce known results before embarking on my own analysis.
An interesting finding is that while the scalar product of two vectors gives the cosine of the angle between them, there appears to be more information in looking at the similarity across a spectrum of related words.
The word-vector analysis relies on the “pre-trained” GloVe vectors from Jeffrey Pennington, Richard Socher, and Christopher D. Manning (2014).
The first step is to load the GloVe word vectors.
number.of.words <- 9900
word.vector.file <- "glove.6B.300d.txt"
The analysis uses the first 9900 of the GloVe word vectors in glove.6B.300d.txt. It took 23 seconds to process the data into a data frame.
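The loading code itself isn’t shown here; a minimal sketch of how the file might be read into the data frame word.vector.df (read.table() and the word.list assignment are assumptions on my part) is:

word.vector.df <- read.table( word.vector.file, header = FALSE, quote = "",
                              comment.char = "", nrows = number.of.words,
                              stringsAsFactors = FALSE )    # one word plus 300 numeric columns per row
word.list <- word.vector.df[ , 1]                           # the words themselves, used by grep() below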
Before delving into the full analysis, it’s helpful to look at vector representations of familiar word pairs to build an intuitive sense of how vector representations of words facilitate their comparison and analysis.
Vectors of individual words are easily extracted from the word vector data frame using simple regular-expression syntax.
man.vec <- vector.normalize( colSums( word.vector.df[ grep( "^man$", word.list), -1] ) )
The vector man.vec is normalized to unit length and represents the word man.
There is an embedded colSums command, irrelevant in this specific case, to ensure all selected vectors are added together when a more general regular expression matches multiple words. This generalization isn’t useful right now, but may be useful later. For example, selecting strings starting with “govern”
govern.vec <- vector.normalize( colSums( word.vector.df[ grep( "^govern", word.list), -1] ) )
creates a unit vector representing the following 8 words: government, governor, governments, governing, governors, governmental, governance, and governed.
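Both snippets use vector.normalize(), which isn’t defined in this excerpt; a minimal sketch of what it is assumed to do is:

vector.normalize <- function( v ) {
  v <- as.numeric( v )       # colSums() returns a named numeric vector
  v / sqrt( sum( v^2 ) )     # scale to unit length
}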
Before diving into comparisons, it’s helpful to reproduce a well-known and intuitively appealing example of word-vector comparisons. Looking at the vectors of multiple words with a male-female relationship, we can see how the relationships develop.
The example below approximately reproduces well-known results. The position of each word represents the “vector” of that word (noting that this is a two-dimensional representation of a 300-element vector).
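The projection used for the plot isn’t shown; one way it might be produced (the choice of prcomp() over the handful of word vectors is an assumption) is:

pair.words <- c( "man", "woman", "male", "female", "king", "queen" )
pair.vecs  <- t( sapply( pair.words, function( w )
                 vector.normalize( colSums( word.vector.df[ grep( paste0( "^", w, "$" ), word.list ), -1] ) ) ) )
pair.2d    <- prcomp( pair.vecs )$x[ , 1:2]     # first two principal components
plot( pair.2d, type = "n" )
text( pair.2d, labels = pair.words )            # place each word at its 2-D position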
Recalling basic properties of vector addition, we can see from the above that the following approximate expressions seem to hold:
\begin{aligned} \textbf{male} + \textbf{woman} - \textbf{female} &\approx \textbf{man} \\ \textbf{man} + \textbf{queen} - \textbf{woman} &\approx \textbf{king} \end{aligned}
While these identities are not fully satisfied in the two-dimensional representation of the 300-element word vectors, we can quantify the differences more precisely by taking the quantity
\begin{aligned} \cos{\theta_{d}} = \frac{ \textbf{pseudo.d} \cdot \textbf{d} }{ \| \textbf{pseudo.d} \| \, \| \textbf{d} \| } \end{aligned}
where
\begin{aligned} \textbf{pseudo.d} \equiv \textbf{a} + \textbf{b} - \textbf{c} \approx \textbf{d} \end{aligned}
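A sketch of this computation in R (assuming queen.vec, woman.vec, and king.vec have been extracted the same way as man.vec above):

pseudo.king    <- vector.normalize( man.vec + queen.vec - woman.vec )
cos.theta.king <- sum( pseudo.king * king.vec )   # both are unit length, so the dot product is cos(theta)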
In this case \cos{\theta_{king}} = 0.655 and \cos{\theta_{man}} = 0.709. The question is: is this good or not?
To get an idea of the significance of compounding words, let’s compare the similarity of the pseudo-vector to the vector of the word itself.
As a first stab we can look at the vectors themselves as shown below. The plot simply encodes the value of each vector component. While the vectors for \textbf{king} and \textbf{pseudo.king} are similar, it’s hard, at least for me, to see that the two vectors have very much in common. For instance, is \textbf{king} any closer to \textbf{woman} than to \textbf{man}? In retrospect this is not surprising. The vectors are highly abstracted representations that make sense only in the context of other words.
It turns out it’s much more interesting to look at the vectors not in the abstract, but in the context of actual words. In the plots below I’ve taken the scalar product of the vectors \textbf{king} and \textbf{pseudo.king} with \textbf{king} and some neighboring words (for all intents and purposes, random).
Here we take just a sample of the nearest 88 neighboring words. Note that both \textbf{king} and \textbf{pseudo.king} have a high scalar product with the vector representing \textbf{king}, whereas the root vector \textbf{man} does not.
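A sketch of how such a spectrum might be computed (the matrix form and the selection of the 88 neighbors are assumptions, not the original plotting code):

word.mat <- as.matrix( word.vector.df[ , -1] )
word.mat <- word.mat / sqrt( rowSums( word.mat^2 ) )                  # unit-length rows
neighbors <- order( word.mat %*% king.vec, decreasing = TRUE )[ 1:88 ]
word.list[ neighbors ]                                                # the neighboring words themselves
spectrum.king        <- word.mat[ neighbors, ] %*% king.vec           # scalar products with king
spectrum.pseudo.king <- word.mat[ neighbors, ] %*% pseudo.king        # ... with pseudo.king
spectrum.man         <- word.mat[ neighbors, ] %*% man.vec            # ... and with the root vector man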
Another point is that the relatively higher scalar products of the pseudo-vector with words like “britain” and “george” also stand out. This is important to recognize in relation to the “compounded meaning” of text around specific words.
As a comparison, the plot below shows the scalar products for \textbf{king} and \textbf{man} with neighboring words. This reveals the scope of the change in the vector and, in some sense, the meaning of the word.
This was just a trial to test how capital and state pairs lined up. It’s interesting that California and Sacramento are “upside down”, showing a limitation of this technique.
It’s relatively easy to reproduce the “canonical” examples of word-vector relations. They work as advertised.
The value of the scalar product of two word vectors seems to convey less information than the context of that scalar product, i.e. there appears to be more information in comparing similarities across a spectrum of words. This suggests a metric of goodness might be synthesized from something like this nearest-neighbor comparison. Something to explore.
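One candidate, purely as a sketch of the idea (the correlation metric is my assumption, not something computed above):

# how similarly do king and pseudo.king "see" the neighboring words?
spectrum.similarity <- cor( as.numeric( spectrum.king ), as.numeric( spectrum.pseudo.king ) )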