In many cases, we need to model distributions that have a recurring structure.
We introduce representations for two such situations:
- One is temporal scenarios, where we want to model a probabilistic structure that holds constant over time; here we use Hidden Markov Models or, more generally, Dynamic Bayesian Networks.
- The other is aimed at scenarios that involve multiple similar entities, each of whose properties is governed by a similar model; here we use Plate Models.
Goal: a tool, such as the language of graphical models, that can deal with this very large class of cases \(\rightarrow\) a general-purpose representation that allows us to solve multiple problems using the same exact model.
Genetic Inheritance
\(\Longrightarrow\) take as input a pedigree (a family tree) and a particular trait that we are interested in reasoning about. For each pedigree and each trait we can construct a Bayesian network, which might look like:
BNs for Genetic Inheritance
grViz("
digraph causal{
# a 'graph' statement
graph [overlap = true, fontsize = 10]
# several 'node' statements
node [shape = circle,
style = filled,
fontname = Helvetica,
fillcolor = Linen]
G_Clancy; B_Clancy
G_Jackie; B_Jackie
G_Marge; B_Marge
G_Selma; B_Selma
G_Homer; B_Homer
G_Bart; B_Bart
G_Lisa; B_Lisa
G_Maggie; B_Maggie
# Edges
edge[color=black, arrowhead=vee]
G_Clancy -> B_Clancy; G_Clancy -> G_Marge; G_Clancy -> G_Selma
G_Jackie -> B_Jackie; G_Jackie -> G_Marge; G_Jackie -> G_Selma
G_Marge -> B_Marge; G_Marge -> G_Bart; G_Marge -> G_Lisa; G_Marge -> G_Maggie
G_Selma -> B_Selma
G_Homer -> B_Homer; G_Homer -> G_Bart; G_Homer -> G_Lisa; G_Homer -> G_Maggie;
G_Bart -> B_Bart
G_Lisa -> B_Lisa
G_Maggie -> B_Maggie
}")If we have a somewhat different family tree: another 3 cousins and a great grandfather join the family tree (a different family altogether), we would still want to use the same sort of ideas. The same pieces of what we used in the \(1^{st}\) network to construct this other network because there’s clearly a lot of commonalities between them. \(\rightarrow\) call sharing between models.
The example also exhibits fairly obvious sharing within the model: the CPD that tells us how Selma's genotype affects Selma's blood type is presumably the same process by which Marge's genotype affects Marge's blood type, and likewise for Maggie, Lisa, Bart, Homer, and everybody else \(\rightarrow\) sharing of both dependency models and parameters.
Moreover, the genetic inheritance model by which Bart's genotype is determined by the genotypes of his two parents is the same inheritance model that applies to Lisa, to Maggie, to Marge, to Selma, and so on.
\(\rightarrow\) a lot of parameters are shared, not just between models, but also within a model.
\(\Longrightarrow\) we want some way of constructing models with this large amount of shared structure, one that allows us both to construct very large models from a sparse parameterization and to construct entire families of models from a single concise representation.
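To make this concrete, here is a minimal R sketch (my own illustrative code, not from the course; the ABO allele encoding and the deterministic genotype \(\rightarrow\) blood-type mapping are simplifying assumptions) in which one allele-passing function and one blood-type CPD are defined once and reused for every person in the pedigree:
pass_allele <- function(genotype) {
  # each of the two parental alleles is passed on with probability 1/2
  sample(strsplit(genotype, "")[[1]], 1)
}
child_genotype <- function(mother, father) {
  # the shared inheritance model, applied identically to every child
  paste(sort(c(pass_allele(mother), pass_allele(father))), collapse = "")
}
blood_type <- function(genotype) {
  # the shared CPD from genotype to phenotype, applied identically to every person
  g <- strsplit(genotype, "")[[1]]
  if (all(g == "O")) "O"
  else if (setequal(g, c("A", "B"))) "AB"
  else if ("A" %in% g) "A"
  else "B"
}
set.seed(1)
marge <- child_genotype("AO", "BO")   # from Clancy and Jackie (genotypes assumed)
bart  <- child_genotype(marge, "OO")  # from Marge and Homer (Homer assumed OO)
c(marge = marge, bart = bart, bart_blood = blood_type(bart))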
The same argument applies in natural language processing, where a sequence model performs named entity recognition, and in image segmentation, where a separate model applies to every superpixel in the image.
NLP Sequence Model
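A rough grViz sketch of such a sequence model, in the same style as the pedigree network above (the names Y for entity labels and X for words are my own; the transition and emission CPDs are copied at every position):
grViz("
digraph nlp_sequence{
  graph [rankdir = LR, fontsize = 10]
  node [shape = circle, style = filled, fontname = Helvetica, fillcolor = Linen]
  edge [color = black, arrowhead = vee]
  # entity labels form a chain; each label emits the word at that position
  Y_1 -> Y_2; Y_2 -> Y_3
  Y_1 -> X_1; Y_2 -> X_2; Y_3 -> X_3
}")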
Image Segmentation
\(\Longrightarrow\) sharing between and within a model.
Let's look at the bigger picture by coming back to the University example: now we want to think about an entire university, not just an individual student.
The University Example
So now we have difficulty variables for the different courses, \(C_1\) up to \(C_n\), that exist in our university; and conversely, on the other side, we have a set of intelligence variables indexed by the different students, from the intelligence of student 1 up to the intelligence of student \(m\). Note that these are different random variables: they can, and generally will, take different values from each other, but they all share a probabilistic model, and that is exactly the kind of sharing we have in mind. The grade of a student in a course depends on the difficulty of the relevant course and on the intelligence of the relevant student; for example, the grade of student 1 in course 1 depends on the difficulty of course 1 and on the intelligence of student 1. Once again we have sharing of both the structure and the parameters across these different grade variables \(\Longrightarrow\) they all have the same kind of dependency structure and the same CPD.
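A grViz sketch of this network for two courses and two students, in the same style as before (the node names are mine; the real network has \(n\) courses and \(m\) students):
grViz("
digraph university{
  graph [overlap = true, fontsize = 10]
  node [shape = circle, style = filled, fontname = Helvetica, fillcolor = Linen]
  edge [color = black, arrowhead = vee]
  # every grade node has the same dependency structure and the same CPD
  D_C1 -> G_S1_C1; I_S1 -> G_S1_C1
  D_C1 -> G_S2_C1; I_S2 -> G_S2_C1
  D_C2 -> G_S1_C2; I_S1 -> G_S1_C2
  D_C2 -> G_S2_C2; I_S2 -> G_S2_C2
}")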
Another example is that of robot localization: a time series.
Robot Localization
The robot moves through time from one position to another. Although the position at time \(t\) changes over time, we expect the dynamics of the robot to be fixed. That gives us the following graphical model:
Robot Localization Graphical Model
Here the \(X\) variables, the robot pose at each time step, depend on the previous pose and on whatever control action the robot took. Once again we assume sharing of parameters across each instantiation of these variables.
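In the same grViz style, a minimal sketch of the unrolled model for two time steps (the names X for pose, U for control, and S for sonar are my own):
grViz("
digraph robot{
  graph [rankdir = LR, fontsize = 10]
  node [shape = circle, style = filled, fontname = Helvetica, fillcolor = Linen]
  edge [color = black, arrowhead = vee]
  # the pose depends on the previous pose and the control action;
  # the sonar reading depends on the current pose; all CPDs are shared across t
  X_0 -> X_1; X_1 -> X_2
  U_1 -> X_1; U_2 -> X_2
  X_1 -> S_1; X_2 -> S_2
}")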
This gives us a class of models that are represented in terms of template variables:
- A template variable \(X(U_1, \ldots, U_k)\) is instantiated (duplicated) multiple times, as in the examples below (see the code sketch after this list):
  + Location(t), Sonar(t)
  + Genotype(person), Phenotype(person)
  + Label(pixel)
  + Difficulty(course), Intelligence(student), Grade(course, student)
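A minimal R sketch (my own illustrative code; the CPD numbers are made up) of what "instantiated multiple times" means: Grade(course, student) is duplicated for every (course, student) pair, and every copy calls the same shared CPD:
set.seed(1)
courses  <- c("C1", "C2")
students <- c("S1", "S2", "S3")
difficulty   <- setNames(sample(c("easy", "hard"), length(courses), replace = TRUE), courses)
intelligence <- setNames(sample(c("low", "high"), length(students), replace = TRUE), students)
# the single shared CPD used by every instance of Grade(course, student)
grade_cpd <- function(d, i) {
  p <- 0.5 + 0.3 * (i == "high") - 0.2 * (d == "hard")
  sample(c("high", "low"), 1, prob = c(p, 1 - p))
}
# instantiate (duplicate) the template variable for every pair
grid <- expand.grid(student = students, course = courses, stringsAsFactors = FALSE)
grid$grade <- mapply(function(s, crs) grade_cpd(difficulty[[crs]], intelligence[[s]]),
                     grid$student, grid$course)
grid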
\(\longrightarrow\) Template models can often capture events that occur in a time series.
\(\longrightarrow\) CPDs in template models can often be copied many times.
\(\longrightarrow\) Template models can capture parameter sharing within a model.
Template models are thus a convenient way of representing Bayesian networks that have a high amount of parameter sharing and structure. At the end of the day, however, they are merely compact representations of a fully unrolled Bayesian network, and so have no additional representational power.