Any sufficiently advanced technology is indistinguishable from magic. Arthur C. Clarke, 1968, Science
The very fabric of life now, she thought as she rose, is magic. In the eighteenth century, we knew how everything was done; but here I rise through the air; I listen to voices in America; I see men flying – but how it’s done I can’t even begin to wonder. So my belief in magic returns. Virginia Woolf, 1928, Orlando
Image: screen captures from 2001: A Space Odyssey DVD, fair use, https://en.wikipedia.org/w/index.php?curid=29557739
Statistical Rethinking, Richard McElreath
A Golem is an entity which follows instructions but does not think for itself. The story goes that one was created by a rabbi in 16th-century Prague to protect the Jews from pogroms. The Golem eventually had to be destroyed.
“The Golem took the pails and ran swiftly to the brook. Several hours later the courtyard of the house of the Rabbi was flooded with water, … But it was not found until the Golem was seen patiently obeying his orders by continuing to pour water into the kegs which had been filled a long time before…”
https://statsandr.com/blog/what-statistical-test-should-i-do/
A Process Model: expresses causal structure.
A Statistical Model: explains data by simplifying the reality that generates it, based on a process model.
A hypothesis can be tested based on the assumption that a model is correct.
Bayesian Hierarchical Modeling: Statistics for Spatio-Temporal Data, Cressie & Wikle, p. 21.
Statistical Rethinking: A Bayesian Course with Examples in R and STAN, Richard McElreath, ch. 1.
Lab conditions: “…With careful adjustment, the coin started heads up always lands heads up—one hundred percent of the time. We conclude that coin tossing is ‘physics,’ not ‘random.’”
Real-life conditions, even in the lab mentioned above, are not going to be as straightforward.
Is the relationship between these things well known or is there uncertainty in the processes themselves?
Once the process model is selected, how do we represent it?
Statistical modeling around the physical representation
Some Modeling Questions
Should we set up a log-linear model or linear model?
How many levels of uncertainty are there?
What sort of distributions should we use to represent uncertainty of the data?
- Can we represent the spatial dependence among the counties of California?
- To what extent did socio-demographic and other factors affect the Covid-19 mortality counts?
- How did these effects change through time (time is measured in biweeks)?
- Can we represent different pandemic regimes in California and tie them to particular policies?
Recall that we are trying to model and understand changes in Covid-19 mortality in the counties of California, \(Y\).
\(Y = f(\text{Observed Covariates}, \text{Unknown Effects})\)
\(Y_{t,i}\sim Pois(\lambda_{t,i})\), where \(t\) is the time index and \(i\) is the county index.
\(log(\lambda_{t,i})=Linear\ Equation_{t,i}\)
\(Linear\ Equation_{t,i} = Obs. \ Covariates \ Effects_{t,i} + Unobs \ Effects_{t,i}\)
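A minimal simulation sketch of this observation model may help make the pieces concrete; the single covariate, effect size, noise scale, and dimensions below are illustrative assumptions, not the fitted model.

```r
# Simulate counts from the Poisson log-linear model above (illustrative values only).
set.seed(42)
T <- 10                                            # biweeks
N <- 58                                            # California counties
x <- matrix(rnorm(T * N), T, N)                    # one observed covariate
beta <- 0.5                                        # assumed covariate effect
u <- matrix(rnorm(T * N, sd = 0.2), T, N)          # unobserved effects
log_lambda <- beta * x + u                         # linear equation on the log scale
Y <- matrix(rpois(T * N, exp(log_lambda)), T, N)   # Y[t, i] ~ Pois(lambda[t, i])
```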
- Things that are closer to each other are more likely to be similar.
- There is a process based on areal units, beyond observed quantities, which affects Covid-19 mortality.
How do we represent a process of similarity in neighboring counties?
Random effects \(\theta\)? These would not take into account the similarity of neighboring counties.
By creating adjacency matrices, i.e., a county is a neighbor of another if they share a boundary. Spatial effects are quantified with \(\phi\).
```r
# Edge list of neighboring counties: node1[e] and node2[e] share a border
# (vectors truncated on the slide; the full vectors list every neighboring pair).
node1 <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
node2 <- c(7, 38, 39, 41, 43, 50, 3, 5, 9, 26,
```
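As a sketch of where such an edge list comes from, here is a toy four-county adjacency matrix converted to node pairs; the real matrix would come from the California county map, so this is an illustration rather than the project's actual preprocessing code.

```r
# Toy 4-county adjacency matrix: adj[i, j] = 1 if counties i and j share a border.
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)
edges <- which(adj == 1 & upper.tri(adj), arr.ind = TRUE)  # count each shared border once
node1 <- edges[, 1]   # 1 1 2 3
node2 <- edges[, 2]   # 2 3 3 4
```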
Random Effects \(\theta\)
Spatial Effects \(\phi\)
A simple additive set of components is possible in the Bayesian Framework
\(\gamma = \theta + \phi\)
The problem: in \(Linear\ Equation_{t,i} = Obs. \ Covariates' \ Effects_{t,i} + \gamma_{t,i}\), the spatial effects require a joint distribution
\[f(\phi_{1},\ldots,\phi_{N})\]
How can we get this joint probability, which needs to be incorporated into the full joint distribution for the Bayesian calculations?
A joint distribution might not be proper even though the full conditionals of the variables are. For example,
\[p(\phi_{1},\phi_{2}) \propto \exp \left(-\frac{1}{2}\left(\phi_{1}-\phi_{2}\right)^{2}\right)\]
is not integrable (improper), yet its full conditionals are perfectly valid normals:
\[\phi_{1}\mid\phi_{2}\sim N(\phi_{2},1), \qquad \phi_{2}\mid\phi_{1}\sim N(\phi_{1},1)\]
Brook’s Lemma (1964), Besag (1974): from local to global (Markov random fields).
\[ \frac{P(\Phi)}{P(\Phi^{*})}= \frac{P(\phi_{1}|\phi_{2},\ldots,\phi_{N})}{P(\phi_{1}^{*}|\phi_{2},\ldots,\phi_{N})}\times \frac{P(\phi_{2}|\phi_{1}^{*},\phi_{3},\ldots,\phi_{N})}{P(\phi_{2}^{*}|\phi_{1}^{*},\phi_{3},\ldots,\phi_{N})}\times \\ \frac{P(\phi_{3}|\phi_{1}^{*},\phi_{2}^{*},\phi_{4},\ldots,\phi_{N})}{P(\phi_{3}^{*}|\phi_{1}^{*},\phi_{2}^{*},\phi_{4},\ldots,\phi_{N})}\times \cdots \times \frac{P(\phi_{N}|\phi_{1}^{*},\ldots,\phi_{N-1}^{*})}{P(\phi_{N}^{*}|\phi_{1}^{*},\ldots,\phi_{N-1}^{*})}\]
where \(*\) represents a fixed value in the domain of \(\Phi\). Example on slides 21-27.
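A quick numerical check of this identity for \(N=2\), using a bivariate normal whose joint density is known in closed form; the correlation and the values of \(\Phi\) and \(\Phi^{*}\) below are arbitrary toy choices.

```r
# Verify Brook's lemma for N = 2 with a standard bivariate normal, correlation r.
r <- 0.6
djoint <- function(x, y)                     # joint density of (phi_1, phi_2)
  exp(-(x^2 - 2 * r * x * y + y^2) / (2 * (1 - r^2))) / (2 * pi * sqrt(1 - r^2))
dcond <- function(x, given)                  # phi_i | phi_j ~ N(r * phi_j, 1 - r^2)
  dnorm(x, mean = r * given, sd = sqrt(1 - r^2))
phi      <- c(0.3, -1.2)                     # (phi_1, phi_2)
phi_star <- c(1.0,  0.4)                     # fixed reference point (phi_1*, phi_2*)
lhs <- djoint(phi[1], phi[2]) / djoint(phi_star[1], phi_star[2])
rhs <- dcond(phi[1], phi[2])      / dcond(phi_star[1], phi[2]) *
       dcond(phi[2], phi_star[1]) / dcond(phi_star[2], phi_star[1])
all.equal(lhs, rhs)   # TRUE: the ratio of joints is recovered from the full conditionals
```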
Markov random fields are undirected graphical models composed of:

- Vertex: a random variable.
- Edge: a dependency between random variables.
- Clique: a set of random variables in which each pair is connected by an edge.
- Potential: a function of \(k\) arguments that is exchangeable in its arguments, defined over cliques. For \(k=2\), e.g. \((Y_{i}-Y_{j})^{2}\).
Figure 1: “Markov Random Field Shared Borders and Conditional Dependence”
Hammersley-Clifford (1971) shows how cliques and potentials can be used to determine the joint distribution.
Besag (1974) and Clifford (1990) show that if there is an MRF, it takes the form of a Gibbs distribution.
Gibbs distribution: \(p(\phi_{1},\ldots,\phi_{N})\) depends on the \(\phi_{i}\) only through potentials on cliques. With \(k=1\) there is no spatial structure, so we choose \(k=2\):
\[exp \left(-\frac{1}{2}\sum_{i \sim j}(\phi_{i}-\phi_{j})^{2} \right)\]
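In R, this pairwise-difference log-density (up to an additive constant) can be written directly from an edge list; a minimal sketch reusing the toy node1/node2 vectors from the four-county example above.

```r
# Log of the pairwise-difference prior, up to a constant.
icar_logpdf <- function(phi, node1, node2) {
  -0.5 * sum((phi[node1] - phi[node2])^2)   # sum over neighboring pairs i ~ j
}
phi <- rnorm(4)                              # toy spatial effects for 4 counties
icar_logpdf(phi, node1 = c(1, 1, 2, 3), node2 = c(2, 3, 3, 4))
```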
STAN Function via Mitzi Morris:
\[\sigma\times \left(\sqrt{\frac{\rho}{s}}\times\phi+\sqrt{1-\rho}\times\theta \right)\]

- \(\theta_{i}\sim N(0,1)\): random effect specific to county \(i\).
- \(s\): a scaling factor which ensures \(var(\phi_{i})\approx 1\).
- \(\rho\): the mixing proportion between the spatial and county-specific effects.
- \(\sigma\): the overall standard deviation of the combined effect.
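A minimal R sketch of this combined effect; the numerical values of \(\phi\), \(\theta\), \(\rho\), \(\sigma\), and \(s\) below are placeholders, since in the actual model they are parameters estimated by Stan.

```r
set.seed(1)
N     <- 58                     # California counties
theta <- rnorm(N)               # unstructured county-specific effects, N(0, 1)
phi   <- rnorm(N)               # spatial effects; a plain normal draw as a stand-in here
s     <- 1                      # scaling factor so that var(phi_i) is approximately 1
rho   <- 0.5                    # mixing proportion between spatial and specific effects
sigma <- 0.3                    # overall standard deviation of the combined effect
combined <- sigma * (sqrt(rho / s) * phi + sqrt(1 - rho) * theta)
```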
| | \(\beta_{Pov}\) | \(\beta_{Inc}\) | \(\beta_{Dens}\) | \(\beta_{gini}\) | \(\beta_{white}\) | \(\beta_{age}\) | \(\beta_{sex}\) |
|---|---|---|---|---|---|---|---|
Min | -0.054 | -0.328 | 0.025 | -0.152 | 0.36 | -0.023 | -0.258 |
2.5th Perc. | 0.017 | -0.248 | 0.074 | -0.097 | 0.677 | 0.044 | -0.181 |
Med. | 0.089 | -0.161 | 0.112 | -0.041 | 0.985 | 0.122 | -0.113 |
Mean | 0.089 | -0.162 | 0.112 | -0.042 | 0.984 | 0.122 | -0.114 |
97.5th Perc. | 0.162 | -0.075 | 0.149 | 0.013 | 1.289 | 0.201 | -0.049 |
Max | 0.231 | 0.009 | 0.184 | 0.068 | 1.678 | 0.278 | 0.014 |
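Because the model is log-linear (see the linear equation above), these coefficients act multiplicatively on the expected mortality count. A small sketch converting the posterior medians from the table into rate ratios; the covariate scaling is whatever was used when fitting, which is not restated here.

```r
# Posterior medians from the table, turned into multiplicative effects per covariate unit.
med <- c(Pov = 0.089, Inc = -0.161, Dens = 0.112, gini = -0.041,
         white = 0.985, age = 0.122, sex = -0.113)
round(exp(med), 2)
```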
Define \(z_{t,n} \in \{1,\ldots,K\}\) as the indicator which determines the parameters of the process for \(Y_{t,n}\).
\(z_{t}=k\): the regime \(k\) is hidden but can be inferred from the observed data. We call \(z_{t}\) the regime indicator.
If the history of \(z_{t}\) gave no information about it, we would have a mixture distribution: \[P(Y_{t}|\lambda_{t})= \sum\limits_{k=1}^{K} P(z_{t}=k)\times P(Y_{t}|\lambda_{t,k})\]
We suggest that there is some information in the regimes’ history.
However, \(P(z_{t}|z_{t-1},\ldots,z_{1})=P(z_{t}|z_{t-1})\); these are the transition probabilities. Furthermore, these probabilities are time invariant: \(P(z_{t}=k|z_{t-1}=l)=A_{l,k}\).
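As a toy illustration of a time-invariant transition matrix and the regime paths it generates; the probabilities below are made up, not the estimates reported later.

```r
# A[l, k] = P(z_t = k | z_{t-1} = l); two regimes, illustrative values.
A <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)
T <- 20                     # biweeks
z <- integer(T); z[1] <- 1  # start in regime 1
for (t in 2:T) z[t] <- sample(1:2, size = 1, prob = A[z[t - 1], ])
z                           # a simulated regime path
```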
Example with \(T=2\), a single \(Y\), ignoring priors besides the one for \(z_{1}\): \[ P(z_{1}=k)\times P(z_{2}=l|z_{1}=k)\times P(Y_{1}|\lambda_{1,k})\times P(Y_{2}|\lambda_{2,l})\] \[=P(z_{1}=k,z_{2}=l)\times P(Y_{1},Y_{2}|\lambda_{1,k},\lambda_{2,l})\] \[=P(z_{1}=k,z_{2}=l,Y_{1},Y_{2})\]
Example with general \(T\): \[ P(z_{1})\prod\limits_{t=2}^{T} P(z_{t}|z_{t-1})\times \prod\limits_{t=1}^{T}P(Y_{t}|\lambda_{t},z_{t}) \] \[=P(z_{1}=k,\ldots,z_{T}=l,Y_{1},\ldots,Y_{T}) \]
```stan
parameters {
  ordered[K] mu[N];      // regime-specific baselines, ordered for identifiability
  simplex[K] pi1[N];     // initial regime probabilities per county
  simplex[K] A[K, N];    // A[l, n, k] = P(z_t = k | z_{t-1} = l) in county n
}
transformed parameters {
  matrix[K, N] lambda[T];   // Poisson rates (filled in from the linear predictor, not shown)
  real logalpha[T, K, N];   // forward-algorithm log probabilities
  real accumulator[K];
  // t = 1: initialize with the initial regime distribution
  for (k in 1:K) for (n in 1:N)
    logalpha[1, k, n] = log(pi1[n, k]) + poisson_lpmf(y[1, n] | lambda[1, k, n]);
  // t > 1: forward recursion over the previous regime (written out for K = 2)
  for (t in 2:T) for (j in 1:K) for (n in 1:N) {
    accumulator[1] = logalpha[t-1, 1, n] + log(A[1, n, j]) + poisson_lpmf(y[t, n] | lambda[t, j, n]);
    accumulator[2] = logalpha[t-1, 2, n] + log(A[2, n, j]) + poisson_lpmf(y[t, n] | lambda[t, j, n]);
    logalpha[t, j, n] = log_sum_exp(to_vector(accumulator));
  }
}
```
Change in Base Risk
\(exp(\mu_1)=exp(-0.737)=0.478\), \(exp(\mu_2)=exp(0.148)=1.16\)
\(P(z_{t}=1|z_{t-1}=1)=0.933\), \(P(z_{t}=2|z_{t-1}=1)=0.067\),
\(P(z_{t}=1|z_{t-1}=2)=0.066\), \(P(z_{t}=2|z_{t-1}=2)=0.934\)
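One way to read these estimates, assuming the time-homogeneous chain above: the stay in a regime is geometric, so the expected duration is
\[E[\text{duration in regime } k]=\frac{1}{1-P(z_{t}=k\mid z_{t-1}=k)}, \qquad \frac{1}{1-0.933}\approx 15 \text{ biweeks}, \quad \frac{1}{1-0.934}\approx 15 \text{ biweeks}.\]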
Further work
Questions?