First, let’s load the packages we’ll sue
require(rjags)
require(Rgraphviz)
require(data.table)
Create some simulated data:
set.seed(432104)
OK. So let’s define a simple model is the basis for our simulated data and our probabilistic model of inference. First we’ll assume that there is in fact a latent ability attribute, \(\theta_i\), for each student \(i \in {1,\dots,.n}\) in our sample. Further, we’ll assume that this latent trait is accurately measured by an unobserved RIT score, which we’ll denote \(r_{i}\). In other words, RIT scores are a direct measure of \(\theta_i\). Luckily for us, students take a test that gives us an estimate of RIT scores, the MAP. So we assume that the observed RIT, \(\hat{r}\) is a realization from a distribution centered on \(r\) Finally, we assume that all other academic measures (interim assessments, Khan objectives, individual test items, even Do Nows/Exit Tickets). So we have observed realization of a RIT score for a student as well as all the other academic indicators. All other RVs and parameters need to be estimated. And our goal here is to make inferences about \(r_i\) given our observation of other academic indicators.
Here’s a graphical representation of our probability model:
rit.graph <- new("graphNEL",
nodes=c("theta[i]",
"r[i]",
"rhat[i]",
"a[i]",
"k[i]",
"t[i]",
"ahat[i]",
"khat[i]",
"that[i]"
),
edgemode="directed")
rit.graph <- addEdge("theta[i]", "r[i]", rit.graph, 1)
rit.graph <- addEdge("r[i]", "rhat[i]" , rit.graph, 1)
rit.graph <- addEdge("rhat[i]", "a[i]" , rit.graph, 1)
rit.graph <- addEdge("rhat[i]", "k[i]" , rit.graph, 1)
rit.graph <- addEdge("rhat[i]", "t[i]" , rit.graph, 1)
rit.graph <- addEdge("a[i]", "ahat[i]" , rit.graph, 1)
rit.graph <- addEdge("k[i]", "khat[i]" , rit.graph, 1)
rit.graph <- addEdge("t[i]", "that[i]" , rit.graph, 1)
plot(rit.graph)
Now, let’s simulate some data based on this basic probability model. The \(\theta_i\)’s need a prior and I’ll use the school level norms (because I’m on a plane and don’t have the student level norms with me; we can change these later) for fall 5th grade math. Thinking about this, we can collapse the \(\theta_i\)s and \(r_i\), since we are assuming theta exists on the the RIT scale. I’ll use \(\theta_i\) to denote the RIT-scaled latent variable.
set.seed(432104)
n <- 90 # number of students
r<-rnorm(n = n, mean = 211.39, sd =5.83)
The \(\theta_i\)s are then noisily measured by the MAP test. I’ll assume a standard deviation of about 3, but that is completely arbitrary.
rhat<-rnorm(n, r, sd=rep(3, times=n))
head(rhat)
## [1] 211.9 226.3 195.8 217.3 201.2 227.7
OK. let’s make some assumptions about how \(\hat{r}\) is related to each of the three realizations of our academic indicators. Let’s start with some arbitrary assessment that is graded on a zero to one scale (or zero to 100%) a realization of which is denoted \(\hat{a}_i\). We’ll keep it simple and assume that the \(\hat{r}_i\) are linear predictors of \(z.a_{i}\) in the inverse-logistic function. That is,
\[ \begin{aligned} z.a_{i} & = \alpha + \beta \cdot \hat{r}_i \\ \hat{a}_{i} & = \frac{1}{1+\exp^{-z.a{i}}} \end{aligned} \]
Assuming \(\alpha = -1\) and \(\beta = 0.01\), we simulating the data like so:
alpha<- -1
beta<-.01
z.a<- alpha + beta*rhat
head(z.a)
## [1] 1.119 1.263 0.958 1.173 1.012 1.277
a<-1/(1+exp(-z.a))
head(a)
## [1] 0.7538 0.7796 0.7227 0.7637 0.7333 0.7820
ahat <- rnorm(n=n, mean=a,sd=.025)
head(ahat)
## [1] 0.7420 0.7609 0.7291 0.7966 0.6997 0.7730
Let’s pretend, for now, that this the only non-MAP assessment that we have. Consequently, all we have observed are the \(\hat{a}_i\) and the current RIT scores \(\hat{r}_i\). Here’s the simplified model in JAGS/BUGS syntax:
model.jags<-"
model {
for(i in 1:n) {
r[i] ~ dnorm(211.39, 1/5.39)
rhat[i] ~ dnorm(r[i],tau)
z[i] <- alpha + beta*rhat[i]
a[i] <- 1/(1 + exp(-z[i]))
ahat[i] ~ dnorm(a[i], tau.a)
}
# priors
beta ~ dnorm(0,.0001)
alpha ~ dnorm(0, .0001)
tau ~ dgamma(1e-3,pow(15,2)*1e-3)
tau.a ~ dgamma(1e-3,pow(15,2)*1e-3)
sigma.a <- pow(1/tau,1/2)
}
"
model.spec<-textConnection(model.jags)
Now run the mode and take a look at \(\alpha\) and \(\beta\) and see that they are relative the estimated values are relatively close to the acutal values (-1 and 0.01, respectivly):
jags <- jags.model(model.spec,
data = list('ahat' = ahat,
'rhat' = rhat,
'n' = n),
n.chains=4,
n.adapt=100)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 829
##
## Initializing model
samps.jags<-jags.samples(jags,
c('beta', 'alpha'),
10000)
samps.jags
## $alpha
## mcarray:
## [1] -0.1204
##
## Marginalizing over: iteration(10000),chain(4)
##
## $beta
## mcarray:
## [1] 3.691
##
## Marginalizing over: iteration(10000),chain(4)
No let’s look at the estimated \(a_i\)’s and \(r_i\)’s (the latter being our variable of interest)
samps.coda <- coda.samples(jags,
c('a', 'r', 'beta', 'alpha'),
10000)
summary.coda<-summary(samps.coda)
plot(samps.coda)
Let’s extract the estimated \(r_i\)’s from the coda samples
r.estimated<-as.vector(summary.coda$statistics[93:nrow(summary.coda$statistics),"Mean"])
rs<-data.table(Actual=r, Estimated=r.estimated)
rs[,Residual:=Estimated-Actual]
## Actual Estimated Residual
## 1: 210.5 211.5 0.9988
## 2: 224.7 212.6 -12.0525
## 3: 201.0 210.1 9.0722
## 4: 215.7 211.9 -3.7922
## 5: 203.4 210.6 7.1646
## 6: 229.2 212.7 -16.4674
## 7: 222.6 212.7 -9.9184
## 8: 221.7 212.1 -9.6718
## 9: 222.5 212.2 -10.2730
## 10: 202.0 210.4 8.4501
## 11: 193.5 210.0 16.4241
## 12: 213.0 211.5 -1.4872
## 13: 214.2 211.8 -2.4614
## 14: 213.3 211.4 -1.8927
## 15: 213.1 211.4 -1.6952
## 16: 206.4 211.2 4.7035
## 17: 229.3 213.2 -16.0826
## 18: 209.6 211.0 1.4385
## 19: 216.7 211.9 -4.7118
## 20: 204.7 211.2 6.4459
## 21: 209.2 211.5 2.2283
## 22: 208.2 211.1 2.8972
## 23: 212.6 211.5 -1.0663
## 24: 209.5 211.2 1.7171
## 25: 209.8 211.1 1.2855
## 26: 207.6 211.0 3.4100
## 27: 215.0 212.2 -2.7841
## 28: 208.1 211.0 2.8786
## 29: 208.1 210.8 2.7024
## 30: 208.0 211.1 3.0994
## 31: 207.2 210.9 3.7604
## 32: 215.4 211.8 -3.6098
## 33: 213.8 211.4 -2.3702
## 34: 215.1 212.0 -3.1475
## 35: 215.3 212.4 -2.8601
## 36: 223.7 212.1 -11.5720
## 37: 211.8 211.5 -0.2844
## 38: 212.0 211.7 -0.2179
## 39: 218.1 211.7 -6.4019
## 40: 224.3 212.3 -11.9642
## 41: 211.3 210.9 -0.4545
## 42: 216.6 212.1 -4.5519
## 43: 210.6 211.3 0.7114
## 44: 204.2 210.5 6.2880
## 45: 205.0 210.8 5.7614
## 46: 200.4 210.5 10.0298
## 47: 209.8 211.5 1.6949
## 48: 213.0 211.6 -1.3659
## 49: 218.3 211.9 -6.3417
## 50: 207.9 210.7 2.8495
## 51: 215.0 211.3 -3.6801
## 52: 217.8 212.0 -5.7443
## 53: 211.7 211.6 -0.1424
## 54: 203.4 210.5 7.1612
## 55: 213.2 211.2 -2.0624
## 56: 215.3 211.7 -3.5414
## 57: 215.0 211.4 -3.6403
## 58: 210.6 211.3 0.7520
## 59: 204.3 211.0 6.6555
## 60: 199.7 210.4 10.6870
## 61: 216.5 212.0 -4.4865
## 62: 213.5 211.5 -2.0410
## 63: 216.8 211.9 -4.9349
## 64: 207.0 211.1 4.1449
## 65: 198.1 210.1 11.9984
## 66: 218.8 212.3 -6.4710
## 67: 214.9 212.1 -2.7952
## 68: 209.8 212.0 2.1165
## 69: 214.5 212.1 -2.3816
## 70: 213.6 211.5 -2.0547
## 71: 202.1 210.2 8.0917
## 72: 215.9 211.8 -4.0364
## 73: 206.5 210.7 4.2456
## 74: 215.4 212.0 -3.3714
## 75: 212.4 211.5 -0.9000
## 76: 224.3 212.4 -11.9358
## 77: 211.4 211.3 -0.1713
## 78: 216.2 211.7 -4.4368
## 79: 226.6 212.7 -13.8953
## 80: 215.2 211.6 -3.6171
## 81: 206.2 210.8 4.5578
## 82: 215.6 211.9 -3.6425
## 83: 221.2 212.4 -8.8572
## 84: 215.6 212.0 -3.6298
## 85: 210.8 211.6 0.7580
## 86: 205.4 210.7 5.3451
## 87: 211.9 211.4 -0.4912
## 88: 207.1 211.8 4.7080
## 89: 207.9 210.9 2.9683
## 90: 210.0 211.7 1.7122
## Actual Estimated Residual
rs
## Actual Estimated Residual
## 1: 210.5 211.5 0.9988
## 2: 224.7 212.6 -12.0525
## 3: 201.0 210.1 9.0722
## 4: 215.7 211.9 -3.7922
## 5: 203.4 210.6 7.1646
## 6: 229.2 212.7 -16.4674
## 7: 222.6 212.7 -9.9184
## 8: 221.7 212.1 -9.6718
## 9: 222.5 212.2 -10.2730
## 10: 202.0 210.4 8.4501
## 11: 193.5 210.0 16.4241
## 12: 213.0 211.5 -1.4872
## 13: 214.2 211.8 -2.4614
## 14: 213.3 211.4 -1.8927
## 15: 213.1 211.4 -1.6952
## 16: 206.4 211.2 4.7035
## 17: 229.3 213.2 -16.0826
## 18: 209.6 211.0 1.4385
## 19: 216.7 211.9 -4.7118
## 20: 204.7 211.2 6.4459
## 21: 209.2 211.5 2.2283
## 22: 208.2 211.1 2.8972
## 23: 212.6 211.5 -1.0663
## 24: 209.5 211.2 1.7171
## 25: 209.8 211.1 1.2855
## 26: 207.6 211.0 3.4100
## 27: 215.0 212.2 -2.7841
## 28: 208.1 211.0 2.8786
## 29: 208.1 210.8 2.7024
## 30: 208.0 211.1 3.0994
## 31: 207.2 210.9 3.7604
## 32: 215.4 211.8 -3.6098
## 33: 213.8 211.4 -2.3702
## 34: 215.1 212.0 -3.1475
## 35: 215.3 212.4 -2.8601
## 36: 223.7 212.1 -11.5720
## 37: 211.8 211.5 -0.2844
## 38: 212.0 211.7 -0.2179
## 39: 218.1 211.7 -6.4019
## 40: 224.3 212.3 -11.9642
## 41: 211.3 210.9 -0.4545
## 42: 216.6 212.1 -4.5519
## 43: 210.6 211.3 0.7114
## 44: 204.2 210.5 6.2880
## 45: 205.0 210.8 5.7614
## 46: 200.4 210.5 10.0298
## 47: 209.8 211.5 1.6949
## 48: 213.0 211.6 -1.3659
## 49: 218.3 211.9 -6.3417
## 50: 207.9 210.7 2.8495
## 51: 215.0 211.3 -3.6801
## 52: 217.8 212.0 -5.7443
## 53: 211.7 211.6 -0.1424
## 54: 203.4 210.5 7.1612
## 55: 213.2 211.2 -2.0624
## 56: 215.3 211.7 -3.5414
## 57: 215.0 211.4 -3.6403
## 58: 210.6 211.3 0.7520
## 59: 204.3 211.0 6.6555
## 60: 199.7 210.4 10.6870
## 61: 216.5 212.0 -4.4865
## 62: 213.5 211.5 -2.0410
## 63: 216.8 211.9 -4.9349
## 64: 207.0 211.1 4.1449
## 65: 198.1 210.1 11.9984
## 66: 218.8 212.3 -6.4710
## 67: 214.9 212.1 -2.7952
## 68: 209.8 212.0 2.1165
## 69: 214.5 212.1 -2.3816
## 70: 213.6 211.5 -2.0547
## 71: 202.1 210.2 8.0917
## 72: 215.9 211.8 -4.0364
## 73: 206.5 210.7 4.2456
## 74: 215.4 212.0 -3.3714
## 75: 212.4 211.5 -0.9000
## 76: 224.3 212.4 -11.9358
## 77: 211.4 211.3 -0.1713
## 78: 216.2 211.7 -4.4368
## 79: 226.6 212.7 -13.8953
## 80: 215.2 211.6 -3.6171
## 81: 206.2 210.8 4.5578
## 82: 215.6 211.9 -3.6425
## 83: 221.2 212.4 -8.8572
## 84: 215.6 212.0 -3.6298
## 85: 210.8 211.6 0.7580
## 86: 205.4 210.7 5.3451
## 87: 211.9 211.4 -0.4912
## 88: 207.1 211.8 4.7080
## 89: 207.9 210.9 2.9683
## 90: 210.0 211.7 1.7122
## Actual Estimated Residual
Nicley, the mean residual is -0.7838 and the mean squared residual is 38.6224. Not too shabby!