Matthew McBee
November 2019
Slides here! http://rpubs.com/mmcbee/NAGCgiftedCalcs
To follow along with this session live, you will need:
- R installed

To install giftedCalcs, run the following code in the R console.
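A minimal sketch of the installation step, assuming giftedCalcs is installed from GitHub (the repository name mcbeem/giftedCalcs is an assumption, not stated on this slide):

# install the remotes package if needed, then giftedCalcs from GitHub
# (the repository name mcbeem/giftedCalcs is assumed here)
install.packages("remotes")
remotes::install_github("mcbeem/giftedCalcs")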
For some examples in this talk, you will also need the ggplot2 and mvtnorm packages.
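Both are available from CRAN:

# install the plotting and multivariate-normal packages from CRAN
install.packages(c("ggplot2", "mvtnorm"))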
\[ \begin{split} \text{sensitivity} = p(\text{identified }|\text{ gifted})=\frac{p(\text{gifted, identified})}{p(\text{gifted})} = \\ \frac{\int_{-\infty}^{\infty} \int_{\kappa}^{\infty} \int_{\tau}^{\infty} \int_{\tau}^{\infty} \ N_4({\mu}, {\Sigma}) \ d_{C_o} d_{C_t} d_{N_o} d_{N_t} }{\int_{\tau}^{\infty} \ N(\mu,\sigma) \ d_{C_t}} \end{split} \]
where \(C_t\) is the confirmatory test true score, \(C_o\) is the confirmatory test observed score, \(N_t\) is the nomination true score, \(N_o\) is the nomination observed score, \(\tau\) is the confirmatory test cutoff, \(\kappa\) is the nomination cutoff, \(\mu\) is a vector of zeros, and…
\[ \begin{split} \mathbf{\Sigma}(N_X, N_T, C_X, C_T) = \\[3ex] \begin{bmatrix} 1 & \sqrt{\rho_{NN}} & r_{C_XN_X} & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{CC}}} \\[3ex] \sqrt{\rho_{NN}} & 1 & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{NN}}} & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{NN}\rho_{CC}}} \\[3ex] r_{C_XN_X} & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{NN}}} & 1 & \sqrt{\rho_{CC}} \\[3ex] \dfrac{r_{C_XN_X}}{\sqrt{\rho_{CC}}} & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{NN}\rho_{CC}}} & \sqrt{\rho_{CC}} & 1 \end{bmatrix} \end{split} \]
These equations were presented in McBee, Peters, & Miller (2016) and were used to generate the tables and figures in the article.
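For instance, evaluating the sensitivity expression directly requires numerical integration of the multivariate normal, as in this minimal sketch using mvtnorm::pmvnorm. The reliability, correlation, and cutoff values below are illustrative assumptions, and no claim is made about how they map onto the giftedCalcs arguments used later.

library(mvtnorm)

# illustrative (assumed) parameter values
rho_NN <- 0.81         # nomination reliability
rho_CC <- 0.95         # confirmatory test reliability
r      <- 0.50         # observed correlation r_{C_X N_X}
tau    <- qnorm(0.90)  # confirmatory test cutoff (90th percentile)
kappa  <- qnorm(0.90)  # nomination cutoff (90th percentile)

# covariance matrix in the order (N_X, N_T, C_X, C_T), as defined above
Sigma <- matrix(c(
  1,              sqrt(rho_NN),          r,              r/sqrt(rho_CC),
  sqrt(rho_NN),   1,                     r/sqrt(rho_NN), r/sqrt(rho_NN*rho_CC),
  r,              r/sqrt(rho_NN),        1,              sqrt(rho_CC),
  r/sqrt(rho_CC), r/sqrt(rho_NN*rho_CC), sqrt(rho_CC),   1), 4, 4, byrow=TRUE)

# numerator: p(gifted, identified) = P(N_X >= kappa, C_X >= tau, C_T >= tau)
num <- pmvnorm(lower=c(kappa, -Inf, tau, tau), upper=rep(Inf, 4),
               mean=rep(0, 4), sigma=Sigma)
# denominator: p(gifted) = P(C_T >= tau)
sensitivity <- as.numeric(num) / (1 - pnorm(tau))
sensitivity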
However, an easier way to do these calculations is needed.
The giftedCalcs package makes this very easy and also contains many other functions for understanding, exploring, and improving the gifted identification process.
These functions can help you understand which type of students will have the best and worst chance of being identified. They can even help you evaluate the performance of an identification policy.
Click on index at the bottom of the page to see all the functions in the package.
Type ?functionname at the console to see the help page for any function.

marginal_psychometrics()

The marginal_psychometrics() function calculates:

- sensitivity
- the incorrect identification rate (IIR)
- the nomination rate
- the nomination pass rate
- the overall identification rate
These equations come from McBee, Peters, & Miller (2016).
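For example, the following call summarizes one identification system (the parameter values are illustrative; the argument names match those used later in these slides):

library(giftedCalcs)

# summarize a system with test reliability .9, test and nomination
# cutoffs at the 90th percentile, and nomination validity .5
marginal_psychometrics(relyt=.9, test.cutoff=.9, nom.cutoff=.9, valid=.5)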
conditional_moments()

The user specifies one of the scores in the model (for example, the confirmatory test true score, t.true, as in the code below).

The first and second moments (i.e., the conditional mean vector and covariance matrix) of the joint multivariate normal distribution of the other two variables are then calculated.
## $conditional.mean
## [,1]
## n.obs 0.9486833
## t.obs 1.4230249
##
## $conditional.cov
## n.obs t.obs
## n.obs 0.6 0.0
## t.obs 0.0 0.1
# get the mean vector and covariance matrix of the nomination observed
# score and confirmatory test observed score for a student whose true
# ability is z=1.5
moments <- conditional_moments(t.true=1.5, relyt=.9, valid=.6)
# create an object 'pts' which contains many possible values of the nomination
# observed score and the confirmatory test observed score.
# We will calculate the density of those values for the student
n.obs <- seq(-1, 3, .01)
t.obs <- seq(0, 3, .01)
pts <- expand.grid(n.obs, t.obs)
names(pts) <- c("n.obs", "t.obs")
# calculate the density of each point using mvtnorm::dmvnorm
library(mvtnorm)
pts$density <- dmvnorm(pts, mean=moments$conditional.mean, sigma=moments$conditional.cov)
# make a heat map to visualize the density
library(ggplot2)
ggplot(data=pts, aes(x=n.obs, y=t.obs, fill=density)) +
  geom_raster() +
  scale_fill_viridis_c(option="D") +
  theme_classic()

conditional_p_id()

Calculates the probability of identification for a student with a particular true score.
## [1] 0.3843284
The conditional_p_id() function can be used to calculate a conditional identification curve.
# create vector of true scores
Tscores <- seq(0,3, length.out=100)
# calculate the identification probability for each
p.id <- conditional_p_id(x=Tscores, relyt=.9,
test.cutoff=.9, nom.cutoff=.9, valid=.5)
# make a plot
plot(x=Tscores, y=p.id, type="l", xlab="true score",
ylab="p identified", ylim=c(0,1))
# add a reference line for the test cutoff
abline(v=qnorm(.9), col="red")

The following functions allow you to work with the statistical distribution of scores for identified students:
- d_identified(): density
- p_identified(): cumulative density
- q_identified(): quantile function
- r_identified(): random samples

d_identified()

The density function can be plotted to visualize the probability density of true scores for identified students.
# create vector of true scores
Tscores <- seq(0,4, length.out=200)
# compute the density of true scores under this identification system
pdf <- d_identified(x=Tscores, relyt=.9,
test.cutoff=.9, nom.cutoff=.9, valid=.5)
plot(x=Tscores, y=pdf, type="l", col="blue",
main="Distribution of true scores")
abline(v=qnorm(.9), col="red")

It can also be used to visualize the distribution of observed scores. Many of the functions work this way. This is triggered by not providing a value for the relyt= argument.
# create vector of observed scores (reusing the Tscores object name)
Tscores <- seq(0,4, length.out=500)
# compute the density for each observed score
# under the given identification policy
pdf <- sapply(Tscores, d_identified, test.cutoff=.9,
nom.cutoff=.9, valid=.5)
plot(x=Tscores, y=pdf, type="l", col="blue",
main="Distribution of observed scores")
abline(v=qnorm(.9), col="red")

p_identified()

This is the cumulative density function for identified students.
Given a true (or observed) score, it returns the percentile.
Example: What percentile would a student with a true score of z=1.5 be at relative to identified students, under an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5?
## [1] 0.2617256
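The call that produced this value is not shown here. Assuming the score is passed as the first argument and the remaining arguments match the other functions in the package, it would look something like this:

# hypothetical reconstruction of the call (argument placement is assumed)
p_identified(1.5, relyt=.95, test.cutoff=.9, nom.cutoff=.9, valid=.5)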
q_identified()

This is the quantile function for identified students.
Given a percentile, it returns the associated true (or observed) score for identified students.
Example: What score would be at the 80th percentile for identified students under an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5?
## [1] 2.243256
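Again assuming the percentile is passed as the first argument, the call would look something like this:

# hypothetical reconstruction of the call (argument placement is assumed)
q_identified(.8, relyt=.95, test.cutoff=.9, nom.cutoff=.9, valid=.5)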
r_identified()

This samples random values from the distribution of true or observed scores of identified students.
This is useful for simulation purposes.
Example: Generate 10 random true scores for identified students under an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5.
## [1] 2.382062 1.742668 1.371333 1.769608 1.755212 1.898373 2.933656
## [8] 1.502240 2.396970 1.428083
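Assuming the number of draws is passed as the first argument, the call would look something like the following. Because the draws are random, repeated runs give different values unless a seed is set with set.seed().

# hypothetical reconstruction of the call (argument placement is assumed)
r_identified(10, relyt=.95, test.cutoff=.9, nom.cutoff=.9, valid=.5)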
mean_identified(): mean or expected value for identified students

Example: What is the mean true score for identified students in an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5?
## [1] 1.850697
sd_identified(): standard deviation for identified students

Example: What is the standard deviation of true scores for identified students in an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5?
## [1] 0.5055684
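Assuming mean_identified() and sd_identified() take the same system arguments as the other functions in the package, these two examples correspond to calls like:

# hypothetical reconstructions of the calls (argument names are assumed
# to match the rest of the package)
mean_identified(relyt=.95, test.cutoff=.9, nom.cutoff=.9, valid=.5)
sd_identified(relyt=.95, test.cutoff=.9, nom.cutoff=.9, valid=.5)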
estimate_valid()

Infer identification system nomination validity and performance from observed data.
## [1] 0.03240091
## [1] 0.1
## Warning in estimate_performance(x = data, id.rate = 0.032, nom.rate =
## 0.1, : A minimum of 500 reps is suggested for trustworthy standard errors
## and confidence intervals.
## $summary
## Estimate StdErr CI.95.lower CI.95.upper
## valid 0.50628 0.01358 0.47966 0.53289
## sensitivity 0.32914 0.00848 0.31252 0.34576
## nom.passrate 0.32516 0.00838 0.30874 0.34158
## test.cutoff 0.90121 NA NA NA
## nom.cutoff 0.90000 NA NA NA
##
## $note
## [1] "Since the test reliability was not specified, the sensitivity value is interpreted as relative to universal screening. Achieved sensitivity is dependent on the test reliability."
## Warning in estimate_performance(x = data, id.rate = 0.032, nom.rate =
## 0.1, : A minimum of 500 reps is suggested for trustworthy standard errors
## and confidence intervals.
## $summary
## Estimate StdErr CI.95.lower CI.95.upper
## valid 0.50688 0.01310 0.48121 0.53255
## sensitivity 0.32951 0.00822 0.31341 0.34561
## nom.passrate 0.32552 0.00812 0.30962 0.34143
## test.cutoff 0.90121 NA NA NA
## nom.cutoff 0.90000 NA NA NA
##
## $note
## [1] "Since the test reliability was not specified, the sensitivity value is interpreted as relative to universal screening. Achieved sensitivity is dependent on the test reliability."
Optimal identification is a method for designing an identification process for maximal psychometric performance.
Functions:
- reliability_mean()
- cor_mean()
- var_mean()
- shrinkage_mean()

reliability_mean(): calculate the reliability of the mean (composite) of several assessments

Imagine you wish to identify on the basis of three tests. There are three things you need to know:

- the correlations among the assessments
- the reliability of each assessment
- the weight given to each assessment
Let’s assume that your three assessments have the following correlation matrix:
\[ r = \begin{bmatrix} 1 & .4 & .7 \\ .4 & 1 & .5\\ .7 & .5 & 1 \\ \end{bmatrix} \]
And that the reliability coefficients for each assessment are
\[ rely= \begin{bmatrix} .90 & .85 & .88 \end{bmatrix} \]
And lastly, that you want the assessments to be equally weighted.
# create the correlation matrix and assign it to object r
r <- matrix(c( 1, .4, .7,
.4, 1, .5,
.7, .5, 1), 3,3, byrow=TRUE)
# calculate the reliability of the mean
reliability_mean(rely=c(.9, .85, .88), r=r, w=c(1, 1, 1))

## [1] 0.9403226
cor_mean(): the correlation of each assessment with the mean of the assessments

Imagine we have the same three assessments (with the same correlation matrix). We need to decide which one to use for nomination.
# create the correlation matrix and assign it to object r
r <- matrix(c( 1, .4, .7,
.4, 1, .5,
.7, .5, 1), 3,3, byrow=TRUE)
# calculate the correlation of each assessment with the mean
cor_mean(r=r, w=c(1, 1, 1))

## [1] 0.8433803 0.7630584 0.8835413
These numbers can be interpreted as the nomination validity coefficients that would result if each assessment was used for nomination. Here we can see that the third assessment would have the highest validity coefficient.
On the basis of what we learned in the last two slides, we can use the marginal_psychometrics() function to calculate the identification system performance we could expect.
We still must choose a nomination cutoff. Let's start with an 80th-percentile cutoff.
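The call that produces the output below is not shown on the slide. Based on the composite reliability and validity computed above, it is presumably of the following form (treat the exact call as a reconstruction):

# composite reliability and nomination validity from the previous steps,
# 90th-percentile test cutoff, 80th-percentile nomination cutoff
marginal_psychometrics(relyt=0.9403, test.cutoff=.9, nom.cutoff=.8, valid=0.8835)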
## $sensitivity
## [1] 0.7792443
##
## $IIR
## [1] 0.1268913
##
## $nom.rate
## [1] 0.2
##
## $nom.passrate
## [1] 0.4462445
##
## $identification.rate
## [1] 0.08924878
# make a vector of cutoffs
cutoffs <- c(.6, .7, .8, .9)
# calculate the system performance under each one using mapply()
mapply(nom.cutoff=cutoffs, marginal_psychometrics, MoreArgs=list(
  test.cutoff=.9, relyt=0.9403, valid=0.8835))

## [,1] [,2] [,3] [,4]
## sensitivity 0.8260376 0.8176508 0.7792443 0.618047
## IIR 0.1667907 0.1551835 0.1268913 0.06959183
## nom.rate 0.4 0.3 0.2 0.1
## nom.passrate 0.2478454 0.3226111 0.4462445 0.664273
## identification.rate 0.0991381 0.09678324 0.08924878 0.06642711
# make a vector of nomination cutoff percentiles
cutoffs <- seq(.01, .99, length.out=100)
# calculate the system performance under each one using mapply()
# select the first row of the resulting object, which contains the sensitivities
# using indexing [1,]
sensitivity <- mapply(nom.cutoff=cutoffs, marginal_psychometrics, MoreArgs=list(
test.cutoff=.9, relyt=0.9403, valid=0.8835))[1,]
# plot sensitivity versus nomination cutoff with a custom x-axis
plot(x=cutoffs, y=unlist(sensitivity), type='l',
xlab="nomination cutoff percentile", ylab="sensitivity", xaxt="n",
ylim=c(0,1))
axis(side=1, at=seq(0, 1, .05))