Getting started with giftedCalcs:

A hands-on demonstration

Matthew McBee

November 2019

Preliminaries

Slides here! http://rpubs.com/mmcbee/NAGCgiftedCalcs

To follow along with this session live, you will need:

Installation

To install giftedCalcs, run the following code in the R console.

install.packages("devtools")
devtools::install_github("mcbeem/giftedCalcs")

library(giftedCalcs)

Other packages

For some Examples in this talk, you will also need the ggplot2 and mvtnorm packages.

Background

\[ \begin{split} \text{sensitivity} = p(\text{identified }|\text{ gifted})=\frac{p(\text{gifted, identified})}{p(\text{gifted})} = \\ \frac{\int_{-\infty}^{\infty} \int_{\kappa}^{\infty} \int_{\tau}^{\infty} \int_{\tau}^{\infty} \ N_4({\mu}, {\Sigma}) \ d_{C_o} d_{C_t} d_{N_o} d_{N_t} }{\int_{\tau}^{\infty} \ N(\mu,\sigma) \ d_{C_t}} \end{split} \]

where \(C_t\) is the confirmatory test true score, \(C_o\) is the confirmatory test observed score, \(N_t\) is the nomination true score, \(N_o\) is the nomination observed score, \(\tau\) is the confirmatory test cutoff, \(\kappa\) is the nomination cutoff, \(\mu\) is a vector of zeros, and…

\[ \begin{split} \mathbf{\Sigma}(N_X, N_T, C_X, C_T) = \\[3ex] \begin{bmatrix} 1 & \sqrt{\rho_{NN}} & r_{C_XN_X} & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{CC}}} \\[3ex] \sqrt{\rho_{NN}} & 1 & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{NN}}} & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{NN}\rho_{CC}}} \\[3ex] r_{C_XN_X} & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{NN}}} & 1 & \sqrt{\rho_{CC}} \\[3ex] \dfrac{r_{C_XN_X}}{\sqrt{\rho_{CC}}} & \dfrac{r_{C_XN_X}}{\sqrt{\rho_{NN}\rho_{CC}}} & \sqrt{\rho_{CC}} & 1 \end{bmatrix} \end{split} \]

These equations were presented in McBee, Peters, & Miller (2016) and were used to generate the tables and figures in the article.

However, an easier way is needed to do these calculations.

The giftedCalcs package makes this very easy and also contains many other functions for understanding, exploring, and improving the gifted identification process.

These functions can help you understand which type of students will have the best and worst chance of being identified. They can even help you evaluate the performance of an identification policy.

Package description

?giftedCalcs

Click on index at the bottom of the page to see all the functions in the package.

Getting help

?marginal_psychometrics

Help file Example code

Examples continued

Examples continued

marginal_psychometrics()

The marginal_psychometrics() function calculates:

These equations come from McBee, Peters, & Miller (2016)

conditional_moments()

The user specifies one of:

The first and second moments (e.g., expected value and variance) of the joint multivariate normal distribution of the other two variables is calculated.

conditional_moments(t.true=1.5, relyt=.9, valid=.6)
## $conditional.mean
##            [,1]
## n.obs 0.9486833
## t.obs 1.4230249
## 
## $conditional.cov
##       n.obs t.obs
## n.obs   0.6   0.0
## t.obs   0.0   0.1

# get the mean vector and covariance matrix of the nomination observed
#  score and confirmatory test observed score for a student whose true 
#  ability is z=1.5
moments <- conditional_moments(t.true=1.5, relyt=.9, valid=.6)

# create an object 'pts' which contains many possible values of the nomination
#  observed score and the confirmatory test true score. 
# We will calculate the density of those values for the student
n.obs <- seq(-1, 3, .01)
t.obs <- seq(0, 3, .01)
pts <- expand.grid(n.obs, t.obs)
names(pts) <- c("n.obs", "t.obs")

# calculate the density of each point using mvtnorm::dmvnorm
pts$density <- dmvnorm(pts, mean=moments$conditional.mean, sigma=moments$conditional.cov)

# make a heat map to visualize the density
ggplot(data=pts, aes(x=n.obs, y=t.obs, fill=density))+geom_raster()+
  scale_fill_viridis_c(option="D")+theme_classic()

conditional_p_id()

Calculates the probability of identification for a student with a particular true score.

conditional_p_id(x=2, relyt=.9, test.cutoff=.9,
   nom.cutoff=.9, valid=.5)
## [1] 0.3843284

Conditional identification curve

The conditional_p_id() function can be used to calculate a conditional identification curve.

# create vector of true scores
Tscores <- seq(0,3, length.out=100)

# calculate the identification probability for each
p.id <- conditional_p_id(x=Tscores, relyt=.9,
  test.cutoff=.9, nom.cutoff=.9, valid=.5)

# make a plot
plot(x=Tscores, y=p.id, type="l", xlab="true score",
  ylab="p identified", ylim=c(0,1))

# add a reference line for the test cutoff
abline(v=qnorm(.9), col="red")

Distribution functions

The following functions allow you to work with the statistical distribution of scores for identified students.

d_identified

The density function can be plotted to visualize the probability density of true scores for identified students.

# create vector of true scores
Tscores <- seq(0,4, length.out=200)

# add the un-normed density for the bad system
pdf <- d_identified(x=Tscores, relyt=.9,
  test.cutoff=.9, nom.cutoff=.9, valid=.5)

plot(x=Tscores, y=pdf, type="l", col="blue", 
     main="Distribution of true scores")

abline(v=qnorm(.9), col="red")

It can also be used to visualize the distribution of observed scores. Many of the functions work this way. This is triggered by not providing a value for the relyt= argument.

# create vector of true scores
Tscores <- seq(0,4, length.out=500)

# compute the density for each true score 
#  under the given identification policy
pdf <- sapply(Tscores, d_identified, test.cutoff=.9, 
              nom.cutoff=.9, valid=.5)

plot(x=Tscores, y=pdf, type="l", col="blue", 
     main="Distribution of observed scores")

abline(v=qnorm(.9), col="red")

p_identified()

This is the cumulative density function for identified students.

Given a true (or observed) score, it returns the percentile.

Example: What percentile would a student with a true score of z=1.5 be at relative to identified students, under an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5?

p_identified(x=1.5, relyt=.95, test.cutoff=.9, 
             nom.cutoff=.9, valid=.5)
## [1] 0.2617256

q_identified()

This is the quantile function for identified students.

Given a percentile, it returns the associated true (or observed) score for identified students.

Example: What score would be at the 80th percentile for identified students under an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5?

q_identified(percentile=.8, relyt=.95, test.cutoff=.9, 
             nom.cutoff=.9, valid=.5)
## [1] 2.243256

r_identified()

This samples random values from the distribution of true or observed scores of identified students.

This is useful for simulation purposes.

Example: Generate 10 random true scores for identified students under an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5.

r_identified(n=10, relyt=.95, test.cutoff=.9, 
             nom.cutoff=.9, valid=.5)
##  [1] 2.382062 1.742668 1.371333 1.769608 1.755212 1.898373 2.933656
##  [8] 1.502240 2.396970 1.428083

mean_identified(): mean or expected value for identified students

Example: What is the mean true score for identified students in an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5?

mean_identified(relyt=.95, test.cutoff=.9, 
             nom.cutoff=.9, valid=.5)
## [1] 1.850697

sd_identified(): standard deviation for identified students

Example: What is the standard deviation of true scores for identified students in an identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination cutoff at the 90th percentile, and a nomination validity of 0.5?

sd_identified(relyt=.95, test.cutoff=.9, 
             nom.cutoff=.9, valid=.5)
## [1] 0.5055684

estimate_valid()

Infer identification system nomination validity and performance from observed data

## [1] 0.03240091
## [1] 0.1
result <- estimate_performance(x=data, id.rate=.032,
  nom.rate=0.1, reps=200)
## Warning in estimate_performance(x = data, id.rate = 0.032, nom.rate =
## 0.1, : A minimum of 500 reps is suggested for trustworthy standard errors
## and confidence intervals.
result
## $summary
##              Estimate  StdErr CI.95.lower CI.95.upper
## valid         0.50628 0.01358     0.47966     0.53289
## sensitivity   0.32914 0.00848     0.31252     0.34576
## nom.passrate  0.32516 0.00838     0.30874     0.34158
## test.cutoff   0.90121      NA          NA          NA
## nom.cutoff    0.90000      NA          NA          NA
## 
## $note
## [1] "Since the test reliability was not specified, the sensitivity value is interpreted as relative to universal screening. Achieved sensitivity is dependent on the test reliability."
## Warning in estimate_performance(x = data, id.rate = 0.032, nom.rate =
## 0.1, : A minimum of 500 reps is suggested for trustworthy standard errors
## and confidence intervals.
## $summary
##              Estimate  StdErr CI.95.lower CI.95.upper
## valid         0.50688 0.01310     0.48121     0.53255
## sensitivity   0.32951 0.00822     0.31341     0.34561
## nom.passrate  0.32552 0.00812     0.30962     0.34143
## test.cutoff   0.90121      NA          NA          NA
## nom.cutoff    0.90000      NA          NA          NA
## 
## $note
## [1] "Since the test reliability was not specified, the sensitivity value is interpreted as relative to universal screening. Achieved sensitivity is dependent on the test reliability."

Optimal identification

Optimal identification is a method for designing an identification process for maximal psychometric performance.

  1. Multiple assessments for confirmation.
  2. Mean combination rule
  3. Nominate on the basis of one of the constituent assessments

Functions:

reliability_mean(): Calculate the mean of a composite of several assessments

Imagine you wish to identify on the basis of three tests. There are three things you need to know:

  1. The correlations between the assessments
  2. The reliability of each assessment.
  3. How strongly you wish to weight each assessment in identification.

Let’s assume that your three assessments have the following correlation matrix:

\[ r = \begin{bmatrix} 1 & .4 & .7 \\ .4 & 1 & .5\\ .7 & .5 & 1 \\ \end{bmatrix} \]

And that the reliability coefficients for each assessment are

\[ rely= \begin{bmatrix} .90 & .85 & .88 \end{bmatrix} \]

And lastly, that you want the assessments to be equally weighted.

# create the correlation matrix and assign it to object r
r <- matrix(c(  1, .4, .7,
               .4,  1, .5,
               .7, .5,  1), 3,3, byrow=TRUE)

# calculate the reliability of the mean
reliability_mean(rely=c(.9, .85, .88), r=r, w=c(1, 1, 1))
## [1] 0.9403226

cor_mean(): The correlation of each assessment with the mean of the assessments

Imagine we have the same three assessments (with the same correlation matrix). We need to decide which one to nominate based on.

# create the correlation matrix and assign it to object r
r <- matrix(c(  1, .4, .7,
               .4,  1, .5,
               .7, .5,  1), 3,3, byrow=TRUE)

# calculate the reliability of the mean
cor_mean(r=r, w=c(1, 1, 1))
## [1] 0.8433803 0.7630584 0.8835413

These numbers can be interpreted as the nomination validity coefficients that would result if each assessment was used for nomination. Here we can see that the third assessment would have the highest validity coefficient.

What system performance could we expect?

On the basis of what we learned in the last two slides, we can use the marginal_psychometrics() function to calculate the identification system performance we could expect.

We still must choose a nomination cutoff. Lets start with an 80th percentile cutoff.

marginal_psychometrics(test.cutoff=.9, relyt=0.9403, valid=0.8835, nom.cutoff=.8)
## $sensitivity
## [1] 0.7792443
## 
## $IIR
## [1] 0.1268913
## 
## $nom.rate
## [1] 0.2
## 
## $nom.passrate
## [1] 0.4462445
## 
## $identification.rate
## [1] 0.08924878

Investigating a range of nomination cutoffs

# make a vector of cutoffs
cutoffs <- c(.6, .7, .8, .9)

# calculate the system performance under each on using mapply()
mapply(nom.cutoff=cutoffs, marginal_psychometrics, MoreArgs=list(
  test.cutoff=.9, relyt=0.9403, valid=0.8835))
##                     [,1]      [,2]       [,3]       [,4]      
## sensitivity         0.8260376 0.8176508  0.7792443  0.618047  
## IIR                 0.1667907 0.1551835  0.1268913  0.06959183
## nom.rate            0.4       0.3        0.2        0.1       
## nom.passrate        0.2478454 0.3226111  0.4462445  0.664273  
## identification.rate 0.0991381 0.09678324 0.08924878 0.06642711

A nomination cutoff versus sensitivity curve under Optimal Identification

# make a vector of nomination cutoff percentiles
cutoffs <- seq(.01, .99, length.out=100)

# calculate the system performance under each one using mapply()
#  select the first row of the resulting object, which contains the sensitivities
#  using indexing [1,]
sensitivity <- mapply(nom.cutoff=cutoffs, marginal_psychometrics, MoreArgs=list(
  test.cutoff=.9, relyt=0.9403, valid=0.8835))[1,]

# plot sensitivity versus nomination cutoff with a custom x-axis
plot(x=cutoffs, y=unlist(sensitivity), type='l', 
     xlab="nomination cutoff percentile", ylab="sensitivity", xaxt="n",
     ylim=c(0,1))
axis(side=1, at=seq(0, 1, .05))

A nomination cutoff versus sensitivity curve under Optimal Identification

A nomination cutoff versus sensitivity curve for a typical system

Grand vision / next steps