library(wordbankr) # Wordbank data
library(tidyverse) # tidy data manipulation
library(mirt) # IRT models
library(ltm) # more IRT functions
#library(difR) # some differential item functioning tools. 
library(psych) # some psychometric stuff (tests of dimensionality)
library(Gifi) # some more psychometric stuff (tests of dimensionality)
library(knitr) # some formatting, tables, etc
library(patchwork) # combining plots. 
library(GGally) # More plotting options. 

1 A quick intro to IRT

Item response theory (IRT) is a class of models for estimating the difficulty of items and the proficiency levels of participants. Imagine we have a data set with \(i\) binary items and \(v\) participants. If we wanted to predict how well Person 1 did on Item 1, we could just take the difference between Person 1’s latent ability and the difficulty of Item 1:

\[ \theta_{v = 1} - \beta_{i = 1} \] If Person 1’s ability (\(\theta\)) is greater than the difficulty (\(\beta\)) of Item 1, then they’re likely to get that item correct. If their ability is less than the difficulty of Item 1, then they are unlikely to get it right. If we take the inverse logit of the difference between these values, we now have the probability that Person 1 gets Item 1 correct. These two sets of values (participant abilities and item difficulties) and the inverse logit function are the components of the most basic IRT model, the Rasch model:

\[ P\left(X_{v i}=1\right)=\frac{\exp \left(\theta_{v}-\beta_{i}\right)}{1+\exp \left(\theta_{v}-\beta_{i}\right)} \]

For intuition, if a person’s ability equals the item’s difficulty parameter, the difference is 0, and the equation above produces a probability of .5. When \(\theta\) is larger than \(\beta\), the probability will be greater than .5, and when \(\theta\) is smaller than \(\beta\), the probability will be smaller than .5. (Most programs swap the sign on \(\beta\) for interpretability.)
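To make this concrete, here’s a minimal numeric check with made-up values for \(\theta\) and \(\beta\) (not estimates from any model below):

theta <- 1.5 # hypothetical person ability
beta <- 0.5  # hypothetical item difficulty
exp(theta - beta) / (1 + exp(theta - beta)) # inverse logit by hand, about .73
plogis(theta - beta) # the same value via R's logistic distribution function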

The Rasch model makes a strict assumption: items vary in their difficulty, but not in how well they discriminate between participants of varying ability levels. This is easier to wrap your head around with an example. I’ve fit a Rasch model to some data, and plotted item characteristic curves for some items. These tell us how estimated ability relates to the probability of a correct response.

library(MPsychoR)
## Warning: package 'MPsychoR' was built under R version 4.0.3
data("RWDQ")
rasch_fit <- rasch(RWDQ[,-1]) # named to avoid masking ltm's rasch() function
plot(rasch_fit, items=2:6, legend=TRUE)

The thing to notice here is that these curves are all parallel. The items vary in the latent ability necessary to get an item correct 50% of the time (an intercept, basically) but not in how well they discriminate between participants of different ability levels. We can modify the Rasch model above by adding a discrimination parameter \(\alpha\). This gives us the 2-parameter logistic (2PL) model below, which is what I use in the subsequent analyses:

\[ P\left(X_{v i}=1\right)=\frac{\exp \left(\alpha_{i}\left(\theta_{v}-\beta_{i}\right)\right)}{1+\exp \left(\alpha_{i}\left(\theta_{v}-\beta_{i}\right)\right)} \] Adding \(\alpha\) allows the item response curves to have different slopes.
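For a quick picture of what \(\alpha\) does, here’s a sketch with made-up parameter values (\(\beta = 0\) and two hypothetical discriminations):

theta <- seq(-3, 3, by=.1)
p_low <- plogis(0.5 * (theta - 0))  # alpha = 0.5: a shallow curve
p_high <- plogis(2.0 * (theta - 0)) # alpha = 2.0: a steep curve around beta = 0
plot(theta, p_high, type="l", xlab="Ability", ylab="P(correct)")
lines(theta, p_low, lty=2)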

l2p <- ltm(RWDQ[,-1] ~ z1)
plot(l2p, items=2:6, legend=TRUE)

Now the item response functions can cross. This means that some items don’t discriminate as well as others. So wdq_25 is strongly affected by changes in ability from -2 to 0 but levels off after 0, whereas wdq_24 is affected by changes in ability across the whole range.

To sum this introduction up, the 2-parameter model allows us to estimate abilities, which are characteristics of participants, and difficulty and discrimination parameters, which are properties of items.

2 Preliminaries and Data Screening

Ok, now on to the Wordbank data.

Download the data from Wordbank:

Inst <- get_instrument_data(language="English (American)", form="WS")
Admin <- get_administration_data(language="English (American)", form="WS")
Item <- get_item_data(language="English (American)", form = "WS")

Combine data sets. Create binary variable for accuracy.

Complex <- Admin %>%
  full_join(.,Inst, by="data_id") %>%
  full_join(., Item, by="num_item_id") %>%
  filter(longitudinal==FALSE) %>%
  filter(type == "complexity") %>%
  mutate(
    out = ifelse(value=="complex", yes=1, no=0)
  ) 

2.1 Data Screening

Let’s see how many missing values there are.

Complex %>%
  filter(is.na(out)) %>%
  group_by(definition) %>%
  count() # looks like 1426 participants don't have item-level complexity scores. 

Looks like 1426 participants don’t have item-level complexity scores, so let’s drop those.

Complex <- filter(Complex, !is.na(out))

Let’s look at means and SDs for each item:

Complex %>%
  group_by(definition) %>%
  summarise(
    mean=mean(out), 
    sd=sd(out),
    category = first(complexity_category)
  ) %>%
  arrange(category) %>%
  kable(caption="Means and SDs for Each Item (Arranged by Category)")
Means and SDs for Each Item (Arranged by Category)
definition mean sd category
baby blanket / baby’s blanket 0.3873745 0.4872378 morphology
daddy car / daddy’s car 0.4451220 0.4970684 morphology
daddy pick me up / daddy picked me up 0.1743185 0.3794512 morphology
doggie kiss me / doggie kissed me 0.1879484 0.3907410 morphology
I fall down / I fell down 0.1807747 0.3849005 morphology
I make tower / I making tower 0.2360115 0.4247055 morphology
kitty go away / kitty went away 0.1280488 0.3342041 morphology
kitty sleep / kitty sleeping 0.4192970 0.4935326 morphology
more cookie / more cookies 0.3758967 0.4844404 morphology
these my tooth / these my teeth 0.4533716 0.4979103 morphology
two foot / two feet 0.4214491 0.4938798 morphology
two shoe / two shoes 0.4422525 0.4967431 morphology
baby crying / baby crying cuz she’s sad 0.1599713 0.3666452 syntax
baby crying / baby is crying 0.1624821 0.3689586 syntax
baby want eat / baby want to eat 0.2650646 0.4414468 syntax
coffee hot / that coffee hot 0.1929699 0.3947004 syntax
cookie mommy / cookie for mommy 0.2413917 0.4280040 syntax
doggie table / doggie on table 0.3511478 0.4774147 syntax
don’t read book / don’t want you read that book 0.1818508 0.3857907 syntax
go bye-bye / wanna go bye-bye 0.2679340 0.4429625 syntax
I like read stories / I like to read stories 0.1779053 0.3825016 syntax
I no do it / I can’t do it 0.2482066 0.4320498 syntax
I sing song / I sing song for you 0.1449067 0.3520699 syntax
I want that / I want that one you got 0.1137016 0.3175054 syntax
lookit / lookit what I got 0.2410330 0.4277870 syntax
lookit me / lookit me dancing 0.2596844 0.4385400 syntax
no wash dolly / don’t wash dolly 0.2116212 0.4085310 syntax
read me story Mommy / read me a story Mommy 0.1635581 0.3699405 syntax
that my truck / that’s my truck 0.2822812 0.4501902 syntax
there a kitty / there’s a kitty 0.3210187 0.4669517 syntax
turn on light / turn on light so I can see 0.0961263 0.2948172 syntax
want cookies / want cookies and milk 0.2048063 0.4036324 syntax
want more juice / want juice in there 0.2051650 0.4038946 syntax
we made this / me and Paul made this 0.1603300 0.3669776 syntax
where’s my dolly / where’s my dolly name Sam 0.1093974 0.3121932 syntax
where mommy go / where did mommy go 0.1822095 0.3860863 syntax
you fix it / can you fix it 0.1025825 0.3034672 syntax

Let’s also check the distribution of item means within each category:

Complex %>%
  group_by(definition) %>%
  summarise(
    mean=mean(out), 
    category = first(complexity_category)
  ) %>%
  ggplot(aes(x=category, y= mean)) + geom_boxplot()

2.2 Data Preparation

Prepare the data set for IRT modeling. I’m going to rename each item to its category (morphology or syntax) plus its item number, to make some of the graphs easier to read.

Complex_short_with_ids <- Complex %>% # version that keeps participant IDs
  dplyr::select(data_id, value, complexity_category, num_item_id) %>%
  mutate(
    out = ifelse(value=="complex", yes=1, no=0),
    label = str_c(complexity_category, num_item_id)
  ) %>%
  dplyr::select(-value) %>%
  pivot_wider(id_cols=data_id, names_from = "label", values_from="out")

Complex_short <- Complex_short_with_ids %>%
  dplyr::select(-data_id) # the data set passed to the IRT model can't have IDs

2.3 Dimensionality Assessment

Prior to fitting the IRT model, it’s worth looking at the correlation structure of the data to get a sense of the possible dimensionality.

Create a tetrachoric correlation matrix (appropriate for binary data). I also created a vector of short labels for the correlation plot.

Complex_poly <- tetrachoric(Complex_short)
rho <- Complex_poly$rho

lab <- as.character(1:37) # short numeric labels for corPlot, because the definitions are long and take up a ton of space

Check out the correlations

corPlot(rho, labels=lab)

They look really high. Let’s examine the dimensionality using princals, which is basically a version of PCA that is appropriate for categorical data.

pc <- princals(rho)

plot(pc)

Interesting. Seems like maybe there are 2 factors. An alternative approach is a scree plot with confidence intervals from simulated data:

fa.parallel(rho, fa="fa", cor="tet", n.obs = 2788)

## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA

The formal parallel-analysis criterion returns five factors, but the scree plot shows one very large factor and, at most, a small second one. A third metric is Very Simple Structure (VSS). This tells us the best solution assuming different values of k, where k is the maximum number of latent variables an observed variable loads on.

vss(rho, cor="tet", n.obs = 2788)

## 
## Very Simple Structure
## Call: vss(x = rho, n.obs = 2788, cor = "tet")
## VSS complexity 1 achieves a maximimum of 1  with  1  factors
## VSS complexity 2 achieves a maximimum of 1  with  2  factors
## 
## The Velicer MAP achieves a minimum of 0.02  with  4  factors 
## BIC achieves a minimum of  17127.14  with  8  factors
## Sample Size adjusted BIC achieves a minimum of  18391.72  with  8  factors
## 
## Statistics by number of factors 
##   vss1 vss2   map dof chisq prob sqresid fit RMSEA   BIC SABIC complex eChisq
## 1 1.00 0.00 0.026 629 41190    0    2.18   1  0.15 36200 38199     1.0   2778
## 2 0.47 1.00 0.018 593 33101    0    1.37   1  0.14 28397 30281     1.8   1248
## 3 0.37 0.81 0.017 558 29519    0    1.08   1  0.14 25092 26865     2.5    826
## 4 0.36 0.74 0.017 524 26605    0    0.93   1  0.13 22448 24113     2.9    630
## 5 0.38 0.76 0.017 491 24433    0    0.81   1  0.13 20538 22098     3.1    489
## 6 0.37 0.71 0.018 459 22595    0    0.72   1  0.13 18954 20412     3.3    393
## 7 0.37 0.72 0.019 428 21444    0    0.64   1  0.13 18048 19408     3.3    331
## 8 0.39 0.71 0.020 398 20285    0    0.57   1  0.13 17127 18392     3.4    282
##     SRMR eCRMS  eBIC
## 1 0.0274 0.028 -2212
## 2 0.0183 0.019 -3457
## 3 0.0149 0.016 -3600
## 4 0.0130 0.015 -3527
## 5 0.0115 0.013 -3407
## 6 0.0103 0.012 -3248
## 7 0.0094 0.012 -3064
## 8 0.0087 0.011 -2875

Very Simple Structure also suggests a single factor if we assume each variable loads on one factor only; if we allow everything to load on two factors, then a two-factor solution is best.

Taken together, these tests suggest a single factor is probably the best approach, though it’s possible a two-factor model would be better.

3 IRT Modeling

3.1 2PL model

Let’s first fit the 2PL model and examine the output.

m1 <- mirt(Complex_short, 1, itemtype="2PL", verbose=FALSE)
m1
## 
## Call:
## mirt(data = Complex_short, model = 1, itemtype = "2PL", verbose = FALSE)
## 
## Full-information item factor analysis with 1 factor(s).
## Converged within 1e-04 tolerance after 197 EM iterations.
## mirt version: 1.33.2 
## M-step optimizer: BFGS 
## EM acceleration: Ramsay 
## Number of rectangular quadrature: 61
## Latent density type: Gaussian 
## 
## Log-likelihood = -25839.34
## Estimated parameters: 74 
## AIC = 51826.68; AICc = 51830.77
## BIC = 52265.72; SABIC = 52030.6
## G2 (1e+10) = 24307.36, p = 1
## RMSEA = 0, CFI = NaN, TLI = NaN
summary(m1)
##                  F1    h2
## morphology761 0.930 0.866
## morphology762 0.899 0.808
## morphology763 0.904 0.818
## morphology764 0.923 0.852
## morphology765 0.937 0.879
## morphology766 0.881 0.776
## morphology767 0.904 0.817
## morphology768 0.917 0.840
## morphology769 0.915 0.838
## morphology770 0.937 0.878
## morphology771 0.952 0.906
## morphology772 0.927 0.859
## syntax773     0.958 0.918
## syntax774     0.938 0.879
## syntax775     0.948 0.899
## syntax776     0.952 0.907
## syntax777     0.953 0.908
## syntax778     0.942 0.887
## syntax779     0.897 0.805
## syntax780     0.953 0.908
## syntax781     0.933 0.870
## syntax782     0.934 0.872
## syntax783     0.956 0.913
## syntax784     0.949 0.901
## syntax785     0.969 0.938
## syntax786     0.967 0.934
## syntax787     0.960 0.922
## syntax788     0.940 0.884
## syntax789     0.937 0.878
## syntax790     0.969 0.939
## syntax791     0.973 0.947
## syntax792     0.947 0.896
## syntax793     0.966 0.934
## syntax794     0.928 0.862
## syntax795     0.912 0.832
## syntax796     0.967 0.934
## syntax797     0.953 0.908
## 
## SS loadings:  32.613 
## Proportion Var:  0.881 
## 
## Factor correlations: 
## 
##    F1
## F1  1

Let’s check the overall fit of the model. The chi square statistic here isn’t super meaningful given the sample size.

M2(m1)

RMSEA and CFI are good; SRMSR is a bit high but pretty reasonable. We can also check whether individual items misfit, i.e., whether the observed cell counts differ from the expected cell counts.

itemfit(m1)

A few items significantly misfit. Let’s take everything with a p-value less than .10 and look at item GAM plots for those items. These plot the item response probability against the latent variable using a GAM, to look for deviations from the fitted item response function.

misfit <- itemfit(m1) %>% # get the labels of the misfitting items
  filter(p.S_X2 <= .10) %>%
  pull(item)

items_good <- dplyr::select(Complex_short, -all_of(misfit)) # well-fitting items
items_bad <- dplyr::select(Complex_short, all_of(misfit)) # poorly fitting items

mod_fit <- mirt(items_good, 1, "2PL", verbose=FALSE) # Calculate factor scores using only the well fitting items. 
Theta <- fscores(mod_fit)

# Plot non-linear item response theory curves with the factor scores 
IG762 <- itemGAM(items_bad$morphology762, Theta)
IG764 <- itemGAM(items_bad$morphology764, Theta)
IG769 <- itemGAM(items_bad$morphology769, Theta)
IG771 <- itemGAM(items_bad$morphology771, Theta)

IG783 <- itemGAM(items_bad$syntax783, Theta)
IG792 <- itemGAM(items_bad$syntax792, Theta)

Here are the morphology items

plot(IG762); plot(IG764); plot(IG769); plot(IG771)


Syntax items

plot(IG783); plot(IG792)

These don’t look like massive deviations. They are all monotonic, certainly, but they seem to level off more than the logistic function predicts.
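If that leveling-off were a real concern, one way to probe it would be an item type with an estimated upper asymptote. This is a sketch only, not run here:

#m_upper <- mirt(Complex_short, 1, itemtype="3PLu", verbose=FALSE) # 2PL plus an upper asymptote
#anova(m1, m_upper)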

Next let’s look at the relationship between the true scores and the raw scores.

true_raw <- Complex_short %>%
  mutate(
    Raw = rowSums(.[,1:37]), 
    Theta = fscores(m1, method="MAP")[,1]
  ) %>%
  ggplot(aes(x=Theta, y=Raw)) + geom_point() + stat_smooth(method="loess") + theme_minimal()

raw_score <- Complex_short %>%
  mutate(
    Raw = rowSums(.[,1:37])
    ) %>% 
   ggplot(aes(x=Raw)) + geom_histogram() + theme_minimal()

true_score <- Complex_short %>%
  mutate(
    Theta = fscores(m1, method="MAP")[,1]
  ) %>% 
  ggplot(aes(x=Theta)) + geom_histogram() + theme_minimal()


true_raw/(true_score + raw_score) 
## `geom_smooth()` using formula 'y ~ x'
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Two things suggest this is a more precise measurement: the relationship between the true and raw scores is sigmoid-ish, and there are multiple values of the true score at every level of the observed score. Also, note the pile-up of 0s on the raw score, and of really high scores on the true score.
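As a supplementary check (not in the original output), the rank correlation between the two scores should be near 1 even though the relationship is non-linear:

scores <- tibble(
  Raw = rowSums(Complex_short),
  Theta = fscores(m1, method="MAP")[,1]
)
cor(scores$Raw, scores$Theta, method="spearman") # monotone agreement between raw and latent scores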

Next let’s take a look at the difficulty and discrimination parameter for each item.

coefs_2pl <- coef(m1, as.data.frame = TRUE) %>% 
  t() %>%
  as_tibble() %>%
  dplyr::select(-c(149:150)) %>% # drop the group-level mean and variance columns
  pivot_longer(everything()) %>%
  separate(col=name, into=c("item", "parameter"), sep="([.])") %>%
  pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
  dplyr::select(item, a1, d) %>%
  mutate(
    category = gsub('[[:digit:]]+', '', item)
  )
ggplot(coefs_2pl,  
       aes(x = a1, y = d)) + 
  geom_point(alpha = .3) + 
  ggrepel::geom_text_repel(data = coefs_2pl, 
                  aes(label = item), size = 3) + 
  xlab("Discrimination") + 
  ylab("Difficulty") + theme_minimal()

ggplot(coefs_2pl, aes(x=a1, fill=category)) + geom_histogram() + theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(coefs_2pl, aes(x=d, fill=category)) + geom_histogram() + theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There’s a strong correlation between the difficulty and discrimination parameters. Also note that the morphology items are generally less difficult than the syntax items.
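To put a number on that (a supplementary check, using the coefficients extracted above):

cor(coefs_2pl$a1, coefs_2pl$d) # discrimination vs. difficulty/easiness intercept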

3.2 Thinking Through the Assumptions of the 2PL model

It’s important to remember that these latent variables are only interpretable if the following conditions hold:

1. The assumption of unidimensionality is appropriate.
2. The assumed distributional form of the latent variable is correct.
3. Measurement invariance holds: the items behave the same way across different groups of participants.

I can test 1 and 2 on my own; 3 is a bit of a stretch for me right now, though the sketch below shows roughly how it would go.
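For future reference, here’s a sketch of how invariance could be checked with mirt’s multipleGroup; grp is a hypothetical grouping variable (one value per row of Complex_short), and this is not run:

#grp <- factor(hypothetical_group_labels) # e.g., child sex, matched to Complex_short rows
#m_free <- multipleGroup(Complex_short, 1, group=grp, verbose=FALSE) # parameters free across groups
#m_equal <- multipleGroup(Complex_short, 1, group=grp,
#                         invariance=c("slopes", "intercepts", "free_means", "free_var"),
#                         verbose=FALSE) # item parameters constrained equal across groups
#anova(m_equal, m_free) # does constraining the item parameters hurt fit?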

3.2.1 Non-parametric model relaxing assumption of normal latent variable.

Let’s look at the distribution of the latent variable first. mirt allows us to relax the assumption of a normally distributed latent variable, at the cost of some flexibility. I’ll look at a model that makes no parametric assumptions about the form of the latent variable (note to self: check this later), and compare the results to those above.

m1_nonpar <- mirt(Complex_short, 1, itemtype="2PL", dentype = "EH", verbose=FALSE)

M2(m1_nonpar)

RMSEA is a bit lower than in the model above. The standardized residuals are quite large, though.

Complex_short %>%
  mutate(
    Theta_EH = fscores(m1_nonpar, method="MAP")[,1], 
    Theta_gauss = fscores(m1, method="MAP")[,1]
  ) %>%
  ggplot(aes(x=Theta_EH, y=Theta_gauss)) + geom_point() + stat_smooth(method="loess") + theme_minimal()
## Warning: The following factor score estimates failed to converge successfully:
##     1907
## `geom_smooth()` using formula 'y ~ x'

These are quite similar, though a bit different in the tails. There’s another non-parametric latent density type in mirt.

m1_nonpar2 <- mirt(Complex_short, 1, itemtype="2PL", dentype = "EHW", verbose=FALSE)

M2(m1_nonpar2)
Complex_short %>%
  mutate(
    Theta_EH2 = fscores(m1_nonpar2, method="MAP")[,1], 
    Theta_gauss = fscores(m1, method="MAP")[,1]
  ) %>%
  ggplot(aes(x=Theta_EH2, y=Theta_gauss)) + geom_point() + stat_smooth(method="loess") + theme_minimal()
## Warning: The following factor score estimates failed to converge successfully:
##     641
## `geom_smooth()` using formula 'y ~ x'

These are also very, very close. I think it’s sensible to stick with the Gaussian model. It fits better once the residuals are taken into account, it’s simpler and more flexible, and the factor scores are really, really similar across the two approaches.
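A quick supplementary check on that similarity (not in the original output):

cor(fscores(m1_nonpar2, method="MAP")[,1], fscores(m1, method="MAP")[,1]) # agreement between the EHW and Gaussian scores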

The next thing we can do is check the dimensionality, since the screening above gave mixed results. Let’s check the one-dimensional model against a two-dimensional one.

3.2.2 Test assumption of unidimensionality

First, a formal test of dimensionality. We need to fit the model with ltm. unidimTest takes forever to run and produces enormous output, so it’s commented out here.

#m1_ltm <- ltm(Complex_short ~z1)
#unidimTest(m1_ltm, Complex_short)

The test rejects unidimensionality. Let’s try explicitly modeling the multidimensionality.

An exploratory two-dimensional IRT model:

m2 <- mirt(Complex_short, 2, itemtype="2PL", verbose=FALSE)
M2(m2)
anova(m1, m2)
## 
## Model 1: mirt(data = Complex_short, model = 1, itemtype = "2PL", verbose = FALSE)
## Model 2: mirt(data = Complex_short, model = 2, itemtype = "2PL", verbose = FALSE)
summary(m2, "oblimin", suppress=.20)
## 
## Rotation:  oblimin 
## 
## Rotated factor loadings: 
## 
##                   F1     F2    h2
## morphology761 -0.898 -0.229 0.892
## morphology762 -0.873     NA 0.820
## morphology763 -0.856 -0.334 0.890
## morphology764 -0.902     NA 0.855
## morphology765 -0.925     NA 0.869
## morphology766 -0.863     NA 0.774
## morphology767 -0.878     NA 0.827
## morphology768 -0.901     NA 0.839
## morphology769 -0.876 -0.292 0.894
## morphology770 -0.922     NA 0.887
## morphology771 -0.937     NA 0.921
## morphology772 -0.911     NA 0.869
## syntax773     -0.950     NA 0.911
## syntax774     -0.920     NA 0.888
## syntax775     -0.940     NA 0.886
## syntax776     -0.948     NA 0.896
## syntax777     -0.946     NA 0.896
## syntax778     -0.937     NA 0.874
## syntax779     -0.890     NA 0.787
## syntax780     -0.938     NA 0.911
## syntax781     -0.932     NA 0.864
## syntax782     -0.924     NA 0.860
## syntax783     -0.955     NA 0.905
## syntax784     -0.947     NA 0.892
## syntax785     -0.966     NA 0.929
## syntax786     -0.968     NA 0.933
## syntax787     -0.964     NA 0.936
## syntax788     -0.943  0.217 0.902
## syntax789     -0.936     NA 0.873
## syntax790     -0.971     NA 0.937
## syntax791     -0.973     NA 0.942
## syntax792     -0.947     NA 0.891
## syntax793     -0.968     NA 0.931
## syntax794     -0.927     NA 0.856
## syntax795     -0.910     NA 0.830
## syntax796     -0.967     NA 0.932
## syntax797     -0.946  0.216 0.907
## 
## Rotated SS loadings:  31.937 0.727 
## 
## Factor correlations: 
## 
##       F1    F2
## F1 1.000 0.082
## F2 0.082 1.000

This model fits better, but it doesn’t look like the second factor is adding much of anything. Also, the correlation between the factors is really small, which seems unlikely given the way language works.

Plot the discrimination parameters for each dimension against one another.

coefs_2plmulti <- coef(m2, as.data.frame = TRUE) %>% 
  t() %>%
  as_tibble() %>%
  dplyr::select(-c(185:190)) %>% # drop the trailing group-level mean and covariance columns
  pivot_longer(everything()) %>%
  separate(col=name, into=c("item", "parameter"), sep="([.])") %>%
  pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
  dplyr::select(item, a1, a2, d) %>%
  mutate(
    category = gsub('[[:digit:]]+', '', item)
  ) 
ggplot(coefs_2plmulti ,  
       aes(x = a1, y = a2)) + 
  geom_point(alpha = .3) + 
  ggrepel::geom_text_repel(data =coefs_2plmulti, 
                  aes(label = item), size = 3) + 
  xlab("Discrimination1") + 
  ylab("Discrimination2") + theme_minimal()

There are two discrimination parameters per item now, and they’re both negative. I’m wondering if these are parameterized a little differently, such that they mean something different in the two-dimensional context.

Plot the latent variables against the raw scores and against one another.

Complex_short %>%
  mutate(
    Raw = rowSums(.[,1:37]), 
    Theta1 = fscores(m2, method="MAP")[,1], 
    Theta2 = fscores(m2, method="MAP")[,2]
  ) %>%
  dplyr::select(Raw, Theta1, Theta2) %>%
  ggpairs()
## Warning: The following factor score estimates failed to converge successfully:
##     1263,1273

## Warning: The following factor score estimates failed to converge successfully:
##     1263,1273

The correlation between the raw scores and Theta1 is extremely high, whereas the correlation between Theta2 and the raw score is minimal, and Theta2 has a much smaller range. There’s also a weak correlation between the two sets of factor scores.

Another way to think about this problem is that perhaps the morphology and syntax items are measuring different proficiencies. We can fit a confirmatory IRT model with one morphology factor and one syntax factor.

model.1 <- mirt.model('
                      F1 = 1 - 12
                      F2 = 13 - 37
                      COV=F1*F2')

m3 <- mirt(Complex_short, model.1, "2PL", verbose=FALSE)

M2(m3)
summary(m3)
##                  F1    F2    h2
## morphology761 0.933 0.000 0.870
## morphology762 0.891 0.000 0.795
## morphology763 0.901 0.000 0.812
## morphology764 0.913 0.000 0.834
## morphology765 0.919 0.000 0.844
## morphology766 0.863 0.000 0.744
## morphology767 0.892 0.000 0.796
## morphology768 0.904 0.000 0.818
## morphology769 0.913 0.000 0.833
## morphology770 0.934 0.000 0.872
## morphology771 0.958 0.000 0.917
## morphology772 0.922 0.000 0.851
## syntax773     0.000 0.939 0.882
## syntax774     0.000 0.907 0.824
## syntax775     0.000 0.925 0.856
## syntax776     0.000 0.931 0.868
## syntax777     0.000 0.930 0.865
## syntax778     0.000 0.916 0.840
## syntax779     0.000 0.862 0.744
## syntax780     0.000 0.931 0.866
## syntax781     0.000 0.911 0.829
## syntax782     0.000 0.907 0.823
## syntax783     0.000 0.936 0.876
## syntax784     0.000 0.926 0.857
## syntax785     0.000 0.951 0.904
## syntax786     0.000 0.952 0.907
## syntax787     0.000 0.947 0.897
## syntax788     0.000 0.922 0.850
## syntax789     0.000 0.912 0.832
## syntax790     0.000 0.956 0.915
## syntax791     0.000 0.962 0.925
## syntax792     0.000 0.927 0.859
## syntax793     0.000 0.952 0.905
## syntax794     0.000 0.904 0.817
## syntax795     0.000 0.883 0.780
## syntax796     0.000 0.950 0.902
## syntax797     0.000 0.932 0.869
## 
## SS loadings:  9.985 21.49 
## Proportion Var:  0.27 0.581 
## 
## Factor correlations: 
## 
##       F1    F2
## F1 1.000 0.959
## F2 0.959 1.000

The fit is pretty good, but the correlation between the latent variables is very high: .959.
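The latent covariance matrix can also be pulled straight from the fitted model; simplify=TRUE collapses coef()’s list output into matrices:

coef(m3, simplify=TRUE)$cov # estimated latent covariance for the confirmatory model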

Compare the fit to the one dimension model and two-dimension exploratory model.

anova(m1, m3)
## 
## Model 1: mirt(data = Complex_short, model = 1, itemtype = "2PL", verbose = FALSE)
## Model 2: mirt(data = Complex_short, model = model.1, itemtype = "2PL", 
##     verbose = FALSE)
anova(m2, m3)
## 
## Model 1: mirt(data = Complex_short, model = model.1, itemtype = "2PL", 
##     verbose = FALSE)
## Model 2: mirt(data = Complex_short, model = 2, itemtype = "2PL", verbose = FALSE)

It fits better than the one-dimensional model, but worse than the two-dimensional exploratory model. Let’s try plotting each of the proficiencies against the raw data.

Complex_short %>%
  mutate(
    Raw = rowSums(.[,1:37]), 
    Theta1 = fscores(m3, method="MAP")[,1], 
    Theta2 = fscores(m3, method="MAP")[,2]
  ) %>%
  dplyr::select(Raw, Theta1, Theta2) %>%
  ggpairs()
## Warning: The following factor score estimates failed to converge successfully:
##     32,55,89,186,197,382,413,418,516,522,545,571,588,633,653,677,691,718,739,774,889,917,1060,1100,1190,1328,1697,1732,1881,1928,1955,2010,2098,2121,2163,2276,2294,2303,2331,2359,2490,2517,2532,2580,2610,2710,2721,2752,2760

## Warning: The following factor score estimates failed to converge successfully:
##     32,55,89,186,197,382,413,418,516,522,545,571,588,633,653,677,691,718,739,774,889,917,1060,1100,1190,1328,1697,1732,1881,1928,1955,2010,2098,2121,2163,2276,2294,2303,2331,2359,2490,2517,2532,2580,2610,2710,2721,2752,2760

The correlation between the morphology latent variable and the syntax latent variable is roughly .99. They show the same results, more or less.

4 The Relationship Between Predictors and Grammatical Proficiency

by_Participant <- Complex_short_with_ids %>% 
  mutate(
    Theta = fscores(m1, method="MAP")[,1],
    Raw = rowSums(.[2:38])
    ) %>%
  dplyr::select(data_id, Raw, Theta) %>%
  full_join(Complex) %>%
  group_by(data_id) %>%
  slice(1)
## Joining, by = "data_id"
theta_by_age <- by_Participant %>%
  ggplot(aes(x=age, y=Theta)) + geom_point() + stat_smooth(method="loess")

raw_by_age <- by_Participant %>%
  ggplot(aes(x=age, y=Raw)) + geom_point() + stat_smooth(method="loess")

theta_by_age | raw_by_age
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

So, that’s really interesting. Once I’ve accounted for the measurement error, the effect of age on grammatical complexity is pretty linear, whereas with the raw scores it’s clearly non-linear.
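To put the linearity claim on slightly firmer footing, one could compare a linear fit to one with a quadratic age term; a minimal sketch, not part of the original analysis:

summary(lm(Theta ~ age, data=by_Participant)) # linear age effect on the latent score
summary(lm(Theta ~ poly(age, 2), data=by_Participant)) # does a quadratic term add much?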

theta_by_prod <- by_Participant %>%
  ggplot(aes(x=production, y=Theta)) + geom_point() + stat_smooth(method="loess")

raw_by_prod <- by_Participant %>%
  ggplot(aes(x=production, y=Raw)) + geom_point() + stat_smooth(method="loess")

theta_by_prod | raw_by_prod
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

summary(lm(Theta ~ production, data=by_Participant))
## 
## Call:
## lm(formula = Theta ~ production, data = by_Participant)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82923 -0.20034 -0.00348  0.20719  1.44997 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.216e-01  1.203e-02  -68.31   <2e-16 ***
## production   3.336e-03  3.483e-05   95.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3756 on 2786 degrees of freedom
## Multiple R-squared:  0.7671, Adjusted R-squared:  0.7671 
## F-statistic:  9179 on 1 and 2786 DF,  p-value: < 2.2e-16
Sys.Date() 
## [1] "2021-06-29"
R.Version()
## $platform
## [1] "x86_64-w64-mingw32"
## 
## $arch
## [1] "x86_64"
## 
## $os
## [1] "mingw32"
## 
## $system
## [1] "x86_64, mingw32"
## 
## $status
## [1] ""
## 
## $major
## [1] "4"
## 
## $minor
## [1] "0.2"
## 
## $year
## [1] "2020"
## 
## $month
## [1] "06"
## 
## $day
## [1] "22"
## 
## $`svn rev`
## [1] "78730"
## 
## $language
## [1] "R"
## 
## $version.string
## [1] "R version 4.0.2 (2020-06-22)"
## 
## $nickname
## [1] "Taking Off Again"