Background

The focus here is on reviews, i.e., when any ASQ-3 questionnaire is administered again for the same time period. Domain knowledge suggests that

  • reviews are typically carried out when at least one of the sub-domain scores is not a pass, but the health visitor feels the child is almost there

  • reviews are typically carried out ~3 months after the first assessment

  • not passing on one or more sub-domains should not warrant administering the whole questionnaire again.

In light of this, it is of interest to understand whether these expectations about reviews are borne out by the Stockport data. Specifically, we want to know about

  1. the overall number of reviews

  2. the time gap between assessment and review

  3. whether differences are observable by:

    1. where (in Stockport)
    2. specific sub-domain
    3. score levels.

Dataset used: [to complete]

1. How many reviews?

Note: there seem to be discrepancies (114 records) if we identify duplicates via ParObTerm rather than asmt+period. I’ll use the latter for now.

About 7.37% of unique identifiers (983 out of 13342) present at least one review. Of these, 50 have at least 2 reviews, and 3 have 3.

These result in a total of 6310 duplicate records out of 147268, or ~4.28% of the total.
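A minimal sketch of the asmt+period identification of reviews, for illustration only. It assumes each record carries a child identifier, an assessment type and a period; the field names `id`, `asmt` and `period` and the toy records are hypothetical. It also assumes records are sorted chronologically, so the first record seen for a given (`id`, `asmt`, `period`) key is the original assessment and every later one is flagged as a review. The last two lines recompute the percentages quoted above from the reported counts.

```python
def flag_reviews(records):
    """Return one boolean per record: True if the record repeats an earlier
    (id, asmt, period) key, i.e. it is treated as a review.
    Assumes records are already in chronological order."""
    seen = set()
    flags = []
    for rec in records:
        key = (rec["id"], rec["asmt"], rec["period"])
        flags.append(key in seen)
        seen.add(key)
    return flags


def summarise_reviews(records):
    """Count duplicate records and unique identifiers with at least one review."""
    flags = flag_reviews(records)
    ids_with_review = {rec["id"] for rec, dup in zip(records, flags) if dup}
    return {
        "n_records": len(records),
        "n_duplicates": sum(flags),
        "n_ids": len({rec["id"] for rec in records}),
        "n_ids_with_review": len(ids_with_review),
    }


# Hypothetical toy data: child "a" has its 2-month comm assessment repeated.
toy_records = [
    {"id": "a", "asmt": "comm", "period": "2m"},
    {"id": "a", "asmt": "comm", "period": "2m"},  # flagged as a review
    {"id": "a", "asmt": "fine", "period": "2m"},
    {"id": "b", "asmt": "comm", "period": "2m"},
]
summary = summarise_reviews(toy_records)
print(summary)

# Percentages quoted in the text, recomputed from the reported counts.
print(round(100 * 983 / 13342, 2))    # share of identifiers with a review
print(round(100 * 6310 / 147268, 2))  # share of duplicate records
```

Swapping the key for ParObTerm would be a one-line change to `key`, which is one way the 114-record discrepancy mentioned above could be inspected.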

The following table contains the counts of duplicate records by type of assessment (comm to prob are the ASQ-3 sub-domains; SE stands for social-emotional).

##           
##                0     1     2     3
##   comm     26559  1148    45     0
##   fine     26238   826    21     0
##   gros     27274  1118    30     0
##   pers     26201   978    36     0
##   prob     26153   946    36     0
##   SE        6457   250    15     0
##   WellComm  2382   444    99    12

or, in relative terms (row-wise conditional distributions)

##           
##                0     1     2     3
##   comm     95.70  4.14  0.16  0.00
##   fine     96.87  3.05  0.08  0.00
##   gros     95.96  3.93  0.11  0.00
##   pers     96.27  3.59  0.13  0.00
##   prob     96.38  3.49  0.13  0.00
##   SE       96.06  3.72  0.22  0.00
##   WellComm 81.10 15.12  3.37  0.41
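The relative table is obtained by dividing each row of counts by its row total. A minimal sketch that reproduces the percentages from the counts in the first table (values copied from it), rounded to two decimals as in the output above; it also computes the share of each row falling in columns 1 to 3.

```python
# Counts copied from the table of duplicate records (columns 0, 1, 2, 3).
counts = {
    "comm":     [26559, 1148, 45, 0],
    "fine":     [26238,  826, 21, 0],
    "gros":     [27274, 1118, 30, 0],
    "pers":     [26201,  978, 36, 0],
    "prob":     [26153,  946, 36, 0],
    "SE":       [ 6457,  250, 15, 0],
    "WellComm": [ 2382,  444, 99, 12],
}


def row_percentages(table):
    """Row-wise conditional distribution, in percent, rounded to 2 decimals."""
    out = {}
    for name, row in table.items():
        total = sum(row)
        out[name] = [round(100 * c / total, 2) for c in row]
    return out


pct = row_percentages(counts)
for name, row in pct.items():
    print(f"{name:<9}", " ".join(f"{v:6.2f}" for v in row))

# Share of each row falling in columns 1 to 3, i.e. outside column 0.
outside_zero = {
    name: round(100 - 100 * row[0] / sum(row), 2) for name, row in counts.items()
}
print(outside_zero)
```

In R, the same table would typically come from `prop.table(tab, margin = 1) * 100` on the count table.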

Notably, WellComm assessments present a much higher proportion of duplicates than the others: 18.90% of WellComm records fall in columns 1 to 3, against 3.13% to 4.30% for the other assessment types.

The following plot gives an overall view of the average ASQ-3 score over calendar time (in days, between 2015-10-08 and 2021-03-30), by sub-domain (comm, fine, gros, pers, prob), among individuals with:

  • no review (red)
  • at least one review, scores at the first assessment (black)
  • at least one review, scores at subsequent assessments (black)

The dashed blue line denotes the average threshold for a PASS over time (why is it moving? Rachel, any news on this?). The grey bands reflect the amount of information available at each time point to estimate the moving average. What this representation does not show is variation by stage of child development: at each time point, all test scores for that sub-domain recorded around that date are averaged across all children, so some may be taking the 2-month questionnaire and others the 36-month one.
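The smoother used for the plot is not stated here. As a minimal sketch, assuming a centred window of ±`half_window` days (a hypothetical parameter) and a normal-approximation band of ±1.96 standard errors, a moving average whose band widens where fewer scores are available could be computed as follows. The `(day, score)` input format is also an assumption.

```python
import math
from statistics import mean, stdev


def moving_average(points, grid, half_window=30):
    """For each day in `grid`, average the scores whose day lies within
    +/- half_window days and return (day, mean, lower, upper, n).
    The band is mean +/- 1.96 * standard error, so it widens when fewer
    observations fall in the window. Days with fewer than 2 observations
    are skipped, since a standard deviation cannot be computed."""
    out = []
    for day in grid:
        window = [s for d, s in points if abs(d - day) <= half_window]
        n = len(window)
        if n < 2:
            continue
        m = mean(window)
        se = stdev(window) / math.sqrt(n)
        out.append((day, m, m - 1.96 * se, m + 1.96 * se, n))
    return out


# Hypothetical toy data: (days since 2015-10-08, sub-domain score).
toy_points = [(0, 50), (10, 60), (20, 55), (100, 40), (105, 45)]
smoothed = moving_average(toy_points, grid=[10, 100], half_window=15)
for row in smoothed:
    print(row)
```

The window around day 100 contains only two scores, so its band is wider relative to its spread than the one around day 10, mirroring how the grey bands respond to the amount of information.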

The plot above can be used to gain insight into temporal trends in scores over the study window, by sub-domain and review status (never had a review, at least one review). For example, among those who did not need a review (in red), the average score appears roughly constant over time for comm, pers and prob, while it decreased for fine and increased for gros. Again, different stages of development are pooled together.

Another potentially useful view of the data is obtained by defining time by measurement occasion (at 2 months, 18 months, etc.) rather than by calendar date. In this way, the measurements for the cohort are represented on a timescale with a common “time zero” for everyone, coinciding with each child’s first measurement (usually at 2 months). This overcomes the problem of mixed developmental stages, but pools together questionnaires that are not contemporary (not a problem per se, but something to take into account when evaluating a population intervention at a given point in time). The plot below has the same structure as the previous one; the only difference is the time scale.
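A minimal sketch of the change of time scale, assuming each record is a hypothetical `(child_id, assessment_date)` pair: every date is re-expressed as days since that child’s earliest measurement, so the first measurement becomes time zero for everyone. Grouping these values into measurement occasions (2 months, 18 months, etc.) would be a further step not shown here.

```python
from datetime import date


def time_since_first(records):
    """Map (child_id, assessment_date) pairs to (child_id, days since that
    child's first assessment). The earliest record per child gets 0,
    regardless of the order in which records appear."""
    first = {}
    for child, d in records:
        if child not in first or d < first[child]:
            first[child] = d
    return [(child, (d - first[child]).days) for child, d in records]


# Hypothetical toy data; child "b" is deliberately listed out of order.
visits = [
    ("a", date(2016, 1, 10)),
    ("a", date(2017, 3, 15)),
    ("b", date(2018, 6, 1)),
    ("b", date(2018, 5, 1)),
]
print(time_since_first(visits))
```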

Once again, the thresholds do not appear to be constant, which needs to be investigated (possibly an artifact of how I am grouping?). The increasingly wide grey bands suggest that less information is available about assessments of older children. I will not delve into the interpretation of these plots; they are only meant to show what could be done. The next graph ignores the stratification by review status, and could be used to provide an overall assessment of developmental trends in the study window, as measured by ASQ-3, by sub-domain.

2. Time gap

The existence of the reviewing process, together with the fact that we are looking at a complex system over time, warrants at least a look at how regular the assessments are and how closely they adhere to the planned schedule. The following graph is a visualisation I developed for research on breast cancer screening programmes, which I feel would be very useful here.

For each individual and sub-domain, I compute the time distance (here in months, but any unit of time would do) between the first assessment within that category and each subsequent one. This yields a set of observations whose frequency distribution can then be visualised with, for example, a histogram. The plot below shows the distribution of time from first assessment, in months, by sub-domain. Note that I am excluding the zeros, as they would represent the distance of the first assessment (usually at 2 months of age) from itself. The apparent frequency mass at 0 is simply a graphical artifact of the binning: the first bin also contains small positive gaps.
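The computation described above can be sketched as follows, assuming records are hypothetical `(child_id, sub_domain, assessment_date)` triples and using an average month length of 365.25 / 12 days (also an assumption). Zero gaps are dropped, and the simple fixed-width histogram shows why mass can still appear at 0: a review a couple of weeks after the first assessment falls in the bin starting at 0.

```python
from collections import defaultdict
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # average month length (assumption)


def gaps_from_first(records):
    """records: iterable of (child_id, sub_domain, assessment_date).
    Return the time in months between the first assessment of each
    (child_id, sub_domain) group and every later one, excluding zero gaps."""
    groups = defaultdict(list)
    for child, sub_domain, d in records:
        groups[(child, sub_domain)].append(d)
    gaps = []
    for dates in groups.values():
        first = min(dates)
        for d in dates:
            months = (d - first).days / DAYS_PER_MONTH
            if months > 0:  # drop the first assessment measured against itself
                gaps.append(months)
    return gaps


def histogram(values, width=3.0):
    """Count values in bins [k * width, (k + 1) * width), keyed by bin start.
    Small positive gaps land in the bin starting at 0."""
    counts = defaultdict(int)
    for v in values:
        counts[int(v // width)] += 1
    return {k * width: counts[k] for k in sorted(counts)}


# Hypothetical toy data: a comm review two weeks after the first assessment,
# then a follow-up one year later; fine has a single assessment (no gaps).
toy_assessments = [
    ("a", "comm", date(2016, 1, 1)),
    ("a", "comm", date(2016, 1, 15)),
    ("a", "comm", date(2017, 1, 1)),
    ("a", "fine", date(2016, 1, 1)),
]
gaps = gaps_from_first(toy_assessments)
print(gaps)
print(histogram(gaps, width=3.0))
```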

This visualisation is aimed at investigating the overall timing of the “assessment machine”. A cyclic pattern in when most ASQ-3 questionnaires are administered is to be expected given the planned timetable, as is some variability around it (questionnaires cannot possibly be administered at exactly the same intervals to everybody). The plot also suggests, as the increasingly wide grey uncertainty bands in the score trends of the previous section already did, that we have fewer and fewer measurements as we look at older children.

3. Do we see any difference when stratifying by

a) where (in Stockport)?

b) which sub-domains were not a pass?

c) the score levels?
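As a possible starting point for these comparisons, a minimal sketch of the proportion of records that are reviews within each level of a stratifying variable. The field names `is_review`, `sub_domain` and `locality` are hypothetical; stratifying by score level would first require banding the scores into a categorical field, which is not shown.

```python
from collections import defaultdict


def review_rate_by(records, stratum):
    """Percentage of records flagged as reviews within each level of `stratum`.
    records: list of dicts with a boolean 'is_review' field plus the
    stratifying field (e.g. a hypothetical 'locality' or 'sub_domain')."""
    totals = defaultdict(int)
    reviews = defaultdict(int)
    for rec in records:
        level = rec[stratum]
        totals[level] += 1
        reviews[level] += int(rec["is_review"])
    return {level: 100 * reviews[level] / totals[level] for level in totals}


# Hypothetical toy data.
toy_flagged = [
    {"sub_domain": "comm", "locality": "north", "is_review": False},
    {"sub_domain": "comm", "locality": "north", "is_review": True},
    {"sub_domain": "fine", "locality": "south", "is_review": False},
    {"sub_domain": "fine", "locality": "south", "is_review": False},
]
print(review_rate_by(toy_flagged, "sub_domain"))
print(review_rate_by(toy_flagged, "locality"))
```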