
Final paper, due May 22nd

Introduction

Looking at a shopping list or using a smartphone to recall a birthday are examples of sampling information in the world to save ourselves the effort of remembering it (Risko & Gilbert, 2016). Whether we rely more on external sources of information or on our own memory depends on the effort each option requires, as we tend to minimize overall effort (Draschkow et al., 2021). This is the sampling-remembering trade-off: we tend to look more (sample the environment) when remembering is ‘costly’ (e.g., when a shopping list is difficult to memorize) and to remember more when sampling is costly (e.g., when the shopping list is difficult to access). Children demonstrate this trade-off when counting on their fingers as they learn arithmetic. Yet little research has been devoted to characterizing this trade-off in young children. Empirical research on children’s spontaneous use of memory when the to-be-remembered information is readily available in the external world is now more important than ever. There are concerns that excessive reliance on external memory (e.g., information on digital devices) can be harmful to our own internal memory. Understanding how externalizing memory affects children’s decisions to use internal memory will be invaluable as internet access and mobile devices become increasingly accessible to young children.

Children have a lower working memory capacity than adults (Cowan, 2016), making it all the more advantageous for them to ‘offload’ information from internal memory to an external resource. Previous research examining children’s use of external memory in experimental tasks has demonstrated that children do, in fact, take advantage of external resources to relieve the burden on internal memory (Armitage et al., 2020; Armitage & Redshaw, 2022). In a recent study (Kenderla & Kibbe, 2023), 8-to-10-year-olds played a card-matching game in which they had to remember which set of face-down cards satisfied a rule. The researchers manipulated the delay that children had to wait before they could view a face-down card. When the delay was shorter, children chose to view more cards, thus relying more on external resources. Although children traded off between external resources and internal memory in this study, the paradigm was too complex for preschool-aged children. Thus, it remains uncertain when this trade-off between external and internal resources emerges. Most research on spontaneous offloading in young children fails to find evidence for it in children under five years of age. For example, Armitage and Redshaw (2022) had 4-to-11-year-old children search for rewards using a map that the children could physically rotate (as opposed to mentally rotate). Although 5-year-olds chose to offload the rotation task onto the physical map, this strategy was more consistent in children aged six years and older.

One theory for why young children fail to strategically utilize external resources is that young children have an underdeveloped ability to monitor their own cognitive processes and skills (i.e., metacognition). Research on metacognition in early childhood often finds that children under five years of age tend to make particularly inaccurate judgments about their performance on cognitive tasks. When asked to estimate how well they did in a dot-counting task, 5-year-olds were significantly worse than 7-year-olds at correctly evaluating their own performance (O’Leary & Sloutsky, 2018). It is also possible that young children have different objectives than older children when completing cognitive tasks. In an experiment where children could choose between an easier and a more difficult counting task, O’Leary and Sloutsky (2017) found that 5- and 7-year-olds did not consistently choose the easier task unless specifically instructed to do so. This could reflect a failure of metacognition, where children are unaware that one condition is harder and leads to worse performance, or it could reflect a difference in task objectives, where children know that one condition is harder but are less driven by performance failures to choose the easier option. Despite this, other research has found that children at this age and younger do seem to recognize the utility of external resources, such as reminders, when the external resource leads to better performance or more rewards. In one study, 3-year-olds were able to spontaneously use an external reminder about the location of a reward, although transferring this strategy to a novel-but-similar task only occurred after age 4 (Armitage et al., 2023; Bulley et al., 2020). Additionally, when an effort-minimizing option could lead to better performance, 5-year-olds were able to spontaneously choose the option that reduced the load on internal resources, although this behavior was more consistent in older children (Armitage & Redshaw, 2022). Although metacognition is likely important for recognizing the utility of the external world as a resource, children might be able to use offloading strategies when the experiment involves a cost-dependent trade-off.

Liang et al. (2025) used the same paradigm as the current study to examine whether 5-to-8-year-old children would adapt their use of internal versus external resources in the face of a cost-dependent trade-off. In this gamified, naturalistic paradigm, children shop for items with a shopping list. The list and the store are not visible simultaneously, but children can toggle between them. The manipulated cost is the delay that occurs when children decide to check the list. There were three conditions, each with a different cost. In the “No-delay” condition, there was no (0 s) delay to see the list. In the “Long-delay” condition, there was a 4-second delay. In the “One-shot” condition, there was no delay to access the list, but children had only one opportunity to access it. Children in this study checked their shopping list more frequently and spent less time studying it in the No-delay condition than in the Long-delay and One-shot conditions, demonstrating an adaptive trade-off between using internal versus external resources. When asked which of the No-delay and Long-delay conditions was easier and which they preferred, most children said that the No-delay condition was easier and that they preferred it, suggesting that children could metacognitively assess the difficulty caused by the delay cost in the Long-delay condition. In the current study, we aim to replicate these effects in a slightly younger age group (4.5-7-year-olds) while controlling for response bias and guessing.

The Current Study

Here, we use a novel, gamified paradigm, first introduced in Liang et al. (2025), to test how the costs associated with external resources affect how many items children decide to memorize. Previous studies that assessed memory performance may have been affected by differences between individuals’ response biases (Brady et al., 2021; Macmillan & Creelman, 2004). The current study avoids this by using a forced-choice method, which allows for the most valid assessment of memory performance. Forced-choice measures are theory-neutral with respect to the distribution of memory signals and allow the better-remembered item to always win over the less-remembered item. Additionally, we employ a measure of memory performance that is appropriate for the forced-choice method and accounts for guessing, which has been shown to have an impact on forced-choice measures of performance (Vuorre & Metcalfe, 2022). We also wanted to compare how much children decide to remember when they can check the list again with how much they can remember when they cannot check the list again. To do this, we added one trial to each condition in which children have only one opportunity to check the list and must make a selection from each forced-choice pair, so that they use everything they remember from the shopping list. We also aim to investigate the emergence of the strategic trade-off between sampling the list and remembering items by including children as young as 5.5 years up to 7 years.

Methods

We used a modified version of the Shopping Game used in Liang et al. (2025). The Shopping Game is a tablet-based, naturalistic, and child-friendly memory paradigm designed in PsychoPy (Peirce et al., 2019). In this game, participants were asked to pick several items from a store based on a shopping list. The store and the list were not visible on the screen simultaneously; instead, participants had to toggle between them by tapping the store icon (an image of an arrow) or the list icon (an image of an adult+cart), respectively. There was no time limit on the task and, crucially, children were told that they were allowed to go back to the list as many times as they wanted. Correctly selected items were crossed off on the 10-item list (to make it clear that they did not need to be selected again).

Children were required to find only 6 out of the 10 items on the list in each trial, but this information was not explicitly communicated to them. The reasoning behind this was to ensure that children always had a relatively large number of unselected items remaining on the list throughout the trial. As motivation and feedback, every correct selection was rewarded with a star and accompanied by a pleasant sound, while every incorrect selection resulted in losing a star and an unpleasant sound. Children were told that the stars indicated how many stickers they would get at the end of the experiment (but, in reality, they could get as many stickers as they wanted at the end of the session). Once the child had picked at least 6 correct items in the store, the trial ended the next time they tapped the list icon. This kept children from inferring that they only needed to select 6 items on each trial, and ensured that the number of items they could memorize was not limited as they approached the required number of items. When the trial was over, the number of stars children earned on that trial appeared on the screen and was read aloud by the experimenter.

Each trial began with the store. Here, participants saw 20 food-item icons grouped into two-alternative forced-choice (2-AFC) pairs, with each pair on a little ‘shelf’ in the store and every pair containing one item from the shopping list and one distractor item that was not on the shopping list. This means every trial had 10 possible response choices. The items in the store were arranged and paired together based on category (snacks, ingredients, desserts, vegetables, and fruits) and remained in the same pair and location for each trial to minimize the time and effort spent searching.

After this initial store exposure, participants pressed the list icon to start the transition to the shopping list. The shopping list contained 10 items. The set of items on the list was randomized for each trial. The child was free to study the list for as long as they liked and to attempt to remember as many items as they liked. Next to the list was the store icon, which switched the display back to the store (and the list would go away). On each visit to the store (after the initial store exposure, i.e., after having seen their shopping list), the child was free to select as many items as they wanted, but could only select one item in each 2-AFC pair. After they selected an item, that item and its pair were grayed out for the duration of that visit. (If and when a child returned to the store for another visit, all grayed-out 2-AFC pairs were restored.) Selecting the same item twice in a trial, even if that item had been on the list (and was now crossed out), was counted as incorrect. If a child selected an incorrect item in a 2-AFC pair, sampled the list, and then selected the correct item in that 2-AFC pair after returning to the store, the later selection was counted as correct.

The key manipulation in the Shopping Game was the accessibility of the external resource, which was determined by the delay between tapping on the list icon and the appearance of the list.

There were three conditions: a ‘low-cost’, 0s delay (No-Delay condition), a ‘high-cost’, 4s delay (Long-Delay condition), and a Total-response condition where there was a 0s delay to see the list, but after children made their first trip from the list to the store, they could not return to the list a second time. In the No-Delay and Long-Delay conditions, children were free to attempt as many of the 2-AFC shelves as they liked (free-response). However, in the Total-response condition (the final trial in each block), children had just one opportunity to sample the list and then had to select an item from each of the 2-AFC pairs (total-response). There was only a single Total-response trial per blocked condition, and it was always the last trial.

The differences between the three conditions were explained to the children with a plausible narrative. In the No-delay condition, children were in a small convenience store where their parent was always close by. In the Long-delay condition, children were in a large supermarket where the parent needed to take a long walk to reach the child with the list. Finally, in the Total-response condition, children were told that the store was “closing”, that they could not return to the list a second time, and that they would have to do their best to pick one item from each shelf.

In addition, as a visual reminder cue, a picture of a small convenience store (No-delay) or a large supermarket (Long-delay) was presented before each trial. A visual reminder was also present during each trial. In the No-delay condition, the distance between the adult and child icons was small. In the Long-delay condition, the distance was large (and, during the 4s delay, there was an accompanying animation of the list icon (adult+cart) moving toward the icon of the child). In the Total-response condition, a red “closing” sign appeared over the area of the screen where the list usually appears, once children had made their first trip from the list to the shelf. The Total-response trials therefore allow us to assess whether children used the entire contents of their memory when they made a trip between the list and the shelf. They also provide a way to estimate how children respond based on memories in which they may be less confident.

Each condition was organized into a block with one initial practice trial followed by three free-response trials, followed by one Total-response trial. Practice trials were identical to free-response trials, except that the experimenter explained the game as the trial played out, including the meaning of different icons, the reward system, and how to operate the game. There were no practice trials for the Total-response condition, as we wanted that condition to come as a surprise for children. Children participated in all three conditions and the order of the No-delay and the Long-delay conditions was counterbalanced between participants.
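The block structure described above can be written out as a short trial schedule. The following is only an illustrative sketch, not the experiment's actual PsychoPy code: the names `Trial`, `build_block`, and `build_session` are hypothetical, and alternating the block order by participant index is an assumed counterbalancing scheme (the text states only that the order was counterbalanced between participants).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Trial:
    block: str                       # "No-delay" or "Long-delay"
    trial_type: str                  # "practice", "free-response", or "Total-response"
    delay_s: int                     # delay before the list appears, in seconds
    max_list_visits: Optional[int]   # None means the list can be visited freely


# Delay costs taken from the text: 0 s (No-delay) and 4 s (Long-delay).
BLOCK_DELAY_S = {"No-delay": 0, "Long-delay": 4}


def build_block(block: str) -> List[Trial]:
    """One practice trial, three free-response trials, one Total-response trial."""
    delay = BLOCK_DELAY_S[block]
    trials = [Trial(block, "practice", delay, None)]
    trials += [Trial(block, "free-response", delay, None) for _ in range(3)]
    # Total-response trial: 0 s delay, but only one visit to the list.
    trials.append(Trial(block, "Total-response", 0, 1))
    return trials


def build_session(participant_index: int) -> List[Trial]:
    """Assumed counterbalancing: odd-numbered participants start with Long-delay."""
    order = ["No-delay", "Long-delay"]
    if participant_index % 2 == 1:
        order.reverse()
    return [trial for block in order for trial in build_block(block)]


if __name__ == "__main__":
    for trial in build_session(participant_index=0):
        print(trial)
```

Under these assumptions, each child completes ten trials in total, six of which are scored free-response trials and two of which are Total-response trials.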

Measures and analyses

We measured two variables in every trial. We focused on the first trip in every trial to allow a fair comparison between conditions.

Study time: the amount of time spent studying the shopping list of to-be-remembered items per trip, which reflects the effort they exerted to store items in working memory (i.e., reliance on internal resources). Study time was averaged over all first trips across all trials for each condition for a child.

Memory usage (MU): an estimate of the amount of information (remembered items from the shopping list) the child applies on each (first) trip. This was calculated based on a (corrected for guessing) percent correct measure from the response period. Each of the 10 shelves presents a 2-AFC task on which the child can either be correct or incorrect. The chance level at this task is 50%. MU, then, is defined as:

MU = (((numCorrect/numAttempted)-0.5)/0.5)*numAttempted

On free-response trials, the child was free to attempt to choose items from however many (0-10) shelves they wanted. However, on Total-response trials, the child was required to attempt all 10 shelves. MU was calculated for each child, within a condition, for each of these types of trials, yielding two measures: MU(free) and MU(total), respectively. By design, in free-response trials, MU(free) represents a mixture of both how much a child remembers and also their response bias (liberal or conservative) at applying that information when responding. For example, a child with a ‘liberal’ bias will tend to attempt more shelves, acting on information even under conditions of lower confidence, while a child with a conservative bias will tend to attempt fewer shelves, only making attempts under conditions of high confidence. In total-response trials, however, the child is obliged to respond to all of the 10 2-AFC shelves, no matter their response bias. In this way, MU(total) is our estimate of true (bias-free) underlying memory use in a particular condition (i.e., the answer to the question ‘How many items do children remember on each trip’?). The particular case of MU(total) in the context of the One-shot condition, then, is our best estimate of maximal working memory capacity for each child in the context of our paradigm (i.e., the answer to the question, ‘How many items are children capable of remembering on each trip?’).
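As a worked illustration of the MU formula, the sketch below computes MU for a single trip. The function name `memory_usage` and the convention of returning 0 when no shelves are attempted are assumptions made for this example; the text does not say how trips with zero attempts were handled.

```python
def memory_usage(num_correct: int, num_attempted: int) -> float:
    """Guessing-corrected Memory Usage (MU) for one trip.

    Implements MU = (((numCorrect/numAttempted) - 0.5) / 0.5) * numAttempted,
    where 0.5 is the chance level on a 2-AFC shelf.
    """
    if not 0 <= num_correct <= num_attempted:
        raise ValueError("num_correct must be between 0 and num_attempted")
    if num_attempted == 0:
        # Assumption: a trip with no attempts contributes an MU of 0.
        return 0.0
    proportion_correct = num_correct / num_attempted
    return ((proportion_correct - 0.5) / 0.5) * num_attempted


if __name__ == "__main__":
    # Free-response trip: 4 shelves attempted, all correct.
    print(memory_usage(4, 4))
    # Total-response trip: all 10 shelves attempted, 5 correct (pure guessing).
    print(memory_usage(5, 10))
    # Below-chance performance yields a negative MU.
    print(memory_usage(1, 4))
```

Algebraically, the formula reduces to the number of correct selections minus the number of incorrect selections (2 × numCorrect − numAttempted), so a child who guesses on every shelf has an expected MU of 0, and below-chance performance yields a negative score.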

In accordance with our hypotheses, we expect that Study Time will be shortest and MU(free) will be smallest in the No-delay condition, that Study Time will be longer and MU(free) will be larger in the Long-delay condition, and that MU(total) across all of the Total-response trials will be higher than MU(free) across all of the free-response trials.

We used R (R Core Team, 2023) and RStudio (RStudio Team, 2015) for data processing, analyses, and data visualization. We used the tidyverse package (Wickham et al., 2019) for data processing and visualization. We ran our mixed model analyses with the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) packages. We conducted our model comparison and selection with the car (Fox & Weisberg, 2018), MuMIn (Bartoń, 2023), and performance (Lüdecke et al., 2021) packages. We used the emmeans package (Lenth, 2023) for post hoc tests.

We analyzed the data with generalized linear mixed models. We visually inspected the distribution of each variable to determine the appropriate distribution family and link function for the models. Study Time was fit with an inverse Gaussian distribution and an identity link function (Lo & Andrews, 2015), with condition (No-delay/Long-delay) as a fixed effect. MU(total) was fit with a Gaussian family and an identity link function, with condition as a fixed effect. All models included a by-participant random intercept.

Results

Main effects. Study time showed a significant effect of condition (χ² = 8.54, p = 0.003): children studied the list longer when the access cost was higher. Study time was significantly longer in the Long-delay condition (M = 6.43, SD = 3.63) than in the No-delay condition (M = 5.31, SD = 2.48).

Memory Usage showed a significant effect of condition, although the p-value was close to the .05 threshold (χ² = 3.92, p = 0.048): children used more memory when the access cost was higher. Memory Usage was larger in the Long-delay condition (M = 1.85, SD = 1.74) than in the No-delay condition (M = 1.60, SD = 0.94).

Memory Usage in the Total-response trials (M = 1.99, SD = 2.74) was numerically larger than Memory Usage in the free-response trials (M = 1.72, SD = 0.85), but this difference was not significant (χ² = 0.87, p = 0.35). This suggests that children used most of the contents of their memory when they made a trip between the list and the shelf.

Results summary. The results of the current study demonstrated that 5.5-7-year-old children were sensitive to the delay cost of accessing an external resource (a shopping list) and traded off its use against internal resources (working memory). When there was a 4-second delay to access their shopping list, children spent longer studying the list than when there was no delay. Children also had a slightly higher Memory Usage (MU) score when there was a delay to access the list than when there was no delay. When children had only one chance to look at the list and had to exhaust their internal memory storage by selecting an item from every 2-AFC pair (Total-response trials), their MU score was not substantially higher than when they could return to the list more than once, suggesting that children use the majority of their internal memory on each trip from the list to the store.

Discussion

In this project, 5.5-to-7-year-old children were asked to pick items from a store based on a shopping list. Importantly, the store and the list were not visible simultaneously; instead, participants could toggle between them by tapping an icon. The key manipulation was the accessibility of the list. There were three conditions: a No-Delay condition, where there was no delay to toggle between the store and the list; a Long-Delay condition, where there was a 4-second lag; and a Total-response condition, where children would, unbeknownst to them, have only one chance to see the list. We found that children studied the list longer and remembered more items when access costs were higher. We also found that children use the majority of their memory contents on each trip between the list and the shelf. Our results suggest that children are able to trade off between sampling the environment and using their working memory from around 5 years of age.

Previous research involving trade-offs between internal and external memory has focused on adults (Draschkow et al., 2021) or has involved paradigms that would be too complex or difficult for young, preschool-aged children (Kenderla & Kibbe, 2023). Using a simpler paradigm, we found that children as young as five trade off between storing information in internal memory and sampling the environment for information in a cost-dependent manner. Future research could further investigate the behavioral effects of increasing the cost of accessing external information by increasing delay times or by using physical costs or punishments (e.g., removing stars for each trip to the list). Future research could also seek to replicate the current findings in a more diverse sample by incorporating the paradigm into a large-scale online study.

One limitation of the current study is the generalizability of the sample, as the participants were mostly white and all children were sampled from the Northeast U.S. Another limitation was the small number of trials per child (most children contributed 3 trials per condition), which made it infeasible to include random slopes in our models. We had so few trials because the experiment already took fifteen to twenty minutes per child. Since the experiment was conducted in children’s museums, we did not want to take up more of the families’ time, and we were mindful that children might become bored if the study lasted longer than twenty minutes.

Conclusion. Overall, children trade off between sampling the environment and using their working memory, starting around 5 years of age. When it’s up to them, children sometimes choose to remember less than they can, but they go back to the list when their memory is exhausted. In a future study, we will investigate the role of schooling in this trade-off by testing children in a longitudinal study at 5 years of age (before formal schooling) and again at 6 years (after one year of formal school).

References

Armitage, K. L., Suddendorf, T., Bulley, A., Bastos, A. P., Taylor, A. H., & Redshaw, J. (2023). Creativity and flexibility in young children’s use of external cognitive strategies. Developmental Psychology, 59(6), 995.

Armitage, K. L., Bulley, A., & Redshaw, J. (2020). Developmental origins of cognitive offloading. Proceedings of the Royal Society B, 287(1928), 20192927.

Armitage, K. L., & Redshaw, J. (2022). Children boost their cognitive performance with a novel offloading technique. Child Development, 93(1), 25-38.

Bartoń, K. (2023). MuMIn: Multi-model inference [R package]. https://CRAN.R-project.org/package=MuMIn

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48.

Brady, T. F., Robinson, M. M., Williams, J. R., & Wixted, J. (2021). Measuring memory is harder than you think: How to avoid problematic measurement practices in memory research. https://doi.org/10.31234/osf.io/qd75k

Bulley, A., McCarthy, T., Gilbert, S. J., Suddendorf, T., & Redshaw, J. (2020). Children devise and selectively use tools to offload cognition. Current Biology, 30(17), 3457-3464.

Cowan, N. (2016). Working memory maturation: Can we get at the essence of cognitive growth?. Perspectives on Psychological Science, 11(2), 239-264.

Draschkow, D., Kallmayer, M., & Nobre, A. C. (2021). When natural behavior engages working memory. Current Biology, 31(4), 869-874.

Fox, J., Weisberg, S., Adler, D., Bates, D., Baud-Bovy, G., Ellison, S., … & Heiberger, R. (2012). Package ‘car’. Vienna: R Foundation for Statistical Computing, 16(332), 333.

Kenderla, P., & Kibbe, M. M. (2023). Explore versus store: Children strategically trade off reliance on exploration versus working memory during a complex task. Journal of Experimental Child Psychology, 225, 105535.

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1-26.

Lenth, R. V. (2023). emmeans: Estimated marginal means, aka least-squares means [R package]. https://CRAN.R-project.org/package=emmeans

Liang, Y., Blaser, E., Yi, J. Y., Sai, L., & Kaldy, Z. (2025). The extended mind in young children: Cost-dependent trade-off between external and internal memory. Psychological Science, 9567976241306424.

Lo, S., & Andrews, S. (2015). To transform or not to transform: using generalized linear mixed models to analyse reaction time data. Frontiers in Psychology, 6, 1171.

Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., & Makowski, D. (2021). performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software, 6(60), 3139. https://doi.org/10.21105/joss.03139

O’Leary, A. P., & Sloutsky, V. M. (2018). Components of metacognition can function independently across development. Developmental Psychology. Advance online publication.

O’Leary, A., & Sloutsky, V. (2017). Five-Year-Old Children Transfer a Metacognitive Strategy to a Novel Task. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 39).

Risko, E. F., & Gilbert, S. J. (2016). Cognitive offloading. Trends in cognitive sciences, 20(9), 676-688.

Vuorre, M., & Metcalfe, J. (2022). Measures of relative metacognitive accuracy are confounded with task performance in tasks that permit guessing. Metacognition and Learning, 17(2), 269-291.

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D. A., François, R., … & Yutani, H. (2019). Welcome to the Tidyverse. Journal of open source software, 4(43), 1686.