Overview

The paradigm was a lexical decision task in which participants were primed in three conditions. The goal was to investigate whether a sublexical semantic element whose meaning is unrelated to the overall meaning of the character is primed (slowing the response) by a character related to it. As a reference, take the character meaning REJECT, which contains a sublexical element (a phonetic) that means HUGE. In the “overall” condition, the prime was related to the meaning of the entire target character (like NEGATE for REJECT). In the “semantic” condition, the prime was related only to the sublexical element, whose meaning is unrelated to that of the whole character (for REJECT we used BIG because it is related in meaning to HUGE). In the control condition, the prime was unrelated to all aspects of the target character (EVERY for REJECT). Each participant saw 30 prime-target pairs per condition. To avoid habituation to the task, an equal number of false characters were included as foils. As a result, participants performed 180 trials: 30 items in each of the three conditions, with each prime shown once with a real character and once with a nonword foil character.
All data shown are based on the full dataset; no outliers have been designated at this stage. However, some plots limit the y-axis to less than the full range of response times so that differences are easier to see.
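The trial counts in the design can be checked with a short sketch. The condition labels follow the text, but the field names (condition, item, target_type) are hypothetical and chosen only for illustration.

```python
from itertools import product

CONDITIONS = ["overall", "semantic", "unrelated"]  # prime conditions from the text
ITEMS_PER_CONDITION = 30                            # prime-target pairs per condition
TARGET_TYPES = ["real", "foil"]                     # each prime appears with both


def build_trials():
    """Return one dict per trial: every prime paired with a real and a foil target."""
    return [
        {"condition": cond, "item": item, "target_type": ttype}
        for cond, item, ttype in product(
            CONDITIONS, range(1, ITEMS_PER_CONDITION + 1), TARGET_TYPES
        )
    ]


trials = build_trials()
n_real = sum(t["target_type"] == "real" for t in trials)
print(len(trials), n_real)  # 180 trials in total, 90 of them with real characters
```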

Using RT (msec) as an index of participant engagement and plotting it against trial order, we see a slight downward trend over trials across participants, suggesting that the task became easier as it went on.
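One way to quantify a trend like this is the least-squares slope of RT on trial order; a negative slope corresponds to responses getting faster over the session. The sketch below is only an illustration: the values are made up, and the report does not say how (or whether) the trend line in the plot was fitted.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Made-up values for illustration only; not taken from the dataset.
trial_order = [1, 2, 3, 4, 5, 6]
rt_msec = [820, 805, 790, 800, 770, 760]

slope = ols_slope(trial_order, rt_msec)
print(f"change in RT per trial: {slope:.2f} msec")
```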

Outliers

Accuracy

There wasn’t a great deal of variability in accuracy by participant, and we see a bit of a ceiling effect.

There is a discontinuity in the distribution, with several participants falling outside the bulk of the distribution on accuracy.

RT

The distribution of RTs exhibits a very long tail. The data below show all observations, not grouped by participant. For now, we will treat observations greater than 3500 msec as outliers, since responses this slow likely reflect low-quality trials. At the other extreme, many responses were so fast that they may not reflect viable trials. We will treat trials faster than 400 msec as outliers for now. Dividing lines for outliers are shown in red.

Here is the distribution of RTs less than 500 msec (i.e., the far left side of the distribution shown above).

The remaining plots include only observations that fall between 400 and 3500 msec, even for the item and participant outlier analyses. In addition, the RT data below include only accurate trials.
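The trimming rule can be sketched as a simple filter. The cutoffs come from the text; the key names (rt, correct) and the trials themselves are made up for illustration. The bounds are treated as inclusive, on the assumption that only RTs strictly below 400 msec or strictly above 3500 msec count as outliers.

```python
RT_MIN_MSEC = 400    # trials faster than this are treated as outliers
RT_MAX_MSEC = 3500   # trials slower than this are treated as outliers


def keep_trial(trial):
    """True for accurate trials whose RT lies within the trimming window."""
    return trial["correct"] and RT_MIN_MSEC <= trial["rt"] <= RT_MAX_MSEC


# Made-up trials for illustration; `rt` and `correct` are hypothetical key names.
trials = [
    {"rt": 350, "correct": True},    # dropped: too fast
    {"rt": 640, "correct": True},    # kept
    {"rt": 910, "correct": False},   # dropped: inaccurate
    {"rt": 4200, "correct": True},   # dropped: too slow
    {"rt": 3500, "correct": True},   # kept: on the (inclusive) upper bound
]

kept = [t for t in trials if keep_trial(t)]
print([t["rt"] for t in kept])  # [640, 3500]
```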

Aggregating RT by participant, we see that the distribution of average response times is relatively continuous, although there are perhaps a couple of participants who are off in their own part of the distribution.

Subjects 74 and 75 seem to be outliers on RT, so we can look at their data more closely.

By-participant average response time is highly correlated with the spread of response times for a given participant: as a participant’s average RT increased, so did the variability of their RTs.
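The relationship between a participant’s average RT and the spread of their RTs can be computed as a Pearson correlation across participants. The sketch below assumes the spread is measured by the standard deviation, which the report does not state explicitly; the participant IDs and RTs are made up.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up RTs (msec) keyed by hypothetical participant IDs.
rts_by_participant = {
    "p01": [600, 640, 620, 610],
    "p02": [700, 780, 720, 760],
    "p03": [900, 1100, 950, 1050],
}

means = [mean(v) for v in rts_by_participant.values()]
sds = [stdev(v) for v in rts_by_participant.values()]  # sample SD as the spread
r = pearson_r(means, sds)
print(f"r between participant mean RT and SD: {r:.2f}")
```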

Item outliers

Targets
Accuracy

Clearly, a few real-character targets are outliers on accuracy, indicating that they were hard for participants to judge as real words.

The two targets that stand out are T19 and T28. T19 is the following character: (image of T19). And this is T28: (image of T28).

From here, we will remove these items (T19 and T28) from analyses.

RT

RT is shown below by item. No obvious outliers are present, but several items show a lot of variability in RT.

We have already seen items T19 and T28, but here is T21: (image of T21), here is T23: (image of T23), and here is T25: (image of T25).

Primes

Primes came in three types (one per condition), each selected to bear a particular relationship to its target: related to the semantic component, related to the overall meaning, or completely unrelated. Specific primes are less likely to be associated with outlier observations on either accuracy or RT, but we can inspect them just to make sure.

The primes O23 (image of O23) and U25 (image of U25) appear as possible outliers on RT, but this isn’t a clear designation.

The only item that sets itself apart is S25. However, an outlier prime is unlikely to bias results systematically, given how briefly primes are presented prior to the lexical decision.

Dependent measures by condition

Accuracy

Targets were either real characters or false characters. For each trial, it was possible for participants to answer correctly for a given item, or incorrectly. We see below that participants found trials with real words to be easier than their foil counterparts.

We see lower accuracy in the semantic condition than in the other two. This is consistent with a theory that predicts a detectable difference between the semantic and overall conditions, with participants biased to be “tricked” in the semantic condition. These data exclude the foil trials. Plots are shown as boxplots. Each box spans the middle 50% of the distribution (the interquartile range), from the first quartile at its lower edge to the third quartile at its upper edge. The whiskers end at the most extreme values not considered outliers. Outliers are shown as points. Middle lines indicate medians.
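The boxplot conventions described above can be made concrete with a small sketch. It assumes the common rule that a point is an outlier when it lies more than 1.5 times the interquartile range beyond the box; the report does not state which rule its plotting software used, and the accuracy values are made up.

```python
from statistics import quantiles


def boxplot_stats(values, whisker_factor=1.5):
    """Summarise values the way a conventional boxplot draws them."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr    # assumed 1.5 * IQR rule
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme non-outlier values
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Made-up per-participant accuracy proportions, for illustration only.
accuracy = [0.97, 0.93, 0.95, 0.90, 0.96, 0.94, 0.60, 0.92, 0.98]
stats = boxplot_stats(accuracy)
print(stats)
```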

The pattern of means is such that accuracy is nearly identical in the unrelated and semantic conditions, and highest in the overall condition.

Conditioning on frequency, we see a ceiling effect in the high frequency condition: judging by the accuracy data, participants did not find those trials difficult. Accuracy is lower overall in the low frequency condition, which we would expect given the relative difficulty of those items. The low frequency condition also seems to drive accuracy down for the semantic and unrelated conditions.

RT data

Having removed the bad items (T19 and T28), the overall condition appears to be the fastest on average (by roughly 10 msec), with no noticeable difference between the semantic and unrelated conditions. Again, the data here are for accurately judged real-word trials only, with RT between 400 and 3500 msec and the two items removed.

Condition    Mean RT (msec)
Overall      763.5136
Semantic     773.4146
Unrelated    774.8707

Here we see the same pattern in barplot form. Error bars represent standard errors of the mean. These data are not modeled; they are descriptive statistics calculated directly from the raw data.
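As a reminder of what the error bars represent, the standard error of the mean is the sample standard deviation divided by the square root of the number of observations. The sketch below uses made-up RTs; whether the report computed the standard error over trials or over participant means is not stated, so the unit of observation here is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Made-up RTs (msec) for illustration; not the study's data.
rt_by_condition = {
    "overall": [700, 760, 740, 800],
    "semantic": [720, 790, 770, 810],
    "unrelated": [730, 780, 760, 820],
}

summary = {cond: mean_and_sem(rts) for cond, rts in rt_by_condition.items()}
for cond, (m, sem) in summary.items():
    print(f"{cond}: mean = {m:.1f} msec, SEM = {sem:.1f}")
```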

RT by frequency

When the data are disaggregated by frequency, we see that the low frequency trials are markedly slower, as expected, and that the semantic condition is indeed the slowest.

RT differences between primetypes

Each prime appears with two targets for each participant: once with a true character and once with a non-character. In theory, if RT for the foils is distributed randomly across the three conditions, the RT difference between these two trial types should differ detectably in the “semantic” condition relative to the others, given the special relationship between the primes and real-character targets in that condition. The plot below shows the average differences in RT between primetypes across the three prime conditions.
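A difference score of this kind can be computed per prime and then averaged within each condition. The sketch below is an assumption-laden illustration: the direction of the difference (foil minus real), the field names, the prime IDs, and the RTs are all made up, and it ignores the participant level for brevity.

```python
from collections import defaultdict
from statistics import mean

# Made-up trials: each prime occurs once with a real target and once with a foil.
trials = [
    {"prime": "S01", "condition": "semantic", "target": "real", "rt": 790},
    {"prime": "S01", "condition": "semantic", "target": "foil", "rt": 850},
    {"prime": "O01", "condition": "overall", "target": "real", "rt": 740},
    {"prime": "O01", "condition": "overall", "target": "foil", "rt": 860},
    {"prime": "U01", "condition": "unrelated", "target": "real", "rt": 770},
    {"prime": "U01", "condition": "unrelated", "target": "foil", "rt": 855},
]

# Collect the real-target and foil-target RT for each prime.
by_prime = defaultdict(dict)
for t in trials:
    by_prime[(t["condition"], t["prime"])][t["target"]] = t["rt"]

# Difference score per prime (foil minus real, an assumed direction),
# then averaged within each prime condition.
diffs = defaultdict(list)
for (condition, _prime), rts in by_prime.items():
    if "real" in rts and "foil" in rts:
        diffs[condition].append(rts["foil"] - rts["real"])

mean_diff = {condition: mean(d) for condition, d in diffs.items()}
print(mean_diff)
```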

Reading skill

Participants filled out a questionnaire about their reading behavior and disposition towards reading, consisting of a number of questions about language behavior and background. A description of the items that aren’t self-evident is provided below.

  • lcnl1 = “Compared to other college students, how much time do you spend reading all types of materials?” (scale from 1 to 7; much less to much more)
  • lcnl2 = “Compared to the reading material of other college students, how complex do you think your reading material is?”
  • lcnl3 = “Compared to other college students, how much do you enjoy reading?”
  • lcnl4 = “Compared to other college students, how fast do you normally read?”
  • lcnl5 = “Compared to other college students, when reading at your normal pace, how well do you understand the reading material?”
  • lcnl6 = “Compared to other college students, how much reading of any sort (books, webpages, assignments, etc.) do you do?”
  • lcnl9 = “How much difficulty did you have learning to spell in elementary school?” (5 point scale from “None” to “A great deal”)
  • lcnl10 = “How would you compare your current spelling to that of others of the same age and education?” (5-point scale from “Above average” to “Below average”)
  • LangProp1 = “Of the language you use daily, what proportion is Chinese?”
  • LangProp2 = “Of the language you use daily, what proportion is English?”
  • LangProp3 = “Of the language you use daily, what proportion is a language other than Chinese or Mandarin?”
  • ReadTimeM = “What is the average time (in hours) that you spend reading Mandarin texts every week? (including books, magazines, and websites)”
  • ReadTimeO = “What is the average time (in hours) that you spend reading texts in languages other than Mandarin every week? (including books, magazines, and websites)”
  • VideoTimeM = “Please state how many hours of Chinese subtitled media programs you watch on a weekly basis.”
  • VideoTimeO = “Please state how many hours of non-Chinese subtitled media programs you watch on a weekly basis.”
  • WritingTimeM = “Approximately how much time (in hours) do you spend writing in Chinese every day? (using paper and pen, NOT including electronic devices)”
  • DevicesM = “Approximately how much time (in hours) do you spend using Chinese on electronic devices? (including all social media, such as wechat, weibo, as well as email)”

The following is a correlation matrix of these and other relevant participant variables, as well as the dependent measures (accuracy and RT).