In this post, I will simulate the theoretical relationships between temperament factors: Negative affect, Effortful control and Extraversion, and category learning strategy use. The tasks I used to do hypothesis testing are from Murphy et al., (2017) and Le Pelley et al., (2019). The Murphy task can be learned either through explicit or implicit strategies, and Le Pelley has 3 tasks which can be optimally learned either explicitly or implicitly, individual differences occur with respect to the strategy they use and their learning rate, I want to see whether temperament can account for some of these variances. For Le Pelley task, the first set of analyses before cross-task analysis, will be demonstrated with the conjunctive rule-based task. Data for both tasks will be summarized across participants (i.e., each participant has one row in the dataframe, individual trials were not simulated).

library(data.table) #rbindlist
library(MASS) #logistic models
library(knitr)
library(ggplot2)
library(dplyr)
library(tidyr)
library(faux)
#multinomial logistic regression
require(foreign)
require(nnet)

Data simulation

McKinney, Stearns and Szkody (2020) provided mean, sd and variable correlations to create the temperament factor scales data. McKinney et al. recruited 410 females and 189 males, I will stimulate exactly this.

set.seed(888)
# covariate matrix
dat_f <- rnorm_multi(n = 410,
                  mu = c(4.11, 4.17, 4.34),
                  sd = c(.64, .59, .63),
                  r = c(-.29, -.17, -.03),
                  varnames = c("Negative_affect",
                               "Effortful_control", "Extraversion"),
                  empirical = FALSE)

dat_f<-data.matrix(dat_f, rownames.force = NA)
dat_f<-as.data.frame(dat_f)
dat_f$gender<-rep("F", 410)


dat_m <- rnorm_multi(n = 189,
                  mu = c(3.85, 4.19, 4.25),
                  sd = c(.51, .54, .60),
                  r = c(-.49, -.21, .28),
                  varnames = c("Negative_affect", 
                               "Effortful_control", "Extraversion"),
                  empirical = FALSE)

dat_m<-data.matrix(dat_m, rownames.force = NA)
dat_m<-as.data.frame(dat_m)
dat_m$gender<-rep("M", 189)

df<-merge(dat_f,dat_m, all.x=TRUE, all.y=TRUE)
df$subj<-as.factor(sample(1:599, 599, replace=FALSE))

Murphy_prop_II represents the proportion of answers out of 20 trials can be characterized as using similarity-based strategy, mean and sd are provided by Murphy et al., (2017).

set.seed(55444)
#simulate Murphy et al., (2017) each participant's proportion of answers can be characterized by II strategy
rdiscnorm <- function(n, mean, sd, min = 0, max = 100, by = 5){
    # generate the possible values we can take on
    vals <- seq(from = min, to = max, by = by)
    # use dnorm to get the density at each of those points
    unnormed_probabilities <- dnorm(vals, mean = mean, sd = sd)
    # normalize so that the probabilities sum to 1
    #   - this isn't strictly necessary because we use sample
    #     but it makes sense when thinking about the process
    ps <- unnormed_probabilities/sum(unnormed_probabilities)

    # Take a sample with replacement of the vals
    # using the generated probabilities
    output <- sample(vals, size = n, replace = TRUE, prob = ps)
    return(output)
}

out <- rdiscnorm(599, 57, 25.30)

df$Murphy_prop_II <- out

# fit<-lm(Murphy_prop_II~Negative_affect+Extraversion, df)
# summary(fit)
df<-df[,c(5,4,1,2,3,6)]
df[2] <- lapply(df[2],factor)

# create range buckets for each factor scale
df$EC_bucket <- as.factor(ifelse(df$Effortful_control<=mean(df$Effortful_control),"low","high"))
df$NA_bucket <- as.factor(ifelse(df$Negative_affect<=mean(df$Negative_affect),"low","high"))
df$Ex_bucket <- as.factor(ifelse(df$Extraversion<=mean(df$Extraversion),"low","high"))

# Let's suppose this is a CR task
df$task<-rep('CR', 599)

kable(head(df))
subj gender Negative_affect Effortful_control Extraversion Murphy_prop_II EC_bucket NA_bucket Ex_bucket task
36 F 1.748728 4.989457 5.200141 45 high low high CR
238 F 2.405532 4.872522 4.420393 50 high low high CR
44 F 2.556227 4.840598 3.897457 25 high low low CR
181 F 2.562353 4.599368 5.327697 60 high low high CR
55 F 2.653297 3.735701 4.153468 75 low low low CR
423 M 2.668574 5.747569 5.546743 30 high low high CR
summary(df)
##       subj     gender  Negative_affect Effortful_control  Extraversion  
##  1      :  1   F:410   Min.   :1.749   Min.   :2.626     Min.   :2.685  
##  2      :  1   M:189   1st Qu.:3.583   1st Qu.:3.765     1st Qu.:3.895  
##  3      :  1           Median :3.971   Median :4.168     Median :4.293  
##  4      :  1           Mean   :4.027   Mean   :4.165     Mean   :4.285  
##  5      :  1           3rd Qu.:4.471   3rd Qu.:4.533     3rd Qu.:4.686  
##  6      :  1           Max.   :5.694   Max.   :5.896     Max.   :5.849  
##  (Other):593                                                            
##  Murphy_prop_II   EC_bucket  NA_bucket  Ex_bucket      task          
##  Min.   :  0.00   high:300   high:282   high:305   Length:599        
##  1st Qu.: 40.00   low :299   low :317   low :294   Class :character  
##  Median : 55.00                                    Mode  :character  
##  Mean   : 55.67                                                      
##  3rd Qu.: 70.00                                                      
##  Max.   :100.00                                                      
## 

Murphy et al., (2017) analyses simulation.

Multiple linear regression

Murphy’s task has 20 trials, the Murphy_prop_II column represents the proportion of answers can be described as using similarity-based strategy, which ranged fromm 0% to 100%. It is hypothesized that higher Negative affect is related to lower proportion of II responses, and higher Extraversion is related to higher proportion of II responses.

fit<-lm(Murphy_prop_II~Negative_affect+Extraversion, df)
summary(fit)
## 
## Call:
## lm(formula = Murphy_prop_II ~ Negative_affect + Extraversion, 
##     data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.749 -16.872   1.344  15.549  47.106 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       51.317     10.031   5.116 4.22e-07 ***
## Negative_affect   -1.378      1.483  -0.929    0.353    
## Extraversion       2.310      1.628   1.419    0.156    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.78 on 596 degrees of freedom
## Multiple R-squared:  0.00581,    Adjusted R-squared:  0.002474 
## F-statistic: 1.742 on 2 and 596 DF,  p-value: 0.1761

I would expect to see results similar to these if the hypotheses were correct: Negative affect would have negative coefficient and Extraversion would have positive coefficient, these are also expected to be significant predictors.

Le Pelley et al., (2019)

Default strategy

The ‘strategy’ column represents first block strategy on Le Pelley et al., (2019) tasks. I want to see whether individuals differ in their default strategy as a function of their temperament traits. The default strategies can be single-dimensional rule, conjunctive rule, similarity-based, or guessing. I plan to group these into explicit, implicit, and guessing. In the actual study, I plan to run a multinomial logistic regression analysis with strategy (3 levels) as DV. But here, I will simply demonstrate with a binomial logistic regression with a 2-level DV.

For simplicity sake, let’s suppose column ‘strategy’ represents 2 types of default strategies: 1= explicit strategy; 0 = implicit strategy. I want to use temperament factors to predict default strategy. It’s hypothesized that Negative affect is associated with explicit strategies, and Extraversion is associated with implicit strategies, Effortful control is not specifically related to strategy type, but it should facilitate people to find the optimal strategy. Let’s suppose the current task is explicit-process mediated, which makes explicit strategy the optimal approach, therefore Effortful control should be positively related to explicit strategy.

Binomial logistic regression data simulation methods were used to generate column ‘strategy’ with 2 levels (explicit and implicit strategy), the beta coefficients were arbitrarily chosen, but the positive/negative signs have theoretical support.

xb <- 1.2 + 0.5*df$Negative_affect - 1.2*df$Extraversion + 0.5*df$Effortful_control
p <- 1/(1 + exp(-xb))
summary(p)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.09861 0.40148 0.52943 0.53306 0.66368 0.94076
set.seed(1)
df$strategy <- rbinom(n = 599, size = 1, prob = p)
table(df$strategy)
## 
##   0   1 
## 273 326

Let’s look at the distribution of each factor scale by strategy

# boxplot of how each temperament factor scale is related to each strategy
par(mfrow = c(1,3))
boxplot(Negative_affect~strategy, ylab="Negative Affect", xlab= "strategy", ylim=c(1, 7), col="light blue",data = df)
boxplot(Effortful_control~strategy, ylab="Effortful Control", xlab= "strategy", ylim=c(1, 7), col="light blue",data = df)
boxplot(Extraversion~strategy, ylab="Extraversion", xlab= "strategy", ylim=c(1, 7), col="light blue",data = df)

mod <- glm(df$strategy ~ df$Negative_affect+df$Extraversion+df$Effortful_control, family = "binomial")
summary(mod)
## 
## Call:
## glm(formula = df$strategy ~ df$Negative_affect + df$Extraversion + 
##     df$Effortful_control, family = "binomial")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1874  -1.0865   0.5921   1.0158   1.9697  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            0.2009     1.2412   0.162 0.871419    
## df$Negative_affect     0.5342     0.1495   3.574 0.000352 ***
## df$Extraversion       -1.2213     0.1709  -7.148 8.83e-13 ***
## df$Effortful_control   0.7417     0.1722   4.307 1.65e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 825.69  on 598  degrees of freedom
## Residual deviance: 738.72  on 595  degrees of freedom
## AIC: 746.72
## 
## Number of Fisher Scoring iterations: 4

I would expect to see results similar to these, where Negative affect and Effortful control positive related to the explicit (optimal) strategy, while Extraversion is negatively related to explicit (optimal) strategy.

The logistic regression coefficients give the change in the log odds of the outcome for a one unit increase in the predictor variable. For every one unit change in Negative affect, the log odds of using explicit strategy (versus implicit strategy) increases by 0.53.

Strategy transition point

Let’s simulate participants’ point of transition into using the optimal strategy. Mean and sd were arbitrarily chosen.

set.seed(8)
eps = rnorm(n = 599, mean = 0, sd = 2) 
df$pot = 22 - 1.8*df$Negative_affect-2.5*df$Effortful_control+2*df$Extraversion + eps

It was hypothesized that people high in NA would have explicit default strategy, while people high in Ex would have implicit default strategy, so it should take different time for each profile of people to learn the optimal strategy (e.g., either explicit or implict), when holding EC consistent. EC should facilitate the process of discovering the optimal strategy.

fit2<-lm(pot~Negative_affect + Extraversion +Effortful_control, df)
summary(fit2)
## 
## Call:
## lm(formula = pot ~ Negative_affect + Extraversion + Effortful_control, 
##     data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.185 -1.371  0.048  1.326  5.101 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        20.2390     1.1793   17.16   <2e-16 ***
## Negative_affect    -1.7994     0.1383  -13.01   <2e-16 ***
## Extraversion        2.0507     0.1467   13.98   <2e-16 ***
## Effortful_control  -2.1712     0.1562  -13.90   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.049 on 595 degrees of freedom
## Multiple R-squared:  0.466,  Adjusted R-squared:  0.4633 
## F-statistic:   173 on 3 and 595 DF,  p-value: < 2.2e-16

I would expect the results to look similar to these, where high Negative affect and Effortful control reduces the blocks required to switch to optimal strategy (conjunctive rule-based), and high Extraversion increases the number of blocks required. I also expect these predictors to be significant.

Ability to discover optimal strategy

Let’s create a column called ‘opt_str’. ‘1’ refers to discovered the optimal strategy, ‘0’ refers to failed to discover the optimal strategy. People who were not characterized by explicit strategy by block 17 (final block) were classified as non-learners.

# if optimal strategy is used, the value '1' is assigned in the opt_stra column
df$opt_stra <- as.factor(ifelse(df$pot<=17,"1","0"))
logit_EC <- glm(opt_stra~Effortful_control + Negative_affect+Extraversion, family = binomial, df)

summary(logit_EC)
## 
## Call:
## glm(formula = opt_stra ~ Effortful_control + Negative_affect + 
##     Extraversion, family = binomial, data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8830   0.0818   0.1649   0.3106   1.8455  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -1.1524     2.8328  -0.407    0.684    
## Effortful_control   2.0544     0.4047   5.077 3.84e-07 ***
## Negative_affect     1.6305     0.3791   4.301 1.70e-05 ***
## Extraversion       -2.3282     0.3944  -5.904 3.55e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 283.12  on 598  degrees of freedom
## Residual deviance: 197.03  on 595  degrees of freedom
## AIC: 205.03
## 
## Number of Fisher Scoring iterations: 7

I would expect to see results like this, where Negative affect and Extraversion are positively related to explicit (optimal strategy), and Extraversion is negatively related to it. Interpretation: E.g.,For every one unit change in Negative affect, the log odds of using explicit strategy (versus implicit strategy) increases by 1.63.

Strategy consistency

A similar multiple regression can be done with performance accuracy as DV. This would be the accuracy after the first block of using the optimal strategy until the end of task. This analysis should tell us whether EC affects the consistency of applying a discovered optimal strategy. Low EC is hypothesized to be associated with low accuracy, vice versa for high EC.

Let’s create a column called accuracy, suppose that it shows the average accuracy from strategy transition point to the end of task. Intercept and coefficient are arbitratily chosen.

df$accuracy <- 40+8.5*df$Effortful_control+eps
fit3<-lm(accuracy~Effortful_control, df)
summary(fit3)
## 
## Call:
## lm(formula = accuracy ~ Effortful_control, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.1187 -1.3866  0.0544  1.3091  5.1243 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        38.4357     0.6287   61.14   <2e-16 ***
## Effortful_control   8.8343     0.1496   59.05   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.046 on 597 degrees of freedom
## Multiple R-squared:  0.8538, Adjusted R-squared:  0.8536 
## F-statistic:  3487 on 1 and 597 DF,  p-value: < 2.2e-16

For each unit increase in Effortful control, accuracy improves by 8.83%.

Cross-task analyses

Here I want to see whether people with opposite temperament profiles would should opposite tendency in learning explicit or implicit-process mediated tasks.

Let’s quickly simulate another dataset for the II task, the only thing being changed is the point of strategy transition and its relationship with the temperament factors, all else is the same compared with CR task. Point of transition will be used as DV.

set.seed(123)
# covariate matrix
dat_f <- rnorm_multi(n = 410,
                  mu = c(4.11, 4.17, 4.34),
                  sd = c(.64, .59, .63),
                  r = c(-.29, -.17, -.03),
                  varnames = c("Negative_affect",
                               "Effortful_control", "Extraversion"),
                  empirical = FALSE)
## The number of variables (vars) was guessed from the input to be 3
dat_f<-data.matrix(dat_f, rownames.force = NA)
dat_f<-as.data.frame(dat_f)
dat_f$gender<-rep("F", 410)


dat_m <- rnorm_multi(n = 189,
                  mu = c(3.85, 4.19, 4.25),
                  sd = c(.51, .54, .60),
                  r = c(-.49, -.21, .28),
                  varnames = c("Negative_affect", 
                               "Effortful_control", "Extraversion"),
                  empirical = FALSE)
## The number of variables (vars) was guessed from the input to be 3
dat_m<-data.matrix(dat_m, rownames.force = NA)
dat_m<-as.data.frame(dat_m)
dat_m$gender<-rep("M", 189)

df2<-merge(dat_f,dat_m, all.x=TRUE, all.y=TRUE)
df2$subj<-as.factor(sample(600:1198, 599, replace=FALSE))

df2<-df2[,c(5,4,1,2,3)]

df2[1] <- lapply(df2[1],factor)

# create range buckets for each factor scale
df2$EC_bucket <- as.factor(ifelse(df2$Effortful_control<=mean(df2$Effortful_control),"low","high"))
df2$NA_bucket <- as.factor(ifelse(df2$Negative_affect<=mean(df2$Negative_affect),"low","high"))
df2$Ex_bucket <- as.factor(ifelse(df2$Extraversion<=mean(df2$Extraversion),"low","high"))


df2$task<-rep('II', 599)

#suppose these are the point of transition into optimal strategy on the Le Pelley task
eps = rnorm(n = 599, mean = 0, sd = 2) #pretend sd of error to be 2
#create a multiple regression model as the basis for generating pot value
df2$pot = 20 + 1.4* df2$Negative_affect - 1.2*df2$Extraversion- 2*df2$Effortful_control+ eps


kable(head(df2))
subj gender Negative_affect Effortful_control Extraversion EC_bucket NA_bucket Ex_bucket task pot
714 F 2.489405 4.515853 2.823761 high low low II 11.165053
709 F 2.628472 3.723965 4.958565 low low high II 10.692912
920 F 2.638685 4.190022 5.025848 high low high II 8.640380
1017 F 2.679214 4.671876 5.087523 high low high II 6.309137
1150 F 2.746592 4.882832 3.992115 high low low II 5.107244
755 M 2.751671 5.359293 4.470169 high low high II 8.620021
cols <- c(1:5, 7:10,12)
df_all<-rbind(df[,cols], df2)

I created group 1 that is composed of people with high level NA, and low Ex; group 2 has high level of Ex, and low NA.

#select the groups composed of people with specific temperament traits
group1<-subset(df_all, NA_bucket=="high" & Ex_bucket =="low" )
nrow(group1)
## [1] 302
group2<-subset(df_all, Ex_bucket=="high" & NA_bucket =="low" )
nrow(group2)
## [1] 345
#combine data for anova
group1$group <- paste0("1", group1$group)
group2$group <- paste0("2", group2$group)

group_data<-rbind(group1, group2)
test_aov<-aov(pot~group*task*Effortful_control, group_data)

summary(test_aov)
##                               Df Sum Sq Mean Sq F value  Pr(>F)    
## group                          1      2     1.6   0.308   0.579    
## task                           1     81    80.7  15.811 7.8e-05 ***
## Effortful_control              1    934   933.9 183.045 < 2e-16 ***
## group:task                     1   1636  1636.2 320.671 < 2e-16 ***
## group:Effortful_control        1      1     0.6   0.115   0.735    
## task:Effortful_control         1      3     3.0   0.588   0.443    
## group:task:Effortful_control   1      3     2.5   0.494   0.482    
## Residuals                    639   3260     5.1                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I would expect results to look similar to these. There should be an interaction effect between group and task, showing that the two temperamental profiles respond to explicit and implicit-process mediated tasks differently. Effortful control should significantly reduce the number of blocks required to switch to using the optimal strategy.

Code Refrences:

https://debruine.github.io/faux/articles/rnorm_multi.html https://stats.stackexchange.com/questions/103728/simulating-multinomial-logit-data-with-r https://stackoverflow.com/questions/63623104/generate-normal-distribution-of-numbers-that-are-multiples-of-5-with-in-a-specif
https://towardsdatascience.com/implementing-binary-logistic-regression-in-r-7d802a9d98fe
https://uvastatlab.github.io/2019/05/04/simulating-a-logistic-regression-model/