In this post, I will simulate the theoretical relationships between temperament factors: Negative affect, Effortful control and Extraversion, and category learning strategy use. The tasks I used to do hypothesis testing are from Murphy et al., (2017) and Le Pelley et al., (2019). The Murphy task can be learned either through explicit or implicit strategies, and Le Pelley has 3 tasks which can be optimally learned either explicitly or implicitly, individual differences occur with respect to the strategy they use and their learning rate, I want to see whether temperament can account for some of these variances. For Le Pelley task, the first set of analyses before cross-task analysis, will be demonstrated with the conjunctive rule-based task. Data for both tasks will be summarized across participants (i.e., each participant has one row in the dataframe, individual trials were not simulated).
library(data.table) #rbindlist
library(MASS) #logistic models
library(knitr)
library(ggplot2)
library(dplyr)
library(tidyr)
library(faux)
#multinomial logistic regression
require(foreign)
require(nnet)
McKinney, Stearns and Szkody (2020) provided mean, sd and variable correlations to create the temperament factor scales data. McKinney et al. recruited 410 females and 189 males, I will stimulate exactly this.
set.seed(888)
# covariate matrix
dat_f <- rnorm_multi(n = 410,
mu = c(4.11, 4.17, 4.34),
sd = c(.64, .59, .63),
r = c(-.29, -.17, -.03),
varnames = c("Negative_affect",
"Effortful_control", "Extraversion"),
empirical = FALSE)
dat_f<-data.matrix(dat_f, rownames.force = NA)
dat_f<-as.data.frame(dat_f)
dat_f$gender<-rep("F", 410)
dat_m <- rnorm_multi(n = 189,
mu = c(3.85, 4.19, 4.25),
sd = c(.51, .54, .60),
r = c(-.49, -.21, .28),
varnames = c("Negative_affect",
"Effortful_control", "Extraversion"),
empirical = FALSE)
dat_m<-data.matrix(dat_m, rownames.force = NA)
dat_m<-as.data.frame(dat_m)
dat_m$gender<-rep("M", 189)
df<-merge(dat_f,dat_m, all.x=TRUE, all.y=TRUE)
df$subj<-as.factor(sample(1:599, 599, replace=FALSE))
Murphy_prop_II represents the proportion of answers out of 20 trials can be characterized as using similarity-based strategy, mean and sd are provided by Murphy et al., (2017).
set.seed(55444)
#simulate Murphy et al., (2017) each participant's proportion of answers can be characterized by II strategy
rdiscnorm <- function(n, mean, sd, min = 0, max = 100, by = 5){
# generate the possible values we can take on
vals <- seq(from = min, to = max, by = by)
# use dnorm to get the density at each of those points
unnormed_probabilities <- dnorm(vals, mean = mean, sd = sd)
# normalize so that the probabilities sum to 1
# - this isn't strictly necessary because we use sample
# but it makes sense when thinking about the process
ps <- unnormed_probabilities/sum(unnormed_probabilities)
# Take a sample with replacement of the vals
# using the generated probabilities
output <- sample(vals, size = n, replace = TRUE, prob = ps)
return(output)
}
out <- rdiscnorm(599, 57, 25.30)
df$Murphy_prop_II <- out
# fit<-lm(Murphy_prop_II~Negative_affect+Extraversion, df)
# summary(fit)
df<-df[,c(5,4,1,2,3,6)]
df[2] <- lapply(df[2],factor)
# create range buckets for each factor scale
df$EC_bucket <- as.factor(ifelse(df$Effortful_control<=mean(df$Effortful_control),"low","high"))
df$NA_bucket <- as.factor(ifelse(df$Negative_affect<=mean(df$Negative_affect),"low","high"))
df$Ex_bucket <- as.factor(ifelse(df$Extraversion<=mean(df$Extraversion),"low","high"))
# Let's suppose this is a CR task
df$task<-rep('CR', 599)
kable(head(df))
| subj | gender | Negative_affect | Effortful_control | Extraversion | Murphy_prop_II | EC_bucket | NA_bucket | Ex_bucket | task |
|---|---|---|---|---|---|---|---|---|---|
| 36 | F | 1.748728 | 4.989457 | 5.200141 | 45 | high | low | high | CR |
| 238 | F | 2.405532 | 4.872522 | 4.420393 | 50 | high | low | high | CR |
| 44 | F | 2.556227 | 4.840598 | 3.897457 | 25 | high | low | low | CR |
| 181 | F | 2.562353 | 4.599368 | 5.327697 | 60 | high | low | high | CR |
| 55 | F | 2.653297 | 3.735701 | 4.153468 | 75 | low | low | low | CR |
| 423 | M | 2.668574 | 5.747569 | 5.546743 | 30 | high | low | high | CR |
summary(df)
## subj gender Negative_affect Effortful_control Extraversion
## 1 : 1 F:410 Min. :1.749 Min. :2.626 Min. :2.685
## 2 : 1 M:189 1st Qu.:3.583 1st Qu.:3.765 1st Qu.:3.895
## 3 : 1 Median :3.971 Median :4.168 Median :4.293
## 4 : 1 Mean :4.027 Mean :4.165 Mean :4.285
## 5 : 1 3rd Qu.:4.471 3rd Qu.:4.533 3rd Qu.:4.686
## 6 : 1 Max. :5.694 Max. :5.896 Max. :5.849
## (Other):593
## Murphy_prop_II EC_bucket NA_bucket Ex_bucket task
## Min. : 0.00 high:300 high:282 high:305 Length:599
## 1st Qu.: 40.00 low :299 low :317 low :294 Class :character
## Median : 55.00 Mode :character
## Mean : 55.67
## 3rd Qu.: 70.00
## Max. :100.00
##
Murphy’s task has 20 trials, the Murphy_prop_II column represents the proportion of answers can be described as using similarity-based strategy, which ranged fromm 0% to 100%. It is hypothesized that higher Negative affect is related to lower proportion of II responses, and higher Extraversion is related to higher proportion of II responses.
fit<-lm(Murphy_prop_II~Negative_affect+Extraversion, df)
summary(fit)
##
## Call:
## lm(formula = Murphy_prop_II ~ Negative_affect + Extraversion,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.749 -16.872 1.344 15.549 47.106
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.317 10.031 5.116 4.22e-07 ***
## Negative_affect -1.378 1.483 -0.929 0.353
## Extraversion 2.310 1.628 1.419 0.156
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.78 on 596 degrees of freedom
## Multiple R-squared: 0.00581, Adjusted R-squared: 0.002474
## F-statistic: 1.742 on 2 and 596 DF, p-value: 0.1761
I would expect to see results similar to these if the hypotheses were correct: Negative affect would have negative coefficient and Extraversion would have positive coefficient, these are also expected to be significant predictors.
The ‘strategy’ column represents first block strategy on Le Pelley et al., (2019) tasks. I want to see whether individuals differ in their default strategy as a function of their temperament traits. The default strategies can be single-dimensional rule, conjunctive rule, similarity-based, or guessing. I plan to group these into explicit, implicit, and guessing. In the actual study, I plan to run a multinomial logistic regression analysis with strategy (3 levels) as DV. But here, I will simply demonstrate with a binomial logistic regression with a 2-level DV.
For simplicity sake, let’s suppose column ‘strategy’ represents 2 types of default strategies: 1= explicit strategy; 0 = implicit strategy. I want to use temperament factors to predict default strategy. It’s hypothesized that Negative affect is associated with explicit strategies, and Extraversion is associated with implicit strategies, Effortful control is not specifically related to strategy type, but it should facilitate people to find the optimal strategy. Let’s suppose the current task is explicit-process mediated, which makes explicit strategy the optimal approach, therefore Effortful control should be positively related to explicit strategy.
Binomial logistic regression data simulation methods were used to generate column ‘strategy’ with 2 levels (explicit and implicit strategy), the beta coefficients were arbitrarily chosen, but the positive/negative signs have theoretical support.
xb <- 1.2 + 0.5*df$Negative_affect - 1.2*df$Extraversion + 0.5*df$Effortful_control
p <- 1/(1 + exp(-xb))
summary(p)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.09861 0.40148 0.52943 0.53306 0.66368 0.94076
set.seed(1)
df$strategy <- rbinom(n = 599, size = 1, prob = p)
table(df$strategy)
##
## 0 1
## 273 326
Let’s look at the distribution of each factor scale by strategy
# boxplot of how each temperament factor scale is related to each strategy
par(mfrow = c(1,3))
boxplot(Negative_affect~strategy, ylab="Negative Affect", xlab= "strategy", ylim=c(1, 7), col="light blue",data = df)
boxplot(Effortful_control~strategy, ylab="Effortful Control", xlab= "strategy", ylim=c(1, 7), col="light blue",data = df)
boxplot(Extraversion~strategy, ylab="Extraversion", xlab= "strategy", ylim=c(1, 7), col="light blue",data = df)
mod <- glm(df$strategy ~ df$Negative_affect+df$Extraversion+df$Effortful_control, family = "binomial")
summary(mod)
##
## Call:
## glm(formula = df$strategy ~ df$Negative_affect + df$Extraversion +
## df$Effortful_control, family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1874 -1.0865 0.5921 1.0158 1.9697
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2009 1.2412 0.162 0.871419
## df$Negative_affect 0.5342 0.1495 3.574 0.000352 ***
## df$Extraversion -1.2213 0.1709 -7.148 8.83e-13 ***
## df$Effortful_control 0.7417 0.1722 4.307 1.65e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 825.69 on 598 degrees of freedom
## Residual deviance: 738.72 on 595 degrees of freedom
## AIC: 746.72
##
## Number of Fisher Scoring iterations: 4
I would expect to see results similar to these, where Negative affect and Effortful control positive related to the explicit (optimal) strategy, while Extraversion is negatively related to explicit (optimal) strategy.
The logistic regression coefficients give the change in the log odds of the outcome for a one unit increase in the predictor variable. For every one unit change in Negative affect, the log odds of using explicit strategy (versus implicit strategy) increases by 0.53.
Let’s simulate participants’ point of transition into using the optimal strategy. Mean and sd were arbitrarily chosen.
set.seed(8)
eps = rnorm(n = 599, mean = 0, sd = 2)
df$pot = 22 - 1.8*df$Negative_affect-2.5*df$Effortful_control+2*df$Extraversion + eps
It was hypothesized that people high in NA would have explicit default strategy, while people high in Ex would have implicit default strategy, so it should take different time for each profile of people to learn the optimal strategy (e.g., either explicit or implict), when holding EC consistent. EC should facilitate the process of discovering the optimal strategy.
fit2<-lm(pot~Negative_affect + Extraversion +Effortful_control, df)
summary(fit2)
##
## Call:
## lm(formula = pot ~ Negative_affect + Extraversion + Effortful_control,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.185 -1.371 0.048 1.326 5.101
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.2390 1.1793 17.16 <2e-16 ***
## Negative_affect -1.7994 0.1383 -13.01 <2e-16 ***
## Extraversion 2.0507 0.1467 13.98 <2e-16 ***
## Effortful_control -2.1712 0.1562 -13.90 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.049 on 595 degrees of freedom
## Multiple R-squared: 0.466, Adjusted R-squared: 0.4633
## F-statistic: 173 on 3 and 595 DF, p-value: < 2.2e-16
I would expect the results to look similar to these, where high Negative affect and Effortful control reduces the blocks required to switch to optimal strategy (conjunctive rule-based), and high Extraversion increases the number of blocks required. I also expect these predictors to be significant.
Let’s create a column called ‘opt_str’. ‘1’ refers to discovered the optimal strategy, ‘0’ refers to failed to discover the optimal strategy. People who were not characterized by explicit strategy by block 17 (final block) were classified as non-learners.
# if optimal strategy is used, the value '1' is assigned in the opt_stra column
df$opt_stra <- as.factor(ifelse(df$pot<=17,"1","0"))
logit_EC <- glm(opt_stra~Effortful_control + Negative_affect+Extraversion, family = binomial, df)
summary(logit_EC)
##
## Call:
## glm(formula = opt_stra ~ Effortful_control + Negative_affect +
## Extraversion, family = binomial, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8830 0.0818 0.1649 0.3106 1.8455
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1524 2.8328 -0.407 0.684
## Effortful_control 2.0544 0.4047 5.077 3.84e-07 ***
## Negative_affect 1.6305 0.3791 4.301 1.70e-05 ***
## Extraversion -2.3282 0.3944 -5.904 3.55e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 283.12 on 598 degrees of freedom
## Residual deviance: 197.03 on 595 degrees of freedom
## AIC: 205.03
##
## Number of Fisher Scoring iterations: 7
I would expect to see results like this, where Negative affect and Extraversion are positively related to explicit (optimal strategy), and Extraversion is negatively related to it. Interpretation: E.g.,For every one unit change in Negative affect, the log odds of using explicit strategy (versus implicit strategy) increases by 1.63.
A similar multiple regression can be done with performance accuracy as DV. This would be the accuracy after the first block of using the optimal strategy until the end of task. This analysis should tell us whether EC affects the consistency of applying a discovered optimal strategy. Low EC is hypothesized to be associated with low accuracy, vice versa for high EC.
Let’s create a column called accuracy, suppose that it shows the average accuracy from strategy transition point to the end of task. Intercept and coefficient are arbitratily chosen.
df$accuracy <- 40+8.5*df$Effortful_control+eps
fit3<-lm(accuracy~Effortful_control, df)
summary(fit3)
##
## Call:
## lm(formula = accuracy ~ Effortful_control, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1187 -1.3866 0.0544 1.3091 5.1243
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.4357 0.6287 61.14 <2e-16 ***
## Effortful_control 8.8343 0.1496 59.05 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.046 on 597 degrees of freedom
## Multiple R-squared: 0.8538, Adjusted R-squared: 0.8536
## F-statistic: 3487 on 1 and 597 DF, p-value: < 2.2e-16
For each unit increase in Effortful control, accuracy improves by 8.83%.
Here I want to see whether people with opposite temperament profiles would should opposite tendency in learning explicit or implicit-process mediated tasks.
Let’s quickly simulate another dataset for the II task, the only thing being changed is the point of strategy transition and its relationship with the temperament factors, all else is the same compared with CR task. Point of transition will be used as DV.
set.seed(123)
# covariate matrix
dat_f <- rnorm_multi(n = 410,
mu = c(4.11, 4.17, 4.34),
sd = c(.64, .59, .63),
r = c(-.29, -.17, -.03),
varnames = c("Negative_affect",
"Effortful_control", "Extraversion"),
empirical = FALSE)
## The number of variables (vars) was guessed from the input to be 3
dat_f<-data.matrix(dat_f, rownames.force = NA)
dat_f<-as.data.frame(dat_f)
dat_f$gender<-rep("F", 410)
dat_m <- rnorm_multi(n = 189,
mu = c(3.85, 4.19, 4.25),
sd = c(.51, .54, .60),
r = c(-.49, -.21, .28),
varnames = c("Negative_affect",
"Effortful_control", "Extraversion"),
empirical = FALSE)
## The number of variables (vars) was guessed from the input to be 3
dat_m<-data.matrix(dat_m, rownames.force = NA)
dat_m<-as.data.frame(dat_m)
dat_m$gender<-rep("M", 189)
df2<-merge(dat_f,dat_m, all.x=TRUE, all.y=TRUE)
df2$subj<-as.factor(sample(600:1198, 599, replace=FALSE))
df2<-df2[,c(5,4,1,2,3)]
df2[1] <- lapply(df2[1],factor)
# create range buckets for each factor scale
df2$EC_bucket <- as.factor(ifelse(df2$Effortful_control<=mean(df2$Effortful_control),"low","high"))
df2$NA_bucket <- as.factor(ifelse(df2$Negative_affect<=mean(df2$Negative_affect),"low","high"))
df2$Ex_bucket <- as.factor(ifelse(df2$Extraversion<=mean(df2$Extraversion),"low","high"))
df2$task<-rep('II', 599)
#suppose these are the point of transition into optimal strategy on the Le Pelley task
eps = rnorm(n = 599, mean = 0, sd = 2) #pretend sd of error to be 2
#create a multiple regression model as the basis for generating pot value
df2$pot = 20 + 1.4* df2$Negative_affect - 1.2*df2$Extraversion- 2*df2$Effortful_control+ eps
kable(head(df2))
| subj | gender | Negative_affect | Effortful_control | Extraversion | EC_bucket | NA_bucket | Ex_bucket | task | pot |
|---|---|---|---|---|---|---|---|---|---|
| 714 | F | 2.489405 | 4.515853 | 2.823761 | high | low | low | II | 11.165053 |
| 709 | F | 2.628472 | 3.723965 | 4.958565 | low | low | high | II | 10.692912 |
| 920 | F | 2.638685 | 4.190022 | 5.025848 | high | low | high | II | 8.640380 |
| 1017 | F | 2.679214 | 4.671876 | 5.087523 | high | low | high | II | 6.309137 |
| 1150 | F | 2.746592 | 4.882832 | 3.992115 | high | low | low | II | 5.107244 |
| 755 | M | 2.751671 | 5.359293 | 4.470169 | high | low | high | II | 8.620021 |
cols <- c(1:5, 7:10,12)
df_all<-rbind(df[,cols], df2)
I created group 1 that is composed of people with high level NA, and low Ex; group 2 has high level of Ex, and low NA.
#select the groups composed of people with specific temperament traits
group1<-subset(df_all, NA_bucket=="high" & Ex_bucket =="low" )
nrow(group1)
## [1] 302
group2<-subset(df_all, Ex_bucket=="high" & NA_bucket =="low" )
nrow(group2)
## [1] 345
#combine data for anova
group1$group <- paste0("1", group1$group)
group2$group <- paste0("2", group2$group)
group_data<-rbind(group1, group2)
test_aov<-aov(pot~group*task*Effortful_control, group_data)
summary(test_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 1 2 1.6 0.308 0.579
## task 1 81 80.7 15.811 7.8e-05 ***
## Effortful_control 1 934 933.9 183.045 < 2e-16 ***
## group:task 1 1636 1636.2 320.671 < 2e-16 ***
## group:Effortful_control 1 1 0.6 0.115 0.735
## task:Effortful_control 1 3 3.0 0.588 0.443
## group:task:Effortful_control 1 3 2.5 0.494 0.482
## Residuals 639 3260 5.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I would expect results to look similar to these. There should be an interaction effect between group and task, showing that the two temperamental profiles respond to explicit and implicit-process mediated tasks differently. Effortful control should significantly reduce the number of blocks required to switch to using the optimal strategy.
Code Refrences:
https://debruine.github.io/faux/articles/rnorm_multi.html https://stats.stackexchange.com/questions/103728/simulating-multinomial-logit-data-with-r https://stackoverflow.com/questions/63623104/generate-normal-distribution-of-numbers-that-are-multiples-of-5-with-in-a-specif
https://towardsdatascience.com/implementing-binary-logistic-regression-in-r-7d802a9d98fe
https://uvastatlab.github.io/2019/05/04/simulating-a-logistic-regression-model/