Prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1986, 1992) stipulates that people make choices based on the subjective values of the options, not their objective values. In other words, in a gambling task, when presented with a sure $5 or a risky 50-50 option of gaining either $10 or $0, prospect theory expects people to make their decision based on how much they value $5 relative to how much they value $10. But is the subjective value in prospect theory a proxy for feeling?
Charpentier et al. (2016) investigate this question using computational models. The authors ask whether a model based on feelings predicts choices in a monetary gambling task better than a model based on objective values. Participants rated how negatively or positively they felt, or expected to feel, after winning or losing an objective amount. These self-reported feelings were used to construct a feeling function that relates objective values to subjective feelings (expected or experienced). In a separate task, participants chose between a sure option and a risky 50-50 option in a monetary gambling task. Feeling-based choice models used the constructed feeling function to predict the feelings associated with each objective monetary value, and then used those predicted feelings to predict choice. Value-based choice models used only objective values to predict choice.
In this project, we will replicate the main findings of the original paper. Charpentier et al. find that feeling-based models do predict gambling choices better than value-based models, and conclude that the results demonstrate how feelings are integrated into decision-making processes. The main statistical test to be replicated is the model comparison between feeling-based and value-based choice models, using summed Bayesian information criterion (BIC) values.
Charpentier et al. do not specify a power analysis for their main statistical test. Since the key analysis relies on model comparison, it is difficult to compute the minimum number of participants needed to show that one model performs better than another. However, the authors do specify a power analysis for a supplementary analysis that is not their key statistic. I will use this test as a proxy for my power analysis, even though it may overestimate or underestimate the required number of participants. In that supplementary analysis, Charpentier et al. used a paired t-test to compare the weights of the loss coefficients to the weights of the win coefficients, extracted from the logistic regressions in the choice models. This test assessed whether participants weighted losing and winning significantly differently from each other.
From the original paper:
“We determined that a sample size of 59 participants would achieve 85% power to detect an effect size of 0.401 with an alpha of .05.”
Using the effect size found in the original paper (d = 0.392) with an alpha of .05, 53 participants would achieve 80% power. Power calculations were run in G*Power v3.1.
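As a rough cross-check of the G*Power calculation, the same numbers can be approximated in R with the pwr package (not used elsewhere in this report and loaded here only for illustration). Both calls below assume a two-sided paired t-test and should return sample sizes close to those reported above (53 and 59).
library(pwr)
# n needed for 80% power at d = 0.392 (effect size used above), alpha = .05
pwr.t.test(d = 0.392, sig.level = 0.05, power = 0.80, type = "paired")
# n needed for 85% power at d = 0.401 (original power analysis), alpha = .05
pwr.t.test(d = 0.401, sig.level = 0.05, power = 0.85, type = "paired")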
For the sake of feasibility in data collection, 30 participants completing a shorter version of the experiment were planned. Although this substantially reduces the sample size, the main analysis is not the comparison of the loss and win weights, but rather the predictive power of the models themselves, which does not depend on sample size as strongly as a t-test does. So, although power was not achieved for the weight-comparison t-test, it is still possible to evaluate how well the fitted models perform with only 30 participants. The choice of 30 participants is also informed by the supplementary materials, which report pilot runs of the study that yield similar model comparison results with around 30 participants. No demographic recruitment constraints were specified in the original paper or used in this replication.
Basic geometric shapes were used as the stimuli in each trial. Each trial used different geometric shapes to ensure that associations between monetary outcomes and shapes did not carry over across trials within a block.
jsPsych was used to create the experimental paradigm. Shapes were created in basic presentation software using standard shape tools and color palettes, and saved as images.
Participants completed two tasks, a feeling task and a gambling task. Directly quoted from the original paper, the two main tasks were as follows:
Feeling Task
In the feelings task, participants completed four blocks of 40 to 48 trials each, in which they reported either expected (Fig. 1a) or experienced (Fig. 1b) feelings associated with a range of wins and losses (between £0.2 and £12) or with no change in monetary amount (£0). At the beginning of each trial, participants were told how much was at stake and whether it was a win trial (e.g., “If you choose the GOOD picture, you will: WIN £10”) or a loss trial (e.g., “If you choose the BAD picture, you will: LOSE £10”). On each trial, their task was to make a simple arbitrary choice between two different geometrical shapes. Participants were told that one stimulus was randomly associated with a gain or loss (between £0.2 and £12) and the other stimulus with no gain and no loss (£0). Each stimulus was presented only once across the entire task so there was no way for participants to learn which stimulus was associated with a better outcome. The probability of sampling each amount was controlled to ensure that each gain and each loss from the range was sampled twice in each block: In one instance, the outcome was the amount at stake (win/loss), and in the other, the outcome was £0 (no win/no loss).
In two of the four blocks (counterbalanced order), participants reported their expected feelings prior to choosing between the two stimuli (Fig. 1a), and in the other two blocks, they reported their experienced feelings after choosing between the two stimuli (Fig. 1b). Participants reported their expected feelings by answering one of four questions asking how they would feel if they “win,” “lose,” “don’t win,” or “don’t lose” (the order of win/lose and don’t-win/don’t-lose questions was counterbalanced across trials). In experienced-feelings blocks, participants answered the question “How do you feel now?” All feelings were rated using a subjective rating scale ranging from extremely unhappy to extremely happy. Expected and experienced feelings were collected in different blocks to ensure participants did not simply remember and repeat the same rating. The choice between the two geometrical shapes was arbitrary and implemented simply in order to have participants actively involved with the outcomes.
Gambling Task
Participants also completed a probabilistic-choice task (Fig. 1c) in which they made 288 to 322 choices between a risky 50-50 gamble and a sure option. Importantly, all the amounts used in the gambling task were the same as those used in the feelings task (between £0.2 and £12), so feelings associated with these outcomes could be combined to predict gambling choice. There were three gamble types: mixed (participants had to choose between a gamble with a 50% chance of a gain and 50% chance of a loss, or a sure option of £0), gain only (participants had to choose between a gamble with a 50% chance of a high gain and a 50% chance of £0, or a sure, smaller gain), and loss only (participants had to choose between a gamble with 50% chance of a high loss and 50% chance of £0, or a sure, smaller loss). According to prospect theory, these three types of choices are essential to estimate loss aversion, risk preference for gains, and risk preference for losses, respectively.
Link to Replication Experiment
Follow the link below to review the experiment used in the replication:
Short Experiment Version (~5 minutes): [https://web.stanford.edu/~mlko53/expt/experiment_short.html]
Complete Experiment Version (~30 minutes): [https://web.stanford.edu/~mlko53/expt/experiment.html]
Participant responses from the feeling task will be used to map objective values to subjective feelings by fitting the feeling models outlined below. The best feeling models for the expected and experienced conditions will each be selected using the Bayesian information criterion (BIC). Afterwards, feeling-based choice models, which rely on the best expected or experienced feeling model, and value-based choice models, which rely on the monetary values (or transformations of them) presented during the task, will be constructed from participants' choices in the gambling task. The key analysis tests whether the affect-based models (Choice Models 1 and 2) have lower summed BICs, and therefore greater predictive power, than the value-based models (Choice Models 3-7). All model parameters were estimated using maximum likelihood estimation.
Feeling Function Models
Subjective feeling is measured as the difference between the rating of the objective monetary value (on a 9-point Likert scale) and the midpoint of the rating scale (5 on the 1-9 scale, recoded to 0). To capture the relationship between subjective feelings and objective values, 10 different models were fit for each feeling block, expected and experienced. These models were either linear or S-shaped, with variations such as piecewise fitting (where the parameters for positive and negative values are estimated separately) or intercept terms (allowing the feeling associated with an outcome of 0 to differ from 0). See Charpentier et al. (2016), p. 4, for the specific model formulas. Each model was fit for each participant. The best model in each feeling block was the one with the lowest BIC summed across all participants.
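To make the fitting procedure concrete, the sketch below fits a linear model and an assumed S-shaped model (feeling = b * sign(value) * |value|^rho) to simulated ratings from one hypothetical participant and compares them by BIC. The actual analysis fits all 10 models per participant with the MLE routines in utils.R (feel_fit / feel_BIC) using the exact formulas from the original paper; everything here is illustrative only.
# simulated ratings for one hypothetical participant (true function is S-shaped)
set.seed(1)
values <- rep(c(-12, -6, -2, -0.2, 0, 0.2, 2, 6, 12), each = 4)
example <- data.frame(
  value = values,
  response = 2 * sign(values) * abs(values)^0.3 + rnorm(length(values), sd = 0.5))
# fit a linear feeling function and an S-shaped feeling function
linear <- lm(response ~ value, data = example)
s_shape <- nls(response ~ b * sign(value) * abs(value)^rho,
               data = example, start = list(b = 1, rho = 0.5))
# lower BIC indicates the better-fitting feeling function for this participant;
# in the real analysis, per-participant BICs are summed before models are compared
c(linear = BIC(linear), s_shape = BIC(s_shape))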
Gambling Choice Models
Each choice model used logistic regression on three predictors (a win value, a loss value, and a sure value) to predict gambling choice (coded 1 for selecting the risky 50-50 gamble and 0 for selecting the sure option). Seven choice models were fitted for every participant. Choice Models 1 and 2 are feeling-based models that use the best-fitting feeling function from the expected and experienced feeling blocks, respectively, to transform objective values into subjective values. Choice Models 3-7 are value-based models that use either raw objective values or a transformation of them, such as the logarithm (see Charpentier et al., 2016, p. 5, and Supplementary Material, p. 3, for the model formulas). Each model was fit to each participant, and the best model was the one with the lowest BIC summed across all participants. To test whether the best-model finding is robust, 100 split-half analyses were performed with the same model-fitting procedures: in each split-half analysis, only a random half of the data was used for parameter estimation, BICs were calculated on that split, and the procedure was repeated 100 times. If the same best model is found consistently across split-half analyses, we can conclude that it is selected because of its predictive power rather than chance or overfitting.
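To illustrate the structure of these models, the sketch below fits a value-based logistic regression and a feeling-based variant (predictors first passed through an assumed S-shaped feeling transform) to one simulated participant's gambling trials and compares them by BIC. The actual choice models are fit with the routines in utils.R (choice_fit / choice_BIC) using the formulas in the original paper; the simulated data, the transform, and the coefficients below are illustrative assumptions.
# simulate gambling trials for one hypothetical participant
set.seed(2)
n <- 150
type <- sample(c("mixed", "gain", "loss"), n, replace = TRUE)
sim <- data.frame(
  win  = ifelse(type == "loss", 0, runif(n, 0.2, 12)),   # gamble's possible gain
  lose = ifelse(type == "gain", 0, -runif(n, 0.2, 12)),  # gamble's possible loss
  sure = ifelse(type == "mixed", 0,
                ifelse(type == "gain", runif(n, 0.2, 6), -runif(n, 0.2, 6))))
feel <- function(v) sign(v) * abs(v)^0.3  # assumed S-shaped feeling transform
# simulate choices that are driven by feelings rather than raw values
p_gamble <- plogis(feel(sim$win) + 1.5 * feel(sim$lose) - feel(sim$sure))
sim$gamble <- rbinom(n, 1, p_gamble)
# value-based model (raw amounts) vs. feeling-based model (transformed amounts)
value_model <- glm(gamble ~ win + lose + sure, family = binomial, data = sim)
feeling_model <- glm(gamble ~ feel(win) + feel(lose) + feel(sure),
                     family = binomial, data = sim)
# lower BIC = better fit; the real analysis sums per-participant BICs first
c(value = BIC(value_model), feeling = BIC(feeling_model))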
Exclusion Criteria
Participants who show no variation in their responses will be excluded from data analysis. Repeat MTurk responses will also be excluded.
Sample Difference
One key difference is that the original study did not use an online sample: it recruited university students who completed the study during an in-lab visit. This replication recruits participants from Amazon Mechanical Turk, who complete the experiment in an online browser. To fit the constraints of the class budget, a smaller sample size was used: instead of the 53 participants indicated by the power analysis, only 30 were recruited.
Material Difference
The original paper uses amounts of £0.20 to £12; this replication uses $0.20 to $12. Dollars were used instead of pounds because the participant sample was from the U.S. rather than the U.K. I did not convert the original monetary range at the current pound-dollar exchange rate because I wanted to see whether the authors' findings generalize to a different monetary scale.
Procedural Difference
The original paper does not specify which monetary values between 0.2 and 12 were used, nor the order in which they were presented in the feeling and gambling blocks. For this replication, the monetary values from the first version of the pilot runs reported in the supplementary materials were used. For the expected feeling block, each value was presented exactly twice in a random order, and participants had an equal chance of winning or losing on each trial. For the experienced feeling block, each value was presented exactly twice in a random order; of those two presentations, participants experienced that value on one trial and an outcome of 0 on the other. For the gambling block, the three gamble types (mixed, gain only, and loss only) were presented in equal numbers. In mixed trials, the values presented were selected at random. In gain-only trials, the high gain was selected at random from [6, 12] and the low (sure) gain from [0.2, 6]. In loss-only trials, the high loss was selected at random from [-12, -6] and the low (sure) loss from [-6, -0.2]. Participants had an equal chance of winning or losing on each gambling trial.
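The sketch below illustrates the value-sampling scheme just described. The actual experiment was built in jsPsych (JavaScript); this R version is illustrative only, and drawing from continuous uniform ranges is a simplifying assumption, since the experiment sampled from a fixed set of amounts between 0.2 and 12.
# one gamble trial of a given type, with values drawn from the ranges above
make_gamble_trial <- function(type = c("mixed", "gain", "loss")) {
  type <- match.arg(type)
  switch(type,
    mixed = list(win = runif(1, 0.2, 12), lose = -runif(1, 0.2, 12), sure = 0),
    gain  = list(win = runif(1, 6, 12),   lose = 0, sure = runif(1, 0.2, 6)),
    loss  = list(win = 0, lose = -runif(1, 6, 12), sure = -runif(1, 0.2, 6)))
}
# equal numbers of each trial type, presented in a random order
types <- sample(rep(c("mixed", "gain", "loss"), times = 10))
gamble_trials <- do.call(rbind, lapply(types, function(t) as.data.frame(make_gamble_trial(t))))
head(gamble_trials)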
For simplicity in constructing the experimental paradigm on the web, this replication does not counterbalance the order of the feeling task and the gambling task. Furthermore, the order of the expected and experienced feeling blocks within the feeling task is not counterbalanced. The supplementary material of the original paper finds no difference in results when the tasks are presented in different orders, so any difference in results is not expected to arise from the lack of counterbalancing.
The authors do not explicitly state whether practice trials were used; to keep the replication paradigm short for the online sample, practice trials were not included. This should not affect the results substantially, since all analyses look only at choices, and not at measures that could be affected by a lack of practice trials, such as reaction time.
To keep the replication paradigm to around 30 minutes, a shortened version of the experiment was administered. Instead of two blocks of 48 trials each for the expected and experienced conditions, participants completed only one block of 48 trials for each. The length of the gambling task remains the same. Reducing the number of feeling trials may produce poorer fits of the feeling models; however, BIC scores can still be extracted and used for model comparison.
Given the constraints of the class budget, the replication did not provide any monetary incentive based on performance on the gambling task. The original paper does not specify such an incentive as part of payment, but this may be a potential source of difference.
Analysis Difference
Charpentier et al. report using three different measures of subjective feelings. The one reported in the published paper is the difference between each rating and the participant's average rating for an outcome of 0. I will instead use the difference between each rating and the midpoint of the rating scale (0 after recoding). Results should not differ, since Charpentier et al. report no differences between measures. Furthermore, preliminary piloting of the replication reveals that the average rating for receiving $0 is not significantly different from 0, again suggesting that this change should not shift the results.
Split-half, cross-validated accuracy will be graphed for each choice model: model parameters will be estimated on a random half of the data, and accuracy scores will be computed on the other half. Accuracy will not be used as the basis for model comparison, since accuracy does not penalize the number of fitted parameters, but it provides an intuitive gauge of each model's predictive power relative to chance (50%).
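A minimal sketch of this split-half accuracy procedure is shown below, using a plain logistic regression as a stand-in for the choice models. The real analysis uses the model-specific fitting code in utils.R, and the data frame assumed here (one participant's trials with columns win, lose, sure, and gamble coded 0/1) is hypothetical.
# train on a random half of the trials, score accuracy on the held-out half
split_half_accuracy <- function(trials) {
  train_idx <- sample(nrow(trials), size = floor(nrow(trials) / 2))
  train <- trials[train_idx, ]
  test  <- trials[-train_idx, ]
  fit  <- glm(gamble ~ win + lose + sure, family = binomial, data = train)
  pred <- predict(fit, newdata = test, type = "response") > 0.5
  mean(pred == (test$gamble == 1))  # proportion of held-out choices predicted correctly
}
# averaging over many random splits (100 in this plan) gives a stable estimate:
# mean(replicate(100, split_half_accuracy(trials)))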
30 U.S. MTurk workers participated. No participants were excluded for lack of variation in responses.
None
Load Libraries and Functions
set.seed(2017) # so that cross val is the same for reproducibility
library(scales)
library(jsonlite)
library(bbmle)
library(knitr)
library(ggplot2)
library(tidyr)
library(dplyr)
source("utils.R") ## Has all helper functions used for fitting models and calculating BIC/accuracy
Load Data
data_path = '../data'
data_files = dir(data_path)
nparticipants <- length(data_files)
d <- fromJSON(paste0('../data/', data_files[1])) ## read the first data file to create the master dataframe
d <- fromJSON(paste0(d$answers))
d$id <- 1
#read in the rest of the data in data_files
for(i in 2:length(data_files)){
temp_d <- fromJSON(paste0('../data/', data_files[i]))
temp_d <- fromJSON(paste0(temp_d$answers))
temp_d$id <- i
d <- rbind(d, temp_d)
}
Data Preprocessing
#filter irrelevant data
d <- d %>% filter(!is.na(feel_trial_type) | !is.na(gamble_trial_type))
Split the data into task blocks. Fit feeling models for feel_expect and feel_experience. Test choice models with gamble_block.
# split the data into the blocks so that later analysis is easier
feel_expect <- d %>%
filter(feel_trial_type != "now" & feel_trial_type != "") %>%
select(feel_trial_type, value, response, id)
feel_experience <- d %>%
filter(feel_trial_type == "now") %>%
select(feel_trial_type, value, response, id)
gamble_block <- d %>%
filter(!is.na(gamble_trial_type)) %>%
select(gamble_trial_type, win, lose, sure, key_press, gamble_side, id)
Transform the data into suitable forms.
# transform 1-9 likert subjective response to -4 - 4 so that neutral is 0
feel_expect$response <- as.integer(feel_expect$response)
feel_expect$response <- feel_expect$response - 5
feel_experience$response <- as.integer(feel_experience$response)
feel_experience$response <- feel_experience$response - 5
#preprocess data so that there is a binary variable column named gamble
gamble = logical(0)
for(i in 1:length(gamble_block$gamble_side)){
if(gamble_block$gamble_side[i] == ""){
gamble <- c(gamble, NA)
} else{
if(gamble_block$gamble_side[i] == "R" & gamble_block$key_press[i] == 80){
gamble <- c(gamble, TRUE)
} else if(gamble_block$gamble_side[i] == "L" & gamble_block$key_press[i] == 81){
gamble <- c(gamble, TRUE)
} else{
gamble <- c(gamble, FALSE)
}
}
}
gamble_block$gamble <- gamble
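An equivalent vectorized alternative to the loop above, shown only as a sketch (the loop is what was actually run). Key codes 80 and 81 are the JavaScript key codes for "p" and "q", presumably the response keys used here.
# same coding as the loop above, written with dplyr::case_when
gamble_block <- gamble_block %>%
  mutate(gamble = case_when(
    gamble_side == "" ~ NA,
    gamble_side == "R" & key_press == 80 ~ TRUE,
    gamble_side == "L" & key_press == 81 ~ TRUE,
    TRUE ~ FALSE))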
Comparing Feeling Models
There are two feelings that are modeled: expected feelings and experienced feelings. First, we fit Feeling Models 1-10 for expected feelings for every participant and select the best model, i.e., the one with the lowest BIC summed across all participants.
## no bootstrapping
feeling_models <- data.frame(matrix(ncol = 20, nrow = 1))
## init names of the columns
model_names <- character(20)
for(i in 1:10){
model_names[i] <- paste0("feel_expect_mod", i)
}
for(i in 11:20){
model_names[i] <- paste0("feel_experience_mod", i - 10)
}
colnames(feeling_models) <- model_names
feel_fit(nparticipants)
feeling_models[1, ] <- feel_BIC(nparticipants)
Plotting the BICs summed across participants for the expected-feelings models, we find that Model 3, the S-shaped model, is the best feeling function.
feeling_expect <- feeling_models %>%
select(1:10) %>%
gather("model", "BIC", 1:10) %>%
separate(model, sep="_", into= c("first", "second", "model"))
feeling_expect$model <- factor(feeling_expect$model, levels = c("mod10", "mod9", "mod8", "mod7", "mod6",
"mod5", "mod4", "mod3", "mod2", "mod1"))
# plots average summed up BIC expected feeling models
ggplot(feeling_expect, aes(x = model, y = BIC)) +
stat_summary(fun.y = "identity", geom="bar", fill="orange", width = .5) +
ylab("BIC(Summed Across Participants)\nBetter Fit (Lower BIC)") +
ggtitle("Expected-Feelings Models") +
ggthemes::theme_few() +
coord_flip(ylim = c(10500, 12500))
Compared with the original’s Expected Feeling Models:
The original finds that Model 3 (S-shaped model) is the best model for expected feelings. The replication finds the same best model for expected feelings.
The same analysis plan is repeated for experienced feelings. We fit Feeling Models 1-10 for experienced feelings for every participant and select the best model, i.e., the one with the lowest BIC summed across all participants.
feeling_experience <- feeling_models %>%
select(11:20) %>%
gather("model", "BIC", 1:10) %>%
separate(model, sep="_", into= c("first", "second", "model"))
feeling_experience$model <- factor(feeling_experience$model, levels = c("mod10", "mod9", "mod8", "mod7", "mod6",
"mod5", "mod4", "mod3", "mod2", "mod1"))
# plots average summed up BIC experienced feeling models
ggplot(feeling_experience, aes(x = model, y = BIC)) +
stat_summary(fun.y = "identity", geom="bar", fill="darkblue", width = .5) +
ylab("BIC(Summed Across Participants)\nBetter Fit (Lower BIC)") +
ggtitle("Experienced-Feelings Models") +
ggthemes::theme_few() +
coord_flip(ylim=c(10500, 12500))
Compared with the original’s Experienced Feeling Models.
Again, the original finds that Model 3 (the S-shaped model) is the best model for experienced feelings. The replication finds the same best model for experienced feelings.
To summarize the results of the feeling functions, the graph below shows the average reported expected and experienced feeling for each objective value, with standard error bars. Model 3, the S-shaped model that best fit both expected and experienced feelings, is indicated by the fitted lines.
feel_expect_sum <- summarySE(feel_expect, measurevar = "response", groupvars = c("value"))
feel_expect_sum$type <- "Expected Feelings"
feel_experience_sum <- summarySE(feel_experience, measurevar = "response", groupvars = c("value"))
feel_experience_sum$type <- "Experienced Feelings"
feel_data_sum <- rbind(feel_expect_sum, feel_experience_sum)
ggplot() +
geom_errorbar(data=feel_data_sum %>% filter(type =="Expected Feelings"), aes(x = value, y = response, ymin=response-se, ymax=response+se), colour="black", width=.1, position=position_dodge(0.1)) +
geom_point(data = feel_data_sum %>% filter(type == "Expected Feelings"),
aes(x = value, y = response, fill="Expected Feelings"), position=position_dodge(0.1), size=3, shape=21) +
geom_errorbar(data=feel_data_sum %>% filter(type =="Experienced Feelings"), aes(x = value, y = response, col = type, ymin=response-se, ymax=response+se), colour="black", width=.1, position=position_dodge(0.1)) +
geom_point(data = feel_data_sum %>% filter(type == "Experienced Feelings"),
aes(x = value, y = response, fill="Experienced Feelings"), position=position_dodge(0.1), size=3, shape=21) +
ggtitle("Feeling Function") +
geom_smooth(data = feel_expect, aes(x = value, y = response, col="Expected Feelings"), method="lm", formula='y ~ I(ifelse(x > 0, abs(x) ^ .2437, -abs(x) ^ .2437))', se=F, linetype="dashed") +
geom_smooth(data = feel_experience, aes(x = value, y = response, col="Experienced Feelings"), method="lm", formula='y ~ I(ifelse(x > 0, abs(x) ^ .3652, -abs(x) ^ .3651))', se=F) +
xlab("Objective Monetary Values") +
ylab("Feeling") +
ggtitle("Model 3 Feeling Function") +
coord_cartesian(xlim=c(-12,12)) +
expand_limits(x=0, y=0) +
geom_hline(yintercept=0)+
geom_vline(xintercept=0) +
theme_classic() +
theme(axis.text=element_blank(), axis.ticks=element_blank(), axis.line=element_blank()) +
geom_segment(aes(x=seq(-12,12,2), y=0, xend=seq(-12,12,2), yend=-.2)) +
geom_text(aes(x=seq(-12,12,2), y=-.4, label=as.character(seq(-12,12,2)))) +
geom_segment(aes(x=0, y=seq(-4,4,1), xend=-.2, yend=seq(-4,4,1))) +
geom_text(aes(x=-.4, y=seq(-4,4,1), label=as.character(seq(-4,4,1)))) +
scale_fill_manual(name="Data", values=c("orange", "darkblue")) +
scale_color_manual(name="Model", values=c("orange", "darkblue"))
Choice Models
Next, we fit Choice Models 1-7 and select the best model, i.e., the one with the lowest BIC summed across all participants. The same fitting procedure is repeated for 100 iterations of the split-half analysis. The key analysis evaluates whether the BICs of Models 1 and 2 are lower than the BICs of Models 3-7, both without and with the split-half analysis.
The best model in each split-half iteration is tallied in choice_models_splitHalf. Overall choice-model BICs without the split-half analysis are stored in choice_models_nosplit.
n.iter <- 5 # number of split-half iterations run in this chunk; the analysis described in the text uses 100
#init choice model array
choice_models <- data.frame(matrix(ncol = 7, nrow = n.iter))
model_names <- character(7)
for(i in 1:7){
model_names[i] <- paste0("choice_mod", i)
}
colnames(choice_models) <- model_names
# Choice Model Fitting w/ Split Half Analysis
for(i in 1:n.iter){
choice_fit(split = TRUE, nparticipants) # fit choice models split half
choice_models[i, ] <- choice_BIC(nparticipants)
message(i)
}
# Tallies how many times a model was the best model across all iterations
choice_models_splitHalf <- colSums(t(apply(choice_models, 1, FUN = function(x) {x == min(x)})))
choice_models_splitHalf <- data.frame(modelnum=names(choice_models_splitHalf), Best_Model_Count = as.vector(choice_models_splitHalf))
choice_models_splitHalf$model <- c(rep("Affect Based Models", 2), rep("Value Based Models", 5))
choice_models_splitHalf$model <- as.factor(choice_models_splitHalf$model)
# Choice Model Fitting w/o Split Half Analysis
choice_models_nosplit <- data.frame(matrix(ncol = 7, nrow = 1))
colnames(choice_models_nosplit) <- model_names
#start fitting
choice_fit(split = FALSE, nparticipants)
choice_models_nosplit[1, ] <- choice_BIC(nparticipants)
Graph the BICs of all choice models without the split-half analysis.
choice_models_nosplit <- choice_models_nosplit %>%
gather("model", "BIC", 1:7) %>%
separate(model, sep="_", into= c("first", "model")) %>%
filter(!is.na(BIC))
choice_models_nosplit$model <- c("Expected", "Experienced", "Value", "Log(Value)", "Value & Loss Aversion", "Value & Risk Aversion","Value, Loss, & Risk Aversion")
choice_models_nosplit$model <- factor(choice_models_nosplit$model, c("Value, Loss, & Risk Aversion", "Value & Risk Aversion", "Value & Loss Aversion", "Log(Value)", "Value", "Experienced", "Expected"))
ggplot() +
stat_summary(data=choice_models_nosplit[1,], aes(x = model, y = BIC), fill = "orange", fun.y = "identity", geom = "bar", width=.5) +
stat_summary(data=choice_models_nosplit[2,], aes(x = model, y = BIC), fill = "darkblue", fun.y = "identity", geom="bar", width=.5) +
stat_summary(data=choice_models_nosplit[3:7,], aes(x = model, y = BIC), fun.y = "identity", geom="bar", width=.5) +
scale_x_discrete(limits = levels(choice_models_nosplit$model)) +
ylab("BIC(Summed Across Participants)\nBetter Fit (Lower BIC)\n") +
ggtitle("Choice Models") +
ggthemes::theme_few() +
coord_flip(ylim=c(2000,5000))
Compare this with the original’s Choice models.
The original shows that both the expected-feelings and experienced-feelings choice models had lower summed BICs than the value-based models (Models 3-7). The replication confirms this finding: both feeling models had lower BICs than all of the value-based models. The Experienced-Feelings model and the Log(Value) model have very similar BIC values, but on closer inspection the Experienced-Feelings model has the lower BIC.
| model | BIC |
|---|---|
| Expected | 2986.448 |
| Experienced | 3058.946 |
| Value | 3200.982 |
| Log(Value) | 3061.162 |
| Value & Loss Aversion | 4340.090 |
| Value & Risk Aversion | 3404.412 |
| Value, Loss, & Risk Aversion | 3425.911 |
The BIC difference between the feeling models and the Log(Value) model is very small, so the split-half analysis is used to confirm that this finding is robust. The original paper reports that the finding held in 99 of 100 iterations; this replication finds that it held in 100 of 100 iterations.
| model | Best_Model_Count |
|---|---|
| Affect Based Models | 5 |
| Value Based Models | 0 |
The original paper uses summed BIC to compare models, but BIC values do not give an intuitive sense of the models' predictive value. Here, I am interested in the accuracy of the choice models. Specifically, I perform a similar split-half analysis but report accuracy rather than BIC: model parameters are fitted to a random half of the data, and accuracy scores are computed on the other half. This is repeated 100 times to yield a stable measure of accuracy.
Note: accuracy is not a good basis for model comparison. A model with a higher accuracy score is not necessarily a better model; factors such as overfitting and the number of parameters in the model need to be accounted for.
accuracy <- choice_fit(split = T, num = nparticipants, accuracy = T)
for(i in 1:(n.iter - 1)){
accuracy <- rbind(accuracy, choice_fit(split = T, num = nparticipants, accuracy = T))
message(i)
}
accuracy <- accuracy %>%
gather("model", "accuracy", 1:7) %>%
separate(model, sep = "_", into=c("first", "model")) %>%
mutate(model = factor(model)) %>%
group_by(model) %>%
summarise(accuracy = mean(accuracy))
ggplot() +
stat_summary(data=accuracy[1, ], aes(x = model, y = accuracy), fill = "orange", fun.y="identity", geom="bar", width=.5) +
stat_summary(data=accuracy[2, ], aes(x = model, y = accuracy), fill = "darkblue", fun.y="identity", geom="bar", width=.5) +
stat_summary(data=accuracy[3:7, ], aes(x = model, y = accuracy), fun.y="identity", geom="bar", width=.5) +
coord_cartesian(ylim=c(0,1)) +
scale_y_continuous(breaks=c(0, .4, .8, 1), labels=percent) +
ylab("Average Prediction Accuracy") +
xlab("Model") +
ggthemes::theme_few() +
geom_abline(slope=0, intercept=.5, col="red", lty=2, size=1.2)
It is interesting that almost all of the models perform at about 90% accuracy. Model 6 is notably lower, but it still performs well above chance; its weaker performance could be attributable to convergence errors when fitting Model 6.
These results show a successful replication of Charpentier et al. (2016). Even though the replication sample (N = 30) was substantially smaller than the original's (N = 56), all major findings were replicated. Both expected and experienced feelings were best described by the same S-shaped model (Model 3). More importantly, for the key analysis, the affect-based models (Models 1 and 2) had lower total BICs, and thus better predicted choice, than the value-based models (Models 3-7). The key finding is robust across 100 of 100 split-half iterations, similar to the original authors' report (99 of 100).
A note about the current analysis: Model 6 of the choice models often failed to converge. This could reflect the model itself or the algorithm used to fit it. Follow-up exploratory analysis could investigate the cause of the failure to converge, although such an examination is probably not needed, since Model 6 was still a worse model than the rest even when all models were fitted successfully without convergence errors.
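A hedged sketch of how such a follow-up could flag convergence problems is given below. fit_model6 is a hypothetical wrapper around the actual fitting code in utils.R; the checks themselves are standard (glm objects carry a converged flag, and bbmle::mle2 stores the optim convergence code, where 0 means success, in its details slot).
# returns TRUE if a fitted choice model converged, under the assumptions above
check_convergence <- function(fit) {
  if (inherits(fit, "glm")) return(fit$converged)
  if (inherits(fit, "mle2")) return(fit@details$convergence == 0)
  NA
}
# hypothetical usage, assuming fit_model6(i) refits Model 6 for participant i:
# converged <- sapply(1:nparticipants, function(i) check_convergence(fit_model6(i)))
# table(converged)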
Even though the replication was successful, slight changes to the analysis plan could yield interesting insights not reported in the original paper. For one, the only measure of subjective value in this replication was the difference between the trial rating and the midpoint of the scale (0). Charpentier et al. specified other measures of subjective value that could be worth exploring (such as measures based on the difference between the current trial's rating and the previous trial's rating). This change could be interesting because the present analysis assumes that the affect on one trial is independent of the next; it is possible that participants who felt negatively on one trial felt similarly negative on the next, regardless of the expected or experienced monetary outcome. Models that account for this temporal dependency could be interesting to explore.
devtools::session_info()
## Session info -------------------------------------------------------------
## setting value
## version R version 3.3.3 (2017-03-06)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_United States.1252
## tz America/Los_Angeles
## date 2017-12-14
## Packages -----------------------------------------------------------------
## package * version date source
## assertthat 0.2.0 2017-04-11 CRAN (R 3.3.3)
## backports 1.1.0 2017-05-22 CRAN (R 3.3.3)
## base * 3.3.3 2017-03-06 local
## bbmle * 1.0.20 2017-10-30 CRAN (R 3.3.3)
## bindr 0.1 2016-11-13 CRAN (R 3.3.3)
## bindrcpp * 0.2 2017-06-17 CRAN (R 3.3.3)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.3.3)
## datasets * 3.3.3 2017-03-06 local
## devtools 1.13.3 2017-08-02 CRAN (R 3.3.3)
## digest 0.6.12 2017-01-27 CRAN (R 3.3.3)
## dplyr * 0.7.2 2017-07-20 CRAN (R 3.3.3)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.3.3)
## ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.3.3)
## ggthemes 3.4.0 2017-02-19 CRAN (R 3.3.3)
## glue 1.1.1 2017-06-21 CRAN (R 3.3.3)
## graphics * 3.3.3 2017-03-06 local
## grDevices * 3.3.3 2017-03-06 local
## grid 3.3.3 2017-03-06 local
## gtable 0.2.0 2016-02-26 CRAN (R 3.3.3)
## highr 0.6 2016-05-09 CRAN (R 3.3.3)
## htmltools 0.3.6 2017-04-28 CRAN (R 3.3.3)
## jsonlite * 1.5 2017-06-01 CRAN (R 3.3.3)
## knitr * 1.17 2017-08-10 CRAN (R 3.3.3)
## labeling 0.3 2014-08-23 CRAN (R 3.3.2)
## lattice 0.20-34 2016-09-06 CRAN (R 3.3.3)
## lazyeval 0.2.0 2016-06-12 CRAN (R 3.3.3)
## magrittr 1.5 2014-11-22 CRAN (R 3.3.3)
## MASS 7.3-45 2016-04-21 CRAN (R 3.3.3)
## memoise 1.1.0 2017-04-21 CRAN (R 3.3.3)
## methods * 3.3.3 2017-03-06 local
## munsell 0.4.3 2016-02-13 CRAN (R 3.3.3)
## numDeriv 2016.8-1 2016-08-27 CRAN (R 3.3.2)
## pkgconfig 2.0.1 2017-03-21 CRAN (R 3.3.3)
## plyr 1.8.4 2016-06-08 CRAN (R 3.3.3)
## R6 2.2.2 2017-06-17 CRAN (R 3.3.3)
## Rcpp 0.12.12 2017-07-15 CRAN (R 3.3.3)
## rlang 0.1.1 2017-05-18 CRAN (R 3.3.3)
## rmarkdown 1.6 2017-06-15 CRAN (R 3.3.0)
## rprojroot 1.2 2017-01-16 CRAN (R 3.3.3)
## scales * 0.4.1 2016-11-09 CRAN (R 3.3.3)
## stats * 3.3.3 2017-03-06 local
## stats4 * 3.3.3 2017-03-06 local
## stringi 1.1.5 2017-04-07 CRAN (R 3.3.3)
## stringr 1.2.0 2017-02-18 CRAN (R 3.3.3)
## tibble 1.3.3 2017-05-28 CRAN (R 3.3.3)
## tidyr * 0.6.3 2017-05-15 CRAN (R 3.3.3)
## tools 3.3.3 2017-03-06 local
## utils * 3.3.3 2017-03-06 local
## withr 2.0.0 2017-07-28 CRAN (R 3.3.3)
## yaml 2.1.14 2016-11-12 CRAN (R 3.3.3)