A game consists of \(k_{max}\) balloons. Each balloon \(k\) has an underlying, unknown popping value \(pop_{k}\) ranging from 1 to \(pop_{max}\). A player pumps balloon \(k\) a total of \(x_{k}\) times. If \(x_{k} < pop_{k}\), the player earns \(x_{k}\) points. If \(x_{k} = pop_{k}\), the balloon pops and the player earns no points for that balloon.
In the solitary game, players try to earn as many points as possible across all \(k_{max}\) balloons.
In the competitive game, the player who ends the game with the most points wins a fixed bonus while the other (loser) earns nothing.
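To make the scoring rule above concrete, here is a minimal sketch (the helper name `balloon.points` is ours, not part of the simulation code below):

```r
# Points earned on a single balloon: pumping x times on a balloon with
# popping value pop earns x points if x < pop, and 0 points if it pops.
balloon.points <- function(x, pop) {
  if (x < pop) x else 0
}

balloon.points(45, 60)  # 45 points (balloon survives)
balloon.points(60, 60)  # 0 points (balloon pops)
```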
We model decisions using a target fluctuating reinforcement learning model. The model assumes that, on every balloon, people have a (Normal) target distribution of pumping values characterized by a mean \(p_{\mu}\) and standard deviation \(p_{\sigma}\). Players begin each balloon by randomly selecting a sample target from their distribution. Players then pump until they reach their sampled target (assuming the balloon does not pop first!).
The mean and standard deviation of players’ target distributions reflect their learning and risk sensitivity. A risk-seeking (or high-exploration) subject will have a high \(p_{\mu}\) value, while a risk-avoiding (or low-exploration) subject will have a low \(p_{\mu}\) value. For example, Figure X shows, at the start of a balloon, the distribution of pumping values for a conservative subject (in blue) and a liberal subject (in red).
Here are definitions of our main terms:
| Term | Definition | Support |
|---|---|---|
| \(p_{\mu, k}\) | Mean pumping target on balloon k | [-Inf, +Inf] |
| \(p_{\sigma, k}\) | Standard deviation of pumping target on balloon k | [0, +Inf] |
| \(\hat{p}_{k}\) | Target pumping value for balloon k | {1, 2, …, 100} |
| \(k\) | Balloon index | [1, \(k_{max}\)] |
| \(x_{k}\) | Pumps on balloon \(k\) | [1, \(x_{max}\)] |
| \(y_{k}\) | Outcome of balloon \(k\) (1 for a pop, 0 for no pop) | {0, 1} |
| \(v_{k}\) | Earnings on balloon \(k\): if \(y_{k} = 1\), then \(v_{k} = 0\); if \(y_{k} = 0\), then \(v_{k} = x_{k}\) | {0, \(x_{k}\)} |
We use \(k\) to index balloons and \(x_{k}\) to represent pumps within a balloon. Each game contains \(k_{max}\) balloons, and the maximum number of pumps on a balloon is \(x_{max}\). Thus, \(x_{k}\) ranges from 1 to \(x_{max}\).
We use \(y_{k}\) to represent the outcome of balloon \(k\): \(y_{k} = 0\) indicates no pop, and \(y_{k} = 1\) indicates a pop.
We use \(v_{k}\) to represent the earnings on balloon \(k\). If \(y_{k} = 1\), then \(v_{k} = 0\). If \(y_{k} = 0\) (a non-popped balloon), then \(v_{k} = x_{k}\).
For example, consider a subject on balloon 3 who pumps 15 times and then stops pumping without the balloon popping. This would be represented by \(k = 3\), \(x_{k} = 15\), \(y_{k} = 0\), and \(v_{k} = 15\).
At the start of each balloon \(k\), players have an internal pumping distribution characterized by a mean \(p_{\mu, k}\) and standard deviation \(p_{\sigma, k}\) (for now, we assume that the distribution is a censored, discrete Normal distribution, though this may change later). For example, on balloon \(k = 5\), a player could have a mean pumping value of 10 and a standard deviation of 2, represented by \(p_{\mu, k = 5} = 10\) and \(p_{\sigma, k = 5} = 2\).
To play a balloon, players begin by randomly sampling a pumping target \(\hat{p}_{k}\) from their internal pumping distribution: \(\hat{p}_{k} \sim N(p_{\mu, k}, p_{\sigma, k})\). The player then pumps until the balloon bursts or he reaches \(\hat{p}_{k}\).
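As a minimal sketch of this sampling step (assuming the censored, discrete Normal described above; the helper name `sample.target` is ours, and the full simulation code below implements the same idea inline):

```r
# Draw a pumping target: sample from Normal(p.mu, p.sigma), round to the
# nearest integer, and censor at the lower bound of one pump.
sample.target <- function(p.mu, p.sigma) {
  target <- round(rnorm(1, mean = p.mu, sd = p.sigma))
  max(1, target)
}

sample.target(p.mu = 20, p.sigma = 2)  # an integer target near 20
```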
The model assumes that subjects adjust their values of \(p_{\mu, k}\) and \(p_{\sigma, k}\) as a function of three factors: balloon feedback, competitive feedback, and time.
Formally, \(p_{\mu, k + 1} = p_{\mu, k} + I_{k} + C_{k}\), where \(I_{k}\) is the adjustment due to individual feedback on balloon \(k\), and \(C_{k}\) is the adjustment due to competitive feedback on balloon \(k\).
\(I_{k}\) is the adjustment people make to their pumping tendencies as a function of the outcome of a balloon, specifically whether or not the balloon popped (\(y_{k}\)). This feedback, combined with a person’s individual feedback sensitivities \(Ip\) and \(In\), determines \(I_{k}\).
If a balloon \(k\) does not burst (\(y_{k} = 0\)), then \(I_{k} \ge 0\). Thus, \(p_{\mu}\) will increase (or stay the same): i.e., \(p_{\mu, k+1} \geq p_{\mu, k}\).
If a balloon \(k\) does burst (\(y_{k} = 1\)), then \(I_{k} \le 0\). Thus, \(p_{\mu}\) will decrease (or stay the same): i.e., \(p_{\mu, k+1} \leq p_{\mu, k}\).
The value change \(I_{k}\) depends on feedback (i.e., whether or not the balloon popped) and on individual differences in balloon feedback sensitivity, parameterized as \(Ip\) and \(In\), which indicate sensitivity to positive and negative feedback respectively.
If \(y_{k} = 0\), \(I_{k} = Ip\), where \(Ip\) is an individual’s sensitivity to receiving positive individual feedback. For example, \(Ip = 0.5\) means that, after a non-popped balloon, a participant increases her mean pumping rate \(p_{\mu}\) by 0.5.
If \(y_{k} = 1\), \(I_{k} = In\), where \(In\) is an individual’s sensitivity to receiving negative individual feedback. For example, \(In = -1.0\) means that, after a popped balloon, a participant decreases her mean pumping rate \(p_{\mu}\) by 1.
There are good psychological reasons to expect that people have different sensitivities to positive and negative feedback. Specifically, we might expect a stronger sensitivity to negative than to positive feedback, reflecting loss aversion.
For simplicity, we assume that \(Ip\) and \(In\) are stable within a participant over time. However, future models might take time into account by, for example, decreasing the magnitude of \(Ip\) and \(In\) over time.
To see how values of \(Ip\) and \(In\) influence pumping behavior, we conducted a simulation:
### Solitary Simulation Code
library(dplyr)
library(yarrr)
library(snowfall)
library(snow)
# ibart.update.fun
ibart.update.fun <- function(prior.m, # Prior mean
prior.o, # Prior outcome: points earned on prior balloon (0 = popped)
ind.pos.ur = 1, # Ip: Positive updating rate
ind.neg.ur = 2) { # In: Negative updating rate
if(prior.o == 0) {
new.m <- prior.m - ind.neg.ur
} else {
new.m <- prior.m + ind.pos.ur
}
return(new.m)
}
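# Quick illustration (added for clarity; uses the default updating rates above):
# after a popped balloon (prior.o = 0) the mean drops by ind.neg.ur, and after
# a non-popped balloon (prior.o > 0) it rises by ind.pos.ur.
# ibart.update.fun(prior.m = 20, prior.o = 0)   # 20 - 2 = 18
# ibart.update.fun(prior.m = 20, prior.o = 15)  # 20 + 1 = 21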
ibart.sim <- function(bpop.vec = sample(100, # Popping times
size = 50,
replace = TRUE),
ind.pos.ur = 2, # Positive updating rate
ind.neg.ur = 5, # Negative updating rate
pump.m.start = 20, # Starting pump.mean
pump.s.start = 2, # Starting pump.sd
pump.max = 99
) { # Maximum pumps
balloons.n <- length(bpop.vec)
# Set up storage
result <- as.data.frame(matrix(NA, nrow = length(bpop.vec), ncol = 8))
names(result) <- c("balloon", "p.mu", "p.target", "pop.time", "p.end", "pop", "points", "points.cum")
result$balloon <- 1:balloons.n
result$p.mu[1] <- pump.m.start
for(i in 1:balloons.n) {
# Current means
if(i == 1) {p.mu.i <- pump.m.start} else {p.mu.i <- result$p.mu[i]}
# Get samples
p.target.i <- round(rnorm(1, mean = p.mu.i, sd = pump.s.start), 0)
if(p.target.i < 1) {p.target.i <- 1}
# Current popping time
poptime.i <- bpop.vec[i]
# Determine outcome
if(p.target.i < poptime.i) { # No Pop!
p.end.i <- p.target.i
points.i <- p.target.i
pop.i <- 0
}
if(p.target.i >= poptime.i) { # Pop!
p.end.i <- poptime.i
points.i <- 0
pop.i <- 1
}
# Update pumping distribution
p1.m.new <- ibart.update.fun(prior.m = p.mu.i,
prior.o = points.i,
ind.pos.ur = ind.pos.ur,
ind.neg.ur = ind.neg.ur)
# Catch extreme values
if(p1.m.new > pump.max) {p1.m.new <- pump.max}
if(p1.m.new < 1) {p1.m.new <- 1}
# Write results
result$p.target[i] <- p.target.i
result$pop.time[i] <- poptime.i
result$p.end[i] <- p.end.i
result$pop[i] <- pop.i
result$points[i] <- points.i
if(i < balloons.n) {result$p.mu[i + 1] <- p1.m.new}
} # End balloon loop
# Done, summarise results
result$points.cum <- cumsum(result$points)
return(result)
}
# Some plotting code
plot.ibart <- function(x, ylim = NULL, ...) {
n.balloons <- nrow(x)
if(is.null(ylim)) {ylim = c(0, 100)}
plot(1, xlim = c(1, n.balloons), ylim = ylim, type = "n", xaxt = "n",
xlab = "Balloon", ylab = "Mean Pumping Time", ...)
axis(1, seq(1, n.balloons, 2))
# MEAN PUMPING RATES
points(x$balloon, x$p.mu, type = "b")
# Means for unpopped balloons
points(x$balloon[x$pop == 0], x$p.mu[x$pop == 0], col = "seagreen4", pch = 16)
# Points for popped balloons
points(x$balloon[x$pop == 1], x$p.mu[x$pop == 1], col = "lightcoral", pch = 16)
# ACTUAL outcomes
points(x$balloon[x$pop == 0], x$p.end[x$pop == 0], col = "seagreen4")
points(x$balloon[x$pop == 1], x$p.end[x$pop == 1], col = "lightcoral")
# Legend
legend("topleft", legend = c("Mean", "Actual"), pch = c(16, 1), col = gray(.1), bty = "n")
}
Here are simulations of three players with different sensitivity biases. All start with a mean pumping rate of 20 (\(p_{\mu, 1} = 20\)). Player A is unbiased (\(Ip = |In|\)), player B has a negative bias (\(|In| > Ip\)), and player C has a positive bias (\(Ip > |In|\)).
par(mfrow = c(1, 3))
set.seed(100)
unbiased <- ibart.sim(pump.m.start = 20,
ind.pos.ur = 2,
ind.neg.ur = 2)
plot.ibart(unbiased, main = "A: Unbiased\nIp = +2, In = -2")
abline(h = 50, lty = 2)
set.seed(100)
negbias <- ibart.sim(pump.m.start = 20,
ind.pos.ur = 2,
ind.neg.ur = 10)
plot.ibart(negbias, main = "B: Negative Bias\nIp = +2, In = -10")
abline(h = 50, lty = 2)
set.seed(100)
posbias <- ibart.sim(pump.m.start = 20,
ind.pos.ur = 10,
ind.neg.ur = 2)
plot.ibart(posbias, main = "C: Positive Bias\nIp = +10, In = -2")
abline(h = 50, lty = 2)
Here we can see that a negativity bias leads to mean pumping rates well below the optimal value. A negative bias of this kind could reflect loss aversion.
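The dashed line at 50 marks the optimal target under this design: because popping values are drawn uniformly from 1 to 100 and a balloon pops once the target reaches its popping value, a target of \(x\) survives with probability \((100 - x)/100\), so expected earnings are \(x(100 - x)/100\), which peak at \(x = 50\). A quick check of this (a sketch under the uniform popping assumption used in the simulations):

```r
# Expected earnings of a fixed target x when popping values are uniform on 1:100
x <- 1:99
expected.points <- x * (100 - x) / 100
x[which.max(expected.points)]  # 50
max(expected.points)           # 25 expected points per balloon
```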
Next we conducted a series of simulations to see how positive and negative updating rates relate to pumping behavior and performance. We conducted 200 simulations for every combination of positive and negative updating rates in the set {0, 2, 5, 10}. In each simulation, players start with a mean pumping rate of \(p_{\mu} = 20\).
# SOLITARY SAMPLING SIMULATION 2
# Takes about 5 minutes to run in parallel
# sol.sim.dm
# Simulation design matrix
sol.sim.dm <- expand.grid(ind.pos.ur = c(0, 2, 5, 10),
ind.neg.ur = c(0, 2, 5, 10),
sim = 1:200)
# sol.sim.fun
# Solitary simulation function
sol.sim.fun <- function(i) {
print(i)
ind.pos.ur.i <- sol.sim.dm$ind.pos.ur[i]
ind.neg.ur.i <- sol.sim.dm$ind.neg.ur[i]
sim.result.i <- ibart.sim(ind.pos.ur = ind.pos.ur.i,
ind.neg.ur = ind.neg.ur.i,
pump.m.start = 20, # Starting pump.mean
pump.s.start = 2, # Starting pump.sd
pump.max = 99)
return(sim.result.i)
}
# Run in parallel?
parallel <- TRUE
# Run Sequential
if(parallel == FALSE) {
sol.sim.ls <- lapply(1:nrow(sol.sim.dm), sol.sim.fun)
}
# Run Parallel
if(parallel == TRUE) {
library(snow)
library(snowfall)
snowfall::sfInit(parallel = TRUE, cpus = 4)
snowfall::sfExportAll()
sol.sim.ls <- snowfall::sfClusterApplySR(1:nrow(sol.sim.dm), fun = sol.sim.fun, perUpdate = 1)
}
# Organize pumping values by trial
for(i in 1:length(sol.sim.ls)) {
print(i)
if(i == 1) {pump.by.trial <- matrix(sol.sim.ls[[i]]$p.mu, nrow = 1, ncol = nrow(sol.sim.ls[[i]]))}
if(i > 1) {pump.by.trial <- rbind(pump.by.trial, sol.sim.ls[[i]]$p.mu)}
}
save(sol.sim.ls, sol.sim.dm, pump.by.trial, file = "/Users/nphillips/Dropbox/rprojects/CompetitiveBART/data/solsimls.RData")
Here are the main results. In general, we find that when positive and negative updating rates are equal, players’ \(p_{\mu}\) values approach the optimal value of 50 over time. However, if \(Ip > In\), then \(p_{\mu}\) values drift towards values greater than 50. Similarly, if \(Ip < In\), then \(p_{\mu}\) values drift towards values less than 50.
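Why these asymptotic values? A rough equilibrium argument (our addition, assuming popping values are uniform on 1 to 100 and treating \(Ip\) and \(In\) as magnitudes): when the mean target is \(x\), a balloon pops with probability of roughly \(x/100\), so the expected change in \(p_{\mu}\) per balloon is

\[
E[\Delta p_{\mu}] \approx \left(1 - \frac{x}{100}\right) Ip \; - \; \frac{x}{100} In,
\]

which equals zero at \(x = 100 \cdot Ip / (Ip + In)\). When \(Ip = In\), this is 50; when \(Ip > In\), it is above 50; and when \(Ip < In\), it is below 50, matching the simulated drift.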
# Get sol.sim.ls, sol.sim.dm, pump.by.trial
load(file = "/Users/nphillips/Dropbox/rprojects/CompetitiveBART/data/solsimls.RData")
ind.neg.ur.vals <- unique(sol.sim.dm$ind.neg.ur)
ind.pos.ur.vals <- unique(sol.sim.dm$ind.pos.ur)
col.vec <- yarrr::piratepal("xmen", trans = .1)
par(mfrow = c(2, 2))
for(i in 1:length(ind.neg.ur.vals)) {
plot(1, xlim = c(0, 50), ylim = c(0, 100), type = "n", xlab = "Balloon", ylab = "Mean Pumping rate", main = paste0("Negative UR = ", ind.neg.ur.vals[i]))
abline(h = 50)
legend("topleft", legend = paste0("Pos UR = ", ind.pos.ur.vals),
lty = 1:4, bty = "n", col = col.vec[1:4], lwd = 2)
for(j in 1:length(ind.pos.ur.vals)) {
dat.i <- pump.by.trial[sol.sim.dm$ind.pos.ur == ind.pos.ur.vals[j] &
sol.sim.dm$ind.neg.ur == ind.neg.ur.vals[i],]
mean.dat.i <- colMeans(dat.i)
lines(1:50, mean.dat.i, lty = j, col = col.vec[j], lwd = 2)
}
}
\(C_{k}\) is the adjustment people make to their pumping tendencies as a function of competitive feedback, that is, how well they perform relative to their competitor. This feedback, combined with a person’s competitive feedback sensitivities \(Cp\) and \(Cn\), determines \(C_{k}\).
Unlike individual feedback, which necessarily occurs on every balloon, competitive feedback is assumed to occur in rounds. That is, feedback about a competitor’s performance arrives after every \(r\) balloons.
For example, consider a game with 100 balloons. A round might consist of \(r = 10\) balloons. Thus, after every 10 balloons, a player learns how many points his competitor earned in the past 10 balloons.
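As a small sketch of this round structure (the variable names here are ours; the simulation code below implements the same grouping with a `group` column):

```r
# With r balloons per round, balloon i belongs to round ceiling(i / r),
# and a player's round earnings are the sum of points within that round.
r <- 10
balloon <- 1:100
round.id <- ceiling(balloon / r)                  # 1,...,1, 2,...,2, ..., 10
pts <- sample(0:20, size = 100, replace = TRUE)   # hypothetical per-balloon points
round.earnings <- tapply(pts, round.id, sum)      # earnings per 10-balloon round
```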
Once a player receives competitive feedback, he adjusts his mean pumping rate by \(C_{k}\). In general, if the player did better than his opponent, then he either increases his pumping rate or keeps it the same; that is, \(C_{k} \geq 0\). However, if he did worse than (or the same as) his opponent, then the change depends on how many balloons he popped in the round: if he popped many balloons, he decreases his pumping rate; if he popped few, he increases it.
Formally, we define \(C_{k}\) as follows. Let \(rv_{i}\) be a player’s earnings in round \(i\), and \(rv^{comp}_{i}\) be his competitor’s earnings in round \(i\).
If \(rv_{i} > rv^{comp}_{i}\) (the player won the round), then \(C_{k} = Cp\).
If \(rv_{i} \le rv^{comp}_{i}\) (the player lost or tied the round), then one of two things can occur: if the player popped at least half of the balloons in the round, then \(C_{k} = -Cn\); if he popped fewer than half, then \(C_{k} = +Cn\).
# COMPETITIVE BART SIMULATION FUNCTIONS
# Player's mean pump rate is a function of..
# prior.m : prior mean
# ind.prior.o: individual recent outcomes (0 = pop, x > 0 means earned points)
# comp.prior.o: competitor's recent outcomes
# ind.pos.ur: solitary updating rates when no pop
# ind.neg.ur: solitary updating rates when pop
# comp.pos.ur: competitive updating rates when beat opp
# comp.neg.ur: competitive updating rates when lose to opp
cbart.update.fun <- function(prior.m = 20,
ind.prior.o = 0,
comp.prior.o = NULL,
ind.pos.ur = 1,
ind.neg.ur = 1,
comp.pos.ur = 0,
comp.neg.ur = 2) {
# If there is no competitive feedback
if(is.null(comp.prior.o)) {
if(ind.prior.o[length(ind.prior.o)] == 0) {
new.m <- prior.m - ind.neg.ur
} else {
new.m <- prior.m + ind.pos.ur
}
}
# If there is competitive feedback
if(is.null(comp.prior.o) == FALSE) {
# I LOST this round
if(sum(comp.prior.o) >= sum(ind.prior.o)) {
# If I have popped lots of balloons
if(mean(ind.prior.o == 0) >= .5) {
new.m <- prior.m - comp.neg.ur
}
# If I have NOT popped lots of balloons
if(mean(ind.prior.o == 0) < .5) {
new.m <- prior.m + comp.neg.ur
}
}
# I WON this round...update by comp.pos.ur
if(sum(comp.prior.o) < sum(ind.prior.o)) {
new.m <- prior.m + comp.pos.ur
}
}
return(new.m)
}
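# Quick illustration of cbart.update.fun (added for clarity; values follow from
# the rules above with the defaults Cp = comp.pos.ur = 0 and Cn = comp.neg.ur = 2):
# - Round lost, at least half of the round's balloons popped: mean drops by comp.neg.ur
#     cbart.update.fun(prior.m = 20, ind.prior.o = c(0, 0, 5),   comp.prior.o = c(10, 10, 10))  # 18
# - Round lost, fewer than half popped: mean rises by comp.neg.ur
#     cbart.update.fun(prior.m = 20, ind.prior.o = c(5, 5, 0),   comp.prior.o = c(10, 10, 10))  # 22
# - Round won: mean changes by comp.pos.ur (0 by default)
#     cbart.update.fun(prior.m = 20, ind.prior.o = c(20, 20, 20), comp.prior.o = c(5, 5, 5))    # 20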
cbart.sim <- function(bpop.vec = sample(100,
size = 100,
replace = TRUE),
ind.pos.ur = c(2, 2),
ind.neg.ur = c(5, 5),
comp.pos.ur = c(0, 0),
comp.neg.ur = c(10, 0),
start.m = c(20, 20),
start.s = c(4, 4),
ff = 10
) {
#
# bpop.vec = sample(100, size = 100, replace = TRUE)
# ind.pos.ur = c(1, 1)
# ind.neg.ur = c(3, 3)
# comp.pos.ur = c(2, 2)
# comp.neg.ur = c(10, 10)
# start.m = c(20, 20)
# start.s = c(5, 5)
# ff = 10
#
balloons.n <- length(bpop.vec)
feedback.times <- seq(0, balloons.n, by = ff)
p1.ind.pos.ur <- ind.pos.ur[1]
p2.ind.pos.ur <- ind.pos.ur[2]
p1.ind.neg.ur <- ind.neg.ur[1]
p2.ind.neg.ur <- ind.neg.ur[2]
p1.comp.pos.ur <- comp.pos.ur[1]
p2.comp.pos.ur <- comp.pos.ur[2]
p1.comp.neg.ur <- comp.neg.ur[1]
p2.comp.neg.ur <- comp.neg.ur[2]
# Set up storage
p1.result <- as.data.frame(matrix(NA, nrow = length(bpop.vec), ncol = 6))
p2.result <- as.data.frame(matrix(NA, nrow = length(bpop.vec), ncol = 6))
names(p1.result) <- c("balloon", "p.mu", "p.target", "p.end", "pop", "points")
names(p2.result) <- c("balloon", "p.mu", "p.target", "p.end", "pop", "points")
p1.result$balloon <- 1:balloons.n
p2.result$balloon <- 1:balloons.n
p1.result$group <- rep(1:(length(feedback.times) - 1), each = ff)
p2.result$group <- rep(1:(length(feedback.times) - 1), each = ff)
for(i in 1:balloons.n) {
group.current <- which(feedback.times == i) - 1
# Current popping time
pop.i <- bpop.vec[i]
p1.result$pop[i] <- pop.i
p2.result$pop[i] <- pop.i
# Current means
if(i == 1) {p1.p.mu <- start.m[1]} else {p1.p.mu <- p1.result$p.mu[i]}
if(i == 1) {p2.p.mu <- start.m[2]} else {p2.p.mu <- p2.result$p.mu[i]}
# Write
p1.result$p.mu[i] <- p1.p.mu
p2.result$p.mu[i] <- p2.p.mu
# Get samples
p1.s <- round(rnorm(1, mean = p1.p.mu, sd = start.s[1]), 0)
p2.s <- round(rnorm(1, mean = p2.p.mu, sd = start.s[2]), 0)
if(p1.s < 1) {p1.s <- 1}
if(p2.s < 1) {p2.s <- 1}
# Write
p1.result$p.target[i] <- p1.s
p2.result$p.target[i] <- p2.s
# Determine outcome
if(p1.s < pop.i) {
p1.result$p.end[i] <- p1.s
p1.result$pop[i] <- 0
p1.result$points[i] <- p1.s
}
if(p1.s >= pop.i) {
p1.result$p.end[i] <- pop.i
p1.result$pop[i] <- 1
p1.result$points[i] <- 0
}
if(p2.s < pop.i) {
p2.result$p.end[i] <- p2.s
p2.result$pop[i] <- 0
p2.result$points[i] <- p2.s
}
if(p2.s >= pop.i) {
p2.result$p.end[i] <- pop.i
p2.result$pop[i] <- 1
p2.result$points[i] <- 0
}
# Individual updating
if((i %in% feedback.times) == FALSE) {
p1.m.new <- cbart.update.fun(prior.m = p1.p.mu,
ind.prior.o = p1.result$points[i],
ind.pos.ur = p1.ind.pos.ur,
ind.neg.ur = p1.ind.neg.ur,
comp.pos.ur = p1.comp.pos.ur,
comp.neg.ur = p1.comp.neg.ur
)
p2.m.new <- cbart.update.fun(prior.m = p2.p.mu,
ind.prior.o = p2.result$points[i],
ind.pos.ur = p2.ind.pos.ur,
ind.neg.ur = p2.ind.neg.ur,
comp.pos.ur = p2.comp.pos.ur,
comp.neg.ur = p2.comp.neg.ur
)
}
if((i %in% feedback.times) == TRUE) {
p1.m.new <- cbart.update.fun(prior.m = p1.p.mu,
ind.prior.o = p1.result$points[p1.result$group == group.current],
comp.prior.o = p2.result$points[p2.result$group == group.current],
ind.pos.ur = p1.ind.pos.ur,
ind.neg.ur = p1.ind.neg.ur,
comp.pos.ur = p1.comp.pos.ur,
comp.neg.ur = p1.comp.neg.ur
)
p2.m.new <- cbart.update.fun(prior.m = p2.p.mu,
ind.prior.o = p2.result$points[p2.result$group == group.current],
comp.prior.o = p1.result$points[p1.result$group == group.current],
ind.pos.ur = p2.ind.pos.ur,
ind.neg.ur = p2.ind.neg.ur,
comp.pos.ur = p2.comp.pos.ur,
comp.neg.ur = p2.comp.neg.ur
)
}
# Catch extreme values
if(p1.m.new > 99) {p1.m.new <- 99}
if(p2.m.new > 99) {p2.m.new <- 99}
if(p1.m.new < 1) {p1.m.new <- 2}
if(p2.m.new < 1) {p2.m.new <- 2}
if(i < balloons.n) {
p1.result$p.mu[i + 1] <- p1.m.new
p2.result$p.mu[i + 1] <- p2.m.new
}
}
# Done, summarise results
p1.result$points.cum <- cumsum(p1.result$points)
p2.result$points.cum <- cumsum(p2.result$points)
return(list(p1.result, p2.result))
}
plot.cbart <- function(x, ...) {
play <- x[[1]]
comp <- x[[2]]
n.balloons <- nrow(play)
plot(1, xlim = c(1, n.balloons), ylim = c(1, 100), type = "n", xaxt = "n",
xlab = "Balloon", ylab = "Mean Pumping Time", ...)
axis(1, seq(1, n.balloons, 2))
# MEAN PUMPING RATES
lines(play$balloon, play$p.mu)
lines(comp$balloon, comp$p.mu, lty = 2)
legend("topleft", legend = c("player", "competitor"), lty = c(1, 2))
}
Here are two players with negative individual biases who ignore competitive information (\(Cn = Cp = 0\)). Trials with competitive feedback (every 10 balloons) are marked with open circles. Here we see no effect of competitive feedback on pumping rates.
set.seed(100)
x <- cbart.sim(ind.pos.ur = c(1, 1),
ind.neg.ur = c(5, 5),
comp.pos.ur = c(0, 0),
comp.neg.ur = c(0, 0),
ff = 10)
plot.cbart(x, main = "Ignore Competition\nPlayer Cn = Cp = 0")
points(seq(10, 100, 10), x[[1]]$p.mu[seq(10, 100, 10)])
abline(h = 50, lty = 2)
Now we explore a case where the main player has a large \(Cn = 30\) value. Here we can see that, on trial 40, the player increases his mean pumping rate by 30 to just above 50, while the competitor’s mean pumping rate stays the same. While final points are not shown in this graph, because the agent then has a period of pumping values close to the optimal value of 50, we would expect this agent to outperform his competitor.
set.seed(100)
x <- cbart.sim(ind.pos.ur = c(1, 1),
ind.neg.ur = c(5, 5),
comp.pos.ur = c(0, 0),
comp.neg.ur = c(30, 0),
ff = 20)
plot.cbart(x, main = "Attend to Competition\nPlayer Cp = +30")
points(seq(20, 100, 20), x[[1]]$p.mu[seq(20, 100, 20)]) # mark competitive feedback trials (ff = 20)
abline(h = 50, lty = 2)
Next we conducted a series of simulations to see how competitive and individual updating rates, and the number of balloons per round, relate to pumping behavior and performance. We conducted 250 simulations for every combination of the parameter values shown in the design matrix below.
We set the starting mean pumping rate \(p_{\mu}\) to 20 and the competitive positive sensitivity \(Cp\) to 0 for all agents.
# COMPETITIVE BART SIMULATION
# Result is comp.sim.dm
# Competitive simulation design matrix
comp.sim.dm <- expand.grid(my.comp.neg.ur = c(0, 1, 5, 10), # Player A Cn
comp.comp.neg.ur = c(0, 1, 5, 10), # Player B Cn
my.comp.pos.ur = 0, # Player A Cp
comp.comp.pos.ur = 0, # Player B Cp
my.ind.pos.ur = c(1), # Player A Ip
comp.ind.pos.ur = c(1), # Player B Ip
my.ind.neg.ur = c(1, 5), # Player A In
comp.ind.neg.ur = c(1, 5), # Player B In
ff = c(1, 10, 20), # r (round length)
sim = 1:250)
comp.sim.dm[c("my.target.mean", "my.target.sd", "my.pop.mean", "my.points",
"comp.target.mean", "comp.target.sd", "comp.pop.mean", "comp.points",
"group.points", "win")] <- NA
# Competitive simulation function
comp.sim.fun <- function(i) {
my.comp.pos.ur.i <- comp.sim.dm$my.comp.pos.ur[i]
comp.comp.pos.ur.i <- comp.sim.dm$comp.comp.pos.ur[i]
my.comp.neg.ur.i <- comp.sim.dm$my.comp.neg.ur[i]
comp.comp.neg.ur.i <- comp.sim.dm$comp.comp.neg.ur[i]
my.ind.pos.ur.i <- comp.sim.dm$my.ind.pos.ur[i]
comp.ind.pos.ur.i <- comp.sim.dm$comp.ind.pos.ur[i]
my.ind.neg.ur.i <- comp.sim.dm$my.ind.neg.ur[i]
comp.ind.neg.ur.i <- comp.sim.dm$comp.ind.neg.ur[i]
ff.i <- comp.sim.dm$ff[i]
result.i <- cbart.sim(ind.pos.ur = c(my.ind.pos.ur.i, comp.ind.pos.ur.i),
ind.neg.ur = c(my.ind.neg.ur.i, comp.ind.neg.ur.i),
comp.pos.ur = c(my.comp.pos.ur.i, comp.comp.pos.ur.i),
comp.neg.ur = c(my.comp.neg.ur.i, comp.comp.neg.ur.i),
ff = ff.i)
my.target.mean.i <- mean(result.i[[1]]$p.target)
my.target.sd.i <- sd(result.i[[1]]$p.target)
my.pop.mean.i <- mean(result.i[[1]]$pop)
my.points.i <- sum(result.i[[1]]$points)
comp.target.mean.i <- mean(result.i[[2]]$p.target)
comp.target.sd.i <- sd(result.i[[2]]$p.target)
comp.pop.mean.i <- mean(result.i[[2]]$pop)
comp.points.i <- sum(result.i[[2]]$points)
group.points.i <- my.points.i + comp.points.i
if(my.points.i > comp.points.i) {win.i <- 1}
if(my.points.i < comp.points.i) {win.i <- 0}
if(my.points.i == comp.points.i) {win.i <- NA}
return(c(my.target.mean.i, my.target.sd.i, my.pop.mean.i, my.points.i,
comp.target.mean.i, comp.target.sd.i, comp.pop.mean.i, comp.points.i,
group.points.i, win.i))
}
# Run simulation
library(snowfall)
snowfall::sfInit(parallel = TRUE, cpus = 8)
snowfall::sfExportAll()
sim.result.ls <- snowfall::sfClusterApplySR(1:nrow(comp.sim.dm), fun = comp.sim.fun, perUpdate = 1)
comp.sim.dm[c("my.target.mean", "my.target.sd", "my.pop.mean", "my.points",
"comp.target.mean", "comp.target.sd", "comp.pop.mean", "comp.points",
"group.points", "win")] <- matrix(unlist(sim.result.ls), nrow = nrow(comp.sim.dm), ncol = 10, byrow = TRUE)
# Save result
save(comp.sim.dm, file = "data/comp_sim_dm.RData")
# AGGREGATE COMPETITIVE SIMULATION RESULTS
load("data/comp_sim_dm.RData")
# Aggregated data in comp.sim.agg
comp.sim.agg <- comp.sim.dm %>%
group_by(my.ind.pos.ur, my.ind.neg.ur,
comp.ind.pos.ur, comp.ind.neg.ur,
my.comp.neg.ur, comp.comp.neg.ur,
my.comp.pos.ur, comp.comp.pos.ur, ff) %>%
summarise(
my.points.mean = mean(my.points),
my.target.mean = mean(my.target.mean),
comp.points.mean= mean(comp.points),
comp.target.mean = mean(comp.target.mean),
win.mean = mean(win, na.rm = TRUE),
group.points.mean = mean(group.points),
my.pop.mean = mean(my.pop.mean),
comp.pop.mean = mean(comp.pop.mean),
N = n()
)
# Function for creating heatplots
heat.fun <- function(formula,
data,
min.val = NULL,
max.val = NULL,
main = NULL,
rounding = 0) {
#
# formula = win.mean ~ my.comp.neg.ur + comp.comp.neg.ur + ff
# data = subset(comp.sim.agg,
# my.ind.pos.ur == 1 &
# my.ind.neg.ur == 5 &
# comp.ind.pos.ur == 1 &
# comp.ind.neg.ur == 5)
# main = "p(Win)"
#
data.mf <- model.frame(formula, data)
dv <- data.mf[,1]
dv.name <- names(data.mf)[1]
n.iv <- ncol(data.mf) - 1
# Adjust min and max val
if(is.null(min.val)) {min.val <- min(dv)}
if(is.null(max.val)) {max.val <- max(dv)}
if(max(dv) > max.val) {max.val <- max(dv)}
if(min(dv) < min.val) {min.val <- min(dv)}
iv.names <- names(data.mf[,2:ncol(data.mf)])
if(n.iv == 2) {n.plots <- 1}
if(n.iv == 3) {n.plots <- length(unique(data.mf[,4])) ; iv3.name <- iv.names[3]}
iv.vals.ls <- lapply(1:n.iv, function(x) {data.mf[,x + 1]})
iv.unique.ls <- lapply(1:n.iv, function(x) {sort(unique(data.mf[,x + 1]))})
data.agg.ls <- vector("list", length = n.plots)
# Create aggregate Data
for(i in 1:n.plots) {
if(n.iv == 2) {
data.temp <- data.mf
}
if(n.iv == 3) {
iv3.val <- iv.unique.ls[[3]][i]
data.temp <- data.mf[data.mf[,4] == iv3.val,]
}
data.agg <- aggregate(formula = formula, data = data.temp, FUN = mean, na.rm = TRUE)
# Scale
dv.i <- data.agg[,ncol(data.agg)]
dv.s <- (dv.i - min.val) / (max.val - min.val)
data.agg$dv.s <- dv.s
data.agg.ls[[i]] <- data.agg
}
# PLOTTING
heat.col.fun <- circlize::colorRamp2(c(0, 1), colors = c("red", "blue"))
layout(matrix(c(rep(1, n.plots), 2:(n.plots + 1)), nrow = 2, ncol = n.plots, byrow = TRUE), widths = rep(3, n.plots), heights = c(.5, 3))
# par(mfrow = c(1, n.plots))
par(mar = c(0, 1, 0, 1))
plot.new()
if(is.null(main)) {main <- names(data.mf)[1]}
text(.5, .5, main, cex = 3)
par(mar = c(5, 4, 4, 1) + .1)
for(plot.i in 1:n.plots) {
x1.vals <- data.agg.ls[[plot.i]][,1]
x2.vals <- data.agg.ls[[plot.i]][,2]
dv.vals <- data.agg.ls[[plot.i]]$dv.s
dv.o.vals <- data.agg.ls[[plot.i]][,n.iv + 1]
x1.index.vals <- match(x1.vals, iv.unique.ls[[1]])
x2.index.vals <- match(x2.vals, iv.unique.ls[[2]])
plot(1, xlim = c(.25, length(iv.unique.ls[[1]]) + .75), ylim = c(.25, length(iv.unique.ls[[2]]) + .75), type = "n", xaxt = "n", yaxt = "n", xlab = iv.names[1], ylab = iv.names[2])
if(n.iv == 3) {mtext(paste(iv3.name, " = ", iv.unique.ls[[3]][plot.i]), 3)}
axis(1, at = 1:length(iv.unique.ls[[1]]), labels = iv.unique.ls[[1]])
axis(2, at = 1:length(iv.unique.ls[[2]]), labels = iv.unique.ls[[2]])
rect(x1.index.vals - .5, x2.index.vals - .5, x1.index.vals + .5, x2.index.vals + .5, pch = 15, col = heat.col.fun(dv.vals))
text(x1.index.vals, x2.index.vals, round(dv.o.vals, rounding), col = "white", font = 2, cex = 2)
}
}
We split the results using three different dependent variables: the probability of winning (i.e., earning more points across all 100 balloons than one’s opponent), the average points earned by an individual, and the average total points earned by the group.
First, we calculated the proportion of games that a player won. We find that agents have the highest win likelihoods when their own \(Cn\) values are high and their competitor’s \(Cn\) values are low. Additionally, the effect is larger when there are fewer balloons per round. For example, when an agent has a \(Cn\) value of 5, his competitor has a \(Cn\) value of 0, and the round length is 1, the agent has a win probability of 97%.
heat.fun(formula = win.mean ~ my.comp.neg.ur + comp.comp.neg.ur + ff,
data = subset(comp.sim.agg,
my.ind.pos.ur == 1 &
my.ind.neg.ur == 5 &
comp.ind.pos.ur == 1 &
comp.ind.neg.ur == 5),
main = "p(Win)", rounding = 2)
Next, we calculated the mean number of points that individuals earned. Here, we find that agents earn the most points when both they and their competitors have moderate to high \(Cn\) values. Additionally, the fewer balloons there are per round, the better both agents and their competitors perform.
heat.fun(formula = my.points.mean ~ my.comp.neg.ur + comp.comp.neg.ur + ff,
data = subset(comp.sim.agg,
my.ind.pos.ur == 1 &
my.ind.neg.ur == 5 &
comp.ind.pos.ur == 1 &
comp.ind.neg.ur == 5), main = "Individual mean Points")
Finally, we calculated the total number of points earned by the group. Because we have already seen in the individual points case that both agents and their competitors perform best with similar parameter values, we should expect the same pattern for group points.
heat.fun(formula = group.points.mean ~ my.comp.neg.ur + comp.comp.neg.ur + ff,
subset(comp.sim.agg,
my.ind.pos.ur == 1 &
my.ind.neg.ur == 5 &
comp.ind.pos.ur == 1 &
comp.ind.neg.ur == 5), main = "Group Points")