Intro

We have several potential candidates for the main dependent variable (DV). By a candidate I mean a variable that shows a significant difference between at least some treatments in Stage 2.

Candidates for main DV:

For each of these variables we present below, per stage and per treatment, histograms of the distributions, box-and-whisker plots, and bar charts of averages with 95% confidence intervals.

Some of these candidates are unsuitable because of the experimental design.

We cannot use them as DVs because in Stage 2 participants in the TP and TP+stress treatments had limited time by design, so a difference across treatments would arise mechanically from the design.

One more candidate is Daria’s original productivity measure. It is defined as the number of correct tasks submitted per minute and is calculated as “total number of correct tasks submitted in Stage \(i\) divided by total time spent in Stage \(i\)”. Mean productivity does not differ across treatments.
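The productivity definition above can be sketched in a few lines. This is only an illustration: the function name, the assumption that stage time is recorded in seconds, and the example participant are hypothetical and not taken from the study data.

```python
def correct_tasks_per_minute(n_correct: int, seconds_in_stage: float) -> float:
    """Correct tasks submitted per minute of total stage time."""
    if seconds_in_stage <= 0:
        raise ValueError("time spent in the stage must be positive")
    # Convert seconds to minutes before dividing.
    return n_correct / (seconds_in_stage / 60.0)


# Hypothetical participant: 12 correct tasks in 8 minutes (480 seconds).
example_productivity = correct_tasks_per_minute(12, 480.0)
```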

Three other variables have difference in means across treatments in Stage 2:

For each of these three DVs in Stage 2:

Total number of correct tasks submitted

Total time spent in stage \(i\)

Correct tasks per minute (Daria’s original productivity)

Correct tasks per minute - individual change from Stage 1 to Stage 2

We also check that the individual growth in productivity (Growth in correct tasks per minute from Stage 1 to Stage 2) did not substantially differ across treatments (although it is positive everywhere, so people do perform generally better in Stage 2):

Time spent per solving correct task

This measure is calculated as:

total amount of time spent on correct tasks only in Stage \(i\) divided by the number of correct tasks submitted in Stage \(i\)
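As a minimal sketch, the measure in seconds per correct task (the unit used in the tables of this section) could be computed from per-task records as follows. The record layout `(seconds_spent, is_correct)` and the example values are assumptions made for illustration only.

```python
def seconds_per_correct_task(tasks):
    """Average seconds spent on a correct task.

    tasks: iterable of (seconds_spent, is_correct) pairs for one
    participant in one stage (hypothetical record layout).
    """
    correct_durations = [seconds for seconds, is_correct in tasks if is_correct]
    if not correct_durations:
        return None  # undefined when no correct task was submitted
    return sum(correct_durations) / len(correct_durations)


# Hypothetical stage with three correct tasks and one incorrect task.
stage_tasks = [(20.0, True), (35.0, False), (28.0, True), (24.0, True)]
example_time = seconds_per_correct_task(stage_tasks)
```

Time spent on incorrect tasks is ignored here, which is what distinguishes this measure from overall speed.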

Shapiro-Wilk test for normality of distribution for: time per correct task
stage treatment statistic p.value
stage1 baseline 0.893 0.001
stage1 tp only 0.970 0.259
stage1 stress+tp 0.928 0.006
stage2 baseline 0.987 0.885
stage2 tp only 0.976 0.448
stage2 stress+tp 0.978 0.506

As the table above shows, the Shapiro-Wilk test does not reject normality of time per correct task in any treatment in Stage 2. Next we test whether the variance is homogeneous (Levene’s test):

Levene’s test for homogeneity of variances
Df F value Pr(>F)
group 2 5.6231 0.0045
residuals 137

The p-value of 0.0045 is evidence that the variance of seconds spent per correct task in Stage 2 differs across treatments. We still run a one-way ANOVA, but we add tests for normality of the residuals.
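For readers who want to see what the Levene statistic measures, here is a minimal standard-library sketch. It assumes the median-centered variant (often called Brown-Forsythe); the text does not say which centering was used, and the toy samples are invented. The p-value would additionally require an F distribution, which the Python standard library does not provide.

```python
from statistics import mean, median


def levene_statistic(groups, center=median):
    """Return (W, df1, df2) for Levene's test on a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group center.
    z = [[abs(x - center(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # One-way ANOVA decomposition applied to the absolute deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2


# Toy samples with equal medians but clearly different spread.
tight = [10, 11, 12, 13, 14]
wide = [2, 7, 12, 17, 22]
w_stat, df1, df2 = levene_statistic([tight, wide])
```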
One-way ANOVA for seconds spent on correct task
Effect DFn DFd F p p<.05 ges
treatment 2 137 7.206 0.001 * 0.095
Tukey multiple pairwise-comparisons for: seconds spent on correct task
term contrast null.value estimate conf.low conf.high adj.p.value
treatment tp only-baseline 0 -3.5825 -6.9062 -0.25882 0.03133
treatment stress+tp-baseline 0 -5.1936 -8.5003 -1.88683 0.00084
treatment stress+tp-tp only 0 -1.6110 -4.8812 1.65917 0.47472
Shapiro-Wilk test on the ANOVA residuals
statistic p.value method
0.98936 0.36316 Shapiro-Wilk normality test

The Tukey pairwise comparisons show that both stress+tp and tp only differ significantly from baseline but not from each other. The ANOVA residuals are consistent with a normal distribution, as both the Q-Q plot and the Shapiro-Wilk test on the residuals (p = 0.363) indicate. Still, to double-check, we run several tests that compare groups without requiring homogeneity of variance:
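The one-way ANOVA quantities reported above can be sketched with the standard library. The toy groups are invented; the last line uses the fact that, in a one-way between-subjects design, the effect size equals SS_between / (SS_between + SS_within) and can therefore be recovered from the reported F and degrees of freedom. That this is how the `ges` column was produced is an assumption, although it is consistent with the reported value of 0.095.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = len(groups) - 1, len(all_values) - len(groups)
    f_stat = (ss_between / df_b) / (ss_within / df_w)
    return f_stat, df_b, df_w, ss_between / (ss_between + ss_within)


def eta_squared_from_f(f_stat, df_b, df_w):
    """Recover SS_between / SS_total from F and its degrees of freedom."""
    return f_stat * df_b / (f_stat * df_b + df_w)


# Invented toy data with three groups.
toy_f, toy_df_b, toy_df_w, toy_eta = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])

# Values taken from the ANOVA table above: F = 7.206 with (2, 137) df.
reported_ges = eta_squared_from_f(7.206, 2, 137)
```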

Welch one-way test
num.df den.df statistic p.value method
2 86.732 6.2893 0.00281 One-way analysis of means (not assuming equal variances)
Pairwise (non-equal variance) t-test for time spent per correct task
group1 group2 p.value
tp only baseline 0.03635
stress+tp baseline 0.00206
stress+tp tp only 0.17387
Kruskal-Wallis non-parametric test for: time spent per correct task
statistic p.value parameter method
11.642 0.00297 2 Kruskal-Wallis rank sum test

Both the Welch test, which does not assume equal variances, and the non-parametric Kruskal-Wallis test confirm the ANOVA result that time spent per correct task differs across treatments. The t-tests without the assumption of equal variances confirm the Tukey pairwise comparisons.
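The unequal-variance pairwise comparison can be illustrated with a short sketch of Welch's t statistic and its Welch-Satterthwaite degrees of freedom. The samples are invented, and the p-value (which needs a t distribution) is left out because the Python standard library does not provide one.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    # Squared standard errors of the two sample means.
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t_stat, df


# Invented toy samples with different variances.
t_stat, t_df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Unlike the pooled-variance t-test, the degrees of freedom here are generally not an integer and shrink when the group variances are very unequal.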

DV: seconds per correct task
Treatment only T + age/gender T plus full SES Full
(Intercept) 24.095*** 22.495*** 30.739*** 28.022***
(1.263) (3.028) (3.825) (5.029)
TP only -3.583** -3.564** -3.255** -2.756
(1.560) (1.638) (1.639) (1.729)
TP + stress -5.194*** -5.234*** -5.246*** -5.118***
(1.463) (1.483) (1.487) (1.453)
age 0.024 0.022 0.028
(0.058) (0.055) (0.055)
gender × Male 1.193 0.930 0.978
(1.203) (1.187) (1.322)
education -1.138*** -0.935**
(0.394) (0.394)
Income level -0.697 -0.905
(0.702) (0.745)
General risk attitudes -0.227
(0.282)
IQ -0.587*
(0.317)
Chronic stress level 0.018
(0.082)
Willingness to win 1.319**
(0.569)
Num.Obs. 140 140 137 137
R2 0.095 0.103 0.172 0.225
R2 Adj. 0.082 0.077 0.134 0.164
se_type HC2 HC2 HC2 HC2
* p < 0.1, ** p < 0.05, *** p < 0.01

The models above show that participants in the TP+stress treatment spend substantially less time per correct task in Stage 2 than participants in the baseline treatment (about 5 seconds less in every specification). The effect is weaker in TP only (about 3 to 3.6 seconds less) and is no longer significant in the full model. A higher education level is associated with slightly faster work, as is a higher IQ score (measured by the number of correctly submitted answers in the IQ test), although the IQ coefficient is significant only at the 10% level. Willingness to win (measured on a Likert scale from 1 to 5 for the question “If I play a game I always want to win (1 - totally disagree, 5 - totally agree)”) is associated with slower submission of correct tasks.
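A note on reading the "Treatment only" column: in a regression on treatment dummies alone, the intercept is the baseline mean and each coefficient is the difference between a treatment mean and the baseline mean, which is why the estimates of -3.583 and -5.194 coincide with the Tukey estimates of -3.5825 and -5.1936. The sketch below illustrates this for one dummy with invented data. It also uses the fact that, in such a dummy-only model, the HC2 standard error of the coefficient reduces to the unequal-variance formula; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def dummy_regression(baseline, treated):
    """Intercept, slope and HC2 standard error of the slope for y ~ 1 + treated."""
    intercept = mean(baseline)
    slope = mean(treated) - intercept
    # With only a group dummy, HC2 equals sqrt(s0^2 / n0 + s1^2 / n1).
    se_slope = sqrt(
        variance(baseline) / len(baseline) + variance(treated) / len(treated)
    )
    return intercept, slope, se_slope


# Invented seconds-per-correct-task values for two groups.
intercept, slope, se_slope = dummy_regression([22, 24, 26, 28], [18, 20, 22, 24])
```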

Change in time spent per correct task - from Stage 1 to Stage 2

We calculate the individual change in the speed of submitting correct tasks from Stage 1 to Stage 2 (the difference in seconds per correct task between Stage 1 and Stage 2). As you can see below, speed grows substantially more in TP and TP+Stress than in baseline. The OLS results confirm this (the specification is the same as in the OLS above, except that the DV now takes the individual's speed in Stage 1 into account). In fact, once individual growth from Stage 1 to Stage 2 is taken into account, the point estimate for TP only is somewhat larger than for TP+Stress. Both treatment effects are highly significant relative to baseline in all 4 models, while the difference between the two treatments is not tested in this table. A higher IQ score reduces the growth of speed between stages, while education level is no longer relevant.
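A minimal sketch of the individual change measure follows. The sign convention (Stage 2 minus Stage 1, so that a negative value means the participant became faster) is an assumption inferred from the negative treatment coefficients in the table below; the participant identifiers and values are invented.

```python
def change_in_seconds(stage1_seconds: float, stage2_seconds: float) -> float:
    """Change in seconds per correct task; negative means faster in Stage 2."""
    return stage2_seconds - stage1_seconds


# Hypothetical participants: (Stage 1, Stage 2) seconds per correct task.
participants = {"p1": (30.0, 21.5), "p2": (26.0, 27.0)}
changes = {
    pid: change_in_seconds(s1, s2) for pid, (s1, s2) in participants.items()
}
```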

DV: Change in seconds per correct task from Stage 1 to Stage 2
Treatment only T + age/gender T plus full SES Full
(Intercept) -3.550*** -1.930 -6.476 -3.433
(1.328) (3.027) (5.095) (6.399)
TP only -9.553*** -9.635*** -9.612*** -9.567***
(1.779) (1.723) (1.706) (1.634)
TP + stress -8.479*** -8.468*** -8.721*** -8.527***
(1.957) (1.964) (1.999) (1.918)
age -0.033 -0.047 -0.062
(0.076) (0.076) (0.073)
gender × Male -0.679 -0.578 -0.854
(1.568) (1.583) (1.708)
education 0.481 0.136
(0.625) (0.608)
Income level 0.789 0.803
(0.933) (0.953)
General risk attitudes -0.167
(0.292)
IQ 0.839**
(0.350)
Chronic stress level -0.085
(0.106)
Willingness to win -0.534
(0.672)
Num.Obs. 140 140 137 137
R2 0.183 0.185 0.191 0.228
R2 Adj. 0.171 0.161 0.154 0.167
se_type HC2 HC2 HC2 HC2
* p < 0.1, ** p < 0.05, *** p < 0.01

Share of errors (incorrect tasks) per stage

Shapiro-Wilk test for normality of distribution for: error share
stage treatment statistic p.value
stage1 baseline 0.840 0.000
stage1 tp only 0.890 0.000
stage1 stress+tp 0.860 0.000
stage2 baseline 0.848 0.000
stage2 tp only 0.915 0.002
stage2 stress+tp 0.935 0.011

As the table above shows, error share in Stage 2 is not normally distributed, so we will not run a standard one-way ANOVA and will rely on more robust tests instead.

We will run several tests to compare means that do not require homogeneity of variance:

Welch one-way test
num.df den.df statistic p.value method
2 89.129 9.316 0.00021 One-way analysis of means (not assuming equal variances)
Pairwise (non-equal variance) t-test for error share
group1 group2 p.value
tp only baseline 0.00170
stress+tp baseline 0.00126
stress+tp tp only 0.61046
Kruskal-Wallis non-parametric test for: error_share
statistic p.value parameter method
14.108 0.00086 2 Kruskal-Wallis rank sum test

Both the Welch test and the non-parametric Kruskal-Wallis test, as well as the t-tests without the assumption of equal variances, confirm that error share differs across treatments.
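The Kruskal-Wallis statistic can be sketched with the standard library alone. The implementation below uses average ranks for ties and the usual tie correction; the toy groups are invented. With three treatments the reference distribution is chi-square with 2 degrees of freedom, whose upper-tail probability is simply exp(-H/2), so the reported statistic of 14.108 can be checked against the reported p-value of 0.00086.

```python
from math import exp


def kruskal_wallis(groups):
    """Return (H, df) using average ranks for ties and the tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Sorted positions i..j-1 hold ranks i+1..j; tied values share the average.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_part = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_part - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)
    return h, len(groups) - 1


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return exp(-x / 2)


# Invented toy data with three fully separated groups.
toy_h, toy_df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Statistic taken from the Kruskal-Wallis table above.
reported_p = chi2_sf_df2(14.108)
```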

Change in error share from Stage 1 to Stage 2

Let’s check whether the change (delta) in error share is normally distributed. The Shapiro-Wilk test rejects normality, and so does the chart of observed vs. theoretical (normal) values.

Shapiro-Wilk test for normality of distribution for: delta error share
statistic p.value
0.953 0.000
The pairwise test (with no assumption of equal variances) for the change in error share shows that the growth in errors from Stage 1 to Stage 2 is substantially higher in both the TP and TP+stress treatments than in baseline, but does not differ between TP and TP+stress.
Pairwise (non-equal variance) t-test for change in error share
group1 group2 p.value
tp only baseline 0.00547
stress+tp baseline 0.00547
stress+tp tp only 0.58866

The OLS results for error share and for the delta (change from Stage 1 to Stage 2) of error share are presented below:

DV: error share in Stage 2
Treatment only T + age/gender T plus full SES Full
(Intercept) 0.144*** 0.187*** 0.169 0.145
(0.023) (0.070) (0.107) (0.144)
TP only 0.128*** 0.124*** 0.132*** 0.119***
(0.038) (0.039) (0.039) (0.040)
TP + stress 0.151*** 0.151*** 0.164*** 0.154***
(0.041) (0.042) (0.042) (0.041)
age -0.001 -0.001 -0.001
(0.002) (0.002) (0.002)
gender × Male -0.007 -0.002 0.012
(0.036) (0.036) (0.038)
education 0.012 0.021*
(0.012) (0.012)
Income level -0.013 -0.008
(0.018) (0.019)
General risk attitudes 0.006
(0.008)
IQ -0.023***
(0.009)
Chronic stress level 0.003
(0.003)
Willingness to win -0.008
(0.019)
Num.Obs. 140 140 137 137
R2 0.097 0.100 0.123 0.187
R2 Adj. 0.084 0.073 0.082 0.122
se_type HC2 HC2 HC2 HC2
* p < 0.1, ** p < 0.05, *** p < 0.01

Short summary of OLS results for error share: participants in TP+stress and in TP only commit substantially more errors in Stage 2 than participants in the baseline treatment. This effect is somewhat weaker in TP only. A higher IQ score decreases the share of errors. All other factors are insignificant at the 5% level.

OLS for delta (change from Stage 1 to Stage 2) in error share

We also estimate OLS models for the change in error share from Stage 1 to Stage 2. Individual error share grew faster than in baseline in both the TP and TP+stress treatments, and the effect was somewhat stronger for TP+stress. Among the other factors, education and general risk attitudes are positively associated with the growth in error share in the full model; the remaining factors have no significant effect.

DV: \(\Delta\) (change) in error share from Stage 1 to Stage 2
Treatment only T + age/gender T plus full SES Full
(Intercept) 0.004 0.083 -0.056 -0.197
(0.022) (0.065) (0.124) (0.141)
TP only 0.106*** 0.101*** 0.097*** 0.083**
(0.035) (0.036) (0.036) (0.039)
TP + stress 0.131*** 0.130*** 0.132*** 0.121***
(0.042) (0.042) (0.043) (0.040)
age -0.002 -0.002 -0.001
(0.002) (0.002) (0.002)
gender × Male -0.022 -0.013 -0.005
(0.035) (0.036) (0.038)
education 0.025* 0.030**
(0.013) (0.012)
Income level 0.004 0.006
(0.022) (0.022)
General risk attitudes 0.018**
(0.008)
IQ -0.007
(0.008)
Chronic stress level 0.003
(0.002)
Willingness to win -0.008
(0.020)
Num.Obs. 140 140 137 137
R2 0.074 0.085 0.116 0.169
R2 Adj. 0.061 0.058 0.075 0.103
se_type HC2 HC2 HC2 HC2
* p < 0.1, ** p < 0.05, *** p < 0.01

Speed:

That is, the time spent per task, whether correct or incorrect.

Here we limit the analysis to a visual comparison of averages, because the speed of work per se (without taking the correctness of the task into account) is not relevant for our study.

General data analysis

Average payoffs are significantly higher in the baseline, which is expected given that participants in the baseline could do as many tasks as they wanted.

As the two tables below show, most of the characteristics, such as age, income, educational level and IQ (measured by the number of IQ test questions answered correctly), do not differ across treatments. The share of females is slightly higher in the baseline treatment, but the difference is not statistically significant (Pearson’s chi-square p-value is 0.8383). All other variables are almost equal across treatments.

Gender composition by treatment
treatment Female Male
baseline 42% 58%
tp only 36% 64%
stress+tp 38% 62%
Other socio-economic characteristics per treatment
variable baseline tp only stress+tp
Age (Mean) 37.068 33.804 36.447
Age (S.E.) 1.901 1.259 1.518
Number of correct tests (Mean) 4.682 4.717 4.617
Number of correct tests (S.E.) 0.333 0.294 0.299
Income level (Mean) 3.227 3.348 3.426
Income level (S.E.) 0.121 0.121 0.135
Educational level (Mean) 4.955 5.196 5.043
Educational level (S.E.) 0.216 0.196 0.204

Risk attitudes:

We added a block of risk attitude questions. None of these measures differ across treatments.

Risk attitudes across treatment
variable baseline tp only stress+tp
Risk in general (Mean) 4.911 5.064 5.250
Risk in general (S.E.) 0.327 0.362 0.353
Risk with money (Mean) 4.133 3.681 3.750
Risk with money (S.E.) 0.419 0.373 0.332
Risk in sports (Mean) 5.622 5.277 4.875
Risk in sports (S.E.) 0.370 0.395 0.442
Risk in workplace (Mean) 5.222 4.894 4.646
Risk in workplace (S.E.) 0.380 0.427 0.423
Risking with own health (Mean) 3.444 3.234 3.208
Risking with own health (S.E.) 0.381 0.405 0.445
Risking with strangers (Mean) 3.600 3.085 3.250
Risking with strangers (S.E.) 0.363 0.366 0.459
Risking while driving (Mean) 2.556 2.404 2.438
Risking while driving (S.E.) 0.396 0.370 0.449

Stress levels across treatments

Participants in the non-baseline treatments reported lower stress levels than baseline participants in Stage 1 and higher stress levels in Stage 2 (significantly higher, judging by the interval boundaries in the tables below). We calculated the growth in stress level from Stage 1 to Stage 2 (calling it ‘Delta stress’ or ‘Growth in stress level’). This growth is substantially higher in TP and TP+stress than in the baseline. The OLS shown below also confirms this point.

Reported stress level at Stage 1
treatment mean sd hist lower boundary upper boundary
baseline 5.1111 2.7071 ▅▅▃▇▂ 4.3202 5.9021
tp only 4.0426 2.6289 ▇▇▇▇▂ 3.2910 4.7942
stress+tp 3.4792 2.8208 ▇▃▃▂▁ 2.6812 4.2772
Reported stress level at Stage 2
treatment mean sd hist lower boundary upper boundary
baseline 5.4889 2.7020 ▃▆▇▇▃ 4.6994 6.2784
tp only 6.9574 2.3587 ▁▂▃▇▆ 6.2831 7.6318
stress+tp 6.8333 2.2723 ▁▂▃▇▃ 6.1905 7.4762
Change in acute stress level from Stage 1 to Stage 2
treatment mean sd hist lower boundary upper boundary
baseline 0.37778 1.7489 ▁▁▇▂▂ -0.13321 0.88877
tp only 2.91489 2.8880 ▅▇▆▃▂ 2.08923 3.74055
stress+tp 3.35417 3.0352 ▇▃▅▅▂ 2.49551 4.21282
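The lower and upper boundaries in the three tables above appear to be normal-approximation 95% intervals, mean ± z · sd / √n. The sketch below shows the calculation. The group size n = 45 for the baseline is an assumption (it is not stated in the text), chosen because it reproduces the reported Stage 1 boundaries of 4.3202 and 5.9021 up to rounding.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval(mean_value, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sd / sqrt(n)
    return mean_value - half_width, mean_value + half_width


# Baseline, Stage 1: mean and sd from the table above; n = 45 is assumed.
lower, upper = mean_interval(5.1111, 2.7071, 45)
```

Intervals for two groups that do not overlap suggest a significant difference, although overlapping intervals do not by themselves rule one out.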

DV: \(\Delta\) (change) in stress level from Stage 1 to Stage 2
Treatment only T + productivity + age/gender + full SES Full
(Intercept) 0.378 -2.549 -1.853 -2.410 -2.373
(0.261) (1.733) (2.047) (2.373) (2.853)
TP only 2.537*** 2.806*** 2.720*** 2.881*** 2.660***
(0.495) (0.602) (0.628) (0.640) (0.619)
TP + stress 2.976*** 3.332*** 3.322*** 3.457*** 3.344***
(0.510) (0.622) (0.628) (0.651) (0.666)
Time per task in Stage 2 0.045 0.040 0.067 0.100
(0.132) (0.131) (0.136) (0.153)
N. tasks done in Stage 2 0.306* 0.309* 0.355* 0.382*
(0.181) (0.181) (0.190) (0.203)
Growth in error rate 2.497 2.425 2.479 2.854
(1.587) (1.601) (1.705) (1.893)
Total time spent in Stage2 -0.003 -0.003 -0.005 -0.008
(0.013) (0.013) (0.014) (0.016)
age -0.022 -0.027 -0.030
(0.019) (0.018) (0.019)
gender × Male 0.244 0.337 0.636
(0.471) (0.468) (0.505)
education -0.005 -0.014
(0.146) (0.163)
Income level 0.074 0.191
(0.261) (0.276)
General risk attitudes -0.103
(0.124)
IQ 0.041
(0.119)
Chronic stress level 0.039
(0.034)
Willingness to win -0.239
(0.237)
Num.Obs. 140 140 140 137 137
R2 0.199 0.225 0.233 0.255 0.275
R2 Adj. 0.188 0.190 0.186 0.195 0.192
se_type HC2 HC2 HC2 HC2 HC2
* p < 0.1, ** p < 0.05, *** p < 0.01