Data cleaning

From the original data we drop all observations with missing values for declared taxes (the decl_taxes variable), turnout, or average monthly salary. We also exclude the observations for May, for which many declared-tax values are missing.
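
The filtering rule above can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes each observation is a dictionary with the keys decl_taxes, turnout, avg_salary and month. The first three names appear in the results tables; the record layout and the function name clean_observations are hypothetical.

```python
# Variables that must be present for an observation to be kept.
REQUIRED = ("decl_taxes", "turnout", "avg_salary")


def clean_observations(rows):
    """Drop May observations and rows missing any required variable."""
    cleaned = []
    for row in rows:
        if row.get("month") == "May":
            continue  # May is excluded entirely
        if any(row.get(name) is None for name in REQUIRED):
            continue  # at least one required value is missing
        cleaned.append(row)
    return cleaned


# Made-up toy records, for illustration only.
sample = [
    {"id": 1, "month": "May", "decl_taxes": 100.0, "turnout": 5000.0, "avg_salary": 480.0},
    {"id": 1, "month": "June", "decl_taxes": None, "turnout": 5200.0, "avg_salary": 490.0},
    {"id": 2, "month": "June", "decl_taxes": 120.0, "turnout": 6100.0, "avg_salary": 510.0},
]
print([row["id"] for row in clean_observations(sample)])  # → [2]
```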

Effect of treatments on average salary, declared taxes and turnout

Here we show how average declared salary, declared taxes and turnout in September-December changed relative to their June-August averages.
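
As an illustration of this before/after comparison, the sketch below computes the change for a single firm as the mean of the September-December values minus the mean of the June-August values. The function name period_change, the dictionary layout and the rule of skipping missing months are assumptions made for this example; the source does not say how missing months were handled.

```python
from statistics import mean

PRE_MONTHS = ("June", "July", "August")
POST_MONTHS = ("September", "October", "November", "December")


def period_change(monthly):
    """Return mean(post-period) minus mean(pre-period) for one firm.

    `monthly` maps month names to values; missing months are skipped.
    Returns None if either period has no observed value.
    """
    pre = [monthly[m] for m in PRE_MONTHS if monthly.get(m) is not None]
    post = [monthly[m] for m in POST_MONTHS if monthly.get(m) is not None]
    if not pre or not post:
        return None
    return mean(post) - mean(pre)


# Made-up salaries for one hypothetical firm.
firm = {
    "June": 500.0, "July": 500.0, "August": 500.0,
    "September": 510.0, "October": 520.0, "November": 530.0, "December": 540.0,
}
print(period_change(firm))  # → 25.0
```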

Average salary by month:

We report average salaries both for each individual treatment and for the aggregated approach variables, in which the different types of treatment are clustered into two wider groups (for example, Audit combines the three treatments whose messages state a 5%, 33% or 66% audit probability).

DiD (FE model)

                                model 1          model 2            model 3
Dependent Var.:                 avg_salary       avg_salary         avg_salary

afterTRUE x approachBaseline    20.2*** (4.7)    13.2** (4.3)       13.2* (5.2)
afterTRUE x approachAudit       23.6*** (3.7)    15.4*** (3.3)      15.4*** (4.0)
afterTRUE x approachProsocial   18.4*** (3.7)    10.9*** (3.2)      10.9** (3.9)
employees                                        -11.9*** (1.6)     -11.9*** (2.0)
decl_taxes                                       0.05*** (0.005)    0.05*** (0.006)
turnout                                          2.6e-5. (1.5e-5)   2.6e-5 (1.8e-5)
Fixed-Effects:                  -------------    ----------------   ---------------
id                              Yes              Yes                Yes
month                           Yes              Yes                Yes
bin_grants                      No               Yes                Yes
years                           No               Yes                Yes
sal2021_ratio                   No               No                 Yes
paid_tax_2021                   No               No                 Yes
region                          No               No                 Yes
industry                        No               No                 Yes
_____________________________   _____________    ________________   _______________
S.E.: Clustered                 by: id           by: id             by: id
Observations                    26,223           26,223             26,223
R2                              0.774            0.847              0.847
Within R2                       0.003            0.325              0.325
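
The report does not state the estimating equation. One specification consistent with the layout of the table is sketched below. The symbols, and the reading that 'No message' is the omitted reference group, are assumptions: X_it stands for the time-varying controls (employees, decl_taxes, turnout) that enter models 2 and 3, alpha_i is the firm (id) fixed effect and lambda_t is the month fixed effect. The additional fixed effects listed for models 2 and 3 are left out of the sketch for brevity.

```latex
\text{avg\_salary}_{it}
  = \sum_{k \in \{\text{Baseline},\,\text{Audit},\,\text{Prosocial}\}}
      \beta_k \,\bigl(\text{after}_t \times \mathbf{1}[\text{approach}_i = k]\bigr)
  + \gamma^{\top} X_{it} + \alpha_i + \lambda_t + \varepsilon_{it}
```

Under this reading, each beta_k is the post-intervention change in average salary for approach k relative to the change in the reference group.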

Non-parametric tests of difference

As shown in the Appendix, both the average salary variable and the residuals of the ANOVA model fail the tests for normality and for homogeneity of variance. We therefore also report non-parametric tests (Kruskal-Wallis, with the pairwise comparisons given in the Appendix). These show that every treatment differs significantly from the ‘No message’ treatment, and that the difference between the ‘Audit’ and ‘Prosocial’ approaches is only barely significant (adjusted p = 0.048).
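
For readers unfamiliar with rank-based comparisons, the sketch below shows one common way to compute the statistic behind a two-sample Wilcoxon (Mann-Whitney) test: count the pairs in which a value from the first group exceeds a value from the second, with ties counting one half. The function name wilcoxon_w and the data are made up for illustration, and it is an assumption that the 'statistic' column in the Appendix tables is defined this way.

```python
def wilcoxon_w(x, y):
    """Rank-sum statistic W for sample x versus sample y.

    Counts the (x, y) pairs where the x value is larger, with ties
    counting one half. This equals the Mann-Whitney U statistic for x.
    """
    w = 0.0
    for a in x:
        for b in y:
            if a > b:
                w += 1.0
            elif a == b:
                w += 0.5
    return w


# Made-up salary changes for two small hypothetical groups.
control = [10.0, 12.0, 15.0, 15.0]
treated = [15.0, 18.0, 20.0, 25.0, 30.0]
print(wilcoxon_w(control, treated))  # → 1.0
```

If the two groups have the same distribution, W is expected to be close to n1 * n2 / 2 (10 in this toy example), so a value as low as 1.0 indicates that the first group tends to have smaller values.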

Appendix

Pairwise Wilcoxon tests for the difference in means

.y.    group1      group2     n1    n2    statistic  p         p.adj     p.adj.signif
value  No message  Baseline   477   477   93646.0    3.70e-06  1.10e-05  ****
value  No message  Audit      477   1425  278574.0   0.00e+00  0.00e+00  ****
value  No message  Prosocial  477   1433  295648.0   1.58e-05  3.16e-05  ****
value  Baseline    Audit      477   1425  334148.5   6.62e-01  6.62e-01  ns
value  Baseline    Prosocial  477   1433  352888.0   2.37e-01  2.84e-01  ns
value  Audit       Prosocial  1425  1433  1065294.0  3.20e-02  4.80e-02  *
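
The report does not name the multiple-comparison correction behind the p.adj column. The values in the table above appear consistent with the Benjamini-Hochberg procedure, which the following sketch implements. This is an inference from the printed numbers, not something the source states, and the function name bh_adjust is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest so that monotonicity can be
    # enforced with a running minimum.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Unadjusted p-values copied from the table above, in row order.
p = [3.70e-06, 0.0, 1.58e-05, 0.662, 0.237, 0.032]
for raw, adj in zip(p, bh_adjust(p)):
    print(f"{raw:.3g} -> {adj:.3g}")
```

The small gaps between the output and the printed p.adj values (for example 1.11e-05 against 1.10e-05) are what one would expect if the p column was rounded before printing.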

Kruskal-Wallis tests for difference in means in turnout and declared taxes

Difference across approaches in declared taxes (change from June-August to September-December)

.y.    group1      group2     n1    n2    statistic  p         p.adj     p.adj.signif
value  No message  Baseline   477   477   101694.0   6.00e-03  0.013000  *
value  No message  Audit      477   1425  296273.5   4.24e-05  0.000254  ***
value  No message  Prosocial  477   1433  301500.0   1.73e-04  0.000519  ***
value  Baseline    Audit      477   1425  329945.0   3.99e-01  0.599000  ns
value  Baseline    Prosocial  477   1433  336417.0   6.89e-01  0.689000  ns
value  Audit       Prosocial  1425  1433  1032547.0  5.13e-01  0.616000  ns

Difference across approaches in declared turnout (change from June-August to September-December)

.y.    group1      group2     n1    n2    statistic  p      p.adj  p.adj.signif
value  No message  Baseline   477   477   109764.0   0.406  0.676  ns
value  No message  Audit      477   1425  340419.0   0.866  0.866  ns
value  No message  Prosocial  477   1433  332736.0   0.451  0.676  ns
value  Baseline    Audit      477   1425  350679.0   0.246  0.676  ns
value  Baseline    Prosocial  477   1433  343184.5   0.802  0.866  ns
value  Audit       Prosocial  1425  1433  990305.0   0.206  0.676  ns

Means comparison of differences in salary (3 months after intervention - 3 months before intervention).
Mean differences in declared salary by response and treatment
approach    No response  No promise  Won’t increase  Will increase
No message  13.34576     NA          NA              NA
Baseline    36.43133     13.52687    24.29079        39.53743
Audit       37.79587     26.49876    25.76213        62.05795
Prosocial   30.71394     25.68260    46.00127        51.13736

Because the sample sizes within the individual response-by-treatment cells are small, none of these differences is statistically significant. We therefore analyse the responses qualitatively.

Population size by response and treatment
approach    No response  No promise  Won’t increase  Will increase
No message  477          NA          NA              NA
Baseline    392          40          13              32
Audit       1250         94          46              35
Prosocial   1193         96          76              68
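
As a consistency check, the cell counts above can be summed by row and by column. The short sketch below does this for the three messaged approaches (the 'No message' row has no response categories and is left out). The row totals reproduce the group sizes 477, 1425 and 1433 used in the other tables, and the column totals reproduce the n1 and n2 values in the Wilcoxon table by response type.

```python
# Cell counts copied from the table above.
counts = {
    "Baseline": {"No response": 392, "No promise": 40, "Won't increase": 13, "Will increase": 32},
    "Audit": {"No response": 1250, "No promise": 94, "Won't increase": 46, "Will increase": 35},
    "Prosocial": {"No response": 1193, "No promise": 96, "Won't increase": 76, "Will increase": 68},
}

# Firms per approach.
row_totals = {approach: sum(cells.values()) for approach, cells in counts.items()}

# Firms per response type, pooled over approaches.
column_totals = {}
for cells in counts.values():
    for response, n in cells.items():
        column_totals[response] = column_totals.get(response, 0) + n

print(row_totals)     # → {'Baseline': 477, 'Audit': 1425, 'Prosocial': 1433}
print(column_totals)  # → {'No response': 2835, 'No promise': 230, "Won't increase": 135, 'Will increase': 135}
```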

Wilcoxon test for difference in means by response type

estimate    group1          group2          n1    n2   statistic  p         p.adj  p.adj.signif
7.357488    No response     No promise      2835  230  348742     0.071000  0.212  ns
2.861858    No response     Won’t increase  2835  135  196055     0.605000  0.992  ns
-17.134553  No response     Will increase   2835  135  160615     0.002000  0.009  **
-4.753350   No promise      Won’t increase  230   135  14862      0.496000  0.992  ns
-24.071999  No promise      Will increase   230   135  11914      0.000207  0.001  **
-20.495996  Won’t increase  Will increase   135   135  7472       0.011000  0.042  *
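
The adjusted p-values in the table by response type are larger than a Benjamini-Hochberg correction would give, and instead look consistent with Holm's step-down procedure. The sketch below implements Holm's method so this can be checked. As before, this is an inference from the printed numbers rather than something the source states, and the function name holm_adjust is hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in input order."""
    n = len(pvalues)
    # Visit p-values from smallest to largest; the k-th smallest is
    # multiplied by (n - k + 1), capped at 1, and a running maximum keeps
    # the adjusted values monotone.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for step, i in enumerate(order):
        candidate = min(1.0, (n - step) * pvalues[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Unadjusted p-values copied from the table by response type, in row order.
p_response = [0.071, 0.605, 0.002, 0.496, 0.000207, 0.011]
print([round(value, 3) for value in holm_adjust(p_response)])
```

The results agree with the printed p.adj column to within what rounding of the p column to three decimals would explain.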

Number of observations by approach and an unlabelled binary indicator (columns 0 and 1)
approach    0     1
No message  3290  NA
Baseline    2694  567
Audit       8608  1208
Prosocial   8211  1645

Variable balance table across treatments

With the exception of some imbalance in the number of employees in 2020 (empl_n_2020) and in industry type (industry), there are no significant differences in pre-intervention variable values across treatments. This also holds for average monthly salary in the months before the intervention (September). The significant differences in the month_September to month_December rows refer to the post-intervention period.
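
Summary Statistics by approach
Column groups, left to right: No message, Baseline, Audit, Prosocial. Each group reports N, Mean and SD; categorical rows report N and percent.
Variable N Mean SD N Mean SD N Mean SD N Mean SD Test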
id 477 3259.618 2265.494 477 3409.019 2290.176 1425 3361.651 2282.521 1433 3338.383 2280.258 F=0.377
csp 477 1064.239 307.018 477 1066.872 283.998 1425 1065.821 287.521 1433 1063.975 272.195 F=0.018
avg_sal_2021 477 466.953 123.224 477 469.007 118.392 1425 472.102 116.634 1433 468.849 117.966 F=0.31
empl_n_2021 477 13.055 20.817 477 14.382 27.127 1425 13.201 14.855 1433 14.179 25.631 F=0.773
empl_n_2020 474 11.788 13.952 471 14.699 35.961 1413 12.323 14.736 1423 13.228 19.642 F=2.213*
turnout_072021 467 59233.507 84458.432 459 74861.996 223052.138 1394 65361.476 168052.159 1392 66135.015 178388.861 F=0.654
turnout_062021 477 53577.84 74952.083 477 71926.677 227398.667 1424 60209.494 113180.014 1433 63509.959 174457.023 F=1.262
turnout_2021 477 359244.024 513428.105 477 453021.202 1523749.682 1425 393002.693 789688.98 1433 400781.927 1065235.834 F=0.746
turnout_2020 471 566930.051 821582.87 470 690167.566 1989123.096 1409 594757.566 999664.786 1416 625193.373 1262566.007 F=0.96
real_empl_072021 477 13.597 23.402 477 14.532 27.864 1425 13.688 15.036 1433 14.665 27.428 F=0.563
real_empl_062021 477 13.899 28.548 477 14.849 27.937 1425 13.703 15.944 1433 14.742 27.115 F=0.585
avg_sal_072021 477 507.376 129.725 477 512.789 130.942 1425 512.589 128.272 1433 507.225 129.349 F=0.551
avg_sal_062021 477 479.67 134.119 477 478.809 130.905 1425 480.75 129.816 1433 477.641 127.886 F=0.14
avg_sal_052021 477 470.833 135.285 477 467.826 127.062 1425 476.561 129.872 1433 472.877 132.323 F=0.632
avg_sal_042021 476 461.492 140.471 476 464.902 127.129 1422 469.968 133.972 1431 468.633 136.493 F=0.561
paid_tax_2021 477 39528.636 99998.581 477 45098.647 115137.769 1425 37485.602 64430.662 1433 43905.092 167391.65 F=0.86
paid_tax_2020 476 52744.918 121710.48 474 70344.074 235321.408 1418 54074.926 102632.105 1426 60908.065 241924.584 F=1.136
grants 477 4531.385 17720.359 477 5058.344 23285.134 1425 4445.52 16550.296 1433 4204.143 19904.871 F=0.247
region 477 477 1425 1433 X2=0.438
… Kurzeme region 51 10.7% 50 10.5% 150 10.5% 149 10.4%
… Latgale region 53 11.1% 51 10.7% 149 10.5% 149 10.4%
… Pierīga region 86 18% 87 18.2% 263 18.5% 267 18.6%
… Rīga region (Rīga) 199 41.7% 201 42.1% 605 42.5% 605 42.2%
… Vidzeme region 39 8.2% 39 8.2% 116 8.1% 119 8.3%
… Zemgale region 49 10.3% 49 10.3% 142 10% 144 10%
industry 477 477 1425 1433 X2=30.058**
… Trade (Tirdzniecība) 111 23.3% 114 23.9% 329 23.1% 321 22.4%
… Construction (Būvniecība) 96 20.1% 123 25.8% 249 17.5% 272 19%
… Manufacturing (Rūpniecība) 77 16.1% 80 16.8% 253 17.8% 268 18.7%
… Agriculture (Lauksaimniecība) 40 8.4% 39 8.2% 133 9.3% 132 9.2%
… Transport and storage (Transports un uzglabāšana) 26 5.5% 19 4% 81 5.7% 61 4.3%
… Services (Pakalpojumi) 76 15.9% 75 15.7% 228 16% 236 16.5%
… Other (Cits) 51 10.7% 27 5.7% 152 10.7% 143 10%
bin_grants 477 0.119 0.325 477 0.147 0.354 1425 0.139 0.346 1433 0.123 0.328 F=1.07
years 477 15.407 8.965 477 14.804 8.61 1425 15.23 8.932 1433 15.387 8.862 F=0.566
l_turnout_2021 477 12.194 1.084 477 12.183 1.148 1425 12.243 1.078 1433 12.225 1.084 F=0.478
sal2021_ratio 477 0.458 0.129 477 0.457 0.126 1425 0.464 0.132 1433 0.459 0.13 F=0.57
sal07_ratio 477 0.5 0.146 477 0.502 0.147 1425 0.505 0.151 1433 0.497 0.146 F=0.631
wave 477 2.01 0.819 477 2.004 0.817 1425 2.001 0.818 1433 1.999 0.818 F=0.027
month_June 477 479.67 134.119 477 478.809 130.905 1424 480.874 129.778 1433 477.641 127.886 F=0.151
month_July 467 511.269 127.129 459 514.788 129.121 1394 514.132 127.251 1392 509.494 127.293 F=0.387
month_August 467 523.031 137.333 459 526.307 144.65 1398 527.096 139.89 1401 522.259 136.215 F=0.326
month_September 476 515.008 134.522 475 540.227 152.641 1421 541.411 152.669 1431 533.009 151.385 F=3.956***
month_October 467 514.104 145.729 457 534.069 153.666 1381 536.889 152.239 1385 526.571 147.822 F=3.072**
month_November 466 515.675 143.367 460 541.629 160.942 1391 541.085 161.419 1396 532.914 162.698 F=3.291**
month_December 470 522.816 150.706 474 546.131 176.831 1407 555.93 176.751 1418 545.321 197.733 F=3.931***
Statistical significance markers: * p<0.1; ** p<0.05; *** p<0.01
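
The Test column for the numeric variables can be cross-checked from the summary statistics alone, assuming it reports a one-way ANOVA F statistic (the source does not say so explicitly). The sketch below computes F from each group's N, mean and SD; the function name anova_f_from_summary is hypothetical, and the inputs are the avg_sal_2021 row of the table.

```python
def anova_f_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) tuples, one per group."""
    total_n = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares, recovered from the sample SDs.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))


# avg_sal_2021: (N, Mean, SD) for No message, Baseline, Audit, Prosocial.
avg_sal_2021 = [
    (477, 466.953, 123.224),
    (477, 469.007, 118.392),
    (1425, 472.102, 116.634),
    (1433, 468.849, 117.966),
]
print(round(anova_f_from_summary(avg_sal_2021), 2))  # close to the reported F=0.31
```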

Testing for assumptions for linear models (normality and homogeneity of variances)

Homogeneity of variances across groups:
            Df    F value   Pr(>F)
group       3     6.685929  0.0001691
(residual)  3802  NA        NA

Normality of the ANOVA residuals:
variable           statistic  p.value
res_aov$residuals  0.8778177  0