The score distribution of real images is summarized below:

| Score | Proportion |
|---|---|
| 1 | 0.030 |
| 2 | 0.273 |
| 3 | 0.515 |
| 4 | 0.182 |

Based on this distribution, we calculate the total number of images (split 1:1 between real and synthetic) needed for a significant two-sided test when synthetic images are worse than real ones by a given proportion (the tolerance level). The calculated sample sizes as a function of the tolerance level, at 80%, 70%, and 60% power, are plotted below.
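
The text does not state which formula produced these sample sizes. To illustrate this kind of calculation, the sketch below uses Whitehead's (1993) sample-size formula for comparing two ordinal distributions under a proportional-odds model. This is a swapped-in technique, not necessarily the method used here. Because the mapping from a tolerance level to an effect size is not given, the sketch expresses "worse" as a common odds ratio (`odds_ratio`, an assumed parameter), so its output is not expected to reproduce the tabulated numbers. The helper names `worse_under_po` and `whitehead_total_n` are hypothetical.

```python
import math
from statistics import NormalDist

# Proportions of scores 1-4 among real images (from the table above).
REAL = [0.030, 0.273, 0.515, 0.182]


def worse_under_po(props, odds_ratio):
    """Hypothetical synthetic distribution: shift `props` toward lower scores under a
    proportional-odds model by multiplying every cumulative odds P(X <= k) / P(X > k)
    by `odds_ratio` (> 1 means worse). Assumption, not the original method."""
    cum, running = [], 0.0
    for p in props[:-1]:
        running += p
        odds = odds_ratio * running / (1.0 - running)
        cum.append(odds / (1.0 + odds))
    cum.append(1.0)
    # Difference consecutive cumulative probabilities to get per-score proportions.
    return [c - prev for c, prev in zip(cum, [0.0] + cum[:-1])]


def whitehead_total_n(props, odds_ratio, power, alpha=0.05):
    """Total sample size (1:1 allocation, two-sided test) from Whitehead's formula:
    N = 12 (z_{1-alpha/2} + z_{power})^2 / ((ln OR)^2 (1 - sum pbar_k^3)),
    where pbar_k averages the real and synthetic proportions of score k."""
    z = NormalDist().inv_cdf
    synth = worse_under_po(props, odds_ratio)
    pbar = [(a + b) / 2 for a, b in zip(props, synth)]
    z_sum = z(1 - alpha / 2) + z(power)
    denom = math.log(odds_ratio) ** 2 * (1 - sum(p ** 3 for p in pbar))
    return math.ceil(12 * z_sum ** 2 / denom)


# Assumed effect size: an odds ratio of 1.5 (not a value from the text).
for power in (0.6, 0.7, 0.8):
    print(f"power={power:.0%}: N={whitehead_total_n(REAL, 1.5, power)}")
```

Under this model a larger odds ratio (a bigger gap between real and synthetic images) gives a smaller N, and for a fixed odds ratio N grows with power in proportion to `(z_{1-α/2} + z_{1-β})^2`, the same qualitative pattern as the tabulated values.
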
The exact total sample sizes for selected tolerance levels are tabulated below:

| Tolerance level | 60% power | 70% power | 80% power |
|---|---|---|---|
| 0.08 | 291 | 367 | 466 |
| 0.1 | 158 | 199 | 253 |
| 0.12 | 90 | 113 | 143 |
| 0.14 | 50 | 63 | 80 |
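
As a consistency check on the table, assume the calculation uses a normal approximation with a two-sided α = 0.05 (the significance level is not stated above). Under that assumption the required sample size is proportional to `(z_{1-α/2} + z_{1-β})^2`, so the 60% and 70% power columns should follow from the 80% column by a constant factor. The short script below rescales the 80% column and rounds up; the names `N_80`, `z_term`, and `rescale` are illustrative.

```python
import math
from statistics import NormalDist

# Tolerance level -> total sample size at 80% power (from the table above).
N_80 = {0.08: 466, 0.10: 253, 0.12: 143, 0.14: 80}
ALPHA = 0.05  # assumption: two-sided significance level, not stated in the text


def z_term(power, alpha=ALPHA):
    """(z_{1-alpha/2} + z_{power})^2: the power-dependent factor shared by
    normal-approximation sample-size formulas."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2


def rescale(n_ref, power, ref_power=0.8):
    """Rescale a sample size computed at `ref_power` to another power, rounding up."""
    return math.ceil(n_ref * z_term(power) / z_term(ref_power))


for tol, n in N_80.items():
    print(tol, rescale(n, 0.6), rescale(n, 0.7), n)
```

With α = 0.05, this rescaling reproduces the 60% and 70% columns for all four rows, which is consistent with (though it does not prove) a normal-approximation formula at that significance level.
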