Update Notice
This is the documentation for my GitHub repo - Arknights-Pull-Simulation

If you cannot open it within RPubs, you can open it in a new browser tab.

The Chinese version of this article: 明日方舟抽卡概率理论值的蒙特卡洛仿真估计

This repo estimates the theoretical probability of getting a specific 6★ operator on the \(i^{th}\) pull / within \(i\) pulls in different types of banners in Arknights via the Monte Carlo method.

The starting point of the pity system is configurable. I provide this flexibility in case you are curious about the probability under a different mathematical model. With different settings for this value, you will see that the pity system has a different influence on the theoretical probability.

In this article, I will introduce how to use this code, giving several simple examples so that you can have a first try. I will also discuss the mathematical model behind the pulling rules in Arknights, as well as the probabilities of several kinds of random events that you may care about. Then the key part of the code, the random number generator, will be briefly explained, which I believe is inspiring and important. Finally, the estimated theoretical probability will be presented and analyzed, and we will have a look at some future work.

You can also jump directly to the part that you are interested in:

What is Arknights?

Arknights is a tower defense mobile game developed by Chinese developers Studio Montagne (蒙塔山工作室) and Hypergryph. It was released in China on 1 May 2019, and worldwide on 16 January 2020 by Yostar. Arknights is available on iOS, Android and PC platforms and features gacha game mechanics. (Adapted from Wikipedia)

The game has hit No. 1 overall on the iOS App Store top-grossing ranking list three times [1], and has received TapTap Game Awards 2019 - Best Game, TapTap Game Awards 2019 - Most Influential China-Developed Game, TapTap Game Awards 2019 - Gamers' Choice [2], Bilibili 2019 Game Review - Top Rated Game, Bilibili 2019 Game Review - Most Searched Game, Bilibili 2019 Game Review - Users' Favorite Mobile Game [3], etc.

The game has achieved 40 million downloads worldwide [4] and 7 million downloads outside the CN server by the end of April 2020 [5].

Official Website:

Usage

Build the Code

After git clone, cd into the repo's directory and run make to build from the source code. An executable file named simulation_sequential will then be generated.

Run make clean to remove all the *.o files and the executable.

Command Line Arguments

You need to specify some command line arguments to run the simulation:

./simulation_sequential [--help] [-t|--total-pull-time <value>] [--standard|--limited]
                        [-p|--pity <value>] [-n|--num-rate-up <value>] [-c|--current-pull <value>]

The arguments in square brackets are optional, and their order does not matter. You can use either the short-name arguments (-t, -p and -n) or the long-name arguments (--total-pull-time, --pity and --num-rate-up) as you wish.

Each argument's meaning is explained in the following list:

  • --help : Show the help message, i.e., the introduction of each argument and how to run the program. It should not be specified together with other arguments.
  • -t, --total-pull-time : Set how many pulls will be executed during the simulation, which controls the scale of the simulation. Valid value: an integer in \([1, 18446744073709551615]\) (inclusive).
  • --standard : Estimate the theoretical probability in a standard banner.
  • --limited : Estimate the theoretical probability in a limited banner.
  • -p, --pity : Set the starting point where the pity system comes into effect. The pity system starts to increase the probability of getting a 6★ operator on the pull after the N-th pull (N is the number you specify). Valid value: an integer in \([0, 4294967295]\) (inclusive).
  • -n, --num-rate-up : Set whether to simulate a single-rate-up banner or a double-rate-up banner. Valid value: either \(1\) or \(2\).
  • -c, --current-pull : Set how many times you have already pulled without getting a 6★ operator. Valid value: an integer in [0, <-p|--pity value> + 49) (left inclusive, right exclusive).

By default, if no arguments are provided, the program will simulate a double-rate-up limited banner with \(100,000,000\) pulls and a pity starting point of \(50\) (equivalent to running with --limited -t 100000000 -n 2 -p 50), which is the same as a limited banner in Arknights. You can specify the corresponding arguments to override this default behavior.

Some Further Explanation

  • The bigger the value of the total pull time, the more accurate the result will be, but the more time the program will take to finish the simulation. A single-thread simulation with -t 10000000000 (simulating \(10\) billion pulls) took about \(110\) seconds to finish on the platform that I used:

    • OS: Ubuntu 18.04
    • CPU: Intel® Xeon® Gold 6140 CPU @ 2.30GHz
    • Memory: 8GB
  • You can override the program's default simulation configuration. For example, if you only specify -t 200000000 (or --total-pull-time 200000000) and omit the other arguments, the default total pull time (\(100,000,000\)) will be overridden, and the theoretical probability of a double-rate-up limited banner will be estimated with \(200,000,000\) simulated pulls. If you also specify --standard and -n 1, the program will simulate a single-rate-up standard banner with \(200,000,000\) simulated pulls.

  • The program will try its best to find any syntax errors in the command line arguments and report the reason (shown in the picture below), just like a simple compiler. However, do not fully rely on this error-check feature: always read the docs carefully ;-)

Quick Examples

Here are several quick examples to start with:

  • See the help message

    ./simulation_sequential --help

  • Simulate a double-rate-up limited banner

    ./simulation_sequential --limited -t <a-positive-big-integer>

    or just

    ./simulation_sequential -t <a-positive-big-integer>

  • Simulate a double-rate-up standard banner

    ./simulation_sequential --standard -t <a-positive-big-integer>

  • Simulate a single-rate-up standard banner

    ./simulation_sequential --standard -n 1 -t <a-positive-big-integer>

  • Simulate a single-rate-up standard banner that you are about to start pulling in, given that you have already pulled \(42\) times without getting a 6★ operator in another standard banner

    ./simulation_sequential --standard -n 1 -c 42 -t <a-positive-big-integer>

  • Simulate a single-rate-up limited banner (this type has not appeared in Arknights yet!)

    ./simulation_sequential --limited -n 1 -t <a-positive-big-integer>

    or just

    ./simulation_sequential -n 1 -t <a-positive-big-integer>

    The recommended value for <a-positive-big-integer> is 100000000 or above.

    It is recommended that you redirect the output to a file, for example

    ./simulation_sequential --limited -t 200000000 >> simu_limited_banner.res

    since the output of the simulation program will be relatively long.

The Mathematical Model In Arknights

In this part, I will introduce the mathematical model behind the Arknights pulling rules and the definitions of the random events, which are important for understanding the simulation results, as well as several methods for calculating those probabilities.

The Arknights Pulling Rules and Headhunting Banners

In Arknights, the probability of getting a 6★ operator in a pull is \(2\%\). However, there is also a rule called the Pity System: if you have not gotten any 6★ operator after \(50\) consecutive pulls, this probability increases by \(2\%\) on each following pull, until you get a 6★ operator, which resets both this probability (back to \(2\%\)) and the pity counter, i.e., the count of how many pulls you have made without getting a 6★ operator.

More specifically, suppose someone did not get any 6★ operator in the last \(50\) pulls. On his \(51^{st}\) pull, the probability will be \(4\%\); if he still does not get a 6★ operator on his \(51^{st}\) pull, then on his \(52^{nd}\) pull this probability will be \(6\%\), and so on…

This mechanism ensures that even the most unlucky player will still obtain a 6★ operator on his/her \(99^{th}\) pull.

The probability of this tragedy — \(98\) pulls without getting a 6★ operator — theoretically, is

\[ \begin{aligned} Pr(Uninstalling\ Arknights) =& (1-2\%)^{50} \times (\prod_{i=51}^{98}(1-(2\% \times (i-50)+2\%))) \times 1\\ =& (0.98)^{50} \times \prod_{i=1}^{48} \frac{49-i}{50}\\ \approx & 1.27248 \times 10^{-21} \end{aligned} \]

which is much, much, much smaller than the probability of being struck by lightning, approximately \(2 \times 10^{-6}\) according to data from the Centers for Disease Control and Prevention [6]… poor guy…
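If you would like to double-check this tiny number, here is a minimal C++ sketch (independent of the repo's code) that simply multiplies out the product above:

#include <cstdio>

int main() {
    double p = 1.0;
    for (int i = 1; i <= 50; ++i)
        p *= 0.98;                              // pulls 1-50: miss at the 2% base rate
    for (int i = 51; i <= 98; ++i)
        p *= 1.0 - (0.02 * (i - 50) + 0.02);    // pulls 51-98: miss while the pity ramp is active
    std::printf("Pr(98 misses in a row) = %e\n", p);  // roughly 1.27e-21
    return 0;
}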

Operators are obtained in headhunting banners. There are basically two general kinds of headhunting banners in Arknights,

  1. Standard Banner (Standard Headhunting) — a banner without a limited 6★ operator (e.g., Nian). This kind of banner can either have one rate-up 6★ operator, which often appears when new events are released, or have two rate-up 6★ operators.

    For example, Lisa of the Valley, Blemishine (CN Server), Standard Pool #26 are standard banners.

  2. Limited Banner (Limited Headhunting) — a banner with one limited 6★ operator. Currently this kind of banner has two rate-up 6★ operators.

    For example, Earthborn Metals, Cremation Last Wish (CN Server), Forget Me Not (CN Server) are limited banners.

There is also a type of banner called Joint Operation, which has different rules. Currently this simulation program does not support that type of banner, but I will add this functionality soon.

The rolls of not obtaining 6★ operator will be accumulated in all the standard headhunting, and they will not be reset when a standard headhunting ends. The increased odds caused in one event also applies to another standard headhunting. (The description in Arknights)

In other words, this count of pulls can only be accumulated across different standard banners, not limited banners. When a limited banner ends, the count within that limited banner is reset to zero. What's more, the count for standard banners is independent from the count for a limited banner: they do not affect each other.

In a standard banner, if you get a 6★ operator, there is a \(50\%\) chance that said operator will be (one of) the rate-up one(s); in a limited banner, if you get a 6★ operator, there is a \(70\%\) chance that said operator will be (one of) the rate-up one(s). Suppose your target is one of the featured 6★ operators in a banner; these chances can be expressed as conditional probabilities

\[ \begin{aligned} &Pr(Get\ the\ Target\ Star\ 6\ Operator\ | \ Get\ a\ Star\ 6\ Operator\ in\ Standard\ Banner) = 50\% \\ &Pr(Get\ the\ Target\ Star\ 6\ Operator\ | \ Get\ a\ Star\ 6\ Operator\ in\ Limited\ Banner) = 70\% \end{aligned} \]

Here we make an important assumption: if \(n\) operator(s) are featured in a banner (i.e., rate-up), they share this conditional probability equally. That is, the chance that the obtained 6★ operator in a banner is one specific featured operator equals \(50\% / n\) or \(70\% / n\). This looks reasonable enough, although Arknights has not officially confirmed it.
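For example, in a double-rate-up limited banner, this assumption gives each of the two featured operators

\[ Pr(Get\ a\ Specific\ Featured\ 6★\ Operator\ |\ Get\ a\ 6★\ Operator\ in\ Limited\ Banner) = \frac{70\%}{2} = 35\% \]

which is the per-operator share used in the rest of this article.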

What's more, Arknights has a secondary pity system called the Spark System for limited banners. You get one Recruitment Data Contract after each pull in a limited banner, and the 6★ operator featured in that limited banner can be bought in the store for 300 Recruitment Data Contracts. You will not want to have to use it…

Abstracted Questions and Definitions

Here, we define the following random events,

  • \(A_i\) : Get a 6★ operator on your \(i^{th}\) pull
  • \(B_i\) : Did not get any 6★ operators within \(i\) pulls
  • \(S_i\) : Get the target 6★ operator on your \(i^{th}\) pull
  • \(W_i\) : Get the target 6★ operator within \(i\) pulls
  • \(F_i\) : Did not get the target 6★ operator within \(i\) pulls

Here the target 6★ operator refers to (one of) the 6★ operator(s) featured in the banner, and \(A_i\) and \(S_i\) require that no 6★/target 6★ operator is obtained in the previous \(i-1\) pulls.

To make the notation simpler,

  • The subscript \(i\) in \(A_i\) starts counting from \(i=1\) until you get a 6★ operator (and then counting starts from 1 again). Based on the rules of the pity system, the maximum value of \(i\) is \(99\).
  • The subscript \(i\) in \(B_i\) starts from \(i=0\) on the pull in which you get a 6★ operator, and every time you do not get a 6★ operator in a pull, \(i\) increases by \(1\). Based on the rules of the pity system, the maximum value of \(i\) is \(98\).
  • The subscript \(i\) in \(S_i\) and \(W_i\) starts counting from \(i=1\) until you get the target 6★ operator (and then counting starts from 1 again). Since there is always a chance of getting a non-target 6★ operator, \(i\) can be \(+ \infty\).
  • The subscript in \(F_i\) starts from \(i=0\) on the pull in which you get the target 6★ operator, and every time you do not get the target 6★ operator in a pull, \(i\) increases by \(1\). Note that \(i\) does not restart from \(0\) if you get a non-target 6★ operator in a pull. Since there is always a chance of getting a non-target 6★ operator, \(i\) can be \(+ \infty\).

In short, when you achieve your goal, \(i\) resets and starts counting from the beginning again.

Based on the definitions above, here we give several examples,

  • \(A_{42}\) means that after the pity system is reset, you do not get a 6★ operator in the following \(41\) pulls and finally get one on your \(42^{nd}\) pull. If \(A_{42}\) happens, then \(B_{41}\) also happens.
  • \(B_{37}\) means that after the pity system is reset, you do not get any 6★ operator in the following \(37\) consecutive pulls. \(B_0\) has the same meaning as \(A_1\).
  • \(S_{54}\) means that after the pity system is reset, you do not get the target 6★ operator in the following \(53\) pulls and finally get this operator on your \(54^{th}\) pull. If \(S_{54}\) happens, then \(F_{53}\) also happens.
  • \(W_{25}\) means that after the pity system is reset, you get the target 6★ operator within \(25\) pulls. Note that within these \(25\) pulls, whenever you get the operator, you simply stop pulling.
  • \(F_{17}\) means that after the pity system is reset, you do not get the target 6★ operator in the following \(17\) consecutive pulls. \(F_0\) has the same meaning as \(S_1\).

Which operator counts as the target 6★ operator is up to you, as long as that operator is featured in the banner.

If you carefully examine the definitions, you will find that the probabilities of these random events, \(Pr(A_i)\), \(Pr(B_i)\), \(Pr(S_i)\), \(Pr(W_i)\) and \(Pr(F_i)\), are actually conditional probabilities, conditioned on "the pity system has just been reset". We simply did not write this out explicitly. This is also what the simulation results in this article actually mean: they are the probabilities of these events happening when the pity system has just been reset and you start to pull again. If you have already pulled several times without obtaining a 6★ operator in a previous standard banner and continue to pull in the next standard banner, the probability of getting (or not getting) a 6★/target 6★ operator on/within some number of pulls might be slightly different. However, I currently think the difference may not be very big. (Hope I am right…)

I am currently improving the code and doing some research on this question. Once finished, I will post a new article illustrating it.

Please forgive me for being wordy here (I am not a native speaker of English). I hope the examples above make my definitions clear :-)

We are curious about the values of \(Pr(S_i)\) and \(Pr(W_i)\), which are also what this repo attempts to estimate.

6★ Operator Drop: the Probability and the Expectation

Calculating the probability of getting a (any) 6★ operator is relatively easy. Some previous work [7] has shown the numeric values from \(Pr(A_1)\) to \(Pr(A_{99})\). However, the link in reference [7] does not provide a formula. Theoretically,

\[ \begin{equation} Pr(A_i)=\left\{ \begin{aligned} & (0.98)^{i-1} \times 0.02&1 \le i\le 50 & \\ & (0.98)^{50} \times (\prod_{k=51}^{i-1} \frac{99-k}{50}) \times \frac{i-49}{50} & 50<i\le 98 && \ where\ i \in N_+\\ & (0.98)^{50} \times \prod_{k=1}^{48} \frac{49-k}{50} & i = 99 & \\ \end{aligned} \right. \end{equation} \]

With the results above, we can obtain the expectation of the number of pulls to get a 6★ operator, i.e., \(\sum_{i=1}^{99}i \cdot Pr(A_i)\).

This expectation is approximately \(34.59\); put differently, the average per-pull probability of getting a 6★ operator in Arknights is \(1/34.59 \approx 2.89\%\).

Here the value \(34.59\) means that if we averaged all players' pull histories (how many pulls each took to get a 6★ operator), the final result would be this value. We can also understand this from a different angle: suppose you pull \(N\) times in total and get 6★ operators \(m\) times, taking \(n_i\) pulls to get the \(i^{th}\) one, where \(i \in N_+,\ 1 \le i \le m\). Then the following equation holds

\[ \begin{aligned} \lim_{m \to +\infty} \frac {\displaystyle \sum_{i=1}^{m} n_i }{m} =& \lim_{N \to +\infty} \frac {\displaystyle \sum_{i=1}^{m} n_i }{m}\\ =& \sum_{i=1}^{99}i \cdot Pr(A_i)\\ \approx& 34.59 \end{aligned} \]

In fact, according to [8], the probability of getting a 6★ operator on the current pull is decided only by the current state of the pity counter, i.e., how many pulls you have made since your last 6★ operator, which satisfies the Markov property (also called "memorylessness"). So we can also use a Markov chain to model this question. Each state in the chain has the same meaning as \(B_i\), and there are \(100\) different states in this chain.
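However you model it, the closed-form \(Pr(A_i)\) above is easy to evaluate numerically. The short C++ sketch below (independent of the repo's code) evaluates the formula, sums \(i \cdot Pr(A_i)\), and should print a value close to \(34.59\), with the probabilities summing to \(1\):

#include <cstdio>

// Closed-form Pr(A_i): the first 6-star operator arrives exactly on pull i.
double pr_a(int i) {
    double rate_i = (i <= 50) ? 0.02 : 0.02 * (i - 50) + 0.02;  // per-pull 6-star rate on pull i
    if (rate_i > 1.0) rate_i = 1.0;                             // pull 99 is guaranteed
    double miss_before = 1.0;
    for (int k = 1; k < i; ++k) {
        double rate_k = (k <= 50) ? 0.02 : 0.02 * (k - 50) + 0.02;
        miss_before *= 1.0 - rate_k;                            // fail pulls 1 .. i-1
    }
    return miss_before * rate_i;
}

int main() {
    double expectation = 0.0, total = 0.0;
    for (int i = 1; i <= 99; ++i) {
        expectation += i * pr_a(i);
        total += pr_a(i);
    }
    std::printf("sum of Pr(A_i) = %.6f\n", total);        // should be ~1
    std::printf("expected pulls = %.2f\n", expectation);  // should be ~34.59
    return 0;
}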

Target 6★ Operator Drop: Attempts and Approaches of Calculating

It seems less straightforward to calculate the theoretical probability of getting a specific (target) 6★ operator. If you want the probability of getting your target 6★ operator on your \(N^{th}\) pull, you may have gotten several other 6★ operators and reset the pity system many times before finally getting what you want on the last pull; you may also have gotten no 6★ operator at all before your \(N^{th}\) pull. As you can see, there exist many mutually exclusive scenarios, and we need to calculate the probability of each scenario and sum them together.

To get a clearer sense of how many scenarios exist, let's count them by how many times you get a non-target 6★ operator from your first pull to your \((N-1)^{th}\) pull. We know in advance that the single target 6★ operator is obtained on the \(N^{th}\) pull. So, on each individual pull before the \(N^{th}\) pull, you either get a non-target 6★ operator or you do not. It is possible that you get no non-target 6★ operator in the \(N-1\) pulls, corresponding to \(\binom{N-1}{0}\) scenarios; it is also possible that you get exactly one non-target 6★ operator, which has \(\binom{N-1}{1}\) different arrangements; similarly, getting 2 non-target 6★ operators corresponds to \(\binom{N-1}{2}\) scenarios. We find that the number of mutually exclusive scenarios is

\[ \sum_{i=0}^{N-1}\binom{N-1}{i} = 2^{N-1} \]

which means that if we take this intuitive approach to obtain the theoretical probability, we need to calculate the probabilities of \(2^{N-1}\) different scenarios, which is a terrible amount of work.

Promising Theoretical Approaches

Some promising methods for obtaining the accurate theoretical values of \(Pr(S_i)\) use a Markov chain and/or dynamic programming [9]. In [9], the values of \(Pr(S_i),\ i \in N_+,\ 1\le i \le 300\) were obtained. The author did not calculate \(Pr(S_i)\) for \(i > 300\) due to the Spark System. However, others pointed out that the index in the author's code was incorrect [10], and this method cannot produce closed-form formulas for \(Pr(S_i)\).

This Repo’s Approach: Monte Carlo Simulation

I adopt Monte Carlo simulation to estimate the theoretical \(Pr(S_i)\) in this repo.

To be specific, we define a trial as pulling one or more times until you get your target 6★ operator. We simulate \(N\) pulls (\(N\) being big enough), and in the end you get your target 6★ operator \(n\) times (meaning that \(n\) trials are finished). Among these trials, "taking exactly \(i\) pulls to finish a trial" happens \(m_i\) times, giving counts \(m_1,\ m_2,\ ...\) We regard each relative frequency as the corresponding estimated probability

\[ \frac{m_i}{n} \approx Pr(S_i),\ i \in N_+,\ 1 \le i \le n \]

where there exists

\[ \begin{aligned} &\lim_{n \to + \infty}\frac{m_i}{n}\\ =& \lim_{N \to + \infty}\frac{m_i}{n}\\ =&Pr(S_i),\ i \in N_+ \end{aligned} \]

which indicates that, in order to get a more accurate result, it is recommended to provide a larger value for the command line argument -t or --total-pull-time, so as to obtain a larger \(n\).

When the number of simulated pulls reaches the limit given by the command line argument -t or --total-pull-time, the simulation stops. The last trial may be aborted because the limit is reached.
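To make the procedure concrete, below is a minimal, self-contained sketch of such a simulation loop. It is not the repo's actual implementation: it hard-codes the double-rate-up limited banner defaults (a \(2\%\) base rate, pity starting after \(50\) pulls, and a \(70\%\) rate-up share split between \(2\) featured operators), ignores the Spark System and the -c option, and all names are illustrative. It uses the same mt19937_64 plus uniform_int_distribution approach described later in this article.

#include <cstdint>
#include <iostream>
#include <map>
#include <random>

int main() {
    const std::uint64_t total_pulls = 100000000ULL;  // like -t 100000000
    const unsigned pity_start = 50;                  // like -p 50
    const unsigned target_threshold = 3500;          // 70% / 2 featured operators = 35%, on a 0-9999 scale

    std::random_device rd;
    std::mt19937_64 mt(rd());
    std::uniform_int_distribution<unsigned> dist(0, 9999);  // uniform integers on [0, 9999]

    std::map<unsigned, std::uint64_t> m;  // m[i]: number of trials finished in exactly i pulls
    std::uint64_t trials = 0;             // n: number of finished trials
    unsigned pity = 0;                    // pulls since the last 6-star operator
    unsigned pulls_in_trial = 0;          // pulls since the last *target* 6-star operator

    for (std::uint64_t k = 0; k < total_pulls; ++k) {
        ++pity;
        ++pulls_in_trial;
        // Per-pull 6-star rate: 2%, plus 2% for each pull beyond the pity starting point.
        unsigned six_star_threshold = 200 + (pity > pity_start ? 200 * (pity - pity_start) : 0);
        if (dist(mt) < six_star_threshold) {          // got some 6-star operator
            pity = 0;                                 // the pity counter resets
            if (dist(mt) < target_threshold) {        // it is the target operator: trial finished
                ++m[pulls_in_trial];
                ++trials;
                pulls_in_trial = 0;
            }
        }
    }
    // The relative frequencies m_i / n estimate Pr(S_i).
    for (const auto& kv : m)
        std::cout << kv.first << '\t' << static_cast<double>(kv.second) / trials << '\n';
    return 0;
}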

Now let's take a look at how to calculate \(Pr(W_i)\). According to the definitions of \(W_i\) and \(S_i\), we can see that \(W_i\) means that one of \(S_1\), \(S_2\), …, \(S_{i-1}\), \(S_i\) happens. For example, \(W_4\) means that you get your target 6★ operator on your first pull (\(S_1\)), second pull (\(S_2\)), third pull (\(S_3\)) or fourth pull (\(S_4\)). When one of these scenarios happens, you simply stop pulling, so we avoid dealing with more complex events (e.g., \(S_1\) happens then \(S_3\) happens, or \(S_3\) happens then \(S_1\) happens, or \(S_2\) happens twice, or \(S_1\) happens four times).

So we can conclude that

\[ \begin{aligned} Pr(W_i)=& \sum_{k=1}^{i} Pr(S_k)\\ =& \sum_{k=1}^{i} \frac{m_k}{n},\ n\ is\ big\ enough \end{aligned} \]
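In code, this is just a running sum over the estimated \(Pr(S_k)\). Here is a small self-contained sketch; the map of counts m and the number of finished trials n are assumed to come from a loop like the one sketched earlier, and the names are illustrative:

#include <cstdint>
#include <map>
#include <vector>

// Given m[i] (the number of trials finished in exactly i pulls) and n finished trials,
// returns w where w[i] is the estimated Pr(W_i) = sum of m_k / n for k = 1..i.
std::vector<double> estimate_w(const std::map<unsigned, std::uint64_t>& m,
                               std::uint64_t n, unsigned max_i) {
    std::vector<double> w(max_i + 1, 0.0);
    double cumulative = 0.0;
    for (unsigned i = 1; i <= max_i; ++i) {
        auto it = m.find(i);
        if (it != m.end())
            cumulative += static_cast<double>(it->second) / n;  // add the estimated Pr(S_i)
        w[i] = cumulative;
    }
    return w;
}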

Generating the Random Numbers

Pulling in Arknights is a random experiment. To follow the same pulling rules in the simulation program while ensuring the accuracy of the results, we need to generate a great number of high-quality, uniformly distributed random numbers and compare each value with several thresholds to decide whether we get a 6★/target 6★ operator. As a result, selecting a high-quality and blazing-fast random number generator (RNG) becomes the key part of the program.

Pitfall: rand() % n is Evil!

It is quite common to see the following code used to generate a sequence of random numbers uniformly distributed in a given range, for example \([0,\ 99]\)

#include <cstdlib>   // std::srand(), std::rand()
#include <iostream>

int main() {
    unsigned seed = 42;  // initialize the seed with some value
    std::srand(seed);
    for (int i = 0; i < 16; ++i) {
        int rand_num = std::rand() % 100;  // intended to be uniform on [0, 99]
        std::cout << rand_num << std::endl;
    }
}

By the way, the code above, as well as some of the topics discussed below, is adapted from a lecture given by Stephan T. Lavavej from Microsoft®, which is mainly about generating random numbers in C++. If you are interested in this topic, I would strongly recommend you take a look at this cool lecture. Here is the link.

This code is problematic!

Here is the problem.

First, let's take a look at the period. In glibc, rand() uses a Linear Congruential Generator (LCG) to produce random numbers; it may also use a Linear-Feedback Shift Register (LFSR). According to [11], if the LFSR is adopted, rand() only records its last 32 outputs; in other words, it can have at most \(2^{32}\) different states. If the LCG is adopted, the period is at most the modulus m that the LCG algorithm uses to generate numbers [12]. In fact, [13] and [14] show that, in the glibc implementation, the value of m is \(2^{31}-1\) (0x7fffffff in hexadecimal). What's more, in the article from [15], as well as in [16], the period of rand() is shown to be roughly \(2^{32}\). To estimate \(Pr(S_i)\), we need to simulate at least one billion (more than \(2^{29}\)) pulls, and this number grows to ten billion (more than \(2^{33}\)) for better accuracy. We can conclude that the period of rand() is not enough for our large-scale Monte Carlo simulation program.

The original website of [15] is down, so I provide the link cached by Google.

Second, the quality of the distribution of the numbers generated by rand() and rand() % n is considered low. The standard makes no guarantees about the quality of the random numbers generated by rand() [17]. In fact, the output is not truly uniformly distributed, since it uses an LCG or LFSR. Even assuming that rand() were perfectly uniformly distributed, rand() % n still would not be. In this example from Stephan's lecture, assume that the numbers from rand() are perfectly uniformly distributed in \([0,\ 32767]\) and that we want a uniformly distributed number in \([0,\ 99]\), as in the code snippet below

int src = rand();  // assume uniform [0, 32767]
int desired_range = 100;
int dist = src % desired_range;  // wrong!

This is a mistake that is very easy to make. Why is the result not uniformly distributed? The reason is actually very simple

int src = rand();  // assume uniform [0, 32767]
int desired_range = 100;
int dist = src % desired_range;
//          src     →     dist
//        [0, 99]   →   [0, 99]
//     [100, 199]   →   [0, 99]
//     [200, 299]   →   [0, 99]
// ...
// [32700, 32767]   →   [0, 67] -- problematic!

As you can see, the numbers in the range \([0,\ 67]\) have a slightly higher probability than the numbers in the range \((67,\ 99]\). This error is introduced by the modulo. As long as the number of values rand() can return (RAND_MAX + 1) is not a multiple of desired_range, this problem occurs. When estimating a \(Pr(S_i)\) that itself has a very small value, this modulo error imposes a large relative error on our results.

What's more, the following code still does not give a perfectly uniform distribution, because only src equal to RAND_MAX makes dist equal to \(99\), while every value of dist less than \(99\) has multiple corresponding src values.

int src = rand();  // assume uniform [0, 32767]
int desired_range = 100;
int dist = static_cast<int>(src * 1.0 / RAND_MAX * (desired_range - 1));  // still biased!

The following code is still wrong, since there is no way to uniformly map \(32768\) integers onto \(100\) integers…

int src = rand();  // assume uniform [0, 32767]
int desired_range = 100;
int dist = static_cast<int>(src * 1.0 / (RAND_MAX + 1) * desired_range);  // still not uniform

In Stephan’s lecture, he gives a more detailed explanation of these pitfalls. You can refer to his lecture if you are interested.
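For completeness, one classic pre-C++11 way to remove the modulo bias (this is not what the repo does; its approach is described in the next section) is rejection sampling: throw away the values in the uneven tail and draw again. A minimal sketch, still assuming rand() itself were uniform:

#include <cstdlib>  // std::rand(), RAND_MAX

// Returns an unbiased value in [0, 99], assuming rand() is uniform on [0, RAND_MAX].
int uniform_0_to_99() {
    const int desired_range = 100;
    const int limit = (RAND_MAX / desired_range) * desired_range;  // largest usable multiple of 100
    int src;
    do {
        src = std::rand();
    } while (src >= limit);      // reject the uneven tail so every remainder is equally likely
    return src % desired_range;
}

Note that this only removes the modulo bias; it does nothing about the short period and distribution quality of rand() discussed above, which is why this repo uses the <random> library instead.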

Favored Approach - C++11 <random> Library

To obtain high-quality uniformly distributed numbers and reduce the inaccuracy of the estimated probabilities as much as I can, I adopt the 64-bit Mersenne Twister engine mt19937_64 introduced in C++11 to generate the random numbers. The numbers produced by mt19937_64 are filtered through uniform_int_distribution, which finally gives us uniformly distributed numbers in the desired range. This approach eliminates, rather than just reduces, the errors discussed in the previous section. Compared with the 32-bit version mt19937, the 64-bit Mersenne Twister engine can accept more entropy, and the number of non-zero terms in its characteristic polynomial is roughly twice that of the 32-bit version [18].

A quick example code is shown below

#include <iostream>
#include <random>  // required header

int main() {
    std::mt19937_64 mt(42);  // 42 as the seed
    std::uniform_int_distribution<unsigned int> dist(0, 99);  // a uniform distribution
                                                              // on [0, 99] (both inclusive)
    for (int i = 0; i < 100; ++i) {
        unsigned int ran = dist(mt);  // dist is driven by mt
        std::cout << ran << std::endl;
    }
}

This example uses a fixed random seed, so it will produce deterministic results. You can instead create a random_device object and do the following

std::random_device rd;
std::mt19937_64 mt(rd());

In this repo, to ensure independence between simulations, I use random_device to generate a random seed and then feed this seed to mt19937_64.

According to [18], the Mersenne Twister engine mt19937_64 (as well as mt19937) has an extremely long period of \(2^{19937}-1\), so you do not need to worry about the generated numbers repeating even if you run this simulation for the lifetime of the universe. It also has an extremely high-quality distribution (ensured by the standard), which satisfies our requirements.

Result Analysis

To save space, I will only show the numeric values of \(Pr(S_i)\) and \(Pr(W_i)\) for \(i \in N_+, 1 \le i \le 100\) here. I will also show several line charts based on a larger range of pulls. You can refer to the full results for \(i \in N_+, 1 \le i < 1000\) in the simulation output files under the /res directory.

If you do not know what \(Pr(S_i)\) and \(Pr(W_i)\) mean, you can refer to the section Abstracted Questions and Definitions.

Estimated Probability of Double-Rate-Up Limited Banner

The following results are obtained after \(500\) billion simulated pulls, run with the command

./simulation_sequential --limited -t 500000000000 -n 2 -p 50 -c 0

You can also simply run with command

./simulation_sequential -t 500000000000

which is the same.

Again, it is recommended to redirect the output to a text file, since the output would be more than \(1,000\) lines.

The following table shows the estimated numeric values of \(Pr(S_i)\) and \(Pr(W_i)\) for the first \(100\) pulls.

\(i\) \(Pr(S_i)\) \(Pr(W_i)\) \(i\) \(Pr(S_i)\) \(Pr(W_i)\) \(i\) \(Pr(S_i)\) \(Pr(W_i)\) \(i\) \(Pr(S_i)\) \(Pr(W_i)\)
1 0.700% 0.700% 26 0.587% 16.694% 51 0.748% 30.366% 76 0.457% 53.095%
2 0.695% 1.395% 27 0.583% 17.277% 52 0.980% 31.346% 77 0.453% 53.548%
3 0.690% 2.085% 28 0.579% 17.856% 53 1.180% 32.527% 78 0.448% 53.996%
4 0.685% 2.771% 29 0.575% 18.431% 54 1.338% 33.864% 79 0.444% 54.440%
5 0.681% 3.451% 30 0.571% 19.002% 55 1.445% 35.309% 80 0.440% 54.880%
6 0.676% 4.128% 31 0.567% 19.569% 56 1.501% 36.809% 81 0.436% 55.316%
7 0.671% 4.799% 32 0.563% 20.132% 57 1.506% 38.315% 82 0.432% 55.748%
8 0.666% 5.465% 33 0.559% 20.691% 58 1.467% 39.783% 83 0.428% 56.176%
9 0.662% 6.127% 34 0.555% 21.246% 59 1.394% 41.177% 84 0.425% 56.601%
10 0.657% 6.784% 35 0.551% 21.797% 60 1.296% 42.473% 85 0.421% 57.022%
11 0.653% 7.437% 36 0.548% 22.344% 61 1.184% 43.657% 86 0.417% 57.439%
12 0.648% 8.085% 37 0.543% 22.888% 62 1.068% 44.725% 87 0.413% 57.852%
13 0.643% 8.728% 38 0.540% 23.428% 63 0.955% 45.680% 88 0.410% 58.262%
14 0.639% 9.367% 39 0.536% 23.964% 64 0.851% 46.531% 89 0.406% 58.668%
15 0.634% 10.002% 40 0.532% 24.496% 65 0.761% 47.292% 90 0.402% 59.070%
16 0.630% 10.632% 41 0.529% 25.025% 66 0.686% 47.978% 91 0.399% 59.468%
17 0.626% 11.257% 42 0.525% 25.549% 67 0.626% 48.604% 92 0.395% 59.863%
18 0.621% 11.879% 43 0.521% 26.071% 68 0.579% 49.183% 93 0.391% 60.255%
19 0.617% 12.495% 44 0.518% 26.588% 69 0.544% 49.727% 94 0.388% 60.643%
20 0.613% 13.108% 45 0.514% 27.102% 70 0.518% 50.246% 95 0.384% 61.027%
21 0.608% 13.716% 46 0.510% 27.612% 71 0.500% 50.745% 96 0.381% 61.408%
22 0.604% 14.320% 47 0.507% 28.119% 72 0.486% 51.231% 97 0.378% 61.786%
23 0.600% 14.920% 48 0.503% 28.622% 73 0.476% 51.707% 98 0.374% 62.160%
24 0.595% 15.515% 49 0.500% 29.122% 74 0.468% 52.176% 99 0.371% 62.531%
25 0.591% 16.107% 50 0.496% 29.618% 75 0.462% 52.638% 100 0.368% 62.898%

The following chart shows the estimated \(Pr(S_i)\) and \(Pr(W_i)\) for the first \(500\) pulls.

The blue line shows the estimated value of \(Pr(S_i)\), which is assigned to the primary axis (on the left); the orange line shows the estimated value of \(Pr(W_i)\), which is assigned to the secondary axis (on the right).

Here are some quick facts:

  • on the \(57^{th}\) pull, you have the biggest chance to get your target 6★ operator in a limited banner: \(Pr(S_{57})=1.506\%\);
  • if you have prepared for \(41\) pulls, you will have a \(25\%\) chance of getting your target 6★ operator;
  • if you have prepared for \(70\) pulls, you will have a \(50\%\) chance of getting your target 6★ operator;
  • if you have prepared for \(132\) pulls, this chance increases to \(75\%\);
  • you will have a \(90\%\) chance to get your operator within \(212\) pulls.

You can see at least three observed peaks on the blue line. This is due to the pity system. Interestingly, although the odds of getting a 6★ operator start to increase after \(50\) pulls, the maximum value of \(Pr(S_i)\) occurs at \(i=57\), which is \(7\) pulls after the pity system comes into effect; here \(Pr(S_{57})=1.506\%\). The second local maximum of \(Pr(S_i)\) occurs at \(i=114\), where \(Pr(S_{114})=0.466\%\). The third occurs at \(i=168\), where \(Pr(S_{168})=0.204\%\).

Now let's check the turning points (not inflection points, same below) where \(Pr(S_i)\) starts to increase rather than drop. Based on the numeric data and the chart above, we find that at \(i = 51,\ 103,\ 164\), \(Pr(S_i)\) is greater than its previous value and keeps increasing for a while.

Note that in double-rate-up limited banners, the Spark System guarantees that you can get the featured operator after \(300\) pulls in the banner. However, the probability of ending up in this situation (\(F_{300}\)) is already very small. We can look at the estimated value of \(Pr(S_{301})\), which, according to the definition, requires the event \(F_{300}\) to happen. Here the estimated \(Pr(S_{301})=0.041 \%\), which equals \(Pr(F_{300}) \times Pr(Get\ the\ Target\ 6★\ Operator\ on\ 301^{st}\ Pull)\). Note that the second random event, \(Get\ the\ Target\ 6★\ Operator\ on\ 301^{st}\ Pull\), does not depend on the results of the previous pulls. We then have

\[ \begin{aligned} \max Pr(F_{300}) & = \max \frac{Pr(S_{301})}{Pr(Get\ the\ Target\ 6★\ Operator\ on\ 301^{st}\ Pull)}\\ & = \frac{Pr(S_{301})}{\min Pr(Get\ the\ Target\ 6★\ Operator\ on\ 301^{st}\ Pull)}\\ & = \frac{0.041\%}{2\% \times 70\% \div 2}\\ & \approx 5.857\% \end{aligned} \]

which means that the chance of not getting the target 6★ operator within \(300\) pulls is less than \(5.857\%\).

There should exist a way to calculate \(Pr(F_i)\) based on the estimated values of \(Pr(S_i)\) and \(Pr(W_i)\). It may be \(1-Pr(W_i)\). Currently I am not fully sure that I am right😭.

Estimated Probability of Double-Rate-Up Standard Banner

The following results are obtained after \(500\) billion simulated pulls, run with the command

./simulation_sequential --standard -t 500000000000 -n 2 -p 50 -c 0

You can also simply run with the command

./simulation_sequential --standard -t 500000000000

which is the same.

The following table shows the estimated numeric values of \(Pr(S_i)\) and \(Pr(W_i)\) for the first \(100\) pulls.

\(i\) \(Pr(S_i)\) \(Pr(W_i)\) \(i\) \(Pr(S_i)\) \(Pr(W_i)\) \(i\) \(Pr(S_i)\) \(Pr(W_i)\) \(i\) \(Pr(S_i)\) \(Pr(W_i)\)
1 0.500% 0.500% 26 0.441% 12.218% 51 0.571% 22.739% 76 0.406% 40.613%
2 0.498% 0.998% 27 0.439% 12.657% 52 0.738% 23.477% 77 0.403% 41.016%
3 0.495% 1.492% 28 0.437% 13.094% 53 0.884% 24.361% 78 0.400% 41.416%
4 0.493% 1.985% 29 0.435% 13.528% 54 0.998% 25.359% 79 0.397% 41.813%
5 0.490% 2.475% 30 0.432% 13.961% 55 1.078% 26.437% 80 0.395% 42.208%
6 0.488% 2.963% 31 0.430% 14.391% 56 1.121% 27.557% 81 0.393% 42.601%
7 0.485% 3.448% 32 0.428% 14.819% 57 1.128% 28.685% 82 0.390% 42.991%
8 0.483% 3.931% 33 0.426% 15.245% 58 1.104% 29.790% 83 0.388% 43.379%
9 0.480% 4.411% 34 0.424% 15.669% 59 1.055% 30.845% 84 0.385% 43.764%
10 0.478% 4.889% 35 0.422% 16.090% 60 0.988% 31.833% 85 0.383% 44.146%
11 0.476% 5.365% 36 0.419% 16.510% 61 0.911% 32.744% 86 0.380% 44.527%
12 0.473% 5.838% 37 0.418% 16.927% 62 0.830% 33.574% 87 0.378% 44.904%
13 0.471% 6.308% 38 0.415% 17.343% 63 0.751% 34.325% 88 0.375% 45.280%
14 0.468% 6.777% 39 0.413% 17.756% 64 0.679% 35.004% 89 0.373% 45.653%
15 0.466% 7.243% 40 0.411% 18.167% 65 0.616% 35.621% 90 0.371% 46.024%
16 0.464% 7.707% 41 0.409% 18.576% 66 0.564% 36.185% 91 0.368% 46.392%
17 0.462% 8.168% 42 0.407% 18.983% 67 0.522% 36.706% 92 0.366% 46.758%
18 0.459% 8.627% 43 0.405% 19.388% 68 0.489% 37.195% 93 0.364% 47.122%
19 0.457% 9.084% 44 0.403% 19.791% 69 0.465% 37.660% 94 0.361% 47.483%
20 0.454% 9.539% 45 0.401% 20.192% 70 0.447% 38.107% 95 0.359% 47.842%
21 0.452% 9.991% 46 0.399% 20.591% 71 0.434% 38.541% 96 0.357% 48.199%
22 0.450% 10.441% 47 0.397% 20.988% 72 0.425% 38.966% 97 0.355% 48.553%
23 0.448% 10.889% 48 0.395% 21.383% 73 0.418% 39.384% 98 0.352% 48.906%
24 0.445% 11.334% 49 0.393% 21.777% 74 0.413% 39.798% 99 0.350% 49.256%
25 0.443% 11.777% 50 0.391% 22.168% 75 0.409% 40.207% 100 0.348% 49.603%

The following chart shows the estimated \(Pr(S_i)\) and \(Pr(W_i)\) for the first \(500\) pulls.

(Figure: line-chart-double-rate-up-standard-banner)

The blue line shows the estimated value of \(Pr(S_i)\), which is assigned to the primary axis (on the left); the orange line shows the estimated value of \(Pr(W_i)\), which is assigned to the secondary axis (on the right).

Here are some quick facts:

  • on the \(57^{th}\) pull, you have the biggest chance to get your target 6★ operator in a double-rate-up standard banner: \(Pr(S_{57})=1.128\%\);
  • if you have prepared for \(54\) pulls, you will have a \(25\%\) chance of getting your target 6★ operator;
  • if you have prepared for \(102\) pulls, you will have a \(50\%\) chance of getting your target 6★ operator;
  • if you have prepared for \(187\) pulls, this chance increases to \(75\%\);
  • you will have a \(90\%\) chance to get your operator within \(303\) pulls.

You can see that in a double-rate-up standard banner, it is harder to get what you want than in a limited banner, since the conditional probability is lower (\(25\%\) for each of the featured 6★ operators in this kind of banner, compared with \(35\%\) in a limited banner).

Similarly, due to the pity system, there also exist three turning points and three observed peaks in \(Pr(S_i)\), which are

  • turning points: \(i=51,\ 103,\ 163\)
  • local maximum points and values:
    • \(Pr(S_{57})=1.128\%,\ where\ i=57\)
    • \(Pr(S_{114})=0.439\%,\ where\ i=114\)
    • \(Pr(S_{169})=0.241\%,\ where\ i=169\)

Estimated Probability of Single-Rate-Up Standard Banner

The following results are obtained after \(500\) billion simulated pulls, run with the command

./simulation_sequential --standard -t 500000000000 -n 1 -p 50 -c 0

You can also simply run with the command

./simulation_sequential --standard -t 500000000000 -n 1

which is the same.

The following table shows the estimated numeric values of \(Pr(S_i)\) and \(Pr(W_i)\) for the first \(100\) pulls.

\(i\) \(Pr(S_i)\) \(Pr(W_i)\) \(i\) \(Pr(S_i)\) \(Pr(W_i)\) \(i\) \(Pr(S_i)\) \(Pr(W_i)\) \(i\) \(Pr(S_i)\) \(Pr(W_i)\)
1 1.000% 1.000% 26 0.778% 22.995% 51 0.969% 40.468% 76 0.450% 68.597%
2 0.990% 1.990% 27 0.770% 23.765% 52 1.298% 41.766% 77 0.444% 69.040%
3 0.980% 2.970% 28 0.762% 24.527% 53 1.579% 43.345% 78 0.437% 69.478%
4 0.970% 3.940% 29 0.755% 25.282% 54 1.796% 45.141% 79 0.432% 69.909%
5 0.961% 4.901% 30 0.747% 26.029% 55 1.941% 47.082% 80 0.426% 70.335%
6 0.951% 5.852% 31 0.740% 26.769% 56 2.012% 49.094% 81 0.420% 70.755%
7 0.942% 6.793% 32 0.732% 27.501% 57 2.010% 51.104% 82 0.415% 71.170%
8 0.932% 7.725% 33 0.725% 28.226% 58 1.946% 53.050% 83 0.410% 71.580%
9 0.923% 8.648% 34 0.718% 28.944% 59 1.833% 54.882% 84 0.404% 71.984%
10 0.913% 9.562% 35 0.711% 29.655% 60 1.684% 56.567% 85 0.399% 72.383%
11 0.905% 10.466% 36 0.703% 30.358% 61 1.517% 58.084% 86 0.394% 72.777%
12 0.895% 11.362% 37 0.697% 31.055% 62 1.343% 59.427% 87 0.389% 73.166%
13 0.886% 12.248% 38 0.689% 31.744% 63 1.177% 60.604% 88 0.384% 73.549%
14 0.878% 13.126% 39 0.682% 32.426% 64 1.025% 61.629% 89 0.379% 73.928%
15 0.869% 13.994% 40 0.676% 33.102% 65 0.893% 62.522% 90 0.374% 74.302%
16 0.860% 14.854% 41 0.669% 33.771% 66 0.783% 63.306% 91 0.369% 74.671%
17 0.851% 15.705% 42 0.662% 34.433% 67 0.695% 64.001% 92 0.364% 75.035%
18 0.843% 16.548% 43 0.656% 35.089% 68 0.627% 64.628% 93 0.359% 75.394%
19 0.834% 17.383% 44 0.649% 35.738% 69 0.576% 65.204% 94 0.355% 75.749%
20 0.826% 18.209% 45 0.643% 36.381% 70 0.538% 65.743% 95 0.350% 76.099%
21 0.818% 19.027% 46 0.636% 37.017% 71 0.511% 66.254% 96 0.345% 76.444%
22 0.810% 19.836% 47 0.630% 37.647% 72 0.492% 66.745% 97 0.341% 76.785%
23 0.802% 20.638% 48 0.624% 38.270% 73 0.477% 67.223% 98 0.336% 77.122%
24 0.793% 21.431% 49 0.617% 38.887% 74 0.466% 67.689% 99 0.332% 77.454%
25 0.786% 22.217% 50 0.611% 39.499% 75 0.457% 68.147% 100 0.328% 77.781%

The following chart shows the estimated \(Pr(S_i)\) and \(Pr(W_i)\) for the first \(500\) pulls.

The blue line shows the estimated value of \(Pr(S_i)\), which is assigned to the primary axis (on the left); the orange line shows the estimated value of \(Pr(W_i)\), which is assigned to the secondary axis (on the right).

Here are some quick facts:

  • on the \(56^{th}\) pull, you have the biggest chance to get your target 6★ operator in a single-rate-up standard banner: \(Pr(S_{56})=2.012\%\);
  • if you have prepared for \(29\) pulls, you will have a more than \(25\%\) chance of getting your target 6★ operator;
  • if you have prepared for \(57\) pulls, you will have a more than \(50\%\) chance of getting your target 6★ operator;
  • if you have prepared for \(92\) pulls, this chance increases to more than \(75\%\);
  • you will have a more than \(90\%\) chance to get your operator within \(142\) pulls.

The single-rate-up standard banner is the easiest banner in which to get the featured 6★ operator. Although the conditional probability is lower than that in a limited banner (\(50\%\) compared with \(70\%\)), this kind of banner only has one featured operator, while a limited banner has two. Based on our assumption, the conditional rate shared by each operator featured in a limited banner is \(70\% \div 2 = 35\%\), which is lower than the \(50\% \div 1 = 50\%\) in a single-rate-up banner.

However, in this kind of banner, there are only two turning points and two observed peaks in \(Pr(S_i)\) due to the pity system, which are

  • turning points: \(i=51,\ 104\)
  • local maximum points and values:
    • \(Pr(S_{56})=2.012\%,\ where\ i=56\)
    • \(Pr(S_{114})=0.422\%,\ where\ i=114\)

The reason why there are only two observed peaks might be that, in this type of banner, the odds of success are higher than in the other two types of banners (\(50\%\) compared with \(35\%\) and \(25\%\)), so the odds of failure are lower. Meanwhile, there are two antagonistic factors in the random event \(S_i\). Let's call them \(f(i)\) and \(g(i)\), and let the following hold

\[ Pr(S_i)=f(i) \cdot g(i) \]

According to the definition, the occurrence of \(S_i\) means the occurrence of \(F_{i-1}\) AND the occurrence of the random event \(Get\ the\ Target\ 6★\ Operator\ on\ i^{th}\ Pull\) (again, this is different from \(S_i\): it does not care about the results of the previous pulls). As \(i\) increases, the probability of the former drops, since more pulls must fail. However, the probability of the latter increases, since there is "more chance" for the pity system to come into effect and/or raise the probability to a higher level. So the following relations hold where \(i\) belongs to some range

\[ \begin{aligned} f(i) &= Pr(F_{i-1})\\ g(i) &= Pr(Get\ the\ Target\ 6★\ Operator\ on\ i^{th}\ Pull)\\ \Delta f(i) &= f(i+1) - f(i) < 0\\ \Delta g(i) &= g(i+1) - g(i) > 0\ ,\ where\ i\ belongs\ to\ some\ range \end{aligned} \]

The results show that, in this kind of banner, the rate at which \(f(i)\) drops dominates the rate at which \(g(i)\) increases once \(i\) passes the second peak, so the value of \(Pr(S_i)\) just keeps decreasing and the third peak disappears. However, you can see that the second factor \(g(i)\) did slow down the drop of \(Pr(S_i)\) around \(161 \le i \le 176\).

This may also explain why there is no fourth (or later) observed peak of \(Pr(S_i)\) in the other two kinds of banners, and why the first maximum of \(Pr(S_i)\) does not appear at \(i=51\) in any of the three kinds of banners, even though the pity system has already increased the odds of getting a 6★ operator there.

Note that in another simulation of \(200\) billion pulls for this kind of banner, the estimated values of \(Pr(S_{167})\) and \(Pr(S_{168})\) are very close to each other (the difference is \(0.000244\%\)), which might also just be simulation error, although that is less likely. Without the theoretical formula of \(Pr(S_i)\), it is hard to tell whether the third peak truly exists, so I use the word "observed" when analyzing these simulation results.

Estimated Probability of Single-Rate-Up Limited Banner

This type of limited banner has not appeared in Arknights yet. However, you can refer to the simulation results in ./res/limited_500000000000_single_up_simu_1.res if you are interested. I will not show the values of \(Pr(S_i)\) and \(Pr(W_i)\) here.

You can run this simulation with the following command if you would like to do your own simulation

./simulation_sequential --limited -t <a-big-enough-number> -n 1 -p 50 -c 0

You can also run with

./simulation_sequential -t <a-big-enough-number> -n 1

which is the same.

Comparison Across Different Types of Banners

Let’s draw the \(Pr(S_i)\) and \(Pr(W_i)\) of all these three kinds of banner in the same chart:

The solid lines are \(Pr(S_i)\) and \(Pr(W_i)\) for the single-rate-up standard banner, the dotted lines are for the double-rate-up limited banner, and the dashed lines are for the double-rate-up standard banner. As you can see, \(Pr(S_i)\) in the single-rate-up standard banner increases and decreases the fastest, while \(Pr(S_i)\) in the double-rate-up standard banner changes the slowest.

Finally we summarize the simulation results of different kinds of banners for your quick reference.

  • Double-Rate-Up Limited Banner: local maxima of \(Pr(S_i)\): \(Pr(S_{57})=1.506\%\), \(Pr(S_{114})=0.466\%\), \(Pr(S_{168})=0.204\%\); \(\ge 25\%\) chance of getting your operator: \(41\) pulls; \(\ge 50\%\): \(70\) pulls; \(\ge 75\%\): \(132\) pulls; \(\ge 90\%\): \(212\) pulls
  • Double-Rate-Up Standard Banner: local maxima of \(Pr(S_i)\): \(Pr(S_{57})=1.128\%\), \(Pr(S_{114})=0.439\%\), \(Pr(S_{169})=0.241\%\); \(\ge 25\%\) chance: \(54\) pulls; \(\ge 50\%\): \(102\) pulls; \(\ge 75\%\): \(187\) pulls; \(\ge 90\%\): \(303\) pulls
  • Single-Rate-Up Standard Banner: local maxima of \(Pr(S_i)\): \(Pr(S_{56})=2.012\%\), \(Pr(S_{114})=0.422\%\) (third not observed); \(\ge 25\%\) chance: \(29\) pulls; \(\ge 50\%\): \(57\) pulls; \(\ge 75\%\): \(92\) pulls; \(\ge 90\%\): \(142\) pulls

Hope these results will help you in your next banner. Good Luck!

Error Analysis

There may be better metrics for analyzing the errors. I did not major in statistics, so if you know a more scientific way to measure the simulation error, I would really appreciate it if you could let me know.

The theoretical value of \(Pr(S_1)\) for these three kinds of banners can be easily obtained, so I will compare the estimated value of \(Pr(S_1)\) against its theoretical value to get the absolute error \(\Delta Pr(S_1)\). I then assume that the quality of the Mersenne Twister engine mt19937_64 is high enough that the absolute errors of all other \(Pr(S_i)\) have roughly the same magnitude as that of \(Pr(S_1)\). Since the true value of \(Pr(S_i)\) for \(i > 50\) is hard to obtain, we compare each estimated value with the absolute value of the absolute error, \(|\Delta Pr(S_1)|\), to get a sense of the accuracy of the estimated \(Pr(S_i)\).

We say that if

\[ \frac{|\Delta Pr(S_1)|}{Pr(S_i)} \le 5\% \]

we consider this estimated value "accurate". \(Pr(W_i)\) has the same "accurate" range, since it is obtained from the estimated \(Pr(S_i)\).

Note that since the true values of most of \(Pr(S_i)\) are not known, the relative errors cannot be determined.
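To make this criterion concrete, below is a small helper sketch; the vector of estimated \(Pr(S_i)\) values is assumed to have been loaded from the res files already (that parsing is not shown), and the function name is illustrative:

#include <cstddef>
#include <vector>

// Returns the largest 1-based index i such that every estimated Pr(S_1)..Pr(S_i)
// is at least |delta Pr(S_1)| / 0.05, i.e., satisfies the 5% criterion above.
std::size_t accurate_range(const std::vector<double>& pr_s_estimates, double abs_error) {
    const double threshold = abs_error / 0.05;
    std::size_t last = 0;
    for (std::size_t i = 0; i < pr_s_estimates.size(); ++i) {
        if (pr_s_estimates[i] < threshold) break;
        last = i + 1;
    }
    return last;
}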

Theoretically, in a double-rate-up limited banner,

\[ \begin{aligned} Pr(S_1) &= 2\% \times 70\% \div 2\\ &= 0.7\% \end{aligned} \]

The estimated value of \(Pr(S_1)\) is \(0.700026\%\) (from res/limited_500000000000_double_up_simu_1.res). So the absolute error and its absolute value is

\[ \begin{aligned} \Delta Pr(S_1) &= 0.700026\% - 0.7\% \\ &= 0.000026\% \\ &= 2.6 \times 10^{-7} \\ | \Delta Pr(S_1) | &= 2.6 \times 10^{-7} \end{aligned} \]

The threshold of an accurate \(Pr(S_i)\) is

\[ \begin{aligned} |\Delta Pr(S_1)| \div 5\% &= 5.2 \times 10^{-6} \\ &= 0.00052\% \end{aligned} \]

\(Pr(S_i)\) needs to be equal to or greater than this threshold. After checking the data obtained from the simulation, all the estimated values of \(Pr(S_i)\) and \(Pr(W_i)\) with \(1 \le i \le 677\) are accurate.

Similarly, according to the simulation results of the double-rate-up standard banner (from res/standard_500000000000_double_up_simu_1.res), the estimated values of \(Pr(S_i)\) and \(Pr(W_i)\) with \(1 \le i \le 780\) are accurate.

Finally, according to the simulation results of the single-rate-up standard banner (from res/standard_500000000000_single_up_simu_1.res), the estimated values of \(Pr(S_i)\) and \(Pr(W_i)\) with \(1 \le i \le 394\) are accurate.

If we loosen the condition from \(5\%\) to \(10\%\), the accurate range for the single-rate-up standard banner becomes \(1 \le i \le 433\), so I still include the data for \(1 \le i \le 500\). If we want higher accuracy, we need to run the simulation multiple times at a larger scale.

Note that the method of analyzing the error here may not be scientific enough to show how accurate the results truly are. It is also possible that the estimated values beyond this range are accurate enough, too.

You may find that the estimated \(Pr(S_i)\) starts to fluctuate when \(i\) is relatively big (especially when \(i\) is close to \(1000\)). Without the theoretical (true) values of \(Pr(S_i)\), we cannot definitively say that those estimated values are inaccurate, although they are less likely to be accurate.

Future Work

Due to limited time, I did not carry out a much larger simulation. Currently, the results presented in this article are based on \(500,000,000,000\) simulated pulls, which takes about \(95\) minutes to finish on a single thread. I may execute a larger-scale parallel simulation and average the results to achieve higher accuracy.

Besides, we could also try a different RNG. Arknights is made with the Unity engine, so we could try the RNG that Unity adopts, or try a different seeding method (for example, seeding based on time and/or the player's UID).

What's more, I have not yet tried the same simulation configuration with different -p values. With different pity system starting points, we could see how the shape of the \(Pr(S_i)\) curve changes. This would also be an interesting question to look at: why did the designers of Arknights let the pity system start after the \(50^{th}\) pull, rather than after the \(20^{th}\) or \(30^{th}\) pull?

Finally, we could add a way to automatically generate the line charts and find the turning points as well as the maximum values. I may implement this functionality in Python in the future.

Acknowledgements