Abstract

Theoretical probabilities are important to predict your chances in getting 6★ Operators in Arknights. While the first 50 consecutive non-6★ pulls follow the conventional geometric pdf which is \(P(X=x)=p(1-p)^{x-1}\) with fixed probability \(p\), but it’s impossible in the case where the pity system starts and the \(p\) varies each pull.

A similar problem was found in StackExchange where it asks about the specific distribution similar to geometric pdf but with varying but deterministic probabilities. It is found out that the theoretical probabilities of pulling a 6★ Operator after consecutive non-6★ Operators follows a more general version of geometric distribution in which accounts for the varying probabilities.

The probability of getting a 6★ Operator is much sooner than you would expect. Upon calculation from the general version of geometric pdf, results show that there is a 76% chance that you will pull a 6★ Operator on your 55th pull or earlier in your streak, 95% on your 62nd pull or earlier, and 99% on your 67th pull or earlier. The probabilities shown accounts the near impossibility of making into pull 99 because the probability of ending the current streak (or getting a non-6★ Operator) decreases exponentially as you approach the 99th pull.

To validate, we tested each simulated data and determine whether they follow the theoretical probabilities. For this analysis, there are 6 sets of simulated data where their differences comes down to their RNGs and seeding methods. Results indicate that the all six sets of simulated data follow the calculated theoretical probabilities.

The computed theoretical probabilities can be used to accurately predict your chances of getting a 6★ Operator. This would highly benefit users to plan ahead on their rolls and their desired outcomes. Further research on other gacha rating system with varying rates could be possible using the general formula for geometric pdf.


Background

In our previous analysis, we explored the distribution of Non-6★ Headhunting Streak using simulated data. The two simulated data differs from its RNG that decides randomness. These are Mersenne Twister RNG and Subtractive PRNG. In R, the default RNG used is the former one through set.seed() while the latter was created in a custom function.

However, simulated data are not good approximations on the theoretical probabilities. We did a research on how to get the geometric pdf but it only revolves around the usual formula: \(P(X=x)=p(1-p)^{x-1}\). SurfChu85 stated that it would be possible for the first 50 pulls, but the calculations would be incorrect once the pity system kicks in.

A similar question was raised on StackExchange where it finds the appropriate distribution to predict varying \(p\) given \(x\). As a response, username Did replied:

If the probability of success at trial \(n\) is \(p_n\) and the sequence \((p_n)_{n\geq 1}\) is deterministic, then the number \(X\) of trials needed to get one success is such that, for every \(n\geq 1\), \[P(X=n)=p_n\prod_{k=1}^{n-1}(1-p_k)\]

It is noticeable that the probability of increasing your non-6★ pull streak decreases as much as the increase of probability of getting a 6★ operator. This means that it would be near impossible to get into pull 99 because your probability of advancing to the \(n^{th}\) pull decreases as you approach the \(99^{th}\) pull. These probabilities tend to decrease exponentially which results to a very low chance. For example:

A rigged Headhunting system has a base 6★ Operator rate of 92% and increases by 2% for the next pull given that you did not obtained a 6★ Operator from the previous ones. This means that you have a 92% chance on your first pull, 94% on the second pull, until it reaches the fifth pull where you will be guaranteed a 6★ Operator.

However, the probability of failing 4 times in a row in this system is:

\[(0.08)(0.06)(0.04)(0.02)=\frac{384}{100000000}\]

which is a very low chance. Therefore, you will acquire the operator in about 1 to 3 pulls, but almost never in the 5th pull (after 4 consecutive fails). This concept applies on the Arknights’ pity system.

The formula below is general version of the geometric pdf but derived according to Arknights Headhunting system and its conditions. For simplification and clarity, \(x\) would be the \(x^{th}\) pull where you obtained your 6★ Operator, \(p_x\) is the current rate at the \(x^{th}\) pull, and \(P(X=x)\) as the probability of getting a 6★ Operator considering the exponential decrease of probabilities as you approach the \(99^{th}\) pull.

\[P(X=x)=p_{x}\prod_{n=1}^{x-1}(1-p_n)\\p_x=max(0.02,0.02(x-49))\\ 0\leq p_x\leq 1\\ 1\leq x\leq 99\\ x\ \epsilon\ \mathbb{N}\]

In most gacha games, \(p_x\) is constant and will produce the same result as the conventional formula. In the case of Arknights Headhunting, they have a unique \(p_x\) because of the pity system and it would only work on the formula above.


Methodology

In this part, we would determine the theoretical probabilities in Arknights’ Gacha System and test if there are differences with the simulated data. While most of the details on the simulation can be found in our previous EDA and in Github, I won’t be discussing every part of it, but I would give a quick run through the important details.

These are the corresponding dependencies and system details that would be used in this analysis. For plotting, the library ggplot2 shall be used, while the functions to be used are stored in arkfunc.R. The R file, function details, and codebook can be obtained in the Github repository.

library(ggplot2)
source("arkfunc.R")
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows >= 8 x64 (build 9200)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## system code page: 932
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.3.2
## 
## loaded via a namespace (and not attached):
##  [1] knitr_1.29       magrittr_1.5     tidyselect_1.1.0 munsell_0.5.0    colorspace_1.4-1
##  [6] R6_2.4.1         rlang_0.4.7      stringr_1.4.0    dplyr_1.0.2      tools_4.0.2     
## [11] grid_4.0.2       gtable_0.3.0     xfun_0.16        withr_2.2.0      htmltools_0.5.0 
## [16] ellipsis_0.3.1   yaml_2.2.1       digest_0.6.25    tibble_3.0.3     lifecycle_0.2.0 
## [21] crayon_1.3.4     purrr_0.3.4      vctrs_0.3.4.9000 glue_1.4.2       evaluate_0.14   
## [26] rmarkdown_2.3    stringi_1.4.6    compiler_4.0.2   pillar_1.4.6     generics_0.0.2  
## [31] scales_1.1.1     pkgconfig_2.0.3

Here is the given scenario to be simulated.

There are 10000 accounts that contain a record of their pull streaks from 10 of their 6★ Operators. The streaks follow the Arknights 6★ Headhunting rate (including the Pity System). Overall, there are 100000 recorded pull streaks.

In addition, the simulated data was done six times with different RNGs and methods of seeding. As done in the previous analysis, there are 2 types of RNG used which is Mersenne-Twister RNG, the default RNG for R, and Subtractive PRNG that is used in the development of Arknights as mentioned by Ayane Satomi.

Unlike the previous simulation, there would be an additional classification. Each user (iteration) has its own seed to be used for their corresponding RNGs as recommended by Ayane Satomi in his EDA. To determine the seed used by the user, it’s either of the following:

  1. Based from master.seed (Fixed Seeding)
  2. Based from a number generated by the RNG (RNG Seeding)
  3. Based from UNIX time (Time Seeding)

This means that there are six sets of simulated data where each has a unique type based from the said two attributes. The code below assigns these six simulations to their corresponding variables exp1 to exp6. For function details, visit the Github repository.

# Mersenne-Twister RNG
exp1 <- mtrng_sim(10, 10000, 362867)            # Fixed Seeding
exp2 <- mtrng_sim(10, 10000, 362867, "rng")     # RNG Seeding
exp3 <- mtrng_sim(10, 10000, seedtype = "time") # Time Seeding

# Subtractive PRNG
exp4 <- sprng_sim(10, 10000, 362867)            # Fixed Seeding
exp5 <- sprng_sim(10, 10000, 362867, "rng")     # RNG Seeding
exp6 <- sprng_sim(10, 10000, seedtype = "time") # Time Seeding

These results would be compared to the theoretical frequencies and probabilities that is from arknights_dist() function. The theoretical data for 100000 records would be stored in the theo_df variable.

theo_df <- arknights_dist(100000)

To test the simulated data if they follow the theoretical probabilities, \(\chi^2\) Goodness-Of-Fit test shall be. Given the frequencies from each simulated data, we would test if these follow the theoretical probability distribution.


Results and Discussion

Here are the results of the simulation and some details on the theoretical distribution. Further details on the codes and their values/columns can be found in Github.

Tables

Below is the complete table of theoretical probabilities, frequencies, and others. For more details on the descriptions, visit the Github repository.

print(theo_df)
##    pulls    type rate freq c.freq         prob     c.prob         odds    fail.odds
## 1      1 nonpity 0.02 2000   2000 2.000000e-02 0.02000000 2.040816e-02 4.900000e+01
## 2      2 nonpity 0.02 1960   3960 1.960000e-02 0.03960000 1.999184e-02 5.002041e+01
## 3      3 nonpity 0.02 1921   5881 1.920800e-02 0.05880800 1.958417e-02 5.106164e+01
## 4      4 nonpity 0.02 1882   7763 1.882384e-02 0.07763184 1.918497e-02 5.212412e+01
## 5      5 nonpity 0.02 1845   9608 1.844736e-02 0.09607920 1.879406e-02 5.320829e+01
## 6      6 nonpity 0.02 1808  11416 1.807842e-02 0.11415762 1.841126e-02 5.431458e+01
## 7      7 nonpity 0.02 1772  13188 1.771685e-02 0.13187447 1.803640e-02 5.544345e+01
## 8      8 nonpity 0.02 1736  14924 1.736251e-02 0.14923698 1.766929e-02 5.659536e+01
## 9      9 nonpity 0.02 1702  16626 1.701526e-02 0.16625224 1.730979e-02 5.777077e+01
## 10    10 nonpity 0.02 1667  18293 1.667496e-02 0.18292719 1.695772e-02 5.897018e+01
## 11    11 nonpity 0.02 1634  19927 1.634146e-02 0.19926865 1.661294e-02 6.019406e+01
## 12    12 nonpity 0.02 1601  21528 1.601463e-02 0.21528328 1.627527e-02 6.144292e+01
## 13    13 nonpity 0.02 1569  23097 1.569433e-02 0.23097761 1.594457e-02 6.271726e+01
## 14    14 nonpity 0.02 1538  24635 1.538045e-02 0.24635806 1.562070e-02 6.401761e+01
## 15    15 nonpity 0.02 1507  26142 1.507284e-02 0.26143090 1.530351e-02 6.534450e+01
## 16    16 nonpity 0.02 1477  27619 1.477138e-02 0.27620228 1.499285e-02 6.669847e+01
## 17    17 nonpity 0.02 1448  29067 1.447595e-02 0.29067823 1.468859e-02 6.808007e+01
## 18    18 nonpity 0.02 1419  30486 1.418644e-02 0.30486467 1.439059e-02 6.948987e+01
## 19    19 nonpity 0.02 1390  31876 1.390271e-02 0.31876738 1.409872e-02 7.092844e+01
## 20    20 nonpity 0.02 1362  33238 1.362465e-02 0.33239203 1.381285e-02 7.239637e+01
## 21    21 nonpity 0.02 1335  34573 1.335216e-02 0.34574419 1.353285e-02 7.389425e+01
## 22    22 nonpity 0.02 1309  35882 1.308512e-02 0.35882930 1.325861e-02 7.542271e+01
## 23    23 nonpity 0.02 1282  37164 1.282341e-02 0.37165272 1.298999e-02 7.698235e+01
## 24    24 nonpity 0.02 1257  38421 1.256695e-02 0.38421966 1.272688e-02 7.857383e+01
## 25    25 nonpity 0.02 1232  39653 1.231561e-02 0.39653527 1.246917e-02 8.019779e+01
## 26    26 nonpity 0.02 1207  40860 1.206929e-02 0.40860456 1.221674e-02 8.185488e+01
## 27    27 nonpity 0.02 1183  42043 1.182791e-02 0.42043247 1.196948e-02 8.354580e+01
## 28    28 nonpity 0.02 1159  43202 1.159135e-02 0.43202382 1.172729e-02 8.527122e+01
## 29    29 nonpity 0.02 1136  44338 1.135952e-02 0.44338335 1.149004e-02 8.703186e+01
## 30    30 nonpity 0.02 1113  45451 1.113233e-02 0.45451568 1.125766e-02 8.882843e+01
## 31    31 nonpity 0.02 1091  46542 1.090969e-02 0.46542537 1.103002e-02 9.066166e+01
## 32    32 nonpity 0.02 1069  47611 1.069149e-02 0.47611686 1.080704e-02 9.253231e+01
## 33    33 nonpity 0.02 1048  48659 1.047766e-02 0.48659452 1.058861e-02 9.444113e+01
## 34    34 nonpity 0.02 1027  49686 1.026811e-02 0.49686263 1.037464e-02 9.638891e+01
## 35    35 nonpity 0.02 1006  50692 1.006275e-02 0.50692538 1.016504e-02 9.837644e+01
## 36    36 nonpity 0.02  986  51678 9.861492e-03 0.51678687 9.959710e-03 1.004045e+02
## 37    37 nonpity 0.02  966  52644 9.664263e-03 0.52645113 9.758572e-03 1.024740e+02
## 38    38 nonpity 0.02  947  53591 9.470977e-03 0.53592211 9.561534e-03 1.045857e+02
## 39    39 nonpity 0.02  928  54519 9.281558e-03 0.54520367 9.368512e-03 1.067405e+02
## 40    40 nonpity 0.02  910  55429 9.095927e-03 0.55429960 9.179422e-03 1.089393e+02
## 41    41 nonpity 0.02  891  56320 8.914008e-03 0.56321360 8.994182e-03 1.111830e+02
## 42    42 nonpity 0.02  874  57194 8.735728e-03 0.57194933 8.812713e-03 1.134724e+02
## 43    43 nonpity 0.02  856  58050 8.561013e-03 0.58051035 8.634937e-03 1.158086e+02
## 44    44 nonpity 0.02  839  58889 8.389793e-03 0.58890014 8.460777e-03 1.181925e+02
## 45    45 nonpity 0.02  822  59711 8.221997e-03 0.59712214 8.290159e-03 1.206249e+02
## 46    46 nonpity 0.02  806  60517 8.057557e-03 0.60517969 8.123009e-03 1.231071e+02
## 47    47 nonpity 0.02  790  61307 7.896406e-03 0.61307610 7.959256e-03 1.256399e+02
## 48    48 nonpity 0.02  774  62081 7.738478e-03 0.62081458 7.798829e-03 1.282244e+02
## 49    49 nonpity 0.02  758  62839 7.583708e-03 0.62839829 7.641661e-03 1.308616e+02
## 50    50 nonpity 0.02  743  63582 7.432034e-03 0.63583032 7.487683e-03 1.335527e+02
## 51    51    pity 0.04 1457  65039 1.456679e-02 0.65039711 1.478212e-02 6.764932e+01
## 52    52    pity 0.06 2098  67137 2.097617e-02 0.67137328 2.142560e-02 4.667314e+01
## 53    53    pity 0.08 2629  69766 2.629014e-02 0.69766342 2.699997e-02 3.703708e+01
## 54    54    pity 0.10 3023  72789 3.023366e-02 0.72789708 3.117623e-02 3.207572e+01
## 55    55    pity 0.12 3265  76054 3.265235e-02 0.76054943 3.375452e-02 2.962567e+01
## 56    56    pity 0.14 3352  79406 3.352308e-02 0.79407251 3.468586e-02 2.883019e+01
## 57    57    pity 0.16 3295  82701 3.294840e-02 0.82702091 3.407098e-02 2.935049e+01
## 58    58    pity 0.18 3114  85815 3.113624e-02 0.85815714 3.213686e-02 3.111692e+01
## 59    59    pity 0.20 2837  88652 2.836857e-02 0.88652571 2.919684e-02 3.425028e+01
## 60    60    pity 0.22 2496  91148 2.496434e-02 0.91149006 2.560352e-02 3.905713e+01
## 61    61    pity 0.24 2124  93272 2.124239e-02 0.93273244 2.170342e-02 4.607569e+01
## 62    62    pity 0.26 1749  95021 1.748956e-02 0.95022201 1.780089e-02 5.617695e+01
## 63    63    pity 0.28 1394  96415 1.393784e-02 0.96415985 1.413485e-02 7.074714e+01
## 64    64    pity 0.30 1075  97490 1.075205e-02 0.97491189 1.086891e-02 9.200555e+01
## 65    65    pity 0.32  803  98293 8.028195e-03 0.98294009 8.093168e-03 1.235610e+02
## 66    66    pity 0.34  580  98873 5.800371e-03 0.98874046 5.834211e-03 1.714028e+02
## 67    67    pity 0.36  405  99278 4.053435e-03 0.99279389 4.069933e-03 2.457043e+02
## 68    68    pity 0.38  274  99552 2.738321e-03 0.99553221 2.745840e-03 3.641873e+02
## 69    69    pity 0.40  179  99731 1.787115e-03 0.99731933 1.790314e-03 5.585612e+02
## 70    70    pity 0.42  113  99844 1.125882e-03 0.99844521 1.127151e-03 8.871924e+02
## 71    71    pity 0.44   68  99912 6.841075e-04 0.99912932 6.845758e-04 1.460759e+03
## 72    72    pity 0.46   40  99952 4.005138e-04 0.99952983 4.006743e-04 2.495793e+03
## 73    73    pity 0.48   23  99975 2.256808e-04 0.99975551 2.257318e-04 4.430036e+03
## 74    74    pity 0.50   12  99987 1.222438e-04 0.99987776 1.222587e-04 8.179375e+03
## 75    75    pity 0.52    6  99993 6.356677e-05 0.99994132 6.357081e-05 1.573049e+04
## 76    76    pity 0.54    3  99996 3.168559e-05 0.99997301 3.168659e-05 3.155909e+04
## 77    77    pity 0.56    2  99998 1.511520e-05 0.99998812 1.511543e-05 6.615757e+04
## 78    78    pity 0.58    1  99999 6.888213e-06 0.99999501 6.888260e-06 1.451745e+05
## 79    79    pity 0.60    0  99999 2.992810e-06 0.99999800 2.992819e-06 3.341332e+05
## 80    80    pity 0.62    0  99999 1.237028e-06 0.99999924 1.237029e-06 8.083882e+05
## 81    81    pity 0.64    0  99999 4.852342e-07 0.99999973 4.852344e-07 2.060860e+06
## 82    82    pity 0.66    0  99999 1.801432e-07 0.99999991 1.801432e-07 5.551139e+06
## 83    83    pity 0.68    0  99999 6.310471e-08 0.99999997 6.310471e-08 1.584668e+07
## 84    84    pity 0.70    0  99999 2.078743e-08 0.99999999 2.078743e-08 4.810599e+07
## 85    85    pity 0.72    0  99999 6.414408e-09 1.00000000 6.414408e-09 1.558990e+08
## 86    86    pity 0.74    0  99999 1.845924e-09 1.00000000 1.845924e-09 5.417341e+08
## 87    87    pity 0.76    0  99999 4.929116e-10 1.00000000 4.929116e-10 2.028761e+09
## 88    88    pity 0.78    0  99999 1.214119e-10 1.00000000 1.214119e-10 8.236424e+09
## 89    89    pity 0.80    0  99999 2.739551e-11 1.00000000 2.739551e-11 3.650233e+10
## 90    90    pity 0.82    0  99999 5.616079e-12 1.00000000 5.616079e-12 1.780602e+11
## 91    91    pity 0.84    0  99999 1.035550e-12 1.00000000 1.035550e-12 9.656702e+11
## 92    92    pity 0.86    0  99999 1.696330e-13 1.00000000 1.696330e-13 5.895080e+12
## 93    93    pity 0.88    0  99999 2.430091e-14 1.00000000 2.430091e-14 4.115072e+13
## 94    94    pity 0.90    0  99999 2.982385e-15 1.00000000 2.982385e-15 3.353022e+14
## 95    95    pity 0.92    0  99999 3.048660e-16 1.00000000 3.048660e-16 3.280130e+15
## 96    96    pity 0.94    0  99999 2.491948e-17 1.00000000 2.491948e-17 4.012925e+16
## 97    97    pity 0.96    0  99999 1.526981e-18 1.00000000 1.526981e-18 6.548870e+17
## 98    98    pity 0.98    0  99999 6.235172e-20 1.00000000 6.235172e-20 1.603805e+19
## 99    99    pity 1.00    0  99999 1.272484e-21 1.00000000 1.272484e-21 7.858644e+20

Note: There could be a precision error in the cumulative frequencies but it should sum up to 100000 theoretically.

\[\sum_{n=1}^{99}(100000)(p_n)=(100000)\sum_{n=1}^{99}(p_n)\]

It is important to highlight on the c.prob column because this would be the interesting part. Those numbers show the probability that you will obtain a 6★ Operator in the corresponding pull or earlier. This means that there is a 99% chance that you will obtain one on the 67th pull or earlier, which is quite intriguing but as previously mentioned, the probability of increasing your streak decreases exponentially and in this case, it starts to end at pull 67.

Another important thing to interpret is the fail.odds column. This is the odds of obtaining a 6★ Operator. For example: in pull 1, the fail.odds shown is 49. This means that the odds of obtaining one is 1:49. It seems plausible because for each 49 failures (obtaining a non-6★ Operator), there would be a 1 success (obtaining a 6★ Operator). Another example would be pull 56 where the value at fail.odds is 28.8301944. This means that the odds would be higher because for each 29 failures, there would be 1 success. In fact, pull 56 has the highest probability and odds, so this pull would be your best shot in obtaining a 6★ Operator.

All of these probabilities considered the near impossibility of failing 98 times in a row in Arknights Headhunting.

On the other hand, exp variables is a list that contains the following:

  • A matrix of the results

  • Tally of their pulls

  • Vector of seeds for each iteration (account)

head(exp2$results[1:5,1:5])
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   54    9   61   24   39
## [2,]   42    3    8   52    6
## [3,]    1   57   53   66   69
## [4,]   47    2   57   36   55
## [5,]   53    6   58   19    1
head(exp2$tally, n = 10)
##    pulls freq c.freq    prob  c.prob   rng seedtype
## 1      1 1935   1935 0.01935 0.01935 mtrng      rng
## 2      2 1984   3919 0.01984 0.03919 mtrng      rng
## 3      3 1946   5865 0.01946 0.05865 mtrng      rng
## 4      4 1856   7721 0.01856 0.07721 mtrng      rng
## 5      5 1818   9539 0.01818 0.09539 mtrng      rng
## 6      6 1727  11266 0.01727 0.11266 mtrng      rng
## 7      7 1808  13074 0.01808 0.13074 mtrng      rng
## 8      8 1706  14780 0.01706 0.14780 mtrng      rng
## 9      9 1689  16469 0.01689 0.16469 mtrng      rng
## 10    10 1564  18033 0.01564 0.18033 mtrng      rng
head(exp2$seed_list, n = 10)
##    iteration      seed
## 1          1 977110477
## 2          2 311197344
## 3          3  95312712
## 4          4 731484412
## 5          5 691967860
## 6          6 251261500
## 7          7 828085199
## 8          8 108629885
## 9          9 276776077
## 10        10 567947743

For plotting, we have to extract the tallies of each experiment and row bind them to the variable df_tally. In this case, the plotting would be easily done and there would be a quick comparison among them.

df_tally <- rbind(exp1[["tally"]], exp2[["tally"]]
                  ,exp3[["tally"]], exp4[["tally"]]
                  ,exp5[["tally"]], exp6[["tally"]])

Plots

The following set of codes plot the frequency and the probability distribution, with comparison through theoretical data. Although the distribution looks similar, but the values are completely different thus have different interpretation.

freq.plot <- ggplot(data = df_tally, mapping = aes(x = pulls, y = freq)) +
    geom_bar(stat = "identity", width = 0.8) +
    geom_point(data = theo_df
               ,mapping = aes(x = pulls, y = freq, color = type)
               ,shape = 45, size = 5) +
    facet_grid(seedtype ~ rng) +
    xlab("Number of Pulls") +
    ylab("Frequency") +
    ggtitle("Simulated Geometric Distribution of Arknights 6-star Headhunting"
            ,subtitle = "With Theoretical Frequency Points
            n = 10, i = 10000")

print(freq.plot)

c.freq.plot <- ggplot(data = df_tally, mapping = aes(x = pulls, y = c.freq)) +
    geom_bar(stat = "identity", width = 0.8) +
    geom_point(data = theo_df
               ,mapping = aes(x = pulls, y = c.freq, color = type)
               ,shape = 45, size = 5) +
    facet_grid(seedtype ~ rng) +
    xlab("Number of Pulls") +
    ylab("Cumulative Frequency") +
    ggtitle("Simulated Geometric Distribution of Arknights 6-star Headhunting"
            ,subtitle = "Cumulative\nWith Theoretical Frequency Points
            n = 10, i = 10000")

print(c.freq.plot)

prob.plot <- ggplot(data = df_tally, mapping = aes(x = pulls, y = prob)) +
    geom_bar(stat = "identity", width = 0.8) +
    geom_point(data = theo_df
               ,mapping = aes(x = pulls, y = prob, color = type)
               ,shape = 45, size = 5) +
    facet_grid(seedtype ~ rng) +
    xlab("Number of Pulls") +
    ylab("Probability") +
    ggtitle("Simulated Probability Distribution of Arknights 6-star Headhunting"
            ,subtitle = "With Theoretical Probability Points
            n = 10, i = 10000")

print(prob.plot)

c.prob.plot <- ggplot(data = df_tally, mapping = aes(x = pulls, y = c.prob)) +
    geom_bar(stat = "identity", width = 0.8) +
    geom_point(data = theo_df
               ,mapping = aes(x = pulls, y = c.prob, color = type)
               ,shape = 45, size = 5) +
    facet_grid(seedtype ~ rng) +
    xlab("Number of Pulls") +
    ylab("Cumulative Probability") +
    ggtitle("Simulated Probability Distribution of Arknights 6-star Headhunting"
            ,subtitle = "Cumulative\nWith Theoretical Probability Points
            n = 10, i = 10000")

print(c.prob.plot)

Through observation, it seems there is no significant differences between the simulated data and the theoretical frequencies and probabilities. Although in exp6 has some noticeable changes on the frequencies and probabilities (in the pity system), it might have little impact to the results.

Statistical Test

In order to be sure from the given observation, we would use \(\chi^2\) goodness-of-fit test to determine if. Here are the null and alternative hypothesis that is applicable through all 6 chisq.test results.

\(H_0\): The simulated frequency tally follows the theoretical probabilities and differences between them are due to chance.

\(H_A\): The simulated frequency tally does not follow the theoretical probabilities and differences between them are not due to chance.

Reject \(H_0\) when \(p \leq 0.05\)

suppressWarnings(chisq.test(exp1[["tally"]]$freq, p = theo_df$prob))
## 
##  Chi-squared test for given probabilities
## 
## data:  exp1[["tally"]]$freq
## X-squared = 76.435, df = 98, p-value = 0.9476
suppressWarnings(chisq.test(exp2[["tally"]]$freq, p = theo_df$prob))
## 
##  Chi-squared test for given probabilities
## 
## data:  exp2[["tally"]]$freq
## X-squared = 120.86, df = 98, p-value = 0.05847
suppressWarnings(chisq.test(exp3[["tally"]]$freq, p = theo_df$prob))
## 
##  Chi-squared test for given probabilities
## 
## data:  exp3[["tally"]]$freq
## X-squared = 81.659, df = 98, p-value = 0.8832
suppressWarnings(chisq.test(exp4[["tally"]]$freq, p = theo_df$prob))
## 
##  Chi-squared test for given probabilities
## 
## data:  exp4[["tally"]]$freq
## X-squared = 87.511, df = 98, p-value = 0.7672
suppressWarnings(chisq.test(exp5[["tally"]]$freq, p = theo_df$prob))
## 
##  Chi-squared test for given probabilities
## 
## data:  exp5[["tally"]]$freq
## X-squared = 105.05, df = 98, p-value = 0.2947
suppressWarnings(chisq.test(exp6[["tally"]]$freq, p = theo_df$prob))
## 
##  Chi-squared test for given probabilities
## 
## data:  exp6[["tally"]]$freq
## X-squared = 103, df = 98, p-value = 0.345

With the given p-values, we fail to reject \(H_0\) in all tests. All six simulated data follows the theoretical probabilities for Arknights 6★ Headhunting.


Conclusions and Recommendations

Theoretical probabilities are important to determine the expected value within a probability distribution. With the implementation of the general geometric distribution formula, we would be able to predict the probabilities in Arknights 6★ Headhunting despite being slightly complex.

With the probabilities that follows the geometric pdf, people can expect to obtain their desired Operators at specific number of pulls.

The computed theoretical probabilities can be used to accurately predict your chances of getting a 6★ Operator. This would highly benefit users to plan ahead on their rolls and their desired outcomes. Further research on other gacha rating system with varying rates could be possible using the general formula for geometric pdf.


Special Thanks

I would like to thank SurfChu85 because he’s the first person to solve the distribution problem through Excel. Few minutes after, I made the R version of the solution that is patterned from his work. Also, we have Ayane Satomi and Eyenine for the EDA and seed iteration template in Python.