Are numbers Random?

Marcos Siqueira Campos & Sharon Morris
December 3, 2017

IS609 Final project

Introduction

A random number is a number that is drawn from a set of values where each value is equally probable.

Properties of a random number generators are:

  • Generated random numbers appear to be distributed uniformly on (0, 1).
  • Statistically independent of others.
  • Long cycle length.

Project Objective

  • Apply modern techniques to detect whether a number sequence appears as random or not, or whether it satisfies or does not satisfy the central limit theorem (CLT)

  • Hypothesis uniformity
    \( H_0:R_1\space = U[0,1] \)
    \( H_a:R_1\space \ne U[0,1] \)

  • Hypothesis independence
    \( H_0:R_1\space = independently \)
    \( H_a:R_1\space \ne independently \)

Approach

Three random generators test were applied to determine uniformity and independence:

  • Midsquare method
    • Start with a 4 digit number \( x_0 \) (seed) Square it to obtain 8 digits (if needed, append zeros to the left) Take the middle 4 digits to obtain the next 4 digit number \( x_1 \); then square \( x_1 \) and take the middle 4 digits again \( x_n \).
  • Linear congruential generator
    • Produce a sequence of integers between \( 0 \) and \( m āˆ’ 1 \) according to \( z_n = (az_nāˆ’1 + c) \space mod\space m, \space n = 1, 2, . . . \) a is the multiplier, c the increment and \( m \) the modulus. To obtain uniform random numbers on (0, 1) we takeun = zn/m
  • R base generator “Mersenne-Twister” From Matsumoto and Nishimura (1998). A twisted GFSR with period 219937 - 1 and equidistribution in 623 consecutive dimensions (over the whole period). The 'seed' is a 624-dimensional set of 32-bit integers plus a current position in that set.

Descriptive statistics

  • R base, linear congruence and middle-square
    • Summary statistics
    • Verify the maximum, mean and minimum value of n consecutive random numbers and the standard deviation of n consecutive values.
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000653 0.2529000 0.4946000 0.4975000 0.7434000 0.9999000 
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0001446 0.2480000 0.4987000 0.4983000 0.7494000 0.9972000 
[1] 0.2866937
[1] 0.2888357

Descriptive statistics - cont'd

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.01405 0.21000 0.41000 0.50950 0.63180 0.97900 
[1] 0.2240683

Summary statistics - visualization

plot of chunk unnamed-chunk-4plot of chunk unnamed-chunk-4plot of chunk unnamed-chunk-4

Exploratory Data Analysis

  • R base, linear congruence and midle-square Histogram
    • Graphical methods detect outliers and anomalies and check underlying assumptions
    • The histogram below shows that the numbers are uniformly distributed between 0 and 1

plot of chunk unnamed-chunk-5

Analysis - Independence

  • Chi-square test compares the distribution of the set of numbers generated against the uniform distribution
    • We accept the \( \space H_0 \) for R base and linear congruence
$test.statistic
[1] 11.844

$p.value
[1] 0.2222474

$df
[1] 9
$test.statistic
[1] 1.822

$p.value
[1] 0.9939798

$df
[1] 9

Analysis - Independence

  • R base and linear congruence Auto correlation

plot of chunk unnamed-chunk-7plot of chunk unnamed-chunk-7

Analysis - Cumulative sequence - tosses a coin

  • R base and linear congruence tosses a coin
  • Test a cumulative sequence where the consecutive values are classified above and below the value 0.5

plot of chunk unnamed-chunk-8plot of chunk unnamed-chunk-8

Analysis - Gap test

  • R base coin toss

             Gap test

chisq stat = 16, df = 13, p-value = 0.23

         (sample size : 10000)

length  observed freq       theoretical freq
1            1282            1250 
2            629             625 
3            308             312 
4            185             156 
5            69              78 
6            25              39 
7            25              20 
8            8           9.8 
9            5           4.9 
10           4           2.4 
11           2           1.2 
12           1           0.61 
13           0           0.31 
14           0           0.15 

Analysis - Gap test

  • Linear congruence tosses a coin

             Gap test

chisq stat = 56, df = 13, p-value = 3e-07

         (sample size : 10000)

length  observed freq       theoretical freq
1            1165            1250 
2            676             625 
3            320             312 
4            146             156 
5            116             78 
6            29              39 
7            29              20 
8            0           9.8 
9            0           4.9 
10           0           2.4 
11           0           1.2 
12           0           0.61 
13           0           0.31 
14           0           0.15 

Analysis - Uniformity Distribution on (0,1)

  • Visualization 10,000 Random Numbers Generated by R
  • The points should 'fill' the unit square uniformly and with no noticeable pattern.

plot of chunk unnamed-chunk-10plot of chunk unnamed-chunk-10

Conclusion

summary table

test R base linear cong middle sq
histogram ok ok rejected
stats ok ok rejected
chi Sq ok ok -
independence ok rejected -
sequence leng ok rejected -
tosses a coin rejected rejected -
gap ok rejected -
vizualization ok rejected -

Conclusion

  • The middle-square method can't be used in practical way as random number generator, it's degenerate to zero and the numbers generated is not uniform, the zero number frequency is very high and can generate very short cycles.

  • The linear congruence main issue is the sequence size, too short to be used in this way, the sequence needs to be increase. However had nice final result at tosses coin, converged to 0.5.

  • The visualization in the previous slides illustrates that R is a True Number Generator however, we found some issues with the dynamic behavior. The R generator did not work well for the coin toss. The results of some seeds were not as expected. It is possible, with further analysis, to improve the results. The result was very dependent on the seed value, for some seeds converge to 0.5 and for others can't converge.
    For intense use of random number, like simulation, we recommend test seed after, at least if it will converge.

  • The dynamic methods was more sensible to detect patterns (issues) in the random sequence, highlighting: Plot(x,y), cumulative sequence - coin toss and 2D plot.