Introduction

A random number is a number that is drawn from a set of values where each value is equally probable.

Properties of a good random number generator are:

Generated random numbers appear to be distributed uniformly on (0, 1).
Each number is statistically independent of the others.
Long cycle length.

Project Objective

Apply modern techniques to detect whether a number sequence appears random, and whether or not it satisfies the central limit theorem (CLT).

Hypothesis uniformity

\(H_0: R_i \sim U[0,1]\)
\(H_a: R_i \not\sim U[0,1]\)

Hypothesis independence

\(H_0: R_i \text{ are independent}\)
\(H_a: R_i \text{ are not independent}\)

Approach

Three random number generators were tested for uniformity, independence and cycle length:

Middle-square generator

Start with a 4-digit number \(x_0\) (the seed). Square it to obtain 8 digits (if needed, pad with zeros on the left). Take the middle 4 digits to obtain the next 4-digit number \(x_1\); then square \(x_1\) and take the middle 4 digits again, and so on to obtain \(x_n\).

Linear congruential generator

Produce a sequence of integers between \(0\) and \(d-1\) according to \(z_n = (a z_{n-1} + c) \bmod d, \space n = 1, 2, \ldots\), where \(a\) is the multiplier, \(c\) the increment and \(d\) the modulus. To obtain uniform random numbers on (0, 1) we take \(u_n = (z_n + 0.5)/d\).

R base generator

The R base generator is the “Mersenne-Twister” of Matsumoto and Nishimura (1998): a twisted GFSR with period \(2^{19937} - 1\) and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.

Generate random numbers

Generate 10,000 random numbers with each generator: R base, linear congruence and middle-square.

R base random generator

library(dplyr)
library(randtoolbox)
# generate uniform random numbers with the R base generator
set.seed(123)
n <- 10000
rIf <- runif(n, 0, 1)
head(rIf)

## [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673 0.0455565

Linear congruence generator

\(r_{i+1} = (a r_i + b) \bmod d,\) for integers \(a\gt 0,\space b\ge 0\) and \(d\gt 0\)

Where:
\(r_1\) = \(s\) = seed
\(a\) = multiplier
\(b\) = shift
\(d\) = modulus
\(m\) = quantity of numbers to generate

# linear congruence
m <- 10000; s <- 12
a <- 1093; b <- 0
d <- 86436
# initialize vector
r <- numeric(m)
# set seed
r[1] <- s
for (i in 1:(m - 1)) {
  r[i + 1] <- (a * r[i] + b) %% d
}
# map the integers to (0, 1)
con <- (r + 0.5) / d
head(con)

## [1] 0.0001446157 0.1517481142 0.8543720209 0.8223020501 0.7698239160
## [6] 0.4112233329

Middle-square generator

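# middle-square generator, seed 2017
ms <- 2017
for (i in 1:10000) {
  # square and left-pad with zeros to 8 digits
  z <- sprintf('%08d', ms[i]^2)
  # keep the middle 4 digits (positions 3 to 6) as the next number
  ms[i + 1] <- as.numeric(substr(z, start = 3, stop = 6))
}

# map the 4-digit integers to (0, 1)
ms <- (ms + 0.5) / 10000

head(ms)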

## [1] 0.20175 0.06825 0.46515 0.63185 0.91715 0.10725

Descriptive statistics

R base, linear congruence and middle-square generators.
Summary statistics
Verify the minimum, mean and maximum of the n generated numbers, and their standard deviation.

library(knitr)

opts_chunk$set(echo=FALSE,
               cache=TRUE, autodep=TRUE, cache.comments=FALSE,
               message=FALSE, warning=FALSE)
opts_chunk$set(fig.width=8, fig.height=4.5, dpi=300, out.width="840px", out.height="229px")
summary(rIf)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000653 0.2529000 0.4946000 0.4975000 0.7434000 0.9999000

summary(con)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0001446 0.2480000 0.4987000 0.4983000 0.7494000 0.9972000

summary(ms)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01405 0.21000 0.41000 0.50950 0.63180 0.97900

sd(rIf)

## [1] 0.2866937

sd(con)

## [1] 0.2888357

sd(ms)

## [1] 0.2240683

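The linear congruence and R base generators behave as expected: the mean and median are close to 0.5 and the standard deviation is about 0.29. The middle-square generator does not: its median (0.41) is well below 0.5, where a value close to 0.5 is expected, and its standard deviation (0.224) is lower than that of the other two generators.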
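As a reference point for these statistics, the theoretical mean and standard deviation of a U(0, 1) variable are 1/2 and \(\sqrt{1/12}\). The sketch below compares them with a fresh R base sample; the variable names are illustrative and are not part of the original analysis.

```r
# Theoretical moments of the uniform distribution on (0, 1)
mean_theory <- 1 / 2
sd_theory <- sqrt(1 / 12)

# Sample moments of 10,000 R base uniforms (same seed as in the text)
set.seed(123)
x <- runif(10000)

# Side-by-side comparison of sample and theoretical values
moments <- c(mean = mean(x), mean_theory = mean_theory,
             sd = sd(x), sd_theory = sd_theory)
print(round(moments, 4))
```

A generator whose sample standard deviation is far from about 0.2887 cannot be uniform on (0, 1), whatever its mean.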

Plot (x,y)

For an \((x,y)\) plot we expect random noise. Only R base shows this: linear congruence shows a grid pattern, and middle-square, after looking random on the left of the plot, starts to show only four different numbers.

Exploratory Data Analysis

Graphical methods to detect outliers and anomalies and check underlying assumptions.

Box-plot

The box plots of linear congruence and R base look as expected. The middle-square box plot shows a median lower than 0.5, where the median is expected to stay close to 0.5.

Histogram

R generator histogram
The histogram shows that the numbers are uniformly distributed between 0 and 1.
No one number is more likely to occur than another.

R base and linear congruence show a uniform pattern, as we hoped, but middle-square shows only four different numbers.

To better understand what happened with middle-square, we generated 40 numbers without transforming them to (0, 1):

##  [1] 2017  682 4651 6318 9171 1072 1491 2230 9729 6534 6931  387 1497 2410
## [15] 8081 3025 1506 2680 1824 3269 6863 1007  140  196  384 1474 1726 9790
## [29] 8441 2504 2700 2900 4100 8100 6100 2100 4100 8100 6100 2100

Two issues occur: from the 31st value onward the numbers end in zeros, and the cycle becomes short, repeating the same four numbers (4100, 8100, 6100, 2100).
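The short cycle can also be located programmatically rather than by eye. The following is a minimal sketch that regenerates the 40 raw values from seed 2017 and finds the first repeated value; `middle_square_step` is a hypothetical helper name introduced here for illustration.

```r
# One middle-square step: square, left-pad to 8 digits, keep digits 3 to 6.
# (hypothetical helper, not part of the original code)
middle_square_step <- function(x) {
  z <- sprintf("%08.0f", x^2)
  as.numeric(substr(z, 3, 6))
}

# Regenerate the 40 untransformed values starting from seed 2017
ms_raw <- numeric(40)
ms_raw[1] <- 2017
for (i in 1:39) ms_raw[i + 1] <- middle_square_step(ms_raw[i])

# The first value that has already been seen marks the end of one cycle
first_repeat <- which(duplicated(ms_raw))[1]
cycle_start <- match(ms_raw[first_repeat], ms_raw)
cycle_length <- first_repeat - cycle_start

print(cycle_length)                               # 4
print(ms_raw[cycle_start:(first_repeat - 1)])     # 4100 8100 6100 2100
```

Once the state enters this cycle, every later value is one of these four numbers, however many more values are generated.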

Given this poor performance it doesn’t make sense to run further tests on the middle-square generator, so no additional tests were run.

Analysis - Uniformity test

A chi-square test is used to test the uniformity of R base and linear congruence. The first output is for R base and the second for linear congruence.

## $test.statistic
## [1] 11.844
## 
## $p.value
## [1] 0.2222474
## 
## $df
## [1] 9

## $test.statistic
## [1] 1.822
## 
## $p.value
## [1] 0.9939798
## 
## $df
## [1] 9

R base - since the p-value \(P(\chi^2 > 11.844) = 0.222\) is greater than the significance level (0.05), we fail to reject the null hypothesis: the sequence is consistent with the uniform distribution.

Linear congruence - since the p-value \(P(\chi^2 > 1.822) = 0.994\) is greater than the significance level (0.05), we fail to reject the null hypothesis: the sequence is consistent with the uniform distribution.
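The code behind these outputs is not shown, so the following is only a sketch of how a chi-square uniformity test with 9 degrees of freedom can be computed in base R. It assumes 10 equal-width bins on [0, 1]; the binning used in the original analysis may differ, so the statistic is not guaranteed to match the value reported here.

```r
# Sample to test: 10,000 R base uniforms (same seed as in the text)
set.seed(123)
x <- runif(10000)

# Assumption: k = 10 equal-width bins, which gives k - 1 = 9 degrees of freedom
k <- 10
observed <- table(cut(x, breaks = seq(0, 1, length.out = k + 1),
                      include.lowest = TRUE))
expected <- length(x) / k

# Pearson chi-square statistic and its upper-tail p-value
stat <- sum((observed - expected)^2 / expected)
df <- k - 1
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

print(list(test.statistic = stat, p.value = p_value, df = df))
```

The p-value is the upper-tail probability `pchisq(stat, df, lower.tail = FALSE)`, which is how a statistic of 11.844 with 9 degrees of freedom maps to 0.222.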

Analysis - Sequence size (period)

R base - the period is greater than the length of the generated sequence, so no repetition occurs within the 10,000 numbers.

## [1] 10000

## [1] 343

Linear congruence - the period is shorter than the generated sequence: the numbers repeat after 343 values.
This result is not as expected and is a problem: the period should be at least 10,000.
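One way to measure the period directly is to iterate the recurrence until the seed value comes back. The sketch below uses the parameters from the linear congruence code; `lcg_period` is a hypothetical helper name, and it assumes the seed itself lies on the cycle (otherwise it returns `NA`).

```r
# Parameters from the linear congruence generator in the text
a <- 1093; b <- 0; d <- 86436; s <- 12

# Count the steps until the state returns to the seed.
# (hypothetical helper; gives NA if the seed never recurs within max_iter steps)
lcg_period <- function(seed, a, b, d, max_iter = d) {
  r <- seed
  for (k in seq_len(max_iter)) {
    r <- (a * r + b) %% d
    if (r == seed) return(k)
  }
  NA_integer_
}

period <- lcg_period(s, a, b, d)
print(period)   # 343
```

With `b = 0` the generator is purely multiplicative, so its period depends on the seed as well as on `a` and `d`. Trying other seeds or a non-zero shift with the same helper is a quick way to look for a longer cycle.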

Analysis - Independence

The auto-correlation of R base and linear congruence is used to determine whether the numbers in each sequence are independent.

R base - the only spike is at lag 0, where the auto-correlation is always 1; thus there is no auto-correlation in the sequence of length 10,000.

Linear congruence - as expected given the short cycle, which repeats the sequence at a short interval, there is a spike at lag 343, the end of the short cycle where the sequence repeats.
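The plotting code is not shown, so the following is a sketch of how the auto-correlations could be computed numerically with `stats::acf`. The choice of `lag.max = 400` is an assumption made so that lag 343 is included.

```r
# Rebuild both sequences as in the text
a <- 1093; b <- 0; d <- 86436; m <- 10000
r <- numeric(m); r[1] <- 12
for (i in 1:(m - 1)) r[i + 1] <- (a * r[i] + b) %% d
con <- (r + 0.5) / d

set.seed(123)
rIf <- runif(m)

# Sample auto-correlations for lags 0..400 (element 1 is lag 0)
acf_con <- acf(con, lag.max = 400, plot = FALSE)$acf[, 1, 1]
acf_rIf <- acf(rIf, lag.max = 400, plot = FALSE)$acf[, 1, 1]

print(acf_con[343 + 1])         # auto-correlation of linear congruence at lag 343
print(max(abs(acf_rIf[-1])))    # largest auto-correlation of R base beyond lag 0
print(2 / sqrt(m))              # approximate 95% band for an independent sequence
```

For an independent sequence nearly all lags beyond 0 should fall inside the band of about ±2/√n, while a sequence that repeats every 343 values has an auto-correlation close to 1 at lag 343.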

Analysis - cumulative sequence - tosses a coin

Test a cumulative sequence in which consecutive values are classified as above or below 0.5.
This test simulates 10,000 coin tosses, and we expect the cumulative proportion to converge to 0.5.

R base - did not converge to 0.5 for every seed; whether the sequence converges depends on the seed used. This behavior was not expected.

Linear congruence - the performance was better than that of R base. However, the short cycle of 343 is reflected in a “teeth” pattern in the plot, which is not what was expected.
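The code for this test is not shown; a minimal sketch of one way to build the cumulative sequence is given below. It treats each value above 0.5 as "heads" and tracks the running proportion; the variable names and the seed are illustrative assumptions.

```r
# 10,000 simulated coin tosses from the R base generator
set.seed(123)
n <- 10000
x <- runif(n)

# TRUE when the value falls above 0.5 ("heads")
heads <- x > 0.5

# Running proportion of heads after 1, 2, ..., n tosses
running_prop <- cumsum(heads) / seq_along(heads)

print(tail(running_prop, 1))

# To visualize the convergence:
# plot(running_prop, type = "l"); abline(h = 0.5, lty = 2)
```

Changing the argument of `set.seed` and rerunning gives the seed-by-seed comparison discussed in the text.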

Analysis - Time sequence (Gap test)

Tests the number of runs above and below the value 0.5, or runs up and down. The test counts the observed number of runs of each length and compares these counts with their expected values using a chi-square statistic.

R base - the results were as expected (p-value = 0.78), so we fail to reject \(H_0\).

## 
##           Gap test
## 
## chisq stat = 8.9, df = 13, p-value = 0.78
## 
##       (sample size : 10000)
## 
## length   observed freq       theoretical freq
## 1        1284                1250
## 2        632                 625
## 3        318                 312
## 4        134                 156
## 5        83                  78
## 6        36                  39
## 7        20                  20
## 8        10                  9.8
## 9        7                   4.9
## 10       1                   2.4
## 11       0                   1.2
## 12       0                   0.61
## 13       0                   0.31
## 14       0                   0.15

## 
##           Gap test
## 
## chisq stat = 56, df = 13, p-value = 3e-07
## 
##       (sample size : 10000)
## 
## length   observed freq       theoretical freq
## 1        1165                1250
## 2        676                 625
## 3        320                 312
## 4        146                 156
## 5        116                 78
## 6        29                  39
## 7        29                  20
## 8        0                   9.8
## 9        0                   4.9
## 10       0                   2.4
## 11       0                   1.2
## 12       0                   0.61
## 13       0                   0.31
## 14       0                   0.15

Linear congruence - as expected with the short cycle of 343, the p-value (3e-07) is far below 0.05, so we reject the null hypothesis.
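The theoretical frequencies in both tables are consistent with \(n\,p^2(1-p)^k\) for a gap of length \(k\), with \(n = 10000\) and \(p = 0.5\). The sketch below recomputes that column and the chi-square statistic from the observed R base counts in the first table. This is a reconstruction from the printed output, not the package's internal code.

```r
n <- 10000
p <- 0.5        # probability that a value falls in the interval [0, 0.5]
len <- 1:14

# Theoretical frequency of gaps of each length
theoretical <- n * p^2 * (1 - p)^len

# Observed R base frequencies, copied from the first gap test table
observed <- c(1284, 632, 318, 134, 83, 36, 20, 10, 7, 1, 0, 0, 0, 0)

# Chi-square statistic over the 14 categories (13 degrees of freedom)
stat <- sum((observed - theoretical)^2 / theoretical)
p_value <- pchisq(stat, df = length(len) - 1, lower.tail = FALSE)

print(round(theoretical, 2))
print(c(chisq = stat, p.value = p_value))
```

The statistic comes out at about 8.9, in line with the R base output. Substituting the observed counts from the second table gives the corresponding check for linear congruence.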

Analysis - Uniformity Distribution on (0,1)

Visualization of the 10,000 random numbers generated in R.
Plot of \((y_n,y_{n+1})\): each value is plotted against the next one, so one axis is lagged by one unit.

R base - shows random noise, as expected.

Linear congruence - shows a grid pattern, a consequence of the short cycle. This is another way of verifying auto-correlation.
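The lagged pairs behind such a plot can be built by dropping the last and the first element of the sequence. The sketch below does this for the linear congruence integers and counts how many distinct points the plot can contain; the column names are illustrative.

```r
# Rebuild the linear congruence integers as in the text
a <- 1093; b <- 0; d <- 86436; m <- 10000
r <- numeric(m); r[1] <- 12
for (i in 1:(m - 1)) r[i + 1] <- (a * r[i] + b) %% d

# Pair each value with its successor: (y_n, y_{n+1})
pairs <- cbind(y_n = r[-m], y_next = r[-1])

# Number of distinct points among the 9,999 pairs
n_distinct <- nrow(unique(pairs))
print(n_distinct)   # 343

# To draw the lag plot on the (0, 1) scale:
# plot((pairs + 0.5) / d, pch = ".")
```

Because each value fully determines the next one, a sequence with period 343 can place at most 343 distinct points on the plot, which is why a sparse regular pattern appears instead of noise.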

Conclusion

Summary table

Test	R base	Linear congruence	Middle-square
Histogram	ok	ok	rejected
Descriptive statistics	ok	ok	rejected
Chi-square	ok	ok	-
Independence	ok	rejected	-
Sequence length	ok	rejected	-
Coin toss	rejected	rejected	-
Gap	ok	rejected	-
Visualization	ok	rejected	-

The middle-square method can’t be used in practice as a random number generator: it degenerates to zero, the numbers generated are not uniform, the frequency of zeros is very high, and it can produce very short cycles.
The main issue with linear congruence was the period, which is too short for this use and needs to be increased. However, it had a good final result in the coin toss, converging to 0.5.
The visualization in the previous slides illustrates that the R generator behaves like a true random number generator; however, we found some issues with its dynamic behavior. The R generator did not work well for the coin toss: the results for some seeds were not as expected. It is possible, with further analysis, to improve the results. The result was very dependent on the seed value: for some seeds the sequence converges to 0.5 and for others it does not.
For intensive use of random numbers, such as simulation, we recommend testing the seed first, at least to check whether the sequence converges.
The dynamic methods were more sensitive in detecting patterns (problems) in the random sequence, in particular: Plot (x,y), cumulative sequence (coin toss) and the 2D plot.

Are numbers Random? - IS 609

Marco Siqueira Campos, Sharon Morris

12/3/2017
