Introduction

A random number is a number that is drawn from a set of values where each value is equally probable.

The properties of a random number generator examined here are uniformity, independence and cycle length.

Project Objective

Uniformity hypothesis

\(H_0: R_i \sim U[0,1]\)
\(H_a: R_i \not\sim U[0,1]\)

Independence hypothesis

\(H_0: R_i \text{ are independent}\)
\(H_a: R_i \text{ are not independent}\)

Approach

Three random number generators were tested for uniformity, independence and cycle length:

Middle-square generator

Start with a 4-digit number \(x_0\) (the seed). Square it to obtain 8 digits (if needed, pad with zeros on the left). Take the middle 4 digits to obtain the next 4-digit number \(x_1\); then square \(x_1\) and take the middle 4 digits again, and so on, to obtain \(x_n\).
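As a worked example of a single step, here is a minimal R sketch. The helper name `middle_square_step` is an assumption introduced for illustration, and it picks the middle digits with integer arithmetic rather than string handling.

```r
# One middle-square step: square a 4-digit value, view the result as
# 8 digits (zero-padded on the left) and keep digits 3 to 6.
middle_square_step <- function(x) {
  (x^2 %/% 100) %% 10000   # drop the last 2 digits, keep the next 4
}

x0 <- 2017
x1 <- middle_square_step(x0)   # 2017^2 = 04068289 -> 0682
x2 <- middle_square_step(x1)   # 0682^2 = 00465124 -> 4651
print(c(x0, x1, x2))
```

Dropping the last two digits and keeping the next four is equivalent to taking characters 3 to 6 of the zero-padded 8-digit square.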

Linear congruential generator

Produce a sequence of integers between \(0\) and \(d-1\) according to \(z_n = (a z_{n-1} + c) \bmod d, \space n = 1, 2, \ldots\), where \(a\) is the multiplier, \(c\) the increment and \(d\) the modulus. To obtain uniform random numbers on (0, 1) we take \(u_n = (z_n + 0.5)/d\).

R base generator

R's default generator is the “Mersenne-Twister”, from Matsumoto and Nishimura (1998): a twisted GFSR with period \(2^{19937} - 1\) and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.

Generate random numbers

  • Generate 10,000 random numbers using R base, Congruence and Middle-Square

R base random generator

library(dplyr)
library(randtoolbox)       
#generate uniform random numbers R base
set.seed(0123)
n <- 10000
rIf <- (runif(n, 0, 1))
head(rIf)
## [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673 0.0455565

Linear congruence generator

\(r_{i+1} = (a r_i + b) \bmod d,\) for integers \(a > 0,\space b \ge 0\) and \(d > 0\)

Where:
\(r_1\) = \(s\) = seed
\(a\) = multiplier
\(b\) = shift
\(d\) = modulus
\(m\) = quantity of numbers generated

#linear congruence
m <- 10000; s<- 12
a <- 1093; b<- 0
d<-86436
#Initialize vector
r <- numeric(m)
#Set seed
r[1] <- s
for (i in 1:(m-1))
{
   r[i+1] <- (a*r[i]+b) %% d
}
con <- (r + 0.5)/d
head(con)
## [1] 0.0001446157 0.1517481142 0.8543720209 0.8223020501 0.7698239160
## [6] 0.4112233329

Middle-square generator

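#middle-square
ms <- 2017                                  #seed
for (i in 1:10000){
        z <- ms[i]^2                        #square the current value
        z <- sprintf('%08d', z)             #pad with zeros on the left to 8 digits
        ms[i+1] <- as.numeric(substr(z, start = 3, stop = 6))   #keep the middle 4 digits
}

#rescale to (0, 1)
ms <- (ms+0.5)/10000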

head(ms)
## [1] 0.20175 0.06825 0.46515 0.63185 0.91715 0.10725

Descriptive statistics

  • R base, linear congruence and middle-square generators.
  • Summary statistics
  • Verify the maximum, mean and minimum value of n consecutive random numbers and the standard deviation of n consecutive values.
library(knitr)
opts_chunk$set(echo=FALSE,
               cache=TRUE, autodep=TRUE, cache.comments=FALSE,
               message=FALSE, warning=FALSE)
opts_chunk$set(fig.width=8, fig.height=4.5, dpi=300, out.width="840px", out.height="229px")
summary(rIf)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000653 0.2529000 0.4946000 0.4975000 0.7434000 0.9999000
summary(con)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0001446 0.2480000 0.4987000 0.4983000 0.7494000 0.9972000
summary(ms)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01405 0.21000 0.41000 0.50950 0.63180 0.97900
sd(rIf)
## [1] 0.2866937
sd(con)
## [1] 0.2888357
sd(ms)
## [1] 0.2240683

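The linear congruence and R base generators behaved as expected: their means and medians are close to 0.5 and their standard deviations are close to \(1/\sqrt{12} \approx 0.2887\), the value for a uniform distribution on (0, 1). The middle-square generator, however, has a median of 0.41 and a standard deviation of 0.224, whereas the median is expected to stay close to 0.5.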
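For reference, the sample statistics can be set side by side with the theoretical values of a U(0, 1) variable (mean 0.5, median 0.5, standard deviation \(1/\sqrt{12}\)). This is a minimal sketch using only the R base generator; the object names `theory` and `sample_stats` are assumptions introduced for illustration.

```r
set.seed(123)
u <- runif(10000)

# Theoretical values for a uniform distribution on (0, 1)
theory <- c(mean = 0.5, median = 0.5, sd = 1 / sqrt(12))

# Corresponding sample statistics
sample_stats <- c(mean = mean(u), median = median(u), sd = sd(u))

print(round(rbind(theory, sample_stats), 4))
```

The same comparison can be repeated for `con` and `ms` by swapping the vector passed to `mean`, `median` and `sd`.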

Plot (x,y)

In an \(x,y\) plot we expect to see random noise. Only R base shows this: linear congruence shows a grid pattern, and middle-square, after looking random at the left of the plot, settles into only four different numbers.
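The plotting code is not shown in the report. The sketch below assumes the plot is value against position in the sequence, which is one reading of the description above. It rebuilds the three sequences with the parameters used earlier; the names `u_base`, `u_lcg` and `u_ms` are assumptions introduced for illustration.

```r
n <- 10000
set.seed(123)
u_base <- runif(n)

# Linear congruential generator: a = 1093, b = 0, d = 86436, seed 12
r <- numeric(n); r[1] <- 12
for (i in 1:(n - 1)) r[i + 1] <- (1093 * r[i]) %% 86436
u_lcg <- (r + 0.5) / 86436

# Middle-square generator seeded with 2017
x <- numeric(n); x[1] <- 2017
for (i in 1:(n - 1)) x[i + 1] <- (x[i]^2 %/% 100) %% 10000
u_ms <- (x + 0.5) / 10000

# One panel per generator: position in the sequence vs. value
op <- par(mfrow = c(1, 3))
plot(u_base, pch = ".", xlab = "index", ylab = "value", main = "R base")
plot(u_lcg,  pch = ".", xlab = "index", ylab = "value", main = "Linear congruence")
plot(u_ms,   pch = ".", xlab = "index", ylab = "value", main = "Middle-square")
par(op)
```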

Exploratory Data Analysis

Graphical methods to detect outliers and anomalies and check underlying assumptions.

Box-plot

The box plots tell the same story as the summary statistics: linear congruence and R base behave as expected, but the middle-square median falls below 0.5, where it is expected to stay.

Histogram

  • R generator Histogram
  • The histogram below shows that the numbers are uniformly distributed between 0 and 1
  • No one number is more likely to occur than another

R base and linear congruence show a uniform pattern, as we hoped, but middle-square shows only four different numbers.
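A minimal sketch of how such histograms could be produced, assuming ten equal-width bins (the bin width used in the report is not stated). Only R base and middle-square are shown; the names `h_base` and `h_ms` are assumptions introduced for illustration.

```r
n <- 10000
set.seed(123)
u_base <- runif(n)

# Middle-square generator seeded with 2017, rescaled to (0, 1)
x <- numeric(n); x[1] <- 2017
for (i in 1:(n - 1)) x[i + 1] <- (x[i]^2 %/% 100) %% 10000
u_ms <- (x + 0.5) / 10000

# Count values per bin first, then draw the histograms
breaks <- seq(0, 1, by = 0.1)
h_base <- hist(u_base, breaks = breaks, plot = FALSE)
h_ms   <- hist(u_ms,   breaks = breaks, plot = FALSE)

op <- par(mfrow = c(1, 2))
plot(h_base, main = "R base", xlab = "value")
plot(h_ms,   main = "Middle-square", xlab = "value")
par(op)
```

Inspecting `h_ms$counts` shows almost all values concentrated in four bins, which is the pattern described above.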

To better understand what happened with middle-square, we generated 40 numbers without rescaling them to [0, 1]:

##  [1] 2017  682 4651 6318 9171 1072 1491 2230 9729 6534 6931  387 1497 2410
## [15] 8081 3025 1506 2680 1824 3269 6863 1007  140  196  384 1474 1726 9790
## [29] 8441 2504 2700 2900 4100 8100 6100 2100 4100 8100 6100 2100

Two issues occur: from the 31st number onward the values end in zeros, and the cycle becomes very short, repeating the same four numbers (4100, 8100, 6100, 2100).

Given this poor performance, it does not make sense to run further tests on the middle-square generator, so no additional tests were run.
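The short cycle can also be located programmatically. The following is a minimal sketch, assuming the same seed (2017); the names `first_repeat`, `cycle_start` and `cycle_length` are introduced here for illustration.

```r
# Generate 40 raw middle-square values (no rescaling)
n <- 40
x <- numeric(n)
x[1] <- 2017
for (i in seq_len(n - 1)) x[i + 1] <- (x[i]^2 %/% 100) %% 10000
print(x)

# Because each value depends only on the previous one, the first value
# that reappears marks the start of a cycle.
first_repeat <- which(duplicated(x))[1]     # position of the first repeated value
cycle_start  <- match(x[first_repeat], x)   # position where that value first appeared
cycle_length <- first_repeat - cycle_start

cat("cycle starts at position", cycle_start, "with length", cycle_length, "\n")
```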

Analysis - Uniformity test

  • The chi-square test is used to test the uniformity of R base and linear congruence.
## $test.statistic
## [1] 11.844
## 
## $p.value
## [1] 0.2222474
## 
## $df
## [1] 9
## $test.statistic
## [1] 1.822
## 
## $p.value
## [1] 0.9939798
## 
## $df
## [1] 9

R base - since the p-value \(P(\chi^2 > 11.844) = 0.222\) is greater than the significance level (0.05), we fail to reject the null hypothesis; the sequence is consistent with the uniform distribution.

Linear congruence - since the p-value \(P(\chi^2 > 1.822) = 0.994\) is greater than the significance level (0.05), we fail to reject the null hypothesis; the sequence is consistent with the uniform distribution.
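The code behind the test output is not shown. A minimal sketch of a chi-square goodness-of-fit test with base R's `chisq.test` follows, assuming ten equal-width bins (which is consistent with the 9 degrees of freedom reported). The helper `chisq_uniform` is a hypothetical name, and the sketch is not guaranteed to reproduce the exact statistics above.

```r
# Chi-square goodness-of-fit test against U(0, 1) using k equal-width bins
chisq_uniform <- function(u, k = 10) {
  bins <- cut(u, breaks = seq(0, 1, length.out = k + 1), include.lowest = TRUE)
  observed <- as.numeric(table(bins))   # counts per bin, empty bins included
  chisq.test(observed)                  # equal expected counts by default
}

n <- 10000
set.seed(123)
u_base <- runif(n)

# Linear congruential generator: a = 1093, b = 0, d = 86436, seed 12
r <- numeric(n); r[1] <- 12
for (i in 1:(n - 1)) r[i + 1] <- (1093 * r[i]) %% 86436
u_lcg <- (r + 0.5) / 86436

res_base <- chisq_uniform(u_base)
res_lcg  <- chisq_uniform(u_lcg)
print(res_base)
print(res_lcg)
```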

Analysis - Sequence size (period)

R base - the period is longer than the sample, so all 10,000 generated values are distinct and no repetition occurs within the sequence.

## [1] 10000
## [1] 343

Linear congruence - the period is shorter than the sample: the sequence repeats after 343 values.
This is a problem, since we expected 10,000 distinct values.
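One way to measure the cycle length of the linear congruential sequence is to count distinct values and find the first repeat. This is a minimal sketch with the parameters used earlier; the variable names are assumptions introduced for illustration.

```r
# Linear congruential generator: a = 1093, b = 0, d = 86436, seed 12
n <- 10000
r <- numeric(n); r[1] <- 12
for (i in 1:(n - 1)) r[i + 1] <- (1093 * r[i]) %% 86436

# Number of distinct values in the sequence
n_distinct <- length(unique(r))

# Distance between the first repeated value and its first appearance
first_repeat <- which(duplicated(r))[1]
period <- first_repeat - match(r[first_repeat], r)

print(c(distinct = n_distinct, period = period))
```

The same count applied to the R base sequence (`length(unique(rIf))`) gives the other figure printed above.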

Analysis - Independence

Auto-correlation of R-base and linear congruence is used to determine whether independent random numbers are being produced in the sequence.

R base - the only spike in the auto-correlation plot is at lag 0, where the correlation is always 1; thus there is no evidence of auto-correlation in the sequence of 10,000 numbers.

Linear congruence - as expected given its shortcoming of repeating the sequence after a short interval, the auto-correlation shows a spike at the end of the short cycle of 343, where the sequence repeats.
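A minimal sketch of the auto-correlation check using base R's `acf`. The choice of `lag.max = 400` is an assumption, picked so that the lag at the end of the 343-value cycle is visible.

```r
n <- 10000
set.seed(123)
u_base <- runif(n)

# Linear congruential generator: a = 1093, b = 0, d = 86436, seed 12
r <- numeric(n); r[1] <- 12
for (i in 1:(n - 1)) r[i + 1] <- (1093 * r[i]) %% 86436
u_lcg <- (r + 0.5) / 86436

# Compute the sample auto-correlations, then plot them
acf_base <- acf(u_base, lag.max = 400, plot = FALSE)
acf_lcg  <- acf(u_lcg,  lag.max = 400, plot = FALSE)

op <- par(mfrow = c(1, 2))
plot(acf_base, main = "R base")
plot(acf_lcg,  main = "Linear congruence")
par(op)

# Element 1 is lag 0, so lag 343 is element 344
print(acf_lcg$acf[344])
```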

Analysis - Cumulative sequence (coin toss)

R base - the cumulative proportion did not converge to 0.5 for every seed. The R base generator tends to converge to 0.5 for some seeds but not others; this behavior was not expected.

Linear congruence - the performance was better than that of R base. However, the short cycle of 343 is reflected in the “teeth” pattern illustrated above; this pattern is not what was expected.
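The coin-toss experiment can be sketched as follows, assuming a value below 0.5 counts as heads (the report does not state the rule it used). The helper `running_prop` is a hypothetical name introduced for illustration.

```r
# Running proportion of "heads", where heads means u < 0.5 (an assumption)
running_prop <- function(u) cumsum(u < 0.5) / seq_along(u)

n <- 10000
set.seed(123)
u_base <- runif(n)

# Linear congruential generator: a = 1093, b = 0, d = 86436, seed 12
r <- numeric(n); r[1] <- 12
for (i in 1:(n - 1)) r[i + 1] <- (1093 * r[i]) %% 86436
u_lcg <- (r + 0.5) / 86436

prop_base <- running_prop(u_base)
prop_lcg  <- running_prop(u_lcg)

plot(prop_base, type = "l", ylim = c(0.4, 0.6),
     xlab = "number of tosses", ylab = "proportion of heads")
lines(prop_lcg, col = "red")
abline(h = 0.5, lty = 2)   # reference line at the expected proportion
legend("topright", legend = c("R base", "Linear congruence"),
       col = c("black", "red"), lty = 1)
```

Changing the argument of `set.seed` shows how the R base curve depends on the seed.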

Analysis - Time sequence (Gap test)

The gap test looks at the lengths of the gaps between successive values that fall in a given interval (here, below 0.5), which amounts to counting runs of values on one side of 0.5. It counts the observed number of gaps of each length and compares these counts with their expected values using a chi-square test.

R base - the results were as expected, so we do not reject \(H_0\).
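To illustrate the counting step, here is a base R sketch that measures gaps as runs of consecutive values falling outside the interval [0, 0.5]. The helper `gap_lengths` is a hypothetical name, and this is not the `randtoolbox` implementation, so its counts are not guaranteed to match the package output. The expected counts use \(n p^2 (1-p)^k\) with \(p = 0.5\), which reproduces the "theoretical freq" column printed below.

```r
# Lengths of runs of consecutive values outside [lower, upper]
gap_lengths <- function(u, lower = 0, upper = 0.5) {
  inside <- u >= lower & u <= upper
  runs <- rle(inside)
  runs$lengths[!runs$values]
}

set.seed(123)
u <- runif(10000)

gaps <- gap_lengths(u)
observed <- table(gaps)                      # observed number of gaps per length

p <- 0.5                                     # probability of landing in the interval
k <- 1:14
expected <- length(u) * p^2 * (1 - p)^k      # expected number of gaps of length k

print(observed)
print(signif(expected, 2))
```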

## 
##           Gap test
## 
## chisq stat = 8.9, df = 13, p-value = 0.78
## 
##       (sample size : 10000)
## 
## length   observed freq       theoretical freq
## 1             1284            1250 
## 2             632             625 
## 3             318             312 
## 4             134             156 
## 5             83              78 
## 6             36              39 
## 7             20              20 
## 8             10              9.8 
## 9             7           4.9 
## 10            1           2.4 
## 11            0           1.2 
## 12            0           0.61 
## 13            0           0.31 
## 14            0           0.15
## 
##           Gap test
## 
## chisq stat = 56, df = 13, p-value = 3e-07
## 
##       (sample size : 10000)
## 
## length   observed freq       theoretical freq
## 1             1165            1250 
## 2             676             625 
## 3             320             312 
## 4             146             156 
## 5             116             78 
## 6             29              39 
## 7             29              20 
## 8             0           9.8 
## 9             0           4.9 
## 10            0           2.4 
## 11            0           1.2 
## 12            0           0.61 
## 13            0           0.31 
## 14            0           0.15

Linear congruence - as expected with the short cycle of 343, the results lead us to reject the null hypothesis.

Analysis - Uniformity Distribution on (0,1)

R base - shows a random noise as expected.

Linear congruence - shows a grid pattern, a consequence of the short cycle. This is another way of verifying auto-correlation.
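The plot itself is not reproduced here. One common way to visualize this kind of structure is a lag plot of each value against its successor; the sketch below assumes that form, which may differ from the plot used in the report. The name `pairs_lcg` is introduced for illustration.

```r
# Linear congruential generator: a = 1093, b = 0, d = 86436, seed 12
n <- 10000
r <- numeric(n); r[1] <- 12
for (i in 1:(n - 1)) r[i + 1] <- (1093 * r[i]) %% 86436
u_lcg <- (r + 0.5) / 86436

# Pair each value with the one that follows it
pairs_lcg <- cbind(x = head(u_lcg, -1), y = tail(u_lcg, -1))

plot(pairs_lcg, pch = 20, cex = 0.4,
     xlab = "u[i]", ylab = "u[i+1]", main = "Linear congruence")

# With a cycle of 343 values, only 343 distinct points can appear
print(nrow(unique(pairs_lcg)))
```

Replacing `u_lcg` with a `runif` sample gives the unstructured cloud expected from R base.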

Conclusion

Summary table

test              R base     linear cong   middle sq
histogram         ok         ok            rejected
stats             ok         ok            rejected
chi-square        ok         ok            -
independence      ok         rejected      -
sequence length   ok         rejected      -
coin toss         rejected   rejected      -
gap               ok         rejected      -
visualization     ok         rejected      -

References:

Giordano, Frank; Weir, Maurice (2014). \(\textit{A First Course in Mathematical Modeling}\), 5th edition. Brooks and Cole Publishing Company.

Balci, Osman \(\textit{Implementation / Programming: Random Number Generation}\), Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, USA

Bruce E. Trumbo (2005). \(\textit{Congruential Generators of Pseudorandom Numbers}\)

\(\textit{CSC 433 -- Course Documents}\), http://www.facweb.cs.depaul.edu/sjost/csc433/documents/rand-var-gen.htm