A random number is a number drawn from a set of values in which each value is equally probable.
A random number generator must satisfy two properties, each stated as a pair of hypotheses:

- Uniformity: \(H_0: R_i \sim U[0,1]\) versus \(H_a: R_i \nsim U[0,1]\)
- Independence: \(H_0:\) the \(R_i\) are independent versus \(H_a:\) the \(R_i\) are not independent
Three random number generators were tested for uniformity, independence and cycle length:
Middle-square method: start with a 4-digit number \(x_0\) (the seed) and square it to obtain an 8-digit number (padding with zeros on the left if needed). The middle 4 digits give the next 4-digit number \(x_1\); squaring \(x_1\) and again taking the middle 4 digits gives \(x_2\), and so on up to \(x_n\).
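Taking the middle 4 digits can also be written with integer arithmetic instead of string padding: integer-dividing \(x_n^2\) by 100 drops its last two digits, and reducing modulo 10,000 keeps the four digits before them, so \(x_{n+1} = \lfloor x_n^2 / 100 \rfloor \bmod 10000\). A minimal R sketch; the function names are hypothetical and not part of the original code:

```r
# Hypothetical helper: one middle-square step for a 4-digit state.
# x^2 has at most 8 digits; %/% 100 drops the last two digits and
# %% 10000 keeps the middle four.
middle_square_next <- function(x) {
  (x^2 %/% 100) %% 10000
}

# Hypothetical helper: the first k states starting from a seed
middle_square_seq <- function(seed, k) {
  x <- numeric(k)
  x[1] <- seed
  for (i in seq_len(k - 1)) {
    x[i + 1] <- middle_square_next(x[i])
  }
  x
}

middle_square_seq(2017, 6)
# returns 2017 682 4651 6318 9171 1072
```

These are the same first values as the string-based version used later in the analysis.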
Linear congruential method: produce a sequence of integers between \(0\) and \(d-1\) according to \(r_{n+1} = (a r_n + b) \bmod d,\ n = 1, 2, \ldots\), where \(a\) is the multiplier, \(b\) the increment and \(d\) the modulus. To obtain uniform random numbers on \((0, 1)\) we take \(u_n = (r_n + 0.5)/d\).
R base generator, Mersenne-Twister (Matsumoto and Nishimura, 1998): a twisted GFSR with period \(2^{19937} - 1\) and equidistribution in 623 consecutive dimensions (over the whole period). Its seed is a 624-dimensional set of 32-bit integers plus a current position in that set.
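The generator and the seed structure described above can be inspected from base R. A short sketch; naming the kind in `set.seed` only makes R's default choice explicit:

```r
# Mersenne-Twister is R's default generator; naming it here makes the
# choice explicit.
set.seed(123, kind = "Mersenne-Twister")
RNGkind()[1]
# "Mersenne-Twister"

# Saved generator state: one element encoding the RNG kinds, one for
# the current position, and the 624 32-bit integers of the seed
length(.Random.seed)
# 626
```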
```r
library(dplyr)
library(randtoolbox)
# Generate uniform random numbers with R base (Mersenne-Twister)
set.seed(123)
n <- 10000
rIf <- runif(n, 0, 1)
head(rIf)
## [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673 0.0455565
```
The linear congruential generator is \(r_{i+1} = (a r_i + b) \bmod d\), for integers \(a > 0\), \(b \ge 0\) and \(d > 0\), where:

- \(r_1 = s\) is the seed
- \(a\) is the multiplier
- \(b\) is the shift (increment)
- \(d\) is the modulus
- \(m\) is the number of values generated
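```r
# Linear congruential generator
m <- 10000   # number of values to generate
s <- 12      # seed
a <- 1093    # multiplier
b <- 0       # shift (increment)
d <- 86436   # modulus
# Initialize the vector and set the seed
r <- numeric(m)
r[1] <- s
for (i in 1:(m - 1)) {
  r[i + 1] <- (a * r[i] + b) %% d
}
# Map integers in [0, d - 1] to (0, 1)
con <- (r + 0.5) / d
head(con)
## [1] 0.0001446157 0.1517481142 0.8543720209 0.8223020501 0.7698239160
## [6] 0.4112233329
```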
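```r
# Middle-square method with seed 2017
ms <- numeric(10001)
ms[1] <- 2017
for (i in 1:10000) {
  # Square and pad to 8 digits with leading zeros
  z <- sprintf("%08d", ms[i]^2)
  # Keep the middle 4 digits (characters 3 to 6)
  ms[i + 1] <- as.numeric(substr(z, start = 3, stop = 6))
}
# Map integers in [0, 9999] to (0, 1)
ms <- (ms + 0.5) / 10000
head(ms)
## [1] 0.20175 0.06825 0.46515 0.63185 0.91715 0.10725
```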
```r
library(knitr)
opts_chunk$set(echo = FALSE,
               cache = TRUE, autodep = TRUE, cache.comments = FALSE,
               message = FALSE, warning = FALSE)
opts_chunk$set(fig.width = 8, fig.height = 4.5, dpi = 300,
               out.width = "840px", out.height = "229px")
```
```r
summary(rIf)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000653 0.2529000 0.4946000 0.4975000 0.7434000 0.9999000
summary(con)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0001446 0.2480000 0.4987000 0.4983000 0.7494000 0.9972000
summary(ms)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.01405 0.21000 0.41000 0.50950 0.63180 0.97900
sd(rIf)
## [1] 0.2866937
sd(con)
## [1] 0.2888357
sd(ms)
## [1] 0.2240683
```
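The summary statistics of the linear congruential and R base generators look right for \(U[0,1]\): mean and median close to 0.5 and standard deviation close to \(1/\sqrt{12} \approx 0.289\). The middle-square sequence, however, has a median of 0.41, well below the expected 0.5, and a standard deviation of only 0.224.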
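The comparison with the theoretical \(U[0,1]\) values (mean and median \(1/2\), standard deviation \(1/\sqrt{12}\)) can be made explicit. A minimal sketch with a hypothetical helper that returns the deviation of each sample statistic from its theoretical value:

```r
# Hypothetical helper: deviation of sample statistics from U[0, 1]
# theory (mean = median = 1/2, sd = 1/sqrt(12), about 0.2887)
compare_to_uniform <- function(u) {
  c(mean   = mean(u)   - 1 / 2,
    median = median(u) - 1 / 2,
    sd     = sd(u)     - 1 / sqrt(12))
}

set.seed(123)
round(compare_to_uniform(runif(10000)), 4)
```

Using the summaries reported above for the middle-square sequence (median 0.41, sd 0.224), the median and sd deviations are about -0.09 and -0.06, far larger than for the other two generators.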
In an \((x, y)\) plot of the values in generation order we expect random noise, but only R base shows this: the linear congruential sequence shows a grid pattern, and the middle-square sequence looks random at the left and then settles into only four distinct values.
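A plot of this kind can be drawn with base graphics. A minimal sketch, assuming the plot shows each value against its position in the sequence; the helper name is hypothetical:

```r
# Hypothetical helper: each value against its index; a good generator
# fills the band [0, 1] with structureless noise.
plot_in_order <- function(u, main = "") {
  plot(seq_along(u), u, pch = ".", xlab = "index", ylab = "value",
       ylim = c(0, 1), main = main)
}

set.seed(123)
u <- runif(10000)
plot_in_order(u, main = "R base (runif)")
```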
Histograms are a graphical method to detect outliers and anomalies and to check the uniformity assumption.
In the histograms, R base and linear congruence look uniform, as we hoped, but the middle-square sequence is dominated by only four distinct values.
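A histogram check can be sketched with base R's `hist`. This assumes 10 equal-width bins on \([0, 1]\), so each bin should hold about \(n/10\) values under uniformity:

```r
# Histogram with 10 equal-width bins on [0, 1]
set.seed(123)
u <- runif(10000)
h <- hist(u, breaks = seq(0, 1, by = 0.1), plot = FALSE)
h$counts         # observed count per bin
length(u) / 10   # expected count per bin under U[0, 1]
hist(u, breaks = seq(0, 1, by = 0.1), main = "R base (runif)", xlab = "value")
```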
To understand what happened with the middle-square method, we printed the first 40 numbers before the transformation to \([0, 1]\):
```
##  [1] 2017  682 4651 6318 9171 1072 1491 2230 9729 6534 6931  387 1497 2410
## [15] 8081 3025 1506 2680 1824 3269 6863 1007  140  196  384 1474 1726 9790
## [29] 8441 2504 2700 2900 4100 8100 6100 2100 4100 8100 6100 2100
```
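Two problems appear. From the 31st number on, every value ends in 00: once \(x_n = 100k\), its square \(10000k^2\) ends in 0000, so the middle four digits again end in 00 and so do all later values. The sequence then falls into a short cycle, repeating 4100, 8100, 6100, 2100 every four numbers.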
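The cycle can also be located automatically. A minimal sketch, assuming 4-digit states 0 to 9999; the helper name is hypothetical. It records the index at which each state first appears and stops at the first state seen twice, which must happen within 10,001 steps because there are only 10,000 states:

```r
# Hypothetical helper: follow the middle-square sequence from a seed
# and report where it first re-enters a state it already produced.
middle_square_cycle <- function(seed) {
  first_seen <- integer(10000)   # first index of each state 0..9999
  x <- seed
  i <- 1
  while (first_seen[x + 1] == 0) {
    first_seen[x + 1] <- i
    x <- (x^2 %/% 100) %% 10000  # middle 4 digits of x^2
    i <- i + 1
  }
  list(cycle_start  = first_seen[x + 1],
       cycle_length = i - first_seen[x + 1])
}

middle_square_cycle(2017)
# cycle_start 33 (the value 4100), cycle_length 4
middle_square_cycle(0)
# cycle_start 1, cycle_length 1: zero is a fixed point
```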
Given this poor performance, it makes no sense to run further tests on the middle-square method, so no additional tests were run.
```
## $test.statistic
## [1] 11.844
##
## $p.value
## [1] 0.2222474
##
## $df
## [1] 9
## $test.statistic
## [1] 1.822
##
## $p.value
## [1] 0.9939798
##
## $df
## [1] 9
```
R base: the p-value \(P(\chi^2_9 > 11.844) = 0.222\) is greater than the 0.05 significance level, so we do not reject the null hypothesis; the sequence is consistent with the uniform distribution.
Linear congruence: the p-value \(P(\chi^2_9 > 1.822) = 0.994\) is greater than the 0.05 significance level, so we do not reject the null hypothesis; the sequence is consistent with the uniform distribution.
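The source does not say which function produced the outputs above, so the following is only a sketch of the same kind of test in base R. It assumes 10 equal-width bins (matching df = 9), with an expected \(n/10\) values per bin under \(H_0\); its statistic will not necessarily reproduce the values above:

```r
# Chi-square goodness-of-fit test for uniformity with k equal bins;
# under H0 each bin expects n / k values and df = k - 1.
chisq_uniform <- function(u, k = 10) {
  observed <- table(cut(u, breaks = seq(0, 1, length.out = k + 1),
                        include.lowest = TRUE))
  chisq.test(observed, p = rep(1 / k, k))
}

set.seed(123)
res <- chisq_uniform(runif(10000))
res$statistic
res$parameter   # df = 9
res$p.value
```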
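R base: the number of distinct values equals the sequence size (10,000), so the cycle is longer than the sequence and no repetition occurs within the 10,000 numbers (the Mersenne-Twister period is \(2^{19937} - 1\)).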
```
## [1] 10000
## [1] 343
```
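Linear congruence: only 343 distinct values occur, so the sequence repeats after 343 numbers, far fewer than the 10,000 generated. This is a serious problem; the cycle should be at least 10,000 long. The cause is the parameter choice: with \(b = 0\) the generator is purely multiplicative and cannot reach a full period, since the Hull-Dobell theorem requires \(b\) to be coprime to \(d\).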
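The cycle length can also be computed directly rather than by counting distinct values. A minimal sketch with a hypothetical helper `lcg_period`. It assumes \(a\) is coprime to \(d\) (true for \(a = 1093\) and \(d = 86436 = 2^2 \cdot 3^2 \cdot 7^4\)), so every state has exactly one predecessor and the sequence must return to its seed. The second call uses an assumed increment \(b = 18257\), which is not from the original analysis. It was chosen because it meets the Hull-Dobell full-period conditions for this \(a\) and \(d\): \(b\) is coprime to \(d\), and \(a - 1 = 1092\) is divisible by 2, 3 and 7, and also by 4, as required because 4 divides \(d\).

```r
# Hypothetical helper: cycle length of r[i+1] = (a * r[i] + b) %% d.
# Assumes gcd(a, d) = 1, so the sequence is purely periodic and
# must come back to the seed s.
lcg_period <- function(s, a, b, d) {
  r <- (a * s + b) %% d
  steps <- 1
  while (r != s) {
    r <- (a * r + b) %% d
    steps <- steps + 1
  }
  steps
}

lcg_period(s = 12, a = 1093, b = 0, d = 86436)      # parameters used above
lcg_period(s = 12, a = 1093, b = 18257, d = 86436)  # assumed increment b
```

The first call returns 343 \(= 7^3\), matching the count above, and the second returns the full period 86436.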
The autocorrelation function (ACF) of the R base and linear congruential sequences is used to check whether the generated numbers are independent.
R base: the only autocorrelation spike is at lag 0 (which is always 1), so there is no evidence of autocorrelation in the sequence of 10,000 values.
Linear congruence: as expected from its short cycle, there is a spike at lag 343, where the sequence starts repeating.
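The spike at the cycle length can be checked numerically with `stats::acf`. A sketch that rebuilds the linear congruential sequence from the parameters above (\(s = 12\), \(a = 1093\), \(b = 0\), \(d = 86436\)) and compares it with R base:

```r
# Rebuild the linear congruential sequence used above
m <- 10000
r <- numeric(m)
r[1] <- 12
for (i in 1:(m - 1)) r[i + 1] <- (1093 * r[i]) %% 86436
con <- (r + 0.5) / 86436

# acf()$acf starts at lag 0, so lag k is element k + 1
ac <- acf(con, lag.max = 400, plot = FALSE)$acf
ac[343 + 1]          # close to 1: the cycle repeats every 343 values

set.seed(123)
ac_r <- acf(runif(m), lag.max = 400, plot = FALSE)$acf
max(abs(ac_r[-1]))   # R base: every lag >= 1 stays near 0
```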
Cumulative (coin-toss) test: each value is classified as above or below 0.5, and the running proportion of values above 0.5 is tracked.
This simulates 10,000 tosses of a fair coin, so we expected the proportion to converge to 0.5.
R base: the proportion did not converge to 0.5 for every seed; whether it approached 0.5 depended on the seed used, which was not expected.
Linear congruence: the performance was better than R base. However, the short cycle of 343 shows up as a periodic “teeth” pattern in the cumulative plot, which is not what was expected.
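The running proportion behind this test is a cumulative sum divided by the number of tosses. A minimal sketch with a hypothetical helper, treating a value above 0.5 as "heads":

```r
# Hypothetical helper: running proportion of values above 0.5
running_heads <- function(u) {
  cumsum(u > 0.5) / seq_along(u)
}

set.seed(123)
p <- running_heads(runif(10000))
tail(p, 1)   # proportion after all 10,000 tosses
plot(p, type = "l", ylim = c(0.4, 0.6),
     xlab = "toss", ylab = "proportion above 0.5")
abline(h = 0.5, lty = 2)
```

For reference, the standard deviation of the proportion after \(n\) fair tosses is \(\sqrt{0.25/n}\), which is 0.005 for \(n = 10{,}000\).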
Gap test: the values are classified as falling inside or outside the interval \([0, 0.5]\). The test counts the lengths of the gaps between successive values inside the interval and compares the observed count for each length with its expected value using a chi-square test.
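A sketch of how gap lengths can be counted. It is not randtoolbox's implementation, and randtoolbox may use a different length convention (its tables start at length 1). Here a gap is the number of values outside the interval between two consecutive values inside it, so it can be 0. Under \(H_0\) a gap has length \(k\) with probability \(p(1-p)^k\), where \(p\) is the interval width:

```r
# Hypothetical helper: gap lengths for the interval [lower, upper]
gap_lengths <- function(u, lower = 0, upper = 0.5) {
  hits <- which(u >= lower & u <= upper)
  diff(hits) - 1
}

gap_lengths(c(0.1, 0.7, 0.8, 0.2, 0.3, 0.9, 0.4))
# returns 2 0 1

# Observed vs expected counts for short gaps, with p = 0.5
set.seed(123)
g <- gap_lengths(runif(10000))
k <- 0:5
observed <- sapply(k, function(j) sum(g == j))
expected <- length(g) * 0.5 * (1 - 0.5)^k
rbind(k, observed, expected)
```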
R base: the chi-square statistic is 8.9 with 13 degrees of freedom (p-value 0.78), so the results are as expected and we do not reject \(H_0\).
```
##
##          Gap test
##
## chisq stat = 8.9, df = 13, p-value = 0.78
##
## (sample size : 10000)
##
## length  observed freq  theoretical freq
##      1           1284              1250
##      2            632               625
##      3            318               312
##      4            134               156
##      5             83                78
##      6             36                39
##      7             20                20
##      8             10               9.8
##      9              7               4.9
##     10              1               2.4
##     11              0               1.2
##     12              0              0.61
##     13              0              0.31
##     14              0              0.15
##
##          Gap test
##
## chisq stat = 56, df = 13, p-value = 3e-07
##
## (sample size : 10000)
##
## length  observed freq  theoretical freq
##      1           1165              1250
##      2            676               625
##      3            320               312
##      4            146               156
##      5            116                78
##      6             29                39
##      7             29                20
##      8              0               9.8
##      9              0               4.9
##     10              0               2.4
##     11              0               1.2
##     12              0              0.61
##     13              0              0.31
##     14              0              0.15
```
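Linear congruence: the chi-square statistic is 56 with 13 degrees of freedom (p-value \(3 \times 10^{-7}\)), so we reject the null hypothesis, as expected given the short cycle of 343; no gap longer than 7 is ever observed.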
In the 2D plot, R base shows random noise, as expected.
Linear congruence shows a grid pattern, a consequence of its short cycle. This is another way of detecting autocorrelation.
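A sketch of the 2D plot, assuming it shows each value against the next one (lag-1 pairs) and rebuilding the linear congruential sequence from the parameters above. Because the sequence repeats every 343 values, the 9,999 plotted pairs land on only 343 distinct points, which is why the plot looks like a sparse grid:

```r
# Hypothetical helper: consecutive pairs (u[i], u[i+1])
lag_pairs <- function(u) cbind(x = head(u, -1), y = tail(u, -1))

# Rebuild the linear congruential sequence used above
m <- 10000
r <- numeric(m)
r[1] <- 12
for (i in 1:(m - 1)) r[i + 1] <- (1093 * r[i]) %% 86436
con <- (r + 0.5) / 86436

pc <- lag_pairs(con)
nrow(unique(pc))   # number of distinct points in the 2D plot
plot(pc, pch = ".", xlab = "u[i]", ylab = "u[i+1]")
```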
Summary table:

test | R base | linear congruence | middle-square |
---|---|---|---|
histogram | ok | ok | rejected |
summary statistics | ok | ok | rejected |
chi-square | ok | ok | - |
independence | ok | rejected | - |
cycle length | ok | rejected | - |
coin toss | rejected | rejected | - |
gap | ok | rejected | - |
visualization | ok | rejected | - |
The middle-square method cannot be used in practice as a random number generator: it degenerates toward zero, the numbers it generates are not uniform, zeros appear with very high frequency, and it can fall into very short cycles.
The main issue with the linear congruential generator was its cycle length: at 343 it is far too short for this use and must be increased. It did, however, give a good final result in the coin-toss test, converging to 0.5.
The visualizations show that R's generator behaves as expected for a good pseudorandom number generator; however, we found some issues with its dynamic behavior. The R generator did not work well in the coin-toss test: the result depended strongly on the seed, converging to 0.5 for some seeds and not for others. Further analysis may improve these results.
For intensive use of random numbers, such as simulation, we recommend testing the chosen seed, at least to check whether the coin-toss proportion converges.
The dynamic methods were more sensitive in detecting patterns (problems) in the random sequences, in particular the \((x, y)\) plot, the cumulative coin-toss sequence and the 2D plot.