TD2: Générateurs pseudo-aléatoires (RICM4, 2013)

L'objectif de cette fiche est de présenter les méthodes principales de génération de nombres pseudo-aléatoires, de comprendre leurs limitations, d'apprendre à s'en méfier et de voir comment éventuellement les corriger…

Générateurs à base de congruence

congruent_random <- function(n = 100, a = 4, b = 2, m = 9, x1 = 1) {
    res = c(x1)
    for (i in 2:n) {
        res[i] = (a * res[i - 1] + b)%%m
    }
    res
}

Let's check whether this seems correct or not:

congruent_random(10)

##  [1] 1 6 8 7 3 5 4 0 2 1

Now, let's make a few simple graphical checks

graphical_check <- function(X) {
    par(mfrow = c(1, 2))
    hist(X)
    plot(X[1:length(X) - 1], X[2:length(X)])
}
graphical_check(congruent_random(100))

plot of chunk unnamed-chunk-3

Here is the ggplot2 version, just in case:

library(ggplot2)
require(gridExtra)

## Loading required package: gridExtra
## Loading required package: grid

graphical_check_ggplot <- function(X) {
    par(mfrow = c(1, 2))
    df <- data.frame(x = X[1:length(X) - 1], y = X[2:length(X)])
    p1 <- ggplot(df, aes(x = x)) + geom_histogram() + theme_bw()
    p2 <- ggplot(df, aes(x = x, y = y)) + geom_point() + theme_bw()
    grid.arrange(p1, p2, ncol = 2)
}
graphical_check_ggplot(congruent_random(100))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-4

OK, obviously, one cannot expect much from a generator whose space state is so small. Let's try with another one:

graphical_check_ggplot(congruent_random(n = 100, a = 11, b = 1, m = 71))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-5

Yet, it you ever computed the correlation of {xn} and {x{n+1}} , you could have oversee this relation:

X <- congruent_random(n = 1000, a = 11, b = 1, m = 71)
cor(X[1:length(X) - 1], X[2:length(X)])

## [1] 0.04578

Co-variance would have helped though.

cov(X[1:length(X) - 1], X[2:length(X)])

## [1] 19.09

Fortunately, some generator are better than others:

graphical_check_ggplot(congruent_random(n = 1000, a = 24298, b = 99991, m = 199017))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-8

Anyway, you should always be suspicious with random number generators. After all, although the previous two graphical checks seem comforting, there are probably other biases left…

Générateurs à décalage de registres

X = (xor_random <- function(n = 100, filter = c(1, 0, 1, 0), seed = c(1, 0, 
    1, 0)) {
    res = seed
    m = length(filter)
    print(m)
    for (i in (length(seed)):(n - 1)) {
        # Yes, you need to put parenthesis around n-1 :(
        res[i + 1] = sum(filter * (res[(i - m + 1):i]))%%2  # Yes, you need to put parenthesis around i-m+1 :(
    }
    res
})
xor_random()

## [1] 4

##   [1] 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0
##  [36] 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0
##  [71] 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0

xor_random(seed = c(1, 0, 0, 1))

## [1] 4

##   [1] 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1
##  [36] 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1
##  [71] 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1

Yuck. This is definitely not random… Actually, there are two interleaved sequences. The length of the cycle is 8. Let's try something different.

xor_random(filter = c(1, 1, 0, 0), seed = c(1, 1, 0, 1))

## [1] 4

##   [1] 1 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0
##  [36] 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0
##  [71] 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0

That's better. Now the length of the cycle is 16.

Well, maybe those numbers are just intrinsically better!