Noorshat 2022/9/30
u = runif(2000) ## generate a sample of unif(0,1) samples
hist(u, probability=TRUE, col = "gray", border = "white") ## plot histogram## Q-Q plot for `runif` data against true theoretical distribution:
qqplot(x = qunif(ppoints(500)), y = u, main = expression("Q-Q plot for Unif(0,1)"))
qqline(y = u, distribution = qunif, prob = c(0.1, 0.6), col = 2)acf(x = u, main = "autocorrelation of samples") ## autocorrelation functionOnce we can simulate \(u-Unif(0,1)\), we can begin generating samples from other target distributions.\(v\sim Unif(0,10)\), where we know that \(v=10\times u\). Therefore,
v = 10 * runif(1000, 0, 1) ## v ~ unif(0, 10)
hist(v, col = "gray", border = "white")w = runif(1000, 0, 10) ## unif(0, 10)
hist(w, col = "gray", border = "white")Another use of generating samples is to generate Bernoulli. Once we’ve generated Bernoulli samples, we can sum the Bernouilli samples to generate Binomial samples. $$ uUnif(0,1)\ x=I(up) xBernoulli(p)\ y=^{n}_{i=1}x_i yBin(n,p)
$$
## let's simulate 10,000 samples from bin(n=10, p = 0.3)
N = 10000 ## number of Unif and Binomial samples
n = 10 ## number of bernoulli trials for each bernoulli sample
p = 0.3 ## theoretical p
u = runif(n = (n*N))
x = as.numeric(u <= p) ## convert bools to 1 and 0; Bernoulli samples
m = matrix(data = x , nrow = N, ncol = n)
Binom_samples = rowSums(m) ## binomial samples
par(mfrow = c(1,2), mar = c(3, 4, 1, 2))
hist(Binom_samples, probability = TRUE, main = "Binom(10,0.3) from Unif(0,1)",
col = "blue", border = "white")
hist(rbinom(n = N, size = n, prob = p), probability = TRUE, main = "rbinom(10,0.3)",
col = "gray", border = "white")The inverse transform method is a simple algorithm for generating random variables \(x\) from a continuous target distribution \(f(x)\) using random samples from a \(Unif(0,1)\) distribution.
Theorem (Probability Integral Transformation):If is a continuous random variable with CDF \(F_X(X)\),then \(U=F_X(X)-Unif(0,1)\). If \(U\sim Unif(0,1)\), then for all \(x\in \mathbf{R}\) $$
$$ For example, suppose we are interested in generating \(n=10,000\) random values from an Exponential distribution.
1.\(f(X)=\lambda e^{-\lambda X}\)
2.\(F(X) = 1-e^{-\lambda X}=U\)
3.\(F^{-1}(U) = -\frac{1}{\lambda \log(1-U)}\); can use \((1-u)\) or \(u\), since both are uniformly distributed.
If we set \(\lambda = 5\), then
N = 10^4
u = runif(N)
fInv = function(u){
(-1/5)*log(u) ## or log(1-u)
}
outSamples = fInv(u)
hist(outSamples, probability = TRUE, main = "Inv Trans", col = "gray", border = "white")
lines(x = ppoints(200), y = dexp(x = ppoints(200), rate = 5),
col = "blue")hist(rexp(n = N, rate = 5), probability = TRUE, main = "rexp", border = "gray")
lines(x = ppoints(200), y = dexp(x = ppoints(200), rate = 5),
col = "blue")To generate samples, for(i in 1:n)
Generate \(Y\sim gY(t)\) and \(U\sim Unif(0,1)\)
If \(U\leq \frac{f(Y)}{M\times g(Y)}\) then we accept \(Y\), such that \(Y=X\)
Repeat until you have sufficient samples
In order for the algorithm to work we require the following constraings:
\(f\) and \(g\) have to have compatible supports (i.e. \(g(x)>0\) when \(f(x)>0\))
There is a constant \(M\) such that \(\frac{f(t)}{g(t)}\leq M\)
For example: Suppose we’d like to generate samples from \(\beta(2,2)\) distribution. The density function for \(\beta(2,2)\) is simply \(f(x)=6x(1-x)\) for \(0<x<1\). Since our domain is between 0 and 1, we can use a simple \(Unif(0,1)\) density as our instrumental density, \(g\).
Then, by the accept-reject algorithm we can simulate a random variable \(Y\sim g\), and a random variable \(U\sim Unif(0,1)\). Then, if\(M\times U\leq \frac{f(Y)}{g(Y)}\), we accept the candidate variable \(Y\sim g\) as \(X\), \(X=Y\). Otherwise, we reject and simulate again until we get an appropriate sample size.
## Accept-Reject
M = 1.5
X = rep(x = NA, 5) ## create a vector of length 5 of NAs
set.seed(123)
f <- function(x){ 6*x*(1 - x)} ## pdf of Beta(2,2)
g <- function(x){ 1 } ## pdf of Unif(0,1) is just 1
n = 10000
## First, we'll generate 5 samples to test the algorithm
for(i in 1:5){
print(paste("Run: ", i))
u = round(runif(1), 5)
y = round(runif(1), 5)
accept <- u <= f(y)/(M* g(y))
print(paste("U: ", u, "and Y:", y, "and f/M*g: ",f(y)/(M* g(y))))
print(paste("Accept? ", accept))
if(accept){
X[i] <- y
}
}## [1] "Run: 1"
## [1] "U: 0.28758 and Y: 0.78831 and f/M*g: 0.6675093756"
## [1] "Accept? TRUE"
## [1] "Run: 2"
## [1] "U: 0.40898 and Y: 0.88302 and f/M*g: 0.4131827184"
## [1] "Accept? TRUE"
## [1] "Run: 3"
## [1] "U: 0.94047 and Y: 0.04556 and f/M*g: 0.1739371456"
## [1] "Accept? FALSE"
## [1] "Run: 4"
## [1] "U: 0.52811 and Y: 0.89242 and f/M*g: 0.3840261744"
## [1] "Accept? FALSE"
## [1] "Run: 5"
## [1] "U: 0.55144 and Y: 0.45661 and f/M*g: 0.9924692316"
## [1] "Accept? TRUE"
## Now, say we needed n = 10, 000 samples from Beta(2, 2), then a better implementation would be
X = rep(NA, n); M = 1.5
i = 0 ## index set to start at 0
while(sum(is.na(X))){
U = runif(1); Y = runif(1)
accept <- U <= f(Y)/(M*g(Y))
if(accept){
i = i+1 ## update the index
X[i] <- Y
}
}
round(summary(X), 4)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0037 0.3250 0.4962 0.5000 0.6767 0.9930
round(qbeta(p = c(0, 0.25, 0.5, 0.75, 1), 2, 2), 4)## [1] 0.0000 0.3264 0.5000 0.6736 1.0000
## plot 1
hist(X, xlab = "X", main = "Beta(2,2) from Accept-Reject algorithm",
probability = TRUE)
beta <- function(x) 6*x*(1-x)
curve(expr = beta, from = 0, to = 1, add = TRUE)## plot 2
curve(expr = beta, from = 0, to = 1,
xlim = c(-0.5, 1.5), ylim = c(0,2),
main = "Beta(2,2) Density with Unif(0,1)", xlab = "x", ylab = "density")
x = seq(from = -1, to = 2, by = 0.01)
Unif1 = function(x){ ifelse(x >= 0 & x <= 1, 1, 0) }
polygon(x, Unif1(x), lty = 9)## plot 3
curve(expr = beta, from = 0, to = 1,
xlim = c(-0.2, 1.2), ylim = c(0,2),
main = "Beta(2,2) with M*Unif(0,1)", xlab = "x", ylab = "density")
Unif2 = function(x){ ifelse(x >= 0 & x <= 1, 1*M, 0) }
polygon(x, Unif2(x), lty = 2)par(mar = c(3, 4, 1, 2))
a = 3
b = 10
A = matrix(rexp(10000*a, 1), ncol = a)
B = matrix(rexp(10000*b, 1), ncol = b)
numerator = rowSums(A)
denom = rowSums(A) + rowSums(B)
betasamples = numerator/denom
hist(betasamples, probability=TRUE, col = "gray", border = "white")
curve(dbeta(x, a, b), from=0, to=1, add=TRUE, lty=3, lwd = 2)n = 10000
x = rexp(n=n, rate = 1)
mat = matrix(data = x, ncol = 4)
chisqResults = rowSums(mat)
par(mfrow = c(1, 3))
hist(chisqResults)
hist(rchisq(n, df = 2*4))
qqplot(x = chisqResults, y = rchisq(n, 8))