library(ggplot2)
library(kableExtra)
Generate 99 random numbers using the congruential generator \(r_{i+1} = (a r_i + b) \bmod d\), where \(a\) is the multiplier, \(b\) is the increment, and \(d\) is the modulus of the generator. Substitute \(a = 17\), \(b = 0\), \(d = 100\), and seed \(r_1 = 13\) to generate 99 pseudorandom numbers in R. What is the cycle length? What can you do to make it better?
#writing function for congruential generator
con.gen <- function(a, b, d, seed, n){
  x <- rep(0, n)       # pre-allocate output vector of length n
  x[1] <- seed         # the first value is the seed
  for(i in 2:length(x)){
    x[i] <- (a*x[i-1] + b) %% d   # congruential recursion
  }
  return(x)
}
#Generate pseudorandom numbers with a = 17, b = 0, d = 100, and seed r1 = 13, then print them
solution1 <- con.gen(a = 17, b = 0, d = 100, seed = 13, n = 100)
print(solution1)
## [1] 13 21 57 69 73 41 97 49 33 61 37 29 93 81 77 9 53 1 17 89 13 21 57 69 73
## [26] 41 97 49 33 61 37 29 93 81 77 9 53 1 17 89 13 21 57 69 73 41 97 49 33 61
## [51] 37 29 93 81 77 9 53 1 17 89 13 21 57 69 73 41 97 49 33 61 37 29 93 81 77
## [76] 9 53 1 17 89 13 21 57 69 73 41 97 49 33 61 37 29 93 81 77 9 53 1 17 89
#View(solution1) # this will give you a better view while working on code
The cycle length of this congruential generator is 20, since the seed value 13 reappears at the twenty-first iteration. Note: here, as with any congruential generator with \(b = 0\), one must never use the seed \(r_1 = 0\). Also, because \(d = 100\), the cycle length cannot exceed 100, but this generator's cycle is clearly much shorter. We can try to improve it by choosing different parameter values (multiplier, increment, and seed). For example:
solution1.2 <- con.gen(a = 16, b = 3, d = 100, seed = 11, n = 100)
print(solution1.2)
## [1] 11 79 67 75 3 51 19 7 15 43 91 59 47 55 83 31 99 87 95 23 71 39 27 35 63
## [26] 11 79 67 75 3 51 19 7 15 43 91 59 47 55 83 31 99 87 95 23 71 39 27 35 63
## [51] 11 79 67 75 3 51 19 7 15 43 91 59 47 55 83 31 99 87 95 23 71 39 27 35 63
## [76] 11 79 67 75 3 51 19 7 15 43 91 59 47 55 83 31 99 87 95 23 71 39 27 35 63
#View(solution1.2)
The cycle length of this congruential generator is 25, since the seed value 11 reappears at the twenty-sixth iteration. Not much improvement. We can keep trying, or read up on the theory (https://en.wikipedia.org/wiki/Linear_congruential_generator) and use it to choose better parameters.
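As a small aside (an addition, not part of the original solution), the cycle length can also be found programmatically rather than by inspecting the printed output. The helper below is a minimal sketch; it assumes the sequence is purely periodic starting from the seed, which holds for the generators used here.
#Helper (hypothetical) to find the gap between the seed and its first recurrence
cycle.length <- function(x){
  repeats <- which(x[-1] == x[1])      # offsets at which the seed value recurs
  if(length(repeats) == 0) return(NA)  # no repeat observed within the sample
  repeats[1]
}
cycle.length(solution1)    # 20
cycle.length(solution1.2)  # 25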
Write down the R script that uses the congruential generator with \(a = 13\), \(b = 1\), \(d = 64\), and seed \(r_1 = 17\).
Print out 100 numbers from this generator and show that it has the maximum possible cycle length \(h = 64\), which is still too small for a useful generator. Verify that the period is 64.
Plot successive pairs \((r_i, r_{i+1})\) to illustrate a more subtle difficulty. For truly random numbers, uniformly distributed in the interval \([0, 1)\), such pairs would fill the unit square evenly.
Hint: For any congruential generator, a plot of the pairs \((r_i, r_{i+1})\) will not truly “fill” the unit square. The points will lie along a number of lines, which cannot exceed \(\sqrt{d}\), where \(d\) is the modulus (and may be much smaller than that).
solution2 <- con.gen(a = 13, b = 1, d = 64, seed = 17, n = 100)
print(solution2)
## [1] 17 30 7 28 45 10 3 40 9 54 63 52 37 34 59 0 1 14 55 12 29 58 51 24 57
## [26] 38 47 36 21 18 43 48 49 62 39 60 13 42 35 8 41 22 31 20 5 2 27 32 33 46
## [51] 23 44 61 26 19 56 25 6 15 4 53 50 11 16 17 30 7 28 45 10 3 40 9 54 63
## [76] 52 37 34 59 0 1 14 55 12 29 58 51 24 57 38 47 36 21 18 43 48 49 62 39 60
Since the seed value 17 does not reappear until the 65th position of the vector (again followed by 30), the cycle length is 64, the maximum possible for \(d = 64\).
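As an extra check (an addition to the original solution), the full period can be confirmed directly: the first 64 values should be a permutation of the residues 0 through 63, and the 65th value should equal the seed again.
#Verify the full period of solution2
length(unique(solution2[1:64]))   # 64 distinct values before the first repeat
solution2[65] == solution2[1]     # TRUE: the seed reappears at position 65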
#Rescaling solution2 to fit U(0,1)
solution2.unif <- (solution2 + 0.5)/64
#Plotting (r_i, r_{i+1}) pairs in the Unit Square
mydata<- data.frame("ri" = solution2.unif[1:99], "ri1" = solution2.unif[2:100])
square.plot <- ggplot(data = mydata) + geom_point(aes(x = ri, y = ri1))
print(square.plot)
As the graph shows, the points do not spread evenly over the unit square; instead they fall along a small number of lines, as the hint predicts. Also, only 64 distinct points are visible, because the sequence repeats starting at position 65, so the later pairs are plotted on top of the earlier ones.
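For contrast (this comparison is an addition, using R's built-in uniform generator as the “truly random” reference), successive pairs from runif() scatter across the whole unit square instead of lining up along a few lines.
#Comparison plot: successive pairs from R's built-in uniform generator
set.seed(2024)                     # arbitrary seed, chosen only for reproducibility
u.comp <- runif(100)
comp.data <- data.frame("ri" = u.comp[1:99], "ri1" = u.comp[2:100])
print(ggplot(data = comp.data) + geom_point(aes(x = ri, y = ri1)))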
Generate 100 random numbers from the theoretical density \(f(x) = 3x^2\) using the inverse transform technique for continuous random variables. Construct a probability density histogram of the generated sample and superimpose the true density curve on it to compare the shapes. Label it properly as “Probability density histogram of a random sample generated by the inverse transform with theoretical density \(f(x) = 3x^2\).”
First, note that since
\[f_X(x) = \begin{cases} 3x^2 & 0<x<1,\\ 0, &\text{else,} \end{cases}\]
we have that,
\[F_X(x) = \int_{-\infty}^{x}f_X(t)\, dt = \int_{0}^{x}3t^2\, dt = x^3,\]
with \(0<x<1\).
Therefore,
\[F_X^{-1}(u) = u^{\frac{1}{3}},\]
with \(0<u<1\).
Therefore, using the inverse transform technique, we generate a vector of 100 uniform random numbers and apply \(F_X^{-1}\) to obtain a vector of random numbers from \(f_X(x) = 3x^2\).
#Generating the Random Values
set.seed(53)
u <- runif(100)
x <- u^(1/3)
#Creating the histogram with ggplot2; this alternative approach was contributed by a former student who was an expert in ggplot. You can stick to the simpler code discussed in class and provided in the code file
seqvalues <- seq(0.01,1,.01)
theoretical <- 3*seqvalues^2
pltdata <- data.frame("Seq" = seqvalues, "Theoretical" = theoretical, "Empirical" = x)
plt <- ggplot(data = pltdata) + theme_minimal()
plt <- plt + geom_histogram(aes(x = Empirical, y = ..density..), color = 'grey', fill = 'green', binwidth = .05)
plt <- plt + geom_area(aes(x = Seq, y = Theoretical), alpha = .3, fill = 'red')
plt <- plt + ggtitle(paste("Probability density histogram of a random sample generated by the inverse","\n" ,"transform with theoretical density"), subtitle = expression(paste(f(x), '=', 3*x^2))) + xlab("")
print(plt)
The histogram closely follows the superimposed density curve, indicating that the generated sample is consistent with the theoretical density \(f_X(x) = 3x^2\).
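As an optional numeric complement to the visual check (an addition, not part of the original solution), a Kolmogorov–Smirnov test can compare the generated sample against the theoretical CDF \(F_X(x) = x^3\); a large p-value is consistent with the sample coming from \(f_X(x) = 3x^2\).
#Optional check: KS test of the sample x against the theoretical CDF x^3
ks.test(x, function(q) q^3)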
If \(U\) ~ Unif(0,1), then show (empirically) that \(U^{\frac{1}{2}}\) ~ Beta(2,1). What is the CDF of Beta(2,1)?
First, notice that if \(f_X(x)\) is the pdf of Beta(2,1) then,
\[f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha - 1}(1-x)^{\beta - 1} = \frac{2!}{1!0!}x^{2-1}(1-x)^{1-1} = 2x.\]
with \(x \in (0,1).\)
Therefore, the cdf of \(X\) is
\[F_X(x) = \int_{-\infty}^{x}f_X(t)\,dt = \int_0^x 2t\,dt = x^2.\] Indeed, \(P(U^{1/2} \le x) = P(U \le x^2) = x^2 = F_X(x)\) for \(x \in (0,1)\), so \(U^{\frac{1}{2}}\) has exactly this CDF. Now, we will demonstrate empirically that \(U^{\frac{1}{2}} \sim \text{Beta}(2,1)\).
#Creating vector of random numbers from U^1/2
u <- runif(10000)
x <- u^(1/2)
#Creating a fancier plot using ggplot; again, this code was contributed by Luke, a previous student. You can stick to the plain code provided in the codebook and in class
df <- data.frame("Theoretical" = qbeta(ppoints(10000),2,1), "Empirical" = x)
plt1 <- ggplot(data = df) + geom_histogram(aes(x = Empirical, y = ..density..), color = "grey", fill = "green", bins = 30)
plt1 <- plt1 + geom_density(aes(x = Theoretical), fill = "red", alpha = .3)
plt1 <- plt1 + ggtitle("Plot of empirical x against density plot of Beta(2,1)")
print(plt1)
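As with the previous part, the visual comparison can optionally be backed by a Kolmogorov–Smirnov test against the Beta(2,1) CDF (this check is an addition, not part of the original solution).
#Optional check: KS test of x = u^(1/2) against Beta(2,1)
ks.test(x, "pbeta", 2, 1)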
Let \(X\) have CDF \(F(x) = 1 - e^{-x^2}\), for \(x > 0\). Use the inverse transform technique to simulate 100,000 realizations of \(X\), and thus approximate \(E(X)\). Hint: your answer should be close to 0.886.
First, since the formula \(1 - e^{-x^2}\) applies only for \(x > 0\) and \(F_X(0) = 1 - e^{0} = 0\), the full CDF is
\[F_X(x) = \begin{cases} 1 - e^{-x^2} & x\geq 0, \\ 0 & \text{else.} \end{cases}\]
Then, notice that,
\[\begin{align*}u = 1 - e^{-x^2} &\Rightarrow e^{-x^2} = 1-u \\ &\Rightarrow x^2 = -\text{ln}(1-u) \\ &\Rightarrow x = \sqrt{\text{ln}(\frac{1}{1-u})}.\\ \end{align*}\]
Thus,
\[F^{-1}_X(u) = \sqrt{\text{ln}(\frac{1}{1-u})}.\]
Thus, we use the following code to generate 100,000 realizations of \(X\).
#Generate X
set.seed(1234)
u <- runif(100000)
temp <- 1/(1-u)
temp2 <- log(temp)
x <- sqrt(temp2)
#Approximate E(X)
expected.x<-mean(x)
print(round(expected.x,3))
## [1] 0.887
As can be seen, \(E(X) \approx 0.887\), which is close to the hinted value of 0.886.
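As a final optional check (an addition to the original solution), the density implied by the CDF is \(f_X(x) = F_X'(x) = 2x e^{-x^2}\) for \(x > 0\), so \(E(X)\) can also be computed by numerical integration; the exact mean is \(\sqrt{\pi}/2 \approx 0.886\), matching the hint.
#Optional check: E(X) by numerical integration of x * f(x), with f(x) = 2*x*exp(-x^2)
integrate(function(x) x * 2 * x * exp(-x^2), lower = 0, upper = Inf)
sqrt(pi)/2   # exact value, approximately 0.8862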