\(\mathbf{A. \space Random \space Number \space Generation:}\) (A good generator should produce output indistinguishable from true randomness, be reproducible, be fast and simple, and not repeat too soon)

\(\mathbf{Linear \space congruential \space generator}\) (the basis of most PRNG algorithms):

\[X_n=\left( aX_{n-1}+c \right) \space mod \space m, \space n=1,2...\]

The output of the linear congruential sequence is \(U_n=\frac{X_n}{m}\)

Note: the period of the algorithm can reach its maximum \(m\) only if \(c\) and \(m\) are relatively prime (among other conditions). R example:

# Linear congruential generator: x[i+1] = (a*x[i] + c) mod m
lcg <- function(n, m, a, c, seed) {
  x <- numeric(n); x[1] <- seed
  for (i in 1:(n - 1)) x[i + 1] <- (a * x[i] + c) %% m
  x
}
lcg(1000, 57, 1, 256, 10)[1:5]
## [1] 10 38  9 37  8

\(\mathbf{Multiple \space Recursive \space Generator:}\)

\[\begin{equation} \begin{split} X_n &= \left( 1403580X_{n-2}-810728X_{n-3} \right) \space mod \space m_1 \quad (m_1=2^{32}-209) \\ Y_n &= \left( 527612Y_{n-1}-1370589Y_{n-3} \right) \space mod \space m_2 \quad (m_2=2^{32}-22853) \end{split} \end{equation}\]

and outputs

\[U_n = \left\{ \begin{array}{ll} \frac{X_n-Y_n+m_1}{m_1+1} & \mbox{if } X_n \le Y_n \\ \frac{X_n-Y_n}{m_1+1} & \mbox{if } X_n > Y_n \end{array} \right.\]
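
A minimal R sketch of this combined generator, assuming arbitrary length-3 seed vectors for the two recursions (the function name `mrg` and the seed values are illustrative):

# Combined multiple recursive generator sketch (seeds are arbitrary illustrative values)
mrg <- function(n, x0 = c(12345, 12345, 12345), y0 = c(12345, 12345, 12345)) {
  m1 <- 2^32 - 209; m2 <- 2^32 - 22853
  x <- c(x0, numeric(n)); y <- c(y0, numeric(n)); u <- numeric(n)
  for (i in 4:(n + 3)) {
    x[i] <- (1403580 * x[i - 2] - 810728 * x[i - 3]) %% m1   # %% wraps negatives into [0, m1)
    y[i] <- (527612 * y[i - 1] - 1370589 * y[i - 3]) %% m2
    u[i - 3] <- if (x[i] <= y[i]) (x[i] - y[i] + m1) / (m1 + 1) else (x[i] - y[i]) / (m1 + 1)
  }
  u
}
mrg(5)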

\(\mathbf{Testing \space Random \space Number \space Generators}\)

\(\mathbf{Permutation \space Test:}\) Order the components of \(\mathbf{U}=\left( U_1, \dots , U_n \right)\) from smallest to largest and let \(\Pi\) denote the corresponding ordering of indices. Under \(H_0\), \(\Pi\) is uniformly distributed over all possible orderings \(\pi\). To test \(H_0\), treat each possible ordering as a bin and use \(\chi^2 = \sum \frac{(v-E(v))^2}{E(v)}\), where \(v\) is the observed count in a bin, with \(df=\) # of bins \(- 1\).
\[P\left( \Pi=\pi\right)=\frac{1}{n!}\]
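
A minimal sketch of the permutation test in R, assuming the sequence is split into groups of size \(d=3\) (so there are \(3!=6\) bins):

# Permutation test sketch: split the output into groups of d = 3, record the
# within-group ordering of indices, and compare bin counts to the uniform 1/d!
set.seed(1)
d <- 3; n_groups <- 500
u <- matrix(runif(d * n_groups), ncol = d)
perm <- apply(u, 1, function(g) paste(order(g), collapse = ""))
lvls <- c("123", "132", "213", "231", "312", "321")          # the d! possible orderings
obs <- table(factor(perm, levels = lvls))
chisq.test(obs, p = rep(1 / factorial(d), factorial(d)))     # df = d! - 1 = 5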

\(\mathbf{Gap \space Test:}\) Let \(T_i\) denote the times when the output process \(U_i\) visits a specified interval \(\left(\alpha, \beta \right) \subset (0,1)\) and let \(Z_i\) denote the gap lengths between subsequent visits, that is, \(Z_i=T_i-T_{i-1}, \space T_0=0\). Under \(H_0\) the distribution of \(Z\) is \(\mathbf{Z \sim Geo(p)}\), where \(p=\) the success probability \(=\beta-\alpha\). \(P(Z=z)=(1-p)^{z-1}p\) if we define \(Z\) as the number of steps up to and including the success; \(P(Z=z)=(1-p)^zp\) if we define \(Z\) as the number of failures before the first success. R example:

# Gap test: times T_i when U_i visits (0.5, 1], and gaps Z_i = T_i - T_{i-1} (T_0 = 0)
visits <- function(n) { set.seed(10); u <- runif(n, 0, 1); which(u > .5 & u <= 1) }
t <- visits(100)
z <- diff(c(0, t))   # gap lengths
head(z)
## [1] 1 3 5 2 1 2

A gap test will bin the gaps into groups and perform a \(\chi^2\) GOF test on the results.
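
Continuing the example above (interval \((0.5, 1]\), so \(p=0.5\)), a minimal sketch of this binned \(\chi^2\) GOF step, assuming bins \(\{1, 2, 3, \ge 4\}\):

# Chi-square GOF on the gap lengths z from the example above, with p = 0.5
p <- 0.5
obs <- table(cut(z, breaks = c(0, 1, 2, 3, Inf)))   # bins: 1, 2, 3, >= 4
probs <- c(dgeom(0:2, p), 1 - pgeom(2, p))          # P(Z = 1), P(Z = 2), P(Z = 3), P(Z >= 4)
chisq.test(obs, p = probs)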

\(\mathbf{Maximum-of-d \space Test:}\) Generate \(n\) vectors \(\mathbf{U_1,\dots,U_n}\), each of length \(d\). For each \(U_i=(U_{i1},\dots,U_{id})\) let \(Z_i=\max\{U_{i1},\dots,U_{id}\}\) be the maximum. Under \(H_0\) the distribution of \(Z\) is \(P(Z\le z)=z^d\) for \(0 \le z \le 1\). Apply the Kolmogorov-Smirnov test to compare the empirical distribution of \(Z_1, \dots, Z_n\) to this \(CDF\).
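
A minimal R sketch, assuming \(d=5\) and \(n=200\):

# Maximum-of-d test sketch: compare the maxima to the CDF z^d with a KS test
set.seed(2)
d <- 5; n <- 200
z <- apply(matrix(runif(n * d), nrow = n), 1, max)   # Z_i = max of each length-d vector
ks.test(z, function(q) q^d)                          # H0 CDF: P(Z <= z) = z^d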

\(\mathbf{Generating \space Non-Uniform \space Random \space Variables}\)

\(\mathbf{The \space Inverse \space Transform \space Method:}\) (shown by example)

\[f(x) = \left\{ \begin{array}{ll} \frac{1}{2}x & \mbox{if } 0 < x < 1 \\ \frac{1}{2} & \mbox{if } 1 \le x \le \frac{5}{2} \end{array} \right.\]

For the interval \(0<x<1\): \(F(x)=\int_0^x \frac{1}{2}t\,dt=\left.\frac{1}{4}t^2\right|_0^x=\frac{1}{4}x^2=u \implies \space F^{-1}(u) = 2u^{\frac{1}{2}}\) (for \(0<u<\frac{1}{4}\))

For the interval \(1\le x \le \frac{5}{2}\): \(F(x)=\int_0^1 \frac{1}{2}t\,dt + \int_1^x \frac{1}{2}\,dt=\frac{1}{2}x - \frac{1}{4}=u \implies \space F^{-1}(u) = 2u+\frac{1}{2}\) (for \(\frac{1}{4}\le u \le 1\))
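
A minimal R sketch of sampling from this \(f\) via the piecewise \(F^{-1}\) (the function name `rf_inv` is illustrative):

# Inverse-transform sampler for the piecewise pdf above
rf_inv <- function(n) {
  u <- runif(n)
  ifelse(u < 1/4, 2 * sqrt(u), 2 * u + 1/2)     # F^{-1}(u) on each piece
}
set.seed(3)
hist(rf_inv(10000), breaks = 50, freq = FALSE)  # should resemble f(x)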

\(\mathbf{Accept-Reject \space Method:}\) \(1).\) Generate \(Y\sim g\); \(2).\) Generate \(U \sim U[0,1]\) (independent of \(Y\)); \(3).\) Return \(X=Y\) if \(U \le \frac{f(Y)}{Mg(Y)}\). Otherwise reject \(Y\) and start over. (Shown by example)

(Cont.) Let \(g(x)= \frac{8}{25}x, \space 0≤x≤\frac{5}{2}\). Based on the \(pdf\) \(f(x)\), the smallest constant \(M\) that satisfies \(\frac{f(x)}{g(x)} \le M\) is \(M=\frac{25}{16}\). Suppose the candidate \(y=1.8\) and \(u=0.3\) are generated. Since \(y=1.8\) lies in the interval \(1\le x \le\frac{5}{2}\), we use \(f(y)=\frac{1}{2}\).

\[\begin{equation} \begin{split} \frac{f(y)}{Mg(y)} & = \frac{\frac{1}{2}}{M\frac{8}{25}y}\approx 0.56\\ \end{split} \end{equation}\] Hence, \(u=0.3 \le \frac{f(y)}{Mg(y)}\), so we return \(x=1.8\)
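
A minimal R sketch of this accept-reject sampler (the helper `rg` samples from \(g\) by inverse transform; the function names are illustrative):

# Accept-reject sampler for f using the envelope g(x) = (8/25) x with M = 25/16
f <- function(x) ifelse(x < 1, x / 2, 1 / 2)
g <- function(x) 8 / 25 * x
rg <- function(n) 5 / 2 * sqrt(runif(n))   # inverse transform for g: G(x) = (4/25) x^2
M <- 25 / 16
ar_sample <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- rg(n); u <- runif(n)
    out <- c(out, y[u <= f(y) / (M * g(y))])   # keep accepted candidates
  }
  out[1:n]
}
set.seed(4)
mean(ar_sample(10000) < 1)   # should be close to P(X < 1) = 1/4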

\(\mathbf{Box-Muller \space Algorithm:}\) If \(U_1\) and \(U_2\) are \(iid \space U[0,1]\) random variables, then the variables \(X_1\) and \(X_2\) defined by \[X_1 = \sqrt{-2\log(U_1)}\cos(2\pi U_2);\space X_2 = \sqrt{-2\log(U_1)}\sin(2\pi U_2)\] are independent \(N(0,1)\) random variables.
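
A minimal R sketch (the function name `box_muller` is illustrative):

# Box-Muller transform: two uniforms in, two independent standard normals out
box_muller <- function(n) {
  u1 <- runif(n); u2 <- runif(n)
  r <- sqrt(-2 * log(u1))
  c(r * cos(2 * pi * u2), r * sin(2 * pi * u2))   # 2n independent N(0,1) draws
}
set.seed(5)
qqnorm(box_muller(5000))   # points should fall close to a straight line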

\(\mathbf{B. \space Kernel \space Density \space Estimation: \hat{f}(x_0)=\frac{1}{nh}\sum^n_{i=1}K\left( \frac{x_0-x_i}{h} \right)}\)

\(\mathbf{Uniform:}\) \(K(x)=\frac{1}{2} \mathbf{1}_{\left\{|x|<1 \right\}}\) ; \(\mathbf{Triangular:}\) \(K(x)=\left( 1-|x| \right)\mathbf{1}_{\left\{ |x|<1 \right\}}\) ;

\(\mathbf{Gaussian:}\) \(K(x)=\frac{1}{\sqrt{2\pi}}exp\left( -\frac{1}{2}x^2 \right)\) ; \(\mathbf{Epanechnikov:}\) \(\frac{3}{4}\left( 1-x^2 \right)\mathbf{1}_{\left\{ |x|<1 \right\}}\)
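
A minimal R sketch of the estimator \(\hat{f}(x_0)\) above with a Gaussian kernel (the function name `kde_at` is illustrative):

# Kernel density estimate at a single point x0: (1/(n h)) * sum K((x0 - x_i)/h)
kde_at <- function(x0, x, h) {
  mean(dnorm((x0 - x) / h)) / h
}
set.seed(6)
x <- rnorm(100)
kde_at(0, x, h = 0.4)
# density(x, bw = 0.4, kernel = "gaussian") evaluates the same estimate on a grid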

\(\mathbf{Measures \space of \space Performance:}\) \[ISE(h)= \int^\infty_{-\infty} \left( \hat{f}(x)-f(x) \right)^2dx \space \space \space (\text{This measure depends heavily on the particular data sample through } \hat{f})\] \[MSE\left( \hat{f}(x) \right) = E\left[ \left( \hat{f}(x) - f(x) \right)^2 \right]=Var\left( \hat{f}(x)\right) + \left( bias\left( \hat{f}(x)\right)\right)^2\] \[MISE(h)=E[ISE(h)]=E\left[\int^\infty_{-\infty} \left( \hat{f}(x)-f(x) \right)^2dx \right]=\int MSE\left( \hat{f}(x) \right)dx = \int \left( Var\left( \hat{f}(x)\right) + \left( bias\left( \hat{f}(x)\right)\right)^2 \right)dx \]

\(\mathbf{Choice \space of \space Bandwidth:}\) \[R(g)=\int g(x)^2dx \space \space \space \space \text{The Measure of the Roughness of a Function}\] \[MISE_h=\boxed{AMISE_h}+o \left( \frac{1}{nh}+h^4 \right)=\boxed{\frac{R(K)}{nh}+\frac{h^4\sigma^4_KR(f^{\prime\prime})}{4}}+ o \left( \frac{1}{nh}+h^4 \right) \implies h = \left( \frac{R(K)}{n\sigma^4_K R\left( f^{\prime \prime} \right)} \right)^\frac{1}{5} \]

\(\mathbf{Cross-Validation \space Methods:}\) select \(h\) to minimize \(UCV(h)\), which estimates \(ISE(h)-R(f)\); since \(R(f)\) does not depend on \(h\), this targets the \(h\) that minimizes \(ISE(h)\)

\[\hat{f}_{-i}(X_i)=\frac{1}{h(n-1)}\sum_{j \ne i} K\left( \frac{X_i-X_j}{h} \right)\] \[UCV(h) =\int \hat{f}^2(x)dx-\frac{2}{n}\sum^n_{i=1}\hat{f}_{-i}(X_i); \space ISE(h)=\boxed{R\left( \hat{f} \right)-2E\left[ \hat{f}(X) \right]}+R\left( f \right) \approx \boxed{UCV(h)} + R\left( f \right)\]
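
A minimal R sketch using the built-in unbiased (least-squares) cross-validation bandwidth selector for a Gaussian kernel:

# Unbiased cross-validation bandwidth, then the resulting density estimate
set.seed(7)
x <- rnorm(200)
h_ucv <- bw.ucv(x)   # minimizes UCV(h)
h_ucv
plot(density(x, bw = h_ucv))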

\(\mathbf{Plug-In \space Methods:}\) \(1.\) Use a pilot bandwidth to form an exploratory estimate \(\hat{f}\) of \(f\); \(2.\) Use this exploratory \(\hat{f}\) to estimate properties of \(f\) \(\left(R\left(f^{\prime\prime}\right) \implies \hat{R}\left(f^{\prime\prime}\right)=R\left(\hat{f}^{\prime\prime}\right)\right)\); \(3.\) Use the estimated properties of \(f\) to provide a better bandwidth estimate.

\(\mathbf{Silverman's \space Rule \space of \space Thumb:}\) \(h=\hat{\sigma}\left( \frac{4}{3n} \right)^\frac{1}{5}\) ; \(\mathbf{Sheather-Jones \space Method:}\) \(h = \left( \frac{R(K)}{n\sigma^4_K \hat{R}\left( f^{\prime \prime} \right)} \right)^\frac{1}{5}\)
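
A minimal R sketch comparing these rules on simulated data (note that R's built-in `bw.nrd` is a close variant of Silverman's rule that uses \(\min(\hat\sigma, IQR/1.34)\)):

# Compare bandwidth selectors on the same sample
set.seed(8)
x <- rnorm(200)
c(silverman = sd(x) * (4 / (3 * length(x)))^(1 / 5),   # the rule of thumb above
  bw.nrd    = bw.nrd(x),                               # R's variant of the rule of thumb
  bw.SJ     = bw.SJ(x))                                # Sheather-Jones plug-in bandwidth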

\(\mathbf{Choice \space of \space kernel:}\) Minimizing \(AMISE\) over kernels gives the optimal kernel function (the Epanechnikov kernel). For a given kernel, the \(AMISE\) minimized over \(h\) is \(\frac{5}{4}\left[ \sigma_K R(K)/n\right]^{4/5}R(f^{\prime\prime})^{1/5}\). Changing to a kernel with larger \(\sigma_KR(K)\) therefore requires a larger sample size \(n\) to achieve the same \(AMISE\).

\(\mathbf{The \space Relative \space Efficiency}\): \(\frac{\sigma_{K_1}R(K_1)}{\sigma_{K_2}R(K_2)}\), the ratio of sample sizes the two kernels need to achieve the same \(AMISE\)
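
For example, a quick computation of the Gaussian kernel's efficiency relative to the Epanechnikov kernel, using \(\sigma_K=1,\space R(K)=\frac{1}{2\sqrt{\pi}}\) for the Gaussian and \(\sigma_K=\frac{1}{\sqrt{5}},\space R(K)=\frac{3}{5}\) for the Epanechnikov:

# sigma_K * R(K) for each kernel, then the relative efficiency
gauss <- 1 * 1 / (2 * sqrt(pi))    # Gaussian:     sigma_K = 1,         R(K) = 1/(2*sqrt(pi))
epan  <- (1 / sqrt(5)) * 3 / 5     # Epanechnikov: sigma_K = 1/sqrt(5), R(K) = 3/5
gauss / epan                       # about 1.05: the Gaussian kernel needs ~5% more data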

\(\mathbf{C. \space Multivariate \space Kernel \space Density \space Estimation:}\) \(\hat{f}(\mathbf{X}) = \frac{1}{n|\mathbf{H}|} \sum^n_{i=1} K\left( \mathbf{H}^{-1} \left( \mathbf{X-X_i} \right) \right)\)

\(\mathbf{Multivariate \space Standard \space Normal:}\) \(K(\mathbf{X})=\frac{1}{\sqrt{(2\pi)^p}}exp\left( -\frac{1}{2} \mathbf{X^\prime X} \right)\)

\(\mathbf{Multivariate \space Epanechnikov:} \space\) \(K(\mathbf{X})=\frac{(p+2)\Gamma (1+p/2)}{2\pi^{p/2}}\left( 1-\mathbf{X^\prime X} \right), \space \mathbf{X^\prime X}\le 1\)

\(\mathbf{The \space Kernel \space Product \space Approach:}\) \(\hat{f}(\mathbf{X})=\frac{1}{n} \sum^n_{i=1} \prod^p_{j=1} \frac{1}{h_j} K\left( \frac{x_j-x_{ij}}{h_j} \right)\)
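
A minimal R sketch of the product-kernel estimator at a point, assuming a Gaussian kernel in each coordinate (the function name `kde_prod` is illustrative):

# Product-kernel estimate at x0 (length p), data X (n x p), bandwidths h (length p)
kde_prod <- function(x0, X, h) {
  mean(apply(X, 1, function(xi) prod(dnorm((x0 - xi) / h) / h)))
}
set.seed(9)
X <- matrix(rnorm(200), ncol = 2)
kde_prod(c(0, 0), X, h = c(0.5, 0.5))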

\(\mathbf{D. \space Kernel \space Regression:}\)

\(\mathbf{The \space Response \space Surface \space Function:}\) \(g(\mathbf{X})=E\left[ y|\mathbf{X}\right]= \int y\space f\left( y|\mathbf{X} \right)dy = \frac{\int y \space f\left( y, \space \mathbf{X} \right)dy}{f(\mathbf{X})}\approx\frac{\int y\space \hat{f}\left( y, \space \mathbf{X} \right)dy}{\hat{f}(\mathbf{X})}=\frac{\sum^n_{i=1}K\left(\mathbf{H}^{-1}(\mathbf{X-X_i})\right) y_i }{\sum^n_{i=1}K\left(\mathbf{H}^{-1}(\mathbf{X-X_i})\right)}\)

\(\mathbf{Nadaraya-Watson \space Estimator:}\) \(g(\mathbf{X})=\frac{\sum^n_{i=1} K_H(\mathbf{X_i}-\mathbf{X})y_i}{\sum^n_{i=1} K_H(\mathbf{X_i}-\mathbf{X})} = \sum^n_{i=1} W_H (\mathbf{X}, \mathbf{X_i})y_i\)
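
A minimal one-dimensional R sketch of the Nadaraya-Watson estimator with a Gaussian kernel (the function name `nw` is illustrative):

# Nadaraya-Watson estimate at x0: kernel-weighted average of the responses y
nw <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h)
  sum(w * y) / sum(w)
}
set.seed(10)
x <- runif(100, 0, 2 * pi); y <- sin(x) + rnorm(100, sd = 0.3)
nw(pi / 2, x, y, h = 0.3)   # should be close to sin(pi/2) = 1
# ksmooth(x, y, kernel = "normal", bandwidth = 0.6) gives a comparable fit on a grid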

\(\mathbf{E. \space Gaussian \space Process \space Models:}\) \(\mathbf{A \space Gaussian \space Process:}\) \[f(\mathbf{X}) \sim MVN\left( \mathbf{\mu}, \Sigma \right)\] with \(\mathbf{\mu} = \left( \mu_1, \dots, \mu_n \right), \space \mu_i=E\left[ f(x_i) \right]=m\left( x_i \right), \space i=1,\dots,n; \space \Sigma_{ij}=Cov\left( f(x_i), f(x_j) \right) = k\left( x_i, x_j \right), \space i,j=1,\dots,n\)

Example: \(m(x)=\frac{1}{4}x^2, \space k(x,x^\prime)=exp(-\frac{1}{2}(x-x^\prime)^2)\) and the data is \(\mathbf{X}=\begin{bmatrix} -2 \\ 0 \\ 2 \end{bmatrix} \space .\) Then \(\mathbf{\mu}=\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}\) and \(\Sigma = \begin{bmatrix} 1 & e^{-2} & e^{-8} \\ e^{-2} & 1 & e^{-2} \\ e^{-8} & e^{-2} & 1 \end{bmatrix}\)
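
A minimal R sketch reproducing this example and drawing one realization of \(f(\mathbf{X})\):

# Mean vector and covariance matrix for the example, then one MVN draw via Cholesky
m <- function(x) x^2 / 4
k <- function(x, xp) exp(-(x - xp)^2 / 2)
X <- c(-2, 0, 2)
mu <- m(X)                                  # (1, 0, 1)
Sigma <- outer(X, X, k)                     # entries k(x_i, x_j)
set.seed(11)
mu + t(chol(Sigma)) %*% rnorm(length(X))    # f(X) ~ MVN(mu, Sigma)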