Introduction to R

Functions For Statistics and Optimization

Jason Freels

28 September 2016

PRESENTATION OVERVIEW

In This Presentation...

ELEMENTARY STATISTICS REVIEW

Random Quantities & Random Variables

Random Variable Terminology

Example: Rolling A Die \(N\) Times

Discrete Random Variables

Distribution Functions

\[ \begin{aligned} &\text{For}\;x_{_{1}} \in \Omega,& \;f(x_{_{1}})\ge 0\\\\ &\text{For}\;x_{_{1}}, x_{_{2}} \in \Omega,& \;f(x_{_{1}})+f(x_{_{2}})\ge f(x_{_{1}})\\ &\text{For}\;x_{_{1}},...,x_{_{n}} \in \Omega,& \;\sum_1^n x_{_{i}}=1\\ \end{aligned} \]

The Probability Mass Function - \(f(x)\)

\[f(x)=\begin{cases} Pr(X=x)&x\in \Omega\\\\0 &x \not \in \Omega\end{cases}\]

\[ \begin{aligned} \sum_{x\in \mathcal{A},\; \mathcal{A} \subseteq \Omega} f(x)&=F(x)\\\\ \sum_{\forall x\in \Omega} f(x)&=1 \end{aligned} \]

The Cumulative Distribution Function - \(F(x)\)

\[F(x)=\begin{cases} Pr(X\le x)&x\in \Omega\\\\0 &x \not \in \Omega\end{cases}\]

\[ \begin{aligned} F(x)&\ge, \;\;\forall x \in \Omega\\\\ F(x)&=1, \;\;\sup(x) \in \Omega \end{aligned} \]

Common Discrete Distributions

\[f(x|p,n) = \left( \begin{array}{c} n \\ x \end{array} \right) (p)^{x}(1 - p)^{(n-x)}\]

\[f(x|p) = p (1 - p)^{x-1}\]

\[f(x|r,m,n) = \frac{\binom{r}{x} \binom{m - r}{n - x}}{\binom{m}{n}}\]

\[f(x|\lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}\]

Common Continuous Distributions

\[f(x|\mu,\sigma) = \frac{\exp\left[-(x-\mu)^{2}/(2\sigma^{2})\right]}{\sigma\sqrt{2\pi}}\]

\[f(x|\theta) = \frac{1} {\theta} \exp{\left[-\frac{x}{\theta}\right]}\]

\[f(x|\theta,\beta)=\frac{\gamma}{\theta}\left(\frac{x}{\theta}\right)^{\beta-1}\exp{\left[-\left(\frac{x}{\theta}\right)^{\beta}\right]}\]

\[f(x|\mu,\sigma)=\frac{\exp\left[-(\ln(x-\mu)^{2}/(2\sigma^{2})\right]}{x\sigma\sqrt{2\pi}}\]

Distribution Functions For Continuous Random Variables

The Survival (Reliability) Function - \(S(t)\)

\[S(t)=Pr(T > t), t\in (0,\infty)\]

\[ \begin{aligned} S(t)&\ge 0, \forall t \in \mathcal{R}^{+}\\\\ S(0)&=1\\\\ \displaystyle \lim_{t \to \infty} S(t)&=0 \end{aligned} \]

The hazard (failure rate) function - \(h(t)\)

For the random variable \(T\in \mathcal{R}^{+}\)

  1. The hazard function relates the instantaneous, conditional probability of failure in the interval \((t+\Delta t)\) given \(T>t\)

\[ \begin{aligned} h(t)&=\displaystyle \lim_{\Delta t \to 0} \frac{Pr(t < T \le t+\Delta t|T \ge t)}{\Delta t}\\[5mm] &=\displaystyle \lim_{dt \to 0} \frac{R(t)-R(t+dt)}{R(t)dt}\\[5mm] &=\frac{dR(t)}{dt}\frac{1}{R(t)}=\frac{f(t)}{R(t)}\\ \end{aligned} \]

Probability Functions in R

Using a Weibull distribution with shape \((\beta)\) and scale \((\theta)\) as example

dweibull(x,shape,scale)
Density function \(f(x)\) for a \(Weib(\beta,\theta)\) distribution

pweibull(q,shape,scale)
Probability function \(F(x)\) at quantile \(q\) for a \(Weib(\beta,\theta)\) distribution

qweibull(p,shape,scale)
Quantile function \(F^{-1}(p)\) at probability \(p\) for a \(Weib(\beta,\theta)\) distribution

rweibull(n,shape,scale)
Generates \(n\) random observations from a \(Weib(\beta,\theta)\) distribution

Accessing Distribution Fucntion Values

[1] 0.1586553 0.3085375 0.5000000 0.6914625 0.8413447

[1] 0.1209854 0.1760327 0.1994711 0.1760327 0.1209854

[1] 3.289707 1.683242 0.000000 -1.683242 -3.289707

[1] 2.7494016 1.7362779 3.1268790 -0.7383119 1.6373741

Plotting Relative Frequencies and Probabilities Using hist( )

Plotting The Kernel Density Using density( )

Plotting The Empirical CDF Using ecdf( )

Using optim( ) or nlminb( ) to Maximize Functions

Like7.3<-function(theta,d) {

d1<-d[1];d2<-d[2];d3<-d[3];d4<-d[4];d5<-d[5];d6<-d[6];d7<-d[7];d8<-d[8]

F<-c(rep(NA,1))

F<-log(exp(   -0/theta)-exp( -100/theta))*d1+
   log(exp( -100/theta)-exp( -300/theta))*d2+
   log(exp( -300/theta)-exp( -500/theta))*d3+
   log(exp( -500/theta)-exp( -700/theta))*d4+
   log(exp( -700/theta)-exp(-1000/theta))*d5+
   log(exp(-1000/theta)-exp(-2000/theta))*d6+
   log(exp(-2000/theta)-exp(-4000/theta))*d7+
   log(exp(-4000/theta)-exp( -Inf/theta))*d8 

return(-F)      }

We can use nlminb( ) to do this assuming the following data were observed

data[2,]

[1] 41 44 24 32 29 21 9 0

nlminb(start = 400, obj = Like7.3, d = data[2,], lower = 100, upper = 800)$par

[1] 572.2742

For this fucntion, we could have also used optim( ) and arrived at the same solution

optim(par = 400, fn = Like7.3, d = data[2,], lower = 100, upper = 800, method = "L-BFGS-B")$par

[1] 572.274

\(\S\) 3.3.2 - Confidence intervals

In inferential statistics, confidence intervals (CI) are useful for quantifying the uncertainy associated with a point estimate due to sampling errors

Sources of uncertainty

\(\S\) 3.3.2 - Confidence intervals

What does it mean to say "...the \(100\times(1-\alpha)\%\) CI for \(\theta =(\underline{\hat{\theta}},\overline{\hat{\theta}})\)..."

What Does "Confidence Level" Mean? source

The Poisson Distribution

The Geometric Distribution

The Hyper-Geometric Distribution

The Binomial Distribution

Continuous distributions

Exponential Distribution

Is the continuous counterpart to the geometric distribution

Describes the inter-arrival time durations between events in a homogeneous Poisson process

Is used to model the "useful-life" region of the bathtub curve

Implies that failures are due to random events or chance

Is a "memoryless" distribution

Is one of the most commonly used lifetime distributions in reliability analyses

Is the simplest distribution used in the analysis of reliability data

In real world scenarios, assuming exponentially distributed failure times is rarely valid

Normal Distribution \(\; Y \sim NOR(\mu,\sigma),\;\;Y\in(-\infty,\infty)\)

\[ \begin{aligned} f(y|\mu,\sigma)&=\frac{1}{\sigma}\phi_{nor}\left(\frac{y-\mu}{\sigma}\right)=\frac{1}{\sigma}\frac{e^{-(y - \mu)^{2}/(2\sigma^{2})}}{\sqrt{2\pi}}\\\\ F(y|\mu,\sigma)&=\Phi_{nor}\left(\frac{y-\mu}{\sigma}\right)=\int_{-\infty}^{y} \frac{e^{-(y-\sigma)^{2}/2\sigma^2}} {\sqrt{2\pi}\sigma}\\\\ h(y|\mu,\sigma)&=\frac{f(y|\mu,\sigma)}{1-F(y|\mu,\sigma)}\\\\ y_{p}&=\mu+\Phi^{-1}_{nor}(p)\sigma, \;\;\;\;\;\;\;\;\text{where}\;\Phi^{-1}_{nor}(p)=z_p\\\\ E[Y]&=\mu\\\\ Var[Y]&=\sigma^2 \end{aligned} \]

Log-normal Distribution \(\; T \sim LOGNOR(\mu,\sigma),\;\;T\in[0,\infty)\)

\[ \begin{aligned} f(t|\mu,\sigma)&=\frac{1}{\sigma}\phi_{nor}\left(\frac{\log(t)-\mu}{\sigma}\right)\\\\ F(t|\mu,\sigma)&=\Phi_{nor}\left(\frac{\log(t)-\mu}{\sigma}\right)\\\\ h(t|\mu,\sigma)&=\frac{f(t|\mu,\sigma)}{1-F(t|\mu,\sigma)}\\\\ t_{p}&=\exp\left[\mu+\Phi^{-1}_{nor}(p)\sigma\right], \;\;\;\;\;\;\;\;\text{where}\;\Phi^{-1}_{nor}(p)=z_p\\\\ E[T]&=\exp(\mu+0.5\sigma^2)\\\\ Var[T]&=\exp(2\mu+\sigma^2)(\exp(\sigma^2)-1) \end{aligned} \]

\(\S\) 4.6 - Log-normal Distribution