Terms and Definitions

  • Population: Entire collection of individuals a researcher is interested in
  • Population Parameter: Any numerical summary computed over the entire population (for example, the population mean). Since it is usually impractical to measure an entire population, this value is typically unknown and must be estimated.
  • Sample: A subset of the population selected for the purposes of making inference for the population
  • Sample Statistic: Any statistic computed from the sample that is used to make inferences about the population
  • Statistical inference: Data = sample → sample statistics → estimator → population parameters
  • Bias: The difference between the expected value of an estimator (the average of its estimates over repeated samples) and the true value of the parameter we are trying to estimate. If an estimator is unbiased, repeated estimates of the parameter show no systematic tendency to overestimate or underestimate. If an estimator is biased, its expected value does not equal the population parameter.
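  • Central Limit Theorem: For independent, identically distributed observations with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size grows, regardless of the shape of the population distribution.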
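
The idea of bias can be made concrete with a small simulation. The sketch below is only an illustration under assumed settings (a standard normal population with true variance 1, samples of size 5, and a fixed seed; none of these come from the notes). It averages two variance estimators over many repeated samples: one that divides by n and one that divides by n - 1.

```python
import random
import statistics

def simulate_variance_estimates(n=5, trials=20000, seed=0):
    """Average two variance estimators over many samples.

    Each sample has n draws from a standard normal population,
    whose true variance is 1.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        biased_total += statistics.pvariance(sample)   # divides by n
        unbiased_total += statistics.variance(sample)  # divides by n - 1
    return biased_total / trials, unbiased_total / trials

biased_mean, unbiased_mean = simulate_variance_estimates()
print(f"average of divide-by-n estimator:     {biased_mean:.3f}")
print(f"average of divide-by-(n-1) estimator: {unbiased_mean:.3f}")
```

In theory the divide-by-n estimator has expected value (n - 1)/n times the true variance (0.8 here), so it underestimates on average, while the divide-by-(n - 1) estimator has expected value equal to the true variance.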

Basic Probability Theory

Single Events

Union

\[\begin{align} P(A \text{ OR } B) &= P(A \cup B) \\ &= P(A) + P(B) - P(A \cap B) \end{align}\]
Note: You subtract the intersection because otherwise it would be counted twice.
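
The union rule can be checked by direct counting. This is a minimal sketch using a hypothetical example (one roll of a fair six-sided die, with events A and B chosen only for illustration); `prob` is a helper defined here, not a library function.

```python
from fractions import Fraction

sample_space = set(range(1, 7))                 # one roll of a fair six-sided die
A = {x for x in sample_space if x % 2 == 0}     # event: roll is even
B = {x for x in sample_space if x >= 4}         # event: roll is at least 4

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

lhs = prob(A | B)                         # P(A or B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)     # inclusion-exclusion formula
print(lhs, rhs)  # prints: 2/3 2/3
```

Without the subtraction, the outcomes 4 and 6 (which are in both A and B) would be counted twice and the right-hand side would be 1 instead of 2/3.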

Intersection

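\[\begin{align} P(A \text{ AND } B) &= P(A \cap B) \\ &= P(A) \cdot P(B) \quad \text{(if } A \text{ and } B \text{ are independent)} \end{align}\]
Note: The product rule holds only for independent events. In general, \(P(A \cap B) = P(A) \cdot P(B \mid A)\).

Complement

The complement of a trait represents anything that does not have that trait.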

Visual representation of the complement
\[P(A^{c}) = 1 - P(A)\]
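
The intersection and complement rules can also be checked by counting. The sketch below assumes a hypothetical setup of two fair six-sided dice; the events A, B, and C are chosen for illustration only. A and B involve different dice, so they are independent, while C depends on both dice and is not independent of A.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes (first die, second die) of two fair dice.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {(d1, d2) for d1, d2 in sample_space if d1 % 2 == 0}    # first die is even
B = {(d1, d2) for d1, d2 in sample_space if d2 >= 5}        # second die is 5 or 6
C = {(d1, d2) for d1, d2 in sample_space if d1 + d2 >= 10}  # total is at least 10

# Independent events: the product rule holds.
independent_ok = prob(A & B) == prob(A) * prob(B)
# Dependent events: the product rule fails.
dependent_ok = prob(A & C) == prob(A) * prob(C)
# Complement: everything in the sample space that is not in A.
complement_ok = prob(sample_space - A) == 1 - prob(A)

print(independent_ok, dependent_ok, complement_ok)  # prints: True False True
```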

Distributions (KNOW PDF, E[X], Var[X])

Relationships between Distributions

Normal Distribution

Support: \((-\infty, \infty)\) (unbounded), continuous
Can be negative
PDF: \[f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]

E[X]: \[\begin{align} E[X] &= \int_{-\infty}^{\infty} x \cdot f(x \mid \mu, \sigma)\,dx \\ &= \int_{-\infty}^{\infty}\frac{x}{\sqrt{2 \pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \\ &= \mu \end{align}\]

VAR[X]: \[\begin{align} Var[X] &= E[(X- E[X])^2] \\ &= E[(X - \mu)^2] \\ &= E[X^2] - \mu^2 \\ &= \int_{-\infty}^{\infty} x^2 \cdot f(x \mid \mu, \sigma)\,dx - \mu^2 \\ &= \int_{-\infty}^{\infty}\frac{x^{2}}{\sqrt{2 \pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx - \mu^2 \\ &= (\sigma^2 + \mu^2) - \mu^2 \\ &= \sigma^2 \end{align}\]
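
The integrals for E[X] and Var[X] can be checked numerically. This is a rough sketch, assuming arbitrary example values mu = 1.5 and sigma = 2.0 and a simple midpoint rule over a window of plus or minus 10 standard deviations (the tails outside that window are negligible). The helpers `normal_pdf` and `integrate` are defined here for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density f(x | mu, sigma), as given in the PDF formula above."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

mu, sigma = 1.5, 2.0                         # arbitrary example parameters
lo, hi = mu - 10 * sigma, mu + 10 * sigma    # finite stand-in for (-inf, inf)

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
second_moment = integrate(lambda x: x ** 2 * normal_pdf(x, mu, sigma), lo, hi)
variance = second_moment - mean ** 2         # Var[X] = E[X^2] - (E[X])^2

print(f"integral of pdf = {total:.6f}")
print(f"E[X]   = {mean:.6f} (mu = {mu})")
print(f"Var[X] = {variance:.6f} (sigma^2 = {sigma ** 2})")
```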

Standard Normal

Standardization: \[Z = \frac{X-\mu}{\sigma}\]
PDF: \[f(z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}\]
E[Z] = 0

VAR[Z] = 1
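
Standardizing lets any normal probability be read off the standard normal. The sketch below uses hypothetical values (mu = 100, sigma = 15, x = 130) and the standard library's `statistics.NormalDist` to show that the cumulative probability at x under N(mu, sigma) matches the cumulative probability at z under the standard normal.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical population mean and standard deviation
x = 130.0                 # hypothetical observation

z = (x - mu) / sigma                        # z-score: Z = (X - mu) / sigma
p_original = NormalDist(mu, sigma).cdf(x)   # P(X <= x) on the original scale
p_standard = NormalDist().cdf(z)            # P(Z <= z) on the standard scale

print(f"z = {z}")
print(f"P(X <= x) = {p_original:.6f}")
print(f"P(Z <= z) = {p_standard:.6f}")
```

Here x lies two standard deviations above the mean, so z is 2 and both probabilities are about 0.977.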

Log-Normal Distribution


PDF: \[\begin{align} \log(X) &\sim N(\mu,\sigma) \\ X &\sim LN(\mu,\sigma) \end{align}\] \[f(x \mid \mu, \sigma) = \frac{1}{x\sqrt{2 \pi \sigma^2}}e^{-\frac{(\log(x)-\mu)^2}{2\sigma^2}} \\ x \in (0,\infty) \\ \mu \in \mathbb{R} \\ \sigma > 0\]

E[X]: \[E[X] = e^{\mu + \frac{\sigma^2}{2}}\]

VAR[X]: \[Var[X] = e^{2(\mu + \sigma^2)} - e^{2\mu + \sigma^2} = \left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}\]
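
The closed forms for the log-normal mean and variance can be compared against simulated draws. This is a sketch under assumed values (mu = 0.5, sigma = 0.4, a fixed seed, 200,000 draws), using `random.lognormvariate` from the standard library, which draws X such that log(X) is normal with mean mu and standard deviation sigma.

```python
import math
import random

mu, sigma = 0.5, 0.4        # parameters of log(X), chosen arbitrarily
rng = random.Random(42)     # fixed seed so the run is repeatable
draws = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]

# Closed forms: E[X] = exp(mu + sigma^2 / 2),
# Var[X] = (exp(sigma^2) - 1) * exp(2 * mu + sigma^2)
theory_mean = math.exp(mu + sigma ** 2 / 2)
theory_var = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print(f"mean:     theory {theory_mean:.4f}, simulated {sample_mean:.4f}")
print(f"variance: theory {theory_var:.4f}, simulated {sample_var:.4f}")
```

The simulated values should land close to the theoretical ones, with small differences due to sampling noise. Every draw is strictly positive, matching the support of the distribution.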

Poisson Distribution:

The Poisson distribution is typically used in situations such as:
- Describing random spatial point patterns
- As the frequency distribution of rare but independent events
- As the error distribution in linear models of count data

The Poisson distribution is discrete, hence it has a probability mass function (PMF) instead of a PDF. It cannot be negative: its support is the non-negative integers \(\{0, 1, 2, \dots\}\).


PMF: \[P(x \mid \lambda)= \frac{e^{-\lambda} \cdot \lambda^x}{x!} \\ \lambda>0 \\ x \in \mathbb{N} \cup \{0\}\]

E[X]: \[\begin{align} E[X] &= \sum_{x=0}^{\infty} x \frac{e^{-\lambda} \cdot \lambda^x}{x!} \\ &= \sum_{x=1}^{\infty} x \frac{e^{-\lambda} \cdot \lambda^x}{x!} \mbox{ (the } x = 0 \mbox{ term is zero)} \\ &= \lambda \cdot e^{-\lambda} \cdot \sum_{x=1}^{\infty} x \frac{\lambda^{x-1}}{x!} \\ &= \lambda \cdot e^{-\lambda} \cdot \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}\\ &\mbox{define } y = x-1 \\ &= \lambda \cdot e^{-\lambda} \cdot \sum_{y=0}^{\infty} \frac{\lambda^{y}}{y!} \mbox{ (the sum is now the expansion of the exponential)}\\ &= \lambda \cdot e^{-\lambda} \cdot e^{\lambda} \\ &= \lambda\end{align}\]

VAR[X]: \[Var[X] = \lambda\]
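
The Poisson mean and variance can be checked by summing the PMF directly. This is a minimal sketch assuming an arbitrary rate lambda = 3.5 and truncating the infinite sums at x = 99, where the remaining terms are negligibly small for this rate. `poisson_pmf` is a helper defined here for illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 3.5                 # arbitrary example rate
support = range(0, 100)   # truncated support; later terms are negligible for this rate

total = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

print(f"sum of PMF = {total:.6f}")
print(f"E[X]   = {mean:.6f}")
print(f"Var[X] = {variance:.6f}")
```

Both the mean and the variance come out equal to the rate, which is the defining feature of the Poisson distribution.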

Binomial Distribution
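
As a sketch following the same PMF, E[X], Var[X] template used for the other distributions, and assuming the usual parameterization with n independent trials and success probability p (symbols not defined elsewhere in these notes), the standard results are:

PMF: \[P(x \mid n, p) = \binom{n}{x} p^{x} (1-p)^{n-x} \\ x \in \{0, 1, \dots, n\} \\ n \in \mathbb{N} \\ p \in [0, 1]\]

E[X]: \[E[X] = np\]

VAR[X]: \[Var[X] = np(1-p)\]

The binomial distribution is discrete and bounded between 0 and n.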

Distributions (Just Recognize):

Gamma Distribution


Beta Distribution


Multinomial Distribution

Chi-Squared Distribution


F Distribution


t-distribution
