EVERYTHING FROM BEFORE MIDTERM:

Basic probability definitions (union, intersection, complement, conditional, Bayes Theorem)

Union

\[P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)\]

Note: The intersection is subtracted because it would otherwise be counted twice.

Intersection

\[P(A \text{ and } B) = P(A \cap B) = P(A \mid B)\, P(B)\]

Note: The product form \(P(A \cap B) = P(A)\, P(B)\) holds only when A and B are independent.

Complement

The complement of a trait represents anything that does not have that trait.

Visual representation of the complement

\[P(A^{c}) = 1 - P(A)\]
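
The definitions above, together with conditional probability \(P(A \mid B) = P(A \cap B) / P(B)\) and Bayes' theorem \(P(B \mid A) = P(A \mid B)\, P(B) / P(A)\), can be checked by brute-force enumeration. The sketch below is only an illustration: the two-dice sample space and the events `A` and `B` are made up for the example, not taken from lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of omega) under equal likelihood."""
    return Fraction(len(event), len(omega))

A = {o for o in omega if o[0] == 6}           # first die shows 6
B = {o for o in omega if o[0] + o[1] >= 10}   # total is at least 10

p_union = prob(A | B)                         # P(A or B)
p_inter = prob(A & B)                         # P(A and B)
p_not_a = prob(omega - A)                     # P(complement of A)
p_a_given_b = p_inter / prob(B)               # conditional probability P(A | B)
p_b_given_a = p_a_given_b * prob(B) / prob(A) # Bayes' theorem gives P(B | A)

print(p_union, p_inter, p_not_a, p_a_given_b, p_b_given_a)
```

Here A and B are not independent, so \(P(A \cap B)\) differs from \(P(A)\, P(B)\), while the union and complement rules hold exactly.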

Difference between sample statistics and population statistics (a.k.a. population parameters)

Statistic: any quantity calculated from the data

Sample statistics are calculated from the sample data. Population parameters are the corresponding true values for the entire population that the sample was taken from. They are usually unknown (you typically can’t sample the whole population), so sample statistics are used to estimate them.

Know the concepts of bias and variance in estimators

Bias:

The difference between the expected value of an estimator and the true value of the parameter we are trying to estimate. If an estimator is unbiased, then repeated estimates of the parameter will show no tendency toward either overestimates or underestimates.

An estimator is biased when its expected value does not equal the population parameter.

Variance: The variance measures the average squared distance between each point and the mean. For an estimator, it describes how much the estimates vary from sample to sample.

Variance (unbiased): \[ \mbox{variance}_{unbiased} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\]

Variance (biased): \[\mbox{variance}_{biased} = \frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2} \]
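
A small simulation can make the bias visible. This is a sketch under assumed settings (normal data with variance 4, samples of size 5, a fixed seed); none of these numbers come from the course. `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1.

```python
import random
import statistics

random.seed(1)            # fixed seed so the run is repeatable
true_var = 4.0            # population variance of N(0, sd = 2)
n, reps = 5, 20000        # illustrative sample size and number of replicates

biased, unbiased = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    biased.append(statistics.pvariance(sample))    # divides by n
    unbiased.append(statistics.variance(sample))   # divides by n - 1

mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)
print(mean_biased, mean_unbiased)
```

On average the 1/n version should come out near \(\frac{n-1}{n}\sigma^2\) (about 3.2 here), while the 1/(n-1) version should be near the true value of 4.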

Definition of degrees of freedom

Degrees of freedom (DoF): The number of degrees of freedom is the number of values in the final calculation that are free to vary. For example, when calculating the sample variance, you typically have (n-1) degrees of freedom because the formula uses the sample mean, which is itself estimated from the data: once the mean and n-1 of the values are known, the last value is fixed.

How to calculate the expected value of a distribution (discrete and continuous)
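
For a discrete distribution the expected value is \(E[X] = \sum_{x} x \cdot P(X = x)\); for a continuous distribution it is \(E[X] = \int x \cdot f(x)\, dx\). The sketch below applies both definitions. The fair die, the Normal(2, 1) density, and the midpoint-rule integrator are illustrative choices, not part of the course material.

```python
import math

def expected_discrete(values, probs):
    """E[X] = sum of x * P(X = x) over all possible values."""
    return sum(x * p for x, p in zip(values, probs))

def expected_continuous(pdf, lower, upper, steps=100_000):
    """E[X] = integral of x * f(x) dx, approximated with the midpoint rule."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += x * pdf(x) * width
    return total

def normal_pdf(x, mu=2.0, sigma=1.0):
    """Normal density with illustrative parameters mu = 2, sigma = 1."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

die_mean = expected_discrete(range(1, 7), [1 / 6] * 6)      # fair six-sided die
normal_mean = expected_continuous(normal_pdf, -8.0, 12.0)   # mu +/- 10 sigma
print(die_mean, normal_mean)
```

The integration limits stand in for \(\pm\infty\); the density beyond ten standard deviations is negligible.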

Be able to write down the probability density (or mass, for discrete distributions) function, expected value E[X], and variance Var[X], for the (1) Normal, (2) Standard Normal, (3) Log-Normal, (4) Poisson, (5) Binomial

Normal Distribution

Normal Distribution

Support: \((-\infty, \infty)\) (unbounded)

Continuous; can be negative

PDF: \[f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]

E[X]: \[\begin{align} E[X] &= \int_{-\infty}^{\infty}{X \cdot f(X)\, dX} \\ &= \int_{-\infty}^{\infty} x \cdot f(x \mid \mu, \sigma)\, dx = \int_{-\infty}^{\infty}\frac{x}{\sqrt{2 \pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \\ &= \mu \end{align}\]

VAR[X]: \[\begin{align} Var[X] &= E[(X- E[X])^2] \\ &= E[(X - \mu)^2] \\ &= E[X^2] - \mu^2 \\ &= \left( \int_{-\infty}^{\infty} x^2 \cdot f(x \mid \mu, \sigma)\, dx = \int_{-\infty}^{\infty}\frac{x^{2}}{\sqrt{2 \pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \right) - \mu^2 \\ &= (\sigma^2 + \mu^2) - \mu^2 \\ &= \sigma^2 \end{align}\]

Standard Normal

PDF: \[Z = \frac{X-\mu}{\sigma}\] \[f(z \mid \mu = 0, \sigma = 1) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}\]

E[Z] = 0

VAR[Z] = 1

Log-Normal Distribution

Log Normal Distribution

PDF: \[\begin{align} \log(X) &\sim N(\mu,\sigma) \\ X &\sim LN(\mu,\sigma) \end{align}\] \[f(x \mid \mu, \sigma) = \frac{1}{x\sqrt{2 \pi \sigma^2}}e^{-\frac{(\log(x)-\mu)^2}{2\sigma^2}}\] for \(x \in (0,\infty)\), \(\mu \in \mathbb{R}\), \(\sigma > 0\)

E[X]: \[E[X] = e^{\mu + \frac{\sigma^2}{2}}\]

VAR[X]: \[Var[X] = e^{2(\mu + \sigma^2)} - e^{2\mu + \sigma^2} = \left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}\]
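
As a rough sanity check, the log-normal mean \(e^{\mu + \sigma^2/2}\) and variance \(\left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}\) can be compared with simulated draws. This is a sketch with assumed parameter values (\(\mu = 0\), \(\sigma = 0.5\)) and a fixed seed; `random.lognormvariate` takes the mean and standard deviation of \(\log(X)\).

```python
import math
import random
import statistics

random.seed(2)
mu, sigma = 0.0, 0.5      # illustrative parameters of log(X)

draws = [random.lognormvariate(mu, sigma) for _ in range(100_000)]

theory_mean = math.exp(mu + sigma**2 / 2)
theory_var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)

sim_mean = statistics.fmean(draws)
sim_var = sum((x - sim_mean) ** 2 for x in draws) / len(draws)
print(theory_mean, sim_mean)
print(theory_var, sim_var)
```

The simulated values should land close to the theoretical ones, with the gap shrinking as the number of draws grows.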

Poisson Distribution:

The Poisson distribution is typically used in situations such as:

- The description of random spatial point patterns
- As the frequency distribution of rare but independent events
- As the error distribution in linear models of count data

The Poisson distribution is discrete, hence it has a probability mass function (PMF) instead of a PDF. It cannot be negative; its support is the non-negative integers \(\{0, 1, 2, \dots\}\).

Poisson Distribution

PMF: \[P(x \mid \lambda)= \frac{e^{-\lambda} \cdot \lambda^x}{x!} \\ \lambda>0 \\ x \in \mathbb{N} \cup \{0\}\]

E[X]: \[\begin{align} E[X] &= \sum_{x=1}^{\infty} x \frac{e^{-\lambda} \cdot \lambda^x}{x!} \\ &= \lambda \cdot e^{-\lambda} \cdot \sum_{x=1}^{\infty} x \frac{\lambda^{x-1}}{x!} \\ &= \lambda \cdot e^{-\lambda} \cdot \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}\\ &\mbox{define } y = x-1 \\ &= \lambda \cdot e^{-\lambda} \cdot \sum_{y=0}^{\infty} \frac{\lambda^{y}}{y!} \mbox{ (the sum is now the expansion of the exponential)}\\ &= \lambda \cdot e^{-\lambda} \cdot e^{\lambda} \\ &= \lambda\end{align}\]

VAR[X]: \[Var[X] = \lambda\]
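
The Poisson results can be confirmed numerically by summing over the PMF. A minimal sketch, assuming \(\lambda = 3\) and truncating the infinite sum at 60 terms (both choices are illustrative):

```python
import math

def poisson_pmf(x, lam):
    """P(X = x | lambda) = exp(-lambda) * lambda**x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 3.0
support = range(0, 60)    # the tail beyond 60 is negligible for lambda = 3

total = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)
print(total, mean, var)   # compare with 1, lambda, lambda
```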

Binomial Distribution
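
For reference, the standard binomial results are the PMF \(P(X = k \mid n, p) = \binom{n}{k} p^{k} (1-p)^{n-k}\) for \(k \in \{0, 1, \dots, n\}\), with \(E[X] = np\) and \(Var[X] = np(1-p)\). The sketch below checks them by direct summation; the values n = 10 and p = 0.3 are made up for the example.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k | n, p) = C(n, k) * p**k * (1 - p)**(n - k)"""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3            # illustrative number of trials and success probability
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(total, mean, var)   # compare with 1, n*p, n*p*(1-p)
```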

Be able to recognize the Gamma, Beta, Multinomial, Chi-squared, F, and t-distributions

Gamma Distribution

Beta Distribution

Multinomial Distribution

Chi-Squared Distribution

Chi-Square Distribution

F Distribution

F-Distribution

t-distribution

Central Limit Theorem

Know all the relationships between the distributions we discussed in lecture

Standard deviation vs. standard error
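
The standard deviation describes the spread of individual observations, while the standard error describes the spread of an estimate (here the sample mean) across repeated samples, and for the mean it equals \(\sigma / \sqrt{n}\). The simulation below is a sketch with assumed settings (Uniform(0, 1) data, n = 30, fixed seed) chosen only for illustration.

```python
import math
import random
import statistics

random.seed(3)
n, reps = 30, 5000        # illustrative sample size and number of replicates

# Each replicate: draw n values from Uniform(0, 1) and record the sample mean.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)]) for _ in range(reps)
]

population_sd = math.sqrt(1 / 12)              # SD of Uniform(0, 1)
theoretical_se = population_sd / math.sqrt(n)  # SE of the mean
observed_se = statistics.stdev(sample_means)   # spread of the sample means
print(population_sd, theoretical_se, observed_se)
```

The spread of the sample means should sit near \(\sigma / \sqrt{n}\), well below the standard deviation of the raw data. A histogram of `sample_means` would also look roughly normal even though the data are uniform, which is the Central Limit Theorem at work.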

Know the R functions associated with the univariate distributions (rnorm, pnorm, rchisq, etc.)
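
In R the prefix tells you what the function returns: `d` for the density (or mass), `p` for the cumulative probability, `q` for the quantile, and `r` for random draws (so `dnorm`, `pnorm`, `qnorm`, `rnorm`). As a rough analogue for checking values outside R, Python's standard-library `statistics.NormalDist` offers the same four operations for the normal distribution. This is only a cross-check sketch, not a substitute for knowing the R calls.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)   # standard normal

density = z.pdf(0.0)                # analogous to dnorm(0)
lower_tail = z.cdf(1.96)            # analogous to pnorm(1.96)
quantile = z.inv_cdf(0.975)         # analogous to qnorm(0.975)
draws = z.samples(5, seed=42)       # analogous to rnorm(5)
print(density, lower_tail, quantile, draws)
```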

Know how to construct a maximum likelihood, find maximum likelihood estimators, and 1-α confidence intervals
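
A sketch of the workflow for one case, Poisson counts: write the log-likelihood \(\ell(\lambda) = \sum_i \left( x_i \log\lambda - \lambda - \log x_i! \right)\), maximise it (analytically the maximum is at \(\hat{\lambda} = \bar{x}\)), then build an interval. The data are made up, and the large-sample (Wald) interval shown is one common approximation, assumed here for illustration; it is not necessarily the construction used in class.

```python
import math
from statistics import NormalDist, fmean

counts = [2, 4, 3, 5, 1, 3, 4, 2, 3, 3]   # made-up count data for illustration

def log_likelihood(lam, data):
    """Poisson log-likelihood: sum of x*log(lam) - lam - log(x!)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

# Analytic MLE for the Poisson rate is the sample mean.
mle = fmean(counts)

# Numerical check: maximise the log-likelihood over a grid of candidate rates.
grid = [i / 1000 for i in range(1, 10001)]
grid_mle = max(grid, key=lambda lam: log_likelihood(lam, counts))

# Large-sample (Wald) 1 - alpha interval, using Var[lambda_hat] ~ lambda_hat / n.
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * math.sqrt(mle / len(counts))
ci = (mle - half_width, mle + half_width)
print(mle, grid_mle, ci)
```

The grid search and the analytic answer should agree, which is a useful habit when the MLE has no closed form and only the numerical route is available.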

Be able to construct the 1-α parameter confidence intervals discussed in class, lab, or on any of the problem sets

understand the concept behind Type I and Type II error and statistical power

Know everything from the Week #5 summary table. The only exception is the d.o.f. for the two sample unpaired t-test, which you will not be expected to know.

know why multiple comparisons are a problem and what to do about it
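
The core of the problem is arithmetic: with m independent tests each run at level \(\alpha\), the chance of at least one false positive when every null is true is \(1 - (1 - \alpha)^m\). The sketch below uses m = 20 as an illustrative number and shows the Bonferroni correction (testing each hypothesis at \(\alpha / m\)) as one standard remedy; other corrections may have been covered in class.

```python
alpha = 0.05
m = 20   # number of independent tests (illustrative)

# Chance of at least one false positive if every null hypothesis is true.
fwer_uncorrected = 1 - (1 - alpha) ** m

# Bonferroni: test each hypothesis at alpha / m instead.
alpha_bonferroni = alpha / m
fwer_bonferroni = 1 - (1 - alpha_bonferroni) ** m
print(fwer_uncorrected, alpha_bonferroni, fwer_bonferroni)
```

Uncorrected, the family-wise error rate is roughly 64%; after the correction it drops back to just under the nominal 5%.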

Be able to discuss all of the objections to null hypothesis testing presented in the papers from the primary literature discussed in class

know how to do the one-sample t-test, the two-sample unpaired t-test, the two-sample paired t-test, Fisher’s F-test for a comparison of variances, comparing two proportions, and comparing two distributions with the K-S test

Understand the relationship between the test statistic T*, the distribution of T under the null hypothesis f(T|H0), and the construction of a one and two-tailed p-value

Know the R functions for all the hypothesis tests discussed in lab

understand the basic idea behind non-parametric bootstrap, parametric bootstrap, jackknife, and bootstrap-after-jackknife and know how to use each technique to calculate estimator bias and standard error.
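
A minimal sketch of the non-parametric bootstrap and the jackknife applied to the sample mean. The data, the seed, and the choice of 2000 bootstrap replicates are assumptions for illustration. The bias and standard-error formulas are the usual textbook ones: the bootstrap bias is the mean of the replicates minus the original estimate, the jackknife bias is \((n-1)(\bar{\theta}_{jack} - \hat{\theta})\), and the jackknife SE is \(\sqrt{\frac{n-1}{n}\sum_i (\hat{\theta}_{(i)} - \bar{\theta}_{jack})^2}\).

```python
import math
import random
import statistics

random.seed(4)
data = [4.1, 5.3, 2.8, 6.0, 4.7, 3.9, 5.5, 4.4, 3.2, 5.1]   # made-up sample
n = len(data)
estimator = statistics.fmean          # statistic of interest: the sample mean
theta_hat = estimator(data)

# Non-parametric bootstrap: resample the data with replacement.
boot = [estimator(random.choices(data, k=n)) for _ in range(2000)]
boot_se = statistics.stdev(boot)
boot_bias = statistics.fmean(boot) - theta_hat

# Jackknife: recompute the estimate leaving out one observation at a time.
jack = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
jack_mean = statistics.fmean(jack)
jack_se = math.sqrt((n - 1) / n * sum((t - jack_mean) ** 2 for t in jack))
jack_bias = (n - 1) * (jack_mean - theta_hat)
print(boot_se, boot_bias, jack_se, jack_bias)
```

For the sample mean both bias estimates should be essentially zero, and the jackknife SE reduces to the familiar \(s / \sqrt{n}\). Swapping `estimator` for another statistic (for example `statistics.median`) reuses the same machinery. A parametric bootstrap would replace the resampling line with draws from a fitted distribution.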

EVERYTHING FROM AFTER MIDTERM:

know how to calculate Pearson’s product moment correlation coefficient (and the assumptions behind it)
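
The sample coefficient is \(r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}\). A short sketch of that calculation follows; the function name `pearson_r` and the paired data are hypothetical, chosen for illustration.

```python
import math

def pearson_r(x, y):
    """Sample correlation: cross-products over the root of the product of sums of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up paired data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(r)
```

Applying the same function to the ranks of x and y, rather than the raw values, gives Spearman's rank correlation when there are no ties.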

understand the difference between the population correlation coefficient \(\rho\) and the sample correlation coefficient r

understand why r has a sampling distribution (but you don’t need to know what it is)

know what Fisher’s transformation is and why/when/how you would use it (don’t need to know its sampling distribution)

know how to calculate Spearman’s rank correlation coefficient (and the assumptions behind it)

know how to calculate Kendall’s tau

know the difference between OLS and RMA(SMA)/MA regression

know how we calculated the estimates for slope and intercept
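
Assuming the estimates in question are the usual OLS closed forms, the slope is \(\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}\) and the intercept is \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). The sketch below computes both; the helper name `ols_fit` and the data are hypothetical.

```python
def ols_fit(x, y):
    """Return (intercept, slope) minimising the sum of squared vertical residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up data for illustration
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_fit(x, y)
print(intercept, slope)
```

Because the fitted line always passes through \((\bar{x}, \bar{y})\), a different sample would give a different slope and intercept, which is why both have sampling distributions.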

understand why the slope and intercept have sampling distributions

understand the assumptions of regression

understand the difference between a confidence interval and a prediction interval

understand how to partition the variance for a linear regression into SSR, SSE, and SST (and their degrees of freedom) and how to use that to calculate the coefficient of determination \(r^2\)
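
The partition can be sketched numerically. The code assumes the convention that SSR is the regression (explained) sum of squares and SSE the error (residual) sum of squares, so that SST = SSR + SSE with degrees of freedom n - 1 = 1 + (n - 2) for simple linear regression. The data are made up for illustration.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up data for illustration
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Fit the OLS line first.
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * a for a in x]

sst = sum((b - mean_y) ** 2 for b in y)              # total, n - 1 d.o.f.
ssr = sum((f - mean_y) ** 2 for f in fitted)         # regression, 1 d.o.f.
sse = sum((b - f) ** 2 for b, f in zip(y, fitted))   # error, n - 2 d.o.f.
r_squared = ssr / sst
print(sst, ssr, sse, r_squared)
```

The coefficient of determination is the share of total variation explained by the line, \(r^2 = SSR / SST = 1 - SSE / SST\).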

Final Study Sheet

George Gurgis

5/3/2021

understand the basic idea behind robust regression (when you would use it, how it works generally)

know how to interpret regression estimates generally, and also how to interpret the output of the R function ‘lm’

know when you would use a Generalized Linear Model, and when a Bernoulli, Binomial, and Poisson regression would be appropriate and why

be able to write down the model equation for each of the GLMs introduced (basically, from the Model summary table.doc handout).

understand how to interpret the GLM parameter estimates

know what “overdispersion” means in the context of Poisson regression

know what Deviance is, how to calculate it, and what its sampling distribution is

know how to use Deviance to compare two models

understand the basic idea behind splines/LOESS smoothers, and Generalized Additive Models (GAMs)

understand the two methods for looking at the significance of a regression covariate (t-test and comparison of full to reduced model)

know what multicollinearity is, why it’s a problem, how we diagnose it, and what to do about it

FROM THE ANOVA STUDY SHEET

One-way ANOVA:

Write down the model equation

Know the one-way ANOVA null hypothesis and implied alternative hypothesis

Know why a test of means involves a ratio of variances (i.e. why are we using an F ratio to test a statistical hypothesis?)

Fill out a one-way ANOVA table

Assumptions of ANOVA

Difference between fixed vs. random effects and the different null hypotheses implied

Understand why follow-up analyses to ANOVA are required

Understand Tukey’s HSD

Two-way ANOVA (Factorial and Nested)

Write down the model equation for factorial or nested design using “effect” coding

Write down the model equation for factorial or nested design using “cell means” approach

Know the two-way ANOVA null hypotheses (factorial and nested) and implied alternative hypotheses

Fill out a two-way factorial ANOVA table when A,B are fixed and when A,B are random

Know the difference between sequential (Type I) and marginal (Type III) sums-of-squares and when you have to worry about them

understand the bias-variance trade-off in model selection