1-1-1 Data basics

In a data matrix, each row is an observation (case) and each column is a variable

Types of variable:

  1. Numerical variable (takes on numerical values, sensible to add, subtract, take avg, etc)
    • Continuous numerical variable (is measured, and can take on any numerical value)
    • Discrete numerical variable (is counted, and can take on only whole non-negative numbers)
  2. Categorical variable (takes on a limited number of distinct categories, not sensible to do arithmetic operations)
    • Categorical variables that have ordered levels are called ordinal
    • If the levels do not have an inherent ordering to them, then the variable is simply called Categorical

Relationships between variables

  1. If two variables are associated, they are dependent
  2. If two variables are not associated, they are independent


1-1-2 Observational studies & experiments

| observational study | experiment |
|---|---|
| 1. collect data in a way that does not directly interfere with how the data arise, i.e. merely “observe” | 1. randomly assign subjects to various treatments |
| 2. can only establish an association between the explanatory and response variables | 2. can establish causal connections between the explanatory and response variables |
| 3. if an observational study uses data from the past, it’s called a retrospective study, whereas if data are collected throughout the study, it’s called prospective | - |

Correlation does not imply causation

For observational study, if A is associated with B, it can be:

  1. A->B
  2. B->A
  3. C->A and C->B

Confounding variable

Extraneous variables that affect both the explanatory and the response variable, and that make it seem like there is a relationship between them are called confounding variables


1-1-3 Sampling & Sources of bias

A few sources of sampling bias

  • Convenience sample (Individuals who are easily accessible are more likely to be included in the sample)
  • Non-response (If only a non-random fraction of the randomly sampled people respond to a survey such that the sample is no longer representative of the population)
  • Voluntary response (Occurs when the sample consists of people who volunteer to respond because they have strong opinions on the issue)

Sampling methods

  • Simple random sample (Randomly select cases from the population, such that each case is equally likely to be selected)
  • Stratified sample (Divide the population into homogenous strata, and then randomly sample from within each stratum)
  • Cluster sample (Divide the population into clusters, randomly sample a few clusters, and then randomly sample from within these clusters. The clusters should be similar to each other)


1-1-4 Experimental design

Principles of experimental design

  1. Control: Compare treatment of interest to a control group
  2. Randomize: Randomly assign subjects to treatments
  3. Replicate: Within a study, replicate by collecting a sufficiently large sample, or replicate the entire study
  4. Block: If there are variables that are known or suspected to affect the response variable, first group subjects into blocks based on these variables, and then randomize cases within each block to treatment groups

Example on Blocking

  • We would like to design an experiment to investigate if energy gels make you run faster:
    • Treatment: energy gel
    • Control: no energy gel
  • It is suspected that energy gels might affect pro and amateur athletes differently, therefore we block for pro status:
    • Divide the sample to pro and amateur
    • Randomly assign pro and amateur athletes to treatment and control groups
    • Pro and amateur athletes are equally represented in the resulting treatment and control groups

Blocking vs. Explanatory variables

  • Explanatory variables (also sometimes called factors) are conditions we can impose on the experimental units
  • Blocking variables are characteristics that the experimental units come with, that we would like to control for
  • Blocking is like stratifying, except used in experimental settings when randomly assigning, as opposed to when randomly sampling

More experimental design terminology

  • Placebo: fake treatment, often used as the control group for medical studies
  • Placebo effect: experimental units showing improvement simply because they believe they are receiving a special treatment
  • Blinding: when experimental units do not know whether they are in the control or treatment group
  • Double-blind: when both the experimental units and the researchers do not know who is in the control and who is in the treatment group


1-1-5 Spotlight - Random sampling vs. assignment

|  | Random assignment | No random assignment |  |
|---|---|---|---|
| Random sampling | causal and generalizable | not causal, but generalizable | Generalizability |
| No random sampling | causal, but not generalizable | neither causal nor generalizable | No generalizability |
|  | Causation | Association |  |
  1. Random sampling -> Generalizability
  2. Random assignment -> Causality


1-2-1 Visualizing numerical data

Scatterplots

  • We might conclude Correlation from the scatterplots, but never Causation
  • Relationship between A and B:
    • Positive or Negative (Direction)
    • Linear or Curved (Shape)
    • Strong or Weak (Strength)

Histogram

In a histogram, the values are “binned” into intervals and the height of each bar is the number of cases (frequency) in that bin

The chosen bin width can alter the story the histogram is telling

  1. Skewness (distributions are skewed to the side of the long tail)
    • Left skewed (long tail to the left)
    • Symmetric (no skew)
    • Right skewed (long tail to the right)
  2. Modality
    • Unimodal (a single prominent peak)
    • Bimodal (two prominent peaks)
    • Uniform (no prominent peaks)
    • Multimodal (more than two prominent peaks)

Dot plot

Useful when individual values are of interest, but can get very busy as the sample size increases.

Boxplot

  1. The interquartile range (IQR) is the range of the middle 50% of the data: the distance between the third quartile (75th percentile) and the first quartile (25th percentile)
  2. A boxplot can also show skewness, like a histogram:
    • If the minimum and first quartile are far from the median, the distribution is left skewed
    • If the maximum and third quartile are far from the median, the distribution is right skewed

Intensity map

  1. It’s a map (e.g. of the world or a country) where each region is shaded with varying color intensity (usually darker means a larger value)
  2. Useful for highlighting the spatial distribution


1-2-2 Measures of center

  1. Center
    • mean: arithmetic average
    • median: midpoint of the distribution (50th percentile)
    • mode: most frequent observation
  2. Relationship with skewness
    • left skewed: Mean < Median < Mode
    • symmetric: Mean == Median == Mode
    • right skewed: Mean > Median > Mode


1-2-3 Measures of spread

  1. Variance:
    1. If we only have a sample, the variance is \[s^{2} = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}\]
    2. If we know the whole population, the variance is \[\sigma^{2} = \frac{\sum_{i=1}^N (x_i - \mu)^2}{N}\]
  2. Standard deviation:
    1. Square root of variance
    2. Spread of the data


1-2-4 Robust statistics

We define robust statistics as measures on which extreme observations have little effect

| robustness/measure | robust | non-robust |
|---|---|---|
| center | median | mean |
| spread | IQR | SD, range |


1-2-5 Transforming numerical data

A transformation is a rescaling of the data using a function. It is used when data are very strongly skewed

Types of transformation are: Log, Square root, and Inverse

Goals of transformations

  1. To see the data structure differently
  2. To reduce skew to assist in modeling
  3. To straighten a nonlinear relationship in a scatterplot

Log transformation:

  1. The natural log transformation is often applied when much of the data cluster near zero (relative to the larger values in the data set) and all observations are positive

  2. The transformation can also be applied to one or both variables in a scatterplot to make the relationship between the variables more linear, and hence easier to model with simple methods


1-2-6 Exploring categorical variables

Visualizing distribution of a single categorical variable

  1. Frequency table
##           Counts Frequencies
## casein        12   0.1690141
## horsebean     10   0.1408451
## linseed       12   0.1690141
## meatmeal      11   0.1549296
## soybean       14   0.1971831
## sunflower     12   0.1690141
  2. Bar plot

Visualizing relationship between two categorical variables

  1. Contingency table
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  71 
## 
##  
##              | weight 
##         feed | (108,187] | (187,266] | (266,344] | (344,423] | Row Total | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##       casein |         0 |         3 |         3 |         6 |        12 | 
##              |     0.000 |     0.250 |     0.250 |     0.500 |     0.169 | 
##              |     0.000 |     0.125 |     0.130 |     0.667 |           | 
##              |     0.000 |     0.042 |     0.042 |     0.085 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##    horsebean |         8 |         2 |         0 |         0 |        10 | 
##              |     0.800 |     0.200 |     0.000 |     0.000 |     0.141 | 
##              |     0.533 |     0.083 |     0.000 |     0.000 |           | 
##              |     0.113 |     0.028 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##      linseed |         4 |         6 |         2 |         0 |        12 | 
##              |     0.333 |     0.500 |     0.167 |     0.000 |     0.169 | 
##              |     0.267 |     0.250 |     0.087 |     0.000 |           | 
##              |     0.056 |     0.085 |     0.028 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##     meatmeal |         1 |         5 |         4 |         1 |        11 | 
##              |     0.091 |     0.455 |     0.364 |     0.091 |     0.155 | 
##              |     0.067 |     0.208 |     0.174 |     0.111 |           | 
##              |     0.014 |     0.070 |     0.056 |     0.014 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##      soybean |         2 |         7 |         5 |         0 |        14 | 
##              |     0.143 |     0.500 |     0.357 |     0.000 |     0.197 | 
##              |     0.133 |     0.292 |     0.217 |     0.000 |           | 
##              |     0.028 |     0.099 |     0.070 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##    sunflower |         0 |         1 |         9 |         2 |        12 | 
##              |     0.000 |     0.083 |     0.750 |     0.167 |     0.169 | 
##              |     0.000 |     0.042 |     0.391 |     0.222 |           | 
##              |     0.000 |     0.014 |     0.127 |     0.028 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
## Column Total |        15 |        24 |        23 |         9 |        71 | 
##              |     0.211 |     0.338 |     0.324 |     0.127 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
## 
## 
  2. Segmented bar plot or relative frequency segmented bar plot

  3. Mosaicplot
# assumes feed and a binned weight are available, e.g. feed <- chickwts$feed; weight <- cut(chickwts$weight, 4)
mosaicplot(table(feed, weight), color=rainbow(4))

Visualizing relationship between a categorical and a numerical variable

  1. Side-by-side box plots
# ChickWeight is a built-in data set; Diet is categorical and weight is numerical
boxplot(weight~Diet, data=ChickWeight, range=0, border=rainbow(4), main="Chick Weight by Diet")


1-3-1 Inference via simulation

  1. Set a null and an alternative hypothesis
  2. Simulate the experiment assuming that the null hypothesis is true
  3. Repeat the simulation many times to get the sampling distribution
  4. From the sampling distribution, evaluate the probability of observing an outcome at least as extreme as the one observed in the original data
  5. If this probability is low, then reject the null hypothesis in favor of the alternative.
  6. If this probability is not low, then fail to reject the null hypothesis
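
A minimal sketch of such a simulation in R; the scenario (48 promotion decisions, 24 male and 24 female candidates, 35 promotions in total) and all numbers are assumed for illustration:

set.seed(1)
outcomes <- c(rep("promoted", 35), rep("not promoted", 13))   # all 48 observed outcomes
obs_diff <- 21/24 - 14/24       # assumed observed difference in promotion rates (male - female)
sim_diffs <- replicate(10000, {
  shuffled <- sample(outcomes)  # re-deal the outcomes at random, as the null hypothesis implies
  mean(shuffled[1:24] == "promoted") - mean(shuffled[25:48] == "promoted")
})
mean(sim_diffs >= obs_diff)     # p-value: proportion of simulations at least as extreme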



2-1-1 Disjoint

  1. Disjoint (mutually exclusive) events cannot happen at the same time \[P(A\ or\ B) = P(A) + P(B)\]
  2. Non-disjoint events can happen at the same time \[P(A\ or\ B) = P(A) + P(B) - P(A\ and\ B)\]
  3. A sample space is a collection of all possible outcomes of a trial
  4. A probability distribution lists all possible outcomes in the sample space, and the probabilities with which they occur
  5. Complementary events are two mutually exclusive events whose probabilities add up to 1


2-1-2 Independence

Two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other \[P(A\ |\ B) = P(A)\] \[P(A\ and\ B) = P(A) * P(B)\]

Disjoint events are dependent on each other: knowing that one occurred tells you the other did not


2-2-1 Bayes’s theorem

\[P(A\ |\ B) = \frac{P(A\ and\ B)}{P(B)} = \frac{P(B\ |\ A)\ P(A)}{P(B)}\]


2-2-2 Probability tree

A probability tree lays out the marginal and conditional probabilities of a sequence of events; it is useful for inverting conditional probabilities, i.e. going from \(P(A\ |\ B)\) to \(P(B\ |\ A)\)


2-2-3 Bayesian inference

Posterior probability

  • It is generally defined as: \(P(hypothesis\ |\ data)\)
  • It depends on both the prior probability we set and the observed data
  • In the next iteration, we update our prior with our posterior probability from the previous iteration

Recap

  • Take advantage of prior information, like a previously published study or a physical model
  • Naturally integrate data as you collect it, and update your priors
  • A good prior helps, a bad prior hurts, but the prior matters less when we have more data
  • Base decisions on the posterior probability: \(P(hypothesis\ is\ true\ |\ observed\ data)\)
  • It is different from Frequentist Inference \(P(observed\ data\ |\ hypothesis\ is\ true)\)


2-3-1 Normal distribution

Empirical rule:

  1. 68% of the observations lie in \([\mu-\sigma, \mu+\sigma]\)
  2. 95% of the observations lie in \([\mu-2*\sigma, \mu+2*\sigma]\)
  3. 99.7% of the observations lie in \([\mu-3*\sigma, \mu+3*\sigma]\)
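
A quick check of these percentages using the standard normal CDF in R:

# P(mu - k*sigma < X < mu + k*sigma) for k = 1, 2, 3 under a normal distribution
sapply(1:3, function(k) pnorm(k) - pnorm(-k))   # approximately 0.683, 0.954, 0.997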

Standardized (Z) score

  • The Z score of an observation is the number of standard deviations it falls above or below the mean: \(Z = \frac{x - \mu}{\sigma}\)
  • Defined for distributions of any shape


2-3-2 Evaluating the normal distribution

Anatomy of a normal probability plot

  • Data are plotted on the y-axis of a normal probability plot, and theoretical quantiles (following a normal distribution) on the x-axis
  • If there is a one-to-one relationship between the data and the theoretical quantiles, then the data follow a nearly normal distribution
  • Since a one-to-one relationship would appear as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model
  • Constructing a normal probability plot requires calculating percentiles and corresponding z-scores for each observation, which is tedious
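
In R, qqnorm() and qqline() do this tedious work; a small sketch with simulated data (the sample below is assumed, not from these notes):

set.seed(2)
x <- rnorm(100, mean = 50, sd = 5)   # assumed nearly normal sample
qqnorm(x)                            # theoretical quantiles on the x-axis, data on the y-axis
qqline(x)                            # reference line; points close to it suggest near-normality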

Shape of distribution

  1. Right skew: Points bend up and to the left of the line
  2. Left skew: Points bend down and to the right of the line
  3. Short tails (narrower than the normal distribution): Points follow an S-shaped curve
  4. Long tails (wider than the normal distribution): Points start below the line, bend to follow it, and end above it


2-4-1 Binomial distribution

Bernoulli random variables

  • When an individual trial of an experiment has only two possible outcomes, it is called a Bernoulli random variable

Binomial distribution definition

  • The binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p: \[{n \choose k} * p^k * (1-p)^{n-k}\]

Binomial distribution conditions

  1. The trials must be independent
  2. The number of trials n must be fixed
  3. Each trial outcome must be classified as a success or a failure (i.e. Bernoulli random variable)
  4. The probability of success p must be the same for each trial

Calculating probabilities

# P(exactly 8 successes in 10 independent trials, each with success probability 0.13)
dbinom(8, size=10, p=0.13)
## [1] 2.77842e-06

Mean and standard deviation of binomial distribution

  • Expected value (mean) of binomial distribution: \[\mu = n p\]

  • Standard deviation of binomial distribution: \[\sigma = \sqrt{n p (1-p)}\]


2-4-2 Normal approximation to binomial distribution

Success-failure rule

  • A binomial distribution with at least 10 expected successes and 10 expected failures closely follows a normal distribution \[np >= 10\] \[n(1-p) >= 10\]

  • If the success-failure condition holds, normal approximation to the binomial: \[Binomial(n,p)\ approximates\ Normal(\mu, \sigma)\] where \(\mu = n p\) and \(\sigma = \sqrt{n p (1-p)}\)

  • A small trick (continuity correction) when using the normal distribution to approximate the binomial, e.g. for \(P(X \geq x)\): use \(\frac{(x\ -\ 0.5)\ -\ \mu}{\sigma}\) instead of \(\frac{x\ -\ \mu}{\sigma}\)
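
A sketch comparing the exact binomial tail probability with the continuity-corrected normal approximation (n, p, and x below are assumed example values):

n <- 245; p <- 0.25; x <- 70
mu <- n * p; sigma <- sqrt(n * p * (1 - p))
pbinom(x - 1, size = n, prob = p, lower.tail = FALSE)   # exact P(X >= x)
pnorm((x - 0.5 - mu) / sigma, lower.tail = FALSE)       # normal approximation with the 0.5 shift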



3-1-1 Foundations for inference

Sampling distribution

The distribution of a sample statistic (e.g. the sample mean) computed from many samples drawn from the population

Central Limit Theorem (CLT)

The sampling distribution is nearly normal, centered at the population mean, and with a standard deviation equal to the population standard deviation divided by square root of the sample size. Formula: \(\bar{x}\) ~ \(N (mean=\mu, SE=\frac{\sigma}{\sqrt{n}})\) \[mean = \mu\] \[Variance = \frac{\sigma^2}{n}\] \[SE = \sqrt{\frac{\sigma^2}{n}}\]

We usually use \(s\) instead of \(\sigma\) since we don’t know the population standard deviation

Conditions for CLT

  1. Independence: Sampled observations must be independent.
    • Random sample/assignment
    • If sampling without replacement, n < 10% of the population
  2. Sample size/skewness: Either the population distribution is normal, or if the population is skewed, the sample size is large ( Rule of thumb: n > 30)
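
A small simulation illustrating the CLT; the exponential population below is an assumed example:

set.seed(3)
population <- rexp(100000, rate = 1)     # right-skewed population with mu = 1 and sigma = 1
sample_means <- replicate(5000, mean(sample(population, size = 50)))
c(mean(sample_means), sd(sample_means))  # close to mu = 1 and sigma/sqrt(50) = 0.141
hist(sample_means, breaks = 40)          # nearly normal, even though the population is skewed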


3-2-1 Confidence interval

A plausible range of values for the population parameter is called a confidence interval

Confidence interval for a population mean

Computed as the sample mean plus/minus a margin of error (critical value corresponding to the middle XX% of the normal distribution times the standard error of the sampling distribution) \[[\bar{x} - z^* \frac{s}{\sqrt{n}}, \bar{x} + z^* \frac{s}{\sqrt{n}}]\] where \(z^*\) is called critical value, and margin of error (ME) is \(z^* \frac{s}{\sqrt{n}}\)
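
A sketch of this calculation in R; the sample summary statistics are assumed:

xbar <- 3.2; s <- 1.74; n <- 50          # assumed sample mean, standard deviation, and size
se <- s / sqrt(n)
z_star <- qnorm(0.975)                   # critical value for a 95% confidence level
xbar + c(-1, 1) * z_star * se            # lower and upper bounds of the interval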


3-2-2 Confidence level

  1. Suppose we took many samples of the same size and built a confidence interval with 95% confidence level from each sample using the equation \[point\ estimate\ \pm\ 1.96 \times SE\]
  2. Then about 95% of those confidence intervals would contain the true population mean \(\mu\)
  3. Commonly used confidence levels in practice are 90%, 95%, 98%, and 99%
  4. Increasing the sample size shortens the confidence interval (higher precision) at a fixed confidence level; increasing the confidence level widens the interval (higher accuracy, but lower precision)


3-3-2 Hypothesis testing (for a mean)

Hypothesis testing for a single mean

  1. Set the hypotheses: \[H_0: \mu = null\ value\] \[H_A: \mu < or > or <> null\ value\] (one-sided or two-sided)

  2. Calculate the point estimate: \(\bar{x}\)

  3. Check conditions:
    1. Independence: Sampled observations must be independent (random sample/assignment & if sampling without replacement n < 10% of population)
    2. Sample size/skew: n > 30, larger if the population distribution is very skewed
  4. Draw sampling distribution, calculate test statistic \(Z = \frac{\bar{x}-\mu}{SE}\), \(SE = \frac{s}{\sqrt{n}}\), and get p-value. \[p-value = P(observed\ or\ more\ extreme\ outcome\ |\ H_0\ is\ true)\]

  5. Make a decision, and interpret it in context of the research question:
    • If p-value < \(\alpha\), reject \(H_0\); The data provide convincing evidence for \(H_A\)
    • If p-value > \(\alpha\), fail to reject \(H_0\); The data do not provide convincing evidence for \(H_A\)
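
A sketch of steps 4 and 5 in R, using assumed summary statistics:

xbar <- 118.2; mu_0 <- 100; s <- 6.5; n <- 36      # assumed values
se <- s / sqrt(n)
z <- (xbar - mu_0) / se
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value
p_value < 0.05                                     # TRUE means reject H0 at alpha = 0.05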

Relationship between HT and CI

  1. Using a corresponding confidence level (e.g. 95%) and two-sided significance level (e.g. 0.05), the hypothesis test and the confidence interval yield the same conclusion
  2. Using a corresponding confidence level (e.g. 90%) and one-sided significance level (e.g. 0.05), the hypothesis test and the confidence interval yield the same conclusion


3-4-1 Inference for other estimators

Unbiased estimator

An important assumption of CLT about point estimates is that they are unbiased, i.e. the sampling distribution of the estimate is centered at the true population parameter it estimates.
That is, an unbiased estimator does not systematically over- or underestimate the parameter; it provides a “good” estimate


3-5-1 Decision errors

Type 1 & type 2 errors

  1. Type 1 error (\(\alpha\)): Reject \(H_0\) when \(H_0\) is true. The probability of doing so is \(\alpha\)
  2. Type 2 error (\(\beta\) or \(1-power\)): Fail to reject \(H_0\) when \(H_0\) is false. The probability of doing so is \(\beta\)
  3. Power of a test is the probability of correctly rejecting \(H_0\) when \(H_0\) is false. This is what we want.


3-5-3 Statistical vs. practical significance

  • Real differences between the point estimate and null value are easier to detect with larger samples
  • However, very large samples will result in statistical significance even for tiny differences between the sample mean and the null value, even when the difference is not practically significant



4-1-1 Hypothesis testing for paired data

Comparing means of matched pairs

  • When two sets of observations have special correspondence (not independent), they are said to be paired:
    • Same individuals: pre-post studies, repeated measures, etc.
    • Different (but dependent) individuals: twins, partners, etc.
  • To analyze paired data, it is often useful to look at the difference in outcomes of each pair of observations:
    • Converting a two-variable problem into a familiar one-variable problem
  • It is important that we always subtract using a consistent order
  • Hypothesis testing for difference between paired means:
    1. Set the hypothesis: \[H_0: \mu_{diff} = null\ value\] \[H_A: \mu_{diff} < or > or <> null\ value\]
    2. Calculate the point estimate \(\bar{x_{diff}}\)
    3. Check conditions:
      1. Independence: Sampled observations must be independent (random sample/assignment & if sampling without replacement, \(n_{diff}\) < 10% of population)
      2. Sample size/skew: \(n_{diff}\) >= 30, larger if the population distribution is very skewed
    4. Draw sampling distribution, calculate test statistic, and get p-value
    5. Make a decision, and interpret it in context of the research question
  • Confidence interval for difference between paired means: \[[\bar{x_{diff}} - z^*\frac{s_{diff}}{\sqrt{n_{diff}}}, \bar{x_{diff}} + z^*\frac{s_{diff}}{\sqrt{n_{diff}}}]\]
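
A sketch of the paired test and interval, using assumed summary statistics for the differences:

xbar_diff <- -0.545; s_diff <- 8.887; n_diff <- 200   # assumed summaries of the differences
se <- s_diff / sqrt(n_diff)
z <- (xbar_diff - 0) / se
2 * pnorm(abs(z), lower.tail = FALSE)                 # two-sided p-value
xbar_diff + c(-1, 1) * qnorm(0.975) * se              # 95% confidence interval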


4-1-3 Comparing two independent means

CLT Conditions for inference for comparing two independent means:

  1. Independence:
    • within groups: sampled observations must be independent
      • random sample/assignment
      • if sampling without replacement, n < 10% of population
    • between groups: the two groups must be independent of each other (non-paired)
  2. Sample size/skew: Each sample size must be at least 30 (\(n_1\)>=30 and \(n_2\)>=30), larger if the population distribution is very skewed.

Confidence interval for Estimating the difference between independent means

\[CI = [(\bar{x_1}-\bar{x_2}) - z^*SE_{\bar{x_1}-\bar{x_2}}, (\bar{x_1}-\bar{x_2}) + z^*SE_{\bar{x_1}-\bar{x_2}}]\] where \(SE_{\bar{x_1}-\bar{x_2}} = \sqrt{\frac{{s_1}^2}{n_1}+\frac{{s_2}^2}{n_2}}\)

Hypothesis testing for difference between independent means

  • Null hypothesis: no difference. \(H_0: \mu_1 - \mu_2= 0\)
  • Alternative hypothesis: some difference. \(H_A: \mu_1 - \mu_2 <> 0\)
  • Same conditions and SE as the confidence interval
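
A sketch with assumed group summaries:

xbar1 <- 41.8; s1 <- 15.14; n1 <- 505               # assumed group 1 summaries
xbar2 <- 39.4; s2 <- 15.12; n2 <- 667               # assumed group 2 summaries
se <- sqrt(s1^2 / n1 + s2^2 / n2)
(xbar1 - xbar2) + c(-1, 1) * qnorm(0.975) * se      # 95% CI for mu_1 - mu_2
z <- ((xbar1 - xbar2) - 0) / se
2 * pnorm(abs(z), lower.tail = FALSE)               # two-sided p-value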


4-2-1 Bootstrapping

Use bootstrapping to construct confidence intervals for parameters of interest when CLT based approach does not apply.

Bootstrapping scheme

  1. Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample
  2. Calculate the bootstrap statistic - a statistic such as mean, median, proportion, etc. computed on the bootstrap sample
  3. Repeat steps 1 and 2 many times to create a bootstrap distribution - a distribution of bootstrap statistics
  4. Get confidence interval with:
    1. percentile method: get middle x% (e.g. 95%) from bootstrap distribution
    2. standard error method: get \([\bar{x_{boot}} - z^*SE_{boot}, \bar{x_{boot}} + z^*SE_{boot}]\)
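
A minimal percentile-method sketch in R; the original sample below is simulated purely for illustration:

set.seed(4)
original_sample <- rexp(40, rate = 0.1)             # assumed original sample of size 40
boot_medians <- replicate(10000,
  median(sample(original_sample, replace = TRUE)))  # bootstrap statistic for each resample
quantile(boot_medians, c(0.025, 0.975))             # 95% bootstrap interval (percentile method)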

Bootstrapping limitations:

  • The conditions are not as rigid as for CLT-based methods
  • However if the bootstrap distribution is extremely skewed or sparse, the bootstrap interval might be unreliable
  • A representative sample is required for generalizability. If the sample is biased, the estimates resulting from this sample will also be biased

Bootstrap vs. sampling distribution

  • Sampling distribution is created using sampling (with replacement) from the population
  • Bootstrap distribution is created using sampling (with replacement) from the sample
  • Both are distributions of sample statistics


4-3-1 t-distribution

  • When n is small (n < 30) & \(\sigma\) is unknown (almost always), use the t distribution to address the uncertainty of the standard error estimate (since we use \(\frac{s}{\sqrt{n}}\) instead of \(\frac{\sigma}{\sqrt{n}}\))
  • Bell shaped but thicker tails than the normal
    • observations more likely to fall beyond 2 SDs from the mean
    • extra thick tails helpful for mitigating the effect of a less reliable estimate for the standard error of the sampling distribution
  • t distribution is always centered at 0 (like the standard normal)
  • has one parameter: degrees of freedom (df) - that determines thickness of the tails
  • The t statistic and p-value are calculated the same way: \[T = \frac{obs-null}{SE}\] \[P-value = P(x>T)\ or\ P(|x|>T)\]
  • The larger the degrees of freedom is, the closer the t distribution is to standard normal distribution
  • Usually we use t distribution for inference (hypothesis test and confidence interval) for one or two means
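
A quick illustration of the tails thinning as df grows:

qt(0.975, df = c(5, 10, 30, 100))   # 97.5th percentile of the t distribution for increasing df
qnorm(0.975)                        # the values approach the standard normal value, about 1.96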


4-3-2 Inference for one sample mean using t-distribution

CLT Conditions for inference for one sample mean

  1. Independence: Sampled observations must be independent (random sample/assignment & if sampling without replacement n < 10% of population)
  2. Sample size/skew: the n > 30 condition does not need to be met when using the t distribution, which is designed for small samples

Confidence interval for one sample mean

\[CI = [\bar{x} - t_{df}^*SE, \bar{x} + t_{df}^*SE]\] where \[SE = \frac{s}{\sqrt{n}}\] \[df = n - 1\]

Hypothesis testing for one sample mean

  • Same conditions and standard error as the confidence interval
  • Use t-distribution instead of standard normal distribution to calculate P-value
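
A sketch of the one-sample t interval and test, with assumed summary statistics:

xbar <- 52.1; s <- 45.1; n <- 22; mu_0 <- 30   # assumed values
se <- s / sqrt(n); df <- n - 1
t_stat <- (xbar - mu_0) / se
2 * pt(abs(t_stat), df, lower.tail = FALSE)    # two-sided p-value
xbar + c(-1, 1) * qt(0.975, df) * se           # 95% confidence interval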


4-3-3 Inference for comparing two sample means using t-distribution

Inference

  • Confidence interval \[CI = [(\bar{x_1}-\bar{x_2}) - t_{df}^*SE_{\bar{x_1}-\bar{x_2}}\ ,\ (\bar{x_1}-\bar{x_2}) + t_{df}^*SE_{\bar{x_1}-\bar{x_2}}]\]

  • Hypothesis testing \[T_{df} = \frac{(\bar{x_1}-\bar{x_2}) - (\mu_1-\mu_2)}{SE_{\bar{x_1}-\bar{x_2}}}\]

\[SE_{\bar{x_1}-\bar{x_2}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]

  • Degrees of freedom for t statistic for inference on difference of two means \[df = min(n_1-1, n_2-1)\]
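
A sketch with assumed group summaries and the conservative degrees of freedom:

xbar1 <- 22.1; s1 <- 13.9; n1 <- 22             # assumed group 1 summaries
xbar2 <- 27.1; s2 <- 15.8; n2 <- 22             # assumed group 2 summaries
se <- sqrt(s1^2 / n1 + s2^2 / n2)
df <- min(n1 - 1, n2 - 1)
t_stat <- (xbar1 - xbar2) / se
2 * pt(abs(t_stat), df, lower.tail = FALSE)         # two-sided p-value
(xbar1 - xbar2) + c(-1, 1) * qt(0.975, df) * se     # 95% confidence interval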


4-4-1 Comparing more than two means

To compare the means of 2+ groups we use a test called analysis of variance (ANOVA) and a statistic called the F statistic

ANOVA

  • Compare means from more than two groups: are they so far apart that the observed differences cannot all reasonably be attributed to sampling variability?
  • Hypothesis: \[H_0: The\ mean\ outcome\ is\ the\ same\ across\ all\ categories\ (\mu_1=\mu_2=...=\mu_k)\] \[H_A: At\ least\ one\ pair\ of\ means\ are\ different\ from\ each\ other\]
  • Compute a test statistic \[F = \frac{variability\ between\ groups}{variability\ within\ groups}\]
  • Large test statistic leads to small p-values
  • If the p-value is small enough \(H_0\) is rejected, and we conclude that the data provide convincing evidence of a difference in the population means
  • ANOVA does not tell which means are different, it only tells at least two means are different

4-4-2 ANOVA in detail

Sample data:

|  | n | mean | sd |
|---|---|---|---|
| lower class | 41 | 5.07 | 2.24 |
| working class | 407 | 5.75 | 1.87 |
| middle class | 331 | 6.76 | 1.89 |
| upper class | 16 | 6.19 | 2.34 |
| overall | 795 | 6.14 | 1.98 |

Variability partitioning

Total variability in the sample data can be partitioned into:
  1. Variability attributed to class (between-group variability)
  2. Variability attributed to other factors (within-group variability)

ANOVA output

|  |  | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|---|
| Group | class | 3 | 236.56 | 78.855 | 21.735 | <0.0001 |
| Error | Residuals | 791 | 2869.80 | 3.628 |  |  |
|  | Total | 794 | 3106.36 |  |  |  |
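
A table like this can be produced with aov(); since the class data above are not included here, the built-in chickwts data set serves as a runnable stand-in:

fit <- aov(weight ~ feed, data = chickwts)   # one numerical response, one categorical explanatory variable
summary(fit)                                 # Df, Sum Sq, Mean Sq, F value, Pr(>F)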

Sum of Squares:

  1. Sum of squares total (SST): measures the total variability in the response variable \[SST = \sum_{i=1}^n(y_i - \bar{y})^2\] where
    \(y_i\): value of the response variable for each observation
    \(\bar{y}\): grand mean of the response variable

  2. Sum of squares groups (SSG): measures the variability between groups \[SSG = \sum_{j=1}^k{n_j(\bar{y_j}-\bar{y})^2}\] where
    \(n_j\): number of observations in group j
    \(\bar{y_j}\): mean of the response variable for group j
    \(\bar{y}\): grand mean of the response variable

  3. Sum of squares error/residual (SSE): measures the variability within groups \[SSE = SST - SSG\]

Degrees of freedom associated with ANOVA

\[Total:\ df_T = n - 1\] \[Group:\ df_G = k - 1\] \[Error:\ df_E = df_T - df_G\]

Mean squares

Average variability between and within groups, calculated as the total variability (sum of squares) scaled by the associated degrees of freedom \[Group: MSG = SSG / df_G\] \[Error: MSE = SSE / df_E\]

F statistic

Ratio of the between group and within group variability \[F = \frac{MSG}{MSE}\]

p-value

  • The p-value is the probability of observing a ratio of between-group to within-group variability at least as large as the one computed from the data, if in fact the means of all groups are equal
  • It is area under the F curve, with degrees of freedom \(df_G\) and \(df_E\), above the observed F statistic
# area under the F curve with df 3 and 791, above the observed F statistic
pf(21.735, 3, 791, lower.tail=FALSE)
## [1] 1.559855e-13

Conclusion

  • If the p-value is small (less than \(\alpha\)), reject \(H_0\). The data provide convincing evidence that at least one pair of population means are different from each other (but we can’t tell which pair)
  • If the p-value is large (greater than \(\alpha\)), fail to reject \(H_0\). The data do not provide convincing evidence that any pair of population means are different from each other; the observed differences in sample means are attributable to sampling variability (or chance)


4-4-3 Conditions for ANOVA

  1. Independence:
    • within groups: sampled observations must be independent
    • between groups: the groups must be independent of each other (non-paired)
  2. Approximate normality:
    • distributions should be nearly normal within each group, especially when sample sizes are small
  3. Equal variance:
    • groups should have roughly equal variability (constant variance), especially when sample sizes differ between groups


4-4-4 Multiple comparisons

Which means differ

  • After ANOVA, use two sample t-tests for differences in each possible pair of groups
  • Testing many pairs of groups is called multiple comparisons

Control Type 1 error

  • But multiple tests lead to increased Type 1 error rate
  • Solutions to control Type 1 error:
    • Bonferroni correction: adjust \(\alpha\) by the number of comparisons being considered \[\alpha^* = \alpha/K\] \[where\ K\ (number\ of\ comparisons) = \frac{k(k-1)}{2}\]

Pairwise comparisons

  • For multiple comparisons after ANOVA, since the assumption of equal variability across groups must have been satisfied, re-think the standard error and the degrees of freedom:
    • Use a consistent measure of standard error and consistent degrees of freedom for all tests
    • Standard error for multiple pairwise comparisons: \[SE = \sqrt{\frac{MSE}{n_1} + \frac{MSE}{n_2}}\]
    • Degrees of freedom for multiple pairwise comparisons: \[df_t = df_E\]
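
In R, pairwise.t.test() carries out all pairwise comparisons with a pooled standard deviation and can apply the Bonferroni adjustment; shown here on the built-in chickwts data:

pairwise.t.test(chickwts$weight, chickwts$feed,
                p.adjust.method = "bonferroni", pool.sd = TRUE)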



5-1-1 Sampling variability & CLT for proportions

Sampling distribution

The distribution of a sample statistic (here the sample proportion) computed from many samples drawn from the population

CLT for proportions

The sampling distribution is nearly normal, centered at the population proportion, and with a standard error inversely proportional to the sample size. Formula: \(\hat{p}\) ~ \(N(mean=p, SE=\sqrt{\frac{p(1-p)}{n}})\) If \(p\) is unknown, use \(\hat{p}\)

Conditions for the CLT

  1. Independence: Sampled observations must be independent
    • Random sample/assignment
    • If sampling without replacement, n < 10% of population
  2. Sample size/skew: There should be at least 10 successes and 10 failures in the sample
    • np >= 10 and n(1-p) >= 10
    • If p is unknown, use \(\hat{p}\)
    • If the success-failure condition is not met, center and spread of the sampling distribution can still be approximated using the same formula, but the shape of the distribution might be skewed depending on whether the true population proportion is closer to 0 or closer to 1


5-1-2 Confidence interval for a proportion

  • Confidence interval is calculated as point estimate minus/plus margin of error \[[\hat{p} - z^*SE_{\ \hat{p}}\ ,\ \hat{p} + z^*SE_{\ \hat{p}}]\] \[where\ SE_{\ \hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
  • Calculate the required sample size for desired Margin of Error:
    1. Remember \(ME = z^*\sqrt{\frac{p(1-p)}{n}}\)
    2. If there is a previous study that we can rely on for the value of \(p\), use that in the calculation of the required sample size
    3. If not, use \(p = 0.5\)
      • if you don’t know any better, 50-50 is a good guess
      • gives the most conservative estimate - largest possible sample size n
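
A sketch of the interval and of the sample-size calculation; the values are assumed:

p_hat <- 0.60; n <- 1000                       # assumed sample proportion and size
se <- sqrt(p_hat * (1 - p_hat) / n)
p_hat + c(-1, 1) * qnorm(0.975) * se           # 95% confidence interval
qnorm(0.975)^2 * 0.5 * 0.5 / 0.02^2            # n needed for ME = 2% when using p = 0.5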


5-1-3 Hypothesis test for a proportion

  1. Set the hypotheses: \[H_0: p= null value\] \[H_A: p < or > or <> null value\]
  2. Calculate the point estimate: \(\hat{p}\)
  3. Check conditions:
    1. Independence: Sampled observations must be independent (random sample/assignment & if sampling without replacement, n < 10% of population)
    2. Sample size/skew: \(np >= 10\) and \(n(1-p) >= 10\)
  4. Draw sampling distribution, calculate test statistic, and shade p-value, \[Z = \frac{\hat{p} - p}{SE},\ SE = \sqrt{\frac{p(1-p)}{n}}\]
  5. Make a decision, and interpret it in context of the research question:
    • If p-value < \(\alpha\), reject \(H_0\); the data provide convincing evidence for \(H_A\)
    • If p-value > \(\alpha\), fail to reject \(H_0\); the data do not provide convincing evidence for \(H_A\)

Note:
  • In a hypothesis test, we use the null value p for checking the success-failure condition and computing the SE
  • In a confidence interval, we use \(\hat{p}\) for checking the condition and computing the SE, since we don’t know p
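
A sketch of steps 3-5 with assumed values, using the null value for the standard error:

p_hat <- 0.38; p_0 <- 0.5; n <- 500            # assumed values
se <- sqrt(p_0 * (1 - p_0) / n)                # the SE uses the null value in a hypothesis test
z <- (p_hat - p_0) / se
2 * pnorm(abs(z), lower.tail = FALSE)          # two-sided p-value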


5-2-1 Inference for comparing two independent proportions

Confidence interval

It is calculated as point estimate minus/plus margin of error \[[(\hat{p_1} - \hat{p_2}) - z^*SE_{(\hat{p_1}-\hat{p_2})}\ ,\ (\hat{p_1} - \hat{p_2}) + z^*SE_{(\hat{p_1}-\hat{p_2})}]\] \[SE = \sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1} + \frac{\hat{p_2}(1-\hat{p_2})}{n_2}}\]

Conditions for inference for comparing two independent proportions

  1. Independence:
    • Within groups: sampled observations must be independent within each group
      • random sample/assignment
      • if sampling without replacement, n < 10% of population
    • Between groups: the two groups must be independent of each other (non-paired)
  2. Sample size/skew: Each sample should meet the success-failure condition:
    • \(n_1p_1 >= 10\) and \(n_1(1-p_1) >= 10\)
    • \(n_2p_2 >= 10\) and \(n_2(1-p_2) >= 10\)

Hypothesis testing

Under \(H_0: p_1 = p_2\), use the pooled proportion for the success-failure condition check and the SE calculation: \[\hat{p}_{pool} = \frac{(number\ of\ successes_1) + (number\ of\ successes_2)}{n_1 + n_2}\]
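
A sketch with assumed counts:

x1 <- 45; n1 <- 90; x2 <- 60; n2 <- 90                # assumed successes and sample sizes
p_pool <- (x1 + x2) / (n1 + n2)
se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z <- (x1 / n1 - x2 / n2) / se
2 * pnorm(abs(z), lower.tail = FALSE)                 # two-sided p-value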


5-3-1 Small sample proportion (when success-failure condition fails)

Inference via simulation

  • The ultimate goal of a hypothesis test is a p-value
    • p-value = P(observed or more extreme outcome | \(H_0\) is true)
  • Devise a simulation schema that assumes the null hypothesis is true
  • Repeat the simulation many times and record relevant sample statistic (one sample statistic per simulation)
  • Calculate the p-value as the proportion of simulations that yield a result at least as extreme as the observed proportion
  • For one small sample we can use coin flip; for two small samples we can use cards for simulation
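
A coin-flip style simulation for a single small-sample proportion; the counts below are assumed:

set.seed(5)
n <- 20; p_null <- 0.5; obs_prop <- 0.85            # assumed sample size, null value, observed proportion
sim_props <- replicate(10000,
  mean(sample(c(0, 1), n, replace = TRUE, prob = c(1 - p_null, p_null))))
mean(sim_props >= obs_prop)                         # one-sided p-value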


5-4-1 Chi-square goodness of fit (GOF) test

chi-square statistic

When dealing with counts and investigating how far the observed counts are from the expected counts, we use a new test statistic called the chi-square (\(\chi^2\)) statistic \[\chi^2\ statistic:\ \ \chi^2 = \sum_{i=1}^k \frac{(O-E)^2}{E}\] \[O: observed\] \[E: expected\] \[k: number\ of\ cells\]

chi-square distribution

  • chi-square distribution has just one parameter: degrees of freedom (df)
  • df influences the shape, center, and spread

Hypothesis test for one categorical variable with more than 2 levels

Example:

| ethnicity | white | black | nat.amer. | asian | other | total |
|---|---|---|---|---|---|---|
| expected # | 2007 | 302 | 20 | 73 | 98 | 2500 |
| observed # | 1920 | 347 | 19 | 84 | 130 | 2500 |


Steps:

  1. Set hypothesis
    • \(H_0\) (nothing going on): the distribution in the population follows the hypothesized (expected) distribution
    • \(H_A\) (something going on): the distribution in the population does not follow the hypothesized (expected) distribution
  2. Calculate expected and observed counts for each categorical level
  3. Conditions for the chi-square test
    1. Independence: Sampled observations must be independent
      • random sample/assignment
      • if sampling without replacement, n < 10% of population
      • each case only contributes to one cell in the table
    2. Sample size: Each particular scenario (i.e. cell) must have at least 5 expected cases
  4. Calculate chi-square (\(\chi^2\)) statistic \[\chi^2 = \sum_{i=1}^k\frac{(O-E)^2}{E}\] \[O: observed\ ;\ E: expected\ ;\ k: number\ of\ cells\]
  5. Calculate \(\chi^2\) degrees of freedom (df) \[df = k - 1\] \[k: number\ of\ cells\]
  6. Calculate p-value
    • p-value for a chi-square test is defined as the tail area above the calculated test statistic
    • because the test statistic is always positive, a higher test statistic means a higher deviation from the null hypothesis
# area under the chi-square curve with df = 4, above the observed statistic 22.63
pchisq(22.63, df=4, lower.tail=FALSE)
## [1] 0.000150104
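
The same test via chisq.test(), passing the hypothesized population proportions:

observed <- c(white = 1920, black = 347, nat.amer = 19, asian = 84, other = 130)
hypothesized <- c(2007, 302, 20, 73, 98) / 2500     # expected counts converted to proportions
chisq.test(observed, p = hypothesized)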


5-4-2 Chi-square independence test

Hypothesis test for two categorical variables, at least one of which has more than 2 levels

Example:

| status | dating | cohabiting | married | total |
|---|---|---|---|---|
| obese | 81 (113) | 103 (110) | 147 (108) | 331 |
| not obese | 359 (327) | 326 (319) | 277 (316) | 962 |
| total | 440 | 429 | 424 | 1293 |

(expected counts are shown in parentheses)


Steps:
  1. Set hypothesis:
    • \(H_0\) (nothing going on): the two categorical variables are independent; variable 1 does not vary by variable 2
    • \(H_A\) (something going on): the two categorical variables are dependent; variable 1 does vary by variable 2
  2. Calculate observed and expected counts for each cell \[expected\ count = \frac{(row\ total) * (column\ total)}{table\ total}\]
  3. Conditions for the chi-square test:
    1. Independence: Sampled observations must be independent
      • random sample/assignment
      • if sampling without replacement, n < 10% of population
      • each case only contributes to one cell in the table
    2. Sample size: Each particular scenario (i.e. cell) must have at least 5 expected cases
  4. Calculate the chi-square (\(\chi^2\)) statistic \[\chi^2 = \sum_{i=1}^k\frac{(O-E)^2}{E}\] \[O: observed\ ;\ E: expected\ ;\ k: number\ of\ cells\]
  5. Calculate the \(\chi^2\) degrees of freedom (df) \[df = (R - 1) * (C - 1)\] \[R: number\ of\ rows\] \[C: number\ of\ columns\]
  6. Calculate the p-value
    • the p-value for a chi-square test is defined as the tail area above the calculated test statistic
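
The same test via chisq.test() on the observed counts from the table above:

obs <- matrix(c(81, 103, 147,
                359, 326, 277),
              nrow = 2, byrow = TRUE,
              dimnames = list(status = c("obese", "not obese"),
                              relationship = c("dating", "cohabiting", "married")))
chisq.test(obs)    # expected counts are (row total * column total) / table total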



6-1-1 Correlation

Definition

  • Describes the linear association between two variables
  • Denoted as R

Properties

  1. The magnitude (absolute value) of the correlation coefficient measures the strength of the linear association between two numerical variables
  2. The sign of the correlation coefficient indicates the direction of association
  3. The correlation coefficient is always between -1 (perfect negative linear association) and 1 (perfect positive linear association). \(R=0\) indicates no linear relationship
  4. The correlation coefficient is unitless, and is not affected by changes in the center or scale of either variable (such as unit conversions)
  5. The correlation of X with Y is the same as of Y with X
  6. The correlation coefficient is sensitive to outliers


6-2-1 Residuals

  • leftovers from the model fit
  • data = fit + residual
  • difference between the observed and predicted y \[Residual: e_i = y_i - \hat{y_i}\]


6-2-2 Least squares line

\[\hat{y} = \beta_0 + \beta_1 x\ \ or\ \ \hat{y} = b_0 + b_1 x\]

Estimating the regression parameters: Slope

\[Slope: b_1 = \frac{s_y}{s_x} R\] \[s_x = SD\ of\ x\] \[s_y = SD\ of\ y\] \[R = cor(x,y)\]

Estimating the regression parameters: Intercept

The least squares line always goes through (\(\bar{x}\), \(\bar{y}\)) \[b_0 = \bar{y} - b_1\bar{x}\]
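
Checking these formulas against lm() on the built-in cars data set:

b1 <- (sd(cars$dist) / sd(cars$speed)) * cor(cars$speed, cars$dist)   # slope = (s_y / s_x) * R
b0 <- mean(cars$dist) - b1 * mean(cars$speed)                         # line passes through (x-bar, y-bar)
c(intercept = b0, slope = b1)
coef(lm(dist ~ speed, data = cars))                                   # should match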


6-2-3 Prediction & extrapolation

Prediction

  • Using the linear model to predict the value of the response variable for a given value of the explanatory variable is called prediction
  • Plug in the value of x in the linear model equation

Extrapolation

  • Applying a model estimate to values outside of the realm of the original data is called extrapolation
  • The estimate might not be accurate
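
A small example with predict(); predicting far outside the observed range of the explanatory variable would be extrapolation:

fit <- lm(dist ~ speed, data = cars)
predict(fit, newdata = data.frame(speed = 21))   # 21 is within the observed range of speed
range(cars$speed)                                # estimates far outside this range are extrapolation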


6-2-4 Conditions for linear regression

1. Linearity

  • Relationship between the explanatory and the response variable should be linear
  • Methods for fitting a model to non-linear relationships exist
  • Check using a scatterplot of the data, or a residuals plot
  • A residuals plot with no apparent pattern suggests that a linear model is appropriate for the data

2. Nearly normal residuals

  • Residuals should be nearly normally distributed, centered at 0
  • May not be satisfied if there are unusual observations that don’t follow the trend of the rest of the data
  • Check using a histogram or normal probability plot of residuals

3. Constant variability

  • Variability of points around the least squares line should be roughly constant
  • Implies that the variability of residuals around 0 line should be roughly constant as well
  • Also called homoscedasticity
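
Quick visual checks of these conditions for a fitted model, using the built-in cars data as an example:

fit <- lm(dist ~ speed, data = cars)
plot(fit$fitted.values, fit$residuals); abline(h = 0)   # linearity and constant variability
hist(fit$residuals)                                     # nearly normal residuals, centered at 0
qqnorm(fit$residuals); qqline(fit$residuals)            # normal probability plot of residuals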


6-2-5 R-square

  • Strength of the fit of a linear model is most commonly evaluated using \(R^2\)
  • Calculated as the square of the correlation coefficient
  • Tells us what percent of variability in the response variable is explained by the model
  • The remainder of the variability is explained by variables not included in the model
  • Always between 0 and 1


6-2-6 Regression with categorical explanatory variables

  • For a categorical variable with three levels: \[\hat{y} = b_0 + b_1x_1 + b_2x_2 \]
  • Possible values of the indicator variables:
    • \(x_1 = x_2 = 0\)
    • \(x_1 = 1\ and\ x_2 = 0\)
    • \(x_1 = 0\ and\ x_2 = 1\)
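
An example with a three-level categorical predictor, where R creates the two indicator variables automatically (using the built-in iris data):

fit <- lm(Sepal.Length ~ Species, data = iris)   # Species has three levels
coef(fit)   # intercept = mean of the reference level; slopes = differences from that level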


6-3-1 Outliers in regression

Types of outliers

  • Outliers are points that fall away from the cloud of points
  • Outliers that fall horizontally away from the center of the cloud but don’t influence the slope of the regression line are called leverage points
  • Outliers that actually influence the slope of the regression line are called influential points
    • usually high leverage points
    • to determine if a point is influential, visualize the regression line with and without the point, and see if the slope of the line changes considerably
  • Outliers might reduce \(R^2\), but not always


6-4-1 Inference for linear regression

Hypothesis Test for the slope

  1. Set hypothesis:
    • \(H_0: \beta_1=0\) (nothing going on) The explanatory variable is not a significant predictor of the response variable, i.e. no relationship -> slope of the relationship is 0
    • \(H_A: \beta_1<>0\) (something going on) The explanatory variable is a significant predictor of the response variable, i.e. has relationship -> slope of the relationship is not 0
  2. Check conditions: Independence
  3. Calculate t-statistic: \[T = \frac{b_1 -0}{SE_{b_1}}\]
  4. Calculate degrees of freedom (df):
    • \(df = n - 2\)
    • Lose 1 df for each parameter estimated, and in linear regression we estimate 2 parameters: \(\beta_0\) and \(\beta_1\)

Confidence Interval for the slope

  • Point estimate minus/plus margin of error \[[b_1 - t_{df}^*SE_{b_1}\ ,\ b_1 + t_{df}^*SE_{b_1}]\]
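
From a fitted model, summary() gives the slope’s t statistic and p-value, and confint() gives the interval (cars used as an example):

fit <- lm(dist ~ speed, data = cars)
summary(fit)$coefficients            # estimate, SE, t value, p-value for the intercept and slope
confint(fit, "speed", level = 0.95)  # b1 plus/minus t* SE with df = n - 2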


6-4-2 Variability partitioning

  • An alternative to the hypothesis test for the slope of the relationship between x and y
  • It considers the variability in y explained by x, compared to the unexplained variability
  • Partitioning the variability in y into explained and unexplained variability requires analysis of variance (ANOVA)

Hypothesis testing using ANOVA

  1. Sum of squares: \[Total\ variability\ in\ y:\ \ SS_{Total} = \sum(y-\bar{y})^2\] \[Unexplained\ variability\ in\ y\ (residuals):\ \ SS_{Residual} = \sum(y-\hat{y})^2\] \[Explained\ variability\ in\ y:\ \ SS_{Regression} = SS_{Total} - SS_{Residual}\]
  2. Degrees of freedom: \[Total\ degrees\ of\ freedom:\ \ df_{Total} = n - 1\] \[Regression\ degrees\ of\ freedom:\ \ df_{Regression} = number\ of\ predictors\] \[Residual\ degrees\ of\ freedom:\ \ df_{Residual} = df_{Total} - df_{Regression}\]
  3. Mean squares: \[MS\ Regression:\ \ MS_{Regression} = \frac{SS_{Regression}}{df_{Regression}}\] \[MS\ Residual:\ \ MS_{Residual} = \frac{SS_{Residual}}{df_{Residual}}\]
  4. F statistic (ratio of explained to unexplained variability) \[F_{(df_{Regression},\ df_{Residual})} = \frac{MS_{Regression}}{MS_{Residual}}\]
  5. Get p-value
    • If p-value is small, reject \(H_0\), the data provide convincing evidence that the slope is significantly different than 0
    • If p-value is not small, fail to reject \(H_0\), the data do not provide convincing evidence that the slope is different than 0

Another way to calculate R square

\[R^2 = \frac{explained\ variability}{total\ variability} = \frac{SS_{Regression}}{SS_{Total}}\]
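
anova() shows this partitioning for a fitted regression, and R-squared can be recovered from it (cars used as an example):

fit <- lm(dist ~ speed, data = cars)
anova(fit)                    # Sum Sq for the predictor (regression) and Residuals, with the F statistic
summary(fit)$r.squared        # equals SS_Regression / SS_Total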



7-1-1 Multiple predictors

Interaction variables

  • If the relationship between one explanatory variable and the response depends on another explanatory variable (the variables interact), then we would need to include an interaction variable in the model


7-1-2 Adjusted R square

Why adjusted R square

  • When any variable is added to the model \(R^2\) increases
  • But if the added variable doesn’t really provide any new information, adjusted \(R^2\) should not be expected to increase

Calculate adjusted R square

\[R_{adj}^2 = 1 - (\frac{SSE}{SST} * \frac{n-1}{n-k-1})\] \[k:\ number\ of\ predictors\]

Properties of adjusted R square

  • k is never negative -> adjusted \(R^2\) < \(R^2\)
  • adjusted \(R^2\) applies a penalty for the number of predictors included in the model
  • we choose models with higher adjusted \(R^2\) over others
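
Both quantities are available from summary(); a sketch on the built-in mtcars data (the predictors chosen here are arbitrary):

fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)$r.squared        # increases whenever a predictor is added
summary(fit)$adj.r.squared    # penalized for the number of predictors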


7-1-3 Collinearity and parsimony

Collinearity

  • Two predictor variables are said to be collinear when they are correlated with each other
  • Inclusion of collinear predictors (also called multicollinearity) complicates model estimation

Parsimony

  • Avoid adding predictors associated with each other, because oftentimes the addition of such a variable brings nothing new to the table
  • Prefer the simplest best model, i.e. the parsimonious model
  • Addition of collinear variables can result in biased estimates of the regression parameters
  • While collinearity often cannot be avoided in observational data, experiments are usually designed to prevent correlated predictors (i.e. to control for confounding variables)


7-2-1 Inference for Multiple Linear Regression

Inference for the model as a whole

  • Set hypothesis \[H_0: \beta_1 = \beta_2 = ... = \beta_k = 0\] \[H_A: At\ least\ one\ \beta_i\ is\ different\ than\ 0\]
  • The F test yielding a significant result doesn’t mean the model fits the data well, it just means at least one of the \(\beta\) is non-zero
  • The F test not yielding a significant result doesn’t mean individual variables included in the model are not good predictors of \(y\), it just means that the combination of these variables doesn’t yield a good model
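
The overall F statistic and its p-value can be pulled from a fitted multiple regression (mtcars used as an assumed example):

fit <- lm(mpg ~ wt + hp, data = mtcars)
f <- summary(fit)$fstatistic               # F value with its two degrees of freedom
pf(f[1], f[2], f[3], lower.tail = FALSE)   # p-value for H0: all slopes equal 0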

T-test for the slopes

  1. Set hypothesis:
    • \(H_0: \beta_1=0\), when all other variables are included in the model
    • \(H_A: \beta_1<>0\), when all other variables are included in the model
  2. Check conditions: Independence
  3. Calculate t-statistic: \[T = \frac{b_1 -0}{SE_{b_1}}\]
  4. Calculate degrees of freedom (df):
    • \(df = n - k - 1\)
    • \(k\) is number of predictors
    • Lose 1 df for each parameter estimated, and in linear regression we estimate k + 1 parameters

Confidence Interval for the slopes

  • Point estimate minus/plus margin of error \[[b_1 - t_{df}^*SE_{b_1}\ ,\ b_1 + t_{df}^*SE_{b_1}]\]


7-3-1 Model selection for LR

Stepwise model selection

  • backwards elimination: start with a full model (containing all predictors), drop one predictor at a time until the parsimonious model is reached
  • forward selection: start with an empty model and add one predictor at a time until the parsimonious model is reached
  • Criteria:
    • p-value, adjusted \(R^2\)
    • AIC, BIC, DIC, Bayes factor, Mallows’ \(C_p\)

Backwards elimination - adjusted R square

  • Start with the full model
  • Drop one variable at a time and record adjusted \(R^2\) of each smaller model
  • Pick the model with the highest increase in adjusted \(R^2\)
  • Repeat until none of the models yield an increase in adjusted \(R^2\)
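
A minimal sketch of one round of backwards elimination by adjusted R-squared, on mtcars with three arbitrarily chosen predictors:

summary(lm(mpg ~ wt + hp + qsec, data = mtcars))$adj.r.squared   # full model
summary(lm(mpg ~ hp + qsec, data = mtcars))$adj.r.squared        # model without wt
summary(lm(mpg ~ wt + qsec, data = mtcars))$adj.r.squared        # model without hp
summary(lm(mpg ~ wt + hp, data = mtcars))$adj.r.squared          # model without qsec
# keep the candidate with the highest adjusted R^2; repeat until no drop improves it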

Backwards elimination - p-value

  • Start with the full model
  • Drop the variable with the highest p-value and refit a smaller model
  • Repeat until all variables left in the model are significant
  • For a categorical variable, drop it only if none of its levels are significant (don’t drop individual levels)

Adjusted R square vs. p-value

  • p-value: significant predictors
  • adjusted \(R^2\): more reliable predictions
  • p-value method depends on the (somewhat arbitrary) 5% significance level cutoff
    • different significant level -> different model
    • used commonly since it requires fitting fewer models (in the more commonly used backwards-selection approach)

Forward selection - adjusted R square

  • Start with single predictor regressions of response vs. each explanatory variable
  • Pick the model with the highest adjusted \(R^2\)
  • Add the remaining variables one at a time to the existing model, and pick the model with the highest adjusted \(R^2\)
  • Repeat until the addition of any of the remaining variables does not result in a higher adjusted \(R^2\)

Forward selection - p-value

  • Start with single predictor regressions of response vs. each explanatory variable
  • Pick the variable with the lowest p-value
  • Add the remaining variables one at a time to the existing model, and pick the variable with the lowest p-value
  • Repeat until any of the remaining variables do not have a significant p-value

Expert opinion

  • Variables can be included in (or eliminated from) the model based on expert opinion
  • If you are studying a certain variable, you might choose to leave it in the model regardless of whether it’s significant or yields a higher adjusted \(R^2\)


7-4-1 Conditions for MLR

1. Linear relationships between (numerical) x and y

  • Each (numerical) explanatory variable linearly related to the response variable
  • Check using residuals plots (e vs. x)
    • looking for a random scatter plot around 0
    • consider all variables in the model, instead of just the bivariate relationship between a given x and y

2. Nearly normal residuals with mean 0

  • Some residuals will be positive and some negative
  • On a residuals plot we look for random scatter of residuals around 0
  • This translates to a nearly normal distribution of residuals centered at 0
  • Check using histogram or normal probability plot

3. Constant variability of residuals

  • Residuals should be equally variable for low and high values of the predicted response variable
  • Check using residuals plots of residuals vs. predicted (e vs. \(\hat{y}\))
    • Residuals vs. predicted instead of residuals vs. x because it allows for considering the entire model (with all explanatory variables) at once
    • Residuals should be randomly scattered in a band with a constant width around 0 (no fan shape)
    • Also worthwhile to view absolute value of residuals vs. predicted to identify unusual observations easily

4. Independent residuals

  • Independent residuals -> independent observations
  • If time series structure is suspected, check using residuals vs. order of data collection
  • If not, think about how the data are sampled