Discrete Variable takes on countable values.
Example: Number of pets in a household.
(Lecture 6.1)
Continuous Variable can take on any value within a
range.
Example: A person’s height or weight.
(Lecture 6.1)
Probabilities for a Discrete Variable assign
specific probabilities to individual outcomes.
Example: \(P(X = 3) =
0.2\)
(Lecture 6.1)
Probabilities for a Continuous Variable are computed
as areas under a curve.
Examples: \(P(X < a)\),
\(P(X > a)\), \(P(a < X < b)\)
(Lecture 6.1)
Standard Normal Distribution is the distribution
that results from converting any normal distribution into standard
units.
It has a mean of 0 and standard deviation of 1. The shape of the
distribution remains unchanged.
(Lecture 6.1)
Z-score expresses a value in terms of standard
deviations from the mean.
Formula: \(z = \frac{x -
\mu}{\sigma}\)
Example: A height of 70 inches when \(\mu = 64\) and \(\sigma = 3\) gives \(z = 2\)
(Lecture 6.1)
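A quick numeric check of this example, sketched in Python (the choice of tool is an assumption; the lecture specifies none):

```python
# Z-score: standard deviations between a value and the mean.
mu, sigma = 64, 3   # population mean and SD from the example
x = 70              # observed height in inches

z = (x - mu) / sigma
print(z)  # 2.0 -> 70 inches is 2 SDs above the mean
```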
Percentile is the value below which a given
percentage of observations fall.
Example: The 90th percentile means 90% of values are below that
number.
(Lecture 6.1)
Survey Sampling involves selecting a subset from a
population to estimate characteristics of the whole.
(Lecture 6.1)
Census collects data from every member of the
population.
(Lecture 6.1)
Sample Survey collects data from a subset of the
population to estimate proportions.
We estimate population proportion \(p\) using the sample proportion \(\hat{p}\).
(Lecture 6.1)
Population is the entire group of people or objects
we want to study.
(Lecture 6.1)
Population Parameter is a numerical characteristic
of a population, such as the mean or proportion.
(Lecture 6.1)
Sample is a subset of the population, ideally
representative of the whole.
(Lecture 6.1)
Statistic is a numerical summary of a sample used to
estimate a population parameter.
(Lecture 6.1)
Margin of Error quantifies the uncertainty due to
sampling.
Example: A margin of error of ±2% means the true proportion plausibly lies
within 2 percentage points of the estimate.
(Lecture 6.1)
Random Sample means each member of the population
has an equal chance of being selected.
(Lecture 6.1)
Sampling Frame is the list from which a sample is
drawn.
(Lecture 6.1)
Binomial Model applies when:
- There are two possible outcomes (success/failure)
- Probability of success \(p\) is
constant
- Number of trials \(n\) is fixed
- Trials are independent
- We count number of successes
If valid, the binomial model allows probability prediction for
outcomes.
(Lecture 6.1)
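A minimal sketch of such a prediction, assuming Python with scipy.stats; the \(n\) and \(p\) values below are hypothetical, chosen only for illustration:

```python
# If the binomial conditions hold, the count of successes X follows
# a Binomial(n, p) distribution.
from scipy.stats import binom

n, p = 10, 0.5             # hypothetical: fixed trials, constant success probability
print(binom.pmf(3, n, p))  # P(X = 3): probability of exactly 3 successes
print(binom.cdf(3, n, p))  # P(X <= 3): probability of at most 3 successes
```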
Bias is a systematic deviation from the true value
due to flaws in the sampling method.
Example: Using a non-representative sampling frame.
(Lecture 6.1)
Good Survey uses random and independent
selection:
1. Random sampling
2. Independence between observations
(Lecture 6.1)
With Replacement means each unit can be selected
more than once, maintaining independence.
(Lecture 6.1)
Without Replacement means each selected unit is
removed from the pool, slightly affecting independence.
(Lecture 6.1)
Valid Sample can be selected:
1. Without replacement, if the population is at least 10× the sample size
(Simple Random Sample, or SRS)
2. With replacement (valid for any population size)
A valid sample ensures that \(\hat{p}\)
is unbiased, that precision increases with sample size, and that the
sampling distribution approaches Normality.
(Lecture 6.1)
Accuracy: How close the estimate is to the true
value (low bias).
Precision: How consistent the estimates are across
samples (low variability).
(Lecture 6.1)
Precision improves as sample size increases.
Precise estimators have less variability and are measured using
standard error.
(Lecture 6.1)
Accuracy: Does it “hit” the target on average?
Measured by bias.
Precision: How spread out are the estimates? Measured
by standard error.
(Lecture 6.2)
Random Sample ensures each individual in the
population has an equal chance of being selected, enabling unbiased
estimation.
(Lecture 6.2)
Sample Without Replacement is when selected
individuals are not returned to the population before the next
selection.
If the population is at least 10× larger than the sample, this method
behaves approximately like independent sampling.
(Lecture 6.2)
Sample With Replacement means each individual can be
selected more than once.
This ensures independence across selections.
(Lecture 6.2)
Population Proportion (\(p\)) is the true proportion of individuals
in the population with a certain characteristic.
Example: The proportion of UCLA students who support a tuition
increase for CAPS.
(Lecture 6.2)
Sample Proportion (\(\hat{p}\)) is the proportion in a sample
with the characteristic of interest.
Used to estimate the population proportion \(p\).
(Lecture 6.2)
Standard Error (SE) measures the variability in an
estimator, similar to standard deviation for variables.
Formula: \(SE_{\hat{p}} = \sqrt{\frac{p(1 -
p)}{n}}\)
Interpretation: Larger \(n\)
leads to smaller SE → more precise estimates.
(Lecture 6.2)
Sampling Distribution is the probability
distribution of a statistic over repeated samples.
It helps us understand the behavior of \(\hat{p}\) across many samples.
(Lecture 6.2)
Simulation is a method for estimating the behavior
of sampling distributions using repeated random sampling.
Used to visualize how \(\hat{p}\)
varies from sample to sample.
(Lecture 6.2)
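A minimal simulation sketch; the true \(p\), sample size, and repetition count below are hypothetical:

```python
# Simulate many samples and record p-hat from each one.
import numpy as np

rng = np.random.default_rng(42)
p, n, reps = 0.3, 100, 10_000

p_hats = rng.binomial(n, p, size=reps) / n  # one p-hat per simulated sample
print(p_hats.mean())  # close to p = 0.3 (the center is unbiased)
print(p_hats.std())   # close to sqrt(p(1-p)/n) ~ 0.046 (the SE)
```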
Shape of the Sampling Distribution depends on sample
size.
- Small \(n\): shape may be skewed or
irregular
- Large \(n\): shape becomes
approximately Normal
(Lecture 6.2)
Central Limit Theorem (CLT) for proportions
states:
If \(n\) is large enough, then \(\hat{p} \sim N\left(p, \sqrt{\frac{p(1 -
p)}{n}}\right)\)
Conditions: \(np \geq 10\) and \(n(1 - p) \geq 10\)
(Lecture 6.2)
Center of the sampling distribution is \(p\) (mean).
Spread is measured by \(SE_{\hat{p}} = \sqrt{\frac{p(1 -
p)}{n}}\)
Shape becomes approximately Normal when the sample is
large enough.
(Lecture 6.2)
Law of Large Numbers says that as the sample size
increases, \(\hat{p}\) tends to get
closer to \(p\).
This supports the reliability of estimators as \(n\) grows.
(Lecture 6.2)
Z-score measures how many standard errors a sample
proportion is from the population proportion.
Formula: \(z = \frac{\hat{p} -
p}{\sqrt{\frac{p(1 - p)}{n}}}\)
Used to assess how surprising a result is under the assumption
that \(p\) is known.
(Lecture 6.2)
Central Limit Theorem (CLT) states that if the
sample size is large enough, the sampling distribution of sample
proportions is approximately Normal:
\(\hat{p} \sim N\left(p,
SE_{\hat{p}}\right)\)
Conditions: \(np \geq 10\) and \(n(1 - p) \geq 10\)
(Lecture 7.1)
Effect of Sample Size: Increasing \(n\) reduces the standard error, leading to
more precise estimates.
Example: \(SE = \sqrt{0.3 \cdot 0.7 /
500} = 0.02\) vs. \(SE = \sqrt{0.3
\cdot 0.7 / 1000} = 0.0145\)
(Lecture 7.1)
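A quick check of these two standard errors in Python:

```python
# SE shrinks with the square root of the sample size.
import math

p = 0.3
for n in (500, 1000):
    se = math.sqrt(p * (1 - p) / n)
    print(n, round(se, 4))  # 500 -> 0.0205, 1000 -> 0.0145
```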
Standard Error (SE) estimates the standard deviation
of the sampling distribution.
Formula: \(SE_{\hat{p}} = \sqrt{\frac{p(1 -
p)}{n}}\) (if \(p\) is
known)
or \(SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1 -
\hat{p})}{n}}\) (estimated)
(Lecture 7.1)
Z-score tells how many standard errors a sample
proportion is from the population proportion.
Formula: \(z = \frac{\hat{p} -
p}{SE_{\hat{p}}}\)
(Lecture 7.1)
Random Sample is required for valid inference and
ensures unbiased estimation of parameters.
(Lecture 7.1)
Independence means the outcome of one individual
does not influence another.
Usually satisfied if the sample is less than 10% of the
population.
(Lecture 7.1)
Sample Size Condition ensures the CLT applies:
\(np \geq 10\) and \(n(1 - p) \geq 10\)
(Lecture 7.1)
Confidence Statistics quantify uncertainty in an
estimate and are used to construct intervals with known confidence
levels.
(Lecture 7.1)
Confidence Interval (CI) provides an estimate of a
population parameter along with a margin of error.
Formula: \(\hat{p} \pm \text{critical value}
\cdot SE_{\hat{p}}\)
This reflects uncertainty in the estimate. CIs are about population
parameters, not sample statistics.
(Lecture 7.1)
Critical Value (\(z^*\)) is the number of standard errors to
span for a given confidence level.
Example: For 95% confidence, \(z^* =
1.96\)
(Lecture 7.1)
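Where the 1.96 comes from: 95% confidence leaves 2.5% in each tail, so \(z^*\) is the 97.5th percentile of the standard Normal. A one-line check with scipy.stats:

```python
# Critical value z* for a 95% confidence level.
from scipy.stats import norm

z_star = norm.ppf(0.975)  # 97.5th percentile of N(0, 1)
print(round(z_star, 2))   # 1.96
```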
Interpreting Confidence Intervals means
understanding what it says about the population.
Example: “We are 95% confident that the proportion of Americans who
believe the government helps the middle class too little is between
59.6% and 64.5%.”
This does not mean there is a 95% probability the true
proportion is in the interval—we either captured it or we didn’t. The
method captures the truth 95% of the time.
(Lecture 7.1)
Confidence Level is the long-run proportion of
constructed intervals that contain the true population parameter.
Example: 95% confidence means that if we repeat the procedure many
times, 95% of the resulting intervals will contain the true \(p\).
(Lecture 7.1)
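A minimal coverage simulation illustrating this long-run interpretation; the true \(p\), sample size, and repetition count are hypothetical:

```python
# Build many 95% intervals and count how often they capture the true p.
import numpy as np

rng = np.random.default_rng(0)
p_true, n, reps, z_star = 0.5, 200, 10_000, 1.96

covered = 0
for _ in range(reps):
    p_hat = rng.binomial(n, p_true) / n
    se = np.sqrt(p_hat * (1 - p_hat) / n)  # estimated SE
    if p_hat - z_star * se <= p_true <= p_hat + z_star * se:
        covered += 1
print(covered / reps)  # close to 0.95
```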
Confidence Interval is an estimate of a population parameter that includes
an allowance for our uncertainty. It is based on a single sample of
\(n\) randomly selected members of the population and is usually expressed
as an estimate plus or minus a margin of error. Interpret it as a range of
plausible values for the true value (that is, the value we would see if we
could observe the entire population).
Example: 95% CI for \(\hat{p}_1 -
\hat{p}_2\) is (0.03, 0.08): We are 95% confident the true
difference in proportions lies between 3% and 8%.
(Lecture 7.2)
Margin of Error is the quantity added to and subtracted from the sample
statistic in a confidence interval.
\(\text{Margin of Error} = z^* \cdot
SE\)
(Lecture 7.2)
Sampling Distribution of the Difference in Proportions:
\(\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \sqrt{SE^2_{\hat{p}_1} + SE^2_{\hat{p}_2}}\right)\)
(Lecture 7.2)
Higher confidence level → wider interval.
(Lecture 7.2)
Larger sample size or lower confidence level → smaller interval.
(Lecture 7.2)
Confidence Interval for a Difference in Proportions:
\(\hat{p}_1 - \hat{p}_2 \pm z^* \cdot \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}\)
(Lecture 7.2)
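A sketch of this interval in Python; the sample proportions and sizes below are hypothetical:

```python
# 95% CI for a difference in proportions, p1 - p2.
import math

p1_hat, n1 = 0.55, 1000  # hypothetical sample 1
p2_hat, n2 = 0.50, 1000  # hypothetical sample 2
z_star = 1.96

se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
diff = p1_hat - p2_hat
print(diff - z_star * se, diff + z_star * se)
```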
Example: “We are 95% confident that the proportion of
support in Group 1 exceeds that in Group 2 by between 3% and 8%.”
- If 0 is in the CI → no evidence of a difference
- If 0 is not in the CI → evidence of a difference
(Lecture 7.2)
CLT must hold in both samples:
- Both samples must be random, with independent observations
- Both samples must be large, so that the expected numbers of yes’s and
no’s are at least 10 in both samples
- (Only if sampling without replacement) Each population must be at
least 10 times larger than its sample size
- The samples must be independent of each other!
(Lecture 7.2)
It means that if I tell you who answered ‘yes’ in one sample, you know nothing about who answered ‘yes’ in the other.
(Lecture 7.2)
Relative Risk (RR) is a ratio of two probabilities.
\(RR = \frac{p_1}{p_2}\)
Example: If Group 1 has a 19% infection rate and Group 2 has 1%,
then \(RR = 19\) → Group 1 is 19
times as likely to get infected.
(Lecture 7.2)
Confidence Intervals answer questions like “how
much?” or “what percent?” by estimating a population proportion with a
range of plausible values.
(Lecture 8.1)
Hypothesis Tests address questions like “is the
percentage different from what everyone thinks it is?” or “did my
intervention change the population proportion?”
(Lecture 8.1)
Null Hypothesis (\(H_0\)) is the default, skeptical position
assumed to be true. Any observed differences are attributed to
chance.
Example: \(H_0 : p = 0.54\)
(polio rate without vaccine)
(Lecture 8.1)
Alternative Hypothesis (\(H_A\)) is what we hope to
demonstrate—typically that the true proportion differs from the
null.
Example: \(H_A : p < 0.54\)
(vaccine lowers polio rate)
(Lecture 8.1)
P-value is the probability of observing an outcome
as extreme (or more extreme) than the one obtained, under the assumption
that the null hypothesis is true.
- Right-tailed: \(P(Z >
z_{\text{obs}})\)
- Left-tailed: \(P(Z <
z_{\text{obs}})\)
- Two-tailed: \(2 \cdot P(Z >
|z_{\text{obs}}|)\)
Example: If \(z_{\text{obs}} =
1.25\), then \(P(Z > 1.25) =
0.106\)
(Lecture 8.1)
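Reproducing these tail probabilities with scipy.stats, using the \(z_{\text{obs}} = 1.25\) example:

```python
# P-values from the standard Normal distribution.
from scipy.stats import norm

z_obs = 1.25
print(norm.sf(z_obs))           # right-tailed: P(Z > 1.25) ~ 0.106
print(norm.cdf(z_obs))          # left-tailed:  P(Z < 1.25)
print(2 * norm.sf(abs(z_obs)))  # two-tailed:   2 * P(Z > |1.25|)
```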
Significance Level (\(\alpha\)) is the threshold for rejecting
the null hypothesis.
Common values: \(\alpha =
0.05\), \(0.01\).
Relates to confidence level: 95% CI → \(\alpha
= 0.05\)
(Lecture 8.1)
Test Statistic compares observed outcomes with what
is expected under the null hypothesis.
Formula: \(z_{\text{obs}} = \frac{\hat{p} -
p_0}{SE_{p_0}}\)
where \(p_0\) is from \(H_0\), and standard error uses \(p_0\):
\(SE_{p_0} = \sqrt{\frac{p_0(1 -
p_0)}{n}}\)
(Lecture 8.1)
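A sketch of this computation with hypothetical sample numbers; note that the SE is built from \(p_0\), not from \(\hat{p}\):

```python
# One-proportion test statistic, with the SE computed under H0.
import math

p0 = 0.20             # null value from H0 (hypothetical)
p_hat, n = 0.26, 400  # hypothetical sample result

se0 = math.sqrt(p0 * (1 - p0) / n)  # SE uses p0, not p-hat
z_obs = (p_hat - p0) / se0
print(z_obs)  # 3.0 -> p-hat sits 3 SEs above p0
```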
Z-score represents how many standard errors the
observed \(\hat{p}\) is from \(p_0\).
Interpretation: A \(z\) of 1.5 means
\(\hat{p}\) is 1.5 standard errors
above \(p_0\).
(Lecture 8.1)
Type I Error occurs when we reject the null
hypothesis when it is actually true.
Probability of Type I error = \(\alpha\)
(Lecture 8.1)
Type II Error occurs when we fail to reject the null
hypothesis when it is actually false.
Probability of Type II error = \(\beta\)
(Lecture 8.1)
Hypotheses are always about
parameters, not observed values.
Example: \(H_0 : p = 0.20\),
\(H_A : p \ne 0.20\)
(Lecture 8.1)
Null Hypothesis is a conservative or skeptical claim
about a population parameter, assumed true until evidence suggests
otherwise.
Alternative Hypothesis is a competing claim we hope to
support with evidence.
(Lecture 8.2)
Test Statistic compares the observed outcome to what
we would expect under the null.
Example: \(Z_{obs} = \frac{\hat{p} -
p_0}{SE_{p_0}}\)
(Lecture 8.2)
Population Parameter is a fixed but unknown value
that describes a population characteristic (e.g., \(p\)).
(Lecture 8.2)
Statistic is a value computed from sample data
(e.g., \(\hat{p}\)) used to estimate
the population parameter.
(Lecture 8.2)
Z-score is the number of standard errors the
observed statistic is from the null.
Interpretation: A Z-score of 2 means the result is 2 SEs above
what is expected under \(H_0\).
(Lecture 8.2)
P-value is the probability of getting an outcome as
extreme or more extreme than the one observed, assuming the null
hypothesis is true.
If \(p\)-value < \(\alpha\), reject the null.
(Lecture 8.2)
Independent Random Sample ensures each individual is
selected independently and randomly; necessary for valid
inference.
(Lecture 8.2)
Central Limit Theorem (CLT): when conditions are met
(large sample, independence), the sampling distribution of a proportion
is approximately normal.
(Lecture 8.2)
Standard Normal Distribution has a mean of 0 and SD
of 1; used to approximate sampling distributions under the null.
(Lecture 8.2)
Type I Error occurs when we reject the null
hypothesis even though it is true (false positive).
(Lecture 8.2)
Type II Error occurs when we fail to reject the null
hypothesis even though it is false (false negative).
(Lecture 8.2)
| | Null is TRUE (Truth) | Null is FALSE (Truth) |
|---|---|---|
| Reject H₀ (Decision) | ❌ Type I Error (False Positive) | ✅ Correct Decision (True Positive) |
| Fail to Reject H₀ (Decision) | ✅ Correct Decision (True Negative) | ❌ Type II Error (False Negative) |
(Lecture 8.2)
Significance Level (\(\alpha\)) is the probability of making a
Type I error.
Interpretation: \(\alpha =
0.05\) means we’re willing to wrongly reject \(H_0\) 5% of the time.
(Lecture 8.2)
\(P(\text{Type I}) = P(\text{Reject } H_0
\mid H_0 \text{ is true})\)
\(P(\text{Type II}) = P(\text{Fail to Reject }
H_0 \mid H_0 \text{ is false})\)
(Lecture 8.2)
The significance level \(\alpha\) is chosen to control the probability of
a Type I error: it ensures that the rate of wrongly rejecting \(H_0\) is
no more than \(\alpha\).
(Lecture 8.2)
Hypothesis Test for Difference in Proportions
compares \(p_1\) and \(p_2\) to see if a real difference
exists.
(Lecture 8.2)
If the assumptions hold, then under \(H_0\) the standardized difference
\(Z_{obs}\) (defined below) approximately follows the standard Normal
distribution, \(N(0, 1)\).
(Lecture 8.2)
\(Z_{obs} = \frac{\hat{p}_1 - \hat{p}_2 -
0}{SE_o}\)
\(SE_o = \sqrt{p(1 - p) \left( \frac{1}{n_1} +
\frac{1}{n_2} \right)}\)
Where \(p\) is the pooled proportion under \(H_0\): the total number of
successes in both samples divided by \(n_1 + n_2\).
(Lecture 8.2)
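A sketch of the pooled statistic with hypothetical counts:

```python
# Two-proportion z-test with the pooled SE under H0: p1 = p2.
import math

x1, n1 = 60, 200  # hypothetical successes and size, group 1
x2, n2 = 45, 200  # hypothetical successes and size, group 2

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_obs = (p1_hat - p2_hat) / se0
print(z_obs)
```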
Summary of Hypothesis Tests:
1. State Null and Alternative hypotheses and choose significance
level.
2. Find value of test statistic after checking conditions for CLT.
3. Depending on Alternative Hypothesis, find p-value.
4. If p-value < significance level, reject Null.
(Lecture 9.1)
Numerical Variables:
Quantitative values such as income or body temperature.
(Lecture 9.1)
Categorical Variables:
Used for proportions (e.g., win/loss, yes/no).
(Lecture 9.1)
Confidence Intervals:
Provide a range of plausible values for a population parameter.
(Lecture 9.1)
Random Samples:
Required to ensure validity of statistical inference.
(Lecture 9.1)
Hypothesis Test:
Tests whether the population parameter differs from a hypothesized
value.
(Lecture 9.1)
Population:
- Mean: μ
- Proportion: p
(Lecture 9.1)
Statistic:
- Sample Mean: \(\bar{x}\)
- Sample Proportion: \(\hat{p}\)
(Lecture 9.1)
Central Limit Theorem for Means:
If the sample size is large enough, the sampling distribution of the
average is approximately Normal.
(Lecture 9.1)
Sampling Distribution:
The probability distribution of a statistic over many samples.
(Lecture 9.1)
Probability Distribution:
Describes how probabilities are distributed over values of a random
variable.
(Lecture 9.1)
Standard Error:
\(\frac{\sigma}{\sqrt{n}}\)
SE of the sample mean is smaller than σ; it’s σ divided by the square
root of n.
(Lecture 9.1)
Inference for Mean:
Sample mean is unbiased. As n increases, the approximation to Normal
improves.
(Lecture 9.1)
Normal Distribution:
Used for approximating sampling distributions due to CLT.
(Lecture 9.1)
Law of Large Numbers:
Sample means converge to the population mean as sample size
increases.
(Lecture 9.1)
Distribution of the Sample Mean:
\(\bar{x} \sim N(\mu,
\frac{\sigma}{\sqrt{n}})\)
(Lecture 9.1)
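A minimal simulation, under hypothetical settings, showing that averages of even skewed data behave as this formula predicts:

```python
# Sample means from a skewed (exponential) population.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 10_000

samples = rng.exponential(scale=2.0, size=(reps, n))  # skewed population, mu = sigma = 2
means = samples.mean(axis=1)
print(means.mean())  # close to mu = 2.0
print(means.std())   # close to sigma / sqrt(n) = 2 / sqrt(50) ~ 0.28
```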
Null and Alternative Hypothesis for Means:
- H₀: μ = μ₀
- Hₐ: μ ≠ μ₀ or μ < μ₀ or μ > μ₀
(Lecture 9.1)
Test Statistic for Means:
- Z-test: \(z = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}}\) when \(\sigma\) is
known, with \(SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\)
- T-test: \(t = \frac{\bar{x} - \mu_0}{\hat{SE}_{\bar{x}}}\) when
\(\sigma\) is unknown, with \(\hat{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}\)
(Lecture 9.1)
Sampling Distribution of t:
As degrees of freedom increase, t approximates Normal
distribution.
(Lecture 9.1)
Degrees of Freedom:
refer to the number of values in a calculation that are free to vary.
For a single sample mean, the degrees of freedom is \(df = n - 1\) because one value is
constrained by the sample mean.
Example: If you know the mean of 5 numbers is 10, then knowing 4 of them determines the 5th.
(Lecture 9.1)
T-distribution:
Used when σ is unknown; has thicker tails than Normal.
(Lecture 9.1)
Confidence Intervals for the Mean:
\(\bar{x} \pm \text{critical value} \times
\hat{SE}_{\bar{x}}\)
Use t-distribution with \(n - 1\)
degrees of freedom.
(Lecture 9.1)
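A sketch of the t-based interval in Python; the data values are hypothetical:

```python
# 95% CI for a mean using the t-distribution with n - 1 df.
import numpy as np
from scipy.stats import t

data = np.array([9.2, 10.1, 9.8, 10.5, 9.4, 10.0, 9.9, 10.3])  # hypothetical
n = len(data)
x_bar, s = data.mean(), data.std(ddof=1)  # ddof=1 -> sample SD

t_star = t.ppf(0.975, df=n - 1)  # critical value for 95% confidence
me = t_star * s / np.sqrt(n)     # margin of error
print(x_bar - me, x_bar + me)
```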
Population refers to the entire group under study.
Represented by parameters like \(\mu\).
(Lecture 9.2)
Sample is a subset of the population from which we
draw data. Represented by statistics like \(\bar{x}\).
(Lecture 9.2)
Central Limit Theorem (CLT): For large enough \(n\), the sampling distribution of \(\bar{x}\) is approximately Normal:
\(\bar{x} \sim N\left(\mu,
\frac{\sigma}{\sqrt{n}}\right)\)
(Lecture 9.2)
Standard Error for sample mean:
\(SE_{\bar{x}} =
\frac{s}{\sqrt{n}}\)
Smaller \(SE\) implies more
precise estimates.
(Lecture 9.2)
Test Statistic for comparing means:
\(z = \frac{\bar{x} -
\mu_0}{SE_{\bar{x}}}\)
For unknown \(\sigma\), use \(t = \frac{\bar{x} -
\mu_0}{\frac{s}{\sqrt{n}}}\)
(Lecture 9.2)
Confidence Interval:
\(\bar{x} \pm t^* \cdot
\frac{s}{\sqrt{n}}\)
Example: 95% CI means we are 95% confident the interval captures the
true mean \(\mu\).
(Lecture 9.2)
Comparing Two Means involves checking if observed
differences could have occurred by chance.
Hypothesis: \(H_0: \mu_1 = \mu_2\)
vs. \(H_A: \mu_1 \ne \mu_2\)
(Lecture 9.2)
Conditions:
1. Random samples & independent observations
2. Independent samples (not paired)
3. Large samples or approximately Normal populations (each \(n \geq 25\))
(Lecture 9.2)
\(\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 -
\mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} +
\frac{\sigma_2^2}{n_2}}\right)\)
(Lecture 9.2)
\(\bar{x}_1 - \bar{x}_2 \pm t^* \cdot
\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)
Estimates the range of plausible differences in population
means.
(Lecture 9.2)
Test Statistic:
\(t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 -
\mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\)
(Lecture 9.2)
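The same statistic via scipy.stats on hypothetical data; equal_var=False makes ttest_ind use the unpooled SE shown above:

```python
# Two-sample t-test with unpooled (unequal-variance) SE.
import numpy as np
from scipy.stats import ttest_ind

group1 = np.array([12.1, 11.8, 13.0, 12.4, 11.5, 12.9])  # hypothetical
group2 = np.array([11.2, 10.9, 11.8, 11.0, 11.6, 10.7])  # hypothetical

t_obs, p_value = ttest_ind(group1, group2, equal_var=False)
print(t_obs, p_value)
```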
\(H_0: \mu_1 = \mu_2\) — assumes no
difference in population means.
(Lecture 9.2)
\(H_A: \mu_1 \ne \mu_2\), \(H_A: \mu_1 > \mu_2\), or \(H_A: \mu_1 < \mu_2\) — based on research
question.
(Lecture 9.2)
Significance Level (\(\alpha\)) is the probability of rejecting
\(H_0\) when it is true. Common levels:
0.05 or 0.01.
(Lecture 9.2)
Box Plot visualizes five-number summary: min, \(Q_1\), median, \(Q_3\), max.
Helpful for comparing distributions between groups.
(Lecture 9.2)
Histogram displays frequency of data
intervals.
Use to assess skewness and modality before applying CLT.
(Lecture 9.2)
Unpaired T-Test compares means from two
independent samples.
Assumes Normality or large \(n\); the two group variances may be treated
as either equal or unequal.
(Lecture 9.2)
Paired T-Test compares matched
pairs (e.g., before/after).
CLT applies to the within-pair differences:
\(\bar{x}_{\text{diff}} \sim N\left(\mu_{\text{diff}},\ \frac{s_{\text{diff}}}{\sqrt{n}}\right)\), where \(\mu_{\text{diff}} = \mu_1 - \mu_2\)
Example: Measure student scores before and after
tutoring.
(Lecture 9.2)
\(\bar{x}_{\text{diff}} \pm t^* \cdot
\sqrt{\frac{s^2_{\text{diff}}}{n}}\)
Estimates change from before to after within individuals.
(Lecture 9.2)
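A sketch of the paired analysis on hypothetical before/after scores; the test runs on the within-pair differences:

```python
# Paired t-test: equivalent to a one-sample t-test on the differences.
import numpy as np
from scipy.stats import ttest_rel

before = np.array([70, 65, 80, 75, 60, 72])  # hypothetical scores
after  = np.array([74, 70, 82, 78, 66, 75])  # hypothetical scores

t_obs, p_value = ttest_rel(after, before)
print(t_obs, p_value)
```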
Experiment: treatment is randomly assigned; causal
conclusions can be drawn.
Observational Study: no random assignment; only associations,
not causality.
(Lecture 10.1)
Treatment variable: the variable manipulated (e.g., smoking
vs. not).
Response variable: the outcome measured (e.g., baby
weight).
(Lecture 10.1)
Observational Study: a study where the researcher does not assign
treatments but observes naturally occurring differences.
(Lecture 10.1)
Experiment: a study where treatments are randomly assigned to subjects to
test for causal effects.
(Lecture 10.1)
Regression models the relationship between a response variable and an
explanatory variable.
Intercept: expected \(y\) when
\(x = 0\) (interpret only if \(x = 0\) is in range).
Slope: expected change in \(y\) when \(x\) increases by 1.
(Lecture 10.1)
\(R^2\) = proportion of variability
in \(y\) explained by \(x\).
\(R^2 = r^2\) where \(r\) is the correlation coefficient.
(Lecture 10.1)
Correlation Coefficient (\(r\)) measures linear association between two variables. \(r \in [-1, 1]\).
(Lecture 10.1)
Probability deals with random sampling, conditional probabilities, and
independence.
\(P(A \mid B) = P(A)\) if independent;
\(P(A \text{ and } B) = 0\) if mutually
exclusive.
(Lecture 10.1)
Random Sample: each member of the population has an equal chance of being
selected.
(Lecture 10.1)
Two events are independent if knowing one does not affect the
probability of the other.
\(P(A \mid B) = P(A)\)
(Lecture 10.1)
Mutually Exclusive Events cannot both occur: \(P(A \text{ and } B) = 0\)
(Lecture 10.1)
Law of Large Numbers: as sample size increases, the sample statistic gets
closer to the population parameter.
(Lecture 10.1)
Discrete Variables: numerical values that are countable (e.g., number of pets).
(Lecture 10.1)
Continuous Variables: numerical values that can take on any value in a
range (e.g., height).
(Lecture 10.1)
Bar Chart: used for categorical variables; the height of each bar
represents frequency.
(Lecture 10.1)
Histogram: used for numerical data; shows the distribution via bins.
(Lecture 10.1)
Normal Distribution: bell-shaped and symmetric; used to model sample means
or proportions under the CLT.
(Lecture 10.1)
Sampling Distribution: the distribution of a sample statistic (e.g., \(\bar{x}\) or \(\hat{p}\)) across many samples.
(Lecture 10.1)
Statistical Inference: using sample data to draw conclusions about a population.
(Lecture 10.1)
Sample: subset of the population.
Population: entire group of interest.
(Lecture 10.1)
Statistic: calculated from sample data (e.g., \(\bar{x}, \hat{p}\)).
Parameter: value that describes the population (e.g., \(\mu, p\)).
(Lecture 10.1)
Bias (accuracy): how close estimates are to true
value.
Standard error (precision): how much estimates vary.
(Lecture 10.1)
Central Limit Theorem: with large enough samples, the sampling
distribution of the mean/proportion is approximately Normal.
(Lecture 10.1)
Confidence Interval: an estimate of a population parameter plus or minus a
margin of error.
(Lecture 10.1)
Hypothesis Test: a formal procedure for testing a claim about a population.
Steps: Hypotheses → Check assumptions → Test statistic → p-value →
Decision
(Lecture 10.1)
Null Hypothesis: a statement of no effect or difference; typically the
hypothesis we try to reject.
(Lecture 10.1)
Alternative Hypothesis: what we seek evidence for; a statement that
contradicts the null.
(Lecture 10.1)
Test Statistic: compares observed data to what we expect under the null.
Often \(z = \frac{\bar{x} -
\mu_0}{SE}\) or \(t = \frac{\bar{x} -
\mu_0}{\hat{SE}}\)
(Lecture 10.1)
P-value: the probability of observing a test statistic as extreme as or
more extreme than the one observed, assuming \(H_0\) is true.
(Lecture 10.1)
Type I Error: Reject \(H_0\) when it is true.
Type II Error: Fail to reject \(H_0\) when it is false.
(Lecture 10.1)
Use a proportion if the variable is categorical (1 = success, 0 =
failure).
Use a mean if the variable is numerical.
(Lecture 10.1)
One-sample: compare sample to a known value.
Two-sample: compare two independent samples.
(Lecture 10.1)
Paired: samples are linked; test differences within
pairs.
Unpaired: samples are independent; compare group
averages.
(Lecture 10.1)
Use \(t\)-distribution when sample
size is small (\(n < 25\)) for
numerical data.
Otherwise, use standard Normal (\(z\)).
(Lecture 10.1)