Foundations for Inference

Joe Ripberger

Z-Scores

  • How many standard deviations above or below the mean (\(\mu\)) is a given value?
  • To convert a raw score into a z-score, use the following formula:
    • \(z = \frac{x-\mu}{\sigma}\)
  • What is the probability of observing a value at least that extreme?
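The formula above can be sketched as a one-line R function. The helper name `z_score` is an assumption made for illustration; the inputs are the threshold (90), mean (84.96), and standard deviation (7.57) used in the temperature example in this deck.

```r
# Convert a raw score to a z-score: how many standard deviations
# the value lies above (+) or below (-) the mean.
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# Values from the temperature example in these slides
z <- z_score(x = 90, mu = 84.96, sigma = 7.57)
round(z, 3)
```

Wrapping the subtraction in a function also sidesteps the operator-precedence slip that is easy to make when typing the expression by hand.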

Normal Distribution (cdf)

Example

  • What is the probability that tomorrow’s high temperature will be > 90 degrees?

90 - 84.96 / 7.57 # oops! division runs before subtraction
[1] 78.77675
(90 - 84.96) / 7.57
[1] 0.665786

Normal Distribution (cdf)

  • What is the probability that tomorrow’s high temperature will be > 90 degrees?
pnorm(0.665786) # oops! this is the lower tail, P(Z < z)
[1] 0.7472261
1 - pnorm(0.665786)
[1] 0.2527739
pnorm(0.665786, lower.tail = FALSE)
[1] 0.2527739

Inference

  • Inference: derive knowledge about a population from a sample of that population
    • Population: total set of items that we care about (e.g., registered voters, adults in the US, future votes in the Senate)
      • Cannot be observed (usually)
    • Sample: observable subset of the population
  • Inferential quality depends on sample quality (think soup!)

Sampling Methods

  • Probability sample: each member of the population has a known probability of being selected
    • Random sample: everyone in the population has an equal probability of selection (uniform distribution)
    • Stratified random sample: population is divided into groups (strata) and some groups are “over-sampled” to increase the number of people from those groups in the sample
  • Non-probability sample: probability of selection is unknown
    • Representative sample: sample reflects known characteristics of the population (e.g., demographics)
    • Convenience sample: sample that is accessible at relatively low cost, but may not be representative of the population
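A minimal sketch of the two probability samples described above, using base R. The population, group labels, and sample sizes are all hypothetical and chosen only to make the over-sampling visible.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical population of 1,000 people in two groups of unequal size
population <- data.frame(
  id    = 1:1000,
  group = rep(c("A", "B"), times = c(900, 100))
)

# Simple random sample: every row has the same selection probability
srs <- population[sample(nrow(population), size = 100), ]

# Stratified random sample: draw 50 from each group, which
# over-samples the smaller group B relative to its population share
strata <- split(population, population$group)
stratified <- do.call(rbind, lapply(strata, function(g) {
  g[sample(nrow(g), size = 50), ]
}))

table(srs$group)
table(stratified$group)
```

In the simple random sample, group B shows up roughly in proportion to its 10% population share; in the stratified sample it makes up half of the observations by design.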

Statistical Inference

  • Goal: estimate unknown population parameters (e.g., \(\mu\), \(p\)) using sample statistics (e.g., \(\bar{x}\), \(\hat{p}\)) as point estimates
  • Problem: sample statistics vary from sample to sample (sampling variation)
  • Solution: draw a large number of samples and calculate a large number of sample statistics, “the average of the averages” (sampling distribution)

Example
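
The sampling-distribution idea above can be sketched with a short simulation. The population below is hypothetical (normally distributed highs with a mean of 85 and a standard deviation of 7.5); it is not the dataset behind these slides.

```r
set.seed(123)

# Hypothetical population of daily high temperatures
population <- rnorm(100000, mean = 85, sd = 7.5)

# Draw 5,000 samples of size 30 and keep each sample's mean
sample_means <- replicate(5000, mean(sample(population, size = 30)))

mean(sample_means)         # close to the population mean
sd(sample_means)           # spread of the sampling distribution
sd(population) / sqrt(30)  # theoretical value for comparison
```

The standard deviation of the simulated sample means should land close to the population standard deviation divided by \(\sqrt{30}\).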

Standard Error

  • Problem 2: What if we only have one sample?
  • Solution 2: Calculate point estimates using sample statistics and define (quantify) the error in your point estimates by estimating the standard deviation of the sampling distribution; we call this estimate the standard error
    • We estimate the standard error of the sample mean with the following formula:
      • \(SE = \frac{s}{\sqrt{n}}\)
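
As a rough sketch, the formula above translates directly into R. The function name `standard_error` and the ten temperature values are hypothetical, used only for illustration.

```r
# Standard error of the sample mean: s / sqrt(n)
standard_error <- function(x) {
  x <- x[!is.na(x)]        # drop missing values before counting n
  sd(x) / sqrt(length(x))
}

# Hypothetical sample of ten daily high temperatures
temps <- c(81, 92, 77, 88, 85, 90, 79, 86, 94, 83)
standard_error(temps)
```

Dropping missing values first matters because `length()` would otherwise count them toward \(n\).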

CAUTION!

  • Standard errors are only accurate if the sampling distribution is normally distributed
  • Unfortunately, we cannot verify this if we only have one sample, so we assume that the sampling distribution is nearly normal if:
    1. The sample observations are independent
      • Drawing one observation in no way affects the drawing of any other observation
    2. The sample size is greater than 30
      • Thank you central limit theorem!
    3. The distribution of the sample observations is roughly symmetric around the mean (not strongly skewed)
      • Always, always, always plot the distribution of sample observations!!
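
The sample-size and skewness conditions above can be illustrated with a simulation. This is only a sketch: the exponential population is a hypothetical stand-in for strongly skewed data, and `skew` is a simple helper defined here, not a function from a package.

```r
set.seed(2024)

# Sample skewness (third standardized moment); 0 means symmetric
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Hypothetical, strongly right-skewed population
population <- rexp(100000, rate = 1)

# Sampling distributions of the mean for a small and a larger sample size
means_n5  <- replicate(5000, mean(sample(population, size = 5)))
means_n50 <- replicate(5000, mean(sample(population, size = 50)))

skew(population)  # strongly skewed
skew(means_n5)    # still noticeably skewed
skew(means_n50)   # much closer to symmetric
```

With a skewed population, the sampling distribution of the mean only approaches symmetry as the sample size grows, which is why both the sample size and the shape of the sample matter.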

Example

  • If s = 7.37 and n = 30, what is the SE of the mean?
    • \(SE=\frac{7.37}{\sqrt{30}}=1.35\)

Example

  • How was the estimate?

Confidence Intervals

  • Confidence interval (CI): interval estimate that provides a plausible range of values for a population parameter, centered on a point estimate
    • Determined by the standard deviation of the sampling distribution
    • If a sampling distribution for a point estimate is normally distributed, what percentage of point estimates in the sampling distribution will fall within…
      • \(1\sigma\) of \(\mu\)?
      • \(2\sigma\) of \(\mu\)?
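
One way to answer the question above is with `pnorm()`, which returns the area under the standard normal curve to the left of a z-score. The helper name `within_k_sd` is an assumption made for this sketch.

```r
# Share of a normal distribution within k standard deviations of the mean
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

round(within_k_sd(1), 4)  # about 0.68
round(within_k_sd(2), 4)  # about 0.95
```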

Normal Distribution (pdf)

Confidence Intervals

  • 95% CI for \(\mu\)?
    • \(CI=84.97\pm(1.32 * 2)=[82.33, 87.61]\)

Confidence Intervals

  • When we don’t know the sampling distribution, we use the following formula to calculate the CI for a sample mean:
    • \(CI=\bar{x} \pm z\frac{s}{\sqrt{n}}\), where \(\bar{x}\) is the sample mean, \(\frac{s}{\sqrt{n}}\) is the standard error of the sample mean, and \(z\) is the z-score associated with the confidence level we desire:
    • 90% = 1.645
    • 95% = 1.960
    • 99% = 2.576
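
The z-scores listed above do not need to be memorized; a short sketch with `qnorm()` recovers them. The helper name `z_critical` is hypothetical.

```r
# Critical z-value for a two-sided confidence level:
# leave (1 - level) / 2 in each tail of the standard normal
z_critical <- function(level) qnorm(1 - (1 - level) / 2)

round(z_critical(c(0.90, 0.95, 0.99)), 3)
# [1] 1.645 1.960 2.576
```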

Confidence Intervals

  • If \(\bar{x} = 86.27\), \(s = 7.37\) and \(n = 30\), what is the 95% CI for the mean?
    • \(95\% CI = 86.27 \pm 1.96*\frac{7.37}{\sqrt{30}}\)
round(86.27 - 1.96 * (7.37 / sqrt(30)), 2)
[1] 83.63
round(86.27 + 1.96 * (7.37 / sqrt(30)), 2)
[1] 88.91

Hypothesis Testing with CIs

  • Now that we know something about the population (based on the sample), we can use point estimates and confidence intervals to test hypotheses about the population
  • We are skeptics—we assume that our hypotheses are incorrect unless the data strongly suggest otherwise; we do this by:
    1. Specifying two hypotheses:
      • a null hypothesis \((H_0)\)
      • an alternative hypothesis \((H_A)\)
    2. Assuming \(H_0\) is correct, unless the evidence in favor of \(H_A\) is extremely strong—if the evidence in favor of \(H_A\) is not extremely strong, we “fail to reject” the null hypothesis and assume that our hypothesis is incorrect

Hypothesis Testing with CIs

  • How would we use point estimates and confidence intervals to test this hypothesis?
    1. Calculate the sample mean (\(\bar{x}\))
    2. Calculate the standard error of the sample mean (\(SE\))
    3. Calculate the confidence interval around the sample mean (\(95\% CI\))
    4. Does the confidence interval around the sample mean contain the null value?
      • If yes, we cannot reject the null hypothesis
      • If no, we can reject the null hypothesis

Hypothesis Testing with CIs

On a scale from zero to ten, where zero means you are not at all concerned and ten means you are extremely concerned, how concerned are you about the delivery and cost of healthcare in Oklahoma?

Hypothesis Testing with CIs

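library(tidyverse) # provides %>%, drop_na(), and summarise()
library(moments)   # assumed source of skewness() and kurtosis()

survey_data %>% 
  drop_na(cncrn_health) %>% 
  summarise(n = n(),
            mean = mean(cncrn_health),
            s = sd(cncrn_health), 
            se = s / sqrt(n), 
            skewness = skewness(cncrn_health), 
            kurtosis = kurtosis(cncrn_health))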
# A tibble: 1 × 6
      n  mean     s     se skewness kurtosis
  <int> <dbl> <dbl>  <dbl>    <dbl>    <dbl>
1  2530  8.55  1.93 0.0384    -1.78     6.53

Hypothesis Testing with CIs

  • In Texas, average concern about the delivery and cost of healthcare is 8.25; are Oklahomans more concerned than Texans?
    • \(H_0: \mu = 8.25\)
    • \(H_A: \mu > 8.25\)
  • How would we use point estimates and confidence intervals to test this hypothesis?
    1. Calculate the sample mean (\(\bar{x} = 8.55\))
    2. Calculate the standard error of the sample mean (\(SE = \frac{1.93}{\sqrt{2530}} = 0.04\))
    3. Calculate the confidence interval around the sample mean (\(95\% CI = 8.55 \pm (0.04*1.96) = [8.47, 8.63]\))
    4. Does the confidence interval around the sample mean contain the null value?
      • If yes, we cannot reject the null hypothesis
      • If no, we can reject the null hypothesis
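
The four steps above can be sketched in R from the summary statistics reported in these slides. The variable names are assumptions, and the sketch uses the unrounded standard error, which gives the same interval to two decimals.

```r
# Summary statistics reported in the slides
n          <- 2530
x_bar      <- 8.55
s          <- 1.93
null_value <- 8.25  # Texas average, treated as the null value

se <- s / sqrt(n)
ci <- x_bar + c(-1, 1) * qnorm(0.975) * se

round(ci, 2)
# [1] 8.47 8.63

# TRUE means the null value lies outside the interval, so we reject H0
reject_h0 <- null_value < ci[1] || null_value > ci[2]
reject_h0
```

Because 8.25 falls below the lower bound of the interval, the null hypothesis is rejected in this example.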

Decision Errors in Hypothesis Testing

  • Statistical inference involves uncertainty, so hypothesis tests are never 100% accurate
    • Type I error (“false positive”): reject \(H_0\) when \(H_0\) is actually true
      • Detect a difference (or effect) that is not present
      • To reduce type I error, we increase our standard of evidence (e.g., from 95% to 99%)
    • Type II error (“false negative”): fail to reject \(H_0\) when \(H_A\) is actually true
      • Fail to detect a difference (or effect) that is present
      • To reduce type II error, we reduce our standard of evidence (e.g., from 95% to 90%)

Decision Errors in Hypothesis Testing

  • For a fixed sample size, type I and II errors trade off (reducing one increases the other)
  • Usually, we err on the side of caution by reducing type I error at the expense of type II error
    • We avoid incorrect rejection of a true null hypothesis
  • We do this by specifying an acceptable type I error rate (\(\alpha\)) before we test our hypotheses
    • Significance level: probability of rejecting the null hypothesis, given that it is true
  • 5% (\(\alpha = 0.05\)) is the most common significance level
    • \(\alpha = 0.05\) corresponds to a 95% CI
    • Levels should be set according to specific circumstances
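
The meaning of \(\alpha\) as a type I error rate can be checked with a simulation. This is a sketch under stated assumptions: the population is hypothetical (normal, with a mean equal to the null value and a standard deviation of 2), and each simulated study uses the 95% CI test described earlier.

```r
set.seed(99)

alpha      <- 0.05
null_value <- 8.25
n          <- 100

# Simulate many studies in which H0 is true (hypothetical population
# with mean equal to the null value) and test each with a 95% CI
rejections <- replicate(10000, {
  x  <- rnorm(n, mean = null_value, sd = 2)
  se <- sd(x) / sqrt(n)
  ci <- mean(x) + c(-1, 1) * qnorm(1 - alpha / 2) * se
  null_value < ci[1] || null_value > ci[2]
})

mean(rejections)  # share of false positives, should be near alpha
```

Even though the null hypothesis is true in every simulated study, roughly 1 in 20 of them rejects it, which is the false-positive rate that \(\alpha = 0.05\) accepts in advance.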

Hypothesis Testing with P-values

  • Hypothesis tests with confidence intervals can be crude
    • A null value that is just barely outside a CI is treated the same as a null value that is far outside a CI; in both instances we reject \(H_0\)
    • A null value that is just barely inside a CI is treated the same as a null value that is near the center of a CI; in both instances we fail to reject \(H_0\)
  • P-values solve this problem by quantifying the strength of evidence against \(H_0\)
    • If \(H_0\) is true, how likely are your data (findings)?
      • The probability of a result at least as extreme as your finding, if \(H_0\) is correct
    • The smaller the p-value, the stronger the evidence against \(H_0\) (the less compatible the data are with \(H_0\))
    • If p-value \(< \alpha\) (the significance level) we can reject \(H_0\)
  • Remember, p-values can provide evidence against \(H_0\), but they say nothing about \(H_A\)

Hypothesis Testing with P-values

  • In Texas, average concern about the delivery and cost of healthcare is 8.25; are Oklahomans more concerned than Texans?
    • \(H_0: \mu = 8.25\)
    • \(H_A: \mu > 8.25\)
  • How would we use p-values to test this hypothesis?
    1. Define a significance level (\(\alpha\))
    2. Calculate the point estimate (\(\bar{x} = 8.55\))
    3. Calculate the standard error of the point estimate (\(SE = \frac{1.93}{\sqrt{2530}} = 0.04\))
    4. Calculate the z-score of the sample mean (\(\bar{x}\)) assuming the null value is the population mean (\(\mu\)) and the standard error is the standard deviation of the sampling distribution (\(\sigma\))
      • \(z = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} = \frac{\bar{x} - \mu_0}{SE}\)
    5. Calculate the p-value
      • Use the standard normal distribution to calculate the probability of observing a value of \(z\) or higher

Hypothesis Testing with P-values

  1. Calculate the distance from the sample mean to the null value (z-score)
    • \(z = \frac{8.55 - 8.25}{0.04}=7.5\)
  2. Calculate the p-value
    • Use the standard normal distribution to calculate the probability of observing a value of \(z\) or higher

Hypothesis Testing with P-values

1 - pnorm(7.5, lower.tail = TRUE)
[1] 0.0000000000000318634
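
The z-score and p-value steps can also be wrapped in a small helper. This is a sketch, not the method used to produce the output above: the function name `z_test_upper` is hypothetical, and it works from the unrounded standard error, so the z-score comes out slightly larger than the 7.5 obtained with the rounded value of 0.04.

```r
# Upper-tailed z-test from summary statistics
z_test_upper <- function(x_bar, mu_0, s, n) {
  se <- s / sqrt(n)
  z  <- (x_bar - mu_0) / se
  # lower.tail = FALSE returns P(Z >= z) directly, which avoids the
  # rounding error of computing 1 - pnorm(z) for very large z
  p  <- pnorm(z, lower.tail = FALSE)
  list(se = se, z = z, p_value = p)
}

result <- z_test_upper(x_bar = 8.55, mu_0 = 8.25, s = 1.93, n = 2530)
result$z               # about 7.8 with the unrounded SE
result$p_value < 0.05  # TRUE, so H0 is rejected at alpha = 0.05
```

Either way the conclusion is the same: the p-value is far below 0.05.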

Hypothesis Testing with P-values

  • If \(H_0\) (8.25) is the true population mean, then the probability of observing a sample mean of 8.55 or higher is < 0.05
  • If \(\mu = 8.25\), then it is very unlikely that we would observe a sample mean of 8.55 or higher, which is 7.5 standard errors above the null value
  • The data are consistent with the hypothesis that Oklahomans are more concerned about the delivery and cost of healthcare than Texans
  • The data provide statistically significant evidence that Oklahomans are more concerned about the delivery and cost of healthcare than Texans
  • CAUTION!!!
    • STATISTICAL SIGNIFICANCE \(\neq\) PRACTICAL SIGNIFICANCE
    • Is a difference of 0.3 on a 0-10 concern scale practically significant?

The ASA’s Statement on P-values

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
  4. Proper inference requires full reporting and transparency.
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.