MT5762 Lecture 9

C. Donovan

Confidence Intervals

Statistics in health: Maternal Smoking and Infant Health

Example: Maternal Smoking and Infant Health

A health warning from the U.S. Surgeon General on the side panel of a cigarette packet reads: “Smoking by pregnant women may result in fetal injury, premature birth and low birth weight”

  • Smoking is deemed responsible for a 150-250 g (5.3-8.8 oz) reduction in birth weight
  • Smoking mothers are about twice as likely as non-smoking mothers to have a low birth-weight baby (under 2500 grams or 88.18 oz)
  • Smaller babies have lower survival rates than larger babies – in this study, 150 per thousand babies born to smokers died in the first month, compared with just 5 per thousand for non-smokers

Example: Maternal Smoking and Infant Health

A study was conducted:

  • The 1236 women in the study were all those enrolled in a pre-paid medical plan who obtained prenatal care in the San Francisco Bay Area. The women could deliver at any of the hospitals in Northern California
  • The study participants come from a wide range of economic, social and educational backgrounds
  • Two-thirds are white, one-fifth African-American, 3-4% Asian and the remainder belong to the 'other' group. 30% of the husbands are in professional occupations

Data can be found here: https://www.stat.berkeley.edu/users/statlabs/labs.html

Example: Maternal Smoking and Infant Health

  • The educational level and average income are somewhat higher for the study group than for California as a whole; the group under-represents both the very impoverished and the very affluent.
  • Note: The data we will use in this part of the course is observational data and was not obtained under strict experimental conditions. Therefore, this data cannot be used to establish causation!

Confidence intervals for means: The average weight of newborn babies

Exploratory Data Analysis

Variable   Description
bwt        Birth weight in ounces (999 = unknown)
gestation  Length of pregnancy in days (999 = unknown)
parity     0 = first born, 9 = unknown
age        Mother's age in years
height     Mother's height in inches (99 = unknown)
weight     Mother's pre-pregnancy weight in pounds (999 = unknown)
smoke      Smoking status of mother: 0 = not now, 1 = yes now, 9 = unknown

Observe the (irritating) numeric coding for missing values
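In practice, these codes should be converted to proper NA values before analysis. A minimal sketch using dplyr (already loaded for the examples below); note the examples in this lecture instead keep the raw codes and filter explicitly:

  # recode the numeric missing-value flags as NAs, following the coding table above
  library(dplyr)
  babyData <- babyData %>%
    mutate(bwt       = na_if(bwt, 999),
           gestation = na_if(gestation, 999),
           parity    = na_if(parity, 9),
           height    = na_if(height, 99),
           weight    = na_if(weight, 999),
           smoke     = na_if(smoke, 9))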

Example

  head(babyData)
  bwt gestation parity age height weight smoke
1 120       284      0  27     62    100     0
2 113       282      0  33     64    135     0
3 128       279      0  28     64    115     1
4 123       999      0  36     69    190     0
5 108       282      0  23     67    125     1
6 136       286      0  25     62     93     0
  summary(babyData$bwt) 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   55.0   108.8   120.0   119.6   131.0   176.0 
  babyData %>% summarise(mean = mean(bwt), SD = sd(bwt))
      mean       SD
1 119.5769 18.23645

Example

  hist(babyData$bwt, col = 'slateblue4')

[Figure: histogram of birth weights (bwt)]

Example

  • We know that our estimate (\( \bar{x}=119.58 \)) is unlikely to be the real parameter value (\( \mu=? \)).
  • Take a different sample, get a different estimate (\( \bar{x} \)), i.e. our estimate is subject to sampling variability.
  • By the CLT, the distribution of sample mean estimates is approximately Normal (\( n \) is large here).
  • Further, it is centered on the true mean value and has SD = \( \sigma/\sqrt{n} \) (the simulation below illustrates both points).
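A quick simulation, not from the original slides (the exponential population and seed are arbitrary choices), showing that sample means from even a skewed population look Normal for large \( n \), with SD close to \( \sigma/\sqrt{n} \):

  # 5000 samples of size n = 1236 from a skewed (exponential) population
  # whose mean and SD are both 120
  set.seed(42)
  n <- 1236
  mns <- replicate(5000, mean(rexp(n, rate = 1/120)))

  hist(mns, col = 'slateblue4')   # approximately Normal despite the skewed population
  sd(mns)                         # close to the theoretical value below
  120/sqrt(n)                     # sigma/sqrt(n), about 3.41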

Example

  • We want to build a range of plausible values for the mean birthweight (\( \mu \)) of our population
  • A two-standard-error interval (1.96, to be precise) captures the true mean for about 19 of every 20 samples taken when we have large samples
  • To preserve this coverage rate for small samples, we need more than two standard errors on each side of the estimate

Example

  • We choose this multiplier from the \( t \)-distribution; small sample sizes require big \( t \)-multipliers (and result in wide intervals).
  • We are going to make statements about population parameters (e.g. means, \( \mu \), and proportions, \( p \)) using 95% confidence intervals:
  • NB There is nothing magical about 95% as a success rate – 99% or 90% can be used instead. To increase the success/coverage rate of the interval we make it wider (say, 99%); the converse applies (e.g. 90% gives a narrower interval).

A 95% confidence interval for average birthweight

How precise is this sample mean?

  • Use the standard error to quantify the precision of the sample mean, i.e. \[ se(\bar{x})=\frac{s_x}{\sqrt{n}} \]
  • The standard error clearly gets smaller as the sample size increases.

Example

  • Best guess for the parameter \( \hat{\theta}=\bar{x}_{wgt}= 119.5769 \) ounces
  • Choose the level of confidence. We would like the “success” rate to be 95%. Find the \( t \)-multiplier where \( df=n-1=1236-1=1235 \)
 qt(0.975, 1235)
[1] 1.961887
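Note the multiplier is close to the Normal 1.96 because \( n \) is large here; for small samples it is substantially bigger (the df values below are arbitrary illustrations):

  # t-multipliers shrink towards the Normal 1.96 as df grows
  qt(0.975, c(2, 10, 30, 1235))
[1] 4.302653 2.228139 2.042272 1.961887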

Example

  • Standard error of the estimate \[ se(\bar{x}_{wgt})=\frac{s_{wgt}}{\sqrt{n}} \]

i.e.

  sd(babyData$bwt)/sqrt(nrow(babyData))
[1] 0.5187177

Example

So altogether:

  N <- length(babyData$bwt)        # sample size
  df <- N - 1                      # degrees of freedom
  alpha <- 0.05                    # for a 95% interval
  Est <- mean(babyData$bwt)        # point estimate
  SE <- sd(babyData$bwt)/sqrt(N)   # standard error of the mean
  tmult <- qt(1 - alpha/2, df)     # t-multiplier

  upper <- Est + tmult*SE
  lower <- Est - tmult*SE

  lower
  upper
[1] 118.5592
[1] 120.5945

Example

Of course, who'd do it that way?

  t.test(babyData$bwt)

    One Sample t-test

data:  babyData$bwt
t = 230.52, df = 1235, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 118.5592 120.5945
sample estimates:
mean of x 
 119.5769 

Example

  • We interpret the CI thus:

“With 95% confidence we estimate that the mean birthweight for babies born in San Francisco between 1960 and 1967 was somewhere between 118.6 and 120.6 ounces”

Comparing two means: Comparing birth weights for smokers and non-smokers

  • We are going to compare the birth weights for babies born to smokers with those for babies born to non-smokers.

Comparing 2 means

Summary statistics for birth weights from the smoking/non-smoking groups are as follows:

  babyData %>% filter(smoke!=9) %>% group_by(smoke) %>% summarise(mean = mean(bwt), SD = sd(bwt), n = n())
# A tibble: 2 x 4
  smoke  mean    SD     n
  <int> <dbl> <dbl> <int>
1     0  123.  17.4   742
2     1  114.  18.1   484

NB from previous description, “Smoking status of mother: 0=not now, 1=yes now, 9=unknown”

Comparing 2 means

  smokingMothers <- babyData %>% filter(smoke != 9) %>% mutate(smoke = factor(smoke))

  p <- ggplot(data = smokingMothers) + geom_boxplot(aes(x = smoke, y = bwt, fill = smoke))

  p

[Figure: boxplots of birth weight by smoking status]

Comparing 2 means

  p <- ggplot(data = smokingMothers) + geom_histogram(aes(bwt, after_stat(density), fill = smoke), position = 'identity', alpha = 0.5)

  p

[Figure: density histograms of birth weight by smoking status]

Exploratory Data Analysis

  • The babies born to the non-smokers appear to be slightly heavier than those born to smokers.
  • The weights for the two groups appear to overlap, have similar spread and are approximately Normally distributed.

How precise is this estimate of the difference between means?

As before, we use the standard error of the difference to quantify the precision of the difference between sample means, i.e. when the two samples are independent,

\[ se(\bar{x}_{1}-\bar{x}_{2})=\sqrt{\frac{s_{1}^2}{n_{1}}+\frac{s_{2}^2}{n_{2}}} \] For hand calculation, we use \[ df=Min(n_1-1,n_2-1) \]

This is a conservative approach to calculating \( df \) – computer packages may use the Welch procedure and a comparatively complicated \( df \) formula.
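For reference, the Welch (Welch-Satterthwaite) df formula is \[ df=\frac{(s_1^2/n_1+s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1)+(s_2^2/n_2)^2/(n_2-1)} \] A quick check in R, using the group summaries reported later for this data, reproduces the df that t.test reports:

  # Welch-Satterthwaite df for the smoking comparison
  v1 <- 17.39869^2/742   # s^2/n, non-smokers
  v2 <- 18.09895^2/484   # s^2/n, smokers
  (v1 + v2)^2/(v1^2/741 + v2^2/483)   # approx 1003.2, as in the t.test output below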

A 95% confidence interval for the difference in birth weights for smoking and non-smoking mothers

  • What parameter are we interested in? Define the difference of interest.

\( \theta=\mu_{NS}-\mu_{S} \), the difference between the mean birthweight for babies born to non-smokers and the mean birthweight for babies born to smokers in the population.

  • What is our best guess for the parameter? Obtain the estimate of the difference from the data

\( \hat{\theta}=\bar{x}_{NS}-\bar{x}_{S}=123.0472-114.1095=8.94 \) ounces

This is the sample estimate of the difference in means.

CI for the difference between population means, ( \( \mu_{1}-\mu_2 \) )

\[ \begin{align*} \textrm{diff. between sample means} ~~\pm ~& ~~t-mult \times ~\textrm{SE of the difference}\\ \bar{x}_1-\bar{x}_2 \pm ~& ~~t \times se(\bar{x}_1-\bar{x}_2)\\ \textrm{when the two samples are independent,}\\ se(\bar{x}_{1}-\bar{x}_{2})= & \sqrt{\frac{s_{1}^2}{n_{1}}+\frac{s_{2}^2}{n_{2}}}\\ \textrm{for hand calculation, we use} ~~ & df=Min(n_1-1,n_2-1) \end{align*} \]

A 95% CI for the difference in birth weights for smoking and non-smoking mothers

  • Choose the level of confidence

    • We would like the “success” rate to be 95%, i.e. we would like 95% of these intervals to contain \( \mu_{NS}-\mu_S \)
  • Find the \( t \)-multiplier

    • The \( t \)-multiplier has degrees of freedom: \[ df=Min(n_1-1, n_2-1)=Min(742-1, 484-1)=483 \] and so we use \( t= \) 1.965 standard errors (qt(0.975, 483)).

A 95% CI for the difference in birth weights for smoking and non-smoking mothers

  • How precise is our estimate? Obtain the standard error of the estimate

\[ se(\hat{\theta})=se(\bar{x}_{NS}-\bar{x}_{S}) = \sqrt{\frac{s_{NS}^2}{n_{NS}}+\frac{s_{S}^2}{n_{S}}} \]

  sqrt(17.39869^2/742 + 18.09895^2/484)
[1] 1.041524

A 95% CI for the difference in birth weights for smoking and non-smoking mothers

  • What are the upper and lower limits? Calculate the CI \[ \textrm{estimate} \pm t\textrm{-multiplier} \times \textrm{standard error} \]

upper limit:

\( 8.9377+1.964888 \times 1.041524=10.98418 \)

lower limit:

\( 8.9377-1.964888 \times 1.041524= 6.891222 \)
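The same interval in one line of R, reproducing the hand calculation above:

  8.9377 + c(-1, 1)*qt(0.975, 483)*1.041524   # approx 6.89 and 10.98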

A 95% CI for the difference in birth weights for smoking and non-smoking mothers

  • What does this interval mean? Interpret the CI thus

“With 95% confidence we estimate that the true mean birthweight for babies born to non-smokers (\( \mu_{NS} \)) is somewhere between 6.89 and 10.98 ounces higher than that for babies born to smokers (\( \mu_S \)).”

Note, this interval does not contain zero.

A 95% CI for the difference in birth weights for smoking and non-smoking mothers

Generally not by “hand” of course:

  t.test(bwt ~ smoke, data = smokingMothers)

    Welch Two Sample t-test

data:  bwt by smoke
t = 8.5813, df = 1003.2, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  6.89385 10.98148
sample estimates:
mean in group 0 mean in group 1 
       123.0472        114.1095 

Peek ahead - linear models

The \( t \)-test is a special case of a linear model; next week covers these. Note the sign of the smoke1 coefficient below: lm estimates group 1 minus group 0, the negative of our \( \bar{x}_{NS}-\bar{x}_{S} \) difference.

  # t-tests can be a simple linear model
  exampleLM <- lm(bwt ~ smoke, data = smokingMothers)

  # our estimated difference is in here
  coef(exampleLM)
(Intercept)      smoke1 
 123.047170   -8.937666 
  # with confidence intervals here
  confint(exampleLM)
                2.5 %     97.5 %
(Intercept) 121.77391 124.320430
smoke1      -10.96413  -6.911199

Hypothesis testing

We look at another standard inferential tool: testing

This is where the ubiquitous p-value comes from

Tests are intimately related to confidence intervals, as we will see

A motivating example

  • Everyone grab 5 coins from the box of coins
  • Flip each coin 10 times
  • Keep track of the highest number of heads or tails each coin gets (make sure you know which coin is your winner) – a quick simulation of this activity is sketched below
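A minimal simulation of this activity (not in the original slides; the number of repetitions and the seed are arbitrary), showing why a 'winning' coin is unremarkable:

  # each "student": 5 fair coins, 10 flips each; record the most
  # extreme head/tail count among their 5 coins
  set.seed(1)
  best <- replicate(10000, {
    heads <- rbinom(5, size = 10, prob = 0.5)
    max(pmax(heads, 10 - heads))
  })
  table(best)/10000   # roughly 44% of "students" see a coin with 8+ of one face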

ESP or just guessing?

  • A famous experiment (Pratt and Woodruff, 1938) was conducted to investigate the existence of ESP.
  • An experimenter and a subject sat at opposite ends of a table and the experimenter held up and looked at randomly chosen cards with five very different shapes:

The Zener Cards

ESP experiment

  • A number of students were used as experimental subjects and each was asked to describe the card the experimenter was holding up (which the student could not see).
  • 60,000 cards were held up and 12,489 cards were correctly described – a 20.8% success rate.
  • Clearly the sceptical view is that the subjects were simply guessing:

ESP experiment

  • If the students were just guessing, we would expect to see 1/5 guessed correctly – a 20% success rate. In this case, we would expect to see \( 0.2 \times 60,000= 12,000 \) correct guesses.
  • The success rate was 20.8%, but we know that each time we do such a study we will get a different estimate, estimates are subject to sampling variation.
  • To investigate the sampling variation we would expect in this situation, a computer was used to explore the behaviour of \( \hat{p} \) when the students are just guessing and \( p=0.2 \).

ESP experiment

The histogram below shows the distribution of \( \hat{p} \) obtained for 5000 simulations. This could be termed simulating under a 'Null hypothesis', i.e. the types of results we expect under the theory that subjects are just guessing.

ESP experiment

  set.seed(5647)

  # 5000 simulated experiments: 60,000 guesses each, success probability 0.2
  simGuess <- rbinom(5000, 60000, prob = 0.2)

  # convert counts of correct guesses to proportions
  simGuess <- simGuess/60000

  hist(simGuess, col = 'purple', xlim = c(0.19, 0.21))

  # mark the observed 1938 success rate
  abline(v = 0.208, col = 'slateblue4', lty = 3, lwd = 2)

[Figure: histogram of the 5000 simulated guessing proportions, with the observed 0.208 marked]

ESP experiment

  • Their estimate (0.208) is not consistent with the values we would expect if the students were just guessing: not ONE of these 5000 simulated experiments produced a proportion as high as that found in 1938 (checked directly below).
  • To see whether an estimate is consistent with a hypothesized population parameter we carry out hypothesis tests: we check whether our data estimate is consistent with the estimates we expect to get when the hypothesized value is true.
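Checking that claim directly against the simulation above:

  # proportion of simulated guessing runs at least as high as 0.208
  mean(simGuess >= 0.208)   # 0 here: no simulated run reached the observed rate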

Hypothesis testing - definitions

Hypotheses

  • Hypotheses are made about population parameters and NOT estimates

i.e. hypotheses contain: \( \mu \), \( p \), \( \mu_1-\mu_2 \), \( p_1-p_2 \). In this example, we are interested in \( p \).

  • The research hypothesis is the hypothesis the research is designed to investigate.

Hypothesis testing - definitions

  • The research hypothesis underlying the ESP experiment is that some people have an ability to receive mentally transmitted information about the shape of an image on a card and thus will be able to do better than chance at identifying the cards.
  • If estimates such as 20.8% often arise when students are just guessing then the researchers' results would not provide evidence for ESP.

Hypotheses

  • We test this research hypothesis by checking out if there is evidence to rule out that the students were just guessing.
  • The hypothesis we test is called the null hypothesis; this is denoted by \( H_0 \).
  • The null hypothesis is typically a sceptical reaction to a research hypothesis. In the ESP example, the sceptical reaction is that the students were just guessing.

Hypothesis testing - definitions

  • Null hypotheses are very specific and usually contain equal signs (=) e.g. \( H_0: \mu=50 \), \( H_0: p=0.2 \), \( H_0: \mu_1-\mu_2=0 \), \( H_0: p_1-p_2=0 \)
  • The null hypothesis for the ESP experiment is \( H_0 \): the students were just guessing (i.e. \( H_0: p=0.2 \))
  • We can never show the null hypothesis (\( H_0 \)) is true.

Hypotheses

We cannot rule in a hypothesized value for a parameter, we can only determine whether there is evidence to rule out a hypothesized value

For instance, we will see that if the null hypothesis is \( H_0: p=0.2 \) or \( H_0: p=0.202 \), we still draw the same conclusions (a quick check appears below).

This does not mean that \( p=0.202 \) or \( p=0.2 \).
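A quick check of the \( p=0.202 \) claim, using the standard error derived later in this lecture (0.001657426); a sketch, not part of the original slides:

  # test statistic against a null of p = 0.202 rather than 0.2
  (0.20815 - 0.202)/0.001657426   # approx 3.7 standard errors: still a clear rejection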

Hypotheses

  • The alternative hypothesis specifies the type of departure we are expecting to detect; this is denoted by \( H_1 \).
  • The alternative hypothesis corresponds to the research hypothesis.
  • It specifies the departure we expect to detect, NOT the type of departure suggested by the data: we shouldn't look at the data before deciding upon an alternative hypothesis.
  • Alternative hypotheses contain greater-than (\( > \)), less-than (\( < \)) and not-equal-to (\( \neq \)) signs, e.g. \( H_1: \mu \neq 50 \), \( H_1: p > 0.2 \), \( H_1: \mu_1-\mu_2 < 0 \), \( H_1: p_1-p_2 \neq 0 \)

Hypotheses

  • In the absence of prior information, the alternative hypothesis for the ESP experiment would be:

\( H_1 \): The true proportion of correct guesses is NOT 0.2, i.e. \( H_1: p \neq 0.2 \)

  • This is a two-sided alternative hypothesis – we are looking for departures from \( H_0 \) in either direction (i.e. we are interested in whether \( p \) is greater or less than 0.2).
  • If a prior study provided evidence for ESP we might decide to have a one-sided alternative hypothesis

In this case, we might expect to find a success rate higher than 0.2 and \( H_1: p > 0.2 \)

Hypothesis testing of means

Example: age of mothers

Revisiting the 'Maternal smoking and infant health' data, we formally test whether the average ages of mothers in the US and the UK were the same.

  • We have a random sample of \( n=1234 \) mothers from San Francisco in the US with an average age of \( \bar{x}_{SF}=27.25527 \) years and standard deviation of \( s_{SF}=5.781405 \).
  • We can determine a range of plausible values for the true mean age of mothers (\( \mu_{SF} \)) that are consistent with the data:

\[ \bar{x}_{SF} \pm t_{(0.025, df=n-1)} \times \frac{s_{SF}}{\sqrt{n}} \]

\[ 27.25527 \pm 1.96189 \times 0.1645795 = (26.93238, 27.57815) \]

With 95% confidence, we estimate that the mean age of women that gave birth in San Francisco between 1960 and 1967 was somewhere between 26.93 and 27.58 years.

Example: age of mothers

  • We know, from a UK census, that the average age of mothers in the UK between 1960 and 1967 was \( \mu_{UK}=27.2 \) years.
  • We often want to know how strong the evidence is against a hypothesized value.

In this case, we want to evaluate the strength of evidence that the average age of mothers in the US differs from the corresponding age in the UK

Example: age of mothers

  • We would like to know if the average age for mothers in San Francisco was the same as that in the UK at the time.

Put formally: \( H_0: \mu_{SF} = \mu_{UK} \)

  • There is no reason for us to suspect that the average age in San Francisco is either higher or lower than the average age in the UK at the time. So, the alternative hypothesis is two-sided and states the average age is NOT equal to 27.2.

Put formally: \( H_1: \mu_{SF} \ne \mu_{UK} \)

If the Null hypothesis is true?

  • If the average age really is 27.2 years (i.e. \( \mu_{SF}=27.2 \)), then we expect to get estimates (\( \bar{x}_{SF} \)) close to 27.2. Specifically, using work from the previous chapter, we expect to get estimates within about 2 standard errors of 27.2.
  • Conversely, if \( \mu_{SF}=27.2 \) then we are unlikely to get mean estimates very far from 27.2 years, i.e. more than about 2 standard errors from 27.2.

Is our data-estimate consistent with \(H_0\)?

  • Is our sample mean consistent with \( H_0:\mu_{SF}=27.2 \)?
  • We can measure distance between a data-estimate and the null hypothesis in terms of the number of standard errors of the estimate. We do this by calculating the \( t \)-test statistic:

\( t \)-test statistic \[ t_0 = \frac{\textrm{estimate}-\textrm{parameter value}}{\textrm{standard error}} \]

Is our data-estimate consistent with \(H_0\)?

  • The test statistic, \( t_0 \), is a measure of discrepancy between our data-estimate and the hypothesized value.
  • If our data-estimate is close to the hypothesized value then \( t_0 \) will be small, and if our estimate is far from the hypothesized value \( t_0 \) will be large.
  • If this number, \( t_0 \), is unacceptably large then we will reject the hypothesized value. Note that what counts as unacceptable is a matter of convention.
  • In this case,

\[ t_0=\frac{\bar{x}-\mu_{0}}{se(\bar{x})} \]

\[ t_0=\frac{27.25527-27.2}{ 0.1645795}=0.3358099 \]
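The corresponding two-sided \( p \)-value follows from the \( t_{1233} \) distribution; a sketch (in practice t.test(babyData$age, mu = 27.2), on the cleaned ages, would do all of this in one call):

  t0 <- (27.25527 - 27.2)/0.1645795   # observed test statistic, approx 0.336
  2*pt(-abs(t0), df = 1233)           # two-sided p-value, approx 0.74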

Is our data-estimate consistent with \(H_0\)?

  • For means, the estimates have a \( t \)-distribution about the true parameter value and the test-statistic has a \( t \)-distribution:

\[ T=\frac{\textrm{estimator} - \textrm{true parameter value}}{\textrm{standard error}} \sim t_{(df=n-1)}-\textrm{distribution} \]

  • Therefore, if the null hypothesis is true and \( \mu_{SF} \)=27.2 years then our observed value, \( t_0 \), should be typical of values from a \( t \)-distribution with \( df=n-1=1234-1=1233 \).

Is our data-estimate consistent with \(H_0\)?

  • If the true mean value is 27.2 years, then \( t_0 \) is most likely to lie in the center region of the distribution.

  • Under repeated sampling when the \( H_0 \) is true, 95% of the sample estimates will fall within \( 1.96 \) standard errors.

  • If the true mean value is 27.2 years, then \( t_0 \) is unlikely to lie in the tails of the distribution.

  • Under repeated sampling when the \( H_0 \) is true, 5% of the sample estimates will fall outwith \( 1.96 \) standard errors, i.e. \( Pr(|T|>1.96) \approx 0.05 \)

Is our data-estimate consistent with \(H_0\)?

  # the t density with df = 1233, with the observed t0 marked
  x <- seq(-3, 3, length = 200)

  plot(x, dt(x, 1233), type = 'l', col = 'slateblue4', lwd = 2)

  abline(v = 0.3358, lwd = 2, lty = 3)

[Figure: t(1233) density curve with the observed t0 = 0.3358 marked]

What can we conclude?

  • Our sample estimate of 27.25527 is 0.3358099 standard errors from the hypothesized value of 27.2 years.
  • An estimate, and \( t_0 \) value, this different from 27.2 is highly likely to be obtained when the hypothesized value is correct.

  • e.g. if sampling 100,000 values from the distribution of the test statistic (\( t_{1233} \)), some 73459 of these 100,000 values (73.46%) had values greater than 0.336 or less than -0.336.

  • So, when the true mean is 27.2, about 73% of the time we would obtain an estimate at least this different from 27.2 - not a rare occurrence.

  • The data are consistent with the Null hypothesis.

What can we conclude?

BTW We've just done a type of \( t \)-test

Hypothesis test on proportions

We return to our first example

  • 12,489 correct choices were made when 60,000 Zener cards were shown to several students.
  • The sceptical view is that the students were just guessing and the population proportion of correct calls is 0.2. Put formally:

\[ H_0: p= 0.2 \]

  • There is no reason for us to suspect that the success rate is either higher or lower than 0.2, so the alternative hypothesis, put formally, is two-sided:

\[ H_1: p \neq 0.2 \]

If the null hypothesis is true?

  • We are seeking to measure the consistency of the data with the Null hypothesis. How does our data estimate compare with values we expect to see when the students are just guessing?
  • If the students are guessing, we expect to get estimates close to 0.2.
  • We expect to get estimates within about 2 standard errors of 0.2, and we are unlikely to get estimates more than 2 standard errors from 0.2.

Test \(H_0\)

  • Is our estimate of the underlying proportion consistent with \( H_0: p=0.2 \)?
  • We measure the distance between a data-estimate and the null hypothesis in terms of the number of standard errors of the estimate. As before we calculate the test statistic:

\[ t_0 = \frac{\textrm{estimate}-\textrm{parameter value}}{\textrm{standard error}} \]

  • If this statistic, \( t_0 \), is unacceptably large then we will reject the hypothesized value.

Is our data-estimate consistent with \(H_0\)?

  • In this case the test statistic is calculated using:

\[ t_0=\frac{\hat{p}-p_0}{se(\hat{p})} \]

and the standard error is:

\[ se(\hat{p})=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=\sqrt{\frac{0.20815(1-0.20815)}{60000}}= 0.001657426 \]

which gives:

\[ t_0=\frac{0.20815-0.2}{0.001657426}= 4.917263 \]
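The same calculation in R; a sketch (binom.test(12489, 60000, p = 0.2) would give an exact test of the same hypothesis):

  phat <- 12489/60000                 # observed success proportion, 0.20815
  se <- sqrt(phat*(1 - phat)/60000)   # standard error of phat
  z0 <- (phat - 0.2)/se               # test statistic, approx 4.92
  2*pnorm(-abs(z0))                   # two-sided p-value, approx 8.8e-07 (used below)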

Is our data-estimate consistent with \(H_0\)?

  • When dealing with proportions from large samples the estimates have a Normal (\( z \)) distribution about the true parameter value. We have a sample size of 60000 and we need a sample size of at least 125 for this method to be valid.

\[ T=\frac{\textrm{estimator} - \textrm{true parameter value}}{\textrm{standard error}} \sim z-\textrm{distribution} \]

Is our data-estimate consistent with \(H_0\)?

  • Therefore, if our hypothesis is true and \( p \)=0.2 then our observed value, \( t_0 \), should be typical of values from a \( z \)-distribution.
  • Under repeated sampling when the \( H_0 \) is true, 95% of the sample estimates will fall within \( 1.96 \) standard errors and 5% of the sample estimates will fall outwith \( 1.96 \) standard errors, i.e. \( Pr(|Z|>1.96) \approx 0.05 \)

Conclusion?

  • Our sample estimate of 0.208 is 4.92 standard errors from the hypothesized value of 0.2.
  • An estimate, and \( t_0 \) value, this different from 0.2 is very unlikely to be obtained when the true proportion is 0.2.
  • e.g. when sampling 100,000 values from the distribution of the test statistic, none of these values was greater than 4.92 or less than -4.92.
  • Using a Normal distribution, the chance of getting a value 4.92 or more standard errors from zero is \( Pr(Z>4.92) + Pr(Z<-4.92) \) which is \( 8.776 \times 10^{-7} \) (0.00008776% – less than 1 in a million).
  • We have very strong evidence against \( H_0 \) and therefore very strong evidence that something other than just guessing was operating.

Conclusion?

  • In this case we were conservative and used a two-sided alternative hypothesis. If we had a one-sided alternative hypothesis (e.g. \( H_1: p > 0.2 \)), the probability (or \( p \)-value) obtained above (\( 8.776 \times 10^{-7} \)) would be halved – since we would only be looking for a departure from \( H_0 \) in one, rather than two, directions.
  • For this reason we say that two-sided tests are conservative

Recap and look-forwards

We've covered:

  • Confidence intervals for estimated means, proportions and differences based on these.
  • Testing: approach and terminology
  • \( t \)-tests comparing an estimate to a hypothesised value

Next:

  • Specific \( t \)-tests