C. Donovan
A health warning from the U.S. Surgeon General on the side panel of a cigarette packet reads: Smoking by pregnant women may result in fetal injury, premature birth and low birth weight
A study was conducted:
Data can be found here: https://www.stat.berkeley.edu/users/statlabs/labs.html
Variable | Description |
---|---|
bwt | Birth weight in ounces (999 unknown) |
gestation | Length of pregnancy in days (999 unknown) |
parity | 0= first born, 9=unknown |
age | mother's age in years |
height | mother's height in inches (99 unknown) |
weight | Mother's prepregnancy weight in pounds (999 unknown) |
smoke | Smoking status of mother 0=not now, 1=yes now, 9=unknown |
Observe the (irritating) numeric coding for missing values
head(babyData)
bwt gestation parity age height weight smoke
1 120 284 0 27 62 100 0
2 113 282 0 33 64 135 0
3 128 279 0 28 64 115 1
4 123 999 0 36 69 190 0
5 108 282 0 23 67 125 1
6 136 286 0 25 62 93 0
summary(babyData$bwt)
Min. 1st Qu. Median Mean 3rd Qu. Max.
55.0 108.8 120.0 119.6 131.0 176.0
babyData %>% summarise(mean = mean(bwt), SD = sd(bwt))
mean SD
1 119.5769 18.23645
hist(babyData$bwt, col = 'slateblue4')
How precise is this sample mean?
qt(0.975, 1235)
[1] 1.961887
i.e.
sd(babyData$bwt)/sqrt(nrow(babyData))
[1] 0.5187177
So altogether:
N <- length(babyData$bwt)
df <- N-1
alpha <- 0.05
Est <- mean(babyData$bwt)
SE <- Est/sqrt(N)
tmult <- qt(1-alpha/2, df)
upper <- Est + tmult*SE
lower <- Est - tmult*SE
lower
upper
[1] 118.5592
[1] 120.5945
Of course, who'd do it that way?
t.test(babyData$bwt)
One Sample t-test
data: babyData$bwt
t = 230.52, df = 1235, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
118.5592 120.5945
sample estimates:
mean of x
119.5769
“With 95% confidence we estimate that the mean birthweight for babies born in San Francisco between 1960 and 1967 was somewhere between 118.6 and 120.6 ounces”
Summary statistics for birth weights from the smoking/non-smoking groups are as follows:
babyData %>% filter(smoke!=9) %>% group_by(smoke) %>% summarise(mean = mean(bwt), SD = sd(bwt), n = n())
# A tibble: 2 x 4
smoke mean SD n
<int> <dbl> <dbl> <int>
1 0 123. 17.4 742
2 1 114. 18.1 484
NB from previous description, “Smoking status of mother: 0=not now, 1=yes now, 9=unknown”
smokingMothers <- babyData %>% filter(smoke != 9) %>% mutate(smoke = factor(smoke))
p <- ggplot(data = smokingMothers) + geom_boxplot(aes(x = factor(smoke), y = bwt, fill = smoke))
p
p <- ggplot(data = smokingMothers) + geom_histogram(aes(bwt, ..density.., fill = smoke), alpha = 0.5)
p
As before we use the standard error of the difference to quantify the precision of the difference between sample means, ie. when the two samples are independent,
\[ se(\bar{x}_{1}-\bar{x}_{2})=\sqrt{\frac{s_{1}^2}{n_{1}}+\frac{s_{2}^2}{n_{2}}} \] for hand calculation, we use \[ df=Min(n_1-1,n_2-1) \]
This is a conservative approach to calculating \( df \) – computer packages may use the Welch procedure and a comparatively complicated \( df \) formula.
\( \theta=\mu_{NS}-\mu_{S} \), the difference between the mean birthweight for babies born to non-smokers and the mean birthweight for babies born to smokers in the population.
\( \hat{\theta}=\bar{x}_{NS}-\bar{x}_{S}=123.0472-114.1095=8.94 \) ounces
this is the sample estimate of the difference in means
\[ \begin{align*} \textrm{diff. between sample means} ~~\pm ~& ~~t-mult \times ~\textrm{SE of the difference}\\ \bar{x}_1-\bar{x}_2 \pm ~& ~~t \times se(\bar{x}_1-\bar{x}_2)\\ \textrm{when the two samples are independent,}\\ se(\bar{x}_{1}-\bar{x}_{2})= & \sqrt{\frac{s_{1}^2}{n_{1}}+\frac{s_{2}^2}{n_{2}}}\\ \textrm{for hand calculation, we use} ~~ & df=Min(n_1-1,n_2-1) \end{align*} \]
Choose the level of confidence
Find the \( t \)-multiplier
qt(0.975, 483)
).\[ se(\hat{\theta})=se(\bar{x}_{NS}-\bar{x}_{S}) \\ \sqrt{\frac{s_{NS}^2}{n_{NS}}+\frac{s_{S}^2}{n_{S}}} \]
sqrt(17.39869^2/742+18.09895^2/484)
[1] 1.041524
upper limit:
\( 8.9377+1.964888 \times 1.041524=10.98418 \)
lower limit:
\( 8.9377-1.964888 \times 1.041524= 6.891222 \)
“With 95% confidence we estimate that the true mean birthweight for babies born to non-smokers (\( \mu_{NS} \)) is somewhere between 6.89 and 10.98 ounces higher than that for babies born to smokers (\( \mu_S \)).”
Note, this interval does not contain zero.
Generally not by “hand” of course:
t.test(bwt ~ smoke, data = smokingMothers)
Welch Two Sample t-test
data: bwt by smoke
t = 8.5813, df = 1003.2, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
6.89385 10.98148
sample estimates:
mean in group 0 mean in group 1
123.0472 114.1095
A special case of a linear model. Next week covers these
# t-tests can be a simple linear model
exampleLM <- lm(bwt ~ smoke, data = smokingMothers)
# our estimated difference is in here
coef(exampleLM)
(Intercept) smoke1
123.047170 -8.937666
# with confidence intervals here
confint(exampleLM)
2.5 % 97.5 %
(Intercept) 121.77391 124.320430
smoke1 -10.96413 -6.911199
We look at another standard inferential tool: testing
This is where the ubiquitous p-value comes from
They are intimately related to confidence intervals, as we will see
The histogram below shows the distribution of \( \hat{p} \) obtained for 5000 simulations. This could be termed simulating under a 'Null hypothesis', i.e. the types of results we expect under the theory that subjects are just guessing.
set.seed(5647)
simGuess <- rbinom(5000, 60000, p = 0.2)
simGuess <- simGuess/60000
hist(simGuess, col = 'purple', xlim = c(0.19, 0.21))
abline(v = 0.208, col = 'slateblue4', lty = 3, lwd = 2)
Hypotheses
ie. hypotheses contain: \( \mu \), \( p \) , \( \mu_1-\mu_2 \), \( p_1-p_2 \). In this example, we are interested in \( p \).
We cannot rule in a hypothesized value for a parameter, we can only determine whether there is evidence to rule out a hypothesized value
For instance, we will see that if the null hypothesis is \( H_0: p=0.2 \) or \( H_0: p=0.202 \), we still draw the same conclusions.
This does not mean that \( p=0.202 \) or \( p=0.2 \).
\( H_1 \): The true proportion of correct guesses is NOT 0.2. ie. \( H_1: p \neq 0.2 \)
In this case, we might expect to find a success rate higher than 0.2 and \( H_1: p > 0.2 \)
Example: age of mothers
Revisiting the Maternal smoking and infant health' data. Formally test if the average age of mothers in the US and the UK were the same.
\[ \bar{x}_{SF} \pm t_{(0.025, df=n-1)} \times \frac{s_{SF}}{\sqrt{n}} \]
\[ 27.25527 \pm 1.96189 \times 0.1645795 = (26.93238, 27.57815) \]
With 95% confidence, we estimate that the mean age of women that gave birth in San Francisco between 1960 and 1967 was somewhere between 26.93 and 27.58 years.
In this case, we want to evaluate the strength of evidence that the average age of mothers in the US differs from the corresponding age in the UK
Put formally: \( H_0: \mu_{SF} = \mu_{UK} \)
Put formally: \( H_A: \mu_{SF} \ne \mu_{UK} \)
\( t \)-test statistic \[ t_0 = \frac{\textrm{estimate}-\textrm{parameter value}}{\textrm{standard error}} \]
\[ t_0=\frac{\bar{x}-\mu_{0}}{se(\bar{x})} \]
\[ t_0=\frac{27.25527-27.2}{ 0.1645795}=0.3358099 \]
\[ T=\frac{\textrm{estimator} - \textrm{true parameter value}}{\textrm{standard error}} \sim t_{(df=n-1)}-\textrm{distribution} \]
If the true mean value is 27.2 years, then \( t_0 \) is most likely to lie in the center region of the distribution.
Under repeated sampling when the \( H_0 \) is true, 95% of the sample estimates will fall within \( 1.96 \) standard errors.
If the true mean value is 27.2 years, then \( t_0 \) is unlikely to lie in the tails of the distribution.
Under repeated sampling when the \( H_0 \) is true, 5% of the sample estimates will fall outwith \( 1.96 \) standard errors. ie. \( Pr(T>|1.96|) \approx 0.05 \)
x <- seq(-3, 3, length = 200)
plot(x, dt(x, 1233), type = 'l', col = 'slateblue4', lwd = 2)
abline(v = 0.3358, lwd = 2, lty = 3)
An estimate, and \( t_0 \) value, this different from 27.2 is highly likely to be obtained when the hypothesized value is correct.
e.g. if sampling 100,000 values from the distribution of the test statistic (\( t_{1233} \)), some 73459 of these 100,000 values (73.46%) had values greater than 0.336 or less than -0.336.
So, when the true mean is 27.2, about 73% of the time we would obtain an estimate at least this different from 27.2 - not a rare occurrence
the data are consistent with the Null hypothesis.
BTW We've just done a type of \( t \)-test
We return to our first example
\[ H_0: p= 0.2 \]
\[ H_1: p \neq 0.2 \]
\[ t_0 = \frac{\textrm{estimate}-\textrm{parameter value}}{\textrm{standard error}} \]
\[ t_0=\frac{\hat{p}-p_0}{se(\hat{p})} \]
and the standard error is:
\[ se(\hat{p})=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=\sqrt{\frac{0.20815(1-0.20815)}{60000}}= 0.001657426 \]
which gives:
\[ t_0=\frac{0.20815-0.2}{0.001657426}= 4.917263 \]
\[ T=\frac{\textrm{estimator} - \textrm{true parameter value}}{\textrm{standard error}} \sim z-\textrm{distribution} \]
We've covered:
Next: