We will use this as a chance to both develop our analytical skills and revise some of the material we've been discussing.
Cot death is the otherwise unexplained death, usually during sleep, of an infant aged less than 12 months. The cause is not known, but at the time of the study there were several theories, relating to suffocation induced by infant sleep position or bed-sharing, or to exposure to tobacco smoke, usually from the parents.
The full story of cot-death is largely one of an iatrogenic (medically caused) tragedy and I encourage you to read the paper which is linked below the plot.
Source: Neonatology 2018;113:162-169. DOI: 10.1159/000481880
This is a case-control study that was carried out in the 1980s to address the problem of an epidemic of cot-death in New Zealand.
The main hypotheses about the cause related to infant sleep position, bed-sharing and exposure to tobacco smoke, these being the main issues that were thought to lead to cot-death.
Cases were those that came to medical attention due to their child's death. Controls were sampled from the community at the same time as the cases were recruited.
🤓 The main reason the researchers chose a case-control design is that cot death is a ____. Case-control studies are particularly efficient designs for studying rare ____.
The data were collected in a .csv file, which is available in Canvas.
The meaning of the variables should be self-explanatory.
We will be using iNZight lite.
We will assume that we've covered the data checking side of it, but for revision, can you remember what the four important items to check are?
As well as looking for missing values and ranges, what is the other important issue to look for in a dataset?
As a reminder, the issues to consider are:
🤓 Overall, the prevalence of maternal smoking is: ____% [two significant figures].
🤓 In cases, ____% [two significant figures].
🤓 In controls, ____% [two significant figures].
🤓 This indicates that maternal smoking is associated with cot-death, because the prevalence of smoking is higher in cases of cot-death than in controls.
Use iNZight lite to determine the magnitude and direction of the association between maternal smoking and cot-death.
First select the outcome variable, Case_status, followed by Mother_smoke as the second variable (exposure).
Check and interpret the bar plot. By clicking on the Add to Plot button, you can experiment with different types of plot. You can see that the percentage of cases is much higher in the smoking group than in the non-smoking group.
Check the Summary tab and inspect the raw numbers and percentages to verify the nature and direction of the association.
Check the Epidemiology options in the Inference tab (tick the Epidemiology options box).
🤓 Which measure of association is usually reported for a case-control study?
Interpret the findings.
🤓 Please select the most appropriate description:
Our hypothesis is framed as 'mothers who smoke are 4.4 times more likely to have a cot-death baby than non-smokers'. Thanks to the use of odds ratios, this is numerically equal to our estimate, which strictly reads 'mothers of cot-death babies are 4.4 times more likely to be smokers (than non-smokers) than mothers of controls'. You could therefore use either of these interpretations, but the latter is consistent with how the data were collected: cases and controls were sampled, and then exposure was assessed. In fact, if you switch Case_status and Mother_smoke, you will notice that the odds ratio is identical! This is not so for the relative risk.
We are interested in the interpretation 'mothers who smoke are 4.4 times more likely to have a baby die of cot-death than non-smokers'. Thanks to the use of odds ratios, this is numerically equal to our estimate, which strictly reads 'mothers of cot-death babies are 4.4 times more likely to be smokers (than non-smokers) than mothers of controls'. You could therefore use either of these interpretations, but the latter is consistent with how the data were collected: cases and controls were sampled, and then exposure was assessed.
Warning Remember that case-control studies should use odds ratios rather than relative risks as measures of association. For simplicity, we usually say that odds ratios from case-control studies are interpreted as risk ratios. The gory reality, however, is that the interpretation depends on the design: the odds ratio sometimes estimates a different quantity, such as a rate ratio. See here for details.
Remember, the first step is to consider the nature of the two groups being compared. An example is: Mothers of babies who died of cot-death [cases] were 4.37 times [odds ratio interpreted as relative risk] more likely to smoke [exposure] than mothers of healthy babies [controls]. Note: This is somewhat different from how risk ratios are presented from cross-sectional and cohort studies, where the groups being compared are defined by exposure, consistent with our hypothesis. Usually we are interested in the risk of disease given exposure, but in a case-control study, it is the risk of exposure given disease (case or control status).
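The symmetry of the odds ratio can be checked with a few lines of Python. This is only a minimal sketch: the counts are hypothetical, chosen for illustration, and are not the study data. The function names are also our own.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def proportion_ratio(a, b, c, d):
    """Ratio of row proportions: a/(a+b) divided by c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))


# HYPOTHETICAL counts, for illustration only (not the study data).
cases_smokers, cases_nonsmokers = 60, 40
controls_smokers, controls_nonsmokers = 30, 70

# Rows = case status, columns = smoking (how a case-control study samples).
or_by_case = odds_ratio(cases_smokers, cases_nonsmokers,
                        controls_smokers, controls_nonsmokers)
pr_by_case = proportion_ratio(cases_smokers, cases_nonsmokers,
                              controls_smokers, controls_nonsmokers)

# Rows = smoking, columns = case status (the two variables switched).
or_by_smoking = odds_ratio(cases_smokers, controls_smokers,
                           cases_nonsmokers, controls_nonsmokers)
pr_by_smoking = proportion_ratio(cases_smokers, controls_smokers,
                                 cases_nonsmokers, controls_nonsmokers)

print("Odds ratios:", or_by_case, or_by_smoking)        # the same value twice
print("Proportion ratios:", pr_by_case, pr_by_smoking)  # two different values
```

Switching rows and columns leaves the odds ratio unchanged, whereas the ratio of proportions depends on which variable defines the groups.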
Could chance be an explanation of this finding?
Examine the \(P\)-value. Is it less than 0.05?
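For intuition about where such a P-value can come from, one common test for a 2x2 table is Pearson's chi-square test (here without continuity correction). The sketch below is an assumption about the kind of test involved, not a description of what iNZight actually computes, and the counts are hypothetical rather than the study data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and P-value (1 df) for [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Upper tail of a chi-square with 1 df, written via the normal tail.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts, for illustration only (not the study data):
# cases: 60 smokers, 40 non-smokers; controls: 30 smokers, 70 non-smokers.
statistic, p_value = chi_square_2x2(60, 40, 30, 70)

print(f"chi-square = {statistic:.2f}, P = {p_value:.2g}")
print("P < 0.05:", p_value < 0.05)
```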
🤓 What is a P-value?
🤓 Which statement about a 95% confidence interval of the mean is true?
What additional information might you like to consider before assuming that there is a causal association here?
Your boss is excited by these findings and convinced that you may have the key to solving the cot-death epidemic 😆. If we 'took away' smoking from this population, what changes in incidence of cot-death would we expect to see?
🤓 Which of the following is the most appropriate statistic to answer this question?
The proportion of the cases in the population that may be prevented if the exposure is removed. It is derived from a measure of association (relative risk or odds ratio) and the prevalence of exposure in the population. A causal assumption is inherent in the calculation: i.e. that the exposure causes the outcome.
The formula is: \[ \text{Population proportion attributable risk} = \frac{\text{prevalence}_\text{exposure} (\text{RR} - 1)}{1 + \text{prevalence}_\text{exposure} (\text{RR} - 1)} \] You will need a calculator or Microsoft Excel to work this out, since unfortunately iNZight doesn't perform this calculation!
Remember, \(\text{prevalence}_{\text{exposure}}\) here is likely to be closer to that estimated from controls rather than from cases!
The correct answer here is: \[ \begin{aligned} \text{Population proportion attributable risk} &= \frac{0.311 \times (4.37 - 1)}{1 + 0.311 \times (4.37 - 1)} \\ &= \frac{1.05}{2.05} \\ &= 0.51 \text{ or } 51\% \end{aligned} \]
Here, the prevalence of exposure is taken from the prevalence in controls, since this is close to the population prevalence, and the odds ratio is used in place of the relative risk.
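If you would rather not use a calculator or Excel, the same arithmetic can be sketched in Python. The function name is our own. The inputs are the prevalence in controls (0.311) and the odds ratio (4.37) quoted above, and the causal assumption still applies.

```python
def population_attributable_fraction(prevalence_exposure, relative_risk):
    """Proportion of cases that might be prevented if the exposure were removed.

    Assumes the exposure is causal; an odds ratio can stand in for the
    relative risk, as is done in this worksheet.
    """
    excess = prevalence_exposure * (relative_risk - 1)
    return excess / (1 + excess)


# Values quoted in the worksheet: prevalence in controls and the odds ratio.
paf_smoking = population_attributable_fraction(0.311, 4.37)
print(f"{paf_smoking:.2f}")  # 0.51
```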
Warning
For cohort or cross-sectional studies, the prevalence in a population attributable risk calculation is generally taken as the unconditional or overall prevalence of exposure. In a case-control study, the best assessment of the overall prevalence of exposure is from the control group, although, technically, it could be thought of as a weighted average of the prevalence in the controls and the cases. The weights would be proportions of the cases and controls in the study population.
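To make the weighted-average idea concrete, write \(w\) for the proportion of the population who are cases (a symbol introduced here for illustration, not taken from the study). The overall prevalence of exposure could then be expressed as: \[ \text{prevalence}_\text{exposure} = w \times \text{prevalence}_\text{cases} + (1 - w) \times \text{prevalence}_\text{controls} \] When the outcome is rare, \(w\) is close to zero, which is why the prevalence in controls is a good approximation.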
Remember that we have made a hidden assumption here: we are now treating maternal smoking as causal when the analysis only gives us evidence of association. What additional steps do we need to take to move from association to causation?
Select Case_status as the outcome and Sleep_position as the second variable.
Interpret the barplot. What does it show?
In what direction do you expect to see the association?
It is important to visualise the direction of association so that you don't misinterpret inference information later.
You might need to reorder the Sleep_position variable to improve the interpretation of the graph. It makes sense here to go from low risk to high risk. In that case, I suggest Back first, then Side, then Other, then Front_face_down and finally Front_face_to_side.
Use Manipulate variables --> Categorical variables --> Reorder levels
Click "Inference". Check the "Epidemiology Options" box if it is available.
Interpret the output. Focus on the direction and strength of association.
🤓 Which sleeping position is highest risk?
🤓 Which exposure category is the software selecting as the comparison?
How could we change that? Verify that your changes make more sensible output and comparison categories.
Change the exposure category to compare Front_face_down and Front_face_to_side with every other category.
Hint: you may need to use the following functions...
"Manipulate Variables" --> "Categorical variables" --> "Collapse levels"
and
"Manipulate Variables" --> "Categorical variables" --> "Reorder levels"
Once you have your binary variable (face down vs. other), estimate the association between this variable and case-status.
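The collapsing step described above can also be sketched outside iNZight. This is only an illustration in Python: the level names follow this worksheet (their exact spelling in the .csv file is an assumption), the new labels Front and Not_front are our own, and the records are hypothetical.

```python
from collections import Counter

# Level names follow the worksheet; exact spelling in the .csv is assumed,
# so the comparison is made case-insensitively.
FRONT_LEVELS = {"front_face_down", "front_face_to_side"}


def collapse_sleep_position(level):
    """Map a five-level sleep position to a binary Front / Not_front label."""
    return "Front" if level.strip().lower() in FRONT_LEVELS else "Not_front"


# HYPOTHETICAL records, for illustration only (not the study data).
sleep_positions = ["Back", "Side", "Other",
                   "Front_face_down", "Front_face_to_side", "Back"]

binary_positions = [collapse_sleep_position(p) for p in sleep_positions]
print(Counter(binary_positions))
```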
Check the Epidemiology options in the Inference tab.
🤓 Interpret the findings.
🤓 Could chance be an explanation of this finding?
The \(P\)-value is extremely small (< 0.001), indicating that chance is an unlikely explanation for the association between sleeping position and cot-death.
What additional information might you like to consider before assuming that there is a causal association here?
🤓 Possible other explanations for an association include all of the following except:
A type-2 error is not possible here, because a type-2 error is a false negative and we have a positive association. For a type-2 error, we would need a null or non-significant association.
Your boss is (again!) convinced that they may have the key to solving the cot-death epidemic. If we 'took away' sleeping face down from this population, what changes in incidence of cot-death would we expect to see?
First we need to decide what the prevalence of sleeping on the front in controls is.
The answer is the prevalence in controls, so you need to first select Case_status, then your binary variable for sleeping position.
The relevant odds ratio is 3.9, so the calculation is: \[ \begin{aligned} \text{Population proportion attributable risk} &= \frac{\text{prevalence}_\text{exposure} (\text{OR} - 1)}{1 + \text{prevalence}_\text{exposure} (\text{OR} - 1)} \\ &= \frac{0.32 \times (3.9 - 1)}{1 + 0.32 \times (3.9 - 1)} \\ &= \frac{0.32 \times 2.9}{1 + 0.32 \times 2.9} \\ &= \frac{0.93}{1.93} \end{aligned} \] So, the answer to two significant figures, as a percentage, is: ____%.
🤓 The assumption underlying this calculation is:
Wow 😮. We can potentially prevent half of cot deaths just by telling parents to sleep their infants on their backs rather than their fronts. This is very powerful stuff! 💪
Check the nature of the association between birth weight in grams (Birth_wt) and Case_status.
What does the plot show? Experiment with different types of plots.
🤓 The boxplot shows that the distribution of Birth_wt by Case_status is ____.
🤓 The statistical test for association between these variables is a ____ test.
🤓 Which group has higher birth weights?
🤓 A description of the statistical test result is: the difference in birth weight between the controls and cases is ____ grams [to two significant figures or rounded to the nearest 10], with ____ being heavier on average.
🤓 Is the difference likely to be due to chance (\(P\)-value)?
🤓 You wonder whether these differences could be attributed to maternal smoking. Add in a third variable (Mother_smoke), and now check Inference. What is the average difference (between cases and controls) in the smoking (____) and non-smoking (____) groups [to two significant figures]? What is your conclusion? Smoking is a likely ____ of the relationship between birth weight and cot-death, since it is a likely shared common cause of both exposure and outcome.
🤓 Which ethnic group is at highest risk of cot-death?
🤓 What could this be due to?
The incorrect responses here are possible, but not as likely as maternal smoking status.
🤓 How could you investigate this hypothesis further?
Include maternal smoking as a potential confounder (3rd variable).
What happens to the association between Maori ethnicity and cot-death status?
🤓 This indicates that smoking is likely to be a ____ of the relationship between ethnicity and cot-death. This is because cigarette smoking is likely to be affected by ethnicity, rather than the other way around.
Attempt to answer the following questions using iNZight with age as a continuous variable.
🤓 Mothers who are ____ are at highest risk of cot-death.
🤓 The distribution of maternal age is ____.
🤓 The statistical test for association between these variables is a ____ test.
🤓 A description of the statistical test result is: the difference in maternal age between the controls and cases is ____ years [to two significant figures], with ____ being older on average.