2. Tutorial for POPLHLTH 216: Quantitative methods for Health Research

Simon Thornley

1 August, 2023


Exploring inferential statistics.

How can we avoid dental procedures?

Tutorial aims

Gain confidence using iNZight Lite to explore hypothesis testing for categorical and continuous variables.


We have been interested in understanding the relationship between sugar in food and dental caries.

We have just carried out an epidemiological study of the class, collecting questionnaire responses using Google Forms.

This is a cross-sectional study, so we have to interpret our findings with caution, being mindful of the limits of such a study design. For example, since the exposure and outcome, and in fact all information, are collected at the same time, we have to be cautious about the possibility of reverse causation. Does the outcome lead to the exposure, rather than the converse?

Sugar and dental caries class study

This is a cross-sectional study that asked you questions about your demographics, height and sugary drink intake, and about several health outcomes, such as the need for dental treatment and hospital treatment in the past year.

We have a belief or hypothesis that people who consume less sugar have fewer rotten teeth, and need fewer dental procedures.

We will be revising some of the lecture concepts this week by exploring these data.

We will be using iNZight Lite, available here.

An edited version of the spreadsheet results is available here.

We will assume that we’ve covered the data-checking side of things, but as revision, can you remember the four important items to check?

  • Duplicates (check in Excel)
  • Ranges
    • Are any variables out of range?
  • Missing values
    • Take a note as they may affect calculations down the track.
    • Make sure that they are consistently coded. We want to avoid having both blank cells and “I don’t know” for the same thing. Generally, choose one code (usually a blank cell) and stick with it.
    • More than 15% missing values for a variable is generally considered problematic.
  • Consistent coding of variables.
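The four checks above can be sketched in Python using only the standard library. This is a minimal illustration, not part of the iNZight workflow: the rows, the column values, the set of codes treated as missing and the plausible height range (120 to 220 cm) are all hypothetical assumptions.

```python
# Sketch of the four data checks (duplicates, ranges, missing values, consistent coding).
# All rows, codes and ranges below are HYPOTHETICAL examples, not the class data.

MISSING_CODES = {"", "I don't know", "NA"}  # hypothetical codes that all mean "missing"


def recode_missing(rows, codes=MISSING_CODES):
    """Recode every missing-value code to a blank string, so missingness is coded one way."""
    return [{k: ("" if v.strip() in codes else v) for k, v in row.items()} for row in rows]


def find_duplicates(rows):
    """Return the indices of rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes


def out_of_range(rows, column, low, high):
    """Return indices of rows whose (non-missing) numeric value lies outside [low, high]."""
    return [
        i for i, row in enumerate(rows)
        if row[column] != "" and not low <= float(row[column]) <= high
    ]


def missing_fraction(rows, column):
    """Proportion of rows with a blank (missing) value in `column`."""
    return sum(row[column] == "" for row in rows) / len(rows)


rows = [
    {"id": "1", "Height_cm": "172", "How_often_sugary_drinks": "Never"},
    {"id": "2", "Height_cm": "16.5", "How_often_sugary_drinks": "About one a day"},
    {"id": "2", "Height_cm": "16.5", "How_often_sugary_drinks": "About one a day"},
    {"id": "3", "Height_cm": "", "How_often_sugary_drinks": "I don't know"},
]

clean = recode_missing(rows)
print(find_duplicates(clean))                      # [2]
print(out_of_range(clean, "Height_cm", 120, 220))  # [1, 2]
for col in ("Height_cm", "How_often_sugary_drinks"):
    share = missing_fraction(clean, col)
    print(col, f"{share:.0%} missing", "(problematic)" if share > 0.15 else "")
```

A height of 16.5 cm is the kind of out-of-range value (probably a units or typing error) that the range check is meant to catch.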

Revision

Upload the spreadsheet

Upload the spreadsheet into iNZight using File –> Import Dataset

Navigate to where your file is located.

Is sugary drink intake associated with rotten teeth?

First select Filling_extraction_last_year and How_often_sugary_drinks in the Visualise tab.

The order of the variables for How_often_sugary_drinks is not very intuitive. I suggest that you re-order them using:

Manipulate variables –> Categorical variables –> Reorder Levels

Select How_often_sugary_drinks and check Sort levels manually.

Select the lowest intake level first (Never) and then the next and so on.

With your new reordered variable, check the nature of the association with the Filling_extraction_last_year variable.

Interpret the barplot. What does it show?

The image below is intended to give you some pointers to interpretation.

Barplot interpretation

You can appreciate that students who drink more sugary drinks have a higher prevalence of fillings and extractions.
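The barplot shows the prevalence (proportion) of fillings or extractions within each intake level. A minimal Python sketch of that calculation follows. The counts are hypothetical, not the class results, and the explicit LEVEL_ORDER list plays the same role as sorting the levels manually in iNZight.

```python
# Prevalence of fillings/extractions by sugary-drink intake level.
# The counts are HYPOTHETICAL, not the class results.
LEVEL_ORDER = ["Never", "Less than one a day", "About one a day", "More than one a day"]

# level -> (students with a filling/extraction, students in that level)
counts = {
    "About one a day": (6, 15),
    "Never": (4, 20),
    "More than one a day": (5, 10),
    "Less than one a day": (9, 36),
}


def prevalence_by_level(counts, order):
    """Return (level, prevalence) pairs in the given (lowest to highest intake) order."""
    return [(level, counts[level][0] / counts[level][1]) for level in order]


for level, prev in prevalence_by_level(counts, LEVEL_ORDER):
    print(f"{level:<20} {prev:.0%}")
```

With these made-up counts the prevalence rises steadily from the Never group to the highest intake group, which is the kind of dose-response pattern to look for in the barplot.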

That’s pretty cool. We’ve learned from our own class mates’ experience.

It is important to visualise the direction of association so that you don’t misinterpret inference information later.

Click on "Inference", then check the "Epidemiology Options" box if it is available.

Interpret the output. Focus on the "More than one a day" row. What does this tell you about the direction of the association?

Remember, a relative risk is always about comparing two groups. It is the risk of the outcome in one exposure group, divided by the risk of the outcome in the reference group.
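In symbols, RR = (a / n1) / (c / n0), where a of the n1 people in the exposure group and c of the n0 people in the reference group have the outcome. A short Python sketch of that arithmetic, using hypothetical counts (not the class data):

```python
def risk(cases, total):
    """Proportion of a group that experiences the outcome."""
    return cases / total


def relative_risk(cases_exposed, n_exposed, cases_ref, n_ref):
    """Risk in the exposure group divided by risk in the reference group."""
    return risk(cases_exposed, n_exposed) / risk(cases_ref, n_ref)


# HYPOTHETICAL counts: 5 of 10 'More than one a day' drinkers and
# 4 of 20 'Never' drinkers had a filling or extraction.
rr = relative_risk(5, 10, 4, 20)
print(round(rr, 2))  # 2.5
```

A risk of 0.50 against a reference risk of 0.20 gives RR = 2.5: the exposure group is 2.5 times as likely to have the outcome.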

Relative risks greater than 1 indicate that the numerator group has a higher frequency of the outcome than the denominator (reference) group. For example, in our class data, iNZight gives the following output:

How_often_sugary_drinks   RR     95% CI          P-value
Never                     1.00   -               -
Less than one a day       1.35   (0.49, 3.71)    0.782
About one a day           2.35   (0.73, 7.63)    0.244
More than one a day       2.67   (0.62, 11.44)   0.234

Here, the reference is the Never category, and the RR for More than one a day means that the highest intake group is about 2.7 times as likely to have had a filling or extraction as the Never group. The \(P\)-value (0.234), however, indicates that results at least this extreme would be reasonably likely (about 23% of the time) if there were truly no association.
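One common way to get a 95% confidence interval for a relative risk is to work on the log scale: log(RR) ± 1.96 × SE, where SE = sqrt(1/a - 1/n1 + 1/c - 1/n0), and then exponentiate the limits. The Python sketch below does this with hypothetical counts and adds a Wald z-test P-value. iNZight may use a different test for its P-values, so numbers from this sketch will not necessarily match its output. With these counts the interval includes 1, so the data are compatible with no association even though the point estimate is well above 1.

```python
import math
from statistics import NormalDist


def rr_with_ci(a, n1, c, n0, level=0.95):
    """Relative risk, log-scale (Wald-type) CI and two-sided z-test P-value.

    a of n1 exposed people and c of n0 reference people had the outcome.
    """
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # SE of log(RR)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)         # about 1.96 for 95%
    lower = math.exp(math.log(rr) - z_crit * se_log)
    upper = math.exp(math.log(rr) + z_crit * se_log)
    z = math.log(rr) / se_log                              # H0: log(RR) = 0, i.e. RR = 1
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return rr, (lower, upper), p


# HYPOTHETICAL counts, not the class data
rr, (lo, hi), p = rr_with_ci(5, 10, 4, 20)
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), P = {p:.3f}")
```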

Remember to be explicit about which two exposure groups you are comparing. Here it is the More than one a day drinkers compared with those who report Never drinking sugary drinks.

Interpret the output. What do the 95% confidence interval and the P-value mean?

If the result shows no statistical significance (P > 0.05), does that mean our hypothesis is wrong?

What other explanations could there be for our results?

  • Remember that the size of the P-value depends on the sample size: for an association of a given strength, the larger the sample, the smaller the P-value. Considering that we have evidence of a strong association and a dose-response relationship (both among Bradford-Hill’s criteria for causation), it may be that our study lacks statistical power and that this is a type II error (false negative).

  • What might happen if, instead of asking students about their rotten teeth, we brought in a dentist to examine their teeth? This might reduce measurement error. If measurement error is non-differential or random, it tends to reduce the magnitude of an association, which in turn increases the \(P\)-value.

  • Could there be an element of reverse causation that biases the result toward the null? Because this was a cross-sectional study, rather than a cohort study that separates exposure from outcome in time, it is possible that students who had extractions were advised by their dentist to reduce their sugar intake. This would weaken the observed association relative to the true one.

  • Perhaps there is confounding from other variables. For example, other sugary foods may be more important causes of rotten teeth than sugary drinks. We have not adjusted for this in our analysis.
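To see the first point about sample size, the sketch below keeps the proportions of a hypothetical study fixed (so the RR stays at 2.5) and multiplies every count by 2, 4 and 8. It uses a log-scale Wald test, which is an assumption; iNZight may compute its P-values differently. The RR does not change, but the P-value shrinks as the study grows.

```python
import math
from statistics import NormalDist


def rr_p_value(a, n1, c, n0):
    """Relative risk and two-sided Wald P-value for H0: RR = 1 (log scale)."""
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    z = math.log(rr) / se_log
    return rr, 2 * (1 - NormalDist().cdf(abs(z)))


# HYPOTHETICAL study, then the same proportions in studies 2, 4 and 8 times larger
results = []
for k in (1, 2, 4, 8):
    rr, p = rr_p_value(5 * k, 10 * k, 4 * k, 20 * k)
    results.append((k, rr, p))
    print(f"size x{k}: RR = {rr:.2f}, P = {p:.4f}")
```

The smallest version of this study would be "not statistically significant" and the largest "highly significant", even though the strength of association is identical.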

To distinguish between these possibilities, it would be important to look at other studies and to examine the Bradford-Hill criteria for causation for the sugar-rotten teeth hypothesis. An overview of other evidence on the subject is given here, for those who are interested.


Electric tooth brushes and rotten teeth

Describe the nature of the association between these variables.

What explanation could there be for this association? Think about both random and systematic error.

Imagine you are working for a dental health service. Would you recommend the service invest in providing electric toothbrushes to its population to improve oral health based on this evidence?


Rotten teeth and visiting hospital

One could argue that having had a filling or extraction in the last year (Filling_extraction_last_year) is a better indicator of sugar intake than self-reported sugary drink intake.

In order to examine the relationship between sugar and hospital visits, we will consider the relationship between Filling_extraction_last_year and Hospital_24_hours_last_year.

Interpret the plot and inference information. Remember to check Epidemiology Options.
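For a two-by-two table such as Filling_extraction_last_year by Hospital_24_hours_last_year, the association can be summarised as a risk ratio or an odds ratio. A minimal Python sketch of both calculations follows; the counts are hypothetical. The odds ratio is further from 1 than the risk ratio here, and the two are close only when the outcome is rare.

```python
def two_by_two(a, b, c, d):
    """Risk ratio and odds ratio from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    risk_ratio = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return risk_ratio, odds_ratio


# HYPOTHETICAL counts. Exposure: filling/extraction last year (yes/no);
# outcome: hospital visit last year (yes/no).
rr, or_ = two_by_two(a=6, b=24, c=5, d=65)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")  # RR = 2.80, OR = 3.25
```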


Do people who eat more sugar end up being taller?

Select the two-category sugar intake variable and Height_cm.

Interpret both the plot and the inference output.

What is another biological cause of Height_cm that we may wish to account for?

Subset by Gender

Interpret the plots and the inference information (Select two-sample t-test).
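The two-sample t-test compares the difference in group means with its standard error. The sketch below computes Welch's t statistic (the version that does not assume equal variances) and its approximate degrees of freedom for hypothetical heights; whether iNZight reports Welch's or the pooled-variance version is worth checking in its output. The P-value comes from comparing t with a t distribution on df degrees of freedom, a step this standard-library sketch leaves to iNZight.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# HYPOTHETICAL heights (cm) for one gender, split by the two sugar-intake groups
group_1 = [170, 175, 168, 180, 172]
group_2 = [165, 170, 162, 168, 166]
t, df = welch_t(group_1, group_2)
print(f"mean difference = {mean(group_1) - mean(group_2):.1f} cm, t = {t:.2f}, df = {df:.1f}")
```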

How could you get iNZight to report a risk ratio or odds ratio for this association?

Is the P-value likely to be higher or lower than that for the t-test?

Why is this?
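One way to get a risk ratio or odds ratio is to turn Height_cm into a two-category variable, for example "tall" versus "not tall" at some cut-point. The Python sketch below does this with hypothetical heights and a hypothetical cut-point of 170 cm. Because every height above the cut-point is treated the same, making the tallest student 19 cm taller changes the mean difference but leaves the risk ratio untouched; this loss of information is why dichotomising a continuous variable usually reduces statistical power.

```python
from statistics import mean


def dichotomise(values, cut):
    """Code each value as 1 if it is at or above the cut-point ('tall'), else 0."""
    return [1 if v >= cut else 0 for v in values]


def risk_ratio_tall(x, y, cut):
    """Proportion 'tall' in group x divided by the proportion 'tall' in group y."""
    return mean(dichotomise(x, cut)) / mean(dichotomise(y, cut))


CUT_CM = 170  # HYPOTHETICAL cut-point
group_1 = [170, 175, 168, 180, 172]  # HYPOTHETICAL heights (cm)
group_2 = [165, 170, 162, 168, 166]
print(risk_ratio_tall(group_1, group_2, CUT_CM))  # 4.0

# Make the tallest student in group 1 much taller: the mean difference grows,
# but the 2x2 table (and therefore the risk ratio) does not change at all.
group_1_taller = [170, 175, 168, 199, 172]
print(f"mean difference = {mean(group_1_taller) - mean(group_2):.1f} cm, "
      f"RR = {risk_ratio_tall(group_1_taller, group_2, CUT_CM):.1f}")
```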


In your own time (practice)!

Investigate the evidence that height differs by gender

  • What is the distribution of the height variable?

(Ans: symmetric - this is important for choosing the statistical test).

  • What is the mean difference between men and women in the class? Use the summary tab.

(Ans: 10.0 cm)

  • Is this difference statistically significant? You’ll need to use the inference tab and check the ANOVA box.

(Ans: P < 0.001 for the Male-Female comparison, so yes, it is statistically significant).
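Besides looking at the plot, a quick numerical check of symmetry is to compare the mean with the median and to compute the sample skewness, which is close to 0 for symmetric data and positive when there is a long right tail. A Python sketch with hypothetical heights:

```python
from statistics import mean, median


def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5, using central moments about the mean."""
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


# HYPOTHETICAL heights (cm), not the class data
symmetric = [160, 165, 170, 175, 180]
right_skewed = [160, 161, 162, 163, 190]
for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data)}, "
          f"skewness = {skewness(data):.2f}")
```

In the right-skewed example the single 190 cm value pulls the mean above the median, which is the warning sign to look for before choosing a t-test.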


Investigate the evidence that height differs by ethnicity

I suggest collapsing “Other”, “New Zealand European” and “Middle Eastern” into one category. You will also need to select ANOVA in the inference tab. We will cover ANOVA later, but the P-values, means and 95% confidence intervals can be interpreted in the same way as a t-test. Which ethnicity is tallest? Which is shortest? Are any of the differences statistically significant?

(Ans: tallest ethnic group is NZ European and Other (mean = 169.2 cm). Shortest is Other Asian (mean = 161.7 cm). This was the only statistically significant pairwise difference - \(P\) = 0.0055.)
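The collapsing step and the ANOVA F statistic can be sketched in Python. The records are hypothetical and, to keep the example small, only two groups remain after collapsing. With exactly two groups, the one-way ANOVA F statistic equals the square of the pooled-variance t statistic, which is one reason ANOVA output can be read like a t-test. The P-value step (comparing F with an F distribution) is left to iNZight.

```python
from statistics import mean

# Collapse three ethnicity labels into one group, as suggested in the tutorial.
COLLAPSE = {
    "Other": "NZ European and Other",
    "New Zealand European": "NZ European and Other",
    "Middle Eastern": "NZ European and Other",
}

# HYPOTHETICAL (ethnicity, height in cm) records, not the class data
records = [
    ("New Zealand European", 170), ("New Zealand European", 174),
    ("Other", 168), ("Middle Eastern", 172),
    ("Other Asian", 160), ("Other Asian", 163), ("Other Asian", 162),
]


def group_heights(records, mapping):
    """Group heights by (collapsed) ethnicity; unmapped labels are kept as they are."""
    groups = {}
    for ethnicity, height in records:
        groups.setdefault(mapping.get(ethnicity, ethnicity), []).append(height)
    return groups


def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [v for g in groups.values() for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


groups = group_heights(records, COLLAPSE)
for name, heights in groups.items():
    print(f"{name}: n = {len(heights)}, mean = {mean(heights):.1f} cm")
print(f"F = {one_way_anova_f(groups):.2f}")
```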