July 05, 2017

Table of Contents

  1. Introduction/Purpose
  2. THE General Principles
  3. Reporting Statistical Methods
  4. Reporting Statistical Results

1. Introduction/Purpose

This set of slides aims to provide some guidelines for reporting statistical methods and results.

The slides are based on the SAMPL Guidelines for Biomedical Journals by Thomas Lang and Douglas Altman (Lang and Altman 2015).

Reporting statistical methods and results is not as simple as many might think. As a result, these slides are nowhere near a complete description of the process. However, they should provide an idea of what is needed, and lots of keywords that can be used for further research in the area. When it comes down to it, the best approach is to consult your favorite statistician. If you don't have one, I suggest you befriend one ASAP!

In sections 2-4 you will find a bunch of text. Probably too much text. This is a very good indicator of all the things that should be considered when reporting your analyses. This part of the slides can be used as a compass: it won't really get you there, but it'll at least point you in the right direction. From there on, Google is your friend (or your favorite statistician). I'd suggest using this as a tiny "encyclopedia" – don't read it word for word until you need it.

2. THE General Principles

From Lang and Altman (2015):

Number 1:

Describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results.

  • I.e., make sure your analysis is reproducible for someone with the right statistical knowledge, but no prior knowledge about your work.

Number 2:

Provide enough detail that the results can be incorporated into other analyses.

The rest of the slides are divided into two parts: the first (section 3) deals with how to report the details of the statistical methods used, the second (section 4) with how to report the results.

3. Reporting Statistical Methods

Preliminary analyses

  • Study design
    • Describe the study design as accurately as possible
    • Sample size/power calculations (if you're not convinced power calculations are necessary: check this out)
    • Inclusion/Exclusion criteria: if not done properly, excluding subjects could bias results
  • Variable transformations, new variable creation, …
    • log transformed to achieve normality, mean of variables used as explanatory variable, continuous data collapsed into categories, etc.
    • not a bad idea to mention if NO transformations were used (depending on methods used)
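
To make the sample size/power bullet a bit more concrete, here is a minimal sketch of a sample size calculation for comparing two group means. It assumes a two-sided test, equal group sizes, a common standard deviation, and the usual normal approximation; the function name and the numbers (a difference of 5 with an SD of 10) are made up for illustration. Dedicated software typically uses the t distribution and will give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) * sd / delta)^2.
    `delta` is the smallest difference worth detecting, `sd` the assumed
    common standard deviation (both are assumptions you must justify).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical inputs: detect a difference of 5 units when the SD is 10.
n = sample_size_per_group(delta=5.0, sd=10.0)
print(f"Approximately {n} subjects per group")
```

Whatever tool you use, report every input (alpha, power, the assumed SD, and the difference you wanted to detect) so that a reader can redo the calculation.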

Primary analysis

  • Describe the purpose upfront by stating your hypothesis clearly.
    • This makes p-hacking less likely to occur (more on that later)
  • Describe the method used to test your hypothesis in full
    • Again, any transformations used, variables included
  • Include descriptive statistics
    • means and standard deviations
    • if the data are not normally distributed, median and range/IQR (interquartile range)
  • Justify your choice of method
    • Was the model validated by checking the assumptions?
      • Check for normality; if data not normal, use non-parametric test (Wilcoxon instead of t-test, Kruskal-Wallis instead of ANOVA, …)
      • Linear regression: make sure to check for normality, independence, constant variance, linearity
  • Multiple comparisons: adjust p-values for multiple comparisons (see more here, slides 17-18)

4. Reporting Statistical Results

Numbers and Descriptive Statistics

  • Report total sample and group sizes
  • Report numerators and denominators for percentages
    • this makes it possible for others to use your results in their analysis
  • Report appropriate descriptives:
    • for normally distributed data: means and standard deviations
      • report as mean (SD) rather than mean +/- SD, as it minimizes the chance of confusion (mean +/- … is often used for confidence intervals or mean +/- SEM)
    • for non-normal data: median, interquartile range, and range
  • Do NOT use the standard error of the mean (SE) to report variability of the data.
    • SE is an inferential statistic - mean +/- SE is basically a 68% confidence interval for the mean
    • instead, use standard deviation, interquartile range, range, etc.
  • When appropriate, use tables and figures
    • tables should be used to give a clear and exact picture of the data
    • figures should be used to give a clear picture of trends and an overall assessment of the data
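
The descriptives above can be sketched in a few lines using only the Python standard library. The data are made up, and the `describe` helper is hypothetical. Note that quartile definitions differ between software packages, so it is worth stating which one you used. The last lines show why the SE is a poor summary of variability: mean +/- 1 SE covers only about two thirds (roughly 68% under a normal approximation) of the sampling distribution of the mean, and it shrinks as n grows.

```python
from math import sqrt
from statistics import NormalDist, mean, median, quantiles, stdev


def describe(values):
    """Return the descriptive statistics recommended for reporting."""
    n = len(values)
    sd = stdev(values)                  # sample SD (n - 1 in the denominator)
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    return {
        "n": n,
        "mean": mean(values),
        "sd": sd,
        "se": sd / sqrt(n),             # inferential, NOT a measure of spread
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


# Made-up, right-skewed measurements for illustration only.
data = [4.1, 5.0, 5.2, 5.9, 6.3, 7.0, 7.4, 12.8]
d = describe(data)

print(f"n = {d['n']}")
print(f"mean (SD): {d['mean']:.1f} ({d['sd']:.1f})")
print(f"median (IQR; range): {d['median']:.1f} "
      f"({d['q1']:.1f} to {d['q3']:.1f}; {d['min']:.1f} to {d['max']:.1f})")

# Coverage of mean +/- 1 SE under a normal approximation.
coverage = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(f"mean +/- 1 SE covers about {coverage:.0%}")
```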

Hypothesis Tests

  • Clearly state the hypothesis
    • all too often, all I see is a p-value with a comment: "we found a significant difference (p-value < 0.05)"
  • Clearly describe the test used
    • paired or unpaired, one- or two-tailed, etc.
    • don't forget to justify the choice of test
  • Provide descriptive statistics for variables used in the test
  • When applicable, report what is considered to be the 'minimal difference of clinical relevance'
    • what difference is the smallest that's considered to be important?
  • Check that the assumptions are not violated
    • see previous slide for some examples
  • Report the alpha level used to define statistical significance
    • often 0.05

Hypothesis Tests (cont.)

  • Report a measure of precision, preferably 95% confidence intervals, for primary outcomes
    • primary outcomes such as difference in means, agreement between groups, etc.
    • DO NOT use the standard error of the mean (SE), as mean +/- SE is simply a 68% confidence interval, which is rarely useful
  • If p-values are included, report EXACT values (p = 0.22, p = 0.04) instead of inequalities (p > 0.05, p < 0.05)
    • if p-values are smaller than 0.001, reporting as p < 0.001 is often okay
  • Regarding p-values, when applicable, adjust p-values for multiple testing
    • I personally prefer Benjamini-Hochberg's method. See more here starting on slide 17
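
As a rough sketch of the last two points, the snippet below formats p-values the way the bullets above suggest and computes Benjamini-Hochberg adjusted p-values by hand. The function names and the raw p-values are made up for illustration; in practice you would probably rely on your statistical software's built-in adjustment rather than this hand-rolled version.

```python
def format_p(p):
    """Report exact p-values, switching to an inequality only below 0.001."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five tests.
raw = [0.001, 0.008, 0.039, 0.041, 0.22]
adj = benjamini_hochberg(raw)

for p_raw, p_adj in zip(raw, adj):
    print(f"raw {format_p(p_raw)}, adjusted {format_p(p_adj)}")
```

Whichever method you choose, name it in the methods section and say whether the reported p-values are raw or adjusted.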

Regression Analyses/ANOVA

  • Provide descriptive statistics for variables used in the analysis
  • Check that the assumptions are not violated
    • for regression analysis, this often includes checking assumptions about the residuals (normality, independence, linearity, etc.)
  • Specify how the model was validated
    • often closely related to checking the assumptions: if assumptions are met, the model is more likely to be trustworthy
  • Were outliers identified? If yes, how were they dealt with?
  • How did you handle missing values?
  • If only a subset of variables was included, describe how/why you chose that specific subset
  • Report regression coefficients with 95% confidence intervals and p-values
    • ideally in a table
  • If the model is "simple enough", consider providing a visualization of the model
    • for a simple linear regression, this would be a scatter plot with the regression line (and 95% confidence interval)
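
For a simple linear regression, the coefficient and its confidence interval can be sketched as follows. This is a bare-bones illustration with made-up data and a hypothetical function name. Because the Python standard library has no t distribution, the critical value defaults to a normal quantile, which is only reasonable for large samples; for small samples, pass the t quantile with n - 2 degrees of freedom yourself (or, better, use proper statistical software).

```python
from math import sqrt
from statistics import NormalDist, mean


def simple_linear_regression(x, y, level=0.95, crit=None):
    """Least-squares fit of y = intercept + slope * x with a CI for the slope."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual sum of squares and standard error of the slope.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(rss / (n - 2) / sxx)
    if crit is None:
        # Assumption: normal approximation, adequate only for large n.
        crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "ci_low": slope - crit * se_slope,
        "ci_high": slope + crit * se_slope,
    }


# Made-up data; with n = 5 the t quantile for 3 df (3.182, from a t table)
# is passed explicitly instead of relying on the normal approximation.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_linear_regression(x, y, crit=3.182)
print(f"slope = {fit['slope']:.2f} "
      f"(95% CI {fit['ci_low']:.2f} to {fit['ci_high']:.2f})")
```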

References

Lang, Thomas A., and Douglas G. Altman. 2015. “Basic Statistical Reporting for Articles Published in Biomedical Journals: The ‘Statistical Analyses and Methods in the Published Literature’ or the SAMPL Guidelines.” International Journal of Nursing Studies 52 (1): 5–9. doi:10.1016/j.ijnurstu.2014.09.006.