This document is a summary of rules, kept for my own reference, from the book Statistical Rules of Thumb by Gerald van Belle. Most of the notes are taken verbatim. Please refer to the book for the detailed discussion, without which the notes may be meaningless to the uninitiated reader.

Any statistical treatment must address the following questions:

- What is the question?
- Can it be measured?
- When, where, and how will you get the data?
- What do you think the data are telling you?

Observation is selection

Replicate to characterize random variation

Variability occurs at multiple levels

Invalid selection is the primary threat to valid inference

Compared with experimental studies, observational studies provide less robust information

Make a sharp distinction between observational and experimental studies

Always look for a physical model underlying the data being analyzed. Assume that a statistical model, such as a linear model, is only a good starting point

Keep models as simple as possible, but no simpler

Be sure to understand the components and purpose of an omnibus quantity

Do not multiply probabilities more than necessary. Probabilities are bounded by 1; multiplication of enough probabilities will always lead to a small number

The use of one-sided p-values is discouraged. Ordinarily, use two-sided p-values

When designing experiments or observational studies, focus on p-values to calculate sample size; when reporting results, focus on confidence intervals

Use at least 12 observations in constructing a confidence interval

For samples of size \(\geq\) 20, a point estimate \(\pm\) 2 standard errors has roughly 95% coverage for a wide variety of distributions
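
A minimal sketch of the rule above, assuming the point estimate is a sample mean and the standard error is \(s/\sqrt{n}\). The function name and the data are made up for illustration.

```python
import math
import statistics

def two_se_interval(values):
    """Return (estimate, lower, upper) using mean +/- 2 standard errors."""
    n = len(values)
    if n < 20:
        raise ValueError("the rule of thumb is stated for samples of 20 or more")
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean, mean - 2 * se, mean + 2 * se

# Hypothetical data: 20 measurements invented for this example.
data = [float(v) for v in range(1, 21)]
estimate, lower, upper = two_se_interval(data)
print(f"{estimate:.2f} ({lower:.2f}, {upper:.2f})")
```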

Always know what the unit of a variable is

Do not let scale of measurement rigidly determine method of analysis

The practical applied statistician uses methods by all three schools (Neyman-Pearson, Likelihood, Bayesian) as appropriate

The basic formula (Lehr’s equation) for sample size is \[ n = 16/\Delta^2\] where \[ \Delta = \frac{\mu_0 - \mu_1}{\sigma} = \frac{\delta}{\sigma}\] is the standardized difference. In the single sample case (where a single sample is compared to a known population value), the numerator is 8 instead of 16
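
A small sketch of Lehr's equation as stated above. The function name is hypothetical; the approximation is usually quoted for a two-sided \(\alpha\) = 0.05 and power = 0.8, and the result is rounded up in practice.

```python
import math

def lehr_n(delta, sigma, one_sample=False):
    """Lehr's approximation: n = 16 / Delta^2 (8 / Delta^2 for one sample)."""
    if sigma <= 0 or delta == 0:
        raise ValueError("sigma must be positive and delta non-zero")
    standardized = delta / sigma          # Delta = delta / sigma
    numerator = 8 if one_sample else 16
    return numerator / standardized ** 2

# Hypothetical inputs: detect a difference of 5 units when sigma is 10.
per_group = lehr_n(delta=5, sigma=10)
single = lehr_n(delta=5, sigma=10, one_sample=True)
print(math.ceil(per_group), math.ceil(single))  # prints: 64 32
```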

The sample size using the coefficient of variation (CV) is given by \[n = \frac{16(CV)^2}{(\ln\mu_0-\ln\mu_1)^2}\]
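
The CV formula can be sketched the same way. The function name and the example values (CV of 30%, means of 100 and 120) are assumptions chosen for illustration.

```python
import math

def cv_sample_size(cv, mu0, mu1):
    """n per group from the coefficient of variation and two positive means."""
    if cv <= 0 or mu0 <= 0 or mu1 <= 0 or mu0 == mu1:
        raise ValueError("need cv > 0 and distinct positive means")
    log_ratio = math.log(mu0) - math.log(mu1)
    return 16 * cv ** 2 / log_ratio ** 2

n = cv_sample_size(cv=0.3, mu0=100, mu1=120)
print(math.ceil(n))  # round up to a whole number of observations
```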

Finite population size correction can be ignored in initial discussions of survey sample size questions

The range of the observations is related to the standard deviation as follows: \[ \frac{\text{range}}{\sqrt{2(n-1)}} \leq s \leq \frac{n}{n-1}\frac{\text{range}}{2}\]
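
A quick check of these bounds on a small data set. The helper name and the six values are hypothetical.

```python
import math
import statistics

def sd_bounds_from_range(data_range, n):
    """Lower and upper bounds on the sample SD implied by the range."""
    if n < 2 or data_range < 0:
        raise ValueError("need n >= 2 and a non-negative range")
    lower = data_range / math.sqrt(2 * (n - 1))
    upper = n / (n - 1) * data_range / 2
    return lower, upper

values = [2, 4, 4, 5, 7, 9]  # made-up observations
lower, upper = sd_bounds_from_range(max(values) - min(values), len(values))
s = statistics.stdev(values)
print(f"{lower:.2f} <= {s:.2f} <= {upper:.2f}")
```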

Do not formulate objectives for a study solely in terms of effect size

Confidence intervals associated with statistics for two variables can overlap as much as 29% and the statistics can still be significantly different

If \(\theta_1\) and \(\theta_2\) are the means of two Poisson-distributed populations, then the required number of observations per sample is \[ n = \frac{4}{(\sqrt{\theta_1}-\sqrt{\theta_2})^2}\]

The sample size calculation for a Poisson distribution with background rate \(\theta^*\) is given by \[n = \frac{4}{(\sqrt{\theta^* + \theta_1}-\sqrt{\theta^* + \theta_2})^2}\]
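
Both Poisson formulas fit in one sketch, since a background rate of zero reduces the second to the first. The function name and the rates used are assumptions for illustration.

```python
import math

def poisson_n(theta1, theta2, background=0.0):
    """Observations per sample to separate two Poisson means."""
    if min(theta1, theta2) < 0 or background < 0 or theta1 == theta2:
        raise ValueError("need distinct non-negative means and background")
    diff = math.sqrt(background + theta1) - math.sqrt(background + theta2)
    return 4 / diff ** 2

no_background = poisson_n(1, 4)
with_background = poisson_n(1, 4, background=5)
print(f"{no_background:.1f} {with_background:.1f}")
```

Adding a background rate shrinks the difference between the square roots, so the required sample size goes up.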

The sample size calculation for a binomial distribution is given by \[ n = \frac{16\bar{\pi}(1-\bar{\pi})}{(\pi_0 - \pi_1)^2} \] where \[\bar{\pi}=\frac{\pi_0 + \pi_1}{2}\]
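
A sketch of the binomial version, with a hypothetical function name and made-up proportions of 0.3 and 0.5.

```python
def binomial_n(pi0, pi1):
    """n per group for comparing two proportions."""
    if not (0 < pi0 < 1 and 0 < pi1 < 1) or pi0 == pi1:
        raise ValueError("need distinct proportions strictly between 0 and 1")
    pi_bar = (pi0 + pi1) / 2              # average of the two proportions
    return 16 * pi_bar * (1 - pi_bar) / (pi0 - pi1) ** 2

n = binomial_n(0.3, 0.5)
print(f"{n:.1f}")  # round up in practice
```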

For unequal sample sizes where one group contains \(n_0\) samples and the other group contains \(kn_0\) samples, choose \(k\) such that \[k = \frac{n}{2n_0-n}\] to get the same precision as having \(n\) samples in each group
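
The ratio \(k\) follows from equating the variance of the difference in the two designs, assuming a common \(\sigma\): \(1/n_0 + 1/(kn_0) = 2/n\), which solves to \(k = n/(2n_0 - n)\), where \(n\) is the per-group size of the equal-allocation design. A sketch with hypothetical names and numbers:

```python
def allocation_ratio(n_equal, n0):
    """k such that groups of n0 and k*n0 match two groups of n_equal."""
    if n0 <= n_equal / 2:
        raise ValueError("n0 must exceed n_equal / 2 for a solution to exist")
    return n_equal / (2 * n0 - n_equal)

n_equal, n0 = 20, 15        # made-up design: only 15 subjects in one group
k = allocation_ratio(n_equal, n0)
print(k, k * n0)            # ratio and size of the larger group

# Both designs give the same variance factor for the difference in means.
unequal = 1 / n0 + 1 / (k * n0)
equal = 2 / n_equal
```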

When there are different costs associated with each sample, choose a sample size that is inversely proportional to the square root of the cost of the observations
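
One way to apply this rule, assuming a fixed total budget (the budget, the function name, and the costs are assumptions, not from the book): make each \(n_i\) proportional to \(1/\sqrt{c_i}\) and scale so the budget is spent exactly.

```python
import math

def allocate_by_cost(costs, budget):
    """Sample sizes proportional to 1/sqrt(cost) that spend the budget."""
    if budget <= 0 or any(c <= 0 for c in costs):
        raise ValueError("budget and costs must be positive")
    total_root = sum(math.sqrt(c) for c in costs)
    return [budget / (math.sqrt(c) * total_root) for c in costs]

costs = [1.0, 4.0]                      # hypothetical cost per observation
sizes = allocate_by_cost(costs, budget=600.0)
spent = sum(n * c for n, c in zip(sizes, costs))
print(sizes, spent)
```

With a fourfold cost difference, the cheaper group gets twice as many observations, not four times as many.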

Given no observed events in \(n\) trials, the 95% upper bound on the rate of occurrence is \(3/n\)
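
The \(3/n\) bound can be compared with the exact one-sided bound, which solves \((1-p)^n = 0.05\) for \(p\). A minimal sketch, with hypothetical function names:

```python
def rule_of_three(n):
    """Approximate 95% upper bound on the event rate after n event-free trials."""
    return 3 / n

def exact_upper_bound(n, confidence=0.95):
    """Exact bound: the p for which n event-free trials has probability 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)

for n in (10, 100, 1000):
    print(n, round(rule_of_three(n), 5), round(exact_upper_bound(n), 5))
```

The approximation is slightly conservative and tightens as \(n\) grows.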

Sample size calculations should be based on the statistics used in the analysis of the data

The model for an observational study is the sample survey

A large sample size does not guarantee validity

Good observational studies are designed

To establish cause and effect requires longitudinal data

Make theories elaborate. Consider many alternative explanations for the observed effect

The Hill guidelines are useful in determining causation

Sensitivity analyses assess model uncertainty and missing data

Before choosing a measure of covariation, determine the source of the data, the nature of variables, and the symmetry status of the measure

Do not summarize regression sampling schemes with correlation

Do not correlate rates or ratios indiscriminately

To determine the appropriate sample size to estimate a population correlation \(\rho\), use the following \(\Delta\) in Rule 1 of sample size (Lehr's equation):

\[\Delta=\frac{1}{2}\ln\frac{1+\rho}{1-\rho}\]
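
This \(\Delta\) is the Fisher z-transform of \(\rho\), which equals `math.atanh(rho)`. The sketch below plugs it into \(n = 16/\Delta^2\), taking the note at face value. The numerator is left as a parameter because which numerator from the first sample size rule (16 or 8) the book intends here is an assumption on my part.

```python
import math

def fisher_delta(rho):
    """Delta = 0.5 * ln((1 + rho) / (1 - rho)), the Fisher z-transform."""
    if not -1 < rho < 1 or rho == 0:
        raise ValueError("rho must be non-zero and strictly between -1 and 1")
    return 0.5 * math.log((1 + rho) / (1 - rho))

def correlation_n(rho, numerator=16):
    """Sample size from n = numerator / Delta^2 (numerator is an assumption)."""
    return numerator / fisher_delta(rho) ** 2

delta = fisher_delta(0.5)
n = correlation_n(0.5)
print(f"{delta:.4f} {math.ceil(n)}")
```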

Do not pair unless the correlation between the pairs is \(>\) 0.5

Go beyond correlation in drawing conclusions, particularly in instances where location and scale are relevant

Assess agreement in terms of accuracy, scale differential, and precision

Assess test reliability by means of agreement

The range of the predictor variable determines the precision of the regression

In measuring change, width (i.e. spacing of the observations) is more important than the number of observations

Begin with the lognormal distribution in environmental studies

Differences are more symmetrical

Know the sample space for statements of risk

Beware of pseudo-replication (Hurlbert 1984)

Always consider alternatives to simple random sampling for a potential increase in efficiency, lower costs, and validity

In assessing the importance of an effect, consider the size of the population to which it applies

Models estimating small effects in large populations are particularly sensitive to assumptions. Extensive sensitivity studies are needed in such cases to validate the model

In assessing variation, distinguish between variability and uncertainty

In using a database, first look at the metadata, then look at the data

Always assess the statistical basis for an environmental standard

How a pollutant is measured plays a key role in identification, regulation, enforcement and remediation

Parametric analyses make maximum use of the data

Distinguish between confidence, prediction, and tolerance intervals (Vardeman 1992)

Risk assessment is divided into 5 areas - hazard identification, dose-response evaluation, exposure assessment, risk characterization, and risk management. Statistics plays an important role in the first 4. The last involves policy based on the first 4

Exposure and disease are usually widely separated both in space and time. Retrospective assessment of exposure is very difficult - particularly if the causes and mechanisms are poorly understood

Calibration involves inverse regression, and the error associated with the regression must be assessed

Start with the Poisson distribution to model disease incidence or prevalence

For a rare disease, the odds ratio approximates the relative risk
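
A small numeric illustration with made-up cohort counts: the odds ratio tracks the relative risk when the disease is rare and drifts away from it when the disease is common.

```python
def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

def odds_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    odds_exp = cases_exp / (n_exp - cases_exp)
    odds_unexp = cases_unexp / (n_unexp - cases_unexp)
    return odds_exp / odds_unexp

rare = (20, 10_000, 10, 10_000)      # hypothetical rare disease
common = (400, 1_000, 200, 1_000)    # hypothetical common disease
for label, table in (("rare", rare), ("common", common)):
    print(label, round(relative_risk(*table), 3), round(odds_ratio(*table), 3))
```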

To detect a relative risk \(R\) in a rare disease cohort study, the number of exposed subjects (or unexposed subjects) \(n\) for \(\alpha = 0.05\) and power = 0.8 is given by \[ n = \frac{4}{\pi_0(\sqrt{R}-1)^2} \] where \(\pi_0\) is the probability of the disease in the unexposed population and \(R > 1\) is the relative risk
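
This is the Poisson sample size formula with means \(\pi_0\) and \(R\pi_0\), which gives \(n = 4/(\pi_0(\sqrt{R}-1)^2)\). A sketch with a hypothetical function name and inputs:

```python
import math

def cohort_n_rare_disease(pi0, relative_risk):
    """Subjects per group to detect a relative risk R > 1 in a rare disease."""
    if not 0 < pi0 < 1 or relative_risk <= 1:
        raise ValueError("need 0 < pi0 < 1 and relative risk > 1")
    return 4 / (pi0 * (math.sqrt(relative_risk) - 1) ** 2)

# Made-up example: 1% baseline risk, relative risk of 4.
n = cohort_n_rare_disease(pi0=0.01, relative_risk=4)
print(round(n))
```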

The estimate of sample size per group in a cohort study, based on the logarithm of the relative risk \(R\), is given by \[ n= \frac{8(R+1)/R}{\pi_0(\ln R)^2}\] for \(\alpha = 0.05\), power = 0.8 and a two-sided alternative
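
The log relative risk version, sketched with a hypothetical function name and the made-up inputs \(\pi_0\) = 0.01 and \(R\) = 4:

```python
import math

def cohort_n_log_rr(pi0, relative_risk):
    """Per-group n based on the logarithm of the relative risk."""
    if not 0 < pi0 < 1 or relative_risk <= 0 or relative_risk == 1:
        raise ValueError("need 0 < pi0 < 1 and a positive relative risk != 1")
    numerator = 8 * (relative_risk + 1) / relative_risk
    return numerator / (pi0 * math.log(relative_risk) ** 2)

n = cohort_n_log_rr(pi0=0.01, relative_risk=4)
print(math.ceil(n))
```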

Take no more than 4 to 5 controls per case

In logistic regression situations, about 10 events per variable are necessary in order to get reasonably stable estimates of the regression coefficients

Begin with the exponential distribution to model time to event

Begin with two exponentials for comparing survival times

Be wary of surrogates. Accept substitutes warily

In rare diseases, the prevalence dominates the predictive value of a positive test
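
Bayes' rule makes the point concrete. In this sketch the test characteristics (99% sensitivity and specificity) and the prevalences are made up for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prevalence in (0.001, 0.01, 0.1):
    ppv = positive_predictive_value(0.99, 0.99, prevalence)
    print(prevalence, round(ppv, 3))
```

Even with a very accurate test, most positives are false positives when the disease is rare.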

Do not dichotomize unless absolutely necessary

Select an additive or multiplicative model according to the following order: theoretical justification, practical implication, and computer implementation

There are three hierarchies of evidence, each of which depends on the question asked and on the population of interest

The distinction between patient-oriented evidence that matters (POEM) and disease-oriented evidence (DOE) is almost completely the difference between a clinically relevant endpoint and a surrogate endpoint

In comparing two treatment regimens with binary outcome, start with absolute risk reduction

Number needed to treat (NNT) is a very useful clinical statistic but must be handled with care
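
A sketch of the two quantities, using the standard definitions (absolute risk reduction is control risk minus treated risk; NNT is its reciprocal). The risks are made up. The guard shows one reason NNT needs care: it is undefined when the risk reduction is zero.

```python
def absolute_risk_reduction(control_risk, treated_risk):
    """Difference in event risk between control and treated groups."""
    return control_risk - treated_risk

def number_needed_to_treat(control_risk, treated_risk):
    """Reciprocal of the absolute risk reduction."""
    arr = absolute_risk_reduction(control_risk, treated_risk)
    if arr == 0:
        raise ValueError("NNT is undefined when the risk reduction is zero")
    return 1 / arr

arr = absolute_risk_reduction(0.20, 0.15)   # hypothetical event risks
nnt = number_needed_to_treat(0.20, 0.15)
print(f"ARR = {arr:.2f}, NNT = {nnt:.1f}")
```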

Variability in treatment effect must always be considered over and above the average effect

Evidence for safety is limited

Intent to treat (ITT) is the default strategy for analysis

In EBM, it is more useful to discuss information about the prior rather than the prior

The four key questions for meta-analysis are the same as those in rule 1 of *Basics*

Randomization puts systematic sources of variability into the error term

Blocking is the key to reducing variability

Factorial design should be used to assess the joint effects of variables

Higher order effects occur rarely. Therefore it is not necessary to design experiments to incorporate higher order effects

Aim for balance in the design of a study

Analysis should follow design

Assess independence, equal variance, and normality in that order

For every analysis, there is an appropriate graphical display

Distinguish between design structure and treatment structure of a study

Plan to do a hierarchical analysis of treatment effects by including all lower order effects associated with a higher order effect

Distinguish between nested and crossed design. The analysis will be quite different

Plan for missing data

Develop a strategy for dealing with multiple comparisons before starting a study

Know what properties a transformation preserves or does not preserve

Think of bootstrapping instead of the delta method in estimating complex relationships
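
As an illustration, a nonparametric bootstrap standard error for a ratio of means, a statistic whose delta-method variance takes some algebra. This is a minimal sketch: the paired data, the function names, and the choice of 2000 resamples are all assumptions.

```python
import random
import statistics

def bootstrap_se(pairs, statistic, n_boot=2000, seed=0):
    """Standard deviation of the statistic over resamples drawn with replacement."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    n = len(pairs)
    estimates = []
    for _ in range(n_boot):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

def ratio_of_means(pairs):
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return statistics.fmean(xs) / statistics.fmean(ys)

# Hypothetical paired measurements (x, y), all positive.
pairs = [(12.1, 4.0), (9.8, 3.1), (14.2, 5.2), (11.0, 3.9),
         (10.5, 3.3), (13.3, 4.8), (8.9, 2.7), (12.7, 4.4)]
ratio = ratio_of_means(pairs)
se = bootstrap_se(pairs, ratio_of_means)
print(f"ratio = {ratio:.3f}, bootstrap SE = {se:.3f}")
```

Resampling whole pairs keeps the dependence between \(x\) and \(y\) intact.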

Agresti, Alan. An Introduction to Categorical Data Analysis. Wiley-Interscience, 2007.

Rosenbaum, Paul R. Observational Studies, 2nd edition. Springer New York, 2002.

Cohen, Jacob. Statistical Power Analysis for the Behavioral Sciences, 2nd edition. Routledge, 1988.

Cameron, Colin and Trivedi, Pravin. Regression Analysis of Count Data, 2nd edition. Cambridge University Press, 2013.

Marcus-Roberts, Roberts. Meaningless Statistics. Journal of Educational Statistics, 1987.

Hill, A. B. The Environment and Disease: Association or Causation? Presidential Address to the Section of Occupational Medicine of the Royal Society of Medicine, 1965.

Vardeman, S. B. What about the other intervals? The American Statistician, Vol. 46, No. 3, Aug 1992, pp. 193-197.

Hurlbert, S. H. Pseudoreplication and the design of ecological field experiments. Ecological Monographs, 1984, pp. 187-211.

Malinas, Gary and Bigelow, John. Simpson’s Paradox. The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), Edward N. Zalta (ed.).

Sandman, Peter. Mass Media and Environmental Risk: 7 Principles. 1997.