1 Introduction

Knowledge of the frequency of disease is an important prerequisite for understanding population health, case finding, commissioning and planning services, and understanding variation in health and care. The most commonly used measure of disease frequency is prevalence, which is an estimate of the number of cases of a given disease or risk factor in the population at a point in time (point prevalence) or over a given time period (period prevalence).

This short briefing summarizes the methods and results of work commissioned by Public Health England from Imperial College to estimate disease prevalences for:

  • cardiovascular disease
    • stroke
    • coronary heart disease (CHD)
    • peripheral arterial disease (PAD)
  • depression
  • chronic obstructive pulmonary disease (COPD)
  • hypertension (BP)

Work to calculate heart failure estimates is ongoing and national asthma age-sex prevalences have been produced.

These estimates are intended to:

  • update previously calculated values which are now several years out of date
  • complement recent estimates for

Estimates for hypertension produced as part of this analysis have been published here.

They have been produced for use by PHE, local authorities and the NHS.

1.1 Where can I get the data?

We will be publishing the general practice level estimates as part of an update of the national general practice profiles, and as part of a new prevalence tool which pulls together all the prevalence estimates currently in Fingertips.

We intend to publish the local authority estimates through the PHE Fingertips tool but the data is currently available from here.

1.2 Why estimate prevalence?

With the exception of cancer, complete population disease registers do not exist, so we cannot enumerate the number of cases of any given disease in the population. The Quality and Outcomes Framework (QOF) produces general practice estimates of common diseases but these are not published with age breakdowns, and are based on coded events in the GP record, not externally validated cases.

For this reason it is common practice to ‘fill in the gaps’ by creating estimates of disease prevalence - these are often referred to as modelled or synthetic estimates.

1.3 How were the estimates created?

There are a number of ways of creating prevalence estimates. The estimates outlined in this briefing were generated by statistical modelling. The details of the method vary depending on the disease and the data sources but the basic principles are:

  1. Identify population-based studies (cohort studies or surveys) with accurately measured cases of disease and predictor variables, for example age, sex and ethnicity. Classify these cases as disease positive or negative.

  2. Construct the model, based on the data in the study, that best predicts the presence or absence of disease. This gives a formula or algorithm which can then be applied to a population. The usual model is logistic regression.

  3. Assemble the data for local areas (target populations) to populate the algorithm for each predictor variable.

  4. For each population run the algorithm to produce an estimate.
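Steps 3 and 4 can be illustrated with a minimal sketch. It assumes a logistic regression model has already been fitted in step 2. The coefficient values, predictor names and population counts below are hypothetical and are not taken from the technical reports.

```python
import math

# Hypothetical coefficients from a fitted logistic regression (illustration only).
COEFFICIENTS = {
    "intercept": -4.0,
    "age_65_plus": 1.2,
    "male": 0.3,
    "most_deprived": 0.5,
}


def predicted_probability(predictors):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    linear = COEFFICIENTS["intercept"]
    for name, value in predictors.items():
        linear += COEFFICIENTS[name] * value
    return 1.0 / (1.0 + math.exp(-linear))


def estimate_prevalence(strata):
    """Expected cases divided by total population across all strata."""
    expected_cases = sum(
        predicted_probability(s["predictors"]) * s["population"] for s in strata
    )
    total_population = sum(s["population"] for s in strata)
    return expected_cases / total_population


# Step 3: hypothetical population counts for one local area, by predictor combination.
local_area = [
    {"predictors": {"age_65_plus": 0, "male": 0, "most_deprived": 0}, "population": 4000},
    {"predictors": {"age_65_plus": 0, "male": 1, "most_deprived": 1}, "population": 3000},
    {"predictors": {"age_65_plus": 1, "male": 0, "most_deprived": 0}, "population": 2000},
    {"predictors": {"age_65_plus": 1, "male": 1, "most_deprived": 1}, "population": 1000},
]

# Step 4: run the algorithm for the area.
prevalence = estimate_prevalence(local_area)
print(f"Estimated prevalence: {100 * prevalence:.2f}%")
```

The area estimate is a population-weighted average of the stratum probabilities, so it always lies between the lowest and highest predicted probability.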

Detailed technical reports for each disease are available from here. Where possible, the model development follows the TRIPOD guidelines for good practice for multivariable prediction models.

1.3.1 Model summaries

Table x summarises the methods, model inputs, data sources used and model accuracy.

1.4 How do these estimates differ from previous estimates?

These modelled values differ from previous estimates in a number of ways:

  • Different input data sources
  • Different age bands
  • Wider range of predictor variables
  • Inclusion of confidence intervals
  • Estimates produced for GP practices and Local Authority populations.

These are new estimates and cannot necessarily be directly compared with previously calculated values. The main differences are:

  1. Hypertension - these estimates are broadly comparable and have similar underlying methodology
  2. COPD - the methodology is different. The previous estimates using the Health Survey for England were based on spirometry-defined COPD. This definition changed over successive Health Surveys in which spirometry was done. These estimates are based on clinical case definitions using GP data in the Clinical Practice Research Datalink (CPRD) dataset, enhanced with HES data.
  3. Cardiovascular disease - the new estimates have been developed using the Whitehall II studies with a case definition based on clinical endpoints for diagnosing CVD such as mortality, angiography and other clinical confirmation. The use of the Whitehall studies also means that the new estimates are restricted to the 55-79 age group. The previous estimates modelled from the Health Survey for England were based on self-reported doctor-diagnosed heart disease and stroke.
  4. Depression - these estimates are new. They are a longer period prevalence than QOF and tend to be higher.

The models are also more technically sophisticated and have a wider range of input variables (see next section).

A more detailed comparison between new and old estimates is presented in section…

1.5 Which predictor variables are used to model which disease?

This is summarised in the tables and figures below. All models include age, sex, ethnicity and deprivation as predictors. Smoking is included in the COPD and cardiovascular disease models. The cardiovascular models also include physical activity, diabetes and chronic kidney disease. BMI is included in all the models except COPD. The depression and hypertension models include occupation and long term limiting illness. Hypertension is the only model to include education.
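The paragraph above can also be written down as a simple lookup. This is a sketch only: it assumes that the three cardiovascular models (stroke, CHD and PAD) share the same predictor set, and the variable names are illustrative rather than those used in the models.

```python
# Predictors common to every model, as stated in the text.
BASE = {"age", "sex", "ethnicity", "deprivation"}

# Assumption: stroke, CHD and PAD all use the same cardiovascular predictor set.
CARDIOVASCULAR = BASE | {
    "smoking",
    "physical activity",
    "diabetes",
    "chronic kidney disease",
    "BMI",
}

PREDICTORS = {
    "stroke": CARDIOVASCULAR,
    "CHD": CARDIOVASCULAR,
    "PAD": CARDIOVASCULAR,
    "COPD": BASE | {"smoking"},
    "depression": BASE | {"BMI", "occupation", "long term limiting illness"},
    "hypertension": BASE
    | {"BMI", "occupation", "long term limiting illness", "education"},
}


def models_using(predictor):
    """Return the models that include a given predictor, sorted by name."""
    return sorted(model for model, used in PREDICTORS.items() if predictor in used)


print(models_using("smoking"))
print(models_using("education"))
```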

The values of each predictor are for the most part derived from the census, or calculated from surveys. The method used to derive the predicted values of prevalence for each unit of analysis is inverse probability weighting. The full technicalities of the method are beyond the scope of this report but it is a form of direct standardisation (see [here](https://en.wikipedia.org/wiki/Inverse_probability_weighting)).
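As a rough illustration of the weighting idea, the sketch below reweights survey respondents so that their mix of strata matches a local population, then takes the weighted mean of their predicted disease probabilities. It is one simple reading of inverse probability weighting under stated assumptions, not the method documented in the technical reports. The strata, probabilities and population shares are hypothetical.

```python
from collections import Counter

# Hypothetical survey respondents: (stratum, model-predicted probability of disease).
respondents = [
    ("under_65", 0.02),
    ("under_65", 0.03),
    ("under_65", 0.02),
    ("under_65", 0.03),
    ("65_plus", 0.10),
    ("65_plus", 0.12),
]

# Hypothetical share of the local (target) population in each stratum.
target_share = {"under_65": 0.5, "65_plus": 0.5}


def weighted_prevalence(respondents, target_share):
    """Weight each respondent by target share / sample share of their stratum."""
    counts = Counter(stratum for stratum, _ in respondents)
    n = len(respondents)
    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, probability in respondents:
        sample_share = counts[stratum] / n
        # Strata under-represented in the sample receive larger weights.
        weight = target_share[stratum] / sample_share
        weighted_sum += weight * probability
        weight_total += weight
    return weighted_sum / weight_total


unweighted = sum(p for _, p in respondents) / len(respondents)
weighted = weighted_prevalence(respondents, target_share)
print(f"Unweighted: {100 * unweighted:.2f}%  Weighted: {100 * weighted:.2f}%")
```

With these inputs the weighted result equals the directly standardised rate: the mean probability within each stratum multiplied by that stratum's share of the target population, summed over strata.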

2 What do the data show?

This section is split into 2 parts:

  1. Local authority summaries
  2. General practice summaries

2.1 Local authority data

2.1.1 Missing data

Note that there are some missing values for LA estimates for hypertension, COPD and depression which we are working with Imperial College to fill.

2.1.2 Summary statistics

Summary statistics for LA data (estimated prevalence, %)

disease      mean    median  minimum  10th centile  90th centile  maximum
bp           20.65   21.23   12.42    16.75         23.59         27.63
chd           7.92    7.87    6.66     7.15          8.72         10.53
copd          2.61    2.63    1.05     1.79          3.34          4.19
depression   15.19   15.38    8.63    13.14         17.22         18.40
pad           1.15    1.14    0.95     1.04          1.27          1.71
stroke        3.71    3.70    3.04     3.43          4.00          4.34
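The columns in this table can be reproduced for any one disease from its list of local authority estimates. The sketch below uses hypothetical values and assumes the "inclusive" quantile method; the centile method actually used for the table is not stated in this briefing.

```python
import statistics

# Hypothetical local authority prevalence estimates (%) for a single disease.
la_prevalence = [1.8, 2.1, 2.4, 2.6, 2.7, 3.0, 3.3, 4.1]


def summarise(values):
    """Return the six summary statistics reported in the table."""
    # quantiles(n=10) returns the nine cut points between deciles.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "minimum": min(values),
        "10th centile": deciles[0],
        "90th centile": deciles[-1],
        "maximum": max(values),
    }


summary = summarise(la_prevalence)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```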

2.2 Plots

2.3 Maps

2.3.1 CHD

2.3.2 Depression

2.3.3 COPD

2.3.4 Stroke

2.3.5 Hypertension

2.3.6 Peripheral arterial disease

3 GP data

We have been able to estimate prevalence for 60504 practices.

Fig x shows the variation in practice level disease estimates by region. This suggests disease prevalence tends to be lowest in London and highest in the North East and the North West.

3.0.1 Hypertension

The average estimated prevalence of hypertension is 20.85% and varies from 2% in the practice with the lowest value to 50.62% in the practice with the highest.

3.0.2 COPD

The average estimated prevalence of COPD is 2.42% and varies from 0.13% in the practice with the lowest value to 13.6% in the practice with the highest.

3.0.3 Depression

The average estimated prevalence of depression is 20.85% and varies from 2% in the practice with the lowest value to 50.62% in the practice with the highest.

3.0.4 Coronary heart disease

The average estimated prevalence of CHD is 7.37% and varies from 3.81% in the practice with the lowest value to 12.15% in the practice with the highest.

3.0.5 Stroke

The average estimated prevalence of stroke is 3.7% and varies from 2.23% in the practice with the lowest value to 6.02% in the practice with the highest.

3.0.6 Peripheral arterial disease

The average estimated prevalence of PAD is 1% and varies from 0.57% in the practice with the lowest value to 1.82% in the practice with the highest.

3.0.7 Overall comparisons with QOF

The relationship between QOF estimates and these new prevalence estimates can be captured in a number of ways. One way is to plot a correlogram, which shows the correlation between all pairs of variables. This is shown in figure. The correlogram shows correlations both numerically and graphically as filled circles: the darker and larger the circle, the stronger the relationship. Positive correlations are shown in blue and negative correlations in red. Correlations above 0.7 or below -0.7 are generally considered to represent a strong relationship (where 1 or -1 is the strongest possible relationship). From this chart we can see that:

  • For hypertension there is a strong correlation between the prevalence estimates and QOF
  • For other diseases the relationship between modelled value and QOF is weaker
  • There are also strong relationships between modelled values for hypertension and COPD and cardiovascular estimates
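The numbers behind a correlogram are pairwise correlation coefficients. The sketch below computes Pearson correlations between columns of practice-level values and lists the pairs that pass the 0.7 threshold described above. The column names and values are hypothetical, and Pearson correlation is an assumption, since the briefing does not say which coefficient was used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical practice-level prevalence values (%), one entry per practice.
practice_values = {
    "qof_hypertension": [12.0, 14.5, 13.0, 16.0, 11.0],
    "modelled_hypertension": [18.0, 22.0, 20.5, 24.0, 17.5],
    "modelled_depression": [15.0, 14.0, 16.5, 15.5, 14.8],
}


def strong_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    strong = []
    for name_a, name_b in combinations(columns, 2):
        r = pearson(columns[name_a], columns[name_b])
        if abs(r) > threshold:
            strong.append((name_a, name_b, r))
    return strong


for name_a, name_b, r in strong_pairs(practice_values):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```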

3.1 Comparison with previous estimates

Of those practices with a previous estimate, 166 are not included in the updates. These are:


