1 Rationale

Two sets of best-practice guidelines have recently been published: the W3C Data on the Web Best Practices (DWBP), for publishing data on the web, and the GATHER statement, for reporting health estimates.

The DWBP aim to make it easier for data providers to make data available and for users to use it. They focus on metadata, which facilitates data discovery in both human-readable (people can find it) and machine-readable (computers can find it) forms; on provenance, so that people know what they are getting; and on licensing, so that people know the conditions under which they can use the data.

This report presents an audit of the reporting of health statistics via Fingertips against the recently published Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER) (Stevens et al. 2016).

1.1 The Fingertips system

Fingertips is a large repository of health indicators and a major vehicle for health reporting by Public Health England. Fingertips is currently organised into:

61 profiles
256 domains, containing
1479 distinct indicators.
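Fingertips nests indicators within domains and domains within profiles. Because the total above is given as distinct indicators, we assume here that the same indicator ID can appear in more than one domain or profile, so the count has to be taken over unique IDs rather than summed across domains. The sketch below illustrates this with a small, entirely hypothetical hierarchy: the profile and domain names and every ID other than 30101 are invented.

```python
# Hypothetical toy hierarchy: profile -> domain -> list of indicator IDs.
# Only 30101 is a real Fingertips indicator ID; everything else is invented.
hierarchy = {
    "Profile A": {
        "Domain A1": [30101, 90001],
        "Domain A2": [90002],
    },
    "Profile B": {
        "Domain B1": [30101, 90003],  # 30101 reused in a second profile
    },
}


def count_levels(tree):
    """Return (profiles, domains, indicator placements, distinct indicators)."""
    n_profiles = len(tree)
    n_domains = sum(len(domains) for domains in tree.values())
    # Every (domain, indicator) pairing, including repeats of the same ID.
    placements = [
        ind_id
        for domains in tree.values()
        for indicators in domains.values()
        for ind_id in indicators
    ]
    return n_profiles, n_domains, len(placements), len(set(placements))


print(count_levels(hierarchy))  # (2, 3, 5, 4)
```

Summing indicators per domain would give 5 here, whereas the number of distinct indicators is 4.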

The structure of Fingertips output is shown in Figure 1.

1.1.1 Figure 1: Structure of Fingertips


The Fingertips system comprises a number of elements:

Fingertips Portal Manager (FPM). Data is uploaded via FPM, which also allows users to add descriptive metadata and control how the data is visualised on the Fingertips web pages. It also allows Fingertips system administrators to add and manage user accounts.
Backend database. Data and metadata are stored in a SQL database (PHOLIO).
Webservices. A number of webservices allow the data to be visualised, searched, output as data files, and read into other systems (for example, for PDF creation).
The Fingertips user interface where data is visualised, searched and downloaded for reuse (i.e. the human readable interface).
The Fingertips Application Programming Interface (API), a machine-readable interface that allows computers to use the data in Fingertips directly and allows developers and programmers to reuse the data. The analysis in this report takes data directly from the API.
The fingertipsR package which interacts with the API to make it easier for users of the R statistical programming software to analyse Fingertips data, and develop outputs, reports and products.
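To illustrate the machine-readable route, the sketch below builds a request for indicator metadata and parses the CSV response. The base URL and the `indicator_metadata/csv/by_indicator_id` endpoint are assumptions taken from the fingertipsR package source at the time of writing and may since have changed; `metadata_url` and `fetch_metadata` are hypothetical helper names. The report's own analysis used R, so this Python version only illustrates the idea.

```python
import csv
import io
import urllib.parse
import urllib.request

# ASSUMED base URL and endpoint (as used by fingertipsR); these may have changed.
BASE_URL = "https://fingertips.phe.org.uk/api/"
ENDPOINT = "indicator_metadata/csv/by_indicator_id"


def metadata_url(indicator_ids):
    """Build the (assumed) metadata request URL for a list of indicator IDs."""
    # urlencode percent-encodes the commas separating IDs as %2C.
    query = urllib.parse.urlencode(
        {"indicator_ids": ",".join(str(i) for i in indicator_ids)}
    )
    return f"{BASE_URL}{ENDPOINT}?{query}"


def fetch_metadata(indicator_ids, timeout=30):
    """Download the metadata CSV and return one dict per indicator (needs network)."""
    with urllib.request.urlopen(metadata_url(indicator_ids), timeout=timeout) as resp:
        text = resp.read().decode("utf-8-sig")  # tolerate a leading byte-order mark
    return list(csv.DictReader(io.StringIO(text)))


print(metadata_url([30101]))
```

Only `metadata_url` is exercised above; `fetch_metadata` needs network access and a live endpoint.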

1.1.2 Indicators in Fingertips

All indicators in Fingertips are accompanied by metadata which in some cases can be extensive. A typical metadata entry in Fingertips (from the Public Health Outcomes Framework) is shown below:

Metadata example (indicator 30101, PHOF 3.01)
Field	Value
Indicator.ID	30101
Indicator	Mortality attributable to particulate air pollution
Indicator.full.name	Fraction of all-cause adult mortality attributable to anthropogenic particulate air pollution (measured as fine particulate matter, PM2.5) (in PHOF 3.01)
Definition	Fraction of annual all-cause adult mortality attributable to anthropogenic (human-made) particulate air pollution (measured as fine particulate matter, PM2.5). Mortality burden associated with long-term exposure to anthropogenic particulate air pollution at current levels, expressed as the percentage of annual deaths from all causes in those aged 30+. PM2.5 means the mass (in micrograms) per cubic metre of air of individual particles with an aerodynamic diameter generally less than 2.5 micrometers. PM2.5 is also known as fine particulate matter.
Rationale	Poor air quality is a significant public health issue. The burden of particulate air pollution in the UK in 2008 was estimated to be equivalent to nearly 29,000 deaths at typical ages and an associated loss of population life of 340,000 life years lost. Inclusion of this indicator in the Public Health Outcomes Framework will enable Directors of Public Health to prioritise action on air quality in their local area to help reduce the health burden from air pollution.
Policy	NA
Data.source	Background annual average PM2.5 concentrations for the year of interest are modelled on a 1km x 1km grid using an air dispersion model, and calibrated using measured concentrations taken from background sites in Defra’s Automatic Urban and Rural Network (http://uk-air.defra.gov.uk/interactive-map). Data on primary emissions from different sources and a combination of measurement data for secondary inorganic aerosol and models for sources not included in the emission inventory (including re-suspension of dusts) are used to estimate the anthropogenic (human-made) component of these concentrations. By approximating LA boundaries to the 1km by 1km grid, and using census population data, population weighted background PM2.5 concentrations for each lower tier LA are calculated. This work is completed under contract to Defra, as a small extension of its obligations under the Ambient Air Quality Directive (2008/50/EC). Concentrations of anthropogenic, rather than total, PM2.5 are used as the basis for this indicator, as burden estimates based on total PM2.5 might give a misleading impression of the scale of the potential influence of policy interventions (COMEAP, 2012).
Indicator.production	DEFRA/Air Pollution and Climate Change Group Public Health England
Indicator.source	NA
Methodology	An increase of 10 ug/m3 in population-weighted annual average background concentration of PM2.5* is assumed to increase all-cause mortality rates by a unit relative risk (RR) factor of 1.06. For a population-weighted modelled annual average anthropogenic background PM2.5 concentration x, RR is calculated as 1.06^(x/10) (COMEAP, 2010). The fraction of deaths attributable to PM2.5 is expressed as a percentage, calculated as 100*(RR-1)/RR. Population-weighted annual average concentrations of anthropogenic PM2.5 were provided by AEA for all lower tier and unitary LAs within England. These were combined to produce figures at upper tier, regional and national level so that attributable fractions can be calculated at those scales also.
Standard.population.values	NA
Confidence.interval.details	An expert elicitation exercise suggested a 75% plausibility interval of 1.01 to 1.12 for the RR=1.06 of mortality associated with 10 ug/m3 annual average PM2.5 (COMEAP, 2009). This means that, in the region of 10 ug/m3, the true attributable fractions are likely to be in the range of 0.175 to 1.89 times the central estimates reported, with these factors changing only trivially over the range of concentrations considered. This plausibility interval reflects the uncertainty in the relationship between ambient PM2.5 concentrations and effects on mortality. Thus, whilst it is important in terms of the size of the absolute effect, it does not reflect uncertainties which would affect comparisons over time or between areas. Because of this, the plausibility intervals are not presented in the indicator. There is no accepted way of fully quantifying the uncertainty associated with modelled concentrations of PM2.5 (see Caveats). While there is considerable uncertainty in the relative risk estimate and hence in the estimate of the fraction of mortality caused by anthropogenic PM2.5, estimates for the local authorities all use the same relative risk, and the variation in the attributable fraction between local authorities is strongly predicted by the variation in PM2.5 concentration in the local authorities. Therefore, the above confidence intervals are not relevant to comparisons between local authorities, and are not presented with this indicator.
Source.of.numerator	NA
Definition.of.numerator	Calculation does not involve a numerator
Source.of.denominator	NA
Definition.of.denominator	Calculation does not involve a denominator
Disclosure.control	Not applied
Caveats	There is no accepted way of fully quantifying the uncertainty associated with modelled concentrations of PM2.5. The modelling used in calculating the indicator meets the requirements of the EU’s Directive 2008/50/EC on Ambient Air Quality that the uncertainty in modelled annual average PM2.5 concentrations should be no more than 50% in the region of the Limit Value (25 ug/m3). Concentrations of anthropogenic, rather than total, PM2.5 are used as the basis for this indicator, as burden estimates based on total PM2.5 might give a misleading impression of the scale of the potential influence of policy interventions (COMEAP, 2012). However, modelled concentrations of anthropogenic PM2.5 are more uncertain than those of total PM2.5 because of the uncertainty associated with the assignment to anthropogenic and non-anthropogenic sources.
Copyright	NA
Data.re.use	NA
Links	http://www.comeap.org.uk/documents/reports/39-page/linking/51-the-mortality-effects-of-long-term-exposure-to-particulate-air-pollution-in-the-united-kingdom http://www.comeap.org.uk/documents/statements/39-page/linking/46-mortality-burden-of-particulate-air-pollution http://uk-air.defra.gov.uk/data/pcm-data
Indicator.number	3.01
Notes	Methods for calculation of mortality effects, together with national estimates of the mortality burden of anthropogenic PM2.5 in 2008 (and the predicted impact of reductions in PM2.5) are published in: ‘COMEAP (2010) The Mortality Effects of Long-Term Exposure to Particulate Air Pollution in the United Kingdom’. Available at: http://www.comeap.org.uk/documents/reports/39-page/linking/51-the-mortality-effects-of-long-term-exposure-to-particulate-air-pollution-in-the-united-kingdom COMEAP’s views on estimating the mortality burden attributable to PM2.5 at a local (e.g. Local Authority) level, and simplified methods for doing so, are published in: ‘COMEAP (2012) Statement on Estimating the Mortality Burden of Particulate Air Pollution at the Local Level’. Available at: http://www.comeap.org.uk/documents/statements/39-page/linking/46-mortality-burden-of-particulate-air-pollution Modelled background PM2.5 data are published on a 1km x 1km grid square basis by Defra (http://uk-air.defra.gov.uk/data/pcm-data). Estimates of mortality burdens associated with fine particulate matter at LA level have not been systematically reported to date. As there are no confidence intervals associated with this indicator, it is not possible to make comparisons between local authority and either the England or regional values. As a result, this indicator is not RAG rated.
Frequency	Annual
Rounding	NA
Data.quality	NA
Indicator.Content	NA
Unit	%
Value.type	Proportion
Year.type	Calendar
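The calculation described in the Methodology field of this metadata can be written out directly: RR = 1.06^(x/10) and the attributable fraction is 100*(RR-1)/RR. The sketch below computes the attributable fraction for a population-weighted anthropogenic PM2.5 concentration x (in ug/m3) and reproduces the check in the Confidence.interval.details field: near 10 ug/m3, replacing the central relative risk of 1.06 with the plausibility bounds 1.01 and 1.12 scales the attributable fraction by roughly 0.175 and 1.89. The function name and its `rr_per_10` parameter are our own, for illustration only.

```python
def attributable_fraction(x, rr_per_10=1.06):
    """Percentage of all-cause adult deaths attributable to anthropogenic PM2.5.

    x: population-weighted annual average anthropogenic PM2.5 (ug/m3).
    rr_per_10: relative risk per 10 ug/m3 increase (central value 1.06).
    """
    rr = rr_per_10 ** (x / 10)  # RR = 1.06^(x/10)
    return 100 * (rr - 1) / rr  # attributable fraction as a percentage


central = attributable_fraction(10)  # about 5.66%
# Ratios of the bounded estimates to the central estimate at x = 10.
low = attributable_fraction(10, 1.01) / central
high = attributable_fraction(10, 1.12) / central
print(f"{central:.2f}% (x{low:.3f} to x{high:.2f})")  # prints: 5.66% (x0.175 to x1.89)
```

The same function applies at any concentration; only the ratios quoted in the metadata are specific to the region around 10 ug/m3.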

1.1.3 The GATHER guidelines

As part of efforts to rationalise the output of profiles and the publication of indicators, in response to user feedback and a review of resources, we decided to audit current reporting against the GATHER guidelines.¹ The GATHER statement provides an 18-item checklist:

1 Define the indicator(s), populations (including age, sex, and geographic entities), and time period(s) for which estimates were made.

2 List the funding sources for the work.

Data inputs

For all data inputs from multiple sources that are synthesized as part of the study:

3 Describe how the data were identified and how the data were accessed.

4 Specify the inclusion and exclusion criteria. Identify all ad-hoc exclusions.

5 Provide information on all included data sources and their main characteristics. For each data source used, report reference information or contact name/institution, population represented, data collection method, year(s) of data collection, sex and age range, diagnostic criteria or measurement method, and sample size, as relevant.

6 Identify and describe any categories of input data that have potentially important biases (e.g., based on characteristics listed in item 5).

For data inputs that contribute to the analysis but were not synthesized as part of the study:

7 Describe and give sources for any other data inputs.

For all data inputs:

8 Provide all data inputs in a file format from which data can be efficiently extracted (e.g., a spreadsheet rather than a PDF), including all relevant meta-data listed in item 5. For any data inputs that cannot be shared because of ethical or legal reasons, such as third-party ownership, provide a contact name or the name of the institution that retains the right to the data.

Data analysis

9 Provide a conceptual overview of the data analysis method. A diagram may be helpful.

10 Provide a detailed description of all steps of the analysis, including mathematical formulae. This description should cover, as relevant, data cleaning, data pre-processing, data adjustments and weighting of data sources, and mathematical or statistical model(s).

11 Describe how candidate models were evaluated and how the final model(s) were selected.

12 Provide the results of an evaluation of model performance, if done, as well as the results of any relevant sensitivity analysis.

13 Describe methods for calculating uncertainty of the estimates. State which sources of uncertainty were, and were not, accounted for in the uncertainty analysis.

14 State how analytic or statistical source code used to generate estimates can be accessed.

Results and discussion

15 Provide published estimates in a file format from which data can be efficiently extracted.

16 Report a quantitative measure of the uncertainty of the estimates (e.g. uncertainty intervals).

17 Interpret results in light of existing evidence. If updating a previous set of estimates, describe the reasons for changes in estimates.

18 Discuss limitations of the estimates. Include a discussion of any modelling assumptions or data limitations that affect interpretation of the estimates.

2 Methods

We compared a random sample of indicator metadata … against a modified checklist based on the GATHER criteria. We scored each indicator with the aim of:

assessing the overall quality of metadata for Fingertips indicators
identifying potential areas for improvement

2.1 Sampling

We drew a 5% random sample of unique indicator IDs (0.05 × 1,479 = 73.95, i.e. approximately 74 indicators) and extracted the relevant metadata via the Fingertips API.
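Because 5% of 1,479 is 73.95, the sample size has to be rounded to a whole number of indicators. A minimal sketch of the sampling step, assuming simple random sampling without replacement and rounding to the nearest whole indicator (the original analysis was done in R and its rounding rule is not stated). The function name, the seed and the stand-in ID list are hypothetical.

```python
import random


def sample_indicators(indicator_ids, fraction=0.05, seed=2017):
    """Draw a random sample (without replacement) of unique indicator IDs."""
    unique_ids = sorted(set(indicator_ids))  # de-duplicate before sampling
    n = round(fraction * len(unique_ids))    # 0.05 * 1479 = 73.95 -> 74
    rng = random.Random(seed)                # fixed seed so the sample is reproducible
    return rng.sample(unique_ids, n)


# Stand-in IDs: 1,479 hypothetical integers, not real Fingertips IDs.
fake_ids = list(range(1, 1480))
sample = sample_indicators(fake_ids)
print(len(sample))  # 74
```

Fixing the seed means the same sample can be regenerated when the audit is repeated.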

2.2 Scoring

The GATHER guidelines are better suited to appraising reporting in peer-reviewed publications, so we adapted them to score the metadata as follows:

funding source is not relevant in many cases, so this criterion (item 2) was excluded
we excluded questions about modelling, as these are not relevant to the publication or display of indicators in Fingertips, although models may be used to generate the values that are displayed. [might revisit]

We created a simple, unweighted scoring system:

score 1 if the feature is present in the metadata
score -1 if the feature is not present but could be
score 0 if not relevant

This gave each indicator a potential maximum score of 26 and a potential minimum score of -26.
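The scoring rule can be expressed as a small function. In the sketch below, each checklist feature is rated as present, absent (but applicable) or not relevant, mapped to 1, -1 and 0 respectively, and summed without weights; with 26 scorable features the total therefore lies between -26 and 26. The feature names and example ratings are hypothetical; the audit's actual list of 26 features is not reproduced here.

```python
from enum import Enum


class Rating(Enum):
    PRESENT = 1       # feature present in the metadata
    ABSENT = -1       # feature not present but could be
    NOT_RELEVANT = 0  # feature not relevant to this indicator


N_FEATURES = 26  # number of scorable features in the adapted checklist


def score_indicator(ratings):
    """Sum the unweighted scores for one indicator's checklist ratings."""
    if len(ratings) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} ratings, got {len(ratings)}")
    return sum(r.value for r in ratings.values())


# Hypothetical ratings: 20 features present, 4 absent, 2 not relevant.
example = {f"feature_{i:02d}": Rating.PRESENT for i in range(1, 21)}
example.update({f"feature_{i:02d}": Rating.ABSENT for i in range(21, 25)})
example.update({f"feature_{i:02d}": Rating.NOT_RELEVANT for i in range(25, 27)})
print(score_indicator(example))  # 16
```

Because not-relevant features score 0, an indicator's maximum achievable score falls below 26 whenever some features do not apply to it.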

3 Results

3.1 Characteristics of audited profiles

Profiles and indicator counts included in the audit

References

Stevens, Gretchen A., Leontine Alkema, Robert E. Black, J. Ties Boerma, Gary S. Collins, Majid Ezzati, John T. Grove, et al. 2016. “Guidelines for Accurate and Transparent Health Estimates Reporting: The GATHER statement.” The Lancet, 1–8. doi:10.1016/S0140-6736(16)30388-9.

¹ GATHER guidelines have been applied to Global Burden of Disease outputs.

Health reporting via Fingertips: an audit against best practice guidelines: 1 GATHER guidelines

Julian Flowers

2017-04-16