Two sets of best-practice guidelines have recently been published: the W3C Data on the Web Best Practices (DWBP), for publishing data on the web, and a set of guidelines for reporting health statistics.
The DWBP is designed to make it easier for data providers to publish data and for users to use it. It focuses on metadata, which facilitates data discovery in both human-readable (people can find it) and machine-readable (computers can find it) forms; on provenance, so that people know what they are getting; and on licensing, so that people know the conditions under which they can use the data.
This report presents an audit of the reporting of health statistics via Fingertips against the recently published Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER) (Stevens et al. 2016).
Fingertips is a large repository of health indicators and a major vehicle for health reporting by Public Health England. Fingertips is currently organised into:
The structure of Fingertips output is shown in Figure 1.
Figure 1: Structure of Fingertips
The Fingertips system comprises a number of elements:
Fingertips Portal Manager (FPM). Data is uploaded via FPM, which also allows users to add descriptive metadata and to control how the data is visualised on the Fingertips web pages. FPM also allows Fingertips system administrators to add and manage user accounts.
Backend database. Data and metadata are stored in a SQL database (PHOLIO).
Webservices. A number of web services allow the data to be visualised, searched, output as data files, and read into other systems (for example, for PDF creation).
The Fingertips user interface where data is visualised, searched and downloaded for reuse (i.e. the human readable interface).
The Fingertips Application Programming Interface (API), a machine-readable interface that allows computers to use Fingertips data directly and allows developers and programmers to reuse it. The analysis in this report takes data directly from the API.
The fingertipsR package, which interacts with the API to make it easier for users of the R statistical programming language to analyse Fingertips data and to develop outputs, reports and products.
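For readers who want to reproduce the API step outside R, a minimal Python sketch using only the standard library is shown below. It assumes that metadata can be requested as CSV from an endpoint of the form `indicator_metadata/csv/by_indicator_id` under `https://fingertips.phe.org.uk/api/`; that path, and the helper names `metadata_url`, `parse_metadata_csv` and `fetch_metadata`, are assumptions for illustration and should be checked against the Fingertips API documentation before use.

```python
import csv
import io
import urllib.parse
import urllib.request

# ASSUMPTION: base URL and endpoint path are illustrative; verify them
# against the Fingertips API documentation.
API_BASE = "https://fingertips.phe.org.uk/api/"
METADATA_PATH = "indicator_metadata/csv/by_indicator_id"


def metadata_url(indicator_ids):
    """Build the (assumed) URL for a CSV of metadata for the given indicator IDs."""
    ids = ",".join(str(i) for i in indicator_ids)
    query = urllib.parse.urlencode({"indicator_ids": ids})
    return f"{API_BASE}{METADATA_PATH}?{query}"


def parse_metadata_csv(text):
    """Parse CSV text into one dict per indicator, keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_metadata(indicator_ids, timeout=30):
    """Download and parse metadata for the given IDs (requires network access)."""
    with urllib.request.urlopen(metadata_url(indicator_ids), timeout=timeout) as resp:
        # utf-8-sig strips a byte-order mark if the server sends one
        return parse_metadata_csv(resp.read().decode("utf-8-sig"))
```

If the endpoint behaves as assumed, `fetch_metadata([30101])` would return one dictionary per indicator, keyed by the CSV column headers.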
All indicators in Fingertips are accompanied by metadata, which in some cases is extensive. A typical metadata entry in Fingertips (from the Public Health Outcomes Framework) is shown below:
Indicator.ID | Indicator | Indicator.full.name |
---|---|---|
30101 | Mortality attributable to particulate air pollution | Fraction of all-cause adult mortality attributable to anthropogenic particulate air pollution (measured as fine particulate matter, PM2.5) (in PHOF 3.01) |
Definition | Rationale | Policy |
---|---|---|
Fraction of annual all-cause adult mortality attributable to anthropogenic (human-made) particulate air pollution (measured as fine particulate matter, PM2.5). Mortality burden associated with long-term exposure to anthropogenic particulate air pollution at current levels, expressed as the percentage of annual deaths from all causes in those aged 30+. PM2.5 means the mass (in micrograms) per cubic metre of air of individual particles with an aerodynamic diameter generally less than 2.5 micrometers. PM2.5 is also known as fine particulate matter. | Poor air quality is a significant public health issue. The burden of particulate air pollution in the UK in 2008 was estimated to be equivalent to nearly 29,000 deaths at typical ages and an associated loss of population life of 340,000 life years lost. Inclusion of this indicator in the Public Health Outcomes Framework will enable Directors of Public Health to prioritise action on air quality in their local area to help reduce the health burden from air pollution. | NA |
Data.source | Indicator.production |
---|---|
Background annual average PM2.5 concentrations for the year of interest are modelled on a 1km x 1km grid using an air dispersion model, and calibrated using measured concentrations taken from background sites in Defra’s Automatic Urban and Rural Network (http://uk-air.defra.gov.uk/interactive-map.) Data on primary emissions from different sources and a combination of measurement data for secondary inorganic aerosol and models for sources not included in the emission inventory (including re-suspension of dusts) are used to estimate the anthropogenic (human-made) component of these concentrations. By approximating LA boundaries to the 1km by 1km grid, and using census population data, population weighted background PM2.5 concentrations for each lower tier LA are calculated. This work is completed under contract to Defra, as a small extension of its obligations under the Ambient Air Quality Directive (2008/50/EC). Concentrations of anthropogenic, rather than total, PM2.5 are used as the basis for this indicator, as burden estimates based on total PM2.5 might give a misleading impression of the scale of the potential influence of policy interventions (COMEAP, 2012). | DEFRA/Air Pollution and Climate Change Group Public Health England |
Indicator.source | Methodology | Standard.population.values |
---|---|---|
NA | An increase of 10 ug/m3 in population-weighted annual average background concentration of PM2.5* is assumed to increase all-cause mortality rates by a unit relative risk (RR) factor of 1.06. For a population-weighted modelled annual average anthropogenic background PM2.5 concentration $x$, RR is calculated as $\mathrm{RR} = 1.06^{x/10}$ (COMEAP, 2010). The fraction of deaths attributable to PM2.5 is expressed as a percentage, calculated as $100 \times (\mathrm{RR} - 1)/\mathrm{RR}$. Population-weighted annual average concentrations of anthropogenic PM2.5 were provided by AEA for all lower tier and unitary LAs within England. These were combined to produce figures at upper tier, regional and national level so that attributable fractions can be calculated at those scales also. | NA |
Confidence.interval.details | Source.of.numerator |
---|---|
An expert elicitation exercise suggested a 75% plausibility interval of 1.01 to 1.12 for the RR=1.06 of mortality associated with 10 ug/m3 annual average PM2.5 (COMEAP, 2009). This means that, in the region of 10 ug/m3, the true attributable fractions are likely to be in the range of 0.175 to 1.89 times the central estimates reported, with these factors changing only trivially over the range of concentrations considered. This plausibility interval reflects the uncertainty in the relationship between ambient PM2.5 concentrations and effects on mortality. Thus, whilst it is important in terms of the size of the absolute effect, it does not reflect uncertainties which would affect comparisons over time or between areas. Because of this, the plausibility intervals are not presented in the indicator. There is no accepted way of fully quantifying the uncertainty associated with modelled concentrations of PM2.5 (see Caveats). While there is considerable uncertainty in the relative risk estimate and hence in the estimate of the fraction of mortality caused by anthropogenic PM2.5, estimates for the local authorities all use the same relative risk, and the variation in the attributable fraction between local authorities is strongly predicted by the variation in PM2.5 concentration in the local authorities. Therefore, the above confidence intervals are not relevant to comparisons between local authorities, and are not presented with this indicator. | NA |
Definition.of.numerator | Source.of.denominator |
---|---|
Calculation does not involve a numerator | NA |
Definition.of.denominator | Disclosure.control |
---|---|
Calculation does not involve a denominator | Not applied |
Caveats | Copyright | Data.re.use |
---|---|---|
There is no accepted way of fully quantifying the uncertainty associated with modelled concentrations of PM2.5. The modelling used in calculating the indicator meets the requirements of the EU’s Directive 2008/50/EC on Ambient Air Quality that the uncertainty in modelled annual average PM2.5 concentrations should be no more than 50% in the region of the Limit Value (25 ug/m3). Concentrations of anthropogenic, rather than total, PM2.5 are used as the basis for this indicator, as burden estimates based on total PM2.5 might give a misleading impression of the scale of the potential influence of policy interventions (COMEAP, 2012). However, modelled concentrations of anthropogenic PM2.5 are more uncertain than those of total PM2.5 because of the uncertainty associated with the assignment to anthropogenic and non-anthropogenic sources. | NA | NA |
Indicator.number |
---|
3.01 |
Notes |
---|
Methods for calculation of mortality effects, together with national estimates of the mortality burden of anthropogenic PM2.5 in 2008 (and the predicted impact of reductions in PM2.5), are published in: ‘COMEAP (2010) The Mortality Effects of Long-Term Exposure to Particulate Air Pollution in the United Kingdom’. Available at: http://www.comeap.org.uk/documents/reports/39-page/linking/51-the-mortality-effects-of-long-term-exposure-to-particulate-air-pollution-in-the-united-kingdom COMEAP’s views on estimating the mortality burden attributable to PM2.5 at a local (e.g. Local Authority) level, and simplified methods for doing so, are published in: ‘COMEAP (2012) Statement on Estimating the Mortality Burden of Particulate Air Pollution at the Local Level’. Available at: http://www.comeap.org.uk/documents/statements/39-page/linking/46-mortality-burden-of-particulate-air-pollution Modelled background PM2.5 data are published on a 1km x 1km grid square basis by Defra (http://uk-air.defra.gov.uk/data/pcm-data). Estimates of mortality burdens associated with fine particulate matter at LA level have not been systematically reported to date. As there are no confidence intervals associated with this indicator, it is not possible to compare local authority values with either the England or regional values. As a result, this indicator is not RAG rated. |
Frequency | Rounding | Data.quality | Indicator.Content | Unit | Value.type |
---|---|---|---|---|---|
Annual | NA | NA | NA | % | Proportion |
Year.type |
---|
Calendar |
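The Methodology and Confidence.interval.details fields in the entry above can be checked with a few lines of arithmetic. The sketch below is a minimal Python illustration, not the production method: it implements RR = 1.06^(x/10) and the attributable fraction 100 × (RR - 1)/RR, then reruns the calculation at 10 ug/m3 with the plausibility-interval bounds RR = 1.01 and RR = 1.12. The function names are hypothetical.

```python
def relative_risk(x, rr_per_10=1.06):
    """RR for a population-weighted anthropogenic PM2.5 concentration x (ug/m3),
    assuming RR per 10 ug/m3 compounds multiplicatively: rr_per_10 ** (x / 10)."""
    return rr_per_10 ** (x / 10)


def attributable_fraction(x, rr_per_10=1.06):
    """Percentage of all-cause adult deaths attributable to PM2.5: 100*(RR-1)/RR."""
    rr = relative_risk(x, rr_per_10)
    return 100 * (rr - 1) / rr


# Central estimate and plausibility-interval bounds at 10 ug/m3
central = attributable_fraction(10)                # about 5.66%
low = attributable_fraction(10, rr_per_10=1.01)
high = attributable_fraction(10, rr_per_10=1.12)
print(f"central {central:.2f}%, factors {low / central:.3f} to {high / central:.2f}")
```

This prints `central 5.66%, factors 0.175 to 1.89`, matching the 0.175 to 1.89 range quoted in the metadata.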
As part of efforts to rationalise the output of profiles and the publication of indicators, in response to user feedback and a review of resources, we decided to audit current reporting against the GATHER guidelines.1 The GATHER statement provides an 18-item checklist:
Objectives and funding
1 Define the indicator(s), populations (including age, sex, and geographic entities), and time period(s) for which estimates were made.
2 List the funding sources for the work.
Data Inputs
For all data inputs from multiple sources that are synthesized as part of the study:
3 Describe how the data were identified and how the data were accessed.
4 Specify the inclusion and exclusion criteria. Identify all ad-hoc exclusions.
5 Provide information on all included data sources and their main characteristics. For each data source used, report reference information or contact name/institution, population represented, data collection method, year(s) of data collection, sex and age range, diagnostic criteria or measurement method, and sample size, as relevant.
6 Identify and describe any categories of input data that have potentially important biases (e.g., based on characteristics listed in item 5).
For data inputs that contribute to the analysis but were not synthesized as part of the study:
7 Describe and give sources for any other data inputs.
For all data inputs:
8 Provide all data inputs in a file format from which data can be efficiently extracted (e.g., a spreadsheet rather than a PDF), including all relevant meta-data listed in item 5. For any data inputs that cannot be shared because of ethical or legal reasons, such as third-party ownership, provide a contact name or the name of the institution that retains the right to the data.
Data analysis
9 Provide a conceptual overview of the data analysis method. A diagram may be helpful.
10 Provide a detailed description of all steps of the analysis, including mathematical formulae. This description should cover, as relevant, data cleaning, data pre-processing, data adjustments and weighting of data sources, and mathematical or statistical model(s).
11 Describe how candidate models were evaluated and how the final model(s) were selected.
12 Provide the results of an evaluation of model performance, if done, as well as the results of any relevant sensitivity analysis.
13 Describe methods for calculating uncertainty of the estimates. State which sources of uncertainty were, and were not, accounted for in the uncertainty analysis.
14 State how analytic or statistical source code used to generate estimates can be accessed.
Results and Discussion
15 Provide published estimates in a file format from which data can be efficiently extracted.
16 Report a quantitative measure of the uncertainty of the estimates (e.g. uncertainty intervals).
17 Interpret results in light of existing evidence. If updating a previous set of estimates, describe the reasons for changes in estimates.
18 Discuss limitations of the estimates. Include a discussion of any modelling assumptions or data limitations that affect interpretation of the estimates.
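Several fields in the example metadata entry above, such as Policy, Indicator.source, Copyright and Data.re.use, are recorded as NA. Since an empty field cannot provide evidence for any checklist item, a simple first pass is to list such fields. The Python sketch below does this for one metadata record held as a dictionary; the function name and the set of values treated as missing are assumptions for illustration.

```python
# ASSUMPTION: empty strings, None and the literal "NA" all mean "not recorded".
MISSING = {"", "NA"}


def missing_fields(record):
    """Return metadata field names whose value is empty or recorded as 'NA'."""
    return [
        field
        for field, value in record.items()
        if value is None or str(value).strip() in MISSING
    ]


# A few fields from the example entry (values abridged)
example = {
    "Indicator.ID": "30101",
    "Definition": "Fraction of annual all-cause adult mortality ...",
    "Policy": "NA",
    "Indicator.source": "NA",
    "Copyright": "NA",
    "Data.re.use": "NA",
    "Frequency": "Annual",
}
print(missing_fields(example))
# ['Policy', 'Indicator.source', 'Copyright', 'Data.re.use']
```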
We compared a random sample of indicator metadata … against a modified checklist based on the GATHER criteria. We scored each indicator with the aim of:
We created a 5% random sample (n = 74) from unique Indicator IDs, and extracted the relevant metadata via the Fingertips API.
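A sample like this can be drawn reproducibly by fixing the random seed. The sketch below is a hypothetical Python helper, not the code used for the audit: it de-duplicates the IDs, rounds 5% of their number to a whole count of indicators (for example, 5% of 1,479 IDs is 73.95, which rounds to 74) and samples without replacement.

```python
import random


def sample_indicator_ids(indicator_ids, fraction=0.05, seed=None):
    """Draw a simple random sample, without replacement, of unique indicator IDs."""
    unique_ids = sorted(set(indicator_ids))       # drop duplicate IDs first
    n = round(fraction * len(unique_ids))         # sample size must be a whole number
    rng = random.Random(seed)                     # seeded generator for reproducibility
    return sorted(rng.sample(unique_ids, n))
```

Recording the seed alongside the audit would let the same sample be regenerated later.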
These guidelines are better suited to appraising reporting in peer-reviewed publications, so we adapted them to score the metadata as follows:
We created a simple, unweighted scoring system:
This gave each indicator a potential maximum score of 26 and a potential minimum score of -26.
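The scoring criteria and levels are not reproduced in this section, so the sketch below is hypothetical. It assumes, purely for illustration, that each criterion is scored -1, 0 or +1 and that the total is a plain unweighted sum; with 26 criteria this reproduces the stated range of -26 to 26, although other combinations (for example 13 criteria scored -2 to +2) would give the same range.

```python
# HYPOTHETICAL score levels: -1 (not met), 0 (partly met), +1 (met).
# The audit's actual criteria and levels may differ.
LEVELS = {-1, 0, 1}


def total_score(scores):
    """Unweighted total: every criterion counts equally."""
    for criterion, s in scores.items():
        if s not in LEVELS:
            raise ValueError(f"{criterion}: score {s} not in {sorted(LEVELS)}")
    return sum(scores.values())


def score_range(n_criteria):
    """Potential (minimum, maximum) total for n equally weighted criteria."""
    return n_criteria * min(LEVELS), n_criteria * max(LEVELS)
```

For example, `score_range(26)` returns `(-26, 26)`.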
Profiles and indicator counts included in the audit
Stevens, Gretchen A., Leontine Alkema, Robert E. Black, J. Ties Boerma, Gary S. Collins, Majid Ezzati, John T. Grove, et al. 2016. “Guidelines for Accurate and Transparent Health Estimates Reporting: The GATHER statement.” The Lancet, 1–8. doi:10.1016/S0140-6736(16)30388-9.
1 GATHER guidelines have been applied to Global Burden of Disease outputs.