1 Introduction

Open public health data and analytical tools are becoming increasingly available as part of R packages or through Application Programming Interfaces (APIs).

This article briefly reviews:

  • some R packages for public health use which give access to data or useful analytical tools
  • some of the open APIs for extracting data
  • some useful techniques for data extraction e.g. getting tables from pdf publications

2 Data sources

2.1 WHO

WHO provide an R package (WHO) to access some of their data.

The starting point is the get_codes function which gives a list of indicators (n = 2767) and respective codes which can then be passed to the get_data function.

There is a bit of work to do to separate out confidence intervals into separate columns to make the data tidier - see code below for age-standardised prevalence of obesity by country. This shows the universal increase in adult obesity rates across Europe although there is considerable variation between countries. The most dramatic increases have been in Turkish women.

library(WHO)

WHO::get_codes() %>% head()
## # A tibble: 6 x 3
##   label          display                         url                      
##   <chr>          <chr>                           <chr>                    
## 1 MDG_0000000001 Infant mortality rate (probabi… http://apps.who.int/gho/…
## 2 MDG_0000000003 Adolescent birth rate (per 100… http://apps.who.int/gho/…
## 3 MDG_0000000005 Contraceptive prevalence (%)    http://apps.who.int/gho/…
## 4 MDG_0000000006 Unmet need for family planning… http://apps.who.int/gho/…
## 5 MDG_0000000007 Under-five mortality rate (pro… http://apps.who.int/gho/…
## 6 MDG_0000000010 Median availability of selecte… http://apps.who.int/gho/…
WHO::get_data(code = "NCD_BMI_30A") %>%
  tidyr::separate(value, c("value", "ci"), sep = " ") %>%
  mutate(ci = stringr::str_replace(ci, "\\[", ""), 
         ci = stringr::str_replace(ci, "\\]", "")) %>%
  tidyr::separate(ci, remove = FALSE, c("lower", "upper"), sep = "-") %>%
  mutate_at(.vars = c("value", "lower", "upper"), .funs = as.numeric) %>%
  filter(region == "Europe", sex == "Female") %>%
  ggplot(aes(year, value, colour = country)) +
  geom_line(aes(group = country)) +
  #facet_grid(agegroup~sex) +
  theme(legend.position = "bottom") -> g
  plotly::ggplotly(g)

2.2 Eurostat

Eurostat publish access to data in R via the eurostat package and have published a helpful cheatsheet as a tutorial. Like the WHO data there is a utility function get_eurostat_toc which tells you whats in the dataset (9695 indicators) and a data extraction function get_eurostat.

The example below shows how to extract, plot and map data on amenable mortality across Europe.

library(eurostat)


toc <- eurostat::get_eurostat_toc() ## see what's in the dataset

am <- search_eurostat("Amenable", type = "dataset") ## find the indicator ids for amenable deaths

am_data <- get_eurostat(id = "hlth_cd_apr") ## amenable mortality data 
am_data1 <- get_eurostat(id = "hlth_cd_apreu") ## amenable mortality data by ICD10 code

#summary(am_data1)

# am_data %>%
#   filter(unit == "RT", values > 10) %>%
#   ggplot(aes(time, values, shape = indic_he, colour = geo, group = geo)) + 
#   geom_line() +
#   geom_point() +
#   #facet_wrap(indic_he~icd10, scales = "free", nrow = 2) +
#   labs(caption = "Source: EUROSTAT", 
#        title = "Preventable and amenable mortality in Europe \n2010-2014")

  

am_data1 %>%
  filter(unit == "RT", values > 10, !icd10 == "TOTAL") %>%
  ggplot(aes(time, values, shape = indic_he, colour = icd10)) + 
  geom_line() +
  geom_point() +
  facet_wrap(indic_he~sex, nrow = 2) +
  labs(caption = "Source: EUROSTAT", 
       title = "Preventable and amenable mortality in Europe (EU28) \n2010-2014", 
       subtitle = "By ICD10 code", 
       y = "Rate") +
  scale_y_log10()

am_data2 <- get_eurostat(id = "hlth_cd_apr") %>% filter(time == max(time), unit == "RT") %>% mutate(cat = cut_to_classes(values, n = 5, decimals = 1)) ## stratify data into quintiles

mapdata <- merge_eurostat_geodata(am_data2, resolution = "20") ## add to mapping function

## amenable mortality European map
ggplot(mapdata, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = cat), colour = "grey", size = .1) +
  labs(title = "Amenable mortality by NUTS-3 region, 2014",
       subtitle = "Mortality rate per 100,000", 
       fill = "Amenable mortality \nrate", 
       caption = "Source: EUROSTAT") +
  theme_void() +
  scale_fill_brewer(palette = "RdYlBu") + 
  coord_map(xlim = c(-12, 44), ylim = c(35,67))

2.3 Air quality

The openair package includes data from pollution monitoring sites from the NERC. There is a good manual for use here.

The data is updated daily and workhorse function is importAURN. To use it you will need to know the codes of relevant monitoring stations. As an example, the Marylebone Road station is “my1” and we can extract data as below:

pollution_data <- openair::importAURN(site = c("my1", "kc1"), year  = 2010:2017)

We can now plot daily concentrations of pollutants at this site. We’ll look at pm2.5 and n02.

summaryPlot(pollution_data)
##     date1     date2        o3       no2        co       so2      pm10 
## "POSIXct"  "POSIXt" "numeric" "numeric" "numeric" "numeric" "numeric" 
##       nox        no     pm2.5      nv10       v10     nv2.5      v2.5 
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 
##        ws        wd      site      code 
## "numeric" "numeric"  "factor"  "factor"

2.4 NOMIS API

The NOMIS API works in a similar way to the Fingertips API and is well described here. You can construct a URL to download csv files from the site built on common patterns. To download a geographical dataset of a given indicator requires identifying codes for geography, sex, item, and measures types. These retrieve geographical codes, names and types, numerator and denominator definitions.

In R it seems most straightforward to download the metadata files in json format, but construct data extract as csv files. These always take the root https://www.nomisweb.co.uk/api/v01/dataset/NM_1_1.data.csv.

You can search the definitions and metadata with queries like this https://www.nomisweb.co.uk/api/v01/dataset/def.htm?search=*ethnicity* or https://www.nomisweb.co.uk/api/v01/dataset/def.htm?search=*claimant*

library(jsonlite)
library(listviewer)

datasets <- jsonlite::fromJSON("https://www.nomisweb.co.uk/api/v01/dataset/def.sdmx.json", flatten = TRUE)
str(datasets, max.level = 4)
## List of 1
##  $ structure:List of 7
##   ..$ header        :List of 4
##   .. ..$ id      : chr "none"
##   .. ..$ prepared: chr "2018-03-20T15:36:27Z"
##   .. ..$ sender  :List of 2
##   .. .. ..$ contact:List of 4
##   .. .. ..$ id     : chr "NOMIS"
##   .. ..$ test    : chr "false"
##   ..$ keyfamilies   :List of 1
##   .. ..$ keyfamily:'data.frame': 1230 obs. of  14 variables:
##   .. .. ..$ agencyid                            : chr [1:1230] "NOMIS" "NOMIS" "NOMIS" "NOMIS" ...
##   .. .. ..$ id                                  : chr [1:1230] "NM_1_1" "NM_2_1" "NM_4_1" "NM_5_1" ...
##   .. .. ..$ uri                                 : chr [1:1230] "Nm-1d1" "Nm-2d1" "Nm-4d1" "Nm-5d1" ...
##   .. .. ..$ version                             : num [1:1230] 1 1 1 1 1 1 1 1 1 1 ...
##   .. .. ..$ annotations.annotation              :List of 1230
##   .. .. ..$ components.attribute                :List of 1230
##   .. .. ..$ components.dimension                :List of 1230
##   .. .. ..$ components.primarymeasure.conceptref: chr [1:1230] "OBS_VALUE" "OBS_VALUE" "OBS_VALUE" "OBS_VALUE" ...
##   .. .. ..$ components.timedimension.codelist   : chr [1:1230] "CL_1_1_TIME" "CL_2_1_TIME" "CL_4_1_TIME" "CL_5_1_TIME" ...
##   .. .. ..$ components.timedimension.conceptref : chr [1:1230] "TIME" "TIME" "TIME" "TIME" ...
##   .. .. ..$ description.value                   : chr [1:1230] "Records the number of people claiming Jobseeker's Allowance (JSA) and National Insurance credits at Jobcentre P"| __truncated__ "A quartery count of claimants who were claiming Jobseeker's Allowance on the count date analysed by their age a"| __truncated__ "A monthly count of Jobseeker's Allowance (JSA) claimants broken down by age and the duration of claim. Totals e"| __truncated__ "A midyear estimate of the workforce (the denominator) which was used for calculating claimant count rates prior"| __truncated__ ...
##   .. .. ..$ description.lang                    : chr [1:1230] "en" "en" "en" "en" ...
##   .. .. ..$ name.value                          : chr [1:1230] "Jobseeker's Allowance with rates and proportions" "claimant count - age and duration" "Jobseeker's Allowance by age and duration" "claimant count denominators - historical workforce series" ...
##   .. .. ..$ name.lang                           : chr [1:1230] "en" "en" "en" "en" ...
##   ..$ xmlns         : chr "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message"
##   ..$ common        : chr "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/common"
##   ..$ structure     : chr "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure"
##   ..$ xsi           : chr "http://www.w3.org/2001/XMLSchema-instance"
##   ..$ schemalocation: chr "http://sdmx.org/docs/2_0/SDMXMessage.xsd"
listviewer::jsonedit(datasets)
## has 1229 indicators
## lets look at indicator names

purrr::map_df(datasets$structure$keyfamilies, "name.value")
## # A tibble: 1,230 x 1
##    keyfamily                                                  
##    <chr>                                                      
##  1 Jobseeker's Allowance with rates and proportions           
##  2 claimant count - age and duration                          
##  3 Jobseeker's Allowance by age and duration                  
##  4 claimant count denominators - historical workforce series  
##  5 claimant count - occupation                                
##  6 claimant count - occupation, age and duration              
##  7 Jobseeker's Allowance off-flows by reason, age and duration
##  8 Jobseeker's Allowance flows by age and duration            
##  9 Jobseeker's Allowance flows                                
## 10 Claimant Count - seasonally adjusted                       
## # ... with 1,220 more rows
## get a dataset

nomis <- read_csv("https://www.nomisweb.co.uk/api/v01/dataset/NM_1_1.data.csv?geography=TYPE486&sex=7&item=1&measures=20100&time=latest") %>%
  select(ITEM_NAME, GEOGRAPHY_NAME, GEOGRAPHY_CODE, DATE, MEASURES_NAME, OBS_VALUE, RECORD_COUNT)

head(nomis)
## # A tibble: 6 x 7
##   ITEM_NAME  GEOGRAPHY_NAME  GEOGRAPHY_CODE DATE  MEASURES_NAME  OBS_VALUE
##   <chr>      <chr>           <chr>          <chr> <chr>              <int>
## 1 Total cla… Darlington      E06000005      2018… Persons claim…      1554
## 2 Total cla… Hartlepool      E06000001      2018… Persons claim…       801
## 3 Total cla… Middlesbrough   E06000002      2018… Persons claim…      2963
## 4 Total cla… Redcar and Cle… E06000003      2018… Persons claim…      1959
## 5 Total cla… Stockton on Te… E06000004      2018… Persons claim…      2629
## 6 Total cla… Chester-le-Str… E07000054      2018… Persons claim…       450
## # ... with 1 more variable: RECORD_COUNT <int>
test <-"Julian Flowers is "

str_extract_all(test, pattern = "[A-Z]{1}[a-z]*")
## [[1]]
## [1] "Julian"  "Flowers"

2.5 Clinical codes

NOTE - this link does not seem to be working

The rClinicalCodes package can be used to find code lists used in research on primary care datasets from the clinicalcodes.org repository.

The package has a a small number of functions of which these are particularly useful:

  • all_ClinicalCodes_articles gives a list of the research papers for which there are code lists
  • get_ClincalCodes requires a URL and will download a dataframe for of codes by disease for that article.

For example we can download the code list for diabetes used in Doran et al’s paper on withdrawing performance indicators.

#art1 <- rClinicalCodes::all_ClinicalCodes_articles() 

# head(rClinicalCodes::get_ClinicalCodes(article_id = 5, codelist_name = "diabetes") %>%
#   mutate_if(is.factor, as.character)) %>%
#   knitr::kable()

Or we can review all the codes for a given article e.g the QOF business rules v24.

codelists <- get_ClinicalCodes(article_id = 1)
names <- colnames(purrr::map_df(codelists, colnames ))
  
codes <-get_ClinicalCodes(article_id = 1, codelist_name = names) %>%
  mutate_if(is.factor, as.character) %>%
  filter(code != "code")

head(codes)

and finally pull the codeset for a given disease area.

codelists <- get_ClinicalCodes(article_id = 1:52) ## pull all the code lists

#codelists[2] ## review

codelists1 <- map(codelists, "list_name") ## find all the list names


## find the list-names for your disease of interest e.g. copd
disease_list <- as.tibble(names(codelists1)) %>% arrange(value) %>%
  filter(str_detect(value, "chronic-obstructive|copd")) %>% 
  pull(value)

complete <- data.frame()

for(i in seq_along(codelists)){
newdf <- codelists[[i]][1:6] %>%
  mutate_if(is.factor, as.character) %>%
  mutate_if(is.numeric, as.character)

complete <- bind_rows(complete, newdf)
  
}

## get copd codes
copd_codes <- complete %>%
  filter(str_detect(list_name, "chronic-obstructive|copd"))

copd_codes %>%
  knitr::kable()

2.6 Fingertips API and fingertipsR

The fingertipsR package is designed to make it easy to extract Fingertips data into R for further analysis. It reads the Fingertips API and is more user-friendly than reading the API directly although for skilled programmers the API offers much more functionality.

The use of the API is introduced by this vignette and more details are available here.

The package can be downloaded from CRAN, or the development version from Github as below.

## Stable version

(install.packages("fingertipsR"))

## development version

devtools::install_github("PublicHealthEngland/fingertipsR")
library(fingertipsR)

We can extract a list of all the indicators:

fingertipsR::indicators_unique()
## # A tibble: 1,948 x 2
##    IndicatorID IndicatorName                                              
##          <int> <fct>                                                      
##  1         247 Dementia QOF prevalence (all ages)                         
##  2         340 Income Deprivation Affecting Older People Index            
##  3         734 Percentage of people in long-term unemployment             
##  4        1164 Adults (18-64) with physical disabilities supported by adu…
##  5        1165 Adults (18-64) with learning disabilities supported by adu…
##  6        1166 Adults (18-64) with mental health problems supported by ad…
##  7        1167 Older people (65+) supported by adult social care througho…
##  8        1172 % of the total resident population who are 65-74 years old 
##  9        1173 % of the total resident population who are 75-84 years old 
## 10        1174 % of the total resident population who are 85 and over     
## # ... with 1,938 more rows

Then get the metadata for a set of indicators or profiles.

metadata <- indicator_metadata(IndicatorID = c(247, 340, 734))

metadata %>%
  formattable::formattable()
IndicatorID Indicator Indicator full name Definition Rationale Policy Data source Indicator production Indicator source Methodology Standard population/values Confidence interval details Source of numerator Definition of numerator Source of denominator Definition of denominator Disclosure control Caveats Copyright Data re-use Links Indicator number Notes Frequency Rounding Data quality Indicator Content Unit Value type Year type Polarity
340 Income deprived older people (60+) Percentage of people aged 60 and over living in income deprived households (Income Deprivation Affecting Older People Index). Adults aged 60 or over living in income-deprived households as a percentage of all adults aged 60 or over. The Income Deprivation Affecting Older People Index (IDAOPI) measures the proportion of all adults aged 60 or over living in income deprived families. It is a subset of the Income Deprivation Domain which measures the proportion of the population in an area experiencing deprivation relating to low income. The definition includes adults aged 60 or over receiving Income Support or income-based Jobseekers Allowance or income-based Employment and Support Allowance or Pension Credit (Guarantee). Poverty is a risk factor for mental health for people of all ages. In older adults, poverty may add to the risk of social isolation already present for this group, as those in poverty have limited money available to participate in social activities. NA Department for Communities and Local Government (DCLG) Public Health Data Science, Public Health England https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015 For all geographies apart from wards, the value of the IDAOPI domain score was applied to mid-2012 population estimates to obtain numerator estimates (estimated number of older people living in income-deprived households) for each LSOA. These numerator estimates and population estimates for LSOAs were then aggregated to MSOAs, local authorities and England as a whole (using geographical lookup tables obtained from ONS Geography). Percentage figures were then calculated based on these aggregated data. The estimates for wards were derived by the Local Government Association. LSOAs were matched to the ward in which they had the best fit (by size of area). Estimates were then calculated based on population weighted averages of the LSOA level IDAOPI scores. For GP practice estimates, LSOA level deprivation data were applied proportionally to the practice populations. NA NA NA NA NA NA NA NA NA NA Estimates and projections of the population aged 60 and over can be obtained from Projecting Older People Information System (POPPI) http://www.poppi.org.uk/ NA NA NA NA 3 NA % Proportion Calendar RAG - Low is good
734 Long-term unemployment: % of working age population Long-term unemployment: % of population aged 16-64 that are long-term unemployed The percentage of the working age population that have been claiming job seekers allowance for over 12 months. Population is based on mid-year estimate of that year, unless not available in which case previous year’s mid-year estimate is used. There is a strong evidence to suggest that work is generally good for physical and mental health and wellbeing, taking into account the nature and quality of work and its social context, and that worklessness is associated with poorer physical and mental health. NA ONS NOMIS NA https://www.nomisweb.co.uk/Default.asp NA NA NA NA NA NA NA NA NA NA NA NA NA Pre 2011 Mid year Populationsare rebased populations from the 2011 census NA NA 3 NA % Crude rate Calendar RAG - Low is good
247 Dementia recorded prevalence (QOF): % of practice register (all ages) Dementia recorded prevalence (QOF): % of patients on GP practice registers recorded as having dementia (all ages) The recorded dementia prevalence is the number of people with dementia recorded on GP practice registers as a proportion of the people (all ages);registered at each GP practice, allocated to a local authority boundary using the postcode of the practice.

Dementia is a syndrome characterised by an insidious but ultimately catastrophic, progressive global deterioration in intellectual function and is a main cause of late-life disability. The prevalence of dementia increases with age and is estimated to be approximately 20% at 80 years of age. The annual incidence of vascular dementia is 1.2/100 overall person years at risk and is the same in all age groups. Alzheimer’s disease accounts for 50 - 75% of cases of dementia.

The annual incidence of dementia of the Alzheimer type rises to 34.3/100 person years at risk in the 90 year age group; the prevalence is higher in women than in men due to the longer lifespan of women. Other types of dementia such as Lewy Body dementia and fronto-temporal dementia are relatively rare but can be very distressing. In a third of cases, dementia is associated with other psychiatric symptoms (depressive disorder, adjustment disorder, generalised anxiety disorder, alcohol related problems). A complaint of subjective memory impairment is an indicator of dementia especially when there is altered functioning in terms of activities of daily living.
NA QOF NA Data for 2016/17 can be downloaded from http://digital.nhs.uk/catalogue/PUB30124 For other data years please find on the http://content.digital.nhs.uk/qof NA NA NA Quality Outcomes Framework (QOF), NHS Digital The number of patients registered with dementia, as recorded on practice disease registers, allocated to a local authority boundary using the postcode of the practice. Quality Outcomes Framework (QOF), NHS Digital The number of people registered at GP practices (all ages), allocated to a local authority boundary using the postcode of the practice. NA This indicator is a measure of recorded prevalence and not actual prevalence and therefore under-reports groups who are less likely to be registered with a GP, such as ethnic minority populations, homeless people, migrants and travellers. QOF registers are constructed to underpin indicators on quality of care, and they do not necessarily equate to prevalence as may be defined by epidemiologists. Caution should be taken when interpreting this indicator as higher than average value may mean that the prevalence of the condition is high in an area, but it could also indicate that detection is better there; this is for local knowledge to determine.The intended measure is a resident-based measure at local authority geography. However the required base datasets are not available at the local authority level. Therefore a proxy registration based measure has been constructed which illustrates the proportion of individuals with dementia in the registered population apportioned to local authority geographies, based on the postcode of their GP practice. For local authorities that are geographically coterminous with CCGs, please note that values may not be the same as the CCG value. Re-used with the permission of NHS Digital. All rights reserved. NA NA NA NHS Digital advises a number of practices in Somerset CCG opted out of national QOF arrangements for 2014/15 and 2015/16. This means that reporting on individual clinical indicators will appear lower than practices who have continued to deliver national QOF NA NA 3 NA % Proportion Financial BOB - Blue orange blue

And lookup which units of analysis are available for these indicators:

ind_areas <- indicator_areatypes() %>%
  filter(IndicatorID %in% c(247, 340, 734)) %>%
  left_join(metadata) %>%
  left_join(area_types()) %>%
  select(IndicatorID, Indicator,AreaTypeID, AreaTypeName  ) %>%
  distinct()

ind_areas
## # A tibble: 15 x 4
##    IndicatorID Indicator                  AreaTypeID AreaTypeName         
##          <int> <chr>                           <int> <chr>                
##  1         247 Dementia recorded prevale…        153 CCG unchanged plus o…
##  2         247 Dementia recorded prevale…        152 CCG unchanged plus n…
##  3         247 Dementia recorded prevale…        120 Sustainability and T…
##  4         247 Dementia recorded prevale…        102 Counties and Unitary…
##  5         247 Dementia recorded prevale…         46 Sub-region (former L…
##  6         247 Dementia recorded prevale…          7 General Practice     
##  7         340 Income deprived older peo…        153 CCG unchanged plus o…
##  8         340 Income deprived older peo…        152 CCG unchanged plus n…
##  9         340 Income deprived older peo…        104 PHEC 2015 new plus P…
## 10         340 Income deprived older peo…        102 Counties and Unitary…
## 11         340 Income deprived older peo…        101 Local authority dist…
## 12         340 Income deprived older peo…          8 Ward                 
## 13         340 Income deprived older peo…          7 General Practice     
## 14         340 Income deprived older peo…          6 Government Office Re…
## 15         734 Long-term unemployment: %…        102 Counties and Unitary…

and then extract the data for each indicator for all available geographies (takes a few minutes).

data <- fingertips_data(IndicatorID = 247, AreaTypeID = c(120, 102, 46) )


head(data)
## # A tibble: 6 x 24
##   IndicatorID IndicatorName     ParentCode ParentName AreaCode AreaName   
##         <int> <chr>             <chr>      <chr>      <chr>    <chr>      
## 1         247 Dementia: QOF pr… <NA>       <NA>       E920000… England    
## 2         247 Dementia: QOF pr… E92000001  England    E920000… England    
## 3         247 Dementia: QOF pr… E92000001  England    E390000… London NHS…
## 4         247 Dementia: QOF pr… E92000001  England    E390000… Wessex NHS…
## 5         247 Dementia: QOF pr… E92000001  England    E390000… Cheshire a…
## 6         247 Dementia: QOF pr… E92000001  England    E390000… Yorkshire …
## # ... with 18 more variables: AreaType <chr>, Sex <chr>, Age <chr>,
## #   CategoryType <chr>, Category <chr>, Timeperiod <chr>, Value <dbl>,
## #   LowerCI95.0limit <dbl>, UpperCI95.0limit <dbl>,
## #   LowerCI99.8limit <dbl>, UpperCI99.8limit <dbl>, Count <dbl>,
## #   Denominator <dbl>, Valuenote <chr>, RecentTrend <chr>,
## #   ComparedtoEnglandvalueorpercentiles <chr>,
## #   Comparedtosubnationalparentvalueorpercentiles <chr>,
## #   TimeperiodSortable <int>

2.7 Analysis

2.7.1 Epitools

Largely designed for infectious disease epidemiology (https://cran.r-project.org/web/packages/epitools/epitools.pdf), but has functions for:

  • age-standardisation using exact methods ageadjust.direct; ageadjust.indirect
  • confidence intervals binom.conf.int; pois.conf.int
  • risk and rate calculations oddsratio; rateratio; riskratio
  • tabulations

As an example we can calculate Wilson CI for a proportion:

x <- 40
n <- 100

epitools::binom.wilson(x, n)
##    x   n proportion     lower     upper conf.level
## 1 40 100        0.4 0.3094013 0.4979974       0.95
library(epitools)

r243 <- matrix(c(12,2,7,9), 2, 2)
dimnames(r243) <- list(Diarrhea = c("Yes", "No"),
"Antibody level" = c("Low", "High")
)
r243
##         Antibody level
## Diarrhea Low High
##      Yes  12    7
##      No    2    9
r243b <- t(r243)
r243b
##               Diarrhea
## Antibody level Yes No
##           Low   12  2
##           High   7  9
## Chi sq
epitab(r243, rev = "b", verbose = TRUE)
## $tab
##         Antibody level
## Diarrhea High     p0 Low        p1 oddsratio    lower    upper    p.value
##      No     9 0.5625   2 0.1428571  1.000000       NA       NA         NA
##      Yes    7 0.4375  12 0.8571429  7.714286 1.283544 46.36398 0.02588706
## 
## $measure
## [1] "wald"
## 
## $conf.level
## [1] 0.95
## 
## $pvalue
## [1] "fisher.exact"
## 
## $x
##         Antibody level
## Diarrhea High Low
##      No     9   2
##      Yes    7  12
## 
## $data
##         Antibody level
## Diarrhea High Low Total
##    No       9   2    11
##    Yes      7  12    19
##    Total   16  14    30
## 
## $p.exposed
##         Antibody level
## Diarrhea   High       Low     Total
##    No    0.5625 0.1428571 0.3666667
##    Yes   0.4375 0.8571429 0.6333333
##    Total 1.0000 1.0000000 1.0000000
## 
## $p.outcome
##         Antibody level
## Diarrhea      High       Low Total
##    No    0.8181818 0.1818182     1
##    Yes   0.3684211 0.6315789     1
##    Total 0.5333333 0.4666667     1
## 
## $p.value
##         two-sided
## Diarrhea midp.exact fisher.exact chi.square
##      No          NA           NA         NA
##      Yes 0.02332167   0.02588706 0.01733469
## 
## $correction
## [1] FALSE
## Risk ratio
epitab(r243, method="riskratio",rev = "b", verbose = TRUE)
## $tab
##         Antibody level
## Diarrhea High        p0 Low        p1 riskratio     lower    upper
##      No     9 0.8181818   2 0.1818182  1.000000        NA       NA
##      Yes    7 0.3684211  12 0.6315789  3.473684 0.9468914 12.74326
##         Antibody level
## Diarrhea    p.value
##      No          NA
##      Yes 0.02588706
## 
## $measure
## [1] "wald"
## 
## $conf.level
## [1] 0.95
## 
## $pvalue
## [1] "fisher.exact"
## 
## $x
##         Antibody level
## Diarrhea High Low
##      No     9   2
##      Yes    7  12
## 
## $data
##         Antibody level
## Diarrhea High Low Total
##    No       9   2    11
##    Yes      7  12    19
##    Total   16  14    30
## 
## $p.exposed
##         Antibody level
## Diarrhea   High       Low     Total
##    No    0.5625 0.1428571 0.3666667
##    Yes   0.4375 0.8571429 0.6333333
##    Total 1.0000 1.0000000 1.0000000
## 
## $p.outcome
##         Antibody level
## Diarrhea      High       Low Total
##    No    0.8181818 0.1818182     1
##    Yes   0.3684211 0.6315789     1
##    Total 0.5333333 0.4666667     1
## 
## $p.value
##         two-sided
## Diarrhea midp.exact fisher.exact chi.square
##      No          NA           NA         NA
##      Yes 0.02332167   0.02588706 0.01733469
## 
## $correction
## [1] FALSE
## Risk ratio
epitab(matrix(c(41, 15, 28010, 19017),2,2)[2:1,],
method="rateratio", verbose = TRUE)
## $tab
##           Outcome
## Predictor  Count Person-time rateratio    lower    upper    p.value
##   Exposed1    15       19017  1.000000       NA       NA         NA
##   Exposed2    41       28010  1.855759 1.027226 3.352563 0.03545742
## 
## $measure
## [1] "wald"
## 
## $conf.level
## [1] 0.95
## 
## $pvalue
## [1] "midp.exact"
## 
## $x
##           Outcome
## Predictor  Count Person-time
##   Exposed1    15       19017
##   Exposed2    41       28010
## 
## $data
##           Outcome
## Predictor  Count Person-time
##   Exposed1    15       19017
##   Exposed2    41       28010
##   Total       56       47027
## 
## $p.value
##           two-sided
## Predictor  midp.exact       wald
##   Exposed1         NA         NA
##   Exposed2 0.03545742 0.03736289

2.7.2 pubh

This package is intended to “provide a common syntax for the most frequent statistical analysis in epidemiology”. It uses the lattice package for graphics, and has aglm_coef function for display of regression models (NB the broom pacakge is also very good for this), and has standard functions for epidemiology

There is a good vignette.

odds_trend(Cancer ~ Age, data = Macmahon)
## Odds Ratios with 95% CIs and p-values 
##  
##  Exposure   OR lower upper chi.square fisher.exact
##       <20 1.00    NA    NA       <NA>         <NA>
##     20-24 1.21  1.06  1.39      0.007        0.007
##     25-29 1.55  1.35  1.79    < 0.001      < 0.001
##     30-34 1.88  1.60  2.22    < 0.001      < 0.001
##       >34 2.41  1.96  2.95    < 0.001      < 0.001
## 
##  
## Chi-squared Test for Trend in Proportions = 129.012 
##  1 d.f., p < 0.001

2.8 dsrTest

The dstTest package calculates directly standardised rates and thier confidence intervals using “several methods”.

These methods include:

  • gamma
  • dobson
  • asymptotic
  • beta
  • bootstrap

From the vignette:

if(!require(dsrTest))install.packages("dsrTest")
library(dsrTest)
library(tidyverse, quietly = TRUE)

data("downs.mi")

b <- downs.mi %>%
  filter(BirthOrder == 3)

d1 <- with(b, dsrTest(Cases, Births, Standard, mult = 100000, method = "gamma"))
d2 <- with(b, dsrTest(Cases, Births, Standard, mult = 100000, method = "dobson"))
d3 <- with(b, dsrTest(Cases, Births, Standard, mult = 100000, method = "bootstrap"))

dsr <- list(d1, d2, d3)

formattable::formattable(map_df(dsr, broom::tidy) %>%
  cbind(method = c("gamma", "dobson", "boostrap")))
estimate statistic p.value parameter conf.low conf.high method alternative
85.06922 6 NA 1e+05 77.18345 94.22357 Directly standardized rate: Gamma Method for Weighted Sum of Poissons two.sided
85.06922 6 NA 1e+05 77.17350 93.52121 Directly standardized rate: Dobson’s method for Weighted Sum of Poissons with Exact two-sided Poisson test (central method) two.sided
85.06922 6 NA 1e+05 77.34172 93.52023 Directly standardized rate: Approximate bootstrap method for Weighted Sum of Poissons two.sided

2.9 Other sources

The fantastic hansard package provides an R interface the to http://www.data.parliament.uk API and returns results in tidy format.

It gives access to information about:

  • Parliamentary questions
  • Written questions
  • Petitions
  • Election information
  • Research briefings and topics
  • Bills

from both the Commons and the Lords.

library(hansard)


## Health topics

topics <- research_subtopics_list()
list <- topics$`Health services and medicine`

For example, the classification of health topics for parliamentary briefings are Alcoholism, Animal experiments, Communicable diseases, Diseases, Drugs misuse, Genetics, Health education and preventive medicine, Health finance, Health services, Health staff and professions, Medical ethics, Medicine, Mental health, Obesity, Patient rights and complaints, Smoking, Vaccination.

We can use this to see what briefings have been created about or including, say obesity.

briefings <- research_briefings(subtopic = list[14] )

briefings %>%
  select(title, description)
## # A tibble: 17 x 2
##    title      description                                                 
##    <chr>      <chr>                                                       
##  1 Obesity S… 26% of adults in England are obese. A further 35% are overw…
##  2 World Can… "This House of Lords Library Briefing provides information …
##  3 The effec… "This pack has been prepared ahead of the debate to be held…
##  4 Social In… This Commons Library research briefing summarises a wide ra…
##  5 Key Issue… Key Issues 2017 is a series of short briefings on the topic…
##  6 Allocatio… This pack has been prepared ahead of the debate to be held …
##  7 Sugar and… This POSTnote summarises the health risks associated with e…
##  8 Debate pa… "This debate pack is prepared for the Backbench Business de…
##  9 Debate on… "This pack has been compiled ahead of the debate on A tax o…
## 10 Building … This In Focus briefing provides some statistics relating to…
## 11 Obesity L… This In Focus note examines the increasing number of adults…
## 12 Obesity T… A quarter of adults in the UK are clinically obese and ther…
## 13 Preventin… This POSTnote describes the causes of diabetes and the know…
## 14 Obesity    This note sets out the definition and causes of obesity, th…
## 15 The impor… Article examining the relationship between taking part in s…
## 16 Diet and … This POSTnote looks at the latest scientific research, advi…
## 17 Food Prod… Food Products (Marketing to Children) Bill. (Bill 19 of 200…

We can also look at the questions answered by the SofS and the Department of Health.

First we need the id for any given MP (1572 for Jeremy Hunt) or the department name, and we can then extract the data as below.

members_search("Hunt")
## # A tibble: 19 x 12
##    mnis_id home_page   additional_name_v… family_name_val… full_name_value
##    <chr>   <chr>       <chr>              <chr>            <chr>          
##  1 2374    <NA>        Crother            Crother-Hunt     Lord Crowther-…
##  2 1425    http://www… Simon              Djanogly         Mr Jonathan Dj…
##  3 2414    <NA>        Charles Lindsay    Gordon           The Marquess o…
##  4 2021    <NA>        Charles Gomer      Gordon           The Marquess o…
##  5 2313    <NA>        John Clarence Wes… Hastings         Earl of Huntin…
##  6 2022    <NA>        Edward Robin Hood  Hastings-Bass    Earl of Huntin…
##  7 994     <NA>        James Fletcher     Hunt             Lord Hunt of W…
##  8 2003    <NA>        Cecil John         Hunt             Lord Hunt      
##  9 1572    http://www… Richard Streynsham Hunt             Mr Jeremy Hunt 
## 10 2361    <NA>        Henderson          Hunt             Lord Hunt of F…
## 11 2023    <NA>        Joseph Benedict    Hunt             Lord Hunt of T…
## 12 1024    <NA>        Leonard            Hunt             Sir John Hunt  
## 13 2543    <NA>        Charles Roland     Hunt             Lord Hunt of C…
## 14 2024    <NA>        Alexander          Hunt             Lord Hunt of K…
## 15 4111    http://www… <NA>               Hunt             Tristram Hunt  
## 16 48      <NA>        Robert Frederick … Hunter           Mr Andrew Hunt…
## 17 1598    http://www… James              Hunter           Mark Hunter    
## 18 3018    <NA>        Brockie            Hunter           Lord Hunter of…
## 19 119     <NA>        <NA>               Major            Mr John Major  
## # ... with 7 more variables: gender_value <chr>, given_name_value <chr>,
## #   label_value <chr>, constituency_about <chr>,
## #   constituency_label_value <chr>, party_value <chr>, twitter_value <chr>
commons_answered_questions(answered_by = 1572) %>%
  select(date_of_answer_value, answer_text_value )
## # A tibble: 35 x 2
##    date_of_answer_value answer_text_value                                 
##    <dttm>               <chr>                                             
##  1 2014-11-25 00:00:00  <p>One year on from the Government’s formal respo…
##  2 2015-01-13 00:00:00  <p>There are over 6,000 more nurses on our hospit…
##  3 2015-02-24 00:00:00  <p>Since May 2010, £179 million of signed savings…
##  4 2015-06-02 00:00:00  <p>Trusts placed in special measures receive tail…
##  5 2015-10-13 00:00:00  <p>£400 million in resilience money has been inve…
##  6 2015-11-17 00:00:00  <p>£400 million in resilience money has been inve…
##  7 2016-02-09 00:00:00  <p>In November, I announced a campaign to halve t…
##  8 2016-07-26 00:00:00  <p>We acknowledge that local government are vital…
##  9 2016-09-08 00:00:00  <p>This is an employer and employee matter betwee…
## 10 2016-09-08 00:00:00  <p>This is an employer and employee matter betwee…
## # ... with 25 more rows
phe_questions <- hansard_all_answered_questions(answering_body = "health", start_date = "2016-01-01") 

phe_questions <- phe_questions[, -1] %>%
  mutate(row = row_number())
  
filter(phe_questions, str_detect(answer_text_value, "Public Health England")) %>%
    group_by(yearq = zoo::as.yearqtr(date_of_answer_value)) %>%
  count() %>%
  ggplot(aes(yearq, n)) +
  geom_col() +
  zoo::scale_x_yearqtr() +
  labs(title = "Frequency of answered parliamentary questions involving PHE by quarter")

library(myScrapers)
filter(phe_questions, str_detect(question_text, "Public Health England")) %>%
  select(row, date_of_answer_value, ng_dept_short_name_value, question_text, answer_text_value) %>%
  create_bigrams(text = answer_text_value) %>%
  count(bigram, sort = TRUE) %>%
  with(wordcloud::wordcloud(bigram, n, min.freq = 5, max.words = "Inf" ))
Wordcloud of questions asked and answered in Parliament containg 'Public Health England'

Wordcloud of questions asked and answered in Parliament containg ‘Public Health England’

library(quanteda)

pheqn <- filter(phe_questions, str_detect(question_text, "Public Health England")) %>%
  select(row, date_of_answer_value, ng_dept_short_name_value, question_text, answer_text_value, tabling_member_printed)

phecorp <- corpus(pheqn$answer_text_value)
docvars(phecorp) <- pheqn$question_text

phedfm <- dfm(phecorp, group = phecorp$docvar1, remove = stopwords("english"), stem = FALSE, remove_punct  =TRUE, remove_numbers = TRUE)

phedfm <- dfm_remove(phedfm, c("uploads", "td", "tr", "attachment_data", "http", "https", "p", "_blank", "em", "br", "shif", "wp-content", "et", "al"))

topfeatures(phedfm, 100)
##          health             phe         england          public 
##             498             222             222             221 
##           trial           local      government          sexual 
##             192             134              92              87 
##        national            data             nhs        services 
##              86              82              75              72 
##        research        evidence       including             hiv 
##              67              67              67              66 
##          review          report      department            prep 
##              64              64              61              58 
##       published          impact   commissioning           group 
##              56              54              52              51 
##             can         clinics          cancer       programme 
##              50              48              47              47 
##            drug       available         support     information 
##              47              46              46              46 
##    reproductive            work         service      www.gov.uk 
##              45              44              43              40 
##          target          mental          advice             new 
##              40              40              39              39 
##        clinical          access          number            part 
##              38              38              37              36 
##     authorities          groups            href   contraception 
##              36              35              35              35 
##        approval           years        patients           sites 
##              35              34              34              34 
##            risk         provide            care        guidance 
##              33              33              33              32 
##          centre           range          within        provided 
##              32              31              30              30 
##        products            plan          across   commissioners 
##              30              30              30              28 
##          people        outcomes            site         process 
##              28              28              28              27 
##          ensure            also           guide recommendations 
##              27              27              27              26 
##          action         working          strong     development 
##              26              26              26              25 
##        training       england's          system       framework 
##              25              24              23              23 
##       committee       authority         eatwell            well 
##              23              23              23              22 
##          survey       following            year       therefore 
##              22              21              21              21 
##       reference       currently            high           place 
##              21              20              20              20 
##       officials            food             use          likely 
##              20              20              19              19 
##           needs      monitoring           women         current 
##              19              19              19              18
presDfm <- dfm_trim(phedfm, min_count = 3, max_docfreq = 3)

presDistMat <- textstat_dist(dfm_weight(presDfm, "relfreq"))

presCluster <- hclust(presDistMat)

presCluster$labels <- docnames(presDfm)

plot(presCluster, xlab = "", sub = "", main = "Clustering of public health related questions by MP", cex = 0.4, hang = -.5, labels = pheqn$tabling_member_printed )

if (require(topicmodels)) {
    myLDAfit20 <- LDA(convert(presDfm, to = "topicmodels"), k = 10)
    get_terms(myLDAfit20, 10)
}
##       Topic 1                     Topic 2       Topic 3       
##  [1,] "gbd"                       "fluorosis"   "hepatitis"   
##  [2,] "menus"                     "dental"      "introduction"
##  [3,] "cft"                       "tbody"       "genital"     
##  [4,] "agencies"                  "mild"        "warts"       
##  [5,] "sugar"                     "mcgrady"     "c"           
##  [6,] "dph"                       "newcastle"   "hev"         
##  [7,] "council.lancashire.gov.uk" "manchester"  "goals"       
##  [8,] "mgissuehistoryhome.aspx"   "ins"         "policies"    
##  [9,] "iid"                       "ministerial" "reductions"  
## [10,] "papers"                    "higher"      "period"      
##       Topic 4         Topic 5        
##  [1,] "emergency"     "coverage"     
##  [2,] "iso"           "cervical"     
##  [3,] "laboratory"    "oral"         
##  [4,] "secretary"     "incinerators" 
##  [5,] "laboratories"  "interventions"
##  [6,] "accredited"    "black"        
##  [7,] "accreditation" "toolkit"      
##  [8,] "ripl"          "waste"        
##  [9,] "list"          "sharing"      
## [10,] "ukas"          "disadvantaged"
##       Topic 6                                                           
##  [1,] "transformation"                                                  
##  [2,] "collections"                                                     
##  [3,] "liver"                                                           
##  [4,] "green"                                                           
##  [5,] "methods"                                                         
##  [6,] "commissioning-sexual-health-reproductive-health-and-hiv-services"
##  [7,] "multidisciplinary"                                               
##  [8,] "boroughs"                                                        
##  [9,] "nmhin"                                                           
## [10,] "efficient"                                                       
##       Topic 7              Topic 8          Topic 9          
##  [1,] "www.nice.org.uk"    "drug-related"   "hospital"       
##  [2,] "vitamin"            "royal"          "volumes"        
##  [3,] "d"                  "performance"    "symptoms"       
##  [4,] "quit"               "ambitions"      "survival"       
##  [5,] "zika"               "force"          "collect"        
##  [6,] "loss"               "adph"           "trials"         
##  [7,] "www.england.nhs.uk" "continence"     "systemic"       
##  [8,] "pdf"                "hee"            "anti-cancer"    
##  [9,] "mhra"               "pmhlwd"         "confidentiality"
## [10,] "cigarettes"         "accountability" "fund"           
##       Topic 10       
##  [1,] "water"        
##  [2,] "fluoridation" 
##  [3,] "reported"     
##  [4,] "fluoride"     
##  [5,] "reporting"    
##  [6,] "limitations"  
##  [7,] "ncras"        
##  [8,] "trusts"       
##  [9,] "authoritative"
## [10,] "sexually"
tidytext::tidy(myLDAfit20) %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta) %>%
  mutate(term =  str_trunc(term, width = 20, side = "center")) %>%
  #mutate(row = row_number()) %>%
  ggplot(aes(reorder(term, beta), beta)) +
  geom_col() +
  coord_flip() +
  theme(axis.text = element_text(size = rel(.8))) +
  facet_wrap(~topic, scales = "free") +
  labs(x = "terms")

2.10 Extracting tables from published literature

To extract the tabular information from http://jech.bmj.com/content/71/8/827 in a reusable form is complex but potentially useful.

Steps:

  • The simplest way of doing this (and its not that simple) is to use the tabulizer package form rOpenSci. This requires rJava which can be temperamental and might require a series of updates and applying the fix outlined here (for MacOS at least).
  • To identify if fulltext is available you can use the fulltext package. This is also available from rOpenSci and provides an interface to a range of bibliographic databases and through which you can download full text articles.

It can be quite complex to extract a table even with these helpers and may require a fair bit of data tidying (the tidyr package is very helpful here) and use of regular expressions.

This example shows how to extract the data in tables 1 and 2 from Masters R, et al. J Epidemiol Community Health 2017;71:827–834. doi:10.1136/jech-2016-208141

# install.packages("rJava")
library(rJava) # load and attach 'rJava' now
# install.packages("devtools")
# devtools::install_github("ropensci/tabulizer", args="--no-multiarch")
# 
# devtools::install_github("ropensci/tabulizerjars", args="--no-multiarch")
library(tabulizer)
library(tabulizerjars)

tab <- extract_tables("~/Downloads/827.full.pdf")

data.frame(cbind(tab[[2]][3:11,1:4])) %>%
  separate(X1, c("category", "median roi"), sep = "\\s(\\d)") %>%
  separate(X2, remove = FALSE, c("roi min", "to", "roi max",  "no roi studies", "median cbr"), sep = " ") %>%
  select(-c(to, X2)) %>%
  filter(category != "Specialism") %>%
  rename(cbr_range = X3,
         No_CBR_studies = X4 ) %>%
  formattable::formattable(caption = "Table 1, Masters R, et al. J Epidemiol Community Health 2017;71:827–834. doi:10.1136/jech-2016-208141 ")
Table 1, Masters R, et al. J Epidemiol Community Health 2017;71:827–834. doi:10.1136/jech-2016-208141
category median roi roi min roi max no roi studies median cbr cbr_range No_CBR_studies
Overall 4.3 –21.3 221 34 8.3 0.7 to 29.4 23
Local level .1 0.9 19.3 18 10.3 0.9 to 23.6 11
National level 7.2 –21.3 221 17 17 1.2 to 167. 10
Health protection 4.2 0.7 221 8 41.8 1.1 to 167 10
Legislation 6.5 38 55 2 5.8 3 to 8.6 2
Health promotion .2 0.7 6.2 12 14.4 2 to 29.4 3
Healthcare public health .1 1.1 19.3 6 None None reported None reported
Wider determinants .6 1.1 10.8 6 7.1 0.7 to 23.6 6
table <- data.frame(tab[[5]][-c(1:3),1:7])
colnames(table) <- tab[[5]][2,1:7]
table 
##                                                                                          
## 1                                                         Abelson et al10 Hib vaccination
## 2                                                     Abelson et al10 HIV/AIDS prevention
## 3                                                     Abelson et al10 Measles vaccination
## 4                    Abelson et al10 Programmes to reduce rates of coronary heart disease
## 5                                Abelson et al10 Programmes to reduce tobacco consumption
## 6                                                   Abelson et al10 Road safety campaigns
## 7                                     Boccalini et al32 Universal hepatitis B vaccination
## 8                                                                                        
## 9             Bonin et al58 Parenting programmes for the prevention of persistent conduct
## 10                                                                              disorders
## 11                                                             Drummond11 Needle exchange
## 12                               Evans-Lacko et al12 Antistigma social marketing campaign
## 13                                                     Garpenholt et al38 Hib vaccination
## 14                                         Gortmaker et al15 Sugar sweetened beverage tax
## 15    Gortmaker et al15 Eliminating tax subsidy of nutritionally poor food advertising to
## 16                                                                               children
## 17                                                              Gould8 Lead paint control
## 18                                                                                       
## 19 Holtgrave et al34 HIV counselling, testing, referral and partner notification services
## 20                                                Hutchinson et al35 Expanded HIV testing
## 21                                                           Kwon et al36 Needle exchange
## 22                                          Lokkerbol et al55 Telemedicine for depression
## 23                                                                                       
## 24                                               McGuire et al14 Family planning services
## 25                                        Miller et al16 Booster seats for 4–7 years olds
## 26                                                         Nguyen et al37 Needle exchange
## 27                                 Nichol et al7 Influenza vaccination of healthy workers
## 28                                       Romano et al40 Folic acid fortification of grain
## 29                        Trust for America13 Primary and secondary prevention programmes
## 30                                         Wang et al50 Universal school nursing services
## 31                                                          White et al28 MMR vaccination
## 32                           Ding et al30 Hospital-based postpartum influenza vaccination
## 33                                                                                       
## 34                                                           Zhou et al38 Hib vaccination
## 35                                                                                       
## 36                   Hib, haemophilus influenzae type b; MMR, measles, mumps and rubella.
##                  Benefit-cost           Return on              
## 1    Australia           1.06                           Medical
## 2    Australia              4                           Medical
## 3    Australia            167                           Medical
## 4    Australia             11                           Medical
## 5    Australia              2                           Medical
## 6    Australia              3                           Medical
## 7        Italy                              €2.78   Medical and
## 8                                                      societal
## 9      England           7.89                       Medical and
## 10                                                     societal
## 11   Australia            1.2                     Public health
## 12     England                      £0.7 to £1.90       Unclear
## 13      Sweden           1.59                          Societal
## 14         USA                                $55       Medical
## 15         USA                                $38       Medical
## 16                                                             
## 17         USA                        $17 to $221   Medical and
## 18                                                     societal
## 19         USA          20.09                          Societal
## 20         USA                     $1.46 to $2.01 Health sector
## 21   Australia                     $A1.3 to $A5.5 Health sector
## 22         The                     €1.45 to €1.76       Medical
## 23 Netherlands                                                 
## 24          UK 11.09 to 29.39                           Medical
## 25         USA            8.6                           Medical
## 26         USA                              $3.48       Medical
## 27         USA                −$21.27 to +$174.32      Societal
## 28         USA     4.3 to 6.1                     Human capital
## 29         USA                               $6.2       Medical
## 30         USA                              $2.20      Societal
## 31         USA             14                           Medical
## 32         USA                               $1.7   Medical and
## 33                                                     societal
## 34         USA            5.4                       Medical and
## 35                                                     societal
## 36                                                             
##                          Time
## 1              5%    15 years
## 2              5%    25 years
## 3              5%    33 years
## 4              5%    40 years
## 5              5%    40 years
## 6              5%    40 years
## 7              3%    20 years
## 8                            
## 9            3.5%    35 years
## 10                           
## 11             5%    Lifetime
## 12           None      1 year
## 13             5%    20 years
## 14             3%    10 years
## 15             3%    10 years
## 16                           
## 17           None     Unclear
## 18                           
## 19             6%    Lifetime
## 20             3%    Lifetime
## 21             3%    Lifetime
## 22 1.5% costs, 4%     5 years
## 23        effects            
## 24             6%    Lifetime
## 25             3%     3 years
## 26           None      1 year
## 27             3%      1 year
## 28             4%    Lifetime
## 29             0% 10–20 years
## 30           None      1 year
## 31            10%    Lifetime
## 32             3%      1 year
## 33                           
## 34             3%    Lifetime
## 35                           
## 36