PHE publishes a range of information to https://www.gov.uk/phe including blogs, statistics and other publications.
This vignette introduces an internal PHE R package, myScrapers, which includes tools to extract published information from the .gov.uk site. The tools include:
blog_type_extractor, which pulls a data frame of blog topics or authors
phe_blog_scraper, which allows the user to retrieve the text of blogs by topic into a tidy data frame
phe_catalogue, which downloads an interactive table of PHE publications
These tools are intended to help with text mining and analysis of published content, and to assist in developing outputs such as products and services catalogues.
The functions are assembled as an R package which is currently available on GitHub.
The first step is to install the package. This can be achieved with the code below:
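library(devtools)
install_github("julianflowers/myScrapers", build_vignettes = TRUE)
library(myScrapers)
library(dplyr)    # provides %>%, select(), arrange(), group_by() and count()
library(ggplot2)  # provides ggplot(), aes(), geom_col() and labs()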
We can obtain a table of blog topic categories.
ds <- blog_type_extractor(type = "category")
ds %>%
select(topic = `.x[[i]]`) %>%
arrange(topic)
#> topic
#> 1 cko
#> 2 ckoa-single-knowledge-and-intelligence-service
#> 3 ckopublic-health-data-cko
#> 4 ckopublic-health-outcomes-framework
#> 5 data-blog
#> 6 digital
#> 7 duncan-selbie-friday-message
#> 8 global-health
#> 9 health-improvement
#> 10 health-in-a-changing-climate
#> 11 health-matters
#> 12 health-profile-for-england
#> 13 health-visitors
#> 14 hp
#> 15 hpfield-epidemiology-hp
#> 16 hpreal-time-syndromic-surveillance
#> 17 hwb
#> 18 hwballied-health-professionals
#> 19 hwbdementia
#> 20 hwbdigital-health
#> 21 hwbmental-health
#> 22 hwbphysical-activity
#> 23 hwbscreening
#> 24 hwbsexual-health-hwb
#> 25 hwbsocial-marketing
#> 26 local-authority-public-health
#> 27 london-region
#> 28 microbiology-services
#> 29 midlands-and-east-of-england
#> 30 midwifery
#> 31 mythbuster
#> 32 northern-region
#> 33 nursing
#> 34 phe-announcement
#> 35 phe-people
#> 36 phes-science
#> 37 priority1
#> 38 priority2
#> 39 priority2health-economics
#> 40 priority2health-inequalities-priority2
#> 41 priority2healthcare-public-health
#> 42 priority3
#> 43 priority3antimicrobial-resistance
#> 44 priority3climate-change
#> 45 priority3immunisation-and-vaccination
#> 46 priority3sustainability
#> 47 priority4
#> 48 priority5
#> 49 priority6
#> 50 priority7
#> 51 science-hub
#> 52 the-week-at-phe
#> 53 uncategorized
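The category names are URL-style slugs, and several share a prefix (for example hwb or priority3). As a minimal sketch, assuming we want to pick out every category with a given prefix before scraping, base R string matching is enough. The topics vector below is a hand-copied subset of the table above, not output from the package.

```r
# A few category slugs copied from the table above (illustrative subset only)
topics <- c("cko", "data-blog", "hwb", "hwbdementia",
            "hwbmental-health", "priority3climate-change")

# Keep only the slugs that start with the prefix of interest
hwb_topics <- topics[startsWith(topics, "hwb")]
hwb_topics
```

Each of the selected slugs could then be used as the category argument in the next step.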
We can extract blog posts for any given category, e.g.:
cat <- "cko"
blog <- phe_blog_scraper(cat, n = 7)
blog
#> title
#> 1 The National Cancer Diagnosis Audit – What it means for public health
#> 2 The November PHOF update – what have we learned?
#> 3 Introducing a new local authority public health dashboard
#> 4 Using local health data to address health inequalities
#> 5 Routes to Diagnosis – making the most of cancer diagnosis data
#> 6 What can we learn from our August PHOF update?
#> 7 Statistics are important, but some statistics are more important than others!
#> 8 What we learned from our 2017 Cancer Data and Outcomes Conference
#> 9 Developing and using a tool to improve outcomes in colorectal cancer
#> 10 A closer look at our February PHOF update
#> 11 Childhood cancer statistics – what can we learn from new data?
#> 12 Improving understanding of breast cancer survival in black women
#> 13 Health Profiles - 10 years on
#> 14 Cancer: Analysing diagnosis and survival
#> 15 Remember, remember...what we learned about the health of people in England this November
#> 16 Understanding the costs and benefits of investing in cancer
#> 17 Data can help improve care for people with psychosis
#> 18 Health inequalities and the ‘hidden majority’ of adults with learning disabilities
#> 19 More people are living with and beyond cancer than ever before
#> 20 Chemotherapy is powerful stuff but data is too
#> 21 Public health data: Going for gold
#> 22 New cancer dashboards - So much data; enough information?
#> 23 Rare Disease Day- why a patient voice is central to rare disease registration at PHE
#> 24 Answering the cancer commissioning questions that really matter
#> 25 Understanding cancer- the importance of patient data
#> 26 How and when is cancer diagnosed?
#> 27 Making sense of data: a challenge and a responsibility
#> 28 PHE Data Week: Putting patients at the heart of big data
#> 29 Making infographics work for you
#> 30 PHE Data Week: Digital and data science working together
#> 31 Immunisation in numbers – 5 fascinating facts
#> 32 Pulling it all together: a health intelligence network approach to young people’s data
#> 33 An A-Z of Public Health Data Science
#> 34 Big data in action: The story behind Routes to Diagnosis
#> 35 Data in action – using data to identify areas for improvement
#> 36 PHE Data Week: Our data at your fingertips - revisited
#> 37 PHE Data Week: Big data, data science and public health
#> 38 Variations in healthcare – how can we begin to tackle them?
#> 39 The route to earlier diagnosis
#> 40 GBD Compare: A new data tool for professionals
#> 41 The burden of disease and what it means in England
#> 42 Cancer and equality groups
#> 43 The Patient Portal: offering cancer patients access to their own records
#> 44 United against cancer – locally, nationally and internationally
#> 45 How can we make a contribution to work on rare diseases?
#> 46 Using small area data for local planning on the health of children
#> 47 Are older people with cancer treated differently?
#> 48 #DataBlog: Good news on smoking
#> 49 How does UK diabetes care compare to other European countries?
#> 50 #Datablog: Our data at your fingertips
#> 51 Healthier Lives: Making it easier to prevent and manage diabetes
#> 52 Data blog: Deaths in the 21st century - 12 years, 1300 causes, 6 million deaths
#> 53 Cancer registration as good as the world's best
#> 54 Just how close is ICT to the frontline these days?
#> 55 Data blog: We are what we eat
#> 56 Fighting congenital anomalies and rare diseases with information
#> 57 Saving lives with primary care data
#> 58 Life expectancy continues to rise, but inequalities remain
#> 59 Beyond big data: Bringing people together to improve cancer outcomes
#> 60 Getting better all the time
#> 61 The ordinary person: measuring height and weight in adults
#> 62 Fighting cancer with information
#> 63 Understanding alcohol-related hospital admissions
#> 64 Strengthening our intelligence
#> 65 Of RAGs and riches: indicators of public health in the Public Health Outcomes Framework
#> 66 Continuing the role of public health observatories
#> 67 Information as an intervention
#> text
#> 1 Cancer remains one of the leading causes of death, claiming thousands of lives every year. Today, the results from the National Cancer Diagnosis Audit have been published, which details and reviews the current process of care for cancer patients from …
#> 2 This is the fourth in a series of blogs summarising what we learn each time we update the Public Health Outcomes Framework (PHOF).
#> 3 As part of a wider government commitment to support greater transparency across the public sector, PHE has published a new local authority public health dashboard. Our Deputy Chief Executive and Chief Operating Officer Richard Gleave answers questions about the project: …
#> 4 Local Health provides indicators for small geographies (electoral wards and middle super output areas) which allow users to look at variation within larger areas, such as local authorities and Clinical Commissioning Groups (CCGs).
#> 5 This week marks the 4th update to “Routes to Diagnosis”, a key part of England’s efforts to improve cancer survival.
#> 6 This is the third in a series of blogs summarising what we learn each time we update the Public Health Outcomes Framework (PHOF).
#> 7 Statistics are important, but some statistics are more important than others! Today’s release of cancer survival statistics from PHE and the Office for National Statistics is one of the more important ones.
#> 8 Last week we were in Manchester for the National Cancer Registration and Analysis Service’s (NCRAS) annual Cancer Data and Outcomes Conference. The conference was a huge success and we welcomed over 400 delegates from 160 different organisations. The two-day event saw …
#> 9 With rising demand for health services putting pressure on resources, it is more important than ever to make sure services are cost-effective and efficient. Sharing knowledge and experience of new and innovative ways of working can help make sure the …
#> 10 This is the second in a series of blogs summarising what we learn each time we update the Public Health Outcomes Framework (PHOF).
#> 11 Cancer in children under the age of 15 is rare and accounts for less than 1% of all new cancer cases in England. Around 1,400 children are diagnosed with cancer each year.
#> 12 Recently the British Journal of Cancer published a study on the short-term breast cancer survival in England in relation to ethnicity and key tumour characteristics. It shows that the excess mortality among black women with breast cancer is attributable to …
#> 13 Have you used our Health Profiles to learn more about your area or inform your planning? For 10 years these reports – one for every local authority in England – have acted as conversation starters highlighting issues that can affect …
#> 14 Cancer survival has been steadily improving in most cancers for many years and we can help this trend continue in two broad ways. Firstly we can improve the stage at which a cancer is diagnosed (the earlier the better) through …
#> 15 If we want to improve people’s health we have to know both where we stand now, but also be able to track our progress. At PHE, we have a wealth of data on a wide variety of health topics. Part …
#> 16 Understanding the costs associated with cancer is vital in order that resources are used for maximum effect. And investing in prevention and early intervention is critical if we are going to reduce the emotional and physical impact of cancer on …
#> 17 PHE’s National Mental Health Intelligence Network (NMHIN) has published a report that reveals major variation in need and delivery of care for people with psychosis across England. It aims to help those involved locally in the planning, commissioning and provision of …
#> 18 A number of recent studies have shown that a ‘hidden majority’ of adults identified in childhood as having a learning disability are not identified as such within adult heath or social care services. The studies analysed data from the Understanding …
#> 19 Today at the PHE conference the National Cancer Registration and Analysis Service (NCRAS), in partnership with Macmillan Cancer Support, released new figures showing that there were 1.87 million people living with and beyond a cancer diagnosis in England at the …
#> 20 Today sees the publication of a paper in the journal The Lancet Oncology written by PHE and Cancer Research UK on 30-day mortality following chemotherapy on patients with breast and non-small cell lung cancer treated in England in 2014. The …
#> 21 The Olympics have shown us how four years of training, dedication, preparation and sheer hard-work can be distilled to a millisecond, an eighth of an inch or even the arbitrary whim of judges. Like most of us who were engrossed …
#> 22 This week we launched the dedicated online dashboard of cancer-related information. Released alongside the Cancer Strategy Implementation Plan, it was developed by PHE and NHS England to meet recommendation number 1 of the Independent Cancer Taskforce Report published last July. …
#> 23 The 29th of February will mark the 9th International Rare Disease Day, followed shortly afterwards by World Birth Defects Day on the 3rd of March. Hundreds of events to mark these two important days have been organised in over 80 …
#> 24 Can we improve cancer outcomes through the use of robots? How can we commission services differently to reduce variation in cancer outcomes? How can we implement a new local model of care for head and neck cancers to improve outcomes? …
#> 25 We all know someone close to us who has had cancer, and by 2020 nearly half of us born in 1960 will be living with a diagnosis of cancer ourselves. Our ability to detect cancer earlier, to understand why some …
#> 26 Last week Cancer Research UK released a press release showing that bowel cancer is more likely to be diagnosed at the earliest stage if it is picked up by screening. This used new PHE data that, for the first time, …
#> 27 #PHEDataWeek has been running across our social media channels this week. It has been a great opportunity for us to talk about what data means to us and try and help people understand just how important it is to our …
#> 28 Last week was the annual scientific conference of the National Cancer Research Institute (NCRI) at the BT Convention Centre in Liverpool, showcasing the latest basic, translational and clinical cancer research, with over 1,500 delegates. It was significant to see the …
#> 29 We're used to dealing with large numbers in public health: thousands of people, billions of pounds and terabytes of data. Historically these have been processed and interpreted by analysts into reports but now we are witnessing an increasing demand for …
#> 30 Digital services use technology to connect people with data and information. Both the digital and data science teams at Public Health England are adopting digital methods to make public health data more accessible and easy to interpret. User needs research …
#> 31 Whichever way you look at it, data around vaccines is both fascinating and inspiring. Immunisation programmes are one of the world’s public health triumphs with many millions of lives saved since the first vaccines were introduced. In this blog we …
#> 32 One in five people living in the UK are aged between 10 and 24 and we know that what happens in childhood plays a pivotal role in future health and happiness - this is why giving the best start in …
#> 33 This is a new language that all public health folks will require. There are lots of encyclopedias for Big Data: Dutch company Datfloq do a good one and the Big Data Made Simple portal has a useful one. What the …
#> 34 Today we published new data showing the different diagnosis routes for people with cancer. We have used the latest available data to produce an extensive set of results for over 2 million patients diagnosed with cancer between 2006 and 2013. …
#> 35 Public Health England and national partners are striving to ensure that data and information is used to improve quality of care and outcomes for people with cardiovascular disease. The interpretation of data and information should be acknowledged as a fundamental …
#> 36 This time last year I wrote about some of the data tools PHE produces to help local authorities, the NHS and others improve the nation’s health and reduce health inequality. We make many of these available via the updated GOV.UK data …
#> 37 Welcome to a week of activity - focussing on data - promoted across our social media channels. #PHEDataWeek looks to shine a light on data and move beyond the buzzwords to talk in depth about the importance of data to health protection, prevention …
#> 38 We all know that provision of healthcare varies from one hospital to another, from one GP to another and from one part of the country to another, but what causes these variations to occur? Healthcare can vary in a number …
#> 39 Today at the annual PHE conference we announced new data showing that cancers in England are being diagnosed earlier. The proportion of people with cancer diagnosed as an emergency has fallen, despite a rise in the overall numbers of cases. …
#> 40 This Q&A blog introduces ‘GBD Compare England” a new data visualisation tool created as part of the Global Burden of Disease project. PHE’s Chief Knowledge Officer John Newton has blogged about the findings of this major piece of research and …
#> 41 From the very beginning of Public Health England, we have been involved in a world-wide project to bring together the widest possible range of knowledge about health and its determinants. The Global Burden of Disease project involves more than 1,000 …
#> 42 The Cancer Taskforce launched its strategy this week. The strategy considers how to deliver better prevention, swifter diagnosis and improved treatment and care for all cancer patients. PHE played a significant part in its development. All of the priorities highlighted …
#> 43 The National Cancer Registration Service (NCRS) collects information about every patient diagnosed with a malignant tumour in England. You can read our blog on the importance of using this data. Now, through an innovative project, patients can also access this …
#> 44 This week marks the start of National Cancer Intelligence Network’s (NCIN) annual Cancer Outcomes Conference; one of the largest and most exciting cancer conferences in the UK. It is not only where new findings on cancer are launched and presented, but …
#> 45 The 28 February 2015 marks the 8th International Rare Disease Day, the aim of which is to raise awareness amongst policy makers and the public about rare diseases and their impact on patients' lives. There are over 6000 recognised rare conditions and in …
#> 46 Over the years, we in PHE have had many requests for data for small geographical areas, particularly for electoral wards. Through our work leading the National Child and Maternal (ChiMat) Health Intelligence Network we have seen that while those working …
#> 47 Today is World Cancer Day. People care about cancer and our PHE public opinions survey showed that cancer is the public’s number one health concern. We are lucky in England to have excellent cancer services and to be seeing improving …
#> 48 The recent publication of the ONS reports on Adult Smoking Habits in Great Britain and the Integrated Household Survey results for 2013 brings welcome news on smoking. Adult smoking rates are at their lowest ever, below 19% based on the …
#> 49 We are fortunate to have a number of data sources that allow us to track changes in the care and outcomes for people with diabetes and identify local variation across England. These include the National Diabetes Audit which provides an …
#> 50 Public Health England was created from a large number of organisations and we inherited a lot of data tools and profiles, many of which are accessible via our data gateway. The number of resources by broad category is shown in the graph …
#> 51 Today, we’re launching Healthier Lives, Diabetes: a new tool to track how we’re tackling diabetes in different areas of England. What’s new is the way it illustrates the risk factors for and care of people with diabetes across different communities …
#> 52 Next month the premature and preventable mortality data in the Public Health Outcomes Framework and in Longer Lives is being updated and refreshed. Mortality and dying data is a subject that generates much debate and mortality statistics are often in the news. …
#> 53 Today PHE released the most accurate national one-year cancer survival figures ever achieved for five cancer sites on cases diagnosed in 2012. This is a historic milestone and marks one of the greatest achievements so far of the new National Cancer Registration …
#> 54 Anyone who has read some of the earlier entries in this blog cannot fail to be impressed by the speed at which our public health science is evolving and the rate at which our medical microbiology is modernising. Of course, …
#> 55 Clive Humby, one of the brains behind Tesco Clubcards, said “Data is the new oil,” by which he meant that there is commercial value in the exploitation of the data that is collected by business. David McCandless, author of Information …
#> 56 We hear a great deal about the common illnesses that affect many people such as heart disease, diabetes and cancer. However, we hear much less on rare diseases, each of which affects relatively small numbers of people. Nevertheless, if you …
#> 57 We are very fortunate in England to have some of the best primary care services in the world. We also have amongst the most computerised general practices, and primary care activity and clinical information is recorded in real time at …
#> 58 Since life expectancy was first measured in the mid-19th century the trend in England has been of continued increase, interrupted only by the World Wars. Despite this, people in some areas of the country are still not living nearly as …
#> 59 It was just a year ago at the 2013 Cancer Outcomes Conference that we announced the completion of the migration to a single National Cancer Registration Service - described by the media as “the largest single cancer database in the …
#> 60 It’s nearly June. That means many things to many people – summer, school holidays, long evenings… and to some, Health Profiles. Since 2006, these summaries of health data for each local authority in England have been produced to support local …
#> 61 “To tall men I’m a midget and to short men I’m a giant; to the skinny ones I’m a fat man and to the fat ones I’m a thin man….. In fact I’m quite ordinary.” So says the Ordinary Man …
#> 62 We’re at an exciting time for cancer registration information, taking one step towards a single national cancer registration system. In November, the data from the last of our regional cancer registries was brought into the single processing system, Encore, at …
#> 63 Alcohol is England’s second biggest cause of premature deaths behind tobacco. 34 per cent of men and 28 per cent of women exceeded current consumption guidelines on at least one day in the last week. Public Health England, in partnership …
#> 64 English philosopher Sir Francis Bacon once said, “Knowledge is power”. The more of it we have, the better informed and equipped we are to address any issues and drive improvements. And this too is true for health. Within Public Health …
#> 65 “Are we there yet?” You don’t have to travel very far with small children before you are asked this question. In fact, a survey by Littlewoods.com in March 2013 reported that children ask their mothers around 300 questions every day …
#> 66 On 1 April 2013 the regional Public Health Observatories (PHOs) transferred along with the specialist observatories and the National Cancer Intelligence Network into PHE as a single Knowledge and Intelligence service for England. It was, and still is, a momentous …
#> 67 A question I often ask, as I talk to meetings and groups, is “which of the following is the commonest cause of premature death in your area?” Cancer Cardiovascular disease (heart disease and stroke). Liver disease Lung disease What do …
#> date category
#> 1 2017-12-19 cko
#> 2 2017-11-07 cko
#> 3 2017-10-16 cko
#> 4 2017-10-12 cko
#> 5 2017-09-12 cko
#> 6 2017-08-01 cko
#> 7 2017-06-29 cko
#> 8 2017-06-23 cko
#> 9 2017-03-27 cko
#> 10 2017-02-07 cko
#> 11 2017-02-01 cko
#> 12 2016-12-08 cko
#> 13 2016-11-28 cko
#> 14 2016-11-14 cko
#> 15 2016-11-07 cko
#> 16 2016-11-01 cko
#> 17 2016-10-17 cko
#> 18 2016-10-04 cko
#> 19 2016-09-13 cko
#> 20 2016-08-31 cko
#> 21 2016-08-25 cko
#> 22 2016-05-13 cko
#> 23 2016-02-29 cko
#> 24 2016-02-11 cko
#> 25 2016-02-04 cko
#> 26 2016-01-25 cko
#> 27 2015-11-13 cko
#> 28 2015-11-13 cko
#> 29 2015-11-13 cko
#> 30 2015-11-12 cko
#> 31 2015-11-12 cko
#> 32 2015-11-11 cko
#> 33 2015-11-11 cko
#> 34 2015-11-10 cko
#> 35 2015-11-10 cko
#> 36 2015-11-09 cko
#> 37 2015-11-09 cko
#> 38 2015-09-18 cko
#> 39 2015-09-16 cko
#> 40 2015-09-15 cko
#> 41 2015-09-15 cko
#> 42 2015-07-23 cko
#> 43 2015-06-09 cko
#> 44 2015-06-08 cko
#> 45 2015-02-27 cko
#> 46 2015-02-23 cko
#> 47 2015-02-04 cko
#> 48 2014-12-11 cko
#> 49 2014-12-04 cko
#> 50 2014-11-13 cko
#> 51 2014-10-30 cko
#> 52 2014-10-09 cko
#> 53 2014-08-26 cko
#> 54 2014-08-20 cko
#> 55 2014-08-07 cko
#> 56 2014-07-23 cko
#> 57 2014-06-12 cko
#> 58 2014-06-11 cko
#> 59 2014-06-09 cko
#> 60 2014-05-27 cko
#> 61 2014-03-05 cko
#> 62 2014-01-30 cko
#> 63 2014-01-15 cko
#> 64 2013-12-30 cko
#> 65 2013-11-13 cko
#> 66 2013-10-28 cko
#> 67 2013-09-25 cko
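Because the result is an ordinary data frame with title, text, date and category columns, it can be passed straight to text-mining tools. The following is a minimal sketch of a word-frequency count in base R. The blog_demo data frame is a mock stand-in built from two of the rows printed above, so it runs without the package or a network connection. A real analysis would probably also remove stop words.

```r
# Mock data standing in for the output of phe_blog_scraper();
# column names follow the printed result above, rows are copied from it.
blog_demo <- data.frame(
  title = c("The November PHOF update - what have we learned?",
            "What can we learn from our August PHOF update?"),
  text = c(
    "This is the fourth in a series of blogs summarising what we learn each time we update the Public Health Outcomes Framework (PHOF).",
    "This is the third in a series of blogs summarising what we learn each time we update the Public Health Outcomes Framework (PHOF)."
  ),
  stringsAsFactors = FALSE
)

# Split each text into lower-case words, treating anything that is not a letter as a separator
words <- unlist(strsplit(tolower(blog_demo$text), "[^a-z]+"))
words <- words[nchar(words) > 0]  # drop any empty tokens

# Count occurrences and list the most frequent words first
word_counts <- sort(table(words), decreasing = TRUE)
head(word_counts, 5)
```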
We can then undertake simple analysis, for example plotting the number of posts per quarter:
blog %>%
group_by(quarter = zoo::as.yearqtr(date)) %>%
count() %>%
ggplot(aes(quarter, n)) +
geom_col() +
labs(title = "CKO blogs - quarterly count") +
govstyle::theme_gov()
#> Don't know how to automatically pick scale for object of type yearqtr. Defaulting to continuous.
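The message above appears because the quarter column has class yearqtr, for which ggplot2 has no default scale. One possible way to avoid it, shown here only as a sketch, is to build the quarter labels as plain character strings with base R, which ggplot2 would treat as a discrete axis. The dates vector is a mock stand-in for blog$date, using a few values copied from the output above.

```r
# Mock dates standing in for blog$date (values copied from the output above)
dates <- as.Date(c("2017-12-19", "2017-11-07", "2017-10-16",
                   "2017-09-12", "2017-08-01", "2017-06-29"))

# Label each date with its calendar year and quarter, e.g. "2017 Q4"
quarter_label <- paste(format(dates, "%Y"), quarters(dates))

# Count posts per quarter
quarter_counts <- table(quarter_label)
quarter_counts
```

Labels in this form sort correctly as text because the year comes first.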
PHE is planning to create a products and services catalogue. To assist this process we have written a function which produces an interactive table of all the PHE publications by category (NB at the moment it is over-inclusive). This makes use of the DT package, which allows us to add download options so the data can be downloaded in various formats.
phe_catalogue(url = "https://www.gov.uk/government/publications?departments%5B%5D=public-health-england", pages = 2:90)