License: MIT

Background

During the 2020 COVID-19 pandemic there was a call for help with collating information about hospital resources in South Africa to assist the local response. For more information, see this issue created by volunteers working on a COVID19-ZA dashboard with the Data Science for Social Impact group at the University of Pretoria.

My own interest in the question of health facilities is two-fold:

Collecting open hospital data sets from the Web

A wide variety of health facility web portals and datasets is available online. Some portals do not allow data to be downloaded, for example the Department of Health’s Primary Health Care Facilities and Services page. Wikidata, the central storage for structured data of its Wikimedia sister projects, hosts a project named list of hospitals in South Africa, which includes 208 facilities. Unfortunately, the record for each facility is very sparsely populated.

The District Health Barometer report data for 2017/2018 was not considered for further analysis due to the inaccessible formatting of the tables in the spreadsheet.

The following potentially useful sources with downloadable data were identified.

| Name | Short Name | Information | Admin Level | Web | Raw Data | Data License | Origin/Owner | Last Updated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Geographical maldistribution of surgical resources in South Africa: A review of the number of hospitals, hospital beds and surgical beds | Hospital Bed | Facility names, number of beds and surgeons | District Municipality (3) | Article | Figshare | CC-BY-4.0 | Dr. Angela Dell | Mar 2016 |
| District Health Barometer info 2016 2017 06 Feb 2018 | DoH Health Barometer 2016/2017 | Facility names, type | District Municipality (3) | Report | Spreadsheet | No explicit license | South African National Department of Health | Feb 2018 |
| District Health Barometer 2018/2019 | HST Health Barometer 2018/2019 | Facility names, type, date opened, coords, date closed | District Municipality (3) | Report | Spreadsheet | No explicit license | Health Systems Trust | Feb 2020 |
| National Department of Health Data Dictionary | DoH Data Dictionary | Facility names, addresses, coordinates, type, rural/urban, ownership (e.g. national/provincial/private) | Local Municipality (4) | Data Repository | No direct link available - select Download on page and select ‘Level 5’ data | No explicit license | South African National Department of Health | Aug 2019 |
| Healthsites.io | Healthsites.io | Various - information collected through crowdsourcing | Depends on crowdsourced contribution | Homepage | API access or shapefile | CC-BY-4.0 | Crowdsourcing | Mar 2020 |
| KEMRI/WHO: A spatial database of health facilities managed by the public health sector in sub-Saharan Africa | KEMRI/WHO | Facility names, type, ownership, coordinates, source | Province (2) | Article | Spreadsheet | No explicit license - assumed CC-BY-4.0 based on article | Hosted by WHO Global Malaria Program/Collected by KEMRI | Feb 2019 |

Data cleaning

Most datasets required some cleaning before they could be processed programmatically.

Hospital Bed Data

Raw data format

The data was made available in an Excel spreadsheet format with separate sheets for every province’s private and public health facilities.

Hospital Bed Data screenshot

Raw data to tidy data

The readme file displayed below shows the steps taken to convert the raw Hospital Bed dataset to tidy data. The data was not extracted programmatically due to the variable format of the tables in each sheet.

Source: https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711
New file: za_hospital_resources.csv
Steps to recreate:

1.  Create new file with two sheets - one for public hospital information and one for private hospital information
2.  Copy the primary table in each sheet into the relevant sheet in the new file (each province one under the other to create a single table in each sheet)
3.  Insert column for province
4.  Insert column for public/private
5.  Remove secondary phone numbers for ease of analysis
6.  Replace tel: with ‘ to keep 0 at beginning of phone numbers (some tel: has 1 space between colon and 0)
7.  Remove spaces from phone numbers for ease of comparison later on
8.  Remove coordinates for GP hospitals as it was for regions not individual hospitals
9.  Change kz Matatiele Private hospital from EC private hospitals to ec Matatiele Private hospital (Googled to confirm it is in EC)
10. Change ns Mediclinic Kimberley to NC Mediclinic Kimberley
11. Remove preceding province code from hospital names with replace ^\w\w\s
12. Remove all districts with nil private hospitals
13. Remove coordinate columns (will add later)
14. Change column headers for public hospitals:
  PROVINCE,province
  REGION,region
  GPS,[removed]
  HOSPITAL TYPE,hosp_class
  HOSPITAL,hosp_name
  USABLE BEDS,beds_usable
  APPROVED BEDS,beds_approved
  USABLE SB,beds_surgical_usable
  APP SURG BEDS,beds_surgical_approved
  SURGEONS (QUAL),surgeons_qualified
  SURGEONS (UNQUAL),sugeons_unqualified
  THEATRES,theatres
  CONTACT,hosp_contact
  TYPE,hosp_type
15. Change column headers for private hospitals:
  PROVINCE,province
  GPS,[removed]
  REGION,region
  HOSPITAL TYPE,hosp_class
  HOSPITAL,hosp_name
  USABLE BEDS,beds_usable
  USE SUR BED,beds_surgical_usable
  THEATRES,theatres
  TYPE,hosp_type
16. Combine private/public data into a single spreadsheet
17. Add columns for source_date_day, source_date_month, source_date_year to keep track of when data was last updated
18. Add source column with link to original dataset in Figshare
19. Add column for source_name, source_surname
20. Add column for source_email
21. Add column for source_phone
22. Export to za_hospital_resources.csv
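
The phone number and facility name steps in the readme above are plain text substitutions. The original cleaning was done by hand in a spreadsheet, so the following Python sketch only illustrates the same rules. The function names and the sample phone number are hypothetical, and matching "tel:" case-insensitively is an assumption.

```python
import re


def clean_phone(raw):
    """Drop a leading 'tel:' label and all whitespace, keeping the leading 0."""
    # 'tel:' may or may not be followed by a space before the number.
    number = re.sub(r"^\s*tel:\s*", "", raw, flags=re.IGNORECASE)
    # Remove the remaining spaces so that numbers can be compared later on.
    return re.sub(r"\s+", "", number)


def strip_province_prefix(name):
    """Remove a two-character province code and the single space after it."""
    # Same pattern as in the readme: ^\w\w\s
    return re.sub(r"^\w\w\s", "", name)


if __name__ == "__main__":
    print(clean_phone("tel: 011 123 4567"))  # made-up number
    print(strip_province_prefix("ec Matatiele Private hospital"))
```

Keeping the number as a string is what preserves the leading 0 here, which is the role the leading ‘ plays in the spreadsheet. Note that the pattern ^\w\w\s also matches a facility name that starts with any two-letter word (for example "St"), so it should only be applied to names that are known to carry a province code.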

Tidy data

| prov_abb | ou3short | type | fac_name | beds_usable | beds_approved | beds_surgical_usable | beds_surgical_approved | surgeons_qualified | sugeons_unqualified | theatres | hosp_contact | sector | last_update_day | last_updated_month | last_updated_year | source | source_name | source_surname | source_email | source_phone |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GP | Ekurhuleni MM | Independent | Actonville/Sunshine Hospital | 200 | NA | 5 | NA | NA | NA | 5 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
| GP | Ekurhuleni MM | Life | Life Bedford Gardens Private Hospital | 140 | NA | 31 | NA | NA | NA | 6 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
| GP | Ekurhuleni MM | Independent | Birchmed Surgical Centre | 21 | NA | 21 | NA | NA | NA | 3 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
| GP | Ekurhuleni MM | Clinix | Clinix Private Hospital Vosloorus/ Botshelong-Empilweni | 104 | NA | 40 | NA | NA | NA | 3 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
| GP | Ekurhuleni MM | Netcare | Clinton Clinic Netcare Hospital | 165 | NA | 99 | NA | NA | NA | 5 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
| GP | Ekurhuleni MM | Life | Life Dalview Clinic Hospital | 75 | NA | 27 | NA | NA | NA | 4 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |

DoH District Health Barometer Data 2016/2017

Raw data format

The data was made available in an Excel spreadsheet format with separate sheets for a variety of definitions, measurements, and summaries. The health facility list was stored in a sheet called ‘Hospitals’. District codes are available in this sheet, but the full district (or municipality) names have to be taken from a sheet called ‘Seq’.

Hospital Bed Data screenshot

District full names for decoding

Raw data to tidy data

The readme file displayed below shows the steps taken to convert the raw DoH Health Barometer hospital dataset to tidy data. The data was not extracted programmatically due to the variable format of the tables in each sheet.

Source: File = http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017# District HealthBarometer info 2016 2017 06 Feb 2018 (a spreadsheet)
Sheet:   ‘Hospitals’
New temp files: za_hospital_list_temp.csv, za_hospital_list_refine.csv
New final file: za_hospital_list_DoH.csv

Steps to get from Source file to New file:

1.  In sheet = ‘Hospitals’ select OrgUnitCategor = ‘All’ and Level = ‘All’ in  http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017# District HealthBarometer info 2016 2017 06 Feb 2018 (a spreadsheet)
2.  Copy/paste table (from row 8 – 9346, column A – D)
3.  Add columns for day/month/year last updated
4.  Insert 6 February 2018 as last updated date (from file name)
5.  Add columns for source information
6.  Add columns with information about copyright ownership
7.  Export za_hospital_list_temp.csv for import into OpenRefine 3.2
8.  Open za_hospital_list_temp.csv in OpenRefine 3.2
9a) Rename project to za_hospital_list_refine
9b) In the Undo/Redo tab, select ‘Apply…’ to apply za_hospital_list_temp_to_refine.json
9c) Paste the contents of za_hospital_list_temp_to_refine.json and click on ‘Perform Operations’
  ** JSON script will do the following automatically
10a)  Remove rows that contain provincial totals
10b)  Fill cells down so that every cell has province, district, orgunittype
10c)  Change column header names
10d)  Remove leading province code from facility names
11. Export za_hospital_list_refine.csv from OpenRefine
12. Run za_hospital_list_refine_to_DoH.R in RStudio
  ** R script will do the following:
13a)  Load za_hospital_list_refine.csv and za_hospital_district_names.csv (see readme_za_hospital_district_names.txt)
13b)  Merge the two tables in the files to create a new table with all the columns from za_hospital_list_refine.csv and a new column “district_name” from za_hospital_district_names.csv
13c)  Re-order the columns to make sense
13d) Export a new CSV file called za_hospital_list_DoH.csv
14. The final file contains the following columns:
  province  :  Province
  district_mdb  :  District code (from original source file)
  district_name : District name (from original source file)
  org_unit_type : Organisational Unit Type e.g. clinic, Community Day Centre, etc
  facility_name : Hospital/clinic etc name
  date_updated_day  :  From source file name
  date_updated_month  :  From source file name
  date_updated_year : From source file name
  source  :  URL for source file
  source_name : Organisation that provided the file (South African Dep of Health)
  copyright : Who owns the data if not in public domain or under open license
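
The operations that the JSON script performs in OpenRefine (steps 10a, 10b and 10d above) can also be expressed in a few lines of code. The following is a Python sketch and not the OpenRefine script itself. The column names are taken from the final column list above, while the rule for recognising a totals row (the word "total" in the facility cell) and the sample rows are assumptions.

```python
import re


def fill_down(rows, columns):
    """Copy the last non-empty value of each listed column into blank cells below it."""
    last_seen = {}
    filled = []
    for row in rows:
        row = dict(row)  # work on a copy so the input is not modified
        for col in columns:
            if row.get(col):
                last_seen[col] = row[col]
            else:
                row[col] = last_seen.get(col, "")
        filled.append(row)
    return filled


def tidy_facilities(rows):
    # 10a: drop totals rows (assumed to contain the word 'total' in the facility cell)
    kept = [r for r in rows if "total" not in r["facility_name"].lower()]
    # 10b: make sure every row carries province, district and org unit type
    kept = fill_down(kept, ["province", "district_mdb", "org_unit_type"])
    # 10d: remove the leading two-character province code from facility names
    for row in kept:
        row["facility_name"] = re.sub(r"^\w\w\s", "", row["facility_name"])
    return kept


if __name__ == "__main__":
    sample = [  # hypothetical rows in the shape of the 'Hospitals' sheet
        {"province": "EC", "district_mdb": "BUF", "org_unit_type": "Clinic",
         "facility_name": "ec Alphendale Clinic"},
        {"province": "", "district_mdb": "", "org_unit_type": "",
         "facility_name": "ec Amahleke Clinic"},
        {"province": "", "district_mdb": "", "org_unit_type": "",
         "facility_name": "EC Total"},
    ]
    for row in tidy_facilities(sample):
        print(row)
```

If the totals rows in the real sheet are the only rows that carry the province value, the fill-down would have to run before they are removed.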

To clean the district names sheet for use in a join with the Hospitals data the following steps were followed:

Source: File = http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017# District HealthBarometer info 2016 2017 06 Feb 2018 (a spreadsheet)
Sheet:  ‘Seq’
New File: za_hospital_district_names.csv

Steps to get from Source file to New file:

1.  Copy/paste table (from row 5 – 57, column A – B)
2.  Fill province column manually
3.  Split column B on : to separate district name from district code
4.  Rename columns
5.  Export as CSV to join with za_hospital_list_refine.csv exported from OpenRefine (see readme_za_hospital_list.txt)
6.  Follow steps in readme_za_hospital_list.txt

To join the hospital data with the district names data, a short R script was written and is included in this file.
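
For readers who do not use R, the split-and-join logic can be sketched in Python as follows. This is not the R script referred to above. The "code: name" order of the sample string, the function names and the choice of a left join are assumptions; R's merge() performs an inner join unless all.x = TRUE is set.

```python
def split_district(cell):
    """Split a 'left: right' cell on the first colon and trim both parts."""
    left, _, right = cell.partition(":")
    return left.strip(), right.strip()


def add_district_names(facilities, district_names):
    """Left join: keep every facility row; unmatched codes get an empty name."""
    return [
        dict(row, district_name=district_names.get(row["district_mdb"], ""))
        for row in facilities
    ]


if __name__ == "__main__":
    # Hypothetical 'Seq' cell; which side holds the code is an assumption.
    code, name = split_district("BUF: Buffalo City")
    lookup = {code: name}
    facilities = [{"district_mdb": "BUF", "facility_name": "Alphendale Clinic"}]
    print(add_district_names(facilities, lookup))
```

A left join makes unmatched district codes visible as empty names instead of silently dropping those facilities.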

Tidy data

| prov_abb | ou3abb | ou3short | type | fac_name | date_updated_day | date_updated_month | date_updated_year | source | source_name | copyright |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EC | BUF | Buffalo City | Clinic | Alphendale Clinic | 6 | 2 | 2018 | http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017?download=2652:district-health-barometer-info-2016-2017-06-feb-2018 | South African Department of Health | South African Department of Health |
| EC | BUF | Buffalo City | Clinic | Amahleke Clinic | 6 | 2 | 2018 | http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017?download=2652:district-health-barometer-info-2016-2017-06-feb-2018 | South African Department of Health | South African Department of Health |
| EC | BUF | Buffalo City | Clinic | Amatola TB Clinic | 6 | 2 | 2018 | http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017?download=2652:district-health-barometer-info-2016-2017-06-feb-2018 | South African Department of Health | South African Department of Health |
| EC | BUF | Buffalo City | Clinic | Aspiranza Clinic | 6 | 2 | 2018 | http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017?download=2652:district-health-barometer-info-2016-2017-06-feb-2018 | South African Department of Health | South African Department of Health |
| EC | BUF | Buffalo City | Clinic | Beacon Bay Clinic | 6 | 2 | 2018 | http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017?download=2652:district-health-barometer-info-2016-2017-06-feb-2018 | South African Department of Health | South African Department of Health |
| EC | BUF | Buffalo City | Clinic | Berlin Clinic | 6 | 2 | 2018 | http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017?download=2652:district-health-barometer-info-2016-2017-06-feb-2018 | South African Department of Health | South African Department of Health |

HST District Health Barometer Data 2018/2019

Raw data format

The data was made available in Excel Binary Workbook format (.xlsb) with separate sheets for a variety of definitions, measurements, and summaries. Health facility information was available from a sheet called ‘Fac_list’.

HST District Health Barometer Data screenshot

Tidy data

Data cleaning was performed in R (the code is included in this document).

| fac_name | type | org_level | prov_abb | province | ou3_short | date_close | date_open | lat | long | comment | ou4short |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Aberdeen Hospital | District Hospital | DH | EC | Eastern Cape | DC10 | 9999-12-31 | 1994-01-01 | -32.48621 | 24.06093 | | EC101 |
| Adelaide Hospital | District Hospital | DH | EC | Eastern Cape | DC12 | 9999-12-31 | 1994-01-01 | -32.70092 | 26.29427 | | EC129 |
| Aliwal North Hospital | District Hospital | DH | EC | Eastern Cape | DC14 | 9999-12-31 | 1994-01-01 | -30.69698 | 26.70719 | | EC145 |
| All Saints Hospital | District Hospital | DH | EC | Eastern Cape | DC13 | 9999-12-31 | 1994-01-01 | -31.66197 | 28.05041 | | EC137 |
| Andries Vosloo Hospital | District Hospital | DH | EC | Eastern Cape | DC10 | 9999-12-31 | 1994-01-01 | -32.72181 | 25.59525 | | EC102 |
| Bambisana Hospital | District Hospital | DH | EC | Eastern Cape | DC15 | 9999-12-31 | 1994-01-01 | -31.45019 | 29.45397 | | EC154 |

DoH Data Dictionary

Raw data format

The data was made available in CSV format.

Hospital Bed Data screenshot

Tidy data

Data cleaning was performed in R (the code is included in this document).

| province | prov_abb | ou3short | ou4name | ou4short | fac_name | date_open | date_close | coordinates | long | lat | contactperson | address | org_owner | org_rural_urban | type | last_update |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Afsondering Clinic | 1994-01-01 | NA | [28.94812,-30.17988] | 28.94812 | -30.1798 | The Manager | Kwa Makhoba Location, Lusikisiki, 4820 | Gov Province | Rural | Clinic | 2019-08-11 13:15:41 |
| Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Bergview Pharmacy | 2018-07-01 | NA | [28.80539,-30.342847] | 28.80539 | -30.3428 | NA | Pick’n Pay Centre, Main Road, Matatiele | Private | Urban | Pharmacy | 2019-08-11 13:15:41 |
| Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Dr Mpho Desmond Liphapang General Practitioner | 2014-03-01 | NA | NA | NA | NA | NA | NA | Private | Urban | General Practitioner | 2019-08-11 13:15:41 |
| Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Elukholweni Clinic | 1994-01-01 | NA | [28.84811,-30.6329] | 28.84811 | -30.6329 | NA | NA | Gov Province | Rural | Clinic | 2019-08-11 13:15:41 |
| Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Isilindini Clinic | 1994-01-01 | NA | [28.59078,-30.62002] | 28.59078 | -30.6200 | NA | Zingcuka | Gov Province | Rural | Clinic | 2019-08-11 13:15:41 |
| Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Khotsong TB Hospital | 1994-01-01 | NA | [28.82118,-30.34818] | 28.82118 | -30.3481 | NA | Jagger Street | Gov Province | Urban | Specialised TB Hospital | 2019-08-11 13:15:41 |
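
In the sample rows above, the coordinates column holds both values in one bracketed string with the longitude first, and the long and lat columns carry the same values separately. The following Python sketch shows one way such a string could be split. It is not the R code used for this dataset, and the function name and the handling of missing values are assumptions.

```python
import json


def split_coordinates(text):
    """Return (long, lat) from a '[long,lat]' string, or (None, None) if missing."""
    if text is None or text.strip() in ("", "NA"):
        return None, None
    # The bracketed pair is valid JSON, so it parses into a two-item list.
    lon, lat = json.loads(text)
    return float(lon), float(lat)


if __name__ == "__main__":
    print(split_coordinates("[28.94812,-30.17988]"))
    print(split_coordinates("NA"))
```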

Healthsites.io

Raw data format

The data was made available as a shapefile.

Tidy data

Data cleaning was performed in R (the code is included in this document).

This analysis is still in progress and will be added to the document shortly.

KEMRI/WHO

The data for the whole of sub-Saharan Africa was made available in an Excel spreadsheet. Health facility data is available from the sheet named ‘SSA MFL’.

KEMRI/WHO Raw Data

Tidy data

Data cleaning was performed in R (the code is included in this document).

| ou2name | fac_name | type | org_owner | lat | long | source |
| --- | --- | --- | --- | --- | --- | --- |
| Eastern Cape | Aberdeen Hospital | District Hospital | MoH | -32.4862 | 24.06093 | GPS |
| Eastern Cape | Aberdeen Satellite Clinic | Satellite Clinic | MoH | -32.4750 | 24.05200 | GPS |
| Eastern Cape | AD Keet Clinic | Clinic | MoH | -34.0602 | 24.91831 | GPS |
| Eastern Cape | Addo Clinic | Clinic | MoH | -33.5422 | 25.69077 | GPS |
| Eastern Cape | Addo Enon Satellite Clinic | Satellite Clinic | MoH | -33.3946 | 25.54625 | GPS |
| Eastern Cape | Adelaide Clinic | Clinic | MoH | -32.7071 | 26.29461 | GPS |

What do we have?

| Source | Number of facilities | Variables | Admin levels included |
| --- | --- | --- | --- |
| Hospital Bed | 543 | prov_abb, ou3short, type, fac_name, beds_usable, beds_approved, beds_surgical_usable, beds_surgical_approved, surgeons_qualified, sugeons_unqualified, theatres, hosp_contact, sector, last_update_day, last_updated_month, last_updated_year, source, source_name, source_surname, source_email, source_phone | 2, 3 |
| DoH Health Barometer 2016/2017 | 9328 | prov_abb, ou3abb, ou3short, type, fac_name, date_updated_day, date_updated_month, date_updated_year, source, source_name, copyright | 2, 3 |
| HST Health Barometer 2018/2019 | 654 | fac_name, type, org_level, prov_abb, province, ou3_short, date_close, date_open, lat, long, comment, ou4short | 2, 3, 4 |
| DoH Data Dictionary | 14305 | province, prov_abb, ou3short, ou4name, ou4short, fac_name, date_open, date_close, coordinates, long, lat, contactperson, address, org_owner, org_rural_urban, type, last_update | 2, 3, 4 |
| Healthsites.io | NA | NA | NA |
| KEMRI/WHO | 4303 | ou2name, fac_name, type, org_owner, lat, long, source | 2 |
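
A summary like the one above can be produced by counting the rows and listing the column names of each tidy file. The following Python sketch shows the idea on a small in-memory CSV built from two of the KEMRI/WHO sample rows. It assumes one row per facility and comma-separated files; the function name is hypothetical and this is not the code used to produce the table.

```python
import csv
import io


def summarise(csv_text):
    """Count data rows and list the column names of a CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # one row per facility is assumed
    return {"facilities": len(rows), "variables": reader.fieldnames}


if __name__ == "__main__":
    sample = (
        "ou2name,fac_name,type,org_owner,lat,long,source\n"
        "Eastern Cape,Aberdeen Hospital,District Hospital,MoH,-32.4862,24.06093,GPS\n"
        "Eastern Cape,Addo Clinic,Clinic,MoH,-33.5422,25.69077,GPS\n"
    )
    print(summarise(sample))
```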