During the 2020 COVID19 pandemic there was a call to help with the collation of information about hospital resources in South Africa to assist with the local response. For more information see this issue created by volunteers working on a COVID19-ZA dashboard with the Data Science for Social Impact group at the University of Pretoria.
My own interest in the question of health facilities is two-fold:
There is a wide variety of health facility web portals and datasets available online. Some access points do not allow for data download, for example the Department of Health’s Primary Health Care Facilities and Services page. Wikidata, the central storage for structured data of its Wikimedia sister projects, hosts a project named list of hospitals in South Africa which includes 208 facilities. Unfortunately the data for each facility is very sparsely populated.
The District Health Barometer report data for 2017/2018 was not considered for further analysis due to the inaccessible formatting of the tables in the spreadsheet.
The following potentially useful sources with downloadable data were identified.
Name | Short Name | Information | Admin Level | Web | Raw Data | Data License | Origin/Owner | Last Updated |
---|---|---|---|---|---|---|---|---|
Geographical maldistribution of surgical resources in South Africa: A review of the number of hospitals, hospital beds and surgical beds | Hospital Bed | Facility names, Number of beds and surgeons | District Municipality (3) | Article | Figshare | CC-BY-4.0 | Dr. Angela Dell | Mar 2016 |
District Health Barometer info 2016 2017 06 Feb 2018 | DoH Health Barometer 2016/2017 | Facility names, type | District Municipality (3) | Report | Spreadsheet | No explicit license | South African National Department of Health | Feb 2018 |
District Health Barometer 2018/2019 | HST Health Barometer 2018/2019 | Facility names, type, date opened, coords, date closed | District Municipality (3) | Report | Spreadsheet | No explicit license | Health Systems Trust | Feb 2020 |
National Department of Health Data Dictionary | DoH Data Dictionary | Facility names, addresses, coordinates, type, rural/urban, ownership (e.g. national/provincial/private) | Local Municipality (4) | Data Repository | No direct link available - select Download on page and select ‘Level 5’ data | No explicit license | South African National Department of Health | Aug 2019 |
Healthsites.io | Healthsites.io | Various - information collected through crowdsourcing | Depends on crowdsourced contribution | Homepage | API access or shapefile | CC-BY-4.0 | Crowdsourcing | Mar 2020 |
KEMRI/WHO: A spatial database of health facilities managed by the public health sector in sub-Saharan Africa | KEMRI/WHO | Facility names, type, ownership, coordinates, source | Province (2) | Article | Spreadsheet | No explicit license - assumed CC-BY-4.0 based on article | Hosted by WHO Global Malaria Program/Collected by KEMRI | Feb 2019 |
Most of the datasets required some cleaning before they could be worked with programmatically.
The data was made available in an Excel spreadsheet format with separate sheets for every province’s private and public health facilities.
Hospital Bed Data screenshot
The readme file displayed below shows the steps taken to convert the raw Hospital Bed dataset to tidy data. The data was not extracted programmatically due to the variable format of the tables in each sheet.
Source: https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711
New file: za_hospital_resources.csv
Steps to recreate:
1. Create new file with two sheets - one for public hospital information and one for private hospital information
2. Copy the primary table in each sheet into the relevant sheet in the new file (each province one under the other to create a single table in each sheet)
3. Insert column for province
4. Insert column for public/private
5. Remove secondary phone numbers for ease of analysis
6. Replace tel: with ' (an apostrophe) to keep the 0 at the beginning of phone numbers (some tel: entries have one space between the colon and the 0)
7. Remove spaces from phone numbers for ease of comparison later on
8. Remove coordinates for GP hospitals as they were for regions, not individual hospitals
9. Change kz Matatiele Private hospital from EC private hospitals to ec Matatiele Private hospital (Googled to confirm it is in EC)
10. Change ns Mediclinic Kimberley to NC Mediclinic Kimberley
11. Remove preceding province code from hospital names with replace ^\w\w\s
12. Remove all districts with nil private hospitals
13. Remove coordinate columns (will add later)
14. Change column headers for public hospitals:
PROVINCE,province
REGION,region
GPS,[removed]
HOSPITAL TYPE,hosp_class
HOSPITAL,hosp_name
USABLE BEDS,beds_usable
APPROVED BEDS,beds_approved
USABLE SB,beds_surgical_usable
APP SURG BEDS,beds_surgical_approved
SURGEONS (QUAL),surgeons_qualified
SURGEONS (UNQUAL),sugeons_unqualified
THEATRES,theatres
CONTACT,hosp_contact
TYPE,hosp_type
15. Change column headers for private hospitals:
PROVINCE,province
GPS,[removed]
REGION,region
HOSPITAL TYPE,hosp_class
HOSPITAL,hosp_name
USABLE BEDS,beds_usable
USE SUR BED,beds_surgical_usable
THEATRES,theatres
TYPE,hosp_type
16. Combine private/public data into a single spreadsheet
17. Add columns for source_date_day, source_date_month, source_date_year to keep track of when data was last updated
18. Add source column with link to original dataset in Figshare
19. Add column for source_name, source_surname
20. Add column for source_email
21. Add column for source_phone
22. Export to za_hospital_resources.csv
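The two text clean-ups in the readme (steps 6 and 7 for phone numbers, step 11 for the province prefix) were done by hand in a spreadsheet. Purely as an illustration, the same operations could be sketched in Python as below. The function names are hypothetical and the phone number is made up; only the regular expression `^\w\w\s` comes from the readme.

```python
import re


def strip_province_prefix(name: str) -> str:
    """Step 11: drop a leading two-character province code such as 'ec '."""
    return re.sub(r"^\w\w\s", "", name)


def clean_phone(raw: str) -> str:
    """Steps 6 and 7: remove a leading 'tel:' label and every whitespace character."""
    without_label = re.sub(r"^\s*tel:\s*", "", raw, flags=re.IGNORECASE)
    return re.sub(r"\s+", "", without_label)


# The hospital name comes from step 9; the phone number is made up.
print(strip_province_prefix("ec Matatiele Private hospital"))  # Matatiele Private hospital
print(clean_phone("tel: 012 345 6789"))  # 0123456789
```

Keeping the phone number as a string preserves the leading 0, which is what the apostrophe achieves in the spreadsheet. Note that `^\w\w\s` also trims any name that starts with a genuine two-letter word, so it should only be applied where every name carries a province prefix.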
prov_abb | ou3short | type | fac_name | beds_usable | beds_approved | beds_surgical_usable | beds_surgical_approved | surgeons_qualified | sugeons_unqualified | theatres | hosp_contact | sector | last_update_day | last_updated_month | last_updated_year | source | source_name | source_surname | source_email | source_phone |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GP | Ekurhuleni MM | Independent | Actonville/Sunshine Hospital | 200 | NA | 5 | NA | NA | NA | 5 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
GP | Ekurhuleni MM | Life | Life Bedford Gardens Private Hospital | 140 | NA | 31 | NA | NA | NA | 6 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
GP | Ekurhuleni MM | Independent | Birchmed Surgical Centre | 21 | NA | 21 | NA | NA | NA | 3 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
GP | Ekurhuleni MM | Clinix | Clinix Private Hospital Vosloorus/ Botshelong-Empilweni | 104 | NA | 40 | NA | NA | NA | 3 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
GP | Ekurhuleni MM | Netcare | Clinton Clinic Netcare Hospital | 165 | NA | 99 | NA | NA | NA | 5 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
GP | Ekurhuleni MM | Life | Life Dalview Clinic Hospital | 75 | NA | 27 | NA | NA | NA | 4 | NA | private | NA | 3 | 2016 | https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711 | Angela | Dell | angelajdell@gmail.com | NA |
The data was made available in an Excel spreadsheet format with separate sheets for a variety of definitions, measurements, and summaries. The health facility list was stored in a sheet called ‘Hospitals’. District codes are available in this sheet, but the full district (or municipality) names have to be taken from a sheet called ‘Seq’.
Hospital Bed Data screenshot
District full names for decoding
The readme file displayed below shows the steps taken to convert the raw DoH Health Barometer hospital dataset to tidy data. The data was not extracted programmatically due to the variable format of the tables in each sheet.
Source: File = http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017# District HealthBarometer info 2016 2017 06 Feb 2018 (a spreadsheet)
Sheet: ‘Hospitals’
New temp files: za_hospital_list_temp.csv, za_hospital_list_refine.csv
New final file: za_hospital_list_DoH.csv
Steps to get from Source file to New file:
1. In sheet = ‘Hospitals’ select OrgUnitCategor = ‘All’ and Level = ‘All’ in http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017# District HealthBarometer info 2016 2017 06 Feb 2018 (a spreadsheet)
2. Copy/paste table (from row 8 – 9346, column A – D)
3. Add columns for day/month/year last updated
4. Insert 6 February 2018 as last updated date (from file name)
5. Add columns for source information
6. Add columns with information about copyright ownership
7. Export za_hospital_list_temp.csv for import into OpenRefine 3.2
8. Open za_hospital_list_temp.csv in OpenRefine 3.2
9a) Rename project to za_hospital_list_refine
9b) In the Undo/Redo tab, select ‘Apply…’ and open za_hospital_list_temp_to_refine.json
9c) Paste contents from za_hospital_list_temp_to_refine.json and click on ‘Perform Operations’
** JSON script will do the following automatically
10a) Remove rows that contain provincial totals
10b) Fill cells down so that every cell has province, district, orgunittype
10c) Change column header names
10d) Remove leading province code from facility names
11. Export za_hospital_list_refine.csv from OpenRefine
12. Run za_hospital_list_refine_to_DoH.R in RStudio
** R script will do the following:
13a) Load za_hospital_list_refine.csv and za_hospital_district_names.csv (see readme_za_hospital_district_names.txt)
13b) Merge the two tables in the files to create a new table with all the columns from za_hospital_list_refine.csv and a new column “district_name” from za_hospital_district_names.csv
13c) Re-order the columns to make sense
13d) Export a new CSV file called za_hospital_list_DoH.csv
14. The final file contains the following columns:
province : Province
district_mdb : District code (from original source file)
district_name : District name (from original source file)
org_unit_type : Organisational Unit Type e.g. clinic, Community Day Centre, etc
facility_name : Hospital/clinic etc name
date_updated_day : From source file name
date_updated_month : From source file name
date_updated_year : From source file name
source : URL for source file
source_name : Organisation that provided the file (South African Department of Health)
copyright : Who owns the data if not in public domain or under open license
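Steps 10a, 10b and 10d were carried out by the OpenRefine JSON script, which is not reproduced here. As a rough illustration of what those three operations do, the sketch below applies them to a list of row dictionaries in Python. The sample rows, the `district` key and the rule for spotting a totals row (a name ending in "Total") are assumptions made for the example, not details taken from the source file.

```python
import re

FILL_COLUMNS = ("province", "district", "org_unit_type")


def tidy_rows(rows):
    """Apply steps 10a, 10b and 10d to a list of row dictionaries."""
    last_seen = {}
    tidy = []
    for row in rows:
        # 10a: skip totals rows (assumed here to have a name ending in "Total").
        if row["facility_name"].strip().lower().endswith("total"):
            continue
        cleaned = dict(row)
        # 10b: fill blank cells down from the nearest non-blank cell above.
        for column in FILL_COLUMNS:
            if cleaned[column]:
                last_seen[column] = cleaned[column]
            else:
                cleaned[column] = last_seen.get(column, "")
        # 10d: drop the leading two-character province code from the name.
        cleaned["facility_name"] = re.sub(r"^\w\w\s", "", cleaned["facility_name"])
        tidy.append(cleaned)
    return tidy


# Made-up rows that mimic a sheet where repeated values are left blank.
sample = [
    {"province": "EC", "district": "DC10", "org_unit_type": "Clinic", "facility_name": "ec Example Clinic"},
    {"province": "", "district": "", "org_unit_type": "", "facility_name": "ec Sample Clinic"},
    {"province": "", "district": "DC12", "org_unit_type": "", "facility_name": "ec Demo Clinic"},
    {"province": "", "district": "", "org_unit_type": "", "facility_name": "Eastern Cape Total"},
]
for row in tidy_rows(sample):
    print(row)
```

The totals rows are dropped before filling down, in the same order as the readme, so that a total never passes its values on to the rows below it.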
To clean the district names sheet for use in a join with the Hospitals data the following steps were followed:
Source: File = http://www.health.gov.za/index.php/2014-03-17-09-09-38/reports/category/424-reports-2017# District HealthBarometer info 2016 2017 06 Feb 2018 (a spreadsheet)
Sheet: ‘Seq’
New File: za_hospital_district_names.csv
Steps to get from Source file to New file:
1. Copy/paste table (from row 5 – 57, column A – B)
2. Fill province column manually
3. Split column B on : to separate district name from district code
4. Rename columns
5. Export as CSV to join with za_hospital_list_refine.csv exported from OpenRefine (see readme_za_hospital_list.txt)
6. Follow steps in readme_za_hospital_list.txt
To join the hospital data with the district names data, a short R script was written and is included in this file.
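For readers who do not use R, the split on `:` (step 3 of the district names readme) and the join can be sketched in Python as follows. This is not the R script itself. The sample values are made up, the order of code and name within the cell is an assumption, and the sketch keeps every facility (a left join), which may differ from what the original script does. The column names `district_mdb` and `district_name` are those listed for the final file.

```python
def split_district(cell):
    """Split a cell on the first colon; assumes the layout 'code: name'."""
    code, _, name = cell.partition(":")
    return code.strip(), name.strip()


def add_district_names(facilities, districts):
    """Add district_name to each facility by matching on district_mdb."""
    lookup = {d["district_mdb"]: d["district_name"] for d in districts}
    merged = []
    for facility in facilities:
        row = dict(facility)
        # Facilities with an unknown code are kept, with district_name set to None.
        row["district_name"] = lookup.get(facility["district_mdb"])
        merged.append(row)
    return merged


# Made-up values for illustration only.
code, name = split_district("DC10: Example District")
districts = [{"district_mdb": code, "district_name": name}]
facilities = [
    {"facility_name": "Example Clinic", "district_mdb": "DC10"},
    {"facility_name": "Unmatched Clinic", "district_mdb": "DC99"},
]
for row in add_district_names(facilities, districts):
    print(row)
```

If the sheet stores the name before the code, swapping the two values returned by `split_district` is the only change needed.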
The data was made available in Excel Binary Workbook format (.xlsb) with separate sheets for a variety of definitions, measurements, and summaries. Health facility information was available from a sheet called ‘Fac_list’.
HST District Health Barometer Data screenshot
Data cleaning was performed in R (the code is included in this document).
fac_name | type | org_level | prov_abb | province | ou3_short | date_close | date_open | lat | long | comment | ou4short |
---|---|---|---|---|---|---|---|---|---|---|---|
Aberdeen Hospital | District Hospital | DH | EC | Eastern Cape | DC10 | 9999-12-31 | 1994-01-01 | -32.48621 | 24.06093 | EC101 | |
Adelaide Hospital | District Hospital | DH | EC | Eastern Cape | DC12 | 9999-12-31 | 1994-01-01 | -32.70092 | 26.29427 | EC129 | |
Aliwal North Hospital | District Hospital | DH | EC | Eastern Cape | DC14 | 9999-12-31 | 1994-01-01 | -30.69698 | 26.70719 | EC145 | |
All Saints Hospital | District Hospital | DH | EC | Eastern Cape | DC13 | 9999-12-31 | 1994-01-01 | -31.66197 | 28.05041 | EC137 | |
Andries Vosloo Hospital | District Hospital | DH | EC | Eastern Cape | DC10 | 9999-12-31 | 1994-01-01 | -32.72181 | 25.59525 | EC102 | |
Bambisana Hospital | District Hospital | DH | EC | Eastern Cape | DC15 | 9999-12-31 | 1994-01-01 | -31.45019 | 29.45397 | EC154 | |
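Every row shown above has a `date_close` of 9999-12-31. A minimal sketch of how such a column could be used to select facilities that are open on a given day is shown below, in Python rather than the R used for the cleaning. It assumes that 9999-12-31 is a placeholder for "no closing date", which the source does not state explicitly; the function name is hypothetical.

```python
from datetime import date

# Assumption: this far-future date marks a facility with no closing date.
OPEN_ENDED = date(9999, 12, 31)


def is_open(date_open: str, date_close: str, on: date) -> bool:
    """Return True when the facility had opened and had not yet closed on the given day."""
    opened = date.fromisoformat(date_open)
    closed = date.fromisoformat(date_close)
    return opened <= on and (closed == OPEN_ENDED or closed > on)


# Dates taken from the first row of the table above.
print(is_open("1994-01-01", "9999-12-31", date(2020, 4, 1)))  # True
```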
The data was made available in CSV format.
Hospital Bed Data screenshot
Data cleaning was performed in R (the code is included in this document).
province | prov_abb | ou3short | ou4name | ou4short | fac_name | date_open | date_close | coordinates | long | lat | contactperson | address | org_owner | org_rural_urban | type | last_update |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Afsondering Clinic | 1994-01-01 | NA | [28.94812,-30.17988] | 28.94812 | -30.1798 | The Manager | Kwa Makhoba Location, Lusikisiki, 4820 | Gov Province | Rural | Clinic | 2019-08-11 13:15:41 |
Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Bergview Pharmacy | 2018-07-01 | NA | [28.80539,-30.342847] | 28.80539 | -30.3428 | NA | Pick’n Pay Centre, Main Road, Matatiele | Private | Urban | Pharmacy | 2019-08-11 13:15:41 |
Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Dr Mpho Desmond Liphapang General Practitioner | 2014-03-01 | NA | NA | NA | NA | NA | NA | Private | Urban | General Practitioner | 2019-08-11 13:15:41 |
Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Elukholweni Clinic | 1994-01-01 | NA | [28.84811,-30.6329] | 28.84811 | -30.6329 | NA | NA | Gov Province | Rural | Clinic | 2019-08-11 13:15:41 |
Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Isilindini Clinic | 1994-01-01 | NA | [28.59078,-30.62002] | 28.59078 | -30.6200 | NA | Zingcuka | Gov Province | Rural | Clinic | 2019-08-11 13:15:41 |
Eastern Cape | EC | A Nzo DM | ec Matatiele Local Municipality | Matatiele LM | Khotsong TB Hospital | 1994-01-01 | NA | [28.82118,-30.34818] | 28.82118 | -30.3481 | NA | Jagger Street | Gov Province | Urban | Specialised TB Hospital | 2019-08-11 13:15:41 |
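The `coordinates` column holds both values in a single bracketed string, with the longitude first, next to separate `long` and `lat` columns. As an illustration of that split (in Python, not the R code used for the cleaning), the string can be read as a two-element JSON array. The function name is hypothetical and `NA` is assumed to be the only marker for a missing value.

```python
import json


def split_coordinates(value):
    """Return (longitude, latitude) from a '[long,lat]' string, or (None, None) if missing."""
    if value in (None, "", "NA"):
        return None, None
    longitude, latitude = json.loads(value)
    return longitude, latitude


# Values taken from the first and third rows of the table above.
print(split_coordinates("[28.94812,-30.17988]"))  # (28.94812, -30.17988)
print(split_coordinates("NA"))  # (None, None)
```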
The data was made available as a shapefile.
Data cleaning was performed in R (the code is included in this document).
The data for the whole of sub-Saharan Africa was made available in an Excel spreadsheet. Health facility data is available from the sheet named ‘SSA MFL’.
KEMRI/WHO Raw Data
Data cleaning was performed in R (the code is included in this document).
ou2name | fac_name | type | org_owner | lat | long | source |
---|---|---|---|---|---|---|
Eastern Cape | Aberdeen Hospital | District Hospital | MoH | -32.4862 | 24.06093 | GPS |
Eastern Cape | Aberdeen Satellite Clinic | Satellite Clinic | MoH | -32.4750 | 24.05200 | GPS |
Eastern Cape | AD Keet Clinic | Clinic | MoH | -34.0602 | 24.91831 | GPS |
Eastern Cape | Addo Clinic | Clinic | MoH | -33.5422 | 25.69077 | GPS |
Eastern Cape | Addo Enon Satellite Clinic | Satellite Clinic | MoH | -33.3946 | 25.54625 | GPS |
Eastern Cape | Adelaide Clinic | Clinic | MoH | -32.7071 | 26.29461 | GPS |
Source | Number of facilities | Variables | Admin levels included |
---|---|---|---|
Hospital Bed | 543 | prov_abb, ou3short, type, fac_name, beds_usable, beds_approved, beds_surgical_usable, beds_surgical_approved, surgeons_qualified, sugeons_unqualified, theatres, hosp_contact, sector, last_update_day, last_updated_month, last_updated_year, source, source_name, source_surname, source_email, source_phone | 2, 3 |
DoH Health Barometer 2016/2017 | 9328 | prov_abb, ou3abb, ou3short, type, fac_name, date_updated_day, date_updated_month, date_updated_year, source, source_name, copyright | 2, 3 |
HST Health Barometer 2018/2019 | 654 | fac_name, type, org_level, prov_abb, province, ou3_short, date_close, date_open, lat, long, comment, ou4short | 2, 3, 4 |
DoH Data Dictionary | 14305 | province, prov_abb, ou3short, ou4name, ou4short, fac_name, date_open, date_close, coordinates, long, lat, contactperson, address, org_owner, org_rural_urban, type, last_update | 2, 3, 4 |
Healthsites.io | NA | NA | NA |
KEMRI/WHO | 4303 | ou2name, fac_name, type, org_owner, lat, long, source | 2 |
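The facility counts and variable lists in this summary can be read straight off the cleaned files. The sketch below shows one way to do that in Python with the standard `csv` module; it assumes one row per facility and uses two KEMRI/WHO rows from earlier in this document as inline sample data instead of a real file. The function name is hypothetical.

```python
import csv
import io


def summarise(csv_text):
    """Count the data rows and list the column names of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {"facilities": len(rows), "variables": reader.fieldnames}


# Two rows copied from the KEMRI/WHO table, written out as CSV.
sample_csv = """ou2name,fac_name,type,org_owner,lat,long,source
Eastern Cape,Aberdeen Hospital,District Hospital,MoH,-32.4862,24.06093,GPS
Eastern Cape,Addo Clinic,Clinic,MoH,-33.5422,25.69077,GPS
"""
print(summarise(sample_csv))
```

For a file on disk, the same function can be given the text returned by reading the file, for example the contents of za_hospital_resources.csv.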