** Please observe the licensing terms of each individual dataset if you would like to use the original datasets.
In March 2020 the afrimapr team set out to develop building blocks in R that would make open health facility data more accessible to data scientists in Africa and elsewhere. The afrihealthsites package aims to provide functionality to load, analyse, visualise, and map open health facility datasets such as the list compiled by the KEMRI-Wellcome Trust Research Programme (KWTRP) for sub-Saharan Africa and the data made available via the Global Healthsites Mapping Project (healthsites.io).
Through our research we learned about the term master facility list (MFL). A master facility list contains information about the full complement of health facilities in a country. The World Health Organisation developed a guide for countries wanting to develop their own MFL or wanting to strengthen existing MFLs. We were excited to find several African MFLs available online.
Here we perform some exploratory analysis on the MFLs from a number of countries to understand the overlaps and differences in terms of information that is made available, data format, and more. We also identify opportunities where afrimapr can develop R building blocks to make this kind of analysis easier for others wanting to do something similar.
This post contains fine-grained details about the challenges of, and solutions for, reading open health facility lists from Africa into R and analysing the data in a comparative manner. The narrative is written so that readers with no knowledge of R can still gain value from the report. The R code and data are made available for readers who want to reproduce the analysis or adapt it for their own use.
The intended audience includes data analysts and data scientists as well as data providers.
As mentioned, a number of countries already make their official MFL available online and even allow users to download the data in a variety of formats. For some other countries, open facility lists that are not necessarily acknowledged as the official MFL are available. The interactive map below shows the availability of open facility lists across the continent; clicking on a country shows more information about its list.
For this report we decided to focus on countries whose data met the following criteria:
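# Load the packages used in this post
library(readr)           # read_csv()
library(readxl)          # read_xls(), read_xlsx()
library(here)            # here() builds file paths relative to the project root
library(dplyr)           # data manipulation verbs and bind_rows()
library(stringr)         # str_detect()
library(tibble)          # tibble()
library(ggplot2)         # bar charts
library(countrycode)     # countrycode()
library(sf)              # st_transform()
library(spData)          # world: country polygons
library(tmap)            # interactive map
library(afrihealthsites) # KWTRP (WHO) and healthsites.io data
# Load Google Sheet describing African open health facility lists
# This sheet is published as CSV and can be read directly from the URL without needing OAuth
africa_lists <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8n2Db7EnUDx14UvsJvSHJEH3lBCf9j5rMUQ2H8cuxPMeWDRZSCYmzc9MfV7i5UxfFnlhoL2Ipga0s/pub?gid=0&single=true&output=csv")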
# Add column with iso2c codes for use in mapping
africa_lists <- africa_lists %>%
filter(!is.na(Owner)) %>%
mutate(country_iso = countrycode(Country, origin = "country.name", destination = "iso2c")) %>%
relocate(country_iso, .after = "Country") %>%
# Add column for use in mapping
mutate(`Open facility list online` = case_when(`Official MFL accessible online` == "yes" ~ "Official MoH MFL",
`Official MFL accessible online` == "no" ~ "Other source",
`Official MFL accessible online` == "unclear" ~ "Official status unclear"))
# Join the spData world polygons with africa_lists (created in the previous step) to obtain African country polygons
# Select only the relevant columns
africa <- world %>%
filter(continent == "Africa", !is.na(iso_a2)) %>%
right_join(africa_lists, by = c("iso_a2" = "country_iso")) %>%
dplyr::select(name_long, `Official MFL accessible online`, `Open facility list online`, Owner, License,
`Data format`, `Downloaded data geocoded`, `Health facility data URL`, `About page URL`,
`Alternative health facilities data source`, `Last updated`, geom) %>%
st_transform("+proj=aea +lat_1=20 +lat_2=-23 +lat_0=0 +lon_0=25")
# Draw map
tmap_mode("view")
tm_shape(africa) +
tm_polygons("Open facility list online",
# Create list of items that will show when clicking on a country
popup.vars=c("Owner: "="Owner", "License: "="License",
"Recognised as official MFL: "="Official MFL accessible online",
"Data format: " = "Data format", "Geocoded: "="Downloaded data geocoded",
"Last updated: "="Last updated"),
palette = c("#FD6C6C", "#52463F", "#A87F8E"))
tmap_mode("plot")
The Kenyan MFL is available at http://kmhfl.health.go.ke/#/home. The data can be downloaded in Excel format. There also seems to be an API, but we did not use it because it appears to require access to a local copy of the database. Unfortunately, one has to visit the website and click the Export Excel button to obtain the data, rather than being able to access it directly via a URL. Once the file is saved in our data/raw_data folder, it is easily loaded with the read_xlsx() function from the readxl package.
ken_mfl <- read_xlsx(here("data", "raw_data", "kenya.xlsx"))
| Code | Name | Officialname | Registration_number | Keph level | Facility type | Facility_type_category | Owner | Owner type | Regulatory body | Beds | Cots | County | Constituency | Sub county | Ward | Operation status | Open_whole_day | Open_public_holidays | Open_weekends | Open_late_night | Service_names | Approved | Public visible | Closed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25720 | Itete Dispensary | Itete Dispensary | 01245 | Level 2 | Dispensary | DISPENSARY | Ministry of Health | Ministry of Health | Ministry of Health | 8 | 3 | Kakamega | Matungu | Matungu | Koyonzo | Operational | No | No | No | No | NA | Yes | Yes | No |
| 25731 | Highrise Healthcare Services | Highrise Healthcare Services | 017325 | Level 2 | Medical Clinic | MEDICAL CLINIC | Private Practice - Nurse / Midwifery | Private Practice | Kenya MPDB | 0 | 0 | Embu | Mbeere South | Mbeere South | Mbeti South | Operational | No | No | No | No | NA | Yes | Yes | No |
The Malawi MFL is available at http://zipatala.health.gov.mw/facilities and can be downloaded in Excel or PDF format. An API exists, but we could not find further information about it and therefore did not use it. There is no direct URL to the data: we visited the website, clicked the DOWNLOAD EXCEL button and saved the file to our data/raw_data folder.
mwi_mfl <- read_xlsx(here("data", "raw_data", "malawi.xlsx"))
| CODE | NAME | COMMON NAME | OWNERSHIP | TYPE | STATUS | ZONE | DISTRICT | DATE OPENED | LATITUDE | LONGITUDE |
|---|---|---|---|---|---|---|---|---|---|---|
| MC010002 | A + A private clinic | A+A | Private | Clinic | Functional | Centrals West Zone | Mchinji | Jan 1st 75 | -13.797421 | 33.885631 |
| BT240003 | A-C Opticals | A.C Opticals | Private | Clinic | Functional | South East Zone | Blantyre | Jan 1st 75 | -15.8 | 35.03 |
The MFL for Namibia is accessible via an API, as described on the website. The data can also be downloaded in Excel format directly from the website, but the resulting Excel file contains only a small subset of the attributes available through the API.
We had trouble reading the JSON data into R and decided instead to write a Python script that retrieves the JSON for each facility and converts the dataset into a table that can be analysed in R alongside the other country MFLs.
Details of the data structure and download process are available in the Jupyter Notebook. Note that the data obtained through the API list every facility as having the facility type Facility. Most of the facility names, however, indicate the category to which the facility belongs. We therefore added a column called facility_type and used regular expressions to assign each facility a type from a list available on the Namibian MFL website. Where the facility name did not contain any of the types on that list, we set facility_type = Other.
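The regular-expression step described above can be sketched in R as follows. This is a minimal illustration, not the code in the Jupyter Notebook: the function name classify_facility(), the subset of types in namibia_types and the last two facility names are hypothetical, and the real list on the Namibian MFL website may differ.

```r
# Hypothetical subset of the facility types listed on the Namibian MFL website
namibia_types <- c("Regional Health Office", "District Health Office",
                   "Health Centre", "Clinic", "Hospital", "Rehab Centre", "Mobile Van")

# Assign each facility name the first matching type, or "Other" if none match
classify_facility <- function(facility_names, types) {
  # Try longer type names first so that a shorter, overlapping pattern
  # cannot pre-empt a more specific one
  types <- types[order(nchar(types), decreasing = TRUE)]
  result <- rep("Other", length(facility_names))
  matched <- rep(FALSE, length(facility_names))
  for (type in types) {
    # grepl() treats `type` as a regular expression; none of these contain metacharacters
    hit <- !matched & grepl(type, facility_names, ignore.case = TRUE)
    result[hit] <- type
    matched <- matched | hit
  }
  result
}

# The last two names are made up for illustration
classify_facility(c("Sibbinda Health Centre", "Zambezi Regional Health Office",
                    "Example District Hospital", "Example Pharmacy"), namibia_types)
# → "Health Centre" "Regional Health Office" "Hospital" "Other"
```

Checking longer type names first is one way to keep a short pattern from claiming a name that a more specific type would also match; with a short list it makes little difference, but it matters as the list grows.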
# Can't download straight from API due to Certificate issues
# Error in open.connection(con, "rb") :
# server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
# Decided to try in Python - see python_notebooks/namibia_mfl_convert.ipynb
# Heavily nested JSON teased apart in Python and saved to CSV from Jupyter Notebook to be used in R
nam_mfl <- read_csv(here("data", "raw_data", "namibia.csv"))
| name | id | long_name | contact_person | phone_number | alt_phone_number | catchment_population | point_x | point_y | parent_location_id | parent_location_name | location_type_name | location_ownership_name | infrastructure_ids | infrastructure_names | service_ids | service_names | facility_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Zambezi Regional Health Office | 11981 | NA | NA | NA | NA | NA | -17.4994 | 24.27878 | 10598 | Katima Mulilo District | Facility | Public_MoHSS | NA | NA | NA | NA | Regional Health Office |
| Sibbinda Health Centre | 10131 | NA | NA | NA | NA | 4111 | -17.7851 | 23.82119 | 10598 | Katima Mulilo District | Facility | Public_MoHSS | 1.256791e+19 | Ambulances,Beds,Electricity,Running Water,Health Extension Workers,Toilets,Phone Number,Computers,Vehicles,Enrolled Nurses,Registered Nurses,Doctors,Administrative Officers | 1.238313e+19 | HIV Testing Services,General Clinical Service,Expanded Programme on Immunizations,Preventing Mother To Child Transmission Services,Viral Load Testing,Sexual Transmitted Infections,Anti Retroviral Therapy IMAI Site,Ante Natal Clinic Services,Family Planning Services,Tuberculosis Services,Option B+,DNA EID Testing | Health Centre |
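We used Python for this step, but the same flattening idea can be expressed in base R once the JSON has been parsed into nested lists (for example with jsonlite::fromJSON(..., simplifyVector = FALSE)). The sketch below uses hypothetical records whose field names and service ids only loosely mimic the API output. It collapses the nested list of services into one comma-separated string per facility, which is the shape of the service_names column in the excerpt above.

```r
# Hypothetical nested records, loosely modelled on the Namibian API output
facilities <- list(
  list(name = "Sibbinda Health Centre", id = 10131,
       services = list(list(id = 1, name = "HIV Testing Services"),
                       list(id = 2, name = "Family Planning Services"))),
  list(name = "Zambezi Regional Health Office", id = 11981,
       services = list())
)

# Flatten one record: scalar fields become columns, and the nested services
# list is collapsed into a single comma-separated string (NA if empty)
flatten_facility <- function(f) {
  service_names <- vapply(f$services, function(s) s$name, character(1))
  data.frame(
    name = f$name,
    id = f$id,
    service_names = if (length(service_names) > 0) paste(service_names, collapse = ",") else NA_character_,
    stringsAsFactors = FALSE
  )
}

# One row per facility
nam_flat <- do.call(rbind, lapply(facilities, flatten_facility))
nam_flat
```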
Rwanda makes its MFL available for download in CSV, Excel or PDF format. Again, one has to visit the web page and click the CSV button to download the data, as the data cannot be accessed directly via a URL.
The raw data contains two District columns that appear to hold identical values. Because column names in a data frame must be unique, read_csv() renamed the second one to District_1 when we read the file (more recent versions of readr may instead append a positional suffix of the form ...n to both duplicates). The resulting data frame therefore contains both a District and a District_1 column with the same data.
rwa_mfl <- read_csv(here("data", "raw_data", "rwanda.csv"))
| Facity Name | id | Opening Date | Sector | Subdistrict | District | Province | District_1 | Facility type | Ownership | LOCATION |
|---|---|---|---|---|---|---|---|---|---|---|
| A La Source DISP | 1140 | 1/1/2000 | Muhima | Muhima Sub District | Nyarugenge District | Kigali City | Nyarugenge District | Dispensary | Private | -1.937517 30.059391 |
| Active Life Physiotherapy ltd | 1664 | 8/17/2016 | Kimironko | Kibagabaga Sub District | Gasabo District | Kigali City | Gasabo District | Medical Clinic | Private | NA |
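If the two District columns really are identical, one of them can be dropped safely. A minimal base-R check, using a toy data frame built from the excerpt above (in the real data the name of the duplicate depends on the readr version):

```r
# Toy stand-in for rwa_mfl, using the District values from the excerpt above
rwa_demo <- data.frame(
  District   = c("Nyarugenge District", "Gasabo District"),
  District_1 = c("Nyarugenge District", "Gasabo District"),
  stringsAsFactors = FALSE
)

# Drop the second column only if it is an exact copy of the first
if (identical(rwa_demo$District, rwa_demo$District_1)) {
  rwa_demo$District_1 <- NULL
}
names(rwa_demo)
# → "District"
```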
The South Sudan facilities list is available in CSV format from https://www.southsudanhealth.info/facility/fac.php?list, and the data can be read directly from https://www.southsudanhealth.info/PublicData/facility_info_2020-05-08.csv.
ssd_mfl <- read_csv("https://www.southsudanhealth.info/PublicData/facility_info_2020-05-08.csv")
| #"idGeo" | Facility | type | Ext Ref | Payam | County | State | Deleted | Indicators | Sampled | Pilot Accessible | Pilot Operational | Extension Accessible | Extension Operational | Alternate Names | Location | ACLED Refs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 140th SPLA Battalion | Other | FC10080402 | Tambura | Tambura County | Gbudwe | NA | 0 | 0 | 0 | 0 | 0 | 0 | NA | 5.52066, 27.46684 | NA |
| 2 | Abara PHCC | Primary Health Care Centre | FC02070702 | Unknown Payam In Magwi | Magwi County | Imatong | NA | 130 | 1 | 0 | 0 | 1 | 1 | abara-phcc, Ababa PHCC | 4.08234, 32.17893 | NA |
For Tanzania, the MFL is available at http://hfrportal.moh.go.tz/index.php?r=page/index&page_name=about_page with data downloadable in Excel format (XLS). The geocoded data can be accessed directly via a URL, with no need to interact with the website. Note, however, that the website may serve a cached empty dataset. If the downloaded file contains no data, visit the website and make sure that all geocoded facilities are selected.
There are also 1,378 facilities without coordinates in this database. These can be downloaded by visiting this URL.
The Excel files with the geocoded and the non-geocoded data contain the same columns (the Latitude and Longitude columns are retained in the non-geocoded file). We can therefore easily stack the two datasets for combined analysis.
Note that the first row of both Excel sheets consists of merged cells containing the date and time at which the data was downloaded; this row can be deleted. Because of the merged cells, the data has to be downloaded to disk and opened in Excel, LibreOffice or another spreadsheet package, where the first row is removed and the file saved. Only after this step can the file be loaded successfully into R for further analysis. R does not like merged cells...
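It may be possible to avoid the manual spreadsheet step: readxl's read_xls() has a skip argument that ignores a given number of rows at the top of the sheet. We have not tested this on the Tanzanian files, so treat the sketch below as an assumption to verify. read_hfr_xls() is a hypothetical helper, and it should only be applied to the files as downloaded, since on an already-cleaned file it would skip the real header row.

```r
# Sketch, untested against the actual Tanzanian downloads: ask readxl to skip
# the merged download-timestamp row instead of deleting it by hand
read_hfr_xls <- function(path) {
  # skip = 1 ignores the first row; column headers are then taken from row 2
  readxl::read_xls(path, skip = 1)
}

# Hypothetical usage with the file paths used in this post:
# tanzania_geocode    <- read_hfr_xls(here::here("data", "raw_data", "tanzania_geocoded.xls"))
# tanzania_nongeocode <- read_hfr_xls(here::here("data", "raw_data", "tanzania_nongeocoded.xls"))
```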
It should also be noted that the geocoded data excludes all facilities situated on the islands of Zanzibar and Pemba but includes facilities based on Mafia Island. Zanzibar has its own Ministry of Health (Pemba reports under this MoH).
tanzania_geocode <- read_xls(here("data", "raw_data", "tanzania_geocoded.xls"))
tanzania_nongeocode <- read_xls(here("data", "raw_data", "tanzania_nongeocoded.xls"))
tza_mfl <- tanzania_geocode %>%
bind_rows(tanzania_nongeocode)
remove(tanzania_nongeocode, tanzania_geocode)
| Facility Number | Facility Name | Common Name | Registration Status | Created At | Updated At | Zone | Region | District | Council | Ward | Village/Street | Facility Type | Operating Status | Ownership | Registration Number | CTC Number | Latitude | Longitude | Date Opened | National Grid | Generator | Solar Panels | No Electricity | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 113310-7 | 2001 GEM PLUS | NA | Registered | 2019-03-27T18:14:11.000Z | 2019-05-28T17:32:43.000Z | Lake Zone | Mwanza | Nyamagana | Nyamagana MC | Butimba | Not set | Health Labs - Level IA2 (Dispensary Laboratory) | Operating | Private - For Profit | PHL-C/MWZ/AUT/06 | NA | -2.563525 | 32.91266 | NA | 0 | 0 | 0 | 0 | 0 |
| 100017-3 | 202 KJ | NA | NA | 2013-08-20T10:23:40.000Z | 2018-08-20T11:36:15.000Z | Central Zone | Not set | Not set | Not set | Not set | Not set | Dispensary | Closed | Public - Military | NA | NA | -5.057160 | 32.82869 | NA | 0 | 0 | 0 | 0 | 0 |
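Because bind_rows() matches columns by name, a header that differs between the two files would silently produce extra columns filled with NA. It may therefore be worth confirming that the headers match before stacking, and flagging which facilities carry coordinates so the non-geocoded ones can be separated again later. A base-R sketch with toy data: the first row echoes the excerpt above and the second facility is made up.

```r
# Toy stand-ins for the two Tanzanian files
geocoded <- data.frame(`Facility Name` = "2001 GEM PLUS",
                       Latitude = -2.563525, Longitude = 32.91266,
                       check.names = FALSE, stringsAsFactors = FALSE)
nongeocoded <- data.frame(`Facility Name` = "Example Dispensary",
                          Latitude = NA_real_, Longitude = NA_real_,
                          check.names = FALSE, stringsAsFactors = FALSE)

# Stack the two files only if their column headers match exactly
stopifnot(identical(names(geocoded), names(nongeocoded)))
combined <- rbind(geocoded, nongeocoded)

# Flag which facilities carry usable coordinates
combined$has_coords <- !is.na(combined$Latitude) & !is.na(combined$Longitude)
table(combined$has_coords)
```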
The Zambian MFL is hosted on GitHub. The raw data is available in CSV format in the GitHub repository.
zmb_mfl <- read_csv("https://raw.githubusercontent.com/MOH-Zambia/MFL/master/geography/data/facility_list.csv")
| province | district | name | HMIS_code | DHIS2_UID | smartcare_GUID | eLMIS_ID | iHRIS_ID | location | ownership | facility_type | longitude | latitude | catchment_population_head_count | catchment_population_cso | operation_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Central | Chibombo | Chamakubi Health Post | 10010001 | pXhz0PLiYZX | 7b46450b78a04a1db64c0fc9bb014773 | NA | facility|1 | Rural | GRZ | Health Post | 27.64199 | -14.79990 | 6624 | 6624 | Operational |
| Central | Chibombo | Kabangalala Rural Health Centre | 10010011 | sbFApO4who4 | 9a450380b7db4f2fb13156d03fc0bc6d | NA | facility|10 | Rural | GRZ | Rural Health Centre | 27.72866 | -15.16894 | 6345 | 6900 | Operational |
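We can also access the open health facility data compiled by the KEMRI-Wellcome Trust Research Programme (KWTRP) and the data from healthsites.io. Both datasets can be accessed via the afrimapr afrihealthsites package, where the KWTRP list is selected with datasource = 'who'; this is why it is labelled WHO in the code and tables below.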
# ISO3 codes of the countries analysed in this post
countries <- c("KEN", "MWI", "NAM", "RWA", "SSD", "TZA", "ZMB")
# Loop through the countries to create one data frame per country for each of the WHO (KWTRP) and healthsites.io datasets
for (country in countries){
  # Use the iso3 code to extract country-level data
  # Return a data frame: by default afrihealthsites() returns a spatial (sf) object,
  # but not all facilities in the WHO dataset are geocoded, and those without coordinates would be lost
  who_df <- afrihealthsites(country, datasource='who', plot=FALSE, returnclass='dataframe')
  hs_df <- afrihealthsites(country, datasource='healthsites', plot=FALSE, returnclass='dataframe')
  # Create one data frame per country per data source, e.g. ken_who and ken_hs
  assign(paste0(tolower(country), "_who"), who_df)
  assign(paste0(tolower(country), "_hs"), hs_df)
  # Clean up workspace - remove temporary data frames
  remove(who_df, hs_df)
}
remove(country)
Below we show excerpts from the WHO and healthsites.io data for Kenya to give the reader an overview of column headers and data format.
| Country | Admin1 | Facility name | Facility type | Ownership | Lat | Long | LL source | iso3c |
|---|---|---|---|---|---|---|---|---|
| Kenya | Baringo | Aiyebo Dispensary | Dispensary | MoH | 0.65783 | 35.80768 | GPS | KEN |
| Kenya | Baringo | Akwichatis Health Centre | Health Centre | MoH | 1.00150 | 36.23620 | GPS | KEN |
| osm_id | osm_type | completeness | is_in_health_zone | amenity | speciality | addr_full | operator | water_source | changeset_id | insurance | staff_doctors | contact_number | uuid | electricity | opening_hours | operational_status | source | is_in_health_area | health_amenity_type | changeset_version | emergency | changeset_timestamp | name | staff_nurses | changeset_user | wheelchair | beds | url | dispensing | healthcare | operator_type | geometry | country | iso3c |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 696655697 | node | 27 | pharmacy | 62793048 | 37fa2725b7824f60ad7ba9f4103ccb06 | 08:00-20:00 | operational | survey | 7 | 1537524419 | Nafuu Chemist | cbeddow | yes | private | c(36.778198187095, -1.31241153440629) | Kenya | KEN | |||||||||||||||||
| 6807606134 | node | 10 | clinic | 74663883 | 87c9e47944eb4ac5ac72daf6d7ac2f86 | 1 | 1568884347 | Arap Kobilo | yes | c(35.9799016217169, 0.468861661533481) | Kenya | KEN |
# Create dataframe with number of observations and number of columns for each dataset
# Step 1: Create vector for dataset names
dataset_names <- c(ls(pattern = "mfl"), ls(pattern = "who"), ls(pattern = "hs"))
# Step 2: Create vector for # observations and # columns per dataset
# Step 2a: Create list of dataframes to run nrow, ncol on
datasets <- mget(dataset_names)
# Step 2b: Create vectors
dataset_obs <- c()
dataset_cols <- c()
for (ds in datasets){
dataset_obs <- append(dataset_obs, nrow(ds))
dataset_cols <- append(dataset_cols, ncol(ds))
}
# Step 3: Create dataframe with everything combined
health_lists_df <- tibble(Dataset = dataset_names,
Facilities = dataset_obs,
Attributes = dataset_cols)
health_lists_df <- health_lists_df %>%
mutate(Country = case_when(str_detect(Dataset, "ken") ~ "Kenya",
str_detect(Dataset, "mwi") ~ "Malawi",
str_detect(Dataset, "nam") ~ "Namibia",
str_detect(Dataset, "rwa") ~ "Rwanda",
str_detect(Dataset, "ssd") ~ "South Sudan",
str_detect(Dataset, "tza") ~ "Tanzania",
str_detect(Dataset, "zmb") ~ "Zambia")) %>%
mutate(`Data Source` = case_when(str_detect(Dataset, "mfl") ~ "Master facility list",
str_detect(Dataset, "who") ~ "WHO",
str_detect(Dataset, "hs") ~ "healthsites.io")) %>%
mutate(text = paste("Country: ", Country, "\nData source: ", `Data Source`, "\nFacilities: ", Facilities, "\nAttributes: ", Attributes, sep="")) %>%
mutate(Dataset = factor(Dataset, Dataset))
# Step 4: Remove clutter
remove(dataset_names, dataset_obs, dataset_cols, ds)
health_lists_df %>%
group_by(Country) %>%
ggplot(aes(x = Country, y = Facilities, fill = `Data Source`)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("#52463F", "#FD6C6C", "#A87F8E")) +
theme_minimal()
If we want to compare the types of facilities listed in each dataset, we first have to (manually) identify the column that contains information about facility type.
k_types <- unique(ken_mfl$`Facility type`)
m_types <- unique(mwi_mfl$TYPE)
n_types <- unique(nam_mfl$facility_type)
r_types <- unique(rwa_mfl$`Facility type`)
s_types <- unique(ssd_mfl$type)
t_types <- unique(tza_mfl$`Facility Type`)
z_types <- unique(zmb_mfl$facility_type)
# Can't write function for next step because column names vary between datasets
# Get the names of WHO country dataframes
names_who <- c(ls(pattern = "^\\w\\w\\w_who$"))
# Get the WHO country dataframe content
datasets_who <- mget(names_who)
# Create vectors with country name and types
who_country <- c()
who_types <- c()
for (who in datasets_who){
who_country <- append(who_country, unique(who$Country))
who_types <- append(who_types, list(unique(who$`Facility type`)))
}
# Create dataframe with country name in first column and unique facility types observed in second column
who_fac_types <- tibble(country = who_country,
types = who_types)
# Get the names of the healthsites.io country dataframes
names_hs <- c(ls(pattern = "^\\w\\w\\w_hs$"))
# Get the healthsites.io country dataframe content
datasets_hs <- mget(names_hs)
# Create vectors with country name and types
hs_country <- c()
hs_types <- c()
for (hs in datasets_hs){
hs_country <- append(hs_country, unique(hs$country))
hs_types <- append(hs_types, list(unique(hs$amenity)))
}
# Create dataframe with country name in first column and unique facility types observed in second column
hs_fac_types <- tibble(country = hs_country,
types = hs_types)
# Collapse the unique facility types observed for one country into a single comma-separated string
collapse_types <- function(type_df, country_name) {
  paste(unlist(type_df$types[type_df$country == country_name]), collapse = ", ")
}
# Country names in the same (alphabetical) order as the MFL columns below
country_names <- unlist(hs_fac_types$country)
facility_types_table <- tibble(
  Country = country_names,
  `MFL "Type" Column` = c("Facility type", "TYPE", "None provided", "Facility type", "type", "Facility Type", "facility_type"),
  MFL = c(paste(k_types, collapse = ", "), paste(m_types, collapse = ", "), paste(n_types, collapse = ", "), paste(r_types, collapse = ", "),
          paste(s_types, collapse = ", "), paste(t_types, collapse = ", "), paste(z_types, collapse = ", ")),
  `WHO (Facility type)` = vapply(country_names, function(cn) collapse_types(who_fac_types, cn), character(1), USE.NAMES = FALSE),
  `healthsites.io (amenity)` = vapply(country_names, function(cn) collapse_types(hs_fac_types, cn), character(1), USE.NAMES = FALSE)
)
| Country | MFL "Type" Column | MFL | WHO (Facility type) | healthsites.io (amenity) |
|---|---|---|---|---|
| Kenya | Facility type | Dispensary, Medical Clinic, Medical Center, Secondary care hospitals, Basic Health Centre, Nursing and Maternity Home, Nursing Homes, Primary care hospitals, Specialized & Tertiary Referral hospitals, Dental Clinic, MEDICAL CLINIC, VCT, Laboratory, Rehab. Center - Drug and Substance abuse, Ophthalmology, Comprehensive health Centre, NURSING HOME, Comprehensive Teaching & Tertiary Referral Hospital, Dialysis Center, Radiology Clinic, Blood Bank, Regional Blood Transfusion Centre, Pharmacy, MEDICAL CENTER, DISPENSARY, Dispensaries and clinic-out patient only, HEALTH CENTRE, Farewell Home, HOSPITALS | Dispensary, Health Centre, District Hospital, Sub-District Hospital, Mission Hospital, Clinic, Hospital, County Referral Hospital, Provincial General Hospital, National Referral Hospital | pharmacy, clinic, hospital, doctors, dentist, |
| Malawi | TYPE | Clinic, Hospital, Dispensary, Health Centre, District Hospital, Health Post, Unclassified, Central Hospital, Private | Clinic, Health Centre, Community Hospital, Health Post/Dispensary, District Hospital, Mission Hospital, Rural Hospital, Central Hospital | hospital, clinic, pharmacy, dentist |
| Namibia | None provided | Regional Health Office, Health Centre, Clinic, Other, Hospital, District Health Office, Rehab Centre, Prison, CBART, Mobile Van | Clinic, Health Centre, District Hospital, Mission Hospital, Intermediate Hospital, Central Hospital | doctors, clinic, , pharmacy, hospital, dentist |
| Rwanda | Facility type | Dispensary, Medical Clinic, Health Center, VCT center, Health Post, Laboratory/Diagnostic Center, Health Post level 2, District Pharmacy, Community-owned health facility, Provincial Hospital, Referral Hospital, District Hospital, Nursing School, Pharmaceutical Warehouse, Other facility type, facilitytype, NA, Administrative Office, Blood bank/Transfusion Center, Prison Clinic | Health Centre, Health Post, District Hospital, Referral Hospital, Provincial Hospital, National Referral Hospital, Secondary Health Post | doctors, hospital, pharmacy, clinic, dentist |
| South Sudan | type | Other, Primary Health Care Centre, Primary Health Care Unit, County Hospital, County Health Department, State Hospital, Specialized Hospital/Clinic, Teaching Hospital, Health Training Institutions, Ministry of Health | Primary Health Care Unit, Primary Health Care Centre, State Hospital, Teaching Hospital, County Hospital | hospital, clinic, doctors, pharmacy, |
| Tanzania | Facility Type | Health Labs - Level IA2 (Dispensary Laboratory), Dispensary, Clinic - Dental Clinic, Clinic - Diagnostic Centre, Clinic - Polyclinic, Health Center, Maternity Home, Clinic - Other Clinic, Clinic - Eye Clinic, Clinic - Optometry Clinic, Clinic - Specialized Polyclinic, Clinic, Hospital - Hospital at Zonal Level, Clinic - Super specialized Polyclinic, Clinic - Medical Clinic, Health Labs - Level III single purpose Health Laboratory, Hospital - Hospital at District Level, Clinic - Specialized clinic, Health Labs, Hospital - Regional Referral Hospital, Maternity and Nursing Home, Hospital - Referral Hospital at Regional Level, Clinic - General Clinic, Hospital - District Hospital, Hospital - Council Designated Hospital, Hospital - Other Hospital, Hospital - Hospital at Regional Level, Hospital - Referral Hospital at Zonal Level, Health Labs - Specimen collection point, Health Labs - Level III Multipurpose Health Laboratory, Health Labs - Level IA1 (Health centre Laboratory), Nursing Home, Hospital - Super Specialized Hospital at National Level, Clinic - Dialysis Clinic, Clinic - Physiotherapy Clinic, Hospital - Referral Hospital at National Level, NA, Health Labs - Level IIA2 (District Laboratory), Clinic - Super specialized clinic, Hospital | Health Centre, Dispensary, Hospital, Referral Hospital, Designated District Hospital, District Hospital, Regional Referral Hospital, National Hospital, Primary Health Care Unit, Primary Health Care Unit +, Primary Health Care Centre, Tertiary Hospital | hospital, pharmacy, clinic, doctors, dentist, , traditional, medical_laboratory |
| Zambia | facility_type | Health Post, Rural Health Centre, Hospital - Level 1, Urban Health Centre, Hospital - Level 3, NA, Hospital - Level 2, Hospital Affiliated Health Centre, Zonal Health Centre, Border Health Post | Health Centre, Health Post, Level 1 Hospital, Level 2 Hospital, Rural Health Centre, Clinic, Level 3 Hospital | clinic, hospital, , pharmacy, dentist, doctors, healthcare, health_post |
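The facility-type comparison above needed one unique() call per dataset because the type column has a different name in each MFL. One way around this is a named lookup vector that records which column holds the facility type in each dataset, so that a single generic step can do the extraction. The sketch below uses toy data frames with values from the excerpts earlier in this post; type_columns and mfl_list are hypothetical names, and only four of the seven MFLs are included.

```r
# Hypothetical lookup: which column holds the facility type in each dataset
type_columns <- c(ken_mfl = "Facility type",
                  mwi_mfl = "TYPE",
                  tza_mfl = "Facility Type",
                  zmb_mfl = "facility_type")

# Toy stand-ins for the MFL data frames (values taken from the excerpts above)
mfl_list <- list(
  ken_mfl = data.frame(`Facility type` = c("Dispensary", "Medical Clinic"),
                       check.names = FALSE, stringsAsFactors = FALSE),
  mwi_mfl = data.frame(TYPE = c("Clinic", "Clinic"), stringsAsFactors = FALSE),
  tza_mfl = data.frame(`Facility Type` = c("Health Labs - Level IA2 (Dispensary Laboratory)", "Dispensary"),
                       check.names = FALSE, stringsAsFactors = FALSE),
  zmb_mfl = data.frame(facility_type = c("Health Post", "Rural Health Centre"),
                       stringsAsFactors = FALSE)
)

# One generic step replaces the per-dataset unique() calls
unique_types <- lapply(names(type_columns), function(ds) {
  unique(mfl_list[[ds]][[type_columns[[ds]]]])
})
names(unique_types) <- names(type_columns)
unique_types$mwi_mfl
# → "Clinic"
```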
The guide HIS geo-enabling: Guidance on the establishment of a common geo-registry for the simultaneous hosting, maintenance, update and sharing of master lists core to public health, developed in 2017 by the AeHIN GIS Lab and InSTEDD with funding from the Asian Development Bank, splits the information included in an MFL into two domains: the signature domain (elaborated on below) and the service domain (which covers the services offered by a health facility and its capacity).
The signature domain includes the following:
The guide strongly recommends recording the source of each data point along with a timestamp of when it was obtained. This effectively results in three columns for each data point: the first containing the information itself, such as the identifier, name, or coordinates; the second containing the source; and the third containing the date and time at which it was obtained.
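In a data frame, this value/source/timestamp pattern could look like the sketch below. The helper add_provenance() and the _source and _obtained column suffixes are our own hypothetical choices rather than names prescribed by the guide, and the timestamp is made up.

```r
# Hypothetical helper: append "<attribute>_source" and "<attribute>_obtained"
# columns for an existing attribute column
add_provenance <- function(df, attribute, source, obtained = Sys.time()) {
  df[[paste0(attribute, "_source")]]   <- rep(source, nrow(df))
  df[[paste0(attribute, "_obtained")]] <- rep(obtained, nrow(df))
  df
}

# Toy data using two facilities from the Kenyan MFL excerpt
ken_demo <- data.frame(Code = c(25720, 25731),
                       Name = c("Itete Dispensary", "Highrise Healthcare Services"),
                       stringsAsFactors = FALSE)
ken_demo <- add_provenance(ken_demo, "Name", source = "Kenya MFL",
                           obtained = as.POSIXct("2020-05-08 10:00:00", tz = "UTC"))
names(ken_demo)
# → "Code" "Name" "Name_source" "Name_obtained"
```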
According to the guide, capturing service-domain information, and any other information not mentioned above, is an optional component of the MFL.
Taking a closer look at the attributes available from the country MFLs and the other open data sources, we notice great variability in how thoroughly facilities are described. The total number of attributes in each dataset is plotted below.
# Create table with two columns - country & number of attributes i.e. number of columns
header_df <- tibble(Country = c("Kenya MFL", "Malawi MFL", "Namibia MFL", "Rwanda MFL", "South Sudan MFL", "Tanzania MFL", "Zambia MFL",
"KWTRP", "healthsites.io"),
Attributes = c(length(colnames(ken_mfl)), length(colnames(mwi_mfl)), length(colnames(nam_mfl)), length(colnames(rwa_mfl)),
length(colnames(ssd_mfl)), length(colnames(tza_mfl)), length(colnames(zmb_mfl)), length(colnames(ken_who)),
length(colnames(ken_hs)))
)
header_df %>%
# Order descending so that plot looks nicer
arrange(desc(Attributes)) %>%
mutate(Country = factor(Country, Country)) %>%
ggplot(aes(x = Country, y = Attributes)) +
geom_bar(stat = "identity", width = 0.5, fill = "#FD6C6C") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 30))
The table below shows the column headers for each dataset.
# Create a vector containing the column headers for each dataset, padded with empty strings
# We want a tibble with each dataset's headers as a column, so all vectors must be of equal length
# The healthsites.io data has the most headers, so we pad every vector to that length
fill_col <- function(df){
  # Number of empty cells needed to match the number of healthsites.io headers
  fill_number <- length(colnames(ken_hs)) - length(colnames(df))
  # Sorted column names of the target dataset followed by the padding;
  # rep() returns an empty vector when fill_number is 0
  col_data <- c(sort(colnames(df)), rep("", fill_number))
  # Return the vector of the required length
  return(col_data)
}
# Create a table with one column per dataset, containing its header names sorted alphabetically
table_headers <- tibble(`healthsites.io` = sort(colnames(ken_hs)),
WHO = fill_col(ken_who),
`Kenya MFL` = fill_col(ken_mfl),
`Malawi MFL` = fill_col(mwi_mfl),
`Namibia MFL` = fill_col(nam_mfl),
`Rwanda MFL` = fill_col(rwa_mfl),
`South Sudan MFL` = fill_col(ssd_mfl),
`Tanzania MFL` = fill_col(tza_mfl),
`Zambia MFL` = fill_col(zmb_mfl)
)
| healthsites.io | WHO | Kenya MFL | Malawi MFL | Namibia MFL | Rwanda MFL | South Sudan MFL | Tanzania MFL | Zambia MFL |
|---|---|---|---|---|---|---|---|---|
| addr_full | Admin1 | Approved | CODE | alt_phone_number | District | #"idGeo" | Common Name | catchment_population_cso |
| amenity | Country | Beds | COMMON NAME | catchment_population | District_1 | ACLED Refs | Council | catchment_population_head_count |
| beds | Facility name | Closed | DATE OPENED | contact_person | Facility type | Alternate Names | Created At | DHIS2_UID |
| changeset_id | Facility type | Code | DISTRICT | facility_type | Facity Name | County | CTC Number | district |
| changeset_timestamp | iso3c | Constituency | LATITUDE | id | id | Deleted | Date Opened | eLMIS_ID |
| changeset_user | Lat | Cots | LONGITUDE | infrastructure_ids | LOCATION | Ext Ref | District | facility_type |
| changeset_version | LL source | County | NAME | infrastructure_names | Opening Date | Extension Accessible | Facility Name | HMIS_code |
| completeness | Long | Facility type | OWNERSHIP | location_ownership_name | Ownership | Extension Operational | Facility Number | iHRIS_ID |
| contact_number | Ownership | Facility_type_category | STATUS | location_type_name | Province | Facility | Facility Type | latitude |
| country | | Keph level | TYPE | long_name | Sector | Indicators | Generator | location |
| dispensing | | Name | ZONE | name | Subdistrict | Location | Latitude | longitude |
| electricity | | Officialname | | parent_location_id | | Payam | Longitude | name |
| emergency | | Open_late_night | | parent_location_name | | Pilot Accessible | National Grid | operation_status |
| geometry | | Open_public_holidays | | phone_number | | Pilot Operational | No Electricity | ownership |
| health_amenity_type | | Open_weekends | | point_x | | Sampled | Operating Status | province |
| healthcare | | Open_whole_day | | point_y | | State | Other | smartcare_GUID |
| insurance | | Operation status | | service_ids | | type | Ownership | |
| is_in_health_area | | Owner | | service_names | | | Region | |
| is_in_health_zone | | Owner type | | | | | Registration Number | |
| iso3c | | Public visible | | | | | Registration Status | |
| name | | Registration_number | | | | | Solar Panels | |
| opening_hours | | Regulatory body | | | | | Updated At | |
| operational_status | | Service_names | | | | | Village/Street | |
| operator | | Sub county | | | | | Ward | |
| operator_type | | Ward | | | | | Zone | |
| osm_id | | | | | | | | |
| osm_type | | | | | | | | |
| source | | | | | | | | |
| speciality | | | | | | | | |
| staff_doctors | | | | | | | | |
| staff_nurses | | | | | | | | |
| url | | | | | | | | |
| uuid | | | | | | | | |
| water_source | | | | | | | | |
| wheelchair | | | | | | | | |
A word cloud offers a visual way to look at the overlap in attribute names between the datasets.
library(tm)
library(ggwordcloud)
# Used this tutorial to create frequency table:
# https://www.pluralsight.com/guides/visualization-text-data-using-word-cloud-r
# Create corpus
corpus <- Corpus(VectorSource(c(colnames(ken_hs), colnames(ken_who),
colnames(ken_mfl), colnames(mwi_mfl), colnames(nam_mfl),
colnames(rwa_mfl), colnames(ssd_mfl), colnames(tza_mfl), colnames(zmb_mfl))))
# Convert to lower case; content_transformer() wraps a plain character function
# such as tolower() so that tm_map() returns a valid corpus
corpus <- tm_map(corpus, content_transformer(tolower))
# Create frequency table
DTM <- TermDocumentMatrix(corpus)
mat <- as.matrix(DTM)
f <- sort(rowSums(mat),decreasing=TRUE)
dat <- data.frame(word = names(f),freq=f)
# Plot wordcloud
ggwordcloud::ggwordcloud2(dat, size=1.2)
# Clean up
remove(corpus, DTM, mat, f, dat)
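The word cloud splits headers such as Facility type into separate words, so it shows shared vocabulary rather than shared attributes. A complementary, lighter-weight check is to normalise the header names and count in how many datasets each one occurs. The helper normalise_headers() and the three short header sets below are illustrative; the headers themselves come from the Kenyan, Tanzanian and Zambian MFLs.

```r
# Hypothetical helper: normalise header names so that "Facility type",
# "Facility Type" and "facility_type" are all counted as "facility_type"
normalise_headers <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of spaces, slashes, etc. become "_"
  gsub("^_+|_+$", "", x)           # trim leading and trailing underscores
}

# A few real headers from three of the MFLs (illustrative subset only)
header_sets <- list(
  kenya    = c("Facility type", "Owner", "County"),
  tanzania = c("Facility Type", "Ownership", "Region"),
  zambia   = c("facility_type", "ownership", "province")
)

# Count each normalised header once per dataset
counts <- table(unlist(lapply(header_sets, function(h) unique(normalise_headers(h)))))
sort(counts, decreasing = TRUE)
```

Here facility_type turns up in all three datasets and ownership in two, while Owner and Ownership remain separate entries: normalising the spelling only goes so far, and matching synonyms still needs a human decision.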
Andy has already started working on functionality in afrihealthsites that will allow users to load a file containing a custom health facility list. This may be data obtained from the internet or an in-house (proprietary) file, for example an official MFL such as the ones described in this post. The new functionality will enable users to compare and contrast their dataset with the KWTRP and healthsites.io data, amongst others. We'll continue developing functionality related to health facility lists and look forward to hearing from the community about their needs and experience.
Please get in touch through one of the channels listed on our website.
Please cite as:
Anelda van der Walt, & Andy South. (2020, June 1). Exploring open African health facility data (Version v1.1). Zenodo. http://doi.org/10.5281/zenodo.3871224