License: MIT
License: CC BY 4.0

** Please observe individual dataset licensing if you would like to use the original datasets

Background

In March 2020 the afrimapr team set out to develop building blocks in R that would make open health facility data more accessible to data scientists in Africa and elsewhere. The afrihealthsites package aims to provide functionality to load, analyse, visualise, and map open health facility datasets, such as the list compiled by the KEMRI-Wellcome Trust Research Programme (KWTRP) for sub-Saharan Africa and the data made available via the Global Healthsites Mapping Project (healthsites.io).

Through our research we learned about the term master facility list (MFL). A master facility list contains information about the full complement of health facilities in a country. The World Health Organisation developed a guide for countries wanting to develop their own MFL or wanting to strengthen existing MFLs. We were excited to find several African MFLs available online.

Here we perform some exploratory analysis on the MFLs from a number of countries to understand the overlaps and differences in terms of information that is made available, data format, and more. We also identify opportunities where afrimapr can develop R building blocks to make this kind of analysis easier for others wanting to do something similar.

Intended audience

This post contains fine-grained details about challenges and solutions for reading open health facility lists from Africa into R and analysing the data in a comparative manner. The narrative is written in an accessible way so that readers with no knowledge of R can gain some value from reading the report. The R code and data are made available for readers wanting to reproduce the analysis or customise it for their own use.

The typical audience may include data analysts or data scientists as well as data providers.

Open African MFLs

As mentioned, a number of countries already make their official MFL available online and even allow users to download the data in a variety of formats. Open facility lists that are not necessarily acknowledged as the official MFLs are also available for some other countries. The interactive map below shows information about the availability of open facility lists across the continent. More information on each country can be accessed by clicking on the map.

For this report we decided to focus on countries where data adhered to the following criteria:

  • a facility list is openly available online;
  • the list is acknowledged by the country's Ministry of Health as the official MFL;
  • the MFL can be downloaded without having to request permission; and
  • the downloaded data is in a format that can be analysed in R (Excel, CSV, JSON, XML).
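# Packages used throughout this post
# (assumed list; the original setup chunk is not shown here)
library(tidyverse)        # readr, dplyr, ggplot2, stringr, tibble
library(countrycode)      # countrycode()
library(spData)           # world polygons
library(sf)               # st_transform()
library(tmap)             # interactive map
library(readxl)           # read_xlsx(), read_xls()
library(here)             # here()
library(afrihealthsites)  # afrihealthsites()

# Load Google Sheet describing African open health facility lists
# This sheet is now published as CSV and can be read directly from the URL without needing oauth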
africa_lists <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8n2Db7EnUDx14UvsJvSHJEH3lBCf9j5rMUQ2H8cuxPMeWDRZSCYmzc9MfV7i5UxfFnlhoL2Ipga0s/pub?gid=0&single=true&output=csv")

# Add column with iso2c codes for use in mapping
africa_lists <- africa_lists %>% 
  filter(!is.na(Owner)) %>% 
  mutate(country_iso = countrycode(Country, origin = "country.name", destination = "iso2c")) %>% 
  relocate(country_iso, .after = "Country") %>% 
  # Add column for use in mapping
  mutate(`Open facility list online` = case_when(`Official MFL accessible online` == "yes" ~ "Official MoH MFL",
                                                 `Official MFL accessible online` == "no" ~ "Other source",
                                                 `Official MFL accessible online` == "unclear" ~ "Official status unclear"))

# Merge world data from spData with africa_lists created in the previous step to obtain African polygons
# Select only relevant columns
africa <-  world %>%
  filter(continent == "Africa", !is.na(iso_a2)) %>%
  right_join(africa_lists, by = c("iso_a2" = "country_iso")) %>%
  dplyr::select(name_long, `Official MFL accessible online`, `Open facility list online`, Owner, License, 
                `Data format`, `Downloaded data geocoded`, `Health facility data URL`, `About page URL`, 
                `Alternative health facilities data source`, `Last updated`, geom) %>% 
  st_transform("+proj=aea +lat_1=20 +lat_2=-23 +lat_0=0 +lon_0=25")

# Draw map
tmap_mode("view")
tm_shape(africa) + 
  tm_polygons("Open facility list online",
              # Create list of items that will show when clicking on a country
              popup.vars=c("Owner: "="Owner", "License: "="License", 
                           "Recognised as official MFL: "="Official MFL accessible online",
                           "Data format: " = "Data format", "Geocoded: "="Downloaded data geocoded", 
                           "Last updated: "="Last updated"),
              palette = c("#FD6C6C", "#52463F", "#A87F8E"))
tmap_mode("plot")

Obtaining the open MFL data

Kenya

The Kenyan MFL is available at http://kmhfl.health.go.ke/#/home. The data can be downloaded in Excel format. There also seems to be an API, but we did not use it because it appears to require access to a local copy of the database. Unfortunately, one has to visit the website and manually click the Export Excel button to obtain the data; it cannot be accessed directly via a URL. Once downloaded to our data/raw_data folder, the data is easily loaded using the read_xlsx function from the readxl package.

ken_mfl <- read_xlsx(here("data", "raw_data", "kenya.xlsx"))
Code | Name | Officialname | Registration_number | Keph level | Facility type | Facility_type_category | Owner | Owner type | Regulatory body | Beds | Cots | County | Constituency | Sub county | Ward | Operation status | Open_whole_day | Open_public_holidays | Open_weekends | Open_late_night | Service_names | Approved | Public visible | Closed
25720 | Itete Dispensary | Itete Dispensary | 01245 | Level 2 | Dispensary | DISPENSARY | Ministry of Health | Ministry of Health | Ministry of Health | 8 | 3 | Kakamega | Matungu | Matungu | Koyonzo | Operational | No | No | No | No | NA | Yes | Yes | No
25731 | Highrise Healthcare Services | Highrise Healthcare Services | 017325 | Level 2 | Medical Clinic | MEDICAL CLINIC | Private Practice - Nurse / Midwifery | Private Practice | Kenya MPDB | 0 | 0 | Embu | Mbeere South | Mbeere South | Mbeti South | Operational | No | No | No | No | NA | Yes | Yes | No
Note: An excerpt showing the column headers and format of the raw data available from the Kenyan MFL

Malawi

The Malawi MFL is available at http://zipatala.health.gov.mw/facilities and can be downloaded in Excel or PDF format. An API exists but more information was not available and the API was thus not used. We visited the website and downloaded the data to our data/raw_data folder by clicking on the DOWNLOAD EXCEL button. There is no direct access to the data via a URL.

mwi_mfl <- read_xlsx(here("data", "raw_data", "malawi.xlsx"))
CODE | NAME | COMMON NAME | OWNERSHIP | TYPE | STATUS | ZONE | DISTRICT | DATE OPENED | LATITUDE | LONGITUDE
MC010002 | A + A private clinic | A+A | Private | Clinic | Functional | Centrals West Zone | Mchinji | Jan 1st 75 | -13.797421 | 33.885631
BT240003 | A-C Opticals | A.C Opticals | Private | Clinic | Functional | South East Zone | Blantyre | Jan 1st 75 | -15.8 | 35.03
Note: An excerpt showing the column headers and format of the raw data available from the Malawian MFL

Namibia

The MFL for Namibia is accessible via an API as described on the website. The data can also be downloaded in Excel format directly from the website, but it should be noted that the resultant Excel file contains a very small subset of the total attributes available.

We had some trouble reading the JSON file in R and decided to develop a script in Python that could access the JSON for each facility and convert the dataset to an object that could further be analysed here in R alongside the other country MFLs.

Details of the data structure and download process are available from the Jupyter Notebook. It should be noted that data obtained through the API lists all facilities as having facility type Facility. Most facility names, however, indicate which category the facility belongs to. We therefore included an additional column in the data called facility_type and used regular expressions to identify the facility type according to a list available on the Namibian MFL website. Where the facility name did not include any facility type from that list, we categorised it as facility_type = Other.
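The classification step described above can be sketched in R roughly as follows. This is a minimal illustration and not the code from the authors' Python notebook: the helper name classify_facility and the shortened list of types are assumptions, with the type names borrowed from those that appear elsewhere in this post.

```r
# Hypothetical, shortened list of facility types; the real list comes from
# the Namibian MFL website. More specific names come first, and earlier
# entries take priority when several match the same facility name.
known_types <- c("Regional Health Office", "District Health Office",
                 "Health Centre", "Hospital", "Clinic")

# Return the highest-priority known type found in each name, or "Other".
classify_facility <- function(facility_names, types = known_types) {
  result <- rep("Other", length(facility_names))
  # Walk the types in reverse so that earlier entries overwrite later ones
  for (type in rev(types)) {
    hit <- grepl(type, facility_names, ignore.case = TRUE)
    result[hit] <- type
  }
  result
}

classify_facility(c("Zambezi Regional Health Office",
                    "Sibbinda Health Centre",
                    "Some Outreach Point"))
```

Names that match none of the listed types fall through to Other, mirroring the rule described in the text.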

# Can't download straight from API due to Certificate issues
# Error in open.connection(con, "rb") : 
#  server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
# Decided to try in Python - see python_notebooks/namibia_mfl_convert.ipynb
# Heavily nested JSON teased apart in Python and saved to CSV from Jupyter Notebook to be used in R

nam_mfl <- read_csv(here("data", "raw_data", "namibia.csv"))
name | id | long_name | contact_person | phone_number | alt_phone_number | catchment_population | point_x | point_y | parent_location_id | parent_location_name | location_type_name | location_ownership_name | infrastructure_ids | infrastructure_names | service_ids | service_names | facility_type
Zambezi Regional Health Office | 11981 | NA | NA | NA | NA | NA | -17.4994 | 24.27878 | 10598 | Katima Mulilo District | Facility | Public_MoHSS | NA | NA | NA | NA | Regional Health Office
Sibbinda Health Centre | 10131 | NA | NA | NA | NA | 4111 | -17.7851 | 23.82119 | 10598 | Katima Mulilo District | Facility | Public_MoHSS | 1.256791e+19 | Ambulances,Beds,Electricity,Running Water,Health Extension Workers,Toilets,Phone Number,Computers,Vehicles,Enrolled Nurses,Registered Nurses,Doctors,Administrative Officers | 1.238313e+19 | HIV Testing Services,General Clinical Service,Expanded Programme on Immunizations,Preventing Mother To Child Transmission Services,Viral Load Testing,Sexual Transmitted Infections,Anti Retroviral Therapy IMAI Site,Ante Natal Clinic Services,Family Planning Services,Tuberculosis Services,Option B+,DNA EID Testing | Health Centre
Note: An excerpt showing the column headers and format of the raw data available from the Namibian MFL

Rwanda

Rwanda makes its MFL available for download in CSV, Excel or PDF format. Again, one has to visit the provided webpage and manually click on the CSV button to download the data, as direct access to the data via a URL is not possible.

The raw data contains two District columns that appear to hold exactly the same values. When the file is read into R, the header of the second District column is automatically changed to District_1 to avoid confusion. The resultant dataset in R thus contains both a District and a District_1 column with identical data.
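One way to confirm that the second column really is a duplicate before discarding it is sketched below. The small data frame is a stand-in built from the two rows shown in the excerpt for this country, not the full dataset, and the name rwa_example is hypothetical.

```r
# Toy stand-in for the Rwandan data, using values from the excerpt rows.
rwa_example <- data.frame(
  District   = c("Nyarugenge District", "Gasabo District"),
  Province   = c("Kigali City", "Kigali City"),
  District_1 = c("Nyarugenge District", "Gasabo District"),
  stringsAsFactors = FALSE
)

# Drop District_1 only if it is an exact copy of District.
if (identical(rwa_example$District, rwa_example$District_1)) {
  rwa_example$District_1 <- NULL
}

names(rwa_example)
```

Checking with identical() first avoids silently discarding a column that differs in even a single row.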

rwa_mfl <- read_csv(here("data", "raw_data", "rwanda.csv"))
Facity Name | id | Opening Date | Sector | Subdistrict | District | Province | District_1 | Facility type | Ownership | LOCATION
A La Source DISP | 1140 | 1/1/2000 | Muhima | Muhima Sub District | Nyarugenge District | Kigali City | Nyarugenge District | Dispensary | Private | -1.937517 30.059391
Active Life Physiotherapy ltd | 1664 | 8/17/2016 | Kimironko | Kibagabaga Sub District | Gasabo District | Kigali City | Gasabo District | Medical Clinic | Private | NA
Note: An excerpt showing the column headers and format of the raw data available from the Rwandan MFL

South Sudan

The South Sudan facilities list is available in CSV format from https://www.southsudanhealth.info/facility/fac.php?list. The data can be accessed directly in CSV format via the link - https://www.southsudanhealth.info/PublicData/facility_info_2020-05-08.csv.

ssd_mfl <- read_csv("https://www.southsudanhealth.info/PublicData/facility_info_2020-05-08.csv")
#"idGeo" | Facility | type | Ext Ref | Payam | County | State | Deleted | Indicators | Sampled | Pilot Accessible | Pilot Operational | Extension Accessible | Extension Operational | Alternate Names | Location | ACLED Refs
1 | 140th SPLA Battalion | Other | FC10080402 | Tambura | Tambura County | Gbudwe | NA | 0 | 0 | 0 | 0 | 0 | 0 | NA | 5.52066, 27.46684 | NA
2 | Abara PHCC | Primary Health Care Centre | FC02070702 | Unknown Payam In Magwi | Magwi County | Imatong | NA | 130 | 1 | 0 | 0 | 1 | 1 | abara-phcc, Ababa PHCC | 4.08234, 32.17893 | NA
Note: An excerpt showing the column headers and format of the raw data available from the South Sudan MFL

Tanzania

For Tanzania the MFL is available at http://hfrportal.moh.go.tz/index.php?r=page/index&page_name=about_page with data downloadable in Excel format (XLS). The geocoded data can be accessed directly via a URL, with no need to interact with the website manually. It should be noted that the data might be cached as an empty dataset on the website. If the downloaded file contains no data, please visit the website and ensure all geocoded facilities are selected.

There are also 1,378 facilities without coordinates in this database. These can be downloaded by visiting this URL.

The Excel files with the geocoded and non-geocoded data contain the same columns (Latitude and Longitude are retained in the non-geocoded file). We can therefore merge the two datasets easily for combined analysis.
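Before stacking two such files it can be worth confirming that their columns really do match. The sketch below uses base R and two tiny stand-in data frames with invented values; the object names geocoded, nongeocoded and combined are hypothetical.

```r
# Toy stand-ins for the geocoded and non-geocoded downloads (values invented).
geocoded <- data.frame(`Facility Name` = "Facility A",
                       Latitude = -2.5, Longitude = 32.9,
                       check.names = FALSE)
nongeocoded <- data.frame(`Facility Name` = "Facility B",
                          Latitude = NA_real_, Longitude = NA_real_,
                          check.names = FALSE)

# Only stack the two tables if their columns match exactly, in name and order.
stopifnot(identical(names(geocoded), names(nongeocoded)))
combined <- rbind(geocoded, nongeocoded)

nrow(combined)
```

If the column names differed, stopifnot() would halt with an error instead of producing a misaligned table.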

It should be noted that the very first row of both Excel sheets consists of merged cells. The row contains the date and time at which the data was downloaded and can be deleted. Because of the merged cells, the data has to be downloaded to disk and opened in Excel, LibreOffice or any other spreadsheet package, where the first row is removed and the file saved. Only after this step can the file be loaded successfully into R for further analysis. R does not like merged cells...
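A possible alternative to editing the file by hand is to ask readxl to ignore the first row through the skip argument of read_excel. The sketch below is untested against the actual Tanzanian export, so treat it as an assumption rather than a confirmed fix; the helper name read_without_banner and the file path are hypothetical.

```r
# Hypothetical helper: read an Excel export while ignoring its leading rows.
# In the Tanzanian files the first row holds the merged download-timestamp cells.
read_without_banner <- function(path, rows_to_skip = 1) {
  # skip = n tells readxl to ignore the first n rows before reading headers
  readxl::read_excel(path, skip = rows_to_skip)
}

# Assumed file location, mirroring the paths used elsewhere in this post.
example_path <- file.path("data", "raw_data", "tanzania_geocoded.xls")
if (file.exists(example_path)) {
  tanzania_geocode <- read_without_banner(example_path)
}
```

If this works for a given download, the manual spreadsheet step becomes unnecessary and the import stays reproducible from the raw file.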

It should also be noted that the geocoded data excludes all facilities situated on the islands of Zanzibar and Pemba but includes facilities based on Mafia Island. Zanzibar has its own Ministry of Health (Pemba reports under this MoH).

tanzania_geocode <- read_xls(here("data", "raw_data", "tanzania_geocoded.xls"))
tanzania_nongeocode <- read_xls(here("data", "raw_data", "tanzania_nongeocoded.xls"))

tza_mfl <- tanzania_geocode %>% 
  bind_rows(tanzania_nongeocode)

remove(tanzania_nongeocode, tanzania_geocode)
Facility Number | Facility Name | Common Name | Registration Status | Created At | Updated At | Zone | Region | District | Council | Ward | Village/Street | Facility Type | Operating Status | Ownership | Registration Number | CTC Number | Latitude | Longitude | Date Opened | National Grid | Generator | Solar Panels | No Electricity | Other
113310-7 | 2001 GEM PLUS | NA | Registered | 2019-03-27T18:14:11.000Z | 2019-05-28T17:32:43.000Z | Lake Zone | Mwanza | Nyamagana | Nyamagana MC | Butimba | Not set | Health Labs - Level IA2 (Dispensary Laboratory) | Operating | Private - For Profit | PHL-C/MWZ/AUT/06 | NA | -2.563525 | 32.91266 | NA | 0 | 0 | 0 | 0 | 0
100017-3 | 202 KJ | NA | NA | 2013-08-20T10:23:40.000Z | 2018-08-20T11:36:15.000Z | Central Zone | Not set | Not set | Not set | Not set | Not set | Dispensary | Closed | Public - Military | NA | NA | -5.057160 | 32.82869 | NA | 0 | 0 | 0 | 0 | 0
Note: An excerpt showing the column headers and format of the raw data available from the Tanzanian MFL

Zambia

The Zambian MFL is hosted on GitHub. The raw data is available in CSV format in the GitHub repository.

zmb_mfl <- read_csv("https://raw.githubusercontent.com/MOH-Zambia/MFL/master/geography/data/facility_list.csv")
province | district | name | HMIS_code | DHIS2_UID | smartcare_GUID | eLMIS_ID | iHRIS_ID | location | ownership | facility_type | longitude | latitude | catchment_population_head_count | catchment_population_cso | operation_status
Central | Chibombo | Chamakubi Health Post | 10010001 | pXhz0PLiYZX | 7b46450b78a04a1db64c0fc9bb014773 | NA | facility|1 | Rural | GRZ | Health Post | 27.64199 | -14.79990 | 6624 | 6624 | Operational
Central | Chibombo | Kabangalala Rural Health Centre | 10010011 | sbFApO4who4 | 9a450380b7db4f2fb13156d03fc0bc6d | NA | facility|10 | Rural | GRZ | Rural Health Centre | 27.72866 | -15.16894 | 6345 | 6900 | Operational
Note: An excerpt showing the column headers and format of the raw data available from the Zambian MFL

Loading other open health facility data sets

We can also access the open health facility data available through the KEMRI|Wellcome Trust Research Programme and healthsites.io. Both these datasets can be accessed via the afrimapr afrihealthsites package.

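# Lowercase ISO3 codes for the seven countries analysed in this post
# (assumed definition; `countries` is not defined in the code shown here)
countries <- c("ken", "mwi", "nam", "rwa", "ssd", "tza", "zmb")

# Loop through list of countries to create dataframes for each country containing either WHO data or Healthsites.io data
for (country in countries){
  # Use iso3 code to extract country level data
  # Return a dataframe (by default afrihealthsites returns geoJSON,
  # but not all facilities in the WHO dataset are geocoded and some are lost in geoJSON format)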
  who_df <- afrihealthsites(country, datasource='who', plot=FALSE, returnclass='dataframe')
  hs_df <- afrihealthsites(country, datasource='healthsites', plot=FALSE, returnclass='dataframe')
  
  # Create one dataframe per country per data source
  assign(paste0(country,"_who"), who_df)
  assign(paste0(country,"_hs"), hs_df)
  
  # Clean up workspace - remove temp dataframes
  remove(who_df, hs_df, country)
}

Below we show excerpts from the WHO and healthsites.io data for Kenya to give the reader an overview of column headers and data format.

Kenya: WHO dataset

Country | Admin1 | Facility name | Facility type | Ownership | Lat | Long | LL source | iso3c
Kenya | Baringo | Aiyebo Dispensary | Dispensary | MoH | 0.65783 | 35.80768 | GPS | KEN
Kenya | Baringo | Akwichatis Health Centre | Health Centre | MoH | 1.00150 | 36.23620 | GPS | KEN
Note: An excerpt showing the column headers and format of the raw data available from the WHO data for Kenya

Kenya: healthsites.io dataset

osm_id osm_type completeness is_in_health_zone amenity speciality addr_full operator water_source changeset_id insurance staff_doctors contact_number uuid electricity opening_hours operational_status source is_in_health_area health_amenity_type changeset_version emergency changeset_timestamp name staff_nurses changeset_user wheelchair beds url dispensing healthcare operator_type geometry country iso3c
696655697 node 27 pharmacy 62793048 37fa2725b7824f60ad7ba9f4103ccb06 08:00-20:00 operational survey 7 1537524419 Nafuu Chemist cbeddow yes private c(36.778198187095, -1.31241153440629) Kenya KEN
6807606134 node 10 clinic 74663883 87c9e47944eb4ac5ac72daf6d7ac2f86 1 1568884347 Arap Kobilo yes c(35.9799016217169, 0.468861661533481) Kenya KEN
Note: An excerpt showing the column headers and format of the raw data available from the healthsites.io data for Kenya

Exploring the data

Number of facilities per dataset

# Create dataframe with number of observations and number of columns for each dataset
# Step 1: Create vector for dataset names
dataset_names <- c(ls(pattern = "mfl"), ls(pattern = "who"), ls(pattern = "hs"))

# Step 2: Create vector for # observations and # columns per dataset
# Step 2a: Create list of dataframes to run nrow, ncol on
datasets <- mget(dataset_names)

# Step 2b: Create vectors
dataset_obs <- c()
dataset_cols <- c()
for (ds in datasets){
  dataset_obs <- append(dataset_obs, nrow(ds))
  dataset_cols <- append(dataset_cols, ncol(ds))
}

# Step 3: Create dataframe with everything combined
health_lists_df <- tibble(Dataset = dataset_names, 
                    Facilities = dataset_obs,
                    Attributes = dataset_cols)
health_lists_df <- health_lists_df %>% 
  mutate(Country = case_when(str_detect(Dataset, "ken") ~ "Kenya",
                             str_detect(Dataset, "mwi") ~ "Malawi",
                             str_detect(Dataset, "nam") ~ "Namibia",
                             str_detect(Dataset, "rwa") ~ "Rwanda",
                             str_detect(Dataset, "ssd") ~ "South Sudan",
                             str_detect(Dataset, "tza") ~ "Tanzania",
                             str_detect(Dataset, "zmb") ~ "Zambia")) %>% 
  mutate(`Data Source` = case_when(str_detect(Dataset, "mfl") ~ "Master facility list",
                                   str_detect(Dataset, "who") ~ "WHO",
                                   str_detect(Dataset, "hs") ~ "healthsites.io")) %>% 
  mutate(text = paste("Country: ", Country, "\nData source: ", `Data Source`, "\nFacilities: ", Facilities, "\nAttributes: ", Attributes, sep="")) %>%
  mutate(Dataset = factor(Dataset, Dataset))

# Step 4: Remove clutter
remove(dataset_names, dataset_obs, dataset_cols, ds)
health_lists_df %>% 
  group_by(Country) %>% 
  ggplot(aes(x = Country, y = Facilities, fill = `Data Source`)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("#52463F", "#FD6C6C", "#A87F8E")) +
  theme_minimal()

Facility types

If we want to compare the types of facilities listed in each dataset, we first have to (manually) identify the column that contains information about facility type.
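A small helper can narrow down the candidates, although a manual choice is often still required. The sketch below is an illustration only: find_type_columns is a hypothetical name, and the two empty data frames simply reuse a few column names from the excerpts shown earlier.

```r
# Hypothetical helper: list the columns whose names mention "type".
find_type_columns <- function(df) {
  grep("type", names(df), ignore.case = TRUE, value = TRUE)
}

# Column names taken from the excerpts in this post; the data itself is omitted.
mwi_cols <- data.frame(CODE = character(), NAME = character(),
                       TYPE = character())
ken_cols <- data.frame(`Facility type` = character(),
                       `Owner type` = character(),
                       Owner = character(),
                       check.names = FALSE)

find_type_columns(mwi_cols)  # a single candidate
find_type_columns(ken_cols)  # two candidates, so a manual choice is still needed
```

A name-based search like this only suggests columns. It cannot tell a facility type column from an owner type column, which is why the selection below was done by hand.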

k_types <- unique(ken_mfl$`Facility type`)

m_types <- unique(mwi_mfl$TYPE)

n_types <- unique(nam_mfl$facility_type)

r_types <- unique(rwa_mfl$`Facility type`)

s_types <- unique(ssd_mfl$type)

t_types <- unique(tza_mfl$`Facility Type`)

z_types <- unique(zmb_mfl$facility_type)

# Can't write function for next step because column names vary between datasets
# Get the names of WHO country dataframes
names_who <- c(ls(pattern = "^\\w\\w\\w_who$"))

# Get the WHO country dataframe content
datasets_who <- mget(names_who)

# Create vectors with country name and types
who_country <- c()
who_types <- c()
for (who in datasets_who){
  who_country <- append(who_country, unique(who$Country))
  who_types <- append(who_types, list(unique(who$`Facility type`)))
}

# Create dataframe with country name in first column and unique facility types observed in second column
who_fac_types <- tibble(country = who_country,
                        types = who_types)

# Get the names of healthsites.io country dataframes
names_hs <- c(ls(pattern = "^\\w\\w\\w_hs$"))

# Get the healthsites.io country dataframe content
datasets_hs <- mget(names_hs)

# Create vectors with country name and types
hs_country <- c()
hs_types <- c()
for (hs in datasets_hs){
  hs_country <- append(hs_country, unique(hs$country))
  hs_types <- append(hs_types, list(unique(hs$amenity)))
}

# Create dataframe with country name in first column and unique facility types observed in second column
hs_fac_types <- tibble(country = hs_country,
                        types = hs_types)
facility_types_table <- tibble(
  Country = unlist(hs_fac_types$country),
  `MFL "Type" Column` = c("Facility type", "TYPE", "None provided", "Facility type", "type", "Facility Type", "facility_type"),
  MFL = c(paste(k_types, collapse = ", "), paste(m_types, collapse = ", "), paste(n_types, collapse = ", "), paste(r_types, collapse = ", "), 
          paste(s_types, collapse = ", "), paste(t_types, collapse = ", "), paste(z_types, collapse = ", ")),
  `WHO (Facility type)` = c(paste(unlist(who_fac_types$types[who_fac_types$country == "Kenya"]), collapse = ", "),
          paste(unlist(who_fac_types$types[who_fac_types$country == "Malawi"]), collapse = ", "),
          paste(unlist(who_fac_types$types[who_fac_types$country == "Namibia"]), collapse = ", "),
          paste(unlist(who_fac_types$types[who_fac_types$country == "Rwanda"]), collapse = ", "),
          paste(unlist(who_fac_types$types[who_fac_types$country == "South Sudan"]), collapse = ", "),
          paste(unlist(who_fac_types$types[who_fac_types$country == "Tanzania"]), collapse = ", "),
          paste(unlist(who_fac_types$types[who_fac_types$country == "Zambia"]), collapse = ", ")
          ),
  `healthsites.io (amenity)` = c(paste(unlist(hs_fac_types$types[hs_fac_types$country == "Kenya"]), collapse = ", "),
                     paste(unlist(hs_fac_types$types[hs_fac_types$country == "Malawi"]), collapse = ", "),
                     paste(unlist(hs_fac_types$types[hs_fac_types$country == "Namibia"]), collapse = ", "),
                     paste(unlist(hs_fac_types$types[hs_fac_types$country == "Rwanda"]), collapse = ", "),
                     paste(unlist(hs_fac_types$types[hs_fac_types$country == "South Sudan"]), collapse = ", "),
                     paste(unlist(hs_fac_types$types[hs_fac_types$country == "Tanzania"]), collapse = ", "),
                     paste(unlist(hs_fac_types$types[hs_fac_types$country == "Zambia"]), collapse = ", "))
)
Country MFL "Type" Column MFL WHO (Facility type) healthsites.io (amenity)
Kenya Facility type Dispensary, Medical Clinic, Medical Center, Secondary care hospitals, Basic Health Centre, Nursing and Maternity Home, Nursing Homes, Primary care hospitals, Specialized & Tertiary Referral hospitals, Dental Clinic, MEDICAL CLINIC, VCT, Laboratory, Rehab. Center - Drug and Substance abuse, Ophthalmology, Comprehensive health Centre, NURSING HOME, Comprehensive Teaching & Tertiary Referral Hospital, Dialysis Center, Radiology Clinic, Blood Bank, Regional Blood Transfusion Centre, Pharmacy, MEDICAL CENTER, DISPENSARY, Dispensaries and clinic-out patient only, HEALTH CENTRE, Farewell Home, HOSPITALS Dispensary, Health Centre, District Hospital, Sub-District Hospital, Mission Hospital, Clinic, Hospital, County Referral Hospital, Provincial General Hospital, National Referral Hospital pharmacy, clinic, hospital, doctors, dentist,
Malawi TYPE Clinic, Hospital, Dispensary, Health Centre, District Hospital, Health Post, Unclassified, Central Hospital, Private Clinic, Health Centre, Community Hospital, Health Post/Dispensary, District Hospital, Mission Hospital, Rural Hospital, Central Hospital hospital, clinic, pharmacy, dentist
Namibia None provided Regional Health Office, Health Centre, Clinic, Other, Hospital, District Health Office, Rehab Centre, Prison, CBART, Mobile Van Clinic, Health Centre, District Hospital, Mission Hospital, Intermediate Hospital, Central Hospital doctors, clinic, , pharmacy, hospital, dentist
Rwanda Facility type Dispensary, Medical Clinic, Health Center, VCT center, Health Post, Laboratory/Diagnostic Center, Health Post level 2, District Pharmacy, Community-owned health facility, Provincial Hospital, Referral Hospital, District Hospital, Nursing School, Pharmaceutical Warehouse, Other facility type, facilitytype, NA, Administrative Office, Blood bank/Transfusion Center, Prison Clinic Health Centre, Health Post, District Hospital, Referral Hospital, Provincial Hospital, National Referral Hospital, Secondary Health Post doctors, hospital, pharmacy, clinic, dentist
South Sudan type Other, Primary Health Care Centre, Primary Health Care Unit, County Hospital, County Health Department, State Hospital, Specialized Hospital/Clinic, Teaching Hospital, Health Training Institutions, Ministry of Health Primary Health Care Unit, Primary Health Care Centre, State Hospital, Teaching Hospital, County Hospital hospital, clinic, doctors, pharmacy,
Tanzania Facility Type Health Labs - Level IA2 (Dispensary Laboratory), Dispensary, Clinic - Dental Clinic, Clinic - Diagnostic Centre, Clinic - Polyclinic, Health Center, Maternity Home, Clinic - Other Clinic, Clinic - Eye Clinic, Clinic - Optometry Clinic, Clinic - Specialized Polyclinic, Clinic, Hospital - Hospital at Zonal Level, Clinic - Super specialized Polyclinic, Clinic - Medical Clinic, Health Labs - Level III single purpose Health Laboratory, Hospital - Hospital at District Level, Clinic - Specialized clinic, Health Labs, Hospital - Regional Referral Hospital, Maternity and Nursing Home, Hospital - Referral Hospital at Regional Level, Clinic - General Clinic, Hospital - District Hospital, Hospital - Council Designated Hospital, Hospital - Other Hospital, Hospital - Hospital at Regional Level, Hospital - Referral Hospital at Zonal Level, Health Labs - Specimen collection point, Health Labs - Level III Multipurpose Health Laboratory, Health Labs - Level IA1 (Health centre Laboratory), Nursing Home, Hospital - Super Specialized Hospital at National Level, Clinic - Dialysis Clinic, Clinic - Physiotherapy Clinic, Hospital - Referral Hospital at National Level, NA, Health Labs - Level IIA2 (District Laboratory), Clinic - Super specialized clinic, Hospital Health Centre, Dispensary, Hospital, Referral Hospital, Designated District Hospital, District Hospital, Regional Referral Hospital, National Hospital, Primary Health Care Unit, Primary Health Care Unit +, Primary Health Care Centre, Tertiary Hospital hospital, pharmacy, clinic, doctors, dentist, , traditional, medical_laboratory
Zambia facility_type Health Post, Rural Health Centre, Hospital - Level 1, Urban Health Centre, Hospital - Level 3, NA, Hospital - Level 2, Hospital Affiliated Health Centre, Zonal Health Centre, Border Health Post Health Centre, Health Post, Level 1 Hospital, Level 2 Hospital, Rural Health Centre, Clinic, Level 3 Hospital clinic, hospital, , pharmacy, dentist, doctors, healthcare, health_post

Facility attributes

Taking a closer look at the attributes that are available from country MFLs and other open data sources, we notice great variability in how well facilities are described. The total number of attributes for each dataset is visualised below.

# Create table with two columns - country & number of attributes i.e. number of columns
header_df <- tibble(Country = c("Kenya MFL", "Malawi MFL", "Namibia MFL", "Rwanda MFL", "South Sudan MFL", "Tanzania MFL", "Zambia MFL", 
                                "KWTRP", "healthsites.io"),
                    Attributes = c(length(colnames(ken_mfl)), length(colnames(mwi_mfl)), length(colnames(nam_mfl)), length(colnames(rwa_mfl)),
                                   length(colnames(ssd_mfl)), length(colnames(tza_mfl)), length(colnames(zmb_mfl)), length(colnames(ken_who)),
                                   length(colnames(ken_hs)))
)

header_df %>% 
  # Order descending so that plot looks nicer
  arrange(desc(Attributes)) %>% 
  mutate(Country = factor(Country, Country)) %>% 
  ggplot(aes(x = Country, y = Attributes)) +
  geom_bar(stat = "identity", width = 0.5, fill = "#FD6C6C") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 30))

The table below shows the column headers for each dataset.

# Create a vector containing the column headers for each dataset but filled up with empty space 
# Want to create a tibble with each dataset's headers as a column and therefore we need the vectors to be of equal length
# HS data has the most headers, so we use that as basis and make all vectors as long as HS 
fill_col <- function(df){
  # Calculate the difference in header numbers between healthsites data and the target dataset
  fill_number <- length(colnames(ken_hs)) - length(colnames(df))
  # Create the vector: sorted column names of the target dataset, padded with empty strings
  # rep() also handles fill_number = 0, where a 1:fill_number loop would wrongly add two elements
  col_data <- c(sort(colnames(df)), rep("", max(fill_number, 0)))
  # Return the vector of specified length
  return(col_data)
}

# Create a table with columns = header names sorted alphabetically
table_headers <- tibble(`healthsites.io` = sort(colnames(ken_hs)),
                        WHO = fill_col(ken_who),
                        `Kenya MFL` = fill_col(ken_mfl),
                        `Malawi MFL` = fill_col(mwi_mfl),
                        `Namibia MFL` = fill_col(nam_mfl),
                        `Rwanda MFL` = fill_col(rwa_mfl),
                        `South Sudan MFL` = fill_col(ssd_mfl),
                        `Tanzania MFL` = fill_col(tza_mfl),
                        `Zambia MFL` = fill_col(zmb_mfl)
                        )
healthsites.io WHO Kenya MFL Malawi MFL Namibia MFL Rwanda MFL South Sudan MFL Tanzania MFL Zambia MFL
addr_full Admin1 Approved CODE alt_phone_number District #"idGeo" Common Name catchment_population_cso
amenity Country Beds COMMON NAME catchment_population District_1 ACLED Refs Council catchment_population_head_count
beds Facility name Closed DATE OPENED contact_person Facility type Alternate Names Created At DHIS2_UID
changeset_id Facility type Code DISTRICT facility_type Facity Name County CTC Number district
changeset_timestamp iso3c Constituency LATITUDE id id Deleted Date Opened eLMIS_ID
changeset_user Lat Cots LONGITUDE infrastructure_ids LOCATION Ext Ref District facility_type
changeset_version LL source County NAME infrastructure_names Opening Date Extension Accessible Facility Name HMIS_code
completeness Long Facility type OWNERSHIP location_ownership_name Ownership Extension Operational Facility Number iHRIS_ID
contact_number Ownership Facility_type_category STATUS location_type_name Province Facility Facility Type latitude
country Keph level TYPE long_name Sector Indicators Generator location
dispensing Name ZONE name Subdistrict Location Latitude longitude
electricity Officialname parent_location_id Payam Longitude name
emergency Open_late_night parent_location_name Pilot Accessible National Grid operation_status
geometry Open_public_holidays phone_number Pilot Operational No Electricity ownership
health_amenity_type Open_weekends point_x Sampled Operating Status province
healthcare Open_whole_day point_y State Other smartcare_GUID
insurance Operation status service_ids type Ownership
is_in_health_area Owner service_names Region
is_in_health_zone Owner type Registration Number
iso3c Public visible Registration Status
name Registration_number Solar Panels
opening_hours Regulatory body Updated At
operational_status Service_names Village/Street
operator Sub county Ward
operator_type Ward Zone
osm_id
osm_type
source
speciality
staff_doctors
staff_nurses
url
uuid
water_source
wheelchair


A word cloud offers a visual way to look at the overlap of attributes between the various datasets.

library(tm)
library(ggwordcloud)

# Used this tutorial to create frequency table:
# https://www.pluralsight.com/guides/visualization-text-data-using-word-cloud-r
# Create corpus
corpus <- Corpus(VectorSource(c(colnames(ken_hs), colnames(ken_who), 
                                  colnames(ken_mfl), colnames(mwi_mfl), colnames(nam_mfl),
                                  colnames(rwa_mfl), colnames(ssd_mfl), colnames(tza_mfl), colnames(zmb_mfl))))
# Convert to lowercase
# content_transformer() keeps the corpus structure intact in current versions of tm
corpus <- tm_map(corpus, content_transformer(tolower))

# Create frequency table
DTM <- TermDocumentMatrix(corpus)
mat <- as.matrix(DTM)
f <- sort(rowSums(mat),decreasing=TRUE)
dat <- data.frame(word = names(f),freq=f)
# Plot wordcloud 
ggwordcloud::ggwordcloud2(dat, size=1.2)

# Clean up
remove(corpus, DTM, mat, f, dat)
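The same overlap can also be tabulated directly by counting how many datasets share each column name once the names are normalised. The sketch below is an illustration only: normalise_name is a hypothetical helper, and just a handful of column names from the header table above are used.

```r
# Normalise a column name: lower case, runs of non-alphanumerics become "_".
normalise_name <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}

# A few column names per dataset, copied from the table of headers above.
headers <- list(
  malawi = c("NAME", "TYPE", "OWNERSHIP", "LATITUDE", "LONGITUDE"),
  zambia = c("name", "facility_type", "ownership", "latitude", "longitude"),
  rwanda = c("Facity Name", "Facility type", "Ownership", "LOCATION")
)

# Count in how many datasets each normalised name occurs.
normalised <- lapply(headers, function(h) unique(normalise_name(h)))
shared <- sort(table(unlist(normalised)), decreasing = TRUE)
shared
```

Unlike the word cloud, this treats each column name as a whole, so "Facility type" and "facility_type" are counted as the same attribute while the misspelt "Facity Name" is not matched to "name".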

What next?

Andy has already started working on functionality in afrihealthsites to allow users to load a file containing a custom health facility list. This may be data obtained from the internet or an in-house (proprietary) file, for example an official MFL such as the ones described in this post. The new functionality will enable users to compare and contrast their dataset against the KWTRP and healthsites.io data, amongst others. We'll continue developing functionality related to health facility lists and look forward to hearing from the community about their needs and experience.

Feedback

Please get in touch through one of the channels listed on our website.

 

Please cite as:

Anelda van der Walt, & Andy South. (2020, June 1). Exploring open African health facility data (Version v1.1). Zenodo. http://doi.org/10.5281/zenodo.3871224

http://afrimapr.org