Summary

There are two main findings:

  1. The five cities with the most mosques are: Gbeke, Côte d’Ivoire; Abidjan, Côte d’Ivoire; Tripoli, Libya; Niamey, Niger; and Khartoum, Sudan.
  2. Of the 63808 places of worship, 14859 are labeled “muslim”. I am assuming that these are all mosques, which gives us considerably more data than we anticipated.

Analysis

library(tidyverse)
library(sf)
library(plotly)
library(tmap)
library(countrycode)

First, we read in the GADM data, which contains administrative regions at every level worldwide. The shapefile used here has already been cropped to Africa.

# GADM data, subsetted to only Africa
st_afr <- st_read("../data/af_adm_shape/af_shape.shp")
## Reading layer `af_shape' from data source `C:\Users\chiga\Projects\mosques_id_project\data\af_adm_shape\af_shape.shp' using driver `ESRI Shapefile'
## Simple feature collection with 53754 features and 59 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -25.3618 ymin: -34.83514 xmax: 51.4157 ymax: 37.55986
## Geodetic CRS:  WGS 84
# Libya and Western Sahara do not have administrative levels below the
# 1st level (province level), so use the 1st-level name in place of the
# 2nd-level name for these two countries.
libya <- st_afr %>% 
  filter(NAME_0 == "Libya") %>% 
  mutate(NAME_2 = NAME_1)
w_sahara <- st_afr %>% 
  filter(NAME_0 == "Western Sahara") %>% 
  mutate(NAME_2 = NAME_1)
# drop the original rows for both countries (matched on NAME_0, the
# country name), then add the patched rows back
st_afr <- st_afr %>% 
  filter(!NAME_0 %in% c("Libya", "Western Sahara")) %>% 
  bind_rows(libya, w_sahara)
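
The shapefile read above is already restricted to Africa, and the report does not show how that subset was produced. As a rough sketch only, one way to do it would be to map the ISO3 country codes in `GID_0` to continents with the `countrycode` package. The table `gadm_toy` below is a made-up stand-in for the GADM attribute table, not the author's data.

```r
library(dplyr)
library(countrycode)

# Toy stand-in for the GADM attribute table (hypothetical rows);
# GID_0 holds ISO 3166-1 alpha-3 country codes.
gadm_toy <- tibble(
  GID_0  = c("NGA", "FRA", "LBY", "BRA"),
  NAME_0 = c("Nigeria", "France", "Libya", "Brazil")
)

# Map each ISO3 code to a continent and keep only the African rows.
africa_toy <- gadm_toy %>%
  mutate(continent = countrycode(GID_0, origin = "iso3c",
                                 destination = "continent")) %>%
  filter(continent == "Africa")

africa_toy$NAME_0
```

The same `mutate()` and `filter()` steps work on an `sf` object, since the geometry column is carried along with the attributes.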

Then, we read in all the CSV files provided by Clark’s hardworking graduate student(s)! Together, these files include 63808 places of worship.

# list every CSV file in the data directory
# (the pattern is a regular expression: escape the dot, anchor at the end)
ttl_file <- list.files("../data") %>% str_subset("\\.csv$")
pre_path <- "../data/"
ttl_df <- tibble()

# loop over each file and combine everything into one data frame
for(i in seq_along(ttl_file)){
  curr_path <- paste0(pre_path, ttl_file[i])
  temp <- read.csv(curr_path, encoding = "UTF-8")
  ttl_df <- rbind(ttl_df, temp)
}

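# keep only places of worship coded as "muslim" (assumed to be mosques)
mosque <- ttl_df %>% filter(fclass == "muslim")
# convert to point geometries (WGS 84) and join them to the
# administrative polygons
sf_mosque <- st_as_sf(mosque, coords = c("long", "lat"), crs = 4326)
st_afr_join <- st_join(st_afr, sf_mosque)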
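
The file-reading loop above grows a data frame one file at a time. A possible alternative, sketched here under the assumption that all files share the same columns, reads every file into a list and binds the list once. The directory `demo_dir` and its two small files are hypothetical stand-ins created only so the example can run on its own.

```r
library(dplyr)

# Hypothetical demo directory with two small CSV files standing in for ../data
demo_dir <- file.path(tempdir(), "worship_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(osm_id = 1:2, fclass = c("muslim", "christian")),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(osm_id = 3L, fclass = "muslim"),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# `pattern` is a regular expression; full.names = TRUE returns usable paths
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into a list element, then bind all of them in one step
ttl_demo <- csv_paths %>%
  lapply(read.csv, encoding = "UTF-8") %>%
  bind_rows()

nrow(ttl_demo)
```

Binding once avoids repeatedly copying the accumulated data frame, which may matter as the number of files grows.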

Of the 63808 places of worship, 14859 are coded as “muslim”. I am assuming that all of these are mosques, though that assumption could be wrong.
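
One quick way to probe that assumption is to tabulate the raw `fclass` labels before filtering, so that variants in spelling or capitalisation are visible. The sketch below uses a made-up table, `ttl_toy`; the label values other than "muslim" are invented for illustration and are not taken from the actual data.

```r
library(dplyr)

# Toy stand-in for ttl_df (hypothetical values)
ttl_toy <- tibble(
  osm_id = 1:6,
  fclass = c("muslim", "christian", "muslim", "Muslim ", "christian", "muslim")
)

# Tabulate the raw labels to see every variant that occurs
ttl_toy %>% count(fclass, sort = TRUE)

# Normalise case and surrounding whitespace before matching
n_muslim <- ttl_toy %>%
  mutate(fclass_clean = tolower(trimws(fclass))) %>%
  filter(fclass_clean == "muslim") %>%
  nrow()

n_muslim
```

If the tabulation of the real data shows only one Muslim label, the exact-match filter used in the analysis is sufficient as written.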

I then count how many mosques are in each city, defined here as the 2nd administrative level, below country (level 0) and province (level 1). Note, however, that in the GADM data the lowest level of division for Western Sahara and Libya is the 1st level (a mixture of provincial and city level). Without more detailed data on administrative districts, I used the 1st level for these two countries alongside the 2nd level for all other countries.
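
The counting step can be illustrated on toy geometry. In this sketch the three square "districts" and the three points are invented; the point is only to show that `st_join()` keeps every polygon (a left join), so polygons without any point appear once with `NA` attributes and have to be dropped before counting.

```r
library(sf)
library(dplyr)

# Helper that builds a 1 x 1 degree square with its lower-left corner at (x0, 0)
make_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Three toy districts (hypothetical)
districts <- st_sf(
  city_name = c("A", "B", "C"),
  geometry  = st_sfc(make_square(0), make_square(2), make_square(4),
                     crs = 4326)
)

# Three toy points: two fall in A, one in B, none in C
pts <- st_as_sf(
  data.frame(osm_id = 1:3, long = c(0.2, 0.7, 2.5), lat = c(0.5, 0.5, 0.5)),
  coords = c("long", "lat"), crs = 4326
)

# Left join: one row per polygon-point match; C is kept with osm_id = NA
joined <- st_join(districts, pts)

# Drop unmatched polygons, then count points per district
counts <- joined %>%
  st_drop_geometry() %>%
  filter(!is.na(osm_id)) %>%
  count(city_name, name = "num_mosque")

counts
```

Without the `filter(!is.na(osm_id))` step, district C would be counted as having one mosque, because its single unmatched row would still be tallied.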

The five cities with the most mosques are: Gbeke, Côte d’Ivoire; Abidjan, Côte d’Ivoire; Tripoli, Libya; Niamey, Niger; and Khartoum, Sudan. The specific data are displayed below. I also included a visualization that plots every mosque along with the number of mosques aggregated to the city level.

Note that in the interactive map the administrative districts look crude: I simplified the polygons, sacrificing accuracy, so that the webpage stays under the maximum size allowed by the free hosting service provided by RStudio.

# NAME_2 is the city-level name; drop polygons that do not have one
st_afr_join <- st_afr_join %>% filter(!is.na(NAME_2))

# drop cities that do not contain any mosques in this dataset,
# then count mosques per city (one UID per city polygon)
df_num_mosque <- st_afr_join %>% 
  select(GID_0, NAME_0, GID_1, NAME_1, GID_2, NAME_2, osm_id, name, UID) %>% 
  filter(!is.na(osm_id)) %>% 
  group_by(UID) %>% 
  summarize(num_mosque=n(), across(c(-name,-geometry), .fns=first)) %>% 
  rename(country_code=GID_0, country_name=NAME_0, province_name=NAME_1, 
         city_name=NAME_2)
  
(df_top_10 <- df_num_mosque %>% arrange(desc(num_mosque)) %>% head(10))
# simplify the polygons to keep the interactive map small enough
# to publish via RPubs
df_num_mosque_sim <- 
  st_simplify(df_num_mosque, preserveTopology = TRUE, dTolerance = 1000)

tmap_mode("view")
# plot each mosque and the total number of mosques aggregated to the city level
tm_shape(df_num_mosque_sim) + 
tm_polygons(col="num_mosque", id="city_name", alpha=0.4) +
tm_shape(df_top_10) + 
tm_text("city_name", col="red", size=1.5)  + 
tm_shape(sf_mosque) + 
tm_dots(size=0.005, col="blue", id="name")
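
The trade-off behind the simplification step can be seen on a toy shape. This is a sketch rather than the analysis itself: the circle-like polygon, its centre, and the choice of UTM zone 31N are arbitrary assumptions made so that distances are in metres. For longitude/latitude data such as the GADM polygons, my understanding is that recent `sf` versions with `sf_use_s2()` enabled also interpret `dTolerance` in metres, but that is worth checking against the installed version.

```r
library(sf)

# A toy circle-like polygon with many vertices, in a projected CRS (metres).
# EPSG:32631 (UTM zone 31N) is chosen arbitrarily for this illustration.
centre   <- st_sfc(st_point(c(500000, 0)), crs = 32631)
detailed <- st_buffer(centre, dist = 10000, nQuadSegs = 90)

# Same call shape as in the analysis: a 1000 m tolerance
simplified <- st_simplify(detailed, preserveTopology = TRUE, dTolerance = 1000)

# Count vertices as a rough proxy for how much data the map has to carry
n_vertices <- function(g) nrow(st_coordinates(g))
c(before = n_vertices(detailed), after = n_vertices(simplified))
```

A larger `dTolerance` removes more vertices and shrinks the published page further, at the cost of cruder district outlines.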