This brief report demonstrates how to use the ggmap package to create beautiful map overlays from suitable location data. The dataset was obtained from the Tanzania Data Lab portal.
I found the ggmap package fun and easy to use, as it encapsulates most of the messy work in plotting spatial data. I first had to install and load the library as per below:
install.packages('ggmap')
library(ggmap)
The ggmap package makes it very easy to plot spatial data locations. The best thing about ggmap is that it interfaces directly with ggplot2, which makes overlays quick and efficient.
#Reading the csv file
s = read.csv("sl.csv")
The dataset had 15 columns and 4407 rows. I needed to reduce the number of columns since I was only interested in the rank and geolocation data. As always, I approached this stage by reaching for a package. Every R user knows that we sometimes need simple subsets of data, and that doing this with the dplyr package can feel unnecessarily tedious. If only there was a way to use SQL queries! To my pleasant surprise, it is possible to run SQL queries on R dataframes by using the sqldf package. A SELECT statement can be run as long as it is wrapped in the sqldf command, as per below:
#Loading sqldf so that SQL queries can be run on the dataframe
library(sqldf)
sql=sqldf("SELECT REGION,round(avg(RANK),0) AS 'Average_Rank' FROM s GROUP BY REGION;")
#Coercing into a dataframe
sql=data.frame(sql)
#Unfactoring Regions so as to run geocode function on them
library(varhandle)
sql$REGION<-unfactor(sql$REGION)
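For comparison, the same group-and-average step can be sketched in base R with aggregate(), which needs no extra packages. This is only an illustration on a toy dataframe: the `toy` data and its values are made up, and only the REGION and RANK column names are taken from the query above.

```r
# Toy stand-in for the schools data; the values are made up for illustration.
toy <- data.frame(
  REGION = c("ARUSHA", "ARUSHA", "DODOMA", "DODOMA", "DODOMA"),
  RANK   = c(10, 20, 100, 200, 300),
  stringsAsFactors = FALSE
)

# Group by REGION and average RANK, mirroring SELECT ... GROUP BY REGION.
avg_rank <- aggregate(RANK ~ REGION, data = toy, FUN = mean)

# Round to whole numbers and rename the column to match the SQL alias.
avg_rank$RANK <- round(avg_rank$RANK, 0)
names(avg_rank)[names(avg_rank) == "RANK"] <- "Average_Rank"

print(avg_rank)
```

Whether SQL or aggregate() reads better is largely a matter of taste; the SQL version keeps the grouping, rounding and renaming in a single statement.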
Once the dataframe had been created, I installed and loaded the varhandle and ggmap packages in order to obtain the Google geocodes for each region of Tanzania. Obtaining the geocodes entails making calls to the Google Maps API. Note that there is a limit of 2500 requests to the API per day. Longitude and latitude columns were then added to our dataframe.
#Looping through each region to obtain the geocodes
for (i in seq_len(nrow(sql))) {
  #geocode returns a one-row result with lon and lat columns
  latlon = geocode(sql$REGION[i])
  #Appending Longitude and Latitude columns
  sql$Longitude[i] = as.numeric(latlon$lon)
  sql$Latitude[i] = as.numeric(latlon$lat)
}
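Because each lookup is a remote call, some may fail, for example when a request is rejected or a name is not recognised. The sketch below shows one way to fill the columns and then list the rows that are still missing coordinates. It runs offline: `fake_geocode` is a hypothetical stand-in for geocode(), the `regions` dataframe is made up, and the coordinates are approximate values used purely for illustration.

```r
# Hypothetical offline stand-in for geocode(): returns lon/lat for known names
# and NA otherwise. Coordinates are approximate and for illustration only.
fake_geocode <- function(place) {
  known <- data.frame(
    name = c("ARUSHA", "DODOMA"),
    lon  = c(36.68, 35.74),
    lat  = c(-3.37, -6.17),
    stringsAsFactors = FALSE
  )
  hit <- known[known$name == place, c("lon", "lat")]
  if (nrow(hit) == 0) data.frame(lon = NA_real_, lat = NA_real_) else hit
}

regions <- data.frame(REGION = c("ARUSHA", "DODOMA", "NOWHERE"),
                      stringsAsFactors = FALSE)

# Pre-allocate the columns so every row starts as NA.
regions$Longitude <- NA_real_
regions$Latitude  <- NA_real_

for (i in seq_len(nrow(regions))) {
  latlon <- fake_geocode(regions$REGION[i])
  regions$Longitude[i] <- latlon$lon
  regions$Latitude[i]  <- latlon$lat
}

# Rows that could not be geocoded can be listed before plotting.
failed <- regions$REGION[is.na(regions$Longitude)]
print(failed)
```

Checking for missing coordinates before plotting avoids regions silently dropping off the map.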
Once all the geocodes had been obtained, ggplot was called into action. We first set the scaling parameter for the circles:
circle_scale_amt=0.01
We then need to actually render a map of Tanzania. We once again call the geocode function, this time to obtain the coordinates of the country itself. As before, this is a lookup against the Google Maps API rather than a value stored within ggmap, and passing the simple string "Tanzania" suffices.
#A map of Tanzania
tz_center = as.numeric(geocode("Tanzania"))
TZMap = ggmap(get_googlemap(center=tz_center,scale=2, zoom=6), extent="normal")
plot(TZMap)

All that was left was to add the circle overlays, which mark the location of each region and are scaled by the average rank of its schools.
TZMap +
  #Circle size is set directly from Average_Rank (outside aes), so no size scale is needed
  geom_point(aes(x=Longitude, y=Latitude), data=sql, col="orange", alpha=0.3, size=sql$Average_Rank*circle_scale_amt)
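An alternative worth considering is to map the circle size inside aes(), so that ggplot2 handles the scaling and draws a legend. The sketch below illustrates this on plain axes without the Google basemap, so it runs offline. The `toy_regions` dataframe, its average ranks and the size range of 2 to 10 are assumptions made for illustration.

```r
library(ggplot2)

# Made-up regional averages and approximate coordinates, for illustration only.
toy_regions <- data.frame(
  REGION       = c("ARUSHA", "DODOMA"),
  Longitude    = c(36.68, 35.74),
  Latitude     = c(-3.37, -6.17),
  Average_Rank = c(1500, 2500)
)

# Mapping size inside aes() lets scale_size_continuous() control the circle
# sizes (here between 2 and 10) and adds a size legend.
p <- ggplot(toy_regions, aes(x = Longitude, y = Latitude, size = Average_Rank)) +
  geom_point(colour = "orange", alpha = 0.3) +
  scale_size_continuous(range = c(2, 10))

print(p)
```

To draw the same layer on the basemap, the geom_point and scale_size_continuous calls would be added to TZMap with data=sql instead of starting from ggplot().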

It can immediately be noticed that most of the highly ranked schools are located in the coastal regions. A cluster also appears in the northern region of Lake Victoria. These results do not come as a surprise, but it is hard to deny the power of spatial visualizations in discovering insights within data.