How to add names to administrative divisions with R’s choroplethr package

The choroplethr package is based on ggplot2, but its unusual syntax makes this easy to miss. This short tutorial shows how to add names to administrative divisions (ADs; called regions in the package) on maps made with this package. We begin by loading some packages:

library(choroplethrAdmin1)
library(choroplethr)
library(ggplot2)
library(grid)
library(stringr)
library(magrittr)
library(psych)
library(plyr) #provides ddply(), used below
library(kirkegaard)
#kirkegaard is installed from GitHub:
#library(devtools); install_github("deleetdk/kirkegaard")

To get the names of the ADs, we plot a country without any data and inspect the data the plot uses:

#neutral map
gg = admin1_map("india")
gg

#inspect data
gg$data %>% head(10)
##            long      lat admin                        region  group  order
## 211868 76.81457 30.78913 india union territory of chandigarh 1746.1 211868
## 211869 76.81167 30.77766 india union territory of chandigarh 1746.1 211869
## 211870 76.81333 30.77027 india union territory of chandigarh 1746.1 211870
## 211871 76.81943 30.75678 india union territory of chandigarh 1746.1 211871
## 211872 76.82408 30.74975 india union territory of chandigarh 1746.1 211872
## 211873 76.82625 30.74296 india union territory of chandigarh 1746.1 211873
## 211874 76.82656 30.73528 india union territory of chandigarh 1746.1 211874
## 211875 76.82201 30.72210 india union territory of chandigarh 1746.1 211875
## 211876 76.81880 30.70319 india union territory of chandigarh 1746.1 211876
## 211877 76.82728 30.68071 india union territory of chandigarh 1746.1 211877
##         hole piece   id
## 211868 FALSE     1 1746
## 211869 FALSE     1 1746
## 211870 FALSE     1 1746
## 211871 FALSE     1 1746
## 211872 FALSE     1 1746
## 211873 FALSE     1 1746
## 211874 FALSE     1 1746
## 211875 FALSE     1 1746
## 211876 FALSE     1 1746
## 211877 FALSE     1 1746

It is important to save the plot to an object because we need to access the object’s data to get the positions of the regions. Inspecting the data reveals that each AD consists of a large number of points. Presumably, the function draws lines between the points in each group to form a polygon, which can then be given a color according to the value one wants to map.
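
The grouping idea can be illustrated with a toy example. The sketch below is not choroplethr's internal code; it only assumes that the map is an ordinary ggplot2 polygon layer. The data frame `toy` and its two square "regions" are made up for illustration. The points of each group are joined into one polygon, which is then filled according to `value`.

```r
library(ggplot2)

#two made-up square "regions", each described by its four corner points
toy = data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,   0, 0, 1, 1),
  group = rep(c("a", "b"), each = 4),
  value = rep(c(0.2, 0.8), each = 4)
)

#one polygon per group, colored by value
p = ggplot(toy, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = value), color = "black") +
  coord_equal()
p
```

Without the `group` aesthetic, all eight points would be joined into a single shape, which is why the map data carries a `group` column.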

If the shape is a circle, then one can take the mean position of the points to get the center of the area. However, maps usually don’t consist of circles, so this crude method may give misleading results. The result depends on how the points are distributed along the perimeter: the mean position is pulled towards the part of the border with the highest density of points. Still, let’s try getting the mean positions:

#loop over rows by their AD, then get several averages of the longitude and latitude
d_geo = ddply(gg$data, .variables = "region", .fun = function(block) {
  c("long" = mean(block$long),
    "lat" = mean(block$lat),
    "long_median" = median(block$long),
    "lat_median" = median(block$lat),
    "long_geo" = geometric.mean(block$long),
    "lat_geo" = geometric.mean(block$lat),
    "long_harm" = harmonic.mean(block$long),
    "lat_harm" = harmonic.mean(block$lat),
    "long_midrange" = averages(block$long, types = "midrange") %>% unname,
    "lat_midrange" = averages(block$lat, types = "midrange") %>% unname
    )
})
d_geo
##                                            region     long      lat
## 1             national capital territory of delhi 77.08727 28.63403
## 2                         state of andhra pradesh 79.65457 16.32747
## 3                      state of arunachal pradesh 94.92846 27.85344
## 4                                  state of assam 92.63188 26.05366
## 5                                  state of bihar 85.58115 25.72101
## 6                           state of chhattisgarh 82.04880 20.87567
## 7                                    state of goa 74.00794 15.40028
## 8                                state of gujarat 71.84564 22.43161
## 9                                state of haryana 76.23652 29.22714
## 10                      state of himachal pradesh 77.28348 31.74312
## 11                     state of jammu and kashmir 76.59950 33.33931
## 12                             state of jharkhand 85.78395 23.71770
## 13                             state of karnataka 76.37751 14.82859
## 14                                state of kerala 76.22299 10.76813
## 15                        state of madhya pradesh 78.28090 24.20386
## 16                           state of maharashtra 76.07044 19.15557
## 17                               state of manipur 93.68964 24.71382
## 18                             state of meghalaya 91.45327 25.62376
## 19                               state of mizoram 92.79723 23.37940
## 20                              state of nagaland 94.34928 26.02303
## 21                                state of odisha 83.88828 20.02263
## 22                                state of punjab 75.64937 30.71376
## 23                             state of rajasthan 75.31510 26.08975
## 24                                state of sikkim 88.42492 27.55153
## 25                            state of tamil nadu 78.42955 11.27047
## 26                               state of tripura 91.79045 23.72857
## 27                         state of uttar pradesh 80.21400 26.34729
## 28                           state of uttarakhand 79.19569 30.15301
## 29                           state of west bengal 88.16366 24.09764
## 30 union territory of andaman and nicobar islands 93.01142 11.01888
## 31                  union territory of chandigarh 76.77392 30.73566
## 32      union territory of dadra and nagar haveli 73.04053 20.18239
## 33               union territory of daman and diu 71.99561 20.55732
## 34                 union territory of lakshadweep 72.97500 10.42425
## 35                  union territory of puducherry 79.33236 12.51204
##    long_median lat_median long_geo  lat_geo long_harm lat_harm
## 1     77.07807   28.60701 77.08709 28.63370  77.08692 28.63336
## 2     78.91501   16.38837 79.61955 16.16302  79.58493 15.99808
## 3     95.36785   27.65373 94.91357 27.84280  94.89859 27.83226
## 4     92.59748   25.95668 92.61447 26.03663  92.59708 26.01957
## 5     85.08502   25.74648 85.56621 25.70378  85.55133 25.68656
## 6     82.03440   20.58996 82.04021 20.79274  82.03164 20.71042
## 7     73.95997   15.43521 74.00767 15.39815  74.00741 15.39602
## 8     72.51865   22.30722 71.82175 22.39067  71.79760 22.34978
## 9     76.38617   29.37810 76.23127 29.21210  76.22599 29.19704
## 10    77.41215   31.41987 77.27680 31.73143  77.27011 31.71977
## 11    76.52787   32.95962 76.57790 33.32771  76.55628 33.31628
## 12    85.97385   23.87704 85.77299 23.69893  85.76201 23.67999
## 13    76.90598   14.41294 76.36536 14.71079  76.35316 14.59405
## 14    76.32081   10.82544 76.21978 10.68605  76.21657 10.60252
## 15    78.44744   24.70932 78.24349 24.15527  78.20612 24.10525
## 16    75.57165   19.31250 76.02290 19.06711  75.97578 18.97717
## 17    93.54636   24.71650 93.68799 24.70692  93.68635 24.70002
## 18    91.57093   25.72455 91.44862 25.62150  91.44397 25.61923
## 19    92.85798   23.78609 92.79648 23.36392  92.79574 23.34826
## 20    94.33670   25.92072 94.34723 26.01730  94.34518 26.01159
## 21    83.46310   19.80451 83.87084 19.97440  83.85350 19.92687
## 22    75.68347   30.44887 75.64439 30.70164  75.63939 30.68964
## 23    75.70962   25.40274 75.28896 26.01945  75.26240 25.95032
## 24    88.40363   27.45512 88.42434 27.54943  88.42376 27.54732
## 25    78.62351   11.47439 78.41902 11.16099  78.40849 11.04699
## 26    91.77376   23.79653 91.78959 23.72399  91.78872 23.71940
## 27    79.46335   25.67390 80.18176 26.29625  80.14979 26.24661
## 28    79.21339   30.22834 79.18957 30.14051  79.18344 30.12797
## 29    88.28419   24.01983 88.15910 24.02718  88.15453 23.95693
## 30    92.94178   11.89912 93.01054 10.77896  93.00967 10.51284
## 31    76.78522   30.74456 76.77390 30.73563  76.77389 30.73560
## 32    73.05366   20.19241 73.04049 20.18215  73.04044 20.18191
## 33    72.81946   20.46517 71.98960 20.55673  71.98358 20.55613
## 34    72.79558   10.77932 72.97365 10.36424  72.97231 10.29913
## 35    79.74297   11.94484 79.30193 12.37088  79.27103 12.24858
##    long_midrange lat_midrange
## 1       77.08561    28.657347
## 2       80.74976    16.261605
## 3       94.46775    28.008455
## 4       92.83382    26.060541
## 5       85.81345    25.909285
## 6       82.30695    20.958153
## 7       73.99506    15.335321
## 8       71.30123    22.390504
## 9       76.02366    29.302333
## 10      77.28746    31.807516
## 11      76.63995    33.897996
## 12      85.61853    23.654952
## 13      76.33007    15.043204
## 14      76.12938    10.532568
## 15      78.42145    23.967310
## 16      76.77309    18.818296
## 17      93.84588    24.764639
## 18      91.30332    25.547017
## 19      92.84119    23.219992
## 20      94.29143    26.119491
## 21      84.43405    20.171986
## 22      75.37636    31.038287
## 23      73.85581    26.620069
## 24      88.44850    27.595590
## 25      78.28918    10.805931
## 26      91.73216    23.743325
## 27      80.83530    27.156534
## 28      79.28431    30.083255
## 29      87.85544    24.386375
## 30      93.25404    10.212897
## 31      76.76051    30.735657
## 32      73.02901    20.181973
## 33      71.92782    20.557944
## 34      72.92852     9.971829
## 35      78.76158    13.779702
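
A toy case shows why these averages can differ. The sketch below uses a made-up unit square whose right edge is traced with many more points than the other three edges, mimicking a wiggly border. The data frame `square` and the helper `midrange()` are assumptions for illustration; the midrange is computed directly as the midpoint of the range, which is what the `midrange` option is taken to mean here.

```r
#made-up unit square: 4 corners plus 99 extra points along the right edge
square = data.frame(
  long = c(0, 1, rep(1, 99), 1, 0),
  lat  = c(0, 0, (1:99) / 100, 1, 1)
)

#mean of the points: pulled towards the densely sampled right edge
c(long = mean(square$long), lat = mean(square$lat))

#midrange: midpoint between the smallest and largest value
midrange = function(x) (min(x) + max(x)) / 2
c(long = midrange(square$long), lat = midrange(square$lat))
```

The mean longitude lands at about 0.98, close to the crowded edge, while the midrange gives 0.5, the true center of the square. The midrange is not a general fix: it returns the center of the bounding box, which can still sit off-center for strongly irregular shapes.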

To actually plot some data, we need some values to map to each AD, so we make some up:

#simulate some random uniform data
set.seed(1)
d_geo$value = runif(nrow(d_geo))

The package contains names of the ADs internally, but they are extremely long and not suitable for nice figures, so we clean them up a little:

#inspect current names
d_geo$region
##  [1] "national capital territory of delhi"           
##  [2] "state of andhra pradesh"                       
##  [3] "state of arunachal pradesh"                    
##  [4] "state of assam"                                
##  [5] "state of bihar"                                
##  [6] "state of chhattisgarh"                         
##  [7] "state of goa"                                  
##  [8] "state of gujarat"                              
##  [9] "state of haryana"                              
## [10] "state of himachal pradesh"                     
## [11] "state of jammu and kashmir"                    
## [12] "state of jharkhand"                            
## [13] "state of karnataka"                            
## [14] "state of kerala"                               
## [15] "state of madhya pradesh"                       
## [16] "state of maharashtra"                          
## [17] "state of manipur"                              
## [18] "state of meghalaya"                            
## [19] "state of mizoram"                              
## [20] "state of nagaland"                             
## [21] "state of odisha"                               
## [22] "state of punjab"                               
## [23] "state of rajasthan"                            
## [24] "state of sikkim"                               
## [25] "state of tamil nadu"                           
## [26] "state of tripura"                              
## [27] "state of uttar pradesh"                        
## [28] "state of uttarakhand"                          
## [29] "state of west bengal"                          
## [30] "union territory of andaman and nicobar islands"
## [31] "union territory of chandigarh"                 
## [32] "union territory of dadra and nagar haveli"     
## [33] "union territory of daman and diu"              
## [34] "union territory of lakshadweep"                
## [35] "union territory of puducherry"

#clean the names for nicer output
d_geo$clean_names = d_geo$region %>% 
  str_replace(pattern = "state of ", replacement = "") %>% 
  str_replace(pattern = "union territory of ", replacement = "")

#inspect clean names
d_geo$clean_names
##  [1] "national capital territory of delhi"
##  [2] "andhra pradesh"                     
##  [3] "arunachal pradesh"                  
##  [4] "assam"                              
##  [5] "bihar"                              
##  [6] "chhattisgarh"                       
##  [7] "goa"                                
##  [8] "gujarat"                            
##  [9] "haryana"                            
## [10] "himachal pradesh"                   
## [11] "jammu and kashmir"                  
## [12] "jharkhand"                          
## [13] "karnataka"                          
## [14] "kerala"                             
## [15] "madhya pradesh"                     
## [16] "maharashtra"                        
## [17] "manipur"                            
## [18] "meghalaya"                          
## [19] "mizoram"                            
## [20] "nagaland"                           
## [21] "odisha"                             
## [22] "punjab"                             
## [23] "rajasthan"                          
## [24] "sikkim"                             
## [25] "tamil nadu"                         
## [26] "tripura"                            
## [27] "uttar pradesh"                      
## [28] "uttarakhand"                        
## [29] "west bengal"                        
## [30] "andaman and nicobar islands"        
## [31] "chandigarh"                         
## [32] "dadra and nagar haveli"             
## [33] "daman and diu"                      
## [34] "lakshadweep"                        
## [35] "puducherry"
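
Note that the Delhi entry keeps its long prefix, because neither pattern matches it. As a possible variant, not what the code above does, a single regular expression can strip all three prefixes. The sketch uses base R's `sub()` on a small made-up vector `x`; the list of prefixes is an assumption based on the names printed above.

```r
#a few of the region names shown above
x = c("national capital territory of delhi",
      "state of goa",
      "union territory of daman and diu")

#strip any of the three prefixes, anchored at the start of the string
prefixes = "^(state|union territory|national capital territory) of "
sub(prefixes, "", x)
## [1] "delhi"         "goa"           "daman and diu"
```

Anchoring the pattern with `^` ensures that only a leading prefix is removed, never a matching phrase inside a name.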

Finally, we are ready to plot the map with values and names!

#plot
admin1_choropleth(country.name = "india", 
                  df           = d_geo, 
                  legend       = "Random uniform data", 
                  num_colors   = 1) +
  geom_text(data = d_geo, aes(long, lat, label = clean_names, group = NULL), size = 2.5)

Most of the positions are alright, but a few showcase the problem mentioned above. For instance, Rajasthan (center left) has its name too far to the southeast. This is because the southeastern part of the AD has a lot of wiggly borders that need many points to plot. I experimented with various averages and found that the midrange works best of those I’ve tried so far: