The choroplethr package is based on ggplot2, but uses its own, somewhat odd syntax, so one does not immediately recognize it as such. This short tutorial shows how to add names to administrative divisions (ADs; called regions in the package) on maps made with this package. We begin by loading some packages:
library(choroplethrAdmin1)
library(choroplethr)
library(ggplot2)
library(grid)
library(stringr)
library(magrittr)
library(plyr) #for ddply()
library(psych) #for geometric.mean() and harmonic.mean()
#kirkegaard must be installed from github:
#library(devtools); install_github("deleetdk/kirkegaard")
library(kirkegaard)
To get the names of the ADs, we plot the country without any data and inspect the data the plot uses:
#neutral map
gg = admin1_map("india")
gg
#inspect data
gg$data %>% head(10)
## long lat admin region group order
## 211868 76.81457 30.78913 india union territory of chandigarh 1746.1 211868
## 211869 76.81167 30.77766 india union territory of chandigarh 1746.1 211869
## 211870 76.81333 30.77027 india union territory of chandigarh 1746.1 211870
## 211871 76.81943 30.75678 india union territory of chandigarh 1746.1 211871
## 211872 76.82408 30.74975 india union territory of chandigarh 1746.1 211872
## 211873 76.82625 30.74296 india union territory of chandigarh 1746.1 211873
## 211874 76.82656 30.73528 india union territory of chandigarh 1746.1 211874
## 211875 76.82201 30.72210 india union territory of chandigarh 1746.1 211875
## 211876 76.81880 30.70319 india union territory of chandigarh 1746.1 211876
## 211877 76.82728 30.68071 india union territory of chandigarh 1746.1 211877
## hole piece id
## 211868 FALSE 1 1746
## 211869 FALSE 1 1746
## 211870 FALSE 1 1746
## 211871 FALSE 1 1746
## 211872 FALSE 1 1746
## 211873 FALSE 1 1746
## 211874 FALSE 1 1746
## 211875 FALSE 1 1746
## 211876 FALSE 1 1746
## 211877 FALSE 1 1746
It is important to save the plot to an object because we need to access the object's data to get the positions of the regions. Inspecting the data reveals that it consists of a set of points for each AD. Presumably, the function draws lines between the points of each group to form a polygon, which is then filled with a color according to the value one wants to map.
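To check this reading of the data, one can redraw the outlines directly with ggplot2. The sketch below is not choroplethr's internal code; it only assumes that the data frame has the `long`, `lat`, `group` and `order` columns seen above. It ignores the `hole` column, so ADs with holes would be filled incorrectly. `draw_outlines()` is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: redraw region outlines from a data frame shaped like
# gg$data (columns long, lat, group, order). Holes are NOT handled.
draw_outlines <- function(map_data) {
  # sort the points into drawing order before connecting them
  map_data <- map_data[order(map_data$order), ]
  # one polygon per group: consecutive points are joined by straight lines
  ggplot(map_data, aes(long, lat, group = group)) +
    geom_polygon(fill = "grey85", colour = "grey30") +
    coord_quickmap()
}
```

With the map from above, `draw_outlines(gg$data)` should give roughly the same neutral map of India.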
If the shape is a circle with evenly spaced points along its perimeter, one can take the mean position of the points to get the center of the area. However, ADs are rarely circles, so this crude method may give misleading results. The result also depends on how the points are distributed along the perimeter: the mean position is pulled towards the parts of the border with the highest density of points. Still, let's try getting the mean positions, along with a few other averages:
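A toy example makes the density bias concrete. The sketch below uses a made-up unit square (not real map data) whose eastern edge is described by many extra points, the way a wiggly border would be, and compares the mean of the points with the midrange (midpoint of the smallest and largest value). Base R only.

```r
# A made-up unit square traced counter-clockwise from (0, 0). The eastern
# edge (long = 1) gets 96 extra points, as a wiggly border would.
east_lat <- seq(0.01, 0.99, length.out = 96)
square <- data.frame(
  long = c(0, 1, rep(1, length(east_lat)), 1, 0),
  lat  = c(0, 0, east_lat,                  1, 1)
)

# Mean of the points: pulled towards the densely sampled eastern edge
mean_center <- c(long = mean(square$long), lat = mean(square$lat))

# Midrange (midpoint of the extremes): ignores how the points are spread
midrange <- function(x) (min(x) + max(x)) / 2
midrange_center <- c(long = midrange(square$long), lat = midrange(square$lat))

mean_center      # long = 0.98, lat = 0.5
midrange_center  # long = 0.5,  lat = 0.5
```

The mean ends up almost on the eastern edge, while the midrange stays at the true center. The midrange is not a true centroid either; it just coincides with it for a symmetric shape like this square.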
#split the rows by AD, then compute several averages of the longitude and latitude of each AD's points
d_geo = ddply(gg$data, .variables = "region", .fun = function(block) {
c("long" = mean(block$long),
"lat" = mean(block$lat),
"long_median" = median(block$long),
"lat_median" = median(block$lat),
"long_geo" = geometric.mean(block$long),
"lat_geo" = geometric.mean(block$lat),
"long_harm" = harmonic.mean(block$long),
"lat_harm" = harmonic.mean(block$lat),
"long_midrange" = averages(block$long, types = "midrange") %>% unname,
"lat_midrange" = averages(block$lat, types = "midrange") %>% unname
)
})
d_geo
## region long lat
## 1 national capital territory of delhi 77.08727 28.63403
## 2 state of andhra pradesh 79.65457 16.32747
## 3 state of arunachal pradesh 94.92846 27.85344
## 4 state of assam 92.63188 26.05366
## 5 state of bihar 85.58115 25.72101
## 6 state of chhattisgarh 82.04880 20.87567
## 7 state of goa 74.00794 15.40028
## 8 state of gujarat 71.84564 22.43161
## 9 state of haryana 76.23652 29.22714
## 10 state of himachal pradesh 77.28348 31.74312
## 11 state of jammu and kashmir 76.59950 33.33931
## 12 state of jharkhand 85.78395 23.71770
## 13 state of karnataka 76.37751 14.82859
## 14 state of kerala 76.22299 10.76813
## 15 state of madhya pradesh 78.28090 24.20386
## 16 state of maharashtra 76.07044 19.15557
## 17 state of manipur 93.68964 24.71382
## 18 state of meghalaya 91.45327 25.62376
## 19 state of mizoram 92.79723 23.37940
## 20 state of nagaland 94.34928 26.02303
## 21 state of odisha 83.88828 20.02263
## 22 state of punjab 75.64937 30.71376
## 23 state of rajasthan 75.31510 26.08975
## 24 state of sikkim 88.42492 27.55153
## 25 state of tamil nadu 78.42955 11.27047
## 26 state of tripura 91.79045 23.72857
## 27 state of uttar pradesh 80.21400 26.34729
## 28 state of uttarakhand 79.19569 30.15301
## 29 state of west bengal 88.16366 24.09764
## 30 union territory of andaman and nicobar islands 93.01142 11.01888
## 31 union territory of chandigarh 76.77392 30.73566
## 32 union territory of dadra and nagar haveli 73.04053 20.18239
## 33 union territory of daman and diu 71.99561 20.55732
## 34 union territory of lakshadweep 72.97500 10.42425
## 35 union territory of puducherry 79.33236 12.51204
## long_median lat_median long_geo lat_geo long_harm lat_harm
## 1 77.07807 28.60701 77.08709 28.63370 77.08692 28.63336
## 2 78.91501 16.38837 79.61955 16.16302 79.58493 15.99808
## 3 95.36785 27.65373 94.91357 27.84280 94.89859 27.83226
## 4 92.59748 25.95668 92.61447 26.03663 92.59708 26.01957
## 5 85.08502 25.74648 85.56621 25.70378 85.55133 25.68656
## 6 82.03440 20.58996 82.04021 20.79274 82.03164 20.71042
## 7 73.95997 15.43521 74.00767 15.39815 74.00741 15.39602
## 8 72.51865 22.30722 71.82175 22.39067 71.79760 22.34978
## 9 76.38617 29.37810 76.23127 29.21210 76.22599 29.19704
## 10 77.41215 31.41987 77.27680 31.73143 77.27011 31.71977
## 11 76.52787 32.95962 76.57790 33.32771 76.55628 33.31628
## 12 85.97385 23.87704 85.77299 23.69893 85.76201 23.67999
## 13 76.90598 14.41294 76.36536 14.71079 76.35316 14.59405
## 14 76.32081 10.82544 76.21978 10.68605 76.21657 10.60252
## 15 78.44744 24.70932 78.24349 24.15527 78.20612 24.10525
## 16 75.57165 19.31250 76.02290 19.06711 75.97578 18.97717
## 17 93.54636 24.71650 93.68799 24.70692 93.68635 24.70002
## 18 91.57093 25.72455 91.44862 25.62150 91.44397 25.61923
## 19 92.85798 23.78609 92.79648 23.36392 92.79574 23.34826
## 20 94.33670 25.92072 94.34723 26.01730 94.34518 26.01159
## 21 83.46310 19.80451 83.87084 19.97440 83.85350 19.92687
## 22 75.68347 30.44887 75.64439 30.70164 75.63939 30.68964
## 23 75.70962 25.40274 75.28896 26.01945 75.26240 25.95032
## 24 88.40363 27.45512 88.42434 27.54943 88.42376 27.54732
## 25 78.62351 11.47439 78.41902 11.16099 78.40849 11.04699
## 26 91.77376 23.79653 91.78959 23.72399 91.78872 23.71940
## 27 79.46335 25.67390 80.18176 26.29625 80.14979 26.24661
## 28 79.21339 30.22834 79.18957 30.14051 79.18344 30.12797
## 29 88.28419 24.01983 88.15910 24.02718 88.15453 23.95693
## 30 92.94178 11.89912 93.01054 10.77896 93.00967 10.51284
## 31 76.78522 30.74456 76.77390 30.73563 76.77389 30.73560
## 32 73.05366 20.19241 73.04049 20.18215 73.04044 20.18191
## 33 72.81946 20.46517 71.98960 20.55673 71.98358 20.55613
## 34 72.79558 10.77932 72.97365 10.36424 72.97231 10.29913
## 35 79.74297 11.94484 79.30193 12.37088 79.27103 12.24858
## long_midrange lat_midrange
## 1 77.08561 28.657347
## 2 80.74976 16.261605
## 3 94.46775 28.008455
## 4 92.83382 26.060541
## 5 85.81345 25.909285
## 6 82.30695 20.958153
## 7 73.99506 15.335321
## 8 71.30123 22.390504
## 9 76.02366 29.302333
## 10 77.28746 31.807516
## 11 76.63995 33.897996
## 12 85.61853 23.654952
## 13 76.33007 15.043204
## 14 76.12938 10.532568
## 15 78.42145 23.967310
## 16 76.77309 18.818296
## 17 93.84588 24.764639
## 18 91.30332 25.547017
## 19 92.84119 23.219992
## 20 94.29143 26.119491
## 21 84.43405 20.171986
## 22 75.37636 31.038287
## 23 73.85581 26.620069
## 24 88.44850 27.595590
## 25 78.28918 10.805931
## 26 91.73216 23.743325
## 27 80.83530 27.156534
## 28 79.28431 30.083255
## 29 87.85544 24.386375
## 30 93.25404 10.212897
## 31 76.76051 30.735657
## 32 73.02901 20.181973
## 33 71.92782 20.557944
## 34 72.92852 9.971829
## 35 78.76158 13.779702
To actually plot some data, we need some values to map to each AD, so we make some up:
#simulate some random uniform data
set.seed(1)
d_geo$value = runif(nrow(d_geo))
The package stores the names of the ADs internally, but they are very long and not well suited to nice figures, so we clean them up a little:
#inspect current names
d_geo$region
## [1] "national capital territory of delhi"
## [2] "state of andhra pradesh"
## [3] "state of arunachal pradesh"
## [4] "state of assam"
## [5] "state of bihar"
## [6] "state of chhattisgarh"
## [7] "state of goa"
## [8] "state of gujarat"
## [9] "state of haryana"
## [10] "state of himachal pradesh"
## [11] "state of jammu and kashmir"
## [12] "state of jharkhand"
## [13] "state of karnataka"
## [14] "state of kerala"
## [15] "state of madhya pradesh"
## [16] "state of maharashtra"
## [17] "state of manipur"
## [18] "state of meghalaya"
## [19] "state of mizoram"
## [20] "state of nagaland"
## [21] "state of odisha"
## [22] "state of punjab"
## [23] "state of rajasthan"
## [24] "state of sikkim"
## [25] "state of tamil nadu"
## [26] "state of tripura"
## [27] "state of uttar pradesh"
## [28] "state of uttarakhand"
## [29] "state of west bengal"
## [30] "union territory of andaman and nicobar islands"
## [31] "union territory of chandigarh"
## [32] "union territory of dadra and nagar haveli"
## [33] "union territory of daman and diu"
## [34] "union territory of lakshadweep"
## [35] "union territory of puducherry"
#clean the names for nicer output
d_geo$clean_names = d_geo$region %>%
str_replace(pattern = "state of ", replacement = "") %>%
str_replace(pattern = "union territory of ", replacement = "")
#inspect clean names
d_geo$clean_names
## [1] "national capital territory of delhi"
## [2] "andhra pradesh"
## [3] "arunachal pradesh"
## [4] "assam"
## [5] "bihar"
## [6] "chhattisgarh"
## [7] "goa"
## [8] "gujarat"
## [9] "haryana"
## [10] "himachal pradesh"
## [11] "jammu and kashmir"
## [12] "jharkhand"
## [13] "karnataka"
## [14] "kerala"
## [15] "madhya pradesh"
## [16] "maharashtra"
## [17] "manipur"
## [18] "meghalaya"
## [19] "mizoram"
## [20] "nagaland"
## [21] "odisha"
## [22] "punjab"
## [23] "rajasthan"
## [24] "sikkim"
## [25] "tamil nadu"
## [26] "tripura"
## [27] "uttar pradesh"
## [28] "uttarakhand"
## [29] "west bengal"
## [30] "andaman and nicobar islands"
## [31] "chandigarh"
## [32] "dadra and nagar haveli"
## [33] "daman and diu"
## [34] "lakshadweep"
## [35] "puducherry"
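The two `str_replace()` calls can also be folded into one regular expression anchored at the start of the string, so a prefix is only removed where it really is a prefix. A sketch with base R's `sub()` on three of the names; `stringr::str_remove()` with the same pattern should work equally well.

```r
# A few of the names as stored by the package
regions <- c("national capital territory of delhi",
             "state of andhra pradesh",
             "union territory of chandigarh")

# "^" anchors the match at the start; the group matches either prefix
clean <- sub("^(state|union territory) of ", "", regions)
clean
# "national capital territory of delhi" "andhra pradesh" "chandigarh"
```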
Finally, we are ready to plot the map with values and names!
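#plot the map with values, then add the cleaned names at the mean positions
#num_colors = 1 gives a continuous color scale
admin1_choropleth(country.name = "india",
df = d_geo,
legend = "Random uniform data",
num_colors = 1) +
#group = NULL stops the text layer from inheriting the map's group aesthetic
geom_text(data = d_geo, aes(long, lat, label = clean_names, group = NULL), size = 2.5)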
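The `group = NULL` is needed because the map layer maps `group` for the whole plot (at least, that is my understanding of how choroplethr builds it), while `d_geo` has no `group` column. An alternative is `inherit.aes = FALSE`, which makes the text layer ignore the plot's global aesthetics altogether. The self-contained sketch below shows this with two made-up square regions, labeled at the mean of their points; all data and names in it are hypothetical.

```r
library(ggplot2)

# Two made-up square "regions" side by side
polys <- data.frame(
  long   = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1),
  region = rep(c("a", "b"), each = 4)
)
polys$group <- paste0(polys$region, ".1")

# One label position per region: the mean of its points, as in the main text
labels <- aggregate(cbind(long, lat) ~ region, data = polys, FUN = mean)

# `group` is mapped globally, so every layer inherits it unless told otherwise.
# `labels` has no `group` column, hence inherit.aes = FALSE on the text layer.
p <- ggplot(polys, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = region), colour = "white") +
  geom_text(data = labels, aes(long, lat, label = region), inherit.aes = FALSE) +
  coord_equal()
```

Printing `p` draws the two squares with a label in the middle of each. Without `inherit.aes = FALSE` (or `group = NULL`), building the plot should fail because `group` cannot be found in `labels`.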
Most of the positions are alright, but a few show the problem mentioned above. For instance, Rajasthan (center left) has its name placed too far to the southeast. This is because the southeastern part of the AD has many wiggly borders, which need many points to draw. I experimented with various averages and found that the midrange works best of those I've tried so far: