Geospatial data is data about objects, events, or phenomena that have a location on the surface of the earth. The data combines location information (usually coordinates on the earth), attribute information (the characteristics of the object, event, or phenomena concerned), and often also temporal information (the time or life span at which the location and attributes exist)1.
Geospatial analysis uses this data to build maps, graphs, statistics, and cartograms, making complex relationships understandable. It introduces a formal techniques using geographic and geometry properties of a data. There are various business problems that can be solved using geospatial analysis, namely a few:
There are actually many tools available for visualizing geographic information; from full-scale GIS (Geographic Information System) applications such as ArcGIS and QGIS, to web-based tools like Google maps to any number of programming languages. With a broad range of packages, the use of R for geospatial analysis has risen in popularity. From geospatial statistics, modeling and visualization, its RStudio IDE also made R to be a user-friendly tool for map making and analysis.
This 4-days online workshop is a beginner-friendly introduction to Geospatial Analysis in R. By visualizing geospatial data, users can have more intuitive decision making by contextualizing data in the real world, comparing informations across the city, state, or country.
This is the first online data science series course of Geospatial Analysis in R. The primary objective of this course is to provide a participant a comprehensive introduction about tools and software for visualizing a geospatial data using the popular open-source tools: R. The material will covers:
Introductory Module:
tidyverse
ggplot2Main Module:
ggplot2leaflet - a JavaScript API for creating interactive mapsleafletrmarkdown versatile outputflexdashboard packageIn this Library and Setup section you’ll see some code to initialize our workspace, and the packages we’ll be using for this project.
Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.
You will need to use install.packages() to install any packages that are not yet downloaded onto your machine. To install packages, type the command below on your console then press ENTER.
## DO NOT RUN CHUNK
# packages <- c("rgdal","sf","tidyverse","glue", "plotly", "maps","leaflet","leaflet.extras", "tmap", "flexdashboard","DT")
#
# install.packages(packages)Then you need to load the package into your workspace using the library() function. Special for this course, the rmarkdown packages do not need to be called using library().
Geospatial analysis, as the main topic of this workshop, is a domain of analysis that focuses on data processing that is associated with geographic data. With unlisted potentials for many business domain analysis, in this main section of the workshop, we will try to learn the building blocks of geospatial data and combines it with a business use case example in order to visualize our data as an insightful map.
We will explore how to use and obtain geospatial data in R to create some of the most popular types of thematic maps; choropleth (static and interactive) and interactive heatmap.
Let’s have a quick glimpse on a simple geospatial visualization in R by running the code chunk below:
The code above projected countries around the world as a static map. But, how does geographic data information stored? How could we obtain detail information on geographic contours or territorial borders?
m <- leaflet() %>%
addTiles() %>% # Add default OpenStreetMap map tiles
addMarkers(lng=107.609810, lat=-6.914744, popup="The birthplace of R")
m # Print the map#> Reading layer `idn' from data source
#> `D:\Learn Data\Algoritma\Geospatial in R\rgeo-intro-master - Copy\shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 6694 features and 16 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 95.0097 ymin: -11.00762 xmax: 141.0194 ymax: 6.076941
#> Geodetic CRS: WGS 84
#> Reading layer `idn' from data source
#> `D:\Learn Data\Algoritma\Geospatial in R\rgeo-intro-master - Copy\shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 6694 features and 16 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 95.0097 ymin: -11.00762 xmax: 141.0194 ymax: 6.076941
#> Geodetic CRS: WGS 84
#> [1] "Aceh" "Bali" "Bangka Belitung"
#> [4] "Banten" "Bengkulu" "Gorontalo"
#> [7] "Jakarta Raya" "Jambi" "Jawa Barat"
#> [10] "Jawa Tengah" "Jawa Timur" "Kalimantan Barat"
#> [13] "Kalimantan Selatan" "Kalimantan Tengah" "Kalimantan Timur"
#> [16] "Kepulauan Riau" "Lampung" "Maluku"
#> [19] "Maluku Utara" "Nusa Tenggara Barat" "Nusa Tenggara Timur"
#> [22] "Papua" "Papua Barat" "Riau"
#> [25] "Sulawesi Barat" "Sulawesi Selatan" "Sulawesi Tengah"
#> [28] "Sulawesi Tenggara" "Sulawesi Utara" "Sumatera Barat"
#> [31] "Sumatera Selatan" "Sumatera Utara" "Yogyakarta"
Figure 2.1: Illustration on Vector vs. Raster
The two most widely used geographic data models for spatial data are vector and raster. A vector data represents location and shape of geographic features through geometric shapes such as points, lines and polygons. Raster data, on the other hand, consists of values within a grid system. You can imagine raster data as a pixelated digital image such as a satellite imagery or a scanned map.
There are lots of data formats that can be used to store each data model. KML, GeoJSON, GeoTIFF, Tab File, are some of the common formats you might ever came accross. However, the most common format in geographic information system mapping is the Shapefile.
Shapefile is the universal standard of geospatial format developed and regulated by Esri, the international supplier of geographic information system softwares. The type format is then adopted by so many programming languages, including R.
The name Shapefile might be a little deceptive since the file is commonly made up from these separate files:
.shp (mandatory): includes the geometry data..shx (mandatory): includes the index data used to identify different geometries..dbf (mandatory): includes attribute information of each geometry’s data..prj includes the information of the coordinate reference systemOther than the listed files above, a shapefile may includes other file components. A comprehensive list of shapefile components can be accessed here.
For this workshop, I have provided a shapefile for Indonesian territory under shp folder of this directory. Notice that inside the directory, there is also a gadm36_IDN_3.zip which actually the original Indonesian shapefile provided by GADM. If you go ahead to the website you can pick out different countries spatial vector. Indonesia’s spatial vector is provided up to 4 levels of granularity. In this course we’ll be working with 3 levels of spatial vector that contains: Provinsi, Kabupaten and Kecamatan.
Extra Tip:
You can also access GADM database directly from R with the help of GADMTools library:
# Code example to load Indonesia's spatial vector in Kabupaten/Kota level
library(GADMTools)
library(sf)
map <- gadm_sf_loadCountries(c("IDN"), level=2, basefile = "./")
Let’s read in the files using st_read() function from sf package:
#> Reading layer `idn' from data source
#> `D:\Learn Data\Algoritma\Geospatial in R\rgeo-intro-master - Copy\shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 6694 features and 16 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 95.0097 ymin: -11.00762 xmax: 141.0194 ymax: 6.076941
#> Geodetic CRS: WGS 84
sf sama seperti df biasa. bedanya kolom geom
sf represents “Simple Features” as records in a data.frame or tibble with a geometry list-column2. Simple Features is a hierarchical data model that represents a wide range of geometry types. Of 17 geometry types supported by the specification, only 7 are used in the vast majority of geographic research 3:
Figure 2.2: Source:Geocomputation with R
If you use the class() function for our newly created idn object you can retrieved how the sf class handles spatial data in similar way as any tabular data structure; by stores it as a dataframe:
#> [1] "sf" "data.frame"
The main difference between a regular dataframe an an sf object is that it has geometry which describes where on Earth the feature is located, and they have attributes, which describe other properties:
#> Geometry set for 6694 features
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 95.0097 ymin: -11.00762 xmax: 141.0194 ymax: 6.076941
#> Geodetic CRS: WGS 84
#> First 5 geometries:
To get a better idea about geometry information in an sf object, let’s go ahead and plot idn$geometry using plot() function:
There are several informations that is stored in an sf’s geometry:
POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON and GEOMETRYCOLLECTION.XY), 3-(XYZ/XYM) or 4-dimensional (XYZM) space.A CRS is a fundamental component of geospatial data. It models earth surface into a mathematical model. Intuitively, you could think of it as a way to model a 3 dimensional surfaces such as earth, into a 2 dimensional surface that is commonly used in geospatial analysis: making maps, distance calculation, etc. Take a look at the following images for better illustration:
Figure 2.3: Source:DataCarpentry
If we were using geospatial data from different sources, it is important to make sure the data we are using has the same CRS attribute. A different CRS would not represented in the same mathematical space if combined and would alter any calculation done on the data significantly.
The big advantage of using sf is how you each functions can be combined using %>% operator and works well with the tidyverse collection of R packages, which gives us better control over the geometry information in sf objects. For example, to subset certain provinces, you can use filter() method just like how you work with regular dataframe:
You can also perform aggregation operations over the geometry:
Dive Deeper
Using the same method as above, try to plot a map which shown Indonesian province of North Sumatra in the city (Kabupaten/Kota) level!
#
sumut <- idn %>%
filter(NAME_1 == "Sumatera Utara") %>%
group_by(NAME_2) %>%
summarise()
plot(sumut$geometry)dkijabar <- idn %>%
filter(NAME_1 %in% c("Jakarta Raya","Jawa Barat")) %>%
group_by(NAME_1) %>%
summarise()
plot(dkijabar$geometry)Other than sf’s simple features, there are actually other methodologies for storing model of geographical features into R. If you ever ran into a geospatial studies in R, you may also familiar with the use of sp package. In fact, sp was a very-well developed package since 2005 which practically supported almost every GIS analysis in R, even until now.
The main problem of sp is its low compatibility to R dataframe structure. sf was built to fill the gap. Released in 2016, sf uses OGC (Open Geospatial Consortium) & ISO standard on recording and structuring spatial data with simple features.
The disadvantage of sf is, since it is relatively new, some spatial packages may have not yet added support for sf object. Fortunately, we can stil convert an sf object to spatial class used in sp:
#> [1] "SpatialPolygonsDataFrame"
#> attr(,"package")
#> [1] "sp"
Spatial objects can be converted back to sf in the same way or with sf’s st_as_sf() function:
#> [1] "sf" "data.frame"
Notes:
If you want to learn more about sp and how to work with spatial data using sp ecosystem, you can refer to “sp-example.Rmd” in this directory.
Now let’s head back to our listings.csv, a data consists of properties listed for sale in Jabodetabek area:
#> id kota kecamatan harga KT KM m2
#> 1 1 Jakarta Pusat Gambir 3900000000 3 2 169
#> 2 2 Jakarta Pusat Kemayoran 1450000000 3 2 160
#> 3 3 Jakarta Pusat Gambir 11100000000 10 8 720
#> 4 4 Jakarta Pusat Kemayoran 485000000 2 1 35
#> 5 5 Jakarta Utara Tanjung Priok 5600000000 6 5 240
#> 6 6 Jakarta Barat Tamansari 3000000000 10 5 180
Here, we want to compare the house pricing for each sub-district (Kecamatan) level. Since the size of the listed properties may vary, we’ll use the price per square meter instead:
df_agg <- df %>%
mutate(
harga_m2 = harga / m2
) %>%
group_by(kota, kecamatan) %>%
summarise(harga_m2 = median(harga_m2),
total_listings= n()) %>%
ungroup()
head(df_agg)#> # A tibble: 6 × 4
#> kota kecamatan harga_m2 total_listings
#> <chr> <chr> <dbl> <int>
#> 1 Bekasi Tarumajaya 12281250 463
#> 2 Depok Beji 11746032. 84
#> 3 Depok Cimanggis 11679365. 110
#> 4 Depok Cinere 12761905. 88
#> 5 Jakarta Barat Cengkareng 13040000 455
#> 6 Jakarta Barat Grogolpetamburan 15032680. 223
To project our housing data into a map, we need to equip each observation in the data with geographical information representation. We can do this by joining the information of the city (kota) & sub-district (kecamatan) to NAME_2 & NAME_3 variables respectively in idn:
#> GID_0 NAME_0 GID_1 NAME_1 NL_NAME_1 GID_2 NAME_2 NL_NAME_2
#> 1 IDN Indonesia IDN.1_1 Aceh <NA> IDN.1.2_1 Aceh Barat <NA>
#> 2 IDN Indonesia IDN.1_1 Aceh <NA> IDN.1.2_1 Aceh Barat <NA>
#> 3 IDN Indonesia IDN.1_1 Aceh <NA> IDN.1.2_1 Aceh Barat <NA>
#> 4 IDN Indonesia IDN.1_1 Aceh <NA> IDN.1.2_1 Aceh Barat <NA>
#> 5 IDN Indonesia IDN.1_1 Aceh <NA> IDN.1.2_1 Aceh Barat <NA>
#> 6 IDN Indonesia IDN.1_1 Aceh <NA> IDN.1.2_1 Aceh Barat <NA>
#> GID_3 NAME_3 VARNAME_3 NL_NAME_3 TYPE_3 ENGTYPE_3
#> 1 IDN.1.2.1_1 Arongan Lambalek <NA> <NA> Kecamatan Sub-district
#> 2 IDN.1.2.2_1 Bubon <NA> <NA> Kecamatan Sub-district
#> 3 IDN.1.2.3_1 Johan Pahlawan <NA> <NA> Kecamatan Sub-district
#> 4 IDN.1.2.4_1 Kaway Xvi <NA> <NA> Kecamatan Sub-district
#> 5 IDN.1.2.5_1 Meureubo <NA> <NA> Kecamatan Sub-district
#> 6 IDN.1.2.6_1 Pantai Ceuremen <NA> <NA> Kecamatan Sub-district
#> CC_3 HASC_3 geometry
#> 1 1107062 <NA> MULTIPOLYGON (((95.97953 4....
#> 2 1107061 <NA> MULTIPOLYGON (((96.16601 4....
#> 3 1107050 <NA> MULTIPOLYGON (((96.13205 4....
#> 4 1107080 <NA> MULTIPOLYGON (((96.16397 4....
#> 5 1107081 <NA> MULTIPOLYGON (((96.25119 4....
#> 6 1107082 <NA> MULTIPOLYGON (((96.18817 4....
To join two dataframes in R, we can use the *_join() function from dplyr package:
#> # A tibble: 6 × 19
#> kota kecamatan harga_m2 total_listings GID_0 NAME_0 GID_1 NAME_1 NL_NAME_1
#> <chr> <chr> <dbl> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 Bekasi Tarumaja… 1.23e7 463 IDN Indon… IDN.… Jawa … <NA>
#> 2 Depok Beji 1.17e7 84 IDN Indon… IDN.… Jawa … <NA>
#> 3 Depok Cimanggis 1.17e7 110 IDN Indon… IDN.… Jawa … <NA>
#> 4 Depok Cinere 1.28e7 88 IDN Indon… IDN.… Jawa … <NA>
#> 5 Jakarta… Cengkare… 1.30e7 455 IDN Indon… IDN.… Jakar… <NA>
#> 6 Jakarta… Grogolpe… 1.50e7 223 IDN Indon… IDN.… Jakar… <NA>
#> # ℹ 10 more variables: GID_2 <chr>, NL_NAME_2 <chr>, GID_3 <chr>,
#> # VARNAME_3 <chr>, NL_NAME_3 <chr>, TYPE_3 <chr>, ENGTYPE_3 <chr>,
#> # CC_3 <chr>, HASC_3 <chr>, geometry <MULTIPOLYGON [°]>
Now that our df_agg data have the geographic attributes attached, we can turn it into an sf object using st_as_sf() function:
#> Geometry set for 59 features
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 106.6325 ymin: -6.39886 xmax: 107.0317 ymax: -6.0409
#> Geodetic CRS: WGS 84
#> First 5 geometries:
ggplot2R is loaded with built-in tools and open source packages to help us turning both sp and sf objects into a neat map visualization. In the first session of this workshop, you have learned about a widely used and powerful plotting library for R, ggplot2. The great thing is that ggplot can plot sf objects directly by using geom_sf.
Recall about the ggplot2 layering system and see the code below:
You can also easily map the color of
harga_m2 by adding the variable inside the aes() in the geom element:
Dive Deeper
Recall how you can use labs() in ggplot to add label informations, such as title, subtitle, etc. Copy down the previous code above, and provide the map with appropriate labellings on the chunk below!
#
ggplot(df_agg)+
geom_sf(aes(fill = harga_m2))+
labs(x="lat", y="long", title="Distribution")+
theme_dark()With small adjustments and feature additions, we have built a nice geographic representation of house pricing in Jakarta. Notice that in the chunk below, I added a layer of theme_algo_map(), which was a customized map I created for this workshop. (You can also create your own theme too!)
The modularity of ggplot2 also allows us to save it as an external file, makes it easy for us to make a reproducible custom theming. You can find the code for theme_algo_map() under assets/theme_algo.R.
source('assets/theme_algo.R')
plot <- ggplot(df_agg)+
geom_sf(aes(fill = harga_m2)) +
labs(title = "Jabodetabek House Price Spatial Distribution",
subtitle = "Price sample in property marketplace",
caption = "Sample uses ±10,000 house listings on the OLX house buying and selling page\n September 2020",
fill = "Price/m2")+
scale_fill_gradient(low = "lightgreen", high = "navy",
labels = number_format(scale = 1/1e6, suffix = " mil.", accuracy = 1))+
theme_algo_map()
plot
Another benefit of creating maps with
ggplot2 is how we can easily add a level of interactivity by simply use ggplotly() function from the plotly package.
Since we have stored our most recent plot creation as an object named plot, let’s try to wrap it with ggplotly(plot).
(P.S. If you haven’t install the plotly package, you can run install.packages("plotly") through your console)
You can also add another adjustments such as customizing the tooltip and hide the mode bar for cleaner appearance:
# create new column to store the tooltip information
df_agg <- df_agg %>%
mutate(text = glue("{kecamatan}, {kota}: <br> {number(harga_m2, scale = 1/1e6, suffix = 'mil.', accuracy = .01)}"))
# re-run the function to create plot
plot <- ggplot(df_agg, aes(text = text))+ # add new aes mapping: text to define the tooltip text
geom_sf(aes(fill = harga_m2)) +
labs(title = "Jabodetabek House Price Spatial Distribution",
subtitle = "Price sample harga in property marketplace",
caption = "Sample uses ±10,000 house listings on the OLX house buying and selling page\n September 2020",
fill = "Price/m2")+
scale_fill_gradient(low = "lavenderblush", high = "red3",
labels = number_format(scale = 1/1e6, suffix = " mil.", accuracy = 1))+
theme_algo_map()
# re-run the `ggplotly`
ggplotly(plot, tooltip = "text") %>% # define tooltip
config(displayModeBar = F) # hide the modebarDive Deeper
Using the same method as above, try to create a map that represent the number of listed properties(total_listings) in each sub-district. Once you’re done, you can also try to customize your own map appearance!
#
# create new column to store the tooltip information
df_agg <- df_agg %>%
mutate(text = glue("{kecamatan}, {kota}: <br> {number(total_listings, suffix = 'Rumah', accuracy = 1)}"))
# re-run the function to create plot
plot <- ggplot(df_agg, aes(text = text))+ # add new aes mapping: text to define the tooltip text
geom_sf(aes(fill = total_listings)) +
labs(title = "Jabodetabek House Price Spatial Distribution",
subtitle = "Price sample harga in property marketplace",
caption = "Sample uses ±10,000 house listings on the OLX house buying and selling page\n September 2020",
fill = "total_listings")+
scale_fill_gradient(low = "navajowhite", high = "navy",
labels = number_format(suffix = " Houses", accuracy = 1))+
theme_minimal()
# re-run the `ggplotly`
ggplotly(plot, tooltip = "text") %>% # define tooltip
config(displayModeBar = F) # hide the modebarleafletAmong all available mapping packages in R, leaflet has became the most widely used for building interactive maps in R. It provides a relatively low-level interface to the Leaflet JavaScript library. The full documentation of the package can be accessed here.
Taken from its official documentation4, to create a Leaflet map we can follow these basic steps:
leaflet().addTiles, addMarkers, addPolygons) to modify the map widget.Now, let’s follow the steps above to recreate the previous choropleth as a Leaflet map.
# leaflet
leaflet(df_agg) %>% # create map widget
addTiles() %>% # add basemap
addPolygons() # add polygons from `sf` data %By default, addTiles() used tiles from OpenStreetMap. We can also use third-party basemaps by using addProviderTiles() instead. The lists of complete set basemaps provided by leaflet-providers can be found here.
#> [1] "OpenStreetMap"
#> [2] "OpenStreetMap.Mapnik"
#> [3] "OpenStreetMap.DE"
#> [4] "OpenStreetMap.CH"
#> [5] "OpenStreetMap.France"
#> [6] "OpenStreetMap.HOT"
#> [7] "OpenStreetMap.BZH"
#> [8] "MapTilesAPI"
#> [9] "MapTilesAPI.OSMEnglish"
#> [10] "MapTilesAPI.OSMFrancais"
#> [11] "MapTilesAPI.OSMEspagnol"
#> [12] "OpenSeaMap"
#> [13] "OPNVKarte"
#> [14] "OpenTopoMap"
#> [15] "OpenRailwayMap"
#> [16] "OpenFireMap"
#> [17] "SafeCast"
#> [18] "Stadia"
#> [19] "Stadia.AlidadeSmooth"
#> [20] "Stadia.AlidadeSmoothDark"
#> [21] "Stadia.OSMBright"
#> [22] "Stadia.Outdoors"
#> [23] "Stadia.StamenToner"
#> [24] "Stadia.StamenTonerBackground"
#> [25] "Stadia.StamenTonerLines"
#> [26] "Stadia.StamenTonerLabels"
#> [27] "Stadia.StamenTonerLite"
#> [28] "Stadia.StamenWatercolor"
#> [29] "Stadia.StamenTerrain"
#> [30] "Stadia.StamenTerrainBackground"
#> [31] "Stadia.StamenTerrainLabels"
#> [32] "Stadia.StamenTerrainLines"
#> [33] "Thunderforest"
#> [34] "Thunderforest.OpenCycleMap"
#> [35] "Thunderforest.Transport"
#> [36] "Thunderforest.TransportDark"
#> [37] "Thunderforest.SpinalMap"
#> [38] "Thunderforest.Landscape"
#> [39] "Thunderforest.Outdoors"
#> [40] "Thunderforest.Pioneer"
#> [41] "Thunderforest.MobileAtlas"
#> [42] "Thunderforest.Neighbourhood"
#> [43] "CyclOSM"
#> [44] "Jawg"
#> [45] "Jawg.Streets"
#> [46] "Jawg.Terrain"
#> [47] "Jawg.Sunny"
#> [48] "Jawg.Dark"
#> [49] "Jawg.Light"
#> [50] "Jawg.Matrix"
#> [51] "MapBox"
#> [52] "MapTiler"
#> [53] "MapTiler.Streets"
#> [54] "MapTiler.Basic"
#> [55] "MapTiler.Bright"
#> [56] "MapTiler.Pastel"
#> [57] "MapTiler.Positron"
#> [58] "MapTiler.Hybrid"
#> [59] "MapTiler.Toner"
#> [60] "MapTiler.Topo"
#> [61] "MapTiler.Voyager"
#> [62] "TomTom"
#> [63] "TomTom.Basic"
#> [64] "TomTom.Hybrid"
#> [65] "TomTom.Labels"
#> [66] "Esri"
#> [67] "Esri.WorldStreetMap"
#> [68] "Esri.DeLorme"
#> [69] "Esri.WorldTopoMap"
#> [70] "Esri.WorldImagery"
#> [71] "Esri.WorldTerrain"
#> [72] "Esri.WorldShadedRelief"
#> [73] "Esri.WorldPhysical"
#> [74] "Esri.OceanBasemap"
#> [75] "Esri.NatGeoWorldMap"
#> [76] "Esri.WorldGrayCanvas"
#> [77] "OpenWeatherMap"
#> [78] "OpenWeatherMap.Clouds"
#> [79] "OpenWeatherMap.CloudsClassic"
#> [80] "OpenWeatherMap.Precipitation"
#> [81] "OpenWeatherMap.PrecipitationClassic"
#> [82] "OpenWeatherMap.Rain"
#> [83] "OpenWeatherMap.RainClassic"
#> [84] "OpenWeatherMap.Pressure"
#> [85] "OpenWeatherMap.PressureContour"
#> [86] "OpenWeatherMap.Wind"
#> [87] "OpenWeatherMap.Temperature"
#> [88] "OpenWeatherMap.Snow"
#> [89] "HERE"
#> [90] "HERE.normalDay"
#> [91] "HERE.normalDayCustom"
#> [92] "HERE.normalDayGrey"
#> [93] "HERE.normalDayMobile"
#> [94] "HERE.normalDayGreyMobile"
#> [95] "HERE.normalDayTransit"
#> [96] "HERE.normalDayTransitMobile"
#> [97] "HERE.normalDayTraffic"
#> [98] "HERE.normalNight"
#> [99] "HERE.normalNightMobile"
#> [100] "HERE.normalNightGrey"
#> [101] "HERE.normalNightGreyMobile"
#> [102] "HERE.normalNightTransit"
#> [103] "HERE.normalNightTransitMobile"
#> [104] "HERE.reducedDay"
#> [105] "HERE.reducedNight"
#> [106] "HERE.basicMap"
#> [107] "HERE.mapLabels"
#> [108] "HERE.trafficFlow"
#> [109] "HERE.carnavDayGrey"
#> [110] "HERE.hybridDay"
#> [111] "HERE.hybridDayMobile"
#> [112] "HERE.hybridDayTransit"
#> [113] "HERE.hybridDayGrey"
#> [114] "HERE.hybridDayTraffic"
#> [115] "HERE.pedestrianDay"
#> [116] "HERE.pedestrianNight"
#> [117] "HERE.satelliteDay"
#> [118] "HERE.terrainDay"
#> [119] "HERE.terrainDayMobile"
#> [120] "HEREv3"
#> [121] "HEREv3.normalDay"
#> [122] "HEREv3.normalDayCustom"
#> [123] "HEREv3.normalDayGrey"
#> [124] "HEREv3.normalDayMobile"
#> [125] "HEREv3.normalDayGreyMobile"
#> [126] "HEREv3.normalDayTransit"
#> [127] "HEREv3.normalDayTransitMobile"
#> [128] "HEREv3.normalNight"
#> [129] "HEREv3.normalNightMobile"
#> [130] "HEREv3.normalNightGrey"
#> [131] "HEREv3.normalNightGreyMobile"
#> [132] "HEREv3.normalNightTransit"
#> [133] "HEREv3.normalNightTransitMobile"
#> [134] "HEREv3.reducedDay"
#> [135] "HEREv3.reducedNight"
#> [136] "HEREv3.basicMap"
#> [137] "HEREv3.mapLabels"
#> [138] "HEREv3.trafficFlow"
#> [139] "HEREv3.carnavDayGrey"
#> [140] "HEREv3.hybridDay"
#> [141] "HEREv3.hybridDayMobile"
#> [142] "HEREv3.hybridDayTransit"
#> [143] "HEREv3.hybridDayGrey"
#> [144] "HEREv3.pedestrianDay"
#> [145] "HEREv3.pedestrianNight"
#> [146] "HEREv3.satelliteDay"
#> [147] "HEREv3.terrainDay"
#> [148] "HEREv3.terrainDayMobile"
#> [149] "FreeMapSK"
#> [150] "MtbMap"
#> [151] "CartoDB"
#> [152] "CartoDB.Positron"
#> [153] "CartoDB.PositronNoLabels"
#> [154] "CartoDB.PositronOnlyLabels"
#> [155] "CartoDB.DarkMatter"
#> [156] "CartoDB.DarkMatterNoLabels"
#> [157] "CartoDB.DarkMatterOnlyLabels"
#> [158] "CartoDB.Voyager"
#> [159] "CartoDB.VoyagerNoLabels"
#> [160] "CartoDB.VoyagerOnlyLabels"
#> [161] "CartoDB.VoyagerLabelsUnder"
#> [162] "HikeBike"
#> [163] "HikeBike.HikeBike"
#> [164] "HikeBike.HillShading"
#> [165] "BasemapAT"
#> [166] "BasemapAT.basemap"
#> [167] "BasemapAT.grau"
#> [168] "BasemapAT.overlay"
#> [169] "BasemapAT.terrain"
#> [170] "BasemapAT.surface"
#> [171] "BasemapAT.highdpi"
#> [172] "BasemapAT.orthofoto"
#> [173] "nlmaps"
#> [174] "nlmaps.standaard"
#> [175] "nlmaps.pastel"
#> [176] "nlmaps.grijs"
#> [177] "nlmaps.water"
#> [178] "nlmaps.luchtfoto"
#> [179] "NASAGIBS"
#> [180] "NASAGIBS.ModisTerraTrueColorCR"
#> [181] "NASAGIBS.ModisTerraBands367CR"
#> [182] "NASAGIBS.ViirsEarthAtNight2012"
#> [183] "NASAGIBS.ModisTerraLSTDay"
#> [184] "NASAGIBS.ModisTerraSnowCover"
#> [185] "NASAGIBS.ModisTerraAOD"
#> [186] "NASAGIBS.ModisTerraChlorophyll"
#> [187] "NLS"
#> [188] "JusticeMap"
#> [189] "JusticeMap.income"
#> [190] "JusticeMap.americanIndian"
#> [191] "JusticeMap.asian"
#> [192] "JusticeMap.black"
#> [193] "JusticeMap.hispanic"
#> [194] "JusticeMap.multi"
#> [195] "JusticeMap.nonWhite"
#> [196] "JusticeMap.white"
#> [197] "JusticeMap.plurality"
#> [198] "GeoportailFrance"
#> [199] "GeoportailFrance.plan"
#> [200] "GeoportailFrance.parcels"
#> [201] "GeoportailFrance.orthos"
#> [202] "OneMapSG"
#> [203] "OneMapSG.Default"
#> [204] "OneMapSG.Night"
#> [205] "OneMapSG.Original"
#> [206] "OneMapSG.Grey"
#> [207] "OneMapSG.LandLot"
#> [208] "USGS"
#> [209] "USGS.USTopo"
#> [210] "USGS.USImagery"
#> [211] "USGS.USImageryTopo"
#> [212] "WaymarkedTrails"
#> [213] "WaymarkedTrails.hiking"
#> [214] "WaymarkedTrails.cycling"
#> [215] "WaymarkedTrails.mtb"
#> [216] "WaymarkedTrails.slopes"
#> [217] "WaymarkedTrails.riding"
#> [218] "WaymarkedTrails.skating"
#> [219] "OpenAIP"
#> [220] "OpenSnowMap"
#> [221] "OpenSnowMap.pistes"
#> [222] "AzureMaps"
#> [223] "AzureMaps.MicrosoftImagery"
#> [224] "AzureMaps.MicrosoftBaseDarkGrey"
#> [225] "AzureMaps.MicrosoftBaseRoad"
#> [226] "AzureMaps.MicrosoftBaseHybridRoad"
#> [227] "AzureMaps.MicrosoftTerraMain"
#> [228] "AzureMaps.MicrosoftWeatherInfraredMain"
#> [229] "AzureMaps.MicrosoftWeatherRadarMain"
#> [230] "SwissFederalGeoportal"
#> [231] "SwissFederalGeoportal.NationalMapColor"
#> [232] "SwissFederalGeoportal.NationalMapGrey"
#> [233] "SwissFederalGeoportal.SWISSIMAGE"
To access the supported tile providers, we can also useproviders$ and choose from the options:
Let’s now color the polygons based on the house pricing. First, we need to create a color palette to represent the data. In the chunk below, we create an object called pal which stores the colors generated from df_agg$harga_m2 values. The value passed to fill in palette is provided by ColorBrewer2 palettes.
Notice that on the next line, inside the addPolygons() function call, we also add some parameters:
fillColor: the values and palette to be mappedfillOpacity: the fillColor opacityweight: thickness of the mapped polygonscolor: color of the mapped polygonspal <- colorNumeric(palette = "Reds", domain = df_agg$harga_m2)
leaflet(df_agg) %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addPolygons(
fillColor = ~pal(harga_m2),
fillOpacity = .8,
weight = 2,
color = "white"
)To add more interaction to our map, we can also make the polygons highlight as the mouse passes over them. We can do it by defining the highlight argument inside the addPolygons() function:
leaflet(df_agg) %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addPolygons(
fillColor = ~pal(harga_m2),
fillOpacity = .8,
weight = 2,
color = "white",
highlight = highlightOptions(
weight = 5,
color = "black",
bringToFront = TRUE,
opacity = 0.8
)
)Now let’s also add labels information as we hover over each sub-district mapped. To add the label, we need to create a HTML object using htmltools::HTML which stores the value of information we desired to be shown:
labels <- glue::glue("
<b>{df_agg$kecamatan}, {df_agg$kota}</b>:<br> {round(df_agg$harga_m2/1e6, 2)} mil/m2"
) %>% lapply(htmltools::HTML)Then, to map it, we can add label argument inside the addPolygons() function.
leaflet(df_agg) %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addPolygons(
label = labels,
fillColor = ~pal(harga_m2),
fillOpacity = .8,
weight = 2,
color = "white",
highlight = highlightOptions(
weight = 5,
color = "black",
bringToFront = TRUE,
opacity = 0.8
)
)As the final step, we might also need to add a legend to give information about the colors and intervals. To add the legend, we only a layer of addLegend() function:
leaflet(df_agg) %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addPolygons(
label = labels,
fillColor = ~pal(harga_m2),
fillOpacity = .8,
weight = 2,
color = "white",
highlight = highlightOptions(
weight = 5,
color = "black",
bringToFront = TRUE,
opacity = 0.8
)
) %>%
addLegend(
pal = pal,
values = ~harga_m2,
opacity = 1,
title = "Price/m2",
position = "bottomright"
)Another Viz
pal <- colorNumeric(palette = "Purples", domain = df_agg$harga_m2)
labels <- glue::glue("
<b>{df_agg$kecamatan}, {df_agg$kota}</b>:<br> {round(df_agg$harga_m2/1e6, 2)} mil/m2"
) %>% lapply(htmltools::HTML)
border <- df_agg %>%
filter(NAME_1 == "Jakarta Raya") %>%
group_by(NAME_1) %>%
summarise()
leaflet(df_agg) %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>% # using `addProviderTiles()` instead of `addTiles()`
addPolygons(
label = labels,
labelOptions = labelOptions(
style = list(
"font-size"="13px",
"background-color"="black",
"color"="white"
)
),
weight = 2,
color = "white",
fillOpacity = .8,
fillColor = ~pal(harga_m2),
highlight = highlightOptions(
weight = 5,
color = "black",
bringToFront = TRUE,
sendToBack = TRUE,
opacity = 0.8)
) %>%
addPolylines(
data = border,
color = "darkred",
opacity = .8,
weight = 2.5
) %>%
addLegend(
pal = pal,
values = ~harga_m2,
opacity = 1,
title = "Price/m2",
position = "bottomright"
) %>%
fitBounds(106.686211, -6.370783, 106.972824, -6.089036)In general, spatial data can be represented either as reference maps or thematic maps. While reference maps emphasizes the location of objects in the world, thematic maps shows the spatial variability of a specific distribution. The house pricing distribution map we created earlier is among the most frequently used thematic map types in geospatial data, which is the Choropleth map.
Figure 4.1: Source: Mapping Ideas from Cyberspace to Realspace
As an example, let’s take a look at this dataset of Perumahan (housing estate) locations in Jakarta area:
#> perumahan kota latitude longitude
#> 1 Bintaro Jaya Sektor 2 Jakarta Selatan -6.278108 106.7522
#> 2 Buaran Regency Jakarta Timur -6.232117 106.9251
#> 3 Casa Permata Hijau Jakarta Selatan -6.221427 106.7832
#> 4 Cempaka Residence Jakarta Selatan -6.291828 106.7779
#> 5 Green Lake City Jakarta Barat -6.177477 106.7113
#> 6 Green Lontar Jakarta Selatan -6.364691 106.7983
Let’s try to create a dot density map using leaflet’s addCircles() function:
Another common way of representing geographical density map is by using a heatmap. To create a heatmap in leaflet, we can use leaflet.extras package:
Geospatial mapping is an active area in R community, which makes R loaded with many other packages that supported spatial visualization. Packages like cartography, mapview, ggspatial, rasterVis and many other libraries that you can explore. Other than ggplot2 and leaflet that we used in this workshop, tmap is also one of the most popular package used to create both static and interactive maps in R.
tmap also works similarly with ggplot2, since it is also based on the idea of Grammar of Graphics5:
Just like how you can use facets in ggplot2, you can also create it with tm_facets(), for example, to split our map based on each city:
tmap also has a nice feature that allows us to add interactivity by switching from “plot” to “view” mode like so:
An effective way to communicate data visualization is by creating a dashboard. Dashboard helps to communicate information intuitively and essential to support data-driven decision making. The R Markdown file can be used to turn our analysis into fully reproducible documents that can be shared with others to communicate our analysis quickly and effectively.
We will create simple Jakarta housing dashboard using R’s flexdashboard package:
Figure 6.1: Snapshot of the final dashboard
Flex Dashboard is an R package that “easily create flexible, attractive, interactive dashboards with R”. Authoring and customization of dashboards is done using R Markdown with the flexdashboard::flex_dashboard output format. To get started, install flexdashboard using the standard installation process you should be familiar by now: install.packages("flexdashboard").
When that is done, create a new R Markdown document from within RStudio, choose “From Template” and then Flex Dashboard as following:
Figure 6.2: Creating a new Flex Dashboard
The template code that was generated for you takes some default value - for example it chooses to have a columns orientation and set your layout to fill, and just like a regular R Markdown file, we have three basic components:
Now, let’s take a closer look on the body part of the generated template code. For the body part, we also need to specify the layout for the page. We can create layouts with multiple columns by using -------------- for each column, and to include dashboard components (plot, table, map, etc.) we can use ###.
Here is the base layout for Jakarta housing dashboard:
---
title: "Analisis Pasar Properti Jabodetabek"
output:
flexdashboard::flex_dashboard:
theme: readable
---
```{r setup, include=FALSE}
library(flexdashboard)
```
Disclaimer {.sidebar}
-------------------------------------
**Disclaimer**:
Dashboard ini dibuat hanya untuk kepentingan edukasi membuat dashboard dan peta interaktif menggunakan R.
Data yang ditampilkan merupakan hasil sample cepat dari ±10,000 rumah yang dijual di salah satu situs marketplace di Indonesia pada September 2020. Nilai yang di tampilkan belum tentu merepresentasikan harga yang sebenarnya.
Column
-------------------------------------
### Component A
```{r}
```
Column
-------------------------------------
### Component B
```{r}
```
Dive Deeper
Refer to dashboard-geo.html in this directory for the map we’re going to recreate. The first component that we’re going to show is the choropleth we created using leaflet on our previous section. Can you copy down the code and paste it on the base layout above correctly?
A flexdashboard can include a wide variety of components, including the following interactive JavaScript visualizations based on HTML widgets6. Examples of HTML widgets includes the leaflet maps, DT tables, dygraphs maps, etc. Other HTML widgets can be seen here.
For example, for the next component, we’re going to create a table using DT, an R’s interface to the JavaScript library DataTables7. To create a DT object, we can use datatable() and pass in the data that we want to view. Here, we “tidy up” the data source we used for the choropleth by dropping some columns, etc. and map the data inside the datatable() function:
library(DT)
data <- df_agg %>%
as.data.frame() %>%
arrange(desc(harga_m2)) %>%
select(kota, kecamatan, harga_m2) %>%
mutate(harga_m2 = number(harga_m2, big.mark = ",")) %>%
rename(
Kota = kota,
Kecamatan = kecamatan,
`Harga/m2` = harga_m2
)
datatable(
data
)We can also add more extensions on our table, for example to add download buttons and set the number of entries shown:
DT::datatable(
data,
extensions = "Buttons",
options = list(
pageLength = 25,
dom = 'Bfrtip',
buttons = c('csv','excel','pdf')
)
)The geodashboard can be seen in https://rpubs.com/godfriedjunio/jakartapropertydashboard
Kristin Stock, Hans Guesgen, in Automating Open Source Intelligence, 2016↩︎
Robin Lovelace, Jakub Nowosad, Jannes Muenchow, “Geocomputation with R”↩︎
Wilkinson, Leland, and Graham Wills. 2005. The Grammar of Graphics. Springer Science+ Business Media.↩︎
Paula Moraga, Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny, 2019↩︎