is.element('sf', installed.packages())[1] TRUE
library(sf)Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
Spatial data consists of positional information, answering the question “where is it” (on Earth, body, Sun, moon etc). Many empirical data contain not only information about the attribute of interest (i.e. the response/variable being studied), but also other variables that denote the geographic location where the response was observed. In particular, spatial data has a spatial reference: they have (i) coordinates and (ii) a system of reference for those coordinates (aka a coordinate reference system, CRS).
To verify if the CRS used in the dataset is accurate, we can refer to the Coordinate Systems Worldwide - https://epsg.io/.
One of the key features of spatial data is the auto-correlation (i.e. correlation with itself) of observations of observations in space. Observations in close spatial proximity tend to be more similar than for observations that are more spatially separated. We can understand that \(Cor({X_1},{X_2})\) ranges from \(-1\) to \(1\) and that correlation increases as \(X_1\) and \(X_2\) get closer and closer to each other.
Essentially, spatial data analysis developed concurrently across many disciplines, including geology, geography, and meteorology, as well as other non-traditional statistical areas. This led to the development of various methodologies, from traditional statistical techniques like linear models, response surface theory, and spatial statistics to spatial econometric approaches. The primary challenge today is to effectively combine these diverse tools to suit our specific analytical needs.
We denote a spatial process in \(D\) dimensions as: \({Z(s):s\in D \subset R^d}\) where \(Z\) is the observed attribute of location s, a \((D \times 1)\) vector of coordinates. The spatial data types are distinguished through characteristics of the domain \(D\).
Domain \(D\) is continuous s.t. \(Z(s)\) can be observed anywhere in \(D\). This means that between any two sample locations, we can theoretically place an infinite number of other samples. For example, I can measure temperature at multiple locations in the same classroom.
Due to the continuity of \(D\), geostatistical data is also known as spatial data with continuous variation. Since the spatial domain is continuous, it cannot be sampled exhaustively and an important task in the analysis is the reconstruction of the surface of the attribute \(Z\) over the entire domain, i.e. mapping of \(Z(s)\).
Continuity is associated with the domain and not the attribute itself (which may be discrete or continuous or even nominal or ordinal). For example, temperature can be measured using a Celsius scale (continuous) or an ordinal scale (discrete).
A point pattern has a random domain and the attribute itself will be Degenerate/binary in nature (i.e. it is either there or not there). If along with the location of an event, if we observe a stochastic attribute, then it is called a marked pattern.
When we start up our project, we will need to make sure we install and load the necessary packages. The ‘sf’ package should already be installed so we just need to load the ‘sf’ package from our library using the following code:
is.element('sf', installed.packages())[1] TRUE
library(sf)Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE
When we load the dataset, we will need to use the st_read() command so as to read the file to R. This applies for both shapefiles and geojson files.
sg_pa = st_read("C:/Users/rache/Downloads/MasterPlan2014SubzoneBoundaryWebSHP")Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\Users\rache\Downloads\MasterPlan2014SubzoneBoundaryWebSHP'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
sg_spt = st_read("C:/Users/rache/Downloads/SportSGSportFacilitiesGEOJSON.geojson")Reading layer `SportSGSportFacilitiesGEOJSON' from data source
`C:\Users\rache\Downloads\SportSGSportFacilitiesGEOJSON.geojson'
using driver `GeoJSON'
Simple feature collection with 35 features and 2 fields
Geometry type: POLYGON
Dimension: XYZ
Bounding box: xmin: 103.6932 ymin: 1.285977 xmax: 103.9526 ymax: 1.435855
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
In case that your data set is not in the same folder as the R Markdown file, use the file path instead to ensure that the file is read properly in R.
Now that we have loaded our dataset, we will need to inspect the data as we will need to make sure that there are no false entries. These false entries could be due to overlapping polygons. As a habit, we should always check the validity especially if the dataset is downloaded from the Internet.
Always validate the dataset before we do anything else!
class(sg_pa)[1] "sf" "data.frame"
summary(sg_pa) OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C
Min. : 1.0 Min. : 1.000 Length:323 Length:323
1st Qu.: 81.5 1st Qu.: 2.000 Class :character Class :character
Median :162.0 Median : 4.000 Mode :character Mode :character
Mean :162.0 Mean : 4.625
3rd Qu.:242.5 3rd Qu.: 6.500
Max. :323.0 Max. :17.000
CA_IND PLN_AREA_N PLN_AREA_C REGION_N
Length:323 Length:323 Length:323 Length:323
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
REGION_C INC_CRC FMEL_UPD_D X_ADDR
Length:323 Length:323 Min. :2014-12-05 Min. : 5093
Class :character Class :character 1st Qu.:2014-12-05 1st Qu.:21864
Mode :character Mode :character Median :2014-12-05 Median :28465
Mean :2014-12-05 Mean :27257
3rd Qu.:2014-12-05 3rd Qu.:31674
Max. :2014-12-05 Max. :50425
Y_ADDR SHAPE_Leng SHAPE_Area geometry
Min. :19579 Min. : 871.6 Min. : 39438 MULTIPOLYGON :323
1st Qu.:31776 1st Qu.: 3709.6 1st Qu.: 628261 epsg:NA : 0
Median :35113 Median : 5211.9 Median : 1229894 +proj=tmer...: 0
Mean :36106 Mean : 6524.4 Mean : 2420882
3rd Qu.:39869 3rd Qu.: 6942.6 3rd Qu.: 2106483
Max. :49553 Max. :68083.9 Max. :69748299
head(sg_pa)Simple feature collection with 6 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 24468.89 ymin: 28369.47 xmax: 32362.39 ymax: 30542.74
Projected CRS: SVY21
OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
1 1 1 MARINA SOUTH MSSZ01 Y MARINA SOUTH
2 2 1 PEARL'S HILL OTSZ01 Y OUTRAM
3 3 3 BOAT QUAY SRSZ03 Y SINGAPORE RIVER
4 4 8 HENDERSON HILL BMSZ08 N BUKIT MERAH
5 5 3 REDHILL BMSZ03 N BUKIT MERAH
6 6 7 ALEXANDRA HILL BMSZ07 N BUKIT MERAH
PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
1 MS CENTRAL REGION CR 5ED7EB253F99252E 2014-12-05 31595.84
2 OT CENTRAL REGION CR 8C7149B9EB32EEFC 2014-12-05 28679.06
3 SR CENTRAL REGION CR C35FEFF02B13E0E5 2014-12-05 29654.96
4 BM CENTRAL REGION CR 3775D82C5DDBEFBD 2014-12-05 26782.83
5 BM CENTRAL REGION CR 85D9ABEF0A40678F 2014-12-05 26201.96
6 BM CENTRAL REGION CR 9D286521EF5E3B59 2014-12-05 25358.82
Y_ADDR SHAPE_Leng SHAPE_Area geometry
1 29220.19 5267.381 1630379.3 MULTIPOLYGON (((31495.56 30...
2 29782.05 3506.107 559816.2 MULTIPOLYGON (((29092.28 30...
3 29974.66 1740.926 160807.5 MULTIPOLYGON (((29932.33 29...
4 29933.77 3313.625 595428.9 MULTIPOLYGON (((27131.28 30...
5 30005.70 2825.594 387429.4 MULTIPOLYGON (((26451.03 30...
6 29991.38 4428.913 1030378.8 MULTIPOLYGON (((25899.7 297...
table(st_is_valid(sg_pa))
FALSE TRUE
9 314
class(sg_pa): Used to determine the class of the object, e.g., data.frame.summary(sg_pa): Calculates a 6-number summary including the minimum value, first quartile (25th percentile), median (50th percentile), mean, third quartile (75th percentile), and maximum value.head(sg_pa): Inspects the first 6 rows of the dataset.table(st_is_valid(sg_pa)): Creates a table for the object and checks the validity of the object.If there are false entries, use the command st_make_valid() to validate the object.
sg_pa = st_make_valid(sg_pa)
table(st_is_valid(sg_pa))
TRUE
323
sg_pa = st_make_valid(sg_pa): Used for making the object valid.table(st_is_valid(sg_pa)): Used for creating a table from the newly validated object.Now, we can visualize our dataset in the form of maps by using the plot() command. If there are many attributes in the dataset, we can use the max.plot() command together with the plot() command to limit the number of maps generated. If we wish to plot a map using a specific attribute, then we need to specify the attribute in the plot() command.
plot(sg_pa, max.plot = 12)plot(sg_pa["PLN_AREA_N"])plot(sg_pa, max.plot = 12): Plots our object; the max.plot function generates the number of maps specified (e.g., 12).plot(sg_pa["PLN_AREA_N"]): Plots our object based on the specified attribute we want (e.g., PLN_AREA_N).Before we can plot our dataset, we will need to extract the geometry for our geojson file. This is so that we can seoarate the spatial data freom other attribute information, allowing us to visualize, process, and interact with the geographic shapes (points, lines, polygons) using R. This process is necessary because geometry is what defines the map features, and extracting it provides a clean dataset for tasks like rendering maps, performing spatial analysis, or exchanging data between different platforms.
st_geometry(sg_spt)Geometry set for 35 features
Geometry type: POLYGON
Dimension: XYZ
Bounding box: xmin: 103.6932 ymin: 1.285977 xmax: 103.9526 ymax: 1.435855
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
First 5 geometries:
POLYGON Z ((103.7628 1.310547 0, 103.7628 1.310...
POLYGON Z ((103.7644 1.311377 0, 103.7646 1.311...
POLYGON Z ((103.6937 1.33894 0, 103.6937 1.3389...
POLYGON Z ((103.8723 1.323252 0, 103.8723 1.323...
POLYGON Z ((103.8774 1.306804 0, 103.8787 1.306...
st_geometry(sg_spt): Extracts the geometry.After extracting the geometry, we can now plot the spatial data. To add colour to our plots, we can make use of the command col = “colour” as well as border = “colour” so that the plots are filled in colour. Additionally, the command main = “title” will create a title for the plot.
plot(sg_spt$geometry, col = "green", border = "darkblue", main="SportSG Sport Facilities Locations")Lastly, to load the plot of the dengue clusters on the Singapore map, we can make use of the library tmap as well as the command tm_shape(), which specifies the spatial data object or “shape” that serves as the base layer for creating a thematic map. Essentially, this defines the spatial context for subsequent map layers. Then, we use the tm_polygons() command to draw polygons on top of the map layer. Finally, to view the map that we have created, we use the command tmap_mode() where we can set the mode to (i) plot, which creates default and static maps, while (ii) view creates interactive maps that can be zoomed in and out, allow for changing background tiles (basemaps), or click on map objects to get some additional information.
library(tmap)
sploc_map = tm_shape(sg_spt$geometry) + tm_polygons(col = "green")
tmap_mode("view")ℹ tmap mode set to "view".
sploc_maptmap package is used for creating thematic maps.tm_shape: Specifies the spatial data object for the base layer and creates polygons on top of the map layer.tmap_mode: Sets the created map to be an interactive map.