1) Spatial data: Describe the absolute and relative location of geographic feathers.
2) Attributable data
Describe the characteristic of the spatial feathers
This characteristic can be quantitative and/or qualitative in nature
Vector data: Points, Line, and polygon
Raster data: Matrix of cells (grid data), contain attributable value, each cell is known as “spatial resolution”
It is necessary to locate the GIS data from 3 dimensions into 2 dimensions.
There are several methods of spatial referencing exist: geographic coordinate systems, rectangular coordinate systems, and non-coordinate systems.
To identify the locations on the surface of the Earth.
An x and y-axis are normally used.
Latitude (y) - horizontal line (up-down), from N to S
Longitude (x) - vertical line (left-right), from E to W
Each axis represents the angle at which that line is oriented with respect to the center of the Earth, units are measured in degrees (°)
Note!
A point used a single coordinate pair to represent the location.
Lines, connect point to point (starting and ending points are referred to “nodes”)
Area is often represented the closed polygons.
Attribute data (raster) may be attached to the polygons and contained non-spatial information.
Set of cells in grid pattern.
These cells are square and evenly spaced in the x and y directions
Coordinate is defined in the center of the cell
The cell dimension specified the length and width, known as “resolution”.
The resolution is typically constant for a raster data layer.
Multidimensional raster data
There are several tools using in GIS e.g., buffer, clip, merge, dissolve, intersect, union, erase, etc.
sp : for working with vector data
rgdal : for importing and exporting vector data from
other programs
raster : for working with raster data
just run:
install.packages(c("sp", "raster", "rgdal"))
library(sp)
library(raster)
library(rgdal)
sp data format
| Without attributes | With attributes | ArcGIS equivalent |
|---|---|---|
| SpatialPoints | SpatialPointsDataFrame | Point shapefiles |
| SpatialLines | SpatialLinesDataFrame | Line shapefiles |
| SpatialPolygons | SpatialPolygonsDataFrame | Polygon shapefiles |
🤪 Let start with the point data (coordinates), the simplest data! We will use the data set of malaria prevalence from Ethiopia.
Here is briefly explanation of some ambiguous columns,
examined: number of tested
pf_pos: number of the tested were positive for Plasmodium falciparum malaria
pf_pr: rate of the tested having positive for Plasmodium falciparum malaria (pf_pos/examined)
(Source: https://malariaatlas.org/)
dat1 <- read.csv("https://raw.githubusercontent.com/HughSt/HughSt.github.io/master/course_materials/week1/Lab_files/Data/mal_data_eth_2009_no_dups.csv", header=T)[,1:15]
head(dat1, 10)
## country country_id continent_id site_id site_name latitude
## 1 Ethiopia ETH Africa 6694 Dole School 5.9014
## 2 Ethiopia ETH Africa 8017 Gongoma School 6.3175
## 3 Ethiopia ETH Africa 12873 Buriya School 7.5674
## 4 Ethiopia ETH Africa 6533 Arero School 4.7192
## 5 Ethiopia ETH Africa 4150 Gandile School 4.8930
## 6 Ethiopia ETH Africa 1369 Melka Amana School 6.2461
## 7 Ethiopia ETH Africa 5808 Ebisa School 7.1657
## 8 Ethiopia ETH Africa 8922 Dhedecha Ferde School 6.7542
## 9 Ethiopia ETH Africa 19345 Kuni Oulaoulo School 5.9471
## 10 Ethiopia ETH Africa 634 Halo Choma School 6.8386
## longitude rural_urban year_start lower_age upper_age examined pf_pos
## 1 38.9412 UNKNOWN 2009 4 15 220 0
## 2 39.8362 UNKNOWN 2009 4 15 216 0
## 3 40.7521 UNKNOWN 2009 4 15 127 0
## 4 38.7650 UNKNOWN 2009 4 15 56 0
## 5 37.3632 UNKNOWN 2009 4 15 219 0
## 6 39.7891 UNKNOWN 2009 4 15 215 1
## 7 40.6490 UNKNOWN 2009 4 15 220 0
## 8 41.1559 UNKNOWN 2009 4 15 129 0
## 9 39.2547 UNKNOWN 2009 4 15 214 1
## 10 41.0551 UNKNOWN 2009 4 15 220 0
## pf_pr method
## 1 0.000000000 Microscopy
## 2 0.000000000 Microscopy
## 3 0.000000000 Microscopy
## 4 0.000000000 Microscopy
## 5 0.000000000 Microscopy
## 6 0.004651163 Microscopy
## 7 0.000000000 Microscopy
## 8 0.000000000 Microscopy
## 9 0.004672897 Microscopy
## 10 0.000000000 Microscopy
Check the %prevalence of malaria (using rate*100)
dat1$pf_pr100 <- dat1$pf_pr*100
hist(dat1$pf_pr100)
Working with base package
It is possible to use ebase package to plot the point. And vary the
size of the prevalence in each location using cex.
plot(dat1$longitude, dat1$latitude,
cex= dat1$pf_pr*10,
ylab= "Longitude",
xlab= "Latitude")
Working with ‘Spatial’ objects
The sp package allow us to convert the data into
spatial objects (SpatialPoints or
Spatialpolygons)
If the data are associated with each spatial feature, we can
create spatial DataFrames (SpatialPointsDataframe or
SpatialPolygonsDataframe)
Here let’s use Ethiopia data to create
SpatialPointsDataframe !!
sur1 <- SpatialPointsDataFrame(coords =dat1[,c("longitude", "latitude")],
data = dat1[,c("examined", "pf_pos", "pf_pr")],
proj4string = CRS("+init=epsg:4326"))
sur1
## class : SpatialPointsDataFrame
## features : 203
## extent : 34.5418, 42.4915, 3.8966, 9.9551 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 3
## names : examined, pf_pos, pf_pr
## min values : 37, 0, 0
## max values : 221, 14, 0.127272727
Note! sets the projection to WGS1984 using lat-long.
Check the coordinate!
head(sur1@coords)
## longitude latitude
## [1,] 38.9412 5.9014
## [2,] 39.8362 6.3175
## [3,] 40.7521 7.5674
## [4,] 38.7650 4.7192
## [5,] 37.3632 4.8930
## [6,] 39.7891 6.2461
Check the data
head(sur1@data)
## examined pf_pos pf_pr
## 1 220 0 0.000000000
## 2 216 0 0.000000000
## 3 127 0 0.000000000
## 4 56 0 0.000000000
## 5 219 0 0.000000000
## 6 215 1 0.004651163
Quick plot
plot(sur1)
Using spatial plot
spplot(sur1, zcol= "pf_pr")
But we could not see where the disease occurred!
Let’s import the province boundaries of Ethiopia
shapefile ETH_Adm_1_shapfile.zip to identify the location
of the malaria occurrence.
There are 2 options;
st_read function from library(sf)
(adm1 <- st_read("yourfolder/filename.shp")) and convert
the file into SpatialPolygonsDataFrame by using
adm1 <- as_Spatial(adm1)getData function directly.adm1 <- raster::getData("GADM", country="ETH", level=1)
adm1
## class : SpatialPolygonsDataFrame
## features : 11
## extent : 33.00154, 47.95823, 3.398823, 14.84548 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 10
## names : GID_0, NAME_0, GID_1, NAME_1, VARNAME_1, NL_NAME_1, TYPE_1, ENGTYPE_1, CC_1, HASC_1
## min values : ETH, Ethiopia, ETH.1_1, Addis Abeba, , NA, Astedader, City, 01, ET.AA
## max values : ETH, Ethiopia, ETH.9_1, Tigray, Tegré|Tigrai|Tigre, NA, Kilil, State, 15, ET.TI
Now plot both point data and country
plot(adm1)
points(dat1$longitude, dat1$latitude,
cex = dat1$pf_pr*10,
ylab="Latitude",
xlab="Longitude",
col="red")
We will use raster command to load raster file (.tif
file) into R. In this step, we use elevation data in Ethiopia.
elev <- raster("https://github.com/HughSt/HughSt.github.io/raw/master/course_materials/week1/Lab_files/Data/elev_ETH.tif")
elev
## class : RasterLayer
## dimensions : 1416, 1824, 2582784 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 32.9, 48.1, 3.2, 15 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : elev_ETH.tif
## names : elev_ETH
## values : -189, 4420 (min, max)
Quick plot
plot(elev)
👉 Clip and subset vector data
Get the admin1 boundaries of Ethiopia
adm2 <- raster::getData("GADM", country="ETH", level = 1)
adm2
## class : SpatialPolygonsDataFrame
## features : 11
## extent : 33.00154, 47.95823, 3.398823, 14.84548 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 10
## names : GID_0, NAME_0, GID_1, NAME_1, VARNAME_1, NL_NAME_1, TYPE_1, ENGTYPE_1, CC_1, HASC_1
## min values : ETH, Ethiopia, ETH.1_1, Addis Abeba, , NA, Astedader, City, 01, ET.AA
## max values : ETH, Ethiopia, ETH.9_1, Tigray, Tegré|Tigrai|Tigre, NA, Kilil, State, 15, ET.TI
Subset the 1st row of the data
adm2crop <-adm2[1,]
adm2crop
## class : SpatialPolygonsDataFrame
## features : 1
## extent : 38.6394, 38.90624, 8.833486, 9.098195 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 10
## names : GID_0, NAME_0, GID_1, NAME_1, VARNAME_1, NL_NAME_1, TYPE_1, ENGTYPE_1, CC_1, HASC_1
## value : ETH, Ethiopia, ETH.1_1, Addis Abeba, Āddīs Ābaba|Addis Ababa|Adis-Abeba|Ādīs Ābeba, NA, Astedader, City, 14, ET.AA
#plot cropped data
plot(adm2crop)
lines(adm2crop, col = "red", lwd = "2")
Now, let’s subset by name. Extract polygon boundary of the province name “Amhara”.
Amhara <- subset(adm2, adm2$NAME_1=="Amhara")
Amhara
## class : SpatialPolygonsDataFrame
## features : 1
## extent : 35.25711, 40.21244, 8.714812, 13.7687 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 10
## names : GID_0, NAME_0, GID_1, NAME_1, VARNAME_1, NL_NAME_1, TYPE_1, ENGTYPE_1, CC_1, HASC_1
## value : ETH, Ethiopia, ETH.3_1, Amhara, Amara, NA, Kilil, State, 03, ET.AM
Plot the results
plot(adm2)
lines(Amhara, col="blue", lwd =2)
👉 Clip and subset raster data
| Resolution (decimal degree) | Kilometers (km) |
|---|---|
| 1°×1° | 111.1 |
| 0.1°×0.1° | 11.1 |
| 0.01°×0.01° | 1.1 |
Check the resolution of the elevation data of Ethiopia
res(elev)
## [1] 0.008333333 0.008333333
Let’s aggregate (make into lower resolution) by a factor of 10
elev1 <- aggregate(elev, fact = 10)
res(elev1)
## [1] 0.08333333 0.08333333
plot(elev1)
2.1 Extract by points data
Now let extract values of elevation at each survey point using
extract command.
sur1$elev <- raster::extract(elev, sur1)
sur1
## class : SpatialPointsDataFrame
## features : 203
## extent : 34.5418, 42.4915, 3.8966, 9.9551 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 4
## names : examined, pf_pos, pf_pr, elev
## min values : 37, 0, 0, 817
## max values : 221, 14, 0.127272727, 2451
2.2 Extract by polygons (get value within admin boundary). Note!
for large raster, please check veloxpackage.
adm2$elev <- extract(elev, adm2, fun = mean, na.rm = T)
Quick check for the correlation between malaria prevalence and elevation.
sur1$prevalence <- sur1$pf_pos/sur1$examined
Now plot!
plot(sur1$elev, sur1$prevalence)
Reference
Brunsdon, C., & Comber, L. (2014). An introduction to R for spatial analysis and mapping. Sage.
Ian, H. (2010). An introduction to geographical information systems. Pearson Education India.
Week 1 - Visualizing spatial data | Spatial epidemiology in R (hughst.github.io)