About GIS

1. GIS data types

1) Spatial data: Describe the absolute and relative location of geographic feathers.

2) Attributable data

  • Describe the characteristic of the spatial feathers

  • This characteristic can be quantitative and/or qualitative in nature

2. Spatial data models

Vector data: Points, Line, and polygon

Raster data: Matrix of cells (grid data), contain attributable value, each cell is known as “spatial resolution”

3. Spatial referencing

  • It is necessary to locate the GIS data from 3 dimensions into 2 dimensions.

  • There are several methods of spatial referencing exist: geographic coordinate systems, rectangular coordinate systems, and non-coordinate systems.

  • To identify the locations on the surface of the Earth.

  • An x and y-axis are normally used.

    • Latitude (y) - horizontal line (up-down), from N to S

    • Longitude (x) - vertical line (left-right), from E to W

  • Each axis represents the angle at which that line is oriented with respect to the center of the Earth, units are measured in degrees (°)

Note!

  • A point used a single coordinate pair to represent the location.

  • Lines, connect point to point (starting and ending points are referred to “nodes”)

  • Area is often represented the closed polygons.

  • Attribute data (raster) may be attached to the polygons and contained non-spatial information.

4. Raster data

  • Set of cells in grid pattern.

  • These cells are square and evenly spaced in the x and y directions

  • Coordinate is defined in the center of the cell

  • The cell dimension specified the length and width, known as “resolution”.

  • The resolution is typically constant for a raster data layer.

Multidimensional raster data

  • Represented data captured at multiple times, multiple depths, or heights.

  • Example of real-world environmental data

  • Apply with health outcome data by considering: coordinate (x,y) or boundary area.

5. Geo-processing

There are several tools using in GIS e.g., buffer, clip, merge, dissolve, intersect, union, erase, etc.


Let apply in R! 🤓

Section 1 - Visualizing spatial data

1. Library installation

  1. sp : for working with vector data

    rgdal : for importing and exporting vector data from other programs

    raster : for working with raster data

    just run: install.packages(c("sp", "raster", "rgdal"))

library(sp)
library(raster)
library(rgdal)

2. Working with spatial data

sp data format

Without attributes With attributes ArcGIS equivalent
SpatialPoints SpatialPointsDataFrame Point shapefiles
SpatialLines SpatialLinesDataFrame Line shapefiles
SpatialPolygons SpatialPolygonsDataFrame Polygon shapefiles

🤪 Let start with the point data (coordinates), the simplest data! We will use the data set of malaria prevalence from Ethiopia.

Here is briefly explanation of some ambiguous columns,

  • examined: number of tested

  • pf_pos: number of the tested were positive for Plasmodium falciparum malaria

  • pf_pr: rate of the tested having positive for Plasmodium falciparum malaria (pf_pos/examined)

(Source: https://malariaatlas.org/)

dat1 <- read.csv("https://raw.githubusercontent.com/HughSt/HughSt.github.io/master/course_materials/week1/Lab_files/Data/mal_data_eth_2009_no_dups.csv", header=T)[,1:15]

head(dat1, 10)
##     country country_id continent_id site_id             site_name latitude
## 1  Ethiopia        ETH       Africa    6694           Dole School   5.9014
## 2  Ethiopia        ETH       Africa    8017        Gongoma School   6.3175
## 3  Ethiopia        ETH       Africa   12873         Buriya School   7.5674
## 4  Ethiopia        ETH       Africa    6533          Arero School   4.7192
## 5  Ethiopia        ETH       Africa    4150        Gandile School   4.8930
## 6  Ethiopia        ETH       Africa    1369    Melka Amana School   6.2461
## 7  Ethiopia        ETH       Africa    5808          Ebisa School   7.1657
## 8  Ethiopia        ETH       Africa    8922 Dhedecha Ferde School   6.7542
## 9  Ethiopia        ETH       Africa   19345  Kuni Oulaoulo School   5.9471
## 10 Ethiopia        ETH       Africa     634     Halo Choma School   6.8386
##    longitude rural_urban year_start lower_age upper_age examined pf_pos
## 1    38.9412     UNKNOWN       2009         4        15      220      0
## 2    39.8362     UNKNOWN       2009         4        15      216      0
## 3    40.7521     UNKNOWN       2009         4        15      127      0
## 4    38.7650     UNKNOWN       2009         4        15       56      0
## 5    37.3632     UNKNOWN       2009         4        15      219      0
## 6    39.7891     UNKNOWN       2009         4        15      215      1
## 7    40.6490     UNKNOWN       2009         4        15      220      0
## 8    41.1559     UNKNOWN       2009         4        15      129      0
## 9    39.2547     UNKNOWN       2009         4        15      214      1
## 10   41.0551     UNKNOWN       2009         4        15      220      0
##          pf_pr     method
## 1  0.000000000 Microscopy
## 2  0.000000000 Microscopy
## 3  0.000000000 Microscopy
## 4  0.000000000 Microscopy
## 5  0.000000000 Microscopy
## 6  0.004651163 Microscopy
## 7  0.000000000 Microscopy
## 8  0.000000000 Microscopy
## 9  0.004672897 Microscopy
## 10 0.000000000 Microscopy

Check the %prevalence of malaria (using rate*100)

dat1$pf_pr100 <-  dat1$pf_pr*100

hist(dat1$pf_pr100)

3. Plotting and mapping spatial data

Working with base package

It is possible to use ebase package to plot the point. And vary the size of the prevalence in each location using cex.

plot(dat1$longitude, dat1$latitude,
     cex= dat1$pf_pr*10,
     ylab= "Longitude",
     xlab= "Latitude")

Working with ‘Spatial’ objects

  • The sp package allow us to convert the data into spatial objects (SpatialPoints or Spatialpolygons)

  • If the data are associated with each spatial feature, we can create spatial DataFrames (SpatialPointsDataframe or SpatialPolygonsDataframe)

Here let’s use Ethiopia data to create SpatialPointsDataframe !!

sur1 <- SpatialPointsDataFrame(coords =dat1[,c("longitude", "latitude")],
                               data = dat1[,c("examined", "pf_pos", "pf_pr")],
                               proj4string = CRS("+init=epsg:4326"))

sur1
## class       : SpatialPointsDataFrame 
## features    : 203 
## extent      : 34.5418, 42.4915, 3.8966, 9.9551  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 3
## names       : examined, pf_pos,       pf_pr 
## min values  :       37,      0,           0 
## max values  :      221,     14, 0.127272727

Note! sets the projection to WGS1984 using lat-long.

Check the coordinate!

head(sur1@coords)
##      longitude latitude
## [1,]   38.9412   5.9014
## [2,]   39.8362   6.3175
## [3,]   40.7521   7.5674
## [4,]   38.7650   4.7192
## [5,]   37.3632   4.8930
## [6,]   39.7891   6.2461

Check the data

head(sur1@data)
##   examined pf_pos       pf_pr
## 1      220      0 0.000000000
## 2      216      0 0.000000000
## 3      127      0 0.000000000
## 4       56      0 0.000000000
## 5      219      0 0.000000000
## 6      215      1 0.004651163

Quick plot

plot(sur1)

Using spatial plot

spplot(sur1, zcol= "pf_pr")

But we could not see where the disease occurred!

Let’s import the province boundaries of Ethiopia shapefile ETH_Adm_1_shapfile.zip to identify the location of the malaria occurrence.

There are 2 options;

  1. Download shapefile ETH_shapefile and use st_read function from library(sf) (adm1 <- st_read("yourfolder/filename.shp")) and convert the file into SpatialPolygonsDataFrame by using adm1 <- as_Spatial(adm1)
  2. Download from getData function directly.
adm1 <- raster::getData("GADM", country="ETH", level=1) 

adm1
## class       : SpatialPolygonsDataFrame 
## features    : 11 
## extent      : 33.00154, 47.95823, 3.398823, 14.84548  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 10
## names       : GID_0,   NAME_0,   GID_1,      NAME_1,          VARNAME_1, NL_NAME_1,    TYPE_1, ENGTYPE_1, CC_1, HASC_1 
## min values  :   ETH, Ethiopia, ETH.1_1, Addis Abeba,                   ,        NA, Astedader,      City,   01,  ET.AA 
## max values  :   ETH, Ethiopia, ETH.9_1,      Tigray, Tegré|Tigrai|Tigre,        NA,     Kilil,     State,   15,  ET.TI

Now plot both point data and country

plot(adm1)
points(dat1$longitude, dat1$latitude,
       cex = dat1$pf_pr*10,
       ylab="Latitude",
       xlab="Longitude",
       col="red")

4. Plotting raster data

We will use raster command to load raster file (.tif file) into R. In this step, we use elevation data in Ethiopia.

elev <- raster("https://github.com/HughSt/HughSt.github.io/raw/master/course_materials/week1/Lab_files/Data/elev_ETH.tif")

elev
## class      : RasterLayer 
## dimensions : 1416, 1824, 2582784  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 32.9, 48.1, 3.2, 15  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : elev_ETH.tif 
## names      : elev_ETH 
## values     : -189, 4420  (min, max)

Quick plot

plot(elev)


Section 2 - Manipulating spatial data

👉 Clip and subset vector data

Get the admin1 boundaries of Ethiopia

adm2 <- raster::getData("GADM", country="ETH", level = 1)

adm2
## class       : SpatialPolygonsDataFrame 
## features    : 11 
## extent      : 33.00154, 47.95823, 3.398823, 14.84548  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 10
## names       : GID_0,   NAME_0,   GID_1,      NAME_1,          VARNAME_1, NL_NAME_1,    TYPE_1, ENGTYPE_1, CC_1, HASC_1 
## min values  :   ETH, Ethiopia, ETH.1_1, Addis Abeba,                   ,        NA, Astedader,      City,   01,  ET.AA 
## max values  :   ETH, Ethiopia, ETH.9_1,      Tigray, Tegré|Tigrai|Tigre,        NA,     Kilil,     State,   15,  ET.TI

Subset the 1st row of the data

adm2crop <-adm2[1,]

adm2crop
## class       : SpatialPolygonsDataFrame 
## features    : 1 
## extent      : 38.6394, 38.90624, 8.833486, 9.098195  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 10
## names       : GID_0,   NAME_0,   GID_1,      NAME_1,                                     VARNAME_1, NL_NAME_1,    TYPE_1, ENGTYPE_1, CC_1, HASC_1 
## value       :   ETH, Ethiopia, ETH.1_1, Addis Abeba, Āddīs Ābaba|Addis Ababa|Adis-Abeba|Ādīs Ābeba,        NA, Astedader,      City,   14,  ET.AA

#plot cropped data

plot(adm2crop)
lines(adm2crop, col = "red", lwd = "2")

Now, let’s subset by name. Extract polygon boundary of the province name “Amhara”.

Amhara <- subset(adm2, adm2$NAME_1=="Amhara")
Amhara
## class       : SpatialPolygonsDataFrame 
## features    : 1 
## extent      : 35.25711, 40.21244, 8.714812, 13.7687  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 10
## names       : GID_0,   NAME_0,   GID_1, NAME_1, VARNAME_1, NL_NAME_1, TYPE_1, ENGTYPE_1, CC_1, HASC_1 
## value       :   ETH, Ethiopia, ETH.3_1, Amhara,     Amara,        NA,  Kilil,     State,   03,  ET.AM

Plot the results

plot(adm2)
lines(Amhara, col="blue", lwd =2)

👉 Clip and subset raster data

  1. Dealing with resolution
Resolution (decimal degree) Kilometers (km)
1°×1° 111.1
0.1°×0.1° 11.1
0.01°×0.01° 1.1

Check the resolution of the elevation data of Ethiopia

res(elev)
## [1] 0.008333333 0.008333333

Let’s aggregate (make into lower resolution) by a factor of 10

elev1 <- aggregate(elev, fact = 10)

res(elev1)
## [1] 0.08333333 0.08333333
plot(elev1)

  1. Extracting data from raster

2.1 Extract by points data

Now let extract values of elevation at each survey point using extract command.

sur1$elev <- raster::extract(elev, sur1)
sur1
## class       : SpatialPointsDataFrame 
## features    : 203 
## extent      : 34.5418, 42.4915, 3.8966, 9.9551  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 4
## names       : examined, pf_pos,       pf_pr, elev 
## min values  :       37,      0,           0,  817 
## max values  :      221,     14, 0.127272727, 2451

2.2 Extract by polygons (get value within admin boundary). Note! for large raster, please check veloxpackage.

adm2$elev <- extract(elev, adm2, fun = mean, na.rm = T)
  1. Exploring spatial analysis

Quick check for the correlation between malaria prevalence and elevation.

sur1$prevalence <- sur1$pf_pos/sur1$examined

Now plot!

plot(sur1$elev, sur1$prevalence)


Reference

  1. Brunsdon, C., & Comber, L. (2014). An introduction to R for spatial analysis and mapping. Sage.

  2. Ian, H. (2010). An introduction to geographical information systems. Pearson Education India.

  3. http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf?fbclid=IwAR0B7RTr_JBstLDujqmbyN5wWLdRIi1qdELuL-jT0bariiX03e0se4vxX9w

  4. Week 1 - Visualizing spatial data | Spatial epidemiology in R (hughst.github.io)