Working with PostGIS from R

Introduction

Spatial data bases are widely used for storing and sharing institutional data. They are often run on a server in order to present spatial data to users on a network or over the internet, often using a mapserver as an intermediate interface.

PostGIS works by adding a set of spatial operations to Postgresql. These spatial operators give the database the ability to run most of the common desktop GIS operations as SQL queries. So, you can also run PostGIS locally in order to augment both vector and raster processing in R. It is open source and it is now easy to install under Windows or Linux.

In order for a local install of PostGIS to be useful as a provider of additional functionality to R there must be a way to transfer data smoothly into and out of PostGIS from R and back. One way to achieve this is to combine RODBC with rgdal. R is also available as a procedural language (PLR) that can be run directly within the database. It is worth looking at the potential of both approaches as additions to the toolbox of OSGEO functions available for R users. These examples have been run on Linux (Ubuntu). While all the code can be adapted to run on Windows quite easily, there are a couple of advantages of running PostGIS on Linux. The first is that it is easy to build from source under Linux. This is useful for raster processing as the code is still under active development and is constantly improving. The second is that it is simple to use system commands from R under linux in order to load data into PostGIS. This can also be achieved under Windows, but it is rather more complicated.

PostGIS can be set up very easily on Windows by adding the extension during the postgresql install procedure.

http://www.postgresql.org/download/windows/

If you want to follow the example of vector processing from Windows you may need to load layers using QGIS. The code for raster processing would not run “out of the box” on Windows, but could be adapted quite easily.

Installing PostGIS under Ubuntu

Ubuntu is the most popular distribution for laptop users. PostGIS can be setup in just a few minutes by pasting in the lines below. Because there are rather a lot of different potential repositories for open source GIS packages it is possible to run into problems with dependencies if you have already installed GIS packages. You may need to check versions carefully if this does not work. The following instructions were tested on tuesday 28 May 2013 on a fresh install of Ubuntu 12.04 Precise. The simplest way of installing is simply to download the binary packages with their dependencies.

Installing from binaries

First install Postgresql. The default repository version at present is postgresql-9.1 (9.2 is now released). Use this version to ensure that all the instructions work.

sudo apt-get -y install postgresql postgresql-client postgresql-server-dev-9.1 pgadmin3

You will now have the database server running in the background. Now comes a slightly confusing element. Postgresql is designed to store institutional data, so it has many built in security features. These are not needed for desktop use. In fact they can easily get in the way, particularly if you are relatively new to databases. So running a couple of lines at this point to give you superuser priviledges will prevent problems in the future. The default superuser is called 'postgres' After an install a password needs to be set for this user. The OS user also known as postgres also has been added (this is a cause of confusion). Set both these passwords to 'postgres' and also set up yourself up as a super user by running these lines.

sudo passwd postgres 

Type “postgres” after typing your own password.

Now alter the postgres user.

sudo -u postgres psql -c "alter user postgres with password 'postgres';"
sudo -u postgres createuser duncan --superuser

The program pgadmin provides a powerful graphical interface to postgresql. If you are new to databases there will be a lot to learn, but you may find many other uses for postgreslq in your research. It is an excelent open source replacement for MsAccess.

Postgresql can now be used, as soon as you have your own PostGIS database. To download a recent binary version of postgis (plus qgis and grass if you don't already have them) run these lines.

sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
sudo add-apt-repository ppa:sharpie/postgis-nightly
sudo apt-get update

sudo apt-get -y install postgresql-9.1-postgis qgis grass
sudo apt-get -y install odbc-postgresql unixodbc
sudo apt-get install postgresql-9.1-plr

That is it. We will setup the first database directly from R in order to show how closely the two can be linked, although this can also be done using pgadmin or the commandline.

Building PostGIS from source

Because PostGIS raster is still under active development it is well worth installing the latest version of PostGIS if you want to use all the raster features. This is quite easy to do under Ubuntu. The following was also tested with a fresh install of Ubuntu 12.04 precise. Note that this can be run after installing the binaries without issue. In fact installing a binary package first is the easiest way to make sure that the loaders are available.

First load the necessary libraries.

sudo apt-get -y install proj.dev libgdal.dev
sudo apt-get install libgdal.dev libproj.dev libxml2-dev libgeos-dev 
sudo apt-get install libarmadillo-dev libpoppler-dev libepsilon-dev libexpat-dev liblzma-dev

Now bring down the latest source code, uncompress it, move into the directory and build. The default configuration includes raster functions.

wget http://postgis.net/stuff/postgis-2.2.0dev.tar.gz
tar xvzf postgis-2.2.0*tar.gz
cd postgis*dev
./configure
make
sudo make install

That should all compile and install using only the default configuration settings.

To build the latest PLR from source. This is probably not necessary if you downloaded the binary.

wget https://github.com/jconway/plr/archive/master.zip
unzip master.zip
 cd plr-master
 USE_PGXS=1 make
 sudo USE_PGXS=1 make install

The shapefile and raster loaders may also need to be installed if you do not already have them as a result of installing binary.

 cd postgis*dev*/loader
 sudo make install
 cd ..
 cd ..
 cd postgis*dev*/raster/loader
 sudo make install

Downloading example data

The following file contains all the data needed to run the examples after setting up PostGIS. https://dl.dropboxusercontent.com/u/2703650/Courses/geostats/geostats.zip

wget https://dl.dropboxusercontent.com/u/2703650/Courses/geostats/geostats.zip
unzip geostats.zip

Using PostGIS from R

Loading packages

Two packages are used to link R with PostGIS using the method demonstrated here. RODBC provides Open Data Base Connectivity. This package can be used to connect R to any database, including MySQL and MSAccess. In effect it allows R to “speak” SQL, providing an appropriate data base connection is established.

The other key package is rgdal. This is used to provide access to a wide range of spatial drivers, including PostGIS. Rgdal is needed in order to import data in shapefile or other formats. It is also useful for importing and exporting spatial objects to PostGIS as will be demonstrated using the fairly simple examples.

setwd("/home/duncan/Dropbox/Public/Courses/geostats")
rm(list = ls())
library(RODBC)
library(rgdal)
## Loading required package: sp
## rgdal: version: 0.8-10, (SVN revision 478) Geospatial Data Abstraction
## Library extensions to R successfully loaded Loaded GDAL runtime: GDAL
## 1.9.2, released 2012/10/08 Path to GDAL shared files: /usr/share/gdal/1.9
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480] Path to
## PROJ.4 shared files: (autodetected)

Creating a new data base

A Linux specific trick that can come in very handy is the ability to use system commands without dropping out of R. This can be useful for loading data into PostGIS.

For example, we can create a fresh PostGIS database to hold the example data directly from R.

system("dropdb geostats")
system("createdb geostats")

This is the equivalent of running the following command in a terminal.

createdb geostats

Under Windows it would be preferable to create a new PostGIS database using pgadmin. This is not shown here, but it is quite straight forward. A windows ODBC driver for Postgresql is also available and easy to install.

We now have a blank Postgresql database. However it is not yet a PostGIS data base.

Establishing an ODBC connection

So, let's set up a connection to this database and send some commands to it from R. We can use the first commands to create the PostGIS extension.

To do this in Ubuntu we need to edit the /etc/odbc.ini file.

sudo gedit /etc/odbc.ini

If your example database is called “geostats” just paste the following text into this file and save it.

[geostats]
Driver = /usr/lib/i386-linux-gnu/odbc/psqlodbcw.so
Database = geostats
Servername = localhost
Username = postgres
Password = postgres
Protocol = 8.2.5
ReadOnly = 0

Now we can establish a connection to the database using the named dns.

con <- odbcConnect("geostats")

Queries that do not return data can be run from R using odbcQuery.

odbcQuery(con, "CREATE EXTENSION POSTGIS")
## [1] 1
odbcQuery(con, "CREATE EXTENSION POSTGIS_topology")
## [1] 1
odbcQuery(con, "CREATE EXTENSION PLR")
## [1] 1
sqlQuery(con, "SELECT PostGIS_full_version();")
##                                                                                                                             postgis_full_version
## 1 POSTGIS="2.2.0dev r11961" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" LIBXML="2.8.0" RASTER

If the queries return “1” they have been run successfully. Failed queries return -1. There is no way of getting diagnistic information in R. It is therefore a good idea to test more complex queries first using PgAdmin.

Loading vector data into PostGIS from R

Now we can read in some example data. In my case I unzipped it to /home/duncan/geostats. If this is set as the working directory the uncommented line will work. Otherwise write your full path. I am afraid that you will have to rewrite the lines with paths several times in order to run the code, as absolute paths are needed for many operations so I have written in my own path rather than use many paste operations that might have appeared confusing.

# states<-readOGR('/home/duncan/geostats/shapefiles','mex_states')
states <- readOGR("shapefiles", "states")
## OGR data source with driver: ESRI Shapefile 
## Source: "shapefiles", layer: "states"
## with 51 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
plot(states)
box()
axis(1)
axis(2)
grid()

plot of chunk unnamed-chunk-5

Now write the shapefile to PostGIS

writeOGR(states, "PG:dbname=geostats", layer_options = "geometry_name=geom", 
    "states", "PostgreSQL")

This will automatically set up a spatial index, which is usually desirable for processing speed. We will look at more explicit means of loading data later.

Visualising results in QGIS

Quantum GIS was designed as a visualisation tool for PostGIS. The QGIS browser feature and the database manager make it very easy to move layers from PostGIS onto the canvas. First establish a connection with the database.

alt text

The following figure was produced by simply dragging the layer onto the canvas and dropping it on top of a Google image loaded using the Open Layers plugin.

alt text

Loading point data

Let's now read some point data from a text file.

# towns<-read.csv('/home/duncan/geostats/textfiles/MexTowns.csv')
towns <- read.csv("textfiles/MexTowns.csv")
head(towns)
##     gid                       placename    pop      x     y
## 1 30669                   Ciudad Madero 197216 -97.83 22.28
## 2 30856                        González  11212 -98.43 22.83
## 3 30908 Estación Manuel (Úrsulo Galván)  12077 -98.32 22.73
## 4 31450       Ciudad Gustavo Díaz Ordaz  11523 -98.60 26.23
## 5 32155                    Ciudad Mante  84787 -98.97 22.74
## 6 32410               Heroica Matamoros 449815 -97.50 25.88

To turn this into a spatial object we just need to tell R which columns hold the coordinates and set the CRS. In this case it is the same as the shape file that we loaded (EPSG:4326)

coordinates(towns) <- ~x + y
proj4string(towns) <- proj4string(states)
writeOGR(towns, "PG:dbname=geostats", layer_options = "geometry_name=geom", 
    "towns", "PostgreSQL")

Some more data on collections of Mexican Oak species.

# oaks<-read.csv('/home/duncan/geostats/textfiles/MexOaks.csv')
oaks <- read.csv("textfiles/MexOaks.csv")
head(oaks)
##      gid   genus     species       x     y
## 1 774173 Quercus salicifolia  -99.90 16.87
## 2 774174 Quercus salicifolia  -99.88 16.87
## 3 774175 Quercus salicifolia -102.00 23.00
## 4 774177 Quercus ocoteifolia  -95.15 18.58
## 5 774178 Quercus salicifolia -102.00 23.00
## 6 774179 Quercus salicifolia  -98.20 20.22
coordinates(oaks) <- ~x + y
proj4string(oaks) <- proj4string(states)

Now send the points to our database. However, the gdal driver runs very slowly when used directly from within R when there are many data records, as is the case for the oaks collection points. So this time we will try running the shapefile loader from the command line.


writeOGR(oaks, "shapefiles", "oaks", "ESRI Shapefile")
## Error: layer exists, use a new layer name

# com<-'shp2pgsql -s 4326 -I /home/duncan/geostats/shapefiles/oaks.shp
# oaks| psql -d geostats;'
com <- "shp2pgsql -s 4326 -I shapefiles/oaks.shp oaks| psql -d geostats;"
system(com)

This line uses the shp2pgsql command. This is well documented and has many possible options including transformation from one CRS to another.

Using SQL

Suppose we want to extract all the oak species in the State of Chiapas. This is easy to achieve using a spatial overlay in R.


oaks$state <- over(oaks, states)$admin_name
chisoaks <- subset(oaks, state == "Chiapas")
chis <- subset(states, states@data$admin_name == "Chiapas")
plot(chis)
points(chisoaks, pch = 21, bg = 3, cex = 0.4)
box()
axis(1)
axis(2)
grid()

plot of chunk unnamed-chunk-11

The same result can be achieved in PostGIS with a spatial query.

query <- "select genus,species,st_x(o.geom) x,st_y(o.geom) y from oaks o, states s where st_intersects(o.geom,s.geom) and admin_name like 'Chiapas'"
chisoaks <- sqlQuery(con, query)
coordinates(chisoaks) <- ~x + y
plot(chis)
points(chisoaks, pch = 21, bg = 2, cex = 0.4)
box()
axis(1)
axis(2)
grid()