Introduction

In order to keep the amount of R code in the MoRph shiny interface to the minimum the interface will source in a series of functions that will allow most actions to be performed with one line of code.

The strategy behind building the functions is to keep the number of arguments to each one to a minimum. They are therefore somewhat “hard wired” to the current application. It is assumed that anyone wishing to adapt the application will do so through modifying the functions, rather than making many changes to the function calls. This document therefore documents and explains each function with some suggestions where necessary regarding the changes that could be necessary.

Setting up Docker containers and other preliminaries

The application runs using two linked docker containers that can be pulled from my repository. The RStudio container has been setup with gdal-dev, libproj-dev and all the commands associated with postgis (raster2pgsql etc). It also has all the R packages needed pre-installed.

The postgis is based on kartoza/postgis docker to which plr has been added. The two containers are linked, so instead of using localhost as the hostname within R studio use docker.

## Setup the folders for the data
mkdir -p /home/duncan/postgres_data
chmod 777 -R /home/duncan/postgres_data
mkdir -p /home/duncan/rstudio_server
chmod 777 -R /home/duncan/rstudio_server
######

The permissions have been set to 777 which allows all users to read, write and execute scripts in them. This is not secure, but this is not important in this context as only the application port will be visible externally. For development purposes it is important not to have to bother about unwanted security issues that prevent commands being run and files saved.

docker run --name "postgis-plr" -p 25432:5432 -d -v /home/duncan/postgres_data:/var/lib/postgresql dgolicher/postgis-plr

docker run --name "rstudio" --link postgis-plr:postgis -d -p 8788:8787 -v /home/duncan/rstudio_server:/home/rstudio dgolicher/rstudio

PostGIS database functions.

All functions that work on the database begin with the suffix Pg. The naming convention thereafter is to use two capitalised. words for each function.

Creating a new data base

The function assumes that a .pgpass file is present in the home directory that allows the rstudio (or another) user to work on the database using the default docker superuser with password docker. If this is changed for security reasons at some point the .pgpass file would have to be altered to refelct this

.pgpass

postgis:*:*:docker:docker

Save this with the name .pgpass to the home directory.

New databases can be added directly from the command line, but a simple R function can be used. The function drops the database if it is not in use and then creates it. So use with care if the database already exists! It only needs to be called once!

PgMakeDb<-function(dbname="brant"){
  com<-paste("dropdb -h postgis -U docker ",dbname,sep="")
  system(com)
  com<-paste("createdb -h postgis -U docker ",dbname,sep="")
  system(com)
}

Allowing connections to the database using RODBC

Every database being used needs an entry in the odbc.ini file that is placed in /etc/odbc.ini. As this is only directly editable with root priviledges the best strategy is to edit an odbc.ini file in the home directory and then copy it into place by opening a shell.

odbc.ini example

Make sure that the connection name (in this case brant) matches the database name, as this convention will be used in the subsequent functions.

[brant]
Driver = /usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so
Database = brant
Servername = postgis
Username = docker
Password = docker
Protocol = 8.2.5
ReadOnly = 0

Then open a shell and run

sudo cp odbc.ini /etc/odbc.ini

A new entry that follows this format should be added to the odbc.ini file ever time a new data base is created. It is envisaged that a new database would be used for each model, with all tables being placed in the public schema. In some cases it may be useful to use more schemas within the database for separate sites, but this may not be necessary. The concept is to allow storage and backup of all the relevant information by dumping the database to a single file.

Adding extensions to the database

PgInit<-function(dbname="brant"){
  require(RODBC)
  con<-odbcConnect(dbname) ## Note the use of the connection name. IT MUST MATCH THE DATABASE!
  odbcQuery(con,"create extension postgis")
  odbcQuery(con,"create extension plr")
}

Adding PLR statistical functions to the database

Many useful statistical functions from R can be added as functions to the database. Again this can be done directly through R by using the open connection. The general format of all the functions involves coercing the arguments to numeric(to be on the safe side if characters are passed) and calculating stats after removing NAs. A float is returned. It is easy to add more to this function if required.

PgPlr<-function(dbname="brant"){
require(RODBC)
con<-odbcConnect(dbname)  

query<-"CREATE OR REPLACE FUNCTION median (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
median(x,na.rm=TRUE)'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION q10 (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
quantile(x,0.1,na.rm=TRUE)'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION q90 (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
quantile(x,0.9,na.rm=TRUE)'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION q75 (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
quantile(x,0.75,na.rm=TRUE)'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION q25 (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
quantile(x,0.25,na.rm=TRUE)'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION minimum (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
min(x,na.rm=TRUE)'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION maximum (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
max(x,na.rm=TRUE)'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION mean (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
mean(x,na.rm=TRUE)'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION sd (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
sd(x,na.rm=TRUE)'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION se (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
sd(x,na.rm=TRUE)/sqrt(length(x))'
LANGUAGE 'plr' STRICT;

CREATE OR REPLACE FUNCTION length (float[]) RETURNS float AS '
x<-arg1
x<-as.numeric(as.character(x))
x<-na.omit(x)
length(x)'
LANGUAGE 'plr' STRICT;

"
odbcQuery(con,query)
}

Loading raster layers

Raster layers uses a few more arguments. The layers are loaded in database (as referenced rasters can’t be transfered). The tiles are usually square, but the x and y width can be set. Arguments are thus

  1. flnm: name of file
  2. x: Number of pixels in tile x dimension
  3. y: Number of pixels in tile y dimension
  4. tabnm: Name of table to hold data
  5. db: Database name
  6. srid: If this is 0 the srid will be taken from the file if it is included. It is usually safer to set this if known.
  7. path: Path to file with trailing /
PgLoadRaster<-function(flnm="cold_bay_3857_clip.tiff",x=10,y=10,tabnm="dem",db="brant",srid=3857,path="/home/rstudio/shiny_morph/"){
flnm<-paste(path,flnm,sep="")  
command <- paste("raster2pgsql -s ",srid, "-I -d  -M  ",flnm, " -F -t ",x,"x",y," ",tabnm,"|psql -h postgis -U docker -d ",db,sep="")
system(command)
}

Setting up graticule from dem

In the MoRph application patches will be either square or rectangular polygons that are derived from vectorising the raster dem that is first added to the data base. Statistics are calulated from the pixel values of the dem and held as attributes. The function can drop graticules where the minimum value is below a certain level and those with a maximum above a certain level, as this may be useful along coastlines.

PgMakeGrat<-function(dem="dem",minht=-10,maxht=10,db="brant")
{
  require(RODBC)
  con<-odbcConnect(db)
  
  query<-paste("
drop table if exists grat;
create table grat as
select s.* from
(select rid, st_envelope(rast) geom,
minimum((st_dumpvalues(rast)).valarray) min,
q10((st_dumpvalues(rast)).valarray) q10,
q25((st_dumpvalues(rast)).valarray) q25,
median((st_dumpvalues(rast)).valarray) median,
mean((st_dumpvalues(rast)).valarray) mean,
q75((st_dumpvalues(rast)).valarray) q75,
q90((st_dumpvalues(rast)).valarray) q90,
maximum((st_dumpvalues(rast)).valarray) max
from ",dem,") s
where min>",minht," and max < ",maxht," and min <1000000000000;
CREATE INDEX grat_gix ON grat USING GIST (geom);",sep="")
odbcQuery(con,query)  
}

Getting vector layer from the data base

PgGetQuery <- function(query="select * from grat",db="brant") {
    require(RODBC)
    require(rgdal)
    con<-odbcConnect(db)
    query <- paste("create view temp_view as ", query, sep = "")
    odbcQuery(con, query)
    dsn<-paste("PG:dbname='",db,"' host='postgis' port=5432 user= 'docker'",sep="")
    result <- readOGR(dsn, "temp_view")
    odbcQuery(con, "drop view temp_view")
    return(result)
}

Adding mean and median from resource layer

This query works by overlaying the graticule onto any raster layer that has been first uploaded into the data base using PgLoadRaster. A temporary table is formed, then renamed grat and re-indexed. This is a more robust method than adding columns to grat.

PgAddResource<-function(db="brant",resource="dem")
{
  require(RODBC)
  con<-odbcConnect(db)
  query<-sprintf("create table tmp as
select g.*,
med median_%s,
mn mean_%s
from 
grat g,
(select 
t2.rid,
median((st_dumpvalues(st_union(st_clip(rast,geom)))).valarray) med,
mean((st_dumpvalues(st_union(st_clip(rast,geom)))).valarray) mn
from %s t,
(select * from grat) t2
where st_intersects(rast,geom)
group by t2.rid) s
where s.rid=g.rid;
drop table grat;
ALTER TABLE tmp RENAME TO grat;
CREATE INDEX grat_gix ON grat USING GIST (geom);",resource,resource,resource)
   odbcQuery(con,query)
}

Putting them all together as a test

PgMakeDb("brant")
PgInit("brant")
## [1] -1
PgPlr("brant")
## [1] 1
PgLoadRaster()
PgMakeGrat()
## [1] 1
PgAddResource()
## [1] 1
grat<-PgGetQuery()
## Warning: closing unused RODBC handle 4
## Warning: closing unused RODBC handle 3
## Warning: closing unused RODBC handle 2
## Warning: closing unused RODBC handle 1
## OGR data source with driver: PostgreSQL 
## Source: "PG:dbname='brant' host='postgis' port=5432 user= 'docker'", layer: "temp_view"
## with 1125 features
## It has 11 fields
plot(grat)

head(grat@data)
##    rid        min         q10         q25      median        mean
## 0 2848 -0.0401074 -0.03975181 -0.03000000 -0.02000000 -0.02274750
## 1 2918 -1.1139420 -1.07387925 -0.95813534 -0.80926621 -0.81109634
## 2 2409 -4.3227954 -3.52846038 -2.93645614 -2.57106483 -2.71609255
## 3 2847 -0.1005960 -0.09000250 -0.08000013 -0.06493575 -0.06675523
## 4 3871 -5.9273510 -5.24710598 -4.30030525 -3.18236303 -3.59496312
## 5 4323 -0.3974157 -0.25016491 -0.17481728 -0.06831670  0.30348008
##           q75         q90        max median_dem    mean_dem
## 0 -0.01684577 -0.01000000  0.0000000 -0.0200000 -0.02055879
## 1 -0.63609152 -0.54897236 -0.4665218 -0.7601299 -0.77705630
## 2 -2.34855115 -2.18948960 -2.0378635 -2.6004219 -2.86364111
## 3 -0.05000000 -0.04609394 -0.0399999 -0.0600000 -0.06314131
## 4 -2.74987471 -2.46747975 -1.8255453 -3.1519299 -3.51950732
## 5 -0.00794580  0.88634607  4.9517407 -0.0508208  0.53503773