  • 1 Need for speed
  • 2 An aside
  • 3 Need for speed
  • 4 Predictive Ecology blog posts about R speed
  • 5 “Vectorization”
  • 6 Vectors and Matrices
  • 7 Spatial simulation
  • 8 Key packages for spatial simulation
  • 9 Key packages for spatial simulation
  • 10 What we will do here
  • 11 SpaDES functions
  • 12 Working with spatial data
  • 13 raster
  • 14 sp
  • 15 The data.table package
  • 16 data.table tutorials
  • 17 raster and data.table together
  • 18 raster and data.table together
  • 19 The Rcpp package
  • 20 Profiling and Benchmarking (1)
  • 21 Profiling and Benchmarking (2)
  • 22 Profiling and Benchmarking (3)
  • 23 Profiling the spades call
  • 24 When to profile
  • 25 Strategies for profiling
  • 26 Next

1 Need for speed

R is widely seen as being ‘slow’ (see the Julia web page)

But if you use a few specific tools from various R packages, this becomes largely irrelevant

2 An aside

Pure R, when the most efficient vectorized code is used, appears to run at about half the speed of the most efficient C++.

See Hadley Wickham’s page on Rcpp (scroll down to “Vector input, vector output”), noting that if it took 10 minutes to write the C++ code, it would have to be run 150,000 times to make the rewrite worth it.

3 Need for speed

Spatial simulation means doing the same thing over and over and over … so we need speed

We will show how to profile your code at the end of this section.

5 “Vectorization”

  • This is at the core of making R fast. If you don’t do this, then it is probably not useful to use R as a simulation engine.
# Instead of 
a <- vector()
for (i in 1:1000) {
  a[i] <- rnorm(1)
}

# use vectorized version, which is built into the functions
a <- rnorm(1000)

6 Vectors and Matrices

  • These are as fast as you can get in R
  • Fast numerical operations
  • Faster than data.frame
  • Anything that is in pure vectors or matrices is ‘fast enough’
  • It is always a challenge to keep all code in vectors and matrices
  • Thus the following packages…
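
As a minimal sketch of what keeping a computation in vectors and matrices looks like, the example below applies the same update to every cell of a matrix twice: once with nested loops and once with a single vectorized expression. The matrix `landscape`, the 0.5 threshold, and the 10% growth factor are made up for illustration.

```r
set.seed(42)
# Hypothetical landscape: a 200 x 200 matrix of values between 0 and 1
landscape <- matrix(runif(200 * 200), nrow = 200, ncol = 200)

# Loop version: visit every cell and increase values above 0.5 by 10%
grownLoop <- landscape
for (i in seq_len(nrow(landscape))) {
  for (j in seq_len(ncol(landscape))) {
    if (landscape[i, j] > 0.5) {
      grownLoop[i, j] <- landscape[i, j] * 1.1
    }
  }
}

# Vectorized version: one logical index does the same work
grownVec <- landscape
above <- landscape > 0.5
grownVec[above] <- landscape[above] * 1.1

# Both approaches give the same matrix
all.equal(grownLoop, grownVec)
```

The vectorized form is shorter and avoids one R-level function call per cell, which is where most of the loop's time goes.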

7 Spatial simulation

  • Spatial simulation (i.e., simulation in time and space) requires more than just spatial data manipulation
  • Sometimes it is just base R stuff
  • Need to learn how to make functions (allows reusability)
  • Need to learn a few key packages that are critical for speed
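
To illustrate the point about functions, here is a small sketch in base R: one time step of a made-up process is written once as a function and then reused inside a loop over time. The function name `growStep`, its arguments, and the logistic growth rule are hypothetical and only serve the example.

```r
# One time step of a hypothetical logistic growth process, applied to
# every cell of a matrix at once (vectorized over space)
growStep <- function(biomass, rate = 0.2, capacity = 100) {
  biomass + rate * biomass * (1 - biomass / capacity)
}

set.seed(1)
biomass <- matrix(runif(25, min = 1, max = 10), nrow = 5)

# The loop is over time only; each step reuses the same function
for (year in 1:50) {
  biomass <- growStep(biomass)
}

range(biomass)  # all cells approach, but do not exceed, the capacity
```

Looping over time steps is usually unavoidable in a simulation; the speed comes from keeping the work inside each step vectorized.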

8 Key packages for spatial simulation

  • base package – everything matrix or vector is ‘fast’
  • raster - for spatially referenced matrices

    • not always fast enough, sometimes we copy the data into a matrix, then manipulate, then return the data to the raster object
  • sp - equivalent of vector shapefiles in a GIS

    • Polygons, Points, Lines
    • Not always fast, but essential to have
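
The raster-to-matrix round trip mentioned above can be sketched as follows. This assumes the raster package is installed; the raster dimensions, extent, and the thresholding step are invented for the example.

```r
library(raster)

# Hypothetical 10 x 10 raster with random values
set.seed(123)
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
r[] <- runif(ncell(r))

# 1. Copy the values into a plain matrix (same row/column layout as the raster)
m <- as.matrix(r)

# 2. Manipulate the matrix with ordinary vectorized code
m[m < 0.5] <- 0

# 3. Put the values back into a raster with the same extent and projection
r2 <- raster(m, template = r)
```

Whether the copy pays off depends on how much work is done in step 2; for a single cheap operation, working on the raster directly may be just as fast.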

9 Key packages for spatial simulation

  • data.table

    • For data.frame type data (i.e., columns of data)
    • Very fast when the object gets large, but can actually be slower than a data.frame when the table is small (<100,000 rows)
  • SpaDES – many functions; will be moved into a separate package soon

  • Rcpp

    • R interface to C++. When you need something fast, and you can’t get it fast enough with existing tools/packages, you can create your own (we will not go further into this here)

10 What we will do here

  • We will go through SpaDES functions quickly, because there are fewer tutorials online for these
  • We will show links to various tutorials for raster, sp, data.table, Rcpp
  • Each person should decide which tool is the most useful to them
  • Put something into practice

11 SpaDES functions

  • These are all potentially useful for building spatio-temporal models
?`spades-package` # section 2 shows many functions

# e.g.,
?spread
?move
?cir
?adj
?distanceFromEachPoint

12 Working with spatial data

13 raster

14 sp

15 The data.table package

From every data.table user ever:

WOW that’s fast!


install.packages('data.table')

(at least for large tables!)
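
For readers who have not seen data.table syntax, here is a small sketch of two operations that come up constantly in simulation code: a grouped summary and an update by reference. The table `pixels` and its columns are hypothetical.

```r
library(data.table)

# Hypothetical table: one row per pixel, with a cover type and a biomass value
set.seed(10)
pixels <- data.table(
  pixelId = 1:1000,
  cover   = sample(c("forest", "shrub", "grass"), 1000, replace = TRUE),
  biomass = runif(1000, min = 0, max = 100)
)

# Grouped summary: mean biomass and pixel count per cover type
summaryByCover <- pixels[, .(meanBiomass = mean(biomass), nPixels = .N), by = cover]

# Update by reference: add a column without copying the table
pixels[, biomassScaled := biomass / max(biomass)]
```

The `:=` operator modifies `pixels` in place, which is a large part of why data.table stays fast as tables grow.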

16 data.table tutorials

17 raster and data.table together

  • The current implementation of LANDIS-SpaDES uses a “reduced” data structure throughout

  • Instead of keeping rasters of everything (one can imagine that there is redundancy, i.e., 2 pixels next to each other may be identical)

  • We make one raster of “id” and one data.table with a column called “id”

  • Then we can have as many columns as we want of information about each of these places

  • Like “polygons”, but for rasters, and dynamic… can change over time

  • This may be useful for your own module
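
The idea of one “id” map plus one table of attributes can be sketched as below. This is not the LANDIS-SpaDES implementation: a plain matrix stands in for the “id” raster, and the ids, attribute columns, and values are all made up.

```r
library(data.table)

# Stand-in for the "id" raster: a matrix where each cell holds a group id.
# Many pixels share the same id, which is where the redundancy is removed.
set.seed(2)
idMap <- matrix(sample(1:4, size = 36, replace = TRUE), nrow = 6)

# One row per id, with as many attribute columns as needed (values are made up)
attrs <- data.table(
  id      = 1:4,
  age     = c(10, 45, 80, 120),
  biomass = c(5.2, 30.1, 55.7, 60.3)
)

# Attributes can change over time without touching the map
attrs[, age := age + 1]

# Expand one column back to a full map by looking up each cell's id
ageMap <- matrix(attrs$age[match(idMap, attrs$id)], nrow = nrow(idMap))
```

Here 36 cells are described by only 4 rows; on a real landscape the saving can be much larger, and each time step updates the small table rather than every raster.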

18 raster and data.table together

  • There is a key helper function:
?rasterizeReduced

What does this do?

19 The Rcpp package

From every Rcpp user ever:

WOW! Just wow.


install.packages('Rcpp')

20 Profiling and Benchmarking (1)

  • In general, the usual claim is to worry about ‘execution speed later’
  • This is not 100% true with R
  • If you use vectorization (no or few loops), and these packages listed here, then you will have a good start
  • AFTER that, then you can use 2 great tools:

    • profvis package (built into the latest RStudio previews, but not the official release version)
    • microbenchmark package

21 Profiling and Benchmarking (2)

microbenchmark::microbenchmark(
  loop = {
    a <- vector()
    for (i in 1:1000) a[i] <- rnorm(1)
  },
  vectorized = { a <- rnorm(1000) }
)
## Unit: microseconds
##        expr      min        lq       mean    median       uq       max
##        loop 3601.977 4047.6055 4638.40080 4169.4605 4366.394 34951.379
##  vectorized   70.972   72.5855   76.47407   73.3185   74.491   185.642
##  neval
##    100
##    100

22 Profiling and Benchmarking (3)

If you have RStudio version >=0.99.1208, then it has profiling as a menu item.

  • Alternatively, wrap any block of code with profvis

  • The block can be a spades() call, which profiles the entire model. A simple example:

profvis::profvis({a <- rnorm(10000000)})

23 Profiling the spades call

Try it:

library(SpaDES)

mySim <- simInit(
   times = list(start = 0.0, end = 2.0, timeunit = "year"),
   params = list(
     .globals = list(stackName = "landscape", burnStats = "nPixelsBurned")
   ),
   modules = list("randomLandscapes", "fireSpread", "caribouMovement"),
   paths = list(modulePath = system.file("sampleModules", package = "SpaDES"))
)
profvis::profvis({spades(mySim)})

24 When to profile

  • First, you should have started building your code with the packages we have discussed
  • If your code is already full of loops, it is too late for profiling alone to rescue it

If you have used these tools, then:

  • Profile when you have mostly finished whatever you are coding
  • Don’t ever start making code more efficient until you have profiled
  • It is almost impossible to tell which bits are the slow parts without profiling or benchmarking

25 Strategies for profiling

  • Can do an entire SpaDES model call
  • Can pinpoint specific functions
  • Can test alternative ways of implementing the same thing
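
As a sketch of the last strategy, the example below compares two ways of computing row sums using only base R. The helper names `viaApply` and `viaRowSums` are hypothetical; the same pattern works with microbenchmark in place of `system.time`.

```r
set.seed(99)
m <- matrix(runif(1e6), nrow = 1000)

# Two ways of computing the same row sums
viaApply   <- function(x) apply(x, 1, sum)
viaRowSums <- function(x) rowSums(x)

# First confirm the alternatives agree, then time each one
stopifnot(isTRUE(all.equal(viaApply(m), viaRowSums(m))))

timeApply   <- system.time(for (i in 1:20) viaApply(m))
timeRowSums <- system.time(for (i in 1:20) viaRowSums(m))

# Elapsed seconds for 20 repetitions of each alternative
c(apply = timeApply[["elapsed"]], rowSums = timeRowSums[["elapsed"]])
```

Checking that the alternatives return the same answer before timing them avoids optimizing a version that is fast but wrong.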