Lidar point clouds and their derived products have become an important source of detailed spatial information for environmental research. The data contained in a lidar point cloud can be analysed in many different ways and lend themselves to detailed statistical modelling. Processing point clouds directly can be challenging because of the size of the files. Very large data sets covering regions, rather than sites, are best processed on more powerful platforms than a laptop computer. However, smaller areas can easily be analysed using open source or free tools.
The Environment Agency in the UK has made available a large number of files containing both processed and unprocessed lidar data at this site:
http://environment.data.gov.uk/ds/survey/index.jsp#/survey?grid=SU20
The pre-processed data consist of digital surface models and digital terrain models at a range of resolutions (more on what these files consist of later). They can be used to avoid having to process large amounts of data yourself, but it is important to understand how they have been produced. Raw point clouds are the starting point in any new analysis based on fresh data from a lidar flight. We will start off working with a single tile of raw data in order to illustrate the process. The file we will use is taken from a well-wooded area in the New Forest.
SU2408_P_8044_20121204_20121205.laz
Raw lidar data is fundamentally quite simple in structure. It consists of a cloud of points with x, y and z coordinates. Asking for information on the file (which has been compressed to laz format from the original las) provides this output:
```
LASzip compression (version 2.4r1 c2 50000): POINT10 2 GPSTIME11 2
reporting minimum and maximum for all LAS point record entries
```
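For reference, a report like this can be produced with the lasinfo tool that ships with LAStools; a minimal sketch, run from R and assuming the LAStools binaries are on the system path:

```r
# A minimal sketch, assuming the LAStools binaries are on the system path.
# lasinfo reports the header fields and value ranges of a las/laz file.
system2("lasinfo", args = "SU2408_P_8044_20121204_20121205.laz")
```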
There are some components in addition to x, y and z that can be used in processing. Two important elements are the intensity of each return and the order in which returns arrive at the sensor. First returns represent the top of a forest canopy, while last returns have penetrated the vegetation and thus usually come from the ground. High intensity returns come from harder surfaces than low intensity returns. The combination of return number and intensity allows the points to be classified by running an algorithm over the data.
The points in this file have already been classified, and we can make use of this when processing the data.
WebGL is now integrated into most web browsers, and there are some nice online tools that can open and view a raw point cloud, for example http://plas.io/
Loading the point cloud produces this visualisation after playing with the intensity slider.
[Figure: the point cloud rendered in plas.io]
This is very nice to play around with and allows visual interpretation of the landscape. However, no quantitative information can be obtained directly. This is where surface models come in. These are raster files based on smoothed sets of returns in two dimensions, with cell values derived either from the z values or from some other element such as intensity.
It is possible to convert the point cloud into text format and then open the data directly in R for some bespoke processing. However, there are several problems with this approach. Lidar files are very large and typically consist of several million points, so there is a “big data” issue. Efficient algorithms for producing surfaces have already been developed by specialists in the field, so attempting to replicate such techniques is unnecessary. There may be circumstances in which additional information needs to be extracted by bespoke processing of the raw point cloud, but this would normally apply to a small, well-defined area of interest such as a measured forest research plot (a sketch of this text-based route is shown after the link below). For more general use, LAStools provides a set of executable files (developed for Windows) that are free to use for academic purposes.
http://www.cs.unc.edu/~isenburg/lastools/
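For the small-plot case, reading a text export into R is straightforward. A minimal sketch, assuming the tile has first been converted with las2txt using `-parse xyzirc`; the output file name is hypothetical:

```r
# A minimal sketch, assuming las2txt was run with "-parse xyzirc" so that
# each row holds x, y, z, intensity, return number and classification.
# The file name is hypothetical.
pts <- read.table("data/SU2408.txt",
                  col.names = c("x", "y", "z", "intensity", "return_no", "class"))
ground <- subset(pts, class == 2)  # class 2 is ground in the LAS specification
summary(ground$z)
```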
LAStools can be integrated into the QGIS toolbox by downloading the tools and then setting the path to the unzipped files from the Processing > Options menu in QGIS. When running on Linux, Wine has to be used and the path to the Wine binaries set. LAStools can also be used from the ArcGIS toolbox.
Once LAStools is set up and working in QGIS, all the conventional data processing is quite simple using a point-and-click GUI. If the data have not been classified, this can first be done using lasclassify. The blast2dem option is particularly useful when producing surface models (digital elevation models) from large point clouds. Note that there are also professional production versions of the tools that can batch process large numbers of files at once. A licence is required for commercial use, and in unlicensed use LAStools introduces some slight errors into the output. LAStools is therefore neither completely free nor open source (truly free, open source processing is possible, but it is much less user friendly).
A digital terrain model can be produced by selecting only points classified as ground. The resolution of the resulting raster can also be chosen; in this case we will keep it at 1 metre. Save the file with a name indicating that it is a DTM.
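For reference, the equivalent command-line call looks roughly like this; a sketch, run from R and assuming the LAStools binaries are on the path (class 2 is ground in the LAS specification):

```r
# A sketch of the equivalent blast2dem call, run from R.
# "-keep_class 2" keeps only ground returns; "-step 1" sets a 1 m cell size.
system2("blast2dem",
        args = c("-i", "SU2408_P_8044_20121204_20121205.laz",
                 "-keep_class", "2", "-step", "1",
                 "-o", "dtm.tif"))
```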
A hillshaded version can also be made directly with LAStools (or derived from the DTM using any of the options in QGIS, including GRASS).
Notice that because lidar penetrates the canopy, some features of micro-topography that are not visible in other ways may be seen. This can of course be especially useful in archaeological applications.
The digital surface model can be produced just as easily by selecting only the points classified as vegetation.
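The command-line version differs only in the classes kept; a sketch, assuming classes 3, 4 and 5 (low, medium and high vegetation in the LAS specification):

```r
# The same sketch for the vegetation surface model: keep classes 3, 4 and 5.
system2("blast2dem",
        args = c("-i", "SU2408_P_8044_20121204_20121205.laz",
                 "-keep_class", "3", "4", "5", "-step", "1",
                 "-o", "dsm_veg.tif"))
```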
Once a digital terrain model and a digital surface model have been produced, it is very easy to calculate the height of the canopy by subtracting one layer from the other. In QGIS this can be done using the raster calculator with an expression like `"dsm_veg@1" - "dtm@1"`.
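The same subtraction is equally simple in R with the raster package; a sketch, assuming the file names used for the two surface models above:

```r
library(raster)
# File names assume the DTM and vegetation DSM produced above.
dtm <- raster("data/dtm.tif")
dsm <- raster("data/dsm_veg.tif")
chm <- dsm - dtm  # canopy height = surface height minus terrain height
writeRaster(chm, "data/chm.tif", overwrite = TRUE)
```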
This layer can be called chm (canopy height model) and added to the QGIS canvas. When adding layers, make sure that the CRS is always set to British National Grid (EPSG:27700). This may not happen automatically: QGIS defaults to EPSG:4326, which will result in layers apparently vanishing. Care has to be taken to ensure all layers have the correct CRS.
[Figure: the canopy height model displayed in QGIS]
So far so good. The LAStools plugin is extremely easy to use and runs quite quickly over a single file. It has generated two of the conventional products derived from lidar data. There are many other things that could be tried, however, particularly if the purpose of the exercise is to produce a vegetation classification. Intensity could be used to produce an additional surface. Texture can be extracted from the canopy height model or from other layers. Lidar is a very information-rich data source and is especially useful when combined with multispectral imagery.
Web-based imagery such as Google Maps or Bing can be added to the QGIS project using the OpenLayers plugin, and the coloured results can be exported as a WebGL 3D model using the Qgis2threejs plugin.
Click on the link below to play with the model shown in the image.
Not only does lidar provide very useful quantitative measures of canopy height, it is typically produced at a much finer resolution than commonly used sources of remotely sensed multispectral imagery such as Landsat. However, in order to use this level of resolution effectively, the data have to be aggregated in some way.
Producing statistics on the whole raster layer is very simple in R using the raster package.
```r
library(raster)
library(rgdal)
# Load the canopy height model produced earlier
chm <- raster("data/chm.tif")
hist(chm)
summary(chm)
```

```
##                chm
## Min.    -0.2799988
## 1st Qu.  2.4100037
## Median   8.5999985
## 3rd Qu. 15.2500000
## Max.    47.6900024
## NA's     0.0000000
```
Note that there is a peak of low values in the histogram that represents open areas. It is easy to calculate the area falling into height classes over the whole tile using R, as the sketch below shows.
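A minimal sketch, using arbitrary class breaks: `cut` reclassifies the raster and `freq` counts the cells in each class (at 1 m resolution each cell is 1 m², so the counts are areas in square metres).

```r
# A minimal sketch: tabulate the area falling into arbitrary height classes.
# At 1 m resolution each cell is one square metre.
classes <- cut(chm, breaks = c(-Inf, 2, 10, 20, 30, Inf))
freq(classes)
```

However, we are not usually interested in the properties of a single lidar tile as such, as its boundaries are fairly arbitrary. We usually want to extract information about a specific area. If we have a defined polygon, such as a forest stand, we can do this easily through an overlay.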
```r
poly <- readOGR("data", "stand")
```

```
## OGR data source with driver: ESRI Shapefile
## Source: "data", layer: "stand"
## with 1 features
## It has 1 fields
```
```r
# Overlay: extract all the chm pixels that fall within the stand polygon
stand_pixels <- unlist(extract(chm, poly))
hist(stand_pixels)
summary(stand_pixels)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   -0.29    6.23   15.62   15.63   24.15   51.54
```
Obtaining statistics from many polygons at once is simple enough in R, since the result of running an overlay with the extract function is a list: all that is needed is to lapply a function over that list. Adding aggregated statistics to the attributes of a shapefile is in fact a surprisingly tricky operation using standard tools in QGIS. However, it is easy to add an R script to the toolbox to do just this. QGIS has a special syntax for R scripts that enables layers from the canvas to be chosen by the user. The following script calculates the minimum, mean, median and the 10% and 90% percentiles of the raster values for each feature in a vector layer.
```r
##Raster processing=group
##Raster=raster
##Vector=vector
##Output=output vector
library(raster)
library(rgdal)
r <- Raster
grat <- Vector
# Overlay: one vector of pixel values per polygon
a <- extract(r, grat)
f <- function(x) min(x, na.rm = TRUE)
grat@data$min <- unlist(lapply(a, f))
f <- function(x) quantile(x, 0.1, na.rm = TRUE)
grat@data$q_10 <- unlist(lapply(a, f))
f <- function(x) mean(x, na.rm = TRUE)
grat@data$mean <- unlist(lapply(a, f))
f <- function(x) median(x, na.rm = TRUE)
grat@data$median <- unlist(lapply(a, f))
f <- function(x) quantile(x, 0.9, na.rm = TRUE)
grat@data$q_90 <- unlist(lapply(a, f))
Output = grat
```
If the script above is saved with the name “extract_raster_values_to_vector” in the R scripts directory, the interface shown below will appear in the toolbox. An example would be to calculate these statistics for all the elements of a 25 m x 25 m grid placed over the area. This is just smaller than a Landsat pixel, so the result could be used to refine the classification of such a satellite image. Because of random variability at very fine scales, information aggregated to a coarser grid can be more generally useful.
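Such a grid can also be built directly in R; a sketch, assuming the extent and CRS of the chm raster define the area of interest:

```r
# A sketch of building a 25 m polygon grid in R, assuming the chm raster
# defines the extent and CRS of the area of interest.
template <- raster(ext = extent(chm), res = 25, crs = projection(chm))
values(template) <- 1:ncell(template)  # dummy values so every cell becomes a polygon
grid25 <- rasterToPolygons(template)
```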
This takes a few minutes to run, but leads to a very detailed breakdown of forest height distribution across the area in 1,600 individual small square polygons. The script can easily be adapted to calculate different statistics.
The results can be investigated as a Leaflet web map, exported using the qgis2web plugin. Click on the link below to try it.
https://dgolicher.bitbucket.io/new_forest_ol