For the R code, .rmd, datasets, and html files behind this article visit my GitHub account repository.

1 Abstract

Foresters may need to know distribution of different vegetation life-forms and/or area of bare land at a given forest stand or site. Such information could help in forest classification and/or management like planting of more trees or herbivores population control among others. Remote sensing technologies have aided such assessments a great deal. In this short article I used digital surface model (DSM) and digital terrain model (DTM), captured by light detection and ranging (LIDAR) platform, to demonstrate vegetation life-form and bare land extent assessment at NEON Harvard site.

2 Introduction

I used two raster layers from NEON, which I accessed from Biodiversity Data Science YouTube channel. The layers are digital surface model (DSM) which contains elevation of the top of physical points above the surface of the earth, basically top of trees in this case and digital terrain model (DTM) which is the elevation values at the surface of the earth. Now the difference between these two heights can tell us the heights of trees, and the units are in meters (m).

3 My Contribution

I was interested in showing the size of bare ground within the forest site. This information could help in budgeting for replanting or if replanting had been done, then it could help to know where seedlings died. I was also interested in knowing distribution of different vegetation life-forms within the site. That I based on the differences in heights of vegetation.

4 Loading necessary R libraries

I used raster and tidyverse libraries. One can use install.packages('') to have them installed in their R. The versions of all my tools were:

  1. R = 4.0.3 “Bunny-Wunnies Freak Out”, run the word version to know yours
  2. RStudio = 1.3.1093 (Go to Help ==> About RStudio)
  3. raster = 3.4.5 Run packageVersion(“raster”)
  4. tidyverse = 1.3.0 Run packageVersion(“tidyverse”)

5 Loading dsm and dtm datasets into R

I used the loaded raster() package to load the dsm and dtm layers as follows:

A simple plot of dsm layer indicated success in loading:

I then used compareRaster() to check whether the dsm and dtm match in extent, crs, and spatial resolution. This was important step for subsequent analyses.

## [1] TRUE

This returned TRUE which indicated that I was good to go on, otherwise it would return an error with information on what did not match.

I then ran dsm_harvard object to know more about the metadata including object class, dimensions, resolution, extent, coordinate reference system (crs), source, name, and range of the values in the pixels.

dsm_harvard
## class      : RasterLayer 
## dimensions : 1367, 1697, 2319799  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 731453, 733150, 4712471, 4713838  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs 
## source     : D:/R/SPATIAL/spatialwithR/NEON-airborne/HARV_dsmCrop.tif 
## names      : HARV_dsmCrop 
## values     : 305.07, 416.07  (min, max)

It had 1367 rows and 1697 columns giving a total of 2.31979910^{6} cells. Since the resolution was 1m by 1m, I could get the area of the site in hectares by dividing the number of cells by 10,000 thus 231.9799 ha.

6 Preparing data for plotting

I used tidyverse package. In preparing the data, I started by doing simple operation which involved subtracting dtm from dsm in order to get canopy layer. That is done cell by cell hence I ended up with a layer with similar properties as my two ‘mother’ layers, of course with different cell values. I could then make a simple plot of the canopy layer.

A look at the plot revealed that I had legend values ranging from 0 to just above 35. These were heights of ‘trees’. Am putting trees in quotation marks because the projections above the surface could be any other thing apart from trees (including the pylon of NEON sensor platform). Visual inspection of the site using Google Earth Pro and coordinates provided at the site link showed me that it was a forested site.

Now I need to have the canopy layer, which is currently a raster, as a dataframe so that I can play around with it in tidyverse. Here I use as.data.frame() to accomplish this.

canopy_height_harvard_df <- as.data.frame(canopy_height_harvard, xy = TRUE)
head(canopy_height_harvard_df)
##          x       y    layer
## 1 731453.5 4713838 19.37000
## 2 731454.5 4713838 18.67999
## 3 731455.5 4713838 17.05002
## 4 731456.5 4713838 17.17999
## 5 731457.5 4713838 19.73001
## 6 731458.5 4713838 20.22000

The dataframe stored canopy height values at each of the x and y coordinates in the layer. Hence my dataframe had 3 variables and 2319799 rows.

7 Estimating bare ground area

I assumed that where both dsm and dtm had same value was a bare ground ,that is, there was no object projecting above the ground to create a difference. So, I went ahead to find the number of cells in that category. That information could be important if there was need to know how many seedlings to budget for in reforestation/afforestation efforts at the site. Or how much area was affected by some catastrophic event(s) in the near past at the site.

canopy_height_harvard_df %>% 
  filter(layer == 0) %>% 
  count()
##       n
## 1 51135

The code returned 51135 number of cells which is equal to 5.1135 ha of bare ground. Now relevant authorities could go ahead and plan for enhancing vegetation cover in the bare ground area, if necessary. Further investigations was necessary to ascertain whether these are rock outcrops or water bodies where planting of vegetation might not be possible. To accomplish that, images with high spectral resolutions within visible and NIR regions of the electromagnetic spectrum could help classify these ‘bare grounds’.

8 Plotting vegetation classes

Relevant authorities may need to know a number of things about the area, such as:

  1. Where do we have bare ground, grass, shrubs, and tall trees?
  2. What is the growth rate of trees at specific replanted site(s)?
  3. Did planted trees at specific site pick up?
  4. Did introduced herbivores affect vegetation at specific site?
  5. How did pests, fire, drought, tornado affect trees at specific site(s)?

The questions are endless. However, to answer the first one, I made two assumptions:

  1. Different vegetation life-forms had different height classes.
  2. Quartiles of the canopy heights values could be used as proxy for such height classes.

I then generated vegetation life-forms classes for the site based on quartiles of their heights as Bare, Grass, Shrubs, and Trees. Then I used mutate to add the life-form classes to the dataframe using case_when within mutate.

summary(canopy_height_harvard_df)
##        x                y               layer      
##  Min.   :731454   Min.   :4712472   Min.   : 0.00  
##  1st Qu.:731878   1st Qu.:4712812   1st Qu.:10.64  
##  Median :732302   Median :4713154   Median :16.65  
##  Mean   :732302   Mean   :4713154   Mean   :14.96  
##  3rd Qu.:732726   3rd Qu.:4713496   3rd Qu.:20.39  
##  Max.   :733150   Max.   :4713838   Max.   :38.17  
##                                     NA's   :1
canopy_height_harvard_classed_df <- canopy_height_harvard_df %>% 
  mutate(life_forms = case_when(layer == 0 ~ 'Bare',
                             layer < 10.64 ~ 'Grass',
                             layer < 16.65 ~ 'Shrubs',
                             layer >= 16.65 ~ 'Trees',
                             TRUE ~ 'Trees'))
head(canopy_height_harvard_classed_df)
##          x       y    layer life_forms
## 1 731453.5 4713838 19.37000      Trees
## 2 731454.5 4713838 18.67999      Trees
## 3 731455.5 4713838 17.05002      Trees
## 4 731456.5 4713838 17.17999      Trees
## 5 731457.5 4713838 19.73001      Trees
## 6 731458.5 4713838 20.22000      Trees

Ideally, foresters at NEON or globally would obviously suggest different values to assign to each life-form category. I bet they only agree with me in the Bare class.

To respond to the questions I raised earlier:

  1. Bare grounds were predominant in north eastern end of the site. Some parts at the center and towards north had grassy lands. Shrubs and trees were, however, distributed rather throughout the site, with more trees than shrubs.
  2. If I assumed dtm was initial dsm of the site, then I could divide this canopy layer by period of time it took between the two layers to know growth rate of the trees. For example, I could divide it by 5 if it took five years and I would have growth rate layer of the whole site. Important questions like why different growth rates could be witnessed across the site could then be presented to agronomists and foresters for further scrutiny.
  3. This third question was related to the second. If a patch which was bare and then planted with some woody species still remained bare in the subsequent lidar captures then it shall be evident and appropriate actions would be taken.
  4. This was for rangers and ranchers. If a section of the site was allocated to large herbivores like elephants, then it could be deduced how vegetation heights changed at that particular site.
  5. All these biotic and abiotic factors affecting vegetation growth could be shown especially if they altered growth of the vegetation. Some management practices like humus application could also positively affect vegetation growth. Again multiband images could give more information if spectral signatures were distinct enough.

One may ask, how different vegetation classes were distributed in terms of coverage. A simple bar plot for counts of cells within each category provided the answer. The counts equaled area in square meters.

9 Conclusions

From dsm and dtm data we can get a plethora of information about a place. I have just scratched the top two powerful packages; raster and tidyverse in R. Many more can be done especially if these were multi-spectral bands.

Note: Only convert raster to dataframe if it has manageable number of cells, otherwise you might end up with hundreds of millions of rows. For example, I am currently working on a site with four Sentinel layers each with about 394 million cells; I can’t think of having that as a dataframe in my machine even though R can handle up to 2.147483610^{9} rows. My classification of vegetation life-forms using quartiles is very far from reality. Anyway, mine was to present some basic idea. Once the classes are clear, the same concept can be applied by simple modification. One can even think of rate of change of heights of storey buildings in a city. Sinking of some land or rising of a volcanic mountain are other areas where the idea could be useful. Warm regards! MeRRy ChRistmas %>% RospeRous New YeaR!