Task 2 for Summer 2024 OpenTopography Internship

Task Description

From Jira – The dataset, “Airborne Laser Mapping of Yosemite National Park, CA, 2007” has 4 polygons. The DTM is missing the SE polygon, while the DSM is missing the NW polygon. This causes CHM calculations to bomb when an AOI is selected in a polygon that does not have both rasters defined. Depending on the quality of the lidar data (and its classifications), I could create a DSM and DTM for the missing polygons and re-ingest to make this dataset complete.

So… a few things to think about here. First, we only have 4 classification classes:

  • Class 1: Unclassified
  • Class 2: Ground
  • Class 3: Low Vegetation
  • Class 4: Medium Vegetation

I am concerned about the unclassified water and noise, so this might take some quality checks after generating the DTM and DSM.

After pulling the available files into QGIS, I’ve identified the file names and will tabulate:

Point level Model type Cardinal direction File name Lidar folder
bare earth DTM SW fmdm_grd DM_MoraineDome
bare earth DTM NE fmdn_grd DN_Dana
bare earth DTM NW fmcn_grd CN_Conness
highest hit DSM SE ulyl_grd LY_Lyell
highest hit DSM SW umdm_grd DM_MoraineDome
highest hit DSM NE umdn_grd DN_Dana

Additional potentially concerning information: Some of the metadata/information on the dataset cites that the area total 11sqkm and a point density of 3.5 pts/sqm, while the metadata downloaded from OpenTopo says 66 sqkm and a point density of 1.59 pts/sqm. So… a little bit of inconsistency in metadata and information.

We are missing the following datasets –

Point level Model type Cardinal direction File name Lidar folder
bare earth DTM SE N/A LY_Lyell
highest hit DSM NW N/A CN_Conness

Let’s investigate this point cloud data and generate those DEMs!


Investigate + Visualize Point Clouds

First let’s plot out the (rectangular) extent of the point clouds for each of the four polygon sections of the dataset. I’ll create lidar catalogs for each individual section and plot.

North West CN_Conness

South West DM_MoraineDome

North East DN_Dana

South East LY_Lyell

So the catalogs all seem to overlap, which I am not used to because usually in USGS 3DEP data, tiles/files are all nicely aligned, neat rectangular units. After chatting with Matt and going over the lidR documentation, it looks like that shouldn’t be a huge problem! So we are good to go with the DSM generation.


In order to investigate the classification of the point cloud to see if it’s fit to interpolate into a DTM, I’ll plot out a few classes individually to check for point density and accurate classification.

I’ll just pick out one .las/.laz file from the missing DTM area.

First, I’m creating variables which describe the point classification.

tile1_cn = readLAS("CN_Conness/c_207_010.laz")
las_check(tile1_cn)
ground = filter_poi(tile1_cn, Classification == 2)
veg = filter_poi(tile1_cn, Classification == 3 | Classification == 4)
unclass = filter_poi(tile1_cn, 
                     Classification != 2 | 
                     Classification != 3 | 
                     Classification != 4)

Now I’ll create some super-simple point cloud plots.

Ground-classified point cloud

Unclassified point cloud

So as we can see, the majority of points in this point cloud are not classified, and when we try to generate a DTM we will get an inaccurate result with this data. The tin() algorithm will either fail, or the edges will be far too long to accurately represent the terrain. We will need to reclassify ground points, which can be completed in lidR, on which we can base the DTMs.


Generate DSM

Before we go through the process of classifying ground points for our other area while generating the missing DTM, we will produce the missing DSM first and analyze/visualize our result. We want to filter out noise points, but because the DSM is a model of the ground + all the objects on the ground, we don’t have to have ground points classified for this step.

# read the north-west CN_Conness lidar catalog
nw_ctg = readLAScatalog("CN_Conness/")
# remove class 7 (noise) points -- 
#  these points likely aren't classified anyways, but I'll still include this 
opt_filter(nw_ctg) = "-drop_class 7"

Note that when we generate our DSM we want to limit the “edge” size in the tin() algorithm in order to avoid misrepresenting the surface through long triangulation sides. Additionally, while these areas do have lakes/bodies of water, we want to avoid especially small holes or NA data within our coverage extent. These holes tend to show up and look very off-putting when generating the COG-version of the raster. We will interpolate the holes using a nearest-neighbor/mean function defined below, which will use the NA hole’s neighboring cells to determine a mean value, and fill in that value for the NA hole.

#input settings as catalog, 1m res, dsmtin agorithm, sides/edges of dsm no longer than 20m
nw_dsm = lidR::rasterize_canopy(nw_ctg, res = 1, algorithm = dsmtin(max_edge = 20))
#to fix holes, fill in using matrix/avg of surrounding cells
fill.na <- function(x, i=5) { if (is.na(x)[i]) { return(mean(x, na.rm = TRUE)) } else { return(x[i]) }}
w <- matrix(1, 3, 3)
nw_dsm = terra::focal(nw_dsm, w, fun = fill.na)
plot(nw_dsm)

Upon visual inspection, that looks good! We will do some quality and accuracy checking below, but I am happy with the result so far.


Generate DTM

We can re-generate DTMs for each area, but we will start with the missing DTM and create the methods to apply to the other areas.

# read the south-east LY_Lyell lidar catalog
se_ctg = readLAScatalog("LY_Lyell/")
# remove class 7 (noise) points -- these points likely aren't classified anyways, but I'll still include this 
opt_filter(se_ctg) = "-drop_class 7"

We need to classify ground points, which has been computationally expensive, so we will write to disk first. Then, we will rasterize the terrain (aka generate DTM, aka tile the data) and utilize the temporary disk writing for this step as well.

# temporary ground classification catalog
opt_output_files(se_ctg) = paste0(tempdir(), "{*}_ground")
# classify ground points
se_ground = classify_ground(se_ctg, algorithm = csf())

# temporary dtm generation catalog
opt_output_files(se_ground) <- paste0(tempdir(), "/{*}_dtm")
# generate DTM & plot
se_dtm = rasterize_terrain(se_ground, res=1, algorithm=tin(max_edge=5), pkg='terra', overwrite=T)
plot(se_dtm)

Okay, cool! Something was created. Zooming in a bit, we see an issue, though:

zoom = ext(c(299500, 301500, 4180750, 4182000))
window(se_dtm) = zoom
plot(se_dtm)

As we can see from the plotted DTM, there are a few sections with straight lines, as we were concerned about with the poorly classified original data. Additionally, there are no significant holes in the dataset which would indicate lakes/bodies of water. After consulting with Matt and analyzing the areas which aren’t long, straight lines, we have determined that I should just clip the DTM to the extent of the DSM.

Clip DTM to DSM extent

We have to match the DTM & DSM extents (we have the DSM from the data downloaded from OT) and coordinate reference systems (CRS) in order to align the data, then we can mask the DTM.

# read in the recently generated DTM
se_dtm = rast("se_dtm.tif")
# read in the DSM from OT
se_dsm = rast("CA07_Stock_hh/ulyl_grd/hdr.adf")
#check the dtm & dsm information (including CRS info)
se_dtm
## class       : SpatRaster 
## dimensions  : 3554, 7208, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : 295571, 302779, 4178411, 4181965  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 11N (EPSG:32611) 
## source      : se_dtm.tif 
## name        :        Z 
## min value   : 3154.551 
## max value   : 3992.007
se_dsm
## class       : SpatRaster 
## dimensions  : 4000, 8000, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : 295000, 303000, 4178000, 4182000  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=11 +datum=NAD83 +units=m +no_defs 
## source      : hdr.adf 
## name        :      hdr 
## min value   : 3154.574 
## max value   : 3998.027

It looks like we have incongruous datums, so we will match our DTM to the existing DSM, then align the extents and crop/mask.

# match the DTM to the DSM CRS
se_dtm = project(se_dtm, se_dsm, method="near", mask=T, res=1)
# match the extents of the DTM and DSM
se_dtm = crop(se_dtm, se_dsm, ext=TRUE)
# mask the DTM using the DSM
se_dtm_crop = mask(se_dtm, se_dsm)
plot(se_dtm_crop)

Product Check

Let’s check our DTM and DSM values by generating canopy height models (CHMs). By subtracting the DTM from the DSM and creating the CHM, we can see if we have accurate raster values. Trees should be shown with heights that make sense for the study area, and overall values should be less than 100. We can also look out for NA values, or try to flesh out any other errors.

First the south-east/LY_Lyell DTM:

se_chm = se_dsm - se_dtm_crop
plot(se_chm)

Okay, this looks good! It appears that most values are between 0-50m, which makes sense. There are a few outliers; not sure about this. I’ll check out this CHM a little bit more in QGIS, but we appear to be on a good track.

Next the north-west/CN_Conness DSM:

nw_dtm = rast("C:/Users/ECE/allison_temp/CA07_Stock/CA07_Stock_be/fmcn_grd/hdr.adf")
nw_dsm = project(nw_dsm, nw_dtm, method='near',mask=T,res=1)
nw_dsm = crop(nw_dsm, nw_dtm, ext=T)

nw_chm = nw_dsm - nw_dtm
plot(nw_chm)

This one looks even better than the other one and has better values, so I’m calling it a success!

Now we will just convert our raster products from .tif to .cog, and submit the products for review!