Hypothesis

I’m interested in looking at how tree canopy cover changes across different areas of Ann Arbor, MI. I’d like to look at the heights and diameters of trees in areas close to the University of Michigan campus that tend to be student and renter heavy, as well as areas further away from campus that are owned by more families. I hypothesize that the tree canopy cover will be more extensive in the areas further away from campus because of increased importance and investment of landscaping for long-term home owners.

## Dependencies
library(sf)
library(terra)
library(tmap)
library(tmaptools)
library(lidR)
library(ForestTools)
library(ggplot2)
library(gstat)
library(leaflet)
# For ggplot basemap (optional)
library(ggspatial)

Workflow

To test this, I’ll be analyzing liDAR data of the Ann Arbor area and analyzing it on R using the lidaR package. This data includes pixels with labels that can be used to analyze subsets of the point data, which I visualize using leaflet. I will use the dataset 292280 for the “student housing” group because it immediately adjoins campus and has high student renting density, and 287282 for the “family housing” because of it’s proximity to elementary schools, parks, and higher ownership.

#load the spatial data
lidar_data <- st_read("Washtenaw_Index.shp", quiet = TRUE)
lidar_data <- st_transform(lidar_data, 4326)
st_bbox(lidar_data)
##      xmin      ymin      xmax      ymax 
## -84.13042  42.07568 -83.53119  42.43773
nm <- if ("Name" %in% names(lidar_data)) lidar_data$Name else seq_len(nrow(lidar_data))
leaflet(lidar_data) %>% 
  addTiles() %>%  
  addPolygons(color = "#444", weight = 1, smoothFactor = 0.5,
              opacity = 1, fillOpacity = 0.05, fillColor = NA,
              highlightOptions = highlightOptions(color = "blue", weight = 2, bringToFront = TRUE),
              popup = paste("LiDAR Index:", nm))
las_student <- readLAS("Pointclouds_AnnArbor/292280.las")
## Warning: There are 2672 points flagged 'withheld'.
las_family <- readLAS("Pointclouds_AnnArbor/287282.las")
## Warning: There are 1009 points flagged 'withheld'.
crs(las_student) <- "EPSG:8705"
summary(las_student)
crs(las_family) <- "EPSG:8705"
summary(las_family)

Because the variable of interest is trees, which are elevated about the bare earth, I’ll use a Digital Surface Model (DSM).

dsm_student <- grid_canopy(las_student, res = 3, pitfree(c(0,2,5,10,15), c(0,1)))
dsm_family <- grid_canopy(las_family, res = 3, pitfree(c(0,2,5,10,15), c(0,1)))
tm_shape(dsm_student) + tm_raster(style="cont") + tm_layout(legend.outside=TRUE)

tm_shape(dsm_family) + tm_raster(style="cont") + tm_layout(legend.outside=TRUE)

Let’s see if the datasets we pulled are classified.

stopifnot(!is.null(las_student))
table(las_student$Classification, useNA = "ifany")
## 
##       1       2       7 
## 1119455  929650    2672
stopifnot(!is.null(las_family))
table(las_family$Classification, useNA = "ifany")
## 
##       1       2       7 
## 1710058 1109410    1009

We have the classifications of cover that are identified by 1, 2, and 7. These coordinate to unassigned, ground, and high point. We’ll add labels for these for both the student and family groups.

#for student
cls_map_student <- c(
  `1`="Unassigned", `2`="Ground",
  `7`="Low point"
)

tab_student <- table(las_student$Classification)
lbl_student <- cls_map_student[match(names(tab_student), names(cls_map_student))]
lbl_student[is.na(lbl_student)] <- names(tab_student)[is.na(lbl_student)]  # keep unknown codes as numbers
data.frame(class = lbl_student, code = names(tab_student), n = as.integer(tab_student))
##        class code       n
## 1 Unassigned    1 1119455
## 2     Ground    2  929650
## 7  Low point    7    2672
#for family
cls_map_family <- c(
  `1`="Unassigned", `2`="Ground",
  `7`="Low point"
)

tab_family <- table(las_family$Classification)
lbl_family <- cls_map_family[match(names(tab_family), names(cls_map_family))]
lbl_family[is.na(lbl_family)] <- names(tab_family)[is.na(lbl_family)]  
data.frame(class = lbl_family, code = names(tab_family), n = as.integer(tab_family))
##        class code       n
## 1 Unassigned    1 1710058
## 2     Ground    2 1109410
## 7  Low point    7    1009

Because Ann Arbor’s liDAR data doesn’t include any canopy height, we’ll try to create an estimate.

# Normalize heights (subtract ground)
las_student_norm <- normalize_height(las_student, knnidw(k = 8, p = 2))
summary(las_student_norm)
chm_student <- grid_canopy(las_student_norm, res = 3, p2r())
tm_shape(chm_student) + tm_raster(style = "cont", title = "Canopy Height (ft)")

# make canopy height model
chm_student <- grid_canopy(las_student_norm, res = 3, p2r())

# raster diagnostics (terra) to ensure normalization worked
library(terra)
vals <- values(chm_student)
summary(vals)
table(is.na(vals))
pct_zero_cells <- sum(vals == 0, na.rm = TRUE) / sum(!is.na(vals)) * 100
max(vals, na.rm = TRUE)
quantile(vals, probs = c(0, .01, .05, .25, .5, .75, .95, .99, 1), na.rm = TRUE)

# quick plot
plot(chm_student, main = "CHM (ft)")

# clip negative values and outliers
chm_student[chm_student < 0] <- 0
summary(values(chm_student))

# now same as above but for family group
las_family_norm <- normalize_height(las_family, knnidw(k = 8, p = 2))
summary(las_family_norm)
chm_family <- grid_canopy(las_family_norm, res = 3, p2r())
tm_shape(chm_family) + tm_raster(style = "cont", title = "Canopy Height (ft)")

chm_family <- grid_canopy(las_family_norm, res = 3, p2r())
library(terra)
vals_family <- values(chm_family)
summary(vals_family)
table(is.na(vals_family))
pct_zero_cells <- sum(vals_family == 0, na.rm = TRUE) / sum(!is.na(vals_family)) * 100
max(vals_family, na.rm = TRUE)
quantile(vals_family, probs = c(0, .01, .05, .25, .5, .75, .95, .99, 1), na.rm = TRUE)
plot(chm_family, main = "CHM (ft)")

# clip negative values 
chm_family[chm_family < 0] <- 0
summary(values(chm_family))

Tree height operations can be done with the ForestTools.

# Attach the 'ForestTools' and 'raster' libraries
library(ForestTools)
library(raster)
data("chm_student")
data("chm_family")

Using these canopy height datasets, I’ll calculate a percent canopy cover for our two groups, which only includes vegetation over 10 ft, excluding shrubs and focusing on the more mature trees in each dataset.

canopy_cover<- function(chm, threshold=10){
  vals <- values(chm)
  sum(vals > threshold, na.rm=TRUE) / sum(!is.na(vals)) * 100
}

canopy_cover_student<-canopy_cover(chm_student, threshold = 10) 
canopy_cover_family<- canopy_cover(chm_family, threshold = 10)  

I had to change my workflow a bit because of some unexpected issues, so we’ll give a little bit more comprehensive statistics in a table.

data.frame(
  Area = c("Student", "Family"),
  MedianHeight_ft = c(median(values(chm_student), na.rm=TRUE),
                      median(values(chm_family), na.rm=TRUE)),
  MeanHeight_ft   = c(mean(values(chm_student), na.rm=TRUE),
                      mean(values(chm_family), na.rm=TRUE)),
  PercentCanopy   = c(canopy_cover_student, canopy_cover_family)
)
##      Area MedianHeight_ft MeanHeight_ft PercentCanopy
## 1 Student            5.26      17.45093      48.01825
## 2  Family           13.22      19.48516      54.33222

Applications and Insights

Additionally, provide an assessment of the trajectory of 3D technology, including immersive environments, within the academic and professional landscape. (Maximum 4 paragraphs)

I originally wanted to look at the heights and diameters of the trees and compare them across the two datasets I pulled. However, the Ann Arbor liDAR didn’t include any canopy height measurements, so I had to create these myself through normalization. Because of this, I believe my values weren’t as accurate as they should have been, which caused a lot of problems when transforming them into a raster. I ended up using the simpler process of calculating percent canopy cover from the raw normalized CHM datasets I made, which provide a lot less information than I would have hoped, but still answer a portion of my hypothesis. Overall, the outcome does fit with my hypothesis. The “family” dataset, further away from campus, with more owned homes has a higher percent canopy, but also higher medians and means for heights of trees, possibly because of the trees age. This may signal lack of investment in the “student” dataset of the neighborhood closer to campus, or may just be an artifact of the ages of the different neighborhoods.

I enjoyed using the 3D in visualizing my data, but it does complicate making an html, as these images that you can manually spin can’t be knitted directly into the html from markdown. I did look at 3D plots of liDAR data for both of my datasets, but chose to remove them because of this complicated extra step that doesn’t play a large role in the answering of my hypothesis. I think that 3D visualizations make looking at objects like trees, houses, and parks a lot more powerful because you can understand the “viewscape” of the area, instead of purely visualizing from an aerial view. Like we talked about in class, I think 3D technology will be hugely important for interactive learning with minimal risk, like in dentistry and medical school. It will also be really helpful for incorporating community co-development into ecosystem restoration research applications, since it will allow actual visualization of outcomes instead of complex and often jargon filled descriptions, or less impactful 2D visualizations.