Markdown Author: Jessie Bell
Download this Rmd: Top right corner → Code → Download Rmd
Below the theoretical = competition, above = contagion.
Data: SPOROPHORES
Data Source: Ford et al. 1980
Type: Marked Planar Points (attributes for points included)
Intensity: 0.004998072\(/cm^2\)
Length Unit: 1 cm
Window Area: 66025.5 \(cm^2\)
Species: Laccaria laccata, Lacaria pubescens, Hebeloma spp.
#reg plot
#plot(sporophores, chars=c(16,1,2), cex=0.6, leg.args=list(cex=1.1))
#points(0,0,pch=16, cex=2)
#text(15,8,"Tree", cex=0.75)
#turn data into dataframe
sporophores_df <- as.data.frame(sporophores)
#make cute colors
cols <- c("plum", "orange", "#5f9ea0")
#rename marks
sporophores_df$Species <- sporophores_df$marks
#make cuter plot
ggplot()+
geom_point(sporophores_df, mapping=aes(x, y, color=Species, pch=Species))+
scale_color_manual(values=cols)+
theme_light()+
coord_polar()+
labs(x="", y="")+
geom_text() +
annotate(
"text", label = "▲",
x = 60, y = -110, size = 6, colour = "#593e31"
)
Figure 1: Spatial arrangement for three different species of fungi, measurements were collected by counting macroscopic fruiting bodies surrounding a silver birch tree (Betula pendula). L. lacata seems to enjoy the north side of the tree (assuming up is north here) and Hebeloma spp. enjoys fruiting near the southern part of this tree. Data location: Scotland.
hebelome <- subset(sporophores_df, Species=="Hebloma spp")
Plot Thoughts: L. lacatta is clumpy close to the tree. This species is also clumpy near L. pubescens. They are found far from the tree and outside of Hebeloma spp. and I cannot tell the spatial relationships yet. Hebeloma spp. seems to like being on the southern side of the tree (if up is N) and closer to it. Lemme try to see the density of the points.
Spores <- density(sporophores)
df <- as.data.frame(Spores)
#make pretty continuous colors for density plot
colramp <- colorRampPalette(cols)(100)
persp(Spores, colmap = colramp, theta = 180, phi = 30)
Figure 2: Density plot for sporophores. You can see that there are a few locations surrounding the tree that have clumped fruiting bodies. There are also locations that are clearly void of macroscopic sporophores.
#Now try an overlay
ggplot(sporophores_df, aes(x=x, y=y)) +
stat_density_2d(aes(fill = ..level..), geom = "polygon")+
scale_fill_distiller(palette= "Set2", direction=1)+
geom_contour(df, mapping=aes(x=x, y=y, z=value), color="#fffff0")+
theme_classic()+
labs(title="Mushroom Density", x="", y="", fill="Density", caption="Plot Brown ● = Betula pendula")+
annotate(
"text", label = "●",
x = 0, y = 0, size = 8, colour = "#593e31"
)+
geom_point(sporophores_df, mapping=aes(x=x, y=y, shape=Species), alpha=0.7)
Figure 2: Density plot for sporophores. You can see that there are a few locations surrounding the tree that have clumped fruiting bodies. There are also locations that are clearly void of macroscopic sporophores.
Plot Thoughts: L. lacatta and Hebeloma spp. are pretty clustered in the northeast corner of the density plot. There are also other clumpy mushies on the southern side of the tree. This figure can help answer the first question.
#turn data into ppp
spores_ppp <- ppp(sporophores_df$x, sporophores_df$y)
Lestimation <- envelope(sporophores, fun= Lest, nsim= 100, verbose=F)
#plot(Lestimation)
#make legend for ggplot
cols <- c("High envelope" = "gray", "Observed" = "plum", "Theoretical" = "red", "Low envelope")
ggplot()+
geom_line(Lestimation, mapping=aes(r, hi), color="gray")+
geom_line(Lestimation, mapping=aes(r, obs), color="plum")+
geom_line(Lestimation, mapping=aes(r, theo), color="red", linetype="dashed")+
geom_line(Lestimation, mapping=aes(r, lo), color="gray")+
theme_classic()+
labs(x="r", y="LHat", title="L estimation for all mushies", caption="gray = envelope | red = theoretical | plum = observed")+
theme(text=element_text(size=10,
family="Corbel"))
Figure 3: L estimation plots computed using Ripley’s K estimation transformation. Here you can see that the observed values are above the envelope of estimation, indicating that this point data for sporophores is mostly clumped. We cannot tell associations between species using this estimation though.
Plot Thoughts: All things clumpy up in here, but this doesn’t really answer the question being asked about associations between species.
Species <- alltypes(sporophores, "L", envelope=T, verbose=F)
plot(Species[1, 3], main="L. lacatta & Hebeloma spp.")
Figure 4: At about 35 cm L. lacatta (n=190) and Hebeloma spp. (n=129) become repelled (competitive) and before that they seem to both resemble CSR.
plot(Species[2,1], main="L. pubescens & L. lacatta")
Figure 5: Both L. pubescens (n=11) and L. lacatta (n=190) follow CSR until about 20-40 cm where they begin to show contagion patter. From about 40 - 50 cm they follow CSR again until they become competitive with one another. There are very few fruiting bodies for L. pubescens though.
plot(Species[3,2], main="Hebeloma spp. & L. pubescens")
Figure 6: Both Hebloma spp. (n=129) and L. pubescens (n=11) show CSR until 15-25 cm where there is a slight repulsion between them (indicating competition). They then follow CRS until about 60 cm where they show attractive relationship with one another.
In the paper the authors suggest two things which we can look at using PPA. First, they suggest that there “were distinct arcs around the tree which were not occupied by sporophores of any species.” Second, they say that there is “no consistent evidence of spatial inhibition between species.” We just have one of the five years of data to look at here. But with the data we have, is there any suggestion of inhibition between species?
Question 1: Are there distinct arcs around the tree which were not occupied by sporophores of any species?
* There seem to be obvious patterns of clumping for sporophores around the silver birch (Figures 1:3). You can also see that there are locations around the tree that have no sporophores and are in fact seemingly void of them (Figure 2). This doesn’t necessarily mean there are no mycorrhizal networks in that location, but it suggests that these fungi prefer to fruit in other locations.
Question 2: Is there consistent evidence of spatial inhibition between species?
* There is spatial inhibition between the two most dense species, L. lacatta (n=190) & Hebeloma spp. (n=129). They seem to compete as they get further from the tree, at about 35 cm. From 0-35 cm they resemble CSR. This seems to provide evidence that, yes, there is spatial inhibition between these two species at 35 cm from the tree, where they begin to compete with one another. (Figure 4)
* There seems to be no spatial pattern between both L. lacatta (n=190) & L. pubescens (n=11) from 0-20 cm. There is a moment in space (~20-40 cm) that they are attracted to one another, they form CSR again from 40-50 cm, and from 50 onward they show a repulsive relationship. I think more data needs collected between these two species to make any spatial interaction claims. (Figure 5)
* There seems to be no spatial pattern between both Hebeloma spp. (n=129) & L. pubescens (n=11) until about 15-30 cm where. There is a moment in space (~20-40 cm) that they are repulsed from one another, they form CSR again from 40-50 cm, and from 50 onward they show an attraction with one another. This surely is not enough evidence to demonstrate spatial inhibition between these species. (Figure 6)
D Pattern \(\hat{L}\) Estimation Prediction: above theoretical at close distances because points are clumped, and below theoretical as distance increases because the clumps are competitive with one another.
C Pattern \(\hat{L}\) Estimation Prediction: above the envelope right away and then underneath the remainder of the way. Note: I was wrong about this. The points being in a line created a transformed Ripley’s K of what appears to be a ladder with most values in the competitive region of the plot.
B Pattern \(\hat{L}\) Estimation Prediction: follow the envelope of theoretical values but tend on the upper side of it (because they seem a tiny bit clumped). Mostly random though.
A Pattern \(\hat{L}\) Estimation Prediction: same as above but tend on the lower side of envelope. Note: IDK how to explain this one….Maybe setting the x and y limits decreases our window size so we are seeing a smaller bit of space?
n <- 100 #sample size
patternA <- data.frame(x=rnorm(n), y=rnorm(n), id="A") #makes 100 doubles for column 1 and 2, then for the third it labels all of them A
patternB <- data.frame(x=runif(n), y=runif(n), id="B") #runif generates random deviates from teh stats package
patternC <- data.frame(expand.grid(x=seq(0,1, length.out=sqrt(n)), y=seq(0,1, length.out=sqrt(n)), id="C")) #not sure of the purpose of length.out to the sqrt of n...but makes a list of numbers from 0-1 with lots of floats.
patternD <- data.frame(expand.grid(x=rep(seq(0,1,length.out = 5), 2), y=rep(seq(0,1,length.out = 5), 2), id="D"))
#now jitter the x column numbers? IDK why but will find out!
patternD$x <- jitter(patternD$x)
patternD$y <- jitter(patternD$y)
#now bind all rows together from all patterns
simulationData <- bind_rows(patternA, patternB, patternC, patternD)
patternD <- subset(simulationData, id=="D")
patternD <- ppp(x=patternD$x, y=patternD$y, xrange=c(0,1), yrange=c(0,1), unitname="km")
patternC <- ppp(x=patternC$x, y=patternC$y, xrange=c(0,1), yrange=c(0,1), unitname="km")
patternB <- ppp(x=patternB$x, y=patternB$y, xrange=c(0,1), yrange=c(0,1), unitname="km")
patternA <- ppp(x=patternA$x, y=patternA$y, xrange=c(0,1), yrange=c(0,1), unitname="km")
patternD_l <- envelope(patternD, fun=Lest, nsim=n, verbose=F)
patternC_l <- envelope(patternC, fun=Lest, nsim=n, verbose=F)
patternB_l <- envelope(patternB, fun=Lest, nsim=n, verbose=F)
patternA_l <- envelope(patternA, Lest, nsim=n, verbose=F)
par(mfrow = c(2, 2))
plot(patternD_l, main="D")#not what I expected. Ahh now I see it. The clumps are attracted to one another at close distances and they become repulsed at further distances.
plot(patternC_l, main="C") #WAY WRONG. WTH
plot(patternB_l, main="B")
plot(patternA_l, main="A")
Ya just need points, no other attributes required. This analysis just deals with whether or not points are:
RANDOM | ATTRACTED | REPULSED
Two methods for determining Complete Spatial Randomness (CSR) include:
n <- 100 #sample size
patternA <- data.frame(x=rnorm(n), y=rnorm(n), id="A") #makes 100 doubles for column 1 and 2, then for the third it labels all of them A
patternB <- data.frame(x=runif(n), y=runif(n), id="B") #runif generates random deviates from teh stats package
patternC <- data.frame(expand.grid(x=seq(0,1, length.out=sqrt(n)), y=seq(0,1, length.out=sqrt(n)), id="C")) #not sure of the purpose of length.out to the sqrt of n...but makes a list of numbers from 0-1 with lots of floats.
patternD <- data.frame(expand.grid(x=rep(seq(0,1,length.out = 5), 2), y=rep(seq(0,1,length.out = 5), 2), id="D"))
#now jitter the x column numbers? IDK why but will find out!
patternD$x <- jitter(patternD$x)
patternD$y <- jitter(patternD$y)
#now bind all rows together from all patterns
simulationData <- bind_rows(patternA, patternB, patternC, patternD)
simulationData <- simulationData %>% group_by(id) %>%
mutate(x=scales::rescale(x), y=scales::rescale(y)) #IDK what this did.
ggplot(simulationData, aes(x=x, y=y))+
geom_point()+
coord_fixed()+
facet_wrap(~id)
data("japanesepines")
japaneseString <- str(japanesepines)
## List of 5
## $ window :List of 4
## ..$ type : chr "rectangle"
## ..$ xrange: num [1:2] 0 1
## ..$ yrange: num [1:2] 0 1
## ..$ units :List of 3
## .. ..$ singular : chr "metre"
## .. ..$ plural : chr "metres"
## .. ..$ multiplier: num 5.7
## .. ..- attr(*, "class")= chr "unitname"
## ..- attr(*, "class")= chr "owin"
## $ n : int 65
## $ x : num [1:65] 0.09 0.29 0.38 0.39 0.48 0.59 0.65 0.67 0.73 0.79 ...
## $ y : num [1:65] 0.09 0.02 0.03 0.18 0.03 0.02 0.16 0.13 0.13 0.03 ...
## $ markformat: chr "none"
## - attr(*, "class")= chr "ppp"
data(redwood)
redwoodString <- str(redwood)
## List of 5
## $ window :List of 4
## ..$ type : chr "rectangle"
## ..$ xrange: num [1:2] 0 1
## ..$ yrange: num [1:2] -1 0
## ..$ units :List of 3
## .. ..$ singular : chr "unit"
## .. ..$ plural : chr "units"
## .. ..$ multiplier: num 1
## .. ..- attr(*, "class")= chr "unitname"
## ..- attr(*, "class")= chr "owin"
## $ n : int 62
## $ x : num [1:62] 0.36 0.44 0.48 0.48 0.5 0.76 0.78 0.78 0.84 0.86 ...
## $ y : num [1:62] -0.08 -0.1 -0.08 -0.14 -0.1 -0.14 -0.12 -0.16 -0.08 -0.18 ...
## $ markformat: chr "none"
## - attr(*, "class")= chr "ppp"
#notice you can see the -attr(*, "class") = chr "ppp
#the spatstat package uses ppp class as a handy way of storing this type of spatial data. (i.e. shapefiles)
summary <- summary(japanesepines) #look at all that cool data!
Not all data are in the same units above. This won’t matter in our interpretation though.
plot(japanesepines$x, japanesepines$y, col="#a243af", pch="^")
plot(redwood$x, redwood$y, col="red", pch="^")
plot(bei$x, bei$y, col="steelblue", pch="^")
# Since the data are planar point patterns...just tibble!
japanTibble <- tibble(x=japanesepines$x *japanesepines$window$units$multiplier, y=japanesepines$y * japanesepines$window$units$multiplier) #here we used teh planar point frame to pull out the units (in this case 5.7 meters) and multiply each x and y by those points.
#Now use ggplot on tibble
ggplot(japanTibble, aes(x, y))+
geom_point() #ooohhh yahhh
#make it more sophisticated
ggplot <- ggplot(japanTibble, aes(x, y))+
geom_point()+
labs(x=paste("x (", japanesepines$window$units$plural, ")", sep=""), y=paste("y (", japanesepines$window$units$plural, ")", sep=""))
ggplot +
coord_polar() #hmm
ggplot +
coord_cartesian()
ggplot +
coord_equal()
ggplot+
coord_flip() #flips x and y axis
ggplot+
coord_radial()
ggplot+
coord_sf()
ggplot+
coord_fixed()+
theme_minimal()
Non-parametric method for estimating the probability density function of a variable. The key to using kernels is selecting bandwidth.
Example:
x <- rnorm(n)
hist(x, freq=F, col="white")
lines(density(x), col="red")
#or with ggplot
ggplot()+
geom_histogram(aes(x, after_stat(density)), bins=30,fill="white", color="black")+
geom_density(aes(x), color="#a243af")+
theme_dark()
#now change bin number to see how smoothing changes
ggplot()+
geom_histogram(aes(x, after_stat(density)), bins=100,fill="white", color="black")+
geom_density(aes(x), color="#a243af")+
theme_dark()
ggplot()+
geom_histogram(aes(x, after_stat(density)), bins=5,fill="white", color="black")+
geom_density(aes(x), color="#a243af")+
theme_dark()
#density function from the Kernel Density Estimation stats() package computes the kernel density estimates using the given kernel and bandwidth for the univariate observation
japanese_density <- density(japanesepines)
summary(japanese_density)
## real-valued pixel image
## 128 x 128 pixel array (ny, nx)
## enclosing rectangle: [0, 1] x [0, 1] units (one unit = 5.7 metres)
## dimensions of each pixel: 0.00781 x 0.0078125 units
## (one unit = 5.7 metres)
## Image is defined on the full rectangular grid
## Frame area = 1 square units
## Pixel values
## range = [25.17565, 133.178]
## integral = 63.61766
## mean = 63.61766
# now make a plot with it using persp(): This function draws perspective plots of a surface over the x–y plane. persp is a generic function.
persp(japanese_density, theta=70, phi=30)
These show you the intensity of the points. The units are \(points/area\). This is a non-parametric estimate and the function density() from the stats package picks your binwidths of the kernel for you. You can change your binwidth by changing the value of sigma. Example: plot(density(bei,sigma=10))
plot(japanese_density)
contour(japanese_density, add=T)
points(japanesepines, pch=20)
threed <- kde2d(japanesepines$x, japanesepines$y)
persp(threed, box=F)
#colored
persp(threed, box = F, col = viridis::viridis(100))
#interactive
plot_ly(x=threed$x, y=threed$y, z=threed$z) %>% add_surface()
This approach makes messing around with bandwidths easy, because it quantifies the distances you should use!
There are three ways to do this in the reading:
1. \(\hat{G}\) measures the distance to the nearest event
2. \(\hat{F}\) measures the distance from a point to the nearest event
3. Ripley’s K: The cadillac. This detects deviations from spatial homogeneity. Ripley’s K describes the point processes at many distance scales. This is an attractive property of the statistic because many environmental point patterns can cluster at one distance and become random or uniform at another.
\[\begin{equation} { \hat{K}(r) = \lambda^{-1} \Sigma_{i\neq j} I(d_{ij} < r)/n} \end{equation}{}\]\(d_{ij}\): distance between points \(i\) and \(j\)
\(n\): number of points
\(r\): radius
\(\lambda\): mean density of points
\(I\): indicator/characteristic function boolean (T = 1; F = 0)
\(K\): measures the number of events found up to a given distance (r)
CSR Ripley’s K equation looks like →
\[\begin{equation} { \hat{K}(r) = \pi r^2} \end{equation}{}\]So we can use out \(\hat{K}\) to test it against a CSR for a homogeneous Poisson process.
We can also calculate the envelope of possible values using Monte Carlo where the number of simulations specifies the significant level as \(\alpha = 2/(1+n)\) where \(n\) is the number of simulations. If the observed function value lies outside of the envelope, we can reject the null hypothesis of CSR at the distance \(r\).
#build the lopes for each point dataset
n <- 100
Ripley_japanesepines <- envelope(japanesepines, fun=Kest, nsim=n, verbose=F)
Ripleys_bei <- envelope(bei, fun=Kest, nsim=n, verbose=F)
Ripley_redwood <- envelope(redwood, fun=Kest, nsim=n, verbose=F)
plot(Ripley_japanesepines)
plot(Ripley_redwood)
plot(Ripleys_bei)
Japanese Pines shows CSR at all distances.
Redwood shows clumping at shorter distances and then randomness at longer distances.
Bei shows intense clumping at all distances.
Thus, the japanesepines are randomly distributed while the other two are clumped but in different ways. The redwood data are clumped out to a radius of \(1/5\) of the bounding box while the bei data are clustered out as far as we can measure. Oh, and how far can we measure before running into edge effects?
There are two ways of dealing with this. One, we can limit our analysis to short distances to reduce the number of edge cases. Two, we can try to correct for the edge effects. We can also look at more of our data, using rmax= argument in our envelope function to a max of 50% of these data (note that our OG plots above are only 25% of our data). Ripley does not like this though, and recommends only 25%.
plot(envelope(redwood, fun=Kest, nsim=10, verbose = F, rmax=0.5), main="")
Another thing we can do is correct for the edge effect. You can read through Goreaud’s 1999 (paper)[https://www.jstor.org/stable/3237072?seq=1#metadata_info_tab_contents] for more on these corrections.
The Bivand reading does walk through one correction. So here it is:
Ripleys_part2_japan <- Kest(japanesepines, verbose=F, correction=c("none", "isotropic", "border"))
plot(Ripleys_part2_japan)
The transformation of \(\hat{K}\) is linear as \(\hat{L}\).
\[\begin{equation} {L(r) = \sqrt {K(r)/ \pi}} \end{equation}{}\]CSR \(\hat{L}\) equation looks like →
\[\begin{equation} { L(r) = r} \end{equation}{}\]Lest instead of Kest for fun operation.
Lest_redwood <- envelope(redwood, fun=Lest, nsim=n, verbose=F)
plot(Lest_redwood)
See Andy’s explanation of \(\hat{K}(r)-\)\(\pi\)\(r^2\) to create the theoretical expectation of y being equal to 0 at every point. Still not making sense to me. UGH.
data(longleaf)
summary(longleaf)
## Marked planar point pattern: 584 points
## Average intensity 0.0146 points per square metre
##
## Coordinates are given to 1 decimal place
## i.e. rounded to the nearest multiple of 0.1 metres
##
## marks are numeric, of type 'double'
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 9.10 26.15 26.84 42.12 75.90
##
## Window: rectangle = [0, 200] x [0, 200] metres
## Window area = 40000 square metres
## Unit of length: 1 metre
plot(longleaf)
hist(longleaf$marks)
bigTrees <- subset(longleaf, marks > 50)
plot(envelope(bigTrees, fun = Lest, nsim = n, verbose=FALSE),
main = "Big Trees")
So in contrast to the whole data set, we see that while the entire data set shows clustering, the big trees show CSR. That’s kind of cool. We could start speculating about what ecological processes might drive that pattern.
Standard practice in forestry is to consider trees as adults when they have diameters of 30cm of greater. Let’s see if our understanding of the point patterns change by age class.
First, we will make a new object that creates categories (factor) from the continuous (numeric) size class of DBH using the cut function:
longleaf2 <- cut(longleaf, breaks=c(0,30, Inf),
labels=c("Sapling", "Grown-up"))
summary(longleaf2)
## Marked planar point pattern: 584 points
## Average intensity 0.0146 points per square metre
##
## Coordinates are given to 1 decimal place
## i.e. rounded to the nearest multiple of 0.1 metres
##
## Multitype:
## frequency proportion intensity
## Sapling 313 0.5359589 0.007825
## Grown-up 271 0.4640411 0.006775
##
## Window: rectangle = [0, 200] x [0, 200] metres
## Window area = 40000 square metres
## Unit of length: 1 metre
plot(longleaf2) #adults look clustered! Lets check
grownups <- subset(longleaf2, marks=="Grown-up", drop=T)
plot(envelope(grownups, fun=Lest, verbose = F), main="Grown-ups")
saplings <- subset(longleaf2, marks == "Sapling",drop=TRUE)
plot(envelope(saplings, fun = Lest, verbose=FALSE), main="Sapling")
Quick aside, note that we were actually calling subset.ppp and cut.ppp above because the class ppp has methods for subset and cut.
In the plotted L functions above we definitely see that both age classes are clustered and the saplings much more so than the adults. But looking at the map I think we should see how those age classes interact. Because it looks to me like they are in different places. Let’s plot the density map of the adults and put the points where the sapling occur on top.
adults.den <- density(grownups)
plot(adults.den)
points(saplings, pch=20)
That is fascinating. Is it possible that the two age classes are repelled from each other?
longleaf2_Lest <- envelope(longleaf2, "Lcross", i="Sapling", j="Grown-up", verbose=F)
plot(longleaf2_Lest)