Problem Set 4

Problem IV.1 The measure of the world, redux

I was surprised by the sheer combined biomass of single-celled organisms, especially the significant proportion of worldwide biomass accounted for by bacteria. Since we typically cannot see individual bacteria, it is easy to forget how unthinkably numerous they are in the environment. I was also surprised (and somewhat disheartened) by the disproportionate biomass of cattle compared to other mammals. Livestock are generally an inefficient source of food energy for humans and, considering the cruelty and environmental impact of factory farming, the amount of resources we sink into maintaining their biomass is concerning. Speaking of human impacts, I was surprised to read that the overall biomass of trees and other plants is estimated to have declined approximately twofold since the dawn of human civilization. It is hard to imagine plant biomass being even greater than it currently is! I was also surprised by the proportion of biomass in the “deep subsurface” of the ocean. I am familiar with marine benthic and hydrothermal vent ecosystems, but had no idea this life was so copious. One last small surprise was learning that marine fungi exist. They appear to account for only a marginal share of fungal biomass, but they are still an interesting lifeform to imagine.
Next-generation sequencing determines which organisms are present in environmental samples by sequencing many DNA fragments simultaneously. It also provides information on the abundance of organisms in ecological communities. Remote sensing is another essential sampling and estimation tool, allowing researchers to determine the abundance of organisms on a large scale or in otherwise inaccessible areas. It might be used where local sampling is impossible, such as at the bottom of the ocean or in estimating the canopy cover of the Amazon rainforest. Finally, taxonomic levels are useful in the analysis stage, where organisms are arranged into related groups called taxa. Dividing overall biomass among these taxa allows researchers to perform statistical analyses at multiple organizational levels and makes interpreting the data from a biological standpoint much easier. The paper reports biomass at taxonomic levels ranging from the kingdom (as with plant biomass) to the species (as with the disproportionately abundant humans and cattle).
I think that the step correlating sample biomasses with environmental parameters contributes the most variation to the estimates for most, if not all, of the taxonomic groups. Even if a sample is completely representative of a local environment, it is difficult to generalize the biological character of its constituents to a global scale. Many taxa, especially among plants, vary enormously in biomass within a clade, even down to the species level, where a single taxon can have morphologically disparate ecotypes in different environments. I therefore imagine that much of the variation stems from the imperfect correspondence between the ecological composition at one location and that of a similar ecosystem elsewhere. Representative sampling is another difficult step that could introduce variation. Some taxa are relatively easy to sample because they are large, countable, or otherwise easy to measure. Other taxa, such as microorganisms, cannot be observed directly, so even their local abundances must be estimated. I would expect greater variation for taxa that require multiple “rounds” of estimation.
The taxa with the largest fold-change, that is, the highest degree of uncertainty, are the microscopic taxa such as bacteria, archaea, and viruses. Indeed, the smallest of these, the viruses, have the highest fold-change. This is because, as I argued, individuals of these taxa are too minuscule to be counted directly and weighed accurately, meaning that the local abundance samples used to build the global estimate must themselves be estimates. Each layer of estimation carries its own error, so the uncertainty compounds. Conversely, larger, countable, and sessile taxa such as plants and fungi can be counted directly and so have much less associated error.
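The compounding argument can be made concrete with a small sketch. It assumes, which the source does not state, that each estimation stage carries an independent, multiplicative, roughly lognormal error, so that uncertainties add in quadrature on the log scale. The helper name combine.fold and the fold values are hypothetical and are not taken from the data set.

```r
# Hypothetical helper: combine independent multiplicative (fold) uncertainties.
# Assumption: errors are independent and lognormal, so log-scale
# uncertainties add in quadrature.
combine.fold <- function(folds) {
  exp(sqrt(sum(log(folds)^2)))
}

# Made-up illustrative values, not taken from biomass.csv
one.stage <- combine.fold(c(2))      # a single 2-fold estimate
two.stage <- combine.fold(c(2, 3))   # a 2-fold estimate built on a 3-fold one

print(one.stage)
print(two.stage)
```

Under these assumptions the combined fold-uncertainty is never smaller than the largest single-stage value, which is consistent with the pattern described for the microscopic taxa.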

setwd("~/Intro.Stats/Datasets")
data.set <- read.csv("biomass.csv")

biomass <- data.set$Mass.GtC.
fold.change <- data.set$Fold.change

### assign colors to taxa (one color per factor level, in level order)
taxon <- as.factor(data.set$Taxon)
colors <- c("red","blue","orange","green","brown","yellow","purple")
col.vec <- colors[taxon]

### plot fold-change against biomass, colored by taxon
plot(biomass,fold.change,
     pch=16,
     col=col.vec,
     cex=1.5,
     xlab="Estimated Biomass (GtC)",
     ylab="Fold-Change")

### create legend associating colored points with taxa names
legend("topright",
       legend=levels(taxon),
       col=colors[seq_along(levels(taxon))],
       pch=16,
       pt.cex=1.5)
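If the biomass values span several orders of magnitude, as global estimates of this kind often do, a logarithmic x-axis may keep the small taxa from being squashed against zero. The following is only a sketch of that variant. The data frame toy and its values are made up for illustration and merely reuse the column names assumed above.

```r
# Toy data frame with made-up values, standing in for biomass.csv
toy <- data.frame(
  Taxon       = c("A", "B", "C", "D"),
  Mass.GtC.   = c(0.1, 1, 10, 100),
  Fold.change = c(10, 5, 8, 2)
)

taxon.toy  <- as.factor(toy$Taxon)
colors.toy <- c("red", "blue", "orange", "green")
col.toy    <- colors.toy[taxon.toy]   # indexes colors by factor level

# log = "x" draws the biomass axis on a log10 scale
plot(toy$Mass.GtC., toy$Fold.change,
     log = "x", pch = 16, col = col.toy, cex = 1.5,
     xlab = "Estimated Biomass (GtC, log scale)",
     ylab = "Fold-Change")
legend("topright", legend = levels(taxon.toy),
       col = colors.toy[seq_along(levels(taxon.toy))],
       pch = 16, pt.cex = 1.5)
```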

Problem IV.2 What is wrong with the command?

R cannot locate file.txt because the file name is not in quotation marks, so R treats it as the name of an object rather than a file. Here is a corrected version:

data.set <- read.table("file.txt",header=TRUE,sep=",")
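The difference between a quoted path and an unquoted name can be checked with a short self-contained sketch. The temporary file, its contents, and the name no.such.object are all hypothetical.

```r
# Write a small comma-separated file to a temporary location (made-up contents)
tmp <- tempfile(fileext = ".txt")
writeLines(c("a,b", "1,2", "3,4"), tmp)

# tmp holds a character string, so read.table() looks for a file on disk
demo <- read.table(tmp, header = TRUE, sep = ",")
print(demo)

# An unquoted, undefined name is looked up as an R object instead;
# try() captures the resulting error so the script keeps running
res <- try(read.table(no.such.object, header = TRUE, sep = ","), silent = TRUE)
print(inherits(res, "try-error"))
```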

Blue must be assigned as the color using an equal sign, like this:

plot(data.set$a,data.set$b,col="blue")

All arguments relating to the creation or manipulation of a plot must be contained within the plot() function. This code has an extra closing parenthesis after the two variables, which ends the call early and leaves the color setting outside plot(). Simply delete that parenthesis to correct the code:

plot(data.set$a,data.set$b,col="red")
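One way to confirm that the misplaced parenthesis is a syntax problem is to ask R to parse each version as text without running it. The broken string below is a guess at the original command, reconstructed from the description above.

```r
# Assumed form of the broken command: ")" closes plot() before the color argument
broken <- 'plot(data.set$a,data.set$b),col="red")'
fixed  <- 'plot(data.set$a,data.set$b,col="red")'

# parse() only checks syntax; nothing is plotted and data.set need not exist
broken.ok <- !inherits(try(parse(text = broken), silent = TRUE), "try-error")
fixed.ok  <- !inherits(try(parse(text = fixed),  silent = TRUE), "try-error")

print(broken.ok)
print(fixed.ok)
```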

Problem IV.3 The asteroid belt

a <- data.set$a
q <- data.set$q
w <- data.set$w

r <- (a+q)/2

The histogram shows most of the asteroids clustered between 0 and 5 AU, with a small but conspicuous spike around 5 AU. The modal bin, at around 2.5 AU, contains over 2500 asteroids. There appear to be virtually no asteroids with orbital radii greater than 6 AU.

hist(r,100,main="Histogram of Orbital Radii of Asteroids",xlab="Orbital Radii of Asteroids (AU)",ylab="Frequency")
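The location and height of the modal bin can also be read from the histogram object rather than by eye. This sketch uses simulated radii, not the asteroid data, so the numbers it prints say nothing about the real distribution. The names r.sim, modal.mid, and modal.count are hypothetical.

```r
set.seed(1)
# Simulated radii (AU), NOT the asteroid data: a main cluster plus a small far group
r.sim <- c(rnorm(5000, mean = 2.5, sd = 0.4),
           rnorm(200,  mean = 5.2, sd = 0.1))

# plot = FALSE returns bin counts and midpoints without drawing anything
h <- hist(r.sim, breaks = 100, plot = FALSE)

modal.mid   <- h$mids[which.max(h$counts)]   # midpoint of the tallest bin
modal.count <- max(h$counts)                 # number of values in that bin

print(modal.mid)
print(modal.count)
```

Replacing r.sim with r would report the same two quantities for the actual orbital radii.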

### convert angle from degrees to radians (ASCII name for portability across locales)
omega <- 2*pi*(w/360)

x <- r*cos(omega)
y <- r*sin(omega)

plot(x,y,pch=16,col="blue",cex=0.1)

This plot shows the positions of the asteroids in their orbits around the sun. As in the histogram, the highest density of asteroids lies roughly 1 to 5 AU from the sun, peaking around 2.5 AU, with a small local maximum at around 5 AU.
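The conversion from radius and angle to x and y used for this plot can be sanity-checked on a few hand-picked values. The values below are hypothetical. The check is simply that each point's distance from the origin equals the radius it started with.

```r
# Hypothetical values: three bodies at radius 2.5 AU with angles in degrees
r.demo <- c(2.5, 2.5, 2.5)
w.demo <- c(0, 90, 180)

omega.demo <- 2 * pi * (w.demo / 360)   # degrees to radians
x.demo <- r.demo * cos(omega.demo)
y.demo <- r.demo * sin(omega.demo)

# Distance from the origin should equal the original radius
print(all.equal(sqrt(x.demo^2 + y.demo^2), r.demo))
print(round(cbind(x.demo, y.demo), 3))
```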

Jack Behrens

2025-09-26

Problem IV.4 More about the mean?

Problem IV.5