Problem IV.1 The measure of the world, redux
- I was surprised by the sheer combined biomass of single-celled
organisms, especially the significant proportion of worldwide biomass
accounted for by bacteria. Since we cannot typically see single-cell
bacteria, it is easy to forget how unthinkably numerous they are in the
environment. I was also surprised (and somewhat disheartened) by the
disproportionate biomass of cattle compared to other mammals. Livestock
are generally an inefficient source of food energy for humans and,
considering the cruelty and environmental impact of factory farming, the
amount of resources we are sinking into maintaining their significant
biomass is concerning. Speaking of human impacts, I was surprised to
read that the overall biomass of trees and plants is estimated to have
declined twofold since the dawn of human civilization. It is hard to
imagine plant biomass being even greater than what it currently is! I
was also surprised by the proportion of biomass at the “deep subsurface”
of the ocean. I am familiar with marine benthic and hydrothermal vent
ecosystems, but had no idea this life is so copious. One last small
surprise was that marine fungi exist at all. They appear to account for
only a marginal share of fungal biomass, but they are still an
interesting life form to imagine.
- Next-generation sequencing identifies which organisms are present in
samples from an environment by sequencing many DNA fragments
simultaneously. It provides information on the abundance
of organisms in ecological communities. Remote sensing is another
essential sampling and estimation tool that allows researchers to
determine the abundance of organisms on a large scale or in otherwise
inaccessible areas. This tool might be used where local sampling is
impossible, such as at the bottom of the ocean or in estimating the
canopy cover of the Amazon rainforest. Finally, taxonomic levels are
useful in the analysis stage of the procedure in arranging organisms
into related groups called taxa. Dividing overall biomass into these
taxa allows researchers to perform statistical analyses on multiple
organizational levels and makes interpreting the data from a biological
standpoint much easier. This paper displayed biomass at taxonomic
levels ranging from the kingdom level (as in classifying plant biomass)
to the species level (in the case of the disproportionately abundant
humans or cattle).
- I think that the step correlating sample biomasses to environmental
parameters contributes the most variation in estimates for most if not
all of the taxonomic groups. Even if a sample is completely
representative of a local environment, it is difficult to generalize the
biological character of its constituents to a global scale. Many taxa,
especially within plants, vary widely in biomass within clades, even
down to the species level, where a single taxon can have
morphologically disparate ecotypes in different environments. This
said, I imagine much variation stems from the
imperfect correlation of ecological composition at one location to a
similar ecosystem at another. Representative sampling is another
difficult step that could lead to estimate variation. Some taxa are
relatively easy to sample, in that they are large, countable, or
otherwise easy to measure. Other taxa, such as microorganisms, cannot
be observed directly, so even their local abundances must be estimated.
I would expect greater variation in estimates for taxa
that require multiple “rounds” of estimation.
- We see that the taxa with the largest fold-change, that is, the
highest degree of uncertainty, are the microscopic taxa such as
bacteria, archaea, and viruses. Indeed, the smallest of these
taxa, the viruses, has the highest fold-change. This is because, as I
argued, individuals of these taxa are too minuscule to be directly
counted and accurately weighed, meaning that the local samples of their
abundance used to create the global estimate must themselves be
estimates. Each layer of estimation carries its own error, and these
errors compound into greater overall uncertainty. Conversely, we see
that more massive, countable, and sessile taxa such as plants and fungi
can be counted directly and so have much less associated error.
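The compounding argument in the last point can be illustrated with a small sketch. It assumes, purely for illustration, that a global estimate is the product of independent factors, each with a log-normally distributed multiplicative (fold-change) uncertainty. The helper name combine.fold and the fold values below are hypothetical and are not taken from the paper.

```r
# Hypothetical helper: combine independent multiplicative (fold-change)
# uncertainties by adding their squared logs (variances on the log scale).
combine.fold <- function(folds) {
  exp(sqrt(sum(log(folds)^2)))
}

# Illustrative values only, NOT from the paper:
one.round  <- combine.fold(c(2))     # a single directly measured quantity
two.rounds <- combine.fold(c(2, 3))  # an estimate built on another estimate

one.round
two.rounds
```

Under these assumptions, stacking a second estimated factor on top of the first always yields a larger combined fold-change than either factor alone, which matches the pattern described for the microscopic taxa.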
setwd("~/Intro.Stats/Datasets")
data.set <- read.csv("biomass.csv")
biomass <- data.set$Mass.GtC.
fold.change <- data.set$Fold.change
### assign colors to taxa
taxon <- as.factor(data.set$Taxon)
colors <- c("red","blue","orange","green","brown","yellow","purple")
col.vec <- colors[taxon]
### plot data with labels, corrected symbology using defined colors
plot(biomass, fold.change,
     pch=16,
     col=col.vec,
     cex=1.5,
     xlab="Estimated Biomass (GtC)",
     ylab="Fold-Change")
### create legend associating colored points with taxa names
legend("topright",
       legend=levels(taxon),
       col=colors,
       pch=16,
       pt.cex=1.5)

Problem IV.2 What is wrong with the command?
- R cannot locate the file because the file name is not enclosed in
quotation marks, so R treats file.txt as the name of an object rather
than as a file. Here is a corrected version:
data.set <- read.table("file.txt",header=TRUE,sep=",")
- Blue must be assigned as the color using an equals sign, like
this:
plot(data.set$a,data.set$b,col="blue")
- All arguments relating to the creation or manipulation of a plot
must be contained within the plot() function. This code has an extra
closing parenthesis after the two variables, which ends the call early
and excludes the color setting from the plot() function. Simply delete
that extra parenthesis to correct the code:
plot(data.set$a,data.set$b,col="red")
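The corrected commands above can be tried end to end with a throwaway file. This is only a sketch: the temporary file and its values are made up for illustration, and the column names a and b are chosen to mirror the ones in the commands above.

```r
# Write a small throwaway CSV so the example is self-contained.
# The values are made up; only the column names a and b matter here.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a,b", "1,2", "3,4", "5,6"), tmp)

# tmp holds the path as a character string, which is what read.table()
# expects; a literal file name would need quotation marks, e.g. "file.txt".
data.set <- read.table(tmp, header = TRUE, sep = ",")

# All arguments, including col, sit inside one pair of parentheses.
plot(data.set$a, data.set$b, col = "blue")

unlink(tmp)  # remove the temporary file
```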
Problem IV.3 The asteroid belt
a <- data.set$a
q <- data.set$q
w <- data.set$w
r <- (a+q)/2
- The histogram shows most of the asteroids clustered between 0 and 5
AU, with a small but conspicuous spike around 5 AU. The modal bin,
containing over 2500 asteroids, is around 2.5 AU. There appear to be
virtually no asteroids with orbital radii greater than 6 AU.
hist(r, breaks=100,
     main="Histogram of Orbital Radii of Asteroids",
     xlab="Orbital Radii of Asteroids (AU)",
     ylab="Frequency")
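
The modal bin can also be read off numerically rather than by eye. A minimal sketch follows, using simulated radii as a stand-in because the asteroid data file is not reproduced here. The vector r.sim and its distribution are assumptions for illustration only and say nothing about the real data.

```r
set.seed(1)
# Stand-in for r: simulated values, NOT the real asteroid radii.
r.sim <- rnorm(5000, mean = 2.5, sd = 0.4)

# Compute the histogram bins without drawing the plot.
h <- hist(r.sim, breaks = 100, plot = FALSE)

# Midpoint and count of the tallest bin.
modal.mid <- h$mids[which.max(h$counts)]
modal.count <- max(h$counts)
modal.mid
modal.count
```

Replacing r.sim with the real r would give the location and height of the tallest bar in the histogram above.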

### convert the angle w from degrees to radians
omega <- 2*pi*(w/360)
x <- r*cos(omega)
y <- r*sin(omega)
plot(x,y,pch=16,col="blue",cex=0.1)

- This plot shows the position of the asteroids in their orbits around
the Sun. As in the histogram, we see that the highest density of
asteroids lies between roughly 1 and 5 AU from the Sun, with a peak
around 2.5 AU and a small local maximum at around 5 AU.
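The conversion used for this plot (degrees to radians, then polar to Cartesian coordinates) can be checked on a few values. The numbers below are hypothetical and chosen only so the result is easy to verify by hand.

```r
# Hypothetical values for illustration: radii in AU, angles in degrees.
r.ex <- c(1, 2.5, 5)
w.ex <- c(0, 90, 180)

omega.ex <- 2 * pi * (w.ex / 360)  # degrees to radians
x.ex <- r.ex * cos(omega.ex)
y.ex <- r.ex * sin(omega.ex)

# The distance from the origin should reproduce the original radii.
sqrt(x.ex^2 + y.ex^2)
```

Because the distance from the origin equals r for every point, the radial density seen in the scatter plot should agree with the histogram of r.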
Problem IV.5
