April 09, 2025

Introduction

The Loblolly dataset, which is built into R, is of some personal interest to me. I have lived in Texas all of my life, including fourteen years in east Texas on a beautiful four-acre wooded homestead where Loblolly pine was a common feature. Loblolly pine is the most common pine tree in east Texas, and it is the most common of all the Southern Pine species. These trees have an average lifespan of about 150 years and typically reach 60 to 90 feet at maturity, though some grow to around 125 feet.

Gary Starts a Lumber Company

Let us imagine that I run a fictitious start-up lumber company, say Gary’s Unparalleled Lumber Products, or GULP. We plan to harvest Loblolly pine stands on a 25-year rotation. Some of the trees will be used for pulp products, and others will be milled for lumber. For our company to operate profitably, we have determined that the trees must reach an average height of at least 55 feet by harvest. We therefore want to find out whether the mean height of Loblolly pine at 25 years of age is at least 55 feet.

Estimating the Average Tree Height

We seek to estimate the mean height of Loblolly pines at 25 years of age. Because we do not know the variance of the whole population of Loblolly pines, we will employ the t-test for our hypothesis test.

To perform a t-test (we will do this “manually” the first time), we will need to know the mean \(\bar{x}\) and standard deviation \(s\) of our sample data, the number of observations \(n\), our chosen significance level \(\alpha\), and our hypothesized value for the mean, \(\mu_{0}\). We will also need to calculate our test statistic, \(t_{0}\).

The mean height of 25-year-old trees in this dataset is \(\bar{x}\) = 60.2892857, and the sample standard deviation of heights in our dataset is \(s\) = 2.2688339. The number of observations \(n\) = 14.
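The summary values above can be reproduced with a few lines of R. This is a minimal sketch; it assumes that `age25` is simply the subset of the built-in Loblolly data measured at age 25, and the names `x_bar`, `s`, and `n` are chosen here to match the symbols on the slide.

```r
# Loblolly ships with base R (package 'datasets'), so no extra packages are needed.
# Assumption: age25 holds only the rows measured at 25 years of age.
age25 <- subset(Loblolly, age == 25)

x_bar <- mean(age25$height)  # sample mean
s     <- sd(age25$height)    # sample standard deviation
n     <- nrow(age25)         # number of observations

c(mean = x_bar, sd = s, n = n)
```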

Preparing for Computations

For our computations, we have the following values:

\(\hspace{20pt}\mu_0 = 55.0\)
\(\hspace{20pt}\bar x = 60.2892857\)
\(\hspace{20pt}s = 2.2688339\)
\(\hspace{20pt}n = 14\)
\(\hspace{20pt}\alpha = 0.05\)

And now we compute our test statistic \(t_0\):

\(t_0 = \frac{\bar x - \mu_0}{s / \sqrt{n}} = \frac{60.2892857 - 55.0}{2.2688339 / \sqrt{14}} = 8.7228489\)
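The same arithmetic can be checked in R. The sketch below plugs in the rounded values listed on this slide, so the last digits may differ slightly from a calculation on the raw data; the variable names are illustrative.

```r
# Values copied from the slide (rounded to seven decimal places)
mu_0  <- 55.0
x_bar <- 60.2892857
s     <- 2.2688339
n     <- 14

# Test statistic: distance of the sample mean from mu_0, in standard errors
std_err <- s / sqrt(n)
t_0     <- (x_bar - mu_0) / std_err
t_0
```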

Hypothesis Test

We test the null hypothesis that \(\mu\) equals 55.0 against the alternative hypothesis that \(\mu\) is greater than 55.0, and we calculate the P-value for this one-sided test.

\(\hspace{20pt}H_0:\mu=55.0\)
\(\hspace{20pt}H_a:\mu>55.0\)

\(P\text{-value} = 1-\Phi(t_0)\), where \(\Phi(t_0)\) is the cumulative probability of \(t_0\) taken from the \(t(n-1)\) distribution, i.e., the \(t\) distribution with \(n-1\) degrees of freedom. So our P-value is

\(\hspace{20pt}P\text{-value} = 1-\Phi(8.7228489) = 4.2837047\times 10^{-7}\).

Since \(P\text{-value} \ll \alpha = 0.05\), we reject the null hypothesis and conclude that the mean height of Loblolly pines at 25 years of age is greater than 55.0 feet.
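The upper-tail probability can be computed directly with `pt()`, the cumulative distribution function of the \(t\) distribution in R. This is a small sketch that starts from the rounded test statistic shown above; `p_val` and `reject_h0` are names introduced here for illustration.

```r
t_0   <- 8.7228489  # test statistic from the manual calculation
n     <- 14         # number of observations
alpha <- 0.05       # significance level

# P(T > t_0) for T ~ t(n - 1); lower.tail = FALSE returns 1 minus the cumulative probability
p_val <- pt(t_0, df = n - 1, lower.tail = FALSE)
p_val

# Decision rule for the one-sided test
reject_h0 <- p_val < alpha
reject_h0
```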

Why Are We Doing This Manually?

We can also have the R software compute the t-test for us.

age25 <- subset(Loblolly, age == 25)  # assumed definition: rows measured at age 25
t.test(age25$height, alternative = "greater", mu = 55.0, conf.level = 0.95)
    One Sample t-test

data:  age25$height
t = 8.7228, df = 13, p-value = 4.284e-07
alternative hypothesis: true mean is greater than 55
95 percent confidence interval:
 59.21544      Inf
sample estimates:
mean of x 
 60.28929 
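The one-sided confidence interval in this output can be tied back to the manual quantities: its lower bound is \(\bar{x} - t_{0.95,\,n-1}\, s/\sqrt{n}\). The sketch below assumes, as before, that `age25` is the age-25 subset of Loblolly, and compares a hand-computed bound with the one stored in the object returned by `t.test()`.

```r
# Assumption: age25 holds only the rows measured at 25 years of age.
age25 <- subset(Loblolly, age == 25)

res <- t.test(age25$height, alternative = "greater", mu = 55.0, conf.level = 0.95)

# The pieces of the result can be pulled out by name
res$statistic  # t statistic
res$p.value    # one-sided P-value
res$conf.int   # lower bound and Inf

# Lower 95% confidence bound computed by hand
n     <- length(age25$height)
lower <- mean(age25$height) - qt(0.95, df = n - 1) * sd(age25$height) / sqrt(n)
lower
```

Because the whole interval lies above 55, it leads to the same conclusion as the P-value.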

Here’s How R Does It

The data in the Loblolly dataset is further organized according to seed stock. We would like to see a graph that illustrates tree height by seed group so that we can purchase the right seed from the right sources. Here is the R code that generates the plot on the following slide.

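library(ggplot2)  # provides ggplot(), geom_point(), and the other plotting functions

# Scatter plot of age-25 height for each seed group
ggplot(data = age25) +
  geom_point(mapping = aes(x = Seed, y = height),
             color = "darkgreen", size = 4, shape = 2) +
  ggtitle("Height by Seed Group") +
  labs(x = "Seed Group", y = "Height") +
  theme(plot.title = element_text(hjust = 0.5))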

Loblolly Heights by Seed Group

From an inspection of the plot below, it appears that we should exclude seed from groups 327 and 329, whose trees are the shortest at 25 years of age.

Boxing Match

Here is a boxplot of heights by seed group; the median heights generally rise across the groups.
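The code for this boxplot is not shown on the slide. One way to draw a similar figure is sketched below; it uses base R graphics rather than ggplot2 so that it runs without extra packages, and it assumes the boxplot pools all ages within each seed group. The title, axis labels, and the name `med_by_seed` are illustrative choices.

```r
# Median height per seed group, pooled over all ages (an assumption about the slide's plot)
med_by_seed <- tapply(Loblolly$height, Loblolly$Seed, median)
med_by_seed

# One box per seed group, drawn in the order of the levels of the Seed factor
boxplot(height ~ Seed, data = Loblolly,
        main = "Height by Seed Group",
        xlab = "Seed Group", ylab = "Height",
        col = "darkgreen")
```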

Loblolly Heights by Age and Seed Group

If you zoom in around each grouping, you will find there are 14 seed groups that produce trees of different heights in each 5-year age group.