This is a script to compare strength of Redwoods against other trees and to display that data visually, based on CSV data.
First – for nice graphs, let’s use the ggplot2 and gpairs packages.
require(ggplot2)
## Loading required package: ggplot2
require(gpairs)
## Loading required package: gpairs
Next, input the CSV file. The na.string arguement drops 0 and - values. This will help us work with the data by removing meaningless entries from the start. Let’s further clean the data by creating a subset of the dataframe to remove trees without groups (null in the Tree.Group header) and 0’s from the na header. Otherwise, we would have some trees grouped that are not related and with no label or R would return an error for no-finite value (the NA’s), but still complete the qplot command:
my.Trees<- read.csv("trees.csv",
header = TRUE,
na.string = c("-","0"))
named.Trees <- subset(my.Trees, my.Trees$Tree.Group != "null")
named.Trees <- subset(named.Trees, named.Trees$Strength != "NA")
Now let’s take a look at the dataset we’re dealing with. This has the added benefit of displaying header names, which is required for manipulating the dataframe.
head(named.Trees)
## Tree.Species Tree.Group SpecG Modulus Break.Height Compression
## 2 Ash, Black Ash 0.49 1.60 35 5970
## 3 Ash, Blue Ash 0.58 1.40 NA 6980
## 4 Ash, Green Ash 0.56 1.66 32 7080
## 5 Ash, Oregon Ash 0.55 1.36 33 6040
## 6 Ash, White Ash 0.60 1.74 43 7410
## 7 Aspen, Bigtooth Aspen 0.39 1.43 NA 5300
## Compress.perp Strength
## 2 760 1570
## 3 1420 2030
## 4 1310 1910
## 5 1250 1790
## 6 1160 1910
## 7 450 1080
summary(named.Trees)
## Tree.Species Tree.Group SpecG Modulus
## Ash, Black : 1 Oak :17 Min. :0.31 Min. :0.80
## Ash, Blue : 1 Pine :15 1st Qu.:0.40 1st Qu.:1.35
## Ash, Green : 1 Fir : 7 Median :0.49 Median :1.57
## Ash, Oregon : 1 Cedar : 6 Mean :0.51 Mean :1.57
## Ash, White : 1 Ash : 5 3rd Qu.:0.63 3rd Qu.:1.77
## Aspen, Bigtooth: 1 Hickory: 5 Max. :0.88 Max. :2.28
## (Other) :79 (Other):30
## Break.Height Compression Compress.perp Strength
## Min. :12.0 Min. :3960 Min. : 300 Min. : 790
## 1st Qu.:22.8 1st Qu.:5410 1st Qu.: 550 1st Qu.:1110
## Median :30.5 Median :6250 Median : 770 Median :1390
## Mean :33.2 Mean :6399 Mean : 860 Mean :1480
## 3rd Qu.:41.0 3rd Qu.:7200 3rd Qu.:1060 3rd Qu.:1850
## Max. :88.0 Max. :9210 Max. :2840 Max. :2660
## NA's :9
quantile(named.Trees$Strength)
## 0% 25% 50% 75% 100%
## 790 1110 1390 1850 2660
Now we need to determine the mean, median, and quartile ranges to add value to the plots. The output of the quantile command can be indexed. In R, indexes start at 1, not 0.
grand.mean = mean(named.Trees$Strength)
median = median(named.Trees$Strength)
LQ = quantile(named.Trees$Strength)[2]
UQ = quantile(named.Trees$Strength)[4]
Create subset for redwood and plot the strength against the entire dataset to see how redwood strength compares.
my.Redwoods <- subset(named.Trees, Tree.Group == "Redwood")
boxplot(my.Redwoods$Strength, named.Trees$Strength,
names = c("Redwoods", "All trees"),
xlab = "Tree type", ylab = "Strength [psi]",
col = "darkgreen",
main = "Comparison of Redwood vs Aggregate Tree Strength",
las = 2)
abline(h = grand.mean, col = "red", lty = 2)
abline(h = LQ, col = "blue", lty = 4)
abline(h = UQ, col = "blue", lty = 4)
As good practice, let’s also produce a summary
tree.Summary <- list(summary(my.Redwoods$Strength),
summary(my.Trees$Strength))
names(tree.Summary) <- c("Redwoods stength [psi]", "All trees strength [psi]")
tree.Summary
## $`Redwoods stength [psi]`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 940 982 1020 1020 1070 1110
##
## $`All trees strength [psi]`
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 790 1130 1390 1470 1820 2660 5
As you can see, Redwoods are relatively soft, below both IQR, grand mean, and median strength.
Perhaps there are other species of trees that are soft like redwood. Let’s see all the strength of all the trees by species.
q <- qplot(Tree.Group, Strength,
data = named.Trees,
geom = "boxplot",
main = "Tree strength by species",
xlab = "Tree Species", ylab = "Strength [psi]")
tree.average <- c(my.Trees$Strength)
q + geom_hline(yintercept=grand.mean,
colour = "red",
linetype = 2,
show_guide = TRUE) +
geom_hline(yintercept=LQ,
colour = "blue",
linetype = 4,
show_guide = TRUE) +
geom_hline(yintercept=UQ,
colour = "blue",
linetype = 4,
show_guide = TRUE) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
But perhaps there are other mechanical benefits to Redwood compared to the data set. Let’s see how the other material properties compare to strength, which we know that Redwoods are lacking, compared to the dataset.
Let’s create a new dataframe with the material properties and do a regression. Regressions against other data, as displayed with the head() command, would be meaningless. The best way to do this is to create a new dataframe by combining the columns of interest from the original dataframe.
trees.Properties <- data.frame(my.Trees[c("SpecG","Modulus", "Break.Height", "Compression", "Strength")])
gpairs(trees.Properties)
We can observe general linear trends with the regressions, but let’s get more of a quantitative feel for the data.
SpecG.lm = lm(Strength ~ SpecG, data = trees.Properties)
Modulus.lm = lm(Strength ~ Modulus, data = trees.Properties)
Break.Height.lm = lm(Strength ~ Break.Height, data = trees.Properties)
Compression.lm = lm(Strength ~ Compression, data = trees.Properties)
Calling the summary function gives a detailed list of statitics… let’s take a look at the coefficients of determination by calling the r.squared attribute:
summary(SpecG.lm)$r.squared
## [1] 0.8174
summary(Modulus.lm)$r.squared
## [1] 0.3855
summary(Break.Height.lm)$r.squared
## [1] 0.6362
summary(Compression.lm)$r.squared
## [1] 0.5999
As we can see, althought there is a general linear relationship, strength is not always the best predictor of other mechanical properties. To determine how redwoods or other species compare, a treatment similar to the strength analysis may be used.