This Markdown analyzes field data collected by Matthew Hecking for the Water and Life in the American Southwest course at Hampshire College. Data was collected from March 17-23, 2014 in southern Arizona.
This study has considered and studied various morphological traits that were hypothesized to vary with restrictions in water availability. In this section, we will be analyzing the relative density of stomata located on the adaxial and abaxial sides of Arctostaphylos leaves. Based upon pre-existing literature, I hypothesized that there would be a significant positive correlation between stomatal density and elevation (i.e, more stomata as elevation increases)
To determine whether this is true or not, we must preform several linear models to test for significance within the data, and plot the information to see if the graph matches what our model predicts. The first step in this process is entering data into the environment of R studio (after importing the data set as a csv) by using the code:
DENUP <- read.csv("/Users/matthewhecking/Documents/Water and Life in American SW/Stomate Density Results/Density3.csv")
After this, you can check that the data is correctly in R by using the command:
head(DENUP)
## Site Leaf Replicate Density Elevation Mountain
## 1 1 1 1 49 5200 Grahm
## 2 1 1 2 49 5200 Grahm
## 3 1 1 3 45 5200 Grahm
## 4 1 2 1 45 5200 Grahm
## 5 1 2 2 42 5200 Grahm
## 6 1 2 3 42 5200 Grahm
Because the data is valid, we can begin to now manipulate it. To begin, we attach the data by using the code:
attach(DENUP)
Now that the data is attached, we can use it (make sure you detach any previous file by using the command –> detach(DENUP)
With the data now entered, we can begin to preform linear models, the most simple of which is a general linear model. The code for a general liner model is:
fit = glm(DENUP)
summary(fit)
##
## Call:
## glm(formula = DENUP)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6478 -0.3336 0.0315 0.2830 0.6554
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.22e-01 4.42e-01 1.41 0.163
## Leaf 3.76e-01 8.85e-03 42.52 <2e-16 ***
## Replicate -5.68e-04 4.56e-02 -0.01 0.990
## Density 1.65e-03 2.68e-03 0.62 0.540
## Elevation -1.09e-04 7.61e-05 -1.44 0.155
## MountainCochise -2.58e-01 1.61e-01 -1.60 0.112
## MountainGrahm 2.44e-02 1.80e-01 0.14 0.892
## MountainMadera -5.49e-01 1.86e-01 -2.96 0.004 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1332)
##
## Null deviance: 1082.625 on 95 degrees of freedom
## Residual deviance: 11.726 on 88 degrees of freedom
## AIC: 88.59
##
## Number of Fisher Scoring iterations: 2
This model however isn't very useful, as several items in the data set are accidentally skewing the results (the leaf data points, for example, have a p value of 2 e-16, which does not make any sense). This tells us that to get a better understanding of the true significance of the data, we must refine the model by telling R which variables should be tested for significance and which are only factors. This can be be preformed with another linear model of an analysis of variance (which is a similar test), the code for which is as follows:
fit = aov(lm(Density ~ Elevation))
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## Elevation 1 5663 5663 21.7 1.1e-05 ***
## Residuals 94 24544 261
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This model is specifically looking at the interaction between density and elevation, and as we run the model, the information given shows us that elevation is highly significant when compared with stomatal density. We can look at the data further by including the other variables, by writing the code:
fit = aov(lm(Density ~ Elevation + Site + Leaf + Replicate))
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## Elevation 1 5663 5663 22.70 7.1e-06 ***
## Site 1 977 977 3.92 0.051 .
## Leaf 1 852 852 3.41 0.068 .
## Replicate 1 8 8 0.03 0.862
## Residuals 91 22708 250
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This shows that there the factors “Site”, “Leaf” are slightly significant (although above the P threshold of .05), and Replicate is basically random while Elevation remains highly significant. We can also see this relationship through a graph by writing the following:
boxplot(data = DENUP, Density ~ Elevation, xlab = "Elevation (feet)", ylab = "Stomatal Density per .92 square cm",
main = "adaxial stomatal density compared with elevation")
This provides us with a visual reference for the data, which we can look at and compare to the observations provided by the linear models. This specific graph is plotting the adaxial (bottom side of the leaf) stomatal density of all sample leaves in correlation with elevation. However, we can also look at the averages of the leaves instead of looking at them individually, by imputing the average densities into R
DENUPALL <- read.csv("/Users/matthewhecking/Documents/Water and Life in American SW/Stomate Density Results/Density1.csv")
and redoing the existing steps:
## Df Sum Sq Mean Sq F value Pr(>F)
## Elevation 1 1669 1669 10.7 0.0027 **
## Residuals 30 4697 157
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You can also compare these values to the abaxial stomatal densities recorded
## Df Sum Sq Mean Sq F value Pr(>F)
## Elevation 1 6098 6098 15 0.00021 ***
## Residuals 88 35823 407
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 6 observations deleted due to missingness
and see the graphs generated
##
## Call:
## glm(formula = Density ~ Elevation)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -32.00 -13.30 -4.08 11.22 53.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -14.56760 30.62916 -0.48 0.638
## Elevation 0.01263 0.00532 2.38 0.025 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 412)
##
## Null deviance: 13863 on 29 degrees of freedom
## Residual deviance: 11537 on 28 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 269.7
##
## Number of Fisher Scoring iterations: 2
However, by observing the data in a graphical form, it seems clear that the correlation between stomatal density and elevation is not the most strong correlation (In fact, it appears to be quite weak!). This directly opposes the linear model given before, which makes us question where the significance of the model is coming from.
It is possible that the significance is simply being created by a small number of individuals, or by looking at the extremes in density recorded, or that the data is being affected by factors outside of the model. The easiest of these to examine is to re-evaluate the data and experiment with possible factors. To begin, I considered the last of these assumptions, and entered an additional factor: the mountain the samples were taken from:
## Df Sum Sq Mean Sq F value Pr(>F)
## Elevation 1 1464 1464 6.13 0.016 *
## Residuals 55 13142 239
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By looking at a sub-section of the data, which all came from the Mount Grahm sampling site, we can see that although the p value of the data is less significant, the overall chart agrees with the model provided
## Site Leaf Replicate Density Elevation Mountain
## 1 9 23 1 64 4986 Cochise
## 2 9 23 2 57 4986 Cochise
## 3 9 23 3 59 4986 Cochise
## 4 9 24 1 53 4986 Cochise
## 5 9 24 2 42 4986 Cochise
## 6 9 24 3 45 4986 Cochise
##
## Call:
## glm(formula = Density ~ Elevation + Site + Leaf + Replicate)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -16.11 -8.00 1.11 5.60 16.47
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -579.3410 315.3331 -1.84 0.087 .
## Elevation 0.1301 0.0504 2.58 0.022 *
## Site NA NA NA NA
## Leaf -0.1667 2.9826 -0.06 0.956
## Replicate -3.5833 2.9826 -1.20 0.250
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 106.7)
##
## Null deviance: 4875.6 on 17 degrees of freedom
## Residual deviance: 1494.5 on 14 degrees of freedom
## AIC: 140.6
##
## Number of Fisher Scoring iterations: 2
## Warning: only using the first two of 5 regression coefficients
This data, coming from the Cochise sampling site, also shows no odd decrease in density as elevation increases. After seeing evidence that the mountain the samples came from appears to be significant for this data set, we can compare mountains to one another, as seen below:
Although many of these sites were sampled at the same elevation, we can see that there is significant variation simply because of the mountain the samples came from. This also shows the sample size was less than optimal, and should have been increased to get a better understanding of how these sites differ.
In addition to recording stomatal density, this study also looked at leaf density (leaf area/weight) to see if there is any correlation with elevation. the hypothesis made was that because a higher surface area would allow to more evaporation, the lower in elevation the sample was taken, the lower the relative surface area of the leaves and thus a higher leaf density.
As with the earlier examples the first step is entering the data into R and running regression models to identify any significance within the data set.
## Site Leaves Area Weight Elevation Density
## 1 1 1 3.590 0.1655 5200 0.04610
## 2 1 1 3.229 0.1569 5200 0.04859
## 3 1 1 2.761 0.1372 5200 0.04969
## 4 1 1 2.349 0.1084 5200 0.04615
## 5 1 2 1.926 0.0599 5200 0.03110
## 6 1 2 2.415 0.0686 5200 0.02841
## Df Sum Sq Mean Sq F value Pr(>F)
## Elevation 1 0.00074 0.00074 10.9 0.0012 **
## Weight 1 0.00599 0.00599 88.1 <2e-16 ***
## Area 1 0.01208 0.01208 177.8 <2e-16 ***
## Residuals 142 0.00965 0.00007
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This initial anova suggests a significant correlation with elevation, however the addition of the area and weight into the model could possibly be interfering with the p value for elevation interaction. By eliminating these variables from the equation:
## Df Sum Sq Mean Sq F value Pr(>F)
## Elevation 1 0.00074 0.000739 3.84 0.052 .
## Residuals 144 0.02772 0.000193
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
we see that there is a much higher p value. However, this p value might also be influenced by outliers in the data, such as the one sample taken from around 5000 feet in the last graph. by omitting this using the following lines:
LeafDensityNoOutlier <- LeafDensity[-139, ]
fit = aov(data = LeafDensityNoOutlier, Density ~ Elevation)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## Elevation 1 0.00025 2.49e-04 2.81 0.096 .
## Residuals 143 0.01270 8.88e-05
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(data = LeafDensityNoOutlier, Density ~ Elevation)
co <- coef(fit)
abline(fit, lwd = 2)
we see the p value increase again. Although there still appears to be a very slight influence that matches the hypothesis, the data set is either to small to show its significance, or the data simply has no trend.
In terms of the correlation between leaf area and weight however, we saw a highly correlation between the two, which was expected and shows our sampling technique for leaf density worked.
fit = aov(lm(Weight ~ Area + Elevation + Density))
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## Area 1 0.0420 0.0420 160.52 <2e-16 ***
## Elevation 1 0.0007 0.0007 2.58 0.11
## Density 1 0.0603 0.0603 230.62 <2e-16 ***
## Residuals 142 0.0371 0.0003
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(Weight ~ Area, xlab = "Area of leaf (cm^2)", ylab = "weight of leaf(gram)")