Density and spatial analysis of Arctostaphylos leaves in southern Arizona

This Markdown analyzes field data collected by Matthew Hecking for the Water and Life in the American Southwest course at Hampshire College. Data was collected from March 17-23, 2014 in southern Arizona.


Stomatal Density

This study has considered and studied various morphological traits that were hypothesized to vary with restrictions in water availability. In this section, we will be analyzing the relative density of stomata located on the adaxial and abaxial sides of Arctostaphylos leaves. Based upon pre-existing literature, I hypothesized that there would be a significant positive correlation between stomatal density and elevation (i.e, more stomata as elevation increases)

To determine whether this is true or not, we must preform several linear models to test for significance within the data, and plot the information to see if the graph matches what our model predicts. The first step in this process is entering data into the environment of R studio (after importing the data set as a csv) by using the code:


DENUP <- read.csv("/Users/matthewhecking/Documents/Water and Life in American SW/Stomate Density Results/Density3.csv")

After this, you can check that the data is correctly in R by using the command:


head(DENUP)
##   Site Leaf Replicate Density Elevation Mountain
## 1    1    1         1      49      5200    Grahm
## 2    1    1         2      49      5200    Grahm
## 3    1    1         3      45      5200    Grahm
## 4    1    2         1      45      5200    Grahm
## 5    1    2         2      42      5200    Grahm
## 6    1    2         3      42      5200    Grahm

Because the data is valid, we can begin to now manipulate it. To begin, we attach the data by using the code:


attach(DENUP)

Now that the data is attached, we can use it (make sure you detach any previous file by using the command –> detach(DENUP)

Data Analysis

With the data now entered, we can begin to preform linear models, the most simple of which is a general linear model. The code for a general liner model is:

fit = glm(DENUP)
summary(fit)
## 
## Call:
## glm(formula = DENUP)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.6478  -0.3336   0.0315   0.2830   0.6554  
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      6.22e-01   4.42e-01    1.41    0.163    
## Leaf             3.76e-01   8.85e-03   42.52   <2e-16 ***
## Replicate       -5.68e-04   4.56e-02   -0.01    0.990    
## Density          1.65e-03   2.68e-03    0.62    0.540    
## Elevation       -1.09e-04   7.61e-05   -1.44    0.155    
## MountainCochise -2.58e-01   1.61e-01   -1.60    0.112    
## MountainGrahm    2.44e-02   1.80e-01    0.14    0.892    
## MountainMadera  -5.49e-01   1.86e-01   -2.96    0.004 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1332)
## 
##     Null deviance: 1082.625  on 95  degrees of freedom
## Residual deviance:   11.726  on 88  degrees of freedom
## AIC: 88.59
## 
## Number of Fisher Scoring iterations: 2

This model however isn't very useful, as several items in the data set are accidentally skewing the results (the leaf data points, for example, have a p value of 2 e-16, which does not make any sense). This tells us that to get a better understanding of the true significance of the data, we must refine the model by telling R which variables should be tested for significance and which are only factors. This can be be preformed with another linear model of an analysis of variance (which is a similar test), the code for which is as follows:

fit = aov(lm(Density ~ Elevation))
summary(fit)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## Elevation    1   5663    5663    21.7 1.1e-05 ***
## Residuals   94  24544     261                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This model is specifically looking at the interaction between density and elevation, and as we run the model, the information given shows us that elevation is highly significant when compared with stomatal density. We can look at the data further by including the other variables, by writing the code:

fit = aov(lm(Density ~ Elevation + Site + Leaf + Replicate))
summary(fit)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## Elevation    1   5663    5663   22.70 7.1e-06 ***
## Site         1    977     977    3.92   0.051 .  
## Leaf         1    852     852    3.41   0.068 .  
## Replicate    1      8       8    0.03   0.862    
## Residuals   91  22708     250                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This shows that there the factors “Site”, “Leaf” are slightly significant (although above the P threshold of .05), and Replicate is basically random while Elevation remains highly significant. We can also see this relationship through a graph by writing the following:

boxplot(data = DENUP, Density ~ Elevation, xlab = "Elevation (feet)", ylab = "Stomatal Density per .92 square cm", 
    main = "adaxial stomatal density compared with elevation")

plot of chunk unnamed-chunk-7

This provides us with a visual reference for the data, which we can look at and compare to the observations provided by the linear models. This specific graph is plotting the adaxial (bottom side of the leaf) stomatal density of all sample leaves in correlation with elevation. However, we can also look at the averages of the leaves instead of looking at them individually, by imputing the average densities into R

DENUPALL <- read.csv("/Users/matthewhecking/Documents/Water and Life in American SW/Stomate Density Results/Density1.csv")

and redoing the existing steps:

##             Df Sum Sq Mean Sq F value Pr(>F)   
## Elevation    1   1669    1669    10.7 0.0027 **
## Residuals   30   4697     157                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot of chunk unnamed-chunk-9

You can also compare these values to the abaxial stomatal densities recorded

##             Df Sum Sq Mean Sq F value  Pr(>F)    
## Elevation    1   6098    6098      15 0.00021 ***
## Residuals   88  35823     407                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 6 observations deleted due to missingness

and see the graphs generated

plot of chunk unnamed-chunk-11

## 
## Call:
## glm(formula = Density ~ Elevation)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -32.00  -13.30   -4.08   11.22   53.99  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -14.56760   30.62916   -0.48    0.638  
## Elevation     0.01263    0.00532    2.38    0.025 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 412)
## 
##     Null deviance: 13863  on 29  degrees of freedom
## Residual deviance: 11537  on 28  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 269.7
## 
## Number of Fisher Scoring iterations: 2

plot of chunk unnamed-chunk-12

However, by observing the data in a graphical form, it seems clear that the correlation between stomatal density and elevation is not the most strong correlation (In fact, it appears to be quite weak!). This directly opposes the linear model given before, which makes us question where the significance of the model is coming from.

It is possible that the significance is simply being created by a small number of individuals, or by looking at the extremes in density recorded, or that the data is being affected by factors outside of the model. The easiest of these to examine is to re-evaluate the data and experiment with possible factors. To begin, I considered the last of these assumptions, and entered an additional factor: the mountain the samples were taken from:

##             Df Sum Sq Mean Sq F value Pr(>F)  
## Elevation    1   1464    1464    6.13  0.016 *
## Residuals   55  13142     239                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot of chunk unnamed-chunk-13

By looking at a sub-section of the data, which all came from the Mount Grahm sampling site, we can see that although the p value of the data is less significant, the overall chart agrees with the model provided

##   Site Leaf Replicate Density Elevation Mountain
## 1    9   23         1      64      4986  Cochise
## 2    9   23         2      57      4986  Cochise
## 3    9   23         3      59      4986  Cochise
## 4    9   24         1      53      4986  Cochise
## 5    9   24         2      42      4986  Cochise
## 6    9   24         3      45      4986  Cochise
## 
## Call:
## glm(formula = Density ~ Elevation + Site + Leaf + Replicate)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -16.11   -8.00    1.11    5.60   16.47  
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -579.3410   315.3331   -1.84    0.087 .
## Elevation      0.1301     0.0504    2.58    0.022 *
## Site               NA         NA      NA       NA  
## Leaf          -0.1667     2.9826   -0.06    0.956  
## Replicate     -3.5833     2.9826   -1.20    0.250  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 106.7)
## 
##     Null deviance: 4875.6  on 17  degrees of freedom
## Residual deviance: 1494.5  on 14  degrees of freedom
## AIC: 140.6
## 
## Number of Fisher Scoring iterations: 2
## Warning: only using the first two of 5 regression coefficients

plot of chunk unnamed-chunk-14

This data, coming from the Cochise sampling site, also shows no odd decrease in density as elevation increases. After seeing evidence that the mountain the samples came from appears to be significant for this data set, we can compare mountains to one another, as seen below:

plot of chunk unnamed-chunk-15

Although many of these sites were sampled at the same elevation, we can see that there is significant variation simply because of the mountain the samples came from. This also shows the sample size was less than optimal, and should have been increased to get a better understanding of how these sites differ.

Leaf Density

In addition to recording stomatal density, this study also looked at leaf density (leaf area/weight) to see if there is any correlation with elevation. the hypothesis made was that because a higher surface area would allow to more evaporation, the lower in elevation the sample was taken, the lower the relative surface area of the leaves and thus a higher leaf density.

As with the earlier examples the first step is entering the data into R and running regression models to identify any significance within the data set.

##   Site Leaves  Area Weight Elevation Density
## 1    1      1 3.590 0.1655      5200 0.04610
## 2    1      1 3.229 0.1569      5200 0.04859
## 3    1      1 2.761 0.1372      5200 0.04969
## 4    1      1 2.349 0.1084      5200 0.04615
## 5    1      2 1.926 0.0599      5200 0.03110
## 6    1      2 2.415 0.0686      5200 0.02841
##              Df  Sum Sq Mean Sq F value Pr(>F)    
## Elevation     1 0.00074 0.00074    10.9 0.0012 ** 
## Weight        1 0.00599 0.00599    88.1 <2e-16 ***
## Area          1 0.01208 0.01208   177.8 <2e-16 ***
## Residuals   142 0.00965 0.00007                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This initial anova suggests a significant correlation with elevation, however the addition of the area and weight into the model could possibly be interfering with the p value for elevation interaction. By eliminating these variables from the equation:

##              Df  Sum Sq  Mean Sq F value Pr(>F)  
## Elevation     1 0.00074 0.000739    3.84  0.052 .
## Residuals   144 0.02772 0.000193                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot of chunk unnamed-chunk-17

we see that there is a much higher p value. However, this p value might also be influenced by outliers in the data, such as the one sample taken from around 5000 feet in the last graph. by omitting this using the following lines:

LeafDensityNoOutlier <- LeafDensity[-139, ]
fit = aov(data = LeafDensityNoOutlier, Density ~ Elevation)
summary(fit)
##              Df  Sum Sq  Mean Sq F value Pr(>F)  
## Elevation     1 0.00025 2.49e-04    2.81  0.096 .
## Residuals   143 0.01270 8.88e-05                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(data = LeafDensityNoOutlier, Density ~ Elevation)
co <- coef(fit)
abline(fit, lwd = 2)

plot of chunk unnamed-chunk-18

we see the p value increase again. Although there still appears to be a very slight influence that matches the hypothesis, the data set is either to small to show its significance, or the data simply has no trend.

In terms of the correlation between leaf area and weight however, we saw a highly correlation between the two, which was expected and shows our sampling technique for leaf density worked.

fit = aov(lm(Weight ~ Area + Elevation + Density))
summary(fit)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Area          1 0.0420  0.0420  160.52 <2e-16 ***
## Elevation     1 0.0007  0.0007    2.58   0.11    
## Density       1 0.0603  0.0603  230.62 <2e-16 ***
## Residuals   142 0.0371  0.0003                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(Weight ~ Area, xlab = "Area of leaf (cm^2)", ylab = "weight of leaf(gram)")

plot of chunk unnamed-chunk-19