This document summarizes our attempts to develop allometric equations for Phragmites australis and Typha x glauca in our study area. Work is shown, results are explained where possible, and findings are summarized at the end.
I selected all Phragmites data for 2019, as all basins were pre-treatment at the time of data collection, but I used only the control plots from 2020. This arrangement provides the greatest number of untreated samples for analysis.
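Note: main.df is assumed to have been loaded earlier in the document. The analysis also relies on several packages that were presumably attached in a setup chunk not shown here; judging from the functions used below (ncvTest, crPlots, melt, ddply, separate, %>%, select), that setup likely looks something like:
# assumed setup chunk (not shown in the original document)
library(car)       # ncvTest(), crPlots()
library(reshape2)  # melt()
library(plyr)      # ddply()
library(dplyr)     # %>% and select()
library(tidyr)     # separate()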
phrag <- subset(main.df, species == "phrag" &
((date == "2020" & treatment == "control") |
(date == "2019")))
summary(phrag)
linear model and QQ plots:
# building linear model
LMphrag <- lm(DryWeight_g ~ HeightClass, data = phrag)
# plot linear model residuals
qqnorm(LMphrag$residuals)
qqline(LMphrag$residuals)
Shapiro-Wilk Test
Ho: Data are drawn from a normal distribution (fail to reject when p > 0.001)
Ha: Data are NOT drawn from a normal distribution
shapiro.test(LMphrag$residuals)
##
## Shapiro-Wilk normality test
##
## data: LMphrag$residuals
## W = 0.9614, p-value = 0.0004067
The QQ plot is unconvincing and our Shapiro-Wilk test generated a p-value below 0.001. Data are not normally distributed and a transformation is required.
I hid the output to keep the section tidy, but p-values are listed under each transformation test.
LOGphrag <- lm(log10(DryWeight_g) ~ HeightClass, data = phrag) # lm with log10 transformation
SQRTphrag <- lm(sqrt(DryWeight_g) ~ HeightClass, data = phrag) # lm with sqrt transformation
X2phrag <- lm((DryWeight_g)^2 ~ HeightClass, data = phrag) # lm with x^2 transformation
shapiro.test(LOGphrag$residuals)
# p = 0.194, Ho accepted, DATA NORMAL
shapiro.test(SQRTphrag$residuals)
# p = 0.002986, Ho accepted, DATA NORMAL
shapiro.test(X2phrag$residuals)
# p = 0.000000002497, Ho rejected, data not normal
Of the three tested data transformations, both the log10 transformation and the square root transformation passed the Shapiro-Wilk Test.
Here are their QQ plots. The log10 transformation has a stronger QQ plot, but other tests may give a better idea of which transformation to choose.
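The plots can be reproduced with the same qqnorm/qqline pattern used elsewhere in this document:
qqnorm(LOGphrag$residuals, main = "log10 transformation")
qqline(LOGphrag$residuals)
qqnorm(SQRTphrag$residuals, main = "square root transformation")
qqline(SQRTphrag$residuals)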
Now that two sets of data meet normality assumptions, we can run equal variance tests of residuals.
plot(LOGphrag, 1, main = "Residuals vs Fitted, log10 transformation")
plot(SQRTphrag, 1, main = "Residuals vs Fitted, square root transformation")
The square root transformation is more convincing here.
# Linearity Test
crPlots(LOGphrag, main = "log10 transformation")
crPlots(SQRTphrag, main = "square root transformation")
# Outlier Test
plot(LOGphrag, 5, main = "log10 transformation")
plot(SQRTphrag, 5, main = "square root transformation")
For the non-constant variance test, the hypotheses are:
Ho: Data have constant variance (fail to reject when p > 0.05)
Ha: Data have non-constant variance
ncvTest(LOGphrag)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 5.870782, Df = 1, p = 0.015394
ncvTest(SQRTphrag)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 3.889406, Df = 1, p = 0.048592
The log10 transformation does not have constant variance (p = 0.015394).
The square root transformation is borderline: its p-value of 0.048592 sits just below the 0.05 threshold. If we treat that as close enough, we meet the constant-variance assumption, and the square root model's residual plots are more convincing on the whole. Are we okay with this?
summary(SQRTphrag)
##
## Call:
## lm(formula = sqrt(DryWeight_g) ~ HeightClass, data = phrag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4977 -0.5752 -0.1201 0.4342 2.3289
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.093546 0.309873 -3.529 0.00056 ***
## HeightClass 0.112853 0.005217 21.631 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8263 on 144 degrees of freedom
## Multiple R-squared: 0.7647, Adjusted R-squared: 0.763
## F-statistic: 467.9 on 1 and 144 DF, p-value: < 2.2e-16
Based on the summary, the y-intercept is -1.093546 and the slope is 0.112853, so our allometric equation is y = 0.112853x - 1.093546, where x is height class and y is the square root of dry weight in grams (predictions must be squared to return to grams).
The following code can determine our root mean squared error:
RSS <- c(crossprod(SQRTphrag$residuals))
MSE <- RSS / length(SQRTphrag$residuals)
RMSE <- sqrt(MSE)
RMSE # ~68% of observed values fall within the prediction +/- RMSE
## [1] 0.8206188
RMSE * 2 # ~95% of observed values fall within the prediction +/- 2*RMSE
## [1] 1.641238
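Because the model predicts the square root of dry weight, both the prediction and its error bounds have to be squared to return to grams. A minimal sketch, using a hypothetical height class (not a field value) and one possible way to carry the RMSE through the back-transformation:
# hypothetical height class, for illustration only
h <- 50
pred <- 0.112853 * h - 1.093546                 # prediction on the square-root scale
pred^2                                          # estimated dry weight, grams
c(lower = pred - RMSE, upper = pred + RMSE)^2   # approximate 68% bounds, back-transformed to grams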
Bringing in data collected from the field:
field.df <- read.csv("Sam_Counts_Cleaning.csv")
field.df$year <- as.factor(field.df$year)
field.df$basin <- as.factor(field.df$basin)
This data set is not in a format that is easy to analyze, so the next chunk of code prepares data for manipulation and analysis.
BasinPlots <- melt(field.df, id = c("date", "year", "basin", "plot", "notes"))
BasinPlots$value <- as.numeric(BasinPlots$value)
# clean up data frame, remove NA values
analysis <- subset(BasinPlots, (variable != "phrag_sum") & # drop the rows melted from the *_sum summary columns ("!=" means "is not equal to")
(variable != "typha_sum") &
(variable != "inf_sum"))
analysis <- na.omit(analysis) # removes NA values from subset
# create appropriate columns
TollwayDB <- analysis %>% separate(variable, c("species", "height_class")) # breaks apart the species and height class
TollwayDB$species <- as.factor(TollwayDB$species) # reminds R that species is a factor
TollwayDB$height_class <- as.numeric(TollwayDB$height_class)
TollwayDB$weighted_height <- TollwayDB$height_class * TollwayDB$value
# will divide this column from stem_total below to get mean height per plot
head(TollwayDB)
## date year basin plot notes species height_class
## 1507 8.14.19 2019 183 phrag_center phrag 7
## 1730 8.13.20 2020 140 phrag_240 phrag 8
## 1883 8.13.20 2020 140 phrag_120 32: either 1 or 2 phrag 9
## 1978 8.22.19 2019 236 phrag_240 phrag 10
## 2002 8.22.19 2019 236 phrag_center phrag 10
## 2056 8.13.20 2020 140 phrag_240 phrag 10
## value weighted_height
## 1507 1 7
## 1730 1 8
## 1883 1 9
## 1978 2 20
## 2002 2 20
## 2056 1 10
# create data frame with the total stem count and summed weighted heights for each plot and species
PlotTotals <- ddply(TollwayDB, c("date", "basin", "plot", "species"), summarize,
stem_total = sum(value, na.rm = T),
sum_weighted = sum(weighted_height, na.rm = T) ) # all heights added together (numerator of mean!)
# reminder: stem_total is a new column made of the sum of values
# create column for mean height of each plot
PlotTotals$mean_height <- PlotTotals$sum_weighted / PlotTotals$stem_total
head(PlotTotals)
## date basin plot species stem_total sum_weighted mean_height
## 1 8.11.20 184 phrag_120 phrag 28 1254 44.78571
## 2 8.11.20 184 phrag_120 typha 21 855 40.71429
## 3 8.11.20 184 phrag_240 phrag 139 3976 28.60432
## 4 8.11.20 184 phrag_240 typha 8 356 44.50000
## 5 8.11.20 184 phrag_360 phrag 140 4175 29.82143
## 6 8.11.20 184 phrag_center phrag 76 2685 35.32895
PlotTotals contains the stem total and mean height of all plots for all species. This “master” dataset will be used for both the Phragmites and Typha estimates.
For Phragmites:
phragPlots <- subset(PlotTotals, species == "phrag")
# subset of all plots in which phragmites is present
# apply the allometric equation (result is on the square-root scale)
phragPlots$biomass_raw <- 0.112853*phragPlots$mean_height - 1.093546
# back-transform by squaring; value is the estimated biomass (g) of a stem of average height in the plot
phragPlots$mean_biomass <- (phragPlots$biomass_raw)^2
# insert column of total estimated biomass in each plot
phragPlots$total_plot_biomass <- phragPlots$mean_biomass * phragPlots$stem_total
head(phragPlots)
## date basin plot species stem_total sum_weighted mean_height
## 1 8.11.20 184 phrag_120 phrag 28 1254 44.78571
## 3 8.11.20 184 phrag_240 phrag 139 3976 28.60432
## 5 8.11.20 184 phrag_360 phrag 140 4175 29.82143
## 6 8.11.20 184 phrag_center phrag 76 2685 35.32895
## 8 8.11.20 184 typha_120 phrag 5 197 39.40000
## 11 8.11.20 184 typha_240 phrag 6 211 35.16667
## biomass_raw mean_biomass total_plot_biomass
## 1 3.960656 15.686798 439.23033
## 3 2.134537 4.556248 633.31846
## 5 2.271892 5.161492 722.60885
## 6 2.893432 8.371947 636.26797
## 8 3.352862 11.241685 56.20842
## 11 2.875118 8.266303 49.59782
# basin-wide biomass per plot estimate
phragBasin <- ddply(phragPlots, c("date", "basin", "species"), summarize,
avg_biomass = mean(total_plot_biomass, na.rm = T))
head(phragBasin)
## date basin species avg_biomass
## 1 8.11.20 184 phrag 369.6386
## 2 8.12.20 28 phrag 618.9667
## 3 8.12.20 32 phrag 304.3385
## 4 8.13.19 184 phrag 1618.6949
## 5 8.13.20 140 phrag 232.8805
## 6 8.13.20 141 phrag 445.4147
We now have the average phrag biomass per m^2 for every basin. To get the total biomass, I pulled in Drew's 2019 acreage estimates from NearMap. I would like to update these values with 2020 aerial imagery.
# import of basin acreage data
acreage.df <- read.csv("basin_species_acreage.csv")
acreage.df$basin <- as.factor(acreage.df$basin)
# summarize total phrag area of each basin
speciesAreas <- ddply(acreage.df, c("species", "basin"), summarize,
total_acreage = sum(acreage, na.rm = T))
# Convert to m^2. Reminder: 1 acre = 4046.86 m^2
speciesAreas$area_in_m2 <- speciesAreas$total_acreage * 4046.86
head(speciesAreas)
## species basin total_acreage area_in_m2
## 1 phrag 28 0.6161340 2493.4080
## 2 phrag 32 0.1072954 434.2095
## 3 phrag 140 0.5493020 2222.9483
## 4 phrag 141 0.8745370 3539.1288
## 5 phrag 153 0.8505230 3441.9475
## 6 phrag 158 1.5082621 6103.7256
Similar to PlotTotals, we can use speciesAreas for both species analyses. The next step is to merge the phrag subset of speciesAreas with the per-basin phragBasin estimates. The merge() worked, but the columns required a little cleanup, and I spot-checked the numbers to make sure they were still accurate afterward.
phragArea <- subset(speciesAreas, species == "phrag")
phragFinal <- merge(phragArea, phragBasin, by = "basin") # puts avg biomass and total m^2 in same dataframe
phragFinal <- select(phragFinal, -c(species.x, species.y, total_acreage)) # drop unnecessary columns
names(phragFinal)
## [1] "basin" "area_in_m2" "date" "avg_biomass"
col_order <- c("basin", "date", "avg_biomass", "area_in_m2") # reorder columns for easier viewing
phragFinal <- phragFinal[, col_order]
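As a quick sanity check on the merge (a sketch; the original check may have looked different), one basin can be compared across the pre- and post-merge data frames:
# spot-check basin 184 before and after the merge
subset(phragBasin, basin == "184")
subset(phragArea, basin == "184")
subset(phragFinal, basin == "184")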
The final step is to add a column that multiplies biomass per m^2 by the total m^2 in the basin.
phragFinal$total_biomass <- phragFinal$avg_biomass * phragFinal$area_in_m2
We now have total biomass estimates per basin. Note that basin 140 has one more row than expected: a look back shows it was sampled on two different dates in 2019 (8.15.19 and 8.21.19), and I still need to decide how to correct for that.
phragFinal
## basin date avg_biomass area_in_m2 total_biomass
## 1 140 8.13.20 232.8805 2222.9483 517681.4
## 2 140 8.15.19 990.2061 2222.9483 2201177.0
## 3 140 8.21.19 1715.5844 2222.9483 3813655.4
## 4 141 8.13.20 445.4147 3539.1288 1576379.9
## 5 141 8.16.19 780.1900 3539.1288 2761192.9
## 6 153 8.16.19 1213.5758 3441.9475 4177064.2
## 7 158 8.17.20 482.9721 6103.7256 2947929.1
## 8 158 8.23.19 1252.4274 6103.7256 7644473.4
## 9 159 8.21.19 709.8142 2542.1727 1804470.4
## 10 159 8.19.20 1057.9005 2542.1727 2689365.7
## 11 183 8.14.19 797.4651 1350.3603 1076865.2
## 12 183 8.18.20 485.0438 1350.3603 654983.9
## 13 184 8.13.19 1618.6949 6924.1491 11208084.8
## 14 184 8.11.20 369.6386 6924.1491 2559432.6
## 15 188 8.19.20 1334.8108 15959.9251 21303480.4
## 16 188 8.19.19 1674.2044 15959.9251 26720177.6
## 17 236 8.22.19 965.6221 4494.6855 4340167.8
## 18 237 8.17.20 1220.0602 12325.4814 15037829.7
## 19 237 8.22.19 1883.5831 12325.4814 23216068.2
## 20 28 8.15.19 644.4606 2493.4080 1606903.3
## 21 28 8.12.20 618.9667 2493.4080 1543336.5
## 22 32 8.12.20 304.3385 434.2095 132146.7
## 23 32 8.14.19 730.1464 434.2095 317036.5
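A quick way to flag this kind of duplication (a sketch; it assumes the date strings parse as month.day.year):
# count sampling dates per basin and year; any cell > 1 means repeat sampling within a year
sample_year <- format(as.Date(phragFinal$date, format = "%m.%d.%y"), "%Y")
table(phragFinal$basin, sample_year)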
For Typha, I selected only non-inflorescence control plots from 2020, as the 2019 data combined inflorescence and non-inflorescence samples. Looking at the data, inflorescences will need to be calculated separately to make the estimates accurate, but for now these 2020 data will suffice until next year's collection.

#### Building subset of typha data
typha20 <- subset(main.df, species == "typha" &
(date == "2020" & treatment == "control"))
summary(typha20)
linear model and QQ plots:
# linear model
LMtypha20 <- lm(DryWeight_g ~ HeightClass, data = typha20)
# testing residuals
qqnorm(LMtypha20$residuals)
qqline(LMtypha20$residuals)
Shapiro-Wilk Test
Ho: Data are drawn from a normal distribution (fail to reject when p > 0.001)
Ha: Data are NOT drawn from a normal distribution
shapiro.test(LMtypha20$residuals)
##
## Shapiro-Wilk normality test
##
## data: LMtypha20$residuals
## W = 0.88391, p-value = 0.0003104
The QQ plot is unconvincing and our Shapiro-Wilk test generated a p-value below 0.001. Data are not normally distributed and a transformation is required.
I hid the output to keep the section tidy, but p-values are listed under each transformation test.
LOGtypha20 <- lm(log10(DryWeight_g) ~ HeightClass, data = typha20) # linear model with log10 transformation
SQRTtypha20 <- lm(sqrt(DryWeight_g) ~ HeightClass, data = typha20) # lm with sqrt transformation
X2typha20 <- lm((DryWeight_g)^2 ~ HeightClass, data = typha20) # lm w/ x^2 transformation
shapiro.test(LOGtypha20$residuals)
# p-value is 0.1976, DATA NORMAL
shapiro.test(SQRTtypha20$residuals)
# p-value is 0.02009, DATA NORMAL
shapiro.test(X2typha20$residuals)
# p-value is 0.00000008287, data not normal
Of the three tested data transformations, both the log10 and the square root transformations passed the Shapiro-Wilk test. I chose the log10 transformation for further analysis here, but may want to revisit this choice, since the square root transformation tended to be stronger in other scenarios and the transformation should ideally be consistent with the one used for Phragmites.
Here is the log10 transformation QQ plot:
qqnorm(LOGtypha20$residuals)
qqline(LOGtypha20$residuals)
Now that the data meet normality assumptions, we can run an equal variance test of the residuals.
plot(LOGtypha20, 1)
For the non-constant variance test, the hypotheses are:
Ho: Data have constant variance (fail to reject when p > 0.05)
Ha: Data have non-constant variance
ncvTest(LOGtypha20)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.2141622, Df = 1, p = 0.64352
Here, the p-value is 0.64352. We fail to reject the null hypothesis; our data have constant variance.
Next, linearity and outlier tests.
# Linearity Test
crPlots(LOGtypha20)
# Outlier Test ==================
plot(LOGtypha20, 5)
Now that we have a model that meets our assumptions, we can build a biomass equation.
summary(LOGtypha20)
##
## Call:
## lm(formula = log10(DryWeight_g) ~ HeightClass, data = typha20)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38283 -0.16112 0.00088 0.10863 0.43361
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.112070 0.161810 0.693 0.492
## HeightClass 0.023742 0.003504 6.776 2.72e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2013 on 43 degrees of freedom
## Multiple R-squared: 0.5164, Adjusted R-squared: 0.5051
## F-statistic: 45.91 on 1 and 43 DF, p-value: 2.721e-08
The relevant outputs are the intercept (0.112070) and the HeightClass coefficient (0.023742), which give the y-intercept and slope. The equation is y = 0.023742x + 0.112070 +/- E, where x is height class, y is log10 of dry weight in grams, and E is the root mean squared error we will now calculate.
# Root mean squared error
RSS <- c(crossprod(LOGtypha20$residuals))
MSE <- RSS / length(LOGtypha20$residuals)
RMSE <- sqrt(MSE)
RMSE # ~68% of observed values fall within the prediction +/- RMSE
## [1] 0.1967814
RMSE * 2 # ~95% of observed values fall within the prediction +/- 2*RMSE
## [1] 0.3935628
Note: the RMSE must also be back-transformed. Both the model's predictions and the RMSE are on the log10 scale, so 10^x returns them to grams.
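A minimal sketch of that back-transformation, again using a hypothetical height class (not a field value):
# hypothetical height class, for illustration only
h <- 50
pred_log10 <- 0.023742 * h + 0.112070                       # prediction on the log10 scale
10^pred_log10                                               # estimated dry weight, grams
10^c(lower = pred_log10 - RMSE, upper = pred_log10 + RMSE)  # approximate 68% bounds, grams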
I selected all Typha data for 2019, as all basins were pre-treatment at the time of data collection, but I used only the control plots from 2020. This arrangement provides the greatest number of untreated samples for analysis.
typha <- subset(main.df, species == "typha" &
((date == "2020" & treatment == "control") |
(date == "2019")))
summary(typha)
linear model and QQ plots:
# linear model
LMtypha <- lm(DryWeight_g ~ HeightClass, data = typha)
# testing residuals
qqnorm(LMtypha$residuals)
qqline(LMtypha$residuals)
Shapiro-Wilk Test
Ho: Data are drawn from a normal distribution (fail to reject when p > 0.001)
Ha: Data are NOT drawn from a normal distribution
shapiro.test(LMtypha$residuals)
##
## Shapiro-Wilk normality test
##
## data: LMtypha$residuals
## W = 0.88309, p-value = 3.76e-09
The QQ plot is unconvincing and our Shapiro-Wilk test generated a p-value well below 0.001. Data are not normally distributed and a transformation is required.
I hid the output to keep the section tidy, but p-values are listed under each transformation test.
LOGtypha <- lm(log10(DryWeight_g) ~ HeightClass, data = typha) # linear model with log10 transformation
SQRTtypha <- lm(sqrt(DryWeight_g) ~ HeightClass, data = typha) # lm with sqrt transformation
X2typha <- lm((DryWeight_g)^2 ~ HeightClass, data = typha) # lm w/ x^2 transformation
shapiro.test(LOGtypha$residuals)
# p-value is 0.2202, DATA NORMAL
shapiro.test(SQRTtypha$residuals)
# p-value is 0.00002206, below the 0.001 threshold; data not normal
shapiro.test(X2typha$residuals)
# p-value is 0.000000000000025, data not normal
Of the three tested transformations, only the log10 transformation passed the Shapiro-Wilk test at the 0.001 threshold; the square root transformation fell well below it (p = 0.00002206), and the x^2 transformation failed badly. Both the log10 and square root models are still compared below.
Here are their QQ plots. The log10 transformation has a stronger QQ plot, but other tests may give a better idea of which transformation to choose.
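As before, the plots can be reproduced with:
qqnorm(LOGtypha$residuals, main = "log10 transformation")
qqline(LOGtypha$residuals)
qqnorm(SQRTtypha$residuals, main = "square root transformation")
qqline(SQRTtypha$residuals)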
We can now run equal variance tests of residuals on both candidate models.
plot(LOGtypha, 1, main = "Residuals vs Fitted, log10 transformation")
plot(SQRTtypha, 1, main = "Residuals vs Fitted, square root transformation")
The log10 transformation is more convincing here.
# Linearity Test
crPlots(LOGtypha, main = "log10 transformation")
crPlots(SQRTtypha, main = "square root transformation")
# Outlier Test
plot(LOGtypha, 5, main = "log10 transformation")
plot(SQRTtypha, 5, main = "square root transformation")
For the non-constant variance test, the hypotheses are:
Ho: Data have constant variance (fail to reject when p > 0.05)
Ha: Data have non-constant variance
ncvTest(LOGtypha)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 14.5631, Df = 1, p = 0.00013554
ncvTest(SQRTtypha)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 2.302147, Df = 1, p = 0.1292
The log10 transformation does not have constant variance (p = 0.00013554).
The square root transformation does have constant variance (p = 0.1292). Neither model meets every assumption (the log10 model fails the variance test, the square root model failed the normality test), but I proceed with the square root model here, which also keeps the transformation consistent with the Phragmites equation.
summary(SQRTtypha)
##
## Call:
## lm(formula = sqrt(DryWeight_g) ~ HeightClass, data = typha)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6087 -1.0065 -0.1491 0.7086 4.2744
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.2064 0.6104 -0.338 0.736
## HeightClass 0.1074 0.0130 8.268 9.83e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.261 on 139 degrees of freedom
## Multiple R-squared: 0.3296, Adjusted R-squared: 0.3248
## F-statistic: 68.35 on 1 and 139 DF, p-value: 9.834e-14
Based on the summary, the y-intercept is -0.2064 and the slope is 0.1074, so our allometric equation is y = 0.1074x - 0.2064, where x is height class and y is the square root of dry weight in grams (predictions must be squared to return to grams).
The following code can determine our root mean squared error:
RSS <- c(crossprod(SQRTtypha$residuals))
MSE <- RSS / length(SQRTtypha$residuals)
RMSE <- sqrt(MSE)
RMSE # ~68% of observed values fall within the prediction +/- RMSE
## [1] 1.252277
RMSE * 2 # ~95% of observed values fall within the prediction +/- 2*RMSE
## [1] 2.504555