Hello and welcome to my initial analysis of the Enzyme-Glyphosate study. This is a first pass so notes, comments, questions, and praise are all encouraged! To remind you of the study design, we have 5 ascending concentrations of Glyphosate (0,30,60,120,240 g/L) and 3 corresponding volume ratios of the “2XL” termite cellulases (0,25,50 %v/v). Every combination of those two substances were made and applied to 10 honeysuckle (Lonicera maackii) with 4 replicates at Lugar Farm. That means 40 stumps per treatment so a total of 560 treated honeysuckle stumps. Basal diameter was measured prior to treatment. Number of resprouting stems and height of tallest resprout were measured both within the same year (~3 months after treatment) and the following year. The goal is to find an interacting effect between glyphosate and 2XL that would suggest that the cellulases are having a adjuvative effect on the herbicide helping its effectiveness at killing an individual honeysuckle.

Libraries Used

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(Hmisc)
library(lme4)
library(MASS)                   
library(ROCR)                   
library(PredictABEL)        
library(PresenceAbsence)        
library(sjstats)
library(pwr)
require(pscl)

Dataframe Management

Obviously everyone’s favorite part of any analysis is getting their dataframes squared away. In this case, I had to make up for my earlier laziness by replacing the flag-color which denoted the treatment each stump received with actual numbers. Was there an easier was of doing this? I’m sure. But does this work? Yes! And that’s all that matters.

#Reading in CSV with all the nice data on it
Gly<- read.csv("C:/Users/benny/Box Sync/Purdue/Honeysuck-GlyphosateEnzyme/Honeysuckle Glyphosate Data.csv", na.strings = "-")

#Translating first color of the Flag Color into the Glyphosate concentration
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^W."), 0, NA))
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^B."), 30, Gly$Glyphosate ))
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^G."), 60, Gly$Glyphosate ))
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^Y."), 120, Gly$Glyphosate ))
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^R."), 240, Gly$Glyphosate ))
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^W"), 0, Gly$Glyphosate))
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^B"), 30, Gly$Glyphosate ))
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^G"), 60, Gly$Glyphosate ))
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^Y"), 120, Gly$Glyphosate ))
Gly <- mutate(Gly, Glyphosate = ifelse(str_detect(Gly$FlagColor, "^R"), 240, Gly$Glyphosate ))

#Doing the same for the 2Xl
Gly<- mutate(Gly, TwoXL = ifelse(str_detect(Gly$FlagColor, "O$"), 25, 0 ))
Gly<- mutate(Gly, TwoXL = ifelse(str_detect(Gly$FlagColor, ".B$"), 50, Gly$TwoXL ))
Gly<- mutate(Gly, TwoXL = ifelse(str_detect(Gly$FlagColor, "^O"), NA, Gly$TwoXL))

#Checking to see if it worked. We picked up an extra Red-Orange treament along the way. Not an error.
table(Gly$Glyphosate)

## 
##   0  30  60 120 240 
## 120 120 120 120 121

table(Gly$TwoXL)

## 
##   0  25  50 
## 200 201 200

Exploratory Graphs

Now is where the fun begins. I wanted to explore the distributions of number of resprouts and height of tallest resprout not only by same year resprouting and the following year resprouting, but also parsed between Glyphosate and 2XL concentrations. To do this, I used what are called ‘Violin Plots’. They are a little newer to the scene, but are nice because they are similar to boxplots but vary with the distribution of the data with their widths. The points and errorbars represent the mean and standard deviation respectively. Each measure is divided into 3 facets based on the treatment of 2XL and then, within each facet, into a violin plot with each treatment of glyphosate (color coordinated to flag color). Slightly transparent X’s are jittered along each violin plot to show where the real data is.

This is just one representation, so I’m always looking for feedback on data visualization. One question I have is whether or not the color for mean and standard deviation is appropriate. I tried to find something that would stick out without being obnoxious and this is my best attempt thus-far. Additionally, are these graphs accessible to color-blind folk? Is the violin-plot the best way to represent this data? Please let me know your thoughts!

Same Year Measurements

Number of Resprouts

#Number of Resprouts
ggplot(aes(x = as.factor(Glyphosate), y = Resprouts1, fill = as.factor(Glyphosate)), data = subset(Gly, !is.na(Resprouts1))) +
  geom_violin(trim = T, adjust = 5)+
  facet_wrap(~(TwoXL))+scale_fill_manual(values=c("White", "Blue", "ForestGreen", "Yellow", "DarkRed"))+
  theme_bw()+
  theme(legend.position="none",plot.subtitle = element_text(hjust = 0.5), plot.title = element_text(size =20, face ="bold"))+ 
  geom_jitter(shape=4, position=position_jitter(0.2), alpha = 0.5)+
  stat_summary(fun.data="mean_sdl",  geom="point", color="deeppink3")+
  stat_summary(fun.data="mean_sdl",  geom="errorbar", color="deeppink3")+
  labs(title = "Number of Resprouting Stems in Same Year", subtitle = "2XL (%v/v)", x= "Glyphosate (g/L)", y = "Number of Resprouts")

Height of Tallest Resprout

#Tallest Resrpout
ggplot(aes(x = as.factor(Glyphosate), y = as.numeric(TallestHeight1), fill = as.factor(Glyphosate)), data = subset(Gly, !is.na(Resprouts1))) +
  geom_violin(trim = T, adjust = 5)+
  facet_wrap(~(TwoXL))+scale_fill_manual(values=c("White", "Blue", "ForestGreen", "Yellow", "DarkRed"))+
  theme_bw()+
  theme(legend.position="none",plot.subtitle = element_text(hjust = 0.5), plot.title = element_text(size =20, face ="bold"))+ 
  geom_jitter(shape=4, position=position_jitter(0.2), alpha = 0.5)+
  stat_summary(fun.data="mean_sdl",  geom="point", color="deeppink3")+
  stat_summary(fun.data="mean_sdl",  geom="errorbar", color="deeppink3")+
  labs(title = "Height of Tallest Resprout in Same Year", subtitle = "2XL (%v/v)", x= "Glyphosate (g/L)", y = "Height(cm)")

Following Year Measurements

Number of Resprouts

#Number of Resprouts
ggplot(aes(x = as.factor(Glyphosate), y = Resprouts2, fill = as.factor(Glyphosate)), data = subset(Gly, !is.na(Resprouts2))) +
  geom_violin(trim = T, adjust = 5)+
  facet_wrap(~(TwoXL))+scale_fill_manual(values=c("White", "Blue", "ForestGreen", "Yellow", "DarkRed"))+
  theme_bw()+
  theme(legend.position="none",plot.subtitle = element_text(hjust = 0.5), plot.title = element_text(size =20, face ="bold"))+ 
  geom_jitter(shape=4, position=position_jitter(0.2), alpha = 0.5)+
  stat_summary(fun.data="mean_sdl",  geom="point", color="deeppink3")+
  stat_summary(fun.data="mean_sdl",  geom="errorbar", color="deeppink3")+
  labs(title = "Number of Resprouting Stems in Following Year", subtitle = "2XL (%v/v)", x= "Glyphosate (g/L)", y = "Number of Resprouts")

Height of Tallest Resprout

#Tallest Resrpout
ggplot(aes(x = as.factor(Glyphosate), y = as.numeric(TallestHeight2), fill = as.factor(Glyphosate)), data = subset(Gly, !is.na(Resprouts2))) +
  geom_violin(trim = T, adjust = 5)+
  facet_wrap(~(TwoXL))+scale_fill_manual(values=c("White", "Blue", "ForestGreen", "Yellow", "DarkRed"))+
  theme_bw()+
  theme(legend.position="none",plot.subtitle = element_text(hjust = 0.5), plot.title = element_text(size =20, face ="bold"))+ 
  geom_jitter(shape=4, position=position_jitter(0.2), alpha = 0.5)+
  stat_summary(fun.data="mean_sdl", geom="point", color="deeppink3")+
  stat_summary(fun.data="mean_sdl", geom="errorbar", color="deeppink3")+
  labs(title = "Height of Tallest Resprout in Following Year", subtitle = "2XL (%v/v)", x= "Glyphosate (g/L)", y = "Height(cm)")

Histograms for Normality

After creating these violin plots, one thing became abundantly clear: there are a lot of 0’s in the data. That is likely going to inform our choice of statistical tests. The first step is to check out a histogram of values without and with a transformation applied. If we can then assess whether a transformation will get us back into the realm of linear statistics. Here, I created some quick and dirty histograms strictly for internal use to determine next steps. The format of both metrics within same year measurements and following year measurements was followed here.

####Same Year####
##Number of Resprouts
histogram(~Resprouts1, data = Gly)
histogram(~sqrt(Resprouts1), data = Gly)
##Height of Tallest Resprout
histogram(~TallestHeight1, data = Gly)
histogram(~sqrt(as.numeric(TallestHeight1)), data = Gly)

####Next Year####
##Number of Resprouts
histogram(~Resprouts2, data = Gly)
histogram(~sqrt(Resprouts2), data = Gly)
##Height of Tallest Resprout
histogram(~TallestHeight2, data = Gly)
histogram(~sqrt(as.numeric(TallestHeight2)), data = Gly)

As we can see from these histograms, the non-transformed distribution is highly skewed and the square-root transformations did next to nothing to normalize the data. I did not include the other transformations because they all showed the same thing which would not have been very interesting. No amount transformations are going to make these data normal and we just have to accept that. These data have too many zeros! Now, it is time to figure out what the heck we are going to do.

Negative Binomial Regression

Okay, so, we’ve left the world of linear statistics, which is always scary. I started to ruminate on what to do next. I started googling a lot. I even made my amazing friend, Kate Wynne (shout out), video chat with me while I picked her brain on what in the world to do. Kate and I went back and forth on some options before I meandered off and starting trying a couple things.

A ‘Zero Inflated Poisson’ model looked interesting for a while until I came to the conclusion that our zeros are not ‘inflated’, but in fact very real. I basically did an entire analysis of these data using logistical regressions, but became unsatisfied with how it did not incooperate the magnitude of the resprouting, only whether or not it did resprout. I ended up scraping it. I touched on factorial Anovas and even hurdle models, but none of them felt right.

Eventually I found something - which Kate had casually mentioned early on and if I had just listened to her, I could have saved myself a lot trouble - called a Negative Binomial Regression Analysis. These negative binomial regressions specialize in highly skewed data with a lot of zeros. It differs from a Poisson Regression because they specialize in ‘Over-Dispersed’ distributions (meaning that the variance is larger than the mean) which the data are. I based this analysis on this UCLA guide.

The format will follow the same order as the previous graphs and the same method is followed for each measure. I start by making the negative binomial regression for the measure. The explanatory variables include the Total Area (cm^2) of each individual prior to cutting as a fixed effect along with the concentration of Glyphosate, the concentration of 2XL, and an interaction effect between Glyphosate and TwoXL. Something important to note, in this analysis, I am using the concentrations of glyphosate and 2XL as categorical variables rather than continuous variables. We can go back and forth about this, but I do believe that is the correct use. Now, don’t get caught staring at the summary from this model too much because it can be a little misleading. For this analysis, we are just checking if the overall model is significant, which I believe is represented by the Intercept p-value (please correct me if that is incorrect). This study is all about the interaction effect between glyphosate and 2XL, so the next step is to check whether or no the interaction effect is significant. This is done by creating another negative binomial regression with the interaction effect removed and then comparing that with the original model. If that model is significant, then the interaction between glyphosate has statistical significance. The final step is to check the assumption that the data are Over-dispersed (a little backwards to intuition, I know). This is done by, once again, creating another model. This is a Poisson regression with the same formula as our initial negative binomial regression. The log-likelihoods are tested and significant p-value suggests that the negative binomial is more appropriate than the Poisson.

I know this is a lot and very dense, so feel free to check in with any questions or comments. I am learning this stuff too, so we can learn together.

Alpha= 0.05 for all tests

Note: It seems I was incorrect about the intercept p-value giving the significance for the overall model. According to this, an agreed upon way is to compare the AIC’s of the full model to a null negative binomial regression model. Smaller AIC is better. I also included the AIC of the reduced model with the interaction effect removed.

Same Year Measures

Number of Resprouts

#Creating negative binomial regression
SR1 <- glm.nb(Resprouts1 ~ (TotalArea)+as.factor(Glyphosate)*as.factor(TwoXL) , data = subset(Gly, !is.na(Resprouts1)))

#summary of our model
summary(SR1)

## 
## Call:
## glm.nb(formula = Resprouts1 ~ (TotalArea) + as.factor(Glyphosate) * 
##     as.factor(TwoXL), data = subset(Gly, !is.na(Resprouts1)), 
##     init.theta = 1.287500595, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2021  -0.5785  -0.4352   0.0000   5.0310  
## 
## Coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                  1.598e+00  1.660e-01   9.626
## TotalArea                                    8.832e-04  8.776e-04   1.006
## as.factor(Glyphosate)30                     -3.392e+00  4.345e-01  -7.807
## as.factor(Glyphosate)60                     -3.393e+00  4.391e-01  -7.727
## as.factor(Glyphosate)120                    -3.890e+01  1.061e+07   0.000
## as.factor(Glyphosate)240                    -3.845e+01  1.057e+07   0.000
## as.factor(TwoXL)25                           3.525e-01  2.268e-01   1.554
## as.factor(TwoXL)50                           3.064e-01  2.220e-01   1.380
## as.factor(Glyphosate)30:as.factor(TwoXL)25   5.370e-01  5.450e-01   0.985
## as.factor(Glyphosate)60:as.factor(TwoXL)25  -4.366e-01  6.404e-01  -0.682
## as.factor(Glyphosate)120:as.factor(TwoXL)25  3.552e+01  1.061e+07   0.000
## as.factor(Glyphosate)240:as.factor(TwoXL)25  3.410e+01  1.057e+07   0.000
## as.factor(Glyphosate)30:as.factor(TwoXL)50  -8.446e-01  6.938e-01  -1.217
## as.factor(Glyphosate)60:as.factor(TwoXL)50  -1.600e-01  5.970e-01  -0.268
## as.factor(Glyphosate)120:as.factor(TwoXL)50  3.398e+01  1.061e+07   0.000
## as.factor(Glyphosate)240:as.factor(TwoXL)50 -5.329e-01  1.498e+07   0.000
##                                             Pr(>|z|)    
## (Intercept)                                  < 2e-16 ***
## TotalArea                                      0.314    
## as.factor(Glyphosate)30                     5.84e-15 ***
## as.factor(Glyphosate)60                     1.10e-14 ***
## as.factor(Glyphosate)120                       1.000    
## as.factor(Glyphosate)240                       1.000    
## as.factor(TwoXL)25                             0.120    
## as.factor(TwoXL)50                             0.168    
## as.factor(Glyphosate)30:as.factor(TwoXL)25     0.325    
## as.factor(Glyphosate)60:as.factor(TwoXL)25     0.495    
## as.factor(Glyphosate)120:as.factor(TwoXL)25    1.000    
## as.factor(Glyphosate)240:as.factor(TwoXL)25    1.000    
## as.factor(Glyphosate)30:as.factor(TwoXL)50     0.223    
## as.factor(Glyphosate)60:as.factor(TwoXL)50     0.789    
## as.factor(Glyphosate)120:as.factor(TwoXL)50    1.000    
## as.factor(Glyphosate)240:as.factor(TwoXL)50    1.000    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.2875) family taken to be 1)
## 
##     Null deviance: 1376.93  on 588  degrees of freedom
## Residual deviance:  355.19  on 573  degrees of freedom
## AIC: 1054.2
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.288 
##           Std. Err.:  0.281 
## 
##  2 x log-likelihood:  -1020.191

#Creating Null model to find sig of whole model through comparison of AICs
SR0 <- glm.nb(Resprouts1 ~ 1 , data = subset(Gly, !is.na(Resprouts1)))

#Creating model without interaction effect between glyphosate
SR2 <- update(SR1, . ~ . - as.factor(Glyphosate):as.factor(TwoXL))
SR.aov<-anova(SR1,SR2) #anova comparing the two
SR.aov #read out from comparison

## Likelihood ratio tests of Negative Binomial Models
## 
## Response: Resprouts1
##                                                    Model    theta Resid. df
## 1   TotalArea + as.factor(Glyphosate) + as.factor(TwoXL) 1.176854       581
## 2 (TotalArea) + as.factor(Glyphosate) * as.factor(TwoXL) 1.287501       573
##      2 x log-lik.   Test    df LR stat.     Pr(Chi)
## 1       -1043.247                                  
## 2       -1020.191 1 vs 2     8 23.05618 0.003292866

#Checking Poisson Assumption
SR3 <- glm(Resprouts1 ~ (TotalArea)+as.factor(Glyphosate)*as.factor(TwoXL) , family = "poisson", data = subset(Gly, !is.na(Resprouts1))) #creating model
SR.Pois<-pchisq(2 * (logLik(SR1) - logLik(SR3)), df = 1, lower.tail = FALSE) #comparing negative binomial model and Poisson model
SR.Pois #read out

## 'log Lik.' 1.430936e-31 (df=17)

Overall Model AIC: 1054.1907632
Null Model AIC: 1377.287233
Reduced Model AIC: 1061.2469401

Interaction Effect p-value: 0.003292866

Poisson Assumption p-value: 1.430936e-31

(Don’t worry, I’ll report more in the paper. This is just for reference)

Height of Tallest Resprout

#Creating negative binomial regression
ST1 <- glm.nb(as.numeric(TallestHeight1) ~ (TotalArea)+as.factor(Glyphosate)*as.factor(TwoXL), data = subset(Gly, !is.na(Resprouts1)))
 
#summary of our model
summary(ST1)

## 
## Call:
## glm.nb(formula = as.numeric(TallestHeight1) ~ (TotalArea) + as.factor(Glyphosate) * 
##     as.factor(TwoXL), data = subset(Gly, !is.na(Resprouts1)), 
##     init.theta = 1.332190106, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2232  -0.6746  -0.4329  -0.0036   6.2648  
## 
## Coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                  3.0484087  0.1486306  20.510
## TotalArea                                   -0.0001603  0.0005540  -0.289
## as.factor(Glyphosate)30                     -2.1086831  0.2242316  -9.404
## as.factor(Glyphosate)60                     -2.1578471  0.2277990  -9.473
## as.factor(Glyphosate)120                    -3.0407692  0.2556505 -11.894
## as.factor(Glyphosate)240                    -3.0365199  0.2562497 -11.850
## as.factor(TwoXL)25                           0.2029220  0.2069630   0.980
## as.factor(TwoXL)50                           0.1783533  0.2033527   0.877
## as.factor(Glyphosate)30:as.factor(TwoXL)25   0.0242244  0.3157227   0.077
## as.factor(Glyphosate)60:as.factor(TwoXL)25  -0.7549032  0.3358577  -2.248
## as.factor(Glyphosate)120:as.factor(TwoXL)25  0.8164167  0.3383726   2.413
## as.factor(Glyphosate)240:as.factor(TwoXL)25  0.4512486  0.3429264   1.316
## as.factor(Glyphosate)30:as.factor(TwoXL)50  -0.5535640  0.3213580  -1.723
## as.factor(Glyphosate)60:as.factor(TwoXL)50  -0.4881684  0.3240870  -1.506
## as.factor(Glyphosate)120:as.factor(TwoXL)50  0.6690487  0.3396348   1.970
## as.factor(Glyphosate)240:as.factor(TwoXL)50 -0.1844350  0.3597572  -0.513
##                                             Pr(>|z|)    
## (Intercept)                                   <2e-16 ***
## TotalArea                                     0.7723    
## as.factor(Glyphosate)30                       <2e-16 ***
## as.factor(Glyphosate)60                       <2e-16 ***
## as.factor(Glyphosate)120                      <2e-16 ***
## as.factor(Glyphosate)240                      <2e-16 ***
## as.factor(TwoXL)25                            0.3269    
## as.factor(TwoXL)50                            0.3805    
## as.factor(Glyphosate)30:as.factor(TwoXL)25    0.9388    
## as.factor(Glyphosate)60:as.factor(TwoXL)25    0.0246 *  
## as.factor(Glyphosate)120:as.factor(TwoXL)25   0.0158 *  
## as.factor(Glyphosate)240:as.factor(TwoXL)25   0.1882    
## as.factor(Glyphosate)30:as.factor(TwoXL)50    0.0850 .  
## as.factor(Glyphosate)60:as.factor(TwoXL)50    0.1320    
## as.factor(Glyphosate)120:as.factor(TwoXL)50   0.0488 *  
## as.factor(Glyphosate)240:as.factor(TwoXL)50   0.6082    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.3322) family taken to be 1)
## 
##     Null deviance: 1404.86  on 588  degrees of freedom
## Residual deviance:  494.35  on 573  degrees of freedom
## AIC: 2715
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.3322 
##           Std. Err.:  0.0967 
## 
##  2 x log-likelihood:  -2680.9840

#Creating Null model to find sig of whole model through comparison of AICs
ST0 <- glm.nb(as.numeric(TallestHeight1) ~ 1 , data = subset(Gly, !is.na(Resprouts1)))


#Creating model without interaction effect between glyphosate
ST2 <- update(ST1, . ~ . - as.factor(Glyphosate):as.factor(TwoXL))
ST.aov<-anova(ST1,ST2)#anova comparing the two
ST.aov #read out from comparison

## Likelihood ratio tests of Negative Binomial Models
## 
## Response: as.numeric(TallestHeight1)
##                                                    Model    theta Resid. df
## 1   TotalArea + as.factor(Glyphosate) + as.factor(TwoXL) 1.276919       581
## 2 (TotalArea) + as.factor(Glyphosate) * as.factor(TwoXL) 1.332190       573
##      2 x log-lik.   Test    df LR stat.      Pr(Chi)
## 1       -2708.600                                   
## 2       -2680.984 1 vs 2     8 27.61526 0.0005532135

#Checking Poisson Assumption
ST3 <- glm(as.numeric(TallestHeight1) ~ (TotalArea)+as.factor(Glyphosate)*as.factor(TwoXL) , family = "poisson", data = subset(Gly, !is.na(Resprouts1)))
ST.Pois<-pchisq(2 * (logLik(ST1) - logLik(ST3)), df = 1, lower.tail = FALSE)
ST.Pois

## 'log Lik.' 0 (df=17)

Overall Model AIC: 2714.9844575
Null Model AIC: 3264.6985098
Reduced Model AIC: 2726.5997215

Interaction Effect p-value: 0.0005532135

Poisson Assumption p-value: 0 (Not an error, just very low)

Following Year

Number of Resprouts

#Creating negative binomial regression
FR1 <- glm.nb(Resprouts2 ~ (TotalArea)+as.factor(Glyphosate)*as.factor(TwoXL), data = subset(Gly, !is.na(Resprouts2)))
 
summary(FR1)

## 
## Call:
## glm.nb(formula = Resprouts2 ~ (TotalArea) + as.factor(Glyphosate) * 
##     as.factor(TwoXL), data = subset(Gly, !is.na(Resprouts2)), 
##     init.theta = 1.36787988, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5070  -0.7521  -0.3900   0.0000   5.1137  
## 
## Coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                  2.273e+00  1.520e-01  14.951
## TotalArea                                    9.935e-04  8.544e-04   1.163
## as.factor(Glyphosate)30                     -4.631e+00  5.384e-01  -8.602
## as.factor(Glyphosate)60                     -4.420e+00  4.951e-01  -8.927
## as.factor(Glyphosate)120                    -3.965e+01  1.061e+07   0.000
## as.factor(Glyphosate)240                    -3.944e+01  1.061e+07   0.000
## as.factor(TwoXL)25                           2.192e-01  2.103e-01   1.042
## as.factor(TwoXL)50                          -2.932e-02  2.079e-01  -0.141
## as.factor(Glyphosate)30:as.factor(TwoXL)25   1.754e+00  6.072e-01   2.889
## as.factor(Glyphosate)60:as.factor(TwoXL)25   7.392e-01  6.127e-01   1.207
## as.factor(Glyphosate)120:as.factor(TwoXL)25  3.599e+01  1.061e+07   0.000
## as.factor(Glyphosate)240:as.factor(TwoXL)25  3.573e+01  1.061e+07   0.000
## as.factor(Glyphosate)30:as.factor(TwoXL)50   1.758e+00  6.125e-01   2.869
## as.factor(Glyphosate)60:as.factor(TwoXL)50  -3.495e+01  1.075e+07   0.000
## as.factor(Glyphosate)120:as.factor(TwoXL)50  3.479e+01  1.061e+07   0.000
## as.factor(Glyphosate)240:as.factor(TwoXL)50  1.213e-01  1.501e+07   0.000
##                                             Pr(>|z|)    
## (Intercept)                                  < 2e-16 ***
## TotalArea                                    0.24494    
## as.factor(Glyphosate)30                      < 2e-16 ***
## as.factor(Glyphosate)60                      < 2e-16 ***
## as.factor(Glyphosate)120                     1.00000    
## as.factor(Glyphosate)240                     1.00000    
## as.factor(TwoXL)25                           0.29721    
## as.factor(TwoXL)50                           0.88782    
## as.factor(Glyphosate)30:as.factor(TwoXL)25   0.00387 ** 
## as.factor(Glyphosate)60:as.factor(TwoXL)25   0.22762    
## as.factor(Glyphosate)120:as.factor(TwoXL)25  1.00000    
## as.factor(Glyphosate)240:as.factor(TwoXL)25  1.00000    
## as.factor(Glyphosate)30:as.factor(TwoXL)50   0.00411 ** 
## as.factor(Glyphosate)60:as.factor(TwoXL)50   1.00000    
## as.factor(Glyphosate)120:as.factor(TwoXL)50  1.00000    
## as.factor(Glyphosate)240:as.factor(TwoXL)50  1.00000    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.3679) family taken to be 1)
## 
##     Null deviance: 1806.26  on 588  degrees of freedom
## Residual deviance:  391.96  on 573  degrees of freedom
## AIC: 1246.8
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.368 
##           Std. Err.:  0.309 
## 
##  2 x log-likelihood:  -1212.759

#Creating Null model to find sig of whole model through comparison of AICs
FR0 <- glm.nb(Resprouts2 ~ 1 , data = subset(Gly, !is.na(Resprouts2)))

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in sqrt(1/i): NaNs produced

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in sqrt(1/i): NaNs produced

#Creating model without interaction effect between glyphosate
FR2 <- update(FR1, . ~ . - as.factor(Glyphosate):as.factor(TwoXL))
FR.aov<-anova(FR1,FR2)
FR.aov

## Likelihood ratio tests of Negative Binomial Models
## 
## Response: Resprouts2
##                                                    Model   theta Resid. df
## 1   TotalArea + as.factor(Glyphosate) + as.factor(TwoXL) 0.99494       581
## 2 (TotalArea) + as.factor(Glyphosate) * as.factor(TwoXL) 1.36788       573
##      2 x log-lik.   Test    df LR stat.      Pr(Chi)
## 1       -1263.697                                   
## 2       -1212.759 1 vs 2     8 50.93766 2.697638e-08

#Checking Poisson Assumption
FR3 <- glm(Resprouts2 ~ (TotalArea)+as.factor(Glyphosate)*as.factor(TwoXL) , family = "poisson", data = subset(Gly, !is.na(Resprouts1)))
FR.Pois<-pchisq(2 * (logLik(FR1) - logLik(FR3)), df = 1, lower.tail = FALSE)
SR.Pois

## 'log Lik.' 1.430936e-31 (df=17)

Overall Model AIC: 1246.7592115
Null Model AIC: 5100.4477949
Reduced Model AIC: 1281.6968679

Interaction Effect p-value: 2.697638e-08

Poisson Assumption p-value: 1.430936e-31

Tallest Height

#Creating negative binomial regression
FT1 <- glm.nb(as.numeric(TallestHeight2) ~ (TotalArea)+as.factor(Glyphosate)*as.factor(TwoXL), data = subset(Gly, !is.na(Resprouts2)))
 
summary(FT1)

## 
## Call:
## glm.nb(formula = as.numeric(TallestHeight2) ~ (TotalArea) + as.factor(Glyphosate) * 
##     as.factor(TwoXL), data = subset(Gly, !is.na(Resprouts2)), 
##     init.theta = 1.066875614, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3089  -0.8645  -0.4785  -0.0076   5.8943  
## 
## Coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                  3.9174646  0.1609886  24.334
## TotalArea                                   -0.0010558  0.0006388  -1.653
## as.factor(Glyphosate)30                     -2.9171767  0.2418736 -12.061
## as.factor(Glyphosate)60                     -2.8840926  0.2438923 -11.825
## as.factor(Glyphosate)120                    -3.8672390  0.2713929 -14.250
## as.factor(Glyphosate)240                    -3.8397251  0.2718888 -14.122
## as.factor(TwoXL)25                          -0.0421728  0.2259953  -0.187
## as.factor(TwoXL)50                          -0.0780272  0.2232482  -0.350
## as.factor(Glyphosate)30:as.factor(TwoXL)25   0.8670685  0.3371471   2.572
## as.factor(Glyphosate)60:as.factor(TwoXL)25   0.7105101  0.3405917   2.086
## as.factor(Glyphosate)120:as.factor(TwoXL)25  0.9664391  0.3646010   2.651
## as.factor(Glyphosate)240:as.factor(TwoXL)25  1.3472877  0.3590322   3.753
## as.factor(Glyphosate)30:as.factor(TwoXL)50   0.4170670  0.3370134   1.238
## as.factor(Glyphosate)60:as.factor(TwoXL)50  -0.8726981  0.3653991  -2.388
## as.factor(Glyphosate)120:as.factor(TwoXL)50  0.7347533  0.3682894   1.995
## as.factor(Glyphosate)240:as.factor(TwoXL)50  0.0385079  0.3837520   0.100
##                                             Pr(>|z|)    
## (Intercept)                                  < 2e-16 ***
## TotalArea                                   0.098363 .  
## as.factor(Glyphosate)30                      < 2e-16 ***
## as.factor(Glyphosate)60                      < 2e-16 ***
## as.factor(Glyphosate)120                     < 2e-16 ***
## as.factor(Glyphosate)240                     < 2e-16 ***
## as.factor(TwoXL)25                          0.851967    
## as.factor(TwoXL)50                          0.726708    
## as.factor(Glyphosate)30:as.factor(TwoXL)25  0.010118 *  
## as.factor(Glyphosate)60:as.factor(TwoXL)25  0.036969 *  
## as.factor(Glyphosate)120:as.factor(TwoXL)25 0.008033 ** 
## as.factor(Glyphosate)240:as.factor(TwoXL)25 0.000175 ***
## as.factor(Glyphosate)30:as.factor(TwoXL)50  0.215887    
## as.factor(Glyphosate)60:as.factor(TwoXL)50  0.016925 *  
## as.factor(Glyphosate)120:as.factor(TwoXL)50 0.046038 *  
## as.factor(Glyphosate)240:as.factor(TwoXL)50 0.920070    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.0669) family taken to be 1)
## 
##     Null deviance: 1621.65  on 588  degrees of freedom
## Residual deviance:  526.75  on 573  degrees of freedom
## AIC: 3054.1
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.0669 
##           Std. Err.:  0.0709 
## 
##  2 x log-likelihood:  -3020.1010

#Creating Null model to find sig of whole model through comparison of AICs
FT0 <- glm.nb(as.numeric(TallestHeight2) ~ 1 , data = subset(Gly, !is.na(Resprouts2)))


#Creating model without interaction effect between glyphosate
FT2 <- update(FT1, . ~ . - as.factor(Glyphosate):as.factor(TwoXL))
FT.aov<- anova(FT1,FT2)
FT.aov

## Likelihood ratio tests of Negative Binomial Models
## 
## Response: as.numeric(TallestHeight2)
##                                                    Model    theta Resid. df
## 1   TotalArea + as.factor(Glyphosate) + as.factor(TwoXL) 1.002658       581
## 2 (TotalArea) + as.factor(Glyphosate) * as.factor(TwoXL) 1.066876       573
##      2 x log-lik.   Test    df LR stat.      Pr(Chi)
## 1       -3058.755                                   
## 2       -3020.101 1 vs 2     8 38.65336 5.699321e-06

#Checking Poisson Assumption
FT3 <- glm(as.numeric(TallestHeight2) ~ (TotalArea)+as.factor(Glyphosate)*as.factor(TwoXL) , family = "poisson", data = subset(Gly, !is.na(Resprouts1)))
FT.Pois<-pchisq(2 * (logLik(FT1) - logLik(FT3)), df = 1, lower.tail = FALSE)
FT.Pois

## 'log Lik.' 0 (df=17)

Overall Model AIC: 3054.1013046
Null Model AIC: 3686.8419761
Reduced Model AIC: 3076.7546599

Interaction Effect p-value: 5.699321e-06

Poisson Assumption p-value: 0 (Not an error, just very low)

These results are very promising for the interaction effect across the board. We have significant p-values for every metric! I am happy with the negative binomial regression, because I don’t think we could have observed this effect with most other analyses. Please let me know your thoughts on this analysis and if there are any ways for it to be improved!

Graphing Models

Now comes the very fun part of showing the model visually. This presented some challenges because our interesting variables are categorical. My choice here was to show each model’s predicted measure (whether that be number of resprouts or height of tallest resprout) for each level of glyphosate separated by each level of 2XL. Confidence intervals (95%) were generated by predicting the measure across the full spectrum of the Total Area we recorded. Graphing interaction effects can be tricky, but this method is pretty recognized. Once again, the same order format is followed here as before.

Of course, I care a lot about data visualization and accessibility. So, keep a look out for any way to improve in that area. Another note, I couldn’t keep the color scheme for 2XL in this graph because there was no way to make a white, orange, and black color scheme remotely legible and professional. One specific question comes with the confidence intervals. The UCLA guide I mentioned earlier took the exponential values for the limits of their confidence intervals. I generated mine a little differently than the guide and I wondering the taking the exponential values is inappropriate here. I think a discussion would be more effective than notes, so if you are interested in working through this, let me know.

Note: I figured out that I should NOT be exponetiating the confidence intervals, they are just small because TotalArea does no affect the model much.

Same Year Measurements

Number of Resprouts

#Creating dataframe with whole range of TotalArea (our fixed effect) covered in the data
newdata <- data.frame(TotalArea = rep(seq(from = min(Gly$TotalArea), to = max(Gly$TotalArea), length.out = 100),15)  , 
                       Glyphosate = rep(factor(1:5, levels = 1:5, labels = levels(as.factor(Gly$Glyphosate))), 300), 
                       TwoXL = rep(factor(1:3,levels=1:3, labels = levels(as.factor(Gly$TwoXL))),500))

SRgraph2<-merge(unique(newdata$TotalArea),unique(newdata[2:3]))#merging this to itself creates a dataframe with only the unique combinations in it. It is 1,5000 rows long because there are 15 combinations of Glyphosate and 2XL multiplied the 100 TotalAreas accross the spectrum we generated with the last command. 
colnames(SRgraph2)[1]<- "TotalArea" #The merge ditches the column name so this just adds it back


SRgraph2$PredictedResprouts<- predict(SR1, SRgraph2, type = "response") #adding model predictions that vary across Total Area for each unique combination of glyphosate and 2XL

#Here we aggregate the mean predicted value, the standard deviation, and standard error for each combination
SRgraph3<-aggregate(PredictedResprouts~Glyphosate+TwoXL, data = SRgraph2, FUN = mean)
SRgraph3<-cbind(SRgraph3,aggregate(PredictedResprouts~Glyphosate+TwoXL, data = SRgraph2, FUN = function(PredictedResprouts) (SD=sd(PredictedResprouts)))[3])
colnames(SRgraph3)[4]<- "sd"
SRgraph4<-mutate(SRgraph3, SE = sd/sqrt(100))
  

#The plot itself
ggplot(aes(y = PredictedResprouts, x = as.factor(Glyphosate), group =TwoXL, color = TwoXL), data = SRgraph4)+
  scale_color_manual(values=c("black", "darkOrange", "blue"))+
  geom_errorbar(aes(ymin = PredictedResprouts - (1.96*SE), ymax = PredictedResprouts + (1.96*SE)), alpha = 1.5, width = .5, size = 1, position=position_dodge(width=0.5))+
  geom_point(aes(shape = TwoXL), position=position_dodge(width=0.5), size = 4, alpha = 1)+
  #geom_line(aes(color = TwoXL), size = 1.25, position=position_dodge(width=0.5), alpha = .5)+
  labs(title = "Predicted Number of Resprouts with 95% Confidence Intervals", subtitle = "Same Year", y = "Predicted Number of Resprouting Stems", x = "Glyphosate (g/L)")+
  theme_bw()+
  theme(plot.title = element_text(size = 15, face = "bold"))

Height of Tallest Resprout

Note: the process for graphing each measure’s model was the same

STgraph2<-merge(unique(newdata$TotalArea),unique(newdata[2:3]))
colnames(STgraph2)[1]<- "TotalArea"


STgraph2$PredictedResprouts<- predict(ST1, STgraph2, type = "response")


STgraph3<-aggregate(PredictedResprouts~Glyphosate+TwoXL, data = STgraph2, FUN = mean)

STgraph3<-cbind(STgraph3,aggregate(PredictedResprouts~Glyphosate+TwoXL, data = STgraph2, FUN = function(PredictedResprouts) (SD=sd(PredictedResprouts)))[3])
colnames(STgraph3)[4]<- "sd"
STgraph4<-mutate(STgraph3, SE = sd/sqrt(100))
  


ggplot(aes(y = PredictedResprouts, x = as.factor(Glyphosate), group =TwoXL, color = TwoXL), data = STgraph4)+
    scale_color_manual(values=c("black", "darkOrange", "blue"))+
  geom_errorbar(aes(ymin = PredictedResprouts - (1.96*SE), ymax = PredictedResprouts + (1.96*SE)), alpha = 1.5, width = .5, size = 1, position=position_dodge(width=0.5))+
  geom_point(aes(shape = TwoXL), position=position_dodge(width=0.5), size = 4, alpha = 1)+
  #geom_line(aes(color = TwoXL), size = 1.25, position=position_dodge(width=0.5), alpha = .5)+
  labs(title = "Predicted Tallest Resprout Height with 95% Confidence Intervals", subtitle = "Same Year", y = "Predicted Height (cm)", x = "Glyphosate (g/L)")+
  theme_bw()+
  theme(plot.title = element_text(size = 15, face = "bold"))

Following Year

Number of Resprouts

FRgraph2<-merge(unique(newdata$TotalArea),unique(newdata[2:3]))
colnames(FRgraph2)[1]<- "TotalArea"


FRgraph2$PredictedResprouts<- predict(FR1, FRgraph2, type = "response")


FRgraph3<-aggregate(PredictedResprouts~Glyphosate+TwoXL, data = FRgraph2, FUN = mean)

FRgraph3<-cbind(FRgraph3,aggregate(PredictedResprouts~Glyphosate+TwoXL, data = FRgraph2, FUN = function(PredictedResprouts) (SD=sd(PredictedResprouts)))[3])
colnames(FRgraph3)[4]<- "sd"
FRgraph4<-mutate(FRgraph3, SE = sd/sqrt(100))
  


ggplot(aes(y = PredictedResprouts, x = as.factor(Glyphosate), group =TwoXL, color = TwoXL), data = FRgraph4)+
  scale_color_manual(values=c("black", "darkOrange", "blue"))+
  geom_errorbar(aes(ymin = PredictedResprouts - (1.96*SE), ymax = PredictedResprouts + (1.96*SE)), alpha = 1.5, width = .5, size = 1, position=position_dodge(width=0.5))+
  geom_point(aes(shape = TwoXL), position=position_dodge(width=0.5), size = 4, alpha = 1)+
  #geom_line(aes(color = TwoXL), size = 1.25, position=position_dodge(width=0.5), alpha = .5)+
  labs(title = "Predicted Number of Resprouts with 95% Confidence Intervals", subtitle = "Following Year", y = "Predicted Number of Resprouting Stems", x = "Glyphosate (g/L)")+
  theme_bw()+
  theme(plot.title = element_text(size = 15, face = "bold"))

Height of Tallest Resprout

FTgraph2<-merge(unique(newdata$TotalArea),unique(newdata[2:3]))
colnames(FTgraph2)[1]<- "TotalArea"


FTgraph2$PredictedResprouts<- predict(FT1, FTgraph2, type = "response")


FTgraph3<-aggregate(PredictedResprouts~Glyphosate+TwoXL, data = FTgraph2, FUN = mean)

FTgraph3<-cbind(FTgraph3,aggregate(PredictedResprouts~Glyphosate+TwoXL, data = FTgraph2, FUN = function(PredictedResprouts) (SD=sd(PredictedResprouts)))[3])
colnames(FTgraph3)[4]<- "sd"
FTgraph4<-mutate(FTgraph3, SE = sd/sqrt(100))
  


ggplot(aes(y = PredictedResprouts, x = as.factor(Glyphosate), group =TwoXL, color = TwoXL), data = FTgraph4)+
  scale_color_manual(values=c("black", "darkOrange", "blue"))+
  geom_errorbar(aes(ymin = PredictedResprouts - (1.96*SE), ymax = PredictedResprouts + (1.96*SE)), alpha = 1.5, width = .5, size = 1, position=position_dodge(width=0.5))+
  geom_point(aes(shape = TwoXL), position=position_dodge(width=0.5), size = 4, alpha = 1)+
  #geom_line(aes(color = TwoXL), size = 1.25, position=position_dodge(width=0.5), alpha = .5)+
  labs(title = "Predicted Tallest Resprout Height with 95% Confidence Intervals", subtitle = "Following Year", y = "Predicted Height (cm)", x = "Glyphosate (g/L)")+
  theme_bw()+
  theme(plot.title = element_text(size = 15, face = "bold"))

Wow! Okay, that concludes my first pass at the Enzyme Study data analysis. I hope you thought it was good or at least interesting. Quick conclusion is that 2XL and Glyphosate may be working in conjunction except that 2XL might be making the glyphosate worse! This may have interesting implications as to how glyphosate is inducted into the vascular system. There is probably more within the negative binomial regressions to analyze and I have deer browse data to either incooperate or to do a separate analysis on. I’ve worked on this for several straight days and nights, so a break from looking at R will be really nice. I am eagerly your responses and please remember that all constructive criticism and positive reinforcement are openly accepted.

-Ben J. Rivera

Helpful links in the making of this analysis

Glyphosate-Enzyme Study Data Exploration and Analysis

Ben J. Rivera

8/10/2020

Libraries Used

Dataframe Management

Exploratory Graphs

Same Year Measurements

Number of Resprouts

Height of Tallest Resprout

Following Year Measurements

Number of Resprouts

Height of Tallest Resprout

Histograms for Normality

Negative Binomial Regression

Same Year Measures

Number of Resprouts

Height of Tallest Resprout

Following Year

Number of Resprouts

Tallest Height

Graphing Models

Same Year Measurements

Number of Resprouts

Height of Tallest Resprout

Following Year

Number of Resprouts

Height of Tallest Resprout