Read in data

Filter for depth of coverage

Script for env append: envAppend.sh

PopAppend was modified for environment.

Script for family append: famAppend.sh

Also modified for family

For example:

## [1] "high pass Depth_mw filter: 15"
## [1] "low pass Depth_mw filter: 50000"
## [1] "number of edits: 136770"
##            Chrom     Site Depth_mw  Id Type Sr Ad Dp Pop Env Family
## 4995 NC_024339.1 16929461       17 oo2   AG  1  1 16  AH  NP    211
## 4996 NC_024339.1 16929464       16 oo2 <NA>  0  0  0  AH  NP    211
## 4997 NC_024339.1 16930594       15 oo2   AG  1  1 15  AH  NP    211
## 4998 NC_024339.1 16930601       16 oo2 <NA>  0  0  0  AH  NP    211
## 4999 NC_024339.1 16930606       17 oo2 <NA>  0  0  0  AH  NP    211
## 5000 NC_024339.1 16930608       17 oo2 <NA>  0  0  0  AH  NP    211
## [1] "All observations summary stats:"
##          Chrom            Site             Depth_mw            Id        
##  NC_024343.1:12128   Min.   :     191   Min.   : 15.00   oo7    :  4495  
##  NC_024331.1:11302   1st Qu.: 5068898   1st Qu.: 18.00   oo2    :  4022  
##  NC_024342.1: 9998   Median :15850299   Median : 24.00   oo3    :  3881  
##  NC_024346.1: 9323   Mean   :15972658   Mean   : 40.44   oo8    :  3828  
##  NC_024349.1: 8608   3rd Qu.:25800657   3rd Qu.: 39.00   o16    :  3760  
##  NC_024332.1: 8408   Max.   :45011448   Max.   :650.00   o15    :  3734  
##  (Other)    :77003                                       (Other):113050  
##    Type              Sr                 Ad                 Dp         
##  AG  : 11545   Min.   :  0.0000   Min.   :  0.0000   Min.   :  0.000  
##  TC  :  6509   1st Qu.:  0.0000   1st Qu.:  0.0000   1st Qu.:  0.000  
##  NA's:118716   Median :  0.0000   Median :  0.0000   Median :  0.000  
##                Mean   :  0.5739   Mean   :  0.6574   Mean   :  3.958  
##                3rd Qu.:  0.0000   3rd Qu.:  0.0000   3rd Qu.:  0.000  
##                Max.   :419.0000   Max.   :427.0000   Max.   :554.000  
##                                                                       
##  Pop        Env            Family   
##  AH:68664   NP:69235   Min.   :201  
##  NJ:68106   P :67535   1st Qu.:205  
##                        Median :210  
##                        Mean   :210  
##                        3rd Qu.:215  
##                        Max.   :219  
## 

## [1] "Depth_mw Counts"
##         AH      NJ
## P  1360288 1364583
## NP 1409563 1397157
## [1] "Supporting Reads Counts"
##       AH    NJ
## P  18233 19281
## NP 20704 20277
## [1] "Depth_mw Reads Proportions"
##       AH    NJ
## P  0.246 0.247
## NP 0.255 0.253
## [1] "Supporting Reads Proportions"
##       AH    NJ
## P  0.232 0.246
## NP 0.264 0.258
## [1] "Chisq for Depth_mw"
## Number of cases in table: 5531591 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 49.71, df = 1, p-value = 1.78e-12
## [1] "Chisq for Supporting Reads"
## Number of cases in table: 78495 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 28.815, df = 1, p-value = 7.963e-08
Depth_mw Chisq

For Depth_mw, i.e. depth counts for every fish, regardless of editing (0/1) Note: Depth_mw includes Sr reads

Sr Chisq

Deviation from expected 1:1:1:1 ratio?

For at least 1 supporting read, i.e. high pass filter > 0 Sr

This is the dataset equivalent to the SPRINT called sites, i.e. observations have at least 1 Sr.

After adding in counted depths for all non-editing fish, i.e. this dataset has depths for all fish, both editing and non-editing, Sr >= 0.

## [1] "all edit sites"

glm, glmTMB and glm.nb model fitting

## 
## Call:
## glm(formula = Sr ~ Pop * Env + offset(log(Depth_mw)), family = poisson, 
##     data = yes.filt.FamCorrect, weights = NULL)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -4.344  -1.014  -0.785  -0.681  52.282  
## 
## Coefficients:
##             Estimate Std. Error  z value Pr(>|z|)    
## (Intercept) -4.22071    0.00695 -607.332  < 2e-16 ***
## PopNJ       -0.01200    0.00988   -1.215    0.225    
## EnvP        -0.09151    0.01016   -9.011  < 2e-16 ***
## PopNJ:EnvP   0.06473    0.01429    4.529 5.93e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 471169  on 136769  degrees of freedom
## Residual deviance: 471073  on 136766  degrees of freedom
## AIC: 521973
## 
## Number of Fisher Scoring iterations: 7
##  Family: nbinom2  ( log )
## Formula:          Sr ~ Pop * Env + offset(log(Depth_mw))
## Zero inflation:      ~1
## Data: yes.filt.FamCorrect
## 
##      AIC      BIC   logLik deviance df.resid 
## 187599.6 187658.5 -93793.8 187587.6   136764 
## 
## 
## Overdispersion parameter for nbinom2 family (): 0.104 
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.24973    0.08641  -37.61   <2e-16 ***
## PopNJ        0.04486    0.03359    1.34    0.182    
## EnvP        -0.04913    0.03389   -1.45    0.147    
## PopNJ:EnvP   0.03750    0.04791    0.78    0.434    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Zero-inflation model:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.2912     0.1954   -1.49    0.136
## 
## Call:
## glm.nb(formula = Sr ~ Pop:Env + offset(log(Depth_mw)), data = yes.filt.FamCorrect, 
##     init.theta = 0.05414392485, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7810  -0.5480  -0.4994  -0.4713   2.1044  
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error  z value Pr(>|z|)    
## (Intercept) -3.77479    0.02439 -154.788   <2e-16 ***
## PopAH:EnvNP -0.03112    0.03423   -0.909    0.363    
## PopNJ:EnvNP  0.01150    0.03443    0.334    0.738    
## PopAH:EnvP  -0.08704    0.03465   -2.512    0.012 *  
## PopNJ:EnvP        NA         NA       NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.0541) family taken to be 1)
## 
##     Null deviance: 43094  on 136769  degrees of freedom
## Residual deviance: 43085  on 136766  degrees of freedom
## AIC: 187620
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.054144 
##           Std. Err.:  0.000510 
## 
##  2 x log-likelihood:  -187610.312000

## [1] "fit.glm diagnostics"

## [1] "Anova glmTMB"
##                 Df  Sum Sq Mean Sq F value Pr(>F)  
## Pop              1       7    7.38   0.740 0.3898  
## Env              1      46   46.24   4.635 0.0313 *
## Pop:Env          1       9    8.73   0.875 0.3495  
## Residuals   136766 1364545    9.98                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Anova glm.nb"
##                 Df  Sum Sq Mean Sq F value Pr(>F)
## Pop:Env          3      62  20.785   2.083    0.1
## Residuals   136766 1364545   9.977
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = fit.glmTMB)
## 
## $Pop
##             diff        lwr        upr    p adj
## NJ-AH 0.01469104 -0.0187894 0.04817148 0.389777
## 
## $Env
##             diff         lwr          upr     p adj
## P-NP -0.03677571 -0.07025845 -0.003292962 0.0313405
## 
## $`Pop:Env`
##                      diff         lwr         upr     p adj
## NJ:NP-AH:NP -0.0007375496 -0.06242293 0.060947834 0.9999896
## AH:P-AH:NP  -0.0527023699 -0.11465307 0.009248328 0.1272134
## NJ:P-AH:NP  -0.0214708770 -0.08323906 0.040297303 0.8085017
## AH:P-NJ:NP  -0.0519648203 -0.11433505 0.010405405 0.1403878
## NJ:P-NJ:NP  -0.0207333274 -0.08292227 0.041455611 0.8272019
## NJ:P-AH:P    0.0312314929 -0.03122062 0.093683607 0.5726684
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = fit.glm.nb)
## 
## $`Pop:Env`
##                      diff         lwr         upr     p adj
## NJ:NP-AH:NP -0.0007375496 -0.06242293 0.060947834 0.9999896
## AH:P-AH:NP  -0.0527023699 -0.11465307 0.009248328 0.1272134
## NJ:P-AH:NP  -0.0214708770 -0.08323906 0.040297303 0.8085017
## AH:P-NJ:NP  -0.0519648203 -0.11433505 0.010405405 0.1403878
## NJ:P-NJ:NP  -0.0207333274 -0.08292227 0.041455611 0.8272019
## NJ:P-AH:P    0.0312314929 -0.03122062 0.093683607 0.5726684