Filter for depth of coverage
PopAppend was modified for environment.
Also modified for family
For example:
## [1] "high pass Depth_mw filter: 15"
## [1] "low pass Depth_mw filter: 50000"
## [1] "number of edits: 136770"
## Chrom Site Depth_mw Id Type Sr Ad Dp Pop Env Family
## 4995 NC_024339.1 16929461 17 oo2 AG 1 1 16 AH NP 211
## 4996 NC_024339.1 16929464 16 oo2 <NA> 0 0 0 AH NP 211
## 4997 NC_024339.1 16930594 15 oo2 AG 1 1 15 AH NP 211
## 4998 NC_024339.1 16930601 16 oo2 <NA> 0 0 0 AH NP 211
## 4999 NC_024339.1 16930606 17 oo2 <NA> 0 0 0 AH NP 211
## 5000 NC_024339.1 16930608 17 oo2 <NA> 0 0 0 AH NP 211
## [1] "All observations summary stats:"
## Chrom Site Depth_mw Id
## NC_024343.1:12128 Min. : 191 Min. : 15.00 oo7 : 4495
## NC_024331.1:11302 1st Qu.: 5068898 1st Qu.: 18.00 oo2 : 4022
## NC_024342.1: 9998 Median :15850299 Median : 24.00 oo3 : 3881
## NC_024346.1: 9323 Mean :15972658 Mean : 40.44 oo8 : 3828
## NC_024349.1: 8608 3rd Qu.:25800657 3rd Qu.: 39.00 o16 : 3760
## NC_024332.1: 8408 Max. :45011448 Max. :650.00 o15 : 3734
## (Other) :77003 (Other):113050
## Type Sr Ad Dp
## AG : 11545 Min. : 0.0000 Min. : 0.0000 Min. : 0.000
## TC : 6509 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.000
## NA's:118716 Median : 0.0000 Median : 0.0000 Median : 0.000
## Mean : 0.5739 Mean : 0.6574 Mean : 3.958
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :419.0000 Max. :427.0000 Max. :554.000
##
## Pop Env Family
## AH:68664 NP:69235 Min. :201
## NJ:68106 P :67535 1st Qu.:205
## Median :210
## Mean :210
## 3rd Qu.:215
## Max. :219
##
## [1] "Depth_mw Counts"
## AH NJ
## P 1360288 1364583
## NP 1409563 1397157
## [1] "Supporting Reads Counts"
## AH NJ
## P 18233 19281
## NP 20704 20277
## [1] "Depth_mw Reads Proportions"
## AH NJ
## P 0.246 0.247
## NP 0.255 0.253
## [1] "Supporting Reads Proportions"
## AH NJ
## P 0.232 0.246
## NP 0.264 0.258
## [1] "Chisq for Depth_mw"
## Number of cases in table: 5531591
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 49.71, df = 1, p-value = 1.78e-12
## [1] "Chisq for Supporting Reads"
## Number of cases in table: 78495
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 28.815, df = 1, p-value = 7.963e-08
For Depth_mw, i.e. depth counts for every fish, regardless of editing (0/1) Note: Depth_mw includes Sr reads
Deviation from expected 1:1:1:1 ratio?
For at least 1 supporting read, i.e. high pass filter > 0 Sr
This is the dataset equivalent to the SPRINT called sites, i.e. observations have at least 1 Sr.
After adding in counted depths for all non-editing fish, i.e. this dataset has depths for all fish, both editing and non-editing, Sr >= 0.
## [1] "all edit sites"
glm, glmTMB and glm.nb model fitting
##
## Call:
## glm(formula = Sr ~ Pop * Env + offset(log(Depth_mw)), family = poisson,
## data = yes.filt.FamCorrect, weights = NULL)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.344 -1.014 -0.785 -0.681 52.282
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.22071 0.00695 -607.332 < 2e-16 ***
## PopNJ -0.01200 0.00988 -1.215 0.225
## EnvP -0.09151 0.01016 -9.011 < 2e-16 ***
## PopNJ:EnvP 0.06473 0.01429 4.529 5.93e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 471169 on 136769 degrees of freedom
## Residual deviance: 471073 on 136766 degrees of freedom
## AIC: 521973
##
## Number of Fisher Scoring iterations: 7
## Family: nbinom2 ( log )
## Formula: Sr ~ Pop * Env + offset(log(Depth_mw))
## Zero inflation: ~1
## Data: yes.filt.FamCorrect
##
## AIC BIC logLik deviance df.resid
## 187599.6 187658.5 -93793.8 187587.6 136764
##
##
## Overdispersion parameter for nbinom2 family (): 0.104
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.24973 0.08641 -37.61 <2e-16 ***
## PopNJ 0.04486 0.03359 1.34 0.182
## EnvP -0.04913 0.03389 -1.45 0.147
## PopNJ:EnvP 0.03750 0.04791 0.78 0.434
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.2912 0.1954 -1.49 0.136
##
## Call:
## glm.nb(formula = Sr ~ Pop:Env + offset(log(Depth_mw)), data = yes.filt.FamCorrect,
## init.theta = 0.05414392485, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7810 -0.5480 -0.4994 -0.4713 2.1044
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.77479 0.02439 -154.788 <2e-16 ***
## PopAH:EnvNP -0.03112 0.03423 -0.909 0.363
## PopNJ:EnvNP 0.01150 0.03443 0.334 0.738
## PopAH:EnvP -0.08704 0.03465 -2.512 0.012 *
## PopNJ:EnvP NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.0541) family taken to be 1)
##
## Null deviance: 43094 on 136769 degrees of freedom
## Residual deviance: 43085 on 136766 degrees of freedom
## AIC: 187620
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.054144
## Std. Err.: 0.000510
##
## 2 x log-likelihood: -187610.312000
## [1] "fit.glm diagnostics"
## [1] "Anova glmTMB"
## Df Sum Sq Mean Sq F value Pr(>F)
## Pop 1 7 7.38 0.740 0.3898
## Env 1 46 46.24 4.635 0.0313 *
## Pop:Env 1 9 8.73 0.875 0.3495
## Residuals 136766 1364545 9.98
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Anova glm.nb"
## Df Sum Sq Mean Sq F value Pr(>F)
## Pop:Env 3 62 20.785 2.083 0.1
## Residuals 136766 1364545 9.977
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = fit.glmTMB)
##
## $Pop
## diff lwr upr p adj
## NJ-AH 0.01469104 -0.0187894 0.04817148 0.389777
##
## $Env
## diff lwr upr p adj
## P-NP -0.03677571 -0.07025845 -0.003292962 0.0313405
##
## $`Pop:Env`
## diff lwr upr p adj
## NJ:NP-AH:NP -0.0007375496 -0.06242293 0.060947834 0.9999896
## AH:P-AH:NP -0.0527023699 -0.11465307 0.009248328 0.1272134
## NJ:P-AH:NP -0.0214708770 -0.08323906 0.040297303 0.8085017
## AH:P-NJ:NP -0.0519648203 -0.11433505 0.010405405 0.1403878
## NJ:P-NJ:NP -0.0207333274 -0.08292227 0.041455611 0.8272019
## NJ:P-AH:P 0.0312314929 -0.03122062 0.093683607 0.5726684
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = fit.glm.nb)
##
## $`Pop:Env`
## diff lwr upr p adj
## NJ:NP-AH:NP -0.0007375496 -0.06242293 0.060947834 0.9999896
## AH:P-AH:NP -0.0527023699 -0.11465307 0.009248328 0.1272134
## NJ:P-AH:NP -0.0214708770 -0.08323906 0.040297303 0.8085017
## AH:P-NJ:NP -0.0519648203 -0.11433505 0.010405405 0.1403878
## NJ:P-NJ:NP -0.0207333274 -0.08292227 0.041455611 0.8272019
## NJ:P-AH:P 0.0312314929 -0.03122062 0.093683607 0.5726684