Edit_genomScan

SR and DP Filtration

## [1] "Starting number of data points: 197439"

##   DP_level SR_level SR_top editLevel_mean editLevel_sd   AH   NJ    P   NP
## 1       80        5    500          0.216        0.206 1257 1433 1422 1268
##   Total_survive AH_NJ_rat P_NP_rat AHP AHNP NJP NJNP
## 1          2690     0.877    1.121 608  649 814  619

Exploratory Plots

Standardize all variables (Ind. and dep.), i.e. Z-scores, subtract mean from each value and then divide these new values by the standard deviation of the var.

Super long tail but only a few observations make up the tail

After removing outliers

Still has long tail but ~5 sd, started 20 sd

SR minimum filter: 5 Depth: 80 SR top filter: 500

Filtered for at least 2

    #file.tmp$Location = as.integer(ffile.tmp$Location)
    
    file.tmp %>%
      filter(N_Fish >= N_fish_filt) -> tmp
      

    tmp[,c("ChromLoc", "Pop", "Env", "Family", "SupReads", "DP")] -> tmp
    
    #tmp$SupReads = scale(tmp$SupReads, center = TRUE, scale = TRUE) 
    #tmp$DP = scale(tmp$DP, center = T, scale = T)
    
    tmp$Family = as.factor(tmp$Family)
    
    #m1 <- glmmTMB(SupReads ~ Pop*Env + Family + offset(log(DP)), data = tmp, family = nbinom1())
    
    m2 <- glm.nb(SupReads ~ Pop*Env + offset(log(DP)), data = tmp)
    
    rootogram(m2)

    summary(m2)

## 
## Call:
## glm.nb(formula = SupReads ~ Pop * Env + offset(log(DP)), data = tmp, 
##     init.theta = 1.475156545, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6138  -1.0864  -0.5205   0.3927   2.3361  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.57282    0.03470 -45.322   <2e-16 ***
## PopNJ        0.04627    0.05004   0.925    0.355    
## EnvP         0.04362    0.05000   0.872    0.383    
## PopNJ:EnvP  -0.03621    0.06908  -0.524    0.600    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.4752) family taken to be 1)
## 
##     Null deviance: 2640.4  on 2443  degrees of freedom
## Residual deviance: 2638.9  on 2440  degrees of freedom
## AIC: 20676
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.4752 
##           Std. Err.:  0.0410 
## 
##  2 x log-likelihood:  -20666.0090

#    tmp[order(tmp$ChromLoc), ] -> tmp
    
   

popNj_pval = 0
EnvP_pval = 0
popNj_EnvP_pval = 0

The red curved line is the theoretical Poisson fit. “Hanging” from each point on the red line is a bar, the height of which represents the difference between expected and observed counts. A bar hanging below 0 indicates underfitting. A bar hanging above 0 indicates overfitting. The counts have been transformed with a square root transformation to prevent smaller counts from getting obscured and overwhelmed by larger counts. We see a great deal of underfitting for counts 2 and higher and massive overfitting for the 1 count.

\(AIC = −2log(L)+2K\)

interpretation of a model with offset in Poisson regression is the change in the rate of the outcome per the offset variable for changes in the independent variables. Use incident rate ratios ….IRR.

Edit_genomScan

Miles Whedbee

2/28/2020

To begin,

SR and DP Filtration

Exploratory Plots