Main database binning documentation

Kevin Keenan, 2014

Lough Neagh brown trout project.

Introduction

This notebook outlines the binning process used for the microsatellite loci employed in my Ph.D. thesis. The dataset consists of more than 6000 individuals, so manual binning in GeneMapper v4.0 became increasingly complex as the sample size grew: more samples meant greater variance in fragment size, which made it difficult to define the start and end of each allele bin by hand.

To overcome this limitation, I used a customised version of Alberto’s (2009) R package, MsatAllele, to visualise the cumulative distribution of fragment sizes and to define binning parameters as objectively as possible. The custom version of MsatAllele is available at https://github.com/kkeenan02/MsatAllele. It contains a number of new functions, mainly designed to speed up computation, largely through C++ code integrated via the Rcpp package. In particular, the database reader function in the custom version is around 500 times faster than the original.
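The core idea of reading bin boundaries off a cumulative plot can be sketched in base R. This is a minimal illustration, not the MsatAllele implementation: fragments are sorted, and a new bin starts wherever the gap between consecutive sizes exceeds a single `limit` (bp). The helper name `gap_bins()` and the fragment sizes are hypothetical.

```r
# Assign bins by sorting fragment sizes and breaking wherever the gap
# between neighbouring sizes exceeds `limit` (bp).
# Hypothetical helper for illustration; not part of MsatAllele.
gap_bins <- function(frag, limit = 0.8) {
  ord <- order(frag)
  s <- frag[ord]
  # The first fragment opens bin 1; every gap larger than `limit` opens a new bin
  bin_id <- cumsum(c(TRUE, diff(s) > limit))
  # Return bin ids in the original order of `frag`
  out <- integer(length(frag))
  out[ord] <- bin_id
  out
}

# Hypothetical fragments: three alleles ~2 bp apart, 10 reads each
frag <- c(110.20 + 0:9 * 0.05, 112.10 + 0:9 * 0.05, 114.00 + 0:9 * 0.05)
bins <- gap_bins(frag, limit = 0.8)
print(table(bins))  # three bins of 10 fragments each

# Cumulative plot: fragment size against rank. Bins show up as near-flat
# runs separated by vertical jumps in size.
plot(seq_along(frag), sort(frag), xlab = "Rank", ylab = "Fragment size (bp)",
     pch = 20)
```

A single gap threshold works well when alleles are separated by several base pairs; the region-specific limits described next address loci where it does not.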

Another benefit of the custom MsatAllele package is a binning routine that allows complex binning criteria to be defined, making the binning process more flexible and accurate. This is especially important when binning fragments for loci with complex repeat patterns. In principle, users can specify different bin limits for any region within a given locus’ size range.
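Region-specific limits can be sketched by letting the gap threshold depend on fragment size. The `list(c(upper_bp, limit), ...)` form below copies the `limit` argument passed to `allCum()` later in this notebook, but its interpretation here (each pair applies its limit up to `upper_bp`, and the last limit is reused beyond the final bound) is an assumption, as are the hypothetical helpers `limit_at()` and `var_gap_bins()`.

```r
# Look up the limit in force at each fragment size.
# Assumed semantics: c(upper_bp, limit) applies `limit` to sizes in
# (previous upper_bp, upper_bp]; sizes above the last bound reuse the last limit.
limit_at <- function(size, limits) {
  uppers <- vapply(limits, function(p) p[1], numeric(1))
  lims   <- vapply(limits, function(p) p[2], numeric(1))
  idx <- findInterval(size, uppers, left.open = TRUE) + 1
  lims[pmin(idx, length(lims))]
}

# Gap-based binning where each gap is compared with the limit in force
# at the larger of the two neighbouring fragments (hypothetical helper)
var_gap_bins <- function(frag, limits) {
  ord <- order(frag)
  s <- frag[ord]
  thr <- limit_at(s[-1], limits)
  bin_id <- cumsum(c(TRUE, diff(s) > thr))
  out <- integer(length(frag))
  out[ord] <- bin_id
  out
}

# Hypothetical fragments: 4 bp steps below 220 bp, 1 bp steps above
frag <- c(210.0, 210.2, 214.1, 214.3, 237.0, 237.1, 237.7, 237.8)

fixed    <- var_gap_bins(frag, list(c(400, 0.8)))
variable <- var_gap_bins(frag, list(c(220, 0.8), c(293, 0.3), c(312.5, 0.45)))
print(fixed)     # 1 1 2 2 3 3 3 3: the two 1 bp alleles are merged
print(variable)  # 1 1 2 2 3 3 4 4: the 0.3 limit above 220 bp separates them
```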

This script documents the binning process for the LNBT project.

Load the necessary packages
library("MsatAllele")
library("ggplot2")
library("dplyr")  # provides %>%, filter() and mutate(), used below

Read the baseline database

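# Parse the fragment database and cache it as an RDS file
DB_orig <- fastReadFrag("Main_DB.txt", as.character(Sys.Date()), "all")
saveRDS(DB_orig, "Main_DB.rds")
# Later sessions can reload the cached copy instead of re-parsing the text file
DB_orig <- readRDS("Main_DB.rds")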
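The outputs further down show one row per fragment, with `Marker`, `Sample` and `Fragment` columns. As a rough base-R illustration of that layout, assuming a GeneMapper-style table with one row per sample and marker and allele sizes in `Size.1`/`Size.2` (the column names used in the Sssp2201 correction below), the sizes can be stacked into long format. The `Sample.Name` column, the `to_long()` helper and the values are hypothetical; this is not how `fastReadFrag()` works internally.

```r
# Hypothetical GeneMapper-style export: one row per sample x marker
export <- data.frame(
  Sample.Name = c("S1", "S2", "S3"),
  Marker      = c("Ssa85", "Ssa85", "One102a"),
  Size.1      = c(110.38, 112.34, 167.22),
  Size.2      = c(114.28, NA, 170.32),
  stringsAsFactors = FALSE
)

# Stack the Size.* columns into one row per fragment and drop missing
# sizes (e.g. a blank second size)
to_long <- function(df, size_cols = c("Size.1", "Size.2")) {
  parts <- lapply(size_cols, function(col) {
    data.frame(Marker = df$Marker, Sample = df$Sample.Name,
               Fragment = df[[col]], stringsAsFactors = FALSE)
  })
  long <- do.call(rbind, parts)
  long <- long[!is.na(long$Fragment), ]
  long[order(long$Marker, long$Fragment), ]
}

frags <- to_long(export)
print(frags)
```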

Ssa85

Calculate bin statistics for Ssa85

dat <- BinStats(DB_orig, "Ssa85")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
100 1 100.1 100.1 0 NA 100.1 100.1
107 78 106 106.73 0.73 0.176 106.51 106.57
110 2096 109.98 110.67 0.69 0.14 110.38 110.39
112 2108 111.93 112.6 0.67 0.141 112.34 112.36
114 3471 113.81 114.57 0.76 0.126 114.28 114.28
116 2306 115.74 116.45 0.71 0.131 116.18 116.2
118 51 117.71 118.32 0.61 0.143 118.1 118.1
120 2 119.72 119.91 0.19 0.134 119.81 119.81
133 1 133.18 133.18 0 NA 133.18 133.18
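`BinStats()` belongs to the custom package; the per-bin summary it reports can be approximated in base R as below. This sketch assumes fragments already carry a bin assignment and labels each bin by its rounded mean size. That matches the tables in this notebook but is an assumption about how `BinStats()` names bins. The helper `bin_stats()` and the data are hypothetical.

```r
# Hypothetical fragments already assigned to bins
frag <- c(110.2, 110.4, 110.5, 112.3, 112.4, 114.1)
bin  <- c(1, 1, 1, 2, 2, 3)

# One row per bin with the same summary columns as the table above
bin_stats <- function(frag, bin) {
  rows <- lapply(split(frag, bin), function(x) {
    data.frame(Bins = round(mean(x)), N = length(x),
               Min = min(x), Max = max(x), Range = max(x) - min(x),
               Sd = sd(x),  # NA for a single fragment, as in the tables here
               MEAN = mean(x), MEDIAN = median(x))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  round(out, 4)
}

res <- bin_stats(frag, bin)
print(res)
```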


Generate cumulative plot for Ssa85

res <- allCum(DB_orig, "Ssa85", limit = 0.8)
print(res$plt)


All bins look good, so alleles will be called without further checks. Only low-frequency alleles will be checked; these are noted below.
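Conceptually, the low-frequency check counts how many distinct individuals carry each bin and returns the calls from bins below a threshold. A base-R sketch of that idea follows; it is not the `getLowFreq()` implementation, and the `low_freq()` helper and the data are hypothetical. Like the code below, it returns `NULL` when nothing is flagged.

```r
# Hypothetical long-format calls with bins already assigned
calls <- data.frame(
  Sample   = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Bin      = c(110, 114, 110, 114, 110, 112, 114, 120),
  Fragment = c(110.4, 114.3, 110.3, 114.2, 110.5, 112.3, 114.3, 119.7),
  stringsAsFactors = FALSE
)

low_freq <- function(calls, min_n = 3) {
  # Number of distinct individuals carrying each bin
  n_ind <- tapply(calls$Sample, calls$Bin, function(s) length(unique(s)))
  rare <- as.numeric(names(n_ind)[n_ind < min_n])
  out <- calls[calls$Bin %in% rare, c("Sample", "Fragment", "Bin")]
  if (nrow(out) == 0) return(NULL)
  out[order(out$Fragment), ]
}

print(low_freq(calls, 3))  # bins 112 and 120 each occur in one individual
```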

Extract samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa85", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
LND14139 10068 100.1 all
SXM13111 9147 119.7 all
RBW102804 7028 119.9 all
RBW040504 222 133.2 all


  • LND14139 = Fragment 100.1 was an artifact peak
  • SXM13111 = appears to be legit. Size standard is normal.
  • RBW040504 = Legit
  • RBW102804 = Legit.

DONE!

One102a

Calculate bin statistics for One102a

dat <- BinStats(DB_orig, "One102a")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
163 1 163.06 163.06 0 NA 163.06 163.06
167 3736 166.82 167.59 0.77 0.152 167.22 167.23
170 5055 169.9 170.71 0.81 0.151 170.32 170.34


Generate cumulative plot for One102a

res <- allCum(DB_orig, "One102a")
print(res$plt)


Bins are good. There appears to be a single sample with the ‘163’ allele. This sample will be checked in GeneMapper for validity, and amended accordingly.

Extract samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "One102a", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
CLG040702 13637 163.1 all


  • CLG040702 = Legit.

DONE!

One102b

Calculate bin statistics for One102b

dat <- BinStats(DB_orig, "One102b")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
178 2 178.22 178.26 0.04 0.028 178.24 178.24
191 81 190.38 190.82 0.44 0.086 190.64 190.64
195 63 194.44 195.05 0.61 0.162 194.79 194.82
199 225 198.53 199.12 0.59 0.157 198.91 198.97
203 521 202.58 203.23 0.65 0.147 202.96 202.99
207 1260 206.63 207.32 0.69 0.157 207.04 207.06
211 1486 210.65 211.4 0.75 0.161 211.1 211.12
215 1882 214.18 215.46 1.28 0.159 215.19 215.22
219 1693 218.92 220 1.08 0.147 219.32 219.34
223 1168 222.99 223.68 0.69 0.159 223.38 223.4
227 754 227.03 227.72 0.69 0.144 227.44 227.47
232 202 231.1 231.74 0.64 0.151 231.51 231.54
236 113 235.2 235.8 0.6 0.154 235.55 235.56
240 81 239.23 239.82 0.59 0.136 239.59 239.63
244 212 243.38 244.03 0.65 0.133 243.76 243.79
248 466 247.56 248.21 0.65 0.129 247.91 247.92
252 240 251.64 252.26 0.62 0.123 252.01 252.01
256 557 255.67 256.29 0.62 0.13 256.02 256.04
260 161 259.71 260.27 0.56 0.119 260.03 260
264 41 263.82 264.27 0.45 0.097 264.06 264.08
268 11 267.85 268.29 0.44 0.141 268.05 268.01


Generate cumulative plot for One102b

res <- allCum(DB_orig, "One102b")
print(res$plt)


All good. Only the low-frequency alleles need to be inspected.

Extract samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "One102b", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
KLW100206 25032 178.2 all
CLM050204 20730 178.3 all


  • CLM050204 = Legit.
  • KLW100206 = Legit.

DONE!

Ssa406UoS

Calculate bin statistics for Ssa406UoS

dat <- BinStats(DB_orig, "Ssa406UoS")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
414 1 414.18 414.18 0 NA 414.18 414.18
422 108 420.86 421.77 0.91 0.17 421.51 421.52
425 106 424.85 425.74 0.89 0.144 425.44 425.46
429 101 428.58 429.64 1.06 0.192 429.21 429.23
433 7 432.86 433.46 0.6 0.192 433.28 433.32
435 16 434.82 435.22 0.4 0.115 435.03 435.05
437 2 436.92 436.96 0.04 0.028 436.94 436.94
439 21 438.42 439.03 0.61 0.167 438.79 438.8
443 32 442.37 443.02 0.65 0.158 442.8 442.82
445 53 444.51 445.23 0.72 0.162 444.92 444.92
447 429 445.95 446.98 1.03 0.152 446.68 446.7
448 1 447.77 447.77 0 NA 447.77 447.77
449 78 448.04 449.08 1.04 0.168 448.73 448.76
450 1 449.78 449.78 0 NA 449.78 449.78
451 530 450.07 450.92 0.85 0.171 450.58 450.59
453 45 451.88 452.95 1.07 0.213 452.61 452.62
455 1165 453.83 454.89 1.06 0.157 454.56 454.57
456 1 456.39 456.39 0 NA 456.39 456.39
458 1 457.86 457.86 0 NA 457.86 457.86
459 782 457.92 458.86 0.94 0.151 458.51 458.53
461 17 460.35 460.81 0.46 0.131 460.62 460.64
462 1302 461.64 462.74 1.1 0.163 462.4 462.42
464 56 463.89 464.68 0.79 0.161 464.42 464.44
466 671 465.6 466.65 1.05 0.15 466.29 466.31
468 6 467.98 468.44 0.46 0.206 468.24 468.25
470 314 469.21 470.48 1.27 0.182 470.17 470.2
472 12 471.97 472.43 0.46 0.124 472.26 472.25
474 76 473.88 474.4 0.52 0.132 474.18 474.18
476 10 475.93 476.22 0.29 0.099 476.06 476
478 49 477.66 478.2 0.54 0.128 478.03 478.04
480 1 479.8 479.8 0 NA 479.8 479.8
482 92 481.47 482.24 0.77 0.18 481.88 481.9
484 7 483.84 484.2 0.36 0.115 483.99 484.01
486 143 485.34 486.16 0.82 0.163 485.81 485.83
488 8 487.62 487.97 0.35 0.132 487.79 487.75
490 164 489.26 490.11 0.85 0.181 489.74 489.76
492 12 491.34 491.92 0.58 0.165 491.71 491.74
494 36 493.46 494 0.54 0.147 493.75 493.77
496 5 495.47 495.76 0.29 0.115 495.64 495.66
498 46 497.09 497.95 0.86 0.192 497.61 497.64
500 24 499.09 499.91 0.82 0.194 499.6 499.62
502 255 500.97 501.91 0.94 0.17 501.54 501.55
504 14 503.37 503.85 0.48 0.149 503.6 503.62
505 418 504.66 505.8 1.14 0.178 505.44 505.46
508 1 507.61 507.61 0 NA 507.61 507.61
509 466 508.55 509.73 1.18 0.172 509.36 509.39
511 9 510.85 511.6 0.75 0.214 511.32 511.3
513 360 512.57 513.8 1.23 0.185 513.28 513.32
515 9 515.11 515.52 0.41 0.143 515.28 515.26
517 153 516.64 517.64 1 0.2 517.29 517.3
519 8 519 519.51 0.51 0.192 519.23 519.24
521 240 520.6 521.65 1.05 0.195 521.24 521.26
525 242 524.59 525.47 0.88 0.172 525.12 525.14
527 1 527.26 527.26 0 NA 527.26 527.26
529 74 528.71 529.27 0.56 0.144 529.05 529.08
531 3 530.8 531.01 0.21 0.112 530.93 530.97
533 82 532.14 533.11 0.97 0.189 532.84 532.87
535 2 534.6 534.84 0.24 0.17 534.72 534.72
537 104 536.17 537.04 0.87 0.161 536.66 536.65
539 3 538.37 539.05 0.68 0.342 538.73 538.78
541 83 540 540.96 0.96 0.206 540.53 540.57
544 39 543.85 544.78 0.93 0.168 544.43 544.43
550 1 550.27 550.27 0 NA 550.27 550.27
554 1 554.11 554.11 0 NA 554.11 554.11
556 4 556.12 556.3 0.18 0.085 556.24 556.27


Generate cumulative plot for Ssa406UoS

res <- allCum(DB_orig, "Ssa406UoS")
print(res$plt)


Identify issues

  • Problem sample with peak around 449.6.

      res <- allCum(DB_orig, "Ssa406UoS", limit = 0.9, ymin = 445, ymax = 455)
      print(res$plt)

    • Identify the sample:
      DB_orig %>%
        filter(Marker == "Ssa406UoS") %>%
        filter(Fragment >= 449.6 & Fragment <= 450)
         Marker   Sample Fragment       Date Plate
    1 Ssa406UoS RMN13048   449.78 2015-03-07   all
    • The sample is from plates 59+, so 0.35 was added to the fragment.
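The same +0.35 bp correction recurs throughout these notes for samples run on the shifted plates. A minimal sketch of applying it by plate and size window is below. It assumes a numeric plate number is available for each sample (the `Plate` column in the database outputs here only reads `all`). The `shift_plates()` helper, the `PlateNo` column and the plate set (plate 25 plus plates 59 to 67, 67 being the highest plate mentioned in these notes) are hypothetical.

```r
# Hypothetical fragments with a numeric plate number per sample
frags <- data.frame(
  Sample   = c("X1", "X2", "X3", "X4"),
  PlateNo  = c(12, 25, 59, 62),
  Fragment = c(450.58, 449.80, 449.78, 458.51)
)

# Add `shift` bp to fragments that come from one of `plates` AND fall
# inside [lower, upper]; all other fragments are left untouched
shift_plates <- function(frags, plates, lower, upper, shift = 0.35) {
  hit <- frags$PlateNo %in% plates &
    frags$Fragment >= lower & frags$Fragment <= upper
  frags$Fragment[hit] <- frags$Fragment[hit] + shift
  frags
}

shifted_plates <- c(25, 59:67)  # assumed set of shifted plates
out <- shift_plates(frags, shifted_plates, lower = 449.6, upper = 450)
print(out)
```

Restricting by both plate and size window keeps the correction local, so fragments from unshifted plates that happen to fall in the same window are not moved.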

Extract samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Ssa406UoS", 3, limit = 0.9)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
CLG040205 33969 436.9 all
CLG040210 33979 437 all
IOM120253 36978 456.4 all
RMN13042 39044 479.8 all
SJH100146 36710 507.6 all
BLD100506 35844 527.3 all
BLD102302 36034 531 all
GVY100304 37311 531 all
LSN100202 37493 534.6 all
LSN10T1N205 38312 534.8 all
KGR100507 37764 538.8 all
GGR100504 37866 538.4 all
BLD100602 35856 550.3 all
BLD101203 35912 554.1 all


  • BLD100506 = Peak 527.26 legit.
  • BLD100602 = Peak 550.27 legit.
  • BLD101203 = Peak 554.11 legit.
  • BLD102302 = Peak 530.97 legit.
  • BLD102303 = Weak sample, but 530.8 peak is legit.
  • CLG040205 = Peak 436.92 legit.
  • CLG040210 = Peak 436.96 legit.
  • CLG041309 = Peak 539.05 artifact. Amended in “Main_DB_new.txt”.
  • GGR100504 = Peak 538.37 legit.
  • GVY100304 = Peak 531.01 legit.
  • IOM120253 = Peak 456.39 legit.
  • KGR100507 = Peak 538.78 legit.
  • LSN100202 = Peak 534.60 legit.
  • LSN10T1N205 = Peak 534.84 legit.
  • PRB100301 = Peak 414.18 artifact. Amended in “Main_DB_new.txt”.
  • RMN13042 = Peak 479.8 legit.
  • RMN13046 = Peak 447.77 legit.
  • RMN13048 = Peak 449.78 legit.
  • RMN13054 = Peak 457.86 legit.
  • SJH100146 = Peak 507.61 legit.
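Several calls above were judged to be artifacts and amended by hand in “Main_DB_new.txt”. One way to keep such decisions reproducible, offered as a hypothetical alternative rather than the procedure used here, is to list the flagged calls in a small table and drop matching fragments by marker, sample and size within a tolerance. The `drop_artifacts()` helper and the 0.01 bp tolerance are assumptions; the sample names and sizes are taken from the notes above.

```r
# Fragment table (values taken from the notes above)
frags <- data.frame(
  Marker   = "Ssa406UoS",
  Sample   = c("CLG041309", "PRB100301", "RMN13042", "SJH100146"),
  Fragment = c(539.05, 414.18, 479.80, 507.61),
  stringsAsFactors = FALSE
)

# Calls judged to be artifacts during manual review
artifacts <- data.frame(
  Marker   = "Ssa406UoS",
  Sample   = c("CLG041309", "PRB100301"),
  Fragment = c(539.05, 414.18),
  stringsAsFactors = FALSE
)

# Drop any fragment matching a flagged call on marker, sample and size
drop_artifacts <- function(frags, artifacts, tol = 0.01) {
  is_artifact <- vapply(seq_len(nrow(frags)), function(i) {
    any(artifacts$Marker == frags$Marker[i] &
        artifacts$Sample == frags$Sample[i] &
        abs(artifacts$Fragment - frags$Fragment[i]) <= tol)
  }, logical(1))
  frags[!is_artifact, , drop = FALSE]
}

clean <- drop_artifacts(frags, artifacts)
print(clean)
```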


DONE!

CA048302

Calculate bin statistics for CA048302

dat <- BinStats(DB_orig, "CA048302")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
172 9 171.72 171.97 0.25 0.079 171.86 171.85
176 8 175.54 175.8 0.26 0.089 175.71 175.75
178 2 177.54 177.54 0 0 177.54 177.54
180 2661 179.3 179.65 0.35 0.061 179.5 179.53
181 96 181.21 181.56 0.35 0.055 181.42 181.43
183 314 183.17 183.5 0.33 0.059 183.37 183.37
185 3336 185.09 185.48 0.39 0.059 185.3 185.3
187 1840 187.01 187.41 0.4 0.057 187.23 187.24
189 520 188.89 189.31 0.42 0.057 189.17 189.18
191 1 191.08 191.08 0 NA 191.08 191.08
193 707 192.85 193.18 0.33 0.058 193.05 193.05
197 45 196.73 197.03 0.3 0.067 196.91 196.92


Generate cumulative plot for CA048302

res <- allCum(DB_orig, "CA048302")
print(res$plt)


Extract samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "CA048302", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
UBN140040 47073 177.5 all
SXM13058 47757 177.5 all
LTB040204 43591 191.1 all


  • LTB040204 = Peak 191.08 legit.
  • SXM13058 = Peak 177.54 legit.
  • UBN140040 = Peak 177.54 legit.

DONE!

Ssa419UoS

Calculate bin statistics for Ssa419UoS

dat <- BinStats(DB_orig, "Ssa419UoS")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
280 1813 279.42 280 0.58 0.093 279.78 279.82
299 3 299.24 299.32 0.08 0.04 299.28 299.29
365 245 364.62 365.31 0.69 0.12 365.07 365.09
369 3989 368.44 369.3 0.86 0.113 368.99 369.01
397 1 396.94 396.94 0 NA 396.94 396.94
450 23 449.88 450.35 0.47 0.113 450.19 450.2
454 11 454.06 454.44 0.38 0.111 454.27 454.28
532 4 531.47 531.69 0.22 0.09 531.58 531.59
536 12 536.17 536.75 0.58 0.161 536.45 536.47
539 3239 538.29 539.62 1.33 0.175 539.23 539.24


Generate cumulative plot for Ssa419UoS

res <- allCum(DB_orig, "Ssa419UoS")
print(res$plt)


Extract samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa419UoS", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
SXM101104 54637 299.2 all
RBW053501 49559 299.3 all
RBW102506 55359 299.3 all
RBW040504 48976 396.9 all


  • RBW040504 = Peak 396.94 legit.
  • RBW053501 = Peak 299.29 legit.
  • RBW102506 = Peak 299.32 legit.
  • SXM101104 = Peak 299.24 legit.

DONE!

Ssa416UoS

Calculate bin statistics for Ssa416UoS

dat <- BinStats(DB_orig, "Ssa416UoS")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
113 63 112.59 113.34 0.75 0.166 112.97 113.01
122 1136 121.81 123.19 1.38 0.177 122.31 122.32
131 503 130.5 131.79 1.29 0.189 131 131.01
140 5515 139.48 140.47 0.99 0.198 140.12 140.1


Generate cumulative plot for Ssa416UoS

res <- allCum(DB_orig, "Ssa416UoS", limit = 0.4)
print(res$plt)


Extract samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa416UoS", 3, limit = 0.4)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
RBW043901 58524 123.2 all
RBW043901 58525 131.8 all


  • RBW043901 = Both alleles were the result of an incorrect size standard peak. Corrected in Main_DB_new.txt


res <- allCum(DB_new, "Ssa416UoS")
print(res$plt)


DONE!

Sssp2201

Calculate bin statistics for Sssp2201

dat <- BinStats(DB_orig, "Sssp2201")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
181 1 180.57 180.57 0 NA 180.57 180.57
182 10 181.29 181.81 0.52 0.133 181.6 181.61
186 9 185.29 185.88 0.59 0.241 185.62 185.66
194 58 193.34 193.94 0.6 0.158 193.69 193.69
198 102 197.39 198.04 0.65 0.17 197.78 197.81
202 359 201.42 202.06 0.64 0.14 201.79 201.81
206 295 205.34 206.07 0.73 0.156 205.82 205.85
209 1 208.54 208.54 0 NA 208.54 208.54
210 390 209.41 210.1 0.69 0.154 209.81 209.83
214 148 213.41 214.09 0.68 0.173 213.82 213.91
218 151 217.54 218.12 0.58 0.129 217.87 217.9
222 90 221.07 222.16 1.09 0.232 221.87 221.91
225 3 224.68 224.76 0.08 0.04 224.72 224.71
226 310 224.92 226.19 1.27 0.182 225.92 225.96
229 54 228.62 229.08 0.46 0.115 228.94 228.95
230 158 229.09 230.21 1.12 0.248 229.81 229.88
233 511 232.59 233.78 1.19 0.209 233.01 232.97
234 169 233.79 234.21 0.42 0.09 233.93 233.91
237 1035 236.53 237.86 1.33 0.199 236.97 236.96
238 106 237.87 238.14 0.27 0.079 237.97 237.97
241 691 240.5 241.84 1.34 0.224 241.01 241.01
242 152 241.85 242.19 0.34 0.069 241.97 241.97
245 686 244.61 246.07 1.46 0.224 245.15 245.15
246 22 246.09 246.32 0.23 0.07 246.18 246.16
249 627 248.62 250.09 1.47 0.21 249.16 249.16
250 10 250.18 250.35 0.17 0.084 250.26 250.22
252 3 251.95 251.99 0.04 0.021 251.97 251.98
253 961 252.24 254.19 1.95 0.2 253.17 253.2
254 7 254.23 254.37 0.14 0.054 254.32 254.35
257 241 256.11 257.42 1.31 0.214 257.14 257.17
261 201 260.17 261.33 1.16 0.15 261.1 261.11
265 124 264.73 265.35 0.62 0.142 265.1 265.12
269 40 268.79 269.79 1 0.203 269.12 269.09
270 8 270.04 270.26 0.22 0.068 270.17 270.18
273 187 272.74 274.13 1.39 0.138 273.16 273.17
277 145 276.64 277.42 0.78 0.177 277.12 277.15
279 1 278.6 278.6 0 NA 278.6 278.6
281 114 280.76 281.39 0.63 0.121 281.19 281.19
282 1 282.46 282.46 0 NA 282.46 282.46
285 197 284.72 286.1 1.38 0.156 285.14 285.15
286 17 286.13 286.57 0.44 0.143 286.41 286.46
289 323 288.65 289.37 0.72 0.138 289.11 289.12
290 61 290.06 290.48 0.42 0.08 290.36 290.38
293 168 292.66 293.93 1.27 0.149 293.06 293.08
294 24 293.99 294.6 0.61 0.183 294.4 294.47
297 123 296.58 297.32 0.74 0.164 297.07 297.09
298 42 297.93 298.58 0.65 0.177 298.3 298.35
301 61 300.64 301.31 0.67 0.194 300.98 301.03
302 13 302.01 302.51 0.5 0.182 302.22 302.26
305 17 304.75 305.19 0.44 0.114 304.98 304.99
306 89 305.88 306.52 0.64 0.126 306.31 306.33
309 22 308.89 309.18 0.29 0.085 309.01 308.99
310 87 309.74 310.47 0.73 0.166 310.25 310.29
313 2 313 313.01 0.01 0.007 313 313
314 14 313.74 314.66 0.92 0.268 314.16 314.23
317 27 316.63 317.45 0.82 0.285 316.91 316.77
318 29 317.67 318.54 0.87 0.286 318.14 318.03
319 21 318.64 319.52 0.88 0.186 318.89 318.91
320 11 319.82 320.6 0.78 0.208 320 319.91
321 5 320.82 321.07 0.25 0.096 320.98 321.01
322 34 321.16 322.6 1.44 0.358 322.06 322.01
324 8 323.83 324.14 0.31 0.099 323.95 323.96
326 67 324.8 326.13 1.33 0.265 325.8 325.87
328 18 327.55 328.17 0.62 0.175 327.92 327.97
329 25 328.25 329.06 0.81 0.167 328.9 328.96
330 90 329.09 330.15 1.06 0.242 329.82 329.89
331 13 330.82 331.48 0.66 0.194 331.06 331.03
333 42 331.95 333.13 1.18 0.214 332.86 332.93
334 65 333.16 334.18 1.02 0.19 333.87 333.91
335 39 334.76 335.92 1.16 0.312 335.31 335.22
336 11 335.97 336.08 0.11 0.035 336.01 336.02
337 61 336.6 337.23 0.63 0.169 336.95 336.98
338 111 337.55 338.51 0.96 0.132 338 338.02
339 27 339.04 340 0.96 0.322 339.42 339.33
341 32 340.47 341.09 0.62 0.16 340.84 340.88
342 136 341.13 342.71 1.58 0.283 341.89 341.94
343 19 342.93 343.63 0.7 0.181 343.16 343.11
344 6 343.85 343.95 0.1 0.042 343.91 343.91
345 141 344.06 345.64 1.58 0.239 345 344.98
346 99 345.72 346.7 0.98 0.198 346.04 346
347 69 346.74 347.92 1.18 0.258 347.14 347.1
348 1 348.02 348.02 0 NA 348.02 348.02
349 75 348.11 349.24 1.13 0.305 348.88 348.97
350 95 349.26 350.72 1.46 0.225 349.9 349.92
351 2 351.11 351.44 0.33 0.233 351.27 351.27
353 6 352.55 352.79 0.24 0.084 352.68 352.68
354 87 353.01 354.78 1.77 0.327 353.97 354.01
355 21 354.94 355.42 0.48 0.145 355.22 355.23
357 1 356.82 356.82 0 NA 356.82 356.82
358 119 357.11 358.85 1.74 0.25 358 358.01
359 14 359.08 359.49 0.41 0.138 359.4 359.47
361 16 360.47 361.08 0.61 0.22 360.83 360.81
362 103 361.14 362.78 1.64 0.351 361.91 361.94
363 2 363 363.31 0.31 0.219 363.15 363.15
365 3 364.79 364.97 0.18 0.095 364.9 364.93
366 9 365.36 366.01 0.65 0.207 365.87 365.96
367 6 366.69 367.4 0.71 0.33 367 366.93
370 6 369.63 369.95 0.32 0.121 369.87 369.9
371 9 370.82 371.41 0.59 0.219 371.23 371.31
374 4 373.59 374 0.41 0.192 373.85 373.9
375 1 375.36 375.36 0 NA 375.36 375.36
378 17 377.51 378.22 0.71 0.222 377.97 378.01
382 5 381.78 382.36 0.58 0.234 382.01 381.93
383 12 382.41 383.23 0.82 0.295 383 383.13
386 58 385.15 386.92 1.77 0.332 386.1 386.15
390 6 390.19 390.45 0.26 0.099 390.37 390.4
398 2 398.12 398.35 0.23 0.163 398.24 398.24


Generate cumulative plot for Sssp2201

res <- allCum(DB_orig, "Sssp2201")
print(res$plt)


Identify problems

  • There is a possibly problematic sample ~ 180.5bp:

      res <- allCum(DB_orig, "Sssp2201", ymin = 170, ymax = 200)
      print(res$plt)

    • Identify the sample:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 180 & Fragment <= 181)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 BLD102301   180.57 2015-03-07   all
    • BLD102301 = Peak is an artifact. Edited in “Main_DB_new.txt”.
  • Fragments begin to exhibit 1bp jumps from 220bp onwards. A binning limit of 0.3 seems to allow differentiation of these fragments:

      res <- allCum(DB_new, "Sssp2201", limit = 0.3, ymin = 215, 
                    ymax = 230)
      print(res$plt)

  • The fragments between 236 - 251bp require a binning limit of 0.2 to allow the algorithm to differentiate alleles.

      res <- allCum(DB_new, "Sssp2201", limit = 0.2, ymin = 232,
                    ymax = 251)
      print(res$plt)

  • There is a problematic group of fragments between 251.7 - 254.8:

      res <- allCum(DB_new, "Sssp2201", limit = 0.2, ymin = 245,
                    ymax = 260)
      print(res$plt)

    • First, identify the lower samples between 251 and 252.6, split into smaller and larger fragments:
      # Smaller fragments
      DB_new %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 251 & Fragment <= 252.2)
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 SXM13048   251.98 2015-03-07   all
    2 Sssp2201 SXM13086   251.99 2015-03-07   all
    3 Sssp2201 SXM13131   251.95 2015-03-07   all
      # Larger group of fragments
      lrg <- DB_new %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 252.21 & Fragment <= 252.6)
      # Return 3 samples
      lrg[1:3,]
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 KEL050808   252.32 2015-03-07   all
    2 Sssp2201 CLG040219   252.27 2015-03-07   all
    3 Sssp2201 RMN103104   252.32 2015-03-07   all
    • All fragments are legit. The break between these groups is due to a consistent downward shift of ~0.4 bp in adult river samples and lake samples. These were among the last samples screened, and the shift may reflect a technical change, since the capillary array on the ABI was replaced before these plates were run. The current binning pattern will be retained.
  • Another group of these technically shifted fragments appears at ~269.8 bp:

      res <- allCum(DB_new, "Sssp2201", limit = 0.2, ymin = 263,
                    ymax = 273)
      print(res$plt)

    • Identify the smaller fragments and check whether they were screened on Plate 59 or later. If so, the current binning pattern is appropriate.
      DB_new %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 269.5 & Fragment <= 269.9)
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 SXM13043   269.79 2015-03-07   all
    2 Sssp2201 SXM13064   269.74 2015-03-07   all
    3 Sssp2201 SXM13106   269.69 2015-03-07   all
    • All three samples are from either Plate 59 or Plate 60.
  • A binning limit of 0.2 is too low for the bin at ~294 bp. Setting the bin limit to 0.45 between 280 and 312 bp overcomes this issue:

      res <- allCum(DB_new, "Sssp2201", limit = 0.45, ymin = 290,
                    ymax = 312)
      print(res$plt)

  • There is a sole fragment at the top of the bin @ 314bp:

      res <- allCum(DB_new, "Sssp2201", limit = 0.45, ymin = 308,
                    ymax = 315)
      print(res$plt)

    • Identify the fragment, as well as the two small fragments at 313 bp:
      # Small samples first
      DB_new %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 312.7 & Fragment <= 313.2)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 BLD100602   313.01 2015-03-07   all
    2 Sssp2201 BLD100802   313.00 2015-03-07   all
      # Single large fragment
      DB_new %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 314.5 & Fragment <= 314.7)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 CLG051401   314.66 2015-03-07   all
    • All fragments are legit.
  • There are a number of problematic fragments between 316 and 323 bp:

      res <- allCum(DB_new, "Sssp2201", limit = 0.45, ymin = 316,
                    ymax = 323)
      print(res$plt)

    • Identify the fragments between 316.8 and 317.5:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 316.8 & Fragment <= 317.5)
         Marker    Sample Fragment       Date Plate
    1  Sssp2201 RBW041803   316.81 2015-03-07   all
    2  Sssp2201 RBW041804   316.81 2015-03-07   all
    3  Sssp2201 RBW041806   316.81 2015-03-07   all
    4  Sssp2201 RBW051805   316.81 2015-03-07   all
    5  Sssp2201 CLM051601   316.81 2015-03-07   all
    6  Sssp2201 KLW050110   316.81 2015-03-07   all
    7  Sssp2201 BLD101401   317.09 2015-03-07   all
    8  Sssp2201  SXM13030   317.40 2015-03-07   all
    9  Sssp2201  SXM13061   317.40 2015-03-07   all
    10 Sssp2201  SXM13115   317.35 2015-03-07   all
    11 Sssp2201  SXM13185   317.44 2015-03-07   all
    12 Sssp2201  LND14039   317.45 2015-03-07   all
    13 Sssp2201  LND14071   317.45 2015-03-07   all
    • All samples between 317.3 - 317.5 are from plates > 59 and exhibit the downward shift mentioned above. 0.35 will be added to these fragments so that they fall into the bin at ~318 bp:
      # Read the database
      DB <- read.delim("Main_DB_new.txt", header = TRUE)
      # Subset Sssp2201 fragments
      DB_loc <- DB %>% filter(Marker == "Sssp2201")
      # Add 0.35 to shifted fragments (317.3 - 317.5 bp) in both allele columns
      DB_loc <- DB_loc %>%
        mutate(Size.1 = ifelse(Size.1 >= 317.3 & Size.1 <= 317.5, 
                               Size.1 + 0.35, Size.1),
               Size.2 = ifelse(Size.2 >= 317.3 & Size.2 <= 317.5, 
                               Size.2 + 0.35, Size.2))
      # Put the corrected rows back into the main DB (filter() keeps row order)
      DB[DB$Marker == "Sssp2201",] <- DB_loc
      # Back up the old database as "Main_DB_new.txt.bkp1", then overwrite it
      file.copy("Main_DB_new.txt", "Main_DB_new.txt.bkp1")
      write.table(DB, file = "Main_DB_new.txt", append = FALSE, sep = "\t", 
                  na = "", row.names = FALSE, col.names = TRUE, quote = FALSE)
      # Re-read the corrected database and cache it
      DB_new <- fastReadFrag("Main_DB_new.txt", as.character(Sys.Date()), "all")
      saveRDS(DB_new, "Main_DB_new.rds")
  • There is a problem bin between 318.2 - 319.1 bp

      res <- allCum(DB_new, "Sssp2201", limit = 0.45, ymin = 317.5, 
                    ymax = 319.5)
      print(res$plt)

    • Identify 5 samples from the group of fragments between 318.3 - 318.6:
      temp <- DB_new %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 318.3 & Fragment <= 318.6)
      temp[sample(1:nrow(temp), 5, replace = FALSE),]
         Marker    Sample Fragment       Date Plate
    1  Sssp2201 RBW043603   318.52 2015-01-29   all
    11 Sssp2201  SXM13114   318.47 2015-01-29   all
    8  Sssp2201  SXM13087   318.42 2015-01-29   all
    7  Sssp2201 DGR100303   318.54 2015-01-29   all
    13 Sssp2201  SXM13168   318.47 2015-01-29   all
    • Identify 5 samples from the group between 318.61 - 319.5:
      temp <- DB_new %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 318.61 & Fragment <= 319.5)
      temp[sample(1:nrow(temp), 5, replace = FALSE),]
         Marker    Sample Fragment       Date Plate
    7  Sssp2201 SXM100705   318.73 2015-01-29   all
    9  Sssp2201 SXM100925   318.91 2015-01-29   all
    20 Sssp2201 GRB100207   318.92 2015-01-29   all
    18 Sssp2201 FML100402   319.01 2015-01-29   all
    2  Sssp2201 CLG102406   318.78 2015-01-29   all
    • On closer inspection of the samples in this region, the differences are not clear enough to warrant splitting the bin. A bin limit of 0.55 between 312.5 and 319.5 allows these samples to be binned into the same allele.
      res <- allCum(DB_new, "Sssp2201", ymin = 310, ymax = 320,
                    limit = list(c(220, 0.8), c(293, 0.3),
                                 c(312.5, 0.45), c(319.5, 0.55),
                                 c(450, 0.45)))
      print(res$plt)

  • There is a single fragment at the lower end of the bin @ 320bp:

      res <- allCum(DB_orig, "Sssp2201", ymin = 318, ymax = 321,
                    limit = list(c(220, 0.8), c(293, 0.3),
                                 c(312.5, 0.45), c(319.5, 0.55),
                                 c(450, 0.45)))
      print(res$plt)

    • Identify the fragment:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 319.3 & Fragment <= 319.6)
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 RMN13127   319.52 2015-03-07   all
    • This fragment belongs to plate 62, one of the plates with the downward shift. 0.35 will be manually added to it.
  • There is another group of problem fragments between 320.2 - 322.8bp:

      res <- allCum(DB_new, "Sssp2201", ymin = 319.2, ymax = 325,
                    limit = list(c(220, 0.8), c(293, 0.3),
                                 c(312.5, 0.45), c(325, 0.55),
                                 c(450, 0.45)))
      print(res$plt)

    • Identify the points between 320.5 - 320.9:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 320.5 & Fragment <= 320.9)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 CLG041203   320.82 2015-03-07   all
    2 Sssp2201  LND14042   320.60 2015-03-07   all
    • Sample LND14042 is from plate 67. 0.35 will be added to the fragment manually.

    • Identify and check the fragments between 321.3 - 321.6:

      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 321.3 & Fragment <= 321.6)
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 RMN13093   321.43 2015-03-07   all
    2 Sssp2201 RMN13147   321.49 2015-03-07   all
    • Fragments are from plates 62 and 63. 0.35 will be added manually.
  • There is a problem bin ~ 328bp:

      res <- allCum(DB_orig, "Sssp2201", ymin = 327, ymax = 328.5,
                    limit = 0.55)
      print(res$plt)

    • Identify the lower four points:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 327 & Fragment <= 327.75)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 LTB050204   327.71 2015-03-07   all
    2 Sssp2201 LTB050209   327.55 2015-03-07   all
    3 Sssp2201 LTB050210   327.61 2015-03-07   all
    4 Sssp2201 RSB050101   327.68 2015-03-07   all
    • All four samples are from Plate 25. This plate was rerun on 14/04/14 because the size standard failed in the original run. Plate 25 samples show the same downward shift as plates 59+. I need to check the dates, but all of these plates appear to have been run after the ABI capillary array was replaced. 0.35 will be manually added to these samples.
  • There seems to be another group of samples that exhibit a small downward shift @ 328.7bp.

      res <- allCum(DB_new, "Sssp2201", ymin = 328, ymax = 330, limit = 0.55)
      print(res$plt)

    • Identify the three small fragments:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 328.6 & Fragment <= 328.8)
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 SXM13110   328.69 2015-03-07   all
    2 Sssp2201 SXM13151   328.73 2015-03-07   all
    3 Sssp2201 SXM13174   328.72 2015-03-07   all
    • All three samples are from plate 60. 0.35 will be manually added to the fragments.
  • There is a group of apparently large fragments ~ 329.4bp and a group of apparently small fragments ~ 329.6bp. Identify four from each.

      # lower group
      temp <-  DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 329.3 & Fragment <= 329.4)
      temp[sample(1:nrow(temp), 4, replace = FALSE),]
        Marker    Sample Fragment       Date Plate
    2 Sssp2201 RBW100108   329.35 2015-03-07   all
    1 Sssp2201 RBW051701   329.37 2015-03-07   all
    5 Sssp2201 BRD040820   329.33 2015-03-07   all
    3 Sssp2201 RBW100221   329.31 2015-03-07   all
      # upper group
      temp <-  DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 329.5 & Fragment <= 329.7)
      temp[sample(1:nrow(temp), 4, replace = FALSE),]
        Marker    Sample Fragment       Date Plate
    5 Sssp2201  RMN13131   329.57 2015-03-07   all
    3 Sssp2201 UBN140017   329.52 2015-03-07   all
    4 Sssp2201 UBN140076   329.61 2015-03-07   all
    1 Sssp2201 LGY050122   329.67 2015-03-07   all
    • All samples from the group of fragments between 329.5 - 329.7 are from either plate 25 or plate 59+. 0.35 will be added to each fragment.
  • The binning limit between 327 and ~330.6bp should be set to 0.4 to allow accurate binning of fragments within this range.

      res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 327, ymax = 330.5)
      print(res$plt)

  • There are some problem samples between 330.6 - 333:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 330.6, ymax = 333)
      print(res$plt)

    • Identify the two major outliers
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 331.9 & Fragment <= 332.4)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 KEL051709   331.95 2015-03-07   all
    2 Sssp2201 CGN100201   332.24 2015-03-07   all
    • KEL051709 = Fragment is legit.
    • CGN100201 = Fragment is legit.

    • By keeping the bin limit to 0.4, these two fragments can be binned into the same allele.

  • There are some downward shift samples @ 337.5bp:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 336, ymax = 339)
      print(res$plt)

    • Identify the points:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 337.5 & Fragment <= 337.7)
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 SXM13008   337.60 2015-03-07   all
    2 Sssp2201 RMN13115   337.55 2015-03-07   all
    • Both samples come from plates 59+. 0.35 will be added to the fragments.
  • There is a large fragment ~ 338.5:

    • Identify the point:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 338.4 & Fragment <= 338.6)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 PRB040105   338.51 2015-03-07   all
    • Peak is legit, but will be binned into the allele below it.
  • There is a problem fragment @ 339.6bp:

    • Identify the fragment:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 339.55 & Fragment <= 339.7)
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 RMN13107   339.63 2015-03-07   all
    • This sample is from plates 59+. 0.35 will be added to it.
  • There is a group of samples with the downward shift around 340.5bp.

    • Identify the samples:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 340.4 & Fragment <= 340.65)
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 SXM13158   340.57 2015-03-07   all
    2 Sssp2201 RMN13090   340.58 2015-03-07   all
    3 Sssp2201 RMN13125   340.47 2015-03-07   all
    4 Sssp2201 RMN13143   340.58 2015-03-07   all
    5 Sssp2201 RMN13213   340.59 2015-03-07   all
    6 Sssp2201 LND14085   340.57 2015-03-07   all
    • All samples are from plates 59+. 0.35 will be added to each.
  • There is an outlier sample at the top of the bin @

      res <- allCum(DB_new, "Sssp2201", limit = 0.4, ymin = 339, ymax = 342)
      print(res$plt)

    • Identify the point:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 341.22 & Fragment <= 341.38)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 UBN140018   341.29 2015-03-07   all
    • Sample is very weak. Genotype deleted.
  • Check if any of the samples at the bottom end of the bin ~ 341.7 are from downward-shift plates:

      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 341.4 & Fragment <= 341.64)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 UBN140025   341.44 2015-03-07   all
    2 Sssp2201 UBN140037   341.46 2015-03-07   all
    3 Sssp2201 UBN140131   341.43 2015-03-07   all
    4 Sssp2201 UBN140040   341.56 2015-03-07   all
    5 Sssp2201  SXM13103   341.53 2015-03-07   all
    6 Sssp2201  SXM13139   341.57 2015-03-07   all
    7 Sssp2201  BLD13021   341.55 2015-03-07   all
    8 Sssp2201  BLD13034   341.46 2015-03-07   all
    9 Sssp2201  LND14107   341.58 2015-03-07   all
    • All samples are from plates 59+. 0.35 will be added to fragments manually.
  • There is a small fragment at the bottom of the bin ~ 343bp:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 342, ymax = 345)
      print(res$plt)

    • Identify the point:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 342.6 & Fragment <= 342.8)      
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 RMN13069   342.71 2015-03-07   all
    • Sample is from plate 59+. 0.35 added.
  • There are also small fragments at the lower end of the next bin up.

    • Identify the fragments:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 343.5 & Fragment <= 343.7)      
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 SXM13022   343.63 2015-03-07   all
    2 Sssp2201 LND14017   343.56 2015-03-07   all
    • Both samples are from plates 59+. 0.35 added.
  • There is a group of downward-shift fragments ~ 344.6bp. All samples are from plates 59+. Fragments are being binned appropriately, so no manipulations will be made.

  • There is a group of downward-shift fragments at the bottom of the bin ~ 346bp.

      res <- allCum(DB_orig, "Sssp2201", limit = 0.4, ymin = 344, ymax = 348)
      print(res$plt)

    • Identify the points:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 345.4 & Fragment <= 345.69)      
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 RSB050105   345.50 2015-03-07   all
    2 Sssp2201  SXM13081   345.53 2015-03-07   all
    3 Sssp2201  SXM13157   345.64 2015-03-07   all
    4 Sssp2201  RMN13221   345.62 2015-03-07   all
    5 Sssp2201  BLD13042   345.51 2015-03-07   all
    6 Sssp2201  BLD13046   345.49 2015-03-07   all
    7 Sssp2201  BLD13085   345.47 2015-03-07   all
    8 Sssp2201  MOY13018   345.59 2015-03-07   all
    • All samples are from plate 25 or plates 59+. 0.35 will be added to fragments.
  • There is also a larger fragment associated with the bin ~ 346.

    • Identify the fragment:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 346.3 & Fragment <= 346.4)      
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 KEL051504   346.37 2015-03-07   all
    • Peak is legit. Will be binned with fragments below it.
  • There is a group of downward-shift fragments at the bin ~ 347. All samples are from plate 25 or plates 59+. Because fragments are being binned correctly, no manipulations are required.

  • Bin limit should be dropped to 0.35 from 0.4 for fragments above 347.5.

  • There are three isolated fragments between 350.5 - 351.8:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 350, ymax = 355)
      print(res$plt)

    • Identify the points:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 350.5 & Fragment <= 351.5)      
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 KEL040914   351.11 2015-03-07   all
    2 Sssp2201 LSN101405   351.44 2015-03-07   all
    3 Sssp2201  RMN13068   350.72 2015-03-07   all
    • All fragments are legit. RMN13068 will have 0.35 added to it and all fragments will be binned into the same allele.
  • There are some downward-shift fragments in the next two bins, but they do not affect binning, so no manipulations are required.

  • There is a group of fragments ~ 354.3 that is causing binning problems:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 352, ymax = 357)
      print(res$plt)

    • Identify the fragments:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 354.3 & Fragment <= 354.9)      
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 RBW052603   354.60 2015-03-07   all
    2 Sssp2201 RBW054005   354.51 2015-03-07   all
    3 Sssp2201 ALK100301   354.52 2015-03-07   all
    4 Sssp2201 BBN100401   354.42 2015-03-07   all
    5 Sssp2201 SHK100503   354.48 2015-03-07   all
    6 Sssp2201  BLD13075   354.60 2015-03-07   all
    7 Sssp2201  LND14028   354.78 2015-03-07   all
    • All peaks are legit. Two samples are from plates 59+ and have 0.35 added. This allows the algorithm to bin fragments appropriately.
  • Check the smaller fragments at the bottom of the bin ~ 358bp:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 356, ymax = 360)
      print(res$plt)

    • Identify the small point ~ 356.8:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 356.4 & Fragment <= 356.95)      
        Marker   Sample Fragment       Date Plate
    1 Sssp2201 MOY13044   356.82 2015-03-07   all
    • The sample is from plate 59+. 0.35 added.

    • Identify the small fragments between 357.4 - 357.7 bp:

      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 357.4 & Fragment <= 357.7)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 UBN140053   357.61 2015-03-07   all
    2 Sssp2201 UBN140096   357.64 2015-03-07   all
    3 Sssp2201  SXM13005   357.53 2015-03-07   all
    4 Sssp2201  RMN13235   357.46 2015-03-07   all
    • All samples are from plates 59+. 0.35 added. Allows alleles to be differentiated.
  • There are also two small fragments ~ 358.6

    • Identify the samples:
      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 358.5 & Fragment <= 358.65)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 KEL051208   358.59 2015-03-07   all
    2 Sssp2201 ALG100605   358.52 2015-03-07   all
    • Both fragments are legit.

    • Identify the next group of four fragments ~ 358.8:

      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 358.66 & Fragment <= 358.9)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 SJH100127   358.83 2015-03-07   all
    2 Sssp2201 SJH100139   358.85 2015-03-07   all
    3 Sssp2201 GVY100303   358.77 2015-03-07   all
    4 Sssp2201 GVY101005   358.78 2015-03-07   all
    • All fragments are legit.

    • Identify the fragments between 359 - 359.2:

      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 359 & Fragment <= 359.2)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 CMV100103   359.08 2015-03-07   all
    2 Sssp2201  LND14104   359.11 2015-03-07   all
    • CMV100103 = Peak is legit.
    • LND14104 = Sample is from plates 59+. 0.35 added.

    • Check the next two fragments between 359.3 - 359.43

      DB_orig %>%
         filter(Marker == "Sssp2201") %>%
         filter(Fragment >= 359.3 & Fragment <= 359.43)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 IOM120284   359.33 2015-03-07   all
    2 Sssp2201 ALG100102   359.40 2015-03-07   all
    • IOM120284 = Peak is 359.4. Edited.
    • ALG100102 = Peak is 359.5. Edited.
  • Conservatively, all samples between 358.4 - 359.6 will be binned into the same allele.

  • Binning between 360.5 - 362.3 is incorrect:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 360, 
                    ymax = 364)
      print(res$plt)

    • Identify any samples from Plate 25 or Plates 59+ and add 0.35 to them.
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 360.4 & Fragment <= 361.8)
         Marker      Sample Fragment       Date Plate
    1  Sssp2201   RBW051910   361.15 2015-03-07   all
    2  Sssp2201   CLM050401   361.14 2015-03-07   all
    3  Sssp2201   CLG051416   361.16 2015-03-07   all
    4  Sssp2201   RSB050108   361.51 2015-03-07   all
    5  Sssp2201   BRD052123   361.78 2015-03-07   all
    6  Sssp2201   KEL040107   361.79 2015-03-07   all
    7  Sssp2201   CLG041503   361.07 2015-03-07   all
    8  Sssp2201   CLG041601   361.08 2015-03-07   all
    9  Sssp2201   SKW040213   361.17 2015-03-07   all
    10 Sssp2201   SKW040220   361.08 2015-03-07   all
    11 Sssp2201   SKW040225   361.16 2015-03-07   all
    12 Sssp2201   CLG102008   361.07 2015-03-07   all
    13 Sssp2201   BLD100508   361.08 2015-03-07   all
    14 Sssp2201   SXM102504   361.28 2015-03-07   all
    15 Sssp2201   LSN100203   361.30 2015-03-07   all
    16 Sssp2201   LSN100302   361.36 2015-03-07   all
    17 Sssp2201   LSN100705   361.38 2015-03-07   all
    18 Sssp2201   DGR100503   361.41 2015-03-07   all
    19 Sssp2201   KGR100505   361.39 2015-03-07   all
    20 Sssp2201   GGR100604   361.30 2015-03-07   all
    21 Sssp2201 LSN10T2W203   361.25 2015-03-07   all
    22 Sssp2201   CWH130017   361.79 2015-03-07   all
    23 Sssp2201    SXM13047   360.76 2015-03-07   all
    24 Sssp2201    SXM13063   361.77 2015-03-07   all
    25 Sssp2201    SXM13182   361.69 2015-03-07   all
    26 Sssp2201    RMN13079   361.52 2015-03-07   all
    27 Sssp2201    RMN13084   361.64 2015-03-07   all
    28 Sssp2201    RMN13106   361.62 2015-03-07   all
    29 Sssp2201    RMN13119   360.77 2015-03-07   all
    30 Sssp2201    RMN13138   361.50 2015-03-07   all
    31 Sssp2201    RMN13140   361.60 2015-03-07   all
    32 Sssp2201    BLD13007   360.84 2015-03-07   all
    33 Sssp2201    BLD13008   360.57 2015-03-07   all
    34 Sssp2201    BLD13021   360.68 2015-03-07   all
    35 Sssp2201    BLD13037   360.66 2015-03-07   all
    36 Sssp2201    BLD13040   361.53 2015-03-07   all
    37 Sssp2201    BLD13049   360.48 2015-03-07   all
    38 Sssp2201    BLD13065   360.93 2015-03-07   all
    39 Sssp2201    BLD13077   360.47 2015-03-07   all
    40 Sssp2201    RMN13242   361.50 2015-03-07   all
    41 Sssp2201    LND14038   360.78 2015-03-07   all
    42 Sssp2201    LND14131   361.00 2015-03-07   all
    • RSB050108 = Plate 25, 0.35 added.
    • LSN10T2W203 = Plate 56, 0.35 added.
    • CWH130017 = Plate 58, 0.35 added.
    • SXM13047 = Plate 59+. 0.35 added.
    • SXM13063 = Plate 59+. 0.35 added.
    • SXM13182 = Plate 59+. 0.35 added.
    • RMN13079 = Plate 59+. 0.35 added.
    • RMN13084 = Plate 59+. 0.35 added.
    • RMN13106 = Plate 59+. 0.35 added.
    • RMN13119 = Plate 59+. 0.35 added.
    • RMN13138 = Plate 59+. 0.35 added.
    • RMN13140 = Plate 59+. 0.35 added.
    • BLD13007 = Plate 59+. 0.35 added.
    • BLD13008 = Plate 59+. 0.35 added.
    • BLD13021 = Plate 59+. 0.35 added.
    • BLD13037 = Plate 59+. 0.35 added.
    • BLD13040 = Plate 59+. 0.35 added.
    • BLD13049 = Plate 59+. 0.35 added.
    • BLD13065 = Plate 59+. 0.35 added.
    • BLD13077 = Plate 59+. 0.35 added.
    • RMN13242 = Plate 59+. 0.35 added.
    • LND14038 = Plate 59+. 0.35 added.
    • LND14131 = Plate 59+. 0.35 added.

    • These edits allow fragments to be binned more appropriately.

  • There are two large fragments between 362.9 - 363.4:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 362, 
                    ymax = 365)
      print(res$plt)

    • Identify the points:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 362.9 & Fragment <= 363.9)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 SXM101102   363.31 2015-03-07   all
    2 Sssp2201  SXM13120   363.00 2015-03-07   all
    • SXM13120 = Plates 59+. 0.35 added.

    • To allow the algorithm to split the above two samples from those below, a small value of 0.2 will be added manually.

  • There are problem samples ~ 365.3 and 365.7:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 362, 
                    ymax = 368)
      print(res$plt)

    • Identify the points:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 365.3 & Fragment <= 365.75)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 ART050108   365.73 2015-03-07   all
    2 Sssp2201 KGR100502   365.36 2015-03-07   all
    • ART050108 = Peak is 365.9. Edited.
    • KGR100502 = Peak is legit.
  • There is a binning problem ~ 382 - 383.5:

      res <- allCum(DB_orig, "Sssp2201", limit = 0.35, ymin = 381, 
                    ymax = 390)
      print(res$plt)

    • Identify samples below 382.9:
      DB_orig %>%
        filter(Marker == "Sssp2201") %>%
        filter(Fragment >= 381.6 & Fragment <= 382.9)
        Marker    Sample Fragment       Date Plate
    1 Sssp2201 SKW050310   382.80 2015-03-07   all
    2 Sssp2201 BRD052210   382.13 2015-03-07   all
    3 Sssp2201 SXM101809   382.36 2015-03-07   all
    4 Sssp2201 BCF100109   382.41 2015-03-07   all
    5 Sssp2201 BCR100101   382.41 2015-03-07   all
    6 Sssp2201  SXM13024   381.93 2015-03-07   all
    7 Sssp2201  LND14036   381.78 2015-03-07   all
    8 Sssp2201  LND14100   381.86 2015-03-07   all
    • SKW050310 = Plate 25. 0.35 added.
    • SXM13024 = Plates 59+. 0.35 added.
    • LND14036 = Plates 59+. 0.35 added.
    • LND14100 = Plates 59+. 0.35 added.
  • The final binning rules for this locus are:

    • Below 220: 0.8
    • Below 293: 0.3
    • Below 312.5: 0.5
    • Below 325: 0.55
    • Below 347.5: 0.4
    • Below 450: 0.35

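The notebook does not show how the binning routine uses its `limit` values internally. The sketch below is only one simple, hypothetical gap-based scheme: sort the fragments and open a new bin whenever the gap to the previous fragment exceeds the limit for that size range. It is included to illustrate why a shifted plate, or a different limit, changes which fragments end up in the same allele. The helpers `limit_for()` and `gap_bin()` are hypothetical, and reading each rule as "strictly below the bound" is an assumption.

```r
# Hypothetical sketch, not MsatAllele code.
# Rules are written as list(c(upper_bound, limit), ...) in ascending order,
# matching the final Sssp2201 rules listed above.
rules <- list(c(220, 0.8), c(293, 0.3), c(312.5, 0.5),
              c(325, 0.55), c(347.5, 0.4), c(450, 0.35))

# Return the limit for the first rule whose bound lies above `size`
# (assumption: "Below X" means strictly below X).
limit_for <- function(size, rules) {
  for (r in rules) {
    if (size < r[1]) return(r[2])
  }
  rules[[length(rules)]][2]  # beyond the last bound: reuse the last limit
}

# Sort fragments and start a new bin whenever the step from the previous
# fragment is larger than the limit that applies at the current size.
gap_bin <- function(frags, rules) {
  ord <- order(frags)
  s <- frags[ord]
  bin <- integer(length(s))
  bin[1] <- 1L
  for (i in seq_along(s)[-1]) {
    step <- s[i] - s[i - 1]
    bin[i] <- bin[i - 1] + as.integer(step > limit_for(s[i], rules))
  }
  out <- integer(length(frags))
  out[ord] <- bin  # return bin numbers in the original fragment order
  out
}

# The two legit fragments noted at 331.95 and 332.24 fall in one bin
gap_bin(c(331.95, 332.24), rules)
```

Under this scheme, a fragment shifted down by ~0.35bp can open a gap larger than the limit and be split off, or close a gap and be merged. This matches the kind of binning problems logged above.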
Extract samples with alleles that occur in three or fewer individuals

tab <- getLowFreq(DB_new, "Sssp2201", 3, 
                  limit = list(c(220, 0.8), c(293, 0.3),
                               c(312.5, 0.5), c(347.5, 0.4),
                               c(450, 0.35)))
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
SXM100949 72374 221.1 all
BYN100109 73897 221.1 all
BYN100101 73887 221.1 all
RBW102906 73311 260.2 all
RBW054102 66358 260.3 all
SXM101003 72409 274.1 all
SHK100603 73847 278.6 all
CLT102508 74217 282.5 all
BLD100802 71826 313 all
BLD100602 71811 313 all
BRD052107 69164 324.8 all
CMV100103 71466 324.8 all
HUNST1303 75388 324.9 all
KEL051709 68700 331.9 all
CGN100201 73319 332.2 all
RMN13068 75796 351.1 all
KEL040914 69592 351.1 all
LSN101405 73627 351.4 all
SXM101102 72415 363.5 all
SXM13120 75626 363.6 all
SJH100156 72738 375.4 all
SXM13108 75604 387.3 all
BLD101301 71936 398.1 all
BLD102801 72024 398.4 all


  • RBW040506 = Locus failed. GT deleted.
  • SXM100949 = Peak is legit.
  • BYN100109 = Peak is legit.
  • BYN100101 = Peak is legit.
  • RBW102906 = Peak is legit.
  • RBW054102 = Peak is legit.
  • SXM101003 = Peak is legit.
  • SHK100603 = Peak is legit.
  • CLT102508 = Peak is legit.
  • BLD100802 = Peak is legit.
  • BLD100602 = Peak is legit.
  • BRD052107 = Peak is legit.
  • CMV100103 = Peak is legit.
  • HUNST1303 = Peak is legit.
  • KEL051709 = Peak is legit.
  • CGN100201 = Peak is legit.
  • RMN13068 = Peak is legit.
  • KEL040914 = Peak is legit.
  • LSN101405 = Peak is legit.
  • SXM101102 = Peak is legit.
  • SXM13120 = Peak is legit.
  • SJH100156 = Peak is legit.
  • SXM13108 = Peak is from plate 59+. 0.35 added. (fixed)
  • BLD101301 = Peak is legit.
  • BLD102801 = Peak is legit.
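
Most corrections in this section add a fixed 0.35bp to fragments from plate 25 or plates 59+. Those edits were made in the raw data file. As a hedged alternative, the same kind of correction could be applied in R. `shift_fragments()` is a hypothetical helper. The data frame layout (`Marker`, `Sample`, `Fragment` columns) is assumed from the printed `filter()` output, and the toy rows reuse values reported above.

```r
# Hypothetical helper: add a fixed offset (default 0.35bp) to the fragments
# of the listed samples at one marker, leaving all other rows untouched.
shift_fragments <- function(db, marker, samples, offset = 0.35) {
  idx <- db$Marker == marker & db$Sample %in% samples
  db$Fragment[idx] <- db$Fragment[idx] + offset
  db
}

# Toy rows built from fragment sizes reported for Sssp2201 above
toy <- data.frame(
  Marker   = c("Sssp2201", "Sssp2201", "Sssp2201"),
  Sample   = c("SXM13024", "LND14036", "KEL051709"),
  Fragment = c(381.93, 381.78, 331.95),
  stringsAsFactors = FALSE
)
out <- shift_fragments(toy, "Sssp2201", c("SXM13024", "LND14036"))
out$Fragment
```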

DONE!

CA048828

Calculate bin statistics for CA048828

dat <- BinStats(DB_orig, "CA048828", limit = 0.45)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
254 198 254.19 254.63 0.44 0.088 254.45 254.46
256 170 256.09 256.57 0.48 0.111 256.34 256.36
258 109 258.05 258.52 0.47 0.102 258.33 258.34
260 167 259.9 260.36 0.46 0.115 260.16 260.18
262 897 261.78 262.34 0.56 0.118 262.11 262.13
264 1477 263.65 264.27 0.62 0.099 264.03 264.04
266 937 265.58 266.33 0.75 0.106 265.95 265.97
268 1604 267.52 268.11 0.59 0.098 267.89 267.9
270 484 269.45 270.05 0.6 0.105 269.83 269.86
271 1 270.83 270.83 0 NA 270.83 270.83
272 420 271.41 271.99 0.58 0.11 271.77 271.79
274 520 273.35 273.94 0.59 0.118 273.7 273.72
275 84 274.45 274.98 0.53 0.111 274.75 274.76
276 255 275.35 275.91 0.56 0.127 275.65 275.66
278 423 277.2 277.85 0.65 0.124 277.59 277.6
279 243 278.29 278.86 0.57 0.115 278.64 278.65
280 343 279.15 279.74 0.59 0.108 279.55 279.55
281 514 280.28 281.69 1.41 0.443 281.07 281.26
283 106 282.36 283.59 1.23 0.346 283.2 283.34
285 340 284.96 285.52 0.56 0.109 285.32 285.33
286 17 286.36 286.5 0.14 0.04 286.42 286.42
287 504 286.54 287.46 0.92 0.11 287.24 287.25
289 244 288.82 289.32 0.5 0.101 289.15 289.17
290 18 290.1 290.32 0.22 0.052 290.22 290.22
291 93 290.83 291.29 0.46 0.106 291.1 291.11
293 517 292.66 293.21 0.55 0.088 293 293.02
295 215 294.63 295.15 0.52 0.1 294.93 294.92
297 110 296.54 297.02 0.48 0.104 296.84 296.85
299 93 298.52 298.96 0.44 0.095 298.78 298.77
301 111 300.37 300.88 0.51 0.118 300.67 300.7
303 16 302.47 302.72 0.25 0.074 302.61 302.62
304 1 304.14 304.14 0 NA 304.14 304.14
305 17 304.2 304.65 0.45 0.158 304.46 304.55
306 5 306.29 306.43 0.14 0.062 306.37 306.38
308 22 308.04 308.49 0.45 0.093 308.34 308.35
310 13 310.17 310.33 0.16 0.047 310.24 310.24
312 3 312.05 312.19 0.14 0.078 312.1 312.06
314 4 313.9 314.1 0.2 0.097 314.04 314.09
316 3 316.09 316.15 0.06 0.03 316.12 316.12
318 21 317.87 318.22 0.35 0.092 318.08 318.12
332 6 331.71 331.86 0.15 0.052 331.78 331.78


Generate cumulative plot for CA048828

res <- allCum(DB_orig, "CA048828", limit = 0.45)
print(res$plt)


Identify problems

  • There is a single odd sample ~ 271bp:

      res <- allCum(DB_orig, "CA048828", limit = 0.45, ymin = 270,
                    ymax = 275)
      print(res$plt)

    • Identify the point:
      DB_orig %>%
        filter(Marker == "CA048828") %>%
        filter(Fragment >= 270.6 & Fragment <= 271)
        Marker    Sample Fragment       Date Plate
    1 CA048828 BLD101103   270.83 2015-03-07   all
    • BLD101103 = Peak is legit.
  • A bin limit of 0.45 does not accurately bin fragments between 280 and 284bp:

      res <- allCum(DB_orig, "CA048828", limit = 0.45, ymin = 275,
                    ymax = 285)
      print(res$plt)

    • The algorithm is unable to differentiate the 1bp shift between these fragments. In each case, the variance of each group of fragments will be reduced around the group mean so that they can be binned accurately.

    • By reducing the variance of each bin and adding a small quantity to both of the large bins, the algorithm is able to differentiate these alleles. Hidden code in the source notebook writes these manipulations to “Main_DB_new.txt”.
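
The hidden edits are not shown, so the following is only a minimal sketch of the idea, assuming a simple linear shrink toward the group mean followed by a fixed shift. The function name `squeeze_group()`, the shrink factor and the shift value are all hypothetical, and the fragment values are illustrative rather than taken from the database.

```r
# Hypothetical sketch: pull each fragment toward its group mean and then move
# the whole group. shrink < 1 multiplies the standard deviation by `shrink`;
# shift (in bp) moves the group mean by a fixed amount.
squeeze_group <- function(frags, shrink = 0.5, shift = 0) {
  m <- mean(frags)
  m + shrink * (frags - m) + shift
}

# Illustrative values only
x <- c(280.5, 281.0, 281.5)
y <- squeeze_group(x, shrink = 0.5, shift = 0.1)
y
sd(y) / sd(x)  # spread is halved
```

Shrinking narrows each group, and the shift widens the gap to the neighbouring group. Together they let a gap-based limit separate alleles that differ by only 1bp.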

Test that the changes are correct within the raw data

DB_new <- fastReadFrag("Main_DB_new.txt", as.character(Sys.Date()), "all")
res <- allCum(DB_new, "CA048828", limit = 0.35, ymin = 280, ymax = 285)
print(res$plt)

  • A binning limit of 0.35 is sufficient between 240 - 276.5bp. A limit of 0.45 is needed between 277 - 286bp. A limit of 0.35 is needed between 286.1-303bp. A limit of 0.5 is required from 303.1 - 340bp.

Extract samples with alleles that occur in three or fewer individuals

tab <- getLowFreq(DB_new, "CA048828", 3, limit = list(c(276.5, 0.35),
                                                      c(286, 0.45),
                                                      c(303, 0.35),
                                                      c(340, 0.5)))
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
BLD101103 83130 270.8 all
FDB040516 77910 312.1 all
FDB040514 77906 312.1 all
BLD100901 83106 312.2 all
CAW100110 83154 316.1 all
CLG040112 80946 316.1 all
SKW040203 81557 316.1 all


  • BLD101103 = Peak is legit.
  • FDB040516 = Peak is legit.
  • FDB040514 = Peak is legit.
  • BLD100901 = Peak is legit.
  • CAW100110 = Peak is legit.
  • CLG040112 = Peak is legit.
  • SKW040203 = Peak is legit.
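
The low-frequency check above can be illustrated with a small base-R sketch. It is not the code of `getLowFreq()`. It assumes each fragment already carries a bin label, counts distinct individuals per bin, and returns the rows in rare bins. The comparison uses `<=` because the tables returned by `getLowFreq(..., 3)` in this notebook include bins carried by exactly three samples (e.g. 312.1 - 312.2bp here). The function name `low_freq()` and the `S1`-`S4` sample IDs are hypothetical.

```r
# Hypothetical sketch: rows whose bin is carried by max_ind or fewer
# distinct individuals.
low_freq <- function(df, max_ind = 3) {
  # number of distinct individuals carrying each bin
  n_ind <- tapply(df$Sample, df$Bin, function(s) length(unique(s)))
  rare_bins <- names(n_ind)[n_ind <= max_ind]
  df[as.character(df$Bin) %in% rare_bins, , drop = FALSE]
}

# Rare bins from the CA048828 table above plus a made-up common bin (S1-S4)
toy <- data.frame(
  Sample = c("BLD101103", "FDB040516", "FDB040514", "BLD100901",
             "S1", "S2", "S3", "S4"),
  Bin    = c(271L, 312L, 312L, 312L, 264L, 264L, 264L, 264L),
  stringsAsFactors = FALSE
)
low_freq(toy)
```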

DONE!

Cocl-lav-4

Calculate bin statistics for Cocl-lav-4

dat <- BinStats(DB_orig, "Cocl-lav-4")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
148 4 148.09 148.15 0.06 0.029 148.11 148.09
152 2066 151.92 152.53 0.61 0.118 152.29 152.3
154 1233 154.06 154.65 0.59 0.108 154.43 154.45
157 3679 156.21 156.81 0.6 0.112 156.56 156.58
159 1691 158.35 158.91 0.56 0.112 158.7 158.71
161 564 160.51 161.02 0.51 0.117 160.82 160.84
163 712 162.59 163.15 0.56 0.112 162.93 162.95
165 38 164.73 165.26 0.53 0.164 165.04 165.12
167 28 166.89 167.3 0.41 0.128 167.08 167.12
169 2 169.05 169.33 0.28 0.198 169.19 169.19
171 1 171.32 171.32 0 NA 171.32 171.32


Generate cumulative plot for Cocl-lav-4

res <- allCum(DB_orig, "Cocl-lav-4")
print(res$plt)


Extract samples with alleles that occur in three or fewer individuals

tab <- getLowFreq(DB_orig, "Cocl-lav-4", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
RMN13032 97180 169.1 all
RBW043701 88535 169.3 all
BRD041104 96518 171.3 all


  • BRD041104 = 171.32 peak legit.
  • RBW043701 = 169.3 peak legit.
  • RMN13032 = 169.1 peak legit.

DONE!

OneU9ASC

Calculate bin statistics for OneU9ASC

dat <- BinStats(DB_orig, "OneU9ASC")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
186 170 185.83 186.43 0.6 0.171 186.19 186.25
193 16 192.18 192.78 0.6 0.179 192.61 192.69
195 3 194.69 194.85 0.16 0.081 194.76 194.75
197 42 196.49 197.07 0.58 0.17 196.88 196.94
199 2484 198.54 199.37 0.83 0.146 198.94 198.97
201 2045 200.67 201.28 0.61 0.142 201.04 201.09
203 3377 202.75 203.45 0.7 0.148 203.14 203.18
205 963 204.87 205.51 0.64 0.138 205.25 205.28
207 43 207.04 207.59 0.55 0.134 207.34 207.36
209 40 209.06 209.62 0.56 0.156 209.4 209.39
212 151 211.19 211.83 0.64 0.161 211.54 211.59
214 813 213.22 213.91 0.69 0.15 213.64 213.64
216 2 215.81 215.81 0 0 215.81 215.81


Generate cumulative plot for OneU9ASC

res <- allCum(DB_orig, "OneU9ASC")
print(res$plt)


Extract samples with alleles that occur in three or fewer individuals

tab <- getLowFreq(DB_orig, "OneU9ASC", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
KEL050512 100493 194.7 all
RBW040504 98217 194.8 all
KEL050819 100719 194.8 all
RFY041609 98960 215.8 all
FDB040110 99028 215.8 all


  • KEL050512 = 194.69 peak legit.
  • RBW040504 = 194.75 peak legit.
  • KEL050819 = 194.85 peak legit.
  • FDB040110 = Weak sample, but peak is legit.
  • RFY041609 = Weak sample, but peak is legit.

DONE!

SsaD157

Calculate bin statistics for SsaD157

dat <- BinStats(DB_orig, "SsaD157")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
239 106 238.73 239.3 0.57 0.15 239.12 239.18
243 50 242.88 243.44 0.56 0.158 243.13 243.13
248 1 247.6 247.6 0 NA 247.6 247.6
256 8 255.21 255.62 0.41 0.147 255.48 255.55
257 1 257.16 257.16 0 NA 257.16 257.16
260 78 259.23 259.75 0.52 0.149 259.51 259.55
262 5 260.9 261.56 0.66 0.278 261.38 261.51
264 322 263.19 263.81 0.62 0.133 263.54 263.55
266 30 265.21 265.78 0.57 0.181 265.56 265.63
268 531 267.27 267.86 0.59 0.136 267.6 267.62
270 360 269.25 269.88 0.63 0.169 269.57 269.58
272 588 271.32 271.95 0.63 0.149 271.69 271.7
274 552 273.32 273.96 0.64 0.175 273.69 273.74
276 474 275.39 276.04 0.65 0.157 275.77 275.8
278 737 277.39 278.05 0.66 0.167 277.79 277.83
280 573 279.19 280.09 0.9 0.139 279.91 279.91
282 372 281.47 282.14 0.67 0.195 281.85 281.92
284 800 283.55 284.21 0.66 0.157 283.92 283.94
286 144 285.53 286.41 0.88 0.175 285.88 285.89
288 933 287.57 288.26 0.69 0.15 287.96 287.98
290 111 289.58 290.2 0.62 0.145 289.98 290.02
292 1075 291.63 292.29 0.66 0.168 291.99 292
294 43 293.63 294.24 0.61 0.189 294.01 294.07
296 827 295.65 296.32 0.67 0.169 296.03 296.05
298 66 297.66 298.27 0.61 0.161 298.02 298.08
300 325 299.71 300.34 0.63 0.145 300.06 300.09
302 22 301.49 302.28 0.79 0.224 301.97 301.9
304 143 303.68 304.36 0.68 0.191 304.07 304.09
306 96 305.7 306.33 0.63 0.138 306.02 306.03
308 76 307.76 308.39 0.63 0.211 308.11 308.13
310 338 309.72 310.39 0.67 0.176 310.08 310.1
312 35 311.84 312.41 0.57 0.169 312.13 312.16
314 217 313.8 314.45 0.65 0.171 314.16 314.18
316 62 316 316.58 0.58 0.173 316.35 316.35
318 252 317.99 318.68 0.69 0.165 318.37 318.4
321 15 320.19 320.71 0.52 0.19 320.54 320.62
323 138 322.2 322.82 0.62 0.153 322.53 322.54
327 207 326.32 326.9 0.58 0.143 326.64 326.66
331 27 330.4 330.96 0.56 0.171 330.65 330.69
335 38 334.47 334.99 0.52 0.139 334.81 334.86
339 53 338.56 339.05 0.49 0.132 338.82 338.85
343 45 342.56 343.09 0.53 0.16 342.87 342.91
347 4 346.8 347.03 0.23 0.098 346.94 346.96
351 6 350.74 351.24 0.5 0.192 351.07 351.12
355 3 354.83 355.29 0.46 0.246 355.11 355.21
359 8 358.92 359.28 0.36 0.121 358.98 358.94
363 1 363.01 363.01 0 NA 363.01 363.01


Generate cumulative plot for SsaD157

res <- allCum(DB_orig, "SsaD157")
print(res$plt)


Extract samples with alleles that occur in three or fewer individuals

tab <- getLowFreq(DB_orig, "SsaD157", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
BYN100202 116260 247.6 all
SXM13161 117924 257.2 all
UBN140028 116896 354.8 all
RBW040504 108372 355.2 all
BBN100507 115925 355.3 all
UBN140007 116857 363 all


  • BYN100202 = legit.
  • SXM13161 = Odd sample; genotype deleted in “Main_DB_new.txt”.
  • UBN140028 = legit.
  • RBW040504 = legit.
  • BBN100507 = legit.
  • UBN140007 = legit.

DONE!


Sssp2216

Calculate bin statistics for Sssp2216

dat <- BinStats(DB_orig, "Sssp2216")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
135 284 134.29 134.84 0.55 0.105 134.59 134.59
139 144 138.21 138.72 0.51 0.104 138.51 138.52
143 3584 142.14 142.87 0.73 0.102 142.5 142.51
147 2410 146.16 146.76 0.6 0.098 146.52 146.53
151 569 150.26 150.77 0.51 0.101 150.56 150.57
155 2058 154.25 155.52 1.27 0.159 154.61 154.6
159 1245 158.32 159.62 1.3 0.367 158.85 158.68
163 317 162.26 163.6 1.34 0.191 162.69 162.66
167 13 166.44 167.34 0.9 0.224 166.64 166.59
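Each BinStats() table reports, per bin, the fragment count and the min, max, range, standard deviation, mean and median of the fragment sizes. A base-R sketch of the same summary, assuming the bin assignment is already known (the helper binSummary and its bin argument are hypothetical, and BinStats() itself may assign or label bins differently):

```r
# Hypothetical sketch, not the MsatAllele code: per-bin summary statistics in
# the same column layout as the BinStats() tables. `bin` gives the bin label
# of each fragment in `frags`.
binSummary <- function(frags, bin) {
  groups <- split(frags, bin)
  # one row per bin, one column per statistic
  stats <- t(vapply(groups, function(x) {
    c(N = length(x), Min = min(x), Max = max(x), Range = max(x) - min(x),
      Sd = sd(x), MEAN = mean(x), MEDIAN = median(x))
  }, numeric(7)))
  out <- data.frame(Bins = names(groups), round(stats, 4))
  rownames(out) <- NULL
  out
}

# toy fragments assigned to three bins
frags <- c(158.4, 158.6, 158.8, 162.5, 162.7, 167.0)
bin <- c(159, 159, 159, 163, 163, 167)
binSummary(frags, bin)
```

With a single fragment in a bin, sd() returns NA, which matches the NA entries in the Sd column of the tables above for bins with N = 1.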


Generate cumulative plot for Sssp2216

res <- allCum(DB_orig, "Sssp2216")
print(res$plt)


Extract samples with a 1bp shift in the upper limits of the marker

The top four bins of this marker appear to contain some fragments about 1 bp larger than the assigned bins in the plot above. This is clearer in the zoomed plot below.

res <- allCum(DB_orig, "Sssp2216",ymin = 154, ymax = 168)
print(res$plt)

To confirm this pattern, a subset of samples from the third-largest bin (~159 bp, shown below) will be checked. If the pattern is confirmed, a suitable value will be added to the larger fragments in each of the four bins so that the binning algorithm can differentiate the alleles.

res <- allCum(DB_orig, "Sssp2216",ymin = 158, ymax = 160)
print(res$plt)

Extract five samples carrying one fragment in each potential bin (putative heterozygotes)

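# all Sssp2216 rows and their fragment sizes
ssp <- DB_orig[DB_orig$Marker == "Sssp2216", ]
frags <- ssp$Fragment
# rows in the putative larger (159-160 bp) and smaller (158.3-158.9 bp) bins;
# which() drops NA fragments instead of returning all-NA rows
lrg <- ssp[which(frags >= 159 & frags <= 160), ]
sml <- ssp[which(frags >= 158.3 & frags <= 158.9), ]
# small-bin rows from samples that also carry a large-bin fragment (heterozygotes)
hets <- sml[sml$Sample %in% lrg$Sample, ]
# return 5 random heterozygotes (no seed is set, so the draw changes between runs)
hets[sample(nrow(hets), 5), ]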
         Marker    Sample Fragment       Date Plate
125875 Sssp2216 SXM102507   158.79 2015-01-29   all
121346 Sssp2216 CLG051510   158.67 2015-01-29   all
129225 Sssp2216  BLD13058   158.33 2015-01-29   all
125120 Sssp2216 BLD101602   158.81 2015-01-29   all
124606 Sssp2216 CLM100502   158.56 2015-01-29   all
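The draw above uses sample() without a seed, so rerunning the chunk picks a different five heterozygotes. A sketch of a reproducible alternative, using the same two size windows (the helper pickHets and its arguments are hypothetical):

```r
# Hypothetical sketch (not the code used above): samples carrying one fragment
# in each of two closed size windows, drawn with a fixed seed so the same
# subset is picked on every run. Note that set.seed() resets the global RNG.
pickHets <- function(db, smallWin, largeWin, n = 5, seed = 1) {
  # distinct samples with at least one fragment inside window w = c(lo, hi)
  inWin <- function(w) {
    unique(db$Sample[which(db$Fragment >= w[1] & db$Fragment <= w[2])])
  }
  both <- intersect(inWin(smallWin), inWin(largeWin))
  set.seed(seed)
  both[sample.int(length(both), min(n, length(both)))]
}

# toy rows: S1 and S2 carry both fragments, S3 only the smaller one
db <- data.frame(
  Sample = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Fragment = c(158.5, 159.4, 158.7, 159.6, 158.6, 158.6)
)
pickHets(db, smallWin = c(158.3, 158.9), largeWin = c(159, 160))
```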

The screen cap below suggests that these bins are likely to be two separate alleles, separated by 1 bp. All of the larger fragments in each of the four largest bins of this locus will have 0.5 bp added so that the algorithm can differentiate the alleles.

split_bins
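To see why adding 0.5 bp to the larger fragments helps, consider a deliberately simple gap-based grouping: sort the sizes and start a new bin wherever consecutive fragments are more than `limit` bp apart. This rule is an assumption used only for illustration; it is not taken from the MsatAllele source, and allCum()/BinStats() may bin differently. The helper name gapBins is hypothetical.

```r
# Hypothetical gap-based grouping, for illustration only (not MsatAllele code):
# sort the sizes and start a new bin wherever the gap to the previous fragment
# exceeds `limit` bp.
gapBins <- function(frags, limit = 0.5) {
  frags <- sort(frags[!is.na(frags)])
  # TRUE where a new bin starts; the first fragment always starts one
  newBin <- c(TRUE, diff(frags) > limit)
  split(frags, cumsum(newBin))
}

# toy sizes: a smaller cluster near 158.6 bp and a larger one near 159.3 bp
sizes <- c(158.4, 158.6, 158.8, 159.1, 159.3, 159.5)
isLarge <- sizes >= 159
before <- gapBins(sizes, limit = 0.5)
after <- gapBins(sizes + ifelse(isLarge, 0.5, 0), limit = 0.5)
length(before)
length(after)
```

With these toy sizes the two clusters merge into one bin before the shift and fall into two bins after it, because the gap between them grows from 0.3 to 0.8 bp.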

# read the latest database
DB_new <- fastReadFrag("Main_DB_new.txt", as.character(Sys.Date()), "all")
# make the corrections
mrkr <- which(DB_new$Marker == "Sssp2216")
frags <- DB_new[mrkr, "Fragment"]
frags <- sapply(frags, function(x){
  if(is.na(x)){
    return(x)
  } else if(x >= 155 & x <= 155.6){
    return(x + 0.5)
  } else if(x >= 159 & x <= 160){
    return(x + 0.5)
  } else if(x >= 163 & x <= 164){
    return(x + 0.5)
  } else if(x >= 167 & x <= 168){
    return(x + 0.5)
  } else {
    return(x)
  }
})
# replace the original values
DB_new[mrkr, "Fragment"] <- frags
# replot the 155 bp bin (154-156 bp) to check the changes
print(allCum(DB_new, "Sssp2216", ymin = 154, ymax = 156)$plt)

# check all amended alleles
print(allCum(DB_new, "Sssp2216", ymin = 154, ymax = 170)$plt)

# write the new database
saveRDS(DB_new, "Main_DB_new.rds")
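The nested if/else inside sapply() above can also be written in vectorised form. A minimal sketch (shiftWindows is a hypothetical name) that adds the same 0.5 bp to fragments falling inside any of the four closed windows used above and leaves NA values untouched:

```r
# Hypothetical helper: add `shift` to every fragment inside any closed window
# [lower[i], upper[i]]; NA fragments are never matched and stay NA.
shiftWindows <- function(x, lower, upper, shift = 0.5) {
  stopifnot(length(lower) == length(upper))
  hit <- rep(FALSE, length(x))
  for (i in seq_along(lower)) {
    hit <- hit | (!is.na(x) & x >= lower[i] & x <= upper[i])
  }
  x[hit] <- x[hit] + shift
  x
}

# the four Sssp2216 windows used in the sapply() version above
lower <- c(155, 159, 163, 167)
upper <- c(155.6, 160, 164, 168)
shiftWindows(c(155.3, 158.7, 159.5, NA, 167.2), lower, upper)
```

Applied to the database, this would be `DB_new[mrkr, "Fragment"] <- shiftWindows(DB_new[mrkr, "Fragment"], lower, upper)`. Because the hit mask is computed once from the original sizes, a fragment can only be shifted once even if the windows were to overlap.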


Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Sssp2216", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
LND14152 129639 168.3 all


  • LND14152 = legit.

DONE!


Str2QUB

Calculate bin statistics for Str2QUB

dat <- BinStats(DB_orig, "Str2QUB", limit = 0.4)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
200 7 200.09 200.36 0.27 0.09 200.27 200.27
210 5 210.05 210.29 0.24 0.095 210.19 210.23
212 56 211.98 212.33 0.35 0.087 212.15 212.14
214 285 213.66 214.39 0.73 0.111 213.98 214
216 48 215.71 216.35 0.64 0.11 216.06 216.06
220 875 219.56 220.37 0.81 0.124 219.99 220
222 13 221.71 221.97 0.26 0.085 221.82 221.83
223 1 222.94 222.94 0 NA 222.94 222.94
224 38 223.8 224.19 0.39 0.085 223.98 223.99
226 1791 225.39 226.22 0.83 0.12 225.8 225.8
227 1 227.14 227.14 0 NA 227.14 227.14
228 275 227.59 228.55 0.96 0.112 227.89 227.89
232 534 231.37 232.15 0.78 0.107 231.81 231.82
236 377 235.43 236.11 0.68 0.112 235.76 235.76
238 22 237.33 237.77 0.44 0.116 237.58 237.56
240 357 239.32 240 0.68 0.106 239.67 239.65
242 1 241.71 241.71 0 NA 241.71 241.71
244 193 243.43 243.98 0.55 0.09 243.7 243.71
248 133 247.4 247.9 0.5 0.107 247.7 247.71
252 78 251.4 251.89 0.49 0.106 251.69 251.71
256 51 255.4 255.87 0.47 0.109 255.6 255.61
257 2 257.43 257.47 0.04 0.028 257.45 257.45
259 8 259.32 259.52 0.2 0.088 259.42 259.42
261 87 261.15 261.68 0.53 0.105 261.41 261.42
263 86 262.93 263.62 0.69 0.138 263.34 263.37
265 100 265.06 265.65 0.59 0.093 265.37 265.37
267 130 266.98 267.59 0.61 0.108 267.28 267.3
269 34 269 269.67 0.67 0.132 269.29 269.3
271 2513 270.53 271.62 1.09 0.144 271.09 271.1
273 15 273.07 273.32 0.25 0.083 273.21 273.21
275 94 274.72 275.46 0.74 0.142 275.17 275.17
279 1351 278.35 279.45 1.1 0.159 278.98 279
281 29 280.86 281.19 0.33 0.11 281.03 281.05
283 143 282.51 283.27 0.76 0.144 283.02 283.06
285 77 284.46 285.34 0.88 0.156 284.88 284.9
287 218 286.27 287.22 0.95 0.157 286.82 286.83
291 74 290.34 291.12 0.78 0.159 290.71 290.7
293 6 292.37 292.8 0.43 0.156 292.65 292.65
295 40 294.35 295.06 0.71 0.172 294.61 294.58
296 6 296.08 296.26 0.18 0.077 296.2 296.24
297 95 296.31 296.77 0.46 0.108 296.53 296.52
298 9 298.11 298.32 0.21 0.079 298.24 298.25
299 50 298.33 298.8 0.47 0.104 298.54 298.54
301 15 300.43 300.82 0.39 0.105 300.64 300.62
302 15 302.07 302.6 0.53 0.144 302.31 302.3
304 135 303.77 304.63 0.86 0.197 304.18 304.14
308 5 308.37 308.45 0.08 0.031 308.4 308.39
310 227 309.53 310.4 0.87 0.157 310.04 310.05
312 17 311.68 312.51 0.83 0.233 312.2 312.25
314 349 313.4 314.29 0.89 0.158 313.89 313.91
316 17 315.81 316.49 0.68 0.168 316.26 316.33
320 45 319.82 320.56 0.74 0.196 320.2 320.19
322 44 321.52 322.29 0.77 0.177 321.92 321.92
324 8 324.37 324.57 0.2 0.069 324.44 324.41
326 2 326.01 326.02 0.01 0.007 326.01 326.01
328 48 328.07 328.68 0.61 0.13 328.38 328.37
332 7 332.28 332.41 0.13 0.06 332.35 332.37
336 14 335.99 336.55 0.56 0.17 336.26 336.25
340 3 339.91 340.19 0.28 0.143 340.03 340
346 10 345.49 345.84 0.35 0.099 345.64 345.62


Generate cumulative plot for Str2QUB

res <- allCum(DB_orig, "Str2QUB", limit = 0.4)
print(res$plt)
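The next step checks the Str2QUB bins 30 bp at a time. A small sketch (sizeWindows is a hypothetical helper) that generates consecutive ymin/ymax pairs spanning the observed size range, using the smallest and largest Str2QUB sizes from the table above:

```r
# Hypothetical helper: consecutive windows of `width` bp, starting at the
# floor of the smallest size, that together cover the observed size range.
sizeWindows <- function(frags, width = 30) {
  lo <- floor(min(frags, na.rm = TRUE))
  hi <- ceiling(max(frags, na.rm = TRUE))
  starts <- seq(lo, hi, by = width)
  data.frame(ymin = starts, ymax = starts + width)
}

# smallest and largest Str2QUB sizes from the BinStats table above
w <- sizeWindows(c(200.09, 345.84), width = 30)
w
```

Each row can then be passed to allCum() as `ymin` and `ymax`, for example in a loop over `seq_len(nrow(w))`.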


Finding problem samples (check bins, 30bp at a time)

  • There is a sole peak @ ~ 228.55bp as can be seen below:

      res <- allCum(DB_orig, "Str2QUB", ymin = 227.5, ymax = 228.9, limit = 0.4)
      print(res$plt)

  • Identify the point:

      frg <- DB_orig[DB_orig$Marker == "Str2QUB", "Fragment"]
      DB_orig[DB_orig$Marker == "Str2QUB", ][frg >= 228.4 & frg <= 228.8, ]

            Marker    Sample Fragment       Date Plate
    130932 Str2QUB BGW050301   228.55 2015-03-07   all


  • BGW050301 = Size standard was off. Peak actually occurred at 227.9 bp. Corrected in “Main_DB_new.txt”.

  • There is a sole peak @ ~ 241.75bp as can be seen below:

      res <- allCum(DB_orig, "Str2QUB", ymin = 239, ymax = 244, limit = 0.4)
      print(res$plt)

  • Identify the point:

      frg <- DB_orig[DB_orig$Marker == "Str2QUB", "Fragment"]
      DB_orig[DB_orig$Marker == "Str2QUB", ][frg >= 241 & frg <= 242, ]

            Marker    Sample Fragment       Date Plate
    138806 Str2QUB UBN140050   241.71 2015-03-07   all

  • UBN140050 = Peak is legit.

  • Some bins above 295 bp are not consistent (see below):

      res <- allCum(DB_orig, "Str2QUB", ymin = 295, ymax = 300, limit = 0.4)
      print(res$plt)

    Test a larger binning limit for this region:

      res <- allCum(DB_orig, "Str2QUB", ymin = 295, ymax = 300,
                    limit = list(c(295.5, 0.4), c(299.5, 0.7), c(350, 0.4)))
      print(res$plt)

  • Binning limits between 295.5 and 299.5 bp should be set to 0.7.
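The `limit` argument above is given as a list of `c(size, limit)` pairs. Reading each pair as "use this limit for fragments up to this size (bp)" is consistent with the note that limits between 295.5 and 299.5 bp should be 0.7, but it is an assumption about MsatAllele, not documented behaviour. Under that assumption, the limit applying to any fragment can be looked up as follows (limitFor is a hypothetical helper):

```r
# Hypothetical helper. ASSUMES each pair c(upper, lim) means "use `lim` for
# fragments up to `upper` bp", with pairs sorted by `upper`; this reading is
# inferred from the notes above, not from MsatAllele documentation.
limitFor <- function(frags, limits) {
  uppers <- vapply(limits, `[`, numeric(1), 1)
  lims <- vapply(limits, `[`, numeric(1), 2)
  # findInterval(left.open = TRUE) returns i with uppers[i] < x <= uppers[i + 1]
  # (0 at or below the first bound), so i + 1 indexes the first bound >= x;
  # sizes above the last bound give NA
  lims[findInterval(frags, uppers, left.open = TRUE) + 1]
}

str2qub <- list(c(295.5, 0.4), c(299.5, 0.7), c(350, 0.4))
limitFor(c(290, 297, 310, 360), str2qub)
# returns c(0.4, 0.7, 0.4, NA)
```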


Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Str2QUB", 3, 
                  limit = list(c(295.5, 0.4), c(299.5, 0.7), c(350, 0.4)))
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
RBW040504 129922 222.9 all
PRB040329 135105 227.1 all
UBN140050 138807 241.7 all
BKB040119 138387 257.4 all
SHK100501 137899 257.5 all
LND14123 140862 326 all
KLM100703 137592 326 all
CLG050910 131941 339.9 all
BRD050807 131187 340 all
CLG040707 134170 340.2 all


  • RBW040504 = Peak is legit.
  • UBN140050 = Peak is legit.
  • BKB040119 = Peak is legit.
  • SHK100501 = Peak is legit.
  • DGB050220 = Peak is legit.
  • LND14123 = Peak is legit.
  • KLM100703 = Peak is legit.
  • CLG050910 = Peak is legit.
  • BRD050807 = Peak is legit.
  • CLG040707 = Peak is 340.3, manually edited in “Main_DB_new.txt”.

DONE!

Str3QUB

Calculate bin statistics for Str3QUB

dat <- BinStats(DB_orig, "Str3QUB")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
126 2 125.53 125.56 0.03 0.021 125.55 125.55
129 1866 128.81 129.61 0.8 0.188 129.27 129.29
133 3 132.79 133.37 0.58 0.319 133.16 133.31
134 1 134.1 134.1 0 NA 134.1 134.1
138 1 138.14 138.14 0 NA 138.14 138.14
141 653 140.48 141.23 0.75 0.173 140.94 140.96
145 4 144.75 145.14 0.39 0.204 144.95 144.97
153 7 152.71 153.12 0.41 0.142 152.94 152.95
157 2131 156.42 157.2 0.78 0.164 156.9 156.93
169 4894 168.3 169.03 0.73 0.15 168.75 168.77
173 2 172.69 172.7 0.01 0.007 172.69 172.69
181 29 180.2 180.74 0.54 0.148 180.56 180.57
189 1 188.71 188.71 0 NA 188.71 188.71


Generate cumulative plot for Str3QUB

res <- allCum(DB_orig, "Str3QUB")
print(res$plt)


Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Str3QUB", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
KEL050512 143237 125.5 all
RBW040504 141145 125.6 all
BLD13072 150155 132.8 all
GGR100909 148066 133.3 all
GGR100905 148060 133.4 all
KEL050730 143409 134.1 all
BRD050112 142101 138.1 all
BRD050421 142155 172.7 all
BRD050415 142146 172.7 all
CWT100104 148320 188.7 all


  • KEL050512 = Peak is 125.4, manually edited in “Main_DB_new.txt”.
  • RBW040504 = Peak is 125.5, manually edited in “Main_DB_new.txt”.
  • BLD13072 = Peak is legit.
  • GGR100909 = Peak is legit.
  • GGR100905 = Peak is legit.
  • KEL050730 = Peak was an artifact. GT changed to 156.9/156.9.
  • BRD050112 = Peak was an artifact. GT changed to 156.9/156.9.
  • BRD050421 = Peak is legit.
  • BRD050415 = Peak is legit.
  • CWT100104 = Odd sample. GT deleted for Str3QUB.

DONE!

Ssa420UoS

Calculate bin statistics for Ssa420UoS

dat <- BinStats(DB_orig, "Ssa420UoS", limit = 1.0)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
181 1 180.57 180.57 0 NA 180.57 180.57
184 5 184.13 184.56 0.43 0.172 184.43 184.47
189 5 188.36 188.74 0.38 0.155 188.63 188.7
193 257 192.5 193.03 0.53 0.124 192.79 192.8
197 364 196.67 197.18 0.51 0.116 196.96 196.98
201 838 200.86 201.37 0.51 0.124 201.15 201.18
205 58 205.03 205.5 0.47 0.146 205.29 205.32
209 294 209.12 209.67 0.55 0.129 209.45 209.48
214 698 213.3 213.83 0.53 0.128 213.6 213.63
218 1282 217.47 218.03 0.56 0.123 217.78 217.79
222 541 221.67 222.16 0.49 0.118 221.96 221.99
226 485 225.8 226.6 0.8 0.125 226.09 226.11
228 15 228.14 228.37 0.23 0.065 228.24 228.22
230 370 229.9 230.44 0.54 0.118 230.2 230.21
234 539 234.03 234.53 0.5 0.112 234.32 234.32
238 651 238.18 238.67 0.49 0.109 238.46 238.47
243 691 242.38 242.87 0.49 0.113 242.68 242.7
247 852 246.61 247.28 0.67 0.12 246.9 246.92
251 979 250.83 251.31 0.48 0.116 251.09 251.09
255 354 254.92 255.41 0.49 0.119 255.19 255.2
259 247 259.02 259.5 0.48 0.126 259.28 259.3
263 153 263.14 263.57 0.43 0.099 263.34 263.33
267 200 267.21 267.72 0.51 0.107 267.47 267.48
272 213 271.4 271.88 0.48 0.108 271.64 271.64
276 426 275.5 276.28 0.78 0.127 275.79 275.8
280 272 279.7 280.18 0.48 0.126 279.93 279.91
284 247 283.79 285.21 1.42 0.147 284.05 284.03
288 110 287.9 289.37 1.47 0.171 288.19 288.17
292 100 291.97 292.48 0.51 0.162 292.28 292.35
296 36 296.11 296.58 0.47 0.138 296.35 296.33
300 44 300.19 300.67 0.48 0.144 300.45 300.44
305 102 303.85 304.77 0.92 0.153 304.51 304.5
309 39 308.37 308.87 0.5 0.132 308.71 308.77
313 114 312.33 313.05 0.72 0.133 312.71 312.74
317 15 316.86 317.09 0.23 0.056 316.97 316.95
321 4 320.99 321.4 0.41 0.178 321.24 321.29
361 1 361.22 361.22 0 NA 361.22 361.22


Generate cumulative plot for Ssa420UoS

res <- allCum(DB_orig, "Ssa420UoS", limit = 1.0)
print(res$plt)


Problem samples

  • There appear to be two odd fragments between 284 and 290 bp:

      res <- allCum(DB_orig, "Ssa420UoS", ymin = 284, ymax = 290)
      print(res$plt)

    • Identify the two points:
      frg <- DB_orig[DB_orig$Marker == "Ssa420UoS", "Fragment"]
      DB_orig[DB_orig$Marker == "Ssa420UoS",][frg>=284.9&frg<=285.5, ]
              Marker    Sample Fragment       Date Plate
    158758 Ssa420UoS LSN100801   285.21 2015-03-07   all
      DB_orig[DB_orig$Marker == "Ssa420UoS",][frg>=289&frg<=289.8, ]
              Marker    Sample Fragment       Date Plate
    157123 Ssa420UoS BLD101302   289.37 2015-03-07   all
    • LSN100801 = Peak is legit.
    • BLD101302 = Peak is legit.
  • Another sample @ ~ 304bp may be an error.

      res <- allCum(DB_orig, "Ssa420UoS", ymin = 300, ymax = 306)
      print(res$plt)

    • Identify the point:
      DB_orig[DB_orig$Marker == "Ssa420UoS",][frg>=303.6&frg<=304.1, ]
              Marker    Sample Fragment       Date Plate
    159747 Ssa420UoS CWH130014   303.85 2015-03-07   all

    This fragment occurs very close to a One104 fragment (image below). On closer inspection, the Ssa420UoS fragment is actually 304.4. Manually edited in “Main_DB_new.txt”.

    Ssa420UoS

  • There is a single fragment @ ~ 362bp:

      res <- allCum(DB_orig, "Ssa420UoS", ymin = 320, ymax = 370)
      print(res$plt)

    • Identify the point:
      DB_orig[DB_orig$Marker == "Ssa420UoS",][frg>=350&frg<=370, ]
              Marker    Sample Fragment       Date Plate
    150924 Ssa420UoS RBW041418   361.22 2015-03-07   all

    It is unclear whether this fragment is legitimate because it overlaps with One104. The genotypes for this individual at One104 and Ssa420UoS will be manually deleted in “Main_DB_new.txt”.
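The point-identification steps above repeat the same subsetting pattern. A small base-R helper for it (fragsInRange is hypothetical; the column names Marker, Sample and Fragment follow the printed database rows, and the toy rows reuse sizes shown in this notebook):

```r
# Hypothetical base-R helper: rows of `db` for `marker` whose Fragment lies in
# the closed range [lo, hi]. which() drops NA fragments, which plain logical
# indexing would turn into all-NA rows.
fragsInRange <- function(db, marker, lo, hi) {
  keep <- which(db$Marker == marker & db$Fragment >= lo & db$Fragment <= hi)
  db[keep, ]
}

# toy rows in the same layout as the database printed above
db <- data.frame(
  Marker = c("Ssa420UoS", "Ssa420UoS", "Ssa420UoS", "One104"),
  Sample = c("LSN100801", "BLD101302", "CWH130014", "CLG051408"),
  Fragment = c(285.21, 289.37, 303.85, 304.56)
)
fragsInRange(db, "Ssa420UoS", 284.9, 285.5)
```

Filtering on the marker inside the same condition keeps fragments from an overlapping locus (here the One104 row near 304 bp) out of the result.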

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Ssa420UoS", 3, limit = 0.8)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
LSN100801 158752 285.2 all
BLD101302 157117 289.4 all


  • BLD101302 = Confirmed above.
  • BRD050911 = Peak is actually from Str3QUB. Both loci checked and genotypes edited manually in “Main_DB_new.txt”.
  • LSN100801 = Confirmed above.

DONE!

One104

Experimentally use the ‘dplyr’ package for data manipulation (just for fun).
library(dplyr)

Calculate bin statistics for One104

dat <- BinStats(DB_orig, "One104")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
296 741 295.33 296.15 0.82 0.167 295.86 295.89
302 432 301.38 302.17 0.79 0.196 301.86 301.9
304 358 303.48 304.56 1.08 0.174 303.94 303.99
306 5 305.88 306.63 0.75 0.31 306.1 306
308 1141 307.48 308.7 1.22 0.178 307.98 308.01
310 14 309.7 310.31 0.61 0.151 310.18 310.23
312 853 311.6 312.69 1.09 0.177 312.02 312.03
314 692 313.71 314.45 0.74 0.194 314.1 314.18
316 210 315.77 316.45 0.68 0.156 316.15 316.15
318 1076 317.9 318.64 0.74 0.164 318.32 318.36
320 10 320 320.61 0.61 0.179 320.37 320.41
322 2763 322 322.77 0.77 0.154 322.46 322.47
325 24 324.15 324.73 0.58 0.117 324.6 324.63
327 242 326.14 326.82 0.68 0.174 326.57 326.62
328 26 328.15 328.72 0.57 0.171 328.47 328.49
331 41 330.3 330.8 0.5 0.151 330.52 330.51
333 95 332.19 332.86 0.67 0.173 332.57 332.6
335 937 334.25 334.96 0.71 0.156 334.68 334.71
337 158 336.2 336.92 0.72 0.148 336.65 336.68
339 26 338.46 338.88 0.42 0.112 338.73 338.74
341 52 340.29 340.89 0.6 0.226 340.63 340.77
345 111 344.36 345.07 0.71 0.132 344.83 344.83
347 2 346.55 346.55 0 0 346.55 346.55
349 179 348.51 349.12 0.61 0.122 348.92 348.94
353 72 352.5 353.17 0.67 0.179 352.94 353.01
357 171 356.64 357.27 0.63 0.137 357.03 357.04
359 1 359.28 359.28 0 NA 359.28 359.28
361 87 360.66 361.34 0.68 0.165 361.07 361.07
365 2 365.2 365.24 0.04 0.028 365.22 365.22
369 1 368.79 368.79 0 NA 368.79 368.79
384 1 383.5 383.5 0 NA 383.5 383.5


Generate cumulative plot for One104

res <- allCum(DB_orig, "One104")
print(res$plt)


Look for problem fragments

  • There are some fragments at the top of the bin around 304bp that may be a problem:

      res <- allCum(DB_orig, "One104", ymin = 303, ymax = 306)
      print(res$plt)

    • Extract the samples to check them.
      DB_orig %>%
        filter(Marker == "One104") %>%
        filter(Fragment >= 304.25 & Fragment <= 304.6)
      Marker    Sample Fragment       Date Plate
    1 One104 BRD050422   304.46 2015-03-07   all
    2 One104 CLG051408   304.56 2015-03-07   all
    3 One104 SXM100106   304.28 2015-03-07   all
    • BRD050422 = Fragment is an artifact. Deleted.
    • CLG051408 = Fragment belongs to Ssa420UoS. Fixed
    • SXM100106 = Peak is legit.
  • There are some fragments between 305 and 307bp that may be a problem:

      res <- allCum(DB_orig, "One104", ymin = 305, ymax = 310)
      print(res$plt)

    • Extract the problem samples
      DB_orig %>%
        filter(Marker == "One104") %>%
        filter(Fragment >= 305 & Fragment <= 306.8)
      Marker    Sample Fragment       Date Plate
    1 One104 CLM050920   306.00 2015-03-07   all
    2 One104 CLM051211   306.12 2015-03-07   all
    3 One104 CLM040204   305.89 2015-03-07   all
    4 One104 CLM040305   305.88 2015-03-07   all
    5 One104  RMN13239   306.63 2015-03-07   all
    • CLM050920 = Peak is 306.1. Fixed
    • CLM051211 = Peak is 306.1. Fixed
    • CLM040204 = Peak is 306.0. Fixed
    • CLM040305 = Peak is 305.9. Fixed
    • RMN13239 = Peak is 307.7. Fixed
  • There are some fragments between 308.45 and 309bp that may be a problem:

      res <- allCum(DB_orig, "One104", ymin = 308, ymax = 309)
      print(res$plt)

    • Extract the problem samples
      DB_orig %>%
        filter(Marker == "One104") %>%
        filter(Fragment >= 308.45 & Fragment <= 309)
      Marker    Sample Fragment       Date Plate
    1 One104 OOW040402   308.65 2015-03-07   all
    2 One104 FDB040215   308.70 2015-03-07   all
    • OOW040402 = Peak is 308.2. Fixed
    • FDB040215 = Peak is artifact. Deleted.
  • There is a potentially problematic fragment ~ 309.7

      res <- allCum(DB_orig, "One104", ymin = 308, ymax = 311)
      print(res$plt)

    • Extract the problem samples
      DB_orig %>%
        filter(Marker == "One104") %>%
        filter(Fragment >= 309.5 & Fragment <= 309.8)
      Marker   Sample Fragment       Date Plate
    1 One104 MOY13008    309.7 2015-03-07   all
    • MOY13008 = Peak is 310.1. Fixed
  • There is a potentially problematic fragment ~ 312.7

      res <- allCum(DB_orig, "One104", ymin = 311, ymax = 314)
      print(res$plt)

    • Extract the problem samples
      DB_orig %>%
        filter(Marker == "One104") %>%
        filter(Fragment >= 312.6 & Fragment <= 312.8)
      Marker    Sample Fragment       Date Plate
    1 One104 RBW043503   312.69 2015-03-07   all
    • RBW043503 = Peak was an artifact. Corrected.
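The fixes above are recorded as notes beside each sample. One way to keep such corrections scripted and repeatable is a corrections table applied to the database in one pass. The sketch below is an alternative, not what was done here; applyCorrections and the Old/New column layout are hypothetical, and the values are the One104 size corrections listed above.

```r
# Hypothetical helper, not part of MsatAllele: replace Fragment values listed
# in `fixes` (columns Marker, Sample, Old, New). Old values are matched within
# `tol` bp, since the printed sizes are rounded to two decimals.
applyCorrections <- function(db, fixes, tol = 0.005) {
  for (i in seq_len(nrow(fixes))) {
    rows <- which(db$Marker == fixes$Marker[i] &
                  db$Sample == fixes$Sample[i] &
                  abs(db$Fragment - fixes$Old[i]) < tol)
    db$Fragment[rows] <- fixes$New[i]
  }
  db
}

# One104 size corrections from the list above
fixes <- data.frame(
  Marker = "One104",
  Sample = c("CLM050920", "CLM051211", "CLM040204", "CLM040305",
             "RMN13239", "OOW040402", "MOY13008"),
  Old = c(306.00, 306.12, 305.89, 305.88, 306.63, 308.65, 309.70),
  New = c(306.1, 306.1, 306.0, 305.9, 307.7, 308.2, 310.1)
)

# toy database rows in the layout printed above
db <- data.frame(
  Marker = "One104",
  Sample = c("CLM050920", "RMN13239", "SXM100106"),
  Fragment = c(306.00, 306.63, 304.28)
)
applyCorrections(db, fixes)$Fragment
```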

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "One104", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
UBN140022 170653 346.6 all
UBN140037 170682 346.6 all
CLG052105 164477 365.2 all
GGR100507 169915 365.2 all
SXM13082 171449 369.1 all


  • UBN140022 = Peak is legit.
  • UBN140037 = Peak is legit.
  • RBW104006 = Weak sample. GT deleted.
  • CLG052105 = Peak is legit.
  • GGR100507 = Peak is legit.
  • SXM13082 = Peak is 369.1. Fixed.
  • KEL050512 = Genotype is not clear. Deleted.

DONE!

Ssa197

Calculate bin statistics for Ssa197

dat <- BinStats(DB_orig, "Ssa197")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
122 2 121.63 121.81 0.18 0.127 121.72 121.72
130 910 129.79 130.3 0.51 0.114 130.08 130.11
134 4823 133.77 134.42 0.65 0.109 134.12 134.14
138 1810 137.81 138.43 0.62 0.1 138.16 138.17
142 1246 141.93 142.46 0.53 0.104 142.25 142.28
146 195 146.13 146.61 0.48 0.091 146.41 146.43
150 233 150.12 150.66 0.54 0.092 150.36 150.37
155 348 154.22 154.81 0.59 0.11 154.54 154.54
159 179 158.41 158.97 0.56 0.133 158.67 158.67
163 66 162.54 163.12 0.58 0.159 162.87 162.93
167 21 166.61 166.97 0.36 0.11 166.86 166.91
171 49 170.74 171.23 0.49 0.117 170.99 170.96


Generate cumulative plot for Ssa197

res <- allCum(DB_orig, "Ssa197")
print(res$plt)


Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa197", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
MOY13012 182226 121.6 all
RMN040503 176122 121.8 all


  • MOY13012 = Peak is legit.
  • RMN040503 = Peak is legit.

Oki-10

Calculate bin statistics for Oki-10

dat <- BinStats(DB_orig, "Oki-10")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
214 24 213.71 214 0.29 0.063 213.89 213.91
218 74 217.87 218.34 0.47 0.087 218.08 218.09
222 78 222.06 222.33 0.27 0.059 222.22 222.23
226 288 226.16 226.48 0.32 0.064 226.34 226.34
230 804 230.22 230.62 0.4 0.068 230.45 230.45
235 730 234.32 234.73 0.41 0.071 234.55 234.56
239 341 238.48 238.85 0.37 0.069 238.67 238.67
243 319 242.69 243.61 0.92 0.076 242.87 242.87
247 322 246.87 247.21 0.34 0.065 247.08 247.09
249 1 248.88 248.88 0 NA 248.88 248.88
251 448 251.04 251.53 0.49 0.065 251.26 251.27
255 207 255.15 255.46 0.31 0.066 255.34 255.34
259 321 259.22 259.56 0.34 0.065 259.39 259.38
264 266 263.34 263.65 0.31 0.067 263.5 263.5
268 205 267.44 267.73 0.29 0.067 267.61 267.62
272 365 271.53 272.11 0.58 0.08 271.77 271.77
276 563 275.69 276.2 0.51 0.075 275.91 275.92
280 356 279.9 280.18 0.28 0.079 280.05 280.09
284 299 283.95 284.29 0.34 0.081 284.16 284.17
288 378 288.06 288.47 0.41 0.081 288.26 288.28
292 1039 292.14 292.58 0.44 0.081 292.36 292.36
296 680 296.2 296.73 0.53 0.086 296.44 296.45
301 711 300.28 300.89 0.61 0.084 300.51 300.53
305 617 304.33 304.79 0.46 0.087 304.58 304.59
309 574 308.44 308.89 0.45 0.086 308.68 308.69
313 471 312.54 312.99 0.45 0.093 312.78 312.81
317 232 316.79 317.22 0.43 0.096 317.02 317.04
321 492 321.06 321.44 0.38 0.09 321.26 321.27
325 344 325.2 325.64 0.44 0.096 325.42 325.43
329 40 329.33 329.66 0.33 0.089 329.48 329.46
334 57 333.53 333.81 0.28 0.073 333.71 333.72
338 29 337.57 337.95 0.38 0.098 337.79 337.79
342 11 341.75 342.02 0.27 0.107 341.86 341.81
346 2 345.86 346.09 0.23 0.163 345.98 345.98
353 9 352.92 353.28 0.36 0.113 353.13 353.16


Generate cumulative plot for Oki-10

res <- allCum(DB_orig, "Oki-10")
print(res$plt)


Look for problem fragments

  • There is a potentially problematic fragment ~ 243.6bp

      res <- allCum(DB_orig, "Oki-10", ymin = 242, ymax = 247)
      print(res$plt)

    • Identify the point:
      DB_orig %>%
        filter(Marker == "Oki-10") %>%
        filter(Fragment >= 243.5 & Fragment <= 243.8)
      Marker    Sample Fragment       Date Plate
    1 Oki-10 BGW050301   243.61 2015-03-07   all
    • BGW050301 = Peak is 242.9. Fixed.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Oki-10", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
SXM13179 193118 345.9 all
CLD100913 189346 346.2 all


  • RBW040504 = Unclear genotype, deleted.
  • SXM13179 = Peak is legit.
  • CLD100913 = Peak is 346.2. Fixed.

DONE!

BG935488

Calculate bin statistics for BG935488

dat <- BinStats(DB_orig, "BG935488")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
105 74 104.81 105.55 0.74 0.19 105.21 105.22
114 3180 113.81 114.88 1.07 0.204 114.43 114.46
123 512 122.2 123.22 1.02 0.203 122.7 122.75
127 638 126.3 127.08 0.78 0.169 126.76 126.77
131 2391 130.27 131.3 1.03 0.189 130.85 130.87
135 1436 134.39 135.35 0.96 0.19 134.94 134.97
139 1469 138.61 139.57 0.96 0.184 139.06 139.07
143 891 142.76 143.54 0.78 0.175 143.24 143.25
148 198 146.97 147.74 0.77 0.185 147.48 147.53
152 54 151.26 151.89 0.63 0.167 151.65 151.67
156 46 155.48 156.15 0.67 0.147 155.87 155.87


Generate cumulative plot for BG935488

res <- allCum(DB_orig, "BG935488")
print(res$plt)


Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "BG935488", 3)

No valid samples

if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}


DONE!

SsaD71

Calculate bin statistics for SsaD71

dat <- BinStats(DB_orig, "SsaD71", limit = 1.0)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
177 1 176.66 176.66 0 NA 176.66 176.66
184 1950 183.67 184.85 1.18 0.203 184.47 184.49
189 2112 187.69 189 1.31 0.211 188.59 188.61
193 803 192.03 193.05 1.02 0.21 192.71 192.74
197 929 196.06 197.27 1.21 0.221 196.87 196.89
201 102 200.2 201.26 1.06 0.249 200.89 200.92
205 398 204.22 205.31 1.09 0.203 205 205.03
209 613 208.28 209.4 1.12 0.228 209.04 209.09
213 1243 212.33 213.54 1.21 0.216 213.12 213.16
217 586 216.6 217.6 1 0.192 217.27 217.28
221 386 220.67 221.7 1.03 0.195 221.36 221.36
225 154 224.82 225.7 0.88 0.186 225.4 225.4
230 472 228.76 229.83 1.07 0.208 229.48 229.51
234 360 232.96 233.87 0.91 0.189 233.53 233.54
238 732 237.06 237.92 0.86 0.181 237.59 237.61
242 215 241.01 242.03 1.02 0.225 241.69 241.75
246 314 245.21 246.2 0.99 0.184 245.86 245.87
250 45 249.3 250.27 0.97 0.305 249.8 249.9
254 5 253.71 254.31 0.6 0.226 254.09 254.15
262 6 261.74 262.43 0.69 0.259 262.12 262.09
266 4 265.8 266.41 0.61 0.292 266.15 266.19
270 11 269.85 270.53 0.68 0.232 270.22 270.1


Generate cumulative plot for SsaD71

res <- allCum(DB_orig, "SsaD71", limit = 1.0)
print(res$plt)


Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "SsaD71", 3, limit = 1.0)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
KEL040515 209317 176.7 all


DONE!

Sasa-TAP2A

Calculate bin statistics for Sasa-TAP2A

dat <- BinStats(DB_orig, "Sasa-TAP2A",  limit = 1.15)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
285 22 284.44 285.06 0.62 0.229 284.79 284.86
287 1085 286.23 287.15 0.92 0.184 286.84 286.86
301 1 300.97 300.97 0 NA 300.97 300.97
306 3 305.49 305.77 0.28 0.14 305.62 305.61
310 624 309.02 310.02 1 0.226 309.58 309.61
316 1768 315.18 316.33 1.15 0.234 315.93 315.97
318 24 317.57 318.32 0.75 0.224 318.08 318.15
321 1 320.09 320.09 0 NA 320.09 320.09
322 953 321.18 322.87 1.69 0.286 322.09 322.09
324 87 323.66 324.52 0.86 0.227 324.17 324.18
326 3530 325.58 326.97 1.39 0.246 326.41 326.45
329 463 327.85 329.07 1.22 0.237 328.58 328.63
330 982 329.7 331.13 1.43 0.291 330.48 330.48
332 2 332.32 332.32 0 0 332.32 332.32
337 1310 335.95 336.94 0.99 0.214 336.52 336.53


Generate cumulative plot for Sasa-TAP2A

res <- allCum(DB_orig, "Sasa-TAP2A", limit = 1.15)
print(res$plt)


  • There are some odd samples ~ 318bp

      res <- allCum(DB_orig, "Sasa-TAP2A", limit = 1.15, ymin = 315, 
                    ymax = 325)
      print(res$plt)

    • Extract the problem samples
      # Small fragments
      DB_orig %>%
        filter(Marker == "Sasa-TAP2A") %>%
        filter(Fragment >= 317 & Fragment <= 317.7)
          Marker    Sample Fragment       Date Plate
    1 Sasa-TAP2A UBN140071   317.57 2015-03-07   all
    2 Sasa-TAP2A UBN140089   317.62 2015-03-07   all
    3 Sasa-TAP2A UBN140091   317.62 2015-03-07   all
    4 Sasa-TAP2A UBN140113   317.62 2015-03-07   all
      # Large fragments
      DB_orig %>%
        filter(Marker == "Sasa-TAP2A") %>%
        filter(Fragment >= 318 & Fragment <= 318.7)
           Marker    Sample Fragment       Date Plate
    1  Sasa-TAP2A RBW040508   318.15 2015-03-07   all
    2  Sasa-TAP2A RBW040610   318.18 2015-03-07   all
    3  Sasa-TAP2A RBW040614   318.12 2015-03-07   all
    4  Sasa-TAP2A RBW040703   318.12 2015-03-07   all
    5  Sasa-TAP2A RBW041411   318.15 2015-03-07   all
    6  Sasa-TAP2A RBW041507   318.09 2015-03-07   all
    7  Sasa-TAP2A RBW042007   318.18 2015-03-07   all
    8  Sasa-TAP2A RBW042118   318.09 2015-03-07   all
    9  Sasa-TAP2A RBW050108   318.25 2015-03-07   all
    10 Sasa-TAP2A RBW051403   318.18 2015-03-07   all
    11 Sasa-TAP2A RBW051609   318.18 2015-03-07   all
    12 Sasa-TAP2A RBW051704   318.06 2015-03-07   all
    13 Sasa-TAP2A RBW052610   318.15 2015-03-07   all
    14 Sasa-TAP2A RBW052705   318.12 2015-03-07   all
    15 Sasa-TAP2A RBW053007   318.15 2015-03-07   all
    16 Sasa-TAP2A RBW054306   318.15 2015-03-07   all
    17 Sasa-TAP2A RFY050207   318.15 2015-03-07   all
    18 Sasa-TAP2A SXM102501   318.32 2015-03-07   all
    19 Sasa-TAP2A SXM102603   318.30 2015-03-07   all
    20 Sasa-TAP2A BCC100123   318.30 2015-03-07   all

    Take all small-fragment samples and four large-fragment samples to check whether they are actually different alleles.

    sasatap2a1

    There is a clear wobble between the samples compared. The top two samples in the image above correspond to the small fragments, while the bottom two correspond to the larger fragments. Given that the separation between the fragments is so small (average ~0.6 bp), a more conservative approach is taken: these fragments will remain binned as a single allele.
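The average separation quoted above can be checked from the fragment sizes listed for the two groups. A short sketch, taking "average separation" to mean the difference between the group means (an assumption):

```r
# Sasa-TAP2A fragment sizes listed above around 318 bp
small <- c(317.57, 317.62, 317.62, 317.62)
large <- c(318.15, 318.18, 318.12, 318.12, 318.15, 318.09, 318.18, 318.09,
           318.25, 318.18, 318.18, 318.06, 318.15, 318.12, 318.15, 318.15,
           318.15, 318.32, 318.30, 318.30)
# difference between the group means, in bp
sep <- mean(large) - mean(small)
# smallest gap between the two groups, in bp
minGap <- min(large) - max(small)
round(c(sep = sep, minGap = minGap), 2)
```

This gives about 0.56 bp between the group means, and only 0.44 bp between the largest small fragment and the smallest large fragment.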

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Sasa-TAP2A", 3, limit = 1.15)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
DGB040232 221219 305.5 all
RMN050506 218705 305.6 all
SJH100122 223475 305.8 all
BRD052212 220233 332.3 all
BRD052216 220239 332.3 all


  • CLG041101 = Bad size. GT deleted.
  • DGB040232 = Peak is legit.
  • RMN050506 = Peak is legit.
  • SJH100122 = Peak is legit.
  • BRD051612 = Very weak peak. GT deleted.
  • BRD052212 = Peak is legit.
  • BRD052216 = Peak is legit.

DONE!

CA053293

Calculate bin statistics for CA053293

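dat <- BinStats(DB_orig, "CA053293")
# Coerce each statistic column to numeric and round to 4 decimal places
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
# Re-attach the bin labels (row names) and print as a table
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)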
Bins N Min Max Range Sd MEAN MEDIAN
140 1 140.19 140.19 0 NA 140.19 140.19
144 1 143.82 143.82 0 NA 143.82 143.82
148 183 147.65 148.12 0.47 0.093 147.88 147.9
152 1483 150.79 152.06 1.27 0.112 151.77 151.78
154 693 153.47 153.9 0.43 0.084 153.71 153.71
155 30 154.66 154.74 0.08 0.025 154.71 154.7
156 3520 154.75 156.67 1.92 0.112 155.75 155.76
157 7 156.68 156.71 0.03 0.013 156.69 156.7
158 2778 157.2 158.57 1.37 0.097 157.63 157.63
159 40 158.62 158.71 0.09 0.022 158.67 158.68
160 1282 158.74 160.47 1.73 0.187 159.53 159.54
161 311 160.55 161.68 1.13 0.404 161.21 161.43
162 25 162.37 162.57 0.2 0.053 162.49 162.49
163 44 162.48 163.5 1.02 0.242 163.31 163.38
164 21 164.33 164.47 0.14 0.047 164.41 164.41
166 1 166.31 166.31 0 NA 166.31 166.31


Generate cumulative plot for CA053293

res <- allCum(DB_orig, "CA053293")
print(res$plt)


Identify problem bins.

  • There is an odd fragment ~ 158.3:

      res <- allCum(DB_orig, "CA053293", limit = 0.4, ymin = 157, 
                    ymax = 160)
      print(res$plt)

    • Identify the sample:
      DB_orig %>%
        filter(Marker == "CA053293") %>%
        filter(Fragment >= 158.2 & Fragment <= 158.45)
        Marker    Sample Fragment       Date Plate
    1 CA053293 UBN140064   158.34 2015-03-07   all
    • UBN140064 = Peak is legit.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "CA053293", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
RBW044101 227982 140.2 all
UBN140057 235904 143.8 all
BRD050916 228855 166.3 all
  • RBW044101 = Peak is legit.
  • UBN140057 = Peak is legit.
  • BRD050916 = Peak is legit.


Ssa422UoS

Calculate bin statistics for Ssa422UoS

dat <- BinStats(DB_orig, "Ssa422UoS")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
132 1 131.7 131.7 0 NA 131.7 131.7
157 1 156.93 156.93 0 NA 156.93 156.93
159 3 158.55 158.61 0.06 0.03 158.58 158.58
160 1210 159.09 160.74 1.65 0.306 160.38 160.47
161 17 161.1 161.41 0.31 0.094 161.31 161.34
162 48 161.47 161.75 0.28 0.076 161.63 161.65
163 3232 161.99 162.78 0.79 0.145 162.5 162.52
165 112 164.16 164.76 0.6 0.142 164.53 164.55
167 48 166.38 166.75 0.37 0.1 166.58 166.59
168 1 167.56 167.56 0 NA 167.56 167.56
169 3899 168.17 168.92 0.75 0.143 168.6 168.62
171 7 170.56 170.81 0.25 0.086 170.75 170.77
201 121 200.78 201.33 0.55 0.126 201.07 201.09
207 378 206.82 207.41 0.59 0.138 207.14 207.16
211 14 210.85 211.23 0.38 0.098 211.04 211.06
213 16 212.7 213.2 0.5 0.179 213.01 213.07
215 6 214.9 215.14 0.24 0.089 215.08 215.11
217 2 217.19 217.23 0.04 0.028 217.21 217.21
219 30 218.92 219.37 0.45 0.129 219.25 219.28
221 17 221 221.41 0.41 0.13 221.23 221.28
223 407 222.94 223.49 0.55 0.112 223.26 223.28
225 846 224.95 225.51 0.56 0.112 225.28 225.28
227 1 227.38 227.38 0 NA 227.38 227.38
229 8 229.08 229.47 0.39 0.138 229.37 229.44
243 1 243.44 243.44 0 NA 243.44 243.44
246 7 245.48 245.76 0.28 0.099 245.66 245.7


Generate cumulative plot for Ssa422UoS

res <- allCum(DB_orig, "Ssa422UoS")
print(res$plt)


Identify problems

  • Because of a 1 bp shift at this locus, 0.35 is added to the fragment size of every sample run after 14/04/14. This correction markedly improves binning accuracy.

    Before

      res <- allCum(DB_orig, "Ssa422UoS", limit = 0.55, ymin = 158, 
                    ymax = 164)
      print(res$plt)

    After

      res <- allCum(DB_new, "Ssa422UoS", limit = 0.55, ymin = 158, 
                    ymax = 164)
      print(res$plt)

  • There is a potentially problematic sample at ~168 bp:

      res <- allCum(DB_new, "Ssa422UoS", limit = 0.55, ymin = 160, 
                    ymax = 170)
      print(res$plt)

    • Identify the point:
      DB_new %>%
        filter(Marker == "Ssa422UoS") %>%
        filter(Fragment >= 167.6 & Fragment <= 168)
         Marker    Sample Fragment       Date Plate
    1 Ssa422UoS UBN140064   167.91 2015-03-07   all
    • Fragment is legit.
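A minimal base-R sketch of the date-dependent correction described above for Ssa422UoS: fragments of one marker run after a cut-off date receive a fixed offset. The helper `shiftFragments()` and the `RunDate` column are hypothetical; the `Date` column in the outputs above holds 2015-03-07 for every row, so it does not appear to record the run date. The sketch reads 14/04/14 as 14 April 2014 and "after" as strictly later, and the toy values are invented.

```r
# Hypothetical helper: add `offset` to every fragment of `marker` whose
# (assumed) RunDate falls strictly after the cut-off date `after`.
shiftFragments <- function(db, marker, after, offset) {
  idx <- db$Marker == marker & db$RunDate > as.Date(after)
  db$Fragment[idx] <- db$Fragment[idx] + offset
  db
}

# Toy database (invented values, for illustration only)
toyDB <- data.frame(
  Marker   = c("Ssa422UoS", "Ssa422UoS", "CA060208"),
  Sample   = c("S1", "S2", "S3"),
  Fragment = c(162.10, 162.15, 161.30),
  RunDate  = as.Date(c("2014-03-01", "2014-05-20", "2014-05-20")),
  stringsAsFactors = FALSE
)

shifted <- shiftFragments(toyDB, "Ssa422UoS", after = "2014-04-14",
                          offset = 0.35)
shifted$Fragment  # only S2 (Ssa422UoS, run after the cut-off) moves, to 162.50
```

Restricting the correction by marker as well as date keeps other loci run on the same dates untouched, as with S3 here.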

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Ssa422UoS", 3, limit = 0.55)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
KEL040916 241736 156.9 all
BLD101205 243834 158.6 all
BLD100506 243715 158.6 all
BLD100509 243721 158.6 all
UBN140064 246334 167.9 all
RBW053002 238622 217.2 all
FDB040219 238914 217.2 all
RCK100102 245168 227.5 all
RMN13234 247885 243.8 all


  • CLM050109 = Peak is an artifact. Fixed.
  • KEL040916 = Peak is legit.
  • BLD101205 = Peak is legit.
  • BLD100506 = Peak is legit.
  • BLD100509 = Peak is legit.
  • UBN140064 = Peak is legit.
  • RBW053002 = Peak is legit.
  • FDB040219 = Peak is legit.
  • RCK100102 = Peak is 227.5. Edited.
  • RMN13234 = Peak is legit.

DONE!

CA060208

Calculate bin statistics for CA060208

dat <- BinStats(DB_orig, "CA060208")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
156 646 155.15 155.73 0.58 0.12 155.49 155.5
158 4 157.26 157.58 0.32 0.155 157.49 157.56
160 1 160 160 0 NA 160 160
161 4361 160.82 161.6 0.78 0.118 161.33 161.33
165 3681 164.74 165.51 0.77 0.116 165.17 165.18
167 484 166.71 167.4 0.69 0.115 167.1 167.11
171 724 170.63 171.15 0.52 0.123 170.93 170.95
175 3 174.64 174.91 0.27 0.156 174.82 174.91

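The columns of these BinStats tables are ordinary per-bin summaries. The base-R sketch below recomputes them for a given bin assignment; `binSummary()` is a hypothetical helper, not a MsatAllele function. Labelling each bin by its rounded median is an assumption, chosen because it matches labels such as bin 158 above, whose mean (157.49) would round to 157 but whose median (157.56) rounds to 158. The example uses the four CA060208 fragments near 157 bp identified later in this section.

```r
# Per-bin summary statistics in the layout of the BinStats() tables.
# Hypothetical helper; `bins` is any vector assigning fragments to bins.
binSummary <- function(fragments, bins) {
  rows <- lapply(split(fragments, bins), function(x) {
    data.frame(N = length(x), Min = min(x), Max = max(x),
               Range = max(x) - min(x),
               Sd = sd(x),  # NA for single-fragment bins
               MEAN = mean(x), MEDIAN = median(x))
  })
  out <- do.call(rbind, rows)
  # Assumed labelling rule: nearest whole base pair to the bin median
  out$Bins <- round(out$MEDIAN)
  rownames(out) <- NULL
  out[, c("Bins", "N", "Min", "Max", "Range", "Sd", "MEAN", "MEDIAN")]
}

# The four CA060208 fragments near 157 bp, treated as one bin
frag157 <- c(157.57, 157.56, 157.58, 157.26)
s157 <- binSummary(frag157, rep(1, length(frag157)))
print(round(s157, 3))
```

For this bin the sketch gives N = 4, Range = 0.32 and Sd of about 0.155, in line with the 158 row of the table above.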

Generate cumulative plot for CA060208

res <- allCum(DB_orig, "CA060208")
print(res$plt)


Identify potential problem samples

  • There is an odd bin ~ 157bp:

      res <- allCum(DB_orig, "CA060208", ymin = 155, ymax = 160)
      print(res$plt)

    • Identify the points:
      DB_orig %>%
        filter(Marker == "CA060208") %>%
        filter(Fragment >= 157 & Fragment <= 158)
        Marker    Sample Fragment       Date Plate
    1 CA060208 SXM100205   157.57 2015-03-07   all
    2 CA060208 SXM100206   157.56 2015-03-07   all
    3 CA060208 RBW104002   157.58 2015-03-07   all
    4 CA060208  LND14048   157.26 2015-03-07   all

    All peaks are legit.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "CA060208", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
UBN140064 256347 160 all
LND14083 258060 174.6 all
DGH100501 254127 174.9 all
BCC100128 254689 174.9 all


  • UBN140064 = Peak is legit.
  • LND14083 = Peak is legit.
  • BCC100128 = Peak is legit.
  • DGH100501 = Peak is legit.

DONE!

MHC-I-UTR

Calculate bin statistics for MHC-I-UTR

dat <- BinStats(DB_orig, "MHC-I-UTR")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
240 791 239.8 240.82 1.02 0.216 240.14 240.09
242 1 242.13 242.13 0 NA 242.13 242.13
245 436 244.62 245.62 1 0.13 245 245.01
247 318 246.41 246.99 0.58 0.128 246.71 246.71
249 105 248.19 248.95 0.76 0.18 248.55 248.55
251 460 250.29 251.45 1.16 0.296 251.08 251.2
252 2953 251.46 253.09 1.63 0.259 252.49 252.41
254 1914 254 254.9 0.9 0.178 254.49 254.49
256 293 255.76 256.27 0.51 0.125 256.07 256.09
258 148 257.88 258.69 0.81 0.16 258.25 258.28
262 27 261.8 262.23 0.43 0.134 262.05 262.1
264 700 263.62 264.29 0.67 0.123 264.03 264.05
274 159 273.42 274.03 0.61 0.12 273.79 273.79
278 2 277.8 277.83 0.03 0.021 277.81 277.81
282 18 281.42 281.76 0.34 0.084 281.6 281.62
428 972 426.85 428.87 2.02 0.231 428.03 428.04
430 389 429.32 430.35 1.03 0.228 429.95 429.97
432 207 431.34 432.32 0.98 0.222 431.91 431.92


Generate cumulative plot for MHC-I-UTR

res <- allCum(DB_orig, "MHC-I-UTR")
print(res$plt)


Identify problems

  • There is a single outlying point associated with the bin at ~245 bp:

      res <- allCum(DB_orig, "MHC-I-UTR", ymin = 244, ymax = 248)
      print(res$plt)

    • Identify the point:
      DB_orig %>%
        filter(Marker == "MHC-I-UTR") %>%
        filter(Fragment >= 245.5 & Fragment <= 246)
         Marker    Sample Fragment       Date Plate
    1 MHC-I-UTR CLG051018   245.62 2015-03-07   all
    • CLG051018 = Peak is 246.6, fixed in “Main_DB_new.txt”.
  • There is a problem bin ~ 252bp:

      res <- allCum(DB_orig, "MHC-I-UTR", ymin = 250, ymax = 253.5)
      print(res$plt)

    • Identify the points between 251.6 and 251.9:
      DB_orig %>%
        filter(Marker == "MHC-I-UTR") %>%
        filter(Fragment >= 251.6 & Fragment <= 251.9)
         Marker    Sample Fragment       Date Plate
    1 MHC-I-UTR RBW040413   251.72 2015-03-07   all
    2 MHC-I-UTR BRD040722   251.83 2015-03-07   all
    3 MHC-I-UTR BRD040908   251.71 2015-03-07   all
    4 MHC-I-UTR KLW040505   251.64 2015-03-07   all
    5 MHC-I-UTR SKW040122   251.70 2015-03-07   all
    • RBW040413 = Odd peak. Deleted.
    • BRD040722 = Peak is 250.3. Fixed.
    • BRD040908 = Peak is 252.6. Fixed.
    • KLW040505 = Peak is 252.6. Fixed.
    • SKW040122 = Peak is 252.7. Fixed.
  • Fragments below 251.7 bp should be binned with a limit of 0.7, and fragments above this threshold with a limit of 0.8.

  • There are two outlier fragments associated with the bin ~ 428bp:

      res <- allCum(DB_orig, "MHC-I-UTR", ymin = 425, ymax = 430)
      print(res$plt)

    • Identify the two points:
      # small point
      DB_orig %>%
        filter(Marker == "MHC-I-UTR") %>%
        filter(Fragment >= 426.8 & Fragment <= 427)
         Marker    Sample Fragment       Date Plate
    1 MHC-I-UTR KEL051207   426.85 2015-03-07   all
      # large point
      DB_orig %>%
        filter(Marker == "MHC-I-UTR") %>%
        filter(Fragment >= 428.8 & Fragment <= 429)
         Marker    Sample Fragment       Date Plate
    1 MHC-I-UTR BRD040706   428.87 2015-03-07   all
    • KEL051207 = Fragment is an artifact. Fixed.
    • BRD040706 = Ambiguous allele peak. Discarded.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "MHC-I-UTR", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
BCC100137 264662 242.1 all
GVY100904 265081 277.8 all
GVY100905 265083 277.8 all


  • BCC100137 = Peak is legit.
  • GVY100904 = Peak is legit.
  • GVY100905 = Peak is legit.

DONE!

SsaD170

Calculate bin statistics for SsaD170

dat <- BinStats(DB_orig, "SsaD170")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
137 78 136.17 136.86 0.69 0.154 136.58 136.6
149 26 148.39 148.99 0.6 0.206 148.73 148.78
153 377 152.43 153.09 0.66 0.153 152.81 152.83
157 281 156.49 157.2 0.71 0.146 156.93 156.95
161 225 160.39 161.28 0.89 0.177 161.01 161.03
165 416 164.63 166.01 1.38 0.169 165.03 165.04
169 1288 168.56 170.02 1.46 0.254 169.1 169.08
170 53 170.06 170.21 0.15 0.05 170.13 170.12
173 1247 172.65 174.09 1.44 0.17 173.07 173.07
174 5 174.13 174.21 0.08 0.031 174.16 174.16
177 1386 176.69 178.16 1.47 0.167 177.09 177.09
178 2 178.19 178.19 0 0 178.19 178.19
181 1797 180.68 182.16 1.48 0.199 181.14 181.13
182 20 182.17 182.32 0.15 0.049 182.25 182.25
185 1380 184.76 186.22 1.46 0.187 185.2 185.2
186 9 186.25 186.35 0.1 0.032 186.29 186.28
189 1088 188.81 190.22 1.41 0.255 189.3 189.28
190 39 190.24 190.44 0.2 0.059 190.34 190.34
193 832 192.89 194.27 1.38 0.181 193.31 193.31
194 10 194.29 194.45 0.16 0.05 194.39 194.41
195 1 195.24 195.24 0 NA 195.24 195.24
197 515 196.94 198.22 1.28 0.173 197.36 197.38
198 17 198.25 198.53 0.28 0.076 198.42 198.45
201 216 201.05 201.63 0.58 0.153 201.38 201.4
202 59 201.98 202.53 0.55 0.124 202.33 202.34
205 23 205.01 205.55 0.54 0.159 205.29 205.34
206 53 205.56 206.58 1.02 0.27 206.3 206.41
209 26 209.09 210.11 1.02 0.203 209.41 209.43
210 2 210.45 210.56 0.11 0.078 210.5 210.5
213 4 213.12 213.56 0.44 0.193 213.39 213.44
218 41 217.15 217.73 0.58 0.179 217.48 217.52
222 1 221.67 221.67 0 NA 221.67 221.67


Generate cumulative plot for SsaD170

res <- allCum(DB_orig, "SsaD170")
print(res$plt)


Identify problems

  • There are a number of larger fragments associated with the bin ~ 165bp:

      res <- allCum(DB_orig, "SsaD170", ymin = 163, ymax = 167)
      print(res$plt)

    • Identify the points:
      DB_orig %>%
        filter(Marker == "SsaD170") %>%
        filter(Fragment >= 165.5 & Fragment <= 166.2)
       Marker    Sample Fragment       Date Plate
    1 SsaD170 KEL051524   165.85 2015-03-07   all
    2 SsaD170 CGN100701   166.01 2015-03-07   all
    3 SsaD170  SXM13051   165.62 2015-03-07   all
    4 SsaD170  SXM13131   165.75 2015-03-07   all
    • KEL051524 = Peak is legit.
    • CGN100701 = Peak is legit.
    • SXM13051 = Peak is legit.
    • SXM13131 = Peak is legit.

    These fragments appear to be separated from the main bin by 1 bp. Setting the bin limit to 0.25 between 164 bp and 167 bp allows the algorithm to differentiate the alleles.

  • There seems to be another 1bp split in the bin ~ 169bp:

      res <- allCum(DB_orig, "SsaD170", ymin = 168, ymax = 173)
      print(res$plt)

    • Setting binning limits to 0.25 in this region allows the algorithm to differentiate alleles.
  • On closer inspection, most bins between 163 bp and 200 bp contain a group of fragments shifted by 1 bp. Setting the binning limit to 0.25 separates these alleles accurately.

  • There is an odd fragment ~ 195.5bp:

      res <- allCum(DB_orig, "SsaD170", ymin = 192, ymax = 198)
      print(res$plt)

    • Identify the point
      DB_orig %>%
        filter(Marker == "SsaD170") %>%
        filter(Fragment >= 195 & Fragment <= 196)
       Marker    Sample Fragment       Date Plate
    1 SsaD170 RBW040504   195.24 2015-03-07   all
    • RBW040504 = Peak is legit.
  • A binning limit of 0.25 does not accurately separate 1 bp differences for alleles between 200 and 208 bp. Increasing the binning limit to 0.35 overcomes this issue.

    Bin limit = 0.25

      res <- allCum(DB_orig, "SsaD170", limit = 0.25, ymin = 200, 
                    ymax = 210)
      print(res$plt)

    Bin limit = 0.35

      res <- allCum(DB_orig, "SsaD170", limit = 0.35, ymin = 200, 
                    ymax = 210)
      print(res$plt)

  • The fragments at ~210 bp are not binned accurately with a bin limit of 0.35. Setting the bin limit to 0.45 for this region works. All fragments above this region should be binned with a bin limit of 0.8.
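Why a smaller limit splits the 1 bp shift groups can be illustrated with a gap rule: sort the fragment sizes and start a new bin wherever two neighbours differ by more than the limit. This is a minimal sketch under that assumption, not MsatAllele's source; `gapBins()` is hypothetical and the fragment sizes are invented to mimic a main bin followed by a shifted group.

```r
# Hypothetical gap rule: after sorting, a new bin starts wherever the gap
# between neighbouring fragment sizes exceeds `limit`. Returns one bin id
# per fragment, in the original order of `fragments`.
gapBins <- function(fragments, limit) {
  ord <- order(fragments)
  gaps <- diff(fragments[ord])
  idsSorted <- cumsum(c(1, gaps > limit))
  ids <- numeric(length(fragments))
  ids[ord] <- idsSorted
  ids
}

# Invented sizes: a main bin, then a group shifted by roughly half a base
frags <- c(164.90, 165.00, 165.05, 165.10, 165.62, 165.75, 165.85)

gapBins(frags, limit = 0.8)   # 1 1 1 1 1 1 1: everything in one bin
gapBins(frags, limit = 0.25)  # 1 1 1 1 2 2 2: shifted group split off
```

With the wider limit the 0.52 bp gap between 165.10 and 165.62 is absorbed into one bin; the narrower limit treats it as a bin boundary.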

To summarise the binning pattern for this locus:

  • Fragments between 130bp - 162bp should have a bin limit of 0.8
  • Fragments between 163bp - 200bp should have a bin limit of 0.25
  • Fragments between 201bp - 208bp should have a bin limit of 0.35
  • Fragments between 209bp - 212bp should have a bin limit of 0.45
  • All larger fragments should have a bin limit of 0.8
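The `limit` list passed to `getLowFreq()` below encodes this summary. Reading each pair as c(upper bound in bp, limit) reproduces the five regions listed above; the hypothetical helper `limitFor()` sketches that lookup in base R. It illustrates the specification format only and is not a claim about how MsatAllele applies it internally.

```r
# Region-specific bin limits for SsaD170, as pairs c(upper_bound_bp, limit)
spec <- list(c(162, 0.8), c(200, 0.25), c(208, 0.35),
             c(212, 0.45), c(250, 0.8))

# Hypothetical helper: the limit of the first region whose upper bound is
# >= the fragment size; NA for fragments above the last bound.
limitFor <- function(fragment, spec) {
  uppers <- vapply(spec, function(p) p[1], numeric(1))
  limits <- vapply(spec, function(p) p[2], numeric(1))
  # Count bounds strictly below each fragment, then step to the next region
  i <- findInterval(fragment, uppers, left.open = TRUE) + 1
  limits[i]  # indexing past the last region yields NA
}

limitFor(c(137, 169, 205, 210, 222), spec)  # 0.80 0.25 0.35 0.45 0.80
```

The same format could express the MHC-I-UTR rule stated earlier (0.7 below 251.7 bp, 0.8 above): `limitFor(c(251.0, 252.5), list(c(251.7, 0.7), c(Inf, 0.8)))` gives 0.7 and 0.8. Using `Inf` as the final bound is a choice for this sketch only.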

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "SsaD170", 3, 
                  limit = list(c(162, 0.8), c(200, 0.25), c(208, 0.35),
                               c(212, 0.45), c(250, 0.8)))
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
RBW040504 268310 195.2 all
SXM13049 278228 210.1 all
IOM120253 275731 210.4 all
ALG100105 276215 210.6 all
BLD102202 274729 221.7 all


  • RBW040504 = Peak is legit.
  • SXM13049 = Peak is legit.
  • IOM120253 = Peak is legit.
  • ALG100105 = Peak is legit.
  • BLD102202 = Peak is legit.

DONE!

Ssa413UoS

Calculate bin statistics for Ssa413UoS

dat <- BinStats(DB_orig, "Ssa413UoS", limit = 0.9)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
224 3 223.15 224.03 0.88 0.443 223.56 223.5
226 3509 226 226.64 0.64 0.118 226.38 226.39
229 2693 228.77 229.72 0.95 0.125 229.44 229.45
232 2 232.42 232.48 0.06 0.042 232.45 232.45
236 553 235.3 235.77 0.47 0.106 235.58 235.59
242 6 241.52 241.81 0.29 0.118 241.64 241.59
245 731 244.52 245.06 0.54 0.131 244.82 244.83
248 22 247.63 248.12 0.49 0.157 247.81 247.74
251 150 250.75 251.25 0.5 0.132 251.03 251.01
254 7 253.83 254.19 0.36 0.133 254.09 254.15
257 2114 256.65 257.3 0.65 0.135 257.06 257.07
260 217 259.61 260.27 0.66 0.118 260.07 260.09
263 137 262.65 263.3 0.65 0.134 262.98 262.98
266 13 265.76 266.28 0.52 0.155 266.06 266.06
272 242 271.83 272.36 0.53 0.126 272.13 272.13
275 8 274.97 275.33 0.36 0.139 275.18 275.2


Generate cumulative plot for Ssa413UoS

res <- allCum(DB_orig, "Ssa413UoS", limit = 0.9)
print(res$plt)


Identify problems

  • There is an odd fragment ~ 228.8 bp:

      res <- allCum(DB_orig, "Ssa413UoS", limit = 0.9, ymin = 225, 
                    ymax = 230)
      print(res$plt)

    • Identify the point
      DB_orig %>% 
        filter(Marker == "Ssa413UoS") %>% 
        filter(Fragment >= 228 & Fragment <= 229)
         Marker    Sample Fragment       Date Plate
    1 Ssa413UoS BLD101004   228.77 2015-03-07   all
    • BLD101004 = Peak is an artifact. Deleted.

Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_new, "Ssa413UoS", 3, limit = 0.9)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
UBN140015 287974 223.2 all
RBW042903 280071 223.5 all


  • UBN140015 = Peak is legit.
  • RBW042903 = Peak is legit.
  • KEL040106 = Peak is an artifact. Fixed.
  • ART050105 = Peak is an artifact. Fixed.

DONE!

Ssa407UoS

Calculate bin statistics for Ssa407UoS

dat <- BinStats(DB_orig, "Ssa407UoS", limit = 0.75)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
203 2 203.24 203.48 0.24 0.17 203.36 203.36
208 28 207.41 208.11 0.7 0.191 207.84 207.87
209 9 209.09 209.62 0.53 0.199 209.37 209.43
212 3 212.11 212.23 0.12 0.062 212.18 212.2
214 8 213.45 213.64 0.19 0.068 213.54 213.54
218 82 217.4 217.94 0.54 0.14 217.72 217.72
220 2 220 220 0 0 220 220
222 930 221.55 222.16 0.61 0.133 221.88 221.89
224 25 223.76 224.32 0.56 0.154 224.03 224
226 220 225.68 226.26 0.58 0.129 226.01 226
228 3 227.94 228.32 0.38 0.206 228.18 228.27
230 246 229.79 230.39 0.6 0.151 230.13 230.15
232 12 231.98 232.47 0.49 0.209 232.27 232.38
234 386 233.96 234.49 0.53 0.145 234.29 234.32
236 71 236.09 236.59 0.5 0.143 236.38 236.39
238 746 238 238.62 0.62 0.134 238.4 238.43
240 12 240.3 240.73 0.43 0.149 240.51 240.46
243 986 242.28 242.99 0.71 0.136 242.6 242.6
247 364 246.54 247.12 0.58 0.145 246.82 246.83
249 29 248.72 249.21 0.49 0.141 249.03 249.07
251 699 250.75 251.27 0.52 0.125 251.04 251.05
253 23 252.9 253.35 0.45 0.129 253.16 253.17
255 901 254.82 255.37 0.55 0.125 255.13 255.11
257 60 256.99 257.47 0.48 0.124 257.23 257.22
259 383 258.94 259.42 0.48 0.132 259.21 259.21
261 117 261.05 261.51 0.46 0.121 261.29 261.28
263 549 262.78 263.59 0.81 0.137 263.3 263.31
265 89 265.13 265.64 0.51 0.121 265.43 265.43
267 688 267.13 267.77 0.64 0.136 267.45 267.48
270 241 269.26 269.87 0.61 0.14 269.59 269.61
272 803 271.24 271.83 0.59 0.139 271.59 271.61
274 89 273.39 273.98 0.59 0.164 273.76 273.84
276 765 275.41 275.98 0.57 0.138 275.73 275.75
278 11 277.61 278.07 0.46 0.194 277.86 277.96
280 482 279.51 280.17 0.66 0.143 279.88 279.91
282 25 281.74 282.2 0.46 0.135 281.97 282
284 464 283.45 284.26 0.81 0.141 284 284.01
286 3 285.82 285.88 0.06 0.032 285.84 285.83
288 149 287.78 288.34 0.56 0.15 288.08 288.11
292 49 291.89 292.43 0.54 0.139 292.19 292.18
294 1 294.49 294.49 0 NA 294.49 294.49
296 80 296.03 296.54 0.51 0.132 296.26 296.27
298 1 298 298 0 NA 298 298
300 12 300.19 300.61 0.42 0.191 300.38 300.27
304 21 304.14 304.75 0.61 0.158 304.39 304.38
308 10 308.32 308.75 0.43 0.154 308.51 308.48
313 22 312.38 312.95 0.57 0.137 312.66 312.64
317 4 316.64 317.09 0.45 0.206 316.79 316.72
321 3 320.88 321.06 0.18 0.092 320.98 321
325 6 325.15 325.59 0.44 0.182 325.34 325.31
329 2 329.29 329.3 0.01 0.007 329.3 329.3
330 14 329.22 329.71 0.49 0.166 329.52 329.54
334 7 333.29 333.71 0.42 0.141 333.53 333.56


Generate cumulative plot for Ssa407UoS

res <- allCum(DB_orig, "Ssa407UoS", limit = 0.75)
print(res$plt)


Extract Samples with alleles that occur in fewer than three individuals

tab <- getLowFreq(DB_orig, "Ssa407UoS", 3, limit = 0.75)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
BRD042010 294535 203.2 all
RMN103104 296125 203.5 all
GGR100909 298096 212.1 all
GGR100905 298090 212.2 all
BLD101302 296139 212.2 all
RBW053904 290140 220 all
RBW051503 290709 220 all
RMN13001 299880 227.9 all
LCH100104 298396 228.3 all
LCH100105 298398 228.3 all
UBN140121 298945 285.8 all
UBN140125 298953 285.8 all
UBN140110 298925 285.9 all
BCC100119 296983 294.5 all
UBN140097 298900 298 all
RMN13191 300329 320.9 all
ART040205 294623 321 all
BRD040723 299005 321.1 all


  • BRD042010 = Peak is legit.
  • RMN103104 = Peak is legit.
  • GGR100909 = Peak is legit.
  • GGR100905 = Peak is legit.
  • BLD101302 = Peak is legit.
  • RBW051503 = Peak is legit.
  • RBW053904 = Peak is legit.
  • RMN13001 = Peak is legit.
  • LCH100104 = Peak is legit.
  • LCH100105 = Peak is legit.
  • UBN140121 = Peak is legit.
  • UBN140125 = Peak is legit.
  • UBN140110 = Peak is legit.
  • BCC100119 = Peak is legit.
  • UBN140097 = Peak is legit.
  • RMN13191 = Peak is legit.
  • ART040205 = Peak is legit.
  • BRD040723 = Peak is legit.
  • BRD100708 = Peak is legit.
  • LSN100401 = Peak is legit.

DONE!

SsaD48

Calculate bin statistics for SsaD48

dat <- BinStats(DB_orig, "SsaD48")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
193 1 192.87 192.87 0 NA 192.87 192.87
197 2 196.95 197.01 0.06 0.042 196.98 196.98
201 4 200.81 200.88 0.07 0.038 200.85 200.85
207 1 207.47 207.47 0 NA 207.47 207.47
240 8 239.52 239.71 0.19 0.087 239.62 239.62
287 1 287.37 287.37 0 NA 287.37 287.37
306 4 306.22 306.42 0.2 0.088 306.31 306.31
310 1 310.2 310.2 0 NA 310.2 310.2
318 1 318.48 318.48 0 NA 318.48 318.48
332 1 331.58 331.58 0 NA 331.58 331.58
335 18 334.98 335.78 0.8 0.268 335.42 335.46
339 1 338.85 338.85 0 NA 338.85 338.85
350 2 349.78 350.21 0.43 0.304 350 350
354 28 353.6 354.08 0.48 0.139 353.82 353.81
356 19 354.87 355.92 1.05 0.278 355.57 355.61
358 3 357.91 358.4 0.49 0.265 358.1 357.98
362 1 361.8 361.8 0 NA 361.8 361.8
363 12 362.29 362.89 0.6 0.205 362.58 362.57
367 7 366.63 367.08 0.45 0.166 366.85 366.9
370 17 369.94 370.13 0.19 0.049 370.06 370.08
371 147 370.17 371.59 1.42 0.267 370.61 370.62
372 1 371.99 371.99 0 NA 371.99 371.99
374 2 373.74 373.77 0.03 0.021 373.75 373.75
375 8 374.72 375.5 0.78 0.278 375.11 375.08
376 5 375.77 375.93 0.16 0.068 375.86 375.84
378 7 377.58 378.08 0.5 0.167 377.84 377.85
379 26 378.25 379.83 1.58 0.352 378.81 378.86
382 36 381.64 382 0.36 0.089 381.81 381.8
383 243 382.02 383.8 1.78 0.359 382.8 382.81
384 20 383.82 384.02 0.2 0.058 383.92 383.92
385 2 385.02 385.11 0.09 0.064 385.06 385.06
386 7 385.28 385.86 0.58 0.255 385.61 385.75
387 114 385.92 387.84 1.92 0.316 386.67 386.69
388 3 388.02 388.09 0.07 0.036 388.05 388.04
390 57 389.15 390.66 1.51 0.363 389.91 389.91
391 8 390.84 391.5 0.66 0.226 391.1 391.06
392 5 391.67 392.09 0.42 0.159 391.89 391.88
393 1 392.83 392.83 0 NA 392.83 392.83
394 125 393.51 394.67 1.16 0.248 394.34 394.43
395 54 394.7 395.58 0.88 0.278 394.98 394.88
396 26 395.68 396.36 0.68 0.182 395.95 395.94
398 7 397.41 398.01 0.6 0.198 397.81 397.89
399 39 398.11 399.1 0.99 0.28 398.7 398.71
400 69 399.12 400.36 1.24 0.227 399.72 399.74
402 2 401.38 401.5 0.12 0.085 401.44 401.44
403 112 402.11 403.54 1.43 0.326 403.04 403.07
404 85 403.56 404.23 0.67 0.188 403.84 403.77
406 9 405.11 406 0.89 0.342 405.74 405.94
407 126 406.08 407.71 1.63 0.404 407.19 407.3
408 15 407.72 408.22 0.5 0.122 407.81 407.8
409 26 408.53 409.38 0.85 0.295 409.11 409.24
410 15 409.68 410.01 0.33 0.114 409.88 409.93
411 246 410.05 411.8 1.75 0.439 411.09 411.12
412 3 412.23 412.52 0.29 0.145 412.38 412.38
414 133 413.23 414.47 1.24 0.267 414.13 414.19
415 249 414.51 415.83 1.32 0.393 415.15 415.23
416 25 415.88 416.95 1.07 0.303 416.23 416.17
418 30 417.54 417.96 0.42 0.098 417.86 417.9
419 418 418 419.66 1.66 0.428 418.87 418.82
420 93 419.73 420.98 1.25 0.344 420.09 419.91
421 2 421.11 421.18 0.07 0.049 421.14 421.14
422 7 421.49 421.95 0.46 0.178 421.77 421.84
423 209 421.98 423.46 1.48 0.418 422.86 422.79
424 243 423.47 425.01 1.54 0.417 423.99 423.91
425 6 425.12 425.29 0.17 0.071 425.18 425.15
426 35 425.38 426.11 0.73 0.199 425.91 425.99
427 260 426.13 427.51 1.38 0.368 426.77 426.72
428 142 427.52 428.94 1.42 0.348 427.98 427.96
429 2 429.19 429.19 0 0 429.19 429.19
430 43 429.3 430.06 0.76 0.248 429.77 429.88
431 425 430.08 431.56 1.48 0.305 430.7 430.72
432 236 431.59 433.05 1.46 0.297 432.16 432.1
433 4 433.12 433.31 0.19 0.086 433.19 433.15
434 15 433.42 433.96 0.54 0.16 433.82 433.88
435 348 434.01 435.53 1.52 0.335 434.67 434.68
436 230 435.54 436.98 1.44 0.395 436.23 436.25
437 6 437.02 437.29 0.27 0.107 437.16 437.16
438 118 437.59 438.62 1.03 0.252 438.28 438.36
439 61 438.63 439.33 0.7 0.176 438.82 438.78
440 246 439.38 440.8 1.42 0.366 440.21 440.26
441 45 440.81 441.44 0.63 0.172 440.98 440.92
442 293 441.54 442.92 1.38 0.275 442.41 442.42
443 31 442.93 443.49 0.56 0.205 443.26 443.32
444 147 443.5 444.94 1.44 0.354 444.27 444.25
445 15 444.98 445.16 0.18 0.06 445.03 445.01
446 4 445.84 445.85 0.01 0.006 445.85 445.85
447 150 446.03 447.41 1.38 0.26 446.61 446.59
448 149 447.48 448.76 1.28 0.332 448.25 448.33
449 36 448.77 449.31 0.54 0.151 448.97 448.95
451 77 449.98 451.13 1.15 0.229 450.61 450.63
452 64 451.42 452.2 0.78 0.219 451.89 451.96
453 194 452.21 453.73 1.52 0.328 452.81 452.83
454 6 453.81 453.95 0.14 0.066 453.9 453.93
455 119 454.02 455.12 1.1 0.239 454.59 454.63
456 29 455.51 455.98 0.47 0.122 455.82 455.83
457 165 455.99 457.54 1.55 0.334 456.71 456.71
459 227 457.81 459.41 1.6 0.229 458.62 458.6
460 111 459.5 461.07 1.57 0.385 460.29 460.35
461 12 461.15 461.45 0.3 0.088 461.24 461.21
462 3 461.82 461.95 0.13 0.067 461.89 461.91
463 128 462.08 463.37 1.29 0.277 462.66 462.69
464 141 463.49 465.11 1.62 0.346 464.15 464.18
465 5 465.18 465.34 0.16 0.071 465.24 465.2
466 5 465.6 465.92 0.32 0.124 465.75 465.76
467 119 466.04 467.36 1.32 0.284 466.63 466.64
468 180 467.37 469.12 1.75 0.372 468.15 468.21
469 6 469.2 469.43 0.23 0.087 469.3 469.32
470 14 469.94 470.22 0.28 0.094 470.07 470.04
471 33 470.28 471.05 0.77 0.24 470.7 470.72
472 177 471.14 473.07 1.93 0.419 472.18 472.16
473 11 473.1 473.39 0.29 0.095 473.28 473.27
474 12 473.87 474.11 0.24 0.067 473.99 473.99
475 116 474.18 475.43 1.25 0.296 474.66 474.6
476 86 475.45 477.01 1.56 0.351 476.14 476.13
477 5 477.21 477.29 0.08 0.031 477.24 477.23
478 4 477.39 477.93 0.54 0.259 477.69 477.73
479 111 477.98 479.27 1.29 0.291 478.57 478.61
480 146 479.4 480.65 1.25 0.317 480.1 480.1
481 80 480.72 481.94 1.22 0.256 481.1 481.09
482 5 482 482.05 0.05 0.018 482.02 482.02
483 47 482.24 483.13 0.89 0.248 482.74 482.75
484 227 483.16 484.95 1.79 0.357 484.09 484.1
485 34 485.02 485.47 0.45 0.101 485.21 485.21
486 1 485.96 485.96 0 NA 485.96 485.96
487 108 486.11 487.32 1.21 0.27 486.87 486.9
488 151 487.45 488.91 1.46 0.303 488.2 488.16
489 36 488.98 489.89 0.91 0.208 489.24 489.18
491 53 490.27 491.27 1 0.256 490.82 490.78
492 113 491.47 492.74 1.27 0.286 492.25 492.28
493 56 492.78 493.5 0.72 0.197 493.07 493.08
495 64 494.09 495.32 1.23 0.292 494.91 494.94
496 114 495.34 496.48 1.14 0.243 496.07 496.08
497 142 496.49 497.98 1.49 0.266 497.04 497.01
498 1 498.2 498.2 0 NA 498.2 498.2
499 23 498.28 499.27 0.99 0.315 498.85 498.94
500 100 499.34 500.84 1.5 0.371 500.13 500.1
501 17 500.92 501.66 0.74 0.205 501.16 501.13
502 1 501.9 501.9 0 NA 501.9 501.9
503 43 502.14 503.7 1.56 0.355 503.1 503.15
504 28 503.78 504.93 1.15 0.303 504.2 504.24
505 1 505.16 505.16 0 NA 505.16 505.16
506 8 505.55 506.35 0.8 0.291 506.02 506.12
507 12 506.4 507.32 0.92 0.256 507.05 507.1
508 32 507.78 509 1.22 0.263 508.29 508.23
511 49 510.36 511.53 1.17 0.292 511.09 511.07
512 52 511.58 512.87 1.29 0.282 512.16 512.12
513 4 513.02 513.25 0.23 0.103 513.1 513.06
515 42 514.1 515.35 1.25 0.395 514.92 515.14
516 96 515.52 517 1.48 0.29 516.29 516.26
517 3 517.24 517.62 0.38 0.209 517.38 517.28
519 91 518.34 519.8 1.46 0.357 519.2 519.21
520 17 520 520.58 0.58 0.173 520.3 520.35
521 20 520.63 521.43 0.8 0.204 520.88 520.83
522 1 521.91 521.91 0 NA 521.91 521.91
523 78 522.13 523.81 1.68 0.395 523.16 523.1
524 8 524.08 524.46 0.38 0.134 524.22 524.16
525 13 524.7 525.4 0.7 0.183 524.95 524.95
527 32 526.27 527.18 0.91 0.263 526.89 526.91
528 111 527.21 528.68 1.47 0.316 527.66 527.63
529 4 528.9 529.02 0.12 0.051 528.95 528.94
531 42 530.34 531.81 1.47 0.378 531.16 531.17
532 13 531.96 532.73 0.77 0.246 532.25 532.16
534 2 533.51 533.53 0.02 0.014 533.52 533.52
535 40 534.41 535.67 1.26 0.326 535.16 535.17
537 1 536.68 536.68 0 NA 536.68 536.68
539 110 538.19 540 1.81 0.353 539.11 539.13
542 1 542.17 542.17 0 NA 542.17 542.17
543 48 542.29 543.52 1.23 0.401 543.02 543.1
545 3 544.42 544.57 0.15 0.076 544.5 544.52
546 11 546.26 546.41 0.15 0.05 546.34 546.36
547 50 546.44 547.92 1.48 0.316 547.21 547.28
548 2 548.47 548.49 0.02 0.014 548.48 548.48
551 96 550.15 551.71 1.56 0.395 551.01 551.08
553 1 552.67 552.67 0 NA 552.67 552.67
555 118 554.14 556.18 2.04 0.411 555.03 555.04
556 1 556.3 556.3 0 NA 556.3 556.3
559 69 558.12 559.7 1.58 0.428 558.97 558.98
563 56 562.1 563.66 1.56 0.384 563.14 563.28
567 111 566.09 567.76 1.67 0.341 567.09 567.13
571 75 570.16 571.72 1.56 0.39 571.16 571.19
575 29 574.32 575.76 1.44 0.318 575.04 575.07
578 5 577.85 578.28 0.43 0.157 578.03 578
579 5 578.99 579.38 0.39 0.18 579.22 579.27
582 2 582.02 582.12 0.1 0.071 582.07 582.07
583 37 582.31 583.87 1.56 0.406 583.22 583.19
587 2 587.22 587.27 0.05 0.035 587.25 587.25
591 1 591.13 591.13 0 NA 591.13 591.13
599 4 598.56 599.39 0.83 0.412 599.18 599.38
600 5 599.58 600.42 0.84 0.374 599.94 599.9
605 2 605.3 605.53 0.23 0.163 605.41 605.41
614 7 614 614.57 0.57 0.185 614.35 614.31
615 5 614.97 615.57 0.6 0.229 615.19 615.15
616 1 616.32 616.32 0 NA 616.32 616.32
618 1 618.47 618.47 0 NA 618.47 618.47
620 7 619.28 620.1 0.82 0.315 619.82 620
622 2 621.6 622.43 0.83 0.587 622.01 622.01
624 2 623.39 624.04 0.65 0.46 623.71 623.71
627 2 626.18 626.9 0.72 0.509 626.54 626.54
628 3 627.21 627.65 0.44 0.246 627.49 627.62
632 3 631.6 631.69 0.09 0.046 631.65 631.66
634 1 633.51 633.51 0 NA 633.51 633.51
639 1 639.18 639.18 0 NA 639.18 639.18


Generate cumulative plot for SsaD48

res <- allCum(DB_orig, "SsaD48")
print(res$plt)


Extract samples carrying alleles observed three or fewer times

tab <- getLowFreq(DB_orig, "SsaD48", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
GRB100104 309997 192.9 all
KEL101202 306756 196.9 all
CWT100103 309970 197 all
SXM13216 311629 207.5 all
RBW040504 301182 287.4 all
OOW040102 301943 310.2 all
RBW053907 301916 318.5 all
KEL050512 303785 331.6 all
BLD13033 312074 338.9 all
RBW041511 301372 349.8 all
SXM13002 311142 350.2 all
UBN140115 310570 357.9 all
LND14002 312301 358 all
CGN100801 309023 358.4 all
RMN13232 312020 361.8 all
BLD102304 307725 372 all
RBW053803 301070 373.7 all
FDB040209 302150 373.8 all
CLG050404 303176 385 all
CLG100504 307308 385.1 all
BLD100503 307521 388 all
BLD100509 307533 388 all
BLD100506 307527 388.1 all
KEL040311 305091 392.8 all
SXM13216 311630 401.4 all
LSN101204 309295 401.5 all
CLM050912 302909 412.2 all
RBW043207 301523 412.4 all
CLM100308 307068 412.5 all
SXM102403 308315 421.1 all
CLG050702 303215 421.2 all
BMT100203 309748 429.2 all
BYN100202 309766 429.2 all
BLD13071 312148 461.8 all
UBN140124 310587 461.9 all
UBN140086 310520 461.9 all
BLD13017 312052 486 all
UBN140113 310567 498.2 all
SXM13097 311320 501.9 all
GVY100304 309036 505.2 all
KEL042115 305553 517.2 all
RBW104301 309130 517.3 all
GVY100404 309044 517.6 all
SXM13001 311141 521.9 all
DGH100302 307834 533.5 all
DGH100307 307844 533.5 all
IOM120305 308787 536.7 all
RMN13152 311869 542.2 all
GVY101002 309064 544.4 all
GVY100402 309040 544.5 all
SXM100965 308121 544.6 all
GVY101004 309068 548.5 all
GVY100904 309058 548.5 all
SXM100704 307984 552.7 all
SKW040221 306064 556.3 all
UBN140069 310488 582 all
UBN140016 310391 582.1 all
CLG050107 303139 587.2 all
KEL040708 305249 587.3 all
PRB040228 306424 591.1 all
LND14048 312384 605.3 all
KEL051720 304336 605.5 all
BKB040223 310075 616.3 all
RBW102304 308944 618.5 all
RMN13140 311845 621.6 all
RMN13084 311739 622.4 all
CLG050115 303150 623.4 all
BKB040206 310042 624 all
KEL101908 306912 626.2 all
BRD040706 310611 626.9 all
BRD051508 302646 627.2 all
CWB110204 309961 627.6 all
BKB040227 310083 627.6 all
KGR10030102 309723 631.6 all
KGR10030101 309721 631.7 all
SHK100602 309536 631.7 all
KEL042410 305565 633.5 all
BLD101501 307681 639.2 all
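The `getLowFreq` call above returns one row per fragment that falls in a rarely observed bin. The same idea can be sketched in base R. This sketch assumes a data frame with the hypothetical columns `Sample`, `Reading` and `Bin`, which is not necessarily MsatAllele's internal layout. It counts observations per bin and keeps the rows whose bin was seen `max_n` times or fewer.

```r
# Hypothetical helper (not part of MsatAllele): flag fragments in rarely observed bins.
# Assumes columns Sample, Reading (fragment size in bp) and Bin (assigned bin label).
low_freq_rows <- function(frags, max_n = 3) {
  counts <- table(frags$Bin)                        # number of observations per bin
  rare_bins <- names(counts)[counts <= max_n]       # bins seen max_n times or fewer
  keep <- as.character(frags$Bin) %in% rare_bins
  out <- frags[keep, c("Sample", "Reading", "Bin")]
  out <- out[order(out$Reading), ]                  # sort by fragment size, as in the tables above
  rownames(out) <- NULL
  out
}

# Toy data, made up for illustration (not taken from the LNBT database)
frags <- data.frame(
  Sample  = c("S1", "S2", "S3", "S4", "S5", "S6"),
  Reading = c(109.9, 110.0, 110.1, 109.95, 116.0, 117.9),
  Bin     = c(110, 110, 110, 110, 116, 118),
  stringsAsFactors = FALSE
)
low_freq_rows(frags)
```

Only the two fragments in the singleton bins (116 and 118) are returned. Those are the peaks that then need checking by eye.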


CA054565a

Calculate bin statistics for CA054565a

dat <- BinStats(DB_orig, "CA054565a")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
101 3 101.46 101.68 0.22 0.124 101.54 101.47
106 17 105.7 106.11 0.41 0.117 105.9 105.9
110 6117 109.36 110.26 0.9 0.122 109.95 109.96
112 64 111.72 112.19 0.47 0.124 111.97 111.97
114 53 113.7 114.09 0.39 0.108 113.9 113.9
116 1 115.95 115.95 0 NA 115.95 115.95
118 2 117.91 117.95 0.04 0.028 117.93 117.93
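`BinStats` summarises the fragment sizes assigned to each bin. Below is a rough base-R sketch of how the columns shown (N, Min, Max, Range, Sd, MEAN, MEDIAN) can be computed. The helper name `bin_summary` and its inputs are hypothetical, and this is not MsatAllele's implementation. In R, `sd()` of a single value returns `NA`. That matches the `NA` entries for singleton bins in the tables above.

```r
# Hypothetical helper: per-bin summary statistics for fragment sizes.
bin_summary <- function(reading, bin) {
  groups <- split(reading, bin)                     # one numeric vector per bin
  stats <- t(sapply(groups, function(x) {
    c(N = length(x), Min = min(x), Max = max(x),
      Range = max(x) - min(x),
      Sd = sd(x),                                   # NA when a bin holds a single reading
      MEAN = mean(x), MEDIAN = median(x))
  }))
  data.frame(Bins = names(groups), round(stats, 4),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Toy readings, illustrative only
reading <- c(101.46, 101.50, 101.68, 115.95)
bin     <- c(101, 101, 101, 116)
bin_summary(reading, bin)
```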


Generate cumulative plot for CA054565a

res <- allCum(DB_orig, "CA054565a")
print(res$plt)
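`allCum` plots the cumulative distribution of fragment sizes for a locus. A minimal base-graphics sketch of that kind of plot follows. It sorts the readings and draws each one against its cumulative proportion, with size on the y axis. Each allele then appears as a near-flat run, and the jumps between runs mark candidate bin limits. The data are simulated to loosely mimic the CA054565a bins 110, 112 and 114. The helper `cum_sizes` and the plotting choices are assumptions, not what `allCum` does internally.

```r
# Hypothetical helper: cumulative fragment-size data for plotting.
cum_sizes <- function(reading) {
  size <- sort(reading)                             # fragment sizes in ascending order
  prop <- seq_along(size) / length(size)            # cumulative proportion of fragments
  data.frame(prop = prop, size = size)
}

set.seed(1)
# Simulated readings loosely based on the CA054565a bin means and SDs
reading <- c(rnorm(50, 109.95, 0.12), rnorm(10, 111.97, 0.12), rnorm(8, 113.9, 0.11))
cs <- cum_sizes(reading)
plot(cs$prop, cs$size, type = "s",
     xlab = "Cumulative proportion of fragments", ylab = "Fragment size (bp)")
```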


Extract samples carrying alleles observed three or fewer times

tab <- getLowFreq(DB_orig, "CA054565a", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
CLG040116 314955 101.5 all
LND14055 318724 101.5 all
BCC100124 316645 101.7 all
KEL050512 314103 116 all
KEL050641 314160 117.9 all
RBW040504 312707 118 all


  • CLG040116 = Peak is legit.
  • LND14055 = Peak is legit.
  • BCC100124 = Peak is legit.
  • KEL050512 = Peak is legit.
  • KEL050641 = Peak is legit.
  • RBW040504 = Peak is legit.

DONE!

CA054565b

Calculate bin statistics for CA054565b

dat <- BinStats(DB_orig, "CA054565b")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
126 2 125.57 125.63 0.06 0.042 125.6 125.6
129 129 129.2 129.73 0.53 0.16 129.46 129.43
131 1 131.28 131.28 0 NA 131.28 131.28
133 4 133.38 133.53 0.15 0.065 133.44 133.43
135 5113 134.82 135.53 0.71 0.116 135.26 135.27
137 25 136.98 137.35 0.37 0.113 137.2 137.22
139 2448 138.69 139.38 0.69 0.117 139.16 139.16
141 174 140.89 141.32 0.43 0.081 141.15 141.14
159 1 159.15 159.15 0 NA 159.15 159.15
161 145 160.95 161.47 0.52 0.108 161.25 161.26
163 3 163 163.32 0.32 0.161 163.17 163.19
164 1 163.89 163.89 0 NA 163.89 163.89
165 2 165.15 165.16 0.01 0.007 165.16 165.16
171 11 171.07 171.4 0.33 0.095 171.19 171.18


Generate cumulative plot for CA054565b

res <- allCum(DB_orig, "CA054565b")
print(res$plt)


Extract samples carrying alleles observed three or fewer times

tab <- getLowFreq(DB_orig, "CA054565b", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
CLG051706 320546 125.6 all
CLG051907 320579 125.6 all
BRD040706 325400 131.3 all
KEL051527 321081 159.2 all
KEL051502 321043 163 all
KEL041307 321902 163.2 all
CLG040311 325634 163.3 all
GGR100804 324738 163.9 all
CLG051013 320456 165.2 all
KEL041310 321906 165.2 all


  • CLG051706 = Peak is legit.
  • CLG051907 = Peak is legit.
  • BRD040706 = Peak is legit.
  • KEL051527 = Peak is legit.
  • KEL051502 = Peak is legit.
  • KEL041307 = Peak is legit.
  • CLG040311 = Peak is legit.
  • GGR100804 = Peak is an artifact. Deleted.
  • CLG051013 = Peak is legit.
  • KEL041310 = Peak is legit.
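After each rare peak has been inspected, the outcome (legit or artifact) has to be carried back to the fragment data. The sketch below records the decisions in a small table and drops fragments judged to be artifacts. The samples and readings come from the CA054565b table above. The column names and the merge-based approach are hypothetical, and this is not necessarily how the deletions noted above were carried out.

```r
# Hypothetical bookkeeping: remove fragments whose peaks were judged artifacts.
frags <- data.frame(
  Sample  = c("CLG051706", "GGR100804", "CLG051013"),
  Reading = c(125.6, 163.9, 165.2),
  stringsAsFactors = FALSE
)
decisions <- data.frame(
  Sample   = c("CLG051706", "GGR100804", "CLG051013"),
  Reading  = c(125.6, 163.9, 165.2),
  Decision = c("legit", "artifact", "legit"),
  stringsAsFactors = FALSE
)

# Join on Sample and Reading (readings are rounded to 0.1 bp in both tables, so they match exactly).
# Fragments with no recorded decision are kept.
merged  <- merge(frags, decisions, by = c("Sample", "Reading"), all.x = TRUE)
cleaned <- merged[is.na(merged$Decision) | merged$Decision != "artifact",
                  c("Sample", "Reading")]
rownames(cleaned) <- NULL
cleaned
```

Matching on floating-point readings only works because both tables carry identically rounded values. A fragment or row ID would be a sturdier key.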

DONE!

One101

Calculate bin statistics for One101

dat <- BinStats(DB_orig, "One101")
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
164 199 163.32 164 0.68 0.193 163.73 163.78
168 958 167.18 167.98 0.8 0.18 167.67 167.69
172 2004 170.97 172.01 1.04 0.181 171.67 171.68
176 4664 174.85 176.06 1.21 0.169 175.66 175.69
178 2 177.47 177.87 0.4 0.283 177.67 177.67
180 1645 179.02 180.09 1.07 0.171 179.63 179.63
184 126 183.27 183.98 0.71 0.194 183.69 183.75
192 19 191.41 191.95 0.54 0.197 191.7 191.68


Generate cumulative plot for One101

res <- allCum(DB_orig, "One101")
print(res$plt)


Extract samples carrying alleles observed three or fewer times

tab <- getLowFreq(DB_orig, "One101", 3)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
BLD101405 332309 177.5 all
BLD101602 332320 177.9 all


  • BLD101405 = Peak is an artifact. Fixed.
  • BLD101602 = Peak is an artifact. Fixed.

DONE!

CA060177

Calculate bin statistics for CA060177

dat <- BinStats(DB_orig, "CA060177", limit = 0.35)
dat2 <- apply(dat[,-1], 2, function(x){
  return(round(as.numeric(unlist(x)), 4))
})
tab <- as.data.frame(cbind(rownames(dat), dat2))
colnames(tab) <- colnames(dat)
pander::pandoc.table(tab)
Bins N Min Max Range Sd MEAN MEDIAN
248 7 247.43 247.83 0.4 0.139 247.74 247.8
252 566 251.33 251.96 0.63 0.142 251.72 251.73
253 90 252.3 252.91 0.61 0.134 252.67 252.7
256 1 255.9 255.9 0 NA 255.9 255.9
260 2637 259.34 260 0.66 0.136 259.77 259.82
264 1439 263.4 264.13 0.73 0.142 263.83 263.84
268 3065 267.47 268.28 0.81 0.139 267.9 267.92
272 582 271.61 272.3 0.69 0.151 272.01 272.03
276 376 275.63 276.37 0.74 0.179 276.09 276.13
277 5 276.91 277.15 0.24 0.105 277.05 277.09
280 313 279.72 280.45 0.73 0.165 280.21 280.26
284 878 283.79 284.54 0.75 0.15 284.24 284.26
288 772 287.88 288.57 0.69 0.133 288.3 288.31
292 82 291.96 292.58 0.62 0.145 292.33 292.34
296 91 296.1 296.68 0.58 0.129 296.42 296.41
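The 252 and 253 bins in this table are only 1 bp apart, so nearest-integer rounding would misplace readings: `round(252.3)` gives 252, yet 252.3 is the lower end of the 253 bin. Per-bin limits avoid this problem. The sketch below assigns each reading to the bin whose lower and upper limits contain it. The limits are copied from the Min and Max columns above for the 250-260 bp region. The helper `assign_bin` and its behaviour (returning `NA` for readings outside every bin) are assumptions, not MsatAllele's binning routine. In practice the limits would be padded beyond the observed extremes.

```r
# Hypothetical helper: assign readings to bins defined by explicit [lower, upper] limits.
assign_bin <- function(reading, bins) {
  sapply(reading, function(r) {
    hit <- which(r >= bins$lower & r <= bins$upper)
    if (length(hit) == 1) bins$label[hit] else NA_character_  # NA if outside all bins or ambiguous
  })
}

# Limits taken from the Min/Max columns of the CA060177 table (250-260 bp region)
bins <- data.frame(
  label = c("252", "253", "256", "260"),
  lower = c(251.33, 252.30, 255.90, 259.34),
  upper = c(251.96, 252.91, 255.90, 260.00),
  stringsAsFactors = FALSE
)
assign_bin(c(251.7, 252.4, 255.9, 254.0), bins)
# returns "252" "253" "256" NA
```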


Generate cumulative plot for CA060177

res <- allCum(DB_orig, "CA060177", limit = 0.35)
print(res$plt)


Identify problems

  • There appears to be a 1 bp shift at ~253 bp:

      res <- allCum(DB_orig, "CA060177", limit = 0.35, ymin = 250, 
                    ymax = 260)
      print(res$plt)

  • There is another 1 bp shift at ~277 bp:

      res <- allCum(DB_orig, "CA060177", limit = 0.35, ymin = 270, 
                    ymax = 280)
      print(res$plt)
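Most CA060177 bins are spaced 4 bp apart (248, 252, 256, 260, ...), so bins that fall off that ladder quickly point to shifts like the two above. The sketch below assumes a 4 bp repeat unit, as the bin spacing suggests, and uses the lowest bin as the reference phase. Both are assumptions. A flagged bin is only a candidate for inspection: it could be a genuine allele with an indel in the flanking region rather than a sizing problem. The helper `off_ladder` is hypothetical.

```r
# Hypothetical helper: flag bins whose distance from a reference bin
# is not a whole number of repeat units.
off_ladder <- function(bins, repeat_unit = 4, ref = min(bins)) {
  bins[(bins - ref) %% repeat_unit != 0]
}

# Bin labels from the CA060177 table above
ca060177_bins <- c(248, 252, 253, 256, 260, 264, 268, 272, 276, 277,
                   280, 284, 288, 292, 296)
off_ladder(ca060177_bins)
# returns 253 277
```

The same check on the One101 bins flags 178, the bin that held the two artifact peaks noted in that section.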

Extract samples carrying alleles observed three or fewer times

tab <- getLowFreq(DB_orig, "CA060177", 3, limit = 0.35)
if(!is.null(tab)){
  samp <- as.character(tab$Sample)
  idx <- which(colnames(tab) == "Sample")
  tab <- cbind(samp, tab[,-idx])
  rownames(tab) <- NULL
  pander::pandoc.table(tab)
}
samp DFrow Reading Gel
BKB040205 345008 255.9 all


  • BKB040205 = Peak is legit.

DONE!

Citations