This is the data analysis from 2015-11-16. MDM samples with or without RTX and RAL were sequenced at JHU, -/+ UDG, on a HiSeq in Rapid Mode.

Summary of Samples and read numbers

sample.id  sample.name       cell.type  aligned.reads  hiv       md.read   rmdup.read  discordant.reads  chimeric.reads
JS207      MDM no Virus      MDM        902439         6305      16713     33479       20                0
JS208      MDM day 7         MDM        136756172      22192538  1800863   5405854     87266             27585
JS209      MDM day 14        MDM        111212         55495     4145      5198        76                0
JS210      v:DPI1            MDM        99222555       11213519  2017310   4071983     64154             44017
JS211      v:DPI1:RAL        MDM        14719112       2336993   1590108   2604808     9237              2114
JS212      v:DPI30           MDM        8831602        581421    449434    689063      4435              108
JS213      v:DPI30:RAL       MDM        6876314        1028346   401716    573430      2026              144
JS214      MDM no Virus:UDG  MDM        16691          189       4232      4999        0                 0
JS215      MDM day 7:UDG     MDM        41395460       4656825   6040141   9276311     15870             9116
JS216      MDM day 14:UDG    MDM        72602          6415      51439     53487       18                0
JS217      v:DPI1:UDG        MDM        48085234       1627078   16719928  24108526    8637              3640
JS218      v:DPI1:RAL:UDG    MDM        41539677       424954    19974111  29527517    2094              193
JS219      v:DPI30:UDG       MDM        8702840        90062     4386910   5685454     574               30
JS220      v:DPI30:RAL:UDG   MDM        8284670        170694    4414203   5712800     343               27

Samples from the previous sample set, treated with RTX +/- RAL, were added to this one to get enough DNA for the lockdown protocol. They were mixed with the day 0 (no virus), day 1 and day 14 samples. We see very few reads in day 0 and day 14 (JS207, JS209). For day 0 this makes sense, as there is no virus in this sample and nothing to lock down. For day 14 it is unclear whether the low read count is due to low HIV infection or to a poor library. The library DNA amounts were higher in day 0 and day 14 than in day 7 prior to lockdown. Equal amounts of all libraries were mixed prior to the lockdown protocol.

Percentage of HIV in Sample

We see a high percentage of HIV reads in the day 14 sample. This could be due to the low number of aligned reads in this sample. I checked this number manually and ~50% of the aligned reads were HIV. We also see about 15% in day 7 as well as in the RTX samples. This is much better than the 2% we were seeing in the last sequencing run, indicating that the low percentage was likely due to mixing with HT29 samples, which have a much higher infection rate.
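The percentages quoted here can be rechecked directly from the summary table. The following is a minimal sketch, assuming the `hiv` column is the number of aligned reads that map to HIV; the dictionary layout and the function name `percent_hiv` are illustrative and not part of the original analysis.

```python
# Aligned and HIV read counts copied from the summary table.
read_counts = {
    "JS208": {"aligned": 136756172, "hiv": 22192538},  # MDM day 7
    "JS209": {"aligned": 111212, "hiv": 55495},        # MDM day 14
    "JS210": {"aligned": 99222555, "hiv": 11213519},   # v:DPI1
}


def percent_hiv(aligned, hiv):
    """Percentage of aligned reads that map to HIV (assumed meaning of the hiv column)."""
    return 100.0 * hiv / aligned


hiv_pct = {
    sample_id: percent_hiv(c["aligned"], c["hiv"])
    for sample_id, c in read_counts.items()
}

for sample_id, pct in hiv_pct.items():
    print(f"{sample_id}: {pct:.1f}% HIV")
```

Under that assumption the day 14 sample comes out at roughly half HIV reads, which agrees with the manual check.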

Duplication levels

This data shows the PCR duplication present in each sample. Since we use lockdown for HIV, which is only 9500 bp long, we expect and often see a high level of duplication, especially in day 7 since there are so many HIV reads (22M). In this data between 5% and 80% of our reads are unique. Mark duplicate removes more reads in every sample than remove duplicate does. This is probably due to the program taking into account differences across the whole read, not just the end base.
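The unique-read fraction can be estimated from the summary table. This is a rough sketch, assuming `md.read` and `rmdup.read` are the read counts left after mark duplicate and remove duplicate respectively; that reading of the columns and the names below are assumptions.

```python
# Counts copied from the summary table for one high-input and one low-input sample.
dedup_counts = {
    "JS208": {"aligned": 136756172, "md": 1800863, "rmdup": 5405854},  # MDM day 7
    "JS216": {"aligned": 72602, "md": 51439, "rmdup": 53487},          # MDM day 14:UDG
}


def fraction_retained(kept, aligned):
    """Fraction of aligned reads that survive duplicate handling."""
    return kept / aligned


retained = {
    sample_id: {
        "md": fraction_retained(c["md"], c["aligned"]),
        "rmdup": fraction_retained(c["rmdup"], c["aligned"]),
    }
    for sample_id, c in dedup_counts.items()
}

for sample_id, frac in retained.items():
    print(f"{sample_id}: md {frac['md']:.1%}, rmdup {frac['rmdup']:.1%}")
```

In both samples the `md` fraction is lower than the `rmdup` fraction, consistent with mark duplicate discarding more reads.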

Coverage of HIV no normalization

This coverage pattern looks similar to what we have seen before, with the UDG-treated samples having decreased reads across the HIV genome, indicating that uracil is incorporated throughout. In day 0 and day 14 there aren’t enough HIV reads to clearly see any pattern.

Coverage of HIV (normalized, counts per million)

This normalization looks horrible and is probably not what we want to use.

Coverage of HIV (multiply by %HIV)

This normalization looks much better and makes more sense to me.
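For reference, the two normalizations compared above can be written out as follows. This is only a sketch of one literal reading: counts per million divides per-base depth by the total aligned reads, and "multiply by %HIV" scales depth by the fraction of aligned reads that are HIV. The exact formulas used for the plots are not recorded here, so both functions are assumptions, and the depth values are toy numbers rather than data.

```python
def counts_per_million(depth, aligned_reads):
    """Scale per-base depth to counts per million aligned reads."""
    return [d * 1_000_000 / aligned_reads for d in depth]


def scale_by_hiv_fraction(depth, hiv_reads, aligned_reads):
    """Scale per-base depth by the fraction of aligned reads that are HIV."""
    hiv_fraction = hiv_reads / aligned_reads
    return [d * hiv_fraction for d in depth]


# Toy per-base depths, for illustration only.
toy_depth = [100, 200, 400]

cpm = counts_per_million(toy_depth, aligned_reads=1_000_000)
scaled = scale_by_hiv_fraction(toy_depth, hiv_reads=25, aligned_reads=100)

print(cpm)
print(scaled)
```

Counts per million depends only on total library size, so samples with very few aligned reads get inflated, whereas the second scaling also reflects how much of each library is HIV.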

Integration events

We can see chimeric and discordant reads in day 7, but only discordant reads in day 14. The ability to see discordant reads is always higher, but why we see a lot of discordant reads in day 14 and no chimeras is harder to understand. The high discordant percentage is probably due to normalizing to a small number of reads. I wonder if there are some false positives in the day 14 discordant data.
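The concern about normalizing to a small number of reads can be made concrete with a rough error estimate. The sketch below assumes the denominator is the aligned read count and treats the discordant count as Poisson, so its standard error is the square root of the count; both choices are assumptions and not necessarily how the plotted percentages were computed.

```python
import math


def events_per_million(events, total_reads):
    """Return (rate, standard error) per million reads, assuming Poisson counts."""
    rate = events * 1_000_000 / total_reads
    std_err = math.sqrt(events) * 1_000_000 / total_reads
    return rate, std_err


# Discordant and aligned read counts copied from the summary table.
rate_day7, se_day7 = events_per_million(87266, 136756172)   # JS208, MDM day 7
rate_day14, se_day14 = events_per_million(76, 111212)       # JS209, MDM day 14

print(f"day 7:  {rate_day7:.1f} +/- {se_day7:.1f} per million")
print(f"day 14: {rate_day14:.1f} +/- {se_day14:.1f} per million")
```

With only 76 discordant reads, the relative uncertainty for day 14 is far larger than for day 7, so a handful of false positives would shift its rate noticeably.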

Mutations

Mutations in noRTX
chr pos ref mut depth sample.id
HIV 158 A T DP=298 JS208
HIV 164 GCAA TTAT DP=270 JS208
HIV 184 A C DP=315 JS208
HIV 233 G T DP=252 JS208
HIV 343 G T DP=203 JS208
HIV 381 C T DP=223 JS208
HIV 606 C T DP=6652 JS208
HIV 693 A T DP=10184 JS208
HIV 887 A T DP=3989 JS208
HIV 1143 A T DP=3519 JS208
HIV 1348 A T DP=6182 JS208
HIV 1481 C T DP=5163 JS208
HIV 1552 C T DP=4831 JS208
HIV 1553 C T DP=4857 JS208
HIV 1663 C T DP=4992 JS208
HIV 1772 C T DP=5311 JS208
HIV 2256 C T DP=3948 JS208
HIV 2470 G T DP=2036 JS208
HIV 2562 A C DP=11900 JS208
HIV 2846 G T DP=5947 JS208
HIV 3021 G T DP=3468 JS208
HIV 3244 A C DP=8215 JS208
HIV 3375 G T DP=2555 JS208
HIV 3834 T C DP=3794 JS208
HIV 4464 G T DP=2910 JS208
HIV 4754 A T DP=2384 JS208
HIV 5252 G C DP=9218 JS208
HIV 5328 A T DP=8488 JS208
HIV 5410 A T DP=12908 JS208
HIV 5814 C T DP=8882 JS208
HIV 6084 A T DP=3437 JS208
HIV 7022 G T DP=4115 JS208
HIV 7209 A T DP=8923 JS208
HIV 7450 C T DP=3014 JS208
HIV 7696 A C DP=3870 JS208
HIV 7964 A T DP=10489 JS208
HIV 8109 A C DP=7651 JS208
HIV 8531 G C DP=8854 JS208
HIV 8647 G T DP=6903 JS208
HIV 8691 G T DP=8851 JS208
HIV 8837 G T DP=3479 JS208
HIV 8926 G T DP=19395 JS208
HIV 1757 TCC TC DP=2 JS214
HIV 144 T C DP=104 JS215
HIV 158 A T DP=108 JS215
HIV 164 GCAA TTAT DP=100 JS215
HIV 184 A C DP=113 JS215
HIV 343 G T DP=44 JS215
HIV 381 C T DP=42 JS215
HIV 944 A T DP=1509 JS215
HIV 1314 A C DP=1065 JS215
HIV 1504 A T DP=1475 JS215
HIV 1921 A T DP=1084 JS215
HIV 2827 G T DP=783 JS215
HIV 3202 A T DP=1424 JS215
HIV 3373 A T DP=1793 JS215
HIV 3397 T C DP=3465 JS215
HIV 3489 G C DP=3310 JS215
HIV 3574 A C DP=2473 JS215
HIV 3587 T A DP=3045 JS215
HIV 4018 G T DP=1972 JS215
HIV 4902 G T DP=1530 JS215
HIV 5633 A T DP=2678 JS215
HIV 5730 G T DP=2602 JS215
HIV 5887 A T DP=3159 JS215
HIV 6023 A T DP=2020 JS215
HIV 6347 C T DP=1586 JS215
HIV 6591 G T DP=1741 JS215
HIV 6695 A T DP=1304 JS215
HIV 7669 A T DP=1875 JS215
HIV 8206 A T DP=1832 JS215
HIV 8215 G T DP=2256 JS215
HIV 8408 A C DP=1952 JS215
HIV 8711 G T DP=2130 JS215

Mutations in all samples
chr pos ref mut depth sample.id
HIV 144 T C DP=104 JS215
HIV 158 A T DP=298 JS208
HIV 158 A T DP=43 JS211
HIV 158 A T DP=108 JS215
HIV 164 GCAA TTAT DP=270 JS208
HIV 164 GCAA TTAT DP=36 JS211
HIV 164 GCAA TTAT DP=100 JS215
HIV 183 G A DP=86 JS210
HIV 183 G A DP=43 JS211
HIV 184 A C DP=315 JS208
HIV 184 A C DP=45 JS211
HIV 184 A C DP=18 JS212
HIV 184 A C DP=113 JS215
HIV 196 A C DP=73 JS210
HIV 233 G T DP=252 JS208
HIV 286 G A DP=111 JS210
HIV 343 G T DP=203 JS208
HIV 343 G T DP=164 JS210
HIV 343 G T DP=59 JS211
HIV 343 G T DP=12 JS212
HIV 343 G T DP=22 JS213
HIV 343 G T DP=44 JS215
HIV 343 G T DP=27 JS217
HIV 343 G T DP=17 JS218
HIV 343 G T DP=2 JS220
HIV 375 G T DP=2 JS220
HIV 381 C T DP=223 JS208
HIV 381 C T DP=210 JS210
HIV 381 C T DP=35 JS213
HIV 381 C T DP=42 JS215
HIV 606 C T DP=6652 JS208
HIV 693 A T DP=10184 JS208
HIV 887 A T DP=3989 JS208
HIV 944 A T DP=1509 JS215
HIV 1143 A T DP=3519 JS208
HIV 1314 A C DP=1065 JS215
HIV 1348 A T DP=6182 JS208
HIV 1480 C T DP=1111 JS218
HIV 1481 C T DP=5163 JS208
HIV 1504 A T DP=1475 JS215
HIV 1552 C T DP=4831 JS208
HIV 1553 C T DP=4857 JS208
HIV 1663 C T DP=4992 JS208
HIV 1757 TCC TC DP=2 JS214
HIV 1772 C T DP=5311 JS208
HIV 1921 A T DP=1084 JS215
HIV 2243 C T DP=3925 JS210
HIV 2256 C T DP=3948 JS208
HIV 2470 G T DP=2036 JS208
HIV 2550 C T DP=1376 JS211
HIV 2562 A C DP=11900 JS208
HIV 2827 G T DP=783 JS215
HIV 2846 G T DP=5947 JS208
HIV 3021 G T DP=3468 JS208
HIV 3202 A T DP=1424 JS215
HIV 3244 A C DP=8215 JS208
HIV 3373 A T DP=1793 JS215
HIV 3375 G T DP=2555 JS208
HIV 3397 T C DP=3465 JS215
HIV 3489 G C DP=3310 JS215
HIV 3574 A C DP=2473 JS215
HIV 3587 T A DP=3045 JS215
HIV 3834 T C DP=3794 JS208
HIV 4018 G T DP=1972 JS215
HIV 4135 A G DP=1334 JS210
HIV 4190 AGTA AA DP=1546 JS210
HIV 4201 T A DP=1901 JS210
HIV 4237 A G DP=504 JS218
HIV 4343 C T DP=1752 JS210
HIV 4420 A C DP=3080 JS210
HIV 4464 G T DP=2910 JS208
HIV 4754 A T DP=2384 JS208
HIV 4768 T A DP=731 JS210
HIV 4859 T G DP=2706 JS210
HIV 4866 T A DP=2795 JS210
HIV 4902 G T DP=1530 JS215
HIV 4958 G T DP=1510 JS217
HIV 5252 G C DP=9218 JS208
HIV 5328 A T DP=8488 JS208
HIV 5410 A T DP=12908 JS208
HIV 5530 C G DP=2544 JS210
HIV 5633 A T DP=2678 JS215
HIV 5724 G T DP=5657 JS210
HIV 5730 G T DP=2602 JS215
HIV 5802 G A DP=1242 JS217
HIV 5814 C T DP=8882 JS208
HIV 5887 A T DP=3159 JS215
HIV 6023 A T DP=2020 JS215
HIV 6084 A T DP=3437 JS208
HIV 6233 GAG GG DP=2476 JS210
HIV 6347 C T DP=1586 JS215
HIV 6591 G T DP=1741 JS215
HIV 6695 A T DP=1304 JS215
HIV 6750 T A DP=2027 JS210
HIV 6755 G A DP=2020 JS210
HIV 7022 G T DP=4115 JS208
HIV 7209 A T DP=8923 JS208
HIV 7450 C T DP=3014 JS208
HIV 7669 A T DP=1875 JS215
HIV 7696 A C DP=3870 JS208
HIV 7964 A T DP=10489 JS208
HIV 8109 A C DP=7651 JS208
HIV 8206 A T DP=1832 JS215
HIV 8215 G T DP=2256 JS215
HIV 8408 A C DP=1952 JS215
HIV 8531 G C DP=8854 JS208
HIV 8647 G T DP=6903 JS208
HIV 8691 G T DP=8851 JS208
HIV 8711 G T DP=2130 JS215
HIV 8837 G T DP=3479 JS208
HIV 8926 G T DP=19395 JS208
HIV 9239 G T DP=2335 JS218

I see many mutations in the day 7 sample, which has a lot of HIV in it. These samples have some level of hUNG activity, so mutations are not out of the question here. I used the rmdup sample here, so PCR duplication shouldn’t be playing a major role. These mutations are at high depth and are mostly changes to T. Does this make sense with a post-replicative repair mechanism? The mutations seem relatively evenly distributed across the HIV genome, but I might want to look into the HIV genes. There is currently no way in this sample to tell if the same mutation is present on both strands.
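The "mostly changing to T" observation can be tallied from the mutation table. The following is a minimal sketch that counts single-base substitution types from rows in the table's format; it uses only the first eight JS208 rows as an example, and the function name `substitution_spectrum` is illustrative.

```python
from collections import Counter

# First eight JS208 rows copied from the noRTX mutation table (a subset, not the full list).
rows = """\
HIV 158 A T DP=298 JS208
HIV 164 GCAA TTAT DP=270 JS208
HIV 184 A C DP=315 JS208
HIV 233 G T DP=252 JS208
HIV 343 G T DP=203 JS208
HIV 381 C T DP=223 JS208
HIV 606 C T DP=6652 JS208
HIV 693 A T DP=10184 JS208
""".splitlines()


def substitution_spectrum(lines):
    """Count ref>mut pairs for single-base substitutions, skipping multi-base calls."""
    spectrum = Counter()
    for line in lines:
        chrom, pos, ref, mut, depth, sample_id = line.split()
        if len(ref) == 1 and len(mut) == 1:
            spectrum[f"{ref}>{mut}"] += 1
    return spectrum


spectrum = substitution_spectrum(rows)
to_t = sum(n for change, n in spectrum.items() if change.endswith(">T"))

print(dict(spectrum))
print(f"{to_t} of {sum(spectrum.values())} single-base changes are to T")
```

Running the same tally per sample over the full table would show whether the bias toward T holds outside day 7.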

Intersection with segmentation annotations (right norm)

This data shows the enrichment of the 2 read types in each of 20 segmentation classes. The chimeric data is going to be more accurate here because we know the exact base where the integration is happening. In the discordant data we only know the area of the integration event, so the data is looking for overlap with a large region (1000 bp). We are seeing a mild enrichment in elonW, gene bodies and enh regions, so generally open chromatin, which makes some sense. We do see potential differences between +/- UDG, indicating that uracil may be inhibiting integration in Elon and Enh and enhancing integration into TSS (although no TSS in JS215 strikes me as strange).
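The difference in resolution between the two read types can be illustrated with a small interval-overlap sketch. It assumes half-open segment intervals and a 1000 bp window centred on the discordant position; the segment coordinates are toy values for illustration only, and `classes_overlapping` is a hypothetical helper, not the tool used for the actual intersection.

```python
def classes_overlapping(start, end, segments):
    """Return the set of segment classes that overlap the half-open interval [start, end)."""
    return {
        label
        for seg_start, seg_end, label in segments
        if seg_start < end and start < seg_end
    }


# Toy segmentation: (start, end, class). Not real annotation coordinates.
toy_segments = [
    (0, 500, "TSS"),
    (500, 3000, "Elon"),
    (3000, 3500, "Enh"),
]

site = 2900

# Chimeric read: the integration base is known, so query a single position.
chimeric_hits = classes_overlapping(site, site + 1, toy_segments)

# Discordant read: only the neighbourhood is known, so query a 1000 bp window.
window = 1000
discordant_hits = classes_overlapping(site - window // 2, site + window // 2, toy_segments)

print(chimeric_hits)
print(discordant_hits)
```

Near a segment boundary the discordant window is assigned to more than one class, which is one reason the chimeric enrichment should be the more reliable of the two.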

Intersection with segmentation annotations (norm to segment number)