tricky plots!

library(ggplot2)

Referring to: http://sape.inf.usi.ch/quick-reference/ggplot2/geom_rect

We want relative range on the x axis and reads on the y axis. Let's try with just one category first, because there is A LOT of data.

geom_rect requires a continuous scale on the y axis, but read count is not continuous. Trying geom_errorbarh instead:

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = rel_range_graf, mapping = aes(xmin=rel_start, xmax=rel_end, y=log_reads, color=learning))

Fig 1. All uniquely mapping reads across all three experiments, using a log scale. It is rather overwhelming.

Plotting inclusion level for the same data. I think it is less clear than the log scale.

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = rel_range_graf, mapping = aes(xmin=rel_start, xmax=rel_end, y=inc_lvl, color=learning))

Fig 2. All uniquely mapping reads across all three experiments, using inclusion level as the scale. It is also rather overwhelming.

Giving each experiment its own graph to make things look less busy.

graf_3962 <- rel_range_graf[rel_range_graf$experi == 3962,]

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = graf_3962, mapping = aes(xmin=rel_start, xmax=rel_end, y=log_reads, color=learning))

Fig 3. Experiment 3962, log scale, all reads

graf_4024 <- rel_range_graf[rel_range_graf$experi == 4024,]

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = graf_4024, mapping = aes(xmin=rel_start, xmax=rel_end, y=log_reads, color=learning))

Fig 4. Experiment 4024, log scale, all reads

graf_4049 <- rel_range_graf[rel_range_graf$experi == 4049,]

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = graf_4049, mapping = aes(xmin=rel_start, xmax=rel_end, y=log_reads, color=learning))

Fig 5. Experiment 4049, log scale, all reads
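
The three per-experiment plots above could also be drawn in one call by faceting on the experiment column. A minimal sketch, assuming ggplot2 is installed and using a small made-up data frame (`toy`) with the same column names as `rel_range_graf`; all values, including the `learning` labels, are invented for illustration only:

```r
library(ggplot2)

# Hypothetical toy data standing in for rel_range_graf (values are made up)
toy <- data.frame(
  rel_start = c(0.10, 0.20, 0.15, 0.40, 0.05, 0.30),
  rel_end   = c(0.50, 0.60, 0.35, 0.90, 0.25, 0.70),
  log_reads = c(0.0, 1.1, 0.7, 2.3, 0.0, 1.6),
  learning  = c("yes", "no", "yes", "no", "yes", "no"),
  experi    = c(3962, 3962, 4024, 4024, 4049, 4049)
)

# One panel per experiment instead of three separate subsets and plots
p <- ggplot(toy) +
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(aes(xmin = rel_start, xmax = rel_end, y = log_reads, color = learning)) +
  facet_wrap(~ experi, ncol = 1)
p
```

Stacking the panels in one column keeps the x axis aligned, which may make it easier to compare positions between experiments.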

All of these are still pretty unclear. Checking which range of read depths is the densest.

hist(rel_range_graf$uniq_map)

Fig 6. Histogram of read depths. (Also did log read depths and inclusion level, but the first few graphs I looked at were awful.)

The densest range seems to be the first 500.

Tried <500, which was still too much.
The result below is for <50.
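
To back up the reading of the histogram with numbers, the share of rows under each candidate cutoff can be computed directly. A rough sketch on made-up depths (the vector `depths` is hypothetical; with the real data it would be `rel_range_graf$uniq_map`):

```r
# Hypothetical read depths, invented for illustration
depths <- c(1, 1, 1, 2, 3, 4, 8, 20, 45, 120, 480, 2500)

# Fraction of rows strictly below each candidate cutoff
cutoffs <- c(5, 50, 500)
share_below <- sapply(cutoffs, function(k) mean(depths < k))
names(share_below) <- paste0("<", cutoffs)
share_below
```

If most rows already sit below the smallest cutoff, that would explain why trimming to <500 or <50 barely thins the plot.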

lowerend <- rel_range_graf[rel_range_graf$uniq_map <50, ]

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = lowerend, mapping = aes(xmin=rel_start, xmax=rel_end, y=uniq_map, color=learning))

Fig 7. Read depth of <50 for all experiments. Still too crowded.

Adding a random jitter to unique mapped reads:

# one random offset per row, sized from the data rather than hard-coded
rand <- rnorm(nrow(rel_range_graf), mean = 0.5, sd = 0.4)
rand[1:25]

##  [1]  0.31588560 -0.01084972  0.06708863  0.38454568  0.39248394  0.81113734
##  [7] -0.06691915  0.09114095  0.69412247  0.38149500  0.57286811  0.58457901
## [13]  0.35643392  0.33946208  0.68803350  0.66200929  0.12318522  0.20718815
## [19] -0.04204559  0.72366686 -0.02292886  1.08402959  0.58552021  0.74795928
## [25]  0.18701188

rel_range_graf$uniq_map_diff <- rel_range_graf$uniq_map + rand

lowerend2 <- rel_range_graf[rel_range_graf$uniq_map <50, ]

Plotting the “jittered” tracks

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = lowerend2, mapping = aes(xmin=rel_start, xmax=rel_end, y=uniq_map_diff, color=learning))

Fig 8. Jittering has separated the tracks quite a lot, but the plot is still far too busy.
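
One possible refinement of the jitter step, sketched on made-up data: set a seed so the offsets are reproducible, size the noise vector from the data frame rather than a fixed count, and draw from a bounded uniform range so that tracks from neighbouring integer depths cannot swap places. The data frame `toy`, the seed, and the 0.4 range are all assumptions, not values from the analysis above.

```r
# Hypothetical toy data; uniq_map holds integer read depths
toy <- data.frame(uniq_map = c(1, 1, 1, 2, 2, 3))

set.seed(42)  # arbitrary seed, only so the jitter can be reproduced

# One offset per row, bounded to (-0.4, 0.4) so depth k stays within k +/- 0.4
offset <- runif(nrow(toy), min = -0.4, max = 0.4)
toy$uniq_map_diff <- toy$uniq_map + offset

toy
```

With normally distributed noise some offsets can exceed half a unit, so a track at depth 1 may be drawn above a track at depth 2; the bounded range avoids that at the cost of a slightly tighter spread.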

Trying much lower range of read depths:

lowerend1 <- rel_range_graf[rel_range_graf$uniq_map <5, ]
ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = lowerend1, mapping = aes(xmin=rel_start, xmax=rel_end, y=uniq_map_diff, color=learning))

Fig 9. Jittered tracks for reads <5. Still not good.

Options: could use this method and split by experiment or learning

OR:

Change tactic completely!

Many SJs have a read depth of 1, and they overlap far too much.
So, plotting only SJs with uniquely mapped reads = 1 across all experiments, with an arbitrary ID on the y axis.

single_reads <- rel_range_graf[rel_range_graf$uniq_map ==1, ]
single_reads$ID <- seq_len(nrow(single_reads))  # arbitrary y position, one per row

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = single_reads, mapping = aes(xmin=rel_start, xmax=rel_end, y=ID, color=learning))

Fig 10. Beautiful!! Uniquely mapped reads = 1 for all experiments, not scaled on the y axis, just positioned by ID.

But that is only a small portion of the data.

Alternatively:

low_reads_3962 <- graf_3962[graf_3962$uniq_map<=5,]
low_reads_3962$ID <- seq_len(nrow(low_reads_3962))  # arbitrary y position, one per row

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = low_reads_3962, mapping = aes(xmin=rel_start, xmax=rel_end, y=ID, color=uniq_map))

Fig 11. All reads with a depth of 5 or lower for only one experiment.

Maybe this will look better if the data is sorted:

testing <- low_reads_3962[,-10]
low_reads_3962_a <- testing[order(testing$uniq_map),]
low_reads_3962_a$ID <- seq_len(nrow(low_reads_3962_a))  # re-number in sorted order

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = low_reads_3962_a, mapping = aes(xmin=rel_start, xmax=rel_end, y=ID, color=uniq_map))

Fig 12. One experiment ordered by read count. Essentially the same as above, but tidier.
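
The sort-then-number step can be written without hard-coding the row count. A minimal sketch with a made-up data frame (`toy` is hypothetical); breaking ties by `rel_start` is an extra assumption, not something done above:

```r
# Hypothetical toy data with the same column names as low_reads_3962
toy <- data.frame(
  rel_start = c(0.30, 0.10, 0.20, 0.05),
  rel_end   = c(0.70, 0.40, 0.90, 0.25),
  uniq_map  = c(3, 1, 3, 2)
)

# Sort by read depth, then by start position within each depth
sorted <- toy[order(toy$uniq_map, toy$rel_start), ]

# Arbitrary y-axis position: 1..n in sorted order, whatever n happens to be
sorted$ID <- seq_len(nrow(sorted))

sorted
```

Using `seq_len(nrow(...))` means the ID column still fits if the filter threshold changes and the subset gains or loses rows.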

Adding a bit more info:

ggplot() + 
  scale_x_continuous(name = "relative position") +
  geom_errorbarh(data = low_reads_3962_a, mapping = aes(xmin=rel_start, xmax=rel_end, y=ID, color=uniq_map, linetype=learning))

Fig 13. As above but with learning indicated by different line types. Doesn’t look as neat, but gives more information. Can work on the clarity of the line types.

Georgina Robertson

01/04/2021

Feedback on how to proceed will be appreciated!