Expression of GAL1 (06051) and GAL10/UDP-glucose 4-epimerase (06050)

CPM and rank for TEF1 (CNAG_06125) in YPD or DMEM CO2 37 90 min

gene medium median_log2cpm quantile
CKF44_06125 DMEM 15.21845 0.9984211
CKF44_06125 YPD 14.71931 0.9982776

highly expressed crypto genes – 75th to 95th percentile

I tried many intervals. Widening the interval to the 75th to 95th percentile netted the most genes whose expression was consistently high across ALL conditions. Narrowing the interval reduced the number of genes that stayed in range across ALL conditions to 5 or fewer (though most of those genes were still highly expressed in, say, 142 of the 146 unique concatenated conditions).
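The filtering step described above can be sketched in base R as follows. This is a minimal illustration, not the code that produced the table: the gene names, conditions, and quantile values are invented, and the interval bounds are taken from the section heading. Only the `treatment_tally` column name is borrowed from the table below.

```r
# Toy data: one row per gene x condition, holding the gene's expression
# quantile within that condition. All names and values are made up.
expr <- data.frame(
  gene      = rep(c("geneA", "geneB", "geneC"), each = 3),
  condition = rep(c("cond1", "cond2", "cond3"), times = 3),
  quantile  = c(0.88, 0.90, 0.86,   # geneA: inside the interval everywhere
                0.97, 0.91, 0.89,   # geneB: above the interval in cond1
                0.80, 0.70, 0.85)   # geneC: below the interval in cond2
)

lower <- 0.75  # assumed lower bound (75th percentile)
upper <- 0.95  # assumed upper bound (95th percentile)

# Flag each observation, then count per gene how many conditions pass.
expr$in_range <- expr$quantile >= lower & expr$quantile <= upper
tally <- aggregate(in_range ~ gene, data = expr, FUN = sum)
names(tally)[2] <- "treatment_tally"

# Keep only genes that pass in every condition.
n_conditions <- length(unique(expr$condition))
consistent_genes <- tally$gene[tally$treatment_tally == n_conditions]
consistent_genes
```

Relaxing the last comparison (for example to `>= n_conditions - 4`) would recover genes that are in range in most, but not all, conditions.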

gene treatment_tally median_log2cpm mean_quantile quantile_sd
CKF44_00006 146 10.30493 0.8751453 0.0255795
CKF44_00402 146 10.38472 0.8817586 0.0281594
CKF44_00509 146 10.56398 0.8965574 0.0311383
CKF44_00602 146 10.51135 0.8952115 0.0270342
CKF44_01018 146 10.48432 0.8922041 0.0290333
CKF44_01083 146 10.40630 0.8787208 0.0349441
CKF44_01106 146 10.33366 0.8701196 0.0322008
CKF44_01278 146 10.26794 0.8730660 0.0352172
CKF44_01475 146 10.48109 0.8862681 0.0337783
CKF44_01568 146 10.39724 0.8840611 0.0278882
CKF44_01655 146 10.20040 0.8614053 0.0286747
CKF44_01722 146 10.85759 0.9204695 0.0219071
CKF44_02134 146 9.88108 0.8274114 0.0310610
CKF44_02177 146 10.54762 0.8901652 0.0317665
CKF44_02486 146 10.23253 0.8722982 0.0277540
CKF44_02671 146 10.78905 0.9169755 0.0142576
CKF44_02948 146 10.50731 0.8968837 0.0286800
CKF44_03439 146 10.61384 0.9030351 0.0209172
CKF44_03706 146 10.56189 0.8924391 0.0317257
CKF44_03931 146 10.67367 0.9024806 0.0280053

MSA of the promoters (100bp upstream) of genes shown above

NOTE: there is a PWM/PFM that goes along with this image, of course.

MSA of 10 bp upstream of all introns

#>  [1] 0.1919815 0.1923186 0.1895406 0.1908105 0.1931301 0.1843017 0.1826488
#>  [8] 0.1702303 0.1452700 0.1110761

A 0.0385005 0.0512811 0.0524815 0.0330705 0.0508494 0.0853854 0.0896944 0.1284762 0.1574426 -0.0029644
C 0.0291867 0.0252927 0.0361979 0.0385244 0.0404272 0.0219504 0.0640099 0.0455409 -0.0081061 -0.0846052
G 0.0569385 0.0609691 0.0192342 0.0387392 0.0401427 0.0025708 -0.0135594 0.0024607 -0.0450855 0.1868622
T 0.0673558 0.0547757 0.0816270 0.0804764 0.0617109 0.0743951 0.0425039 -0.0062476 0.0410189 0.0117834

follow ups

Checking intron extraction

  • MSA of 10 bp upstream of promoters. Can you please double check that you don’t have an off-by-one error that’s causing you to include the first base of the intron. The reason I suspect this is that essentially all introns begin with GT and end with AG. You have 100% G in position 10, which should be the last base before the intron. I did not think the base before the intron was so constrained, although I could be wrong about that.

I am using a core bioconductor data structure for both the annotations and the genome. There are core functions which link the two. So, little of my own code.

Here is an examination of one intron

# this is created in a previous chunk, included here for clarity
# the genome below is a bioconductor package that I made to store the 
# kn99 genome, downloaded from GenBank -- it is imminently going
# up on bioconductor, actually.
# 
# library(BSgenome.CneoformansVarGrubiiKN99.NCBI.ASM221672v1)
# kn99_genome = BSgenome.CneoformansVarGrubiiKN99.NCBI.ASM221672v1
# chrominfo = kn99_genome@seqinfo
# 
# chrominfo = merge(chrominfo,
#       Seqinfo(seqnames = "G418", seqlengths = 2565,
#               isCircular = FALSE, genome = "ASM221672v1"),
#       Seqinfo(seqnames = "NAT", seqlengths = 1650,
#               isCircular = FALSE, genome = "ASM221672v1"))
# 
# kn99_gff = GenomicFeatures::makeTxDbFromGFF(KN99_GFF_PATH,
#                                             chrominfo = chrominfo)
# 
# introns = intronsByTranscript(kn99_gff)
# up_10_from_introns = promoters(introns, upstream = 10, downstream = 0)
# 
# up_10_from_introns_seq = getSeq(kn99_genome, unlist(up_10_from_introns))

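# pick one transcript's introns to inspect
selected_intron = introns[[4000]]

# find the gene that overlaps the selected intron(s)
kn99_genes = genes(kn99_gff)
selected_gene = kn99_genes[
  unique(subjectHits(findOverlaps(selected_intron, kn99_genes)))]$gene_id

# pull the exons of that gene
exons_by_gene = exonsBy(kn99_gff, by = "gene")
selected_exons = exons_by_gene[names(exons_by_gene) == selected_gene]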

# zoom in on exon 1 (furthest right)

## the exon and intron are 1 bp apart

#> [1] TRUE

I should have just included this. Here is the documentation for how the promoters function works:

promoters

promoters generates promoter ranges for each range in x relative to the transcription start site (TSS), where TSS is start(x). The promoter range is expanded around the TSS according to the upstream and downstream arguments. upstream represents the number of nucleotides in the 5’ direction and downstream the number in the 3’ direction. The full range is defined as, (start(x) - upstream) to (start(x) + downstream - 1). For documentation for using promoters on a GRanges object see ?promoters,GenomicRanges-method in the GenomicRanges package.
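To make the off-by-one check concrete, here is the range formula from the documentation above worked through for a plus-strand feature. This is a sketch of the arithmetic only: `promoter_range` is a hypothetical helper, not part of GenomicRanges, and the intron coordinate is invented. For minus-strand ranges the real function anchors on end(x) instead of start(x), which this sketch does not cover.

```r
# Plus-strand illustration of (start(x) - upstream) to (start(x) + downstream - 1).
# promoter_range is a hypothetical helper used only for this example.
promoter_range <- function(start_x, upstream, downstream) {
  c(start = start_x - upstream,
    end   = start_x + downstream - 1)
}

intron_start <- 1001  # first base of an invented intron
flank <- promoter_range(intron_start, upstream = 10, downstream = 0)
flank
```

With `upstream = 10` and `downstream = 0` the range ends at `intron_start - 1`, so by this formula the first base of the intron is not included in the 10 bp window.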

including some exon parts and downstream

  1. Could you also provide the same information for the first few bases of the exon downstream of introns? Actually, why don’t you concatenate the upstream and downstream into a single PWM. Once we trim it down to the most important positions, we can use it to scan for good positions to insert introns in a designed gene.

I’m not quite sure I am following this, so let me explain what I did. I took the 5 bp upstream of the start of a given intron and the 5 bp downstream of a given intron (which extends into the exon), and concatenated those sequences. Here is a visualization of one:

#>  [1]  0.3189775  0.3188008  0.3174729  0.3148040  0.3111479 -0.4023806
#>  [7] -0.2898850  0.2531190  0.2850127  0.1214468

What?

OK… I need to look up what information content means again. But at position 6, G occurs in 44140 of 44154 sequences, which means there is little variability and, I think, little information: it is just always a G. The same is true of position 7 for T.

And the PFM

knitr::kable(pfm)
A 12771 12990 15138 16971 9012 14 0 23823 28060 1739
C 9943 11738 10913 8831 6530 0 924 1711 6654 263
G 9211 8643 9207 7632 19060 44140 0 16780 4083 41640
T 12229 10783 8896 10720 9552 0 43230 1840 5357 512
other 0 0 0 0 0 0 0 0 0 0
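On the information content question raised above: under the usual sequence-logo definition (2 bits minus the Shannon entropy of the position, with no small-sample correction), a nearly invariant position carries close to the maximum of 2 bits, not little information. The sketch below applies that definition to the counts in the PFM above, dropping the all-zero "other" row. It is an assumption that this is the relevant definition, and it may not be the quantity printed earlier in this section, which includes negative values.

```r
# Counts copied from the PFM table above (rows A, C, G, T; "other" omitted).
pfm <- rbind(
  A = c(12771, 12990, 15138, 16971,  9012,    14,     0, 23823, 28060,  1739),
  C = c( 9943, 11738, 10913,  8831,  6530,     0,   924,  1711,  6654,   263),
  G = c( 9211,  8643,  9207,  7632, 19060, 44140,     0, 16780,  4083, 41640),
  T = c(12229, 10783,  8896, 10720,  9552,     0, 43230,  1840,  5357,   512)
)

# Convert counts to per-position frequencies.
ppm <- sweep(pfm, 2, colSums(pfm), "/")

# Shannon entropy per position, treating 0 * log2(0) as 0.
entropy <- apply(ppm, 2, function(p) -sum(ifelse(p > 0, p * log2(p), 0)))

# Information content in bits: maximum entropy (2 bits for DNA) minus observed.
ic <- log2(4) - entropy
round(ic, 3)
```

By this definition positions 6 and 7 (the G and T that begin the intron) come out as the most informative columns, while positions 1 to 4 are close to zero.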

Feb 2 follow up

Codon Optimization

Task: optimize the ZEV artificial TF for KN99 codon preferences.

Codon usage frequency adapted by Guohua from https://doi.org/10.1016/j.ygeno.2020.10.01

Guohua apparently did this by hand. The zev sequence I have is the result of her work. It is identical to what I did programmatically.
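For reference, one simple programmatic strategy is to back-translate the protein using the most frequent codon for each amino acid. The sketch below illustrates that idea only; it is not necessarily the procedure used for the zev sequence, and the usage table is a hypothetical placeholder with made-up frequencies, not the KN99 table from the cited paper.

```r
# Hypothetical codon usage frequencies for three amino acids only.
# These numbers are placeholders, not real KN99 values.
codon_usage <- data.frame(
  aa    = c("M", "K", "K", "L", "L", "L"),
  codon = c("ATG", "AAG", "AAA", "CTC", "CTT", "TTG"),
  freq  = c(1.00, 0.70, 0.30, 0.40, 0.35, 0.25)
)

# For each amino acid keep the most frequent codon.
best <- do.call(rbind, lapply(split(codon_usage, codon_usage$aa), function(d) {
  d[which.max(d$freq), c("aa", "codon")]
}))
preferred <- setNames(best$codon, best$aa)

# Back-translate a protein by looking up the preferred codon of each residue.
back_translate <- function(protein, preferred) {
  residues <- strsplit(protein, "")[[1]]
  stopifnot(all(residues %in% names(preferred)))
  paste(preferred[residues], collapse = "")
}

back_translate("MKL", preferred)
```

Always taking the single most frequent codon is the bluntest option; sampling codons in proportion to their usage is a common alternative when repeated sequence is a concern.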

plot expression distribution

  1. Could you plot the distribution of expression levels across conditions for a few genes based on your list? CKF44_02134 and its divergently transcribed partner from the same promoter, 02133. And CKF44_01655.

Extract potential promoters

  1. Also pull out their core promoter and 5’ UTR from 50 bp upstream of the TSS to the start of the CDS.

TSS – beginning of 1st exon

  1. whole 5’ plus 50 bp
  2. whole 5’ plus 400 (or nearest neighbor limit) bp
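The coordinate logic for "N bp upstream of the TSS through the end of the 5’ UTR" can be sketched as below. `promoter_utr_region` is a hypothetical helper and every coordinate is invented. It assumes 1-based inclusive coordinates and that `cds_start` is the genomic position of the first coding base (on the minus strand, the largest CDS coordinate).

```r
# Region from `upstream` bp before the TSS up to the base just before the CDS.
# Coordinates are 1-based and inclusive; all values below are invented.
promoter_utr_region <- function(tss, cds_start, strand, upstream = 50) {
  if (strand == "+") {
    c(start = tss - upstream, end = cds_start - 1)
  } else {
    # On the minus strand "upstream" means larger genomic coordinates.
    c(start = cds_start + 1, end = tss + upstream)
  }
}

plus_gene  <- promoter_utr_region(tss = 5000, cds_start = 5120, strand = "+")
minus_gene <- promoter_utr_region(tss = 9000, cds_start = 8880, strand = "-")
plus_gene
minus_gene
```

Both toy genes have a 120 bp 5’ UTR, so each region is 170 bp wide. Setting `upstream = 400` gives the longer variant, and capping it at the distance to the neighboring feature would implement the "nearest neighbor limit".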

separating the 5’ UTR and promoter

This is a bit of a headache. The simple method is just to pull out x number of bases upstream/downstream of the TSS. If we want to treat the UTR and promoter regions separately, I can do so, but it will take a bit more coding than this. If that is critical, let me know – I’ll have it tomorrow.

50 bp upstream of the TSS

A 0.0000000 0.0156659 0.0000000 0.0156659 0.0000000 0.0156659 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0156659 0.0213872 0.0000000 0.0213872 0.0156659 0.0213872 0.0000000 0.0000000 0.0156659 0.0156659 0.0000000 0.0156659 0.0213872 0.0000000 0.0000000 0.0000000 0.0156659 0.0000000 0.0000000 0.0156659 0.0000000 0.0000000 0.0156659 0.0000000 0.0000000 0.0000000 0.0213872 0.0000000 0.0000000 0.0000000 0.0156659 0.0213872 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0156659
C 0.0156659 0.0156659 0.0213872 0.0000000 0.0000000 0.0000000 0.0156659 0.0156659 0.0156659 0.0000000 0.0156659 0.0000000 0.0156659 0.0156659 0.0000000 0.0000000 0.0156659 0.0156659 0.0156659 0.0213872 0.0000000 0.0156659 0.0000000 0.0156659 0.0156659 0.0000000 0.0249666 0.0156659 0.0156659 0.0156659 0.0000000 0.0156659 0.0000000 0.0000000 0.0156659 0.0000000 0.0000000 0.0156659 0.0156659 0.0000000 0.0000000 0.0156659 0.0000000 0.0213872 0.0000000 0.0000000 0.0213872 0.0156659 0.0156659 0.0156659
G 0.0156659 0.0156659 0.0000000 0.0156659 0.0213872 0.0000000 0.0213872 0.0156659 0.0156659 0.0213872 0.0213872 0.0156659 0.0000000 0.0213872 0.0000000 0.0213872 0.0000000 0.0156659 0.0213872 0.0000000 0.0000000 0.0213872 0.0156659 0.0000000 0.0156659 0.0156659 0.0000000 0.0000000 0.0000000 0.0156659 0.0000000 0.0000000 0.0213872 0.0213872 0.0156659 0.0000000 0.0156659 0.0000000 0.0000000 0.0156659 0.0000000 0.0156659 0.0000000 0.0000000 0.0213872 0.0213872 0.0156659 0.0000000 0.0000000 0.0000000
T 0.0156659 0.0000000 0.0156659 0.0156659 0.0156659 0.0213872 0.0000000 0.0156659 0.0156659 0.0156659 0.0000000 0.0156659 0.0000000 0.0000000 0.0156659 0.0000000 0.0000000 0.0156659 0.0000000 0.0000000 0.0213872 0.0000000 0.0156659 0.0000000 0.0156659 0.0213872 0.0000000 0.0156659 0.0213872 0.0156659 0.0213872 0.0213872 0.0156659 0.0000000 0.0156659 0.0249666 0.0213872 0.0000000 0.0213872 0.0213872 0.0249666 0.0000000 0.0156659 0.0156659 0.0156659 0.0156659 0.0000000 0.0213872 0.0213872 0.0156659

400 bp upstream of the TSS

section 1 to 100 bp upstream

section 101 to 200 bp upstream

section 201 to 300 bp upstream

section 301 to 400 bp upstream

A 0.0000000 0.0000000 0.0000000 0.0027572 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0027572 0.0020196 0.0000000 0.0000000 0.0027572 0.0020196 0.0000000 0.0000000 0.0000000 0.0027572 0.0020196 0.0027572 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0027572 0.0020196 0.0027572 0.0000000 0.0000000 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0027572 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0027572 0.0000000 0.0000000 0.0027572 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0027572 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0027572 0.0027572 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0027572 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0027572 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0000000 0.0027572 0.0020196 0.0020196 0.0020196 0.0027572 
0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0032187 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0027572 0.0000000 0.0027572 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0027572 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0027572 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0020196 0.0020196 0.0000000 0.0032187 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0027572 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0027572 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0027572 0.0027572 0.0000000 0.0027572 0.0020196 0.0020196 0.0020196 0.0027572 0.0020196 0.0000000 0.0000000 0.0027572 0.0027572 0.0000000 0.0020196 0.0027572 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0027572 0.0000000 0.0027572 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
0.0020196
C 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0027572 0.0020196 0.0027572 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0027572 0.0000000 0.0027572 0.0020196 0.0000000 0.0027572 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0027572 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0020196 0.0027572 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0027572 0.0020196 0.0000000 0.0027572 0.0000000 0.0020196 0.0000000 0.0027572 0.0000000 0.0027572 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0032187 0.0000000 0.0020196 0.0020196 0.0027572 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0027572 0.0027572 0.0000000 0.0000000 0.0027572 0.0027572 0.0020196 0.0000000 0.0027572 0.0000000 0.0020196 0.0020196 0.0027572 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0027572 0.0020196 0.0020196 0.0000000 0.0027572 0.0027572 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0032187 0.0000000 0.0027572 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0027572 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 
0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0027572 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0027572 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0027572 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0032187 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0027572 0.0000000 0.0020196 0.0027572 0.0000000 0.0027572 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0027572 0.0027572 0.0020196 0.0027572 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0027572 0.0020196 0.0027572 0.0000000 0.0000000 0.0027572 0.0000000 0.0000000 0.0020196 0.0000000 0.0027572 0.0027572 0.0027572 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0032187 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0027572 0.0000000 0.0000000 0.0027572 0.0020196 0.0020196 
0.0020196
G 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0027572 0.0027572 0.0027572 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0027572 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0032187 0.0000000 0.0000000 0.0020196 0.0027572 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0020196 0.0000000 0.0000000 0.0027572 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0027572 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0027572 0.0000000 0.0027572 0.0027572 0.0020196 0.0027572 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0020196 0.0027572 0.0000000 0.0027572 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0020196 0.0020196 0.0027572 0.0020196 0.0027572 0.0027572 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0032187 0.0027572 0.0020196 0.0027572 0.0027572 0.0020196 0.0020196 
0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0020196 0.0000000 0.0032187 0.0027572 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0000000 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0032187 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0032187 0.0020196 0.0000000 0.0027572 0.0000000 0.0027572 0.0020196 0.0027572 0.0027572 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0020196 0.0020196 0.0000000 0.0000000 0.0032187 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0020196 0.0020196 0.0027572 0.0027572 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0027572 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0032187 0.0020196 0.0020196 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0027572 0.0027572 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0027572 0.0000000 0.0027572 0.0020196 0.0000000 0.0027572 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0027572 0.0020196 0.0020196 0.0027572 0.0027572 0.0020196 0.0000000 0.0027572 0.0000000 0.0027572 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0027572 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0027572 0.0027572 0.0020196 0.0000000 0.0000000 
0.0000000
T 0.0027572 0.0000000 0.0020196 0.0020196 0.0020196 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0027572 0.0020196 0.0000000 0.0027572 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0027572 0.0020196 0.0027572 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0027572 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0032187 0.0020196 0.0027572 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0032187 0.0000000 0.0000000 0.0020196 0.0000000 0.0027572 0.0027572 0.0000000 0.0027572 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0020196 0.0020196 0.0027572 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0027572 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0032187 0.0020196 0.0027572 0.0020196 0.0000000 0.0027572 0.0027572 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0020196 0.0027572 0.0032187 0.0027572 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
0.0027572 0.0020196 0.0020196 0.0000000 0.0027572 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0027572 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0027572 0.0000000 0.0000000 0.0032187 0.0000000 0.0027572 0.0020196 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0027572 0.0000000 0.0020196 0.0000000 0.0027572 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0027572 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0000000 0.0000000 0.0000000 0.0020196 0.0020196 0.0027572 0.0027572 0.0027572 0.0000000 0.0020196 0.0027572 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0027572 0.0000000 0.0020196 0.0020196 0.0000000 0.0020196 0.0020196 0.0020196 0.0027572 0.0000000 0.0020196 0.0020196 0.0020196 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0020196 0.0000000 0.0000000 0.0027572 0.0000000 0.0020196 0.0000000 0.0020196 0.0027572 0.0000000 0.0020196 0.0027572 0.0020196 0.0027572 0.0027572 0.0020196 0.0000000 0.0020196 0.0032187 0.0027572 0.0000000 0.0027572 0.0027572 0.0032187 0.0000000 0.0020196 0.0020196 0.0020196 0.0020196 0.0000000 0.0027572 0.0027572 
0.0020196

February 22 Requests: finishing the project

Downstream of 5’ end of exon

  1. Need the sequence that surrounds the intron: 3 bp into the exon on either side of the intron, i.e. the last 3 bp of the previous exon and the first 3 bp of the following exon.

  2. Extract the CKF44_01655 promoter (everything between the TSS and the upstream feature) and 5’ UTR (everything from the TSS to the CDS). Concatenate these sequences so that the 5’ end is the beginning of the promoter and the 3’ end is just before the beginning of the CDS.

  3. Use the 5’ UTR and CDS sequences to scan for a good place to insert an intron (see 1).
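Request 1 can be sketched on toy data as follows. The sequences and intron coordinates are invented, everything is on the plus strand, and `count_matrix` is a hypothetical helper. For minus-strand introns the flanks would need to be reverse-complemented first, which this sketch does not do.

```r
# Toy plus-strand transcripts; each intron is given as 1-based inclusive
# start/end positions and begins with GT and ends with AG.
seqs <- c(tx1 = "ATGAAGGTAAGTCTAGTTCGA",
          tx2 = "CCCAAGGTTTGTACAGCTTAA")
intron_start <- c(tx1 = 7, tx2 = 7)
intron_end   <- c(tx1 = 16, tx2 = 16)

k <- 3
# Last k bases of the upstream exon and first k bases of the downstream exon.
five_prime  <- substr(seqs, intron_start - k, intron_start - 1)
three_prime <- substr(seqs, intron_end + 1, intron_end + k)
junction <- paste0(five_prime, three_prime)

# Count each base at each position of the aligned junction sequences.
count_matrix <- function(x) {
  chars <- do.call(rbind, strsplit(x, ""))
  sapply(seq_len(ncol(chars)), function(i) {
    table(factor(chars[, i], levels = c("A", "C", "G", "T")))
  })
}

pfm <- count_matrix(junction)
junction
pfm
```

The first three columns of the resulting matrix describe the exon end before the intron and the last three the exon start after it, which is the layout needed to score candidate insertion sites in an intronless sequence.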

Output is the final sequence of zev for KN99.

Look for regions in the zev TF that match, and rank the top 10.

An example of the extracted sequences

visualized with genes on opposite strands

## zoom in on exon 1

Five prime MSA

A 15413 17709 10334
C 11387 9512 7414
G 9671 8324 19404
T 10372 11298 9691
other 0 0 0

Three prime MSA

A 12474 9760 10908
C 11644 11845 12022
G 12639 9441 9579
T 10086 15796 14331
other 0 1 3

scan the zev sequence with the splice site PWM

The splice site pwm is created by concatenating the 5’ and 3’ pwms in that order.
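As an illustration of the concatenate-and-scan step, the base R sketch below joins the two count matrices shown above (without the "other" row) and scores every 6 bp window of a sequence by summing the matrix entries of the observed bases. Scores are then rescaled between the worst and best achievable totals. `score_window` and `scan_sequence` are hypothetical helpers and the test sequence is invented. This is not the TFBSTools call used for the table below, although summing raw counts this way does reproduce the absScore and relScore values listed there for AAGGTT and AAGCTT. Only the forward strand is scanned.

```r
# Count matrices copied from the five prime and three prime MSA tables above.
five_prime <- rbind(
  A = c(15413, 17709, 10334),
  C = c(11387,  9512,  7414),
  G = c( 9671,  8324, 19404),
  T = c(10372, 11298,  9691)
)
three_prime <- rbind(
  A = c(12474,  9760, 10908),
  C = c(11644, 11845, 12022),
  G = c(12639,  9441,  9579),
  T = c(10086, 15796, 14331)
)
splice_site <- cbind(five_prime, three_prime)  # 5' columns first, then 3'

# Score of one window: sum the matrix entry of the observed base per position.
score_window <- function(window, m) {
  bases <- strsplit(window, "")[[1]]
  sum(m[cbind(match(bases, rownames(m)), seq_along(bases))])
}

# Slide over the sequence and rescale scores to [0, 1].
scan_sequence <- function(dna, m) {
  w <- ncol(m)
  starts <- seq_len(nchar(dna) - w + 1)
  windows <- substring(dna, starts, starts + w - 1)
  abs_score <- vapply(windows, score_window, numeric(1), m = m,
                      USE.NAMES = FALSE)
  min_score <- sum(apply(m, 2, min))
  max_score <- sum(apply(m, 2, max))
  data.frame(start = starts, end = starts + w - 1, site = windows,
             abs_score = abs_score,
             rel_score = (abs_score - min_score) / (max_score - min_score))
}

# Invented sequence containing AAGGTT (positions 3-8) and AAGCTT (11-16).
hits <- scan_sequence("CCAAGGTTCCAAGCTTCC", splice_site)
hits[order(-hits$rel_score), ][1:2, ]
```

Scanning the reverse complement of the sequence with the same function would cover the minus strand.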

seqnames source feature start end absScore relScore strand ID TF class siteSeqs
zev_artificial_TF TFBS TFBS 214 219 92739 0.9373912 + splice_site_msa 5_to_3_3bp_each Unknown AAGTTT
zev_artificial_TF TFBS TFBS 479 484 94297 0.9755990 + splice_site_msa 5_to_3_3bp_each Unknown AAGCTT
zev_artificial_TF TFBS TFBS 904 909 92818 0.9393285 + splice_site_msa 5_to_3_3bp_each Unknown AAGATC
zev_artificial_TF TFBS TFBS 331 336 95292 1.0000000 - splice_site_msa 5_to_3_3bp_each Unknown AAGGTT
zev_artificial_TF TFBS TFBS 479 484 94297 0.9755990 - splice_site_msa 5_to_3_3bp_each Unknown AAGCTT
zev_artificial_TF TFBS TFBS 732 737 92818 0.9393285 - splice_site_msa 5_to_3_3bp_each Unknown AAGATC

creating the franken gene

The promoter and UTR of CKF44_06155 are joined to the zev TF.

Conclusion

Interesting question re: the promoter region – the region between CKF44_06155 and the upstream gene on the same strand is covered by two features on the opposite strand. Thoughts?

Regardless, the code is in place to do this quickly. Here are the decisions I need:

  1. promoter definition

  2. intron placements, and which introns to select.