Flaveria C3-C4 gradient analysis

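This report analyses the output of the gradient-pattern differential expression analysis in each of the four Flaveria species. In the code below the species are abbreviated fp and fr (the two C3 species) and ft and fb (the two C4 species).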

Loading the data

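We define a function to load the DE data for each species and extract, for each gene, the assigned expression pattern and its probability. In the merged data frame all used below, these appear as the columns <species>.pattern and <species>.prob, alongside gene.id.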

Computing joint probabilities

Now we can compute joint probabilities for our patterns. That is, we can calculate the probability that two potential patterns are both true (see http://sites.nicholas.duke.edu/statsreview/probability/jmc/).
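
As a small worked illustration, assuming the calls in different species can be treated as independent (which is what multiplying the probabilities implies), the joint probability is the product of the individual probabilities. The values below are made up for the example and are not taken from the data.

```r
# Hypothetical probabilities for one gene's pattern in two species
p <- c(sp1 = 0.98, sp2 = 0.99)

# Under the independence assumption, P(both patterns true) is the product
joint <- prod(p)
joint

# Two individually convincing calls can still fall below a 0.95 joint cut-off
p2 <- c(sp1 = 0.97, sp2 = 0.97)
below <- prod(p2) >= 0.95
below
```

The second case shows why a joint threshold is stricter than applying the same threshold to each species separately.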

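We are particularly interested in patterns associated with C4, so first we calculate, separately for the two C3 species and the two C4 species, the pattern they share (if any) and the joint probability of their calls: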

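# find patterns that are the same in both C3 species and those that are the same in both C4 species

# TRUE if every element of x is identical
same <- function(x) {
    length(unique(x)) == 1
}
# TRUE if the joint probability (product of x) is at least 0.95
sig <- function(x) {
    prod(x) >= 0.95
}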

c3 <- c("fp", "fr")
c3.prob <- paste(c3, "prob", sep = ".")
c3.pattern <- paste(c3, "pattern", sep = ".")

all$c3.pattern <- apply(all[, c3.pattern], 1, function(x) {
    # shared pattern if both C3 species agree, otherwise NA
    if (same(x)) x[1] else NA
})
all$c3.prob <- apply(all[, c3.prob], 1, prod)

c4 <- c("ft", "fb")
c4.prob <- paste(c4, "prob", sep = ".")
c4.pattern <- paste(c4, "pattern", sep = ".")

all$c4.pattern <- apply(all[, c4.pattern], 1, function(x) {
    # shared pattern if both C4 species agree, otherwise NA
    if (same(x)) x[1] else NA
})
all$c4.prob <- apply(all[, c4.prob], 1, prod)

# take a look
all[1:10, c("gene.id", "c3.pattern", "c3.prob", "c4.pattern", "c4.prob")]
##      gene.id             c3.pattern c3.prob             c4.pattern c4.prob
## 1  AT1G01030 no significant pattern  0.3285 no significant pattern  0.2979
## 2  AT1G01040 no significant pattern  0.4790 no significant pattern  0.6920
## 3  AT1G01050                   <NA>  0.5170 no significant pattern  0.5597
## 4  AT1G01060                   <NA>  0.8941            up_gradient  0.9921
## 5  AT1G01080            up_gradient  0.9995            up_gradient  1.0000
## 6  AT1G01090 no significant pattern  0.8013                   <NA>  0.6529
## 7  AT1G01120 no significant pattern  0.7390 no significant pattern  0.2780
## 8  AT1G01140                   <NA>  0.9497                   <NA>  0.5534
## 9  AT1G01150                   <NA>  0.8458          down_gradient  0.9971
## 10 AT1G01160 no significant pattern  0.6273 no significant pattern  0.2780

Note: <NA> means the two species in that group did not share the same pattern.
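
The consensus step can also be written without a row-wise apply(). The following is a minimal sketch on a toy data frame; the column names, values and the helper name consensus are made up for illustration and are not part of the analysis.

```r
# Toy data: pattern calls for two species (hypothetical values)
toy <- data.frame(
    a.pattern = c("up_gradient", "up_gradient", "down_gradient"),
    b.pattern = c("up_gradient", "down_gradient", "down_gradient"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep the pattern where both calls agree, NA otherwise
consensus <- function(x, y) {
    ifelse(x == y, x, NA_character_)
}

toy$pattern <- consensus(toy$a.pattern, toy$b.pattern)
toy$pattern
```

The second gene gets NA because its two calls disagree, which matches the meaning of <NA> in the output above.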

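From this we can calculate whether the C3 consensus and the C4 consensus are the same pattern, and the joint probability across all four species: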

c3c4.prob <- c("c3.prob", "c4.prob")
c3c4.pattern <- c("c3.pattern", "c4.pattern")
all$c3c4.same <- apply(all[, c3c4.pattern], 1, function(x) {
    if (all(is.na(x))) {
        return(NA)
    } else if (same(x)) {
        return(TRUE)
    } else {
        return(FALSE)
    }
})
all$c3c4.prob <- apply(all[, c3c4.prob], 1, prod)
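
It is worth being explicit about how missing consensus patterns behave in this comparison. The sketch below reuses the same() helper on a toy data frame with made-up values: a gene with a consensus in only one group is flagged FALSE, and only a gene with no consensus in either group is NA.

```r
same <- function(x) length(unique(x)) == 1

# Toy consensus calls (hypothetical): NA means the pair of species disagreed
toy <- data.frame(
    c3.pattern = c("up_gradient", "up_gradient", NA, NA),
    c4.pattern = c("up_gradient", "down_gradient", "up_gradient", NA),
    stringsAsFactors = FALSE
)

toy$same <- apply(toy[, c("c3.pattern", "c4.pattern")], 1, function(x) {
    if (all(is.na(x))) NA else same(x)
})
toy$same
```

Because FALSE here can mean "different patterns" or "one group has no consensus", any filter for genuinely different patterns also needs to check that both consensus columns are not NA.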

To help us interpret the data, we also want to see some functional information for each gene, so we load the Arabidopsis full annotation we prepared for the earlier DE analysis and merge it with our results.

# annotate'em
annot <- read.csv("/data/genomes/ath/Athaliana_167_full_annotation.csv")
all <- merge(all, annot)
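
Called without a by argument, merge() joins on every column name the two tables share and keeps only rows present in both, so genes missing from the annotation are dropped without warning. The sketch below shows the difference on toy tables; the probabilities and descriptions are placeholders, and it assumes the shared key is gene.id.

```r
# Toy results and annotation tables (placeholder values)
res <- data.frame(gene.id = c("AT1G01030", "AT1G01040", "AT1G01050"),
                  prob = c(0.33, 0.48, 0.52),
                  stringsAsFactors = FALSE)
ann <- data.frame(gene.id = c("AT1G01030", "AT1G01040"),
                  Description = c("desc A", "desc B"),
                  stringsAsFactors = FALSE)

# Default: inner join on the shared column, so the unannotated gene is dropped
inner <- merge(res, ann)
nrow(inner)

# Explicit key plus all.x = TRUE keeps every row of the results
left <- merge(res, ann, by = "gene.id", all.x = TRUE)
nrow(left)

# Genes that the default merge would lose
lost <- setdiff(res$gene.id, ann$gene.id)
lost
```

Comparing nrow() before and after the merge is a quick way to see whether any genes were lost.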

Now we can start asking the data interesting questions.

Finding conserved genes

Note that we're only considering the pattern of changes of the genes, not their absolute expression. In a later analysis we'll add expression quantity to the calculations.

First, let's ask which genes have the same expression pattern in all four Flaveria species.

# genes with the same pattern in all species and a joint probability of at least 0.95
allsame <- subset(all, c3c4.same & c3c4.prob >= 0.95)

# how many genes have the same pattern in all four species?
nrow(allsame)
## [1] 1134

# how many genes are the same for each pattern?
allsame.tab <- table(allsame$fb.pattern)
allsame.tab
## 
##    down_gradient equal expression      up_gradient 
##              632               15              487
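
To put these counts in proportion, the short sketch below re-enters the three numbers from the table above by hand and converts them to percentages.

```r
# Counts copied from the table above
counts <- c(down_gradient = 632, `equal expression` = 15, up_gradient = 487)

# Sanity check: the counts add up to the number of conserved genes
total <- sum(counts)
total

# Share of each pattern, as percentages
pct <- round(100 * prop.table(counts), 1)
pct
```

Roughly 56% of the conserved genes decrease along the gradient, 43% increase, and about 1% are equally expressed.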

Not many genes have equal expression along the gradient in all four species - let's see which ones do.

# genes with equal expression along the gradient in all species
equal <- subset(allsame, c4.pattern == "equal expression")
equal[, c(1, 16:21)]
##         gene.id                                                 Description PFAM.ID SMART.ID TF.Family TF.Genome.EST TF.Category
## 57    AT1G02030                               C2H2-like zinc finger protein          SM00355                                    
## 504   AT1G09040                                                                                                                 
## 4816  AT2G30340                            LOB domain-containing protein 13 PF03195                                             
## 5658  AT2G43820                                UDP-glucosyltransferase 74F2 PF00201                                             
## 5885  AT2G47360                                                                                                                 
## 7152  AT3G19310             PLC-like phosphodiesterases superfamily protein                                                     
## 7230  AT3G20580                             COBRA-like protein 10 precursor PF04833                                             
## 8503  AT3G57500                                                                                                                 
## 9706  AT4G18660                                                                                                                 
## 9926  AT4G22600                                                                                                                 
## 10631 AT4G33600                                                                                                                 
## 11434 AT5G06920         FASCICLIN-like arabinogalactan protein 21 precursor          SM00554                                    
## 12728 AT5G36000                                                                                                                 
## 13539 AT5G51500 Plant invertase/pectin methylesterase inhibitor superfamily PF04043  SM00856                                    
## 13923 AT5G57830                         Protein of unknown function, DUF593 PF04576

Finding genes of interest

We have a list of genes of particular interest - let's see what their patterns are across the species.

# what's the representation of the key AGIs?
key <- read.csv("key_agi.txt", sep = "\t", header = FALSE)
names(key) <- c("ati", "name")
# remove any ".<digits>" suffix from the identifier so it matches gene.id
key$agi <- gsub(key$ati, pattern = "\\.[0-9]+", replacement = "")
key.all <- merge(key[, 2:3], all, by.x = "agi", by.y = "gene.id")

table(key.all[c("c4.pattern", "c3c4.same")])
##                         c3c4.same
## c4.pattern               FALSE TRUE
##   down_gradient              2    4
##   down_in_base               0    1
##   no significant pattern     7    4
##   up_gradient                3    5

14 of the genes of interest have the same pattern in all species. Which ones?

na.omit(key.all[key.all$c3c4.same == T, c("agi", "name", "c4.pattern", "Description")])
##          agi   name             c4.pattern                                                                            Description
## 2  AT1G05470   CVP2          down_gradient                                                       DNAse I-like superfamily protein
## 3  AT1G08540   TF18           down_in_base                                                          RNApolymerase sigma subunit 2
## 6  AT1G19850     MP          down_gradient     Transcriptional factor B3 family protein / auxin-responsive factor AUX/IAA-related
## 8  AT1G25440   TF01 no significant pattern                                         B-box type zinc finger protein with CCT domain
## 11 AT1G52150 ATHB15          down_gradient Homeobox-leucine zipper family protein / lipid-binding START domain-containing protein
## 13 AT1G64860   TF17            up_gradient                                                                         sigma factor A
## 25 AT2G22430    HB6 no significant pattern                                                                     homeobox protein 6
## 28 AT2G34710    PHB no significant pattern Homeobox-leucine zipper family protein / lipid-binding START domain-containing protein
## 29 AT2G35940   TF11            up_gradient                                                                BEL1-like homeodomain 1
## 31 AT2G41940   TF03 no significant pattern                                                                  zinc finger protein 8
## 34 AT3G53920   TF16            up_gradient                                                          RNApolymerase sigma-subunit C
## 45 AT5G16780   DOT2          down_gradient                                                                          SART-1 family
## 47 AT5G41410   TF12            up_gradient                                                    POX (plant homeobox) family protein
## 50 AT5G67030   TF06            up_gradient                                                      zeaxanthin epoxidase (ZEP) (ABA1)
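
The identifier clean-up applied to the key list above can be illustrated in isolation. This sketch assumes the first column of key_agi.txt holds identifiers with a numeric suffix such as AT1G05470.1 (the file itself is not shown here, so the example inputs are hypothetical), and anchors the pattern to the end of the string so that only a trailing suffix is removed.

```r
# Hypothetical identifiers, two with a ".N" suffix and one without
ati <- c("AT1G05470.1", "AT1G19850.2", "AT5G67030")

# Drop a trailing ".digits" suffix; IDs without a suffix are left unchanged
agi <- sub("\\.[0-9]+$", "", ati)
agi
```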

C4-specific genes

We can identify putative C4-related genes. Our strict definition might be:

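Genes where the two C3 species share one pattern, the two C4 species share a different pattern, and the joint probability across all four species is at least 0.95: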

c4diff.a <- subset(all, !c3c4.same & c3c4.prob >= 0.95 & !is.na(c3.pattern) & !is.na(c4.pattern))

# how many?
nrow(c4diff.a)
## [1] 7

# what patterns?
c4diff.a.tab <- table(c4diff.a[, c("c4.pattern", "c3.pattern")])
c4diff.a.tab
##                   c3.pattern
## c4.pattern         down_gradient equal expression up_gradient up_in_base
##   down_gradient                0                0           1          3
##   equal expression             1                0           0          0
##   up_gradient                  0                1           0          0
##   up_in_base                   0                1           0          0

# and let's look at the actual genes
c4diff.a[, c("gene.id", "c3.pattern", "c4.pattern", "Description")]
##         gene.id       c3.pattern       c4.pattern                                         Description
## 2095  AT1G49400    down_gradient equal expression          Nucleic acid-binding, OB-fold-like protein
## 2414  AT1G55730      up_gradient    down_gradient                                  cation exchanger 5
## 5699  AT2G44580       up_in_base    down_gradient                                    zinc ion binding
## 9072  AT4G04500 equal expression       up_in_base cysteine-rich RLK (RECEPTOR-like protein kinase) 37
## 11765 AT5G12920       up_in_base    down_gradient     Transducin/WD40 repeat-like superfamily protein
## 13319 AT5G48100 equal expression      up_gradient             Laccase/Diphenol oxidase family protein
## 13586 AT5G52220       up_in_base    down_gradient
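
The logic of the strict filter can be checked on a toy table. The genes and values below are invented, and the sketch compares the two consensus columns directly rather than using the c3c4.same flag, which is equivalent once both columns are known to be non-NA.

```r
# Toy table (hypothetical values) with consensus patterns and joint probabilities
toy <- data.frame(
    gene = c("g1", "g2", "g3", "g4"),
    c3.pattern = c("up_gradient", "up_gradient", NA, "down_gradient"),
    c4.pattern = c("down_gradient", "up_gradient", "up_gradient", "up_gradient"),
    c3c4.prob = c(0.97, 0.99, 0.98, 0.60),
    stringsAsFactors = FALSE
)

# Both consensus calls must exist, must differ, and must be jointly well supported
strict <- subset(toy,
                 !is.na(c3.pattern) & !is.na(c4.pattern) &
                     c3.pattern != c4.pattern & c3c4.prob >= 0.95)
strict$gene
```

Only g1 passes: g2 has the same pattern in both groups, g3 has no C3 consensus, and g4 falls below the probability cut-off.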

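We can have a more relaxed definition by allowing the two C3 species to disagree with each other, as long as neither shows the C4 pattern, and by requiring only the C4 joint probability to reach 0.95: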

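# all conditions are joined with &; a third positional argument to subset() would be read as 'select'
c4diff.b <- subset(all, fr.pattern != c4.pattern & fp.pattern != c4.pattern & c4.prob >= 0.95 & !is.na(c4.pattern))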

# how many?
nrow(c4diff.b)
## [1] 365

# what patterns?
c4diff.b.tab <- table(c4diff.b[, c("c4.pattern", "c3.pattern")])
c4diff.b.tab
##                   c3.pattern
## c4.pattern         down_gradient down_in_base down_in_tip equal expression no significant pattern up_gradient up_in_base
##   2_1_3                        0            0           0                0                      1           0          0
##   down_gradient                0            0           1                0                     62           1          6
##   down_in_base                 0            0           0                0                      0           1          0
##   equal expression             1            0           0                0                     51           1          0
##   up_gradient                  0            2           0                1                     96           0          0
##   up_in_base                   0            0           0                1                      5           0          0
##   up_in_tip                    0            0           0                0                      0           0          0
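
The counts in the table above add up to 230, while c4diff.b has 365 rows. This is consistent with table() leaving out, by default, any gene whose c3.pattern is NA (that is, where the two C3 species disagree). The sketch below shows the effect on toy data with made-up values, and how useNA = "ifany" brings those genes back into the count.

```r
# Toy consensus calls (hypothetical): NA marks a pair that disagreed
toy <- data.frame(
    c4.pattern = c("up_gradient", "up_gradient", "down_gradient"),
    c3.pattern = c("no significant pattern", NA, NA),
    stringsAsFactors = FALSE
)

# Default: rows with an NA in either column are silently left out
n_default <- sum(table(toy))
n_default

# useNA = "ifany" adds an NA category so every gene is counted
n_all <- sum(table(toy, useNA = "ifany"))
n_all
```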

# and let's look at the actual genes
c4diff.b[, c("gene.id", "c3.pattern", "c4.pattern", "Description")]
##         gene.id             c3.pattern       c4.pattern                                                                                Description
## 15    AT1G01225                   <NA>    down_gradient                                                       NC domain-containing protein-related
## 21    AT1G01320                   <NA>     down_in_base                                    Tetratricopeptide repeat (TPR)-like superfamily protein
## 48    AT1G01920 no significant pattern    down_gradient                                                              SET domain-containing protein
## 70    AT1G02180                   <NA>     down_in_base                                                                         ferredoxin-related
## 77    AT1G02400 no significant pattern equal expression                                                                    gibberellin 2-oxidase 6
## 102   AT1G02910 no significant pattern      up_gradient                                          tetratricopeptide repeat (TPR)-containing protein
## 111   AT1G03050 no significant pattern      up_gradient                                                          ENTH/ANTH/VHS superfamily protein
## 114   AT1G03070 no significant pattern      up_gradient                                                             Bax inhibitor-1 family protein
## 123   AT1G03170 no significant pattern      up_gradient                                                      Protein of unknown function (DUF3049)
## 178   AT1G04130                   <NA>    down_gradient                                    Tetratricopeptide repeat (TPR)-like superfamily protein
## 187   AT1G04220 no significant pattern equal expression                                                                  3-ketoacyl-CoA synthase 2
## 234   AT1G04880 no significant pattern    down_gradient                  HMG (high mobility group) box protein with ARID/BRIGHT DNA-binding domain
## 268   AT1G05320 no significant pattern      up_gradient                                                                                           
## 357   AT1G06930 no significant pattern equal expression                                                                                           
## 379   AT1G07230 no significant pattern equal expression                                                              non-specific phospholipase C1
## 398   AT1G07530 no significant pattern      up_gradient                                                                          SCARECROW-like 14
## 473   AT1G08600                   <NA>    down_gradient                   P-loop containing nucleoside triphosphate hydrolases superfamily protein
## 521   AT1G09290 no significant pattern    down_gradient                                                                                           
## 527   AT1G09380 no significant pattern equal expression                                        nodulin MtN21 /EamA-like transporter family protein
## 588   AT1G10170 no significant pattern equal expression                                                                                NF-X-like 1
## 590   AT1G10200 no significant pattern      up_gradient                                  GATA type zinc finger transcription factor family protein
## 596   AT1G10310                   <NA>      up_gradient                                           NAD(P)-binding Rossmann-fold superfamily protein
## 701   AT1G11900 no significant pattern    down_gradient                                    Tetratricopeptide repeat (TPR)-like superfamily protein
## 738   AT1G12460 no significant pattern    down_gradient                                          Leucine-rich repeat protein kinase family protein
## 767   AT1G12930 no significant pattern    down_gradient                                                             ARM repeat superfamily protein
## 793   AT1G13310 no significant pattern      up_gradient                                    Endosomal targeting BRO1-like domain-containing protein
## 796   AT1G13340 no significant pattern      up_gradient                                      Regulator of Vps4 activity in the MVB pathway protein
## 868   AT1G14700                   <NA>      up_gradient                                                                  purple acid phosphatase 3
## 1037  AT1G17680 no significant pattern      up_gradient                                          tetratricopeptide repeat (TPR)-containing protein
## 1040  AT1G17720                   <NA>      up_gradient                                            Protein phosphatase 2A, regulatory subunit PR55
## 1146  AT1G19610                   <NA> equal expression                                                          Arabidopsis defensin-like protein
## 1148  AT1G19640                   <NA>      up_gradient                                                   jasmonic acid carboxyl methyltransferase
## 1211  AT1G20830                   <NA>      up_gradient                                                       multiple chloroplast division site 1
## 1253  AT1G21651                   <NA>      up_gradient                                                                           zinc ion binding
## 1254  AT1G21680 no significant pattern      up_gradient                                                        DPP6 N-terminal domain-like protein
## 1259  AT1G21722 no significant pattern equal expression                                                                                           
## 1325  AT1G22930 no significant pattern      up_gradient                                                                       T-complex protein 11
## 1326  AT1G22940                   <NA>      up_gradient                                                     thiamin biosynthesis protein, putative
## 1366  AT1G23740                   <NA>      up_gradient                                  Oxidoreductase, zinc-binding dehydrogenase family protein
## 1368  AT1G23780 no significant pattern      up_gradient                                                                       F-box family protein
## 1378  AT1G24040 no significant pattern      up_gradient                                      Acyl-CoA N-acyltransferases (NAT) superfamily protein
## 1398  AT1G24440                   <NA>      up_gradient                                                             RING/U-box superfamily protein
## 1407  AT1G24590 no significant pattern equal expression                                                                           DORNROSCHEN-like
## 1415  AT1G25260 no significant pattern    down_gradient                                                       Ribosomal protein L10 family protein
## 1449  AT1G26190                   <NA>    down_gradient                                                Phosphoribulokinase / Uridine kinase family
## 1555  AT1G28110                   <NA>    down_gradient                                                            serine carboxypeptidase-like 45
## 1637  AT1G29740 no significant pattern    down_gradient                                           Leucine-rich repeat transmembrane protein kinase
## 1679  AT1G30330 no significant pattern    down_gradient                                                                    auxin response factor 6
## 1719  AT1G31040                   <NA>    down_gradient                                                  PLATZ transcription factor family protein
## 1739  AT1G31410 no significant pattern      up_gradient                                             putrescine-binding periplasmic protein-related
## 1754  AT1G31790 no significant pattern equal expression                                    Tetratricopeptide repeat (TPR)-like superfamily protein
## 1760  AT1G31840                   <NA>       up_in_base                                    Tetratricopeptide repeat (TPR)-like superfamily protein
## 1828  AT1G33270 no significant pattern      up_gradient                      Acyl transferase/acyl hydrolase/lysophospholipase superfamily protein
## 1872  AT1G34360                   <NA>       up_in_base                                      translation initiation factor 3 (IF-3) family protein
## 1889  AT1G35210                   <NA>        up_in_tip                                                                                           
## 1935  AT1G42960                   <NA>      up_gradient                                                                                           
## 1960  AT1G44130 no significant pattern      up_gradient                                                Eukaryotic aspartyl protease family protein
## 1968  AT1G44770 no significant pattern      up_gradient                                                                                           
## 2095  AT1G49400          down_gradient equal expression                                                 Nucleic acid-binding, OB-fold-like protein
## 2159  AT1G50700 no significant pattern      up_gradient                                                        calcium-dependent protein kinase 33
## 2199  AT1G51630                   <NA>    down_gradient                                                        O-fucosyltransferase family protein
## 2296  AT1G53520                   <NA>      up_gradient                                                Chalcone-flavanone isomerase family protein
## 2320  AT1G53885 no significant pattern      up_gradient                                                       Protein of unknown function (DUF581)
## 2362  AT1G54710 no significant pattern      up_gradient                                                    homolog of yeast autophagy 18 (ATG18) H
## 2406  AT1G55535                   <NA>      up_gradient                                                                                           
## 2414  AT1G55730            up_gradient    down_gradient                                                                         cation exchanger 5
## 2476  AT1G57540                   <NA>    down_gradient                                                                                           
## 2525  AT1G59740 no significant pattern    down_gradient                                                      Major facilitator superfamily protein
## 2554  AT1G60420 no significant pattern      up_gradient                                                              DC1 domain-containing protein
## 2683  AT1G63220 no significant pattern    down_gradient                               Calcium-dependent lipid-binding (CaLB domain) family protein
## 2690  AT1G63410 no significant pattern      up_gradient                                                       Protein of unknown function (DUF567)
## 2700  AT1G63660                   <NA>    down_gradient      GMP synthase (glutamine-hydrolyzing), putative / glutamine amidotransferase, putative
## 2702  AT1G63690                   <NA>    down_gradient                                                            SIGNAL PEPTIDE PEPTIDASE-LIKE 2
## 2730  AT1G64330 no significant pattern    down_gradient                                                                 myosin heavy chain-related
## 2761  AT1G64750 no significant pattern equal expression                                                           deletion of SUV3 suppressor 1(I)
## 2795  AT1G65420                   <NA>      up_gradient                                                       Protein of unknown function (DUF565)
## 2806  AT1G65650 no significant pattern    down_gradient                                     Peptidase C12, ubiquitin carboxyl-terminal hydrolase 1
## 2870  AT1G67040 no significant pattern    down_gradient                                                                                           
## 2917  AT1G67730 no significant pattern    down_gradient                                                                  beta-ketoacyl reductase 1
## 2955  AT1G68220                   <NA>    down_gradient                                                      Protein of unknown function (DUF1218)
## 3006  AT1G68990 no significant pattern    down_gradient                                                               male gametophyte defective 3
## 3193  AT1G72330 no significant pattern      up_gradient                                                                 alanine aminotransferase 2
## 3210  AT1G72640 no significant pattern      up_gradient                                           NAD(P)-binding Rossmann-fold superfamily protein
## 3239  AT1G73177                   <NA>    down_gradient                                                                                     bonsai
## 3314  AT1G74470                   <NA>      up_gradient                               Pyridine nucleotide-disulphide oxidoreductase family protein
## 3361  AT1G75180 no significant pattern      up_gradient                                       Erythronate-4-phosphate dehydrogenase family protein
## 3377  AT1G75388                   <NA>      up_gradient                                            conserved peptide upstream open reading frame 5
## 3401  AT1G75750 no significant pattern      up_gradient                                                                    GAST1 protein homolog 1
## 3417  AT1G76060 no significant pattern equal expression                                              LYR family of Fe/S cluster biogenesis protein
## 3430  AT1G76250 no significant pattern      up_gradient                                                                                           
## 3442  AT1G76430 no significant pattern equal expression                                                                  phosphate transporter 1;9
## 3451  AT1G76560                   <NA>      up_gradient                                                           CP12 domain-containing protein 3
## 3473  AT1G76940 no significant pattern      up_gradient                                            RNA-binding (RRM/RBD/RNP motifs) family protein
## 3485  AT1G77130                   <NA>    down_gradient                                          plant glycogenin-like starch initiation protein 2
## 3551  AT1G78290                   <NA>      up_gradient                                                         Protein kinase superfamily protein
## 3580  AT1G78780 no significant pattern       up_in_base                                                        pathogenesis-related family protein
## 3626  AT1G79420                   <NA>    down_gradient                                                       Protein of unknown function (DUF620)
## 3645  AT1G79640 no significant pattern      up_gradient                                                         Protein kinase superfamily protein
## 3648  AT1G79670 no significant pattern equal expression                                                      Wall-associated kinase family protein
## 3709  AT1G80550 no significant pattern equal expression                                         Pentatricopeptide repeat (PPR) superfamily protein
## 3780  AT2G01650                   <NA>      up_gradient                                                      plant UBX domain-containing protein 2
## 3783  AT2G01680 no significant pattern      up_gradient                                                              Ankyrin repeat family protein
## 3840  AT2G02870                   <NA>      up_gradient                                         Galactose oxidase/kelch repeat superfamily protein
## 3900  AT2G04560                   <NA>    down_gradient                                                 transferases, transferring glycosyl groups
## 3923  AT2G05210 no significant pattern    down_gradient                                                 Nucleic acid-binding, OB-fold-like protein
## 4020  AT2G14960 no significant pattern    down_gradient                                                        Auxin-responsive GH3 family protein
## 4028  AT2G15290 no significant pattern      up_gradient                                            translocon at inner membrane of chloroplasts 21
## 4044  AT2G15860             up_in_base    down_gradient                                                                                           
## 4076  AT2G16750 no significant pattern      up_gradient                Protein kinase protein with adenine nucleotide alpha hydrolases-like domain
## 4146  AT2G18030 no significant pattern    down_gradient                                      Peptide methionine sulfoxide reductase family protein
## 4158  AT2G18230 no significant pattern      up_gradient                                                                        pyrophosphorylase 2
## 4160  AT2G18250                   <NA>      up_gradient                                                   4-phosphopantetheine adenylyltransferase
## 4171  AT2G18500                   <NA>       up_in_base                                                                     ovate family protein 7
## 4237  AT2G19900                   <NA>      up_gradient                                                                        NADP-malic enzyme 1
## 4280  AT2G20585 no significant pattern    down_gradient                                                                 nuclear fusion defective 6
## 4411  AT2G22950 no significant pattern equal expression                                            Cation transporter/ E1-E2 ATPase family protein
## 4463  AT2G24130                   <NA>    down_gradient                                   Leucine-rich receptor-like protein kinase family protein
## 4546  AT2G25820                   <NA>    down_gradient                                             Integrase-type DNA-binding superfamily protein
## 4575  AT2G26290 no significant pattern equal expression                                                                     root-specific kinase 1
## 4585  AT2G26490 no significant pattern      up_gradient                                            Transducin/WD40 repeat-like superfamily protein
## 4605  AT2G26700 no significant pattern    down_gradient            AGC (cAMP-dependent, cGMP-dependent and protein kinase C) kinase family protein
## 4621  AT2G26930 no significant pattern      up_gradient                                     4-(cytidine 5'-phospho)-2-C-methyl-D-erithritol kinase
## 4696  AT2G28200 no significant pattern      up_gradient                                                       C2H2-type zinc finger family protein
## 4742  AT2G28940 no significant pattern    down_gradient                                                         Protein kinase superfamily protein
## 4829  AT2G30520 no significant pattern      up_gradient                                                 Phototropic-responsive NPH3 family protein
## 4851  AT2G30890 no significant pattern    down_gradient                              Cytochrome b561/ferric reductase transmembrane protein family
## 4855  AT2G30933 no significant pattern equal expression                                         Carbohydrate-binding X8 domain superfamily protein
## 4870  AT2G31190                   <NA>      up_gradient                                                        Protein of unknown function, DUF647
## 4899  AT2G31670 no significant pattern      up_gradient                                         Stress responsive alpha-beta barrel domain protein
## 4921  AT2G32010                   <NA>    down_gradient                                                                                CVP2 like 1
## 4922  AT2G32040                   <NA>      up_gradient                                                      Major facilitator superfamily protein
## 4977  AT2G33100                   <NA>    down_gradient                                                                 cellulose synthase-like D1
## 5022  AT2G33793             up_in_base    down_gradient                                                                                           
## 5152  AT2G36110                   <NA>    down_gradient                       Polynucleotidyl transferase, ribonuclease H-like superfamily protein
## 5201  AT2G36800 no significant pattern      up_gradient                                                                  don-glucosyltransferase 1
## 5210  AT2G36890 no significant pattern    down_gradient                                            Duplicated homeodomain-like superfamily protein
## 5330  AT2G38640 no significant pattern      up_gradient                                                       Protein of unknown function (DUF567)
## 5479  AT2G40780 no significant pattern    down_gradient                                                 Nucleic acid-binding, OB-fold-like protein
## 5484  AT2G40830 no significant pattern      up_gradient                                                                         RING-H2 finger C1A
## 5620  AT2G43090 no significant pattern      up_gradient                                            Aconitase/3-isopropylmalate dehydratase protein
## 5636  AT2G43330            up_gradient     down_in_base                                                                     inositol transporter 1
## 5699  AT2G44580             up_in_base    down_gradient                                                                           zinc ion binding
## 5727  AT2G44980                   <NA>    down_gradient                        SNF2 domain-containing protein / helicase domain-containing protein
## 5758  AT2G45420 no significant pattern equal expression                                                           LOB domain-containing protein 18
## 5791  AT2G45800 no significant pattern      up_gradient                                  GATA type zinc finger transcription factor family protein
## 5801  AT2G46020 no significant pattern    down_gradient                                            transcription regulatory protein SNF2, putative
## 5842  AT2G46570 no significant pattern    down_gradient                                                                                  laccase 6
## 5900  AT2G47600 no significant pattern      up_gradient                                                                 magnesium/proton exchanger
## 5925  AT2G47970                   <NA> equal expression                                                     Nuclear pore localisation protein NPL4
## 6020  AT3G02200 no significant pattern    down_gradient                                                  Proteasome component (PCI) domain protein
## 6022  AT3G02220                   <NA>    down_gradient                                                                                           
## 6072  AT3G02875                   <NA>    down_gradient                                                       Peptidase M20/M25/M40 family protein
## 6080  AT3G03000 no significant pattern equal expression                                                     EF hand calcium-binding protein family
## 6109  AT3G03500                   <NA>      up_gradient                                                                         TatD related DNase
## 6212  AT3G05150 no significant pattern      up_gradient                                                      Major facilitator superfamily protein
## 6227  AT3G05345 no significant pattern      up_gradient                                                  Chaperone DnaJ-domain superfamily protein
## 6254  AT3G05690 no significant pattern      up_gradient                                                               nuclear factor Y, subunit A2
## 6275  AT3G06040 no significant pattern    down_gradient      Ribosomal protein L12/ ATP-dependent Clp protease adaptor protein ClpS family protein
## 6341  AT3G07010                   <NA>    down_gradient                                                      Pectin lyase-like superfamily protein
## 6453  AT3G08840 no significant pattern      up_gradient                                                         D-alanine--D-alanine ligase family
## 6459  AT3G08930                   <NA>    down_gradient                                                                LMBR1-like membrane protein
## 6599  AT3G10980                   <NA>      up_gradient                                                                       PLAC8 family protein
## 6776  AT3G13730 no significant pattern equal expression                                     cytochrome P450, family 90, subfamily D, polypeptide 1
## 6817  AT3G14240                   <NA>    down_gradient                                                                   Subtilase family protein
## 6839  AT3G14600 no significant pattern    down_gradient                                                  Ribosomal protein L18ae/LX family protein
## 6840  AT3G14610 no significant pattern      up_gradient                                     cytochrome P450, family 72, subfamily A, polypeptide 7
## 6940  AT3G15990                   <NA>    down_gradient                                                                    sulfate transporter 3;4
## 6968  AT3G16330                   <NA>    down_gradient                                                                                           
## 7022  AT3G17350                   <NA> equal expression                                                                                           
## 7059  AT3G17940 no significant pattern    down_gradient                                              Galactose mutarotase-like superfamily protein
## 7105  AT3G18550                   <NA> equal expression                                                           TCP family transcription factor 
## 7107  AT3G18570 no significant pattern equal expression                                                                     Oleosin family protein
## 7159  AT3G19440 no significant pattern    down_gradient                                                      Pseudouridine synthase family protein
## 7162  AT3G19490                   <NA>      up_gradient                                                               sodium:hydrogen antiporter 1
## 7221  AT3G20480                   <NA>    down_gradient                                             tetraacyldisaccharide 4'-kinase family protein
## 7243  AT3G20790 no significant pattern    down_gradient                                           NAD(P)-binding Rossmann-fold superfamily protein
## 7273  AT3G21250 no significant pattern      up_gradient                                                  multidrug resistance-associated protein 6
## 7289  AT3G21480 no significant pattern    down_gradient                                                  BRCT domain-containing DNA repair protein
## 7315  AT3G22160 no significant pattern equal expression                                                                VQ motif-containing protein
## 7330  AT3G22400                   <NA>      up_gradient                                     PLAT/LH2 domain-containing lipoxygenase family protein
## 7342  AT3G22560                   <NA> equal expression                                      Acyl-CoA N-acyltransferases (NAT) superfamily protein
## 7374  AT3G23150                   <NA>    down_gradient                         Signal transduction histidine kinase, hybrid-type, ethylene sensor
## 7400  AT3G23570            up_gradient equal expression                                                  alpha/beta-Hydrolases superfamily protein
## 7409  AT3G23690 no significant pattern      up_gradient                              basic helix-loop-helix (bHLH) DNA-binding superfamily protein
## 7416  AT3G23770 no significant pattern equal expression                                                    O-Glycosyl hydrolases family 17 protein
## 7521  AT3G25590                   <NA>    down_gradient                                                                                           
## 7666  AT3G28210 no significant pattern      up_gradient                                                      zinc finger (AN1-like) family protein
## 7674  AT3G28455                   <NA>    down_gradient                                                                    CLAVATA3/ESR-RELATED 25
## 7736  AT3G30725 no significant pattern equal expression                                                                         glutamine dumper 6
## 7808  AT3G45040 no significant pattern      up_gradient                                          phosphatidate cytidylyltransferase family protein
## 7830  AT3G45780 no significant pattern      up_gradient                                                                              phototropin 1
## 7865  AT3G46610 no significant pattern      up_gradient                                    Pentatricopeptide repeat (PPR-like) superfamily protein
## 7904  AT3G47630 no significant pattern    down_gradient                                                                                           
## 8042  AT3G49880                   <NA>      up_gradient                                                       glycosyl hydrolase family protein 43
## 8065  AT3G50560 no significant pattern      up_gradient                                           NAD(P)-binding Rossmann-fold superfamily protein
## 8168  AT3G52190                   <NA>    down_gradient                                                 phosphate transporter traffic facilitator1
## 8170  AT3G52210 no significant pattern    down_gradient                   S-adenosyl-L-methionine-dependent methyltransferases superfamily protein
## 8181  AT3G52350 no significant pattern equal expression                                                     D111/G-patch domain-containing protein
## 8225  AT3G53120                   <NA>      up_gradient                                                   Modifier of rudimentary (Mod(r)) protein
## 8252  AT3G53580           down_in_base      up_gradient                                                   diaminopimelate epimerase family protein
## 8346  AT3G54960 no significant pattern    down_gradient                                                                               PDI-like 1-3
## 8350  AT3G55000 no significant pattern    down_gradient                                                                     tonneau family protein
## 8598  AT3G59490 no significant pattern equal expression                                                                                           
## 8600  AT3G59520                   <NA> equal expression                                                                   RHOMBOID-like protein 13
## 8641  AT3G60280 no significant pattern      up_gradient                                                                               uclacyanin 3
## 8660  AT3G60550                   <NA>    down_gradient                                                                                cyclin p3;2
## 8674  AT3G60780 no significant pattern equal expression                                                      Protein of unknown function (DUF1442)
## 8680  AT3G60850 no significant pattern equal expression                                                                                           
## 8684  AT3G60910 no significant pattern      up_gradient                   S-adenosyl-L-methionine-dependent methyltransferases superfamily protein
## 8690  AT3G61080 no significant pattern      up_gradient                                                         Protein kinase superfamily protein
## 8748  AT3G62040 no significant pattern      up_gradient                             Haloacid dehalogenase-like hydrolase (HAD) superfamily protein
## 8762  AT3G62270 no significant pattern      up_gradient                                                                   HCO3- transporter family
## 8795  AT3G62950 no significant pattern      up_gradient                                                            Thioredoxin superfamily protein
## 8841  AT3G66658             up_in_base    down_gradient                                                                aldehyde dehydrogenase 22A1
## 8861  AT4G00335 no significant pattern       up_in_base                                                                         RING-H2 finger B1A
## 8864  AT4G00370 no significant pattern      up_gradient                                                      Major facilitator superfamily protein
## 8912  AT4G01240 no significant pattern equal expression                   S-adenosyl-L-methionine-dependent methyltransferases superfamily protein
## 8922  AT4G01440 no significant pattern    down_gradient                                        nodulin MtN21 /EamA-like transporter family protein
## 8923  AT4G01470 no significant pattern equal expression                                                            tonoplast intrinsic protein 1;3
## 8942  AT4G01880 no significant pattern    down_gradient                                                                         methyltransferases
## 8961  AT4G02170 no significant pattern equal expression                                                                                           
## 9007  AT4G02780                   <NA>    down_gradient                          Terpenoid cyclases/Protein prenyltransferases superfamily protein
## 9051  AT4G03520 no significant pattern      up_gradient                                                            Thioredoxin superfamily protein
## 9072  AT4G04500       equal expression       up_in_base                                        cysteine-rich RLK (RECEPTOR-like protein kinase) 37
## 9138  AT4G08280                   <NA>      up_gradient                                                            Thioredoxin superfamily protein
## 9148  AT4G08550                   <NA>    down_gradient                                        electron carriers;protein disulfide oxidoreductases
## 9189  AT4G09760 no significant pattern      up_gradient                                                         Protein kinase superfamily protein
## 9236  AT4G10750 no significant pattern    down_gradient                                             Phosphoenolpyruvate carboxylase family protein
## 9237  AT4G10760                   <NA>    down_gradient                                                                    mRNAadenosine methylase
## 9255  AT4G11060                   <NA>    down_gradient                               mitochondrially targeted single-stranded DNA binding protein
## 9270  AT4G11400                   <NA>    down_gradient                                         ARID/BRIGHT DNA-binding domain;ELM2 domain protein
## 9326  AT4G12620 no significant pattern    down_gradient                                                           origin of replication complex 1B
## 9365  AT4G13345 no significant pattern            2_1_3                      Serinc-domain containing serine and sphingolipid biosynthesis protein
## 9419  AT4G14180                   <NA>       up_in_base                                                 putative recombination initiation defect 1
## 9431  AT4G14350                   <NA>      up_gradient            AGC (cAMP-dependent, cGMP-dependent and protein kinase C) kinase family protein
## 9481  AT4G15093 no significant pattern      up_gradient                         catalytic LigB subunit of aromatic ring-opening dioxygenase family
## 9555  AT4G16370                   <NA>      up_gradient                                                                   oligopeptide transporter
## 9610  AT4G17180 no significant pattern       up_in_base                                                    O-Glycosyl hydrolases family 17 protein
## 9685  AT4G18350                   <NA> equal expression                                                     nine-cis-epoxycarotenoid dioxygenase 2
## 9730  AT4G19020 no significant pattern    down_gradient                                                                          chromomethylase 2
## 9745  AT4G19191 no significant pattern    down_gradient                                    Tetratricopeptide repeat (TPR)-like superfamily protein
## 9785  AT4G20050 no significant pattern equal expression                                                      Pectin lyase-like superfamily protein
## 9787  AT4G20070                   <NA>      up_gradient                                                                  allantoate amidohydrolase
## 9944  AT4G22910                   <NA>      up_gradient                                                                            FIZZY-related 2
## 9973  AT4G23550 no significant pattern equal expression                                                           WRKY family transcription factor
## 10046 AT4G24560                   <NA>      up_gradient                                                             ubiquitin-specific protease 16
## 10137 AT4G25980                   <NA>    down_gradient                                                             Peroxidase superfamily protein
## 10217 AT4G27130                   <NA>      up_gradient                                          Translation initiation factor SUI1 family protein
## 10221 AT4G27250 no significant pattern equal expression                                           NAD(P)-binding Rossmann-fold superfamily protein
## 10274 AT4G28070 no significant pattern      up_gradient                                                            AFG1-like ATPase family protein
## 10320 AT4G28820 no significant pattern      up_gradient                                                        HIT-type Zinc finger family protein
## 10339 AT4G29100           down_in_base      up_gradient                              basic helix-loop-helix (bHLH) DNA-binding superfamily protein
## 10341 AT4G29120 no significant pattern      up_gradient                                            6-phosphogluconate dehydrogenase family protein
## 10399 AT4G30160                   <NA>      up_gradient                                                                                   villin 4
## 10425 AT4G30550                   <NA>      up_gradient                                Class I glutamine amidotransferase-like superfamily protein
## 10444 AT4G30845 no significant pattern      up_gradient                                                                                           
## 10457 AT4G30993                   <NA>      up_gradient                               Calcineurin-like metallo-phosphoesterase superfamily protein
## 10459 AT4G31000 no significant pattern equal expression                                                                 Calmodulin-binding protein
## 10465 AT4G31115 no significant pattern      up_gradient                                                      Protein of unknown function (DUF1997)
## 10528 AT4G32160                   <NA>      up_gradient                                                        Phox (PX) domain-containing protein
## 10546 AT4G32400                   <NA>      up_gradient                                             Mitochondrial substrate carrier family protein
## 10581 AT4G32870 no significant pattern      up_gradient                       Polyketide cyclase/dehydrase and lipid transport superfamily protein
## 10596 AT4G33060                   <NA>    down_gradient                        Cyclophilin-like peptidyl-prolyl cis-trans isomerase family protein
## 10626 AT4G33510 no significant pattern      up_gradient                                       3-deoxy-d-arabino-heptulosonate 7-phosphate synthase
## 10630 AT4G33580 no significant pattern      up_gradient                                                                  beta carbonic anhydrase 5
## 10633 AT4G33630                   <NA>      up_gradient                                                      Protein of unknown function (DUF3506)
## 10642 AT4G33780                   <NA>      up_gradient                                                                                           
## 10645 AT4G33820                   <NA>      up_gradient                                                     Glycosyl hydrolase superfamily protein
## 10678 AT4G34260                   <NA>    down_gradient                                                                    1,2-alpha-L-fucosidases
## 10791 AT4G35905 no significant pattern equal expression                                                                                           
## 10851 AT4G36910                   <NA>      up_gradient                                           Cystathionine beta-synthase (CBS) family protein
## 10855 AT4G36945 no significant pattern equal expression                                            PLC-like phosphodiesterases superfamily protein
## 10867 AT4G37070                   <NA>    down_gradient                      Acyl transferase/acyl hydrolase/lysophospholipase superfamily protein
## 11023 AT4G39490 no significant pattern equal expression                                    cytochrome P450, family 96, subfamily A, polypeptide 10
## 11080 AT5G01250 no significant pattern equal expression                                               alpha 1,4-glycosyltransferase family protein
## 11098 AT5G01510 no significant pattern      up_gradient                                                        Protein of unknown function, DUF647
## 11140 AT5G02230 no significant pattern      up_gradient                             Haloacid dehalogenase-like hydrolase (HAD) superfamily protein
## 11177 AT5G02880 no significant pattern      up_gradient                                                                 ubiquitin-protein ligase 4
## 11200 AT5G03380 no significant pattern      up_gradient                                  Heavy metal transport/detoxification superfamily protein 
## 11314 AT5G05180 no significant pattern    down_gradient                                                                                           
## 11318 AT5G05220 no significant pattern      up_gradient                                                                                           
## 11329 AT5G05360                   <NA>      up_gradient                                                                                           
## 11330 AT5G05365 no significant pattern      up_gradient                                  Heavy metal transport/detoxification superfamily protein 
## 11380 AT5G06100 no significant pattern equal expression                                                                      myb domain protein 33
## 11474 AT5G07840                   <NA>      up_gradient                                                              Ankyrin repeat family protein
## 11483 AT5G07990                   <NA>    down_gradient                                                        Cytochrome P450 superfamily protein
## 11490 AT5G08100 no significant pattern equal expression                N-terminal nucleophile aminohydrolases (Ntn hydrolases) superfamily protein
## 11530 AT5G08570                   <NA>    down_gradient                                                             Pyruvate kinase family protein
## 11605 AT5G10170                   <NA>      up_gradient                                                        myo-inositol-1-phosphate synthase 3
## 11636 AT5G10720                   <NA>       up_in_base                                                                         histidine kinase 5
## 11762 AT5G12870                   <NA>     down_in_base                                                                      myb domain protein 46
## 11765 AT5G12920             up_in_base    down_gradient                                            Transducin/WD40 repeat-like superfamily protein
## 11859 AT5G14150 no significant pattern    down_gradient                                                        Protein of unknown function, DUF642
## 11861 AT5G14180 no significant pattern    down_gradient                                                            Myzus persicae-induced lipase 1
## 11869 AT5G14270 no significant pattern    down_gradient                                             bromodomain and extraterminal domain protein 9
## 11890 AT5G14580                   <NA>    down_gradient                                        polyribonucleotide nucleotidyltransferase, putative
## 12004 AT5G16290 no significant pattern    down_gradient                                                                          VALINE-TOLERANT 1
## 12069 AT5G17390 no significant pattern    down_gradient                               Adenine nucleotide alpha hydrolases-like superfamily protein
## 12129 AT5G18460 no significant pattern    down_gradient                                                       Protein of Unknown Function (DUF239)
## 12144 AT5G18640                   <NA>      up_gradient                                                  alpha/beta-Hydrolases superfamily protein
## 12155 AT5G18860                   <NA>      up_gradient                             inosine-uridine preferring nucleoside hydrolase family protein
## 12158 AT5G18910                   <NA>    down_gradient                                                         Protein kinase superfamily protein
## 12166 AT5G19010 no significant pattern      up_gradient                                                        mitogen-activated protein kinase 16
## 12205 AT5G19570                   <NA>    down_gradient                                                                                           
## 12222 AT5G19790                   <NA> equal expression                                                                          related to AP2 11
## 12225 AT5G19840 no significant pattern equal expression                    2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein
## 12232 AT5G19930 no significant pattern      up_gradient                                           Protein of unknown function DUF92, transmembrane
## 12276 AT5G20540 no significant pattern    down_gradient                                                                        BREVIS RADIX-like 4
## 12327 AT5G22000 no significant pattern      up_gradient                                                                          RING-H2 group F2A
## 12446 AT5G23950 no significant pattern      up_gradient                               Calcium-dependent lipid-binding (CaLB domain) family protein
## 12484 AT5G24600 no significant pattern equal expression                                                        Protein of unknown function, DUF599
## 12510 AT5G25120 no significant pattern equal expression                                     ytochrome p450, family 71, subfamily B, polypeptide 11
## 12533 AT5G25530 no significant pattern equal expression                                                             DNAJ heat shock family protein
## 12547 AT5G25820                   <NA> equal expression                                                                   Exostosin family protein
## 12594 AT5G26980 no significant pattern       up_in_base                                                                      syntaxin of plants 41
## 12759 AT5G37340                   <NA>    down_gradient                                                            ZPR1 zinc-finger domain protein
## 12765 AT5G37480 no significant pattern      up_gradient                                                                                           
## 12766 AT5G37490 no significant pattern equal expression                                                             ARM repeat superfamily protein
## 12921 AT5G40890 no significant pattern      up_gradient                                                                         chloride channel A
## 12977 AT5G41990                   <NA>    down_gradient                                                                with no lysine (K) kinase 8
## 13103 AT5G44010 no significant pattern    down_gradient                                                                                           
## 13118 AT5G44230                   <NA>    down_gradient                                         Pentatricopeptide repeat (PPR) superfamily protein
## 13234 AT5G46790 no significant pattern      up_gradient                                                                                PYR1-like 1
## 13245 AT5G47020 no significant pattern      up_gradient                                                                                           
## 13304 AT5G47860                   <NA>      up_gradient                                                      Protein of unknown function (DUF1350)
## 13319 AT5G48100       equal expression      up_gradient                                                    Laccase/Diphenol oxidase family protein
## 13358 AT5G48660 no significant pattern    down_gradient                                                B-cell receptor-associated protein 31-like 
## 13365 AT5G48800 no significant pattern       up_in_base                                                 Phototropic-responsive NPH3 family protein
## 13368 AT5G48830                   <NA>      up_gradient                                                                                           
## 13377 AT5G48940                   <NA>    down_gradient                            Leucine-rich repeat transmembrane protein kinase family protein
## 13406 AT5G49520                   <NA> equal expression                                                                WRKY DNA-binding protein 48
## 13427 AT5G49800                   <NA>    down_gradient                       Polyketide cyclase/dehydrase and lipid transport superfamily protein
## 13433 AT5G49890 no significant pattern      up_gradient                                                                         chloride channel C
## 13537 AT5G51460                   <NA>      up_gradient                             Haloacid dehalogenase-like hydrolase (HAD) superfamily protein
## 13553 AT5G51720 no significant pattern      up_gradient                                                           2 iron, 2 sulfur cluster binding
## 13581 AT5G52170 no significant pattern equal expression                                                                     homeodomain GLABROUS 7
## 13586 AT5G52220             up_in_base    down_gradient                                                                                           
## 13674 AT5G53660 no significant pattern    down_gradient                                                                 growth-regulating factor 7
## 13761 AT5G55180                   <NA>       up_in_base                                                    O-Glycosyl hydrolases family 17 protein
## 13765 AT5G55250 no significant pattern equal expression                                                            IAA carboxylmethyltransferase 1
## 13812 AT5G56090 no significant pattern    down_gradient                                                                    cytochrome c oxidase 15
## 13836 AT5G56530                   <NA>      up_gradient                                                       Protein of Unknown Function (DUF239)
## 13922 AT5G57815 no significant pattern equal expression                                           Cytochrome c oxidase, subunit Vib family protein
## 13929 AT5G57900 no significant pattern      up_gradient                                                                 SKP1 interacting partner 1
## 13964 AT5G58320 no significant pattern      up_gradient                                              Kinase interacting (KIP1-like) family protein
## 14003 AT5G58960 no significant pattern    down_gradient                                                 Plant protein of unknown function (DUF641)
## 14083 AT5G60490                   <NA> equal expression                                                  FASCICLIN-like arabinogalactan-protein 12
## 14084 AT5G60520 no significant pattern equal expression                                          Late embryogenesis abundant (LEA) protein-related
## 14130 AT5G61230                   <NA>      up_gradient                                                              Ankyrin repeat family protein
## 14188 AT5G62170 no significant pattern    down_gradient                                                                                           
## 14253 AT5G63120 no significant pattern    down_gradient                   P-loop containing nucleoside triphosphate hydrolases superfamily protein
## 14288 AT5G63640 no significant pattern      up_gradient                                                                ENTH/VHS/GAT family protein
## 14298 AT5G63810            down_in_tip    down_gradient                                                                      beta-galactosidase 10
## 14305 AT5G63905 no significant pattern      up_gradient                                                                                           
## 14367 AT5G64750                   <NA>       up_in_base                                             Integrase-type DNA-binding superfamily protein
## 14390 AT5G65140 no significant pattern      up_gradient                             Haloacid dehalogenase-like hydrolase (HAD) superfamily protein
## 14391 AT5G65160                   <NA>    down_gradient                                          tetratricopeptide repeat (TPR)-containing protein
## 14395 AT5G65210                   <NA>      up_gradient                                                   bZIP transcription factor family protein
## 14406 AT5G65420                   <NA>       up_in_base                                                                                CYCLIN D4;1
## 14465 AT5G66130                   <NA> equal expression                                                                     RADIATION SENSITIVE 17
## 14487 AT5G66460                   <NA>    down_gradient                                                     Glycosyl hydrolase superfamily protein
## 14493 AT5G66560 no significant pattern    down_gradient                                                 Phototropic-responsive NPH3 family protein
## 14499 AT5G66680 no significant pattern    down_gradient dolichyl-diphosphooligosaccharide-protein glycosyltransferase 48kDa subunit family protein
## 14587 ATCG00750 no significant pattern equal expression                                                                      ribosomal protein S11
## 14606 ATMG00080 no significant pattern    down_gradient                                                                      ribosomal protein L16

That's a lot of genes, and our confidence that a pattern is correct is not the same thing as our confidence that the gene is involved in C4.
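
To put a number on "a lot", one option is to tabulate the candidates by their C3/C4 pattern combination. The sketch below is illustrative only: the data frame name `candidates` is hypothetical, and it is built by hand from the first five rows shown in this part of the table rather than from the real filtered results.

```r
# Toy data: five rows copied from the table above. The name `candidates`
# is an assumption; substitute the real filtered data frame.
candidates <- data.frame(
    gene.id = c("AT5G51460", "AT5G51720", "AT5G52170", "AT5G52220", "AT5G53660"),
    c3.pattern = c(NA, "no significant pattern", "no significant pattern",
                   "up_in_base", "no significant pattern"),
    c4.pattern = c("up_gradient", "up_gradient", "equal expression",
                   "down_gradient", "down_gradient"),
    stringsAsFactors = FALSE
)

# Count genes for each C3/C4 pattern combination; useNA = "ifany" keeps
# genes whose C3 species disagree (NA) as their own row
combo.counts <- table(c3 = candidates$c3.pattern,
                      c4 = candidates$c4.pattern,
                      useNA = "ifany")
print(combo.counts)

# Total number of candidate genes
n.candidates <- nrow(candidates)
```

A table like this only describes how the candidates are distributed across pattern combinations. It says nothing about how likely each gene is to be involved in C4.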

To assign each gene a confidence that it is involved in C4, we need an explicit model of what it means to be C4. We will develop this in the next stage of the analysis.

This will make us more confident, and more precise about how confident we are, by including information about the absolute level of expression as well as the developmental data and cell-type specificity in other species.