Summary: Genome-wide expression analysis of liver samples from 92 adults and 14 fetuses.

Experimental design: The study comprises 106 samples, 14 from fetuses and 92 from adults, with no replicates. The 14 fetal liver samples were obtained at gestational weeks 8-12. The adult liver samples were collected from 50 organ donors who died accidentally and from 42 patients undergoing partial liver resection for malignant tumors, most often metastatic colon cancer. The liver biopsies from these patients were taken from "healthy" tissue that showed no visible pathological changes relative to the adjacent tumor.

The rest of the introduction and the data analysis are available at:
http://rpubs.com/jauger/79235

1. Data description

Excerpt of the raw data matrix

## Inputting the data ...
## 
## Adding nuID to the data ...
## No Quality Control assessment of the object because it is not a "LumiBatch" object.
GSM1503720 GSM1503721 GSM1503722 GSM1503723 GSM1503724
Ku8QhfS0n_hIOABXuE 0.4222655 -1.954461 -0.2296993 3.067514 -3.8966110
fqPEquJRRlSVSfL.8A 8.2294550 5.386535 2.5563270 7.600649 7.2569830
ckiehnugOno9d7vf1Q 0.4110274 3.231216 2.7552630 3.171007 0.0444438
x57Vw5B5Fbt5JUnQkI 770.7137000 746.793900 721.9576000 672.142500 652.6106000
ritxUH.kuHlYqjozpE 1214.0710000 1237.507000 1142.0940000 1161.944000 1205.4240000
QpE5UiUgmJOJEkPXpc 12.7572300 23.374980 22.8699200 21.240150 13.7283700

Note that these data are not normalized: within this excerpt alone, values range from about -3.9 to more than 1,200.

2. Quality control

Quality control is important to ensure that the data being compared are truly comparable with one another.

Data standardization

The data are standardized with decostand (vegan package). This method centers the mean at zero and scales the variance to one.
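The centering and scaling step can be sketched in a few lines. This is a minimal Python illustration of the arithmetic only, not the decostand call used in the report; it assumes the sample standard deviation (n - 1 denominator), and it standardizes the six values shown for sample GSM1503720 in the raw excerpt. Whether the report standardized by row or by column is not stated.

```python
from statistics import mean, stdev

def standardize(values):
    """Center values to zero mean and scale them to unit sample variance."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("cannot standardize a constant vector")
    return [(v - m) / s for v in values]

# The six values printed for sample GSM1503720 in the raw data excerpt.
raw = [0.4222655, 8.2294550, 0.4110274, 770.7137, 1214.071, 12.75723]
z = standardize(raw)

print(round(mean(z), 6), round(stdev(z), 6))
```

After this transformation the vector has mean zero and standard deviation one, whatever the original scale was.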

Quality control of the already-normalized data

Excerpt of the gene expression matrix

Fetal Fetal Adult_accident Adult_accident Adult_cancer Adult_cancer
fqPEquJRRlSVSfL.8A 4.248890 4.021668 4.853600 4.623503 4.286804 4.187977
ckiehnugOno9d7vf1Q 3.589026 3.853151 3.996015 4.424531 4.851610 4.779911
x57Vw5B5Fbt5JUnQkI 9.482884 9.337328 9.012896 10.959112 10.552464 10.750783
ritxUH.kuHlYqjozpE 10.167734 10.142569 10.035402 11.588105 11.259778 11.375939
QpE5UiUgmJOJEkPXpc 4.480946 4.816507 4.932566 7.313574 6.909689 6.544238
Bx496XsFXiAlj.Eaeo 5.122735 5.078283 3.765689 5.509650 6.467909 7.124121

Summary of the matrix

3. Gene annotation

Determining Entrez Gene IDs from nuIDs

Gene names: nuID (nucleotide universal identifier). A nuID is an identifier derived from a nucleotide sequence (here, the probe sequence); it can be translated back into that sequence with the function id2seq. Example: Ku8QhfS0n_hIOABXuE

## [1] "GTGTTACAAGACCTTCAGTCAGCTTTGGACAGAATGAAAAACCCTGTGAC"
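To make the decoding concrete, here is a rough Python sketch of how a nuID can be turned back into a nucleotide sequence. It is not the lumi implementation of id2seq. It assumes a 64-character alphabet (A-Z, a-z, 0-9, `_`, `.`) in which each character after the first carries three nucleotides at two bits each (A=0, C=1, G=2, T=3). The handling of the first (check) character, taken here to store the number of padding nucleotides as its remainder modulo 3, is an assumption inferred from the single example above. The function name `nuid_to_seq` is hypothetical.

```python
import string

# 64-character alphabet: A-Z, a-z, 0-9, '_' and '.'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_."
CODE = {ch: i for i, ch in enumerate(ALPHABET)}
BASES = "ACGT"  # two bits per nucleotide: A=0, C=1, G=2, T=3

def nuid_to_seq(nuid):
    """Decode a nuID into its nucleotide sequence (illustrative sketch)."""
    values = [CODE[ch] for ch in nuid]
    check, body = values[0], values[1:]
    # Assumption inferred from the example: the check character stores the
    # number of padding nucleotides as its remainder modulo 3.
    padding = check % 3
    seq = []
    for v in body:
        # Each character carries three nucleotides, most significant bits first.
        seq.append(BASES[(v >> 4) & 3])
        seq.append(BASES[(v >> 2) & 3])
        seq.append(BASES[v & 3])
    seq = "".join(seq)
    return seq[:len(seq) - padding] if padding else seq

decoded = nuid_to_seq("Ku8QhfS0n_hIOABXuE")
print(decoded)
```

For the example identifier, the 17 body characters decode to 51 nucleotides, and one padding nucleotide is dropped to give the 50-nucleotide probe sequence.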

From the nuID, the corresponding gene (Entrez ID) can be retrieved with nuID2EntrezID. Example: Ku8QhfS0n_hIOABXuE

## The provided names of filterTh does not match the field names of nuID_MappingInfo table!
##  No filtering will be performed!
## Ku8QhfS0n_hIOABXuE 
##           "346389"

Excerpt of the expression matrix with Entrez Gene IDs

Fetal Fetal Fetal Fetal Fetal
1 4.248890 4.021668 3.772319 4.357742 4.087132
1 3.589026 3.853151 3.789936 3.963339 3.489125
29974 9.482884 9.337328 9.378861 9.595830 9.161382
29974 10.167734 10.142569 10.085749 10.387303 10.073084
29974 4.480946 4.816507 4.843682 5.141674 4.435351
87769 5.122735 5.078283 4.691750 4.980442 4.832459

Gene annotation with BioMart

BioMart is a set of databases for gene annotation (for example, the one hosted by Ensembl); it is queried from R through the Bioconductor package biomaRt.

Excerpt of the annotation data

4. Differential analysis

Pairwise comparisons (Fisher statistics)

3 groups: Fetus, Adult-accident and Adult-cancer

Design matrix

fetal accident cancer
1 1 0 0
2 1 0 0
3 1 0 0
15 0 1 0
16 0 1 0
17 0 1 0
18 0 1 0
65 0 0 1
66 0 0 1
67 0 0 1
68 0 0 1
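A design matrix of this shape is simply a one-hot encoding of the group label of each sample. The Python sketch below rebuilds it from the sample counts given in the experimental design (14 fetal, 50 accident, 42 cancer). The ordering of the samples (fetal first, then accident, then cancer) is an assumption that matches the row numbers shown above, and the function name `design_matrix` is hypothetical.

```python
GROUPS = ["fetal", "accident", "cancer"]

def design_matrix(labels, groups=GROUPS):
    """Return one indicator row per sample: 1 in the column of its group."""
    return [[1 if label == g else 0 for g in groups] for label in labels]

# Sample counts from the experimental design; the ordering is an assumption.
labels = ["fetal"] * 14 + ["accident"] * 50 + ["cancer"] * 42
design = design_matrix(labels)

# Column sums give the number of samples per group.
column_sums = [sum(row[j] for row in design) for j in range(len(GROUPS))]
print(column_sums)
```

Each row contains exactly one 1, so the fitted coefficients of a linear model on this matrix are the per-group means.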

Linear model

Contrast matrix
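The report does not print the contrast matrix itself. As a hedged illustration, three pairwise comparisons over the design columns (fetal, accident, cancer) could be encoded as below. The contrast names and the sign conventions are assumptions, and the group means are hypothetical values chosen only to show the arithmetic.

```python
# Weights follow the design-matrix columns: fetal, accident, cancer.
# Names and signs are assumptions; the report does not show its contrasts.
CONTRASTS = {
    "fetal_vs_accident":  [1, -1, 0],
    "fetal_vs_cancer":    [1, 0, -1],
    "cancer_vs_accident": [0, -1, 1],
}

def apply_contrast(group_means, weights):
    """Combine per-group mean expression values into one difference."""
    return sum(w * m for w, m in zip(weights, group_means))

# Hypothetical per-group means for a single gene (not from the study).
means = [9.0, 7.5, 8.0]
log_fc = {name: apply_contrast(means, w) for name, w in CONTRASTS.items()}
print(log_fc)
```

Each contrast sums to zero, so it measures a difference between groups rather than an overall expression level.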

First rows of the annotated topTable, Fetus vs Accident

entrezgene logFC AveExpr t P.Value adj.P.Val B hgnc_symbol ensembl_transcript_id description
1 4.9918410 4.847916 54.291472 0.0000000 0.0000000 172.925315 A1BG ENST00000596924 alpha-1-B glycoprotein [Source:HGNC Symbol;Acc:HGNC:5]
10 0.1454660 5.712242 1.675329 0.0967487 0.1553646 -6.503711 NAT2 ENST00000286479 N-acetyltransferase 2 (arylamine N-acetyltransferase) [Source:HGNC Symbol;Acc:HGNC:7646]
100 2.7062941 4.668284 21.099002 0.0000000 0.0000000 81.040987 ADA ENST00000464097 adenosine deaminase [Source:HGNC Symbol;Acc:HGNC:186]
1000 -0.8827620 6.229116 -7.009722 0.0000000 0.0000000 12.518572 CDH2 ENST00000269141 cadherin 2, type 1, N-cadherin (neuronal) [Source:HGNC Symbol;Acc:HGNC:1759]
10000 2.1740491 5.304701 16.845899 0.0000000 0.0000000 62.281163 AKT3 ENST00000336199 v-akt murine thymoma viral oncogene homolog 3 [Source:HGNC Symbol;Acc:HGNC:393]
10001 0.2052302 5.153271 1.932780 0.0558693 0.0961727 -6.049320 MED6 ENST00000554963 mediator complex subunit 6 [Source:HGNC Symbol;Acc:HGNC:19970]

First rows of the annotated topTable, Fetus vs Cancer

entrezgene logFC AveExpr t P.Value adj.P.Val B hgnc_symbol ensembl_transcript_id description
1 5.0716538 4.847916 54.045072 0.0000000 0.0000000 172.401768 A1BG ENST00000596924 alpha-1-B glycoprotein [Source:HGNC Symbol;Acc:HGNC:5]
10 -0.1798743 4.265716 -1.792586 0.0758251 0.1217619 -6.283593 NAT2 ENST00000286479 N-acetyltransferase 2 (arylamine N-acetyltransferase) [Source:HGNC Symbol;Acc:HGNC:7646]
100 3.8334779 4.301521 20.785653 0.0000000 0.0000000 79.744560 ADA ENST00000464097 adenosine deaminase [Source:HGNC Symbol;Acc:HGNC:186]
1000 -0.6117572 5.154613 -7.322148 0.0000000 0.0000000 14.068421 CDH2 ENST00000269141 cadherin 2, type 1, N-cadherin (neuronal) [Source:HGNC Symbol;Acc:HGNC:1759]
10000 -3.1798082 6.516053 -16.882427 0.0000000 0.0000000 62.464659 AKT3 ENST00000336199 v-akt murine thymoma viral oncogene homolog 3 [Source:HGNC Symbol;Acc:HGNC:393]
10001 -0.2513475 6.903630 -2.053559 0.0424223 0.0730251 -5.794572 MED6 ENST00000554963 mediator complex subunit 6 [Source:HGNC Symbol;Acc:HGNC:19970]

First rows of the annotated topTable, Cancer vs Accident

entrezgene logFC AveExpr t P.Value adj.P.Val B hgnc_symbol ensembl_transcript_id description
1 1.8184439 6.864074 18.121214 0.0000000 0.0000000 67.804833 A1BG ENST00000596924 alpha-1-B glycoprotein [Source:HGNC Symbol;Acc:HGNC:5]
10 -0.1049068 5.478501 -1.293088 0.1987265 0.3191264 -6.099121 NAT2 ENST00000286479 N-acetyltransferase 2 (arylamine N-acetyltransferase) [Source:HGNC Symbol;Acc:HGNC:7646]
100 1.5551068 7.877822 9.050954 0.0000000 0.0000000 23.481651 ADA ENST00000464097 adenosine deaminase [Source:HGNC Symbol;Acc:HGNC:186]
1000 0.3753385 3.794488 4.369870 0.0000286 0.0001825 1.857020 CDH2 ENST00000269141 cadherin 2, type 1, N-cadherin (neuronal) [Source:HGNC Symbol;Acc:HGNC:1759]
10000 -1.1855973 6.991491 -7.958845 0.0000000 0.0000000 17.948722 AKT3 ENST00000336199 v-akt murine thymoma viral oncogene homolog 3 [Source:HGNC Symbol;Acc:HGNC:393]
10001 -0.2109085 3.763177 -1.466952 0.1452810 0.2500659 -5.863443 MED6 ENST00000554963 mediator complex subunit 6 [Source:HGNC Symbol;Acc:HGNC:19970]
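The adj.P.Val column holds p-values corrected for multiple testing. limma's topTable uses the Benjamini-Hochberg method by default; whether the report kept that default is not stated, so the following Python sketch is only an illustration of that procedure, run on hypothetical p-values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, not taken from the study.
pvals = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(pvals)
print(adj)
```

An adjusted value is never smaller than the raw p-value, which is why adj.P.Val is at least as large as P.Value in every row of the tables.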

The p-value histograms show, for each pair of groups tested, the distribution of p-values. Here, many genes are very significantly modulated (low p-values, forming a peak near zero), while the remaining p-values are distributed fairly uniformly.
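The binning behind such a histogram can be sketched as follows. This Python example only counts values per bin and does not draw the plot. It uses the 18 P.Value entries printed in the three topTable excerpts, where values shown as 0.0000000 are rounded, so it illustrates the mechanics rather than the full distribution. The bin count of 20 is an arbitrary choice.

```python
def pvalue_histogram(pvalues, n_bins=20):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        # A p-value of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

# The 18 P.Value entries printed in the three topTable excerpts.
pvals = [0.0, 0.0967487, 0.0, 0.0, 0.0, 0.0558693,
         0.0, 0.0758251, 0.0, 0.0, 0.0, 0.0424223,
         0.0, 0.1987265, 0.0, 0.0000286, 0.0, 0.1452810]
counts = pvalue_histogram(pvals, n_bins=20)
print(counts[:4])
```

Even in this tiny excerpt most values land in the first bin, which is the peak near zero described in the text.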

MA plots (AveExpr, logFC)

Volcano plots

Venn diagram

5. Gene selection

10 most significant genes, fetal-accident

entrezgene logFC AveExpr t P.Value adj.P.Val B hgnc_symbol ensembl_transcript_id description
9195 87769 6.079831 4.041707 51.11015 0 0 166.78107 GGACT ENST00000376250 gamma-glutamylamine cyclotransferase [Source:HGNC Symbol;Acc:HGNC:25100]
8677 8086 5.788777 5.513322 42.96708 0 0 149.17869 AAAS ENST00000209873 achalasia, adrenocortical insufficiency, alacrimia [Source:HGNC Symbol;Acc:HGNC:13666]
2930 22848 4.991717 4.551680 38.09995 0 0 137.09445 AAK1 ENST00000606389 AP2 associated kinase 1 [Source:HGNC Symbol;Acc:HGNC:19679]
9172 8714 3.171396 4.368388 29.04971 0 0 110.47469 ABCC3 ENST00000515707 ATP-binding cassette, sub-family C (CFTR/MRP), member 3 [Source:HGNC Symbol;Acc:HGNC:54]
147 10057 2.800792 4.353750 28.93665 0 0 110.10082 ABCC5 ENST00000334444 ATP-binding cassette, sub-family C (CFTR/MRP), member 5 [Source:HGNC Symbol;Acc:HGNC:56]
150 10060 8.621162 5.783661 28.34818 0 0 108.13614 ABCC9 ENST00000261200 ATP-binding cassette, sub-family C (CFTR/MRP), member 9 [Source:HGNC Symbol;Acc:HGNC:60]
3023 23 3.545276 3.996892 28.07968 0 0 107.22906 ABCF1 ENST00000421042 ATP-binding cassette, sub-family F (GCN20), member 1 [Source:HGNC Symbol;Acc:HGNC:70]
1518 130013 4.751028 5.351986 23.43176 0 0 90.38699 ACMSD ENST00000356140 aminocarboxymuconate semialdehyde decarboxylase [Source:HGNC Symbol;Acc:HGNC:19288]
7197 57001 4.829235 5.045363 23.42978 0 0 90.37932 ACN9 ENST00000432641 ACN9 homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:HGNC:21752]
3526 2515 2.685221 4.895690 20.97645 0 0 80.53207 ADAM2 ENST00000620181 ADAM metallopeptidase domain 2 [Source:HGNC Symbol;Acc:HGNC:198]

10 most significant genes, cancer-accident

entrezgene logFC AveExpr t P.Value adj.P.Val B hgnc_symbol ensembl_transcript_id description
6529 53947 1.9069903 6.971274 13.574875 0 0 46.55687 A4GALT ENST00000401850 alpha 1,4-galactosyltransferase [Source:HGNC Symbol;Acc:HGNC:18149]
3023 23 1.9269182 7.932809 10.689180 0 0 31.91840 ABCF1 ENST00000421042 ATP-binding cassette, sub-family F (GCN20), member 1 [Source:HGNC Symbol;Acc:HGNC:70]
9128 86 -0.7143012 9.169498 -9.271966 0 0 24.61483 ACTL6A ENST00000429709 actin-like 6A [Source:HGNC Symbol;Acc:HGNC:24124]
1457 128 0.7416583 4.741236 8.618977 0 0 21.27755 ADH5 ENST00000296412 alcohol dehydrogenase 5 (class III), chi polypeptide [Source:HGNC Symbol;Acc:HGNC:253]
1017 113622 1.3411958 6.250165 8.481664 0 0 20.58066 ADPRHL1 ENST00000356501 ADP-ribosylhydrolase like 1 [Source:HGNC Symbol;Acc:HGNC:21303]
1293 122622 -0.9225160 7.012997 -8.431463 0 0 20.32641 ADSSL1 ENST00000330877 adenylosuccinate synthase like 1 [Source:HGNC Symbol;Acc:HGNC:20093]
496 10555 -1.2335417 4.202796 -8.272638 0 0 19.52401 AGPAT2 ENST00000371694 1-acylglycerol-3-phosphate O-acyltransferase 2 [Source:HGNC Symbol;Acc:HGNC:325]
2522 199 0.6793750 9.047294 8.142941 0 0 18.87119 AIF1 ENST00000383474 allograft inflammatory factor 1 [Source:HGNC Symbol;Acc:HGNC:352]
2687 210 -1.0594993 6.793789 -7.952000 0 0 17.91452 ALAD ENST00000409155 aminolevulinate dehydratase [Source:HGNC Symbol;Acc:HGNC:395]
1281 121642 1.4675434 8.091083 7.775714 0 0 17.03641 ALKBH2 ENST00000343075 alkB, alkylation repair homolog 2 (E. coli) [Source:HGNC Symbol;Acc:HGNC:32487]

10 most significant genes, fetal-cancer

entrezgene logFC AveExpr t P.Value adj.P.Val B hgnc_symbol ensembl_transcript_id description
4720 29974 6.098110 4.708711 52.27345 0 0 169.01512 A1CF ENST00000374001 APOBEC1 complementation factor [Source:HGNC Symbol;Acc:HGNC:24086]
9195 87769 6.172259 4.041707 50.83881 0 0 166.18812 GGACT ENST00000376250 gamma-glutamylamine cyclotransferase [Source:HGNC Symbol;Acc:HGNC:25100]
5709 404744 6.653217 4.822093 44.43247 0 0 152.53428 NPSR1-AS1 ENST00000419766 NPSR1 antisense RNA 1 [Source:HGNC Symbol;Acc:HGNC:22128]
8677 8086 6.143836 4.569347 40.50610 0 0 143.20907 AAAS ENST00000209873 achalasia, adrenocortical insufficiency, alacrimia [Source:HGNC Symbol;Acc:HGNC:13666]
2930 22848 7.645796 4.468302 37.23930 0 0 134.79454 AAK1 ENST00000606389 AP2 associated kinase 1 [Source:HGNC Symbol;Acc:HGNC:19679]
9172 8714 4.515019 3.448558 28.48178 0 0 108.58259 ABCC3 ENST00000515707 ATP-binding cassette, sub-family C (CFTR/MRP), member 3 [Source:HGNC Symbol;Acc:HGNC:54]
147 10057 4.687815 6.054722 27.99468 0 0 106.93890 ABCC5 ENST00000334444 ATP-binding cassette, sub-family C (CFTR/MRP), member 5 [Source:HGNC Symbol;Acc:HGNC:56]
150 10060 2.749911 4.353750 27.83695 0 0 106.40187 ABCC9 ENST00000261200 ATP-binding cassette, sub-family C (CFTR/MRP), member 9 [Source:HGNC Symbol;Acc:HGNC:60]
3704 25864 2.660306 3.986497 26.26922 0 0 100.93341 ABHD14A ENST00000497864 abhydrolase domain containing 14A [Source:HGNC Symbol;Acc:HGNC:24538]
1518 130013 4.555676 6.292746 22.92540 0 0 88.41793 ACMSD ENST00000356140 aminocarboxymuconate semialdehyde decarboxylase [Source:HGNC Symbol;Acc:HGNC:19288]

10 most up-regulated genes (fetus-accident)

entrezgene logFC AveExpr t P.Value adj.P.Val B hgnc_symbol ensembl_transcript_id description
150 10060 8.621162 5.783661 28.34818 0 0 108.13614 ABCC9 ENST00000261200 ATP-binding cassette, sub-family C (CFTR/MRP), member 9 [Source:HGNC Symbol;Acc:HGNC:60]
8677 8086 5.788777 5.513322 42.96708 0 0 149.17869 AAAS ENST00000209873 achalasia, adrenocortical insufficiency, alacrimia [Source:HGNC Symbol;Acc:HGNC:13666]
2930 22848 4.991717 4.551680 38.09995 0 0 137.09445 AAK1 ENST00000606389 AP2 associated kinase 1 [Source:HGNC Symbol;Acc:HGNC:19679]
7197 57001 4.829235 5.045363 23.42978 0 0 90.37932 ACN9 ENST00000432641 ACN9 homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:HGNC:21752]
1518 130013 4.751028 5.351986 23.43176 0 0 90.38699 ACMSD ENST00000356140 aminocarboxymuconate semialdehyde decarboxylase [Source:HGNC Symbol;Acc:HGNC:19288]
3310 23400 4.675468 10.199897 11.78558 0 0 37.13690 ATP13A2 ENST00000326735 ATPase type 13A2 [Source:HGNC Symbol;Acc:HGNC:30213]
5582 396 4.636584 10.263277 13.00729 0 0 43.44072 ARHGDIA ENST00000582520 Rho GDP dissociation inhibitor (GDI) alpha [Source:HGNC Symbol;Acc:HGNC:678]
3940 26286 3.860128 4.694028 13.23203 0 0 44.58736 ARFGAP3 ENST00000263245 ADP-ribosylation factor GTPase activating protein 3 [Source:HGNC Symbol;Acc:HGNC:661]
5792 417 3.800786 3.742609 12.43207 0 0 40.48627 ART1 ENST00000529556 ADP-ribosyltransferase 1 [Source:HGNC Symbol;Acc:HGNC:723]
3023 23 3.545276 3.996892 28.07968 0 0 107.22906 ABCF1 ENST00000421042 ATP-binding cassette, sub-family F (GCN20), member 1 [Source:HGNC Symbol;Acc:HGNC:70]

10 most down-regulated genes (fetus-accident)

entrezgene logFC AveExpr t P.Value adj.P.Val B hgnc_symbol ensembl_transcript_id description
4838 312 -5.078377 9.635645 -14.143339 0 0 49.18735 ANXA13 ENST00000262219 annexin A13 [Source:HGNC Symbol;Acc:HGNC:536]
7234 57143 -4.291101 8.338213 -20.293894 0 0 77.66409 ADCK1 ENST00000238561 aarF domain containing kinase 1 [Source:HGNC Symbol;Acc:HGNC:19038]
4533 28984 -4.188437 6.616277 -9.639309 0 0 25.90839 RGCC ENST00000379359 regulator of cell cycle [Source:HGNC Symbol;Acc:HGNC:20369]
1448 127687 -3.750984 11.775076 -8.882910 0 0 21.97025 C1orf122 ENST00000373043 chromosome 1 open reading frame 122 [Source:HGNC Symbol;Acc:HGNC:24789]
4537 28990 -3.644817 5.803626 -12.061201 0 0 38.56817 ASTE1 ENST00000514044 asteroid homolog 1 (Drosophila) [Source:HGNC Symbol;Acc:HGNC:25021]
172 10093 -3.610703 8.764294 -12.532255 0 0 41.00275 ARPC4 ENST00000498623 actin related protein 2/3 complex, subunit 4, 20kDa [Source:HGNC Symbol;Acc:HGNC:707]
3336 23452 -3.431124 7.069440 -14.995831 0 0 53.41076 ANGPTL2 ENST00000373425 angiopoietin-like 2 [Source:HGNC Symbol;Acc:HGNC:490]
1824 145645 -3.402382 10.624793 -9.400709 0 0 24.66232 C15orf43 ENST00000340827 chromosome 15 open reading frame 43 [Source:HGNC Symbol;Acc:HGNC:28520]
1943 149563 -3.296115 8.596678 -8.721819 0 0 21.13724 C1orf64 ENST00000329454 chromosome 1 open reading frame 64 [Source:HGNC Symbol;Acc:HGNC:28339]
1223 118663 -3.004229 8.820245 -10.207852 0 0 28.88548 BTBD16 ENST00000260723 BTB (POZ) domain containing 16 [Source:HGNC Symbol;Acc:HGNC:26340]
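The selection of up- and down-regulated genes amounts to filtering on significance and sorting on logFC. A minimal Python sketch, using a few fetus-accident rows copied from the tables in this report; the significance threshold `alpha=0.05` and the function name `top_regulated` are assumptions, since the report does not state its cutoffs.

```python
def top_regulated(rows, n=2, alpha=0.05):
    """Return (up, down): the n most increased and n most decreased genes."""
    significant = [r for r in rows if r["adj_p"] < alpha]
    up = sorted((r for r in significant if r["logFC"] > 0),
                key=lambda r: r["logFC"], reverse=True)[:n]
    down = sorted((r for r in significant if r["logFC"] < 0),
                  key=lambda r: r["logFC"])[:n]
    return up, down

# Fetus-accident rows copied from the tables (adj.P.Val as printed).
rows = [
    {"symbol": "ABCC9",  "logFC": 8.621162,  "adj_p": 0.0},
    {"symbol": "AAAS",   "logFC": 5.788777,  "adj_p": 0.0},
    {"symbol": "AAK1",   "logFC": 4.991717,  "adj_p": 0.0},
    {"symbol": "NAT2",   "logFC": 0.1454660, "adj_p": 0.1553646},
    {"symbol": "MED6",   "logFC": 0.2052302, "adj_p": 0.0961727},
    {"symbol": "ANXA13", "logFC": -5.078377, "adj_p": 0.0},
    {"symbol": "ADCK1",  "logFC": -4.291101, "adj_p": 0.0},
    {"symbol": "RGCC",   "logFC": -4.188437, "adj_p": 0.0},
]
up, down = top_regulated(rows, n=2)
print([r["symbol"] for r in up], [r["symbol"] for r in down])
```

Filtering on the adjusted p-value first keeps genes such as NAT2 and MED6, which are not significant in this comparison, out of both lists.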

6. Gene ontology

Summary of the gene ontology

Fetus - Adult accident

GOBPID Pvalue OddsRatio ExpCount Count Size Term
GO:0051235 2.70e-06 2.187553 28.82705 54 222 maintenance of location
GO:0019058 7.89e-05 1.810950 37.65695 61 290 viral life cycle
GO:0044766 8.99e-05 4.609570 4.15525 13 32 multi-organism transport
GO:0046794 8.99e-05 4.609570 4.15525 13 32 transport of virus
GO:0075733 8.99e-05 4.609570 4.15525 13 32 intracellular transport of virus
GO:1902583 8.99e-05 4.609570 4.15525 13 32 multi-organism intracellular transport

Adult accident - Adult cancer

GOBPID Pvalue OddsRatio ExpCount Count Size Term
GO:0006302 0.0000859 3.334049 5.3465886 16 130 double-strand break repair
GO:0043928 0.0001360 7.493528 1.1927005 7 29 exonucleolytic nuclear-transcribed mRNA catabolic process involved in deadenylation-dependent decay
GO:0000291 0.0002133 6.868091 1.2749557 7 31 nuclear-transcribed mRNA catabolic process, exonucleolytic
GO:2000725 0.0002465 11.746244 0.6169141 5 15 regulation of cardiac muscle cell differentiation
GO:0045978 0.0002684 70.287854 0.1645104 3 4 negative regulation of nucleoside metabolic process
GO:0000724 0.0003840 4.142300 2.7555495 10 67 double-strand break repair via homologous recombination
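The columns of these tables (Pvalue, ExpCount, Count, Size) have the shape of a hypergeometric over-representation test. Within each table the ratio ExpCount / Size is the same for every term (about 0.130 in the first table and about 0.041 in the second), which is consistent with ExpCount = Size x (selected genes / gene universe). The Python sketch below shows that calculation on hypothetical toy numbers; the universe and selection sizes of the study are not given in the report, and the function names are illustrative.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

def expected_count(K, n, N):
    """Expected number of selected genes carrying the term under the null."""
    return K * n / N

# Hypothetical toy numbers, not taken from the study:
# universe of 10 genes, 5 annotated to the term, 4 selected, 3 of them annotated.
p = hypergeom_upper_tail(k=3, K=5, n=4, N=10)
exp = expected_count(K=5, n=4, N=10)
print(p, exp)
```

A term is reported as enriched when the observed Count is well above the expected count and the upper-tail probability is small.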

Pathway search with KEGG
