Hypertrophic Cardiomyopathy is a highly heterogeneous and potentially life-threatening condition. Here we will analyse a Human Hypertrophic Cardiomyopathy (HCM) dataset and see if the same techniques previously applied to a cancer cell line dataset can be applied here to deduce the role mitochondrial biogenesis and mitochondrial function plays in this disease.
The data can be loaded and needs to be prepared slightly to run the analysis.
load("~/Documents/PhD/Rworkspaces/Heart_data1.RData")
# Heart_rna_seq <-
# read.delim('~/Documents/PhD/Datasets/Heart/GSE36961_series_matrix.txt')
# row.names(Heart_rna_seq)<-Heart_rna_seq[,1]
# Heart_rna_seq<-Heart_rna_seq[,-1]
# Heart_sup <- read.delim('~/Documents/PhD/Datasets/Heart/Heart_sup.txt',
# header=F) rownames(Heart_sup)<-Heart_sup[,1] Heart_sup<-Heart_sup[,-1]
# Heart_sup2<-matrix(1,33,145) for(i in 1:145){
# Heart_sup2[,i]<-as.character(Heart_sup[,i]) }
# Change sample names of data.
for (i in 1:length(names(Heart_rna_seq))) {
if (i < 108) {
names(Heart_rna_seq)[i] <- paste(names(Heart_rna_seq)[i], "sample",
i)
} else {
names(Heart_rna_seq)[i] <- paste(names(Heart_rna_seq)[i], "control",
i - 107)
}
}
Here we have 106 hypertrophic cardiomyopathy samples and 39 control samples. The aim is to see if mitochondrial genese correlate in a similar way to that found in cancer, and if there is an alteration between the control and diseased samples. Note we are running an anlysis designed for microarray data on RNAseq data.
suppressPackageStartupMessages(library(gdata))
suppressPackageStartupMessages(library(gplots))
## Warning: package 'gtools' was built under R version 2.15.3
## Warning: package 'KernSmooth' was built under R version 2.15.3
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(MicroarrayGA))
# Mito<-read.xls('Documents/PhD/Datasets/Human.MitoCarta.xls',sheet=2)
load("~/Documents/PhD/Rworkspaces/mito.RData")
Mitochondrial_genes <- as.character(Mito[, 5])
Heart_mito_loc <- which((rownames(Heart_rna_seq)) %in% Mitochondrial_genes)
hypertrophic_loc <- c(1:106)
control_loc <- c(107:145)
Like in the previous analysis of the CCLE, a genetic algorithm can be applied to find the 10 most highly correlated samples for the mitochondrial genes. Full details of the code can be found in the R file found here.
load("~/Documents/PhD/Rworkspaces/heart_best_seed_all.RData")
best_seed_all <- best_seed
load("~/Documents/PhD/Rworkspaces/best_seed_hyp.RData")
load("~/Documents/PhD/Rworkspaces/best_seed_con.RData")
The seed produced with all the samples produces a similar pattern to the one produced previously from cancer data.
best_heat_all <- heatmap.2(cor(t(Heart_rna_seq[Heart_mito_loc, best_seed_all])),
trace = "none")
The highly correlated genes can be selected from the dendrogram.
best_seed_all_hi_cor <- c(as.hclust(best_heat_all$rowDendrogram[[1]][[1]][[1]])$label,
as.hclust(best_heat_all$rowDendrogram[[2]][[2]][[2]])$label)
best_seed_all_hi_cor_loc <- which(as.character(row.names(Heart_rna_seq)) %in%
best_seed_all_hi_cor)
all_seed_heat <- heatmap.2(cor(t(Heart_rna_seq[best_seed_all_hi_cor_loc, best_seed_all])),
trace = "none")
This heatmap with one large group and two small is very similar to the ones seen before. The mitochondrial genes have been split into the following two groups.
Collate_genes(as.hclust(all_seed_heat$rowDendrogram[[1]])$labels)
## [1] "AASS ABCB6 ABCF2 ACLY ACOT9 ACSL4 ADSL AGPS AIFM3 AK2 ALDH3A2 ATAD3A ATPBD3 BAX BBOX1 BID CCT7 CLIC4 COX6A1 CRY1 CYP11B1 DGUOK DHX29 DIABLO ENDOGL1 FASN FSIP2 GARS GK GK2 GLRX2 GLS GPX1 GRPEL1 GRPEL2 HK2 HSCB HSPB7 HSPD1 HSPE1 KIF1B LDHA LOC646630 MRPL14 MRPL18 MRPL22 MRPL3 MRPL32 MRPL42 MRPL50 MRPS17 MRPS6 MTCH1 MTHFD1L MTHFD2 MTRF1L NME1 NME1-NME2 PANK2 PARS2 PCK2 PDK4 PRDX1 PRDX6 PTRH2 PTS RAB32 RAB8B RHOA RPL10A RPL34 RPS14 RPS15A SARS SECISBP2 SHMT1 SHMT2 SLC25A13 SLC25A19 SLC25A25 SLC25A39 SLC25A44 SND1 SSBP1 TAP1 TBRG4 TCIRG1 TFB2M TIMM13 TIMM44 TMEM11 TOMM34 TOMM40 TOP1MT TSHZ3 TXN TXNRD1 VAMP8"
Collate_genes(as.hclust(all_seed_heat$rowDendrogram[[2]])$labels)
## [1] "ABCA9 ABHD10 ABHD11 ACAA1 ACAA2 ACACB ACAD10 ACAD11 ACAD8 ACADM ACADS ACADSB ACADVL ACAT1 ACN9 ACO1 ACO2 ACOT2 ACOX3 ACP6 ACSS1 ACYP2 ADCK1 ADCK5 ADHFE1 AFG3L2 AGXT2L2 AIFM1 AIFM2 AK3 AKAP1 AKR7A2 ALDH1L1 ALDH2 ALDH4A1 ALDH5A1 ALDH6A1 ALDH7A1 ALDH9A1 ALKBH7 AMACR AMT ANKRD26 APOA1BP ARMC4 AS3MT ATAD1 ATP5A1 ATP5B ATP5D ATP5F1 ATP5G1 ATP5G2 ATP5G3 ATP5H ATP5I ATP5J ATP5L ATP5O ATP5S ATPAF1 ATPAF2 ATPIF1 AUH AURKAIP1 BAD BCKDHA BCKDHB BCKDK BCL2 BCL2L13 BCS1L BDH1 BOLA1 BPHL BRP44 BRP44L CBARA1 CBR4 CCDC44 CCDC51 CCDC90A CECR5 CHCHD4 CHCHD7 CKMT2 CLYBL COMTD1 COQ10A COQ2 COQ3 COQ5 COQ6 COQ7 COQ9 COX10 COX15 COX17 COX4I2 COX5A COX5B COX6A2 COX6B1 COX6C COX7A1 COX7A2 COX7C COX8A CPT1B CPT2 CRAT CRLS1 CS CTPS2 CYCS CYP11A1 CYP27A1 DACT2 DAP3 DARS2 DBT DCAKD DCI DDX28 DHRS1 DHRS4 DHTKD1 DLAT DLST DNAJC4 DUSP26 DUT EARS2 ECH1 ECHS1 EEFSEC EFHD1 EHHADH ENDOG ETFA ETFB ETFDH FAHD2A FARS2 FASTK FASTKD2 FDX1 FECH FIS1 FLJ20920 FOXRED1 FUNDC2 GALC GATM GBAS GCAT GCDH GFM1 GFM2 GHITM GLRX5 GOT2 GPAM GPT2 GPX4 GRSF1 GSTK1 GTPBP8 HADH HADHA HAGH HCCS HDDC2 HEMK1 HIBADH HIBCH HIGD2A HINT2 HMGCL HMGCS2 HSD17B4 HSD17B8 HSDL2 IARS2 ICT1 IDH1 IDH2 IDH3A IDH3B IDH3G IMMP2L IMMT ISCU IVD KIAA0141 L2HGDH LACE1 LACTB2 LARS2 LDHB LDHD LETMD1 LIAS LIPT1 LOC644310 LOC727762 LONP1 LYRM2 LYRM5 MAOB MAT2B MCCC1 MCCC2 MCEE MDH1 MDH2 ME1 ME2 ME3 MECR METT11D1 MFN1 MFN2 MGC3196 MIPEP MLYCD MMAB MMACHC MOSC1 MOSC2 MPV17 MRPL10 MRPL11 MRPL16 MRPL2 MRPL20 MRPL21 MRPL24 MRPL33 MRPL34 MRPL35 MRPL37 MRPL39 MRPL40 MRPL41 MRPL43 MRPL44 MRPL45 MRPL46 MRPL49 MRPL51 MRPL53 MRPS10 MRPS11 MRPS14 MRPS16 MRPS18B MRPS18C MRPS21 MRPS22 MRPS25 MRPS26 MRPS27 MRPS30 MRPS33 MRPS36 MRPS5 MRPS7 MRPS9 MRRF MSRA MSRB2 MTCP1 MTERFD3 MTFMT MTHFD1 MTHFS MTIF2 MTO1 MTX2 MUT NCOA4 NDUFA1 NDUFA10 NDUFA11 NDUFA12 NDUFA13 NDUFA3 NDUFA4 NDUFA7 NDUFA8 NDUFA9 NDUFAF1 NDUFB3 NDUFB4 NDUFB5 NDUFB6 NDUFB7 NDUFB8 NDUFC1 NDUFS1 NDUFS2 NDUFS3 NDUFS7 NDUFS8 NDUFV1 NDUFV3 NFS1 NFU1 NIF3L1 NIPSNAP1 NIT1 NLRX1 NME3 NME4 NNT NRD1 NT5M NTHL1 OAT OCIAD1 OGDH OGDHL OMA1 OPA1 OSGEPL1 OXA1L OXCT1 OXR1 OXSM PACRG PARL PCBD2 PCCA PCCB PDHB PDHX PDK2 PECI PET112L PEX11B PHYH PINK1 PMPCA PMPCB PNKD POLDIP2 POLRMT PPM1K PPOX PPTC7 PRDX2 PRDX3 PRDX5 PRODH PTCD2 PTGES2 PXMP2 QDPR RAB24 RDH13 RDH14 REXO2 RHOT1 RHOT2 RILP RSAD1 RTN4IP1 SAMM50 SARS2 SCO1 SCP2 SDHA SDHB SDHC SDSL SERHL2 SFXN4 SFXN5 SIRT5 SLC25A11 SLC25A12 SLC25A16 SLC25A20 SLC25A23 SLC25A26 SLC25A27 SLC25A3 SLC25A30 SLC25A34 SLC25A35 SLC25A4 SLC25A40 SLC25A42 SPG7 SPR SPRYD4 STOML2 SUCLA2 SUCLG1 SUCLG2 SUOX SUPV3L1 SURF1 TATDN3 TFAM TFB1M THEM2 TK2 TMEM126A TMEM143 TMEM14C TMEM160 TOMM20 TOMM40L TOMM70A TST TUFM TXNRD2 UCRC UQCR UQCRB UQCRC1 UQCRC2 UQCRFS1 UQCRH UQCRQ VAMP1 VDAC1 VDAC3 WBSCR16 WDR22 ZADH2"
The same process can be repeated for the seed produces from the control samples.
best_heat_con <- heatmap.2(cor(t(Heart_rna_seq[Heart_mito_loc, best_seed_con])),
trace = "none")
Again we select the highly correlated genes
best_seed_con_hi_cor <- c(as.hclust(best_heat_con$rowDendrogram[[1]][[1]][[1]])$label,
as.hclust(best_heat_con$rowDendrogram[[2]][[2]][[2]])$label)
best_seed_con_hi_cor_loc <- which(as.character(row.names(Heart_rna_seq)) %in%
best_seed_con_hi_cor)
heatmap.2(cor(t(Heart_rna_seq[best_seed_con_hi_cor_loc, best_seed_con])), trace = "none")
This heatmap appears nearly identical to the one found previously from the seed produced with all the samples. This similarity can be tested.
length(intersect(best_seed_all_hi_cor, best_seed_con_hi_cor))/max(c(length(best_seed_all_hi_cor),
length(best_seed_con_hi_cor)))
## [1] 0.8286
The two gene sets share 82% similarity.
intersect(best_seed_all, best_seed_con) # The Control seed is a subset of all, this appears to be the same pattern.
## [1] 122 110 113 121
There is a non-zero intersection between the seeds.
Finally we repeat this process with the seed produced from the hypertrophic samples
best_heat_hyp <- heatmap.2(cor(t(Heart_rna_seq[Heart_mito_loc, best_seed_hyp])),
trace = "none")
Selecting the highly correlated genes
best_heat_hyp_hicor_genes <- c(as.hclust(best_heat_hyp$rowDendrogram[[1]][[1]][[2]])$labels,
as.hclust(best_heat_hyp$rowDendrogram[[2]][[2]][[2]])$labels)
best_heat_hyp_hicor_genes_loc <- which(as.character(row.names(Heart_rna_seq)) %in%
best_heat_hyp_hicor_genes)
hypertrophic_seed_heat <- heatmap.2(cor(t(Heart_rna_seq[best_heat_hyp_hicor_genes_loc,
best_seed_hyp])), trace = "none")
This has produced a heatmap notably different from before with two roughly equal groups, these groups are composed as follows:
Collate_genes(as.hclust(hypertrophic_seed_heat$rowDendrogram[[1]])$labels)
## [1] "ABCB10 ABCC12 ACAD11 ACAD8 ACADM ACAT1 ACN9 ACO1 ACSL1 ACSL4 ACSL5 AGPS AGXT2 AKAP1 AKR7A2 ALAS1 ALDH1L2 ALDH3A2 ALDH5A1 ALDH7A1 ALDH9A1 AMACR AS3MT ASAH2 ATAD1 ATP5C1 ATP5G3 ATP5I ATP5J2 ATP5L AURKAIP1 BBOX1 BCL2L13 BID BRP44 BRP44L CA5B CBR4 CCDC44 CCDC51 CCDC90B CKMT1A CLIC4 CLPB COQ10B COQ2 COQ5 COX10 COX11 COX18 COX4I1 COX6B1 COX7B COX7C COX8C CPS1 CRLS1 CRY1 CYP27B1 DLAT DLD DMGDH DNAJA3 DNAJC15 DUT EFHA1 EFHD1 ETFA FAHD2A FECH FH FTH1 FXN GALC GBAS GFM1 GHITM GK2 GLDC GLRX2 GLYAT GNG5 GPAM GPD2 GRSF1 GTPBP8 GUF1 HAO2 HBXIP HDDC2 HIBADH HIBCH HIGD1A HMGCL HRSP12 HSDL2 HSPE1 IDE IDI1 IMMP2L IREB2 ISCU KYNU LACTB LACTB2 LDHAL6B LDHB LIAS LIPT1 LOC389293 LOC644310 LYRM2 LYRM5 MAT2B MCEE MDH1 ME1 ME2 MFN1 MGC19604 MRPL10 MRPL13 MRPL14 MRPL15 MRPL18 MRPL19 MRPL3 MRPL30 MRPL32 MRPL33 MRPL40 MRPL47 MRPL48 MRPS18C MRPS28 MRPS30 MRPS33 MRPS36 MRPS6 MRRF MSRB2 MTCH1 MTCP1 MTERFD1 MTERFD3 MTFMT MTHFD1L MTHFD2L MTO1 MTX2 NARS NARS2 NCOA4 NDUFA3 NDUFA4 NDUFA5 NDUFA7 NDUFAB1 NDUFB10 NDUFB4 NDUFB5 NDUFC2 NDUFS3 NFXL1 NIF3L1 NIPSNAP3A NIT1 NNT NT5C NT5C3 NUDT19 OAT OCIAD1 OMA1 OSGEPL1 OXR1 PAK7 PANK2 PCBD2 PCCB PDHB PDHX PDSS1 PDSS2 PECI PHYH PHYHIPL PMPCB POLDIP2 PPA2 PPM1K PPM2C PRDX3 PRDX6 PROSC PTCD2 PTS QDPR RAB8B RDH14 REXO2 RFK RG9MTD1 RHOA RPL34 RPL35A RPS15A SCO1 SDHD SFXN4 SLC25A13 SLC25A16 SLC25A18 SLC25A19 SLC25A23 SLC25A25 SLC25A28 SLC25A3 SLC25A32 SLC25A40 SLC25A43 SLC25A44 SLC25A46 SPG7 STAR SUCLG2 TATDN3 TDRKH TFAM TFB2M THEM5 TIMM10 TIMM23 TIMM8A TMEM126A TMEM14C TMEM65 TOMM7 TOMM70A TRIT1 TRNT1 TSHZ3 TTC19 UCRC UQCRB UQCRC2 UQCRFS1 UQCRH UQCRQ VDAC1 VDAC2 VDAC3 WARS2 WDR22"
Collate_genes(as.hclust(hypertrophic_seed_heat$rowDendrogram[[2]])$labels)
## [1] "ABCA9 ABCB6 ABCD1 ABCF2 ACAA1 ACADS ACO2 ACOT9 ACP6 ACSS1 ADCK1 ADCK4 ADCK5 AFG3L2 AGPAT5 AGR2 AGXT2L2 AIFM1 AIFM2 ALDH1L1 ALDH4A1 ALKBH7 APEX2 ARMC1 ATAD3A ATP5D ATP5G1 ATP5G2 ATPIF1 BAK1 BCAT2 BCKDHA BDH1 BLOC1S1 BOLA1 BPHL CBARA1 CCDC109A CCT7 CHCHD3 CHCHD7 COMTD1 COQ6 COX15 COX5B COX8A CPT1B CPT2 CRAT CYB5R1 CYP11A1 CYP11B1 CYP11B2 DAP3 DCI DDX28 DHODH DHTKD1 DLST DNAJC11 DRG2 EARS2 ECSIT EEFSEC ELAC2 ENDOG ETFB ETHE1 FAHD1 FAM36A FDXR FKBP8 FKSG24 FTMT FTSJ2 GARS GCAT GCDH GFER GLRX GLUD1 GLYCTK GMPPB GOT2 GPX1 GPX4 GSTK1 GTPBP5 HADHB HAGH HIGD2A HK1 HSD17B10 HSD17B8 HSPA9 HSPB7 HTRA2 IARS2 ICT1 IDH2 IDH3A IDH3G ISOC2 IVD KARS KIAA0141 LARS2 LDHD LETM1 LETMD1 LOC123876 LOC646630 LONP1 LYRM4 MAP1D MARS ME3 METT11D1 MFN2 MGC3196 MGC70857 MIPEP MOSC2 MPST MPV17 MRM1 MRPL12 MRPL16 MRPL2 MRPL21 MRPL23 MRPL24 MRPL27 MRPL28 MRPL34 MRPL38 MRPL4 MRPL41 MRPL43 MRPL49 MRPL53 MRPL54 MRPS12 MRPS18A MRPS18B MRPS2 MRPS21 MRPS24 MRPS27 MRPS34 MRPS5 MTIF3 MTX1 MUT NDUFA10 NDUFA13 NDUFA9 NDUFAF1 NDUFB11 NDUFB2 NDUFB7 NDUFS4 NDUFS7 NDUFV1 NDUFV2 NIPSNAP3B NLRX1 NME1-NME2 NME3 NME4 NME6 NT5M NTHL1 OGDH OPA3 OXA1L OXSM PACRG PARK7 PARL PARS2 PCCA PDPR PGS1 PHB2 PIGY PINK1 PISD PMPCA POLG POLG2 POLRMT PPIF PRDX1 PRO1853 PRODH2 PTGES2 PTRH1 PYCR1 QRSL1 RAB11B RAB1B RAB4B RDH13 RHOT2 RPS14 RSAD1 SAMM50 SARS SARS2 SDHA SDHC SECISBP2 SFXN3 SFXN5 SHMT2 SIRT3 SLC25A10 SLC25A11 SLC25A14 SLC25A26 SLC25A29 SLC25A31 SLC25A37 SLC25A39 SND1 SNPH SOD1 SPR SQRDL STOML2 SUPV3L1 SURF1 TBRG4 TCIRG1 THEM4 THG1L TIMM13 TIMM44 TK2 TMEM160 TOMM22 TOMM40L TOP1MT TRAP1 TRMU TSFM TUFM TXN2 TXNRD2 UNG UQCR UQCRC1 UXS1 VISA WBSCR16 WBSCR18 ZADH2"
We can compare this seed and the highly correlated genes to those found before using all the samples
intersect(best_seed_all, best_seed_hyp) # No intersection
## integer(0)
length(intersect(best_seed_all_hi_cor, best_heat_hyp_hicor_genes))/max(c(length(best_seed_all_hi_cor),
length(best_heat_hyp_hicor_genes))) # 58% similarity
## [1] 0.5838
There is no intersections between the seeds and the genes only share a 58% similarity, suggesting that this pattern found is different to the one found previously. Both shall now be studied and compared. Since the control seed is so similar to the seed found from all the samples, it will be discarded. First a complete ordering of the samples will be made for both seeds based on correlation strength.
hyp_ga_sort <- GA_sort(Heart_rna_seq[best_heat_hyp_hicor_genes_loc, ], seed1 = best_seed_hyp,
Num_cores = 4, sort_length = 145)
## Loading required package: multicore
## [1] 0.4468 71.0000 10.0000
## [1] 0.3741 13.0000 20.0000
## [1] 0.3261 67.0000 30.0000
## [1] 0.3018 113.0000 40.0000
## [1] 0.2956 81.0000 50.0000
## [1] 0.2811 133.0000 60.0000
## [1] 0.2679 116.0000 70.0000
## [1] 0.2563 74.0000 80.0000
## [1] 0.2465 19.0000 90.0000
## [1] 0.2379 61.0000 100.0000
## [1] 0.2298 7.0000 110.0000
## [1] 0.2213 129.0000 120.0000
## [1] 0.2126 111.0000 130.0000
all_ga_sort <- GA_sort(Heart_rna_seq[best_seed_all_hi_cor_loc, ], seed1 = best_seed_all,
Num_cores = 4, sort_length = 145)
## [1] 0.613 63.000 10.000
## [1] 0.54 64.00 20.00
## [1] 0.489 31.000 30.000
## [1] 0.4494 133.0000 40.0000
## [1] 0.4168 48.0000 50.0000
## [1] 0.3886 38.0000 60.0000
## [1] 0.3644 19.0000 70.0000
## [1] 0.3437 7.0000 80.0000
## [1] 0.3257 135.0000 90.0000
## [1] 0.3088 97.0000 100.0000
## [1] 0.2929 15.0000 110.0000
## [1] 0.2787 132.0000 120.0000
## [1] 0.2637 115.0000 130.0000
We can examine what non-mitochondrial genes correlate with the results from both seeds.
hyp_mito_set1 <- which(row.names(Heart_rna_seq) %in% as.hclust(hypertrophic_seed_heat$rowDendrogram[[1]])$labels)
hyp_mito_set2 <- which(row.names(Heart_rna_seq) %in% as.hclust(hypertrophic_seed_heat$rowDendrogram[[2]])$labels)
load("~/Documents/PhD/Rworkspaces/Heart_cor.RData")
# cor_test<-c(1:length(rownames(Heart_rna_seq)[-Heart_mito_loc])) for(i in
# 1:length(rownames(Heart_rna_seq)[-Heart_mito_loc])){
# cor_test[i]<-cor(as.numeric(colMeans(Heart_rna_seq[Heart_mito_set1,all_ga_sort[c(1:40)]])),as.numeric(Heart_rna_seq[-Heart_mito_loc,][i,all_ga_sort[c(1:40)]]))
# }
For the seed from the hypertrophic data these genes are the most highly correlated to the pattern seen. Note the absence of cytosolic ribosome genes that were found previously in the CCLE analysis.
Collate_genes(row.names(Heart_rna_seq)[-Heart_mito_loc][which((cor_test) > (0.8))])
## [1] "AASDHPPT ABHD5 ANAPC10 ANAPC13 ANKRD13C ARL5A ARMCX1 ARMCX2 ASF1A ATP6V1G1 BNIP2 BNIP3 BXDC2 C14ORF129 C14ORF32 C18ORF25 C1ORF131 C1ORF19 C1ORF52 C1ORF55 C5ORF21 C6ORF66 C7ORF30 CBLL1 CCDC59 CCNDBP1 CCNH CGGBP1 CGRRF1 CHM CHMP2B COMMD10 COMMD3 COMMD8 CPD DHX15 DIP2B DR1 DUS4L EBAG9 ENOPH1 EPRS FAM175B FAM45A FAM92A1 FAM96A FKBP3 FNDC3A FTSJD1 GCLC HIP2 HS.130036 HS.294103 HS.443490 HSPB8 INTS6 JAZF1 KIAA1826 KLHDC2 KPNB1 KRIT1 LAMP2 LOC124512 LOC387820 LOC401397 LOC642897 LOC647346 LOC653226 LPIN1 LRRC40 LSM1 LSM5 MAPK6 MARCH7 MBNL1 MBNL2 MEX3C MGAT2 MYNN NDFIP2 NKIRAS1 NPTN OTUD6B PCGF6 PCNP PDCD10 PDE12 PIGP PKIA POLB POMP PREI3 PRKRA PRKRIR PRMT3 PSMA1 PSMA6 RAB18 RAB40B RAP1A RASA1 RBM18 RMI1 RNGTT RPL22 RRP15 RTCD1 SAR1B SC5DL SCAMP1 SCYL2 SEP15 SIRPG SLC35A3 SLC35E3 SLK SMAD4 SNX3 SNX4 SPCS3 SRP9 STAM SUB1 SYNCRIP TBC1D9 TBPL1 TGOLN2 THAP1 THOC7 TM2D1 TMED10P TMED5 TMED7 TMEM106B TMEM14B TMEM188 TMEM55A TRIM29 TXNDC17 TXNDC9 UBE2F UBE2N UBLCP1 UBQLN2 USP38 USP42 VAMP7 VKORC1L1 VPS35 WDSUB1 WRB ZC3HC1 ZFAND1 ZFR ZMYND11 ZNF25 ZNF277 ZZZ3"
Collate_genes(row.names(Heart_rna_seq)[-Heart_mito_loc][which((cor_test) < (-0.8))])
## [1] "AAMP ABCF1 ACP2 AHCYL1 AP1G1 AP2A2 APEH ATG16L1 ATG9A ATP1A1 BAT3 BCAP31 BCAR1 BLCAP BRD3 BTBD7 C19ORF6 C20ORF3 C2CD2 C7ORF59 C8ORF41 CASC3 CDIPT CENTD2 CHD8 CHES1 CHMP1A CLCN6 CLSTN1 CPNE1 CTNNBL1 CUL1 CYTH3 DAZAP1 DCTN5 DEAF1 DNM2 DNPEP DTX3 DULLARD EBP EIF4E2 ELOF1 ERGIC3 FAM40A FAM62A FBXO46 FLJ20489 GALNT11 GNL3 GSS GTF3C1 GUSBL1 HS.485155 HS.518414 ILF3 IRAK1 ITFG3 KCTD2 KIAA0323 KIAA1602 KIAA1967 KIF1C LARP1 LASS2 LASS4 LOC644743 LOC645899 LOC728069 LOC732425 LRP1 LRPAP1 MAPKAPK2 MGAT4B MGC52000 MKNK1 MLL4 NAP1L4 NAT9 NPLOC4 NUMA1 NUP214 OBSCN ORMDL1 P8 PA2G4 PAF1 PARP12 PARP6 PDDC1 PHF13 PIGS PKN1 POU2F1 PPP2R5D PRMT7 PRPF3 PRPF8 PSMD2 PSME1 PTDSS2 PTRF RAI16 RAMP3 RBM38 RGL2 RNF40 RNF41 RUSC2 SAPS3 SCN4B SEC61A1 SEMA3B SERPING1 SETX SF4 SGSH SHARPIN SHC1 SIDT2 SLC35C1 SMARCB1 SPNS1 SSNA1 ST6GALNAC6 STIP1 STK25 STK35 STRADA STUB1 STX4 TBC1D2B TBCB TLK2 TMEM175 TMEM62 TNPO2 TOR1A TP53BP1 TRIM28 TRIM54 TRMT1 TRPC4AP TTL UBAP2 UNC45A UPF1 UROD USF2 USP7 VPS39 WDR45L ZBTB17 ZDHHC14 ZER1 ZMYM3 ZNF395 ZNF436 ZNF480"
The same process is then repeated for the seed found from all the samples.
all_mito_set1 <- which(row.names(Heart_rna_seq) %in% as.hclust(all_seed_heat$rowDendrogram[[1]])$labels)
all_mito_set2 <- which(row.names(Heart_rna_seq) %in% as.hclust(all_seed_heat$rowDendrogram[[2]])$labels)
load("~/Documents/PhD/Rworkspaces/Heart_cor2.RData")
# cor_test2<-c(1:length(rownames(Heart_rna_seq)[-Heart_mito_loc])) for(i
# in 1:length(rownames(Heart_rna_seq)[-Heart_mito_loc])){
# cor_test2[i]<-cor(as.numeric(colMeans(Heart_rna_seq[all_mito_set2,all_ga_sort[c(1:40)]])),as.numeric(Heart_rna_seq[-Heart_mito_loc,][i,all_ga_sort[c(1:40)]]))
# }
Collate_genes(row.names(Heart_rna_seq)[-Heart_mito_loc][which((cor_test2) >
(0.9))])
## [1] "ADPRHL1 ADSSL1 AIG1 APOO ASB10 ASIP BBS2 C10ORF71 C10ORF76 C11ORF74 C14ORF159 C14ORF180 C1ORF188 C21ORF33 C3ORF60 C5ORF4 CAMK2A CAND2 CCDC141 CMBL CRY2 CRYM CYB5A DAND5 DDO DET1 DHRS12 DNAJC19 DUSP28 FAM50B FHOD3 FKBP9L FLJ10986 FRMD5 FSD2 FUNDC1 FYCO1 GALT GCOM1 GNG7 GPRASP2 HEXIM2 HFE2 HLTF HS.10862 HS.26579 HS.467627 IAH1 IL11RA ITFG1 KIAA1394 KIAA1737 KLHDC9 KLHL7 LANCL1 LDB3 LOC145853 LOC201229 LOC728037 LOC729843 LRBA LRRC20 LRRC49 MAGEE1 MGC35361 MOAP1 MYH7B MYOM2 NEBL NFE2L1 NMNAT3 NUDT6 PCYOX1 PDLIM5 PEX7 PFKM PHKB PIGP PLA2G4C PRUNE RBM20 RCOR3 RGN RPL3L SCAPER SFRP1 SH3RF2 SLC29A1 SLC35F1 SNTA1 SPAG7 SPG11 SPHKAP TACC2 TMEM116 TMLHE TNNI3K TSPAN32 VWC2 WDR62 ZNF71"
Collate_genes(row.names(Heart_rna_seq)[-Heart_mito_loc][which((cor_test2) <
(-0.9))])
## [1] "ABHD5 ACOT7 ALG3 ALOX5AP ANKRD13A ANXA2 AQP9 ARD1A ARHGDIA ARMET ARPC4 ASCC2 ATP1A1 ATP1B3 BOP1 BRMS1 BTBD14A BYSL C19ORF59 C6ORF153 CCDC109B CCT2 CD163 CD44 CD59 CDKN1A CFL1 CLCF1 CLIC1 COL6A3 CORO1C CSNK1D CTSC DHX37 DNAJB11 DPH2 DUSP23 EBNA1BP2 EIF2C2 EIF3B ELOVL6 EMP3 ERRFI1 FAM129B FGR FHL3 FKBP11 FKSG30 FOSL1 FPR1 FZD9 G6PD GATAD2A GPATCH4 GPR172A GTF2E2 HAVCR2 HS.543887 HSPC171 IMPDH1 IQCG IRAK1 ITGA5 LAPTM5 LHFPL2 LILRA5 LIPG LMCD1 LMNB2 LOC134997 LOC285556 LOC653888 LRRC8A LY96 M6PRBP1 MAP2K1 MARCH3 MYC MYH9 MYOT NFKBIZ NOL1 NPC1 NR2C2AP PABPC1 PLEKHO1 PMEPA1 POLR3H PPM1G PRELID1 PTBP1 PTPN1 PUS1 RANGAP1 RFWD2 RGS19 RNF149 RPL29 RPN1 RPN2 S100A11 S100A16 S100A3 S100A8 S100A9 SAT1 SDF2L1 SEC61A1 SERPINA1 SERPINA3 SERPINB8 SERPINE1 SERPINH1 SH3BGRL3 SIGLEC10 SLC11A1 SLC16A3 SLC3A2 SLC7A1 SNRPB SPATA2L SPHK1 SPPL2A STS-1 TEAD4 TIMP1 TMEM183A TMEM183B TNC TNFAIP8L3 TNFRSF1A TUBB2A TUBB6 UCK2 VCAN YARS ZDHHC16 ZDHHC9 ZNF598 ZNF653 ZYX"
Here there are a few cytosolic ribosome genes. We can compare the correlation scores of the cytosolic ribosome genes between the two seeds
Ribo_biogenesis <- read.delim("~/Documents/PhD/GeneLists/Ribo_biogenesis.txt",
header = F)
ribo_genes_loc <- unique(c(which(row.names(Heart_rna_seq)[-Heart_mito_loc] %in%
as.character(Ribo_biogenesis[, 3])), grep("RP", row.names(Heart_rna_seq)[-Heart_mito_loc])[c(166:279)]))
par(mfcol = c(1, 2))
boxplot(cor_test2, cor_test, names = c("All GA", "HYP GA"), ylab = "Correlation score",
main = "All genes")
boxplot(cor_test2[ribo_genes_loc], cor_test[ribo_genes_loc], names = c("All GA",
"HYP GA"), ylab = "Correlation score", main = "Cytosolic ribosome genes")
The p-value of this significance can be calculated with the following t-test:
t.test(cor_test2[ribo_genes_loc], cor_test[ribo_genes_loc])
##
## Welch Two Sample t-test
##
## data: cor_test2[ribo_genes_loc] and cor_test[ribo_genes_loc]
## t = -9.85, df = 424, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.4773 -0.3185
## sample estimates:
## mean of x mean of y
## -0.38244 0.01546
This is significant with a p-value of less than 2.2.e -16. Thus it appears there is a break down in the correlation between the cytosolic ribosomes and the mitochondrial ribosome in atleast some of the hypertrophic samples.
A more advanced analysis would use a t test to see which GO terms are significant based on their correlation score. A method to achieve this has been coded and added to the MicroarrayGA package.
table1 <- GO_enrichment_cor_values(row.names(Heart_rna_seq)[-Heart_mito_loc],
cor_test, 0.05)
table2 <- GO_enrichment_cor_values(row.names(Heart_rna_seq)[-Heart_mito_loc],
cor_test2, 0.05)
suppressPackageStartupMessages(library(googleVis))
## Warning: package 'googleVis' was built under R version 2.15.3
M <- gvisTable(as.data.frame(table1))
N <- gvisTable(as.data.frame(table2))
The following table represents the significant GO terms assosiated with the correlation values of the pattern found with only the hypertrophic samples.
print(M, "chart", width = "800")
800
This table represents the significant GO terms assosiated with the correlation values of the pattern found using all sample.
print(N, "chart", width = "800")
800
The latter pattern is rich with connections to the ribosome biogenesis while the pattern assosiated with hypertrophic samples only contains the most general terms.
To see the classic fork behaviour we need to examine the principle components for this data.
# PCA analysis
pca_matrix1 <- Heart_rna_seq[best_heat_hyp_hicor_genes_loc, best_seed_hyp]
pca_matrix2 <- Heart_rna_seq[best_seed_all_hi_cor_loc, best_seed_all]
# Run PCA analysis
pca_results1 <- prcomp(t(pca_matrix1), scores = TRUE, cor = TRUE, center = TRUE)
pca_results2 <- prcomp(t(pca_matrix2), scores = TRUE, cor = TRUE, center = TRUE)
# PC1 is 60% of variation, we want to know the coefficient for this.
pca_loadings1 <- pca_results1$rotation
pca_loadings2 <- pca_results2$rotation
pc1_list1 <- c(1:145)
pc1_list2 <- c(1:145)
for (i in 1:145) {
pc1_list1[i] <- lsfit(as.matrix(pca_loadings1), as.matrix(Heart_rna_seq[best_heat_hyp_hicor_genes_loc,
hyp_ga_sort[i]]))$coef[2]
pc1_list2[i] <- lsfit(as.matrix(pca_loadings2), as.matrix(Heart_rna_seq[best_seed_all_hi_cor_loc,
all_ga_sort[i]]))$coef[2]
}
gene_means <- rowMeans(as.matrix(Heart_rna_seq[best_heat_hyp_hicor_genes_loc,
best_seed_hyp]))
gene_means2 <- rowMeans(as.matrix(Heart_rna_seq[best_seed_all_hi_cor_loc, best_seed_all]))
cluster_num <- rep(2, 512)
cluster_num[which(names(pca_loadings1[, 1]) %in% as.hclust(hypertrophic_seed_heat$rowDendrogram[[1]])$label)] <- 1
cluster_num2 <- rep(2, 531)
cluster_num2[which(names(pca_loadings2[, 1]) %in% as.hclust(all_seed_heat$rowDendrogram[[1]])$label)] <- 1
PC_df <- data.frame(PC1 = pca_loadings1[, 1], PC2 = pca_loadings1[, 2], PC3 = pca_loadings1[,
3], cluster = cluster_num, gene_means)
pc1 <- ggplot(PC_df, aes(PC1, PC2))
PC_df2 <- data.frame(PC1 = pca_loadings2[, 1], PC2 = pca_loadings2[, 2], PC3 = pca_loadings2[,
3], cluster = cluster_num2, gene_means2)
pc1_2 <- ggplot(PC_df2, aes(PC1, PC2))
pc1 + geom_point(aes(colour = factor(cluster)))
pc1_2 + geom_point(aes(colour = factor(cluster)))
Gene_Means <- rowMeans(as.matrix(Heart_rna_seq[best_heat_hyp_hicor_genes_loc,
]))
Heart_hyp_norm <- as.matrix(Heart_rna_seq[best_heat_hyp_hicor_genes_loc, hyp_ga_sort]) -
Gene_Means
Heart_hyp_norm_df <- data.frame(as.data.frame((Heart_hyp_norm)), Cluster = cluster_num,
Order = c(1:512))
library(reshape2)
mdat <- melt(Heart_hyp_norm_df[, c(c(1:20), 146, 147)], id.vars = c(21, 22))
ph <- ggplot(mdat, aes(Order, value)) + geom_point(aes(colour = factor(Cluster),
size = abs(value))) + facet_wrap(~variable) + ylab("Gene Expression Levels")
Gene_Means2 <- rowMeans(as.matrix(Heart_rna_seq[best_seed_all_hi_cor_loc, ]))
Heart_all_norm <- as.matrix(Heart_rna_seq[best_seed_all_hi_cor_loc, all_ga_sort]) -
Gene_Means2
Heart_all_norm_df <- data.frame(as.data.frame((Heart_all_norm)), Cluster = cluster_num2,
Order = c(1:531))
mdat2 <- melt(Heart_all_norm_df[, c(c(1:20), 146, 147)], id.vars = c(21, 22))
ph2 <- ggplot(mdat2, aes(Order, value)) + geom_point(aes(colour = factor(Cluster),
size = abs(value))) + facet_wrap(~variable) + ylab("Gene Expression Levels")
print(ph)
print(ph2)
hist(cor(t(Heart_rna_seq[best_seed_all_hi_cor_loc, all_ga_sort[c(1:21)][-which(pc1_list2 >
0)[c(1:5)]]])), breaks = 100)
hist(cor(t(Heart_rna_seq[best_seed_all_hi_cor_loc, all_ga_sort[c(1:21)]])),
breaks = 100)
# Clearly these 5 samples are essential for the correlation. Hence the 3rd
# PC must represnt the presence of the correlation, where both forks are
# needed...
######### Ranking the genes in order of importance from the PCA
Collate_genes(names(abs(pca_loadings2[, 2])[rev(order(abs(pca_loadings2[, 2])))]))
## [1] "HMGCS2 MRPS6 MRPL16 TXN BPHL ICT1 GRPEL1 AMT MTCH1 CPT1B HSPE1 ATP5G1 MOSC1 GATM CS TXNRD1 HSPB7 ENDOGL1 CLYBL GSTK1 NDUFA12 LACTB2 PRDX1 OXCT1 DLAT ATP5S LDHA MRPL21 MRPL34 COX7C MIPEP COX6B1 DCI GK2 GLS NDUFA3 ACP6 ALDH2 RHOT1 NDUFA1 AK2 CKMT2 ATP5I ATP5J OXR1 RPS15A GPAM NIPSNAP1 ETFB PRODH NDUFA11 MRPL49 PRDX2 MTHFD2 PNKD HIGD2A IDH2 MTHFD1L MTIF2 CRLS1 COX6A2 NDUFS7 SUPV3L1 SLC25A26 MUT NDUFA13 TMEM143 PXMP2 NDUFAF1 LDHD CYP27A1 FOXRED1 WDR22 IMMP2L ACSS1 BDH1 CLIC4 AFG3L2 ACADVL AK3 TBRG4 ACN9 COX6C GALC COQ5 IDH3B BOLA1 COQ10A ALDH3A2 NDUFS3 TFB1M MRPL14 SDHA ME2 L2HGDH ATP5L SDHB TOMM40 MRPS25 UQCRH ATP5G3 COX8A MRPL40 THEM2 OGDHL IARS2 TFB2M CYP11A1 DAP3 FASTK TMEM14C SLC25A11 SLC25A34 MRPL33 ALKBH7 DLST NDUFA9 SARS2 MRPL20 BBOX1 RPS14 MTX2 CBR4 MRPS9 ACADSB VDAC1 MSRB2 MRPS26 RDH13 GOT2 AMACR UQCRB ACAD11 FIS1 RPL34 POLRMT PCCA LYRM2 ATP5H NDUFV1 SARS GBAS RTN4IP1 SLC25A12 DHRS1 ETFDH NDUFB7 NRD1 MFN2 PPOX METT11D1 PTRH2 CCT7 NDUFB6 NIF3L1 TOMM70A UQCRQ FECH ENDOG NDUFB3 GPX1 GFM1 COQ3 COX6A1 MRPS18C LDHB LYRM5 MRPL10 NDUFC1 ACSL4 GARS ATPAF2 PRDX5 HADH MRPL39 MRPL41 ATP5O DHTKD1 MRPL43 NDUFA7 ALDH1L1 CPT2 PHYH MSRA BCKDHA HK2 PDK4 SFXN5 GPT2 RDH14 RPL10A DIABLO SUOX TIMM44 ACADM MOSC2 NIT1 GLRX2 TATDN3 POLDIP2 GK ALDH4A1 IVD ABHD10 TUFM ACO2 ADCK1 COMTD1 MCEE KIF1B TST TOMM20 RILP BCKDHB COX7A2 DNAJC4 MRPS21 MRPS18B CYCS ME3 PPM1K MGC3196 COQ6 DDX28 TOMM34 HSPD1 GTPBP8 MTFMT SECISBP2 HINT2 OGDH GLRX5 ACOX3 SAMM50 IMMT AKR7A2 ATP5G2 AGPS ZADH2 MRPS30 MRPL24 PTS ATP5B GCAT NT5M IDH3G UQCRC1 SLC25A42 ECH1 MCCC2 SHMT2 AIFM1 SLC25A23 GFM2 SLC25A3 CBARA1 CCDC44 NFU1 BCS1L ATP5F1 NDUFA8 AUH CRAT CRY1 TMEM11 NDUFB8 PDHB HSCB HDDC2 PARS2 SUCLG2 FUNDC2 FSIP2 HIBADH MTHFS ALDH9A1 HSD17B8 COQ9 SCP2 ALDH7A1 CECR5 PCBD2 DARS2 HIBCH PCCB GPX4 ADCK5 UCRC TMEM160 SLC25A19 ABCF2 COX15 COX7A1 NME4 PECI PACRG COX10 VDAC3 OXA1L MMAB OXSM NDUFS2 MDH1 RHOA NDUFS1 ETFA BCL2L13 PDHX HSD17B4 SLC25A40 TOMM40L MRPL46 ACAD8 ACLY DGUOK MRPS36 TK2 VAMP8 TOP1MT SCO1 ACAA1 RHOT2 EHHADH ATPAF1 STOML2 ATAD1 ACAD10 MDH2 COX4I2 SDHC BCL2 MLYCD LIPT1 RAB8B MRPS7 MRPS22 BAX NLRX1 CTPS2 APOA1BP BCKDK LARS2 CCDC51 PRDX6 ANKRD26 COX5A OPA1 ABCB6 LIAS PDK2 CHCHD4 COX17 KIAA0141 TXNRD2 MAOB PTGES2 MAT2B NME3 DUSP26 ATP5A1 COQ2 MTRF1L NDUFA4 ECHS1 SLC25A4 MTHFD1 MRPL32 MRPS10 ATPBD3 MRPL50 AIFM2 UQCR OAT NNT AS3MT MECR MRPL3 SLC25A13 MRPL44 LOC646630 BID DHRS4 DUT IDH1 ARMC4 HEMK1 IDH3A FARS2 AKAP1 RSAD1 SUCLA2 MRPL22 WBSCR16 ACOT2 ADSL SLC25A30 EEFSEC NDUFB5 FLJ20920 MRPL11 ACO1 LOC644310 PANK2 COQ7 NME1-NME2 MRPL42 AGXT2L2 CHCHD7 SLC25A44 HAGH NTHL1 SDSL SHMT1 MRPS33 SUCLG1 BAD MRPS16 NFS1 PINK1 GCDH ISCU HCCS FASN SLC25A27 ACAT1 UQCRC2 NDUFV3 ADHFE1 MRPL51 MTERFD3 MRPL18 SSBP1 TAP1 MRRF PTCD2 NDUFS8 SLC25A25 AURKAIP1 ATP5D LETMD1 UQCRFS1 NDUFB4 MCCC1 MRPS14 TMEM126A ABHD11 MRPS27 SPG7 PET112L SLC25A35 PPTC7 FAHD2A DACT2 REXO2 ALDH6A1 VAMP1 EARS2 CYP11B1 FASTKD2 SFXN4 LOC727762 ACACB PMPCA ACYP2 TCIRG1 GHITM DHX29 HMGCL ME1 PMPCB ATAD3A MPV17 TSHZ3 SND1 AASS RAB32 BRP44L PARL LACE1 MRPL2 NCOA4 SPR DCAKD QDPR MRPL45 MRPL35 ABCA9 ACOT9 MRPL37 MRPS5 PCK2 LONP1 GRPEL2 MRPS17 PRDX3 MRPL53 SIRT5 OCIAD1 ATPIF1 GRSF1 OSGEPL1 ALDH5A1 HADHA ACAA2 AIFM3 NME1 MFN1 EFHD1 SERHL2 NDUFA10 RAB24 MTCP1 PEX11B MTO1 TIMM13 MRPS11 SPRYD4 HSDL2 FDX1 COX5B TFAM BRP44 SLC25A20 MMACHC CCDC90A DBT SLC25A16 OMA1 SLC25A39 SURF1 ACADS"
Collate_genes(names(abs(pca_loadings1[, 2])[rev(order(abs(pca_loadings1[, 2])))]))
## [1] "TIMM10 PRDX3 UQCRC2 TMEM126A MRPS28 OAT OXR1 SDHD PMPCB SUCLG2 ACSL1 NIT1 MOSC2 ISCU PROSC ME2 SCO1 GFM1 UQCRFS1 ALKBH7 SARS DLST MAT2B AFG3L2 CPT1B MRPS30 IDI1 TFAM COX5B MRPL4 LOC389293 LDHB PTS TSFM NIPSNAP3A LETMD1 RAB8B CYB5R1 NDUFAF1 HIGD1A FH ZADH2 SDHC GNG5 POLRMT TFB2M TOMM22 LACTB NTHL1 RG9MTD1 SARS2 ATP5C1 CHCHD7 COQ10B TSHZ3 PIGY NFXL1 MRPS18C SLC25A28 RSAD1 DLD PRO1853 HSDL2 ADCK4 HSD17B10 ALDH1L1 GBAS DNAJC11 MDH1 MRPS34 SLC25A10 MTCP1 NDUFV1 OGDH SLC25A11 LIPT1 SLC25A13 MGC70857 MIPEP SAMM50 PHB2 ACADS NME6 ACSL5 HBXIP MRPL49 CBR4 MFN1 MUT HRSP12 MRPL18 TTC19 NARS2 ALDH9A1 ETFA ALDH3A2 MRPS24 NME4 OXA1L MRPL33 RDH13 HTRA2 PCBD2 SPG7 IREB2 MRPL47 MGC3196 MCEE THEM4 RAB4B LIAS PPIF PHYH HDDC2 CPT2 MRPS18A HSPA9 MRPL32 MRPL3 SLC25A40 WBSCR18 NDUFB2 AGPS HIGD2A TBRG4 BCKDHA MRPL13 MRPS33 PTRH1 RAB11B TMEM160 ACAD8 NME1-NME2 NIPSNAP3B BPHL DHODH TXNRD2 CYP11B2 HSPE1 GPAM IDE RHOA SLC25A37 TXN2 ETHE1 ENDOG ATP5I LONP1 MRPS21 BLOC1S1 ABCB6 POLG PISD OSGEPL1 RFK PANK2 BDH1 FAHD2A VDAC1 SLC25A26 ATAD3A CCDC90B MRPL21 DNAJC15 MRM1 NDUFA9 PTCD2 PPM1K LOC646630 PRDX1 CLIC4 BCAT2 MRPS2 AGPAT5 HSPB7 MTO1 ADCK5 RDH14 PPA2 CCDC109A LDHD ALDH7A1 MTFMT SLC25A19 MRPL28 SLC25A46 MTX1 MFN2 GRSF1 CCDC51 LYRM4 UXS1 MRPL40 PDHB MRPS27 GUF1 LACTB2 GOT2 SLC25A18 IVD FKSG24 TRIT1 GPX1 DNAJA3 AURKAIP1 SLC25A23 NDUFA7 COQ2 PCCB MRPL43 MTX2 ALDH5A1 NUDT19 MTERFD3 ACP6 SFXN3 ALAS1 KYNU RAB1B MRPS18B PTGES2 TIMM44 AS3MT BRP44 NDUFA10 SNPH MRPS36 GFER COX11 NDUFB10 FTMT CBARA1 MRPL24 NDUFA4 GCAT GLRX UCRC NDUFC2 QDPR DCI COX7C GPX4 MRPL12 MRPL54 IMMP2L NDUFS7 RPL34 TRAP1 CHCHD3 DDX28 ELAC2 ACO2 LYRM5 REXO2 CCT7 COX18 CA5B AIFM2 SND1 SLC25A43 ACAA1 CKMT1A AKR7A2 POLDIP2 OXSM SLC25A16 DLAT EFHD1 AIFM1 GALC MRRF PMPCA TOP1MT THEM5 FAHD1 SHMT2 SECISBP2 MPST LETM1 RPL35A ARMC1 MRPL19 ISOC2 KIAA0141 TOMM70A NDUFS3 WBSCR16 PINK1 PDSS1 UQCRB ACOT9 PPM2C BRP44L PACRG SLC25A32 GHITM APEX2 COQ5 NME3 NNT NLRX1 MTERFD1 IDH3G TIMM23 ABCD1 FXN SDHA PDPR IARS2 PECI TOMM7 GSTK1 SIRT3 NCOA4 GTPBP5 FKBP8 ACADM TMEM65 IDH3A SFXN4 COX4I1 GLDC UQCRH CCDC44 CRY1 MRPL34 PARK7 UQCR MRPL14 BBOX1 ABCB10 GMPPB HSD17B8 ETFB GLRX2 CYP27B1 HIBADH NDUFAB1 NDUFB7 SPR COX10 PARS2 SQRDL GK2 SLC25A3 COX8A CRLS1 OPA3 TMEM14C NT5M SOD1 TUFM ATP5L MARS HIBCH NDUFB5 GPD2 HAO2 CRAT RHOT2 EARS2 PRODH2 POLG2 ME1 ATP5D VISA MGC19604 MRPL53 HMGCL PCCA PYCR1 MRPL16 AGR2 PRDX6 COMTD1 DUT IDH2 ATP5J2 METT11D1 FTSJ2 SUPV3L1 MRPS5 NDUFB4 MTHFD2L VDAC3 PDHX HAGH SLC25A39 MSRB2 TRMU MTIF3 ASAH2 TK2 NT5C SURF1 HK1 PAK7 GTPBP8 ATAD1 UNG NDUFA3 GLYCTK MPV17 COX6B1 STAR ALDH4A1 PDSS2 MRPS6 DRG2 TIMM8A KARS OMA1 PARL ALDH1L2 NT5C3 BOLA1 SLC25A25 ME3 UQCRQ GARS ACAD11 WDR22 LOC123876 RPS14 WARS2 COQ6 AMACR QRSL1 ADCK1 DAP3 MRPL15 BCL2L13 TOMM40L FDXR OCIAD1 ACAT1 TCIRG1 HADHB PHYHIPL MRPL27 COX7B TATDN3 NARS MRPL41 MRPL48 RPS15A SLC25A14 CPS1 NDUFS4 ATP5G1 ABCA9 PGS1 MAP1D MRPL10 MTHFD1L SLC25A31 TIMM13 CLPB GLUD1 ECSIT LOC644310 COX15 SLC25A29 DHTKD1 SLC25A44 AKAP1 ATP5G2 FAM36A EEFSEC ACN9 NDUFA5 CYP11B1 BAK1 MRPS12 ICT1 TDRKH TRNT1 NDUFA13 GCDH UQCRC1 BID NDUFV2 MRPL23 MTCH1 ABCC12 LDHAL6B VDAC2 AGXT2L2 FTH1 ABCF2 ATPIF1 DMGDH ACSL4 STOML2 NIF3L1 CYP11A1 THG1L LYRM2 COX8C FECH SFXN5 ACO1 MRPL2 EFHA1 GLYAT ATP5G3 AGXT2 ACSS1 LARS2 MRPL30 MRPL38 NDUFB11"
For this data the forks can be seen in the 1st Principle component. I will also test the hypothesis that the ordering is biased toward having the control samples at either end of the ordering. This is don by approximation by using the central limit theorem to find an approximate normal distribution for the mean placement of all the control samples and then comparing that to the position of the ordering found for each seed.
control_mean1 <- mean(which(all_ga_sort > 106))
control_mean2 <- mean(which(hyp_ga_sort > 106))
c_m <- c(1:1e+05)
for (i in 1:1e+05) {
c_m[i] <- mean(which(sample(c(1:145), 145, replace = F) > 106))
}
# hist(c_m,breaks=100) # By CLT a normal distribution
length(which(c_m > control_mean1))/length(c_m)
## [1] 0.1313
length(which(c_m > control_mean2))/length(c_m)
## [1] 0.03336
The hypertrophic seed ordering is significant with a p-value of 0.03 while the other ordering is not significant. Now we can plot the two forks.
control_status1 <- rep("Hypertrophic", 145)
control_status1[which(hyp_ga_sort > 106)] <- "Control"
control_status2 <- rep("Hypertrophic", 145)
control_status2[which(all_ga_sort > 106)] <- "Control"
heart_data1 <- data.frame(PC = pc1_list1, Status = control_status1, Order = c(1:145))
heart_data2 <- data.frame(PC = pc1_list2, Status = control_status2, Order = c(1:145))
p1 <- ggplot(heart_data1, aes(Order, PC))
p2 <- ggplot(heart_data2, aes(Order, PC))
p1 + geom_point(aes(colour = factor(Status))) + ylab("PC2") + labs(title = "") +
scale_colour_manual(values = c("black", "blue"))
p2 + geom_point(aes(colour = factor(Status))) + ylab("PC2") + scale_colour_manual(values = c("black",
"blue"))
Clearly all the down regulated samples are controls, indicating that this regulation does not occur in any of the hypertrophic samples.