Here, we take the expression profile of MAP3K2 as the data source for the power and type I error analysis. MAP3K2 is located at chr2:128,056,245-128,100,805 (hg19, UCSC database), a span of 44,560 bp, of which the exon region accounts for 10,870 bp. Functional analysis indicates that MAP3K2 is highly associated with cancer development.

According to AceView, this gene is expressed at a high level, 3.8 times the average gene in this release. The sequence of this gene is defined by 440 GenBank accessions from 395 cDNA clones, some from lung (seen 26 times), breast carcinoma (23), brain (17), pancreas (16), liver (14), testis (13), kidney (12) and 126 other tissues. AceView annotates structural defects or features in 54 cDNA clones. Analysis of alternative mRNA variants and regulation shows that the gene contains 17 distinct gt-ag introns. Transcription produces 8 different mRNAs: 4 alternatively spliced variants and 4 unspliced forms. There are 3 probable alternative promoters, 2 non-overlapping alternative last exons and 6 validated alternative polyadenylation sites (see the diagram). The mRNAs appear to differ by truncation of the 5’ end, truncation of the 3’ end, and overlapping exons with different boundaries. 958 bp of this gene are antisense to the spliced gene choydawbu, raising the possibility of regulated alternate expression.

In the data preprocessing stage, we need to extract the raw number of reads for MAP3K2 and combine the SNP data with the RNA-seq data.

Go to the directory /hgcnt44fs/sguo/tcga_OV/rnaseq/bam/drug on 129.106.2.194 and create the BED file MAP3K2.bed.
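As a minimal sketch of this step, the BED file could be written from R using the hg19 coordinates quoted above. The `chr`-prefixed chromosome name, the four-column layout and the temporary output path are assumptions made for illustration; the original notes do not show the file's contents.

```r
# UCSC-style coordinates quoted in the text (hg19): 1-based, inclusive.
chrom        <- "chr2"        # assumption: the BAM headers use "chr"-prefixed names
start_1based <- 128056245L
end_1based   <- 128100805L

# BED uses a 0-based start and an exclusive end, so only the start shifts by one.
bed <- data.frame(
  chrom = chrom,
  start = start_1based - 1L,
  end   = end_1based,
  name  = "MAP3K2"
)

# Hypothetical output location; on the server this would be the working directory.
bed_file <- file.path(tempdir(), "MAP3K2.bed")
write.table(bed, bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Inspect the single tab-separated line that was written.
readLines(bed_file)
```

In BED coordinates, `end - start` counts every base in the interval, which gives 44,561 here; the 44,560 bp quoted earlier is the plain difference between the two UCSC coordinates.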

Point bam2vcf.pl to this BED file and then run the Perl script in the directory /hgcnt44fs/sguo/tcga_OV/rnaseq/bam/drug, which creates the BAM file and VCF file for MAP3K2.

The mean number of reads for this gene (10870 loci in 233 samples) is about 95.6, indicating that it is a medium-expression gene. Custom Perl code and Bedcoverage were applied to collect the raw number of reads at each position.

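# Load the per-position raw read counts for MAP3K2
setwd("/home/sguo/Dropbox/Project/AlleleSpecificExpression/ASE")
load("MAP3K2.RData")
# Rows are positions (10870 loci); columns are samples (233)
dim(dat3)
## [1] 10870   233
# Mean raw read count over all positions and samples
mean(dat3)
## [1] 95.59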
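
The overall mean hides how coverage varies along the gene and between samples. The sketch below shows one way to break it down, assuming `dat3` is a numeric matrix with positions in rows and samples in columns (which the `dim()` output suggests but the notes do not state). It runs on a small simulated matrix `toy`; the matrix size and the Poisson model are illustrative assumptions, not properties of the real data.

```r
set.seed(1)

# Simulated stand-in for dat3: 20 positions (rows) by 5 samples (columns).
# The dimensions and the Poisson(95) counts are assumptions for illustration only.
toy <- matrix(rpois(20 * 5, lambda = 95), nrow = 20, ncol = 5,
              dimnames = list(paste0("pos", 1:20), paste0("sample", 1:5)))

overall_mean <- mean(toy)      # one number over all positions and samples
per_position <- rowMeans(toy)  # mean read count at each position across samples
per_sample   <- colMeans(toy)  # mean read count of each sample across positions

# With a complete matrix, the mean of the per-sample means equals the overall mean.
all.equal(mean(per_sample), overall_mean)

# Spread of the per-position means along the gene.
summary(per_position)
```

Applied to the real data, `rowMeans(dat3)` and `colMeans(dat3)` would give the same breakdown, provided `dat3` contains no missing values.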
