wc -l SRR7207011_filteredmp_macs2_peaks.narrowPeak
#Command used to count the total peaks called for sample SRR7207011
#Output: 2001 SRR7207011_filteredmp_macs2_peaks.narrowPeak
wc -l SRR7207017_filteredmp_macs2_peaks.narrowPeak
#Command used to count the total peaks called for sample SRR7207017
#Output: 20783 SRR7207017_filteredmp_macs2_peaks.narrowPeak
MACS2 called 2001 peaks for sample SRR7207011 and 20783 peaks for sample SRR7207017.
The mean peak width is 255.75962 bp for sample SRR7207011 and 337.2638214 bp for sample SRR7207017. To obtain these values, the MACS2 xls file for each sample was downloaded and the length (i.e., width) of every identified peak was averaged in Excel, so R was not needed here. The Excel files used for the mean peak width calculations are attached to this homework.
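The same mean can also be computed directly from the narrowPeak file. This is a minimal sketch, assuming the standard BED-style narrowPeak layout in which column 2 is the 0-based start and column 3 the exclusive end, so each width is end - start. As far as I know this matches the `length` column of the MACS2 xls (which reports 1-based starts). The helper name `mean_peak_width` and the toy peaks are hypothetical.

```shell
#!/usr/bin/env bash
# Mean peak width of a narrowPeak file (hypothetical helper name).
# Columns 2 and 3 are chromStart (0-based) and chromEnd (exclusive),
# so each peak's width is end - start.
mean_peak_width() {
  awk 'BEGIN { FS = "\t" }
       $0 !~ /^(#|track|browser)/ { sum += $3 - $2; n++ }   # skip header-like lines
       END { if (n > 0) printf "%.4f\n", sum / n }' "$1"
}

# Toy example: two peaks of width 200 and 300 bp
toy=$(mktemp)
printf 'chr1\t100\t300\tp1\t0\t.\t5.2\t10\t8\t90\nchr1\t1000\t1300\tp2\t0\t.\t3.1\t7\t5\t150\n' > "$toy"
mean_peak_width "$toy"   # prints 250.0000
rm -f "$toy"
```

On the real data this would be called as `mean_peak_width SRR7207011_filteredmp_macs2_peaks.narrowPeak`.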
The signalValue in column 7 of the narrowPeak file is a measurement of the overall (usually average) enrichment for that region; for MACS2 peaks it is the fold enrichment at the peak summit. Because it quantifies the “intensity” of the signal at each called peak, signalValue can be compared across peaks to help distinguish strong, likely true peaks from weaker ones.
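Because signalValue is a single number per peak, it can be used directly to rank peaks. This is a minimal sketch using standard `sort` options (`-k7,7nr` sorts numerically on column 7 only, in descending order). The helper name `top_peaks` and the toy peaks are hypothetical.

```shell
#!/usr/bin/env bash
# Report the N peaks with the highest signalValue (column 7) in a narrowPeak file.
# top_peaks is a hypothetical helper name.
top_peaks() {
  # $1: narrowPeak file, $2: number of peaks to report
  # -t TAB: tab-separated columns; -k7,7nr: numeric, descending, column 7 only
  LC_ALL=C sort -t "$(printf '\t')" -k7,7nr "$1" | head -n "$2"
}

# Toy example: three peaks with signalValues 2.5, 10.1 and 7.0
toy=$(mktemp)
printf 'chr1\t100\t300\tp1\t0\t.\t2.5\t5\t3\t90\nchr1\t900\t1200\tp2\t0\t.\t10.1\t30\t25\t140\nchr2\t50\t400\tp3\t0\t.\t7.0\t20\t15\t170\n' > "$toy"
top_peaks "$toy" 2   # p2 (10.1) first, then p3 (7.0)
rm -f "$toy"
```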
bedtools intersect -a SRR7207011_filteredmp.bam -b SRR7207011_filteredmp_macs2_summits.bed > SRR7207011_bedtools.out
samtools view -c SRR7207011_bedtools.out
#Output: 19880 (numerator)
samtools view -c SRR7207011_filteredmp.bam
#Output: 22517898 (denominator)
#FRiP score for sample SRR7207011: 19880/22517898=0.0008828533=0.08828533%
bedtools intersect -a SRR7207017_filteredmp.bam -b SRR7207017_filteredmp_macs2_summits.bed > SRR7207017_bedtools.out
samtools view -c SRR7207017_bedtools.out
#Output: 212633 (numerator)
samtools view -c SRR7207017_filteredmp.bam
#Output: 20702991 (denominator)
#FRiP score for sample SRR7207017: 212633/20702991=0.01027064=1.027064%
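Sample SRR7207011 has a FRiP score of 0.08828533%, which does not pass the 1% threshold typical of high-quality ChIP-seq libraries, while sample SRR7207017 has a FRiP score of 1.027064%, which does pass it. Note that the numerators above count reads overlapping the 1-bp summit positions (macs2_summits.bed) rather than the full peak intervals; since every read that overlaps a summit also overlaps the peak containing it, these values likely underestimate the usual peak-based FRiP.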
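FRiP is simply reads in peaks divided by total reads. The arithmetic can be scripted with awk, as in this minimal sketch (`frip` is a hypothetical helper name). The commented command shows a hedged peak-based alternative that was not run here: it intersects with the narrowPeak intervals and uses bedtools' `-u` option so each read is reported once even if it overlaps several intervals.

```shell
#!/usr/bin/env bash
# FRiP = reads overlapping peaks / total reads (hypothetical helper name).
frip() {
  # $1: reads in peaks (numerator), $2: total reads (denominator)
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.8f\n", n / d }'
}

# Counts recorded above (summit-based numerators)
frip 19880 22517898    # SRR7207011, prints 0.00088285
frip 212633 20702991   # SRR7207017, prints 0.01027064

# Hedged alternative (not what was run above): count each read once (-u)
# if it overlaps any full peak interval rather than only a 1-bp summit:
#   bedtools intersect -u -a SRR7207011_filteredmp.bam \
#     -b SRR7207011_filteredmp_macs2_peaks.narrowPeak | samtools view -c -
```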
The cross-correlation profile plot is attached to this assignment as a PDF. The contents of the crossvalues_Chip list are shown below:
class(crossvalues_Chip)
crossvalues_Chip
$CC_StrandShift
[1] 200
$tag.shift
[1] 100
$N1
[1] 8697346
$Nd
[1] 9035245
$CC_PBC
[1] 0.963
$CC_readLength
[1] 36
$CC_UNIQUE_TAGS_LibSizeadjusted
[1] 5987801
$CC_NSC
[1] 1.447
$CC_RSC
[1] 1.079
$CC_QualityFlag
[1] 1
$CC_shift
[1] 200
$CC_A
[1] 0.229
$CC_B
[1] 0.224
$CC_C
[1] 0.158
$CC_ALL_TAGS
[1] 9420858
$CC_UNIQUE_TAGS
[1] 9035245
$CC_UNIQUE_TAGS_nostrand
[1] 8909432
$CC_NRF
[1] 0.959
$CC_NRF_nostrand
[1] 0.946
$CC_NRF_LibSizeadjusted
[1] 0.599
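Several fields in this list are ratios of raw counts that appear in the same list. The sketch below assumes the ENCODE-style definitions NRF = unique tags / all tags and PBC = N1 / Nd (locations with exactly one read / distinct locations). With those assumptions it reproduces CC_NRF (0.959) and CC_PBC (0.963) from CC_ALL_TAGS, CC_UNIQUE_TAGS, N1 and Nd. The helper name `complexity` is hypothetical.

```shell
#!/usr/bin/env bash
# Recompute library-complexity ratios from raw counts (hypothetical helper name).
# Assumed definitions:
#   NRF = unique tags / all tags
#   PBC = N1 / Nd (positions with exactly one read / distinct positions)
complexity() {
  # $1: CC_ALL_TAGS, $2: CC_UNIQUE_TAGS, $3: N1, $4: Nd
  awk -v all="$1" -v uniq="$2" -v n1="$3" -v nd="$4" \
    'BEGIN { printf "NRF=%.3f PBC=%.3f\n", uniq / all, n1 / nd }'
}

complexity 9420858 9035245 8697346 9035245   # ChIP sample: NRF=0.959 PBC=0.963
```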
a. aligned read asymmetries between DNA strands are the basis for the cross-correlation profile approach to ChIP-seq QC
c. correlations between strand-specific depths are recalculated after performing a “strand shift”
g. NSC values above a pre-defined threshold are of acceptable quality for downstream analysis
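Statements a and c can be illustrated with a toy calculation. This sketch is not the algorithm used by the package that produced crossvalues_Chip. For each shift d, it computes the Pearson correlation between + strand read-start counts at position i and - strand counts at position i + d, then reports the shift with the highest correlation. In the made-up counts, the - strand pattern is the + strand pattern moved 3 positions downstream, standing in for a fragment length of 3. The helper name `strand_xcor` is hypothetical.

```shell
#!/usr/bin/env bash
# Toy strand cross-correlation (illustrative only, hypothetical helper name).
# For each shift d, correlate + strand counts at i with - strand counts at i + d.
strand_xcor() {
  # $1: + strand counts, $2: - strand counts (space-separated, same length), $3: max shift
  awk -v p="$1" -v m="$2" -v maxd="$3" 'BEGIN {
    n = split(p, P, " "); split(m, M, " ")
    best = -2; bestd = -1
    for (d = 0; d <= maxd && n - d >= 2; d++) {
      k = n - d; sx = sy = sxx = syy = sxy = 0
      for (i = 1; i <= k; i++) {
        x = P[i]; y = M[i + d]
        sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
      }
      vx = sxx - sx * sx / k; vy = syy - sy * sy / k
      # Pearson r; defined as 0 when either strand has no variance at this shift
      r = (vx > 0 && vy > 0) ? (sxy - sx * sy / k) / sqrt(vx * vy) : 0
      printf "shift=%d r=%.3f\n", d, r
      if (r > best) { best = r; bestd = d }
    }
    printf "best_shift=%d\n", bestd
  }'
}

# Toy data: the - strand signal is the + strand signal moved 3 positions right
strand_xcor "0 6 0 0 0 0 0 3 0 0 0 0 0 0" "0 0 0 0 6 0 0 0 0 0 3 0 0 0" 6
```

The last line printed is `best_shift=3`, the planted offset. In real data, the analogous maximum sits near the fragment length.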
NSC (ChIP sample): 1.447
RSC (ChIP sample): 1.079
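Yes. The RSC calculation incorporates the height of the “shadow” or phantom peak, i.e. the cross-correlation at a strand shift equal to the read length: RSC = (CC at fragment length - minimum CC) / (CC at read length - minimum CC).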
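Both scores can be recomputed from CC_A, CC_B and CC_C in the list above. This is a minimal sketch, assuming CC_A is the cross-correlation at the fragment-length shift, CC_B is the phantom (read-length) value and CC_C is the minimum, so that NSC = CC_A / CC_C and RSC = (CC_A - CC_C) / (CC_B - CC_C). The result (NSC=1.449, RSC=1.076) agrees with the reported 1.447 and 1.079 to within the rounding of the three-decimal inputs. The helper name `nsc_rsc` is hypothetical.

```shell
#!/usr/bin/env bash
# NSC and RSC from strand cross-correlation values (hypothetical helper name).
nsc_rsc() {
  # $1: CC at fragment length (CC_A), $2: phantom CC at read length (CC_B),
  # $3: minimum CC (CC_C)
  awk -v a="$1" -v b="$2" -v c="$3" \
    'BEGIN { printf "NSC=%.3f RSC=%.3f\n", a / c, (a - c) / (b - c) }'
}

nsc_rsc 0.229 0.224 0.158   # ChIP sample: NSC=1.449 RSC=1.076
```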
The cross-correlation profile figure for the input sample is attached to this assignment as a PDF.
Steps included in the preparation of the control/input sample:
a. chemical cross-linking of DNA and protein
b. fragmentation of DNA
d. unlinking of DNA and protein
e. library preparation and sequencing of DNA
Steps EXCLUDED from the preparation of the control/input sample:
c. chromatin immunoprecipitation with an appropriate antibody
Output list for the input sample:
$CC_StrandShift
[1] 140
$tag.shift
[1] 70
$N1
[1] 8725518
$Nd
[1] 9119345
$CC_PBC
[1] 0.957
$CC_readLength
[1] 36
$CC_UNIQUE_TAGS_LibSizeadjusted
[1] 5992518
$CC_NSC
[1] 1.04
$CC_RSC
[1] 0.214
$CC_QualityFlag
[1] -2
$CC_shift
[1] 100
$CC_A
[1] 0.127
$CC_B
[1] 0.145
$CC_C
[1] 0.122
$CC_ALL_TAGS
[1] 9606580
$CC_UNIQUE_TAGS
[1] 9119345
$CC_UNIQUE_TAGS_nostrand
[1] 9098023
$CC_NRF
[1] 0.949
$CC_NRF_nostrand
[1] 0.947
$CC_NRF_LibSizeadjusted
[1] 0.599
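The NSC value for the input sample is 1.04 and the RSC value is 0.214. If this library were a ChIP sample, it would NOT pass the ENCODE quality control standards. Low NSC and RSC values are expected for an input control, because it is not immunoprecipitated and therefore should show no specific enrichment.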
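The pass/fail call and the CC_QualityFlag values can be checked with a small sketch. Two assumptions are labeled in the comments. First, the RSC-to-flag thresholds (below 0.25 gives -2, 0.25 to 0.5 gives -1, 0.5 to 1 gives 0, 1 to 1.5 gives 1, 1.5 and above gives 2) are the ones phantompeakqualtools uses for its QualityTag, as I understand it. Second, the pass criteria NSC ≥ 1.05 and RSC ≥ 0.8 are the commonly cited ENCODE guideline values. Under these assumptions, the input sample maps to flag -2 and fails, and the ChIP sample maps to flag 1 and passes, which matches the listed CC_QualityFlag values. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Map RSC to a quality flag (hypothetical helper; thresholds assumed to follow
# phantompeakqualtools' QualityTag: -2 veryLow ... 2 veryHigh).
quality_flag() {
  awk -v r="$1" 'BEGIN {
    if (r < 0.25) q = -2
    else if (r < 0.5) q = -1
    else if (r < 1) q = 0
    else if (r < 1.5) q = 1
    else q = 2
    print q
  }'
}

# Pass/fail against commonly cited ENCODE guidance (assumed: NSC >= 1.05, RSC >= 0.8).
encode_cc_check() {
  # $1: NSC, $2: RSC
  awk -v nsc="$1" -v rsc="$2" 'BEGIN {
    res = (nsc >= 1.05 && rsc >= 0.8) ? "pass" : "fail"
    print res
  }'
}

quality_flag 0.214; encode_cc_check 1.04 0.214    # input sample: -2, fail
quality_flag 1.079; encode_cc_check 1.447 1.079   # ChIP sample: 1, pass
```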
The fingerprint plot is also attached.
The TSS plot shows the read density of the ChIP sample around transcription start sites compared with the control. Unlike the control, the ChIP sample shows two peaks, one to the left (upstream) and one to the right (downstream) of the TSS, indicating regions enriched for bound proteins (transcription factors or histone modifications). The downstream peak (from the TSS to +1 kb) is much broader than the upstream peak. A broad signal like this is more typical of a histone modification than of a transcription factor, and its position downstream of the TSS may reflect where the H3K36me3 modification is located relative to protein-coding genes, since H3K36me3 marks the bodies of transcribed genes.
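How the attached TSS plot was generated is not stated, so the sketch below only illustrates the idea behind such a profile and why strand matters for "upstream" versus "downstream". For a minus-strand gene, downstream means lower genomic coordinates. The toy awk code bins read positions by their strand-aware offset from each TSS and averages over TSSs. The helper name `tss_profile`, the file layouts (TSS: chrom, position, strand; reads: chrom, position) and all coordinates are hypothetical.

```shell
#!/usr/bin/env bash
# Toy strand-aware TSS meta-profile (hypothetical helper name).
# Positive offsets are downstream of the TSS (into the gene body).
tss_profile() {
  # $1: TSS file (chrom pos strand), $2: read file (chrom pos), $3: window (bp), $4: bin size (bp)
  awk -v w="$3" -v bs="$4" '
    NR == FNR { n++; tc[n] = $1; tp[n] = $2; ts[n] = $3; next }   # load TSSs
    {
      for (i = 1; i <= n; i++) {
        if ($1 != tc[i]) continue
        # flip the sign for minus-strand genes so "+" always means downstream
        rel = (ts[i] == "-") ? tp[i] - $2 : $2 - tp[i]
        if (rel < -w || rel >= w) continue
        cnt[int((rel + w) / bs)]++
      }
    }
    END {
      nb = int(2 * w / bs)
      for (b = 0; b < nb; b++) printf "%d\t%.3f\n", b * bs - w, cnt[b] / n   # bin start, mean count per TSS
    }' "$1" "$2"
}

# Toy data: one plus-strand TSS at 1000 and one minus-strand TSS at 5000
tss=$(mktemp); reads=$(mktemp)
printf 'chr1\t1000\t+\nchr1\t5000\t-\n' > "$tss"
printf 'chr1\t1100\nchr1\t1200\nchr1\t900\nchr1\t4850\nchr1\t5050\n' > "$reads"
tss_profile "$tss" "$reads" 300 100
# expected: -300 0.000, -200 0.000, -100 1.000, 0 0.000, 100 1.000, 200 0.500
rm -f "$tss" "$reads"
```

The read at 4850 counts as downstream (+150) of the minus-strand TSS at 5000, even though it lies at a lower coordinate.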