Task 1: MACS2 peaks and FRiP scores

Q1.1a: How many peaks were called by MACS2 for each of the two Androgen Receptor ChIP-Seq samples in your MACS2 outputs from last week?

wc -l SRR7207011_filteredmp_macs2_peaks.narrowPeak
#Command used to count the total peaks called for sample SRR7207011
#Output: 2001 SRR7207011_filteredmp_macs2_peaks.narrowPeak

wc -l SRR7207017_filteredmp_macs2_peaks.narrowPeak
#Command used to count the total peaks called for sample SRR7207017
#Output: 20783 SRR7207017_filteredmp_macs2_peaks.narrowPeak

MACS2 called 2001 peaks for sample SRR7207011 and 20783 peaks for sample SRR7207017.

Q1.1b: What is the mean peak width for each sample? Show the R command (or other approach) you used to arrive at your answer.

The mean peak width is 255.75962 bp for sample SRR7207011 and 337.2638214 bp for sample SRR7207017. The MACS2 .xls output for each sample was downloaded and opened in Excel, and the length (i.e., width) of every called peak was averaged within the file, so R was not necessary in this case. The Excel files that were used, including the mean peak width calculations, are attached to this homework.
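The same average can also be computed on the command line. The following is a minimal sketch, assuming a standard narrowPeak file in which column 2 is the 0-based start and column 3 is the end, so that end minus start should match the length column in the MACS2 .xls file. The function name mean_peak_width and the three demo peaks are hypothetical and are not data from either sample.

```shell
# Sketch: mean peak width (end - start) from a narrowPeak/BED-like file.
# Assumes column 2 = chromStart (0-based) and column 3 = chromEnd.
mean_peak_width() {
    awk '{ total += $3 - $2 }
         END { if (NR > 0) printf "%.2f\n", total / NR }' "$1"
}

# Hypothetical three-peak file with widths 250, 200, and 500 bp.
demo=$(mktemp)
printf 'chr1\t100\t350\tp1\nchr1\t1000\t1200\tp2\nchr2\t500\t1000\tp3\n' > "$demo"
mean_peak_width "$demo"   # prints 316.67
rm -f "$demo"
```

For the real data, the call would be mean_peak_width SRR7207011_filteredmp_macs2_peaks.narrowPeak, and likewise for SRR7207017.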

Q1.1c: What is meant by the “signalValue” in column 7?

The signalValue in column 7 is a measurement of the overall (usually average) enrichment for that region; in MACS2 narrowPeak output it is the fold enrichment at the peak summit. It is a quantifiable value that indicates the “intensity” of the signal of each called peak, so it can be compared across peaks to help separate strong, likely true peaks from weak ones.

For your Q1.2 answer, report the FRiP score for each of the two androgen receptor ChIP-seq libraries and the command lines you used to generate your answer. Do the two androgen receptor libraries pass the 1% threshold typical of high quality ChIP-seq libraries? [ 2 points ].

bedtools intersect -a SRR7207011_filteredmp.bam -b SRR7207011_filteredmp_macs2_summits.bed > SRR7207011_bedtools.out
samtools view -c SRR7207011_bedtools.out
#Output: 19880 (numerator)

samtools view -c SRR7207011_filteredmp.bam
#Output: 22517898 (denominator)
#FRiP score for sample SRR7207011: 19880/22517898 = 0.0008828533 = 0.08828533%

bedtools intersect -a SRR7207017_filteredmp.bam -b SRR7207017_filteredmp_macs2_summits.bed > SRR7207017_bedtools.out
samtools view -c SRR7207017_bedtools.out
#Output: 212633 (numerator)
samtools view -c SRR7207017_filteredmp.bam
#Output: 20702991 (denominator)
#FRiP score for sample SRR7207017: 212633/20702991 = 0.01027064 = 1.027064%

Sample SRR7207011 has a FRiP score of 0.08828533%, which does not pass the 1% threshold typical of high quality ChIP-seq libraries. Sample SRR7207017 has a FRiP score of 1.027064%, which does pass the 1% threshold.
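The FRiP arithmetic and the 1% check can be reproduced from the read counts reported above. This is a minimal sketch using awk only; the function names frip_percent and frip_status are hypothetical helpers, not part of bedtools or samtools. Note that the numerators above come from intersecting with the 1 bp summit positions; FRiP is conventionally computed against the full peak intervals (the narrowPeak file), which would be expected to give a larger numerator, so these values are probably best read as conservative.

```shell
# frip_percent READS_IN_PEAKS TOTAL_READS -> FRiP as a percentage (4 decimals)
frip_percent() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.4f\n", 100 * n / d }'
}

# frip_status PERCENT -> "pass" if at or above the 1% threshold, else "fail"
frip_status() {
    awk -v p="$1" 'BEGIN { print ((p >= 1.0) ? "pass" : "fail") }'
}

# Read counts taken from the samtools view -c outputs reported above.
pct11=$(frip_percent 19880 22517898)
pct17=$(frip_percent 212633 20702991)
echo "SRR7207011 FRiP = ${pct11}% ($(frip_status "$pct11"))"   # 0.0883% (fail)
echo "SRR7207017 FRiP = ${pct17}% ($(frip_status "$pct17"))"   # 1.0271% (pass)
```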

Task 2: QC analysis of H3K36me3 ChIP-seq with the BioConductor ChIC package

Q2.1a: Include the cross-correlation profile plot in your answers file [ 1 point ].

The cross-correlation profile plot is attached to this assignment as a PDF. The contents of the crossvalues_Chip list are shown below:

class(crossvalues_Chip)
crossvalues_Chip
$CC_StrandShift
[1] 200

$tag.shift
[1] 100

$N1
[1] 8697346

$Nd
[1] 9035245

$CC_PBC
[1] 0.963

$CC_readLength
[1] 36

$CC_UNIQUE_TAGS_LibSizeadjusted
[1] 5987801

$CC_NSC
[1] 1.447

$CC_RSC
[1] 1.079

$CC_QualityFlag
[1] 1

$CC_shift
[1] 200

$CC_A
[1] 0.229

$CC_B
[1] 0.224

$CC_C
[1] 0.158

$CC_ALL_TAGS
[1] 9420858

$CC_UNIQUE_TAGS
[1] 9035245

$CC_UNIQUE_TAGS_nostrand
[1] 8909432

$CC_NRF
[1] 0.959

$CC_NRF_nostrand
[1] 0.946

$CC_NRF_LibSizeadjusted
[1] 0.599

Q2.1b: Select all that are true statements about the cross-correlation profile [ 1 point ]:

a. aligned read asymmetries between DNA strands are the basis for the cross-correlation profile approach to ChIP-seq QC
c. correlations between strand-specific depths are recalculated after performing a “strand shift”
g. NSC values above a pre-defined threshold are of acceptable quality for downstream analysis

Q2.2a: What is the NSC value for the ChIP sample?

1.447

Q2.2b: What is the RSC value for the ChIP sample?

1.079

Q2.2c: Does either of the metrics incorporate the “shadow peak” height in how it is calculated? Which one(s)?

Yes. The RSC value incorporates the “shadow peak” (phantom peak) height in its calculation; the NSC value does not.
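Both metrics can be approximately recomputed from the CC_A, CC_B, and CC_C values in the ChIP output list. The sketch below assumes that CC_A is the cross-correlation at the fragment-length peak, CC_B is the cross-correlation at the phantom (shadow) peak, and CC_C is the minimum (background) cross-correlation, with NSC = A / C and RSC = (A - C) / (B - C). The function names nsc and rsc are hypothetical helpers, not ChIC functions.

```shell
# nsc A C   -> normalized strand coefficient, A / C
nsc() {
    awk -v a="$1" -v c="$2" 'BEGIN { printf "%.3f\n", a / c }'
}

# rsc A B C -> relative strand coefficient, (A - C) / (B - C)
# B is the phantom-peak height, which is why only RSC depends on it.
rsc() {
    awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN { printf "%.3f\n", (a - c) / (b - c) }'
}

# ChIP sample: CC_A = 0.229, CC_B = 0.224, CC_C = 0.158
nsc 0.229 0.158         # prints 1.449
rsc 0.229 0.224 0.158   # prints 1.076
```

These come out close to, but not exactly equal to, the reported 1.447 and 1.079, presumably because CC_A, CC_B, and CC_C are printed rounded to three decimals.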

Q2.2d: Landt et al. 2012 (see “Cross-correlation Analysis”) provide minimum NSC and RSC values for libraries with acceptable signal-to-noise ratios. What are these minimum values and does this library pass ENCODE standards for quality control? Select one:

  1. minimum NSC = 1.05, minimum RSC = 0.8, the library passes QC

Q2.3: Now answer the following for the input sample. Be sure to include the cross-correlation profile figure for the input sample in your answer [ 1 point ].

The cross-correlation profile figure for the input sample is attached with this assignment as a pdf.

Q2.3a: What steps are typically included/excluded in the preparation of the control/input sample? (see Week 11 pre-recorded videos).

Steps included in the preparation of control/input sample:
a. chemical cross-linking of DNA and protein
b. fragmentation of DNA
d. unlinking of DNA and protein
e. library preparation and sequencing of DNA

Steps EXCLUDED in the preparation of the control/input sample:
c. chromatin immunoprecipitation with an appropriate antibody

Q2.3b: What are the NSC and RSC values for the input sample? Would this library pass the quality control standards for the ENCODE project if it were a ChIP sample?

Output list for the input sample:

$CC_StrandShift
[1] 140

$tag.shift
[1] 70

$N1
[1] 8725518

$Nd
[1] 9119345

$CC_PBC
[1] 0.957

$CC_readLength
[1] 36

$CC_UNIQUE_TAGS_LibSizeadjusted
[1] 5992518

$CC_NSC
[1] 1.04

$CC_RSC
[1] 0.214

$CC_QualityFlag
[1] -2

$CC_shift
[1] 100

$CC_A
[1] 0.127

$CC_B
[1] 0.145

$CC_C
[1] 0.122

$CC_ALL_TAGS
[1] 9606580

$CC_UNIQUE_TAGS
[1] 9119345

$CC_UNIQUE_TAGS_nostrand
[1] 9098023

$CC_NRF
[1] 0.949

$CC_NRF_nostrand
[1] 0.947

$CC_NRF_LibSizeadjusted
[1] 0.599

The NSC value for the input sample is 1.04 and the RSC value is 0.214. If it were a ChIP sample, this library would NOT pass the quality control standards for the ENCODE project, because both values fall below the minimums (NSC 1.05, RSC 0.8).
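The comparison against the minimum values can be written as a small check. This is a minimal sketch, assuming the thresholds from Q2.2d (NSC at least 1.05 and RSC at least 0.8) and treating a library as passing only when both are met; encode_cc_check is a hypothetical helper name.

```shell
# encode_cc_check NSC RSC -> "pass" only if NSC >= 1.05 and RSC >= 0.8
encode_cc_check() {
    awk -v nsc="$1" -v rsc="$2" 'BEGIN {
        ok = (nsc >= 1.05 && rsc >= 0.8)
        print (ok ? "pass" : "fail")
    }'
}

encode_cc_check 1.447 1.079   # ChIP sample:  prints pass
encode_cc_check 1.04 0.214    # input sample: prints fail
```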

Q2.4: Imagine the input library is also a ChIP sample (not a control). Which of the following is the best interpretation of the fingerprint plot? [ 1 point ]

The fingerprint plot is attached to this assignment.

  1. the ChIP sample is higher quality because a greater proportion of reads are in higher ranked bins (i.e., bins ranked by depth) than the input

Q2.5. What is shown in the TSS plot? Please provide a detailed interpretation of the plot. Based on the TSS profile plot, describe where are H3K36me3 modifications typically located relative to protein coding genes. [ 1 point ]

The TSS plot shows the read density around transcription start sites for the ChIP sample compared to the control (input). Compared to the control, the ChIP sample shows two peaks, one to the left (upstream) and one to the right (downstream) of the transcription start site, indicating regions enriched for the H3K36me3 histone modification. The peak on the right (from the TSS to +1 kb, i.e., downstream of the transcription start site) is much broader than the peak on the left. A broad signal of this kind is characteristic of a histone modification rather than a transcription factor, and it suggests that H3K36me3 modifications are typically located downstream of the TSS, within the body of protein coding genes.