The FastQC output contains 11 analysis modules:
1. Basic Statistics
2. Per base sequence quality
3. Per tile sequence quality
4. Per sequence quality scores
5. Per base sequence content
6. Per sequence GC content
7. Per base N content
8. Sequence Length Distribution
9. Sequence Duplication Levels
10. Overrepresented sequences
11. Adapter Content
Each module is described below, in the same order as the list above:
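Basic Statistics reports the following fields:

- Filename: the name of the file analysed
- File type: the type of the file
- Encoding: the sequencing platform version and the corresponding quality-score encoding
- Total Sequences: the total number of sequences processed
- Sequences flagged as poor quality: the number of sequences flagged as poor quality
- Sequence length: the length of the sequenced reads
- %GC: the overall GC content across all sequences

Note: Basic Statistics never raises a warning or a failure.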
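Per base sequence quality: the distribution of quality scores at each base position, pooled across all reads.
Per tile sequence quality: checks how far the quality at each base position deviates between flowcell tiles.
Per sequence quality scores: the distribution of the mean quality of each read, used to check whether a subset of sequences has universally low quality values.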
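The quality modules above all start from the quality string stored with each read. The following is a minimal sketch of how such a string could be decoded and averaged per read. It assumes Phred+33 encoding (the actual encoding of a file is reported in the Encoding field of Basic Statistics), and the function names and the three quality strings are hypothetical illustrations, not FastQC code or data from this report.

```python
def phred33_scores(quality_line: str) -> list[int]:
    """Decode a quality string into integer scores, assuming Phred+33."""
    return [ord(ch) - 33 for ch in quality_line]


def mean_quality(quality_line: str) -> float:
    """Mean quality of one read, the value summarised per sequence."""
    scores = phred33_scores(quality_line)
    return sum(scores) / len(scores)


# Made-up quality strings: 'I' decodes to 40 and '#' decodes to 2.
quals = ["IIIIIIIIII", "IIIII#####", "##########"]
means = [mean_quality(q) for q in quals]
print(means)
```

A read whose mean falls far below the bulk of the distribution belongs to the low-quality subset that the per-sequence view is meant to expose.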
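Per base sequence content: for each position across all reads, plots the proportion of each of the four bases A, C, T and G.
Per sequence GC content: the distribution of GC content across reads, used to judge whether the sequencing was sufficiently random.
Per base N content: an N is called when the sequencer cannot determine which base occupies a position in a read, so a high proportion of N indicates poor base-calling quality at that position.
Sequence Length Distribution: the distribution of read lengths.
Sequence Duplication Levels: the degree of duplication of each sequence.
Overrepresented sequences: how many times individual sequences recur, relative to the total number of reads.
Adapter Content: the proportion of common adapter sequences at each position.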
The table below shows the FastQC results for the 12 files. Green means pass, orange means warning, and red means failure.
Based on the table above, the modules flagged orange or red are discussed in turn.
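A pass/warning/failure overview of this kind can also be assembled programmatically. The sketch below assumes the tab-separated `summary.txt` that FastQC writes into each report archive (status, module name, file name per line). The sample text, the file names `sample_A.fastq` and `sample_B.fastq`, and the helper names are hypothetical and do not reproduce the results of this report.

```python
from collections import defaultdict

# Hypothetical summary.txt content: status <TAB> module <TAB> file name.
SUMMARY_TEXT = (
    "PASS\tBasic Statistics\tsample_A.fastq\n"
    "FAIL\tPer base sequence content\tsample_A.fastq\n"
    "WARN\tPer sequence GC content\tsample_A.fastq\n"
    "PASS\tBasic Statistics\tsample_B.fastq\n"
    "WARN\tOverrepresented sequences\tsample_B.fastq\n"
)

# Mapping from FastQC status to the traffic-light colours used in the text.
LIGHTS = {"PASS": "green", "WARN": "orange", "FAIL": "red"}


def parse_summary(text: str) -> dict[str, dict[str, str]]:
    """Return {file name: {module: status}} from summary.txt text."""
    table: dict[str, dict[str, str]] = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, filename = line.split("\t")
        table[filename][module] = status
    return dict(table)


def flagged_modules(table: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Keep only the modules that did not pass, per file."""
    return {
        filename: [m for m, status in modules.items() if status != "PASS"]
        for filename, modules in table.items()
    }


table = parse_summary(SUMMARY_TEXT)
flagged = flagged_modules(table)
for filename, modules in flagged.items():
    for module in modules:
        print(filename, module, LIGHTS[table[filename][module]])
```

Concatenating the `summary.txt` files of all samples before parsing would give one row per file, which is the shape of the overview table described above.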
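Per base sequence content fails for every file, which means that at one or more positions the difference between the A and T proportions, or between the G and C proportions, exceeds 20%. The plots below show, from left to right and top to bottom, the Per base sequence content results for SRR1552444, SRR1552445, SRR1552446, SRR1552447, SRR1552448 and SRR1552449.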
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\44\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\45\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\46\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\47\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\48\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\49\\Per base sequence content.png")
The plots below show, from left to right and top to bottom, the Per base sequence content results for SRR1552450, SRR1552451, SRR1552452, SRR1552453, SRR1552454 and SRR1552455.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\50\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\51\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\52\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\53\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\54\\Per base sequence content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\55\\Per base sequence content.png")
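The failure rule for Per base sequence content can be made concrete with a small sketch. It computes the proportion of each base at every position and applies the thresholds documented for FastQC (warning above a 10% difference between A and T or between G and C, failure above 20%). The function names and toy reads are hypothetical, and this is a simplified illustration rather than FastQC's implementation.

```python
def base_fractions(reads: list[str]) -> list[dict[str, float]]:
    """Proportion of A, C, G and T at each position across all reads."""
    length = max(len(r) for r in reads)
    fractions = []
    for pos in range(length):
        # Only reads long enough to cover this position contribute.
        column = [r[pos] for r in reads if pos < len(r)]
        fractions.append({b: column.count(b) / len(column) for b in "ACGT"})
    return fractions


def content_status(fractions, warn=0.10, fail=0.20) -> str:
    """Flag the worst A/T or G/C imbalance found at any position."""
    worst = max(
        max(abs(f["A"] - f["T"]), abs(f["G"] - f["C"])) for f in fractions
    )
    if worst > fail:
        return "FAIL"
    if worst > warn:
        return "WARN"
    return "PASS"


# Toy data: every base is equally frequent at every position.
balanced = ["ACGT", "TGCA", "CATG", "GTAC"]
# Toy data: position 0 is always A, so A and T differ by 100% there.
biased = ["ACGT", "AGCT", "ATGC", "AAGT"]

print(content_status(base_fractions(balanced)))
print(content_status(base_fractions(biased)))
```

Because a single imbalanced position is enough to trigger the flag, biased composition confined to the first few bases of every read can fail the module even when the rest of the read is balanced.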
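Per sequence GC content is flagged for every file. A warning means that the observed GC distribution deviates from the theoretical distribution by more than 15%. SRR1552448, SRR1552450 and SRR1552454 deviate by more than 30% and are therefore marked as failed. The plots below show, from left to right and top to bottom, the Per sequence GC content results for SRR1552444, SRR1552445, SRR1552446 and SRR1552447.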
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\44\\Per sequence GC content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\45\\Per sequence GC content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\46\\Per sequence GC content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\47\\Per sequence GC content.png")
The plots below show, from left to right and top to bottom, the Per sequence GC content results for SRR1552449, SRR1552451, SRR1552452, SRR1552453 and SRR1552455.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\49\\Per sequence GC content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\51\\Per sequence GC content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\52\\Per sequence GC content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\53\\Per sequence GC content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\55\\Per sequence GC content.png")
The plots below show, from left to right and top to bottom, the Per sequence GC content results for SRR1552448, SRR1552450 and SRR1552454.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\48\\Per sequence GC content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\50\\Per sequence GC content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\54\\Per sequence GC content.png")
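The 15% and 30% thresholds refer to how far the observed per-read GC distribution departs from a fitted normal curve. The sketch below is a simplified approximation of that idea: it fits a normal curve using the sample mean and standard deviation and sums the absolute differences over the GC bins from 0 to 100. FastQC's own model fitting differs in detail, and the function names and toy reads are hypothetical.

```python
import math
from collections import Counter


def gc_percent(read: str) -> int:
    """GC content of one read, rounded to a whole percent."""
    gc = sum(1 for base in read.upper() if base in "GC")
    return round(100 * gc / len(read))


def deviation_percent(gc_values: list[int]) -> float:
    """Summed |observed - expected| over GC bins, as a percent of reads."""
    n = len(gc_values)
    mean = sum(gc_values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in gc_values) / n)
    if sd == 0:
        return 0.0  # simplification: identical GC% everywhere, nothing to fit
    observed = Counter(gc_values)
    # Unnormalised normal curve, rescaled so the expected counts sum to n.
    weights = [math.exp(-0.5 * ((x - mean) / sd) ** 2) for x in range(101)]
    scale = n / sum(weights)
    total = sum(abs(observed.get(x, 0) - w * scale) for x, w in enumerate(weights))
    return 100 * total / n


def gc_status(deviation: float, warn=15.0, fail=30.0) -> str:
    """Apply the warning and failure thresholds to the deviation."""
    if deviation > fail:
        return "FAIL"
    if deviation > warn:
        return "WARN"
    return "PASS"


# Toy data: a strongly bimodal sample (20% GC and 80% GC reads).
reads = ["ATATATATGC"] * 50 + ["GCGCGCGCAT"] * 50
gc_values = [gc_percent(r) for r in reads]
dev = deviation_percent(gc_values)
print(gc_status(dev))
```

A bimodal sample like this one sits far from any single normal curve, which is the kind of shape that pushes a file past the failure threshold.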
Per base N content raises a warning for SRR1552444 through SRR1552448 (the plots start with SRR1552444 and end with SRR1552448 at the bottom left):
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\44\\Per base N content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\45\\Per base N content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\46\\Per base N content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\47\\Per base N content.png")
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\48\\Per base N content.png")
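The N-content warning can be illustrated by counting uncalled bases per position. The sketch below uses the thresholds documented for FastQC (warning when any position exceeds 5% N, failure above 20%). The function names and toy reads are hypothetical.

```python
def n_fraction_per_position(reads: list[str]) -> list[float]:
    """Fraction of reads with an N at each position."""
    length = max(len(r) for r in reads)
    fractions = []
    for pos in range(length):
        # Only reads long enough to cover this position contribute.
        column = [r[pos] for r in reads if pos < len(r)]
        fractions.append(column.count("N") / len(column))
    return fractions


def n_status(fractions: list[float], warn=0.05, fail=0.20) -> str:
    """Flag the position with the highest share of N calls."""
    worst = max(fractions)
    if worst > fail:
        return "FAIL"
    if worst > warn:
        return "WARN"
    return "PASS"


# Toy data: 10 reads, one of which has an N at the last position (10%).
reads = ["ACGTN"] + ["ACGTA", "ACGTC", "ACGTG"] * 3
fractions = n_fraction_per_position(reads)
print(fractions, n_status(fractions))
```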
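In the Overrepresented sequences results, only SRR1552454 passes. In every other file, one or more sequences each account for more than 0.1% of the total reads. The table below shows the result for SRR1552444, in which 2 sequences each exceed 0.1% of the total reads.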
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\44\\Overrepresented sequences.png")
The table below shows the result for SRR1552445, in which 2 sequences each exceed 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\45\\Overrepresented sequences.png")
The table below shows the result for SRR1552446, in which 13 sequences each exceed 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\46\\Overrepresented sequences.png")
The table below shows the result for SRR1552447, listing the sequences that each exceed 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\47\\Overrepresented sequences.png")
The table below shows the result for SRR1552448, in which 31 sequences each exceed 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\48\\Overrepresented sequences.png")
The table below shows the result for SRR1552449, in which 16 sequences each exceed 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\49\\Overrepresented sequences.png")
The table below shows the result for SRR1552450, in which 1 sequence exceeds 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\50\\Overrepresented sequences.png")
The table below shows the result for SRR1552451, in which 1 sequence exceeds 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\51\\Overrepresented sequences.png")
The table below shows the result for SRR1552452, in which 1 sequence exceeds 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\52\\Overrepresented sequences.png")
The table below shows the result for SRR1552453, in which 2 sequences each exceed 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\53\\Overrepresented sequences.png")
The table below shows the result for SRR1552455, in which 2 sequences each exceed 0.1% of the total reads.
knitr::include_graphics("E:\\graduate institute\\108_1\\Bioinformatics\\Homework7 fastq\\file plot\\55\\Overrepresented sequences.png")
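The 0.1% criterion used throughout this section can be sketched as a simple count of identical reads. The example below counts every read exactly, whereas FastQC tracks only a subset of the reads to save memory, so its counts are estimates. The function names and the generated toy reads are hypothetical and unrelated to the SRR files above.

```python
from collections import Counter


def encode(i: int, width: int = 12) -> str:
    """Turn an integer into a distinct DNA string (toy data helper)."""
    bases = "ACGT"
    out = []
    for _ in range(width):
        out.append(bases[i % 4])
        i //= 4
    return "".join(reversed(out))


def overrepresented(reads: list[str], threshold: float = 0.001):
    """Return (sequence, count, fraction) for sequences above the threshold."""
    total = len(reads)
    counts = Counter(reads)
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total > threshold
    ]


# Toy data: 1995 distinct reads plus one sequence repeated 5 times.
# 5 / 2000 = 0.25% exceeds 0.1%; each distinct read is only 0.05%.
reads = [encode(i) for i in range(1995)] + ["ACGTACGTACGT"] * 5
hits = overrepresented(reads)
for seq, count, fraction in hits:
    print(seq, count, f"{fraction:.2%}")
```

Each row of the tables above corresponds to one such hit: a sequence, its count, and its share of the total reads.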