Final Project Checkpoint 4

Initial QC Steps

By: Fatma Gazeyoglu

To review the steps of my paper: per the data stored under BioProject PRJNA994299, 16 pigs were challenged with swine influenza virus (SIV) H1N2. Samples were collected from 3 pigs per group at 2 days post inoculation (2 dpi), from 3 per group at 5 dpi, and from the remaining 4 pigs at the end of the in vivo experiment (9 dpi), plus an extra lung sample, for a total of 33 samples. All samples were sequenced on an Illumina MiSeq.

#   Run          BioSample     AvgSpotLen  Bases    Bytes     Collection_Date  Experiment   isolation_source  Library Name              replicate  Sample Name
1   SRR25266111  SAMN36425382  265         7.54 M   3.90 Mb   2020-08-10       SRX21012101  BALF              13_BALF_EVOLSI2020        10         13_BALF_EVOLSI2020
2   SRR25266112  SAMN36425381  270         27.69 M  14.17 Mb  2020-08-09       SRX21012100  BALF              12_BALF_EVOLSI2020        9          12_BALF_EVOLSI2020
3   SRR25266113  SAMN36425380  271         3.69 M   2.30 Mb   2020-08-08       SRX21012099  BALF              11_BALF                   8          11_BALF
4   SRR25266114  SAMN36425379  268         7.06 M   3.65 Mb   2020-08-07       SRX21012098  BALF              10_BALF_EVOLSI2020        7          10_BALF_EVOLSI2020
5   SRR25266115  SAMN36425378  268         11.03 M  5.67 Mb   2020-08-06       SRX21012097  BALF              09_BALF_EVOLSI2020        6          09_BALF_EVOLSI2020
6   SRR25266116  SAMN36425377  271         6.68 M   3.52 Mb   2020-08-05       SRX21012096  BALF              06_BALF                   5          06_BALF
7   SRR25266117  SAMN36425405  239         4.59 M   2.20 Mb   2020-08-10       SRX21012095  MDCK cells        H1N2_EVOLSI2020           33         H1N2_EVOLSI2020
8   SRR25266118  SAMN36425404  269         6.34 M   3.34 Mb   2020-08-02       SRX21012094  nasal swab        animal16_9_NS             32         animal16_9_NS
9   SRR25266119  SAMN36425403  263         7.80 M   4.14 Mb   2020-08-01       SRX21012093  nasal swab        animal16_5_NS_EVOLSI2020  31         animal16_5_NS_EVOLSI2020
10  SRR25266120  SAMN36425376  269         5.33 M   2.80 Mb   2020-08-04       SRX21012092  BALF              05_BALF                   4          05_BALF
11  SRR25266121  SAMN36425402  261         20.83 M  10.92 Mb  2020-08-10       SRX21012091  nasal swab        animal16_4_NS_EVOLSI2020  30         animal16_4_NS_EVOLSI2020
12  SRR25266122  SAMN36425401  270         15.99 M  8.21 Mb   2020-08-10       SRX21012090  nasal swab        animal16_3_NS             29         animal16_3_NS
13  SRR25266123  SAMN36425400  267         3.58 M   2.25 Mb   2020-08-07       SRX21012089  nasal swab        animal15_5_NS_EVOLSI2020  28         animal15_5_NS_EVOLSI2020
14  SRR25266124  SAMN36425399  249         6.97 M   4.40 Mb   2020-08-06       SRX21012088  nasal swab        animal15_3_NS             27         animal15_3_NS
15  SRR25266125  SAMN36425398  257         11.39 M  5.91 Mb   2020-08-04       SRX21012087  nasal swab        animal14_4_NS             26         animal14_4_NS
16  SRR25266126  SAMN36425397  266         10.94 M  5.71 Mb   2020-08-03       SRX21012086  nasal swab        animal14_3_NS             25         animal14_3_NS
17  SRR25266127  SAMN36425396  266         12.17 M  6.34 Mb   2020-08-09       SRX21012085  nasal swab        animal13_5_NS_EVOLSI2020  24         animal13_5_NS_EVOLSI2020
18  SRR25266128  SAMN36425395  259         2.12 M   1.17 Mb   2020-08-10       SRX21012084  nasal swab        animal13_4_NS             23         animal13_4_NS
19  SRR25266129  SAMN36425394  272         4.90 M   2.62 Mb   2020-08-09       SRX21012083  nasal swab        animal13_3_NS             22         animal13_3_NS
20  SRR25266130  SAMN36425393  265         4.53 M   2.47 Mb   2020-08-09       SRX21012082  nasal swab        animal12_5_NS_EVOLSI2020  21         animal12_5_NS_EVOLSI2020
21  SRR25266131  SAMN36425375  272         9.35 M   4.90 Mb   2020-08-03       SRX21012081  BALF              04_BALF                   3          04_BALF
22  SRR25266132  SAMN36425392  265         2.37 M   1.27 Mb   2020-08-10       SRX21012080  nasal swab        animal12_4_NS             20         animal12_4_NS
23  SRR25266133  SAMN36425391  274         7.69 M   4.00 Mb   2020-08-09       SRX21012079  nasal swab        animal12_3_NS             19         animal12_3_NS
24  SRR25266134  SAMN36425390  264         3.58 M   1.90 Mb   2020-08-01       SRX21012078  nasal swab        animal8_4_NS              18         animal8_4_NS
25  SRR25266135  SAMN36425389  262         2.63 M   1.42 Mb   2020-08-01       SRX21012077  nasal swab        animal7_4_NS              17         animal7_4_NS
26  SRR25266136  SAMN36425388  271         27.96 M  14.42 Mb  2020-08-01       SRX21012076  nasal swab        animal7_3_NS              16         animal7_3_NS
27  SRR25266137  SAMN36425387  263         7.06 M   3.64 Mb   2020-08-01       SRX21012075  nasal swab        animal6_4_NS              15         animal6_4_NS
28  SRR25266138  SAMN36425386  263         2.94 M   1.89 Mb   2020-08-01       SRX21012074  nasal swab        animal5_5_NS              14         animal5_5_NS
29  SRR25266139  SAMN36425385  260         4.41 M   2.30 Mb   2020-08-01       SRX21012073  nasal swab        animal5_4_NS              13         animal5_4_NS
30  SRR25266140  SAMN36425384  266         9.12 M   4.69 Mb   2020-08-01       SRX21012072  nasal swab        animal5_3_NS              12         animal5_3_NS
31  SRR25266141  SAMN36425383  262         5.09 M   3.28 Mb   2020-08-10       SRX21012071  BALF              14_BALF_EVOLSI2020        11         14_BALF_EVOLSI2020
32  SRR25266142  SAMN36425374  267         15.17 M  7.77 Mb   2020-08-02       SRX21012070  BALF              02_BALF                   2          02_BALF
33  SRR25266143  SAMN36425373  269         9.59 M   4.91 Mb   2020-08-01       SRX21012069  BALF              01_BALF                   1          01_BALF
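
The run metadata above can be tallied by sample type as a quick sanity check on the sample count. The following is a minimal sketch: it assumes the metadata is available as comma-separated text (the layout is an assumption about how the run table would be exported), and it embeds only four rows copied from the table for illustration. The name `count_by_source` is hypothetical.

```python
import csv
import io
from collections import Counter

# Four rows copied from the run table above, reduced to a few columns.
# The comma-separated layout is an assumption, not the actual export.
RUN_TABLE_CSV = """\
Run,BioSample,AvgSpotLen,isolation_source,Library Name
SRR25266111,SAMN36425382,265,BALF,13_BALF_EVOLSI2020
SRR25266117,SAMN36425405,239,MDCK cells,H1N2_EVOLSI2020
SRR25266118,SAMN36425404,269,nasal swab,animal16_9_NS
SRR25266119,SAMN36425403,263,nasal swab,animal16_5_NS_EVOLSI2020
"""


def count_by_source(csv_text):
    """Count runs per isolation_source value (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["isolation_source"] for row in reader)


counts = count_by_source(RUN_TABLE_CSV)
for source, n in sorted(counts.items()):
    print(f"{source}: {n}")
```

Applied to all 33 rows of the table, the same tally should give 11 BALF runs, 21 nasal swab runs, and 1 MDCK cells run.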

The samples were run through FastQC and then trimmed with Trimmomatic. The following are some of the general statistics generated by MultiQC after trimming.

Sample Name % Dups % GC Length M Seqs
SRR25266111_trim.1 61.0% 44% 134 bp 0.0
SRR25266111_trim.2 59.8% 44% 133 bp 0.0
SRR25266112_trim.1 75.3% 44% 136 bp 0.1
SRR25266112_trim.2 73.1% 44% 135 bp 0.1
SRR25266113_trim.1 45.9% 44% 135 bp 0.0
SRR25266113_trim.2 43.7% 44% 134 bp 0.0
SRR25266114_trim.1 60.2% 44% 135 bp 0.0
SRR25266114_trim.2 58.3% 44% 135 bp 0.0
SRR25266115_trim.1 66.5% 44% 135 bp 0.0
SRR25266115_trim.2 65.1% 44% 135 bp 0.0
SRR25266116_trim.1 57.6% 44% 136 bp 0.0
SRR25266116_trim.2 56.3% 44% 136 bp 0.0
SRR25266117_trim.1 49.1% 43% 123 bp 0.0
SRR25266117_trim.2 49.1% 43% 123 bp 0.0
SRR25266118_trim.1 64.3% 46% 135 bp 0.0
SRR25266118_trim.2 63.5% 46% 135 bp 0.0
SRR25266119_trim.1 60.7% 44% 133 bp 0.0
SRR25266119_trim.2 58.7% 44% 132 bp 0.0
SRR25266120_trim.1 54.3% 44% 135 bp 0.0
SRR25266120_trim.2 53.5% 44% 135 bp 0.0
SRR25266121_trim.1 72.0% 44% 132 bp 0.1
SRR25266121_trim.2 69.3% 44% 131 bp 0.1
SRR25266122_trim.1 71.3% 44% 136 bp 0.1
SRR25266122_trim.2 69.8% 44% 135 bp 0.1
SRR25266123_trim.1 49.2% 44% 134 bp 0.0
SRR25266123_trim.2 43.0% 44% 132 bp 0.0
SRR25266124_trim.1 50.1% 44% 125 bp 0.0
SRR25266124_trim.2 44.2% 44% 123 bp 0.0
SRR25266125_trim.1 63.7% 44% 130 bp 0.0
SRR25266125_trim.2 62.8% 44% 130 bp 0.0
SRR25266126_trim.1 68.3% 44% 134 bp 0.0
SRR25266126_trim.2 66.8% 44% 133 bp 0.0
SRR25266127_trim.1 66.7% 44% 134 bp 0.0
SRR25266127_trim.2 65.3% 44% 134 bp 0.0
SRR25266128_trim.1 44.5% 43% 131 bp 0.0
SRR25266128_trim.2 44.0% 43% 131 bp 0.0
SRR25266129_trim.1 59.0% 45% 136 bp 0.0
SRR25266129_trim.2 56.4% 45% 135 bp 0.0
SRR25266130_trim.1 53.8% 45% 133 bp 0.0
SRR25266130_trim.2 52.8% 45% 133 bp 0.0
SRR25266131_trim.1 66.2% 44% 137 bp 0.0
SRR25266131_trim.2 64.3% 44% 136 bp 0.0
SRR25266132_trim.1 39.9% 43% 134 bp 0.0
SRR25266132_trim.2 39.4% 43% 134 bp 0.0
SRR25266133_trim.1 61.1% 44% 138 bp 0.0
SRR25266133_trim.2 59.4% 44% 137 bp 0.0
SRR25266134_trim.1 51.1% 44% 133 bp 0.0
SRR25266134_trim.2 51.2% 44% 133 bp 0.0
SRR25266135_trim.1 53.4% 46% 132 bp 0.0
SRR25266135_trim.2 52.9% 46% 132 bp 0.0
SRR25266136_trim.1 78.5% 45% 137 bp 0.1
SRR25266136_trim.2 76.0% 45% 136 bp 0.1
SRR25266137_trim.1 61.8% 45% 133 bp 0.0
SRR25266137_trim.2 61.1% 45% 133 bp 0.0
SRR25266138_trim.1 38.8% 44% 131 bp 0.0
SRR25266138_trim.2 33.4% 44% 129 bp 0.0
SRR25266139_trim.1 54.3% 44% 131 bp 0.0
SRR25266139_trim.2 53.3% 44% 131 bp 0.0
SRR25266140_trim.1 64.6% 44% 134 bp 0.0
SRR25266140_trim.2 63.2% 44% 134 bp 0.0
SRR25266141_trim.1 53.5% 43% 131 bp 0.0
SRR25266141_trim.2 39.4% 44% 125 bp 0.0
SRR25266142_trim.1 68.7% 44% 135 bp 0.1
SRR25266142_trim.2 67.5% 44% 134 bp 0.1
SRR25266143_trim.1 60.8% 44% 136 bp 0.0
SRR25266143_trim.2 59.4% 44% 135 bp 0.0
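
The general statistics above can also be summarized programmatically. The sketch below parses a few rows copied from the table in the same space-separated layout, reports the GC range, and lists samples above a duplication cutoff. The 70% cutoff and the names `parse_row` and `summarize` are assumptions chosen for illustration, not values or code from the analysis.

```python
# Rows copied verbatim from the MultiQC general statistics table above.
ROWS = """\
SRR25266112_trim.1 75.3% 44% 136 bp 0.1
SRR25266117_trim.1 49.1% 43% 123 bp 0.0
SRR25266118_trim.1 64.3% 46% 135 bp 0.0
SRR25266136_trim.1 78.5% 45% 137 bp 0.1
SRR25266138_trim.2 33.4% 44% 129 bp 0.0
"""

DUP_THRESHOLD = 70.0  # hypothetical cutoff, for illustration only


def parse_row(line):
    """Split one table row into typed fields."""
    name, dups, gc, length, _unit, mseqs = line.split()
    return {
        "sample": name,
        "dups": float(dups.rstrip("%")),
        "gc": float(gc.rstrip("%")),
        "length": int(length),
        "mseqs": float(mseqs),
    }


def summarize(rows, dup_threshold=DUP_THRESHOLD):
    """Return (min GC, max GC, samples whose duplication exceeds the cutoff)."""
    gc_values = [r["gc"] for r in rows]
    flagged = [r["sample"] for r in rows if r["dups"] > dup_threshold]
    return min(gc_values), max(gc_values), flagged


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
gc_min, gc_max, high_dup = summarize(rows)
print(f"GC range: {gc_min:.0f}% to {gc_max:.0f}%")
print("Duplication above cutoff:", ", ".join(high_dup))
```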

The GC content (43% to 46% across samples) may look low at first glance, but influenza sequencing tends to give a GC content of around 46%, so I am not worried about it for the time being.
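
For reference, the FastQC, Trimmomatic, and MultiQC steps mentioned earlier could be scripted roughly as follows. This is a sketch under stated assumptions, not the exact commands used here: the input and output file names, the `adapters.fa` file, the output directories, and the trimming settings (`ILLUMINACLIP:...:2:30:10`, `SLIDINGWINDOW:4:20`, `MINLEN:36`) are all placeholders, and it assumes a `trimmomatic` wrapper script is on the PATH (otherwise the tool is invoked via `java -jar`). The code only builds and prints the commands; it does not run them.

```python
import shlex


def qc_commands(run, adapters="adapters.fa",
                raw_qc_dir="fastqc_raw", trim_qc_dir="fastqc_trimmed"):
    """Build per-run commands: FastQC on raw reads, trim, FastQC on trimmed reads.

    All file and directory names are assumptions for illustration.
    """
    r1, r2 = f"{run}_1.fastq", f"{run}_2.fastq"            # raw paired reads
    p1, p2 = f"{run}_trim.1.fastq", f"{run}_trim.2.fastq"  # paired output
    u1, u2 = f"{run}_unpaired.1.fastq", f"{run}_unpaired.2.fastq"
    return [
        ["fastqc", "-o", raw_qc_dir, r1, r2],
        # Trimmomatic PE order: in1 in2 out1_paired out1_unpaired out2_paired out2_unpaired
        ["trimmomatic", "PE", r1, r2, p1, u1, p2, u2,
         f"ILLUMINACLIP:{adapters}:2:30:10", "SLIDINGWINDOW:4:20", "MINLEN:36"],
        ["fastqc", "-o", trim_qc_dir, p1, p2],
    ]


def multiqc_command(qc_dir="fastqc_trimmed", out_dir="multiqc_trimmed"):
    """Build the MultiQC command that aggregates the FastQC reports."""
    return ["multiqc", qc_dir, "-o", out_dir]


commands = qc_commands("SRR25266111") + [multiqc_command()]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually execute the steps, each command list could be passed to `subprocess.run(cmd, check=True)` once the output directories exist.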