The data we are using to practice working with SNPs in VCF files come from a paper in Molecular Ecology by Jennifer Walsh called “Subspecies delineation amid phenotypic, geographic and genetic discordance in a songbird.” The goal of the paper is to compare traditional ways of classifying different populations and subspecies of Ammodramus sparrows with morphology and genetic data. Before we can do this analysis, though, we need to prepare the data.
Missing data are common in SNP datasets, especially in samples from wild animals where the amount of DNA collected may be small. We can fill in (impute) missing data through methods such as mean imputation. However, if there are lots of NAs, we may want to consider removing samples (in this case birds) for which we’d have to impute many, if not most, of their SNPs. In other words, if we would have to impute a lot of the data for a sample, we’d be better off just removing that sample.
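As a rough illustration of mean imputation, here is a minimal sketch on a made-up genotype matrix. The object names (snps_impute, snps_filled) and the 0/1/2 genotype coding are assumptions for this example only; they are not taken from the paper or from all_loci.vcf.

```r
# Toy genotype matrix: rows are samples (birds), columns are SNPs.
# Genotypes are coded here as 0, 1, or 2 (an assumption for this sketch).
snps_impute <- matrix(c( 0,  1, NA,
                         2, NA,  1,
                         1,  1,  2,
                        NA,  0,  2),
                      nrow = 4, byrow = TRUE,
                      dimnames = list(paste0("bird", 1:4),
                                      paste0("snp", 1:3)))

# Work on a copy so the original matrix, with its NAs, is preserved
snps_filled <- snps_impute

# For each SNP (column), replace NAs with the mean of the observed values
for (j in 1:ncol(snps_filled)) {
  col_mean <- mean(snps_filled[, j], na.rm = TRUE)
  snps_filled[is.na(snps_filled[, j]), j] <- col_mean
}

snps_filled
```

Each bird in this toy matrix is missing at most one of its three SNPs. If a bird were missing most of its SNPs, nearly all of its row would be column means rather than observed data, which is the situation where removing the sample is the safer choice.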
There are no strict rules for many of the decisions you have to make when doing data analysis, especially when it comes to preparation and cleaning steps like removing or imputing NAs.
In this study, the authors state:
“we removed any individuals that were missing data at more than 50% of the SNPs.”
This is an arbitrary decision: they could have chosen 60% or 25%. It is therefore called a researcher degree of freedom, because the researchers are free to choose what to do, and someone else may have made a different choice.
This type of decision is very common in data analysis, and needs to be documented and justified. Posting the raw data, as the authors have, allows other researchers to explore the consequences of a different choice.
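To see how much such a choice can matter, the sketch below counts how many samples would be kept under different cutoffs. The matrix is made up for illustration, and the names snps_cutoff, prop_missing, cutoffs and n_kept are hypothetical; only the 50% rule comes from the paper.

```r
# Toy genotype matrix: 5 samples (rows) by 4 SNPs (columns)
snps_cutoff <- matrix(c( 0,  1,  2,  1,   # 0% missing
                        NA,  1,  2,  0,   # 25% missing
                        NA, NA,  1,  2,   # 50% missing
                        NA, NA, NA,  1,   # 75% missing
                        NA, NA, NA, NA),  # 100% missing
                      nrow = 5, byrow = TRUE)

# Proportion of SNPs that are NA for each sample
prop_missing <- rowMeans(is.na(snps_cutoff))

# Candidate cutoffs: a sample is removed if it is missing MORE than the cutoff
cutoffs <- c(0.25, 0.50, 0.60)

# Number of samples that would be kept under each cutoff
n_kept <- sapply(cutoffs, function(cutoff) sum(prop_missing <= cutoff))
names(n_kept) <- cutoffs
n_kept
```

With real data the same comparison shows how sensitive the final sample size is to the cutoff, which is useful information to report alongside the choice itself.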
In this exercise we will show how you can remove rows of data if it is necessary. We’ll use the handy functions which() and is.na() and a common programming tool called a for() loop.
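Before working with the real data, here is a minimal sketch of how these three tools fit together on a small made-up matrix. All object names (snps_loop, n_NA, i_remove, snps_clean) are hypothetical, and the 50% cutoff simply mirrors the rule quoted above.

```r
# Toy data: 4 samples (rows) by 4 SNPs (columns); NA marks a missing genotype
snps_loop <- matrix(c( 0,  1,  2,  1,
                      NA, NA, NA,  0,
                       1, NA,  1,  2,
                      NA, NA, NA, NA),
                    nrow = 4, byrow = TRUE)

# is.na() returns TRUE/FALSE for each element;
# which() returns the positions of the TRUE values
is.na(snps_loop[2, ])
which(is.na(snps_loop[2, ]))

# Vector to hold the number of NAs in each row
n_NA <- rep(0, nrow(snps_loop))

# Loop over the rows and count the NAs in each one
for (i in 1:nrow(snps_loop)) {
  n_NA[i] <- length(which(is.na(snps_loop[i, ])))
}

# Rows (samples) missing more than 50% of their SNPs
i_remove <- which(n_NA > 0.5 * ncol(snps_loop))

# Keep everything except those rows.
# Guard against an empty i_remove: x[-integer(0), ] would drop every row.
if (length(i_remove) > 0) {
  snps_clean <- snps_loop[-i_remove, , drop = FALSE]
} else {
  snps_clean <- snps_loop
}

snps_clean
```

The check on length(i_remove) is a defensive choice: negative indexing with an empty vector selects nothing, so without it a dataset with no bad rows would come back empty.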
Note that we are removing individual samples, which are represented by rows in the data. Elsewhere in our protocol we remove columns, for example when every sample has the same genotype at a SNP, so that the column carries no information.
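For contrast with removing rows, the following sketch drops columns in which every sample has the same genotype. It is an illustration on made-up data, not the code used elsewhere in the protocol, and the names snps_cols, is_invariant and snps_variable are hypothetical.

```r
# Toy data: 4 samples (rows) by 3 SNPs (columns)
snps_cols <- matrix(c(0, 1, 2,
                      0, 2, 2,
                      0, 0, 2,
                      0, 1, 2),
                    nrow = 4, byrow = TRUE)

# A column is invariant when it holds only one distinct non-missing value
is_invariant <- apply(snps_cols, 2,
                      function(x) length(unique(na.omit(x))) == 1)

# Keep only the columns that vary; drop = FALSE keeps the result a matrix
snps_variable <- snps_cols[, !is_invariant, drop = FALSE]

dim(snps_variable)
```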
Load the vcfR package with library():
library(vcfR)
##
## ***** *** vcfR *** *****
## This is vcfR 1.13.0
## browseVignettes('vcfR') # Documentation
## citation('vcfR') # Citation
## ***** ***** ***** *****
Make sure that your working directory is set to the location of the file all_loci.vcf.
getwd()
## [1] "/Users/sarayusoma/Downloads"
list.files()
## [1] " Sarayu_Soma_03_01_22_COMPILED_no_PS__All_Population_Data_SP22_2_19 .xlsx"
## [2] "#2.1.2 (1).pdf"
## [3] "#2.1.2 (2).pdf"
## [4] "#2.1.2.pdf"
## [5] "#2.3.1 (1).pdf"
## [6] "#2.3.1.pdf"
## [7] "#2.6.8.pdf"
## [8] "#2.8.18.pdf"
## [9] "#3.10.1.pdf"
## [10] "#3.3.7.pdf"
## [11] "#3.6.7.pdf"
## [12] "#4.3.4.pdf"
## [13] "#5.4.8.pdf"
## [14] "#6.5.1.pdf"
## [15] "~$2224Populations.pptx"
## [16] "~$2224Species InteractionsPOST (1).pptx"
## [17] "~$2224Species InteractionsPOST.pptx"
## [18] "~$Clones_Pond_Site_Analysis_Excel_KLW.xlsx"
## [19] "~$er_Presentation_Evaluation_Duckweed_SP22_nk.docx"
## [20] "~$Final Exam Review (1).pptx"
## [21] "~$FINAL PRESENTATION NaCl Duckweed.pptx"
## [22] "~$Group A UG_aj cand 2.xlsx"
## [23] "~$NaCl Gel .pptx"
## [24] "~$NaCl Gel Slide 2 .pptx"
## [25] "~$NaCl Slides_ Introduction Part 3-2.pptx"
## [26] "~$pond_site_analysis_final_sp22_SS.pptx"
## [27] "~$Prelab_Barcode_SNP_Sequencing_set_up.pptx"
## [28] "~$Results table (1).pptx"
## [29] "~$Results table.pptx"
## [30] "~$Soma_Sarayu_Journal club2.pptx"
## [31] "~$sosa_lecture-BIOTECHpost.pptx"
## [32] "~$TV Hours.xlsx"
## [33] "01-27-2022 (1).zip"
## [34] "01-27-2022 (2).zip"
## [35] "01-27-2022 3"
## [36] "01-27-2022.zip"
## [37] "0120 SP22 Recitation Worksheet for Week of January 16 for Students.pdf"
## [38] "0120 SP22 Recitation Worksheet Gradescope Template.pdf"
## [39] "0310Midterm1-Fall2021-final version.pdf"
## [40] "11.pdf"
## [41] "1265221.pdf"
## [42] "1540_cluster_analysis.pdf"
## [43] "16.pdf"
## [44] "1665524941.103131 (2).jpg"
## [45] "1665694599.460772.jpg"
## [46] "2 English 112 SAMPLE Annotated Bibliography .pdf"
## [47] "2.1.2.pdf"
## [48] "2.3.9.pdf"
## [49] "2.6q8.pdf"
## [50] "200A4612-6DB0-4037-8FEC-838051AA801D.jpg"
## [51] "2019 W-2.pdf"
## [52] "2020_10_16_Boone_YES_NIHtalk_final.pdf"
## [53] "2021 End of Year Testing Schedule v2 (1).docx"
## [54] "2021 End of Year Testing Schedule v2 (2).docx"
## [55] "2021 End of Year Testing Schedule v2.docx"
## [56] "2021 End of Year Testing Schedule.docx"
## [57] "2021.pdf"
## [58] "2022 (1).pdf"
## [59] "2022-03-25 07.37.pdf"
## [60] "2022-03-26 09.36.42(1).pdf"
## [61] "2022-03-26 09.36.42(1)(2).pdf"
## [62] "2022-03-31 11.55.59(1).pdf"
## [63] "2022-04-05 11.40.44(1).pdf"
## [64] "2022-04-19 11.35.02(1).pdf"
## [65] "2022-04-21 11.41.43(1)(1).pdf"
## [66] "2022-04-21 11.52.56(1).pdf"
## [67] "2022.pdf"
## [68] "2217 Project 2 - Lab 1.pdf"
## [69] "2224Basic Concepts.pptx"
## [70] "2224Communities.pptx"
## [71] "2224DNA(13)StructurePart1 POST (1).pptx"
## [72] "2224DNA(13)StructurePart1 POST.pptx"
## [73] "2224Genomes (17) (1).pptx"
## [74] "2224Genomes (17).pptx"
## [75] "2224Populations.pptx"
## [76] "2224Species InteractionsPOST (1).pptx"
## [77] "2224Species InteractionsPOST.pptx"
## [78] "23d.pdf"
## [79] "23d2.pdf"
## [80] "3.2q8.pdf"
## [81] "3.3q1.pdf"
## [82] "3.4.11.pdf"
## [83] "3.7.4.pdf"
## [84] "3.9.1.pdf"
## [85] "36ty.pdf"
## [86] "4.2.7.pdf"
## [87] "4.3.4.pdf.HEIC"
## [88] "4datasheet.pdf"
## [89] "4F89091F-911A-4171-83E3-F291622DDAFB.jpeg"
## [90] "5050.jpeg"
## [91] "551 sequence.ape"
## [92] "5C3FC2F5-5E06-4C58-9820-9BC803FF75DA(1).pdf"
## [93] "6-DRD114.seq"
## [94] "65639020680__D0DF1322-9287-40CD-9A58-2172B3B0880D.HEIC"
## [95] "65639024382__8443C457-44E0-4532-A7DA-3429837738E0.HEIC"
## [96] "65639025037__132A5E72-8641-42C7-8938-85E25B742C3A.HEIC"
## [97] "65820722168__D548557D-CCED-4C5C-8595-2AA2829C0BF2.HEIC"
## [98] "65880148956__38294719-6A44-4CE6-8B22-11DF1B768A03.HEIC"
## [99] "65880150065__1B51A37F-895E-455B-B287-C8C6B809FC3E.HEIC"
## [100] "65880151316__EF63C8CF-39F1-4EA3-97C8-91DA6481A203.HEIC"
## [101] "7.3 Notes.pdf"
## [102] "7112.pdf"
## [103] "731 sequence.ape"
## [104] "775 sequence .ape"
## [105] "9.1.pdf"
## [106] "9.2.pdf"
## [107] "98EB2D08-76E3-442E-BF0A-5BADA87D1164.jpeg"
## [108] "a1.pdf"
## [109] "ABC.pdf"
## [110] "abipic.jpg"
## [111] "abt.png"
## [112] "abt2.png"
## [113] "Aggression Slides for Canvas (1).pdf"
## [114] "Aggression Slides for Canvas (1).pptx"
## [115] "Aggression Slides for Canvas.pptx"
## [116] "AirQualbackground .jpg"
## [117] "all_loci (1).vcf"
## [118] "all_loci (2).vcf"
## [119] "all_loci-1.vcf"
## [120] "all_loci.vcf"
## [121] "Allocation of Scarce Resources_ Triage Reading Response (1).pdf"
## [122] "Allocation of Scarce Resources_ Triage Reading Response (2).pdf"
## [123] "Allocation of Scarce Resources_ Triage Reading Response .pdf"
## [124] "allomtery_3_scatterplot3d (1).Rmd"
## [125] "american aster inaturalist.png"
## [126] "AP Art #0.jpeg"
## [127] "AP ART #1 jpg (1).jpeg"
## [128] "AP ART #1 jpg.jpeg"
## [129] "AP ART #1.pdf"
## [130] "AP Art #2 jpg.jpeg"
## [131] "AP Art #3"
## [132] "AP Art #3 (1).jpeg"
## [133] "AP Art #3.jpeg"
## [134] "ApE Linux"
## [135] "ApE_linux_current.zip"
## [136] "ApE_OSX_modern_current (1).dmg"
## [137] "ApE_OSX_modern_current (2).dmg"
## [138] "ApE_OSX_modern_current (3).dmg"
## [139] "ApE_OSX_modern_current.dmg"
## [140] "archinte_146_1_026 (1).pdf"
## [141] "archinte_146_1_026.pdf"
## [142] "Art History Final Paper (1).docx"
## [143] "Art History Final Paper_ Feminine Identity .docx"
## [144] "Art History Final Paper.docx"
## [145] "ash.pdf"
## [146] "ashmore_sharer_how_archaeology_works.pdf"
## [147] "Attraction Lecture Slides for Canvas.pdf"
## [148] "Attraction Lecture Slides for Canvas.pptx"
## [149] "B Presentation.mp4"
## [150] "b1.pdf"
## [151] "b10.pdf"
## [152] "b24.pdf"
## [153] "b25.pdf"
## [154] "b26.pdf"
## [155] "b27.pdf"
## [156] "b28.pdf"
## [157] "b29.pdf"
## [158] "B9114EF8-C2B1-4B15-AD11-FAAFEB3EBBE0.jpeg"
## [159] "Bank Statement .pdf"
## [160] "BH 2022 hold form (1).pdf"
## [161] "Bio Exam 4.pdf"
## [162] "Bio Final Exam Cheat Sheet.pdf"
## [163] "Bio presentation.pdf"
## [164] "Bio:Chem tables"
## [165] "bio21.pdf"
## [166] "bio22.pdf"
## [167] "bio23.pdf"
## [168] "Biology Exam 2_ Regulation of Gene Expression.pdf"
## [169] "Biology Exam 3.pdf"
## [170] "Biology unit one test.pdf"
## [171] "Biomedical Ethics Final Paper.pdf"
## [172] "Biomedical Ethics Global Health (1).pdf"
## [173] "Biomedical Ethics Global Health (2).pdf"
## [174] "Biomedical Ethics Global Health -1 (1).pdf"
## [175] "Biomedical Ethics Global Health -1.pdf"
## [176] "Biomedical Ethics Global Health .pdf"
## [177] "Biomedical Ethics Presentation-Sarayu Soma .pdf"
## [178] "Biomedical Ethics_ Cross-Cultural Medicine Reading Assignment (1).pdf"
## [179] "Biomedical Ethics_ Cross-Cultural Medicine Reading Assignment (2).pdf"
## [180] "Biomedical Ethics_ Cross-Cultural Medicine Reading Assignment (3).pdf"
## [181] "Biomedical Ethics_ Cross-Cultural Medicine Reading Assignment .pdf"
## [182] "bird_snps_remove_NAs.html"
## [183] "bird_snps_remove_NAs.Rmd"
## [184] "Blog_ American aster (1).pdf"
## [185] "Blog_ American aster.pdf"
## [186] "boarding pass 1.pdf"
## [187] "Bouvia 3 (1).pdf"
## [188] "Bouvia 3.pdf"
## [189] "bp2.pdf"
## [190] "bratz.jpeg"
## [191] "bratz.jpeg 1.pdf"
## [192] "c0110_expt10_datasheets.pdf"
## [193] "c0110_expt11_datasheets.pdf"
## [194] "c0110_expt12_datasheets.pdf"
## [195] "c0110_expt3_datasheets.pdf"
## [196] "c0110_expt4_datasheets.pdf"
## [197] "c0110_expt5_datasheets.pdf"
## [198] "c0110_expt6_datasheets (1).pdf"
## [199] "c0110_expt6_datasheets.pdf"
## [200] "c0110_expt7_datasheets.pdf"
## [201] "c0110_expt8_datasheets.pdf"
## [202] "c0110_expt9_datasheets.pdf"
## [203] "c0120_expt10_datasheets.pdf"
## [204] "c0120_expt11_datasheets (1).pdf"
## [205] "c0120_expt11_datasheets.pdf"
## [206] "c0120_expt13_datasheets.pdf"
## [207] "c0120_expt3_datasheets.pdf"
## [208] "c0120_expt4_datasheets.pdf"
## [209] "c0120_expt7_datasheets.pdf"
## [210] "c0120_expt8_datasheets.pdf"
## [211] "c0120_expt9_datasheets.pdf"
## [212] "c22.pdf"
## [213] "c3.pdf"
## [214] "c45.pdf"
## [215] "c55.pdf"
## [216] "c6 relationships .pdf"
## [217] "c6.key"
## [218] "C7EF9450-A47C-4D9C-B469-9B92FA077CE3_4_5005_c.jpeg"
## [219] "calendar.ics"
## [220] "carbon skeletons2.png"
## [221] "CB_OrderReceipt (1).pdf"
## [222] "CB_OrderReceipt.pdf"
## [223] "CB.heic"
## [224] "center_function (1).R"
## [225] "center_function.R"
## [226] "Cerego Terms unit 2 .pdf"
## [227] "Ch1and2homework.pdf"
## [228] "Ch2and3homework.pdf"
## [229] "Ch4homework.pdf"
## [230] "Ch5homework.pdf"
## [231] "Ch6homework.pdf"
## [232] "Ch7homework-part1.pdf"
## [233] "Ch7part2homework.pdf"
## [234] "Ch8homework.pdf"
## [235] "Chapter 02 - Programs.zip"
## [236] "Chapter 4 Review 2021.pdf"
## [237] "Chapter 53 Study Guide (1).pdf"
## [238] "Chapter 54 Study Guide (1).pdf"
## [239] "Chapter 55 Study Guide (1).pdf"
## [240] "Chapter 56 Study Guide (1).pdf"
## [241] "Chapter 6.doc"
## [242] "CHEM 0110 Recitation 1.pdf"
## [243] "Chem Lab 5 Instructions .pdf"
## [244] "Chemistry 2 Final Exam .pdf"
## [245] "Chemistry Exam 2.pdf"
## [246] "ci_for_mean_flowchart.pdf"
## [247] "Clones_Pond_Site_Analysis_Excel_KLW.xlsx"
## [248] "Close Relationships 1 Slides for Canvas.pptx"
## [249] "Close Relationships 2 slides for Canvas (1).pptx"
## [250] "Close Relationships 2 slides for Canvas (2).pptx"
## [251] "Close Relationships 2 slides for Canvas (3).pptx"
## [252] "Close Relationships 2 slides for Canvas (4).pptx"
## [253] "Close Relationships 2 slides for Canvas (5).pptx"
## [254] "Close Relationships 2 slides for Canvas.pptx"
## [255] "cluster_analysis_portfolio.Rmd"
## [256] "code_checkpoint_vcfR.html"
## [257] "code_checkpoint_vcfR.Rmd"
## [258] "CODE_CHECKPOINT-first_rstudio_script.R"
## [259] "college resume 2.0.pdf"
## [260] "college resume 3.0.pdf"
## [261] "college resume.pdf"
## [262] "college+resume.pdf"
## [263] "CommonIons.pdf"
## [264] "CompBio Tracker 3.0.pdf"
## [265] "CompBio Tracker_NEW GenEds_8.21.pdf"
## [266] "CompBio Tracker.pdf"
## [267] "CompBio Tracker2.0.pdf"
## [268] "COMPILED_no_PS__All_Population_Data_SP22_2_19 Sarayu Soma.xlsx"
## [269] "COMPILED_no_PS__All_Population_Data_SP22_2_19.xlsx - LA Docs_files"
## [270] "COMPILED_no_PS__All_Population_Data_SP22_2_19.xlsx - LA Docs.html"
## [271] "Conflict and Peace-Making Lecture Slides for Canvas (1).pptx"
## [272] "Conflict and Peace-Making Lecture Slides for Canvas.pptx"
## [273] "Control Data Exponential graph.jpeg"
## [274] "Copy of FINAL PRESENTATION NaCl Duckweed.pptx"
## [275] "Covid Vaccination Card.pdf"
## [276] "CS0011_Project3-1 (1).pdf"
## [277] "CS0011_Project3-1 (2).pdf"
## [278] "CS0011_Project3-1 (3).pdf"
## [279] "CS0011_Project3-1 (4).pdf"
## [280] "CS0011_Project3-1.pdf"
## [281] "CS0011SEC1020 Project 3-1.pdf"
## [282] "CSSProfile-FAA44IL.pdf"
## [283] "custodian video.mov"
## [284] "custodians video2 (1).mp4"
## [285] "custodians video2.mp4"
## [286] "D.pdf"
## [287] "Datta 2017 Head Count.jpg"
## [288] "Day 14 Lemna Minor Nacl Boxplot SS.jpeg"
## [289] "dbp.pdf"
## [290] "Degree planning "
## [291] "Discord.dmg"
## [292] "disha bday 2.heic"
## [293] "Disha pics.jpg"
## [294] "dl1.HEIC"
## [295] "dl2.HEIC"
## [296] "DNA_Extraction_Methods_Flow_Chart_Student_View_Sp20202.22.22 (1).pdf"
## [297] "DNA_Extraction_Methods_Flow_Chart_Student_View_Sp20202.22.22.pptx"
## [298] "DNAFLOWCHART.pdf"
## [299] "Downloads.Rproj"
## [300] "dp1.key"
## [301] "drive-download-20210617T020926Z-001"
## [302] "drive-download-20210617T020926Z-001.zip"
## [303] "Driver's License and Identification Card Application.pdf"
## [304] "Drivers License .HEIC"
## [305] "DRIVERS LICENSE.pdf"
## [306] "DSE8.pdf"
## [307] "DSE82.pdf"
## [308] "Duckweed Week 1 Project (1).docx"
## [309] "Duckweed Week 1 Project (1).pptx"
## [310] "Duckweed Week 1 Project (2).docx"
## [311] "Duckweed Week 1 Project (2).pptx"
## [312] "Duckweed Week 1 Project.docx"
## [313] "Duckweed Week 1 Project.pptx"
## [314] "dw1.docx"
## [315] "e12.pdf"
## [316] "E8CL.pdf"
## [317] "e9.pdf"
## [318] "e91.pdf"
## [319] "Early Admission Certificate.pdf"
## [320] "eb1.pdf"
## [321] "Elizabeth Bouvia Script.pdf"
## [322] "Empirical Formula Practice Problems.pdf"
## [323] "English 112_Lit_Crit_Presentation_Spring 2021.pdf"
## [324] "English 12 DE Final Research Project_Spring 2021_Periods 6 and 8 .pdf"
## [325] "ex1.ape"
## [326] "ex12.pdf"
## [327] "ex3ds1.pdf"
## [328] "Ex4chem2.pdf"
## [329] "ex7.pdf"
## [330] "Exam 1 Blue {Chem 110 Fall 2017}.pdf"
## [331] "Exam 1 Blue {Chem 110 Fall 2018}.pdf"
## [332] "Exam 1 Green {Chem 110 Fall 2019}.pdf"
## [333] "Exam 2 Green {Chem 110 Fall 2018}.pdf"
## [334] "Exam 2 Green {Chem 110 Fall 2019}.pdf"
## [335] "Exam 2 Study Guide_Forest Fall 2021.docx"
## [336] "Exam 3 Blue {Chem 110 Fall 2019}.pdf"
## [337] "Exam 3 Green {Chem 110 Fall 2018}.pdf"
## [338] "Exam 3 Study Guide_Forest Fall 2021.docx"
## [339] "Exam 5 Guide (1).pdf"
## [340] "Executions Reading Response .pdf"
## [341] "exp13data.pdf"
## [342] "exp8nk.pdf"
## [343] "exp9p2 (1).cmbl"
## [344] "exp9p2 (2).cmbl"
## [345] "exp9p2 (3).cmbl"
## [346] "exp9p2 (4).cmbl"
## [347] "exp9p2 (5).cmbl"
## [348] "exp9p2 (6).cmbl"
## [349] "exp9p2.cmbl"
## [350] "Experiment 5 data (1).pdf"
## [351] "Experiment 5 data .pdf"
## [352] "experiment 5 data sheets chem lab.pdf"
## [353] "Experimental _ Top Hat 6.pdf"
## [354] "Experimental _ Top Hat13.pdf"
## [355] "Experimental4:4.pdf"
## [356] "Extra-Credit Assignment Social Psych.pdf"
## [357] "F2279BC8-A3AF-465A-A481-026A76F04190.heic"
## [358] "F78EF387-8F82-4244-ABA8-EF1822D246F7 (1).jpeg"
## [359] "F78EF387-8F82-4244-ABA8-EF1822D246F7.jpeg"
## [360] "feature_engineering_intro_2_functions-part2.Rmd"
## [361] "feature_engineering.Rmd"
## [362] "Fiji_WIWINLM66_Cont_D14_021522_SS.jpeg"
## [363] "Fiji_WIWINLM66_NaCl1_D14_021522_SS.jpeg"
## [364] "Fiji_WIWINLM66_NaCl2_D0_020122_SS.jpeg"
## [365] "Fiji_WIWINLM66_NaCl2_D14_021522_SS.jpeg"
## [366] "Fiji_WIWNINLM66_cont_D0_020122_SS.jpeg"
## [367] "Fiji_WIWNINLM66_Cont_D7_020822_SS.jpg.jpeg"
## [368] "Fiji_WIWNINLM66_NaCl1_D0_020122_SS.jpeg"
## [369] "Fiji_WIWNINLM66_NaCl1_D7_020822_SS.jpeg"
## [370] "Fiji_WIWNINLM66_NaCl2_D7_020822_SS.jpeg"
## [371] "fiji-macosx.zip"
## [372] "File_000.jpeg"
## [373] "file.pdf"
## [374] "Files - Slides.pdf"
## [375] "Final Exam Review (1).pptx"
## [376] "Final Exam Review.pptx"
## [377] "FINAL PRESENTATION NaCl Duckweed final.pptx"
## [378] "FINAL PRESENTATION NaCl Duckweed.pptx"
## [379] "FINAL recitation 4 MC application practice (1).docx.pdf"
## [380] "final revised NaCl Slides_ Introduction Part 1-2.pptx"
## [381] "Final_Practice_Problems_Solutions.pdf"
## [382] "Final_Practice_Problems.pdf"
## [383] "FinalExamInstructions.pdf"
## [384] "Frog species with 6 sex chromosomes offer new clues on evolution of complex XY systems -- ScienceDaily.pdf"
## [385] "fuck_bio.zip"
## [386] "g1.pdf"
## [387] "gd1.pdf"
## [388] "gd2.pdf"
## [389] "gd3.pdf"
## [390] "gd4.pdf"
## [391] "Genes, Culture, Gender and Course Wrap Up 2021 Slides for Canvas.pptx"
## [392] "gh7.pdf"
## [393] "GMT20211207-031206_Recording_1440x900 (1).mp4"
## [394] "GMT20211207-031206_Recording_1440x900.mp4"
## [395] "GMT20211207-031206_Recording_avo_640x360 (1).mp4"
## [396] "GMT20211207-031206_Recording_avo_640x360.mp4"
## [397] "GMT20211207-031206_Recording.m4a"
## [398] "GMT20211207-031206_Recording.transcript.vtt"
## [399] "GMT20211207-042549_Recording_as_1440x900.mp4"
## [400] "GMT20211207-042549_Recording_avo_640x360.mp4"
## [401] "GMT20211207-042549_Recording.m4a"
## [402] "GMT20211207-042549_Recording.transcript.vtt"
## [403] "googlechrome.dmg"
## [404] "gradescope .pdf"
## [405] "graph paper template 17.pdf"
## [406] "Group A UG_aj cand 2.xlsx"
## [407] "Group Influence 1 Slides for Canvas (1).pptx"
## [408] "Group Influence 1 Slides for Canvas (2).pdf"
## [409] "Group Influence 1 Slides for Canvas (2).pptx"
## [410] "Group Influence 1 Slides for Canvas (3) even.pdf"
## [411] "Group Influence 1 Slides for Canvas (3).pptx"
## [412] "Group Influence 1 Slides for Canvas (4).pdf"
## [413] "Group Influence 1 Slides for Canvas (4).pptx"
## [414] "Group Influence 1 Slides for Canvas (odd).pdf"
## [415] "Group Influence 1 Slides for Canvas reduced.key"
## [416] "Group Influence 1 Slides for Canvas.pdf"
## [417] "Group Influence 1 Slides for Canvas.pptx"
## [418] "Group Influence 2 Slides for Canvas.pptx"
## [419] "HAA Final Examination (1).docx"
## [420] "HAA Final Examination.docx"
## [421] "Helping and Prosocial Behavior 1 slides for Canvas.pptx"
## [422] "Helping and Prosocial Behavior 2 Slides for Canvas.pptx"
## [423] "Highschool Transcript .pdf"
## [424] "Hitsman2020J Evolution Biol plasticity.pdf"
## [425] "HQ1_Sarayu Soma.pdf"
## [426] "HQ1_SarayuSoma.pdf"
## [427] "HQ2_Soma, Sarayu.pdf"
## [428] "HQ3_Sarayu Soma.pdf"
## [429] "HQ4Soma_Sarayu.pdf"
## [430] "hstranscript.pdf"
## [431] "https___www.nvcc.edu_forms_pdf_125-030_pg3.pdf"
## [432] "hu.pdf"
## [433] "HUY.pdf"
## [434] "HW1_P5 (1).mpx"
## [435] "HW1_P5 (2).mpx"
## [436] "HW1_P5 (3).mpx"
## [437] "HW1_P5 (4).mpx"
## [438] "HW1_P5 (5).mpx"
## [439] "HW1_P5.mpx"
## [440] "Hypothesis_Testing_for_Means (1).pdf"
## [441] "Hypothesis_Testing_for_Means_II (1).pdf"
## [442] "Hypothesis_Testing_for_Means_II.pdf"
## [443] "Hypothesis_Testing_for_Means.pdf"
## [444] "i9.pdf"
## [445] "image0.jpeg"
## [446] "image1.jpeg"
## [447] "IMG_0042.HEIC"
## [448] "IMG_0111.JPG"
## [449] "IMG_0133.JPG"
## [450] "IMG_0303.HEIC"
## [451] "IMG_0366 (1).HEIC"
## [452] "IMG_0366.HEIC"
## [453] "IMG_0400.HEIC"
## [454] "IMG_0404 2.HEIC"
## [455] "IMG_0404.HEIC"
## [456] "IMG_0431.jpg"
## [457] "IMG_0432.jpg"
## [458] "IMG_0587.HEIC"
## [459] "IMG_0590.HEIC"
## [460] "IMG_0990.HEIC"
## [461] "IMG_1058.HEIC"
## [462] "IMG_1059.HEIC"
## [463] "IMG_1248.HEIC"
## [464] "IMG_1260.HEIC"
## [465] "IMG_1270.heic"
## [466] "IMG_1276.HEIC"
## [467] "IMG_1277.HEIC"
## [468] "IMG_1278.HEIC"
## [469] "IMG_1438.heic"
## [470] "IMG_1473.HEIC"
## [471] "IMG_1531.jpg"
## [472] "IMG_1533.jpg"
## [473] "IMG_1746.HEIC"
## [474] "IMG_1770.HEIC"
## [475] "IMG_1836.HEIC"
## [476] "IMG_1928.jpg"
## [477] "IMG_1933.jpg"
## [478] "IMG_2025.heic"
## [479] "IMG_2191.pdf"
## [480] "IMG_2627.pdf"
## [481] "IMG_2655.pdf"
## [482] "IMG_2670.pdf"
## [483] "IMG_2674.pdf"
## [484] "IMG_2678.pdf"
## [485] "IMG_2688.heic"
## [486] "IMG_2688.pdf"
## [487] "IMG_2953 2.HEIC"
## [488] "IMG_2953 3.heic"
## [489] "IMG_2953.HEIC"
## [490] "IMG_3158.jpg"
## [491] "IMG_3190.heic"
## [492] "IMG_3191.heic"
## [493] "IMG_3192.heic"
## [494] "IMG_3193.heic"
## [495] "IMG_3194.heic"
## [496] "IMG_3195.heic"
## [497] "IMG_3388 copy.heic"
## [498] "IMG_3388.heic"
## [499] "IMG_3430.HEIC"
## [500] "IMG_3433.HEIC"
## [501] "IMG_5728.JPG"
## [502] "IMG_6804.HEIC"
## [503] "IMG_6829.jpg"
## [504] "IMG_6830.jpg"
## [505] "IMG_6999 2.heic"
## [506] "IMG_6999 3.heic"
## [507] "IMG_6999.heic"
## [508] "IMG_7093.heic"
## [509] "IMG_7314.PNG"
## [510] "IMG_7318.jpg"
## [511] "IMG_7405 2.jpg.HEIC"
## [512] "IMG_7405.HEIC"
## [513] "IMG_7503.jpg"
## [514] "IMG_7649 2.JPG"
## [515] "IMG_7649.JPG"
## [516] "IMG_8440 2.jpg"
## [517] "IMG_8440.jpg"
## [518] "IMG_8542.jpg"
## [519] "IMG_8688.heic"
## [520] "IMG_8782.JPG"
## [521] "IMG_8808.PNG"
## [522] "IMG_8816.jpg"
## [523] "IMG_8954.jpg"
## [524] "IMG_9402.HEIC"
## [525] "IMG_9446 2.HEIC"
## [526] "IMG_9446.HEIC"
## [527] "IMG_9576.HEIC"
## [528] "IMG_9631.HEIC"
## [529] "IMG_9632.HEIC"
## [530] "IMG_9688.HEIC"
## [531] "IMG_9703.jpg"
## [532] "IMG_9704 2.HEIC"
## [533] "IMG_9704.HEIC"
## [534] "IMG_9705 2.HEIC"
## [535] "IMG_9705.HEIC"
## [536] "IMG_9706.HEIC"
## [537] "INCLAS_1.APE"
## [538] "Inclass1real.ape"
## [539] "Install Respondus LockDown Browser (x64c) 432216390 2.pkg"
## [540] "Install Respondus LockDown Browser (x64c) 432216390 3.pkg"
## [541] "Install Respondus LockDown Browser (x64c) 432216390.pkg"
## [542] "Install Respondus LockDown Browser OEM (x64c).pkg"
## [543] "InstallLDBOEM (1).zip"
## [544] "InstallLDBOEM (2).zip"
## [545] "InstallLDBOEM (3).zip"
## [546] "InstallLDBOEM (4).zip"
## [547] "InstallLDBOEM (5).zip"
## [548] "InstallLDBOEM.zip"
## [549] "InstallLDBPackage64c-2-0-9-00.zip"
## [550] "jk1.pdf"
## [551] "jk23.pdf"
## [552] "JPEG image 10.jpeg"
## [553] "JPEG image 11 (1).jpeg"
## [554] "JPEG image 11-1.jpeg"
## [555] "JPEG image 11.jpeg"
## [556] "JPEG image 12.jpeg"
## [557] "JPEG image 13 (1).jpeg"
## [558] "JPEG image 13.jpeg"
## [559] "JPEG image 14.jpeg"
## [560] "JPEG image 15.jpeg"
## [561] "JPEG image 16.jpeg"
## [562] "JPEG image 17.jpeg"
## [563] "JPEG image 18.jpeg"
## [564] "JPEG image 19.jpeg"
## [565] "JPEG image 19.pdf"
## [566] "JPEG image 2.jpeg"
## [567] "JPEG image 20.jpeg"
## [568] "JPEG image 21.jpeg"
## [569] "JPEG image 22.jpeg"
## [570] "JPEG image 23.jpeg"
## [571] "JPEG image 24.jpeg"
## [572] "JPEG image 25.jpeg"
## [573] "JPEG image 27.jpeg"
## [574] "JPEG image 28.jpeg"
## [575] "JPEG image 29.jpeg"
## [576] "JPEG image 3.jpeg"
## [577] "JPEG image 30 (1).jpeg"
## [578] "JPEG image 30.jpeg"
## [579] "JPEG image 31.jpeg"
## [580] "JPEG image 32.jpeg"
## [581] "JPEG image 33.jpeg"
## [582] "JPEG image 4.jpeg"
## [583] "JPEG image 5 (1).jpeg"
## [584] "JPEG image 5.jpeg"
## [585] "JPEG image 6.jpeg"
## [586] "JPEG image 7.jpeg"
## [587] "JPEG image 8.jpeg"
## [588] "JPEG image 9.jpeg"
## [589] "JPEG image lab 3 data.jpeg"
## [590] "Lab 1 challenge .pdf"
## [591] "Lab 1 data .pdf"
## [592] "Lab 3 data 2.0.pdf"
## [593] "Lab 4 data sheet filled .pdf"
## [594] "lab 6 .pdf"
## [595] "lab11.pdf"
## [596] "labsyllabus_0110_2221.pdf"
## [597] "Lecture 1.pptx"
## [598] "Lecture 10-- DNA and RNA part 1 (1).pptx"
## [599] "Lecture 10-- DNA and RNA part 1 (2).pptx"
## [600] "Lecture 10-- DNA and RNA part 1 (3).pptx"
## [601] "Lecture 10-- DNA and RNA part 1 (4).pptx"
## [602] "Lecture 10-- DNA and RNA part 1.pptx"
## [603] "lecture 11- DNA and RNA part 2 (1).pptx"
## [604] "lecture 11- DNA and RNA part 2.pptx"
## [605] "lecture 12 and 13-- 8 practice pedigrees.pdf"
## [606] "lecture 12 and 13-- practice human genetics problems.pdf"
## [607] "Lecture 12-- mutations (1).pptx"
## [608] "Lecture 12-- mutations.pptx"
## [609] "Lecture 13 Corrected-- lipids and cell membranes (2).pdf"
## [610] "Lecture 13-- lipids and cell membranes (1).pptx"
## [611] "Lecture 13-- lipids and cell membranes (2).pptx"
## [612] "Lecture 13-- lipids and cell membranes Unit 3.pdf"
## [613] "Lecture 13-- lipids and cell membranes.pptx"
## [614] "Lecture 14-- osmosis and diffusion.pdf"
## [615] "Lecture 14-- osmosis and diffusion.pptx"
## [616] "Lecture 15-- channels, pumps, and transporters.pdf"
## [617] "Lecture 15-- channels, pumps, and transporters.pptx"
## [618] "Lecture 16-- the cell's dynamic internal membranes (1).pptx"
## [619] "Lecture 16-- the cell's dynamic internal membranes (2).pptx"
## [620] "Lecture 16-- the cell's dynamic internal membranes (3).pdf"
## [621] "Lecture 16-- the cell's dynamic internal membranes (3).pptx"
## [622] "Lecture 16-- the cell's dynamic internal membranes.pptx"
## [623] "lecture 17- endomembrane system (1).pdf"
## [624] "lecture 17- endomembrane system (1).pptx"
## [625] "lecture 17- endomembrane system.pptx"
## [626] "lecture 18-- grid print out.pdf"
## [627] "lecture 18-- protein shipping continued.pptx"
## [628] "Lecture 19-- inroduction to metabolic pathways.pdf"
## [629] "Lecture 19-- inroduction to metabolic pathways.pptx"
## [630] "Lecture 2-- basic chemistry (1).pptx"
## [631] "Lecture 2-- basic chemistry.pptx"
## [632] "lecture 20-- regulating metabolic pathways.pdf"
## [633] "lecture 20-- regulating metabolic pathways.pptx"
## [634] "lecture 21-- using electrochemical gradients to do work.pptx"
## [635] "lecture 22-- redox reactions and electron transport chains.pptx"
## [636] "lecture 23-- photosynthesis.pptx"
## [637] "lecture 24-- respiration overview and metabolic pathways.pptx"
## [638] "lecture 25-- respiration regulation and fermentation.pptx"
## [639] "lecture 26- meiosis.ppt"
## [640] "lecture 27-- mendel part 1.pptx"
## [641] "lecture 28-- mendel part 2.pptx"
## [642] "lecture 29-- sex linkage.pptx"
## [643] "Lecture 3-- polarity and hydrogen bonds (1).pptx"
## [644] "Lecture 3-- polarity and hydrogen bonds.pptx"
## [645] "lecture 30-- treating genetic disorders.pptx"
## [646] "lecture 36-- the cell cycle (mitosis) (1).pptx"
## [647] "lecture 36-- the cell cycle (mitosis).pptx"
## [648] "lecture 4- properties of water.pptx"
## [649] "Lecture 5-- functional groups and polymerization.pptx"
## [650] "Lecture 6-- protein structure and folding.pptx"
## [651] "Lecture 7-- introduction to metabolism (1).pptx"
## [652] "Lecture 7-- introduction to metabolism (2).pptx"
## [653] "Lecture 7-- introduction to metabolism.pptx"
## [654] "Lecture 8-- how enzymes catalyze reactions (1).pptx"
## [655] "Lecture 8-- how enzymes catalyze reactions.pptx"
## [656] "Lecture 9-- carbohydrates.pptx"
## [657] "letters for rose.pdf"
## [658] "LitCritEX1.pdf"
## [659] "Literary Criticism Notes.ppt"
## [660] "Littman_Sam_Period_6_Literary_Criticism_Presentation.pdf"
## [661] "Lm_01_25_22_SS.jpg"
## [662] "Longmore Disability Rights AS.pdf"
## [663] "LP3-File-Expt-5 (1).cmbl"
## [664] "LP3-File-Expt-5 (2).cmbl"
## [665] "LP3-File-Expt-5.cmbl"
## [666] "m9.pdf"
## [667] "mac wallpaper.png"
## [668] "MeksynD_1932redux.pdf"
## [669] "Microsoft Word - FHS Transcript Request Form (1).docx.pdf"
## [670] "Microsoft_Office_16.57.22011101_BusinessPro_Installer (1).pkg"
## [671] "Microsoft_Office_16.57.22011101_BusinessPro_Installer.pkg"
## [672] "midterm paper biomedical ethics (1).pdf"
## [673] "midterm paper biomedical ethics (2).pdf"
## [674] "midterm paper biomedical ethics (3).pdf"
## [675] "midterm paper biomedical ethics .pdf"
## [676] "Midterm_I_Extra_Credit.pdf"
## [677] "Midterm_II___STAT_1000_Extra_Credit.pdf"
## [678] "Midterm_II_Instructions.pdf"
## [679] "midtermextracreditstat.pdf"
## [680] "Mini-Assignment 2 (1).pdf"
## [681] "Mini-Assignment 2.pdf"
## [682] "Minitab homework 1 .pdf"
## [683] "Minitab_Web_App_Installation_Guide.docx"
## [684] "MTH 263 Handwritten Quiz 1 (1).docx"
## [685] "MTH 263 Handwritten Quiz 1.docx"
## [686] "Mu.Editor.1.1.0b7 (1).dmg"
## [687] "Mu.Editor.1.1.0b7.dmg"
## [688] "MuEditor-osx-1.1.1.dmg"
## [689] "Muesuem Assignment .pdf"
## [690] "My Movie 4.mp4"
## [691] "My Movie 6 - Small-1.mov"
## [692] "My Movie 6 - Small.mov"
## [693] "n:a.mp4"
## [694] "NaCl Gel Slide 2 .pptx"
## [695] "NaCl Slides_ Introduction Part 1.pdf"
## [696] "NaCl Slides_ Introduction Part 3-2.pptx"
## [697] "NaCl Slides_ Introduction Part 3-4.pptx"
## [698] "NaCl_Duckweed_Group_Presentation_Order-1 (1).docx"
## [699] "NaCl_Duckweed_Group_Presentation_Order-1.docx"
## [700] "NaCl_Duckweed_Group_Presentation_Order.docx"
## [701] "NaCl_Team_Gel_Image (1).tif"
## [702] "Navarro_ch03_intro_to_R.pdf"
## [703] "Navarro_ch04-more_basic_R.pdf"
## [704] "newdna4.ape"
## [705] "nk1.pdf"
## [706] "NOVA DE ENGLISH transcript.pdf"
## [707] "o1.pdf"
## [708] "o14.pdf"
## [709] "o15 (1).pdf"
## [710] "o15.pdf"
## [711] "o2.pdf"
## [712] "o3.pdf"
## [713] "op1.pdf"
## [714] "opl.pdf"
## [715] "Organic Chemistry 1 - Exam 1 Practice_ Questions.pdf"
## [716] "OTB_MAIN_DELIVERY-R6_1120.pdf"
## [717] "p1.pdf"
## [718] "P2.pdf"
## [719] "p3-skeleton.py"
## [720] "PA1.pdf"
## [721] "pacsun.pdf"
## [722] "PapayaProject.png"
## [723] "Part 3 lab 1.pdf"
## [724] "part2experiment9.csv"
## [725] "PCA-missing_data.Rmd"
## [726] "pdf_converter_1.pdf"
## [727] "pe4.key"
## [728] "Peer_Presentation_Evaluation_Duckweed_SP22_nk.docx"
## [729] "permant resident .pdf"
## [730] "Persuasion Lecture Slides for Canvas.pptx"
## [731] "Persuasion Lecture Slides- Social Psych.pdf"
## [732] "pfp.jpg"
## [733] "Photo prints"
## [734] "Pitt 2021-2022 Housing Instructions.pdf"
## [735] "pk1.pdf"
## [736] "pk2.pdf"
## [737] "pk3.pdf"
## [738] "pk4.pdf"
list.files(pattern = "vcf")
## [1] "all_loci (1).vcf" "all_loci (2).vcf"
## [3] "all_loci-1.vcf" "all_loci.vcf"
## [5] "code_checkpoint_vcfR.html" "code_checkpoint_vcfR.Rmd"
## [7] "vcfR_test.vcf" "vcfR_test.vcf.gz"
Load the all_loci.vcf file into an R data object with vcfR::read.vcfR().
bird_snps <- vcfR::read.vcfR("all_loci.vcf")
## Scanning file to determine attributes.
## File attributes:
## meta lines: 8
## header_line: 9
## variant count: 1929
## column count: 81
##
## Meta line 8 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 1929
## Character matrix gt cols: 81
## skip: 0
## nrows: 1929
## row_num: 0
##
## Processed variant 1000
## Processed variant: 1929
## All variants processed
cat("Note - if this didn't work you may not have your working directory set")
## Note - if this didn't work you may not have your working directory set
Use vcfR::extract.gt() to get the genotype scores.
bird_snps_num <- vcfR::extract.gt(bird_snps,
                                  element = "GT",
                                  IDtoRowNames = FALSE,
                                  as.numeric = TRUE,
                                  convertNA = TRUE,
                                  return.alleles = FALSE)
Transpose the data with t() so that it has the proper orientation.
# add t()
bird_snps_num_t <- t(bird_snps_num) # TODO
Convert the matrix to a dataframe.
# add data.frame()
bird_snps_num_df <- data.frame(bird_snps_num_t) # TODO
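To see what the transpose step does, here is a small sketch on a made-up matrix. The object toy_gt and its row and column names are hypothetical and not part of the bird data. The sketch assumes the same starting orientation as our genotype matrix: SNPs in rows and samples in columns.

```r
# Toy genotype matrix: 3 SNPs (rows) x 2 birds (columns).
# All names and values here are invented for illustration.
toy_gt <- matrix(c(0, 1, NA,
                   2, 0, 1),
                 nrow = 3, ncol = 2,
                 dimnames = list(c("snp1", "snp2", "snp3"),
                                 c("bird_A", "bird_B")))

# t() swaps rows and columns, so each bird becomes a row
toy_gt_t <- t(toy_gt)
dim(toy_gt)    # 3 rows, 2 columns
dim(toy_gt_t)  # 2 rows, 3 columns

# data.frame() keeps the sample names as row names
toy_df <- data.frame(toy_gt_t)
row.names(toy_df)
```

After the transpose, removing a sample means removing a row, which is the operation used in the rest of this exercise.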
In order to deal with NAs you must first locate them.
In this paper, the authors state that they removed from their analysis any individual (row) that had missing values (NAs) for >50% of the SNPs. First we need to find those individuals.
NAs can be detected in R using is.na().
Let’s take a look at how many NAs are in the first row of the data.
We’ll use bracket notation of [1, ] to look at the row.
# Add is.na() and select the first row
## using [1, ]
NAs_row_01 <- is.na(bird_snps_num_t[1, ]) # TODO
is.na() returns a logical vector of TRUE and FALSE values.
Look at the output of is.na() with head().
# call head() on the vector NAs_row_01
head(NAs_row_01) # TODO
## [1] FALSE FALSE FALSE FALSE FALSE FALSE
In this vector, TRUE means “yes, there was an NA in this position”, and FALSE means “No, no NA there.”
The length of the vector is the length of the entire row we put into is.na(), meaning we have a TRUE or FALSE answer for every single value in the row.
We can check this with length() and logical comparisons.
First, the length of our vector of TRUE/FALSE responses.
# Call length() on NAs_row_01
N_NAs <- length(NAs_row_01) # TODO
N_NAs
## [1] 1929
Now, the length of our original row
# Call length() on the first row of bird_snps_num_t
## use bracket notation of [1, ] to get the
## first row
length_row <- length(bird_snps_num_t[1, ]) # TODO
Now check that they are identical using ==
# Use a logical comparison with ==
## to confirm they are the same length
N_NAs == length_row # TODO
## [1] TRUE
We can work directly with a vector of TRUE and FALSE values, but I find it’s easiest to first convert this logical vector into a vector of index values (indices) that tell us exactly where the NAs are in the row.
We can get these indices with which(... == TRUE), because in the vector TRUE is saying “Yes, it’s TRUE there was an NA there.”
# Add which()
which(NAs_row_01 == TRUE) # TODO
## [1] 664 665 666 667 668 669 693 744 983 984 985 986 987 988 989
## [16] 990 1158 1159 1470 1471 1537 1901 1902 1925 1926 1927 1928 1929
This gives us a vector of index values. We’ll save the vector for later use.
# Assign the output to an object called
## i_NA_row_01
i_NA_row_01 <- which(NAs_row_01 == TRUE) # TODO
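The relationship between is.na(), which(), and the number of NAs can be checked on a short made-up vector. This is only a sketch; toy_row and the other toy_ objects are hypothetical names, not part of the bird data.

```r
# A made-up row of genotype scores with two missing values
toy_row <- c(0, NA, 1, NA, 2)

# is.na() gives one TRUE/FALSE answer per element
toy_is_na <- is.na(toy_row)
toy_is_na

# which() turns the TRUE positions into index values.
# which(toy_is_na == TRUE) gives the same result.
toy_i_na <- which(toy_is_na)
toy_i_na

# The number of NAs is the length of the index vector,
# or equivalently the number of TRUE values
length(toy_i_na)
sum(toy_is_na)
```

Both the logical vector and the index vector can be used inside brackets to pull out the same elements; the index vector is simply easier to read.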
We can confirm that these positions in row 1 of our data contain NAs using the vector we made, i_NA_row_01, and bracket notation.
Let’s look at the columns that have NAs in row 1, and also see what’s in those same columns for rows 2 and 3.
bird_snps_num_t[1:3, i_NA_row_01]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## sample_ACAAA_Nel3 NA NA NA NA NA NA NA NA NA NA NA
## sample_ACAGTG_Nel5 0 0 0 0 0 0 0 1 0 0 0
## sample_AGCAT_Nel8 0 0 0 0 0 0 0 1 NA NA NA
## [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## sample_ACAAA_Nel3 NA NA NA NA NA NA NA NA NA NA
## sample_ACAGTG_Nel5 0 0 0 0 0 NA NA 1 0 NA
## sample_AGCAT_Nel8 NA NA NA NA NA 0 1 1 0 0
## [,22] [,23] [,24] [,25] [,26] [,27] [,28]
## sample_ACAAA_Nel3 NA NA NA NA NA NA NA
## sample_ACAGTG_Nel5 0 0 0 0 0 0 0
## sample_AGCAT_Nel8 0 0 NA NA NA NA NA
Here’s a function that will look for NAs in a single row, column, or vector and return their index values.
find_NAs <- function(x){
  # TRUE wherever x is NA
  NAs_TF <- is.na(x)
  # index values of the NAs
  i_NA <- which(NAs_TF == TRUE)
  # number of NAs
  N_NA <- length(i_NA)
  # report the count, then return the indices
  cat("Results:",N_NA, "NAs present\n.")
  return(i_NA)
}
Let’s test it on the first row of our data.
# run find_NAs() on the first row of
## bird_snps_num_t, using bracket notation [1,]
find_NAs(bird_snps_num_t[1,]) # TODO
## Results: 28 NAs present
## .
## [1] 664 665 666 667 668 669 693 744 983 984 985 986 987 988 989
## [16] 990 1158 1159 1470 1471 1537 1901 1902 1925 1926 1927 1928 1929
For our workflow, we’re going to want to find the NAs in each row of data, count them up, and if >50% of the SNPs are NA, then remove the entire row.
We could find the NAs in each row like this. First, find the NAs in row 1 and save it to a vector:
i_NAs01 <- find_NAs(bird_snps_num_t[1,])
## Results: 28 NAs present
## .
And figure out how many NAs there are with length()
length(i_NAs01)
## [1] 28
Then continue this many times:
i_NAs02 <- find_NAs(bird_snps_num_t[2,])
## Results: 20 NAs present
## .
i_NAs03 <- find_NAs(bird_snps_num_t[3,])
## Results: 28 NAs present
## .
i_NAs04 <- find_NAs(bird_snps_num_t[4,])
## Results: 24 NAs present
## .
and so on, making a new vector for each row and updating the row number in the brackets each time. This would take a lot of time and be very prone to errors.
This process of working on many rows is most easily done with a common programming approach called a for loop. The name at first doesn’t make sense; what it should be called is a “do something a bunch of times” loop.
It is called a “for” loop because you tell it something along the lines of: “FOR every row in this dataframe, do this …” In our case we’ll work on rows, but it can also work on columns, or anything else that can exist in R.
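Before applying this to the SNP data, here is a minimal sketch of the pattern on a made-up vector. The names toy_values and toy_squares are hypothetical. The structure is the same one used later: make a storage vector first, then fill it in one position at a time.

```r
# Made-up input values
toy_values <- c(2, 4, 6)

# Storage vector with one slot per input value
toy_squares <- rep(x = 0, times = length(toy_values))

# FOR every position i in toy_values, do this:
for (i in seq_along(toy_values)) {
  # square the i-th value and save it in the i-th slot
  toy_squares[i] <- toy_values[i]^2
}

toy_squares
```

The loop variable i takes the values 1, 2, 3 in turn, so the same line of code runs once for each element.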
I’ll write a for() loop that will go through each row of our SNP data and determine if >50% of the values are NA.
I’ll use the find_NAs() function we just made.
To do this I’m going to want to have a few things:
N_rows: The total number of rows of data in our dataframe
N_NA: A vector to hold how many NAs are in each row
N_SNPs: The total number of columns, so we can determine if >50% of the columns (SNPs) are NAs.
We can get the number of rows with nrow():
# call nrow() on bird_snps_num_t
N_rows <- nrow(bird_snps_num_t) # TODO
We can make a vector to store how many NAs are in each row like this, where rep() repeats 0 for us as many times as we want.
N_NA <- rep(x = 0, times = N_rows)
I can get the number of SNPs with ncol()
# call ncol() on bird_snps_num_t
N_SNPs <- ncol(bird_snps_num_t) # TODO
The percentage of SNPs that are NA can be found as
length(i_NAs01)/N_SNPs*100
## [1] 1.451529
If we were doing this by hand we’d have to fill in the vector of the number of NAs (N_NA) like this:
# Number of NAs in row 1
i_NAs01 <- find_NAs(bird_snps_num_t[1,])
## Results: 28 NAs present
## .
N_NA[1] <- length(i_NAs01)
# Number of NAs in row 2:
i_NAs02 <- find_NAs(bird_snps_num_t[2,])
## Results: 20 NAs present
## .
N_NA[2] <- length(i_NAs02)
# Number of NAs in row 3:
# ... etc
That would not be fun. So we automate the process with a for() loop. I won’t explain right now how the whole thing works, but take a look to get the general sense of what it is.
I’ll repeat the previous preparation code so it’s all in one place.
# N_rows
# number of rows (individuals)
N_rows <- nrow(bird_snps_num_t)
# N_NA
# vector to hold output (number of NAs)
N_NA <- rep(x = 0, times = N_rows)
# N_SNPs
# total number of columns (SNPs)
N_SNPs <- ncol(bird_snps_num_t)
# the for() loop
for(i in 1:N_rows){
  # for each row, find the location of
  ## NAs with find_NAs()
  i_NA <- find_NAs(bird_snps_num_t[i,])
  # then determine how many NAs
  ## with length()
  N_NA_i <- length(i_NA)
  # then save the output to
  ## our storage vector
  N_NA[i] <- N_NA_i
}
## Results: 28 NAs present
## .Results: 20 NAs present
## .Results: 28 NAs present
## .Results: 24 NAs present
## .Results: 23 NAs present
## .Results: 63 NAs present
## .Results: 51 NAs present
## .Results: 38 NAs present
## .Results: 34 NAs present
## .Results: 24 NAs present
## .Results: 48 NAs present
## .Results: 21 NAs present
## .Results: 42 NAs present
## .Results: 78 NAs present
## .Results: 45 NAs present
## .Results: 21 NAs present
## .Results: 42 NAs present
## .Results: 34 NAs present
## .Results: 66 NAs present
## .Results: 54 NAs present
## .Results: 59 NAs present
## .Results: 52 NAs present
## .Results: 47 NAs present
## .Results: 31 NAs present
## .Results: 63 NAs present
## .Results: 40 NAs present
## .Results: 40 NAs present
## .Results: 22 NAs present
## .Results: 60 NAs present
## .Results: 48 NAs present
## .Results: 961 NAs present
## .Results: 478 NAs present
## .Results: 59 NAs present
## .Results: 26 NAs present
## .Results: 285 NAs present
## .Results: 409 NAs present
## .Results: 1140 NAs present
## .Results: 600 NAs present
## .Results: 1905 NAs present
## .Results: 25 NAs present
## .Results: 1247 NAs present
## .Results: 23 NAs present
## .Results: 750 NAs present
## .Results: 179 NAs present
## .Results: 433 NAs present
## .Results: 123 NAs present
## .Results: 65 NAs present
## .Results: 49 NAs present
## .Results: 192 NAs present
## .Results: 433 NAs present
## .Results: 66 NAs present
## .Results: 597 NAs present
## .Results: 1891 NAs present
## .Results: 207 NAs present
## .Results: 41 NAs present
## .Results: 268 NAs present
## .Results: 43 NAs present
## .Results: 110 NAs present
## .Results: 130 NAs present
## .Results: 90 NAs present
## .Results: 271 NAs present
## .Results: 92 NAs present
## .Results: 103 NAs present
## .Results: 175 NAs present
## .Results: 31 NAs present
## .Results: 66 NAs present
## .Results: 64 NAs present
## .Results: 400 NAs present
## .Results: 192 NAs present
## .Results: 251 NAs present
## .Results: 69 NAs present
## .Results: 58 NAs present
## .
My vector N_NA now holds the number of NAs in each row of the dataframe.
head(N_NA)
## [1] 28 20 28 24 23 63
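As an aside, the same per-row counts can be obtained without a loop. The sketch below is an alternative, not the method used in this exercise, and it runs on a made-up matrix (toy_snps and the toy_ objects are hypothetical). It relies on the fact that is.na() works on a whole matrix and that rowSums() counts TRUE as 1.

```r
# Made-up genotype matrix: 3 birds (rows) x 4 SNPs (columns)
toy_snps <- matrix(c(0, NA, 1, NA,
                     1, 1, 0, 2,
                     NA, NA, NA, 0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("bird_A", "bird_B", "bird_C"), NULL))

# Loop version, mirroring the approach used above
toy_N_NA_loop <- rep(x = 0, times = nrow(toy_snps))
for (i in seq_len(nrow(toy_snps))) {
  toy_N_NA_loop[i] <- length(which(is.na(toy_snps[i, ])))
}

# Vectorized version: count the TRUE values in each row
toy_N_NA_vec <- rowSums(is.na(toy_snps))

toy_N_NA_loop
toy_N_NA_vec
```

The two versions give the same counts; the loop is kept in this exercise because it makes each step visible.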
We can get a sense of how many NAs there are in the rows of the dataset by making a histogram. Most rows have very few, and a few have a lot:
# Call hist() on N_NA
hist(N_NA) # TODO
The authors of the bird speciation paper decided to remove any row that was >50% NAs. There are 1929 SNPs, so 50% is 964.5 SNPs.
# total number of columns
N_SNPs
## [1] 1929
# 50% of N_SNPs
cutoff50 <- N_SNPs*0.5
I can add a line for the cutoff to the plot with abline()
# Call hist() on N_NA
## add a vertical line at the cutoff value
## using abline()
hist(N_NA) # TODO
abline(v = cutoff50,
col = 2,
lwd = 2,
lty = 2)
After figuring out how many NAs there are in each row, I can convert this to a percent.
percent_NA <- N_NA/N_SNPs*100
I can plot these percentages and set the cutoff at 50 for 50%.
# Call hist() on percent_NA
## add a vertical line at 50%
## using abline()
hist(percent_NA) # TODO
abline(v = 50, # TODO
       col = 2,
       lwd = 2,
       lty = 2)
I can determine the index value of each row with >50% NAs using which()
# Call which() on percent_NA
i_NA_50percent <- which(percent_NA > 50) # TODO
I use length() to see how many there are
# call length() on i_NA_50percent
length(i_NA_50percent) # TODO
## [1] 4
The index values happen to be:
i_NA_50percent
## [1] 37 39 41 53
There are 4 rows where more than 50% of the columns contain an NA.
In the paper they say they removed 6, and I’m not sure where the discrepancy comes from. In order to get up to 6 birds, I need to decrease the threshold to 38% missing.
which(percent_NA > 38)
## [1] 31 37 39 41 43 53
length(which(percent_NA > 38))
## [1] 6
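Because the cutoff is a researcher degree of freedom, it can help to see how the number of removed birds changes with the threshold. The sketch below is illustrative only: the six counts are the largest NA counts copied from the loop output above, and the names big_counts, thresholds, and n_removed are hypothetical. With the full percent_NA vector the same sapply() call would work unchanged.

```r
# Total number of SNPs, as reported above
N_SNPs_example <- 1929

# The six largest NA counts printed by the loop above
big_counts <- c(961, 1140, 1905, 1247, 750, 1891)
percent_example <- big_counts / N_SNPs_example * 100

# Candidate cutoffs (percent missing)
thresholds <- c(38, 50, 60)

# For each cutoff, count how many birds exceed it
n_removed <- sapply(thresholds, function(th) sum(percent_example > th))
names(n_removed) <- paste0(">", thresholds, "%")
n_removed
# 6 birds exceed 38%, 4 exceed 50%, and 3 exceed 60%
```

One bird has 961 of 1929 SNPs missing, which is about 49.8%, so it falls just under the 50% cutoff. A small change in the threshold, or in how the percentage is computed, would change whether that bird is kept.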
I can remove the rows of data with >50% missing using negative indexing.
bird_snps_num_t02 <- bird_snps_num_t[-i_NA_50percent, ]
I always need to check to make sure the previous and current data make sense.
dim(bird_snps_num_t)
## [1] 72 1929
dim(bird_snps_num_t02)
## [1] 68 1929
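The counting and removal steps can also be wrapped in one small function. This is a sketch under stated assumptions, not part of the original workflow: drop_high_NA_rows() and its argument max_percent_NA are hypothetical names, and the data are made up. It keeps rows with a logical vector rather than negative indexing, which sidesteps one pitfall shown at the end.

```r
# Hypothetical helper: keep only rows whose percent of NAs
# is at or below max_percent_NA
drop_high_NA_rows <- function(x, max_percent_NA = 50) {
  # rowMeans() of a TRUE/FALSE matrix gives the fraction of NAs per row
  percent_NA <- rowMeans(is.na(x)) * 100
  keep <- percent_NA <= max_percent_NA
  # drop = FALSE keeps the result a matrix even if one row remains
  x[keep, , drop = FALSE]
}

# Made-up data: bird_A is 75% NA, bird_B is 25% NA, bird_C has none
toy <- matrix(c(0, NA, NA, NA,
                1, 0, 2, NA,
                0, 1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("bird_A", "bird_B", "bird_C"), NULL))

toy_clean <- drop_high_NA_rows(toy)
dim(toy)        # 3 rows before
dim(toy_clean)  # 2 rows after; bird_A was removed

# Pitfall with negative indexing: if which() finds nothing,
# the index vector is empty and ALL rows are dropped
none_over <- which(rowMeans(is.na(toy)) * 100 > 99)
nrow(toy[-none_over, , drop = FALSE])  # 0 rows, not 3
```

In this exercise which() found 4 rows, so negative indexing worked as intended; the pitfall only appears when no row exceeds the cutoff.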
In our workflow, information about the samples, like their population of origin, is embedded in the row names of the dataframe. (In contrast to this, VCF files from the 1000 Genomes Project have a separate file with all the information.)
It’s going to become necessary in subsequent assignments to access this information, for example to color-code plots. It takes a little bit of code to get the information from these row names, so I’m not going to dig into it here except to say I’m using functions that work with regular expressions to edit the text in the row names.
First, let’s look at the row names.
# call row.names() on bird_snps_num_t
row_names <- row.names(bird_snps_num_t) # TODO
# call head() on row_names
head(row_names) # TODO
## [1] "sample_ACAAA_Nel3" "sample_ACAGTG_Nel5" "sample_AGCAT_Nel8"
## [4] "sample_ATGAAAC_Nel10" "sample_ATGAAAC_Nel15" "sample_CGATGT_Nel4"
The individual samples are called things like “Nel3” and “Cau10”. The numbers are ID numbers of the individual birds that DNA was collected from, and the letters are the populations.
I can use regular expressions (in this case with a function called gsub()) to remove the stuff like “sample_ACAAA_” before the things I want. First I’ll remove the “sample_” (don’t worry about the exact details of how the function works).
# add gsub() before ("sample_","",row_names)
row_names02 <- gsub("sample_","",row_names)
# look at the output using head()
head(row_names02)
## [1] "ACAAA_Nel3" "ACAGTG_Nel5" "AGCAT_Nel8" "ATGAAAC_Nel10"
## [5] "ATGAAAC_Nel15" "CGATGT_Nel4"
Now I’ll get rid of the As, Cs, Ts and Gs (not yet sure what those are actually…). This gives me a unique combination of a population code and number for each sample. (Again, we won’t worry about the code within gsub().)
# clean up the character data
sample_id <- gsub("^([ATCG]*)(_)(.*)",
                  "\\3",
                  row_names02)
# look at the output
head(sample_id)
## [1] "Nel3" "Nel5" "Nel8" "Nel10" "Nel15" "Nel4"
Now I want a vector just with the population code, so I’ll use gsub() to get rid of the numbers. (Again, don’t worry about the details of what’s in gsub().)
# add gsub() before the stuff in the parentheses
pop_id <- gsub("[0-9]+", # TODO
               "",
               sample_id)
The function table() summarizes the output for me.
# call table() on pop_id
table(pop_id) # TODO
## pop_id
## Alt Cau Div Nel Sub
## 15 13 15 15 14
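For readers who do want to see what the patterns are doing, here is a hedged walk-through of the three clean-up steps on two of the row names shown above. The object names (example_names and so on) are hypothetical, and the patterns are slightly simplified versions of the ones used in this exercise. It uses sub(), which replaces only the first match; gsub() replaces every match.

```r
# Two row names copied from the output above
example_names <- c("sample_ACAAA_Nel3", "sample_ATGAAAC_Nel10")

# Step 1: "^sample_" matches the literal text "sample_"
# at the start (^) of each string
no_prefix <- sub("^sample_", "", example_names)
no_prefix

# Step 2: "[ATCG]+" matches one or more of the letters A, T, C, G;
# the "_" after it matches the underscore that follows them
ids <- sub("^[ATCG]+_", "", no_prefix)
ids

# Step 3: "[0-9]+$" matches one or more digits at the end ($)
pops <- sub("[0-9]+$", "", ids)
pops
```

Anchoring with ^ and $ is a design choice that limits each pattern to one end of the string, so a population code that happened to contain an A, C, T, or G would not be altered.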
As noted above, the authors say they removed 6 rows because of NAs, but I only got 4. It’s actually pretty common for things reported in a paper to diverge from what you’re able to replicate from the data: somewhere, something minor got misreported or left off of a file. But I do always like to try to figure out what’s going on.
The authors state:
“We obtained blood samples from 75 Ammodramus sparrows … 15 from each of five putative subspecies. … Due to missing data, we removed six individuals (four of which were from [the subspecies] subvirgatus), resulting in the analysis of 69 individuals (11–15 individuals per population).”
Now that I’ve extracted the information on the samples I can see what samples they provided in their data.
Again, I can summarize the vector of population names with table().
length(pop_id)
## [1] 72
table(pop_id)
## pop_id
## Alt Cau Div Nel Sub
## 15 13 15 15 14
Right away I can see I have only 72 samples versus their 75, so 3 are missing. There are also only 13 in the “Cau” category and 14 in “Sub”. So the .vcf file they provided is either missing a few birds, or something is happening when vcfR loads the file to kick out some rows, perhaps due to data quality. I could open up the .vcf file in a text editor to check this out if I wanted.
When locating rows with NAs I created a vector where there were >50% NAs:
i_NA_50percent
## [1] 37 39 41 53
I can compare this to my vector of population IDs using brackets:
sample_id[i_NA_50percent]
## [1] "Sub3" "Sub8" "Sub4" "Cau10"
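The removed samples can also be counted by population rather than read off by eye. The sketch below uses the four sample IDs printed above; removed_ids, removed_pops, and removed_table are hypothetical names. In the real workflow the equivalent would be to index the population vector with i_NA_50percent.

```r
# The four sample IDs printed above
removed_ids <- c("Sub3", "Sub8", "Sub4", "Cau10")

# Strip the trailing digits to leave the population code
removed_pops <- sub("[0-9]+$", "", removed_ids)

# Count removed samples per population
removed_table <- table(removed_pops)
removed_table
```

This gives 3 removed birds for “Sub” and 1 for “Cau”, matching the comparison with the paper in the next paragraph.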
In the paper they say of the six samples they removed, “four of which were from [the subspecies] subvirgatus”. I have 3 samples called “Sub”, which are probably 3 of those 4 samples, and one “Cau.”
In total, I’m missing 1 Sub with >50% missing data, 1 Cau with >50% missing data, and 1 Cau with <50% missing data. I’m not sure what’s going on, but I doubt this will impact the analysis.