Introduction

In this worked example you will replicate a PCA on a published dataset.

The example is split into 2 parts: Part 1 (this document) prepares the data, and Part 2 runs the PCA on the prepared data.

In this data preparation part, you will do the following things:

  1. Load the SNP genotypes in .vcf format (vcfR::read.vcfR())
  2. Extract the genotypes into an R-compatible format (vcfR::extract.gt())
  3. Rotate the data into the standard R analysis format (t())
  4. Remove individuals (rows) from the data set that have >50% NAs (using a function I wrote)
  5. Remove SNPs (columns) that are fixed
  6. Impute remaining NAs (using a for() loop)
  7. Save the prepared data as a .csv file for the next step (write.csv())

Biological background

This worked example is based on a 2017 paper in the journal Molecular Ecology by Jennifer Walsh and colleagues, titled “Subspecies delineation amid phenotypic, geographic and genetic discordance in a songbird.”

The study investigated variation between two bird species in the genus Ammodramus: A. nelsoni and A. caudacutus.

The species A. nelsoni has been divided into three subspecies: A. n. nelsoni, A. n. alterus, and A. n. subvirgatus. The other species, A. caudacutus, has been divided into two subspecies: A. c. caudacutus and A. c. diversus.

The purpose of this study was to investigate to what extent these five subspecies recognized by taxonomists are supported by genetic data. The authors collected DNA from 75 birds (15 per subspecies) and genotyped 1929 SNPs. They then analyzed the data with Principal Components Analysis (PCA), among other genetic analyses.

This tutorial will work through all of the steps necessary to re-analyze Walsh et al.'s data.

Tasks

All of the code below is provided. Your tasks will be to do 2 things:

  1. Give a meaningful title to all sections marked “TODO: TITLE”
  2. Write 1 to 2 sentences describing what is being done and why in all sections marked “TODO: EXPLAIN”

Preliminaries

Load the vcfR and other packages with library().

library(vcfR)    
## 
##    *****       ***   vcfR   ***       *****
##    This is vcfR 1.13.0 
##      browseVignettes('vcfR') # Documentation
##      citation('vcfR') # Citation
##    *****       *****      *****       *****
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-4
library(ggplot2)
library(ggpubr)

Make sure that your working directory is set to the location of the file all_loci.vcf.

getwd()
## [1] "/Users/sadhikayanamadala/Downloads"
list.files()
list.files(pattern = "vcf")
##  [1] "1.159051856-159301856.ALL.chr1_GRCh38.genotypes.20170504 3.vcf" 
##  [2] "1.159051856-159301856.ALL.chr1_GRCh38.genotypes.20170504.vcf"   
##  [3] "1.159051856-159301856.ALL.chr1_GRCh38.genotypes.20170504.vcf.gz"
##  [4] "13.29656372-29896372.ALL.chr13_GRCh38.genotypes.20170504.vcf"   
##  [5] "all_loci-1.vcf"                                                 
##  [6] "all_loci-1.vcf.txt"                                             
##  [7] "all_loci.vcf"                                                   
##  [8] "all_loci.vcf.txt"                                               
##  [9] "code_checkpoint_vcfR.Rmd"                                       
## [10] "vcfR_test.vcf"                                                  
## [11] "vcfR_test.vcf.gz"
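
Because pattern is a regular expression, "vcf" matches any file name that contains those letters, including code_checkpoint_vcfR.Rmd. A minimal sketch of a stricter check follows. It assumes a throwaway temporary directory holding empty files named after three entries from the listing above; the directory and its files are illustrative only.

```r
# Build a throwaway directory with a few empty files (illustrative only)
demo_dir <- tempfile("vcf_demo")
dir.create(demo_dir)
demo_files <- c("all_loci.vcf", "all_loci.vcf.txt", "code_checkpoint_vcfR.Rmd")
file.create(file.path(demo_dir, demo_files))

# Unanchored pattern: matches "vcf" anywhere in the name
loose <- list.files(demo_dir, pattern = "vcf")

# Anchored pattern: the name must end in ".vcf"
strict <- list.files(demo_dir, pattern = "\\.vcf$")

# Direct check for the one file this tutorial needs
has_vcf <- file.exists(file.path(demo_dir, "all_loci.vcf"))

print(loose)
print(strict)
print(has_vcf)
```

In your own working directory, file.exists("all_loci.vcf") is the quickest way to confirm that the file is where R expects it.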

Data preparation

Reading data from a file

Read the data from the “all_loci.vcf” file and assign it to the “snps” object.

snps <- vcfR::read.vcfR("all_loci.vcf", convertNA  = TRUE)
## Scanning file to determine attributes.
## File attributes:
##   meta lines: 8
##   header_line: 9
##   variant count: 1929
##   column count: 81
## 
## Meta line 8 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
##   Character matrix gt rows: 1929
##   Character matrix gt cols: 81
##   skip: 0
##   nrows: 1929
##   row_num: 0
## 
## Processed variant 1000
## Processed variant: 1929
## All variants processed

Converting to numerical data

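extract.gt() pulls the genotype (GT) field out of the vcfR object. Setting as.numeric = TRUE returns the genotypes as a numeric matrix instead of character strings, which is what the downstream calculations need.

snps_num <- vcfR::extract.gt(snps, 
           element = "GT",
           IDtoRowNames  = FALSE,
           as.numeric = TRUE,
           convertNA = TRUE,
           return.alleles = FALSE)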

Transpose the data

t() transposes the matrix so that individuals are in rows and SNPs are in columns, which is the standard layout for analysis in R.

snps_num_t <- t(snps_num) 

The transposed data is made into a dataframe and assigned to a new object.

snps_num_df <- data.frame(snps_num_t) 
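
To see what the transposition does, here is a small sketch on a toy matrix. The SNP and bird names and the genotype values are made up; only the layout (SNPs in rows, individuals in columns, as in the extract.gt() output) follows the tutorial.

```r
# Toy genotype matrix laid out like extract.gt() output:
# rows are SNPs, columns are individuals (names and values are made up)
gt <- matrix(c(0, 1, 2,
               1, 1, NA),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("snp1", "snp2"),
                             c("bird_A", "bird_B", "bird_C")))

gt_t  <- t(gt)              # rows are now individuals, columns are SNPs
gt_df <- data.frame(gt_t)   # same layout, stored as a data frame

print(dim(gt))    # SNPs x individuals
print(dim(gt_t))  # individuals x SNPs
print(gt_df)
```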

Finding all the NAs

The find_NAs() function returns the positions of the NAs in a vector and prints how many it found.

find_NAs <- function(x){
  NAs_TF <- is.na(x)
  i_NA <- which(NAs_TF == TRUE)
  N_NA <- length(i_NA)
  
  cat("Results:", N_NA, "NAs present.\n")
  return(i_NA)
}

A for() loop applies find_NAs() to each row (individual) and stores the number of NAs per individual in a vector.

# N_rows
# number of rows (individuals)
N_rows <- nrow(snps_num_t)

# N_NA
# vector to hold output (number of NAs)
N_NA   <- rep(x = 0, times = N_rows)

# N_SNPs
# total number of columns (SNPs)
N_SNPs <- ncol(snps_num_t)

# the for() loop
for(i in 1:N_rows){
  
  # for each row, find the location of
  ## NAs with find_NAs()
  i_NA <- find_NAs(snps_num_t[i,]) 
  
  # then determine how many NAs
  ## with length()
  N_NA_i <- length(i_NA)
  
  # then save the output to 
  ## our storage vector
  N_NA[i] <- N_NA_i
}
## Results: 28 NAs present.
## Results: 20 NAs present.
## Results: 28 NAs present.
## Results: 24 NAs present.
## Results: 23 NAs present.
## Results: 63 NAs present.
## Results: 51 NAs present.
## Results: 38 NAs present.
## Results: 34 NAs present.
## Results: 24 NAs present.
## Results: 48 NAs present.
## Results: 21 NAs present.
## Results: 42 NAs present.
## Results: 78 NAs present.
## Results: 45 NAs present.
## Results: 21 NAs present.
## Results: 42 NAs present.
## Results: 34 NAs present.
## Results: 66 NAs present.
## Results: 54 NAs present.
## Results: 59 NAs present.
## Results: 52 NAs present.
## Results: 47 NAs present.
## Results: 31 NAs present.
## Results: 63 NAs present.
## Results: 40 NAs present.
## Results: 40 NAs present.
## Results: 22 NAs present.
## Results: 60 NAs present.
## Results: 48 NAs present.
## Results: 961 NAs present.
## Results: 478 NAs present.
## Results: 59 NAs present.
## Results: 26 NAs present.
## Results: 285 NAs present.
## Results: 409 NAs present.
## Results: 1140 NAs present.
## Results: 600 NAs present.
## Results: 1905 NAs present.
## Results: 25 NAs present.
## Results: 1247 NAs present.
## Results: 23 NAs present.
## Results: 750 NAs present.
## Results: 179 NAs present.
## Results: 433 NAs present.
## Results: 123 NAs present.
## Results: 65 NAs present.
## Results: 49 NAs present.
## Results: 192 NAs present.
## Results: 433 NAs present.
## Results: 66 NAs present.
## Results: 597 NAs present.
## Results: 1891 NAs present.
## Results: 207 NAs present.
## Results: 41 NAs present.
## Results: 268 NAs present.
## Results: 43 NAs present.
## Results: 110 NAs present.
## Results: 130 NAs present.
## Results: 90 NAs present.
## Results: 271 NAs present.
## Results: 92 NAs present.
## Results: 103 NAs present.
## Results: 175 NAs present.
## Results: 31 NAs present.
## Results: 66 NAs present.
## Results: 64 NAs present.
## Results: 400 NAs present.
## Results: 192 NAs present.
## Results: 251 NAs present.
## Results: 69 NAs present.
## Results: 58 NAs present.
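
The per-row counts produced by the loop can be cross-checked with a vectorized one-liner. The sketch below uses a small made-up matrix rather than the real SNP data, and compares a loop in the style of the tutorial against rowSums(is.na()).

```r
# Toy matrix: 3 individuals (rows) x 4 SNPs (columns); values are made up
toy <- matrix(c( 0, NA,  1,  2,
                NA, NA,  1,  0,
                 2,  1,  0,  1),
              nrow = 3, byrow = TRUE)

# Loop version, mirroring the tutorial's approach
N_NA_loop <- rep(0, nrow(toy))
for (i in seq_len(nrow(toy))) {
  N_NA_loop[i] <- length(which(is.na(toy[i, ])))
}

# Vectorized version: is.na() gives a logical matrix, rowSums() counts the TRUEs
N_NA_vec <- rowSums(is.na(toy))

print(N_NA_loop)
print(N_NA_vec)
```

On the real data, rowSums(is.na(snps_num_t)) should give the same numbers as N_NA, without printing a line per individual.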

Plot a histogram of the number of NAs per individual, with a dashed vertical line at the 50% cutoff (964.5 of the 1929 SNPs).

# 50% of N_SNPs
cutoff50 <- N_SNPs*0.5

hist(N_NA)            
abline(v = cutoff50, 
       col = 2, 
       lwd = 2, 
       lty = 2)

Identify the individuals (rows) with more than 50% NAs and exclude them from the data.

percent_NA <- N_NA/N_SNPs*100

# Call which() on percent_NA
i_NA_50percent <- which(percent_NA > 50) 

snps_num_t02 <- snps_num_t[-i_NA_50percent, ]
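
A short sketch of the filtering arithmetic, using four of the NA counts printed by the loop above. It also illustrates a general pitfall of negative indexing: in this dataset four individuals exceed the cutoff, so -i_NA_50percent works, but if which() returned nothing, the same line would drop every row. The toy matrix and the logical-index alternative are illustrative, not part of the original analysis.

```r
N_SNPs <- 1929                       # number of SNPs, from the output above
N_NA   <- c(28, 961, 1140, 1905)     # four of the NA counts printed by the loop

percent_NA <- N_NA / N_SNPs * 100
keep <- percent_NA <= 50             # TRUE for individuals to retain

print(round(percent_NA, 1))
print(keep)

# Pitfall: if no individual exceeds the cutoff, which() returns integer(0),
# and x[-integer(0), ] selects zero rows instead of all of them
toy <- matrix(1:6, nrow = 3)
toy_percent <- c(10, 20, 30)         # made-up percentages, all below 50
none_bad <- which(toy_percent > 50)

n_negative <- nrow(toy[-none_bad, , drop = FALSE])          # 0 rows kept
n_logical  <- nrow(toy[toy_percent <= 50, , drop = FALSE])  # 3 rows kept

print(n_negative)
print(n_logical)
```

Indexing with a logical vector such as percent_NA <= 50 behaves the same whether or not any rows are flagged, so it is the safer habit when reusing this code on other datasets.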

Shortening labels

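gsub() is used to make the labels easier to read: it strips the “sample_” prefix and the leading barcode to give a short sample ID, and then removes the digits from that ID to give the population code.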

row_names <- row.names(snps_num_t02) # Key

row_names02 <- gsub("sample_","",row_names)

sample_id <- gsub("^([ATCG]*)(_)(.*)",
                  "\\3",
                  row_names02)
pop_id <- gsub("[0-9]+",    
               "",
               sample_id)

table(pop_id)  
## pop_id
## Alt Cau Div Nel Sub 
##  15  12  15  15  11
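
The sketch below traces the three substitutions on a few hypothetical row names. The exact format of the real names is an assumption inferred from the regular expressions above ("sample_", then a barcode of A/T/C/G letters, an underscore, and a population code followed by a number); the names themselves are made up.

```r
# Hypothetical row names in the assumed "sample_<barcode>_<pop><number>" format
row_names <- c("sample_ACGTAC_Nel01",
               "sample_TTGCAA_Cau12",
               "sample_GGATCC_Sub03")

# Step 1: drop the "sample_" prefix
row_names02 <- gsub("sample_", "", row_names)

# Step 2: keep only the third capture group, i.e. the text after the barcode
sample_id <- gsub("^([ATCG]*)(_)(.*)", "\\3", row_names02)

# Step 3: strip the digits to leave the population code
pop_id <- gsub("[0-9]+", "", sample_id)

print(sample_id)
print(pop_id)
print(table(pop_id))
```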

Removing Invariant Data

The “invar_omit” function removes invariant columns: SNPs that have the same genotype in every individual (a standard deviation of 0) and so carry no information for a PCA.

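invar_omit <- function(x){
  cat("Dataframe of dim",dim(x), "processed...\n")
  
  # standard deviation of each column (SNP), ignoring NAs
  sds <- apply(x, 2, sd, na.rm = TRUE)
  
  # columns with a standard deviation of 0 are invariant
  i_var0 <- which(sds == 0)
  
  cat(length(i_var0),"columns removed\n")
  
  # drop the invariant columns; drop = FALSE keeps x two-dimensional
  if(length(i_var0) > 0){
     x <- x[, -i_var0, drop = FALSE]
  }
  
  # return the filtered data
  return(x)                      
}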


snps_no_invar <- invar_omit(snps_num_t02) 
## Dataframe of dim 68 1929 processed...
## 591 columns removed
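
To make the standard deviation test concrete, here is a minimal sketch on a made-up matrix in which one SNP is fixed. It follows the same logic as invar_omit() but is not the function itself; the column names and values are illustrative.

```r
# Toy matrix: 4 individuals x 3 SNPs; the second SNP is fixed (all 1s)
toy <- matrix(c(0, 1, 2,
                1, 1, NA,
                2, 1, 0,
                0, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("snp1", "snp2", "snp3")))

# Per-column standard deviation, ignoring NAs
sds <- apply(toy, 2, sd, na.rm = TRUE)

# Columns with no variation
i_var0 <- which(sds == 0)

# Keep everything else; drop = FALSE keeps a matrix even if one column remains
keep_cols <- setdiff(seq_len(ncol(toy)), i_var0)
toy_var   <- toy[, keep_cols, drop = FALSE]

print(sds)
print(colnames(toy_var))
```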

Replacing NAs with mean

Each remaining NA is replaced with the mean of its column (SNP). This is known as mean imputation.

snps_noNAs <- snps_no_invar

N_col <- ncol(snps_no_invar)
for(i in 1:N_col){
  
  # get the current column
  column_i <- snps_noNAs[, i]
  
  # get the mean of the current column
  mean_i <- mean(column_i, na.rm = TRUE)
  
  # get the NAs in the current column
  NAs_i <- which(is.na(column_i))
  
  # record the number of NAs
  N_NAs <- length(NAs_i)

  # replace the NAs in the current column
  column_i[NAs_i] <- mean_i
  
  # replace the original column with the
  ## updated columns
  snps_noNAs[, i] <- column_i
  
}
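
The following sketch runs the same mean imputation on a small made-up matrix, once with a loop in the style of the tutorial and once with a vectorized alternative, and checks that the two agree. The vectorized version is an illustration, not part of the original analysis.

```r
# Toy matrix: 3 individuals x 2 SNPs with one NA per column (made-up values)
toy <- matrix(c( 0,  2,
                NA,  1,
                 2, NA),
              nrow = 3, byrow = TRUE)

# Loop version, following the tutorial
toy_loop <- toy
for (i in seq_len(ncol(toy_loop))) {
  column_i <- toy_loop[, i]
  column_i[is.na(column_i)] <- mean(column_i, na.rm = TRUE)
  toy_loop[, i] <- column_i
}

# Vectorized version: look up each NA's column mean by its column index
toy_vec   <- toy
col_means <- colMeans(toy, na.rm = TRUE)
na_pos    <- which(is.na(toy_vec), arr.ind = TRUE)
toy_vec[na_pos] <- col_means[na_pos[, "col"]]

print(toy_loop)
print(all.equal(toy_loop, toy_vec))
```

After imputation, anyNA(snps_noNAs) is a quick way to confirm that no missing values remain.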

Save the data

Save the data as a .csv file which can be loaded again later.

write.csv(snps_noNAs, file = "SNPs_cleaned.csv",
          row.names = F)

Check for the presence of the file with list.files().

list.files(pattern = ".csv")
##  [1] "dino.csv"                  "penguins.csv"             
##  [3] "SNPs_cleaned.csv"          "walsh2017morphology-2.csv"
##  [5] "walsh2017morphology-3.csv" "walsh2017morphology-4.csv"
##  [7] "walsh2017morphology-5.csv" "walsh2017morphology-6.csv"
##  [9] "walsh2017morphology-7.csv" "walsh2017morphology-8.csv"
## [11] "walsh2017morphology.csv"
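
A quick round-trip check can confirm that the saved file reads back as expected. The sketch below uses a made-up two-by-two matrix and a temporary file rather than SNPs_cleaned.csv; the sample and SNP names are illustrative.

```r
# Toy cleaned matrix with made-up sample and SNP names
toy <- matrix(c(0, 1, 2, 1.5), nrow = 2,
              dimnames = list(c("Nel01", "Cau02"), c("snp1", "snp2")))

# Temporary path, so nothing in the working directory is overwritten
out_file <- tempfile(fileext = ".csv")
write.csv(toy, file = out_file, row.names = FALSE)

# Read the file back in and inspect it
toy_back <- read.csv(out_file)

print(dim(toy_back))        # same dimensions as toy
print(rownames(toy_back))   # default row numbers, not the sample names
print(toy_back)
```

Note that with row.names = FALSE the sample names are not written to the file, so any labels such as sample_id or pop_id have to be saved separately or regenerated later if they are needed.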

Next steps:

In Part 2, we will re-load the SNPs_cleaned.csv file and carry out a PCA.