NOTE - Make sure your WORKING DIRECTORY is set to the location of the .vcf file being used.

Preliminaries

We need just one package for this, vcfR.

library(vcfR)
## 
##    *****       ***   vcfR   ***       *****
##    This is vcfR 1.13.0 
##      browseVignettes('vcfR') # Documentation
##      citation('vcfR') # Citation
##    *****       *****      *****       *****

Check the location of your working directory with getwd().

getwd()
## [1] "/Users/madelinefontana/Desktop/Computational Biology/Final Project/My_SNPs"
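If getwd() does not report the folder that holds the .vcf file, setwd() changes it. A minimal sketch follows; it uses tempdir() purely as a stand-in so that it runs anywhere, and the variable names old_dir and vcf_dir are hypothetical. Replace vcf_dir with the path to your own folder.

```r
# Remember where we started so we can return afterwards
old_dir <- getwd()

# Stand-in location (assumption); use the folder containing your .vcf file
vcf_dir <- tempdir()

# Change the working directory and confirm the change
setwd(vcf_dir)
getwd()

# Return to the original working directory
setwd(old_dir)
```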

Check that the VCF file is present in the working directory with list.files().

list.files()
## [1] "10.15015968-15255968.ALL.chr10_GRCh38.genotypes.20170504.vcf"
## [2] "load_VCF_data.html"                                          
## [3] "load_VCF_data.Rmd"
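When a folder holds many files, the pattern argument of list.files() can narrow the listing to VCF files. A small sketch, assuming the files end in .vcf or .vcf.gz; the name vcf_files is hypothetical.

```r
# List only files whose names end in .vcf or .vcf.gz
vcf_files <- list.files(pattern = "\\.vcf(\\.gz)?$")

if (length(vcf_files) == 0) {
  # Nothing matched: the working directory is probably wrong
  message("No VCF files found in ", getwd())
} else {
  print(vcf_files)
}
```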

Introduction

SNP data is often stored in Variant Call Format (VCF) files, which are organized differently from the layout most R analyses expect. When SNP data is analyzed in R, for example with PCA or cluster analysis, the SNPs are usually the features (variables, in columns) and each row is a sample (such as a person or other organism). VCF files, however, are organized with SNPs in rows and samples in columns.

Changing how data is arranged is called reshaping. Reshaping can take different forms. In this case we need to flip the orientation of the data so that the rows (SNPs) become the columns.

In R we can flip the orientation of a matrix or data frame using the t() function, which stands for transpose. The transpose of a dataset takes each row and makes it into a column. For VCF data, this means that the SNPs in rows can be made into columns.
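To make the transpose concrete, here is a toy example on a made-up matrix with three SNPs and two samples. The values and the names snp_by_sample and sample_by_snp are invented for illustration and do not come from the VCF file.

```r
# Toy matrix laid out like a VCF file: SNPs in rows, samples in columns
snp_by_sample <- matrix(c(0, 1,
                          2, 0,
                          0, 1),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(c("snp1", "snp2", "snp3"),
                                        c("sampleA", "sampleB")))
dim(snp_by_sample)   # 3 rows (SNPs), 2 columns (samples)

# Transpose: samples become rows, SNPs become columns
sample_by_snp <- t(snp_by_sample)
dim(sample_by_snp)   # 2 rows (samples), 3 columns (SNPs)
sample_by_snp
```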

Worked example

The VCF file used here is 10.15015968-15255968.ALL.chr10_GRCh38.genotypes.20170504.vcf. (Note that this file is NOT compressed and so has an extension of .vcf, not .vcf.gz.) Load the data into an object called my_snps using the function vcfR::read.vcfR().

NOTE - before you begin, make sure your WORKING DIRECTORY is set to the location of the .vcf file being used.

# Load the data with vcfR::read.vcfR()
my_snps <- read.vcfR("10.15015968-15255968.ALL.chr10_GRCh38.genotypes.20170504.vcf", convertNA = TRUE)
## Scanning file to determine attributes.
## File attributes:
##   meta lines: 130
##   header_line: 131
##   variant count: 8065
##   column count: 2513
## 
Meta line 130 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
##   Character matrix gt rows: 8065
##   Character matrix gt cols: 2513
##   skip: 0
##   nrows: 8065
##   row_num: 0
## 
Processed variant 1000
Processed variant 2000
Processed variant 3000
Processed variant 4000
Processed variant 5000
Processed variant 6000
Processed variant 7000
Processed variant 8000
Processed variant: 8065
## All variants processed
warning("If this didn't work, you may not have set your working directory to the location of the .vcf file")
## Warning: If this didn't work, you may not have set your working directory to the
## location of the .vcf file
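The warning() call above is printed whether or not the file loaded. One possible alternative, sketched here as an assumption rather than as part of the original workflow, is to test for the file with file.exists() and warn only when it is missing. The name vcf_file is hypothetical.

```r
# File name taken from the list.files() output shown earlier
vcf_file <- "10.15015968-15255968.ALL.chr10_GRCh38.genotypes.20170504.vcf"

if (file.exists(vcf_file)) {
  # Read the file only when it is present in the working directory
  my_snps <- vcfR::read.vcfR(vcf_file, convertNA = TRUE)
} else {
  # Warn only when the file cannot be found
  warning("Could not find ", vcf_file, " in ", getwd(),
          "; check your working directory")
}
```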

Examine the VCF file

head(my_snps)
## [1] "***** Object of class 'vcfR' *****"
## [1] "***** Meta section *****"
## [1] "##fileformat=VCFv4.1"
## [1] "##FILTER=<ID=PASS,Description=\"All filters passed\">"
## [1] "##fileDate=20150218"
## [1] "##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/refe [Truncated]"
## [1] "##source=1000GenomesPhase3Pipeline"
## [1] "##contig=<ID=1,assembly=b37,length=249250621>"
## [1] "First 6 rows."
## [1] 
## [1] "***** Fixed section *****"
##      CHROM POS        ID            REF ALT QUAL  FILTER
## [1,] "10"  "15015973" "rs187430636" "T" "G" "100" "PASS"
## [2,] "10"  "15016085" "rs149664176" "G" "A" "100" "PASS"
## [3,] "10"  "15016102" "rs549938465" "G" "A" "100" "PASS"
## [4,] "10"  "15016227" "rs145478521" "A" "G" "100" "PASS"
## [5,] "10"  "15016235" "rs538300148" "C" "T" "100" "PASS"
## [6,] "10"  "15016248" "rs193053072" "C" "G" "100" "PASS"
## [1] 
## [1] "***** Genotype section *****"
##      FORMAT HG00096 HG00097 HG00099 HG00100 HG00101
## [1,] "GT"   "0|0"   "0|0"   "0|0"   "0|0"   "0|0"  
## [2,] "GT"   "0|0"   "0|0"   "0|0"   "0|0"   "0|0"  
## [3,] "GT"   "0|0"   "0|0"   "0|0"   "0|0"   "0|0"  
## [4,] "GT"   "0|0"   "0|0"   "0|0"   "0|0"   "0|0"  
## [5,] "GT"   "0|0"   "0|0"   "0|0"   "0|0"   "0|0"  
## [6,] "GT"   "0|0"   "0|0"   "0|0"   "0|0"   "0|0"  
## [1] "First 6 columns only."
## [1] 
## [1] "Unique GT formats:"
## [1] "GT"
## [1]
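The genotype section above has SNPs in rows and samples in columns, which is the orientation we want to flip. A minimal sketch of that step uses vcfR::extract.gt() followed by t(). To stay runnable without the 1000 Genomes file, it uses vcfR_test, a small example object that ships with vcfR; the names gt and gt_t are hypothetical. On the real data the analogous call would presumably be extract.gt(my_snps, element = "GT").

```r
library(vcfR)

# vcfR_test is a small example vcfR object included with the package
data(vcfR_test)

# Genotype matrix: one row per variant, one column per sample
gt <- extract.gt(vcfR_test, element = "GT")
dim(gt)

# Transpose so that samples are rows and variants are columns
gt_t <- t(gt)
dim(gt_t)
```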

View the meta data like this:

my_snps@meta
##   [1] "##fileformat=VCFv4.1"                                                                                                                                                                                                                                                          
##   [2] "##FILTER=<ID=PASS,Description=\"All filters passed\">"                                                                                                                                                                                                                         
##   [3] "##fileDate=20150218"                                                                                                                                                                                                                                                           
##   [4] "##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz"                                                                                                                                                     
##   [5] "##source=1000GenomesPhase3Pipeline"                                                                                                                                                                                                                                            
##   [6] "##contig=<ID=1,assembly=b37,length=249250621>"                                                                                                                                                                                                                                 
##   [7] "##contig=<ID=2,assembly=b37,length=243199373>"                                                                                                                                                                                                                                 
##   [8] "##contig=<ID=3,assembly=b37,length=198022430>"                                                                                                                                                                                                                                 
##   [9] "##contig=<ID=4,assembly=b37,length=191154276>"                                                                                                                                                                                                                                 
##  [10] "##contig=<ID=5,assembly=b37,length=180915260>"                                                                                                                                                                                                                                 
##  [11] "##contig=<ID=6,assembly=b37,length=171115067>"                                                                                                                                                                                                                                 
##  [12] "##contig=<ID=7,assembly=b37,length=159138663>"                                                                                                                                                                                                                                 
##  [13] "##contig=<ID=8,assembly=b37,length=146364022>"                                                                                                                                                                                                                                 
##  [14] "##contig=<ID=9,assembly=b37,length=141213431>"                                                                                                                                                                                                                                 
##  [15] "##contig=<ID=10,assembly=b37,length=135534747>"                                                                                                                                                                                                                                
##  [16] "##contig=<ID=11,assembly=b37,length=135006516>"                                                                                                                                                                                                                                
##  [17] "##contig=<ID=12,assembly=b37,length=133851895>"                                                                                                                                                                                                                                
##  [18] "##contig=<ID=13,assembly=b37,length=115169878>"                                                                                                                                                                                                                                
##  [19] "##contig=<ID=14,assembly=b37,length=107349540>"                                                                                                                                                                                                                                
##  [20] "##contig=<ID=15,assembly=b37,length=102531392>"                                                                                                                                                                                                                                
##  [21] "##contig=<ID=16,assembly=b37,length=90354753>"                                                                                                                                                                                                                                 
##  [22] "##contig=<ID=17,assembly=b37,length=81195210>"                                                                                                                                                                                                                                 
##  [23] "##contig=<ID=18,assembly=b37,length=78077248>"                                                                                                                                                                                                                                 
##  [24] "##contig=<ID=19,assembly=b37,length=59128983>"                                                                                                                                                                                                                                 
##  [25] "##contig=<ID=20,assembly=b37,length=63025520>"                                                                                                                                                                                                                                 
##  [26] "##contig=<ID=21,assembly=b37,length=48129895>"                                                                                                                                                                                                                                 
##  [27] "##contig=<ID=22,assembly=b37,length=51304566>"                                                                                                                                                                                                                                 
##  [28] "##contig=<ID=GL000191.1,assembly=b37,length=106433>"                                                                                                                                                                                                                           
##  [29] "##contig=<ID=GL000192.1,assembly=b37,length=547496>"                                                                                                                                                                                                                           
##  [30] "##contig=<ID=GL000193.1,assembly=b37,length=189789>"                                                                                                                                                                                                                           
##  [31] "##contig=<ID=GL000194.1,assembly=b37,length=191469>"                                                                                                                                                                                                                           
##  [32] "##contig=<ID=GL000195.1,assembly=b37,length=182896>"                                                                                                                                                                                                                           
##  [33] "##contig=<ID=GL000196.1,assembly=b37,length=38914>"                                                                                                                                                                                                                            
##  [34] "##contig=<ID=GL000197.1,assembly=b37,length=37175>"                                                                                                                                                                                                                            
##  [35] "##contig=<ID=GL000198.1,assembly=b37,length=90085>"                                                                                                                                                                                                                            
##  [36] "##contig=<ID=GL000199.1,assembly=b37,length=169874>"                                                                                                                                                                                                                           
##  [37] "##contig=<ID=GL000200.1,assembly=b37,length=187035>"                                                                                                                                                                                                                           
##  [38] "##contig=<ID=GL000201.1,assembly=b37,length=36148>"                                                                                                                                                                                                                            
##  [39] "##contig=<ID=GL000202.1,assembly=b37,length=40103>"                                                                                                                                                                                                                            
##  [40] "##contig=<ID=GL000203.1,assembly=b37,length=37498>"                                                                                                                                                                                                                            
##  [41] "##contig=<ID=GL000204.1,assembly=b37,length=81310>"                                                                                                                                                                                                                            
##  [42] "##contig=<ID=GL000205.1,assembly=b37,length=174588>"                                                                                                                                                                                                                           
##  [43] "##contig=<ID=GL000206.1,assembly=b37,length=41001>"                                                                                                                                                                                                                            
##  [44] "##contig=<ID=GL000207.1,assembly=b37,length=4262>"                                                                                                                                                                                                                             
##  [45] "##contig=<ID=GL000208.1,assembly=b37,length=92689>"                                                                                                                                                                                                                            
##  [46] "##contig=<ID=GL000209.1,assembly=b37,length=159169>"                                                                                                                                                                                                                           
##  [47] "##contig=<ID=GL000210.1,assembly=b37,length=27682>"                                                                                                                                                                                                                            
##  [48] "##contig=<ID=GL000211.1,assembly=b37,length=166566>"                                                                                                                                                                                                                           
##  [49] "##contig=<ID=GL000212.1,assembly=b37,length=186858>"                                                                                                                                                                                                                           
##  [50] "##contig=<ID=GL000213.1,assembly=b37,length=164239>"                                                                                                                                                                                                                           
##  [51] "##contig=<ID=GL000214.1,assembly=b37,length=137718>"                                                                                                                                                                                                                           
##  [52] "##contig=<ID=GL000215.1,assembly=b37,length=172545>"                                                                                                                                                                                                                           
##  [53] "##contig=<ID=GL000216.1,assembly=b37,length=172294>"                                                                                                                                                                                                                           
##  [54] "##contig=<ID=GL000217.1,assembly=b37,length=172149>"                                                                                                                                                                                                                           
##  [55] "##contig=<ID=GL000218.1,assembly=b37,length=161147>"                                                                                                                                                                                                                           
##  [56] "##contig=<ID=GL000219.1,assembly=b37,length=179198>"                                                                                                                                                                                                                           
##  [57] "##contig=<ID=GL000220.1,assembly=b37,length=161802>"                                                                                                                                                                                                                           
##  [58] "##contig=<ID=GL000221.1,assembly=b37,length=155397>"                                                                                                                                                                                                                           
##  [59] "##contig=<ID=GL000222.1,assembly=b37,length=186861>"                                                                                                                                                                                                                           
##  [60] "##contig=<ID=GL000223.1,assembly=b37,length=180455>"                                                                                                                                                                                                                           
##  [61] "##contig=<ID=GL000224.1,assembly=b37,length=179693>"                                                                                                                                                                                                                           
##  [62] "##contig=<ID=GL000225.1,assembly=b37,length=211173>"                                                                                                                                                                                                                           
##  [63] "##contig=<ID=GL000226.1,assembly=b37,length=15008>"                                                                                                                                                                                                                            
##  [64] "##contig=<ID=GL000227.1,assembly=b37,length=128374>"                                                                                                                                                                                                                           
##  [65] "##contig=<ID=GL000228.1,assembly=b37,length=129120>"                                                                                                                                                                                                                           
##  [66] "##contig=<ID=GL000229.1,assembly=b37,length=19913>"                                                                                                                                                                                                                            
##  [67] "##contig=<ID=GL000230.1,assembly=b37,length=43691>"                                                                                                                                                                                                                            
##  [68] "##contig=<ID=GL000231.1,assembly=b37,length=27386>"                                                                                                                                                                                                                            
##  [69] "##contig=<ID=GL000232.1,assembly=b37,length=40652>"                                                                                                                                                                                                                            
##  [70] "##contig=<ID=GL000233.1,assembly=b37,length=45941>"                                                                                                                                                                                                                            
##  [71] "##contig=<ID=GL000234.1,assembly=b37,length=40531>"                                                                                                                                                                                                                            
##  [72] "##contig=<ID=GL000235.1,assembly=b37,length=34474>"                                                                                                                                                                                                                            
##  [73] "##contig=<ID=GL000236.1,assembly=b37,length=41934>"                                                                                                                                                                                                                            
##  [74] "##contig=<ID=GL000237.1,assembly=b37,length=45867>"                                                                                                                                                                                                                            
##  [75] "##contig=<ID=GL000238.1,assembly=b37,length=39939>"                                                                                                                                                                                                                            
##  [76] "##contig=<ID=GL000239.1,assembly=b37,length=33824>"                                                                                                                                                                                                                            
##  [77] "##contig=<ID=GL000240.1,assembly=b37,length=41933>"                                                                                                                                                                                                                            
##  [78] "##contig=<ID=GL000241.1,assembly=b37,length=42152>"                                                                                                                                                                                                                            
##  [79] "##contig=<ID=GL000242.1,assembly=b37,length=43523>"                                                                                                                                                                                                                            
##  [80] "##contig=<ID=GL000243.1,assembly=b37,length=43341>"                                                                                                                                                                                                                            
##  [81] "##contig=<ID=GL000244.1,assembly=b37,length=39929>"                                                                                                                                                                                                                            
##  [82] "##contig=<ID=GL000245.1,assembly=b37,length=36651>"                                                                                                                                                                                                                            
##  [83] "##contig=<ID=GL000246.1,assembly=b37,length=38154>"                                                                                                                                                                                                                            
##  [84] "##contig=<ID=GL000247.1,assembly=b37,length=36422>"                                                                                                                                                                                                                            
##  [85] "##contig=<ID=GL000248.1,assembly=b37,length=39786>"                                                                                                                                                                                                                            
##  [86] "##contig=<ID=GL000249.1,assembly=b37,length=38502>"                                                                                                                                                                                                                            
##  [87] "##contig=<ID=MT,assembly=b37,length=16569>"                                                                                                                                                                                                                                    
##  [88] "##contig=<ID=NC_007605,assembly=b37,length=171823>"                                                                                                                                                                                                                            
##  [89] "##contig=<ID=X,assembly=b37,length=155270560>"                                                                                                                                                                                                                                 
##  [90] "##contig=<ID=Y,assembly=b37,length=59373566>"                                                                                                                                                                                                                                  
##  [91] "##contig=<ID=hs37d5,assembly=b37,length=35477943>"                                                                                                                                                                                                                             
##  [92] "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"                                                                                                                                                                                                                
##  [93] "##INFO=<ID=CIEND,Number=2,Type=Integer,Description=\"Confidence interval around END for imprecise variants\">"                                                                                                                                                                 
##  [94] "##INFO=<ID=CIPOS,Number=2,Type=Integer,Description=\"Confidence interval around POS for imprecise variants\">"                                                                                                                                                                 
##  [95] "##INFO=<ID=CS,Number=1,Type=String,Description=\"Source call set.\">"                                                                                                                                                                                                          
##  [96] "##INFO=<ID=END,Number=1,Type=Integer,Description=\"End coordinate of this variant\">"                                                                                                                                                                                          
##  [97] "##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description=\"Imprecise structural variation\">"                                                                                                                                                                                       
##  [98] "##INFO=<ID=MC,Number=.,Type=String,Description=\"Merged calls.\">"                                                                                                                                                                                                             
##  [99] "##INFO=<ID=MEINFO,Number=4,Type=String,Description=\"Mobile element info of the form NAME,START,END<POLARITY; If there is only 5' OR 3' support for this call, will be NULL NULL for START and END\">"                                                                         
## [100] "##INFO=<ID=MEND,Number=1,Type=Integer,Description=\"Mitochondrial end coordinate of inserted sequence\">"                                                                                                                                                                      
## [101] "##INFO=<ID=MLEN,Number=1,Type=Integer,Description=\"Estimated length of mitochondrial insert\">"                                                                                                                                                                               
## [102] "##INFO=<ID=MSTART,Number=1,Type=Integer,Description=\"Mitochondrial start coordinate of inserted sequence\">"                                                                                                                                                                  
## [103] "##INFO=<ID=SVLEN,Number=.,Type=Integer,Description=\"SV length. It is only calculated for structural variation MEIs. For other types of SVs; one may calculate the SV length by INFO:END-START+1, or by finding the difference between lengthes of REF and ALT alleles\">"     
## [104] "##INFO=<ID=SVTYPE,Number=1,Type=String,Description=\"Type of structural variant\">"                                                                                                                                                                                            
## [105] "##INFO=<ID=TSD,Number=1,Type=String,Description=\"Precise Target Site Duplication for bases, if unknown, value will be NULL\">"                                                                                                                                                
## [106] "##INFO=<ID=AC,Number=A,Type=Integer,Description=\"Total number of alternate alleles in called genotypes\">"                                                                                                                                                                    
## [107] "##INFO=<ID=AF,Number=A,Type=Float,Description=\"Estimated allele frequency in the range (0,1)\">"                                                                                                                                                                              
## [108] "##INFO=<ID=NS,Number=1,Type=Integer,Description=\"Number of samples with data\">"                                                                                                                                                                                              
## [109] "##INFO=<ID=AN,Number=1,Type=Integer,Description=\"Total number of alleles in called genotypes\">"                                                                                                                                                                              
## [110] "##INFO=<ID=EAS_AF,Number=A,Type=Float,Description=\"Allele frequency in the EAS populations calculated from AC and AN, in the range (0,1)\">"                                                                                                                                  
## [111] "##INFO=<ID=EUR_AF,Number=A,Type=Float,Description=\"Allele frequency in the EUR populations calculated from AC and AN, in the range (0,1)\">"                                                                                                                                  
## [112] "##INFO=<ID=AFR_AF,Number=A,Type=Float,Description=\"Allele frequency in the AFR populations calculated from AC and AN, in the range (0,1)\">"                                                                                                                                  
## [113] "##INFO=<ID=AMR_AF,Number=A,Type=Float,Description=\"Allele frequency in the AMR populations calculated from AC and AN, in the range (0,1)\">"                                                                                                                                  
## [114] "##INFO=<ID=SAS_AF,Number=A,Type=Float,Description=\"Allele frequency in the SAS populations calculated from AC and AN, in the range (0,1)\">"                                                                                                                                  
## [115] "##INFO=<ID=DP,Number=1,Type=Integer,Description=\"Total read depth; only low coverage data were counted towards the DP, exome data were not used\">"                                                                                                                           
## [116] "##INFO=<ID=AA,Number=1,Type=String,Description=\"Ancestral Allele. Format: AA|REF|ALT|IndelType. AA: Ancestral allele, REF:Reference Allele, ALT:Alternate Allele, IndelType:Type of Indel (REF, ALT and IndelType are only defined for indels)\">"                            
## [117] "##INFO=<ID=VT,Number=.,Type=String,Description=\"indicates what type of variant the line represents\">"                                                                                                                                                                        
## [118] "##INFO=<ID=EX_TARGET,Number=0,Type=Flag,Description=\"indicates whether a variant is within the exon pull down target boundaries\">"                                                                                                                                           
## [119] "##INFO=<ID=MULTI_ALLELIC,Number=0,Type=Flag,Description=\"indicates whether a site is multi-allelic\">"                                                                                                                                                                        
## [120] "##INFO=<ID=STRAND_FLIP,Number=0,Type=Flag,Description=\"Indicates that the reference strand has changed between GRCh37 and GRCh38\">"                                                                                                                                          
## [121] "##INFO=<ID=REF_SWITCH,Number=0,Type=Flag,Description=\"Indicates that the reference allele has changed\">"                                                                                                                                                                     
## [122] "##INFO=<ID=DEPRECATED_RSID,Number=.,Type=String,Description=\"dbsnp rs IDs that have been merged into other rs IDs or do not map to GRCh38\">"                                                                                                                                 
## [123] "##INFO=<ID=RSID_REMOVED,Number=.,Type=String,Description=\"dbsnp rs IDs removed from this variant, due to either the variant splitting up or being deprecated/merged\">"                                                                                                       
## [124] "##INFO=<ID=GRCH37_38_REF_STRING_MATCH,Number=0,Type=Flag,Description=\"Indicates reference allele in origin GRCh37 vcf string-matches reference allele in dbsnp GRCh38 vcf\">"                                                                                                 
## [125] "##INFO=<ID=NOT_ALL_RSIDS_STRAND_CHANGE_OR_REF_SWITCH,Number=0,Type=Flag,Description=\"Indicates only some of the rs IDs in origin GRCh37 vcf switched strands or switched strands and changed reference allele. This would result in rs IDs being split into multiple lines\">"
## [126] "##INFO=<ID=GRCH37_POS,Number=1,Type=Integer,Description=\"Position in origin GRCh37 vcf\">"                                                                                                                                                                                    
## [127] "##INFO=<ID=GRCH37_REF,Number=1,Type=String,Description=\"Representation of reference allele in origin GRCh37 vcf\">"                                                                                                                                                           
## [128] "##INFO=<ID=ALLELE_TRANSFORM,Number=0,Type=Flag,Description=\"Indicates that at least some of the alleles have changed in how they're represented, e.g. through left shifting.\">"                                                                                              
## [129] "##INFO=<ID=REF_NEW_ALLELE,Number=0,Type=Flag,Description=\"Indicates that the reference allele is an allele not present in the origin GRCh37 vcf\">"                                                                                                                           
## [130] "##INFO=<ID=CHROM_CHANGE_BETWEEN_ASSEMBLIES,Number=.,Type=String,Description=\"dbsnp rs IDs that are mapped to a different chromosome between GRCh37 and GRCh38\">"

(Don’t worry about what the “at” symbol, @, is doing here. It pulls a named part, called a slot, out of the object, and is a somewhat less common piece of R syntax.)
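For readers who do want a feel for what @ does, here is a minimal sketch using a made-up S4 class. The class name ToyVcf and its contents are hypothetical and only loosely mimic an object that has meta and gt slots; this is not how vcfR defines its own class.

```r
library(methods)  # normally attached already; loaded here to be explicit

# Hypothetical toy class with two slots (an assumption for illustration only)
setClass("ToyVcf", slots = c(meta = "character", gt = "matrix"))

toy_vcf <- new("ToyVcf",
               meta = c("##fileformat=VCFv4.1", "##source=toy"),
               gt   = matrix(c("GT", "0|0",
                               "GT", "0|1"),
                             nrow = 2, byrow = TRUE,
                             dimnames = list(NULL, c("FORMAT", "sample1"))))

# @ pulls out a slot by name, much as $ pulls an element out of a list
toy_vcf@meta
toy_vcf@gt

# slot() is the function form of @ and returns the same thing
same_result <- identical(toy_vcf@meta, slot(toy_vcf, "meta"))

# slotNames() lists the slots an object has
toy_slots <- slotNames(toy_vcf)
toy_slots
```

Calling slotNames() on a real object is a quick way to find out which names can follow the @.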

We can get a snapshot of the samples like this:

my_snps@gt[1:10, 1:3]
##       FORMAT HG00096 HG00097
##  [1,] "GT"   "0|0"   "0|0"  
##  [2,] "GT"   "0|0"   "0|0"  
##  [3,] "GT"   "0|0"   "0|0"  
##  [4,] "GT"   "0|0"   "0|0"  
##  [5,] "GT"   "0|0"   "0|0"  
##  [6,] "GT"   "0|0"   "0|0"  
##  [7,] "GT"   "0|0"   "0|0"  
##  [8,] "GT"   "0|0"   "0|0"  
##  [9,] "GT"   "0|0"   "0|0"  
## [10,] "GT"   "0|0"   "0|0"

Each row is a SNP. The first column, FORMAT, names the fields stored for each genotype; every column after it is a sample.

Extract numeric genotype scores

We extract genotype scores using vcfR::extract.gt().

# Use vcfR::extract.gt() to extract the numeric scores
my_snps_num <- extract.gt(my_snps,
           element = "GT",
           IDtoRowNames  = FALSE,
           as.numeric = TRUE,
           convertNA = TRUE)

We now have just the numeric data that would go into an analysis such as PCA.
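To make the idea of a numeric genotype score concrete, here is a small base R sketch that counts alternate alleles in genotype strings such as "0|1". The strings and the helper count_alt() are hypothetical, and this illustrates one common scoring scheme; it is not necessarily what extract.gt() does internally.

```r
# Hypothetical genotype strings: "|" separates phased alleles,
# "/" separates unphased alleles, and "." marks a missing allele
gt_strings <- c("0|0", "0|1", "1|0", "1|1", "./.")

# Hypothetical helper: count the non-reference alleles in one genotype string
count_alt <- function(gt) {
  alleles <- strsplit(gt, "[|/]")[[1]]        # split on either separator
  if (any(alleles == ".")) return(NA_integer_) # missing call -> NA
  sum(as.integer(alleles) > 0)                 # number of alternate alleles
}

scores <- vapply(gt_strings, count_alt, integer(1), USE.NAMES = FALSE)
scores
```

Under this scheme a homozygous reference call scores 0, a heterozygous call scores 1, and a homozygous alternate call scores 2.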

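Sample names can be REALLY long, which makes them hard to display in a compact way. The code below shortens them with gsub(), a function that replaces every match of a pattern: first it removes the prefix "sample_", then any remaining underscores. (The sample names in this file, such as HG00096, contain neither, so they come out unchanged.)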

colnames(my_snps_num) <-  gsub("sample_", "",
                                 colnames(my_snps_num))

colnames(my_snps_num) <-  gsub("_", "",
                                 colnames(my_snps_num))
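Here is a short sketch of what those two gsub() calls do, run on made-up names. The vector long_names is hypothetical and not taken from this VCF file.

```r
# Hypothetical sample names; the last one mimics the names in this file
long_names <- c("sample_AB_001", "sample_AB_002", "HG00096")

# Step 1: remove the "sample_" prefix
short_names <- gsub("sample_", "", long_names)

# Step 2: remove any remaining underscores
short_names <- gsub("_", "", short_names)

short_names
```

Names that contain neither pattern, like the last one, pass through untouched.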

We can see that the matrix just contains numeric genotype scores of 0, 1 or 2.

Here’s a small view of the data:

my_snps_num[1:10, 1:4]
##       HG00096 HG00097 HG00099 HG00100
##  [1,]       0       0       0       0
##  [2,]       0       0       0       0
##  [3,]       0       0       0       0
##  [4,]       0       0       0       0
##  [5,]       0       0       0       0
##  [6,]       0       0       0       0
##  [7,]       0       0       0       0
##  [8,]       0       0       0       0
##  [9,]       0       0       0       0
## [10,]       0       0       0       0

We can call summary() on the first few columns of the data like this:

summary(my_snps_num[, 1:5])
##     HG00096          HG00097           HG00099           HG00100       
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.0000   Median :0.00000   Median :0.00000   Median :0.00000  
##  Mean   :0.0119   Mean   :0.02691   Mean   :0.01649   Mean   :0.02232  
##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :2.0000   Max.   :2.00000   Max.   :2.00000   Max.   :2.00000  
##     HG00101       
##  Min.   :0.00000  
##  1st Qu.:0.00000  
##  Median :0.00000  
##  Mean   :0.02219  
##  3rd Qu.:0.00000  
##  Max.   :2.00000
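Because the scores take only a few distinct values, counting them can be easier to read than quartiles. The sketch below does this on a small made-up matrix; toy_scores is hypothetical, and the same calls could be tried on my_snps_num.

```r
# Hypothetical score matrix: 3 SNPs (rows) by 3 samples (columns), one NA
toy_scores <- matrix(c(0, 0, 1,
                       0, 2, 0,
                       0, 0, NA),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(NULL, c("s1", "s2", "s3")))

# Count how often each score occurs, keeping NA as its own category
counts <- table(toy_scores, useNA = "ifany")
counts

# Mean score per sample, ignoring NAs
col_means <- colMeans(toy_scores, na.rm = TRUE)
col_means
```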

Transpose numeric genotype scores

The data are formatted as genotype scores, but SNPs are still in rows and samples in columns. We can flip them, so that samples are in rows and SNPs in columns, using the transpose function t().

my_snps_num_t <- t(my_snps_num)

We can look at the output like this:

my_snps_num_t[1:10, 1:4]
##         [,1] [,2] [,3] [,4]
## HG00096    0    0    0    0
## HG00097    0    0    0    0
## HG00099    0    0    0    0
## HG00100    0    0    0    0
## HG00101    0    0    0    0
## HG00102    0    0    0    0
## HG00103    0    0    0    0
## HG00105    0    0    0    0
## HG00106    0    0    0    0
## HG00107    0    0    0    0
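The effect of t() is easiest to see on a tiny matrix with names on both dimensions. The matrix below is made up for illustration; the names snp1, sampleA and so on are hypothetical.

```r
# Hypothetical toy matrix: 3 SNPs (rows) by 2 samples (columns)
snps_by_sample <- matrix(c(0, 1,
                           2, 0,
                           1, 1),
                         nrow = 3, byrow = TRUE,
                         dimnames = list(c("snp1", "snp2", "snp3"),
                                         c("sampleA", "sampleB")))

# Transpose: samples become rows, SNPs become columns
sample_by_snps <- t(snps_by_sample)

dim(snps_by_sample)   # rows = SNPs, columns = samples
dim(sample_by_snps)   # rows = samples, columns = SNPs

# Each value keeps its SNP and sample labels; only the orientation changes
snps_by_sample["snp2", "sampleA"]
sample_by_snps["sampleA", "snp2"]
```

Comparing dim() before and after the transpose is a quick check that the orientation is what the next analysis step expects.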

Preview - dealing with NAs

SNP data often contain a lot of NAs. If we just call na.omit() on them, what happens? Call na.omit() on the my_snps_num_t object, then check the dimensions.

no_NAs <- na.omit(my_snps_num_t)

# what is the remaining size of the data?
dim(no_NAs) 
## [1] 2504 8065
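To see why na.omit() needs care, note that on a matrix it drops every row that contains at least one NA. The sketch below shows this on a made-up matrix, along with one possible alternative of dropping SNP columns instead; toy_na and its names are hypothetical.

```r
# Hypothetical toy matrix: 3 samples (rows) by 3 SNPs (columns), one NA
toy_na <- matrix(c(0, 1, NA,
                   0, 0, 2,
                   1, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("sampleA", "sampleB", "sampleC"),
                                 c("snp1", "snp2", "snp3")))

# na.omit() removes whole rows: sampleA is lost because of a single NA
no_na_rows <- na.omit(toy_na)
dim(no_na_rows)

# Count the NAs in each SNP column
na_per_snp <- colSums(is.na(toy_na))
na_per_snp

# Alternative: keep every sample and drop only the SNPs that have NAs
no_na_cols <- toy_na[, na_per_snp == 0, drop = FALSE]
dim(no_na_cols)
```

Comparing dim(my_snps_num_t) with dim(no_NAs) shows how many samples, if any, were removed; if the row counts match, no sample had an NA.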