Two input files are required:
A VCF of candidate biallelic SNPs in a set of individuals. Multiallelic SNPs will cause problems.
A CSV file with a column for every individual in the VCF (no more and no less, otherwise there will be problems), and a column for the genotype of interest of each individual.
Necessary Packages:
library(tidyverse)
library(reshape2)
library(limma)
library(limma)
matrix <- read.csv("Name_of_File.csv") #Open your VCF turned CSV
matrix <- matrix %>%
column_to_rownames("POS") #Turn the location of each SNP into its rowname. This is so you
#can identify which loci contain SNPs correlated to your genotype at the end.
matrix2 <- matrix[-c(1:8)] #Remove everything except individuals from the dataframe so we can
#make it into a numeric count matrix:
matrix2[] <- lapply(matrix2[], function(x) substr(x, 1, 3)) #This function removes everything
#except the allelic frequency of each SNP from the dataframe by deleting everything except
#the first three characters.
matrix2[matrix2== "1/1"] <- "2"
matrix2[matrix2== "0/1"] <- "1"
matrix2[matrix2== "0/0"] <- "0"
matrix2[matrix2== "./."] <- NA #Replace allelic frequency values with numeric genotype values
matrix3 <- data.matrix(matrix2) #Convert your dataframe into a numeric matrix
meta <- read.csv("Name_of_File.csv") #Open the CSV
colnames(meta)[2]<-"genotype" #Renaming the column for genotype to "genotype"
colnames(meta)[1]<-"sample" #Renaming the column for sample to "sample"
meta <- meta %>% column_to_rownames("sample") #Make sample (e.g., "ANN0802, ANN0803, etc"
#the rownames)
designmat<-model.matrix(~genotype,meta) # Make you genotype data into a "design matrix,"
#a matrix that encodes characteristic data numerically
fit<-(lmFit(matrix3, designmat) %>% eBayes()) #Fit a linear model to the SNP matrix against
#the deisgn matrix using the sample names to tie them together
top <- topTable(fit, coef=1, number = 15) #Make a table of the 15 most correlated SNPs to
#your design matrix. Increase or decrease the number of hits by changing "number = X" in
#this command.
print(top) #View the fruits of your labour