A. Calculating Dn/Ds

Report the following values using the codon table:

  1. Total number of nonsynonymous sites and nonsynonymous differences
nonsyn_sites <- 22.333 #change to number of nonsynonymous *sites* you counted
nonsyn <- 4 #change to number of nonsynonymous *differences* you counted
  1. Total number of synonymous sites and synonymous differences
syn_sites <- 7.667 #change to number of synonymous *sites* you counted
syn <- 4 #change to number of synonymous *differences* you counted
  1. The values of Dn, Ds, and Dn/Ds
Dn <- nonsyn / nonsyn_sites
Ds <- syn / syn_sites
Dn_Ds <- Dn / Ds
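
The three assignments above can be wrapped in a small helper so the same arithmetic can be reused for other sequence pairs. This is only an illustrative sketch: the function name `dn_ds` and its argument names are assumptions, not part of the lab handout, and it computes the same simple, uncorrected proportions as the lines above (no multiple-hit correction).

```r
# Hypothetical helper (not from the lab handout): uncorrected Dn, Ds and Dn/Ds.
dn_ds <- function(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites) {
  # Site counts are denominators, so they must be positive.
  stopifnot(nonsyn_sites > 0, syn_sites > 0)
  dn <- nonsyn_diffs / nonsyn_sites   # nonsynonymous differences per nonsynonymous site
  ds <- syn_diffs / syn_sites         # synonymous differences per synonymous site
  # The ratio is undefined when there are no synonymous differences.
  ratio <- if (ds == 0) NA_real_ else dn / ds
  c(Dn = dn, Ds = ds, Dn_Ds = ratio)
}

# Same counts as used above.
result <- dn_ds(nonsyn_diffs = 4, nonsyn_sites = 22.333,
                syn_diffs = 4, syn_sites = 7.667)
round(result, 4)
```

Returning `NA` when Ds is 0 is a design choice made here so that an undefined ratio is not silently reported as `Inf`.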

R Markdown allows us to report code in our text. For example, the Dn value you calculated above is equal to 0.1791072. Use this feature now to report the value of Dn/Ds:

Dn/Ds = 0.3433036
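
In the R Markdown source, a value is reported inline by wrapping an R expression in backticks that begin with `r`, for example `r round(Dn_Ds, 3)`. The sketch below shows one way to prepare a rounded value for that kind of inline use. Rounding to three decimal places is an assumption made here, not a requirement of the handout, and the variable name `reported` is hypothetical.

```r
# Recompute Dn/Ds from the counts used in Part A so this block is self-contained.
Dn_Ds <- (4 / 22.333) / (4 / 7.667)

# Round for display; nsmall keeps trailing zeros if the rounded value has any.
reported <- format(round(Dn_Ds, 3), nsmall = 3)
reported  # "0.343"
```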

B. Sliding window analysis

Please read the lab handout for details about the next three sections. Read them carefully. We’ve created boxes around comments where you need to add your own code.

#Get the data for the lysin gene
lysin <- read.csv("LysinDnDs.csv") #Reads the data from LysinDnDs.csv in the working directory
attach(lysin, warn.conflicts = FALSE) #Connects the variable names with the data

#Get the data for the cytochrome oxidase gene
COI <- read.csv("COIDnDs.csv") #Reads the data from COIDnDs.csv in the working directory
attach(COI, warn.conflicts = FALSE)

#Create empty plot space
plot(NA,NA, xlim=c(0,151), ylim=c(0,3),main= "Ratio of Dn and Ds Across Gene Length", xlab= "Codon", ylab="Dn/Ds")

#add line for "Codon" versus "CO1Ratio" variables
lines(Codon, CO1Ratio, col="green", lwd=3)

#############################################
# add line for  "Codon" versus "LysinRatio" #
#############################################

lines(Codon, LysinRatio, col="blue", lwd=3)


###########################
# fill in legend details! #
###########################

legend("topright", c("CO1Ratio", "LysinRatio"), pch = rep(16,2), col = c("green", "blue"))


######################################
# add a horizontal line where y = 1. #
######################################

abline(h=1, lwd=3)
Figure 1: Dn/Ds values across the length of each gene indicate how mutations affect fitness. A Dn/Ds value of approximately 0 suggests that deleterious nonsynonymous substitutions are rapidly removed from the population. A Dn/Ds value of approximately 1 suggests that nonsynonymous substitutions are occurring at about the same rate as synonymous substitutions. A Dn/Ds value greater than 1 suggests that some of the mutations are advantageous and rise in frequency due to natural selection.
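
The three interpretive cases in the caption can be sketched as a small labelling function. This is a hedged illustration, not part of the handout: the function name `classify_dnds`, the labels, and the tolerance `tol = 0.1` around 1 are all assumptions chosen for the example, and real data would call for a statistical test rather than a fixed cutoff.

```r
# Hypothetical labelling of Dn/Ds values; tol is an arbitrary illustrative band around 1.
classify_dnds <- function(ratio, tol = 0.1) {
  ifelse(is.na(ratio), NA_character_,
         ifelse(ratio > 1 + tol, "positive",        # nonsynonymous changes favoured
                ifelse(ratio < 1 - tol, "purifying", # nonsynonymous changes removed
                       "neutral")))                  # rates roughly equal
}

# Made-up example values, not taken from the lab data.
classify_dnds(c(0.05, 0.95, 2.4, NA))
```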

#calculate the average Dn/Ds for the two genes
CO1 <- mean(CO1Ratio,na.rm=TRUE)
Lysin <- mean(LysinRatio,na.rm=TRUE)

C. Comparative analysis

#Look at a whole bunch of Dn/Ds values for random genes 
DNDS <- read.csv("FlyDnDs.csv") #Reads the data from FlyDnDs.csv in the working directory
attach(DNDS, warn.conflicts = FALSE)


#Add color to histogram if you'd like
hist(DnDs, col="midnightblue", ylab="Frequency", xlab="Dn/Ds", 
     main="Dn/Ds of Drosophila Genes", ylim=c(0,50), xlim=c(0,2.5))

#####################################
# add a horizontal line where y = 0 #
abline(h=0, lwd=3)
#####################################


#create vertical lines for the average Dn/Ds values of our two genes of interest
abline(v=CO1,col="seagreen1",lwd=4,lty=3)
abline(v=Lysin,col="red",lwd=4,lty=3)

#calculate the average for the random genes
random <- mean(DnDs)

#draw vertical line for the average of random genes
abline(v=random,col="yellow", lwd=2)


###########################
# fill in legend details! #
###########################

legend("topright", c("CO1","Lysin","Random"), pch=rep(16,3), col=c("seagreen1","red","yellow"))
Figure 2: Frequency distribution of Dn/Ds values for genes in Drosophila. We use this distribution as a baseline for judging the nature of selection on a gene. Ordinary genes have Dn/Ds values similar to those of many other genes, whereas extraordinary genes differ markedly from the others.

D. Making claims and proposing hypotheses

Use the inferences you draw from your data to make claims about the following:

  1. What type and extent of heterogeneity of selection is operating on the CO1 gene?

The heterogeneity of selection on the CO1 gene is minimal. The Dn/Ds value varies only slightly, between 0.0 and 0.2, across the codons. This suggests that nonsynonymous substitutions have been purged, leaving overall fitness minimally affected. This means that purifying selection is occurring (Figure 1).

  1. What type and extent of heterogeneity of selection is operating on the lysin gene?

On the lysin gene, there is a lot of variation in the Dn/Ds value, which ranges from about 0.4 to about 2.5. This suggests that, along the gene, mutations can be advantageous, deleterious, or neutral. Overall, the mutations seem to be advantageous: the majority of the Dn/Ds values are above 1, and the frequency of these mutations rises due to directional selection (Figure 1).
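
Claims about the range of Dn/Ds and about the share of windows above 1 can be checked numerically rather than read off the plot. The sketch below is illustrative only: the function name `summarise_windows` is hypothetical, and the two vectors are invented toy values, not the lab's `CO1Ratio` or `LysinRatio` data.

```r
# Toy sliding-window ratios, invented for illustration only (NOT the lab data).
toy_conserved <- c(0.05, 0.10, NA, 0.15, 0.20, 0.00)
toy_variable  <- c(0.4, 1.2, 2.5, NA, 1.8, 0.9)

# Hypothetical summary: spread of the ratios and proportion of windows above 1.
summarise_windows <- function(ratio) {
  ratio <- ratio[!is.na(ratio)]  # drop windows with no defined ratio
  c(min = min(ratio),
    max = max(ratio),
    sd = sd(ratio),
    prop_above_1 = mean(ratio > 1))
}

summarise_windows(toy_conserved)
summarise_windows(toy_variable)
```

Applied to the real columns, the same call would give the observed range and the fraction of windows with Dn/Ds above 1 for each gene.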

  1. Is each of these genes ordinary or extraordinary when compared to the large number of genes from Drosophila?

Based on the data, CO1 is likely an ordinary gene and lysin is likely an extraordinary gene. This is suggested by the fact that over 50% of the Drosophila genes have a Dn/Ds value between 0.0 and 0.5. Almost 50 genes fall in the first bar of the histogram, which is also where the CO1 gene lies. In contrast, the lysin gene falls past a Dn/Ds of 1.0, and only about 9 genes have a Dn/Ds value above 1. This means the lysin gene is less common and therefore extraordinary (Figure 2).
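
One way to make "ordinary" versus "extraordinary" more concrete is to ask what fraction of the background genes have a Dn/Ds value at or below that of the gene of interest. The sketch below assumes this empirical-percentile framing, which is not prescribed by the handout; the function name `empirical_percentile` and the toy background vector are hypothetical and are not the FlyDnDs.csv data.

```r
# Hypothetical helper: fraction of background values at or below a given value.
empirical_percentile <- function(value, background) {
  background <- background[!is.na(background)]  # ignore missing ratios
  mean(background <= value)
}

# Invented background distribution for illustration only.
toy_background <- c(0.05, 0.1, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9, 1.2, 1.5)

empirical_percentile(0.1, toy_background)  # 0.3: well inside the bulk of the distribution
empirical_percentile(1.3, toy_background)  # 0.9: in the upper tail
```

With the real data, the same function could be called with the gene averages and the `DnDs` column to see where each gene falls in the Drosophila distribution.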

Finally, propose two biologically informed hypotheses for why these genes exhibit such different profiles of selection.

The CO1 and lysin genes exhibit different profiles of selection because of how often they occur in the population. Since CO1 exists in the mitochondrial DNA, it occurs at a much higher frequency, whereas lysin occurs only in the male sex cells. The CO1 gene occurs in all individuals of the population, which means mutations in the gene are always under selection and negative mutations will be removed quickly. Because the lysin gene occurs only in males when mating, selection acts on it much less often. Negative and neutral mutations will not benefit reproductive success and therefore are unlikely to be passed on.

The CO1 and lysin genes exhibit opposing profiles of selection because of their impact on the population. Since mutations in the lysin gene can positively affect reproductive success, individuals with the advantageous mutations will have a higher fitness. Since CO1 affects metabolic pathways needed for survival, mutations in the gene will rapidly be removed from the population, making the overall effect on the population minimal.