Here’s an example of how to make a graph with a line of best fit in R.

Data are from Xiang et al. (2013), "Identification of Tibetan Specific Mutation in the Hypoxic gene EGLN1 and its contribution to high-altitude adaptation."

Make the vectors

population  <- c("Nyingchi","Lhasa","Chamdo1","Chamdo2","Xigaze","Shanna","Nagari","Nagqu")
altitude    <- c(2750,3400,3500,3850,3890,3935,4700,4800)
allele_freq <- c(0.6992, 0.6429, 0.7234, 0.6549, 0.7151, 0.7119, 0.7583, 0.7542)

Make a dataframe from the vectors

tibetdf <- data.frame(pop = population,
                      alt = altitude,
                      freq = allele_freq)

Make a plot with plot()

plot(freq ~ alt, data = tibetdf)

Calculate the line of best fit with lm()

lobf <- lm(freq ~ alt, data = tibetdf)

Get the intercept and slope of the line with coef() (the intercept comes first)

lobf_slope_int <- coef(lobf)
lobf_slope_int
##  (Intercept)          alt 
## 5.605265e-01 3.814073e-05
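lm() fits the line by ordinary least squares. With a single predictor the estimates have a closed form: the slope is the covariance of altitude and frequency divided by the variance of altitude, and the intercept makes the line pass through the point (mean altitude, mean frequency). Here is a minimal sketch that checks this by hand against lm(); the data are re-entered so the chunk runs on its own, and the names `m` and `b` are just illustrative.

```r
# Re-enter the altitude and allele frequency data so this chunk runs on its own
alt  <- c(2750, 3400, 3500, 3850, 3890, 3935, 4700, 4800)
freq <- c(0.6992, 0.6429, 0.7234, 0.6549, 0.7151, 0.7119, 0.7583, 0.7542)

# Least-squares slope: covariance of x and y divided by the variance of x
m <- cov(alt, freq) / var(alt)

# The least-squares line passes through (mean(x), mean(y)), which fixes the intercept
b <- mean(freq) - m * mean(alt)

c(intercept = b, slope = m)

# Compare with lm(); all.equal() returns TRUE when they agree
all.equal(unname(coef(lm(freq ~ alt))), c(b, m))
```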

The equation for the line is:

#    y = m * x   + b
# freq = m * alt + b
# freq = 3.814073e-05 * alt + 5.605265e-01

We can predict the frequency at a given altitude by plugging that altitude into the equation. Say we want a prediction for 3000 m:

# y = m            * x    + b
      3.814073e-05 * 3000 + 5.605265e-01
## [1] 0.6749487
      coef(lobf)[2]* 3000 + coef(lobf)[1]
##       alt 
## 0.6749487

Let’s save the altitude and its predicted frequency to objects

x <- 3000
y <- coef(lobf)[2] * x + coef(lobf)[1]
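R's predict() can produce the same number from the fitted model without indexing coef() by position, and it handles several altitudes at once. A minimal sketch; the data and model are re-entered so the chunk runs on its own, and `new_alts` and `pred` are illustrative names.

```r
# Re-enter the data and refit the model so this chunk runs on its own
tibetdf <- data.frame(
  alt  = c(2750, 3400, 3500, 3850, 3890, 3935, 4700, 4800),
  freq = c(0.6992, 0.6429, 0.7234, 0.6549, 0.7151, 0.7119, 0.7583, 0.7542)
)
lobf <- lm(freq ~ alt, data = tibetdf)

# newdata must be a data frame with a column named like the predictor (alt)
new_alts <- data.frame(alt = c(3000, 3500, 4500))

# One predicted allele frequency per row of new_alts; the first is the 3000 m prediction
pred <- predict(lobf, newdata = new_alts)
pred
```

All three altitudes lie inside the sampled range (2750 to 4800 m); predictions outside that range would be extrapolations.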

Plot the data with the line

plot(freq ~ alt, data = tibetdf, pch = 16,
     xlab = "Altitude (meters)",
     ylab = "C allele frequency")
abline(lobf_slope_int, col = 2, lwd = 2)

We can add our prediction for 3000 m to the plot like this:

# main plot
plot(freq ~ alt, data = tibetdf, pch = 16,
     xlab = "Altitude (meters)",
     ylab = "C allele frequency")

# regression line of best fit
abline(lobf_slope_int, col = 2, lwd = 2)

# predicted point
points(y ~ x, col = 3, pch = 15, cex = 2)

Calculate the Spearman rank correlation between allele frequency and altitude, and compare it to the value in Table 1 of the original paper.

cor(tibetdf$freq, tibetdf$alt, method = "spearman")
## [1] 0.7142857
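Spearman's rho is the ordinary (Pearson) correlation computed on ranks rather than raw values. When there are no ties, as here, it also equals 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between each population's frequency rank and altitude rank. A minimal sketch showing that both routes reproduce the value from cor(); the data are re-entered so the chunk runs on its own, and the object names are illustrative.

```r
# Re-enter the data so this chunk runs on its own
alt  <- c(2750, 3400, 3500, 3850, 3890, 3935, 4700, 4800)
freq <- c(0.6992, 0.6429, 0.7234, 0.6549, 0.7151, 0.7119, 0.7583, 0.7542)

# Route 1: Pearson correlation of the ranks
rho_ranks <- cor(rank(freq), rank(alt))

# Route 2: the no-ties shortcut formula based on rank differences
d <- rank(freq) - rank(alt)
n <- length(alt)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Both should match cor(..., method = "spearman")
c(rho_ranks, rho_formula, cor(freq, alt, method = "spearman"))
```

Here the squared rank differences sum to 24, so rho = 1 - 6 * 24 / (8 * 63) = 5/7, about 0.714.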