Here’s an example of how to make a graph with a line of best fit in R.
Data are from Xiang et al. 2013. Identification of Tibetan Specific Mutation in the Hypoxic gene EGLN1 and its contribution to high-altitude adaptation.
Make the vectors
population <- c("Nyingchi","Lhasa","Chamdo1","Chamdo2","Xigaze","Shanna","Nagari","Nagqu")
altitude <- c(2750,3400,3500,3850,3890,3935,4700,4800)
allele_freq <- c(0.6992, 0.6429, 0.7234, 0.6549, 0.7151, 0.7119, 0.7583, 0.7542)
Make a dataframe from the vectors
tibetdf <- data.frame(pop = population,
                      alt = altitude,
                      freq = allele_freq)
Make a plot with plot()
plot(freq ~ alt, data = tibetdf)
Calculate the line of best fit with lm()
lobf <- lm(freq ~ alt, data = tibetdf)
Get the intercept and slope for the line (coef() returns the intercept first, then the slope)
lobf_slope_int <- coef(lobf)
lobf_slope_int
## (Intercept) alt
## 5.605265e-01 3.814073e-05
The equation for the line would be:
# y = m * x + b
# freq = m * alt + b
# freq = 3.814073e-05 * alt + 5.605265e-01
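As a quick sanity check, the equation can be applied to every altitude in the data and compared with the fitted values that lm() stores. This is only a sketch: it rebuilds the data so it runs on its own, and the names m, b and by_hand are made up for illustration.

```r
# Rebuild the data and model so this block runs on its own
altitude <- c(2750, 3400, 3500, 3850, 3890, 3935, 4700, 4800)
allele_freq <- c(0.6992, 0.6429, 0.7234, 0.6549, 0.7151, 0.7119, 0.7583, 0.7542)
tibetdf <- data.frame(alt = altitude, freq = allele_freq)
lobf <- lm(freq ~ alt, data = tibetdf)

# Pull the coefficients out by name (illustrative object names)
b <- unname(coef(lobf)["(Intercept)"])  # intercept
m <- unname(coef(lobf)["alt"])          # slope

# Apply y = m * x + b to every observed altitude
by_hand <- m * tibetdf$alt + b

# Compare with the fitted values from the model;
# expected to be TRUE, up to floating point tolerance
all.equal(by_hand, unname(fitted(lobf)))
```

If the comparison is TRUE, the hand-written equation and the model agree.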
We can use this equation to predict the frequency at a given altitude. Say we want to make a prediction for 3000 m:
# y = m * x + b
3.814073e-05 * 3000 + 5.605265e-01
## [1] 0.6749487
# same calculation, taking the slope and intercept straight from the model
coef(lobf)[2] * 3000 + coef(lobf)[1]
## alt
## 0.6749487
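R also has a built-in predict() function that does this arithmetic for you. Below is a minimal sketch, assuming the same data and model as above (rebuilt here so the block runs on its own). The names new_alt and pred are illustrative. The column name in newdata has to match the predictor name used in the model formula (alt).

```r
# Rebuild the data and model so this block runs on its own
altitude <- c(2750, 3400, 3500, 3850, 3890, 3935, 4700, 4800)
allele_freq <- c(0.6992, 0.6429, 0.7234, 0.6549, 0.7151, 0.7119, 0.7583, 0.7542)
tibetdf <- data.frame(alt = altitude, freq = allele_freq)
lobf <- lm(freq ~ alt, data = tibetdf)

# New data must use the same column name as the predictor in the formula
new_alt <- data.frame(alt = 3000)

# Predicted allele frequency at 3000 m
pred <- predict(lobf, newdata = new_alt)
pred
```

This should give the same value as the calculation by hand, and it extends easily to several altitudes at once, e.g. data.frame(alt = c(3000, 4000)).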
Let’s save the altitude and its predicted frequency to objects
x <- 3000
y <- coef(lobf)[2] * x + coef(lobf)[1]
Plot the data with the line of best fit
plot(freq ~ alt, data = tibetdf, pch = 16, xlab = "Altitude (meters)", ylab = "C allele frequency (proportion)")
abline(lobf_slope_int, col = 2, lwd = 2)
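abline() can also take the fitted model object directly, which saves extracting the coefficients first. Here is a small sketch under the same assumptions as above (data and model rebuilt so the block runs on its own):

```r
# Rebuild the data and model so this block runs on its own
altitude <- c(2750, 3400, 3500, 3850, 3890, 3935, 4700, 4800)
allele_freq <- c(0.6992, 0.6429, 0.7234, 0.6549, 0.7151, 0.7119, 0.7583, 0.7542)
tibetdf <- data.frame(alt = altitude, freq = allele_freq)
lobf <- lm(freq ~ alt, data = tibetdf)

# Scatterplot of the raw data
plot(freq ~ alt, data = tibetdf, pch = 16)

# Pass the lm object itself; abline() reads the intercept and slope from it
abline(lobf, col = 2, lwd = 2)
```

Both approaches draw the same line. Passing the coefficients is useful when they come from somewhere other than a model object.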
Our prediction for 3000 m can be plotted like this
# main plot
plot(freq ~ alt, data = tibetdf, pch = 16,
     xlab = "Altitude (meters)",
     ylab = "C allele frequency (proportion)")
# regression line of best fit
abline(lobf_slope_int, col = 2, lwd = 2)
# predicted point
points(y ~ x, col = 3, pch = 15, cex = 2)
Calculate the Spearman rank correlation between allele frequency and altitude. Compare it to the value in Table 1 of the original paper.
cor(tibetdf$freq, tibetdf$alt, method = "spearman")
## [1] 0.7142857
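If a p-value is wanted alongside the coefficient, cor.test() runs the same Spearman correlation as a hypothesis test. This is a sketch, not necessarily how the paper's authors did their analysis. The name rho_test is illustrative, and the data are rebuilt so the block runs on its own.

```r
# Rebuild the data so this block runs on its own
altitude <- c(2750, 3400, 3500, 3850, 3890, 3935, 4700, 4800)
allele_freq <- c(0.6992, 0.6429, 0.7234, 0.6549, 0.7151, 0.7119, 0.7583, 0.7542)
tibetdf <- data.frame(alt = altitude, freq = allele_freq)

# Spearman rank correlation as a test (two-sided by default)
rho_test <- cor.test(tibetdf$freq, tibetdf$alt, method = "spearman")

# The estimate should match the value from cor() above
rho_test$estimate

# p-value for the null hypothesis of no monotonic association
rho_test$p.value
```

With only eight populations the test has little power, so the p-value is best read alongside the plot instead of on its own.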