Master the fundamentals of R for livestock genomics and quantitative genetics research
By the end of this tutorial, you will be able to:
R is a free software environment for statistical computing and graphics. One of its strengths is that you can make publication-quality plots.
RStudio is a flexible and multi-functional open-source IDE (integrated development environment) used as a graphical front-end to work with R.
R is used by typing in commands. They are entered after the prompt > in the Console. After you type a command and its arguments, simply press the Return key. Separate commands using ; or a newline (Enter).
To run the code below, click anywhere in the code chunk and then click Run → Run Current Chunk.
## [1] "hello world!"
## [1] "hello world!"
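The chunk that produced the two outputs above did not survive in this copy. A minimal sketch that gives the same output, assuming the original printed the string once with print() and once by evaluating it directly (the variable name msg is hypothetical):

```r
# Hypothetical variable holding the greeting
msg <- "hello world!"

# Explicit printing with print()
print(msg)

# Typing an object's name on its own line auto-prints it
msg
```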
💡 Pro Tip: Like learning any new language, if you get an error or are not sure how to do something, you can search online for help. There are many resources and forums for R!
R has built-in documentation for every function. Here are several ways to access help:
# Method 1: Using help()
help("print")
# Method 2: Using ? shortcut
?print
# Method 3: Get examples
example("print")
# Method 4: Start HTML help browser
help.start()
R can perform all standard mathematical operations. Try these in the Console too!
## [1] 2
Remember: You need to give instructions to R for everything. We are using RStudio as an interface to better manage our scripts, data, files, and figures.
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents.
When you click the Knit button, a document will be generated that includes both content as well as the output of any embedded R code chunks.
For more details on R Markdown, visit: rmarkdown.rstudio.com
Some symbols are familiar (+, -, /), while others might be new (* for multiplication).
## [1] 7
## [1] 1
## [1] 1.333333
## [1] 12
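The four outputs above match the basic operators applied to 4 and 3. The chunk itself is missing here, so the following is a sketch that assumes those operands (the name results is hypothetical):

```r
# Operands 4 and 3 are inferred from the outputs shown above
4 + 3   # addition
4 - 3   # subtraction
4 / 3   # division
4 * 3   # multiplication

# The same four results collected in one named vector
results <- c(add = 4 + 3, sub = 4 - 3, div = 4 / 3, mul = 4 * 3)
print(results)
```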
We can assign values to variables and use them in calculations:
# Method 1: Using <- (preferred)
x <- 4
# Method 2: Using assign()
assign("y", 3)
# Method 3: Using =
z = 2
To see variable values, you can print them:
## [1] 4
## [1] 3
Now use these variables in calculations:
## [1] 7
## [1] 1
## [1] 1.333333
## [1] 12
## [1] 6
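The first four outputs above are consistent with x = 4 and y = 3 from the assignment chunk. The code for this chunk is missing, so this is a sketch; in particular, which expression produced the final 6 cannot be recovered, and y * z is only one guess that fits.

```r
# Values taken from the assignment chunk earlier in the tutorial
x <- 4
y <- 3
z <- 2

x + y   # addition
x - y   # subtraction
x / y   # division
x * y   # multiplication
y * z   # assumption: one of several expressions that evaluate to 6
```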
Vectors are sequences of data that can be numbers, characters, or logical values. All elements must be the same type.
## [1] 1 2 NA 4
## [1] "numeric" "vector"
## [1] "A" "mango" NA
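The outputs above suggest a numeric vector and a character vector, each containing a missing value. A sketch with hypothetical names num_vec, chr_vec, and mixed (the original chunk is not shown), which also illustrates the same-type rule:

```r
# Numeric vector with one missing value
num_vec <- c(1, 2, NA, 4)
print(num_vec)
class(num_vec)

# Character vector with one missing value
chr_vec <- c("A", "mango", NA)
print(chr_vec)
class(chr_vec)

# Mixing types forces everything to a common type (here: character)
mixed <- c(1, "a", TRUE)
class(mixed)
```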
Logical vectors have three possible values: TRUE, FALSE, and NA.
## [1] FALSE FALSE NA TRUE
Logical operators:
- < less than
- <= less than or equal to
- > greater than
- >= greater than or equal to
- == equal to
- != not equal to
- & and
- | or
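The output FALSE FALSE NA TRUE shown earlier is what a comparison such as v > 2 returns for the vector 1, 2, NA, 4. The exact expression used is not shown, so treat this as an illustration (v and is_big are hypothetical names):

```r
v <- c(1, 2, NA, 4)

# Element-wise comparison; NA stays NA
is_big <- v > 2
print(is_big)

# Combining conditions with & (and) and | (or)
v >= 2 & v != 4
v < 2 | v == 4

# Count the TRUE values while skipping the missing one
sum(is_big, na.rm = TRUE)
```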
Matrices are rectangular arrays of data, all of the same type. They are fundamental for genomic relationship matrices (G-matrix) in quantitative genetics!
A <- matrix(
# Sequence of elements
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3, # Number of rows
ncol = 3, # Number of columns
byrow = TRUE # Fill by row
)
# Naming rows and columns
rownames(A) <- c("a", "b", "c")
colnames(A) <- c("c", "d", "e")
B <- matrix(c(1, 6, 12, 4, 8, 15, 3, 14, 2), nrow = 3, ncol = 3)
print(A)
## c d e
## a 1 2 3
## b 4 5 6
## c 7 8 9
## [,1] [,2] [,3]
## [1,] 1 4 3
## [2,] 6 8 14
## [3,] 12 15 2
## c d e
## a 2 6 6
## b 10 13 20
## c 19 23 11
## c d e
## a 0 -2 0
## b -2 -3 -8
## c -5 -7 7
## [,1] [,2] [,3]
## a 49 65 37
## b 106 146 94
## c 163 227 151
## a b c
## c 1 4 7
## d 2 5 8
## e 3 6 9
## [,1] [,2] [,3]
## [1,] -0.47087379 0.08980583 0.077669903
## [2,] 0.37864078 -0.08252427 0.009708738
## [3,] -0.01456311 0.08009709 -0.038834951
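The output blocks above correspond, in order, to printing A and B and then to A + B, A - B, A %*% B, t(A), and the inverse of B. The calls themselves are not shown in this copy, so the sketch below re-creates the two matrices and lists operations that are consistent with those outputs (B_inv is a hypothetical name):

```r
# Re-create the matrices defined earlier
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) <- c("a", "b", "c")
colnames(A) <- c("c", "d", "e")
B <- matrix(c(1, 6, 12, 4, 8, 15, 3, 14, 2), nrow = 3, ncol = 3)

A + B      # element-wise addition
A - B      # element-wise subtraction
A %*% B    # matrix multiplication (rows of A times columns of B)
t(A)       # transpose
B_inv <- solve(B)   # inverse of B
B_inv

# Sanity check: B times its inverse should be the identity matrix
round(B %*% B_inv, 10)
```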
Here's how matrix inversion works for a 2×2 matrix:
## [,1] [,2]
## [1,] 2 5
## [2,] 9 7
# Entries of M as printed above, read row by row: M = [a b; c d]
a <- 2; b <- 5; c <- 9; d <- 7
# Calculate determinant
determinant_M <- a*d - b*c
# Calculate adjoint (swap a and d, negate b and c)
adjoint_M <- matrix(c(d, -c, -b, a), nrow = 2, ncol = 2)
# Inverse = (1/determinant) × adjoint
Inverse_M <- (1/determinant_M) * adjoint_M
print(Inverse_M)
## [,1] [,2]
## [1,] -0.2258065 0.16129032
## [2,] 0.2903226 -0.06451613
## [,1] [,2]
## [1,] -0.2258065 0.16129032
## [2,] 0.2903226 -0.06451613
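The second, identical output block above is what R's built-in solve() returns for the same matrix, which is presumably how the hand calculation was checked. A sketch of that comparison, re-creating M from its printed values (manual_inverse and builtin_inverse are hypothetical names):

```r
# M as printed above; matrix() fills column by column: (2, 9) then (5, 7)
M <- matrix(c(2, 9, 5, 7), nrow = 2, ncol = 2)

# Hand calculation: (1 / determinant) times the matrix with a and d swapped
# and b and c negated
manual_inverse <- (1 / (2 * 7 - 5 * 9)) * matrix(c(7, -9, -5, 2), nrow = 2)

# Built-in inverse
builtin_inverse <- solve(M)

# The two results should agree up to floating-point tolerance
all.equal(manual_inverse, builtin_inverse)

# M times its inverse should give the identity matrix
round(M %*% builtin_inverse, 10)
```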
Matrix operations: addition (+), subtraction (-), matrix multiplication (%*%), transpose (t()), and inverse (solve()).
⚠️ Important: You only need to install a package once, but you must load it every time you start a new R session.
The next time you run this script, comment it out:
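The installation chunk this sentence refers to is missing from this copy. A common pattern is sketched here with ggplot2 as the example, because it is loaded later in this tutorial; the choice of package and the names pkg and installed are assumptions.

```r
# Package used for illustration (assumption: any CRAN package works the same way)
pkg <- "ggplot2"

# Install once; afterwards keep this line commented out
# install.packages(pkg)

# Check whether the package is available before loading it
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  # Load the package; this is needed in every new R session
  library(pkg, character.only = TRUE)
} else {
  message("Package not found. Run install.packages(\"", pkg, "\") once, then reload.")
}
```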
The working directory is where R looks for files to read and where it saves files you create.
Method 1: Using RStudio interface - Click Files → cog icon → Set As Working Directory
Method 2: Using code (recommended for reproducibility)
# Check current working directory
getwd()
# Set new working directory
setwd("~/Desktop/CTLGH_Training/Day1/")
# Verify the change
getwd()
# List files in working directory
dir()
💡 Note: When using R Markdown, the working directory is automatically set to where your .Rmd file is saved!
Now we move from toy examples to real livestock genetics data!
# Read CSV file
traits <- read.table("CT_traits_724_pc_res.csv", header = TRUE, sep = ",")
# Alternative for CSV
# traits <- read.csv("CT_traits_724_pc_res.csv")
# View first few rows
head(traits)
Dataset: Scottish Blackface sheep carcass composition traits measured using CT scanning.
Publication: Genetics Selection Evolution (2016)
| Variable | Description |
|---|---|
| id | Animal ID |
| sex | Sex (0/1) |
| Year | Year of measurement |
| dob | Date of birth |
| litter | Litter ID |
| LS | Litter size |
| DamAge | Age of dam |
| Group | Management group |
| Line_a | Line of lamb |
| LW | Live weight |
| bon_area_ISC | Bone area at ischium |
| mus_density_TV8 | Muscle density of 8th thoracic vertebra |
| PC1, PC2, PC3 | Principal components (population structure) |
| rlw | Residual live weight |
| rbon_area | Residual bone area |
| rmus_den | Residual muscle density |
We can extract specific parts of our data using $ notation or indexing:
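The chunk for this step is not included here. The sketch below uses a small made-up data frame (toy, with invented values) that borrows three column names from the variable table, so it runs without the CSV file:

```r
# Toy stand-in for the traits data; all values are invented
toy <- data.frame(
  id  = c("S1", "S2", "S3", "S4"),
  sex = c(0, 1, 1, 0),
  LW  = c(31.2, 28.7, 35.0, 29.9),
  stringsAsFactors = FALSE
)

toy$LW                     # one column by name, returned as a vector
toy[2, ]                   # second row, all columns
toy[, "LW"]                # the same column via [row, column] indexing
toy[1:2, c("id", "LW")]    # first two rows, two selected columns

# Rows that meet a condition: animals heavier than 30
heavy <- toy[toy$LW > 30, ]
heavy
```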
Some variables need to be converted to the correct type:
# Sex should be a factor, not numeric
summary(traits$sex)
# Convert to factor
traits$sex <- as.factor(traits$sex)
summary(traits$sex)
Common conversions:
- as.factor() → categorical variable
- as.numeric() → number
- as.character() → text
- as.integer() → whole number
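A short illustration of these conversions on made-up vectors (all names and values here are hypothetical). One detail worth knowing: a factor whose labels are numbers should go through as.character() before as.numeric(), otherwise the internal level codes are returned instead of the labels.

```r
# Numeric codes converted to a categorical variable
sex_code <- c(0, 1, 1, 0)
sex_factor <- as.factor(sex_code)
levels(sex_factor)
summary(sex_factor)    # counts per level rather than a numeric summary

# Factor with numeric-looking labels
year_factor <- as.factor(c("2010", "2012", "2010"))
as.numeric(year_factor)                              # level codes, not years
year_num <- as.numeric(as.character(year_factor))    # the actual years
year_num

# Text to whole number
as.integer("42")
```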
Convert the litter variable to a factor.
Figures help us identify patterns and trends in data.
# Line_a must be a factor so it can pick colours and supply legend labels
traits$Line_a <- as.factor(traits$Line_a)
line_cols <- c("red", "blue", "yellow", "green", "purple", "orange", "turquoise")
plot(traits$PC1, traits$PC2,
     pch = 19, # Solid circles
     col = line_cols[as.integer(traits$Line_a)], # One colour per line
     main = "Population Structure",
     xlab = "Principal Component 1",
     ylab = "Principal Component 2",
     cex = 1.5)
# Add legend
legend("topright",
       legend = levels(traits$Line_a),
       col = line_cols[seq_along(levels(traits$Line_a))],
       pch = 19,
       title = "Line")
library(ggplot2)
ggplot(traits, aes(x = PC1, y = PC2, color = Line_a)) +
geom_point(size = 3, alpha = 0.7) +
theme_minimal() +
labs(title = "Population Structure Analysis",
subtitle = "Principal Components from SNP Array Data",
x = "Principal Component 1",
y = "Principal Component 2",
color = "Line") +
theme(
plot.title = element_text(size = 16, face = "bold"),
plot.subtitle = element_text(size = 12),
axis.title = element_text(size = 12),
legend.position = "bottom"
)
After cleaning and filtering data, we often need to export it for use in other programs (like BLUPF90).
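The export chunk is not shown in this copy. A minimal sketch using write.table() on a made-up data frame follows. The file name, the columns, and the choice of a space-delimited file without header or quotes are assumptions, so check the format your downstream program expects.

```r
# Toy data to export; all values are invented
toy <- data.frame(
  id  = c("S1", "S2", "S3"),
  sex = c(0, 1, 1),
  rlw = c(0.42, -1.10, 0.68),
  stringsAsFactors = FALSE
)

# Hypothetical output file, written to a temporary folder so the example is safe to run
out_file <- file.path(tempdir(), "traits_export.txt")

# Space-delimited, no quotes, no row names, no header (assumed format)
write.table(toy, file = out_file,
            quote = FALSE, sep = " ",
            row.names = FALSE, col.names = FALSE)

# Read the file back as plain text to check what was written
exported_lines <- readLines(out_file)
exported_lines
```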
1. Clean Environment
2. Set Working Directory
3. Load Dependencies
# Install packages (if needed)
# install.packages("package_name")
# Load libraries
library(package_name)
4. Read Data
5. Analysis
6. Export Results
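The six steps above can be laid out as a script skeleton. This is a sketch rather than the course's own template: the path and file names are placeholders, and a small invented data frame stands in for the real data so the example runs on its own.

```r
# 1. Clean environment: remove all objects from the current workspace
rm(list = ls())

# 2. Set working directory (placeholder path; uncomment and adapt)
# setwd("~/path/to/project")

# 3. Load dependencies
# install.packages("package_name")   # run once if needed
# library(package_name)

# 4. Read data (placeholder file name; a toy data frame is used instead)
# dat <- read.csv("my_data.csv")
dat <- data.frame(id = 1:4, LW = c(30.1, 28.4, 33.0, 29.5))

# 5. Analysis: a simple numeric summary of live weight
lw_summary <- summary(dat$LW)
print(lw_summary)

# 6. Export results to a temporary folder
out_file <- file.path(tempdir(), "results.csv")
write.csv(dat, out_file, row.names = FALSE)
```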
💡 Remember: Use comments (#) throughout to explain where data comes from, what analyses are being done, etc. This helps you remember and helps when sharing scripts with others!
You've completed the Introduction to R! You now know how to:
- Use R and R Markdown
- Perform algebraic operations
- Install and load libraries
- Set working directories
- Read and manipulate datasets
- Create informative plots
- Export data files
- Organize reproducible scripts
Next Steps: Refer back to this document throughout the week. Copy and paste useful code snippets into your new scripts. Every time you start working with an R script, remember to set your working directory!