Welcome to R Programming

Master the fundamentals of R for livestock genomics and quantitative genetics research


1 Learning Objectives

By the end of this tutorial, you will be able to:

📊 Understand R and R Markdown basics
🔢 Perform algebraic operations in R
📦 Install and load R libraries
📁 Set working directories and manage files
📈 Read, manipulate, and visualize datasets
💾 Export data and create reproducible scripts

2 What is R?

R is a free software environment for statistical computing and graphics. One of its strengths is that you can make publication-quality plots.

RStudio is a flexible and multi-functional open-source IDE (integrated development environment) used as a graphical front-end to work with R.

R is used by typing in commands. They are entered after the prompt > in the Console. After you type a command and its arguments, simply press the Return Key. Separate commands using ; or with a newline (Enter).

2.1 Your First R Command

To run the code below, click anywhere in the code chunk and then click Run → Run Current Chunk.

"hello world!"
## [1] "hello world!"
# or

print("hello world!")
## [1] "hello world!"

💡 Pro Tip: Like learning any new language, if you get an error or are not sure how to do something, you can search online for help. There are many resources and forums for R!


3 Getting Help in R

R has built-in documentation for every function. Here are several ways to access help:

# Method 1: Using help()
help("print")

# Method 2: Using ? shortcut
?print

# Method 3: Get examples
example("print")

# Method 4: Start HTML help browser
help.start()
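
The methods above assume you already know the function's name. When you do not, a keyword or pattern search can help. The lines below are a small sketch using `apropos()` and `args()` from base R; the search pattern is only an example.

```r
# List objects whose names start with "read." (pattern chosen for illustration)
read_functions <- apropos("^read\\.")
print(read_functions)

# Show the arguments a function accepts
args(read.csv)

# Keyword search across installed documentation (run this one interactively):
# ??"linear model"
```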

3.1 Helpful Online Resources

📚 R Documentation (rdocumentation.org): comprehensive documentation on R packages
❓ Stack Overflow (stackoverflow.com): Q&A for programming challenges
📋 RStudio Cheatsheets (rstudio.github.io): quick reference guides
🎨 Tidyverse (tidyverse.org): modern data science tools


4 R as a Calculator

R can perform all standard mathematical operations. Try these in the Console too!

1 + 1
## [1] 2

📝 Remember: You need to give instructions to R for everything. We are using RStudio as an interface to better manage our scripts, data, files, and figures.


5 Understanding R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents.

When you click the Knit button, a document will be generated that includes both content as well as the output of any embedded R code chunks.

For more details on R Markdown, visit: rmarkdown.rstudio.com


6 Basic Calculations

Some symbols are familiar (+, -, /), while others might be new (* for multiplication).

4 + 3  # Addition
## [1] 7
4 - 3  # Subtraction
## [1] 1
4 / 3  # Division
## [1] 1.333333
4 * 3  # Multiplication
## [1] 12
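
A few more operators are worth trying alongside the four above. This short sketch shows exponentiation, remainder, integer division, and how parentheses change the order of evaluation; the numbers are arbitrary.

```r
2^3          # Exponentiation
## [1] 8
7 %% 3       # Remainder (modulo)
## [1] 1
7 %/% 3      # Integer division
## [1] 2
4 + 3 * 2    # Multiplication happens before addition
## [1] 10
(4 + 3) * 2  # Parentheses change the order
## [1] 14
sqrt(16)     # Built-in functions work too
## [1] 4
```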

6.1 Working with Variables

We can assign values to variables and use them in calculations:

# Method 1: Using <- (preferred)
x <- 4

# Method 2: Using assign()
assign("y", 3)

# Method 3: Using = (works, but <- is the usual convention)
z = 2

To see variable values, you can print them:

print(x)
## [1] 4
y
## [1] 3

Now use these variables in calculations:

x + y
## [1] 7
x - y
## [1] 1
x / y
## [1] 1.333333
x * y
## [1] 12
3 + y
## [1] 6

6.1.1 📝 Exercise 1

  1. Make a new R chunk and label it “Exercise 1”
  2. Write and run four calculations using +, -, *, or /
  3. Include both variables and integers
  4. Compare results with a partner
💡 Hint:
# Example:
a <- 10
b <- 5
a + b
a * 2

7 Working with Vectors

Vectors are sequences of data that can be numbers, characters, or logical values. All elements must be the same type.

7.1 Creating Vectors

# Numeric vector
m <- c(1, 2, NA, 4)
m
## [1]  1  2 NA  4
# Check the type
is(m)
## [1] "numeric" "vector"
# Character vector
n <- c("A", "mango", NA)
n
## [1] "A"     "mango" NA
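
Two consequences of the rules above are easy to check for yourself: mixing types forces everything to one common type, and NA spreads through calculations unless you remove it. A small sketch (the vector `mixed` is made up for illustration):

```r
# Mixing types: everything is coerced to the most flexible type (character here)
mixed <- c(1, "A", TRUE)
mixed
## [1] "1"    "A"    "TRUE"
class(mixed)
## [1] "character"

# Missing values propagate unless you ask R to drop them
m <- c(1, 2, NA, 4)
sum(m)
## [1] NA
sum(m, na.rm = TRUE)
## [1] 7
```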

7.2 Logical Vectors

Logical vectors have three possible values: TRUE, FALSE, and NA.

temp <- m > 3
print(temp)
## [1] FALSE FALSE    NA  TRUE

Logical operators:

- < less than
- <= less than or equal to
- > greater than
- >= greater than or equal to
- == equal to
- != not equal to
- & and
- | or
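
The operators listed above can be combined to test several conditions at once. Here is a brief sketch on a made-up vector called `vals`:

```r
vals <- c(4, 2, 3, 8, 2, 2, 5)

# & (and): both conditions must hold
vals > 2 & vals < 8
## [1]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE

# | (or): at least one condition must hold
vals == 2 | vals == 8
## [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE

# != (not equal): count the elements that are not 2
sum(vals != 2)
## [1] 4
```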

7.3 Indexing Vectors

v <- c(4, 2, 3, 8, 2, 2, 5)

# Get the fourth element
v[4]
## [1] 8
# Change the fourth element
v[4] <- 10
v
## [1]  4  2  3 10  2  2  5
# Use logical vectors to filter
w <- v < 5
v[w]
## [1] 4 2 3 2 2
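
Square brackets accept more than a single position. The sketch below recreates `v` as it stands after the replacement above and shows a few other common indexing patterns:

```r
v <- c(4, 2, 3, 10, 2, 2, 5)

v[c(1, 3)]     # Several positions at once
## [1] 4 3
v[-1]          # Negative index: everything except the first element
## [1]  2  3 10  2  2  5
which(v < 5)   # Positions (not values) where the condition is TRUE
## [1] 1 2 3 5 6
length(v)      # Number of elements
## [1] 7
```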

8 Matrices

Matrices are rectangular arrays of data, all of the same type. They are fundamental for genomic relationship matrices (G-matrix) in quantitative genetics!

8.1 Creating Matrices

A <- matrix(
  # Sequence of elements  
  c(1, 2, 3, 4, 5, 6, 7, 8, 9), 
  nrow = 3,   # Number of rows
  ncol = 3,   # Number of columns
  byrow = TRUE  # Fill by row
)
 
# Naming rows and columns
rownames(A) <- c("a", "b", "c")
colnames(A) <- c("c", "d", "e")

B <- matrix(c(1, 6, 12, 4, 8, 15, 3, 14, 2), nrow = 3, ncol = 3)

print(A)
##   c d e
## a 1 2 3
## b 4 5 6
## c 7 8 9
print(B)
##      [,1] [,2] [,3]
## [1,]    1    4    3
## [2,]    6    8   14
## [3,]   12   15    2

8.2 Matrix Operations

# Addition
A + B
##    c  d  e
## a  2  6  6
## b 10 13 20
## c 19 23 11
# Subtraction
A - B
##    c  d  e
## a  0 -2  0
## b -2 -3 -8
## c -5 -7  7
# Matrix multiplication
A %*% B
##   [,1] [,2] [,3]
## a   49   65   37
## b  106  146   94
## c  163  227  151
# Transpose
t(A)
##   a b c
## c 1 4 7
## d 2 5 8
## e 3 6 9
# Inverse
solve(B)
##             [,1]        [,2]         [,3]
## [1,] -0.47087379  0.08980583  0.077669903
## [2,]  0.37864078 -0.08252427  0.009708738
## [3,] -0.01456311  0.08009709 -0.038834951
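
To connect these operations to the genomic relationship matrix mentioned at the start of this section, here is a rough sketch of one common formulation (often attributed to VanRaden): centre the genotypes by twice the allele frequency, multiply by the transpose, and scale. The genotype values are invented for illustration, and the names `geno`, `p`, `Z`, and `G` are assumptions, not part of the course data.

```r
# Toy genotype matrix: 4 animals (rows) x 5 SNPs (columns), coded 0/1/2 (made-up values)
geno <- matrix(c(0, 1, 2, 1, 0,
                 1, 1, 0, 2, 1,
                 2, 0, 1, 1, 2,
                 1, 2, 1, 0, 1),
               nrow = 4, byrow = TRUE)

p <- colMeans(geno) / 2       # Allele frequency of each SNP
Z <- sweep(geno, 2, 2 * p)    # Subtract 2p from every column (centring)

# G = ZZ' / (2 * sum of p(1 - p)), built from t() and %*%
G <- (Z %*% t(Z)) / (2 * sum(p * (1 - p)))
round(G, 2)
```

Real analyses would normally rely on dedicated software for this step; the point here is only that it is built from transpose and matrix multiplication.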

8.3 Understanding Matrix Inverse

Here’s how matrix inversion works for a 2×2 matrix:

a <- 2
b <- 5
c <- 9
d <- 7

M <- matrix(c(a, c, b, d), nrow = 2, ncol = 2)
print(M)
##      [,1] [,2]
## [1,]    2    5
## [2,]    9    7
# Calculate determinant
determinant_M <- a*d - b*c

# Calculate adjoint (swap a↔d, negate b and c)
adjoint_M <- matrix(c(d, -c, -b, a), nrow = 2, ncol = 2)

# Inverse = (1/determinant) × adjoint
Inverse_M <- (1/determinant_M) * adjoint_M
print(Inverse_M)
##            [,1]        [,2]
## [1,] -0.2258065  0.16129032
## [2,]  0.2903226 -0.06451613
# Verify with R's solve() function
solve(M)
##            [,1]        [,2]
## [1,] -0.2258065  0.16129032
## [2,]  0.2903226 -0.06451613
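
A quick way to convince yourself that an inverse is right is to multiply it by the original matrix, which should give the identity matrix. This sketch rebuilds the same `M` and also uses the built-in `det()` instead of the hand calculation:

```r
M <- matrix(c(2, 9, 5, 7), nrow = 2, ncol = 2)

det(M)                       # Built-in determinant; should match a*d - b*c
## [1] -31
round(M %*% solve(M), 10)    # Rounded to hide tiny floating-point error
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
all.equal(M %*% solve(M), diag(2))
## [1] TRUE
```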

8.3.1 📝 Exercise 2

  1. Create a new R chunk labeled “Exercise 2”
  2. Make at least 3 square matrices and save them as variables
  3. Try: addition (+), subtraction (-), matrix multiplication (%*%), transpose (t()), and inverse (solve())

9 Installing and Loading Packages

⚠️ Important: You only need to install a package once, but you must load it every time you start a new R session.

9.1 Installing Packages

# Install once (then comment out with #)
install.packages('ggplot2')

The next time you run this script, comment it out:

# install.packages('ggplot2')  # Already installed!
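
Instead of commenting the install line in and out by hand, you can let the script check whether a package is already available. This is a minimal sketch; `is_installed()` is a helper name invented here, built on base R's `requireNamespace()`.

```r
# Hypothetical helper: TRUE if the package can be found, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (!is_installed("ggplot2")) {
  message("ggplot2 is not installed yet; run install.packages('ggplot2') once.")
}
```

The check only reports the situation; it leaves the decision to install to you.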

9.2 Loading Packages

library(ggplot2)  # For creating beautiful plots

9.3 Checking Installed Packages

library()  # View all installed packages
search()   # View attached packages (the search path)

9.3.1 📝 Exercise 3

  1. Create a new R chunk labeled “Exercise 3”
  2. Install the lme4 package (used for linear mixed models)
  3. Load the lme4 library
# install.packages("lme4")
library(lme4)

10 Setting Your Working Directory

The working directory is where R looks for files to read and where it saves files you create.

10.1 Methods to Set Working Directory

Method 1: Using the RStudio interface. In the Files pane, click the cog icon → Set As Working Directory.

Method 2: Using code (recommended for reproducibility)

# Check current working directory
getwd()

# Set new working directory
setwd("~/Desktop/CTLGH_Training/Day1/")

# Verify the change
getwd()

# List files in working directory
dir()

💡 Note: When you knit an R Markdown document, the working directory defaults to the folder where your .Rmd file is saved!
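
If you prefer not to change the working directory at all, you can build full file paths instead. The sketch below uses `tempdir()` as a stand-in for your own course folder, which is an assumption made so the example runs anywhere.

```r
course_dir <- tempdir()          # Stand-in for your own course folder (assumption)

old_wd <- setwd(course_dir)      # setwd() returns the previous directory
getwd()                          # Now points at course_dir
setwd(old_wd)                    # Switch back when finished

# Alternative: leave the working directory alone and build the full path
traits_path <- file.path(course_dir, "CT_traits_724_pc_res.csv")
file.exists(traits_path)         # Check that the file is there before reading it
```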

10.1.1 📝 Exercise 4

  1. Create a new R chunk labeled “Exercise 4”
  2. Set your working directory to your course folder using setwd()
  3. List the contents with dir()

11 Working with Real Datasets

Now we move from toy examples to real livestock genetics data!

11.1 Reading Data

# Read CSV file
traits <- read.table("CT_traits_724_pc_res.csv", header = TRUE, sep = ",")

# Alternative for CSV
# traits <- read.csv("CT_traits_724_pc_res.csv")

# View first few rows
head(traits)
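
If you do not have the course file to hand, you can practise the same read-and-inspect steps on a small file you create yourself. Everything in this sketch is made up: the values, the object names `toy` and `toy_in`, and the temporary file.

```r
# Made-up stand-in data (not the real CT traits file)
toy <- data.frame(id = 1:4, sex = c(0, 1, 1, 0), LS = c(1, 2, 2, 1),
                  LW = c(31.2, 28.5, 29.9, 33.0))
toy_file <- tempfile(fileext = ".csv")
write.csv(toy, toy_file, row.names = FALSE)

toy_in <- read.csv(toy_file)
dim(toy_in)     # Number of rows and columns
## [1] 4 4
names(toy_in)   # Column names
## [1] "id"  "sex" "LS"  "LW"
```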

📊 Dataset: Scottish Blackface sheep carcass composition traits measured using CT scanning.

Publication: Genetics Selection Evolution (2016)

11.2 Variable Definitions

| Variable | Description |
|---|---|
| id | Animal ID |
| sex | Sex (0/1) |
| Year | Year of measurement |
| dob | Date of birth |
| litter | Litter ID |
| LS | Litter size |
| DamAge | Age of dam |
| Group | Management group |
| Line_a | Line of lamb |
| LW | Live weight |
| bon_area_ISC | Bone area at ischium |
| mus_density_TV8 | Muscle density of 8th thoracic vertebra |
| PC1, PC2, PC3 | Principal components (population structure) |
| rlw | Residual live weight |
| rbon_area | Residual bone area |
| rmus_den | Residual muscle density |

11.2.1 📝 Exercise 5

  1. Create a new R chunk labeled “Exercise 5”
  2. Read in the traits data using read.csv()
  3. Display the first few rows with head()

12 Manipulating Datasets

12.1 Accessing Data Elements

We can extract specific parts of our data using $ notation or indexing:

# Summary of litter size using $
summary(traits$LS)

# Same using column number
summary(traits[, 6])
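
The same `$` and bracket ideas extend to rows and to several columns at once. A small sketch on an invented data frame (`toy` and its values are assumptions for illustration):

```r
toy <- data.frame(id = 1:4, LS = c(1, 2, 2, 1), LW = c(31.2, 28.5, 29.9, 33.0))

toy$LW            # Column by name, returned as a vector
toy[["LW"]]       # Same column
toy[, "LW"]       # Same column again, using [row, column] indexing
toy[2, ]          # Second row, all columns

# Rows where LS is 2, keeping two columns only
twins <- toy[toy$LS == 2, c("id", "LW")]
twins

mean(toy$LW)
## [1] 30.65
```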

12.2 Understanding Data Structure

str(traits)

12.3 Converting Data Types

Some variables need to be converted to the correct type:

# Sex should be a factor, not numeric
summary(traits$sex)

# Convert to factor
traits$sex <- as.factor(traits$sex)
summary(traits$sex)

Common conversions:

- as.factor() → categorical variable
- as.numeric() → number
- as.character() → text
- as.integer() → whole number
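
One conversion deserves care: calling as.numeric() directly on a factor returns the internal level codes, not the labels. A short sketch with a made-up factor `year_f`:

```r
year_f <- factor(c("2001", "2003", "2001"))

as.numeric(year_f)                 # Internal level codes, not the years
## [1] 1 2 1
as.numeric(as.character(year_f))   # The labels converted to numbers
## [1] 2001 2003 2001
levels(year_f)
## [1] "2001" "2003"
```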

12.3.1 📝 Exercise 6

  1. Create a new R chunk labeled “Exercise 6”
  2. Convert the litter variable to a factor
traits$litter <- as.factor(traits$litter)
summary(traits$litter)

13 Creating Visualizations

Figures help us identify patterns and trends in data.

13.1 Basic Plotting

plot(traits$PC1, traits$PC2)

13.2 Enhanced Plotting

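# Line_a must be a factor so it can index the colour vector and supply legend labels
# (read.csv() keeps text columns as character by default in R 4.0 and later)
traits$Line_a <- as.factor(traits$Line_a)

# One colour per line; this palette covers up to seven lines
line_cols <- c("red", "blue", "yellow", "green", "purple", "orange", "turquoise")

plot(traits$PC1, traits$PC2, 
     pch = 19,  # Solid circles
     col = line_cols[traits$Line_a],
     main = "Population Structure",
     xlab = "Principal Component 1",
     ylab = "Principal Component 2",
     cex = 1.5)

# Add legend, using only as many colours as there are lines
legend("topright", 
       legend = levels(traits$Line_a),
       col = line_cols[seq_along(levels(traits$Line_a))],
       pch = 19,
       title = "Line")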
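
Base R plots appear in the Plots pane, but you can also write them straight to a file. This sketch uses the `pdf()` graphics device with random numbers standing in for PC1 and PC2; the file name is hypothetical.

```r
# Made-up coordinates standing in for PC1 and PC2
set.seed(1)
pc1 <- rnorm(20)
pc2 <- rnorm(20)

plot_file <- file.path(tempdir(), "pc_plot.pdf")  # Hypothetical file name
pdf(plot_file, width = 6, height = 5)             # Open a PDF graphics device
plot(pc1, pc2, pch = 19,
     xlab = "Principal Component 1",
     ylab = "Principal Component 2")
dev.off()                                         # Close the device so the file is written
file.exists(plot_file)
```

Forgetting `dev.off()` is a common reason for empty or unreadable plot files.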

13.3 Professional Plotting with ggplot2

library(ggplot2)

ggplot(traits, aes(x = PC1, y = PC2, color = Line_a)) +
  geom_point(size = 3, alpha = 0.7) +
  theme_minimal() +
  labs(title = "Population Structure Analysis",
       subtitle = "Principal Components from SNP Array Data",
       x = "Principal Component 1",
       y = "Principal Component 2",
       color = "Line") +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12),
    axis.title = element_text(size = 12),
    legend.position = "bottom"
  )

13.3.1 📝 Exercise 7

  1. Create a new R chunk labeled “Exercise 7”
  2. Make a plot of PC2 vs PC3
  3. Visit R Documentation to learn how to customize axis labels
  4. Add informative x and y axis labels

14 Exporting Data

After cleaning and filtering data, we often need to export it for use in other programs (like BLUPF90).

14.1 Filtering and Exporting

# Filter data for year 2001
sheep_2001 <- subset(traits, Year == 2001)

# Write to file
write.table(sheep_2001, 
            "sheep_2001.txt", 
            quote = FALSE, 
            sep = ' ', 
            row.names = FALSE)
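
It is worth reading an exported file back in to confirm it looks the way the next program expects. The sketch below filters on two conditions and then checks the result; the data frame `toy` and its values are invented for illustration.

```r
toy <- data.frame(id = 1:5,
                  Year = c(2001, 2002, 2001, 2003, 2001),
                  LW = c(31.2, 28.5, 29.9, 33.0, 30.1))

# Two conditions combined with &
heavy_2001 <- subset(toy, Year == 2001 & LW > 30)
heavy_2001$id
## [1] 1 5

# Export, then read back to confirm the file looks as expected
out_file <- tempfile(fileext = ".txt")
write.table(heavy_2001, out_file, quote = FALSE, sep = " ", row.names = FALSE)
check <- read.table(out_file, header = TRUE)
nrow(check)
## [1] 2
```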

14.2 Saving Your Workspace

# Save entire workspace
save.image("saved_workspace.RData")

# Load workspace
load("saved_workspace.RData")

⚠️ Best Practice: Rather than saving workspaces, create reproducible scripts that regenerate your results!
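
When one object takes a long time to compute, a middle ground is to save just that object with `saveRDS()` and reload it with `readRDS()`. A minimal sketch, with made-up contents for `results`:

```r
results <- data.frame(trait = c("LW", "LS"), mean = c(30.65, 1.5))  # Made-up values

rds_file <- tempfile(fileext = ".rds")
saveRDS(results, rds_file)          # Save one object
restored <- readRDS(rds_file)       # Load it back under any name you like
identical(results, restored)
## [1] TRUE
```

Unlike load(), readRDS() returns the object, so it cannot silently overwrite variables already in your session.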


15 Organizing Your Scripts

15.1 Script Organization Checklist

1. Clean Environment

rm(list = ls())  # Clear workspace

2. Set Working Directory

setwd("~/your/path/here")

3. Load Dependencies

# Install packages (if needed)
# install.packages("package_name")

# Load libraries
library(package_name)

4. Read Data

data <- read.csv("data_file.csv")

5. Analysis

# Your analysis code here
# Use comments to explain each step!

6. Export Results

write.table(results, "output.txt")
ggsave("plot.png")

💡 Remember: Use comments (#) throughout to explain where data comes from, what analyses are being done, etc. This helps you remember and helps when sharing scripts with others!


16 Summary

16.1 🎉 Congratulations!

You’ve completed the Introduction to R! You now know how to:

✅ Use R and R Markdown
✅ Perform algebraic operations
✅ Install and load libraries
✅ Set working directories
✅ Read and manipulate datasets
✅ Create informative plots
✅ Export data files
✅ Organize reproducible scripts

📚 Next Steps: Refer back to this document throughout the week. Copy and paste useful code snippets into your new scripts. Every time you start working with an R script, remember to set your working directory!

16.1.1 📝 Exercise 8

  1. Click Knit at the top of this document
  2. Choose “Knit to HTML”
  3. The resulting HTML file will be saved in the same folder as this .Rmd file
  4. Open it in your web browser to see the beautifully formatted result!